Conditional Preg_Match and Replace (PHP) - php

I have preg_matches running. It works as follows:
Search for starting tag
Search for ending tag
This works; however, the page that I get data back from sometimes does not have data for that tag field. So instead of what should be a normal
<Field1>Data Here</Field1>
shows up as
<Field/>
As you can see above, when there is no data the page does not omit the tag; it emits a single self-closing tag instead, so the tag itself changes too. Unfortunately, I need to enter "NA" for data that may or may not be present.
(Note that it is <Field/>, not </Field>.)
I'm curious about any thoughts you might have on a workaround.
* Search for <field></field>
* Also search for ></field>
* Replace ></field> with <field></field> to match it all up.
Here is what I am using currently:
if(preg_match_all('#<(TicketNbr|Summary|Resolution|Site_Name|date_entered|status_description|ServiceType)>\\s*(.*?)\\s*</\\1>#is', $resp, $m) ) {
so I figured I could go right into possibly a
preg_replace, which I believe works like preg_match_all except that it replaces the matches.
Would a preg_replace work alongside the above preg_match_all, or could I just tie a
><field/>
into the preg_match_all?
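One way to approach this, sketched under assumptions: normalize the self-closing tags first with preg_replace, then run the existing preg_match_all unchanged. The sample $resp below is made up for illustration, and the sketch assumes the empty tag keeps the same name as the normal one (for example <Summary/>).

```php
<?php
// Made-up sample response; the real $resp comes from the remote page.
$resp = '<TicketNbr>123</TicketNbr><Summary/><Resolution> Fixed </Resolution>';

$tags = 'TicketNbr|Summary|Resolution|Site_Name|date_entered|status_description|ServiceType';

// Step 1: turn every empty <Tag/> into <Tag>NA</Tag>.
$resp = preg_replace('#<(' . $tags . ')\s*/>#i', '<$1>NA</$1>', $resp);

// Step 2: the original pattern now sees a normal open/close pair for every field.
$found = preg_match_all('#<(' . $tags . ')>\s*(.*?)\s*</\1>#is', $resp, $m);

// $m[1] holds the tag names, $m[2] the trimmed values.
$fields = $found ? array_combine($m[1], $m[2]) : [];
print_r($fields);
```

Doing the replacement as a separate step keeps the matching pattern simple; folding the self-closing form into the preg_match_all itself would need an alternation and an extra check for the empty capture.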

Related

preg_replace_callback quirk? doesn't match a certain type of string

Is this some quirk with the preg_replace_callback function or something?
The full code is really complex, but I'll post the specific problem here, which I found after a lot of debugging: a specific type of tag doesn't get matched at all. Any ideas why this could be?
preg_replace_callback('/(?:<del>(.|\n)*?<\/del>)|(?:<ins>(.|\n)*?<\/ins>)|(?P<DOT>[\s\S])/', function($match) {
//does something and runs fully for every tag except the tag below
}
It matches other tags, including HTML inside del tags, but this sort of img tag isn't matched for some reason, and the code just crashes at this point without calling the callback function at all. I checked with an online regex tester that this should match.
<del><img src="?tpl=img&url=1/68/thumb/Back-View-Of-Statue-Of-Liberty-With-Full-Moon.jpg" srcset="?tpl=img&url=1/68/thumb/Back-View-Of-Statue-Of-Liberty-With-Full-Moon.jpg 320w,?tpl=img&url=1/68/med/Back-View-Of-Statue-Of-Liberty-With-Full-Moon.jpg 480w,?tpl=img&url=1/68/large/Back-View-Of-Statue-Of-Liberty-With-Full-Moon.jpg 640w" sizes="(max-width:320px) 320px, (max-width:480px) 480px, 640px"></del>
Just asking in case anyone has a clue or could point me in the right direction.
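The thread does not confirm a cause, so the following is only a sketch of one thing worth checking. A repeated group such as (.|\n)*? makes the engine do a lot of work on long inputs and can run into PCRE's backtracking or stack limits, in which case preg_replace_callback returns null. Using .*? with the s flag expresses the same idea without the repeated group, and preg_last_error() reports whether the engine gave up. The shortened $html and the placeholder callback are assumptions for illustration.

```php
<?php
// Shortened stand-in for the problematic markup.
$html = '<del><img src="?tpl=img&url=1/68/thumb/x.jpg" sizes="(max-width:320px) 320px"></del> tail';

$result = preg_replace_callback(
    // With the s flag, "." also matches newlines, so (.|\n) and [\s\S] are not needed.
    '/<del>.*?<\/del>|<ins>.*?<\/ins>|(?P<DOT>.)/s',
    function (array $match): string {
        // Placeholder callback: brace del/ins blocks, pass single characters through.
        return isset($match['DOT']) ? $match['DOT'] : '{' . $match[0] . '}';
    },
    $html
);

if ($result === null) {
    // preg_* functions return null when the regex engine hits an error or a limit.
    echo 'PCRE error code: ', preg_last_error(), PHP_EOL;
} else {
    echo $result, PHP_EOL;
}
```

If the error code turns out to be PREG_BACKTRACK_LIMIT_ERROR or PREG_JIT_STACKLIMIT_ERROR, the pattern rather than the input is the place to look.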

Recursive Regex in PHP with variable names

I'm trying to make a bbcode-ish engine for my website. The thing is, it is not clear which codes are available, because the codes are made by the users. On top of that, the whole thing has to be recursive.
For example:
Hello my name is [name user-id="1"]
I [bold]really[/bold] like cheeseburgers
These are the easy ones, and I managed to make them work.
Now the problem is what happens when two of those codes follow each other:
I [bold]really[/bold] like [bold]cheeseburgers[/bold]
Or inside each other
I [bold]really like [italic]cheeseburgers[/italic][/bold]
These codes can also have attributes
I [bold strengh="600"]really like [text font-size="24px"]cheeseburgers[/text][/bold]
The following one worked quite well, but lacks the recursive part (?R):
(?P<code>\[(?P<code_open>\w+)\s?(?P<attributes>[a-zA-Z-0-1-_=" .]*?)](?:(?P<content>.*?)\[\/(?P<code_close>\w+)\])?)
I just don't know where to put the (?R) recursive tag.
Also the system has to know that in this string here
I [bold]really like [italic]cheeseburgers[/italic][/bold] and [bold]football[/bold]
are 2 "code-objects":
1. [bold]really like [italic]cheeseburgers[/italic][/bold]
and
2. [bold]football[/bold]
... and the content of the first one is
really like [italic]cheeseburgers[/italic]
which again has a code in it
[italic]cheeseburgers[/italic]
whose content is
cheeseburgers
I have searched the web for two days now and I can't figure it out.
I thought of something like this:
Look for something like [**** attr="foo"] where the attributes are optional and store it in a capturing group
Look up whether there is a closing tag somewhere (can be optional too)
If a closing tag exists, everything between the two tags should be stored as a "content"-capturing group - which then has to go through the same procedure again.
I hope there are some regex specialists who are willing to help me. :(
Thank you!
EDIT
As this might be difficult to understand, here is an input and an expected output:
Input:
[heading icon="rocket"]I'm a cool heading[/heading][textrow][text]<p>Hi!</p>[/text][/textrow]
I'd like to have an array like
array[0][name] = heading
array[0][attributes][icon] = rocket
array[0][content] = I'm a cool heading
array[1][name] = textrow
array[1][content] = [text]<p>Hi!</p>[/text]
array[1][0][name] = text
array[1][0][content] = <p>Hi!</p>
Having written multiple BBCode parsing systems, I can suggest NOT using regexes only. Instead, you should actually parse the text.
How you do this is up to you, but as a general idea you would want to use something like strpos to locate the first [ in your string, then check what comes after it to see if it looks like a BBCode tag and process it if so. Then, search for [ again starting from where you ended up.
This has certain advantages, such as being able to examine each code and skip it if it's invalid, as well as enforcing proper tag closing order ([bold][italic]Nesting![/bold][/italic] should be considered invalid) and being able to provide meaningful error messages to the user if something is wrong (invalid parameter, perhaps) because the parser knows exactly what is going on, whereas a regex would output something unexpected and potentially harmful.
It might be more work (or less, depending on your skill with regex), but it's worth it.
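As a rough illustration of that idea, here is a minimal sketch rather than a finished parser. The function names parseBBCode, findClose and parseAttributes are made up for this example, attributes are assumed to always use double quotes, and the sketch does not yet reject badly nested codes or report errors. It scans for [ with strpos, checks whether an opening code starts there, looks for the matching closing code while counting nested codes of the same name, and then parses the content recursively.

```php
<?php
// Hypothetical helper: turn 'a="1" b="2"' into ['a' => '1', 'b' => '2'].
function parseAttributes(string $raw): array
{
    preg_match_all('/([\w-]+)="([^"]*)"/', $raw, $pairs, PREG_SET_ORDER);
    $attributes = [];
    foreach ($pairs as $pair) {
        $attributes[$pair[1]] = $pair[2];
    }
    return $attributes;
}

// Hypothetical helper: offset of the [/name] that closes a code opened before $from, or null.
function findClose(string $text, string $name, int $from): ?int
{
    $q = preg_quote($name, '/');
    $pattern = '/\[\/' . $q . '\]|\[' . $q . '(?=[\s\]])/';
    $depth = 1;
    while (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE, $from)) {
        // A match starting with "[/" closes one level, anything else opens one.
        $depth += ($m[0][0][1] === '/') ? -1 : 1;
        if ($depth === 0) {
            return $m[0][1];
        }
        $from = $m[0][1] + strlen($m[0][0]);
    }
    return null;
}

function parseBBCode(string $text): array
{
    $nodes = [];
    $offset = 0;
    while (($open = strpos($text, '[', $offset)) !== false) {
        // \G anchors the pattern at $open: is this "[" the start of an opening code?
        if (!preg_match('/\G\[(\w+)((?:\s+[\w-]+="[^"]*")*)\s*\]/', $text, $m, 0, $open)) {
            $offset = $open + 1; // not a code, keep scanning
            continue;
        }
        $name = $m[1];
        $bodyStart = $open + strlen($m[0]);
        $close = findClose($text, $name, $bodyStart);
        $node = ['name' => $name, 'attributes' => parseAttributes($m[2])];
        if ($close === null) {
            // No closing code: treat it as standalone, like [name user-id="1"].
            $node['content'] = null;
            $node['children'] = [];
            $offset = $bodyStart;
        } else {
            $node['content'] = substr($text, $bodyStart, $close - $bodyStart);
            $node['children'] = parseBBCode($node['content']); // recurse into the content
            $offset = $close + strlen("[/$name]");
        }
        $nodes[] = $node;
    }
    return $nodes;
}

$tree = parseBBCode('[heading icon="rocket"]I\'m a cool heading[/heading][textrow][text]<p>Hi!</p>[/text][/textrow]');
print_r($tree);
```

For the input from the question this gives two top-level entries, heading and textrow, with the text code stored under the children key of textrow. Validation of nesting order and user-facing error messages would go where the sketch currently just skips or accepts a code.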

PHP Regex URL parsing issues preg_replace

I have a custom markup parsing function that has been working very well for many years. I recently discovered a bug that I hadn't noticed before, and I haven't been able to fix it. If anyone can help me with this, that'd be awesome. I have a custom-built forum and text-based MMORPG, and every input is sanitized and parsed for bbcode-like markup. It also parses out URLs and turns them into legit links that go to an exit page with a disclaimer that you're leaving the site. The issue I'm having is that when a user posts multiple URLs in a text box (let's say \n delimited), it only converts every other URL into a link. Here's the parser for URLs:
$markup = preg_replace("/(^|[^=\"\/])\b((\w+:\/\/|www\.)[^\s<]+)" . "((\W+|\b)([\s<]|$))/ei", '"$1".shortURL("$2")."$4"', $markup);
As you can see, it calls a PHP function, but that's not the issue here. The entire text block is passed into this preg_replace at once rather than line by line or by any other means.
If there's a simpler way of writing this preg_replace, please let me know
If you can figure out why this is only parsing every other URL, that's my ultimate goal here
Example INPUT:
http://skylnk.co/tRRTnb
http://skylnk.co/hkIJBT
http://skylnk.co/vUMGQo
http://skylnk.co/USOLfW
http://skylnk.co/BPlaJl
http://skylnk.co/tqcPbL
http://skylnk.co/jJTjRs
http://skylnk.co/itmhJs
http://skylnk.co/llUBAR
http://skylnk.co/XDJZxD
Example OUTPUT:
http://skylnk.co/tRRTnb
<br>http://skylnk.co/hkIJBT
<br>http://skylnk.co/vUMGQo
<br>http://skylnk.co/USOLfW
<br>http://skylnk.co/BPlaJl
<br>http://skylnk.co/tqcPbL
<br>http://skylnk.co/jJTjRs
<br>http://skylnk.co/itmhJs
<br>http://skylnk.co/llUBAR
<br>http://skylnk.co/XDJZxD
<br>
The e flag in preg_replace is deprecated (and removed as of PHP 7.0). You can use preg_replace_callback to get the same functionality.
The i flag is mostly redundant here, since \w already matches both upper case and lower case and there is no backreference in your pattern; only the literal www is affected by it.
I set the m flag, which makes ^ and $ match at the beginning and end of each line rather than only at the beginning and end of the entire string. Without it, each match consumes the newline after its URL, so the next URL has neither a preceding character nor a line-start ^ left to match. This should fix your weird problem of matching every other line.
I also made some of the groups non-capturing (?:pattern), since the bigger capturing groups have captured the text already.
The code below is not tested. I only tested the regex on regex tester.
$markup = preg_replace_callback(
    "/(^|[^=\"\/])\b((?:\w+:\/\/|www\.)[^\s<]+)((?:\W+|\b)(?:[\s<]|$))/m",
    function ($m) {
        // $m[1]: character before the URL, $m[2]: the URL, $m[3]: trailing delimiter
        return $m[1] . shortURL($m[2]) . $m[3];
    },
    $markup
);
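To see the effect of the m flag in isolation, here is a small sketch. The shortURL() below is only a stand-in, since the real function isn't shown in the question, and linkify() and the example.com URLs are made up for the demonstration.

```php
<?php
// Stand-in for the site's real shortURL(); assumed here to wrap the URL in a link.
function shortURL(string $url): string
{
    return '<a href="' . $url . '">' . $url . '</a>';
}

// Hypothetical wrapper so the same pattern can be run with different flags.
function linkify(string $markup, string $flags): string
{
    $pattern = '/(^|[^="\/])\b((?:\w+:\/\/|www\.)[^\s<]+)((?:\W+|\b)(?:[\s<]|$))/' . $flags;
    return preg_replace_callback($pattern, function (array $m): string {
        return $m[1] . shortURL($m[2]) . $m[3];
    }, $markup);
}

$input = "http://example.com/a\nhttp://example.com/b\nhttp://example.com/c";

$withoutM = substr_count(linkify($input, ''), '<a ');
$withM    = substr_count(linkify($input, 'm'), '<a ');

echo "without m: $withoutM, with m: $withM\n"; // without m: 2, with m: 3
```

Without the flag the second URL is skipped because the first match has already consumed the newline in front of it; with the flag, ^ matches at the start of that line and all three URLs are converted.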

Php regex match a string between two html tags with the tags being unknown

Ok, so here's my issue:
I have a link, say: http://www.blablabla.com/watch?v=1lyu1KKwC74&feature=list_other&playnext=1&list=AL94UKMTqg-9CfMhPFKXPXcvJ_j65v7UuV
And the link is between two tags say like this:
<br>http://www.blablabla.com/watch?v=1lyu1KKwC74&feature=list_other&playnext=1&list=AL94UKMTqg-9CfMhPFKXPXcvJ_j65v7UuV<br></p>
Using this regex with preg_replace:
'#(^|[^\/]|[^>])('.addcslashes($link,'.?+').')([^\w\/]|[^<]$)#i'
As such:
preg_replace('#(^|[^\/]|[^>])('.addcslashes($link,'.?+').')([^\w\/]|[^<]$)#i', "***",$strText);
The resulting string is:
<br***p>
Which is wrong!!
It should have been
<br>***<br></p>
How can I get the desired result? I have been banging my head trying to sort this one out.
I would like to mention that str_replace replaces even the link within another valid link, so it's not a good method, I need an exact match between two boundaries, even if the boundary is text or another HTML tag.
Assuming you don't want to use a DOM parser for some reason, I believe doing what you intended is as simple as the following:
preg_replace('#(^|[^\/]|[^>])('.addcslashes($link,'.?+').')([^\w\/]|[^<]$)#i', "$1***$3",$strText);
This uses $1 and $3 to put back the delimiting text you matched in your regular expression.
As others have pointed out, using a DOM parser is more reliable.
Does this do what you want?
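A variation on the same idea, offered only as a sketch: lookarounds can check the boundaries without consuming them, so nothing has to be put back with $1 and $3, and preg_quote() escapes every regex metacharacter in the link rather than just . ? and +. The helper name replaceExactLink is made up, and the boundary class [\w/] mirrors the question's [^\w\/]; it is an assumption that may need more characters (for example & or =) depending on what counts as "part of a longer link".

```php
<?php
// Hypothetical helper: replace $link with *** only where it is not part of a longer URL.
function replaceExactLink(string $text, string $link): string
{
    // Escape all regex metacharacters in the link, plus the # delimiter.
    $quoted = preg_quote($link, '#');
    // (?<!...) and (?!...) assert what must NOT surround the link, without matching it.
    $pattern = '#(?<![\w/])' . $quoted . '(?![\w/])#i';
    return preg_replace($pattern, '***', $text);
}

$link = 'http://www.blablabla.com/watch?v=1lyu1KKwC74&feature=list_other&playnext=1';

echo replaceExactLink("<br>{$link}<br></p>", $link), PHP_EOL;      // link between tags
echo replaceExactLink("<br>{$link}extra<br></p>", $link), PHP_EOL; // link inside a longer one
```

The first call replaces the link and leaves both tags intact; the second leaves the text alone because the link is followed by more word characters.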

Optional regex pattern produces no value

I am having a bit of a problem with some regex I wrote for a project of mine (please keep in mind that I am a beginner at regex, which shows in the following example). I have a piece of XML from which I am trying to extract certain parts using a pattern with named groups.
<banner piclink="pic" urlactive="url_active" urltarget="globaltgt" urllink="globallink" timevar="globaldelay" swf="0" smooth="1" name="name" alt="alternate" />
I am using the following regular expression to obtain the piclink, urlactive, urltarget, urllink and timevar using preg_match_all:
/piclink=\"(?<pic>.+)\".+urltarget=\"(?<target>.+)\".+urllink=\"(?<url>.*)\".+timevar=\"(?<delay>.*)\"/iU
So far so good; everything works. However, I am now trying to capture, also by name, the name and alt attributes, which are optional in that they don't always appear. I have tried putting them in parentheses followed by a ? to mark them as optional, like this:
(name=\"(?<name>.*)\")?
However, the $matches['name'] array is always empty. I do not know where I am messing up; I have tried all sorts of combinations, and all of them give an empty result, except when I put (?: at the end and wrap everything from swf= onwards. Then it returns something like 115 results in the array, which is not acceptable, because the result looks like $matches['name'][X] = result, where X is sometimes 1 and other times 109 for some reason.
I agree that something like SimpleXML would be better but if you want to get dirty, you can use lookaheads to try to match with the remaining characters.
/piclink=\"(?<pic>.+)\".+urltarget=\"(?<target>.+)\".+urllink=\"(?<url>.*)\".+timevar=\"(?<delay>[^"]*)\"(?=(.*name=\"(?<name>[^"]*)\")?)(?=(.*alt=\"(?<alt>[^"]*)\")?).*/iU
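For comparison, here is roughly what the SimpleXML route could look like. This is a sketch that assumes the banner element is well-formed XML on its own and that the SimpleXML extension is available; the helper name bannerAttributes is made up. Optional attributes simply come back as null when they are missing.

```php
<?php
// Hypothetical helper: read the wanted attributes, using null for absent ones.
function bannerAttributes(string $xml, array $wanted): array
{
    $banner = simplexml_load_string($xml);
    if ($banner === false) {
        throw new InvalidArgumentException('Not well-formed XML');
    }
    $result = [];
    foreach ($wanted as $key) {
        // Attributes are read with array syntax; isset() tells present from absent.
        $result[$key] = isset($banner[$key]) ? (string) $banner[$key] : null;
    }
    return $result;
}

$wanted = ['piclink', 'urlactive', 'urltarget', 'urllink', 'timevar', 'name', 'alt'];

$full = bannerAttributes(
    '<banner piclink="pic" urlactive="url_active" urltarget="globaltgt" urllink="globallink" timevar="globaldelay" swf="0" smooth="1" name="name" alt="alternate" />',
    $wanted
);
$partial = bannerAttributes(
    '<banner piclink="pic" urlactive="url_active" urltarget="globaltgt" urllink="globallink" timevar="globaldelay" swf="0" smooth="1" />',
    $wanted
);

print_r($full);
print_r($partial);
```

Because each attribute is looked up by name, the order of the attributes in the tag no longer matters, which is something the single long pattern depends on.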
