PHP Regex question - php

I'm trying to parse some text, for example:
$text = "Blah blah [a]findme[/a] and [b]findmetoo[b], maybe also [z]me[/z].";
What I have now is:
preg_match_all("/[*?](.*?)[\/*?]/", $text, $matches);
Which doesn't work unfortunately.
Any ideas on how to parse it and return each node key with its corresponding node value?

Well, firstly, because you didn't put () around your *?, you aren't capturing the tag name. Secondly, [*?] is a character class that matches a single literal * or ?, not a bracketed tag, so your pattern never matches this text. You need to escape the brackets and capture what's between them: \[(.*?)\] and \[\/(.*?)\]
You would have to try something along the lines of:
/\[(.*?)\](.*?)\[\/(.*?)\]/is
This is not guaranteed to work, but it will get you closer.
You could also use:
/\[(.*?)\](.*?)\[\/\1\]/is
and then, for each result, loop recursively over the captured value until preg_match_all returns 0 (no matches); that's one possible way to handle nesting.
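A minimal sketch of that recursive idea, assuming PHP 7+ and a hypothetical helper named parseTags(): each captured value is fed back into preg_match_all until nothing more matches. Because the inner (.*?) is lazy, it stops at the first matching close tag, so a tag nested inside itself (e.g. [a][a]x[/a][/a]) is not handled correctly.

```php
<?php
// Sketch: recursively parse nested BBCode-style tags.
// parseTags() is a hypothetical helper, not a PHP built-in.
function parseTags(string $text): array
{
    $nodes = [];
    // \1 forces the closing tag to repeat the opening tag's name.
    if (preg_match_all('/\[(.*?)\](.*?)\[\/\1\]/is', $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $nodes[] = [
                'key'      => $m[1],
                'value'    => $m[2],
                // Recurse into the captured value; this returns []
                // once preg_match_all finds no more tags.
                'children' => parseTags($m[2]),
            ];
        }
    }
    return $nodes;
}

$tree = parseTags('[a]outer [b]inner[/b] text[/a] and [z]me[/z]');
print_r($tree);
```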

In order to match the same tags, you need a backreference:
This assumes no nesting; if you need nesting, let me know.
$matches = array();
if (preg_match_all('#\[([^\]]+)\](.+?)\[/\1\]#', $text, $matches)) {
// $matches[0] - entire matched section
// $matches[1] - keys
// $matches[2] - values
}
Incidentally, I don't know what you are going to do with this BBCode-style text, but usually you would use preg_replace_callback() to handle inline modification of this sort of text, with a regex similar to the one above.
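A rough sketch of that approach, reusing the backreference pattern above. The tag-to-HTML mapping in $allowed is an illustrative assumption and not part of the original answer; tags it doesn't know are returned unchanged.

```php
<?php
// Sketch: inline BBCode-to-HTML conversion with preg_replace_callback(),
// using the backreference pattern from the answer above.
$text = "Blah blah [b]bold[/b] and [i]italic[/i], but [x]unknown[/x].";

// Hypothetical mapping from BBCode tag names to HTML elements.
$allowed = ['b' => 'strong', 'i' => 'em'];

$html = preg_replace_callback(
    '#\[([^\]]+)\](.+?)\[/\1\]#',
    function (array $m) use ($allowed): string {
        // $m[1] is the tag name, $m[2] the text between the tags.
        if (!isset($allowed[$m[1]])) {
            return $m[0]; // leave unrecognised tags untouched
        }
        $tag = $allowed[$m[1]];
        return "<$tag>{$m[2]}</$tag>";
    },
    $text
);

echo $html, "\n";
// Blah blah <strong>bold</strong> and <em>italic</em>, but [x]unknown[/x].
```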

Try:
$pattern = "/\[a\](.*?)\[\/a\]/";
$text = "Blah blah [a]findme[/a] and [b]findmetoo[b], maybe also [z]me[/z].";
preg_match_all($pattern, $text, $matches);
That should point you in the right direction.

I came up with this regex: ((\[[^\/]\]).+?(\[\/[^\/]\])). Note that it only handles single-character tag names such as [a]. Hope it works for you.

I'm no regex monkey, but I think you need to escape those brackets and create groups to search for, as brackets don't return results (parentheses do):
preg_match_all("/\\[(.*?)\\](.*?)\\[\\/(.*?)\\]/", $text, $matches);
Hope this works!

Should your second example also be captured, even though the [b] "tag" is not closed with [/b] (a forward slash before the b)? If tags must be properly closed, then use
/\[(.*?)\](.*?)\[\/\1\]/
This will ensure that opening and closing tags match.
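A quick check of this pattern against the text from the question. The array_combine() pairing is just one way to present the result, and it would collapse repeated keys.

```php
<?php
$text = "Blah blah [a]findme[/a] and [b]findmetoo[b], maybe also [z]me[/z].";

// \1 must repeat the opening tag name, so the unclosed
// [b]findmetoo[b] pair is skipped.
preg_match_all('/\[(.*?)\](.*?)\[\/\1\]/', $text, $matches);

// Pair each node key with its value.
$nodes = array_combine($matches[1], $matches[2]);
print_r($nodes); // [a] => findme, [z] => me
```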

You can try this:
preg_match_all("/\[(.*?)\](.*?)\[\/?.*?\]/", $text, $matches);
Changes made:
- [ and ] are regex metacharacters used to define a character class. To match a literal [ or ], you need to escape it.
- To match any arbitrary text (excluding newlines) in a non-greedy way, you use .*?.
- To capture the node key, you need to enclose the pattern that matches it in (..).
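For comparison, this is what that lenient pattern returns for the question's text. Because the closing part is only \[\/?.*?\], it doesn't have to match the opening tag, so the unclosed [b]findmetoo[b] is picked up as well.

```php
<?php
$text = "Blah blah [a]findme[/a] and [b]findmetoo[b], maybe also [z]me[/z].";

// The closing part \[\/?.*?\] accepts any bracketed text, with or
// without a slash, so it does not have to repeat the opening tag.
preg_match_all("/\[(.*?)\](.*?)\[\/?.*?\]/", $text, $matches);

print_r($matches[1]); // a, b, z
print_r($matches[2]); // findme, findmetoo, me
```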

Related

Matching substrings with PHP preg_match_all()

I'm attempting to create a lightweight BBCode parser without hardcoding regex matches for each element. My way is utilizing preg_replace_callback() to process the match in the function.
My simple yet frustrating approach uses a regex to capture each element's name and then handles each element differently with a switch.
Here is my regex pattern:
'~\[([a-z]+)(?:=(.*))?(?: (.*))?\](.*)(?:\[/\1\])~siU'
And here is the preg_replace_callback() I've got to test.
return preg_replace_callback(
'~\[([a-z]+)(?:=(.*))?(?: (.*))?\](.*)(?:\[/\1\])~siU',
function($matches) {
var_dump($matches);
return "<".$matches[1].">".$matches[4]."</".$matches[1].">";
},
$this->raw
);
This one issue has stumped me. The regex pattern doesn't seem to match recursively: if it matches an element, it won't match the elements inside it.
Take this BBCode for instance:
[i]This is all italics along with a [b]bold[/b].[/i]
This will only match the [i] and won't match any of the elements inside it, so the result looks like
This is all italics along with a [b]bold[/b].
preg_match_all() continues to show this to be the case, and I've tried messing with greedy syntax and modes.
How can I solve this?
Thanks to Casimir et Hippolyte's comment, I was able to solve this using a while loop and the count parameter, as they suggested.
The basic regex strings don't work because I would like to use values in the tags like [color=red] or [img width=""].
Here is the finalized code. It isn't perfect but it works.
$str = $this->raw;
do {
$str = preg_replace_callback(
'~\[([a-z]+)(?:=([^]\s]*))?(?: ([^[]*))?\](.*?)(?:\[/\1\])~si',
function($matches) {
return "<".$matches[1].">".$matches[4]."</".$matches[1].">";
},
$str,
-1,
$count
);
} while ($count);
return $str;
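A self-contained sketch of the same loop, assuming the input is passed in as a string rather than read from $this->raw, and filling in the per-tag switch mentioned in the question. The function name renderBbcode() and the tag-to-HTML mapping are hypothetical.

```php
<?php
// Sketch: the do/while approach from the answer as a standalone function.
// renderBbcode() and the tag-to-HTML mapping are hypothetical.
function renderBbcode(string $str): string
{
    $pattern = '~\[([a-z]+)(?:=([^]\s]*))?(?: ([^[]*))?\](.*?)(?:\[/\1\])~si';
    do {
        // Each pass replaces every non-overlapping match; tags nested
        // inside a replaced match are picked up on the next pass.
        $str = preg_replace_callback($pattern, function (array $m): string {
            // $m[1] = tag name, $m[2] = "=value" part, $m[4] = inner text.
            [, $name, $value, , $inner] = $m;
            switch (strtolower($name)) {
                case 'b':
                case 'i':
                case 'u':
                    return "<$name>$inner</$name>";
                case 'color':
                    return '<span style="color:' . htmlspecialchars($value) . '">' . $inner . '</span>';
                default:
                    return $inner; // drop unknown tags but keep their text
            }
        }, $str, -1, $count);
    } while ($count > 0); // stop once a pass makes no replacements
    return $str;
}

echo renderBbcode('[i]Italic with [color=red]red[/color] and [b]bold[/b].[/i]'), "\n";
// <i>Italic with <span style="color:red">red</span> and <b>bold</b>.</i>
```

On the example from the question, '[i]This is all italics along with a [b]bold[/b].[/i]' takes two passes: the first converts [i], the second converts the [b] left inside it.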

preg_replace with Regex - find number-sequence in URL

I'm a regex-noobie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I want to achieve is getting the number sequence (aka the job ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some query parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither hyphens nor dots and that is followed by .aspx, using the lookahead (?=\.aspx).
You can just use preg_match (you don't need preg_replace, since you don't want to change the original string) and capture the number before the .aspx, which is always at the end. The simplest way I could think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of matches. Its first element holds the text matched by the complete regex, which would be 146370543.aspx in this example. The second element holds the group captured by the parentheses around [0-9]+.
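A small sketch comparing the two answers above, assuming a hypothetical variant of the URL with "?mobile=true" appended. The $-anchored pattern only matches when .aspx ends the string, while the lookahead version doesn't care what follows. The helper names are made up for illustration.

```php
<?php
// Sketch comparing the two answers. jobIdLookahead() and jobIdAnchored()
// are hypothetical helper names.
function jobIdLookahead(string $url): ?string
{
    // Characters that are neither "-" nor ".", followed by ".aspx".
    return preg_match('/[^-.]+(?=\.aspx)/i', $url, $m) ? $m[0] : null;
}

function jobIdAnchored(string $url): ?string
{
    // Digits followed by ".aspx" at the very end of the string.
    return preg_match('/([0-9]+)\.aspx$/', $url, $m) ? $m[1] : null;
}

$base = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx';
$withQuery = $base . '?mobile=true'; // assumed example of trailing parameters

var_dump(jobIdLookahead($base), jobIdAnchored($base));           // both "146370543"
var_dump(jobIdLookahead($withQuery), jobIdAnchored($withQuery)); // "146370543", NULL
```

Dropping the $ anchor, i.e. /([0-9]+)\.aspx/, would also make the second answer work when parameters follow.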
You can get the opposite by using this regex:
(\D*)\d+(.*)
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
If you just want the number from that URL, you can use this regex (it matches the first run of digits, so it only works when the job ID is the first number in the URL):
(\d+)

I need to find a way explode a specific string that has quotes in it

I'm having serious trouble with this and I'm not really experienced enough to understand how I should go about it.
To start off, I have a very long string known as $VC. It's slightly different each time, but some parts are always the same.
$VC is an htmlspecialchars() string that looks something like
Example Link... Lots of other stuff in between here... 80] ,[] ,"","3245697351286309258",[] ,["812750926... and it goes on ...80] ,[] ,"","6057413202557366578",[] ,["103279554... and it continues on
In this case, the <a> tag is always the same, so I take my information from there. The numbers listed after it, such as ,"3245697351286309258",[] and ,"6057413202557366578",[], will also always be in the same format, just with different numbers, and one of those numbers will always be a specific ID.
I then find the specific ID I want; it is always the number between pid%3D and %26oid.
$pid = explode("pid%3D", $VC, 2);
$pid = explode("%26oid", $pid[1], 2);
$pid = $pid[0];
In this case that number is 6057413202557366578. Next I want to explode $VC in a way that lets me put everything after ,"6057413202557366578",[] into a variable as its own string.
This is where things start to break down. What I want to do is the following
$vinfo = explode(',"'.$pid.'",[]',$VC,2);
$vinfo = $vinfo[1]; //Everything after the value I used to explode it.
Now naturally I did look around and try other things such as preg_split and preg_replace but I've got to admit, it is beyond me and as far as I can tell, those don't let you put your own variable in the middle of them (e.g. ',"'.$pid.'",[]').
If I'm understanding the whole regular expression idea, there might be other problems in that if I look for it without the $pid variable (e.g. just the surrounding characters), it will pick up the similar parts of the string before it gets to the one I want, (e.g. the ,"3245697351286309258",[]).
I hope I've explained this well enough, the main question though is - How can I get the information after that specific part of the string (',"'.$pid.'",[]') into a variable?
I hope this does what you want:
pid%3D(?P<id>\d+).*?"(?P=id)",\[\](?P<vinfo>.*?)}\);<\/script>
It captures the number after pid%3D in group id, and everything after "id",[] (up to the next occurrence of });</script>) in group vinfo.
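A minimal sketch of using that pattern from PHP, assuming a shortened, hypothetical stand-in for $VC (the real string is far longer). The /s modifier is an addition here so that . can also cross newlines in a real page. The named groups are then available by name in $m.

```php
<?php
// Sketch: a shortened, hypothetical stand-in for $VC.
$VC = 'link?pid%3D6057413202557366578%26oid=1 80] ,[] ,"","3245697351286309258",[] ,["812750926 80] ,[] ,"","6057413202557366578",[] ,["103279554 tail});</script>';

// Same pattern as above, with /s added so . also matches newlines.
$pattern = '/pid%3D(?P<id>\d+).*?"(?P=id)",\[\](?P<vinfo>.*?)}\);<\/script>/s';

if (preg_match($pattern, $VC, $m)) {
    echo $m['id'], "\n";    // 6057413202557366578
    echo $m['vinfo'], "\n"; // everything between "id",[] and });</script>
}
```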
The problem of capturing more than you want is solved with capture groups: you wrap part of a regular expression in parentheses to capture it.
You can use preg_match_all to do more robust regular expression capture. You get back an array containing the strings that matched the entire pattern, plus the partial matches for each capture group you use. We'll start by matching the parts of the string you want; there are no capture groups at this point:
$text = 'Example Link... Lots of other stuff in between here... 80] ,[] ,"","3245697351286309258",[] ,["812750926... and it goes on ...80] ,[] ,"","6057413202557366578",[] ,["103279554... and it continues on"';
$pattern = '/,"\\d+",\\[\\]/';
preg_match_all($pattern,
$text,
$out, PREG_PATTERN_ORDER);
echo $out[0][0]; //echo ,"3245697351286309258",[]
Now, to get just the pids into a variable, you can add a capture group to your pattern. A capture group is created by adding parentheses:
$text = ...
$pattern = '/,"(\\d+)",\\[\\]/'; // the \d+ match will be captured
preg_match_all($pattern,
$text,
$out, PREG_PATTERN_ORDER);
$pids = $out[1];
echo $pids[0]; // echo 3245697351286309258
Notice the first (and only in this case) capture group is in $out[1] (which is an array). What we have captured is all the digits.
To capture everything else, assuming it sits between square brackets, you can match more and capture it. To address the question, we'll use two capture groups: the first captures the digits, and the second captures the next bracketed chunk, from [ up to and including the first ] after it:
$text = ...;
$pattern = '/,"(\\d+)",\\[\\] ,(\\[.+?\\])/';
preg_match_all($pattern,
$text,
$out, PREG_PATTERN_ORDER);
$pids = $out[1];
$contents = $out[2];
echo $pids[0] . "=" . $contents[0] ."\n";
echo $pids[1] . "=". $contents[1];
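The question also asks whether a variable can go inside a pattern. It can. Here is a sketch, assuming a shortened, hypothetical stand-in for $VC, where preg_quote() escapes the [ ] in the needle. Separately, since the question says $VC is an htmlspecialchars() string, its double quotes may actually be &quot;. In that case the plain explode() needle never matches, and encoding the needle the same way fixes that.

```php
<?php
// Sketch with a shortened, hypothetical stand-in for $VC.
$VC = 'link?pid%3D6057413202557366578%26oid=1 80] ,[] ,"","3245697351286309258",[] ,["812750926 80] ,[] ,"","6057413202557366578",[] ,["103279554 rest of the string';

// Same extraction as the explode() calls in the question.
$pid = explode('%26oid', explode('pid%3D', $VC, 2)[1], 2)[0];
$needle = ',"' . $pid . '",[]';

// 1) A variable can be spliced into a pattern; preg_quote() escapes the
//    [ and ] (and the "/" delimiter) so they are matched literally.
$vinfo = preg_match('/' . preg_quote($needle, '/') . '(.*)/s', $VC, $m) ? $m[1] : null;

// 2) explode() also works, as long as the needle is encoded the same way
//    as the haystack. htmlspecialchars() turns each " into &quot;.
$encodedVC = htmlspecialchars($VC);
$vinfoEncoded = explode(htmlspecialchars($needle), $encodedVC, 2)[1];

echo $vinfo, "\n";
echo $vinfoEncoded, "\n";
```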

PHP regex for #[mention]

Can someone help me:
$pattern = "/^(?:[a-zA-Z0-9?. ]?)+#([a-zA-Z0-9]+)(.+)?$/";
$str = "Hey #[14256] hey how are you?";
preg_match($pattern, $str, $matches);
print_r($matches);
The print result works fine if I remove the brackets (#[14256]) from the # mention, but I can't figure out how to make the regex work with the brackets, so that I get the result 14256 in my array.
You need to include the brackets in your regex:
"/^(?:[a-zA-Z0-9?. ]?)+#(\\[?[a-zA-Z0-9]+\\]?)(.+)?$/"
Notice the \\[? and \\]? I've added; those will match the [] characters, and will also match if there is no [].
Keep in mind, the above will match #[14256 and #14256]. If you want to only match one or the other, you need to do it a little differently.
"/^(?:[a-zA-Z0-9?. ]?)+#([a-zA-Z0-9]+|\\[[a-zA-Z0-9]+\\])(.+)?$/"
This will match EITHER #aA1 or #[aA1], but not the bad examples as I showed above.
One last thing to include: This regex will only match one instance of the #[mention]. If you want to match ALL instances of it (such as in "hey #123, how is #456 these days?"), use the following with preg_match_all():
"/#([a-zA-Z0-9]+|\\[[a-zA-Z0-9]+\\])/"
Then $matches[1] will contain both 123 and 456.
You need to escape the brackets in your regex so they aren't interpreted as a new character class. Try this instead (it captures only the number, not the brackets; put the escaped brackets inside the parentheses if you want to capture them as part of the group):
$pattern = "/^(?:[a-zA-Z0-9?. ]?)+#\[([a-zA-Z0-9]+)\](.+)?$/";
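A short sketch of both cases, assuming an extra example string with several mentions. The escaped brackets sit outside the capture group, so only the ID ends up in the result.

```php
<?php
// Single mention, using the escaped-bracket pattern from this answer.
$str = "Hey #[14256] hey how are you?";
preg_match("/^(?:[a-zA-Z0-9?. ]?)+#\[([a-zA-Z0-9]+)\](.+)?$/", $str, $matches);
echo $matches[1], "\n"; // 14256

// Several mentions (hypothetical example string): preg_match_all() with
// the brackets escaped outside the capture group returns only the IDs.
$many = "Hey #[14256], have you met #[987] and #[42]?";
preg_match_all('/#\[([a-zA-Z0-9]+)\]/', $many, $all);
print_r($all[1]); // 14256, 987, 42
```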

Extract content between first "]" and last "[" using regex?

Is it possible to have a PHP regex expression that extracts the content from the first ] to the last [?
For example if I had the following string:
$string = '[shortcode]You write a shortcode by using brackets ([])[/shortcode]';
I would want to extract:
You write a shortcode by using brackets ([])
and store it in a variable. The content to be extracted could be anything. Thanks in advance.
You should be using capturing groups to make sure you match the closing tag.
\[(\w+)\](.*?)\[/\1\]
This matches a word inside [] and keeps going until it finds the same word inside [/...]; the text in between is captured in the second group.
Regexes are greedy by default, so this will do the job just fine:
/\](.*)\[/
To get this working in PHP properly, you would do something like this:
preg_match('/\](.*)\[/', $text, $matches);
$result = $matches[1];
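To see the greedy behaviour described above, here is a small comparison with the lazy variant (.*?). It uses the question's example string, including the word "brackets" that appears in the expected output.

```php
<?php
$string = '[shortcode]You write a shortcode by using brackets ([])[/shortcode]';

// Greedy: \] anchors on the first "]", then (.*) runs to the end of the
// string and backtracks to the LAST "[".
preg_match('/\](.*)\[/', $string, $greedy);

// Lazy: (.*?) stops at the FIRST "[" after that "]".
preg_match('/\](.*?)\[/', $string, $lazy);

echo $greedy[1], "\n"; // You write a shortcode by using brackets ([])
echo $lazy[1], "\n";   // You write a shortcode by using brackets (
```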
This could do what you need:
[^\]]\](.*)\[[^\[]
This works:
preg_match( '#\](.*)\[#', $string, $matches);
print_r($matches);
