How to handle a miss with regex / PHP / preg_match_all

I'm using the code at the bottom to grab parameters from a WordPress shortcode. The shortcode itself looks like this:
[FLOWPLAYER=http://www.tvovermind.com/wp-content/uploads/2013/01/pll-316-21.jpg|http://www.tvovermind.com/wp-content/uploads/2013/01/PLL316_fv2.h264HD-Clip2.flv,440,280]
Or
[FLOWPLAYER=http://www.tvovermind.com/wp-content/uploads/2013/01/pll-316-21.jpg|http://www.tvovermind.com/wp-content/uploads/2013/01/PLL316_fv2.h264HD-Clip2.flv,440,280,false]
What I would like is for the extra parameter (true/false) to become "false" when it is missing. However, with the current code no match is made at all when the parameter is missing. Any ideas?
function legacy_hook($content){
    $regex = '/\[FLOWPLAYER=([a-z0-9\:\.\-\&\_\/\|]+)\,([0-9]+)\,([0-9]+)\,([a-z0-9\:\.\-\&\_\/\|]+)\]/i';
    $matches = array();
    preg_match_all($regex, $content, $matches);
    if($matches[0][0] != '') {
        foreach($matches[0] as $key => $data) {
            $content = str_replace($matches[0][$key], flowplayer::build_player($matches[2][$key], $matches[3][$key], $matches[1][$key],$matches[4][$key]),$content);
        }
    }
    return $content;
}

Your regex requires the last comma to be present, followed by one or more of the characters in the last character class. Something like
/\[FLOWPLAYER=([a-z0-9\:\.\-\&\_\/\|]+)\,([0-9]+)\,([0-9]+)(\,[a-z]+)?\]/i
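The only issue is that you'll get the comma in the match too; writing the last part as (?:,([a-z]+))? keeps the comma out of the capture.
That might be what you're after; you then have to test, for each tag, whether the last group matched. preg_match_all() returns the number of full matches, which doesn't tell you that, and count($matches) is always 5 here: in the default PREG_PATTERN_ORDER mode every group gets an array, with an empty string wherever the optional group didn't participate. So test the entry itself with an inline if:
($matches[4][$key] !== '' ? $matches[4][$key] : false)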
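A minimal sketch of the optional-group idea, assuming PHP 7 or later. It wraps the comma in a non-capturing group, (?:,([a-z]+))?, so the fourth capture holds only the value, and it falls back to the string 'false' when that group did not take part in a match. The function name parse_flowplayer_tags() and the array keys are hypothetical labels; real code would pass these values on to flowplayer::build_player() as in the question.

```php
<?php
// Sketch: the fourth parameter is optional. (?:,([a-z]+))? keeps the comma
// outside the capture, so group 4 holds just the value when it is present.
function parse_flowplayer_tags(string $content): array
{
    $regex = '/\[FLOWPLAYER=([a-z0-9:.\-&_\/|]+),([0-9]+),([0-9]+)(?:,([a-z]+))?\]/i';
    $tags = array();
    if (preg_match_all($regex, $content, $matches) > 0) {
        foreach ($matches[0] as $key => $fullTag) {
            $tags[] = array(
                'tag'    => $fullTag,
                'urls'   => $matches[1][$key],
                'width'  => $matches[2][$key],  // hypothetical label for the first number
                'height' => $matches[3][$key],  // hypothetical label for the second number
                // In PREG_PATTERN_ORDER an unmatched optional group is stored as ''.
                'flag'   => $matches[4][$key] !== '' ? $matches[4][$key] : 'false',
            );
        }
    }
    return $tags;
}
```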

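You can add an alternation at the end of your expression:
(,true|,false|)
The empty last alternative lets the group match nothing when the parameter is missing. A $ in that position would not work, because the closing \] still has to follow it. I haven't tested this, but you get the idea.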
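A variant of the alternation idea, sketched with preg_replace_callback() so each shortcode is replaced in place instead of through a separate str_replace() loop. It assumes PHP 7 or later. build_player_stub() is a hypothetical stand-in for flowplayer::build_player(), whose output the question does not show, and its parameter names are guesses. With preg_replace_callback(), a trailing group that did not participate is simply absent from $m, hence the ?? default.

```php
<?php
// Hypothetical stand-in for flowplayer::build_player(); the real markup is not shown in the question.
function build_player_stub(string $width, string $height, string $urls, string $flag): string
{
    return sprintf('<div class="player" data-w="%s" data-h="%s" data-src="%s" data-flag="%s"></div>',
        $width, $height, $urls, $flag);
}

function legacy_hook_sketch(string $content): string
{
    // Groups: 1 = image|video URLs, 2 and 3 = the two numbers, 4 = optional true/false.
    $regex = '/\[FLOWPLAYER=([a-z0-9:.\-&_\/|]+),([0-9]+),([0-9]+)(?:,(true|false))?\]/i';
    return preg_replace_callback($regex, function (array $m): string {
        // A trailing group that did not participate is absent from $m, so default it.
        $flag = $m[4] ?? 'false';
        // Same argument order as the question's build_player() call.
        return build_player_stub($m[2], $m[3], $m[1], $flag);
    }, $content);
}
```

This also removes the need for the if($matches[0][0] != '') guard, which raises a notice when the content contains no shortcode at all.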

Related

Check number ID with Preg_match

I have a little problem.
I want to check the post number in a URL like this:
http://xxx.xxxxxx.net/episodio/168
This is part of my code; I only need the number check:
[...]
if(preg_match('#^http://horadeaventura.enlatino.net/episodio/[0-9]',trim($url))){
[...]
Can you help me?
Thanks!

If you want to do it with preg_match:
$url = 'http://horadeaventura.enlatino.net/episodio/168';
if(preg_match('#^http://horadeaventura.enlatino.net/episodio/([0-9]+)#',trim($url), $matches)){
$post = $matches[1];
echo $post;
}
So, basically: I added an end delimiter (#), changed "[0-9]" to "([0-9]+)", and added ", $matches" to capture the matches. Of course it can be done better, and with other options than preg_match, but I wanted to make your snippet work, not rewrite it.

If you don't have your heart set on using preg_match(), you could do
$string = "http://xxx.xxxxxx.net/episodio/168";
$array = explode("/", $string);
echo end($array);
which will output
168
This assumes the number you are looking for is always the last section of the URL string.

Or you can just check for a number at the end of the string:
if(preg_match('#[0-9]+$#',trim($url),$match)){
print_r($match);
}
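A small sketch combining the answers above, assuming the URL format from the question: it anchors the pattern at both ends, escapes the dots in the hostname (unescaped, each . matches any character), tolerates a trailing slash, and returns the number as an integer, or null when the URL does not match. The function name episode_id() is hypothetical.

```php
<?php
// Sketch: return the episode number only when the whole URL has the expected shape.
function episode_id(string $url): ?int
{
    // Escaped dots match literal dots; /?$ allows an optional trailing slash.
    $pattern = '#^http://horadeaventura\.enlatino\.net/episodio/([0-9]+)/?$#';
    if (preg_match($pattern, trim($url), $m) === 1) {
        return (int) $m[1];
    }
    return null;
}
```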

preg_match acting very strange

I am using preg_match() to extract pieces of text from a variable, and let's say the variable looks like this:
[htmlcode]This is supposed to be displayed[/htmlcode]
middle text
[htmlcode]This is also supposed to be displayed[/htmlcode]
I want to extract the contents of the [htmlcode] blocks and put them into an array. I am doing this with preg_match():
preg_match('/\[htmlcode\]([^\"]*)\[\/htmlcode\]/ms', $text, $matches);
foreach($matches as $value){
return $value . "<br />";
}
The above code outputs
[htmlcode]This is supposed to be displayed[/htmlcode]middle text[htmlcode]This is also supposed to be displayed[/htmlcode]
instead of
[htmlcode]This is supposed to be displayed[/htmlcode]
[htmlcode]This is also supposed to be displayed[/htmlcode]
and I have officially run out of ideas.

As already explained, the * quantifier is greedy. The other thing to change is to use the preg_match_all() function, which returns a multidimensional array of all the matches.
preg_match_all('#\[htmlcode\]([^\"]*?)\[/htmlcode\]#ms', $text, $matches);
foreach( $matches[1] as $value ) {
    echo $value . "<br />";
}
And you'll get this: http://codepad.viper-7.com/z2GuSd

The * quantifier is greedy, i.e. it consumes everything up to the last [/htmlcode]. Try replacing * with the non-greedy *?.

* is by default greedy, ([^\"]*?) (notice the added ?) should make it lazy.
What do lazy and greedy mean in the context of regular expressions?
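To see the difference directly, this sketch runs the same pattern greedily and lazily over the sample text from the question. It uses (.*?) with the s modifier in place of the question's ([^\"]*?), a substitution rather than the original pattern, so that a block containing a double quote would still match.

```php
<?php
$text = "[htmlcode]This is supposed to be displayed[/htmlcode]\nmiddle text\n[htmlcode]This is also supposed to be displayed[/htmlcode]";

// Greedy: .* runs on to the LAST [/htmlcode], so one match spans both blocks.
preg_match_all('#\[htmlcode\](.*)\[/htmlcode\]#s', $text, $greedy);

// Lazy: .*? stops at the FIRST [/htmlcode] after each opening tag.
preg_match_all('#\[htmlcode\](.*?)\[/htmlcode\]#s', $text, $lazy);

print_r($greedy[1]); // one element that still contains "middle text"
print_r($lazy[1]);   // two elements, one per block
```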

Look at this piece of code:
preg_match('/\[htmlcode\]([^\"]*)\[\/htmlcode\]/ms', $text, $matches);
foreach($matches as $value){
return $value . "<br />";
}
Now, even once your pattern works, you should know that:
the return statement exits the function immediately, which also ends the loop;
the first element in $matches is the whole match, not the captured group. In your case, because * is greedy, that is the whole of $text.
So what your code did was return that first big string and exit the function.
I suggest checking $matches[1] for the captured text instead. preg_match() stops after the first block, so to get both blocks use preg_match_all() and read
$matches[1][0] and $matches[1][1]
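A sketch that avoids both problems: it collects the inner text of every block with preg_match_all() and returns only once, after the loop. The function name render_htmlcode_blocks() is hypothetical, and the lazy (.*?) group replaces the question's ([^\"]*).

```php
<?php
// Sketch: build the output from all blocks first, then return it once.
function render_htmlcode_blocks(string $text): string
{
    preg_match_all('#\[htmlcode\](.*?)\[/htmlcode\]#s', $text, $matches);
    $output = '';
    foreach ($matches[1] as $value) {   // $matches[1] holds group 1 of every match
        $output .= $value . "<br />";
    }
    return $output;                     // returning here, not inside the loop
}
```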

Determine User Input Contains URL

I have an input form field which collects mixed strings.
I want to determine whether a posted string contains a URL (e.g. http://link.com, link.com, www.link.com, etc.) so that it can be turned into a link as needed.
An example would be microblogging functionality, where the processing script turns anything that looks like a link into an anchor. Another example is this very post, where 'http://link.com' was linked automatically.
I believe I should approach this on display and not on input. How could I go about it?

You can use regular expressions to call a function on every match in PHP. For example, you could use something like this, where preg_replace_callback() calls makeLink() for each match:
<?php
function makeLink($match) {
    // Prepend a scheme if the match doesn't already start with one.
    $substr = substr($match, 0, 6);
    if ($substr != 'http:/' && $substr != 'https:' && $substr != 'ftp://' && $substr != 'news:/' && $substr != 'file:/') {
        $url = 'http://' . $match;
    } else {
        $url = $match;
    }
    return '<a href="' . $url . '">' . $match . '</a>';
}
function makeHyperlinks($text) {
    // Find links and call the makeLink() function on each of them.
    return preg_replace_callback('/((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:#=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])/', function ($m) {
        return makeLink($m[1]);
    }, $text);
}
?>

You will want to use a regular expression to match common URL patterns. PHP offers a function called preg_match that allows you to do this.
The regular expression itself could take several forms, but here is something to get you started (or just Google 'URL regex'):
^(((http|https|ftp)://)?([a-zA-Z0-9\-.])+(\.)([a-zA-Z0-9]){2,4}([a-zA-Z0-9/+=%&_.~?\-]*))*$
So your code should look something like this:
$matches = array(); // will hold the results of the regular expression match
$string = "http://www.astringwithaurl.com";
$regexUrl = '/^(((http|https|ftp):\/\/)?([a-zA-Z0-9\-\.])+(\.)([a-zA-Z0-9]){2,4}([a-zA-Z0-9\/+=%&_\.~?\-]*))*$/';
preg_match($regexUrl, $string, $matches);
print_r($matches); // an array of matched patterns
From here, you just want to wrap those URL patterns in an anchor/href tag and you're done.
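A sketch of that wrapping step for free text, done at display time as the question suggests. It assumes that only strings starting with http://, https:// or www. count as links (a bare link.com is deliberately not matched), and it escapes everything with htmlspecialchars() before output. preg_split() with PREG_SPLIT_DELIM_CAPTURE puts the captured URLs at the odd indices of the result. The name linkify() is hypothetical.

```php
<?php
// Sketch: split the text around URL-like substrings, escape every piece,
// and wrap only the captured URLs (odd indices) in anchors.
function linkify(string $text): string
{
    $parts = preg_split('~((?:https?://|www\.)[^\s<>"]+)~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $html = '';
    foreach ($parts as $i => $part) {
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        if ($i % 2 === 1) {
            // Bare "www." links get a scheme so the browser does not treat them as relative paths.
            $href = preg_match('~^https?://~i', $part) ? $part : 'http://' . $part;
            $html .= '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">' . $safe . '</a>';
        } else {
            $html .= $safe;
        }
    }
    return $html;
}
```

Punctuation directly after a URL (a full stop ending a sentence, say) ends up inside the link; trimming it is left out to keep the sketch short.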

Just how accurate do you want to be? Given just how varied URLs can be, you're going to have to draw the line somewhere. For instance, www.ca is a perfectly valid hostname and does bring up a site, but it's not something you'd EXPECT to work.

You should investigate regular expressions for this.
You will build a pattern that will match the part of your string that looks like a URL and format it appropriately.
It will come out something like this (I lifted this and haven't tested it):
$pattern = "((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:##%/;$()~_?\+-=\\\.&]*)";
preg_match_all($pattern, $input_string, $url_matches, PREG_OFFSET_CAPTURE);
$url_matches[0] will then contain every part of the input string that matched the URL pattern, each paired with its byte offset. (preg_match() would stop after the first match.)

You can use $_SERVER['HTTP_HOST'] to get the host information.
<?php
$host = $_SERVER['HTTP_HOST'];
?>

Get more backreferences from regexp than parenthesis

OK, this is really difficult to explain in English, so I'll just give an example.
I am going to have strings in the following format:
key-value;key1-value;key2-...
and I need to extract the data to be an array
array('key'=>'value','key1'=>'value1', ... )
I was planning to use regexp to achieve (most of) this functionality, and wrote this regular expression:
/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/
to work with preg_match and this code:
for ($l = count($matches),$i = 1;$i<$l;$i+=2) {
$parameters[$matches[$i]] = $matches[$i+1];
}
However, the regex returns only 4 backreferences: the first and last key-value pairs of the input string. Is there a way around this? I know I can use a regex just to test the correctness of the string and then use PHP's explode() in loops with perfect results, but I'm really curious whether it's possible with regular expressions alone.
In short, I need to capture an arbitrary number of these key-value; pairs in a string by means of regular expressions.

You can use a lookahead to validate the input while you extract the matches:
/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/
(?=(?:\w++-[^;-]++;?)++$) is the validation part. If the input is invalid, matching will fail immediately, but the lookahead still gets evaluated every time the regex is applied. In order to keep it (along with the rest of the regex) in sync with the key-value pairs, I used \G to anchor each match to the spot where the previous match ended.
This way, if the lookahead succeeds the first time, it's guaranteed to succeed every subsequent time. Obviously it's not as efficient as it could be, but that probably won't be a problem; only your testing can tell for sure.
If the lookahead fails, preg_match_all() will return zero. If it succeeds, the matches will be returned in an array of arrays: one for the full key-value pairs, one for the keys, and one for the values.
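A sketch of the \G-plus-lookahead technique wrapped in a function, using the regex exactly as given in this answer. parse_pairs() is a hypothetical name; it returns null when the validation lookahead fails, and array_combine() pairs the keys (group 1) with the values (group 2).

```php
<?php
// Sketch: \G ties each match to the end of the previous one, and the lookahead
// checks that the rest of the input is a well-formed list of key-value pairs.
function parse_pairs(string $input): ?array
{
    $regex = '/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/';
    if (!preg_match_all($regex, $input, $m)) {
        return null; // no match (validation failed) or a regex error
    }
    return array_combine($m[1], $m[2]);
}
```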

Regex is a powerful tool, but sometimes it's not the best approach.
$string = "key-value;key1-value";
$array = array();
$s = explode(";",$string);
foreach($s as $k){
    $e = explode("-",$k);
    $array[$e[0]]=$e[1];
}
print_r($array);

Use preg_match_all() instead. Maybe something like:
$matches = $parameters = array();
$input = 'key-value;key1-value1;key2-value2;key123-value123;';
preg_match_all("/(\w+)-([^-;]+)/", $input, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$parameters[$match[1]] = $match[2];
}
print_r($parameters);
EDIT:
To first validate that the input string conforms to the pattern, use:
if (preg_match("/^((\w+)-([^-;]+);)+$/", $input) > 0) {
/* do the preg_match_all stuff */
}
EDIT 2: if the final semicolon should be optional:
if (preg_match("/^(\w+-[^-;]+;)*\w+-[^-;]+;?$/", $input) > 0) {
/* do the preg_match_all stuff */
}
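Putting the two steps together, here is a sketch that validates first and extracts second. The function name parse_parameters() is hypothetical, and the validation regex accepts an optional trailing semicolon (;? before $); both are assumptions rather than part of the answer.

```php
<?php
// Sketch: reject malformed input up front, then collect every key-value pair.
function parse_parameters(string $input): array
{
    $parameters = array();
    if (preg_match('/^(\w+-[^-;]+;)*\w+-[^-;]+;?$/', $input) !== 1) {
        return $parameters; // invalid input yields an empty array
    }
    preg_match_all('/(\w+)-([^-;]+)/', $input, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $parameters[$match[1]] = $match[2];
    }
    return $parameters;
}
```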

No. Each repetition of a capturing group overwrites the previous capture, so only the last pair is kept. Perhaps the limit argument of explode() would be helpful when exploding.
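A sketch of the explode() route using the limit argument. With a limit of 2, each pair is split only at its first -, so a value that contains a hyphen survives intact; rtrim() drops a trailing semicolon, and pairs without a - are skipped. pairs_via_explode() is a hypothetical name.

```php
<?php
// Sketch: split on ";" and then split each pair at its first "-" only.
function pairs_via_explode(string $input): array
{
    $result = array();
    foreach (explode(';', rtrim($input, ';')) as $pair) {
        $kv = explode('-', $pair, 2);   // limit 2: at most one split per pair
        if (count($kv) === 2) {
            $result[$kv[0]] = $kv[1];
        }
    }
    return $result;
}
```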

What about this solution:
$samples = array(
"good" => "key-value;key1-value;key2-value;key5-value;key-value;",
"bad1" => "key-value-value;key1-value;key2-value;key5-value;key-value;",
"bad2" => "key;key1-value;key2-value;key5-value;key-value;",
"bad3" => "k%ey;key1-value;key2-value;key5-value;key-value;"
);
foreach($samples as $name => $value) {
if (preg_match("/^(\w+-\w+;)+$/", $value)) {
printf("'%s' matches\n", $name);
} else {
printf("'%s' does not match\n", $name);
}
}

I don't think you can do both validation and extraction of data with one single regexp, as you need anchors (^ and $) for validation and preg_match_all() for the data, but if you use anchors with preg_match_all() it will only return the last set matched.

preg_replace inside of preg_match_all problems

I'm trying to find certain blocks in my data file and replace something inside them, then write the whole thing (with the replaced data) to a new file. My code at the moment looks like this:
$content = file_get_contents('file.ext', true);
//find certain pattern blocks first
preg_match_all('/regexp/su', $content, $matches);
foreach ($matches[0] as $match) {
//replace data inside of those blocks
preg_replace('/regexp2/su', 'replacement', $match);
}
file_put_contents('new_file.ext', return_whole_thing?);
Now the problem is that I don't know how to return_whole_thing. Basically, file.ext and new_file.ext are almost the same except for the replaced data.
Any suggestion what should be on place of return_whole_thing?
Thank you!

You don't even need preg_replace(): because you've already got the matches, you can just use a normal str_replace(), like so:
$content = file_get_contents('file.ext', true);
//find certain pattern blocks first
preg_match_all('/regexp/su', $content, $matches);
foreach ($matches[0] as $match) {
//replace data inside of those blocks
$content = str_replace( $match, 'replacement', $content);
}
file_put_contents('new_file.ext', $content);

It's probably best to strengthen your regular expression to find a subpattern within the original pattern. That way you can just call preg_replace() and be done with it.
$new_file_contents = preg_replace('/regular(Exp)/', 'replacement', $content);
This can be done with "( )" within the regular expression. A quick google search for "regular expression subpatterns" resulted in this.

I'm not sure I understand your problem. Could you perhaps post an example of:
file.ext, the original file
the regex you want to use and what you want to replace matches with
new_file.ext, your desired output
If you just want to read file.ext, replace a regex match, and store the result in new_file.ext, all you need is:
$content = file_get_contents('file.ext');
$content = preg_replace('/match/', 'replacement', $content);
file_put_contents('new_file.ext', $content);
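For the case in the question, where only the text inside each block should change, preg_replace_callback() can do both steps in one pass. Note also that preg_replace() returns the modified string rather than changing its argument, which is why the loop in the question had no effect. The patterns below (BEGIN.*?END for a block, foo for the inner text) are hypothetical placeholders for '/regexp/su' and '/regexp2/su', and replace_inside_blocks() is a hypothetical name.

```php
<?php
// Sketch: rewrite only the text inside each matching block, in one pass.
function replace_inside_blocks(string $content): string
{
    return preg_replace_callback('/BEGIN.*?END/su', function (array $block): string {
        // preg_replace() returns the modified copy; it does not change $block[0] in place.
        return preg_replace('/foo/u', 'bar', $block[0]);
    }, $content);
}
// Usage: file_put_contents('new_file.ext', replace_inside_blocks(file_get_contents('file.ext')));
```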
