preg_match_all doesn't match when using a caret (^) - php

I'm using preg_match_all to find a URL in an HTML file. The URL always appears at the start of the line, with no leading space, like this:
<strong>Next</strong>
I used this to match it:
preg_match_all('|^<A HREF="(?<url>.*?)"><strong>Next</strong>|', $html, $url_matches);
It didn't work until I removed the caret (^) character. I thought that the caret matched the start of a line. Why is it causing my match to fail?

You have to add the m modifier:
preg_match_all('|^<A HREF="(?<url>.*?)"><strong>Next</strong>|m', $html, $url_matches);
Then ^ matches at the start of each line; otherwise it only matches at the start of the entire string.
More Info: http://php.net/manual/en/reference.pcre.pattern.modifiers.php

^ matches start-of-string not start-of-line. Use the m ("multi-line") modifier: //m
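To make the difference concrete, here is a minimal sketch that runs the question's pattern with and without the m modifier. The $html value and the page2.html target are made-up sample data, not taken from the question.

```php
<?php
// Hypothetical sample input: the link sits on the second line of the string.
$html = "<html>\n<A HREF=\"page2.html\"><strong>Next</strong></A>\n</html>";

// The pattern from the question, using | as the delimiter.
$pattern = '|^<A HREF="(?<url>.*?)"><strong>Next</strong>|';

// Without m, ^ can only match at the very start of the whole string.
$without = preg_match_all($pattern, $html, $m1);

// With m, ^ can also match right after each newline.
$with = preg_match_all($pattern . 'm', $html, $m2);

echo $without, "\n";       // 0
echo $with, "\n";          // 1
echo $m2['url'][0], "\n";  // page2.html
```

The named group (?<url>...) makes the captured URL available as $m2['url'], alongside the numeric index 1.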

Related

PHP regular expression start and end with given strings

I have a string like this:
05/15/2015 09:19 PM pt_Product2017.9.abc.swl.px64_kor_7700
I need to select pt_Product2017.9.abc.swl.px64_kor from that (starting with pt_ and ending with _kor).
$str = "05/15/2015 09:19 PM pt_Product2017.9.abc.swl.px64_kor_7700";
preg_match('/^pt_*_kor$/',$str, $matches);
But it doesn't work.
You need to remove the anchors, add a \b at the beginning so that pt_ only matches when it is preceded by a non-word character (or the start of the string), and use \S with * (\S is a shorthand character class that matches any character except whitespace):
preg_match('/\bpt_\S*_kor/',$str, $matches);
In your regex, ^ and $ force the regex engine to look for pt at the beginning and _kor at the end of the string, and _* matches 0 or more underscores. Note that regex patterns are not the same as wildcards.
In case there can be whitespace between pt_ and _kor, use .*:
preg_match('/\bpt_.*_kor/',$str, $matches);
I should also mention greediness: if you have pt_something_kor_more_kor, the .*/\S* will match the whole string, but .*?/\S*? will match just pt_something_kor. Please adjust according to your requirements.
^ and $ are the start and end of the complete string, not only of the matched part. So simply use (pt_.+_kor) to match everything between pt_ and _kor: preg_match('/(pt_.+_kor)/',$str, $matches);
Here's a demo: https://regex101.com/r/qL4fW9/1
The ^ and $ in your regular expression mean that the string must start with pt and end with _kor. Your string does neither: it starts with a date and ends with _kor_7700.
Try removing the ^ and $, and you'll get the match:
preg_match('/pt_.*_kor/',$str, $matches);
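A short sketch of the unanchored pattern and of the greedy versus lazy difference mentioned above. The $twice string is a made-up input with two _kor occurrences; the rest comes from the question.

```php
<?php
$str = "05/15/2015 09:19 PM pt_Product2017.9.abc.swl.px64_kor_7700";

// No anchors: the match may start anywhere in the string.
preg_match('/\bpt_\S*_kor/', $str, $matches);
echo $matches[0], "\n"; // pt_Product2017.9.abc.swl.px64_kor

// Hypothetical input containing _kor twice.
$twice = "pt_something_kor_more_kor";

// Greedy \S* runs to the last _kor, lazy \S*? stops at the first one.
preg_match('/\bpt_\S*_kor/', $twice, $greedy);
preg_match('/\bpt_\S*?_kor/', $twice, $lazy);

echo $greedy[0], "\n"; // pt_something_kor_more_kor
echo $lazy[0], "\n";   // pt_something_kor
```

Which of the two you want depends on whether _kor can legitimately appear inside the name.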

preg_match start and end of string and replace

Could someone help with a preg_match expression? I need it to match the - (dash) character at the start and end of a string. This is for tags: for example, -my-tag- should become my-tag. It should only match the dashes at the start and end of the string and replace them with an empty string.
You can do that with this easy expression:
$string = "-my-tag-";
$tag = preg_replace("/^-(.*)-$/", "$1", $string);
^ and $ match the start and the end of the string, while (.*) captures everything in between.
You can read more about regular expressions in the official PHP Documentation.
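One thing worth knowing about that expression is that it needs both dashes to be present. The sketch below shows this, plus a variant pattern (my own suggestion, not from the answer) that strips a leading or trailing dash independently.

```php
<?php
$string = "-my-tag-";

// The pattern from the answer: a dash at each end, everything else captured.
$tag = preg_replace('/^-(.*)-$/', '$1', $string);
echo $tag, "\n"; // my-tag

// With only one dash the pattern does not match, so nothing is replaced.
$unchanged = preg_replace('/^-(.*)-$/', '$1', "-my-tag");
echo $unchanged, "\n"; // -my-tag

// Variant (an assumption about what is wanted): remove a dash at the
// start or at the end, whichever is present.
$either = preg_replace('/^-|-$/', '', "-my-tag");
echo $either, "\n"; // my-tag
```

If removing any number of dashes from both ends is acceptable, trim($string, '-') does that without a regex.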

PHP search and replace regex pattern

I'm trying to get a string which matches by a regex pattern ( {$ ... } ). But I don't want the brackets and the $ sign returned.
For example
{$Testpath}/Testlink
should return
Testpath
My regex pattern looks like this at the moment:
^{\$.*}$
Try the following regex:
^\{\$\K[^}]*(?=\})
This expression matches the start of the string (^), then a literal { and a literal $, discards what has been matched so far with \K, matches zero or more characters that aren't }, and finally looks ahead with (?=\}) for a literal }.
You may not need the end-of-string anchor $, because the text you are trying to match might not end at the end of the string. Likewise, you may not need the start-of-string anchor ^, because the pattern may not begin at the start of the string or line.
I think you should remove ^ and $ and match globally (in PHP that means preg_match_all, since there is no g modifier).
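Here is a minimal sketch of both suggestions: the \K pattern on the question's input, and the unanchored form with preg_match_all. The $text string with two placeholders is a made-up example.

```php
<?php
// Single quotes keep PHP from interpolating $Testpath as a variable.
$input = '{$Testpath}/Testlink';

// \K drops the already matched "{$" from the reported match, and the
// lookahead checks for "}" without consuming it.
preg_match('/^\{\$\K[^}]*(?=\})/', $input, $m);
echo $m[0], "\n"; // Testpath

// Hypothetical input with several placeholders, none at the string start.
$text = 'cd {$Root}/{$Sub}/file';

// Without ^, preg_match_all collects every placeholder name.
preg_match_all('/\{\$\K[^}]*(?=\})/', $text, $all);
print_r($all[0]); // Root and Sub
```

A plain capture group, /\{\$([^}]*)\}/ with the name in index 1, gives the same names if you prefer not to rely on \K.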

Regular expression starting with http and ending with pdf?

I have loaded the entire HTML of a page and want to retrieve all the URLs which start with http and end with pdf. I wrote the following which didn't work:
$html = file_get_contents( "http://www.example.com" );
preg_match( '/^http(pdf)$/', $html, $matches );
I'm pretty new to regex but from what I've learned ^ marks the beginning of a pattern and $ marks the end. What am I doing wrong?
You need to match the characters in the middle of the URL:
/\bhttp[\w%+\/:.-]+?pdf\b/
\b matches a word boundary
^ and $ mark the beginning and end of the entire string. You don't want them here.
[...] matches any character in the brackets
\w matches any word character
+ matches one or more of the previous match
? makes the + lazy rather than greedy
preg_match( '/http[^\s]+pdf/', $html, $matches );
This matches http, followed by one or more (+) non-whitespace characters ([^\s]), followed by pdf.
Try this,
preg_match( '/\bhttp\S*pdf\b/', $html, $matches );
You need to match the part between the http and the pdf, this is what .*? is doing.
^ matches the start of the string and $ the end, but this is not what you want, when you want to extract those links from a longer text.
\b is matching on word boundaries
Update
For completeness: .*? would still match too much, so I replaced it with \S*.
\S matches a non whitespace character
Try this one:
preg_match_all('/\bhttp\S*?pdf\b/', $html, $matches);
Note that you need to use the preg_match_all() function here, since you are trying to match more than one occurrence. ^ and $ won't work, because they only match at line or string boundaries (depending on the modifiers used).
preg_match( '/^http.*pdf$/', $html, $matches );
works, but only when the subject is a single URL on its own, not a whole HTML page.
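Putting the answers together, a rough sketch with preg_match_all might look like this. The $html string stands in for the file_get_contents() result and its URLs are invented for the example.

```php
<?php
// Hypothetical page content; two of the three links end in pdf.
$html = '<a href="http://www.example.com/a.pdf">A</a> text '
      . '<a href="https://www.example.com/docs/b.pdf">B</a> '
      . '<a href="http://www.example.com/page.html">C</a>';

// \S*? lazily consumes non-whitespace characters up to the first "pdf"
// that is followed by a word boundary.
$count = preg_match_all('/\bhttp\S*?pdf\b/', $html, $matches);

echo $count, "\n";     // 2
print_r($matches[0]);  // the two .pdf URLs
```

This relies on whitespace separating one link from the next. On tightly packed markup \S*? could run from one URL into a later one, so parsing the href attributes with DOMDocument first is the more robust route.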

preg_replace curly brace when it is the only character on the line?

Let's say I have the following string:
Some Text Here }
}
How can I do a preg_replace so that only the "}" on the line by itself gets replaced?
I would expect the following to work, but it doesn't:
preg_replace('/^(s*)(\})(s*)/', etc);
The following should work:
preg_replace('/^\s*\}\s*$/m', $replacement, $subject);
The s* means any number of the character s. What you probably mean is \s*, any number of whitespace characters.
You need to enable multiline mode for the ^ anchor to work on a per line basis; the default setting is that ^ is the beginning and $ the end of the entire string, not a single line.
Remember the $ anchor, otherwise something like }hello would also get matched.
^ and $ match the beginning and end of a string. You need the m modifier to make them match the beginning and end of a line.
Your RE will not work as expected. s* matches zero or more occurrences of s. It's very likely that you wanted to use \s* instead, to match white space.
preg_replace('/^(\s*)(\})(\s*)$/m', $replacement, $subject);
A version that doesn't need the m modifier, and so can be used inside a larger regex that has to span lines:
/(^|\n)([^\S\n]*\}[^\S\n]*)(?=\n|$)/
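For comparison, a small sketch of the m-modifier pattern with and without the modifier. The replacement text END and the extra third line in $subject are placeholders of mine, since the question doesn't say what the brace should become.

```php
<?php
// The question's two lines plus a hypothetical third line.
$subject = "Some Text Here }\n}\nMore text";

// Without m, ^ only matches at offset 0, so nothing is replaced.
$noM = preg_replace('/^\s*\}\s*$/', 'END', $subject);

// With m, ^ and $ match at every line boundary, so only the brace
// that is alone on its line is replaced.
$withM = preg_replace('/^\s*\}\s*$/m', 'END', $subject);

echo $noM, "\n";
echo $withM, "\n";
```

Keep in mind that \s also matches newlines, so when blank lines surround the brace the match can swallow them. The [^\S\n]* form shown above restricts the match to whitespace on the same line.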
