This question already has answers here:
Grabbing the href attribute of an A element
(10 answers)
Closed 9 years ago.
I am trying to match a difficult link with a regex:
preg_match_all('/<a[^>]*href\s*=\s*(["\'])(.*?)\1[^>]*>\s+TEXTTOFIND+(.*?)\s*<\/a>/', '<a href="http://subdomain.BLABLABLA.net/de/cgi/g.fcgi/BLABLABLA/print?folder=inbox&uid=U3RlcClzESBNZK9SDGsmQ05yIJTj7Eax&CUSTOMERNO=124332225&t=de1142311604.1315866430.20ba8551" style="margin-right: 10px;" title=""BLABLABLA.net Registrierung" <register#gutefrage.net>">"TEXTTOFIND.net R...
</a>', $match);
print_r($match);
BLABLABLA is only a placeholder to hide the real page :)
All I want is to find the URL of the link whose text contains "TEXTTOFIND",
but it doesn't work :(
You should be using a DOM parser to do this, not regular expressions. But if you want to do it the wrong way anyway, it looks like one of the reasons it isn't working is that you're trying to match:
... [^>]*>\s+TEXTTOFIND ...
But your test string is:
... >"TEXTTOFIND
Note the double quote " between the right angle bracket and your TEXTTOFIND string. The \s+ in your regex matches only whitespace (one or more characters), so it will not match this quote.
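For what it's worth, here is a minimal sketch of the DOM parser route using PHP's DOMDocument. The markup in $html is a simplified, hypothetical stand-in for the real page (the example.com URLs are made up), so treat it as an illustration rather than a drop-in solution.

```php
<?php
// Hypothetical, simplified markup standing in for the real page.
$html = '<p><a href="http://example.com/other">Something else</a> '
      . '<a href="http://example.com/print?folder=inbox&amp;uid=abc" title="x">"TEXTTOFIND.net R...</a></p>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$urls = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // textContent is the link text with all markup removed.
    if (strpos($a->textContent, 'TEXTTOFIND') !== false) {
        $urls[] = $a->getAttribute('href');
    }
}

print_r($urls);
```

Because the check is a plain substring test on the link text, the stray double quote before TEXTTOFIND no longer matters.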
http://ua2.php.net/manual/en/function.preg-match-all.php
First, try reading the docs; you are missing the 2nd parameter.
Second, hello Alex :)
Third, you can change the \s+ to . at some ... happens (sorry for my English)
This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
I imported some posts to my site from RSS, but at the end of each post this line appears: "The post ... appeared first on ...".
<p>The post <a rel="nofollow" href="link">title</a> appeared first on <a rel="nofollow" href="Website.com"">Website</a>.</p>
However, my removal code doesn't work:
preg_replace('/<p>The post <a\s+.*?href=".*?"\s+.*?>.*?<\/a> appeared first on <a\s+.*?href=".*?"\s+.*?>.*?<\/a>.</p>/i', '', $text);
I hope someone can help me.
I agree with the comments above: don't use regexes to parse HTML or XML strings, because they're not the right tool for the job. However, if you must, your original regex has two problems:
You didn't escape the </p> (as User3783243 mentioned). It needs to be <\/p> in the regex.
The regex requires a whitespace after the href="" attribute, which is not present in the example. You should probably remove the \s+ after the second " in the href.
With both fixes applied, the regex matches the provided string; see here: https://regex101.com/r/MDwSua/1
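As a rough sketch of those two fixes applied to the question's pattern: the sample $text below is made up, and the variable name $clean is only illustrative. I also escaped the literal dot before the closing paragraph tag, which is an extra tightening on my part.

```php
<?php
// Made-up sample text ending with the unwanted paragraph.
$text = 'Some body text.'
      . '<p>The post <a rel="nofollow" href="link">title</a> appeared first on '
      . '<a rel="nofollow" href="Website.com">Website</a>.</p>';

// Fix 1: the closing </p> is written as <\/p> so the slash does not end the pattern.
// Fix 2: the \s+ after href="..." is gone, so no whitespace is required there.
// Extra: the dot before </p> is escaped so it only matches a literal dot.
$pattern = '/<p>The post <a\s+.*?href=".*?".*?>.*?<\/a> appeared first on '
         . '<a\s+.*?href=".*?".*?>.*?<\/a>\.<\/p>/i';

// preg_replace returns the new string; it does not change $text in place.
$clean = preg_replace($pattern, '', $text);

echo $clean, PHP_EOL;
```

Note that the result has to be assigned to a variable; calling preg_replace without using its return value leaves the original string untouched.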
This should work:
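$regex = '/<p>The post <a[^>]*>title<\/a> appeared first on <a[^>]*>Website<\/a>\.<\/p>/';
$text = preg_replace($regex, '', $text);
The pattern [^>]* matches the attributes of a tag (everything up to the closing >). Note that preg_replace returns the result rather than modifying $text, so assign it back.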
This question already has answers here:
Variable-length lookbehind-assertion alternatives for regular expressions
(5 answers)
Closed 4 years ago.
I can't get my regular expression to work in PHP. It works in JavaScript (Vue.js):
(?<=.+: )(.*)
I have this string:
NL: abcdef
and I would like to get
abcdef
Can someone please tell me what I am doing wrong?
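There are many ways to solve this using PHP/PCRE. One is to skip the preceding string using \K, which resets the start of the reported match so that everything matched before it is left out of the result: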
[^:]+: \K(.*)
If you can add an anchor to the beginning of the string, even better: ^[^:]+: \K(.*)
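A small sketch of the anchored \K pattern with preg_match, using the string from the question (the variable names are just for illustration):

```php
<?php
$input = 'NL: abcdef';

// \K drops everything matched so far from the reported match,
// so $m[0] starts right after "NL: ".
if (preg_match('/^[^:]+: \K(.*)/', $input, $m)) {
    echo $m[0], PHP_EOL; // whole match, prefix excluded
    echo $m[1], PHP_EOL; // group 1 holds the same text here
}
```

Since the capture group sits after \K, both the whole match and group 1 contain only the part after the colon and space.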
This question already has answers here:
What does the $1$2$4 mean in this preg_replace?
(3 answers)
Closed 4 years ago.
I want to loop through an array converting specific key/value pairs that contain markup to HTML.
So an example value for $comment['comment_text'] would be:
This has *bolded* text
And should become:
This has <strong>bolded</strong> text
Here's what I've tried:
$pattern = "/\*\b.*?\b\*/i";
$newComment = preg_replace($pattern, "<strong>$&</strong>",
$comment['comment_text']);
And what I get:
This has $& text
I realize I'm mashing up Javascript with PHP, but reading about back references in PHP hasn't made things any clearer.
My strings may have multiple bolded (in markup) instances...
Any help appreciated.
UPDATE:
Apologies - I didn't realize that Stackoverflow was converting asterisks to italics. I converted the example to code.
Also, my confusion came down to the use of $0 vs. $1. Which I still don't fully understand. I thought the numbers referred to the matches in the string...so if you had 5 instances you could refer to them by $0 through $4.
If you use $0 you get:
This has <strong>*bolded*</strong> text
But if you use $1 you get the desired result.
Do this.
$pattern = "/\*\b(.*?)\b\*/";
$newComment = preg_replace($pattern, "<strong>$1</strong>", $comment['comment_text']);
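Here $1 refers to the text captured by group 1, that is, the part inside the parentheses, while $0 is the whole match including the asterisks. I'm assuming that you want to make the text between a pair of asterisks bold.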
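To illustrate the $0 vs. $1 point from the update: the numbers refer to capture groups inside each match, not to successive matches in the string, and preg_replace applies the replacement to every match. A quick sketch (the sample text with two bolded spots is made up):

```php
<?php
$comment = ['comment_text' => 'This has *bolded* text and *more* of it'];

$pattern = '/\*\b(.*?)\b\*/';

// $0 is the whole match, asterisks included.
$withWholeMatch = preg_replace($pattern, '<strong>$0</strong>', $comment['comment_text']);

// $1 is only what the first pair of parentheses captured.
$withGroupOne = preg_replace($pattern, '<strong>$1</strong>', $comment['comment_text']);

echo $withWholeMatch, PHP_EOL;
echo $withGroupOne, PHP_EOL;
```

Both occurrences are replaced in one call, and in each of them $1 is the text between that occurrence's asterisks.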
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Regex - Greedyness - matching HTML tags, content and attributes
The text I want to parse is something like this:
Dir: Vinton Heuck, Ciro Nieli
With: Eric Loomis, Bumper Robinson, Dawn Olivieri
Usually, there are one or two anchor elements after "Dir" and multiple anchor elements after "With".
What I want to do is get all values of anchor elements after "Dir" and before "With". I tried some regular expression like this:
preg_match_all("/Dir: <a href=\"\/name\/.+\/\">(.+)<\/a>/", $content, $matches);
But this only works when there's only one anchor element after "Dir". Any suggestions? Thanks!
I think you are missing a grouping construct, "()+", to match not just one link but one or two. Take a look at this to test your regex.
You would have to group your regex for finding the anchor tag, and use + for one or more.
Something like:
/Dir: (<a href=\"\/name\/.+\/\">(.+)<\/a>)+/
You'd have to edit to take into account the comma, but it will get you started.
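One caveat with a repeated group: in PCRE, a capture group that repeats keeps only the text from its last repetition, so a pattern of the form (...)+ will not hand back every name by itself. A possible alternative, sketched here under the assumption that the markup looks roughly like the hypothetical $content below (the /name/nm.../ hrefs are invented), is to cut out the part between "Dir:" and "With:" first and then collect the anchors inside it:

```php
<?php
// Hypothetical markup; the real page's hrefs and layout are assumptions.
$content = 'Dir: <a href="/name/nm0000001/">Vinton Heuck</a>, '
         . '<a href="/name/nm0000002/">Ciro Nieli</a> '
         . 'With: <a href="/name/nm0000003/">Eric Loomis</a>, '
         . '<a href="/name/nm0000004/">Bumper Robinson</a>';

$directors = [];

// Step 1: isolate everything between "Dir:" and "With:".
if (preg_match('/Dir:(.*?)With:/s', $content, $segment)) {
    // Step 2: grab the text of every anchor inside that segment.
    preg_match_all('/<a\b[^>]*>(.*?)<\/a>/', $segment[1], $matches);
    $directors = $matches[1];
}

print_r($directors);
```

Splitting the job in two keeps each pattern simple and works for one anchor as well as several.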
Assuming that the line that contains "Dir:" appears only once:
preg_match_all("/(<([[:graph:]]+)[^>]*>)(.*?)(<\/\\2>)/", preg_replace("/[[:blank:]]*With:.*/","",$content), $matches);
print_r($matches[3]);
This question already has answers here:
What regex to use for this
(6 answers)
Closed 4 years ago.
I'm having trouble modifying this regex. Right now it matches . or ? but I want to change it to match a dot followed by a space. How do I do that?
'('/([.|?])/'
By the way, I need the grouping to stay.
What about this:
(\. |\?)
The easiest way would be:
'('/(\. )/'
or, if you want a space or a tab or a new-line:
'('/(\.\s)/'
Note that I only changed the part in the inner parenthesis as that part seems to be the focus of your question.
/\.\s/ should work for matching a dot followed by a space.
Note: \s matches any whitespace character, not only a space.
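A short sketch comparing two of the patterns suggested above with preg_match_all, keeping the capture group as requested. The sample sentence is made up, and since the question does not say which preg function is used, preg_match_all is only an assumption here.

```php
<?php
$text = 'First sentence. Second one? Version 1.5 is out. Done.';

// A dot followed by whitespace; the dot in "1.5" and the final dot do not match.
preg_match_all('/(\.\s)/', $text, $dotSpace);

// A dot followed by a space, or a question mark.
preg_match_all('/(\. |\?)/', $text, $dotSpaceOrQuestion);

print_r($dotSpace[1]);
print_r($dotSpaceOrQuestion[1]);
```

In both cases index 1 of the result holds what the group captured, so code that relies on the grouping should keep working.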