Regex in PHP to capture all img IDs

I'm having trouble writing a regex to capture all IDs from img tags. This is my pattern:
#<span><img (id=\"([^"]*)\").*><\/span>#
Example:
https://regex101.com/r/aN0uO0/3
It only captures the ID from the first img tag.
Thanks.

That's because you are not searching globally. On regex101, enable that by entering g in the flags field to the right of the closing / delimiter.
If you are using preg_match in PHP, use preg_match_all instead; PHP has no g modifier, and preg_match_all is its equivalent.
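A minimal sketch of the difference, using the pattern from the question on a made-up input. It assumes each span/img pair sits on its own line: the pattern's .* does not cross newlines (no s modifier), but if everything were on one line, the greedy .* would run to the last </span> and you would still get only one match.

```php
<?php
// Hypothetical sample input: each span/img pair on its own line.
$html = <<<'HTML'
<span><img id="first" src="a.png"></span>
<span><img id="second" src="b.png"></span>
HTML;

// The pattern from the question, unchanged.
$pattern = '#<span><img (id=\"([^"]*)\").*><\/span>#';

// preg_match() stops after the first match.
$count = preg_match($pattern, $html, $one);

// preg_match_all() keeps going, like the g flag on regex101.
$total = preg_match_all($pattern, $html, $all);
$ids = $all[2]; // group 2 holds the id value

echo $one[2], "\n"; // first
print_r($ids);      // first, second
```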

If you're set on using a regex, you might get by with:
<img[^>]*?id=(['"])(.+?)\1
You'll find the id in the second group ($2). However, if you have a > somewhere in an attribute (which is totally valid in HTML), the regex won't work as expected. In PHP code this would be:
$regex = '~<img[^>]*?id=([\'"])(.+?)\1~';
$string = 'your_string_here';
preg_match_all($regex, $string, $matches, PREG_SET_ORDER);
See an example on regex101.com.
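A minimal sketch of reading the PREG_SET_ORDER result, on a made-up input. Each element of $matches is one full match plus its groups, so the id is at index 2. The third img has a > inside its alt value before the id, which illustrates the caveat above: that img is skipped.

```php
<?php
// Hypothetical sample; the third img has ">" inside an attribute before id.
$string = '<img id="a" src="1.png"><img src="2.png" id=\'b\'><img alt="x > y" id="c">';
$regex = '~<img[^>]*?id=([\'"])(.+?)\1~';
preg_match_all($regex, $string, $matches, PREG_SET_ORDER);

$ids = [];
foreach ($matches as $set) {
    $ids[] = $set[2]; // $set[0] is the whole match, $set[1] the quote, $set[2] the id
}
print_r($ids); // "c" is missing because [^>]*? cannot step past the ">" in alt
```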
Hint: It's almost always better to use a parser, e.g. SimpleXML or DOMDocument.
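A minimal DOMDocument sketch, on a made-up input whose third img has a > inside its alt value. The HTML parser handles quoted attribute values, so all three ids come back. libxml_use_internal_errors() is used only to keep parser warnings about the fragment from being printed.

```php
<?php
$string = '<img id="a" src="1.png"><img src="2.png" id=\'b\'><img alt="x > y" id="c">';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($string);
libxml_clear_errors();

$ids = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // Only collect imgs that actually carry an id attribute.
    if ($img->hasAttribute('id')) {
        $ids[] = $img->getAttribute('id');
    }
}
print_r($ids);
```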

Related

preg_match link text with less-than sign in it

I'm extracting information from HTML files into a DB, and I've just found that a link can look like this:
channel crosstalk: <60dB
so my regular expression doesn't find that link:
preg_match_all('|<a href="/blabla/([0-9]+)"[^>]*>([^<]*)</a>|Uis',$html,$matches);
This is part of a bigger regular expression; I simplified it for this example.
It's hard to tell what you are trying to extract. Are you looking for the entire link, or for parts of the link (hence the parentheses)? Here is a solution for getting the individual contents of the link:
preg_match_all( '#(.*?)#i', $html, $matches);
The first element of matches will be the entire link, while the other elements will be the sub parts.
Or here is one for just the entire link:
preg_match_all( "#(<a.*>.*</a>)#i", $html, $matches );
Or here is a slightly modified version of yours. Yours currently doesn't match because [^<]* allows only characters other than < between the opening and closing a tags, and this link's text contains a <:
preg_match_all( '|<a href="/blabla/([0-9]+)"[^>]*>(.*?)</a>|is', $html, $matches );
Again, I'm not 100% sure of the exact results you are looking for, but maybe this will get you going and you can make modifications as needed.
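A minimal sketch of why the U modifier matters here, on a made-up fragment (the hrefs /blabla/123 and /blabla/456 are hypothetical). U inverts greediness, so (.*?) under U behaves like a greedy (.*) and, with s, runs to the last </a> on the page.

```php
<?php
// Hypothetical page fragment: the ids 123 and 456 are made up.
$html = '<a href="/blabla/123">channel crosstalk: <60dB</a> <a href="/blabla/456">other</a>';

// Lazy (.*?) without the U modifier stops at the first </a>.
preg_match_all('|<a href="/blabla/([0-9]+)"[^>]*>(.*?)</a>|is', $html, $lazy);

// With U, the same (.*?) becomes greedy and runs to the last </a>.
preg_match_all('|<a href="/blabla/([0-9]+)"[^>]*>(.*?)</a>|Uis', $html, $flipped);

print_r($lazy[2]);    // two separate link texts
print_r($flipped[2]); // one capture spanning both links
```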
You can use this regex to extract href and link text.
<a[^>]+?href="(.*?)"[^>]*>(.*?)</a>
Group 1: href
Group 2: link text
This is the fundamental issue with trying to parse HTML with regex. This isn't really valid HTML, because content that is not meant to be interpreted as markup should be written as HTML entities (e.g. &lt; instead of <). You won't always be able to rely on that, though.
In your case, something like this works for regex:
|.*?|Uis
The matching group gets shifted. This also allows nested tags (like <a><b><i></i></b></a>).
Keep in mind that the U (ungreedy) modifier you used means you can be a little more lax in your regex matching. If you wanted to do this without the U modifier, you'd probably need a negative lookahead:
|(?:(?!).)*</a>|is
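The negative-lookahead idea can be sketched with a "tempered" token, (?:(?!</a>).)*, which consumes one character at a time but refuses to step onto </a>. This is a reconstruction of the approach on a made-up fragment, not necessarily the answer's exact pattern.

```php
<?php
// Hypothetical page fragment: the ids 123 and 456 are made up.
$html = '<a href="/blabla/123">channel crosstalk: <60dB</a> <a href="/blabla/456">other</a>';

// (?:(?!</a>).)* lets the capture contain "<" but never run past the closing tag,
// so no U modifier is needed.
$pattern = '~<a href="/blabla/([0-9]+)"[^>]*>((?:(?!</a>).)*)</a>~is';
preg_match_all($pattern, $html, $matches);
print_r($matches[2]);
```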

Check if there's a URL between span tags

I have HTML code containing the following:
<span rel="url">example.com</span>
<span rel="url">example.net.pl [SOMETHING]</span>
<span rel="url">[SOMETHING]imjustanexample.com</span> [..]
The question is whether there is a way to get the URL string from between the span tags. E.g., it should get example.com, example.net.pl (without the [SOMETHING] string), and imjustanexample.com.
I guess I will have to use regex for this purpose.
Try this regular expression in JavaScript to validate the text from the span tag:
/((http|https):\/\/(\w+:{0,1}\w*@)?(\S+)|)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/
I would go this way (either with a regex or with plain PHP code, whichever you prefer):
Locate the next "<span rel="url">".
Take everything from its end up to (but not including) the next space or less-than sign < (whichever of those two comes first).
Repeat until nothing is matched any longer.
Done. If a regular expression is too complicated, you can also use PHP's string functions: http://php.net/strings
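A minimal sketch of these steps using only string functions (strpos, strcspn, substr), on a made-up input. As the steps are written, a leading [SOMETHING] like the one in the third example would not be stripped, so the sketch only covers the first two cases.

```php
<?php
$html = '<span rel="url">example.com</span>' . "\n"
      . '<span rel="url">example.net.pl [SOMETHING]</span>';
$open = '<span rel="url">';
$urls = [];
$offset = 0;

// Step 1: locate the next opening tag; the loop ends when none is left (step 3).
while (($pos = strpos($html, $open, $offset)) !== false) {
    $start = $pos + strlen($open);
    // Step 2: take characters up to (not including) the first space or "<".
    $length = strcspn($html, " <", $start);
    $urls[] = substr($html, $start, $length);
    $offset = $start + $length;
}
print_r($urls);
```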
This should work:
$str = '<span rel="url">http://google.ca</span>';
// The trailing delimiter group is optional, so a URL that ends right at </span> still matches
$match = preg_match('#<span(.*)?>((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|"|\'|:|\<|$|\.\s)?</span>#i', $str, $matches);
if ($match) {
    var_dump($matches);
} else {
    echo 'Nope<br />';
}
Regex from: https://stackoverflow.com/a/206087/1533203
Check out Simple HTML DOM Parser.
With it you can simply access elements on the DOM tree.
Your problem could be solved with:
$html->find("span[rel=url]");
Then you could loop over all the matched elements and apply whatever regex fits your needs.
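Simple HTML DOM Parser is a third-party library. As an alternative that needs nothing beyond PHP's built-in DOM extension, here is a sketch of the same idea with DOMDocument and DOMXPath. The per-element rule (drop any [...] segments, then keep the first whitespace-separated token) is an assumption based on the three examples in the question, not a general URL parser.

```php
<?php
$html = <<<'HTML'
<span rel="url">example.com</span>
<span rel="url">example.net.pl [SOMETHING]</span>
<span rel="url">[SOMETHING]imjustanexample.com</span>
HTML;

libxml_use_internal_errors(true); // silence warnings about the HTML fragment
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$urls = [];
foreach ($xpath->query('//span[@rel="url"]') as $span) {
    // Drop any [...] segments, then keep the first whitespace-separated token.
    $text = trim(preg_replace('/\[[^\]]*\]/', '', $span->textContent));
    $urls[] = preg_split('/\s+/', $text)[0];
}
print_r($urls);
```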

Extract content between first "]" and last "[" using regex?

Is it possible to write a PHP regex that extracts the content from the first ] to the last [?
For example if I had the following string:
$string = '[shortcode]You write a shortcode by using brackets ([])[/shortcode]';
I would want to extract:
You write a shortcode by using brackets ([])
and store it in a variable. The content to be extracted could be anything. Thanks in advance.
You should be using capturing groups to make sure you match the closing tag.
\[(\w+)\].*?\[/\1\]
This will match a word inside [] and keep going until it finds the same word inside [/...].
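The pattern above has no group around the content, so a sketch that actually extracts it adds a second capturing group, (.*?). The s modifier is added as an assumption so the content may span several lines.

```php
<?php
$string = '[shortcode]You write a shortcode by using brackets ([])[/shortcode]';

// Group 1 is the tag name; \1 requires the closing tag to use the same name.
// Group 2 (added here) captures the content between the tags.
if (preg_match('~\[(\w+)\](.*?)\[/\1\]~s', $string, $m)) {
    $content = $m[2];
    echo $content, "\n";
}
```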
Regexes are greedy by default, so this will do the job just fine:
/\](.*)\[/
To get this working in PHP properly, you would do something like this:
preg_match('/\](.*)\[/', $text, $matches);
$result = $matches[1];
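A quick sketch comparing the greedy pattern with a lazy one on the example string. Greedy (.*) backtracks from the end to the last [, while lazy (.*?) stops at the first [ it meets, which here is inside ([]). Without the s modifier, . does not match newlines, so add s if the content can span lines.

```php
<?php
$text = '[shortcode]You write a shortcode by using brackets ([])[/shortcode]';

preg_match('/\](.*)\[/', $text, $greedy);  // first "]" to last "["
preg_match('/\](.*?)\[/', $text, $lazy);   // first "]" to the next "["

echo $greedy[1], "\n";
echo $lazy[1], "\n";
```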
This could do what you need:
[^\]]\](.*)\[[^\[]
This works:
preg_match( '#\](.*)\[#', $string, $matches);
print_r($matches);

Regex - Grab a specific word within specific tags

I don't consider myself a PHP "noob", but regular expressions are still new to me.
I'm making a cURL request that returns a list of comments. Every comment has this HTML structure:
<div class="comment-text">the comment</div>
What I want is simple: using preg_match_all, I want to get the comments that contain the word "cool" inside this specific div.
What I have so far:
preg_match_all("#<div class=\"comment-text\">\bcool\b</div>#Uis", $getcommentlist, $matchescomment);
Sadly, this doesn't work. If the regex is simply #\bcool\b#Uis, it does work, but I really want to capture the word "cool" only inside those tags.
I know I could use 2 regular expressions (one that gets all the comments, the other that filters each of them for the word "cool"), but I was wondering how I could do this with a single preg_match_all.
I don't think I'm far from the solution, but somehow I just can't find it. Something's definitely missing.
Thank you for your time.
This should give you what you're looking for, and provide some flexibility if you want to change things a bit:
$input = '<div class="comment-text">the comment</div><div class="comment-text">cool</div><div class="comment-text">this one is cool too</div><div class="comment-text">ool</div>';
$class="comment-text";
$text="cool";
$pattern = '#<div class="'.$class.'">([^<]*'.$text.'[^<]*)</div>#s';
preg_match_all($pattern, $input, $matches);
Set your input as the value of $input. After this runs, $matches[0] holds an array of the <div>s that matched and $matches[1] holds an array of the text inside them.
You can change the class of div to match or the within-div text to require by changing the $class and $text values, respectively.
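A variation on the answer's pattern, sketched under two assumptions: you want whole-word matches like the question's \b...\b, and $class or $text might contain regex metacharacters. preg_quote() escapes the values (including the # delimiter), and the word boundaries stop "uncool" from matching, which the original [^<]*cool[^<]* would accept.

```php
<?php
$input = '<div class="comment-text">the comment</div><div class="comment-text">cool</div>'
       . '<div class="comment-text">this one is cool too</div><div class="comment-text">uncool</div>';
$class = 'comment-text';
$text = 'cool';

// preg_quote() escapes metacharacters in the values and the "#" delimiter;
// \b...\b keeps "cool" from matching inside longer words.
$pattern = '#<div class="' . preg_quote($class, '#') . '">([^<]*\b'
         . preg_quote($text, '#') . '\b[^<]*)</div>#';
preg_match_all($pattern, $input, $matches);
print_r($matches[1]);
```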

PHP Regex to grab {tag}something{/tag}

I'm trying to come up with a regex string to use with the PHP preg functions (preg_match, etc.) and am stumped on this:
How do you match this string?
{area-1}some text and maybe a link.{/area-1}
I want to replace it with a different string using preg_replace.
So far I've been able to identify the first tag with preg_match like this:
preg_match("/\{(area-[0-9]*)\}/", $mystring);
Thanks if you can help!
If you don't have nested tags, something this simple should work:
preg_match_all("~{.+?}(.*?){/.+?}~", $mystring, $matches);
Your results can be then found in $matches[1].
I would suggest
preg_match_all('~\{(area-[0-9]*)\}(.*?)\{/\1\}~s', $mystring, $matches);
This will even work if other tags are nested inside the area tag you're looking at.
If you have several area tags nested within each other, it will still work, but you'll need to apply the regex several times (once for each level of nesting).
And of course, the contents of the matches will be in $matches[2], not $matches[1] as in Tatu's answer.
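Since the question asks about preg_replace, here is a sketch of the same pattern used for replacement. The <div class="..."> output markup is a hypothetical replacement; $1 is the tag name and $2 the content. The pattern is single-quoted on purpose: in a double-quoted PHP string, "\1" is an octal escape that becomes the control character chr(1), which silently breaks the backreference.

```php
<?php
$mystring = 'Intro {area-1}some text and maybe a link.{/area-1} outro';

// Single quotes keep \1 as a regex backreference to the tag name.
$result = preg_replace(
    '~\{(area-[0-9]*)\}(.*?)\{/\1\}~s',
    '<div class="$1">$2</div>', // hypothetical replacement markup
    $mystring
);
echo $result, "\n";
```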
