Let's say I have many images in a string and I only want to get the src of the image with a specific class:
<img src="image1.jpg"/>
<img src="image2.jpg"/>
<img src="image3.jpg" class="main"/>
I want to get the src of the third one, which has the class main. How do I do that?
$pattern = '/< *img[^>]*src *= *["\']?([^"\']*)/i';
preg_match($pattern,$response,$matches);
But this pattern matches every img tag.
Don't use a regex to parse HTML. Use DOMDocument instead.
Here's some code:
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ suppresses warnings about imperfect markup
$xp = new DOMXPath($dom);
$imgs = $xp->query("//img[@class='main']");
$imgs now holds a DOMNodeList of the images whose class is main; $imgs->item(0)->getAttribute('src') gives image3.jpg.
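A minimal sketch that also copes with a class attribute holding several classes (e.g. class="hero main"), which @class='main' would miss. The helper name srcsOfImagesWithClass is hypothetical, and it assumes the class name contains no quote characters, since it is spliced into the XPath string.

```php
<?php
// Hypothetical helper: return the src of every <img> whose class list contains $class.
function srcsOfImagesWithClass(string $html, string $class): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    // Pad the normalized class list with spaces so $class only matches a whole token:
    // "main" matches class="main" and class="hero main", but not class="mainly".
    $query = sprintf(
        "//img[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $srcs = [];
    foreach ($xp->query($query) as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}

$html = '<img src="image1.jpg"/><img src="image2.jpg"/><img src="image3.jpg" class="main"/>';
print_r(srcsOfImagesWithClass($html, 'main'));
```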
I have the following div:
<div class="myclass"><strong><a rel="nofollow noopener" href="some link">dynamic content</a></strong></div>
and I want to get only the anchor text, dynamic content.
So far I have tried preg_match_all with:
"'<div class=\"myclass\">(.*?)</div>'si"
That returns all of the div's content.
I tried to combine it with:
"|<a.*(?=href=\"([^\"]*)\")[^>]*>([^<]*)</a>|i"
That returns the anchor text, but I can't get the two patterns to work together.
Can someone help?
Thank you
You can use DOMDocument instead of preg_match_all:
$html = '<div class="myclass"><strong><a rel="nofollow noopener" href="some link">dynamic content</a></strong></div>';
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ suppresses warnings about imperfect markup
$xpath = new DOMXPath($dom);
$query = './/div[@class="myclass"]/strong/a';
$nodes = $xpath->query($query);
echo $nodes[0]->textContent;
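If the page can contain several such divs, or none, a loop is safer than indexing [0], which has nothing to read when the query finds no match. A minimal sketch; the helper name myclassAnchorTexts is hypothetical and the XPath is the one from the answer above.

```php
<?php
// Hypothetical helper: anchor text of every <a> inside <div class="myclass"><strong>.
function myclassAnchorTexts(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep malformed-HTML warnings quiet
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $texts = [];
    // Loop over every match instead of assuming at least one exists.
    foreach ($xpath->query('//div[@class="myclass"]/strong/a') as $a) {
        $texts[] = trim($a->textContent);
    }
    return $texts;
}

print_r(myclassAnchorTexts('<div class="myclass"><strong><a rel="nofollow noopener" href="some link">dynamic content</a></strong></div>'));
```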
My HTML content:
$content = '<div class="class-name some-other-class">
<p>ack</p>
</div>';
Goal: remove the div with class "class-name" so that I'm left with:
<p>ack</p>
I know strip_tags($content, '<p>'); would do the job in this instance, but I want to be able to target the divs with a certain class and preserve other divs, etc.
And I'm aware that you shouldn't pass HTML through regex, so what's the best (proper) way to achieve this?
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($content); // loads your HTML
$xpath = new DOMXPath($doc);
// Find every div whose class attribute contains "class-name"
$nlist = $xpath->query("//div[contains(@class, 'class-name')]");
// Move each div's children in front of it, then remove the now-empty div
foreach ($nlist as $node) {
    while ($node->firstChild) {
        $node->parentNode->insertBefore($node->firstChild, $node);
    }
    $node->parentNode->removeChild($node);
}
echo $doc->saveHTML();
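A sketch of a reusable variant, assuming the goal is to keep the div's children (so that only <p>ack</p> remains), to match class-name only as a whole class token (contains(@class, ...) would also hit class="class-name-extra"), and to return just the fragment rather than the full document with doctype, html and body that saveHTML() emits. The helper name unwrapDivsWithClass is hypothetical, and the class name is assumed to contain no quotes.

```php
<?php
// Hypothetical helper: unwrap every <div> whose class list contains $class,
// keeping its children in place, and return the resulting body HTML.
function unwrapDivsWithClass(string $html, string $class): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Whole-token match, so "class-name" does not also hit "class-name-extra".
    $divs = $xpath->query(sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    ));

    // The XPath result is a fixed list, so moving nodes while looping is safe.
    foreach ($divs as $div) {
        while ($div->firstChild) {
            // insertBefore moves the child out of the div, just ahead of it.
            $div->parentNode->insertBefore($div->firstChild, $div);
        }
        $div->parentNode->removeChild($div);
    }

    // loadHTML wrapped the fragment in <html><body>; serialize only the body's children.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo unwrapDivsWithClass('<div class="class-name some-other-class"><p>ack</p></div><div class="keep"><p>x</p></div>', 'class-name');
```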
Maybe with some jQuery? '$(".class-name").remove();'
I have a variable containing HTML source, and I need to find the images in it that have a specific src attribute.
For example, my image:
<img src="/path/img1.svg">
I have tried the code below, but it doesn't work. Any suggestions?
$hmtl = '<div> some stuff <img src="/path/img1.svg"/> </div><div>other stuff</div>';
preg_match_all('/<img src="/path/img1.svg"[^>]+>/i',$v, $images);
You should use the DOMDocument class, not regular expressions, when it comes to parsing HTML.
<?php
$html='<img src="/path/img1.svg">';
$dom = new DOMDocument;
@$dom->loadHTML($html); // @ suppresses warnings about imperfect markup
foreach ($dom->getElementsByTagName('img') as $tag) {
    echo $tag->getAttribute('src'); // prints /path/img1.svg
}
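The loop above prints every image's src; to keep only the images with one specific src, as the question asks, an XPath attribute test can do the filtering. A minimal sketch: the helper name imagesWithSrc is hypothetical, and it assumes the src value contains no double quote, since it is spliced into the XPath literal.

```php
<?php
// Hypothetical helper: HTML of every <img> whose src equals $src exactly.
function imagesWithSrc(string $html, string $src): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $found = [];
    // Assumes $src has no double quote, because it is placed inside "..." in the query.
    foreach ($xpath->query('//img[@src="' . $src . '"]') as $img) {
        $found[] = $dom->saveHTML($img); // serialize just this <img> node
    }
    return $found;
}

$html = '<div> some stuff <img src="/path/img1.svg"/> </div><div>other stuff</div><img src="/path/img2.svg">';
print_r(imagesWithSrc($html, '/path/img1.svg'));
```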
I want to create a regex that matches the text between the opening and closing angle brackets of each HTML img tag, using PHP. Let's say I have the HTML text in the variable $searchThis:
$searchThis = "<html><div></div><img src='/relative/path/img1.png'/></div>
<img src='/relative/path/img2.png'/><div></div></div>
<img src='/relative/path/img3.png'/><ul><li></li></ul></html>";
I want to match the content of each tag, the part that the ellipsis stands for in <img ... />. The result must be the following matches:
src='/relative/path/img1.png'
src='/relative/path/img2.png'
src='/relative/path/img3.png'
This is how I imagine the pattern should look, but it doesn't work for me:
$pattern = "<img([^\/]+)\/>";
Never try to parse HTML with regex; use a DOM parser instead. Consider code like this:
$html = <<< EOF
<html><div></div><img src='/relative/path/img1.png'/></div>
<img src='/relative/path/img2.png'/><div></div></div>
<img src='/relative/path/img3.png'/><ul><li></li></ul></html>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query("//img");
for($i=0; $i < $nodelist->length; $i++) {
$node = $nodelist->item($i);
$src = $node->attributes->getNamedItem('src')->nodeValue;
echo "src='$src'\n";
}
OUTPUT:
src='/relative/path/img1.png'
src='/relative/path/img2.png'
src='/relative/path/img3.png'
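The loop above reads attributes->getNamedItem('src')->nodeValue, and getNamedItem returns null for an <img> without a src. A sketch of the same idea using hasAttribute/getAttribute, returning the matches as an array instead of echoing them; the helper name imgSrcMatches is hypothetical.

```php
<?php
// Hypothetical helper: the src of each <img>, formatted the way the question asks.
function imgSrcMatches(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $matches = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // Skip <img> tags that have no src instead of reading a missing attribute.
        if ($img->hasAttribute('src')) {
            $matches[] = "src='" . $img->getAttribute('src') . "'";
        }
    }
    return $matches;
}

print_r(imgSrcMatches("<img src='/relative/path/img1.png'/><img alt='no src'/>"));
```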
Try:
preg_match_all("`<img (.*)/>`Uis", $searchThis, $results);
print_r($results);
Printing the structure of $results will show you its content.
Note: If you want to be more accurate, I would suggest including src= in your search and matching up to the closing quote mark, so that only the image address is selected. You can then add the missing text (src=) back afterwards.
This way you still get the relative path even when your image tag doesn't look as expected (e.g. there are other attributes in the tag, like alt="Smiley face" height="42" width="42").
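The note above can be sketched as follows, assuming every src value is wrapped in single or double quotes; it is still a regex over HTML, so unquoted values, src text inside comments or scripts, and similar cases are not handled. The helper name srcViaRegex is hypothetical.

```php
<?php
// Hypothetical helper following the note: capture only the quoted src value.
// Assumes src values are quoted; regex remains fragile for HTML in general.
function srcViaRegex(string $html): array
{
    // <img\b               : an img tag (not e.g. <imgfoo)
    // [^>]*?               : any other attributes before src, staying inside the tag
    // \bsrc\s*=\s*(["'])   : src= followed by the opening quote, captured as group 1
    // (.*?)\1              : the path (group 2), up to the matching closing quote
    preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/i', $html, $m);

    // Put back the "src=" text that the capture left out, as the note suggests.
    return array_map(function ($path) {
        return "src='" . $path . "'";
    }, $m[2]);
}

print_r(srcViaRegex('<img alt="Smiley face" src="/smiley.gif" height="42" width="42">'));
```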
Example Parsing With simplehtmldom
<?php
include("simplehtmldom/simple_html_dom.php");
// Create a DOM from a string
$html = str_get_html("<html><div></div><img src='/relative/path/img1.png'/></div>
<img src='/relative/path/img2.png'/><div></div></div>
<img src='/relative/path/img3.png'/><ul><li></li></ul></html>");
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
?>
I have no idea about PHP regex. I want to extract all image tags such as <img src="www.google.com/exampleimag.jpg"> from my HTML. How can I do this using preg_match_all?
Thanks, SO community, for your precious time.
Well, my scenario is this: there is no whole HTML DOM, just a variable with img tags: $text="this is a new text <img="sfdsfdimg/pfdg.fgh" > there is another iamh <img src="sfdsfdfsd.png"> hjkdhfsdfsfsdfsd kjdshfsd dummy text
Don't use regular expressions to parse HTML. Instead, use something like DOMDocument, which exists for this very reason:
$html = 'Sample text. Image: <img src="foo.jpg" />. <img src="bar.png" />';
$doc = new DOMDocument();
$doc->loadHTML( $html );
$images = $doc->getElementsByTagName("img");
for ( $i = 0; $i < $images->length; $i++ ) {
// Outputs foo.jpg and bar.png, one per line
echo $images->item( $i )->attributes->getNamedItem( 'src' )->nodeValue, "\n";
}
You could also get the image HTML itself if you like:
// <img src="foo.jpg">
echo $doc->saveHTML ( $images->item(0) );
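Extending the saveHTML($node) call to every image gives the tag HTML for all of them, which is roughly what preg_match_all was meant to return. A minimal sketch; the helper name imageTags is hypothetical, and since the serializer writes HTML rather than the original text, the exact spacing or trailing slash of each tag may differ from the input.

```php
<?php
// Hypothetical helper: the serialized HTML of every <img> tag in a fragment.
function imageTags(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $tags = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // saveHTML() with a node argument serializes just that node.
        $tags[] = $doc->saveHTML($img);
    }
    return $tags;
}

print_r(imageTags('Sample text. Image: <img src="foo.jpg" />. <img src="bar.png" />'));
```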
You can't reliably parse HTML with regex. You're much better off using the DOM classes, which make it trivially easy to extract the images from a valid HTML tree.
$doc = new DOMDocument();
$doc->loadHTML($html);
$images = $doc->getElementsByTagName('img'); // This will generate a collection of DOMElement objects that contain the image tags