I have a page, image.php,
where images are kept in a container div like below. Note: there are other images outside the container div too; I just want the images from the container div.
<!DOCTYPE html>
<head>
<title>Image Holder</title>
</head>
<body>
<header>
<img src="http://examepl.com/logo.png">
<div id="side">
<div id="facebook"><img src="http://examepl.com/fb.png"></div>
<div id="twiiter"><img src="http://examepl.com/t.png"></div>
<div id="gplus"><img src="http://examepl.com/gp.png"></div>
</div>
</header>
<div class="container">
<p>SOme Post</p>
<img src="http://examepl.com/some.png" title="some image" />
<p>SOme Post</p>
<img src="http://examepl.com/some.png" title="some image" />
<p>SOme Post</p>
<img src="http://examepl.com/some.png" title="some image" />
</div>
<footer>
<div id="foot">
copyright © 2013
</div>
</footer>
</body>
</html>
I am trying to fetch only those images from my image.php file with preg_match_all, but it returns boolean(false) :(
My PHP code:
<?php
$file = file_get_contents("image.php");
preg_match_all("/<div class=\"container\">(.*?)</div>/", $file, $match);
preg_match_all("/<img src=\"(.*?)\">/", $match, $images);
var_dump($images);
?>
Both files are in the root folder, and now I am getting a blank page :(
Any help would be great.
Thanks
I think this will work for you. Use the link below to test your regex.
preg_match_all("/<div class=\"container\">(.*?)<\/div>/s", $file, $match); // the "s" modifier lets "." match newlines
preg_match_all("/<img .*?(?=src)src=\"([^\"]+)\"/", $match[1][0], $images);
http://www.phpliveregex.com
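For what it's worth, the pattern in the question most likely fails because the unescaped `/` in `</div>` ends the pattern early when `/` is also the delimiter, and `.` does not match newlines without the `s` modifier. Here is a minimal sketch, assuming the container holds no nested `<div>`; the `$file` string is only a stand-in for the real page:

```php
<?php
// Stand-in for file_get_contents("image.php"); the markup is assumed.
$file = <<<HTML
<header><img src="http://examepl.com/logo.png"></header>
<div class="container">
<p>SOme Post</p>
<img src="http://examepl.com/some.png" title="some image" />
</div>
HTML;

$images = array();
// '#' as delimiter avoids escaping '/'; the 's' modifier lets '.' span newlines.
if (preg_match('#<div class="container">(.*?)</div>#s', $file, $match)) {
    // $match[1] is the inner HTML of the first container div.
    preg_match_all('#<img\b[^>]*\bsrc="([^"]+)"#i', $match[1], $found);
    $images = $found[1];
}
print_r($images);
```

The logo image in the header is skipped because the second pattern only runs on the captured container content.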
You'd better not use regex for this. PHP provides a nice DOM API for exactly this purpose. Consider code like below:
$html = <<< EOF
<div class="container">
<p>SOme Post</p>
<img src="http://examepl.com/some1.png" title="some image" />
<p>SOme Post</p>
<img src="http://examepl.com/some2.png" title="some image" />
<p>SOme Post</p>
<img src="http://examepl.com/some3.png" title="some image" />
</div>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query("//div[@class='container']/img");
$img = array();
for($i=0; $i < $nodelist->length; $i++) {
$node = $nodelist->item($i);
$img[] = $node->getAttribute('src');
}
print_r($img);
OUTPUT:
Array
(
[0] => http://examepl.com/some1.png
[1] => http://examepl.com/some2.png
[2] => http://examepl.com/some3.png
)
Live Demo: http://ideone.com/iBhVMF
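To check that images outside the container really are ignored, here is a small sketch that runs the same kind of query on a page that also has a sidebar image. The `$html` string is an assumed, shortened version of image.php, and `//img` is used instead of `/img` so that images nested deeper in the container would be found too:

```php
<?php
// Assumed stand-in for the contents of image.php.
$html = <<<HTML
<!DOCTYPE html>
<html>
<head><title>Image Holder</title></head>
<body>
<div id="side"><img src="http://examepl.com/fb.png"></div>
<div class="container">
<p>SOme Post</p>
<img src="http://examepl.com/some1.png" title="some image" />
<p>SOme Post</p>
<img src="http://examepl.com/some2.png" title="some image" />
</div>
</body>
</html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$img = array();
// DOMNodeList is traversable, so foreach can replace the index loop.
foreach ($xpath->query("//div[@class='container']//img") as $node) {
    $img[] = $node->getAttribute('src');
}
print_r($img);
```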
You can easily obtain what you want with an XPath query:
$url = 'http://examepl.com/image.php';
$doc = new DOMDocument();
@$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
$srcs = $xpath->query("//div[@class='container']//img/attribute::src");
foreach ($srcs as $src) {
echo '<br/>' . $src->value;
}
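One caveat: `@class='container'` only matches when the attribute is exactly `container`. If the div might carry additional class names (an assumption, the page in the question does not), a padded `contains()` test is a common workaround. A rough sketch with made-up markup:

```php
<?php
// Assumed markup: the container div carries a second class name.
$html = '<div class="container wide"><p><img src="a.png"></p></div>'
      . '<div class="sidebar"><img src="b.png"></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Pad the class list with spaces so ' container ' matches a whole class token only.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' container ')]//img/@src";
$srcs = array();
foreach ($xpath->query($query) as $src) {
    $srcs[] = $src->value;   // $src is a DOMAttr node
}
print_r($srcs);
```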
In your second call, replace
preg_match_all("/<img src=\"(.*?)\">/", $match, $images);
with
preg_match_all("/<img src=\"(.*?)\"/", $match[1][0], $images); // stripped the ">" char; the subject must be the captured string, not the $match array
This is my content:
<p><img src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>
This is my PHP code:
function convert_the_content($content){
$content = preg_replace('/<p><img.+src=[\'"]([^\'"]+)[\'"].*>/i', "<p class=\"uploaded-img\"><img class=\"lazy-load\" data-src=\"$1\" /></p>", $content);
return $content;
}
I am using my code to add a class to the <p> and <img> tags and to convert src="" to data-src="".
The problem is that my code removes the width and height attributes from the <img> tag. How can I change my code so that it keeps these attributes too?
NOTE: My content may have many <img> and <p> tags.
If you only have this exact HTML snippet, you can do it more simply with str_replace:
$html = <<< HTML
<p><img src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>
HTML;
$html = str_replace('<p>', '<p class="foo">', $html);
$html = str_replace(' src=', ' data-src=', $html);
echo $html;
This will output
<p class="foo"><img data-src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>
If you are trying to convert arbitrary HTML, consider using a DOM Parser instead:
<?php
$html = <<< HTML
<html>
<body>
<p><img src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>
<p><img width="215" height="1515" src="http://localhost/contents/uploads/2017/11/1.png"></p>
<p ><img
class="blah"
height="1515"
width="215"
src="http://localhost/contents/uploads/2017/11/1.png"
>
</p>
</body>
</html>
HTML;
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);
foreach ($xpath->evaluate('//p[img]') as $paragraphWithImage) {
$paragraphWithImage->setAttribute('class', 'foo');
foreach ($paragraphWithImage->getElementsByTagName('img') as $image) {
$image->setAttribute('class', trim('bar ' . $image->getAttribute('class')));
$image->setAttribute('data-src', $image->getAttribute('src'));
$image->removeAttribute('src');
}
}
echo $dom->saveHTML($dom->documentElement);
Output:
<html><body>
<p class="foo"><img width="215" height="1515" class="bar" data-src="http://localhost/contents/uploads/2017/11/1.jpg"></p>
<p class="foo"><img width="215" height="1515" class="bar" data-src="http://localhost/contents/uploads/2017/11/1.png"></p>
<p class="foo"><img class="bar blah" height="1515" width="215" data-src="http://localhost/contents/uploads/2017/11/1.png"></p>
</body></html>
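Since the content in the question is a fragment rather than a full page, you probably do not want the `<html>` and `<body>` wrappers back. Below is a rough sketch of a DOM-based `convert_the_content()`. The function name and class names come from the question; the wrapper `<div>` trick is just one way to do it, and the sketch assumes the content is ASCII or entity-encoded:

```php
<?php
// Hypothetical DOM-based variant of convert_the_content() from the question.
function convert_the_content($content) {
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Wrap the fragment in a div so there is a known element to read the result back from.
    $dom->loadHTML('<div>' . $content . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//p[img]') as $p) {
        $p->setAttribute('class', trim($p->getAttribute('class') . ' uploaded-img'));
        // 'img' is evaluated relative to $p, so only its direct <img> children are touched.
        foreach ($xpath->query('img', $p) as $img) {
            $img->setAttribute('class', trim($img->getAttribute('class') . ' lazy-load'));
            $img->setAttribute('data-src', $img->getAttribute('src'));
            $img->removeAttribute('src');   // width, height and other attributes are left alone
        }
    }

    // Serialize only the children of the wrapper div, dropping <html>, <body> and the wrapper.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$result = convert_the_content('<p><img src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>');
echo $result;
```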
I have some HTML snippets retrieved through PHP/JSON such as:
<div>
<p>Some Text</p>
<img src="example.jpg" />
<img src="example2.jpg" />
<img src="example3.jpg" />
</div>
I am loading it with DOMDocument() and xpath and would like to be able to manipulate it so I can add lazy loading to the images like so:
<div>
<p>Some Text</p>
<img class="lazy" src="blank.gif" data-src="example.jpg" />
<img class="lazy" src="blank.gif" data-src="example2.jpg" />
<img class="lazy" src="blank.gif" data-src="example3.jpg" />
</div>
Which entails:
Add class .lazy
Add data-src attribute from original src attribute
Modify src attribute to blank.gif
I am trying the following but it isn't working:
foreach ($xpath->query("//img") as $node) {
$node->setAttribute( "class", $node->getAttribute("class")." lazy");
$node->setAttribute( "data-src", $node->getAttribute("src"));
$node->setAttribute( "src", "./inc/image/blank.gif");
}
Are you sure? The following works for me.
<?php
$html = <<<EOQ
<div>
<p>Some Text</p>
<img src="example.jpg" />
<img src="example2.jpg" />
<img src="example3.jpg" />
</div>
EOQ;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//img') as $node) {
$node->setAttribute('class', $node->getAttribute('class') . ' lazy');
$node->setAttribute( "data-src", $node->getAttribute("src"));
$node->setAttribute( "src", "./inc/image/blank.gif");
}
echo $dom->saveHTML();
We have following rss feed
<title>THIS IS THE TITLE</title>
<link>http://www.website.com/....</link>
<description>
<div class="primary-image">
<img typeof="foaf:Image" src="http://website.com/" alt="Drink driving" title="Drink driving" />
</div>
<div class="field-group-format group_meta field-group-div group-meta speed-fast effect-none">
<span class="field field-name-field-published-date field-type-datetime field-label-hidden">
<span class="field-item even">
<span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2014-01-29T17:43:00+00:00">29 Jan, 2014 5:43pm</span>
</span>
</span>
<span class="field field-name-field-author field-type-node-reference field-label-hidden">
<span class="field-item even">Joe Finnerty</span>
</span>
</div>
<p class="short-desc">TEXT THAT I WANT TO EXTRACT FROM HERE</p>
</description>
I am trying to extract the <p class="short-desc">TEXT THAT I WANT TO EXTRACT FROM HERE</p> with this script. I checked some questions here but did not find a practical answer.
I tried adding
$htmlStr = $node->getElementsByTagName('description')->item(0)->nodeValue;
$html = new DOMDocument();
$html->loadHTML($htmlStr);
$xpath = new DOMXPath($html);
$desc = $xpath->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' short-desc')]");
before $item = array ( within the foreach loop, but it did not do the job.
Also, in the output
&lt; is replacing < AND
&quot; is replacing " AND
&gt; is replacing >
Please help, I have been trying to find an answer for some days now and have not found one.
Assuming that you are passing the above HTML content to the $html variable:
$dom = new DOMDocument;
@$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('p') as $tag) {
if ($tag->getAttribute('class') === 'short-desc') {
echo $tag->nodeValue; //"prints" TEXT THAT I WANT TO EXTRACT FROM HERE
}
}
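About the `&lt;` and `&gt;` entities: in most feeds the HTML inside `<description>` is entity-escaped, and reading the element's `nodeValue` gives you the decoded markup, which can then be parsed as HTML. A minimal sketch of the whole round trip; the `$rss` string is a made-up, shortened version of the feed in the question:

```php
<?php
// Assumed feed: the HTML inside <description> is entity-escaped, as is common in RSS.
$rss = <<<XML
<rss version="2.0"><channel><item>
<title>THIS IS THE TITLE</title>
<description>&lt;div class="primary-image"&gt;&lt;img src="http://website.com/" alt="Drink driving" /&gt;&lt;/div&gt;
&lt;p class="short-desc"&gt;TEXT THAT I WANT TO EXTRACT FROM HERE&lt;/p&gt;</description>
</item></channel></rss>
XML;

$feed = new DOMDocument();
$feed->loadXML($rss);

$texts = array();
foreach ($feed->getElementsByTagName('item') as $item) {
    // nodeValue returns the description with &lt; and &gt; already decoded to real markup.
    $htmlStr = $item->getElementsByTagName('description')->item(0)->nodeValue;

    $html = new DOMDocument();
    libxml_use_internal_errors(true);
    $html->loadHTML($htmlStr);
    libxml_clear_errors();

    $xpath = new DOMXPath($html);
    $query = "//p[contains(concat(' ', normalize-space(@class), ' '), ' short-desc ')]";
    foreach ($xpath->query($query) as $p) {
        $texts[] = trim($p->nodeValue);
    }
}
print_r($texts);
```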
If I understand correctly, you want to remove the tags from the feed, so you can try this:
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
?>
output will be:
Test paragraph. Other text
For more info: http://in3.php.net/strip_tags
Why not use regex?
$strRegex = '%<p class="short-desc">(.+?)</p>%s';
if (preg_match_all($strRegex, $strContent, $arrMatches))
{
var_dump($arrMatches[1][0]);
}
and to get the content use
$path = 'path/to/file';
$strContent = file_get_contents($path);
I want to remove all image-tags before the headline starts, but they are not nested the same way. And then remove the empty tags.
<div class="c2">
<img src="image/file" width="480" height="360" alt="Image" />
</div>
<div class="c2">
<div class="headline">
headline
</div>
<div class="headline">
headline2
</div>
</div>
and different nested tags like
<div class="c2">
<p>
<img src="image/A.JPG" width="480" height="319" alt="Image" />
</p>
<div class="headline">
A headline
</div>
</div>
I think this could be solved recursively, but I don't know how.
Thanks for your help!
EDIT: if you want to remove only an <img> followed by <div><div class="headline"> or <div class="headline">, use this xpath:
$imgs = $xpath->query("//img[../following-sibling::div[1]/div/@class='headline' or ../following-sibling::div[1]/@class='headline']");
see it working: http://codepad.viper-7.com/QhprLP
Do it like this:
$doc = new DOMDocument();
$doc->loadHTML($x); // assuming HTML in $x
$xpath = new DOMXpath($doc);
$imgs = $xpath->query("//img"); // select all <img> nodes
foreach ($imgs as $img) { // loop through list of all <img> nodes
$parent = $img->parentNode;
$parent->removeChild($img); // delete <img> node
// if the parent of <img> is now empty (no child elements, only whitespace), delete it too
if ($parent->parentNode && !$xpath->query('*', $parent)->length && trim($parent->textContent) === '')
$parent->parentNode->removeChild($parent);
}
echo htmlentities($doc->saveHTML()); // display the new HTML
see it working: http://codepad.viper-7.com/350Hw6
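If the images can sit several levels deep, the "remove the empty tags" part can be done by walking up from each removed image and deleting every ancestor that the removal left empty. No real recursion is needed, a loop is enough. Below is a sketch under the assumption that all images should go; `is_empty_node()` is a helper made up for this example and `$x` combines both samples from the question:

```php
<?php
// Assumed input, combining the two nesting variants from the question.
$x = '<div class="c2"><img src="image/file" width="480" height="360" alt="Image" /></div>'
   . '<div class="c2"><div class="headline">headline</div><div class="headline">headline2</div></div>'
   . '<div class="c2"><p><img src="image/A.JPG" width="480" height="319" alt="Image" /></p>'
   . '<div class="headline">A headline</div></div>';

// Hypothetical helper: true when a node has no element children and only whitespace text.
function is_empty_node(DOMNode $node) {
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            return false;
        }
    }
    return trim($node->textContent) === '';
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($x);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Copy the matches into an array first so removals cannot disturb the iteration.
$imgs = iterator_to_array($xpath->query('//img'));
foreach ($imgs as $img) {
    $node = $img->parentNode;
    $node->removeChild($img);
    // Walk upwards, deleting each ancestor that the removal left empty; stop at <body>.
    while ($node !== null && $node->nodeName !== 'body'
        && $node->parentNode !== null && is_empty_node($node)) {
        $parent = $node->parentNode;
        $parent->removeChild($node);
        $node = $parent;
    }
}
$result = $doc->saveHTML($doc->getElementsByTagName('body')->item(0));
echo $result;
```

With this input the first `c2` div and the `<p>` disappear, while the two `c2` divs that hold headlines stay.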
My code below retrieves a series of images from the search results of a site, along with the corresponding age data. It works fine; however, I get a list of images followed by a list of the information in the age field:
img img img img age age age age and so on.
How do I combine these so I can display them in sets: img age img age img age
<?php
error_reporting(-1);
$html = new DOMDocument();
@$html->loadHtmlFile('http://www.site.com/searchresults.html');
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( "//div[@class='age']" );
$tags = $html->getElementsByTagName('img');
foreach ($tags as $tag) {
$image = $tag->getAttribute('src');
echo '<img src='. $image .' alt="image" ><br>';
}
foreach ($nodelist as $n)
{
echo $n->nodeValue."<br>";
}
?>
Sample page. I want to extract the img source and the title data from <div class="age" title="30 usa">:
<div id="sr-15763292" class="search-result">
<div class="thumb-wrapper">
<a class="bioLink" href="http://www.site.com/user/" title="View user"><img src="http://www.site.com/img/15763292.jpg" class="thumb" alt="user" width="140" height="105"></a>
<p class="status"><a href="http://www.site.com/user/" >Online</a></p>
</div>
<div class="rating">
<div class="rating-stars rating4"></div>
</div>
<div class="age" title="30 usa">
<p>30</p>
<p class="gender m">m</p>
<p>USA</p>
</div>
<div>
<p class="headline">Hello there.</p>
</div>
</div>
It's hard to answer if we don't know what the HTML looks like! Assuming it looks something like this
<div class="age"><p>21</p>
<img src="a.jpg" />
</div>
<div class="age"><p>51</p>
<img src="b.jpg" />
</div>
you need to find each div and then find the image inside each div. getElementsByTagName() will give you a list even if there's only one result, so use item() to fetch the first.
error_reporting(-1);
$html = new DOMDocument();
@$html->loadHtmlFile('results.html');
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( "//div[@class='age']" );
foreach ($nodelist as $node) {
$tags = $node->getElementsByTagName('img');
$image = $tags->item(0)->getAttribute('src');
echo '<img src="'. $image .'" alt="image" ><br>';
echo $node->textContent . '<br>';
}
If the HTML is like this
<div class="age"><p>21</p></div><img src="a.jpg" />
you can try
$node->nextSibling // a property, not a method
As a general point, trace through the HTML and think: how do I get from A to B? Go forwards? Backwards? Up to the parent, across to the next node and down again?
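For the sample page in the question, the image and the age div both sit inside the same `search-result` block, so one way to get from A to B is to loop over those blocks and look up both parts relative to each block. A sketch, assuming every result uses that structure; the `$page` string is a trimmed copy of the sample and the `$pairs` array is just for illustration:

```php
<?php
// Assumed input: one search result, trimmed from the sample page in the question.
$page = <<<HTML
<div id="sr-15763292" class="search-result">
<div class="thumb-wrapper">
<a class="bioLink" href="http://www.site.com/user/" title="View user"><img src="http://www.site.com/img/15763292.jpg" class="thumb" alt="user" width="140" height="105"></a>
</div>
<div class="age" title="30 usa">
<p>30</p>
<p class="gender m">m</p>
<p>USA</p>
</div>
</div>
HTML;

$html = new DOMDocument();
libxml_use_internal_errors(true);
$html->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($html);
$pairs = array();
foreach ($xpath->query("//div[@class='search-result']") as $result) {
    // The second argument makes these queries relative to the current result block.
    $img = $xpath->query(".//img", $result)->item(0);
    $age = $xpath->query(".//div[@class='age']", $result)->item(0);
    if ($img === null || $age === null) {
        continue;   // skip blocks that are missing either part
    }
    $pairs[] = array('src' => $img->getAttribute('src'), 'age' => $age->getAttribute('title'));
}
foreach ($pairs as $pair) {
    echo '<img src="' . htmlspecialchars($pair['src']) . '" alt="image"> '
       . htmlspecialchars($pair['age']) . "<br>\n";
}
```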