I have the following PHP code, which fails to return images whose names contain non-Latin characters, because their links come out as garbled addresses.
if($image) {
var_dump(mb_detect_encoding($image, 'UTF-8', true));
$doc = new DOMDocument();
$doc->loadHTML($image);
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img) { ?>
<img data-src="<?php echo $img->getAttribute("src");?>" class="something"/>
<?php }
} ?>
I already have <meta charset="utf-8"> in <head>. How should I deal with this problem?
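One likely cause, assuming $image really is valid UTF-8 (which the mb_detect_encoding check is meant to confirm): DOMDocument::loadHTML() treats input that carries no charset declaration of its own as ISO-8859-1, and the <meta> tag in the page's <head> is not part of the string being parsed. A minimal sketch of one workaround is to turn every non-ASCII character into a numeric entity before parsing. The sample markup and the file name фото.jpg are made up for illustration:

```php
<?php
// Hypothetical input standing in for $image from the question.
$image = '<p><img src="images/фото.jpg" alt=""></p>';

// Encode every character above ASCII as a numeric entity, so the parser
// never has to guess the byte encoding of the input.
$encoded = mb_encode_numericentity($image, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

$doc = new DOMDocument();
$doc->loadHTML($encoded);

$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // DOM strings come back as UTF-8 with the entities already decoded.
    $sources[] = $img->getAttribute('src');
}

print_r($sources);
```

This needs the mbstring extension, which the question already relies on for mb_detect_encoding.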
I'm having a weird issue trying to append an image element to a noscript element using PHP's DOMDocument.
If I create a new div node, I can append it to the noscript element without issue, but as soon as I try to append an image element the script just times out.
What am I doing wrong?
<?php
$html = '<!DOCTYPE html><html><head><title>Sample</title></head><body><img src="https://example.com/images/example.jpg"></body></html>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$images = $doc->getElementsByTagName('img');
foreach ($images as $image) {
$src = $image->getAttribute('src');
$noscript = $doc->createElement('noscript');
$node = $doc->createElement('div');
//$node = $doc->createElement('img'); // If I uncomment this line the script just times out
$node->setAttribute('src', $src);
$noscript->appendChild($node);
$image->setAttribute('x-data-src', $src);
$image->removeAttribute('src');
$image->parentNode->appendChild($noscript);
//$image->parentNode->appendChild($newImage);
}
$body = $doc->saveHTML();
echo $body;
You're getting caught in a recursive loop. This will help you visualize what's going on. I've added indenting for clarity:
php > $html = '<!DOCTYPE html><html><head><title>Sample</title></head><body><img src="https://example.com/images/example.jpg"></body></html>';
php >
php > $doc = new DOMDocument();
php > $doc->loadHTML($html);
php >
php > $images = $doc->getElementsByTagName('img');
php >
php > $count=0;
php > foreach ($images as $image) {
php { $count++;
php { if($count>4) {
php { die('limit exceeded');
php { }
php {
php { $src = $image->getAttribute('src');
php { $noscript = $doc->createElement('noscript');
php {
php { //$node = $doc->createElement('div');
php { $node = $doc->createElement('img'); // with this line active the script just times out
php {
php { $node->setAttribute('src', $src);
php {
php { $noscript->appendChild($node);
php {
php { $image->setAttribute('x-data-src', $src);
php { $image->removeAttribute('src');
php { $image->parentNode->appendChild($noscript);
php { //$image->parentNode->appendChild($newImage);
php {
php { }
limit exceeded
php > $body = $doc->saveHTML();
php >
php > echo $body;
<!DOCTYPE html>
<html><head><title>Sample</title></head><body>
<img x-data-src="https://example.com/images/example.jpg">
<noscript>
<img x-data-src="https://example.com/images/example.jpg">
<noscript>
<img x-data-src="https://example.com/images/example.jpg">
<noscript>
<img x-data-src="https://example.com/images/example.jpg">
<noscript>
<img src="https://example.com/images/example.jpg">
</noscript>
</noscript>
</noscript>
</noscript>
</body></html>
php >
The troublesome line causing the recursion is
$image->parentNode->appendChild($noscript);
If you comment that out, the recursion goes away. Notice that when it recurses, the x-data-src is applied to all but the last image.
I haven't quite figured out what is causing this behaviour, but hopefully being able to visualize it will help you diagnose it further.
UPDATE
The OP took this and ran with it, and completed the answer with his solution as shown below.
The problem was in fact that getElementsByTagName returns a live DOMNodeList, so appending an image to the document while iterating over the list adds a new item to it and the loop never ends.
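To see that the list really does change while you hold it, here is a small sketch. The markup and the file name a.jpg are invented for the demonstration:

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<html><body><img src="a.jpg"></body></html>');

// The list is obtained once, before the tree is modified.
$images = $doc->getElementsByTagName('img');
$before = $images->length;

// Append a second <img> to the document without touching $images.
$body = $doc->getElementsByTagName('body')->item(0);
$body->appendChild($doc->createElement('img'));

// The same list object now reports the new element as well.
$after = $images->length;

echo "before: $before, after: $after\n";
```

Because the list is re-evaluated against the document, a foreach over it keeps finding the element that the loop body has just added.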
I solved it by first collecting all the image tags in a simple array
<?php
$html = '<!DOCTYPE html><html><head><title>Sample</title></head><body><img src="https://example.com/images/example.jpg"></body></html>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$images = $doc->getElementsByTagName('img');
$normal_array = [];
foreach ($images as $image) {
$normal_array[] = $image;
}
// Now we have all tags in a simple array NOT in a Live Node List
foreach ($normal_array as $image) {
$src = $image->getAttribute('src');
$noscript = $doc->createElement('noscript');
$node = $doc->createElement('img'); // safe now: we iterate a plain array, not the live list
$node->setAttribute('src', $src);
$noscript->appendChild($node);
$image->setAttribute('x-data-src', $src);
$image->removeAttribute('src');
$image->parentNode->appendChild($noscript);
//$image->parentNode->appendChild($newImage);
}
$body = $doc->saveHTML();
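A shorter way to take the same snapshot, offered as a sketch rather than as part of the original solution, is iterator_to_array(), which copies the nodes of the list into a plain array in one call. The $fallback and $imgCount names are illustrative:

```php
<?php
$html = '<!DOCTYPE html><html><head><title>Sample</title></head><body><img src="https://example.com/images/example.jpg"></body></html>';
$doc = new DOMDocument();
$doc->loadHTML($html);

// Snapshot the live DOMNodeList into a plain array before changing the tree.
$images = iterator_to_array($doc->getElementsByTagName('img'));

foreach ($images as $image) {
    $src = $image->getAttribute('src');

    // Build <noscript><img src="..."></noscript> as the fallback.
    $noscript = $doc->createElement('noscript');
    $fallback = $doc->createElement('img');
    $fallback->setAttribute('src', $src);
    $noscript->appendChild($fallback);

    // Move the original src into x-data-src and attach the fallback.
    $image->setAttribute('x-data-src', $src);
    $image->removeAttribute('src');
    $image->parentNode->appendChild($noscript);
}

// Original image plus the one inside <noscript>.
$imgCount = $doc->getElementsByTagName('img')->length;
echo $doc->saveHTML();
```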
What I have so far:
<?php
$html = file_get_contents('content/');
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node)
{
echo $node->nodeValue.': '.$node->getAttribute("href")."\n";
}
?>
I have a directory called 'content' that has several HTML documents in it. Edit: Each document has one link in it, wrapped around an image. I want to parse each document and display the link from each page as an image. Would I need a loop to step through each document?
You can try something like this:
foreach (glob("content/*.html") as $filename) {
$html = file_get_contents($filename);
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node) {
echo $node->nodeValue.': '.$node->getAttribute("href")."\n";
}
}
Andrej Ludinovskov's answer helped guide me to the solution, but it took a lot of trial and error, so here it is: how to output all the links as images.
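foreach ($dom->getElementsByTagName('a') as $link) {
echo '<a href="' . htmlspecialchars($link->getAttribute('href')) . '">';
// Only the images inside this link, not every image in the document
foreach ($link->getElementsByTagName('img') as $img) {
echo '<img src="' . htmlspecialchars($img->getAttribute('src')) . '">';
}
echo '</a>';
}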
Hopefully this can help someone else.
I am having trouble retrieving the src of an image that is part of a link. For example, with this I would like to retrieve the src of the img inside the link.
<img src="http://example.com/picture1234.jpg" id="pic_1234" />
I will need to do this for a couple of the links on the page that are all laid out the same. So what I tried so far is this:
$dom = new DOMDocument;
@$dom->loadHTML($html);
$i = 0;
$links = $dom->getElementsByTagName('a');
//Get images
foreach ($links as $link){
$test = $link->getAttribute('href');
if (strpos($test,'/video') !== false) {
$XV_IMG[$i] = $link->nodeValue;
$i++;
}
}
If the link contains plain text instead of only an img tag, it works just fine. Is there any way to get the src?
Just keep using getElementsByTagName on the link node, like this:
foreach ($link->getElementsByTagName('img') as $img) {
$XV_IMG[] = $img->getAttribute('src');
}
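Put together with the loop from the question, this might look like the sketch below. The sample markup and the /video/1 and /about hrefs are assumptions made up for the example, since the question does not show the surrounding links:

```php
<?php
// Hypothetical markup: one matching link around an image, one text-only link.
$html = '<a href="/video/1"><img src="http://example.com/picture1234.jpg" id="pic_1234" /></a>'
      . '<a href="/about">About</a>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$XV_IMG = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    // Skip links that do not point at a video.
    if (strpos($link->getAttribute('href'), '/video') === false) {
        continue;
    }
    // Search only inside this link for images.
    foreach ($link->getElementsByTagName('img') as $img) {
        $XV_IMG[] = $img->getAttribute('src');
    }
}

print_r($XV_IMG);
```

nodeValue only returns the text content of a node, which is why it comes back empty for a link that holds nothing but an image.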
Try using preg_match_all:
$html= '<img src="http://example.com/picture1234.jpg" id="pic_1234" />
<img src="http://example.com/picture1224.jpg" id="pic_1224" />
<img src="http://example.com/picture1434.jpg" id="pic_1434" />
<img src="http://example.com/picture1554.jpg" id="pic_1554" />
<img src="http://example.com/picture1334.jpg" id="pic_1334" />';
preg_match_all('/<a href="(.*)"><img src="(.*)" id="pic_[0-9]{1,7}" \/><\/a>/i',$html,$out);
unset($out[0]);
unset($out[1]);
print_r($out);
I am querying images using getElementsByTagName("img") and printing them using $im->src, but it does not work. I also tried $im->nodeValue, and that does not work either.
require('simple_html_dom.php');
$dom=new DOMDocument();
$dom->loadHTML( $str); /*$str contains html output */
$xpath=new DOMXPath($dom);
$imgfind=$dom->getElementsByTagName('img'); /*finding elements by tag name img*/
foreach($imgfind as $im)
{
echo $im->src; /*this doesnt work */
/*echo $im->nodeValue; and also this doesnt work (i tried both of them separately ,Neither of them worked)*/
// echo "<img src=".$im->nodeValue."</img><br>"; //This also did not work
}
/*the image is enclosed within div tags, so I tried to query the div and print it, but the image was still not printed*/
$printimage=$xpath->query('//div[@class="abc"]');
foreach($printimage as $image)
{
echo $image->src; //still i could not accomplish my task
}
Okay, use this to display your image:
foreach($imgfind as $im)
{
echo '<img src="' . $im->getAttribute('src') . '" />'; // use this instead of echo $im->src;
}
This should display your image, provided the path to the image is correct.
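The reason $im->src prints nothing is that DOMElement does not expose HTML attributes as object properties; they have to be read with getAttribute(). The same applies to the XPath attempt: query for the img elements inside the div and read their src. A minimal sketch, assuming markup shaped like the question describes (the div class abc comes from the question, the image path is invented):

```php
<?php
// Hypothetical stand-in for $str from the question.
$str = '<div class="abc"><img src="images/photo.jpg" alt="demo"></div>';

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

$sources = [];
// Select every <img> anywhere inside <div class="abc">.
foreach ($xpath->query('//div[@class="abc"]//img') as $img) {
    $sources[] = $img->getAttribute('src');
}

foreach ($sources as $src) {
    echo '<img src="' . htmlspecialchars($src) . '" /><br>';
}
```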
I hope this helps you.
$dom = new DOMDocument();
$filename = "https://www.amazon.com/dp/B0896WB9XD/";
$html = file_get_contents($filename);
@$dom->loadHTML($html); // @ suppresses warnings about malformed markup
$imgfind=$dom->getElementsByTagName('img');
foreach($imgfind as $im)
{
$ids= $im->getAttribute('id');
if ($ids == 'landingImage') {
$im2 = $im->getAttribute('src');
echo '<img src="'.$im2.'">';
}
}
This is for Amazon.
I have some UTF8 text+image data which must be processed.
My whole code is in one file; here is the complete code:
<?php
echo "<html xmlns=\"http://www.w3.org/1999/xhtml\">
<head><meta http-equiv='Content-Type' content='text/html; charset=utf-8' /></head><body>";
$article_header="აბგდევზთ<img src='some_url/img/15.jpg' alt=''>აბგდევზთ";
echo "1".$article_header."<br>";
$doc = new DOMDocument();
$doc->loadHTML($article_header);
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img) {
if(!$img->getAttribute('class')){
$src = $img->getAttribute('src');
$newSRC = str_replace('/img/', '/mini/', $src);
$img->setAttribute('src', $newSRC);
$img->removeAttribute('width');
$img->removeAttribute('height');
$article_header = $doc->saveHTML();
}
}
echo "2".$article_header."<br>";
echo "</body></html>";
?>
As you see I echo data 2 times.
The first time, it brings both text and image, as expected.
The second time, it brings the modified image as expected. But the text becomes damaged, like this: áƒáƒ‘გდევზთ
Is there any way to fix this problem?
Guys, I've found the solution! Hurray! :)
For those who face this problem in the future, here is the code:
$article_header = mb_convert_encoding($article_header, 'HTML-ENTITIES', "UTF-8");
This must be done before loadHTML, and then everything works fine.
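Note that the 'HTML-ENTITIES' target of mb_convert_encoding is deprecated as of PHP 8.2. One alternative, sketched here under the assumption that the input is a UTF-8 fragment like $article_header in the question, is to prepend a charset declaration so that loadHTML() decodes the bytes as UTF-8. The $prefix and $out variables are illustrative names, not part of the original code:

```php
<?php
$article_header = "აბგდევზთ<img src='some_url/img/15.jpg' alt=''>აბგდევზთ";

// Charset hint for the parser; it ends up in <head> and is not part of the output below.
$prefix = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the page
$doc->loadHTML($prefix . $article_header);
libxml_clear_errors();

foreach ($doc->getElementsByTagName('img') as $img) {
    $img->setAttribute('src', str_replace('/img/', '/mini/', $img->getAttribute('src')));
}

// Serialize only the children of <body>, so the wrapper tags added by
// loadHTML() and the charset hint are left out.
$out = '';
foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $out .= $doc->saveHTML($child);
}

echo $out;
```

The parser may wrap the loose text in a <p> element, so the output is not guaranteed to be byte-for-byte identical to the input apart from the changed src.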