UTF8 problems while working with DOM Object in PHP

I have some UTF-8 text and image data that must be processed.
My whole code is in one file; here is the complete code:
<?php
echo "<html xmlns=\"http://www.w3.org/1999/xhtml\">
<head><meta http-equiv='Content-Type' content='text/html; charset=utf-8' /></head><body>";
$article_header="აბგდევზთ<img src='some_url/img/15.jpg' alt=''>აბგდევზთ";
echo "1".$article_header."<br>";
$doc = new DOMDocument();
$doc->loadHTML($article_header);
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img) {
    if(!$img->getAttribute('class')){
        $src = $img->getAttribute('src');
        $newSRC = str_replace('/img/', '/mini/', $src);
        $img->setAttribute('src', $newSRC);
        $img->removeAttribute('width');
        $img->removeAttribute('height');
        $article_header = $doc->saveHTML();
    }
}
echo "2".$article_header."<br>";
echo "</body></html>";
?>
As you can see, I echo the data twice.
The first time, it shows both the text and the image, as expected.
The second time, the image is modified as expected, but the text becomes damaged, like this: áƒáƒ‘áƒ’áƒ“áƒ”áƒ•áƒ–áƒ—
Is there any way to fix this problem?

Guys, I've found the solution! Hooray! :)
For those who face this problem in the future, here is the code:
$article_header = mb_convert_encoding($article_header, 'HTML-ENTITIES', "UTF-8");
This must be done before loadHTML, and then everything works fine!
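Why this works, as far as I understand it: DOMDocument::loadHTML() reads input that declares no charset as ISO-8859-1, so each multibyte UTF-8 character is split into several Latin-1 characters, which is the kind of garbling shown above. Turning non-ASCII characters into numeric entities first sidesteps that. The 'HTML-ENTITIES' target of mb_convert_encoding() is deprecated as of PHP 8.2, so this sketch uses mb_encode_numericentity() instead. It assumes the dom and mbstring extensions are enabled; miniImages() is a hypothetical name wrapping the loop from the question.

```php
<?php
// Sketch: round-trip UTF-8 text through DOMDocument without mojibake.
// Assumes ext-dom and ext-mbstring; miniImages() is a hypothetical helper name.
function miniImages(string $html): string
{
    // Encode every code point from U+0080 upward as &#NNNN; so the parser's
    // ISO-8859-1 default never sees raw multibyte UTF-8 bytes.
    $safe = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($safe);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('img') as $img) {
        if (!$img->getAttribute('class')) {
            $img->setAttribute('src', str_replace('/img/', '/mini/', $img->getAttribute('src')));
            $img->removeAttribute('width');
            $img->removeAttribute('height');
        }
    }
    // Serialize once, after all images have been updated.
    return $doc->saveHTML();
}

echo miniImages("აბგდევზთ<img src='some_url/img/15.jpg' alt=''>აბგდევზთ");
```

Serializing once after the loop, rather than on every iteration as in the question, gives the same final HTML with less work.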

Related

php getimagesize with persian file name

I'm trying to write a Joomla plugin that adds width and height attributes to each <img> in an HTML file.
Some image file names are in Persian, and getimagesize fails on them.
The code is this:
$dom = new DOMDocument();
@$dom->loadHTML('<?xml version="1.0" encoding="UTF-8"?>' . "\n" . '
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<img src="images\banners\س.jpg" style="max-width: 90%;" >
</body>
</html>
');
$x = new DOMXPath($dom);
foreach($x->query("//img") as $node)
{
    $imgtag = $node->getAttribute("src");
    $imgtag = pathinfo($imgtag);
    $imgtag = $imgtag['dirname'].'\\'.$imgtag['basename'];
    $imgtag = getimagesize($imgtag);
    $node->setAttribute("width",$imgtag[0]);
    $node->setAttribute("height",$imgtag[1]);
}
$newHtml = urldecode($dom->saveHtml($dom->documentElement));
When the file name contains Persian characters, getimagesize shows:
Warning: getimagesize(images\banners\س.jpg): failed to open stream: No such file or directory in C:\wamp64\www\plugin.php
How can I solve this?

Thanks to all.
I couldn't get it to work on a WAMP server (a local server on Windows),
but when I migrated to a Linux server, this code finally worked properly.
$html = $app->getBody();
setlocale(LC_ALL, '');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$x = new DOMXPath($dom);
foreach($x->query("//img") as $node)
{
    $imgtag = $node->getAttribute("src");
    if(strpos($imgtag,"data:image")===false)
    {
        $size = getimagesize($imgtag);
        if($size !== false) // skip images that cannot be read
        {
            $node->setAttribute("width",$size[0]);
            $node->setAttribute("height",$size[1]);
        }
    }
}
$bodytag = $x->query("//body");
$node = $dom->createElement("script", ' /* JavaScript which may be necessary on the client */ ');
$bodytag->item(0)->appendChild($node);
$html = '<!DOCTYPE html>'."\n" . $dom->saveHtml($dom->documentElement);
Some hints:
The code shouldn't touch base64 (data:image) sources, so I added a condition for them.
If a script (or anything else: div, p, ...) needs to be added to the body tag, you can use the appendChild method.
<!DOCTYPE html> has to be added to the final output, because saveHtml($dom->documentElement) serializes only the <html> element :)
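The same idea as a reusable function, sketched under some assumptions: src values are paths relative to a known base directory, the dom and mbstring extensions are enabled, and addImageSizes() is a hypothetical helper, not a Joomla API. It normalizes both / and \ to the platform separator, skips data: URIs as the hints suggest, and skips any image getimagesize() cannot read instead of writing width/height from a false result.

```php
<?php
// Sketch: add width/height to every <img> whose src is a readable local file,
// including files with non-ASCII (e.g. Persian) names.
// Assumptions: ext-dom + ext-mbstring; src is relative to $baseDir;
// addImageSizes() is a hypothetical helper name.
function addImageSizes(string $html, string $baseDir): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    // Encode non-ASCII as numeric entities so the src survives loadHTML() intact.
    $doc->loadHTML(mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'));
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');          // UTF-8 string
        if ($src === '' || strpos($src, 'data:') === 0) {
            continue;                              // leave inline base64 images alone
        }
        // Normalize both / and \ to the separator of the current platform.
        $path = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR
              . str_replace(['/', '\\'], DIRECTORY_SEPARATOR, $src);
        $size = is_file($path) ? getimagesize($path) : false;
        if ($size === false) {
            continue;                              // missing or unreadable: skip, don't crash
        }
        $img->setAttribute('width', (string) $size[0]);
        $img->setAttribute('height', (string) $size[1]);
    }
    return $doc->saveHTML();
}
```

On Linux, file names are plain byte strings, so a UTF-8 path such as banners/س.png reaches the filesystem unchanged, which fits the observation above that the code worked after moving off WAMP. Note also that libxml's HTML serializer percent-encodes non-ASCII characters in src attributes, which is presumably why the question passes saveHtml() through urldecode().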

PHP-How to re display images on website from fetched URL IMGS

So I'm a bit stuck, and I've been given various solutions, none of which work. Any hotshot PHP folks out there? Here's the deal: I'm trying to display an image on my website from another website that serves a randomly generated image. Though I'm actually trying to do this with a personal art site of mine, this example will serve perfectly.
http://commons.wikimedia.org/wiki/Special:Random/File
That link opens a random page with an image on it. Now, I'd like to display THAT random image, or whatever image comes up, on another site. The first of the two possible solutions I have come across is gathering an array of URL links from a given page and then re-displaying that array as images on another site, like a: < a href="https
The output I get from that looks like this:
Array
(
[0] => https ://kfjhiakwhefkiujahefawef/awoefjoiwejfowe.jpg
[1] => https ://oawiejfoiaewjfoajfeaweoif/awoeifjao;iwejfoawiefj.png
)
Instead of the printout, however, I'd like the actual images displayed (specifically element [0], but one thing at a time). The code that's producing this is:
<?php
/*
Credits: Bit Repository
URL: http://www.bitrepository.com/
*/
$url = 'http://commons.wikimedia.org/wiki/Special:Random/File';
// Fetch page
$string = FetchPage($url);
// Regex that extracts the images (full tag)
$image_regex = '/<img[^>]*'.
    'src=[\"|\'](.*)[\"|\']/Ui';
preg_match_all($image_regex, $string, $out, PREG_PATTERN_ORDER);
$img_tag_array = $out[0];
echo "<pre>"; print_r($img_tag_array); echo "</pre>";
// Regex for SRC Value
$image_regex_src_url = '/<img[^>]*'.
    'src=[\"|\'](.*)[\"|\']/Ui';
preg_match_all($image_regex_src_url, $string, $out, PREG_PATTERN_ORDER);
$images_url_array = $out[1];
echo "<pre>"; print_r($images_url_array); echo "</pre>";
// Fetch Page Function
function FetchPage($path)
{
    $file = fopen($path, "r");
    if (!$file)
    {
        exit("There was a connection error!");
    }
    $data = '';
    while (!feof($file))
    {
        // Extract the data from the file / url
        $data .= fgets($file, 1024);
    }
    return $data;
}
for($i=0; $i<count($images_url_array); $i++) {
    echo '<img src="'.$images_url_array[$i].'">';
}
?>
Solution two:
Use file_get_contents, like this:
<?php
$html =
    file_get_contents("http://commons.wikimedia.org/wiki/Special:Random/File");
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$image_src = $xpath->query('//div[contains(@class,"fullImageLink")]/a/img')
    [0]->getAttribute('src') ;
echo "<img src='$image_src'><br>";
?>
However, I unfortunately get this error message: Fatal error: Cannot use object of type DOMNodeList as array in /home/wilsons888/public_html/wiki.php on line 11. Or, if I remove a "}" at the end, I just get a blank page.
I've been told that the above code will work, but only with the openssl extension enabled. The problem is, I have no idea how to do this (I'm very new to PHP). Does anyone know how to plug it in, so to speak? Thank you so much! I feel like I'm close, just missing the last element.

I was able to load the random image and output it directly as an image (so you can point an IMG tag straight at the PHP file) using this code:
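<?php
$html = file_get_contents("http://commons.wikimedia.org/wiki/Special:Random/File");
$dom = new DOMDocument();
@$dom->loadHTML($html); // warnings would corrupt the image output
// Image URL, taken from the first attribute of the first child of #file
$remoteImage = $dom->getElementById("file")->firstChild->attributes->item(0)->textContent;
// Download once; filesize() cannot stat a remote URL, so measure the data instead
$data = file_get_contents($remoteImage);
header("Content-type: image/png");
header('Content-Length: ' . strlen($data));
echo $data;
?>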
Then create a new file called showImage.php and put this code in it (its <img> points at test.php, i.e. the script above):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<img src="test.php">
</body>
</html>
Next, open showImage.php in your browser, and it will show a random image from the site you asked about.
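About the fatal error in "Solution two": indexing a DOMNodeList with [0] only works on newer PHP versions (5.6.3 and later, as far as I know), while ->item(0) works everywhere. Fetching an https:// URL with file_get_contents() also needs allow_url_fopen and the openssl extension. The sketch below assumes the Commons page still wraps the image in <div class="fullImageLink"><a><img>, as the question's XPath expects; firstImageSrc() is a hypothetical helper, and the network call is left in a comment so the parsing can be tried on saved HTML.

```php
<?php
// Sketch: pull the main image URL out of a Wikimedia Commons file page.
// Assumptions: the page layout matches the question's XPath;
// firstImageSrc() is a hypothetical helper name.
function firstImageSrc(string $html): ?string
{
    libxml_use_internal_errors(true);          // real-world HTML triggers many warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//div[contains(@class,"fullImageLink")]/a/img');
    // item(0) works on every PHP version; $nodes[0] needs a newer PHP.
    $img = $nodes->item(0);
    if ($img === null) {
        return null;                           // no match: the page layout differs
    }
    $src = $img->getAttribute('src');
    // Protocol-relative URLs ("//upload...") need a scheme before they can be fetched.
    return strpos($src, '//') === 0 ? 'https:' . $src : $src;
}

// Usage (needs allow_url_fopen and the openssl extension for https):
// $src = firstImageSrc(file_get_contents('https://commons.wikimedia.org/wiki/Special:Random/File'));
// if ($src !== null) { echo '<img src="' . htmlspecialchars($src) . '">'; }
```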

Can't append img element using php domdocument

I'm having a weird issue trying to append an image element to a noscript element using PHP's DOMDocument.
If I create a new div node, I can append it to the noscript element without issue, but as soon as I try to append an image element, the script just times out.
What am I doing wrong?
<?php
$html = '<!DOCTYPE html><html><head><title>Sample</title></head><body><img src="https://example.com/images/example.jpg"></body></html>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$images = $doc->getElementsByTagName('img');
foreach ($images as $image) {
    $src = $image->getAttribute('src');
    $noscript = $doc->createElement('noscript');
    $node = $doc->createElement('div');
    //$node = $doc->createElement('img'); If I uncomment this line, the script just times out
    $node->setAttribute('src', $src);
    $noscript->appendChild($node);
    $image->setAttribute('x-data-src', $src);
    $image->removeAttribute('src');
    $image->parentNode->appendChild($noscript);
    //$image->parentNode->appendChild($newImage);
}
$body = $doc->saveHTML();
echo $body;

You're getting caught in a recursive loop. This will help you visualize what's going on. I've added indenting for clarity:
php > $html = '<!DOCTYPE html><html><head><title>Sample</title></head><body><img src="https://example.com/images/example.jpg"></body></html>';
php >
php > $doc = new DOMDocument();
php > $doc->loadHTML($html);
php >
php > $images = $doc->getElementsByTagName('img');
php >
php > $count=0;
php > foreach ($images as $image) {
php { $count++;
php { if($count>4) {
php { die('limit exceeded');
php { }
php {
php { $src = $image->getAttribute('src');
php { $noscript = $doc->createElement('noscript');
php {
php { //$node = $doc->createElement('div');
php { $node = $doc->createElement('img'); //If a uncomment this line the script just times out
php {
php { $node->setAttribute('src', $src);
php {
php { $noscript->appendChild($node);
php {
php { $image->setAttribute('x-data-src', $src);
php { $image->removeAttribute('src');
php { $image->parentNode->appendChild($noscript);
php { //$image->parentNode->appendChild($newImage);
php {
php { }
limit exceeded
php > $body = $doc->saveHTML();
php >
php > echo $body;
<!DOCTYPE html>
<html><head><title>Sample</title></head><body>
<img x-data-src="https://example.com/images/example.jpg">
<noscript>
<img x-data-src="https://example.com/images/example.jpg">
<noscript>
<img x-data-src="https://example.com/images/example.jpg">
<noscript>
<img x-data-src="https://example.com/images/example.jpg">
<noscript>
<img src="https://example.com/images/example.jpg">
</noscript>
</noscript>
</noscript>
</noscript>
</body></html>
php >
The troublesome line causing the recursion is
$image->parentNode->appendChild($noscript);
If you comment that out, the recursion goes away. Notice that when it recurses, the x-data-src is being applied to all but the last one.
I haven't quite figured out what is causing this behaviour, but hopefully being able to visualize it will help you diagnose it further.
UPDATE:
The OP took this and ran with it, and completed the answer with his solution as shown below.
The problem was in fact that getElementsByTagName returns a live DOMNodeList, so each new img appended to the document is added to the very list being iterated, which causes the infinite loop.
I solved it by first collecting all the image tags in a simple array:
<?php
$html = '<!DOCTYPE html><html><head><title>Sample</title></head><body><img src="https://example.com/images/example.jpg"></body></html>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$images = $doc->getElementsByTagName('img');
$normal_array = [];
foreach ($images as $image) {
    $normal_array[] = $image;
}
// Now we have all tags in a simple array, NOT in a live node list
foreach ($normal_array as $image) {
    $src = $image->getAttribute('src');
    $noscript = $doc->createElement('noscript');
    $node = $doc->createElement('img'); // no longer times out: the new img is not in $normal_array
    $node->setAttribute('src', $src);
    $noscript->appendChild($node);
    $image->setAttribute('x-data-src', $src);
    $image->removeAttribute('src');
    $image->parentNode->appendChild($noscript);
}
$body = $doc->saveHTML();
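The same snapshot idea can be written with iterator_to_array(), which copies the DOMNodeList into a plain PHP array in one call. A minimal sketch, assuming the dom extension is enabled; lazyLoadImages() is a hypothetical name:

```php
<?php
// Sketch: snapshot the live node list before mutating the document.
// Assumes ext-dom; lazyLoadImages() is a hypothetical helper name.
function lazyLoadImages(string $html): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    // Copy the live DOMNodeList into a plain array first, so the <img>
    // elements created below are never picked up by this loop.
    $images = iterator_to_array($doc->getElementsByTagName('img'));

    foreach ($images as $image) {
        $src = $image->getAttribute('src');
        $fallback = $doc->createElement('img');
        $fallback->setAttribute('src', $src);
        $noscript = $doc->createElement('noscript');
        $noscript->appendChild($fallback);

        $image->setAttribute('x-data-src', $src);
        $image->removeAttribute('src');
        $image->parentNode->appendChild($noscript);
    }
    return $doc->saveHTML();
}
```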

how to deal with non-latin character in <?php echo $img->

I have the following PHP code, which doesn't return images whose names contain non-Latin characters, because their links show a weird address.
if($image) {
    var_dump(mb_detect_encoding($image, 'UTF-8', true));
    $doc = new DOMDocument();
    $doc->loadHTML($image);
    $imgs = $doc->getElementsByTagName('img');
    foreach ($imgs as $img) { ?>
        <img data-src="<?php echo $img->getAttribute("src");?>" class="something"/>
    <?php }
} ?>
I already have <meta charset="utf-8"> in <head>. How should I deal with this problem?
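One likely cause, offered as a hedge: the <meta charset="utf-8"> in the page's <head> only tells the browser how to read the output. DOMDocument::loadHTML() parses $image on its own and, with no charset declared inside $image itself, reads it as ISO-8859-1, so non-Latin characters in src come back garbled. A sketch of the entity-encoding workaround applied to this case, assuming ext-dom and ext-mbstring; imageSources() is a hypothetical helper and the sample file names are made up:

```php
<?php
// Sketch: read <img src> values that contain non-Latin characters.
// Assumptions: ext-dom + ext-mbstring; imageSources() is a hypothetical helper.
function imageSources(string $fragment): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    // Numeric entities keep the parser's ISO-8859-1 default from splitting UTF-8 bytes.
    $doc->loadHTML(mb_encode_numericentity($fragment, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'));
    libxml_clear_errors();

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $sources[] = $img->getAttribute('src');   // returned as UTF-8
    }
    return $sources;
}

foreach (imageSources('<img src="/uploads/عکس.jpg">') as $src) {
    // Escape the value when echoing it back into HTML.
    echo '<img data-src="' . htmlspecialchars($src, ENT_QUOTES, 'UTF-8') . '" class="something"/>', "\n";
}
```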

Get image source from html dom element

I am querying images using getElementsByTagName("img") and printing them with $image->src, but it does not work. I also tried $image->nodeValue; that doesn't work either.
require('simple_html_dom.php');
$dom=new DOMDocument();
$dom->loadHTML( $str); /*$str contains html output */
$xpath=new DOMXPath($dom);
$imgfind=$dom->getElementsByTagName('img'); /*finding elements by tag name img*/
foreach($imgfind as $im)
{
    echo $im->src; /*this doesn't work */
    /*echo $im->nodeValue; and also this doesn't work (I tried both of them separately; neither of them worked)*/
    // echo "<img src=".$im->nodeValue."</img><br>"; //This also did not work
}
/*the image is enclosed within div tags, so I tried to query the value of the div and print it, but still the image was not printed*/
$printimage=$xpath->query('//div[@class="abc"]');
foreach($printimage as $image)
{
    echo $image->src; //still I could not accomplish my task
}

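Okay: DOMElement doesn't expose HTML attributes as properties, so $im->src is undefined, and an <img> element has no text content, so its nodeValue is empty. Use getAttribute() to display your image:
foreach($imgfind as $im)
{
    echo '<img src="'.$im->getAttribute('src').'"/>'; //use this instead of echo $im->src;
}
and it will display your image. Make sure the path to the image is correct.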
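A small self-contained check of the points above, using a made-up HTML string: attribute values come from getAttribute(), an <img> has an empty nodeValue, a missing attribute reads as an empty string, and XPath attribute tests use @class. It assumes only the dom extension:

```php
<?php
// Sketch: attributes come from getAttribute(), not from properties or nodeValue.
// Assumes ext-dom; the HTML string is a made-up example.
$dom = new DOMDocument();
$dom->loadHTML('<div class="abc"><img src="pics/a.jpg" alt="A"></div>');

$img = $dom->getElementsByTagName('img')->item(0);
$src = $img->getAttribute('src');        // the attribute value
$text = $img->nodeValue;                 // empty: <img> has no child text
$missing = $img->getAttribute('title');  // empty string for an attribute that isn't set

// XPath attribute tests use @, e.g. //div[@class="abc"]//img
$xpath = new DOMXPath($dom);
$viaXpath = $xpath->query('//div[@class="abc"]//img')->item(0)->getAttribute('src');

echo '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '">';
```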

I hope this helps:
$dom = new DOMDocument();
$filename = "https://www.amazon.com/dp/B0896WB9XD/";
$html = file_get_contents($filename);
@$dom->loadHTML($html);
$imgfind=$dom->getElementsByTagName('img');
foreach($imgfind as $im)
{
    $ids= $im->getAttribute('id');
    if ($ids == 'landingImage') {
        $im2 = $im->getAttribute('src');
        echo '<img src="'.$im2.'">';
    }
}
This is for Amazon.
