I'm trying to write a Joomla plugin that adds width and height attributes to each <img> in an HTML file.
Some image file names are Persian, and getimagesize fails on them.
This is the code:
$dom = new DOMDocument();
@$dom->loadHTML('<?xml version="1.0" encoding="UTF-8"?>' . "\n" . '
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<img src="images\banners\س.jpg" style="max-width: 90%;" >
</body>
</html>
');
$x = new DOMXPath($dom);
foreach($x->query("//img") as $node)
{
$imgtag = $node->getAttribute("src");
$imgtag = pathinfo($imgtag);
$imgtag = $imgtag['dirname'].'\\'.$imgtag['basename'];
$imgtag = getimagesize($imgtag);
$node->setAttribute("width",$imgtag[0]);
$node->setAttribute("height",$imgtag[1]);
}
$newHtml = urldecode($dom->saveHtml($dom->documentElement));
And when Persian characters exist in file name, getimagesize shows:
Warning: getimagesize(images\banners\س.jpg): failed to open stream: No such file or directory in C:\wamp64\www\plugin.php
How can I solve this?
Thanks to all.
I couldn't get this to work on a WAMP server (a local server on Windows),
but after I migrated to a Linux server, the following code finally worked properly:
$html = $app->getBody();
setlocale(LC_ALL, '');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$x = new DOMXPath($dom);
foreach($x->query("//img") as $node)
{
$imgtag = $node->getAttribute("src");
if(strpos($imgtag,"data:image")===false)
{
$imgtag = getimagesize($imgtag);
$node->setAttribute("width",$imgtag[0]);
$node->setAttribute("height",$imgtag[1]);
}
}
$bodytag = $x->query("//body");
$node = $dom->createElement("script", ' /* java script which may be necessary on client */ ');
$bodytag->item(0)->appendChild($node);
$html = '<!DOCTYPE html>'."\n" . $dom->saveHtml($dom->documentElement);
Some hints:
The code shouldn't touch base64 image sources (data: URIs), so I added a condition to skip them.
If a script (or any other element: div, p, ...) should be added to the body tag, you can use the appendChild method.
<!DOCTYPE html> must be prepended to the final output, because saveHtml($dom->documentElement) serializes only the <html> element :)
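The checks from the hints above can be gathered into one small helper. This is only a sketch: the function name localImageSize is hypothetical, and it assumes the src attribute points to a local file that is reachable from the current working directory. It skips data: URIs, decodes the src in case it was percent-encoded, normalises backslashes, and returns null instead of raising a warning when the file cannot be found.

```php
<?php
// Hypothetical helper: returns [width, height] for a local image, or null.
function localImageSize(string $src): ?array
{
    // Skip inline base64 images (data: URIs).
    if (strpos($src, 'data:image') === 0) {
        return null;
    }
    // Decode percent-encoded characters, e.g. non-ASCII file names.
    $path = rawurldecode($src);
    // Accept both Windows and Unix separators in the src attribute.
    $path = str_replace('\\', '/', $path);
    if (!is_file($path)) {
        return null;
    }
    $size = @getimagesize($path);
    return $size === false ? null : [$size[0], $size[1]];
}

// Possible use inside the foreach loop shown earlier:
// $size = localImageSize($node->getAttribute('src'));
// if ($size !== null) {
//     $node->setAttribute('width', (string) $size[0]);
//     $node->setAttribute('height', (string) $size[1]);
// }
```

Returning null keeps the loop going when a single image is missing, so one bad path does not stop the remaining images from being processed.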
So I'm a bit stuck, and I've been given various solutions, none of which work. Any hotshot PHP folks out there? Here's the deal, I'm trying to get an image to display on my website, from another website, that has a randomly generated IMG. Though I'm actually trying to do this off a personal art site of mine, this example will serve perfectly.
http://commons.wikimedia.org/wiki/Special:Random/File
That link opens a random image page with an image on it. Now, I'd like to display THAT random image, or whatever image comes up, on another site. The first of the two possible solutions I have encountered is gathering an array of URL links from a given page and then redisplaying that array as images on another site, like a: < a href="https
The code I get back from what I'm talking about looks like this:
Array
(
[0] => https ://kfjhiakwhefkiujahefawef/awoefjoiwejfowe.jpg
[1] => https ://oawiejfoiaewjfoajfeaweoif/awoeifjao;iwejfoawiefj.png
)
Instead of the print out however, I'd like the actual images displayed, well specifically array [0], but one thing at a time. The code that's actually doing this is:
<?php
/*
Credits: Bit Repository
URL: http://www.bitrepository.com/
*/
$url = 'http://commons.wikimedia.org/wiki/Special:Random/File';
// Fetch page
$string = FetchPage($url);
// Regex that extracts the images (full tag)
$image_regex = '/<img[^>]*'.
'src=[\"|\'](.*)[\"|\']/Ui';
preg_match_all($image_regex, $string, $out, PREG_PATTERN_ORDER);
$img_tag_array = $out[0];
echo "<pre>"; print_r($img_tag_array); echo "</pre>";
// Regex for SRC Value
$image_regex_src_url = '/<img[^>]*'.
'src=[\"|\'](.*)[\"|\']/Ui';
preg_match_all($image_regex_src_url, $string, $out, PREG_PATTERN_ORDER);
$images_url_array = $out[1];
echo "<pre>"; print_r($images_url_array); echo "</pre>";
// Fetch Page Function
function FetchPage($path)
{
$file = fopen($path, "r");
if (!$file)
{
exit("There was a connection error!");
}
$data = '';
while (!feof($file))
{
// Extract the data from the file / url
$data .= fgets($file, 1024);
}
fclose($file);
return $data;
}
// Display the extracted URLs as images
for($i=0; $i<count($images_url_array); $i++) {
echo '<img src="'.$images_url_array[$i].'">';
}
?>
Solution two:
Use file_get_contents, like this:
<?php
$html =
file_get_contents("http://commons.wikimedia.org/wiki/Special:Random/File");
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$image_src = $xpath->query('//div[contains(@class,"fullImageLink")]/a/img')
[0]->getAttribute('src') ;
echo "<img src='$image_src'><br>";
?>
However, there's unfortunately an error message I get: Fatal error: Cannot use object of type DOMNodeList as array in /home/wilsons888/public_html/wiki.php on line 11. Or, if I remove a "}" at the end, I just get a blank page.
I have been told that the above code will work, but with openssl extension included. Problem is, I have no idea how to do this. (I'm very new to PHP). Anyone know how to plug it in, so to speak? Thank you so much! I feel like I'm close, just missing the last element.
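The fatal error suggests that this PHP installation does not allow array-style access on a DOMNodeList. A minimal sketch of the same lookup using the item() method instead is shown here; the inline HTML string and the example.org URL are placeholders standing in for the fetched page, not the real Wikimedia markup.

```php
<?php
// Placeholder markup standing in for the downloaded page.
$html = '<div class="fullImageLink"><a href="#">'
      . '<img src="https://example.org/sample.jpg"></a></div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$nodes = $xpath->query('//div[contains(@class,"fullImageLink")]/a/img');
$imageSrc = null;
// item(0) avoids array syntax; also guard against an empty result.
if ($nodes !== false && $nodes->length > 0) {
    $imageSrc = $nodes->item(0)->getAttribute('src');
}

if ($imageSrc !== null) {
    echo "<img src='" . htmlspecialchars($imageSrc, ENT_QUOTES) . "'><br>";
}
```

Checking the length first means a page without a matching image produces no output rather than a fatal error.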
I was able to load the random image, and "print it" as an image directly (so you can embed the php file directly on the IMG tag) using this code:
<?php
$html = file_get_contents("http://commons.wikimedia.org/wiki/Special:Random/File");
$dom = new DOMDocument();
$dom->loadHTML($html);
$remoteImage = $dom->getElementById("file")->firstChild->attributes[0]->textContent;
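// Download the image once; filesize() cannot stat an http:// URL.
$imageData = file_get_contents($remoteImage);
// Assumes the random file is a PNG, as in the original code.
header("Content-type: image/png");
header('Content-Length: ' . strlen($imageData));
echo $imageData;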
?>
Save the script above as test.php, then create a new file called showImage.php and put this code in it:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<img src="test.php">
</body>
</html>
Next, open showImage.php in your browser, and it will show a random image from the site you asked about.
I am trying to read and display the content of the title (contained in a h1 tag) from many HTML files. These files are all in the same folder.
This is what the html files look like :
<!DOCTYPE html PUBLIC '-//W3C//DTD HTML 4.01//EN'>
<html>
<head>
<title>A title</title>
<style type='text/css'>
... Styles here ...
</style>
</head>
<body>
<h1>Être aidant</h1>
<p>En général, les aidants doivent équilibrer...</p>
... more tags ...
</body>
I have tried to display the content from the H1 tag with this PHP script :
<?php
foreach (glob("test/*.html") as $file) {
$file_handle = fopen($file, "r");
$doc = new DOMDocument();
$doc->loadHTMLfile($file);
$title = $doc->getElementsByTagName('h1');
if ( $title && 0<$title->length ) {
$title = $title->item(0);
$content = $doc->savehtml($title);
echo $content;
}
fclose($file_handle);
}
?>
But the output contains wrong characters. For the example file, the output is :
Être aidant
How can I achieve this output?
Être aidant
You should state a charset in the <head> of your HTML document.
<meta charset="utf-8">
You need to use UTF-8 encoding.
Change echo $content to echo utf8_encode($content); (note that utf8_encode is deprecated as of PHP 8.2).
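One commonly used workaround, sketched here under assumptions, is to tell the parser explicitly that the input is UTF-8 by prepending an XML encoding declaration before calling loadHTML, and then to read the heading through textContent. The inline $html string stands in for the contents of one of the files (in the original loop it would come from file_get_contents($file)), and the variable names are illustrative.

```php
<?php
// Stand-in for file_get_contents($file); the file is assumed to be UTF-8.
$html = "<html><head><title>A title</title></head>"
      . "<body><h1>Être aidant</h1><p>En général...</p></body></html>";

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
// The encoding declaration hints to the parser that the bytes are UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);

$title = null;
$headings = $doc->getElementsByTagName('h1');
if ($headings->length > 0) {
    // textContent is returned as UTF-8, without the surrounding <h1> tags.
    $title = $headings->item(0)->textContent;
    echo $title, PHP_EOL;
}
```

The page that echoes the title still has to be served as UTF-8 for the browser to display the accented characters correctly.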
I know how to access different elements by id, but I don't know how to get everything between the opening and closing html tags. Can anyone please help me?
Thanks.
If you would like to parse an html page with PHP, you could use PHP's DOMDocument extension, as such:
// a new dom object
$dom = new domDocument;
// load the html into the object
$dom->loadHTML($html);
// keep white space
$dom->preserveWhiteSpace = true;
// nicely format output
$dom ->formatOutput = true;
//get element by tag name
$htmlRootElement = $dom->getElementsByTagName('html');
echo htmlspecialchars($dom->saveHTML(), ENT_QUOTES);
Or you could do this with JavaScript on the client side:
var htmlRootElement = document.getElementsByTagName("html")[0];
alert(htmlRootElement.innerHTML);
You can access each element in the <html> tag with the DOMDocument class.
Example
$htmlDoc = new DOMDocument;
$html = <<<HTML
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>My Site</title>
<meta name="description" content="DOM test">
</head>
<body>
<h1>Hello</h1>
<p>This is a DOM test</p>
</body>
</html>
HTML;
$htmlDoc->loadHTML($html);
$htmlElement = $htmlDoc->getElementsByTagName("html");
foreach ($htmlElement->item(0)->childNodes as $element) {
echo 'Element name: ' . $element->nodeName . PHP_EOL;
echo 'Element value: '. $element->nodeValue . PHP_EOL;
}
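If the goal is the markup of the <html> element as a single string, rather than a walk over its children, passing the root element to saveHTML() is one option. A minimal sketch, with an illustrative inline document:

```php
<?php
$html = '<!doctype html><html><head><title>My Site</title></head>'
      . '<body><h1>Hello</h1><p>This is a DOM test</p></body></html>';

$htmlDoc = new DOMDocument();
$htmlDoc->loadHTML($html);

// documentElement is the <html> node; saveHTML($node) serializes only that subtree.
$htmlString = $htmlDoc->saveHTML($htmlDoc->documentElement);
echo $htmlString, PHP_EOL;
```

Because only the <html> subtree is serialized, the doctype is not part of the returned string.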
I have a small problem: tags such as <br> are not parsed when I add them through PHP's DOMDocument. Here is my PHP code:
$doc = new DOMDocument();
$doc->loadHTMLFile("Test.html");
$doc->formatOutput = true;
$node = new DOMElement('p', 'This is a test<br>This should be a new line in the same paragraph');
$doc->getElementsByTagName('body')->item(0)->appendChild($node);
$doc->saveHTMLFile("Test.html");
echo 'Editing successful.';
Here is the HTML code (before editing):
<!DOCTYPE html>
<html>
<head>
<title>Hey</title>
</head>
<body>
<p>Test</p>
</body>
</html>
(after editing)
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hey</title>
</head>
<body>
<p>Test</p>
<p>This is a test&lt;br&gt;This should be a new line in the same paragraph</p>
</body>
</html>
Why is it not working?
You are trying to append a fragment of markup, which does not work when it is passed as a plain string (how would it know what you want encoded and what not?).
You can use the DOMDocumentFragment::appendXML() function, but as the name states, it expects XML, not HTML, so the <br> needs to be self-closing (because we are working in XML mode):
<?php
$doc = new DOMDocument();
$doc->loadHTMLFile("Test.html");
$doc->formatOutput = true;
$node = new DOMElement('p');
$p = $doc->lastChild->lastChild->appendChild($node);
$fragment = $doc->createDocumentFragment();
$fragment->appendXML('This is a test<br/>This should be a new line in the same paragraph');
$p->appendChild($fragment);
$doc->saveHTMLFile("Test.html");
Another solution that doesn't involve altering your string is to load a separate document as HTML (so, $otherdoc->loadHTML('<html><body>'.$yourstring.'</body></html>')), and then loop through it, importing its nodes into the main doc:
<?php
$doc = new DOMDocument();
$doc->loadHTMLFile("Test.html");
$doc->formatOutput = true;
$node = new DOMElement('p');
$p = $doc->lastChild->lastChild->appendChild($node);
$otherdoc = new DOMDocument();
$yourstring = 'This is a test<br>This should be a new line in the same paragraph';
$otherdoc->loadHTML('<html><body>'.$yourstring.'</body></html>');
foreach($otherdoc->lastChild->lastChild->childNodes as $node){
// Deep import, so nested markup inside $yourstring is copied too
$importednode = $doc->importNode($node, true);
$p->appendChild($importednode);
}
$doc->saveHTMLFile("Test.html");
Have you tried <br/> rather than <br>? It could have to do with the validity of the markup: <br> is valid HTML, but it is not well-formed XML.
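A further option, sketched below, avoids parsing a string at all: build the paragraph from DOM nodes, creating the <br> as an element and the two sentences as text nodes. The inline document stands in for Test.html, and the variable names are illustrative.

```php
<?php
// Inline stand-in for the contents of Test.html.
$doc = new DOMDocument();
$doc->loadHTML('<!DOCTYPE html><html><head><title>Hey</title></head>'
             . '<body><p>Test</p></body></html>');

$p = $doc->createElement('p');
$p->appendChild($doc->createTextNode('This is a test'));
$p->appendChild($doc->createElement('br'));   // a real element, so it is not escaped
$p->appendChild($doc->createTextNode('This should be a new line in the same paragraph'));

$doc->getElementsByTagName('body')->item(0)->appendChild($p);
echo $doc->saveHTML();
```

This is more verbose than the fragment approach, but the text never passes through an XML or HTML parser, so characters such as < or & inside the sentences stay literal text.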
I have some UTF8 text+image data which must be processed.
My whole code is in one file; here is the complete code:
<?php
echo "<html xmlns=\"http://www.w3.org/1999/xhtml\">
<head><meta http-equiv='Content-Type' content='text/html; charset=utf-8' /></head><body>";
$article_header="აბგდევზთ<img src='some_url/img/15.jpg' alt=''>აბგდევზთ";
echo "1".$article_header."<br>";
$doc = new DOMDocument();
$doc->loadHTML($article_header);
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img) {
if(!$img->getAttribute('class')){
$src = $img->getAttribute('src');
$newSRC = str_replace('/img/', '/mini/', $src);
$img->setAttribute('src', $newSRC);
$img->removeAttribute('width');
$img->removeAttribute('height');
$article_header = $doc->saveHTML();
}
}
echo "2".$article_header."<br>";
echo "</body></html>";
?>
As you see I echo data 2 times.
The first time, it brings both text and image, as expected.
The second time, it brings the modified image as expected. But the text becomes damaged, like this: áƒáƒ‘გდევზთ
Is there any way to fix this problem?
Guys I've found the solution!!!!!!!!!! Huraaa !!!! :))))
For those who will face this problem in future here is the code
$article_header = mb_convert_encoding($article_header, 'HTML-ENTITIES', "UTF-8");
This must be done before loadHTML and everything works fine!!!!
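On newer PHP versions, passing 'HTML-ENTITIES' to mb_convert_encoding is deprecated (as of PHP 8.2). A possible alternative, sketched here on the assumption that the mbstring extension is available, is mb_encode_numericentity, which turns every non-ASCII character into a numeric entity before loadHTML sees it. The conversion map and the extra variable names are illustrative.

```php
<?php
$article_header = "აბგდევზთ<img src='some_url/img/15.jpg' alt=''>აბგდევზთ";

// Map: encode code points 0x80..0x10FFFF, no offset, full mask.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$encoded = mb_encode_numericentity($article_header, $convmap, 'UTF-8');

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($encoded);

foreach ($doc->getElementsByTagName('img') as $img) {
    $img->setAttribute('src', str_replace('/img/', '/mini/', $img->getAttribute('src')));
}

// The text survives the round trip; textContent is returned as UTF-8.
$text = $doc->documentElement->textContent;
$src  = $doc->getElementsByTagName('img')->item(0)->getAttribute('src');
echo $text, PHP_EOL, $src, PHP_EOL;
```

As with the original fix, the conversion has to happen before loadHTML is called.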