Fetching image using xpath or some other way - php

I need to fetch an image from a remote page. I tried XPath, but I was told it won't work because an img element has no nodeValue.
I was then advised to use getAttribute, but I don't know how to get it working.
Any suggestions?
This is my code:
<?php
libxml_use_internal_errors(true);
//Setting content type to xml!
header('Content-type: application/xml');
//Query-string field name is bWV0aG9k
$url_prefix = $_GET['bWV0aG9k'];
$url_http_request_encode = strpos($url_prefix, "http://");
//Checking to see if url has a http prefix
if($url_http_request_encode === false){
//does not have, add it!
$fetchable_url_link_consistancy_remote_data = "http://".$url_prefix;
}
else
//has it, do nothing
{
$fetchable_url_link_consistancy_remote_data = $url_prefix;
}
//Creating a new DOM Document on top of pre-existing one
$page = new DOMDocument();
//Loading the requested file
$page->loadHTMLFile($fetchable_url_link_consistancy_remote_data);
//Initializing xpath
$xpath = new DOMXPath($page);
//Search parameters
//Searching for title attribute
$query = "//title";
//Searching for paragraph attribute
$query1 = "//p";
//Searching for thumbnails
$query2 = "//img";
//Binding the attributes to xpath for later use
$title = $xpath->query($query);
$paragraph = $xpath->query($query1);
$images = $xpath->query($query2);
echo "<remotedata>";
//Echoing the attributes
echo "<title-render>".$title->item(0)->nodeValue."</title-render>";
echo "<paragraph>".$paragraph->item(0)->nodeValue."</paragraph>";
echo "<image_link>".$images->item(0)->nodeValue."</image_link>";
echo "</remotedata>";
?>

You should read the src attribute of the img element:
$images->item(0)->getAttribute('src');

If this is normal XHTML, an img element has no text value; you need the value of its src attribute, which XPath addresses as img/@src.
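To make both suggestions concrete, here is a minimal sketch. The inline $html string and the /a.png and /b.png paths are made up for illustration; in the question the markup comes from loadHTMLFile() instead. It reads the src either through getAttribute() on the element or by selecting the attribute nodes with //img/@src.

```php
<?php
// Hypothetical markup standing in for the remote page.
$html = '<html><head><title>Demo</title></head><body>'
      . '<p>Text</p><img src="/a.png" alt=""><img src="/b.png"></body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$page = new DOMDocument();
$page->loadHTML($html);
$xpath = new DOMXPath($page);

// Option 1: select the element, then read its attribute.
$first = $xpath->query('//img')->item(0);
$src = $first instanceof DOMElement ? $first->getAttribute('src') : '';

// Option 2: select the attribute nodes directly; here nodeValue does work.
$all = [];
foreach ($xpath->query('//img/@src') as $attr) {
    $all[] = $attr->nodeValue;
}

// Escape the value before placing it inside the XML response.
$xml = '<image_link>' . htmlspecialchars($src, ENT_XML1) . '</image_link>';
```

The instanceof check covers pages that contain no img at all, where item(0) returns null.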

Related

DOMDocument->saveHTML isn't working

An API returns a chunk of HTML (only part of the body, not a full document), and I want to replace every image's src with another value.
I get and set the attribute, and if I echo it inside the foreach loop I see both the old and the new value. But when I call saveHTML() and then dump the HTML block that the API returned, the paths are not replaced.
$page = json_decode($page);
$page = (array) $page->rows;
$page = ($page[0]->_->content);
$dom = new \DOMDocument();
$dom->loadHTML($page);
$tag = $dom->getElementsByTagName('img');
foreach($tag as $t)
{
echo $t->getAttribute('src').'<br>'; //showing old src
$t->setAttribute('src', 'bla');
echo $t->getAttribute('src').'<br>'; //showing new src
}
$dom->saveHTML();
var_dump($page); //nothing is changed
My friend, this is not how it works.
The edited HTML is the return value of saveHTML(), so:
$editedHtml = $dom->saveHTML();
var_dump($editedHtml);
Now you should see your changed HTML.
The explanation is that $page is just the original string; loadHTML() parsed it into the separate $dom object, so changes made through $dom never touch $page.
Cheers!
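Since the API only returns part of a body, a plain saveHTML() would also hand back the doctype and the html/body wrappers that loadHTML() adds. Here is a rough sketch of one way around that, assuming a made-up $fragment string in place of the API response: serialize only the children of body.

```php
<?php
// Hypothetical fragment standing in for the API response.
$fragment = '<p>Hello</p><img src="old.png" alt="x">';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($fragment);

foreach ($dom->getElementsByTagName('img') as $img) {
    $img->setAttribute('src', 'new.png');
}

// saveHTML($node) serializes a single node, so concatenating the children
// of <body> gives the fragment back without the added wrappers.
$body = $dom->getElementsByTagName('body')->item(0);
$edited = '';
foreach ($body->childNodes as $child) {
    $edited .= $dom->saveHTML($child);
}

echo $edited;
```

$fragment itself still holds the old markup afterwards; only $edited carries the new src.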

Image pulling script not working PHP

I'm not sure why the code below is not working. It always takes the else branch of the if statement, which means that no img tags were found on the page, but I'm sure they are there. Any advice or guidance would be appreciated.
// This variable will contain all the HTML source code of the sample page
$htmlContent = file_get_contents('https://www.instagram.com/ken_flavius/');
var_dump($htmlContent);
// We'll add all the images in this array
$images = [];
// Instantiate a new object of class DOMDocument
$doc = new DOMDocument();
// Load the HTML doc into the object
$doc->loadHTML($htmlContent);
// Get all the IMG tags in the document
$elements = $doc->getElementsByTagName('img');
// If we get at least one result
if($elements->length > 0)
{
// Loop on all of the IMG tags
foreach($elements as $element)
{
// Get the attribute SRC of the IMG tag (this is the link of the image)
$src = $element->getAttribute('src');
if (strlen($src) > 0) {
// Add the link to the array containing all the links
array_push($images, $src);
}
}
//show all links
echo '<pre>'."\r\n";
print_r($images);
echo '</pre>'."\r\n";
} else {
// No result, it means that there were no IMG tags
echo 'no img tag found in the HTML source provided!';
}
Edited to show the exact example that I'm using.
$url="http://example.com";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
echo $tag->getAttribute('src');
}
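One possible reason for ending up in the else branch is that the markup returned by file_get_contents() simply contains no img elements, for example when the page builds them in the browser with JavaScript. A small helper makes that easy to check against a known string first. This is only a sketch: extractImageSources() and the sample strings are made up for illustration, and it uses libxml_use_internal_errors() in place of the @ operator.

```php
<?php
/**
 * Return the non-empty src values of all <img> elements in an HTML string.
 * Hypothetical helper, not part of any library.
 */
function extractImageSources(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    $previous = libxml_use_internal_errors(true); // collect warnings quietly
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Static markup with images versus an empty shell that a script would fill in.
$sources = extractImageSources('<div><img src="a.jpg"><img alt="no source"><img src="b.jpg"></div>');
$none    = extractImageSources('<div id="app"></div>');

print_r($sources);
print_r($none);
```

If the helper returns an empty array for the downloaded page but works on the sample, the problem is in what was downloaded, not in the DOM code.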

Regex Replacement Dependent On Class

I have the following code that finds every img tag on a page and adds the nCode image resizer to it. The code is as follows:
function ncode_the_content($content) {
return preg_replace("/<img([^`|>]*)>/im", "<img onload=\"NcodeImageResizer.createOn(this);\"$1>", $content);
}
What I need is for an image that has the class "noresize" to be skipped by the preg_replace.
I have only managed to get it so that if there is the "noresize" class anywhere on the page it stops resizing all images instead of just the one with the correct class.
Any suggestions?
UPDATE:
Am I even remotely in the right ballpark with this?
function ncode_the_content($content) {
//Load the HTML page
$html = file_get_contents($content);
//Parse it. Here we use loadHTML as a static method
//to parse the HTML and create the DOM object in one go.
@$dom = DOMDocument::loadHTML($html);
//Init the XPath object
$xpath = new DOMXpath($dom);
//Query the DOM
$linksnoresize = $xpath->query( 'img[@class = "noresize"]' );
$links = $xpath->query( 'img[]' );
//Display the results as in the previous example
foreach($links as $link){
echo $link->getAttribute('onload'), 'NcodeImageResizer.createOn(this);';
}
foreach($linksnoresize as $link){
echo $link->getAttribute('onload'), '';
}
}
Here's some untested code:
$dom = new DOMDocument();
$dom->loadHTML($content);
$images = $dom->getElementsByTagName("img");
foreach ($images as $image) {
// Skip images whose class attribute contains "noresize"
if (!strstr($image->getAttribute("class"), "noresize")) {
$image->setAttribute("onload", "NcodeImageResizer.createOn(this);");
}
}
// The modified markup has to be serialized again
$content = $dom->saveHTML();
But, if it were me, I would eschew any such inline event handler and instead just find the appropriate elements with JavaScript.
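For what it's worth, the same idea can be sketched with XPath so that only the exact class token "noresize" is skipped (a substring test would also skip a class such as "noresizer"). addResizerOnload() and the sample markup are made up for illustration, and the function returns only the contents of body because loadHTML() wraps a fragment in html/body.

```php
<?php
/**
 * Add the nCode onload handler to every <img> that does not carry the
 * class token "noresize". Hypothetical helper; a sketch, not tested in WordPress.
 */
function addResizerOnload(string $content): string
{
    if (trim($content) === '') {
        return $content;
    }

    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($content);
    libxml_clear_errors();

    // Pad the class list with spaces so that only a whole token matches.
    $query = '//img[not(contains(concat(" ", normalize-space(@class), " "), " noresize "))]';
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query($query) as $img) {
        $img->setAttribute('onload', 'NcodeImageResizer.createOn(this);');
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $content;
    }

    // Serialize only the children of <body>.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$result = addResizerOnload(
    '<p><img src="a.png"><img class="thumb noresize" src="b.png"><img class="noresizer" src="c.png"></p>'
);
echo $result;
```

Here a.png and c.png get the handler, and b.png is left alone.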
I ended up just using pure CSS and adding a div around the images I didn't want to be resized. I forced the width and height of that div back to auto and then removed the warning message that was displayed above them. Seems to work fine. Thanks for your help :)

How to extract title and meta description using PHP Simple HTML DOM Parser?

How can I extract a page's title and meta description using the PHP Simple HTML DOM Parser?
I just need the title of the page and the keywords in plain text.
$html = new simple_html_dom();
$html->load_file('some_url');
//To get Meta Title
$meta_title = $html->find("meta[name='title']", 0)->content;
//To get Meta Description
$meta_description = $html->find("meta[name='description']", 0)->content;
//To get Meta Keywords
$meta_keywords = $html->find("meta[name='keywords']", 0)->content;
NOTE: The names of meta tags are case-sensitive!
I just took a look at the HTML DOM Parser, try:
$html = new simple_html_dom();
$html->load_file('xxx'); //put url or filename in place of xxx
$title = $html->find('title');
echo $title->plaintext;
$descr = $html->find('meta[description]');
echo $descr->plaintext;
$html = new simple_html_dom();
$html->load_file('http://www.google.com');
$title = $html->find('title',0)->innertext;
$html->find('title') returns an array, so you should use $html->find('title', 0) to get the first match.
The same applies to the meta description lookup.
Taken from LeiXC's solution above, you need to use the simple html dom class:
$dom = new simple_html_dom();
$dom->load_file( 'websiteurl.com' );// put your own url in here for testing
$html = str_get_html($dom);
$descr = $html->find("meta[name=description]", 0);
$description = $descr->content;
echo $description;
I have tested this code, and yes, it is case-sensitive (some meta tags use a capital D for Description).
Here is a check that covers both spellings:
if( is_object( $html->find("meta[name=description]", 0)) ){
echo $html->find("meta[name=description]", 0)->content;
} elseif( is_object( $html->find("meta[name=Description]", 0)) ){
echo $html->find("meta[name=Description]", 0)->content;
}
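If the case of the name attribute is the main worry, here is a sketch that swaps Simple HTML DOM for PHP's built-in DOMDocument and compares the name in lower case. extractTitleAndDescription() and the sample markup are made up for illustration.

```php
<?php
/**
 * Return the page title and meta description from an HTML string.
 * Hypothetical helper built on DOMDocument rather than Simple HTML DOM.
 */
function extractTitleAndDescription(string $html): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $titleNode = $doc->getElementsByTagName('title')->item(0);
    $title = $titleNode ? trim($titleNode->textContent) : null;

    $description = null;
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Lower-casing the name matches "description" and "Description" alike.
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $description = $meta->getAttribute('content');
            break;
        }
    }

    return ['title' => $title, 'description' => $description];
}

$sample = '<html><head><title> Demo </title>'
        . '<meta name="Description" content="A demo page"></head><body></body></html>';
$result = extractTitleAndDescription($sample);
print_r($result);
```

Both values come back as null when the page has no title or no description, so the caller can test for that instead of calling a method on a missing node.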
$html->find('meta[name=keywords]',0)->attr['content'];
$html->find('meta[name=description]',0)->attr['content'];
$html = new simple_html_dom();
$html->load_file('xxx');
//put url or filename in place of xxx
$title = array_shift($html->find('title'))->innertext;
echo $title;
$descr = array_shift($html->find("meta[name='description']"))->content;
echo $descr;
You can also use PHP's built-in get_meta_tags() function, which is simple to use, like this:
$result = 'site.com';
$tags = get_meta_tags("html/".$result);
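A quick sketch of how the returned array looks, using a temporary file with made-up content in place of a real page. get_meta_tags() lower-cases the name of each tag when it builds the array keys, which also sidesteps the Description/description issue.

```php
<?php
// Write a small hypothetical page to a temporary file so the example is self-contained.
$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, implode("\n", [
    '<html><head><title>Demo</title>',
    '<meta name="Description" content="A demo page">',
    '<meta name="keywords" content="php, dom">',
    '</head><body></body></html>',
]));

// Keys are the lower-cased name attributes, values are the content attributes.
$tags = get_meta_tags($path);
unlink($path);

echo $tags['description'] ?? '(no description)', "\n";
echo $tags['keywords'] ?? '(no keywords)', "\n";
```

Note that get_meta_tags() only covers meta tags; the title still has to be read some other way.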
The correct answer is:
$html = str_get_html($html);
$descr = $html->find("meta[name=description]", 0);
$description = $descr->content;
The above code gets html into an object format, then the find method looks for a meta tag with the name description, and finally you need to return the value of the meta tag's content, not the innertext or plaintext as outlined by others.
This has been tested and used in live code. Best
I found an easy way to get the title and description:
$html = new simple_html_dom();
$html->load_file('your_url');
$title = $html->find('title', 0)->plaintext; //<title>**Text from here**</title>
$description = $html->find("meta[name='description']", 0)->content; //<meta name="description" content="**Text from here**">
If your line contains extra spaces, then try this
$title = trim($title);
$description = trim($description);

PHP Dom problem, how to insert html code in a particular div

I am trying to replace the html code inside the div 'resultsContainer' with the html of $response.
The result of my unsuccessful code is that the contents of 'resultsContainer' remain and the html of $response shows up on screen as text rather than being parsed as html.
Finally, I would like to inject the content of $response into 'resultsContainer' without creating any new div. I need this: <div id='resultsContainer'>Html inside $response here...</div> and NOT this: <div id='resultsContainer'><div>Html inside $response here...</div></div>
// Set Config
libxml_use_internal_errors(true);
$doc = new DomDocument();
$doc->strictErrorChecking = false;
$doc->validateOnParse = true;
// load the html page
$app = file_get_contents('index.php');
$doc->loadHTML($app);
// get the dynamic content
$response = file_get_contents('search.php'.$query);
$response = utf8_decode($response);
// add dynamic content to corresponding div
$node = $doc->createElement('div', $response);
$doc->getElementById('resultsContainer')->appendChild($node);
// echo html snapshot
echo $doc->saveHTML();
If $response is plain text:
// add dynamic content to corresponding div
$node = $doc->createTextNode($response);
$doc->getElementById('resultsContainer')->appendChild($node);
If it can contain HTML (one could use createDocumentFragment, but that brings its own set of trouble with entities, DTDs, etc.):
// add dynamic content to corresponding div
$frag = new DomDocument();
$frag->strictErrorChecking = false;
$frag->validateOnParse = true;
$frag->loadHTML($response);
$target = $doc->getElementById('resultsContainer');
if(isset($target->childNodes) && $target->childNodes->length){
for($i = $target->childNodes->length -1; $i >= 0;$i--){
$target->removeChild($target->childNodes->item($i));
}
}
//if there's lots of content in $target, you might try this:
//$target->parentNode->replaceChild($target->cloneNode(false),$target);
foreach($frag->getElementsByTagName('body')->item(0)->childNodes as $node){
$target->appendChild($doc->importNode($node,true));
}
Which goes to show that DOMDocument isn't really suited to being used as a templating engine (or is at least cumbersome for it).
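For reference, the same steps can be wrapped into one function. This is only a sketch: replaceInnerHtml() and the sample markup are made up for illustration, the id is assumed to contain no quote characters, and the target is looked up through XPath rather than getElementById().

```php
<?php
/**
 * Replace the children of the element with the given id by the parsed $html.
 * Hypothetical helper; $id is assumed to contain no quote characters.
 */
function replaceInnerHtml(DOMDocument $doc, string $id, string $html): void
{
    $xpath = new DOMXPath($doc);
    $target = $xpath->query('//*[@id="' . $id . '"]')->item(0);
    if ($target === null || trim($html) === '') {
        return;
    }

    // Remove whatever is currently inside the target.
    while ($target->firstChild) {
        $target->removeChild($target->firstChild);
    }

    // Parse the new markup in a separate document.
    libxml_use_internal_errors(true);
    $frag = new DOMDocument();
    $frag->loadHTML($html);
    libxml_clear_errors();

    $body = $frag->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return;
    }

    // importNode() copies each node into $doc, so no wrapper div is needed.
    foreach ($body->childNodes as $node) {
        $target->appendChild($doc->importNode($node, true));
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="resultsContainer"><span>old</span></div></body></html>');
replaceInnerHtml($doc, 'resultsContainer', '<p>New <b>results</b></p>');
$out = $doc->saveHTML();
echo $out;
```

The new nodes become direct children of the resultsContainer div, with no extra div in between.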
