I'm using XPath to change the href of the stylesheet <link> elements in the page header, but it doesn't work at all.
$html=file_get_contents('http://stackoverflow.com');
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$css_links = $xpath->evaluate("//link[@type='text/css']");
for ($i = 0; $i < $css_links->length; $i++)
{
$csslink = $css_links->item($i);
$oldurl = $csslink->getAttribute('href');
$newURL='http://example.com/aaaa.css';
$csslink->removeAttribute('href');
$csslink->setAttribute('href', $newURL);
}
echo $html;
You're using @$doc->loadHTML(html); instead of @$doc->loadHTML($html); (note the $); otherwise it works.
Also, echo $doc->saveHTML() instead of echoing $html: $html is still the original string, and your changes only exist in the DOM.
You can also replace the for ($i ...) loop with foreach, because DOMNodeList implements Traversable:
foreach ($css_links as $csslink)
{
$oldurl = $csslink->getAttribute('href');
$newURL = 'http://example.com/aaaa.css';
$csslink->setAttribute('href', $newURL);
}
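For reference, here is a minimal self-contained sketch with both fixes applied. It parses an inline HTML string instead of fetching stackoverflow.com so it runs offline; the sample markup and the old stylesheet names are made up for illustration.

```php
<?php
// Sample markup standing in for the fetched page (illustrative only).
$html = '<html><head>'
      . '<link rel="stylesheet" type="text/css" href="/old-a.css">'
      . '<link rel="stylesheet" type="text/css" href="/old-b.css">'
      . '</head><body><p>Hi</p></body></html>';

$doc = new DOMDocument();
// Collect libxml warnings about imperfect real-world HTML instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$newUrl = 'http://example.com/aaaa.css';

// Attribute tests in XPath use @, not #.
foreach ($xpath->query("//link[@type='text/css']") as $link) {
    // setAttribute() overwrites the existing value, so removeAttribute() is not needed.
    $link->setAttribute('href', $newUrl);
}

// Serialize the modified DOM; echoing $html would print the untouched input.
$out = $doc->saveHTML();
echo $out;
```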
How do I change the outerHTML of an element using PHP's DOMDocument class? No third-party library such as Simple HTML DOM should be used.
For example, I want to do something like this:
$dom = new DOMDocument;
$dom->loadHTML($html);
$tag = $dom->getElementsByTagName('h3');
foreach ($tag as $e) {
$e->outerHTML = '<h5>Hello World</h5>';
}
libxml_clear_errors();
$html = $dom->saveHTML();
echo $html;
And the output should be like this:
Old Output: <h3>Hello World</h3>
But I need this new output: <p>Hello World</p>
You can copy the element's content and attributes into a new node with the name you need, then swap it in with replaceChild().
This code only works with simple elements (text directly inside the node); if you have nested elements, you will need to write a recursive function.
$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
$titles = $dom->getElementsByTagName('h3');
// the list is live, so iterate backwards while replacing nodes
for ($i = $titles->length - 1; $i >= 0; $i--)
{
$title = $titles->item($i);
$titleText = $title->textContent; // get original content of the node
$newTitle = $dom->createElement('h5'); // create a new node with the correct name
$newTitle->appendChild($dom->createTextNode($titleText)); // copy the content of the original node
// copy the attributes (class, style, ...)
foreach ($title->attributes as $attribute)
{
$newTitle->setAttribute($attribute->nodeName, $attribute->nodeValue);
}
$title->parentNode->replaceChild($newTitle, $title); // replace the original node with our copy
}
libxml_clear_errors();
$html = $dom->saveHTML();
echo $html;
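If you want to avoid the recursive copy, one alternative (a different technique from the answer above) is to move the original children into the new element instead of copying textContent. appendChild() detaches a node from its current parent, so nested markup comes along intact. A minimal sketch, with made-up sample markup:

```php
<?php
$html = '<div><h3 class="title" id="t1">Hello <em>World</em></h3></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// getElementsByTagName() returns a live list, so walk it backwards while replacing.
$titles = $dom->getElementsByTagName('h3');
for ($i = $titles->length - 1; $i >= 0; $i--) {
    $old = $titles->item($i);
    $new = $dom->createElement('h5');

    // Copy every attribute (class, id, style, ...).
    foreach ($old->attributes as $attr) {
        $new->setAttribute($attr->nodeName, $attr->nodeValue);
    }
    // Move (not copy) the children: appendChild() detaches each one from $old,
    // so nested elements such as <em> survive without a recursive function.
    while ($old->firstChild) {
        $new->appendChild($old->firstChild);
    }
    $old->parentNode->replaceChild($new, $old);
}

$out = $dom->saveHTML();
echo $out;
```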
Can you echo the results of a document parser directly, or do you first have to build an array to display them? Either way, when I run the code nothing appears (no output and no errors), and I have tried both methods. It could be a site issue, but I have tried a few other sites and get the same result.
<?php
$ebayquery ='halo';
$ebayhtml = 'https://www.ebay.com/sch/i.html_from=R40&_trksid=p2380057.m570.l1311.R6.TR12.TRC2.A0.H0.X.TRS0&_nkw=' . $ebayquery . '&_sacat=0';
$ebayresults = array();
$document = new \DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$document->loadHTML($ebayhtml);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXpath($document);
$links = $xpath->query('//h3[@id="lvtitle"]/a');
foreach($links as $a) {
echo $a->nodeValue;
}
?>
There are a couple of problems with the code. First, loadHTML() takes a string of HTML, not a filename or URI, so you have to fetch the web page first and pass its contents in (I've used file_get_contents() here).
Second, the XPath was looking for <h3> tags with an id attribute of lvtitle, but on the page lvtitle only ever appears as a class attribute. I've updated the XPath expression to match on the class instead.
$ebayquery ='halo';
$ebayhtml = 'https://www.ebay.com/sch/i.html_from=R40&_trksid=p2380057.m570.l1311.R6.TR12.TRC2.A0.H0.X.TRS0&_nkw=' . $ebayquery . '&_sacat=0';
$ebayresults = array();
$document = new \DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$ebayhtml = file_get_contents($ebayhtml);
$document->loadHTML($ebayhtml);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXpath($document);
$links = $xpath->query('//h3[@class="lvtitle"]/a');
print_r($links);
foreach($links as $a) {
echo $a->nodeValue.PHP_EOL;
}
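As for echoing versus building an array: both work once the query actually matches something. A minimal offline sketch; the markup only imitates the structure described above (an <h3 class="lvtitle"> wrapping a link) and is not eBay's real page:

```php
<?php
// Illustrative markup, not a copy of eBay's search results.
$html = '<ul>'
      . '<li><h3 class="lvtitle"><a href="/itm/1">Halo 3</a></h3></li>'
      . '<li><h3 class="lvtitle"><a href="/itm/2">Halo Reach</a></h3></li>'
      . '</ul>';

$document = new DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
// loadHTML() wants markup, so pass the page contents, never the URL itself.
$document->loadHTML($html);
libxml_use_internal_errors($internalErrors);

$xpath = new DOMXPath($document);
$titles = array();
foreach ($xpath->query('//h3[@class="lvtitle"]/a') as $a) {
    echo $a->nodeValue, PHP_EOL; // printing directly works...
    $titles[] = $a->nodeValue;   // ...and so does collecting into an array first
}
```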
I want to use DOMXPath to loop through the nodes of a DOM and stop when it reaches the first piece of text.
That way I can capture and delete the leading line breaks but leave the ones after Hello World:
$html = '<br><br><br>Hello World<br><br><br>';
I'm not sure what XPath query finds plain text, but I imagine the code would be something like this:
$doc = new DOMDocument();
$doc->loadHTML($html);
showDOMNode($doc);
$i = 1;
$dom_xpath = new DOMXpath($doc);
foreach($nodes as $node) {
do {
$node->parentNode->removeChild($node);
} while ($i > 0);
if($node == $xpath->query("/:TEXT")){
$i = 0;
}
}
It's just a rough piece of code, but I want something along those lines. Could somebody fill in the gaps for me, please?
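// wrapped in a function because the snippet returns its result; the name is illustrative
function removeLeadingBreaks($html)
{
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// remove every <br> that has no text node anywhere before it in document order
foreach($xpath->query('//br[not(preceding::text())]') as $node) {
$node->parentNode->removeChild($node);
}
return $doc->saveHTML();
}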
@cHao the man!
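The answer's predicate removes every <br> with no text before it. If you'd rather follow the question's original idea of walking the nodes and stopping at the first piece of text, a sketch along these lines should work. It relies on DOMXPath returning the union //br | //text() in document order, and it treats whitespace-only text as not counting (an assumption about what "text" should mean here):

```php
<?php
$html = '<br><br><br>Hello World<br><br><br>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// The union returns <br> elements and text nodes together, in document order.
// The result list is a snapshot, so removing nodes while looping is safe.
foreach ($xpath->query('//br | //text()') as $node) {
    if ($node instanceof DOMText) {
        if (trim($node->nodeValue) !== '') {
            break;    // first real text reached: stop removing
        }
        continue;     // ignore whitespace-only text nodes
    }
    $node->parentNode->removeChild($node);
}

$out = $doc->saveHTML();
echo $out;
```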
I'm working with a DOM parser and having issues. I'm basically trying to grab the href of the <a> tags that have the class 'thumbnail'. I've been trying to print the links on the screen but still get no results. Any help is appreciated. I also turned on error_reporting(E_ALL); and still nothing.
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$classId = "thumbnail ";
$div = $html->find('a#'.$classId);
echo $div;
I also tried this, but still got the same result: nothing.
include('simple_html_dom.php');
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab all the <a> elements on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
$ret = $html->find('a[class=thumbnail]');
echo $ret;
You were almost there:
<?php
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.reddit.com/r/funny');
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a[contains(concat(' ',normalize-space(@class),' '),' thumbnail ')]");
var_dump($hrefs);
Gives:
class DOMNodeList#28 (1) {
public $length =>
int(25)
}
25 matches, I'd call it success.
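Note that @class="thumbnail" is an exact string comparison, so it misses links whose class attribute lists more than one class (e.g. "thumbnail may-blank"), while a bare contains() also matches unrelated names such as "thumbnails". The concat()/normalize-space() idiom above matches the whole class token. A small offline comparison on made-up markup:

```php
<?php
$html = '<body>'
      . '<a class="thumbnail" href="/1">one</a>'
      . '<a class="thumbnail may-blank" href="/2">two</a>'
      . '<a class="thumbnails" href="/3">three</a>'
      . '</body>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Exact string match: misses elements that carry more than one class.
$exact = $xpath->query('//a[@class="thumbnail"]')->length;

// Plain substring match: also catches unrelated classes such as "thumbnails".
$substring = $xpath->query('//a[contains(@class, "thumbnail")]')->length;

// Whole-token match: pad with spaces so only the complete class name matches.
$token = $xpath->query(
    '//a[contains(concat(" ", normalize-space(@class), " "), " thumbnail ")]'
)->length;

echo "$exact $substring $token", PHP_EOL;
```

With this sample markup the three counts are 1, 3 and 2.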
This code would probably work:
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hyperlinks = $xpath->query('//a[@class="thumbnail"]');
foreach($hyperlinks as $hyperlink) {
echo $hyperlink->getAttribute('href'), '<br>';
}
If you're using simple_html_dom, why are you doing all these superfluous things? It already wraps the resource in everything you need: http://simplehtmldom.sourceforge.net/manual.htm
include('simple_html_dom.php');
// set up:
$html = new simple_html_dom();
// load from URL:
$html->load_file('http://www.reddit.com/r/funny');
// find those <a> elements:
$links = $html->find('a[class=thumbnail]');
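// find() returns an array of elements, so print each one's href:
foreach ($links as $link) {
echo $link->href, '<br>';
}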
I tested it and made some changes; this works perfectly too.
<?php
// load the url and set up an array for the links
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.reddit.com/r/funny');
$links = array();
// loop through all the <a> elements found
foreach($dom->getElementsByTagName('a') as $link) {
$url = $link->getAttribute('href');
$class = $link->getAttribute('class');
// Check if the URL is not empty and if the class contains thumbnail
if(!empty($url) && strpos($class,'thumbnail') !== false) {
array_push($links, $url);
}
}
// Print results
print_r($links);
?>
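One caveat with strpos($class, 'thumbnail'): it also accepts classes such as thumbnails. If you need the exact class name, you can split the attribute on whitespace and compare whole tokens. A sketch on made-up markup; hasClass() is a hypothetical helper, not part of DOMDocument:

```php
<?php
// Hypothetical helper: true only when $class contains $wanted as a whole class name.
function hasClass($class, $wanted)
{
    $tokens = preg_split('/\s+/', trim($class), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($wanted, $tokens, true);
}

$html = '<a class="thumbnail may-blank" href="/1">a</a>'
      . '<a class="thumbnails" href="/2">b</a>'
      . '<a class="thumbnail" href="">c</a>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$links = array();
foreach ($dom->getElementsByTagName('a') as $link) {
    $url = $link->getAttribute('href');
    // Skip empty hrefs and anything whose class list lacks the exact token.
    if ($url !== '' && hasClass($link->getAttribute('class'), 'thumbnail')) {
        $links[] = $url;
    }
}
print_r($links);
```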
Why isn't the script in the example below working (i.e. the script doesn't execute in the browser)?
$xpath = new DOMXpath($doc);
$nodes = $xpath->query( "//div[@class = 'ad_stream_hd']");
foreach( $nodes as $node) {
$node->nodeValue = '<script type="text/javascript" src="http://clkrev.com/adServe/banners?tid=SPORTVE158X21&size=158x21" ></script>';
}
The node value is just text; it's not like innerHTML, where you can assign a string containing markup. A document fragment gets you something close, but appendXML() parses XML rather than HTML, so your markup has to be well-formed XML (for example, a bare & in a URL must be written as &amp;).
$xpath = new DOMXpath($doc);
$nodes = $xpath->query( "//div[#class = 'ad_stream_hd']");
if ($nodes->length > 0){
$node = $nodes->item($nodes->length-1);
$fragment = $doc->createDocumentFragment();
$fragment->appendXML('<script type="text/javascript" src="http://clkrev.com/adServe/banners?tid=SPORTVE158X21&amp;size=158x21" ></script>');
$node->appendChild($fragment);
}
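Edit: to apply it to every matching node, iterate over the DOMNodeList:
$nodes_length = $nodes->length;
for ($i=0; $i < $nodes_length; $i++) {
// a fragment is emptied when appended, so build a fresh one for each node
$fragment = $doc->createDocumentFragment();
$fragment->appendXML('<script type="text/javascript" src="http://clkrev.com/adServe/banners?tid=SPORTVE158X21&amp;size=158x21" ></script>');
$nodes->item($i)->appendChild($fragment);
}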
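To see the difference the answer describes, this sketch sets nodeValue on one div and appends a parsed fragment to another, then counts the resulting <script> elements. The script URL is a placeholder; note the &amp; needed because appendXML() parses XML:

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<div class="ad_stream_hd"></div><div class="ad_stream_hd"></div>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//div[@class = 'ad_stream_hd']");
// Placeholder URL; the & is escaped so the string is well-formed XML.
$markup = '<script type="text/javascript" src="http://example.com/ad.js?a=1&amp;b=2"></script>';

// 1) nodeValue: the string becomes a text node, so the markup is not parsed.
$nodes->item(0)->nodeValue = $markup;

// 2) Document fragment: appendXML() parses the string into real element nodes.
$fragment = $doc->createDocumentFragment();
$ok = $fragment->appendXML($markup);
$nodes->item(1)->appendChild($fragment);

$scriptsInFirst  = $nodes->item(0)->getElementsByTagName('script')->length;
$scriptsInSecond = $nodes->item(1)->getElementsByTagName('script')->length;
echo $doc->saveHTML();
```

Only the second div ends up with a real <script> element; the first just contains the markup as escaped text.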