PHP DOMDocument Strings to Objects - php

I have written a PHP script using DOMDocument that scrapes multiple HTML files for all p tags with a specific class.
I then want to take the values inside those p tags and build an unordered list with DOMDocument.
My problem: I can get the values and echo all of them onto the page, but when I call createElement and append each value in its own li tag, the result contains only the LAST item in the list. Here is the code:
$dom = new DOMDocument();
$dom->formatOutput = true;
$dom->preserveWhiteSpace = false;
//looping through an array
foreach ($pages as $page) {
foreach ($page['pageContent'] as $listlinks) {
$dom->loadHTMLFile($theurl . 'content_id_' . $listlinks['content'] . '.html');
//create the xPath object after loading the html source, otherwise the query won't work:/
$xPath = new DOMXPath($dom);
//get the p nodes in a DOMNodeList that has class"content_header_type_2":
$nodeList = $xPath->query("//p[@class='content_header_type_2']");
//create a new DOMDocument and add a ul element:
$newDom = new DOMDocument();
$ul = $newDom->createElement('ul');
$newDom->appendChild($ul);
// append all nodes from $nodeList to the new dom, as children of $ul:
foreach ($nodeList as $domElement) {
$domNode = $newDom->importNode($domElement, true);
echo $domNode->nodeValue . '<br>'; //This gives the entire list
$li = $newDom->createElement('li', $domNode->nodeValue); //This gives the last value in the list
$ul->appendChild($li);
}
}
};
$output = $newDom ->saveHTML();
echo $output;
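The symptom is consistent with $newDom and $ul being re-created on every pass of the inner loop, so only the list built from the last file reaches saveHTML(). A minimal sketch of one possible fix, assuming that is the cause: create the output document and its ul once, before either loop, and only append inside. The $sources array is a hypothetical stand-in for the scraped files so the example runs on its own.

```php
<?php
// Hypothetical stand-ins for the scraped files; the question loads them with loadHTMLFile().
$sources = [
    '<html><body><p class="content_header_type_2">First</p><p>skip</p></body></html>',
    '<html><body><p class="content_header_type_2">Second</p></body></html>',
];

// Create the output document and its <ul> once, before any loop.
$newDom = new DOMDocument();
$newDom->formatOutput = true;
$ul = $newDom->createElement('ul');
$newDom->appendChild($ul);

libxml_use_internal_errors(true); // collect parser warnings instead of printing them

foreach ($sources as $source) {
    $dom = new DOMDocument();
    $dom->loadHTML($source);
    libxml_clear_errors();

    $xPath = new DOMXPath($dom);
    foreach ($xPath->query("//p[@class='content_header_type_2']") as $p) {
        // Append the text as a text node so special characters are escaped.
        $li = $newDom->createElement('li');
        $li->appendChild($newDom->createTextNode($p->textContent));
        $ul->appendChild($li);
    }
}

$output = $newDom->saveHTML();
echo $output;
```

Since only the text of each p is used, importNode() is not needed here; it would matter if the p elements themselves were copied into the new document.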

Related

PHP xpath how to get start tag

I am trying to fetch a form start tag with attributes from a DomDocument loaded with a HTML string.
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$result = $xpath->query('//form[@class="af-form acf-form"]');
if ($result->length > 0) {
echo '<pre>';
print_r(($result->item(0)->C14N()));
echo '</pre>';
die();
}
But this way it prints out the entire form. I would like to fetch only this bit:
<form action="http://localhost/wp-test/form-loose" class="af-form acf-form" id="form_5b72d1cd12cc0" method="POST">
How to do so?
XPath fetches nodes, not opening/closing tags. The DOM is a hierarchy of objects; only the serialized (HTML) string has opening and closing tags.
However, here are two possible approaches:
Clone the node without its child nodes. Save the clone and remove the closing tag with a string function.
$html = <<<'HTML'
<form
action="http://localhost/wp-test/form-loose"
class="af-form acf-form" id="form_5b72d1cd12cc0" method="POST">
some other stuff
<input>
</form>
HTML;
$document = new DOMDocument();
@$document->loadHTML($html);
$xpath = new DOMXpath($document);
$result = $xpath->evaluate('//form[@class="af-form acf-form"][1]');
foreach ($result as $node) {
echo substr($document->saveHTML($node->cloneNode()), 0, -7);
}
Output:
<form action="http://localhost/wp-test/form-loose" class="af-form acf-form" id="form_5b72d1cd12cc0" method="POST">
Or serialize each attribute yourself:
$result = $xpath->evaluate('//form[@class="af-form acf-form"][1]');
foreach ($result as $node) {
// build the start tag in its own variable so $result is not overwritten mid-loop
$tag = '<'.$node->nodeName;
foreach ($node->attributes as $attribute) {
// each serialized attribute is appended directly after the tag name
$tag .= $document->saveHTML($attribute);
}
$tag .= '>';
echo $tag;
}
Note: Adding [1] to the XPath expression limits the result list to the first found node.

get value of href inside of div from external site using PHP

Good day.
I want to look up a certain HTML attribute on an external website.
I want to get the a href value, but the problem is that the id, class, or name is random.
<div class="static">
Dynamic
</div>
This code should display all the hrefs on http://example.com.
Here I use DOMDocument and XPath to select the elements you want to access, because the combination is flexible and easy to use.
<?php
$html = file_get_contents("http://example.com");
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DomXPath($doc);
$nodeList = $xpath->query("//a/@href");
print_r($nodeList);
// To access the values inside nodes
foreach($nodeList as $node){
echo "<p>" . $node->nodeValue . "</p>";
}
Use jQuery to get the value as follows:
var link = $(".static>a").attr("href");
You can use PHP DOMDocument:
<?php
$exampleurl = "http://YourDomain.com"; //set your url
$filterClass = "dynamicclass";
$dom = new DOMDocument('1.0');
@$dom->loadHTMLFile($exampleurl);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$href = $element->getAttribute('href'); // all href
$class = $element->getAttribute('class');
if($class==$filterClass){
echo $href;
}
}
?>
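Since only the div's class is stable, the XPath can also anchor on the div rather than on the link. A minimal sketch, assuming the link is a direct child of the div with class "static" (as the jQuery selector above implies); the inline markup, URLs, and the random-looking class name are hypothetical.

```php
<?php
// Hypothetical markup: the anchor's own class is random, the wrapping div's class is not.
$html = '<div class="static"><a class="x9f3" href="http://example.com/target">Dynamic</a></div>'
      . '<div class="other"><a href="http://example.com/ignored">Other</a></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select the href attribute of every <a> that is a direct child of div.static.
$hrefs = [];
foreach ($xpath->query("//div[@class='static']/a/@href") as $attr) {
    $hrefs[] = $attr->nodeValue;
}

print_r($hrefs);
```

For a live page, the $html string would come from file_get_contents() or loadHTMLFile() as in the answers above.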

Using DOMdocument createElement in foreach

I want to generate content from an array and place it in a document that I loaded from a site. This is what I have so far. Is there a way to do this with the foreach? Is there an append function?
$dom = new \DOMDocument();
$dom->loadHTML($data);
// create the new element
$newNode = $dom->createElement('div', 'this is new');
$newNode->setAttribute('id', 'new_div');
//foreach
$count = 0;
foreach($tags->contents as $content){
$contents[$count] = $content->text;
}
// fetch and replace the old element
$oldNode = $dom->getElementById('blog-xpath');
$oldNode->parentNode->replaceChild($newNode, $oldNode);
$nodes = $dom->saveHTML($dom->documentElement);
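DOMNode::appendChild() is the append function in question. A minimal sketch of one way to use it inside the foreach, assuming each array entry should become its own p element inside the new div; $data and $texts are hypothetical stand-ins for the loaded page and for the $content->text values.

```php
<?php
// Hypothetical input; in the question these come from $data and $tags->contents.
$data  = '<html><body><div id="blog-xpath">old</div></body></html>';
$texts = ['first entry', 'second entry'];

$dom = new DOMDocument();
$dom->loadHTML($data);

// Create the empty container that will replace the old element.
$newNode = $dom->createElement('div');
$newNode->setAttribute('id', 'new_div');

// Create one child element per array entry and append it to the container.
foreach ($texts as $text) {
    $p = $dom->createElement('p');
    $p->appendChild($dom->createTextNode($text));
    $newNode->appendChild($p);
}

// Fetch and replace the old element, guarding against a missing id.
$oldNode = $dom->getElementById('blog-xpath');
if ($oldNode !== null) {
    $oldNode->parentNode->replaceChild($newNode, $oldNode);
}

$html = $dom->saveHTML($dom->documentElement);
echo $html;
```

Nodes must be created by the document they are appended to, which is why every createElement() call goes through $dom.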

How to query a DOMNode using XPath in PHP?

I'm trying to get the bing search results with XPath. Here is my code:
$html = file_get_contents("http://www.bing.com/search?q=bacon&first=11");
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHtml($html);
$x = new DOMXpath($doc);
$output = array();
// just grab the urls for now
foreach ($x->query("//li[@class='b_algo']") as $node)
{
//$output[] = $node->getAttribute("href");
$tmpDom = new DOMDocument();
$tmpDom->loadHTML($node);
$tmpDP = new DOMXPath($tmpDom);
echo $tmpDP->query("//div[@class='b_title']//h2//a//href");
}
return $output;
This foreach iterates over all results. All I want is to extract the link and text from $node inside the loop, but because $node is itself an object, I can't create a DOMDocument from it. How can I query it?
First of all, your XPath expression tries to match nonexistent href subelements; query @href for the attribute instead.
You don't need to create any new DOMDocuments; just pass $node as the context item:
foreach ($x->query("//li[@class='b_algo']") as $node)
{
var_dump( $x->query("./div[@class='b_title']//h2//a//@href", $node)->item(0) );
}
If you're just interested in the URLs, you could also query them directly:
foreach ($x->query("//li[@class='b_algo']/div[@class='b_title']/h2/a/@href") as $node)
{
var_dump($node);
}
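The question asks for both the link and its text. A minimal sketch of the context-node approach that collects both, assuming the structure the question's XPath implies; the inline markup and URLs are hypothetical and stand in for the fetched results page.

```php
<?php
// Hypothetical markup imitating the structure the question queries.
$html = '<ul>'
      . '<li class="b_algo"><div class="b_title"><h2><a href="http://example.com/a">Result A</a></h2></div></li>'
      . '<li class="b_algo"><div class="b_title"><h2><a href="http://example.com/b">Result B</a></h2></div></li>'
      . '</ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$x = new DOMXPath($doc);

$output = [];
foreach ($x->query("//li[@class='b_algo']") as $node) {
    // The second argument makes $node the context for the relative path.
    $link = $x->query("./div[@class='b_title']//h2//a", $node)->item(0);
    if ($link instanceof DOMElement) {
        $output[] = [
            'href' => $link->getAttribute('href'),
            'text' => trim($link->textContent),
        ];
    }
}

print_r($output);
```

Selecting the a element rather than its @href keeps both the attribute and the text reachable from a single query.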

Simple HTML DOM gets only 1 element

I'm following a simplified version of the scraping tutorial by NetTuts here, which basically finds all divs with class=preview
http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/comment-page-1/#comments
This is my code. The problem is that when I count $items I get only 1, so it's getting only the first div with class=preview, not all of them.
$articles = array();
$html = new simple_html_dom();
$html->load_file('http://net.tutsplus.com/page/76/');
$items = $html->find('div[class=preview]');
echo "count: " . count($items);
Try using DOMDocument and DOMXPath:
$file = file_get_contents('http://net.tutsplus.com/page/76/');
$dom = new DOMDocument();
@$dom->loadHTML($file);
$domx = new DOMXPath($dom);
$nodelist = $domx->evaluate("//div[@class='preview']");
foreach ($nodelist as $node) { print $node->nodeValue; }
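One caveat with the expression above: @class='preview' compares the whole attribute value, so a div with class="preview featured" would not match. A minimal sketch of a token-based match, in case the target page uses several classes per div; the inline markup is hypothetical.

```php
<?php
// Hypothetical markup: one exact match, one multi-class match, one near miss.
$html = '<div class="preview">One</div>'
      . '<div class="preview featured">Two</div>'
      . '<div class="previews">Three</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$domx = new DOMXPath($dom);

// Pad the normalized class list with spaces so " preview " matches whole tokens only.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' preview ')]";

$texts = [];
foreach ($domx->query($query) as $node) {
    $texts[] = $node->nodeValue;
}

print_r($texts);
```

A plain contains(@class, 'preview') would also match the "previews" div, which is why the value is padded with spaces first.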
