Loop through elements and parse them with DOMDocument() in PHP

I have a list of items like this:
<div class="list">
<div class="ui_checkbox type hidden" data-categories="57 48 ">
<input id="attraction_type_119" type="checkbox" value="119"
<label for="attraction_type_119">Aquariums</label>
</div>
<div class="ui_checkbox type " data-categories="47 ">
<input id="attraction_type_120" type="checkbox" value="120"
<label for="attraction_type_120">Arènes et stades</label>
</div>
</div>
How can I loop through them with DOMDocument to get details like:
data-categories
input value
label text
This is what I tried:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXpath($dom);
$elements = $dom->getElementsByTagName('div');
$data = array();
foreach($elements as $node){
foreach($node->childNodes as $child) {
$data['data_categorie'] = $child->item(0)->getAttribute('data_categories');
$data['input_value'] = $child->item(0)->getAttribute('input_value');
$data['label_text'] = $child->item(0)->getAttribute('label_text');
}
}
But it doesn't work.
What am I missing here, please?
Thanks.

Assigning to the same keys of $data on every iteration (for example $data['data_categorie'] = ...) overwrites the values from the previous iteration, so only the last item survives.
As you have multiple items, you could create a temporary array $temp = []; to hold the values for the current item, and append that array to $data at the end of each iteration.
As you are already using DOMXpath, you could select the child divs of the div with class="list" using an expression like //div[@class="list"]/div, loop over the childNodes looking for nodeName input, and take its value plus the value of its next sibling, which holds the label text. Note also that the attribute is named data-categories (with a hyphen), and that the input's attribute is value, not input_value.
$data = array();
$xp = new DOMXpath($dom);
$items = $xp->query('//div[@class="list"]/div');
foreach($items as $item) {
    $temp = []; // start with a fresh array for each item
    $temp["data_categorie"] = $item->getAttribute("data-categories");
    foreach ($item->childNodes as $child) {
        if ($child->nodeName === "input") {
            $temp["input_value"] = $child->getAttribute("value");
            $temp["label_text"] = $child->nextSibling->nodeValue;
        }
    }
    $data[] = $temp; // append this item's values to the result
}
print_r($data);
Output
Array
(
[0] => Array
(
[data_categorie] => 57 48
[input_value] => 119
[label_text] => Aquariums
)
[1] => Array
(
[data_categorie] => 47
[input_value] => 120
[label_text] => Arènes et stades
)
)
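The nextSibling approach depends on what immediately follows the input in the parsed tree. If the markup is well-formed and pretty-printed (an assumption here, since the snippet in the question omits the closing > of each input), a whitespace text node may sit between the input and the label. A minimal sketch that avoids this by running relative XPath queries against each item; the sample $html is hypothetical:

```php
<?php
// Hypothetical, well-formed version of the markup from the question.
$html = <<<HTML
<div class="list">
  <div class="ui_checkbox type hidden" data-categories="57 48 ">
    <input id="attraction_type_119" type="checkbox" value="119">
    <label for="attraction_type_119">Aquariums</label>
  </div>
  <div class="ui_checkbox type " data-categories="47 ">
    <input id="attraction_type_120" type="checkbox" value="120">
    <label for="attraction_type_120">Stadiums</label>
  </div>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);
$data = [];
foreach ($xp->query('//div[@class="list"]/div') as $item) {
    // The second argument makes the query relative to the current item.
    $input = $xp->query('./input', $item)->item(0);
    $label = $xp->query('./label', $item)->item(0);
    $data[] = [
        'data_categorie' => trim($item->getAttribute('data-categories')),
        'input_value'    => $input ? $input->getAttribute('value') : null,
        'label_text'     => $label ? trim($label->textContent) : null,
    ];
}
print_r($data);
```

Looking up the input and label by name, rather than by position, keeps the loop independent of whitespace between the tags.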

I used string() with evaluate() to get each result as a string in a single query, relative to the current node:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$elements = $xpath->query('//div[contains(@class, "ui_checkbox")]');
$result = array();
foreach($elements as $node) {
    $data = array();
    $data['data_categorie'] = $xpath->evaluate('string(./@data-categories)', $node);
    $data['input_value'] = $xpath->evaluate('string(./input/@value)', $node);
    $data['label_text'] = $xpath->evaluate('string(./label/text())', $node);
    $result[] = $data; // keep each item's values instead of discarding them
}
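One caveat with contains(@class, "ui_checkbox"): it is a substring test, so it would also match a class such as ui_checkbox_group. A common XPath 1.0 idiom pads the normalized class attribute with spaces and tests for the whole token. A small sketch with hypothetical markup (the ui_checkbox_group class is invented for illustration):

```php
<?php
// Hypothetical markup: the second div only shares a prefix with "ui_checkbox".
$html = '<div class="ui_checkbox type" data-categories="47"></div>'
      . '<div class="ui_checkbox_group" data-categories="99"></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Substring match: catches both divs.
$substring = $xpath->query('//div[contains(@class, "ui_checkbox")]');

// Whole-token match: only divs whose class list contains exactly "ui_checkbox".
$token = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " ui_checkbox ")]'
);

echo $substring->length, "\n"; // 2
echo $token->length, "\n";     // 1
```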

Related

How to web-scrape inside divs with DOMParser

I am trying to get the content of a div, and to repeat that for other pages inside a foreach loop.
But I am running into some trouble:
<div class="article_info">
<ul class="c-result_box">
<li>
<div class="inner cf">
<div class="c-header">
<div class="c-logo">
<img src="/e/designs/31sumai/common/img/logo_08.png" alt="#">
</div>
<p class="c-supplier">三井のマンション</p>
<p class="c-name">
パークリュクス大阪天満
</p>
I'm trying to get the text inside the <a> element. Here is my code; what am I missing?
$start_id = 1501;
while(true){
$url = 'https://www.31sumai.com/mfr/K'.$start_id.'/outline.html';
$html = file_get_contents($url);
libxml_use_internal_errors(true);
$DOMParser = new \DOMDocument();
$DOMParser->loadHTML($html);
$xpath = new \DOMXPath($DOMParser);
$classname="c-name";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
$MyTable = false;
$insertData = [];
foreach($nodes as $node){
$allNames = [];
foreach($node->getElementsByTagName('a') as $a){
$name = $a->getElementsByTagName('a');
$allProperties[] = [
'names' => $name];
}
}
Thank you for helping!
You can rely on your XPath query to pull all the text nodes that you want, and then just read the nodeValue property within your loop:
$start_id = "1501";
$url = "https://www.31sumai.com/mfr/K$start_id/outline.html";
$html = file_get_contents($url);
libxml_use_internal_errors(true);
$DOMParser = new \DOMDocument();
$DOMParser->loadHTML($html);
$xpath = new \DOMXPath($DOMParser);
$classname="c-name";
$nodes = $xpath->query("//*[contains(@class, '$classname')]/a/text()");
foreach($nodes as $node){
echo $node->nodeValue;
}
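The fragment quoted in the question shows the name directly inside the p element with class c-name, with no <a> visible. If that is how the real page is structured (an assumption, since the fragment is truncated), the /a/text() step would match nothing and the paragraph itself should be queried. A sketch against a hypothetical inline fragment rather than the live URL:

```php
<?php
// Hypothetical fragment: the name sits directly inside <p class="c-name">.
$html = '<div class="c-header">'
      . '<p class="c-supplier">Supplier</p>'
      . '<p class="c-name">  Park Luxe Osaka  </p>'
      . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML usually triggers parser warnings
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$names = [];
foreach ($xpath->query("//p[contains(@class, 'c-name')]") as $p) {
    // textContent includes any nested elements' text; trim removes layout whitespace.
    $names[] = trim($p->textContent);
}
print_r($names);
```

Because textContent also includes the text of nested elements, the same loop still returns the name if it turns out to be wrapped in a link.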

Parsing HTML to extract array of DIV content by class

$html = file_get_contents("https://www.wireclub.com/chat/room/music");
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$result = array();
foreach($xpath->evaluate('//div[@class="message clearfix"]/node()') as $childNode) {
$result[] = $dom->saveHtml($childNode);
}
echo '<pre>'; var_dump($result);
I would like the content of each individual DIV in an array to be processed individually.
This code is clumping every DIV together.
You could retrieve all the div elements and read the nodeValue of each one:
$dom = new DOMDocument();
$dom->loadHTML($html);
$myDivs = $dom->getElementsByTagName('div');
foreach($myDivs as $key => $value) {
$result[] = $value->nodeValue;
}
var_dump($result);
To select only the divs with a given class, you could use an XPath query instead:
$classname = 'message'; // class name taken from the question
$xpath = new DOMXPath($dom);
$myElem = $xpath->query("//*[contains(@class, '$classname')]");
foreach($myElem as $key => $value) {
    $result[] = $value->nodeValue;
}
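The clumping in the question comes from the trailing /node() step, which returns the child nodes of every matching div as one flat list. If the inner HTML of each div is wanted as one array entry, a possible sketch is to select the divs themselves and concatenate each div's children. The sample markup is hypothetical, not taken from the chat page:

```php
<?php
// Hypothetical markup standing in for the chat page.
$html = '<div class="message clearfix"><b>alice</b> hello</div>'
      . '<div class="message clearfix"><b>bob</b> hi <i>there</i></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$result = [];
foreach ($xpath->query('//div[@class="message clearfix"]') as $div) {
    $inner = '';
    foreach ($div->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node.
        $inner .= $dom->saveHTML($child);
    }
    $result[] = $inner; // one entry per div
}
var_dump($result);
```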

How can I get all attributes with PHP xpath?

Given the following HTML string:
<div
class="example-class"
data-caption="Example caption"
data-link="https://www.example.com"
data-image-url="https://example.com/example.jpg">
</div>
How can I use PHP with xpath to output / retrieve an array with all attributes as key / value pairs?
Hoping for output like:
Array
(
[data-caption] => Example caption
[data-link] => https://www.example.com
[data-image-url] => https://example.com/example.jpg
)
// etc etc...
I know how to get individual attributes, but I'm hoping to do it in one fell swoop. Here's what I currently have:
function get_data($html = '') {
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div/@data-link');
foreach ($nodes as $node) {
var_dump($node);
}
}
Thanks!
In XPath, you can use @* to reference attributes of any name, for example:
$nodes = $xpath->query('//div/@*');
foreach ($nodes as $node) {
echo $node->nodeName ." : ". $node->nodeValue ."<br>";
}
output :
class : example-class
data-caption : Example caption
data-link : https://www.example.com
data-image-url : https://example.com/example.jpg
I think this should do what you want - or at least, give you the basis to proceed.
define('BR','<br />');
$strhtml='<div
class="example-class"
data-caption="Example caption"
data-link="https://www.example.com"
data-image-url="https://example.com/example.jpg">
</div>';
$dom=new DOMDocument;
$dom->loadHTML( $strhtml );
$xpath=new DOMXPath( $dom );
$col=$xpath->query('//div');
if( $col ){
foreach( $col as $node ) if( $node->nodeType==XML_ELEMENT_NODE ) {
foreach( $node->attributes as $attr ) echo $attr->nodeName.' '.$attr->nodeValue.BR;
}
}
$dom = $col = $xpath = null;
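Both answers echo the attributes, while the question asks for an array of key/value pairs. A minimal sketch that collects them instead; the function name get_attributes and its $query parameter are hypothetical:

```php
<?php
// Hypothetical helper: return the attributes of the first node matching $query.
function get_attributes(string $html, string $query = '//div'): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $node = $xpath->query($query)->item(0);
    $attributes = [];
    if ($node instanceof DOMElement) {
        // $node->attributes is a DOMNamedNodeMap of DOMAttr nodes.
        foreach ($node->attributes as $attr) {
            $attributes[$attr->nodeName] = $attr->nodeValue;
        }
    }
    return $attributes;
}

$html = '<div class="example-class" data-caption="Example caption" '
      . 'data-link="https://www.example.com" '
      . 'data-image-url="https://example.com/example.jpg"></div>';

$attrs = get_attributes($html);
print_r($attrs);
```

To keep only the data-* attributes, as in the desired output, the loop could skip names that do not start with "data-".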

Looping over 2 elements at the same time (PHP / XPath)

I'm trying to extract 2 elements using PHP cURL and XPath.
So far I have the elements separated in a foreach, but I would like to get them both at the same time:
@$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$elements = $xpath->evaluate("//p[@class='row']/a/@href");
//$elements = $xpath->query("//p[@class='row']/a");
foreach ($elements as $element) {
$url = $element->nodeValue;
//$title = $element->nodeValue;
}
When I echo each one outside the foreach I only get 1 element, and when it is echoed inside the foreach I get all of them.
My question is: how can I get them both at the same time (url and title), and what is the best way to add them to MySQL using PDO?
Thank you.
There is no need, in this case, to use XPath twice. You could do one query and navigate to the associated other node(s).
For example, find all of the hrefs that you are interested in and get their ownerElement's (the <a>) node value.
$hrefs = $xpath->query("//p[@class='row']/a/@href");
foreach ($hrefs as $href) {
$url = $href->value;
$title = $href->ownerElement->nodeValue;
// Insert into db here
}
Or, find all of the <a>s that you are interested in and get their href attributes.
$anchors = $xpath->query("//p[@class='row']/a[@href]");
foreach ($anchors as $anchor) {
$url = $anchor->getAttribute("href");
$title = $anchor->nodeValue;
// Insert into db here
}
You're overwriting $url on each iteration. Maybe use an array?
@$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$elements = $xpath->evaluate("//p[@class='row']/a/@href");
//$elements = $xpath->query("//p[@class='row']/a");
$urls = array();
foreach ($elements as $element){
array_push($urls, $element->nodeValue);
//$title = $element->nodeValue;
}
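The two ideas above can be combined: run one query for the anchors and store each url/title pair together, so nothing is overwritten. A sketch with hypothetical markup; the $links array name is an assumption:

```php
<?php
// Hypothetical markup matching the p.row > a structure from the question.
$html = '<p class="row"><a href="/post/1">First post</a></p>'
      . '<p class="row"><a href="/post/2">Second post</a></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$links = [];
foreach ($xpath->query("//p[@class='row']/a[@href]") as $anchor) {
    // Keep url and title together as one row.
    $links[] = [
        'url'   => $anchor->getAttribute('href'),
        'title' => trim($anchor->textContent),
    ];
}
print_r($links);
```

Each row of $links could then be bound to a PDO prepared statement inside a loop, which keeps the scraping and the database insert separate.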

PHP DOM html get element from another element

I am trying to build something for PHP's HTML DOM that works with an element path pattern.
It looks as follows. I can have different paths from which I want to extract some text, like:
$elements = 'h1;span;';
$elements = 'div.test;h2;span';
I tried to create a function to handle these inputs, but I am stuck on
chaining getElementsByTagName() in the right order and retrieving the value of
the last element.
What I have so far:
function convertName($html, $elements) {
$elements = explode(';', $elements);
$dom = new DOMDocument;
$dom->loadHTML($html);
$name = null;
foreach ($elements as $element) :
$name. = getElementsByTagName($element)->item(0)->;
endforeach;
$test = $dom->$name.'nodeValue';
print_r($test); // receive value
}
I hope someone can give me some input or examples.
Maybe something like this:
function convertName($html, $elements) {
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$elements = explode(';', $elements);
$elemValues = array();
foreach ($elements as $element) {
$nodelist = $xpath->query("//$element");
for($i=0; $i < $nodelist->length; $i++)
$elemValues[$element][] = $nodelist->item($i)->nodeValue;
}
return $elemValues;
}
// TESTING
$html = <<< EOF
<span class="bar">Some normal Text</span>
<input type="hidden" name="hf" value="123">
<h1>Heading 1<span> span inside h1</span></h1>
<div class='foo'>Some DIV</div>
<span class="bold">Bold Text</span>
<p/>
EOF;
$elements = 'h1;span;';
// replace all but last ; with / to get valid XPATH
$elements = preg_replace('#;(?=[^;]*;)#', '/', $elements);
// call our function
$elemValues = convertName($html, $elements);
print_r($elemValues);
OUTPUT:
Array
(
[h1/span] => Array
(
[0] => span inside h1
)
)
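The answer above handles plain tag names such as h1;span; but not the tag.class form (div.test;h2;span) from the question. One possible extension is to translate each segment into an XPath step before querying. The helper name patternToXPath is hypothetical, and joining steps with / (direct child) follows the answer above rather than anything stated in the question:

```php
<?php
// Hypothetical helper: turn "div.test;h2;span" into an XPath expression.
function patternToXPath(string $pattern): string
{
    $steps = [];
    // array_filter with 'strlen' drops the empty segment left by a trailing ";".
    foreach (array_filter(explode(';', $pattern), 'strlen') as $part) {
        [$tag, $class] = array_pad(explode('.', $part, 2), 2, null);
        $step = $tag;
        if ($class !== null) {
            // Match the class as a whole token in the class attribute.
            $step .= '[contains(concat(" ", normalize-space(@class), " "), " '
                   . $class . ' ")]';
        }
        $steps[] = $step;
    }
    return '//' . implode('/', $steps);
}

$html = '<div class="test box"><h2>Title <span>inside</span></h2></div>'
      . '<div class="other"><h2><span>ignored</span></h2></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$values = [];
foreach ($xpath->query(patternToXPath('div.test;h2;span')) as $node) {
    $values[] = $node->nodeValue;
}
print_r($values);
```

The class name is inserted into the expression as-is, so this sketch assumes patterns come from trusted input and contain no quotes.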
