PHP DOM HTML: get an element from another element

I am trying to create something for the PHP HTML DOM that works with an element path pattern.
It looks as follows. I can have different paths from which I want to get some text, like:
$elements = 'h1;span;';
$elements = 'div.test;h2;span';
I tried to create a function to handle these paths, but I am stuck on the
part where the 'getElementsByTagName()' calls have to be chained in the right order to receive the value of
the last element.
What I have done so far:
function convertName($html, $elements) {
$elements = explode(';', $elements);
$dom = new DOMDocument;
$dom->loadHTML($html);
$name = null;
foreach ($elements as $element) :
$name. = getElementsByTagName($element)->item(0)->;
endforeach;
$test = $dom->$name.'nodeValue';
print_r($test); // receive value
}
I hope someone can give me some input or examples.

Maybe something like this:
function convertName($html, $elements) {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html); // loads your html
    $xpath = new DOMXPath($doc);
    // drop empty segments (e.g. after a trailing ;), which would give the invalid query "//"
    $elements = array_filter(explode(';', $elements), 'strlen');
    $elemValues = array();
    foreach ($elements as $element) {
        $nodelist = $xpath->query("//$element");
        foreach ($nodelist as $node) {
            $elemValues[$element][] = $node->nodeValue;
        }
    }
    return $elemValues;
}
// TESTING
$html = <<< EOF
<span class="bar">Some normal Text</span>
<input type="hidden" name="hf" value="123">
<h1>Heading 1<span> span inside h1</span></h1>
<div class='foo'>Some DIV</div>
<span class="bold">Bold Text</span>
<p/>
EOF;
$elements = 'h1;span;';
// replace every ; except the last one with / to get a valid XPath expression
$elements = preg_replace('#;(?=[^;]*;)#', '/', $elements);
// call our function
$elemValues = convertName($html, $elements);
print_r($elemValues);
OUTPUT:
Array
(
[h1/span] => Array
(
[0] => span inside h1
)
)
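The question asked how to chain getElementsByTagName() calls in the right order. Below is a minimal sketch of that idea, without XPath. It keeps a reference to the node found so far and searches inside it for the next tag. It assumes every path segment is a plain tag name, so class selectors such as div.test are not handled. The helper name getNodeValueByPath() is hypothetical.

```php
<?php
// Hypothetical helper: follow a ';'-separated list of tag names, each one
// searched inside the element matched by the previous one.
// Assumption: segments are plain tag names (no ".class" selectors).
function getNodeValueByPath(string $html, string $path): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);      // silence warnings about sloppy HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    // 'h1;span;' -> ['h1', 'span'] (empty trailing segment dropped)
    $tags = array_filter(explode(';', $path), 'strlen');

    $node = $dom;                          // start the search at the document
    foreach ($tags as $tag) {
        // only look inside the node found in the previous step
        $node = $node->getElementsByTagName($tag)->item(0);
        if ($node === null) {
            return null;                   // this segment did not match
        }
    }
    return $node === $dom ? null : $node->nodeValue;
}

// The first <span> in the document is "Other", but the path only looks inside <h1>.
$html = '<span>Other</span><h1>Heading 1<span> span inside h1</span></h1>';
var_dump(getNodeValueByPath($html, 'h1;span;'));
```

Each getElementsByTagName() call is made on the element returned by the previous one rather than on the whole document. That scoping is what the "right order" comes down to.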

Related

Loop through elements and parse them with DOMDocument() in PHP

I have a list of items like this:
<div class="list">
<div class="ui_checkbox type hidden" data-categories="57 48 ">
<input id="attraction_type_119" type="checkbox" value="119">
<label for="attraction_type_119">Aquariums</label>
</div>
<div class="ui_checkbox type " data-categories="47 ">
<input id="attraction_type_120" type="checkbox" value="120">
<label for="attraction_type_120">Arènes et stades</label>
</div>
</div>
How can I loop through them with DOMDocument to get details like:
data-categories
input value
label text
This is what I tried:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXpath($dom);
$elements = $dom->getElementsByTagName('div');
$data = array();
foreach($elements as $node){
foreach($node->childNodes as $child) {
$data['data_categorie'] = $child->item(0)->getAttribute('data_categories');
$data['input_value'] = $child->item(0)->getAttribute('input_value');
$data['label_text'] = $child->item(0)->getAttribute('label_text');
}
}
But it doesn't work.
What am I missing here?
Thanks.
Setting multiple values in the loop like this, $data['data_categorie'] = ..., reuses the same keys of the single array $data = array();, so the values are overwritten on every iteration.
As you have multiple items, you could create a temporary array $temp = []; to store the values, and add that array to $data after storing all the values for the current iteration.
As you are already using DOMXPath, you could get the divs inside the div with class="list" using an expression like //div[@class="list"]/div. Then loop over their childNodes, check for the nodeName input, and read its value plus the text of the next element sibling, which is the label.
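$data = array();
$xp = new DOMXPath($dom);
$items = $xp->query('//div[@class="list"]/div');
foreach ($items as $item) {
    $temp = []; // fresh array for each item
    $temp["data_categorie"] = $item->getAttribute("data-categories");
    foreach ($item->childNodes as $child) {
        if ($child->nodeName === "input") {
            $temp["input_value"] = $child->getAttribute("value");
            // skip whitespace text nodes to reach the <label> element
            $label = $child->nextSibling;
            while ($label !== null && $label->nodeType !== XML_ELEMENT_NODE) {
                $label = $label->nextSibling;
            }
            $temp["label_text"] = $label !== null ? $label->nodeValue : '';
        }
    }
    $data[] = $temp;
}
print_r($data);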
Output
Array
(
[0] => Array
(
[data_categorie] => 57 48
[input_value] => 119
[label_text] => Aquariums
)
[1] => Array
(
[data_categorie] => 47
[input_value] => 120
[label_text] => Arènes et stades
)
)
I used string() with evaluate() to get each value from a single expression:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//div[contains(@class, "ui_checkbox")]');
$data = array();
foreach ($elements as $node) {
    $data[] = array(
        'data_categorie' => $xpath->evaluate('string(./@data-categories)', $node),
        'input_value'    => $xpath->evaluate('string(./input/@value)', $node),
        'label_text'     => $xpath->evaluate('string(./label/text())', $node),
    );
}
print_r($data);
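The string() trick works because of how query() and evaluate() differ. query() always returns a DOMNodeList. evaluate() returns a plain PHP string, float or boolean when the expression is wrapped in a function such as string() or count(). Below is a minimal sketch on a trimmed-down, made-up copy of the question's markup.

```php
<?php
// Trimmed-down sample based on the question's markup (one item only).
$html = '<div class="list">'
      . '<div class="ui_checkbox type" data-categories="57 48 ">'
      . '<input type="checkbox" value="119"><label>Aquariums</label>'
      . '</div></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$node = $xpath->query('//div[contains(@class, "ui_checkbox")]')->item(0);

// query() returns a DOMNodeList, even for a single attribute
$list = $xpath->query('./input/@value', $node);
echo get_class($list), PHP_EOL;                              // DOMNodeList

// evaluate() with string() returns a plain PHP string
$value = $xpath->evaluate('string(./input/@value)', $node);
var_dump($value);                                            // string(3) "119"

// string() of an empty node-set is "", so a missing node raises no error
var_dump($xpath->evaluate('string(./select/@name)', $node)); // string(0) ""

// count() comes back as a float
var_dump($xpath->evaluate('count(./*)', $node));             // float(2)
```

Because string() of a missing node is just an empty string, this style avoids calling ->item(0) or ->nodeValue on null when an element is absent.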

Why is the attribute not displayed via XPath in PHP?

<?php
$content = '<div class="keep-me">Keep this div</div><div class="remove-me" id="test">Remove this div</div>';
$badClasses = array('');
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();
$xPath = new DOMXpath($dom);
foreach($badClasses as $badClass){
$domNodeList = $xPath->query('//div[@class="remove-me"]/@id');
$domElemsToRemove = ''; // container of deleted elements
foreach ( $domNodeList as $domElement ) {
$domElemsToRemove .= $dom->saveHTML($domElement); // concat them
$domElement->parentNode->removeChild($domElement); // then remove
}
}
$content = $dom->saveHTML();
echo htmlentities($domElemsToRemove);
?>
Works: //div[@class="remove-me"] or //div[@class="remove-me"]/text()
Not working: //div[@class="remove-me"]/@id
Maybe there is an easier way?
The XPath //div[@class="remove-me"]/@id is correct, but it returns attribute nodes rather than elements. Just loop over the returned nodes and add each nodeValue to a list of matching IDs:
$xPath = new DOMXPath($dom);
$domNodeList = $xPath->query('//div[@class="remove-me"]/@id');
$ids = []; // values of the matching id attributes
foreach ($domNodeList as $domAttr) {
    $ids[] = $domAttr->nodeValue;
}
print_r($ids);
If the aim is to fetch the ID of any element with class "remove-me" (which is how I interpret the question), then perhaps you can try something like this (untested):
// ... other code before
$xp = new DOMXPath($dom);
// the leading // searches the whole document, not just the root element
$col = $xp->query('//*[@class="remove-me"]');
if ($col->length > 0) {
    foreach ($col as $node) {
        $id = $node->hasAttribute('id') ? $node->getAttribute('id') : 'banana';
        echo $id;
    }
}
However, the code in the question suggests that you want to delete nodes. In that case, build an array of nodes (a node list) and iterate through it from the end to the front, i.e. backwards.
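The expression //div[@class="remove-me"]/@id selects the id attribute nodes (DOMAttr objects), not the divs. So saveHTML() and removeChild() end up working on attributes instead of the elements you want to remove. Below is a minimal sketch of the removal case. It assumes the goal is to collect the HTML (and the id, if present) of each matching div and then delete the div. It selects the elements themselves and walks the result from the end to the front. The variable names $removedHtml and $removedIds are illustrative.

```php
<?php
$content = '<div class="keep-me">Keep this div</div>'
         . '<div class="remove-me" id="test">Remove this div</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();
$xPath = new DOMXPath($dom);

// select the <div> elements themselves, not their @id attribute nodes
$nodes = $xPath->query('//div[@class="remove-me"]');

$removedHtml = ''; // HTML of every removed element
$removedIds = [];  // their id values, if any

// walk the list from the end to the front, as the answer above suggests
for ($i = $nodes->length - 1; $i >= 0; $i--) {
    $div = $nodes->item($i);
    // prepend so the collected HTML stays in document order
    $removedHtml = $dom->saveHTML($div) . $removedHtml;
    if ($div->hasAttribute('id')) {
        array_unshift($removedIds, $div->getAttribute('id'));
    }
    $div->parentNode->removeChild($div);
}

echo htmlentities($removedHtml), PHP_EOL;
print_r($removedIds);
echo $dom->saveHTML();
```

Iterating backwards matters most with live lists such as the one returned by getElementsByTagName(). With those, removing a node while walking forwards can skip the next one.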

Why doesn't the query match the DOM?

Here is my code:
$res = file_get_contents("http://www.lenzor.com/photo/search/index/type/user/%D8%B9%D9%84%DB%8C//text/%D9%81%D8%A7%D8%B7%D9%85%D9%87");
$doc = new \DOMDocument();
@$doc->loadHTMLFile($res);
$xpath = new \DOMXpath($doc);
$links = $xpath->query("//ul[@class='user_box']/li");
$result = array();
if (!is_null($links)) {
foreach ($links as $link) {
$href = $link->getAttribute('class');
$result[] = [$href];
}
}
print_r($result);
Here is the content I'm working on, i.e. the output of echo $res.
The result of my code is an empty array, so $links is empty and the foreach is never executed. Why doesn't the //ul[@class='user_box']/li query match the DOM?
The expected result is an array containing the class attribute of each li.
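Try this; I hope it helps. There are a few mistakes in your code.
1. You should search for '//ul[@class="user_box clearfix"]/li', because the class attribute in that HTML source is class="user_box clearfix" (it contains two classes) and @class="..." compares against the whole attribute value.
2. You should use loadHTML instead of loadHTMLFile: $res already holds the HTML string, while loadHTMLFile expects a file name or URL.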
<?php
ini_set('display_errors', 1);
libxml_use_internal_errors(true);
$res = file_get_contents("http://www.lenzor.com/photo/search/index/type/user/%D8%B9%D9%84%DB%8C//text/%D9%81%D8%A7%D8%B7%D9%85%D9%87");
$doc = new \DOMDocument();
$doc->loadHTML($res);
$xpath = new \DOMXpath($doc);
$links = $xpath->query('//ul[@class="user_box clearfix"]/li');
$result = array();
if ($links !== false) { // query() returns false (not null) for an invalid expression
    foreach ($links as $link) {
        $href = $link->getAttribute('class');
        $result[] = [$href];
    }
}
print_r($result);
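The exact comparison @class="user_box clearfix" stops matching if the page adds, removes or reorders a class. A common XPath 1.0 idiom matches a single class token regardless of the others: contains(concat(' ', normalize-space(@class), ' '), ' user_box '). Below is a minimal sketch. The inline markup is made up to stand in for the remote page.

```php
<?php
// Made-up sample standing in for the remote page.
$html = '<ul class="user_box clearfix">'
      . '<li class="first">A</li><li class="second">B</li>'
      . '</ul>'
      . '<ul class="user_box_wide"><li class="other">C</li></ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loadHTML() parses a string; loadHTMLFile() expects a file name or URL
$xpath = new DOMXPath($doc);

// pad the normalized class list with spaces so ' user_box ' only matches a whole token
// (it matches "user_box clearfix" but not "user_box_wide")
$query = "//ul[contains(concat(' ', normalize-space(@class), ' '), ' user_box ')]/li";
$links = $xpath->query($query);

$result = [];
if ($links !== false) {
    foreach ($links as $link) {
        $result[] = $link->getAttribute('class');
    }
}
print_r($result);
```

A plain contains(@class, 'user_box') would also match user_box_wide. The space padding is what limits the match to the whole class name.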

How to loop through all the children under a tag in PHP DOMDocument

I have the following HTML:
$html = '<body><div style="font-color:#000">Hello</div>
<span style="what">My name is rasid</span><div>new to you
</div><div style="rashid">New here</div></body>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$elements = $dom->getElementsByTagName('body');
I have tried
foreach($elements as $child)
{
echo $child->nodeName;
}
The output is:
body
But I need to loop through all the tags under body, not body itself. How can I do that?
In the example above, I have also tried replacing
$elements = $dom->getElementsByTagName('body');
with
$elements = $dom->getElementsByTagName('body')->item(0);
But it gives an error. Any solution?
Try this:
$elements = $dom->getElementsByTagName('*');
// counter to start output at the 3rd element: '*' also matches html and body,
// so the loop visits "html body div span div div"
$i = 1;
foreach ($elements as $child)
{
    if ($i > 2) echo $child->nodeName . "<br>"; // outputs "div span div div"
    ++$i;
}
If you only want child nodes of the body element, you can use:
$body = $dom->getElementsByTagName( 'body' )->item( 0 );
foreach( $body->childNodes as $node )
{
echo $node->nodeName . PHP_EOL;
}
If you want all descendant nodes of the body element (text nodes included, which are reported as #text), you could use DOMXPath:
$xpath = new DOMXPath( $dom );
$bodyDescendants = $xpath->query( '//body//node()' );
foreach( $bodyDescendants as $node )
{
echo $node->nodeName . PHP_EOL;
}
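If you want only elements at every depth under body, without the #text nodes that node() also returns, //body//* is the XPath form. The same result can be had with a small recursive walk over childNodes that only follows element nodes. Below is a minimal sketch of that walk. collectElementNames() is a hypothetical helper name, and the sample HTML is the question's markup shortened, with a nested <b> added to show the recursion.

```php
<?php
// Hypothetical helper: depth-first (pre-order) walk that records the tag
// name of every element below $parent, skipping text and comment nodes.
function collectElementNames(DOMNode $parent, array &$names): void
{
    foreach ($parent->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue;                    // not an element: skip it
        }
        $names[] = $child->nodeName;
        collectElementNames($child, $names); // descend into its children
    }
}

$html = '<body><div style="font-color:#000">Hello</div>'
      . '<span style="what">My name is <b>rasid</b></span></body>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$body = $dom->getElementsByTagName('body')->item(0);

$names = [];
collectElementNames($body, $names);
echo implode(PHP_EOL, $names), PHP_EOL;
```

Starting from the body element rather than the document avoids the html and body entries that getElementsByTagName('*') includes.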
Use this code (note that '*' also matches the html and body elements themselves):
$elements = $dom->getElementsByTagName('*');
foreach ($elements as $child)
{
    echo $child->nodeName . PHP_EOL;
}

Looking to loop over 2 elements at the same time (PHP/XPath)

I'm trying to extract two elements using PHP cURL and XPath.
So far I get each element separately in a foreach, but I would like to get both at the same time:
@$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$elements = $xpath->evaluate("//p[@class='row']/a/@href");
//$elements = $xpath->query("//p[@class='row']/a");
foreach ($elements as $element) {
$url = $element->nodeValue;
//$title = $element->nodeValue;
}
When I echo the variable outside the foreach I only get one element, and when it's echoed inside the foreach I get all of them.
My question is: how can I get both at the same time (URL and title), and what's the best way to add them to MySQL using PDO?
Thank you.
There is no need, in this case, to use XPath twice. You could do one query and navigate to the associated other node(s).
For example, find all of the href attributes you are interested in and get the node value of each one's ownerElement (the <a>).
$hrefs = $xpath->query("//p[@class='row']/a/@href");
foreach ($hrefs as $href) {
$url = $href->value;
$title = $href->ownerElement->nodeValue;
// Insert into db here
}
Or, find all of the <a>s that you are interested in and get their href attributes.
$anchors = $xpath->query("//p[@class='row']/a[@href]");
foreach ($anchors as $anchor) {
$url = $anchor->getAttribute("href");
$title = $anchor->nodeValue;
// Insert into db here
}
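For the "insert into db here" step, one option is to collect URL/title pairs first and then write them with a single PDO prepared statement. Below is a minimal sketch under several assumptions. The table name links and its url and title columns are hypothetical, and so are the helper names extractLinks() and saveLinks(). The demo uses an in-memory SQLite database only so that it runs without a MySQL server. For MySQL you would pass a mysql: DSN to new PDO() instead.

```php
<?php
// Hypothetical helper: pair each link's URL with its title.
function extractLinks(DOMXPath $xpath): array
{
    $rows = [];
    foreach ($xpath->query("//p[@class='row']/a[@href]") as $anchor) {
        $rows[] = [
            'url'   => $anchor->getAttribute('href'),
            'title' => trim($anchor->textContent),
        ];
    }
    return $rows;
}

// Hypothetical helper: insert the rows; table "links" (url, title) is assumed.
function saveLinks(PDO $pdo, array $rows): void
{
    // prepare once, execute per row; placeholders keep values out of the SQL text
    $stmt = $pdo->prepare('INSERT INTO links (url, title) VALUES (:url, :title)');
    $pdo->beginTransaction();
    foreach ($rows as $row) {
        $stmt->execute([':url' => $row['url'], ':title' => $row['title']]);
    }
    $pdo->commit();
}

$html = '<p class="row"><a href="/a.html">First</a></p>'
      . '<p class="row"><a href="/b.html">Second</a></p>';
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$rows = extractLinks(new DOMXPath($dom));
print_r($rows);

// Demo only: store in an in-memory SQLite database if that driver is available.
if (class_exists('PDO') && in_array('sqlite', PDO::getAvailableDrivers(), true)) {
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE links (url TEXT, title TEXT)');
    saveLinks($pdo, $rows);
    echo $pdo->query('SELECT COUNT(*) FROM links')->fetchColumn(), PHP_EOL;
}
```

Keeping extraction and storage separate makes it easy to check the scraped pairs with print_r() before anything touches the database.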
You're overwriting $url on each iteration. Maybe use an array?
@$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$elements = $xpath->evaluate("//p[@class='row']/a/@href");
//$elements = $xpath->query("//p[@class='row']/a");
$urls = array();
foreach ($elements as $element){
array_push($urls, $element->nodeValue);
//$title = $element->nodeValue;
}
