How to get child element of DOMDocument - php

Trying to get the specific value from a table with tr and td elements...
HTML:
<table>
<tr>
<td>value1</td>
<td>value2</td>
<td>value3</td>
</tr>
</table>
PHP:
$html = 'http://www.example.com'; // edited
$dom = new DOMDocument;
@$dom->loadHTML($html);
$data = $dom->getElementsByTagName('tr:nth-child(3n)');
foreach ($data as $datas){
echo $link->nodeValue;
}
Using this or a different approach, how can I get the value of a specific td element?

getElementsByTagName() returns a list of the matching tags below your starting point, so once you've found the table, you can call the same function on it to get the <td> tags. You can then just pick out the elements you're after...
$data = "<table>
<tr>
<td>value1</td>
<td>value2</td>
<td>value3</td>
</tr>
</table>";
$dom = new DOMDocument;
$dom->loadHTML($data);
$table = $dom->getElementsByTagName('table');
$td = $table[0]->getElementsByTagName('td'); // Fetch all td elements in the first table
echo $td[2]->nodeValue; // Echo out the value of the 3rd item (zero based arrays)
Prints out..
value3
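If the table or the requested cell might be missing, the same lookup can be guarded. The following is only a sketch: the helper name cellText() is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (an assumption, not from the answer above): return the text
// of the $index-th <td> (zero-based) in the first <table>, or null if it is missing.
function cellText(string $html, int $index): ?string
{
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $table = $dom->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return null;                    // no table in the document
    }

    $td = $table->getElementsByTagName('td')->item($index);
    return $td === null ? null : $td->nodeValue;
}

$data = "<table><tr><td>value1</td><td>value2</td><td>value3</td></tr></table>";
echo cellText($data, 2), "\n";   // third cell
var_dump(cellText($data, 5));    // index out of range
```

item() returns null for an index that does not exist, which is what makes the explicit checks possible.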

XPath can be used to get a particular element. Try the following code to get the third td value from the given HTML.
$html = '<table>
<tr>
<td>value1</td>
<td>value2</td>
<td>value3</td>
</tr>
</table>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$table = $dom->getElementsByTagName('table')->item(0);
$query = 'tr/td[3]';
$entries = $xpath->query($query, $table);
echo $entries[0]->nodeValue;
Read about DOMXpath query()
Update: Using file_get_contents() is also simple. You can retrieve the HTML/XML as a string in the $html variable, and the rest of the process is the same:
$html = file_get_contents("path/to/file/x.html"); // target path
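As a rough end-to-end sketch of that flow: the temporary file below is only a stand-in for "path/to/file/x.html" so that the example is self-contained, and the string(...) wrapper is an addition that makes evaluate() return the text directly (an empty string when nothing matches).

```php
<?php
// Create a small sample file; the temporary path is an assumption for illustration.
$path = tempnam(sys_get_temp_dir(), 'tbl');
file_put_contents($path, '<table><tr><td>value1</td><td>value2</td><td>value3</td></tr></table>');

$html = file_get_contents($path);
if ($html === false) {
    exit("Could not read $path\n");
}

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // real-world HTML often triggers parser warnings
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Third <td> of any row in the first table, returned as a plain string.
$third = $xpath->evaluate('string((//table)[1]//tr/td[3])');
echo $third, "\n";

unlink($path);                      // remove the sample file again
```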

I think this should help you.
Just assign a class name to the td elements and then use this (note that it is jQuery, so it runs in the browser rather than in PHP).
For example:
<td class="two">value2</td>
$(this).closest('tr').children('td.two').text();

Related

Get specific output using php DOMElement and beginning of a value?

Here is the HTML code:
<table>
<tr>
<td>value1</td>
<td>value2</td>
<td>mailto:example@mail.com</td>
</tr>
</table>
And, the php:
$html = 'http://www.example.com'; // target path
$dom = new DOMDocument;
@$dom->loadHTML($html);
$links = $dom->getElementsByTagName('a');
foreach ($links as $link){
$linkatt = $link->getAttribute('href');
$linkval = substr($linkatt, 0, 5 );
if($linkval == "mailto"){
echo $link->nodeValue;
}
}
I tried to output all a elements whose href attribute starts with "mailto" and got no results, so I'm not sure what is wrong with my code...
How can I get it done, i.e. output all the href values starting with mailto?
If you want to load HTML from a filename/URL, you need to use DOMDocument::loadHTMLFile(), not DOMDocument::loadHTML(). The latter expects a string of HTML, not a filename or URL.
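A second detail in the question's code is worth checking: substr($linkatt, 0, 5) returns only five characters ("mailt"), so it can never equal the six-character string "mailto". A minimal sketch of a prefix check follows; the sample markup with <a> elements is an assumption, since the question's HTML shows no anchors.

```php
<?php
// Sample markup is an assumption for illustration.
$html = '<table><tr><td>value1</td>'
      . '<td><a href="https://example.com">site</a></td>'
      . '<td><a href="mailto:example@mail.com">Mail me</a></td></tr></table>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);              // string input; use loadHTMLFile() for a file or URL
libxml_clear_errors();

$mailLinks = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // "mailto:" is 7 characters long; compare the whole prefix, ignoring case.
    if (strncasecmp($href, 'mailto:', 7) === 0) {
        $mailLinks[] = $href;
    }
}
print_r($mailLinks);
```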

PHP DOM/xpath check element span class value

Within a curl request I have an HTML table with the structure below. I now want to extract only the table rows that contain a span element with an empty class, not the ones with class="subcomponent".
I successfully used XPath to find the elements with the empty class, but how do I get the entire <tr>, or even better the specific <td> nodes that contain Version and Partnumber?
Thanks in advance.
<table>
...
<tbody>
<tr>
<td></td>
<td></td>
<td>
<span class="">Product</span>
</td>
<td>Version</td>
<td>Partnumber</td>
</tr>
<tr>
<td></td>
<td></td>
<td>
<span class="subcomponent">Component</span>
</td>
<td>Version</td>
<td>Partnumber</td>
</tr>
</tbody>
My PHP code
$doc = new DOMdocument();
libxml_use_internal_errors(true);
$doc->loadHTML($page);
$doc->saveHTML();
$xpath = new DOMXpath($doc);
$query ='//span[@class=""]';
$entries = $xpath->query($query);
foreach ($entries as $entry) {
echo $entry->C14N();
}
To access the table rows themselves using SimpleXML, you can use the following:
$sxml = simplexml_load_string('<table>...</table>');
$rows = $sxml->xpath('//tr[td/span[@class=""]]');
foreach ($rows as $row) {
echo "Version: ", $row->td[3], ", Partnumber: ", $row->td[4];
}
The XPath works by selecting all <tr> tags that have a child <td>, which itself has a child <span> with a blank class.
In the loop, you need to access the child cells of each row by number, since your sample doesn't indicate that they're labelled any other way. I'm assuming a table structure won't change too often though, so that should be fine.
See https://eval.in/860169 for an example.
Alternative DOMDocument Version
If you're fetching a full webpage, which won't necessarily be well-formed, you might need to use DOMDocument as you have in your first example. It's a bit less clean to access the child-elements, but something like the following will work:
$doc = new DOMdocument;
libxml_use_internal_errors(true);
$doc->loadHTML($page);
$xpath = new DOMXpath($doc);
$rows = $xpath->query('//tr[td/span[@class=""]]');
foreach ($rows as $row) {
$cells = $row->getElementsByTagName('td');
$version = $cells->item(3)->nodeValue;
$partNumber = $cells->item(4)->nodeValue;
echo "Version: {$version}, Part Number: {$partNumber}", PHP_EOL;
}
See https://eval.in/860217
I would use the following XPath expression:
//td[text()="Version"] | //td[text()="Partnumber"]
Which gives me:
Element='<td>Version</td>'
Element='<td>Partnumber</td>'
Element='<td>Version</td>'
Element='<td>Partnumber</td>'
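That union matches the cells of both rows, including the subcomponent row. If only the rows with the empty class are wanted, the row predicate can be combined with a position test. This is a sketch; the cell values 1.0, P-100 and so on are made-up sample data.

```php
<?php
// Sample data is hypothetical; the structure mirrors the table in the question.
$page = '<table><tbody>'
      . '<tr><td></td><td></td><td><span class="">Product</span></td><td>1.0</td><td>P-100</td></tr>'
      . '<tr><td></td><td></td><td><span class="subcomponent">Component</span></td><td>0.9</td><td>C-200</td></tr>'
      . '</tbody></table>';

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Fourth and fifth cell of every row that contains a span with an empty class.
$cells = $xpath->query('//tr[td/span[@class=""]]/td[position() = 4 or position() = 5]');

$values = [];
foreach ($cells as $cell) {
    $values[] = $cell->nodeValue;
}
echo implode(', ', $values), "\n";
```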

How to parse table inside table?

I have HTML content that looks like this:
<table>
<tbody>
<tr>
<td>blabla</td>
<td>blabla</td>
</tr>
<tr>
<td>blabla</td>
<td>blabla</td>
</tr>
<tr>
<td>blabla</td>
<td><table>THIS IS MY TABLE CONTENT</table></td>
</tr>
</tbody>
</table>
I want to parse the THIS IS MY TABLE CONTENT and ONLY this table, the outer table is irrelevant for me.
I'm using Simple HTML DOM parser and right now my code looks like this:
$table = $html->find('table');
foreach ($table->find('table') as $tbl){
foreach ($tbl->find('tr') as $tr){
foreach ($tr->find('td') as $td){
// some logic
}
}
}
My problem is that I'm not getting any results this way. How can I do this parsing the right way?
Thank you very much for the help!
What about using DOM with XPath?
$str = '<table><tbody><tr><td>blabla</td><td>blabla</td></tr><tr><td>blabla</td><td>blabla</td></tr><tr><td>blabla</td><td><table>THIS IS MY TABLE CONTENT</table></td></tr></tbody></table>';
$dom = new DOMDocument();
$dom->loadHTML($str); // $str is your html string
$xpath = new DOMXPath($dom);
$tables = $xpath->query('.//td/table'); // fetch all tables inside td
foreach($tables as $table){
// Do your stuff with each table
echo $table->nodeValue; // $table is your current $table
}
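To go one step further and read the inner table cell by cell, a relative XPath can be run against each inner table node. The sketch below assumes the inner table contains ordinary rows and cells; the sample contents are invented.

```php
<?php
// Inner table contents (a, b, c, d) are invented sample data.
$str = '<table><tbody><tr><td>outer</td><td>'
     . '<table><tr><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr></table>'
     . '</td></tr></tbody></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($str);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = [];
foreach ($xpath->query('//td/table') as $inner) {        // only tables nested in a td
    foreach ($xpath->query('.//tr', $inner) as $tr) {     // rows of this inner table
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {     // direct cells of the row
            $cells[] = trim($td->nodeValue);
        }
        $rows[] = $cells;
    }
}
print_r($rows);
```

The outer table is skipped because its parent is not a td.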
The inner table would be:
$html->find('table table', 0);

xpath looping to unknown number of nodes

I've a xpath that looks like this:
$path = '//*[@id="page-content"]/table/tbody/tr[3]/td['.$i.']/div/a';
where $i goes from 1 to X. I would normally use:
for($i=1; $i<X;$i++){
$path = '//*[@id="page-content"]/table/tbody/tr[3]/td['.$i.']/div/a';
$nodelist = $xpath->query($path);
$result = $nodelist->item(0)->nodeValue;
};
However, in this case, I don't know what X is. Is there any way to loop through this without knowing X?
Why not just stack them? Something like this (fragile code, add your own checks):
// first xpath for the outer node-list
$tds = $xpath->query('//*[@id="page-content"]/table/tbody/tr[3]/td');
foreach ($tds as $td)
{
// fetch the included values with a relative xpath to the current node
$nodelist = $xpath->query('./div/a', $td);
...
}
Actually, you won't even need that inner node-list, because you only want the node values in the end. However, I leave this here to show what you can do by using an XPath relative to a concrete node.
So if you need the first <a> element inside any <div> inside any <td> of the third <tr> of any table inside any node with the id "page-content", you can write it directly as one query:
//*[@id="page-content"]/table/tbody/tr[3]/td/div/a[1]
A predicate (the part in square brackets) applies only to the step it is attached to, so the [1] applies only to the a at the end, just as the [3] applies only to the tr.
Code Example:
$as = $xpath->query('//*[@id="page-content"]/table/tbody/tr[3]/td/div/a[1]');
foreach ($as as $a)
{
echo $a->nodeValue, "\n";
}
So this would give you the result as a single node-list, you do not need to run a second xpath query.
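The two approaches differ in one respect that a small sketch can show: the per-cell loop with a relative XPath can report cells that contain no link, while the single query simply omits them. The sample markup is an assumption.

```php
<?php
// Sample markup is an assumption; the third row has two cells with links and one without.
$html = '<div id="page-content"><table><tbody>'
      . '<tr><td><div><a>r1</a></div></td></tr>'
      . '<tr><td><div><a>r2</a></div></td></tr>'
      . '<tr><td><div><a>x</a><a>x2</a></div></td><td><div><a>y</a></div></td><td>no link</td></tr>'
      . '</tbody></table></div>';

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Variant 1: outer node-list, then a query relative to each td.
$perCell = [];
foreach ($xpath->query('//*[@id="page-content"]/table/tbody/tr[3]/td') as $td) {
    $first = $xpath->query('./div/a[1]', $td)->item(0);
    $perCell[] = $first === null ? null : $first->nodeValue;
}

// Variant 2: one query for all matching links.
$direct = [];
foreach ($xpath->query('//*[@id="page-content"]/table/tbody/tr[3]/td/div/a[1]') as $a) {
    $direct[] = $a->nodeValue;
}

var_dump($perCell, $direct);
```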
If I'm understanding your question, you're asking how to loop up to the number of <td> elements under your XPath?
You could retrieve the number of nodes using:
count(//*[@id="page-content"]/table/tbody/tr[3]/td) and store it in a temporary variable, then use it in your loop like so (XPath positions start at 1, so the upper bound is inclusive):
for($i=1; $i<=$numberOfTdElements;$i++){
$path = '//*[@id="page-content"]/table/tbody/tr[3]/td['.$i.']/div/a';
$nodelist = $xpath->query($path);
$result = $nodelist->item(0)->nodeValue;
};
In response to hakre's suggestion:
$tbody = $doc->getElementsByTagName('tbody')->item(0);
// our query is relative to the tbody node
$query = 'count(tr[3]/td)';
$tdcount = $xpath->evaluate($query, $tbody);
echo "There are $tdcount elements under tr[3]\n";
And then combine it all in:
for($i=1; $i<=$tdcount;$i++){
$path = '//*[@id="page-content"]/table/tbody/tr[3]/td['.$i.']/div/a';
$nodelist = $xpath->query($path);
$result = $nodelist->item(0)->nodeValue;
};
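A self-contained sketch of the counting approach follows. It assumes the sample markup shown in the code, casts the count to an integer because evaluate() returns the result of count() as a float, and guards against cells without a link, where item(0) would be null.

```php
<?php
// Sample markup is an assumption; the second cell of row 3 has no link on purpose.
$html = '<div id="page-content"><table><tbody>'
      . '<tr><td>row 1</td></tr><tr><td>row 2</td></tr>'
      . '<tr><td><div><a>first</a></div></td><td>no link</td><td><div><a>third</a></div></td></tr>'
      . '</tbody></table></div>';

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$base = '//*[@id="page-content"]/table/tbody/tr[3]/td';
$tdcount = (int) $xpath->evaluate("count($base)");

$results = [];
for ($i = 1; $i <= $tdcount; $i++) {          // XPath positions are 1-based
    $node = $xpath->query($base . '[' . $i . ']/div/a')->item(0);
    $results[$i] = $node === null ? null : $node->nodeValue;
}

var_dump($tdcount);
print_r($results);
```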
I think what you are trying to do is fetch every a element that is a child of a div, which in turn is a child of any td element that, in turn, is a child of the third tr element, etc. If that is correct, you can simply fetch these with this query:
<?php
$doc = new DOMDocument();
$doc->loadXML( $xml );
$xpath = new DOMXPath( $doc );
$nodes = $xpath->query( '//*[@id="page-content"]/table/tbody/tr[3]/td/div/a' );
foreach( $nodes as $node )
{
echo $node->nodeValue . '<br>';
}
Where $xml is a document, similar to this:
<?php
$xml = <<<XML
<?xml version="1.0" encoding="utf-8" ?>
<result>
<div id="page-content">
<table>
<tbody>
<tr>
<td>
<div><a>This one shouldn't be fetched</a></div>
</td>
</tr>
<tr>
<td>
<div><a>This one shouldn't be fetched</a></div>
</td>
</tr>
<tr>
<td>
<div><a>This one should be fetched</a></div>
</td>
<td>
<div><a>This one should be fetched</a></div>
</td>
<td>
<div><a>This one should be fetched</a></div>
</td>
<td>
<div><a>This one should be fetched</a></div>
</td>
<td>
<div><a>This one should be fetched</a></div>
</td>
</tr>
<tr>
<td>
<div><a>This one shouldn't be fetched</a></div>
</td>
</tr>
</tbody>
</table>
</div>
</result>
XML;
In other words, there is no need to loop through all these td elements. You can fetch them all in one go, resulting in a DOMNodeList with all required nodes.
$doc = new DOMDocument();
$doc->loadXML( $xml );
$xpath = new DOMXPath( $doc );
$nodes = $xpath->query( '/result/div[@id="page-content"]/table/tbody/tr[3]/td/div/a');
foreach( $nodes as $node )
{
echo $node->nodeValue . '<br>';
}

How can I get td values using dom and php

I have a table such as this:
<table>
<tr>
<td>Values</td>
<td>5000</td>
<td>6000</td>
</tr>
</table>
I want to get the content of the td elements, but I could not manage it.
<?PHP
$dom = new DOMDocument();
$dom->loadHTML("figures.html");
$table = $dom->getElementsByTagName('table');
$tds=$table->getElementsByTagName('td');
foreach ($tds as $t){
echo $t->nodeValue, "\n";
}
?>
There are multiple problems with this code:
To load from an HTML file, you need to use DOMDocument::loadHTMLFile(), not loadHTML() as you have done. Use $dom->loadHTMLFile("figures.html").
You can't call getElementsByTagName() on a DOMNodeList as you have done (on $table). It is only available on a DOMDocument or a DOMElement, so pick a single node out of the list first.
You could do something like this:
$dom = new DOMDocument();
$dom->loadHTMLFile("figures.html");
$tables = $dom->getElementsByTagName('table');
// Find the correct <table> element you want, and store it in $table
// ...
// Assume you want the first table
$table = $tables->item(0);
// The <td> elements are descendants of the table (inside <tr>), not its direct children
foreach ($table->getElementsByTagName('td') as $td) {
echo $td->nodeValue, "\n";
}
Alternatively, you could just directly search for all elements with tag name td (though I'm sure you want to do that in a table-specific manner).
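As a quick check of the table-specific variant, here is a sketch that uses the question's table as an inline string (instead of figures.html, so that it runs on its own) and collects the cell texts from the first table only.

```php
<?php
// The question's table as a string; a file would be loaded with loadHTMLFile() instead.
$html = '<table><tr><td>Values</td><td>5000</td><td>6000</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$values = [];
$table = $dom->getElementsByTagName('table')->item(0);   // a single element, or null
if ($table !== null) {
    // getElementsByTagName() on an element searches only below that element.
    foreach ($table->getElementsByTagName('td') as $td) {
        $values[] = $td->nodeValue;
    }
}
print_r($values);
```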
You should use a for loop to output multiple td elements, each with its own id attribute, so that every td in the HTML file has a distinct id.
For example:
for($i=1;$i<=10;$i++){
echo "<td id ='id_".$i."'>".$tdvalue."</td>";
}
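You can then fetch the td values by iterating over another for loop that calls getElementById().
The td data can be found inside the childNodes of each row:
$dom = new DOMDocument;
$dom->loadHTMLFile("your-url"); // loadHTMLFile() accepts a filename or URL; loadHTML() expects an HTML string
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr'); // pick one table out of the node list first
foreach ($rows as $row) {
foreach ($row->childNodes as $cell) {
if ($cell->nodeName === 'td') { // skip text nodes between the cells
echo $cell->nodeValue, "\n";
}
}
}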
