DOMDocument: how to get an <a> element from a node? - php

$url = file_get_contents('test.html');
$DOM = new DOMDocument();
$DOM->loadHTML(mb_convert_encoding($url, 'HTML-ENTITIES', 'UTF-8'));
$trs = $DOM->getElementsByTagName('tr');
foreach ($trs as $tr) {
    foreach ($tr->childNodes as $td) {
        echo ' ' . $td->nodeValue;
    }
}
test.html:
<html>
<body>
<table>
<tbody>
<tr>
<td style="background-color: #FFFF80;">1</td>
<td>test1</td>
</tr>
<tr>
<td style="background-color: #FFFF80;">2</td>
<td>test2</td>
</tr>
<tr>
<td style="background-color: #FFFF80;">3</td>
<td>test3</td>
</tr>
</tbody>
</table>
</body>
</html>
As a result I get:
1 test1 2 test2 3 test3
But how do I get the link from the <a> inside a td?
And how do I get the HTML of a td?
P.S.: I tried $td->find('a'); and $td->getElementsByTagName('a'); but neither works.

I improved your code a little and this version works for me:
$url = file_get_contents('test.html');
$DOM = new DOMDocument();
$DOM->loadHTML(mb_convert_encoding($url, 'HTML-ENTITIES', 'UTF-8'));
$trs = $DOM->getElementsByTagName('tr');
foreach ($trs as $tr) {
    foreach ($tr->childNodes as $td) {
        if ($td->hasChildNodes()) { // check whether the <td> has child nodes
            foreach ($td->childNodes as $i) {
                if ($i->hasAttributes()) { // check whether the child node has attributes
                    echo $i->getAttribute("href") . "\n"; // print its href attribute
                }
            }
        }
    }
}
Result:
test1.php
test2.php
test3.php
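To cover the second part of the question as well (getting the HTML of a td), here is a minimal sketch. The test.html shown in the question contains no <a> elements, so the markup below is an assumption inferred from the result above. Note that $tr->childNodes also yields whitespace text nodes, which have no getElementsByTagName() method; that is one likely reason the attempt from the question failed. The sketch iterates the td elements directly instead.

```php
<?php
// Assumed markup: the test.html in the question shows no <a> tags,
// so these links are hypothetical.
$html = '<table><tbody>'
      . '<tr><td>1</td><td><a href="test1.php">test1</a></td></tr>'
      . '<tr><td>2</td><td><a href="test2.php">test2</a></td></tr>'
      . '</tbody></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$links = [];
$cells = [];
foreach ($dom->getElementsByTagName('td') as $td) {
    // Serialize only this node: the markup of the cell, <td> tag included.
    $cells[] = $dom->saveHTML($td);
    // getElementsByTagName() on an element searches all of its descendants.
    foreach ($td->getElementsByTagName('a') as $a) {
        $links[] = $a->getAttribute('href');
    }
}

print_r($links);
print_r($cells);
```

Passing a node to saveHTML() serializes just that node, so there is no need to build a second document to get the markup of one cell.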

Related

I have to display image and data from xml, how can I do it in php?

Each time it loops, the only text it shows is the Product_URL. I'm really confused about how to solve this problem. I guess there is something wrong with the loop.
<html>
<head>
<title>Display main Image</title>
</head>
<body>
<table>
<tr>
<th>Thumbnail Image</th>
<th>Product Name</th>
<th>Product Description</th>
<th>Price</th>
<th>Weight</th>
<th>Avail</th>
<th>Product URL</th>
</tr>
<tr>
<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->Load('xml_feeds7.xml');
$xpath = new DOMXPath($doc);
$listquery = array('//item/thumbnail_url', '//item/productname', '//item/productdesciption', '//item/price', '//item/weight', '//item/avail', '//item/product_url');
foreach ($listquery as $queries) {
$entries = $xpath->query($queries);
foreach ($entries as $entry) { ?>
<tr>
<td>
<img src="<?php echo $entry->nodeValue; ?>" width="100px" height="100px">
</td>
<td>
<?php echo "$entry->nodeValue"; ?>
</td>
<td>
<?php echo "$entry->nodeValue"; ?>
</td>
<td>
<?php
$price_value = $entry->nodeValue;
echo str_replace($price_value, ".00", "");
?>
</td>
<td>
<?php
$weight_value = $entry->nodeValue;
echo str_replace($weight_value, ".00", "");
?>
</td>
<td>
<?php echo "$entry->nodeValue"; ?>
</td>
<td>
<?php echo "$entry->nodeValue"; ?>
</td>
<td>
<?php echo "$entry->nodeValue"; ?>
</td>
</tr>
<?php
}
}
?>
</tr>
</table>
</body>
</html>
The table should be displaying:
---------------------------------------------------------------------------------
| Thumbnail | Product Name | Description | Price | Weight | Avail | Product_URL |
---------------------------------------------------------------------------------
XPath can return scalar values (strings and numbers) directly, but you have to do the typecast in the expression and use DOMXPath::evaluate().
You should iterate over the items and then use each item as the context for the detail expressions. Building separate lists can produce invalid data (if an element is missing in one of the items).
Finally, you can use DOM methods to create the HTML table. That way the DOM takes care of escaping and closing the tags.
$xml = <<<'XML'
<items>
<item>
<thumbnail_url>image.png</thumbnail_url>
<productname>A name</productname>
<productdescription>Some text</productdescription>
<price currency="USD">42.21</price>
<weight unit="g">23</weight>
<avail>10</avail>
<product_url>page.html</product_url>
</item>
</items>
XML;
$document = new DOMDocument;
$document->preserveWhiteSpace = false;
$document->loadXml($xml);
$xpath = new DOMXPath($document);
$fields = [
'Thumbnail' => 'string(thumbnail_url)',
'Product Name' => 'string(productname)',
'Description' => 'string(productdescription)',
'Price' => 'number(price)',
'Weight' => 'number(weight)',
'Availability' => 'string(avail)',
'Product_URL' => 'string(product_url)'
];
$html = new DOMDocument();
$table = $html->appendChild($html->createElement('table'));
$row = $table->appendChild($html->createElement('tr'));
// add table header cells
foreach ($fields as $caption => $expression) {
    $row
        ->appendChild($html->createElement('th'))
        ->appendChild($html->createTextNode($caption));
}
// iterate the items in the XML
foreach ($xpath->evaluate('//item') as $item) {
    // add a new table row
    $row = $table->appendChild($html->createElement('tr'));
    // iterate the field definitions
    foreach ($fields as $caption => $expression) {
        // fetch the value using the expression in the item context
        $value = $xpath->evaluate($expression, $item);
        switch ($caption) {
            case 'Thumbnail':
                // special handling for the thumbnail field
                $image = $row
                    ->appendChild($html->createElement('td'))
                    ->appendChild($html->createElement('img'));
                $image->setAttribute('src', $value);
                break;
            case 'Price':
            case 'Weight':
                // number format for price and weight values
                // (four arguments: PHP versions before 8.0 reject the three-argument form)
                $row
                    ->appendChild($html->createElement('td'))
                    ->appendChild(
                        $html->createTextNode(
                            number_format($value, 2, '.', ',')
                        )
                    );
                break;
            default:
                $row
                    ->appendChild($html->createElement('td'))
                    ->appendChild($html->createTextNode($value));
        }
    }
}
$html->formatOutput = TRUE;
echo $html->saveHtml();
Output:
<table>
<tr>
<th>Thumbnail</th>
<th>Product Name</th>
<th>Description</th>
<th>Price</th>
<th>Weight</th>
<th>Availability</th>
<th>Product_URL</th>
</tr>
<tr>
<td><img src="image.png"></td>
<td>A name</td>
<td>Some text</td>
<td>42.21</td>
<td>23.00</td>
<td>10</td>
<td>page.html</td>
</tr>
</table>
I've changed it to use SimpleXML, as this is a fairly simple data structure. It fetches each <item> and then displays the values from there. I've only done this for a few values, but hopefully it shows the idea:
$doc = simplexml_load_file('xml_feeds7.xml');
foreach ($doc->xpath("//item") as $item) {
    echo "<tr>";
    echo "<td><img src=\"{$item->thumbnail_url}\" width=\"100px\" height=\"100px\"></td>";
    echo "<td>{$item->productname}</td>";
    echo "<td>{$item->productdesciption}</td>";
    // Other fields...
    $price_value = str_replace(".00", "", (string)$item->price);
    echo "<td>{$price_value}</td>";
    // Other fields...
    echo "</tr>";
}
Rather than using XPath for each value, it uses $item->elementName, so $item->productname is the productname element. This is a much simpler way of referring to each field.
Note that because the price field is processed further, you have to cast it to a string to ensure it is handled correctly.
Update:
If you need to access data in a namespace with SimpleXML, you can use XPath, or in this case there is a simple (if slightly roundabout) way. The ->children() method accepts the namespace of the elements you want and returns a new SimpleXMLElement containing all the child elements in that namespace.
$extraData = $item->children('g',true);
echo "<td>{$extraData->productname}</td>";
Now $extraData holds every child element with g as the namespace prefix, and they can be referred to in the same way as before, using $extraData instead of $item.
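As a self-contained illustration of the ->children() approach, here is a small sketch. The feed, the namespace URI and the element names are made up for the example; only the g prefix comes from the answer above.

```php
<?php
// Hypothetical feed: the namespace URI and element names are illustrative only.
$xml = <<<'XML'
<items xmlns:g="http://example.com/ns/g">
  <item>
    <title>Plain title</title>
    <g:productname>Namespaced name</g:productname>
  </item>
</items>
XML;

$doc = simplexml_load_string($xml);

$names = [];
foreach ($doc->item as $item) {
    // The second argument (true) says 'g' is a prefix rather than a namespace URI.
    $extraData = $item->children('g', true);
    $names[] = (string) $extraData->productname;
}

print_r($names);
```

Without the ->children('g', true) call, $item->productname would be empty here, because plain property access only sees elements that are not in the g namespace.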

PHP DOM Parser Get Specific text by Class While Looping

I am working with PHP Simple DOM Parser and I want a simple solution to my question:
<tr>
<td class="one">1</td>
<td class="two">2</td>
<td class="three">3</td>
</tr>
<tr>
<td class="one">10</td>
<td class="two">20</td>
<td class="three">30</td>
</tr>...
My HTML looks similar to the above, and I am looping over the td elements like this:
foreach ($sample->find("td") as $ele)
{
    if ($ele->class == "one")
        echo "ONE = " . $ele->plaintext;
    if ($ele->class == "two")
        echo "TWO = " . $ele->plaintext;
}
But is there a simple way to get the plaintext of a particular class without the if conditions? I don't want a shorthand if either.
I am expecting something like this:
$ele->class->one
Take a look at this:
<?php
$html = "
<table>
<tr>
<td class='one'>1</td>
<td class='two'>2</td>
<td class='three'>3</td>
</tr>
<tr>
<td class='one'>10</td>
<td class='two'>20</td>
<td class='three'>30</td>
</tr>
</table>
";
// Your class name
$classeName = 'one';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Get the results
$results = $xpath->query("//*[@class='" . $classeName . "']");
for ($i = 0; $i < $results->length; $i++) {
    echo $review = $results->item($i)->nodeValue . "<br>";
}
?>
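If the goal is something close to $ele->class->one, one possible variation (a sketch, not part of the answer above) is to build an array per row, keyed by class name. It assumes every td carries exactly one class, as in the sample HTML.

```php
<?php
$html = "<table>
<tr><td class='one'>1</td><td class='two'>2</td><td class='three'>3</td></tr>
<tr><td class='one'>10</td><td class='two'>20</td><td class='three'>30</td></tr>
</table>";

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query('//tr') as $tr) {
    $row = [];
    // Relative query: only the cells of the current row that have a class attribute.
    foreach ($xpath->query('td[@class]', $tr) as $td) {
        $row[$td->getAttribute('class')] = trim($td->nodeValue);
    }
    $rows[] = $row;
}

// Each row can now be read by class name without any if conditions.
echo $rows[1]['two'], PHP_EOL;
```

The lookup by class then becomes a plain array access such as $rows[0]['one'].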

XPath select TD's inside TR

I want to capture all the content between the td tags, grouped by their tr, so that I get an array with the content of every tr.
<div id="box">
<tr align='center'>
<td>1</td>
<td style='padding-left: 0px !important;padding-right: 10px !important;'> <div id=''></div></td>
<td>45</td>
<td>62</td>
</tr><tr align='center'>
<td>2</td>
<td style='padding-left: 0px !important;padding-right: 10px !important;'> <div id=''></div></td>
<td>35</td>
<td>47</td>
</tr><tr align='center'>
<td>3</td>
<td style='padding-left: 0px !important;padding-right: 10px !important;'> <div id=''></div></td>
<td>63</td>
<td>58</td>
</tr>
I've tried with this:
<?php
$url = '';
$html = file_get_contents($url);
$doc = new DOMDocument();
$doc->preserveWhiteSpace = FALSE;
@$doc->loadHTML($html);
$xpath = new DOMXpath ($doc);
$expresion = "//div[@id='box']//tr//td";
$node = $xpath->evaluate($expresion);
foreach ($node as $nd)
{
    echo $nd->nodeValue;
}
?>
But the output is:
1
45
62
2
35
47
3
63
58
If you want to group the td values by their tr, I would split the XPath into two queries: one query selects the <tr> nodes and a second query selects the <td> children of each of those nodes.
Put into a loop, it can look like this:
<?php
$html = <<<EOF
<div id="box">
... Your HTML comes here
</tr>
EOF;
$url = '';
$doc = new DOMDocument();
$doc->preserveWhiteSpace = FALSE;
@$doc->loadHTML($html);
$xpath = new DOMXpath ($doc);
$expresion = "//div[@id='box']//tr";
$trs = $xpath->evaluate($expresion);
foreach ($trs as $tr)
{
    $tdvals = array();
    foreach ($xpath->query('td', $tr) as $td) {
        /* Skip the td with the empty text value */
        if (trim($td->nodeValue) !== '') {
            $tdvals[] = $td->nodeValue;
        }
    }
    echo implode(',', $tdvals) . PHP_EOL;
}
which outputs:
1,45,62
2,35,47
3,63,58
One more thing: in your example you are using file_get_contents() to load the HTML. Note that you can use DOMDocument::loadHTMLFile() to load (remote) files directly.
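A minimal sketch of the loadHTMLFile() variant follows. Since the real URL is not given in the question, it writes sample markup to a temporary file as a stand-in. The wrapping <table> and the use of libxml_use_internal_errors() instead of the @ operator are my own choices, not part of the original code.

```php
<?php
// Stand-in for the real page: write sample markup to a temporary file.
$file = tempnam(sys_get_temp_dir(), 'box');
file_put_contents($file, '<div id="box"><table>'
    . '<tr><td>1</td><td> </td><td>45</td><td>62</td></tr>'
    . '<tr><td>2</td><td> </td><td>35</td><td>47</td></tr>'
    . '</table></div>');

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
libxml_clear_errors();
unlink($file);

$xpath = new DOMXPath($doc);
$rows = [];
foreach ($xpath->query("//div[@id='box']//tr") as $tr) {
    $vals = [];
    foreach ($xpath->query('td', $tr) as $td) {
        // Skip cells that contain only whitespace.
        if (trim($td->nodeValue) !== '') {
            $vals[] = trim($td->nodeValue);
        }
    }
    $rows[] = $vals;
}

print_r($rows);
```

This collects the grouped values into an array (one entry per tr), which is what the question asked for, rather than echoing them.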

Extract text and image src with PHP DomDocument

I'm trying to extract the img src and the text of the TDs inside the div with id="Ajax", but my code just ignores the img src. How can I also extract the img src and add it to the array?
HTML:
<div id="Ajax">
<table cellpadding="1" cellspacing="0">
<tbody>
<tr id="comment_1">
<td>20:28</td>
<td class="color">
</td>
<td class="last_comment">
Text<br/>
</td>
</tr>
<tr id="comment_2">
<td>20:25</td>
<td class="color">
</td>
<td class="comment">
Text 2<br/>
</td>
</tr>
<tr id="comment_3">
<td>20:24</td>
<td class="color">
<img src="http://url.ext/img/image02.jpeg" alt="img alt 2"/>
</td>
<td class="comment">
Text 3<br/>
</td>
</tr>
<tr id="comment_4">
<td>20:23</td>
<td class="color">
<img src="http://url.ext/img/image01.jpeg" alt="img alt"/>
</td>
<td class="comment">
Text 4<br/>
</td>
</tr>
</div>
PHP:
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$contentArray = array();
$doc = $doc->getElementById('Ajax');
$text = $doc->getElementsByTagName ('td');
foreach ($text as $t)
{
    $contentArray[] = $t->nodeValue;
}
print_r ($contentArray);
Thanks.
You're using $t->nodeValue to obtain the content of a node. An <img> element has no content, so there is nothing to return. The easiest way to get the src attribute is XPath.
Example:
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$expression = "//div[@id='Ajax']//tr";
$nodes = $xpath->query($expression); // Get all rows (tr) in the div
$imgSrcExpression = ".//img/@src";
$firstTdExpression = "./td[1]";
foreach ($nodes as $node) { // loop over each row
    // select the first td node
    $tdNodes = $xpath->query($firstTdExpression, $node);
    $tdVal = null;
    if ($tdNodes->length > 0) {
        $tdVal = $tdNodes->item(0)->nodeValue;
    }
    // select the src attribute of the img node
    $imgNodes = $xpath->query($imgSrcExpression, $node);
    $imgVal = null;
    if ($imgNodes->length > 0) {
        $imgVal = $imgNodes->item(0)->nodeValue;
    }
}
(Caution: Code may contain typos)
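Since the question asks for the values in an array, here is one way the idea could be carried through, sketched with a shortened copy of the HTML from the question (wrapped in a closed <table>). It swaps query() for DOMXPath::evaluate() with string(), which returns a plain string and removes the length checks; the array keys time, img and text are names I picked for illustration.

```php
<?php
// Shortened copy of the markup from the question.
$html = '<div id="Ajax"><table>'
      . '<tr id="comment_1"><td>20:28</td><td class="color"></td>'
      . '<td class="comment">Text<br/></td></tr>'
      . '<tr id="comment_3"><td>20:24</td><td class="color">'
      . '<img src="http://url.ext/img/image02.jpeg" alt="img alt 2"/></td>'
      . '<td class="comment">Text 3<br/></td></tr>'
      . '</table></div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$contentArray = [];
foreach ($xpath->query("//div[@id='Ajax']//tr") as $tr) {
    $contentArray[] = [
        'time' => $xpath->evaluate('string(td[1])', $tr),
        // string() of an empty node set is '', so rows without an image get ''.
        'img'  => $xpath->evaluate('string(.//img/@src)', $tr),
        'text' => trim($xpath->evaluate('string(td[3])', $tr)),
    ];
}

print_r($contentArray);
```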

xPath retrieve onclick value

I'm trying to retrieve the onclick value on a td element. This is what I have so far.
$xpath = new DOMXPath($dom);
$trs = $xpath->query("/html/body//table/tr");
foreach ($trs as $tr) {
    $tds = $xpath->query("td", $tr);
    foreach ($tds as $td) {
        $a = $xpath->query("@onclick", $td);
        echo $a->nodeValue;
        echo $td->nodeValue;
    }
}
This doesn't seem to be working though.
Here's the structure
<table>
<tr>
<td>Name</td>
<td onclick="blahblah">Author</td>
<td>Title</td>
</tr>
</table>
$a is a DOMNodeList, so you must select an item from it:
print($a->item(0)->nodeValue);
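Keep in mind that item(0) returns null for cells that have no onclick attribute, so reading nodeValue from it needs a guard. An alternative sketch, assuming each $td is a DOMElement (which holds for nodes matched by a td query), reads the attribute directly with getAttribute(). The $handlers array is just an illustrative name.

```php
<?php
$html = '<table><tr><td>Name</td><td onclick="blahblah">Author</td><td>Title</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$handlers = [];
foreach ($xpath->query('//table//tr/td') as $td) {
    // getAttribute() returns an empty string when the attribute is absent.
    $handlers[trim($td->nodeValue)] = $td->getAttribute('onclick');
}

print_r($handlers);
```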
