xpath match not working - php

I'm trying to fetch the content of the message field with the XPath expression "//td[text()='message']/following-sibling::*/text()" from this result (returned by curl):
<BODY bgcolor=#dddddd>
<TABLE bgcolor=#dddddd border=1>
<TR>
<TD valign="top"><B>Something</B></TD>
<TD>ca</TD>
</TR>
<TR>
<TD valign="top"><B>Some list</B></TD>
<TD>
<TABLE>
<TR>
<TD>CA</TD>
</TR>
</TABLE>
</TD>
</TR>
<TR>
<TD valign="top"><B>message</B></TD>
<TD>CA already existed.</TD>
</TR>
</TABLE>
</BODY>
<br>
But it doesn't seem to work. The funny thing is that the same expression works when I use it from Python. So, how can I get the content of the message field?
PS: I'm using this online tester tool: http://www.xpathtester.com/test
EDIT: This is my actual php code:
<?php
function get_url_data($acl)
{
// curl request
$xml_content = http_request($acl);
echo $xml_content ;
$dom = new DOMDocument();
@$dom->loadXML($xml_content);
$xpath = new DomXPath($dom);
$content_title = $xpath->query("//td[text()='message']/following-sibling::*/text()");
return $content_title;
}
if(isset($_POST)==true && empty($_POST)==false){
//Convert content of text area into an array
$data = explode("\n", str_replace("\r", "", $_POST['sendme']));
}
foreach ($data as $name => $value){
$content = get_url_data($value);
foreach ($content as $value)
{
echo $value->nodeValue . "<br/>";
}
echo "<br>";
}
?>

I was able to get it working with:
<?php
if(isset($_POST)==true && empty($_POST)==false){
//Convert content of text area into an array
$data = explode("\n", str_replace("\r", "", $_POST['sendme']));
}
foreach ($data as $name => $value){
$content = create_acl($value);
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
@$doc->loadHTML($content);
$xpath = new DOMXpath($doc);
$filtered = $xpath->query("//td[text()='message']/following-sibling::*/text()");
foreach ($filtered as $e) {
echo $e->nodeValue;
}
echo "<br>";
}
?>
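A note on why the switch from loadXML to loadHTML matters: the HTML parser lowercases tag names, so a lowercase td in the expression can match the uppercase TD cells, whereas XML parsing is case-sensitive. In the sample above the label is also wrapped in <B>, so comparing the whole string value of the cell is more robust than text(). Here is a minimal sketch, assuming the response looks like the sample; find_message is a hypothetical helper name, not part of the original code.

```php
<?php
// Hypothetical helper: returns the text of the cell that follows the
// "message" label cell, or null when no such row exists.
function find_message(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // normalize-space(.) compares the cell's full string value, so the label
    // is found even when it is wrapped in <B>.
    $nodes = $xpath->query("//td[normalize-space(.)='message']/following-sibling::td[1]");

    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

$sample = '<BODY><TABLE border=1>'
    . '<TR><TD valign="top"><B>Something</B></TD><TD>ca</TD></TR>'
    . '<TR><TD valign="top"><B>message</B></TD><TD>CA already existed.</TD></TR>'
    . '</TABLE></BODY>';

echo find_message($sample), "\n"; // prints "CA already existed."
```

Using libxml_use_internal_errors() instead of the @ operator keeps the parser warnings available through libxml_get_errors() if you ever need to inspect them.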

Related

How to use simple html DOM for this table?

Hi everybody, I want to parse a web page with a DOM parser and fetch data from its table.
table code :
<tbody>
<tr class="sh" onclick="ii.ShowShareHolder('21711,IRO1DMOR0004')">
<td>person</td>
<td><div class="ltr" title="1,100,000">1 M</div></td>
<td>2.050</td>
<td>0</td>
<td><div class=""></div></td>
</tr>
<tr class="sh" onclick="ii.ShowShareHolder('42123,IRO1DMOR0004')">
<td>person</td>
<td>953,169</td>
<td>1.780</td>
<td>0</td>
<td><div class=""></div></td>
</tr>
</tbody>
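This table holds two kinds of values: values of 1M or more, which are shown abbreviated inside a <div> with the full number in its title attribute, and values below 1M, which are plain text in the <td>. I want to get both the 1,100,000 value (from the div's title) and the 953,169 value from this table.
My code is below. It works fine for the values above 1M, but I don't know how to get the smaller values.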
foreach ($tables as $table) {
foreach ($table->find('tr') as $row) {
foreach($row->find('div') as $div)
{
if(array_key_exists('title',$div->attr))
{
$data[] = str_replace(",","",($div->attr['title']));
}
}
}
}
Thanks, I tried your code but it doesn't work and produces many errors.
This is my complete function code. Because the server responds with gzip encoding, I read the page with curl.
$url = "http://tsetmc.com/Loader.aspx?Partree=15131T&c=IRO1DMOR0004";
$curl = curl_get_data($url);
if(!empty($curl) ){
$html = str_get_html($curl);
$xml = simplexml_load_string($html);
var_dump($xml);
$data = [];
// For each <tr>
foreach ($xml->tr as $row) {
// check path `<td><div title="">`
$result = $row->xpath('td/div[@title]');
if (! empty($result)) {
foreach ($result as $item) {
$data[] = str_replace(',', '', $item['title']);
}
}
else {
// if not found, check the 2nd <td>
$result = $row->xpath('td[position()=2]');
foreach ($result as $item) {
$data[] = str_replace(',', '', $item);
}
}
}
return $data;
}
You could check whether the <div title=""> exists. If it does, take that value; otherwise, take the value of the second <td>.
Here is an example using SimpleXML:
$html = <<<HTML
<tbody>
<tr class="sh" onclick="ii.ShowShareHolder('21711,IRO1DMOR0004')">
<td>person</td>
<td><div class="ltr" title="1,100,000">1 M</div></td>
<td>2.050</td>
<td>0</td>
<td><div class=""></div></td>
</tr>
<tr class="sh" onclick="ii.ShowShareHolder('42123,IRO1DMOR0004')">
<td>person</td>
<td>953,169</td>
<td>1.780</td>
<td>0</td>
<td><div class=""></div></td>
</tr>
</tbody>
HTML;
// parse HTML
$xml = simplexml_load_string($html);
$data = [];
// For each <tr>
foreach ($xml->tr as $row) {
// take the 2nd <td> of the row
$item = $row->children()[1];
// check if a div with title exists
if (isset($item->div['title'])) {
$data[] = str_replace(',', '', $item->div['title']);
}
else { // else, take the content
$data[] = str_replace(',', '', $item);
}
}
var_dump($data);
Output:
array(2) {
[0]=>
string(7) "1100000"
[1]=>
string(6) "953169"
}
See the live demo.
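One caveat with the SimpleXML example: simplexml_load_string() only accepts well-formed XML, so it may fail on a full page fetched with curl. A rough sketch of the same idea with DOMDocument::loadHTML(), which tolerates sloppy HTML, is below. It assumes the rows keep the structure shown in the question; extract_holdings is a hypothetical name, and the sample rows are wrapped in a <table> so the fragment is valid on its own.

```php
<?php
// Hypothetical helper: for every <tr class="sh">, read the second cell,
// preferring the title attribute of a nested <div> when present.
function extract_holdings(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate invalid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $data = [];
    foreach ($xpath->query("//tr[@class='sh']/td[2]") as $cell) {
        // string(div/@title) evaluates to "" when there is no such attribute.
        $title = $xpath->evaluate('string(div/@title)', $cell);
        $value = $title !== '' ? $title : $cell->textContent;
        $data[] = str_replace(',', '', trim($value));
    }
    return $data;
}

$html = '<table><tbody>'
    . '<tr class="sh"><td>person</td><td><div class="ltr" title="1,100,000">1 M</div></td><td>2.050</td></tr>'
    . '<tr class="sh"><td>person</td><td>953,169</td><td>1.780</td></tr>'
    . '</tbody></table>';

print_r(extract_holdings($html)); // the two values "1100000" and "953169"
```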

How to select text from HTML table using PHP DOM query?

How can I get text from HTML table cells using PHP DOM query?
HTML table is:
<table>
<tr>
<th>Job Location:</th>
<td>Kabul
</td>
</tr>
<tr>
<th>Nationality:</th>
<td>Afghan</td>
</tr>
<tr>
<th>Category:</th>
<td>Program</td>
</tr>
</table>
I have the following query, but it doesn't work:
$xmlPageDom = new DomDocument();
@$xmlPageDom->loadHTML($html);
$xmlPageXPath = new DOMXPath($xmlPageDom);
$value = $xmlPageXPath->query('//table td /text()');
get a complete table with php domdocument and print it
The answer is like this:
$html = "<table ID='myid'><tr><td>1</td><td>2</td></tr><tr><td>4</td><td>5</td></tr><tr><td>7</td><td>8</td></tr></table>";
$xml = new DOMDocument();
$xml->validateOnParse = true;
$xml->loadHTML($html);
$xpath = new DOMXPath($xml);
$table = $xpath->query("//*[@id='myid']")->item(0);
$rows = $table->getElementsByTagName("tr");
foreach ($rows as $row) {
$cells = $row -> getElementsByTagName('td');
foreach ($cells as $cell) {
print $cell->nodeValue;
}
}
EDIT: Use this instead
$table = $xpath->query("//table")->item(0);
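Applied to the table from the question, the same approach could look like the sketch below. Note that the query in the question has a space where XPath expects a path separator; '//table//td/text()' would be the valid form. This example pairs each <th> label with the <td> next to it; table_to_pairs is a hypothetical name and the sample HTML is copied from the question.

```php
<?php
// Hypothetical helper: maps each <th> label to the text of the <td> beside it.
function table_to_pairs(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $pairs = [];
    foreach ($xpath->query('//table//tr[th and td]') as $row) {
        // normalize-space() trims the newline inside the "Kabul" cell.
        $label = $xpath->evaluate('normalize-space(th[1])', $row);
        $value = $xpath->evaluate('normalize-space(td[1])', $row);
        $pairs[rtrim($label, ':')] = $value;
    }
    return $pairs;
}

$html = "<table>
<tr><th>Job Location:</th><td>Kabul
</td></tr>
<tr><th>Nationality:</th><td>Afghan</td></tr>
<tr><th>Category:</th><td>Program</td></tr>
</table>";

print_r(table_to_pairs($html));
```

Passing the row as the second argument to evaluate() makes the expression relative to that row, so th[1] and td[1] refer to cells of the current row only.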

Unable to get both child elements with xpath from xhtml using xquery in php to manipulate

This is the XHTML data I need to get the child nodes from. I don't need the children of the TH nodes.
<table>some data</table>
<table>
<tr>
<td class="c2">PCI Signal Error (SERR#) Enable</td>
<td>Yes</td>
</tr>
<tr>
<td class="c1">Controller Type 1</td>
<td>CISS</td>
</tr>
<tr>
<td class="c2">bus type</td>
<td>CISS</td>
</tr>
<tr>
<th><a name="systempcibus5">PCI Bus 31</a></th>
<td>Device</td>
</tr>
</table>
Below is my latest attempt. I only want to get the textContent of the TDs in the above markup,
so that I can build a MySQL statement to insert the data into MySQL.
I have tried many variations over the last week.
I won't bore you with all the things I tried, but I believe this is the closest to what I want. I get this error:
PHP Notice: Trying to get property of non-object in C:\inetpub\wwwroot\reports\gec\test1.php on line 40
<?php
libxml_use_internal_errors(true);
$dom = new DomDocument;
$dom->loadHTML($html);
$xpath = new DomXPath($dom);
$nodes = $xpath->query('/html/body/table[2]/tr');
//$nodes = $xpath->query("//tr[contains(concat(' ', @class, ' '), ' head ') ");
//header("Content-type: text/plain");
$node_count=$nodes->length ;
for( $i = 1; $i <= intval($node_count); $i++)
{
$node_td1 = $xpath->query('/html/body/table[2]/tr[$i]/td[1]');
$node_td2 = $xpath->query('/html/body/table[2]/tr[$i]/td[2]');
$result1=$node_td1->textContent;
$result2=$node_td2->textContent;
echo $result1 . "," . $result2 . "<br>";
}
Alternatively, you could select the rows themselves, then filter out the <th> cells by checking ->tagName:
$dom = new DomDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DomXPath($dom);
$rows = $xpath->query('/html/body/table[2]/tr');
foreach ($rows as $row) {
foreach($row->childNodes as $col) {
if(isset($col->tagName) && $col->tagName != 'th') {
echo $col->textContent . '<br/>';
}
}
echo '<hr/>';
}
Or use XPath relative to each row:
foreach ($rows as $row) {
$col1 = $xpath->evaluate('string(./td[1])', $row);
$col2 = $xpath->evaluate('string(./td[2])', $row);
echo $col1 . '<br/>';
echo $col2 . '<br/>';
echo '<hr/>';
}
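For reference, the notice in the question most likely has two causes: the expressions are in single quotes, so $i is never interpolated into the path, and query() returns a DOMNodeList, which has no textContent property of its own. A minimal sketch of the original loop with both points addressed is below; the HTML is a shortened copy of the question's tables and $lines is a name introduced here for illustration.

```php
<?php
$html = '<table><tr><td>some data</td></tr></table>'
    . '<table>'
    . '<tr><td class="c2">PCI Signal Error (SERR#) Enable</td><td>Yes</td></tr>'
    . '<tr><td class="c1">Controller Type 1</td><td>CISS</td></tr>'
    . '<tr><th><a name="systempcibus5">PCI Bus 31</a></th><td>Device</td></tr>'
    . '</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$lines = [];
$count = $xpath->query('/html/body/table[2]/tr')->length;
for ($i = 1; $i <= $count; $i++) {
    // Double quotes so that $i is interpolated into the expression.
    $td1 = $xpath->query("/html/body/table[2]/tr[$i]/td[1]")->item(0);
    $td2 = $xpath->query("/html/body/table[2]/tr[$i]/td[2]")->item(0);
    // query() returns a DOMNodeList; item(0) is null when nothing matched,
    // which is the case for td[2] in the row that starts with a <th>.
    if ($td1 === null || $td2 === null) {
        continue;
    }
    $lines[] = $td1->textContent . ',' . $td2->textContent;
}

echo implode("\n", $lines), "\n";
```

The foreach versions above are still the simpler choice, since they avoid running a new absolute query for every row.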

Load and display HTML first before PHP

I have two PHP files (index.php and lelong.php). I am trying to load the HTML table in index.php first and display the word "Calculating..." in the second column while lelong.php extracts the data from the website, and then output the result.
Is there a way to do that? I have heard of using JS or AJAX, but I'm not really sure how to do it.
Index.php
<!DOCTYPE html>
<html>
<head>
<?php include 'lelong.php'; ?>
</head>
<body>
<table border ="1" style = "width:50%">
<tr>
<td>E-Commerce Website</td>
<td>No. of Products </td>
</tr>
<tr>
<td>Lelong</a></td>
<td><?php echo $lelong; ?></td>
</tr>
</table>
</body>
</html>
lelong.php
<?php
$grep = new DoMDocument();
@$grep->loadHTMLFile("http://www.lelong.com.my/Auc/List/BrowseAll.asp");
$finder = new DomXPath($grep);
$class = "CatLevel1";
$nodes = $finder->query("//*[contains(@class, '$class')]");
$total_L = 0;
foreach ($nodes as $node) {
$span = $node->childNodes;
$search = array(0,1,2,3,4,5,6,7,8,9);
$number = str_replace($search, '', $span->item(1)->nodeValue);
$number = preg_replace("/[^0-9]/", '', $span->item(1)->nodeValue);
$total_L += (int) $number;
}
$lelong = number_format( $total_L , 0 , '.' , ',' );
?>
Thanks
Assuming lelong.php is already working fine, then yes, you could use AJAX to get the result.
Basic example:
So in your HTML:
<table border ="1" style = "width:50%">
<tr>
<td>E-Commerce Website</td>
<td>No. of Products </td>
</tr>
<tr>
<td>Lelong</td>
<td class="result_data">Calculating ...</td><!-- initial content. Loading ... -->
</tr>
</table>
<script src="//ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script>
$(document).ready(function(){
// use ajax, call the PHP
$.ajax({
url: 'lelong.php', // path of lelong
success: function(response){
$('.result_data').text(response);
}
})
});
</script>
Then in your PHP:
<?php
$grep = new DoMDocument();
@$grep->loadHTMLFile("http://www.lelong.com.my/Auc/List/BrowseAll.asp");
$finder = new DomXPath($grep);
$class = "CatLevel1";
$nodes = $finder->query("//*[contains(@class, '$class')]");
$total_L = 0;
foreach ($nodes as $node) {
$span = $node->childNodes;
$search = array(0,1,2,3,4,5,6,7,8,9);
$number = str_replace($search, '', $span->item(1)->nodeValue);
$number = preg_replace("/[^0-9]/", '', $span->item(1)->nodeValue);
$total_L += (int) $number;
}
$lelong = number_format( $total_L , 0 , '.' , ',' );
echo $lelong; // output lelong
exit;
?>
The effects on the front end are yours to control. You could use plugins for that.
In your lelong.php page, add the line echo $lelong; at the end.
In your index.php, remove <?php include 'lelong.php'; ?> and import the jQuery library, for example <script src="//code.jquery.com/jquery-1.11.0.min.js"></script>
Replace <td><?php echo $lelong; ?></td> with <td id='lelong'>Calculating...</td>
Add the jQuery code <script>$('#lelong').load('lelong.php');</script>
It would be better to learn the basics of jQuery first.

Preserving <br> tags when parsing HTML text content

I have a little issue.
I want to parse a simple HTML Document in PHP.
Here is the simple HTML :
<html>
<body>
<table>
<tr>
<td>Colombo <br> Coucou</td>
<td>30</td>
<td>Sunny</td>
</tr>
<tr>
<td>Hambantota</td>
<td>33</td>
<td>Sunny</td>
</tr>
</table>
</body>
</html>
And this is my PHP code :
$dom = new DOMDocument();
$html = $dom->loadHTMLFile("test.html");
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row)
{
$cols = $row->getElementsByTagName('td');
echo $cols->item(0)->nodeValue.'<br />';
echo $cols->item(1)->nodeValue.'<br />';
echo $cols->item(2)->nodeValue;
}
But as you can see, the first cell contains a <br> tag that I need, and when my PHP code runs, the tag is removed.
Can anybody explain how I can keep it?
I would recommend capturing the values of the table cells with the help of XPath:
$values = array();
$xpath = new DOMXPath($dom);
foreach($xpath->query('//tr') as $row) {
$row_values = array();
foreach($xpath->query('td', $row) as $cell) {
$row_values[] = innerHTML($cell);
}
$values[] = $row_values;
}
I've had the same problem with <br> tags being stripped from fetched content. nodeValue returns only the text of a node, and a <br> is an empty element with no text of its own; unfortunately it is not automatically replaced with a newline character (\n).
So I wrote my own innerHTML function, which has proved invaluable in many projects. Here it is:
function innerHTML(DOMElement $element, $trim = true, $decode = true) {
$innerHTML = '';
foreach ($element->childNodes as $node) {
$temp_container = new DOMDocument();
$temp_container->appendChild($temp_container->importNode($node, true));
$innerHTML .= ($trim ? trim($temp_container->saveHTML()) : $temp_container->saveHTML());
}
return ($decode ? html_entity_decode($innerHTML) : $innerHTML);
}
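As a possibly shorter variant, DOMDocument::saveHTML() accepts a node argument in current PHP versions, so the temporary document can be skipped. The sketch below assumes that is available in your PHP build; inner_html is a hypothetical name chosen so it does not clash with the function above.

```php
<?php
// Hypothetical helper: serializes the children of a node back to markup,
// so inline tags such as <br> survive.
function inner_html(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument dumps only that subtree.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return trim($html);
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><table><tr><td>Colombo <br> Coucou</td><td>30</td></tr></table></body></html>');

$cells = $dom->getElementsByTagName('td');
echo inner_html($cells->item(0)), "\n"; // the <br> is part of the output
echo inner_html($cells->item(1)), "\n";
```

Unlike the version above, this one does not decode HTML entities, so wrap the result in html_entity_decode() if you need that behaviour.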
