I'm an XPath newbie. I want to loop through the result of a cURL query and print each element of the only table on the page.
I've used the XPath plugin for Firefox to obtain my expression, and my table is structured as follows:
<table>
<tr class="listItemOneBg">
<td valign="top">
SMITH
</td>
<td valign="top">
WILLIAM C C
</td>
<td valign="top">
Male
</td>
<td valign="top">
</td>
<td valign="top">
</td>
<td valign="top">
</td>
<td valign="top">
</td>
<td valign="top">
BLACKWOOD
</td>
<td valign="top">
61
</td>
<td valign="top">
1924
</td>
<td valign="top">
<a target="_blank" href='XXX'>
order</a>
</td>
</tr>
<tr class="listItemTwoBg">
<td valign="top">
SMITH
</td>
<td valign="top">
WILLIAM C PAGE-
</td>
<td valign="top">
Male
</td>
<td valign="top">
</td>
<td valign="top">
</td>
<td valign="top">
</td>
<td valign="top">
</td>
<td valign="top">
SWAN
</td>
<td valign="top">
9
</td>
<td valign="top">
1914
</td>
<td valign="top">
<a target="_blank" href='XXY'>
order</a>
</td>
</tr>
Here's the code I've tried so far. I get the message "Warning: Invalid argument supplied for foreach()". What am I doing wrong?
$page = curl_exec($ch);
curl_close($ch);
// Create new PHP DOM document
$dom = new DOMDocument;
// Load html from curl request into document model
@$dom->loadHTML($page);
$xpath = new DOMXPath($dom);
$tableRows = $xpath->query("id('divResults')/table/tbody/tr");
foreach ($tableRows as $row) {
// fetch all 'tds' inside this 'tr'
$td = $xpath->query('td', $row);
echo $td->item(1)->textContent;
}
Assuming the table you're after is actually in a <div id="divResults">...
$tableRows = $xpath->query('//div[@id="divResults"]/table/tbody/tr');
foreach ($tableRows as $row) {
$cells = $row->getElementsByTagName('td');
}
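A minimal sketch of the whole loop, under stated assumptions: the page fragment below is hypothetical and only stands in for the cURL response. As far as I know, browser tools such as the Firefox XPath plugin show a tbody because the browser inserts one, while DOMDocument::loadHTML() does not add it when the markup lacks it, so the expression uses //tr to match rows either way.

```php
<?php
// Hypothetical fragment standing in for the cURL response; the real page may differ.
$page = <<<HTML
<div id="divResults">
<table>
<tr class="listItemOneBg"><td>SMITH</td><td>WILLIAM C C</td><td>Male</td></tr>
<tr class="listItemTwoBg"><td>SMITH</td><td>WILLIAM C PAGE-</td><td>Male</td></tr>
</table>
</div>
HTML;

$dom = new DOMDocument();
// Collect parser warnings instead of printing them (an alternative to the @ operator).
libxml_use_internal_errors(true);
$dom->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// "//tr" below the table matches rows whether or not a tbody is present.
$tableRows = $xpath->query('//div[@id="divResults"]/table//tr');

$result = [];
foreach ($tableRows as $row) {
    $cells = [];
    // Relative query: only the td children of this row.
    foreach ($xpath->query('td', $row) as $td) {
        $cells[] = trim($td->textContent);
    }
    $result[] = $cells;
}

foreach ($result as $cells) {
    echo implode(' | ', $cells), PHP_EOL;
}
```

DOMXPath::query() returns false for a malformed expression, which is one way to end up with the foreach() warning, so checking the return value before looping may be worthwhile.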
That's a non-standard XPath expression. It cannot work in DOMXPath. (Downvoters, the expression has been edited since the question was posted. Cheers!)
This is where you learn XPath:
Microsoft XPath Syntax
Microsoft XPath by Example
PS: It's where I learnt it.
I'm trying to get the HTML markup of a table in a page:
$new_dom = new DOMDocument();
$table = '';
$nodesTable = $this->dom->getElementsbyTagName("table");
foreach($nodesTable as $nodeTable){
$color = $nodeTable->getAttribute('bordercolordark');
if ($color == '#73BAFF') {
$table = $nodeTable;
}
}
$new_dom->appendChild($table);
echo $new_dom->saveHTML();
Here is somepage.html:
<html>
<table>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
</table>
<table border="1" cellpadding="0" width="500" bordercolorlight="#ACD6FF" bordercolordark="#73BAFF" align="center">
<tr>
<td rowspan="2" colspan="2" bgcolor="#73BAFF"> </td>
<td colspan="3" align="center" bgcolor="#ACD6FF"> Element 1 </td>
<td colspan="3" align="center" bgcolor="#ACD6FF"> Element 2 </td>
</tr>
<tr>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
</tr>
<tr>
<td bgcolor="#ACD6FF" width="155" align="center"> Row 1</td>
<td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
</tr>
<tr>
<td bgcolor="#ACD6FF" width="155" align="center"> Row 2</td>
<td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
</tr>
<tr>
<td bgcolor="#ACD6FF" width="155" align="center"> Row 3</td>
<td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
</tr>
</table>
<table>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
</table>
<table>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
</table>
</html>
$new_dom just outputs \n instead of HTML markup. I tried looking at this answer: PHP DOMDocument stripping HTML tags, but appending the table this way didn't work either.
Fatal error: Uncaught exception 'DOMException' with message 'Wrong Document Error'
So you cannot move nodes from one document to another... If you want to do that, you have to use importNode() with the deep flag.
$dom = new DOMDocument();
$dom->loadHTMLFile('x.html');
$new_dom = new DOMDocument();
$table = '';
$nodesTable = $dom->getElementsbyTagName("table");
foreach($nodesTable as $nodeTable){
$color = $nodeTable->getAttribute('bordercolordark');
if ($color == '#73BAFF') {
$table = $new_dom->importNode($nodeTable, true);
}
}
$new_dom->appendChild($table);
echo $new_dom->saveHTML();
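With the second argument (the deep flag) set to true, importNode() copies the table element together with all of its children; without it, only the table element itself is imported.
Note: I'd disable the entity loader in your case with libxml_disable_entity_loader(true);. I am not sure whether XXE attacks work with loadHTML() too, but it is safer to do so. (The function is deprecated as of PHP 8.0, where external entity loading is disabled by default.)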
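To see what the deep flag of importNode() changes, here is a small sketch. The two-table source document is a hypothetical reduction of somepage.html; only the bordercolordark value is taken from the question.

```php
<?php
// Hypothetical two-table source document; only the attribute value comes from the question.
$source = new DOMDocument();
libxml_use_internal_errors(true);
$source->loadHTML(
    '<table><tr><td>10</td></tr></table>' .
    '<table bordercolordark="#73BAFF"><tr><td>Element 1</td></tr></table>'
);
libxml_clear_errors();

$target = new DOMDocument();
$deep = null;
$shallow = null;

foreach ($source->getElementsByTagName('table') as $nodeTable) {
    if ($nodeTable->getAttribute('bordercolordark') === '#73BAFF') {
        // true = deep copy: the table and all of its descendants.
        $deep = $target->importNode($nodeTable, true);
        // false = shallow copy: the table element and its attributes only.
        $shallow = $target->importNode($nodeTable, false);
        $target->appendChild($deep);
        break;
    }
}

// saveHTML($node) serializes just that node.
$deepHtml = $target->saveHTML($deep);
$shallowHtml = $target->saveHTML($shallow);
echo $deepHtml, PHP_EOL, $shallowHtml, PHP_EOL;
```

The shallow copy serializes to an empty table element, which is consistent with the near-empty output described in the question when the children are not carried over.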
I've been trying to split a PHP string in an arbitrary number of characters per split. However, I'm looking for a way to do so without breaking HTML tags. Here is an example:
$string = 'Section 1:
<table width = "528" border="0" cellpadding="0" cellspacing="0">
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 1 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 2 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 3 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top">• </td> <td valign="top"> Element 4 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 5 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 6 </td></tr>
</table>
Section 2:
<table width = "528" border="0" cellpadding="0" cellspacing="0">
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 7 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 8 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 9 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 10 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 11 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 12 </td></tr>
<tr> <td width="20"> </td> <td width="15" valign="top"> • </td> <td valign="top"> Element 13 </td></tr>
</table>';
$charAmount = 450;
$textSplit = array();
while ($string){
array_push($textSplit, substr($string, 0, $charAmount));
$string = substr($string, $charAmount);
}
var_dump($textSplit);
In this case, two tags are broken. I'd like whatever tag that is cut up at the end of a split to just skip to the next split, but I have no idea how to do this.
I'm not a PHP person, but I can help with the logic. Just before each split, search backwards from the split index and check which of these two characters comes first: < or >.
If < is found first, you are splitting inside a tag, so skip that position.
If > is found first, go ahead with the split.
I did this successfully in jQuery some time ago.
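The backward check described above could be sketched in PHP roughly as follows. splitOutsideTags() is a hypothetical helper name, and the function only compares the positions of "<" and ">" rather than parsing HTML, so a literal "<" or ">" inside text or attribute values would confuse it.

```php
<?php
/**
 * Split $html into pieces of at most $limit bytes without cutting through a tag.
 * Hypothetical helper; it does not parse HTML, it only looks at "<" and ">".
 */
function splitOutsideTags(string $html, int $limit): array
{
    if ($limit < 1) {
        throw new InvalidArgumentException('Limit must be at least 1.');
    }
    $pieces = [];
    while ($html !== '') {
        $chunk = substr($html, 0, $limit);
        if (strlen($chunk) < strlen($html)) {
            $lastOpen = strrpos($chunk, '<');
            $lastClose = strrpos($chunk, '>');
            // A "<" after the last ">" means the cut falls inside a tag.
            if ($lastOpen !== false && ($lastClose === false || $lastOpen > $lastClose)) {
                if ($lastOpen > 0) {
                    // Back up so the whole tag moves to the next piece.
                    $chunk = substr($html, 0, $lastOpen);
                } else {
                    // The tag alone exceeds the limit: keep it whole anyway.
                    $end = strpos($html, '>');
                    $chunk = $end === false ? $html : substr($html, 0, $end + 1);
                }
            }
        }
        $pieces[] = $chunk;
        $html = substr($html, strlen($chunk));
    }
    return $pieces;
}

$string = 'Section 1: <table width="528"><tr><td valign="top">Element 1</td></tr></table>';
$textSplit = splitOutsideTags($string, 25);
var_dump($textSplit);
```

Pieces can come out shorter than the limit, and a single tag longer than the limit is kept whole rather than cut, which is a design choice of this sketch and not something the question specifies.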
I have no suggestion for splitting an HTML string right now, but for cutting a string at a character limit you could refer to the solution at https://github.com/dhngoc/php-cut-html-string.
This resource may give you more ideas.
I have a table (see the code below). It is a table that has tables inside it, so it is nested. I want to get all values of the parent table only, and then all values of the child tables.
To get the child data I can do this:
$query = '//*[@id="WordClass"]/table[2]/tr/td[2]/table/tr';
$nodes = $xpath->query($query);
foreach ($nodes as $node) { // run further queries to get the td data and save it
My problem is how to get only the data of the parent table without also getting the child tr/td data.
<table cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr valign="top">
<td>
<table cellpadding="1" cellspacing="2" border="0">
<tr>
<td class="colTitle" align="center" colspan="4">
Da Titel
</td>
</tr>
<tr>
<td class="colTitle" align="center" colspan="2">One
</td>
<td class="colTitle" align="center" colspan="2">Two
I
</td>
</tr>
<tr>
<td class="colSubTitle">Pe</td>
<td class="colSubTitle">Ve</td>
<td class="colSubTitle">Pe</td>
<td class="colSubTitle">Ve</td>
</tr>
<tr>
<td class="rowTitle">x</td>
<td class="colVerbDef">y</td>
<td class="rowTitle">z</td>
<td class="colVerbDef">c</td>
</tr>
<tr>
<td class="rowTitle">r</td>
<td class="colVerbDef">t</td>
<td class="rowTitle">z</td>
<td class="colVerbDef">z</td>
</tr>
</table>
</td>
<td>
<table cellpadding="1" cellspacing="2" border="0">
<tr>
<td class="colTitle" align="center" colspan="4">
Da Titel2
</td>
</tr>
<tr>
<td class="colTitle" align="center" colspan="2">One
</td>
<td class="colTitle" align="center" colspan="2">Two
I
</td>
</tr>
<tr>
<td class="colSubTitle">Pe2</td>
<td class="colSubTitle">Ve2</td>
<td class="colSubTitle">Pe2</td>
<td class="colSubTitle">Ve2</td>
</tr>
<tr>
<td class="rowTitle">x2</td>
<td class="colVerbDef">y2</td>
<td class="rowTitle">z2</td>
<td class="colVerbDef">c2</td>
</tr>
<tr>
<td class="rowTitle">r2</td>
<td class="colVerbDef">t2</td>
<td class="rowTitle">z2</td>
<td class="colVerbDef">z2</td>
</tr>
</table>
</td>
</tr>
</tbody>
You can get the contents of the parent table's td elements using a direct path from the root:
/table/tbody/tr/td
The contents of those cells happen to be another table element, but you can strip those out with DOMDocument.
To get the inner tables' td elements only excluding the parents, you can look for tables that have a td parent, then select its tds:
//td/table//td
If I've misunderstood your question, please feel free to explain further and I will update.
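A small sketch of the two selections with DOMXPath, assuming a reduced, hypothetical version of the nested table. Because loadHTML() wraps the fragment in html and body elements, the sketch identifies the outer cells by their lack of a td ancestor instead of using an absolute path.

```php
<?php
// Reduced, hypothetical version of the nested table from the question.
$html = <<<HTML
<table>
<tr>
<td><table><tr><td class="rowTitle">x</td><td class="colVerbDef">y</td></tr></table></td>
<td><table><tr><td class="rowTitle">x2</td><td class="colVerbDef">y2</td></tr></table></td>
</tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Cells that belong to a table nested inside another cell.
$inner = [];
foreach ($xpath->query('//td/table//td') as $td) {
    $inner[] = trim($td->textContent);
}

// Cells with no td ancestor, i.e. the cells of the outermost table.
$outerCells = $xpath->query('//td[count(ancestor::td) = 0]');
$outerCount = $outerCells->length;

print_r($inner);
echo $outerCount, PHP_EOL;
```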
I'm trying to extract a specific link from a table, but nothing is displayed. It's the 3rd link in the td. I thought this would work, but it doesn't.
Here is the code:
<?php
$site = 'site';
$html = file_get_html($site);
foreach($html->find('td a', 3) as $element)
echo $element->href;
?>
Here is the HTML
<tr class="evenrow team-600-359">
<td>
Aug 17
</td>
<td>
FT
</td>
<td align="right">
Arsenal
</td>
<td align="center">
1-3
</td>
<td>Aston Villa</td>
<td style="text-align:right;">60,003</td>
</td>
<td>
Premier League
</td>
</tr>
You have invalid HTML, which may be the cause.
Check the doubled closing </td> after the cell with the value 60,003.
Just use the native DOMDocument:
$str = <<<STR
<tr class="evenrow team-600-359">
<td>
Aug 17
</td>
<td>
FT
</td>
<td align="right">
Arsenal
</td>
<td align="center">
1-3
</td>
<td>Aston Villa</td>
<td style="text-align:right;">60,003</td>
</td>
<td>
Premier League
</td>
</tr>
STR;
$dom = new DOMDocument();
@$dom->loadHTML($str);
$elements = $dom->getElementsByTagName('td');
echo '<pre>' . print_r($dom->saveXML($elements->item(2)), true) . '</pre>';
OUTPUT
<td align="right">
Arsenal
</td>
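To pick out a single link with the native classes, something like the following might work. The row and its href values are hypothetical placeholders, since the markup in the question shows no links. Note that item() is zero-based, while XPath positions start at 1.

```php
<?php
// Hypothetical row with links; the hrefs are placeholders.
$str = '<table><tr>'
     . '<td><a href="/a">Arsenal</a></td>'
     . '<td><a href="/b">1-3</a></td>'
     . '<td><a href="/c">Aston Villa</a></td>'
     . '</tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($str);
libxml_clear_errors();

// item() is zero-based, so index 2 is the third link.
$third = $dom->getElementsByTagName('a')->item(2);
$hrefByIndex = $third ? $third->getAttribute('href') : null;

// XPath positions start at 1; the parentheses apply [3] to the whole result set.
$xpath = new DOMXPath($dom);
$node = $xpath->query('(//td/a)[3]')->item(0);
$hrefByXPath = $node ? $node->getAttribute('href') : null;

echo $hrefByIndex, PHP_EOL, $hrefByXPath, PHP_EOL;
```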
I'm working on a module in which I have to generate a PDF from a PHP page. I'm using TCPDF for that, but I'm facing one problem: the file contains some MySQL queries and PHP code that are not executed in the generated PDF.
$prn_no = $_POST['prn_no'];
$current_sem = $_POST['current_sem'];
$qr_fetch_sem_res_id = mysql_query("SELECT * FROM table1 WHERE ((prn='$prn_no') AND (semisterName='$current_sem'))")or die(mysql_error());
$qr_fetch_sem_result_ans = mysql_fetch_array($qr_fetch_sem_res_id);
<tr>
<td colspan="11" align="left" valign="middle">Programme Name: <?php echo $qr_fetch_sem_result_ans['programme_name'];?></td>
</tr>
<tr>
<td colspan="11" align="center" valign="middle"><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="27%">Seat No.: <?php echo $qr_fetch_sem_result_ans['seatNo'];?></td>
<td width="3%"> </td>
<td width="22%">PR No. : <?php echo $qr_fetch_sem_result_ans['prn'];?></td>
<td width="2%"> </td>
<td width="17%">Semester : <?php echo $qr_fetch_sem_result_ans['semisterName'];?></td>
<td width="1%"> </td>
<td width="25%">Month / Year Of Exam : <?php echo $qr_fetch_sem_result_ans['month_year_of_exam'];?> </td>
<td width="3%"> </td>
</tr>
<tr>
<td colspan="3">Name: <?php echo $qr_fetch_sem_result_ans['student_name'];?></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td colspan="7">College / Institute: <?php echo $qr_fetch_sem_result_ans['institute_name'];?></td>
<td> </td>
</tr>
</table></td>
</tr>
I'm going to go out on a limb here and suggest that you run your queries first and then build your PDF file. If you run the queries after you build the PDF, then of course it will not have access to your data. If that doesn't help, then I must not understand what you're asking.
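A rough sketch of that order of operations: fetch the row, turn it into a finished HTML string, and only then hand the string to the PDF library. buildResultHtml() and the sample row are hypothetical; the array keys follow the column names used in the question, and the TCPDF call appears only as a comment because the library is not loaded here.

```php
<?php
/**
 * Build the HTML for the PDF from a row that has already been fetched.
 * Hypothetical helper; the array keys follow the column names in the question.
 */
function buildResultHtml(array $row): string
{
    // Escape values so a stray "<" or "&" in the data cannot break the markup.
    $e = static function (string $key) use ($row): string {
        return htmlspecialchars((string) ($row[$key] ?? ''), ENT_QUOTES, 'UTF-8');
    };

    return '<table width="100%" border="0" cellspacing="0" cellpadding="0">'
        . '<tr><td>Programme Name: ' . $e('programme_name') . '</td></tr>'
        . '<tr><td>Seat No.: ' . $e('seatNo') . '</td></tr>'
        . '<tr><td>Name: ' . $e('student_name') . '</td></tr>'
        . '</table>';
}

// Stand-in for the row fetched from the database before any PDF work starts.
$row = [
    'programme_name' => 'Example Programme',
    'seatNo' => '42',
    'student_name' => 'A & B',
];

$html = buildResultHtml($row);
echo $html, PHP_EOL;

// The finished string would then go to the PDF library, e.g. TCPDF:
// $pdf->writeHTML($html);
```

The point of the design is that the string passed on contains no PHP tags at all, only their already-evaluated results.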