Xpath nested tables - php

I have a table (see the code below) that contains another table, so it is nested. I want to get all values of the parent table only, and then all values of the child table.
To get the child's data I can do this:
$query = '//*[@id="WordClass"]/table[2]/tr/td[2]/table/tr';
$nodes = $xpath->query($query);
foreach ($nodes as $node) {
    // run further queries to get the td data and save it
}
My problem is how to get only the data of the parent table, without also getting the child table's tr/td data.
<table cellpadding="0" cellspacing="0" border="0">
<tbody>
<tr valign="top">
<td>
<table cellpadding="1" cellspacing="2" border="0">
<tr>
<td class="colTitle" align="center" colspan="4">
Da Titel
</td>
</tr>
<tr>
<td class="colTitle" align="center" colspan="2">One
</td>
<td class="colTitle" align="center" colspan="2">Two
I
</td>
</tr>
<tr>
<td class="colSubTitle">Pe</td>
<td class="colSubTitle">Ve</td>
<td class="colSubTitle">Pe</td>
<td class="colSubTitle">Ve</td>
</tr>
<tr>
<td class="rowTitle">x</td>
<td class="colVerbDef">y</td>
<td class="rowTitle">z</td>
<td class="colVerbDef">c</td>
</tr>
<tr>
<td class="rowTitle">r</td>
<td class="colVerbDef">t</td>
<td class="rowTitle">z</td>
<td class="colVerbDef">z</td>
</tr>
</table>
</td>
<td>
<table cellpadding="1" cellspacing="2" border="0">
<tr>
<td class="colTitle" align="center" colspan="4">
Da Titel2
</td>
</tr>
<tr>
<td class="colTitle" align="center" colspan="2">One
</td>
<td class="colTitle" align="center" colspan="2">Two
I
</td>
</tr>
<tr>
<td class="colSubTitle">Pe2</td>
<td class="colSubTitle">Ve2</td>
<td class="colSubTitle">Pe2</td>
<td class="colSubTitle">Ve2</td>
</tr>
<tr>
<td class="rowTitle">x2</td>
<td class="colVerbDef">y2</td>
<td class="rowTitle">z2</td>
<td class="colVerbDef">c2</td>
</tr>
<tr>
<td class="rowTitle">r2</td>
<td class="colVerbDef">t2</td>
<td class="rowTitle">z2</td>
<td class="colVerbDef">z2</td>
</tr>
</table>
</td>
</tr>
</tbody>
</table>

You can get the contents of the parent table's td elements using a direct path from the root:
/table/tbody/tr/td
The content of each of those cells is itself a table element, but you can strip the nested tables out with DOMDocument.
To get only the inner tables' td elements, excluding the parent's, look for tables whose parent is a td, then select their tds:
//td/table//td
If I've misunderstood your question, please feel free to explain further and I will update.
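As a minimal sketch of the two queries, assuming the markup is loaded with DOMDocument::loadHTML (which wraps a fragment in html and body elements, so an absolute path starting at /table would need that prefix), the following separates outer cells from inner cells. The sample markup and the variable names are illustrative and not taken from the question.

```php
<?php
// Illustrative markup: one outer cell that holds text and a nested table.
$html = <<<HTML
<table id="outer">
  <tr>
    <td>outer cell
      <table>
        <tr><td>inner a</td><td>inner b</td></tr>
      </table>
    </td>
  </tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Cells that belong to a table which is not nested inside another table.
$outerCells = $xpath->query('//td[count(ancestor::table) = 1]');
// Cells of any table that sits directly inside a td.
$innerCells = $xpath->query('//td/table//td');

$outerText = [];
foreach ($outerCells as $cell) {
    // Read only the cell's own text nodes, skipping the nested table.
    $text = '';
    foreach ($xpath->query('text()', $cell) as $textNode) {
        $text .= $textNode->nodeValue;
    }
    $outerText[] = trim($text);
}

$innerText = [];
foreach ($innerCells as $cell) {
    $innerText[] = trim($cell->nodeValue);
}

print_r($outerText);
print_r($innerText);
```

Counting table ancestors avoids depending on whether a tbody element is present, which differs between the markup in the question and what a parser may produce.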

Related

Set While loop in FPDF

This is the code I am using to generate a PDF file. I build the table in $html and it is displayed in the PDF. But when I insert a while loop inside $html, the output shows the PHP code itself and I get errors. I am confused about where the single quotes that close the string should go.
$html='<br><br><br>
<table border="1">
<tr>
<td bgcolor="#D0D0FF">Product Name</td>
<td bgcolor="#D0D0FF">Product Price</td>
<td bgcolor="#D0D0FF">Product Quantity</td>
<td bgcolor="#D0D0FF">Total Price</td>
</tr>
//Here
</table>
<br>
<table border="1">
<tr>
<td bgcolor="#D0D0FF">Order Date</td>
<td width="600">'.$row['Order_Date'].'</td>
</tr>
<tr>
<td bgcolor="#D0D0FF">Tracking ID</td>
<td width="600">'.$row['Tracking_ID'].'</td>
</tr>
<tr>
<td bgcolor="#D0D0FF">Recipient</td>
<td width="600">'.$row['Tracking_Recipient'].'</td>
</tr>
<tr>
<td bgcolor="#D0D0FF">Phone Number</td>
<td width="600">'.$row['Tracking_PhoneNumber'].'</td>
</tr>
<tr>
<td bgcolor="#D0D0FF">Address</td>
<td width="600">'.$row['Tracking_Address'].'</td>
</tr>
<tr>
<td bgcolor="#D0D0FF">Bank</td>
<td width="600">'.$row['Bank'].'</td>
</tr>
</table>';
//While loop
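One possible approach, sketched under assumptions: close the string before the loop, append one row per iteration with the .= operator, and then append the rest of the markup. The $items array and its keys are hypothetical stand-ins for the query result, which the question does not show. A loop such as while ($row = mysqli_fetch_assoc($result)) would go in the same place.

```php
<?php
// Hypothetical order items standing in for rows fetched from the database.
$items = [
    ['name' => 'Widget', 'price' => 2.50, 'qty' => 4],
    ['name' => 'Gadget', 'price' => 10.00, 'qty' => 1],
];

// 1. Start the string and close it before the loop.
$html = '<table border="1">
<tr>
<td bgcolor="#D0D0FF">Product Name</td>
<td bgcolor="#D0D0FF">Product Price</td>
<td bgcolor="#D0D0FF">Product Quantity</td>
<td bgcolor="#D0D0FF">Total Price</td>
</tr>';

// 2. Append one table row per item; PHP code stays outside the quotes.
while ($item = array_shift($items)) {
    $total = $item['price'] * $item['qty'];
    $html .= '<tr>'
        . '<td>' . htmlspecialchars($item['name']) . '</td>'
        . '<td>' . number_format($item['price'], 2) . '</td>'
        . '<td>' . (int) $item['qty'] . '</td>'
        . '<td>' . number_format($total, 2) . '</td>'
        . '</tr>';
}

// 3. Close the table; the remaining markup can be appended the same way.
$html .= '</table>';

echo $html;
```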

Get all information from each table

Because each product has a different price for each of its packages, I have created a separate table for each product.
<table width="70%" border="0" cellspacing="0" cellpadding="0">
<tr c>
<td colspan="4"><div align="center">Product One</div></td>
</tr>
<tr>
<td>Id</td>
<td>package</td>
<td>price</td>
<td>image</td>
</tr>
<tr>
<td>1</td>
<td>2kg</td>
<td>$10</td>
<td>p2.jpg</td>
</tr>
<tr>
<td>2</td>
<td>4kg</td>
<td>$20</td>
<td>p4.jpg</td>
</tr>
<tr>
<td>3</td>
<td>6kg</td>
<td>$30</td>
<td>p6.jpg</td>
</tr>
<tr>
<td>4</td>
<td>8kg</td>
<td>$40</td>
<td>p8.jpg</td>
</tr>
</table></br></br>
<table width="70%" border="0" cellspacing="0" cellpadding="0">
<tr c>
<td colspan="4"><div align="center">Product Two</div></td>
</tr>
<tr>
<td>Id</td>
<td>package</td>
<td>price</td>
<td>image</td>
</tr>
<tr>
<td>1</td>
<td>2kg</td>
<td>$12</td>
<td>p2.jpg</td>
</tr>
<tr>
<td>2</td>
<td>4kg</td>
<td>$14</td>
<td>p4.jpg</td>
</tr>
<tr>
<td>3</td>
<td>6kg</td>
<td>$16</td>
<td>p6.jpg</td>
</tr>
<tr>
<td>4</td>
<td>8kg</td>
<td>$18</td>
<td>p8.jpg</td>
</tr>
</table>
Now I want to create a new table that collects all the product names: Product One, Product Two, Product Three and Product Four.
<table width="70%" border="0" cellspacing="0" cellpadding="0">
<tr c>
<td colspan="4"><div align="center">Product One</div></td>
</tr>
<tr>
<td>Id</td>
<td>Name</td>
</tr>
<tr>
<td>1</td>
<td>Product One</td>
</tr>
<tr>
<td>2</td>
<td>Product Two</td>
</tr>
<tr>
<td>3</td>
<td>Product 3</td>
</tr>
<tr>
<td>4</td>
<td>Product Four</td>
</tr>
</table></br></br>
My question is how to get all the information for every product by linking it to the product's name.
I am going to create a drop-down selection for the package. When the user selects a package, the price should change.
I know this is ColdFusion, but I am sure there must be something similar in PHP.
<cfquery name="gtmyinfo" datasource="mydb">
SELECT *
FROM mytable
</cfquery>
<table>
<tr>
<td colspan="4"><div align="center">Product ID</div></td>
<td colspan="4"><div align="center">Product Name</div></td>
<td colspan="4"><div align="center">Product Image</div></td>
</tr>
<cfoutput query="gtmyinfo">
<tr>
<td colspan="4"><div align="center">#ProductID#</div></td>
<td colspan="4"><div align="center">#ProductName#</div></td>
<td colspan="4"><div align="center">#ProductImage#</div></td>
</tr>
</cfoutput>
</table>
The table row repeats once for every row of data returned by the query.
You can limit which data is returned with a WHERE clause in the query, or filter it with an if/else tag inside the table.
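A rough PHP counterpart of the cfoutput loop might look like the sketch below. The $rows array is a hypothetical stand-in for a query result (for example, what PDOStatement::fetchAll(PDO::FETCH_ASSOC) would return), and the column names simply mirror the ColdFusion example.

```php
<?php
// Hypothetical rows; in practice these would come from a database query.
$rows = [
    ['ProductID' => 1, 'ProductName' => 'Product One', 'ProductImage' => 'p1.jpg'],
    ['ProductID' => 2, 'ProductName' => 'Product Two', 'ProductImage' => 'p2.jpg'],
];

// Header row first, then one table row per record.
$table = "<table>\n<tr><td>Product ID</td><td>Product Name</td><td>Product Image</td></tr>\n";
foreach ($rows as $row) {
    $table .= '<tr>'
        . '<td>' . (int) $row['ProductID'] . '</td>'
        . '<td>' . htmlspecialchars($row['ProductName']) . '</td>'
        . '<td>' . htmlspecialchars($row['ProductImage']) . '</td>'
        . "</tr>\n";
}
$table .= '</table>';

echo $table;
```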

PHP DOM get element which contains

I need help parsing HTML with PHP DOM.
This is a small part of a much larger HTML document:
<table width="100%" border="0" align="center" cellspacing="3" cellpadding="0" bgcolor='#ffffff'>
<tr>
<td align="left" valign="top" width="20%">
<span class="tl">Obchodne meno:</span>
</td>
<td align="left" width="80%">
<table width="100%" border="0">
<tr>
<td width="67%">
<span class='ra'>STORE BUSSINES</span>
</td>
<td width="33%" valign='top'>
<span class='ra'>(od: 02.10.2012)</span>
</td>
</tr>
</table>
</td>
</tr>
</table>
What I need is to get the text "STORE BUSSINES". Unfortunately, the only thing I can match on is "Obchodne meno", the content of the first span, so starting from that content I need to get its parent->parent->first sibling->child->child->child->child->content. I have limited experience with parsing HTML in PHP, so any help will be valuable. Thanks in advance!
Use the DOMDocument class, loop through the <span> tags and put their values in an array.
<?php
$html=<<<XCOE
<table width="100%" border="0" align="center" cellspacing="3" cellpadding="0" bgcolor='#ffffff'>
<tr>
<td align="left" valign="top" width="20%">
<span class="tl">Obchodne meno:</span>
</td>
<td align="left" width="80%">
<table width="100%" border="0">
<tr>
<td width="67%">
<span class='ra'>STORE BUSSINES</span>
</td>
<td width="33%" valign='top'>
<span class='ra'>(od: 02.10.2012)</span>
</td>
</tr>
</table>
</td>
</tr>
</table>
XCOE;
$dom = new DOMDocument;
$dom->loadHTML($html);
$spanarr = []; // collects the text of every <span>, in document order
foreach ($dom->getElementsByTagName('span') as $tag) {
    $spanarr[] = $tag->nodeValue;
}
echo $spanarr[1]; // prints "STORE BUSSINES"
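If the position of the span in the page is not stable, an alternative is to anchor on the label text, as the question describes. This is only a sketch, assuming the label cell and the value cell are siblings in the same row as in the sample. The XPath expression and the variable names are mine, not part of the original answer.

```php
<?php
$html = <<<'HTML'
<table>
<tr>
<td><span class="tl">Obchodne meno:</span></td>
<td>
<table>
<tr>
<td><span class='ra'>STORE BUSSINES</span></td>
<td><span class='ra'>(od: 02.10.2012)</span></td>
</tr>
</table>
</td>
</tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Find the cell holding the label, step to the next cell in the same row,
// then collect the "ra" spans inside it in document order.
$query = '//td[span[@class="tl" and normalize-space()="Obchodne meno:"]]'
       . '/following-sibling::td[1]//span[@class="ra"]';
$spans = $xpath->query($query);

$value = $spans->length > 0 ? trim($spans->item(0)->nodeValue) : null;
echo $value;
```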

How do I extract values from a html page stored as string using curl function

I am using PHP and curl to fetch an HTML page into a string. I then need to extract the following data and plot a graph from it.
The data I want looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux (vers 25 March 2009), see www.w3.org" />
<title></title>
</head>
<body>
<table>
<tbody>
<tr>
<td>
<h3>Income</h3>
</td>
</tr>
<tr>
<td>Operating income</td>
<td class="numericalColumn">22,922.00</td>
<td class="numericalColumn">21,507.30</td>
<td class="numericalColumn">17,492.60</td>
<td class="numericalColumn">13,683.90</td>
<td class="numericalColumn">10,227.12</td>
</tr>
<tr>
<td>
<h3>Expenses</h3>
</td>
</tr>
<tr>
<td>Material consumed</td>
<td class="numericalColumn">4,029.40</td>
<td class="numericalColumn">3,442.60</td>
<td class="numericalColumn">2,952.30</td>
<td class="numericalColumn">1,889.00</td>
<td class="numericalColumn">1,367.67</td>
</tr>
<tr>
<td>Manufacturing expenses </td>
<td class="numericalColumn">2,213.20</td>
<td class="numericalColumn">1,841.80</td>
<td class="numericalColumn">299.80</td>
<td class="numericalColumn">120.50</td>
<td class="numericalColumn">1,020.70</td>
</tr>
<tr>
<td>Personnel expenses</td>
<td class="numericalColumn">9,062.80</td>
<td class="numericalColumn">9,249.80</td>
<td class="numericalColumn">7,409.10</td>
<td class="numericalColumn">5,768.20</td>
<td class="numericalColumn">4,279.03</td>
</tr>
<tr>
<td>Selling expenses</td>
<td class="numericalColumn">378.10</td>
<td class="numericalColumn">308.40</td>
<td class="numericalColumn">532.10</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">171.05</td>
</tr>
<tr>
<td>Adminstrative expenses</td>
<td class="numericalColumn">1,737.00</td>
<td class="numericalColumn">1,906.00</td>
<td class="numericalColumn">2,583.70</td>
<td class="numericalColumn">2,651.70</td>
<td class="numericalColumn">904.78</td>
</tr>
<tr>
<td>Expenses capitalised</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
</tr>
<tr>
<td>Cost of sales</td>
<td class="numericalColumn">17,420.50</td>
<td class="numericalColumn">16,748.60</td>
<td class="numericalColumn">13,777.00</td>
<td class="numericalColumn">10,429.40</td>
<td class="numericalColumn">7,743.22</td>
</tr>
<tr>
<td>Operating profit</td>
<td class="numericalColumn">5,501.50</td>
<td class="numericalColumn">4,758.70</td>
<td class="numericalColumn">3,715.60</td>
<td class="numericalColumn">3,254.50</td>
<td class="numericalColumn">2,483.90</td>
</tr>
<tr>
<td>Other recurring income</td>
<td class="numericalColumn">434.20</td>
<td class="numericalColumn">468.20</td>
<td class="numericalColumn">326.90</td>
<td class="numericalColumn">288.70</td>
<td class="numericalColumn">113.59</td>
</tr>
<tr>
<td>Adjusted PBDIT</td>
<td class="numericalColumn">5,935.70</td>
<td class="numericalColumn">5,226.90</td>
<td class="numericalColumn">4,042.50</td>
<td class="numericalColumn">3,543.20</td>
<td class="numericalColumn">2,597.49</td>
</tr>
<tr>
<td>Financial expenses</td>
<td class="numericalColumn">108.40</td>
<td class="numericalColumn">196.80</td>
<td class="numericalColumn">116.80</td>
<td class="numericalColumn">7.20</td>
<td class="numericalColumn">3.13</td>
</tr>
<tr>
<td>Depreciation </td>
<td class="numericalColumn">579.60</td>
<td class="numericalColumn">533.60</td>
<td class="numericalColumn">456.00</td>
<td class="numericalColumn">359.80</td>
<td class="numericalColumn">292.26</td>
</tr>
<tr>
<td>Other write offs</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
</tr>
<tr>
<td>Adjusted PBT</td>
<td class="numericalColumn">5,247.70</td>
<td class="numericalColumn">4,496.50</td>
<td class="numericalColumn">3,469.70</td>
<td class="numericalColumn">3,176.20</td>
<td class="numericalColumn">2,302.10</td>
</tr>
<tr>
<td>Tax charges </td>
<td class="numericalColumn">790.80</td>
<td class="numericalColumn">574.10</td>
<td class="numericalColumn">406.40</td>
<td class="numericalColumn">334.10</td>
<td class="numericalColumn">286.10</td>
</tr>
<tr>
<td>Adjusted PAT</td>
<td class="numericalColumn">4,456.90</td>
<td class="numericalColumn">3,922.40</td>
<td class="numericalColumn">3,063.30</td>
<td class="numericalColumn">2,842.10</td>
<td class="numericalColumn">2,016.00</td>
</tr>
<tr>
<td>Non recurring items</td>
<td class="numericalColumn">441.10</td>
<td class="numericalColumn">-948.60</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">38.33</td>
</tr>
<tr>
<td>Other non cash adjustments</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-33.85</td>
</tr>
<tr>
<td>Reported net profit</td>
<td class="numericalColumn">4,898.00</td>
<td class="numericalColumn">2,973.80</td>
<td class="numericalColumn">3,063.30</td>
<td class="numericalColumn">2,842.10</td>
<td class="numericalColumn">2,020.48</td>
</tr>
<tr>
<td>Earnigs before appropriation</td>
<td class="numericalColumn">4,898.00</td>
<td class="numericalColumn">2,973.80</td>
<td class="numericalColumn">3,063.30</td>
<td class="numericalColumn">2,842.10</td>
<td class="numericalColumn">2,020.48</td>
</tr>
<tr>
<td>Equity dividend</td>
<td class="numericalColumn">880.90</td>
<td class="numericalColumn">586.00</td>
<td class="numericalColumn">876.50</td>
<td class="numericalColumn">873.70</td>
<td class="numericalColumn">712.88</td>
</tr>
<tr>
<td>Preference dividend</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
</tr>
<tr>
<td>Dividend tax</td>
<td class="numericalColumn">128.30</td>
<td class="numericalColumn">99.60</td>
<td class="numericalColumn">148.90</td>
<td class="numericalColumn">126.80</td>
<td class="numericalColumn">99.98</td>
</tr>
<tr>
<td>Retained earnings</td>
<td class="numericalColumn">3,888.80</td>
<td class="numericalColumn">2,288.20</td>
<td class="numericalColumn">2,037.90</td>
<td class="numericalColumn">1,841.60</td>
<td class="numericalColumn">1,207.62</td>
</tr>
</tbody>
</table>
</body>
</html>
I want to extract each row, such as "Manufacturing expenses", together with the values for all the years listed on that line. How do I go about this?
I found something like preg_match('#<tr><th>(.*)</th> <td><b>price</b></td></tr>#', $content, $match); but that doesn't get the values I want.
If I understood your question correctly, you want something like this. I wrote it myself, so if you need clarification I'd be glad to help.
Cheers!
You can use libraries like PHP Simple HTML DOM Parser to extract data from HTML/XHTML.
http://simplehtmldom.sourceforge.net/manual.htm
An example:
$pageDom = str_get_html($rawHtmlData);
foreach ($pageDom->find('td') as $tblElem) {
    if (false !== stristr($tblElem->innertext, 'Manufacturing expenses')) {
        // Do stuff
    }
}
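The same extraction can also be sketched with PHP's built-in DOMDocument and DOMXPath instead of an external library. The sample below uses a shortened copy of the table, and treating "-" as a missing value (null) is my assumption about how the data should be read.

```php
<?php
// Shortened copy of the table from the question.
$html = '<table><tbody>
<tr><td><h3>Expenses</h3></td></tr>
<tr>
<td>Manufacturing expenses </td>
<td class="numericalColumn">2,213.20</td>
<td class="numericalColumn">1,841.80</td>
<td class="numericalColumn">-</td>
</tr>
</tbody></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$data = [];
// Only rows that contain numeric cells; heading rows are skipped.
foreach ($xpath->query('//tr[td[@class="numericalColumn"]]') as $row) {
    $label = trim($xpath->query('td[1]', $row)->item(0)->textContent);
    $values = [];
    foreach ($xpath->query('td[@class="numericalColumn"]', $row) as $cell) {
        $raw = trim($cell->textContent);
        // Drop thousands separators; a lone "-" is treated as no value.
        $values[] = ($raw === '-') ? null : (float) str_replace(',', '', $raw);
    }
    $data[$label] = $values;
}

print_r($data['Manufacturing expenses']);
```

The resulting array maps each row label to its list of yearly values, which is a convenient shape to hand to a charting routine.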

Php HTML DOM parsing

<table width="100%" cellspacing="0" cellpadding="0" border="0" id="Table4">
<tbody>
<tr>
<td valign="top" class="tx-strong-dgrey">
<a class="anc-noul" href="http://www.example.com/catalog/proddetail.asp?logon=&langid=EN&sku_id=0665000FS10129471&catid=25653">
Apple 8GB 3rd Generation iPod Touch</a></td>
</tr>
<tr>
<td valign="top" class="element-spacer"/>
</tr>
<tr>
<td valign="top" class="tx-normal-grey">
Product detail
<a href="http://www.example.com/catalog/proddetail.asp?logon=&langid=EN&sku_id=0665000FS10129471&catid=25653">
More Info</a></td>
</tr>
<tr>
<td valign="top" class="element-spacer"/>
</tr>
<tr>
<td valign="top" class="tx-normal-red">
<span class="tx-strong-dgrey">Price:</span>
$189.99</td>
</tr>
<tr>
<td valign="top">You save: $9.00 after instant savings</td>
</tr>
<tr>
<td valign="top" class="element-spacer"/>
</tr>
<tr>
<td valign="top" class="tx-normal-grey">
<a href="http://www.example.com/catalog/subclass.asp?catid=25653&logon=&langid=EN">
View similar products</a>
<a href="http://www.example.com/catalog/mfr.asp?man=Apple&catid=19&logon=&langid=EN">
View similar products with same brand</a>
</td></tr>
<tr>
<td valign="top" class="element-spacer"/>
</tr>
</tbody>
</table>
I want to be able to get just the price, $189.99.
echo $ret[0]->find('tr', 4)->plaintext;
This outputs: 'Price: $189.99'
I need only $189.99, without 'Price:'.
You can split the string on the colon and trim the second part:
$exp = explode(":", $ret[0]->find('tr', 4)->plaintext);
$price = trim($exp[1]);
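As an alternative sketch that does not rely on splitting the string, the price can be read as the text node that follows the "Price:" span. This uses PHP's built-in DOMDocument and DOMXPath rather than the library used above, and assumes the tx-normal-red class identifies the price cell as in the sample.

```php
<?php
// Reduced copy of the price row from the question.
$html = '<table id="Table4"><tbody>
<tr>
<td valign="top" class="tx-normal-red">
<span class="tx-strong-dgrey">Price:</span>
$189.99</td>
</tr>
</tbody></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// First text node after the label span inside the price cell.
$nodes = $xpath->query('//td[@class="tx-normal-red"]/span/following-sibling::text()[1]');
$price = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;

echo $price;
```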
