I'm having trouble summing a specific column with DOMDocument and XPath.
This is what the source file looks like:
(several other tables come first, and then...)
<table><hr><tr><td><table>
<td align="center" colspan="1"><u><b>Contracts</b></u></td>
<tr><th>pos</th><th>player</th><th>age</th><th>year 1</th><th>year 2</th><th>year 3</th><th>year 4</th><th>year 5</th><th>year 6</th></tr>
<tr><td CLASS=tdp>PG</td><td CLASS=tdp>James Harden </td><td>27</td><td>20.00</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>PG</td><td CLASS=tdp>Terry Rozier </td><td>22</td><td>1.10</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>SG</td><td CLASS=tdp>Danny Green </td><td>29</td><td>2.60</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>SG</td><td CLASS=tdp>Marco Belinelli </td><td>30</td><td>1.50</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>SF</td><td CLASS=tdp>Luol Deng </td><td>31</td><td>1.75</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>SF</td><td CLASS=tdp>Jeremy Evans </td><td>28</td><td>7.50</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>PF</td><td CLASS=tdp>Jeff Withey </td><td>26</td><td>6.25</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>PF</td><td CLASS=tdp>Lavoy Allen </td><td>27</td><td>1.50</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp> C</td><td CLASS=tdp>Jonas Valanciunas </td><td>24</td><td>12.75</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp> C</td><td CLASS=tdp>Ryan Hollins </td><td>31</td><td>1.50</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>SF</td><td CLASS=tdp>K.J. McDaniels </td><td>23</td><td>1.50</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>PG</td><td CLASS=tdp>Briante Weber </td><td>24</td><td>4.35</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td CLASS=tdp>SF</td><td CLASS=tdp>Nicolas Brussino </td><td>23</td><td>1.00</td><td></td><td></td><td></td><td></td><td></td></tr>
</table></td><td><table>
...
I used the code below, adapted from an example I found here, but I always get "0" as the result.
$doc = new DOMDocument;
$doc->loadHTML('URL');
$xpath = new DOMXPath($doc);
// sum of cells of the sixth table (contracts), in the fourth column (year1), skipping the first row (ignore Year 1)
print $xpath->evaluate('sum(//table[6]//tr[position() > 1]/td[4])');
Using terms like table[6] in XPath can be fragile, because the index depends on the overall document structure. It's better to pick up on something distinctive, like <b>Contracts</b>, that is part of the table you're interested in, and search for that table.
So you could try...
print $xpath->evaluate('sum(//table[td/u/b/.="Contracts"]/tr[position() > 1]/td[4])');
Update:
To help work out what it's doing you can break it down to levels and see what it's returning. To check if it's finding the table, use...
$table = $xpath->query('//table[td/u/b="Contracts"]');
echo $doc->saveHTML($table[0]);
Then add to the expression step by step to see where it fails. One recurring difficulty with HTML is that malformed markup gets repaired while it is parsed into a DOM tree, so it can lose some of its structure.
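If the XPath sum() still misbehaves, one option is to select the cells and add them up in PHP, which also lets you skip empty cells. This is only a sketch: sumTableColumn() is a name made up for illustration, and it assumes the caption sits in a <b> element inside the table, as in the sample markup.

```php
<?php
// Hypothetical helper (not from the original answer): sums one column of the
// innermost table that contains a <b> caption such as "Contracts".
function sumTableColumn(string $html, string $caption, int $column): float
{
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // ancestor::table[1] picks the nearest enclosing table, so outer layout tables are ignored.
    // Assumption: $caption contains no double quotes.
    $query = sprintf(
        '//b[normalize-space(.)="%s"]/ancestor::table[1]//tr/td[%d]',
        $caption,
        $column
    );

    $sum = 0.0;
    foreach ($xpath->query($query) as $cell) {
        $value = trim($cell->textContent);
        // Skip empty or non-numeric cells; XPath sum() would turn them into NaN.
        if (is_numeric($value)) {
            $sum += (float) $value;
        }
    }
    return $sum;
}
```

You would call it as sumTableColumn($html, 'Contracts', 4), where $html holds the page source as a string (for example, fetched with file_get_contents). Header rows are skipped automatically because they use <th> rather than <td>.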
I am fetching html from a website with file_get_contents. I have a table (with a class name) inside html, and I want to get the data inside html tags.
This is how I fetch the html data from url:
$url = 'http://example.com';
$content = file_get_contents($url);
The html looks like:
<table class="space">
<thead></thead>
<tbody>
<tr>
<td class="marsia">1</td>
<td class="mars">
<div>Mars</div>
</td>
</tr>
<tr>
<td class="earthia">2</td>
<td class="earth">
<div>Earth</div>
</td>
</tr>
</tbody>
</table>
Is there a way to search DOM elements in PHP like we do in jQuery, so that I can access the values 1 and 2 (first td) and the div's value inside the second td?
Something like
a) search the html for table with class name space
b) inside that table, inside tbody, return each tr's 'first td's value' and 'div's value inside second td'
So I get; 1 and Mars, 2 and Earth.
Use the DOM extension, for example. Its DOMXPath class is particularly useful for this kind of task.
You can easily express the listed conditions with an XPath expression like this:
//table[@class="space"]//tr[count(td) = 2]/td
where
- //table[@class="space"] selects all table elements in the document whose class attribute is equal to the string "space";
- //tr[count(td) = 2] selects all tr elements having exactly two td child elements;
- /td represents the td elements.
Sample implementation:
$html = <<<'HTML'
<table class="space">
<thead></thead>
<tbody>
<tr>
<td class="marsia">1</td>
<td class="mars">
<div>Mars</div>
</td>
</tr>
<tr>
<td class="earthia">2</td>
<td class="earth">
<div>Earth</div>
</td>
</tr>
<tr>
<td class="earthia">3</td>
</tr>
</tbody>
</table>
HTML;
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Select the cells of every row that has exactly two td children
$cells = $xpath->query('//table[@class="space"]//tr[count(td) = 2]/td');
$i = 0;
foreach ($cells as $td) {
    if (++$i % 2) {
        // Odd cell: the number
        $number = $td->nodeValue;
    } else {
        // Even cell: the planet name
        $planet = trim($td->textContent);
        printf("%d: %s\n", $number, $planet);
    }
}
Output
1: Mars
2: Earth
The code above should be treated as a sample rather than something for practical use, as it is not very scalable. The logic depends on the XPath expression selecting exactly two cells for each row. In practice, you may want to select the rows, iterate over them, and put the extra conditions into the loop, e.g.:
$rows = $xpath->query('//table[@class="space"]//tr');
foreach ($rows as $tr) {
    // Query relative to the current row
    $cells = $xpath->query('.//td', $tr);
    if ($cells->length < 2) {
        continue;
    }
    $number = $cells[0]->nodeValue;
    $planet = trim($cells[1]->textContent);
    printf("%d: %s\n", $number, $planet);
}
Here DOMXPath::query() is called with an XPath expression relative to the current row ($tr), and the loop skips any row whose DOMNodeList contains fewer than two cells. The rest of the code is trivial.
You can also use the SimpleXML extension, which supports XPath as well, but it is much less flexible than the DOM extension.
For huge documents, use a streaming parser such as XMLReader, which does not load the whole document into memory.
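As a rough sketch of the SimpleXML route: SimpleXML cannot parse HTML on its own, so one option is to let DOMDocument parse the markup and hand the tree over with simplexml_import_dom(). The function name planetsViaSimpleXml() is invented for this example.

```php
<?php
// Hypothetical helper: returns [number => planet] pairs using SimpleXML's xpath().
function planetsViaSimpleXml(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Import the tree that DOM built; the result is the root <html> element.
    $root = simplexml_import_dom($doc);

    $planets = [];
    // Same expression as above, but selecting the rows instead of the cells.
    foreach ($root->xpath('//table[@class="space"]//tr[count(td) = 2]') as $tr) {
        $number = (int) $tr->td[0];
        $planets[$number] = trim((string) $tr->td[1]->div);
    }
    return $planets;
}
```

With the sample table above, the rows come back keyed by their number, and the third row is skipped because it has only one cell.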
I am trying to extract a piece of text from an HTML page using PHP's preg_match function.
I've successfully loaded the HTML into a variable, but now I'm stuck on extracting the right piece of information, probably because I am a bit confused by the syntax of preg_match.
So basically, here is a piece of the HTML I am interested in:
...<tr >
<td >Metuje</td>
<td ><a href="./detail_stanice/307158.html" >Maršov nad Metují</a></td>
<td >A</td>
<td >90</td>
<td >120</td>
<td >150</td>
<td >cm</td>
<td >04.08. 14:20</td>
<td >31</td>
<td >0.53</td>
<td ><img src="./img/ldown.png" width="15" /></td>
</tr>...
What I need is to find this particular row in the table (which contains a couple of other rows). So I need to search for the name "Maršov nad Metují" in the second cell and then extract the values of the subsequent cells in that row into a string. In this particular case I would like to have a string with the values A, 90, 120, etc. up to the end of the row.
The website has other rows in exactly the same format, just with different values, so I would then use the same syntax to extract values for rows with different names in the second cell.
I have tried it myself, but I was not able to get the right output.
I tried something like this, but it does not solve the problem. I know I have to take the td cells into account somehow, but unfortunately I wasn't able to get it right in this particular case:
preg_match("/Maršov nad Metují(.*?)\<\/tr/", $html, $results);
Any help is very much appreciated.
Thanks
Try this :
<?php
$info = '<tr ><td >Metuje</td><td ><a href="./detail_stanice/307158.html" >Maršov nad Metují</a></td><td >A</td><td >90</td><td >120</td><td >150</td><td >cm</td><td >04.08. 14:20</td><td >31</td><td >0.53</td><td ><img src="./img/ldown.png" width="15" /></td></tr>';
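// Grab the link text (the station name)
preg_match('/<a href="(.*)" >(.*)</Ui', $info, $result);
print_r($result[2]); // Maršov nad Metují
// Collect the contents of every cell in the row
preg_match_all("/<td.*?>(.+?)<\/td>/is", $info, $matches);
$result = $matches[1];
// Drop the first two cells (river and station name)
array_shift($result);
array_shift($result);
print implode(', ', $result);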
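If regular expressions turn out to be brittle here, the same lookup can be sketched with DOMDocument and XPath. This is not part of the answer above, just an alternative under a few assumptions: the page is UTF-8, the station name is always in the second cell, and stationValues() is a name made up for the example.

```php
<?php
// Hypothetical helper: returns the text of every cell after the station name
// in the row whose second cell matches $station.
function stationValues(string $html, string $station): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // loadHTML() does not assume UTF-8 on its own; the meta tag declares it explicitly.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: $station contains no double quotes.
    $query = sprintf('//tr[normalize-space(td[2]) = "%s"]/td[position() > 2]', $station);

    $values = [];
    foreach ($xpath->query($query) as $cell) {
        $text = trim($cell->textContent);
        // The last cell only holds an image, so it has no text and is skipped.
        if ($text !== '') {
            $values[] = $text;
        }
    }
    return $values;
}
```

Calling implode(', ', stationValues($html, 'Maršov nad Metují')) then gives the comma-separated string, and the same call works for any other station name.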
I'm trying to get the value of "Internet Data Volume Balance"; the script should echo 146.30 MB.
I'm new to all of this and have been looking at the tutorials.
How can this be done?
<tr >
<td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Account Status</FONT></B></div></td>
<td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">You exceeded your allowed credit.</FONT></div></td>
</tr>
<tr >
<td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Period Free Time Remaining</FONT></B></div></td>
<td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">0:00:00 hours</FONT></div></td>
</tr>
<tr >
<td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Internet Data Volume Balance</FONT></B></div></td>
<td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text" style="text-transform:none;">146.30 MB</FONT></div></td>
</tr>
If you have phpQuery installed, or are willing to install it, you can use that.
phpQuery::newDocumentFileHTML('htmlpage.html');
// :eq() is zero-based, so index 5 is the sixth cell
echo pq('td:eq(5)')->text();
PHP can interact with the DOM just like JavaScript can. This is vastly superior to parsing the markup by hand, which most people will tell you is the wrong approach anyway:
Loading from an HTML File
// Start by creating a new document
$doc = new DOMDocument();
// I've loaded the table into an external file, and am loading it into the $doc
$doc->loadHTMLFile( 'htmlpage.html' );
// Since you have six table cells, I'm calling up all of them
$cells = $doc->getElementsByTagName("td");
// I'm grabbing the sixth cell's textContent property
echo $cells->item(5)->textContent;
This code will output "146.30 MB" to the screen.
Loading from a String
If you have the HTML stored within a string, you can load that into your document as well. We'll change the method used to load the file, into the method used to load from a string:
$str = "<table><tr><td>Foo</td></tr>...</table>";
$doc->loadHTML( $str );
We would then proceed with the same code as above to select the cells, and show their textContent in the output.
Check out the DOMDocument Class.
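Picking the cell by position breaks as soon as the page gains or loses a row, so it may be safer to look the value up by its label. A minimal sketch, assuming each label and its value sit in neighbouring td cells as in the markup from the question; valueForLabel() is a hypothetical name.

```php
<?php
// Hypothetical helper: finds the value cell that follows a given label cell.
function valueForLabel(string $html, string $label): ?string
{
    $doc = new DOMDocument();
    // Old-style markup such as <FONT> is fine for the parser; just silence its warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: $label contains no double quotes.
    $query = sprintf('//td[normalize-space(.) = "%s"]/following-sibling::td[1]', $label);
    $cell = $xpath->query($query)->item(0);

    // Return null when the label is not on the page.
    return $cell === null ? null : trim($cell->textContent);
}
```

For the table in the question, valueForLabel($html, 'Internet Data Volume Balance') would return the text of the cell next to that label.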
Hey everyone, I am using SimpleXML to pull data from an external XML source. I can already limit the number of results to display. I thought I could paginate with a simple query parameter in the URL, something like "&page=2", but as far as the documentation shows that is not possible.
I downloaded a pagination class intended for use with a MySQL query and tried to feed it the values from the XML. But the output loads all of the XML results rather than just the ones specified by the URL variables.
I think what I should do is count the results first and then paginate, which is what I am trying to do. Do you see anything in this code that can be improved? Sorry if it isn't clear, but maybe by discussing it with fellow coders I can see a bit of light at the end of the tunnel and explain it a bit better.
So here is the code:
<?
$url ="http://www.somedomain.com/cgi/xml/engine/get_data.php?ref=$ref&checkin=$checkin&checkout=$checkout&rval=$rval&pval=$pval&country=$country&city=$city&lg=$lg&orderby=$orderby&ordertype=$ordertype&maxrows=$maxrows";
// see I am already defining the max num of rows within the url. Which means that the proper way to sort this out is to start counting from the # aheads?
$all = new SimpleXMLElement($url, null, true);
$all->items_total = $hotels->id;
//
require_once 'paginator.class.php';
//calling the paginator class
foreach($all as $hotel) // loop through our hotels
{
$pages = new Paginator;
//creating a new paginator
$pages->mid_range = 7;
$pages->items_total = $hotel->id;
//extracting the var out from the XML
$rest = substr($hotel->description, 0, -150); // drops the last 150 characters of the description
//echo <<<EOF
<table width="100%" border=0>
<tr>
<td colspan="2">{$hotel->name}<span class="stars" widht="{$hotel->rating}">{$hotel->rating}</span></h2></a><p><b>Direccion:</b> <i>{$hotel->address}</i> - {$hotel->province}</p>
<td colspan="2"><div align="center">PRECIO: {$hotel->currencyCode} {$hotel->minCostOfStay</a>
</div></a></a>
</td>
</tr>
<tr>
<td colspan="2"> $rest...<strong>ampliar información</strong></td>
<td valign="middle"><div align="center"><a href="{$hotel->rooms->room->bookUrl}"><img src="{$hotel->photoUrl}"></div></td>
</tr>
<tr>
<td colspan="2"><div align="center"><strong>VER TODO SOBRE ESTE </strong></div></td>
<td colspan="2"><div align="center">$text</a></div></td>
</a></div></td>
</tr>
//EOF;
echo '</table>';
$pages->paginate();
}
echo $pages->display_pages();
?>
You're clobbering your $all variable:
$all = new SimpleXMLElement($url, null, true); // used by the loop
$all = new Paginator; // reset within the loop
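For the "count first, then paginate" idea, here is a minimal sketch that does not depend on the paginator class at all. The element name hotel, the page size, and the function name paginateHotels() are assumptions for illustration; adjust them to whatever the feed really returns.

```php
<?php
// Hypothetical helper: returns one page of <hotel> elements plus the page count.
// The element name "hotel" and the default page size are assumptions.
function paginateHotels(SimpleXMLElement $all, int $page, int $perPage = 10): array
{
    // Copy the children into a plain array so they can be counted and sliced.
    $hotels = [];
    foreach ($all->hotel as $hotel) {
        $hotels[] = $hotel;
    }

    $totalPages = max(1, (int) ceil(count($hotels) / $perPage));
    // Clamp out-of-range page numbers coming from the URL.
    $page = min(max(1, $page), $totalPages);

    return [
        'items' => array_slice($hotels, ($page - 1) * $perPage, $perPage),
        'page' => $page,
        'totalPages' => $totalPages,
    ];
}
```

You would create $all once, outside any loop, call paginateHotels($all, (int) ($_GET['page'] ?? 1)), and then loop over the returned 'items' only, which also avoids overwriting $all inside the loop.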
I have a table with the following structure. I cannot seem to get the data I want.
<table class="gsborder" cellspacing="0" cellpadding="2" rules="cols" border="1" id="d00">
<tr class="gridItem">
<td>Code</td><td>0adf</td>
</tr><tr class="AltItem">
<td>CompanyName</td><td>Some Company</td>
</tr><tr class="Item">
<td>Owner</td><td>Jim Jim</td>
</tr><tr class="AltItem">
<td>DivisionName</td><td> </td>
</tr><tr class="Item">
<td>AddressLine1</td><td>9314 W. SPRING ST.</td>
</tr>
</table>
This table is of course nested within another table within the page. How can I use DomDocument for example to refer to "Code" and "0adf" as a key value pair? They actually don't need to be in a key value pair but I should be able to call them each separately.
EDIT:
Using PHP Simple HTML, I was able to extract the data I needed using this:
$foo = $html->getElementById("d00")->childNodes(1)->childNodes(1);
The problem with this though is that I am getting the two <td></td> tags with my data. Is there a way to only grab the raw data without the tags?
Also, is this the right way to get my data out of this table?
If you're not dead set on using DOMDocument, try using the PHP Simple HTML DOM Parser. This has the benefit of allowing you to parse HTML which is not valid XML as well as providing a nicer interface to the parsed document.
You could write something like:
$html = str_get_html(...);
foreach ($html->find('tr') as $tr)
{
    print 'First td: ' . $tr->find('td', 0)->plaintext;
    print 'Second td: ' . $tr->find('td', 1)->plaintext;
}
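If you would rather stay with DOMDocument, as the question asks, the rows can be turned into key/value pairs roughly like this. It is a sketch under the assumption that every data row has exactly two td cells; tableToPairs() is a name invented for the example.

```php
<?php
// Hypothetical helper: maps the first cell of each row to the second cell
// for the table with the given id.
function tableToPairs(string $html, string $tableId): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: $tableId contains no double quotes.
    $rows = $xpath->query(sprintf('//table[@id="%s"]//tr[count(td) = 2]', $tableId));

    $pairs = [];
    foreach ($rows as $tr) {
        // Query the cells relative to the current row.
        $cells = $xpath->query('td', $tr);
        // textContent holds only the text, without the surrounding <td> tags.
        $pairs[trim($cells->item(0)->textContent)] = trim($cells->item(1)->textContent);
    }
    return $pairs;
}
```

After $data = tableToPairs($html, 'd00'), each value can be read on its own, for example $data['Code'] or $data['Owner'].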