I am scraping data from an external html table that is 100 rows by 3 columns. I want to parse the data into a 10x10 table where the data from each row is combined. Ex:
<tr>
<td>info1</td>
<td>info2</td>
<td>info3</td>
</tr>
<tr>
<td>info4</td>
<td>info5</td>
<td>info6</td>
</tr>
<tr>
<td>info7</td>
<td>info8</td>
<td>info9</td>
</tr>
...and so on
into
<tr>
<td>info1<br/>info2<br/>info3</td>
<td>info4<br/>info5<br/>info6</td>
<td>info7<br/>info8<br/>info9</td>
...7 more times
</tr>
...9 more times
I can output the data into a single column by using line breaks, but I have no idea how to do what I described above. I also want to be able to style the data using CSS. Any help or direction is appreciated. Here is my code:
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$xpath = new DOMXPath($doc);
$table = $xpath->query('//table[@id="idTable"]')->item(0);
$rows = $table->getElementsByTagName("tr");
foreach($rows as $row)
{
$cells = $row -> getElementsByTagName('td');
foreach ($cells as $cell) print $cell->nodeValue . "<br/>";
}
Two (similar) ways you can do this:
1) Count the <tr>s and combine every 10 of them into one new row, regardless of how many <td>s each contains:
$doc=new DOMDocument();
$doc->loadHTML($html);
$xpath=new DOMXPath($doc);
echo "<table>\n";
/* 10 is the row count */
for($i=0;$i<10;$i++)
{
echo "<tr>\n";
/* 10 is the column count */
foreach($xpath->query('//table[@id="myTable"]/tr[position()>'.($i*10).' and position()<'.(($i+1)*10+1).']') as $tr)
{
echo "\t<td>";// "\t" to make it look nice
$tds=array();
foreach($tr->childNodes as $td)
{
if($td->nodeName!="td") continue;
$tds[]=$td->firstChild->nodeValue;
}
echo implode("<br />",$tds);
echo "</td>\n";
}
echo "</tr>\n";
}
echo "</table>";
2) Count the <td>s: combine every 3 of them into a new <td> and every 30 of them into a new <tr>, ignoring the original <tr>s:
$doc=new DOMDocument();
$doc->loadHTML($html);
$xpath=new DOMXPath($doc);
echo "<table>\n";
$i=0;
$tds=array();
foreach($xpath->query('//table[@id="myTable"]/tr/td/text()') as $td)
{
/* 30 is each row's old-cell-count */
if($i%30==0) echo "<tr>\n";
$tds[]=$td->nodeValue;
/* 3 is each cell's old-cell-count */
if($i%3==2)
{
echo "\t<td>".implode("<br />",$tds)."</td>\n";
$tds=array();
}
if($i%30==29) echo "</tr>\n";
$i++;
}
echo "</table>";
Both produce the following output:
<table>
<tr>
<td>info0.1<br />info0.2<br />info0.3</td>
<td>info1.1<br />info1.2<br />info1.3</td>
<td>info2.1<br />info2.2<br />info2.3</td>
<td>info3.1<br />info3.2<br />info3.3</td>
<td>info4.1<br />info4.2<br />info4.3</td>
<td>info5.1<br />info5.2<br />info5.3</td>
<td>info6.1<br />info6.2<br />info6.3</td>
<td>info7.1<br />info7.2<br />info7.3</td>
<td>info8.1<br />info8.2<br />info8.3</td>
<td>info9.1<br />info9.2<br />info9.3</td>
</tr>
<tr>
<td>info10.1<br />info10.2<br />info10.3</td>
<td>info11.1<br />info11.2<br />info11.3</td>
<!-- ... -->
<td>info97.1<br />info97.2<br />info97.3</td>
<td>info98.1<br />info98.2<br />info98.3</td>
<td>info99.1<br />info99.2<br />info99.3</td>
</tr>
</table>
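For the CSS part of the question, one more variation that may help is to collect the combined cells into an array first and let array_chunk() do the row grouping. This is only a sketch: the function name regroupTable, the class names combined and combined-cell, and the small 4-row sample are assumptions made for illustration, so swap in your own table id and use 10 cells per row for the real data.

```php
<?php
// Hypothetical sample input: 4 source rows of 3 cells each.
$html = '<table id="myTable">'
      . '<tr><td>a1</td><td>a2</td><td>a3</td></tr>'
      . '<tr><td>b1</td><td>b2</td><td>b3</td></tr>'
      . '<tr><td>c1</td><td>c2</td><td>c3</td></tr>'
      . '<tr><td>d1</td><td>d2</td><td>d3</td></tr>'
      . '</table>';

// Turn every source row into one cell, then emit $perRow cells per new row.
function regroupTable(string $html, string $tableId, int $perRow): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $cells = [];
    foreach ($xpath->query('//table[@id="' . $tableId . '"]//tr') as $tr) {
        $parts = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $parts[] = htmlspecialchars(trim($td->textContent));
        }
        $cells[] = implode('<br />', $parts);   // one old row becomes one new cell
    }

    // The class attributes are hypothetical hooks for a stylesheet.
    $out = "<table class=\"combined\">\n";
    foreach (array_chunk($cells, $perRow) as $chunk) {
        $out .= "<tr>\n";
        foreach ($chunk as $cell) {
            $out .= "\t<td class=\"combined-cell\">" . $cell . "</td>\n";
        }
        $out .= "</tr>\n";
    }
    return $out . "</table>";
}

$result = regroupTable($html, 'myTable', 2);
echo $result;
```

With the classes in place, a rule such as table.combined td { ... } in your stylesheet can style the cells.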
Related
How can I get text from HTML table cells using PHP DOM query?
HTML table is:
<table>
<tr>
<th>Job Location:</th>
<td>Kabul
</td>
</tr>
<tr>
<th>Nationality:</th>
<td>Afghan</td>
</tr>
<tr>
<th>Category:</th>
<td>Program</td>
</tr>
</table>
I have the following query, but it doesn't work:
$xmlPageDom = new DomDocument();
@$xmlPageDom->loadHTML($html);
$xmlPageXPath = new DOMXPath($xmlPageDom);
$value = $xmlPageXPath->query('//table td /text()');
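A likely reason the query fails is that '//table td' is CSS-style descendant syntax; in XPath the steps have to be separated by / or //. Below is a minimal sketch, assuming the goal is to read each <td> and pair it with the <th> label in the same row. The variable names $values and $pairs are made up for the example.

```php
<?php
$html = '<table>
<tr><th>Job Location:</th><td>Kabul
</td></tr>
<tr><th>Nationality:</th><td>Afghan</td></tr>
<tr><th>Category:</th><td>Program</td></tr>
</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of using @
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Plain list of the cell texts.
$values = [];
foreach ($xpath->query('//table//td') as $td) {
    $values[] = trim($td->textContent);   // trim() drops the stray newline after "Kabul"
}

// Label => value map built row by row.
$pairs = [];
foreach ($xpath->query('//table//tr') as $tr) {
    $label = trim($xpath->evaluate('string(./th)', $tr));
    $pairs[rtrim($label, ':')] = trim($xpath->evaluate('string(./td)', $tr));
}
print_r($pairs);
```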
get a complete table with php domdocument and print it
The answer is like this:
$html = "<table ID='myid'><tr><td>1</td><td>2</td></tr><tr><td>4</td><td>5</td></tr><tr><td>7</td><td>8</td></tr></table>";
$xml = new DOMDocument();
$xml->validateOnParse = true;
$xml->loadHTML($html);
$xpath = new DOMXPath($xml);
$table = $xpath->query("//*[@id='myid']")->item(0);
$rows = $table->getElementsByTagName("tr");
foreach ($rows as $row) {
$cells = $row -> getElementsByTagName('td');
foreach ($cells as $cell) {
print $cell->nodeValue;
}
}
EDIT: Use this instead
$table = $xpath->query("//table")->item(0);
I have a table with 3 columns where each of the columns could contain a link or data like this one:
<tr><td><a href='link1'>value1</a></td><td><a href='link2'>value2</a></td><td><a href='link3'>value3</a></td></tr>
<tr><td><a href='link4'>value4</a></td><td>value5</td><td>value6</td></tr>
<tr><td>value7</td><td><a href='link8'>value8</a></td><td>value9</td></tr>
<tr><td>value10</td><td>value11</td><td><a href='link12'>value12</a></td></tr>
<tr><td>value13</td><td>value14</td><td>value15</td></tr>
I am able to get the data for each cell of the table using the following code:
$data = file_get_contents('pathtomyfile');
$dom = new domDocument;
@$dom->loadHTML($data);
$dom->preserveWhiteSpace = true;
$xpath = new DOMXPath($dom);
$rows = $xpath->query('//tr');
foreach ($rows as $row) {
$cols = $row->getElementsByTagName('td');
foreach ($cols as $col) {
echo $col->nodeValue;
}
echo "\n";
}
I am trying to output the table in a different format and am wondering how I can get the value of the href in addition to the value of the table cell for the cells where a link exists. For example, for the first table cell I'd like to get "link1" and "value1".
Alternatively, you could check inside the inner loop (the one that iterates over the columns) whether a link exists in the cell, since some cells don't have one:
foreach ($rows as $row) {
$cols = $row->getElementsByTagName('td');
foreach ($cols as $col) {
echo 'value = ' . $col->nodeValue;
if($xpath->evaluate('count(./a)', $col) > 0) { // check if an anchor exists
echo ' | link = ' . $xpath->evaluate('string(./a/@href)', $col); // if there is, then echo the href value
}
echo '<br/>';
}
echo "<br/>";
}
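If you would rather avoid XPath inside the loop, the same check can probably be done with plain DOM calls. This is a sketch on a cut-down, made-up sample table; the $cells array and its value and link keys are names chosen for the example.

```php
<?php
// Hypothetical sample: some cells contain a link, some do not.
$data = "<table>"
      . "<tr><td><a href='link1'>value1</a></td><td>value2</td></tr>"
      . "<tr><td>value3</td><td><a href='link4'>value4</a></td></tr>"
      . "</table>";

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($data);
libxml_clear_errors();

$cells = [];
foreach ($dom->getElementsByTagName('td') as $col) {
    // item(0) returns null when the cell has no <a> descendant.
    $anchor = $col->getElementsByTagName('a')->item(0);
    $cells[] = [
        'value' => trim($col->textContent),
        'link'  => $anchor !== null ? $anchor->getAttribute('href') : null,
    ];
}
print_r($cells);
```

Collecting the pairs into an array first keeps the extraction separate from whatever output format you build afterwards.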
Below is the XHTML data I need to get the child nodes from. I don't need the child nodes of the TH elements.
<table>some data</table>
<table>
<tr>
<td class="c2">PCI Signal Error (SERR#) Enable</td>
<td>Yes</td>
</tr>
<tr>
<td class="c1">Controller Type 1</td>
<td>CISS</td>
</tr>
<tr>
<td class="c2">bus type</td>
<td>CISS</td>
</tr>
<tr>
<th><a name="systempcibus5">PCI Bus 31</a></th>
<td>Device</td>
</tr>
</table>
Below is my latest attempt. I only want to get the textContent of the TDs in the above XML so I can build a statement to insert the data into MySQL.
I have tried many variations over the last week. I won't bore you with all the things I tried, but I believe this is the closest to what I want. I get this error:
PHP Notice: Trying to get property of non-object in C:\inetpub\wwwroot\reports\gec\test1.php on line 40
<?php
libxml_use_internal_errors(true);
$dom = new DomDocument;
$dom->loadHTML($html);
$xpath = new DomXPath($dom);
$nodes = $xpath->query('/html/body/table[2]/tr');
//$nodes = $xpath->query("//tr[contains(concat(' ', @class, ' '), ' head ') ");
//header("Content-type: text/plain");
$node_count=$nodes->length ;
for( $i = 1; $i <= intval($node_count); $i++)
{
$node_td1 = $xpath->query('/html/body/table[2]/tr[$i]/td[1]');
$node_td2 = $xpath->query('/html/body/table[2]/tr[$i]/td[2]');
$result1=$node_td1->textContent;
$result2=$node_td2->textContent;
echo $result1 . "," . $result2 . "<br>";
}
Alternatively, you could select the rows themselves, then filter out the <th> cells by checking ->tagName:
$dom = new DomDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DomXPath($dom);
$rows = $xpath->query('/html/body/table[2]/tr');
foreach ($rows as $row) {
foreach($row->childNodes as $col) {
if(isset($col->tagName) && $col->tagName != 'th') {
echo $col->textContent . '<br/>';
}
}
echo '<hr/>';
}
Or use XPath relative to each row:
foreach ($rows as $row) {
$col1 = $xpath->evaluate('string(./td[1])', $row);
$col2 = $xpath->evaluate('string(./td[2])', $row);
echo $col1 . '<br/>';
echo $col2 . '<br/>';
echo '<hr/>';
}
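As for the notice in the question, it most likely comes from two things: '$i' inside single quotes is not substituted by PHP, and query() returns a node list (or false on a bad expression), not a node, so ->textContent has nothing to read from. A sketch of the original loop with both points addressed is below. The sample markup is a trimmed, made-up version of the question's data, and $rows is just a name picked for the collected pairs.

```php
<?php
// Hypothetical sample: the second table holds the data of interest.
$html = '<table><tr><td>some data</td></tr></table>
<table>
<tr><td class="c2">PCI Signal Error (SERR#) Enable</td><td>Yes</td></tr>
<tr><td class="c1">Controller Type 1</td><td>CISS</td></tr>
<tr><th><a name="systempcibus5">PCI Bus 31</a></th><td>Device</td></tr>
</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$rows  = [];
$count = $xpath->query('/html/body/table[2]/tr')->length;
for ($i = 1; $i <= $count; $i++) {
    // Double quotes let PHP substitute $i; item(0) picks the node out of the list.
    $td1 = $xpath->query("/html/body/table[2]/tr[$i]/td[1]")->item(0);
    $td2 = $xpath->query("/html/body/table[2]/tr[$i]/td[2]")->item(0);
    if ($td1 === null || $td2 === null) {
        continue;   // skips rows without two <td>s, such as the <th> row
    }
    $rows[] = [trim($td1->textContent), trim($td2->textContent)];
}
print_r($rows);
```

Each entry of $rows is then a name/value pair that can be bound to a prepared INSERT statement.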
I'm stuck with this.
I am trying to use PHP DOM to parse some HTML code.
How can I find out how many children the current element has while I iterate through the elements in a for loop?
<?php
$str='
<table id="tableId">
<tr>
<td>row1 cell1</td>
<td>row1 cell2</td>
</tr>
<tr>
<td>row2 cell1</td>
<td>row2 cell2</td>
</tr>
</table>
';
$DOM = new DOMDocument;
$DOM->loadHTML($str); // loading page contents
$table = $DOM->getElementById('tableId'); // getting the table that I need
$DOM->loadHTML($table);
$tr = $DOM->getElementsByTagName('tr'); // getting rows
echo $tr->item(0)->nodeValue; // outputs row1 cell1 row1 cell2 - exactly as I expect with both rows
echo "<br>";
echo $tr->item(1)->nodeValue; // outputs row2 cell1 row2 cell2
// now I need to iterate through each row to build an array with cells that it has
for ($i = 0; $i < $tr->length; $i++)
{
echo $tr->item($i)->length; // outputs no value. But how can I get it?
echo $i."<br />";
}
?>
This will give you all childnodes:
$tr->item($i)->childNodes->length;
... but it will contain DOMText nodes with whitespace etc. (so the count is 4). If you don't necessarily need the length and just want to iterate over all the nodes, you can do this:
foreach($tr->item($i)->childNodes as $node){
if($node instanceof DOMElement){
var_dump($node->ownerDocument->saveXML($node));
}
}
If you need only a length of elements, you can do this:
$x = new DOMXPath($DOM);
var_dump($x->evaluate('count(*)',$tr->item($i)));
And you can do this:
foreach($x->query('*',$tr->item($i)) as $child){
var_dump($child->nodeValue);
}
My preference for simple 'array-building' is to foreach through ->childNodes. Keep in mind you can just foreach through a DOMNodeList as if it were an array, which saves a lot of hassle.
Building a simple array from a table:
$DOM = new DOMDocument;
$DOM->loadHTML($str); // loading page contents
$table = $DOM->getElementById('tableId');
$result = array();
foreach($table->childNodes as $row){
if(strtolower($row->nodeName) != 'tr') continue; // nodeName also exists on text nodes, tagName does not
$rowdata = array();
foreach($row->childNodes as $cell){
if(strtolower($cell->nodeName) != 'td') continue;
$rowdata[] = $cell->textContent;
}
$result[] = $rowdata;
}
var_dump($result);
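If all you need is the number of element children per row, newer PHP versions may save you the filtering. The sketch below assumes PHP 8.0 or later, where DOMElement exposes a childElementCount property; the third cell in the second row and the $counts array are additions made for the example.

```php
<?php
$str = '<table id="tableId">
<tr>
<td>row1 cell1</td>
<td>row1 cell2</td>
</tr>
<tr>
<td>row2 cell1</td>
<td>row2 cell2</td>
<td>row2 cell3</td>
</tr>
</table>';

$DOM = new DOMDocument();
$DOM->loadHTML($str);

$counts = [];
foreach ($DOM->getElementsByTagName('tr') as $tr) {
    // Counts element children only; whitespace text nodes are ignored.
    $counts[] = $tr->childElementCount;
}
var_dump($counts);
```

On older PHP versions the count(*) XPath expression shown above gives the same numbers.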
I have an XML file which is constructed like so:
<Row>
<Cell><Data>Name</Data></Cell>
<Cell><Data>Surname</Data></Cell>
<Cell><Data>Email</Data></Cell>
</Row>
<Row>
<Cell><Data>Name</Data></Cell>
<Cell><Data>Surname</Data></Cell>
<Cell><Data>Email</Data></Cell>
</Row>
<Row>
<Cell><Data>Name</Data></Cell>
<Cell><Data>Surname</Data></Cell>
<Cell><Data>Email</Data></Cell>
</Row>
<Row>
<Cell><Data>Name</Data></Cell>
<Cell><Data>Surname</Data></Cell>
<Cell><Data>Email</Data></Cell>
</Row>
What I want to do is add them to a table using PHP. So far I have written this code:
<?php
$dom = new DomDocument();
$dom -> load("file.xml");
$data = $dom->getElementsByTagName('Data');
echo( "<table><tr>");
foreach( $data as $node){ echo( "<td>". $node -> textContent . "<td>");}
echo( "</tr></table>");
?>
The problem is that it's appending all the data as td tags in one very long row. What I need it to do is start a new tr after every 3 Data tags that are read.
It's currently creating something like:
<table>
<tr>
<td>Name</td><td>Surname</td><td>Email</td>
<td>Name</td><td>Surname</td><td>Email</td>
<td>Name</td><td>Surname</td><td>Email</td>
<td>Name</td><td>Surname</td><td>Email</td>
</tr>
</table>
I need it to be
<table>
<tr><td>Name</td><td>Surname</td><td>Email</td></tr>
<tr><td>Name</td><td>Surname</td><td>Email</td></tr>
<tr><td>Name</td><td>Surname</td><td>Email</td></tr>
<tr><td>Name</td><td>Surname</td><td>Email</td></tr>
</table>
HELP! :-)
Change your foreach loop a bit:
$n = 0;
foreach($data as $node)
{
if($n % 3 == 0) { echo '<tr>'; }
echo( "<td>". $node -> textContent . "</td>");
if(++$n % 3 == 0) { echo '</tr>'; }
}
Then remove the opening and closing tr tags that you already have outside the loop.
This involves a simple edit to your code. As you want it to appear for every third entry, you just need to move the <tr> inside the loop.
This should solve your problem:
<?php
$dom = new DomDocument();
$dom -> load("file.xml");
$data = $dom->getElementsByTagName('Data');
$counter = 0; // Set the entry counter
echo( "<table>");
foreach($data as $node) {
if ($counter % 3 == 0) {
echo '<tr>';
}
echo "<td>". $node -> textContent . "</td>";
if($counter % 3 == 2) { // close the row after its third cell
echo '</tr>';
}
$counter++; // Increment the counter
}
echo( "</table>");
?>
It's not the cleanest code, but it should work.
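Another option that avoids counting altogether is to loop over the Row elements and emit one tr per Row. This is a minimal sketch, assuming the Row elements sit under a single root element (called Table here, which is a made-up name) and using inline sample data instead of file.xml.

```php
<?php
// Assumption: a single root element wraps the rows; <Table> is a hypothetical name.
$xml = '<Table>'
     . '<Row><Cell><Data>Ann</Data></Cell><Cell><Data>Lee</Data></Cell>'
     . '<Cell><Data>ann@example.com</Data></Cell></Row>'
     . '<Row><Cell><Data>Bob</Data></Cell><Cell><Data>Ray</Data></Cell>'
     . '<Cell><Data>bob@example.com</Data></Cell></Row>'
     . '</Table>';

$dom = new DOMDocument();
$dom->loadXML($xml);   // use $dom->load("file.xml") for the real file

$out = "<table>\n";
foreach ($dom->getElementsByTagName('Row') as $row) {
    $out .= '<tr>';
    // Only the Data elements inside this Row are visited.
    foreach ($row->getElementsByTagName('Data') as $node) {
        $out .= '<td>' . htmlspecialchars($node->textContent) . '</td>';
    }
    $out .= "</tr>\n";
}
$out .= '</table>';
echo $out;
```

This keeps working if a Row ever has more or fewer than 3 cells, which the modulo approach does not.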