PHP textContent removing HTML?

I have the following script, which loops through an HTML table, reads the value of each td, and prints it back out in a new td.
$tds = $dom->getElementsByTagName('td');
// New dom
$dom2 = new DOMDocument;
$x = 1;
// Loop through all the tds printing the value with a new class
foreach($tds as $t) {
if($x%2 == 1)
print "</tr><tr>";
$class = ($x%2 == 1) ? "odd" : "even";
var_dump($t->textContent);
print "<td class='$class'>".$t->textContent."</td>";
$x++;
}
But textContent seems to strip the HTML tags (for example, a wrapping <p></p> tag). How can I get the value with its markup intact?
Or is there another way of doing this? I have the following HTML:
<table>
<tr>
<td>q1</td>
<td>a1</td>
</tr>
<tr>
<td>q2</td>
<td>a2</td>
</tr>
</table>
and I need to make it look like
<table>
<tr>
<td class="odd">q1</td>
<td class="even">a1</td>
</tr>
<tr>
<td class="odd">q2</td>
<td class="even">a2</td>
</tr>
</table>
It will always have exactly this structure (apart from additional rows and different values).
Any help?

According to MDN, this is the expected behaviour of textContent: it returns only the text of the node and its descendants, without any markup.
You can just add the class to the tds in the DOMDocument:
$tds = $dom->getElementsByTagName('td');
$x = 1;
foreach($tds as $td) {
if($x%2 == 1){
$td->setAttribute('class', 'odd');
}
else{
$td->setAttribute('class', 'even');
}
$x++;
}
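If the markup inside each cell is what is needed, one option is to serialize the cell's child nodes with DOMDocument::saveHTML(), which accepts a node argument. The following is a minimal sketch, not the answerer's code; the helper name innerHtml and the sample markup are assumptions made for illustration.

```php
<?php
// Hypothetical helper: serialize the children of a node back to HTML.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node's subtree.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Sample markup (assumed): cells that wrap their text in other tags.
$dom = new DOMDocument;
$dom->loadHTML('<table><tr><td><p>q1</p></td><td><b>a1</b></td></tr></table>');

$cells = [];
foreach ($dom->getElementsByTagName('td') as $td) {
    $cells[] = innerHtml($td);
}
print_r($cells);
```

After setting the class attributes as in the answer above, calling $dom->saveHTML() on the whole document should return the modified table with the inner markup preserved.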

Related

Convert HTML table to PHP array - problem with merge text

I have a small problem. I need to convert an HTML table to an array or JSON in PHP. The table always has two columns and N rows. I use:
$xml = new DOMDocument('1.0', 'utf-8');
libxml_use_internal_errors(true);
$xml->loadHTML('<?xml encoding="utf-8" ?>'.$content);
$xpath = new DOMXPath($xml);
$table = $xpath->query("//*[@class='".$autoAttributeHtmlClass."']");
$length = $table->length;
$j = 0;
$attrArr = array();
for ($i=0; $i <= $length-1; $i++) {
$element = $table->item($i);
$rows = $element->getElementsByTagName("tr");
foreach ($rows as $row)
{
$cols = $row->getElementsByTagName('td');
$attrArr[$j]['attr'] = rtrim($cols->item(0)->nodeValue, ':');
$attrArr[$j]['val'] = htmlspecialchars($cols->item(1)->nodeValue);
$j++;
}
}
echo json_encode($attrArr);
Everything works as long as the column contains only plain text. When a column contains additional HTML (for example <div>, <span>, <p>, <li>, etc.), the inner texts are merged together.
Example HTML table:
<table class="test">
<tbody>
<tr>
<td>Col1</td>
<td>Micro Tower</div></td>
</tr>
<tr>
<td>Col2</td>
<td>
<p>Micro-ATX</p>
<p>Mini-ITX</p>
</td>
</tr>
<tr>
<td>Col3</td>
<td>
<div>
<span>Test1</span>
</div>
<div>
<span>Test2</span>
</div>
</td>
</tr>
</tbody>
</table>
For the second row, nodeValue (PHP) gives me the merged text: Micro-ATXMini-ITX
For the third row, nodeValue (PHP) gives me the merged text: Test1Test2
Any ideas? I need a separator (space, comma, or semicolon) between the texts; right now the result is not readable.
Try textContent instead of nodeValue.
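Note that in PHP's DOM extension, nodeValue on an element returns the same string as textContent, so switching properties alone may not add a separator. One possible approach is to collect the cell's text nodes individually and join them. This is a sketch under that assumption; the helper name joinText and the '; ' separator are illustrative choices, not part of the original code.

```php
<?php
// Hypothetical helper: join the non-blank text nodes of a cell with a separator.
function joinText(DOMXPath $xpath, DOMNode $cell, string $separator = '; '): string
{
    $parts = [];
    // Select every non-blank text node below the cell, in document order.
    foreach ($xpath->query('.//text()[normalize-space()]', $cell) as $text) {
        $parts[] = trim($text->nodeValue);
    }
    return implode($separator, $parts);
}

// Reduced version of the second row from the question.
$html = '<table class="test"><tr><td>Col2</td><td><p>Micro-ATX</p><p>Mini-ITX</p></td></tr></table>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$cols = $dom->getElementsByTagName('td');
$joined = joinText($xpath, $cols->item(1));
echo $joined, "\n"; // Micro-ATX; Mini-ITX
```

In the loop from the question, the call would take the place of $cols->item(1)->nodeValue.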

Scrape DOMDocument Table for Contents in PHP

I am really struggling attempting to scrape a table either via XPath or any sort of 'getElement' method. I have searched around and attempted various different approaches to solve my problem below but have come up short and really appreciate any help.
First, the HTML portion I am trying to scrape is the 2nd table on the document and looks like:
<table class="table2" border="1" cellspacing="0" cellpadding="3">
<tbody>
<tr><th colspan="8" align="left">Status Information</th></tr>
<tr><th align="left">Status</th><th align="left">Type</th><th align="left">Address</th><th align="left">LP</th><th align="left">Agent Info</th><th align="left">Agent Email</th><th align="left">Phone</th><th align="center">Email Tmplt</th></tr>
<tr></tr>
<tr>
<td align="left">Active</td>
<td align="left">Resale</td>
<td align="center">*Property Address*</td>
<td align="right">*Price*</td>
<td align="center">*Agent Info*</td>
<td align="center">*Agent Email*</td>
<td align="center">*Agent Phone*</td>
<td align="center"> </td>
</tr>
<tr>
<td align="left">Active</td>
<td align="left">Resale</td>
<td align="center">*Property Address*</td>
<td align="right">*Price*</td>
<td align="center">*Agent Info*</td>
<td align="center">*Agent Email*</td>
<td align="center">*Agent Phone*</td>
<td align="center"> </td>
</tr>
...etc
With additional trs continuing containing 8 tds with the same information as detailed above.
What I need to do is iterate through the trs and internal tds to pick up each piece of information (inside the td) for each entry (inside of the tr).
Here is the code I have been struggling with:
<?php
$payload = array(
'http'=>array(
'method'=>"POST",
'content'=>'key=value'
)
);
stream_context_set_default($payload);
$dom = new DOMDocument();
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile('website-scraping-from.com');
libxml_clear_errors();
foreach ($dom->getElementsByTagName('tr') as $row){
foreach($dom->$row->getElementsByTagName('td') as $node){
echo $node->textContent . "<br/>";
}
}
?>
This code does not return anything close to what I need, and I am having a lot of trouble figuring out how to fix it. Perhaps XPath is a better way to find the table and the information I need, but I have come up empty with that method as well. Any information is much appreciated.
If it matters, my end goal is to be able to take the table data and dump it into a database if the first td has a value of "Active".
Can this be of any help?
$table = $dom->getElementsByTagName('table')->item(1);
foreach ($table->getElementsByTagName('tr') as $row){
$cells = $row->getElementsByTagName('td');
// Header rows and empty rows contain no td, so check the length first
if ( $cells->length > 0 && $cells->item(0)->nodeValue == 'Active' ) {
foreach($cells as $node){
echo $node->nodeValue . "<br/>";
}
}
}
This will fetch the second table and display the contents of the rows whose first cell is "Active".
Edit: here is a more complete example:
$arr = array();
$table = $dom->getElementsByTagName('table')->item(1);
foreach ($table->getElementsByTagName('tr') as $row){
$cells = $row->getElementsByTagName('td');
// Skip rows without td cells (header rows, empty rows)
if ( $cells->length > 0 && $cells->item(0)->nodeValue == 'Active' ) {
$obj = new stdClass;
$obj->type = $cells->item(1)->nodeValue;
$obj->address = $cells->item(2)->nodeValue;
$obj->price = $cells->item(3)->nodeValue;
$obj->agent = $cells->item(4)->nodeValue;
$obj->email = $cells->item(5)->nodeValue;
$obj->phone = $cells->item(6)->nodeValue;
array_push( $arr, $obj );
}
}
print_r( $arr );
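Since the question also mentions XPath, the same filtering can be pushed into a single query. The sketch below uses a reduced two-table sample (an assumption standing in for the real page) and selects only the rows of the second table whose first td is "Active".

```php
<?php
// Reduced sample (assumed): the table of interest is the second one on the page.
$html = <<<HTML
<table><tr><td>first table</td></tr></table>
<table>
<tr><th>Status</th><th>Type</th></tr>
<tr></tr>
<tr><td>Active</td><td>Resale</td></tr>
<tr><td>Sold</td><td>Resale</td></tr>
</table>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// (//table)[2] is the second table in document order; the predicate keeps
// only rows whose first td, with surrounding whitespace trimmed, is "Active".
$rows = $xpath->query('(//table)[2]//tr[normalize-space(td[1]) = "Active"]');

$records = [];
foreach ($rows as $row) {
    $record = [];
    foreach ($xpath->query('td', $row) as $cell) {
        $record[] = trim($cell->nodeValue);
    }
    $records[] = $record;
}
print_r($records);
```

Rows without any td (the header rows and the empty row) never match the predicate, so no explicit length check is needed here.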

PHP DOM Parser Get Specific text by Class While Looping

I am working with the PHP Simple HTML DOM Parser and I am looking for a simple solution to my question.
<tr>
<td class="one">1</td>
<td class="two">2</td>
<td class="three">3</td>
</tr>
<tr>
<td class="one">10</td>
<td class="two">20</td>
<td class="three">30</td>
</tr>...
My HTML looks similar to the above,
and I am looping over the tds like this:
foreach ($sample->find("td") as $ele)
{
if($ele->class == "one")
echo "ONE = ".$ele->plaintext;
if($ele->class == "two")
echo "TWO= ".$ele->plaintext;
}
Is there a simple way to get the plaintext of a particular class without an if condition? I don't want a shorthand (ternary) if either.
I am expecting something like this:
$ele->class->one
Take a look at this:
<?php
$html = "
<table>
<tr>
<td class='one'>1</td>
<td class='two'>2</td>
<td class='three'>3</td>
</tr>
<tr>
<td class='one'>10</td>
<td class='two'>20</td>
<td class='three'>30</td>
</tr>
</table>
";
// Your class name
$classeName = 'one';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Get the results
$results = $xpath->query("//*[@class='" . $classeName . "']");
for($i=0; $i < $results->length; $i++) {
echo $results->item($i)->nodeValue . "<br>";
}
?>
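To get closer to the $ele->class->one style of access the question asks for, the cells of each row can be keyed by their class attribute. This is only a sketch using PHP's built-in DOM extension rather than Simple HTML DOM, and the $rows array layout is an assumption made for illustration.

```php
<?php
$html = "<table>
<tr><td class='one'>1</td><td class='two'>2</td><td class='three'>3</td></tr>
<tr><td class='one'>10</td><td class='two'>20</td><td class='three'>30</td></tr>
</table>";

$dom = new DOMDocument;
$dom->loadHTML($html);

$rows = [];
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $row = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        // Key each cell's text by its class attribute.
        $row[$td->getAttribute('class')] = $td->nodeValue;
    }
    $rows[] = $row;
}

echo "ONE = ", $rows[0]['one'], "\n"; // ONE = 1
echo "TWO = ", $rows[1]['two'], "\n"; // TWO = 20
```

This assumes each td carries exactly one class name; a cell with several classes would be keyed by the whole attribute string.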

xPath retrieve onclick value

I'm trying to retrieve the onclick value on a td element. This is what I have so far.
$xpath = new DOMXPath($dom);
$trs = $xpath->query("/html/body//table/tr");
foreach ($trs as $tr){
$tds = $xpath->query("td", $tr);
foreach ($tds as $td) {
$a = $xpath->query("@onclick", $td);
echo $a->nodeValue;
echo $td->nodeValue;
}
}
This doesn't seem to be working though.
Here's the structure
<table>
<tr>
<td>Name</td>
<td onclick="blahblah">Author</td>
<td>Title</td>
</tr>
</table>
$a is a DOMNodeList, so you must select an item:
@print($a->item(0)->nodeValue);
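An alternative that avoids indexing into an empty list is to select only the cells that have the attribute and read it with getAttribute(). The sketch below uses the table from the question; the $handlers array is an illustrative assumption.

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<table><tr><td>Name</td><td onclick="blahblah">Author</td><td>Title</td></tr></table>');
$xpath = new DOMXPath($dom);

$handlers = [];
// Select only the cells that actually carry an onclick attribute.
foreach ($xpath->query('//table//tr/td[@onclick]') as $td) {
    // Map the cell text to the value of its onclick attribute.
    $handlers[$td->nodeValue] = $td->getAttribute('onclick');
}
print_r($handlers);
```

Cells without an onclick attribute are simply not matched, so no error suppression is needed.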

Alternating row colors in html table from xml datasource with php

I would like to alternate the row color between odd and even rows when building a table from the following XML with PHP.
<?php
// load SimpleXML
$books = new SimpleXMLElement('books.xml', null, true);
echo <<<EOF
<table>
<tr>
<th>Title</th>
<th>Author</th>
<th>Publisher</th>
<th>Price at Amazon.com</th>
<th>ISBN</th>
</tr>
EOF;
foreach($books as $book) // loop through our books
{
echo <<<EOF
<tr>
<td>{$book->title}</td>
<td>{$book->author}</td>
<td>{$book->publisher}</td>
<td>\${$book->amazon_price}</td>
<td>{$book['isbn']}</td>
</tr>
EOF;
}
echo '</table>';
?>
How would I do this with PHP, considering my source is XML?
Add a counter, initialize it to zero, increment it on each iteration, and give each tr a different class depending on the value of $counter%2 (zero or not), for example ($counter%2)?'odd':'even'.
Something like this:
for($i=0;$i<6;$i++)
{
if($i % 2)
{
// even
}else{
// odd
}
}
Here's a simple way.
<?php
// load SimpleXML
$books = new SimpleXMLElement('books.xml', null, true);
echo <<<EOF
<table>
<tr>
<th>Title</th>
<th>Author</th>
<th>Publisher</th>
<th>Price at Amazon.com</th>
<th>ISBN</th>
</tr>
EOF;
$even = true;
foreach($books as $book) // loop through our books
{
$class = $even ? 'even' : 'odd';
$even = !$even;
echo <<<EOF
<tr class="$class">
<td>{$book->title}</td>
<td>{$book->author}</td>
<td>{$book->publisher}</td>
<td>\${$book->amazon_price}</td>
<td>{$book['isbn']}</td>
</tr>
EOF;
}
echo '</table>';
?>
