Delete Rows from HTML Table After x Rows PHP [duplicate] - php

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 9 years ago.
HTML:
<table>
<tr><td></td></tr> <!-- 1st row -->
<tr><td></td></tr> <!-- 2nd row -->
<tr><td></td></tr> <!-- 3rd row -->
<tr><td></td></tr> <!-- 4th row -->
<tr><td></td></tr> <!-- 5th row -->
</table>
What I want to do (pseudocode):
if (intval($rows) > 3) {
delete all rows after 3rd row
}
I am using the PHP code below to count the rows in the HTML page:
$index = substr_count(strtolower(file_get_contents('index.html')), '<tr>');
I hope my question is clear enough.
Full code:
<?php
$htaccess = file_get_contents('index.html');
$new_htaccess = str_replace('<table><tr><td>first row data</td></tr>', '<table><tr><td>first row data</td></tr><tr><td>sec row data</td></tr>', $htaccess);
$pos = strpos($htaccess, $ssa); // note: $ssa is not defined in the snippet as posted
// strpos() returns 0 for a match at the very start, so compare strictly
if ($pos === false) {
file_put_contents('index.html', $new_htaccess);
}
$index = substr_count(strtolower(file_get_contents('index.html')), '<tr>');
if ($index > 20) {
//delete end rows and add a new one
}
?>

I would first extract the table using a regex such as /<table>.+<\/table>/s, then
strip the <table> and </table> tags,
turn the string into an array using explode() with <tr> as the delimiter,
and finally reconstruct the table from the first 3 rows (element 0 of that array holds whatever precedes the first <tr>, usually just whitespace, so the rows are elements 1 to 3).
That is how I would attempt it; I'm not sure it applies to your case. Since you are obviously scraping another site, it depends a lot on how consistent its markup is.
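A minimal PHP sketch of those steps, assuming the page contains one simple table whose rows are written as bare <tr> tags (no attributes) and no nested tables. The function name keepFirstRows is hypothetical, and it uses a non-greedy .*? instead of .+ so that only the first table is captured when a page has several. String splitting like this breaks easily on real-world markup, so the DOMDocument approach is the sturdier option.

```php
<?php
// Keep only the first $keep rows of the first <table> in $html.
// Assumes bare "<tr>" tags (no attributes) and no nested tables.
function keepFirstRows(string $html, int $keep): string
{
    // Extract the first table; "s" lets "." match newlines, ".*?" stops at the first </table>.
    if (!preg_match('/<table>(.*?)<\/table>/s', $html, $m, PREG_OFFSET_CAPTURE)) {
        return $html; // no table found: leave the document unchanged
    }
    $inner = $m[1][0]; // the content between <table> and </table>

    // Split on "<tr>"; element 0 is whatever precedes the first row.
    $parts = explode('<tr>', $inner);

    // Rebuild the table from element 0 plus the first $keep rows.
    $newTable = '<table>' . implode('<tr>', array_slice($parts, 0, $keep + 1)) . '</table>';

    // Splice the rebuilt table back in at the original table's offset.
    return substr_replace($html, $newTable, $m[0][1], strlen($m[0][0]));
}

$html = "<table>\n<tr><td>1</td></tr>\n<tr><td>2</td></tr>\n<tr><td>3</td></tr>\n"
      . "<tr><td>4</td></tr>\n<tr><td>5</td></tr>\n</table>";
echo keepFirstRows($html, 3);
```

If the table has $keep rows or fewer, array_slice() returns every part and the document comes back unchanged.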

Here is a very simplistic, untested method using DOMDocument:
//--- create a new DOM document
$doc = new DOMDocument();
//--- load your file
$doc->loadHTMLFile("filename.html");
//--- point to the tables; item(0) means the first table in the file
$table = $doc->getElementsByTagName('table')->item(0);
//--- get all the tr within the specified table
$tr = $table->getElementsByTagName('tr');
//--- loop backwards, so removing a row does not shift the indexes still to be visited
for ($x = $tr->length - 1; $x > 2; $x--) {
    //--- a DOMNodeList has no removeChild(); remove the row through its parent node
    $row = $tr->item($x);
    $row->parentNode->removeChild($row);
}
//--- save the new file
$doc->saveHTMLFile("/tmp/test.html");
References:
http://www.php.net/manual/en/domdocument.loadhtmlfile.php
http://www.php.net/manual/en/domdocument.getelementsbytagname.php
http://www.php.net/manual/en/domnode.removechild.php
http://www.php.net/manual/en/domdocument.savehtmlfile.php
Hope this is of some help.

Jeff posted a good solution. If you are interested in using a third-party library,
I suggest ganon.php:
<?php
require_once( "ganon.php" );
// Your html
$html = '<table>
<tr><td>1</td></tr>
<tr><td>2</td></tr>
<tr><td>3</td></tr>
<tr><td>4</td></tr>
<tr><td>5</td></tr>
</table>';
// load the html
$html = str_get_dom( $html );
// search for our table
if ( $table = $html( "table", 0 ) ) {
// get all rows after the 3rd row; indexes are zero-based, so the 3rd row is index 2
if ( $rows = $html( "tr:gt(2)" ) ) {
// loop through rows
foreach( $rows as $row ) {
// .... and delete them
$row->delete();
}
}
}
// output your modified html
echo $html;
?>

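Using jQuery, you can try the following. Note that this removes the rows in the browser after the page loads; the HTML sent by PHP (and the file on disk) is unchanged.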
<script src='http://code.jquery.com/jquery-latest.min.js' type="text/javascript" ></script>
<?php
$html = '<table id="mytable">
<tr><td>1</td></tr>
<tr><td>2</td></tr>
<tr><td>3</td></tr>
<tr><td>4</td></tr>
<tr><td>5</td></tr>
</table>';
echo $html;
?>
<script>
$(function() {
var TRs = $("#mytable tr");
// indexes are zero-based, so start at 3 to keep the first three rows
for (var i = 3; i < TRs.length; i++) {
$(TRs[i]).remove();
}
});
</script>

Related

XPath for td/th based on tr count

I am using XPath to scrape a web page.
The structure is:
<table>
<tbody>
<tr>
<th>
<td>
but some of the tr elements contain just one th or one td:
<table>
<tbody>
<tr>
<th>
So I only want to scrape a tr if it contains two child elements. I am using the path
$route = $path->query("//table[count(tr) > 1]//tr/th");
or
$route = $path->query("//table[count(tr) > 1]//tr/td");
But it's not working.
Here is the link to the original page. In the first table, the last two tr elements have just one td, which is causing the problem, and the second and third tables have the same issue.
https://www.daiwahouse.co.jp/mansion/kanto/tokyo/y35/gaiyo.html
$route = $path->query("//tr[count(*) >= 2]/th");
foreach ($route as $th){
$property[] = trim($th->nodeValue);
}
$route = $path->query("//tr[count(*) >= 2]/td");
foreach ($route as $td){
$value[] = trim($td->nodeValue);
}
I am trying to select th and td at the same time, but when a tr contains only one td it causes a problem: in the end the td count and the th count differ, because I scrape more td elements than th elements.
This XPath,
//table[count(.//tr) > 1]//th
will select all th elements within all table elements that have more than one tr descendant (regardless of whether tbody is present).
This XPath,
//tr[count(*) > 1]/*
will select all children of tr elements with more than one child.
This XPath,
//tr[count(th) = count(td)]/*
will select all children of tr elements where the number of th children equals the number of td children.
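Selecting th and td with two separate queries, as in the question, can leave the two arrays out of step. One way to keep each header with its value is to select the qualifying rows and read both cells relative to each row. A minimal sketch using DOMDocument and DOMXPath; the markup is made up, and the predicate [th and td] (a row with at least one th and at least one td child) is a variant of the ones above:

```php
<?php
// Sample markup (made up): the second row has only one cell and must be skipped.
$html = '<table>
<tr><th>Address</th><td>Tokyo</td></tr>
<tr><td>Note without a header</td></tr>
<tr><th>Units</th><td>35</td></tr>
</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about imperfect HTML
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$pairs = [];
// Only rows that have both a th child and a td child.
foreach ($xpath->query('//tr[th and td]') as $tr) {
    // Query relative to the current row, so th and td always come from the same row.
    $th = $xpath->query('th', $tr)->item(0);
    $td = $xpath->query('td', $tr)->item(0);
    $pairs[trim($th->nodeValue)] = trim($td->nodeValue);
}

print_r($pairs);
```

Against the real page you would load its HTML instead of the sample string.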
The OP posted a link to the site: its root element is in the xmlns="http://www.w3.org/1999/xhtml" namespace.
See How does XPath deal with XML namespaces?
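This matters when the page is parsed as XML (for example with DOMDocument::loadXML()): an unprefixed step such as //tr then matches nothing, because the elements live in the XHTML namespace. A minimal sketch that binds that namespace to a prefix with DOMXPath::registerNamespace(); the prefix h and the sample markup are arbitrary choices. When the page is parsed with loadHTML() instead, libxml's HTML parser generally does not put elements in a namespace, so plain names work.

```php
<?php
$xml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><table>
<tr><th>Address</th><td>Tokyo</td></tr>
<tr><td>Single cell</td></tr>
</table></body></html>';

$doc = new DOMDocument();
$doc->loadXML($xml); // parsed as XML, so the default namespace applies to every element

$xpath = new DOMXPath($doc);
// Unprefixed names only match elements in no namespace, so this finds nothing.
$plain = $xpath->query('//tr');

// Bind the XHTML namespace to a prefix and use it on the named steps.
$xpath->registerNamespace('h', 'http://www.w3.org/1999/xhtml');
$cells = $xpath->query('//h:tr[count(*) > 1]/*');

echo $plain->length, "\n"; // 0
foreach ($cells as $cell) {
    echo $cell->nodeName, ': ', $cell->nodeValue, "\n";
}
```

The wildcard * matches elements in any namespace, so count(*) and /* need no prefix.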
If I understand correctly, you want the th elements in tr elements that contain exactly two child elements? I think this is what you need:
//th[count(../*) = 2]
Here is a more explicit path that uses the union operator (|) to count th and td elements together:
$html = '
<html>
<body>
<table>
<tbody>
<tr>
<th>I am Included</th>
<td>I am a column</td>
</tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<th>I am ignored</th>
</tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<th>I am also Included</th>
<td>I am a column</td>
</tr>
</tbody>
</table>
</body>
</html>
';
$doc = new DOMDocument();
$doc->loadHTML( $html );
$xpath = new DOMXPath( $doc );
$result = $xpath->query("//table[ count( tbody/tr/td | tbody/tr/th ) > 1 ]/tbody/tr");
foreach( $result as $node )
{
var_dump( $doc->saveHTML( $node ) );
}
// string(88) "<tr><th>I am Included</th><td>I am a column</td></tr>"
// string(93) "<tr><th>I am also Included</th><td>I am a column</td></tr>"
You can also use this for descendants at any depth:
//table[ count( descendant::td | descendant::th ) > 1]//tr
Change the part of the XPath after the condition (the square-bracketed predicate) to change what is returned.

Find preceding element using PHP Simple HTML Parser

I have some HTML that is set up like the following (this can be different, though!):
<table></table>
<h4>Content</h4>
<table></table>
I'm using PHP Simple HTML DOM Parser to loop over a section of code set up like this.
How can I say something like - "Find the table and the preceding h4, grab the text from the h4 if it exists, if it doesn't then leave blank".
If I just use $html->find('div[class=product-table] h4'); then it ignores the fact there was no title for the first table.
This is my full code for context:
$table_rows = $html->find('div[class=product-table] table');
$tablecounter = 1;
foreach ($table_rows as $table){
$tablevalue[] =
array(
"field_5b3f40cae191b" => "Table",
);
}
update_field( $field_key, $tablevalue, $post_id );
Update:
I've found in the documentation that you can use prev_sibling(), so I've tried $table_title = $html->find('div[class=product-table] table')->prev_sibling('h4'); but I can't seem to get it to work.
I've simplified the example to show the situation you're after. It assumes that the <h4> tag comes immediately before the <table> tag, and it calls prev_sibling() on each table that find() returns (find() with only a selector returns an array, so prev_sibling() cannot be chained onto it directly).
require_once 'simple_html_dom.php';
$source = "<html>
<body>
<div class='product-table'>
<table>t1</table>
<h4>Content</h4>
<table>t2</table>
</div>
</body>
</html>";
$html = str_get_html($source);
$table_rows = $html->find('div[class=product-table] table');
foreach ($table_rows as $table){
$prev = $table->prev_sibling();
if ( !empty($prev) && $prev->tag == "h4") {
echo "h4=".(string)$prev->innertext().PHP_EOL;
}
echo "content=".(string)$table.PHP_EOL;
}
This echoes:
content=<table>t1</table>
h4=Content
content=<table>t2</table>
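If you can use PHP's built-in DOM extension instead of Simple HTML DOM, XPath can express "the element immediately before this table, if it is an h4" directly with the preceding-sibling axis. A minimal sketch using similar sample markup (with real rows inside the tables); like the answer above, it assumes the heading sits immediately before its table.

```php
<?php
$source = "<html><body>
<div class='product-table'>
<table><tr><td>t1</td></tr></table>
<h4>Content</h4>
<table><tr><td>t2</td></tr></table>
</div>
</body></html>";

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($source);
$xpath = new DOMXPath($doc);

$titles = [];
foreach ($xpath->query("//div[@class='product-table']/table") as $table) {
    // preceding-sibling::*[1] is the nearest element before the table (text nodes are ignored);
    // [self::h4] keeps it only if that element is an h4.
    $h4 = $xpath->query('preceding-sibling::*[1][self::h4]', $table)->item(0);
    $titles[] = $h4 ? trim($h4->textContent) : ''; // blank when there is no heading
}

print_r($titles);
```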

Getting DOM elements of html from file_get_contents [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 6 years ago.
I am fetching HTML from a website with file_get_contents(). The HTML contains a table (with a class name), and I want to get the data inside its tags.
This is how I fetch the html data from url:
$url = 'http://example.com';
$content = file_get_contents($url);
The html looks like:
<table class="space">
<thead></thead>
<tbody>
<tr>
<td class="marsia">1</td>
<td class="mars">
<div>Mars</div>
</td>
</tr>
<tr>
<td class="earthia">2</td>
<td class="earth">
<div>Earth</div>
</td>
</tr>
</tbody>
</table>
Is there a way to search DOM elements in PHP like we do in jQuery? That way I could access the values 1 and 2 (in the first td) and the div's value inside the second td.
Something like:
a) search the HTML for the table with the class name space
b) inside that table, inside tbody, return each tr's first td value and the value of the div inside the second td
So I get 1 and Mars, 2 and Earth.
Use the DOM extension, for example. Its DOMXPath class is particularly useful for this kind of task.
You can easily set the listed conditions with an XPath expression like this:
//table[@class="space"]//tr[count(td) = 2]/td
where
- //table[@class="space"] selects all table elements in the document whose class attribute value equals the string "space";
- //tr[count(td) = 2] selects all tr elements having exactly two td child elements;
- /td selects the td children of those rows.
Sample implementation:
$html = <<<'HTML'
<table class="space">
<thead></thead>
<tbody>
<tr>
<td class="marsia">1</td>
<td class="mars">
<div>Mars</div>
</td>
</tr>
<tr>
<td class="earthia">2</td>
<td class="earth">
<div>Earth</div>
</td>
</tr>
<tr>
<td class="earthia">3</td>
</tr>
</tbody>
</table>
HTML;
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$cells = $xpath->query('//table[@class="space"]//tr[count(td) = 2]/td');
$i = 0;
foreach ($cells as $td) {
if (++$i % 2) {
$number = $td->nodeValue;
} else {
$planet = trim($td->textContent);
printf("%d: %s\n", $number, $planet);
}
}
Output
1: Mars
2: Earth
The code above should be considered a sample rather than production-ready code, as it does not scale well: its logic depends on the XPath expression selecting exactly two cells for each row. In practice, you may want to select the rows, iterate over them, and put the extra conditions into the loop, e.g.:
$rows = $xpath->query('//table[@class="space"]//tr');
foreach ($rows as $tr) {
$cells = $xpath->query('.//td', $tr);
if ($cells->length < 2) {
continue;
}
$number = $cells[0]->nodeValue;
$planet = trim($cells[1]->textContent);
printf("%d: %s\n", $number, $planet);
}
Here, DOMXPath::query() is called with an XPath expression relative to the current row ($tr), and the loop then checks whether the returned DOMNodeList contains at least two cells. The rest of the code is trivial.
You can also use the SimpleXML extension, which supports XPath as well, but it is much less flexible than the DOM extension.
For huge documents, consider a streaming parser such as XMLReader instead of loading the whole document into memory.
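One caveat about the [@class="space"] test: it compares the whole attribute value, so it would not match a table carrying several classes, such as a hypothetical class="space wide". A common XPath 1.0 idiom for matching a single class token is sketched below; the extra wide class is made up for illustration.

```php
<?php
$html = '<table class="space wide"><tbody>
<tr><td>1</td><td><div>Mars</div></td></tr>
</tbody></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Exact comparison: fails because the attribute value is "space wide".
$exact = $xpath->query('//table[@class="space"]');

// Token match: pad the normalized class list with spaces and look for " space ".
$token = $xpath->query(
    '//table[contains(concat(" ", normalize-space(@class), " "), " space ")]'
);

echo $exact->length, ' ', $token->length, "\n"; // 0 1
```

The surrounding spaces are what stop a class such as spacer from matching.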

php: how to assign scraped html to array

I want to format the output of the following PHP script:
<?php
$stop = $_POST["stop_number"]; // stop_number is a text input value provided by the user
$depart_url = "http://64.28.34.43/hiwire?.a=iNextBusResults&StopId=" . urlencode($stop); // encode user input before putting it in a URL
$html = file_get_contents($depart_url);
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$my_xpath_query = "//td[@valign='top']";
$result = $xpath->query($my_xpath_query);
foreach($result as $result_object)
{
echo $result_object->childNodes->item(0)->nodeValue,'<br>';
}
?>
Here is the output (at least in one instance, as the data changes over time).
18 - GOLD
OUTBOUND
8:17p
8:16p
8 - GREEN
OUTBOUND
8:46p
8:46p
8 - GREEN
OUTBOUND
18 - GOLD
OUTBOUND
5 - PLUM
OUTBOUND
EDIT:
I want the output above to go into a table like the one below. However, instead of the text between the tags, the cells would contain variables, i.e. items from the PHP script output.
<!DOCTYPE html>
<html>
<title>Departure Table</title>
<body>
<h4>Next Departures for Stop Number: __ </h4>
<table border="1">
<tr>
<th>Route</th>
<th>Direction</th>
<th>Scheduled</th>
<th>Estimated</th>
</tr>
<tr>
<td>18 - Gold</td>
<td>Outbound</td>
<td>8:17p</td>
<td>8:16p</td>
</tr>
<tr>
<td>8 - Green</td>
<td>Outbound</td>
<td>8:46p</td>
<td>8:46p</td>
</tr>
</table>
</body>
</html>
Try appending a newline ("\n") instead of a <br> tag in your echo statement (this only affects plain-text output; in a browser you still need <br>):
echo $result_object->childNodes->item(0)->nodeValue."\n";
EDIT:
If you want to store your data in PHP variables, you could do something like this:
store the data in an array (or any other data structure that suits your needs) and iterate over it.
$store_data_in_array_variable = array();
foreach($result as $result_object)
{
$store_data_in_array_variable[] = $result_object->childNodes->item(0)->nodeValue;
}
//iterate over all stored values
foreach ($store_data_in_array_variable as $key => $value)
{
echo $key;
echo '<br>';
echo $value;
}
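To get from the flat list of values to the table in the question, the values have to be grouped into rows. The sample output shows that some departures have no times, so a fixed chunk size of four would misalign the columns. The sketch below instead starts a new row whenever a value looks like a route (a number, a dash, a name). That heuristic, the helper name buildDepartureTable, and the sample values are assumptions based only on the output shown above, not on the real page.

```php
<?php
// Group a flat list of scraped strings into table rows and render them as HTML.
// Assumption (hypothetical heuristic): every departure starts with a route such as "18 - GOLD".
function buildDepartureTable(array $values): string
{
    $rows = [];
    foreach ($values as $value) {
        $value = trim($value);
        if (empty($rows) || preg_match('/^\d+\s*-\s*\S/', $value)) {
            $rows[] = []; // a route value starts a new departure row
        }
        $rows[count($rows) - 1][] = $value;
    }

    $html = "<table border=\"1\">\n<tr><th>Route</th><th>Direction</th><th>Scheduled</th><th>Estimated</th></tr>\n";
    foreach ($rows as $row) {
        // Pad rows without times so every row has exactly four cells.
        $row = array_pad(array_slice($row, 0, 4), 4, '');
        $cells = array_map(function ($v) {
            return '<td>' . htmlspecialchars($v) . '</td>';
        }, $row);
        $html .= '<tr>' . implode('', $cells) . "</tr>\n";
    }
    return $html . "</table>\n";
}

$scraped = ['18 - GOLD', 'OUTBOUND', '8:17p', '8:16p', '8 - GREEN', 'OUTBOUND'];
echo buildDepartureTable($scraped);
```

With the scraping loop above, you would pass $store_data_in_array_variable instead of the sample array.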

Getting variables from SQL Server for mPDF

I'm using the mPDF class to output a PDF of data from a PHP file. I need to loop through the results of a SQL Server query, save them as new variables, and write them into $html so they can be output to the PDF. I can't place the loop in the WriteHTML function because it does not recognize PHP code. I need the contents of the whole array, so I can't just print one variable.
I have two files:
pdf-test.php:
This file gathers session variables from other php files that are included and reassigns them, so I can use them in the $html.
<?php
// Include files
require_once("form.php");
require_once("configuration.php");
session_start();
$html = '
<h3> Form A </h3>
<div>
<table>
<thead>
<tr>
<th colspan="3">1. Contact Information</th>
</tr>
</thead>
<tr>
<td> First Name: </td>
<td> Last Name: </td>
</tr>
<tr>
<td>'.$firstName.'</td>
<td>'.$lastName.'</td>
</tr>
.
.
.
</table>
';
echo $html;
pdf-make.php:
This file holds the code to actually convert the contents of pdf-test.php into a pdf.
<?php
// Direct to the mpdf file.
include('mpdf/mpdf.php');
// Collect all the content.
ob_start();
include "pdf-test.php";
$template = ob_get_contents();
ob_end_clean();
$mpdf=new mPDF();
$mpdf->WriteHTML($template);
// I: send the file inline to the browser.
$mpdf->Output('cust-form-a', 'I');
?>
This is my loop:
$tbl = "form_Customers";
$sql = "SELECT ROW_NUMBER() OVER(ORDER BY custFirt ASC)
AS RowNumber,
formID,
custFirt,
custLast,
displayRecord
FROM $tbl
WHERE formID = ? and displayRecord = ?";
$param = array($_SESSION["formid"], 'Y');
$stmt = sqlsrv_query($m_conn, $sql, $param);
// (an extra sqlsrv_fetch_array() call before the loop would skip the first row)
while ($row = sqlsrv_fetch_array($stmt)) {
$rowNum = $row['RowNumber'];
$firstN = $row['custFirt'];
$lastN = $row['custLast'];
}
When I try to include $rowNum, $firstN or $lastN in the $html, such as <td> '.$rowNum.'</td>, it just shows up blank.
I'm not sure where the loop should go (which file) or how to include the $rowNum, $firstN and $lastN variables in the $html like the others.
I'm new to PHP (and relatively new to coding in general) and I don't have much experience working with it, but I've been able to make mPDF work for me in similar instances without the query included.
Any help would be greatly appreciated. Thank you so much!
I'm not sure how your loop interacts with the other two files, but this looks overly complex to me. I'd approach this in a single .php file, something like this:
<?php
//Include Files
include('mpdf/mpdf.php');
... //Your additional includes
//Define a row template string
$rowtemplate =<<<EOS
<tr>
<td>%%RowNumber%%</td>
<td>%%custFirt%%</td>
<td>%%custLast%%</td>
</tr>
EOS;
//Initialize the HTML for the document.
$html =<<<EOS
<h3> Form A </h3>
... //Your code
<td> Last Name: </td>
</tr>
EOS;
//Loop Code
$tbl = "form_Customers";
... //Your code
// Fetch rows as associative arrays; an extra fetch before the loop would skip the first row
while ($row = sqlsrv_fetch_array($stmt, SQLSRV_FETCH_ASSOC)) {
//Copy rowtemplate to a temporary variable
$out_tmp = $rowtemplate;
//Loop through your SQL variables and replace them when they appear in the template
foreach ($row as $key => $val) {
$out_tmp = str_ireplace('%%'.$key.'%%', $val, $out_tmp);
}
//Append the result to $html
$html .= $out_tmp;
}
// Close the open tags in $html
$html .= "</table></div>";
//Write the PDF
$mpdf=new mPDF();
$mpdf->WriteHTML($html);
$mpdf->Output('cust-form-a', 'I');
I'm using heredoc syntax for the strings, since I think it is the cleanest way to include a large string.
Also, I prefer to omit the closing ?> tag, because any whitespace after it is sent as output, which can trigger "headers already sent" errors.
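The placeholder replacement inside the loop can be pulled into a small helper and tested without a database. A minimal sketch, assuming each row is an associative array such as the one returned by sqlsrv_fetch_array($stmt, SQLSRV_FETCH_ASSOC); the helper name fillRowTemplate and the sample rows are hypothetical. It uses strtr() to replace all placeholders in one pass and htmlspecialchars() so values containing < or & do not break the HTML handed to mPDF. Unlike str_ireplace(), strtr() is case-sensitive, so placeholder names must match the column names exactly.

```php
<?php
// Replace %%Key%% placeholders in $template with the HTML-escaped values from $row.
function fillRowTemplate(string $template, array $row): string
{
    $replacements = [];
    foreach ($row as $key => $val) {
        // Cast to string so numeric values (e.g. RowNumber) and nulls are handled too.
        $replacements['%%' . $key . '%%'] = htmlspecialchars((string) $val);
    }
    return strtr($template, $replacements);
}

$rowtemplate = "<tr><td>%%RowNumber%%</td><td>%%custFirt%%</td><td>%%custLast%%</td></tr>\n";

// Stand-ins for rows fetched from SQL Server (made-up data).
$rows = [
    ['RowNumber' => 1, 'custFirt' => 'Ada', 'custLast' => 'Lovelace'],
    ['RowNumber' => 2, 'custFirt' => 'Tom & Jerry', 'custLast' => 'Smith'],
];

$html = "<table>\n";
foreach ($rows as $row) {
    $html .= fillRowTemplate($rowtemplate, $row);
}
$html .= "</table>\n";
echo $html;
```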
