Load and display HTML first before PHP - php

I have two PHP files, index.php and lelong.php. I want index.php to render its HTML table first and show "Calculating..." in the second column while lelong.php extracts the data from the website, and then replace that text with the result.
Is there a way to do that? I have heard that JS or AJAX can do it, but I am not sure how.
Index.php
<!DOCTYPE html>
<html>
<head>
<?php include 'lelong.php'; ?>
</head>
<body>
<table border ="1" style = "width:50%">
<tr>
<td>E-Commerce Website</td>
<td>No. of Products </td>
</tr>
<tr>
<td>Lelong</td>
<td><?php echo $lelong; ?></td>
</tr>
</table>
</body>
</html>
lelong.php
<?php
$grep = new DoMDocument();
@$grep->loadHTMLFile("http://www.lelong.com.my/Auc/List/BrowseAll.asp");
$finder = new DomXPath($grep);
$class = "CatLevel1";
$nodes = $finder->query("//*[contains(@class, '$class')]");
$total_L = 0;
foreach ($nodes as $node) {
$span = $node->childNodes;
$search = array(0,1,2,3,4,5,6,7,8,9);
$number = str_replace($search, '', $span->item(1)->nodeValue);
$number = preg_replace("/[^0-9]/", '', $span->item(1)->nodeValue);
$total_L += (int) $number;
}
$lelong = number_format( $total_L , 0 , '.' , ',' );
?>
Thanks

Assuming lelong.php already works, yes, you can use AJAX to fetch the result after the page has loaded. A basic example follows.
In your HTML:
<table border ="1" style = "width:50%">
<tr>
<td>E-Commerce Website</td>
<td>No. of Products </td>
</tr>
<tr>
<td>Lelong</td>
<td class="result_data">Calculating ...</td><!-- initial content. Loading ... -->
</tr>
</table>
<script src="//ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script>
$(document).ready(function(){
// use ajax, call the PHP
$.ajax({
url: 'lelong.php', // path of lelong
success: function(response){
$('.result_data').text(response);
}
})
});
</script>
Then in your PHP:
<?php
$grep = new DoMDocument();
@$grep->loadHTMLFile("http://www.lelong.com.my/Auc/List/BrowseAll.asp");
$finder = new DomXPath($grep);
$class = "CatLevel1";
$nodes = $finder->query("//*[contains(@class, '$class')]");
$total_L = 0;
foreach ($nodes as $node) {
$span = $node->childNodes;
// keep only the digits of the second child node's text
$number = preg_replace("/[^0-9]/", '', $span->item(1)->nodeValue);
$total_L += (int) $number;
}
$lelong = number_format( $total_L , 0 , '.' , ',' );
echo $lelong; // output lelong
exit;
?>
How the loading state looks on the front end is up to you; you could use a jQuery plugin for that.

In lelong.php, add the line echo $lelong; at the end.
In index.php, remove <?php include 'lelong.php'; ?> and load the jQuery library, for example <script src="//code.jquery.com/jquery-1.11.0.min.js"></script>
Replace <td><?php echo $lelong; ?></td> with <td id='lelong'>Calculating...</td>
Add the jQuery code <script>$('#lelong').load('lelong.php');</script> after the table, so that the #lelong cell already exists when the script runs.
It is worth learning the basics of jQuery first.
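Both answers assume that lelong.php already produces the right number. Below is a minimal sketch of how the counting step could be isolated and checked against a static snippet before it is called via AJAX. The function name sumCategoryCounts and the sample markup are assumptions made for illustration; the real page may be structured differently.

```php
<?php
// Hypothetical helper (not part of the original lelong.php): sums the
// digits-only text of the second child node of every element whose class
// attribute contains $class.
function sumCategoryCounts(string $html, string $class): int
{
    libxml_use_internal_errors(true);          // collect parser warnings quietly
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $total = 0;
    foreach ($xpath->query("//*[contains(@class, '$class')]") as $node) {
        $second = $node->childNodes->item(1);  // same child index as the original loop
        if ($second === null) {
            continue;                          // element has fewer than two child nodes
        }
        $total += (int) preg_replace('/[^0-9]/', '', $second->nodeValue);
    }
    return $total;
}

// Assumed sample markup, used only to exercise the function offline.
$sample = '<div class="CatLevel1"><a href="#">Books</a><span>(1,234)</span></div>'
        . '<div class="CatLevel1"><a href="#">Toys</a><span>(56)</span></div>';

$total = sumCategoryCounts($sample, 'CatLevel1');
echo number_format($total, 0, '.', ',');
```

In lelong.php itself, the same function would be fed the downloaded page and its formatted result echoed, which is the text the AJAX call places in the table cell.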

Related

PHP web scraping HTMLDOM pagination

I am scraping this URL for my final-year project, but my code only scrapes the first page of results for the searched query. I want to follow the pagination (pages 1, 2, 3, 4, 5 and so on) to the end. Please help.
I have implemented a data scraping script which fetches data using cURL.
However, it only fetches the records from one page, and I want all of the data, because the results on that page are paginated.
<form action="" method="post" class="form-horizontal" id="home-search">
<input type="text" name="keyword" id="keyword">
<input type="submit">
</form>
<?php
if(isset($_POST['keyword'])){
$keyword = urlencode($_POST['keyword']);
ini_set('display_errors', 1);
ini_set('max_execution_time', 300);
$html = file_get_contents('https://www.bestjobs.co.za/jobs/?q='.$keyword);
//echo $html;
$indeedDotPk = array();
//$html = file_get_contents($result);
libxml_use_internal_errors( true);
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXpath( $doc);
$node = $xpath->query( '//div[@class="paginas"]/ul/li/a/@href');
$total_pages = 0;
$start = 0;
$job_title_index = 0;
$job_link_index = 0;
$job_description_index = 0;
$job_experience_index = 0;
foreach ($node as $key => $value) {
$total_pages++;
// echo $value->textContent;
// echo "<br>";
// echo "<br>";
// echo "<br>";
}
for ($i=0; $i < $total_pages; $i++) {
ini_set('max_execution_time', 300);
$html = file_get_contents('https://www.bestjobs.co.za/jobs/?q='.$keyword.'&start='.$start);
libxml_use_internal_errors( true);
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXpath( $doc);
// Job Description
$node = $xpath->query('//a[@class="js-o-link"]');
foreach ($node as $key => $value) {
if(is_string($value->textContent)){
$indeedDotPk[$job_description_index++]['job_description'] = $value->textContent;
}
}
// Job Description
$start = $start + 10;
}
foreach ($indeedDotPk as $key => $value) {
if(!empty($value['job_description'])){
?>
<table border="1">
<tr >
<td>
</td>
<td>
</td>
<td>
</td>
<td>
<?php echo $value['job_description']?>
</td>
</tr>
Does anyone have an idea how I can set pagination in the end like 1,2,3,4,5 ?
If anyone has any suggestion then please help me.
Thanks...
Pass the page number as a parameter in the URL, like this:
https://www.bestjobs.co.za/jobs/?q=sales&p=2
Wrap the scraping code in a function and call it from a for loop, passing the page number each time:
function webScrape($p){
//scraping code
}
for($i=1;$i<=100;$i++){ // 100 is an arbitrary upper bound on the number of pages
webScrape($i);
}
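A sketch of that loop which stops at the first page without results instead of relying on a fixed page count. The p parameter is taken from the URL above and the js-o-link class from the question; the function name scrapeAllPages, the fake fetcher and the sample markup are hypothetical. The fetcher is passed in as a callable so the sketch runs without network access.

```php
<?php
// Hypothetical helper: collects link texts page by page until a page
// yields no matches or $maxPages is reached.
function scrapeAllPages(string $keyword, callable $fetch, int $maxPages = 100): array
{
    $titles = [];
    libxml_use_internal_errors(true);
    for ($p = 1; $p <= $maxPages; $p++) {
        $url = 'https://www.bestjobs.co.za/jobs/?'
             . http_build_query(['q' => $keyword, 'p' => $p]);
        $html = $fetch($url);
        if ($html === false || $html === '') {
            break;                              // request failed or empty body
        }
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        libxml_clear_errors();

        $links = (new DOMXPath($doc))->query('//a[@class="js-o-link"]');
        if ($links->length === 0) {
            break;                              // no results: assume the last page was passed
        }
        foreach ($links as $link) {
            $titles[] = trim($link->textContent);
        }
    }
    return $titles;
}

// Fake fetcher standing in for the real site (assumed markup).
$pages = [
    1 => '<a class="js-o-link">Sales Rep</a><a class="js-o-link">Sales Manager</a>',
    2 => '<a class="js-o-link">Sales Clerk</a>',
];
$fakeFetch = function (string $url) use ($pages): string {
    parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
    return $pages[(int) $query['p']] ?? '<p>No results</p>';
};

$jobs = scrapeAllPages('sales', $fakeFetch);
print_r($jobs);
```

Against the live site, 'file_get_contents' could be passed as the fetcher; whether an out-of-range page really comes back without js-o-link anchors is an assumption that would need checking.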

php preg_match table and wrapping div

I have CMS-driven content. When saving, I prepare the content, and as part of that I want to clean up the tables the authors create.
We use Bootstrap on the front end, so I first want to grab all tables.
Then I want to check each table's parent element and, if it is not <div class="table-responsive">, wrap the table in one.
I have:
// $content = $_POST['content'];
// Set some TEST content
$content = "<h1>My Content</h1>
<p>This is some content</p>
<table border=\"1\">
<tr>
<td>cell</td>
<td>cell</td>
<td>cell</td>
</tr>
<tr>
<td>cell</td>
<td>cell</td>
<td>cell</td>
</tr>
</table>
<div align=\"center\">see the above content</div>
<p>Thanks!</p>\n\n";
// Make our example content longer with more variations...
$content = $content .
str_replace('<table border="1">', '<table border="0" class="my-table">', $content) .
str_replace('<table border="1">', '<table border="0" cellpadding="0" cellspacing="3">', $content);
$output = $content;
// Parse for table tags
preg_match_all("/<table(.*?)>/", $content, $tables);
// If we have table tags..
if(count($tables[1]) > 0) {
// loop over and get the info we want to build the new table tag.
foreach($tables[0] as $key => $match) {
$add_class = array();
$tag = ' '. $tables[1][$key] .' ';
$add_class[] = 'table';
// check if we have got Borders....
// If we do, add the Bootstrap table-bordered class.
if(strpos($tag, 'border="0"') === FALSE) {
$add_class[] = 'table-bordered';
}
// prepend any existing/custom classes.
if(strpos($tag, 'class="') > 0) {
preg_match("/class=\"(.*?)\"/", $tag, $classes);
if($classes[1]) {
$add_class = array_merge($add_class, explode(' ', $classes[1]));
}
}
// add classes.
$add_class = array_unique($add_class);
// Now - replace the original <table> tag with the new BS tag.
// adding any class attrs
// wrap in the responsive DIV. - THIS part - needs to be only added if its not already wrapped...
// this would happen if we have already edited the page before right ...
$output = str_replace($match, '<div class="table-responsive">'."\n".'<table class="'. implode(' ', $add_class) .'">', $output);
}
// replace all closing </table> tags with the closing responsive tag too...
$output = str_replace('</table>', '</table>'."\n".'</div>', $output);
}
echo highlight_string($content, TRUE);
echo '<hr>';
echo highlight_string($output, TRUE);
You can use Simple HTML DOM Parser to select the divs:
https://github.com/sunra/php-simple-html-dom-parser
$html = new simple_html_dom();
$html->load_file(__filepath__);
# find the elements with the id "youdiv"
$element = $html->find("#youdiv");
Good luck
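As an alternative to both the regex approach and the library above, the wrapping step can be sketched with PHP's built-in DOM extension, which makes the "only wrap if not already wrapped" check a simple look at the parent node. This is a minimal sketch under assumptions: the function name wrapTablesResponsive and the wrap-root container id are hypothetical, and the Bootstrap class handling from the question is left out.

```php
<?php
// Hypothetical helper: wraps every <table> whose parent is not already
// <div class="table-responsive"> in such a div.
function wrapTablesResponsive(string $content): string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // A known root element makes it easy to extract the fragment again.
    $doc->loadHTML('<div id="wrap-root">' . $content . '</div>');
    libxml_clear_errors();

    // Copy the live node list first, because the loop modifies the tree.
    $tables = iterator_to_array($doc->getElementsByTagName('table'));
    foreach ($tables as $table) {
        $parent = $table->parentNode;
        $classes = $parent instanceof DOMElement
            ? preg_split('/\s+/', trim($parent->getAttribute('class')))
            : [];
        if (in_array('table-responsive', $classes, true)) {
            continue;                              // already wrapped, e.g. on a second save
        }
        $wrapper = $doc->createElement('div');
        $wrapper->setAttribute('class', 'table-responsive');
        $parent->replaceChild($wrapper, $table);   // wrapper takes the table's place
        $wrapper->appendChild($table);             // table moves inside the wrapper
    }

    // Serialize only the children of the temporary root element.
    $root = (new DOMXPath($doc))->query('//div[@id="wrap-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$sample = '<h1>Title</h1><table border="1"><tr><td>a</td></tr></table>'
        . '<div class="table-responsive"><table><tr><td>b</td></tr></table></div>';
$once  = wrapTablesResponsive($sample);
$twice = wrapTablesResponsive($once);   // saving again must not add more wrappers
echo $once;
```

Note that loadHTML() does not assume UTF-8 by default, so content with non-ASCII characters would need extra care before this could be used on real CMS input.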

Unable to get both child elements with xpath from xhtml using xquery in php to manipulate

This is the XHTML data I need to get the child nodes from. I only want the TD cells; I don't need the children of the TH cells.
<table>some data</table>
<table>
<tr>
<td class="c2">PCI Signal Error (SERR#) Enable</td>
<td>Yes</td>
</tr>
<tr>
<td class="c1">Controller Type 1</td>
<td>CISS</td>
</tr>
<tr>
<td class="c2">bus type</td>
<td>CISS</td>
</tr>
<tr>
<th><a name="systempcibus5">PCI Bus 31</a></th>
<td>Device</td>
</tr>
</table>
Below is my latest attempt. I only want to get the textContent of the TDs in the markup above, so that I can build a SQL statement to insert the data into MySQL.
I have tried many variations over the last week. I won't bore you with all the things I tried, but I believe this is the closest to what I want.
It produces this notice:
PHP Notice: Trying to get property of non-object in C:\inetpub\wwwroot\reports\gec\test1.php on line 40
<?php
libxml_use_internal_errors(true);
$dom = new DomDocument;
$dom->loadHTML($html);
$xpath = new DomXPath($dom);
$nodes = $xpath->query('/html/body/table[2]/tr');
//$nodes = $xpath->query("//tr[contains(concat(' ', @class, ' '), ' head ') ");
//header("Content-type: text/plain");
$node_count=$nodes->length ;
for( $i = 1; $i <= intval($node_count); $i++)
{
$node_td1 = $xpath->query('/html/body/table[2]/tr[$i]/td[1]');
$node_td2 = $xpath->query('/html/body/table[2]/tr[$i]/td[2]');
$result1=$node_td1->textContent;
$result2=$node_td2->textContent;
echo $result1 . "," . $result2 . "<br>";
}
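query() returns a DOMNodeList rather than a single node, so ->textContent is not available on its result; in addition, $i is not interpolated inside a single-quoted string. Instead, you could select the rows once and then filter their cells by ->tagName: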
$dom = new DomDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DomXPath($dom);
$rows = $xpath->query('/html/body/table[2]/tr');
foreach ($rows as $row) {
foreach($row->childNodes as $col) {
if(isset($col->tagName) && $col->tagName != 'th') {
echo $col->textContent . '<br/>';
}
}
echo '<hr/>';
}
Or use XPath relative to each row:
foreach ($rows as $row) {
$col1 = $xpath->evaluate('string(./td[1])', $row);
$col2 = $xpath->evaluate('string(./td[2])', $row);
echo $col1 . '<br/>';
echo $col2 . '<br/>';
echo '<hr/>';
}
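For comparison, here is a minimal sketch of the question's own indexed loop with the two problems addressed: the expression is double-quoted so the counter is interpolated, and single nodes are taken from the returned DOMNodeList with item(). The shortened sample markup and the $pairs variable are assumptions for illustration.

```php
<?php
// Shortened, assumed version of the markup from the question.
$html = '<table><tr><td>some data</td></tr></table>'
      . '<table>'
      . '<tr><td class="c2">PCI Signal Error (SERR#) Enable</td><td>Yes</td></tr>'
      . '<tr><th><a name="systempcibus5">PCI Bus 31</a></th><td>Device</td></tr>'
      . '</table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$pairs = [];
$count = $xpath->query('/html/body/table[2]/tr')->length;
for ($i = 1; $i <= $count; $i++) {
    // Double quotes, so that $i is replaced by the row number.
    $cells = $xpath->query("/html/body/table[2]/tr[$i]/td");
    // item() returns a single DOMNode, or null when the index does not exist.
    $first  = $cells->item(0);
    $second = $cells->item(1);
    if ($first === null || $second === null) {
        continue;   // rows with fewer than two TD cells, such as the TH row
    }
    $pairs[] = [$first->textContent, $second->textContent];
}
print_r($pairs);
```

Each entry of $pairs then holds the two values for one row, which is the shape needed for building an insert statement.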

xpath match not working

I'm trying to fetch the content of message field with "//td[text()='message']/following-sibling::*/text()" from this result ( from curl ):
<BODY bgcolor=#dddddd>
<TABLE bgcolor=#dddddd border=1>
<TR>
<TD valign="top"><B>Something</B></TD>
<TD>ca</TD>
</TR>
<TR>
<TD valign="top"><B>Some list</B></TD>
<TD>
<TABLE>
<TR>
<TD>CA</TD>
</TR>
</TABLE>
</TD>
</TR>
<TR>
<TD valign="top"><B>message</B></TD>
<TD>CA already existed.</TD>
</TR>
</TABLE>
</BODY>
<br>
But it doesn't seem to work. The funny thing is that the same expression works for me in Python. So how can I get the content of the message field?
PS: I'm using this online tester tool: http://www.xpathtester.com/test
EDIT: This is my actual php code:
<?php
function get_url_data($acl)
{
// curl request
$xml_content = http_request($acl);
echo $xml_content ;
$dom = new DOMDocument();
@$dom->loadXML($xml_content);
$xpath = new DomXPath($dom);
$content_title = $xpath->query("//td[text()='message']/following-sibling::*/text()");
return $content_title;
}
if(isset($_POST)==true && empty($_POST)==false){
//Convert content of text area into an array
$data = explode("\n", str_replace("\r", "", $_POST['sendme']));
}
foreach ($data as $name => $value){
$content = get_url_data($value);
foreach ($content as $value)
{
echo $value->nodeValue . "<br/>";
}
echo "<br>";
}
?>
I was able to get it working by loading the response with loadHTML() instead of loadXML():
<?php
if(isset($_POST)==true && empty($_POST)==false){
//Convert content of text area into an array
$data = explode("\n", str_replace("\r", "", $_POST['sendme']));
}
foreach ($data as $name => $value){
$content = create_acl($value);
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
@$doc->loadHTML($content);
$xpath = new DOMXpath($doc);
$filtered = $xpath->query("//td[text()='message']/following-sibling::*/text()");
foreach ($filtered as $e) {
echo $e->nodeValue;
}
echo "<br>";
}
?>
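A small sketch of why the parser choice matters here: the response uses unquoted attribute values, which are not well-formed XML, so loadXML() rejects it, while loadHTML() accepts it and lowercases the element names. The response below is a shortened copy of the one in the question. Because the label in that sample sits inside a B element, this sketch compares the cell's whole string value with normalize-space(.) rather than using text(); that substitution is my assumption, not the poster's expression.

```php
<?php
// Shortened copy of the response shown in the question.
$response = '<BODY bgcolor=#dddddd><TABLE bgcolor=#dddddd border=1>'
          . '<TR><TD valign="top"><B>Something</B></TD><TD>ca</TD></TR>'
          . '<TR><TD valign="top"><B>message</B></TD><TD>CA already existed.</TD></TR>'
          . '</TABLE></BODY><br>';

libxml_use_internal_errors(true);   // collect parser errors instead of printing them

// Strict XML parsing fails because of the unquoted attribute values.
$asXml = new DOMDocument();
$xmlOk = $asXml->loadXML($response);

// The HTML parser is tolerant and normalizes tag names to lowercase.
$asHtml = new DOMDocument();
$asHtml->loadHTML($response);
libxml_clear_errors();

$xpath = new DOMXPath($asHtml);
// evaluate() with string() returns the text of the first matching node.
$message = $xpath->evaluate(
    "string(//td[normalize-space(.)='message']/following-sibling::td[1])"
);

var_dump($xmlOk);
echo $message;
```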

Preserving <br> tags when parsing HTML text content

I have a little issue.
I want to parse a simple HTML Document in PHP.
Here is the simple HTML :
<html>
<body>
<table>
<tr>
<td>Colombo <br> Coucou</td>
<td>30</td>
<td>Sunny</td>
</tr>
<tr>
<td>Hambantota</td>
<td>33</td>
<td>Sunny</td>
</tr>
</table>
</body>
</html>
And this is my PHP code :
$dom = new DOMDocument();
$html = $dom->loadHTMLFile("test.html");
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row)
{
$cols = $row->getElementsByTagName('td');
echo $cols->item(0)->nodeValue.'<br />';
echo $cols->item(1)->nodeValue.'<br />';
echo $cols->item(2)->nodeValue;
}
But as you can see, the first cell contains a <br> tag that I need, and my PHP code strips it out.
Can anybody explain how I can keep it?
I would recommend capturing the values of the table cells with the help of XPath:
$values = array();
$xpath = new DOMXPath($dom);
foreach($xpath->query('//tr') as $row) {
$row_values = array();
foreach($xpath->query('td', $row) as $cell) {
$row_values[] = innerHTML($cell);
}
$values[] = $row_values;
}
I have had the same problem with <br> tags being stripped out of fetched content: they are empty elements, so they contribute nothing to nodeValue, and they are not automatically replaced with a newline character (\n).
So I wrote my own innerHTML function, which has proved invaluable in many projects. Here it is:
function innerHTML(DOMElement $element, $trim = true, $decode = true) {
$innerHTML = '';
foreach ($element->childNodes as $node) {
$temp_container = new DOMDocument();
$temp_container->appendChild($temp_container->importNode($node, true));
$innerHTML .= ($trim ? trim($temp_container->saveHTML()) : $temp_container->saveHTML());
}
return ($decode ? html_entity_decode($innerHTML) : $innerHTML);
}
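A shorter variant is also possible, since DOMDocument::saveHTML() accepts a node argument and can therefore serialize each child directly without a temporary document. This is a minimal sketch, not a drop-in replacement for the function above: the name innerHtmlOf is hypothetical, and it does no trimming or entity decoding.

```php
<?php
// Hypothetical helper: concatenates the serialized HTML of all child nodes.
function innerHtmlOf(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // saveHTML($node) serializes just that node, tags included.
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

// One row from the question's sample table.
$dom = new DOMDocument();
$dom->loadHTML('<table><tr><td>Colombo <br> Coucou</td><td>30</td><td>Sunny</td></tr></table>');

$cell   = $dom->getElementsByTagName('td')->item(0);
$markup = innerHtmlOf($cell);   // keeps the <br> element
$plain  = $cell->nodeValue;     // text only, the <br> is gone

echo $markup, "\n", $plain;
```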
