PHP Web Page Scraping - php

I am able to get the source code of a website with file_get_contents, but I want to extract certain values from the HTML. This piece of markup is always the same, but the values between the tags change from time to time. This is the HTML:
<div class="cheapest-bins">
<h3>Cheapest Live Buy Now</h3>
<table>
<tbody><tr>
<th>Console</th>
<th>Buy Now Price</th>
</tr>
<tr class=" active">
<td class="xb1">XB1</td>
<td>1,480,000</td>
</tr>
<tr class="">
<td class="ps4">PS4</td>
<td>1,590,000</td>
</tr>
<tr class="">
<td class="x360">360</td>
<td>---</td>
</tr>
<tr class="">
<td class="ps3">PS3</td>
<td>2,800,000</td>
</tr>
</tbody></table>
</div>
How would I go about getting the values 1,480,000, 1,590,000, --- and 2,800,000?

Short answer:
find a CSS selector library such as https://github.com/tj/php-selector,
then you could grab the inner HTML of all td:last-child elements.
For your specific example you could just use:
preg_match_all('#<td>(.*?)</td>#', $html, $matches);
The four values then end up in $matches[1], because only the price cells use a bare <td> tag.
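If you'd rather not depend on a third-party selector library or a regex, PHP's bundled DOM extension can do the same job. The following is a minimal sketch, assuming $html already holds the page fetched with file_get_contents() and that the markup keeps the structure shown in the question; the function name cheapestBinPrices() is hypothetical. It returns the prices keyed by console name.

```php
<?php
// Sketch: read the "Cheapest Live Buy Now" table with PHP's bundled DOM
// extension. Assumes $html holds the page source from file_get_contents().

function cheapestBinPrices(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Rows that contain <td> cells inside <div class="cheapest-bins">;
    // the header row only has <th> cells, so it is skipped.
    $rows = $xpath->query(
        '//div[contains(concat(" ", normalize-space(@class), " "), " cheapest-bins ")]//tr[td]'
    );

    $prices = [];
    foreach ($rows as $tr) {
        $cells = $xpath->query('td', $tr);
        $console = trim($cells->item(0)->textContent);                // e.g. "XB1"
        $price = trim($cells->item($cells->length - 1)->textContent); // e.g. "1,480,000"
        $prices[$console] = $price;
    }
    return $prices;
}

// Usage ($url is a placeholder for the page you are scraping):
// $html = file_get_contents($url);
// print_r(cheapestBinPrices($html));
```

Unlike the regex, this only looks inside the cheapest-bins div, so other tables on the page don't leak into the results. Note that PHP turns the numeric key "360" into the integer 360.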

Related

How can I properly combine PHP and HTML in this code?

I have a query that returns a long series of questions and answers. In my HTML page I have divs that alternate with a slider effect. So far everything is fine, but as soon as I try to display a question from the database, everything breaks. Why?
<div align=Center>
<br/>
<h1><?php echo $rowfirst['Cognome']." ".$rowfirst['Nome']; ?></h1>
<br/>
<div class="nivo-slider">
<div class="navigation"></div>
<?php
while($row = mysqli_fetch_array($result))
{
?>
<div id="nivo">
<div class="element" align=Center><h3>Anamnesi pregressa</h3>
<table class="w3-table w3-bordered " style="width:400px;padding:10px;" align="center">
<tbody>
<tr>
<th scope="row"><?php echo $row['testo'];}?></th>
<td>risposta 1</td>
</tr>
</tbody>
</table>
</div>
<div class="element" align=Center><h3>Scrittura</h3>
<table class="w3-table w3-bordered" style="width:400px;padding:10px;" align="center">
<tbody>
<tr>
<th scope="row">domanda 1</th>
<td>risposta 1</td>
</tr>
</tbody>
</table>
</div>
<div class="element"><h3>Motricità</h3>
<table class="w3-table w3-bordered " style="width:400px;padding:10px;" align="center">
<tbody>
<tr>
<th scope="row">domanda 1</th>
<td>risposta 1</td>
</tr>
</tbody>
</table>
</div>
</div>
<br/>
</div>
</div>
The problem is the while loop. How can I solve it?
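What you have will create malformed HTML: the loop opens before <div id="nivo"> but closes inside the <th>, so every iteration emits <div>, <table>, <tbody>, <tr> and <th> opening tags without their closing tags.
To prevent this, place your while () { and the closing } so that they wrap a complete, matching pair of tags, as I have done with <tr> and </tr> below: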
<?php
while($row = mysqli_fetch_array($result))
{
?>
<tr>
<th scope="row"><?php echo $row['testo'];?></th>
<td>risposta 1</td>
</tr>
<?php } ?>
That will fix your malformed HTML, but the while statement may or may not be in the right place; without knowing exactly what you are trying to achieve, you'll have to correct it from there. As for the slider you mentioned, you will probably need to supply more information if this answer doesn't solve your issue.
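To illustrate the rule of keeping the loop around a matching pair of tags, here is a self-contained sketch. It uses an array of hypothetical rows instead of a live mysqli result, and PHP's alternative foreach (): ... endforeach; syntax, which is often easier to read inside HTML; with a real result set the loop would be while ($row = mysqli_fetch_array($result)): ... endwhile;. The function name renderQuestionTable() is made up for the example. It also escapes the question text with htmlspecialchars(), so a question containing < or & cannot break the markup either.

```php
<?php
// Sketch: each loop iteration emits one complete <tr>...</tr> pair, so the
// tags opened before the loop are closed after it. $rows is hypothetical
// sample data standing in for the rows returned by mysqli_fetch_array().

function renderQuestionTable(array $rows): string
{
    ob_start(); // capture the HTML below instead of sending it straight out
    ?>
<table class="w3-table w3-bordered" style="width:400px;padding:10px;" align="center">
  <tbody>
<?php foreach ($rows as $row): ?>
    <tr>
      <th scope="row"><?= htmlspecialchars($row['testo']) ?></th>
      <td>risposta 1</td>
    </tr>
<?php endforeach; ?>
  </tbody>
</table>
<?php
    return ob_get_clean();
}

// Usage:
// echo renderQuestionTable([['testo' => 'Domanda 1'], ['testo' => 'Domanda 2']]);
```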

Cache is breaking HTML table when refreshing page

I've been battling with this issue for a while now and can't find out why it's happening. I basically have a table which displays WooCommerce variations, with each variation in its own row. The problem is that when I refresh the page, the <th> row collapses and doesn't line up with the tbody.
Interestingly, if I change some of the CSS of the <th> in the console after the page loads, it fixes itself and lines up. The same happens if I open the console, turn on 'Disable Cache' in the Network tab and refresh the page.
My guess is that the HTML is loaded before the WooCommerce data is added, so there isn't enough time to calculate the correct widths. However, this issue only started happening recently.
Does anyone know what could cause this?
<table class="variations variations-grid" cellspacing="0">
<thead>
<tr>
<th><img class="camera-icon" src="/wp-content/uploads/2016/10/picture-holder.png"></th>
<th>
Type
</th>
<th>Length</th>
<th class="length">Weight</th>
<th>Height</th>
<th>Depth</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<div class="thumbnail">
<img width="44" height="54" alt="Photinia Red Robin " src="/wp-content/uploads/2016/09/Photinia-Red-Robin-267x325.jpg">
</div>
</td>
<td class="attr attr-attribute_size">
<p>Hedge Bag</p>
</td>
<td class="attr attr-attribute_length">
<p>100cm</p>
</td>
<td class="attr attr-attribute_weight">
<p>75Kg</p>
</td>
<td class="attr attr-attribute_height">
<p>140-150cm+</p>
</td>
<td class="attr attr-attribute_depth">
<p>40cm</p>
</td>
<td class="price">
<span class="price"><span class="woocommerce-Price-amount amount"><span class="woocommerce-Price-currencySymbol">£</span>191.00</span></span>
per metre
</td>
<td class="product-actions">
<a class="button" href="#modal0">More info</a>
</td>
</tr>
</tbody>
</table>
This is what it looks like when broken:
This is what it looks like if I disable the cache OR change the CSS once it has loaded. This is how it should look.
I count eight <td> cells in your <tbody> row but only six <th> cells in your <thead>. Could this be the issue? To test, add empty <th> cells so that the header count matches the body count.
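A quick way to check the cell counts programmatically is sketched below with PHP's DOM extension. The function name tableColumnCounts() is hypothetical, and it counts raw cells only, so colspan attributes are not taken into account.

```php
<?php
// Sketch: compare the number of header cells with the widest body row.
// colspan is ignored; each <th>/<td> element counts as one column.

function tableColumnCounts(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $thCount = $xpath->query('//thead/tr[1]/th')->length;

    $tdMax = 0;
    foreach ($xpath->query('//tbody/tr') as $tr) {
        $tdMax = max($tdMax, $xpath->query('td', $tr)->length);
    }
    return ['th' => $thCount, 'td' => $tdMax];
}

// Usage:
// $counts = tableColumnCounts($tableHtml);
// if ($counts['th'] !== $counts['td']) { /* header and body are misaligned */ }
```

Run against the markup in the question, it should report 6 header cells and 8 body cells.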

Severity: Warning --> fopen(./upload/ <table cellspacing="0"

I am using file-reading functionality in one of my applications.
Sometimes I get an exception while opening and closing a file, because the filename I receive is an HTML string. What do I need to do to protect the application, or to detect that this is not a valid filename?
Here is the full error:
ERROR - 2017-02-27 06:28:21 --> Severity: Warning --> fopen(./upload/
<table cellspacing="0" cellpadding="0" width="100%">
<tbody>
<tr>
<td class="titleBorderx" width="30">
<table height="25" cellspacing="2" cellpadding="0" width="25" bgcolor="black">
<tbody>
<tr>
<td id="L_default_x" class="x" valign="middle" align="center">X</td>
</tr>
</tbody>
</table>
</td>
<td class="titleBorder" id="L_default_2">Network Access Message:<span class="TitleDescription"> The page cannot be displayed</span> </td>
</tr>
</tbody>
</table>
<table id="spacer">
<tbody>
<tr>
<td height="10"></td></tr></tbody></table>
<table width="400">
<tbody>
<tr>
<td nowrap="" width="25"></td>
<td width="400"><span class="explain"><id id="L_default_3"><b>Explanation:</b></id></span><id id="L_default_4"> There is a problem with the page you are trying to reach and it cannot be displayed. </id><br><br>
<b><span class="tryThings"><id id="L_default_5"><b>Try the following:</b></id></span></b>
<ul class="TryList">
<li id="L_default_6"><b>Refresh page:</b> Search for the page again by clicking the Refresh button. The timeout may have occurred due to Internet congestion.
</li><li id="L_default_7"><b>Check spelling:</b> Check that you typed the Web page address correctly. The address may have been mistyped.
</li><li id="L_default_8"><b>Access from a link:</b> If there is a link to the page you are looking for, try accessing the page from that link.
</li></ul>
<id id="L_default_9">If you are still not able to view the requested page, try contacting your administrator or Helpdesk.</id> <br><br>
</td>
</tr>
</tbody>
</table>
<table id="spacer"><tbody><tr><td height="15"></td></tr></tbody></table>
<table width="400">
<tbody>
<tr>
<td nowrap="" width="25"></td>
<td width="400" id="L_default_10"><b>Technical Information (for support personnel)</b>
<ul class="adminList">
<li id="L_default_11">Error Code: 502 Proxy Error. The request was rejected by the HTTP filter. Contact your Forefront TMG administrator. (12217)
</li><li id="L_default_12">IP Address: 10.40.0.20
</li><li id="L_default_13">Date: 2/27/2017 6:28:21 AM [GMT]
</li><li id="L_default_14">Server: SYS2019.netsolpk.com
</li><li id="L_default_15">Source: web filter
</li></ul>
</td>
</tr>
</tbody>
</table>
): failed to open stream: File name too long /srv/users/serverpilot/apps/php/public/application/controllers/Readfile.php 159
The string you pass to fopen is too long, and I don't think it can be a valid filename anyway. You are looking for a file in ./upload/ with the name:
<table cellspacing="0" cellpadding="0"...
The name is too long and invalid. I think you are passing a variable to fopen whose content isn't what you expect. Judging by its content (a Forefront TMG "502 Proxy Error" page), the variable seems to hold the body of a failed HTTP request rather than a filename.
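One way to protect the application is to validate the name before it ever reaches fopen(). The sketch below makes assumptions about what your filenames look like: it accepts only plain names made of letters, digits, dots, underscores and hyphens (at most 255 characters), and only if the file actually exists inside the upload directory. The function name safeUploadPath() and the variable $fileName are hypothetical.

```php
<?php
// Sketch: validate a candidate filename before passing it to fopen().
// The allowlist pattern and the 255-character limit are assumptions; adjust
// them to the names your application actually generates.

function safeUploadPath(string $uploadDir, string $name): ?string
{
    // Reject anything that is not a plain file name: no slashes, tags,
    // quotes or spaces, and not the special entries "." or "..".
    if (!preg_match('/^[A-Za-z0-9._-]{1,255}$/', $name) || $name === '.' || $name === '..') {
        return null;
    }
    $base = realpath($uploadDir);
    if ($base === false) {
        return null; // upload directory does not exist
    }
    $path = $base . DIRECTORY_SEPARATOR . $name;
    // Only accept existing regular files.
    return is_file($path) ? $path : null;
}

// Usage:
// $path = safeUploadPath('./upload', $fileName);
// if ($path === null) { error_log('Rejected filename'); } else { $fp = fopen($path, 'r'); }
```

It is also worth checking where the variable gets filled: if it comes from an HTTP request, test the response status before treating the body as a filename.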

IF/ELSE for header.tpl

I am working on a site that isn't SEO friendly. Specifically, header.tpl is inserted automatically into every page, with no option to change it based on the content of the page, e.g. whether category = Bathroom or category = Kitchen.
So I need an if/else statement, but I'm having trouble figuring it out in this instance, plus the change that goes along with it.
The code on the portfolio_category.php page is as follows, and what needs to change based on vf_category is parts/header.tpl (I can create Bathroomheader.tpl, Kitchenheader.tpl, etc. so that each .tpl has the relevant title and description tags for the page).
<?php
$vc_root_friendly = '.';
$vc_root_site = '.';
include_once("$vc_root_site/config.php");
include_once("$vc_includes_folder/IT.php");
$template_body = new HTML_Template_IT();
$template_body->loadTemplateFile("tpl_portfolio_category.html");
include_once("$vc_includes_folder/images.php");
if(!isset($_REQUEST['vf_category'])){
header("location:portfolio.php");
die();
}
//Show header
$template_header = new HTML_Template_IT();
$template_header->loadTemplateFile("parts/header.tpl",true,false);
$template_body->setVariable('header', $template_header->get());
//Show footer
$template_footer = new HTML_Template_IT();
$template_footer->loadTemplateFile("parts/footer.tpl",true,false);
$template_body->setVariable('footer', $template_footer->get());
$template_body->setVariable("image_category", $_REQUEST['vf_category']);
//Select photos for this category
etc.
Complicating things, there is another page referenced in the code above, tpl_portfolio_category.html, and this page also has its own header.tpl include:
<? include_once 'parts/header.tpl'; ?>
{header}
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tr>
<td class="main"><h1><span class="firstLetter">{image_category}</span></h1>
<p>
</p>
<table height="89" border="0" align="center" cellpadding="0" cellspacing="0">
<tr>
<td colspan="7"> </td>
</tr>
<!-- BEGIN block_thumb -->
<tr>
<td width='180' height="120" background="./images/thumb-bg.gif"> <div
align="center">{thumb1}</div></td>
<td width="10"> </td>
<td width='180' background="./images/thumb-bg.gif"> <div align="center">{thumb2}
</div></td>
<td width="10"> </td>
<td width='180' background="./images/thumb-bg.gif"> <div align="center">{thumb3}
</div></td>
<td width="10"> </td>
<td width='180' background="./images/thumb-bg.gif"> <div align="center">{thumb4}
</div></td>
</tr>
<tr valign="bottom">
<td height="3"></td>
<td height="3"></td>
<td height="3"></td>
<td height="3"></td>
<td height="3"></td>
<td height="3"></td>
<td height="3"></td>
</tr>
<!-- END block_thumb -->
</table>
<br>
<img src="images/spacer.gif"></td>
</tr>
</table>
{footer}
Any guidance would be appreciated! I'm just trying to get parts/header.tpl to change to parts/Bathroomheader.tpl or parts/Kitchenheader.tpl based on the vf_category pulled from the database. But it's driving me nuts.
Thanks in advance,
Gary
There are a few ways to do this, but in order to minimize the number of files you're changing, I suggest that you:
a) Pull the variable from the database and assign it to the templates in your first PHP file, for both the header and the body:
//Assuming you have retrieved $vf_category from your database;
$template_header->setVariable('vf_category', $vf_category);
$template_body->setVariable('vf_category', $vf_category);
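b) Edit your header.tpl file and add the following code at the top. Note that this {if}/{include} syntax is Smarty's; HTML_Template_IT, which your PHP code uses, has no conditional tags, so with that engine the choice has to be made in PHP instead: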
{if $vf_category == 'some_value'}
{include file="parts/Kitchenheader.tpl"}
{elseif $vf_category == 'other_value'}
{include file="parts/Bathroomheader.tpl"}
{else}
{* the rest of the header.tpl code goes here *}
{/if}
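Because HTML_Template_IT has no {if} tag, a different route (swapped in here, not part of the answer above) is to pick the header file in portfolio_category.php before calling loadTemplateFile(). This is a minimal sketch: the category names and file names follow the question, and the function name headerTemplateFor() is hypothetical.

```php
<?php
// Sketch: choose the header template in PHP before loading it.
// Category and file names follow the question (Bathroomheader.tpl,
// Kitchenheader.tpl); extend the map for other categories.

function headerTemplateFor(string $category): string
{
    $map = [
        'Bathroom' => 'parts/Bathroomheader.tpl',
        'Kitchen'  => 'parts/Kitchenheader.tpl',
    ];
    // Unknown categories fall back to the generic header. A fixed map also
    // keeps raw request input out of the file path.
    return $map[$category] ?? 'parts/header.tpl';
}

// In portfolio_category.php this would replace the hard-coded path:
// $template_header->loadTemplateFile(headerTemplateFor($_REQUEST['vf_category']), true, false);
```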

php replace images with divs

Below is the markup I'm pulling from my database table. Basically, I want to replace the image
<img src="http://newvision.co.ug/IM/logo_white_big.gif" width="80" style="background-color:white;padding:1px">
to
<div style='background:url(http://newvision.co.ug/IM/logo_white_big.gif) center center no-repeat;width:40px;height:40px'></div>
I don't want to use regular expressions, just an HTML parser that ships with PHP.
<table>
<tbody>
<tr>
<td valign="top"><a href="http://newvision.co.ug/PA/8/13/748484" target=
"_blank"><img src="http://newvision.co.ug/IM/logo_white_big.gif" width="80"
style="background-color:white;padding:1px" /></a></td>
<td valign="top">
<table>
<tbody>
<tr>
<td></td>
</tr>
<tr>
<td valign="top"><b><a target="_blank" href=
"http://newvision.co.ug/PA/8/13/748484" style="font-size:9pt">The New
Vision Online : Holland withholds sh10b over CHOGM</a></b></td>
</tr>
<tr>
<td valign="top"><a href="http://newvision.co.ug/PA/8/13/748484" style=
"font-size:8pt;color:silver"
target="_blank">http://newvision.co.ug/PA/8/13/748484</a></td>
</tr>
<tr>
<td valign="top" style="font-size:8pt;font-weight:normal">The New Vision
is Uganda's leading daily newspaper.</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
PHP does ship with an HTML parser, the DOM extension (DOMDocument::loadHTML), but its API is fairly low-level. For a more convenient approach, use phpQuery, which lets you manipulate the DOM in a jQuery-like manner. This will allow you to use selectors to easily swap out chunks of HTML.
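For the original "no regular expressions" requirement, here is a minimal sketch that relies only on PHP's bundled DOM extension. The function name imagesToDivs() is hypothetical, and the fixed 40px by 40px size is taken from the desired output in the question. Note that loadHTML() assumes ISO-8859-1 for input without a charset declaration, so non-ASCII text may need extra handling.

```php
<?php
// Sketch: replace each <img> with a <div> that shows the same image as a
// CSS background, using PHP's bundled DOM extension (no regular expressions).

function imagesToDivs(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in one root element so it can be serialized back
    // without the <html>/<body> wrappers libxml would otherwise add.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first: replacing nodes while iterating it skips items.
    $images = iterator_to_array($doc->getElementsByTagName('img'));
    foreach ($images as $img) {
        $div = $doc->createElement('div');
        $div->setAttribute(
            'style',
            'background:url(' . $img->getAttribute('src') . ') center center no-repeat;width:40px;height:40px'
        );
        $img->parentNode->replaceChild($div, $img);
    }

    // Serialize only the children of the wrapper element.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage:
// echo imagesToDivs($markupFromDatabase);
```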
