How to parse plaintext data from a particular url with PHP? - php

I am using this code to pull data to my site from an external source, and am wondering how I would parse this using PHP.
include(app_path().'/Includes/simple_html_dom.php');
$html = file_get_html("http://www.teamsideline.com/sites/parkfun/schedule/153461/Adult-Outdoor-Soccer-Mens-Competitive", NULL, NULL, NULL, NULL);
$data = array();
foreach($html->find("tbody tr") as $tr){
$row = array();
foreach($tr->find("td") as $td){
/* enter code here */
$row[] = $td->plaintext;
}
$data[] = $row;
}
I am trying to extract the standings data in particular, and recreate the standings table on my page with the data from this site. How can I achieve this?

Seems like the website doesn't allow file_get_html(). Atleast for me it's not working.
Anyway, you could use cUrl to get the html string and then parse that string using str_get_html().
To display the table on your website, simply echo the found data like this:
$url = 'http://www.teamsideline.com/sites/parkfun/schedule/153461/Adult-Outdoor-Soccer-Mens-Competitive';
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, false);
$result = curl_exec($curl);
$html = str_get_html($result);
$data = "";
$data .= "<table width='100%'>";
foreach($html->find("tbody tr") as $tr){
$data .= "<tr>";
foreach($tr->find("td") as $td){
$data .= "<td style='border: 1px black solid'>";
$data .= $td->plaintext;
$data .= "</td>";
}
$data .= "</tr>";
}
$data .= "</table>";
echo $data;

Related

I am trying to grav meta tags of a webpage, help me to convert this into array?

I have a code by which I can fetch meta tags of a webpage. But I want to use it to fetch data from multiple pages or sites.
I have written this test code to grab metadata of a webpage but I want to grab the metadata of multiple pages at once. I am unable to do this by myself
<?php
// Web page URL
$url = $_POST['url'];
// Extract HTML using curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
// Load HTML to DOM object
$dom = new DOMDocument();
#$dom->loadHTML($data);
// Parse DOM to get Title data
$nodes = $dom->getElementsByTagName('title');
$title = $nodes->item(0)->nodeValue;
// Parse DOM to get meta data
$metas = $dom->getElementsByTagName('meta');
$description = $keywords = '';
for($i=0; $i<$metas->length; $i++){
$meta = $metas->item($i);
if($meta->getAttribute('name') == 'description'){
$description = $meta->getAttribute('content');
}
if($meta->getAttribute('name') == 'keywords'){
$keywords = $meta->getAttribute('content');
}
}
?>
<form class="form-horizontal" method="post" action="j.php">
<textarea class="form-control" id="name" placeholder="https://google.com" name="url" value=""></textarea>
<button
class="btn btn-primary send-button"
id="submit"
type="submit"
value="SEND">
<span class="send-text">SEND</span>
</button>
</form>
<?php
echo "<table border='1'> <tr/><td width='30%'>Title: $title</td>". "<td width='30%'>Description: $description</td>"."<td width='30%'>Keywords: $keywords</td></tr></table>";
?>
I am trying to fetch meta of a website but I want to use array for multiple pages
If you specified multiple URLs in your textarea - perhaps each one on a separate line - then you could split that value into an array and loop.
So something like this at the top of your PHP code:
$urls = [];
if (!empty($_POST['url'])) {
$urls = array_filter(array_map('trim', explode(PHP_EOL, $_POST['url'])));
}
$results = [];
foreach ($urls as $url) {
// ... then your code as before
$results[$url] = [
'title' => $title,
'description' => $description,
'keywords' => $keywords,
];
}
Then a display loop at the bottom
if (!empty($results)) {
echo "<table border='1'>";
foreach ($results as $url => $result) {
echo "<tr>";
echo "<td width='30%'>Title: " . htmlspecialchars($result['title']) . "</td>"
echo "<td width='30%'>Description: " . htmlspecialchars($result['description']) . "</td>";
echo "<td width='30%'>Keywords: " . htmlspecialchars($result['keywords']) . "</td>";
echo "</tr>";
}
echo "</table>";
}

how to get the content from external url when that page is fully loaded?

$data = array();
$html = file_get_html('www.example.com');
foreach($html->find(".bidsTable tr") as $tr){
$row = array();
foreach($tr->find("td") as $td){
/* enter code here */
$row[] = $td->plaintext;
}
$data[] = $row;
}
I am using this code to get the content from external url but I do not get the full html code is there any way to get the content when that page is fully loaded.
any help will be appreciated.
You can use cURL to get HTML content,
try like this, It will certainly wait until content is loaded.
$ch = curl_init("http://www.example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);
foreach($html->find(".bidsTable tr") as $tr){
$row = array();
foreach($tr->find("td") as $td){
/* enter code here */
$row[] = $td->plaintext;
}
$data[] = $row;
}

Issues with decoding two JSON feed sources and display with PHP/HTML

I am using two JSON feed sources and PHP to display a real estate property slideshow with agents on a website. The code was working prior to the feed provider making changes to where they store property and agent images. I have made the necessary adjustments for the images, but the feed data is not working now. I have contacted the feed providers about the issue, but they say the problem is on my end. No changes beyond the image URLs were made, so I am unsure where the issue may be. I am new to JSON, so I might be missing something. I have included the full script below. Here are the two JSON feed URLs: http://century21.ca/FeaturedDataHandler.c?DataType=4&EntityType=2&EntityID=2119 and http://century21.ca/FeaturedDataHandler.c?DataType=3&AgentID=27830&RotationType=1. The first URL grabs all of the agents and the second grabs a single agent's properties. The AgentID value is sourced from the JSON feed URL dynamically.
class Core
{
private $base_url;
private $property_image_url;
private $agent_id;
private $request_agent_properties_url;
private $request_all_agents_url;
private function formatJSON($json)
{
$from = array('Props:', 'Success:', 'Address:', ',Price:', 'PicTicks:', ',Image:', 'Link:', 'MissingImage:', 'ShowingCount:', 'ShowcaseHD:', 'ListingStatusCode:', 'Bedrooms:', 'Bathrooms:', 'IsSold:', 'ShowSoldPrice:', 'SqFootage:', 'YearBuilt:', 'Style:', 'PriceTypeDesc:');
$to = array('"Props":', '"Success":', '"Address":', ',"Price":', '"PicTicks":', ',"Image":', '"Link":', '"MissingImage":', '"ShowingCount":', '"ShowcaseHD":', '"ListingStatusCode":', '"Bedrooms":', '"Bathrooms":', '"IsSold":', '"ShowSoldPrice":', '"SqFootage":', '"YearBuilt":', '"Style":', '"PriceTypeDesc":' );
return str_ireplace($from, $to, $json); //returns the clean JSON
}
function __construct($agent=false)
{
$this->base_url = 'http://www.century21.ca';
$this->property_image_url = 'http://images.century21.ca';
$this->agent_id = ($agent ? $agent : false);
$this->request_all_agents_url =
$this->base_url.'/FeaturedDataHandler.c?DataType=4&EntityType=3&EntityID=3454';
$this->request_agent_properties_url =
$this->base_url.'/FeaturedDataHandler.c?DataType=3'.'&AgentID='.$this->agent_id.'&RotationType=1';
}
/**
* getSlides()
*/
function getSlides()
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $this->request_all_agents_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
$response = curl_exec($ch);
curl_close($ch);
if (empty($response))
return false;
else
$agents = $this->decode_json_string($response);
// Loop Agents And Look For Requested ID
foreach ($agents as $agent)
{
if (($this->agent_id != false) && (isset($agent['WTLUserID'])) && ($agent['WTLUserID'] != $this->agent_id))
{
continue; // You have specified a
}
$properties = $this->getProperties($agent['WTLUserID']);
$this->print_property_details($properties, $agent);
}
}
/**
* getProperties()
*/
function getProperties($agent_id)
{
$url = $this->base_url.'/FeaturedDataHandler.c?DataType=3'.'&AgentID='.$agent_id.'&RotationType=1';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
$response = curl_exec($ch);
curl_close($ch);
$json = json_decode($response);
if (empty($response))
die('No response 2'); //return false;
else
$json = $this->formatJSON($this->decode_json_string($response));
var_dump($json);
die();
// return $json;
}
/**
* print_property_details()
*/
function print_property_details($properties, $agent, $html='')
{
$BASE_URL = $this->base_url;
$PROPERTY_IMAGE_URL = $this->property_image_url;
foreach ($properties as $property)
{
$img = $property['Image'];
// $img = ($property['Image'] ? $property['Image'] : "some url to a dummy image here")
if($property['ListingStatusCode'] != 'SOLD'){
$address = $property['Address'];
$shortaddr = substr($address, 0, -12);
$html .= "<div class='listings'>";
$html .= "<div class='property-image'>";
$html .= "<img src='". $PROPERTY_IMAGE_URL ."' width='449' height='337' alt='' />";
$html .= "</div>";
$html .= "<div class='property-info'>";
$html .= "<span class='property-price'>". $property['Price'] ."</span>";
$html .= "<span class='property-street'>". $shortaddr ."</span>";
$html .= "</div>";
$html .= "<div class='agency'>";
$html .= "<div class='agent'>";
$html .= "<img src='". $agent['PhotoUrl']. "' class='agent-image' width='320' height='240' />";
$html .= "<span class='agent-name'><b>Agent:</b>". $agent['DisplayName'] ."</span>";
$html .= "</div>";
$html .= "</div>";
$html .= "</div>";
}
}
echo $html;
}
function decode_json_string($json)
{
// Strip out junk
$strip = array("{\"Agents\": [","{Props: ",",Success:true}",",\"Success\":true","\r","\n","[{","}]");
$json = str_replace($strip,"",$json);
// Instantiate array
$json_array = array();
foreach (explode("},{",$json) as $row)
{
/// Remove commas and colons between quotes
if (preg_match_all('/"([^\\"]+)"/', $row, $match)) {
foreach ($match as $m)
{
$row = str_replace($m,str_replace(",","|comma|",$m),$row);
$row = str_replace($m,str_replace(":","|colon|",$m),$row);
}
}
// Instantiate / clear array
$array = array();
foreach (explode(',',$row) as $pair)
{
$var = explode(":",$pair);
// Add commas and colons back
$val = str_replace("|colon|",":",$var[1]);
$val = str_replace("|comma|",",",$val);
$val = trim($val,'"');
$val = trim($val);
$key = trim($var[0]);
$key = trim($key,'{');
$key = trim($key,'}');
$array[$key] = $val;
}
// Add to array
$json_array[] = $array;
}
return $json_array;
}
}
Try this code to fix the JSON:
$url = 'http://century21.ca/FeaturedDataHandler.c?DataType=3&AgentID=27830&RotationType=1';
$invalid_json = file_get_contents($url);
$json = preg_replace("/([{,])([a-zA-Z][^: ]+):/", "$1\"$2\":", $invalid_json);
var_dump($json);
All your keys need to be double-quoted
JSON on the second URL is not a valid JSON, that's why you're not getting the reults, as PHP unable to decode that feed.
I tried to process it, and get this error
Error: Parse error on line 1:
{Props: [{Address:"28
-^
Expecting 'STRING', '}'
Feed image for first URL
and here is view of 2nd URL's feed
as per error for second feed, all the keys should be wrapped within " as these are strings rather than CONSTANTS.
e.g.
Props should be "Props" and all other too.
EDIT
You need to update your functionand add this one(formatJSON($json)) to your class
// Update this function, just need to update last line of function
function getProperties($agent_id)
{
$url = $this->base_url.'/FeaturedDataHandler.c?DataType=3'.'&AgentID='.$agent_id.'&RotationType=1';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
$response = curl_exec($ch);
curl_close($ch);
$json = json_decode($response);
if (empty($response))
die('No response 2'); //return false;
else
return $this->formatJSON($this->decode_json_string($response)); //this one only need to be updated.
}
//add this function to class. This will format json
private function formatJSON($json){
$from= array('Props:', 'Success:', 'Address:', ',Price:', 'PicTicks:', ',Image:', 'Link:', 'MissingImage:', 'ShowingCount:', 'ShowcaseHD:', 'ListingStatusCode:', 'Bedrooms:', 'Bathrooms:', 'IsSold:', 'ShowSoldPrice:', 'SqFootage:', 'YearBuilt:', 'Style:', 'PriceTypeDesc:');
$to = array('"Props":', '"Success":', '"Address":', ',"Price":', '"PicTicks":', ',"Image":', '"Link":', '"MissingImage":', '"ShowingCount":', '"ShowcaseHD":', '"ListingStatusCode":', '"Bedrooms":', '"Bathrooms":', '"IsSold":', '"ShowSoldPrice":', '"SqFootage":', '"YearBuilt":', '"Style":', '"PriceTypeDesc":' );
return str_ireplace($from, $to, $json); //returns the clean JSON
}
EDIT
I've tested that function, and it's working fine, may be there is something wrong with your function decode_json_string($json)
I've taken unclean json from second URL, and cleaning it here, and putting that cleaned json in json editor to check either it's working or not HERE

Sequential cURLs returns the same content despite updating url

I'm trying to write a script to cURL a few pages from a password protected site.
The idea is to scrape information on submitted stock codes from their products database to generate and print out the results (eventually importing directly to my own database, but currently just printing the results on screen).
My function is as follows:
function LookupProduct($ItemCodes) {
//set a temp file name for the login cookie
$tmp_fname = "tmp/".md5(date('D F d')).".cookie";
$tmp_fname = realpath($tmp_fname);
//reset/declare the functions output
$return = '';
// build post data from form
$fields = array(
'UserName' => urlencode("username"),
'Password' => urlencode("password"),
);
$fieldString='';
foreach($fields as $key=>$value) {
$fieldString .= $key.'='.$value.'&';
}
rtrim($fieldString, '&');
//initialise the curl session
$ch = curl_init();
//set options for curl login
$loginurl = "https://suppliers-website/login/";
curl_setopt($ch,CURLOPT_URL, $loginurl);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch,CURLOPT_COOKIESESSION, true);
curl_setopt($ch,CURLOPT_POST, count($fields));
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch,CURLOPT_POSTFIELDS, $fieldString);
curl_setopt($ch,CURLOPT_COOKIEJAR, $tmp_fname);
curl_setopt($ch,CURLOPT_COOKIEFILE, $tmp_fname);
curl_setopt($ch,CURLOPT_RETURNTRANSFER, true);
//do the actual login, generate cookie
$result = curl_exec($ch);
//build array of codes to lookup
$codes=explode(",", $ItemCodes);
//lookup each code in the array
foreach($codes as $code) {
//set the product page to curl
$lookupUrl = "https://suppliers-website/product/".$code;
curl_setopt($ch,CURLOPT_URL, $lookupUrl);
//load product page html into $lookupcontent
unset($lookupcontent);
$lookupcontent = curl_exec($ch);
//if we have a valid page, then go ahead and pluck the data
if (strlen($lookupcontent) < 100) {
echo "<li>Error logging in: <blockquote>".$lookupcontent."</blockquote></li>";
} else {
//load product page html into a DOM
unset($dom);
unset($xpath);
$dom = new DOMDocument;
$dom->loadHTML($lookupcontent);
$xpath = new DOMXPath($dom);
//find the image src
unset($imgnames);
foreach($dom->getElementsByTagName('a') as $node) {
if (strpos($node->getAttribute('href'),'StockLoRes') !== false) {
$imgnames = explode("=", $node->getAttribute('href'));
$imgname = $imgnames[1];
$filelocation = $node->getAttribute('href');
}
}
//set the image to curl
$imglink = "https://suppliers-website/login/".$filelocation;
curl_setopt($ch,CURLOPT_URL,$imglink);
//curl the image
unset($curlimage);
$curlimage = curl_exec($ch);
//save the image locally
unset($saveimage);
$saveimage = fopen('tmp/'.$imgname, 'w');
fwrite($saveimage, $curlimage);
fclose($saveimage);
// find the product description
unset($results);
$classname = 'ItemDetails_Description';
$results = $xpath->query("//*[#class='" . $classname . "']");
if ($results->length > 0) {
$description = $results->item(0)->nodeValue;
$description = strip_tags($description);
$description = str_replace("•", "", $description);
}
//find the price
unset($pricearray);
foreach($dom->getElementsByTagName('div') as $node) {
if (strpos($node->nodeValue,'£') !== false) {
$pricearray[] = $node->nodeValue;
}
}
$pricearray=array_reverse($pricearray);
$price = $pricearray[0];
$price = str_replace("£", "", $price);
//find the title
unset($results);
$classname = 'ItemDetails_ItemName';
$results = $xpath->query("//*[#class='" . $classname . "']");
if ($results->length > 0) {
$title = $results->item(0)->nodeValue;
}
//find the publisher
unset($results);
$classname = 'ItemDetails_Publisher';
$results = $xpath->query("//*[#class='" . $classname . "']");
if ($results->length > 0) {
$publisher = $results->item(0)->nodeValue;
}
}
//add all the values to the data to be returned
$return .= '<div style="border:1px solid grey;margin:20px;float:left;">';
$return .= "<a href='tmp/".$imgname."'>";
$return .= "<img src='tmp/".$imgname."' width='100' align='left' /></a>";
$return .= "<h1>" .$title ."</h1>";
$return .= "<h3>" .$publisher ."</h3>";
$return .= "<h2>£" .$price ."</h2>";
$return .= "<h4>" .$description."</h2>";
$return .= '</div><br clear="all" />';
}
//echo out the data
echo $return;
//close connection
curl_close($ch);
}
I am using the following to trigger it:
if(isset($_POST['ItemCodes'])) {
$code=$_POST['ItemCodes'];
$code=str_replace("\n\r", ",", $code);
$code=str_replace("\r", ",", $code);
echo "ItemCodes: ".$code;
echo LookupProduct($code);
}
The script can successfully log in, save a cookie, and get info from a page, but if I try to request multiple pages the script fails to work as intended, instead returning 3 instances of the same product. Did I fail to reset a variable somewhere? I've tried unsetting everything but I still just get the same product three times, as if my function only works once.

php function adding multiple html lines to variable

I'm trying to create a function which contains a table and then loops through data and add rows equal to the loop execution. At the moment it shows an empty page and was wondering what i'm doing wrong in order to return the table? You wont be able to see the result since a backend file is required which is quite big, but i hope someone can see what i'm doing wrong in order to output the table?
$url = "http://URL";
function getHTML($url,$timeout)
{
$ch = curl_init($url); // initialize curl with given url
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]); // set useragent
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // write the response to a variable
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to execute
curl_setopt($ch, CURLOPT_FAILONERROR, 1); // stop when it encounters an error
return #curl_exec($ch);
}
function getGames() {
$html = str_get_html(getHTML($url,10));
//$title = str_replace(array("\n", "\r"), '',$html->find("/[#id='main']/div[1]/div[2]/div[1]/h2/strong",0)->plaintext);
//$manuf = $html->find("/[#id='main']/div[1]/div[2]/div[3]/table/tbody/tr[1]/td[2]/strong",0)->plaintext;
$table = $html->find("/[#id='matches_list']/",0);
$livescore = "<table class='match-table' cellspacing='0' cellpadding='0'>";
$livescore = ."<tbody>";
foreach($table->find("li") as $line){
$game = $line->find("a/span/img",0)->title;
if( $game == "CS:GO" or $game == "Hearthstone" or $game == "Dota 2" or $game == "StarCraft II" or $game == "League of Legends"){
$opp1 = $line->find("span.opp1",0)->plaintext;
$opp2 = $line->find("span.opp2",0)->plaintext;
$score = $line->find("span.score",0)->plaintext;
$livescore .= "<tr class='margin-tr'>";
$livescore .= "<td class='match-icon'>Icon</td>";
$livescore .= "<td class='match-home'>Opp1</td>";
$livescore .= "<td class='match-score'>score</td>";
$livescore .= "<td class='match-away'>opp2</td>";
$livescore .= "<td class='match-time'>22:00</td>";
$livescore .= "</tr>";
}
}
$livescore = ."</tbody>";
$livescore = ."</table>";
return $livescore;
}
echo getGames();
Your function is called getGames() and you try to access it with echo getGame();
And
$html = str_get_html(getHTML($url,10)); //<<<< $url isn't set

Categories