How to get a webpage containing an external API in PHP

I have a PHP script that loads this webpage to extract some data from its tables.
The following methods all failed to get its table contents:
Using file_get_contents:
$document = file_get_contents("http://www.webpage.com/");
print_r($document);
Using cURL:
$document = curl_init('http://www.webpage.com/');
curl_setopt($document, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($document);
print_r($html);
Using loadHTMLFile:
$document->loadHTMLFile('http://www.webpage.com/');
print_r($document);
I'm not an expert in PHP; except for the first method, the others are copied from Stack Overflow answers.
What am I doing wrong?
And how do they block some content from loading?

Not the answer you're likely to want to hear, but none of the methods you describe will evaluate JavaScript and other browser resources the way a normal browser client would. Instead, each of those methods retrieves only the contents of the single file you've specified. A quick glance at the site you're targeting shows the table in question being populated as the result of an AJAX call, which none of the methods you've tried can evaluate.
You'll need to lean on a library or tool capable of this type of emulation, such as laravel/dusk, the PHP bindings for Selenium WebDriver, or something similar.
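As a rough illustration, here is a minimal sketch using the php-webdriver Selenium bindings. It assumes a Selenium server running at localhost:4444 and a plain table CSS selector; both are assumptions for illustration, not details from the question:

require 'vendor/autoload.php'; // php-webdriver installed via Composer

use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\WebDriverBy;
use Facebook\WebDriver\WebDriverExpectedCondition;

// Assumes a Selenium server is listening on localhost:4444
$driver = RemoteWebDriver::create('http://localhost:4444/wd/hub', DesiredCapabilities::chrome());
$driver->get('http://www.webpage.com/');
// Wait up to 10 seconds for the AJAX-populated table to appear ('table' is a stand-in selector)
$driver->wait(10)->until(
    WebDriverExpectedCondition::presenceOfElementLocated(WebDriverBy::cssSelector('table'))
);
$html = $driver->getPageSource(); // now contains the JavaScript-rendered markup
$driver->quit();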

This is what I did to scrape data from a webpage using PHP cURL:
// Defining the basic cURL function
function curl($url) {
    $ch = curl_init(); // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch); // Closing cURL
    return $data; // Returning the data from the function
}
// Defining the basic scraping function
function scrape_between($data, $start, $end) {
    $data = stristr($data, $start); // Stripping all data from before $start
    $data = substr($data, strlen($start)); // Stripping $start
    $stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
    return $data; // Returning the scraped data from the function
}
$target_url = "https://www.somesite.com";
$scraped_website = curl($target_url);
$data_set_1 = scrape_between($scraped_website, "%before%", "%after%");
$data_set_2 = scrape_between($scraped_website, "%before%", "%after%");
%before% and %after% are placeholders for content that always appears on the webpage immediately before and after the data you wish to grab. They could be div tags or some other HTML tags that are unique to the data you wish to grab.
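For example (hypothetical markers; assume the page wraps the value you want in a uniquely classed span):

// Extract whatever sits between the two markers (markers are made up for illustration)
$title = scrape_between($scraped_website, '<span class="page-title">', '</span>');
echo $title;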

So maybe look into using cURL to imitate the same AJAX request that the site is using? When I searched for that, this is what I found:
Mimicking an ajax call with Curl PHP
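In outline (a sketch, not from the linked answer): find the request the page makes in your browser's dev-tools network tab, then replay it with cURL. The endpoint below is hypothetical, and many endpoints check for the X-Requested-With header:

// Hypothetical AJAX endpoint discovered in the browser's network tab
$ch = curl_init('http://www.webpage.com/ajax/table-data.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'X-Requested-With: XMLHttpRequest', // mark the request as AJAX
    'Referer: http://www.webpage.com/',
));
$json = curl_exec($ch);
curl_close($ch);
$rows = json_decode($json, true); // assuming the endpoint returns JSON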

Related

How to get an unknown string in $_GET method through URL

I want to pass a string from one PHP file to another using the $_GET method. This string has a different value each time it is passed. As I understand it, you pass GET parameters over a URL and you have to explicitly say what the parameter is. What if you want the requesting server to receive whatever the string value is from the providing server? I want to pass it in JSON format. Additionally, how do I send it via AJAX?
Providing server (get.php):
<?php
$tagID = '123456'; //this is different every time
$tag = array('tagID' => $_GET['tagID']);
echo json_encode($tag);
?>
Requesting server (rec.php):
<?php
$url = "http://192.168.12.169/RFID2/get.php?tagID=".$tagID;
$json = file_get_contents($url);
#var_dump($json);
$data = json_decode($json);
#var_dump($data);
echo $data->tagID; // $data is an object; echo a property, not the object itself
?>
If I understand correctly, you want to get the tagID from the server? You can simply pass a 'request' parameter to the server that tells the server what to return.
EDIT: This really isn't the proper way to implement an API (like, at all), but for the sake of answering your question, this is how:
Server
// assuming $tag is defined as in get.php above
switch ($_GET['request']) {
    case 'tagID':
        echo json_encode($tag);
        break;
}
You can now get the tagID with a URL like 192.168.12.169/get.php?request=tagID
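Putting it together, a complete get.php might look like this (a sketch; the hard-coded $tagID is a placeholder for however the value is actually produced):

<?php
$tagID = '123456'; // placeholder: in practice this is different every time
$tag = array('tagID' => $tagID);

switch ($_GET['request']) {
    case 'tagID':
        echo json_encode($tag);
        break;
}
?>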
Client (PHP with CURL)
When it comes to the client it gets a bit more complicated. You mention AJAX, but that only works from JavaScript; your PHP file can't use AJAX, so you'll have to use cURL.
$request = "?request=tagID";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, '192.168.12.169/get.php' . $request);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, '3');
$content = trim(curl_exec($ch));
curl_close($ch);
echo $content;
EDIT: added the working cURL example just for completeness.
Included cURL example from: How to switch from POST to GET in PHP CURL
Client (Javascript with AJAX)
$.get("192.168.12.169/get.php?request=tagId", function(data) {
alert(data);
});

How to call posts from PHP

I have a website that uses the WP Super Cache plugin. I need to recycle the cache once a day and then call 5 posts (URL addresses) so WP Super Cache puts those posts back into the cache (caching is quite time-consuming, so I'd like them precached before users arrive so they don't have to wait).
On my hosting I can use a CRON job, but only for 1 call/hour, and I need to call 5 different URLs at once.
Is that possible? Maybe by creating one HTML page with these 5 posts in iframes? Would something like that work?
Edit: Shell is not available, so I have to use PHP scripting.
The easiest way to do it in PHP is to use file_get_contents() (fopen() also works), if the HTTP stream wrapper is enabled on your server:
<?php
$postUrls = array(
    'http://my.site.here/post1',
    'http://my.site.here/post2',
    'http://my.site.here/post3',
    'http://my.site.here/post4',
    'http://my.site.here/post5',
);
foreach ($postUrls as $url) {
    // Get the post as a user would
    $text = file_get_contents($url);
    // Here you can check if the request was successful.
    // For example, use strpos() or a regex to find a piece of text you expect
    // to find in the post. Replace 'copyright bla, bla, bla' with a piece of
    // text you display in the footer of your site.
    if (strpos($text, 'copyright bla, bla, bla') === FALSE) {
        echo('Retrieval of '.$url." failed.\n");
    }
}
If file_get_contents() fails to open the URLs on your server (some ISPs restrict this behaviour), you can try cURL instead:
function curl_get_contents($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_CONNECTTIMEOUT => 30,   // timeout in seconds
        CURLOPT_RETURNTRANSFER => TRUE, // tell curl to return the page content instead of just TRUE/FALSE
    ));
    $text = curl_exec($ch);
    curl_close($ch);
    return $text;
}
Then use the function curl_get_contents() listed above instead of file_get_contents().
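With that in place, it is a one-line swap in the loop above:

foreach ($postUrls as $url) {
    $text = curl_get_contents($url); // drop-in replacement for file_get_contents($url)
    // ... same success check as before ...
}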
Here is an example using PHP without building a cURL request. Using PHP's shell_exec(), you can have an extremely light version like so:
$siteList = array("http://url1", "http://url2", "http://url3", "http://url4", "http://url5");
foreach ($siteList as $site) {
    $request = shell_exec('wget ' . escapeshellarg($site)); // escape the URL before passing it to the shell
}
Now of course this is not the most concise answer, and not always a good solution either; if you actually want anything from the response, you will have to work with it in a different way than with cURL, but it's a low-impact option.
Thanks to Arkascha's tip I created a PHP page that I call from CRON. This page contains a simple function using cURL:
function cache_it($Url) {
    if (!function_exists('curl_init')) {
        die('No cURL, sorry!');
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 50); // higher timeout needed for the cache to load
    curl_exec($ch); // don't need the output; otherwise $output = curl_exec($ch);
    curl_close($ch);
}
cache_it('http://www.mywebsite.com/url1');
cache_it('http://www.mywebsite.com/url2');
cache_it('http://www.mywebsite.com/url3');
cache_it('http://www.mywebsite.com/url4');

PHP + cURL: loading local XML while the latest is retrieved

I'm trying to load an XML file from another website. I can do this with cURL using the following:
function getLatestPlayerXML($par1) {
    $url = "http://somewebsite/page.php?par1=" . $par1;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $xmlresponse = curl_exec($ch);
    curl_close($ch); // close the handle once the response is in
    $xml = simplexml_load_string($xmlresponse);
    $xml->asXML("./userxml/" . $par1 . ".xml"); // save a local copy
    return $xml;
}
This works all well and good; however, the external website takes a long time to respond with the file, which is why I save the XML file to ./userxml/$par1.xml (which also works). I load it like this:
function getLocalPlayerXML($par1) {
    $xml = simplexml_load_file("./userxml/" . $par1 . ".xml");
    if ($xml != false) {
        // How can I make it so that when called it only temporarily uses this
        // file until the latest is available?
        return $xml;
    } else {
        return getLatestPlayerXML($par1);
    }
}
The problem I am having is that I want a single load function that first tries to load the XML from file; if the file exists, it should use that file until the latest one has been received, at which point the page updates. If the file does not exist, it should simply wait until the latest file has been retrieved and then use that. Is that even possible?
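One common pattern (a sketch, not from the original thread; it reuses the two functions above, and the freshness window is an assumption) is to serve the cached file while it is fresh and refetch once it goes stale. Truly swapping in the new data after the page has rendered would additionally need a client-side mechanism such as an AJAX refresh.

function getPlayerXML($par1, $maxAgeSeconds = 3600) {
    $file = "./userxml/" . $par1 . ".xml";
    // Serve the cached copy while it is fresher than $maxAgeSeconds
    if (file_exists($file) && (time() - filemtime($file)) < $maxAgeSeconds) {
        return simplexml_load_file($file);
    }
    // Otherwise fetch the latest; getLatestPlayerXML() also rewrites the cache file
    return getLatestPlayerXML($par1);
}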

Need help converting my beautiful soup (Python) code to PHP

I'm currently in the process of converting all of my Beautiful Soup code into PHP, just to get used to PHP. However, I've run into a problem: my PHP code only works when the wiki page has 'External links' after the 'Original run' row in the HTML (such as the True Detective wiki page). I just found out that this won't always be the case, because there may not always be an 'External links' section. Is there any way to convert my Beautiful Soup code into PHP using the same technique my Beautiful Soup code uses?
import requests, re
from bs4 import BeautifulSoup

def get_date(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    date = soup.find_all("table", {"class": "infobox"})
    for item in date:
        dates = item.find_all("th")
        for item2 in dates:
            if item2.text == "Original run":
                test2 = item2.find_next("td").text.encode("utf-8")
                mysub = re.sub(r'\([^)]*\)', '', test2)
                return mysub
And here is my current PHP code:
<?php
// Defining the basic cURL function
function curl($url) {
    $ch = curl_init(); // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch); // Closing cURL
    return $data; // Returning the data from the function
}
?>
<?php
// Defining the basic scraping function
function scrape_between($data, $start, $end) {
    $data = stristr($data, $start); // Stripping all data from before $start
    $data = substr($data, strlen($start)); // Stripping $start
    $stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
    return $data; // Returning the scraped data from the function
}
?>
<?php
$scraped_page = curl("http://en.wikipedia.org/wiki/The_Walking_Dead_(TV_series)"); // Downloading the Wikipedia page into the variable $scraped_page
$scraped_data = scrape_between($scraped_page, "<table class=\"infobox vevent\" style=\"width:22em\">", "</table>"); // Scraping $scraped_page for the content of the infobox table
$original_run = mb_substr($scraped_data, strpos($scraped_data, "Original run")-2, strpos($scraped_data, "External links") - strpos($scraped_data, "Original run")-2);
echo $original_run;
?>
Have you considered simply using the Wikipedia API? Autogenerated wiki markup is generally terrible to deal with and may change at any time.
Additionally, instead of trying to regex-parse HTML in PHP, you can use the phpQuery library (installable with Composer) and simply search for the selector table.infobox.vevent.
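For illustration, here is a minimal sketch of the API route. The action=parse / prop=text parameters are standard MediaWiki API; the curl() helper is the one defined above, and note that Wikipedia asks API clients to send a descriptive User-Agent header:

// Fetch the rendered page HTML through the MediaWiki API instead of scraping the site
$page   = 'The_Walking_Dead_(TV_series)';
$apiUrl = 'https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=text&page=' . urlencode($page);
$json = curl($apiUrl);               // reuse the curl() helper defined above
$data = json_decode($json, true);
$html = $data['parse']['text']['*']; // the rendered page HTML, including the infobox
// $html can now be loaded into DOMDocument or phpQuery and queried for the
// "Original run" row without depending on an "External links" section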

Display only certain HTML using PHP's DOMDocument

I am looking to render HTML from another webpage inside my website.
Take this scenario:
I have a website that checks the availability of a hotel. But instead of hosting that hotel's images on my server, I simply cURL a specific page on the hotel's website that contains their images.
Can I grab anything from that HTML and display it on my website, using their HTML code but only the div(s) or images that I want to display?
I'm using this code, sourced from:
http://davidwalsh.name/download-urls-content-php-curl
For practice and argument's sake, let's try to display Google's logo from their homepage.
function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$returned_content = get_data('http://www.google.com');
echo '<base href="http://www.google.com/" />';
echo $returned_content;
Thanks to @alex I have started to play with DOMDocument from PHP's standard library. However, I have hit a snag.
function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$url = "www.abc.net.au";
$html = get_data($url);
$dom = new DOMDocument;
#$dom->loadHTML($html);
$logo = $dom->getElementById("abcLogo");
var_dump($logo);
Returns: object(DOMElement)[2]
How do I parse this further, or simply print/echo the contents of the div with that id?
Yes, run the resulting HTML through something like DOMDocument to extract the portions you require.
Once you have found a DOM element, it can be a bit tricky to get the HTML of the element itself (rather than just its contents).
You can get the XML value of a single element very easily with DOMDocument::saveXML:
echo $dom->saveXML($logo);
This may be good enough for you. I believe there is a change coming that will add this functionality to saveHTML as well.
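(For what it's worth, this has since landed: as of PHP 5.3.6, DOMDocument::saveHTML() also accepts an optional node argument.)

echo $dom->saveHTML($logo); // HTML serialization of just the logo element (PHP 5.3.6+)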
echo $logo->nodeValue should also work, since there can only be one element with a given id. Note, though, that nodeValue returns the element's text content, not its HTML.
