Using readability parser API to fetch image url from web page - php

I found the Readability Parser API, which parses a web page and returns the information of interest.
However, I could not work out how to use it.
https://www.readability.com/developers/api/parser
Request: GET /api/content/v1/parser?url=http://blog.readability.com/2011/02/step-up-be-heard-readability-ideas/&token=1b830931777ac7c2ac954e9f0d67df437175e66e
It returns a JSON response.
I am working in PHP. How do I make the above request for a particular URL, such as http://google.com?
UPDATE1:
<?php
define('TOKEN', "1b830931777ac7c2ac954e9f0d67df437175e66e");
define('API_URL', "https://www.readability.com/api/content/v1/parser?url=%s&token=%s");
function get_image($url) {
// sanitize it so we don't break our api url
$encodedUrl = urlencode($url);
//$TOKEN = '1b830931777ac7c2ac954e9f0d67df437175e66e';
//Also tried with $API_URL = 'http://blog.readability.com/2011/02/step-up-be-heard-readability-ideas'; with no luck
// build our url
$url = sprintf(API_URL, $encodedUrl, TOKEN); //Also tried with $TOKEN
// call the api
$response = file_get_contents($url);
if( $response ) {
return false;
}
$json = json_decode($response);
if(!isset($json['lead_image_url'])) {
return false;
}
return $json['lead_image_url'];
}
echo get_image('http://nextbigwhat.com/');
?>
Is there any issue with the code? It gives this error:
Warning: file_get_contents(https://www.readability.com/api/content/v1/parser?url=http%3A%2F%2Fnextbigwhat.com&token=1b830931777ac7c2ac954e9f0d67df437175e66e): failed to open stream: HTTP request failed! HTTP/1.1 403 FORBIDDEN in F:\wamp\www\inviteold\test2.php on line 14

Perhaps something like this:
define('TOKEN', "your_token_here");
define('API_URL', "https://www.readability.com/api/content/v1/parser?url=%s&token=%s");
function get_image($url) {
// sanitize it so we don't break our api url
$encodedUrl = urlencode($url);
// build our url
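$url = sprintf(API_URL, $encodedUrl, TOKEN); // TOKEN is the constant defined above
// call the api
$response = file_get_contents($url);
if ($response === false) {
return false; // the request failed
}
// pass true to get an associative array rather than an object
$json = json_decode($response, true);
if(!isset($json['lead_image_url'])) {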
return false;
}
return $json['lead_image_url'];
}
The code is not tested so you might need to adjust it.
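Since the code above is untested, here is a minimal sketch of the decoding step alone, which can be checked without calling the API. The helper name extract_lead_image is hypothetical; the lead_image_url key is the one used in the answer. Note that json_decode returns an object unless its second argument is true, so array-style access only works on the associative form.

```php
<?php
// Hypothetical helper: pull lead_image_url out of a raw JSON response body.
function extract_lead_image($response) {
    if ($response === false || $response === '') {
        return false; // the HTTP request failed or returned nothing
    }
    // true => decode to an associative array so ['key'] access works
    $json = json_decode($response, true);
    if (!is_array($json) || !isset($json['lead_image_url'])) {
        return false; // invalid JSON or no lead image in the response
    }
    return $json['lead_image_url'];
}
```

It could be called as extract_lead_image(file_get_contents($url)), keeping the network call and the parsing separate so each can be debugged on its own.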

Related

How to create a script to check the website is using WordPress?

I'm trying to create a simple script that'll let me know if a website is based off WordPress.
The idea is to check whether I'm getting a 404 from a URL when trying to access its wp-admin like so:
https://www.audi.co.il/wp-admin (which returns "true" because it exists)
When I try to input a URL that does not exist, like "https://www.audi.co.il/wp-blablabla", PHP still returns "true", even though Chrome, when pasting this link to its address bar returns 404 on the network tab.
Why is it so and how can it be fixed?
This is the code (based on another user's answer):
<?php
$file = 'https://www.audi.co.il/wp-blabla';
$file_headers = @get_headers($file);
if(!$file_headers || strpos($file_headers[0], '404 Not Found')) {
$exists = "false";
}
else {
$exists = "true";
}
echo $exists;
You can try to request the wp-admin page; if it is not there, there's a good chance the site is not WordPress.
function isWordPress($url)
{
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER , 1 );
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
// execute the request; the body is returned rather than printed because CURLOPT_RETURNTRANSFER is set
curl_exec($ch);
$httpStatus = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
// close cURL resource, and free up system resources
curl_close($ch);
if ( $httpStatus == 200 ) {
return true;
}
return false;
}
if ( isWordPress("http://www.example.com/wp-admin") ) {
// This is WordPress
} else {
// Not WordPress
}
This may not be one hundred percent accurate as some WordPress installations protect the wp-admin URL.
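One possible reason the get_headers() check in the question reports "true" is that get_headers() follows redirects by default and returns the headers of every response in one array, so element 0 may be the status line of a redirect rather than of the final page. The sketch below, with the hypothetical helper name final_status_code, reads the last status line instead.

```php
<?php
// Hypothetical helper: return the status code of the LAST response found in a
// get_headers() result, or null if no status line is present.
function final_status_code($headers) {
    if (!is_array($headers)) {
        return null; // get_headers() returns false on failure
    }
    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found"
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1]; // keep overwriting so the last one wins
        }
    }
    return $code;
}
```

With this, the existence check could become final_status_code(@get_headers($file)) === 200, which avoids matching on the exact "404 Not Found" text.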
I'm probably late to the party, but another way to detect a WordPress site is to request /wp-json. If you're using Guzzle in PHP, you can do this:
function isWorpress($url) {
try {
$http = new \GuzzleHttp\Client();
$response = $http->get(rtrim($url, "/")."/wp-json");
$contents = json_decode($response->getBody()->getContents());
if($contents) {
return true;
}
} catch (\Exception $exception) {
//...
}
return false;
}

Get incoming webhook of woocommerce

I'm trying to receive a JSON of a woocommerce webhook in a tracker.php to manipulate the content, but something is wrong because it doesn't save anything in $_SESSION. This is my code ....
(!isset($_SESSION))? session_start() : null;
if($json = json_decode(file_get_contents("php://input"), true)) {
$data = json_decode($json, true);
$_SESSION["json"] = $data;
} else {
var_dump($_SESSION["json"]);
}
I tested the webhook with http://requestbin.fullcontact.com/ and received the content.
The issue is in this line:
$data = json_decode($json, true);
Here $json is already an array (it was decoded in the if condition), but json_decode expects a string.
Here is code that will work:
(!isset($_SESSION))? session_start() : null;
if($json = json_decode(file_get_contents("php://input"), true)) {
//this section will execute if you post data.
$_SESSION["json"] = $json;
} else {
//this will execute if you do not post data
var_dump($_SESSION["json"]);
}
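To make the "decode exactly once" point concrete, here is a small sketch that wraps the decoding in one function and rejects anything that is not a JSON object or array. The name decode_payload is hypothetical and the sample payload fields are invented for illustration.

```php
<?php
// Hypothetical helper: decode a raw request body exactly once.
// Returns an associative array, or null if the body is not a JSON object/array.
function decode_payload($raw) {
    $data = json_decode($raw, true);
    if (json_last_error() !== JSON_ERROR_NONE || !is_array($data)) {
        return null; // empty body, malformed JSON, or a bare scalar
    }
    return $data;
}
```

In the tracker script this would be used as $payload = decode_payload(file_get_contents("php://input")); followed by a null check before storing the result.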

PHP5 cURL - When attempting to scrape a page, it loads a blank page

I'm trying to scrape some recipes off a page to use as samples for a school project, but the page just keeps loading a blank page.
I'm following this tutorial - here
This is my code:
<?php
function curl($url) {
$ch = curl_init(); // Initialising cURL
curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
curl_close($ch); // Closing cURL
return $data; // Returning the data from the function
}
function scrape_between($data, $start, $end){
$data = stristr($data, $start); // Stripping all data from before $start
$data = substr($data, strlen($start)); // Stripping $start
$stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
$data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
return $data; // Returning the scraped data from the function
}
$continue = true;
$url = curl("https://www.justapinch.com/recipes/main-course/");
while ($continue == true) {
$results_page = curl($url);
$results_page = scrape_between($results_page,"<div id=\"grid-normal\">","<div id=\"rightside-content\"");
$separate_results = explode("<h3 class=\"tight-margin\"",$results_page);
foreach ($separate_results as $separate_result) {
if ($separate_result != "") {
$results_urls[] = "https://www.justapinch.com" . scrape_between($separate_result,"href=\"","\" class=\"");
}
}
// Commented out to test code above
// if (strpos($results_page,"Next Page")) {
// $continue = true;
// $url = scrape_between($results_page,"<nav><div class=\"col-xs-7\">","</div><nav>");
// if (strpos($url,"Back</a>")) {
// $url = scrape_between($url,"Back</a>",">Next Page");
// }
// $url = "https://www.justapinch.com" . scrape_between($url, "href=\"", "\"");
// } else {
// $continue = false;
// }
// sleep(rand(3,5));
print_r($results_urls);
}
?>
I'm using cloud9 and I've installed php5 cURL, and am running apache2. I would appreciate any help.
This is where the problem lies:
$results_page = curl($url);
You are not fetching a URL here; you are passing an entire HTML page to curl(). Right before the while loop, $url is set to the result of curl(...), which is the page content rather than its address. I think you should do the following:
$results_page = curl("https://www.justapinch.com/recipes/main-course/");
edit:
You should also switch to parsing the HTML with DOM instead of string matching.
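As a rough sketch of the DOM approach, the function below uses DOMDocument and DOMXPath to collect the links. The helper name extract_recipe_links is hypothetical, and the markup it expects (a relative link inside each h3 with class tight-margin) is an assumption inferred from the strings the question's code searches for, so the XPath may need adjusting against the real page.

```php
<?php
// Hypothetical helper: collect recipe links from a results page using DOM/XPath.
// Assumes each result is an <a href="/relative/path"> inside <h3 class="tight-margin">.
function extract_recipe_links($html, $baseUrl) {
    $links = [];
    if (!is_string($html) || trim($html) === '') {
        return $links; // nothing to parse (e.g. the request failed)
    }
    $previous = libxml_use_internal_errors(true); // silence warnings on messy HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match h3 elements whose class list contains the token "tight-margin"
    $query = '//h3[contains(concat(" ", normalize-space(@class), " "), " tight-margin ")]//a[@href]';
    foreach ($xpath->query($query) as $anchor) {
        // Join base URL and relative href with exactly one slash
        $links[] = rtrim($baseUrl, '/') . '/' . ltrim($anchor->getAttribute('href'), '/');
    }
    return $links;
}
```

It would be called with the page content and the site root, for example extract_recipe_links(curl("https://www.justapinch.com/recipes/main-course/"), "https://www.justapinch.com").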
Why do people do this? The code has no error checking at all, and then the question is why code that ignores every error is not working. I don't know, but at least add some error checking and run it before asking. It's not just you; lots of people do it, and it's annoying. curl_setopt returns bool(false) if there's an error setting the option. curl_exec returns bool(false) if there was an error in the transfer. curl_init returns bool(false) if there was an error creating the curl handle. Extract the error description with curl_error and report it with \RuntimeException. Add some error checking, and if that does not reveal the problem, or it does but you're not sure how to fix it, then ask about that.
Here are some error-checking function wrappers to get you started:
function ecurl_setopt ( /*resource*/$ch , int $option , /*mixed*/ $value ):bool{
$ret=curl_setopt($ch,$option,$value);
if($ret!==true){
//option should be obvious by stack trace
throw new RuntimeException ( 'curl_setopt() failed. curl_errno: ' . return_var_dump ( curl_errno ($ch) ).'. curl_error: '.curl_error($ch) );
}
return true;
}
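function ecurl_exec ( /*resource*/$ch){
$ret=curl_exec($ch);
// curl_exec returns false on failure; on success it returns true, or the response body when CURLOPT_RETURNTRANSFER is set
if($ret===false){
throw new RuntimeException ( 'curl_exec() failed. curl_errno: ' . return_var_dump ( curl_errno ($ch) ).'. curl_error: '.curl_error($ch) );
}
return $ret;
}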
function return_var_dump(/*...*/){
$args = func_get_args ();
ob_start ();
call_user_func_array ( 'var_dump', $args );
return ob_get_clean ();
}

File_get_content failed to open the stream

I have checked previous related threads, but my problem concerns the Readability API specifically.
I am trying to get the most relevant image from a web page. I am using https://www.readability.com/developers/api/parser for that.
Here is my code:
<?php
define('TOKEN', "1b830931777ac7c2ac954e9f0d67df437175e66e");
define('API_URL', "https://www.readability.com/api/content/v1/parser?url=%s&token=%s");
function get_image($url) {
// sanitize it so we don't break our api url
$encodedUrl = urlencode($url);
$TOKEN = '1b830931777ac7c2ac954e9f0d67df437175e66e';
$API_URL = 'https://www.readability.com/api/content/v1/parser?url=%s&token=%s';
// $API_URL = 'http://blog.readability.com/2011/02/step-up-be-heard-readability-ideas';
// build our url
$url = sprintf($API_URL, $encodedUrl, $TOKEN);
// call the api
$response = file_get_contents($url);
if( $response ) {
return false;
}
$json = json_decode($response);
if(!isset($json['lead_image_url'])) {
return false;
}
return $json['lead_image_url'];
}
echo get_image('https://www.facebook.com/');
?>
Error:
Warning: file_get_contents(https://www.readability.com/api/content/v1/parser?url=https%3A%2F%2Fwww.facebook.com%2F&token=1b830931777ac7c2ac954e9f0d67df437175e66e): failed to open stream: HTTP request failed! HTTP/1.1 403 FORBIDDEN in F:\wamp\www\inviteold\test2.php on line 16
You seem to be requesting content from Facebook.
Given the 403 FORBIDDEN error, I would say that your parser/code is not authorised to access the Facebook page you are requesting, so the request is being denied due to privacy settings.

file_get_contents HTTP request failed! HTTP/1.1 501 Method Not Implemented

I am trying to grab some information from a JSON-encoded page, but I'm receiving the HTTP/1.1 501 Method Not Implemented error when assigning the value to the variable $data:
if(isset($_GET['xblaccount'])) {
$gamertag = htmlspecialchars(strip_tags($_GET['xblaccount']));
$url = 'http://www.xboxleaders.com/api/profile.json?gamertag=' . urlencode($gamertag);
$data = file_get_contents($url);
}
I've also tried
$gamertag = 'kfj32j8fj23f';
And receive the same error
This is an interesting question because the code does work if you use an actual account name (e.g. Stallionh83 or Major Nelson).
Having discovered this I then checked the headers in a browser (FF) which also returned the 501 response header. I assume therefore that this is then a server side response for non-accounts.
Bearing that in mind I suggest editing your code to:
if(isset($_GET['xblaccount'])) {
$gamertag = htmlspecialchars(strip_tags($_GET['xblaccount']));
$url = 'http://www.xboxleaders.com/api/profile.json?gamertag=' . urlencode($gamertag);
$data = @file_get_contents($url); // @ to suppress the warning
if($data){
//Account exists...
}
else{
//Account doesn't exist...
}
}
$data will be FALSE if the account doesn't exist; otherwise it will contain the JSON string.
