I have a web service like this:
http://www.sample.com/api/v2/exchanges/Web/stocks/Stock/lastN=200
that returns JSON.
The main problem is that you have to log in to http://sample.com to see this JSON; otherwise you get a 403 Forbidden error. This website uses Google authentication for login. Can I use a browser cookie with cURL to get this JSON?
This is the code I found, but it didn't work for me:
function get_content($url, $ref)
{
    $browser = $_SERVER['HTTP_USER_AGENT'];
    $ch = curl_init();

    // Build browser-like request headers
    $header[0]  = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // browsers keep this blank.

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header); // actually send the headers built above
    curl_setopt($ch, CURLOPT_USERAGENT, $browser);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_REFERER, $ref);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, false);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}
"This website uses Google authentication for login"

Then log in to Gmail first; Gmail will give you a special cookie which should allow you to fetch the JSON as an authenticated user.

"Can I use a browser cookie with cURL to get this JSON?"

Yup. You can get the cookie by logging in with your browser and checking document.cookie in the JavaScript console, but that sounds like a very inconvenient solution.
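If you do go the copy-the-cookie route, here is a minimal sketch of feeding it to cURL. The cookie names below are placeholders, not the site's real ones, and note that document.cookie won't show HttpOnly cookies, so you may need the browser devtools network tab instead:

<?php
// Hypothetical cookie string copied from the browser; names/values are placeholders.
$cookie = 'session=PASTE_VALUE_FROM_BROWSER; other=PASTE_OTHER_VALUE';

$ch = curl_init('http://www.sample.com/api/v2/exchanges/Web/stocks/Stock/lastN=200');
curl_setopt($ch, CURLOPT_COOKIE, $cookie);      // send the copied cookies with the request
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$json = curl_exec($ch);
curl_close($ch);

$data = json_decode($json, true); // decode the JSON into a PHP array
var_dump($data);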
"This is the code I found, but it didn't work for me"

That code does not attempt to authenticate at all. For an example of Google authentication via Gmail, check:
https://gist.github.com/divinity76/544d7cadd3e88e057ea3504cb8b3bf7e
Related
I know there are many questions like this, but I couldn't find any that touched on the specifics of my case. Either that, or they went unanswered.
I've written a pretty simple PHP script, uploaded it to a WordPress site in a zip folder, and when I try to activate the plugin, WordPress gives me a message reading: "Plugin could not be activated because it triggered a fatal error." It does not actually give me any error message. I have WP_DEBUG, WP_DEBUG_LOG, and WP_DEBUG_DISPLAY all set to true, but none of them are updated on the supposed error. It seems I have no way of finding out what the fatal error actually is.
I'm kind of at a loss as to how to proceed with this problem. Any help would be useful.
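For reference, the debug settings in question are the standard constants in wp-config.php:

// In wp-config.php, above the "That's all, stop editing!" line.
define( 'WP_DEBUG', true );         // enable debug mode
define( 'WP_DEBUG_LOG', true );     // log errors to wp-content/debug.log
define( 'WP_DEBUG_DISPLAY', true ); // also print errors to the page

And here is the plugin itself: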
<?php
/*
Plugin Name: Denrile's Plogger
Plugin URI: http://my-awesomeness-emporium.com
Description: >- a plugin that takes the user to the Pruvan website,
after using cURL to log them in so that the redirect doesn't hit a user authentication wall.
Version: 1.0
Author: John Mauran
Author URI: http://github.com/jmauran91
License: GPL2
*/

$j_username = "Denrile";
$j_password = "*************";
$login_url = "https://titlereporter.direct.pruvan.com/v2/login";
$last_url  = "https://titlereporter.direct.pruvan.com/v2/pmgr";

function loginToJulian($url, $username, $password){
    $curl = curl_init();
    $header[0] = "Accept: application/json, text/javascript, */*; q=0.01";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Content-Type: application/x-www-form-urlencoded";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";

    // Make cURL's verbose output visible in a separate file
    $verbose = fopen(dirname(__FILE__).'/errorlog.txt', 'w');
    curl_setopt($curl, CURLOPT_VERBOSE, true);
    curl_setopt($curl, CURLOPT_STDERR, $verbose);

    // The endpoint expects a form field named "payload" holding a JSON string
    $payload_username = '"'.$username.'"';
    $payload_password = '"'.$password.'"';
    $payloadtext = urlencode('{"username":'.$payload_username.',"password":'.$payload_password.'}');
    $payload = "payload=".$payloadtext;

    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, 1);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $payload);
    curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookie.txt'); // persist the session cookie
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $store = curl_exec($curl);
    curl_close($curl);
}

if(isset($_GET['prvn_login'])){
    loginToJulian($login_url, $j_username, $j_password);
    header("Location: https://titlereporter.direct.pruvan.com/v2/pmgr");
    exit();
}
else{
    exit();
}
?>
The general idea of this plugin is that it hooks into a JavaScript-generated A-tag on the WordPress site, does a cURL POST to another site to log in, and then redirects to that site, hopefully bypassing the user authentication wall since the user will already be logged in thanks to the cURL request.
This code works for me.
One question, though: why did you add exit(); inside the else condition? It breaks the plugin activation process.
Please check and let me know.
<?php
/*
Plugin Name: Denrile's Plogger
Plugin URI: http://my-awesomeness-emporium.com
Description: >- a plugin that takes the user to the Pruvan website,
after using cURL to log them in so that the redirect doesn't hit a user authentication wall.
Version: 1.0
Author: John Mauran
Author URI: http://github.com/jmauran91
License: GPL2
*/

$j_username = "Denrile";
$j_password = "*************";
$login_url = "https://titlereporter.direct.pruvan.com/v2/login";
$last_url  = "https://titlereporter.direct.pruvan.com/v2/pmgr";

function loginToJulian($url, $username, $password){
    $curl = curl_init();
    $header[0] = "Accept: application/json, text/javascript, */*; q=0.01";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Content-Type: application/x-www-form-urlencoded";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";

    // Make cURL's verbose output visible in a separate file
    $verbose = fopen(dirname(__FILE__).'/errorlog.txt', 'w');
    curl_setopt($curl, CURLOPT_VERBOSE, true);
    curl_setopt($curl, CURLOPT_STDERR, $verbose);

    // The endpoint expects a form field named "payload" holding a JSON string
    $payload_username = '"'.$username.'"';
    $payload_password = '"'.$password.'"';
    $payloadtext = urlencode('{"username":'.$payload_username.',"password":'.$payload_password.'}');
    $payload = "payload=".$payloadtext;

    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, 1);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $payload);
    curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookie.txt'); // persist the session cookie
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $store = curl_exec($curl);
    curl_close($curl);
}

function default_wordpress_hook(){
    global $login_url, $j_username, $j_password; // top-level variables are not in function scope by default

    if(isset($_GET['prvn_login'])){
        loginToJulian($login_url, $j_username, $j_password);
        header("Location: https://titlereporter.direct.pruvan.com/v2/pmgr");
        exit();
    }
    // No else branch: returning normally lets WordPress (and plugin activation) continue.
}
add_action("init", "default_wordpress_hook");
?>
I'm trying to test server response for a few client websites: using cURL to retrieve only the header, wrapped in a microtime() call to measure total execution time (the full server round trip), plus the HTTP status code, so that my clients and I can be made aware of any issues.
I need to call cURL by server IP, with the host defined in a header, because I want to be 100% sure to eliminate DNS server downtime as a factor. (I'm using another script to make sure my DNS copies are up to date, so that's not an issue.)
I'm using the following code, which works on 90% of servers, but the odd few reject the request with 400 and 404 codes despite being accessible in a browser.
// Setup headers
$header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Host: $this->url"; // NB: Host expects a bare hostname like "www.example.com"; a full URL here makes strict servers answer 400
$header[] = "Keep-Alive: 300";
$header[] = "Pragma: "; // browsers keep this blank.

$starttime = microtime(true);
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "http://{$this->ip}/");
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl, CURLOPT_USERAGENT, "MyMonitor/UpCheck");
// curl_setopt($curl, CURLOPT_REFERER, 'http://www.mysite.com/');
curl_setopt($curl, CURLOPT_HTTPGET, true);
curl_setopt($curl, CURLOPT_NOBODY, true); // NOBODY turns this into a HEAD request; some servers answer HEAD with 400/404 even when GET works
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($curl, CURLOPT_TIMEOUT, $this->timeout); // timeout in seconds
$this->header = curl_exec($curl);
$this->statuscode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
The code is wrapped in an object, and all relevant variables are correctly passed, trimmed, and sanitised. Because I need to call the server by IP, the IP is passed as CURLOPT_URL, with the URL passed in the Host header. I've tried setting the referer, but this didn't help.
Thanks,
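Worth noting: newer cURL builds can pin a hostname to an IP with CURLOPT_RESOLVE, which keeps the URL and Host header normal while still bypassing DNS. A minimal sketch, assuming PHP 5.5+ / curl 7.21.3+, with www.example.com as a placeholder hostname:

$curl = curl_init();
// Pin www.example.com:80 to the monitored IP for this handle only (no DNS lookup).
curl_setopt($curl, CURLOPT_RESOLVE, array("www.example.com:80:{$this->ip}"));
curl_setopt($curl, CURLOPT_URL, "http://www.example.com/"); // normal URL; cURL derives the Host header from it
curl_setopt($curl, CURLOPT_NOBODY, true);                   // headers only (HEAD request)
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_TIMEOUT, $this->timeout);
curl_exec($curl);
$statuscode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);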
If all you need is the first line of the header, then cURL is overkill. Using socket functions, you can close the connection immediately after receiving the first line with the status code:
// Open a plain TCP connection straight to the IP (no DNS lookup involved).
$conn = fsockopen($this->ip, 80, $errno, $errstr, $this->timeout);
$nl = "\r\n";
fwrite($conn, 'GET / HTTP/1.1'.$nl);
foreach ($header as $h) {      // reuse the headers built earlier, including Host
    fwrite($conn, $h.$nl);
}
fwrite($conn, $nl);            // a blank line terminates the request headers
$statusLine = fgets($conn);    // e.g. "HTTP/1.1 200 OK"
fclose($conn);
$status = substr($statusLine, 9, 3); // the three-digit status code
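And since the original goal included timing the round trip, the microtime() wrapper from the cURL version applies unchanged; a sketch:

$starttime = microtime(true);
// ... open the socket, send the request, read the status line, fclose() — as above ...
$elapsed = microtime(true) - $starttime; // full round trip in seconds, as a float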
I'm using the following script to pull the latest post from my Facebook page.
It does this as expected; however, if the Facebook post contains a hyperlink, the link becomes garbled and no longer works. Try it out if you can using my code (making sure cURL is installed).
<?php
$url = "http://www.facebook.com/feeds/page.php?id=466171083413035&format=json";

// disguises the curl request using fake headers and a fake user agent.
function disguise_curl($url)
{
    $curl = curl_init();

    // Setup headers - the same headers from Firefox version 2.0.0.6
    // (split up because the line was too long).
    $header[0]  = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // browsers keep this blank.

    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_REFERER, '');
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);

    $html = curl_exec($curl); // execute the curl command
    curl_close($curl);        // close the connection
    return $html;
}

// use the function and display the latest entry from the feed
$text = disguise_curl($url);
$json_feed_object = json_decode($text);

$i = 0;
foreach ($json_feed_object->entries as $entry)
{
    echo "<h2>{$entry->title}</h2>";
    $published = date("g:i A F j, Y", strtotime($entry->published));
    echo "<small>{$published}</small>";
    $content = preg_replace("/<img[^>]+\>/i", "", $entry->content); // strip images
    echo "<p style='word-wrap:break-word;'>{$content}</p>";
    echo "<hr />";
    $i++;
    if ($i == 1) { break; } // only the latest post
}
?>
EDIT
My hyperlink appears as:
http://www.empireonline.com/news/story.asp?NID=36903<br/><br/>
Has anyone ever come across this issue before? Is there a solution?
Many thanks for any pointers.
My bad.
All I needed to do was a string replace on the URLs, prepending facebook.com.
Here is my code in case it helps anyone else:
$content = str_replace(' href="/l.php', ' href="http://www.facebook.com/l.php',$content);
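A slightly more general sketch of the same idea, assuming every garbled link in the feed is root-relative, rewrites any href that starts with a slash:

// Hypothetical generalisation: prefix every root-relative href with the Facebook origin.
$content = str_replace('href="/', 'href="http://www.facebook.com/', $content);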
I'm trying to make a kind of page parser (more specifically, something that highlights certain words on pages) and I've run into some problems. I'm fetching whole pages from URLs using cURL, and most pages cooperate nicely, while others don't.
My goal is to get all of the page HTML exactly as a browser gets it, and anonymously, just as a browser would. I mean: if a page requires logging in before it shows data to a browser, it doesn't interest me. The problem is that I can't fetch Twitter or Facebook pages that I can reach anonymously from a regular browser, even when I set all the headers exactly as Firefox or Chrome would normally send them.
Is there any way to simply emulate a browser to fetch pages from these sites, or do I have to use OAuth (and can someone explain why browsers don't need to use it)?
EDIT
I got the solution! If anybody else runs into this, you should:
-> try switching the protocol from https to http
-> get rid of the /#!/ element if there is one in the URL
-> in my case the "Accept-Encoding: gzip, deflate" header was also causing problems; I don't know why, but without it everything is OK
My code:
// Force plain http and strip the hash-bang element from the URL.
if (substr($this->url, 0, 5) == 'https') {
    $this->url = str_replace('https://', 'http://', $this->url);
}
$this->url = str_replace('/#!/', '/', $this->url);

// Check that a valid URL is provided.
if (!filter_var($this->url, FILTER_VALIDATE_URL)) {
    return false;
}

$curl = curl_init();
$header = array();
$header[0]  = "Accept: text/xml,application/xml,application/xhtml+xml,";
$header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
// Sending "Accept-Encoding: gzip, deflate" here caused errors, so it is left out.
$header[] = "Accept-Language: pl,en-us;q=0.7,en;q=0.3";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: 300";
$header[] = "Pragma: "; // browsers keep this blank.

curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_URL, $this->url);
curl_setopt($curl, CURLOPT_COOKIEJAR, "cookie.txt");  // write cookies here
curl_setopt($curl, CURLOPT_COOKIEFILE, "cookie.txt"); // and read them back
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl, CURLOPT_COOKIESESSION, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; pl; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 (.NET CLR 3.5.30729)');
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);

$response = curl_exec($curl);
curl_close($curl);

if ($response) return $response;
return false;
All of this was inside a class, but you can extract the code easily. For me it fetches both Twitter and Facebook nicely.
Yes, it is possible to emulate a browser, but you need to carefully watch all the HTTP headers (including cookies) that the browser sends, and handle redirects as well. Some of this can be automated by cURL options; the rest you'll need to handle manually.
Note: I'm not talking about HTML headers in markup; these are the HTTP headers sent and received by browsers.
The easiest way to spot these is to use Fiddler to monitor the traffic. Choose a URL, look at the inspectors on the right, and you'll see the headers that get sent and the headers that are received.
Facebook makes this more complicated with a myriad of iframes, so I suggest you start with a simpler website!
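On the cURL side of things, a minimal sketch for dumping exactly which headers cURL sent and received (standard PHP cURL options; the URL is a placeholder):

$curl = curl_init('http://example.com/');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects like a browser
curl_setopt($curl, CURLINFO_HEADER_OUT, true);    // record the outgoing request headers
curl_setopt($curl, CURLOPT_HEADER, true);         // include response headers in the output
$body = curl_exec($curl);

echo "Sent:\n" . curl_getinfo($curl, CURLINFO_HEADER_OUT);
// The first header block ends at the first blank line (with redirects there may be several blocks).
echo "Received:\n" . substr($body, 0, strpos($body, "\r\n\r\n"));
curl_close($curl);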
I'm using simple-html-dom to scrape the title off of a specified site.
<?php
include('simple_html_dom.php');
$html = file_get_html('http://www.pottermore.com/');
foreach($html->find('title') as $element)
echo $element->innertext . '<br>';
?>
Any other site I've tried works; apple.com, for example.
But if I input pottermore.com, it doesn't output anything. Pottermore has Flash elements on it, but the home page I'm trying to scrape the title from has no Flash, just HTML.
This works for me :)
define('COOKIE', dirname(__FILE__).'/cookie.txt'); // cookie jar path (this constant was left undefined before)

$url = 'http://www.pottermore.com/';
$html = get_html($url);
file_put_contents('page.htm', $html); // just to test what you have downloaded
echo 'The title from: '.$url.' is: '.get_snip($html, '<title>', '</title>');

function get_html($url)
{
    $ch = curl_init();
    $header[0]  = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // browsers keep this blank.

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header); // actually send the headers built above
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows;U;Windows NT 5.0;en-US;rv:1.4) Gecko/20030624 Netscape/7.1 (ax)');
    curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, COOKIE);
    curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIE);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
// Return the substring of $string between the markers $start and $end.
// $trim_start non-empty (the default): exclude the $start marker from the result.
// $trim_end empty: include the $end marker in the result.
function get_snip($string, $start, $end, $trim_start = '1', $trim_end = '1')
{
    $startpos = strpos($string, $start);
    $endpos = strpos($string, $end, $startpos);
    if ($trim_start != '') {
        $startpos += strlen($start);
    }
    if ($trim_end == '') {
        $endpos += strlen($end);
    }
    return substr($string, $startpos, $endpos - $startpos);
}
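As an aside, a sketch of a sturdier way to pull the title than string slicing, using PHP's built-in DOMDocument:

// Parse the fetched HTML and read the <title> element directly.
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings on sloppy real-world HTML
$node = $doc->getElementsByTagName('title')->item(0);
$title = $node ? $node->textContent : ''; // empty string if no <title> found
echo 'The title from: '.$url.' is: '.$title;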
Just to confirm what others are saying: if you don't send a User-Agent string, this site responds with 403 Forbidden.
Adding this worked for me:
User-Agent: Mozilla/5.0 (Windows;U;Windows NT 5.0;en-US;rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
The function file_get_html() uses file_get_contents() under the covers. That function can pull data from a URL, and when it does, it sends a User-Agent string.
By default, this string is empty. Some web servers use this fact to detect that a non-browser is accessing their data, and opt to forbid it.
You can set user_agent in php.ini to control the User-Agent string that gets sent. Or you could try:
ini_set('user_agent', 'UA-String');
with 'UA-String' set to whatever you like.
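Alternatively, a per-request sketch using a stream context, so nothing global changes:

// Send a browser-like User-Agent for this one request only.
$context = stream_context_create([
    'http' => [
        'header' => "User-Agent: Mozilla/5.0 (Windows;U;Windows NT 5.0;en-US;rv:1.4) Gecko/20030624 Netscape/7.1 (ax)\r\n",
    ],
]);
$html = file_get_contents('http://www.pottermore.com/', false, $context);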