Hi, I am using the following API to get data from MediaWiki. When I copy this URL and paste it into a browser, an XML response appears.
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API|Main_Page&rvprop=timestamp|user|comment|content
But when I try the same request with cURL, I get the error "Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice."
I am using the following code for this. Can anyone spot my error?
$url='http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API|Main_Page&rvprop=timestamp|user|comment|content';
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
//curl_setopt($curl, CURLOPT_TIMEOUT, 1);
$objResponse = curl_exec($curl);
curl_close($curl);
echo $objResponse;die;
This will get past their Referer and User-Agent checks:
<?php
function getwiki($url="", $referer="", $userAgent="") {
if($url==""||$referer==""||$userAgent=="") { return false;};
$headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
$headers[] = 'Connection: Keep-Alive';
$headers[] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8';
$user_agent = $userAgent;
$process = curl_init($url);
curl_setopt($process, CURLOPT_HTTPHEADER, $headers);
curl_setopt($process, CURLOPT_HEADER, 0);
curl_setopt($process, CURLOPT_USERAGENT, $user_agent);
curl_setopt($process, CURLOPT_REFERER, $referer);
curl_setopt($process, CURLOPT_TIMEOUT, 30);
curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
$return = curl_exec($process);
curl_close($process);
return $return;
}
//edited to include Adam Backstrom's sound advice
echo getwiki('http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API|Main_Page&rvprop=timestamp|user|comment|content', 'http://en.wikipedia.org/', 'Mozilla/5.0 (compatible; YourCoolBot/1.0; +http://yoursite.com/botinfo)');
?>
From the MediaWiki API:Quick start guide:
Pass a User-Agent header that properly identifies your client: don't use the default User-Agent from your client library, but use a custom one including the name of your client and the version number, something like MyCuteBot/0.1.
On Wikimedia wikis, failing to supply a User-Agent header or supplying an empty or generic one will cause the request to fail with an HTTP 403 error. See meta:User-Agent policy. Other MediaWiki wikis may have similar policies.
From meta:User-Agent policy:
If you run a bot, please send a User-Agent header identifying the bot and supplying some way of contacting you, e.g.: User-Agent: MyCoolTool (+http://example.com/MyCoolToolPage/)
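As a rough sketch of the policy quoted above, a User-Agent string can be assembled from a client name, a version, and a contact URL, then handed to cURL through CURLOPT_USERAGENT. The helper names buildUserAgent and fetchWithUserAgent are hypothetical, and the name, version, and contact URL are just the placeholder values from the quoted docs.

```php
<?php
// Hypothetical helper: builds "Name/Version (+contact)" in the style of
// the examples quoted from the MediaWiki documentation above.
function buildUserAgent(string $name, string $version, string $contactUrl): string
{
    return sprintf('%s/%s (+%s)', $name, $version, $contactUrl);
}

// Hypothetical helper: fetches $url with the given User-Agent.
// Returns the response body, or false if the transfer failed.
function fetchWithUserAgent(string $url, string $userAgent)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Placeholder values taken from the quoted examples.
$userAgent = buildUserAgent('MyCoolTool', '0.1', 'http://example.com/MyCoolToolPage/');
echo $userAgent, "\n";
```

Calling fetchWithUserAgent($url, $userAgent) with the API URL from the question would then send the request with that identifying header instead of the library default.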
Related
I'm using the below code to upload an MP4 file to a web service, using PHP cURL.
I've specified the 'Content-Type' as 'video/mp4', in CURLOPT_HTTPHEADER.
Unfortunately, having uploaded the file, the 'Content-Type' stored for it in the service displays as: "content_type":"video/mp4; boundary=----WebKitFormBoundaryfjNZ5VkJS8z3CB9X"
As you can see, the 'boundary' has been inserted into the 'content_type'.
When I then download the file, it fails to play, with a 'file unsupported/file extension incorrect/file corrupt' message.
$authorization = "Authorization: Bearer [token]";
$args['file'] = curl_file_create('C:\example\example.mp4','video/mp4','example');
$url='[example web service URL]';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: multipart/form-data', 'Accept: application/vnd.mendeley-content-ticket.1+json', $authorization));
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS , $args);
$response = curl_exec($ch); // URL encoded output - needs to be URL encoded to get the HREF link header
curl_close($ch);
Would be extremely grateful for any help, advice or pointers!
Maybe the API doesn't expect a multipart POST, but the actual file contents in the request body itself:
Ref: How to POST a large amount of data within PHP curl without memory overhead?
You need to use PUT method for the actual contents of the file to go inside the body - if you use POST, it will try to send as a form.
$authorization = "Authorization: Bearer [token]";
$file = 'C:\example\example.mp4';
$infile = fopen($file, 'r');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://api.mendeley.com/file_contents");
curl_setopt($ch, CURLOPT_PUT, 1 ); // needed for file upload
curl_setopt($ch, CURLOPT_INFILESIZE, filesize($file));
curl_setopt($ch, CURLOPT_INFILE, $infile);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST' );
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: video/mp4', 'Accept: application/vnd.mendeley-content-ticket.1+json', $authorization));
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result=curl_exec ($ch);
I have the same problem calling the OpenAI API from PHP cURL.
I send the Content-Type and Authorization (with my API key) headers, but when I send the request, I receive this error:
Invalid Content-Type header (application/json; boundary=------------------------66a0b850cd1421c8), expected application/json
I tried to use some of your options with no success.
I cannot remove the boundary parameter added automatically.
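One common cause of this symptom, offered as a guess because the request code is not shown: when CURLOPT_POSTFIELDS is given a PHP array, cURL sends the body as multipart/form-data and appends a boundary to the Content-Type header. Passing an already-encoded JSON string avoids that. A minimal sketch follows; the helper name jsonPostOptions, the URL, the payload, and the key are all placeholders, not part of any real API.

```php
<?php
// Hypothetical helper: returns cURL options for a JSON POST.
function jsonPostOptions(string $url, array $payload, string $apiKey): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        // A string body is sent as-is; an array would be sent as multipart/form-data.
        CURLOPT_POSTFIELDS     => json_encode($payload),
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . $apiKey,
        ],
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Placeholder URL, payload and key, for illustration only.
$options = jsonPostOptions('https://api.example.com/v1/endpoint', ['prompt' => 'Hello'], 'YOUR_API_KEY');

$ch = curl_init();
curl_setopt_array($ch, $options);
// $response = curl_exec($ch); // uncomment to actually send the request

echo $options[CURLOPT_POSTFIELDS], "\n";
```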
I'm trying to use Application Only Authentication, as described here:
https://developer.twitter.com/en/docs/basics/authentication/overview/application-only
I'm using the following PHP code to do so.
if(empty($_COOKIE['twitter_auth'])) {
require '../../social_audit_config/twitter_config.php';
$encoded_key = urlencode($api_key);
$encoded_secret = urlencode($api_secret);
$credentials = $encoded_key.":".$encoded_secret;
$encoded_credentials = base64_encode($credentials);
$request_headers = array(
'Host: api.twitter.com',
'User-Agent: BF Sharing Report',
'Authorization: Basic '.$encoded_credentials,
'Content-Type: application/x-www-form-urlencoded;charset=UTF-8',
'Content-Length: 29',
'Accept-Encoding: gzip'
);
print_r($request_headers);
$ch = curl_init();
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_HTTPHEADER, $request_headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_URL, 'https://api.twitter.com/oauth2/token');
curl_setopt($ch, CURLOPT_POSTFIELDS, 'grant_type=client_credentials');
$attempt_auth = curl_exec($ch);
print_r($attempt_auth);
}
It should return JSON with the token in it, but instead it returns gobbledygook.
I'm sure I'm missing some very simple step, where am I going wrong?
If I send the curl request without the headers, it returns an error in JSON format as expected, so is there something wrong with my headers?
You have a few options here. Instead of setting the header directly, use the following:
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');
If you set the header directly, then you should use
print_r(gzdecode($attempt_auth));
See the threads below as well:
Decode gzipped web page retrieved via cURL in PHP
php - Get compressed contents using cURL
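To illustrate the second option, here is a small offline sketch of what gzdecode does to a compressed body. The helper name decodeBody is hypothetical, and the JSON string is made up to stand in for a real response; it only shows why the raw output looks like gobbledygook until it is decompressed.

```php
<?php
// Hypothetical helper: decodes a response body according to its
// Content-Encoding value; other encodings pass through unchanged.
function decodeBody(string $body, string $contentEncoding): string
{
    if (strtolower(trim($contentEncoding)) === 'gzip') {
        $decoded = gzdecode($body);
        if ($decoded === false) {
            throw new RuntimeException('Body is not valid gzip data');
        }
        return $decoded;
    }
    return $body;
}

// Simulate a gzip-compressed JSON response so the example runs offline.
$compressed = gzencode('{"token_type":"bearer"}');
echo decodeBody($compressed, 'gzip'), "\n";
```

With CURLOPT_ENCODING set instead, cURL requests compression and decompresses the body itself, so no manual decoding step is needed.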
I am trying to log in to a remote site using cURL (before doing some data scraping).
Using the following code I am producing a cookies.txt file that has the following:
# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.
#HttpOnly_www.xxx.com FALSE / TRUE 0 xxxv5 h_r4hXtn-gNAilZwhvHjYdE3Vr4HewhxtGrxja57LbW03-M9MLNqZSeiW7lQ2wRT9lZypNsAiX0gS0Ev1PrvNkGLmwL3B8ZmyOUMLYbTYbSW0y_aPGrIFlEp4skDzh0GJGIGtFHisCmQjEMlu0CJr0UEw2rCT9jbjzg0IyOnFYxNffaMPo229NZWV7HDfCK5M1_y6MPNvW_Kt-h4qTy8YmqGbfBwKxB-bulV78MSXU9ZWz_DVvdu6jXfPiHwCBDMV8FFBLaXm5rqYgNzvbsq8JLe1xkTPn1PNJhyizUa-hlwB6ev8HNwIwBpzs7406l6mL3VgyrDJpay6bHNoMtjh4fLwI7KapFANhFHfn57mg4
#HttpOnly_www.xxx.com FALSE / TRUE 0 ASP.NET_SessionId txakhdi15oeqxyfq53f44dts
When I manually log into the website, the cookie names match these. So I think I am reaching the login (otherwise the cookies would not be created), but when I output
echo 'HELLO html1 = '.$html1;
I see the page telling me I have entered the wrong username and password.
Code as follows:
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
$username = 'xxx';
$password = 'xxx';
// echo 'STARTING';
//login form action url
$url="https://www.xxxx.com/Login";
$postinfo = "username=".$username."&password=".$password;
$cookie_file_path = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
//set the cookie the site has for certain features, this is optional
curl_setopt($ch, CURLOPT_COOKIE, "cookiename=0");
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS,5); // return into a variable
// curl_setopt($ch, CURLOPT_UPLOAD, true);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST" );
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);
// set content length
$headers[] = 'Content-length: 0';
$headers[] = 'Transfer-Encoding: chunked';
curl_setopt($ch, CURLOPT_HTTPHEADER , $headers);
$html1 = curl_exec($ch);
echo 'HELLO html1 = '.$html1;
I cannot show the site for security reasons. ( which may be a killer)
Can anyone point me in the right direction?
first off, this won't work: ini_set('display_startup_errors', 1);
- the startup phase is already finished before the userland php code starts to run,
so this setting is set too late. it must be set in the php.ini config file. (not strictly true, but close enough, like on windows you can do crazy registry hacks to enable it, and you can set it with .user.ini files, etc, more info here http://php.net/manual/en/configuration.php )
second, obvious error here is that you don't urlencode $username and $password in $postinfo = "username=".$username."&password=".$password; -
if the username OR password contains any characters with special meanings in urlencoded format, you'll send the wrong credentials and won't get logged in (this includes &,=,#, spaces, and many other characters). fixed version would look like $postinfo = "username=".urlencode($username)."&password=".urlencode($password);
third, don't use CURLOPT_CUSTOMREQUEST for POST requests, just use CURLOPT_POST.
fourth, your Content-length header is outright lying. the correct length is actually 'Content-length: '.strlen($postinfo) - which with your code, is definitely not 0 - but you shouldn't set this header at all, curl will do it for you if you don't, and unlike you, curl won't mess up the code calculating the size, so get rid of the entire line.
fifth, this code is also wrong:
$headers[] = 'Transfer-Encoding: chunked';
your curl code here is NOT using chunked transfers, and if it were, curl would send that header automatically, so get rid of it.
sixth, don't just call curl_setopt. if there's an error setting any of your options, curl_setopt will return bool(false), and you should watch out for such errors, use curl_error to extract the error message, and throw an exception if such an error occurs - instead of what your code is doing right now, silently ignoring any curl_setopt errors. use something like
function ecurl_setopt($ch, int $option, $value) {
    if (!curl_setopt($ch, $option, $value)) {
        throw new \RuntimeException('curl_setopt failed!: ' . curl_error($ch));
    }
}
if fixing all of these problems is not enough to log in, you're not giving us enough information to help you any further. what does the browsers http login request look like? or what is the login url?
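To illustrate the second point, here is a minimal sketch using made-up credentials. It shows how unencoded values corrupt a form body, and how http_build_query (which url-encodes every key and value) keeps them intact. parse_str stands in for what the server would do with the body.

```php
<?php
// Made-up credentials containing characters that are special in
// application/x-www-form-urlencoded bodies.
$username = 'alice@example.com';
$password = 'p&ss=w#rd 1';

// Naive concatenation: the "&" and "=" inside the password split it into extra fields.
$naive = 'username=' . $username . '&password=' . $password;

// http_build_query() url-encodes every key and value.
$encoded = http_build_query(['username' => $username, 'password' => $password]);

// Parse both bodies the way a server would.
parse_str($naive, $naiveFields);
parse_str($encoded, $encodedFields);

var_dump($naiveFields['password'] === $password);   // bool(false)
var_dump($encodedFields['password'] === $password); // bool(true)
```

The $encoded string can be passed straight to CURLOPT_POSTFIELDS.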
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
$username = 'xxx';
$password = 'xxx';
//login form action url
$url="https://www.xxxx.com/Login";
$postinfo = array("username"=>$username,"password"=>$password);
$cookie_file_path = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,false);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
curl_setopt($ch,CURLOPT_COOKIEFILE,$cookie_file_path);
curl_setopt($ch,CURLOPT_COOKIEJAR,$cookie_file_path);
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);
$html = curl_exec($ch);
echo $html;
The above code should work fine.
If there is still an issue, check the permissions on the cookie.txt file.
Also, if there is hidden data that needs to be sent with the POST, you can find it using the Firefox Live HTTP Headers plugin.
It is not as simple as reading the HTML page using curl. You need to supply a POST value for the submit button. If there is any javascript that executes prior to the activation of ACTION script, then that has to be looked at as well.
Usually you get better results if you use Selenium. See http://www.seleniumhq.org/
EDIT1:
If the server is rejecting your post string try: curl_setopt($handle, CURLOPT_POSTFIELDS, http_build_query($data));
I apologize in advance for my English. I have a small problem.
I want to get the final effective URL from the page
streamuj.tv/video/00e276bf5841bf77c8de?streamuj=original&authorize=ac13bb77d3d863ca362315b9b4dcdf3e
When I open the link in a browser, it takes me to an .flv file.
But when I request it through PHP, I get s3.streamuj.tv/unauthorized.flv
When I try it through this service: getlinkinfo.com/info?link=http%3A%2F%2Fwww.streamuj.tv%2Fvideo%2F00e276bf5841bf77c8de%3Fstreamuj%3Doriginal%26authorize%3Dac13bb77d3d863ca362315b9b4dcdf3e&x=49&y=11
everything is fine, and it reports:
s4.streamuj.tv:8080/vid/d0fe77e1020b6414a16aa5316c759add/58aaf1dd/00e276bf5841bf77c8de_hd.flv?start=0
My PHP CODE:
<?php
session_start();
include "simple_html_dom.php";
$proxy = array("189.3.93.114:8080");
$proxyNum = 0;
$proxy = explode(':', $proxy[$proxyNum]);
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://www.streamuj.tv/video/00e276bf5841bf77c8de?streamuj=original&authorize=ac13bb77d3d863ca362315b9b4dcdf3e');
curl_setopt($curl, CURLOPT_FILETIME, true);
curl_setopt($curl, CURLOPT_NOBODY, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_PROXY, $proxy[0]);
curl_setopt($curl, CURLOPT_PROXYPORT, $proxy[1]);
$header = curl_exec($curl);
$info = curl_getinfo($curl);
curl_close($curl);
$u1 = $info['url'];
echo "u1: $u1</br>";
$u2 = str_replace("flv?start=0","flv",$u1);
echo $u2;
?>
Where is the problem? Why does it return unauthorized.flv?
Solution
The server was checking client legitimacy via the User-Agent HTTP header.
Using a custom User-Agent solved the problem.
curl_setopt($curl, CURLOPT_HTTPHEADER, array( 'user-agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2950.0 Iron Safari/537.36' ));
Original post:
Most likely the generated flv URL is not pointing to static place. It
probably uses sessionID + cookie / verifies IP (one of these, or
both).
Without knowing what header you have to request with via CURL, you
probably won't get a relevant response.
When I try this code on another server it works properly, but when I run it on a server where SSL is "installed", I get an empty string from var_dump.
$feedUrl = 'https://api.pinnaclesports.com/v1/feed?sportid=29&leagueid=1980-1977-1957-1958-1983-2421-2417-2418-2419-1842-1843-2436-2438-2196-2432-2036-2037-1928-1817-2386-2592-2081';
// Set your credentials here, format = clientid:password from your account.
$credentials = base64_encode("password");
// Build the header, the content-type can also be application/json if needed
$header[] = 'Content-length: 0';
$header[] = 'Content-type: application/xml';
$header[] = 'Authorization: Basic ' . $credentials;
// Set up a CURL channel.
$httpChannel = curl_init();
// Prime the channel
curl_setopt($httpChannel, CURLOPT_URL, $feedUrl);
curl_setopt($httpChannel, CURLOPT_RETURNTRANSFER, true);
curl_setopt($httpChannel, CURLOPT_HTTPHEADER, $header);
curl_setopt($httpChannel, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)' );
// Unless you have all the CA certificates installed in your trusted root authority, this should be left as false.
curl_setopt($httpChannel, CURLOPT_SSL_VERIFYPEER, false);
// This fetches the initial feed result. Next we will fetch the update using the fdTime value and the last URL parameter
$initialFeed = curl_exec($httpChannel);
//var_dump($initialFeed);
I already have a script on this SSL server that downloads CSV files from another URL, and it works normally, so I think the problem is in my header. But then how does the same code work on other servers?
Try this, which basically says to do:
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_CAINFO, getcwd() . "/CAcerts/BuiltinObjectToken-EquifaxSecureCA.crt");
Or try this
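Whichever option is tried, it may help to first find out why the transfer fails: curl_exec returns false on error, and curl_error gives cURL's reason (for example, a certificate problem). A sketch under stated assumptions: the helper name curlExecOrThrow is hypothetical, and the example deliberately targets a local port assumed to be closed so that it fails without touching the real API.

```php
<?php
// Hypothetical helper: runs the transfer and throws with cURL's own error
// message instead of silently returning false.
// Assumes CURLOPT_RETURNTRANSFER is enabled on the handle.
function curlExecOrThrow($ch): string
{
    $result = curl_exec($ch);
    if ($result === false) {
        throw new RuntimeException(
            sprintf('cURL error %d: %s', curl_errno($ch), curl_error($ch))
        );
    }
    return $result;
}

// Port 1 on localhost is assumed to be closed, so this request fails quickly.
$ch = curl_init('http://127.0.0.1:1/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_PROXY, '');          // ignore any proxy set in the environment
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);

try {
    $body = curlExecOrThrow($ch);
    $failed = false;
} catch (RuntimeException $e) {
    $failed = true;
    echo $e->getMessage(), "\n";
}
```

Running the same helper against the feed URL from the question would print the actual reason for the empty result on the affected server.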