I'm trying to use the Instagram API. When I open the following link in a browser, it works fine and I can see the JSON response:
https://www.instagram.com/nasa/?__a=1
When I try to open the same URL with file_get_contents(), I get a 403 Forbidden error.
So I tried to use cURL. Here is my code:
$url = "https://www.instagram.com/nasa/?__a=1";
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
curl_close($ch);
var_dump($result);
The problem is that $result is an empty string. With file_get_contents() I get a 403 Forbidden error, and with curl I get an empty string.
Can somebody help? Thanks.
Edit
I don't get the 403 Forbidden error in my browser because I'm logged in.
You need to enable cookie support (e.g. CURLOPT_COOKIEFILE) and log in before you can access https://www.instagram.com/nasa/?__a=1, but your curl code never attempts to log in.
Here you can see how to log in to Instagram with PHP: https://stackoverflow.com/a/41684531/1067003
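A minimal sketch of the cookie part only, assuming a writable system temp directory. Both CURLOPT_COOKIEFILE (read) and CURLOPT_COOKIEJAR (write) point at the same file, so cookies set during a login request are sent again on later requests made with the same handle. The helper name `instagramCookieHandle` is hypothetical, and the login request itself is not shown.

```php
<?php
// Hypothetical helper: a cURL handle with the cookie engine enabled
function instagramCookieHandle(string $jarPath)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        // Read cookies from this file (this also turns on libcurl's cookie engine)
        CURLOPT_COOKIEFILE => $jarPath,
        // Write all known cookies back to this file when the handle is released
        CURLOPT_COOKIEJAR => $jarPath,
    ]);
    return $ch;
}

// Create an empty cookie file in the system temp directory
$jar = tempnam(sys_get_temp_dir(), 'ig_cookies_');
$ch = instagramCookieHandle($jar);
// 1. log in using $ch (not shown), 2. request the ?__a=1 URL with the same $ch
```

Reusing the same handle matters: a fresh handle without these options starts with no cookies, which is exactly the "not logged in" situation from the question.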
Related
Here I am trying to access the NSEIndia.com URL "https://www1.nseindia.com/live_market/dynaContent/live_watch/stock_watch/niftySmallcap50OnlineStockWatch.json".
It works fine when I open it in a browser, but not when I try to open it using PHP's file_get_contents().
Please help, or suggest another approach so that I can get the output of this URL in my code.
$url = "https://www1.nseindia.com/live_market/dynaContent/live_watch/stock_watch/niftySmallcap50OnlineStockWatch.json";
echo file_get_contents( $url );
die;
Thank you very much in advance.
See this answer for more info.
Basically, the web server is configured to block requests like the ones file_get_contents() sends (typically, requests without a browser-like User-Agent header).
Maybe try cURL instead.
In the linked question, the following code is provided:
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "example.com");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
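If you would rather keep file_get_contents(), a stream context can send a browser-like User-Agent too. This is a sketch that assumes the block is triggered by the missing User-Agent; the helper name `browserContext` and the UA string are illustrative, and the site may still block on other signals such as cookies.

```php
<?php
// Hypothetical helper: an HTTP stream context that sends browser-like headers
function browserContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            // Multiple header lines are joined with CRLF
            'header' => implode("\r\n", [
                'User-Agent: ' . $userAgent,
                'Accept: application/json, text/plain, */*',
            ]),
            // Return the body of 4xx/5xx responses instead of just failing
            'ignore_errors' => true,
        ],
    ]);
}

$context = browserContext('Mozilla/5.0 (Windows NT 10.0; Win64; x64)');
// $json = file_get_contents($url, false, $context);
```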
When I look at
https://www.tutti.ch/de/vi/zaurich/haushalt/geraate-utensilien/tassen-und-unterteller-arv-ikea-blaue-streifen/27002681
with a browser, I see a completely different page than when I use:
file_get_contents(...) // or
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL,...);
$result=curl_exec($ch);
var_dump($result);
How can I get the HTML code as seen in the browser?
The HTML on this website is rendered on the client side by the browser using JavaScript. If you are trying to parse content from the website, try using a headless browser. A headless browser is a browser that works without a graphical interface but otherwise behaves like a normal browser. Both Chrome and Firefox have headless versions.
Here is a useful library for controlling headless browsers from PHP: https://github.com/php-webdriver/php-webdriver
You can also interact with the page's JavaScript and send commands like a real user would.
You may install the browser and the driver in a different machine (or even your own pc) if you don't have the necessary permissions to do it in your hosting account.
I'm trying to crawl Twitter search using curl. Last month it worked, but now I get a 302 HTTP response, while the browser and Postman both return 200 OK.
This is my curl code:
$param = "?f=tweets&q=+LAPOR1708&src=typd&max_position=".$scrollCursor;
$url = "https://twitter.com/i/search/timeline".$param;
$ch = curl_init();
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_URL,$url);
// Options only take effect if they are set before curl_exec()
curl_setopt($ch, CURLOPT_HTTPHEADER, ["Accept: text/html"]);
$result=curl_exec($ch);
dd(curl_getinfo($ch));
curl_close($ch);
And this is my curl_getinfo output:
[screenshot of the curl_getinfo output]
And the response using Postman:
[screenshot of the Postman response]
A 302 response is a redirect.
Postman automatically follows redirects.
cURL does not by default.
This is normal. You should follow the redirect.
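Following the redirect in cURL comes down to a couple of options. A sketch, not tested against Twitter: CURLOPT_FOLLOWLOCATION makes cURL follow Location headers, CURLOPT_MAXREDIRS caps the chain, and after curl_exec() the final address is available via CURLINFO_EFFECTIVE_URL. The helper name `redirectOptions` is hypothetical.

```php
<?php
// Options that make cURL follow redirects the way Postman does
function redirectOptions(int $maxRedirects = 5): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers (301, 302, ...)
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up after this many hops
        CURLOPT_AUTOREFERER    => true,          // set Referer automatically when following
    ];
}

$ch = curl_init();
$applied = curl_setopt_array($ch, redirectOptions());
// After curl_exec($ch): curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) is the final URL
```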
Twitter’s Terms of Service prohibits crawling in this manner. You should use the official developer API to retrieve search results.
I am sending an array of objects. I have a cURL client (the submitter) on my own server and a listening script on someone else's server, which is not under my control. I think they are blocking incoming cURL requests, because when I test with a normal HTML <form> it works, but it never works via cURL.
So I think they have put some restriction on cURL.
My questions are:
Can a server restrict/block incoming cURL requests?
If so, can I get around it by changing the HTTP headers (e.g. the User-Agent) in my cURL script?
Or is there some other possible explanation?
Thanks!
If you are still facing the problem, do the following.
1.
$config['useragent'] = 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0';
curl_setopt($curl, CURLOPT_USERAGENT, $config['useragent']);
curl_setopt($curl, CURLOPT_REFERER, 'https://www.domain.com/');
2.
$dir = dirname(__FILE__);
$config['cookie_file'] = $dir . '/cookies/' . md5($_SERVER['REMOTE_ADDR']) . '.txt';
curl_setopt($curl, CURLOPT_COOKIEFILE, $config['cookie_file']);
curl_setopt($curl, CURLOPT_COOKIEJAR, $config['cookie_file']);
NOTE: You need a cookies folder in the script's directory.
3.
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
If these steps don't solve the problem, post sample input/output/errors so that a more precise solution can be provided.
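The three steps can also be combined into a single curl_setopt_array() call. This sketch additionally creates the cookies folder if it is missing; the function name is hypothetical, the referer is the placeholder domain from step 1, and the client key stands in for $_SERVER['REMOTE_ADDR'] so it also runs from the command line.

```php
<?php
// Hypothetical helper combining steps 1-3 into one options array
function browserLikeOptions(string $cookieDir, string $clientKey): array
{
    // Step 2 needs the cookies folder to exist; create it if necessary
    if (!is_dir($cookieDir) && !mkdir($cookieDir, 0700, true) && !is_dir($cookieDir)) {
        throw new RuntimeException("Cannot create $cookieDir");
    }
    $cookieFile = $cookieDir . '/' . md5($clientKey) . '.txt';

    return [
        // Step 1: browser User-Agent and a Referer
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0',
        CURLOPT_REFERER        => 'https://www.domain.com/',
        // Step 2: per-client cookie file, read and written by cURL
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_COOKIEJAR      => $cookieFile,
        // Step 3: follow redirects and return the body as a string
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// 203.0.113.7 is just an example address
$options = browserLikeOptions(sys_get_temp_dir() . '/cookies', '203.0.113.7');
$curl = curl_init();
curl_setopt_array($curl, $options);
```

In a web request you would pass $_SERVER['REMOTE_ADDR'] as the second argument, as step 2 does.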
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)';
$curl=curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, $agent);
On the server side, we can block requests by recognizing the header fields in the HTTP request (Referer, Cookie, User-Agent and so on), the IP address, and the access frequency. In most cases, machine-generated requests differ from human ones, for example they have no Referer or cookies, or they arrive at a higher rate, so we can write rules to deny these requests.
Given that, you can try your best to simulate real requests by filling in the header fields, using random and slower request intervals, and using more IP addresses (which starts to sound like an attack).
Generally, if you keep the request rate low, don't put a heavy load on their server, and follow their access rules, they will seldom block your requests.
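One simple way to keep the request rate low is to enforce a minimum interval between consecutive requests. A sketch: the class name and the 2-second interval are arbitrary choices for illustration, not anything a particular server requires.

```php
<?php
// Hypothetical helper: enforces a minimum delay between consecutive requests
final class Throttle
{
    private float $minInterval;
    private ?float $last = null;

    public function __construct(float $minInterval)
    {
        $this->minInterval = $minInterval;
    }

    // Sleeps just long enough that successive calls are at least $minInterval seconds apart
    public function wait(): void
    {
        if ($this->last !== null) {
            $remaining = $this->minInterval - (microtime(true) - $this->last);
            if ($remaining > 0) {
                usleep((int) ceil($remaining * 1000000));
            }
        }
        $this->last = microtime(true);
    }
}

$throttle = new Throttle(2.0); // at most one request every 2 seconds
// foreach ($urls as $url) { $throttle->wait(); /* send the cURL request for $url */ }
```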
A server cannot block cURL requests as such, because they are just HTTP requests. So changing the User-Agent of your cURL request can solve the problem, as the server will think you are connecting through the browser named in the UA string.
Example of a cURL GET call in PHP that fetches a file into a variable.
The solution came from somewhere on Stack Overflow; it is not mine.
BTW, you need to be able to execute PHP code from within HTML files.
If you want to do so, edit /etc/apache2/mods-enabled/mime.conf.
Go to the end of the file and add the following line before the closing </IfModule> tag:
AddType application/x-httpd-php .html .htm
Verified and tested with Apache 2.4.23 and PHP 5.6.17-1 under Debian.
I chose to execute PHP in an HTML file for faster development.
Example code:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
</head>
<body>
<?php
$host = "https://tgftp.nws.noaa.gov/data/observations/metar/decoded/CYHU.TXT";
$agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $host);
curl_setopt($curl, CURLOPT_USERAGENT, $agent);
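curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// Execute only once: a second curl_exec() call would download the file again
$ftp_result = curl_exec($curl);
curl_close($curl);
print_r($ftp_result);
// Now the real work begins:
// extracting the individual fields from the text...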
$zelocation="";
$zedatetime="";
$zewinddirection="";
$zewindspeed="";
$zeskyconditions="";
$zetemp="";
$zehumidity="";
?>
</body>
</html>
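For the extraction step, a sketch that splits the decoded report into fields. It assumes the decoded layout where the first line is the station location, the second is the observation date/time, and the remaining lines look like "Label: value"; check this against the actual file before relying on it. The function name and the label keys used in the usage comments are assumptions.

```php
<?php
// Hypothetical parser for a decoded METAR text report
function parseDecodedMetar(string $text): array
{
    $lines = preg_split('/\R/', trim($text));
    $report = [
        'location' => $lines[0] ?? '',
        'datetime' => $lines[1] ?? '',
        'fields'   => [],
    ];
    foreach (array_slice($lines, 2) as $line) {
        // Split on the first ": " only, because values may contain colons too
        $parts = explode(': ', $line, 2);
        if (count($parts) === 2) {
            $report['fields'][trim($parts[0])] = trim($parts[1]);
        }
    }
    return $report;
}

// Usage with the downloaded text (label names are assumed, check the real file):
// $metar = parseDecodedMetar($ftp_result);
// $zelocation = $metar['location'];
// $zedatetime = $metar['datetime'];
// $zetemp     = $metar['fields']['Temperature'] ?? '';
```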
I faced the same issue when I was trying to log in to a website using cURL: the server rejected my request until I sent the User-Agent header along with the cookies returned when loading the login page. If you're not familiar with cURL, you can use this curl library:
$curl = new Curl();
$curl->setHeaders('user-agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0');
// Disable SSL verification
$curl->setOpt(CURLOPT_SSL_VERIFYPEER, '0');
$curl->post($url, $data);
$response = $curl->getRawResponse();
Can anybody tell me why this cURL code only works on my local server and not on the live server?
I tried 3 different hosting providers and it doesn't work on any of them.
I checked the following on the live server:
1) cURL is enabled
2) The PHP version is OK
3) cURL executes without any error, but returns no result
It's been 3 days and I haven't been able to find a solution.
Please help.
error_reporting(1);
set_time_limit(1500);
$fname=time().'_myfile.flv';
header('Content-type: video/x-flv');
header("Content-Type: application/octet-stream");
header("Content-Disposition: attachment; filename=\"$fname\"");
define('USERAGENT', "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.2; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)");
$url='http://v3.lscache5.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Calgorithm%2Cburst%2Cfactor&fexp=914010%2C907605&algorithm=throttle-factor&itag=34&ip=112.0.0.0&burst=40&sver=3&signature=D51A660BDF83B54B3584425DBE8930D5D0F805E1.B3FB21D0CAF625D36A17B558A0A653F20788B49F&expire=1313503200&key=yt1&ipbits=8&factor=1.25&id=1cacd26a9913e4ec';
$ch = curl_init() or die("Error");
curl_setopt($ch, CURLOPT_USERAGENT, USERAGENT);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
if(curl_exec($ch) === FALSE)
{
die("Curl failed: " . curl_error($ch)); // Never goes here
}
curl_close($ch);
?>
I had this issue for several days: I could not find any errors in PHP, and the cURL response seemed to come back completely empty. I finally found a suggestion to wrap the cURL request in this check:
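// Run the request once and keep the result; calling curl_exec() again would send a second request
$result = curl_exec($curl);
if ($result === FALSE) {
    die("Curl Failed: " . curl_error($curl));
}
return $result;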
Adding this finally gave me a PHP error:
Curl Failed: SSL certificate problem, verify that the CA cert is OK.
Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Searching for that error on SO led me to this question: HTTPS and SSL3_GET_SERVER_CERTIFICATE:certificate verify failed, CA is OK
which led me to add code to my cURL request that essentially disables SSL verification:
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
I'm now working with my server admin to find a better solution, because I'm not sure this workaround is a good idea, but for now it works.
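A safer alternative to disabling verification is to keep CURLOPT_SSL_VERIFYPEER enabled and point cURL at an up-to-date CA bundle, such as the cacert.pem file distributed by the curl project. A sketch; the bundle path is an assumption and the file must actually exist on your server. Setting curl.cainfo in php.ini achieves the same thing globally.

```php
<?php
// Keep TLS verification on and tell cURL which CA certificates to trust
function verifiedTlsOptions(string $caBundlePath): array
{
    if (!is_readable($caBundlePath)) {
        throw new InvalidArgumentException("CA bundle not found: $caBundlePath");
    }
    return [
        CURLOPT_SSL_VERIFYPEER => true, // verify the server's certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,    // verify that the certificate matches the host name
        CURLOPT_CAINFO         => $caBundlePath,
    ];
}

// Assumed location of a downloaded cacert.pem:
// curl_setopt_array($ch, verifiedTlsOptions(__DIR__ . '/cacert.pem'));
```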
Call curl_getinfo($ch) after curl_exec() to see the response code returned by the server.
Test it with
error_reporting(E_ALL);
Some things to check:
Your value for error_reporting() is pretty weird: 1 equals E_ERROR and implies that you are ignoring almost everything, including warnings. I would not recommend that even in production, not to mention development.
You define $header_list and never use it.
Make sure you are inspecting the real output, not the output as rendered by a browser.
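Putting these checks together, a sketch of a fetch helper that reports the cURL error and the HTTP status code instead of silently returning nothing. The function name and the shape of the returned array are hypothetical.

```php
<?php
error_reporting(E_ALL);

// Fetch a URL and return the body, the HTTP status and any cURL error
function fetchWithDiagnostics(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);

    return [
        'body'   => $body === false ? null : $body,
        // 0 means no HTTP response was received at all
        'status' => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'errno'  => curl_errno($ch),   // 0 when the transfer succeeded
        'error'  => curl_error($ch),   // empty string when there was no error
    ];
}

// $r = fetchWithDiagnostics('https://example.com/');
// var_dump($r['status'], $r['errno'], $r['error']);
```

A status of 0 together with a non-zero errno points at a transport problem (DNS, TLS, blocked outgoing connection), while a 403 or 302 status means the server answered and the issue is in the request itself.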
I'm sure this will help you (I spent a few days working through this problem on my own).
Add a user agent to your cURL request:
(when the options are passed as an array to curl_setopt_array())
CURLOPT_USERAGENT => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100508 SeaMonkey/2.0.4',
(when setting each option individually with curl_setopt())
curl_setopt($curl,CURLOPT_USERAGENT,'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1');
I noticed this because Postman generated the code that I had pasted into my files without thinking. That code did not set a user agent (Postman adds its own when it sends a request), so it worked in Postman and on localhost, but not on the live hosting.