I am trying to crawl Twitter search using cURL. Last month it worked, but now it gets a 302 HTTP response, while the same request from a browser or from Postman returns 200 OK.
This is my cURL code:
$param = "?f=tweets&q=+LAPOR1708&src=typd&max_position=" . $scrollCursor;
$url = "https://twitter.com/i/search/timeline" . $param;
$ch = curl_init();
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_URL, $url);
// headers must be set before curl_exec(), not after
curl_setopt($ch, CURLOPT_HTTPHEADER, ["Accept: text/html"]);
// return the response body instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
dd(curl_getinfo($ch)); // Laravel's dd() dumps the request metadata
curl_close($ch);
My curl_getinfo output and the Postman response were attached as screenshots (not reproduced here).
A 302 response is a redirect. Postman follows redirects automatically; by default, cURL does not. This is normal; you should follow the redirect.
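In PHP cURL that means enabling CURLOPT_FOLLOWLOCATION; a minimal sketch, reusing the $url from the question:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// follow the Location header of the 302 instead of stopping at it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5); // guard against redirect loops
$result = curl_exec($ch);
curl_close($ch);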
That said, Twitter’s Terms of Service prohibit crawling in this manner; you should use the official developer API to retrieve search results.
Related
Here I am trying to access the NSEIndia.com URL "https://www1.nseindia.com/live_market/dynaContent/live_watch/stock_watch/niftySmallcap50OnlineStockWatch.json".
It works fine when I open it in a browser, but not when I try to fetch it with PHP's file_get_contents.
Please help me, or suggest another approach so that I can get the output of this URL in my code.
$url = "https://www1.nseindia.com/live_market/dynaContent/live_watch/stock_watch/niftySmallcap50OnlineStockWatch.json";
echo file_get_contents( $url );
die;
Thank you very much in advance.
See this answer for more info.
Basically, the web server is configured in a way that blocks requests from file_get_contents.
Maybe try cURL?
In the linked question, the following code is provided:
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "example.com");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
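If the server accepts the request, $output will contain the JSON body, which you can then decode; a sketch (assuming "example.com" has been replaced with the niftySmallcap50OnlineStockWatch.json URL from the question):
// parse the JSON payload into a PHP array
$data = json_decode($output, true);
var_dump($data);
Whether NSE's server accepts a spoofed User-Agent is not guaranteed, but it is the usual first thing to try.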
I'm trying to use the Instagram API. When I open the following link in a browser, it's completely fine and you can click on it and see the JSON response:
https://www.instagram.com/nasa/?__a=1
When I tried to open the same URL via file_get_contents(), I got a 403 Forbidden error.
So I tried to use cURL. Here is my code:
$url = "https://www.instagram.com/nasa/?__a=1";
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
curl_close($ch);
var_dump($result);
The problem is that $result is an empty string: file_get_contents gives me 403 Forbidden, and cURL returns an empty string.
Can somebody help? Thanks.
Edit
I don't get 403 Forbidden in my browser because I'm logged in.
You need to enable cookie support (e.g. CURLOPT_COOKIEFILE) and log in before you can access https://www.instagram.com/nasa/?__a=1, and your cURL code never attempts to log in.
Here you can see how to log in to Instagram with PHP: https://stackoverflow.com/a/41684531/1067003
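A minimal sketch of the cookie part (the login flow itself is in the linked answer; /tmp/cookies.txt is just a placeholder path):
$cookieFile = '/tmp/cookies.txt'; // placeholder; must be writable by PHP
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.instagram.com/nasa/?__a=1');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// read cookies from, and write cookies to, the same file so the
// logged-in session carries over between requests
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
$result = curl_exec($ch);
curl_close($ch);
Without a logged-in session stored in that cookie file, Instagram will still refuse the request.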
I am using cURL on my server to fetch www.yelp.com, but when the script is served from https://localhost the output has no CSS stylesheets. I have tried:
CURLOPT_SSL_VERIFYPEER, FALSE
But the issue is not fetching the page; it is that Chrome does not seem to apply any CSS formatting. Any ideas?
For example, running the code below from http://localhost gives a well-formatted page, while running it from https://localhost gives a page without CSS.
<?php
$url="http://www.yelp.com/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$cl = curl_exec($ch);
echo $cl;
exit;
I'm using the Google text-to-speech API, but for some reason it's very slow when I connect to it via PHP or the command line.
I'm doing this:
$this->mp3data = file_get_contents("http://translate.google.com/translate_tts?tl=en&q={$text}");
Where $text is just a urlencoded string.
I've also tried it via wget on the command line (the URL must be quoted, otherwise the shell treats & as a background operator):
wget "http://translate.google.com/translate_tts?tl=en&q=test"
Either way takes about 20 seconds or more. Via PHP it does eventually get the contents and write them to a new file on my server, as I want it to. Via wget the connection times out.
However, if I just open that URL in the browser, it's pretty much instant.
Could anyone shed any light on why this might be occurring?
Thanks.
It's due to how Google handles bots. You need to spoof the User-Agent header so the request looks like it comes from a regular browser.
Some info on how to go about this would be here:
https://duckduckgo.com/?q=php%20curl%20spoof%20user%20agent
Managed to sort this out now. This is what I ended up doing, and now it only takes a few seconds:
$header = array("Content-Type: audio/mpeg");
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $uri);
curl_setopt($ch, CURLOPT_HEADER, false);        // don't include response headers in the output
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);   // give up if no connection within 30s
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the MP3 bytes instead of printing them
// a browser-like User-Agent is what avoids the slow bot handling
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$this->mp3data = curl_exec($ch);
curl_close($ch);
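If it stalls again, it may help to check for a transport error before using the data; a small sketch that could sit between curl_exec() and curl_close():
if ($this->mp3data === false) {
    // curl_error() describes what went wrong (DNS, timeout, TLS, ...)
    error_log('TTS request failed: ' . curl_error($ch));
}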
I'm writing a cURL script, but how can I check that it's working and passing the right headers when it visits the website?
$ckfile = '/tmp/cookies.txt';
$useragent= "Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0_1 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Mobile/7A400";
$ch = curl_init("http://website.com");
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent); // set user agent
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$output = curl_exec($ch);
curl_close($ch);
Just make a PHP page like this on your server and point your script at your own URL:
<?php var_dump($_SERVER);
and check the HTTP_USER_AGENT string.
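For instance, a minimal test page (ua_test.php is just a placeholder name) could reflect only the header you care about:
<?php
// ua_test.php: print the User-Agent the server actually received
header('Content-Type: text/plain');
echo isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '(no User-Agent header)';
Point the script's curl_init() at that page; the $output it returns should match the $useragent string you set.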
You can also achieve the same thing by looking at the Apache logs.
But I am pretty sure curl is setting the User-Agent string like it should ;-)
You'll find the Firefox extension LiveHTTPHeaders helpful for seeing exactly what happens to the headers during a normal browsing session.
http://livehttpheaders.mozdev.org/
This will increase your understanding of how your target server responds, and even show whether it redirects your request internally.