Stop Curl redirecting to new page - php

I am trying to make a cURL request to the domain http://xyz.com. Here is my code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_URL, $strURL);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $arrData);
curl_exec($ch);
While making the request, it gets redirected to some page within the site and doesn't come back to my page.
How can I stop being redirected in the middle of a cURL request?
I'm sorry, guys.
After the suggestion I tried setting CURLOPT_FOLLOWLOCATION to 0 and it worked. It was my mistake: I hadn't removed the header redirect on the next line, so it kept redirecting over and over.
Sorry, my mistake.
Once more: with CURLOPT_FOLLOWLOCATION set to 0, the request won't be transferred to the new page.

I think it's because that page checks your user agent or sets cookies, so you need to mimic a web browser as closely as possible.
For example, add a user agent:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.63 Safari/535.7');
Or try setting a cookie jar:
$cookieJar = tempnam ("/tmp", "CURLCOOKIE");
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookieJar);
If you provide the URL, maybe I can help more.

Try using the CURLOPT_MAXREDIRS option.
CURLOPT_MAXREDIRS : The maximum amount of HTTP redirections to follow. Use this option alongside CURLOPT_FOLLOWLOCATION.
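To make the two options concrete, here is a minimal sketch. The helper name redirectOptions is made up for illustration, and it assumes the curl extension is loaded. Passing 0 switches redirect following off entirely; any positive number follows at most that many hops.

```php
<?php
// Hypothetical helper (not part of PHP): build the redirect-related options.
function redirectOptions(int $maxRedirects): array
{
    if ($maxRedirects <= 0) {
        // Never follow a Location header; the 3xx response is returned as-is.
        return [CURLOPT_FOLLOWLOCATION => false];
    }
    return [
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => $maxRedirects, // stop after this many hops
    ];
}

// Usage sketch for the request in the question: do not follow redirects.
$ch = curl_init('http://xyz.com');
curl_setopt_array($ch, redirectOptions(0) + [CURLOPT_RETURNTRANSFER => true]);
// After curl_exec($ch), curl_getinfo($ch, CURLINFO_REDIRECT_URL) reports
// where the server wanted to send you, without actually going there.
```

Nothing is sent until curl_exec() is called, so the snippet only prepares the handle.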

Try playing with:
CURLOPT_RETURNTRANSFER,
CURLOPT_FOLLOWLOCATION,
CURLOPT_COOKIEJAR,
CURLOPT_COOKIEFILE
And remember to log; it makes debugging easier:
$handle = fopen('log.tmp', 'w');
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_STDERR, $handle);

Related

PHP Curl gets 403 error, but browser from same machine can request page?

I've got this script working with generally no problems. I say generally, because while it retrieves pages from CNN.com, allrecipes.com, reddit.com, etc - when I point it towards at least one URL (foxnews.com), I get a 403 error instead.
As you can see, I've set the user agent to the same as my machine's browser (that was necessitated by sending a request to Facebook's homepage, which returned a message that the browser wasn't supported).
So, basically wondering what step(s) I need to take to have as many sites as possible recognize the CURL request as coming from a real, actual browser, rather than 403'ing it.
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $this->url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8');
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
Fox News appears to be blocking access to their website from any request passing a USERAGENT. Simply removing the USERAGENT string works fine for me:
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $this->url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
Hope this helps! :)

Cookie.txt disabling

I'm having a problem logging in via cURL.
I would like to be able to log in without cookie.txt.
If I remove cookie.txt, I can't log in. When cookie.txt is there, the login succeeds, but I would like to log in without using cookies. I tried unlinking cookie.txt, but as I said, I can't log in then.
Part of the code:
$ret=false;
$useragent = "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10";
$data = setData($email,$pass);
$ch = curl_init('https://www.website.com/login.php');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_ENCODING , "gzip,deflate");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_COOKIESESSION, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 40);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . '/cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookie.txt');
$source=curl_exec($ch);
$info=curl_getinfo($ch);
if($info["redirect_count"]==1)
{
$ret=true;
}
You can't log in without using cookies, neither via cURL nor via a browser (unless the site you are logging in to implements a different mechanism to keep the session ID, for example as part of the URLs, but this is rarely the case and it doesn't depend on you). The reason is that without the cookie, the server can't know that the request comes from you and not from someone else.
Facebook doesn't implement a login system that works without cookies, so you can't.
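That said, the cookies do not have to live in a cookie.txt file. A minimal sketch, assuming the curl extension is loaded: passing an empty string to CURLOPT_COOKIEFILE turns on cURL's cookie engine without reading any file, and leaving out CURLOPT_COOKIEJAR means nothing is written to disk. The cookies then exist only in memory for as long as the same handle is reused. The function names and the login flow below are hypothetical.

```php
<?php
// Hypothetical helper: options for a handle that keeps cookies in memory only.
function inMemoryCookieOptions(): array
{
    return [
        CURLOPT_COOKIEFILE     => '',   // empty string: enable the cookie engine, read no file
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    ];
}

// Hypothetical flow: POST the login form, then fetch a page with the same handle.
// Returns the page body, or false if either request fails.
function loginThenFetch(string $loginUrl, array $fields, string $pageUrl)
{
    $ch = curl_init();
    curl_setopt_array($ch, inMemoryCookieOptions());

    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    if (curl_exec($ch) === false) {
        return false;
    }

    // Switch back to GET; the session cookie is still held by $ch.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    return curl_exec($ch);
}
```

The session is lost as soon as the handle goes away, so this only helps when the login and the follow-up requests happen in the same script run.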

Error 301 when using the function file_get_contents_curl on a URL

I am getting an HTTP 301 error when I make a cURL request to leboncoin.fr.
I tried to solve the problem by adding curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1) to my cURL function.
The code worked fine for one day only. The next day I got the same 301 error again.
Here is the cURL code:
function file_get_contents_curl($url)
{
$ch = curl_init();
$timeout = 10;
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER , 1);
curl_setopt($ch, CURLOPT_FORBID_REUSE , 1);
curl_setopt($ch, CURLOPT_TIMEOUT , 10);
curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
Do you have any idea how to solve this?
Thanks.
If it worked one day and nothing changed on your side or on the site you are connecting to, you should get the same result.
In any case, since you know that the address changed, change it in your code to save time and steps.
Also, you may be running into the connection timeout. Try increasing CURLOPT_CONNECTTIMEOUT a bit, to 13 for example, in case the servers are taking too long to respond or to perform the redirection.
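To see what actually happened on a given day, it may help to look at what curl_getinfo() reports before the handle is closed. The helper below is only a sketch: summarizeTransfer is a made-up name, and it reads the http_code, url, redirect_count and redirect_url keys of the array that curl_getinfo($ch) returns.

```php
<?php
// Hypothetical helper: turn the array from curl_getinfo($ch) into one log line.
function summarizeTransfer(array $info): string
{
    $code = $info['http_code'] ?? 0;
    $hops = $info['redirect_count'] ?? 0;
    $url  = $info['url'] ?? '';

    if ($code >= 300 && $code < 400) {
        // The transfer ended on a redirect that was not followed.
        $target = ($info['redirect_url'] ?? '') ?: 'unknown';
        return "Stopped at $code; server points to $target";
    }
    return "Final status $code at $url after $hops redirect(s)";
}
```

In file_get_contents_curl() this would be called as summarizeTransfer(curl_getinfo($ch)) right after curl_exec() and before curl_close(). If the line still shows a 301, the redirect is not being followed and the reported target is the address to switch to.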

Accept cookies using cURL?

I've been trying to get the contents of a webpage using cURL, but have trouble getting cURL to accept cookies.
For example, on Target.com, when I cURL it, it still says that I have to enable cookies.
Here is my code:
$url = "http://www.target.com/p/Acer-Gateway-15-6-Laptop-PC-NV57H77u-with-320GB-Hard-Drive-4GB-Memory-Black/-/A-13996190#?lnk=sc_qi_detailbutton";
$ch = curl_init(); // initialize curl handle
curl_setopt($ch, CURLOPT_URL,$url); // set url to post to
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);// allow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); // return into a variable
curl_setopt($ch, CURLOPT_TIMEOUT, 10); // times out after 10s
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0");
$cookie_file = "cookie1.txt";
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
$result = curl_exec($ch); // run the whole process
curl_close($ch);
echo $result;
What else am I missing?
The cookie1.txt file has 077 permissions, by the way.
077 is a badly chosen permission setting: it means the owner (probably apache) has no access at all. Try setting it to 644 (owner has read/write), as it's only a file.
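The permission fix can also be done from PHP before the request is made. This is a sketch only: prepareCookieFile is a hypothetical helper, and it assumes the script runs as the user that owns the file (otherwise chmod() fails and the function returns false).

```php
<?php
// Hypothetical helper: make sure the cookie file exists and is usable (mode 0644).
function prepareCookieFile(string $path): bool
{
    // Create the file if it is missing.
    if (!file_exists($path) && !touch($path)) {
        return false;
    }
    // Owner read/write, everyone else read-only.
    if (!chmod($path, 0644)) {
        return false;
    }
    // Drop PHP's cached stat data so the checks below see the new mode.
    clearstatcache(true, $path);
    return is_readable($path) && is_writable($path);
}
```

Calling prepareCookieFile($cookie_file) before setting CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR, and stopping early when it returns false, separates a file problem from a problem with the site itself.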

php curl data scraping

I have cURL code that fetches data from a site. It had been working fine for the last few months, but it suddenly stopped working. Now it says:
HTTP/1.0 302 Moved Temporarily
My code is:
$ch = curl_init();
curl_setopt($ch, CURLOPT_REFERER, $baseUrl);
curl_setopt($ch, CURLOPT_PROXY, $proxy[0]);
curl_setopt($ch, CURLOPT_PROXYPORT, $proxy[1]);
//curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIE , $phpSId);
curl_setopt($ch, CURLOPT_COOKIEJAR , $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE , $cookie);
curl_setopt($ch, CURLOPT_USERAGENT , "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0.1) Gecko/20100101 Firefox/7.0.1");
curl_setopt($ch, CURLOPT_TIMEOUT , 40);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST , 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER , 0);
curl_setopt($ch, CURLOPT_URL , $url);
curl_setopt($ch, CURLOPT_HEADER , 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION , 1);
curl_setopt($ch, CURLOPT_POST , 1);
curl_setopt($ch, CURLOPT_POSTFIELDS , $data);
$result = curl_exec($ch);
curl_close ($ch);
unset($ch);
die($result);
Please help, thanks in advance
The specified options already make curl follow redirects. However, in the case of a long redirect chain, you may want to increase CURLOPT_MAXREDIRS.
You can use a packet sniffer such as Wireshark to check which requests cURL sends. It may simply be a bug in the scraped website that causes it to redirect infinitely.
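Short of a packet capture, the response itself already holds the chain: with CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION both on, as in the question, the string returned by curl_exec() starts with the header block of every hop. A rough sketch of pulling those out follows. The function name redirectTrail is made up, and it assumes header blocks are separated by a blank line and that the body itself does not start with "HTTP/".

```php
<?php
// Hypothetical helper: list each hop's status line and Location header from a
// raw response captured with CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION enabled.
function redirectTrail(string $raw): array
{
    $trail = [];
    // Peel header blocks off the front while the text still starts with a status line.
    while (strncmp($raw, 'HTTP/', 5) === 0) {
        $parts = explode("\r\n\r\n", $raw, 2);
        $lines = explode("\r\n", $parts[0]);
        $hop   = ['status' => $lines[0], 'location' => null];
        foreach (array_slice($lines, 1) as $line) {
            if (stripos($line, 'Location:') === 0) {
                $hop['location'] = trim(substr($line, 9));
            }
        }
        $trail[] = $hop;
        $raw = $parts[1] ?? '';
    }
    return $trail;
}
```

Running print_r(redirectTrail($result)) just before the die($result) line would show whether the 302 keeps pointing at the same URL (a loop) or leads somewhere new, such as a login page.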
