I made a cURL script, but it gets blocked once the website detects it as a bot.
How can I prevent this?
I tried the code below:
$ch = curl_init();
$proxy = "10.128.60.40:3128";
// needed to disable SSL checks for this site
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,0);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,0);
curl_setopt($ch,CURLOPT_VERBOSE, 0);
curl_setopt($ch,CURLOPT_AUTOREFERER, false);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0.1');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, "$proxy");
But the response I get back is still "you are blocked due to automated script".
How can I send requests from a changing (dynamic) IP to avoid this issue?
You should use one of the following:
1) Anonymous proxies (they die quickly, and you need to keep collecting fresh ones), OR
2) Tor: https://www.torproject.org, OR
3) Be less aggressive: add sleep(1); between requests.
A rough sketch combining 1) and 3) follows below.
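A minimal sketch, assuming you already have a list of working proxy addresses (the proxies and URLs below are placeholders):

// Hypothetical proxy list; replace with proxies you have verified yourself.
$proxies = array('10.128.60.40:3128', '10.128.60.41:3128', '10.128.60.42:3128');
$urls = array('http://example.com/page1', 'http://example.com/page2');
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0.1');
    // Rotate through the proxy list so the requests do not all come from one IP.
    curl_setopt($ch, CURLOPT_PROXY, $proxies[$i % count($proxies)]);
    curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
    $html = curl_exec($ch);
    curl_close($ch);
    // Slow down between requests so the traffic looks less like a bot.
    sleep(1);
}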
Using cURL, I'm trying to transfer a zip file from another server to mine.
After authentication, this second server gives me a URL whose query parameters act as credentials:
http://content.website.com/file_to_download.zip?nvb=20160622094506&nva=20160622095506&hash=089366e5fe3da46f9caf2
If I open this URL in another web browser, or in a private session, I can download the content without problems (it doesn't ask for a login), but if I pass the URL to my function I get an empty file.
This is my function:
function download ($zipUrl, $zipFilename){
$ch = curl_init();
$fp = fopen ($zipFilename, 'w+');
$ch = curl_init($zipUrl);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 0);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2'));
curl_setopt($ch, CURLOPT_ENCODING, "");
curl_setopt($ch, CURLOPT_POSTFIELDS, "nvb=20160622094506&nva=20160622095506&hash=089366e5fe3da46f9caf2");
curl_exec($ch);
curl_close($ch);
fclose($fp);
}
what's wrong?
If curl_exec() returns an error, you can read it with curl_error(). Also take a look at which HTTP status code the server returned, so you can see clearly what's going on.
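A minimal sketch of that, assuming it replaces the plain curl_exec() call inside your download() function:

$ok = curl_exec($ch);
if ($ok === false) {
    // curl_error() describes transport-level failures (DNS, SSL, timeouts, ...).
    echo 'cURL error: ' . curl_error($ch) . "\n";
}
// The HTTP status code shows whether the server refused the request (e.g. 401/403) or redirected it.
echo 'HTTP status: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
curl_close($ch);
fclose($fp);

Since the browser downloads that URL with a plain GET, dropping CURLOPT_POST and CURLOPT_POSTFIELDS (the credentials are already in the query string) may also be worth trying.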
I have a link I want to parse some information from, or just save to a file...
I can't manage to do it with this simple code:
Example:
<?php
$myFile = 'test.txt';
$get= file_get_contents("http://www.ticketmaster.com/json/resale?command=get_resale_listings&event_id=0C004B290BF2D95F");
file_put_contents($myFile, $get); ?>
The output is:
{"version":1.1,"error":{"invalid":{"cookies":true}},"command":"get_resale_listings"}
I tried many other things, like fopen() or include, but they did not work either. I don't understand it, because when I put the URL in the browser it shows exactly ALL the code (Google Chrome), or even better asks me to save it as a file (Internet Explorer). It looks like a browser cookie or something similar that doesn't get set from my localhost?
thanks for your tips.
You need to access that URL with cURL.
The server checks whether the client has cookies enabled. With file_get_contents() you do not send any information about the client (browser).
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.ticketmaster.com/json/resale?command=get_resale_listings&event_id=0C004B290BF2D95F');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// Store and resend cookies so the server sees a client with cookies enabled.
curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Send a browser-like User-Agent instead of the default PHP one.
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
// The browser fetches this URL with a GET request, so CURLOPT_POST is not needed.
$json = curl_exec($ch);
curl_close($ch);
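Since the endpoint returns JSON, you can decode the captured response rather than writing it straight to a file; a small sketch:

$data = json_decode($json, true);   // decode into an associative array
if (isset($data['error'])) {
    // e.g. {"error":{"invalid":{"cookies":true}}} means the cookie check still failed
    var_dump($data['error']);
} else {
    file_put_contents('test.txt', $json);
}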
I'm trying to automate login to an ASP.NET site using PHP & cURL, but I'm running into a cookie problem.
When I check in the browser, the initial login page stores 5 cookies: ASP.NET_SessionId, __utma, __utmb, __utmc & __utmz.
When this page is accessed via cURL, the cookie file stores only one cookie: "ASP.NET_SessionId".
I referred to many posts & tried all kinds of cURL option combinations, always with the same result.
I don't know how ASP.NET cookies work or how they differ from PHP's. Any help is appreciated.
Here is my PHP code:
$cookie_file_path = "tmp/cookie.txt";
$LOGINURL = "https://godaddy.com";
$agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_URL, $LOGINURL);
$content = curl_exec($ch);
curl_close($ch);
echo '<textarea style="width:1000px; height:300px">'.$content.'</textarea>';
__utma, __utmb, __utmc & __utmz are all Google Analytics cookies set by JavaScript, so they are created client-side.
That means there is no way to obtain them through cURL / PHP: the cookie jar only records cookies the server sends in Set-Cookie response headers.
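If you want to confirm that, you can log which Set-Cookie headers the server actually sends; a minimal sketch, using the same $ch as in your script in place of the plain curl_exec() call:

$setCookies = array();
// Inspect every response header line; only cookies sent via Set-Cookie ever reach the cookie jar.
curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $line) use (&$setCookies) {
    if (stripos($line, 'Set-Cookie:') === 0) {
        $setCookies[] = trim($line);
    }
    return strlen($line);   // the callback must return the number of bytes it handled
});
$content = curl_exec($ch);
print_r($setCookies);       // typically only ASP.NET_SessionId shows up; the __utm* cookies never will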
I am working on a scraping script. It works on most websites but I cannot access a specific SSL site.
Here is my code:
if (!extension_loaded('openssl')){
// not occurring
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.chase.com/');
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$result = curl_exec($ch);
if($result === false)
{
$err = curl_error($ch);
//$err = SSL read: error:00000000:lib(0):func(0):reason(0), errno 10054
}
$result is always FALSE, and it shows this error message:
SSL read: error:00000000:lib(0):func(0):reason(0), errno 10054
But it works on other websites that have SSL. I also checked phpinfo(): cURL and OpenSSL are both enabled. I am using WAMP. Any ideas?
You need to set a User-Agent. I tested with and without one, and adding it fixes the issue. It appears Chase requires a UA to be provided in the request.
So add this:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 9.0; WIndows NT 9.0; en-US)');
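In context, a sketch based on the code from the question with the User-Agent added:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.chase.com/');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Without a User-Agent header the server drops the connection (errno 10054).
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 9.0; WIndows NT 9.0; en-US)');
$result = curl_exec($ch);
if ($result === false) {
    echo curl_error($ch);
}
curl_close($ch);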
I solved the issue by just using the following PHP library:
https://github.com/rmccue/Requests
[Use this library on your Linux-based server; it may not work on XAMPP or WAMP.]
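A rough sketch of what that can look like with Requests (the require path depends on where the library is installed):

require_once 'Requests/library/Requests.php';   // adjust to your install location, or use Composer's autoloader
Requests::register_autoloader();
$headers = array('User-Agent' => 'Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0');
$response = Requests::get('https://www.chase.com/', $headers);
echo $response->status_code . "\n";
echo $response->body;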
I'm only able to fetch from a certain site when I use cURL with a proxy. cURL without a proxy and file_get_contents() both return nothing (cURL HTTP code "0" and curl_error() says "Empty reply from server"). I'm able to fetch other sites just fine without a proxy.
Aside from being blocked, is there any other possible explanation of why I can only access this site via proxy?
Did you set a USER AGENT in cURL? Sometimes websites will block you if your USER AGENT isn't set or if your HTTP request looks suspicious.
To set your USER AGENT in PHP:
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
Is this from your workplace or something? Many companies disable file_get_contents() on shared PHP installs, as it's quite risky.
The site probably has user-agent detection. You can fake that in your cURL call (and, with a stream context, in file_get_contents() as well). Another method sites use is to only serve content once a cookie has been set, so simple scrapers never see the data.
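For reference, a small sketch of sending a custom User-Agent with file_get_contents() via a stream context:

$context = stream_context_create(array(
    'http' => array(
        // Same idea as CURLOPT_USERAGENT: present ourselves as a normal browser.
        'user_agent' => 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)',
    ),
));
$html = file_get_contents('http://example.com/', false, $context);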
Try this:
function curl_scrape($url, $data, $proxy, $proxystatus)
{
    // Start with an empty cookie file for this session.
    $fp = fopen("cookie.txt", "w");
    fclose($fp);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    // Fixed browser User-Agent so the request doesn't look like a default PHP client.
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    curl_setopt($ch, CURLOPT_TIMEOUT, 40);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    if ($proxystatus == 'on') {
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
        curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    $result = curl_exec($ch);   // execute the request and capture the output
    curl_close($ch);
    return $result;
}
I'm guessing I was truly blocked. I'm using a proxy now and it works fine.