The browser sends the following headers with the POST request:
POST http://www.autonavigator.ru/dispatcher.pl HTTP/1.1
Host: www.autonavigator.ru
Connection: keep-alive
Content-Length: 55
Origin: http://www.autonavigator.ru
X-Request: JSON
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36
Content-type: application/x-www-form-urlencoded; charset=UTF-8
Accept: application/json
X-Requested-With: XMLHttpRequest
Referer: http://www.autonavigator.ru/my/offer_add/
Accept-Encoding: gzip,deflate
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4
Cookie: region_id=45; city_id=22; user_name=rora%40gmail.com; user_type=user; user_offer_count=1; user_message_count=0; user_no_confirm=1; session_id=WR9q4d41DgD7biTOOsMzgtXfJm83VFQn; USession=WR9q4d41DgD7biTOOsMzgtXfJm83VFQn; _ym_visorc_5781676=b
class=list&method=make&show_all=1&vehicle=car&type=used
I would like to emulate the browser with cURL.
For this I use the following code:
$ch = curl_init('http://www.autonavigator.ru/dispatcher.pl');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36");
$headers = array
(
'Accept: application/json',
'Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4',
'Accept-Encoding: gzip,deflate',
'Accept-Charset: windows-1251,utf-8;q=0.7,*;q=0.7'
);
curl_setopt($ch, CURLOPT_HTTPHEADER,$headers);
curl_setopt($ch, CURLOPT_REFERER, "http://www.autonavigator.ru/my/offer_add/");
curl_setopt($ch, CURLOPT_POSTFIELDS, 'class=list&method=make&show_all=1&vehicle=car&type=used');
$result = curl_exec($ch);
curl_close($ch);
var_dump($result);
But the result contains errors (http://i.stack.imgur.com/zWkdP.png).
Can you tell me where the error in the code is and how to fix it?
Most likely the content is gzipped, so you just need to add:
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
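To see why the raw response looks garbled, here is a minimal sketch that needs no network access. It assumes PHP's zlib extension is available, and the JSON payload is made up for illustration. When `Accept-Encoding: gzip` is sent by hand but CURLOPT_ENCODING is not set, cURL hands back the compressed bytes untouched, which is what gzencode() simulates here.

```php
<?php
// Hypothetical payload standing in for the server's JSON reply.
$json = '{"status":"ok","items":["Audi","BMW"]}';

// What a server sends when the client advertises gzip support.
$compressed = gzencode($json);

// A gzip stream always starts with the magic bytes 0x1f 0x8b.
$looksGzipped = strncmp($compressed, "\x1f\x8b", 2) === 0;

// Decoding by hand recovers the original text; CURLOPT_ENCODING makes
// cURL do this step for you.
$decoded = $looksGzipped ? gzdecode($compressed) : $compressed;

var_dump($looksGzipped);
echo $decoded, PHP_EOL;
```

If the dumped result starts with those two bytes, the body is compressed and setting CURLOPT_ENCODING should be enough.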
I use the PHP cURL code from this answer https://stackoverflow.com/a/46834320/12616388. When I run the script on localhost I get the desired output. If I run it from my web server, I retrieve a captcha to verify that I am not a bot. I am new to this topic and would like to know the cause. My code:
$request = array();
//$request[] = 'host:www.amazon.com';
$request[] = 'Connection: keep-alive';
$request[] = 'Pragma: no-cache';
$request[] = 'Cache-Control: no-cache';
$request[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
$request[] = 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0';//Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36';
$request[] = 'DNT: 1';
$request[] = 'Accept-Encoding: gzip, deflate';
$request[] = 'Accept-Language: en-US,en;q=0.8';
$url = 'https://www.amazon.de/Wenn-Dunkeln-Sterne-funkeln-Puste-Licht-Buch/dp/3480236529/ref=sr_1_3?keywords=buch&qid=1670662644&sr=8-3';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_POST, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
curl_setopt($ch, CURLOPT_ENCODING,"");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_FAILONERROR,true);
$output = curl_exec($ch);
EDIT:
I slightly modified the code (random user agent string and multiple cURL requests in a loop), but the problem is the same: on localhost there are no problems, while on the web server I get the captchas.
$user_agents = array(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36',
    'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:89.0) Gecko/20100101 Firefox/89.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0',
    'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0'
);
foreach ($products as $key => $value) {
$request = array();
$request[] = 'Connection: keep-alive';
$request[] = 'Pragma: no-cache';
$request[] = 'Cache-Control: no-cache';
$request[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
$request[] = 'User-Agent: ' . $user_agents[array_rand($user_agents)];
$request[] = 'DNT: 1';
$request[] = 'Accept-Encoding: gzip, deflate';
$request[] = 'Accept-Language: en-US,en;q=0.8';
$url = $value['url'];
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_POST, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
curl_setopt($ch, CURLOPT_ENCODING,"");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_FAILONERROR,true);
$output = curl_exec($ch);
...
}
Since it only gets triggered when you're on the server, the captcha probably tracks IP addresses. Any chance it's a Recaptcha?
Whatever the captcha is, one thing that could help is solving the captcha from the webserver's IP address.
If the webserver has a desktop environment, connect via VNC (or whatever you usually use for connecting), open a browser and solve the captcha.
If it does not, try setting up a VPN server on the webserver (this one seems easy enough), connect to the VPN from your computer (and thus get the same IP address as your webserver), open a browser and solve the captcha.
Another option is creating a proxy server, which will achieve a result similar to a VPN.
Sadly, you'll have to do it from time to time, because that's exactly what a captcha is for: preventing automated scraping of websites by bots.
To fix this, you can try to include additional headers or cookies in your cURL request to make it appear more like a real user. For example, you could include the User-Agent header to specify the browser and operating system that your cURL request is coming from, and also you could include the Cookie header to include cookies that are typically sent by a real user.
For example:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Include additional headers
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
'Cookie: __cfduid=<cookie-data-goes-here>'
));
$response = curl_exec($ch);
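The value of a Cookie header is a list of name=value pairs separated by "; ". As a small sketch, the helper below assembles such a value from an array. The function name build_cookie_header and the cookie values are made up for illustration, and URL-encoding the values is an assumption that matches how many sites store them.

```php
<?php
// Hypothetical helper: turns ['name' => 'value'] pairs into a Cookie header value.
function build_cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        // Cookie values are commonly URL-encoded, e.g. "@" becomes "%40".
        $pairs[] = $name . '=' . rawurlencode((string) $value);
    }
    return implode('; ', $pairs);
}

// Made-up values for illustration only.
$cookieHeader = build_cookie_header([
    'region_id' => 45,
    'user_name' => 'user@example.com',
]);

echo $cookieHeader, PHP_EOL; // region_id=45; user_name=user%40example.com
```

The resulting string can be passed to CURLOPT_COOKIE, or added to the header list as 'Cookie: ' . $cookieHeader.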
I'm trying to make this work, but it only works while the two lines below stay commented out. If I enable them, it stops working with an Error 404 (Not Found). Why?
Thanks in advance.
$headers = [
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Encoding: gzip, deflate, br',
'Accept-Language: pt-PT,pt;q=0.9,en-US;q=0.8,en;q=0.7',
'Cache-Control: no-cache',
'Content-Type: application/x-www-form-urlencoded; charset=utf-8',
'Host: httpbin.org',
'Referer: http://www.google.com',
'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
'X-MicrosoftAjax: Delta=true',
'X-Amzn-Trace-Id: Root=1-60e1dc05-44eb1f0a7cff152139d79c76'
];
$url = $_POST['URL'];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
//curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36');
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com');
//curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$get = curl_exec($ch);
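One thing worth checking, offered as an assumption since the thread does not confirm the cause: the header list hard-codes `Host: httpbin.org`, while `$url` comes from user input. If the URL points at a different host, that server receives a request for a virtual host it may not serve, and a 404 is a plausible outcome. The sketch below drops the fixed Host header so that cURL derives it from the URL; the helper name without_host_header is hypothetical.

```php
<?php
// Hypothetical helper: removes any hand-written Host header so cURL can
// set it from the request URL.
function without_host_header(array $headers): array
{
    return array_values(array_filter($headers, function ($line) {
        // Header names are case-insensitive, hence stripos().
        return stripos($line, 'Host:') !== 0;
    }));
}

$headers = [
    'Accept-Language: pt-PT,pt;q=0.9,en-US;q=0.8,en;q=0.7',
    'Host: httpbin.org',
    'Referer: http://www.google.com',
];

$safeHeaders = without_host_header($headers);
print_r($safeHeaders);
```

Passing $safeHeaders to CURLOPT_HTTPHEADER instead of $headers is a quick way to test whether the Host header is involved.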
I have been scraping a redirection result with cURL while spoofing the browser headers. After almost a year of getting what I wanted without problems, it suddenly stopped working. In the browser it works perfectly, but with cURL I am redirected to the wrong URL and only get a URL that returns a 404 error on another server. Here is my code.
function curl_spoofred($url)
{
$curl = curl_init();
//set some headers if you want
$header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: 300";
curl_setopt($curl, CURLOPT_URL, $url);
//Spoof the agent
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36');
//Spoof the Referer
curl_setopt($curl, CURLOPT_REFERER, 'https://wwv.example.com');
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl, CURLOPT_COOKIESESSION, true);
curl_setopt($curl, CURLOPT_FAILONERROR, true);
curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate,br');
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_TIMEOUT, 10);
if (!$html = curl_exec($curl))
{
}
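$effectiveUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
// Close the handle before returning; code placed after return never runs.
curl_close($curl);
return $effectiveUrl;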
}
The network console in Chrome shows this:
Request URL: https://www.example.com/video.php?p=2&c=V1RKa2RHTlRhXbTFhUbmtvMVRWYzRQUT09&id=631
Request Method: GET
Status Code: 301
Remote Address: 104.26.5.130:443
Referrer Policy: no-referrer-when-downgrade
cache-control: max-age=3600
cf-ray: 51b45707fb1969aa-CDG
date: Tue, 24 Sep 2019 11:15:20 GMT
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
expires: Tue, 24 Sep 2019 12:15:20 GMT
location: https://www.example.com/video.php?p=2&c=V1RKa2RHTlRhXbTFhUbmtvMVRWYzRQUT09&id=631
server: cloudflare
status: 301
vary: Accept-Encoding
:authority: www.example.com
:method: GET
:path: /video.php?p=2&c=V1RKa2RHTlRhXbTFhUbmtvMVRWYzRQUT09&id=631
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
accept-encoding: gzip, deflate, br
accept-language: fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7,ar;q=0.6
cookie: __cfduid=d948bdac17ae9dca8577dffc6dc3509cd1565743125; _ga=GA1.2.1798570479.1565743125; HstCfa2982759=1565743132096; __dtsu=3DD172A73239535D5B772D7602A61C9A; HstCmu2982759=1568475393523; HstCla2982759=1568892578738; HstPn2982759=1; HstPt2982759=96; HstCnv2982759=18; HstCns2982759=34; _gid=GA1.2.1495236200.1569313316; _gat_gtag_UA_138212094_1=1
referer: https://wwv.example.com/some-article.htm
sec-fetch-mode: nested-navigate
sec-fetch-site: same-site
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36
p: 2
c: V1RKa2RHTlRhXbTFhUbmtvMVRWYzRQUT09
id: 631
I hope you can help me with this. Regards.
I have this code
<?php
$mLoginUrl = "https://www.test.com/login";
$mCookieFile = dirname(__FILE__).'/tmpCookies/cookie'.rand().'.txt';
define('USER_AGENT', 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2309.372 Safari/537.36');
define('COOKIE_FILE', $mCookieFile);
define('LOGIN_FORM_URL', $mLoginUrl);
define('LOGIN_ACTION_URL', $mLoginUrl);
$postValues = array(
'user[email]' => "mymail#email.com",
'user[password]' => "mypassword"
);
$headers = Array(
"Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
"Cache-Control: max-age=0",
"Connection: keep-alive",
"Keep-Alive: 300",
"Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
"Accept-Language: en-us,en;q=0.5",
"Pragma: "
);
$curl = curl_init();
curl_setopt($curl, CURLOPT_VERBOSE, TRUE);
curl_setopt($curl, CURLOPT_URL, LOGIN_ACTION_URL);
curl_setopt($curl, CURLOPT_POST, TRUE);
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($postValues));
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER,FALSE);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, FALSE);
curl_setopt($curl, CURLOPT_COOKIEJAR, COOKIE_FILE);
curl_setopt($curl, CURLOPT_COOKIEFILE, COOKIE_FILE);
curl_setopt($curl, CURLOPT_USERAGENT, USER_AGENT);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_REFERER, LOGIN_FORM_URL);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_HTTPHEADER,$headers) ;
curl_setopt($curl, CURLINFO_HEADER_OUT, TRUE);
$res = curl_exec($curl);
$info = curl_getinfo($curl);
print_r($info['request_header']);
exit;
?>
This works fine on my local computer and on one of my servers, and shows the following output:
GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2309.372 Safari/537.36
Host: www.test.com
Referer: https://www.test.com/login
Cookie: _property_session=WjROWEYvTHNYaE5Zb29jVk04WGM0Z3FybmhmY1ZIeVdBc2N6d2N3UmViaXlZdFNhR1dSbUN4QVh6aFFSRjFPYktybmdnRGlXNG0yWWcremEzcklKTnE1ZE1lTTM0eUQrSG90SVhRRzhvYW5rWmFQTVhBMjVCWjBtb1FSc0RrTEh2RjhHSFI3aHkwa3U4N3Y3czJhTzJuN2ZGbWRRN0Nra2Z6OTR4aHhvVG42bVVRS3kwTExUL1hMN2JoZ0xRd2g3VVdIMC81cGhLQzJjOTJvc2RYajIwakE0VjZqRnhTeHBleFltTGF4Z3hpUGJEb0E3Nlo2S3BwMElqNnVkaWhDVS0tc0pCRlozSVE5bXRHQXlHWE1IbTl4UT09--67cf6e056b84b4cae4d275507f544927802eb78d
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Cache-Control: max-age=0
Connection: keep-alive
Keep-Alive: 300
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Language: en-us,en;q=0.5
This means that the cookie was created and attached to the cURL request headers, as the header dump above shows. The cookie file is also created at the specified location.
But on another of my servers the code does not work as expected and gives the following output:
POST /users/sign_in? HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2309.372 Safari/537.36
Host: www.test.com
Referer: https://www.test.com/login
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Cache-Control: max-age=0
Connection: keep-alive
Keep-Alive: 300
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Language: en-us,en;q=0.5
Content-Length: 76
Content-Type: application/x-www-form-urlencoded
This means the cookie information is not attached to the cURL request headers. I have checked that the cookie file is created on this (problematic) server too and that it contains the cookie, but the cookie is still not included in the cURL headers. The temporary cookie directory has full permissions (777) for all users.
PHP version is 5.4.19 and CURL version is 7.19.7 on problematic server.
Can anybody help? I have tried all of the solutions I found on the internet.
Thanks in advance.
$mCookieFile = dirname(__FILE__).'/tmpCookies/cookie'.rand().'.txt';
Remove rand() so that the cookie file has a static name.
Thanks, all, for your help. I have solved the issue. The problem was the open_basedir path. I set it to "none" on the server, which fixed the problem.
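For anyone hitting the same thing, a quick way to check is to compare the cookie file path against the open_basedir setting. The sketch below is a simplified illustration: the helper name path_allowed_by_basedir and the example paths are made up, and it only compares path strings without resolving symlinks the way PHP itself does.

```php
<?php
// Hypothetical helper: reports whether $path lies inside one of the
// directories listed in an open_basedir-style setting.
function path_allowed_by_basedir(string $path, string $openBasedir): bool
{
    // An empty setting means no restriction is active.
    if ($openBasedir === '') {
        return true;
    }
    foreach (explode(PATH_SEPARATOR, $openBasedir) as $dir) {
        if ($dir === '') {
            continue; // Ignore empty entries such as a trailing separator.
        }
        $dir = rtrim($dir, '/') . '/';
        if (strpos($path, $dir) === 0) {
            return true;
        }
    }
    return false;
}

// In a real script the second argument would be ini_get('open_basedir').
var_dump(path_allowed_by_basedir('/var/www/app/tmpCookies/cookie.txt', '/var/www/app'));
var_dump(path_allowed_by_basedir('/var/www/app/tmpCookies/cookie.txt', '/srv/other'));
```

If the check fails for the cookie path, adding that directory to open_basedir may be a narrower fix than disabling the restriction entirely.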
cURL doesn't send the cookie from my dev server, but when I run the script from another server, it works well. I can't understand what's wrong with the dev server.
$curl=curl_init($request);
//$cook = './cook/1.txt';
//curl_setopt($curl, CURLOPT_COOKIEJAR, $cook);
//curl_setopt($curl, CURLOPT_COOKIEFILE, $cook)
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36");
curl_setopt($curl, CURLOPT_HEADER, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER,true);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
//curl_setopt($curl, CURLOPT_ENCODING, 'UTF-8');
curl_setopt($curl, CURLOPT_VERBOSE,1);
curl_setopt($curl, CURLINFO_HEADER_OUT,1);
curl_setopt($curl, CURLOPT_COOKIE, "departureCity=2; path=/;");
$out = curl_exec($curl);
$info = curl_getinfo($curl);
curl_close($curl);
$info:
Array
(
[request_header] => GET /hotels/greece/showhotel/12_islands_villas_26 HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36
Host: SITE
Accept: */*
Referer: SITE
)
There is no cookie data in the header.