I'm trying to submit a form to an .aspx page with curl and then do something with the response. The problem is that my code works when I submit it from my local XAMPP server, but when submitted from the web server I get "HTTP Error 400. The request URL is invalid."
I tried removing the CURLOPT_POST option, as suggested somewhere on SO. I also tried URL-encoding, but then I get nothing.
$url = "http://www.somepage.com/locations/default.aspx#location_page_map";
$kv[]='search=92627';
$kv[]='__VIEWSTATE';
$kv[]='__EVENTTARGET';
$query_string = join("&", $kv);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_POST, count($kv));
curl_setopt($ch, CURLOPT_POSTFIELDS, $query_string);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$output = curl_exec($ch);
var_dump($output);
curl_close($ch);
You can most likely leave out __VIEWSTATE and __EVENTTARGET; they probably have something to do with ASP.NET's form value persistence. You can also remove #location_page_map, as that only scrolls the page to the map section and will not affect the results from the site you're trying to scrape. Then use http_build_query() to turn the array into a string for curl.
<?php
//$url = "http://www.myfitfoods.com/locations/default.aspx#location_page_map";
$url = "http://www.somepage.com/locations/default.aspx"; // fragment (#location_page_map) removed
$kv['search'] = '92627';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($kv));
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$output = curl_exec($ch);
var_dump($output);
curl_close($ch);
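To check what will actually be sent before involving the network, the two clean-up steps described above (dropping the fragment and building the body with http_build_query()) can be sketched on their own. The helper name strip_fragment() is made up for this illustration and is not part of curl or PHP.

```php
<?php
// Hypothetical helper: drop everything from '#' onward. The fragment is a
// client-side anchor and does not belong in the URL sent to the server.
function strip_fragment(string $url): string
{
    $pos = strpos($url, '#');
    return $pos === false ? $url : substr($url, 0, $pos);
}

$url  = strip_fragment("http://www.somepage.com/locations/default.aspx#location_page_map");
$body = http_build_query(array('search' => '92627'));

echo $url, "\n";  // http://www.somepage.com/locations/default.aspx
echo $body, "\n"; // search=92627
```

The resulting $url and $body are what would be handed to CURLOPT_URL and CURLOPT_POSTFIELDS respectively.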
You haven't defined your $kv array properly. Curl will take an array, but it has to be in key => value format. All you've provided is 3 values, so you'd actually be passing
=search%3D92627&=__VIEWSTATE&=__EVENTTARGET
^--- no key     ^--- no key  ^--- no key
Try:
$kv = array(
    'search' => 92627,
    'x' => '__VIEWSTATE',
    'y' => '__EVENTTARGET'
);
curl_setopt($ch, CURLOPT_POSTFIELDS, $kv);
or similar instead.
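The difference between a list of values and key => value pairs is easy to see by encoding both with http_build_query(). This is only a sketch: the second array assumes __VIEWSTATE and __EVENTTARGET are form field names sent with empty values, which the question's code suggests but does not confirm.

```php
<?php
// The question's array: values only, so PHP assigns the keys 0, 1, 2.
$indexed = array('search=92627', '__VIEWSTATE', '__EVENTTARGET');

// Key => value pairs, one entry per form field (empty values are an assumption).
$assoc = array('search' => '92627', '__VIEWSTATE' => '', '__EVENTTARGET' => '');

echo http_build_query($indexed), "\n"; // 0=search%3D92627&1=__VIEWSTATE&2=__EVENTTARGET
echo http_build_query($assoc), "\n";   // search=92627&__VIEWSTATE=&__EVENTTARGET=
```

Only the second form produces field names the server can recognise.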
Related
I'm trying to cURL this URL and I can't figure out what I'm doing wrong. I'd really appreciate some help!
Here's my code (with my API Token taken out)
$url ='https://app.files.com/api/rest/v1/users/0.json';
$header = array("Accept: application/json, X-FilesAPI-Key: FakeKeyGoesHere");
$ch = curl_init();
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$retValue = curl_exec($ch);
print_r($retValue);
There are two problems:
The $header is in the wrong form. It must be an array with one element per header, but you have them all together in one string. Try:
$header = array("Accept: application/json", "X-FilesAPI-Key: FakeKeyGoesHere");
Do not use curl_setopt($ch, CURLOPT_ENCODING, "gzip");. If the data comes in another encoding, the result will be a mess. Use curl_setopt($ch, CURLOPT_ENCODING, ""); so that curl advertises and decodes every encoding it supports.
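One way to avoid cramming several headers into a single string is to keep them as name => value pairs and convert them just before the request. The helper build_header_lines() below is a hypothetical name used only for this sketch.

```php
<?php
// Hypothetical helper: turn name => value pairs into the list of
// "Name: value" strings that CURLOPT_HTTPHEADER expects (one per element).
function build_header_lines(array $headers): array
{
    $lines = array();
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

$header = build_header_lines(array(
    'Accept'         => 'application/json',
    'X-FilesAPI-Key' => 'FakeKeyGoesHere',
));

print_r($header); // two elements, one per header
```

The resulting $header can be passed to curl_setopt($ch, CURLOPT_HTTPHEADER, $header) unchanged.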
I'm using the same code to get the price from several web pages (7 in particular). All work perfectly except one, from which I can't get any data. Could you tell me whether it is impossible, or whether the page has some protection? Thanks in advance.
$source = file_get_contents("https://www.cyberpuerta.mx/Computo-Hardware/Discos-Duros-SSD-NAS/Discos-Duros-Internos-para-PC/Disco-Duro-Interno-Western-Digital-Caviar-Blue-3-5-1TB-SATA-III-6-Gbit-s-7200RPM-64MB-Cache.html");
preg_match("'<span class=\"priceText\">(.*?)</span>'", $source, $price);
echo $price[1];
I expect this result:
$869.00
The code fails only on the website shown above.
Use curl with a user agent set; this usually makes the website's protection treat the request as coming from a real browser.
$URL = "https://www.cyberpuerta.mx/Computo-Hardware/Discos-Duros-SSD-NAS/Discos-Duros-Internos-para-PC/Disco-Duro-Interno-Western-Digital-Caviar-Blue-3-5-1TB-SATA-III-6-Gbit-s-7200RPM-64MB-Cache.html";
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, $URL);
$result =curl_exec($ch);
preg_match("'<span class=\"priceText\">(.*?)</span>'", $result, $price);
echo $price[1];
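If the site still blocks the request, preg_match() finds nothing and $price[1] is undefined, so it may help to check the match before echoing. The sketch below uses a made-up HTML string in place of the downloaded page; the real markup may differ.

```php
<?php
// Stand-in for the downloaded page (an assumption, not the site's real HTML).
$result = '<div><span class="priceText">$869.00</span></div>';

// preg_match() returns 1 on a match, 0 on no match.
if (preg_match('~<span class="priceText">(.*?)</span>~', $result, $price)) {
    echo $price[1], "\n"; // $869.00
} else {
    echo "price not found\n";
}
```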
I am trying to get web page content with curl from some websites, but they return 400 Bad Request (file_get_contents returns empty). Here's the function I am using:
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
Put an error_reporting(E_ALL); line at the top of the file where you are calling this function.
It will show the cause of the error.
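PHP's error reporting covers PHP-level warnings; curl's own transport errors and the HTTP status code are available through curl_error() and curl_getinfo(). A minimal sketch of the question's function extended to return them follows; the name fetch_with_diagnostics() and the shape of the returned array are choices made for this example only.

```php
<?php
// Hypothetical variant of file_get_contents_curl() that also returns
// the HTTP status code and curl's own error message.
function fetch_with_diagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_ENCODING, '');   // accept any encoding curl supports
    $body = curl_exec($ch);                    // false on transport failure
    return array(
        'body'   => $body,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => curl_error($ch),           // empty string when no error occurred
    );
}
```

Printing the result with print_r(fetch_with_diagnostics($url)) shows whether the 400 comes from the server (status set, error empty) or whether the request never completed (body false, error filled in).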
$loginUrl = 'http://mp3.zing.vn/json/song/get-source/ZmJmTknNCBmLNzHtZbxtvmLH';
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL,$loginUrl);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
$result=curl_exec($ch);
curl_close($ch);
var_dump(json_decode($result));
I have a problem getting the data with curl. If I open the URL in my browser it returns the data, but here var_dump shows null. I have consulted some posts on Stack Overflow but I can't solve this problem.
Where is my mistake? Please help. Thanks
The URL is invalid, i.e. the path in the variable $loginUrl does not exist.
$loginUrl = 'http://mp3.zing.vn/json/song/get-source/ZmJmTknNCBmLNzHtZbxtvmLH';
If you visit URL:
https://selfsolve.apple.com/agreementWarrantyDynamic.do?caller=sp&sn=990002316140324
then it will redirect and results will be shown at URL:
https://selfsolve.apple.com/wcResults.do
I'm trying to get these results with PHP cURL, but the page is empty. It's not redirecting.
Here is my code which I tried:
<?php
$url ='https://selfsolve.apple.com/agreementWarrantyDynamic.do?caller=sp&sn=990002316140324';
$http_headers = array(
'Accept: /*',
'Connection: keep-alive'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $http_headers);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/applecookie.txt');
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
$retValue = curl_exec($ch);
$response = json_decode(curl_exec($ch));
$ee = curl_getinfo($ch);
print_r($ee);
print_r($retValue);
?>
How to make it work?
==== (Possible) Issue: Your PHP configuration has safe_mode or open_basedir enabled.
CURLOPT_FOLLOWLOCATION (integer) This constant is not available when
open_basedir or safe_mode are enabled. (http://php.net/manual/en/curl.constants.php)
==== (Possible) Issue: The remote service isn't responding as you expect. Break it down into individual parts and log the output, or check Google Chrome (or similar) for the redirect:
A-ha! Chrome shows that there is no redirect!
In PHP this might look something like the below. This code cycles through the redirect chain manually and gives you a chance to inspect the responses along the way:
(see code below)
==== Issue: You are executing the request twice (you probably noticed this!):
$retValue = curl_exec($ch);
$response = json_decode(curl_exec($ch));
==== Issue: You are expecting to json_decode an HTML response. This will not work (and can't be expected to).
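What happens when json_decode() is given HTML can be checked in isolation. The string below is a stand-in for the page's response, not the actual content returned by the site.

```php
<?php
$html = '<html><body>Not JSON</body></html>'; // stand-in for an HTML response

$decoded = json_decode($html);

var_dump($decoded);                               // NULL
var_dump(json_last_error() !== JSON_ERROR_NONE);  // bool(true)
echo json_last_error_msg(), "\n";                 // human-readable reason
```

Checking json_last_error() after decoding distinguishes a failed parse from a response that legitimately contains the JSON value null.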
IN SHORT
It looks like there is a redirect in JavaScript that this page is using, as opposed to normal header redirects. You might have to rethink your approach as you'll probably struggle to extract this information from the page, and it's certainly going to be subject to change. (It's actually submitting a form to the next URL so you'll have to work out where the data is from -- again, check the Chrome log).
(footnote) And the code that will help you spot this in PHP (for this URL it returns 200 straight away -- there is no redirect!):
<?php
$url = 'https://selfsolve.apple.com/agreementWarrantyDynamic.do?caller=sp&sn=990002316140324';
$http_headers = array(
'Accept: */*',
'Connection: keep-alive',
'Accept-Encoding:gzip, deflate, sdch',
'Accept-Language:en-US,en;q=0.8,es;q=0.6'
);
$finished = false;
$count = 0;//we don't want to redirect forever!
$currentUrl = $url;
while( $finished == false && $count < 10 ) {
$count++;
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $http_headers);
curl_setopt($ch, CURLOPT_URL, $currentUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
// not while we're testing: //curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
$retValue = curl_exec($ch);
$info = curl_getinfo($ch);
$responseCode = $info['http_code'];
// 3xx status codes that point to another URL via a Location header
if (in_array($responseCode, array(301, 302, 303, 307, 308), true)) {
echo "\n redirecting ($responseCode) to ".$info['redirect_url'];
$currentUrl = $info['redirect_url'];
} else {
$finished = true;
echo "\n finished ($responseCode) content length:".strlen($retValue);
}
}
//now try the whole thing
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $http_headers);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
$retValue = curl_exec($ch);
$info = curl_getinfo($ch);
echo "\nWhole request: finished (".$info['http_code'].") content length:".strlen($retValue). " total redirects:".$info['redirect_count'];
echo "\n\n";
Output:
finished (200) content length:4833
Whole request: finished (200) content length:4833 total redirects:0