cURL returns null array - php

I have made a simple web Crawler with PHP cURL that should grab all the images of a particular page from Amazon where the keyword samsung has been searched.
Here is the code:
$curl = curl_init(); // $curl is going to be data type curl resource
$search_string = "samsung";
$url = "https://www.amazon.com/s?k$search_string";
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // ssl
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // storing in variable
$result = curl_exec($curl);
preg_match_all("!https://m.media-amazon.com/images/I/[^\s]*?._AC_UL320_.jpg!", $result, $matches);
print_r($matches);
curl_close($curl);
But now I get Null array:
Array ( [0] => Array ( ) )
I don't know why it is showing that, so if you know what is going wrong or how I can handle this, please let me know. I would really appreciate any ideas.
Thanks in advance.
Note that I have specified [^\s]*? regular expression instead of image name to load all the available images on web page.
UPDATE #1:
Results of curl --head https://www.amazon.com/s?k=samsung
HTTP/1.1 503 Service Unavailable
Content-Type: text/html
Content-Length: 2671
Connection: keep-alive
Server: Server
Date: Tue, 15 Jun 2021 20:59:38 GMT
x-amz-rid: 9BVX8KQMWJ4QDJ75ETYV
Vary: Content-Type,Accept-Encoding,X-Amzn-CDN-Cache,X-Amzn-AX-Treatment,User-Agent
Last-Modified: Fri, 14 May 2021 19:08:48 GMT
ETag: "a6f-5c24ef9383000"
Accept-Ranges: bytes
Strict-Transport-Security: max-age=47474747; includeSubDomains; preload
Permissions-Policy: interest-cohort=()
X-Cache: Error from cloudfront
Via: 1.1 5345148f0ba8ae3c67b69d035acdbfc5.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: AMS50-C1
X-Amz-Cf-Id: AHdq2-QLEtCE4WvXZIEh_P75D8hCrHP09EAkNqBer5VBS-pI-blj1w==

First issue: Your code:
$url = "https://www.amazon.com/s?k$search_string";
should be (note the "=")
$url = "https://www.amazon.com/s?k=$search_string";
Second issue: Amazon is smart, they're not going to let you scrape as you will. The result is the content for:
You can see this with:
$result = curl_exec($curl);
var_dump($result);
Third issue: the regex is not working. You can test a regex at https://www.phpliveregex.com/#tab-preg-match-all
(Right-click > View Source, then copy and paste the page content.)
In my test your regex did not return any results, but this did: https://m.media-amazon.com/images/I/[^\s]*?.jpg
It may be that the ._AC_UL320_ part of the string is also an Amazon anti-scraping measure... :(
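A minimal sketch of the looser pattern above, with the literal dots escaped so that "." only matches a real dot. The sample HTML is made up for illustration (the two URLs are taken from output posted elsewhere in this thread); a real page would come from curl_exec().

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<img src="https://m.media-amazon.com/images/I/81HdcaHSq4L._AC_UY218_.jpg"> '
      . '<img src="https://m.media-amazon.com/images/I/216-OX9rBaL._SS72_.png">';

// Dots are escaped; the character class also stops at quotes so a match
// cannot run past the end of one src attribute into the next.
$pattern = '!https://m\.media-amazon\.com/images/I/[^\s"\']+?\.jpg!';

preg_match_all($pattern, $html, $matches);
print_r($matches[0]); // only the .jpg URL is expected here
```

Dropping the size suffix from the pattern means it keeps working if Amazon serves a different thumbnail variant.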

It's not https://www.amazon.com/s?k$search_string; it's supposed to be 'https://www.amazon.com/s?k='.urlencode($search_string);
Amazon.com also requires you to send an Accept-Encoding header, otherwise you risk getting gzip-compressed responses with nothing to decompress them, which means you need CURLOPT_ENCODING.
Amazon will also block you if you don't supply a User-Agent header, so you must supply CURLOPT_USERAGENT.
Amazon will also block you without a browser-like Accept header, so you need CURLOPT_HTTPHEADER => array('accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng')
Also, do not parse HTML with regex. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts.
Instead, use an HTML parser like DOMDocument.
This code
<?php
$curl = curl_init(); // $curl is going to be data type curl resource
$search_string = "samsung";
$url = "https://www.amazon.com/s?k=".urlencode($search_string);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // ssl
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // storing in variable
curl_setopt_array($curl,array(
CURLOPT_ENCODING =>'',
CURLOPT_USERAGENT=>'libcurl',
CURLOPT_HTTPHEADER=>array(
'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
)
));
$html=curl_exec($curl);
$domd = new DOMDocument();
@$domd->loadHTML($html); // @ suppresses the warnings that malformed real-world HTML triggers
foreach($domd->getElementsByTagName("img") as $img){
echo $img->getAttribute("src"),"\n";
}
outputs
//fls-na.amazon.com/1/batch/1/OP/ATVPDKIKX0DER:136-7756522-9160852:777GSTVR1XJ9MBF1N0KN$uedata=s:%2Frd%2Fuedata%3Fstaticb%26id%3D777GSTVR1XJ9MBF1N0KN:0
https://images-na.ssl-images-amazon.com/images/G/01/gno/sprites/nav-sprite-global-1x-hm-dsk-reorg._CB405937547_.png
https://m.media-amazon.com/images/I/81HdcaHSq4L._AC_UY218_.jpg
https://m.media-amazon.com/images/I/91eAcgt9fSL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/81afsli5ctL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/61m1Dot5KCL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/61HFJwSDQ4L._AC_UY218_.jpg
https://m.media-amazon.com/images/I/216-OX9rBaL._SS72_.png
https://m.media-amazon.com/images/I/21OXy0oJ8VL._SS160_.png
https://m.media-amazon.com/images/I/61jfI8GyQgL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/61LUNEgB6iL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/813dec-cszS._AC_UY218_.jpg
https://m.media-amazon.com/images/I/81AT+Flc+EL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/216-OX9rBaL._SS72_.png
https://m.media-amazon.com/images/I/21OXy0oJ8VL._SS160_.png
https://m.media-amazon.com/images/I/61a5ejk6K2L._AC_UY218_.jpg
https://m.media-amazon.com/images/I/81+3SWSAhDL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/61pwE8H34zL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/71ejkOW4y2L._AC_UY218_.jpg
https://m.media-amazon.com/images/I/71G6eW8H8hL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/91dFUw5MUTS._AC_UY218_.jpg
https://m.media-amazon.com/images/I/81P4RzFnw6L._AC_UY218_.jpg
https://m.media-amazon.com/images/I/712iry8nIYL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/61VgW9ZZXiL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/61ft-L7HnUL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/51icdppvRVL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/6164p9jY2jS._AC_UY218_.jpg
https://m.media-amazon.com/images/I/51skvShlcsL._AC_UY218_.jpg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/68995c82-c645-4ec0-9168-20f77b8ae24d/625e2c3f-01d9-401e-b4a4-bb865ad9e525/media._SL60_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/93913ead-ae42-4933-8fc4-e9f88b0396c9/1635f47b-1fa9-40ca-8d85-47f529c1ba8b/media._SL480_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/68995c82-c645-4ec0-9168-20f77b8ae24d/625e2c3f-01d9-401e-b4a4-bb865ad9e525/media._SL60_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/6aa489c6-af9d-48d0-94c8-cce1a4f50fc7/ff2a7805-3166-41b9-9881-d00901ca9dfd/media._SL480_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/68995c82-c645-4ec0-9168-20f77b8ae24d/625e2c3f-01d9-401e-b4a4-bb865ad9e525/media._SL60_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/73b89b9f-ee28-446f-8535-beacd328c95a/8caa5478-3583-49f9-9dcb-6e5b0a254fa6/media._SL480_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/68995c82-c645-4ec0-9168-20f77b8ae24d/625e2c3f-01d9-401e-b4a4-bb865ad9e525/media._SL60_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/457fd8ad-f566-4682-bb66-fd865954aec0/fb2cdc76-7ed6-4b86-9196-d40c3ead2914/media._SL480_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/68995c82-c645-4ec0-9168-20f77b8ae24d/625e2c3f-01d9-401e-b4a4-bb865ad9e525/media._SL60_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/5c60fcd5-17c1-4389-8423-2252436f21c8/0125e72d-9178-4048-bea3-9d268a406a05/media._SL480_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/68995c82-c645-4ec0-9168-20f77b8ae24d/625e2c3f-01d9-401e-b4a4-bb865ad9e525/media._SL60_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/f852e5ab-0fa9-4f91-b195-b0facc4d0d70/30b0ec08-79b2-428d-98df-aadffd2c00eb/media._SL480_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/68995c82-c645-4ec0-9168-20f77b8ae24d/625e2c3f-01d9-401e-b4a4-bb865ad9e525/media._SL60_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/d173de56-5162-463f-be97-d256c1895024/7974c773-0c53-43a1-bfb4-91d7cc3ce801/media._SL480_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/68995c82-c645-4ec0-9168-20f77b8ae24d/625e2c3f-01d9-401e-b4a4-bb865ad9e525/media._SL60_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/grey-pixel.gif
https://m.media-amazon.com/images/S/mms-media-storage-prod/final/BrandPosts/brandPosts/2cfe5e10-6a7e-43f4-80c7-d87f212b8007/43e8a030-58c5-491a-9854-cd4d8824a873/media._SL480_.jpeg
https://images-na.ssl-images-amazon.com/images/G/01/personalization/ybh/loading-4x-gray._CB485916920_.gif
https://assoc-na.associates-amazon.com/abid/um?s=136-7756522-9160852&m=ATVPDKIKX0DER
//fls-na.amazon.com/1/batch/1/OP/ATVPDKIKX0DER:136-7756522-9160852:777GSTVR1XJ9MBF1N0KN$uedata=s:%2Frd%2Fuedata%3Fnoscript%26id%3D777GSTVR1XJ9MBF1N0KN:0
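The list above mixes product thumbnails with sprites, tracking pixels and repeated entries. A rough sketch of narrowing it down with the same DOMDocument approach follows. The function name extractImageUrls and the host filter are my own assumptions, not part of the answer above, and the sample markup is invented.

```php
<?php
// Hypothetical helper: collect unique <img src> values whose host ends with $hostSuffix.
function extractImageUrls(string $html, string $hostSuffix): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src  = $img->getAttribute('src');
        $host = parse_url($src, PHP_URL_HOST);
        if (is_string($host) && substr($host, -strlen($hostSuffix)) === $hostSuffix) {
            $urls[$src] = true; // array keys de-duplicate repeated images
        }
    }
    return array_keys($urls);
}

// Invented sample; in practice $html would be the curl_exec() result.
$sample = '<html><body>'
        . '<img src="https://m.media-amazon.com/images/I/a.jpg">'
        . '<img src="https://m.media-amazon.com/images/I/a.jpg">'
        . '<img src="https://images-na.ssl-images-amazon.com/x.gif">'
        . '</body></html>';
print_r(extractImageUrls($sample, 'media-amazon.com'));
```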

$url = "https://www.amazon.com/s?k$search_string";
Yes, your URL is wrong.
The actual URL is below; you can try it:
$url = "https://www.amazon.com/s?k=$search_string";

Firstly, there is a typo.
Change
$url = "https://www.amazon.com/s?k".$search_string;
to
$url = "https://www.amazon.com/s?k=".$search_string;
Amazon expects certain header values to be present when content is requested. Please refer to the following cURL request:
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36');
curl_setopt($curl, CURLOPT_HTTPHEADER, array(
'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
));
curl_setopt($curl, CURLOPT_ENCODING, '');
$result=curl_exec($curl);
Lastly, change your preg_match_all call from
preg_match_all("!https://m.media-amazon.com/images/I/[^\s]*?._AC_UL320_.jpg!", $result, $matches);
to
preg_match_all('/(https?:\/\/\S+\.(?:jpg|png|gif))\s+/', $result, $matches);
Complete Code :
<?php
$curl = curl_init();
$search_string = "samsung";
$url = "https://www.amazon.com/s?k=".$search_string;
// Set headers to match what a browser sends to Amazon. You can check the headers with any browser's developer tools.
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36');
curl_setopt($curl, CURLOPT_HTTPHEADER, array(
'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
));
curl_setopt($curl, CURLOPT_ENCODING, '');
$result=curl_exec($curl);
preg_match_all('/(https?:\/\/\S+\.(?:jpg|png|gif))\s+/', $result, $matches);
print_r($matches);

Related

logging in to Joomla from external script

I'm trying to login a user who was authenticated elsewhere to a Joomla-site and was following Brent Friar's nice program, but had to apply two modifications:
added a field "return" which was contained in the form
referencing com_users, not com_user
I do not know if that site has specific customizations, uses a specific login module, or is a different version. I do not have admin access to the site, so I cannot check.
Now, my script is running, but it does not successfully login the user - it doesn't get a cookie in return which it is expecting.
Instead, the site returns
HTTP/1.1 100 Continue
HTTP/1.1 303 See other
Date: Wed, 23 Jul 2014 18:18:25 GMT
Server: Apache/2.2.22
X-Powered-By: PHP/5.2.17
Location: http://www.strassenbau.forum-kundenportal.de/login-erfolgreich
Content-Length: 0
Connection: close
Content-Type: text/html; charset=utf-8
I know a bit of Joomla, but know nothing about the depths of http-communication with it, so I have no idea what the problem is here.
Here's my code:
<?php
$uname = "*** secret";
$upswd = "*** credentials";
$url = "http://www.strassenbau.forum-kundenportal.de/login-anmeldung";
set_time_limit(0);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url );
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE );
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE );
curl_setopt($ch, CURLOPT_COOKIEJAR, realpath('./cookie.txt'));
curl_setopt($ch, CURLOPT_COOKIEFILE, realpath('./cookie.txt'));
curl_setopt($ch, CURLOPT_HEADER, TRUE );
$ret = curl_exec($ch);
if (!preg_match('/name="([a-zA-Z0-9]{32})"/', $ret, $spoof)) { // token field: 32 alphanumeric characters
preg_match("/name='([a-zA-Z0-9]{32})'/", $ret, $spoof);
}
preg_match('/name="return" value="(.*)"/', $ret, $return); // search for hidden field "return" and get its value
// POST fields
$postfields = array();
$postfields['username'] = urlencode($uname);
$postfields['password'] = urlencode($upswd);
$postfields['option'] = 'com_users';
$postfields['task'] = 'user.login';
$postfields['return'] = $return[1];
$postfields[$spoof[1]] = '1';
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
$ret = curl_exec($ch);
echo "ret2: <pre>"; var_dump($ret); echo "</pre>"; // no cookie being set here!
// Get logged in cookie and pass it to the browser
preg_match('/^Set-Cookie: (.*?);/m', $ret, $m);
$cookie=explode('=',$m[1]);
setcookie($cookie[0], $cookie[1]);
?>
I ended up using the AutoLogin-extension which did the job. Not as "elegant" as I wanted it to be (because it requires installation of that plugin), but hey, it works! :-)
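For anyone who still wants to go the cURL route: the script above only reads the first Set-Cookie line and splits it on every "=", which breaks values that contain "=" themselves. The sketch below is not a fix for the missing cookie. It only shows one way to collect all cookies from a raw header block, assuming CURLOPT_HEADER is on as in the question. The function name parseSetCookieHeaders and the sample response are made up.

```php
<?php
// Hypothetical helper: map cookie name => value for every Set-Cookie line.
function parseSetCookieHeaders(string $rawHeaders): array
{
    $cookies = [];
    // Name runs up to the first "="; value runs up to the first ";" or line end.
    $pattern = '/^Set-Cookie:\s*([^=;\s]+)=([^;\r\n]*)/mi';
    if (preg_match_all($pattern, $rawHeaders, $found, PREG_SET_ORDER)) {
        foreach ($found as $match) {
            $cookies[$match[1]] = $match[2];
        }
    }
    return $cookies;
}

// Invented response headers for illustration.
$raw = "HTTP/1.1 303 See other\r\n"
     . "Set-Cookie: abc123=def=456; path=/\r\n"
     . "Set-Cookie: lang=de; path=/\r\n\r\n";
print_r(parseSetCookieHeaders($raw));
```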

How to get router informations using cURL and PHP

I am building a web application for my router; it will be my Bachelor's thesis.
The bad thing is that I can't display my router's information using my cURL function, because I get a bad router username and password error. I haven't found the problem:
The cURL function:
function myCurl($url, $post="")
{
global $status;
$header = 'Authorization: Basic YWRtaW46YWRtaW4=';
$cookiepath_tmp = "c:/xampp/htdocs/wifi/cookie.txt";
$resp = array();
$ch = curl_init();
curl_setopt($ch,CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
curl_setopt($ch,CURLOPT_URL, trim($url));
curl_setopt($ch,CURLOPT_REFERER, trim($url));
curl_setopt($ch,CURLOPT_COOKIEJAR,$cookiepath_tmp);
curl_setopt($ch,CURLOPT_COOKIEFILE,$cookiepath_tmp);
curl_setopt($ch,CURLOPT_COOKIESESSION, true);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch,CURLOPT_MAXREDIRS, 10);
curl_setopt($ch,CURLOPT_ENCODING, "");
curl_setopt($ch,CURLOPT_RETURNTRANSFER, true);
#curl_setopt($ch,CURLOPT_AUTOREFERER, true);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch,CURLOPT_TIMEOUT, 15);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, false );
curl_setopt($ch,CURLOPT_HEADER, 0);
curl_setopt($ch,CURLOPT_HTTPHEADER, array( 'Expect:' ) );
curl_setopt($ch,CURLOPT_VERBOSE, 1);
#curl_setopt($ch,CURLOPT_FAILONERROR, true);
if($post) { curl_setopt($ch, CURLOPT_POST,1); curl_setopt($ch,CURLOPT_POSTFIELDS,$post); }
$returned = curl_exec($ch);
$resp['returned'] = $returned;
$status=curl_getinfo($ch);
$resp['status'] = $status;
curl_close($ch);
return $resp;
}
I am trying to display the information using PHP:
The PHP code:
<?php echo $success_msg;
$url = "http://192.168.0.1/session.cgi";
$post = "REPORT_METHOD=xml&ACTION=login_plaintext&USER=admin&PASSWD=admin&CAPTCHA=";
$data = myCurl($url, $post);
#$url = "http://192.168.0.1/st_log.php";
#$data = myCurl($url);
echo $data['returned'];
?>
The error is:
Username or Password is incorrect.
However, The username and password admin are correct.
I have added the following code into myCurl function but still doesn't work:
$header = 'Authorization: Basic YWRtaW46YWRtaW4=';
YWRtaW46YWRtaW4= is the encoded username:password in Base64.
LAST EDIT:
I set the CURLOPT_HEADER to true, and I got this text displayed:
HTTP/1.1 501 Not Implemented
Server: Router Webserver
Connection: close
WWW-Authenticate: Basic realm="TP-LINK Wireless Lite N Router WR740N"
Content-Type: text/html
Any solution for this?
I really appreciate your help! Thank you!
I don't know which router you have (vendor / model), but most of them use HTTP basic authentication. When the credentials are missing or wrong you get an HTTP 401 Unauthorized error, which could correspond to your error string.
So you should try to insert an HTTP Authorization header in the cURL request:
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
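Note that in the question's myCurl function the $header variable is assigned but never handed to cURL (CURLOPT_HTTPHEADER only receives 'Expect:'), so the Authorization header is probably never sent. A minimal sketch of two ways to send it follows. The helper names basicAuthHeader and applyBasicAuth are my own, and whether this particular router accepts basic auth on that URL is an assumption.

```php
<?php
// Hypothetical helper: build the header line by hand.
function basicAuthHeader(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Hypothetical helper: let cURL generate the header itself.
// $ch is a handle from curl_init(); the function is only defined here, not called.
function applyBasicAuth($ch, string $user, string $password): void
{
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    curl_setopt($ch, CURLOPT_USERPWD, $user . ':' . $password);
}

echo basicAuthHeader('admin', 'admin'), "\n";
```

With the manual variant, the string has to be added to the array passed to CURLOPT_HTTPHEADER, for example array('Expect:', basicAuthHeader('admin', 'admin')).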

Getting content body from http post using php CURL

I am trying to debug an HTTP POST that I am trying to send from a Lisp application. I have been able to send the correct POST from PHP cURL, which correctly interfaces with my Drupal 7 website and uploads an image.
In order to get this to work in my Lisp application I really need to see the content body of my HTTP POST. I have been able to see the headers using a call like this:
curl_setopt($curl, CURLOPT_STDERR, $fp);
curl_setopt($curl, CURLOPT_VERBOSE, 1);
and the headers look the same in my lisp application but I have been unable to examine the body of the post. I have searched online and other people have asked this question but no one posted a response.
The content type of my http post is:
application/x-www-form-urlencoded
I have also tried many HTTP proxy debugging tools, but they only ever capture the HTTP GET for my PHP page and never the request sent from the server once the PHP code is executed.
EDIT: I have added a code snippet showing where I actually upload the image file.
// file
$file = array(
'filesize' => filesize($filename),
'filename' => basename($filename),
'file' => base64_encode(file_get_contents($filename)),
'uid' => $logged_user->user->uid,
);
$file = http_build_query($file);
// REST Server URL for file upload
$request_url = $services_url . '/file';
// cURL
$curl = curl_init($request_url);
curl_setopt($curl, CURLOPT_HTTPHEADER, array('Content-type: application/x-www-form-urlencoded'));
curl_setopt($curl, CURLOPT_STDERR, $fp);
curl_setopt($curl, CURLOPT_VERBOSE, 1);
curl_setopt($curl, CURLOPT_POST, 1); // Do a regular HTTP POST
curl_setopt($curl, CURLOPT_POSTFIELDS, $file); // Set POST data
curl_setopt($curl, CURLOPT_HEADER, FALSE); // Ask to not return Header
curl_setopt($curl, CURLOPT_COOKIE, "$cookie_session"); // use the previously saved session
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_FAILONERROR, TRUE);
curl_setopt_array($curl, array(CURLINFO_HEADER_OUT => true) );
$response = curl_exec($curl);
CURLOPT_VERBOSE should actually show the details. If you're looking for the response body content, you can also use CURLOPT_RETURNTRANSFER, curl_exec() will then return the response body.
If you need to inspect the request body, CURLOPT_VERBOSE should give that to you but I'm not totally sure.
In any case, a good network sniffer should give you all the details transparently.
Example:
$curlOptions = array(
CURLOPT_RETURNTRANSFER => TRUE,
CURLOPT_FOLLOWLOCATION => TRUE,
CURLOPT_VERBOSE => TRUE,
CURLOPT_STDERR => $verbose = fopen('php://temp', 'rw+'),
CURLOPT_FILETIME => TRUE,
);
$url = "http://stackoverflow.com/questions/tagged/java";
$handle = curl_init($url);
curl_setopt_array($handle, $curlOptions);
$content = curl_exec($handle);
echo "Verbose information:\n", !rewind($verbose), stream_get_contents($verbose), "\n";
curl_close($handle);
echo $content;
Output:
Verbose information:
* About to connect() to stackoverflow.com port 80 (#0)
* Trying 64.34.119.12...
* connected
* Connected to stackoverflow.com (64.34.119.12) port 80 (#0)
> GET /questions/tagged/java HTTP/1.1
Host: stackoverflow.com
Accept: */*
< HTTP/1.1 200 OK
< Cache-Control: private
< Content-Type: text/html; charset=utf-8
< Date: Wed, 14 Mar 2012 19:27:53 GMT
< Content-Length: 59110
<
* Connection #0 to host stackoverflow.com left intact
<!DOCTYPE html>
<html>
<head>
<title>Newest 'java' Questions - Stack Overflow</title>
<link rel="shortcut icon" href="http://cdn.sstatic.net/stackoverflow/img/favicon.ico">
<link rel="apple-touch-icon" href="http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png">
<link rel="search" type="application/opensearchdescription+xml" title="Stack Overflow" href="/opensearch.xml">
...
Just send it to a random local port and listen on it.
# terminal 1
nc -l localhost 12345
# terminal 2
php -e
<?php
$curl = curl_init('http://localhost:12345');
// etc
If you're talking about viewing the response, if you add curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );, then the document returned by the request should be returned from your call to curl_exec.
If you're talking about viewing the postdata you are sending, well, you should be able to view that anyway since you're setting that in your PHP.
EDIT: Posting a file, eh? What is the content of $file? I'm guessing probably a call to file_get_contents()?
Try something like this:
$postdata = array( 'upload' => '@/path/to/upload/file.ext' );
curl_setopt( $curl, CURLOPT_POSTFIELDS, $postdata );
You can't just send the file; you still need a postdata array that assigns a key to that file (so you can access it in PHP as $_FILES['upload']). Also, the @ tells cURL to load the contents of the specified file and send that instead of the string.
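One caveat: the "@" prefix is the old way of doing this. PHP 5.5 introduced the CURLFile class for uploads, and as far as I know current PHP versions no longer honor the "@" prefix. A rough sketch that prefers CURLFile when it is available is below. The function name buildUploadFields, the field name and the MIME type are assumptions for illustration.

```php
<?php
// Hypothetical helper: build the POST fields array for a multipart file upload.
function buildUploadFields(string $path, string $fieldName = 'upload'): array
{
    if (class_exists('CURLFile')) {
        // CURLFile(filename, mime type, name reported to the server)
        return [$fieldName => new CURLFile($path, 'application/octet-stream', basename($path))];
    }
    // Legacy fallback for very old PHP versions only.
    return [$fieldName => '@' . $path];
}

$fields = buildUploadFields('/path/to/upload/file.ext');
// Then: curl_setopt($curl, CURLOPT_POSTFIELDS, $fields);
```

Passing an array to CURLOPT_POSTFIELDS makes cURL send multipart/form-data, which is different from the application/x-www-form-urlencoded body used in the question.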
You were close:
The PHP manual says to use the constant CURLINFO_HEADER_OUT in both curl_setopt and curl_getinfo.
$ch = curl_init($url);
// ... other curl options ...
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_exec($ch);
// Call curl_getinfo() after curl_exec(), otherwise the output will be NULL.
$header_info = curl_getinfo($ch, CURLINFO_HEADER_OUT); // $header_info contains the HTTP request information
Synopsis
Set CURLINFO_HEADER_OUT with curl_setopt
Read CURLINFO_HEADER_OUT with curl_getinfo
Call curl_getinfo after curl_exec
I think you're better off doing this with a proxy than in the PHP. I don't think it's possible to pull the raw POST data from the PHP CURL library.
A proxy should show you the request and response contents
To get the header the CURLINFO_HEADER_OUT flag needs to be set before curl_exec is called.
Then use curl_getinfo with the same flag to get the header after curl_exec.
If you want to see the post data, grab the value you set at CURLOPT_POSTFIELDS
For example:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://example.com/webservice");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($payload));
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_exec($ch);
$header = curl_getinfo($ch, CURLINFO_HEADER_OUT);
curl_close($ch);
echo "Request-Header:\r\n" . $header . "\r\n";
echo "Request-Body(URL Encoded):\r\n" . http_build_query($payload) . "\r\n";
echo "Request-Body(Json Encoded):\r\n" . json_encode($payload) . "\r\n";

[Dead]How to successfully POST to an old ASP.NET site utilizing Asynchronous Postback

[Update] Unfortunately I never had an opportunity to solve this problem. However, there are some interesting responses below that are worth a try for other readers looking to do something similar.
I'm trying to parse data from a site running ASP.NET. This site has a login page that I've successfully traversed (using a legitimate account) and stored the cookie for, but when I get deeper into the site I need to navigate it by updating UpdatePanels via Asynchronous Postbacks. The UpdatePanels contain the data that I want.
I'm trying to do this all using PHP and curl. I can successfully load the initial page. When I POST to my target page with all the relevant data (obtained via Firefox's Tamper Data plugin), the echoed result returned from curl always clears my page. Typically, echoing the result would just print out (or spew some error/garbled text) further down the page. curl_error() doesn't print out anything, so it's something wrong with what's being returned to me.
I'm at my wits' end about how to proceed from here. Please tell me: a) if you know what error I'm getting, b) if this is even going to be possible with PHP alone, and c) if, conversely, I need to brush up on JavaScript to interact with ASP.NET's UpdatePanels.
$uri = "TARGETURL";
$cl=curl_init();
curl_setopt($cl, CURLOPT_URL, $uri);
curl_setopt($cl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0');
curl_setopt($cl, CURLOPT_COOKIEFILE, "/tmp/cookie2.txt");
curl_setopt($cl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($cl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($cl, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($cl, CURLOPT_POST, 1);
$postdata=array(
"__VIEWSTATE" => $viewstate,
"OTHER DATA" => "asdfkljsddflkjshdjf",
"__ASYNCPOST" => "true",
);
echo "<PRE>";
print_r($postdata);
echo "</PRE>";
curl_setopt ($cl, CURLOPT_POSTFIELDS, $postdata);
$result = curl_exec($cl); // execute the curl command
echo $result;
Here is the Header and Body I am receiving back from the server (e-mailed to myself to bypass the page-clearing happening with the echo statement):
HEADER RESPONSE:
HTTP/1.1 100 Continue
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: text/plain; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.5
X-Content-Type-Options: nosniff
Set-Cookie: culture=en-US; expires=Tue, 27-Nov-2012 20:02:37 GMT; path=/
X-Powered-By: ASP.NET
Date: Mon, 28 Nov 2011 20:02:37 GMT
Content-Length: 112
BODY RESPONSE:
69|dataItem||<script type="text/javascript">window.location="about:blank"</script>|11|pageRedirect||/Error.aspx|
This explains the problem I'm getting with the page going blank (javascript redirecting my browser output). It also seems to indicate that the header isn't the issue as I'd be getting an HTTP error from bad header values.
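The body above looks like the pipe-delimited format ASP.NET uses for partial-page (UpdatePanel) responses: repeated groups of length|type|id|content|. Rather than echoing it into the browser, it can be split into its parts first. This is only a sketch under that assumption. The function name parseAsyncPostback is made up, and lengths are treated as byte counts, which holds for ASCII content like this sample but may not for multi-byte text.

```php
<?php
// Hypothetical helper: split "length|type|id|content|" groups into an array.
function parseAsyncPostback(string $body): array
{
    $items = [];
    $pos = 0;
    $total = strlen($body);
    while ($pos < $total) {
        // Read the three header fields (length, type, id), each ending in "|".
        $fields = [];
        for ($i = 0; $i < 3; $i++) {
            $bar = strpos($body, '|', $pos);
            if ($bar === false) {
                return $items; // truncated or unexpected input: stop parsing
            }
            $fields[] = substr($body, $pos, $bar - $pos);
            $pos = $bar + 1;
        }
        [$length, $type, $id] = $fields;
        if (!ctype_digit($length)) {
            return $items; // not the expected format
        }
        $items[] = [
            'type'    => $type,
            'id'      => $id,
            'content' => substr($body, $pos, (int) $length),
        ];
        $pos += (int) $length + 1; // skip the content and its trailing "|"
    }
    return $items;
}

$body = '69|dataItem||<script type="text/javascript">window.location="about:blank"</script>'
      . '|11|pageRedirect||/Error.aspx|';
print_r(parseAsyncPostback($body));
```

On the sample this yields a dataItem entry holding the script and a pageRedirect entry pointing at /Error.aspx, which suggests the server rejected the request and is redirecting to its error page.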
A. You state in your request that you are Firefox browser:
curl_setopt($cl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0');
Do not claim you're Firefox:
if you cannot process scripts (as Firefox can and does)
if you want to prevent ASP.NET from sending you a partial rendering response
Make your own user agent name, or don't send it at all.
ASP.NET checks if user agent supports callbacks:
HttpCapabilitiesBase.SupportsCallback Property
B. Don't send __ASYNCPOST = true (give it a try).
Here is an adapted approach that works for me:
public function doPostbackToAspDotNetPage()
{
$uri = '*** THE_URL ***';
$cl = curl_init();
curl_setopt($cl, CURLOPT_URL, $uri);
curl_setopt($cl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:54.0) Gecko/20100101 Firefox/54.0');
curl_setopt($cl, CURLOPT_COOKIESESSION, '*** OPTIONAL ***');
curl_setopt($cl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($cl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($cl, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($cl, CURLOPT_POST, 1);
// Just in case the url is https and the certification gives some kind of error
curl_setopt($cl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($cl, CURLOPT_SSL_VERIFYPEER, false);
$postdata = array(
'__EVENTTARGET' => '*** A value such as: SOME_ID$ctl20$ctl02 ***',
'__EVENTARGUMENT' => ' *** OPTIONAL ***',
"__VIEWSTATE" => '*** REQUIRED BUNCH OF CHARACTERS ***',
"__ASYNCPOST" => "true",
'__VIEWSTATEGENERATOR' => '*** OPTIONAL ***',
'__EVENTVALIDATION' => "*** REQUIRED BUNCH OF CHARACTERS ***",
);
curl_setopt($cl, CURLOPT_POSTFIELDS, $postdata);
$result = curl_exec($cl);
if (!$result) {
echo sprintf('ERROR:%s', PHP_EOL);
echo curl_error($cl);
} else {
echo $result;
}
curl_close($cl);
}
A different approach is to use a very useful PHP tool (a class emulating browser behavior) that does all the work of keeping track of the fields and performs the POST/GET by clicking on links or buttons.
Here is the link:
simpletest
I have no clue about PHP and curl, but if I understand correctly, you are trying to send info to an ASP page. Maybe the problem is that the page has the CausesValidation option activated, so the server is not allowing external POSTs to the page.

how to obtain the download name from an url php coding

I've been trying to get the names of files from a URL. I found it simple with basename
until I came across URLs that show no sign of the true name until downloaded.
Here is an example of the links I found;
here the true name is youtubedownloadersetup272.exe
http://qdrive.net/index.php/page-file_share-choice-download_file-id_file-223658-ce-0
As you can see, it shows no name until download.
I've been searching a lot and got desperate after finding nothing. I'd appreciate it if someone could point me the way, thanks.
Sorry to bother you again, but I found this link from download.com and I don't see the filename using curl:
<?php
function getFilename($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
$data = curl_exec($ch);
echo $data;
preg_match("#filename=([^\n]+)#is", $data, $matches);
return $matches[1];
}
echo getFilename("http://software-files-l.cnet.com/s/software/11/88/39/66/YouTubeDownloaderSetup272.exe?e=1302969716&h=89b64b6e8e7485eab1e560bbdf68281d&lop=link&ptype=1901&ontid=2071&siteId=4&edId=3&spi=fdc220b131cda22d9d3f715684d064ca&pid=11883966&psid=10647340&fileName=YouTubeDownloaderSetup272.exe");
?>
It returns this with echo $data:
HTTP/1.1 200 OK
Server: Apache
Accept-Ranges: bytes
Content-Disposition: attachment
Content-Type: application/download
Age: 866
Date: Sat, 16 Apr 2011 10:16:54 GMT
Last-Modified: Fri, 08 Apr 2011 18:04:41 GMT
Content-Length: 4700823
Connection: keep-alive
If I understood the script you gave me, it won't work here because the response has no filename.
Is there a way to get the name without having to use regex or parse the URL (YouTubeDownloaderSetup272.exe?e.........), like the script you gave me?
You need to use curl, or some other library to request the file and look at the headers of the response.
You'll be looking for a header like:
Content-Disposition: attachment; filename=???
Where the question marks are the name of the file.
(You will still have to download the file, or at least look like you are downloading the file.)
function getFilename($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
$data = curl_exec($ch);
preg_match("#filename=([^\n]+)#is", $data, $matches);
return $matches[1];
}
echo getFilename("http://qdrive.net/index.php/page-file_share-choice-download_file-id_file-223658-ce-0"); //YouTubeDownloaderSetup272.exe
For download.com...
function download_com($url){
$filename = explode("?", $url);
$filename = explode("/", $filename[0]);
$filename = end($filename);
return $filename;
}
echo download_com("http://software-files-l.cnet.com/s/software/11/88/39/66/YouTubeDownloaderSetup272.exe?e=1302969716&h=89b64b6e8e7485eab1e560bbdf68281d&lop=link&ptype=1901&ontid=2071&siteId=4&edId=3&spi=fdc220b131cda22d9d3f715684d064ca&pid=11883966&psid=10647340&fileName=YouTubeDownloaderSetup272.exe");
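The two approaches can be combined: use the Content-Disposition filename when the server sends one, and fall back to the last path segment of the URL when it does not (as with the download.com response above). A rough sketch is below. The function name filenameFromResponse and the example.com URLs are made up, and the header text is assumed to come from a request with CURLOPT_HEADER enabled.

```php
<?php
// Hypothetical helper: pick a filename from response headers, else from the URL path.
function filenameFromResponse(string $headers, string $url): ?string
{
    // Accepts both filename=name.ext and filename="name.ext".
    if (preg_match('/filename="?([^";\r\n]+)"?/i', $headers, $m)) {
        return trim($m[1]);
    }
    // Fallback: last segment of the URL path, ignoring the query string.
    $path = parse_url($url, PHP_URL_PATH);
    if (is_string($path)) {
        $name = rawurldecode(basename($path));
        if ($name !== '') {
            return $name;
        }
    }
    return null; // nothing usable found
}

// Invented examples for illustration.
$withHeader = "HTTP/1.1 200 OK\r\nContent-Disposition: attachment; filename=\"setup.exe\"\r\n\r\n";
$noFilename = "HTTP/1.1 200 OK\r\nContent-Disposition: attachment\r\n\r\n";
echo filenameFromResponse($withHeader, 'http://example.com/download?id=1'), "\n";
echo filenameFromResponse($noFilename, 'http://example.com/files/Setup272.exe?e=1&fileName=Setup272.exe'), "\n";
```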
