When in the browser you follow the link:
http://steamcommunity.com/market/priceoverview/?country=US%C2%A4cy=5&appid=570&market_hash_name=Gem%20of%20Taegeuk
Gives out { "success": false }, In headings 500 a mistake. But when I do the same inquiry through cUrl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://steamcommunity.com/market/priceoverview/?country=US¤cy=5&appid=570&market_hash_name=Gem%20of%20Taegeuk");
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$curl = curl_exec($ch);
In response, instead of json I get this:
‹ЄV*.MNN-.VІJKМ)NяятКC4
Tell me how to fix this and what might be the cause of the error (500)?
The server return gzipped response (header Content-Encoding: gzip). So, you need auto encoding:
curl_setopt($ch,CURLOPT_ENCODING, '');
P.S. Browser unlike the curl unpacks the response automatically.
Two problems:
1) There's an additional %C2%A4cy% and missing curren after country=US in the example link. The URL in CURL looks ok.
2) Your CURL commands do not follow redirects, the URL should be with https:// (browser does that automatically). You can follow redirects with curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
Related
I want to get full redirect path of the url.
Let's say if source.com redirects to destination.com after multiple redirects like this:
http://www.source.com/ -> http://www.b.com/ -> http://www.c.com/ -> http://www.destination.com/
how do I get all redirected URL's?
using this below code I am getting only http://www.destination.com/ how do I detect full url redirect chain?
<?php
$url='windows.com';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the redirects
curl_setopt($ch, CURLOPT_HEADER, false); // no needs to pass the headers to the data stream
curl_setopt($ch, CURLOPT_NOBODY, true); // get the resource without a body
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // accept any server certificate
curl_exec($ch);
// get the last used URL
$lastUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
echo $lastUrl;
?>
This code has another problem it can't detect redirected url of youtube redirects.
Tested URL : https://www.youtube.com/redirect?redir_token=QUFFLUhqbkVxUFZUME9NbWF4RThxdFpGV3pmTTJEdFVWQXxBQ3Jtc0tubGJqU016TzJ6WnlfeUItX0ZmOUItUE1jRlZoZXhxMzNpQllpM0NLSk4ycnBLMGNidTFsX3N6WkU2X3RsUTRZb1lXQVp5SEZjbnU3eDFuZS1VU3dhdzg2QW9ZMTl1azFCZFZHcHRLdFF3dTM1MlRWdw%3D%3D&event=video_description&v=KEa2XWRGf_4&q=https%3A%2F%2Fwww.facebook.com%2Fabhiandniyu
My question is how do I detect full url redirect chain for all types of redirect requests.
You're probably missing:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
add it to your CURL config and it should work then.
Don't follow HTTP redirects: curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
And output HTTP headers, while testing: curl_setopt($ch, CURLOPT_HEADER, true);
Then you can obtain the Location header from the received HTTP 302 response.
When it's more than one redirect, this would have to run in a loop, until HTTP 200 has been received. In this context HTTP 200 means, that the final destination has been reached.
EDIT: The proper thing to do is just to send a response from Node-red as hardillb pointed out below.
My CURL request is working fine and instantly, but I simply need to have the page visit the url and not wait around for a response. I have tried every combination I can think of and my browser still sits waiting for a server response until timeout.
$url = 'http://example.com:1880/get?temperature='.$temperature;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT_MS, 1);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
// 3. execute and fetch the resulting HTML output
$output = curl_exec($ch);
// 4. free up the curl handle
curl_close($ch);
}
As mentioned in the comments.
The correct solution is to ensure your http-in node is paired with a http-response node in your Node-RED flow
I am using cURL to try to scrape an ASP site that is not on my server, with the following option to automatically follow redirects it comes across:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
but it is not properly following all redirects that the website sends back: it is putting several of the redirect URLs as relative to my server and my PHP script's path, instead of the website's server and the path that the website's pages should be relative to. Is there any way to set the base path or server path in cURL, so my script can properly follow the relative redirects it comes across when scraping through the other website?
For example: If I authenticate on their site and then try to access "https://www.theirserver.com/theirapp/mainForm/securepage.aspx" with my script at "https://www.myserver.com/php/myscript.php", then, under some circumstances, their website tries to redirect back to their login page, but this causes a big problem, because the redirect sends my cURL client to "https://www.myserver.com/php/mainForm/login.aspx", that is, '/mainForm/login.aspx' relative to my script on my server, instead of the correct "https://www.theirserver.com/theirapp/mainForm/login.aspx" relative to the site I am scraping on their server.
I would expect cURL's FOLLOWLOCATION option to properly follow relative redirects based on the "Location:" header of the web pages I am accessing, but it seems that it doesn't and can't. Since this seems to not work, preferably I want a way to tell cURL a base path for the server or for all relative redirects it sees, so I can just use FOLLOWLOCATION. If not, then I need to figure out some code that will do the same thing FOLLOWLOCATION does, but that can let me specify a base path to handle these relative URLs when it comes across them.
I see several similar questions about following relative paths with cURL, but none of the answers have any good suggestions for dealing with this problem, where I don't own the website's server and I don't know every single redirect that might come up. In fact, none of the answers I've seen for similar questions seem to even understand that a person might be trying to scrape an external website and would want any relative redirects they come across while scraping the site to just be relative to that site.
EDIT: Here is the code in question:
$urlLogin = "https://www.theirsite.com/theirApp/MainForm/login.aspx"
$urlSecuredPage = "https://www.theirsite.com/theirApp/ContentPages/content.aspx"
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $urlLogin);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_ENCODING, "");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; yie8)");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
// GET login page
$data=curl_exec($ch);
// Read ASP viewstate and eventvalidation fields
$viewstate = parseExtract($data,$regexViewstate, 1);
$eventval = parseExtract($data, $regexEventVal, 1);
//set POST data
$postData = '__EVENTTARGET='.$eventtarget
.'&__EVENTARGUMENT='.$eventargument
.'&__VIEWSTATE='.$viewstate
.'&__EVENTVALIDATION='.$eventval
.'&'.$nameUsername.'='.$valUsername
.'&'.$namePassword.'='.$valPassword
.'&'.$nameLoginBtn.'='.$valLoginBtn;
// POST authentication
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_URL, $urlLogin);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
$data = curl_exec($ch);
/******************************************************************
GET secure page (This is where a redirect fails... when getting
the secure page, it redirects to /mainForm/login.aspx relative to my
script, instead of /mainForm/login.aspx on their site.
*****************************************************************/
curl_setopt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_URL, $urlSecuredPage);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
$data = curl_exec($ch);
echo $data; // Page Not Found
You may be running into redirects that are JavaScript redirects.
To find out what is there:
This will give you additional info.
curl_setopt($ch, CURLOPT_FILETIME, true);
You should set fail on error:
curl_setopt($ch, CURLOPT_FAILONERROR,true);
You may also need to see all the Request and Response headers:
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
The big thing you are missing is curl_getinfo($ch);
It has info on all the redirects and the headers.
You may want to turn off: CURLOPT_FOLLOWLOCATION
And do each request individually. You can get the redirect location from curl_getinfo("redirect_url")
Or you can set CURLOPT_MAXREDIRS to the number of successful redirects, then do a separate curl request for the problem redirect location
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
When you get the response, if no curl error, get the resposne header
$data = curl_exec($ch);
if (curl_errno($ch)){
$data .= 'Retreive Base Page Error: ' . curl_error($ch);
echo $data;
}
else {
$skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
$responseHeader = substr($data,0,$skip);
$data= substr($data,$skip);
$info = curl_getinfo($ch);
$info = var_export($info,true);
}
echo $responseHeader . $info . $data;
A better way to web scraping a webpage is to use 2 PHP Packages = Guzzle + DomCrawler.
I made a lot of tests with this combination and i came to the conclusion that this is the best choice.
Here, you will find an example for your implementation.
Let me know if you have any problem! ;)
I'm trying to fetch this url via a script: http://api.alarabiya.net/sections/2/
But JSON response received is much smaller than when I open it directly in a browser,
please notice that I tried this url through CURL and set the same USER-AGENT of the browser and all request header used in the browser and I still get a smaller response.
Here's an exmaple using just file_get_contents
<?php
echo file_get_contents("http://api.alarabiya.net/sections/2/");
?>
My question is if there's a request size limit when using file_get_contents or if the PHP's memory can't handle it or what's the problem exactly?
When I CURLed this in shell it gave me the same o/p as in php (the trimmed output).
I finally found a solution for this:
$url = "http://api.alarabiya.net/sections/2/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true );
// This is what solved the issue (Accepting gzip encoding)
curl_setopt($ch, CURLOPT_ENCODING, "gzip,deflate");
$response = curl_exec($ch);
curl_close($ch);
echo $response;
I use the following command in some old scripts:
curl -Lk "https:www.example.com/stuff/api.php?"
I then record the header into a variable and make comparisons and so forth. What I would really like to do is convert the process to PHP. I have enabled curl, openssl, and believe I have everything ready.
What I cannot seem to find is a handy translation to convert that command line syntax to the equivalent commands in PHP.
I suspect something in the order of :
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// What goes here so that I just get the Location and nothing else?
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// Get the response and close the channel.
$response = curl_exec($ch);
curl_close($ch);
The goal being $response = the data from the api “OK=1&ect”
Thank you
I'm a little confused by your comment:
// What goes here so that I just get the Location and nothing else?
Anyway, if you want to obtain the response body from the remote server, use:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
If you want to get the headers in the response (i.e.: what your comment might be referring to):
curl_setopt($ch, CURLOPT_HEADER, 1);
If your problem is that there is a redirection between the initial call and the response, use:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);