How to force curl (with PHP) to download a page as a browser does?
The page I want to download is from a price comparison site, e.g. http://www.ceneo.pl/22416171. It is public; anybody can access the site.
To check whether downloading with curl is even possible, I ran this on my Debian-based local server:
curl http://www.ceneo.pl/22416171
And it worked perfectly. But I need to run it on my virtual PHP/Apache server, so I have to do it from PHP.
When I try to download the page with PHP's curl, I get nothing back, unlike with shell curl.
Why? How can I get the right content in PHP?
Tried:
<?php
$curl = curl_init('http://www.ceneo.pl/22416171');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_HEADER, 1);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl,CURLOPT_HTTPHEADER,
array(
'User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:28.0) Gecko/20100101 Firefox/28.0',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: pl,en-US;q=0.7,en;q=0.3',
'Accept-Encoding: gzip, deflate',
'p3p: CP="NOI CURa ADMa DEVa TAIa OUR BUS IND UNI COM NAV INT"',
'Vary: Accept-Encoding',
'Content-Type: text/html; charset=utf-8',
'Cache-Control: private'
));
$body = curl_exec($curl);
curl_close($curl);
echo $body;
?>
I also tried to use
<?php exec('curl http://www.ceneo.pl/22416171'); ?>
But it gave
curl: /usr/local/lib/libcurl.so.4: no version information available (required by curl)
Take a look at the documentation: http://www.php.net/manual/en/curl.examples.php
This is how you do it:
test.php
<?php
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "http://www.ceneo.pl/22416171");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
//set headers
curl_setopt($ch,CURLOPT_HTTPHEADER, array(
'User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:28.0) Gecko/20100101 Firefox/28.0',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: pl,en-US;q=0.7,en;q=0.3',
//'Accept-Encoding: gzip, deflate',
'p3p: CP="NOI CURa ADMa DEVa TAIa OUR BUS IND UNI COM NAV INT"',
'Vary: Accept-Encoding',
'Content-Type: text/html; charset=utf-8',
'Cache-Control: private'
));
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
// debug
echo $output;
Demo of it working (only the html output from the site is retrieved):
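The notable difference from the question's code is that the Accept-Encoding header is commented out, so the server is presumably no longer asked for a compressed body that curl was never told to decode. A minimal sketch of an alternative, assuming the goal is only to look like a browser: set the user agent through CURLOPT_USERAGENT and let curl negotiate and decode compression through CURLOPT_ENCODING. The function names browser_like_options and fetch_as_browser are hypothetical, and the sketch leaves out p3p, Vary, Content-Type and Cache-Control: private on the assumption that a plain GET does not need them (they are normally seen in responses).

```php
<?php
// Hypothetical helper (not part of the original answer): build the option
// set for a browser-like GET request.
function browser_like_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        // Same Firefox user agent string as in the question.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.0; rv:28.0) Gecko/20100101 Firefox/28.0',
        // An empty string asks curl to advertise every encoding it supports
        // and to decode the response body itself.
        CURLOPT_ENCODING       => '',
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: pl,en-US;q=0.7,en;q=0.3',
        ],
    ];
}

// Fetch the page; returns the body as a string, or false on failure.
function fetch_as_browser(string $url)
{
    $ch = curl_init();
    curl_setopt_array($ch, browser_like_options($url));
    return curl_exec($ch);
}
```

Calling echo fetch_as_browser('http://www.ceneo.pl/22416171'); would then print the page, although whether the site accepts the request still depends on the server.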
I have a server with a third-party API installed at http://65.21.1.13:3000/. When I open it in a browser, I receive the answer "Service start!", meaning that the service is working. I also receive this answer successfully from Android Java and Visual C++ MFC clients.
But when I try to open this URL from PHP (curl or file_get_contents), I get an error. I tried adding headers, flags and other options, but curl_exec always returns false. Is there a way to get the proper answer from the server using PHP? One of my curl attempts is below:
$url = 'http://65.21.1.13';
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_PORT ,3000);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$headers = [
'X-Apple-Tz: 0',
'X-Apple-Store-Front: 143444,12',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding: gzip, deflate',
'Accept-Language: en-US,en;q=0.5',
'Cache-Control: no-cache',
'Content-Type: application/x-www-form-urlencoded; charset=utf-8',
'Host: www.example.com',
'Referer: http://www.example.com/index.php', //Your referrer address
'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:28.0) Gecko/20100101 Firefox/28.0',
'X-MicrosoftAjax: Delta=true'
];
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
var_dump($result);
The answer was very simple: port 3000 was blocked by the firewall on the PHP machine. Sorry to bother you.
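When curl_exec only returns false, a cause such as a blocked port is easier to spot if the error code and message are printed. A minimal sketch, assuming nothing beyond the standard curl extension; the helper name fetch_or_explain and the five-second default timeout are illustrative choices, not part of the original post.

```php
<?php
// Hypothetical helper: fetch a URL and report why the request failed,
// instead of silently passing on the false returned by curl_exec.
function fetch_or_explain(string $url, int $timeoutSeconds = 5): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Give up quickly when the port is filtered and never answers.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($ch);

    return [
        'ok'    => $body !== false,
        'body'  => $body === false ? null : $body,
        'errno' => curl_errno($ch),   // 0 when the transfer succeeded
        'error' => curl_error($ch),   // human-readable message, '' on success
    ];
}
```

For example, print_r(fetch_or_explain('http://65.21.1.13:3000/')); would show the error entry. With a blocked or closed port it would typically describe a connection failure or a timeout, which points at the network rather than at the headers.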
I am trying cURL, but curl_exec() returns unreadable text like the screenshot below.
I wrote the cURL code below. How can I fix this issue?
$ch = curl_init("https://app.kajabi.com/login");
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Host: app.kajabi.com',
'Connection: keep-alive',
'Cache-Control: max-age=0',
'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
'sec-ch-ua-mobile: ?0',
'sec-ch-ua-platform: "Windows"',
'Upgrade-Insecure-Requests: 1',
'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Sec-Fetch-Site: none',
'Sec-Fetch-Mode: navigate',
'Sec-Fetch-User: ?1',
'Sec-Fetch-Dest: document',
'Accept-Encoding: gzip, deflate, br',
'Accept-Language: en-GB,en-US;q=0.9,en;q=0.8',
'Cookie: _kjb_session=795006a5538f30410ce2f56bd813ddb0; __cf_bm=7iLyh_LWPmJjzo07YdEJQaE_RT0LPS2R6NL1Hp3Li6g-1649142817-0-Ae4i2Gq5QTr+PktvLBJEV8MHcgGTw5ADVHkedUa3JTcVLHEDTyE01Nw6qsZtmjs7Quu+phKNOlCtu/8Cxpdwxec=; __cfruid=531ca052551b47923660c7b1832af0f2ea867981-1649142817; _kjb_ua_components=41e11a8e3c73294e1d2e0f1813e1f86d'
));
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
if (curl_errno($ch)) {
print "Error: " . curl_error($ch);
exit();
}
echo $response;
I tried putting the response into a file, and it appears that the response is in gzip format.
file_put_contents('temp.gz', $response);
I extracted the archive and found that it is an HTML document saying that access is denied.
You can show the response directly in the output of your php script, though:
$decoded_response = gzdecode($response);
echo $decoded_response;
And maybe you should check whether the content is actually gzip before attempting to use gzdecode; see this thread: php curl, detect response is gzip or not
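One way to do that check is to look for the two gzip magic bytes (0x1f 0x8b) at the start of the body. The following is a minimal sketch; the helper name decode_if_gzip is hypothetical and it assumes the zlib extension is available. Note that the request above also advertises br; a Brotli body would not start with these bytes and is returned unchanged by this sketch.

```php
<?php
// Hypothetical helper: decode the body only when it looks like gzip.
function decode_if_gzip(string $body): string
{
    // Every gzip stream starts with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        // @ suppresses the warning gzdecode raises on corrupt data.
        $decoded = @gzdecode($body);
        // Fall back to the raw body if decoding fails.
        return $decoded === false ? $body : $decoded;
    }

    return $body;
}
```

With this in place, echo decode_if_gzip($response); prints readable HTML for gzip responses and passes plain responses through untouched.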
Edit:
You can let php automatically do the decoding by setting CURLOPT_ENCODING to '':
<?php
$ch = curl_init("https://app.kajabi.com/login");
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Host: app.kajabi.com',
'Connection: keep-alive',
'Cache-Control: max-age=0',
'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
'sec-ch-ua-mobile: ?0',
'sec-ch-ua-platform: "Windows"',
'Upgrade-Insecure-Requests: 1',
'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Sec-Fetch-Site: none',
'Sec-Fetch-Mode: navigate',
'Sec-Fetch-User: ?1',
'Sec-Fetch-Dest: document',
'Accept-Encoding: gzip, deflate, br',
'Accept-Language: en-GB,en-US;q=0.9,en;q=0.8',
'Cookie: _kjb_session=795006a5538f30410ce2f56bd813ddb0; __cf_bm=7iLyh_LWPmJjzo07YdEJQaE_RT0LPS2R6NL1Hp3Li6g-1649142817-0-Ae4i2Gq5QTr+PktvLBJEV8MHcgGTw5ADVHkedUa3JTcVLHEDTyE01Nw6qsZtmjs7Quu+phKNOlCtu/8Cxpdwxec=; __cfruid=531ca052551b47923660c7b1832af0f2ea867981-1649142817; _kjb_ua_components=41e11a8e3c73294e1d2e0f1813e1f86d'
));
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
if (curl_errno($ch)) {
print "Error: " . curl_error($ch);
exit();
}
echo $response;
?>
You are getting un-handled GZIP from Curl because you are manually setting the Accept-Encoding: header in your header array, rather than letting Curl handle it. Curl then gets an unexpectedly-encoded response and goes "I dunno, you deal with this".
You're telling the remote side "I want things handled this way" but you're not actually telling the local side.
Easy fix: remove the Accept-Encoding: header from your header array. Optionally, move those encoding specifications to the CURLOPT_ENCODING setting you added in your own answer, though I would say this is unnecessary, as curl will prefer compression anyway.
Other headers that you should likely not be manually setting:
Host: unnecessary unless you need a value other than the hostname in the URL
Connection: client needs to be aware
Upgrade-Insecure-Requests: client needs to be aware, browser-specific
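The advice above can be sketched as a small filter that removes the headers curl should manage itself before the browser-copied list is passed to CURLOPT_HTTPHEADER. The function names strip_client_managed_headers and fetch_decoded are hypothetical, and the list of managed headers simply mirrors the ones named in this answer.

```php
<?php
// Hypothetical helper: drop headers that curl should manage on its own.
function strip_client_managed_headers(array $headers): array
{
    $managed = ['accept-encoding', 'host', 'connection', 'upgrade-insecure-requests'];

    $kept = array_filter($headers, function (string $line) use ($managed): bool {
        // Header names are case-insensitive, so compare in lower case.
        $name = strtolower(trim(explode(':', $line, 2)[0]));
        return !in_array($name, $managed, true);
    });

    // Reindex so the result is a plain list again.
    return array_values($kept);
}

// Sketch of the request itself; returns the decoded body, or false on failure.
function fetch_decoded(string $url, array $browserHeaders)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, strip_client_managed_headers($browserHeaders));
    // '' lets curl send its own Accept-Encoding and decode the response.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return curl_exec($ch);
}
```

Filtering by name rather than editing the array by hand keeps the approach usable when the header list is pasted fresh from the browser each time.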
Okay, I have a website that I need to parse.
First, I open the developer tools in Firefox by pressing F12, go to the Network tab, load the website, and read the first (root) GET request, like
Domain => website.com
File => /
I take all the request headers from there and write them into a PHP array manually; then in my code I call
curl_setopt($curl, CURLOPT_HTTPHEADER, $headerArray);
and also other options, then call
curl_exec();
While inspecting the Network tab in Firefox, I see that the request headers look like the defaults, and none of the headers I wrote manually into the array were sent. There is a similar problem with CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR: cookies are written to the cookie file on the server, but the next request carries different cookies instead of the ones previously saved in the cookie file.
Actual request headers in browser's inspector:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3
Cache-Control: max-age=0
Connection: keep-alive
Cookie: _ga=GA1.1.1951751996.1563984714; _gid=GA1.1.1564173251.1563984714; _userGUID=0:jyhg490v:AIQdD2Qpm9rmbla1U93mK2a45CFRe49c; jv_enter_ts_2VumZAPpbr=1563984717382; jv_visits_count_2VumZAPpbr=1; .....
Host: localhost
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0
PHP Code:
<?php
$headers = ['Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3',
'Cache-Control: max-age=0',
'Connection: keep-alive',
'Cookie: visid_incap_1987259....',
'Host: website.com',
'TE: Trailers',
'Upgrade-Insecure-Requests: 1',
'User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'];
$curl = curl_init("https://www.website.com/");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_COOKIEFILE, dirname(__FILE__)."/cookies.txt");
curl_setopt($curl, CURLOPT_COOKIEJAR, dirname(__FILE__)."/cookies.txt");
echo curl_exec($curl);
?>
You will not be able to see the headers sent by cURL in the browser dev tools, because all of its requests are executed on the server side. Your headers are sent successfully. You can check this as follows (the sent headers are only available after curl_exec has run):
curl_setopt($curl, CURLINFO_HEADER_OUT, true);
curl_exec($curl);
$sentHeaders = curl_getinfo($curl, CURLINFO_HEADER_OUT);
print_r($sentHeaders);
I have a REST API:
http://www.animemobile.com/service/v2/mobile2.php?episode_id=47272
I use curl to request it. On my PC with XAMPP it works well and returns the correct results. This is the result from my PC with XAMPP:
[
{
"Title":"English Subbed",
"link":"\/[HorribleSubs] Pascal-sensei - 01 [720p]_af.mp4?
st=14GwNjlMxuI8524DS56IUA&e=1495183034"
}
]
I use
/[HorribleSubs] Pascal-sensei - 01 [720p]_af.mp4?
st=14GwNjlMxuI8524DS56IUA&e=1495183034
to create a link as:
http://st2.anime1.com/[HorribleSubs]%20Pascal-sensei%20-
%2001%20[720p]_af.mp4?st=14GwNjlMxuI8524DS56IUA&e=1495183034.
This link points to a video that plays when requested from a browser (at the moment).
But when I use curl on my server, the request still succeeds but does not return the correct result. This is the result from my server:
[
{
"Title":"English Subbed",
"link":"\/[HorribleSubs] Pascal-sensei - 01 [720p]_af.mp4?
st=ghDP4290fsBNdmfsSKCD=1495195645"
}
]
When I use
/[HorribleSubs] Pascal-sensei - 01 [720p]_af.mp4?
st=ghDP4290fsBNdmfsSKCD=1495195645
to create a link as:
http://st2.anime1.com/[HorribleSubs]%20Pascal-sensei%20-
%2001%20[720p]_af.mp4?%20st=ghDP4290fsBNdmfsSKCD=1495195645.
It doesn't play in my browser.
This is my curl code:
$c = curl_init();
curl_setopt($c, CURLOPT_URL, $url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_ENCODING, 'gzip,deflate');
curl_setopt($c, CURLINFO_HEADER_OUT, true);
$headers = [
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding: gzip, deflate, sdch',
'Accept-Language: vi,en-US;q=0.8,en;q=0.6',
'Cache-Control: max-age=0',
'Connection: keep-alive',
'Cookie: __cfduid=d7bf11c717fbcd54ec9b259e301a966d71480412679',
'Host: www.animemobile.com',
'Upgrade-Insecure-Requests: 1',
'User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
];
curl_setopt($c, CURLOPT_HTTPHEADER, $headers);
$data = curl_exec($c);
What is the problem here? Please help me!
Edit1: If you want to test the results, you need to request the REST API again, because the generated link is only valid for a limited time. The important point is that requesting the REST API from my PC returns correct results, but requesting it from the server returns wrong results, although they look very similar!
$headerSet=array(
'GET'.$url,
'Host: verify-email.org',
'User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:28.0) Gecko/20100101 Firefox/28.0',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-gb,en;q=0.5',
'Accept-Encoding: gzip, deflate',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Proxy-Connection: Close',
'Cookie: __utma=67582614.1178183750.1396541997.1396541997.1396550804.2; __utmz=67582614.1396541997.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); __atuvc=3%7C14; b38b8c1b3d6a4a2d4cc2696153b7cd63=lv63r03hdfq0p66ohv13brhlq5; __utmb=67582614.2.10.1396550804; __utmc=67582614',
'Cache-Control: max-age=0',
'Connection: keep-alive',
'DNT: 1'
);
$ch = curl_init(); // Initialising cURL
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
curl_close($ch); // Closing cURL
return $data; // Returning the data from the function
After adding this code I ran my program, and the headers are still not being set.
I have checked in httpsfox as well. The host is localhost, but I want the host set to verify-email.org
Remove 'GET'.$url, from the header array and pass the URL to curl_init instead:
$ch = curl_init($url); // <- url is placed here, and it will be GET by default
Also, I don't see you passing $headerSet to your curl handle. Do it like this:
curl_setopt($ch, CURLOPT_HTTPHEADER, $headerSet);
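Put together, the two fixes might look like the sketch below. The wrapper name fetch_with_headers is hypothetical, and CURLOPT_RETURNTRANSFER is added on the assumption that the function is meant to return the body, as the return $data; line in the question suggests. Once the URL is passed to curl_init, curl derives the Host header from it, so the explicit Host entry should no longer be needed.

```php
<?php
// Hypothetical wrapper around the corrected calls.
// $headerSet must contain only "Name: value" lines (no 'GET' . $url entry).
function fetch_with_headers(string $url, array $headerSet)
{
    // The URL goes here; the request method defaults to GET.
    $ch = curl_init($url);

    // Attach the custom request headers to this handle.
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headerSet);

    // Assumption: return the body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // Returns the response body, or false if the request failed.
    return curl_exec($ch);
}
```

The function would be called as $data = fetch_with_headers($url, $headerSet); with the 'GET'.$url entry removed from $headerSet first.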