Download the source code of web pages using cURL in PHP

I am trying to download the source code of web pages using PHP cURL, but it only works for a few pages; for the rest, the file is empty.
I googled it but could not find a solution.
My source code is:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $strurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch,CURLOPT_USERAGENT, 'CURL via PHP');
$out = curl_exec($ch);
$fp = fopen('f1.html', 'w');
fwrite($fp, $out);
fclose($fp);
curl_close($ch);
What options should I add? Where am I going wrong?
Please help.

Try setting a user-agent that suggests you're a browser. Some servers will block curl/wget/etc.
For example: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.152 Safari/537.22
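As a rough sketch of that suggestion, the options from the question could be collected in a helper that sends a browser-like user agent. The names `browserLikeOptions` and `fetchPage` are made up for illustration, and `CURLOPT_FOLLOWLOCATION` is an extra assumption (an unfollowed redirect can also produce an empty file), not something the answer states.

```php
<?php
// Hypothetical helper: builds cURL options with a browser-like user agent.
function browserLikeOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_HEADER         => false,  // body only, no response headers
        CURLOPT_FOLLOWLOCATION => true,   // assumption: follow redirects as a browser would
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.152 Safari/537.22',
    ];
}

// Fetches a page; returns the body, or null (after logging the cURL error) on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, browserLikeOptions($url));
    $out = curl_exec($ch);
    if ($out === false) {
        error_log('cURL error: ' . curl_error($ch));
        return null;
    }
    return $out;
}
```

With this in place, `file_put_contents('f1.html', (string) fetchPage($strurl));` could stand in for the fopen/fwrite pair, and the logged error should show why a given page comes back empty.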

Related

PHP Setting custom header starting with ':'

I need to setup some custom headers start with ":".
$option['headers'][] = ":authority: example.com"; //<-- Here is the problem
$option['headers'][] = "accept-encoding: gzip, deflate, br";
$option['post'] = json_encode(array("Domain"=>"example.com"));
$url = "https://www.google.com";
$ch = curl_init($url);
curl_setopt($ch,CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36");
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_COOKIEFILE,"file.cookie");
curl_setopt($ch,CURLOPT_COOKIEJAR,"file.cookie");
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch,CURLOPT_HEADER,0);
curl_setopt($ch,CURLOPT_VERBOSE, true);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $option['post']);
curl_setopt($ch, CURLOPT_HTTPHEADER, $option['headers']);
$getdata = curl_exec($ch);
I tried replacing the ":" with chr(58), but the problem is the same. I get error 55, and the log shows "* Failed sending HTTP POST request". If I comment out the first line it works, but I really need that header. I'm stuck here. Any solutions?
:authority: looks like an HTTP/2 pseudo-header, and you can't set those like this with curl. However, curl sends it on its own, using the same content it would set for Host:, so that it works the same way regardless of which HTTP version is eventually used (this also works with HTTP/3).
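A small sketch of what that could mean in practice, assuming the only goal is to control the authority value: drop the pseudo-header and pass a regular Host header, which curl then uses for :authority. The helper name `headersWithHost` is hypothetical.

```php
<?php
// Hypothetical helper: removes ":pseudo" headers and any existing Host header,
// then appends a plain Host header, which curl maps to :authority by itself.
function headersWithHost(array $headers, string $host): array
{
    $clean = [];
    foreach ($headers as $header) {
        // HTTP/2 pseudo-headers start with ':' and cannot be set by hand.
        if ($header !== '' && $header[0] === ':') {
            continue;
        }
        // Avoid sending two Host headers.
        if (stripos($header, 'host:') === 0) {
            continue;
        }
        $clean[] = $header;
    }
    $clean[] = 'Host: ' . $host;
    return $clean;
}
```

In the question's code this would be used as `curl_setopt($ch, CURLOPT_HTTPHEADER, headersWithHost($option['headers'], 'example.com'));`.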

Translate curl command to PHP

Hello I have this Linux command that downloads a compressed file
curl -L -O http://www.url.com
The problem is that when I do curl inside PHP I get the HTML code instead of the compressed file.
The PHP code is this:
$url = 'https://www.example.com';
$filePath = '/app/storage/temp/' . $fileName;
$fp = fopen($filePath . 'me', "w");
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_FTPAPPEND, 0);
curl_setopt($ch, CURLOPT_FILE, $fp);
$data = curl_exec($ch);
fwrite($fp, $data);
curl_close($ch);
fclose($fp);
I can't share the real URL since it contains secret keys.
EDIT
At first the curl command downloaded the same HTML file as the PHP code. When I added the -L -O options to the curl command, it started working, so the question is: how can I set those options in PHP?
Using CURLOPT_FILE means that the output is written to the file handle. You don't need to also use fwrite (especially since, without setting CURLOPT_RETURNTRANSFER, the return value of curl_exec is just true or false).
If it is indeed possible to load from this URL, either remove the fwrite, or remove the CURLOPT_FILE and use:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
That way, the return from curl_exec will be the loaded data.
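The two alternatives might look roughly like this. Both function names are invented for the sketch, and `CURLOPT_FOLLOWLOCATION` is included on the assumption that it plays the role of the command line's `-L` flag.

```php
<?php
// Variant 1: let cURL write straight into the file handle (no fwrite needed).
function downloadToFile(string $url, string $filePath): bool
{
    $fp = fopen($filePath, 'w');
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, like -L
    curl_setopt($ch, CURLOPT_FILE, $fp);            // output goes to $fp
    $ok = curl_exec($ch);                           // true or false, not the data
    unset($ch);                                     // release the handle before closing the file
    fclose($fp);
    return $ok === true;
}

// Variant 2: have curl_exec return the data and write it yourself.
// Returns the body as a string, or false on failure.
function downloadToString(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return curl_exec($ch);
}
```

The first variant avoids holding a large download in memory; the second is simpler when the data needs to be inspected before it is saved.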
Create an empty zip file where you want to download your file.
function downloadFile($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
set_time_limit(65);
$rawFileData = curl_exec($ch);
$info = curl_getinfo($ch);
if (curl_errno($ch)) {
return curl_error($ch);
}
curl_close($ch);
$filepath = dirname(__FILE__) . '/testfile.zip';
file_put_contents($filepath, $rawFileData); // Put the downloaded content in the file
return $info;
}
Hope this helps!
Guys, sorry for making this hard for you, since I couldn't give much information about my problem.
The problem is solved. I realized that the URL did work with
curl -O -L http://www.example.com
and also in the web browser.
The browser was actually what pointed me in the right direction:
Open the web browser
Press F12
Paste the URL and hit Enter
I came to realize I needed to add some headers. The headers were these:
accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
accept-encoding:gzip, deflate, br
accept-language:en-US,en;q=0.8,es-MX;q=0.6,es;q=0.4
user-agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
After I added these headers to the curl inside PHP the result was the compressed zip file I was searching for.
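The final PHP code isn't shown, so the following is only a guess at how those headers might be passed, using `CURLOPT_HTTPHEADER`. The helper name `browserHeaders` is made up, and the URL is the placeholder from the question.

```php
<?php
// Hypothetical helper: the request headers copied from the browser's network tab.
function browserHeaders(): array
{
    return [
        'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'accept-encoding: gzip, deflate, br',
        'accept-language: en-US,en;q=0.8,es-MX;q=0.6,es;q=0.4',
        'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
    ];
}

$ch = curl_init('https://www.example.com'); // placeholder URL, as in the question
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, browserHeaders());
// $data = curl_exec($ch); // uncomment to perform the request
```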

PHP cURL, file_get_contents blank page

I'm trying to get a page's content with cURL or file_get_contents. It works on many websites, but it does not work on a friend's server.
I think there is some protection based on headers or something like that. I get the following error code: 401 forbidden. If I open the same page in a normal browser, it works.
Here is my code for the file_get_contents function :
$homepage = file_get_contents('http://192.168.1.3');
echo $homepage; // just a test to see if the page is loaded, it's not.
if (preg_match("/my regex/", $homepage)) {
// ... some code
}
I also tried with cURL:
$url = urlencode('http://192.168.1.3');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0');
$result = curl_exec($ch) or die("Not working");
curl_close($ch);
echo $result; // not working ..
Nothing works; maybe I should set more options with curl_setopt ...
Thanks.
PS: If I try with Linux (wget) I get an error, but if I try with aria2c it works.
HTTP status 401 means Unauthorized. You need to send the server a username and password.
With file_get_contents, you can pass a stream context as the third parameter, which lets you set header information.
It is better to use cURL here, since file_get_contents is mainly intended for accessing local files and is a blocking function. Add the following option for basic authentication:
curl_setopt($ch,CURLOPT_USERPWD,"my_username:my_password");
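For the file_get_contents route, a stream context carrying the credentials could be built roughly as follows. This is a minimal sketch that assumes the server uses HTTP Basic authentication; `basicAuthContext` is a hypothetical name and the credentials are the same placeholders as above.

```php
<?php
// Hypothetical helper: builds a stream context that sends HTTP Basic credentials.
function basicAuthContext(string $user, string $password)
{
    $options = [
        'http' => [
            'method' => 'GET',
            // Basic auth is "user:password", base64-encoded, in the Authorization header.
            'header' => 'Authorization: Basic ' . base64_encode($user . ':' . $password),
        ],
    ];
    return stream_context_create($options);
}

// Usage (placeholder credentials, address from the question):
// $homepage = file_get_contents('http://192.168.1.3', false, basicAuthContext('my_username', 'my_password'));
```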
Try this updated version with a user agent:
<?php
$curlSession = curl_init();
curl_setopt($curlSession, CURLOPT_URL, 'http://192.168.1.3/');
curl_setopt($curlSession,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($curlSession, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curlSession, CURLOPT_RETURNTRANSFER, true);
$homepage = curl_exec($curlSession);
curl_close($curlSession);
echo $homepage ;
?>
If you still get a blank page, install this add-on in Firefox and inspect the request headers and response headers.

php curl super long url malformed

I have a long URL stored in a variable, $mp4, and I'm trying to download it with curl, but I'm getting a malformed-URL error. Please help me if you can. Thank you in advance!
The below is what I have in my php script:
exec("curl -o $fnctid.mp4 \"$mp4\"");
Error message:
curl: (3) <url> malformed
Sample url to test download:
http://f26.stream.nixcdn.com/6f4df1d8c248cf149b846c24d32f1c35/514e0209/PreNCT5/22-TaylorSwift-2426783.mp4
The URL currently returns 408 Request Timeout. Once that is fixed, you can use this simple code:
$url = 'http://f26.stream.nixcdn.com/6f4df1d8c248cf149b846c24d32f1c35/514e0209/PreNCT5/22-TaylorSwift-2426783.mp4';
$useragent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2';
$file = __DIR__ . DIRECTORY_SEPARATOR . basename($url);
$fp = fopen($file, 'w+');
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_TIMEOUT, 320);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
echo curl_exec($ch);
var_dump(curl_getinfo($ch)); // return request information
curl_close($ch);
fclose($fp);
This error can be resolved by using urlencode:
$url = urlencode($url);
This function is convenient when encoding a string to be used in a query part of a URL, as a convenient way to pass variables.
Hope this solves the problem.
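Since urlencode targets individual query values, applying it to a whole URL also encodes the ":" and "/" characters. The sketch below assumes the malformed error comes from stray whitespace or shell-special characters in $mp4, which the question does not confirm. `buildDownloadCommand` is a hypothetical helper.

```php
<?php
// Hypothetical helper: builds the shell command from the question with the URL
// trimmed and both arguments shell-escaped.
function buildDownloadCommand(string $url, string $outputFile): string
{
    return 'curl -o ' . escapeshellarg($outputFile) . ' ' . escapeshellarg(trim($url));
}

// urlencode is meant for single query values, not whole URLs:
$query = 'title=' . urlencode('22 Taylor Swift'); // "title=22+Taylor+Swift"
```

The command from the question would then become `exec(buildDownloadCommand($mp4, "$fnctid.mp4"));`.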

User information is disabled (while using PHP Curl)

I can access a web page when I type its URL in my browser. However, while using curl to access the details of that web page, I get the message on screen
User information is disabled.
This operation cannot be accepted. User certification is invalid or date expired.
Update page.
I can access the details of my network printer(Canon IR3570) by typing in the IP of that printer in my browser. This opens up the remote UI. However, it doesn't seem to work with curl.
This is my code in PHP curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"URL");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0.1');
$result= curl_exec($ch);
echo $result;
curl_close ($ch);
What could be the reason for such a message?
Is the User-Agent set? They may filter requests.
Try adding:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,1);
curl_setopt($ch, CURLOPT_USERAGENT, "User-Agent Mozilla/5.0");
curl_setopt($ch, CURLOPT_HEADER ,1); // 1 includes the HTTP response headers in the output; use 0 to omit them
This isn't CURL but it works in Visual Basic 2012 - one of the headers solved it for me
Sub Main()
Dim web_client As New System.Net.WebClient
Dim baseDate As DateTime = New DateTime(1970, 1, 1)
Dim diff As TimeSpan = DateTime.Now - baseDate
web_client.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
web_client.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3")
web_client.Headers.Add("Accept-Encoding", "gzip, deflate, sdch")
web_client.Headers.Add("Accept-Language", "en-US,en;q=0.8")
web_client.Headers.Add("Cookie", "iR = 1711753823")
web_client.Headers.Add("Host", "172.23.100.14")
web_client.Headers.Add("Referer", "http://172.23.100.14/jlp.cgi?Flag=Html_Data&LogType=0&Dummy=" & diff.Milliseconds)
web_client.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31")
web_client.DownloadFile("http://172.23.100.14/pprint.csv?Flag=Csv_Data&LogType=0&Dummy=" & diff.Milliseconds, "P:\Transfer\mstavers\printlogs\" & Format(Now, "yyyy-MM-dd-hh-mm-ss") & ".csv")
End Sub
