Getting html source code in php [duplicate] - php

I have a few links, some on RapidShare and some on other file hosts.
I tried fetching the page source with cURL and with file_get_contents so that I could search it for "Deleted" or similar text, but on some hosts I was not able to get the source code at all.
On some hosts cURL works and on others file_get_contents works, but most return no source code.
Here is my code for curl:
function curl_download($Url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Url);
$agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch,CURLOPT_VERBOSE,false);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch,CURLOPT_SSLVERSION,3);
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST, FALSE);
curl_setopt($ch, CURLOPT_TIMEOUT, 0);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_FORBID_REUSE, 1);
$output = curl_exec($ch);
curl_close($ch);
return $output;
}

Try adding the following to your cURL code:
//after -- curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
Hope it works for you.
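Once a request does return the page source, the check for a marker such as "Deleted" that the question mentions can be kept separate from the fetching code. The following is a minimal sketch only; the function name contains_marker and the marker list are hypothetical and are not taken from any host's actual pages.

```php
<?php
// Hypothetical helper: report whether the fetched HTML contains any marker text.
// $html may be false when curl_exec() or file_get_contents() failed.
function contains_marker($html, array $markers): bool
{
    if (!is_string($html) || $html === '') {
        return false; // nothing was fetched, so nothing can be concluded
    }
    foreach ($markers as $marker) {
        // stripos() is case-insensitive, so "deleted" and "Deleted" both match
        if ($marker !== '' && stripos($html, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Usage sketch: $output is whatever curl_download($Url) returned.
// $isDeleted = contains_marker($output, ['Deleted', 'File not found']);
```

A false or empty fetch result is probably better treated as "unknown" than as "not deleted", which is why the helper returns early in that case.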

Related

PHP Curl - 400 Bad Request

I know this is a common problem when using cURL, but I have not found a solution after searching Stack Overflow and Google.
I've tried different user agents and I get different errors:
The requested URL returned error: 400 Bad Requestresource(19) of type (Unknown)
The requested URL returned error: 400 Bad Requeststring(42) of type (Unknown) (I noticed the 42 refers to the '=' in the $target_url)
depending on the modifications I make to the code below; however, none of them has pointed me toward a solution.
I'd appreciate any advice:
$target_url = "http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=170307";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)');
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
//curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
if ($html === false) $html = curl_error($ch);
echo stripslashes($html);
curl_close($ch);
var_dump($ch);
*** I should note that I'm actually reading the url (and a few others) from a file, so maybe there is something wrong with the format of the url?
I've done this before and had no problem with it, but now I'm stumped.
I read each line/url and place it into an array which I loop through later on.
*** If I hardcode the url then it works fine, but for some reason reading it from the file produces the error.
Don't use stripslashes(); use preg_replace() to filter the URLs:
<?php
$target_url="http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=170307";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,4);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
$html = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+) ([\"'>]+)#",'$1'.$target_url.'$2$3', $html);
echo $html;
curl_close($ch);
var_dump($ch);
?>
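Since the question notes that the hardcoded URL works while the same URL read from a file fails, one plausible cause (an assumption, not confirmed in the thread) is a trailing newline or other whitespace left on each line. A minimal sketch of cleaning the lines before passing them to cURL follows; the function name clean_urls and the file name urls.txt are hypothetical.

```php
<?php
// Hypothetical helper: strip surrounding whitespace (including "\r" and "\n")
// from each line and drop lines that end up empty.
function clean_urls(array $lines): array
{
    $urls = [];
    foreach ($lines as $line) {
        $url = trim($line);
        if ($url !== '') {
            $urls[] = $url;
        }
    }
    return $urls;
}

// Usage sketch: 'urls.txt' is a placeholder for the file holding one URL per line.
// $urls = clean_urls(file('urls.txt'));
// foreach ($urls as $target_url) {
//     curl_setopt($ch, CURLOPT_URL, $target_url);
// }
```

Dumping a URL with var_dump() before and after cleaning shows whether its length changes, which would confirm or rule out this explanation.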

How would I log into a site with curl?

I've been trying to log into a site (www.namemc.com/login) with a curl script for about 5 hours now and I think I'm going crazy trying to do so.
I was wondering if you would be able to check what's wrong with my code. I think the site may be blocking it somehow, but I'm not sure.
Code:
<?php
$username = 'example#gmail.com';
$password = 'example';
$accountUrl = 'https://namemc.com/login';
$postdata = "email=".$username."&password=".$password;
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_NOBODY, 0);
curl_setopt($ch, CURLOPT_URL, $accountUrl);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
$result = curl_exec($ch);
echo $result;
?>
curl is a tool to transfer data from or to a server. The command is designed to work without user interaction.
You should use either a cookie or a header to maintain the session in your code, for example:
$cookie = "cookie.txt";
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
Also, have you tried putting the arguments in double quotes instead of single quotes?
And make sure $postdata contains all the form data the site requires.
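On the point about sending the form data correctly: concatenating raw values into $postdata leaves characters such as "@" and "&" unencoded. A minimal sketch using http_build_query() to build the body is shown here; the field names email and password come from the question, while the function name build_login_body is hypothetical, and any further fields the real form needs are not known.

```php
<?php
// Hypothetical helper: build a URL-encoded POST body for the login form.
// http_build_query() percent-encodes each value, so "@" and "&" are safe.
function build_login_body(string $email, string $password): string
{
    return http_build_query([
        'email'    => $email,
        'password' => $password,
    ]);
}

// Usage sketch with the handle from the question:
// curl_setopt($ch, CURLOPT_POST, 1);
// curl_setopt($ch, CURLOPT_POSTFIELDS, build_login_body($username, $password));
```

If the login form also carries hidden fields, they would have to be added to the array as well.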

PHP Curl followlocation working from command line but not from browser

I have written a small script to scrape some data from a website using cURL in PHP. When I run it from the command line, the site issues a 301 redirect, which is handled by:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
However, when I run the same code from my browser, the redirect is NOT working.
Here is the complete curl request:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $arr_params['url']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_STDERR, $arr_params['error_file']);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
//curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
In the above code, $arr_params is set previously....

login form requiring loading the page first php curl

I posted this as php/curl but am open to any working solution.
example.com/login.asp has a hidden value inside the login form:
input type="hidden" name="security" value="123456789abcdef"
I tried to use curl to fetch this extra security value and include it in a second curl call; however, the value had changed by the time of the second call. I have read a related post that suggests using PHP's file_get_contents, but it didn't work with this specific website.
Current php curl looks like this:
function curling ($websitehttps,$postfields,$cookie,$ref,$follow) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $websitehttps);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded', 'Connection: Close'));
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1");
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
if ($cookie != "") {
curl_setopt($ch, CURLOPT_COOKIE,$cookie);
}
if ($postfields != "") {
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS,$postfields);
}
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $follow);
curl_setopt($ch, CURLOPT_AUTOREFERER,TRUE);
curl_setopt($ch, CURLOPT_REFERER, $ref);
curl_setopt($ch, CURLOPT_FAILONERROR, TRUE);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
I am required to send the extra security code in the post fields ($postfields), which should look something like this:
ref=https%3A%2F%2Fexample.com%2F&security=123456789abcdef
Is there a way to do this?
Adding some extra lines to two separate curl sessions solved the problem.
Lines added to the first curl session:
curl_setopt ($ch, CURLOPT_COOKIEJAR, '/tmp/cookie.txt');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
These additions created a cookie file in the /tmp folder.
Lines added to the second curl session:
curl_setopt ($ch, CURLOPT_COOKIEFILE, '/tmp/cookie.txt');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
Here the second session used the information in the cookie file, so the login page returned the same security code.
The solution described at another website may also work. In my case, the server settings did not let me use it.
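For the step between the two sessions, the hidden value has to be pulled out of the first response and placed into the post fields. The following is a rough sketch under stated assumptions: the helper names extract_hidden_value and build_postfields are hypothetical, the regular expressions assume quoted attributes as in the snippet from the question, and a real page may call for a proper HTML parser instead.

```php
<?php
// Hypothetical helper: return the value attribute of the <input> whose name
// attribute equals $name, or null when no such input is found.
function extract_hidden_value(string $html, string $name): ?string
{
    // Collect every <input ...> tag, then inspect the tags one at a time.
    if (!preg_match_all('/<input\b[^>]*>/i', $html, $tags)) {
        return null;
    }
    $namePattern = '/\bname\s*=\s*["\']' . preg_quote($name, '/') . '["\']/i';
    foreach ($tags[0] as $tag) {
        if (preg_match($namePattern, $tag)
            && preg_match('/\bvalue\s*=\s*["\']([^"\']*)["\']/i', $tag, $m)) {
            return html_entity_decode($m[1], ENT_QUOTES);
        }
    }
    return null;
}

// Hypothetical helper: build the URL-encoded post fields from the question.
function build_postfields(string $ref, string $security): string
{
    return http_build_query(['ref' => $ref, 'security' => $security]);
}

// Usage sketch: $loginPage is the body returned by the first curl session.
// $token      = extract_hidden_value($loginPage, 'security');
// $postfields = build_postfields('https://example.com/', $token);
```

Note that the curling() function in the question sets CURLOPT_HEADER to 1, so its return value includes the response headers ahead of the HTML; the sketch still works on that string because it only searches for input tags.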

How to retrieve captcha and save session with PHP cURL?

UPDATE: SOLVED
Hi all, I've got it: just save the cookies to a temp file, then resubmit the form with cURL using the cookies from that temp file :) Thanks, all, for responding :)
This is my working code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url_register);
curl_setopt($ch, CURLOPT_USERAGENT, $this->useragent);
curl_setopt($ch, CURLOPT_COOKIEJAR, $this->cookie);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
$out['result'] = curl_exec($ch);
$out['error'] = curl_error($ch);
$out['info'] = curl_getinfo($ch);
curl_close($ch);
For the next cURL request, just use CURLOPT_COOKIEFILE like this:
/* fetch captcha url with existing cookie */
$ch = curl_init($captcha_url);
curl_setopt($ch, CURLOPT_USERAGENT, $this->useragent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $this->cookie);
curl_setopt($ch, CURLOPT_FILE, $fp);
$out2['result'] = curl_exec($ch);
$out2['error'] = curl_error($ch);
$out2['info'] = curl_getinfo($ch);
curl_close($ch);
Hi all,
I'm writing a script to submit content via PHP cURL. It first fetches the session and the captcha, and the user must then enter the captcha for the final submit.
The problem is that I can't get the captcha. I've tried this code, with preg_match to extract the image tag and return it:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_COOKIE, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, "1");
curl_setopt($ch, CURLOPT_COOKIEFILE, "1");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
$result = curl_exec($ch);
curl_close($ch);
But no luck. The page I'm trying to submit to is http://abadijayaiklan.co.cc/pasang-iklan/.
I hope someone can help me out :)
Thanks and regards
From the php manual page on curl_setopt, CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR should both specify a filename. You have them set to '1' (which may be valid, but is that what you intended?)
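To illustrate the point about file names, here is a small sketch in which both options point at one real temporary file, so that cookies saved by one request are sent with the next. The helper name cookie_options is hypothetical, and the snippet assumes the curl extension is loaded.

```php
<?php
// Hypothetical helper: options that make a cURL handle read and write
// its cookies in a single file, given as a path.
function cookie_options(string $cookieFile): array
{
    return [
        CURLOPT_COOKIEFILE     => $cookieFile, // load cookies from this file
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies here when the handle is closed or freed
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Usage sketch; $url is the page from the question.
// $cookieFile = tempnam(sys_get_temp_dir(), 'cookie_');
// $ch = curl_init($url);
// curl_setopt_array($ch, cookie_options($cookieFile));
// $result = curl_exec($ch);
// curl_close($ch);
```

Reusing the same $cookieFile for the later captcha and submit requests keeps them in the same session, which matches the fix described in the update.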
