PHP cURL multiple queries

I would like to open every page ID on the website, starting with http://website.com/page.php?id=1 and ending with id=1000,
take the data from each page via preg_match, and record it somewhere, either in a .txt file or in .sql.
Below is the cURL function I'm using at the moment; please kindly advise the full code that will get this job done.
function curl($url)
{
$POSTFIELDS = 'name=admin&password=guest&submit=save';
$reffer = "http://google.com/";
$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
$cookie_file_path = "C:/Inetpub/wwwroot/spiders/cookie/cook"; // Set your cookie file path. PHP must be able to read and write this file.
$ch = curl_init(); // Initialize a CURL session.
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_URL, $url); // The URL to fetch. You can also set this when initializing a session with curl_init().
curl_setopt($ch, CURLOPT_USERAGENT, $agent); // The contents of the "User-Agent: " header to be used in a HTTP request.
curl_setopt($ch, CURLOPT_POST, 1); //TRUE to do a regular HTTP POST. This POST is the normal application/x-www-form-urlencoded kind, most commonly used by HTML forms.
curl_setopt($ch, CURLOPT_POSTFIELDS,$POSTFIELDS); //The full data to post in a HTTP "POST" operation.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // TRUE to return the transfer as a string of the return value of curl_exec() instead of outputting it out directly.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // TRUE to follow any "Location: " header that the server sends as part of the HTTP header (note this is recursive, PHP will follow as many "Location: " headers that it is sent, unless CURLOPT_MAXREDIRS is set).
curl_setopt($ch, CURLOPT_REFERER, $reffer); //The contents of the "Referer: " header to be used in a HTTP request.
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path); // The name of the file containing the cookie data. The cookie file can be in Netscape format, or just plain HTTP-style headers dumped into a file.
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path); // The name of a file to save all internal cookies to when the connection closes.
$data = curl_exec($ch);
curl_close($ch);
return $data;
}

You can do it with file_put_contents() and a loop that calls your function:
$file = "data.txt";
$website_url = "http://website.com/page.php?id=";
for ($i = 1; $i <= 1000; $i++) {
    file_put_contents($file, curl($website_url . $i), FILE_APPEND);
}
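Extending that loop with the preg_match step from the question, a minimal sketch could look like the following. It assumes your curl() function returns the page HTML (or false on failure); `scrape_range()` and `extract_first()` are hypothetical helper names, and the pattern must contain one capture group for the value you want.

```php
<?php
// Hypothetical helper: return the first capture group of $pattern in $html,
// or null when the pattern does not match.
function extract_first(string $pattern, string $html): ?string
{
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// Hypothetical helper: fetch $baseUrl . $id for every id in [$from, $to],
// extract one value per page and append "id<TAB>value" lines to $outFile.
// $fetch is any callable taking a URL and returning the body or false,
// e.g. the curl() function from the question.
function scrape_range(callable $fetch, string $baseUrl, int $from, int $to,
                      string $pattern, string $outFile): int
{
    $saved = 0;
    for ($id = $from; $id <= $to; $id++) {
        $html = $fetch($baseUrl . $id);
        if (!is_string($html)) {
            continue; // curl_exec() returns false when the request fails
        }
        $value = extract_first($pattern, $html);
        if ($value !== null) {
            // FILE_APPEND keeps earlier results; LOCK_EX avoids interleaved writes
            file_put_contents($outFile, $id . "\t" . $value . PHP_EOL, FILE_APPEND | LOCK_EX);
            $saved++;
        }
    }
    return $saved; // number of pages that produced a match
}
```

Usage, with a placeholder `<title>` pattern: `scrape_range('curl', 'http://website.com/page.php?id=', 1, 1000, '/<title>(.*?)<\/title>/s', 'data.txt');`. Passing the fetcher in as a callable keeps the loop independent of the cURL options, so the same code works whether or not the site needs the login POST.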

Related

Logging into a website using cURL fails

I am trying to log in to a remote site using curl (before doing some data scraping).
Using the following code, I produce a cookie.txt file that contains the following:
# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.
#HttpOnly_www.xxx.com FALSE / TRUE 0 xxxv5 h_r4hXtn-gNAilZwhvHjYdE3Vr4HewhxtGrxja57LbW03-M9MLNqZSeiW7lQ2wRT9lZypNsAiX0gS0Ev1PrvNkGLmwL3B8ZmyOUMLYbTYbSW0y_aPGrIFlEp4skDzh0GJGIGtFHisCmQjEMlu0CJr0UEw2rCT9jbjzg0IyOnFYxNffaMPo229NZWV7HDfCK5M1_y6MPNvW_Kt-h4qTy8YmqGbfBwKxB-bulV78MSXU9ZWz_DVvdu6jXfPiHwCBDMV8FFBLaXm5rqYgNzvbsq8JLe1xkTPn1PNJhyizUa-hlwB6ev8HNwIwBpzs7406l6mL3VgyrDJpay6bHNoMtjh4fLwI7KapFANhFHfn57mg4
#HttpOnly_www.xxx.com FALSE / TRUE 0 ASP.NET_SessionId txakhdi15oeqxyfq53f44dts
When I log in to the website manually, the cookie names are the same. So I think the login is being created (otherwise the cookies would not be created), but when I output
echo 'HELLO html1 = '.$html1;
I see the page telling me I have entered the wrong username and password.
Code as follows:
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
$username = 'xxx';
$password = 'xxx';
// echo 'STARTING';
//login form action url
$url="https://www.xxxx.com/Login";
$postinfo = "username=".$username."&password=".$password;
$cookie_file_path = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
//set the cookie the site has for certain features, this is optional
curl_setopt($ch, CURLOPT_COOKIE, "cookiename=0");
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS,5); // follow at most 5 redirects
// curl_setopt($ch, CURLOPT_UPLOAD, true);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST" );
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);
// set content length
$headers[] = 'Content-length: 0';
$headers[] = 'Transfer-Encoding: chunked';
curl_setopt($ch, CURLOPT_HTTPHEADER , $headers);
$html1 = curl_exec($ch);
echo 'HELLO html1 = '.$html1;
I cannot show the site for security reasons (which may be a deal-breaker).
Can anyone point me in the right direction?
First off, this won't work: ini_set('display_startup_errors', 1);
The startup phase has already finished before userland PHP code starts to run, so this setting is applied too late; it must be set in the php.ini config file. (Not strictly true, but close enough: on Windows you can do crazy registry hacks to enable it, you can set it with .user.ini files, etc. More info here: http://php.net/manual/en/configuration.php)
Second, an obvious error is that you don't urlencode $username and $password in $postinfo = "username=".$username."&password=".$password;
If the username OR password contains any characters with special meanings in urlencoded format (this includes &, =, #, spaces, and many other characters), you'll send the wrong credentials and won't get logged in. A fixed version would look like $postinfo = "username=".urlencode($username)."&password=".urlencode($password);
Third, don't use CURLOPT_CUSTOMREQUEST for POST requests; just use CURLOPT_POST.
Fourth, your Content-length header is outright lying. The correct length is actually 'Content-length: '.strlen($postinfo), which with your code is definitely not 0.
But you shouldn't set this header at all: curl will do it for you if you don't, and unlike you, curl won't mess up the code calculating the size. So get rid of the entire line.
Fifth, this code is also wrong:
$headers[] = 'Transfer-Encoding: chunked';
Your curl code here is NOT using chunked transfers, and if it were, curl would send that header automatically, so get rid of it.
Sixth, don't just call curl_setopt blindly. If there's an error setting any of your options, curl_setopt returns bool(false), and you should watch out for such errors: use curl_error to extract the error message and throw an exception if such an error occurs, instead of what your code is doing right now, which is silently ignoring any curl_setopt errors. Use something like:
function ecurl_setopt($ch, int $option, $value)
{
    if (!curl_setopt($ch, $option, $value)) {
        throw new \RuntimeException('curl_setopt failed!: ' . curl_error($ch));
    }
}
If fixing all of these problems is not enough to log in, you're not giving us enough information to help you any further. What does the browser's HTTP login request look like? Or what is the login URL?
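Putting those points together, a corrected login request might look like this sketch. `ecurl_setopt_array()` is a hypothetical variant of the helper above built on curl_setopt_array(), and `login_options()` is likewise illustrative; the login URL, field names and cookie path are carried over from the question and may not match what the real form expects.

```php
<?php
// Hypothetical helper in the spirit of ecurl_setopt(): apply all options at once
// and fail loudly instead of silently ignoring a rejected option.
function ecurl_setopt_array($ch, array $options): void
{
    if (!curl_setopt_array($ch, $options)) {
        throw new \RuntimeException('curl_setopt_array failed: ' . curl_error($ch));
    }
}

// Options for the login POST. No CURLOPT_CUSTOMREQUEST and no manual
// Content-Length / Transfer-Encoding headers: curl computes those itself.
function login_options(string $url, string $username, string $password, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        // urlencode() each value so characters such as & = # or spaces survive
        CURLOPT_POSTFIELDS     => 'username=' . urlencode($username)
                                . '&password=' . urlencode($password),
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies the server sets
        CURLOPT_COOKIEFILE     => $cookieFile, // send them back on later requests
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,
    ];
}

// Usage (performs a real request, so it is not run here):
// $ch = curl_init();
// ecurl_setopt_array($ch, login_options('https://www.xxxx.com/Login', 'xxx', 'xxx', 'cookie.txt'));
// $html1 = curl_exec($ch);
```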
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
$username = 'xxx';
$password = 'xxx';
//login form action url
$url="https://www.xxxx.com/Login";
$postinfo = array("username"=>$username,"password"=>$password);
$cookie_file_path = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,false);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
curl_setopt($ch,CURLOPT_COOKIEFILE,$cookie_file_path);
curl_setopt($ch,CURLOPT_COOKIEJAR,$cookie_file_path);
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postinfo);
$html = curl_exec($ch);
echo $html;
The above code should work fine.
If there is still an issue, check the permissions of the cookie.txt file.
Also, if there is hidden data that needs to be sent along with the POST, you can check it using the Firefox Live HTTP Headers plugin.
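To find that hidden data without a browser plugin, one option is to GET the login page first and collect the hidden and submit inputs of its form, then post them back with the credentials. This is a sketch using PHP's DOM extension; `hidden_form_fields()` is a hypothetical name, and which fields the real site needs (for example ASP.NET's __VIEWSTATE) is an assumption you would have to verify.

```php
<?php
// Hypothetical helper: return name => value for each hidden or submit <input>
// inside a <form>, so those fields can be posted back with the credentials.
function hidden_form_fields(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect errors quietly
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $fields = [];
    foreach ($xpath->query('//form//input[(@type="hidden" or @type="submit") and @name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}
```

Usage sketch: `$postinfo = http_build_query(array_merge(hidden_form_fields($loginPageHtml), ['username' => $username, 'password' => $password]));`, where the hypothetical `$loginPageHtml` is the result of a plain GET of the login page made with the same cookie file. array_merge() lets the credentials override any same-named field.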
It is not as simple as reading the HTML page using curl. You need to supply a POST value for the submit button. If there is any JavaScript that executes before the form's ACTION script is triggered, that has to be looked at as well.
Usually you get better results if you use Selenium. See http://www.seleniumhq.org/
EDIT1:
If the server is rejecting your POST string, try: curl_setopt($handle, CURLOPT_POSTFIELDS, http_build_query($data));
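That can matter because an array passed to CURLOPT_POSTFIELDS is sent as multipart/form-data, while a string such as the one http_build_query() returns is sent as application/x-www-form-urlencoded, which is what a plain HTML form submits. A small sketch; `form_body()` is a hypothetical wrapper and `$data` stands for your array of fields.

```php
<?php
// Encode fields as application/x-www-form-urlencoded, the format a plain
// HTML form submits; http_build_query() escapes &, =, spaces, etc.
function form_body(array $data): string
{
    // Pass '&' explicitly in case arg_separator.output was changed in php.ini
    return http_build_query($data, '', '&');
}

// With cURL (not run here):
// curl_setopt($handle, CURLOPT_POSTFIELDS, form_body($data));
```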

.txt -> cURL -> preg_match -> MySQL

1) Instead of http://webiste.com/filename, I want to take the data from a .txt file which contains 20374 rows, each row containing a different website,
for example:
website1.com
website2.com
website3.com
etc.
2) request each of them individually using curl
3) get the needed data via preg_match
4) save the final results to my MySQL database
Below is the code that I am using at the moment. Please advise what needs to be added to achieve this goal.
function curl($url)
{
$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: ddosdefend=1d4607e3ac67b865e6c7263260c34e888cae7c56"));
curl_setopt($ch, CURLOPT_USERAGENT, $agent); // The contents of the "User-Agent: " header to be used in a HTTP request.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // TRUE to return the transfer as a string of the return value of curl_exec() instead of outputting it out directly.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // TRUE to follow any "Location: " header that the server sends as part of the HTTP header (note this is recursive, PHP will follow as many "Location: " headers that it is sent, unless CURLOPT_MAXREDIRS is set).
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // seconds to wait while trying to connect
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$test = curl('http://webiste.com/filename');
// Only read $matches[1] when the pattern actually matched
if (preg_match('/<iframe class="metaframe rptss" src="(.*?)"/', $test, $matches)) {
    $test2 = $matches[1];
}
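A sketch of the full pipeline, assuming the list file has one host per line (as in the example), that a missing scheme should default to http://, and that results go into a hypothetical MySQL table `results (site, iframe_src)` through PDO. `read_sites()`, `process_site_list()` and `pdo_saver()` are illustrative names; fetching is delegated to your curl() function, and the regex is the one from the question.

```php
<?php
// Read one site per line, skipping blank lines; trim() also drops stray "\r"
// from files saved with Windows line endings.
function read_sites(string $listFile): array
{
    $lines = file($listFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new \RuntimeException("Cannot read $listFile");
    }
    $sites = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        // Assumption: bare host names like "website1.com" are fetched over http://
        $sites[] = preg_match('~^https?://~i', $line) ? $line : 'http://' . $line;
    }
    return $sites;
}

// Fetch each site, pull the iframe src with the question's pattern, and hand
// every hit to $save(site, src). Returns the number of rows saved.
function process_site_list(array $sites, callable $fetch, callable $save): int
{
    $saved = 0;
    foreach ($sites as $site) {
        $html = $fetch($site);
        if (is_string($html)
            && preg_match('/<iframe class="metaframe rptss" src="(.*?)"/', $html, $m) === 1) {
            $save($site, $m[1]);
            $saved++;
        }
    }
    return $saved;
}

// Hypothetical MySQL sink: one prepared statement reused for every row;
// bound values keep scraped content from breaking (or injecting into) the SQL.
function pdo_saver(PDO $pdo): callable
{
    $stmt = $pdo->prepare('INSERT INTO results (site, iframe_src) VALUES (?, ?)');
    return function (string $site, string $src) use ($stmt): void {
        $stmt->execute([$site, $src]);
    };
}
```

Usage, with placeholder credentials and database name: `$pdo = new PDO('mysql:host=localhost;dbname=scraper;charset=utf8mb4', 'user', 'pass', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);` followed by `process_site_list(read_sites('sites.txt'), 'curl', pdo_saver($pdo));`. With 20374 sites, the 5-second connect timeout in your function bounds how long a dead host can stall the loop.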

cURL doesn't post data (PHP)

I am using cURL in PHP to post data to a site and echo the result, but it doesn't post the data.
Here is my code:
<?php
$imei = "imei=XXXXXXXXX";
//set POST variables
$url = 'http://XXX.com/XXX.php';
//open connection
$ch = curl_init();
//set the url, number of POST vars, POST data
curl_setopt($ch,CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_POST, true);
curl_setopt($ch,CURLOPT_POSTFIELDS, $imei);
curl_setopt ($ch, CURLOPT_REFERER, 'http://www.XXX.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_STDERR, fopen('php://output', 'w+'));
curl_setopt($ch, CURLOPT_VERBOSE, 1);
//execute post
$result = curl_exec($ch);
//close connection
curl_close($ch);
echo $result;
echo $imei;
?>
When I echo $imei, it shows the full string, but the data isn't passed.
Please help, what's wrong with the code?
Thanks in advance.
Update: in the original HTML of the website it's name="imei", not id="imei".
Make sure the option CURLOPT_RETURNTRANSFER is set to true: without it, curl_exec() outputs the response directly instead of returning it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
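I also suggest changing your POST data to an array instead of a string. CURLOPT_POSTFIELDS accepts an array and encodes the values for you; if you build the string yourself, you have to encode it as well. Note that an array is sent as multipart/form-data, so if the server expects application/x-www-form-urlencoded, pass http_build_query($array) instead.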
$imei = array('imei' => 'XXXXXXXXX');
Edit: based on your debug data, I think your resource requires a GET request, not POST.
You should also set a user-agent, because the site may reject requests without one:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)');
From the Manual:
CURLOPT_RETURNTRANSFER
TRUE to return the transfer as a string of the return value of curl_exec() instead of outputting it out directly.
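If the endpoint does expect GET, the parameter moves from the request body into the query string. A minimal sketch; `build_get_url()` is a hypothetical helper, and the URL and IMEI are the placeholders from the question.

```php
<?php
// Append URL-encoded query parameters, respecting any query string already present.
function build_get_url(string $base, array $params): string
{
    if ($params === []) {
        return $base;
    }
    $separator = strpos($base, '?') === false ? '?' : '&';
    return $base . $separator . http_build_query($params, '', '&');
}

// Usage with cURL (not run here):
// $ch = curl_init(build_get_url('http://XXX.com/XXX.php', ['imei' => 'XXXXXXXXX']));
// curl_setopt($ch, CURLOPT_HTTPGET, true);         // plain GET: no CURLOPT_POST / CURLOPT_POSTFIELDS
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $result = curl_exec($ch);
```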

Send data using curl

Trying to send data to a server that accepts data in the following format:
VERIFY_DATA=MER_ID=xxx|MER_TRNX_ID=xxx| MER_TRNX_AMT=xxx
Will the following lines do?
$datatopost="VERIFY_DATA=MER_ID=xxx|MER_TRNX_ID=xxx| MER_TRNX_AMT=xxx";
curl_setopt ($ch, CURLOPT_POSTFIELDS,$datatopost);
Any help will be appreciated; I am new to curl.
You can use this article to see how to do it properly.
I personally used this code on a project of mine:
$data = "from=" . urlencode($from) . "&to=" . urlencode($to) . "&body=" . urlencode($body) . "&url=" . urlencode($url);
//$urlx contains the url where you want to post. $data contains the data you are posting
//$resp contains the response.
$process = curl_init($urlx);
curl_setopt($process, CURLOPT_HEADER, 0);
curl_setopt($process, CURLOPT_POSTFIELDS, $data);
curl_setopt($process, CURLOPT_POST, 1);
curl_setopt($process, CURLOPT_RETURNTRANSFER,1);
curl_setopt($process,CURLOPT_CONNECTTIMEOUT,1);
$resp = curl_exec($process);
curl_close($process);
Here is a working example in PHP. This asks for and returns an FX quote. My data request is in the URL, while yours is in the POST fields, so you need to adjust. It looks as though you have spaces in the data you are passing ("x| ME"); I suspect the server will not like that.
$ch = curl_init(); // initialise CURL
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $curl_opt_string ); // this contains the URL and the request
curl_setopt($ch, CURLOPT_HEADER, false); // no header
curl_setopt($ch, CURLOPT_INTERFACE, "93.129.141.79"); // outgoing network interface (local IP) to send the request from
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"); // tell it what the agent is
curl_setopt($ch, CURLOPT_POST, 1); // send the request as a POST (the parameters themselves are in the URL)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return as a string rather than to the screen
$output = curl_exec($ch); // variable to return it to
curl_close($ch); // close cURL resource, and free up system resources
$subject = $output; // get the data
I constructed my request URL string as follows:
$curl_opt_string = "http://msxml.rexefore.com/index.php?username=MiJoee4r65&password=L8e44Y&instrument=245.20." . $lhsrhs . "LITE&fields=D4,D6";
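For the format in the question, one way to avoid both stray spaces and encoding problems is to build the pipe-delimited value from an array and let http_build_query() encode it, so the `=` and `|` characters inside the value are sent as %3D and %7C. This assumes the server URL-decodes the body like a normal form post; if it instead expects those characters raw, you would send the unencoded string. `verify_data_body()` is a hypothetical name and the field values are placeholders.

```php
<?php
// Build "VERIFY_DATA=<urlencoded MER_ID=...|MER_TRNX_ID=...|MER_TRNX_AMT=...>".
// Joining with '|' and trimming keys/values avoids stray spaces like "x| MER_TRNX_AMT".
function verify_data_body(array $fields): string
{
    $pairs = [];
    foreach ($fields as $key => $value) {
        $pairs[] = trim((string) $key) . '=' . trim((string) $value);
    }
    return http_build_query(['VERIFY_DATA' => implode('|', $pairs)], '', '&');
}

// With cURL (not run here):
// curl_setopt($ch, CURLOPT_POSTFIELDS, verify_data_body(['MER_ID' => 'xxx', 'MER_TRNX_ID' => 'xxx', 'MER_TRNX_AMT' => 'xxx']));
```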

cURL redirect not working?

I'm using the following code:
$agent= 'Mozilla/5.0 (Windows; U; Windows NT 5.1; pl; rv:1.9) Gecko/2008052906 Firefox/3.0';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, "www.example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
$output = curl_exec($ch);
echo $output;
But it redirects to something like this:
http://localhost/aide.do?sht=_aide_cookies_
instead of to the page at the URL.
Can anyone help me solve this problem, please?
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
http://docs.php.net/function.curl-setopt says:
CURLOPT_FOLLOWLOCATION
TRUE to follow any "Location: " header that the server sends as part of the HTTP header (note this is recursive, PHP will follow as many "Location: " headers that it is sent, unless CURLOPT_MAXREDIRS is set).
If it's only a matter of URL redirection, see the following code. I've documented it so you can use it easily and directly; two main cURL options control URL redirection (CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS):
// create a new cURL resource
$ch = curl_init();
// The URL to fetch. This can also be set when initializing a session with curl_init().
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
// The contents of the "User-Agent: " header to be used in a HTTP request.
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; pl; rv:1.9) Gecko/2008052906 Firefox/3.0");
// TRUE to include the header in the output.
curl_setopt($ch, CURLOPT_HEADER, false);
// TRUE to return the transfer as a string of the return value of curl_exec() instead of outputting it out directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// TRUE to follow any "Location: " header that the server sends as part of the HTTP header (note this is recursive, PHP will follow as many "Location: " headers that it is sent, unless CURLOPT_MAXREDIRS is set).
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// The maximum amount of HTTP redirections to follow. Use this option alongside CURLOPT_FOLLOWLOCATION.
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
// grab URL and pass it to the output variable
$output = curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
// Print the output from our variable to the browser
print_r($output);
The above code handles the URL redirection issue, but it doesn't deal with cookies (your localhost URL seems to be about cookies). If you want the cURL handle to store and send cookies, you may want to look at the following cURL options:
CURLOPT_COOKIE
CURLOPT_COOKIEFILE
CURLOPT_COOKIEJAR
For further details, see:
http://docs.php.net/function.curl-setopt
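Combining both suggestions, a sketch that follows redirects, keeps cookies in a jar across the redirect chain, and reports where the request finally ended up (CURLINFO_EFFECTIVE_URL) and how many redirects it took (CURLINFO_REDIRECT_COUNT). `fetch_with_redirects()` and the cookie file path are hypothetical, and whether this satisfies the site's cookie check is an assumption.

```php
<?php
// Fetch $url following up to $maxRedirects "Location:" headers, with a cookie jar
// so cookies set on one redirect hop are sent on the next.
function fetch_with_redirects(string $url, string $cookieFile, int $maxRedirects = 10): array
{
    $ch = curl_init();
    $ok = curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => $maxRedirects,
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies (also enables the cookie engine)
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies when the handle is released
    ]);
    if (!$ok) {
        throw new \RuntimeException('curl_setopt_array failed');
    }
    $body = curl_exec($ch);
    if ($body === false) {
        throw new \RuntimeException('Request failed: ' . curl_error($ch));
    }
    return [
        'body'      => $body,
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), // URL after all redirects
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
    ];
}
```

If `final_url` still points at the cookie-check page, the site probably wants a cookie set by an earlier page (or by JavaScript), which the jar alone cannot supply.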
