Stay on "CURLOPT_URL" for 60s - php

I'm using this code to test my AWStats with private proxies (4 IPs):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://example.com/");
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, FALSE); // was set to 1 and then FALSE; the later value wins
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_PROXY, trim($privateproxylist[$p]));
$response = curl_exec($ch);
When I check my stats I can see visits and referrers, but is there any option to make this script stay on the CURLOPT_URL for 60 seconds for each proxy?
Thanks

According to a webpage I found:
Based on the time between a visitor's first and last document access
AWStats tries to calculate an average visit duration.
Therefore you need to wait 60 seconds and then make a request to the website again. As I don't know the internals of AWStats, you may need to use a different page URL, but in theory you should be able to just request the same URL. It's then just a case of:
// 1. Make your curl request to URL
// 2. Wait 60s
sleep(60);
// 3. Make the curl request again
// 4. Change proxy and go back to step 1
Of course this is synchronous, so the script will run for a minimum of 4 minutes (based on 4 proxy IPs), so don't forget to set the execution time limit of your PHP script to unlimited (or very high).
You may also need to set the "cookiejar" option on the cURL handle, as AWStats MAY use a session cookie or something like that to identify visitors. So a cookie-jar text file will need to be set, so the session cookie can be stored and then resent on the second request. Don't forget to clear the cookie file (or just set a new text file in the options) before switching to a new proxy IP.
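A rough sketch of that loop under the question's setup; the proxy entries, URL, and the per-proxy temporary cookie file below are placeholders:
// Sketch only: hit the page twice per proxy with a 60-second gap,
// using a fresh cookie jar per proxy so the two hits count as one visit.
set_time_limit(0); // at least 4 proxies x 60 s, so lift the execution limit

$privateproxylist = array('1.2.3.4:8080', '5.6.7.8:8080'); // placeholder proxies
$url = 'http://example.com/';

foreach ($privateproxylist as $proxy) {
    $cookieJar = tempnam(sys_get_temp_dir(), 'awstats_'); // new cookie file per proxy

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_PROXY, trim($proxy));
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar); // cookies read from here
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);  // cookies written back here

    curl_exec($ch);   // 1. first request
    sleep(60);        // 2. wait 60 seconds
    curl_exec($ch);   // 3. second request on the same handle (same cookies)

    curl_close($ch);
    unlink($cookieJar); // 4. drop the cookies before the next proxy
}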

Related

Page executes very slowly while cURL runs

Here I explain the details of my question. First, check the code below:
$ch = curl_init('http://example123.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);
$result = @curl_exec($ch); // '@' suppresses errors from curl_exec
Now my question is: if "http://example123.com" is not valid or there is no such URL, then what is the problem?
I have a page where the above is written. When the code executes, the page takes too much time, but if I comment out the five lines above, my page executes faster.
Can anybody tell me the reason why the page executes so slowly?
Thanks Sanjib
Your script waits for a response (which may take 60 seconds, the default_socket_timeout).
You should set curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); to make the script follow the redirect from http://example123.com/ to http://ww38.example123.com/, the way a browser does.
When a cURL request encounters an invalid URI, it waits until the default connection timeout is reached, which makes the page load slower.
The default connection timeout is set in lib/connect.h (on a Linux server); you can change it there:
#define DEFAULT_CONNECT_TIMEOUT 300000 /* milliseconds == five minutes */
Or you can explicitly set it in your code:
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 100); # or use CURLOPT_CONNECTTIMEOUT_MS for millisecond resolution
curl_setopt($ch, CURLOPT_TIMEOUT, 400); # timeout in seconds
CURLOPT_CONNECTTIMEOUT : The number of seconds to wait while trying to connect. Use 0 to wait indefinitely.
CURLOPT_TIMEOUT : The maximum number of seconds to allow cURL functions to execute.
(If you are using PHP as a FastCGI application, make sure you also check the FastCGI timeout settings.)
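As a minimal sketch of the fix, with placeholder limits: set a connect timeout and an overall timeout, and check the return value of curl_exec() and curl_error() so a dead URL fails fast instead of stalling the page.
$ch = curl_init('http://example123.com'); // possibly invalid or unreachable URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // give up connecting after 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);       // hard cap on the whole transfer

$result = curl_exec($ch);
if ($result === false) {
    // The host didn't resolve, refused the connection, or timed out:
    // log it and let the rest of the page render normally.
    error_log('curl error: ' . curl_error($ch));
}
curl_close($ch);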

PHP NTLM session with cURL

So, a little background first.
There is a website written in ASP.NET which uses the NTLM protocol to authenticate users who want to log in. It works perfectly when they use it normally: they type in the website URL, provide their credentials, authenticate, and maintain the session in their web browser.
What I want to do is create a PHP website that will act as a bot. It is my company's internal website and I am approved to do so. The problem I've run into is managing the session. Users will be able to type their credentials into my PHP website, and my PHP website will authenticate them against the target site using cURL.
The code I have so far is:
$cookie_file_path = dirname(__FILE__) . '/cookies.txt';
$ch = curl_init();
//==============================================================
curl_setopt($ch, CURLOPT_USERPWD, $username. ':' . $password);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_FAILONERROR, 0);
curl_setopt($ch, CURLOPT_MAXREDIRS, 100);
//=============================================================
$ret = curl_exec($ch);
The above code logs in to the target website via cURL (which handles the NTLM handshake, it seems) and fetches the website's content. It also stores the session ID that is sent back in the cookie file.
What I'm trying to do next is comment out the CURLOPT_USERPWD option, in the hope that on a second execution the script will use the session ID stored in the cookie file to authenticate the previously logged-in user. That way I could get rid of the user credentials and not store them anywhere, because it is not safe to store them in a manually created session, a database, or anywhere else.
I need this because the bot will use cron to periodically check whether the website status has changed and perform some user actions in reaction to it. But to do this, the user must first be authenticated, and his username and password must not be stored anywhere, so I have to use the session information established when he initially logged in.
cURL seems NOT to do this. When I execute the script a second time with the CURLOPT_USERPWD option commented out, it does not use the stored cookie to stay authenticated. Instead, it overwrites the cookie file with irrelevant data sent back by the service as a response to the unauthorised request.
My questions are:
Why doesn't cURL use the stored session information to stay authenticated?
Is there any way to maintain this session with cURL and an NTLM-based website?
Thanks in advance.
A few months ago I had a problem similar to yours. I was trying to connect to a Navision SOAP API, and Navision uses NTLM authentication. The problem is that PHP's SOAP client doesn't natively support NTLM, so you have to handle it yourself (for example through cURL).
A blog post that helped me a lot in this situation was the following:
http://rabaix.net/en/articles/2008/03/13/using-soap-php-with-ntlm-authentication
Edit:
Sorry, I misread your question. Your problem is simple.
Just make sure the response headers are returned to you: with CURLOPT_RETURNTRANSFER already set, also enable CURLOPT_HEADER:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1); // include response headers so Set-Cookie is in the result
You can then extract the Set-Cookie header from the string returned by curl_exec():
preg_match('/^Set-Cookie:\s*([^;]*)/mi', $ret, $match);
$cookie = $match[1]; // e.g. "ASP.NET_SessionId=abc123"
Now you can store it somewhere and send it back on the second request.
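A sketch of that whole flow, assuming the session cookie really is all the site needs on the follow-up request (the cookie value, URL, and storage step are placeholders):
// First request: authenticate with NTLM and capture the Set-Cookie headers.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERPWD, $username . ':' . $password);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true); // response headers end up in $ret
$ret = curl_exec($ch);
curl_close($ch);

preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $ret, $matches);
$cookies = implode('; ', $matches[1]); // e.g. "ASP.NET_SessionId=abc123"
// ...store $cookies somewhere safe (session, database, file)...

// Later request: replay the stored cookie instead of the credentials.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIE, $cookies);
$page = curl_exec($ch);
curl_close($ch);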
I had the same problem and I solved it with the line curl_setopt($ch, CURLOPT_COOKIEFILE, "");. The string should be exactly empty.
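For reference, a sketch of what that trick does, with placeholder URLs: an empty CURLOPT_COOKIEFILE turns on cURL's in-memory cookie engine without reading or writing any file, so cookies received on the first request are sent again on later requests made with the same handle.
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, ''); // empty string: enable the cookie engine, no file on disk

curl_setopt($ch, CURLOPT_URL, 'https://example.com/login');
curl_exec($ch); // cookies from this response are kept in memory

curl_setopt($ch, CURLOPT_URL, 'https://example.com/protected');
$page = curl_exec($ch); // the stored cookies are sent automatically

curl_close($ch);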

Are there any advantages to storing cURL cookies? And related questions

I've got a cURL PHP script which works. It gets my schedule from my school site. There is one strange thing, though: on my web host it creates cookie.txt, and on my localhost it doesn't.
Why doesn't it create a cookie on my localhost? Any suggestions? Something to do with relative paths and WampServer?
And the questions that follow from that:
Is there any (speed) advantage to already being logged in on the school site (storing the cookie and thus saving a cURL request)?
I could, for example, check after the first cURL request whether there is evidence in the response that I am already logged in.
If the answer to the above question is 'no, this doesn't make the script faster', I've got another question:
Is it then best to specify only the CURLOPT_COOKIEFILE option, with an empty value? So no cookie jar?
I can't give you my login information, but here is the script:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL,
'http://www.groenewoud.nl/infoweb/infoweb/index.php');
curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($curl, CURLOPT_ENCODING, 'gzip');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, false);
$tokenSource = curl_exec($curl);
print_r (curl_getinfo($curl));
if (!$tokenSource) echo 'token problem';
// Get the token from within the source codes of infoweb.
preg_match('/name="csrf" value="(.*?)"/', $tokenSource, $token);
$postFields = array(
'user' => $userNum,
'paswoord' => $userPass,
'login' => 'loginform',
'csrf' => $token[1]);
$postData = http_build_query($postFields);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
curl_setopt($curl, CURLOPT_POSTFIELDS, $postData);
$tableSource = curl_exec($curl);
print_r( curl_getinfo($curl));
if (!$tableSource) echo 'post problem';
curl_close($curl);
1) /cookie/cookie.txt means you'd need to have your cookie directory in the ROOT directory of your entire server. cookie/cookie.txt (note: NO leading slash) means the cookie directory would be a sub-directory of your script's CURRENT directory. E.g., if your script is running in /a/b/c/, then you'd have /a/b/c/cookie/cookie.txt.
2) For speed advantages, there's no change in HTTP speeds - you're still stuck with the same pipes and transfer rates. But having the cookie initially MIGHT save you a few extra hits on the server to simulate the login-sequence, so would effectively be SLIGHTLY faster.
3) As for creating the cookies, that's entirely up to curl's settings. If you don't specify a cookie file or cookie jar, it won't create or look for the cookie file. Check the configuration/compile options between the two servers to see if one specifies some curl defaults that the other doesn't have.
4) strpos() WOULD be faster than a curl request. Think of it as the difference between looking in your fridge for some food versus driving to the grocery store. The fridge is local and therefore faster.
5) CURLOPT_COOKIEFILE tells curl where to load cookies from when it first starts up. CURLOPT_COOKIEJAR tells curl where to store the cookies it has collected when the handle is closed. They CAN be different files, but don't have to be. If you'd like to keep some "clean" baseline cookies, you use cookiefile = baseline.txt and cookiejar = newstuff.txt. Once you've got an appropriate cookie environment set up, you point cookiefile at newstuff.txt for subsequent curl runs.
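A short sketch of that split, with a read-only baseline file and a separate jar for newly received cookies (the file names are placeholders):
$ch = curl_init('http://www.groenewoud.nl/infoweb/infoweb/index.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, __DIR__ . '/baseline.txt'); // cookies loaded at start-up
curl_setopt($ch, CURLOPT_COOKIEJAR,  __DIR__ . '/newstuff.txt'); // cookies written on curl_close()
curl_exec($ch);
curl_close($ch); // baseline.txt stays untouched; new and updated cookies land in newstuff.txt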

Trying to AVOID an ASP.NET session using cURL

I'm using a web service from a provider who is being a little too helpful in anticipating my needs. They have given me an HTML snippet to paste on my website for users to click on to trigger their services. I'd prefer to script this process, so I've got a PHP script which posts a cURL request to the same URL, as appropriate. However, this provider is keeping tabs on my session and interprets each new request as an update of the first one, rather than each being a unique request.
I've contacted the provider regarding my issue, and they've gone so far as to inform me that their system is working as intended, and that it's impossible for me to avoid using the same ASP.NET session for each subsequent cURL request. While my favored option would be to switch to a different vendor, that doesn't appear to be an option right now. Is there a reliable way to get a new ASP.NET session with each cURL request?
I've tried the following set of CURLOPT's, to no avail:
//initialize curl
$ch = curl_init($url);
//build a string out of the post_vars
$post_str = http_build_query($post_vars);
//set the necessary curl options
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_str);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIESESSION, 1);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_FORBID_REUSE, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "UZ_".uniqid());
curl_setopt($ch, CURLOPT_REFERER, CURRENT_SITE_URL."index.php?newsession=".uniqid());
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Pragma: no-cache", "Cache-Control: no-cache"));
//execute the call to the backend script, retrieve the results
$xmlstr = curl_exec($ch);
If cURL isn't helping much, why not try other methods to call the services from your script, like PHP's file() function or file_get_contents()?
If you do not see any difference at all, then the service provider might be using your IP to track your requests. Try using a proxy as a test.
A normal ASP.NET session is tracked by a cookie called ASP.NET_SessionId. This cookie is sent within the response to your first request. So as long as your cURL requests don't send this ASP.NET cookie back, your requests will have no connection to each other. Use curl's -c option on the command line to see what cookies are flying back and forth between you and them. Overriding this cookie with a cookie file should work if you confirm that it is the normal ASP.NET session being used here.
It is quite poor for a service to use sessions (HTTP has much cleaner ways of maintaining state, which REST exploits), so I wouldn't completely rule out the vendor-switch option.
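As an illustration of that point, here is a sketch that makes sure the ASP.NET_SessionId cookie is never replayed, by giving every request its own throwaway cookie file (the function name and parameters are made up for the example):
function post_with_fresh_session($url, array $post_vars) {
    // Throwaway cookie file: whatever ASP.NET_SessionId the server sets
    // is discarded after this one request, so the next call looks like
    // a brand-new visitor to the ASP.NET application.
    $jar = tempnam(sys_get_temp_dir(), 'freshsession_');

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post_vars));
    curl_setopt($ch, CURLOPT_COOKIEFILE, $jar);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);
    $response = curl_exec($ch);
    curl_close($ch);

    unlink($jar);
    return $response;
}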
Well, given the options you are using, it seems you have covered the basics. Can you find out how their sessions are set up?
If you know how they set up a session, i.e. what they key it on (whether it is the IP or something else), then you can figure out a workaround. Another option is trying to set the cookies in a different cookie file:
CURLOPT_COOKIEFILE - The name of the file containing the cookie data. The cookie file can be in Netscape format, or just plain HTTP-style headers dumped into a file.
But if all they do is check cookies, your current code should work. If you can figure out what the cookie's name is, you can pass a custom cookie that is blank with the request to see if that works. But if you can get information out of them on how their sessions work, that would be best.
Use these two lines to handle the session:
curl_setopt($ch, CURLOPT_COOKIEJAR, "path/to/cookies.txt"); // cookies.txt should be writable
curl_setopt($ch, CURLOPT_COOKIEFILE, "path/to/cookies.txt");

PHP / Curl: HEAD Request takes a long time on some sites

I have simple code that does a HEAD request for a URL and then prints the response headers. I've noticed that on some sites, this can take a long time to complete.
For example, requesting http://www.arstechnica.com takes about two minutes. I've tried the same request using another web site that does the same basic task, and it comes back immediately. So there must be something I have set incorrectly that's causing this delay.
Here's the code I have:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt ($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
// Only calling the head
curl_setopt($ch, CURLOPT_HEADER, true); // header will be at output
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
$content = curl_exec ($ch);
curl_close ($ch);
Here's a link to the web site that does the same function: http://www.seoconsultants.com/tools/headers.asp
The code above, at least on my server, takes two minutes to retrieve www.arstechnica.com, but the service at the link above returns it right away.
What am I missing?
Try simplifying it a little bit:
print htmlentities(file_get_contents("http://www.arstechnica.com"));
The above outputs instantly on my web server. If it doesn't on yours, there's a good chance your web host has some kind of setting in place to throttle these kinds of requests.
EDIT:
Since the above happens instantly for you, try setting this curl setting on your original code:
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
Using the tool you posted, I noticed that http://www.arstechnica.com has a 301 header sent for any request sent to it. It is possible that cURL is getting this and not following the new Location specified to it, thus causing your script to hang.
SECOND EDIT:
Curiously enough, trying the same code you have above was making my webserver hang too. I replaced this code:
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
With this:
curl_setopt($ch, CURLOPT_NOBODY, true);
Which is the way the manual recommends you do a HEAD request. It made it work instantly.
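Put together, a minimal HEAD request along those lines could look like the sketch below (the URL is just the one from the question, and the timeout is illustrative):
$ch = curl_init('http://www.arstechnica.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);         // issue a HEAD request, don't wait for a body
curl_setopt($ch, CURLOPT_HEADER, true);         // include the response headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the 301 the site sends
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
$headers = curl_exec($ch);
curl_close($ch);
echo $headers;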
You have to remember that HEAD is only a suggestion to the web server. For HEAD to do the right thing it often takes some explicit effort on the part of the admins. If you HEAD a static file, Apache (or whatever your web server is) will often step in and do the right thing. If you HEAD a dynamic page, the default for most setups is to execute the GET path, collect all the results, and just send back the headers without the content. If that application is in a 3 (or more) tier setup, that call could potentially be very expensive and needless for a HEAD context. For instance, in a Java servlet, by default doHead() just calls doGet(). To do something a little smarter for the application the developer would have to explicitly implement doHead() (and more often than not, they will not).
I encountered an app from a Fortune 100 company that is used for downloading several hundred megabytes of pricing information. We'd check for updates to that data by executing HEAD requests fairly regularly until the modified date changed. It turned out that this request would actually make back-end calls to generate the list every time we made the request, which involved gigabytes of data on their back end and transferring it between several internal servers. They weren't terribly happy with us, but once we explained the use case they quickly came up with an alternative solution. If they had implemented HEAD, rather than relying on their web server to fake it, it would not have been an issue.
If my memory doesn't fail me, doing a HEAD request in cURL changes the HTTP protocol version to 1.0 (which is slow and probably the guilty party here), so try changing that to:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt ($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
// Only calling the head
curl_setopt($ch, CURLOPT_HEADER, true); // header will be at output
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1); // ADD THIS
$content = curl_exec ($ch);
curl_close ($ch);
I used the function below to find out the redirected URL.
$head = get_headers($url, 1);
The second argument makes it return an array with keys. E.g., the following will give the Location value:
$head["Location"]
http://php.net/manual/en/function.get-headers.php
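As a small usage sketch (assuming PHP 7+ and that the host sends at least one redirect): when several redirects happen, get_headers() reports "Location" as an array of every hop, so the last entry is the final URL.
$head = get_headers('http://www.arstechnica.com', 1);

// "Location" is a string for a single redirect, or an array listing every hop.
$location = $head['Location'] ?? null;
$finalUrl = is_array($location) ? end($location) : $location;

echo $finalUrl; // the URL the original address ultimately redirects to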
This:
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
I wasn't trying to get headers.
I was just trying to make a page that loads some data not take 2 minutes, similar to what's described above.
That magical little option dropped it down to 2 seconds.
