Using fsockopen to precompile key jsp pages - php

So the original problem is that we run an "industry standard" Java-based web application on WebSphere application servers, with around 100 million visits per year. After a restart of these app servers, we need to hit a few of the key pages so that the main servlets get compiled before we let the public onto them; otherwise they tend to crash in the initial crush.
On some clusters, it's about 6 pages that need to be hit, once for each of 35+ markets: roughly 200 URLs!
So the script I am working on already does all the hard work of putting these URLs together, and at the end of it all is an array of 200 URLs... now how do I hit them?
We were using CGI for this earlier, and its main problem was that it was synchronous, taking a loooooong time. Now I am trying to make a simple url.php that hits one single URL, which I can then call asynchronously from jQuery. I don't want to hit all 200 at once, of course; batches of 5 should make it roughly 5x faster :)
So, on to url.php. I haven't used PHP much in the past, so sockets are a bit new to me. What I have cobbled together so far is this:
function checkUrl($url, $port) {
    set_time_limit(20);
    ob_start();
    header("Content-Type: text/plain");
    $u = $url;
    $p = $port;
    // The request line always asks for "/", and $u is used both as the
    // host to connect to and as the Host header.
    $post = "HEAD / HTTP/1.1\r\n";
    $post .= "Host: $u\r\n";
    $post .= "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.2) Gecko/20060308 Firefox/1.5.0.2\r\n";
    $post .= "Keep-Alive: 200\r\n";
    $post .= "Connection: keep-alive\r\n\r\n";
    $sock = fsockopen($u, $p, $errno, $errstr, 10);
    if (!$sock) {
        echo "$errstr ($errno)<br />\n";
    } else {
        fwrite($sock, $post, strlen($post));
        // With keep-alive the server may hold the socket open, so this
        // loop can block until the server's keep-alive timeout expires.
        while (!feof($sock)) {
            echo fgets($sock);
        }
        fclose($sock);
    }
    ob_end_flush();
}
This works great if the URL is simply someserver.somedomain.com, but if there is a path tacked on the end (e.g. someserver.somedomain.com/gb/en), it fails.
As I understand it, all the code does so far is open the socket connection... but how can I get it to send the path separately?
The only output I need from this in the end is the HTTP Status code (200, 404, 301 etc) though it is important that it does fetch the complete page first in order for it to be compiled properly.
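If you do want to stay with raw sockets, the usual fix for the path problem is to split the URL with parse_url(), give only the host (and port) to fsockopen(), and put the path in the request line. Below is a minimal sketch under these assumptions: PHP 7.1+, plain http:// URLs that include the scheme (without it, parse_url() treats "someserver.somedomain.com/gb/en" as a path and finds no host), and hypothetical helper names buildGetRequest() and fetchStatus(). It sends GET over HTTP/1.0 with Connection: close, so the whole page is read before the status code is returned.

```php
<?php
// Hypothetical helpers: split a URL into host/port/path and fetch it over a raw socket.

function buildGetRequest(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        throw new InvalidArgumentException("Not an absolute http:// URL: $url");
    }
    $host = $parts['host'];
    $port = $parts['port'] ?? 80;
    // Send "host:port" in the Host header only when a port was given explicitly.
    $hostHeader = isset($parts['port']) ? "$host:$port" : $host;
    $path = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    // HTTP/1.0 + "Connection: close": the server closes the socket after the
    // body, so reading until feof() ends once the whole page has been sent.
    $request = "GET $path HTTP/1.0\r\n"
             . "Host: $hostHeader\r\n"
             . "Connection: close\r\n\r\n";
    return [$host, $port, $request];
}

function fetchStatus(string $url, int $timeout = 10): int
{
    [$host, $port, $request] = buildGetRequest($url);
    $sock = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($sock === false) {
        return 0; // DNS failure, connection refused or connect timeout
    }
    stream_set_timeout($sock, $timeout);
    fwrite($sock, $request);
    $statusLine = fgets($sock); // e.g. "HTTP/1.1 200 OK"
    // Read (and discard) the rest of the response so the full page is fetched.
    while (!feof($sock)) {
        fread($sock, 8192);
        if (stream_get_meta_data($sock)['timed_out']) {
            break; // server stopped sending; give up on the rest
        }
    }
    fclose($sock);
    if ($statusLine !== false && preg_match('#^HTTP/\d\.\d (\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return 0;
}
```

Calling fetchStatus('http://someserver.somedomain.com/gb/en') would then return 200, 404, 301 and so on, or 0 when no response line arrived.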

Maybe I'm missing something, but do you have the cURL extension available? There's no need to get jQuery in the mix: you can run parallel (asynchronous) requests straight from PHP with ease. You'll also be able to control the batch size easily, and add delays and whatnot as needed. I'm also not sure why you would need a raw socket to hit the JSP pages; hopefully this makes your life easier!
Here's a quick test script I have, based on code from php.net I'm sure:
<?php
// create both cURL resources
$ch1 = curl_init();
$ch2 = curl_init();
// set URL and other appropriate options
curl_setopt($ch1, CURLOPT_URL, "http://news.php.net/php.general/255000");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://news.php.net/php.general/255001");
curl_setopt($ch2, CURLOPT_HEADER, 0);
//create the multiple cURL handle
$mh = curl_multi_init();
//add the two handles
curl_multi_add_handle($mh,$ch1);
curl_multi_add_handle($mh,$ch2);
$active = null;
//execute the handles
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
if (curl_multi_select($mh) != -1) {
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
}
//close the handles
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_close($ch1);
curl_close($ch2);
curl_multi_close($mh);
?>
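Applied to the original problem, a sketch along the same lines could warm up the ~200 URLs in batches of 5 and report each URL's status code. Assumptions: PHP 7.1+ with the cURL extension, and a hypothetical function name warmUp(). CURLOPT_RETURNTRANSFER is set so each page is downloaded in full but not echoed, and curl_getinfo() with CURLINFO_HTTP_CODE supplies the 200/404/301 the question asks for (0 when no response arrived).

```php
<?php
// Hypothetical warm-up helper: fetch URLs in fixed-size batches with curl_multi
// and return [url => HTTP status code]. Duplicate URLs are checked once.
function warmUp(array $urls, int $batchSize = 5, int $timeout = 30): array
{
    $codes = [];
    foreach (array_chunk(array_values(array_unique($urls)), $batchSize) as $batch) {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($batch as $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,  // download the whole page, don't echo it
                CURLOPT_FOLLOWLOCATION => false, // report 301/302 instead of following them
                CURLOPT_TIMEOUT        => $timeout,
            ]);
            curl_multi_add_handle($mh, $ch);
            $handles[$url] = $ch;
        }
        // Drive all transfers in this batch until none are still running.
        do {
            $status = curl_multi_exec($mh, $active);
            if ($status !== CURLM_OK && $status !== CURLM_CALL_MULTI_PERFORM) {
                break; // multi-handle error; unfinished URLs report 0
            }
            if ($active) {
                curl_multi_select($mh, 1.0); // sleep until there is socket activity
            }
        } while ($active);

        foreach ($handles as $url => $ch) {
            $codes[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }
    return $codes;
}
```

You would call it as warmUp($urls) with the array your script already builds; the batch size and timeout are the two knobs to tune.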

Related

cURL Multi Simultaneous Requests (domain check)

I'm trying to take a list of 20,000+ domain names and check whether they are "alive". All I really need is a simple HTTP code check, but I can't figure out how to get that working with curl_multi. In a separate script I have the following function, which simultaneously checks a batch of 1000 domains and returns the JSON response code. Maybe this can be modified to get just the HTTP response code instead of the page content?
// $dotNetRequests = array of domains...
$NetcurlRequest = array();
//loop through the domains in chunks of 1000
foreach(array_chunk($dotNetRequests, 1000) as $Netrequests) {
$results = checkDomains($Netrequests);
$NetcurlRequest = array_merge($NetcurlRequest, $results);
}
function checkDomains($data) {
// array of curl handles
$curly = array();
// data to be returned
$result = array();
// multi handle
$mh = curl_multi_init();
// loop through $data and create curl handles
// then add them to the multi-handle
foreach ($data as $id => $d) {
$curly[$id] = curl_init();
$url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
curl_setopt($curly[$id], CURLOPT_URL, $url);
curl_setopt($curly[$id], CURLOPT_HEADER, 0);
curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
// post?
if (is_array($d)) {
if (!empty($d['post'])) {
curl_setopt($curly[$id], CURLOPT_POST, 1);
curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
}
}
curl_multi_add_handle($mh, $curly[$id]);
}
// execute the handles
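$running = null;
do {
curl_multi_exec($mh, $running);
// wait for socket activity instead of spinning at 100% CPU
if ($running > 0) {
curl_multi_select($mh);
}
} while($running > 0);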
// get content and remove handles
foreach($curly as $id => $c) {
// $result[$id] = curl_multi_getcontent($c);
// if($result[$id]) {
if (curl_multi_getcontent($c)){
//echo "yes";
$netName = $data[$id];
$dName = str_replace(".net", ".com", $netName);
$query = "Update table1 SET dotnet = '1' WHERE Domain = '$dName'";
mysql_query($query);
}
curl_multi_remove_handle($mh, $c);
curl_close($c);
}
// all done
curl_multi_close($mh);
return $result;
}
In any other language you would thread this kind of operation ...
https://github.com/krakjoe/pthreads
And you can in PHP too :)
I would suggest a few workers rather than 20,000 individual threads. Not that 20,000 threads is out of the realm of possibility (it isn't), but it wouldn't be a good use of resources. I would do as you are doing now and have 20 workers each getting the results for 1000 domains. I assume you don't need me to give an example of getting a response code; I'm sure cURL would give it to you, but it's probably overkill to use cURL given that you don't require its threading capabilities. I would fsockopen port 80, fprintf a "GET / HTTP/1.0" request (with a Host header), fgets the first line and close the connection. If you're going to be doing this all the time, I would also send Connection: close so that the receiving machines are not holding connections unnecessarily.
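The fsockopen approach described above can be sketched as follows. Assumptions: plain HTTP on port 80, PHP 7+, and hypothetical function names statusFromLine() and quickStatus(). Only the status line is read before the socket is closed, so no body is consumed by the client.

```php
<?php
// Hypothetical helpers for the "read the first line and hang up" check.

function statusFromLine(string $line): int
{
    // A status line looks like "HTTP/1.1 301 Moved Permanently".
    return preg_match('#^HTTP/\d(?:\.\d)? (\d{3})\b#', $line, $m) ? (int) $m[1] : 0;
}

function quickStatus(string $domain, int $timeout = 5): int
{
    $fp = @fsockopen($domain, 80, $errno, $errstr, $timeout);
    if ($fp === false) {
        return 0; // DNS failure, refused connection, or connect timeout
    }
    stream_set_timeout($fp, $timeout);
    // Host is needed for name-based virtual hosts; Connection: close tells the
    // server not to keep the connection open after responding.
    fprintf($fp, "GET / HTTP/1.0\r\nHost: %s\r\nConnection: close\r\n\r\n", $domain);
    $line = fgets($fp);
    fclose($fp);
    return $line === false ? 0 : statusFromLine($line);
}
```

Each worker would loop over its share of domains calling quickStatus($domain) and record any non-zero code as "alive".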
This script works great for handling bulk simultaneous cURL requests using PHP.
I'm able to parse through 50k domains in just a few minutes using it!
https://github.com/petewarden/ParallelCurl/

Why are curl_multi_select and curl_multi_info_read contradicting each other?

When I run the code below, it seems to me that curl_multi_select and curl_multi_info_read contradict each other. As I understand it, curl_multi_select is supposed to block until curl_multi_exec has a response, but I haven't seen that actually happen.
$url = "http://google.com";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
$mc = curl_multi_init();
curl_multi_add_handle($mc, $ch);
do {
$exec = curl_multi_exec($mc, $running);
} while ($exec == CURLM_CALL_MULTI_PERFORM);
$ready=curl_multi_select($mc, 100);
var_dump($ready);
$info = curl_multi_info_read($mc,$msgs);
var_dump($info);
This returns:
int 1
boolean false
which seems to contradict itself. How can it be ready and not have any messages?
The PHP version I'm using is 5.3.9.
Basically curl_multi_select blocks until there is something to read or send with curl_multi_exec. If you loop around curl_multi_exec without using curl_multi_select this will eat up 100% of a CPU core.
So curl_multi_info_read is used to check if any transfer has ended (correctly or with an error).
Code using the multi handle should follow the following pattern:
do
{
$mrc = curl_multi_exec($this->mh, $active);
}
while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK)
{
curl_multi_select($this->mh);
do
{
$mrc = curl_multi_exec($this->mh, $active);
}
while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($info = curl_multi_info_read($this->mh))
{
$this->process_ch($info);
}
}
See also: Doing curl_multi_exec the right way.
From the spec:
Ask the multi handle if there are any messages or information from the individual transfers. Messages may include information such as an error code from the transfer or just the fact that a transfer is completed.
The 1 could mean there is activity, but not necessarily a message waiting: in this case probably that some of your download data is available, but not all. The example in the curl_multi_select doc explicitly tests for false values back from curl_multi_info_read.
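Applied to the code in the question: a single curl_multi_select() followed by a single curl_multi_info_read() is not enough; you keep calling curl_multi_exec() until a message shows up. A minimal sketch, assuming a multi handle that already has transfers added and a hypothetical function name waitForFirstCompletion():

```php
<?php
// Hypothetical helper: drive $mh until one transfer completes and return the
// curl_multi_info_read() message for it (or false if nothing was running).
function waitForFirstCompletion($mh, float $timeout = 1.0)
{
    do {
        $status = curl_multi_exec($mh, $running);
        // A message appears only once a transfer has finished (OK or failed).
        $info = curl_multi_info_read($mh);
        if ($info !== false) {
            return $info; // ['msg' => CURLMSG_DONE, 'result' => ..., 'handle' => ...]
        }
        if ($status !== CURLM_OK && $status !== CURLM_CALL_MULTI_PERFORM) {
            return false;
        }
        if ($running) {
            // Returns as soon as any socket is ready, which may only mean
            // "some data arrived", not "a transfer is done".
            curl_multi_select($mh, $timeout);
        }
    } while ($running);
    return curl_multi_info_read($mh);
}
```

The 'result' field is a CURLE_* code (CURLE_OK on success), and 'handle' tells you which easy handle finished.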

Exit out of a cURL fetch

I'm trying to find a way to only quickly access a file and then disconnect immediately.
So I've decided to use cURL since it's the fastest option for me. But I can't figure out how I should "disconnect" cURL.
With the code below, Apache's access log says that the file I tried to access was indeed accessed, but I'm feeling a little iffy about this, because when I run the while loop without breaking out of it, it just keeps looping. Shouldn't the loop stop when cURL has finished fetching the file? Or am I just being silly, and the loop simply restarts constantly?
<?php
$Resource = curl_init();
curl_setopt($Resource, CURLOPT_URL, '...');
curl_setopt($Resource, CURLOPT_HEADER, 0);
curl_setopt($Resource, CURLOPT_USERAGENT, '...');
while(curl_exec($Resource)){
break;
}
curl_close($Resource);
?>
I tried setting the CURLOPT_CONNECTTIMEOUT_MS / CURLOPT_CONNECTTIMEOUT options to very small values, but it didn't help in this case.
Is there a more "proper" way of doing this?
This statement is superfluous:
while(curl_exec($Resource)){
break;
}
Instead just keep the return value for future reference:
$result = curl_exec($Resource);
The while loop does not help anything. Now to your question: you can tell cURL to take only a few bytes of the body and then quit. That can be achieved by reducing CURLOPT_BUFFERSIZE to a small value and using a write callback that tells cURL to stop:
$withCallback = array(
CURLOPT_BUFFERSIZE => 20, # ~ value of bytes you'd like to get
CURLOPT_WRITEFUNCTION => function($handle, $data) {
echo "WRITE: (", strlen($data), ") $data\n";
return 0;
},
);
$handle = curl_init("http://stackoverflow.com/");
curl_setopt_array($handle, $withCallback);
curl_exec($handle);
curl_close($handle);
Output:
WRITE: (10) <!DOCTYPE
Another alternative is to make a HEAD request using CURLOPT_NOBODY, which will never fetch the body. But then it's not a GET request.
The connect timeout settings control how long cURL waits for the connection to be established. The connect phase lasts until the server has accepted the connection from cURL; it is not related to the phase in which cURL fetches data from the server. That phase is governed by:
CURLOPT_TIMEOUT: The maximum number of seconds to allow cURL functions to execute.
You'll find the long list of available options in the PHP Manual: curl_setopt Docs.
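Putting those two suggestions together, a sketch that just "touches" a URL: CURLOPT_NOBODY avoids downloading the body, CURLOPT_CONNECTTIMEOUT_MS limits the connect phase and CURLOPT_TIMEOUT_MS limits the whole transfer. The function name touchUrl() and the millisecond defaults are assumptions; note that it sends HEAD, not GET.

```php
<?php
// Hypothetical helper: request a URL with HEAD and give up quickly.
function touchUrl(string $url, int $connectMs = 1000, int $totalMs = 2000): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY            => true,       // HEAD request: the body is never fetched
        CURLOPT_RETURNTRANSFER    => true,       // don't print anything
        CURLOPT_CONNECTTIMEOUT_MS => $connectMs, // limit for the connect phase only
        CURLOPT_TIMEOUT_MS        => $totalMs,   // limit for the whole transfer
    ]);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response was received
    curl_close($ch);
    return $code;
}
```

If you need a GET instead, drop CURLOPT_NOBODY and combine the timeouts with the write-callback approach shown earlier.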
Perhaps that might be helpful?
$GLOBALS["dataread"] = 0;
define("MAX_DATA", 3000); // how many bytes should be read?
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch, CURLOPT_WRITEFUNCTION, "handlewrite");
curl_exec($ch);
curl_close($ch);
function handlewrite($ch, $data)
{
$GLOBALS["dataread"] += strlen($data);
echo "READ " . strlen($data) . " bytes\n";
if ($GLOBALS["dataread"] > MAX_DATA) {
return 0;
}
return strlen($data);
}
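The same byte limit can be written without $GLOBALS by keeping the counter in a closure. This is a sketch with a hypothetical factory function makeLimitedWriter(); the cURL semantics are the same as in handlewrite(): returning strlen($data) keeps the transfer going, while returning any other value (here 0) makes cURL abort it, and curl_exec() then returns false.

```php
<?php
// Hypothetical factory: a CURLOPT_WRITEFUNCTION callback that stops after $maxBytes.
// $bytesRead is passed by reference so the caller can see how much was read.
function makeLimitedWriter(int $maxBytes, &$bytesRead): callable
{
    $bytesRead = 0;
    return function ($ch, string $data) use ($maxBytes, &$bytesRead): int {
        $bytesRead += strlen($data);
        if ($bytesRead > $maxBytes) {
            return 0; // anything other than strlen($data) aborts the transfer
        }
        return strlen($data); // tell cURL the whole chunk was consumed
    };
}

// Usage (needs network access):
// $read = 0;
// $ch = curl_init("http://www.php.net/");
// curl_setopt($ch, CURLOPT_WRITEFUNCTION, makeLimitedWriter(3000, $read));
// curl_exec($ch);
// curl_close($ch);
```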

Curl Multi Threading

I am looking for a cURL function that can open a given number of web pages at a time, ideally with no output (returning false would be even better). I need to access 5-10 URLs at the same time. I've heard about cURL multi threading but don't have a proper function or class to use it.
The ones I found by searching mostly seem to use a loop, meaning they don't use concurrent connections, just one after another! I want something that can open multiple connections at a time, not one by one!
I made one :
function mutload($url){
if(!is_array($url)){
exit;
}
$ch = array();
for($i=0;$i<count($url);$i++){
// create one cURL resource per URL
$ch[$i] = curl_init();
// set URL and other appropriate options
curl_setopt($ch[$i], CURLOPT_URL, $url[$i]);
curl_setopt($ch[$i], CURLOPT_HEADER, 0);
curl_setopt($ch[$i], CURLOPT_RETURNTRANSFER, 0);
}
//create the multiple cURL handle
$mh = curl_multi_init();
for($i=0;$i<count($url);$i++){
//add each handle
curl_multi_add_handle($mh,$ch[$i]);
}
$active = null;
//execute the handles
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
if (curl_multi_select($mh) != -1) {
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
}
//close the handles
for($i=0;$i<count($url);$i++){
curl_multi_remove_handle($mh, $ch[$i]);
curl_close($ch[$i]);
}
curl_multi_close($mh);
}
OK! But I'm confused: will it connect to all the URLs at once or one by one? Moreover, I'm getting the content as well; I only want to connect to (request) the site and don't need any content from it. I set RETURNTRANSFER to false but it didn't work. Please help, thanks!
You're looking for the curl_multi_* family of functions. Have a look at curl_multi_exec.
Set CURLOPT_NOBODY to prevent cURL from downloading any content.
I didn't test your code, but curl_multi adds items to a queue from a loop and processes them in parallel. Sometimes there can be issues if you try to load hundreds of URLs, but it should be fine for a few. If you have long DNS lookups or slow servers, all your results will have to wait for the slowest request.
This code is tested and should work, it is somewhat similar to yours:
http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/
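The linked article describes a "rolling" variant of curl_multi. Below is a sketch of that idea (not the article's code) that also covers the two points in the question: every URL gets its own handle and they all run concurrently, CURLOPT_NOBODY skips the content, and it is CURLOPT_RETURNTRANSFER set to true (not false) that stops cURL from printing the response. Assumptions: PHP 7.1+ with the cURL extension; the function name rollingStatus() and its parameters are hypothetical.

```php
<?php
// Hypothetical "rolling window" helper: at most $window requests in flight;
// each time one finishes (seen via curl_multi_info_read) the next URL starts.
// Returns [url => HTTP status code]; duplicate URLs are checked once.
function rollingStatus(array $urls, int $window = 5, bool $headOnly = true, int $timeout = 10): array
{
    $mh = curl_multi_init();
    $queue = array_values(array_unique($urls));
    $inFlight = []; // list of [handle, url] pairs
    $codes = [];

    $startNext = function () use (&$queue, &$inFlight, $mh, $headOnly, $timeout): void {
        $url = array_shift($queue);
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_NOBODY         => $headOnly, // true: HEAD request, no content downloaded
            CURLOPT_RETURNTRANSFER => true,      // never print the response
            CURLOPT_TIMEOUT        => $timeout,
        ]);
        curl_multi_add_handle($mh, $ch);
        $inFlight[] = [$ch, $url];
    };

    while ($queue && count($inFlight) < $window) {
        $startNext(); // fill the initial window
    }

    while ($inFlight) {
        $status = curl_multi_exec($mh, $active);
        if ($status !== CURLM_OK && $status !== CURLM_CALL_MULTI_PERFORM) {
            break; // multi-handle error: give up on whatever is still running
        }
        $started = false;
        while (($info = curl_multi_info_read($mh)) !== false) {
            foreach ($inFlight as $k => [$ch, $url]) {
                if ($ch === $info['handle']) {
                    $codes[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
                    curl_multi_remove_handle($mh, $ch);
                    curl_close($ch);
                    unset($inFlight[$k]);
                    break;
                }
            }
            if ($queue) {
                $startNext(); // refill the slot that just freed up
                $started = true;
            }
        }
        if ($inFlight && !$started) {
            curl_multi_select($mh, 1.0); // wait for socket activity instead of spinning
        }
    }

    foreach ($inFlight as [$ch, $url]) { // only reached after an error
        $codes[$url] = 0;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $codes;
}
```

With 5-10 URLs you can simply pass $window = count($urls) so that everything starts at once; a smaller window matters once the list grows into the hundreds.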

How do you detect a website visitor's country (Specifically, US or not)?

I need to show different links for US and non-US visitors to my site. This is for convenience only, so I am not looking for a super-high degree of accuracy, and security or spoofing are not a concern.
I know there are geotargeting services and lists, but this seems like overkill since I only need to determine (roughly) if the person is in the US or not.
I was thinking about using JavaScript to get the user's timezone, but this appears to only give the offset, so users in Canada, Mexico, and South America would have the same value as people in the US.
Are there any other bits of information available either in JavaScript, or PHP, short of grabbing the IP address and doing a lookup, to determine this?
There are some free services out there that let you do country and IP-based geolocation from the client side.
I've used the wipmania free JSONP service, it's really simple to use:
<script type="text/javascript">
// plain JavaScript example
function jsonpCallback(data) {
alert('Latitude: ' + data.latitude +
'\nLongitude: ' + data.longitude +
'\nCountry: ' + data.address.country);
}
</script>
<script src="http://api.wipmania.com/jsonp?callback=jsonpCallback"
type="text/javascript"></script>
Or if you use a framework that supports JSONP, like jQuery you can:
// jQuery example
$.getJSON('http://api.wipmania.com/jsonp?callback=?', function (data) {
alert('Latitude: ' + data.latitude +
'\nLongitude: ' + data.longitude +
'\nCountry: ' + data.address.country);
});
The best indicator is probably the HTTP Accept-Language header. It will look something like below in the HTTP request:
GET / HTTP/1.1
Accept: */*
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.21022; .NET CLR 3.5.30729; MDDC; OfficeLiveConnector.1.4; OfficeLivePatch.0.0; .NET CLR 3.0.30729)
Accept-Encoding: gzip, deflate
Host: www.google.com
Connection: Keep-Alive
You should be able to retrieve this in PHP using the following:
<?php
echo $_SERVER['HTTP_ACCEPT_LANGUAGE'];
?>
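If you go the Accept-Language route, note that the header can list several tags with q-weights (for example "en-US,en;q=0.9,fr-CA;q=0.8"), so a plain comparison against "en-us" is fragile. The sketch below extracts the region subtags; it assumes browsers list tags in order of preference (q-weights are ignored) and uses hypothetical function names. It reflects the browser's language settings, not the visitor's location.

```php
<?php
// Hypothetical helpers: region subtags from an Accept-Language header.
function acceptLanguageRegions(string $header): array
{
    $regions = [];
    foreach (explode(',', $header) as $part) {
        // Keep the language tag, drop any ";q=0.8" weight and whitespace.
        $tag = trim(explode(';', $part)[0]);
        // language[-Script]-REGION, e.g. "en-us", "fr-CA", "zh-Hant-TW".
        if (preg_match('/^[a-z]{2,3}(?:-[a-z]{4})?-([a-z]{2})(?:-|$)/i', $tag, $m)) {
            $regions[] = strtoupper($m[1]);
        }
    }
    // Tags are assumed to be listed in order of preference; q-weights are ignored.
    return array_values(array_unique($regions));
}

function looksLikeUsVisitor(string $header): bool
{
    $regions = acceptLanguageRegions($header);
    return ($regions[0] ?? null) === 'US';
}

// In a request handler:
// $showUsLinks = looksLikeUsVisitor($_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '');
```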
I would say that geotargeting is the only method that's even remotely reliable. But there are also cases where it doesn't help at all. I keep getting to sites that think I'm in France because my company's backbone is there and all Internet traffic goes through it.
The HTTP Accept-Language header is not enough to determine the user's locale. It only tells you which language the user selected, which may have nothing to do with where they are.
Wipmania.com & PHP
<?php
$site_name = "www.your-site-name.com";
// $site_name is passed in: a function cannot see global variables directly
function getUserCountry($site_name) {
$fp = fsockopen("api.wipmania.com", 80, $errno, $errstr, 5);
if (!$fp) {
// API is currently down, return as "Unknown" :(
return "XX";
} else {
$out = "GET /".$_SERVER['REMOTE_ADDR']."?".$site_name." HTTP/1.1\r\n";
$out .= "Host: api.wipmania.com\r\n";
$out .= "Typ: php\r\n";
$out .= "Ver: 1.0\r\n";
$out .= "Connection: Close\r\n\r\n";
fwrite($fp, $out);
while (!feof($fp)) {
$country = fgets($fp, 3);
}
fclose($fp);
return $country;
}
}
?>
or using cURL:
public function __construct($site_name) {
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, Array("Content-Type: text/xml"));
curl_setopt($ch, CURLOPT_URL, "http://api.wipmania.com/".$_SERVER['REMOTE_ADDR']."?".$site_name);
curl_setopt($ch, CURLOPT_HEADER, 0);
// fetch the URL; with RETURNTRANSFER the body is returned, not printed
$response = curl_exec($ch);
$info = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$errno = curl_errno($ch);
$error = curl_error($ch);
// close cURL resource, and free up system resources (also on the error paths)
curl_close($ch);
if (($response === false) || ($info !== 200)) {
throw new Exception('HTTP Error calling Wipmania API - HTTP Status: ' . $info . ' - cURL Error: ' . $error);
} elseif ($errno > 0) {
throw new Exception('HTTP Error calling Wipmania API - cURL Error: ' . $error);
}
$this->country = $response;
}
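Alternatively, you can simply use the Hostip API. Pass the visitor's IP address; otherwise the lookup is done for your server's IP, since the server is the one making the request:
<?php
$country_code = file_get_contents("http://api.hostip.info/country.php?ip=" . urlencode($_SERVER['REMOTE_ADDR']));
if ($country_code == "US") { echo "You Are USA"; }
else { echo "You Are Not USA"; }
?>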
The full list of country codes is here:
http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
My solution, easy and small: in this example I test for the Canada region via the language tags fr-CA or en-CA.
if( preg_match( "/^[a-z]{2}\-(ca)/i", $_SERVER[ "HTTP_ACCEPT_LANGUAGE" ] ) ){
$region = "Canada";
}
Depending on which countries you want to distinguish, time zones can be a very easy way to achieve it - and I assume it's quite reliable as most people will have the clocks on their computers set right. (Though of course there are many countries you can't distinguish using this technique).
Here's a really simple example of how to do it:
http://unmissabletokyo.com/country-detector
