How to run a non-blocking process in PHP on a Windows server? - php

I am writing a php script to interact with a CouchDb server. The script reads an SQL database and creates documents and PUTs them on the server. Each script runs every 5 minutes and puts about 2000 documents (creates and updates).
Running synchronously, it takes about 3 minutes to PUT all the docs. In a test I did using Node and promises, I found CouchDB can handle 100 async PUTs at the same time and respond in only slightly more time than it took to do a single document. I want to utilize this in PHP instead.
I have available, php 5.3 and php 7.0.10 on a Windows server.
How do I do this asynchronously?
My first thought was using the pclose(popen()) trick, but that spawns a new process each time, and even if I restrict this to 100 docs at a time (my tests show up to 700 at a time is doable), that would still result in the 6 scripts creating and recreating 600 new processes for every batch of 100 of the 2000 docs, or a total of 12,000 processes created and run every 5 minutes. I don't think Windows can handle that.
My second idea was to set up a basic Node script to handle it, with PHP creating and formatting the data, writing it to a file, and passing the file to a Node script that processes it asynchronously and reports back to PHP via exec. But I am hoping to find a pure PHP solution.
I currently send requests to CouchDB like this:
private function sendJSONRequest($method, $url, $post_data = NULL)
{
    // Open socket
    $s = fsockopen($this->db_host, $this->db_port, $errno, $errstr);
    if (!$s) {
        throw new Exception("fsockopen: $errno: $errstr");
    }

    // Prepare request
    $request = "$method $url HTTP/1.0\r\n" .
        ($this->db_auth === false ? "" : "Authorization: $this->db_auth\r\n") .
        "User-Agent: couchdb-php/1.0\r\n" .
        "Host: $this->db_host:$this->db_port\r\n" .
        "Accept: application/json\r\n" .
        "Connection: close\r\n";

    if ($method == "POST" || $method == "PUT") {
        $json_data = json_encode($post_data);
        $request .= "Content-Type: application/json\r\n" .
            "Content-Length: " . strlen($json_data) . "\r\n\r\n" .
            $json_data;
    } else {
        $request .= "\r\n";
    }

    // Send request
    fwrite($s, $request);

    // Receive response
    $response = "";
    while (!feof($s)) {
        $response .= fgets($s);
    }

    $headers = array();
    $body = '';
    $reason = '';
    if (!empty($response)) {
        // Split header & body (limit 2, so a body containing a blank line stays intact)
        list($header, $body) = explode("\r\n\r\n", $response, 2);
        // Parse header
        $first = true;
        foreach (explode("\r\n", $header) as $line) {
            if ($first) {
                $status = intval(substr($line, 9, 3));
                $reason = substr($line, 13);
                $first = false;
            } else {
                $p = strpos($line, ":");
                $headers[strtolower(substr($line, 0, $p))] = substr($line, $p + 2);
            }
        }
    } else {
        $status = 200;
    }

    // Return results
    return array($status, $reason, $headers, json_decode($body));
}
My PHP knowledge is only basic, so examples to learn from would be greatly appreciated.
Thank you

Guzzle is a PHP library that helps send HTTP requests and can do so asynchronously. The documentation for the async function can be found here.
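For example, here is a rough sketch of PUTting a batch of CouchDB documents with bounded concurrency using Guzzle's request pool. The database name, document array, and callbacks are illustrative, and Guzzle 6 requires at least PHP 5.5, so this fits the PHP 7.0 build mentioned in the question but not 5.3:

<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

// $docs is assumed to be an array of id => document pairs built from the SQL rows.
$client = new Client(['base_uri' => 'http://localhost:5984']);

$requests = function ($docs) {
    foreach ($docs as $id => $doc) {
        yield new Request(
            'PUT',
            '/mydb/' . rawurlencode($id),              // 'mydb' is a placeholder database name
            ['Content-Type' => 'application/json'],
            json_encode($doc)
        );
    }
};

$pool = new Pool($client, $requests($docs), [
    'concurrency' => 100,                              // at most 100 PUTs in flight at once
    'fulfilled'   => function ($response, $index) {
        // 201/202: document stored; inspect $response here if needed
    },
    'rejected'    => function ($reason, $index) {
        // connection error or 4xx/5xx response; log $reason
    },
]);

// Start the transfers and block until every request has settled.
$pool->promise()->wait();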

It's been a little while since I researched this topic, but simply put, what you are looking for is a queue runner system. At a previous employer I worked with a custom-built queue runner in PHP.
That means you have, e.g., 4 queue runners. These are PHP processes that watch a control table, say "queue". Each time a queue entry is inserted with status "new", a runner locks that entry and starts processing it with a forked job.
PHP forking: http://php.net/manual/de/function.pcntl-fork.php
So these 4 queue runners can each fork, let's say, 10 processes, and then you have 40 processes working in parallel.
To separate what each one does, the best way is another control table from which each job selects a slice of data with LIMIT and OFFSET queries. Let's say job 1 selects rows 0-20, job 2 rows 21-40. A minimal fork sketch follows below.
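A rough sketch of the forking part, assuming a Unix-like host (note that pcntl_fork() is not available on Windows, which matters for the original question) and hypothetical helpers fetch_new_queue_entries() and process_queue_entry() that talk to the control table:

<?php
// One queue runner: fork up to $maxWorkers children, one per locked queue entry.
$maxWorkers = 10;
$children = array();

foreach (fetch_new_queue_entries($maxWorkers) as $entry) {  // hypothetical: lock and return "new" rows
    $pid = pcntl_fork();
    if ($pid === -1) {
        die('Could not fork');
    } elseif ($pid === 0) {
        // Child process: handle exactly one queue entry, then exit.
        process_queue_entry($entry);                        // hypothetical worker function
        exit(0);
    }
    // Parent process: remember the child PID and keep dispatching.
    $children[] = $pid;
}

// Parent waits for all workers to finish before picking up the next batch.
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
}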
Edit:
After a little research, this looks quite similar to what I worked with: https://github.com/CoderKungfu/php-queue

Related

PHP CLI: writing and reading multiple raw TCP APIs in parallel, like curl_multi_init for HTTPS APIs

My situation:
I have multiple servers running a raw TCP API that requires me to send a string to get information back. I need to get a response within a timeout of 5 seconds. All APIs should be contacted at the same time, and from then on they have 5 seconds to respond. (So the maximum execution time is 5 seconds for all servers at once.)
I already managed to do so for HTTP/S APIs with PHP cURL:
// array of curl handles
$multiCurl = array();
// data to be returned
$result = array();
// multi handle
$mh = curl_multi_init();

foreach ($row_apis as $api) {
    $id   = $api[0];
    $ip   = $api[1];
    $port = $api[2];
    // URL from which data will be fetched
    $fetchURL = "$ip:$port/api/status";
    $multiCurl[$id] = curl_init();
    curl_setopt($multiCurl[$id], CURLOPT_URL, $fetchURL);
    //curl_setopt($multiCurl[$id], CURLOPT_HEADER, 0);
    curl_setopt($multiCurl[$id], CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($multiCurl[$id], CURLOPT_CUSTOMREQUEST, "GET");
    curl_setopt($multiCurl[$id], CURLOPT_TIMEOUT, 5);
    curl_setopt($multiCurl[$id], CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $multiCurl[$id]);
}

$index = null;
do {
    curl_multi_exec($mh, $index);
} while ($index > 0);

// get content and remove handles
foreach ($multiCurl as $k => $ch) {
    $result[$k] = json_decode(curl_multi_getcontent($ch), true);
    curl_multi_remove_handle($mh, $ch);
}
// close
curl_multi_close($mh);
This sample fetches all APIs at once and waits 5 seconds for a response. It will never take longer than 5 seconds.
Is there a way to do the same thing with raw TCP APIs in PHP?
I already tried to use sockets and was able to get the information, but every API is fetched one after another, so the script takes way too long for multiple servers.
Thanks for your help.
EDIT:
I've tried to implement your suggestions and my code now looks like this:
$apis = array();
$apis[0] = array(1, "123.123.123.123", 1880);
$method = "summary";
$sockets = array();

// Create socket array
foreach ($apis as $api) {
    $id   = $api[0];
    $ip   = $api[1];
    $port = $api[2];
    $sockets[$id] = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_set_nonblock($sockets[$id]);
    @socket_connect($sockets[$id], $ip, $port);
    //socket_write($sockets[$id], $method);
}

// Write to every socket
/*
foreach ($sockets as $socket) {
    socket_write($socket, $method);
    //fwrite($socket, "$method");
}
*/

// Read all sockets for 5 seconds
$write = NULL;
$except = NULL;
$responses = socket_select($sockets, $write, $except, 5);

// Check result
if ($responses === false) {
    echo "Did not work";
} elseif ($responses > 0) {
    echo "At least one has responded";
}

// Access the data
// ???
But I'm getting a 0 as the result of socket_select...
When do I need to write the method to the socket?
And if I will get something back, how do I access the data that was in the response?
Absolutely. Set SO_SNDBUF to an appropriate size so you can send all the requests instantly/non-blockingly, then send all the requests, then start waiting for/reading the responses.
The easy way to do the reading is to call socket_set_block on the sockets and read all responses one by one, but this doesn't give a hard guarantee of a 5-second timeout (then again, neither does your example curl_multi code). If you need a 5-second timeout, use socket_set_nonblock and socket_select instead, as in the sketch below.
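A rough sketch of that approach, building on the $apis array and $method from the question; the 8192-byte read and the assumption that each API answers in a single burst are simplifications:

$sockets = array();
foreach ($apis as $api) {
    list($id, $ip, $port) = $api;
    $s = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_set_nonblock($s);
    @socket_connect($s, $ip, $port);   // returns false while still connecting; that is expected
    $sockets[$id] = $s;
}

// Wait until the sockets become writable (i.e. connected), then send the request string.
$read = null; $write = $sockets; $except = null;
if (socket_select($read, $write, $except, 5) > 0) {
    foreach ($write as $s) {
        socket_write($s, $method);
    }
}

// Collect responses, spending at most 5 seconds in total.
$responses = array();
$pending   = $sockets;
$deadline  = microtime(true) + 5;
while ($pending && ($left = $deadline - microtime(true)) > 0) {
    $read = $pending; $write = null; $except = null;
    $sec  = (int) $left;
    $usec = (int) (($left - $sec) * 1000000);
    if (socket_select($read, $write, $except, $sec, $usec) < 1) {
        break; // timeout or error: give up on whatever is still pending
    }
    foreach ($read as $s) {
        $id   = array_search($s, $pending, true);
        $data = socket_read($s, 8192);
        if ($data !== false && $data !== '') {
            $responses[$id] = $data;   // assume one read returns the whole reply
        }
        socket_close($s);
        unset($pending[$id]);
    }
}
// $responses now maps API id => raw reply string (missing ids timed out).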

Inconsistencies with CURL Multi PHP

When I run a check on 10 URLs, if I am able to get a connection with the host server, the handle will return a success message (CURLE_OK).
When processing each handle, if a server refuses the connection, the handle will include an error message.
The problem
I assumed that when we get a bad handle, cURL would mark this handle but continue to process the unprocessed handles; however, this is not what seems to happen.
When we come across a bad handle, cURL marks it as bad, but does not process the remaining unprocessed handles.
This can be hard to detect: if I get a connection with all handles, which is what happens most of the time, the problem is not visible (cURL only stops on the first bad connection).
For the test, I had to find a suitable site which loads slowly or refuses a certain number of simultaneous connections.
set_time_limit(0);
$l = array(
    'http://smotri.com/video/list/',
    'http://smotri.com/video/list/sports/',
    'http://smotri.com/video/list/animals/',
    'http://smotri.com/video/list/travel/',
    'http://smotri.com/video/list/hobby/',
    'http://smotri.com/video/list/gaming/',
    'http://smotri.com/video/list/mult/',
    'http://smotri.com/video/list/erotic/',
    'http://smotri.com/video/list/auto/',
    'http://smotri.com/video/list/humour/',
    'http://smotri.com/video/list/film/'
);
$mh = curl_multi_init();
$s = 0;
$f = 10;
while ($s <= $f)
{
    $ch = curl_init();
    $curlsettings = array(
        CURLOPT_URL => $l[$s],
        CURLOPT_TIMEOUT => 0,
        CURLOPT_CONNECTTIMEOUT => 0,
        CURLOPT_RETURNTRANSFER => 1
    );
    curl_setopt_array($ch, $curlsettings);
    curl_multi_add_handle($mh, $ch);
    $s++;
}
$active = null;
do
{
    curl_multi_exec($mh, $active);
    curl_multi_select($mh);
    $info = curl_multi_info_read($mh);
    echo '<pre>';
    var_dump($info);
    if ($info['result'] === CURLE_OK)
        echo curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL) . ' success<br>';
    if ($info['result'] != 0)
        echo curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL) . ' failed<br>';
} while ($active > 0);
curl_multi_close($mh);
I have dumped $info in the script, which asks the multi handle whether there is any new information on any handles while running.
When the script has ended we will see some bool(false) entries - when no new information was available (handles were still processing) - along with all handles if everything was successful, or only some handles if one handle failed.
I have failed to fix this; it's probably something I have overlooked, and I have gone too far down the road attempting to fix things that are not relevant.
Some of my attempts at fixing this were:
Assigning each $ch handle to an array - $ch[1], $ch[2] etc. (instead of adding the current $ch handle to the multi handle and then overwriting it, as in the test above)
Removing handles after success/failure with curl_multi_remove_handle
Setting CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT to infinity
Many more (I will update this post as I have forgotten all of them).
Testing this with PHP version 5.4.14.
Hopefully I have illustrated the points well enough.
Thanks for reading.
I've been mucking around with your script for a while now trying to get it to work. It was only when I read "Repeated calls to this function will return a new result each time, until a FALSE is returned as a signal that there is no more to get at this point." at http://se2.php.net/manual/en/function.curl-multi-info-read.php that I realized a while loop might work.
The extra while loop makes it behave exactly how you'd expect. Here is the output I get:
http://smotri.com/video/list/sports/ failed
http://smotri.com/video/list/travel/ failed
http://smotri.com/video/list/gaming/ failed
http://smotri.com/video/list/erotic/ failed
http://smotri.com/video/list/humour/ failed
http://smotri.com/video/list/animals/ success
http://smotri.com/video/list/film/ success
http://smotri.com/video/list/auto/ success
http://smotri.com/video/list/ failed
http://smotri.com/video/list/hobby/ failed
http://smotri.com/video/list/mult/ failed
Here's the code I used for testing:
<?php
set_time_limit(0);
$l = array(
    'http://smotri.com/video/list/',
    'http://smotri.com/video/list/sports/',
    'http://smotri.com/video/list/animals/',
    'http://smotri.com/video/list/travel/',
    'http://smotri.com/video/list/hobby/',
    'http://smotri.com/video/list/gaming/',
    'http://smotri.com/video/list/mult/',
    'http://smotri.com/video/list/erotic/',
    'http://smotri.com/video/list/auto/',
    'http://smotri.com/video/list/humour/',
    'http://smotri.com/video/list/film/'
);
$mh = curl_multi_init();
$s = 0;
$f = 10;
while ($s <= $f)
{
    $ch = curl_init();
    if ($s % 2)
    {
        $curlsettings = array(
            CURLOPT_URL => $l[$s],
            CURLOPT_TIMEOUT_MS => 3000,
            CURLOPT_RETURNTRANSFER => 1,
        );
    }
    else
    {
        $curlsettings = array(
            CURLOPT_URL => $l[$s],
            CURLOPT_TIMEOUT_MS => 4000,
            CURLOPT_RETURNTRANSFER => 1,
        );
    }
    curl_setopt_array($ch, $curlsettings);
    curl_multi_add_handle($mh, $ch);
    $s++;
}
$active = null;
do
{
    $mrc = curl_multi_exec($mh, $active);
    curl_multi_select($mh);
    while ($info = curl_multi_info_read($mh))
    {
        echo '<pre>';
        //var_dump($info);
        if ($info['result'] === 0)
        {
            echo curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL) . ' success<br>';
        }
        else
        {
            echo curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL) . ' failed<br>';
        }
    }
} while ($active > 0);
curl_multi_close($mh);
Hope that helps. For testing, just adjust CURLOPT_TIMEOUT_MS to suit your internet connection. I made it alternate between 3000 and 4000 milliseconds, as 3000 will fail and 4000 usually succeeds.
Update
After going through the PHP and libcurl docs, I have found out how curl_multi_exec works (in libcurl it's curl_multi_perform). Upon first being called, it starts handling transfers for all the added handles (added before via curl_multi_add_handle).
The number it assigns to $active is the number of transfers still running. So if it's less than the total number of handles you have, you know one or more transfers are complete. curl_multi_exec therefore acts as a kind of progress indicator as well.
As all transfers are handled in a non-blocking fashion (transfers can finish simultaneously), the while loop that curl_multi_exec runs in cannot represent each iteration of a completed URL request.
All data is stored in a queue so as soon as one or more transfers are complete you can call curl_multi_info_read to fetch this data.
In my original answer I had curl_multi_info_read in a while loop. This loop would keep iterating until curl_multi_info_read found no remaining data in the queue. After which the outer while loop would move onto the next iteration if $active != 0 (meaning curl_multi_exec reported transfers still not complete).
To summarize, the outer loop keeps iterating when there are still transfers not completed and the inner loop iterates only when there's data from a completed transfer.
The PHP documentation is pretty bad for curl multi functions so I hope this cleared a few things up. Below is an alternative way to do the same thing.
do
{
    curl_multi_exec($mh, $active);
} while ($active > 0);

// while($info = curl_multi_info_read($mh)) would work here as well
for ($i = 0; $i <= $f; $i++) {
    $info = curl_multi_info_read($mh);
    if ($info['result'] === 0)
    {
        echo curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL) . ' success<br>';
    }
    else
    {
        echo curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL) . ' failed<br>';
    }
}
From this information you can also see curl_multi_select is not needed as you don't want something that blocks until there is activity.
With the code you provided in your question, it only seemed like curl wasn't proceeding after a few failed transfers, but there was actually still data queued in the buffer; your code just wasn't calling curl_multi_info_read enough times. The reason all the successful transfers were picked up by your code is that PHP runs on a single thread, so the script hung waiting for the requests. The timeouts for the failed requests didn't make PHP hang/wait long enough, so the number of iterations the while loop did was less than the amount of queued data.

Google Sitemap Ping Success [closed]

I have a php script that creates an xml sitemap. At the end, I use
shell_exec('ping -c1 www.google.com/webmasters/tools/ping?sitemap=sitemapurl');
to submit the updated sitemap to Google Webmaster tools.
Having read the Google documentation, I'm unsure whether I need to do this each time or not. Entering the link manually in a browser results in a success page from Google, but using the ping command I receive no confirmation. I would also like to know if there is any way of checking whether the command has actually worked.
Here is a script to automatically submit your site map to google, bing/msn and ask:
/*
 * Sitemap Submitter
 * Use this script to submit your site maps automatically to Google, Bing/MSN and Ask.
 * Trigger this script on a schedule of your choosing or after your site map gets updated.
 */
// Set this to be your site map URL
$sitemapUrl = "http://www.example.com/sitemap.xml";

// cURL handler to ping the Sitemap submission URLs for Search Engines…
function myCurl($url){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $httpCode;
}

// Google
$url = "http://www.google.com/webmasters/sitemaps/ping?sitemap=".$sitemapUrl;
$returnCode = myCurl($url);
echo "<p>Google Sitemaps has been pinged (return code: $returnCode).</p>";

// Bing / MSN
$url = "https://www.bing.com/webmaster/ping.aspx?siteMap=".$sitemapUrl;
$returnCode = myCurl($url);
echo "<p>Bing / MSN Sitemaps has been pinged (return code: $returnCode).</p>";

// ASK
$url = "http://submissions.ask.com/ping?sitemap=".$sitemapUrl;
$returnCode = myCurl($url);
echo "<p>ASK.com Sitemaps has been pinged (return code: $returnCode).</p>";
You can also send yourself an email if the submission fails:
function return_code_check($pingedURL, $returnedCode) {
    $to      = "webmaster@yoursite.com";
    $subject = "Sitemap ping fail: ".$pingedURL;
    $message = "Error code ".$returnedCode.". Go check it out!";
    $headers = "From: hello@yoursite.com";
    if ($returnedCode != "200") {
        mail($to, $subject, $message, $headers);
    }
}
Hope that helps
Since commands like shell_exec(), exec(), passthru() etc. are blocked by many hosters, you should use curl and check for a response code of 200.
You could also use fsockopen if curl is not available. I'm going to look for the code snippet and update the answer when I find it.
UPDATE:
Found it. I knew I had used it somewhere. The funny coincidence: it was in my Sitemap class xD
You can find it here on github: https://github.com/func0der/Sitemap. It is in the Sitemap\SitemapOrg class.
There is also an example of the curl call implemented there.
Either way, here is the code for stand alone implementation.
/**
 * Call url with fsockopen and return the response status.
 *
 * @param string $url
 *   The url to call.
 *
 * @return mixed(boolean|int)
 *   The http status code of the response. FALSE if something went wrong.
 */
function _callWithFSockOpen($url) {
    $result = FALSE;
    // Parse url.
    $url = parse_url($url);
    // Append query to path.
    $url['path'] .= '?' . $url['query'];
    // Setup fsockopen.
    $port = 80;
    $timeout = 10;
    $fso = fsockopen($url['host'], $port, $errno, $errstr, $timeout);
    // Proceed if connection was successfully opened.
    if ($fso) {
        // Create headers (note the space before HTTP/1.0).
        $headers  = 'GET ' . $url['path'] . ' HTTP/1.0' . "\r\n";
        $headers .= 'Host: ' . $url['host'] . "\r\n";
        $headers .= 'Connection: close' . "\r\n";
        $headers .= "\r\n";
        // Write headers to socket.
        fwrite($fso, $headers);
        // Set timeout for stream read/write.
        stream_set_timeout($fso, $timeout);
        // Use a loop in case something unexpected happens.
        // I do not know what, but that is why it is unexpected.
        while (!feof($fso)) {
            // 128 bytes is enough to get the header with the http response code in it.
            $buffer = fread($fso, 128);
            // Filter only the http status line (first line) and break the loop on success.
            if (!empty($buffer) && ($buffer = substr($buffer, 0, strpos($buffer, "\r\n")))) {
                break;
            }
        }
        // Match status.
        preg_match('/^HTTP.+\s(\d{3})/', $buffer, $match);
        // Extract status.
        list(, $status) = $match;
        $result = $status;
    }
    else {
        // @XXX: Throw exception here??
    }
    return (int) $result;
}
If you guys find any harm or improvement in this code, do not hesitate to open up a ticket/pull request on GitHub, please. ;)
Simplest solution: file_get_contents("https://www.google.com/webmasters/tools/ping?sitemap={$sitemap}");
That will work on every major hosting provider. If you want optional error reporting, here's a start:
$data = file_get_contents("https://www.google.com/webmasters/tools/ping?sitemap={$sitemap}");
$status = ( strpos($data,"Sitemap Notification Received") !== false ) ? "OK" : "ERROR";
echo "Submitting Google Sitemap: {$status}\n";
As for how often you should do it, as long as your site can handle the extra traffic from Google's bots without slowing down, you should do this every time a change has been made.

Need to ping 1000 urls every 2 minutes

I have 1000 feed URLs sitting in a MySQL database table. I need to make an HTTP request to all of these URLs every 2 minutes. I wrote a PHP script to do that, but the script takes 5 min 30 sec to run.
I want to be able to finish all 1000 requests in under a minute. Is there a way to run multiple async processes to get the job done faster? Any help is appreciated. Thanks in advance.
Since your question is about sending HTTP requests, not really ping, you can use grequests (Requests + gevent) to do it easily and fast (in my experience, seconds for a couple hundred URL requests):
import grequests
urls = [
'http://www.python.org',
'http://python-requests.org',
'http://www.google.com',
]
rs = (grequests.get(u) for u in urls)
grequests.map(rs) # [<Response [200]>, <Response [200]>, <Response [200]>]
Your PHP script takes 5 minutes to run because it is synchronous code, which means that for every request you send, you have to wait for the response to arrive before moving on to the next request.
The trick here is not to wait (or block, as many would call it) for responses, but to go straight on to making the next request, and you can achieve that easily with gevent (coroutine-based) or Node.js. You can read more on it here.
Have a look at the AnyEvent::Ping or AnyEvent::FastPing modules on CPAN.
Below is a straightforward example of using AnyEvent::Ping to ping 10,000 URLs:
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Ping;

my $cv = AnyEvent->condvar;
my $ping = AnyEvent::Ping->new;
my @results;

for my $url (get_ten_thousand_urls()) {
    $cv->begin;

    # ping each URL just once
    $ping->ping( $url, 1, sub {
        # [ url address, ping status ]
        push @results, [ $url, $_[0]->[0][0] ];
        $cv->end;
    });
}

$cv->recv;

# now do something with @results
Some quick tests of the above using 10,000 random URLs took just over 7 seconds to run on my MacBook Air. With tweaking and/or using a faster event loop, this time will drop further (the above used the default pure Perl event loop).
NB. AnyEvent is an abstraction library which allows you to use the async event system provided by (or installed on) your system. If you want to use a specific event loop, remember to install the relevant Perl module from CPAN, e.g. EV if using libev. AnyEvent will default to a pure Perl event loop if nothing else is found (installed).
BTW, if you just need to check an HTTP request (i.e. not ping), then simply replace the AnyEvent::Ping part with AnyEvent::HTTP.
You tagged this with "python", so I'll assume that using Python is an option here. Look at the multiprocessing module. For example:
#!/usr/bin/env python
import multiprocessing
import os
import requests
import subprocess

addresses = ['1.2.3.4', '1.2.3.5', '4.2.2.1', '8.8.8.8']
null = open(os.devnull, 'w')

def fbstatus(count):
    """Returns the count, and the HTTP status code of a request to Facebook"""
    return (count,
            requests.get('http://www.facebook.com/status.php').status_code)

def ping(address):
    """Returns the address, and True if the ping returned in under 5 seconds or
    else False"""
    return address, not subprocess.call(['ping', '-c1', '-W5', address],
                                        stdout=null)

pool = multiprocessing.Pool(15)

if False:
    print pool.map(ping, addresses)
else:
    pool.map(fbstatus, range(1000))
New - Fetching pages
The fbstatus() function fetches a page from Facebook. This scaled almost linearly with the size of the pool up through 30 concurrent processes. It averaged a total runtime of about 80 seconds on my laptop. At 30 workers, it took a total of about 3.75 wall clock seconds to finish.
Old - Pinging
This uses the subprocess module to call the ping command with a 5 second timeout and a count of 1. It uses the return value of ping (0 for success, 1 for failure) and negates it to get False for failure and True for success. The ping() function returns the address it was called with plus that boolean result.
The last bit creates a multiprocessing pool with 5 child processes, then calls ping() on each of the values in addresses. Since ping() returns its address, it's really easy to see the result of pinging each of those addresses.
Running it, I get this output:
[('1.2.3.4', False), ('1.2.3.5', False), ('4.2.2.1', True), ('8.8.8.8', True)]
That run took 5.039 seconds of wallclock time and 0% CPU. In other words, it spent almost 100% of its time waiting for ping to return. In your script, you'd want to use something like Requests to fetch your feed URLs (and not the literal ping command that I was using as an example), but the basic structure could be nearly identical.
You could try multithreaded ping in Python. Here is a good example.
#!/usr/bin/env python2.5
from threading import Thread
import subprocess
from Queue import Queue

num_threads = 4
queue = Queue()
ips = ["10.0.1.1", "10.0.1.3", "10.0.1.11", "10.0.1.51"]

# wraps system ping command
def pinger(i, q):
    """Pings subnet"""
    while True:
        ip = q.get()
        print "Thread %s: Pinging %s" % (i, ip)
        ret = subprocess.call("ping -c 1 %s" % ip,
                              shell=True,
                              stdout=open('/dev/null', 'w'),
                              stderr=subprocess.STDOUT)
        if ret == 0:
            print "%s: is alive" % ip
        else:
            print "%s: did not respond" % ip
        q.task_done()

# Spawn thread pool
for i in range(num_threads):
    worker = Thread(target=pinger, args=(i, queue))
    worker.setDaemon(True)
    worker.start()

# Place work in queue
for ip in ips:
    queue.put(ip)

# Wait until worker threads are done to exit
queue.join()
I used Perl's POE Ping Component module for this task quite extensively.
[Update: Re-tested this with maxSockets = 100 and while connected to a very good network connection. The script finished in < 1 second, meaning the biggest factor is probably network thruput / latency, as previously noted. Your results will almost certainly vary. ;) ]
You can use node.js for this, as its API for doing HTTP is powerful, clean, and simple. For example, the following script completes ~1000 requests in less than one second on my MacBook Pro:
test.js
var http = require('http');

// # of simultaneous requests allowed
http.globalAgent.maxSockets = 100;

var n = 0;
var start = Date.now();

function getOne(url) {
    var id = n++;
    var req = http.get(url, function(res) {
        res.on('data', function(chunk){
            // do whatever with response data here
        });
        res.on('end', function(){
            console.log('Response #' + id + ' complete');
            n--;
            if (n == 0) {
                console.log('DONE in ' + (Date.now() - start)/1000 + ' secs');
            }
        });
    });
}

// Fire off 1000 requests
for (var i = 0; i < 1000; i++) {
    getOne('http://www.facebook.com/status.php');
}
Outputs ...
$ node test.js
Response #3 complete
Response #0 complete
Response #2 complete
...
Response #999 complete
DONE in 0.658 secs
Thanks Alex Lunix for the suggestion. I looked up curl_multi_* and found a solution in curl, so I don't have to change my code much. But thank you to all the others for your answers. Here is what I did:
<?php
require("class.php");
$obj = new module();
$det = $obj->get_url();
$batch_size = 40;

function curlTest2($urls) {
    clearstatcache();
    $batch_size = count($urls);
    $return = '';
    echo "<br/><br/>Batch:";
    foreach ($urls as &$url)
    {
        echo "<br/>".$url;
        if (substr($url, 0, 4) != "http") $url = "http://".$url;
        $url = "https://ajax.googleapis.com/ajax/services/feed/load?v=1.0&num=-1&q=".$url;
    }
    $userAgent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)';
    $chs = array();
    for ($i = 0; $i < $batch_size; $i++)
    {
        $ch = curl_init();
        array_push($chs, $ch);
    }
    for ($i = 0; $i < $batch_size; $i++)
    {
        curl_setopt($chs[$i], CURLOPT_HEADER, 1);
        curl_setopt($chs[$i], CURLOPT_NOBODY, 1);
        curl_setopt($chs[$i], CURLOPT_USERAGENT, $userAgent);
        curl_setopt($chs[$i], CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($chs[$i], CURLOPT_CONNECTTIMEOUT, 15);
        curl_setopt($chs[$i], CURLOPT_FAILONERROR, 1);
        curl_setopt($chs[$i], CURLOPT_FRESH_CONNECT, 1);
        curl_setopt($chs[$i], CURLOPT_URL, $urls[$i]);
    }
    $mh = curl_multi_init();
    for ($i = 0; $i < $batch_size; $i++)
    {
        curl_multi_add_handle($mh, $chs[$i]);
    }
    $active = null;
    // execute the handles
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    while ($active && $mrc == CURLM_OK) {
        if (curl_multi_select($mh) != -1) {
            do {
                $mrc = curl_multi_exec($mh, $active);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }
    // close the handles
    for ($i = 0; $i < $batch_size; $i++)
    {
        curl_multi_remove_handle($mh, $chs[$i]);
    }
    curl_multi_close($mh);
}

$startTime = time();
$urls = array();
foreach ($det as $key => $value) {
    array_push($urls, $value['url']);
    if (count($urls) == $batch_size)
    {
        curlTest2($urls);
        $urls = array();
    }
}
echo "<br/><br/>Time: ".(time() - $startTime)."sec";
?>
This brought my processing time down from 332 sec to 18 sec. The code can probably be optimized a little, but you get the gist of it.

HTTPS Post Request via PHP and Cookies

I am kind of new to PHP; however, I used JSP a lot before (so I know that side quite well), and everything was easier with Java classes.
So, now, I want to perform a POST request on an HTTPS page (not HTTP), get the returned cookies, pass them to another GET request, and return the final result. The aim is to make a heavy page more usable in a mobile browser by bypassing the login page and going directly to the pages that are otherwise served in an Ajax user interface.
I am stuck, my code does not work, it says it is Bad Request.
Bad Request
Your browser sent a request that this server could not understand. Reason:
You're speaking plain HTTP to an SSL-enabled server port. Instead use the HTTPS scheme to access this URL, please.
<?php
$content = '';
$headers = '';
$flag = false;
$post_query = 'SOME QUERY'; // name-value pairs
$post_query = urlencode($post_query) . "\r\n";
$host = 'HOST';
$path = 'PATH';

$fp = fsockopen($host, '443');
if ($fp) {
    fputs($fp, "POST $path HTTP/1.0\r\n");
    fputs($fp, "Host: $host\r\n");
    fputs($fp, "Content-length: " . strlen($post_query) . "\r\n\r\n");
    fputs($fp, $post_query);
    while (!feof($fp)) {
        $line = fgets($fp, 10240);
        if ($flag) {
            $content .= $line;
        } else {
            $headers .= $line;
            if (strlen(trim($line)) == 0) {
                $flag = true;
            }
        }
    }
    fclose($fp);
}
echo $headers;
echo $content;
?>
From past experience, I've never used PHP's internal functions like fsockopen() for external data posting. The best way to do these actions is to use cURL, which is much easier and massively more powerful for developers to leverage.
For example, look at these options:
http://php.net/curl_setopt
Look at the ones for the URL, POST, and POST data, plus the cookie file/jar options, which store the returned cookies in a jar file you can retrieve later; after that you can use file_get_contents() to send the data using GET.
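For illustration, here is a rough sketch of that flow with cURL; the URLs, form fields, and cookie-jar path are placeholders rather than anything from the original question:

<?php
// 1) POST the login form over HTTPS and store the returned cookies in a jar file.
$cookieJar = sys_get_temp_dir() . '/cookies.txt';

$ch = curl_init('https://example.com/login');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array('user' => 'me', 'pass' => 'secret')));
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);    // cookies are written here when the handle closes
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_exec($ch);
curl_close($ch);

// 2) Reuse the stored cookies for the authenticated GET request.
$ch = curl_init('https://example.com/mobile/data');
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);   // send the cookies saved above
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$content = curl_exec($ch);
curl_close($ch);

echo $content;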
