Can I compile my PHP script to a faster executing format? - php

I have a PHP script that acts as a JSON API to my backend database.
Meaning, you send it an HTTP request like http://example.com/json/?a=1&b=2&c=3... and it returns a JSON object with the result set from my database.
PHP works great for this because it's literally about 10 lines of code.
But PHP is slow, this API is being called about 40x per second at peak, and PHP is struggling to keep up.
Is there a way I can compile my PHP script to a faster-executing format? I'm already using APC, which is an opcode cache for PHP, as well as FastCGI.
Or can anyone recommend a language to rewrite the script in so that Apache can still serve the example.com/json/ requests?
Thanks
UPDATE: I just ran some benchmarks:
The PHP script takes 0.6 seconds to complete.
If I take the SQL generated by that PHP script and run the query from the same web server, but directly from the MySQL command line (so network latency is still in play), fetching the result set takes only 0.09 seconds.
As you can see, PHP is literally an order of magnitude slower at producing the results. The network does not appear to be the major bottleneck in this case, though I agree it typically is the root cause.

Before you go optimizing something, first figure out if it's a problem. Considering it's only 10 lines of code (according to you) I very much suspect you don't have a problem. Time how long the script takes to execute. Bear in mind that network latency will typically dwarf trivial script execution times.
In other words: don't solve a problem until you have a problem.
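If you want a quick number for just the PHP side, a minimal sketch is to wrap the handler in microtime() calls and log the result (the log message and destination here are arbitrary):
<?php
$start = microtime(true);

// ... the ~10 lines that query the database and emit the JSON ...

$elapsed = microtime(true) - $start;
error_log(sprintf('json endpoint took %.4f seconds', $elapsed));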
You're already using an opcode cache (APC). It doesn't get much faster than that. More to the point, it rarely needs to get any faster than that.
If anything you'll have problems with your database. Too many connections (unlikely at 40x per second), too slow to connect, or the big one: the query is too slow. If you find yourself in this situation, 9 times out of 10 effective indexing and database tuning is sufficient.
In the cases where it isn't, you go for some kind of caching: memcached, beanstalkd and the like.
But honestly, 40 requests per second means these solutions are almost certainly over-engineering for something that isn't yet a problem.

I've had a lot of luck using PHP, memcached and nginx's memcache module together for very fast results. The easiest way is to just use the full URL as the cache key.
I'll assume this URL:
/widgets.json?a=1&b=2&c=3
Example PHP code:
<?php
// use the full request URI (path + query string) as the cache key
$widgets_cache_key = $_SERVER['REQUEST_URI'];

// connect to memcached (requires the memcache PECL extension)
$m = new Memcache;
$m->connect('127.0.0.1', 11211);

// try to get the encoded response from the cache
$json = $m->get($widgets_cache_key);

if (empty($json)) {
    // data is not in the cache; grab it from the database
    $data = array();
    $r = mysql_query("SELECT * FROM widgets WHERE ...");
    while ($row = mysql_fetch_assoc($r)) {
        $data[] = $row;
    }
    // store the encoded string so nginx can later serve it straight from memcached
    $json = json_encode($data);
    $m->set($widgets_cache_key, $json);
}

header('Content-Type: application/json');
echo $json;
?>
That in itself provides a huge performance boost. If you were to then use nginx as a front-end for Apache (put Apache on 8080 and nginx on 80), you could do this in your nginx config:
worker_processes 2;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;
    access_log off;
    sendfile on;
    keepalive_timeout 5;
    tcp_nodelay on;
    gzip on;

    upstream apache {
        server 127.0.0.1:8080;
    }

    server {
        listen 80;
        server_name _;

        location / {
            if ($request_method = POST) {
                proxy_pass http://apache;
                break;
            }

            set $memcached_key $request_uri;
            memcached_pass 127.0.0.1:11211;
            default_type application/json;
            proxy_intercept_errors on;
            error_page 404 502 = /fallback;
        }

        location /fallback {
            internal;
            proxy_pass http://apache;
            break;
        }
    }
}
Notice the set $memcached_key $request_uri; line. This sets the memcached key to the request URI (path plus query string), just like the PHP script. So if nginx finds a cache entry for that key it serves it directly from memory, and you never have to touch PHP or Apache. Very fast.
There is an unofficial Apache memcache module as well. Haven't tried it but if you don't want to mess with nginx this may help you as well.

The first rule of optimization is to make sure you actually have a performance problem. The second rule is to figure out where the performance problem is by measuring your code. Don't guess. Get hard measurements.
PHP is not going to be your bottleneck. I can pretty much guarantee that. Network bandwidth and latency will dwarf the small overhead of using PHP vs. a compiled C program. And if not network speed, then it will be disk I/O, or database access, or a really bad algorithm, or a host of other more likely culprits than the language itself.

If your database is very read-heavy (I'm guessing it is) then a basic caching implementation would help, and memcached would make it very fast.
Let me change your URL structure for this example:
/widgets.json?a=1&b=2&c=3
For each call to your web service, you'd be able to parse the GET arguments and use those to create a key to use in your cache. Let's assume you're querying for widgets. Example code:
<?php
// a function to provide a consistent cache key for your resource
function cache_key($type, $params = array()) {
    if (empty($type)) {
        return false;
    }
    // order the parameters alphabetically by key so the key is stable
    ksort($params);
    return sha1($type . serialize($params));
}

// you get the same cache key no matter the order of parameters
var_dump(cache_key('widgets', array('a' => 3, 'b' => 7, 'c' => 5)));
var_dump(cache_key('widgets', array('b' => 7, 'a' => 3, 'c' => 5)));

// now let's use some GET parameters.
// you'd probably want to sanitize your $_GET array however you like;
// sanitize() here is just a placeholder for that step.
$_GET = sanitize($_GET);

// a URL of /widgets.json?a=1&b=2&c=3 results in the following call:
$widgets_cache_key = cache_key('widgets', $_GET);

// connect to memcached (requires the memcache PECL extension)
$m = new Memcache;
$m->connect('127.0.0.1', 11211);

// try to get data from the cache
$data = $m->get($widgets_cache_key);

if (empty($data)) {
    // data is not in the cache; grab it from the database
    $data = array();
    $r = mysql_query("SELECT * FROM widgets WHERE ...");
    while ($row = mysql_fetch_assoc($r)) {
        $data[] = $row;
    }
    // now store the data for next time
    $m->set($widgets_cache_key, $data);
}

header('Content-Type: application/json');
echo json_encode($data);
?>

You're already using APC opcode caching which is good. If you find you're still not getting the performance you need, here are some other things you could try:
1) Put a Squid caching proxy in front of your web server. If your requests are highly cacheable, this might make good sense.
2) Use memcached to cache expensive database lookups.

If you're handling database updates, then your MySQL performance is, IMO, what needs attention. I would expand the test harness like so:
run mytop on the dbserver
run ab (apache bench) from a client, like your desktop
run top or vmstat on the webserver
And watch for these things:
updates to the table forcing reads to wait (MyISAM engine)
high load on the webserver (could indicate low memory conditions on webserver)
high disk activity on webserver, possibly from logging or other web requests causing random seeking of uncached files
memory growth of your Apache processes. If your result sets are being transformed into large associative arrays, or being serialized/deserialized, these can become expensive memory-allocation operations. Your code might need to avoid buffering the whole result set into one big array and instead stream rows one at a time (see the sketch below).
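As an illustration of that last point, here is a minimal sketch that streams rows to the client one at a time instead of accumulating them in a PHP array; it uses the same legacy mysql_* API as the rest of this thread, and the widgets query is only a placeholder:
<?php
header('Content-Type: application/json');

// mysql_unbuffered_query() sends the query without pulling the whole
// result set into PHP memory first
$r = mysql_unbuffered_query("SELECT * FROM widgets WHERE a = 1");

echo '[';
$first = true;
while ($row = mysql_fetch_assoc($r)) {
    if (!$first) {
        echo ',';
    }
    echo json_encode($row);
    $first = false;
}
echo ']';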
I often wrap my db queries with a little profiler adapter that I can toggle to log unusually long query times, like so:
function query( $sql, $dbcon, $thresh ) {
    $delta['begin'] = microtime( true );
    $result = $dbcon->query( $sql );
    $delta['finish'] = microtime( true );
    $delta['t'] = $delta['finish'] - $delta['begin'];
    if ( $delta['t'] > $thresh ) {
        error_log( "query took {$delta['t']} seconds; query: $sql" );
    }
    return $result;
}
Personally, I prefer using xcache to APC, because I like the diagnostics page it comes with.
Chart your performance over time. Track the number of concurrent connections and see if that correlates to performance issues. You can grep the number of http connections from netstat from a cronjob and log that for analysis later.
Consider enabling your mysql query cache, too.
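If you do enable the query cache, verify that it is actually being hit. A rough sketch, assuming a MySQL version that still ships the query cache (it was removed in MySQL 8.0) and the same legacy mysql_* API used elsewhere in this thread:
<?php
// check whether the query cache is enabled and how large it is
$r = mysql_query("SHOW VARIABLES LIKE 'query_cache%'");
while ($row = mysql_fetch_assoc($r)) {
    echo "{$row['Variable_name']} = {$row['Value']}\n";
}

// Qcache_hits vs. Qcache_inserts gives a feel for the hit rate
$r = mysql_query("SHOW STATUS LIKE 'Qcache%'");
while ($row = mysql_fetch_assoc($r)) {
    echo "{$row['Variable_name']} = {$row['Value']}\n";
}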

Please see this question. You have several options. Yes, PHP can be compiled to native ELF (and possibly even FatELF) format. The problem is that you give up all of the Zend creature comforts.

Since you already have APC installed, it can be used (similar to the memcached recommendations) to store objects. If you can cache your database results, do it!
http://us2.php.net/manual/en/function.apc-store.php
http://us2.php.net/manual/en/function.apc-fetch.php
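A minimal sketch of that idea, reusing the hypothetical widgets query from the earlier answers (the key name and the 60-second TTL are arbitrary):
<?php
$cache_key = 'widgets:' . sha1($_SERVER['REQUEST_URI']);

// apc_fetch() sets $hit to true when the key was found in the cache
$data = apc_fetch($cache_key, $hit);

if (!$hit) {
    $data = array();
    $r = mysql_query("SELECT * FROM widgets WHERE ...");
    while ($row = mysql_fetch_assoc($r)) {
        $data[] = $row;
    }
    // cache for 60 seconds; adjust the TTL to how fresh the data must be
    apc_store($cache_key, $data, 60);
}

header('Content-Type: application/json');
echo json_encode($data);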

From your benchmark it looks like the php code is indeed the problem. Can you post the code?
What happens when you remove the MySQL code and just put in a hard-coded string representing what you'll get back from the db?
Since it takes 0.60 seconds from PHP and only 0.09 seconds from the MySQL CLI, my guess is that connection setup is taking too much time. PHP creates a new connection per request by default, and that can be slow.
Think about it: depending on your environment and your code, each request will:
Resolve the hostname of the MySQL server to an IP
Open a connection to the server
Authenticate to the server
Finally run your query
Have you considered using persistent MySQL connections or connection pooling?
It effectively lets you jump straight to the query step above.
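For example, a minimal sketch using PDO with a persistent connection (host, database name and credentials are placeholders):
<?php
// PDO::ATTR_PERSISTENT reuses an already-open connection from the pool
// instead of resolving, connecting and authenticating on every request
$pdo = new PDO(
    'mysql:host=db_host;dbname=my_database',
    'username',
    'password',
    array(PDO::ATTR_PERSISTENT => true)
);

$stmt = $pdo->query("SELECT * FROM widgets WHERE ...");
$data = $stmt->fetchAll(PDO::FETCH_ASSOC);

header('Content-Type: application/json');
echo json_encode($data);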
Caching is great for performance as well. I think others have covered this pretty well already.

Related

PHP Caching with APC

Let's say I cache data in a PHP file as a PHP array, like this:
/cache.php
<?php return (object) array(
'key' => 'value',
);
And I include the cache file like this:
<?php
$cache = include 'cache.php';
Now, the question is: will the cache file be automatically cached by APC in memory? I mean via the normal opcode cache, the way all .php files are.
If I store the data differently, for example in JSON format (cache.json), the data will not be automatically cached by APC, right?
Would apc_store be faster/preferable?
Don't confuse APC's data-caching abilities with its ability to compile and cache intermediate (opcode) code. APC provides 2 different things:
1) It gives a handy method of caching data structures (objects, arrays etc.), so that you can store/get them with apc_store and apc_fetch
2) It keeps a compiled version of your scripts so that the next time they run, they run faster
Let's see an example for (1): Suppose you have a data structure which takes 1 second to calculate:
function calculate_array() {
    sleep(1);
    return array('foo' => 'bar');
}

$data = calculate_array();
You can store its output so that you don't have to call the slow calculate_array() again:
function calculate_array() {
    sleep(1);
    return array('foo' => 'bar');
}

if (!apc_exists('key1')) {
    $data = calculate_array();
    apc_store('key1', $data);
} else {
    $data = apc_fetch('key1');
}
which will be considerably faster, much less than the original 1 second.
Now, for (2) above: having APC will not make your program run faster than 1 second, which is the time calculate_array() needs. However, if your file additionally needed (say) 100 milliseconds to be parsed and compiled, simply having APC enabled will cut that to roughly 20 milliseconds, an 80% reduction in initialization/preparation time. This can make quite a difference in production systems, so simply installing APC can have a noticeable positive impact on your script's performance, even if you never explicitly call any of its functions.
If you are just storing static data (as in your example), it would be preferable to use apc_store.
The reasoning behind this is not so much whether the opcode cache is faster or slower, but the fact that you are using include to pull static data into scope.
Even with an opcode cache, the file will still be checked for consistency on each execution. PHP will not have to parse the contents, but it does have to check whether the file exists and that it hasn't changed since the opcode cache was created. Filesystem checks are relatively expensive, even if it is only a stat() call.
Therefore, of the two approaches I would use apc_store to remove the filesystem checks completely.
Unlike the other answer, I would use the array-file solution (the first one):
<?php return (object) array(
'key' => 'value',
);
The reason is that both solutions put you on the right side, but when you leave the caching to APC itself you don't have to juggle the apc_*() functions; you simply include the file and use it. When you set
apc.stat = 0
you avoid the stat() calls on every include too. This is useful for production, but remember to clear the system (opcode) cache on every deployment.
http://php.net/apc.configuration.php#ini.apc.stat
One more thing: the file approach works even without APC, which is useful for a development setup, where you usually shouldn't use any caching.
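As a rough sketch of that deployment step, assuming the classic APC extension (where apc_clear_cache() with no argument clears the opcode/system cache and apc_clear_cache('user') clears data stored with apc_store()):
<?php
// deploy-time script: run once after each deployment when apc.stat = 0,
// so freshly deployed files get recompiled and stale cached data is dropped
apc_clear_cache();        // opcode ("system") cache
apc_clear_cache('user');  // data cached via apc_store()
echo "APC caches cleared\n";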

Best way to implement frequent calls to taxing Python scripts

I'm working on a bit of python code that uses mechanize to grab data from another website. Because of the complexity of the website the code takes 10-30 seconds to complete. It has to work its way through a couple pages and such.
I plan on having this piece of code called fairly frequently, and I'm wondering about the best way to implement it without causing a huge server load. Since I'm fairly new to Python I'm not sure how the language handles this.
If the code is in the middle of processing one request and another user calls the code, can two instances of the code run at once? Is there a better way to implement something like this?
I want to design it in a way that it can complete the hefty tasks without being too taxing on the server.
My data-grabbing scripts all use caching to minimize load on the server. Be sure to use HTTP features such as "If-Modified-Since", "If-None-Match", and "Accept-Encoding: gzip".
Also consider using the multiprocessing module so that you can run requests in parallel.
Here is an excerpt from my downloader script:
import gzip
import threading
import urllib2
import StringIO

# Note: Response (a simple result record) and writefile() are defined
# elsewhere in the full script; this is just the download routine.

def urlretrieve(url, filename, cache, lock=threading.Lock()):
    'Read contents of an open url, use etags and decompress if needed'
    request = urllib2.Request(url)
    request.add_header('Accept-Encoding', 'gzip')
    with lock:
        if ('etag ' + url) in cache:
            request.add_header('If-None-Match', cache['etag ' + url])
        if ('date ' + url) in cache:
            request.add_header('If-Modified-Since', cache['date ' + url])
    try:
        u = urllib2.urlopen(request)
    except urllib2.HTTPError as e:
        return Response(e.code, e.msg, False, False)
    content = u.read()
    u.close()
    compressed = u.info().getheader('Content-Encoding') == 'gzip'
    if compressed:
        content = gzip.GzipFile(fileobj=StringIO.StringIO(content), mode='rb').read()
    written = writefile(filename, content)
    with lock:
        etag = u.info().getheader('Etag')
        if etag:
            cache['etag ' + url] = etag
        timestamp = u.info().getheader('Date')
        if timestamp:
            cache['date ' + url] = timestamp
    return Response(u.code, u.msg, compressed, written)
The cache is an instance of shelve, a persistent dictionary.
Calls to the downloader are parallelized with imap_unordered() on a multiprocessing Pool instance.
You can run more than one Python process at a time. As for excessive load on the server, that can only be alleviated by making sure you have just one instance running at any given time, or some other fixed number of processes, say two. To accomplish this you can look at using a lock file, a system-wide mutex, or similar.
But the best way to limit excessive use is to limit the number of tasks running concurrently.
The basic answer is "you can run as many instances of your program as you want, as long as they don't share critical resources (like a database)".
In the real world, you often need to use the multiprocessing module and properly design your application to handle concurrency and avoid corrupted state or deadlocks.
By the way, the design of multi-process apps is beyond the scope of a simple question on Stack Overflow...

Options to lessen database open connection?

I have a free application that is heavily used, and from time to time I get around 500 to 1000 concurrent users.
It is a desktop application that communicates with my website's API to receive data every 5-15 minutes, as well as send back a minimal amount of data (about 3 SELECTs at most every 15 minutes).
Since users can turn the application on and off as they wish, the timing at which each of them queries my API varies, and as a result I have been hitting the maximum connection limit of my hosting plan.
I don't want to upgrade the plan for financial reasons (the application is non-profit for the moment), so I am looking for other options to reduce the number of connections and cache whatever information can be cached.
The first thing that came to mind was to use FastCGI with Perl. I have been testing it for some time now and it seems to work great, but I have two problems with it:
1) If for whatever reason the script goes idle for 60 seconds, the server kills it, and for the next few requests it replies with error 500 until the script is respawned, which takes about 3+ minutes (yes, it really takes that long; I have tried my code locally on my own test server and it comes up instantly, so I am sure it is an issue with my hosting company, but they don't seem to want to resolve it).
2) The kill timeout, which is set to 300 seconds, kills/restarts the script after that period, which results in the same slow respawn described in 1).
Given that, I am now looking for alternatives that are not based on FastCGI, if there are any.
Also, due to the limitations of the shared host, I can't run my own daemon and my ability to compile anything is very limited.
Are there any good options to achieve this with either Perl or PHP?
Mainly: reduce the open database connections to a minimum and still be able to cache some SELECT queries for returning data... The main job of the application is inserting/updating data anyway, so there isn't much to cache.
This was the simple code I was using for testing it:
#!/usr/bin/perl -w
use CGI::Simple;    # Can't use CGI as it doesn't clear the data for the
                    # next request; haven't investigated further but needed
                    # something working to test, and CGI::Simple was
                    # the fastest solution found.
use DBI;
use strict;
use warnings;
use lib qw( /home/my_user/perl_modules/lib/perl/5.10.1 );
use FCGI;

my $dbh = DBI->connect('DBI:mysql:mydatabase:mymysqlservername',
                       'username', 'password',
                       {RaiseError => 1, AutoCommit => 1}
          ) || die &dbError($DBI::errstr);

my $request = FCGI::Request();

while ($request->Accept() >= 0)
{
    my $query  = new CGI::Simple;
    my $action = $query->param("action");
    my $id     = $query->param("id");
    my $server = $query->param("server");
    my $ip     = $ENV{'REMOTE_ADDR'};

    print $query->header();

    if ($action eq "exp")
    {
        my $sth = $dbh->prepare(qq{
            INSERT INTO
                my_data (id, server) VALUES (?, INET_ATON(?))
            ON DUPLICATE KEY UPDATE
                server = INET_ATON(?)});
        my $result = $sth->execute($id, $server, $server)
                     || die print($dbh->errstr);
        $sth->finish;

        if ($result)
        {
            print "1";
        }
        else
        {
            print "0";
        }
    }
    else
    {
        print "0";
    }
}

$dbh->disconnect || die print($DBI::errstr);
exit(0);

sub dbError
{
    my ($txt_erro) = @_;
    my $query = new CGI::Simple;
    print $query->header();
    print "$txt_erro";
    exit(0);
}
Run a proxy. Perl's DBD::Proxy should fit the bill. The proxy server wouldn't be under your host's control, so its 60-seconds-of-inactivity rule shouldn't apply here.
Alternatively, install a cron job that runs more often than the FastCGI timeout, simply to wget some "make activity" page on your site, and discard the output. Some CRMs do this to force a "check for updates" for example, so it's not completely unusual, though somewhat of an annoyance here.
FWIW, you probably want to look at CGI::Fast instead of CGI::Simple to resolve CGI.pm not handling persistent variables the way you expect...

Redirect user to a static page when server's usage is over a limit

We would like to implement a method that checks the MySQL load or the total number of sessions on the server, and if this number is bigger than some value, the next visitor to the website is redirected to a static web page with a message like "Too many users, try again later".
One way I implemented this on my website is to handle the error MySQL returns when it denies a connection.
Sample PHP code:
function updateDefaultMessage($userid, $default_message, $dttimestamp, $db) {
    $queryClients = "UPDATE users SET user_default_message = '$default_message', user_dtmodified = '$dttimestamp' WHERE user_id = $userid";
    $resultClients = mysql_query($queryClients, $db);

    if (!$resultClients) {
        log_error("[MySQL] code:" . mysql_errno($db) . " | msg:" . mysql_error($db) . " | query:" . $queryClients, E_USER_WARNING);
        $result = false;
    } else {
        $result = true;
    }
    return $result;
}
In the JS:
function UpdateExistingMsg(JSONstring)
{
    var MessageParam = "msgjsonupd=" + JSON.encode(JSONstring);
    var myRequest = new Request({
        url: urlUpdateCodes,
        onSuccess: function(result) { if (!result) window.open(foo); },
        onFailure: function(result) { bar; }
    }).post(MessageParam);
}
I hope the above code makes sense. Good luck!
Here are some alternatives to user-lock-out that I have used in the past to decrease load:
APC Cache
PHP APC cache (speeds up access to your scripts via in memory caching of the scripts): http://www.google.com/search?gcx=c&ix=c2&sourceid=chrome&ie=UTF-8&q=php+apc+cache
I don't think that'll solve "too many mysql connections" for you, but it should really help your website's speed in general, and that'll help mysql threads open and close more quickly, freeing resources. It's a pretty easy install on a Debian system, and hopefully on anything with package management (perhaps harder if you're using a shared server).
Cache the results of common mysql queries, even if only within the same script execution. If you know that you're calling for certain data in multiple places (e.g. client_info() is one that I do a lot), cache it via a static variable keyed on the info parameter, for example (the database-fetch helper below is only a placeholder):
function client_info($incoming_client_id) {
    static $client_info;
    static $client_id;
    if ($incoming_client_id == $client_id) {
        // same client as last time: return the cached copy
        return $client_info;
    } else {
        // do stuff to get new client info, then remember it for next time
        $client_id   = $incoming_client_id;
        $client_info = fetch_client_info_from_db($incoming_client_id); // placeholder
        return $client_info;
    }
}
You also talk about having too many sessions. It's hard to tell whether you mean $_SESSION sessions or simply concurrently browsing users. Too many $_SESSION sessions may be an indication that you need to move away from $_SESSION as a storage device; too many browsing users implies that you may want to selectively send caching headers for high-traffic pages. For example, almost all of my PHP scripts return the default caching headers (no cache), except my homepage, which sends headers allowing browsers to cache it for a short 1-hour period to reduce overall load.
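A minimal sketch of sending such headers for a page that may be cached for an hour (the exact directives and lifetime are up to you):
<?php
// allow browsers and intermediate caches to reuse this response for 1 hour
header('Cache-Control: public, max-age=3600');
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 3600) . ' GMT');

// ... render the page as usual ...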
Overall, I would definitely look into your caching procedures in general in addition to setting a hard limit on usage that should ideally never be hit.
This should not be done in PHP. You should do it naturally by means of existing hard limits.
For example, if you configure Apache to a known maximal number of clients (MaxClients), once it reaches the limit it would reply with error code 503, which, in turn, you can catch on your nginx frontend and show a static webpage:
proxy_intercept_errors on;
error_page 503 /503.html;

location = /503.html {
    root /var/www;
}
This isn't as hard to do as it may sound.
PHP isn't the right tool for the job here because once you really hit the hard limit, you will be doomed.
The seemingly simplest answer would be to count the number of session files in ini_get("session.save_path"), but giving the web app access to that directory is a security problem.
The second method is to have a database that atomically counts the number of open sessions. For small numbers of sessions, where performance really isn't an issue but you want an especially accurate count of open sessions, this is a good choice.
The third option, which I recommend, is to set up a cron job that counts the number of files in the ini_get('session.save_path') directory, then writes that number (only if it has changed) to a file in some public area of the filesystem visible to the web app. This job can be configured to run as frequently as you'd like, say once per second if you want better resolution. Your bootstrap loader opens this file for reading, checks the number, and serves the static page if it is above X.
Of course, this third method won't create a hard limit. But if you're just looking for a general threshold, this seems like a good option.
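A rough sketch of that bootstrap check, assuming the cron job writes the count to a file such as /var/www/shared/session_count.txt and that 400 is the chosen threshold (the path, threshold and static page name are all placeholders):
<?php
$count_file   = '/var/www/shared/session_count.txt'; // written by the cron job
$max_sessions = 400;                                  // tune to your server

$count = (int) @file_get_contents($count_file);

if ($count > $max_sessions) {
    header('HTTP/1.1 503 Service Unavailable');
    header('Retry-After: 120');
    readfile('too_many_users.html'); // your static "try again later" page
    exit;
}

// ... continue bootstrapping the application ...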

How can my PHP script tell whether the server is busy?

I want to run a cron job that does cleanup that takes a lot of CPU and Mysql resources. I want it to run only if the server is not relatively busy.
What is the simplest way to determine that from PHP? (for example, is there a query that returns how many queries were done in the last minute? ...)
if (function_exists('sys_getloadavg')) {
    // sys_getloadavg() returns the 1, 5 and 15 minute load averages,
    // not a percentage, so tune this threshold to your hardware
    $load = sys_getloadavg();
    if ($load[0] > 80) {
        header('HTTP/1.1 503 Too busy, try again later');
        die('Server too busy. Please try again later.');
    }
}
Adjust this snippet to your needs.
If this is a Unix system, parse the output of uptime. This shows the CPU load which is generally considered to be a good measure of "busy." Anything near or over 1.0 means "completely busy."
There are three CPU "load" times in that output giving the load average over 1, 5, and 15 minutes. Pick whichever makes sense for your script. Definition of "load" is here.
On Linux you can get the load from the /proc/loadavg file:
$load = explode(' ', file_get_contents('/proc/loadavg'));
$loadAvg = (float) $load[0];
You can probably use the info from MySQL's process list (SHOW PROCESSLIST, or mysql_list_processes() in PHP) to see how many connections are active vs. sleeping, a decent indicator of the load on the DB.
If you're on Linux you can use the sys_getloadavg() function to see the overall system load.
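A minimal sketch of that check using SHOW PROCESSLIST (assuming an already-open mysql_* connection, as in the rest of this thread, and a user with sufficient privileges):
<?php
// count active vs. sleeping MySQL threads as a rough load indicator
$active = 0;
$sleeping = 0;

$r = mysql_query("SHOW PROCESSLIST");
while ($row = mysql_fetch_assoc($r)) {
    if ($row['Command'] == 'Sleep') {
        $sleeping++;
    } else {
        $active++;
    }
}

echo "active: $active, sleeping: $sleeping\n";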
