For multiple running PHP scripts (10 to 100) to communicate, what is the least memory intensive solution?
Monitor flat files for changes
Keep running queries on a DB to check for new data
Other techniques I have heard of, but never tried:
Shared memory (APC, or core functions)
Message queues (Active MQ and company)
In general, a shared memory based solution is going to be the fastest and have the least overhead in most cases. If you can use it, do it.
I don't know much about message queues, but when the choice is between databases and flat files, I would opt for a database because of concurrency issues.
A file has to be locked before you can append a line to it, which can cause other scripts to fail to write their messages.
In a database based solution, you can work with one record for each message. The record would contain a unique ID, the recipient, and the message. The recipient script can easily poll for new messages, and after reading, quickly and safely remove the record in question.
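A minimal sketch of that record-per-message pattern (the messages table, columns, and credentials are invented for illustration):

<?php
// Hypothetical table: messages(id AUTO_INCREMENT PRIMARY KEY, recipient, message)
$pdo = new PDO('mysql:host=localhost;dbname=ipc', 'user', 'pass');

// Sender: one record per message
$stmt = $pdo->prepare('INSERT INTO messages (recipient, message) VALUES (?, ?)');
$stmt->execute(array('worker-7', 'reindex the product table'));

// Recipient: poll for new messages, then remove each record after reading it
$select = $pdo->prepare('SELECT id, message FROM messages WHERE recipient = ?');
$delete = $pdo->prepare('DELETE FROM messages WHERE id = ?');
$select->execute(array('worker-7'));
foreach ($select->fetchAll(PDO::FETCH_ASSOC) as $row) {
    // ... process $row['message'] here ...
    $delete->execute(array($row['id']));
}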
This is hard to answer without knowing:
How much data will they send in each message (2 bytes or 4 megabytes)?
Will they run in the same machine? (This looks like a yes, else you wouldn't be considering shared memory)
What are the performance requirements (one message a minute or a zillion per second)?
What resource is most important to you?
And so on...
Using a DB is probably the easiest to set up in a PHP environment and, depending on how many queries per minute there are and what type those queries are, it might indeed be the sanest solution. Personally I'd try that first and then see if it's not enough.
But, again, hard to tell for sure without more information on the application.
Well, this is the thing. Let's say that my future PHP CMS needs to handle 500k visitors daily, and I need to record them all in a MySQL database (referrer, IP address, time, etc.). That means I need to insert 300-500 rows per minute and update 50 more. The main problem is that the script would call the database every time I want to insert a new row, which is every time someone hits a page.
My question: is there any way to cache incoming hits locally first (and what is the best solution for that: APC, CSV...?) and periodically send them to the database, say every 10 minutes? Is this a good solution, and what is the best practice for this situation?
500k daily is only 5-7 queries per second. If each request is served in 0.2 seconds, you will have almost 0 simultaneous queries, so there is nothing to worry about.
Even if you have five times as many users, everything should still work fine.
You can just use INSERT DELAYED and tune your MySQL (see the sketch below).
About tuning: http://www.day32.com/MySQL/ - there is a very useful script there (it changes nothing, it just shows you tips on how to optimize your settings).
You can use memcache or APC to write the log there first, but with INSERT DELAYED MySQL will do almost the same work, and will do it better :)
Do not use files for this. The DB handles locks much better than PHP. It's not trivial to write effective mutexes, so let the DB (or memcache, APC) do this work.
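A minimal sketch of that INSERT DELAYED logging call (the table, columns, and credentials are invented; note that, as pointed out further down, DELAYED only works with MyISAM):

<?php
// Hypothetical MyISAM log table: hits(referrer, ip, hit_time)
$mysqli = new mysqli('localhost', 'user', 'pass', 'stats');

$referrer = $mysqli->real_escape_string(isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '');
$ip       = $mysqli->real_escape_string($_SERVER['REMOTE_ADDR']);

// DELAYED makes the call return immediately; MySQL queues the row
// and writes it in batches behind the scenes (MyISAM only).
$mysqli->query("INSERT DELAYED INTO hits (referrer, ip, hit_time)
                VALUES ('$referrer', '$ip', NOW())");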
A frequently used solution:
You could implement a counter in memcached which you increment on each visit, and push an update to the database for every 100 (or 1000) hits.
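A rough sketch of that counter, assuming the Memcached extension, a made-up key name, and a counters table to receive the flushed totals:

<?php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// Count the hit in memory; only touch MySQL every 1000 hits.
$hits = $mc->increment('hit_counter');
if ($hits === false) {            // key does not exist yet
    $mc->set('hit_counter', 1);
    $hits = 1;
}

if ($hits >= 1000) {
    $mc->set('hit_counter', 0);   // reset (a race here loses a few hits at most)
    $pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
    $pdo->exec("UPDATE counters SET value = value + 1000 WHERE name = 'hits'");
}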
We do this by storing locally on each server to CSV, then having a minutely cron job to push the entries into the database. This is to avoid needing a highly available MySQL database more than anything - the database should be able to cope with that volume of inserts without a problem.
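The per-hit half of that approach could look roughly like this (the file path and columns are invented); LOCK_EX keeps concurrent appends from interleaving, and the minutely cron job then loads the file into MySQL (e.g. with LOAD DATA INFILE) and rotates it:

<?php
// On each page hit: append one CSV line locally, no database call at all.
$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$line = sprintf("%s,%s,%s\n",
    date('c'),
    $_SERVER['REMOTE_ADDR'],
    str_replace(',', ' ', $referrer)
);
file_put_contents('/var/log/myapp/hits.csv', $line, FILE_APPEND | LOCK_EX);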
Save them to a directory-based database (or flat file, it depends) somewhere and, at a certain time, use a PHP script to insert/update them into your MySQL database. Your PHP script can be executed periodically using cron, so check whether your server has cron so you can set up the schedule, say every 10 minutes.
Have a look at this page: http://damonparker.org/blog/2006/05/10/php-cron-script-to-run-automated-jobs/. Some code has already been written and published, ready for you to use :)
One way would be to use the Apache access.log. You can get quite detailed logging by using the cronolog utility with Apache. Cronolog will handle the storage of a very large number of rows in files, and can rotate them based on day, volume, etc. Using this utility prevents Apache from suffering from log writes.
Then, as others have said, use a cron-based job to analyse these logs and push whatever summarized or raw data you want into MySQL.
You might think of using a dedicated database (or even a dedicated database server) for write-intensive jobs, with specific settings. For example, you may not need InnoDB storage and can keep simple MyISAM tables. And you could even think of another database storage engine (as said by @Riccardo Galli).
If you absolutely HAVE to log directly to MySQL, consider using two databases. One optimized for quick inserts, which means no keys other than possibly an auto_increment primary key. And another with keys on everything you'd be querying for, optimized for fast searches. A timed job would copy hits from the insert-only to the read-only database on a regular basis, and you end up with the best of both worlds. The only drawback is that your available statistics will only be as fresh as the previous "copy" run.
I have also previously seen a system which records the data into a flat file on the local disc of each web server (be careful to do only atomic appends if using multiple processes), and periodically, asynchronously, writes it into the database using a daemon process or cron job.
This appears to be the prevailing optimum solution; your web app remains available if the audit database is down, and users don't suffer poor performance if the database is slow for any reason.
The only thing I can say, is be sure that you have monitoring on these locally-generated files - a build-up definitely indicates a problem and your Ops engineers might not otherwise notice.
For a high number of write operations and this kind of data, you might find MongoDB or CouchDB more suitable.
Because INSERT DELAYED is only supported by MyISAM, it is not an option for many users.
We use MySQL Proxy to defer the execution of queries matching a certain signature.
This will require a custom Lua script; example scripts are here, and some tutorials are here.
The script will implement a queue data structure for storing query strings, plus pattern matching to determine which queries to defer. Once the queue reaches a certain size, a certain amount of time has elapsed, or whatever event X occurs, the queue is emptied and each query is sent to the server.
You can use a queue strategy using Beanstalkd or IronMQ.
I have a 1-on-1 live chat. Two solutions:
1) I store every message in the database and, with jQuery's help, I check every second whether there is a new message in the database. Of course I use caching as well. If there is a new message, we display it.
2) I store every message in one HTML file and, every second, that file is fetched and shown again through jQuery.
Which is better? Or is there a third option? And in general, which is better for this kind of project, MySQL or files?
Thank you very much.
P.S. The most important question is: which is more efficient and which approach will eat fewer resources?
Edit: And is it very bad, nowadays, for many chats (let's say 2,500 chats, which means 5,000 users) to use long polling and check every second through JavaScript when the file was edited? I use very similar methods to this chat: http://css-tricks.com/jquery-php-chat/ Will it kill my hosting?
Everyone has given a wide range of opinions but I don't think anyone has really hit the nail on the head.
When it comes down to storing data, the amount of data, the rate it is to be accessed, and several other factors all determine what's the best storage platform.
Some people have suggested using memcached. Now although this is a valid answer (you can use it), I don't think it is a good idea, solely because memcached stores data in your server's memory.
Your memory is not for data storage; it's for the actual applications, the operating system, shared libraries, etc.
Storing data in memory can cause a lot of issues for other applications that are currently running. If you store too much data in RAM, your applications won't be able to complete the operations assigned to them.
Although this is faster than a disk-based storage platform such as MySQL, it's not as reliable.
I would personally use MySQL as your storage engine server-side. This would reduce the amount of problems you would come across and also makes the data very manageable.
To speed up the responses to your clients I would look at running node on your server.
This is because it's event driven and non-blocking.
What does that mean?
Well, when Client A requests some data that is stored on the hard drive, traditionally PHP might say to the C++ layer, 'fetch me this chunk of data stored on this sector of the hard drive.' C++ says 'OK, no problem', and while it goes off to get the information, PHP sits and waits for the data to be read and returned before it continues its operations, blocking all other clients in the meantime.
With Node, it's slightly different. Node says to the kernel, 'fetch me this chunk of information and, when you're done, give me a call', and then it carries on taking requests from other clients that may not need disk access.
So, because we have assigned a callback to the kernel, we do not have to wait :), happy days.
This really could be the answer you're looking for; please see the following for a more detailed description of how Node could be the right choice for you:
http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
A fourth option, probably not what you want if you already have PHP code you want to use, but maybe the most efficient, is to use a JavaScript-based server instead of PHP.
Node.js is easily capable of being a chat server and can store all the recent messages as a Javascript variable.
You can use long polling or other comet techniques so that you do not have to wait a second for messages to update.
Also, the event-based architecture of a JavaScript server means that there is no overhead from idling around waiting for messages.
It depends on the number of simultaneous chats. If it's for support and you expect an average load of 1 to 5 chat sessions at a time, then you don't need to worry too much. Just make sure that when there is no activity for some time, you stop refreshing and show a message asking the user to click to resume the chat session.
If the visitors will chat with each other and you expect a large number of sessions - 10-50 at the same time - you can still use PHP + a database. Just make sure you don't make redundant queries and that your queries are cached correctly. To reduce load you can also exclude the chat script from web server logging:
SetEnvIf Request_URI "^/chat.php$" dontlog
CustomLog /var/log/apache2/access.log combined env=!dontlog
Edit:
You can use a back-off scheme. For example, if you query twice with a 1-second delay and get no data, increase the delay to 2 seconds. If you reach 10 queries with no response, increase the delay to 5 seconds. After 10 minutes you can pause the conversation, requiring users to click a button to resume the chat. That, combined with the advice above, will keep the load low enough to allow many concurrent chats.
Edit2:
I suggest you find a Flash or Java solution and buy it. With 5,000-10,000 users you'd have to be a genius to make it work on a VPS, especially if RAM is limited. Not that it's impossible, but you could rent a cheaper VPS and, with the rest of the money, buy a solution in Java or Flash (I don't know whether Flash supports two-way connections; I'm not a Flash expert).
A note about the number of users: if you have 10,000 users, my guess is that you'll have no more than 100 chats going at the same time. Go and look at dating sites - they have no more than 10% of their users online, and most of those are probably doing something else rather than chatting.
Third option: use memcache. Much faster reads/writes - perfect for your application.
Store the chat messages in the database but use Memcached as a caching layer for the database reads. So the most popular reads (e.g. the last 20 messages in the chat room) will always be served straight out of memory.
This gives you the benefit of speed for the most frequent operations and persistent storage for all of the messages.
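As a sketch of that read-through cache for the most recent messages in a room (the key name, table, and 5-second TTL are assumptions):

<?php
function lastMessages(PDO $pdo, Memcached $mc, $roomId) {
    $key = "chat:last20:$roomId";
    $messages = $mc->get($key);
    if ($messages === false) {                 // cache miss: hit MySQL once
        $stmt = $pdo->prepare(
            'SELECT author, body, created_at FROM messages
             WHERE room_id = ? ORDER BY id DESC LIMIT 20');
        $stmt->execute(array($roomId));
        $messages = $stmt->fetchAll(PDO::FETCH_ASSOC);
        $mc->set($key, $messages, 5);          // serve from memory for a few seconds
    }
    return $messages;
}

// When a new message is inserted, also delete "chat:last20:$roomId"
// so the next read repopulates the cache.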
Just to throw in another option... flat files could provide a less resource-hungry alternative.
Every chat is assigned a unique ID and a flat file is stored for it. Every chat message adds a line to this file. Each client machine then uses jQuery to check ONLY the modified date of the file, to see if the chat has been updated.
While I would never normally recommend flat files over a database, I have a sneaky feeling that checking the modified date on a flat file would scale up better than the MySQL alternative.
I was intrigued so I did some tests and here are the results:
With an existing db connection, the number of "SELECT field FROM table LIMIT 0,1" that could be run in 1 second: ~ 4,000
Opening and closing a db connection, but running the same query: ~ 1,800
Checking the modified date on various different files: ~225,000
So to check if a conversation has been updated, storing the conversations in flat files and checking for the last modified date would easily be faster than doing anything with a database.
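A sketch of the server-side half of such a check: the client sends the timestamp it last saw, and PHP answers from filemtime() without reading the file unless it actually changed (the path layout is hypothetical):

<?php
// poll.php?chat=<id>&since=<unix timestamp the client last saw>
$chatId = preg_replace('/\D/', '', $_GET['chat']);   // digits only, avoids path tricks
$since  = (int) $_GET['since'];
$file   = "/var/chats/$chatId.txt";

clearstatcache();                                     // filemtime() results are cached per request
$mtime = file_exists($file) ? filemtime($file) : 0;

header('Content-Type: application/json');
if ($mtime > $since) {
    // Only now do we actually open and read the file.
    echo json_encode(array('mtime' => $mtime, 'lines' => file($file, FILE_IGNORE_NEW_LINES)));
} else {
    echo json_encode(array('mtime' => $mtime, 'lines' => array()));
}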
In general, HTTP connections are not very useful when it comes to pushing data to the client. Polling every x seconds tends to be a resource hog on any server once you have significant traffic.
You should try XMPP combined with BOSH. Luckily, most of the heavy work has already been done for you, and you can implement a pure jQuery (or other JS framework) based solution very quickly. Read this tutorial; it will help you a lot - not only in solving your specific problem, but also by giving you a broader view of how to implement push technologies over good ole' HTTP.
Unless it's a small-audience script, between a database and the file system, it's better to use the database.
P.S.: Flash also makes a great platform for chat servers; you might want to look into that as well.
If you define a conversation as only two people, then a request every second is going to look like one read request per second per user, and one write request every time somebody writes something (say every 10 seconds). So every 10 seconds you will have about 2.2 requests per second, per conversation.
For 50 conversations, that's 100 users and roughly 110 requests per second. That's a lot of load on a server for such a small number of conversations. Writing the conversation to JSON or XML would probably provide a more scalable solution.
This article discusses the architecture of Meebo - long-polling, comet.
As an afterthought, have you considered installing an IM server like Jabber rather than starting from scratch?
You could always get the right tool for the job: an XMPP-compliant piece of software. For as poor as the documentation is, ejabberd is pretty alright, because it follows the XMPP standard closely. You can use any XMPP client, for example http://code.google.com/p/ijab/. You can store it all in an RDBMS if you like and provide functionality similar to what is offered in Gmail / Google Talk.
$0.02
A really fast alternative could be a NoSQL database like MongoDB:
MongoDB homepage
Some benchmarks
MongoDB's extension homepage on php.net
I don't use it myself, but you could try Photon, a very high-speed framework based on Mongrel.
On the author's blog (in French) there is an example: 30 lines of code for a real-time chat server, with a video demonstration.
I think storing the data in the database is better. Please refer to the following link:
Script Tutorials Chat
My question is fairly simple; I need to read out some templates (in PHP) and send them to the client.
For this kind of data, specifically text/html and text/javascript, is it more expensive to read them out of a MySQL database or out of files?
Kind regards
Tom
inb4 security; I'm aware.
PS: I read other topics about similar questions but they either had to do with other kind of data, or haven't been answered.
Reading from a database is more expensive, no question.
Where do the flat files live? On the file system. In the best case, they've been recently accessed so the OS has cached the files in memory, and it's just a memory read to get them into your PHP program to send to the client. In the worst case, the OS has to copy the file from disc to memory before your program can use it.
Where does the data in a database live? On the file system. In the best case, they've been recently accessed so MySQL has that table in memory. However, your program can't get at that memory directly, it needs to first establish a connection with the server, send authentication data back and forth, send a query, MySQL has to parse and execute the query, then grab the row from memory and send it to your program. In the worst case, the OS has to copy from the database table's file on disk to memory before MySQL can get the row to send.
As you can see, the scenarios are almost exactly the same, except that using a database involves the additional overhead of connections and queries before getting the data out of memory or off disc.
There are many factors that would affect how expensive both are.
I'll assume that since they are templates, they probably won't be changing often. If so, flat-file may be a better option. Anything write-heavy should be done in a database.
Reading a flat-file should be faster than reading data from the database.
Having them in the database usually makes it easier for multiple people to edit.
You might consider using memcache to store the templates after reading them, since reading from memory is always faster than reading from a db or flat-file.
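For instance, a small read-through wrapper along these lines (the key prefix and 5-minute TTL are assumptions) keeps a template in Memcached after the first disk read:

<?php
function loadTemplate(Memcached $mc, $path) {
    $key = 'tpl:' . md5($path);
    $tpl = $mc->get($key);
    if ($tpl === false) {
        $tpl = file_get_contents($path);   // first hit reads from disk
        $mc->set($key, $tpl, 300);         // subsequent hits come from memory
    }
    return $tpl;
}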
It really doesn't make enough difference to worry you. What sort of volume are you working with? Will you have over a million page views a day? If not, I'd say pick whichever one is easiest for you to code with and maintain, and don't worry about the expense of the alternatives until it becomes a problem.
Specifically, if your templates are currently in file form I would leave them there, and if they are currently in DB form I'd leave them there.
I have some small sets of data from the database (MySQL) which are seldom updated.
Basically 3 or 4 small two-dimensional arrays (50-200 items).
This is the ideal case for memcached, but I'm on a shared server and can't install anything.
I only have PHP and MySQL.
I'm thinking about storing the arrays on file and regenerate the file via a cron job every 2-3 hours.
Any better idea or suggestion about this approach?
What's the best way to store those arrays?
If you're working with an overworked MySQL server then yes, cache that data into a file. Then you have two ways to update your cache: either via a cron job, unconditionally, every N minutes (I wouldn't update it less frequently than every hour), or every time the data changes. The best approach depends on your specific situation. In general, the cron job way is the simplest, but the on-change way pretty much guarantees that you won't ever use stale data.
As for the storage format, you could just serialize() the array and save the string to a file. With big arrays, unserialize() is faster than a big array(...) declaration.
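A minimal sketch of that (the query, table, and cache path are invented); the first part would live in the cron script, the second in the pages that need the data:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// Cron job, every 2-3 hours: dump the small arrays to a file.
$data = $pdo->query('SELECT id, name, price FROM products')->fetchAll(PDO::FETCH_ASSOC);
file_put_contents('/tmp/products.cache', serialize($data), LOCK_EX);

// In the pages that need it: no database call at all.
$products = unserialize(file_get_contents('/tmp/products.cache'));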
As said in the comments, it would be better to check whether the root of the problem can't be fixed first. A roundtrip that long sounds like a network configuration problem.
Otherwise, if the DB simply is that slow, nothing speaks against a filesystem-based cache. You could turn each query into an md5() hash and use that as a file name. serialize() the result set into the file and fetch it from there. Use filemtime() to determine whether the cache file is older than X hours. If it is, regenerate the query - or in fact, to avoid locking problems on the cache files, use a cron job to regenerate it.
Just note that this way, you would be dealing with whole result sets that you have to load into your script's memory all at once. You wouldn't have the advantage of being able to query a result set row by row. This can be done too in a cached way, but it's more complicated.
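A sketch of that md5()-keyed cache, assuming a writable cache directory and a one-hour freshness window; it regenerates inline for brevity, whereas the cron variant mentioned above would move the regeneration into a scheduled script:

<?php
function cachedQuery(PDO $pdo, $sql, $maxAge = 3600) {
    $file = '/tmp/qcache/' . md5($sql) . '.ser';
    if (file_exists($file) && (time() - filemtime($file)) < $maxAge) {
        return unserialize(file_get_contents($file));
    }
    // Stale or missing: run the query and rewrite the cache file.
    $rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
    file_put_contents($file, serialize($rows), LOCK_EX);   // whole result set at once
    return $rows;
}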
My English is not good, sorry.
I have sometimes read about alternatives to memcache. It's complex, but I think you can use http://www.php.net/manual/en/ref.sem.php to access shared memory.
A simple example class used for storing data is here:
http://apuntesytrucosdeprogramacion.blogspot.com/2007/12/php-variables-en-memoria-compartida.html
It is written in Spanish, sorry, but the code is easy to understand (Eliminar = delete).
I have never tested this code, and I don't know whether it's viable on a shared server.
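In that spirit, a tiny sketch with PHP's System V shared memory and semaphore functions (the integer key is arbitrary, and the sysvshm/sysvsem extensions must be available, which a shared host may not allow):

<?php
$shm = shm_attach(0xBEEF, 65536);   // attach to (or create) a 64 KB segment
$sem = sem_get(0xBEEF);             // semaphore guarding concurrent access

sem_acquire($sem);
shm_put_var($shm, 1, array('visits' => 42));   // store a value under slot 1
$data = shm_get_var($shm, 1);                  // read it back
sem_release($sem);

shm_detach($shm);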
I have a PHP script that will generate a report using PHPExcel from data queried from a MySQL DB. Currently, it is linear in processing in that it gets the data back from MySQL, reads in the Excel template, writes the data to the template, then outputs it. I have optimized the code to the point that the data is only iterated over once, and there is very little processing done on the PHP side. The query returns hundreds of lines in less than .001 seconds, so it is running fast enough. After some timing I have found my bottlenecks to be (surprise, surprise) reading the template and writing the output.
I would like to do this:
Spawn a thread/process to read the template
Spawn a thread/process to fetch the data
Return back to parent thread - Parent thread will wait until both are complete
Proceed on as normal
My main questions are: is this possible, and is it worth it? If yes to both, how would you tackle it?
Also, it is PHP 5 on CentOS
It is generally not a good idea to fork an Apache process. That can cause undetermined results. Instead, using some kind of queuing mechanism is preferable. Gearman is an open source queuing mechanism you can use. I also have a blog post on the Zend Server Job Queue that talks about running tasks asynchronously Do you queue? Introduction to the Zend Server Job Queue.
You could also use something like the Zend Framework Queuing classes to implement some of the asynchronous work. Zend_Queue
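For the "run two things at once and wait for both" shape of this problem, Gearman's PHP extension can do it without forking Apache. A rough sketch, where the function names 'load_template' and 'fetch_report_data' are hypothetical and must match whatever the workers register:

<?php
// Client side (inside the web request): run two jobs in parallel, wait for both.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);

$results = array();
$client->setCompleteCallback(function (GearmanTask $task) use (&$results) {
    $results[$task->unique()] = $task->data();
});

$client->addTask('load_template', '/path/to/template.xls', null, 'tpl');
$client->addTask('fetch_report_data', json_encode(array('report' => 42)), null, 'data');
$client->runTasks();    // blocks until both tasks have completed

// $results['tpl'] and $results['data'] are now available to continue as normal.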
@Swisstack, I will also disagree with your assertion that PHP is not created for high performance. Very seldom are language features the cause of slow performance. Perhaps by doing a raw language test comparing $a++ across different languages you will see that, but that type of testing is irrelevant. I've done PHP consulting for several years and I have never seen a performance problem that was due to the language.
I would try to figure out if you can cache or store the template in some faster to read format. I don't know if that's possible, but the PHPExcel forum is pretty good and is watched by the developers.
You can't multithread, but you can fork (pcntl_fork, pcntl_wait). As I'm sure you know, you'll want to carefully test the process spawn times to make sure this is even worth it for your situation.
$pid = pcntl_fork();
if ($pid == -1) {
    // fork failed
    die('Could not fork');
} elseif ($pid > 0) {
    // we're the parent! Wait for the child to finish
    pcntl_waitpid($pid, $status);
} else {
    // we're the child: do the work here, then exit
    exit(0);
}
If both reading the template AND the db query were slow, then I'd say there's a decent chance that worthwhile performance could be gained by running the tasks in parallel. But, you said it yourself: reading the template is slow, and the db query is fast. So, even ignoring the additional overhead introduced by the changes needed to run the tasks in parallel, in the best case you stand to save 0.001 seconds (the time needed for the db query).
Running multiple tasks in parallel still takes at least as long as the slowest task. Running tasks in series takes the sum of all the tasks. In your case: templateTime + queryTime (0.001).
Not worth it imo.
Usually the database is the turtle in the equation. You can do that part asynchronously without too much effort. See the newly added mysqli_poll() and friends.
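A sketch of that with MYSQLI_ASYNC (connection details and query are placeholders): fire the query, do the slow template read while MySQL works, then poll and reap the result:

<?php
$mysqli = new mysqli('localhost', 'user', 'pass', 'reports');
$mysqli->query('SELECT col1, col2 FROM report_rows', MYSQLI_ASYNC);   // returns at once

// ... read the PHPExcel template here while MySQL runs the query ...

// Poll until the result is ready, then reap it.
do {
    $read = $error = $reject = array($mysqli);
} while (mysqli_poll($read, $error, $reject, 1) === 0);

$result = $mysqli->reap_async_query();
while ($row = $result->fetch_assoc()) {
    // ... feed $row into the spreadsheet ...
}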
You can definitely spawn processes on CentOS with PHP (http://php.net/manual/en/function.pcntl-fork.php). Before doing that, though, I'd consider at least one thing... If the bottleneck appears to be reading the template and writing the output, it might be an I/O-bound issue only, and therefore dealing with multiple processes might not help much... Personally I'd try to see if it's possible to do some caching instead...
Read the template once, then do a clone for each workbook that you need to create from the data