PHP long polling, without excessive database access

I've always enjoyed the idea of long polling; on my development server I've played with various notification / new post systems, each using JavaScript to hold a connection and 'wait' for some sort of response. I've always had an issue with many implementations of this: they all involve repeatedly polling the MySQL server to check for new rows.
A dedicated server for long polling requests is a possibility, but it seems very wasteful to continuously poll a database server (around every 3 seconds seems common) for every client. It's a huge waste of resources for something that is relatively insignificant.
Is there a better way?

If your specific problem is that you're trying to avoid notifying events through a database, you should probably be looking at using shared memory or semaphores.
Instead of continuously polling the database, you would monitor the shared memory. When something writes to the db (I'm assuming some sort of message queue), you can flag the event via the shared memory. The listening code would detect this, and only then establish a db connection to retrieve the message. Alternatively, you could use shared memory to entirely replace the use of the database.
The reference for the php semaphore and shared memory functions is here - http://uk.php.net/manual/en/ref.sem.php
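
A minimal sketch of the flag idea described above, assuming the shmop extension is loaded. The key value, the 8-byte counter layout and the function names are made up for illustration: the writer bumps a counter after each db write, and the long-polling request only opens a db connection once the counter has moved.

```php
<?php
// Assumption: the shmop extension is available. The key, the 8-byte
// counter layout and all function names are illustrative only.
const EVENT_SHM_KEY = 0x4c504f4c;

// Open (or create) an 8-byte segment holding an event counter.
function open_event_counter()
{
    $shm = shmop_open(EVENT_SHM_KEY, 'c', 0644, 8);
    if ($shm === false) {
        throw new RuntimeException('Could not open shared memory segment');
    }
    return $shm;
}

function read_event_counter($shm): int
{
    // 'J' = unsigned 64-bit big-endian; unpack() indexes from 1.
    return unpack('J', shmop_read($shm, 0, 8))[1];
}

// Called by whatever writes to the db. Not atomic: with several writers,
// guard this with a semaphore (sem_get / sem_acquire / sem_release).
function bump_event_counter($shm): void
{
    shmop_write($shm, pack('J', read_event_counter($shm) + 1), 0);
}

// Called by the long-polling request: returns the new counter value, or
// null if nothing happened before the timeout.
function wait_for_event($shm, int $lastSeen, float $timeout = 25.0): ?int
{
    $deadline = microtime(true) + $timeout;
    do {
        $current = read_event_counter($shm);
        if ($current !== $lastSeen) {
            return $current;
        }
        usleep(100000); // 100 ms between checks, no db involved
    } while (microtime(true) < $deadline);
    return null;
}
```

The client would send back the last counter value it saw, so a request that arrives after an event returns immediately instead of waiting for the next one.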

I would use a NoSQL store to signal that there is new data. Redis has pub/sub and a blocking list.
You can also use memcache, for example, and create a new key when the data is available.
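
A rough sketch of the blocking-list variant, assuming the phpredis extension and a Redis server on localhost. The function names and the key naming are hypothetical. The long-polling request blocks inside Redis instead of looping over the database.

```php
<?php
// Assumptions: the phpredis extension is installed and $redis is already
// connected, e.g. $redis = new Redis(); $redis->connect('127.0.0.1', 6379);
// Function names and key names are illustrative.

// Producer side: call this right after the new data has been stored.
function signal_new_data(Redis $redis, string $listKey, string $payload): void
{
    $redis->rPush($listKey, $payload);
}

// Long-polling side: blocks for up to $timeout seconds without touching
// the database. Returns the payload, or null if nothing arrived in time.
function wait_for_signal(Redis $redis, string $listKey, int $timeout = 25): ?string
{
    $item = $redis->blPop([$listKey], $timeout);
    // On success blPop yields [listName, element]; on timeout it is empty.
    return empty($item) ? null : $item[1];
}
```

Each list element is handed to only one waiting client, so this fits one list per user (say `notify:42`). For broadcasting the same event to many listeners, pub/sub is the better match.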

WebSockets
...
When it's actually fully supported ;)

Use a data cache. I like the one from Zend Server; it dramatically reduced how often I had to pull from the database.

Whenever you insert into or update your database, create a cache entry for that field. You can use any simple PHP-based cache (http://hycus.com/2011/03/31/hcache-a-cache-system-for-php/).
Then you can poll that cache continuously using jQuery.
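
As a sketch of how cheap the check can be, the "cache" below is reduced to a marker file whose modification time says when the data last changed. This is a stand-in for the cache library linked above, not its API; the function names are hypothetical.

```php
<?php
// Illustrative only: a marker file stands in for the cache entry.

// Writer side: call this after every INSERT or UPDATE.
function mark_updated(string $marker): void
{
    touch($marker);
}

// Polling side: wait until the marker is newer than $lastSeen (a Unix
// timestamp supplied by the client). Returns the new mtime, or null on
// timeout. No database connection is opened while waiting.
function wait_for_update(string $marker, int $lastSeen, float $timeout = 25.0, int $pollUs = 250000): ?int
{
    $deadline = microtime(true) + $timeout;
    do {
        clearstatcache(true, $marker); // PHP caches stat() results per request
        $mtime = @filemtime($marker);
        if ($mtime !== false && $mtime > $lastSeen) {
            return $mtime;
        }
        usleep($pollUs);
    } while (microtime(true) < $deadline);
    return null;
}
```

Because modification times have one-second resolution here, two updates within the same second look like one; a counter stored in the file would avoid that.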

You could look into having a Flash movie in the background that maintains a continuous connection with the server using sockets. Java also supports sockets, so it could also be a Java applet embedded in your page.

Related

Multiple socket streams in PHP

In PHP, my script is only trying to test if a server is online - nothing more. How would I go about creating multiple socket streams that all run at the same time? Doing it one-after-another would take forever if you're testing a bunch of servers.

Usually you would start a pool of threads and the threads would read all the sites that need to be tested from a queue. This would allow each thread to open a connection to a site (supporting concurrency)
Or maybe pthreads? I don't know, I've never written threaded code in PHP.

Use select. This takes a list of which sockets you want to read or write to and then tells you when they are ready. When you read/write to these, you know you'll be able to get/send some data. You then process what you need to do on those sockets, and go back and select again waiting for more data.
If you need to do other things as well, set the timeout on select and it will return in that amount of time, even if nothing is ready on any sockets.
edit: also, once you figure out how to use select (not that hard), it's a TON simpler to debug and deal with than dealing with synchronization between threads.
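
A minimal sketch of the select approach in PHP, using `stream_select()` with non-blocking connects. `check_servers()` is a hypothetical helper name, and targets are assumed to be given as `host:port` (IP addresses are best, since a DNS lookup would still block).

```php
<?php
// Hypothetical helper: checks many "host:port" targets at once and
// returns ['host:port' => true|false].
function check_servers(array $targets, float $timeout = 3.0): array
{
    $status  = array_fill_keys($targets, false);
    $pending = [];

    foreach ($targets as $target) {
        // Start the connect without waiting for it to finish.
        $sock = @stream_socket_client(
            "tcp://$target", $errno, $errstr, $timeout,
            STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT
        );
        if ($sock !== false) {
            $pending[$target] = $sock;
        }
    }

    $deadline = microtime(true) + $timeout;
    while ($pending && ($left = $deadline - microtime(true)) > 0) {
        $read   = null;
        $write  = array_values($pending); // writable = connect has finished
        $except = array_values($pending); // some platforms report failures here
        $sec    = (int) $left;
        $usec   = (int) (($left - $sec) * 1000000);
        if (@stream_select($read, $write, $except, $sec, $usec) === false) {
            break;
        }
        foreach ($pending as $target => $sock) {
            if (in_array($sock, $write, true) || in_array($sock, $except, true)) {
                // A failed connect also wakes select, so confirm there is a peer.
                $status[$target] = stream_socket_get_name($sock, true) !== false;
                fclose($sock);
                unset($pending[$target]);
            }
        }
    }

    foreach ($pending as $sock) {
        fclose($sock); // still unanswered at the deadline: counted as offline
    }
    return $status;
}
```

The total run time is bounded by the timeout rather than by the number of servers, e.g. `check_servers(['192.0.2.10:80', '192.0.2.11:443'], 2.0)`.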

Detecting server overload to limit mysql queries

I'm writing a C++ service which, once every second, runs a SELECT query with LIMIT 1 against a MySQL server, computes something and then runs an INSERT, and it does this in a loop forever.
I'd like to detect server overload so that I can run the SELECTs with a bigger LIMIT, for example LIMIT 10, and at longer intervals, like every 5 seconds or so. I'm not sure whether this will actually lighten the load on the server.
My problem is how to detect these overloads, and I'm not even sure what I mean by overload :) It could be anything, but my application is a web application in PHP (a chat), so overload could be detected on the Apache2 side, on the MySQL side, or by counting how many users send how many inputs (chat messages) in a time interval. I don't know :/
Thank you!
EDIT: Okay, I turned my C++ application into a socket server and it's really fast that way. Now I'm struggling with memory leaks, but that's another story.
So thank you @brianbeuning for the helpful thoughts about my problem.

Better to get rid of that forever-and-ever loop; it's not a good idea.
If the loop really is a must, then use some caching technique.
For detecting "overload" (I would call it high MySQL CPU usage), try calling external commands provided by the operating system.
For example, if you run this on Linux, play with the ps command.
EDIT:
I realize now that you are programming a chat server.
Using MySQL as a middleman is NOT a good idea.
Try solving this without MySQL, and then if you need to save the chat log, write it to MySQL occasionally (e.g. every 10 seconds or so).
I bet it is a CPU hog right now with just 20 active users.
Try to make the communication direct, client to client, without going through the server (use the server only to establish communication between the 2 clients).

Another approach would be to buffer the data in your app and use a connection pool of sorts to manage the load. Keep a rolling buffer of data that needs to be inserted and manage the 'limit' based on the size of the buffer.
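
One possible reading of "manage the limit based on the size of the buffer", sketched in PHP to match the web side of the application. The class name, method names and defaults are all hypothetical: the LIMIT for the next SELECT shrinks as unwritten rows pile up, and rows leave the buffer in batches for a single multi-row INSERT.

```php
<?php
// Hypothetical helper; names and defaults are illustrative only.
final class InsertBuffer
{
    /** @var array rows waiting to be written */
    private $rows = [];
    private $capacity;
    private $maxLimit;

    public function __construct(int $capacity = 100, int $maxLimit = 10)
    {
        $this->capacity = $capacity;
        $this->maxLimit = $maxLimit;
    }

    // LIMIT for the next SELECT: never fetch more than the buffer can hold.
    public function nextLimit(): int
    {
        $free = $this->capacity - count($this->rows);
        return max(0, min($this->maxLimit, $free));
    }

    // Add computed rows that still have to be inserted.
    public function push(array $rows): void
    {
        foreach ($rows as $row) {
            $this->rows[] = $row;
        }
    }

    // Remove and return up to $n of the oldest rows for one batched INSERT.
    public function drain(int $n): array
    {
        return array_splice($this->rows, 0, $n);
    }
}
```

When the database falls behind, `nextLimit()` drops towards zero and the service naturally stops asking for more work until a batch has been drained.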

Using memcached as a database buffer for chat messages

I am playing around with building a chat application using PHP and CodeIgniter.
For this, I am implementing a cache 'buffer' with memcached to hold the most recent chat messages in memory, reducing load on the database. What I want to do is this:
When a message arrives, I save it in memcached using the current minute (YYYY-MM-DD-HH-MM) as the key. No database I/O involved. The idea being that all messages from the same minute are collected under the same key.
Users receive new chat messages also fetched from memcached (for now I'm using long-polling, but this will move to WebSockets under Node.js for obvious performance reasons). Again, no database I/O involved.
An automated server script (cronjob) will run once every 5 minutes, collecting the memcached data from the last 5 minutes and inserting the messages into the database.
The memcached objects are set to go stale after 6 minutes, so we never need to keep more than 6 minutes worth of message data in memory
This for a total of one database write operation per 5 minutes and zero database read operations.
Does this sound feasible? Is there a better (maybe even built-in?) way to use memcached for this purpose?
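
To make the key scheme concrete, here is a small sketch of the per-minute keys and of the key list the cron job would collect. The `chat:` prefix and the function names are assumptions made for illustration; no memcached calls are shown.

```php
<?php
// Illustrative helpers; the "chat:" prefix is an assumption. gmdate() keeps
// the keys independent of the server's timezone setting.
function minute_key(int $timestamp): string
{
    return 'chat:' . gmdate('Y-m-d-H-i', $timestamp);
}

// Keys for the $n complete minutes before $now, oldest first.
function previous_minute_keys(int $now, int $n = 5): array
{
    $start = $now - ($now % 60); // start of the current minute
    $keys  = [];
    for ($i = $n; $i >= 1; $i--) {
        $keys[] = minute_key($start - $i * 60);
    }
    return $keys;
}
```

The cron job could hand the result of `previous_minute_keys(time())` to a multi-get such as `Memcached::getMulti()`. One caveat with a shared per-minute key: two requests that read, modify and write it at the same moment can overwrite each other, so the value needs an atomic append or one key per message.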
Update: I have been experimenting a little now, and I have an idea for a shortcut (read: hack). I can 'buffer' the messages temporarily in the Node.js server script until I'm ready to store them. A Javascript object/array of messages in the Node.js server is basically a memory cache - kind of.
So: Every N messages/seconds, I can pass the buffered messages (the contents of the JS array) to my database, using whatever method I want, since it won't be called very often.
However, I'm worried this might cripple the Node.js server process, since it probably won't enjoy carrying around that 200 KB array.
Any thoughts on this strategy? Is it completely crazy?

Have you looked into HTML5 socket connections? With a socket server, you do not need to store anything. The server receives a message from one subscriber and immediately sends it back out to the correct subscribers. I have not done this myself using HTML5, but I know the functionality now exists. I have done this before using Flash, which also supports socket connections.

Why not use INSERT DELAYED? It offers almost the same functionality you are trying to achieve, without the need for memcached.
Anyway your solution looks good, too.

Would a Socket Connection Outperform an Intervaled Database Sweep and Requests?

I'm building a small chat application to add to an existing framework. There will only be 20-50 users maximum at any one time.
I was wondering if I could get away with updating a cache file containing (semi) live chat data for whichever users happen to be chatting just by performing timed queries and regular AJAX refreshes for new data as opposed to learning how to open and maintain a socket connection.
I'm sure there are existing chat plug-ins out there, but I just had a hell of a time installing one and I could see building the whole damn thing taking just as much time as plugging one in.
Am I off to a bad start?
Thanks in advance -J
(p.s. this is a semi closed network behind a php login so security isn't a great concern)

First of all, I would suggest reading up on JavaScript Long Polling to retrieve your data instantaneously.
As far as collecting and distributing your data, I would recommend you use a database that supports LISTEN and NOTIFY. (For example, Postgres provides you with pg_get_notify() in PHP)
With long-polling and a notification-enabled database like Postgres, you could easily build a real-time, scalable chat application.
Other resources and links:
http://www.postgresql.org/docs/current/static/sql-notify.html
http://www.postgresql.org/docs/current/static/sql-listen.html
http://www.php.net/manual/en/function.pg-get-notify.php
http://blog.perplexedlabs.com/2009/05/04/php-jquery-ajax-javascript-long-polling/
http://en.wikipedia.org/wiki/Comet_%28programming%29
http://www.webdevelopmentbits.com/avoiding-long-polling
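
A minimal sketch of how the long-polling request could wait on a Postgres notification, assuming the pgsql extension and a connection that has already issued `LISTEN`. The function name and the channel name `chat_messages` are hypothetical. It sleeps on the connection's socket instead of re-querying in a loop.

```php
<?php
// Assumptions: the pgsql extension is loaded and the caller has run e.g.
//   $conn = pg_connect('dbname=chat');
//   pg_query($conn, 'LISTEN chat_messages');
// The function name and channel name are illustrative.
function wait_for_notification($conn, float $timeout = 25.0): ?array
{
    $deadline = microtime(true) + $timeout;
    $socket   = pg_socket($conn); // stream for the underlying connection

    do {
        // Returns the next queued notification, or false if there is none.
        $note = pg_get_notify($conn, PGSQL_ASSOC);
        if ($note !== false) {
            return $note; // keys include 'message' (the channel) and 'pid'
        }

        // Sleep until the server sends something or the time runs out.
        $left   = max(0.0, $deadline - microtime(true));
        $read   = [$socket];
        $write  = null;
        $except = null;
        $sec    = (int) $left;
        $usec   = (int) (($left - $sec) * 1000000);
        stream_select($read, $write, $except, $sec, $usec);
    } while (microtime(true) < $deadline);

    return null; // timed out: answer the request with "nothing new"
}
```

The code that stores a chat message would follow its INSERT with something like `NOTIFY chat_messages`, which wakes every listening request at once.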

I second that long polling is a good approach. However, understanding it and doing it correctly is far more difficult than just polling at intervals. With 20-50 users, scalability shouldn't be an issue. For a good long polling design, you should look at how you can avoid suspending a server thread for the lifetime of an HTTP request.
It could be wise to start out with a simple polling approach, advancing to long polling later on.

What are some good distributed queue managers in php?

I'm working on an image processing website. Instead of having lengthy jobs hold up the user's browser, I want all commands to return fast with a job id and have a background task do the actual work. The id could then be used to check for status and results (i.e. a URL of the processed image). I've found a lot of distributed queue managers for Ruby, Java and Python, but I don't know nearly enough of any of those languages to be able to use them.
My own tests have been with shared mysql database to queue jobs, lock them to a worker, and mark them as completed (saving the return data in the db). It was just a messy prototype, and the entire time I felt as if I was reinventing the wheel (and not very elegantly). Does something exist in php (or that I can talk to RESTfully?) that I could use?
Reading around a bit more, I've found that what I'm looking for is a queuing system that has a PHP API; it doesn't have to be written in PHP. I've only found classes for use with Amazon's SQS, but not only is that not free, it can also be quite slow at times (over a minute for a message to show up).

Have you tried ActiveMQ? It makes mention of supporting PHP via the Stomp protocol. Details are available on the activemq site.
I've gotten a lot of mileage out of the database approach you're describing though, so I wouldn't worry too much about it.

Do you have full control over the server?
A MySQL queue could be fine in that case. Have a PHP script that runs constantly (in an endless while loop), querying the MySQL database for new "tasks" and sleep()ing in between to reduce the load when idle.
When each task is completed, mark it in the database and move on to the next one.
To keep the whole thing from stopping if your script crashes or exits (PHP memory overflow, etc.), you can, for example, place it in inittab (if you use Linux as a server) and init will restart it automatically.
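
A rough sketch of such a worker loop using PDO. The `jobs` table with `id`, `payload`, `status`, `worker` and `result` columns is an assumption, as are the function names, and PDO is assumed to throw exceptions on errors (the default since PHP 8). The conditional UPDATE is what "locks" a job to one worker when several run in parallel.

```php
<?php
// Assumed schema: jobs(id, payload, status, worker, result), where status
// is 'pending', 'running' or 'done'. All names are illustrative.

// Try to claim the oldest pending job. Returns the row, or null if there
// is nothing to do (or another worker claimed it first).
function claim_next_job(PDO $db, string $worker): ?array
{
    $row = $db->query(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    )->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        return null;
    }
    // Only succeeds if the job is still pending, so two workers cannot
    // both take the same job.
    $claim = $db->prepare(
        "UPDATE jobs SET status = 'running', worker = ? WHERE id = ? AND status = 'pending'"
    );
    $claim->execute([$worker, $row['id']]);
    return $claim->rowCount() === 1 ? $row : null;
}

// $maxIdleLoops = null runs forever; a number makes the loop stop after
// that many empty polls in a row (handy for testing).
function run_worker(PDO $db, string $worker, callable $handler, int $idleSleep = 2, ?int $maxIdleLoops = null): void
{
    $idle = 0;
    while ($maxIdleLoops === null || $idle < $maxIdleLoops) {
        $job = claim_next_job($db, $worker);
        if ($job === null) {
            $idle++;
            sleep($idleSleep); // reduce load while the queue is empty
            continue;
        }
        $idle   = 0;
        $result = $handler($job['payload']);
        $done   = $db->prepare("UPDATE jobs SET status = 'done', result = ? WHERE id = ?");
        $done->execute([$result, $job['id']]);
    }
}
```

The web request then only inserts a row with status `pending` and returns its id, which the client can use to check `status` and `result` later.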

Zend_Framework has a queue class, with a number of implementations of Mysql-backed, SQS and some other back-ends.
Personally, I've had excellent results with BeanstalkD recently, which also has a PHP client. I'm just serialising some data with JSON to throw into it, which gets decoded and run on the worker(s).
