I am playing around with building a chat application using PHP and CodeIgniter.
For this, I am implementing a cache 'buffer' with memcached to hold the most recent chat messages in memory, reducing load on the database. What I want to do is this:
When a message arrives, I save it in memcached using the current minute (YYYY-MM-DD-HH-MM) as the key. No database I/O involved. The idea being that all messages from the same minute are collected under the same key.
Users receive new chat messages also fetched from memcached (for now I'm using long-polling, but this will move to WebSockets under Node.js for obvious performance reasons). Again, no database I/O involved.
An automated server script (cronjob) will run once every 5 minutes, collecting the memcached data from the last 5 minutes and inserting the messages into the database.
The memcached objects are set to go stale after 6 minutes, so we never need to keep more than 6 minutes' worth of message data in memory.
This amounts to a total of one database write operation per 5 minutes and zero database read operations.
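For illustration, a minimal sketch of the bucketing, assuming the PHP Memcached extension (note that append() requires compression to be disabled):

    $m = new Memcached();
    $m->setOption(Memcached::OPT_COMPRESSION, false);   // append() needs this
    $m->addServer('127.0.0.1', 11211);

    // on message arrival: append one JSON line to this minute's bucket
    $key  = 'chat-' . date('Y-m-d-H-i');
    $line = json_encode($msg) . "\n";
    if (!$m->append($key, $line)) {
        // bucket doesn't exist yet; create it with a 6-minute TTL
        // (a production version would retry append() if add() loses a race)
        $m->add($key, $line, 360);
    }

    // in the cron job (every 5 minutes): collect the last 5 buckets
    $rows = [];
    for ($i = 1; $i <= 5; $i++) {
        $key = 'chat-' . date('Y-m-d-H-i', time() - $i * 60);
        if (($bucket = $m->get($key)) !== false) {
            foreach (array_filter(explode("\n", $bucket)) as $json) {
                $rows[] = json_decode($json, true);
            }
        }
    }
    // ...one batched INSERT of $rows into the database here...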
Does this sound feasible? Is there a better (maybe even built-in?) way to use memcached for this purpose?
Update: I have been experimenting a little now, and I have an idea for a shortcut (read: hack). I can 'buffer' the messages temporarily in the Node.js server script until I'm ready to store them. A JavaScript object/array of messages in the Node.js server is basically a memory cache - kind of.
So: Every N messages/seconds, I can pass the buffered messages (the contents of the JS array) to my database, using whatever method I want, since it won't be called very often.
However, I'm worried this might cripple the Node.js server process, since it probably won't enjoy carrying around that 200 KB array.
Any thoughts on this strategy? Is it completely crazy?
Have you looked into HTML5 socket connections? With a socket server, you do not need to store anything. The server receives a message from one subscriber and immediately sends it back out to the correct subscribers. I have not done this myself using HTML5, but I know the functionality now exists. I have done this before using Flash, which also supports socket connections.
Why not use INSERT DELAYED? It offers almost the same functionality you are trying to achieve, without needing memcached.
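For reference, a minimal sketch (caveat: INSERT DELAYED only works with MyISAM, MEMORY and ARCHIVE tables, was deprecated in MySQL 5.6 and is ignored in 5.7+, so check your server version; table and column names here are hypothetical):

    // queue the row server-side and return immediately, without waiting
    // for the write to complete
    $mysqli = new mysqli('localhost', 'user', 'pass', 'chat');
    $mysqli->query(sprintf(
        "INSERT DELAYED INTO messages (user_id, body) VALUES (%d, '%s')",
        $userId, $mysqli->real_escape_string($body)));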
Anyway, your solution looks good, too.
Related
I'm programming a C++ service which constantly, every 1 second, makes a SELECT query with LIMIT 1 on a MySQL server, computes something, and then makes an INSERT, and this in a loop forever and ever.
I'd like to detect server overloading so I can make SELECTs with a bigger LIMIT, for example LIMIT 10, and at greater intervals, like every 5 seconds or so. I'm not sure if my solution will lighten server overloads.
My problem is how to detect these overloads, and I'm not sure what I mean by overload :) It could be anything, but my application is a web application in PHP (chat), so overload could be detected on the Apache2 side, or the MySQL side, or by detecting how many users make how many inputs (chat messages) in a time interval. I don't know :/
Thank you!
EDIT: Okay, I made a socket server from my C++ application and it's really fast that way. Now I'm struggling with memory leaks, but that's another story.
So thank you #brianbeuning for the helpful thoughts about my problem.
Better to solve that forever-and-ever loop; it's not a good idea.
If that loop is really a must, then use some caching technique.
For detecting "overload" (I would call it high MySQL CPU usage), try calling external commands supported by the operating system.
For example, if you are on Linux, play with the ps command.
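For instance, a rough sketch of sampling mysqld's CPU usage from PHP (the 80% threshold is an arbitrary assumption):

    // sample mysqld's CPU usage via ps (Linux); note that %cpu is averaged
    // over the process lifetime, so a snapshot tool like top may suit better
    $cpu = (float) shell_exec("ps -C mysqld -o %cpu= | head -n 1");
    if ($cpu > 80.0) {
        // back off: raise the LIMIT and poll less frequently
    }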
EDIT:
I realize now that you are programming a chat server.
Using MySQL as a middleman is NOT a good idea.
Try solving this without MySQL, and then, if you need to save the chat log, occasionally flush it to MySQL (e.g. every 10 seconds or so), as in the sketch below.
I bet it is a CPU hog right now for just 20 heavy users.
Try to make direct client-to-client communication, without requiring the server (use the server only to establish the connection between 2 clients).
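A minimal sketch of such an occasional flush, assuming a mysqli connection and a hypothetical chat_log table:

    // flush an in-memory buffer of messages in a single multi-row INSERT
    $values = [];
    foreach ($buffer as $m) {   // $buffer: messages collected since last flush
        $values[] = sprintf("(%d, '%s', FROM_UNIXTIME(%d))",
            $m['user_id'], $db->real_escape_string($m['text']), $m['ts']);
    }
    if ($values) {
        $db->query("INSERT INTO chat_log (user_id, body, created_at) VALUES "
            . implode(',', $values));
        $buffer = [];           // start collecting the next batch
    }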
Another approach would be to buffer the data in your app and use a connection pool of sorts to manage the load. Keep a rolling buffer of data that needs to be inserted and manage the 'limit' based on the size of the buffer.
I have a PHP app on Heroku, the app does lots of communication with external APIs, which in turn trigger jobs on the database, the results are then displayed in a Facebook app...
Currently I have 2 worker processes and a web process. The web process triggers the workers and monitors database flags to know when each worker job is complete...I know...this setup isn't great, ideally I'd like to get notified in my web process when each worker process is finished, but this doesn't seem to be possible...
Is there a better way to approach this in Heroku using PHP?
Maybe a PHP app on Heroku isn't the best solution, but I've written lots of PHP that I'd rather not re-write....
Thanks in advance...
I can think of two relatively straightforward things you can do without ditching PHP (though I have to mention that PHP doesn't have much to recommend it, and you would likely be better off with Python/Django, Python/Flask or Ruby/Rails):
One is that you can switch to Redis for managing your workers instead of using your database. The advantage to this is that Redis has a pub/sub system where you can subscribe to signals while you hold a connection open. This means that if a connection is open, for instance from a web process, you will be notified of a change immediately without having to poll.
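For instance, a minimal sketch assuming the phpredis extension (the channel name is made up):

    // worker process: announce completion
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    $redis->publish('jobs:done', json_encode(['job_id' => 42]));

    // web process: block until a completion notice arrives (no polling)
    $redis->subscribe(['jobs:done'], function ($r, $channel, $message) {
        // handle the completion notice here
        echo $message;
    });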
Two is that you can switch to using ajax so that you don't block the loading of your page while you're waiting. Load your page immediately and then use javascript to hit a separate PHP page to periodically check for updates on the status of your job, and then use javascript to render the results on the page in place when the results are available.
Even better, use ajax long polling. Render your page immediately and then use javascript to send a request back. When your PHP page receives that second request, register a subscription with Redis and also manually check for updates (if you're not using Redis, just check for updates). If there are no updates, wait until the subscription receives a message, or for 30 seconds, whichever comes first. (To be honest, I've never done Redis subscriptions in PHP, so I'm not sure how to implement that -- if you can't do it easily, then just poll every couple of seconds instead.) If the 30-second timer expires, return JSON that says there are no results and have the javascript retry immediately. If you do receive results within that time, return the results and have the javascript render them.
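A rough sketch of such a long-poll endpoint, again assuming phpredis (where a read timeout surfaces as a RedisException):

    // poll.php: wait up to 30 s for a job result, then tell the JS to retry
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    $redis->setOption(Redis::OPT_READ_TIMEOUT, 30);
    header('Content-Type: application/json');
    try {
        $redis->subscribe(['jobs:done'], function ($r, $channel, $message) {
            echo $message;   // a result arrived within the window
            exit;
        });
    } catch (RedisException $e) {
        echo json_encode(['status' => 'pending']);  // timed out; client retries
    }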
I have a game running on N EC2 servers, each with its own players inside (let's assume it's a self-contained game inside each server).
What is the best way to develop a frontend for this game, allowing me to have near real-time information on all the players on all servers?
My initial approach was:
Have a common-purpose shared-hosting PHP website polling data from each server (1 socket for each server). Because most shared solutions don't really offer permanent sockets, this would require me to create and process a connection every 5 seconds or so. Because there isn't a cronjob with that granularity, I would end up using the requests of one unfortunate client to make this update. There's so much wrong here; let's consider this the worst-case scenario.
The best scenario (I guess) would be to create a small EC2 instance with some Python/Ruby/PHP web-based frontend, with a server application designed just for polling the game servers and saving the data in the website's database. Although this should work fine, I was looking for a solution where I don't need to spend that much money (even a micro instance is expensive for such a pet project).
What's the best and cheapest solution for this?
Is there a reason you can't have one server poll the others, stash the results in a json file, then push that file to the web server in question? The clients could then use ajax to update the listings in near real time pretty easily.
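The polling half might look something like this (the endpoints, stats URL and file path are hypothetical):

    // poll each game server and write one combined JSON snapshot
    $servers = ['10.0.0.1:8000', '10.0.0.2:8000'];   // hypothetical endpoints
    $stats = [];
    foreach ($servers as $s) {
        // assumes each game server exposes a simple HTTP stats endpoint
        $stats[$s] = json_decode(file_get_contents("http://$s/stats"), true);
    }
    file_put_contents('/var/www/players.json',
        json_encode(['updated' => time(), 'servers' => $stats]));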
If you don't control the game servers, I'd pass the work of updating the JSON off to one of the random client requests. It's not as bad as you think, though.
Consider the following:
Deliver the (now expired) data to the client, including a timestamp.
Call flush(); (test to make sure the page is fully rendered; you may need to send whitespace or something to fill the buffer, depending on how the web server is configured. Appending flush(); sleep(4); echo "hi"; to a PHP script should be an easy way to test.)
Call ignore_user_abort() (http://php.net/manual/en/function.ignore-user-abort.php) so your script will continue execution regardless of what the user does.
Poll all the servers and update your file.
The client waits a suitable amount of time before fetching the updated stats via AJAX.
Yes, that client does end up with a request that takes a long time, but it doesn't affect their page load, so they might not even notice.
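A minimal sketch of that flush-and-continue trick (updateStatsFile() stands in for the polling code above):

    // serve the stale data fast, then keep running after the client disconnects
    ignore_user_abort(true);
    set_time_limit(0);
    ob_start();
    echo json_encode($staleStats);              // the expired data + timestamp
    header('Content-Length: ' . ob_get_length());
    header('Connection: close');
    ob_end_flush();
    flush();
    // the client has its full response by now; do the slow work
    updateStatsFile();                          // hypothetical poller from above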
You don't provide the information needed to make a decision on this. It depends on the number of players, number of servers, number of games, communication between players, amount of memory and cpu needed per game/player, delay and transfer rate of the communications channels, geographical distribution of your players, update rate needed, allowed movement of the players, mutual visibility. A database should not initially be part of the solution, as it only adds extra delay and complexity. Make it work real-time first.
Really cheap would be to use netnews for this.
I've always enjoyed the idea of long polling; on my development server I've played with various notification / new-post systems, each using javascript to hold a connection and 'wait' for some sort of response. I've always had an issue with many implementations of this: they all involve repetitively polling the MySQL server to check for new rows.
A dedicated server for long-polling requests is a possibility, but it seems very wasteful to continuously poll (around every 3 seconds seems common) a database server for every client. It's a huge waste of resources for something that is relatively insignificant.
Is there a better way?
If your specific problem is that you're trying to avoid notifying events through a database, you should probably be looking at using shared memory or semaphores.
Instead of continuously polling the database, you would instead monitor the shared memory. When something writes to the db (I'm assuming some sort of message queue), you can flag the event via the shared memory. The listening code would detect this, and only then establish a db connection to retrieve the message. Alternatively, you could use shared memory to entirely replace the use of the database.
The reference for the php semaphore and shared memory functions is here - http://uk.php.net/manual/en/ref.sem.php
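A minimal sketch using those System V functions (the key file, variable slot and poll interval are arbitrary choices):

    // both processes derive the same key from a shared file (it must exist)
    $shm = shm_attach(ftok('/tmp/chat.shm', 'a'), 1024);

    // writer: after inserting into the db, flag the event
    shm_put_var($shm, 1, time());

    // listener: watch the flag in memory instead of polling MySQL
    $last = 0;
    while (true) {
        if (shm_has_var($shm, 1) && shm_get_var($shm, 1) > $last) {
            $last = shm_get_var($shm, 1);
            // only now open a db connection and fetch the new rows
        }
        usleep(200000);   // 0.2 s; far cheaper than a MySQL round-trip
    }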
I would use some NoSQL store to signal that there is new data. Redis has pub/sub and blocking list operations.
You can also use, for example, memcache and create a new key when the data is available.
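The blocking list variant is particularly handy for this, e.g. with phpredis (the queue name is made up):

    // block for up to 30 seconds waiting for a notification; no polling loop
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    $item = $redis->blPop(['notifications'], 30);   // empty array on timeout
    if (!empty($item)) {
        [$queue, $payload] = $item;   // [key, value] pair
        // fetch/render the new data here
    }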
WebSockets
...
When it's actually fully supported ;)
Data cache: I like the one from Zend Server; it dramatically reduced pulling from the database.
Whenever you insert into or update your database, create a cache entry for that field. You can use any simple PHP-based cache (http://hycus.com/2011/03/31/hcache-a-cache-system-for-php/).
Then you can poll that cache continuously using jQuery.
You can look into having a Flash movie in the background that maintains a continuous connection with the server using sockets. Java also supports sockets, so it could also be a Java applet embedded in your page.
I'm working on an image-processing website; instead of having lengthy jobs hold up the user's browser, I want all commands to return fast with a job ID and have a background task do the actual work. The ID could then be used to check for status and results (i.e. a URL of the processed image). I've found a lot of distributed queue managers for Ruby, Java and Python, but I don't know nearly enough of any of those languages to be able to use them.
My own tests have been with a shared MySQL database to queue jobs, lock them to a worker, and mark them as completed (saving the return data in the db). It was just a messy prototype, and the entire time I felt as if I was reinventing the wheel (and not very elegantly). Does something exist in PHP (or that I can talk to RESTfully?) that I could use?
Reading around a bit more, I've found that what I'm looking for is a queuing system that has a PHP API; it doesn't have to be written in PHP. I've only found classes for use with Amazon's SQS, but not only is that not free, it's also quite latent sometimes (over a minute for a message to show up).
Have you tried ActiveMQ? It makes mention of supporting PHP via the Stomp protocol. Details are available on the activemq site.
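As a taste, sending and receiving via the PECL Stomp extension might look like this (the broker URL and queue name are assumptions):

    // producer: push a job onto an ActiveMQ queue over Stomp
    $stomp = new Stomp('tcp://localhost:61613');
    $stomp->send('/queue/image-jobs', json_encode(['job_id' => 7]));

    // consumer: pull the next job frame and acknowledge it
    $stomp->subscribe('/queue/image-jobs', ['ack' => 'client']);
    $frame = $stomp->readFrame();
    if ($frame !== false) {
        $job = json_decode($frame->body, true);
        $stomp->ack($frame);
    }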
I've gotten a lot of mileage out of the database approach you're describing though, so I wouldn't worry too much about it.
Do you have full control over the server?
A MySQL queue could be fine in such a case. Have a PHP script running constantly (in an endless while loop), querying the MySQL database for new "tasks" and sleep()ing in between to reduce the load in idle time.
When each task is completed, mark it in the database and move on to the next one.
To prevent the whole thing from stopping if your script crashes/exits (PHP memory overflow, etc.), you can, for example, place it in inittab (if you use Linux as a server) and init will restart it automatically.
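A minimal sketch of such a worker loop, assuming mysqli and a hypothetical tasks table (process() stands in for the real job handler):

    $db = new mysqli('localhost', 'user', 'pass', 'jobs');
    while (true) {
        // claim one new task atomically, tagged with this connection's id
        $db->query("UPDATE tasks SET status = 'working', worker = CONNECTION_ID()
                    WHERE status = 'new' ORDER BY id LIMIT 1");
        $res = $db->query("SELECT * FROM tasks
                           WHERE status = 'working' AND worker = CONNECTION_ID()");
        if ($row = $res->fetch_assoc()) {
            process($row);    // hypothetical job handler
            $db->query("UPDATE tasks SET status = 'done'
                        WHERE id = " . (int) $row['id']);
        } else {
            sleep(5);         // nothing to do; keep the idle load down
        }
    }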
Zend_Framework has a queue class, with a number of implementations: MySQL-backed, SQS and some other back-ends.
Personally, I've had excellent results with BeanstalkD recently, which also has a PHP client. I'm just serialising some data with JSON to throw into it, which gets decoded and run on the worker(s).
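For a flavour of that, a rough sketch with the Pheanstalk client (the exact API differs between releases; the tube name and payload are made up):

    // producer: serialise a job as JSON and drop it on a tube
    $queue = new Pheanstalk\Pheanstalk('127.0.0.1');
    $queue->useTube('images')->put(json_encode(['job' => 'resize', 'id' => 7]));

    // worker: block until a job arrives, decode, run, delete
    $job = $queue->watch('images')->ignore('default')->reserve();
    $data = json_decode($job->getData(), true);
    // ...do the actual image work here...
    $queue->delete($job);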