I'm trying to write a PHP script for 'long-polling', returning data when new rows are added to a (Postgres) database table. Is there any way to get a SELECT query to return only when it would return results, blocking otherwise? Or should I use another signaling mechanism, outside of the database?
Take a look at LISTEN/NOTIFY:
The NOTIFY command sends a notification event to each client application that has previously executed LISTEN name for the specified notification name in the current database.
http://www.postgresql.org/docs/8.4/static/sql-notify.html
You can add an "ON INSERT" trigger to the table to fire off a NOTIFY event. However, you will need another mechanism to figure out which records need to be selected, as the ability to deliver a payload with the NOTIFY event won't be available until 9.0:
http://www.postgresql.org/docs/9.0/static/sql-notify.html
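For illustration, here is a minimal sketch of that setup; the table, trigger, channel, and column names are all hypothetical. The trigger fires a NOTIFY on every insert, and the PHP side polls for the notification with pg_get_notify() before running the real query:

CREATE OR REPLACE FUNCTION notify_new_chat() RETURNS trigger AS $$
BEGIN
    NOTIFY new_chat_row;  -- no payload before 9.0
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER chats_notify AFTER INSERT ON chats
    FOR EACH ROW EXECUTE PROCEDURE notify_new_chat();

<?php
// Hypothetical long-polling endpoint: wait up to 30 seconds for a NOTIFY,
// then fetch whatever rows are new.
$lastSeenId = isset($_GET['last_id']) ? (int)$_GET['last_id'] : 0;
$conn = pg_connect('dbname=mydb') or die('connection failed');
pg_query($conn, 'LISTEN new_chat_row');

$deadline = time() + 30;
while (time() < $deadline) {
    $notify = pg_get_notify($conn, PGSQL_ASSOC);
    if ($notify) {
        $res = pg_query_params($conn,
            'SELECT * FROM chats WHERE id > $1 ORDER BY id',
            array($lastSeenId));
        // ... encode the rows and send them back to the client ...
        break;
    }
    usleep(250000); // avoid busy-waiting
}
?>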
There is no blocking SELECT statement.

You could just issue the SELECT statement on a regular basis, which incurs a certain overhead. If the query is expensive, you might instead run a cheaper one, such as a COUNT(*), to keep track of how many entries could be returned, and only issue the more expensive query when that number changes.
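A minimal sketch of that cheap-count polling approach, meant to run as a CLI script; the table and column names are hypothetical:

<?php
// Poll with a cheap COUNT(*); only run the expensive query when the count changes.
$conn = pg_connect('dbname=mydb') or die('connection failed');
$lastCount = -1;

while (true) {
    $res = pg_query($conn, 'SELECT COUNT(*) AS n FROM chats');
    $row = pg_fetch_assoc($res);
    if ((int)$row['n'] !== $lastCount) {
        $lastCount = (int)$row['n'];
        // Something changed: now run the expensive query.
        $full = pg_query($conn, 'SELECT * FROM chats ORDER BY created_at DESC LIMIT 50');
        // ... process the new rows ...
    }
    sleep(2); // polling interval; tune to taste
}
?>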
You could look into LOCK and FOR UPDATE. FOR UPDATE can make a query wait until the row(s) being selected are unlocked. I'm not sure whether there is a timeout, or what resource impact holding a large number of these locks has, but it's one possibility.
You're trying to get an interrupt (event), when you should probably think about polling.
Create and call a stored procedure which will determine if there are new rows that the client should retrieve. If this is a web app, have it call an Ajax method periodically; on the server, that method queries the db to see if there are new rows since its last call, and if so, runs another query to retrieve them and send them back to the client.
I love Postgres and all, but if you're trying to do something simple and not super enterprisey, perhaps Redis will be enough for you. I've had a lot of success using it myself, and it can scale.
http://code.google.com/p/redis/
I believe in PHP, whenever a user sends a request to a backend PHP page, a new instance of that script is created and executed for that request.

My question: if a new instance is created each time, how can I create a PHP script which is shared among all instances?

For example: I want to store a few hundred random numbers in that script (let's name it pool.php, a static pool), and each time a request to the backend page (let's say BE.php) is made, BE.php asks pool.php to return a unique variable. Once all variables are used, I will put logic in pool.php to create a new set of variables.

If my question is not clear, please let me know.
Memcached is a good candidate for this.
It is a key/value store that persists despite PHP coming and going. You can write values from one PHP script and read them from another. This isn't exactly what you are looking for, but can be used for the same purpose, and is much easier to deal with than sockets connecting to other PHP scripts.
http://memcached.org/
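A minimal sketch of that approach; the key name and pool logic are hypothetical:

<?php
// Shared pool of values that survives across PHP requests, using Memcached.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$pool = $mc->get('number_pool');
if ($pool === false || empty($pool)) {
    // Pool is empty or missing: generate a fresh set of values.
    $pool = array();
    for ($i = 0; $i < 500; $i++) {
        $pool[] = mt_rand();
    }
}

// Hand out one value and write the shrunken pool back.
$value = array_pop($pool);
$mc->set('number_pool', $pool);

echo $value;
?>

Note that get-then-set is not atomic; under concurrent requests two scripts could pop the same value. For strict uniqueness you would need a check-and-set loop (Memcached::cas) or a lock, as in the next answer.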
You could solve this with MySQL and locking. Keep this pool of variables in a separate table, then use MySQL's named (advisory) lock to hold off other requests until the current request is finished, by using:
SELECT GET_LOCK( 'my_lock', 10 ) AS success
Make sure to check that the query returns 1, which means you now have a lock. If it doesn't, your query timed out waiting for the lock.
Then perform your ordinary query, like checking if a non-occupied variable exists. If so, occupy it by updating it or whatever.
Then you release the lock, using:
DO RELEASE_LOCK( 'my_lock' )
The number 10 is the timeout, in seconds, that each request will wait for the lock before failing.
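Put together, a minimal PHP sketch of that flow, assuming a hypothetical pool table with id, value, and occupied columns:

<?php
// Serialize access to the pool table with a MySQL named (advisory) lock.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

$res = $db->query("SELECT GET_LOCK('my_lock', 10) AS success");
$row = $res->fetch_assoc();
if ((int)$row['success'] !== 1) {
    die('Timed out waiting for the lock');
}

// Critical section: claim a free variable from the pool.
$res = $db->query('SELECT id, value FROM pool WHERE occupied = 0 LIMIT 1');
if ($var = $res->fetch_assoc()) {
    $db->query('UPDATE pool SET occupied = 1 WHERE id = ' . (int)$var['id']);
}

$db->query("DO RELEASE_LOCK('my_lock')");
?>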
Tarun, you do know that databases have AUTO_INCREMENT fields that can be used as primary keys for your user comments? Every time you add a new comment, that field gets incremented by the DB server, and you get a unique ID on every new entry without breaking a sweat.

If the AUTO_INCREMENT field will not suffice, the only viable way for your need is using a database and some kind of mutex, or MySQL's internal mutex as John Severinson described.
PS: Performance overhead... when talking about PHP scripting is kind of a non-issue. If you need performance, write your sites in C/C++. You are talking about milliseconds (0.001 seconds) or less. If that will impact your performance, you need to revisit your project/logic.
Is there any method/way for us to know when a change occurs in a desired table?

I have implemented one solution that checks the db every 30 seconds: if any change has occurred in the chats table, it refreshes my chats listing. It wastes a lot of performance and slows down the site.

Is there any way that our action listener is only called whenever a row is inserted into the table?

Any ideas?
You may create a table with notifications (which will always contain just the unread messages from the last 2 hours or so) and then create a trigger [syntax] that creates a new notification each time a change occurs (col1/col2 below are placeholder column names):

CREATE TRIGGER create_notification AFTER INSERT ON chats
FOR EACH ROW INSERT INTO notifications (col1, col2) VALUES (NEW.col1, NEW.col2);
Then you'll end up with a much smaller (and faster) database table.

There are some alternatives, like writing a socket server, but I think that would be less resource-effective than polling the database; plus, I have had bad experiences running PHP scripts for a long time.
Well, in your scenario you need a listener which notices the moment a new row of chat is inserted into the database.

It will be easier if you emit the chat message to the users before you insert it into the database.

You can use sockets: for example, node.js + socket.io can do this. Here is a bit about that:
socket.io and node.js to send message to particular client
You can also look at this article:
https://www.flynsarmy.com/2012/02/php-websocket-chat-application-2-0
My webservices are structured as follows:
1. Receive request
2. Parse and validate input
3. Do actual webservice
4. Validate output
I am primarily using logging for debugging purposes: if something went wrong, I want to know what the request was so I can hope to reproduce it (i.e. send the exact same request).
Currently I'm logging to a MySQL database table. After 1. a record is created, which is updated with more info after 2. and 3. and cleaned up after 4. (Logs of successful requests are pruned).
I want the logging to be as quick and painless as possible. Any speed up here will considerably improve overall performance (round trip of each request).
I was thinking of using INSERT DELAYED, but I can't do that because I need the LAST_INSERT_ID to update and later delete the log record, or at least to record the status of the request (i.e. success or error) so I know when to prune.

I could generate a unique id myself (or at least an id that is 'unique enough'), but even then I won't know the order of the DELAYED statements, and I might end up trying to update or delete a record that doesn't exist yet. And since DELAYED also removes the ability to use NUM_AFFECTED_ROWS, I can't check whether the queries had any effect.
Any suggestions?
When you say pruned, I'm assuming that if the request was a success you're removing the record? If so, I think it would be better if you stored the information in an in-memory object as the process unfolds instead of in the database; then, if an exception occurs, you log the information from the object to the database all at once.

If you wanted to take it further: I did something similar to this. I have a service which queues the logging of audit data and inserts the info at a steady pace, so if the services are getting hammered we're not clogging the DB with logging statements. But if you're only logging errors, that would probably be overkill.
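A minimal sketch of that buffered approach; the RequestLog class, the request_log table, and the $db/$rawRequest variables are all hypothetical:

<?php
// Accumulate log data in memory; write to the database once, and only on failure.
class RequestLog
{
    private $data = array();

    public function add($key, $value)
    {
        $this->data[$key] = $value;
    }

    public function flush(mysqli $db)
    {
        $request  = isset($this->data['request'])  ? $this->data['request']  : '';
        $response = isset($this->data['response']) ? $this->data['response'] : '';
        $status   = 'error';
        $stmt = $db->prepare(
            'INSERT INTO request_log (request, response, status) VALUES (?, ?, ?)');
        $stmt->bind_param('sss', $request, $response, $status);
        $stmt->execute();
    }
}

$log = new RequestLog();
$log->add('request', $rawRequest);
try {
    // ... parse, validate, do the actual webservice work ...
    // Success: nothing is written, so there is nothing to prune.
} catch (Exception $e) {
    $log->add('response', $e->getMessage());
    $log->flush($db); // a single INSERT, only on failure
}
?>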
I figured I can probably just do a REPLACE DELAYED and do the pruning some other time (with a separate DELETE; note that MySQL's DELETE has no DELAYED option, though LOW_PRIORITY is close).
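A minimal sketch of that idea, with a self-generated id replacing LAST_INSERT_ID; all names are hypothetical, and DELAYED only applies to MyISAM-style tables (it was removed from later MySQL versions):

<?php
// Generate our own unique id so LAST_INSERT_ID is never needed; every stage
// simply REPLACEs the row keyed on that id (id must be the PRIMARY KEY).
$logId = uniqid('', true);
$req = $db->real_escape_string($rawRequest);

$db->query("REPLACE DELAYED INTO request_log (id, request, status)
            VALUES ('$logId', '$req', 'started')");

// ... do the webservice work ...

$db->query("REPLACE DELAYED INTO request_log (id, request, status)
            VALUES ('$logId', '$req', 'success')");

// Prune successful requests later, e.g. from cron:
// DELETE LOW_PRIORITY FROM request_log WHERE status = 'success';
?>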
I am currently developing a simple PHP application where users can send messages to each other. Messages are stored in a SQL database. I'd like to put a count of unread messages in the menu on every page, so that a user can see quickly if they have new messages without checking the inbox periodically.
While this may be an easy problem to solve, I don't know what the best method would be, performance-wise :
1. Do a plain SQL COUNT() of unread messages on every page load (instantly notified of changes, but it may impact performance badly?)
2. Do the same thing, but cache the result for X minutes (but we create an annoying delay)
3. Same as 2, but only update the count when we read a message or when a message is sent to us (this can use up a lot of RAM / stress the disk, since we create one persistent entry/file per user: we can't store it in $_SESSION because we need to update it when another user sends a message to us)
All my solutions are somewhat server-based, because I'm not very familiar with JS. But if a better solution exists using JavaScript, that's okay too.

Thank you for your help!
I'd suggest a 4th:

Once a new message has been sent to a user, you update a counter in memcache. You create a simple Ajax application on the client side that sends a request every X seconds. On the server side, you just check whether there are unread messages. At page refresh, you don't need to query the database, since you get the count from memcache extremely fast.

That's what I'd do if I had a bottleneck in the DB (in 90% of cases, the DB is the weakest part of any database-driven application).

That's what we usually do on high-load web sites: we try to avoid any COUNT queries. If we can't, we denormalize the database to store the counter right in the appropriate table, as yet another column; e.g. if you cannot use memcache, you would store the unread messages counter as a column of the Users table.
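A minimal sketch of that counter flow; the key naming is hypothetical and the Memcached extension is assumed:

<?php
// When user B sends a message to user A, bump A's unread counter.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function add_unread(Memcached $mc, $userId)
{
    if ($mc->increment("unread_count:$userId") === false) {
        $mc->set("unread_count:$userId", 1); // counter didn't exist yet
    }
}

function get_unread(Memcached $mc, $userId)
{
    $n = $mc->get("unread_count:$userId");
    return $n === false ? 0 : (int)$n;
}

// On message send:
add_unread($mc, $recipientId);

// On every page load (no DB query at all):
echo get_unread($mc, $currentUserId);

// When the inbox is opened:
$mc->set("unread_count:$currentUserId", 0);
?>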
I'd go for option three, except I'd add memcached as solution 4.
Do a plain SQL COUNT() of unread messages on every page load (instantly notified of changes, but it may impact performance badly?)
As long as you have a decent table structure, COUNT() is a pretty fast command. I wouldn't cache this particular command. I'd instead work out the other queries to make sure you're only returning the data you need when showing them a listing. For example, if all you need is an excerpt, I'd make sure to do something like this:
SELECT id, author, msgdate, SUBSTRING(body, 1, 50) FROM table WHERE recipient = ?;

instead of

SELECT * FROM table WHERE recipient = ?;

(Note that MySQL string positions start at 1; SUBSTRING(body, 0, 50) would return an empty string.)
IMHO, it's best to let the client ping the server, and send back JSON with the number of unread messages. Counting in MySQL should be fast, so I see no reason not to use it. Just filter the results on the chat session.

For the database part, the best way would be to add a new_message field to your db table, default it to 1, and set it to 0 when the message has been loaded.
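A minimal sketch of such a ping endpoint; the table and column names are hypothetical:

<?php
// unread_count.php: returns {"unread": N} for the logged-in user.
session_start();
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$userId = (int)$_SESSION['user_id'];

$stmt = $db->prepare(
    'SELECT COUNT(*) FROM messages WHERE recipient_id = ? AND new_message = 1');
$stmt->bind_param('i', $userId);
$stmt->execute();
$stmt->bind_result($unread);
$stmt->fetch();

header('Content-Type: application/json');
echo json_encode(array('unread' => (int)$unread));
?>

The client then requests this script every X seconds and updates the menu badge from the JSON response.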
We have this PHP application which selects a row from the database, works on it (calls an external API which uses a webservice), and then inserts a new record based on the work done. There's an AJAX display which informs the user of how many records have been processed.

The data is mostly text, so it's rather heavy.

The process handles thousands of records at a time. The user can choose how many records to start working on. The data is obtained from one table, where rows are marked as "done" once processed. There is no WHERE condition, except the optional "WHERE date BETWEEN date1 AND date2".
We had an argument over which approach is better:
Select one record, work on it, and insert the new data
Select all of the records, work with them in memory, and insert them into the database after all the work is done
Which approach do you consider the most efficient one for a web environment with PHP and PostgreSQL? Why?
It really depends how much you care about your data (seriously):
Does reliability matter in this case? If the process dies, can you just re-process everything? Or can't you?
Typically when calling a remote web service, you don't want to be calling it twice for the same data item. Perhaps there are side effects (like credit card charges), or maybe it is not a free API...
Anyway, if you don't care about potential duplicate processing, then take the batch approach. It's easy, simple, and fast.
But if you do care about duplicate processing, then do this:
SELECT 1 record from the table FOR UPDATE (i.e. lock it in a transaction)
UPDATE that record with a status of "Processing"
Commit that transaction
And then
Process the record
Update the record contents, AND
SET the status to "Complete", or "Error" in case of errors.
You can run this code concurrently without fear of it running over itself. You will be able to have confidence that the same record will not be processed twice.
You will also be able to see any records that "didn't make it", because their status will be "Processing", and any errors.
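A minimal sketch of that claim-then-process pattern in PHP with PostgreSQL; the table and column names, and the call_external_api() helper, are hypothetical:

<?php
// Claim one pending record inside a short transaction, then do the slow
// webservice call outside it so the row lock is held only for an instant.
$conn = pg_connect('dbname=mydb') or die('connection failed');

pg_query($conn, 'BEGIN');
$res = pg_query_params($conn,
    'SELECT id, payload FROM work_queue WHERE status = $1 LIMIT 1 FOR UPDATE',
    array('pending'));
$row = pg_fetch_assoc($res);
if ($row) {
    pg_query_params($conn,
        'UPDATE work_queue SET status = $1 WHERE id = $2',
        array('processing', $row['id']));
}
pg_query($conn, 'COMMIT');

if ($row) {
    try {
        $result = call_external_api($row['payload']); // hypothetical helper
        pg_query_params($conn,
            'UPDATE work_queue SET status = $1, result = $2 WHERE id = $3',
            array('complete', $result, $row['id']));
    } catch (Exception $e) {
        pg_query_params($conn,
            'UPDATE work_queue SET status = $1, result = $2 WHERE id = $3',
            array('error', $e->getMessage(), $row['id']));
    }
}
?>

With several concurrent workers, a second worker will block on the locked row until the first commits; the re-check of status = 'pending' then keeps it from grabbing the same record.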
If the data is heavy and so is the load, and considering the application is not real-time dependent, the best approach is most definitely to get all the needed data, work on all of it, then put it back.

Efficiency-wise, regardless of language, if you are fetching single items and working on them individually, you are probably opening and closing the database connection each time. This means that with thousands of items, you will open and close thousands of connections. The overhead of that far outweighs the overhead of returning all of the items and working on them together.