Fast sequential logging for webservices - php

My webservices are structured as follows:
1. Receive request
2. Parse and validate input
3. Do actual webservice
4. Validate output
I am primarily using logging for debugging: if something goes wrong, I want to know what the request was so I can hope to reproduce it (i.e. send the exact same request).
Currently I'm logging to a MySQL database table. After step 1 a record is created, which is updated with more info after steps 2 and 3, and cleaned up after step 4 (logs of successful requests are pruned).
I want the logging to be as quick and painless as possible. Any speed up here will considerably improve overall performance (round trip of each request).
I was thinking of using INSERT DELAYED but I can't do that because I need the LAST_INSERT_ID to update and later delete the log record, or at least record the status of the request (ie. success or error) so I know when to prune.
I could generate a unique id myself (or at least an id that is 'unique enough'), but even then I won't know the order of the DELAYED statements, and I might end up trying to update or delete a record that doesn't exist yet. And since DELAYED also removes the ability to use NUM_AFFECTED_ROWS, I can't check whether the queries took effect.
Any suggestions?

When you say pruned, I'm assuming that if the request was a success you're removing the record? If so, I think it would be better if you stored the information in an in-memory object as the process unfolds instead of in the database; then, if an exception occurs, you log the information in the object to the database all at once.
If you wanted to take it further: I did something similar to this. I have a service which queues the logging of audit data and inserts the info at a steady pace, so if the services are getting hammered we're not clogging the DB with logging statements. But if you're only logging errors, that would probably be overkill.
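The buffer-then-flush idea above can be sketched roughly like this (in Python for brevity; the `sink` list is a stand-in for a real log table, and the stage names are invented):

```python
import json

# A sketch of the buffer-then-flush idea: log data accumulates in a plain
# object while the request runs, and only an error flushes it, in a single
# write. The `sink` list stands in for a database table.
class RequestLogBuffer:
    def __init__(self, sink):
        self.sink = sink
        self.entries = {}

    def record(self, stage, data):
        self.entries[stage] = data  # cheap in-memory update, no DB round trip

    def flush_on_error(self, error):
        self.entries["error"] = str(error)
        self.sink.append(json.dumps(self.entries))  # one write instead of many

sink = []
log = RequestLogBuffer(sink)
log.record("request", {"path": "/service", "body": "raw input"})
log.record("validation", "ok")
try:
    raise ValueError("webservice failed")   # the actual work blows up
except ValueError as exc:
    log.flush_on_error(exc)
```

Successful requests never touch storage at all, which also removes the need for the pruning step.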

I figured I can probably just do a REPLACE DELAYED and do the pruning some other time (with a DELETE DELAYED).
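A rough sketch of how a client-generated id makes REPLACE workable without LAST_INSERT_ID (shown with SQLite's plain REPLACE, since DELAYED is MySQL-specific; table and column names are invented):

```python
import sqlite3
import uuid

# DELAYED is MySQL/MyISAM-specific, so this sketch uses plain SQLite REPLACE
# to show the part that matters here: with an application-generated id there
# is no dependence on LAST_INSERT_ID, and each stage simply rewrites the row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request_log (id TEXT PRIMARY KEY, status TEXT, detail TEXT)")

req_id = str(uuid.uuid4())  # unique id generated by the application, not the DB
for status, detail in [("received", "raw request"),
                       ("validated", "parsed input"),
                       ("success", "output ok")]:
    conn.execute("REPLACE INTO request_log VALUES (?, ?, ?)", (req_id, status, detail))

# the pruning pass, run whenever convenient: drop successful requests
conn.execute("DELETE FROM request_log WHERE status = 'success'")
remaining = conn.execute("SELECT COUNT(*) FROM request_log").fetchone()[0]
```

Because each stage rewrites the whole row keyed on the same id, the write is idempotent and the pruning pass can run on any schedule.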

Related

Android: How to synchronize database entries between Client and server

I have an android app, which needs to fetch new data from a server running a MySQL database. I add data to the database via a Panel which I access online on my domain.com/mypanel/.
What would be the best way to fetch the data on the client to reduce the overhead, but keep the programming effort as small as possible. It's not necessary for the client to get the latest database changes right after they have been updated, i.e. it would be okay if the client is updated some hours after the update.
Currently I thought of the following:
Add a column timestamp to the database-tables so that I know which changes have been made
Run some sort of background service on the client (in the app) which runs every X hours and then checks for the latest updates since the last successful server-client synchronization
Send the server, via HTTP POST, the time-gap during which the client has not received any updates
On the server, there will be some sort of MySQL SELECT statement which considers the sent time-gap (if no time-gap is sent from the client, just SELECT everything, e.g. in the case of the first synchronization (full sync)) --> JSON-encode the arrays --> send the JSON response to the client
On the client, take the data, loop row by row and insert into the local database file
My question would be:
Is there something you would rather do differently?
Or would you maybe send the database changes as a whole package/sql-file instead of the raw-data as array?
What would happen when the internet connection aborts during the synchronization? I thought of the following to avoid any conflicts in this sort of process: only after successful retrieval of the complete server response (i.e. the complete JSON array) would I insert the rows into the local database and update the local update timestamp to the actual time. If I've retrieved only some of the JSON rows and the internet connection gets interrupted in between (or the app is killed), I would NOT have inserted ANY of the retrieved rows into my local app database, which means that the next time the background service runs, there will hopefully be no conflicts.
Thank you very much
You've mentioned database on client and my guess is that database is SQLite.
SQLite fully supports transactions, which means that you could wrap your inserts in BEGIN TRANSACTION and END TRANSACTION statements. A successful transaction would mean that all your inserts/updates/deletes are fine.
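A minimal sketch of wrapping one sync batch in a transaction (Python's sqlite3 here; the items/meta schema is made up for the sketch):

```python
import json
import sqlite3

# All-or-nothing insert of one sync batch: either every row from the server
# response lands together with the new last-sync timestamp, or nothing does,
# so an interrupted sync leaves the local database untouched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE meta (last_sync TEXT)")
conn.execute("INSERT INTO meta VALUES ('1970-01-01 00:00:00')")

server_response = json.loads('[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]')

try:
    with conn:  # BEGIN ... COMMIT; rolls back automatically on an exception
        for row in server_response:
            conn.execute("INSERT INTO items VALUES (?, ?)", (row["id"], row["name"]))
        conn.execute("UPDATE meta SET last_sync = datetime('now')")
except sqlite3.Error:
    pass  # nothing was written; the next sync run retries the same batch

synced = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Updating the last-sync timestamp inside the same transaction is what makes the retry-on-failure strategy from the question safe.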
Choosing JSON has a lot of ups and a few downs - it's easy for both the client and server side. A downside I've struggled with in the past is big JSON payloads (a few MB). The client device has to download the whole string and parse it at once, so it may run out of memory while converting the string to a JSONObject. I've been there, so just keep that in mind as a possibility. That could be solved by splitting your update into pieces and marking each piece with its number and the total number of pieces. Then the client device knows it has to make a few requests to get all the pieces.
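The piece-numbering scheme could look something like this (a Python sketch; the chunk size and field names are arbitrary choices, not an established format):

```python
import math

# Sketch of the piece-numbering scheme: the server tags each piece with its
# index and the total count, the client requests pieces until it has them
# all, then reassembles. The chunk size of 4 is arbitrary.
def make_pieces(rows, per_piece):
    total = max(1, math.ceil(len(rows) / per_piece))
    return [{"piece": i + 1,
             "of": total,
             "rows": rows[i * per_piece:(i + 1) * per_piece]}
            for i in range(total)]

rows = [{"id": n} for n in range(10)]
pieces = make_pieces(rows, 4)      # server side: 10 rows -> 3 pieces

received = []
for p in pieces:                   # client side: one request per piece
    received.extend(p["rows"])
```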
Another option you have is the good old CSV. You won't need any JSON library, which will save your app some space. An upside is that you can parse and process the data line by line, so the memory impact would be very low. The obvious downside is that you'll have to parse the data yourself, which might be a problem depending on your data.
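The line-by-line processing the CSV option allows might look like this (Python sketch; the payload string stands in for a streamed HTTP response body):

```python
import csv
import io

# Line-by-line CSV processing: only one row is ever held in memory, which is
# the low-memory upside mentioned above.
payload = "id,name\n1,alpha\n2,beta\n3,gamma\n"
inserted = []
for row in csv.DictReader(io.StringIO(payload)):
    inserted.append((int(row["id"]), row["name"]))  # a real client would INSERT here
```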
I should also mention XML as an option. My personal opinion is that I'd use it only if I really had to.

PHP/MySQL: Insert rows temporarily

I am working on a complex database application written with PHP and MySQLi. For large database operations I'm using daemons (also PHP) working in the background. During these operations, which may take several minutes, I want to prevent the users from accessing the data being affected and show them a message instead.
I thought about creating a Mysql table and insert a row each time a specific daemon operation takes place. Then I would be always able to check if a certain operation takes place while trying to access the data.
However, it is of course important that the records do not stay in the database if the daemon process gets terminated for any reason (kill from console, losing the database connection, pulling the plug, etc.). I do not think that MySQL transactions/rollbacks can do this, because a commit is necessary in order to make the changes public, and committed records will remain in the database if the process is terminated.
Is there a way to ensure that the records get deleted if the process gets terminated?
This is an interesting problem, I actually implemented it for a University course a few years ago.
The trick I used is to play with the transaction isolation. If your daemons create the record indicating they are in progress, but do not commit it, then you are correct in saying that the other clients will not see that record. But, if you set the isolation level for that client to READ UNCOMMITTED, you will see the record saying it's in progress - since READ UNCOMMITTED will show you the changes from other clients which are not yet committed. (The daemon would be left with the default isolation level).
You should set the client's isolation level to READ UNCOMMITTED for that daemon check only, not for its other work, as it could be very dangerous.
If the daemon crashes, the transaction gets aborted and the record goes away. If it's successful, it can either mark it done in the db or delete the record, etc., and then commit.
This is really nice, since if the daemon fails for any reason all the changes it made are reversed and it can be retried.
If you need more explanation or code I may be able to find my assignment somewhere :)
Transaction isolation levels reference
Note that this all requires InnoDB or any good transactional DB.

Unread message count in a PHP app

I am currently developing a simple PHP application where users can send messages to each other. Messages are stored in a SQL database. I'd like to put a count of unread messages in the menu on every page, so that a user can see quickly if they have new messages without checking the inbox periodically.
While this may be an easy problem to solve, I don't know what the best method would be, performance-wise :
Do a plain SQL COUNT() of unread messages on every page load (instantly notified of changes, but it may impact performance badly ?)
Do the same thing, but cache the result for X minutes (we create an annoying delay)
Same as 2., but only update when we read a message or when a message is sent to us (can use up a lot of RAM / stress the disk, since we create one persistent entry/file per user : we can't store it in $_SESSION because we need to update it when another user sends a message to us)
All my solutions are somewhat server-based, because I'm not very familiar with JS. But if a better solution exists using JavaScript, It's okay.
Thank you for your help !
I'd suggest a 4th option:
Once a new message has been sent to a user, you update a counter in memcache. You create a simple Ajax application on the client side sending a request every X seconds. On the server side, you just check whether there are unread messages. At page refresh, you don't need to query the database, since you get the count from memcache extremely fast.
That's what I'd do if I had a bottleneck in the DB (in 90% of cases, the DB is the weakest part of any database-driven application).
That's what we usually do on high-load web sites: we try to avoid any COUNT queries. If we can't, we denormalize the database to store the counter right in the appropriate table as yet another column, e.g. if you cannot use memcache, you would store the unread-message counter as a column of the Users table.
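A sketch of that denormalized counter column (Python's sqlite3 here; the users schema is illustrative, not taken from the question):

```python
import sqlite3

# The denormalized-counter variant: instead of running COUNT() over the
# messages table on every page load, keep an unread column on the users
# table and adjust it on send/read.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, unread INTEGER DEFAULT 0)")
conn.execute("INSERT INTO users (id) VALUES (1)")

def send_message(to_user):
    conn.execute("UPDATE users SET unread = unread + 1 WHERE id = ?", (to_user,))

def read_inbox(user):
    conn.execute("UPDATE users SET unread = 0 WHERE id = ?", (user,))

send_message(1)
send_message(1)
unread = conn.execute("SELECT unread FROM users WHERE id = 1").fetchone()[0]
read_inbox(1)
after_read = conn.execute("SELECT unread FROM users WHERE id = 1").fetchone()[0]
```

Reading the counter is then a primary-key lookup, which stays cheap regardless of how large the messages table grows.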
I'd go for option three, except I'd add memcached as solution 4.
Do a plain SQL COUNT() of unread messages on every page load (instantly notified of changes, but it may impact performance badly ?)
As long as you have a decent table structure, COUNT() is a pretty fast command. I wouldn't cache this particular command. I'd instead work out the other queries to make sure you're only returning the data you need when showing them a listing. For example, if all you need is an excerpt, I'd make sure to do something like this:
SELECT id, author, msgdate, SUBSTRING(body, 1, 50) FROM table WHERE recipient = ?
instead of
SELECT * FROM table WHERE recipient = ?;
IMHO, it's best to let the client ping the server and send back JSON with the number of unread messages. Counting in MySQL should be fast, so I see no reason not to use it. Just filter the results on the chat session.
For the database part, the best way would be to add a new_message field to your table, default it to 1, and set it to 0 when the message has been loaded.

Block SELECT until results available

I'm trying to write a PHP script for 'long-polling', returning data when new rows are added to a (Postgres) database table. Is there any way to get a SELECT query to return only when it would return results, blocking otherwise? Or should I use another signaling mechanism, outside of the database?
Take a look at LISTEN/NOTIFY:
The NOTIFY command sends a notification event to each client application that has previously executed LISTEN name for the specified notification name in the current database
http://www.postgresql.org/docs/8.4/static/sql-notify.html
You can add an "ON INSERT" trigger to the table to fire off a NOTIFY event. However, you will need another mechanism to figure out which records need to be selected as the ability to deliver a payload with the NOTIFY event won't be available until 9.0:
http://www.postgresql.org/docs/9.0/static/sql-notify.html
There is no blocking SELECT statement.
You could just issue the SELECT statement on a regular basis, which incurs a certain overhead. If the query is expensive, then you might write a cheaper one, like a count(*), to keep track of new entries that may possibly be returned, and if the number changes, issue the more expensive query.
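That cheap-query-first polling loop might look like this (a Python sketch against SQLite; MAX(id) is used as the cheap guard, a stand-in analogous to the suggested count(*)):

```python
import sqlite3

# Polling with a cheap guard query: run the expensive SELECT only when a
# cheap aggregate shows something new since the last pass.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

def poll(last_seen):
    newest = conn.execute("SELECT COALESCE(MAX(id), 0) FROM events").fetchone()[0]
    if newest == last_seen:
        return last_seen, []       # cheap path: nothing new, skip the big query
    rows = conn.execute("SELECT id, payload FROM events WHERE id > ?",
                        (last_seen,)).fetchall()  # expensive path
    return newest, rows

last, got = poll(0)                # nothing yet
conn.execute("INSERT INTO events (payload) VALUES ('hello')")
last, got = poll(last)             # picks up the new row
```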
You could look into LOCK and FOR UPDATE. FOR UPDATE can make a query wait until the row(s) being selected are unlocked. I'm not sure if there is a timeout, or what resource impact having a large number of these has, but it's one possibility.
You're trying to get an interrupt (event), when you should probably think about polling.
Create and call a stored procedure which will determine if there are new rows that the client should retrieve. If this is a web app, call an Ajax method periodically which, on the server, will query the db to see if there are new rows since its last call. If so, run another query to retrieve them and send them back to the client.
I love postgres and all, but if you're trying to do something simple and not super enterprisey, perhaps redis will be enough for you. I've had a lot of success with using it myself, and it can scale.
http://code.google.com/p/redis/

Database various connections vs. one

We have a PHP application which selects a row from the database, works on it (calls an external API which uses a webservice), and then inserts a new record based on the work done. There's an AJAX display which informs the user of how many records have been processed.
The data is mostly text, so it's rather heavy data.
The process handles thousands of records at a time. The user can choose how many records to start working on. The data is obtained from one table, where records are marked as "done". There is no "WHERE" condition, except the optional "WHERE date BETWEEN date1 AND date2".
We had an argument over which approach is better:
Select one record, work on it, and insert the new data
Select all of the records, work with them in memory, and insert them into the database after all the work is done.
Which approach do you consider the most efficient one for a web environment with PHP and PostgreSQL? Why?
It really depends how much you care about your data (seriously):
Does reliability matter in this case? If the process dies, can you just re-process everything? Or can't you?
Typically when calling a remote web service, you don't want to be calling it twice for the same data item. Perhaps there are side effects (like credit card charges), or maybe it is not a free API...
Anyway, if you don't care about potential duplicate processing, then take the batch approach. It's easy, it's simple, and fast.
But if you do care about duplicate processing, then do this:
SELECT one record from the table FOR UPDATE (i.e. lock it in a transaction)
UPDATE that record with a status of "Processing"
Commit that transaction
And then
Process the record
Update the record contents, AND
SET the status to "Complete", or "Error" in case of errors.
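The claim step above can be sketched as one atomic, guarded UPDATE whose affected-row count tells a worker whether it won the record (SQLite here, which has no FOR UPDATE, so this is an equivalent pattern rather than the literal statements above; the schema is invented):

```python
import sqlite3

# A worker "owns" a record only if its UPDATE actually changed a row: the
# status guard makes the claim atomic, so two workers can never both win.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO records (status) VALUES ('new'), ('new')")
conn.commit()

def claim(record_id):
    cur = conn.execute("UPDATE records SET status = 'processing' "
                       "WHERE id = ? AND status = 'new'", (record_id,))
    conn.commit()
    return cur.rowcount == 1  # True only for the worker that won the race

first = claim(1)
second = claim(1)             # a concurrent worker trying the same record
```

After processing, the worker would run a second UPDATE setting the status to "Complete" or "Error", exactly as the answer describes.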
You can run this code concurrently without fear of it running over itself. You will be able to have confidence that the same record will not be processed twice.
You will also be able to see any records that "didn't make it", because their status will be "Processing", and any errors.
If the data is heavy and so is the load, and considering the application is not real-time dependent, the best approach is most definitely getting all the needed data, working on all of it, and then putting it back.
Efficiency-wise, regardless of language: if you are fetching single items and working on them individually, you are probably opening and closing the database connection each time. This means that if you have thousands of items, you will open and close thousands of connections. The overhead of this far outweighs the overhead of returning all of the items and working on them at once.
