Double requests causing double inserts - php

We have a high-traffic SPA, and occasionally duplicate rows get inserted in several parts of the application.
For example, user registration. Normally the validation mechanism does the trick by checking whether the email address already exists, but I was able to reproduce the problem by dispatching the same request twice using axios, which resulted in a duplicated user in the database.
I initially thought the second request would throw a validation error, but apparently it arrives too quickly and checks for the user before the first request has been able to store it.
So I put a 500 ms delay between those requests and, yes, it worked: the second request threw a validation error.
My question is: what are the techniques to prevent double inserts IF TWO REQUESTS ARE ALREADY DISPATCHED IN THE SAME FRACTION OF A SECOND?
Of course we have disabled the submit button after the first request (we've done this from the beginning), yet people somehow manage to dispatch requests twice.

One option I've utilized in the past is database locking. I'm a bit rusty on this, but in your case:
1. Request a WRITE LOCK for the table.
2. Run a SELECT on the table to find the user.
3. Run the INSERT on the table.
4. Release the WRITE LOCK.
This post on DB locking should give you a better idea of which locks have which effects. Note: some database systems may implement locks differently.
Edit: I should also note that there will be additional performance issues using database locks.
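A rough sketch of that flow in PHP, assuming MySQL, a users table with an email column, and an existing PDO connection in $pdo (all of those names are mine, not from the question):

```php
<?php
// Sketch only: lock the table, check, insert, unlock.
// Table/column names and $pdo are assumptions for illustration.

function registerUser(PDO $pdo, string $email, string $passwordHash): bool
{
    // Take a write lock so no other connection can insert
    // between our SELECT and our INSERT.
    $pdo->exec('LOCK TABLES users WRITE');

    try {
        $check = $pdo->prepare('SELECT id FROM users WHERE email = ?');
        $check->execute([$email]);

        if ($check->fetch() !== false) {
            return false; // email already taken
        }

        $insert = $pdo->prepare('INSERT INTO users (email, password) VALUES (?, ?)');
        $insert->execute([$email, $passwordHash]);

        return true;
    } finally {
        // Always release the lock, even if the insert throws.
        $pdo->exec('UNLOCK TABLES');
    }
}
```

Hold the lock for as short a time as possible: a WRITE lock blocks every other reader and writer on that table, which is the performance cost mentioned above.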

Related

Rollback Database Update After Refreshing a Page (PHP)

I have a PHP application where the user can make changes to an Oracle database using ADOdb.
After the request is executed, the page is refreshed and the user can see the result.
How would I add an undo option for this UPDATE after refreshing the page?
I've tried beginTrans(), but it seems to roll back automatically once the PHP script has finished executing.
Database transactions are tied to a single connection. Connections are normally closed when the PHP script finishes and trying to make a connection persist for the same user on multiple requests would be very problematic.
As much as possible, it's best to treat HTTP requests as stateless. Meaning, changes should be committed to the database at the end of every request, and an "undo" over HTTP should probably not try to roll back a previous transaction; it should instead be implemented as a new change that is itself committed to the database.
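A hedged sketch of that idea with ADOdb: instead of rolling back, treat "undo" as a new committed change, which means the original request has to record the old value first. GetOne() and Execute() are real ADOdb methods, but the orders/change_log tables, the column names, and the :named bind style (as used by the oci8 driver) are only assumptions for illustration:

```php
<?php
// Original request: remember the old value, then apply the change.
$old = $db->GetOne('SELECT status FROM orders WHERE id = :id', ['id' => $orderId]);
$db->Execute(
    'INSERT INTO change_log (order_id, old_status) VALUES (:id, :old)',
    ['id' => $orderId, 'old' => $old]
);
$db->Execute('UPDATE orders SET status = :new WHERE id = :id', ['new' => $newStatus, 'id' => $orderId]);

// Later, when the user clicks "undo" on the refreshed page, apply the stored
// old value as a fresh update that is committed like any other change.
$previous = $db->GetOne(
    'SELECT old_status FROM change_log WHERE order_id = :id ORDER BY id DESC',
    ['id' => $orderId]
);
$db->Execute('UPDATE orders SET status = :old WHERE id = :id', ['old' => $previous, 'id' => $orderId]);
```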

Log to filesystem or database?

I'm using LAMP and have full access to server configuration and setup.
I am unsure which is the best method for simply logging some data.
I want to log things like this:
Analytics (server side with PHP) of every visitor.
When creating a new user, store the id so an email and SMS message can be sent later by a cron task (this avoids sending the email/SMS during the user's request).
Number of page views of certain 'articles'. Increment once per visit to that page.
As you can see they are all simple insert/add actions that all can be processed later by a Cron Task.
The application needs to be scalable for the future.
These are my options (and what I have learned):
(1) Database (MySQL). People say don't use this for logging data like above.
(2) Use file_put_contents() WITHOUT a file lock. I'm told this can cause data corruption.
(3) Use file_put_contents() WITH a file lock, but I believe this either results in missed data, because file_put_contents() returns false and doesn't add the data while the lock is in force, -OR- it results in PHP having to wait for the lock to be released. I don't think MySQL has to wait to do multiple inserts.
Which is the best option? Does it make a difference whether I'm handling tens of requests per second or thousands of requests per second, or would I use the same option either way?
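For reference, this is roughly what I mean by option (3); the path and line format are just examples. As far as I understand, with LOCK_EX (and no LOCK_NB) PHP waits for the lock rather than returning false, so concurrent writers queue up instead of corrupting the file:

```php
<?php
// Sketch of option (3): append a log line under an exclusive lock.
// Path and line format are only examples.

$line = json_encode([
    'time' => date('c'),
    'ip'   => $_SERVER['REMOTE_ADDR'] ?? 'cli',
    'uri'  => $_SERVER['REQUEST_URI'] ?? '',
]) . PHP_EOL;

$bytes = file_put_contents('/var/log/myapp/analytics.log', $line, FILE_APPEND | LOCK_EX);

if ($bytes === false) {
    // false here means an I/O error (permissions, disk full), not lock contention
    error_log('analytics log write failed');
}
```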

Executing a long action

I'm creating a PHP script that will allow a user to log into a website and execute database queries and do other actions that could take some time to complete. If the PHP script runs these actions and they take too long, the browser page times out on the user end and the action never completes on the server end. If I redirect the user to another page and then attempt to run the action in the PHP script, will the server run it even though the user is not on the page? Could the action still time out?
In the event of long-running server-side actions in a web application like this, a good approach is to separate the queueing of the actions (which should be handled by the web application) from the running of the actions (which should be handled by a different server-side application).
In this case it could be as simple as the web application inserting a record into a database table which indicates that User X has requested Action Y to be processed at Time Z. A back-end process (always-running daemon, scheduled script, whatever you prefer) would be constantly polling that database table to look for new entries. ("New" might be denoted by something like an "IsComplete" column in that table.) It could poll every minute, every few minutes, every hour... whatever is a comfortable balance between server performance and the responsiveness of an action beginning when it's requested.
Once the action is complete, the server-side application that ran the action would mark it as complete in the database and would store the results wherever you need them to be stored. (Another database table or set of tables? A file? etc.) The web application can check for these results whenever you need it to (such as on each page load, maybe there could be some sort of "current status" of queued actions on each page so the user can see when it's ready).
The reason for all of this is simply to keep the user-facing web application responsive. Even if you do things like increase timeouts, users' browsers may still give up. Or the users themselves may give up after staring at a blank page and a spinning cursor for too long. The user interface should always respond back to the user quickly.
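A rough sketch of that pattern, assuming a hypothetical pending_actions table (id, user_id, action, requested_at, is_complete), a PDO connection in $pdo, and a runAction() helper; all names are illustrative, not prescriptive:

```php
<?php
// 1) Web application: queue the action and return to the user immediately.
$stmt = $pdo->prepare(
    'INSERT INTO pending_actions (user_id, action, requested_at, is_complete)
     VALUES (?, ?, NOW(), 0)'
);
$stmt->execute([$userId, 'rebuild_report']);

// 2) Back-end script run from cron: pick up new entries and process them.
$jobs = $pdo->query(
    'SELECT id, user_id, action FROM pending_actions WHERE is_complete = 0'
);

foreach ($jobs as $job) {
    runAction($job['action'], (int) $job['user_id']); // the actual long-running work

    $done = $pdo->prepare('UPDATE pending_actions SET is_complete = 1 WHERE id = ?');
    $done->execute([$job['id']]);
}
```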
You could look at using something like ignore_user_abort, but that is still not ideal in my opinion. I would look at deferring these actions and running them through a message queue. PHP has a Gearman extension (installed via PECL) - that is one option. Using a message queue scales well and does a better job of ensuring the requested actions actually get completed.
Lots on SO on the subject... Asynchronous processing or message queues in PHP (CakePHP) ...but don't use Cake :)
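A minimal sketch of handing work off through Gearman (this needs the pecl/gearman extension and a running gearmand server; the function name and payload are made up):

```php
<?php
// Web request side: hand the job off and return immediately.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('long_action', json_encode(['user_id' => $userId]));

// Worker side: a separate, long-running CLI script.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('long_action', function (GearmanJob $job) {
    $payload = json_decode($job->workload(), true);
    // ... run the slow database queries here ...
});

while ($worker->work()) {
    // loop forever, handling jobs as they arrive
}
```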
set_time_limit() is your friend.
If it were me, I would put a loading icon animation in the user interface telling them to wait. Then I would execute the "long process" using an asynchronous AJAX call that would then return an answer, positive or negative, that you would pass to the user through JavaScript.
Just like when you upload pictures to Facebook, you can tell the user what is going on. Very clean!

Singleton database.php

If I have a database.php class (singleton) which reads and writes information for users of my web application, what happens when simultaneous requests call the same database function?
Is it possible that the database class will return the wrong information to other users accessing the same function at the same time?
What other similar problems could occur?
What happens when simultaneous requests call the same database function? Is it possible that the database class will return the wrong information to other users accessing the same function at the same time?
Absolutely not.
Each PHP request is handled entirely in its own process space. There is no threading, no application server connection pool, no shared memory, nothing funky like that. Nothing is shared unless you've gone out of your way to do so (like caching things in APC/memcached).
Every time the application starts, your Singleton will get created. When the request ends, so does the script. When the script exits, all of the variables, including your Singleton, go away with it.
What other similar problems could occur?
Unless you are using transactions (and if you're using MySQL, using a transaction-safe table type like InnoDB), it is possible that users could see partial updates. For example, let's say that you need to perform an update to three tables to update one set of data properly. After the first update has completed but before the other two have completed, it's possible for another process to come along and request data from the three tables and grab the now inconsistent data. This is one form of race condition.
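For example, a sketch of wrapping those three updates in one transaction with PDO (the tables and columns are invented for illustration; this needs InnoDB or another transaction-safe engine):

```php
<?php
// Either all three statements are committed, or none of them are,
// so other requests never see a half-applied update.
$pdo->beginTransaction();

try {
    $pdo->prepare('UPDATE orders SET status = ? WHERE id = ?')
        ->execute(['shipped', $orderId]);
    $pdo->prepare('UPDATE inventory SET qty = qty - ? WHERE sku = ?')
        ->execute([$qty, $sku]);
    $pdo->prepare('UPDATE customers SET last_order_at = NOW() WHERE id = ?')
        ->execute([$customerId]);

    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}
```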

Dealing with long server-side operations using ajax?

I've a particularly long operation that is going to be run when a user presses a button on an interface, and I'm wondering what would be the best way to indicate this back to the client.
The operation populates a fact table for a number of years' worth of data, which will take roughly 20 minutes, so I'm not intending the interface to be synchronous. Even though it is generating large quantities of data server side, I'd still like everything to remain responsive, since the data for the month the user is currently viewing will be updated fairly quickly, which isn't a problem.
I thought about setting a session variable after the operation has completed and polling for that session variable. Is this a feasible way to do such a thing? However, I'm particularly concerned about the user navigating away or closing their browser and then all status information about the long-running job being lost.
Would it be better to insert a record somewhere to log when the processing started and finished, and then create some other sort of interface so the user (or users) can monitor the jobs that are currently executing/finished/failed?
Has anyone any resources I could look at?
How'd you do it?
The server side portion of code should spawn or communicate with a process that lives outside the web server. Using web page code to run tasks that should be handled by a daemon is just sloppy work.
You can't expect them to hang around for 20 minutes. Even the most cooperative users in the world are bound to go off and do something else, forget, and close the window. Allowing such long connection times screws up any chance of a sensible HTTP timeout and leaves you open to trivial DOS too.
As Spencer suggests, use the first request to start a process which is independent of the http request, pass an id back in the AJAX response, store the id in the session or in a DB against that user, or whatever you want. The user can then do whatever they want and it won't interrupt the task. The id can be used to poll for status. If you save it to a DB, the user can log off, clear their cookies, and when they log back in you will still be able to retrieve the status of the task.
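As a very rough sketch of that first request (the jobs table, the worker path, and using exec() with nohup to detach the process are all just one way to do it, not a prescription):

```php
<?php
// start_job.php -- called by the first AJAX request.
// Record the job, detach the long-running work, return an id to poll with.
$pdo->prepare('INSERT INTO jobs (user_id, status, created_at) VALUES (?, ?, NOW())')
    ->execute([$userId, 'queued']);
$jobId = (int) $pdo->lastInsertId();

// Hand the work to a process that outlives this HTTP request
// (a cron-polled worker or a message queue would also do).
exec(sprintf('nohup php /path/to/worker.php %d > /dev/null 2>&1 &', $jobId));

header('Content-Type: application/json');
echo json_encode(['job_id' => $jobId]);
```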
Sessions are not that reliable; I would probably design some sort of task list instead, so I can keep a record of tasks per user. With this design I will also be able to show "done" tasks, to keep the user aware.
Also, I would move the long operation out of the web server's worker process. This is required because web servers can be restarted.
And, yes, I would request the status from the server every dozen seconds or so with AJAX calls.
You can have a JS timer that periodically pings your server to see if any jobs are done. If the user goes away and comes back, you restart the timer. When a job is done, you indicate that to the user so they can click a link and open the report (I would not recommend forcefully loading something, though it can be done).
In my experience the best way to do this is to save, on the server side, which reports are running for each user, along with their statuses. The client would then poll this status periodically.
Basically, instead of checkStatusOf(int session), have the client ask the server for getRunningJobsFor(int userId), returning all running jobs and their statuses.
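A sketch of such an endpoint, reusing the hypothetical jobs table and an assumed $pdo connection from above; the client polls this with AJAX and gets every job for the logged-in user, not just one session-bound task:

```php
<?php
// status.php -- polled by the client; returns all jobs for the current user.
session_start();
$userId = (int) ($_SESSION['user_id'] ?? 0);

$stmt = $pdo->prepare(
    'SELECT id, status, created_at, finished_at
       FROM jobs
      WHERE user_id = ?
      ORDER BY created_at DESC'
);
$stmt->execute([$userId]);

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
```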
