I have developed an AJAX-based game where there is a rare bug (but at volume it happens at least once per hour): for some reason two requests get sent to the processing page almost simultaneously (in the last case I tracked, the requests were 0.0001 ms apart). There is a check right before the query is executed to make sure that it doesn't get executed twice, but since the difference is so small, the check hasn't finished before the next query gets executed. I'm stumped: how can I prevent this? It is causing serious problems in the game.
Just to be clearer, the query starts a new round in the game, so when it executes twice, it starts two rounds at the same time, which breaks the game. I need to be able to stop the script from executing if the previous round isn't over, even if that previous round started 0.0001 ms ago.
The trick is to use an atomic update statement to do something which cannot succeed for both threads. They must not both succeed, so you can simply do a query like:
UPDATE gameinfo SET round=round+1 WHERE round=1234
Where 1234 is the current round that was in progress. Then check the number of rows affected. If the thread affects zero rows, it has failed and someone else did it beforehand. I am assuming that the above will be executed in its own transaction, as autocommit is on.
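A minimal sketch of that check in PHP, assuming a MySQLi connection and a gameinfo table with a round column (the connection details are placeholders):
<?php
$db = new mysqli('localhost', 'user', 'password', 'game');

$currentRound = 1234; // the round we believe is in progress

// Atomic update: only one of the competing requests can match the old round number.
$stmt = $db->prepare("UPDATE gameinfo SET round = round + 1 WHERE round = ?");
$stmt->bind_param('i', $currentRound);
$stmt->execute();

if ($stmt->affected_rows === 1) {
    // We won the race: start the new round here.
} else {
    // Zero rows affected: another request already started the round, so bail out.
}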
So all you really need is an application-wide mutex. flock() or sem_acquire() provide this - but only at the system level - if the application is spread across multiple servers you'd need to use memcached or implement your own socket server to coordinate nodes.
Alternatively you could use a database as a common storage area - e.g. with MySQL, acquire a lock on a table, check when the round was last started, if necessary update the row to say a new round is starting (and remember this) - then unlock the table. Carry on....
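For the single-server case, a minimal flock()-based sketch of that mutex (the lock file path is just an example):
<?php
// Open (or create) the lock file; the path is an arbitrary example.
$fp = fopen('/tmp/game_round.lock', 'c');

if ($fp && flock($fp, LOCK_EX | LOCK_NB)) {
    // Only one process at a time gets past this point.
    // Check whether a new round really needs to be started, then start it.

    flock($fp, LOCK_UN); // release the mutex
    fclose($fp);
} else {
    // Could not acquire the lock; another request is already handling the round.
}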
C.
Locks are one way of doing this, but a simpler way in some cases is to make your request idempotent. This just means that calling it repeatedly has exactly the same effect as calling it once.
For example, your call at the moment effectively does $round++; Clearly repeated calls to this will cause trouble.
But you could instead do $round = $newround; Here repeated calls won't have any effect, because the round has already been set, and the second call just sets it to the same value.
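A hedged sketch of what that might look like: the client sends the number of the round it wants to start, and repeating the call changes nothing (table and column names are placeholders):
<?php
$db = new mysqli('localhost', 'user', 'password', 'game');

// The client says which round it wants to start, e.g. "start round 1235".
$newRound = (int) $_POST['new_round'];

// Setting the round to a specific value is idempotent: running this twice
// with the same $newRound leaves the database in exactly the same state.
$stmt = $db->prepare("UPDATE gameinfo SET round = ? WHERE round < ?");
$stmt->bind_param('ii', $newRound, $newRound);
$stmt->execute();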
I have this code that never finishes executing.
Here is what happens:
We make an API call to get a large data set, and whenever a row differs from what is in our database, we need to update our DB for that specific row. The number of rows will increase as the project grows and could go over 1 billion rows in some cases.
The issue is making this scalable so that it still works even with a 1-billion-row update.
To simulate it, I ran a for loop 9000 times:
<?php
ini_set("memory_limit", "-1");
ignore_user_abort(true);

for ($i = 0; $i < 9000; $i++) {
    // Complex SQL UPDATE query that requires joining tables,
    // and doing a search and update if it matches several variables
}
// here I have a log function to see if the for loop has finished
If I loop it 10 times, it still takes time but it works and records, but with 9000 it doesn't finish the loop and never records anything.
Note: I added ini_set("memory_limit","-1"); ignore_user_abort(true); to prevent memory errors.
Is there any way to make this scalable?
Details: I do this query 2 times a day
Without knowing the specifics of the API, how often you call it, how much data it's returning at a time, and how much information you actually have to store, it's hard to give you specific answers. In general, though, I'd approach it like this:
Have a "producer" script query the API on whatever basis you need, but instead of doing your complex SQL update, have it simply store the data locally (presumably in a table, let's call it tempTbl). That should ensure it runs relatively fast. Implement some sort of timestamp on this table, so you know when records were inserted. In the ideal world, the next time this "producer" script runs, if it encounters any data from the API that already exists in tempTbl, it will overwrite it with the new data (and update the last updated timestamp). This ensures tempTbl always contains the latest cached updates from the API.
You'll also have a "consumer" script which runs on a regular basis and which processes the data from tempTbl (presumably in LIFO order, but it could be in any order you want). This "consumer" script will process a chunk of, say, 100 records from tempTbl, do your complex SQL UPDATE on them, and delete them from tempTbl.
The idea is that one script ("producer") is constantly filling tempTbl while the other script ("consumer") is constantly processing items in that queue. Presumably "consumer" is faster than "producer", otherwise tempTbl will grow too large. But with an intelligent schema, and careful throttling of how often each script runs, you can hopefully maintain stasis.
I'm also assuming these two scripts will be run as cron jobs, which means you just need to tweak how many records they process at a time, as well as how often they run. Theoretically there's no reason why "consumer" can't simply process all outstanding records, although in practice that may put too heavy a load on your DB so you may want to limit it to a few (dozen, hundred, thousand, or million?) records at a time.
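And a corresponding "consumer" sketch with the same placeholder names, which grabs a chunk of rows, runs the heavy UPDATE on them, and removes them from the queue:
<?php
$db = new mysqli('localhost', 'user', 'password', 'mydb');

// Grab a chunk of cached rows (oldest first here; use whatever order suits you).
$result = $db->query("SELECT api_id, payload FROM tempTbl ORDER BY updated_at ASC LIMIT 100");
$del    = $db->prepare("DELETE FROM tempTbl WHERE api_id = ?");

while ($row = $result->fetch_assoc()) {
    // ...your complex UPDATE with joins goes here, driven by $row...

    // Remove the processed row from the queue.
    $del->bind_param('i', $row['api_id']);
    $del->execute();
}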
Short:
Is there an efficient way (via PHP) to get the number of queries that were executed within a certain timespan?
Full:
I'm currently running an API for a frontend web application that will be used by a large number of users.
I use my own custom framework that uses models to do all the data magic and they execute mostly INSERTs and SELECTs. One function of a model can execute 5 to 10 queries on a request and another function can maybe execute 50 or more per request.
Currently, I don't have a way to check if I'm "killing" my server by executing (for example) 500 queries every second.
I also don't want any surprises when the number of users increases to 200, 500, 1000, ... within the first week and maybe 10,000 by the end of the month.
I want to pull some sort of statistics, per hour, so that I have an idea of the average and so that I can maybe work on performance and efficiency before everything fails - merge some queries into one "bigger" one, or stuff like that.
Posts I've read suggested just keeping a counter within my code, but that would require more queries, just to get a number. The preferred way would be to add a SELECT to my hourly statistics script that returns the number of queries that have been executed for the x amount of processed requests.
To conclude.
Are there any other options to keep track of this amount?
Extra. Should I be worried about the number of queries? They are all small ones, built for fast execution without bottlenecks or heavy calculations, and I'm currently quite impressed by how blazingly fast everything is running!
Extra extra. It's on our own VPS server, so I have full access and I'm not limited to "basic" functions or commands or anything like that.
Short Answer: Use the slowlog.
Full Answer:
At the start and end of the time period, perform
SELECT VARIABLE_VALUE AS Questions
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'Questions';
Then take the difference.
If the timing is not precise, also get ... WHERE VARIABLE_NAME = 'Uptime' in order to get the time (to the second)
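A hedged sketch of sampling those counters from PHP with PDO (connection details are placeholders; note that on newer MySQL versions the same counters live in performance_schema.global_status rather than information_schema):
<?php
$pdo = new PDO('mysql:host=localhost', 'user', 'password');

function globalStatus(PDO $pdo, string $name): int
{
    $stmt = $pdo->prepare(
        "SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = ?"
    );
    $stmt->execute([$name]);
    return (int) $stmt->fetchColumn();
}

$q1 = globalStatus($pdo, 'Questions');
$t1 = globalStatus($pdo, 'Uptime');

sleep(60); // measure over a one-minute window, for example

$q2 = globalStatus($pdo, 'Questions');
$t2 = globalStatus($pdo, 'Uptime');

$seconds = max(1, $t2 - $t1);
printf("%d queries in %d seconds (%.1f/sec)\n", $q2 - $q1, $seconds, ($q2 - $q1) / $seconds);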
But there's a problem: 500 very fast queries may not be as problematic as 5 very slow and complex queries. I suggest that elapsed time might be a better metric for deciding what, if anything, to kill.
And... killing the process may lead to a puzzling situation wherein the naughty statement remains in the "Killing" state for a long time. (See SHOW PROCESSLIST.) The reason this can happen is that the statement needs to be undone to preserve the integrity of the data. An example is a single UPDATE statement that modifies all rows of a million-row table.
Rather than issuing a KILL in such a situation, it is probably best to let the statement finish.
In a different direction, if you have, say, a one-row UPDATE that does not use an index but needs a table scan, then the query will take a long time and may possibly be more of a burden on the system than "500 queries". The 'cure' is likely to be adding an INDEX.
What to do about all this? Use the slowlog. Set long_query_time to some small value. The default is 10 (seconds); this is almost useless. Change it to 1 or even something smaller. Then keep an eye on the slowlog. I find it to be the best way to watch out for the system getting out of hand and to tell you what to work on fixing. More discussion: http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog
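For reference, a minimal configuration sketch for enabling the slowlog (the file path and threshold are just examples):
# in my.cnf / my.ini, under [mysqld]
slow_query_log      = ON
slow_query_log_file = /var/log/mysql/slow.log   # example path
long_query_time     = 1                         # seconds; lower it to catch more queries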
Note that the best metric in the slowlog is neither the number of times a query is run, nor how long it runs, but the product of the two. This is the default for pt-query-digest. For mysqldumpslow, adding -s t gets the results sorted in that order.
Which is the better choice: server-side or client-side?
I have a PHP function something like:
function insert($argument)
{
    // do some heavy MySQL work such as a stored procedure call (sp_call)
    // that takes about 1.5 seconds
}
I have to call this function about 500 times.
for ($i = 1; $i <= 500; $i++)
{
    insert($argument);
}
I have two options:
a) call it in a loop in PHP (server-side) --> the server may time out
b) call it in a loop in JavaScript (AJAX) --> takes a long time.
Please suggest the best one, or a third option if there is one.
If I understand correctly, your server still needs to do all the work, so you can't use the client's computer to lessen the power needed on your server. That leaves you with a choice between the following:
Let the client ask the server 500 times. This will easily let you show the progress to the client, giving him the satisfying knowledge that something is happening, or
Let the server do everything, to skip the 500 extra round-trip times and the extra overhead needed to process the 500 requests.
I would probably go with 1 if it's important that the client doesn't give up early, or 2 if it's important that the job is done all the way through, as the client might stop the requests after 300.
EDIT: With regard to your comment, I would then suggest having a "start work" button on the client that tells the server to start the job. Your server then tells a background service (which can be created in PHP) to do the work, and that service can write its progress to a file or a database or something. The client and the PHP server are then free to time out and log out without problems, and you can update the page to see if the work has completed in the background, reading the status from the database or file or whatever. That way you minimize both time and dependencies.
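A hedged sketch of that hand-off on a Unix-like server; worker.php, the progress file path, and the insert() call are all assumptions:
<?php
// start_work.php - called when the user clicks the "start work" button.
// Launch the worker in the background so this request can return immediately.
exec('php /path/to/worker.php > /dev/null 2>&1 &');
echo 'Job started';

// worker.php - does the 500 inserts and records its progress as it goes:
//   for ($i = 1; $i <= 500; $i++) {
//       insert($argument);
//       file_put_contents('/tmp/job_progress.txt', "$i/500");
//   }

// progress.php - polled by the client via AJAX to display how far the job has got:
//   echo file_get_contents('/tmp/job_progress.txt');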
You have not given any context for what you are trying to achieve - of key importance here are performance and whether a set of values should be treated as a single transaction.
The further the loop is from the physical storage (not just the DBMS), the bigger the performance impact. For most web applications the biggest performance bottleneck is the network latency between the client and webserver; even if you are relatively close - say 50 milliseconds away - and have keepalives working properly, it will take a minimum of 25 seconds to carry out this operation for 500 data items.
For optimal performance you should be sending the data to the DBMS in the least number of DML statements. You've mentioned MySQL, which supports multiple-row inserts, and if you're using MySQLi you can also submit multiple DML statements in the same database call (although the latter just eliminates the chatter between PHP and the DBMS, while a single DML statement inserting multiple rows also reduces chatter between the DBMS and the storage). Depending on the data structure and optimization, this should take in the region of tens of milliseconds to insert hundreds of rows - both methods will be much, MUCH faster than having the loop running in the client, even if the latency were 0.
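A sketch of the multiple-row INSERT approach with MySQLi; the table, column, and $arguments array are placeholders:
<?php
$db = new mysqli('localhost', 'user', 'password', 'mydb');

// Build one INSERT with all 500 value tuples instead of 500 separate statements.
$values = [];
foreach ($arguments as $argument) {
    $values[] = "('" . $db->real_escape_string($argument) . "')";
}

$db->query("INSERT INTO my_table (my_column) VALUES " . implode(',', $values));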
The length of time the transaction in progress is going to determine the likelihood of the transaction failing - the faster method will therefore be thousands of times more reliable than the Ajax method.
As Krycke suggests, using the client to do some of the work will not save resources on your system - there is the additional overhead of the webserver, PHP instances and DBMS connection. Although these are relatively small, they add up quickly. If you test both approaches you will find that having the loop in PHP or in the database will result in significantly less effort and therefore greater capacity on your server.
Once I had a script which ran for tens of minutes. My solution was to start the long request through AJAX with a timeout of 1 second and to check for the result in other AJAX calls. The experience for the user is better than waiting too long for a response from PHP without AJAX.
$.ajax({
...
timeout: 1000
})
So finally I got this:
a) Use AJAX if you want to be sure that it will complete. It is also user-friendly, as the user gets regular responses between AJAX calls.
b) Use a server-side script if you are fairly sure the server will not go down in between and you want less load on the client.
For now I am using a server-side script with a waiting message window; the user waits for the successful submission message, otherwise he has to try again.
The probability that it will succeed on the first attempt is 90-95%.
I am building a turn-based multiplayer game with Flash and PHP. Sometimes two users may call on the same PHP script at the same time. This script is designed to write some information to the database. But that script should not run if that information has already been written by another user, or else the game will break. If PHP processes these scripts sequentially (similar to how MySQL queues up multiple queries), then only one script should run in total and everything should be fine.
However, I find that around 10% of the time, BOTH users' scripts are executed. My theory is that the server sometimes receives both user requests to run the script at exactly the same time and they both run because neither detected that anything had been written to the database yet. Is it possible that both scripts were executed at the same time? If so, what are the possible solutions to this problem?
This is indeed possible. You can try locking and unlocking tables at the beginning and end of your scripts.
Though this will slow down some requests, as they would have to first wait for the locked tables to be unlocked.
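A minimal sketch of that with MySQLi, assuming the game data lives in a table called game_state (a placeholder name):
<?php
$db = new mysqli('localhost', 'user', 'password', 'game');

// Take a write lock so only one request at a time can touch game_state.
$db->query("LOCK TABLES game_state WRITE");

// Check whether the information has already been written; if not, write it here.

// Release the lock so the other (waiting) request can continue.
$db->query("UNLOCK TABLES");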
It doesn't matter whether it is PHP, C, Java or whatever. At any given moment, at most as many processes can actually be running as you have CPUs (and cores). There can be, let's say, 100 processes at the same time, but if you have only 2 cores, only 2 are running and the rest are waiting.
It then depends on what you count as "running": only the active processes, or the waiting ones as well. Secondly, it depends on your system configuration and specs, and on how many processes are allowed to wait.
Sounds, at first glance, like whatever keeps a 2nd instance of the script from running just does not happen fast enough, 10% of the time... I understand that you already have some kind of 'lock' like someone told you to add, which is great; as someone mentioned above, always put this lock FIRST THING in your script, if not even before calling the script (i.e. in the parent script). The same goes for competing functions / objects etc...
Just a note though: I was directed here by Google, and what I wanted to find out is whether script B will run IN AN IFRAME (so in a 'different window' if you wish) if script A is not finished running; basically your title is a bit blurry. Thank you very much.
Fortunately enough we're in the same pants: I'm programming a Hearthstone-like card game using PHP (which I know isn't suited for this at all, but I just like challenging tasks (and okay, that's the only language I'm familiar with)). Basically I have to keep multiple 'instants', or actions if you prefer, from triggering while another set of global event/instant - instants - sub-instants is rolling. This includes NEVER calling a function that has an event in it from the same rolling snippet, EXCEPT if I run a while loop on a $_SESSION variable with value y that only does sleep(1) (that happens in script A); it keeps rolling while $_SESSION["phase"] == "EndOfTurnEffects" until $_SESSION["phase"] == "StandBy" (other player's turn), and I want script B to modify $_SESSION["phase"]. Basically, if script B does not run before script A is done executing, I'm caught in an endless loop of the while statement...
That's very plausible that they do. Look into database transactions.
Briefly, database transactions are used to control concurrency between programs that access the database at the same time. You start a transaction, then execute multiple queries and finally commit the transaction. If two scripts overlap each other, one of them will fail.
Note that isolation levels can further give fine-grained control of how much the two (or more) competing scripts may share. Typically all are allowed to read from the database, but only one is allowed to write. So the error will happen at the final commit. This is fine as long as all side effects happen in the database, but not sufficient if you have external side effects (such as deleting a file or sending an email). In those cases you may want to lock a table or row for the duration of the transaction, or set the isolation level.
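A hedged sketch of that pattern with PDO and InnoDB, using SELECT ... FOR UPDATE to hold a row lock for the duration of the transaction; the table, columns, and $gameId are placeholders:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=game', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

try {
    $pdo->beginTransaction();

    // Lock the row so the competing script blocks here until we commit.
    $stmt = $pdo->prepare("SELECT written FROM game_state WHERE game_id = ? FOR UPDATE");
    $stmt->execute([$gameId]);
    $alreadyWritten = (bool) $stmt->fetchColumn();

    if (!$alreadyWritten) {
        // Only the first script to get here performs the write.
        $pdo->prepare("UPDATE game_state SET written = 1 WHERE game_id = ?")->execute([$gameId]);
    }

    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}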
Here is an example of MySQL locking that you can use so that the first PHP thread to reach the database acquires a named lock (called "awesome_lock_row" here) until it finally releases it. The second thread attempts to acquire the same lock name, and since it is still held, it keeps waiting until the first PHP thread has released the lock. Note that GET_LOCK() takes an advisory, named lock rather than locking the table itself.
For this example, you can try running the same script perhaps 100 times concurrently as a cron job and you should see the "update_this_data" number field increment to 100. If the lock weren't taken, all 100 concurrent threads would probably see "update_this_data" as 0 at the same time and the end result would be just 1 instead of 100.
<?php
$db = new mysqli( 'host', 'username', 'password', 'dbname' );

// Acquire the named lock (wait up to 30 seconds for it).
$db->query( "DO GET_LOCK('awesome_lock_row', 30)" );

$result = $db->query( "SELECT * FROM table_name" );
if ($result) {
    if ( $row = $result->fetch_object() )
        $output = $row;
    $result->close();
}

$update_id = $output->some_id;
$db->query( "UPDATE table_name SET update_this_data=update_this_data+1 WHERE id={$update_id}" );

// Release the named lock so the next waiting thread can proceed.
$db->query( "DO RELEASE_LOCK('awesome_lock_row')" );
?>
Hope this helps.
I have a PHP application that currently has 5k users and will keep increasing for the forseeable future. Once a week I run a script that:
fetches all the users from the database
loops through the users, and performs some upkeep for each one (this includes adding new DB records)
The last time this script ran, it only processed 1400 users before dying due to a 30-second maximum execution time error. One solution I thought of was to have the main script still fetch all the users, but instead of performing the upkeep process itself, it would make an asynchronous cURL call (1 for each user) to a new script that will perform the upkeep for that particular user.
My concern here is that 5k+ cURL calls could bring down the server. Is this something that could be remedied by using a messaging queue instead of cURL calls? I have no experience using one, but from what I've read it seems like this might help. If so, which message queuing system would you recommend?
Some background info:
this is a Symfony project, using Doctrine as my ORM and MySQL as my DB
the server is a Windows machine, and I'm using Windows' task scheduler and wget to run this script automatically once per week.
Any advice and help is greatly appreciated.
If it's possible, I would make a scheduled task (cron job) that would run more often and use LIMIT 100 (or some other number) to process a limited number of users at a time.
A few ideas:
Increase the Script Execution time-limit - set_time_limit()
Don't go overboard, but more than 30 seconds would be a start.
Track Upkeep against Users
Maybe add a field for each user, last_check, and have that field set to the date/time of the last successful "Upkeep" action performed against that user.
Process Smaller Batches
Better to run smaller batches more often. Think of it as being the PHP equivalent of "all of your eggs in more than one basket". With the last_check field above, it would be easy to identify those with the longest period since the last update, and also set a threshold for how often to process them.
Run More Often
Set a cronjob and process, say 100 records every 2 minutes or something like that.
Log and Review your Performance
Have logfiles and record stats. How many records were processed, how long was it since they were last processed, how long did the script take. These metrics will allow you to tweak the batch sizes, cronjob settings, time-limits, etc. to ensure that the maximum checks are performed in a stable fashion.
Setting all this up may sound like a lot of work compared to a single process, but it will allow you to handle increased user volumes, and would form a strong foundation for any further maintenance tasks you might be looking at down the track.
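Putting a few of those ideas together, a rough sketch of the cron-driven batch; the users table, the last_check column, and the seven-day threshold are assumptions:
<?php
// upkeep.php - run from cron, e.g. every 2 minutes.
$db = new mysqli('localhost', 'user', 'password', 'mydb');

// Pick the users that have waited longest since their last upkeep.
$result = $db->query(
    "SELECT id FROM users
     WHERE last_check IS NULL OR last_check < NOW() - INTERVAL 7 DAY
     ORDER BY last_check ASC
     LIMIT 100"
);

while ($user = $result->fetch_assoc()) {
    // ...perform the upkeep for this user (add the new DB records, etc.)...

    // Record the successful run so the next batch picks someone else.
    $stmt = $db->prepare("UPDATE users SET last_check = NOW() WHERE id = ?");
    $stmt->bind_param('i', $user['id']);
    $stmt->execute();
}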
Why not still use the cURL idea, but instead of processing only one user per call, send a bunch of users to each one by splitting them into groups of 1000 or so?
Have you considered changing your logic to commit changes as you process each user? It sounds like you may be running a single transaction to process all users, which may not be necessary.
How about just increasing the execution time limit of PHP?
Also, looking into whether you can improve your upkeep procedure to make it faster can help too. Depending on what exactly you are doing, you could also look into spreading it out a bit: do a couple once in a while rather than everyone at once. But that depends on what exactly you're doing, of course.