A theoretical thought experiment - PHP

I recently came upon this theoretical problem:
There are two PHP scripts in an application;
The first script connects to a DB each day at 00:00 and inserts 1 million rows into an existing DB table;
The second script has a foreach loop iterating over the same DB table's rows; for each row it makes an API call which takes exactly 1 second to complete (request + response = 1 s), and then, independently of the content of the response, deletes one row from the DB table;
Hence, each day the DB table gains 1 million rows but loses only 1 row per second, i.e. 86,400 rows per day, and because of that it grows infinitely big;
What modification should be made to the second script so that the DB table does not grow infinitely big?
Does this problem sound familiar to anyone? If so, is there a 'canonical' solution to it? The first thing that crossed my mind was: if the row deletion does not depend on the API response, why not simply take the API call outside of the foreach loop? Unfortunately, I didn't have a chance to ask my question.
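A rough sketch of that idea (every name below is made up, just to illustrate):

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// The deletion does not depend on the response, so one API call is enough.
$response = file_get_contents('https://api.example.com/report');

// Delete the queued rows in chunks instead of one row per second,
// so that a single huge DELETE does not hold locks for too long.
do {
    $deleted = $pdo->exec('DELETE FROM queue_table LIMIT 10000');
} while ($deleted > 0);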
Any other ideas?

Related

How to get the time difference in seconds between two db column status (or events)

I am trying to get the time difference in seconds in my database between two events. The users table has various columns, but I'm working with the status column. If a user places an order on our website, the status column shows "pending", and if the order is confirmed by our agents it switches to "success". I'm trying to get the time difference (in seconds) between when it shows pending and when it shows success.
NB: I'll be glad if anyone can explain the time() function in PHP with an example.
You can use MySQL's unix_timestamp() function. Assuming that your table has two records for the two events and a column called eventTime, two queries like the following will give you the two values, each being the respective number of seconds since the Epoch. Subtract the former from the latter to get the time difference:
select unix_timestamp(eventTime) ... where status='pending'
select unix_timestamp(eventTime) ... where status='success'
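If the two events really are two separate rows, the subtraction can also be pushed into a single query; a rough sketch (the orders table, the order_id column and the PDO connection are my assumptions, not from your schema):

$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

$sql = "SELECT UNIX_TIMESTAMP(MAX(CASE WHEN status = 'success' THEN eventTime END))
             - UNIX_TIMESTAMP(MAX(CASE WHEN status = 'pending' THEN eventTime END))
               AS seconds_elapsed
        FROM orders
        WHERE order_id = ?";
$stmt = $pdo->prepare($sql);
$stmt->execute([$orderId]);
echo $stmt->fetchColumn();   // time difference in seconds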
Update
After re-reading your question, I guess your DB design has only one row for the whole life cycle of the transaction (from pending to success). In this case, if all three parties involved (the agent who updates the status to pending, the agent who updates the status to success, and the agent who needs the time difference between the two events) are the same thread, then you can keep the two event times in memory and simply compute the difference.
However, I think it is more likely that the three parties are two or three different threads. In that case, you need some mechanism to pass the knowledge of the first event time from one thread to another. This can be done by adding a new column called lastUpdateTime, or by adding a new table for the purpose of time tracking.
By the way, if you use the second approach, a MySQL trigger may be useful: whenever the table gets updated, the trigger runs another command that updates the second table, which is used solely to keep track of event times. This approach lets you leave the original table unchanged and just add a new one.
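A minimal sketch of that trigger-plus-table idea, run once at setup time (the status_log table, the users.id column and the connection details are all my assumptions):

$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// A side table that only records when the status column changed.
$pdo->exec("CREATE TABLE IF NOT EXISTS status_log (
    user_id    INT NOT NULL,
    status     VARCHAR(20) NOT NULL,
    changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
)");

// Every update on users writes one row into the log, leaving the original
// table untouched. (This fires on every update; a stricter version would
// compare OLD.status with NEW.status first.)
$pdo->exec("CREATE TRIGGER log_status_change
    AFTER UPDATE ON users
    FOR EACH ROW
    INSERT INTO status_log (user_id, status) VALUES (NEW.id, NEW.status)");

// The difference in seconds is then a single query against the log:
$seconds = $pdo->query("SELECT TIMESTAMPDIFF(SECOND,
        MIN(CASE WHEN status = 'pending' THEN changed_at END),
        MIN(CASE WHEN status = 'success' THEN changed_at END))
    FROM status_log
    WHERE user_id = 42")->fetchColumn();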

Bulk-update a DB table using values from a JSON object

I have a PHP program which gets from an API the weather forecast data for the following 240 hours, for 100 different cities (a total of 24,000 records; I save them in a single table). For every city and every hour the program gets temperature, humidity, probability of precipitation, sky cover and wind speed. This data comes in JSON format, and I have to store all of it in a database, preferably MySQL. It is important that this operation is done in one go for all the cities.
Since I would like to update the values every 10 minutes or so, performance is very important. If someone can tell me which is the most efficient way to update my table with the values from the JSON it would be of great help.
So far I have tried the following strategies:
1) decode the JSON and use a loop with a prepared statement to update one value at a time (too slow);
2) use a stored procedure (I do not know how to pass a whole JSON object to the procedure, and I know there is a limit on the number of individual parameters I can pass);
3) use LOAD DATA INFILE (the generation of the CSV file is too slow);
4) use UPDATE with CASE, generating the SQL dynamically (the string gets so long that the execution is too slow).
I will be happy to provide additional information if needed.
You have a single table with about a dozen columns, correct? And you need to insert 100 rows every 10 minutes, correct?
Inserting 100 rows like that every second would be only slightly challenging. Please show us the SQL code; something must be miserably wrong with it. I can't imagine how any of your options would take more than a few seconds. Is "a few seconds" too slow?
Or does the table have only 100 rows? And you are issuing 100 updates every 10 minutes? Still, no sweat.
Rebuild technique:
If practical, I would build a new table with the new data, then swap tables:
CREATE TABLE new LIKE real;
Load the data (LOAD DATA INFILE is good if you have a .csv)
RENAME TABLE real TO old, new TO real;
DROP TABLE old;
There is no downtime -- the real table is always available, regardless of how long the load takes.
(Doing a massive update is much more "effort" inside the database; reloading should be faster.)
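A minimal PHP sketch of that rebuild-and-swap, assuming a forecast table keyed by (city_id, forecast_hour); the table layout, the JSON field names and the connection details are all guesses:

$pdo = new PDO('mysql:host=localhost;dbname=weather', 'user', 'pass');

// $apiResponse holds the JSON string returned by the weather API.
$records = json_decode($apiResponse, true);

$pdo->exec('DROP TABLE IF EXISTS forecast_new');
$pdo->exec('CREATE TABLE forecast_new LIKE forecast');

// Multi-row inserts in chunks keep the statement count low without
// building one enormous SQL string.
foreach (array_chunk($records, 1000) as $chunk) {
    $placeholders = [];
    $values = [];
    foreach ($chunk as $r) {
        $placeholders[] = '(?, ?, ?, ?, ?, ?, ?)';
        array_push($values, $r['city_id'], $r['hour'], $r['temp'],
                   $r['humidity'], $r['precip'], $r['cover'], $r['wind']);
    }
    $sql = 'INSERT INTO forecast_new
                (city_id, forecast_hour, temperature, humidity,
                 precipitation, sky_cover, wind_speed)
            VALUES ' . implode(',', $placeholders);
    $pdo->prepare($sql)->execute($values);
}

// Atomic swap: readers never see a half-loaded table.
$pdo->exec('RENAME TABLE forecast TO forecast_old, forecast_new TO forecast');
$pdo->exec('DROP TABLE forecast_old');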

Optimizing a database-driven table display

I'm in the process of revising a PHP page that displays all of our items with various statistics regarding each one. We're looking at a period running from the first of the month one year ago up to yesterday. I've managed to get a basic version of the script working, but it performs poorly. The initial implementation of this script (not my revision) retrieved all sales records and item information at once, then used the resulting records to create objects (not mysql_fetch_objects). These were then stored in an array that used hard-coded values to access the object's attributes. The way this is all set up is fairly confusing and doesn't easily lend itself to reusability. It is, however, significantly faster than my implementation since it only calls to the database once or twice.
My implementation utilizes three calls. The first obtains basic report information needed to create DateTime objects for the report's range (spanning the first of the month twelve months ago up to yesterday's date). This is, however, all it's used for and I really don't think I even need to make this call. The second retrieves all basic information for items included in the report. This comes out to 854 records. The last select statement retrieves all the sales information for these items, and last I checked, this returned over 6000 records.
What I tried to do was select only the records pertinent to the current item in the loop, represented by the following.
foreach ($allItems as $item) {
    // Display properties of item here
    // db_controller is a special class used to handle DB manipulation
    $query = "SELECT * FROM sales_records WHERE ...";
    $result = $db_controller->select($query);
    // Use sales records to calculate report values
}
This is the problem. Querying the database for each and every item is extremely time-consuming and drastically impacts performance. What's returned is simply the sums of quantities sold in each month within the timeframe specified earlier in the script, along with the resulting sales amounts. At maximum, each item will only have 13 sales records, ranging from 2015/1 to 2016/1 for example.

However, I'm not sure whether performing a single fetch for all these sales records before the loop would help performance, because I would then have to search through the result array for the records pertaining to the current item. What can I do to alleviate this issue? Since this is a script important to the company's operations, I want its performance to be just as quick as the old script, or at the very least only slightly slower. My results are accurate but just painfully slow.
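If I do go with the single fetch, I suppose I could key the results by item id so the loop doesn't have to search; a rough sketch of what I have in mind (the column names are guesses):

// One query for all items, grouped by item and month.
$query = "SELECT item_id, sale_month, SUM(quantity) AS qty, SUM(amount) AS total
          FROM sales_records
          WHERE sale_date BETWEEN '2015-01-01' AND '2016-01-31'
          GROUP BY item_id, sale_month";
$rows = $db_controller->select($query);   // assuming select() returns an array of rows

// Index the result by item id once.
$salesByItem = [];
foreach ($rows as $row) {
    $salesByItem[$row['item_id']][] = $row;   // at most ~13 rows per item
}

foreach ($allItems as $item) {
    // $item['id'] could just as well be $item->id, depending on how items are loaded.
    $itemSales = isset($salesByItem[$item['id']]) ? $salesByItem[$item['id']] : [];
    // ...compute the report values for this item from $itemSales...
}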

How do I limit max website requests to 10 per user per day, using a database to record request counts?

I have a person's username, and he is allowed ten requests per day. Every day the request count goes back to 0, and old data is not needed.
What is the best way to do this?
This is the way that comes to mind, but I am not sure if it's the best way (two fields: today_date and request_count):
Query the DB for the date of last request and request count.
Get result and check if it was today.
If today, check the request count; if it is less than 10, update the database to increment the count.
If not today, update the DB with today's date and count = 1.
Is there another way with fewer DB queries?
I think your solution is good. It is also possible to reset the count on a daily basis; that lets you drop a column, but you do need to run a cron job. If there are many users who won't make any requests at all, it is needless to reset their count each day.
But whichever you pick, both solutions are very similar in performance, data size and development time/complexity.
Keep just one column, request_count. Query this column and update it; as far as I know, with stored procedures this may be possible in one single query. Even if not, it will be just two. Then create a cron job that calls a script that resets the column to 0 every day at 00:00.
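A minimal sketch of that single-column approach, assuming a users table with a request_count column (the names and the limit of 10 are illustrative):

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// One conditional UPDATE per request: it only succeeds while the user
// is still under the limit.
$stmt = $pdo->prepare(
    'UPDATE users SET request_count = request_count + 1
     WHERE username = ? AND request_count < 10'
);
$stmt->execute([$username]);

if ($stmt->rowCount() === 0) {
    // Either the user does not exist or the daily limit is reached.
    exit('Daily request limit reached');
}

// The nightly cron job then runs: UPDATE users SET request_count = 0;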
To spare yourself some requests to the DB, define:
the maximum number of requests per day allowed;
the first day available to your application (a date offset).
Then add a requestcount field to the database per user.
On the first request get the count from the db.
The count is always the number of the day multiplied by the maximum number of requests per day plus one, plus the actual number of requests made by that user:
day * (max + 1) + n
So if, on the first request, the count read from the DB is higher than today's allowed maximum, block.
Otherwise, if it is lower than the current day's base, reset it to the current day's base (in the PHP variable).
Then count up and store the new value in the DB.
This is one read operation, and in case the request is still valid, one write operation to the DB per request.
There is no need to run a cron job to clean this up.
That's actually the same as you propose in your question, but the day information is encoded into the counter value. So one value carries both pieces of information, while counting up by 1 per request still works for the block; see the sketch below.
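A rough PHP sketch of that scheme, with made-up column names and an arbitrary start date:

$maxPerDay = 10;
$firstDay  = new DateTime('2015-01-01');             // the date offset of the application
$day       = (int) $firstDay->diff(new DateTime('today'))->days;
$base      = $day * ($maxPerDay + 1);                // today's counter base

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('SELECT requestcount FROM users WHERE username = ?');
$stmt->execute([$username]);
$count = (int) $stmt->fetchColumn();

if ($count >= $base + $maxPerDay) {
    exit('Daily request limit reached');             // quota for today already used
}
if ($count < $base) {
    $count = $base;                                  // first request today: jump to today's base
}
$count++;                                            // consume one request

$pdo->prepare('UPDATE users SET requestcount = ? WHERE username = ?')
    ->execute([$count, $username]);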
You have to take into account that each user may be in a different time zone than your server, so you can't just store the count or use the "day * max" trick. Get the user's time offset, and then the start of the user's day can be stored in your "quotas" database. In MySQL, that would look like:
`start_day`=ADDTIME(CURDATE()+INTERVAL 0 SECOND,'$offsetClientServer')
Then simply look at this time the next time you check the quota. The quota check can all be done in one query.

MySQL - Summary Tables

Which method do you suggest and why?
Creating a summary table and ...
1) Updating the table as the action occurs in real time.
2) Running group by queries every 15 minutes to update the summary table.
3) Something else?
The data must be near real time, it can't wait an hour, a day, etc.
I think there is a third option, which might allow you to manage your CPU resources a little better. How about writing a separate process that periodically updates the summary tables? Rather than recreating the summary with a GROUP BY, which is guaranteed to run slower over time because there will be more rows every time you do it, maybe you can just update the values. Depending on the nature of the data this may be impossible, but if the data is so important that it can't wait and has to be near real time, then I think you can afford the time to tweak the schema and let the process update the summary without having to read every row in the source tables.
For example, say your data is just login_data (columns username, login_timestamp, logout_timestamp). Your summary could be login_summary (columns username, count). Once every 15 minutes you could truncate the login_summary table and repopulate it with an INSERT ... SELECT username, COUNT(*) ... GROUP BY username. But then you'd have to rescan the entire source table each time. To speed things up, you could add a last_update column to the summary table. Then every 15 minutes you'd only process, for each user, the records newer than that user's last_update. More complicated, of course, but it has two benefits: 1) you only update the rows that changed, and 2) you only read the new rows.
And if 15 minutes turned out to be too old for your users, you could adjust it to run every 10 mins. That would have some impact on CPU of course, but not as much as redoing the entire summary every 15 mins.
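A rough sketch of that incremental refresh (assuming username is a unique key on login_summary, the count column is called login_count here, and the time of the previous run is remembered somewhere, e.g. in a one-row state table):

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Run from cron every 10-15 minutes; $lastRun is the timestamp of the previous run.
$stmt = $pdo->prepare("
    INSERT INTO login_summary (username, login_count)
    SELECT username, COUNT(*)
    FROM login_data
    WHERE login_timestamp > :last_run
    GROUP BY username
    ON DUPLICATE KEY UPDATE login_count = login_count + VALUES(login_count)
");
$stmt->execute([':last_run' => $lastRun]);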
