PHP processing time and CPU usage when comparing millions of rows - php

I have a tool which compares one string with, on average, 250k strings from a database.
Two tables are used during the compare process: categories and categories_strings. The strings table has around 2.5 million rows, while the pivot table, categories_strings, contains 7 million rows.
My query is pretty simple: select the string columns, join the pivot table, add a WHERE clause to restrict the category, and set a limit of 10,000.
I run this query in a loop, with every batch being 10,000 strings. To execute the whole script faster I use the seek method (keyset pagination) instead of MySQL's OFFSET, which was way too slow at large offsets.
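For reference, the seek method boils down to remembering the last id of the previous batch and filtering on it instead of paginating with OFFSET. A minimal sketch, assuming a strings table with an auto-increment id and the pivot table described above (the table and column names are guesses, not the real schema):

<?php
// Seek (keyset) pagination: each batch starts after the last id of the previous one.
// Table and column names are placeholders.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

$categoryId = 42;        // example category
$batchSize  = 10000;
$lastId     = 0;

$stmt = $mysqli->prepare(
    'SELECT s.id, s.value
       FROM strings s
       JOIN categories_strings cs ON cs.string_id = s.id
      WHERE cs.category_id = ? AND s.id > ?
      ORDER BY s.id
      LIMIT ?'
);
$stmt->bind_param('iii', $categoryId, $lastId, $batchSize);   // bound by reference

do {
    $stmt->execute();
    $rows = $stmt->get_result()->fetch_all(MYSQLI_ASSOC);

    foreach ($rows as $row) {
        // ... compare $row['value'] against the input string here ...
        $lastId = $row['id'];                    // remember the highest id seen
    }
} while (count($rows) === $batchSize);           // a short batch means we are done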
Then each batch is compared using common algorithms such as simple text comparison, levenshtein, etc. This part is simple.
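The comparison step could look roughly like this, assuming the usual PHP built-ins similar_text() and levenshtein(); the thresholds and scoring are only illustrative:

<?php
// Compare one needle against a batch of rows; thresholds are made up for the example.
function compareBatch(string $needle, array $rows): array
{
    $matches = [];
    foreach ($rows as $row) {
        $candidate = $row['value'];

        similar_text($needle, $candidate, $percent);    // similarity in percent
        $distance = levenshtein($needle, $candidate);   // edit distance
                                                        // (255-char input limit on older PHP)
        if ($percent > 80 || $distance <= 3) {
            $matches[] = ['value' => $candidate, 'percent' => $percent, 'distance' => $distance];
        }
    }
    return $matches;
}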
The question starts here.
On my laptop (a Lenovo X230) the whole process for, say, 250k compared strings takes 7.4 seconds to load the SQL, 13.3 seconds to compare all the rows, and then 0.1 seconds for sorting and transforming for the view.
I also have a small dedicated server. Same PHP version, same MySQL. The web server doesn't matter, as I currently run the script from the command line. While on my laptop it takes roughly 20 seconds in total, on the server it takes... 120 seconds.
So, what is the most important factor for the execution time of a long-running PHP program? All I can think of is the CPU, which is worse on the dedicated server: an Intel(R) Atom(TM) CPU N2800 @ 1.86GHz. Memory consumption is pretty low, about 2-4%. CPU usage, however, is around 60% on my laptop and 99.7-100% on the server.
Is the CPU the most important factor in this case? Is there any way to split the work, for example into several processes that together would take less time? And apart from that, how do I monitor CPU usage and find out which part of the script is the most expensive?
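Short of running a real profiler (Xdebug or Blackfire, for instance), a quick way to see where the time goes is to time each phase yourself. A rough sketch, using hrtime() (PHP 7.3+; microtime(true) works the same way on older versions) plus getrusage() to separate CPU time from time spent waiting on MySQL:

<?php
// Per-phase wall-clock timing plus process CPU time.
$timings = [];

$start = hrtime(true);
// ... load a batch from MySQL ...
$timings['sql'] = (hrtime(true) - $start) / 1e9;

$start = hrtime(true);
// ... run the comparisons ...
$timings['compare'] = (hrtime(true) - $start) / 1e9;

$start = hrtime(true);
// ... sort / transform for the view ...
$timings['sort'] = (hrtime(true) - $start) / 1e9;

print_r($timings);

$usage = getrusage();
printf("user CPU: %.2fs, system CPU: %.2fs\n",
    $usage['ru_utime.tv_sec'] + $usage['ru_utime.tv_usec'] / 1e6,
    $usage['ru_stime.tv_sec'] + $usage['ru_stime.tv_usec'] / 1e6
);

As for splitting the work: a single PHP process uses only one core, so handing different id ranges or categories to a few worker processes (pcntl_fork(), or simply launching several CLI instances) can help on a multi-core machine.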

Related

Performance of mysql uuid_to_bin in a php json import script

I have a PHP script which iterates through a JSON file line by line (using JsonMachine) and checks each line against some criteria (in a foreach); if the criteria are met, it checks whether the row is already in the database and then imports or updates it (MySQL 8.0.26). As an example, the last time this script ran it iterated through 65,000 rows and imported 54,000 of them in 24 seconds.
Each JSON row has a UUID as its unique key, and I am importing this as a VARCHAR(36).
I read that it can be advantageous to store UUIDs as BINARY(16) using uuid_to_bin and bin_to_uuid, so I changed the import script to store the UUID as binary, the read scripts to decode it back to a UUID, and the database fields to BINARY(16).
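For what it's worth, the binary variant usually looks something like this; the table and column names here are placeholders, not the real schema:

<?php
// Store the UUID as BINARY(16) on insert and decode it on read.
// UUID_TO_BIN(?, 1) would additionally swap the time fields so type-1 UUIDs
// become roughly chronological; without the flag the byte order is unchanged.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

$uuid    = '3f06af63-a93c-11e4-9797-00505690773f';   // example value
$payload = 'example payload';

$stmt = $mysqli->prepare('INSERT INTO items (uuid, payload) VALUES (UUID_TO_BIN(?), ?)');
$stmt->bind_param('ss', $uuid, $payload);
$stmt->execute();

// Reading it back as text:
$res = $mysqli->query('SELECT BIN_TO_UUID(uuid) AS uuid, payload FROM items');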
This worked functionally, but the import time went from 24 seconds to 30 minutes. The server was not CPU-bound during that time, running at 25-30% (normally <5%).
Without the UUID conversion the script runs at about 3,000 lines per second; with the conversion it runs at about 30 lines per second.
The question: can anyone with experience of bulk importing using uuid_to_bin comment on its performance?
I've reverted to native UUID storage, but I'm interested to hear others' experience.
EDIT with extra info from comments and replies:
The UUID is the primary key
The server is a VM with 12GB and 4 x assigned cores
The table is 54,000 rows (from the import), and is 70MB in size
InnoDB buffer pool size is unchanged from the default of 128MB (134,217,728 bytes)
Oh, bother. UUID_TO_BIN changed the UUID values from being scattered to being roughly chronologically ordered (for type 1 uuids). This helps performance by clustering rows on disk better.
First, let's check the type. Please display one (any one) of the 36-char UUIDs, or HEX(binary_column) if you are on the 16-byte binary version. After that, I will continue this answer depending on whether it is type 1 or some other type.
Meanwhile, some other questions (to help me focus on the root cause):
What is the value of innodb_buffer_pool_size?
How much RAM?
How big is the table?
Were the incoming uuids in some particular order?
A tip: use IODKU (INSERT ... ON DUPLICATE KEY UPDATE) instead of SELECT + (UPDATE or INSERT). That will double the speed.
Then batch them 100 at a time. That may give another 10x speedup.
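A hedged sketch of what those two tips could look like together; the items table and its columns are placeholders, and $jsonRows stands for the decoded rows that already passed the criteria check:

<?php
// IODKU (INSERT ... ON DUPLICATE KEY UPDATE) in batches of 100 rows.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$batch  = [];

foreach ($jsonRows as $row) {
    $batch[] = $row;
    if (count($batch) === 100) {
        flushBatch($mysqli, $batch);
        $batch = [];
    }
}
if ($batch) {
    flushBatch($mysqli, $batch);     // remaining partial batch
}

function flushBatch(mysqli $db, array $rows): void
{
    // One multi-row statement per batch instead of one round trip per row.
    $placeholders = rtrim(str_repeat('(UUID_TO_BIN(?), ?),', count($rows)), ',');
    $sql = "INSERT INTO items (uuid, payload) VALUES $placeholders
            ON DUPLICATE KEY UPDATE payload = VALUES(payload)";

    $stmt   = $db->prepare($sql);
    $types  = str_repeat('ss', count($rows));
    $params = [];
    foreach ($rows as $r) {
        $params[] = $r['uuid'];
        $params[] = $r['payload'];
    }
    $stmt->bind_param($types, ...$params);
    $stmt->execute();
}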
More
Your UUIDs are type 4 -- random. UUID_TO_BIN() changes from one random order to another. (Dropping from 36 bytes to 16 is still beneficial.)
innodb_buffer_pool_size -- 128M is an old, too-small default. If you have more than 4GB of RAM, set it to about 70% of RAM; this change should help performance significantly. Your VM has 12GB, so change the setting to 8G. This will eliminate most of the I/O, which is the slow part of SQL.
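A sketch of that change, assuming MySQL 5.7.5+ / 8.0 where innodb_buffer_pool_size is a dynamic variable (the account needs SUPER or SYSTEM_VARIABLES_ADMIN, and the same value should also go into my.cnf so it survives a restart):

<?php
$mysqli = new mysqli('localhost', 'root', 'secret');

// Check the current size, then resize online. Resizing happens in chunks.
$current = $mysqli->query('SELECT @@innodb_buffer_pool_size')->fetch_row()[0];
printf("buffer pool now: %d MB\n", $current / 1024 / 1024);

$mysqli->query('SET GLOBAL innodb_buffer_pool_size = 8589934592');   // 8G for the 12GB VM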

mysqli_free_result() is slow when not all of the data has been read while using MYSQLI_USE_RESULT

I am using MYSQLI_USE_RESULT (an unbuffered query) when querying a huge amount of data from a table.
For testing I took a table 5.6 GB in size.
I selected all columns: SELECT * FROM test_table.
If I do not read any rows with a method like fetch_assoc() and then try to close the result with mysqli_free_result(), it takes 5 to 10 seconds to close.
Sometimes I read the required number of rows based on the available memory; when I then call mysqli_free_result(), it takes less time than when not a single row has been read.
So fewer unread rows means less time to free the result; more unread rows, more time.
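For reference, a minimal script that reproduces the pattern described above (table name and credentials are placeholders). As far as I understand it, with an unbuffered result set the remaining rows still have to be transferred and discarded before the connection can be reused, which is where the time in free() goes:

<?php
$mysqli = new mysqli('localhost', 'user', 'pass', 'testdb');

$t0 = microtime(true);
$result = $mysqli->query('SELECT * FROM test_table', MYSQLI_USE_RESULT);
printf("query(): %.4fs\n", microtime(true) - $t0);   // returns almost immediately

// Read only a few rows, then free the result.
for ($i = 0; $i < 1000 && ($row = $result->fetch_assoc()); $i++) {
    // ... process $row ...
}

$t0 = microtime(true);
$result->free();                                     // drains the remaining unread rows
printf("free(): %.4fs\n", microtime(true) - $t0);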
To the best of my knowledge, it is nowhere documented that this call can take this long.
The time taken by the query itself is around 0.0008 sec.
Is this a bug or expected behavior?
For me this sluggishness defeats the whole point of using MYSQLI_USE_RESULT.
MySQL v5.7.21 and PHP v7.2.4 were used for testing.
Aliases of this function are mysqli_result::free, mysqli_result::close, mysqli_result::free_result and mysqli_free_result.

MySQL takes too much time to insert records

I have a dedicated server with 32 GB of RAM, and it hardly uses more than 15% CPU and 20% memory, even with 5 crons executing simultaneously.
The issue: a PHP script of about 200 simple lines, with some basic calculations and a total of 3 queries to select and insert into a table of 12 columns (4 integer, 8 varchar).
It executes once per day and inserts around 280,000 to 300,000 records, which takes 5-6 hours on average.
Questions:
1) Why does it take 5-6 hours to insert just 3 lakh (300,000) records?
2) Why does it use so few resources (RAM and CPU)?
3) Is there any configuration that limits MySQL's execution?
Server Details:
4 processors, each an Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz with 8192 KB cache
32 GB RAM
Please help me to figure out the issue
First, create an INDEX on your table, then try to execute the query that feeds the insert:
SELECT COLUMN1, COLUMN2, ...
FROM YOUR_TABLE
How much time did it take to execute? Note down that timing, and note down the execution time of the same query without the index.
You will definitely see that the un-indexed query takes more time.
So indirectly this indicates that how fast the data you want to insert can be fetched matters a lot.
Once fetching is fast, the insertion is obviously faster than before.
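For example (a hedged illustration only; the table and column names are placeholders):

<?php
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

// One-time: index the column used by the SELECT that feeds the inserts.
$mysqli->query('CREATE INDEX idx_lookup ON source_table (lookup_column)');

// Time the feeding query; run this before and after adding the index and compare.
$t0 = microtime(true);
$result = $mysqli->query('SELECT column1, column2 FROM source_table WHERE lookup_column = 123');
printf("SELECT took %.3fs, %d rows\n", microtime(true) - $t0, $result->num_rows);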
Hope this helps.

performance concerns when storing log records in MySQL

I have set up a 15-minute cron job to monitor the performance of 10+ websites (all hosted on different servers). Every 15 minutes I check whether each server is up or down, the response time in ms, etc.
I would like to save this information in MySQL instead of a log.txt file, so that it can easily be retrieved, queried and analysed (i.e. server performance on day x, in month x, or between days x and y).
Here is what my table is going to look like:
id website_ip recorded_timestamp response_time_in_ms website_status
If I insert a new entry for each website every 15 minutes, I'll have 96 records per website per day (4 an hour x 24 hours), so for 10 websites that's 960 records every day!!
So I'm thinking of creating only one entry per website per hour. That way, instead of 960 records a day, I'll only have 24 x 10 = 240 records a day for 10 websites.
But still, it's not perfect. What if I want to keep the records for a whole year? Then I'll have 87,600 records per year for 10 websites.
Is 87,600 records a lot? My other big concern is the difference between server local time and client local time. How can I improve the design without hurting accuracy or getting the time zones wrong?
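For what it's worth, a sketch of such a table based on the columns listed above (the exact types are assumptions). A TIMESTAMP column is stored internally as UTC and converted using the connection's time zone, which takes care of the server-vs-client local time concern:

<?php
$mysqli = new mysqli('localhost', 'user', 'pass', 'monitoring');

$mysqli->query("
    CREATE TABLE website_checks (
        id                  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        website_ip          VARBINARY(16)  NOT NULL,  -- fits IPv4/IPv6 via INET6_ATON()
        recorded_timestamp  TIMESTAMP      NOT NULL DEFAULT CURRENT_TIMESTAMP,
        response_time_in_ms INT UNSIGNED   NOT NULL,
        website_status      TINYINT(1)     NOT NULL   -- 1 = up, 0 = down
    )
");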
This is a bit long for a comment. The simple answer to your question is "no".
No, 87,600 records is not a lot of records. In fact, even your full 15-minute data, at roughly 350,000 records per year, is not that much. It could be a lot of data if you had really wide records, but your record is at most a few tens of bytes. The resulting table would still be well under a gigabyte per year. Not very much at all, really.
Databases are designed to handle big tables. There are things you can do to speed up queries on the log data. The most obvious is to create indexes on the columns that will often be used for filtering. Another option is partitioning, which breaks the table's storage into separate "files" (technically tablespaces), so queries require less I/O.
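A hedged sketch of those two ideas against the hypothetical website_checks table above. Two MySQL partitioning rules matter here: a TIMESTAMP column can only be partitioned through UNIX_TIMESTAMP(), and the partitioning column must be part of every unique key, hence the widened primary key; the partition boundaries are illustrative:

<?php
$mysqli = new mysqli('localhost', 'user', 'pass', 'monitoring');

// Index the columns the reports filter on.
$mysqli->query('CREATE INDEX idx_site_time ON website_checks (website_ip, recorded_timestamp)');

// Widen the primary key so it includes the partitioning column, then partition by quarter.
$mysqli->query('ALTER TABLE website_checks DROP PRIMARY KEY, ADD PRIMARY KEY (id, recorded_timestamp)');
$mysqli->query("
    ALTER TABLE website_checks
    PARTITION BY RANGE (UNIX_TIMESTAMP(recorded_timestamp)) (
        PARTITION q1   VALUES LESS THAN (UNIX_TIMESTAMP('2013-04-01 00:00:00')),
        PARTITION q2   VALUES LESS THAN (UNIX_TIMESTAMP('2013-07-01 00:00:00')),
        PARTITION pmax VALUES LESS THAN MAXVALUE
    )
");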
You may want to periodically summarize more recent records and store the results in summary tables for your most common queries. However, that level of detail is beyond the scope of this answer.

Strange performance test results for LAMP site

We have an online application with large amounts of data, with tables usually holding 10+ million rows each.
The performance hit I am facing is in the reporting modules, where some charts and tables load very slowly.
Assume that total time = PHP execution time + MySQL query time + HTTP response time.
To verify this, I open phpMyAdmin, which is itself just another web app.
If I click a table with 3 records (SELECT * FROM table_name), the total time to display is 1-1.5 seconds, and I can see a MySQL query time of 0.0001 sec.
When I click a table with 10 million records, the total time is 7-8 seconds, while the MySQL query time is again close to 0.0001 sec.
Shouldn't the page load time be the sum of the MySQL and script run times? Why does the page load slowly when the table holds more data, even though MySQL says it took the same time?
PHPMyAdmin uses LIMIT, so that's an irrelevant comparison.
You should use EXPLAIN to see why your query is so awfully slow. 10 million rows is a small dataset (assuming an average row size) and shouldn't take anywhere near 7 seconds.
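For example, a quick way to look at the plan from PHP (the query is just a stand-in for the slow reporting query):

<?php
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

$plan = $mysqli->query('EXPLAIN SELECT * FROM big_table WHERE some_column = 42');
while ($row = $plan->fetch_assoc()) {
    // Watch the type, key and rows columns: type=ALL with a large rows estimate
    // means a full table scan, i.e. a missing or unused index.
    print_r($row);
}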
Also, your method of measuring the execution time is flawed. You should measure by timing the individual parts of your script. If SQL is your bottleneck, start optimizing your table or query.
