MongoDB write performance - php

Given: 20M documents averaging 550 bytes each, inserted via the PHP driver on a single machine.
First attempt: insert() (not mongoimport) with journaling on and WriteConcern at the default (1). It took about 12 hours, which made me wonder, so I tried a second import.
Second attempt: I used batchInsert() with --nojournal and WriteConcern=0 and noted the performance. In total it also took 12 hours?! Interestingly, what started at 40,000 records inserted per minute ended up at 2,500 records per minute, and I can only imagine it would have dropped to 100 records per minute towards the end.
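For reference, the second import looked roughly like this (a minimal sketch, assuming the legacy PECL "mongo" driver; $documents is a hypothetical array of decoded records):
$client     = new MongoClient('mongodb://localhost:27017');
$collection = $client->selectDB('mydb')->selectCollection('mycoll');
// w => 0 is "fire and forget": the driver does not wait for the server to acknowledge the write
$collection->batchInsert($documents, array('w' => 0));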
My questions are:
I assumed that by turning journaling off, setting w=0, and using batchInsert(), my total insertion time would drop significantly!
How is the significant drop in inserts per minute explained?
--UPDATE--
Machine is a Core Duo 3 GHz with 8 GB of RAM. RAM usage stays steady at 50% during the whole process; CPU usage, however, goes high. In PHP I have ini_set('memory_limit', -1) so memory usage is not limited.

If it is only a one-time migration, I would suggest you delete all indexes before these inserts, using the deleteIndex(..) method.
After all inserts have finished, use ensureIndex(..) to rebuild the indexes.
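A minimal sketch of that sequence, again assuming the legacy PECL "mongo" driver (the field name is a placeholder):
$collection->deleteIndexes();                    // drops all secondary indexes; the _id index stays
// ... run all the inserts here ...
$collection->ensureIndex(array('field' => 1));   // rebuild each index you need afterwards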
PS. From the numbers you provided, this is not a big amount of data; you have probably mis-configured the MongoDB server. Please provide your MongoDB server config and memory size, and maybe I can find something else to improve.
Replying to your second question: your server is probably running out of memory after some of the inserts.

After a lot of hair pulling, I realized the backlog effect. Interestingly enough, when I bundled my documents into batches of 5,000, batchInsert() worked like magic and imported everything in just under 4 minutes!!
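Roughly what the working version looked like (a sketch; $sourceRows and buildDocument() are hypothetical stand-ins for the actual data source and mapping):
$batch = array();
foreach ($sourceRows as $row) {                  // iterate over the 20M source records
    $batch[] = buildDocument($row);              // build one ~550-byte document
    if (count($batch) === 5000) {
        $collection->batchInsert($batch, array('w' => 0));
        $batch = array();                        // keep each batchInsert() call small to avoid the backlog effect
    }
}
if ($batch) {
    $collection->batchInsert($batch, array('w' => 0));   // flush the final partial batch
}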
This tool gave me the idea: https://github.com/jsteemann/BulkInsertBenchmark

Related

Memory not freed after execution of a PHP script

I have a LAMP server on which I run a PHP script that makes a SELECT query on a table containing about 1 million rows.
Here is my script (PHP 8.2 and MariaDB 10.5.18):
$db = new PDO("mysql:host=$dbhost;dbname=$dbname;", $dbuser, $dbpass);
$req = $db->prepare('SELECT * FROM '.$f_dataset);
$req->execute();
$fetch = $req->fetchAll(PDO::FETCH_ASSOC); // loads the entire 1M-row result set into PHP memory at once
$req->closeCursor();
My problem is that each execution of this script seems to consume about 500 MB of RAM on my server, and this memory is not released at the end of the execution. Since I only have 2 GB of RAM, after 3 executions the server kills the Apache2 process, which forces me to restart the Apache server each time.
Is there a solution to this? A piece of code that allows me to free the used memory?
I tried to use unset($fetch) and gc_collect_cycles(), but nothing works, and I haven't found anyone who has had the same problem as me.
EDIT
After the more skeptical among you posted several responses asking for evidence and additional information, here is what else I can tell you:
I am currently developing a tool for testing trading strategies, for which I set the parameters manually via an HTML form. The form is then processed by a PHP script that first performs calculations to reproduce technical indicators (using the Trader library for some of them, re-implemented by hand for others) from the parameters returned by the form.
In a second step, after reproducing the technical indicators and storing their values in my database, the PHP script simulates buy or sell orders according to the values of the stock price I am interested in and the values of the technical indicators calculated just before.
To do this I have, for example, two tables in my database: the first stores the information for 1-minute candles (opening price, closing price, max price, min price, volume, ...), one candle per row; the second stores the value of a technical indicator corresponding to a candle, and therefore to a row of the first table.
The reason I need to do these calculations, and therefore retrieve my 1 million candles, is that my table contains 1 million 1-minute candles on which I want to test my strategy. I could do this just as well with 500 candles or with 10 million.
My problem right now is only with the candle retrieval; there are not even any calculations yet. I shared my script above, which is very short, and there is absolutely nothing else in it except the definitions of my variables $dbname, $dbhost, etc. So look no further, you have absolutely everything here.
When I run this script in my browser and watch the RAM usage during execution, I see an Apache process consuming up to 697 MB of RAM. So far, nothing abnormal: the table I'm retrieving candles from is a little over 100 MB. The real problem is that once the script has finished, the RAM usage stays the same. If I run the script a second time, the RAM usage reaches 1400 MB. And this continues until all the RAM is used up and my Apache server crashes.
So my question is simple: do you know a way to free this RAM after my script has finished?
What you describe is improbable, and you don't say how you made these measurements. If your assertions are valid, there are a couple of ways to solve the memory issue; however, this is an XY problem. There is no good reason to read a million rows into a web-page script.
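If you really do have to walk through all the rows, one of those ways is to avoid fetchAll() and buffered results entirely and stream the rows instead; a minimal sketch, assuming the pdo_mysql driver:
$db = new PDO("mysql:host=$dbhost;dbname=$dbname;", $dbuser, $dbpass);
$db->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);   // stream rows instead of buffering the whole result set
$req = $db->query('SELECT * FROM '.$f_dataset);
while ($row = $req->fetch(PDO::FETCH_ASSOC)) {
    // process one candle at a time; only the current row is held in PHP memory
}
$req->closeCursor();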
After several hours of research and discussion, it seems that this problem of unreleased memory has no direct solution. It is simply a limitation of how Apache handles my case: it cannot free the memory it uses unless it is restarted each time.
I have, however, found a workaround in the Apache configuration, by allowing only one request per server process instead of the default of 5.
This way, the process my script runs on gets killed at the end of the run and is automatically replaced by a new one.
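For reference, the relevant knob looks roughly like this (a sketch for the prefork MPM; the directive is called MaxConnectionsPerChild in Apache 2.4 and MaxRequestsPerChild in older versions):
<IfModule mpm_prefork_module>
    # recycle each worker process after a single request so its memory is returned to the OS
    MaxConnectionsPerChild 1
</IfModule>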

Running 600+ threads with PHP pthreads - what about the overhead

I have a server with 2 physical CPUs which together have 24 cores, and 10 GB of RAM.
The PHP program calculates a statistic, and I could run each section totally independently of the others. Once all calculations have finished, I only have to "merge" them.
Therefore I had the idea to perform each calculation phase in a separate thread created/controlled by pthreads.
Each calculation takes around 0.10 seconds, but the sheer number of calculations makes the whole job take that long when they run serially.
My questions:
Is there a limitation when creating a new "thread" with "pthreads"?
What is the overhead of creating a new thread? I must take this into account to avoid adding another delay.
I can imagine that for several seconds the load would be very high, but it will then drop suddenly once every calculation has finished. This is not a problem: it is "my" server and I do not have to consider other users [as I would on a shared server].
While "waiting" for an answer :-) I started to rewrite the class.
I can summarize it like this:
There is no way to start 600 threads at once. I expected that, but I wanted to know where the limit is. My configuration allowed around 160 threads to be started.
When starting more than that (roughly 150-160 threads), the PHP script stopped working without any further notice.
As Franz Gleichmann pointed out, the whole process took longer when starting lots of threads. I found that starting 20 threads gives the best performance.
The achieved performance gain is between 20% and 50% - I am satisfied.
I don't know if it is a bug in the pthreads library, but I could not access any class members; I had to move the class members inside the function. Since the calculation is in one function, this did not bother me, and I did not investigate it further.
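For reference, the pattern that ended up working can be sketched like this (assuming pthreads v3 and its Pool class; CalculationTask and calculateSection() are placeholders):
class CalculationTask extends Threaded
{
    private $sectionId;

    public function __construct($sectionId)
    {
        $this->sectionId = $sectionId;
    }

    public function run()
    {
        // each ~0.1 s calculation lives entirely inside run(), as noted above
        calculateSection($this->sectionId);   // calculateSection(): hypothetical worker function
    }
}

$pool = new Pool(20, Worker::class);          // ~20 workers gave the best throughput here
for ($i = 0; $i < 600; $i++) {
    $pool->submit(new CalculationTask($i));
}
$pool->shutdown();                            // waits for all submitted tasks to finish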

Optimizing innoDB but my inserts wait for each other

I'm really new to MySQL and just started to use the InnoDB table engine on my VPS.
Server info: 4 x 2.4 GHz, 8 GB RAM, 120 GB RAID 10.
my.cnf:
[mysqld]
innodb_file_per_table=1
innodb_buffer_pool_size=4G
innodb_log_file_size=512M
innodb_flush_log_at_trx_commit=2
The table to insert into has 6 INTs and 1 DATE, plus 1 trigger for mapping: it inserts 1 row into a mapping table for each inserted row.
The last line (innodb_flush_log_at_trx_commit=2) helps a lot with speeding up the inserts (the workload is 80% insert/update vs 20% read).
But when I do some tests (I call a PHP file in the web browser which does 10,000 inserts), it takes about 3 seconds (is this fast or slow for this hardware?). But when I open multiple tabs and request the PHP file at the same time, they all have an execution time of 3 seconds, yet they wait for each other to finish :/
Any ideas which settings I should add or change for faster inserts? Greatly appreciated!
3,300 inserts a second is quite respectable, especially with a trigger. It's hard to say more without understanding the tables.
What are you doing about commits? Are you doing all 10K inserts and then committing once? If so, other clients doing similar tasks are probably locked out until each client finishes. That's a feature!
On the other hand, if you're using autocommit you're making your MySQL server churn.
The best strategy is to insert chunks of something like 100 or 500 rows, and then commit.
You should attempt to solve this kind of lockout not with configuration settings, but by tuning your PHP web application to manage transactions carefully.
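A minimal sketch of that chunk-and-commit pattern with PDO (table and column names are placeholders, matching the six INTs and one DATE described above):
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$stmt = $db->prepare('INSERT INTO my_table (a, b, c, d, e, f, created_at) VALUES (?, ?, ?, ?, ?, ?, ?)');
$chunkSize = 500;
$db->beginTransaction();
foreach ($rows as $i => $row) {           // $rows: the 10,000 rows the test inserts
    $stmt->execute($row);
    if (($i + 1) % $chunkSize === 0) {    // commit every 500 rows, then start a new transaction
        $db->commit();
        $db->beginTransaction();
    }
}
$db->commit();                            // commit the final partial chunk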
You might want to consider using a MyISAM table if you don't need robust transaction semantics for this application. MyISAM handles large volumes of inserts more quickly. That is, if you don't particularly care if you read data that's current or a few seconds old, then MyISAM will serve you well.

Suggestion for using php array?

Let's say I have 100k records in a table. After fetching those records from the table I push them into an array with some calculations, and then send them to a server for further processing.
I have tested the scenario with 1k records and it works perfectly, but I am worried about performance issues, because the page which does the calculations and fetches the records from the DB runs every 2 minutes.
My question is: can I use an array for more than 2 million records?
There's no limit on how much data an array can hold; the limit is server memory / the PHP memory limit.
Why would you push 100k records into an array? Databases have sorting and limiting for exactly that reason!
My question is: can I use an array for more than 2 million records?
Yes you can; 2 million array entries is not a limit in PHP. The array limit depends on the memory that is available to PHP.
ini_set('memory_limit', '320M');
$moreThan2Million = 2000001;
$array = range(0, $moreThan2Million);
echo count($array); # 2000002 - range(0, N) is inclusive, so N+1 entries
You wrote:
The page is scheduled and runs every 2 minutes, so I am worried about performance issues.
And:
But I need to fetch all of them, not 100 at a time, and send them to the server for further processing.
Performance of array operations depends on processing power. With a fast enough computer, you should not run into any problems. However, keep in mind that PHP is an interpreted language and therefore considerably slower than compiled binaries.
If you need to run the same script every 2 minutes but its runtime is longer than two minutes, you can distribute execution over multiple computers, so that one process does not eat the CPU and memory of another and the work can finish while another process runs on an additional box.
Edit
Good answer, but can you give your estimate of how much time the script will need to complete, assuming there is no issue with the server's processor and RAM?
That depends on the size of the array, the amount of processing each entry needs (relative to the overall size of the array), and naturally on the processor power and the amount of RAM. All of these are unspecified in your question, so I can only say that I consider it unspecified. You'll need to test this yourself and build metrics for your application by profiling it.
I have 10 GB of RAM and more than an 8-core processor.
For example, you could take a rough measurement for 1, 10, 100, 1,000, 10,000, 100,000 and 1 million entries to see how your (unspecified) script scales on that computer.
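A rough sketch of such a measurement loop (processRecords() is a hypothetical stand-in for your unspecified calculation):
foreach (array(1, 10, 100, 1000, 10000, 100000, 1000000) as $n) {
    $records = range(1, $n);               // stand-in data set of $n entries
    $start   = microtime(true);
    processRecords($records);              // processRecords(): hypothetical, your actual calculation
    printf("%8d entries: %.3f s, peak %.1f MB\n",
        $n, microtime(true) - $start, memory_get_peak_usage(true) / 1048576);
}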
I am sending this array to another page for further processing.
Also measure the amount of data you send between computers and how much bandwidth is available for that inter-process communication over the wire.
Let's say I have 100k records in a table; after fetching those records from the table I push them into an array with some filters.
Filters? Can't you just write a query that implements those filters instead? A database (depending on the vendor) isn't just a data store; it can do calculations, and most of the time it is much quicker than transferring the data to PHP and doing the calculations there. If you have a database such as PostgreSQL, you can do pretty much everything you've ever wanted with PL/pgSQL.
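A minimal sketch of pushing the filtering into the query rather than into PHP (table, column names and the cutoff value are placeholders):
// instead of SELECT * plus filtering/aggregating 100k rows in PHP:
$stmt = $db->prepare(
    'SELECT customer_id, SUM(amount) AS total
       FROM records
      WHERE created_at >= :since
   GROUP BY customer_id'
);
$stmt->execute(array(':since' => '2024-01-01'));
$filtered = $stmt->fetchAll(PDO::FETCH_ASSOC);   // only the already-filtered/aggregated rows reach PHP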

Optimizing mysql / PHP based website | 300 qps

Hey,
I currently have over 300 qps on my MySQL server. There are roughly 12,000 UIP a day and no cron jobs, on fairly heavy PHP websites. I know it's pretty hard to judge whether that is OK without seeing the website, but do you think it is total overkill?
What is your experience? If I optimize the scripts, do you think I would be able to get the qps substantially lower? I mean, getting down to 200 qps won't help me much. Thanks
currently have over 300+ qps on my mysql
Your website can run on a Via C3, good for you !
do you think that it is a total overkill?
That depends whether it's:
1 page/s doing 300 queries: yeah, you've got a problem.
30-60 pages/s doing 5-10 queries each: then you've got no problem.
12000 UIP a day
We had a site with 50,000-60,000 UIP a day, and it ran on a Via C3 (your toaster is a datacenter compared to that crap server), but the torrent tracker used about 50% of the CPU, so only half of that tiny CPU was available to the website, which never seemed to use any significant fraction of it anyway.
What is your experience?
If you want to know whether you are going to kill your server, or whether your website is optimized, the following have close to zero information content:
UIP (unless you get Facebook-like numbers)
queries/s (unless you're above 10,000; I've seen a cheap dual core blast out 20,000 qps using Postgres)
But the following is extremely important :
dynamic pages/second served
number of queries per page
time duration of each query (ALL OF THEM)
server architecture
vmstat, iostat outputs
database logs
webserver logs
database's own slow_query, lock, and IO logs and statistics (a sample slow-query-log config follows this list)
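For example, the slow query log can be enabled with a few lines of server configuration (a sketch; the threshold is arbitrary):
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 0.5              # log anything slower than 0.5 s
log_queries_not_using_indexes = 1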
You're not focusing on the right metric...
I think you are missing the point here. Whether 300+ qps is too much depends heavily on the website itself, on the users per second that visit it, on the background scripts running concurrently, and so on. You should be able to test and/or compute an average query throughput for your server to understand whether 300+ qps is fair or not. And, by the way, it also depends on what these queries are asking for (a couple of fields, or large amounts of binary data?).
Surely, if you optimize the scripts and/or reduce the number of queries, you can lower the load on the database, but without specific data we cannot properly answer your question. To bring a 300+ qps load under 200 qps, you would have to cut your total queries by at least a third.
Optimizing a script can do wonders. I've taken scripts that took 3 minutes and got them down to 0.5 seconds simply by optimizing how the calls were made to the server. That is an extreme case, of course. I would focus mainly on minimizing the number of queries by combining them where possible. Maybe get creative with your queries to include more information in each hit.
And going from 300 to 200 qps is actually a huge improvement. That's a 33% drop in traffic to your server... that's significant.
You should not focus on the script; focus on the server.
You are not saying whether these 300+ queries per second are causing issues. If your server is not dying, there is no reason to lower the number. And if you have already done the optimization, focus on the server: upgrade it or buy more servers.
