PHP: maximum number of inserts in one SQL query

I have a pretty simple question. I am inserting a lot of records at once into a MySQL table. It works for about 2000 records (actually a bit more), but if I try to insert, say, 3000 records, it doesn't do anything.
I'm working from AS3, sending an array containing all the records via AMFPHP to a simple PHP script that parses the array and inserts the records.
Is this normal, or should I look into it?
Currently I'm slicing my array into chunks of 2000 records and sending a couple of AMFPHP requests instead of just one.
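Roughly, the chunked version looks like this (just a sketch: it assumes a PDO connection in $pdo and a hypothetical records table with name and value columns):
<?php
// Split the incoming records into chunks and send one multi-row INSERT per chunk.
$chunkSize = 2000;
foreach (array_chunk($records, $chunkSize) as $chunk) {
    $placeholders = [];
    $values = [];
    foreach ($chunk as $row) {
        $placeholders[] = '(?, ?)';
        $values[] = $row['name'];
        $values[] = $row['value'];
    }
    $sql = 'INSERT INTO records (name, value) VALUES ' . implode(', ', $placeholders);
    $pdo->prepare($sql)->execute($values);
}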

MySQL queries are limited by the max_allowed_packet configuration option. It defines the maximum length, in bytes, of a single statement sent to the server. Note that this isn't just the total size of the data being inserted; it's the entire query string: SQL keywords, punctuation, spaces, and so on.
Check how long your 3000-record query is compared with the 2000-record one, and then get your server's packet length limit:
SHOW VARIABLES WHERE Variable_name LIKE '%max_allowed_packet%'
If your 3000-record version is longer than this limit, the query will definitely fail, because the server rejects any statement larger than its maximum packet size.
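A rough way to check this from PHP (a sketch, assuming a mysqli connection in $mysqli and the full INSERT string in $sql):
<?php
// max_allowed_packet is reported in bytes; compare it to the statement length.
$row = $mysqli->query("SHOW VARIABLES LIKE 'max_allowed_packet'")->fetch_assoc();
$maxPacket = (int) $row['Value'];

if (strlen($sql) >= $maxPacket) {
    // Too large for one packet: split it into smaller multi-row INSERTs,
    // or raise max_allowed_packet on the server.
    echo 'Query is ' . strlen($sql) . " bytes, limit is $maxPacket bytes\n";
}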

I don't think there is really a limit on the number of inserts in one query.
Instead, there is a limit on the size of the query you can send to MySQL.
See:
max_allowed_packet
Packet too large
So, basically, this depends on the amount of data you have in each insert.

I would ensure max_allowed_packet is larger than the SQL query your PHP script generates.
http://dev.mysql.com/doc/refman/5.5/en/packet-too-large.html
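If the packet size is the culprit, it can be raised (a sketch; SET GLOBAL needs the SUPER privilege, and 16 MB is only an example value):
<?php
// Raise the limit on the running server; new connections pick it up.
$mysqli->query('SET GLOBAL max_allowed_packet = 16777216'); // 16 MB

// To make it permanent, set it in my.cnf instead:
//   [mysqld]
//   max_allowed_packet = 16M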

I don't think PHP limits the number of rows you can insert in one query; rather, it limits the amount of memory a script may use and its maximum execution time.
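Those limits live in php.ini and can be checked or raised at runtime, for example:
<?php
// PHP-side limits that can make a large import silently die.
echo ini_get('memory_limit') . "\n";        // e.g. 128M
echo ini_get('max_execution_time') . "\n";  // seconds; 0 means unlimited

// Raise them for a long-running import script:
ini_set('memory_limit', '512M');
set_time_limit(300); // allow up to 5 minutes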

Related

Storing a large mysql dataset into an array in php

Some background:
I have a PHP program that does a lot of work with large data sets that I receive every 15 minutes (each file has about 10 million records). I have a table in a MySQL database with phone numbers (over 300 million rows), and for each row in my file I need to check whether its phone number appears in that table, so I can add it to my statistics record. So far I have just issued an SQL call for every row, like:
select * from phone.table where number = '$phoneNumber';
Here $phoneNumber is the number from the raw record that I'm trying to compare. I then check whether the query returned any results, and that is how I know whether the record contains a phone number I need to count.
That means I'm running 10 million SQL queries every 15 minutes, which is just too slow and too memory-intensive. The second thing I tried was to run the SQL query once, store the results in an array, and compare the raw-record phone numbers against that. But a 300-million-record array in memory was too much as well.
I'm at a loss here and can't seem to find a way to do it. Just to add a few things: yes, the table has to stay in MySQL, and yes, I have to do this with PHP (my boss requires it to be done in PHP).
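For reference, the per-row check currently looks roughly like this (a sketch with a prepared statement; the PDO connection and the raw-record structure are assumptions):
<?php
// One prepared statement, re-executed for every raw record: this is the
// slow per-row lookup described above, shown only to make the setup concrete.
$stmt = $pdo->prepare('SELECT 1 FROM `phone`.`table` WHERE number = ? LIMIT 1');
foreach ($rawRecords as $record) {
    $stmt->execute([$record['phoneNumber']]);
    if ($stmt->fetchColumn() !== false) {
        // the number exists in the table: update the statistics record
    }
}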

MySQL (MariaDB) execution timeout within query called from PHP

I'm stress testing my database for a geolocation search system. It already has a lot of optimisation built in, such as a square-box long/lat index system to narrow searches before performing arc-distance calculations. My aim is to serve 10,000,000 users from one table.
At present my query time is between 0.1 and 0.01 seconds based on other conditions such as age, gender etc. This is for 10,000,000 users evenly distributed across the UK.
I have a LIMIT condition as I need to show the user X people, where X can be between 16 and 40.
The issue is that when there are no, or only a few, matching users, the query can take a long time because it cannot reach the LIMIT quickly and may have to scan 400,000 rows.
There may be other optimisation techniques I could look at, but my question is:
Is there a way to get the query to give up after X seconds? If it takes more than 1 second then it is not going to return results and I'm happy for this to occur. In pseudo query code it would be something like:
SELECT data FROM table WHERE ....... LIMIT 16 GIVEUP AFTER 1 SECOND
I have thought about a cron solution to kill slow queries, but that is not very elegant. The query will be called every few seconds in production, so the cron job would need to run continuously.
Any suggestions?
Version is 10.1.14-MariaDB
Using MariaDB 10.1, you have two ways of limiting your query: by time, or by the total number of rows examined.
By rows:
SELECT ... LIMIT ROWS EXAMINED rows_limit;
You can use the LIMIT ROWS EXAMINED clause and set a row count such as the 400,000 you mentioned (available since MariaDB 10.0).
By time:
If the max_statement_time variable is set, any query (excluding stored procedures) taking longer than the value of max_statement_time (specified in seconds) to execute will be aborted. This can be set globally, by session, as well as per user and per query.
If you want it for a specific query, as I imagine, you can use this:
SET STATEMENT max_statement_time=1 FOR
SELECT field1 FROM table_name ORDER BY field1;
Remember that max_statement_time is specified in seconds (unlike MySQL's equivalent, which uses milliseconds), so adjust it until you find the best fit for your case (available since MariaDB 10.1).
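From PHP the per-query form can be used as-is; the only extra work is treating an aborted query as "no results" (a sketch, assuming mysqli and placeholder table/column names):
<?php
// MariaDB aborts the SELECT after 1 second; on abort, query() returns false.
$sql = "SET STATEMENT max_statement_time=1 FOR
        SELECT id, name FROM users WHERE age BETWEEN 18 AND 30 LIMIT 16";

$result = $mysqli->query($sql);
if ($result === false) {
    $rows = []; // timed out: show the user "no matches" instead of an error
} else {
    $rows = $result->fetch_all(MYSQLI_ASSOC);
}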
If you need more information, I recommend this excellent post about query timeouts.
Hope this helps you.

Insertion efficiency of a large amount of data with SQL

I have a program that I use to read a CSV file and insert the data into a database. I'm having trouble with it because it needs to be able to insert big batches (up to 10,000 rows) at a time. At first I looped through and inserted each record one at a time; that is slow because it calls an insert function 10,000 times. Next I grouped the rows so it inserted 50 at a time, figuring it would have to talk to the database less, but it is still too slow. What is an efficient way to insert many rows from a CSV file into a database? Also, I have to edit some data (such as adding a 1 to a username if two are the same) before it goes into the database.
For a text file you can use the LOAD DATA INFILE command, which is designed to do exactly this. It handles CSV files by default, but has extensive options for handling other text formats, including re-ordering columns, ignoring input rows, and reformatting data as it loads.
So I ended up using fputcsv to write the data I had changed to a new CSV file, then used the LOAD DATA INFILE command to load that file into the table. This went from timing out at 120 seconds for 1,000 entries to taking about 10 seconds for 10,000 entries. Thank you to everyone who replied.
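A rough sketch of that pipeline (the table and column names are placeholders, and LOAD DATA LOCAL must be enabled on both client and server):
<?php
// 1) Write the edited rows out with fputcsv.
$tmpFile = '/tmp/import_clean.csv';
$fh = fopen($tmpFile, 'w');
foreach ($rows as $row) {
    // e.g. de-duplicate usernames here before writing
    fputcsv($fh, [$row['username'], $row['email']]);
}
fclose($fh);

// 2) Bulk-load the whole file in one statement.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,
]);
$pdo->exec("LOAD DATA LOCAL INFILE '$tmpFile'
            INTO TABLE users
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'
            (username, email)");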
I have this crazy idea: could you run multiple parallel scripts, each one taking care of a bunch of rows from your CSV?
Some thing like this:
<?php
// This tells Linux to run import.php in the background and
// immediately releases your caller script.
//
// Launch it several times with different row ranges to reduce the overall time.
$cmd = "nohup php import.php [start] [end] > /dev/null 2>&1 &";
exec($cmd);
Also, have you tried increasing that batch of 50 rows per bulk insert to 100 or 500, for example?

Difference in efficiency of retrieving all rows in one query, or each row individually?

I have a table in my database with about 200 rows of data that I need to retrieve. How significant, if at all, is the difference in efficiency between retrieving them all at once in one query and retrieving each row individually in separate queries?
The queries usually go over a socket, so executing 200 queries instead of 1 adds a lot of overhead; besides, the RDBMS is optimized to fetch many rows in a single query.
With 200 queries instead of 1, the RDBMS has to initialize data sets, parse the query, fetch one row, populate the data sets, and send the results 200 times instead of once.
It's a lot better to execute only one query.
I think the difference will be significant, because there will (I guess) be a lot of overhead in parsing and executing the query, packaging the data up to send back etc., which you are then doing for every row rather than once.
It is often useful to write a quick test which times various approaches, then you have meaningful statistics you can compare.
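A throwaway timing test along those lines might look like this (a sketch; the items table and the PDO connection are assumptions):
<?php
// Compare 200 single-row queries against one query that fetches all 200 rows.
$ids = range(1, 200);

$start = microtime(true);
$stmt = $pdo->prepare('SELECT * FROM items WHERE id = ?');
foreach ($ids as $id) {
    $stmt->execute([$id]);
    $stmt->fetch(PDO::FETCH_ASSOC);
}
$perRow = microtime(true) - $start;

$start = microtime(true);
$in = implode(',', array_fill(0, count($ids), '?'));
$all = $pdo->prepare("SELECT * FROM items WHERE id IN ($in)");
$all->execute($ids);
$all->fetchAll(PDO::FETCH_ASSOC);
$oneQuery = microtime(true) - $start;

printf("200 queries: %.4fs, 1 query: %.4fs\n", $perRow, $oneQuery);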
If you were talking about some constant number of queries k versus a greater constant number k + k1, you might find that more queries is better. I don't know for sure, but SQL has all sorts of unusual quirks, so it wouldn't surprise me if someone could come up with a scenario like this.
However if you're talking about some constant number of queries k versus some non-constant number of queries n you should always pick the constant number of queries option.
In general, you want to minimize the number of calls to the database. You can already assume that MySQL is optimized to retrieve rows, however you cannot be certain that your calls are optimized, if at all.
Extremely significant. Usually getting all the rows at once takes about as much time as getting one row. Say that time is 1 second (very high, but good for illustration): getting all the rows takes 1 second, while getting each row individually takes 200 seconds (1 second per row). A very dramatic difference. And this isn't counting wherever you're getting the list of 200 from in the first place.
All that said, you've only got 200 rows, so in practice it won't matter much.
But still, get them all at once.
Exactly as the others have said. Your RDBMS will not break a sweat throwing 200+ rows at you all at once. Getting all the rows in one associative array will also not make much difference to your script, since you no doubt already have a loop for grabbing each individual row.
All you need do is modify this loop to iterate through the array you are given [very minor tweak!]
The only time I have found it better to get fewer results from multiple queries, instead of one big set, is when there is a lot of processing to be done on the results. I was able to cut about 40,000 records out of the result set (plus the associated processing) by breaking the result set up. Anything you can build into the query that lets the DB do the processing and reduce the result-set size is a benefit, but if you truly need all the rows, just go get them.

PHP and MYSQL maximum variable length?

I'm trying to do an INSERT into a MySQL db, and it fails when any of the values is longer than 898 characters. Is there somewhere to get, or better, set this maximum value? I'll hack the string into chunks and store them in separate rows if I must, but I'd like to be able to insert up to 2k at a time.
I'm guessing this is a PHP issue, since using LONGTEXT or BLOB fields should give more than enough space in the db.
Thanks.
Side Note:
When you get into working with large blobs and text columns, you need to watch out for the MySQL max_allowed_packet variable. I believe it defaults to at least 1M.
I'm assuming this is a varchar column you're trying to insert into? If so, I assume the maximum length has been set to 898 or 900 or something like that.
In MySQL 5 the total row size can be up to 65,535 bytes, so a varchar can be defined to whatever size keeps the total row size under that.
If you need larger, use TEXT (up to 65,535 bytes) or LONGTEXT (up to 4 GB).
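A quick way to confirm whether a column cap is the cause, and to widen it if so (a sketch; my_table and my_column are placeholders):
<?php
// Inspect the column definitions: a varchar(900) here would be consistent
// with values getting cut off around 898 characters.
foreach ($pdo->query('SHOW COLUMNS FROM my_table') as $col) {
    echo $col['Field'] . ' => ' . $col['Type'] . "\n";
}

// If the column is a short VARCHAR, switch it to TEXT (or LONGTEXT for very large values):
$pdo->exec('ALTER TABLE my_table MODIFY my_column TEXT');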
