I'm trying to query a very large table, some 35+ million rows, and process each row one by one, because I can't pull the full table into PHP at once (out of memory). I'm using LIMIT in a loop, but every time the query reaches the 700K mark it throws an out-of-disk-space error (error 28):
select * from dbm_new order by id asc limit 700000,10000
I'm pulling 10K rows at a time into PHP, and even if I make it pull 100K rows it still throws the same error when trying to start at row 700K. I can see it eating a huge amount of disk space.
In PHP I'm freeing the result set after each loop:
mysql_free_result ($res);
But it's not a PHP-related issue; I've run the query in MySQL directly and it gives the same error.
Why does starting the LIMIT at the 700K mark eat up so much disk space? I'm talking over 47 GB here; surely it doesn't need that much space. What other options do I have?
Here's the code:
$start = 0;
$increment = 10000;
$hasResults = true;
while ($hasResults) {
    $sql = "select * from dbm_new order by id asc limit $start,$increment ";
    ....
}
You can use the PK instead of OFFSET to get chunks of data:
$start = 0;
while (1) {
    $sql = "SELECT * FROM table WHERE id > $start ORDER BY id ASC LIMIT 10000";
    // get records...
    if (empty($rows)) break;
    foreach ($rows as $row) {
        // do stuff...
        $start = $row['id'];
    }
}
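To make the idea concrete, here is a minimal sketch of the same keyset loop in Python, with an in-memory list standing in for the table and a hypothetical `fetch_chunk()` helper standing in for the query:

```python
# In-memory stand-in for the table: rows keyed by an auto-increment primary key.
rows = [{"id": i, "value": f"row-{i}"} for i in range(1, 101)]

def fetch_chunk(last_id, limit):
    """Stand-in for: SELECT * FROM table WHERE id > last_id ORDER BY id LIMIT limit."""
    return [r for r in rows if r["id"] > last_id][:limit]

processed = []
last_id = 0
while True:
    chunk = fetch_chunk(last_id, 10)
    if not chunk:
        break
    for row in chunk:
        processed.append(row["id"])  # do stuff...
        last_id = row["id"]          # remember the last key seen
```

Because each chunk starts from the last key seen rather than an offset, the server never has to re-scan the rows already processed.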
My problem is explained below.
This is the PHP code currently running on my server:
$limit = 10000;
$annee = '2017';
//Counting the lines I need to delete
$sql = " SELECT COUNT(*) FROM historisation.cdr_".$annee." a
INNER JOIN transatel.cdr_transatel_v2 b ON a.id_cdr = b.id_cdr ";
$t = $db_transatel->selectAll($sql);
//The number of lines I have to delete
$i = $t[0][0];
do {
    if ($i < $limit) {
        $limit = $i;
    }

    // The problem is coming from this DELETE
    $selectFromHistoryAndDelete = " DELETE FROM transatel.cdr_transatel_v2
        WHERE id_cdr IN (
            SELECT a.id_cdr FROM historisation.cdr_".$annee." a
            INNER JOIN (SELECT id_cdr FROM historisation.cdr_transatel_v2) b ON a.id_cdr = b.id_cdr
        )
        LIMIT " . $limit;

    $delete = $db_transatel->exec($selectFromHistoryAndDelete, $params);

    $i = $i - $limit;
} while ($i > 0);
The execution of the query.
As you can see on the picture, in the first 195 loops the execution time was between 13 and 17 seconds.
It increased to 73 seconds on the 195th loop and to 1305 seconds on the 196th loop.
Now the query is running for 2000 seconds.
The query is deleting rows in a test table that no one else is using right now.
I'm deleting rows 10,000 at a time so each query stays quick and doesn't overload the server.
I am wondering why the execution time increases like that. I thought it would get quicker toward the end, because the inner join should be faster as there are fewer rows left in the table.
Does anyone have an idea?
Edit: The table engine is MyISAM.
Based on your latest comment, the inner join is redundant, since you're deleting from the table that contains the values you're joining on. In essence you're processing b.id_cdr = a.id_cdr twice: the number of values compared on cdr_2017 is not changed by the inner join, just the number of values queried for deletion.
As for the cause of the incremental slowness, it is because you are manually performing the same function as SELECT cdr_id FROM cdr_2017 LIMIT 10000 OFFSET x.
That is to say, your query has to perform a full-table scan on cdr_2017 to determine the id values to delete. As you delete the values, the SQL optimizer has to move further through the cdr_2017 table to retrieve the values.
Resulting in
DELETE FROM IN(1,2,3,...10000)
DELETE FROM IN(1,2,3,...20000)
...
DELETE FROM IN(1,2,3,...1000000)
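The growing cost can be modeled with a back-of-the-envelope sketch (Python, toy numbers): each batch must scan past everything already deleted before it reaches fresh rows, so total scan work grows quadratically with the number of batches:

```python
batch = 10_000
total = 200 * batch  # pretend 2,000,000 rows must be deleted

rows_scanned = 0
deleted = 0
while deleted < total:
    # each batch re-scans the rows already deleted (the growing "offset")
    # before it finds the next `batch` rows to remove
    rows_scanned += deleted + batch
    deleted += batch

print(rows_scanned)  # far more scan work than the 2,000,000 rows actually deleted
```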
Assuming cdr_id is the incremental primary key, to resolve the issue you could use the last index retrieved from cdr_2017 to filter the selected values.
This will be much faster, as a full-table scan is no longer required to validate the joined records, since you're now utilizing an indexed value on both sides of the query.
$sql = " SELECT COUNT(a.id_cdr) FROM historisation.cdr_".$annee." a
    INNER JOIN transatel.cdr_transatel_v2 b ON a.id_cdr = b.id_cdr ";
$t = $db_transatel->selectAll($sql);

// the number of lines to delete
$i = $t[0][0];

// set starting index
$previous = 0;
do {
    if ($i < $limit) {
        $limit = $i;
    }

    $selectFromHistoryAndDelete = 'DELETE d
        FROM transatel.cdr_transatel_v2 AS d
        JOIN (
            SELECT @previous := cdr_id AS cdr_id
            FROM historisation.cdr_2017
            WHERE cdr_id > ' . $previous . '
            ORDER BY cdr_id
            LIMIT 10000
        ) AS a
        ON a.cdr_id = d.cdr_id';
    $db_transatel->exec($selectFromHistoryAndDelete, $params);

    // retrieve the last id selected from cdr_2017, to use in the next iteration
    $v = $db_transatel->selectAll('SELECT @previous'); // prefer fetchColumn
    $previous = $v[0][0];

    $i = $i - $limit;
} while ($i > 0);
//optionally reclaim table-space
$db_transatel->exec('OPTIMIZE TABLE transatel.cdr_transatel_v2', $params);
You could also refactor to use cdr_id > $previous AND cdr_id < $last to remove the ORDER BY and LIMIT clauses, which should also improve performance.
Though I would like to note that the MyISAM engine takes a table lock on cdr_transatel_v2 for the duration of this operation. Due to the way MySQL handles concurrent sessions and queries, there is not much to gain from a batch delete in this manner; it really only pays off with InnoDB and transactions, especially when using PHP with FastCGI as opposed to Apache mod_php. Other queries not touching cdr_transatel_v2 will still be executed, but write operations on cdr_transatel_v2 will be queued. If using mod_php, I would reduce the limit to 1,000 records to shorten queue times.
For more information see https://dev.mysql.com/doc/refman/5.7/en/internal-locking.html#internal-table-level-locking
Alternative approach.
Considering the large number of records to delete, when the rows being deleted outnumber those being kept, it is more beneficial to invert the operation: INSERT the rows you want to keep into a new table instead of DELETEing from the old one.
#ensure the storage table doesn't exist already
DROP TABLE IF EXISTS cdr_transatel_temp;
#duplicate the structure of the original table
CREATE TABLE transatel.cdr_transatel_temp
LIKE transatel.cdr_transatel_v2;
#copy the records that are not to be deleted from the original table
INSERT INTO transatel.cdr_transatel_temp
SELECT d.*
FROM transatel.cdr_transatel_v2 AS d
LEFT JOIN historisation.cdr_2017 AS b
ON b.cdr_id = d.cdr_id
WHERE b.cdr_id IS NULL;
#replace the original table with the storage table
RENAME TABLE transatel.cdr_transatel_v2 TO transatel.backup,
    transatel.cdr_transatel_temp TO transatel.cdr_transatel_v2;
#remove the original table
DROP TABLE transatel.backup;
I have a problem: my project has 10 databases, and each database has a members table. Each members table has 2 million rows, so across the 10 databases that's ~20 million rows. I tried this:
foreach ($aDataBases as $database) {
    $sSql = sprintf('SELECT nom, prenom, naiss FROM `%s`', $sTableName);
    $rResult = Mysqli::query($sSql, $database);
    while ($aRecord = $rResult->fetch_array(MYSQLI_ASSOC)) {
        $aUsers['lastName']  = $aRecord['nom'];
        $aUsers['firstName'] = $aRecord['prenom'];
        $aUsers['birthDate'] = $aRecord['naiss'];
        $aTotalUsers[] = $aUsers;
    }
}
When I run it I get the error Allowed memory size of 134217728 bytes exhausted. If, for example, I put LIMIT 100 in the SELECT, it works perfectly. Can you help me please?
Just put your code in a loop and make SQL calls of, say, 1000 entries each, looping until all rows have been printed. Some people will tell you just to raise your memory limit, but there's always a physical limit you can't get past.
I won't code that for you because you're a PHP programmer and you get the idea. Here's the pseudocode, though:
base = 0
while (rows = getrows(base, 1000))
    foreach row in rows
        print row
    base = base + 1000
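For illustration, here is a runnable version of that pseudocode (Python, with an in-memory list standing in for the table and a hypothetical `getrows()` helper standing in for the SQL call):

```python
data = [f"row-{i}" for i in range(2500)]  # stand-in for the table

def getrows(base, count):
    """Stand-in for: SELECT ... LIMIT base, count."""
    return data[base:base + count]

base = 0
printed = 0
while True:
    rows = getrows(base, 1000)
    if not rows:
        break
    for row in rows:
        printed += 1  # print(row) in the real loop
    base += 1000
```

Only 1000 rows are ever held in memory at once, regardless of how many rows the table has.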
When I run this query in phpMyAdmin:
SELECT *
FROM `product_stock`
WHERE `product_warehouse_id` =5
LIMIT 100000
it loads the data in sections of 50 until it stops. But when I run it in a normal PHP page it takes a while to load, and sometimes I get a server error. How can I approach something similar to the way phpMyAdmin loads the results?
http://www.tutorialspoint.com/php/mysql_paging_php.htm
This is a tutorial on how to add paging to long MySQL queries.
The MySQL LIMIT takes a couple of possible arguments. A single value specifies the maximum number of rows to return, counted from the beginning of the result set. If you pass two values:
LIMIT 0, 50
then you're passing the start row and the page size. The 100000 in your example is really just a short form for:
LIMIT 0, 100000
See the MySQL help documents on SELECT for some more detail.
I recommend that if you're not going to use the data from every column, you don't use:
SELECT * FROM tablename WHERE 1 AND fieldkey = 'value'
If the data size is very large, this causes performance problems in MySQL.
Use:
SELECT field1, field2, field3 FROM tablename WHERE 1 AND fieldkey = 'value'
In the SELECT statement, list only the comma-separated names of the columns you need; this helps the server respond faster without problems, and you can paginate through the results easily.
Also verify that the data in "fieldkey" is indexed; this helps the query run faster.
You can paginate the results of your query with something like this:
php- Paginate data from array
This is what I wanted. I had to work out the logic, but in the end it works; sample code below:
$start = 0;
$limit = 50;
while ($limit <= 1000) {
    $q1 = "SELECT *
           FROM product_stock
           WHERE product_warehouse_id = 5
           LIMIT $start, $limit";
    $r1 = mysql_query($q1) or die(mysql_error());
    while ($stock_info = mysql_fetch_assoc($r1)) {
        echo $stock_info['product_stock_id']."<br />";
    }
    $start += $limit;
    $limit += 50;
}
I have been running a foreach loop 1000 times on a PHP page. The code inside the loop looks like this:
$first = mysql_query("SELECT givenname FROM first_names order by rand() LIMIT 1");
$first_n = mysql_fetch_array($first);
$first_name = $first_n['givenname'];
$last = mysql_query("SELECT surname FROM last_name order by rand() LIMIT 1");
$last_n = mysql_fetch_array($last);
$last_name = $last_n['surname'];
$first_lastname = $first_name . " " . $last_name;
$add = mysql_query("SELECT streetaddress FROM user_addresss order by rand() LIMIT 1");
$addr = mysql_fetch_array($add);
$address = $addr['streetaddress'];
$unlisted = "unlisted";
$available = "available";
$arr = array(
$first_lastname,
$address,
$unlisted,
$available
);
Then I have been using array_rand function to get a randomized value each time the loop runs:
<td><?php echo $arr[array_rand($arr)] ?></td>
Loading the PHP page is taking a really long time. Is there a way I could optimize this code? I need a unique value each time the loop runs.
The problem is not your PHP foreach loop. If you order your MySQL table by RAND(), you are making a serious mistake. Let me explain to you what happens when you do this.
Every time you make a MySQL request, MySQL will attempt to map your search parameters (WHERE, ORDER BY) to indices to cut down on the data read. It will then load the relevant info in memory for processing. If the info is too large, it will default to writing it to disk and reading from disk to perform the comparison. You want to avoid disk reads at all costs as they are inefficient, slow, repetitive and can sometimes be flat-out wrong under specific circumstances.
When MySQL finds an index that is possible to be used, it will load the index table instead. An index table is a hash table between memory location and the value of the index. So, for instance, the index table for a primary key looks like this:
id location
1 0 bytes in
2 17 bytes in
3 34 bytes in
This is extremely efficient as even very large index tables can fit in tiny amounts of memory.
Why am I talking about indices? Because by using RAND(), you are preventing MySQL from using them. ORDER BY RAND() forces MySQL to create a new random value for each row. This requires MySQL to copy all your table data in what is called a temporary table, and to add a new field with the RAND() value. This table will be too big to store in memory, so it will be stored to disk.
When you tell MySQL to ORDER BY RAND(), and the table is created, MySQL will then compare every single row by pairs (MySQL sorting uses quicksort). Since the rows are too big, you're looking at quite a few disk reads for this operation. When it is done, it returns, and you get your data, at a huge cost.
There are plenty of ways to prevent this massive overhead SNAFU. One of them is to pick a random value between 1 and the maximum ID, then select the first row with an ID at or above it, with LIMIT 1. This does not require the creation of an extra field. There are plenty of similar Stack Overflow questions.
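The random-ID technique can be sketched as follows (Python, with an in-memory dict standing in for a hypothetical first_names table; it assumes an auto-increment ID, and uses >= so gaps left by deleted rows don't break it):

```python
import random

# in-memory stand-in for the table: id -> row
table = {i: f"name-{i}" for i in range(1, 1001)}
max_id = max(table)  # SELECT MAX(id) FROM first_names

def random_row():
    """Stand-in for: SELECT * FROM first_names WHERE id >= $rand LIMIT 1."""
    target = random.randint(1, max_id)             # FLOOR(1 + RAND() * max_id)
    chosen = min(i for i in table if i >= target)  # first id at or above target
    return table[chosen]

picks = [random_row() for _ in range(100)]
```

Note that rows following a gap are picked slightly more often; for many use cases that bias is acceptable in exchange for skipping the full-table sort.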
It has already been explained why ORDER BY RAND() should be avoided, so I simply provide a way to do it with some faster queries.
First get a random number based on your table size:
SELECT FLOOR(RAND()*COUNT(*)) FROM first_names
Second, use the random number in a LIMIT clause:
SELECT * FROM first_names LIMIT $pos,1
Unfortunately I don't think there is any way to combine the two queries into one.
Also, you can run SELECT COUNT(*) FROM first_names once, store the number, and generate random $pos values in PHP as many times as you like.
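A sketch of that count-once, random-offset approach (Python, in-memory stand-ins; `random_name()` is a hypothetical helper playing the role of the LIMIT $pos,1 query):

```python
import random

names = ["Alice", "Bob", "Carol", "Dave", "Erin"]  # stand-in for first_names

# SELECT COUNT(*) FROM first_names -- run once and cache the result
total = len(names)

def random_name():
    """Stand-in for: SELECT givenname FROM first_names LIMIT $pos, 1."""
    pos = random.randrange(total)  # FLOOR(RAND() * COUNT(*)) in SQL
    return names[pos]

picks = [random_name() for _ in range(1000)]
```

The count query runs once; each subsequent pick is a cheap single-row read instead of a full-table sort.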
You should switch to either mysqli or PDO if your host supports it, but something like this should work. You will have to decide what to do if there aren't enough records in one of the tables (array_pad, or wrap the indexes and restart).
function getRandomNames($qty) {
    $qty = (int)$qty;
    $fnames = array();
    $lnames = array();
    $address = array();

    $sel = mysql_query("SELECT givenname FROM first_names ORDER BY rand() LIMIT ".$qty);
    while ($rec = mysql_fetch_array($sel)) { $fnames[] = $rec[0]; }

    $sel = mysql_query("SELECT surname FROM last_name ORDER BY rand() LIMIT ".$qty);
    while ($rec = mysql_fetch_array($sel)) { $lnames[] = $rec[0]; }

    $sel = mysql_query("SELECT streetaddress FROM user_addresss ORDER BY rand() LIMIT ".$qty);
    while ($rec = mysql_fetch_array($sel)) { $address[] = $rec[0]; }

    // stitch the results together
    $results = array();
    for ($x = 0; $x < $qty; $x++) {
        $results[] = array("given_name" => $fnames[$x], "surname" => $lnames[$x], "streetaddress" => $address[$x]);
    }
    return $results;
}
Hope this helps
UPDATE
Based on Sébastien Renauld's answer, a more complete solution may be to structure the queries more like:
"SELECT givenname from first_names where id in (select id from first_names order by rand() limit ".$qty.")";
I have a table with roughly 1 million rows. I'm writing a simple program that prints out one field from each row. However, when I started using mysql_pconnect and mysql_query, the query took a long time; I assume the query needs to finish before I can print even the first row. Is there a way to process the data a bit at a time?
--Edited--
I am not looking to retrieve a small subset of the data; I'm looking for a way to process it a chunk at a time (fetch 10 rows, print 10 rows, fetch 10 rows, print 10 rows, and so on) rather than waiting for the query to retrieve all 1 million rows, which takes who knows how long, before the printing starts.
Printing one million fields will take some time. Retrieving one million records will take some time. Time adds up.
Have you profiled your code? I'm not sure using limit would make such a drastic difference in this case.
Doing something like this
while ($row = mysql_fetch_object($res)) {
echo $row->field."\n";
}
outputs one record at a time. It does not wait for the whole resultset to be returned.
If you are dealing with a browser you will need something more.
Such as this
ob_start();
$i = 0;
while ($row = mysql_fetch_object($res)) {
echo $row->field."\n";
if (($i++ % 1000) == 0) {
ob_flush();
}
}
ob_end_flush();
Do you really want to print one million fields?
The customary solution is to use some kind of output pagination in your web application, showing only part of the result. On SELECT queries you can use the LIMIT keyword to return only part of the data. This is basic SQL stuff, really. Example:
SELECT * FROM table WHERE (some conditions) LIMIT 40,20
shows 20 entries, starting from the 40th (off by one mistakes on my part may be possible).
It may be necessary to use ORDER BY along with LIMIT to prevent the ordering from randomly changing under your feet between requests.
This is commonly needed for pagination. You can use the limit keyword in your select query. Search for limit here:
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants (except when using prepared statements).
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.
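Building on the two-argument form, the offset arithmetic for pagination is just page number times page size; a small sketch (Python, hypothetical helper name):

```python
def limit_clause(page, page_size):
    """Build a two-argument LIMIT clause for a zero-based page number."""
    offset = page * page_size
    return f"LIMIT {offset}, {page_size}"

print(limit_clause(0, 20))  # LIMIT 0, 20  -> rows 1-20
print(limit_clause(2, 20))  # LIMIT 40, 20 -> rows 41-60
```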
You might be able to use
Mysqli::use_result
combined with flush() to output the data set to the browser. I know flush() can be used to send data to the browser incrementally, as I have used it before to do just that; however, I am not sure whether mysqli::use_result is the correct function for retrieving a result set incrementally.
This is how I do something like that in Oracle. I'm not sure how it would cross over:
declare
my_counter integer := 0;
begin
for cur in (
select id from table
) loop
begin
-- do whatever your trying to do
update table set name = 'steve' where id = cur.id;
my_counter := my_counter + 1;
if my_counter > 500 then
my_counter := 0;
commit;
end if;
end;
end loop;
commit;
end;
An example using the basic mysql driver.
define( 'CHUNK_SIZE', 500 );
$result = mysql_query( 'select count(*) as num from `table`' );
$row = mysql_fetch_assoc( $result );
$totalRecords = (int)$row['num'];
$offsets = ceil( $totalRecords / CHUNK_SIZE );
for ( $i = 0; $i < $offsets; $i++ )
{
$result = mysql_query( "select * from `table` limit " . CHUNK_SIZE . " offset " . ( $i * CHUNK_SIZE ) );
while ( $row = mysql_fetch_assoc( $result ) )
{
// your per-row operations here
}
unset( $result, $row );
}
This will iterate over your entire row volume, but do so only 500 rows at a time to keep memory usage down.
It sounds like you're hitting the limits of various buffer sizes within the MySQL server. Some things to try: specify only the fields you need in the SQL statement to reduce the buffer size, or play around with the various admin settings.
OR, you can use a pagination-like method, but have it output everything on one page:
(pseudocode)
function q($part) {
    $off = $part * SIZE_OF_PARTITIONS;
    $size = SIZE_OF_PARTITIONS;
    return execute_and_return_sql("SELECT `field` FROM `table` LIMIT $off, $size");
}

$ii = 0;
while ($elements = q($ii)) {
    print_fields($elements);
    $ii++;
}
Use mysql_unbuffered_query() or if using PDO make sure PDO::MYSQL_ATTR_USE_BUFFERED_QUERY is false.
Also see this similar question.
Edit: and as others have said, you may wish to combine this with flushing your output buffer after each batch of processing, depending on your circumstances.