I have a list of data with 999,000 records.
I have a SELECT query and a while loop to fetch the data, and I use array_push to add each retrieved row to an array.
Then I want to process 1,000 values from this array per loop iteration.
My problem is that when I use array_push with this much data I get the error:
Fatal Error: Allowed Memory Size of 134217728 Bytes
How can I optimize my code to resolve my problem?
My code is below:
$sql = "select customer_id";
$sql .= " from";
$sql .= " t_customer t1";
$sql .= " inner join t_mail_address t2 using(mid, customer_id)";
$result = $conn->query($sql);
$customerArray = array();
while ($row = $result->fetch(PDO::FETCH_ASSOC)) {
    array_push($customerArray, $row);
}
// Execute every 1000 records
foreach (array_chunk($customerArray, 1000) as $execCustomerArray) {
    // My code to execute for each chunk of records.
    // ....
}
I'm unsure if it will fix anything, but one thing I will say is that pushing every record into an array yourself is unnecessary.
You're using fetch to retrieve the rows one by one and then adding them all to an array, so why on earth aren't you just using PDOStatement::fetchAll()?
Example:
$sql = "select customer_id";
$sql .= " from";
$sql .= " t_customer t1";
$sql .= " inner join t_mail_address t2 using(mid, customer_id)";
$result = $conn->query($sql);
$customerArray = $result->fetchAll(PDO::FETCH_ASSOC);
// Execute every 1000 records
foreach (array_chunk($customerArray, 1000) as $execCustomerArray) {
    // My code to execute for each chunk of records.
    // ....
}
This may not fix your memory issue, because we can't see what the heavy lifting is for each customer record, but I will say that the while loop you had, while redundant, was most likely not the cause of your memory problem.
Depending on whether this is a CLI script or a web page, you could also use an incremental loop together with the MySQL LIMIT clause to implement basic paging for your data, thus preventing it all from coming into memory at once; a rough sketch of that idea follows.
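For example, something along these lines, reusing the PDO connection $conn and the query from the question; the chunk size of 1000 and the ORDER BY are my additions (ordering keeps the paging stable):
$chunkSize = 1000;
$offset = 0;
while (true) {
    $sql = "SELECT customer_id
            FROM t_customer t1
            INNER JOIN t_mail_address t2 USING (mid, customer_id)
            ORDER BY customer_id
            LIMIT $offset, $chunkSize";
    $execCustomerArray = $conn->query($sql)->fetchAll(PDO::FETCH_ASSOC);
    if (empty($execCustomerArray)) {
        break; // no more customers to fetch
    }
    // My code to execute for each chunk of records.
    // ....
    $offset += $chunkSize;
}
That way at most 1000 rows sit in PHP memory at any one time.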
Related
I have put together a PHP script to count the words in a MySQL text field and update another field accordingly.
It works well with relatively small tables, but when I tried it with a really big table (10M records), I of course got "PHP Fatal error: Allowed memory size of 134217728 bytes exhausted".
Could somebody hint at how to modify the script below to process the data row by row?
<?php
$con1 = mysqli_connect('localhost','USERNAME','PASSWORD','DATABASE');
if (!$con1) {
    die('Could not connect: ' . mysqli_connect_error());
}
$sql = "SELECT id FROM TableName";
$result = mysqli_query($con1, $sql);
while ($row = mysqli_fetch_assoc($result)) {
    $to = $row["id"];
    $sql1 = "SELECT textfield FROM TableName WHERE id = '$to' ";
    $result1 = mysqli_query($con1, $sql1);
    $row1 = mysqli_fetch_assoc($result1);
    $words = str_word_count($row1['textfield'], 0);
    $sql2 = "UPDATE TableName SET wordcount = '$words' WHERE id = '$to'";
    $result2 = mysqli_query($con1, $sql2);
}
mysqli_close($con1);
?>
MySQL queries have the clause limit o, n, so you can run
SELECT id FROM TableName limit 0, 10
for example to get only 10 elements from the start. The first number is the offset (the index you start your work from) and the second is the number of elements you expect to get. These are the ideas you need to know in order to succeed:
you will need to write a loop
in the loop you will always get n elements (n could be 1 as you wanted, or more)
in each step you increment o by n, so the new offset will be starting where the results ended previously
you can ensure an order, like order by id, for example
you can wrap the loop we are speaking about here around most of your code, as in the sketch below
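A minimal sketch of that loop applied to the word-count script, reusing $con1 and the table/column names from the question; the batch size of 1000 is arbitrary, and this version also folds the per-row SELECT into the batch query:
$batchSize = 1000;
$offset = 0;
while (true) {
    $sql = "SELECT id, textfield FROM TableName ORDER BY id LIMIT $offset, $batchSize";
    $result = mysqli_query($con1, $sql);
    if (mysqli_num_rows($result) == 0) {
        break; // no more rows to process
    }
    while ($row = mysqli_fetch_assoc($result)) {
        $words = str_word_count($row['textfield'], 0);
        mysqli_query($con1, "UPDATE TableName SET wordcount = '$words' WHERE id = '" . $row['id'] . "'");
    }
    mysqli_free_result($result);
    $offset += $batchSize;
}
Only $batchSize rows are held in memory at a time, so the script no longer needs to keep all 10M ids at once.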
$unique = array();
$sql = "SELECT ID, TitleName, ArtistDisplayName, Mix FROM values_to_insert as A
WHERE A.ID = ";
//Get a single row from our data that needs to be inserted...
while ($result = $conn->query($sql . $count))
{
    //Get the $data of the single row query for inserting.
    $data = mysqli_fetch_row($result);
    $count++;
    //SQL to get a match of the single row of $data we just fetched...
    $get_match = "SELECT TitleName_ti, Artist_ti, RemixName_ti from titles as B
                  Where B.TitleName_ti = '$data[1]'
                  and B.Artist_ti = '$data[2]'
                  and B.RemixName_ti = '$data[3]'
                  LIMIT 1";
    //If this query returns no match, then push this data to our $unique value array.
    if (!$result = $conn->query($get_match))
    {
        //If this data has been pushed already (since our data includes repeats), then don't
        //put a repeat of the data into our unique array. Else, push the data.
        if (!in_array($unique, $data))
        {
            echo 'Pushed to array: ' . $data[0] . "---" . $data[1] . "</br>";
            array_push($unique, $data);
        }
        else
            echo 'Nothing pushed... </br>';
    }
}
This has taken 5+ minutes and nothing has even printed to the screen. I'm curious what is eating up so much time, and whether there is an alternative method or function for whatever is taking it all up. Some pointers in the right direction would be great.
This code basically gets the rows of table 'A' one at a time and checks if there is a match in table 'B'. If there is, then I don't want that $data; if there isn't, I then check whether the data itself is a repeat, because my table 'A' has some repeat values.
Table A has 60,000 rows
Table B has 200,000 rows
Queries within queries are rarely a good idea, and there appear to be multiple issues with your script. It might be easier to just do the whole lot in SQL and push each result row to the array. SQL can remove the duplicates:
<?php
$unique = array();
$sql = "SELECT DISTINCT A.ID,
A.TitleName,
A.ArtistDisplayName,
A.Mix
FROM values_to_insert as A
LEFT OUTER JOIN titles as B
ON B.TitleName_ti = A.ID
and B.Artist_ti = A.TitleName
and B.RemixName_ti = A.ArtistDisplayName
WHERE B.TitleName_ti IS NULL
ORDER BY a.ID";
if ($result = $conn->query($sql))
{
    //Push each row of the result onto the $unique array.
    while ($data = mysqli_fetch_row($result))
    {
        array_push($unique, $data);
    }
}
As to your original query:
You have a count (I presume it is initialised to 0, but if it is a character then that will do odd things) and get the record with that value as its ID. If the first ID were 1,000,000,000 then you would have run a billion queries before you ever found a record to process. You can just get all the rows in ID order anyway by removing the WHERE clause and ordering by ID.
You then get a single record from a second query where the details match, but only process the data if no record is found; you do not use any of the values that are returned. You can do the same thing with a LEFT OUTER JOIN to get the matches, and a WHERE clause that checks no match was found.
EDIT - as you have pointed out, the fields you appear to be using to match the records do not logically correspond. I have used them as you did, but I expect you really want to match B.TitleName_ti to A.TitleName, B.Artist_ti to A.ArtistDisplayName and B.RemixName_ti to A.Mix.
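If those are the intended pairings, the join would presumably become something like this (untested, since we can't see the real schema):
SELECT DISTINCT A.ID,
                A.TitleName,
                A.ArtistDisplayName,
                A.Mix
FROM values_to_insert as A
LEFT OUTER JOIN titles as B
    ON B.TitleName_ti = A.TitleName
    and B.Artist_ti = A.ArtistDisplayName
    and B.RemixName_ti = A.Mix
WHERE B.TitleName_ti IS NULL
ORDER BY A.ID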
I am writing an app that randomly assigns a number to each user and then puts it into a MySQL database. Many people use it at the same time, so I don't want parallel uses to overwrite each other.
My current code is the following:
$sql_get = "SELECT * FROM database";
$results = mysql_query($sql_get, $bd);
$list = array();
while ($row = mysql_fetch_array($results))
{
    if ($row['userId'] == "")
    {
        array_push($list, $row['number']);
    }
}
$rand_nums = array_rand($list , 1);
$sql_update = "UPDATE database SET userId='". $userId ."' WHERE number=". $rand_nums;
$results = mysql_query($sql_update, $bd);
So basically, it gets the empty rows, puts them into a list, chooses a random empty row number, and puts the data into that row. The current issue is that the fetch of empty rows and the update can happen at the same time for multiple users, so one user may overwrite data written by another at the same time.
How can I structure this code (transaction or otherwise) to ensure concurrent use has no bad effects?
Thank you
You could do it all in one query:
UPDATE database AS d
SET d.userId = $userId
WHERE d.userId = ''
ORDER BY RAND()
LIMIT 1
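A minimal sketch of running that from PHP, reusing the $bd connection and $userId from the question (the value should really be escaped as shown, or better, bound as a parameter via mysqli/PDO):
$sql_update = "UPDATE database AS d
               SET d.userId = '" . mysql_real_escape_string($userId, $bd) . "'
               WHERE d.userId = ''
               ORDER BY RAND()
               LIMIT 1";
$results = mysql_query($sql_update, $bd);
Since the free row is picked and claimed in a single statement, two concurrent requests should no longer be able to grab the same row and overwrite each other the way the separate SELECT-then-UPDATE could.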
I'm trying to query a very large table (some 35+ million rows) to process each row one by one, because I can't pull the full table into PHP at once (out of memory). I'm using LIMIT in a loop, but every time it tries to query at the 700K mark it throws an out-of-disk-space error (error 28):
select * from dbm_new order by id asc limit 700000,10000
I'm pulling 10K rows at a time into PHP, and even if I make it pull 100K rows it still throws the same error when trying to start at row 700K. I can see it's eating a huge amount of disk space.
In PHP I'm freeing the result set after each loop:
mysql_free_result ($res);
But it's not a PHP-related issue; I've run the query in MySQL alone and it gives the same error.
Why does starting the LIMIT at the 700K mark eat up so much disk space? I'm talking over 47 GB here; surely it doesn't need that much space. What other options do I have?
Here's the code:
$start = 0;
$increment = 10000;
$hasResults = true;
while ($hasResults) {
    $sql = "select * from dbm_new order by id asc limit $start,$increment ";
    ....
}
You can use the PK instead of OFFSET to get chunks of data:
$start = 0;
while (1) {
    $sql = "SELECT * FROM table WHERE id > $start ORDER BY id ASC LIMIT 10000";
    //get records...
    if (empty($rows)) break;
    foreach ($rows as $row) {
        //do stuff...
        $start = $row['id'];
    }
}
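Fleshed out against the question's table, using the same old mysql_* API the question uses (this is only a sketch; adapt it to mysqli or PDO as needed):
$start = 0;
$batchSize = 10000;
while (true) {
    $sql = "SELECT * FROM dbm_new WHERE id > $start ORDER BY id ASC LIMIT $batchSize";
    $res = mysql_query($sql);
    if (mysql_num_rows($res) == 0) {
        break; // no rows left
    }
    while ($row = mysql_fetch_assoc($res)) {
        // ... process the row ...
        $start = $row['id']; // remember the last id seen
    }
    mysql_free_result($res);
}
Because the WHERE clause seeks straight to the last id on the primary key, MySQL never has to build and throw away the first 700K rows the way a large OFFSET does, which is typically what fills up the temporary disk space.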
// make empty array
$sqlArray=array();
$jsonArray=array();
// START NEED FAST WORKING ALTERNATIVES -----------------------------------------------------
// first 20 vistors
$query = "SELECT user_id FROM vistors LIMIT 20";
$result = mysql_query ($query) or die ($query);
// make vistors user query array
while ($vstr_line = mysql_fetch_array($result)) {
    array_push($sqlArray, $vstr_line['user_id']);
}
// implode vistors user array
$sqlArray_impl = implode("', '", $sqlArray);
// END NEED FAST WORKING ALTERNATIVES -----------------------------------------------------
// Get vistors information
$query = "SELECT id, username, picture FROM users WHERE id IN ('$sqlArray_impl')";
$qry_result = mysql_query($query) or die($query);
while ($usr_line = mysql_fetch_array($qry_result)) {
    array_push($jsonArray, $usr_line['id'].' - '.$usr_line['username'].' - '.$usr_line['picture']);
}
print_r($sqlArray);
echo '<br><br>';
print_r($jsonArray);
These are my functions, see above.
I need a faster-working replacement for the code within the marked range above; for me, it runs too slowly.
Will the replacement query still return the array?
Thanks to all helpers!
Can you use a JOIN or a subselect to reduce the query count from 2 to 1? It might not give much of a boost, but it's worth a shot and a cleaner implementation.
Where is the bottleneck? Most likely the DB and not the PHP code.
Are the tables/columns properly indexed? Run EXPLAIN on both queries, as shown below.
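For example, with a couple of sample ids substituted into the second query (the index name here is made up, and if users.id is already the primary key you won't need the extra index):
-- See how MySQL executes each query
EXPLAIN SELECT user_id FROM vistors LIMIT 20;
EXPLAIN SELECT id, username, picture FROM users WHERE id IN ('1', '2');

-- Hypothetical index to speed up the IN (...) lookup
CREATE INDEX idx_users_id ON users (id);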
The easiest approach would be to include the first query as a subquery, eliminating one round trip to the DB and a lot of code (MySQL won't accept LIMIT directly inside IN, so it is wrapped in a derived table):
// Get visitor information
$query = "SELECT id, username, picture FROM users WHERE id IN (SELECT user_id FROM (SELECT user_id FROM vistors LIMIT 20) AS v)";
$qry_result = mysql_query($query) or die($query);
Unless there is more reason to have the first one separate, but that is not visible in your code example.
If you use PDO (recommended anyway...), you can return the result array all at once using fetchAll().
For your second query, you can use string concatenation in MySQL to directly return the result you want.
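A sketch combining both suggestions, assuming a PDO connection in $pdo (a hypothetical variable) and the columns from your code; CONCAT_WS builds the 'id - username - picture' strings directly in MySQL:
$sql = "SELECT CONCAT_WS(' - ', id, username, picture)
        FROM users
        WHERE id IN (SELECT user_id FROM (SELECT user_id FROM vistors LIMIT 20) AS v)";
$jsonArray = $pdo->query($sql)->fetchAll(PDO::FETCH_COLUMN);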