PHP/MySQL: Iterate over every record in a database

I am new to the whole php/mysql thing. I have a week's worth of server logs (about 300,000 items) and I need to do some analysis. I am planning on reading them all into a mysql db and then analysing them with php.
Reading a file in Java, I would do something like this:
Scanner s = new Scanner(myfile);
while (s.hasNextLine()) {
    String line = s.nextLine();
    // Do something with this record.
}
How do I iterate through all records in a mysql db using php? I think that something like this will take a stupid amount of memory.
$query = "SELECT * FROM mytable";
$result = mysql_query($query);
$rows = mysql_num_rows($result);
for($j = 0; $j < $rows; ++$j){
$curIndex = mysql_result($result,$j,"index");
$curURL = mysql_result($result,$j,"something");
~~ Do something with this record
}
So I have added a LIMIT to the SELECT statement and repeat until all records have been cycled through. Is there a more standard way to do this? Is there a built-in that will do this?
while ($startIndex < $numberOfRows) {
    $query = "SELECT * FROM mytable ORDER BY mytable.index LIMIT $startIndex,$endIndex";
    $result = mysql_query($query);
    $rows = mysql_num_rows($result);
    for ($j = 0; $j < $rows; ++$j) {
        $curIndex = mysql_result($result, $j, "index");
        $curURL = mysql_result($result, $j, "something");
        // Do something with this record.
    }
    $startIndex = $endIndex + 1;
    $endIndex = $endIndex + 10;
}

You don't want to do a SELECT * FROM MYTABLE if your table is large; you're going to have the whole thing in memory. A trade-off between memory overhead and database calls would be to batch requests. You can get the min and max IDs of the rows in your table:
SELECT MIN(ID) FROM MYTABLE;
SELECT MAX(ID) FROM MYTABLE;
Now loop from minId to maxId, incrementing by, say, 10,000 each time. In pseudo-code:
for (int i = minId; i <= maxId; i = i + 10000) {
    int x = i;
    int y = i + 10000;
    SELECT * FROM MYTABLE WHERE ID >= x AND ID < y;
}
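A minimal PHP sketch of that batching approach, assuming a PDO connection in $pdo and an integer primary-key column named id (connection, table and column names here are placeholders):
<?php
// Find the ID range once, then fetch in fixed-size windows.
$minId = (int) $pdo->query('SELECT MIN(id) FROM mytable')->fetchColumn();
$maxId = (int) $pdo->query('SELECT MAX(id) FROM mytable')->fetchColumn();
$batch = 10000;

$stmt = $pdo->prepare('SELECT * FROM mytable WHERE id >= :x AND id < :y');
for ($i = $minId; $i <= $maxId; $i += $batch) {
    $stmt->execute([':x' => $i, ':y' => $i + $batch]);
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        // Do something with this record.
    }
}
?>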

See here:
http://www.tizag.com/mysqlTutorial/
http://www.tizag.com/mysqlTutorial/mysqlfetcharray.php
<?php
// Make a MySQL Connection
$query = "SELECT * FROM example";
$result = mysql_query($query) or die(mysql_error());
while ($row = mysql_fetch_array($result)) {
    echo $row['name'] . " - " . $row['age'];
    echo "<br />";
}
?>
Depending on what you need to do with the resulting rows, you can use a different loop style, whether it's 'while', 'foreach' or 'for x to x'. Most of the time a simple 'while' loop works well and is efficient.

Use mysql_fetch_*
$result = mysql_query(...);
while ($row = mysql_fetch_assoc($result)) {
    $curIndex = $row['index'];
}
I think that retrieves results row by row rather than copying them all into a PHP array at once (note that mysql_query() itself still buffers the whole result set on the client side; mysql_unbuffered_query() is the truly streaming variant). I'm not sure what exactly mysql_result does.
Side note: since you're still new, I'd advise getting into good habits right away: skip the mysql_ functions immediately and go for PDO, or at least mysqli.
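For example, a minimal mysqli sketch of the same loop (credentials and table are hypothetical; MYSQLI_USE_RESULT asks for an unbuffered result, so rows are streamed rather than held in memory all at once):
<?php
$mysqli = new mysqli('localhost', 'user', 'password', 'mydb');
// MYSQLI_USE_RESULT = unbuffered: fetch all rows before running another query.
$result = $mysqli->query('SELECT * FROM mytable', MYSQLI_USE_RESULT);
while ($row = $result->fetch_assoc()) {
    $curIndex = $row['index'];
    // Do something with this record.
}
$result->free();
$mysqli->close();
?>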

In an ideal world, PHP would generate aggregate queries, send them to MySQL, and only get a small number of rows in return. For instance, if you're counting the number of log items of each severity between two dates:
SELECT COUNT(*), severity
FROM logs
WHERE date < ? AND date > ?
GROUP BY severity
Doing the work on the PHP side is quite unusual. If you find out that you have needs too complex for SQL queries to handle (which, given that you have control over your database structure, leaves you with a lot of freedom), a better option would be to move to a Map-Reduce database engine like CouchDB.
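A sketch of running that aggregate query from PHP with PDO, assuming a connection in $pdo and two DateTime bounds $from and $to (the cnt alias is added here for readability):
<?php
$stmt = $pdo->prepare(
    'SELECT COUNT(*) AS cnt, severity
     FROM logs
     WHERE date < ? AND date > ?
     GROUP BY severity'
);
// First placeholder is the upper bound, second the lower bound, as in the query above.
$stmt->execute([$to->format('Y-m-d H:i:s'), $from->format('Y-m-d H:i:s')]);
foreach ($stmt as $row) {
    echo $row['severity'] . ': ' . $row['cnt'] . "\n";
}
?>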

I strongly believe that batch processing with Doctrine, or any kind of iteration with MySQL (PDO or mysqli), is just an illusion.
@dimitri-k provided a nice explanation, especially about the unit of work. The problem is the misleading "$query->iterate()", which doesn't really iterate over the data source. It's just a \Traversable wrapper around an already fully fetched data source.
An example demonstrating that, even with the Doctrine abstraction layer removed completely from the picture, we still run into memory issues:
echo 'Starting with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";
$pdo = new \PDO("mysql:dbname=DBNAME;host=HOST", "USER", "PW");
$stmt = $pdo->prepare('SELECT * FROM my_big_table LIMIT 100000');
$stmt->execute();
while ($rawCampaign = $stmt->fetch()) {
    // echo $rawCampaign['id'] . "\n";
}
echo 'Ending with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";
Output:
Starting with memory usage: 6 MB
Ending with memory usage: 109.46875 MB
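For comparison, that blow-up comes from the MySQL client buffering the whole result set by default; a sketch with buffering turned off (same hypothetical DSN and credentials) should keep memory roughly flat:
echo 'Starting with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";

$pdo = new \PDO("mysql:dbname=DBNAME;host=HOST", "USER", "PW");
// Disable client-side buffering so rows are streamed from the server.
$pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->prepare('SELECT * FROM my_big_table LIMIT 100000');
$stmt->execute();
while ($rawCampaign = $stmt->fetch()) {
    // process one row at a time
}

echo 'Ending with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";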
Here, the disappointing getIterator() method:
namespace Doctrine\DBAL\Driver\Mysqli\MysqliStatement

/**
 * {@inheritdoc}
 */
public function getIterator()
{
    $data = $this->fetchAll();
    return new \ArrayIterator($data);
}
You can use my little library to actually stream heavy tables using PHP Doctrine, DQL or just pure SQL, however you find appropriate: https://github.com/EnchanterIO/remote-collection-stream

Related

Retrieve all rows from table in doctrine

I have a table with 100,000+ rows, and I want to select all of them in Doctrine and perform some action on each row. In Symfony2 with Doctrine I try to do it with this query:
$query = $this->getDefaultEntityManager()
    ->getRepository('AppBundle:Contractor')
    ->createQueryBuilder('c')
    ->getQuery()->iterate();
foreach ($query as $contractor) {
    // doing something
}
but then I get a memory leak, because I think it writes all the data into memory.
I have more experience with ADOdb; in that library when I do this:
$result = $ADOdbObject->Execute('SELECT * FROM contractors');
while ($arrRow = $result->fetchRow()) {
    // do some action
}
I do not get any memory leak.
So how do I select all the data from the table without a memory leak, using Doctrine in Symfony2?
Question EDIT
When I remove the foreach and just call iterate(), I still get a memory leak:
$query = $this->getDefaultEntityManager()
    ->getRepository('AppBundle:Contractor')
    ->createQueryBuilder('c')
    ->getQuery()->iterate();
The normal approach is to use iterate().
$q = $this->getDefaultEntityManager()->createQuery('select c from AppBundle:Contractor c');
$iterableResult = $q->iterate();
foreach ($iterableResult as $row) {
    // do something
}
However, as the Doctrine documentation says, this can still result in errors:
Results may be fully buffered by the database client/connection, allocating additional memory not visible to the PHP process. For large sets this may easily kill the process for no apparent reason.
The easiest approach to this would be to simply create smaller queries with offsets and limits.
//get the count of the whole query first
$qb = $this->getDefaultEntityManager()->createQueryBuilder();
$qb->select('COUNT(c)')->from('AppBundle:Contractor', 'c');
$count = $qb->getQuery()->getSingleScalarResult();

//lets say we go in steps of 1000 to have no memory leak
$limit = 1000;
$offset = 0;

//loop every 1000 > create a query > loop the result > repeat
while ($offset < $count) {
    $qb = $this->getDefaultEntityManager()->createQueryBuilder();
    $qb->select('c')
        ->from('AppBundle:Contractor', 'c')
        ->setMaxResults($limit)
        ->setFirstResult($offset);
    $result = $qb->getQuery()->getResult();
    foreach ($result as $contractor) {
        // do something
    }
    $offset += $limit;
}
With heavy datasets like this, the script will most likely exceed the maximum execution time, which is 30 seconds by default, so make sure to raise max_execution_time in your php.ini or call set_time_limit() manually. If you just want to update all rows with a known pattern, you should consider writing one big UPDATE query instead of looping and editing the result in PHP.
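For instance, a single DQL bulk update (the field name and value below are made up) touches every row without hydrating any entities:
$this->getDefaultEntityManager()
    ->createQuery('UPDATE AppBundle:Contractor c SET c.status = :status')
    ->setParameter('status', 'archived')
    ->execute();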
Try using this approach:
foreach ($query as $contractor) {
    // doing something
    $this->getDefaultEntityManager()->detach($contractor);
    $this->getDefaultEntityManager()->clear(); // detach everything tracked by the unit of work
    unset($contractor); // tell the gc the object is not in use anymore
}
Hope this helps.
If you really need to get all the records, I'd suggest you use the database_connection service directly. Look at its interface and choose a method which won't load all the data into memory (and won't map the records to your entity).
You could use something like this (assuming this code is in a controller):
$db = $this->get('database_connection');
$query = 'select * from <your_table>';
$sth = $db->prepare($query);
$sth->execute();
while ($row = $sth->fetch()) {
    // some stuff
}
It's probably not what you need if you want to work with objects after handling the whole collection, but maybe you don't need the objects. Anyway, think about it.

PHP Use for loop to iterate over MySQLi recordset

Whenever I work with PHP MySQLi recordsets, I have always used the standard while loop to iterate over the returned data. Recently, however, I started wondering whether there is a way to use a for loop instead. This would be handy in situations where you want to limit the number of results returned.
Here is an example of using the while loop:
//Prepare a query that will produce a reverse-order recordset
$sql = "SELECT * FROM tblNames ORDER BY numberID DESC";
$recordset = $conn->query($sql);

//Count the number of contacts added to the list
$contactCount = 0;
$contactList = "";
while ($row = $recordset->fetch_assoc())
{
    //If the list has reached its maximum number (5), end the display loop
    if ($contactCount >= 5)
    {
        break;
    }
    $contactList .= $row["name"] . "<br>";
    //Increment the number of contacts added to the list
    $contactCount++;
}

//Use '$contactList' somewhere....
echo($contactList);
While this definitely works, there must be a better way to end the loop after a specified number of iterations. Is it easier to use a for loop in a situation like this? If so, how?
You can use LIMIT in the query. For example:
SELECT * FROM tblNames ORDER BY numberID DESC LIMIT 15
This way you don't have to worry about what happens if your query returns fewer than 15 results.
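Combined with the question's while loop, that would look roughly like this (a sketch; $conn is the mysqli connection from the question, and the limit of 5 matches the original cap):
//Let the database do the limiting, then just collect whatever comes back
$sql = "SELECT * FROM tblNames ORDER BY numberID DESC LIMIT 5";
$recordset = $conn->query($sql);

$contactList = "";
while ($row = $recordset->fetch_assoc())
{
    $contactList .= $row["name"] . "<br>";
}
echo($contactList);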
As I was writing this question, I decided to try one last time, but in a different way than before. I had been stuck on finding an efficient, safe way to tell when the recordset was exhausted (I had been running into issues when the custom maximum was greater than the number of records, and when there were no records at all).
//Execute the SQL query (reverse order), and store the results in a recordset
$sql = "SELECT * FROM tblNames ORDER BY numberID DESC";
$recordset = $conn->query($sql);

//Use a 'for' loop to iterate over the recordset
$contactList = "";
for ($i = 0; $i < 15; $i++)
{
    //If there is another row in the recordset, add the column value to the list
    if ($row = $recordset->fetch_assoc())
    {
        $contactList .= $row["name"] . "<br>";
    }
    else
    {
        //Break from the loop when there are no more records (used if the
        // given maximum number was actually greater than the number of records)
        break;
    }
}
echo($contactList);
As far as I can tell, this is a much better way to loop through a set/custom number of records, and then stop. It also will safely catch the end of the recordset (assuming it is reached before the cutoff number), and end the loop.
Edit
As pointed out in the answer by HenryTK above, if you have control over the query, the best way is to use the SQL LIMIT clause. However, if you merely have access to the recordset, I still think the for loop will save time. (Although I'm not sure when that situation would arise.)

Getting faster results from 2 databases to form 1 resultset

So here is my scenario...
The bug_tracker table is on one server and task_tracker is on another.
I want to show a combined result but can't, since they are in two separate remote databases.
So I am calling the task tracker first and then getting the bug details per iteration.
$task = oci_parse($task_conn, "select * from task_table where ....");
oci_execute($task);
while ($task_row = oci_fetch_array($task, OCI_ASSOC+OCI_RETURN_NULLS)) {
    $bug = oci_parse($bug_conn, "select * from bug_table where id = " . $task_row['BUGID']);
    oci_execute($bug);
    while ($bug_row = oci_fetch_array($bug, OCI_ASSOC+OCI_RETURN_NULLS)) {
        // ... output
    }
    // ... output
}
But this entire process is making it very slow... since there are a large number of records and columns.
Is there any way to make it even slightly faster? Note: I don't have access, so I can't set up Oracle DB links.
You could improve it using the IN statement:
<?php
$task = oci_parse($task_conn, "select * from task_table where ....");
oci_execute($task);
while ($task_row = oci_fetch_array($task, OCI_ASSOC+OCI_RETURN_NULLS)) {
    $bugs[] = $task_row['BUGID'];
    $users[] = $task_row['USER'];
    $status[] = $task_row['TASK_STATUS'];
}

$bug = oci_parse($bug_conn, "select * from bug_table where id IN (" . implode(',', $bugs) . ")");
oci_execute($bug);
while ($bug_row = oci_fetch_array($bug, OCI_ASSOC+OCI_RETURN_NULLS)) {
    // ...
}
?>
On a side note, why are you not using PDO? I believe using it would already give you a performance boost.
PHP is not meant for this kind of operation, nor should you try to write your own join function.
One proper way of solving this issue is to dump the data from both databases into a local database, and there do the join.
You do not need anything fancy for the local database, an SQLite3 is probably enough.
Just dump the data from each database into CSV files using a bash script that you put into cron. After the dump, (re)create each table in your SQLite3 database and load the CSVs into these tables. After this you can do the join once and push the result into a new table, which you are then free to query.
This is what the data warehouse world often refers to as an ETL process, just, in this case, very much simplified.
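A rough sketch of the load-and-join step using PHP's SQLite3 extension, assuming the two dumps have already been written to tasks.csv and bugs.csv (file names and column lists are made up):
<?php
// Load two CSV dumps into a local SQLite3 file and join them there.
$db = new SQLite3('etl.sqlite');
$db->exec('DROP TABLE IF EXISTS tasks');
$db->exec('DROP TABLE IF EXISTS bugs');
$db->exec('CREATE TABLE tasks (bugid INTEGER, user TEXT, task_status TEXT)');
$db->exec('CREATE TABLE bugs  (id INTEGER, title TEXT, bug_status TEXT)');

function loadCsv(SQLite3 $db, string $file, string $table, int $cols): void
{
    $stmt = $db->prepare(
        "INSERT INTO $table VALUES (" . rtrim(str_repeat('?,', $cols), ',') . ")"
    );
    $fh = fopen($file, 'r');
    while (($row = fgetcsv($fh)) !== false) {
        for ($i = 0; $i < $cols; $i++) {
            $stmt->bindValue($i + 1, $row[$i]);
        }
        $stmt->execute();
        $stmt->reset();
    }
    fclose($fh);
}

// One transaction keeps the bulk load fast.
$db->exec('BEGIN');
loadCsv($db, 'tasks.csv', 'tasks', 3);
loadCsv($db, 'bugs.csv', 'bugs', 3);
$db->exec('COMMIT');

// Do the join once, locally.
$result = $db->query(
    'SELECT t.user, t.task_status, b.title, b.bug_status
     FROM tasks t JOIN bugs b ON b.id = t.bugid'
);
while ($row = $result->fetchArray(SQLITE3_ASSOC)) {
    // ... output
}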

Yii: large SQL queries consume a large amount of memory

I am using Yii 1.1.14 with PHP 5.3 on CentOS 6, and I am using CDbCommand to fetch data from a very large table; the result set is ~90,000 records over 10 columns. I am exporting it to a CSV file, and the file size is about 15 MB.
The script always crashed without any error messages, and only after some research did I figure out that I needed to raise the memory_limit in php.ini in order to execute the script successfully.
The only problem is that for a successful execution I had to raise the memory limit to 512 MB(!), which is a lot, and if 10 users execute the same script my server will not respond very well...
I was wondering if anyone knows of a way to reduce the memory consumption of SQL queries with Yii?
I know I can split the query into multiple queries using limits and offsets, but it just doesn't seem logical that a 15 MB result should consume 512 MB.
Here is the code:
set_time_limit(0);
$connection = new CDbConnection($dsn,$username,$password);
$command = $connection->createCommand('SELECT * FROM TEST_DATA');
$result = $command->queryAll(); //this is where the script crashes
print_r($result);
Any ideas would be greatly appreciated!
Thanks,
Instead of using queryAll(), which returns all the rows in a single array (the real memory problem is here), you should simply use query() and a foreach loop (take a look at CDbDataReader), e.g.:
$command = $connection->createCommand('SELECT * FROM TEST_DATA');
$rows = $command->query();
foreach ($rows as $row)
{
}
EDIT : Using LIMIT
$count = Yii::app()->db->createCommand('SELECT COUNT(*) FROM TEST_DATA')->queryScalar();
$maxRows = 1000;
$maxPages = ceil($count / $maxRows);
for ($i = 0; $i < $maxPages; $i++)
{
    $offset = $i * $maxRows;
    $rows = $connection->createCommand("SELECT * FROM TEST_DATA LIMIT $offset,$maxRows")->query();
    foreach ($rows as $row)
    {
        // Here your code
    }
}

Creating a very large MySQL Database from PHP Script

Please bear with me on this question.
I'm looking to create a relatively large MySQL database that I want to use to do some performance testing. I'm using Ubuntu 11.04 by the way.
I want to create about 6 tables, each with about 50 million records. Each table will have about 10 columns. The data would just be random data.
However, I'm not sure how I can go about doing this. Do I use PHP and loop INSERT queries (bound to timeout)? Or if that is inefficient, is there a way I can do this via some command line utility or shell script?
I'd really appreciate some guidance.
Thanks in advance.
mysqlimport is what you want. Check the MySQL documentation for full information. It's a command-line tool and very fast.
Command-line mode usually has the timeouts disabled, as that's a protection against taking down a webserver, which doesn't apply at the command line.
You can do it from PHP, though generating "random" data will be costly. How random does this information have to be? You can easily read from /dev/urandom and get "garbage", but it's not a source of "good" randomness (you'd want /dev/random then, but that will block if there isn't enough entropy available to make good garbage).
Just make sure that you have keys disabled on the tables, as keeping those up-to-date will be a major drag on your insert operations. You can add/enable the keys AFTER you've got your data set populated.
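On MyISAM tables that can be done with a pair of statements around the bulk load (the table name is a placeholder):
ALTER TABLE mytable DISABLE KEYS;
-- ... bulk inserts here ...
ALTER TABLE mytable ENABLE KEYS;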
If you do want to go the PHP way, you could do something like this:
<?php
//Edit Following
$millionsOfRows = 2;
$InsertBatchSize = 1000;
$table = 'ATable';
$RandStrLength = 10;
$timeOut = 0; //set 0 for no timeout
$columns = array('col1', 'col2', 'etc');

//Mysql Settings
$username = "root";
$password = "";
$database = "ADatabase";
$server = "localhost";

//Don't edit below
$letters = range('a', 'z');
$rows = $millionsOfRows * 1000000;
$colCount = count($columns);
$valueArray = array();

$con = @mysql_connect($server, $username, $password) or die('Error accessing database: ' . mysql_error());
@mysql_select_db($database) or die('Couldn\'t connect to database: ' . mysql_error());

set_time_limit($timeOut);

for ($i = 0; $i < $rows; $i++)
{
    $values = array();
    for ($k = 0; $k < $colCount; $k++)
        $values[] = RandomString();
    $valueArray[] = "('" . implode("', '", $values) . "')";

    if ($i > 0 && ($i % $InsertBatchSize) == 0)
    {
        echo "--" . $i / $InsertBatchSize . "--";
        $sql = "INSERT INTO `$table` (`" . implode('`,`', $columns) . "`) VALUES " . implode(',', $valueArray);
        mysql_query($sql);

        echo $sql . "<BR/><BR/>";
        $valueArray = array();
    }
}

//Insert any leftover rows from the last partial batch
if (count($valueArray) > 0)
{
    $sql = "INSERT INTO `$table` (`" . implode('`,`', $columns) . "`) VALUES " . implode(',', $valueArray);
    mysql_query($sql);
}

mysql_close($con);

function RandomString()
{
    global $RandStrLength, $letters;
    $str = "";
    for ($i = 0; $i < $RandStrLength; $i++)
        $str .= $letters[rand(0, 25)];
    return $str;
}
Of course you could just use an existing dataset, like the Northwind database.
All you need to do is launch your script from the command line like this:
php -q generator.php
It can then be a simple PHP file like this:
<?php
$fid = fopen("query.sql", "w");
fputs($fid, "create table a (id int not null auto_increment primary key, b int, c int);\n");
for ($i = 0; $i < 50000000; $i++) {
    fputs($fid, "insert into a (b,c) values (" . rand(0, 1000) . ", " . rand(0, 1000) . ");\n");
}
fclose($fid);
exec("mysql -u$user -p$password $db < query.sql");
It is probably fastest to run multiple inserts in one query, as:
INSERT INTO `test` VALUES
(1,2,3,4,5,6,7,8,9,0),
(1,2,3,4,5,6,7,8,9,0),
.....
(1,2,3,4,5,6,7,8,9,0)
I created a PHP script to do this. First I tried to construct a query that would hold 1 million inserts, but it failed. Then I tried with 100 thousand and it failed again. 50 thousand didn't work either. My next try was with 10,000 and it works fine. I guess I am hitting the transfer limit from PHP to MySQL (probably max_allowed_packet). Here is the code:
<?php
set_time_limit(0);
ini_set('memory_limit', -1);
define('NUM_INSERTS_IN_QUERY', 10000);
define('NUM_QUERIES', 100);

// build query
$time = microtime(true);
$queries = array();
for ($i = 0; $i < NUM_QUERIES; $i++) {
    $queries[$i] = 'INSERT INTO `test` VALUES ';
    for ($j = 0; $j < NUM_INSERTS_IN_QUERY; $j++) {
        $queries[$i] .= '(1,2,3,4,5,6,7,8,9,0),';
    }
    $queries[$i] = rtrim($queries[$i], ',');
}
echo "Building query took " . (microtime(true) - $time) . " seconds\n";

mysql_connect('localhost', 'root', '') or die(mysql_error());
mysql_select_db('store') or die(mysql_error());
mysql_query('DELETE FROM `test`') or die(mysql_error());

// execute the query
$time = microtime(true);
for ($i = 0; $i < NUM_QUERIES; $i++) {
    mysql_query($queries[$i]) or die(mysql_error());
    // verify all rows inserted
    if (mysql_affected_rows() != NUM_INSERTS_IN_QUERY) {
        echo "ERROR: on run $i not all rows inserted (" . mysql_affected_rows() . ")\n";
        exit;
    }
}
echo "Executing query took " . (microtime(true) - $time) . " seconds\n";

$result = mysql_query('SELECT count(*) FROM `test`') or die(mysql_error());
$row = mysql_fetch_row($result);
echo "Total number of rows in table: {$row[0]}\n";
echo "Total memory used in bytes: " . memory_get_usage() . "\n";
?>
The results on my Win 7 dev machine are:
Building query took 0.30241012573242 seconds
Executing query took 5.6592788696289 seconds
Total number of rows in table: 1000000
Total memory used in bytes: 22396560
So 1 million inserts took about five and a half seconds. Then I ran it with these settings:
define('NUM_INSERTS_IN_QUERY', 1);
define('NUM_QUERIES', 1000000);
which is basically doing one insert per query. The results are:
Building query took 1.6551470756531 seconds
Executing query took 77.895285844803 seconds
Total number of rows in table: 1000000
Total memory used in bytes: 140579784
Then I tried to create a file with one insert per query in it, as suggested by @jancha. My code is slightly modified:
$fid = fopen("query.sql", "w");
fputs($fid, "use store;");
for ($i = 0; $i < 1000000; $i++) {
    fputs($fid, "insert into `test` values (1,2,3,4,5,6,7,8,9,0);\n");
}
fclose($fid);

$time = microtime(true);
exec("mysql -uroot < query.sql");
echo "Executing query took " . (microtime(true) - $time) . " seconds\n";
The result is:
Executing query took 79.207592964172 seconds
Same as executing the queries through PHP. So the fastest way is probably to do multiple inserts in one query, and it shouldn't be a problem to use PHP to do the work.
Do I use PHP and loop INSERT queries (bound to timeout)
Certainly, running long-duration scripts via a webserver-mediated request is not a good idea. But PHP can be compiled to run from the command line - in fact, most distributions of PHP come bundled with this.
There are lots of things you can do to make this run more efficiently; exactly which ones will vary depending on how you are populating the data set (e.g. once only, lots of batch additions). However, for a single load, you might want to have a look at the output of mysqldump (note the disabling/enabling of indexes and the multiple-row insert lines) and recreate this in PHP rather than connecting directly to the database from PHP.
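For reference, the relevant part of a mysqldump file looks roughly like this (table name and values abbreviated), and this is the pattern worth imitating:
LOCK TABLES `mytable` WRITE;
/*!40000 ALTER TABLE `mytable` DISABLE KEYS */;
INSERT INTO `mytable` VALUES (1,'a'),(2,'b'),(3,'c');
INSERT INTO `mytable` VALUES (4,'d'),(5,'e'),(6,'f');
/*!40000 ALTER TABLE `mytable` ENABLE KEYS */;
UNLOCK TABLES;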
I see no point in this question and, especially, in raising a bounty for it.
As they say, "the best is the enemy of the good".
You asked this question ten days ago.
If you'd just gone with whatever code you had, you'd have your tables already and would even be done with your tests. You lose so much time in vain; it's beyond my understanding.
As for the method you've been asking for (just to keep away all these self-appointed moderators), here are some statements as food for thought:
MySQL's own methods are considered more effective in general.
MySQL can insert all data from one table into another using the INSERT ... SELECT syntax, so you would need to run only about 30 queries to get your 50 million records (see the sketch below).
And sure, MySQL can copy whole tables as well.
Keep in mind that there should be no indexes at the time of table creation.
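For example, seeding one row and then repeatedly doubling the table with INSERT ... SELECT (table and column names are placeholders) gets past 50 million rows in about 26 runs, since each run doubles the row count (2^26 is roughly 67 million):
// Seed one row, then double the table ~26 times.
mysql_query("INSERT INTO mytable (a, b, c) VALUES (1, 2, 3)") or die(mysql_error());
for ($i = 0; $i < 26; $i++) {
    mysql_query("INSERT INTO mytable (a, b, c) SELECT a, b, c FROM mytable") or die(mysql_error());
}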
I just want to point you to http://www.mysqldumper.net/ which is a tool that allows you to back up and restore big databases with PHP.
The script has some mechanisms to circumvent PHP's maximum execution time - IMO worth a look.
This is not a solution for generating data, but a great one for importing / exporting.
