How does PHP PDO work internally?

I want to use PDO in my application, but before that I want to understand how PDOStatement->fetch and PDOStatement->fetchAll work internally.
For my application, I want to run something like "SELECT * FROM myTable" and write the result to a CSV file; the table has around 90,000 rows of data.
My question is, if I use PDOStatement->fetch as I am using it here:
// First, prepare the statement
$query = "SELECT * FROM tableName";
$stmt = $this->connection->prepare($query);
// Execute the statement
$stmt->execute();
// Open the output file for the CSV export
$data = fopen('export.csv', 'w');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // Export every row to the file
    fputcsv($data, $row);
}
fclose($data);
After every fetch from the database, will the result of that fetch be stored in memory?
Meaning, when I do the second fetch, will memory hold the data of the first fetch as well as the data of the second?
And so, if I have 90,000 rows of data and fetch one row at a time, does memory keep taking on each new fetch result without discarding the previous ones, so that by the last fetch memory already holds the other 89,999 rows?
Is this how PDOStatement::fetch works?
Performance-wise, how does it stack up against PDOStatement::fetchAll?
Update: fetch and fetchAll from a memory-usage point of view
I just wanted to add something to this question, as I recently found out a few things about fetch and fetchAll. I hope this makes the question worthwhile for people who visit it in the future to get some understanding of the two methods.
fetch does not accumulate the result set in memory; it works on a row-by-row basis. It goes through the result set and returns row 1, then goes to the result set again and returns row 2. Note that it does not return rows 1 and 2 together, only row 2. So fetch does not store anything extra in memory, whereas fetchAll stores the full result set there. fetch is therefore the better option compared to fetchAll when dealing with a result set of around 100K rows.
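To make the difference concrete, here is a minimal sketch (assuming an open PDO connection in $pdo and a large table named bigTable; exact numbers depend on the driver's buffering) comparing the memory footprint of the two calls:
$stmt = $pdo->query('SELECT * FROM bigTable');
$before = memory_get_usage();
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // each call returns only the current row; previous rows are not kept in $row
}
echo 'fetch() delta: ', memory_get_usage() - $before, " bytes\n";

$stmt = $pdo->query('SELECT * FROM bigTable');
$before = memory_get_usage();
$all = $stmt->fetchAll(PDO::FETCH_ASSOC); // every row is held in $all at once
echo 'fetchAll() delta: ', memory_get_usage() - $before, " bytes\n";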

Where the results live depends on the driver. With MySQL the result set is buffered on the client by default; the driver can also be used in an "unbuffered" mode, where rows stay on the server until you fetch them, but that's a tad tricky to use. fetchAll() on a large result set can cause network flooding, memory exhaustion, etc.
In every case where I need to process more than 1,000 rows, I'm not using PHP. Consider also whether your database engine already has a CSV export operation. Many do.
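For reference, a hedged sketch of MySQL's unbuffered mode (the DSN and credentials are placeholders); rows then stream from the server as you fetch them instead of being copied to client memory up front:
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
$stmt = $pdo->query('SELECT * FROM tableName');
$fp = fopen('export.csv', 'w');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    fputcsv($fp, $row); // PHP-side memory stays flat even for very large result sets
}
fclose($fp);
// Caveat: while an unbuffered statement is open, the same connection cannot
// run other queries until the result is fully fetched or the statement is closed.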

I advise you to use PDO::FETCH_LAZY instead of PDO::FETCH_ASSOC for big data.
I used it for a row-by-row export to CSV and it works fine.
No "out of memory" errors at all.
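A minimal sketch of that approach (assuming $stmt is an executed PDOStatement; the column names x and y are made up here). PDO::FETCH_LAZY returns a PDORow object whose columns are read on access, so only one row is materialized at a time:
$fp = fopen('export.csv', 'w');
while ($row = $stmt->fetch(PDO::FETCH_LAZY)) {
    fputcsv($fp, [$row->x, $row->y]); // access each column as a property
}
fclose($fp);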

Related

How to put only part of the queries in a transaction with mysqli?

I'm trying to create a script to import about 10M records into a MySQL database.
When I did a loop with single queries, an import of 2,000 records took 20 minutes.
So I'm trying to do this with transactions. The problem is that in my loop there are some SELECT queries that need to run first to get the values used to build the inserts. Only the last two queries (an insert and an update) could go in a transaction.
Something like this:
foreach ($records as $rec) {
    // select sth
    // do sth with result
    // second select sth
    // do sth with second result
    // prepare values from the above results and $rec
    // the part below I'd like to do within a transaction
    // insert the new record
    // update table
}
I know this is a little messy and not exact, but the real function is more complicated, so I decided to post just a "draft"; I need advice, not complete code.
Regards
Transactions are for multiple statements that need to be treated as a single group that either entirely succeeds or entirely fails. It sounds like your issue has a lot more to do with performance than transactions. Unless there is a bit of information that you haven't included that involves groups of statements "which all must succeed at the same time", transactions are just a distraction.
There are a few ways to approach your problem depending on some things that aren't immediately obvious from your post.
- If your data source for the 10M records is a table in the same database that you are going to populate with the new records (via the inserts and updates at the end of your loop), then you might be able to do everything through a single database query. SQL is very expressive, and through joins and some of the built-in functions (SUBSTR(), UPPER(), REVERSE(), CASE...END, etc.) you might be able to do everything you want. This would require reading up on SQL and trying to reframe your goals in terms of set operations.
- If you are inserting records that are sourced from outside the database (like from a file), then I would organize your code like this:
// select sth
// do sth with result
// second select sth
// do sth with second result
// prepare values from the above results so that $rec info can be added in later
foreach ($records as $rec) {
    // construct a big insert statement
}
// insert the new records by running the big insert statement
// update table
The advantage here is that you are only hitting the db with a few queries instead of a few queries per $rec, so your performance will be better (since db calls have overhead). For 10M rows you may need to break the above into chunks, since there is a limit to how big a single insert can be (see max_allowed_packet). I would suggest breaking the 10M into 5K or 10K chunks by adding another loop around the above that partitions off the chunks; a sketch of that follows below.
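A hedged sketch of the chunked multi-row insert (using PDO here for illustration, though mysqli works the same way; the table and column names are placeholders):
$chunkSize = 5000;
foreach (array_chunk($records, $chunkSize) as $chunk) {
    $placeholders = [];
    $values = [];
    foreach ($chunk as $rec) {
        $placeholders[] = '(?, ?)';
        $values[] = $rec['col_a'];
        $values[] = $rec['col_b'];
    }
    // one round trip inserts the whole chunk
    $sql = 'INSERT INTO target_table (col_a, col_b) VALUES ' . implode(', ', $placeholders);
    $pdo->prepare($sql)->execute($values);
}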
A clearer answer could have been given if you added details about your data source, what transformations you want to do on the data, what the purpose of the
//select sth
//do sth with result
//second select sth
//do sth with second result
section is (within the context of how it adds information to your insert statements later), and what the prepare values section of your code does.

Is there any CakePHP function that builds and executes the query without returning the result set

I'm using CakePHP 2.1.3. I have a performance problem with looping over the large array of data returned from find('all'). I want to retrieve the query result row by row to eliminate this expensive array processing. I don't want the array result set returned by find() or query(). What I'm trying to do is like below:
$db = $this->Model->getDataSource();
$sql = 'SELECT * FROM my_table';
if ($result = $db->execute($sql)) {
    $db->resultSet($result);
    while ($row = $db->fetchResult()) {
        // do something with $row
    }
}
However, I don't want to write the raw query. Is there any Cake function that just builds the query according to the association set and executes it without returning the result set?
[Edit]
I'm currently implementing the above script in a controller. My model has no associations, so I don't need to use recursive = -1. I am fetching the whole table for the purpose of CSV export.
Cake's find() does some internal array processing, and the returned result set then has to be looped over again explicitly. I want to optimize the code by avoiding processing the large array twice.
Related issue: https://github.com/cakephp/cakephp/issues/6426
First, be sure that you only fetch the data you really need. Ideally you get everything you need with $this->YourModel->recursive = -1.
Performance problems often arise from lots of associated data being pulled in.
When you have checked this, I think the best solution would be a loop where you fetch the desired data in chunks via a BETWEEN (or LIMIT/OFFSET) condition, as sketched below. Although I am not sure if this will help you.
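A rough sketch of chunked fetching in CakePHP 2.x (the model name and chunk size are made up for illustration):
$chunkSize = 1000;
$offset = 0;
do {
    $rows = $this->MyModel->find('all', array(
        'recursive' => -1,        // skip associated data
        'limit'     => $chunkSize,
        'offset'    => $offset,
    ));
    foreach ($rows as $row) {
        // write $row['MyModel'] to the CSV here
    }
    $offset += $chunkSize;
} while (count($rows) === $chunkSize);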
Why do you want to go through the whole table? Are you performing some maintenance, e.g. filling a new field or updating a counter? Maybe you can achieve the goal in a better way than by fetching a whole table.

PDO PHP Postgres: slow fetching of data

I was playing with PDO on PostgreSQL 9.2.4 and trying to fetch data from a table having millions of rows. My query returns about 100,000 rows.
I do not use any of PDOStatement's fetch functions; I simply use the result from the PDO object itself and loop through it.
But it gets slower and slower over time. At the beginning it was fetching about 200 rows per second, but the closer it comes to the end, the slower it gets. Now, at row 30,000, it fetches only 1 row per second. Why is it getting slower?
I do this, and it's pretty simple:
$dbh = new PDO("pgsql...");
$sql = "SELECT x, y FROM point WHERE name IS NOT NULL AND place IN ('area1', 'area2')";
$res = $dbh->query($sql);
$ins_sql = "INSERT INTO mypoints (x, y) VALUES ";
$ins_vals = [];
$ins_placeholders = [];
foreach ($res as $row) {
    $ins_placeholders[] = "(?,?)";
    $ins_vals = array_merge($ins_vals, [$row['x'], $row['y']]);
    printCounter();
}
// now build up one insert query using placeholders and values,
// to insert all of them in one shot into table mypoints
The printCounter function simply increments an int variable and prints it, so I can see how many rows have been put into that array before I create my insert statement from it. I use one-shot inserts to speed things up; that's better than doing 100,000 single inserts.
But that foreach loop gets slower over time. How can I increase the speed?
Is there a difference between fetch() and the simple loop method using the PDOStatement in foreach?
When I start this PHP script, the query itself takes about 5-10 seconds, so this has nothing to do with how the table is set up or whether I need indexes.
I have other tables returning 1 million rows, and I'm not sure what the best way to fetch them is. I can raise PHP's memory_limit if needed, so the most important thing for me is SPEED.
Appreciate any help.
It's not likely that the slowness is related to the database, because after the $dbh->query() call, the query is finished and the resulting rows are all in memory (they are not in PHP variables yet, but they're in memory accessible at the pgsql module level).
The more likely culprit is the array_merge operation. The array becomes larger at every loop iteration, and the operation recreates the entire array each time.
You may want to do instead:
$ins_vals[] = [$row['x'], $row['y']];
Although personally, when concerned with speed, I'd use an even simpler flat structure:
$ins_vals[] = $x;
$ins_vals[] = $y;
Another, unrelated point: the code builds a query with a huge number of placeholders, which is not how placeholders are normally used. To send large numbers of values to the server, the efficient way is to use COPY, possibly into a temporary table, followed by server-side merge operations if it's not a plain insertion.
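PDO itself exposes no COPY API, so a sketch of this would go through the pgsql extension's pg_copy_from(); the connection string, table name, and the $points array are assumptions:
$conn = pg_connect('host=localhost dbname=test user=me');
$lines = [];
foreach ($points as $p) {
    $lines[] = $p['x'] . "\t" . $p['y']; // tab-delimited, one row per line
}
pg_copy_from($conn, 'mypoints', $lines); // server-side bulk load in one shot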
I don't know why, but using the fetch() method instead, doing the $ins_vals filling like this:
$ins_vals[] = $x;
$ins_vals[] = $y;
and using beginTransaction and commit now makes my script unbelievably fast. It takes only about 1 minute to add my 100,000 points.
I think both array_merge and that "ugly" looping through the PDOStatement slowed down my script.
And why the heck did someone downvote my question? Are you punishing me for my missing knowledge? Thanks.
OK, I wrote a class where I set the SQL and then add the values for each row with a method call. Whenever it reaches a specific limit, it starts a transaction, prepares the statement with as many placeholders as there are values, executes it with the array holding all the values, and then commits.
This seems to be fast enough; at least it doesn't get slower anymore.
For some reason it's faster to add values in a flat structure, as Daniel suggested. That's enough for me.
Sometimes it's good to have a function doing one step of the insertion, because when the function returns, all the memory used inside it is freed, so your memory usage stays low. A rough sketch of this approach follows below.
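This is only a sketch of the batching class described above; the class and method names are made up, and it assumes the mypoints table from earlier:
class BatchInserter
{
    private $pdo;
    private $limit;
    private $rows = [];

    public function __construct(PDO $pdo, $limit = 1000)
    {
        $this->pdo = $pdo;
        $this->limit = $limit;
    }

    public function add($x, $y)
    {
        $this->rows[] = [$x, $y];
        if (count($this->rows) >= $this->limit) {
            $this->flush();
        }
    }

    public function flush()
    {
        if (!$this->rows) {
            return;
        }
        // one placeholder pair per buffered row
        $placeholders = implode(', ', array_fill(0, count($this->rows), '(?,?)'));
        $values = [];
        foreach ($this->rows as $r) {
            $values[] = $r[0];
            $values[] = $r[1];
        }
        $this->pdo->beginTransaction();
        $this->pdo->prepare("INSERT INTO mypoints (x, y) VALUES $placeholders")->execute($values);
        $this->pdo->commit();
        $this->rows = []; // buffer memory is released here
    }
}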

SQLite3: How to get the number of rows in a table without reading all the rows?

Can someone shed some light on how to get the number of rows in a table (using PHP) without actually having to read all the rows? I am using SQLite to log data periodically and need to know the table row count before I actually access specific data.
Apart from reading all rows and incrementing a counter, I cannot work out how to do this quickly (it's a large database) for what is a rather simple requirement. I have tried the following PHP code, but it only returns a boolean response rather than the actual number of rows:
$result = $db->query('SELECT count(*) FROM mdata');
Normally the SELECT statement will also return the data object (if there is any?)
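For reference, a minimal sketch of reading the count back with PHP's SQLite3 class (assuming $db is an open SQLite3 instance; with PDO, fetchColumn() does the same job):
$count = $db->querySingle('SELECT COUNT(*) FROM mdata'); // returns the scalar directly
// or, via the result object:
$result = $db->query('SELECT COUNT(*) FROM mdata');
$row = $result->fetchArray(SQLITE3_NUM);
$count = $row[0];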
Just use
LIMIT 1
That should work! It limits the result to looking at only one row.
If you have a record set, then you can get the number of records with mysql_num_rows($recordSet);

How do PHP/MySQL database queries work exactly?

I have used MySQL a lot, but I always wondered exactly how it works: when I get a positive result, where is the data stored exactly? For example, I write this:
$sql = "SELECT * FROM TABLE";
$result = mysql_query($sql);
while ($row = mysql_fetch_object($result)) {
    echo $row->column_name;
}
When a result is returned, I am assuming it holds all the result data. Or does it return fragments and only fetch more when asked, as with $row->column_name?
Or does it really return every single row of data in $result even if you only wanted one column?
Also, if I paginate using LIMIT, does it hold THAT original (old) result even if the database is updated?
The details are implementation dependent but generally speaking, results are buffered. Executing a query against a database will return some result set. If it's sufficiently small all the results may be returned with the initial call or some might be and more results are returned as you iterate over the result object.
Think of the sequence this way:
1. You open a connection to the database;
2. There is possibly a second call to select a database, or it might be done as part of (1);
3. That authentication and connection step is (at least) one round trip to the server (ignoring persistent connections);
4. You execute a query on the client;
5. That query is sent to the server;
6. The server has to determine how to execute the query;
7. If the server has previously executed the query, the execution plan may still be in the query cache; if not, a new plan must be created;
8. The server executes the query as given and returns a result to the client;
9. That result will contain some buffer of rows that is implementation dependent. It might be 100 rows, or more, or less. All columns are returned for each row;
10. As you fetch more rows, eventually the client will ask the server for more. This may be when the client runs out, or it may be done preemptively. Again, this is implementation dependent.
The idea of all this is to minimize roundtrips to the server without sending back too much unnecessary data, which is why if you ask for a million rows you won't get them all back at once.
LIMIT clauses (or any clause, in fact) will modify the result set.
Lastly, (7) is important because SELECT * FROM table WHERE a = 'foo' and SELECT * FROM table WHERE a = 'bar' are two different queries as far as the database optimizer is concerned so an execution plan must be determined for each separately. But a parameterized query (SELECT * FROM table WHERE a = :param) with different parameters is one query and only needs to be planned once (at least until it falls out of the query cache).
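A small PDO illustration of that last point (the table and column names are placeholders): one statement is prepared and planned once, then executed with different parameters.
$stmt = $pdo->prepare('SELECT * FROM table_name WHERE a = :param');
$stmt->execute([':param' => 'foo']);
$fooRows = $stmt->fetchAll();
$stmt->execute([':param' => 'bar']); // same plan, different parameter
$barRows = $stmt->fetchAll();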
I think you are confusing the two types of variables you're dealing with, and neither answer really clarifies that so far.
$result is a MySQL result object. It does not "contain any rows." When you say $result = mysql_query($sql), MySQL executes the query, and knows what rows will match, but the data has not been transferred over to the PHP side. $result can be thought of as a pointer to a query that you asked MySQL to execute.
When you say $row = mysql_fetch_object($result), that's when PHP's MySQL interface retrieves a row for you. Only that row is put into $row (as a plain old PHP object, but you can use a different fetch function to ask for an associative array, or specific column(s) from each row.)
Rows may be buffered with the expectation that you will be retrieving all of the rows in a tight loop (which is usually the case), but in general, rows are retrieved when you ask for them with one of the mysql_fetch_* functions.
If you only want one column from the database, then you should SELECT that_column FROM .... Using a LIMIT clause is also a good idea whenever possible, because MySQL can usually perform significant optimizations if it knows that you only want a certain group of rows.
The first question can be answered by reading up on PHP resources.
Since you are SELECTing "*", every column is returned for each mysql_fetch_object call. Just look at print_r($row) to see.
In simple words, the resource returned is like an ID that the MySQL library associates with other data. I think it is like the identification card in your wallet: it's just a number and some information, but it is associated with a lot more information if you give it to the government, or your cell-phone company, etc.
