How does PDOStatement::fetch() work internally?

I read everywhere that using PDOStatement::fetch() would mean you will not run out of memory, no matter how large the resultset is. So that begs the question: where are the rows stored?
PHP has to store the results somewhere when it gets them from the database. Where are these results stored? Or, are they stored in the database, and PHP has to query the database for the next row every time?

It's similar to reading a file: you open a stream and read the data piece by piece, except that database internals are way more complicated.
For example, look at the description of the $driver_options parameter of PDO::prepare; you can even set a scrollable cursor in order to control the direction of reading.

PDOStatement::fetch will get you the next row from your query. It can still run out of memory (for example if a single row contains a lot of data), because with buffered queries the whole result is held in memory (read about Buffered and Unbuffered queries).
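A minimal sketch of the row-by-row pattern, assuming the pdo_mysql driver and made-up connection details, table name (big_table) and process() callback. With MySQL, turning buffering off is what keeps the full resultset out of PHP's memory; with the default buffered mode the whole result is copied to the client as soon as the query runs.
<?php
// Sketch: stream rows one at a time with PDO (DSN, credentials,
// table name and process() are placeholders).
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// Unbuffered mode: rows stay on the MySQL server until fetch() asks
// for them, instead of being copied into PHP memory all at once.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->query('SELECT id, payload FROM big_table');

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    process($row); // only the current row is held in PHP memory here
}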

Related

Speed up insert in Mariadb

Please, can somebody give me some help?
My problem is:
I have a table with 8 fields and about 510,000 records. In a web form, the user selects an Excel file and it is read with SimpleXLSX. The file has about 340,000 lines. With PHP and the SimpleXLSX library this file is loaded into memory, then a for loop reads it line by line, takes one value from each line and searches for that value in the table; if the value already exists in the table it is not inserted, otherwise the values read are stored in the table.
This process takes days to finish.
Can somebody suggest some way to speed up the process?
Thanks a lot.
If you have many users who might use the web form at the same time:
you should switch from SimpleXLSX to js-xlsx, so the browser does all the parsing work and the server only writes to the database.
If you have few users (which I think is your case):
"and search this value in the table"
This is what costs the most time: a single database lookup for each value, followed by a single add/not-add to the database.
So read the whole table into memory instead (use a hash map for the comparisons), compare everything there, add the new rows to memory and mark them as new, and at the end write the new rows back to the database.
Because your database and your XLSX file contain roughly the same number of rows, querying the database value by value gains you almost nothing; just forget the database during the comparison, doing it in memory with a hash map is the fastest.
Of course, you can also let the database do the work if you follow #Barmar's idea: don't insert rows one at a time, insert them in batches. A sketch of the in-memory approach is shown below.
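A rough sketch of the in-memory approach under these assumptions: the table is called items with a unique code column plus name and price, and $rows is the array already parsed from the XLSX by SimpleXLSX, with each row being [code, name, price]. All of those names are made up for the example.
<?php
// Sketch: one query to load existing keys, a hash map (PHP array)
// for the comparisons, and batched multi-row INSERTs at the end.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// 1) Load all existing keys once, into a hash map for O(1) lookups.
$existing = array_flip(
    $pdo->query('SELECT code FROM items')->fetchAll(PDO::FETCH_COLUMN)
);

// 2) Compare in memory and collect only the new rows.
$new = [];
foreach ($rows as $row) {            // $rows parsed from the XLSX, assumed [code, name, price]
    if (!isset($existing[$row[0]])) {
        $existing[$row[0]] = true;   // also skips duplicates inside the file itself
        $new[] = $row;
    }
}

// 3) Insert in batches of 1000 instead of one row at a time.
foreach (array_chunk($new, 1000) as $chunk) {
    $placeholders = rtrim(str_repeat('(?,?,?),', count($chunk)), ',');
    $stmt = $pdo->prepare("INSERT INTO items (code, name, price) VALUES $placeholders");
    $stmt->execute(array_merge(...$chunk));
}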
Focus on the speed of throwing the data into the database. Do not try to do all the work during the INSERT. Then use SQL queries to further clean up the data.
Use the minimal XLS to get the XML into the database. Use some programming language if you need to massage the data a lot. Neither XLS nor SQL is the right place for complex string manipulations.
If practical, use LOAD DATA ... XML to get the data loaded; it is very fast.
SQL is excellent for handling entire tables at once; it is terrible at handling one row at a time. (Hence, my recommendation of putting the data into a staging table, not directly into the target table.)
If you want to discuss further, we need more details about the conversions involved.
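Sketched below with made-up table and file names: load the raw spreadsheet (exported as XML) into a staging table with MySQL's LOAD XML statement, then move only the missing rows into the real table with one set-based INSERT ... SELECT. Whether LOCAL INFILE is allowed depends on your server configuration.
<?php
// Sketch of the staging-table idea; table names, columns and the
// file path are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);

// 1) Bulk-load the raw data; far faster than row-by-row INSERTs.
$pdo->exec("LOAD XML LOCAL INFILE '/tmp/items.xml'
            INTO TABLE staging
            ROWS IDENTIFIED BY '<row>'");

// 2) Clean up / convert with set-based SQL and copy only the rows
//    that do not already exist in the target table.
$pdo->exec("INSERT INTO items (code, name, price)
            SELECT s.code, TRIM(s.name), s.price
            FROM staging s
            LEFT JOIN items i ON i.code = s.code
            WHERE i.code IS NULL");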

PHP / MySQL if number of rows is bigger than 6400 doesn't return any results

I am using a simple PHP query to fetch rows from the database. There are more than 7000 rows in the table, and whenever I fetch all of them the script dies; when I limit the results to 6400, everything works fine.
Is there any limitation in MySQL or PHP that I should be aware of? If so, where do I need to configure these settings?
Any help is highly appreciated.
Please note that my PHP settings allow a script execution time of 1800 seconds and the memory limit is set to 512MB.
Why do you need that many rows? I'm pretty sure PHP dies because it runs out of memory.
Run your query in a console to see if you get more than 7000 rows there without issues. If it returns them all there, you can be sure it's PHP and not your database, and I'm sure it is PHP.
Whatever you do, it would be better to loop over the data, also known as "pagination", and read and process it in chunks of, for example, 100 rows: 0-100, 100-200, 200-300...
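A minimal pagination sketch along those lines, with placeholder connection details, table name and a hypothetical process() function:
<?php
// Process the table in chunks of 100 rows so only one chunk is ever
// in PHP memory at a time.
$mysqli = new mysqli('localhost', 'user', 'pass', 'test');

$chunk  = 100;
$offset = 0;

do {
    $result = $mysqli->query("SELECT id, data FROM big_table ORDER BY id LIMIT $offset, $chunk");
    while ($row = $result->fetch_assoc()) {
        process($row);           // per-row work goes here
    }
    $fetched = $result->num_rows;
    $result->free();             // release this chunk before loading the next
    $offset += $chunk;
} while ($fetched === $chunk);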
You have 2 solutions as I see it:
1) The one that #burzum suggested. Really nice, although you would have to establish the maximum chunk size empirically, based on your server load (if it's not constant).
2) Use mysql_unbuffered_query().
mysql_unbuffered_query() does have some drawbacks, as described in the manual:
The benefits of mysql_unbuffered_query() come at a cost: you cannot use mysql_num_rows() and mysql_data_seek() on a result set returned from mysql_unbuffered_query(), until all rows are fetched. You also have to fetch all result rows from an unbuffered SQL query before you can send a new SQL query to MySQL, using the same link_identifier.
But since you're dealing with a large dataset, it seems well justified...
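ext/mysql (and with it mysql_unbuffered_query()) was removed in PHP 7, so here is the same idea sketched with mysqli, where MYSQLI_USE_RESULT is the unbuffered mode; the table name and process() are placeholders:
<?php
// Unbuffered query: rows are streamed from the server as you fetch
// them instead of being buffered in PHP first.
$mysqli = new mysqli('localhost', 'user', 'pass', 'test');

$result = $mysqli->query('SELECT id, data FROM big_table', MYSQLI_USE_RESULT);

while ($row = $result->fetch_assoc()) {
    process($row);   // only one row in PHP memory at a time
}

// Same caveats as mysql_unbuffered_query(): the row count is unknown
// until everything is fetched, and this result must be finished (or
// freed) before another query is sent on the same connection.
$result->free();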

mysqli_fetch_assoc - what happens if the data is changed in the meanwhile?

In PHP I'm using mysqli_fetch_assoc() in a while-loop to get every record in a certain query.
I'm wondering what happens if the data is changed while running the loop (by another process or server), so that the record doesn't match the query any more. Will it still be fetched?
In other words, is the array of records that are fetched fixed, when you do query()? Or is it not?
Update:
I understand that it's a feature that the resultset is not changed when the data is changed, but what if you actually WANT that? In my loop I'm not interested in records that are already updated by another server. How do I check for that, without doing a new query for each record that I fetch??
UPDATE:
Detailed explanation:
I'm working on some kind of search-engine scraper that searches for values in a database. This is done by a few servers at the same time. Items that have already been scraped shouldn't be searched any more. I can't really control which server searches which item, so I was hoping I could check the status of an item while fetching the recordset. Since it's a big dataset, I don't transfer the entire resultset before searching; I fetch each record when I need it...
Introduction
I'm wondering what happens if the data is changed while running the loop (by another process or server), so that the record doesn't match the query any more. Will it still be fetched?
Yes.
In other words, is the array of records that are fetched fixed, when you do query()? Or is it not?
Yes.
A DBMS would not be worth its salt were it vulnerable to race conditions between table updates and query resultset iteration.
Certainly, as far as the database itself is concerned, your SELECT query has completed before any data can be changed; the resultset is cached somewhere in the layers between your database and your PHP script.
In-depth
With respect to the ACID principle *:
In the context of databases, a single logical operation on the data is called a transaction.
User-instigated TRANSACTIONs can encompass several consecutive queries, but 4.33.4 and 4.33.5 in ISO/IEC 9075-2 describe how this takes place implicitly on the per-query level:
The following SQL-statements are transaction-initiating SQL-statements, i.e., if there is no current SQL-transaction, and an SQL-statement of this class is executed, then an SQL-transaction is initiated, usually before execution of that SQL-statement proceeds:
All SQL-schema statements
The following SQL-transaction statements:
<start transaction statement>.
<savepoint statement>.
<commit statement>.
<rollback statement>.
The following SQL-data statements:
[..]
<select statement: single row>.
<direct select statement: multiple rows>.
<dynamic single row select statement>.
[..]
[..]
In addition, 4.35.6:
Effects of SQL-statements in an SQL-transaction
The execution of an SQL-statement within an SQL-transaction has no effect on SQL-data or schemas [..]. Together with serializable execution, this implies that all read operations are repeatable within an SQL-transaction at isolation level SERIALIZABLE, except for:
1) The effects of changes to SQL-data or schemas and its contents made explicitly by the SQL-transaction itself.
2) The effects of differences in SQL parameter values supplied to externally-invoked procedures.
3) The effects of references to time-varying system variables such as CURRENT_DATE and CURRENT_USER.
Your wider requirement
I understand that it's a feature that the resultset is not changed when the data is changed, but what if you actually WANT that? In my loop I'm not interested in records that are already updated by another server. How do I check for that, without doing a new query for each record that I fetch??
You may not.
Although you can control the type of buffering performed by your connector (in this case, MySQLi), you cannot override the above-explained low-level fact of SQL: no INSERT or UPDATE or DELETE will have an effect on a SELECT in progress.
Once the SELECT has completed, the results are independent; it is the buffering of transport of this independent data that you can control, but that doesn't really help you to do what it sounds like you want to do.
This is rather fortunate, frankly, because what you want to do sounds rather bizarre!
* Strictly speaking, MySQL has only partial ACID-compliance for tables other than those with the non-default storage engines InnoDB, BDB and Cluster, and MyISAM does not support [user-instigated] transactions. Still, it seems like the "I" should remain applicable here; MyISAM would be essentially useless otherwise.

Deallocate Memory in PHP

I have a script that is transferring about 1.5 million rows (~400MB worth of data) from one table to another (during this process, some data is converted, modified, and placed in the correct field). It's a simple script: it just recursively loads data, then places it in the new tables under the correct fields and formats. The script works by (as an example) pulling all of the users from the table, then looping through the users, inserting them into the new table, then pulling all of the posts from that user, looping through and inserting them into the correct table, then pulling all of the comments from a post and inserting those, then jumping back up and pulling all of the contacts for that user, and finally moving on to the next user, where it goes through the same process.
I'm having a problem with the immense amount of data being transferred: because it is so large, and there isn't any sort of memory management besides garbage collection (that I know of) in PHP, I'm unable to complete the script (it gets through about 15,000 connections and transferred items before it maxes out at 200MB of memory).
This is a one time thing, so I'm doing it on my local computer, not an actual server.
Since unset() does not actually free up memory, is there any other way to free up the data in a variable? One thing I attempted to do was overwrite the variable to a NULL value, but that didn't seem to help.
Any advice would be awesome, because man, this stinks.
If you're actually doing this recursively then that's your problem - you should be doing it iteratively. Recursive processing leaves overhead (+garbage) every time the next call is made - so eventually you hit the limit. An iterative approach doesn't have such problems, and should be actively garbage collecting.
You're also talking about a mind-numbing number of connections - why are there so many? I guess I don't completely understand your process, and why this approach is needed rather than one retrieve connection and one store connection. Even if you were, say, reconnecting for each row, you should look at using persistent connections, which allow a second connection to the same db to reuse the last connection. Persistent connections aren't a great idea for a web app with multiple users (for scalability reasons), but in your very targeted case they should be fine.
unset() does free up memory, but only if the object you're unsetting has no other references pointing to it. Since PHP uses reference counting rather than 'real' GC, this can bite you if you have circular references somewhere - a typical culprit is inside an ORM, where you often have a Database object that holds references to some Table objects, and each Table object has a reference back to the Database. Even if no outside reference exists to either object, they both still reference each other, preventing the reference count from hitting zero.
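A tiny illustration of that trap, with hypothetical Database/Table classes; note that PHP 5.3+ also ships a cycle collector that can reclaim such structures, either automatically or via gc_collect_cycles():
<?php
// The two objects reference each other, so their refcounts never hit
// zero after unset(); only the cycle collector can reclaim them.
class Database {
    public $tables = [];
}

class Table {
    public $db;
    public function __construct(Database $db) {
        $this->db = $db;        // back-reference creates the cycle
        $db->tables[] = $this;
    }
}

$db = new Database();
new Table($db);

unset($db);           // refcounting alone leaves both objects alive
gc_collect_cycles();  // PHP 5.3+: force the cycle collector to run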
Also, are both tables on the same database? If so, all you need might be a simple INSERT ... SELECT query, mapping columns and doing a bit of conversion on the fly (although the processing you need to perform might not be possible or feasible in SQL).
Other than that, you don't need that many connections. Just open one for the reader, one for the writer; prepare a statement on the writer, execute the reader query, fetch one row at a time (this is important: do not fetch them all at once) from the reader query, do the processing, stuff it in the prepared writer statement, rinse and repeat. PHP's memory usage should remain roughly constant after the first few rows.
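A sketch of that reader/writer pattern with mysqli; connection details, table and column names, and the convert() helper are all placeholders:
<?php
// One connection reads (unbuffered), one connection writes with a
// prepared statement; memory stays roughly constant.
$reader = new mysqli('localhost', 'user', 'pass', 'olddb');
$writer = new mysqli('localhost', 'user', 'pass', 'newdb');

// Prepare the INSERT once and bind the variables once.
$insert = $writer->prepare('INSERT INTO users_new (id, name, email) VALUES (?, ?, ?)');
$insert->bind_param('iss', $id, $name, $email);

// Stream the source rows so the whole result set is never in PHP memory.
$rows = $reader->query('SELECT id, name, email FROM users_old', MYSQLI_USE_RESULT);

while ($row = $rows->fetch_assoc()) {
    $id    = (int) $row['id'];
    $name  = convert($row['name']);  // per-row conversion goes here
    $email = $row['email'];
    $insert->execute();
}
$rows->free();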

Is using a PHP class to run MySQL queries like this bad?

I've been looking at some database (MySQL) wrapper classes. A lot of them work like this:
1) Run the SQL query
2) While fetching the associative MySQL array, they cycle through the results and add them to their own array
3) Then you would run the class like this below and cycle through its array
<?php
$database = new Database();
$result = $database->query('SELECT * FROM user;');
foreach ($result as $user) {
    echo $user->username;
}
?>
So my question here is: is this not good on a high-traffic type of site? I ask because, as far as I can tell, MySQL is returning an array which eats memory, then you are building a new array from that array and then cycling through the new array. Is this not good, or pretty much normal?
The short answer is: it's bad, very bad.
The trouble is you pay a nasty performance hit (cycles and memory!) by iterating over the results twice. (what if you have 1000 rows returned? you'd get all of the data, loop 1000 times, keep it all in memory, and then loop over it again).
If you refactor your class a bit, you can still wrap the query and the fetch, but you'll want to do the fetch_array outside the query. In this case, you can discard each row from memory as soon as you're done with it, so you don't need to store the entire result set, and you loop just one time.
IIRC, PHP won't build the entire MySQL result set as PHP arrays up front: when you call mysql_fetch_array you're asking for the next row in the set, which is built only when you ask for it, so you're not paying that memory hit for the full set (on the PHP side) just by running the original query. The whole result does get loaded into memory when you use mysql_query (thanks VolkerK), but you're still paying that CPU cost twice, and that could be a substantial penalty.
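A sketch of that refactoring: the wrapper below still hides mysqli, but instead of copying every row into its own array it yields rows one at a time via a generator (PHP 5.5+), so only one row is in memory per iteration. The class, table and connection details are made up for the example.
<?php
class Database {
    private $mysqli;

    public function __construct(mysqli $mysqli) {
        $this->mysqli = $mysqli;
    }

    // Yields one row at a time; nothing is accumulated in the class.
    public function query($sql) {
        $result = $this->mysqli->query($sql);
        while ($row = $result->fetch_object()) {
            yield $row;
        }
        $result->free();
    }
}

// Usage looks like the original snippet, but rows are streamed:
$database = new Database(new mysqli('localhost', 'user', 'pass', 'test'));
foreach ($database->query('SELECT username FROM user') as $user) {
    echo $user->username;
}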
The code is fine.
foreach() just moves the array pointer on each pass.
You can read all about it here:
http://php.net/foreach
For a deeper understanding, look at how pointers work in C:
http://home.netcom.com/~tjensen/ptr/ch2x.htm
Nothing is copied, iteration is almost always performed by incrementing pointers.
This kind of query is pretty much normal. It's better to fetch only a row at a time if you can, but for normal small datasets of the kind you'd get for small and paged queries, the extra memory utilisation isn't going to matter a jot.
SELECT * FROM user, however, could certainly generate an unhelpfully large dataset, if you have a lot of users and a lot of information in each user row. Try to keep the columns and number of rows selected down to the minimum, the information you're actually going to put on the page.
