Optimization with a Postgres cursor in PHP

I'm dealing with large volumes of data: a huge table on which I perform unions through a SQL statement from my PHP code, sending the results over to my own localhost database. I've got it working, but I want to optimize it. It had to run overnight to merge around 83,000 rows.
// DSN uses key=value pairs; the credentials are separate arguments
$con = new PDO("pgsql:host=$host;port=$port;dbname=$dbname", $user, $password);
$con->beginTransaction(); // cursors require a transaction
// $query declares the cursor, e.g. "DECLARE cursor1 CURSOR FOR SELECT ... UNION ..."
$stmt = $con->prepare($query);
$stmt->execute();
$innerStatement = $con->prepare("FETCH 1 FROM cursor1");
while ($innerStatement->execute() && $row = $innerStatement->fetch(PDO::FETCH_ASSOC)) {
    insertDataToDB($row);
}
Question: will changing the line to "FETCH 1000 FROM cursor1" make it so that I'm fetching 1000 rows each time instead of one? Will that help performance?
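In other words, something roughly like this (an untested sketch, reusing $con, cursor1 and insertDataToDB() from above; the batch size of 1000 is just an example):
// Fetch the cursor in batches of 1000 rows instead of one row per round trip
$batch = $con->prepare("FETCH 1000 FROM cursor1");
while ($batch->execute()) {
    $rows = $batch->fetchAll(PDO::FETCH_ASSOC);
    if (count($rows) === 0) {
        break; // cursor exhausted
    }
    foreach ($rows as $row) {
        insertDataToDB($row);
    }
}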
I'm hoping this larger operation was a one-time thing, but in the future I will have to move smaller amounts of data. Still, the query is rather heavy since it relies on comparisons with timestamps; otherwise, how would I know whether my DB is up to date?
Thank you.

Related

Multiple query execution or one query and nextRowset() to SELECT?

Which is more efficient for executing multiple queries:
this, using the nextRowset() function to move between the result sets,
$stmt = $db->query("SELECT 1; SELECT 2;");
$info1 = $stmt->fetchAll();
$stmt->nextRowset();
$info2 = $stmt->fetchAll();
or multiple separate executions, which are a lot easier to manage?
$info1 = $db->query("SELECT 1;")->fetchAll();
$info2 = $db->query("SELECT 2;")->fetchAll();
Performance of the code is likely to be similar.
The code at the bottom, to me, is more efficient for your software design because:
it is more readable
it can be changed with less chance of error, since each statement addresses one query only
each individual query and its handling can easily be moved to a separate function and tested individually
That's why I feel that overall efficiency (not just how fast data comes back from DB to PHP to the user, but also maintainability/refactoring of code) will be better with the code at the bottom.
"SQL injection" by a hacker is easier when you issue multiple statements at once. So, don't do it.
If you do need it regularly, write a Stored Procedure to perform all the steps via one CALL statement. That will return multiple "rowsets", so similar code will be needed.
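For illustration, consuming such a procedure from PDO could look roughly like this (a sketch; my_report is a hypothetical procedure that returns two result sets):
// Hypothetical stored procedure returning two result sets in one round trip
$stmt = $db->query("CALL my_report()");
$info1 = $stmt->fetchAll();
$stmt->nextRowset(); // advance to the second result set
$info2 = $stmt->fetchAll();
$stmt->closeCursor(); // free the connection for further queries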

PDO unbuffered query still waits until query result is complete

I have an SQL query which can return quite a lot results (something like 10k rows) but I cannot use the SQL LIMIT parameter, as I don't know the exact amount of needed rows (there's a special grouping done in PHP). So the plan was to stop fetching rows once I have enough.
Since PDO normally operates in buffered mode, which fetches the whole result set and passes it to PHP, I switched PDO to unbuffered mode with
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
Now I expected that executing the query should take about the same time no matter what LIMIT I pass. So basically
$result = $pdo->query($query);
$count = 0;
while ($row = $result->fetch()) {
    ++$count;
    if ($count > 10) break;
}
should execute in about the same time for
$query = 'SELECT * FROM myTable';
and
$query = 'SELECT * FROM myTable LIMIT 10';
However the first one takes 8 seconds whereas the second one executes instantly. So it seems like the unbuffered query also waits until the whole result set is fetched - which shouldn't be the case according to the documentation.
Is there any way to get the query result instantly in PHP with PDO and stop the query once I have enough results?
Database applications like "Sequel Pro SQL" can do this (I can hit cancel after 1 second and get the results that were already queried until that time) so it can't be a general problem with MySQL servers.
I can workaround the problem by choosing a very high LIMIT which always has enough valid results after my grouping. But since performance is an issue, I'd like to query only as many entries as really needed. Please don't suggest anything that involves grouping in MySQL, the terrible performance of that is the reason we have to change the behaviour.
Now I expected that executing the query should take about the same time no matter what LIMIT I pass. So basically
This might not be completely true. While you won't get the overhead of receiving all your results, they are all queried (without a limit)! You do get the advantage of keeping most of the results serverside until you need them, but your server actually does perform the whole query first as far as I know. I'm not sure how complicated your query is, but this could be the issue?
Say, for instance, you have a very slow join (not indexed), but only want the first 10 rows by id: with a LIMIT, the query will pick those 10 rows via the index and then only do the join for those 10. That will be quick.
But if you don't actually limit, and instead ask for the result in batches, the complete join will have to be done (slow!) and then the result set is released in parts.
A quicker method might be to repeat your limited query until you have your result. I know this will increase overhead, but it might be way quicker. The only way to know is to test.
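A rough sketch of that approach (assuming an indexed id column to page on; haveEnough() stands in for whatever PHP-side grouping decides when to stop, and the batch size of 1000 is arbitrary):
// Repeat a limited query, paging on id, until the PHP-side grouping has enough rows
$collected = array();
$lastId = 0;
$stmt = $pdo->prepare('SELECT * FROM myTable WHERE id > ? ORDER BY id LIMIT 1000');
do {
    $stmt->execute(array($lastId));
    $rows = $stmt->fetchAll();
    foreach ($rows as $row) {
        $collected[] = $row;
        $lastId = $row['id'];
    }
    // haveEnough() is a placeholder for the grouping logic that decides when to stop
} while ($rows && !haveEnough($collected));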
In response to your comment, this is from the manual:
Unbuffered MySQL queries execute the query and then return a resource while the data is still waiting on the MySQL server for being fetched.
So it executes the query. The complete query. So as I tried to explain above, it will not be as quick as the same query with a LIMIT 10, as it doesn't perform a partial query! The fact that a different DB engine does this does not mean MySQL can...
Have you tried using prepare/execute instead of query, and putting a $stmt->closeCursor(); call after the break?
$stmt = $dbh->prepare($query);
$stmt->execute();
$count = 0;
while ($row = $stmt->fetch()) {
    ++$count;
    if ($count > 10) break;
}
$stmt->closeCursor();

PHP General Principles: is one big SQL call better or lots of little ones

This is an optimisation question regarding first principles. Imagine I am doing a big, heavy-lifting comparison: 30k files vs 30k database entries. Is it more efficient to do one big MySQL query into an array and then loop through the physical files checking against the array, or is it better to loop through the files and do single-row MySQL calls one at a time?
Here is some pseudo code to help explain:
//is this faster?
foreach ($recursiveFileList as $fullpath) {
    $Record = $db->queryrow("SELECT * FROM files WHERE fullpath='" . $fullpath . "'");
    //do some $Record logic
}
//or is this faster
$BigList = array();
$db->query("SELECT * FROM files");
while ($Record = $db->rows()) {
    $BigList[$Record['fullpath']] = $Record;
}
foreach ($recursiveFileList as $fullpath) {
    if (isset($BigList[$fullpath])) {
        $Record = $BigList[$fullpath];
        //do some $Record logic
    }
}
Update: if you always know that your $recursiveFileList is 100% of the table, then doing one query per row would be needless overhead. In that case, just use SELECT * FROM files.
I wouldn't use either of the two styles you show.
The first style runs one separate SQL query for each individual fullpath. This causes some overhead for SQL parsing, optimization, etc. Keep in mind that MySQL does not have the capability of remembering the query optimization from one invocation of a similar query to the next; it analyzes and optimizes the query every time. The overhead is relatively small, but it adds up.
The second style shows fetching all rows from the table, and sorting it out in the application layer. This has a lot of overhead, because typically your $recursiveFileList might match only 1% or 0.1% or an even smaller portion of the rows in the table. I have seen cases where transferring excessive amounts of data over the network literally exhausted a 1Gbps network switch, and this put a ceiling on the requests per second for the application.
Use query conditions and indexes wisely to let the RDBMS examine and return only the matching rows.
The two styles you show are not the only options. What I would suggest is to use a range query to match multiple file fullpath values in a single query.
$sql = "SELECT * FROM files WHERE fullpath IN ("
. array_fill(0, count($recursiveFileList), "?") . ")";
$stmt = $pdo->prepare($sql);
$stmt->execute($recursiveFileList);
while ($row = $stmt->fetch()) {
//do some $Record logic
}
Note I also use a prepared query with ? parameter placeholders, and then pass the array of fullpath values separately when I call execute(). PDO is nice for this, because you can just pass an array, and the array elements get matched up to the parameter placeholders.
This also solves the risk of SQL injection in this case.

Is my method of fetching MySql data using prepared statements inefficient and taxing on my server?

I was informed by someone senior in our company today that the PHP code I have written for performing prepared statements on a MySQL database is "inefficient" and "too taxing on our server". Since then I find myself in the difficult position of trying to understand what he meant and how to fix it. I will have no contact with said person for four days, so I am asking other developers what they think of my code and whether there are any areas that might be causing bottlenecks or performance issues on the server.
My code runs and returns the results of my query in the variable $data, so technically it works. There is another question, though, as to whether it is efficient and well written. Any advice as to what that senior employee meant or was referring to? Here is the method I use to connect to and query our databases.
(Please note, when I use the word method I do not mean a method inside a class. What I mean to say is this how I write/structure my code when I connect and query our databases.)
<?php
// Create database object and connect to database
$mysqli = new mysqli();
$mysqli->real_connect($hostname, $username, $password, $database);
// Create statement object
$stmt = $mysqli->stmt_init();
// Prepare the query and bind params
$stmt->prepare('SELECT `col` FROM `table` WHERE `col` > ?');
$stmt->bind_param('i', $var1);
// Execute the query
$stmt->execute();
// Store result
$stmt->store_result();
// Prepare for fetching result
$rslt = array();
$stmt->bind_result($rslt['col']);
// Fetch result and save to array
$data = array();
while ($stmt->fetch()) {
    foreach ($rslt as $key => $value) {
        $row[$key] = $value;
    }
    $data[] = $row;
}
// Free result
$stmt->free_result();
// Close connections
$stmt->close();
$mysqli->close();
?>
Any advice or suggestions are useful, please do contribute and help out even if you are only guessing. Thanks in advance :)
There are two types of code that may be inefficient, the PHP code and the SQL code, or both.
For example, the SQL is a problem if the `col` column isn't indexed. Without an index on `col`, the query shown has to scan every row in the table, which puts a lot of load on the database. Also, if the value passed in isn't very selective, many rows will have to be examined, perhaps all of them, since MySQL will choose a table scan over an index scan when a large fraction of the rows would match. You will need to become familiar with the MySQL EXPLAIN feature to diagnose your queries, or add indexes to the database to support them.
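As an illustration (a sketch only; it reuses the `table`/`col` placeholders from the code above, 100 is an arbitrary value, and idx_col is an assumed index name):
// Ask MySQL how it plans to execute the query; look at the "key" and "rows" columns
$result = $mysqli->query('EXPLAIN SELECT `col` FROM `table` WHERE `col` > 100');
print_r($result->fetch_assoc());

// If no index is used, add one (run once, not on every request):
$mysqli->query('ALTER TABLE `table` ADD INDEX idx_col (`col`)');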
The PHP would be a problem if you followed something like the pattern:
select invoice_id from invoices where customer_id = ?
for each invoice_id
select * from line_items where invoice_id = ?
That kind of pattern will lead to "over querying" the database, which puts extra load on it. Instead use a join:
select li.* from invoices i join line_items li using (invoice_id)
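In PHP, that could look roughly like this (a sketch; $customerId is a hypothetical variable, the table and column names follow the pseudocode above, and mysqli_stmt::get_result() requires the mysqlnd driver):
// One JOIN query instead of one query per invoice_id
$stmt = $mysqli->prepare(
    'SELECT li.* FROM invoices i JOIN line_items li USING (invoice_id) WHERE i.customer_id = ?'
);
$stmt->bind_param('i', $customerId);
$stmt->execute();
$result = $stmt->get_result();
while ($row = $result->fetch_assoc()) {
    // process each line item
}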
Ask your database administrator to turn on the slow query log and then process it with pt-query-digest.
You can use pt-query-digest to report on queries that are expensive (take a long time to execute), and also to report by frequency to detect over-querying.

Should I rate-limit or reduce my database queries?

I'm creating a PHP script that imports some data from text files into a MySQL database. These text files are pretty large: an average file will have 10,000 lines in it, each of which corresponds to a new item I want in my database. (I won't be importing files very often.)
I'm worried that reading a line from the file and then doing an INSERT query, 10,000 times in a row, might cause some issues. Is there a better way for me to do this? Should I perform one INSERT query with all 10,000 values? Or would that be just as bad?
Maybe I can strike a happy medium and insert something like 10 or 100 entries at once. Really, my problem is that I don't know what good practice is. Maybe 10,000 queries in a row is fine and I'm worrying over nothing.
Any suggestions?
Yes, one big INSERT with all the values is better:
<?php
// Read the file into an array, stripping trailing newlines and skipping empty lines
$lines = file('file.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$count = count($lines);
$i = 0;
$query = "INSERT INTO table VALUES ";
foreach ($lines as $line) {
    $i++;
    // NOTE: escape $line (e.g. with mysqli::real_escape_string) before embedding
    // it in the query if the file contents are not fully trusted
    if ($count == $i) {
        $query .= "('" . $line . "')";
    } else {
        $query .= "('" . $line . "'),";
    }
}
echo $query;
http://sandbox.phpcode.eu/g/5ade4.php
This builds one single query, which is many times faster than the one-line-one-query style!
Use prepared statements, as suggested by the authors of High Performance MySQL. They save a lot of time (avoiding wasteful protocol overhead and the conversion of values to SQL text).
I would do it in one large query with all the values at once. Just to be sure, though, make sure you run START TRANSACTION; before and COMMIT; afterwards, so that if something goes wrong during the execution of the query (which is possible, since it will most likely run for a fairly long time), the database will not be affected.
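A sketch combining both suggestions, using a reusable prepared statement inside one transaction (the DSN variables, the single `col` column and file.txt are assumptions carried over from the snippets above):
// Insert every line with one reused prepared statement, wrapped in a transaction
$pdo = new PDO("mysql:host=$host;dbname=$dbname", $user, $password);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$lines = file('file.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$pdo->beginTransaction();
$stmt = $pdo->prepare('INSERT INTO `table` (`col`) VALUES (?)');
try {
    foreach ($lines as $line) {
        $stmt->execute(array($line));
    }
    $pdo->commit();   // all rows become visible at once
} catch (Exception $e) {
    $pdo->rollBack(); // nothing is written if any insert fails
    throw $e;
}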
