Is it bad to put a MySQL query in a PHP loop?

I often have large arrays, or large amounts of dynamic data in PHP that I need to run MySQL queries to handle.
Is there a better way to run many processes like INSERT or UPDATE without looping through the information to be INSERT-ed or UPDATE-ed?
Example (I didn't use a prepared statement, for brevity's sake):
$myArray = array('apple', 'orange', 'grape');
foreach ($myArray as $arrayFruit) {
    $query = "INSERT INTO `Fruits` (`FruitName`) VALUES ('" . $arrayFruit . "')";
    mysql_query($query, $connection);
}

OPTION 1
You can batch multiple statements into one string, although note that the classic mysql_query() refuses more than one statement per call, so this route needs an API that supports multi-statements, such as mysqli_multi_query().
$queries = '';
foreach ($myArray as $arrayFruit) {
    $queries .= "INSERT INTO `Fruits` (`FruitName`) VALUES ('" . $arrayFruit . "');"; // notice the semicolon
}
// mysql_query($queries, $connection); // this would fail: mysql_query() runs one statement only
This would save on round trips to the server.
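For that route to actually work you need the mysqli API; here is a minimal sketch (connection details are hypothetical), remembering to drain the queued results:
// hypothetical credentials; assumes the mysqli extension
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

$queries = '';
foreach (array('apple', 'orange', 'grape') as $fruit) {
    $queries .= "INSERT INTO `Fruits` (`FruitName`) VALUES ('"
              . $mysqli->real_escape_string($fruit) . "');";
}

if ($mysqli->multi_query($queries)) {
    // INSERTs return no result sets, but each statement's status
    // must still be consumed before the connection is reusable
    while ($mysqli->more_results() && $mysqli->next_result()) {
    }
}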
OPTION 2
If your insert is that simple for the same table, you can do multiple inserts in ONE query
$fruits = "('".implode("'), ('", $fruitsArray)."')";
mysql_query("INSERT INTO Fruits (Fruit) VALUES $fruits", $connection);
The query ends up looking something like this:
$query = "INSERT INTO Fruits (Fruit)
VALUES
('Apple'),
('Pear'),
('Banana')";
This is probably the way you want to go.
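Since both snippets above interpolate the values straight into the SQL, here is a minimal placeholder-based sketch of the same multi-row INSERT, assuming a PDO connection ($pdo is hypothetical):
$fruits = array('Apple', 'Pear', 'Banana');

// one "(?)" group per row keeps the values parameterized
$placeholders = implode(', ', array_fill(0, count($fruits), '(?)'));

$stmt = $pdo->prepare("INSERT INTO Fruits (Fruit) VALUES $placeholders");
$stmt->execute($fruits); // values are bound positionally, in order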

If you have the mysqli class, you can iterate over the values to insert using a prepared statement.
$sth = $dbh->prepare("INSERT INTO Fruits (Fruit) VALUES (?)");
$sth->bind_param('s', $fruit); // bind_param() binds by reference, so binding once is enough
foreach ($fruits as $fruit)
{
    $sth->execute(); // each execute() reads the current value of $fruit
}

One thing to note about your original solution, compared to the implosion method of jerebear (which I have used before, and love): it is easier to read. The implosion takes more programmer brain cycles to understand, which can be more expensive than processor cycles. Premature optimisation, blah, blah, blah... :)

One thing to note about jerebear's answer with multiple VALUE-blocks in one INSERT:
It can be rather dangerous for really large amounts of data, because most DBMSs have an upper limit on the size of the statements they can handle. If you exceed that with too many VALUE blocks, your insert will fail. On MySQL the limit is governed by the max_allowed_packet setting, which defaults to 1MB on older versions AFAIK.
So you should figure out what the maximum size is (ideally at runtime; it might be available from the database metadata), and make sure you don't exceed it by spreading your lists of values over several INSERTs.
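For MySQL, that limit can indeed be read at runtime from the max_allowed_packet server variable; a small sketch (connection details hypothetical):
$pdo = new PDO("mysql:host=localhost;dbname=mydb", 'user', 'pass');

// the server's cap on a single packet/statement, in bytes
$maxPacket = (int) $pdo->query("SELECT @@max_allowed_packet")->fetchColumn();

// stay well under the cap rather than aiming for the exact limit
$batchBudget = (int) ($maxPacket * 0.8);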

I was inspired by jerebear's answer to build something like his second option for one of my current projects. Because of the sheer volume of records I couldn't save all the data in one go. So I built this to do imports. You add your data, and then call a method when each record is done. After a certain, configurable, number of records, the data in memory is saved with a mass insert like jerebear's second option.
// CREATE TABLE example ( Id INT, Field1 INT, Field2 INT, Field3 INT);
$import = new DataImport($dbh, 'example', 'Id, Field1, Field2, Field3');
foreach ($whatever as $row) {
    // add data in the order of your column definition
    $import->addValue($Id);
    $import->addValue($Field1);
    $import->addValue($Field2);
    $import->addValue($Field3);
    $import->nextRow();
}
$import->lastRow();
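The DataImport class itself wasn't included; here is a minimal sketch of what it could look like, assuming a PDO handle, buffering rows and flushing every $batchSize rows with one multi-row INSERT (all internals here are guesses, not the original code):
class DataImport
{
    private $dbh;
    private $table;
    private $columns;
    private $columnCount;
    private $row = array();
    private $rows = array();
    private $batchSize;

    public function __construct(PDO $dbh, $table, $columns, $batchSize = 500)
    {
        $this->dbh = $dbh;
        $this->table = $table;        // trusted identifiers, not user input
        $this->columns = $columns;
        $this->columnCount = count(explode(',', $columns));
        $this->batchSize = $batchSize;
    }

    public function addValue($value)
    {
        $this->row[] = $value;
    }

    public function nextRow()
    {
        $this->rows[] = $this->row;
        $this->row = array();
        if (count($this->rows) >= $this->batchSize) {
            $this->flush();
        }
    }

    public function lastRow()
    {
        if ($this->row) {
            $this->nextRow(); // push any partially built row
        }
        $this->flush();
    }

    private function flush()
    {
        if (!$this->rows) {
            return;
        }
        // one "(?, ?, ...)" group per buffered row
        $group = '(' . implode(', ', array_fill(0, $this->columnCount, '?')) . ')';
        $placeholders = implode(', ', array_fill(0, count($this->rows), $group));

        $params = array();
        foreach ($this->rows as $row) {
            $params = array_merge($params, $row);
        }

        $stmt = $this->dbh->prepare(
            "INSERT INTO {$this->table} ({$this->columns}) VALUES $placeholders"
        );
        $stmt->execute($params);
        $this->rows = array();
    }
}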


MySQL and PDO, speed up query and get result/output from MySQL function (routine)?

Getting the Value:
I've got the levenshtein_ratio function, from here, loaded into my MySQL database. I run it in the following way:
$stmt = $db->prepare("SELECT r_id, val FROM table WHERE levenshtein_ratio(:input, someval) > 70");
$stmt->execute(array('input' => $input));
$result = $stmt->fetchAll();
if (count($result)) {
    foreach ($result as $row) {
        $out .= $row['r_id'] . ', ' . $row['val'];
    }
}
And it works a treat, exactly as expected. But I was wondering, is there a nice way to also get the value that levenshtein_ratio() calculates?
I've tried:
$stmt = $db->prepare("SELECT levenshtein_ratio(:input, someval), r_id, val FROM table WHERE levenshtein_ratio(:input, someval) > 70");
$stmt->execute(array('input' => $input));
$result = $stmt->fetchAll();
if (count($result)) {
    foreach ($result as $row) {
        $out .= $row['r_id'] . ', ' . $row['val'] . ', ' . $row[0];
    }
}
and it does technically work (I get the percentage from the $row[0]), but the query is a bit ugly, and I can't use a proper key to get the value, like I can for the other two items.
Is there a way to somehow get a nice reference for it?
I tried:
$stmt = $db->prepare("SELECT r_id, val SET output=levenshtein_ratio(:input, someval) FROM table WHERE levenshtein_ratio(:input, someval) > 70");
modelling it after something I found online, but it didn't work, and ends up ruining the whole query.
Speeding It Up:
I'm running this query for an array of values:
foreach($parent as $input){
$stmt = ...
$stmt->execute...
$result = $stmt->fetchAll();
... etc
}
But it ends up being remarkably slow. Like 20s slow, for an array of only 14 inputs and a DB with about 350 rows, which is expected to be in the 10,000's soon. I know that putting queries inside loops is naughty business, but I'm not sure how else to get around it.
EDIT 1
When I use
$stmt = $db->prepare("SELECT r_id, val SET output=levenshtein_ratio(:input, someval) FROM table WHERE levenshtein_ratio(:input, someval) > 70");
surely that's costing twice the time as if I only calculated it once? Similar to having $i < sizeof($arr); in a for loop?
To clean up the column names you can use AS to alias the function's result column. At the same time you can speed things up by referring to that alias in the HAVING clause, so the function is only executed once.
$stmt = $db->prepare("SELECT r_id, levenshtein_ratio(:input, someval) AS val FROM table HAVING val > 70");
If it is still too slow, you might consider a C library (a MySQL UDF) like https://github.com/juanmirocks/Levenshtein-MySQL-UDF
doh - forgot to switch "where" to "having", as spencer7593 noted.
I'm assuming that `someval` is an unqualified reference to a column in the table. You may understand that without looking at the table definition, but someone else reading the SQL statement can't tell. As an aid to future readers, consider qualifying your column references with the name of the table or (preferably) a short alias assigned to the table in the statement.
SELECT t.r_id
, t.val
FROM `table` t
WHERE levenshtein_ratio(:input, t.someval) > 70
That function in the WHERE clause has to be evaluated for every row in the table. There's no way to get MySQL to build an index on that. So there's no way to get MySQL to perform an index range scan operation.
It might be possible to get MySQL to use an index for the query, for example, if the query had an ORDER BY t.val clause, or if there is a "covering index" available.
But that doesn't get around the issue of needing to evaluate the function for every row. (If the query had other predicates that excluded rows, then the function wouldn't necessarily need to be evaluated for the excluded rows.)
Adding the expression to the SELECT list really shouldn't be too expensive if the function is declared to be DETERMINISTIC. A second call to a DETERMINISTIC function with the same arguments can reuse the value returned for the previous execution. (Declaring a function DETERMINISTIC essentially means that the function is guaranteed to return the same result when given the same argument values; the return value depends only on the argument values, and doesn't depend on anything else.)
SELECT t.r_id
, t.val
, levenshtein_ratio(:input, t.someval) AS lev_ratio
FROM `table` t
WHERE levenshtein_ratio(:input2, t.someval) > 70
(Note: I used a distinct bind placeholder name for the second reference because PDO doesn't handle "duplicate" bind placeholder names as we'd expect. It's possible that this has been corrected in more recent versions of PDO; the first "fix" for the issue was an update to the documentation, noting that a bind placeholder name should appear only once in a statement, and that if you need two references to the same value, you should use two different placeholder names and bind the same value to both.)
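On the PHP side that workaround is just binding the same value under both names; a quick sketch:
$stmt = $db->prepare(
    "SELECT t.r_id, t.val, levenshtein_ratio(:input, t.someval) AS lev_ratio
     FROM `table` t
     WHERE levenshtein_ratio(:input2, t.someval) > 70"
);
// same value, two placeholder names (see the note above)
$stmt->execute(array('input' => $input, 'input2' => $input));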
If you don't want to repeat the expression, you could move the condition from the WHERE clause to the HAVING, and refer to the expression in the SELECT list by the alias assigned to the column.
SELECT t.r_id
, t.val
, levenshtein_ratio(:input, t.someval) AS lev_ratio
FROM `table` t
HAVING lev_ratio > 70
The big difference between WHERE and HAVING is that the predicates in the WHERE clause are evaluated when the rows are accessed. The HAVING clause is evaluated much later, after the rows have been accessed. (That's a brief explanation of why the HAVING clause can reference columns in the SELECT list by their alias, but the WHERE clause can't do that.)
If that's a large table, and a large number of rows are being excluded, there might be a significant performance difference using the HAVING clause: a much larger intermediate set may be created.
To get an "index used" for the query, a covering index is the only option I see.
ON `table` (r_id, val, someval)
With that, MySQL can satisfy the query from the index, without needing to look up pages in the underlying table. All of the column values the query needs are available from the index.
FOLLOWUP
To get an index created, we would need to add a column, e.g.
ALTER TABLE `table` ADD COLUMN lev_ratio_foo FLOAT;
and pre-populate with the result from the function
UPDATE `table` t
SET t.lev_ratio_foo = levenshtein_ratio('foo', t.someval)
;
Then we could create an index, e.g. (again, the name is illustrative)
CREATE INDEX table_lev_ix ON `table` (lev_ratio_foo, val, r_id);
And re-write the query
SELECT t.r_id
, t.val
, t.lev_ratio_foo
FROM `table` t
WHERE t.lev_ratio_foo > 70
With that query, MySQL can make use of an index range scan operation on an index with lev_ratio_foo as the leading column.
Likely, we would want to add BEFORE INSERT and BEFORE UPDATE triggers to maintain the value, when a new row is added to the table, or the value of the someval column is modified.
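As a sketch, the BEFORE INSERT trigger for the 'foo' column could be created once from PHP like this (the trigger name is made up; assumes the PDO handle $db):
$db->exec("
    CREATE TRIGGER table_lev_bi
    BEFORE INSERT ON `table`
    FOR EACH ROW
    SET NEW.lev_ratio_foo = levenshtein_ratio('foo', NEW.someval)
");
// a matching BEFORE UPDATE trigger would keep the column correct
// whenever someval changes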
That pattern could be extended, additional columns could be added for values other than 'foo'. e.g. 'bar'
UPDATE `table` t
SET t.lev_ratio_bar = levenshtein_ratio('bar', t.someval)
Obviously that approach isn't going to be scalable for a broad range of input values.

PDO two similar queries - only the first inserts

I have a form where the user can insert up to five line items for an invoice. The easiest way for me to do this is to just do five inserts, with an isset() check before each query. The problem is that if I run the two queries below one after another, only the first one inserts the data. I know I can combine them into one PDO query (and that does in fact work), but it does not suit my needs. The second query never inserts.
// Connect to the database
$conn = new PDO("mysql:host=$DB_HOST;dbname=$DB_DATABASE",$DB_USER,$DB_PASSWORD);
//Set all the data here
$receiptid = $_POST['receiptid'];
// .. the rest of the POST data gets set here.
//Insert first line item
$sql = "INSERT INTO lineitems (receiptid, service, description, quantity, unitprice, linetotal)
VALUES (:receiptid, :service, :description, :quantity, :unitprice, :linetotal)";
$q = $conn->prepare($sql);
$q->execute(array(':receiptid' => $receiptid,
                  ':service' => $service,
                  ':description' => $description,
                  ':quantity' => $quantity,
                  ':unitprice' => $unitprice,
                  ':linetotal' => $linetotal));
//Insert second line item
$sql = "INSERT INTO lineitems (receiptid, service2, description2, quantity2, unitprice2, linetotal2)
VALUES (:receiptid, :service2, :description2, :quantity2, :unitprice2, :linetotal2)";
$q = $conn->prepare($sql);
$q->execute(array(':receiptid' => $receiptid,
                  ':service2' => $service2,
                  ':description2' => $description2,
                  ':quantity2' => $quantity2,
                  ':unitprice2' => $unitprice2,
                  ':linetotal2' => $linetotal2));
Does your table really have different columns for each entered line item number (i.e. service2, description2, etc.)?
Perhaps you need to change the field names in your second insert to match those in the first.
If you were properly handling cases where you don't get the expected query result (i.e. checking the result of execute() and looking at the errors when something fails), you would be able to get to the source of the problem in a hurry.
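For instance, a minimal sketch with PDO's exception mode turned on, so a failed INSERT is loud instead of silent:
// enable exceptions so errors can't pass unnoticed
$conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

try {
    $q = $conn->prepare($sql); // the second INSERT from above
    $q->execute(array(':receiptid' => $receiptid,
                      ':service2' => $service2,
                      ':description2' => $description2,
                      ':quantity2' => $quantity2,
                      ':unitprice2' => $unitprice2,
                      ':linetotal2' => $linetotal2));
} catch (PDOException $e) {
    // e.g. "Unknown column 'service2' in 'field list'" would surface here
    echo 'Insert failed: ' . $e->getMessage();
}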

Downloading Large Data Sets -> Text to MySQL or just to MySQL?

I'm downloading large sets of data via an XML Query through PHP with the following scenario:
- Query for records 1-1000 and download all parts (1000 parts is roughly 4.5 MB of text), then store those in memory while I query the next 1001-2000, storing those in memory as well (up to potentially 400k).
I'm wondering whether it would be better to write these entries to a text file rather than holding them in memory, and try to insert them all into the DB once the complete download is done, or to write them to the DB as they come in.
Any suggestions would be greatly appreciated.
Cheers
You can run a query like this:
INSERT INTO table (id, text)
VALUES (null, 'foo'), (null, 'bar'), ..., (null, 'value no 1000');
Doing this you do the whole thing in one shot, and the parser is called only once. The best thing you can do is benchmark it yourself, e.g. timing a query that inserts 1000 records run 1000 times against 1,000,000 single-record inserts.
(Sorry about the prev. answer, I've misunderstood the question).
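A rough timing harness for that comparison might look like this sketch (table, columns, and connection details are all hypothetical):
$pdo = new PDO("mysql:host=localhost;dbname=test", 'user', 'pass');

// single-row inserts: 100,000 statements
$start = microtime(true);
$stmt = $pdo->prepare("INSERT INTO t (id, text) VALUES (NULL, ?)");
for ($i = 0; $i < 100000; $i++) {
    $stmt->execute(array("value $i"));
}
$single = microtime(true) - $start;

// multi-row inserts: 100 statements of 1000 rows each
$start = microtime(true);
for ($i = 0; $i < 100; $i++) {
    $rows = array();
    for ($j = 0; $j < 1000; $j++) {
        $rows[] = "(NULL, 'value $i-$j')";
    }
    $pdo->exec("INSERT INTO t (id, text) VALUES " . implode(',', $rows));
}
$multi = microtime(true) - $start;

printf("single: %.2fs  multi: %.2fs\n", $single, $multi);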
I think you should write them to the database as soon as you receive them. This saves memory, and you won't have to execute one enormous, much slower query at the end. You will need a mechanism to deal with any problems that may occur in this process, like a disconnection after 399k results.
In my experience it would be better to download everything in a temporary area and then, when you are sure that everything went well, to move the data (or the files) in place.
As you are using a database you may want to dump everything into a table, something like this code:
$error = false;
while (($row = getNextRow($db)) && !$error) {
    // `key` is a reserved word, hence the backticks; escape real-world values
    $sql = "INSERT INTO temptable (`key`, `value`) VALUES ('"
         . mysql_real_escape_string($row[0]) . "', '"
         . mysql_real_escape_string($row[1]) . "')";
    if (mysql_query($sql)) {
        echo '#';
    } else {
        $error = true;
    }
}
if (!$error) {
    $sql = "INSERT INTO myTable SELECT * FROM temptable";
    if (mysql_query($sql)) {
        echo 'Finished';
    } else {
        echo 'Error';
    }
}
Alternatively, if you know the table well, you can add a "new" flag field for newly inserted lines and update everything when you are finished.
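A minimal sketch of that flag-based variant (column and table names are illustrative, in the same mysql_* style as the code above):
// one-time schema change: add the flag column
mysql_query("ALTER TABLE myTable ADD COLUMN is_new TINYINT NOT NULL DEFAULT 0");

// inserts during the download mark rows as new
mysql_query("INSERT INTO myTable (`key`, `value`, is_new) VALUES ('k', 'v', 1)");

// once the whole download succeeded, clear the flag
mysql_query("UPDATE myTable SET is_new = 0");

// if something went wrong instead, throw the partial batch away
// mysql_query("DELETE FROM myTable WHERE is_new = 1");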

PDO: check tags presence in db and then insert

I'm working with a PDO connection to MySQL and I'd like some opinions on a query I use to check whether tags are present in the database, and to add them when they aren't.
// the tags are already processed in the $tags array
$check_stmt = $connection->prepare("SELECT * FROM tags WHERE tag_name = :tag_name");
$save_stmt = $connection->prepare("INSERT INTO tags (tag_name) VALUES (:tag_name)");
foreach ($tags as $current_tag) {
    $check_stmt->bindParam(':tag_name', $current_tag, PDO::PARAM_STR, 32);
    $save_stmt->bindParam(':tag_name', $current_tag, PDO::PARAM_STR, 32);
    $check_stmt->execute(); // no argument: the value is already bound
    if ($check_stmt->rowCount() == 0) $save_stmt->execute();
}
I'm not skilled with databases, so I'm not sure whether the query is well designed.
I'd adjust your selection query a bit, to optimize:
SELECT 1 AS found FROM tags WHERE tag_name = :tag_name LIMIT 1
SELECTing * transmits much more data (all fields in matching records) from the DB to your app than is necessary. Selecting only the fields you need is much more efficient, and in this case it looks like you're just checking for existence, so you don't need any record data at all; hence the SELECT 1.
The LIMIT 1 limits the query results to one record, instead of all matching ones. Quicker query execution and again less data transfer.
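Wired into your loop, the tightened check could look like this sketch; it uses fetchColumn() rather than rowCount(), since rowCount()'s behavior on SELECTs isn't guaranteed by PDO:
$check_stmt = $connection->prepare("SELECT 1 FROM tags WHERE tag_name = :tag_name LIMIT 1");
$save_stmt  = $connection->prepare("INSERT INTO tags (tag_name) VALUES (:tag_name)");

foreach ($tags as $current_tag) {
    $check_stmt->execute(array(':tag_name' => $current_tag));
    if ($check_stmt->fetchColumn() === false) { // no row found
        $save_stmt->execute(array(':tag_name' => $current_tag));
    }
}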
Some dirtier MySQL-specific options include simply using REPLACE INTO (don't) or the IGNORE keyword in combination with INSERT (suggested). The INSERT IGNORE syntax will be slightly faster than executing your SELECT separately.
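A sketch of the INSERT IGNORE route; note that it only works if tag_name has a UNIQUE index (assumed here), otherwise there are no duplicates for IGNORE to suppress:
// one-time: duplicates are only rejected if the column is UNIQUE
$connection->exec("ALTER TABLE tags ADD UNIQUE (tag_name)");

$save_stmt = $connection->prepare("INSERT IGNORE INTO tags (tag_name) VALUES (:tag_name)");
foreach ($tags as $current_tag) {
    $save_stmt->execute(array(':tag_name' => $current_tag)); // duplicates silently skipped
}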

Batch processing in array using PHP

I have thousands of records in an array parsed from XML. My concern is the processing time of my script: does having a hundred thousand records to insert into the database affect it? Is there a way to process the insertion of the data into the database in batches?
Syntax is:
INSERT INTO tablename (fld1, fld2) VALUES (val1, val2), (val3, val4)... ;
So you can write something like this (dummy example; note the values are quoted and escaped):
foreach ($data as $key => $value) {
    $data[$key] = "('" . mysql_real_escape_string($value[0]) . "', '"
                . mysql_real_escape_string($value[1]) . "')";
}
$query = "INSERT INTO tablename (fld1, fld2) VALUES " . implode(',', $data);
This works quite fast even on huge datasets; don't worry about performance as long as your dataset fits in memory.
This is for SQL files, but you can follow its model (if not just use it):
It splits the file up into parts of a size you specify, say 3000 lines, and then inserts them on a timed interval (< 1 second to 1 minute or more).
This way a large file is broken into smaller inserts, etc.
This will help bypass editing the PHP server configuration and worrying about memory limits, script execution time, and the like.
The script is BigDump: http://www.ozerov.de/bigdump.php
So you could even, theoretically, modify the above script to accept your array as the data source instead of the SQL file. It would obviously take some modification.
Hope it helps.
-R
It's unlikely to affect the processing time, but you'll need to ensure the DB's transaction logs are big enough to build a rollback segment for 100k rows.
Or with the ADOdb wrapper (http://adodb.sourceforge.net/):
// assuming you have your data in a form like this:
$params = array(
    array("key1", "val1"),
    array("key2", "val2"),
    array("key3", "val3"),
    // etc...
);
// you can do this:
$sql = "INSERT INTO `tablename` (`key`,`val`) VALUES ( ?, ? )";
$db->Execute($sql, $params);
Have you thought about array_chunk? It worked for me in another project
http://www.php.net/manual/en/function.array-chunk.php
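For example, here is a sketch of chunked inserts built on array_chunk ($records, the table, and the column names are illustrative; mysql_* style to match the other answers):
$batches = array_chunk($records, 500); // 500 rows per INSERT

foreach ($batches as $batch) {
    $values = array();
    foreach ($batch as $row) {
        $values[] = "('" . mysql_real_escape_string($row[0]) . "', '"
                  . mysql_real_escape_string($row[1]) . "')";
    }
    mysql_query("INSERT INTO tablename (fld1, fld2) VALUES " . implode(',', $values));
}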
