I have a mysql table with a lot of data in it. All of the rows in this table need to have one field modified in a way that is not easily expressed in pure SQL.
I'd like to be able to loop over the table row by row, and update all the entries one by one.
However, to do this I would do something like:
$sql = "SELECT id,value FROM objects";
foreach ($dbh->query($sql) as $row)
{
    $value = update_value($row['value']);
    $id = $row['id'];
    $update_sql = "UPDATE objects SET value='$value' WHERE id=$id";
    $dbh->query($update_sql);
}
Will this do something bad? (Other than potentially being slow?)
Clarification: In particular, I'm worried about the first select using a cursor (rather than retrieving all the data in one hit before the foreach starts), and about some cursor-invalidation rule I don't know about being triggered by the update inside the loop. If there is some rule like "don't update the same table while scanning it with another cursor", it will likely only show up on huge tables, so a small test case on my part is pretty much useless.
If someone can point me to docs that say doing this is OK, rather than a particular problem with working this way, that'd also be great.
The results of a single query are consistent, so updates won't affect them. Things to keep in mind:
Use prepared statements; this will reduce the traffic between your process and the database, because only the values are transferred each time instead of a whole query.
If you're worried about other processes running at the same time, you should use transactions and proper locking, e.g.
// transaction started
SELECT id,value
FROM objects
LOCK IN SHARE MODE
// your other code
// commit transaction
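The row-by-row update with a reused prepared statement can be sketched as follows. This is an illustrative sketch using Python's sqlite3 so it runs standalone; in PDO the equivalents are $dbh->prepare() and $stmt->execute(), and update_value here is a made-up stand-in for the transformation from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO objects (id, value) VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

def update_value(value):
    # stand-in for the transformation that's hard to express in SQL
    return value.upper()

# fetch the whole result set first, then reuse one prepared
# statement for every row instead of building a new query string
rows = conn.execute("SELECT id, value FROM objects").fetchall()
for row_id, value in rows:
    conn.execute("UPDATE objects SET value = ? WHERE id = ?",
                 (update_value(value), row_id))
conn.commit()

print([v for (v,) in conn.execute("SELECT value FROM objects ORDER BY id")])
```

Because the full result set is fetched before the update loop starts, there is no open cursor to invalidate.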
Seems like you have two options right out of the gate:
(straightforward): use something like fetchAll to get all the results of the first query before you start looping through them. This will keep you from overlapping cursors.
(more obscure): move the logic of update_value into a stored function, so you can collapse the two queries into a single UPDATE objects SET value = some_function(id).
Depending on the size and duration of this you may need to lock everything beforehand.
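The second option, collapsing both queries into one UPDATE that calls a function, looks roughly like this. In MySQL you would need an actual stored function (CREATE FUNCTION); the sketch below uses Python's sqlite3, which lets you register a client-side function as an SQL function, and update_value is an invented transformation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO objects (id, value) VALUES (?, ?)",
                 [(1, "foo"), (2, "bar")])

def update_value(value):
    # the per-row transformation, now callable from inside SQL
    return value[::-1]

conn.create_function("update_value", 1, update_value)

# the SELECT loop and the per-row UPDATEs collapse into one statement
conn.execute("UPDATE objects SET value = update_value(value)")
conn.commit()

print(conn.execute("SELECT value FROM objects ORDER BY id").fetchall())
```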
Related
For example, I have a table tbl_book, with 100 records or more and several columns such as book_name, book_publisher, book_author and book_rate, in a MySQL database db_bookshop. Now I would like to fetch them all with one query, without iterating 100 times, looping just once or twice instead. Is it possible? Is there any tricky way to do that? Generally we do this:
$result = mysql_query("SELECT desire_column_name FROM table_name WHERE clause");
while ($row = mysql_fetch_array($result)) {
    $row['book_name'];
    $row['book_publisher'];
    $row['book_author'];
    ..........
    $row['book_rate'];
}
// Or we could use mysqli_query() with mysqli_fetch_row(), mysqli_fetch_array() or mysqli_fetch_assoc();
My question is: is there any idea or tricky way to avoid iterating 100 times to fetch 100 records? It may sound weird to some, but one of the most experienced programmers I knew told me it's possible. Unfortunately I was not able to learn it from him, and sadly he is no longer with us. Thanks in advance for sharing your ideas.
You should not use mysql_query; the mysql extension is deprecated:
This extension is deprecated as of PHP 5.5.0, and has been removed as of PHP 7.0.0.
-- https://secure.php.net/manual/en/intro.mysql.php
When you use PDO, you can fetch all items without writing the loop yourself, like this:
$connection = new PDO('mysql:host=localhost;dbname=testdb', 'dbuser', 'dbpass');
$statement = $connection->query('SELECT ...');
$rows = $statement->fetchAll();
The short answer: NO, it's impossible to fetch more than one record from a database without a loop.
But the real point here is that you don't want to.
There is no point in "just fetching" the data - you're always going to do something with it. With each row. Obviously, a loop is a natural way to do something with each row. Therefore, there is no point in trying to avoid a loop.
Which renders your question rather meaningless.
Regarding performance: the truth is that you will not experience a single performance problem related to fetching just 100 records from a database, which renders your problem an imaginary one.
The only plausible question I can extract from your post concerns your performance as a programmer, as lack of education makes you write a lot of unnecessary code. If you manage to ask a concrete question on that matter, you'll be shown a way to avoid the useless repetitive typing.
Have you tried using mysql_fetch_assoc?
$result = mysql_query("SELECT desire_column_name FROM table_name WHERE clause");
while ($row = mysql_fetch_assoc($result)) {
    // do stuff here like..
    if (!empty($row['some_field'])) {
        echo $row["some_field"];
    }
}
It is possible to read all 100 records without a loop by hardcoding the key column values, but that would mean listing 100 x the number of columns, and there may be a limit on the number of columns you can select in MySQL.
eg,
select
case when book_name='abc' then book_name end Name,
case when book_name='abc' then book_publisher end as Publisher,
case when book_name='abc' then book_author end as Author,
case when book_name='xyz' then book_name end Name,
case when book_name='xyz' then book_publisher end as Publisher,
case when book_name='xyz' then book_author end as Author,
...
...
from
db_bookshop;
It's not practical but if you have less rows to query you might find it useful.
The time taken to ask the MySQL server for something is far greater than one iteration through a client-side WHILE loop. So, to improve performance, the goal is to have the SELECT go to the server in one round trip. Different API calls do this or don't do this; read their details.
I have written a lot of UIs with MySQL under the covers. I think nothing of fetching a few dozen rows at once, and then build a <table> (or something) with the results. I rarely fetch more than 100, not because of performance, but because 100 is (usually) too much for the user to take in on a single web page.
Also, I think nothing of issuing several, maybe dozens, of queries in support of a single web page. The delay is insignificant, especially when compared to the user's time for reading, digesting, and moving to the next page. So, I try to give the user a digestible amount of info without having to click to another page to get more. There are tradeoffs.
When it is practical to have SQL do the 'digesting', do so. It is faster for MySQL to do a SUM() and return just the total, rather than return dozens of rows for the client to add up. This is mostly a 'bandwidth' issue. Either way, MySQL will fetch (internally) all the needed rows.
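A minimal illustration of that SUM() point, using Python's sqlite3 as a stand-in for a MySQL connection (the table is invented): both queries make the server read the same rows, but the aggregate sends back a single value instead of one row per record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)",
                 [(10,), (20,), (30,), (40,)])

# client-side total: every row crosses the connection, the client adds
total_client = sum(amount for (amount,) in
                   conn.execute("SELECT amount FROM orders"))

# server-side total: only one row (the sum) crosses the connection
(total_server,) = conn.execute("SELECT SUM(amount) FROM orders").fetchone()

print(total_client, total_server)
```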
I'm trying to create a script to import about 10m records to mysql database.
When I did a loop with single queries, importing 2000 records took 20 minutes.
So I'm trying to do this with transactions. The problem is that in my loop there are some select queries that need to run first to get the values used to build the inserts. The last two queries (an insert and an update) could go in a transaction.
Something like this:
foreach($record as $rec) {
//select sth
//do sth with result
//second select sth
//do sth with second result
//prepare values from above results and $rec
// below part I'd like to do with transaction
//insert with new record
//update table
}
I know this is a little messy and not exact, but the real function is more complicated, so I decided to post just a draft. I need advice, not complete code.
Regards
Transactions are for multiple statements that need to be treated as a single group that either entirely succeeds or entirely fails. It sounds like your issue has a lot more to do with performance than transactions. Unless there is a bit of information that you haven't included that involves groups of statements "which all must succeed at the same time", transactions are just a distraction.
There are a few ways to approach your problem depending on some things that aren't immediately obvious from your post.
-If your data source for the 10M records is a table in the same database that you are going to populate with the new records (via the inserts and updates at the end of your loop), then you might be able to do everything with a single database query. SQL is very expressive, and through joins and the built-in functions (SUBSTR(), UPPER(), REVERSE(), CASE...END, etc.) you might be able to do everything you want. This would require reading up on SQL and reframing your goals in terms of set operations.
-If you are inserting records sourced from outside the database (like from a file), then I would organize your code like this:
//select sth
//do sth with result
//second select sth
//do sth with second result
//prepare values from above results so that $rec info can be added in later
foreach($record as $rec) {
//construct a big insert statement
}
//insert the new records by running the big insert statement
//update table
The advantage here is that you are only hitting the db with a few queries instead of a few queries per $rec, so your performance will be better (since db calls have overhead). For 10M rows you may need to break the above up into chunks, since there is a limit to how big a single insert can be (see max_allowed_packet). I would suggest breaking the 10M into 5K or 10K chunks by adding another loop around the above that partitions off the chunks.
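A sketch of that chunked multi-row insert, using Python's sqlite3 so it is runnable as-is; the table name and chunk size are made up, and with MySQL you would size the chunks with max_allowed_packet in mind:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, val TEXT)")

source = [("r%d" % i,) for i in range(23)]  # stand-in for the 10M source rows
CHUNK = 10  # with MySQL, keep each statement under max_allowed_packet

for start in range(0, len(source), CHUNK):
    chunk = source[start:start + CHUNK]
    # one multi-row INSERT per chunk instead of one INSERT per record
    placeholders = ",".join(["(?)"] * len(chunk))
    conn.execute("INSERT INTO records (val) VALUES " + placeholders,
                 [val for (val,) in chunk])
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM records").fetchone()[0])
```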
A clearer answer could have been given if you added details about your data source, what transformations you want to do on the data, what the purpose of the
//select sth
//do sth with result
//second select sth
//do sth with second result
section is (within the context of how it adds information to your insert statements later), and what the prepare values section of your code does.
In a site I maintain I have a need to query the same table (articles) twice, once for each category of article. AFAICT there are basically two ways of doing this (maybe someone can suggest a better, third way?):
Perform the db query twice, meaning the db server has to sort through the entire table twice. After each query, I iterate over the cursor to generate html for a list entry on the page.
Perform the query just once and pull out all the records, then sort them into two separate arrays. After this, I have to iterate over each array separately in order to generate the HTML.
So it's this:
$newsQuery = $mysqli->query("SELECT * FROM articles WHERE type='news' ");
while($newRow = $newsQuery->fetch_assoc()){
// generate article summary in html
}
// repeat for informational articles
vs this:
$query = $mysqli->query("SELECT * FROM articles ");
$news = Array();
$info = Array();
while($row = $query->fetch_assoc()){
if($row['type'] == "news"){
$news[] = $row;
}else{
$info[] = $row;
}
}
// iterate over each array separate to generate article summaries
The recordset is not very large, currently <200, and will probably grow to 1000-2000. Is there a significant difference in the times between the two approaches, and if so, which one is faster?
(I know this whole thing seems awfully inefficient, but it's a poorly coded site I inherited and have to take care of without a budget for refactoring the whole thing...)
I'm writing in PHP, no framework :( , on a MySql db.
Edit
I just realized I left out one major detail. On a given page in the site, we will display (and thus retrieve from the db) no more than 30 records at once - but here's the catch: 15 info articles, and 15 news articles. On each page we pull the next 15 of each kind.
You know you can sort in the DB, right?
SELECT * FROM articles ORDER BY type
EDIT
Due to the change made to the question, I'm updating my answer to address the newly revealed requirement: 15 rows for 'news' and 15 rows for not-'news'.
The gist of the question is the same: "which is faster, one query or two separate queries?" The gist of the answer remains the same: each database roundtrip incurs overhead (extra time, especially over a network connection to a separate database server), so with all else being equal, reducing the number of database roundtrips can improve performance.
The new requirement really doesn't impact that. What the newly revealed requirement really impacts is the actual query to return the specified resultset.
For example:
( SELECT n.*
FROM articles n
WHERE n.type='news'
LIMIT 15
)
UNION ALL
( SELECT o.*
FROM articles o
WHERE NOT (o.type<=>'news')
LIMIT 15
)
Running that statement as a single query is going to require fewer database resources, and be faster than running two separate statements, and retrieving two disparate resultsets.
We weren't provided any indication of what the other values for type can be, so the statement offered here simply addresses two general categories of rows: rows that have type='news', and all other rows that have some other value for type.
That query assumes that type allows for NULL values, and we want to return rows that have a NULL for type. If that's not the case, we can adjust the predicate to be just
WHERE o.type <> 'news'
Or, if there are specific values for type we're interested in, we can specify that in the predicate instead
WHERE o.type IN ('alert','info','weather')
If "paging" is a requirement ("next 15"), the typical pattern we see applied, LIMIT 30,15, can be inefficient. But this question isn't asking about improving the efficiency of "paging" queries; it's asking whether running a single statement or two separate statements is faster.
And the answer to that question is still the same.
ORIGINAL ANSWER below
There's overhead for every database roundtrip. In terms of database performance, for small sets (like you describe) you're better off with a single database query.
The downside is that you're fetching all of those rows and materializing an array. (But it looks like that's the approach you're taking in either case.)
Given the choice between the two options you've shown, go with the single query. That's going to be faster.
As far as a different approach, it really depends on what you are doing with those arrays.
You could actually have the database return the rows in a specified sequence, using an ORDER BY clause.
To get all of the 'news' rows first, followed by everything that isn't 'news', you could
ORDER BY type<=>'news' DESC
That's MySQL shorthand for the more ANSI-standards-compliant:
ORDER BY CASE WHEN t.type = 'news' THEN 1 ELSE 0 END DESC
Rather than fetch every single row and store it in an array, you could just fetch from the cursor as you output each row, e.g.
while ($row = $query->fetch_assoc()) {
    echo "<br>Title: " . htmlspecialchars($row['title']);
    echo "<br>byline: " . htmlspecialchars($row['byline']);
    echo "<hr>";
}
The best way to deal with a situation like this is to test it for yourself. It doesn't matter how many records you have at the moment; you can simulate whatever amount you'd like, that's never a problem. Also, 1000-2000 is a really small data set.
I somewhat don't understand why you'd have to iterate over all the records twice. You should never retrieve all the records in a query anyway, only the small subset you need to work with. On a typical site where you manage articles it's usually about 10 records per page MAX. No user will ever go through 2000 articles in a way that requires pulling all the records at once. Utilize paging and smart querying.
// iterate over each array separate to generate article summaries
I'm not sure what you mean by this, but something tells me this data should be stored in the database as well. I really hope you're not generating article excerpts on the fly for every page hit.
It all sounds to me more like a bad architecture design than anything else...
PS: I believe sorting/ordering/filtering of database data should be done on the database server, not in the application itself. You may save some traffic by doing a single query, but it won't help much if you transfer too much data at once that you won't be using anyway.
I have a database design here that looks this in simplified version:
Table building:
id
attribute1
attribute2
Data in there is like:
(1, 1, 1)
(2, 1, 2)
(3, 5, 4)
And the tables, attribute1_values and attribute2_values, structured as:
id
value
Which contains information like:
(1, "Textual description of option 1")
(2, "Textual description of option 2")
...
(6, "Textual description of option 6")
I am unsure whether this is the best setup, but it was done this way per my project manager's requirements. There is definitely some truth to it, as you can now modify the text easily without messing up the ids.
However, now I have come to a page where I need to list the attributes, so how do I go about that? I see two major options:
1) Make one big query which gathers all values from building and at the same time picks the correct textual representation from the attribute{x}_values table.
2) Make a small query that gathers all values from the building table. Then after that get the textual representation of each attribute one at a time.
Which is the best option to pick? Is option 1 even faster than option 2 at all? If so, is it worth the extra maintenance trouble?
Another suggestion would be to create a view on the server with only the data you need and query from that. That would keep the work on the server end, and you can pull just what you need each time.
If you have a small number of rows in the attribute tables, then I suggest fetching them first (fetch all of them!) and storing them in arrays, using the id as the array key.
Then you can proceed with the building data; you just use the respective array to look up each attribute value.
I would recommend something in between. Parse the result from the first table in PHP, and figure out which attributes you need to select from each attribute[x]_values table.
You can then select attributes in bulk using one query per table, rather than one query per attribute, or one query per building.
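A sketch of this in-between approach, using Python's sqlite3 with the table names from the question so it is runnable: read building once, collect the attribute ids actually referenced, then fetch only those with one IN (...) query per attribute table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE building (id INTEGER PRIMARY KEY, attribute1 INTEGER, attribute2 INTEGER);
CREATE TABLE attribute1_values (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE attribute2_values (id INTEGER PRIMARY KEY, value TEXT);
INSERT INTO building VALUES (1, 1, 1), (2, 1, 2), (3, 5, 4);
INSERT INTO attribute1_values VALUES (1, 'attr1 text 1'), (5, 'attr1 text 5');
INSERT INTO attribute2_values VALUES (1, 'attr2 text 1'), (2, 'attr2 text 2'), (4, 'attr2 text 4');
""")

buildings = conn.execute(
    "SELECT id, attribute1, attribute2 FROM building").fetchall()

# collect only the attribute ids actually referenced, then one
# IN (...) query per attribute table (repeat for attribute2_values)
needed = sorted({a1 for (_, a1, _) in buildings})
marks = ",".join("?" * len(needed))
lookup1 = dict(conn.execute(
    "SELECT id, value FROM attribute1_values WHERE id IN (%s)" % marks,
    needed))

for bid, a1, a2 in buildings:
    print(bid, lookup1[a1])
```

This costs one query per attribute table, regardless of the number of buildings.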
Here is a PHP solution:
$query = "SELECT * FROM building";
$result = mysqli_query($connection, $query);
$query = "SELECT * FROM attribute1_values";
$result2 = mysqli_query($connection, $query);
$query = "SELECT * FROM attribute2_values";
$result3 = mysqli_query($connection, $query);
$n = mysqli_num_rows($result);
for ($i = 1; $i <= $n; $i++) {
    $row = mysqli_fetch_array($result);
    mysqli_data_seek($result2, $row['attribute1'] - 1);
    $row2 = mysqli_fetch_array($result2);
    $row2['value']; // Use this as the value for attribute one of this object.
    mysqli_data_seek($result3, $row['attribute2'] - 1);
    $row3 = mysqli_fetch_array($result3);
    $row3['value']; // Use this as the value for attribute two of this object.
}
Keep in mind that this solution requires that the ids in attribute1_values and attribute2_values start at 1 and increase by 1 on every single row.
Oracle / Postgres / MySql DBA here:
Running a query many times has quite a bit of overhead. There are multiple round trips to the db, and if it's on a remote server, this can add up. The DB will likely have to parse the same query multiple times, which in MySQL will be terribly inefficient if there are tons of rows. One advantage your PHP method (multiple queries) does have is that it uses less memory, since it releases results as they're no longer needed (if you run the queries as a nested loop, that is; if you fetch all the results up front, you'll have a lot of memory overhead, depending on the table sizes).
The optimal approach would be to run it as one query, fetching the results one at a time, displaying each as needed and then discarding it. That can wreak havoc with MVC frameworks unless you're comfortable running model code in your view, or you render small view fragments.
Your question is very generic, and I think that to get an answer you should give more hints about what this page will look like and how big the dataset is.
Will you get all the buildings with their attributes, or just one at a time?
Your data structure looks very simple, and anything more powerful than a Raspberry Pi can handle it very well.
If you need one record at a time you don't need any special technique; just JOIN the tables.
If you need to list all buildings and you want to save db time you have to measure your data.
If you have more attributes than buildings you have to choose one way; if you have 8 attributes and 2000 buildings, you can think about caching the attributes in arrays, with one select per table, and then just print them using the arrays. I don't think you will see any speed drop or improvement with such simple tables on a modern computer.
$att1[1] = 'description1';
$att1[2] = 'description2';
....
Never do one at a time queries, try to combine them into a single one.
MySQL will cache your query and it will run much faster. PHP loops are faster than making many requests to the database.
The query cache stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again.
http://dev.mysql.com/doc/refman/5.1/en/query-cache.html
I have a photo gallery and want to update multiple captions at once via form input. I've tried to research this, but I think I'm in way over my head. This is what I have so far, but it's not working.
The data is saved in an SQL table called "gallery". An example row might look like:
gallery_id(key) = some number
product_id = 500
photo = photo.jpg
caption = 'look at this picture'
My form inputs are generated like this:
$sql = mysql_query("SELECT * FROM gallery WHERE product_id = 500");
while ($row = mysql_fetch_array($sql)) {
    $photo = $row['photo'];
    $caption = $row['caption'];
    echo '<img src="' . $photo . '"/>';
    echo '<input name="cap[' . $caption . ']" id="cap[' . $caption . ']" value="' . $caption . '" />';
}
So once I submit the form I start to access my inputs like this but I hit a wall..
if (isset($_POST['cap']) && is_array($_POST['cap'])) {
    foreach ($_POST['cap'] as $cap) {
        mysql_query("UPDATE gallery
                     SET caption=$caption
                     WHERE ???????");
    }
}
I don't know how to tell the database where to put these inputs and as far as I can tell you can't pass more than one variable in a foreach loop.
$_POST['cap'] is an array, keyed by whatever you put between the brackets in the input name. If I'm not wrong (I can't test it now and I'm new to PHP), you can name the inputs cap[gallery_id] instead of cap[caption] and then do
foreach ($_POST['cap'] as $id => $cap) {
    $id = (int)$id;
    $cap = mysql_real_escape_string($cap);
    mysql_query("UPDATE gallery
                 SET caption='$cap'
                 WHERE gallery_id=$id");
}
A few things to consider
Updating many rows is tricky. I see the question is already answered, so I'm not trying to give a better answer, but rather some non-trivial notes for those who find this thread by searching for update sql many rows or similar. Updating several rows is a common scenario, and I've seen a lot of code that takes the "one update at a time" approach (described above), which can be quite slow. In many cases there are better ways. There's no one-size-fits-all solution, so I've split the possible techniques up by what you have and where you're going:
Multiple rows, same data: If you want to set the same caption for many rows, use UPDATE ... WHERE id IN (...). It's an edge case, but it gives you a noticeable performance boost.
Multiple rows, different data, at once: When updating several rows with different values, consider using a CASE structure, as explained in this article:
sql update multiple rows with case in php
Transactions: If you're using InnoDB tables (which is quite likely), you may want to do the whole thing in one transaction. It's a lot faster.
Index updates: If you're updating many, many rows (like thousands), it can make sense to disable keys beforehand and enable them afterwards. This way MySQL can skip a lot of index updates while the operation runs, and only rebuild the indexes for the final result.
Delete + reinsert: If you're updating many fields per record, and there are no triggers or other magic consequences of deleting your rows, it can be faster to delete them with WHERE id IN (...) and then do a multi-insert. Don't do this if you're only updating fixed-size integers like counters.
Then again:
If your table has many concurrent reads and writes, and you're updating only a few (up to a hundred) rows, stick with the one-by-one approach, especially if it's MyISAM.
Try before you buy; these techniques all depend on the data itself too. But it's worth going the extra mile to find the best one.
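For the "multiple rows, different data, at once" technique above, here is a runnable sketch using Python's sqlite3 (table and column names are invented): a single UPDATE with one CASE branch per changed row replaces one UPDATE per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gallery (gallery_id INTEGER PRIMARY KEY, caption TEXT)")
conn.executemany("INSERT INTO gallery VALUES (?, ?)",
                 [(1, "old 1"), (2, "old 2"), (3, "old 3")])

new_captions = {1: "sunset", 3: "harbour"}  # gallery_id -> new caption

# one UPDATE with a CASE branch per changed row, instead of one UPDATE per row
cases = " ".join("WHEN ? THEN ?" for _ in new_captions)
ids = ",".join("?" * len(new_captions))
params = [p for pair in new_captions.items() for p in pair] + list(new_captions)
conn.execute(
    "UPDATE gallery SET caption = CASE gallery_id %s END "
    "WHERE gallery_id IN (%s)" % (cases, ids),
    params)
conn.commit()

print(conn.execute("SELECT caption FROM gallery ORDER BY gallery_id").fetchall())
```

The WHERE ... IN clause matters: without it, every non-matching row would have its caption set to NULL by the CASE.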