Hey, I have a table with "id", "name", and "weight" columns. Weight is an unsigned small int.
I have a page that displays items ordered by "weight ASC". It uses drag-and-drop, and once the order is changed, it passes back a comma-separated string of ids (in the new order).
Let's say there are 10 items in that table. Here's what I have so far:
Sample input:
5,6,2,9,10,4,8,1,3,7
Sample PHP handler (error handlers & security stuff excluded):
<?php
$weight = 0;
$id_array = explode(',', $id_string);
foreach ($id_array as $val)
{
    mysql_query("UPDATE tbl SET weight = '$weight' WHERE id = '$val' LIMIT 1");
    $weight++;
}
?>
When I make a change to column order, will my script need to make 10 separate UPDATE queries, or is there a better way?
You could create a temporary table holding the new data (i.e., id and weight as its columns), then update the main table from it.
create temporary table t (id int, weight smallint unsigned);
insert into t (id, weight) values (5, 0), (6, 1), (2, 2), etc
update tbl inner join t on t.id = tbl.id
set tbl.weight = t.weight;
So, you have one create statement, one insert statement, and one update statement.
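If it helps, here is a rough PHP sketch of that flow, reusing the $id_string and mysql_query() calls from your handler (illustrative only, with escaping and error handling still left out):
<?php
$id_array = explode(',', $id_string);

// one (id, weight) pair per item, weights assigned from the new order
$rows = array();
foreach ($id_array as $weight => $id) {
    $rows[] = "(" . (int)$id . ", " . (int)$weight . ")";
}

// temporary tables only live for the current connection
mysql_query("CREATE TEMPORARY TABLE t (id INT, weight SMALLINT UNSIGNED)");

// one multi-row insert for all of the new weights
mysql_query("INSERT INTO t (id, weight) VALUES " . implode(',', $rows));

// one update that copies the new weights across by id
mysql_query("UPDATE tbl INNER JOIN t ON t.id = tbl.id SET tbl.weight = t.weight");
?>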
An UPDATE only takes a single WHERE clause, which means that, written this way, you can only update one row at a time.
With 10 items, I don't know if I would go to that kind of trouble (it means rewriting some code, even if that isn't hard), but for larger sets, a solution would be to:
delete all the rows
insert them all back with their new weights
doing all of that in a transaction, of course.
The nice part is that you can do several inserts in a single query; it may not matter much for 10 items, but for 25 or 50 it can be quite nice.
Here is an example from the INSERT page of the MySQL manual (quoting):
INSERT statements that use VALUES syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas.
Example:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
Of course, you should probably not insert "too many" rows in a single INSERT query; one INSERT per 50 items or so might be OK (to find the "right" batch size, you'll have to benchmark, I suppose ^^)
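For larger sets, a minimal sketch of that delete-and-reinsert flow in the same style as the question's handler might look like this (assuming an InnoDB table so the transaction actually protects the data, and that name is the only other column that has to be carried across):
<?php
$id_array = explode(',', $id_string);

mysql_query("START TRANSACTION");

// grab the current names so the rows can be rebuilt
$names = array();
$res = mysql_query("SELECT id, name FROM tbl");
while ($row = mysql_fetch_assoc($res)) {
    $names[$row['id']] = mysql_real_escape_string($row['name']);
}

mysql_query("DELETE FROM tbl");

// one multi-row INSERT, weights taken from the new position of each id
$values = array();
foreach ($id_array as $weight => $id) {
    $id = (int)$id;
    $values[] = "($id, '{$names[$id]}', $weight)";
}
mysql_query("INSERT INTO tbl (id, name, weight) VALUES " . implode(',', $values));

mysql_query("COMMIT");
?>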
Yes, you would need to do 10 updates. There are ways to batch multiple statements into a single call to the database, but it's probably best to avoid that.
If it's performance you are worried about, measure it before trying to optimize. I suspect that doing 10 (or even 20 or 30) updates will be plenty fast.
10 updates is the simplest way conceptually. If you've got a bazillion rows that need to be updated, then you might have to try something different, such as creating a temporary table and using a JOIN in your UPDATE statement, or a subquery with a row constructor.
Store the records in a temp table with a batch insert, delete the records from tbl, and then batch-insert them back into tbl from the temp table.
I'm trying to write a Laravel eloquent statement to do the following.
Query a table and get the IDs of all the duplicate rows (or ideally all the IDs except the ID of the first instance of each duplicate).
Right now I have the following mysql statement:
select `codes`, count(`codes`) as `occurrences`, `customer_id` from `pizzas`
group by `codes`, `customer_id`
having `occurrences` > 1;
The duplicates are any row that shares a combination of codes and customer_id, example:
codes,customer_id
183665A4,3
183665A4,3
183665A4,3
183665A4,3
183665A4,3
I'm trying to delete all but 1 of those.
This returns a set of the codes, with their occurrence counts and their customer_id, since I only want rows where both match.
Currently I plan to loop through this, save the ID of the first instance, then query again and delete anything without that ID. This seems not very fast: there are about 50 million rows, so each query takes forever, and there are multiple queries for every duplicate set to delete.
// get every order that shares the same code and customer ID
$orders = Order::select('id', 'codes', DB::raw('count(`codes`) as `occurrences`'), 'customer_id')
->groupBy('codes')
->groupBy('customer_id')
->having('occurrences', '>', 1)
->limit(100)
->get();
// loop through those orders
foreach ($orders as $order)
{
// find the first order that matches this duplicate set
$first_order = Order::where('codes', $order->codes)
->where('customer_id', $order->customer_id)
->first();
// delete all but the first
Order::where('codes', $order->codes)
->where('customer_id', $order->customer_id)
->where('id', '!=', $first_order->id)
->delete();
}
There has got to be a more efficient way to track down all rows that share the same code and customer_id, and delete all the duplicates but keep the first instance, right? lol
I'm thinking maybe if I can add a fake column to the results that is an array of every ID, I could at least then remove the first ID and delete the others.
Don't involve PHP
This seems not very fast
The logic in the question is inherently slow because it's lots of queries and for each query there's:
DB<->PHP network roundtrip
PHP ORM logic/overhead
Given the numbers in the question, the outer code needs running up to 10k times (with limit(100), if there are exactly 2 occurrences for every one of those 2 million duplicate records). For argument's sake, let's say it runs 1k times; overall that's:
1,000 queries finding duplicates
100,000 queries finding the first record
100,000 delete queries
201,000 queries is a lot and the php overhead makes it an order of magnitude slower (a guess, based on experience).
Do it directly on the DB
Just eliminating the PHP/ORM/network time (even if it's all on the same machine) would make the process markedly faster; that would involve writing a stored procedure to mimic the PHP logic in the question.
But there's a simpler way, the specifics depend on the circumstances. In comments you've said:
The table is 140GB in size
It contains 50 million rows
Approx 2 million are duplicate records
There isn't enough free space to make a copy of the table
Taking these comments at face value the process I suggest is:
Ensure you have a functional DB backup
Before doing anything make sure you have a functional DB backup. If you manage to make a mistake and e.g. drop the table - be sure you can recover without loss of data.
You'll be testing this process on a copy of the database first anyway, right :) ?
Create a table of "ids to keep" and populate it
This is a variation on removing duplicates with a unique index:
CREATE TABLE ids_to_keep (
id INT PRIMARY KEY,
codes VARCHAR(50) NOT NULL, # use same schema as source table
customer_id INT NOT NULL, # use same schema as source table
UNIQUE KEY derp (codes,customer_id)
);
INSERT IGNORE INTO ids_to_keep
SELECT id, codes, customer_id from pizzas;
MySQL will silently discard the rows that conflict with the unique index, resulting in a table with one id per codes+customer_id tuple.
If you don't have space for this table - make room :). It shouldn't be too large; 140GB and 50M rows means each row is approx 3kb - this temporary table will likely require single-digit % of the original size.
Delete the duplicate records
Before executing any expected-to-be-slow query use EXPLAIN to check if the query will complete in a reasonable amount of time.
To run as a single query:
DELETE FROM
pizzas
WHERE
id NOT IN (SELECT id from ids_to_keep);
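For example, before running the delete above you can look at the plan first (EXPLAIN on a DELETE needs MySQL 5.6 or later; on older versions, EXPLAIN the equivalent SELECT instead):
EXPLAIN DELETE FROM
    pizzas
WHERE
    id NOT IN (SELECT id from ids_to_keep);

# on older MySQL versions, approximate it with:
EXPLAIN SELECT id FROM pizzas
WHERE id NOT IN (SELECT id from ids_to_keep);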
If you wish to do things in chunks:
DELETE FROM
pizzas
WHERE
id BETWEEN 0 AND 10000 AND
id NOT IN (SELECT id from ids_to_keep);
Cleanup
Once the table isn't needed any more, get rid of it:
DROP TABLE ids_to_keep;
Make sure this doesn't happen again
To prevent this happening again, add a unique index to the table:
CREATE UNIQUE INDEX codes_customer_unique ON pizzas (codes, customer_id);
Try this one; it keeps only the latest id out of each set of duplicates (non-duplicate rows are untouched):
$deleteDuplicates = DB::table('orders as ord1')
    ->join('orders as ord2', 'ord1.id', '<', 'ord2.id')
    ->whereColumn('ord1.codes', 'ord2.codes')
    ->whereColumn('ord1.customer_id', 'ord2.customer_id')
    ->delete();
Okay, I am trying to update two tables without using PHP to run queries in a loop.
Table one: users
Table two: traits
BOTH tables have a matching column "ID" (so ID 1 in "users" is also ID 1 in "traits").
TABLE 1 has two columns that need updating: "HP" and "EXP".
TABLE 2 has one column: "STUFF".
I need a simple query to update HP and EXP ONLY if STUFF = 0.
So something like:
UPDATE users,traits
SET
traits.hp = 3,
traits.exp = 10
WHERE
traits.hp < traits.maxhp
AND users.stuff = 0;
This query seems to work, but it is very slow. Is there a better way?
Thank you!
-Josh
Depending on the table size, I would recommend creating a couple of indexes on those columns (traits.hp, traits.maxhp and users.stuff) to help keep the query quick.
Also, make sure that traits.hp and traits.maxhp are of some numeric type (e.g. INT); otherwise the server will need to convert them on the fly, which could slow things down as well.
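For example, something along these lines (the index names here are just placeholders):
ALTER TABLE traits ADD INDEX idx_traits_hp (hp);
ALTER TABLE traits ADD INDEX idx_traits_maxhp (maxhp);
ALTER TABLE users ADD INDEX idx_users_stuff (stuff);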
Assuming I have the following 2 tables:
trips:
driverId (PK), dateTravelled, totalTravelDistance
farthest_trip (same structure)
driverId (PK), dateTravelled, totalTravelDistance
where, in trips, only the most recent journey per driver is stored:
REPLACE INTO trips (driverId, totalTravelDistance) VALUES ('$driverId', '$totalTripDistance');
I want to store the farthest trip travelled for each driverId as well, but unfortunately you can't have a condition on an INSERT... ON DUPLICATE KEY UPDATE statement, or else I'd have a trigger such as:
INSERT INTO farthest_trip(driverId, dateTravelled, totalTravelDistance) ON DUPLICATE KEY UPDATE dateTravelled = new.dateTravelled, totalTravelDistance = new.totalTravelDistance WHERE totalTravelDistance < new.totalTravelDistance;
so what I do is after inserting into the first table, in PHP I check if the current distance is farther than the one previously recorded, and if so, then update farthest_journey. It feels like there should be a better way but I can't quite figure it out, so is there a more efficient way of doing this?
You could create a trigger. Something like
CREATE TRIGGER farthest_trigger BEFORE INSERT ON trips
FOR EACH ROW
BEGIN
    UPDATE farthest_trip
    SET dateTravelled = NEW.dateTravelled, totalTravelDistance = NEW.totalTravelDistance
    WHERE driverId = NEW.driverId AND totalTravelDistance < NEW.totalTravelDistance;
END;
But then you would have code that gets executed "magically" from a PHP perspective.
I think the better solution would be appending new trips to the trips table and selecting the maximum and latest journey with SELECT statements.
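A rough sketch of what those SELECTs could look like, assuming trips keeps every journey (which would also mean driverId alone can no longer be the primary key):
# farthest trip per driver
SELECT driverId, MAX(totalTravelDistance) AS farthestDistance
FROM trips
GROUP BY driverId;

# most recent trip for a given driver
SELECT driverId, dateTravelled, totalTravelDistance
FROM trips
WHERE driverId = ?
ORDER BY dateTravelled DESC
LIMIT 1;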
I am wondering about the best way to lay out my data in a MySQL table so that it can be queried as quickly as possible to gather an array of daily values for further use in PHP.
So far, I have laid out the table as such:
item_id price_date price_amount
1 2000-03-01 22.4
2 2000-03-01 19.23
3 2000-03-01 13.4
4 2000-03-01 14.95
1 2000-03-02 22.5
2 2000-03-02 19.42
3 2000-03-02 13.4
4 2000-03-02 13.95
with item_id defined as an index.
Also, I am using:
"SELECT DISTINCT price_date FROM table_name"
to get an array containing a unique list of dates.
Furthermore, the part of the code that is within a loop (and the focus of my optimization question) is currently written as:
"SELECT price_amount FROM table_name WHERE item_id = 1 ORDER BY price_date"
This second "SELECT" statement is actually within a loop where I am selecting/storing-in-array the daily prices of each item_id requested.
Everything is currently functioning and pulling the data from MySQL properly; however, both of the SELECT statements above take approximately 4-5 seconds per run, and when looping through 100+ products to create a summary, that adds up to a very inefficient, slow system.
Is there any more efficient way I could structure the MySQL table and/or SELECT statements to retrieve the results faster? Perhaps defining a different index on a different column? I have used the EXPLAIN command to return information about the queries but am unsure how to use that information to improve their efficiency.
Thanks in advance to any MySQL wizards who may be able to assist.
Single column index
I am using:
"SELECT DISTINCT price_date FROM table_name"
to get an array containing a unique list of dates.
This query can be executed more efficiently if you create an index for the price_date column:
ALTER TABLE table_name ADD INDEX price_idx (price_date);
Multiple column index
Furthermore, the part of the code that is within a loop (and the focus of my optimization question) is currently written as:
"SELECT price_amount FROM table_name WHERE item_id = 1 ORDER BY price_date"
For the second query, you should create an index covering both the item_id and price_date columns:
ALTER TABLE table_name ADD INDEX item_price_idx (item_id, price_date);
I know this is a bit late, but I stumbled across this and thought I would throw my thoughts into the mix.
Indexes, used well, are very helpful in speeding up queries (EXPLAIN gives some really good insight into which indexes are being chosen, if any, for a specific query). However, efficient PHP will help even more.
In your case you do not show the PHP, but it looks like you take a list of dates and then loop through the items for each date to get the prices. It would be more efficient to do something like the following:
SELECT item_id, price_amount FROM table_name WHERE price_date = ? ORDER BY item_id, price_amount
with an index (preferably a unique index) on (price_date, item_id, price_amount).
You then have a single loop through the result set for each date, rather than a loop issuing many separate queries (this matters even more if your SQL server is on a different box from PHP, since every network round trip adds overhead).
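As a rough sketch of that idea (one query per date, then a single loop; $date is whichever date you are building the summary for, sticking with the old mysql_* calls used in the question):
<?php
$prices = array();
$result = mysql_query(
    "SELECT item_id, price_amount
     FROM table_name
     WHERE price_date = '" . mysql_real_escape_string($date) . "'
     ORDER BY item_id, price_amount"
);
while ($row = mysql_fetch_assoc($result)) {
    // one pass collects every item's price for this date
    $prices[$row['item_id']] = $row['price_amount'];
}
?>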
4-5 seconds for a single query is very slow, though (by a factor of at least 100x), so it would indicate a problem (a very large table with no usable key) or, potentially, disk issues.
I have a query and a loop written that lists all the rows from a mysql table ("records") formatted in a HTML table.
One of the fields is "subject_id", which is an integer. I also have a table named "subjects" keyed by that same "subject_id". There are only two fields in the "subjects" table: an auto-increment ID integer and a varchar(1024) for the subject title.
The value that comes back from the "records" table is an integer. For each row, I want to look that integer up and output the corresponding subject title.
My first notion, the kindergarten way, would be to throw in another query within the loop, effectively increasing my number of queries from 300 to 600 to load the page (no pagination is needed).
What would be a better way of this "sub query" aside from adding 300 more queries?
Edit:
I'm pulling the data from the table using a while loop and echoing my variable $r_subject (from: $r_subject = mysql_result($result,$a,"subject");). The value returned from the initial records table is INT. I want to take the $r_subject and then check it against the SUBJECTS table to get the string name associated with that INT id.
It's hard to know exactly what you need without seeing code, but from what I gather, you have 2 tables, one has the ID, the other has the text, so you would want to use a join.
The second thing is, you'll want to look at whether or not you really need 300 queries in the first place. That's a lot of queries to run and you should only need to run that many queries when you're running a bulk insert/update or something of that nature; other than that, you most likely could reduce that number substantially.
select
A.*,
B.title
from
records A,
subjects B
where
B.subject_id = A.subject_id
That's a single query that will produce all of the data you need for your page.
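In PHP that could be consumed roughly like this (a sketch only, using the mysql_* style from the question; the title column name is taken from the query above):
<?php
$result = mysql_query(
    "SELECT A.*, B.title
     FROM records A, subjects B
     WHERE B.subject_id = A.subject_id"
);
while ($row = mysql_fetch_assoc($result)) {
    // each row already carries the subject title, so no per-row lookup query is needed
    echo '<tr><td>' . htmlspecialchars($row['title']) . '</td><td>'
       . htmlspecialchars($row['subject_id']) . '</td></tr>';
}
?>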
select
subjects.SubjectTitle,
records.whateverFieldYouWant,
records.AnyOtherField
from
records
join subjects
on records.subject_id = subjects.subject_id
where
records.subject_id = TheOneSubjectYouWant
but I can't confirm that without the actual table structures and some sample data showing what you expect to get out.