Assuming I have the following 2 tables:
trips:
driverId (PK), dateTravelled, totalTravelDistance
farthest_trip (same structure)
driverId (PK), dateTravelled, totalTravelDistance
where only the most recent journey is stored:
REPLACE INTO trips VALUES ('$driverId', '$dateTravelled', '$totalTripDistance');
I want to store the farthest trip travelled for each driverId as well, but unfortunately you can't have a condition on an INSERT... ON DUPLICATE KEY UPDATE statement, or else I'd have a trigger such as:
INSERT INTO farthest_trip (driverId, dateTravelled, totalTravelDistance) VALUES (new.driverId, new.dateTravelled, new.totalTravelDistance) ON DUPLICATE KEY UPDATE dateTravelled = new.dateTravelled, totalTravelDistance = new.totalTravelDistance WHERE totalTravelDistance < new.totalTravelDistance;
so what I do instead is: after inserting into the first table, I check in PHP whether the current distance is farther than the one previously recorded, and if so, update farthest_trip. It feels like there should be a better way, but I can't quite figure it out. Is there a more efficient way of doing this?
You could create a trigger. Something like
CREATE TRIGGER farthest_trigger BEFORE INSERT ON trips
FOR EACH ROW
BEGIN
UPDATE farthest_trip SET dateTravelled = NEW.dateTravelled, totalTravelDistance = NEW.totalTravelDistance
WHERE driverId = NEW.driverId AND totalTravelDistance < NEW.totalTravelDistance;
END;
But then you would have code that gets executed "magically" from a PHP perspective.
I think the better solution would be appending new trips to the trips table and selecting the maximum and latest journey with SELECT statements.
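For example, if trips kept one row per journey instead of one per driver, the farthest and latest trips could simply be read on demand. A minimal sketch, assuming the column names from the question:
-- farthest trip per driver
SELECT driverId, MAX(totalTravelDistance) AS farthestDistance
FROM trips
GROUP BY driverId;
-- most recent trip for a single driver
SELECT dateTravelled, totalTravelDistance
FROM trips
WHERE driverId = '$driverId'
ORDER BY dateTravelled DESC
LIMIT 1;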
I'm trying to write a Laravel eloquent statement to do the following.
Query a table and get the IDs of all the duplicate rows (or ideally all the IDs except the ID of the first instance of each duplicate).
Right now I have the following mysql statement:
select `codes`, count(`codes`) as `occurrences`, `customer_id` from `pizzas`
group by `codes`, `customer_id`
having `occurrences` > 1;
The duplicates are any rows that share a combination of codes and customer_id, for example:
codes,customer_id
183665A4,3
183665A4,3
183665A4,3
183665A4,3
183665A4,3
I'm trying to delete all but 1 of those.
This returns a set of codes with their occurrence counts and customer_id, since I only count rows as duplicates when both match.
Currently my plan is to loop through this result, save the ID of the first instance, then query again and delete every row without that ID. This seems slow: there are about 50 million rows, so each query takes forever, and we run multiple queries for each duplicate set.
// get every order that shares the same code and customer ID
$orders = Order::select('id', 'codes', DB::raw('count(`codes`) as `occurrences`'), 'customer_id')
->groupBy('codes')
->groupBy('customer_id')
->having('occurrences', '>', 1)
->limit(100)
->get();
// loop through those orders
foreach ($orders as $order)
{
// find the first order that matches this duplicate set
$first_order = Order::where('codes', $order->codes)
->where('customer_id', $order->customer_id)
->first();
// delete all but the first
Order::where('codes', $order->codes)
->where('customer_id', $order->customer_id)
->where('id', '!=', $first_order->id)
->delete();
}
There has got to be a more efficient way to track down all rows that share the same code and customer_id, and delete all the duplicates but keep the first instance, right? lol
I'm thinking maybe if I can add a fake column to the results that is an array of every ID, I could at least then remove the first ID and delete the others.
Don't involve PHP
This seems not very fast
The logic in the question is inherently slow because it's lots of queries and for each query there's:
DB<->PHP network roundtrip
PHP ORM logic/overhead
Given the numbers in the question, the whole block may need running up to 10k times (with exactly 2 occurrences for each of those 2 million duplicate records, that is roughly 1 million duplicate sets, fetched 100 at a time). For argument's sake, say it runs 1k times, covering 100k duplicate sets; overall that's:
1,000 queries finding duplicates
100,000 queries finding the first record
100,000 delete queries
201,000 queries is a lot, and the PHP overhead makes it an order of magnitude slower (a guess, based on experience).
Do it directly on the DB
Just eliminating the PHP/ORM/network time (even if everything is on the same machine) would make the process markedly faster; that would involve writing a stored procedure to mimic the PHP logic in the question.
But there's a simpler way; the specifics depend on the circumstances. In comments you've said:
The table is 140GB in size
It contains 50 million rows
Approx 2 million are duplicate records
There isn't enough free space to make a copy of the table
Taking these comments at face value the process I suggest is:
Ensure you have a functional DB backup
Before doing anything make sure you have a functional DB backup. If you manage to make a mistake and e.g. drop the table - be sure you can recover without loss of data.
You'll be testing this process on a copy of the database first anyway, right :) ?
Create a table of "ids to keep" and populate it
This is a permutation of removing duplicates with a unique index:
CREATE TABLE ids_to_keep (
id INT PRIMARY KEY,
codes VARCHAR(50) NOT NULL, # use same schema as source table
customer_id INT NOT NULL, # use same schema as source table
UNIQUE KEY derp (codes,customer_id)
);
INSERT IGNORE INTO ids_to_keep
SELECT id, codes, customer_id from pizzas;
MySQL will silently drop the rows that conflict with the unique index, resulting in a table with one id per codes+customer_id tuple.
If you don't have space for this table - make room :). It shouldn't be too large; 140GB and 50M rows means each row is approx 3kb - this temporary table will likely require single-digit % of the original size.
Delete the duplicate records
Before executing any expected-to-be-slow query use EXPLAIN to check if the query will complete in a reasonable amount of time.
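For example, a quick sanity check on the delete below could be (assuming MySQL 5.6+, where EXPLAIN accepts DELETE statements):
EXPLAIN DELETE FROM pizzas WHERE id NOT IN (SELECT id from ids_to_keep);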
To run as a single query:
DELETE FROM
pizzas
WHERE
id NOT IN (SELECT id from ids_to_keep);
If you wish to do things in chunks:
DELETE FROM
pizzas
WHERE
id BETWEEN 0 AND 10000 AND
id NOT IN (SELECT id from ids_to_keep);
Cleanup
Once the table isn't needed any more, get rid of it:
DROP TABLE ids_to_keep;
Make sure this doesn't happen again
To prevent this happening again, add a unique index to the table:
CREATE UNIQUE INDEX codes_customer_unique ON pizzas(codes, customer_id);
Try this one; it deletes every older duplicate and keeps only the latest id in each codes/customer_id set:
$deleteDuplicates = DB::table('orders as ord1')
    ->join('orders as ord2', 'ord1.codes', '=', 'ord2.codes')
    ->whereColumn('ord1.customer_id', 'ord2.customer_id')
    ->whereColumn('ord1.id', '<', 'ord2.id')
    ->delete();
We use INSERT INTO to insert a record into a table, but using it again creates another record. Is there any way to add a record and then replace the previous one each time, without adding any new record?
I know this would work:
UPDATE Customers
SET ContactName='Alfred Schmidt', City='Hamburg'
WHERE CustomerName='Alfreds Futterkiste';
But what if there is no condition, i.e. we don't know which record it is, we only know the column name. Is there any way to fill only one record and keep replacing that previous record, without creating a second record?
OK... updating if a record exists or creating a record if there are zero records is a pretty simple matter and you have a solution for it. That having been said, I would do something different and keep track of my message of the day by date:
-- This is REALLY BASIC, but, just to give you the idea...
CREATE TABLE [dbo].[MessageOfTheDay](
[MessageDate] [date] not null,
[MessageContents] [nvarchar](500) not null,
UNIQUE (MessageDate)
)
declare @MessageContents nvarchar(500), @MessageDate date
set @MessageContents = 'This is the new MOTD!!!'
set @MessageDate = GETDATE()
-- Every day, create a new record and you can keep track of previous MOTD entries...
insert into MessageOfTheDay(MessageDate, MessageContents)
values (@MessageDate, @MessageContents)
-- Get the message for today
select MessageContents from MessageOfTheDay where MessageDate = @MessageDate
-- If you want, you can now create messages for FUTURE days as well:
set @MessageContents = 'This is tomorrow''s MOTD!!!';
set @MessageDate = dateadd(D, 1, GETDATE())
insert into MessageOfTheDay(MessageDate, MessageContents)
values (@MessageDate, @MessageContents)
-- Get tomorrow's message
select MessageContents from MessageOfTheDay where MessageDate = @MessageDate
-- If you aren't necessarily going to have one per day and want to always just show the most recent entry
select top 1 MessageContents from MessageOfTheDay order by MessageDate desc
Anyway, that's just my $.02. At some point I bet you will want to look over the history of your MOTD and when you do, you will be happy that you have that history. Plus, this more accurately models the data you are trying to represent.
I got my answer and it's working now!
I used:
INSERT INTO data (a, b, c)
VALUES
('1','2','3')
ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b)
I have the following call to my database to retrieve the last row ID from an AUTO_INCREMENT column, which I use to find the next row ID:
$result = $mysqli->query("SELECT articleid FROM article WHERE articleid=(SELECT MAX(articleid) FROM article)");
$row = $result->fetch_assoc();
$last_article_id = $row["articleid"];
$last_article_id = $last_article_id + 1;
$result->close();
I then use $last_article_id as part of a filename system.
This works perfectly... until I delete a row, at which point the call retrieves an ID further down the order than the one I want.
An example would be:
ID
0
1
2
3
4-(deleted row)
5-(deleted row)
6-(next ID to be used for INSERT call)
I'd like the filename to be something like 6-0.jpg, however the filename ends up being 4-0.jpg as it targets ID 3 + 1 etc...etc...
Any thoughts on how I get the next MySQL row ID when any number of previous rows have been deleted??
You are making a significant error by trying to predict the next auto-increment value. You do not have a choice, if you want your system to scale... you have to either insert the row first, or rename the file later.
This is a classic oversight I see developers make -- you are coding this as if there would only ever be a single user on your site. It is extremely likely that at some point two articles will be created at almost the same time. Both queries will "predict" the same id, both will use the same filename, and one of the files will disappear, one of the table entries may point to the wrong file, and the other entry will reference a file that does not exist. And you'll be scratching your head asking "how did this happen?!"
Predicting auto-increment values is bad practice. Don't do it. Plan for concurrency.
Also, the information_schema tables are not really tables... they are server internals exposed to the SQL interface. Queries against its "tables" table, and SHOW TABLE STATUS, are expensive calls that you do not want to make in production... so don't be tempted to use something you find there.
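At the SQL level, the insert-first approach boils down to something like this (a hedged sketch; the column names are invented for illustration):
INSERT INTO article (title, body) VALUES ('My title', 'My body');
SELECT LAST_INSERT_ID(); -- the articleid MySQL just assigned, tracked per connection, so it is safe under concurrency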
You can use mysqli's insert_id right after you insert the new row to retrieve the new key:
$mysqli->query($yourQueryHere);
$newId = $mysqli->insert_id;
That requires the id field to be an AUTO_INCREMENT column, though (I believe).
As for the filename, you could store it in a variable, then do the query, then change the name and then write the file.
I am having a wee problem, and I am sure there is a more convenient/simpler way to achieve the solution, but all searches are throwing up blanks at the moment!
I have a MySQL db that is regularly updated by a PHP page [via a cron job] which adds or deletes entries as appropriate.
My issue is that I also need to check if any details [i.e. the phone number or similar] for an entry have changed, but doing this at every call is not possible [not only does it seem to me to be overkill, but I am restricted by a 3rd-party API call limit]. Plus, this is not critical info.
So I was thinking it might be best to just check one entry per page call, and iterate through the rows/entries with each successive page call.
What would be the best way of doing this, i.e. keeping track of which entry/row in the table should be checked next?
I have 2 ideas of how to implement this:
1) The id of the current row could be saved to a file on the server [surely not the best way].
2) An extra boolean field [check] is added to the table, set to TRUE on the first entry and FALSE on all the others.
Then on each page call it would (a rough sketch follows this list):
find the row 'where check = TRUE',
run the update check on this row,
'set check = FALSE' on it,
'set check = TRUE' on [the next row].
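A rough sketch of that rotation in SQL (the table name entries and the columns phone and check_next are hypothetical; check itself is a reserved word in MySQL):
-- the row to verify on this page call
SELECT id, phone FROM entries WHERE check_next = TRUE LIMIT 1;
-- after checking row :current_id, move the flag along, wrapping back to the lowest id at the end
UPDATE entries SET check_next = FALSE WHERE check_next = TRUE;
UPDATE entries SET check_next = TRUE
WHERE id = COALESCE(
    (SELECT nid FROM (SELECT MIN(id) AS nid FROM entries WHERE id > :current_id) AS nxt),
    (SELECT nid FROM (SELECT MIN(id) AS nid FROM entries) AS wrap));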
Is this the best way to do this, or does anyone have a better suggestion?
thanks in advance !
.k
PS sorry about the title
Not sure if this is a good solution, but when I have to make massive nightly updates, I write the updates to a new blank table, then do a SQL SELECT joining the tables to tell me where they are different, then do another SQL UPDATE like:
UPDATE table, temptable
SET table.col1=temptable.col1, table.col2=temptable.col2 ......
WHERE table.id = temptable.id;
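The "tell me where they are different" SELECT mentioned above might look like this (a hedged sketch reusing the same placeholder names; table needs backticks because it is a reserved word):
SELECT `table`.id
FROM `table`
INNER JOIN temptable ON `table`.id = temptable.id
WHERE `table`.col1 <> temptable.col1 OR `table`.col2 <> temptable.col2;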
You can store the timestamp that a row is updated implicitly using ON UPDATE CURRENT_TIMESTAMP [http://dev.mysql.com/doc/refman/5.0/en/timestamp.html] or explicitly in your update SQL. Then all you need to do is select the row(s) with the lowest timestamp (using ORDER BY and LIMIT) and you have the next row to process, so long as you ensure that the timestamp is updated each time.
e.g. Say you used the field last_polled_on TIMESTAMP to store the time you polled a row.
Your insert looks like:
INSERT INTO table (..., last_polled_on) VALUES (..., NOW());
Your update looks like:
UPDATE table SET ..., last_polled_on = NOW() WHERE ...;
And your select for the next row to poll looks like:
SELECT ... FROM table ORDER BY last_polled_on LIMIT 1;
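For the implicit route mentioned at the top of this answer, the column can maintain the timestamp itself (a sketch; your_table is a stand-in for whichever table you poll):
ALTER TABLE your_table
ADD COLUMN last_polled_on TIMESTAMP NOT NULL
DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
-- note: ON UPDATE CURRENT_TIMESTAMP only fires when an UPDATE actually changes some column value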
Hey, I have a table with "id", "name", and "weight" columns. Weight is an unsigned small int.
I have a page that displays items ordered by "weight ASC". It'll use drag-n-drop, and once the order is changed, will pass out a comma-separated string of ids (in the new order).
Let's say there's 10 items in that table. Here's what I have so far:
Sample input:
5,6,2,9,10,4,8,1,3,7
Sample PHP handler (error handlers & security stuff excluded):
<?php
$weight = 0;
$id_array = explode(',', $id_string);
foreach ($id_array as $key => $val)
{
mysql_query("UPDATE tbl SET weight = '$weight' where id = '$val' LIMIT 1");
$weight++;
}
?>
When I make a change to column order, will my script need to make 10 separate UPDATE queries, or is there a better way?
You could create a temporary table with the new data in it (i.e., id and weight are the columns), then update the table with this data.
create temporary table t (id int, weight float);
insert into t(id, weight) values (1, 1.0), (2, 27), etc
update tbl inner join t on t.id = tbl.id
set tbl.weight = t.weight;
So, you have one create statement, one insert statement, and one update statement.
You can only specify one WHERE clause in a single UPDATE -- which means, in your case, that you can only set one row's new weight at a time.
With 10 items, I don't know if I would go through that kind of trouble (it means re-writing some code -- even if that's not that hard), but for more rows, a solution would be to:
delete all the rows
insert them all back
doing all that in a transaction, of course.
The nice point is that you can do several inserts in a single query; I don't know about 10 items, but for 25 or 50, it might be quite nice.
Here is an example, from the insert page of the MySQL manual (quoting):
INSERT statements that use VALUES syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas. Example:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
Of course, you should probably not insert "too many" items in a single INSERT query -- one insert per 50 items might be OK, though (to find the "right" number of items, you'll have to benchmark, I suppose ^^).
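Putting those steps together for the sample input might look like this (a hedged sketch; the name values are invented, and real code would rebuild the full column list from the existing data before deleting anything):
START TRANSACTION;
DELETE FROM tbl;
INSERT INTO tbl (id, name, weight) VALUES
(5, 'item five', 0), (6, 'item six', 1), (2, 'item two', 2),
(9, 'item nine', 3), (10, 'item ten', 4), (4, 'item four', 5),
(8, 'item eight', 6), (1, 'item one', 7), (3, 'item three', 8),
(7, 'item seven', 9);
COMMIT;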
Yes, you would need to do 10 updates. There are ways to batch up multiple queries in a single call to mysql_query, but it's probably best to avoid that.
If it's performance you are worried about, make sure you try it first before worrying about that. I suspect that doing 10 (or even 20 or 30) updates will be plenty fast.
10 updates is the simplest way conceptually. If you've got a bazillion rows that need to be updated, then you might have to try something different, such as creating a temporary table and using a JOIN in your UPDATE statement, or a subquery with a row constructor.
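One hedged reading of the subquery variant: build the id-to-weight mapping as an inline derived table instead of a real temporary table (table and column names are from the question, values from the sample input):
UPDATE tbl
INNER JOIN (
    SELECT 5 AS id, 0 AS weight
    UNION ALL SELECT 6, 1
    UNION ALL SELECT 2, 2
    -- ...one row per id in the submitted order
) AS new_order ON new_order.id = tbl.id
SET tbl.weight = new_order.weight;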
Store the records in a temp table with a batch insert, delete the records from tbl, and then batch-insert them back into tbl from the temp table.