I have a piece of code here:
$shorturls = ShortUrl::withCount("clicks")
    ->where([["url_id", "=", $id->id], ["user_id", "=", Auth::id()]])
    ->orderBy("clicks_count", "desc")
    ->paginate(10);
This query runs in about 6000ms (1 million rows of data).
When I comment out the orderBy it takes around 300-500ms.
(The ShortUrl model has many Clicks.)
I want a way to have a field in my short URL named clicks_count to make this query faster.
Since you order by clicks_count, MySQL has to count the clicks for all rows in ShortUrl (1 million) before it can order, not just the 10 rows of the current page.
You could:
Make sure that the ShortUrl<->clicks relationship has the correct indexes in the DB. Looking at the query, I would guess the field in the clicks table that should be indexed is named "url_id".
Even though it is indexed, it could still take some time. So another idea would be to denormalize the count: for each click, you increment a field in the short_urls table. That way it does not have to count on read.
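For example, a minimal sketch of both suggestions (the index and the clicks_count column are assumptions about your schema, and $clickAttributes is just a hypothetical placeholder for however you already store the click), e.g. inside a migration's up() method:
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

// Index the foreign key used by the relationship and add a denormalized counter.
Schema::table('clicks', function (Blueprint $table) {
    $table->index('url_id');
});
Schema::table('short_urls', function (Blueprint $table) {
    $table->unsignedInteger('clicks_count')->default(0);
});
And wherever a click is recorded:
$shortUrl->clicks()->create($clickAttributes); // however you already store the click
$shortUrl->increment('clicks_count');          // keep the denormalized counter in sync
The listing query can then drop withCount("clicks") and order by the real clicks_count column, so nothing is counted at read time.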
If that does not help, then please provide your table structure, including indexes, for both the short_urls and clicks tables.
Related
I'm trying to write a Laravel Eloquent statement to do the following.
Query a table and get the IDs of all the duplicate rows (or ideally all the IDs except the ID of the first instance of each duplicate).
Right now I have the following MySQL statement:
select `codes`, count(`codes`) as `occurrences`, `customer_id` from `pizzas`
group by `codes`, `customer_id`
having `occurrences` > 1;
The duplicates are any rows that share a combination of codes and customer_id, for example:
codes,customer_id
183665A4,3
183665A4,3
183665A4,3
183665A4,3
183665A4,3
I'm trying to delete all but 1 of those.
This returns a set of the codes with their occurrences and their customer_id, as I only want rows that match on both.
Currently I'm thinking of looping through this, saving the ID of the first instance, and then calling this again to delete any row without that ID. This doesn't seem very fast: there are about 50 million rows, so each query takes forever, and we run multiple queries for each duplicate we delete.
// get every order that shares the same code and customer ID
$orders = Order::select('id', 'codes', DB::raw('count(`codes`) as `occurrences`'), 'customer_id')
    ->groupBy('codes')
    ->groupBy('customer_id')
    ->having('occurrences', '>', 1)
    ->limit(100)
    ->get();
// loop through those orders
foreach ($orders as $order) {
    // find the first order that matches this duplicate set
    $first_order = Order::where('codes', $order->codes)
        ->where('customer_id', $order->customer_id)
        ->first();

    // delete all but the first
    Order::where('codes', $order->codes)
        ->where('customer_id', $order->customer_id)
        ->where('id', '!=', $first_order->id)
        ->delete();
}
There has got to be a more efficient way to track down all rows that share the same code and customer_id, and delete all the duplicates but keep the first instance, right? lol
I'm thinking maybe if I can add a fake column to the results that is an array of every ID, I could at least then remove the first ID and delete the others.
Don't involve PHP
This seems not very fast
The logic in the question is inherently slow because it's lots of queries and for each query there's:
DB<->PHP network roundtrip
PHP ORM logic/overhead
Given the numbers in the question, the whole block needs calling up to 10k times (if there are exactly 2 occurrences for every one of those 2 million duplicate records). For argument's sake, let's say the block ends up being called 1k times (i.e. 100k sets of duplicates, 100 per call); overall that's:
1,000 queries finding duplicates
100,000 queries finding the first record
100,000 delete queries
201,000 queries is a lot, and the PHP overhead makes it an order of magnitude slower (a guess, based on experience).
Do it directly on the DB
Just eliminating the PHP/ORM/network time (even if it's on the same machine) would make the process markedly faster; that would involve writing a stored procedure to mimic the PHP logic in the question.
But there's a simpler way; the specifics depend on the circumstances. In comments you've said:
The table is 140GB in size
It contains 50 million rows
Approx 2 million are duplicate records
There isn't enough free space to make a copy of the table
Taking these comments at face value the process I suggest is:
Ensure you have a functional DB backup
Before doing anything make sure you have a functional DB backup. If you manage to make a mistake and e.g. drop the table - be sure you can recover without loss of data.
You'll be testing this process on a copy of the database first anyway, right :) ?
Create a table of "ids to keep" and populate it
This is a permutation of removing duplicates with a unique index:
CREATE TABLE ids_to_keep (
  id INT PRIMARY KEY,
  codes VARCHAR(50) NOT NULL,   # use same schema as source table
  customer_id INT NOT NULL,     # use same schema as source table
  UNIQUE KEY derp (codes, customer_id)
);

INSERT IGNORE INTO ids_to_keep
SELECT id, codes, customer_id FROM pizzas;
MySQL will silently drop the rows conflicting with the unique index, resulting in a table with one id per codes+customer_id tuple.
If you don't have space for this table - make room :). It shouldn't be too large; 140GB across 50M rows means each row is approx 3 KB, so this temporary table will likely require a single-digit percentage of the original size.
Delete the duplicate records
Before executing any expected-to-be-slow query use EXPLAIN to check if the query will complete in a reasonable amount of time.
To run as a single query:
DELETE FROM pizzas
WHERE id NOT IN (SELECT id FROM ids_to_keep);
If you wish to do things in chunks:
DELETE FROM pizzas
WHERE id BETWEEN 0 AND 10000
  AND id NOT IN (SELECT id FROM ids_to_keep);
Cleanup
Once the table isn't needed any more, get rid of it:
DROP TABLE ids_to_keep;
Make sure this doesn't happen again
To prevent this happening again, add a unique index to the table:
CREATE UNIQUE INDEX pizzas_codes_customer_id_unique ON pizzas (codes, customer_id);
Try this one; it will delete the duplicates and keep only the latest id of each set:
$deleteDuplicates = DB::table('orders as ord1')
    ->join('orders as ord2', function ($join) {
        $join->on('ord1.codes', '=', 'ord2.codes')
             ->on('ord1.customer_id', '=', 'ord2.customer_id')
             ->on('ord1.id', '<', 'ord2.id');
    })
    ->delete();
So this one has kinda given me a mental runaround. I have Rows, Users, and RowActivity models. Each time a user interacts with a row, it generates a single line in the row activity table saying what status the user changed. Each row can have 3-8 activity entries, with a single user reference per line. So my question is this: I want to analyze the rows that particular users have interacted with. I'd like to search for user A and get all the rows they've touched, but the only way to do that is to query the RowActivity table.
So the query would essentially be this:
Row::whereHas('rowActivity')->whereWithin(Collection of RowActivity, user_id = requested-user)->get();
I know that I can go the long way and query RowActivity::where('user_id', $requestedUser) and then get all the other row activities based on the related row_id of those results, but I feel there's a cleaner way that I can't figure out.
Just for clarification, the row activities are used to generate reports about which users changed the status of the row and how long the durations between the changes are. So I need to get all the activities associated with a row, as well as all the rows that a user has touched, so that I can analyze how long their portion of that interaction took.
If I need to do this as a multi-level query, so be it; I don't have an aversion to that, I just want to make sure my queries are pristine. If I did it as multi-level, it would look something like this:
$ra = RowActivity::where('user_id', $requestedUser)->pluck('row_id');
$rows = Row::with('rowActivity')->find($ra); //get all rows and their associated activities based on the plucked row id from query 1
You can do that within whereHas(); I hope that's what you want:
Row::whereHas('rowActivity', function ($query) use ($requestedUser) {
    $query->where('user_id', $requestedUser);
})->get();
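If you also need every activity on each of those rows for the report (as with the with('rowActivity') call in your multi-level version), here is a sketch combining both, where the eager load is left unconstrained so you get all activities for each matched row, not just the requested user's:
$rows = Row::whereHas('rowActivity', function ($query) use ($requestedUser) {
        $query->where('user_id', $requestedUser);
    })
    ->with('rowActivity') // load all activities for each matched row
    ->get();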
I have two tables, both containing around 200,000 rows. I have written a query like the one below to retrieve data using some joins.
This is the query I have tried:
$dbYTD = DB::table('stdtsum as a')
    ->join(DB::raw("(select distinct s_id, c_cod, compid from stdcus) b"), function ($join) {
        $join->on('a.compid', '=', 'b.compid')->on('a.c_cod', '=', 'b.c_cod');
    })
    ->select('b.s_id', DB::raw('sum(turnover) as sumturn'))
    ->whereBetween('date', [$startYTD, $endYTD])
    ->groupBy('b.s_id')
    ->get()
    ->toArray();
This query gives the correct result, but it takes a very long time to run and sometimes even times out.
Can anybody help me optimize this query?
You'll need to index all the columns on which a join condition is applied; in your case, compid and c_cod in both tables.
Generally, primary key columns are automatically indexed by databases, though you'll have to index foreign key columns manually.
Some indexing tips:
Create an index on the field that has the biggest variety of values first. This will result in the “most bang for your buck”, so to speak.
Keep indexes small. It's better to have an index on just zip code or postal code, rather than postal code & country. The smaller the index, the better the response time.
For high frequency functions (thousands of times per day) it can be wise to have a very large index, so the system does not even need the table for the read function.
For small tables an index may be disadvantageous. The same can also be said for any function where the system would be better off by scanning the whole table.
Remember that an index slows down additions, modifications and deletes because the indexes need to be updated whenever the table is. So, the best practice is to add an index for values that are often used for a search, but that do not change much. As such, an index on a bank account number is better than one on the balance.
Tips reference: https://www.databasejournal.com/features/mysql/article.php/3840606/Maximizing-Query-Performance-through-Column-Indexing-in-MySQL.htm
I have three tables.
A radar data table (with id as primary key), which also has two columns, violation_file_id and violation_speed_id.
A violation_speed table (with id as primary key).
A violation_file table (with id as primary key).
I want to select all radar data, limited to 1000 rows, from some start offset to an end offset, joined with the violation_speed table. Each radar record must have a violation_speed_id.
I then want to join with the violation_file table, but not every radar record has a corresponding violation_file_id; some records just have a violation_file_id of 0, which means there is no corresponding file.
My current SQL is like this:
$results = DB::table('radar_data')
    ->join('violation_speed', 'radar_data.violation_speed_id', '=', 'violation_speed.id')
    ->leftJoin('violation_video_file', 'radar_data.violation_video_file_id', '=', 'violation_video_file.id')
    ->select('radar_data.id as radar_id',
        'radar_data.violation_video_file_id',
        'radar_data.violation_speed_id',
        'radar_data.speed',
        'radar_data.unit',
        'radar_data.violate',
        'radar_data.created_at',
        'violation_speed.violation_speed',
        'violation_speed.unit as violation_unit',
        'violation_video_file.video_saved',
        'violation_video_file.video_deleted',
        'violation_video_file.video_uploaded',
        'violation_video_file.path',
        'violation_video_file.video_name')
    ->where('radar_data.violate', '=', '1')
    ->orderBy('radar_data.id', 'desc')
    ->offset($from_id)
    ->take($max_length)
    ->get();
It is PHP Laravel, but I think the translation to a MySQL statement is straightforward.
My question is: is this a good way to select the data? It works, but it seems a bit slow once the radar data grows large.
Thanks.
Assuming you have the proper indices set, this is largely the way to go. The only thing that's not 100% clear to me is what the offset() method does, but if it simply adds a WHERE clause then this should give you pretty much the best performance you're going to get. If not, replace it with a where('radar_data.id', '>', $from_id).
The most important indices are the ones on the foreign keys and primary keys here. And make sure not to forget the violate index.
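A minimal sketch of that seek-style replacement (joins and the select list are omitted for brevity; $lastSeenId is a hypothetical cursor holding the smallest radar_data.id of the previous page, and since the query orders by id descending the comparison is '<' to move to the next page):
use Illuminate\Support\Facades\DB;

// Seek pagination sketch: filter by id instead of using OFFSET,
// so MySQL does not have to walk past all of the earlier rows.
$page = DB::table('radar_data')
    ->where('radar_data.violate', '=', '1')
    ->where('radar_data.id', '<', $lastSeenId) // hypothetical cursor from the previous page
    ->orderBy('radar_data.id', 'desc')
    ->limit($max_length)
    ->get();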
The speed of a query often relies on proper indexing of the columns used in the joining clause and the WHERE clause.
In your query there are 2 joins, and if the joining keys are not indexed then you might need to apply the following:
alter table radar_data add index violation_speed_id_idx(violation_speed_id);
alter table radar_data add index violation_video_file_id_idx(violation_video_file_id);
alter table radar_data add index violate_idx(violate);
The id columns are primary keys, hence they are already indexed and should be covered.
I have a table in MySQL that I'm accessing from PHP. For example, let's have a table named THINGS:
things.ID - int primary key
things.name - varchar
things.owner_ID - int for joining with another table
My select statement to get what I need might look like:
SELECT * FROM things WHERE owner_ID = 99;
Pretty straightforward. Now, I'd like users to be able to specify a completely arbitrary order for the items returned from this query. The list will be displayed, they can then click an "up" or "down" button next to a row and have it moved up or down the list, or possibly a drag-and-drop operation to move it to anywhere else. I'd like this order to be saved in the database (same or other table). The custom order would be unique for the set of rows for each owner_ID.
I've searched for ways to provide this ordering without luck. I've thought of a few ways to implement this, but help me fill in the final option:
1. Add an INT column and set its value to whatever I need to get rows returned in my order. This presents the problem of scanning row-by-row to find the insertion point, and possibly needing to update the preceding/following rows' sort column.
2. Have a "next" and "previous" column, implementing a linked list. Once I find my place, I'll just have to update at most 2 rows to insert the row. But this requires scanning for the location from row #1.
3. Some SQL/relational DB trick I'm unaware of...
I'm looking for an answer to #3 because it may be out there, who knows. Plus, I'd like to offload as much as I can on the database.
From what I've read you need a new table containing the ordering for each user; say it's called user_orderings.
This table should contain the user ID, the position of the thing, and the ID of the thing. (user_id, thing_id) should be the PK. This way you need to update this table every time the order changes, but you can get the things for a user in the order he/she wants by joining it with the things table and using ORDER BY on the position column. It should work.
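A minimal sketch of that layout from plain PHP/PDO (the user_orderings name and its user_id/thing_id/position columns are the assumptions described above; the connection details are placeholders):
<?php
// Fetch one owner's things in their custom order via the user_orderings table.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret'); // placeholder credentials

$stmt = $pdo->prepare(
    'SELECT t.*
       FROM things AS t
       JOIN user_orderings AS uo
         ON uo.thing_id = t.ID
        AND uo.user_id = :user_id
      WHERE t.owner_ID = :owner_id
   ORDER BY uo.position'
);
$stmt->execute(['user_id' => 99, 'owner_id' => 99]);
$things = $stmt->fetchAll(PDO::FETCH_ASSOC);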
The simplest expression of an ordered list is: 3,1,2,4. We can store this as a string in the parent table; so if our table is photos with the foreign key profile_id, we'd place our photo order in profiles.photo_order. We can then consider this field in our order by clause by utilizing the find_in_set() function. This requires either two queries or a join. I use two queries but the join is more interesting, so here it is:
select photos.photo_id, photos.caption
from photos
join profiles on profiles.profile_id = photos.profile_id
where photos.profile_id = 1
order by find_in_set(photos.photo_id, profiles.photo_order);
Note that you would probably not want to use find_in_set() in a where clause due to performance implications, but in an order by clause, there are few enough results to make this fast.
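To keep that photo_order string in sync after a drag-and-drop, one option (a sketch, not part of the answer above; the table and column names follow the example query, and $orderedIds is a hypothetical input holding the full list of photo ids in the user's new order) is to rewrite the whole list in a single UPDATE:
<?php
// Persist the new custom order for one profile.
$orderedIds = [3, 1, 2, 4]; // e.g. collected from the drag-and-drop UI

$pdo = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret'); // placeholder credentials
$stmt = $pdo->prepare('UPDATE profiles SET photo_order = :photo_order WHERE profile_id = :profile_id');
$stmt->execute([
    'photo_order' => implode(',', array_map('intval', $orderedIds)),
    'profile_id'  => 1,
]);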