Laravel Many to Many Relationship Pagination with 20,000 rows - php

I have a table for articles and a table for categories, along with a pivot table; I get all related articles within a category using:
$category = Category::first();
return $category->articles()->paginate(10); // many-to-many relationship: $this->belongsToMany(Article::class, 'article_category')
It returns the right articles, but it is very slow when there are more than 20,000 articles in a category. Is there any way to make the query faster?
Note: all tables have indexes.

I have faced this quite a few times.
So far the only solution I know is to NOT USE ELOQUENT for COMPLEX QUERIES.
You have to switch to the DB query builder.
I assume you are using whereHas in your query. It slows queries down considerably; if you do use it, switch to the DB query builder with join methods instead. It will be a lot faster.
The only drawback is that you can't use the relationships you have already declared on the model; you have to link the tables back manually.
I recently heard that this was improved in 5.3, but I have not checked it out yet.
Hopefully this solves the problem.
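For illustration, a query-builder version of that paginated query might look like this (a sketch; the pivot table and column names article_category, article_id, and category_id are assumptions based on the question):

```php
use Illuminate\Support\Facades\DB;

// Join the pivot table directly instead of going through the Eloquent
// relationship; paginate() still adds the LIMIT/OFFSET for you.
$articles = DB::table('articles')
    ->join('article_category', 'article_category.article_id', '=', 'articles.id')
    ->where('article_category.category_id', $category->id)
    ->select('articles.*')
    ->paginate(10);
```

Note that paginate() also runs a COUNT(*) query to compute the total; if you don't need the total page count, simplePaginate() skips that query, which helps on large tables.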

Related

Laravel Eloquent: get unique models between relationships

I have a scenario like this:
I have a User model that has a one-to-many relationship with the Post model.
I have a Hashtag model that has a one-to-many relationship with the Post model.
Recap: ONE user has MANY posts, ONE post belongs to ONE hashtag, ONE hashtag has MANY posts.
I would like to fetch only the unique user records of a hashtag.
I'm able to do it in a non-scalable way (fetching all the posts first and then iterating, filtering by user id), but I need to maintain scalability for large numbers of records.
Edit: I saw a partial solution in the Laravel docs.
Laravel collections have a method called unique().
With that method I can specify the attribute that should be unique in the result.
In my case I figured it out with:
$users = $hashtag->posts->unique('user_id');
But I can't paginate the query this way...
Has anyone a solution for that?
You have to paginate your main model caller.
I don't know whether this exactly reproduces your scenario, but consider this example.
Let's say you have a hashtags table; the code could be something like this:
$users = DB::table('hashtags AS tags')
->select('tags.*')
->join('posts AS post', 'post.hashtag_id', '=', 'tags.id')
->distinct()
->paginate(5, ['tags.*']);
I don't know whether this query will match your case exactly, but I believe the best approach here is a raw query. It may look costly for your database, but you can work around that by indexing and partitioning your database.
Remember that Eloquent, even when we're building join queries, can sometimes be even more costly for your database.
Since you're worried about scalability, the best option might be to write a database view that fetches all the data, with proper indexing and partitioning.
Try using DISTINCT:
$users = $hashtag->posts()->selectRaw('DISTINCT(user_id) AS unique_user_id')->paginate(10);
OR
use groupBy():
$users = $hashtag->posts()->groupBy('user_id')->select('user_id')->paginate(10);
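Another option that keeps real User models and still paginates correctly (a sketch; it assumes a posts relationship on User and a hashtag_id foreign key on posts, neither of which is shown in the question):

```php
// Start the query from User so each user appears only once; whereHas()
// constrains the result to users with at least one post in the hashtag.
$users = User::whereHas('posts', function ($query) use ($hashtag) {
        $query->where('hashtag_id', $hashtag->id);
    })
    ->paginate(10);
```

Because the deduplication happens in SQL rather than in a collection, paginate() sees the correct total count.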

is it advisable to use polymorphic relations with joins?

I've created my migrations and their relationships. I'm using polymorphic relations, because in one table I could have the value of one model or the value of another model. As you might guess, I have something like this:
ID Mappable_type Mappable_id value
Mappable_type could be one of many models, and mappable_id will be the appropriate id from that model. I need this because I'm creating a dynamic form builder, and that's how I designed the database.
Now I need to fetch some records from 5 tables together, so I have to use joins. The table with mappable_type is the third of those 5 joins. What should I do to fetch only the data from these 5 joins that has a specific model type in that morph table? I'm doing something like this for now:
$final = Dynamic::join('dynamic_properties_values','dynamic_properties_values.dynamic_id','=','dynamic.id')
->join('dynamic_properties_mapping','dynamic_properties_mapping.id','=','dynamic_properties_values.mapping_id')
->where('dynamic_properties_mapping.mappable_type','=','App\Driver')
->get();
As you can see, I have written something like
"App\Driver"
by hand. It doesn't matter whether I write it by hand or save it in a variable and then use it here. What really bothers me is this: when I inserted data into these morph tables, the model got saved as "App\Driver". What if I make a mistake and, when fetching with joins, write something like App\Http\Driver? Or what if I change the location of the model itself in my code project?
What's the best practice to avoid those kinds of errors by the time my project reaches a huge number of users?
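One common way to avoid hard-coded class strings (a sketch, not from the question itself) is to register a morph map, so the database stores a stable alias instead of a class name, and to derive the stored value from the model rather than typing it:

```php
// In a service provider's boot() method: store the alias 'driver' in
// mappable_type instead of the full class name, so moving App\Driver
// later doesn't invalidate existing rows.
use Illuminate\Database\Eloquent\Relations\Relation;

Relation::morphMap([
    'driver' => \App\Driver::class,
]);

// When querying, ask the model for its morph type instead of
// writing the string by hand:
$type = (new \App\Driver)->getMorphClass(); // 'driver' once the map is set

$final = Dynamic::join('dynamic_properties_values', 'dynamic_properties_values.dynamic_id', '=', 'dynamic.id')
    ->join('dynamic_properties_mapping', 'dynamic_properties_mapping.id', '=', 'dynamic_properties_values.mapping_id')
    ->where('dynamic_properties_mapping.mappable_type', '=', $type)
    ->get();
```

A typo in the class name then becomes an autoload error at development time rather than a silently empty result set.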

Caching only a relation data from FuelPHP ORM result

I'm developing an app with FuelPHP and MySQL, and I'm using the provided ORM functionality. The problem is with the following tables:
Table: pdm_data
Massive table (350+ columns, many rows)
Table data is rather static (updates only once a day)
Primary key: obj_id
Table: change_request
Only few columns
Data changes often (10-20 times / min)
References primary key (obj_id from table pdm_data)
Users can customize the datasheet that is visible to them, e.g. they can save filters (such as change_request.obj_id = 34 AND pdm_data.state = 6) on columns, which are then translated into real-time queries with the ORM.
However, querying with the ORM is really slow, as the pdm_data table is large and even ~100 rows result in many MB of data. The largest problem seems to be in the FuelPHP ORM: even when the query itself is relatively fast, model hydration etc. takes many seconds. The ideal solution would be to cache results from the pdm_data table, as it is rather static. However, as far as I know, FuelPHP doesn't let you cache tables through relations (you can cache the complete result of a query, thus both tables or neither).
Furthermore, using a normal SQL query with a join instead of the ORM is not an ideal solution, as I need to handle other tasks where hydrated models are awesome.
I have currently following code:
//Initialize the query and use eager-loading
$query = Model_Changerequest::query()->related('pdmdata');
foreach($filters as $filter)
{
//First parameter can point to either table
$query->where($filter[0], $filter[1], $filter[2]);
}
$result = $query->get();
...
Does someone have a good solution for this?
Thanks for reading!
The slowness of the version 1 ORM is a known problem which is being addressed with v2. My current benchmarks are showing that v1 orm takes 2.5 seconds (on my machine, ymmv) to hydrate 40k rows while the current v2 alpha takes around 800ms.
For now I am afraid that the easiest solution is to do away with the ORM for large selects and construct the queries using the DB class. I know you said that you want to keep the abstraction of the ORM to ease development, one solution is to use as_object('MyModel') to return populated model objects.
On the other hand if performance is your main concern then the ORM is simply not suitable.
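A sketch of that suggestion, using the question's own filter loop (table and model names come from the question; the join column follows from the description of obj_id as the shared key):

```php
// Build the query with FuelPHP's DB class instead of the ORM, but keep
// model objects via as_object(). Hydration here is a plain property
// copy, which is much cheaper than full ORM hydration.
$query = DB::select('cr.*')
    ->from(array('change_request', 'cr'))
    ->join(array('pdm_data', 'pd'))->on('pd.obj_id', '=', 'cr.obj_id');

foreach ($filters as $filter) {
    // Filters still point at either table, as in the original code.
    $query->where($filter[0], $filter[1], $filter[2]);
}

$result = $query->as_object('Model_Changerequest')->execute();
```

The resulting objects are populated Model_Changerequest instances, but they are not tracked by the ORM, so relation accessors and save() behaviour will differ from fully hydrated models.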

One SELECT (to fetch/rule them all!) then handle the entire collection using Ruby array functions?

I'm wondering about looping through a set of rows and, for each row, fetching another table's set of rows. For example, looping through a series of categories and, for each category, fetching all its news articles, perhaps to display on a single page. It seems like a lot of SELECT queries: one to get all categories, and one for each category (to get its articles). So, my question is: is it quicker to simply do two fetches at the start:
categories = Category.all
articles = Article.all
...and then just use select() or where() on the articles by category id to take only those from the articles array? Replacing multiple SELECT queries with multiple array functions, which is quicker? I imagine each app may vary, depending on the number of rows. I would be interested to hear what people think, or any links that clarify this, as I didn't find much on the matter myself.
My code example above is Ruby on Rails, but this question might apply to any language. I also use PHP from time to time.
It depends on what you want to do with your data. You could try eager loading.
categories = Category.includes(:articles)
Here's the documentation. http://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations
I think you're describing what's called the N+1 problem (I'm new to this too). Here's another stack overflow question that addresses this issue generally: What is SELECT N+1?
n+1 is the worst, especially when you think about 10k or 10M articles like timpone pointed out. For 10M articles you'll be hitting the DB 10,000,001 times for a single request (hence the name n + 1 problem). Avoid this. Always. Anything is better than this.
If Category has a has_many relation to Article (and Article has a belongs_to relation to Category) you could use #includes to "pre-fetch" the association like so:
categories = Category.includes(:articles)
This will run two queries, one for the Category records and one for the Article records. You can write it out as two explicit select/where statements, but I think doing it this way is semantically clearer. If you want to retrieve all the categories and then, for each category, get all of its articles, you can write code like this (in Ruby):
categories.each do |category|
  category.articles.each do |article|
    # do stuff...
  end
end
and it's immediately clear that you mean "all the articles for this category instance".

PHP forum database optimisation

I've been thinking about creating a forum in PHP, so I did a little research to see what the standard is for the tables people create in the database. On most websites I've looked at, they always choose to have one table for the threads and a second for the posts on the threads.
Having a table for the threads seems perfectly rational to me, but one table to hold all the posts on all the threads seems like a little too much. Would it be better to create a table for each thread to hold that thread's posts, instead of sticking a few hundred thousand posts in one table?
The tables should represent the structure of the data in your database. If you have 2 objects, which in this case are your threads and your posts, you should put them in 2 tables.
Trust me, it will be a nightmare trying to figure out the right table to query for each thread if you do it the way you're thinking. What would the SQL look like? Something like
SELECT *
FROM PostTable17256
and you would have to dynamically construct this query on each request.
However, by using 1 table, you can simply get a ThreadID and pass it as a variable to your query.
SELECT *
FROM Posts
WHERE ThreadID = $ThreadID
Relational databases are designed to have tables which hold lots of rows. You would probably be surprised what DBAs consider to be a "lot" by the way. A table with 1,000,000 rows is considered small to medium in most places.
Nope, nope, nope. Databases love huge tables. Splitting posts into multiple tables will cause many, many headaches.
Storing posts in one table is the best solution.
MySQL can easily hold millions of rows in a table.
Creating multiple tables may cause a few problems.
For example, you would not be able to use a JOIN across posts from different threads.
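As a concrete sketch of the single-table approach (table and column names are illustrative, not from the question), here is the per-thread lookup as one parameterized query against an in-memory SQLite database:

```php
<?php
// Minimal sketch of the single-posts-table design using PDO.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL,
    body TEXT NOT NULL
)');
// An index on thread_id keeps per-thread lookups fast even with
// hundreds of thousands of rows in the one table.
$pdo->exec('CREATE INDEX idx_posts_thread ON posts (thread_id)');

$insert = $pdo->prepare('INSERT INTO posts (thread_id, body) VALUES (?, ?)');
$insert->execute([1, 'first post']);
$insert->execute([1, 'second post']);
$insert->execute([2, 'other thread']);

// One table, one query: the thread id is just a bound parameter,
// so nothing needs to be constructed dynamically per thread.
$select = $pdo->prepare('SELECT body FROM posts WHERE thread_id = ? ORDER BY id');
$select->execute([1]);
$rows = $select->fetchAll(PDO::FETCH_COLUMN);

echo implode(', ', $rows), "\n"; // first post, second post
```

Binding the thread id also avoids the SQL-injection risk of interpolating $ThreadID into the query string.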
