I am working on an application in CakePHP 3.7.
We have 3 database tables with the following hierarchy. These tables are associated correctly according to the Table classes:
regulations
groups
filters
(The associations are shown in a previous question: CakePHP 3 - association is not defined - even though it appears to be)
I can get all of the data from all three tables as follows:
$regulations = TableRegistry::getTableLocator()->get('Regulations');
$data = $regulations->find('all', ['contain' => ['Groups' => ['Filters']]]);
$data = $data->toArray();
The filters table contains >1300 records. I'm therefore trying to build a feature which loads the data progressively via AJAX calls as the user scrolls down the page similar to what's described here: https://makitweb.com/load-content-on-page-scroll-with-jquery-and-ajax/
The problem is that I need to be able to count the total number of rows returned. However, the data for this exists in 3 tables.
If I do debug($data->count()); it will output 8 because there are 8 rows in the regulations table. Ideally I need a count of the rows it's returning in the filters table (~1300 rows) which is where most of the data exists in terms of the initial load.
The problem is further complicated because this feature allows a user to perform a search. It might be the case that a given search term exists in all 3 tables, 1 - 2 of the tables, or not at all. I don't know whether the correct way to do this is to try and count the rows returned in each table, or the rows overall?
I've read How to paginate associated records? and Filtering By Associated Data in the Cake docs.
Additional information
The issue seems to come down to how to write a query using the ORM syntax Cake provides. The following (plain MySQL) query will actually do what I want. Assuming the user has searched for "Asbestos":
SELECT
regulations.label AS regulations_label,
groups.label AS groups_label,
filters.label AS filters_label
FROM
groups
JOIN regulations
ON groups.regulation_id = regulations.id
JOIN filters
ON filters.group_id = groups.id
WHERE regulations.label LIKE '%Asbestos%'
OR groups.label LIKE '%Asbestos%'
OR filters.label LIKE '%Asbestos%'
ORDER BY
regulations.id ASC,
groups_label ASC,
filters_label ASC
LIMIT 0,50
Let's say there are 203 rows returned. The LIMIT condition means I am getting the first 50. Some JS on the frontend (how it works isn't really relevant here) will make an AJAX call as the user scrolls to the bottom of the page to re-run this query with a different limit (e.g. LIMIT 50,50 for the next set of results).
The problem seems to be twofold:
1. If I write the query using Cake's ORM syntax the output is a nested array. Conversely, if I write it in plain SQL it returns a table which has just 3 columns with the labels that I need to display. This structure is much easier to work with in terms of outputting the required data on the page.
2. The second - and perhaps more important - issue is that I can't see how you could write the LIMIT condition in the ORM syntax to make this work, due to the nested structure described in 1. If I added $data->limit(50), for example, it only imposes this limit on the regulations table. So essentially the search works for any associated data on the first 50 rows in regulations, as opposed to the LIMIT condition I've written in MySQL, which takes into consideration the entire result set including the columns from all 3 tables.
To further elaborate point 2, assume the tables contain the following numbers of rows:
regulations: 150
groups: 1000
filters: 5000
If I use $data->limit(50) it would only apply to 50 rows in the regulations table. I need to apply the LIMIT to the result set after searching all rows in all 3 tables for a given term.
Creating that query using the query builder is pretty simple. If you want joins for non-1:1 associations, use the *JoinWith() methods instead of contain(); in this case innerJoinWith(), something like:
$query = $GroupsTable
->find()
->select([
'regulations_label' => 'Regulations.label',
'groups_label' => 'Groups.label',
'filters_label' => 'Filters.label',
])
->innerJoinWith('Regulations')
->innerJoinWith('Filters')
->where([
'OR' => [
'Regulations.label LIKE' => '%Asbestos%',
'Groups.label LIKE' => '%Asbestos%',
'Filters.label LIKE' => '%Asbestos%',
],
])
->order([
'Regulations.id' => 'ASC',
'Groups.label' => 'ASC',
'Filters.label' => 'ASC',
]);
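To make the counting and paging concrete at the SQL level, here is a minimal sketch in Python using SQLite rather than CakePHP and MySQL. It assumes made-up rows and hypothetical helper names (`make_demo_db`, `search_page`, `SEARCH_SQL`); only the table names, columns, and the join and search conditions come from the question. The point it illustrates is that the total and the page are both taken from the same flat join, so the limit applies to the combined result set and not to regulations alone.

```python
import sqlite3

# Shared FROM/WHERE clause so the count and the page use identical criteria.
# "groups" is quoted because GROUPS is a keyword in some SQL dialects.
SEARCH_SQL = """
    FROM "groups" AS g
    JOIN regulations AS r ON g.regulation_id = r.id
    JOIN filters AS f ON f.group_id = g.id
    WHERE r.label LIKE :term OR g.label LIKE :term OR f.label LIKE :term
"""


def make_demo_db():
    """Build a tiny in-memory database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE regulations (id INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE "groups" (id INTEGER PRIMARY KEY, regulation_id INTEGER, label TEXT);
        CREATE TABLE filters (id INTEGER PRIMARY KEY, group_id INTEGER, label TEXT);
        INSERT INTO regulations VALUES (1, 'Asbestos Regulations'), (2, 'Noise');
        INSERT INTO "groups" VALUES
            (1, 1, 'General'), (2, 2, 'Asbestos exposure'), (3, 2, 'Hearing');
        INSERT INTO filters VALUES
            (1, 1, 'A'), (2, 1, 'B'), (3, 1, 'C'),
            (4, 2, 'X'), (5, 2, 'Y'),
            (6, 3, 'Asbestos near machinery'), (7, 3, 'Ear plugs');
    """)
    return conn


def search_page(conn, term, limit, offset):
    """Return (total matching rows, one page of flat label rows)."""
    params = {"term": f"%{term}%"}
    # Total over the whole joined result set, ignoring the page window.
    total = conn.execute("SELECT COUNT(*)" + SEARCH_SQL, params).fetchone()[0]
    # One page of the same result set, in a stable order.
    rows = conn.execute(
        "SELECT r.label, g.label, f.label" + SEARCH_SQL
        + " ORDER BY r.id, g.label, f.label LIMIT :limit OFFSET :offset",
        {**params, "limit": limit, "offset": offset},
    ).fetchall()
    return total, rows


if __name__ == "__main__":
    conn = make_demo_db()
    total, first_page = search_page(conn, "Asbestos", limit=4, offset=0)
    print(total, first_page)
```

The frontend can stop requesting pages once `offset + limit` reaches the returned total.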
See also
Cookbook > Database Access & ORM > Query Builder > Using innerJoinWith
Cookbook > Database Access & ORM > Query Builder > Adding Joins
Related
I have three models, Companies, events and assistances, where the assistances table stores the event_id and the company_id. I'd like to get a query in which the total assistances of the company to certain kind of events are stored. Nevertheless, as all these counts are linked to the same table, I don't really know how to build this query effectively. I have the ids of the assistances to each kind of event stored in some arrays, and then I do the following:
$query = $this->Companies->find('all')->where($conditions)->order(['name' => 'ASC']);
$query
->select(['total_assistances' => $query->func()->count('DISTINCT(Assistances.id)')])
->leftJoinWith('Assistances')
->group(['Companies.id'])
->autoFields(true);
Nevertheless, I don't know how to get the rest of the Assistance counts, as I would need to count not all the distinct assistance IDs but only those that fit certain conditions, something like ->select(['assistances_conferences' => $query->func()->count('DISTINCT(Assistances.id)')])->where($conferencesConditions) (but obviously the previous line does not work). Is there any way of counting different kinds of assistances in the query itself? (I need to do it this way because I then plan to use pagination and sort the table taking those fields into consideration.)
The *JoinWith() methods accept a second argument, a callback that receives a query builder used for affecting the select list, as well as the conditions for the join.
->leftJoinWith('Assistances', function (\Cake\ORM\Query $query) {
return $query->where([
'Assistances.event_id IN' => [1, 2]
]);
})
This would generate a join statement like this, which would only include (and therefore count) the Assistances with an event_id of 1 or 2:
LEFT JOIN
assistances Assistances ON
Assistances.company_id = Companies.id AND
Assistances.event_id IN (1, 2)
The query builder passed to the callback really only supports selecting fields and adding conditions. More complex statements would need to be defined on the main query, or you'd possibly have to switch to using subqueries.
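The effect of putting the condition into the join can be checked at the SQL level. The sketch below uses Python with SQLite and made-up rows; the helper names (`make_demo_db`, `counts_with_join_condition`, `counts_with_case`) are hypothetical. The first query mirrors the generated LEFT JOIN shown above. The second is a different technique that is not part of the answer: conditional aggregation with CASE, which is one way to get several differently filtered counts out of a single join.

```python
import sqlite3


def make_demo_db():
    """In-memory tables with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE assistances (id INTEGER PRIMARY KEY, company_id INTEGER, event_id INTEGER);
        INSERT INTO companies VALUES (1, 'Acme'), (2, 'Beta'), (3, 'Gamma');
        INSERT INTO assistances VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 3);
    """)
    return conn


def counts_with_join_condition(conn):
    """Condition inside ON: companies without a match still appear, with 0."""
    return conn.execute("""
        SELECT c.name, COUNT(DISTINCT a.id)
        FROM companies c
        LEFT JOIN assistances a
            ON a.company_id = c.id AND a.event_id IN (1, 2)
        GROUP BY c.id
        ORDER BY c.name
    """).fetchall()


def counts_with_case(conn):
    """Alternative: one unfiltered join, several conditional counts."""
    return conn.execute("""
        SELECT c.name,
               COUNT(DISTINCT a.id) AS total_assistances,
               COUNT(DISTINCT CASE WHEN a.event_id IN (1, 2) THEN a.id END)
                   AS assistances_conferences
        FROM companies c
        LEFT JOIN assistances a ON a.company_id = c.id
        GROUP BY c.id
        ORDER BY c.name
    """).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(counts_with_join_condition(conn))
    print(counts_with_case(conn))
```

COUNT skips NULLs, so the CASE expression without an ELSE branch only counts the rows that satisfy the condition.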
See also
Cookbook > Database Access & ORM > Query Builder > Filtering by Associated Data
What I want to do is to get all rows related with user_id but in a different way.
First condition is to get all Books that are related with the User via Resources table where user_id is stored (in other words - Books owned by the User). Second condition is to get all Books that are related with the User through the Cities model again which is stored in the Resources table as well (Books that belong to Cities owned by the User).
I have tried a lot of things and I simply cannot make these two conditions work, because I use JOIN (I tried different combinations of innerJoinWith and leftJoinWith) on the same "end" model (User).
What I've done so far:
$userBooks = $this->Books->find()
->leftJoinWith("Resources.Users")
->leftJoinWith("Cities.Resources.Users")
->where(["Resources.Users" => 1])
->orWhere(["Cities.Resources.Users" => 1])
->all();
This of course does not work, but I hope you get the point about what I'm trying to achieve. The best I was able to get by trying different approaches is the result of only one JOIN statement, which is logical.
Basically, this can be separated into 2 parts which gives expected result (but I do not prefer it because I want it done with one query of course):
$userBooks = $this->Books->find()
->innerJoinWith("Resources.Users", function($q) {
return $q->where(["Users.id" => 1]);
})
->all();
$userBooks2 = $this->Books->find()
->innerJoinWith("Cities.Resources.Users", function($q) {
return $q->where(["Users.id" => 1]);
})
->all();
Also, before this I created an SQL script which works well and result is like expected:
SELECT books.id FROM books, cities, users_resources WHERE
(users_resources.resource_id = books.resource_id AND users_resources.user_id = 1)
OR
(users_resources.resource_id = cities.resource_id AND books.city_id = cities.id AND users_resources.user_id = 1)
This query works and I want to transfer it into ORM styled query in CakePHP to get both Books that are owned by the user and the ones that are connected with the User via Cities. I want somehow to separate these joins to individually filter data like I did in the SQL query.
EDIT
I've tried @ndm's solution, but the result is the same as when there is only 1 association (User): I was still only able to get data based on one join statement (the second one was ignored). Because I had to move on, I ended up with
$userBooks = $this->Books->find()
->innerJoinWith("Cities.Resources.Users")
->where(["Users.id" => $userId])
->union($this->Books->find()
->innerJoinWith("Resources.Users")
->where(["Users.id" => $userId])
)
->all();
which outputs the correct result, but not in a very efficient way (by a union of 2 queries). I would really like to know the best way to approach this, as it is a very common case (filtering by a related model (User) that is associated with other models).
The ORM (specifically the eager loader) doesn't allow joining the same alias multiple times.
This can be worked around in various ways, the simplest one probably being to create a separate association with a unique alias. For example, in your ResourcesTable, create another association to Users with a different alias, say Users2, like:
$this->belongsToMany('Users2', [
'className' => 'Users'
]);
Then you can use that association in the second leftJoinWith(), and apply the conditions accordingly:
$this->Books
->find()
->leftJoinWith('Resources.Users')
->leftJoinWith('Cities.Resources.Users2')
->where(['Users.id' => 1])
->orWhere(['Users2.id' => 1])
->group('Books.id')
->all();
And don't forget to group your books to avoid duplicate results.
You could also create the joins manually using leftJoin() or join() instead, where you can define the aliases on your own (or not use any at all) so that there are no conflicts. For more complex queries that can be a tedious task, though.
You could also use your two separate queries as subqueries for conditions on Books, or even create a union query from them, which however might perform worse...
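What the unique alias achieves can be seen in plain SQL: the same table is joined twice under two different names, so each path to the user can be filtered on its own. The following is a minimal sketch in Python with SQLite, assuming the table and column names from the question's SQL script and made-up rows; the aliases `ur_book` and `ur_city` and the helper names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """In-memory tables with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users_resources (user_id INTEGER, resource_id INTEGER);
        CREATE TABLE cities (id INTEGER PRIMARY KEY, resource_id INTEGER);
        CREATE TABLE books (id INTEGER PRIMARY KEY, resource_id INTEGER, city_id INTEGER);
        INSERT INTO users_resources VALUES (1, 10), (1, 20), (2, 30);
        INSERT INTO cities VALUES (1, 20), (2, 30);
        INSERT INTO books VALUES (1, 10, NULL), (2, 30, 1), (3, 30, 2), (4, 10, 1);
    """)
    return conn


def book_ids_for_user(conn, user_id):
    """Books owned directly, or belonging to a city the user owns."""
    rows = conn.execute("""
        SELECT b.id
        FROM books b
        -- path 1: the book's own resource
        LEFT JOIN users_resources ur_book ON ur_book.resource_id = b.resource_id
        -- path 2: the resource of the book's city, same table, second alias
        LEFT JOIN cities c ON c.id = b.city_id
        LEFT JOIN users_resources ur_city ON ur_city.resource_id = c.resource_id
        WHERE ur_book.user_id = :uid OR ur_city.user_id = :uid
        GROUP BY b.id      -- a book reachable via both paths is listed once
        ORDER BY b.id
    """, {"uid": user_id}).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(book_ids_for_user(make_demo_db(), 1))
```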
See also
Cookbook > Database Access & ORM > Query Builder > Adding Joins
CakePHP Issues > Improve association data fetching
This question might be flagged as too broad or opinion-based, but I take the risk...
I have a REST-API in php that gets all data from a mysql table, that also includes 'hasMany' fields. Let's call them a 'post' hasMany 'comments'.
Right now I'm doing ONE select with a LEFT JOIN on comments, then walk through the results to restructure the output to
{ "posts": [
{"id": 1,
"comments": [1,2,3]
},
....
]}
All is fine until I have more than one hasMany field, because then the restructuring gets complicated (it now produces duplicate entries) and I would need to loop through the result several times (not manually, but still with built-in functions).
So I thought about refactoring my code to:
1. select the actual item ('post')
2. select all hasMany fields ('comments', 'anythingelse',...) and add the results.
which of course produces loads of action on my db.
So my question is if anybody has a simple answer like 'better grab all the data in one go from a database and do the work in php' or the opposite.
Yes, I could do benchmarks myself. But first, to be honest, I would like to avoid all the reprogramming just to find out it's slower; second, I don't know if my benchmark would hold on an optimized (and Linux) production machine (right now I'm developing on EasyPHP on Windows).
Some info:
The 'post' table could result in some hundred records, and the same goes for each hasMany. But combined with some hasMany fields it could result in a recordset (with the first approach) of several thousand rows.
Use the IN (…) operator.
First, get the relevant posts on their own:
SELECT […stuff…] FROM posts WHERE […conditions…]
Then take the list of post IDs from the results you get there and substitute the whole list into a set of queries of the form:
SELECT […stuff…] FROM comments WHERE post_id IN (1, 2, 3 […etc…])
SELECT […stuff…] FROM anythingelse WHERE post_id IN (1, 2, 3 […etc…])
Running one query per dependent table is fine. It's not significantly more expensive than running a single JOINed query; in fact, it may be less expensive, as there's no duplication of the fields from the parent table.
Make sure the post_id column is indexed on the subtables, of course.
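As a rough illustration of the IN (…) approach, here is a minimal sketch in Python with SQLite. The `attachments` table stands in for the question's 'anythingelse', and the helper names and rows are made up. It runs one query for the posts and then one query per dependent table, stitching the children onto their parents in application code.

```python
import sqlite3


def make_demo_db():
    """In-memory tables with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY);
        CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER);
        CREATE TABLE attachments (id INTEGER PRIMARY KEY, post_id INTEGER);
        CREATE INDEX comments_post_id ON comments (post_id);
        CREATE INDEX attachments_post_id ON attachments (post_id);
        INSERT INTO posts VALUES (1), (2), (3);
        INSERT INTO comments VALUES (1, 1), (2, 1), (3, 2);
        INSERT INTO attachments VALUES (1, 3);
    """)
    return conn


def fetch_posts_with_children(conn, child_tables=("comments", "attachments")):
    """One query for posts, then one IN (...) query per dependent table."""
    posts = [{"id": row[0]} for row in conn.execute("SELECT id FROM posts ORDER BY id")]
    by_id = {post["id"]: post for post in posts}
    for table in child_tables:
        for post in posts:
            post[table] = []
        if not by_id:
            continue  # an empty IN () list is not valid SQL
        # Table names come from the fixed tuple above, never from user input;
        # the ids themselves are bound as parameters. Very long id lists may
        # need to be split into chunks because of per-statement variable limits.
        placeholders = ", ".join("?" * len(by_id))
        sql = (f"SELECT id, post_id FROM {table} "
               f"WHERE post_id IN ({placeholders}) ORDER BY id")
        for child_id, post_id in conn.execute(sql, list(by_id)):
            by_id[post_id][table].append(child_id)
    return posts


if __name__ == "__main__":
    print(fetch_posts_with_children(make_demo_db()))
```

The number of queries is one plus the number of dependent tables, regardless of how many posts are returned.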
The best alternative that I can think of would be along the lines of:
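// PDOStatement::execute() returns a bool, so it cannot be chained into fetchAll().
$postStmt = $dbh->prepare('SELECT [fields] FROM posts WHERE [conditions]');
$postStmt->execute([...]);
$posts = $postStmt->fetchAll();
// Prepare the per-post query once and reuse it for every post.
$commentStmt = $dbh->prepare('SELECT id FROM comments WHERE post_id = ?');
foreach ($posts as $i => $post) {
    // execute() expects an array of parameters, not a bare value.
    $commentStmt->execute([$post['id']]);
    $posts[$i]['comments'] = $commentStmt->fetchAll();
}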
You need to decide whether the work/overhead of dealing with "duplicate" data as a result of the join is more or less than that of separately retrieving the comments for each post.
Chances are that if you were using an ORM, something along the lines of the above would be happening automagically.
I am having performance issues when dealing with pagination and filtering products as seen on many ecommerce sites, here is an example from Zappos
Kind of the standard:
Showing 1-10 of 132 results. [prev] 1 2 [3] 4 ... 13 [next]
[10] Results per page
To me it seems like a large part of the problem is the query is run twice, once to count the number of results and again to actually populate the array. Below is the "filter" query:
SELECT product_id, product_title, orderable
FROM table_view
WHERE (family_title = 'Shirts' OR category_title = 'Shirts')
AND ((detail_value = 'Blue' AND detail_title = 'Color')
OR (detail_value = 'XL' AND detail_title = 'Size'))
GROUP BY product_id, product_title, orderable
HAVING COUNT(detail_title)=2
ORDER BY product_id
LIMIT 10 OFFSET 0
The query takes about 20ms to run just by itself. The table it is selecting from is a view which is a join of about five different tables. The parameters passed in by the user are the "detail_value" and "detail_title", which are the filtering criteria, plus the "family" and "category". The limit is set by the "results per page", so if they want to view all the results the limit is set to 2000. Every time they go to a new page via the pagination the whole query is run again. Below is a snippet of the PHP; $products is an array of the query results, and $number_of_results is a count of the same thing with the maximum limit.
$products = filter($value, $category_title, $number_per_page, $subcategory, $start_number);
$number_of_results = count(filter($value, $category_title, 2000, $subcategory, 0));
$pages = ceil($number_of_results / $number_per_page);
When run on my local machine the results page takes about 600-800ms to load, when deployed to Heroku the page takes 13-16 seconds to load. I've left out a lot of the PHP code, but I'm using PHP's PDO class to make the query results into an object to display in PHP. The tables being joined are the product table, category table, detail table, and the two tables linking them via foreign keys.
Google results show that this is a pretty common/complex problem, but I have yet to come across any real solution that works for me.
Many queries for pagination generally need to run several times: once to determine how many records would be shown, then again to grab a screen of records. Then subsequent queries grabbing the next screen of records, etc.
Two solutions to slow pagination queries are:
1. Use a cursor to pull n-records from the open query resultset
2. Speed up the queries
Solution 1 can be expensive memory-wise for the server's resources and might not scale well if you have many concurrent users generating queries like this. It might also be difficult to implement cursors with the PDO class you're using.
Solution 2 could be done via improving view queries, adding indexes, etc. However that may not be enough. If the tables are read much more often than they are written to, you might try using UPDATE/INSERT/DELETE trigger tricks. Rather than running the query against a VIEW, create a table with the same column structure and data as the VIEW. Any time that one of the underlying tables changes, manually modify this new table to follow the changes. This will slow down writes, but greatly improve reading.
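The trigger idea from Solution 2 might look roughly like the sketch below, written in Python with SQLite. The schema is heavily simplified and entirely hypothetical (`products`, `details`, and a flat table `product_flat` standing in for the view); MySQL's trigger syntax differs, and a real setup would also need UPDATE triggers and triggers on every underlying table.

```python
import sqlite3


def make_demo_db():
    """Simplified, made-up schema: a flat table kept in sync by triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE details (product_id INTEGER, title TEXT, value TEXT);
        -- Plays the role of the view, but is a real table that can be indexed.
        CREATE TABLE product_flat (
            product_id INTEGER, product_title TEXT,
            detail_title TEXT, detail_value TEXT
        );
        CREATE INDEX product_flat_detail ON product_flat (detail_title, detail_value);

        CREATE TRIGGER details_after_insert AFTER INSERT ON details
        BEGIN
            INSERT INTO product_flat
            SELECT p.id, p.title, NEW.title, NEW.value
            FROM products p WHERE p.id = NEW.product_id;
        END;

        CREATE TRIGGER details_after_delete AFTER DELETE ON details
        BEGIN
            DELETE FROM product_flat
            WHERE product_id = OLD.product_id
              AND detail_title = OLD.title AND detail_value = OLD.value;
        END;

        INSERT INTO products VALUES (1, 'Oxford Shirt'), (2, 'Polo');
        INSERT INTO details VALUES
            (1, 'Color', 'Blue'), (1, 'Size', 'XL'),
            (2, 'Color', 'Blue'), (2, 'Size', 'M');
    """)
    return conn


def blue_xl_products(conn):
    """The question's filter shape, run against the flat table."""
    return conn.execute("""
        SELECT product_id, product_title
        FROM product_flat
        WHERE (detail_title = 'Color' AND detail_value = 'Blue')
           OR (detail_title = 'Size' AND detail_value = 'XL')
        GROUP BY product_id, product_title
        HAVING COUNT(detail_title) = 2
        ORDER BY product_id
    """).fetchall()


if __name__ == "__main__":
    print(blue_xl_products(make_demo_db()))
```

Writes to `details` now do extra work inside the triggers, which is the trade-off described above.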
Ok to make it more clear:
I am using Doctrine.
I have a table Brands and Products
Brand
id
name
Product
id
name
brand_id
I have a lot of brands and Products of those brands in the database.
I would like to retrieve a list of brands (plus a count of each brand's products), grouped by the first letter of Brand.name.
ex:
array(
n => array(
0 => array('Nike', 4 ),
1 => array('North Pole', 18)
.....
)
.....
)
So my question is: can this be done with one query in an efficient way?
I really don't want to run separate queries for each first letter of Brand.name.
Doctrine's "Hierarchical Data" crossed my mind, but I believe it is for a different thing?
thanks
If you are going to use this form of result more than once, it might be worthwhile to make the formatting into a Hydrator, as described here.
In your case, you can create a query that selects 3 columns:
first letter of brand.name
brand.name
count(product.id)
Then hydrate the result
$results = $q->execute(array(), 'group_by_first_column');
You cannot get it from the database in that form, but you can fetch the data as objects or arrays and then transform it into the described form. Use foreach loops.
When using Doctrine you can also use raw SQL queries and hydrate arrays instead of objects. So my solution would be to use a native SQL query:
SELECT
brand.name,
count(product.id)
FROM
brand
JOIN
product ON
brand.id=product.brand_id
GROUP BY
brand.id ORDER BY brand.name;
And then iterate in PHP over the result to build the desired array. Because the result is ordered by brand name, this is quite easy. If you want to keep database abstraction, I think it should also be possible to express this query in DQL; just hydrate an array instead of objects.
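The iteration step is short in any language. As an illustration, here is a minimal sketch in Python (the answer itself does this in PHP); the function name `group_by_first_letter` is hypothetical, and the 'Adidas' row is made up, while the Nike and North Pole rows are the ones from the question's example.

```python
def group_by_first_letter(rows):
    """Group (brand_name, product_count) rows under the name's first letter.

    Rows are expected in the order returned by the query (sorted by brand
    name), so each letter's list keeps that order.
    """
    grouped = {}
    for name, product_count in rows:
        letter = name[:1].lower()  # '' for an empty name
        grouped.setdefault(letter, []).append((name, product_count))
    return grouped


if __name__ == "__main__":
    rows = [("Adidas", 7), ("Nike", 4), ("North Pole", 18)]
    print(group_by_first_letter(rows))
```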