CakePHP 4 - Query to find the latest status from another table

I'm using CakePHP 4 to build an application that shows an inventory of documents and the latest status for each document.
The tables are fairly simple:
documents: This contains a list of documents, each of which has a unique id, a human-friendly name, a filename etc.
statuses: This includes a list of about 10 different statuses that a document can go through within the application (e.g. "uploaded", "review requested", "reviewed", "rejected" etc). Each of these has a unique id and name (the name being the text of the status, such as "uploaded", "review requested" etc).
documents_statuses: This is a history table that contains all of the statuses that a document has gone through. Any given document (documents.id) can appear multiple times in this table (using a foreign key of documents_statuses.document_id referring to the relevant documents.id). It also has a documents_statuses.status_id corresponding to a statuses.id in the statuses table mentioned above. Each row here has CakePHP's conventional created timestamp so we know when rows were inserted. This is enough to tell us dates/times about when the document got to a particular status.
What I'm trying to do is output a list of documents and show the most recent status from documents_statuses in my table. The HTML structure of the table is simple and contains 3 headings:
Document Name
Filename
Last status
Writing a query to get the data for the first 2 columns is easy as the data for that belongs in the documents table:
// src/Controller/DocumentsController.php
public function index()
{
    $query = $this->Documents->find();
    $documents = $this->paginate($query);
    $this->set('documents', $documents);
}
In my template I can then loop over $documents and reference $document->name and $document->filename to output the respective columns from the documents table.
I understand that I need some extra logic in this query which will JOIN to the documents_statuses table and then order the records in descending order with a LIMIT of 1 to get the most recent status per document. I know I also need to do a further JOIN such that documents_statuses.status_id returns the corresponding statuses.name.
I know that I can adapt my query to contain documents_statuses and statuses:
$documents = $this->Documents->find()->contain(['DocumentsStatuses', 'Statuses']);
But I don't know how to loop through the records in documents_statuses in this query and do the ->orderDesc->limit(1) to get the most recent record. Furthermore I also know that to obtain the statuses.name I would need to get this query to join the documents_statuses.status_id and statuses.id to return statuses.name (e.g. "uploaded", "review requested" etc).
The application has been baked and the model associations are defined correctly.
Might something equivalent be described in the CakePHP docs?
Edit - Raw SQL
The following SQL is equivalent to what I'm trying to write using the ORM. The problem isn't particularly understanding the SQL involved, it's writing it using CakePHP's ORM syntax. Equally, if there is a "better" way to write this query I'm interested but the purpose of this question is how to make this work using CakePHP's ORM.
SELECT
    documents.name,
    documents_statuses.created,
    statuses.name
FROM documents
LEFT JOIN (
    SELECT document_id, MAX(created) AS created
    FROM documents_statuses
    GROUP BY document_id
) recent_statuses
    ON documents.id = recent_statuses.document_id
LEFT JOIN documents_statuses
    ON documents.id = documents_statuses.document_id
    AND recent_statuses.created = documents_statuses.created
LEFT JOIN statuses
    ON documents_statuses.status_id = statuses.id

It can be done with a slightly complex JOIN; see the [groupwise-max] tag or my Groupwise-Max
Alternatively, you could use a different schema pattern:
Current -- this always has the latest 'status' (plus other info). There would be one row per document.
History -- essentially as you have it now. But this is not looked at to find the "current status". This table has many rows per document.
Your code would need to INSERT INTO History and UPDATE Current to update the status. (Actually, the UPDATE could be an IODKU, i.e. INSERT ... ON DUPLICATE KEY UPDATE, if you need to insert when the row does not exist yet.)
The query in question would be simply SELECT ... FROM Current ... -- no Join needed.
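A minimal sketch of the Current/History pattern, written in Python against an in-memory SQLite database so that it runs on its own. The table names status_history and status_current and the helper set_status are hypothetical. SQLite's INSERT OR REPLACE stands in here for MySQL's INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3


def make_db():
    """Create the two hypothetical tables: one row per change, one row per document."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE status_history (
            id INTEGER PRIMARY KEY,
            document_id INTEGER NOT NULL,
            status_id INTEGER NOT NULL,
            created TEXT NOT NULL
        );
        CREATE TABLE status_current (
            document_id INTEGER PRIMARY KEY,
            status_id INTEGER NOT NULL,
            created TEXT NOT NULL
        );
    """)
    return conn


def set_status(conn, document_id, status_id, created):
    """Append to the history and overwrite the current row in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO status_history (document_id, status_id, created) "
            "VALUES (?, ?, ?)",
            (document_id, status_id, created),
        )
        # SQLite stand-in for MySQL's INSERT ... ON DUPLICATE KEY UPDATE.
        conn.execute(
            "INSERT OR REPLACE INTO status_current (document_id, status_id, created) "
            "VALUES (?, ?, ?)",
            (document_id, status_id, created),
        )


def current_statuses(conn):
    """The 'latest status' query: a plain SELECT with no join and no grouping."""
    return conn.execute(
        "SELECT document_id, status_id FROM status_current ORDER BY document_id"
    ).fetchall()
```

The cost of this design is that every status change writes two rows, and both writes must stay in the same transaction so the two tables cannot drift apart.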

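$documents = $this->Documents->find()->contain(['DocumentsStatuses' => function (Query $q) { return $q->contain(['Statuses'])->orderDesc('DocumentsStatuses.created')->limit(1); }]);
Note that for a hasMany association CakePHP loads the contained rows with one separate query, so this limit(1) applies to that whole query rather than to each document.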
book.cakephp.org

Related

Good practice for handling naturally JOINed results across an application

I'm working on an existing application that uses some JOIN statements to create "immutable" objects (i.e. the results are always JOINed to create a processable object - results from only one table will be meaningless).
For example:
SELECT r.*,u.user_username,u.user_pic FROM articles r INNER JOIN users u ON u.user_id=r.article_author WHERE ...
will yield a result of type, let's say, ArticleWithUser that is necessary to display an article with the author details (like a blog post).
Now, I need to make a table featured_items which contains the columns item_type (article, file, comment, etc.) and item_id (the article's, file's or comment's id), and query it to get a list of the featured items of some type.
Assuming tables other than articles contain whole objects that do not need JOINing with other tables, I can simply pull them with a dynamically generated query like
SELECT some_table.* FROM featured_items RIGHT JOIN some_table ON some_table.id = featured_items.item_id WHERE featured_items.item_type = X
But what if I need to get a featured item from the aforementioned type ArticleWithUser? I cannot use the dynamically generated query because the syntax will not suit two JOINs.
So, my question is: is there a better practice to retrieve results that are always combined together? Maybe do the second JOIN on the application end?
Or do I have to write special code for each of those combined results types?
Thank you!

A view can be thought of as something like a table for the faint of heart.
https://dev.mysql.com/doc/refman/5.0/en/create-view.html
Views can incorporate joins, and other views. Keep in mind that upon creation they take a snapshot of the columns that exist at that time on the underlying tables, so ALTER TABLE statements that add columns to those tables are not picked up by a SELECT * inside the view.
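As a rough illustration of how a view could serve the featured_items case from the question, here is a sketch in Python with an in-memory SQLite database. The view name article_with_user, the columns article_id and article_title, and the SOURCES whitelist are assumptions made for the example. The view exposes an id column so the same generic query shape works for it as for a plain table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_username TEXT, user_pic TEXT);
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, article_title TEXT,
                           article_author INTEGER);
    CREATE TABLE featured_items (item_type TEXT, item_id INTEGER);

    -- The always-joined shape is defined once, as a view.
    CREATE VIEW article_with_user AS
        SELECT r.article_id AS id, r.article_title, u.user_username, u.user_pic
        FROM articles r
        INNER JOIN users u ON u.user_id = r.article_author;

    INSERT INTO users VALUES (1, 'alice', 'alice.png'), (2, 'bob', 'bob.png');
    INSERT INTO articles VALUES (10, 'First', 1), (11, 'Second', 2);
    INSERT INTO featured_items VALUES ('article', 11), ('file', 10);
""")

# Hypothetical whitelist: item_type -> table or view name.
# The name is interpolated into SQL, so it must never come from user input.
SOURCES = {"article": "article_with_user"}


def featured(conn, item_type):
    """Run the same single-join query whether the source is a table or a view."""
    source = SOURCES[item_type]
    sql = (
        f"SELECT s.* FROM featured_items f "
        f"JOIN {source} s ON s.id = f.item_id "
        f"WHERE f.item_type = ?"
    )
    return conn.execute(sql, (item_type,)).fetchall()
```

With this shape, the dynamically generated query only ever contains one JOIN; the second JOIN lives inside the view definition.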

An old article which I consider required reading on the subject of MySQL Views:
By Peter Zaitsev
To answer your question as to whether they are widely used, they are a major part of the database developer's toolkit, and in some situations offer significant benefits, which have more to do with indexing than with the nature of views, per se.

MYSQL Database Design/SELECT Statement assistance

I'm hoping someone may be able to provide some advice regarding a database schema I have created and a SELECT statement that I am using to query the database.
I am attempting to create a database of very old newspaper articles from the 1800s, storing such things as the date, title and full text of the article, an image of the article, the name of the newspaper the article came from, names of locations mentioned in the article and individuals mentioned within the article.
Basically below is the current structure I've created with tbArticle being the main table focus ("test" is the name of the database). I've normalised the name of the newspaper, image info, location info and individuals into their own tables and, because it is assumed there will be many articles to many individuals, I've added a link table (lktbArticleIndividuals) between tbArticle and tbIndividual.
The reason for creating the database is to obviously make a focused set of newspaper articles searchable and store them in a logical format.
My issue or question is this ...
All I want to do is display a list of all the articles in the database, including data from the tables other than tbArticle, and to do this I am using this SELECT query:
SELECT *
FROM tbArticle a
, tbLocation l
, tbNewspapers n
, tbIndividual i
, lktbArticleIndividuals ai
, tbImage m
WHERE a.idLocation = l.idLocation
and a.idNewspaper = n.idNewspaper
and a.idArticle = ai.idArticle
and ai.idIndividual = i.idIndividual
and a.idImage = m.idImage;
Which does what I want ... except ... if more than one individual is listed as being in an article, then two (or more) instances of the whole article are returned with the only difference being the different individual's names being displayed.
If possible, I want to just list each article ONCE, but iterate through the two or more individuals to include them. Can this be done?
If I were to query the database in say PHP I suspect what I might have to do is some sort of loop within a loop to achieve the results I want, but this doesn't seem very efficient to me!!
Does any of this make sense to anyone?!

Instead of SELECT *, you could name the columns you're interested in, and for things such as individuals, use GROUP_CONCAT() to add them all into one field, and at the end of your query, use GROUP BY a.idArticle to limit each article to one row per article.

Assuming you just want the first_name of each individual you could use a group by with a GROUP_CONCAT.
SELECT *,
GROUP_CONCAT(i.firstname)
FROM tbArticle a
, tbLocation l
, tbNewspapers n
, tbIndividual i
, lktbArticleIndividuals ai
, tbImage m
WHERE a.idLocation = l.idLocation
and a.idNewspaper = n.idNewspaper
and a.idArticle = ai.idArticle
and ai.idIndividual = i.idIndividual
and a.idImage = m.idImage
GROUP BY a.idArticle;
However, if you want to get many details of each individual I would encourage you to do two separate queries: one for the articles and another one to get the individuals of each article.
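The two-query approach could look roughly like the sketch below, written in Python with an in-memory SQLite database. The title and surname columns and the sample rows are assumptions; only the table names, the id columns and firstname come from the posts above. The individuals are fetched once for all articles and grouped in application code, so there is no query inside a loop.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbArticle (idArticle INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tbIndividual (idIndividual INTEGER PRIMARY KEY,
                               firstname TEXT, surname TEXT);
    CREATE TABLE lktbArticleIndividuals (idArticle INTEGER, idIndividual INTEGER);

    INSERT INTO tbArticle VALUES (1, 'Fire at the mill'), (2, 'Harvest fair');
    INSERT INTO tbIndividual VALUES (1, 'John', 'Smith'), (2, 'Mary', 'Jones');
    INSERT INTO lktbArticleIndividuals VALUES (1, 1), (1, 2);
""")


def articles_with_individuals(conn):
    """Query 1 fetches the articles, query 2 fetches every linked individual."""
    articles = conn.execute(
        "SELECT idArticle, title FROM tbArticle ORDER BY idArticle"
    ).fetchall()

    # Group the individuals by article id in memory.
    people = defaultdict(list)
    for id_article, firstname, surname in conn.execute("""
        SELECT ai.idArticle, i.firstname, i.surname
        FROM lktbArticleIndividuals ai
        JOIN tbIndividual i ON i.idIndividual = ai.idIndividual
        ORDER BY ai.idArticle, i.idIndividual
    """):
        people[id_article].append((firstname, surname))

    # Each article appears exactly once, with a (possibly empty) list of people.
    return [
        {"id": id_article, "title": title, "individuals": people[id_article]}
        for id_article, title in articles
    ]
```

Unlike the comma-join query, this also keeps articles that mention no individuals at all.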

SQL: Deleting old records only so long as there are not newer matching records?

I've got a really big collection of data in a postgres database where I'd like to nuke data past a particular age... but I do not want it nuking the latest iteration of data from any given location & site combination.
Basically, I've got a really big table that has a location (bigint), site (bigint), readdate (bigint), and a little accompanying data (note: there will be multiple entries for a given site, location, and readdate - but anything on the same readdate is considered part of the same scan, and needs to be kept for a given location).
Currently, I've just got it set to get rid of all old records... but the possibility exists that a particular site and location combination will stop giving out data for a while, and I'd like to preserve the final state if that happens. I'm doing the SQL queries from php, so I'm pretty sure I could hack together some highly ugly code that finds the latest readdate for any given site & location combination, then either deletes stuff younger than that for that location, or deletes based on the calendar limit (whichever gives the lesser date), but I'd prefer to put the decision-making workload in the SQL query, rather than having to first get a list of all location, site, and max(readdate) entries, then iterate over them in php making individual delete queries.
My current query (which doesn't do what I want, as it deletes everything before $limit) is declared by:
$query="DELETE FROM votwdata WHERE readdate < '".$limit."';";
any ideas for a good revision?

If I understand what you are trying to do, you have a number of fields that might be the same, and you want to keep the most recent record. Assuming you have a sequential ID or a created_at on each record, you can run a subquery to identify the records you want to delete. For example:
select max(id),data1,data2 from table group by data1,data2;
That will pull the most recent record for a unique data1 and data2. You can run that as an inline query, joining it back to the original table.
select t.* from table t, (select max(id) "id",data1,data2 from table group by data1,data2) t2
where t.id=t2.id;
That will give you the most recent records. You can do a left join and look at the null values to delete anything that you don't like.
select t.id,t2.id
from table t left join (select max(id) "id",data1,data2 from table group by 2,3) t2 on t.id=t2.id
where t2.id is null;
That gives you all the records that you want to delete.
Okay, that's the dirty way - refactor away.
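Applied to the votwdata table from the question, the same idea can be folded into a single DELETE with a correlated subquery. The sketch below uses Python with an in-memory SQLite database so it can be run as-is; the payload column and the sample rows are made up, and the statement has not been tried against the asker's Postgres schema. A row is deleted only when it is older than the limit and also older than the latest readdate for its own location and site, so every row of the final scan survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE votwdata (location INTEGER, site INTEGER,
                           readdate INTEGER, payload TEXT);
    INSERT INTO votwdata VALUES
        (1, 1, 100, 'a'), (1, 1, 100, 'b'),  -- final scan of a silent location
        (1, 1,  90, 'c'),                    -- older scan of that location
        (2, 1, 100, 'd'),                    -- old scan of an active location
        (2, 1, 500, 'e');                    -- recent scan of the active location
""")


def purge_old(conn, limit):
    """Delete rows older than `limit`, except each location/site's latest scan."""
    with conn:
        cur = conn.execute("""
            DELETE FROM votwdata
            WHERE readdate < :limit
              AND readdate < (
                  SELECT MAX(latest.readdate)
                  FROM votwdata AS latest
                  WHERE latest.location = votwdata.location
                    AND latest.site = votwdata.site
              )
        """, {"limit": limit})
    return cur.rowcount


def remaining_payloads(conn):
    return [p for (p,) in conn.execute("SELECT payload FROM votwdata ORDER BY payload")]
```

Binding the limit as a parameter also avoids concatenating $limit into the SQL string.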

Completely arbitrary sort order in MySQL with PHP

I have a table in MySQL that I'm accessing from PHP. For example, let's have a table named THINGS:
things.ID - int primary key
things.name - varchar
things.owner_ID - int for joining with another table
My select statement to get what I need might look like:
SELECT * FROM things WHERE owner_ID = 99;
Pretty straightforward. Now, I'd like users to be able to specify a completely arbitrary order for the items returned from this query. The list will be displayed, they can then click an "up" or "down" button next to a row and have it moved up or down the list, or possibly a drag-and-drop operation to move it to anywhere else. I'd like this order to be saved in the database (same or other table). The custom order would be unique for the set of rows for each owner_ID.
I've searched for ways to provide this ordering without luck. I've thought of a few ways to implement this, but help me fill in the final option:
1. Add an INT column and set its value to whatever I need to get rows returned in my order. This presents the problem of scanning row-by-row to find the insertion point, and possibly needing to update the preceding/following rows' sort column.
2. Have a "next" and "previous" column, implementing a linked list. Once I find my place, I'll just have to update max 2 rows to insert the row. But this requires scanning for the location from row #1.
3. Some SQL/relational DB trick I'm unaware of...
I'm looking for an answer to #3 because it may be out there, who knows. Plus, I'd like to offload as much as I can on the database.

From what I've read you need a new table containing the ordering of each user, say it's called user_orderings.
This table should contain the user ID, the position of the thing and the ID of the thing. The (user_id, thing_id) should be the PK. This way you need to update this table every time but you can get the things for a user in the order he/she wants using ORDER BY on the user_orderings table and joining it with the things table. It should work.
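A small sketch of that user_orderings table, in Python with an in-memory SQLite database. The position column name, the sample rows and the move_up helper are assumptions for illustration. Moving a row up is done by swapping positions with its neighbour, which touches at most two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE things (ID INTEGER PRIMARY KEY, name TEXT, owner_ID INTEGER);
    CREATE TABLE user_orderings (
        user_id  INTEGER NOT NULL,
        thing_id INTEGER NOT NULL REFERENCES things(ID),
        position INTEGER NOT NULL,
        PRIMARY KEY (user_id, thing_id)
    );
    INSERT INTO things VALUES (1, 'apple', 99), (2, 'banana', 99), (3, 'cherry', 99);
    INSERT INTO user_orderings VALUES (99, 1, 1), (99, 2, 2), (99, 3, 3);
""")


def ordered_things(conn, user_id):
    """Return the user's things in their saved order."""
    rows = conn.execute("""
        SELECT t.name
        FROM user_orderings o
        JOIN things t ON t.ID = o.thing_id
        WHERE o.user_id = ?
        ORDER BY o.position
    """, (user_id,))
    return [name for (name,) in rows]


def move_up(conn, user_id, thing_id):
    """Swap positions with the item directly above, if there is one."""
    with conn:
        row = conn.execute(
            "SELECT position FROM user_orderings WHERE user_id = ? AND thing_id = ?",
            (user_id, thing_id),
        ).fetchone()
        if row is None:
            return
        pos = row[0]
        above = conn.execute(
            "SELECT thing_id, position FROM user_orderings "
            "WHERE user_id = ? AND position < ? ORDER BY position DESC LIMIT 1",
            (user_id, pos),
        ).fetchone()
        if above is None:
            return  # already first in the list
        above_id, above_pos = above
        conn.execute(
            "UPDATE user_orderings SET position = ? WHERE user_id = ? AND thing_id = ?",
            (above_pos, user_id, thing_id),
        )
        conn.execute(
            "UPDATE user_orderings SET position = ? WHERE user_id = ? AND thing_id = ?",
            (pos, user_id, above_id),
        )
```

A drag-and-drop move to an arbitrary slot would still need to shift the positions in between; this sketch only covers the "up" button.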

The simplest expression of an ordered list is: 3,1,2,4. We can store this as a string in the parent table; so if our table is photos with the foreign key profile_id, we'd place our photo order in profiles.photo_order. We can then consider this field in our order by clause by utilizing the find_in_set() function. This requires either two queries or a join. I use two queries but the join is more interesting, so here it is:
select photos.photo_id, photos.caption
from photos
join profiles on profiles.profile_id = photos.profile_id
where photos.profile_id = 1
order by find_in_set(photos.photo_id, profiles.photo_order);
Note that you would probably not want to use find_in_set() in a where clause due to performance implications, but in an order by clause, there are few enough results to make this fast.

Fetching records from different tables in the database

My application has a facebook-like stream that displays updates of various types. So it will show regular posts (from the "posts" table), events (from the "events" table) and so on.
The problem is I have no idea how to fetch these records from different tables since they have different columns. Shall I query the database multiple times and then organize the data in PHP? If so, how? I'm not sure how I should approach this.
Your help is much appreciated :)

Unless the events and post are related to each other, then you'd probably query them separately, even if they show up on the same page.
You're not going to want to use JOIN just for the sake of it. Only if there is a foreign key relationship. If you don't know what that is, then you don't have one.

If the data tables are related to each other you can generally get the data back in a single query using some combination of JOINs and UNIONs. For a better answer, however, you'll have to post the structure of your data tables and a sample of what (combined) records you need for the website.
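Since the real table structure was not posted, the following is only a sketch of the UNION idea with made-up columns (body, title, starts_at, created), written in Python with an in-memory SQLite database. Each SELECT maps its table onto the same four output columns, so rows from tables with different columns can be merged and sorted as one stream.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER,
                        body TEXT, created TEXT);
    CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER,
                         title TEXT, starts_at TEXT, created TEXT);
    INSERT INTO posts VALUES
        (1, 7, 'Hello world', '2024-01-01 10:00:00'),
        (2, 7, 'Second post', '2024-01-03 10:00:00');
    INSERT INTO events VALUES
        (1, 7, 'Launch party', '2024-02-01 18:00:00', '2024-01-02 10:00:00');
""")


def stream(conn, user_id, limit=25):
    """Merge posts and events into one newest-first list of common columns."""
    return conn.execute("""
        SELECT 'post' AS item_type, id AS item_id, body AS summary, created
        FROM posts
        WHERE user_id = :user_id
        UNION ALL
        SELECT 'event', id, title, created
        FROM events
        WHERE user_id = :user_id
        ORDER BY created DESC
        LIMIT :limit
    """, {"user_id": user_id, "limit": limit}).fetchall()
```

The item_type and item_id columns let the application fetch type-specific details (for example an event's starts_at) only for the rows it actually displays.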

If you don't know the columns, you can get the table meta-data and find out what the columns represent and their corresponding data types.
If you know which columns, you can select from the multiple tables or even use nested selects or joins to get the data out.

Ideally you'd simply use a JOIN to obtain data from multiple tables in one query. However, without knowing more about your table schemas it's hard to provide any useful specifics. (That said, it most likely won't be possible unless you've factored this in from the beginning.)
As such, you might also want to create a generic "meta" table that provides information for each of the posts/events in a common format, and provides a means to link to the relevant table (i.e. it would contain the "parent" type and ID). You could then use this meta table as the source for the "updates" stream and drill down to the appropriate content as required.

Join the tables on user_id i.e.
Select * from posts p
left join status_updates su on p.user_id = su.user_id
limit 25;
or if your tables differ too much then play with a temporary table first
create temporary table tmp_updates as
select p.user_id, p.id as update_id, 'post' as update_type, p.text from posts p;
insert into tmp_updates
select su.user_id, su.id as update_id, 'status' as update_type, su.text from status_updates su;
Select * from tmp_updates
where user_id = '...'
limit 25;
