I have to write a reporting query for a kind of versioning system where I need to retrieve date-based reporting variations of the latest version. Simplified table structures are:
items_register: ir_id (primary, auto inc), ir_name (varchar)
items: i_id (primary, auto inc), i_register_id (int), i_version_name (varchar), i_datetime (datetime), i_date_expiry (datetime)
Each entry in items_register has multiple associated versions, stored as entries in the items table - with the highest value of i_datetime being the most recent version.
I want to retrieve entries from the items_register where the most recent version (item) has i_date_expiry after a requested date ($f_date).
I think I somehow need to join the tables, order the items by i_datetime, limit them to 1 so I get the most recent version, then check if i_date_expiry is after $f_date & retrieve the fields if so.
The fields I want to retrieve are items_register.ir_id, items_register.ir_name, items.i_version_name, items.i_datetime.
TIA for any help.
It looks like you're searching for the "groupwise max" pattern.
Making a few assumptions about things that still aren't clear in your question, I think this may be the query you're looking for:
SELECT items_register.ir_id, items_register.ir_name,
       items.i_version_name, items.i_datetime
FROM items_register
JOIN
(
    SELECT items.i_register_id,
           MAX(items.i_datetime) AS most_recent_item_datetime
    FROM items
    WHERE items.i_date_expiry > '$f_date'
    GROUP BY items.i_register_id
) AS item_date ON item_date.i_register_id = items_register.ir_id
JOIN items ON items.i_register_id = items_register.ir_id
    AND items.i_datetime = item_date.most_recent_item_datetime
Bear in mind that this assumes that $f_date is a string that conforms to the standards for datetime and timestamp literals (not date literals!) laid out in this documentation page.
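To make the assumptions concrete, here is a minimal sketch that runs the query against an in-memory SQLite database (standing in for MySQL; the sample rows are invented). As far as I can tell, the query filters out expired versions before picking the newest one, so it returns the newest unexpired version. If the intent is "the newest version, but only if that one is unexpired", the expiry check would move to the outer query, shown as the second variant. Both bind the date as a parameter instead of interpolating $f_date into the string.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items_register (ir_id INTEGER PRIMARY KEY, ir_name TEXT);
CREATE TABLE items (
    i_id INTEGER PRIMARY KEY,
    i_register_id INTEGER,
    i_version_name TEXT,
    i_datetime TEXT,
    i_date_expiry TEXT
);
INSERT INTO items_register VALUES (1, 'alpha'), (2, 'beta');
INSERT INTO items VALUES
    (1, 1, 'alpha v1', '2013-01-01 00:00:00', '2013-12-31 00:00:00'),
    (2, 1, 'alpha v2', '2013-02-01 00:00:00', '2013-03-01 00:00:00'),
    (3, 2, 'beta v1',  '2013-01-15 00:00:00', '2013-12-31 00:00:00');
""")

# Variant 1: same shape as the query above. Expired versions are discarded
# first, then the newest remaining version per register entry is picked.
NEWEST_UNEXPIRED = """
SELECT r.ir_id, r.ir_name, i.i_version_name, i.i_datetime
FROM items_register r
JOIN (
    SELECT i_register_id, MAX(i_datetime) AS most_recent
    FROM items
    WHERE i_date_expiry > :f_date
    GROUP BY i_register_id
) AS d ON d.i_register_id = r.ir_id
JOIN items i ON i.i_register_id = r.ir_id AND i.i_datetime = d.most_recent
ORDER BY r.ir_id
"""

# Variant 2 (an assumption about the intent): pick the newest version
# overall, then keep the row only if that version is unexpired.
NEWEST_THEN_CHECK = """
SELECT r.ir_id, r.ir_name, i.i_version_name, i.i_datetime
FROM items_register r
JOIN (
    SELECT i_register_id, MAX(i_datetime) AS most_recent
    FROM items
    GROUP BY i_register_id
) AS d ON d.i_register_id = r.ir_id
JOIN items i ON i.i_register_id = r.ir_id AND i.i_datetime = d.most_recent
WHERE i.i_date_expiry > :f_date
ORDER BY r.ir_id
"""

params = {"f_date": "2013-06-01 00:00:00"}
rows_v1 = conn.execute(NEWEST_UNEXPIRED, params).fetchall()
rows_v2 = conn.execute(NEWEST_THEN_CHECK, params).fetchall()
print(rows_v1)
print(rows_v2)
```

With this data, 'alpha' shows up in the first result (with its older, still valid version) but not in the second, because its newest version has already expired.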
Maybe this can be useful?
http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html
I'm looking for some advice/help on quite a complex search algorithm. Any articles to relevant techniques etc. would be much appreciated.
Background
I'm building an application, which, in a nutshell, allows users to set their "availability" for any given day. The User first sets a general availability template which allows them to say:
Monday - AM
Tuesday - PM
Wednesday - All Day
Thursday - None
Friday - All Day
So this User is generally available Monday AM, Tuesday PM etc.
Schema:
id
user_id
day_of_week (1-7)(Monday to Sunday)
availability
They can then override specific dates manually, for example:
2013-03-03 - am
2013-03-04 - pm
2013-03-05 - all_day
Schema:
id
user_id
date
availability
This all works well - I have a Calendar being generated which combines the template and overrides and allows Users to modify their availability etc.
The Problem
I now need to allow Admin Users to search for Users who have specific availability. So the Admin User would use a calendar to select required dates and availabilities and hit search.
For example, find me Users who are available:
2013-03-03 - pm
2013-03-04 - pm
2013-03-05 - pm
The search process would have to search for available Users using the Templated Availability and Overrides, then return the best results. Ideally, it would return Users who are available all of the time but in the case that no single user can match the dates, I need to provide a combination of Users who can.
I know this is quite a complex problem and I'm not looking for a complete answer, perhaps just some guidance or links to potentially relevant techniques etc.
What I've tried
At the moment, I have a halfway solution. I'm grabbing all the available Users, looping through each of them, and within that loop, looping through all of the required dates and breaking as soon as a User doesn't meet a required date. This is obviously very un-scalable and it's also only returning "perfect matches".
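For reference, the matching described above (override wins over template, then look for a full match, then fall back to a combination of Users) can be sketched in plain Python. This is only an illustration under assumptions: the dictionaries, the function names and the greedy fallback are all hypothetical, and the greedy pick is a simple heuristic, not guaranteed to find the smallest combination.

```python
from datetime import date

# Which half-day slots each availability value covers.
COVERS = {"am": {"am"}, "pm": {"pm"}, "all_day": {"am", "pm"}, "none": set()}


def effective(user, day, templates, overrides):
    """Override for that exact date if present, else the weekday template."""
    if (user, day) in overrides:
        return overrides[(user, day)]
    return templates.get((user, day.isoweekday()), "none")  # 1 = Monday


def covered(user, requests, templates, overrides):
    """Subset of the requested (date, slot) pairs this user can cover."""
    return {
        (day, slot)
        for day, slot in requests
        if COVERS[slot] <= COVERS[effective(user, day, templates, overrides)]
    }


def search(users, requests, templates, overrides):
    """Return (users, uncovered requests).

    Users who cover everything are returned if any exist; otherwise a
    greedy combination is built by repeatedly taking the user who covers
    the most still-uncovered requests.
    """
    cover = {u: covered(u, requests, templates, overrides) for u in users}
    perfect = [u for u in users if cover[u] == set(requests)]
    if perfect:
        return perfect, set()
    remaining, team = set(requests), []
    while remaining:
        best = max(users, key=lambda u: len(cover[u] & remaining))
        gain = cover[best] & remaining
        if not gain:
            break  # nobody can cover what is left
        team.append(best)
        remaining -= gain
    return team, remaining


# Invented sample data: (user_id, day_of_week) and (user_id, date) keys.
templates = {(1, 1): "pm", (1, 2): "all_day", (2, 1): "pm", (3, 2): "all_day"}
overrides = {(1, date(2013, 3, 3)): "pm", (3, date(2013, 3, 3)): "pm"}
requests = [(date(2013, 3, 3), "pm"), (date(2013, 3, 4), "pm"), (date(2013, 3, 5), "pm")]

full_match = search([1, 2, 3], requests, templates, overrides)
combination = search([2, 3], requests, templates, overrides)
print(full_match, combination)
```

Here user 1 covers all three dates on their own; without user 1, users 3 and 2 cover them together.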
Possible Solutions
Full Text Searching with Aggregate Table
I thought about creating a separate table which had the following schema:
user_id
body
The body field would be populated with the Users template days and overrides so an example record might look like:
user_id: 2
body: monday_am tuesday_pm wednesday_pm thursday_am friday_allday 2013-03-03_allday 2013-03-03_pm
I would then convert a Users search query into a similar format. So if a User was looking for someone who was available on the 19th March 2013 - All Day and 20th March 2013 - PM, I'd convert that into a string.
Firstly, as 19th March is a Tuesday, I'd convert that into tuesday_allday and same with the 20th. I'd therefore end up with:
tuesday_allday wednesday_pm 2013-03-19_allday 2013-03-20_pm
I'd then do a full text search against our aggregate table and return a "weighted" result set which I can then loop through and further interrogate.
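The conversion step could look something like this sketch. The function names are made up, and the overlap count is only a stand-in for whatever relevance score the full text engine would return.

```python
from datetime import date

# Index matches date.weekday(): 0 = Monday ... 6 = Sunday.
DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"]


def search_tokens(requests):
    """Turn (date, slot) pairs into weekday tokens followed by date tokens."""
    weekday_tokens = [f"{DAY_NAMES[d.weekday()]}_{slot}" for d, slot in requests]
    date_tokens = [f"{d.isoformat()}_{slot}" for d, slot in requests]
    return " ".join(weekday_tokens + date_tokens)


def overlap_score(query, body):
    """Crude stand-in for a full text weight: number of shared tokens."""
    return len(set(query.split()) & set(body.split()))


query = search_tokens([(date(2013, 3, 19), "allday"), (date(2013, 3, 20), "pm")])
body = "monday_am tuesday_pm wednesday_pm thursday_am friday_allday"  # invented record
score = overlap_score(query, body)
print(query, score)
```

One thing I can already see: a plain token overlap has no way of knowing that an override on a date should replace the template token for that weekday, so the results would still need the second pass.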
I'm not sure how this would work in practice, so that's why I'm asking if anyone has any links to techniques or relevant articles I could use.
I am confident this problem can be solved with a more well defined DB schema.
By utilizing a more detailed DB schema you will be able to find any available user for any given time frame (not just am & pm) if you should so choose.
It will also allow you to keep template data, while not polluting your availability data with template information (instead you would select from the template table to programmatically fill in the availability for a given date, which then can be modified by the user).
I spent some time diagramming this problem and came up with a schema structure that I believe solves the problem you specified and allows you to grow your application with a minimum of schema changes.
(To make this easier to read I've added the SQL at the end of this proposed answer)
I have also included an example select statement that would allow you to pull availability data with any number of arguments.
For clarity that SELECT is above the SQL for the schema at the end of my explanatory text.
Please don't be intimidated by the select, it may look complicated at first glance but is really a map to the entire schema (save the templates table).
(btw, I'm not saying that because I have any doubt that you can understand it, I'm sure you can, but I've known many programmers who ignore more complex DB structures to their own detriment because it LOOKS overly complex but when analyzed is actually less complex than the acrobatics they have to do in their program to get similar results... Relational DBs are based on a branch of mathematics that is good at accurately, consistently, & (relatively) succinctly, associating data).
General Use:
(for more details read the comments in the SQL CREATE TABLE statements)
-Populate the DaysOfWeek table.
-Populate the TimeFrames table with some time frames you want to track (an AM timeframe might have a StartTime of 00:00:00 & an end time of 11:59:59 while PM might have StartTime of 12:00:00 & EndTime of 23:59:59)
-Add Users
-Add Dates to be tracked (see notes in SQL for thoughts on avoiding bloat & also the virtues of this table)
-Populate the Templates table for each user
-Generate the list of default Availabilities (with their associated AvailableTimes data) for each user
-Expose the default Availabilities to the users so they can override the defaults
NOTE: you can also add an optional table for Engagements to be the opposite of Availabilities (or maybe there is a better abstraction that would include both concepts...)
Disclaimer: I did not take the additional time to fully populate my local DB & verify everything so there may be some weaknesses/errors I did not see in my diagrams... (sorry I spent far longer than intended on this & must get work done on an overdue project).
While I have worked fairly extensively with DB structures & with DBs others have created for 12+ years I'm sure I am not without fault, I hope others on StackOverflow will round out mistakes I may have included.
I apologize for not including more example data.
If I have time in the near future I will provide some, (think adding George, Fred, & Harry to the users table, adding some dates to the Dates table then detailing how busy George & Fred are compared to Harry during their school week using the Availabilities, AvailableTimes & TimeFrames tables).
The SELECT statement (NOTE: I would highly recommend making this into a view... in that way you can select whatever columns you want & add whatever arguments/conditions you want in a WHERE clause without having to write the joins out every time... so the view would NOT include the WHERE clause... just to make that clear):
SELECT *
FROM Users Us
JOIN Availabilities Av
ON Us.User_ID=Av.User_ID
JOIN Dates Da
ON Av.Date_ID=Da.Date_ID
JOIN AvailableTimes Avt
ON Av.Av_ID=Avt.Av_ID
WHERE Da.Date='2014-01-03' -- whatever date
-- alternately: WHERE Da.DayOWeek_ID=4 -- which would be Wednesday (with Sunday = 1)
-- WHERE Da.Date BETWEEN '2014-01-01' AND '2014-01-07' -- whatever date range...
-- etc...
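A minimal sketch of the view idea, run against in-memory SQLite rather than MySQL. The view name UserAvailability, the extra join to TimeFrames and the sample rows are my assumptions for illustration. The view lists its columns explicitly instead of using SELECT *, since the joined tables share column names such as User_ID and a view needs unique column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (User_ID INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE Dates (Date_ID INTEGER PRIMARY KEY, DayOWeek_ID INTEGER, Date TEXT);
CREATE TABLE TimeFrames (Time_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Availabilities (Av_ID INTEGER PRIMARY KEY, User_ID INTEGER, Date_ID INTEGER);
CREATE TABLE AvailableTimes (AvTime_ID INTEGER PRIMARY KEY, Av_ID INTEGER, Time_ID INTEGER);

-- Invented sample rows. 2014-01-03 is a Friday (6 when Sunday = 1).
INSERT INTO Users VALUES (1, 'George'), (2, 'Fred');
INSERT INTO Dates VALUES (1, 6, '2014-01-03'), (2, 7, '2014-01-04');
INSERT INTO TimeFrames VALUES (1, 'AM'), (2, 'PM');
INSERT INTO Availabilities VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2);
INSERT INTO AvailableTimes VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2), (4, 3, 2);

-- The view holds the joins only; callers add their own WHERE clause.
CREATE VIEW UserAvailability AS
SELECT Us.User_ID, Us.UserName, Da.Date, Da.DayOWeek_ID, Tf.Name AS TimeFrame
FROM Users Us
JOIN Availabilities Av ON Us.User_ID = Av.User_ID
JOIN Dates Da ON Av.Date_ID = Da.Date_ID
JOIN AvailableTimes Avt ON Av.Av_ID = Avt.Av_ID
JOIN TimeFrames Tf ON Avt.Time_ID = Tf.Time_ID;
""")


def available_on(day):
    """Who is available on a given date, and for which time frames."""
    return conn.execute(
        "SELECT UserName, TimeFrame FROM UserAvailability "
        "WHERE Date = ? ORDER BY UserName, TimeFrame",
        (day,),
    ).fetchall()


rows = available_on("2014-01-03")
print(rows)
```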
Recommended data in DaysOfWeek (which is effectively a lookup table):
INSERT INTO DaysOfWeek(DayOWeek_ID,Name,Description)
VALUES (1,'Sunday', 'First Day of the Week'),(2,'Monday', 'Second Day of the Week')...(7,'Saturday', 'Last Day of the Week'),(8,'AllWeek','The entire week'),(9,'Weekdays', 'Monday through Friday'),(10,'Weekends','Saturday & Sunday')
Example Templates data:
INSERT INTO Templates(Time_ID,User_ID,DayOWeek_ID)
VALUES (1,1,9) -- this would show the first user is available for the first time frame every weekday as their default...
,(2,1,3) -- this would show the first user available on Tuesdays for the second time frame
The following is the recommended schema structure:
CREATE TABLE `test`.`Users` (
User_ID INT NOT NULL AUTO_INCREMENT ,
UserName VARCHAR(45) NULL ,
PRIMARY KEY (User_ID) );
CREATE TABLE `test`.`Templates` (
`Template_ID` INT NOT NULL AUTO_INCREMENT ,
`Time_ID` INT NULL ,
`User_ID` INT NULL ,
`DayOWeek_ID` INT NULL ,
PRIMARY KEY (`Template_ID`) )
COMMENT = 'This table holds the template data for general expected availability of a user/agent/person (so the person would use this to set their general availability)';
CREATE TABLE `test`.`Availabilities` (
`Av_ID` INT NOT NULL AUTO_INCREMENT ,
`User_ID` INT NULL ,
`Date_ID` INT NULL ,
PRIMARY KEY (`Av_ID`) )
COMMENT = 'This table holds the actual availability of a user for a particular date.\nIf the user is not available for a date then this table has no entry for that user for that date.\n(btw, this suggests the possibility of an alternate table that could utilize all other structures except the templates called Engagements which would record when a user is actually busy... in order to use this table & the other table together would need to always join to AvailableTimes as a date would actually be in both tables but associated with different time frames).';
CREATE TABLE `test`.`Dates` (
`Date_ID` INT NOT NULL AUTO_INCREMENT ,
`DayOWeek_ID` INT NULL ,
`Date` DATE NULL ,
PRIMARY KEY (`Date_ID`) )
COMMENT = 'This table is utilized to hold actual dates with which users/agents can be associated.\nThe important thing to note here is: this may end up holding every day of every year... this suggests a need to archive this data (and everything associated with it for performance reasons as this database is utilized).\nOne more important detail... this is more efficient than associating actual dates directly with each user/agent with an availability on that date... this way the date is only recorded once, the other approach records this date with the user for each availability.';
CREATE TABLE `test`.`AvailableTimes` (
`AvTime_ID` INT NOT NULL AUTO_INCREMENT ,
`Av_ID` INT NULL ,
`Time_ID` INT NULL ,
PRIMARY KEY (`AvTime_ID`) )
COMMENT = 'This table records the time frames that a user is available on a particular date.\nThis allows the time frames to be flexible without affecting the structure of the DB.\n(e.g. if you only keep track of AM & PM at the beginning of the use of the DB but later decide to keep track on an hourly basis you simply add the hourly time frames & start populating them, no changes to the DB schema need to be made)';
CREATE TABLE `test`.`TimeFrames` (
`Time_ID` INT NOT NULL AUTO_INCREMENT ,
`StartTime` TIME NOT NULL ,
`EndTime` TIME NOT NULL ,
`Name` VARCHAR(45) NOT NULL ,
`Desc` VARCHAR(128) NULL ,
PRIMARY KEY (`Time_ID`) ,
UNIQUE INDEX `Name_UNIQUE` (`Name` ASC) )
COMMENT = 'Utilize this table to record the times that are being tracked.\nThis allows the flexibility of having multiple time frames on the same day.\nIt also provides the flexibility to change the time frames being tracked without changing the DB structure.';
CREATE TABLE `test`.`DaysOfWeek` (
`DayOWeek_ID` INT NOT NULL AUTO_INCREMENT ,
`Name` VARCHAR(45) NOT NULL ,
`Description` VARCHAR(128) NULL ,
PRIMARY KEY (`DayOWeek_ID`) ,
UNIQUE INDEX `Name_UNIQUE` (`Name` ASC) )
COMMENT = 'This table is a lookup table to hold the days of the week.\nI personally would recommend adding a row for:\nWeekends, All Week, & WeekDays \nThis will often be used in conjunction with the templates and will allow less entries in that table to be made with those 3 entries in this table.';
Ok, this is what I would do:
In the users table create fields for Sunday, Monday ... Saturday.
Use pm , am or both for values in those fields.
You should also index each field in the db for faster querying.
Then make a separate table for user/date/meridian fields (meridian means am or pm). Again the meridian field values would be pm , am or both.
You will need to do a little research with php's date function to pull out the day of the week number and use a switch statement against it perhaps.
Use the requested dates and pull out the day of the week and query the user table for their day of the week availability.
Then use the requested date/meridian itself and query the new user/date/meridian table for the users' individual availability dates/meridians.
I don't think there is going to be much of an algorithm here except when extracting the days of the week from the date requests. If you are doing a date range then you could benefit from an algorithm, but if it is just a bunch of cherry-picked dates then you are just going to have to do them one by one. Let me know and maybe I'll throw an algo your way.
Description:
I am building a rating system with mysql/php. I am confused as to how I would set up the database.
Here is my article setup:
Article table:
id | user_id | title | body | date_posted
This is my assumed rating table:
Rating table:
id | article_id | score | ? user_id ?
Problem:
I don't know if I should place the user_id in the rating table. My plan is to use a query like this:
SELECT ... WHERE user_id = 1 AND article_id = 10
But I know that it's redundant data as it stores the user_id twice. Should I figure out a JOIN on the tables or is the structure good as is?
It depends. I'm assuming that the articles are unique to individual users? In that case, you could retain the user_id in your rating table and then just alter your query to:
SELECT ... WHERE article_id = 10
or
SELECT ... WHERE user_id = 1
Depending on what info you're trying to pull.
You're not "storing the user_id twice" so much as using the user_id to link the article to unique data associated to the user in another table. You're taking the right approach, except in your query.
I don't see anything wrong with this approach. The user id being stored twice is not particularly relevant since one is regarding a rating entry and the other, I assume, is related to the article owner.
The benefit of this way is you can prevent multiple scores being recorded for each user by making article_id and user_id unique and use replace into to manage scoring.
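A quick sketch of that idea, using in-memory SQLite in place of MySQL (SQLite also accepts REPLACE INTO). The rate() helper and the sample scores are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE rating (
    id INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    score INTEGER NOT NULL,
    UNIQUE (article_id, user_id)  -- one score per user per article
)
""")


def rate(article_id, user_id, score):
    """Insert a score, replacing any earlier score by the same user."""
    conn.execute(
        "REPLACE INTO rating (article_id, user_id, score) VALUES (?, ?, ?)",
        (article_id, user_id, score),
    )


rate(10, 1, 3)
rate(10, 1, 5)  # same user rates again: the old row is replaced
rate(10, 2, 4)

rows = conn.execute(
    "SELECT user_id, score FROM rating WHERE article_id = 10 ORDER BY user_id"
).fetchall()
print(rows)
```

Note that REPLACE deletes the conflicting row and inserts a new one, so the auto-increment id changes. If that matters, MySQL's INSERT ... ON DUPLICATE KEY UPDATE keeps the existing row.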
There are many things to elaborate on this depending on whether or not this rating system needs to be intelligent to prevent gaming, etc. How large the user base is, etc.
I bet for any normal person, this setup would not be detrimental to even a relatively large scale system.
... semi irrelevant:
Just FYI, depending on the importance and gaming aspects of this score, you could use STDDEV() to fetch the standard deviation of the score column and read it alongside the average...
SELECT STDDEV(`score`) FROM `rating` WHERE `article_id` = {article_id}
That would help you spot outliers, supposing you cared whether or not it looked like people were ganging up on a particular article to shoot it down or praise it without valid cause.
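To show what that could look like once the numbers are in application code, here is a small sketch. It uses the population standard deviation (which is what MySQL's STDDEV() returns); the cut-off of 2 standard deviations and the sample scores are arbitrary assumptions.

```python
from statistics import mean, pstdev


def score_summary(scores, k=2.0):
    """Return (average, population std dev, scores more than k std devs away)."""
    avg = mean(scores)
    spread = pstdev(scores)
    outliers = [s for s in scores if spread and abs(s - avg) > k * spread]
    return avg, spread, outliers


# Invented scores for one article: mostly 4s and 5s plus a single 1.
scores = [4, 5, 4, 5, 4, 5, 4, 5, 4, 1]
avg, spread, outliers = score_summary(scores)
print(avg, spread, outliers)
```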
You should not: because of third normal form, you need to keep the independence.
"The third normal form (3NF) is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971.[1] Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:
The relation R (table) is in second normal form (2NF)
Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R."
Source here: http://en.wikipedia.org/wiki/Third_normal_form
First normal Form: http://en.wikipedia.org/wiki/First_normal_form
Second normal Form: http://en.wikipedia.org/wiki/Second_normal_form
You should take a look at normalization and the E/R model; it will help you a lot.
normalization in wikipedia: http://en.wikipedia.org/wiki/Database_normalization
I have a column called list which is used in my ORDER BY (in MySQL queries), and list contains numbers (e.g. 1 to 20).
This list is then output using MySQL ORDER BY list ASC. However, when I update my list in the backend using a jQuery drag-and-drop UI list, it is supposed to update the list on the frontend.
My problem is that my list order sometimes conflicts with other rows, as there could be two or three rows with the value of 1 in list. Therefore, when my order updates, I would like to know how I can update other rows by +1 only if the rows are >= the order number given.
I do not want to make the column primary as I am not aiming to make the list column unique, the reason for this is because there is more than one category - and in each category they all start at 1 - therefore if I make it unique it would cause errors because there was multiple 1's over different categories.
I asked a friend who said I could probably try PL/SQL using a trigger function but this is new grounds to me - I don't fully understand that language and was wondering if anyone could help me do what I am trying to using MYSQL or even PL/SQL.
This is what I have so far:
<?php
$project = mysql_real_escape_string(stripslashes($_POST['pid']));
$category = mysql_real_escape_string(stripslashes($_POST['cat']));
$order = mysql_real_escape_string(stripslashes($_POST['order']));
// need to do update the moved result (list row) and update all conflicting rows by 1
mysql_query("UPDATE `projects` SET `cat`='$category',`list`='$order' WHERE `id`='$project'") or die(mysql_error());
?>
Conclusion:
I am trying to update a non-unique column to have unique values for that individual category. I am not sure how to update all the rows in that category by +1.
@andrewsi is right, in particular I suggest order by list ASC, last_update DESC so in the same query where you update list you can timestamp last_update and therefore you will not need to use triggers or any other updates.
In general, what andrewsi and Luis have suggested is true. Instead of (like andrewsi said) "do messy updates" you should really consider ordering by a second column.
However, I can maybe see the point of your approach. One similar situation where I know it could apply is a CMS where you let the backend user order items by changing the order number manually in text fields next to the items, e.g.
item 1 - [ 1 ]
item 2 - [ 3 ]
item 3 - [ 2 ]
... the number in the [] would then be the new order.
So, a quite messy solution would be (many steps, but if you do not have to worry about performance it might be OK for you, I don't know):
INSERT INTO projects (cat, list, timestamp_inserted) VALUES (:cat, :list, NOW())
and then as a second step
SELECT id, list FROM projects WHERE cat=:cat ORDER BY list ASC, timestamp_inserted DESC
and then loop through the array you get from the select and foreach row update (:i is the increasing index)
UPDATE projects SET list=:i WHERE id=:id
PS: you would have to add a column timestamp_inserted with a timestamp value.
PPS: to clearly state, I would not recommend this and never said it is best practice (for those considering to downvote because of this)
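For completeness, the shift the question literally asks for (add 1 to every row in the same category whose list is >= the new position, then place the moved row) can be done with two UPDATE statements and no trigger. A minimal sketch against in-memory SQLite; move_project() and the sample rows are hypothetical, and the same two statements should carry over to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, cat INTEGER, list INTEGER);
INSERT INTO projects VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1), (5, 2, 2);
""")


def move_project(project_id, category, position):
    """Make room at `position` inside `category`, then place the project there."""
    with conn:  # both updates commit together or not at all
        conn.execute(
            "UPDATE projects SET list = list + 1 "
            "WHERE cat = ? AND list >= ? AND id <> ?",
            (category, position, project_id),
        )
        conn.execute(
            "UPDATE projects SET cat = ?, list = ? WHERE id = ?",
            (category, position, project_id),
        )


def order_in(category):
    return conn.execute(
        "SELECT id, list FROM projects WHERE cat = ? ORDER BY list ASC",
        (category,),
    ).fetchall()


move_project(3, 1, 1)  # drag project 3 to the top of category 1
cat1 = order_in(1)
cat2 = order_in(2)
print(cat1, cat2)
```

Only the target category is touched. The numbers can end up with gaps after some moves, which does not affect ORDER BY list ASC; renumbering as in the loop above would close them.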
I have a multilevel comment system, and I store comments in a MySQL database table with the following fields:
id
article_id
user_id
date
content
comment_id
Where comment_id is parent comment's id.
How can I count the number of replies to a user's comments after some specific date, for all articles?
e.g:
- comment1
-- comment1.1
--- comment1.1.1
-- comment1.2
-- comment1.3
--- comment1.3.1
If the user posted comment1, I need the query to return 5. If the user posted comment1.3, it should return 1.
See Managing Hierarchical Data in MySQL for some ideas. One simple approach is to store the path in the comment tree like you listed above and do a LIKE query. E.g.:
SELECT COUNT(*) FROM comments WHERE comment_path LIKE 'comment1.%' -- table name assumed
You'll of course want an index on the comment_path column, which will be used as long as a % is only used on the end.
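A small runnable sketch of the path approach, using in-memory SQLite in place of MySQL. The table name comments, the comment_path column holding dotted positions, and the sample dates are assumptions for the example; the tree is the one from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, comment_path TEXT, date TEXT);
CREATE INDEX idx_comment_path ON comments (comment_path);
INSERT INTO comments (comment_path, date) VALUES
    ('1',     '2013-01-01'),
    ('1.1',   '2013-01-02'),
    ('1.1.1', '2013-01-03'),
    ('1.2',   '2013-01-04'),
    ('1.3',   '2013-01-05'),
    ('1.3.1', '2013-01-06');
""")


def count_replies(path, since):
    """Count every descendant of `path` posted after `since`."""
    return conn.execute(
        "SELECT COUNT(*) FROM comments WHERE comment_path LIKE ? AND date > ?",
        (path + ".%", since),  # wildcard only at the end of the pattern
    ).fetchone()[0]


replies_to_1 = count_replies("1", "2012-12-31")
replies_to_1_3 = count_replies("1.3", "2012-12-31")
recent_replies_to_1 = count_replies("1", "2013-01-04")
print(replies_to_1, replies_to_1_3, recent_replies_to_1)
```

The trailing dot in the pattern matters: '1.%' matches '1.3.1' but not a separate top-level comment such as '10'.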
If it is possible, you can change your data schema to nested sets. With this schema you can count the answers at every level of the hierarchy with a simple addition/subtraction. Unfortunately I only know good tutorials in German :-/ for example this.
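To illustrate the arithmetic: in a nested set each node stores a left and a right number assigned by a depth-first walk, and the number of descendants is (right - left - 1) / 2. A minimal sketch in Python with hypothetical helper names, using the tree from the question:

```python
def number_tree(children, root):
    """Assign (lft, rgt) bounds to every node with a depth-first walk."""
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        lft = counter
        for child in children.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (lft, counter)

    visit(root)
    return bounds


def descendant_count(bounds, node):
    """Each descendant uses up two numbers between lft and rgt."""
    lft, rgt = bounds[node]
    return (rgt - lft - 1) // 2


children = {
    "comment1": ["comment1.1", "comment1.2", "comment1.3"],
    "comment1.1": ["comment1.1.1"],
    "comment1.3": ["comment1.3.1"],
}
bounds = number_tree(children, "comment1")
print(bounds["comment1"], descendant_count(bounds, "comment1"))
print(bounds["comment1.3"], descendant_count(bounds, "comment1.3"))
```

In the database the two numbers would be stored as columns on each comment, so the count needs no recursion. Counting only replies after a given date would still need a range query on those columns plus the date filter.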
I know I am writing queries wrong, and when we get a lot of traffic, our database gets hit HARD and the page slows to a grind...
I think I need to write queries based on CREATE VIEW from the last 30 days from the CURDATE ?? But not sure where to begin or if this will be MORE efficient query for the database?
Anyways, here is a sample query I have written..
$query_Recordset6 = "SELECT `date`, title, category, url, comments
FROM cute_news
WHERE category LIKE '%45%'
ORDER BY `date` DESC";
Any help or suggestions would be great! I have about 11 queries like this, but I am confident if I could get help on one of these, then I can implement them to the rest!!
Putting a wildcard on the left side of a value comparison:
LIKE '%xyz'
...means that an index can not be used, even if one exists. Might want to consider using Full Text Searching (FTS), which means adding full text indexing.
Normalizing the data would be another step to consider - categories should likely be in a separate table.
SELECT `date`, title, category, url, comments
FROM cute_news
WHERE category LIKE '%45%'
ORDER BY `date` DESC
The LIKE '%45%' means a full table scan will need to be performed. Are you perhaps storing a list of categories in the column? If so creating a new table storing category and news_article_id will allow an index to be used to retrieve the matching records much more efficiently.
OK, time for psychic debugging.
In my mind's eye, I see that query performance would be improved considerably through database normalization, specifically by splitting the category multi-valued column into a separate table that has two columns: the primary key for cute_news and the category ID.
This would also allow you to directly link said table to the categories table without having to parse it first.
Or, as Chris Date said: "Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else)."
Anything with LIKE '%XXX%' is going to be slow. It's a slow operation.
For something like categories, you might want to separate categories out into another table and use a foreign key in the cute_news table. That way you can have category_id, and use that in the query which will be MUCH faster.
Also, I'm not quite sure why you're talking about using CREATE VIEW. Views will not really help you for speed, unless it's a materialized view, which MySQL doesn't support natively.
If your database is getting hit hard, the solution isn't to make a view (the view is still basically the same amount of work for the database to do), the solution is to cache the results.
This is especially applicable since, from what it sounds like, your data only needs to be refreshed once every 30 days.
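As a rough idea of what caching could look like at the application level, here is a minimal in-process sketch in Python. The TTLCache class and the key name are made up; a real site would more likely use something shared such as memcached or a summary table, so treat this only as an illustration of the principle.

```python
import time


class TTLCache:
    """Keep computed results for ttl_seconds, then recompute on next access."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry can be demonstrated
        self._store = {}

    def get(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: the database is not touched
        value = compute()
        self._store[key] = (now, value)
        return value


# Demonstration with a fake clock and a stand-in for the expensive query.
fake_now = [0.0]
calls = []


def run_query():
    calls.append(1)
    return ["row"]


cache = TTLCache(60, clock=lambda: fake_now[0])
first = cache.get("category-45", run_query)
second = cache.get("category-45", run_query)  # served from the cache
fake_now[0] = 61.0
third = cache.get("category-45", run_query)   # expired, query runs again
print(len(calls))
```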
I'd guess that your category column is a list of category values like "12,34,45,78" ?
This is not good relational database design. One reason it's not good is as you've discovered: it's incredibly slow to search for a substring that might appear in the middle of that list.
Some people have suggested using fulltext search instead of the LIKE predicate with wildcards, but in this case it's simpler to create another table so you can list one category value per row, with a reference back to your cute_news table:
CREATE TABLE cute_news_category (
news_id INT NOT NULL,
category INT NOT NULL,
PRIMARY KEY (news_id, category),
FOREIGN KEY (news_id) REFERENCES cute_news(news_id)
) ENGINE=InnoDB;
Then you can query and it'll go a lot faster:
SELECT n.`date`, n.title, c.category, n.url, n.comments
FROM cute_news n
JOIN cute_news_category c ON (n.news_id = c.news_id)
WHERE c.category = 45
ORDER BY n.`date` DESC
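Here is a small end-to-end sketch of that change, run on in-memory SQLite instead of MySQL. The sample rows, the one-off migration loop and the extra index name are assumptions for illustration. It also shows a second problem with LIKE '%45%' besides speed: it matches category 145 too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cute_news (
    news_id INTEGER PRIMARY KEY, date TEXT, title TEXT, category TEXT
);
CREATE TABLE cute_news_category (
    news_id INTEGER NOT NULL,
    category INTEGER NOT NULL,
    PRIMARY KEY (news_id, category)
);
-- The primary key starts with news_id, so a lookup by category alone
-- benefits from its own index.
CREATE INDEX idx_category ON cute_news_category (category, news_id);

INSERT INTO cute_news VALUES
    (1, '2013-01-01', 'first',  '12,45'),
    (2, '2013-01-02', 'second', '145,78'),
    (3, '2013-01-03', 'third',  '45');
""")

# One-off migration: split each comma-separated list into one row per category.
for news_id, categories in conn.execute(
    "SELECT news_id, category FROM cute_news"
).fetchall():
    conn.executemany(
        "INSERT INTO cute_news_category VALUES (?, ?)",
        [(news_id, int(c)) for c in categories.split(",")],
    )

like_titles = [row[0] for row in conn.execute(
    "SELECT title FROM cute_news WHERE category LIKE '%45%' ORDER BY date DESC"
)]
join_titles = [row[0] for row in conn.execute(
    "SELECT n.title FROM cute_news n "
    "JOIN cute_news_category c ON n.news_id = c.news_id "
    "WHERE c.category = 45 ORDER BY n.date DESC"
)]
print(like_titles)
print(join_titles)
```

The LIKE version returns all three articles, including the one that is only in category 145, while the join returns the two that are really in category 45.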
Any answer is a guess, show:
- the relevant SHOW CREATE TABLE outputs
- the EXPLAIN output from your common queries.
And Bill Karwin's comment certainly applies.
After all this & optimizing, sampling the data into a table with only the last 30 days could still be desired, in which case you're better off running a daily cronjob to do just that.