how to make history of sql data? (report data changes)


Every day, a PHP script run from crontab saves bug information into a database. Every row looks like:
(Bugidentification, Date, Title, Who, etc....)
(e.g:
Bugidentification, Date, Title, Who, etc....
issue1, 2015-04-01, blabla, bill, etc...
issue2, 2015-04-01, nnnnnnn, john, etc...
issue3, 2015-04-01, vvvvvvv, greg, etc...
issue1, 2015-04-02, blabla, bill, etc...
issue2, 2015-04-02, nnnnnnn, john, etc...
issue3, 2015-04-02, vvvvvvv, mario, etc... (here it is now mario)
issue2, 2015-04-03, nnnnnnn, john, etc... (issue1 disappeared)
issue3, 2015-04-03, vvvvvvv, tod, etc... (tod is new info)
issue4, 2015-04-03, rrrrrrrr, john, etc... (issue4 is new)
.............................................
)
Basically, taking the example I posted above and comparing April 2nd with April 3rd, the results should be something like:
New row: issue4
Closed row: issue1
Updated row: issue3 (with tod instead of mario)
Unchanged row: issue2
In my case there are hundreds of rows. I believe I know how to do it in PHP, but the code would be long: foreach loops checking every row, one by one, for changes. I am not sure that is the most straightforward solution.
So my question is: is there a simple way to report those changes with "simple" code (a special SQL query, an existing project, or simple PHP functions)?
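For the comparison itself, a couple of joins between two snapshot dates can replace the foreach loops. This is a hedged sketch: it assumes the daily rows live in one table, given the hypothetical name bug_snapshots with columns (bug_id, snap_date, title, who), and it uses Python's built-in sqlite3 module as a stand-in for MySQL. Only title and who are compared. Because MySQL has no FULL OUTER JOIN, the closed rows come from a second LEFT JOIN appended with UNION ALL.

```python
import sqlite3

DIFF_SQL = """
SELECT n.bug_id,
       CASE WHEN o.bug_id IS NULL THEN 'new'
            WHEN o.title = n.title AND o.who = n.who THEN 'unchanged'
            ELSE 'updated' END AS change_type
  FROM bug_snapshots n
  LEFT JOIN bug_snapshots o
    ON o.bug_id = n.bug_id AND o.snap_date = :old
 WHERE n.snap_date = :new
UNION ALL
SELECT o.bug_id, 'closed'          -- present on the old day, missing on the new day
  FROM bug_snapshots o
  LEFT JOIN bug_snapshots n
    ON n.bug_id = o.bug_id AND n.snap_date = :new
 WHERE o.snap_date = :old AND n.bug_id IS NULL
ORDER BY 1
"""

def diff_snapshots(conn, old_day, new_day):
    """Map each bug_id to 'new', 'closed', 'updated' or 'unchanged'."""
    rows = conn.execute(DIFF_SQL, {"old": old_day, "new": new_day}).fetchall()
    return dict(rows)

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE bug_snapshots (
    bug_id TEXT, snap_date TEXT, title TEXT, who TEXT,
    PRIMARY KEY (bug_id, snap_date))""")
conn.executemany("INSERT INTO bug_snapshots VALUES (?, ?, ?, ?)", [
    ("issue1", "2015-04-02", "blabla", "bill"),
    ("issue2", "2015-04-02", "nnnnnnn", "john"),
    ("issue3", "2015-04-02", "vvvvvvv", "mario"),
    ("issue2", "2015-04-03", "nnnnnnn", "john"),
    ("issue3", "2015-04-03", "vvvvvvv", "tod"),
    ("issue4", "2015-04-03", "rrrrrrrr", "john"),
])
print(diff_snapshots(conn, "2015-04-02", "2015-04-03"))
# {'issue1': 'closed', 'issue2': 'unchanged', 'issue3': 'updated', 'issue4': 'new'}
```

If a compared column can be NULL, the plain = in the CASE treats NULL as "changed"; MySQL's null-safe <=> operator (IS in SQLite) avoids that.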

There are way too many assumptions built into this design. And those assumptions require you to compare rows between different days to make the assumption in the first place -- not to mention you have to duplicate unchanged rows from one day to the next in order to maintain the unbroken daily entry needed to feed the assumptions. Whew.
Rule 1: don't build assumptions into the design. If something is new, it should be marked, "HEY! I'm new here!" When a change has been made to the data, "OK, something changed. Here it is." and when the issue has finally been closed, "OK, that's it for me. I'm done for."
create table Bug_Static( -- Contains one entry for each bug
ID int identity,
Opened date not null default sysdate,
Closed date [null | not null default date '9999-12-31'],
Title varchar(...),
Who id references Who_Table,
<other non-changing data>,
constraint PK_Bug_Static primary key( ID )
);
create table Bug_Versions( -- Contains changing data, like status
ID int not null,
Effective date not null,
Status varchar not null, -- new,assigned,in work,closed,whatever
<other data that may change from day to day>,
constraint PK_Bug_Versions primary key( ID, Effective ),
constraint FK_Bug_Versions_Static foreign key( ID )
references Bug_Static( ID )
);
Now you can select the bugs and the current data (the last change made) on any given day.
select s.ID, s.Opened, s.Title, v.Effective, v.Status
from Bug_Static s
join Bug_Versions v
on v.ID = s.ID
and v.Effective =(
select Max( Effective )
from Bug_Versions
where ID = v.ID
and Effective <= sysdate )
where s.Closed >= trunc( sysdate );
The where clause is optional. What it gives you is the bugs that are still open plus those closed on the date the query is executed, but not the ones closed before then. That keeps the closed bugs from reappearing over and over again -- unless that's what you want.
Change the sysdate values to a particular date/time and you will get the data as it appeared as of that date and time.
Normally, when a bug is created, a row is entered into both tables. Then only new versions are entered as the status or any other data changes. If nothing changed on a day, nothing is entered. Then when the bug is finally closed, the Closed field of the static table is updated and a closed version is inserted into the version table. I've shown the Closed field with two options, null or with the defined "maximum date" of Dec 31, 9999. You can use either one but I like the max date method. It simplifies the queries.
I would also front both tables with a couple of views that join the tables: one that shows only the latest version of each bug (Bug_Current) and one that shows every version of every bug (Bug_History). With triggers on Bug_Current, it can be the view the app uses to change the bugs. The trigger would, for instance, turn an update of any versioned field into an insert of a new version.
The point is, this is a very flexible design: you can easily show just the data you want, how you want it, as of any time you want.
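A minimal sketch of the two-table design, using Python's sqlite3 as a stand-in for the database in the answer. It keeps only Status as versioned data, stores dates as ISO 'YYYY-MM-DD' text so that string comparison matches date order, and uses the "maximum date" convention for open bugs. The :asof parameter plays the role of sysdate in the query above; the as_of helper is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Bug_Static (
    ID     INTEGER PRIMARY KEY,
    Opened TEXT NOT NULL,
    Closed TEXT NOT NULL DEFAULT '9999-12-31',  -- "maximum date" means still open
    Title  TEXT
);
CREATE TABLE Bug_Versions (
    ID        INTEGER NOT NULL REFERENCES Bug_Static (ID),
    Effective TEXT NOT NULL,
    Status    TEXT NOT NULL,
    PRIMARY KEY (ID, Effective)
);
"""

# Latest version of each bug as of :asof, skipping bugs closed before that day.
AS_OF = """
SELECT s.ID, s.Title, v.Effective, v.Status
  FROM Bug_Static s
  JOIN Bug_Versions v
    ON v.ID = s.ID
   AND v.Effective = (SELECT MAX(Effective) FROM Bug_Versions
                       WHERE ID = v.ID AND Effective <= :asof)
 WHERE s.Closed >= :asof
 ORDER BY s.ID
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Creating a bug: one static row plus its first version.
conn.execute("INSERT INTO Bug_Static (ID, Opened, Title) VALUES (1, '2015-04-01', 'blabla')")
conn.execute("INSERT INTO Bug_Versions VALUES (1, '2015-04-01', 'new')")
# A change: only a new version row, nothing is updated.
conn.execute("INSERT INTO Bug_Versions VALUES (1, '2015-04-02', 'assigned')")
# Closing: set Closed and insert a closed version.
conn.execute("UPDATE Bug_Static SET Closed = '2015-04-03' WHERE ID = 1")
conn.execute("INSERT INTO Bug_Versions VALUES (1, '2015-04-03', 'closed')")

def as_of(day):
    return conn.execute(AS_OF, {"asof": day}).fetchall()

print(as_of("2015-04-02"))  # [(1, 'blabla', '2015-04-02', 'assigned')]
print(as_of("2015-04-04"))  # []  (closed on 2015-04-03)
```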

Related

Is it possible to partially get/modify a field?

I'm setting up to gather long-term statistics. They will be recorded in little blocks that I'm planning to stick all into one TEXT field, latest first... sorta like this:
[date:03.01.2016,data][date:02.01.2016,data][date:01.01.2016,data]...
It will be more frequent than that (this is just a sample), but the data should remain small enough to keep recording for decades, yet big enough to make me want to optimize it.
I'm looking for 2 things
Can you prepend to a field (add to its front) in MySQL?
Can you read the field partially, just the first 100 characters for example?
The blocks will be fixed length so I can accurately estimate how many characters I need to download to display statistics for X time period.

The answer to your two questions is "yes":
update t
set field = concat($newval, field)
where id = $id;
And:
select left(field, 100)
from t
where id = $id;
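The two statements above can be tried out with Python's sqlite3 module, assuming SQLite as a stand-in: SQLite spells concat($newval, field) as ? || field and left(field, 100) as substr(field, 1, 100). Because the blocks are fixed length, reading the newest N blocks is just a prefix of N times the block length. The table t and the helper names are hypothetical, and bound parameters replace the interpolated $newval/$id.

```python
import sqlite3

BLOCK_LEN = len("[date:01.01.2016,data]")  # every block has the same length, as in the question

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field TEXT NOT NULL DEFAULT '')")
conn.execute("INSERT INTO t (id) VALUES (1)")

def prepend_block(row_id, block):
    # SQLite equivalent of: set field = concat($newval, field)
    conn.execute("UPDATE t SET field = ? || field WHERE id = ?", (block, row_id))

def latest_blocks(row_id, n):
    # SQLite equivalent of: select left(field, n * BLOCK_LEN)
    (prefix,) = conn.execute(
        "SELECT substr(field, 1, ?) FROM t WHERE id = ?", (n * BLOCK_LEN, row_id)
    ).fetchone()
    return [prefix[i:i + BLOCK_LEN] for i in range(0, len(prefix), BLOCK_LEN)]

for day in ("01.01.2016", "02.01.2016", "03.01.2016"):
    prepend_block(1, f"[date:{day},data]")

print(latest_blocks(1, 2))  # ['[date:03.01.2016,data]', '[date:02.01.2016,data]']
```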
(These assume that you have multiple rows in the table.)
That said, your method of storing the data is absolutely not the right thing to do in a relational database.
Presumably, you want a table that looks something like this:
create table t (
tId int auto_increment primary key,
creationDate date,
data <something>
);
(This may be more complicated if data should be multiple columns.)
Then you insert into the table:
insert into t(creationDate, data)
select $date, $data;
And you can fetch the most recent row:
select t.*
from t
order by tId desc
limit 1;
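A small sketch of the row-per-sample design, again with Python's sqlite3 standing in for MySQL (AUTOINCREMENT instead of auto_increment, ISO date strings instead of a DATE column). The index on creationDate is an addition, assumed here as the usual way to make reads of a time window cheap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (
        tId          INTEGER PRIMARY KEY AUTOINCREMENT,
        creationDate TEXT NOT NULL,  -- ISO 'YYYY-MM-DD', so text order is date order
        data         TEXT
    )""")
conn.execute("CREATE INDEX t_by_date ON t (creationDate)")

conn.executemany(
    "INSERT INTO t (creationDate, data) VALUES (?, ?)",
    [("2016-01-01", "a"), ("2016-01-02", "b"), ("2016-01-03", "c")],
)

# One time window: a range scan on the index, no string slicing.
window = conn.execute(
    "SELECT creationDate, data FROM t WHERE creationDate >= ? ORDER BY creationDate DESC",
    ("2016-01-02",),
).fetchall()
print(window)  # [('2016-01-03', 'c'), ('2016-01-02', 'b')]

# The most recent row, as in the last query above.
latest = conn.execute("SELECT * FROM t ORDER BY tId DESC LIMIT 1").fetchone()
print(latest)  # (3, '2016-01-03', 'c')
```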
All of these are just examples, because your question doesn't give a complete picture of the data.

Need help in simple sql to add record

We use INSERT INTO to insert a record into the table, but running it again creates a second record. Is there any way to add a record, or replace the previous one, without adding a new record?
I know this would work:
UPDATE Customers
SET ContactName='Alfred Schmidt', City='Hamburg'
WHERE CustomerName='Alfreds Futterkiste';
But what if there is no condition, i.e. we don't know the record, only the column name? Is there any way to fill only one record, replacing the previous one each time, without creating a second record?

OK... updating if a record exists or creating a record if there are zero records is a pretty simple matter and you have a solution for it. That having been said, I would do something different and keep track of my message of the day by date:
-- This is REALLY BASIC, but, just to give you the idea...
CREATE TABLE [dbo].[MessageOfTheDay](
[MessageDate] [date] not null,
[MessageContents] [nvarchar](500) not null,
UNIQUE (MessageDate)
)
declare @MessageContents nvarchar(500), @MessageDate date
set @MessageContents = 'This is the new MOTD!!!'
set @MessageDate = GETDATE()
-- Every day, create a new record and you can keep track of previous MOTD entries...
insert into MessageOfTheDay(MessageDate, MessageContents)
values (@MessageDate, @MessageContents)
-- Get the message for today
select MessageContents from MessageOfTheDay where MessageDate = @MessageDate
-- If you want, you can now create messages for FUTURE days as well:
set @MessageContents = 'This is tomorrow''s MOTD!!!';
set @MessageDate = dateadd(D, 1,GETDATE())
insert into MessageOfTheDay(MessageDate, MessageContents)
values (@MessageDate, @MessageContents)
-- Get tomorrow's message
select MessageContents from MessageOfTheDay where MessageDate = @MessageDate
-- If you aren't necessarily going to have one per day and want to always just show the most recent entry
-- (SQL Server uses TOP rather than LIMIT; the WHERE skips messages scheduled for future days)
select top 1 MessageContents from MessageOfTheDay where MessageDate <= cast(GETDATE() as date) order by MessageDate desc
Anyway, that's just my $.02. At some point I bet you will want to look over the history of your MOTD and when you do, you will be happy that you have that history. Plus, this more accurately models the data you are trying to represent.

I got my answer and it's working now!
I used:
INSERT INTO data (a, b, c)
VALUES
('1','2','3')
ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b)
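ON DUPLICATE KEY UPDATE only fires when the insert collides with a PRIMARY KEY or UNIQUE index, so the single-record behaviour depends on such a key existing. One way to make "exactly one row" explicit is a fixed key plus a CHECK constraint. The sketch below shows that idea with Python's sqlite3, where SQLite's INSERT ... ON CONFLICT ... DO UPDATE (SQLite 3.24 and later) plays the role of MySQL's ON DUPLICATE KEY UPDATE; the table current_message is hypothetical. MySQL only enforces CHECK constraints from 8.0.16.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK pins the key to 1, so the table can never hold a second row.
conn.execute("""
    CREATE TABLE current_message (
        id   INTEGER PRIMARY KEY CHECK (id = 1),
        body TEXT NOT NULL
    )""")

def set_message(body):
    # Insert the row the first time, overwrite it on every later call.
    conn.execute(
        "INSERT INTO current_message (id, body) VALUES (1, ?) "
        "ON CONFLICT(id) DO UPDATE SET body = excluded.body",
        (body,),
    )

set_message("first")
set_message("second")
print(conn.execute("SELECT id, body FROM current_message").fetchall())  # [(1, 'second')]
```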

Statistic : Compare previous value with actual value, database structure

I'd like to be able to tell whether the current value is higher than the previous week's (we go back 7 calendar days), so I can show the information and its evolution:
In this case I am talking about the status of my work item (post). This is my table (see status and status_last_update):
So to implement this new feature, I am thinking about three solutions. I'd like to know which one would be better and why, or whether there is a better solution.
First solution: I add 2 more columns to my current table, "previous_status" and "previous_status_last_update".
Second solution: I create a new table that stores the previous status and the date of its last update.
Third solution: I create a table storing the current value and the previous value:
nb_new
nb_new_last_update
nb_under_discussion
nb_under_discussion_last_update
nb_liked
nb_liked_last_update
nb_disliked
nb_disliked_last_update
nb_approved
nb_approved_last_update
nb_rejected
nb_rejected_last_update
nb_new_previous
nb_new_last_update_previous
nb_under_discussion_previous
nb_under_discussion_last_update_previous
nb_liked_previous
nb_liked_last_update_previous
nb_disliked_previous
nb_disliked_last_update_previous
nb_approved_previous
nb_approved_last_update_previous
nb_rejected_previous
nb_rejected_last_update_previous

Why not just insert a little logic into your SQL statement that pulls the data from last week and this week into a single row like this:
select
sub.someOtherColumn,
-- max() keeps the one non-null value per group, giving one row per someOtherColumn
max(sub.thisWeekStatus) as thisWeekStatus,
max(sub.lastWeekStatus) as lastWeekStatus
from
(
select
case
when dateField=curdate() then status
end as thisWeekStatus,
case
when dateField<>curdate() then status
end as lastWeekStatus,
someOtherColumn
from
yourTable
where
dateField>'dateLastWeek'
) sub
group by
sub.someOtherColumn
Obviously you need to tinker with it a little; I haven't bothered with the date functions to pull just the last week (or whatever) of data, but this can be adjusted to meet your exact specifications.
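The same conditional-aggregation idea, runnable with Python's sqlite3 as a stand-in for MySQL. It assumes a hypothetical status_history table holding one status per post per day (roughly the second solution in the question) and takes "today" as a parameter; SQLite's date(:today, '-7 days') stands in for MySQL's curdate() - interval 7 day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_history (post_id INTEGER, day TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO status_history VALUES (?, ?, ?)",
    [
        (1, "2016-01-01", "new"), (1, "2016-01-08", "approved"),
        (2, "2016-01-01", "liked"), (2, "2016-01-08", "liked"),
    ],
)

# One row per post: today's status next to the status exactly 7 days earlier.
SQL = """
SELECT post_id,
       MAX(CASE WHEN day = :today THEN status END)                  AS this_week,
       MAX(CASE WHEN day = date(:today, '-7 days') THEN status END) AS last_week
  FROM status_history
 WHERE day >= date(:today, '-7 days')
 GROUP BY post_id
 ORDER BY post_id
"""
rows = conn.execute(SQL, {"today": "2016-01-08"}).fetchall()
print(rows)  # [(1, 'approved', 'new'), (2, 'liked', 'liked')]
```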

Schedule Searching in PHP/MySQL with templates and overrides

I'm looking for some advice/help on quite a complex search algorithm. Any articles to relevant techniques etc. would be much appreciated.
Background
I'm building an application, which, in a nutshell, allows users to set their "availability" for any given day. The User first sets a general availability template which allows them to say:
Monday - AM
Tuesday - PM
Wednesday - All Day
Thursday - None
Friday - All Day
So this User is generally available Monday AM, Tuesday PM etc.
Schema:
id
user_id
day_of_week (1-7)(Monday to Sunday)
availability
They can then override specific dates manually, for example:
2013-03-03 - am
2013-03-04 - pm
2013-03-05 - all_day
Schema:
id
user_id
date
availability
This all works well - I have a Calendar being generated which combines the template and overrides and allows Users to modify their availability etc.
The Problem
I now need to allow Admin Users to search for Users who have specific availability. So the Admin User would use a calendar to select the required dates and availabilities and hit search.
For example, find me Users who are available:
2013-03-03 - pm
2013-03-04 - pm
2013-03-05 - pm
The search process would have to search for available Users using the Templated Availability and Overrides, then return the best results. Ideally, it would return Users who are available all of the time but in the case that no single user can match the dates, I need to provide a combination of Users who can.
I know this is quite a complex problem and I'm not looking for a complete answer, perhaps just some guidance or links to potentially relevant techniques etc.
What I've tried
At the moment, I have a halfway solution. I'm grabbing all the available Users, looping through each of them, and within that loop, looping through all of the required dates and breaking as soon as a User doesn't meet a required date. This is obviously very un-scalable and it's also only returning "perfect matches".
Possible Solutions
Full Text Searching with Aggregate Table
I thought about creating a separate table which had the following schema:
user_id
body
The body field would be populated with the User's template days and overrides, so an example record might look like:
user_id: 2
body: monday_am tuesday_pm wednesday_pm thursday_am friday_allday 2013-03-03_all_day 2013-03-03_pm
I would then convert a User's search query into a similar format. So if a User was looking for someone who was available on the 19th March 2013 - All Day and 20th March 2013 - PM, I'd convert that into a string.
Firstly, as 19th March is a Tuesday, I'd convert that into tuesday_allday and same with the 20th. I'd therefore end up with:
tuesday_allday wednesday_pm 2013-03-19_allday 2013-03-20_pm
I'd then do a full text search against our aggregate table and return a "weighted" result set which I can then loop through and further interrogate.
I'm not sure how this would work in practice, so that's why I'm asking if anyone has any links to techniques or relevant articles I could use.

I am confident this problem can be solved with a more well defined DB schema.
By utilizing a more detailed DB schema you will be able to find any available user for any given time frame (not just am & pm) if you should so choose.
It will also allow you to keep template data, while not polluting your availability data with template information (instead you would select from the template table to programmatically fill in the availability for a given date, which then can be modified by the user).
I spent some time diagramming this problem and came up with a schema structure that I believe solves the problem you specified and allows you to grow your application with a minimum of schema changes.
(To make this easier to read I've added the SQL at the end of this proposed answer)
I have also included an example select statement that would allow you to pull availability data with any number of arguments.
For clarity, that SELECT comes before the SQL for the schema at the end of my explanatory text.
Please don't be intimidated by the SELECT; it may look complicated at first glance, but it is really a map to the entire schema (save the Templates table).
(btw, I'm not saying that because I have any doubt that you can understand it, I'm sure you can, but I've known many programmers who ignore more complex DB structures to their own detriment because they LOOK overly complex, but when analyzed are actually less complex than the acrobatics they have to do in their program to get similar results... Relational DBs are based on a branch of mathematics that is good at accurately, consistently, & (relatively) succinctly associating data).
General Use:
(for more details read the comments in the SQL CREATE TABLE statements)
-Populate the DaysOfWeek table.
-Populate the TimeFrames table with some time frames you want to track (an AM timeframe might have a StartTime of 00:00:00 & an end time of 11:59:59 while PM might have StartTime of 12:00:00 & EndTime of 23:59:59)
-Add Users
-Add Dates to be tracked (see notes in SQL for thoughts on avoiding bloat & also the virtues of this table)
-Populate the Templates table for each user
-Generate the list of default Availabilities (with their associated AvailableTimes data) for each user
-Expose the default Availabilities to the users so they can override the defaults
NOTE: you can also add an optional table for Engagements to be the opposite of Availabilities (or maybe there is a better abstraction that would include both concepts...)
Disclaimer: I did not take the additional time to fully populate my local DB & verify everything so there may be some weaknesses/errors I did not see in my diagrams... (sorry I spent far longer than intended on this & must get work done on an overdue project).
While I have worked fairly extensively with DB structures, and with DBs others have created, for 12+ years, I'm sure I am not without fault; I hope others on StackOverflow will point out any mistakes I may have included.
I apologize for not including more example data.
If I have time in the near future I will provide some, (think adding George, Fred, & Harry to the users table, adding some dates to the Dates table then detailing how busy George & Fred are compared to Harry during their school week using the Availabilities, AvailableTimes & TimeFrames tables).
The SELECT statement (NOTE: I would highly recommend making this into a view... in that way you can select whatever columns you want & add whatever arguments/conditions you want in a WHERE clause without having to write the joins out every time... so the view would NOT include the WHERE clause... just to make that clear):
SELECT *
FROM Users Us
JOIN Availabilities Av
ON Us.User_ID=Av.User_ID
JOIN Dates Da
ON Av.Date_ID=Da.Date_ID
JOIN AvailableTimes Avt
ON Av.Av_ID=Avt.Av_ID
WHERE Da.Date='2014-01-03' -- whatever date
-- alternately: WHERE Da.DayOWeek_ID=3 -- which would be Tuesday (with Sunday = 1)
-- WHERE Da.Date BETWEEN '2014-01-01' AND '2014-01-07' -- whatever date range...
-- etc...
Recommended data in DaysOfWeek (which is effectively a lookup table):
INSERT INTO DaysOfWeek(DayOWeek_ID,Name,Description)
VALUES (1,'Sunday', 'First Day of the Week'),(2,'Monday', 'Second Day of the Week')...(7,'Saturday', 'Last Day of the Week'),(8,'AllWeek','The entire week'),(9,'Weekdays', 'Monday through Friday'),(10,'Weekends','Saturday & Sunday')
Example Templates data:
INSERT INTO Templates(Time_ID,User_ID,DayOWeek_ID)
VALUES (1,1,9)-- this would show the first user is available for the first time frame every weekday as their default...
,(2,1,3) -- this would show the first user also available on Tuesdays for the second time frame
The following is the recommended schema structure:
CREATE TABLE `test`.`Users` (
User_ID INT NOT NULL AUTO_INCREMENT ,
UserName VARCHAR(45) NULL ,
PRIMARY KEY (User_ID) );
CREATE TABLE `test`.`Templates` (
`Template_ID` INT NOT NULL AUTO_INCREMENT ,
`Time_ID` INT NULL ,
`User_ID` INT NULL ,
`DayOWeek_ID` INT NULL ,
PRIMARY KEY (`Template_ID`) )
COMMENT = 'This table holds the template data for general expected availability of a user/agent/person (so the person would use this to set their general availability)';
CREATE TABLE `test`.`Availabilities` (
`Av_ID` INT NOT NULL AUTO_INCREMENT ,
`User_ID` INT NULL ,
`Date_ID` INT NULL ,
PRIMARY KEY (`Av_ID`) )
COMMENT = 'This table holds the actual availability of a user for a particular date.\nIf the user is not available for a date then this table has no entry for that user for that date.\n(btw, this suggests the possibility of an alternate table, called Engagements, that could utilize all other structures except the templates and would record when a user is actually busy... to use this table & the other table together you would always need to join to AvailableTimes, as a date would actually be in both tables but associated with different time frames).';
CREATE TABLE `test`.`Dates` (
`Date_ID` INT NOT NULL AUTO_INCREMENT ,
`DayOWeek_ID` INT NULL ,
`Date` DATE NULL ,
PRIMARY KEY (`Date_ID`) )
COMMENT = 'This table is utilized to hold actual dates with which users/agents can be associated.\nThe important thing to note here is: this may end up holding every day of every year... this suggests a need to archive this data (and everything associated with it) for performance reasons as this database is utilized.\nOne more important detail... this is more efficient than associating actual dates directly with each user/agent with an availability on that date... this way the date is only recorded once, while the other approach records this date with the user for each availability.';
CREATE TABLE `test`.`AvailableTimes` (
`AvTime_ID` INT NOT NULL AUTO_INCREMENT ,
`Av_ID` INT NULL ,
`Time_ID` INT NULL ,
PRIMARY KEY (`AvTime_ID`) )
COMMENT = 'This table records the time frames that a user is available on a particular date.\nThis allows the time frames to be flexible without affecting the structure of the DB.\n(e.g. if you only keep track of AM & PM at the beginning of the use of the DB but later decide to keep track on an hourly basis you simply add the hourly time frames & start populating them, no changes to the DB schema need to be made)';
CREATE TABLE `test`.`TimeFrames` (
`Time_ID` INT NOT NULL AUTO_INCREMENT ,
`StartTime` TIME NOT NULL ,
`EndTime` TIME NOT NULL ,
`Name` VARCHAR(45) NOT NULL ,
`Desc` VARCHAR(128) NULL ,
PRIMARY KEY (`Time_ID`) ,
UNIQUE INDEX `Name_UNIQUE` (`Name` ASC) )
COMMENT = 'Utilize this table to record the times that are being tracked.\nThis allows the flexibility of having multiple time frames on the same day.\nIt also provides the flexibility to change the time frames being tracked without changing the DB structure.';
CREATE TABLE `test`.`DaysOfWeek` (
`DayOWeek_ID` INT NOT NULL AUTO_INCREMENT ,
`Name` VARCHAR(45) NOT NULL ,
`Description` VARCHAR(128) NULL ,
PRIMARY KEY (`DayOWeek_ID`) ,
UNIQUE INDEX `Name_UNIQUE` (`Name` ASC) )
COMMENT = 'This table is a lookup table to hold the days of the week.\nI personally would recommend adding a row for:\nWeekends, All Week, & WeekDays \nThis will often be used in conjunction with the templates and will allow less entries in that table to be made with those 3 entries in this table.';
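To address the original question (users available on every requested date), the SELECT above can be grouped per user and the matched dates counted. The sketch below is a cut-down version of the schema in Python's sqlite3 (only the columns the query needs; TimeFrames reduced to Time_ID and Name), with made-up sample data for George, Fred, and Harry. It assumes the Availabilities and AvailableTimes rows have already been generated from the templates, as in the General Use steps. A user whose count equals the number of requested dates is a perfect match; lower counts identify the partially matching users that could be combined to cover the rest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users          (User_ID INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE Dates          (Date_ID INTEGER PRIMARY KEY, DayOWeek_ID INTEGER, Date TEXT);
CREATE TABLE TimeFrames     (Time_ID INTEGER PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE Availabilities (Av_ID INTEGER PRIMARY KEY, User_ID INTEGER, Date_ID INTEGER);
CREATE TABLE AvailableTimes (AvTime_ID INTEGER PRIMARY KEY, Av_ID INTEGER, Time_ID INTEGER);

INSERT INTO Users VALUES (1, 'George'), (2, 'Fred'), (3, 'Harry');
INSERT INTO TimeFrames VALUES (1, 'AM'), (2, 'PM');
-- Sunday = 1, as in the DaysOfWeek data above
INSERT INTO Dates VALUES (1, 1, '2013-03-03'), (2, 2, '2013-03-04'), (3, 3, '2013-03-05');
-- George: PM on all three dates; Fred: PM on two; Harry: AM on one
INSERT INTO Availabilities VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1), (5, 2, 3), (6, 3, 1);
INSERT INTO AvailableTimes VALUES (1, 1, 2), (2, 2, 2), (3, 3, 2), (4, 4, 2), (5, 5, 2), (6, 6, 1);
""")

def rank_users(dates, timeframe):
    """Users with `timeframe` on any of `dates`, ordered by how many dates they cover."""
    marks = ",".join("?" * len(dates))  # only placeholders go into the f-string
    sql = f"""
        SELECT Us.UserName, COUNT(DISTINCT Da.Date) AS matched
          FROM Users Us
          JOIN Availabilities Av  ON Us.User_ID = Av.User_ID
          JOIN Dates Da           ON Av.Date_ID = Da.Date_ID
          JOIN AvailableTimes Avt ON Av.Av_ID = Avt.Av_ID
          JOIN TimeFrames Tf      ON Avt.Time_ID = Tf.Time_ID
         WHERE Da.Date IN ({marks}) AND Tf.Name = ?
         GROUP BY Us.User_ID, Us.UserName
         ORDER BY matched DESC, Us.UserName"""
    return conn.execute(sql, (*dates, timeframe)).fetchall()

wanted = ["2013-03-03", "2013-03-04", "2013-03-05"]
ranking = rank_users(wanted, "PM")
full = [name for name, n in ranking if n == len(wanted)]
print(ranking)  # [('George', 3), ('Fred', 2)]
print(full)     # ['George']
```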

OK, this is what I would do:
In the users table create fields for Sunday, Monday ... Saturday.
Use pm, am or both for values in those fields.
You should also index each field in the db for faster querying.
Then make a separate table for user/date/meridian fields (meridian means am or pm). Again the meridian field values would be pm, am or both.
You will need to do a little research with php's date function to pull out the day of the week number and use a switch statement against it perhaps.
Use the requested dates and pull out the day of the week and query the user table for their day of the week availability.
Then use the requested date/meridian itself and query the new user/date/meridian table for the users' individual availability dates/meridians.
I don't think there is going to be much of an algorithm here except when extracting the days of the week from the date requests. If you are doing a date range then you could benefit from an algorithm, but if it is just a bunch of cherry-picked dates then you are just going to have to do them one by one. Let me know and maybe I'll throw an algo your way.
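The lookup described above (date override first, otherwise the weekday default) could look like this in code. It is a hedged Python sketch: plain dicts stand in for the users-table weekday columns and the user/date/meridian table, and all names are hypothetical. date.isoweekday() returns 1 (Monday) to 7 (Sunday), the same numbering as PHP's date('N'). Like the loop in the question, it stops checking a user at the first date they can't cover.

```python
from datetime import date

# Hypothetical stand-ins for the two tables in the answer.
weekly = {  # users table: ISO weekday (1 = Monday .. 7 = Sunday) -> 'am' / 'pm' / 'both'
    "bill": {1: "am", 2: "pm", 3: "both", 5: "both"},
    "john": {1: "both", 2: "both", 3: "both"},
}
overrides = {  # user/date/meridian table; None marks "not available that day"
    ("bill", date(2013, 3, 4)): "both",
    ("john", date(2013, 3, 5)): None,
}

def covers(have, want):
    """'both' satisfies any request; otherwise the values must match."""
    return have == "both" or have == want

def available(user, day, want):
    if (user, day) in overrides:        # a date-specific entry wins
        have = overrides[(user, day)]
    else:                               # otherwise fall back to the weekday default
        have = weekly[user].get(day.isoweekday())
    return have is not None and covers(have, want)

def matching_users(requests):
    """Users who can cover every (date, meridian) request; all() stops at the first miss."""
    return [u for u in weekly if all(available(u, d, m) for d, m in requests)]

requests = [(date(2013, 3, 4), "pm"), (date(2013, 3, 5), "pm"), (date(2013, 3, 6), "pm")]
print(matching_users(requests))  # ['bill']
```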

CREATE VIEW for MYSQL for last 30 days

I know I am writing my queries wrong: when we get a lot of traffic, our database gets hit HARD and the page grinds to a halt...
I think I need to write queries based on a CREATE VIEW covering the last 30 days from CURDATE()?? But I'm not sure where to begin, or whether this would be a MORE efficient query for the database.
Anyways, here is a sample query I have written..
$query_Recordset6 = "SELECT `date`, title, category, url, comments
FROM cute_news
WHERE category LIKE '%45%'
ORDER BY `date` DESC";
Any help or suggestions would be great! I have about 11 queries like this, but I am confident if I could get help on one of these, then I can implement them to the rest!!

Putting a wildcard on the left side of a value comparison:
LIKE '%xyz'
...means that an index cannot be used, even if one exists. You might want to consider using Full Text Searching (FTS), which means adding full text indexing.
Normalizing the data would be another step to consider - categories should likely be in a separate table.

SELECT `date`, title, category, url, comments
FROM cute_news
WHERE category LIKE '%45%'
ORDER BY `date` DESC
The LIKE '%45%' means a full table scan will need to be performed. Are you perhaps storing a list of categories in the column? If so creating a new table storing category and news_article_id will allow an index to be used to retrieve the matching records much more efficiently.

OK, time for psychic debugging.
In my mind's eye, I see that query performance would be improved considerably through database normalization, specifically by splitting the multi-valued category column into a separate table that has two columns: the primary key for cute_news and the category ID.
This would also allow you to directly link said table to the categories table without having to parse it first.
Or, as Chris Date said: "Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else)."

Anything with LIKE '%XXX%' is going to be slow. It's a slow operation.
For something like categories, you might want to separate categories out into another table and use a foreign key in the cute_news table. That way you can have category_id, and use that in the query which will be MUCH faster.
Also, I'm not quite sure why you're talking about using CREATE VIEW. Views will not really help you for speed. Not unless it's a materialized view, which MySQL doesn't support natively.

If your database is getting hit hard, the solution isn't to make a view (the view is still basically the same amount of work for the database to do), the solution is to cache the results.
This is especially applicable since, from what it sounds like, your data only needs to be refreshed once every 30 days.
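A minimal sketch of result caching with an expiry time, in Python. In a PHP deployment this role is usually played by an external cache such as APCu or memcached; the TTLCache class and expensive_query function here are hypothetical and only illustrate the pattern (the cache is per process and never evicts old keys). The injectable clock lets the expiry be demonstrated without waiting.

```python
import time

class TTLCache:
    """Keeps each computed result for `ttl` seconds (per process, no eviction of old keys)."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                      # fresh: skip the database entirely
        value = compute()                      # missing or expired: run the real query
        self._store[key] = (now + self.ttl, value)
        return value

calls = []

def expensive_query():
    calls.append(1)                            # stands in for the slow SELECT ... LIKE '%45%'
    return ["row1", "row2"]

fake_now = [0.0]
cache = TTLCache(ttl=300, clock=lambda: fake_now[0])
cache.get_or_compute("category45", expensive_query)
cache.get_or_compute("category45", expensive_query)  # served from the cache
fake_now[0] = 301.0
cache.get_or_compute("category45", expensive_query)  # expired, so the query runs again
print(len(calls))  # 2
```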

I'd guess that your category column is a list of category values like "12,34,45,78"?
This is not good relational database design. One reason it's not good is as you've discovered: it's incredibly slow to search for a substring that might appear in the middle of that list.
Some people have suggested using fulltext search instead of the LIKE predicate with wildcards, but in this case it's simpler to create another table so you can list one category value per row, with a reference back to your cute_news table:
CREATE TABLE cute_news_category (
news_id INT NOT NULL,
category INT NOT NULL,
PRIMARY KEY (news_id, category),
FOREIGN KEY (news_id) REFERENCES cute_news(news_id)
) ENGINE=InnoDB;
Then you can query and it'll go a lot faster:
SELECT n.`date`, n.title, c.category, n.url, n.comments
FROM cute_news n
JOIN cute_news_category c ON (n.news_id = c.news_id)
WHERE c.category = 45
ORDER BY n.`date` DESC
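The LIKE '%45%' filter is not only slow but also imprecise on a comma-separated list: it matches 145 and 450 as well. The sketch below, using Python's sqlite3 as a stand-in for MySQL, splits the list into the cute_news_category table from the answer and compares both queries on made-up rows. The extra index on category is an assumption not in the answer: the (news_id, category) primary key cannot serve WHERE c.category = 45 on its own, since category is not its leading column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cute_news (news_id INTEGER PRIMARY KEY, title TEXT, category TEXT);
CREATE TABLE cute_news_category (
    news_id  INTEGER NOT NULL REFERENCES cute_news (news_id),
    category INTEGER NOT NULL,
    PRIMARY KEY (news_id, category)
);
-- assumed extra index so WHERE category = ? can seek instead of scanning
CREATE INDEX cute_news_category_by_cat ON cute_news_category (category, news_id);
INSERT INTO cute_news VALUES (1, 'a', '12,45'), (2, 'b', '145,7'), (3, 'c', '450');
""")

# One-off migration: split each comma-separated list into one row per category.
for news_id, cats in conn.execute("SELECT news_id, category FROM cute_news").fetchall():
    conn.executemany(
        "INSERT INTO cute_news_category VALUES (?, ?)",
        [(news_id, int(c)) for c in cats.split(",")],
    )

like_hits = [r[0] for r in conn.execute(
    "SELECT news_id FROM cute_news WHERE category LIKE '%45%' ORDER BY news_id")]
join_hits = [r[0] for r in conn.execute(
    "SELECT n.news_id FROM cute_news n"
    " JOIN cute_news_category c ON n.news_id = c.news_id"
    " WHERE c.category = 45 ORDER BY n.news_id")]
print(like_hits)  # [1, 2, 3]: 145 and 450 match the substring too
print(join_hits)  # [1]
```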

Any answer is a guess; please show:
- the relevant SHOW CREATE TABLE outputs
- the EXPLAIN output from your common queries.
And Bill Karwin's comment certainly applies.
After all this & optimizing, sampling the data into a table with only the last 30 days could still be desired, in which case you're better off running a daily cronjob to do just that.
