Statistic : Compare previous value with actual value, database structure

I'd like to be able to tell whether the current value is higher than it was the previous week (we take -7 calendar days), so that I can show the information and its evolution:
In this case I am talking about the status of my work item (post). This is my table (see status and status_last_update):
To implement this new feature, I am considering three solutions. I'd like to know which one would be better and why, or whether there is a better solution.
First solution: I add two more columns to my current table, "previous_status" and "previous_status_last_update".
Second solution: I create a new table that stores the previous status and the date of its last update.
Third solution: I create a table storing the current value and the previous value:
nb_new
nb_new_last_update
nb_under_discussion
nb_under_discussion_last_update
nb_liked
nb_liked_last_update
nb_disliked
nb_disliked_last_update
nb_approved
nb_approved_last_update
nb_rejected
nb_rejected_last_update
nb_new_previous
nb_new_last_update_previous
nb_under_discussion_previous
nb_under_discussion_last_update_previous
nb_liked_previous
nb_liked_last_update_previous
nb_disliked_previous
nb_disliked_last_update_previous
nb_approved_previous
nb_approved_last_update_previous
nb_rejected_previous
nb_rejected_last_update_previous

Why not just insert a little logic into your SQL statement that pulls the data from last week and this week into a single row like this:
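select
    sub.someOtherColumn,
    max(sub.thisWeekStatus) as thisWeekStatus,
    max(sub.lastWeekStatus) as lastWeekStatus
from
(
    select
        -- status recorded today
        case
            when dateField = curdate() then status
        end as thisWeekStatus,
        -- status recorded on an earlier day inside the window
        case
            when dateField <> curdate() then status
        end as lastWeekStatus,
        someOtherColumn
    from
        yourTable
    where
        dateField > 'dateLastWeek' -- placeholder for the start of the window
) sub
group by
    sub.someOtherColumn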
Obviously you need to tinker with it a little. I haven't bothered with the date functions that pull just the last week (or whatever period) of data, but this can be adjusted to meet your exact specifications.
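As a rough sketch of the idea above, the following Python script uses the standard sqlite3 module in place of MySQL. The status_history table, its columns, and the compare_week helper are hypothetical names invented for the illustration; they assume one status row is stored per item per day. The query pivots the status on a given day and the status seven days earlier into a single row per item.

```python
import sqlite3

def compare_week(conn, today, last_week):
    """Return (item_id, status_today, status_last_week), one row per item."""
    sql = """
        SELECT item_id,
               MAX(CASE WHEN recorded_on = :today THEN status END) AS this_week,
               MAX(CASE WHEN recorded_on = :last_week THEN status END) AS last_week
        FROM status_history
        WHERE recorded_on IN (:today, :last_week)
        GROUP BY item_id
        ORDER BY item_id
    """
    return conn.execute(sql, {"today": today, "last_week": last_week}).fetchall()

# Hypothetical sample data: item 1 changed status, item 2 did not exist last week.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_history (item_id INTEGER, recorded_on TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO status_history VALUES (?, ?, ?)",
    [
        (1, "2024-03-01", "new"),
        (1, "2024-03-08", "approved"),
        (2, "2024-03-08", "new"),
    ],
)
rows = compare_week(conn, "2024-03-08", "2024-03-01")
```

An item with no row on one of the two days comes back with NULL (None in Python) in that column, which the display code can treat as "no previous value".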

Related

How to filter out certain rows in MySQL dynamically to query against them?

I have a PHP/MySQL setup. I have a table, devicevalue, structured like this:
devId | vals | date | time
xysz | 23 | 2020.02.17 | 22.06
abcs | 44 | 2020.02.31 | 22.07
The vals column holds temperature values.
Any user logging in to my web app has access to only certain devices.
Here are the steps:
On my website, a user selects the from and to dates for which he wants to see data and submits them.
These dates are then passed to a page, "getrecords.php", which runs a lot of select queries (many of them in loops) to fetch the filtered data in the required format.
The problem is that this table holds almost 2-3 million records, and in every WHERE clause I have to add the to and from conditions, which causes a search of the entire table.
My question: is there any way to build a temporary table at step 1 that contains only the rows between the two given dates, so that all my queries on the other page run against that temporary table?

Edit: If your date column is a text string, you must convert it to a column of type DATE or TIMESTAMP, or you will never get good performance from this table. A vast amount of optimization code is in the MySQL server to make handling of time/date data types efficient. If you store dates or times as strings, you defeat all that optimization code.
Then, put an index on your date column like this.
CREATE INDEX date_from_to ON devicevalue (`date`, devId, vals, `time` );
It's called a covering index because the entire query can be satisfied using it only.
Then, in your queries use
WHERE date >= <<<fromdate>>>
AND date < <<<todate>>> + INTERVAL 1 DAY
Doing this indexing correctly gets rid of the need to create temp tables.
If your query has something like `WHERE devId = <<>>` in it, you need this index instead (or in addition).
CREATE INDEX date_id_from_to ON devicevalue (devId, `date`, vals, `time` );
If you get a chance to change this table's layout, combine the date and time columns into a single column with TIMESTAMP data type. The WHERE clauses I showed you above will still work correctly if you do that. And everything will be just as fast.
SQL is made to solve your kind of problem simply and fast. With good data type choices and proper indexing, a few million records is a modestly sized table.
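A minimal sketch of the indexed range query, assuming the date column has already been converted to a real date value. It runs against SQLite through Python's sqlite3 module rather than MySQL, so dates are ISO text and SQLite's date(?, '+1 day') plays the role of "+ INTERVAL 1 DAY". The readings helper and the sample rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE devicevalue (devId TEXT, vals INTEGER, date TEXT, time TEXT)"
)
# Device first, then date, so one device's date range is a contiguous index slice.
conn.execute(
    "CREATE INDEX date_id_from_to ON devicevalue (devId, date, vals, time)"
)
conn.executemany(
    "INSERT INTO devicevalue VALUES (?, ?, ?, ?)",
    [
        ("xysz", 23, "2020-02-17", "22:06"),
        ("xysz", 25, "2020-02-18", "09:00"),
        ("xysz", 27, "2020-02-19", "09:00"),
        ("abcs", 44, "2020-02-18", "22:07"),
    ],
)

def readings(conn, dev_id, from_date, to_date):
    """Rows for one device with from_date <= date < to_date + 1 day."""
    sql = """
        SELECT date, time, vals
        FROM devicevalue
        WHERE devId = ?
          AND date >= ?
          AND date < date(?, '+1 day')
        ORDER BY date, time
    """
    return conn.execute(sql, (dev_id, from_date, to_date)).fetchall()

result = readings(conn, "xysz", "2020-02-17", "2020-02-18")
```

The half-open upper bound (strictly less than the day after the end date) keeps every reading on the end date itself, and it keeps working unchanged if date and time are later merged into one timestamp column.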

Short answer: No. Don't design temp tables that need to live between sessions.
Longer answer:
Build into your app that the date range will be passed from one page to the next, then use those as initial values in the <form> <input type=text...>
Then make sure you have a good composite index for the likely queries. But, to do that, you must get a feel for what might be requested. You will probably need a small number of multi-column indexes.
You can probably build a SELECT from the form entries. I rarely need to use more than one query, but it is mostly "constructed" on the fly based on the form.
It is rarely a good idea to have separate columns for date and time. It makes it very difficult, for example, to say noon one day to noon the next day. Combine into a DATETIME or TIMESTAMP.
O.Jones has said a lot of things that I would normally add here.

how to make history of sql data? (report data changes)

Every day, I save bug information into a database (with crontab and a PHP script). Every row is like:
(Bugidentification, Date, Title, Who, etc....)
(e.g:
Bugidentification, Date, Title, Who, etc....
issue1, 2015-04-01, blabla, bill, etc...
issue2, 2015-04-01, nnnnnnn, john, etc...
issue3, 2015-04-01, vvvvvvv, greg, etc...
issue1, 2015-04-02, blabla, bill, etc...
issue2, 2015-04-02, nnnnnnn, john, etc...
issue3, 2015-04-02, vvvvvvv, mario, etc... (here it is now mario)
issue2, 2015-04-03, nnnnnnn, john, etc... (issue1 disappeared)
issue3, 2015-04-03, vvvvvvv, tod, etc... (tod is new info)
issue4, 2015-04-03, rrrrrrrr, john, etc... (issue4 is new)
.............................................
)
Basically, taking the example above, a comparison between April 2nd and April 3rd should give something like:
New row is : issue4
Closed row is : issue1
Updated row is : issue3 (with tod instead of mario)
No change row is : issue2
In my case there are hundreds of rows. I believe I know how to do it in PHP, but my code would be long, with foreach loops checking the rows one by one for changes, and I am not sure that is a straightforward solution.
So my question is: is there a simple way to report those changes with "simple" code (such as a special SQL query, an existing project, or simple PHP functions)?

There are way too many assumptions built into this design. And those assumptions require you to compare rows between different days to make the assumption in the first place -- not to mention you have to duplicate unchanged rows from one day to the next in order to maintain the unbroken daily entry needed to feed the assumptions. Whew.
Rule 1: don't build assumptions into the design. If something is new, it should be marked, "HEY! I'm new here!" When a change has been made to the data, "OK, something changed. Here it is." and when the issue has finally been closed, "OK, that's it for me. I'm done for."
create table Bug_Static( -- Contains one entry for each bug
ID int identity,
Opened date not null default sysdate,
Closed date [null | not null default date '9999-12-31'],
Title varchar(...),
Who id references Who_Table,
<other non-changing data>,
constraint PK_Bug_Static primary key( ID )
);
create table Bug_Versions( -- Contains changing data, like status
ID int not null,
Effective date not null,
Status varchar not null, -- new,assigned,in work,closed,whatever
<other data that may change from day to day>,
constraint PK_Bug_Versions primary key( ID, Effective ),
constraint FK_Bug_Versions_Static foreign key( ID )
references Bug_Static( ID )
);
Now you can select the bugs and the current data (the last change made) on any given day.
select s.ID, s.Opened, s.Title, v.Effective, v.Status
from Bug_Static s
join Bug_Versions v
on v.ID = s.ID
and v.Effective =(
select Max( Effective )
from Bug_Versions
where ID = v.ID
and Effective <= sysdate )
where s.Closed >= sysdate;
The where s.Closed >= sysdate clause is optional. It keeps the bugs that are still open or that were closed on the date the query is executed, and drops the ones closed before then. That keeps the closed bugs from reappearing over and over again -- unless that's what you want.
Change the sysdate values to a particular date/time and you will get the data as it appeared as of that date and time.
Normally, when a bug is created, a row is entered into both tables. Then only new versions are entered as the status or any other data changes. If nothing changed on a day, nothing is entered. Then when the bug is finally closed, the Closed field of the static table is updated and a closed version is inserted into the version table. I've shown the Closed field with two options, null or with the defined "maximum date" of Dec 31, 9999. You can use either one but I like the max date method. It simplifies the queries.
I would also front both tables with a couple of views which join the tables: one which shows only the last version of each bug (Bug_Current) and one which shows every version of every bug (Bug_History). With triggers on Bug_Current, it can be the one used by the app to change the bugs. A trigger would, for instance, turn an update of any versioned field into an insert of a new version.
The point is, this is a very flexible design with which you can easily show just the data you want, how you want it, as of any time you want.
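The two-table design can be tried out with a small sketch like the one below, which uses Python's sqlite3 module instead of the SQL dialect above. Several things are assumptions made for the example: dates are ISO text, Who is placed in the version table because it changes in the question's data, and the bugs_as_of helper and sample rows are invented. The filter keeps bugs that are still open or were closed on the requested date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bug_Static (
        ID     INTEGER PRIMARY KEY,
        Opened TEXT NOT NULL,
        Closed TEXT NOT NULL DEFAULT '9999-12-31',
        Title  TEXT
    );
    CREATE TABLE Bug_Versions (
        ID        INTEGER NOT NULL REFERENCES Bug_Static(ID),
        Effective TEXT NOT NULL,
        Status    TEXT NOT NULL,
        Who       TEXT,
        PRIMARY KEY (ID, Effective)
    );
    INSERT INTO Bug_Static (ID, Opened, Closed, Title) VALUES
        (1, '2015-04-01', '2015-04-03', 'blabla'),
        (3, '2015-04-01', '9999-12-31', 'vvvvvvv');
    INSERT INTO Bug_Versions VALUES
        (1, '2015-04-01', 'new', 'bill'),
        (1, '2015-04-03', 'closed', 'bill'),
        (3, '2015-04-01', 'new', 'greg'),
        (3, '2015-04-02', 'assigned', 'mario'),
        (3, '2015-04-03', 'assigned', 'tod');
""")

def bugs_as_of(conn, as_of):
    """Latest version (on or before as_of) of each bug not closed before as_of."""
    sql = """
        SELECT s.ID, v.Effective, v.Status, v.Who
        FROM Bug_Static s
        JOIN Bug_Versions v
          ON v.ID = s.ID
         AND v.Effective = (SELECT MAX(v2.Effective)
                            FROM Bug_Versions v2
                            WHERE v2.ID = v.ID
                              AND v2.Effective <= :as_of)
        WHERE s.Closed >= :as_of
        ORDER BY s.ID
    """
    return conn.execute(sql, {"as_of": as_of}).fetchall()
```

Calling bugs_as_of(conn, '2015-04-02') and bugs_as_of(conn, '2015-04-03') and comparing the two results gives the day-to-day report the question asks for, without storing a copy of every unchanged row each day.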

Long polling with PHP and jQuery - issue with update and delete

I wrote a small script which uses the concept of long polling.
It works as follows:
jQuery sends the request with some parameters (say lastId) to PHP.
PHP gets the latest id from the database and compares it with lastId.
If lastId is smaller than the newly fetched id, it ends the script and echoes the new records.
From jQuery, I display this output.
I have taken care of all security checks. The problem is that when a record is deleted or updated, there is no way to know.
The nearest solution I can think of is to count the number of rows and compare it with a saved row count. But then, if I have 1000 records, I have to echo out all 1000 records, which can be a big performance issue.
The CRUD functionality of this application is completely separate and runs on a different server, so I don't get to know which record was deleted.
I don't need any help with the coding, but I am looking for suggestions to make this work for updates and deletes.
Please note, WebSockets (my favorite) and Node.js are not an option for me.

Instead of using a certain ID from your table, you could also check when the table itself was modified the last time.
SQL:
SELECT UPDATE_TIME
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'yourdb'
AND TABLE_NAME = 'yourtable';
If successful, the statement should return something like
UPDATE_TIME
2014-04-02 11:12:15
Then use the resulting timestamp instead of the lastid. I am using a very similar technique to display and auto-refresh logs, works like a charm.
You have to adjust the statement to your needs, and replace yourdb and yourtable with the values needed for your application. It also requires you to have access to information_schema.tables, so check if this is available, too.
Two alternative solutions:
If the solution described above is too imprecise for your purpose (it might lead to issues when the table is changed multiple times per second), you might combine that timestamp with your current mechanism with lastid to cover new inserts.
Another way would be to implement a table in which the current state is logged. This is the table your Ajax requests check. Then create triggers on your data tables that update this table.
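The trigger-based alternative might look roughly like the sketch below, written with Python's sqlite3 module (MySQL trigger syntax differs slightly). It is one possible variant, not the answer's exact design: instead of a single state row it keeps a hypothetical change_log table whose growing seq column serves as the polling cursor, which also tells the client which record was updated or deleted. All table, trigger, and function names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT);
    -- One row per change, with seq acting as the change cursor
    CREATE TABLE change_log (
        seq       INTEGER PRIMARY KEY AUTOINCREMENT,
        record_id INTEGER NOT NULL,
        action    TEXT NOT NULL
    );
    CREATE TRIGGER records_ai AFTER INSERT ON records BEGIN
        INSERT INTO change_log (record_id, action) VALUES (NEW.id, 'insert');
    END;
    CREATE TRIGGER records_au AFTER UPDATE ON records BEGIN
        INSERT INTO change_log (record_id, action) VALUES (NEW.id, 'update');
    END;
    CREATE TRIGGER records_ad AFTER DELETE ON records BEGIN
        INSERT INTO change_log (record_id, action) VALUES (OLD.id, 'delete');
    END;
""")

def changes_since(conn, last_seq):
    """Return (seq, record_id, action) for every change after last_seq."""
    return conn.execute(
        "SELECT seq, record_id, action FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()

conn.execute("INSERT INTO records (id, body) VALUES (1, 'first')")
conn.execute("INSERT INTO records (id, body) VALUES (2, 'second')")
cursor_pos = changes_since(conn, 0)[-1][0]  # the client has seen everything so far

conn.execute("UPDATE records SET body = 'edited' WHERE id = 1")
conn.execute("DELETE FROM records WHERE id = 2")
pending = changes_since(conn, cursor_pos)
```

Because the triggers run inside the database, this works even when the CRUD code lives on another server; the long-polling script only sends back the few changed ids rather than every row.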

You can get the highest ID by
SELECT id FROM table ORDER BY id DESC LIMIT 1
but this is not reliable in my opinion, because you can have IDs of 1, 2, 3, 7 and then insert a new row with ID 5.
Keep in mind: the highest ID, is not necessarily the most recent row.
The current auto increment value can be obtained by
SELECT AUTO_INCREMENT FROM information_schema.tables
WHERE TABLE_SCHEMA = 'yourdb'
AND TABLE_NAME = 'yourtable';
Maybe a timestamp + microtime is an option for you?

saving mySql row checkpoint in table?

I am having a wee problem, and I am sure there is a simpler way to solve it, but all my searches are throwing up blanks at the moment!
I have a MySQL database that is regularly updated by a PHP page (via a cron job), which adds or deletes entries as appropriate.
My issue is that I also need to check whether any details (e.g. the phone number) for an entry have changed, but doing this on every call is not possible: not only does it seem like overkill, but I am also restricted by a third-party API call limit. Plus, this is not critical info.
So I was thinking it might be best to check just one entry per page call, and iterate through the rows/entries with each successive page call.
What would be the best way of doing this, i.e. keeping track of which row in the table should be checked next?
I have 2 ideas of how to implement this:
1) The id of the current row could be saved to a file on the server (surely not the best way).
2) An extra boolean field [check] is added to the table, set to TRUE on the first entry and FALSE on all others.
Then on each page call it:
finds 'where check = TRUE'
runs the update check on this row,
'set check = FALSE'
'set [the next row] check = TRUE'
Is this the best way to do this, or does anyone have a better suggestion?
thanks in advance !
.k
PS sorry about the title

Not sure if this is a good solution, but if I have to make nightly massive updates, I'll write the updates to a new blank table, then do a SQL select to join the tables and tell me where they are different, then do another SQL UPDATE like
UPDATE table, temptable
SET table.col1=temptable.col1, table.col2=temptable.col2 ......
WHERE table.id = temptable.id;

You can store the timestamp at which a row is updated, either implicitly using ON UPDATE CURRENT_TIMESTAMP [http://dev.mysql.com/doc/refman/5.0/en/timestamp.html] or explicitly in your update SQL. Then all you need to do is select the row(s) with the lowest timestamp (using ORDER BY and LIMIT) and you have the next row to process. This works so long as you ensure that the timestamp is updated each time a row is polled.
e.g. Say you used the field last_polled_on TIMESTAMP to store the time you polled a row.
Your insert looks like:
INSERT INTO table (..., last_polled_on) VALUES (..., NOW());
Your update looks like:
UPDATE table SET ..., last_polled_on = NOW() WHERE ...;
And your select for the next row to poll looks like:
SELECT ... FROM table ORDER BY last_polled_on LIMIT 1;
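The three statements above can be exercised with a short sketch, shown here with Python's sqlite3 module rather than PHP and MySQL. The entries table, its sample rows, and the poll_next helper are hypothetical; the timestamp is passed in explicitly instead of calling NOW() so that the behavior is easy to follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "id INTEGER PRIMARY KEY, phone TEXT, last_polled_on TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [
        (1, "555-0100", "2024-01-01 00:00:03"),
        (2, "555-0101", "2024-01-01 00:00:01"),
        (3, "555-0102", "2024-01-01 00:00:02"),
    ],
)

def poll_next(conn, now):
    """Pick the least recently polled entry, stamp it with `now`, return its id."""
    row = conn.execute(
        "SELECT id FROM entries ORDER BY last_polled_on, id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # empty table, nothing to check
    # The third-party API check for this one entry would happen here.
    conn.execute(
        "UPDATE entries SET last_polled_on = ? WHERE id = ?", (now, row[0])
    )
    return row[0]

# Four successive page calls, one second apart.
order = [poll_next(conn, f"2024-01-02 00:00:0{i}") for i in range(4)]
```

Each call moves the polled row to the back of the queue, so the rows are visited round-robin and newly inserted or deleted entries need no special bookkeeping.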

Is there a better way to get old data?

Say you've got a database like this:
books
-----
id
name
And you wanted to get the total number of books in the database, easiest possible sql:
"select count(id) from books"
But now you want to get the total number of books last month...
Edit: but some of the books have been deleted from the table since last month.
Well, obviously you can't compute a total for a month that has already passed: the "books" table is always current, and some of the records have already been deleted.
My approach was to run a cron job (or scheduled task) at the end of the month and store the total in another table, called report_data, but this seems clunky. Any better ideas?

Add a column, call it "DateAdded", whose default value is GETDATE(). Then you can query between any two dates to find out how many books there were during that period, or specify just one date to find out how many books there were before that date (all the way back through history).
Per comment: You should not delete, you should soft delete.

I agree with JP, do a soft delete/logical delete. For the one extra AND statement per query it makes everything a lot easier. Plus, you never lose data.
Granted, if extreme size becomes an issue, then yeah, you'll potentially have to start physically moving/removing rows.

My approach was to run a cron job (or scheduled task) at the end of the month and store the total in another table, called report_data, but this seems clunky.
I have used this method to collect and store historical data. It was simpler than a soft-delete solution because:
The "report_data" table is very easy to generate reports/graphs from
You don't have to implement special soft-delete code for anything that needs to delete a book
You don't have to add "and active = 1" to the end of every query that selects from the books table
Because the code to do the historical reporting is isolated from everything else that uses books, this was actually the less clunky solution.
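A minimal sketch of the monthly snapshot job, assuming a report_data table with one row per month. It uses Python's sqlite3 module; the column names month and total_books and the snapshot_month helper are invented for the example, and the cron scheduling itself is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE report_data (
        month       TEXT PRIMARY KEY,
        total_books INTEGER NOT NULL
    );
    INSERT INTO books (name) VALUES ('A'), ('B'), ('C');
""")

def snapshot_month(conn, month):
    """Store the current book count under `month`, replacing any earlier run."""
    conn.execute(
        "INSERT OR REPLACE INTO report_data (month, total_books) "
        "SELECT ?, COUNT(id) FROM books",
        (month,),
    )

snapshot_month(conn, "2009-06")                      # end-of-June run
conn.execute("DELETE FROM books WHERE name = 'C'")   # a real, physical delete
snapshot_month(conn, "2009-07")                      # end-of-July run
history = conn.execute(
    "SELECT month, total_books FROM report_data ORDER BY month"
).fetchall()
```

Keying the table on the month makes the job safe to re-run: a second run in the same month overwrites the earlier figure instead of adding a duplicate row.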

If you needed data from the previous month then you should not have deleted the old data. Instead you can have a "logical delete."
I would add a status field and some dates to the table.
books
_____
id
bookname
date_added
date_deleted
status (active/deleted)
From there you would be able to query:
SELECT count(id) FROM books WHERE date_added <= '06/30/2009' AND status = 'active'
NOTE: It may not be the best schema, but you get the idea... ;)
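One way to make the logical delete answer "how many books were there on a given day" is sketched below with Python's sqlite3 module. It is a variation on the schema above, not the same thing: it assumes a NULL date_deleted means the book is still active (so no separate status column is needed), uses ISO date strings, and the helper names and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books ("
    "id INTEGER PRIMARY KEY, bookname TEXT, "
    "date_added TEXT NOT NULL, date_deleted TEXT)"
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2009-05-10", None),
        (2, "B", "2009-06-01", "2009-07-15"),
        (3, "C", "2009-06-20", "2009-06-25"),
        (4, "D", "2009-07-02", None),
    ],
)

def books_as_of(conn, day):
    """Count books added on or before `day` and not yet deleted on `day`."""
    sql = """
        SELECT COUNT(id) FROM books
        WHERE date_added <= :day
          AND (date_deleted IS NULL OR date_deleted > :day)
    """
    return conn.execute(sql, {"day": day}).fetchone()[0]

def soft_delete(conn, book_id, day):
    """Mark a book as deleted instead of removing its row."""
    conn.execute("UPDATE books SET date_deleted = ? WHERE id = ?", (day, book_id))

end_of_june = books_as_of(conn, "2009-06-30")
```

Filtering on the deletion date rather than on the current status matters here: a book deleted in July still has to count toward the end-of-June total.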

If changing the schema of the tables is too much work I would add triggers that would track the changes. With this approach you can track all kinds of things like date added, date deleted etc.

Given your problem and your reluctance to change the schema and the code, I would suggest going with your idea of counting the books at the end of each month and storing the count for the month in another table. You can use the database scheduler to invoke a stored procedure to do this.

You have just taken a baby step down the road of history databases or data warehousing.
A data warehouse typically stores data about the way things were, in a format such that later data is added to current data instead of superseding it. There is a lot to learn about data warehousing. If you are headed down that road in a serious way, I suggest a book by Ralph Kimball or Bill Inmon. I prefer Kimball.
Here are the websites: http://www.ralphkimball.com/
http://www.inmoncif.com/home/
If, on the other hand, your first step into this territory is the only step you plan to take, your proposed solution is good enough.

The only way to do what you want is to add a column to the books table "date_added". Then you could run a query like
select count(id) from books where date_added <= '06/30/2009';
