Is there a better way to get old data? - php

Say you've got a database like this:
books
-----
id
name
And you wanted to get the total number of books in the database. The easiest possible SQL:
"select count(id) from books"
But now you want to get the total number of books last month...
Edit: but some of the books have been
deleted from the table since last month
Well, obviously you can't compute a total for a month that has already passed: the "books" table is always current, and some of the records have already been deleted.
My approach was to run a cron job (or scheduled task) at the end of the month and store the total in another table, called report_data, but this seems clunky. Any better ideas?

Add a default column that has the value GETDATE(), call it "DateAdded". Then you can query between any two dates to find out how many books there were during that date period or you can just specify one date to find out how many books there were before a certain date (all the way into history).
Per comment: You should not delete, you should soft delete.

I agree with JP, do a soft delete/logical delete. For the one extra AND statement per query it makes everything a lot easier. Plus, you never lose data.
Granted, if extreme size becomes an issue, then yeah, you'll potentially have to start physically moving/removing rows.

My approach was to run a cron job (or scheduled task) at the end of the month and store the total in another table, called report_data, but this seems clunky.
I have used this method to collect and store historical data. It was simpler than a soft-delete solution because:
The "report_data" table is very easy to generate reports/graphs from
You don't have to implement special soft-delete code for anything that needs to delete a book
You don't have to add "and active = 1" to the end of every query that selects from the books table
Because the code to do the historical reporting is isolated from everything else that uses books, this was actually the less clunky solution.
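A minimal sketch of the snapshot approach, using Python's built-in sqlite3 module as a stand-in for the real database. The report_data columns (month, total_books) and the function name are assumptions for illustration; the question only names the table.

```python
import sqlite3

def snapshot_book_count(conn, month_label):
    """Store the current number of books under the given month label.

    This is what the end-of-month cron job would call. INSERT OR REPLACE
    keeps the job safe to re-run for the same month.
    """
    conn.execute(
        "INSERT OR REPLACE INTO report_data (month, total_books) "
        "SELECT ?, COUNT(*) FROM books",
        (month_label,),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical layout for the report table
    CREATE TABLE report_data (month TEXT PRIMARY KEY, total_books INTEGER);
    INSERT INTO books (name) VALUES ('A'), ('B'), ('C');
""")

snapshot_book_count(conn, "2009-06")                # end-of-month job
conn.execute("DELETE FROM books WHERE name = 'A'")  # a later hard delete

june_total = conn.execute(
    "SELECT total_books FROM report_data WHERE month = ?", ("2009-06",)
).fetchone()[0]
current_total = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
```

The stored June total is unaffected by the later delete, which is the whole point: the report table is the only place that needs to know about history.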

If you needed data from the previous month then you should not have deleted the old data. Instead you can have a "logical delete."
I would add a status field and some dates to the table.
books
_____
id
bookname
date_added
date_deleted
status (active/deleted)
From there you would be able to query:
SELECT count(id) FROM books WHERE date_added <= '2009-06-30' AND status = 'active'
NOTE: It may not be the best schema, but you get the idea... ;)
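One thing to watch with this schema: status only tells you whether a book is active now, not whether it was active on a past date. A rough sketch of an "as of" count that uses date_deleted instead, run here against SQLite through Python's sqlite3 module. The sample rows and the books_as_of helper are made up for illustration, the status column is left out, and dates are stored as ISO text (YYYY-MM-DD) so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        bookname TEXT,
        date_added TEXT NOT NULL,
        date_deleted TEXT            -- NULL while the book is still active
    );
    INSERT INTO books (bookname, date_added, date_deleted) VALUES
        ('A', '2009-05-10', NULL),
        ('B', '2009-06-15', '2009-07-02'),
        ('C', '2009-06-20', '2009-06-25'),
        ('D', '2009-07-05', NULL);
""")

def books_as_of(conn, day):
    """Count books that had been added and not yet soft-deleted on `day`."""
    row = conn.execute(
        "SELECT COUNT(*) FROM books "
        "WHERE date_added <= ? "
        "AND (date_deleted IS NULL OR date_deleted > ?)",
        (day, day),
    ).fetchone()
    return row[0]

end_of_june = books_as_of(conn, "2009-06-30")  # A and B count; C was deleted, D not added yet
```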

If changing the schema of the tables is too much work I would add triggers that would track the changes. With this approach you can track all kinds of things like date added, date deleted etc.
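Roughly what the trigger idea could look like, sketched with SQLite triggers through Python's sqlite3 module. The books_audit table and the trigger names are hypothetical, and trigger syntax differs somewhat between database engines, so treat this as an outline rather than something to paste into MySQL or SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT);

    -- Hypothetical audit table; the books table itself is left unchanged
    CREATE TABLE books_audit (
        book_id INTEGER,
        action TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    CREATE TRIGGER books_after_insert AFTER INSERT ON books
    BEGIN
        INSERT INTO books_audit (book_id, action) VALUES (NEW.id, 'added');
    END;

    CREATE TRIGGER books_after_delete AFTER DELETE ON books
    BEGIN
        INSERT INTO books_audit (book_id, action) VALUES (OLD.id, 'deleted');
    END;
""")

conn.execute("INSERT INTO books (name) VALUES ('A')")
conn.execute("INSERT INTO books (name) VALUES ('B')")
conn.execute("DELETE FROM books WHERE name = 'A'")

# The audit trail survives even though book 1 is gone from books
audit_log = conn.execute(
    "SELECT book_id, action FROM books_audit ORDER BY rowid"
).fetchall()
```

Historical counts can then be derived from books_audit by counting 'added' minus 'deleted' rows up to a given changed_at.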

Looking at your problem and your reluctance to change the schema and the code, I would suggest going with your idea of counting the books at the end of each month and storing the count for the month in another table. You can use the database scheduler to invoke a stored procedure to do this.

You have just taken a baby step down the road of history databases or data warehousing.
A data warehouse typically stores data about the way things were, in a format such that later data is added to current data instead of superseding it. There is a lot to learn about data warehousing. If you are headed down that road in a serious way, I suggest a book by Ralph Kimball or Bill Inmon. I prefer Kimball.
Here are the websites: http://www.ralphkimball.com/
http://www.inmoncif.com/home/
If, on the other hand, your first step into this territory is the only step you plan to take, your proposed solution is good enough.

The only way to do what you want is to add a column to the books table "date_added". Then you could run a query like
select count(id) from books where date_added <= '2009-06-30';

Related

Hi, I have a question about my database

I have a question about my database.
In my case each user will save 5 records per day in a table.
So in 10 days there will be 50 records for one user. I have 50000 users, so the number of records grows by 50000*5 = 250000 per day.
If I want to retrieve a particular record for a particular day for a particular user, I have to traverse all of these records. Is that the right practice?
If not, what is the solution?
I would suggest creating indexes on the user and date columns; you can see details in the link suggested by tausif. I would also recommend avoiding "select * from ..." queries: specify the columns you need in each query rather than using a star (*) to retrieve all the columns.
Can you provide more details of your scenario? Is the database on a server? Which database technology are you using? Is the data kept for a long period of time?
I would suggest starting by indexing one or two of your columns, but that may not be the right approach for your particular solution.
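As a rough illustration of the index suggestion, here is a sketch using SQLite through Python's sqlite3 module. The table and column names (user_records, user_id, record_date, payload) are assumptions, since the question does not give a schema, and the data set is scaled down to 100 users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_records (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        record_date TEXT NOT NULL,
        payload TEXT
    );
    -- Composite index: lookups by user and day no longer scan the whole table
    CREATE INDEX idx_user_date ON user_records (user_id, record_date);
""")

# 100 users x 10 days x 5 records per day
rows = [
    (user_id, "2009-07-%02d" % day, "record %d" % n)
    for user_id in range(1, 101)
    for day in range(1, 11)
    for n in range(5)
]
conn.executemany(
    "INSERT INTO user_records (user_id, record_date, payload) VALUES (?, ?, ?)",
    rows,
)

# Name the columns you need instead of SELECT *
query = "SELECT id, payload FROM user_records WHERE user_id = ? AND record_date = ?"
matches = conn.execute(query, (42, "2009-07-03")).fetchall()

# Ask the engine how it will run the query; the last column describes each step
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (42, "2009-07-03")).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
```

With the index in place the plan reports a search using idx_user_date rather than a full table scan. MySQL has its own EXPLAIN with a different output format, but the idea is the same.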

Update view count, most reliable way

Hello again Stackoverflow!
I'm currently working on custom forum software, and one of the things you like to see on a forum is a view counter.
All the approaches for a view counter that I found would just select the topic from the database, retrieve the number from a "views" column, add one and update it.
But here's my thought: if, let's say, 400 people open a topic at the exact same time, the MySQL database probably won't count all the views, because it takes time for the queries to complete, so the last person (of the 400) might overwrite the first person's view.
Of course one could argue that on a normal site this is never going to happen, but if you have ~7 people opening that topic in the exact same second and the server is struggling at that moment, you could have the same problem.
Is there any other good approach to count views?
EDIT
Woah, could the one who voted down specify why?
By "retrieving the number of views and adding one" I meant that I would use SELECT to retrieve the number, add one using PHP (note the tags) and update it using UPDATE. I had no idea of the other methods specified below; that's why I asked.
If, lets say 400, people at the exact same time open a topic, the MySQL database apparently would count all the views because this is exactly what databases were invented for.
All the approaches for a view counter that you have found are wrong. To update a field you don't need to retrieve it first; just update it in place:
UPDATE forum SET views = views + 1 WHERE id = ?
So something like that will work:
UPDATE tbl SET cnt = cnt+1 WHERE ...
A single UPDATE statement is guaranteed to be atomic. That means no one will be able to alter cnt between the time it is read and the time it is replaced. If you have several concurrent UPDATEs for the same row (InnoDB) or table (MyISAM), they have to wait their turn to update the data.
See Is incrementing a field in MySQL atomic?
and http://dev.mysql.com/doc/refman/5.1/en/ansi-diff-transactions.html
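To make the difference concrete, here is a small sketch using SQLite through Python's sqlite3 module as a stand-in for MySQL. The interleaving of two requests is simulated by hand in a single connection, so it shows the lost-update pattern rather than real concurrency; the helper names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forum (id INTEGER PRIMARY KEY, views INTEGER NOT NULL DEFAULT 0);
    INSERT INTO forum (id, views) VALUES (1, 0), (2, 0);
""")

def read_views(conn, topic_id):
    return conn.execute(
        "SELECT views FROM forum WHERE id = ?", (topic_id,)
    ).fetchone()[0]

# Read-modify-write: both "requests" read 0 before either one writes,
# so each writes 1 and one view is lost.
seen_by_a = read_views(conn, 1)
seen_by_b = read_views(conn, 1)
conn.execute("UPDATE forum SET views = ? WHERE id = 1", (seen_by_a + 1,))
conn.execute("UPDATE forum SET views = ? WHERE id = 1", (seen_by_b + 1,))
racy_views = read_views(conn, 1)

def add_view(conn, topic_id):
    """Increment inside the database; no value ever leaves the server."""
    conn.execute("UPDATE forum SET views = views + 1 WHERE id = ?", (topic_id,))

add_view(conn, 2)
add_view(conn, 2)
atomic_views = read_views(conn, 2)
```

Two views on topic 1 end up counted as one, while the in-place increment on topic 2 counts both.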

PHP / MySQL lecture allocation

I am using PHP, MySQL to develop a website to be used in an educational institution.
One function of this is to allocate lectures when creating a batch. To allocate lectures to the starting batch, the system will suggest available lectures based on their availability and qualifications. The course coordinator will then make the decision.
My problem is how to check the availability of a lecture on a particular weekday for a given time slot. (Time slots vary; they are not of fixed duration.)
I am planning to keep lecture schedules in a table with the columns lecture_Id, Batch_Id, day, start_time, end_time, start_day, end_day.
Then, when checking availability, I need to write a complex query to find the available lecture_Ids. I haven't figured it out yet.
Are there any other smart ways to do this?
Thanks
The way you're proposing to store the info is the first way I'd think of doing it - the complex query isn't so bad though.
So if I understand correctly, you have lectures stored in another table, with a one-to-many relationship with your lecture_schedules table?
If so, this will get the lecture details for a given time range.
Something like this:
SELECT * FROM lecture_schedules
INNER JOIN lectures USING(lecture_id)
WHERE start_day<=DAY(yourstarttime) AND end_day>=DAY(yourendtime)
AND start_time<=yourstarttime AND end_time>=yourendtime;
Note you will need to edit table names and column names to reflect your actual schema, and replace yourstarttime and yourendtime with the time range values.
Hope this helps
EDIT:
This query makes several assumptions about the columns and datatypes of those columns in your schema - don't just copy & paste and expect it to work first time :)
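For the availability check itself, one common approach is an interval-overlap test: a lecture is busy if it has a booking on that day that starts before the requested slot ends and ends after the requested slot starts. A sketch of that idea with SQLite through Python's sqlite3 module follows. It assumes times are stored as zero-padded 'HH:MM' text, uses made-up sample data, and ignores the start_day/end_day columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lectures (lecture_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE lecture_schedules (
        lecture_id INTEGER,
        batch_id INTEGER,
        day TEXT,            -- weekday code, e.g. 'MON'
        start_time TEXT,     -- 'HH:MM', zero-padded so text comparison works
        end_time TEXT
    );
    INSERT INTO lectures VALUES (1, 'L1'), (2, 'L2'), (3, 'L3');
    INSERT INTO lecture_schedules VALUES
        (1, 10, 'MON', '09:00', '11:00'),
        (2, 11, 'MON', '11:00', '12:30'),
        (3, 12, 'TUE', '09:00', '17:00');
""")

def available_lectures(conn, day, start, end):
    """Return lecture_ids with no booking overlapping [start, end) on `day`."""
    rows = conn.execute(
        """
        SELECT l.lecture_id
        FROM lectures l
        WHERE NOT EXISTS (
            SELECT 1 FROM lecture_schedules s
            WHERE s.lecture_id = l.lecture_id
              AND s.day = ?
              AND s.start_time < ?   -- booking starts before the slot ends
              AND s.end_time > ?     -- booking ends after the slot starts
        )
        ORDER BY l.lecture_id
        """,
        (day, end, start),
    ).fetchall()
    return [r[0] for r in rows]

monday_mid_morning = available_lectures(conn, "MON", "10:00", "11:30")
```

Because the comparisons are strict, back-to-back bookings (one ending at 11:00, the next starting at 11:00) do not count as a clash.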

MySql table design : Use lots of rows or store the information formatted in one text field per row?

I am trying to discover the best way to design my database to organize information related to events.
I have an events table which contains all the information about the event such as, a unique id, title of the event, venue etc.
Now each event can have multiple ticket types and the number and type of tickets will change with each event.
Is it better to have an events_tickets table which has a separate row for each ticket type, e.g.
event_id  ticket_type  price
1         standard     20
1         deluxe       40
1         cheap        10
Or is it better to have the table formatted so that the information is on one row?
event_id  ticket_information
1         standard:20,deluxe:40,cheap:10
If I use the first way I could end up with 10 rows per event which when multiplied by lots of events could become very large, whereas the second version could have problems with data integrity.
the first one... definitely. :) having as much of your data as separate as possible is ALWAYS the best way... it makes it much more usable and much easier to change/upgrade/expand the code later.
In fact I would have 3 tables: events, event_options and ticket_types
event_options would just be literally a link table between the events and the ticket_types, and can include other information you need to hold per event. This way it will make it easier still to a) search by ticket type and b) add more ticket types because when you come to add a new ticket type to an existing event (or something similar) you will have a lot more issues the second way.
The official answer is to do it the first way. If you only ever have exactly the same three types of tickets, then you can get on with having three "ticket price" fields. But otherwise, relational-purism tells you to go with the first.
I'm assuming that in any event you have an "events" table. Tell you what: search for "third normal form" on your favorite search engine, and you'll learn a lot about designing databases.
The first way is better. It is more normalised. Why does this matter? It means it's much easier to query your data. You don't want to use the second way, because it'll be really complicated and time-consuming to retrieve data at a later time.
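A short sketch of why the one-row-per-ticket-type layout is easier to query, using SQLite through Python's sqlite3 module. The events_tickets columns follow the question; the event titles and the second event are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE events_tickets (
        event_id INTEGER REFERENCES events(event_id),
        ticket_type TEXT,
        price INTEGER,
        PRIMARY KEY (event_id, ticket_type)   -- one price per type per event
    );
    INSERT INTO events VALUES (1, 'Concert'), (2, 'Play');
    INSERT INTO events_tickets VALUES
        (1, 'standard', 20), (1, 'deluxe', 40), (1, 'cheap', 10),
        (2, 'standard', 15);
""")

# Cheapest ticket per event: one GROUP BY, no string parsing
cheapest = dict(conn.execute(
    "SELECT event_id, MIN(price) FROM events_tickets GROUP BY event_id"
))

# Events that offer any ticket at 12 or less
budget_events = [r[0] for r in conn.execute(
    "SELECT DISTINCT event_id FROM events_tickets WHERE price <= 12"
)]

# With the packed text column, every such question means fetching the string
# and parsing it in application code instead:
packed = "standard:20,deluxe:40,cheap:10"
parsed = {name: int(price)
          for name, price in (item.split(":") for item in packed.split(","))}
```

Ten rows per event is not a size problem for a relational database, and the composite primary key gives the integrity guarantee the packed string cannot.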

Hiding model data based on an id's existence in another table

I've got a somewhat complicated question for you CakePHP experts.
Basically, I have created a db table called "locations". Every month I will get this table sent to me in csv format from a client. Unfortunately, instead of updating this table, I will have to empty it and reimport all of the records. Unfortunately, I cannot alter this table at all.
Functionality wise, users will have the ability to look at a display of these records, and be able to choose to hide certain ones. This "hidden" attribute must be persistent and survive the month to month purging of all records.
I had all of this working yesterday. What I did was, create a separate table called location_properties (columns were: id(int), location_id(foreign key), is_hidden(boolean)). When showing these records, it would simply check to see if "is_hidden==true".
This was all well and good(AND WORKING!), but then my boss kind of gummed up the works. He told me to delete the "is_hidden" column from the table because it would be more efficient. That I should be able to simply check for the existence of the location_id to hide or show it.
It doesn't appear to be quite that simple. Anyone know how I can pull this off? I've tried everything I can think of.
Your boss is wrong.
It's more efficient to add your column than it is to delete and re-import the locations every month.
Did he say it was less efficient, or did you do an actual benchmark to see if it harms performance too much?
At first glance I see 2 solutions:
1) add a condition array('Location.id !=' => null)
2) change join type to right join
I hope this helps
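At the SQL level, the "hide it if a row exists" check the boss is asking for is a LEFT JOIN that keeps only the locations with no match in location_properties. A sketch with SQLite through Python's sqlite3 module follows. The sample rows are invented, and it assumes the location ids in the monthly CSV stay stable between imports, which the question does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT);
    -- A row here means "hidden"; no is_hidden column needed
    CREATE TABLE location_properties (
        id INTEGER PRIMARY KEY,
        location_id INTEGER UNIQUE
    );
    INSERT INTO locations VALUES (1, 'North'), (2, 'South'), (3, 'East');
    INSERT INTO location_properties (location_id) VALUES (2);
""")

def visible_locations(conn):
    """Locations that have no matching row in location_properties."""
    return conn.execute(
        """
        SELECT l.id, l.name
        FROM locations l
        LEFT JOIN location_properties p ON p.location_id = l.id
        WHERE p.location_id IS NULL
        ORDER BY l.id
        """
    ).fetchall()

before_import = visible_locations(conn)

# Simulate the monthly purge and re-import of the locations table
conn.execute("DELETE FROM locations")
conn.executemany(
    "INSERT INTO locations VALUES (?, ?)",
    [(1, "North"), (2, "South"), (3, "East"), (4, "West")],
)
after_import = visible_locations(conn)
```

The hidden flag for location 2 survives the re-import because it lives in a table the import never touches. In CakePHP this would translate to a left join on the associated model plus an IS NULL condition on its location_id.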
