I have a question about database design.
In my case, each user will save 5 records per day in a table.
So in 10 days, there will be 50 records for one user. With 50,000 users, the count goes to 50000*5 = 250,000 records per day.
If I want to retrieve a particular record for a particular day for a particular user, do I have to traverse through all of these records? Is that the right practice?
If not, what is the solution?
I would suggest you create indexes on the user and date columns; you can see details in the link suggested by tausif. I would also recommend you avoid queries like "select * from ..."; you should specify the columns you need in each query rather than using a star (*) to retrieve all the columns.
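As a minimal sketch, assuming a table named user_records with user_id and record_date columns (hypothetical names; adjust to your schema), the index and a targeted lookup might look like this:

-- Composite index so lookups by user and day avoid a full scan:
CREATE INDEX idx_user_date ON user_records (user_id, record_date);

-- Fetch one user's records for one day, naming only the columns you need:
SELECT record_id, record_value
FROM user_records
WHERE user_id = 12345
  AND record_date = '2013-06-01';

With the composite index on (user_id, record_date), the database can seek straight to the handful of matching rows instead of traversing the 250,000 records added each day.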
Can you provide more details about your scenario? Is the database on a server? Which database technology are you using? Is the data saved for a long period of time?
I would suggest you start by indexing one or two of your columns, but maybe that is not the right approach for your particular solution.
I want to love DynamoDB, but the major drawback is the query/scan on the whole DB to pull the results for one query. Would I be better off sticking with MySQL, or is there another solution I should be aware of?
Uses:
Newsfeed items (Pulls most recent items from table where id in x,x,x,x,x)
User profile relationships (users follow and friend each other)
User lists (users can have up to 1,000 items in one list)
I am happy to mix and match database solutions. The main use is lists.
There will be a few million lists eventually, ranging from 5 to 1000 items per list. The list table is formatted as follows: list_id(bigint)|order(int(1))|item_text(varchar(500))|item_text2(varchar(12))|timestamp(int(11))
The main queries on this DB would be on the 'list_relations' table:
Select item_text from lists where list_id=539830
I suppose my main question is: can we get all items for a particular list_id without a slow query/scan? And by 'slow' do people mean a second, or a few minutes?
Thank you
I'm not going to address whether or not it's a good choice or the right choice, but you can do what you're asking. I have a large DynamoDB instance with vehicle VINs as the hash key and something else as my range key, plus a secondary index on VIN and a timestamp field. I am able to make fast queries over thousands of records for specific vehicles across timestamp ranges, no problem.
Constructing your schema in DynamoDB requires different considerations than building in MySQL.
You want to avoid scans as much as possible, which means picking your hash key carefully.
Depending on your exact queries, you may also need multiple tables that hold the same data, but with different hash keys depending on your querying needs.
You also did not mention the LSI (local secondary index) and GSI (global secondary index) features of DynamoDB; these also help your queryability, but have their own sets of drawbacks. It is difficult to advise further without knowing more details about your requirements.
For a new version of a website, I am making a "Top User" section on the homepage. This allows people to vote for the user in a bunch of different categories. All this information is stored in a MySQL database in two main tables. One table has the information (id *autogenerated*, Username, Date added, etc.) and the other has the rates (id *autogenerated*, linkID, rowAvg, avg, avg2, etc.). The linkID from the rate table corresponds to the id from the information table.

How the query works is it queries through the rate_table, orders it by highest averages, then selects the first 10 results. Then, using the linkID of each row, I select the actual information for that user from the info_table. This is fine and dandy as long as each user only has 1 rate. What happens when a user has 2 or more rates, and both of the rates are positive, is that the query selects both of those and pulls the information for each of the 2 rows, which is then displayed as two different users even though it is the same person.

How can I make the rate_table query know when there are multiple rates for the same linkID and average them together instead of showing them as two different people? I was thinking of inserting each linkID into an array, then for each subsequently selected row, checking if that linkID is already in the array. If not, insert it; if it is, average them together, store the result in the array, and use the array to populate the table on the homepage. I feel like I am overthinking this. Sorry if this is a little confusing. If you need more information, let me know.
P.S. I'm still learning my way around MySQL queries so I apologize if I am going about this the completely wrong way and you spent the last few minutes trying to understand what I was saying :P
P.P.S. I know I shouldn't be using MySQL_Query anymore, and should be doing prepared statements. I want to master MySQL queries because that is what FMDB databases for iOS use then I will move onto learning prepared statements.
Take a look at some of the MySQL Aggregate Functions.
AVG() may be what you're looking for. The average of one result is just that result, so that should still work. Use the GROUP BY clause on the column that should be unique in order to run the aggregate calculation on the grouped rows.
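As a sketch using the rate_table and info_table names from the question (with rowAvg standing in for whichever column you rank by), the top-10 query might look like:

SELECT i.id, i.Username, AVG(r.rowAvg) AS overall_avg
FROM rate_table r
JOIN info_table i ON i.id = r.linkID
GROUP BY i.id, i.Username
ORDER BY overall_avg DESC
LIMIT 10;

GROUP BY collapses multiple rates for the same linkID into one row, so each person appears only once, and the JOIN pulls the profile information in the same query instead of a second lookup per result.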
I have a drupal site, and am trying to use php to grab some data from my database. What I need to do is to display, in a user's profile, how many times they were the first person to review a venue (exactly like Yelp's "First" tally). I'm looking at two options, and trying to decide which is the better way to approach it.
First Option: The first time a venue is reviewed, save the value of the reviewer's user ID into a table in the database. This table will be dedicated to storing the UID of the first user to review each venue. Then, use a simple query to display a count in the user's profile of the number of times their UID appears in this table.
Second Option: Use a set of several more complex queries to display the count in the user's profile, without storing any extra data in the database. This will rely on several queries which will have to do something along the lines of:
Find the ID for each review the user has created
Check the ID of the venue contained in each review
Find the first review for each venue based on the venue ID stored in the review
Get the User ID of the author for the first review
Check which, if any, of these Author UIDs match the current user's UID
I'm assuming that this would involve creating an array of the IDs in step one, and then somehow executing each step for each item in the array (though, as sketched below, the steps might also collapse into a single joined query). There would also be 3 or 4 different tables involved in the query.
I'm relatively new to writing SQL queries, so I'm wondering if it would be better to perform the set of potentially longer queries, or to take the small database hit and use a much much smaller count query instead. Is there any way to compare the advantages of either, or is it like comparing apples and oranges?
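For what it's worth, the second option doesn't necessarily require an array and a loop. Assuming a single reviews table with venue_id, author_uid, and created_at columns (hypothetical names, since the real data is spread across several Drupal tables), it can collapse into one joined query:

-- Count the venues where this user's review has the earliest timestamp:
SELECT COUNT(*)
FROM reviews r
JOIN (
  SELECT venue_id, MIN(created_at) AS first_time
  FROM reviews
  GROUP BY venue_id
) f ON f.venue_id = r.venue_id AND f.first_time = r.created_at
WHERE r.author_uid = 42;

(This assumes timestamps are unique per venue; ties would need a tie-breaker on the review ID.)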
The volume of extra data stored will be negligible; the simplification to the processing will be significant. The data won't change (the first person to review a venue won't change), so there is a negligible update burden. Go with the extra data and simpler query.
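As a sketch of the first option (table and column names are hypothetical), the dedicated table and the profile count query could be as simple as:

-- One row per venue, written the first time the venue is reviewed:
CREATE TABLE venue_first_reviews (
  venue_id INT NOT NULL PRIMARY KEY,
  first_reviewer_uid INT NOT NULL
);

-- The "Firsts" tally shown on a user's profile:
SELECT COUNT(*) FROM venue_first_reviews WHERE first_reviewer_uid = 42;

The PRIMARY KEY on venue_id also guarantees that only one "first" row can ever exist per venue.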
I'm running a SQL query to get basic details from a number of tables, sorted by the last update date field. It's terribly tricky, and I'm wondering if there is an alternative to using the UNION clause. I'm working in PHP/MySQL.
Actually, I have a few tables containing news, articles, photos, events, etc. and need to collect all of them in one query to show a simple "what's newly added on the website" kind of thing.
Maybe do it in PHP rather than MySQL: if you want the latest n items, fetch the latest n of each of your news items, articles, photos and events, and sort in PHP (you'll need the last n of each, obviously, and you'll then trim the combined dataset in PHP). This is probably easier than combining them with UNION, given they're likely to have lots of fields that differ.
I'm not aware of an alternative to UNION that does what you want, and hopefully those fetches won't be too expensive. It would definitely be wise to profile this though.
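For comparison, the single-query UNION version the question asks about might look like the sketch below, assuming each table exposes id, title, and updated_at columns (hypothetical names):

(SELECT 'news' AS item_type, id, title, updated_at FROM news ORDER BY updated_at DESC LIMIT 10)
UNION ALL
(SELECT 'article', id, title, updated_at FROM articles ORDER BY updated_at DESC LIMIT 10)
UNION ALL
(SELECT 'photo', id, title, updated_at FROM photos ORDER BY updated_at DESC LIMIT 10)
UNION ALL
(SELECT 'event', id, title, updated_at FROM events ORDER BY updated_at DESC LIMIT 10)
ORDER BY updated_at DESC
LIMIT 10;

UNION ALL skips the duplicate-elimination pass of plain UNION, and the per-branch LIMITs keep each fetch cheap; the catch is that every branch must expose the same column list, which is exactly why the PHP-side merge can be easier when the tables differ a lot.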
If you use JOIN in your query, you can select data from different tables that are related by foreign keys.
You can look at this from another angle: do you need absolutely up-to-date information (the moment someone enters new information, it should appear)?
If not, you can have a table holding the results of the query in the format you need (serving as cache), and update this table every 5 minutes or so. Then your query problem becomes trivial, as you can have the updates run as several updates in the background.
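A minimal sketch of such a cache table, with hypothetical names (a cron job or database event would repopulate it every 5 minutes or so):

CREATE TABLE latest_items_cache (
  item_type  VARCHAR(20),
  item_id    INT,
  title      VARCHAR(255),
  updated_at DATETIME
);

-- The homepage query then becomes trivial:
SELECT item_type, item_id, title
FROM latest_items_cache
ORDER BY updated_at DESC
LIMIT 10;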
Say you've got a database like this:
books
-----
id
name
And you wanted to get the total number of books in the database, easiest possible sql:
"select count(id) from books"
But now you want to get the total number of books last month...
Edit: but some of the books have been deleted from the table since last month
Well, obviously you can't total for a month that's already past; the "books" table is always current, and some of the records have already been deleted.
My approach was to run a cron job (or scheduled task) at the end of the month and store the total in another table, called report_data, but this seems clunky. Any better ideas?
Add a column that defaults to GETDATE() and call it "DateAdded". Then you can query between any two dates to find out how many books there were during that period, or you can specify a single date to find out how many books there were before a certain date (all the way back into history).
Per comment: You should not delete, you should soft delete.
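A sketch of both suggestions together, in SQL Server syntax to match GETDATE() above (MySQL would use CURRENT_TIMESTAMP); the DateDeleted column is a hypothetical soft-delete marker:

ALTER TABLE books ADD DateAdded DATETIME NOT NULL DEFAULT GETDATE();
ALTER TABLE books ADD DateDeleted DATETIME NULL;  -- set this instead of running DELETE

-- How many books existed at the end of June 2009:
SELECT COUNT(id)
FROM books
WHERE DateAdded <= '2009-06-30'
  AND (DateDeleted IS NULL OR DateDeleted > '2009-06-30');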
I agree with JP, do a soft delete/logical delete. For the one extra AND statement per query it makes everything a lot easier. Plus, you never lose data.
Granted, if extreme size becomes an issue, then yeah, you'll potentially have to start physically moving/removing rows.
My approach was to run a cron job (or scheduled task) at the end of the month and store the total in another table, called report_data, but this seems clunky.
I have used this method to collect and store historical data. It was simpler than a soft-delete solution because:
The "report_data" table is very easy to generate reports/graphs from
You don't have to implement special soft-delete code for anything that needs to delete a book
You don't have to add "and active = 1" to the end of every query that selects from the books table
Because the code to do the historical reporting is isolated from everything else that uses books, this was actually the less clunky solution.
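For what it's worth, the report_data approach can be as small as this sketch (hypothetical column names), run by the scheduled job at month end:

CREATE TABLE report_data (
  report_month DATE PRIMARY KEY,
  book_count   INT NOT NULL
);

-- Run on the last day of each month:
INSERT INTO report_data (report_month, book_count)
SELECT '2009-06-30', COUNT(id) FROM books;

Monthly reports then become a single SELECT against report_data, with no date logic against the live books table.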
If you needed data from the previous month then you should not have deleted the old data. Instead you can have a "logical delete."
I would add a status field and some dates to the table.
books
-----
id
bookname
date_added
date_deleted
status (active/deleted)
From there you would be able to query:
SELECT count(id) FROM books WHERE date_added <= '2009-06-30' AND (status = 'active' OR date_deleted > '2009-06-30')
NOTE: It may not be the best schema, but you get the idea... ;)
If changing the schema of the tables is too much work, I would add triggers to track the changes. With this approach you can track all kinds of things, like date added, date deleted, etc.
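A minimal trigger sketch in MySQL syntax, using a hypothetical books_audit table (an insert trigger would follow the same pattern for tracking additions):

CREATE TABLE books_audit (
  book_id    INT,
  action     VARCHAR(10),
  changed_at DATETIME
);

-- Record each deletion before the row disappears:
CREATE TRIGGER books_before_delete
BEFORE DELETE ON books
FOR EACH ROW
  INSERT INTO books_audit (book_id, action, changed_at)
  VALUES (OLD.id, 'deleted', NOW());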
Looking at your problem and your reluctance to change the schema and the code, I would suggest you go with your idea of counting the books at the end of each month and storing the count for the month in another table. You can use the database scheduler to invoke a stored procedure to do this.
You have just taken a baby step down the road of history databases or data warehousing.
A data warehouse typically stores data about the way things were, in a format such that later data is added to current data instead of superseding it. There is a lot to learn about data warehousing. If you are headed down that road in a serious way, I suggest a book by Ralph Kimball or Bill Inmon. I prefer Kimball.
Here are their websites: http://www.ralphkimball.com/
http://www.inmoncif.com/home/
If, on the other hand, your first step into this territory is the only step you plan to take, your proposed solution is good enough.
The only way to do what you want is to add a "date_added" column to the books table. Then you could run a query like
select count(id) from books where date_added <= '2009-06-30';