I'm trying to make a pretty line graph to represent the rate of users registering on my site. What would be the best way to fetch and store my site's stat data in a MySQL database?
I was thinking of using a cron job that would fetch the total number of users from a separate table and subtract the previous total to get the number of newly registered users. This value would be used for the line graph. To me, it seems like too much, especially if I want to get the rate of users on a per-minute basis. Is there a more efficient way of doing this? Should I store each day in a separate row?
I'd suggest you include the time that users registered in your users table. You can then perform whatever analysis you like at a later date.
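For example, assuming a users table with a registered_at DATETIME column (both names here are just for illustration), the per-minute counts for your graph can be computed on demand:

SELECT DATE_FORMAT(registered_at, '%Y-%m-%d %H:%i:00') AS minute_bucket,
       COUNT(*) AS new_users
FROM users
GROUP BY minute_bucket
ORDER BY minute_bucket;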
You can also add a field to the table that is updated by a cron job (every minute, hour, day and so on); this field stores the status of the user (new, old, or whatever other statuses you have).
As eggyal said, you should also have a field with the user's registration date.
I'm in the process of writing a system to search through a MySQL database of real estate listings. I'm concerned about performance and wanted some input on how to handle this.
The table that will be the most frequently queried is the 'listings' table and will contain over 600k records with 86 columns. This table will also be updated every 30 minutes as listings change.
Almost every search will be against records with a status of 'active' which will be about 15k of the 600k records. However, I need to retain all of the records for our internal reports. Also, each query will likely be searching for various parameters (#beds, #baths, etc) so caching may not be feasible.
I was considering maintaining a second table containing the PKs of records marked 'active', and creating a view of the two tables joined on the listing's PK. However, I know that under certain conditions, views can be very inefficient.
I did have the thought of maintaining two databases since the inactive listings won't be searched frequently and will require less maintenance.
Fortunately it's not in production yet and I have time for performance testing. One more thing, this will be hosted on a dedicated Linux server with the front-end written in PHP. Any insight offered is greatly appreciated.
I suggest that you create an archive table. You could set up a process to run every 30 minutes or once per day, depending on the requirements.
The archive table would have the same columns as the original table, plus an EffDate and an EndDate that hold the dates/datetimes during which the record was active.
Such a table will make it possible to recreate the history at any point in time -- something that will prove useful, I'm sure.
You will need code to create this. The basic logic is to look up each record in your table against the most current version in the archive (EndDate is NULL and id = id). Then:
1. If the new record is not present in the archive, create an archive record with the current date as EffDate.
2. If it is present and all columns are the same, do nothing.
3. Otherwise, set EndDate on the existing archive record to the current date and do (1).
Any archive records that have no corresponding new record at all should have EndDate set to the current date.
Typically, I have such tables updated once per day.
In code that does this, I have a big ugly query (Excel helps me build it) that does the comparisons and determines which records are "New", "Modified", and "Removed". The "Removed" and "Modified" records have their EndDate set to the current date. The "New" and "Modified" records then get a new archive record with the EffDate set to the current date.
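As a rough sketch of that pass in plain SQL (table and column names such as listings, listings_archive and listing_id are assumptions, and only one compared column is shown):

-- Close out open archive rows whose listing changed or disappeared.
UPDATE listings_archive a
LEFT JOIN listings l
       ON l.listing_id = a.listing_id
      AND l.status = a.status          -- repeat for every column you compare
SET a.EndDate = CURDATE()
WHERE a.EndDate IS NULL
  AND l.listing_id IS NULL;            -- no identical current row was found

-- Open a new archive row for every listing that has no open archive row.
INSERT INTO listings_archive (listing_id, status, EffDate, EndDate)
SELECT l.listing_id, l.status, CURDATE(), NULL
FROM listings l
LEFT JOIN listings_archive a
       ON a.listing_id = l.listing_id
      AND a.EndDate IS NULL
WHERE a.listing_id IS NULL;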
The values for EndDate and EffDate might be one more or less than stated, depending on how the updates really work. For a nightly update, for instance, the EffDate might be set to tomorrow or even to the date when the listing takes effect.
For my university project, I'm developing a dynamic live chat website with rooms, user registration, etc. I've got the entire system planned out bar one aspect. The rooms. I'm confused as to how to design the database for rooms.
To put it in perspective, a room is created by a user who is then an operator of that room. Users can join the room and talk within it. The system has to be scalable, accounting for hundreds of thousands if not millions of messages being sent a day.
Originally, I was going to create one table in my database called messages, and have fields like this:
| r_id | u_id | message | timestamp |
r_id and u_id would be foreign keys to the room ID and user ID respectively. Doing it this way means I would need to insert a new record whenever a user sends a message, and periodically run a SELECT statement for every client (say every 3 seconds or so) to get the recent messages. My worry is that because the table will be huge, running these statements might create a lot of overhead and take a long time.
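For reference, a minimal sketch of what that single table could look like (the surrogate id and the column types are guesses on my part):

CREATE TABLE messages (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    r_id      INT UNSIGNED NOT NULL,   -- FK to the rooms table
    u_id      INT UNSIGNED NOT NULL,   -- FK to the users table
    message   TEXT NOT NULL,
    timestamp DATETIME NOT NULL
) ENGINE=InnoDB;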
The other way I thought of implementing this would be to create a new database table for every room. Say a user creates 3 rooms called General, Programming and Gaming; the database tables would be room_general, room_programming and room_gaming, each with fields like:
| u_id | message | timestamp |
This would drastically cut down on the number of queries against each table, but may introduce problems when I come to program it.
So, I'm stuck on what the best way to do this is. If it makes a difference, the technology I'm using will be MySQL with PHP, and a whole lotta AJAX.
Thanks for any help!
It is a bad idea to create a table per room: hard to implement and hard to support.
Don't worry about the performance of the selects, because they will be very simple:
SELECT * FROM messages WHERE r_id = N ORDER BY timestamp DESC LIMIT X, Y
Just make sure (r_id, timestamp) are indexed together, in this order, so that this select uses the index:
ALTER TABLE `messages` ADD KEY `IN_messages_room_time` (`r_id`, `timestamp`);
If you still have performance problems (you probably won't), just add a 1-3 second in-memory cache (using memcache) and fetch the messages from the DB once every 1-3 seconds.
Also look at Apollo Clark's answer (https://stackoverflow.com/a/8673165/436932) to avoid storing a huge amount of unnecessary old messages: you can move them into a MyISAM archive table or simply delete them.
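A sketch of such a cleanup job (the messages_archive table and the 30-day cutoff are just assumptions):

-- Copy old rows into the archive table, then remove them from the hot table.
INSERT INTO messages_archive
SELECT * FROM messages
WHERE timestamp < NOW() - INTERVAL 30 DAY;

DELETE FROM messages
WHERE timestamp < NOW() - INTERVAL 30 DAY;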
Look into creating a "transaction table" for storing the messages. Basically, you need to decide: do I really want to log every message ever posted to the room, or just the messages posted in the past month / week / day / hour? If you really want a history of every message ever written, then you would create two databases. If you don't want to keep a history of every message, then you just need one table.
Using a transaction table, here's how it would flow:
user enters chat room
user types a message, which is saved to the transaction table.
every 500msec or 3sec, every user in the room would query the transaction table to get the latest updates from the past 500msec or 3sec
SELECT * FROM message_transactions WHERE timestamp > 123456789
a cron job runs every 5 min or 1 hour and deletes all entries older than 5 min, or however long you want the history to be (see the sketch below)
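The cleanup query that cron job runs could be as simple as this (a one-hour window is shown; timestamp is assumed to be a Unix epoch value, as in the SELECT above):

-- Drop everything older than the history window.
DELETE FROM message_transactions
WHERE timestamp < UNIX_TIMESTAMP() - 3600;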
Be sure to synchronize and round the time that each user queries the transaction table, so that the MySQL query result caching will kick in. For example, round the timestamp to once every 1sec or every 500msec.
What'll happen now is the users only get the newest messages, and your database won't explode in size over time, or slow down. Doing this, you'll need to cache the history of messages on the client-side in JS.
On the flip side, you could just get a PHP to IRC library, and call it a day. Also, if you're curious about it, look into how Facebook implements their AJAX-based chat system.
To speed up your database, have a look at indexing your tables: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
In your case I assume that you'd be SELECTing messages by r_id while doing a JOIN on the user table through u_id. I would index the r_id and u_id columns. I am by no means an expert on this subject as I've only done "what works" for my own projects. I don't understand every pro and con of indexing, just that indexing those columns that are typically used as, well, indexes, speeds things up. Google "mysql index tutorial", you'll find plenty more information.
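For example, something along these lines (the index names are arbitrary):

ALTER TABLE messages ADD INDEX idx_messages_r_id (r_id);
ALTER TABLE messages ADD INDEX idx_messages_u_id (u_id);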
Don't go nuts and index every column, you'll slow down your INSERTs and UPDATEs.
I also suggest that you purge the chat logs every few days / weeks, or move them to another server for archival purposes if that's what you want / need to do.
You could potentially use memcached to hold recent chat messages in memory and do your database writes in bulk.
Using memcached as a database buffer for chat messages
What you can do is:
Whenever a user posts, you save the message both to a per-room cache (with a timestamp of when the message came in) and to the database. When a client requests new messages, if the user is not new to the chat room, you check the last time that user was served and load just the newer messages from the cache. If the user is new, you serve them from the database instead.
To improve scalability in this scenario, set an expiration on the cached messages so they expire after that time, or implement an async job that deletes old messages based on their timestamp.
What's the most efficient way of counting the total number of registered users on a website?
I was thinking of using the following query, but if this table contained thousands of users, the execution time would be very long.
mysql_query("SELECT COUNT(*) FROM users");
Instead, I thought of creating a separate table that will hold this value. Each time a new user registers, or a current one is deleted, this value gets updated.
My Question:
Is it possible to carry out an INSERT and UPDATE in one query? - The INSERT will be for storing the new users details, and the UPDATE to increment the total users value.
I'm very interested in your thoughts on this.
If there is a better and faster way to find out the total number of registered users, I'm very interested to know.
Cheers ;)
You can use triggers to update the value every time you make an INSERT, UPDATE or DELETE.
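A minimal sketch, assuming a single-row user_stats counter table (the table and trigger names are made up):

CREATE TABLE user_stats (total_users INT NOT NULL DEFAULT 0);
INSERT INTO user_stats (total_users) SELECT COUNT(*) FROM users;  -- seed it once

CREATE TRIGGER users_after_insert AFTER INSERT ON users
FOR EACH ROW UPDATE user_stats SET total_users = total_users + 1;

CREATE TRIGGER users_after_delete AFTER DELETE ON users
FOR EACH ROW UPDATE user_stats SET total_users = total_users - 1;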
if this table contained thousands of users, the execution time would be very long
I doubt that it would be that slow for thousands of users. If you had millions of users then it would probably be too slow.
And does your count need to be 100% accurate?
If an approximate row count is sufficient, SHOW TABLE STATUS can be used.
(Source)
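For example:

-- The Rows value is exact for MyISAM but only an estimate for InnoDB.
SHOW TABLE STATUS LIKE 'users';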
By the way, if you are using MyISAM then your original query will be close to instant because the row count is stored already with this storage engine.
You don't do an INSERT and UPDATE in one query; rather, you do them in one transaction.
Transactions have a concept of "atomicity", which means that other processes cannot see "part" of the transaction - it is all or nothing.
If this concept is not familiar to you, you may wish to look it up.
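A sketch of what that looks like, with made-up column names and assuming your tables use InnoDB (MyISAM ignores transactions):

START TRANSACTION;
INSERT INTO users (username, email) VALUES ('alice', 'alice@example.com');
UPDATE user_stats SET total_users = total_users + 1;
COMMIT;  -- both statements become visible together, or neither does on ROLLBACK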
Just a quickie. I am developing a website where you can buy credits and spend them later on things on the website.
My question is: is it OK to store the amount of credits with the user (a credits column of integer type in the user table), or is it necessary (or just better) to have a separate table with the user ID and the amount?
Thanks
Both actually.
Considering that you'll be dealing with monetary transactions to get those credits, you want to be able to get a log of all transactions (depending on the laws in your country, you will NEED this). Therefore you'll need a credits_transactions table.
user_id, transaction_id, transaction_details, transaction_delta
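One possible shape for that table (the exact types are guesses):

CREATE TABLE credits_transactions (
    transaction_id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id             INT UNSIGNED NOT NULL,
    transaction_details VARCHAR(255) NOT NULL,
    transaction_delta   INT NOT NULL   -- positive for purchases, negative for spending
) ENGINE=InnoDB;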
Since programmatically calculating your current credit balance will be too costly for users with a lot of transactions, you'll also need a credit_balance column in your user table for quick access. Use triggers to automatically update that column whenever a row is inserted into credits_transactions (technically, UPDATE and DELETE shouldn't be allowed on that table). Here is the code for the insert trigger.
DELIMITER ;;
CREATE TRIGGER ct_insert
AFTER INSERT ON credits_transactions
FOR EACH ROW
BEGIN
    -- Keep the cached balance in step with the transaction log.
    UPDATE users SET credit_balance = credit_balance + NEW.transaction_delta WHERE user_id = NEW.user_id;
END;;
DELIMITER ;
I also have sites that use credits and found it easiest to store the amount in the user table, mostly because you need access to it on every page (when the user is logged in). It is only an integer, so it will not do much harm. I think creating a new table for this value might actually be worse performance-wise, because it needs an index as well.
A good rule of thumb is to keep the info you need on every page in the user table, and normalise out the data you don't need on every page (for example address information, descriptions, etc.).
Edit:
Seeing the other reactions:
If you want transaction logs as well, I would store them separately, as they are mainly for logging (or for when the user wants to view them). Calculating the balance on the fly from the log is fine for smaller sites, but if you really have to squeeze performance, just store the actual value in the user table.
If you store them in a separate table, you can keep a log of credit changes. If you store them in a column, you will only have the current amount of credits.
If you want to keep a credits history log, like:
how many credits were bought today,
how many were spent yesterday,
what was bought with the credits,
then I think it's better to put this in a separate table. That way you can get these kinds of results by applying mathematical operations.
Credits are like money. If a user needs to purchase them, then they are money. Money is tracked using accounts. An account has associated transactions, deposits and withdrawals, and a balance. Search SO or Google for database and account design; there are plenty of examples.
I'd have a table that stores the purchases and bought credits, with the user ID.
Then calculate the balance each time based on this; it should be fast if it's indexed, and this way you can easily keep a purchase history.
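A sketch of that on-the-fly calculation, assuming a transactions table like the credits_transactions one described above, with a signed delta per row:

-- Current balance computed from the log; an index on user_id keeps this fast.
SELECT COALESCE(SUM(transaction_delta), 0) AS balance
FROM credits_transactions
WHERE user_id = 42;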
My site has lots of incoming searches, which are stored in a database to show recent queries on my website. Due to the high volume of searches, my database is getting bigger. What I want is to keep only the recent queries in the database, say the last 10 records. This keeps my database small and queries fast.
I am able to store incoming queries in the database, but I don't know how to restrict the table size or delete the excess/old data.
Any help?
I am using PHP and MySQL, by the way.
Hopefully you have a timestamp column in your table (or have the freedom to add one). AFAIK, you have to add the timestamp explicitly when you add data to the table. Then you can do something along the lines of:
DELETE FROM tablename WHERE timestamp < '<a date two days in the past, or whatever>';
You'd probably want to just do this periodically, rather than every time you add to the table.
I suppose you could also just limit the size to the most recent ten records by checking the size of the table every time you are about to add a line, and deleting the oldest record (again, using the timestamp column you added) if adding the new record will make it too large.
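If you do want to enforce the ten-row cap in SQL, here is a sketch (the searches table name and id primary key are assumptions; the extra derived table works around MySQL's restriction on LIMIT inside an IN subquery and on deleting from a table you also select from):

DELETE FROM searches
WHERE id NOT IN (
    SELECT id FROM (
        SELECT id FROM searches ORDER BY timestamp DESC LIMIT 10
    ) AS newest_ten
);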
Falkon's answer is good - though you might not want to have your archive in a table, depending on your needs for that forensic data. You could also set up a cron job that just uses mysqldump to make a backup of the database (with the date in the filename), and then delete the excess records. This way you can easily make backups of your old data, or search it with whatever tool, and your database stays small.
You should write a PHP script that is started by cron (e.g. once a day) and moves some data from the main table TableName to an archive table TableNameArchive with exactly the same structure.
The SQL inside the script should look like:
INSERT INTO TableNameArchive
SELECT * FROM TableName WHERE data < '2010-06-01'; -- of course, you should provide your own condition here
Next, you should DELETE the old records from TableName.
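Something like this, reusing the same cutoff as the INSERT above:

DELETE FROM TableName WHERE data < '2010-06-01';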