I have a table for posts with columns like ID, Title, Content, etc. I just added a column called counter. It is a simple counter of visits: every time the post is viewed, the value is updated to $i+1. With this method I update the table on every visit, right after reading the row (in a single MySQL session).
Is it a good idea to move the counter into a separate table linked to the posts table with ID as a foreign key? With that method, updating a lighter table is faster, but every time I show a post I need to read two tables (the post and its statistics).
So my answer is: it depends on what you are trying to do. I'm going to give you some behind-the-scenes info on MySQL so you can decide when it does and doesn't make sense to create a counter table.
MySQL has different table engines. When you create a new table, you specify which engine to use; the most common ones are MyISAM and InnoDB. MyISAM is really good and fast at doing selects, and InnoDB is really fast at doing inserts. There are other differences between the two, but it's good to understand that the engine you select matters.
Based on the above, if you have a table that is mostly read and has a ton of rows, it might make sense to keep that table as MyISAM and create a separate counter table using InnoDB that keeps getting updated. If you are implementing caching, this is another reason this model works better: you won't have to clear the cached table data every time you update the counter, since the counter lives in a different table.
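A minimal sketch of what that split could look like (the post_counters table, its columns, and the post ID 123 are placeholders, not taken from the question):

-- Hypothetical counter table, kept separate from the read-heavy posts table.
CREATE TABLE post_counters (
    post_id INT UNSIGNED NOT NULL PRIMARY KEY,
    visits  INT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;

-- One statement per visit: creates the row on the first hit, increments it afterwards.
INSERT INTO post_counters (post_id, visits)
VALUES (123, 1)
ON DUPLICATE KEY UPDATE visits = visits + 1;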
Now, some may argue that you should be using InnoDB everywhere because it has many more benefits, but there are replication strategies that let you get the best of both worlds.
I hope this gives you a general understanding so you can then dig deeper and find your answer. More info at: http://www.mysqlperformanceblog.com/2007/07/01/implementing-efficient-counters-with-mysql/
I have a MySQL database that is becoming really large. I can feel the site becoming slower because of this.
Now, on a lot of pages I only need a certain part of the data. For example, I store information about users every 5 minutes for history purposes. But on one page I only need the newest information (not the whole history). I achieve this with a simple MAX(date) in my query.
Now I'm wondering whether it would be better to make a separate table that stores only the latest data, so that the query doesn't have to search for a specific user's latest row among millions of rows, but instead hits a table with only the latest data for every user.
The con here would be that I have to run two queries every 5 minutes to record the latest data: insert the new data into the history table and update the row in the latest-data table.
The pro would be that MySQL has a lot less data to go through.
What are common ways to handle this kind of issue?
There are a number of ways to handle slow queries in large tables. The three most basic ways are:
1: Use indexes, and use them correctly. It is important to avoid table scans on large tables; this is almost always your most significant performance hit with single queries.
For example, if you're querying something like: select max(active_date) from activity where user_id=?, then create an index on the activity table for the user_id column. You can have multiple columns in an index, and multiple indexes on a table.
CREATE INDEX idx_user ON activity (user_id)
2: Use summary/"cache" tables. This is what you have suggested. In your case, you could apply an insert trigger to your activity table, which will update your summary table whenever a new row gets inserted. This means you won't need your code to execute two queries. For example:
CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
UPDATE activity_summary SET last_active_date=new.active_date WHERE user_id=new.user_id
You can change that to check whether a row already exists for the user and do an insert if it is their first activity, or you could insert a row into the summary table when a user registers... or whatever suits your application.
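One way to handle the "row may not exist yet" case is to replace the trigger above with a single upsert. A sketch, assuming activity_summary has a unique or primary key on user_id:

DROP TRIGGER IF EXISTS update_summary;

CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
  -- Inserts a summary row on the user's first activity, updates it on every later one.
  INSERT INTO activity_summary (user_id, last_active_date)
  VALUES (new.user_id, new.active_date)
  ON DUPLICATE KEY UPDATE last_active_date = new.active_date;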
3: Review the query! Use MySQL's EXPLAIN command to grab a query plan and see what the optimizer does with your query. Use it to ensure that the optimizer is avoiding table scans on large tables (and either create or force an index if necessary).
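For example, using the query from point 1 (42 stands in for a real user_id):

-- Shows whether the optimizer uses idx_user or falls back to a full table scan.
EXPLAIN SELECT MAX(active_date) FROM activity WHERE user_id = 42;

-- If the optimizer picks the wrong plan, an index can be forced explicitly:
SELECT MAX(active_date) FROM activity FORCE INDEX (idx_user) WHERE user_id = 42;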
So... assuming I have a database with three tables:
Table clients
Table data
and Table clients_to_data
And I have an API which allows clients to access data from Table data. Every client has a record in Table clients (with things like IP address etc.). To log who accesses what, I log to the table clients_to_data (which contains the ID from table clients, the ID from table data, and a timestamp).
Every time a user accesses my API, it gets logged in the clients_to_data table. (So records in clients and data are not updated, just read.)
I also want to be able to get the number of hits per client. Pretty easy: just query the clients_to_data table with a client_id and count the results. But as my DB grows, I'll have tens of thousands of records in the clients_to_data table.
And here's my question:
Is it better practice to add a field "hits" to Table clients that stores the number of hits for that client and increment it every time the client queries the API, or to keep counting the rows in clients_to_data whenever I need the number?
Adding the "hits" field would introduce redundancy into the DB, which I've heard is generally a bad thing. But in this case I think it would speed up retrieving the number of hits.
So which method is better and faster in this case? Thanks for your help!
Faster when?
Appending a row to the log table will be faster than finding a record and updating it, and much faster than reading it, incrementing it and updating it.
However, having the hits "precalculated" will be faster than running an aggregate query to count them.
What you gain on the swings you lose on the roundabouts; which choice you make depends on your current usage patterns. So, are you prepared to slow down recording a hit in order to gain a significant boost when finding out how many you've had?
Obviously, selecting a single integer column from a table will be faster than selecting a count() of rows from a table.
The complexity trade-off is a bit moot: one way you need to write slightly more complex SQL, the other way you need to update/insert into two tables in your code.
How often is the number of hits queried? Do your clients look it up, or do you check it once a month? If you only look at it now and then, I probably wouldn't be too concerned about the time taken to select count(*).
If your clients look up the hit count with every request, then I would look at storing a hits column.
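To make the trade-off concrete, here is a sketch of both options (the id/client_id column names and the value 42 are assumptions, not from the question):

-- Option 1: aggregate on demand from the log table
-- (an index on clients_to_data(client_id) keeps this from scanning the whole table).
SELECT COUNT(*) AS hits FROM clients_to_data WHERE client_id = 42;

-- Option 2: pay a little extra on every API call, read the count back cheaply.
UPDATE clients SET hits = hits + 1 WHERE id = 42;
SELECT hits FROM clients WHERE id = 42;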
Now that our table structures are all clearly defined, let's get to work.
You want to record in the DB the number of times each client has accessed the data; in other terms, insert a record into the clients_to_data table for every client "impression".
You are worried about two things:
1. Redundancy
2. Performance when retrieving the count
But what about the performance when storing the count (the INSERT statements)?
This is a classic scenario where I would write the data to be inserted into memcache and do a bulk insert at the end of the day.
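A rough sketch of such an end-of-day flush, once the buffered impressions are read back out of memcache (the column names and values here are assumptions, not from the question):

-- One multi-row INSERT instead of thousands of single-row ones.
INSERT INTO clients_to_data (client_id, data_id, accessed_at) VALUES
    (42, 7, '2014-01-15 10:15:00'),
    (42, 9, '2014-01-15 10:16:30'),
    (57, 7, '2014-01-15 10:17:05');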
More importantly, I would normalize the data before inserting it into the DB.
As for selects, create indexes. If it's text, install Sphinx.
Thanks.
I'm developing software for conducting online surveys. When a lot of users are filling in a survey simultaneously, I'm experiencing trouble handling the high database write load. My current table (MySQL, InnoDB) for storing survey data has the following columns: dataID, userID, item_1 .. item_n. The item_* columns have different data types corresponding to the type of data acquired with the specific items. Most item columns are TINYINT(1), but there are also some TEXT item columns. Large surveys can have more than a hundred items, leading to a table with more than a hundred columns. The user answers around 20 items in one HTTP POST, and the corresponding row has to be updated accordingly. The user may skip a lot of items, leading to a lot of NULL values in the row.
I'm considering the following solution to my write load problem. Instead of having a single table with many columns, I set up several tables corresponding to the data types used, e.g.: data_tinyint_1, data_smallint_6, data_text. Each of these tables would have only the following columns: userID, itemID, value (the value column has the data type corresponding to its table). For one HTTP POST with e.g. 20 items, I might then have to create 19 rows in data_tinyint_1 and one row in data_text (instead of updating one large row with many columns). However, for every item I need to determine its data type (via two table joins) so I know which table to create the new row in. My Zend Framework based application code will get more complicated with this approach.
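To make that concrete, one of these per-type tables would look roughly like this (a sketch using the names above):

-- One row per answered item instead of one wide row per user.
CREATE TABLE data_tinyint_1 (
    userID INT UNSIGNED NOT NULL,
    itemID INT UNSIGNED NOT NULL,
    value  TINYINT(1) NOT NULL,
    PRIMARY KEY (userID, itemID)
) ENGINE=InnoDB;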
My questions:
Will my solution be better for heavy write load?
Do you have a better solution?
Since you're getting to the point of abstracting this schema to mimic actual data types, it might stand to reason that you should simply create new table sets per survey instead. The benefit is that locking will lessen, and you could isolate heavy loads on separate machines if the load becomes unbearable.
The single-survey database structure can then more accurately reflect your real-world conditions and data-input handlers. It ought to make your abstraction headaches go away.
There's nothing wrong with creating tables on the fly. In some configurations, soft sharding is preferable.
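As a rough sketch of the per-survey idea (the survey ID in the table name and the item columns are invented for illustration):

-- One table per survey, created when the survey itself is created.
CREATE TABLE survey_123_data (
    dataID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    userID INT UNSIGNED NOT NULL,
    item_1 TINYINT(1) NULL,
    item_2 TINYINT(1) NULL,
    item_3 TEXT NULL
) ENGINE=InnoDB;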
The obvious solution here would be to use a document database for fast writes and then bulk-insert the answers into MySQL asynchronously using cron or something like that. You can create a view in the document database for quick statistics, but allow filtering and other complicated stuff only in MySQL if you're not a fan of document DBMSs.
I am simulating several instruction queues using a MySQL table. There is a 'mode' column which acts as the name of each queue, and once items are taken from a queue they are deleted right afterwards. Typical queries look like
SELECT * FROM queue_table WHERE mode='queue1' LIMIT 50;
I am currently using a MyISAM table for this, but there is a lot of overhead from all the deleting, and optimization takes a long time. I was just wondering if there is a more efficient way to do this, and whether the table should maybe be InnoDB.
InnoDB is useful if you are applying foreign key constraints.
One thing for optimizing your query:
create an index on the mode column, and don't use * (your table may have a lot of columns); select only the columns you actually need to retrieve.
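Something like this, where `payload` stands in for whichever columns the worker actually needs:

-- Index the column used in the WHERE clause...
CREATE INDEX idx_mode ON queue_table (mode);

-- ...and fetch only the needed columns instead of *.
SELECT id, payload FROM queue_table WHERE mode='queue1' LIMIT 50;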
I cannot say I completely understand your problem, but I am sure that if you make your fields a fixed size (i.e. no text fields, CHAR instead of VARCHAR), the table records become extremely reusable and require no optimization at all.
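For illustration, a fixed-width version of such a queue table might look like this (the column names are guesses, not from the question):

-- Every column has a fixed width, so MyISAM can use its static row format
-- and reuse the space left behind by deleted rows without fragmenting.
CREATE TABLE queue_table (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    mode    CHAR(20)  NOT NULL,   -- CHAR instead of VARCHAR
    payload CHAR(255) NOT NULL    -- no TEXT/BLOB columns
) ENGINE=MyISAM ROW_FORMAT=FIXED;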
I'm working on a basic php/mysql CMS and have a few questions regarding performance.
When viewing a blog page (or other sortable data) from the front-end, I want to allow a simple 'sort' variable to be added to the query string, allowing posts to be sorted by any column. Obviously I can't just accept anything from the query string, and need to make sure the column actually exists on the table.
At the moment I'm using
SHOW TABLES;
to get a list of all of the tables in the database, then looping over the array of table names and performing
SHOW COLUMNS FROM `tablename`;
on each.
My worry is that my CMS might take a performance hit here. I thought about using a static array of the table names but need to keep this flexible as I'm implementing a plugin system.
Does anybody have any suggestions on how I can keep this more concise?
Thank you
If you're using MySQL 5+, you'll find the information_schema database useful for your task. In it you can access information about tables, columns, and references via simple SQL queries. For example, you can check whether a specific column exists in a table:
SELECT COUNT(*) FROM INFORMATION_SCHEMA.COLUMNS
WHERE
    TABLE_SCHEMA='your_database_name' AND
    TABLE_NAME='your_table' AND
    COLUMN_NAME='your_column';
And here is how to list the tables in which a specific column exists:
SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME='your_column';
Since you're currently hitting the db twice before you do your actual query, you might want to consider just wrapping the actual query in a try{} block. Then if the query works you've only done one operation instead of 3. And if the query fails, you've still only wasted one query instead of potentially two.
The important caveat (as usual!) is that any user input be cleaned before doing this.
You could query the table up front and store the columns in a cache layer (e.g. memcache or APC). You could then set the expiry time on the cache entry to infinite, and only delete and re-create it when a plugin has been newly added, updated, etc.
I guess the best bet is to put all that stuff you're getting from SHOW TABLES etc. in a file ahead of time and just include it, instead of running those queries every time. Or implement some sort of caching if the project is still in development and you think the fields will change.