I'm working on a basic php/mysql CMS and have a few questions regarding performance.
When viewing a blog page (or other sortable data) from the front-end, I want to allow a simple 'sort' variable to be added to the querystring, allowing posts to be sorted by any column. Obviously I can't accept anything from the querystring, and need to make sure the column exists on the table.
At the moment I'm using
SHOW TABLES;
to get a list of all of the tables in the database, then looping the array of table names and performing
SHOW COLUMNS;
on each.
My worry is that my CMS might take a performance hit here. I thought about using a static array of the table names but need to keep this flexible as I'm implementing a plugin system.
Does anybody have any suggestions on how I can keep this more concise?
Thankyou
If you using mysql 5+ then you'll find database information_schema usefull for your task. In this database you can access information of tables, columns, references by simple SQL queries. For example you can find if there is specific column at the table:
SELECT count(*) from COLUMNS
WHERE
TABLE_SCHEMA='your_database_name' AND
TABLE_NAME='your_table' AND
COLUMN_NAME='your_column';
Here is list of tables with specific column exists:
SELECT TABLE_SCHEMA, TABLE_NAME from COLUMNS WHERE COLUMN_NAME='your_column';
Since you're currently hitting the db twice before you do your actual query, you might want to consider just wrapping the actual query in a try{} block. Then if the query works you've only done one operation instead of 3. And if the query fails, you've still only wasted one query instead of potentially two.
The important caveat (as usual!) is that any user input be cleaned before doing this.
You could query the table up front and store the columns in a cache layer (i.e. memcache or APC). You could then set the expire time on the file to infinite and only delete and re-create the cache file when a plugin has been newly added, updated, etc.
I guess the best bet is to put all that stuff ur getting from Show tables etc in a file already and just include it, instead of running that every time. Or implement some sort of caching if the project is still in development and u think the fields will change.
Related
I have a MySQL database that is becoming really large. I can feel the site becoming slower because of this.
Now, on a lot of pages I only need a certain part of the data. For example, I store information about users every 5 minutes for history purposes. But on one page I only need the information that is the newest (not the whole history of data). I achieve this by a simple MAX(date) in my query.
Now I'm wondering if it wouldn't be better to make a separate table that just stores the latest data so that the query doesn't have to search for the latest data from a specific user between millions of rows but instead just has a table with only the latest data from every user.
The con here would be that I have to run 2 queries to insert the latest history in my database every 5 minutes, i.e. insert the new data in the history table and update the data in the latest history table.
The pro would be that MySQL has a lot less data to go through.
What are common ways to handle this kind of issue?
There are a number of ways to handle slow queries in large tables. The three most basic ways are:
1: Use indexes, and use them correctly. It is important to avoid table scans on large tables; this is almost always your most significant performance hit with single queries.
For example, if you're querying something like: select max(active_date) from activity where user_id=?, then create an index on the activity table for the user_id column. You can have multiple columns in an index, and multiple indexes on a table.
CREATE INDEX idx_user ON activity (user_id)
2: Use summary/"cache" tables. This is what you have suggested. In your case, you could apply an insert trigger to your activity table, which will update the your summary table whenever a new row gets inserted. This will mean that you won't need your code to execute two queries. For example:
CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
UPDATE activity_summary SET last_active_date=new.active_date WHERE user_id=new.user_id
You can change that to check if a row exists for the user already and do an insert if it is their first activity. Or you can insert a row into the summary table when a user registers...Or whatever.
3: Review the query! Use MySQL's EXPLAIN command to grab a query plan to see what the optimizer does with your query. Use it to ensure that the optimizer is avoiding table scans on large tables (and either create or force an index if necesary).
I am planing to design a database which may have to store huge amounts of data. But i am not sure which way i should use for this? the records may have fields like user id, record date, group, coordinate and perhaps other properties like that, but the key is the user id.
then i may have to call (select) or process the records with that user id. there may be thousands of user ids so here is the question.
1-) on every record; i should directly store all records in a single table? and
then call or process them like "... WHERE userId=12345 ...".
2-) on every record; i should check if there exists a table with that
user id and if not create a new table with the user id as table name
and store its data in that table. and then call or process them with
"SELECT * FROM ...".
So what would you suggest me?
There are different views about using many databases vs many tables. the common view is that there isn't any performance disadvantage. i prefered to go with the 1st way (single table). the project is finished and there arent any problems. i dont need to change the table all the time. but my main reason was because it is a little bit more complicated and time-consuming to program many tables style.
1-) on every record; i should directly store all records in a single table? and then call or process them like "... WHERE userId=12345 ...".
besides that here is a link of mysql.com about many tables that could be.
Disadvantages of Creating Many Tables in the Same Database
If you have many MyISAM tables in the same database directory, open, close, and create operations are slow. If you execute SELECT statements on many different tables, there is a little overhead when the table cache is full, because for every table that has to be opened, another must be closed. You can reduce this overhead by increasing the number of entries permitted in the table cache.
(http://dev.mysql.com/doc/refman/5.7/en/creating-many-tables.html)
I'm developing software for conducting online surveys. When a lot of users are filling in a survey simultaneously, I'm experiencing trouble handling the high database write load. My current table (MySQL, InnoDB) for storing survey data has the following columns: dataID, userID, item_1 .. item_n. The item_* columns have different data types corresponding to the type of data acquired with the specific items. Most item columns are TINYINT(1), but there are also some TEXT item columns. Large surveys can have more than a hundred items, leading to a table with more than a hundred columns. The users answers around 20 items in one http post and the corresponding row has to be updated accordingly. The user may skip a lot of items, leading to a lot of NULL values in the row.
I'm considering the following solution to my write load problem. Instead of having a single table with many columns, I set up several tables corresponding to the used data types, e.g.: data_tinyint_1, data_smallint_6, data_text. Each of these tables would have only the following columns: userID, itemID, value (the value column has the data type corresponding to its table). For one http post with e.g. 20 items, I then might have to create 19 rows in data_tinyint_1 and one row in data_text (instead of updating one large row with many columns). However, for every item, I need to determine its data type (via two table joins) so I know in which table to create the new row. My zend framework based application code will get more complicated with this approach.
My questions:
Will my solution be better for heavy write load?
Do you have a better solution?
Since you're getting to a point of abstracting this schema to mimic actual datatypes, it might stand to reason that you should simply create new table sets per-survey instead. Benefit will be that the locking will lessen and you could isolate heavy loads to outside machines, if the load becomes unbearable.
The single-survey database structure then can more accurately reflect your real world conditions and data input handlers. It ought to make your abstraction headaches go away.
There's nothing wrong with creating tables on the fly. In some configurations, soft sharding is preferable.
This looks like obvious solution would be to use document database for fast writes and then bulk-insert answers to MySQL asynchronously using cron or something like that. You can create view in the document database for quick statistics, but allow filtering and other complicated stuff only in MySQ if you're not a fan of document DBMSs.
I have around 150 different databases, with dozens of tables each on one of my servers. I am looking to see which database contains a specific person's name. Right now, i'm using phpmyadmin to search each database indvidually, but I would really like to be able to search all databases and all tables at once. Is this possible? How would I go about doing this?
A solution would be to use the information_schema database, to list all database, all tables, all fields, and loop over all that...
There is this script that could help for at least some part of the work : anywhereindb (quoting) :
This code is search all the tables and
all the rows and columns in a MYSQL
Database. The code is written in PHP.
For faster result, we are only
searching in the varchar field.
But, as Harmen noted, this only works with one database -- which means you'd have to wrap something arround it, to loop over each database on your server.
For more informations about that, take a look at Chapter 19. INFORMATION_SCHEMA Tables ; especially, the SCHEMATA table, which contains the name of all databases on the server.
Here's another solution, based on a stored procedure -- which means less client/server calls, which might make it faster : http://kedar.nitty-witty.com/miscpages/mysql-search-through-all-database-tables-columns-stored-procedure.php
The right way to go about it would be to NORMALIZE your data in the first place!!!
You say name - but most people have at least 2 names (a surname and a forename) are these split up or in the same field? If they are in the same field, then what order do they appear in? how are they capitalized?
The most efficient way to try to identify where the data might be would be to write a program in C which sifts the raw data files (while the DBMS is shut down) looking for the data - but that will only tell you what table they apppear in.
Failing that you need to write some PHP which iterates through each database ('SHOW databases' works much like a select statement), then iterates through each table in the database, then generates a SELECT statement filtering on each CHAR or VARCHAR column large enough to hold the name you are looking for (try running 'DESC $table').
Good luck.
C.
The best answer probably depends on how often you want to do this. If it is ad-hoc once a week type stuff then the above answers are good.
If you want to do this kind of search once a second, maybe create a "data warehouse" database that contains just the table:columns you want to search (heavily indexed, with a reference back to the source database if that is needed) populated by cron job or by stored procedures driven by changes in the 150 databases...
I've got an application in php & mysql where the users writes and reads from a particular table. One of the write modes is in a batch, doing only one query with the multiple values. The table has an ID which auto-increments.
The idea is that for each row in the table that is inserted, a copy is inserted in a separate table, as a history log, including the ID that was generated.
The problem is that multiple users can do this at once, and I need to be sure that the ID loaded is the correct.
Can I be sure that if I do for example:
INSERT INTO table1 VALUES ('','test1'),('','test2')
that the ids generated are sequential?
How can I get the Id's that were just loaded, and be sure that those are the ones that were just loaded?
I've thinked of the LOCK TABLE, but the users shouldn't note this.
Hope I made myself clear...
Building an application that requires generated IDs to be sequential usually means you're taking a wrong approach - what happens when you have to delete a value some day, are you going to re-sequence the entire table? Much better to just let the values fall as they may, using a primary key to prevent duplication.
based on the current implementation of myisam and innodb, yes. however, this is not guaranteed to be so in the future, so i would not rely on it.