What is better: an extra query or an extra column in the database? - php

Which is better: an extra query or an extra column in the database, for data that will be needed only rarely?
Example: for sub-user management, I could either add one extra column super_user_id to the main users table (making an entry when the user type is sub_user, with a default value of -1), or I could create a new table and manage sub-users in that table.
But in the case of login I would then have to search in two tables, which means making one more query.
Thanks

There is no general answer; you'll have to be more specific. All I can provide are general principles.
All else being equal, you'll be better off with a well-normalized database without redundant information, for a number of reasons. But there are situations where redundant information could save your program a lot of time. One example is text formatted with Markdown: you need to store the original markup to allow for editing, but formatting the source every time you need the output may be extremely taxing on the system. Therefore, you might add a redundant column to store the formatted output and assume the additional responsibility of ensuring that that column is kept up-to-date.
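As a concrete illustration, here is a minimal sketch of such a redundant column; the table and column names are hypothetical:
-- body_markdown is the authoritative source, kept for editing;
-- body_html is a redundant cache that the application must
-- regenerate on every UPDATE of body_markdown.
CREATE TABLE posts (
    id            INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    body_markdown TEXT NOT NULL,
    body_html     MEDIUMTEXT NOT NULL
);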
All I know about your situation is that the postulated extra column would save a query. The only correct answer to that is that you should probably keep your table clean and minimal unless you know that the performance benefit of saving one query will make up for it. Remember, premature optimization is the root of all evil – you may find that your application runs more than fast enough anyway. If you find while profiling that the extra query is a significant bottleneck, then you might consider adding the column.
Again, without more knowledge of your situation, it is impossible to provide a specific or concrete recommendation, but I hope that I've at least helped you to come to a decision.

Do you mean calculating a value in your query versus storing a pre-calculated value?
This depends on how often it will be updated, how big the data will be, and how often it is needed. There may be no theoretical best answer; you will need to test and profile.

It depends on the amount of redundancy you will add to the table by adding a column.
With proper indexing and design, joins work well, so there is no need to be afraid of normalizing where required.

Use the second table. It will not require you to issue two queries. Instead, you will issue a single query JOINing the two tables together or, better yet, create a VIEW that does the JOIN for you:
SELECT usertable.col1, usertable.col2, superusertable.superuserid
FROM usertable
LEFT OUTER JOIN superusertable
    ON usertable.userid = superusertable.userid
This allows you to maintain proper normalized structure, helps you in certain queries (like figuring out who is a super_user), and allows the database to optimize the search issues.
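As mentioned, the JOIN can also be wrapped in a view so that login code queries it like a single table. A minimal sketch, reusing the hypothetical table names above:
CREATE VIEW user_with_superuser AS
SELECT usertable.userid, usertable.col1, usertable.col2,
       superusertable.superuserid
FROM usertable
LEFT OUTER JOIN superusertable
    ON usertable.userid = superusertable.userid;

-- Login code now needs only one statement:
SELECT userid, superuserid FROM user_with_superuser WHERE userid = 42;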

Doing an additional query will always take more time.
Adding an extra column to the DB will not have any significant impact, even if you have thousands of rows.
Ergo, add the extra column and save DB traffic :)

Related

Best practice for filtering MySQL results

I want to implement a filter-function in my PHP project.
To implement a filter, I usually just add a WHERE clause in my query to show filtered results.
My problem is:
These filters require not only a simple added WHERE clause, but a huge query including multiple JOINs. The resulting query has > 30 lines.
Later, there should also be a search function which would then also require this huge query.
I wonder whether this is good practice, or whether I should add a "redundant" database column to my table, where I compute the attribute I need for filtering on every update.
With this column, I wouldn't have my huge query in different places all over my project, but I would have a redundant column.
What do you think?
Greetings
As requested, here is the table structure. This is not the exact code, because there is also a revision system which makes it even more complex, but it is enough for understanding:
table submissions:
ID (primary)
(additionalColumns)
table reports:
ID (primary)
submissionID (reference to submission table)
(additionalColumns)
table report_objects:
reportID (reference to reports table, multiple report_object for one report)
table accounting:
ID (primary)
reportID (reference to reports table, multiple accountings for one report)
(additionalColumns)
table accounting_objects:
ID
accountingID (reference to accounting table, multiple accounting_object for one accounting)
(additionalColumns)
For a submission, one or more reports are created, each with multiple objects to account (report_objects).
For each report, I can create multiple accountings, where each accounting covers a few objects of the report. The accounted report_objects are stored in accounting_objects.
My query/filter checks, for one submissionID, whether each report_object of that submission has been accounted (i.e. a matching accounting_object exists).
There isn't one definitive answer and, in practice, if it works and runs quickly enough for your needs then you can leave it as is. Optimization is always something you can come back to.
Joining correctly
If you are simply checking for the existence of a row in a joined table and only including results where that join matches, you can do this with the correct LEFT / RIGHT JOIN expressions. This should always be your first approach.
Expressiveness
Also, be as expressive as you can with SQL; you want to give it the best chance to optimize your query. There are keywords, such as EXISTS, that exist for exactly this purpose; make sure to use them.
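To illustrate, the check from the question ("every report_object of a submission is accounted") can be phrased with NOT EXISTS. This is only a sketch under assumptions: it supposes report_objects has an ID primary key and that accounting_objects carries a reportObjectID link column, neither of which is shown in the simplified schema:
-- Submissions where no report_object is left unaccounted
SELECT s.ID
FROM submissions s
WHERE NOT EXISTS (
    SELECT 1
    FROM reports r
    JOIN report_objects ro ON ro.reportID = r.ID
    WHERE r.submissionID = s.ID
      AND NOT EXISTS (
          SELECT 1
          FROM accounting a
          JOIN accounting_objects ao ON ao.accountingID = a.ID
          WHERE a.reportID = r.ID
            AND ao.reportObjectID = ro.ID  -- assumed link column
      )
);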
Denormalization
You can add a column that stores the computed value; the complexity that arises from this is ensuring that the value is always kept up to date. This can be done with triggers or manually in code (see the sketch after this list). The pros:
It is the easiest method of getting around slowness introduced by computed columns.
The cons:
Ruins your nice normalized schema
If you do it manually in code, you will forget to do it somewhere, causing headaches.
Triggers can be a bit of a pain.
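Here is a minimal trigger sketch for the denormalized-column approach, assuming a hypothetical cached object_count column on reports; a matching AFTER DELETE trigger would be needed as well:
CREATE TRIGGER report_objects_after_insert
AFTER INSERT ON report_objects
FOR EACH ROW
    UPDATE reports
    SET object_count = object_count + 1   -- hypothetical cached column
    WHERE ID = NEW.reportID;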
Materialized view
This is like denormalization, but it prevents polluting your normalized tables by creating a separate stored results table. In MySQL this is achieved by storing the result of your complex SELECT in a results table whenever the underlying values change. Again, as with denormalization, the complexity is keeping this up to date; it is typically done with triggers. This can be a pain but keeps the complexity out of your schema. As mentioned by @eggyal, it isn't a supported feature of MySQL yet, so you will have to DIY (a sketch follows the list below):
Materialized views with MySQL
Pros:
Keeps dirty denormalized stuff away from your nice normalized schema.
Cons:
Materialized views aren't supported so setting them up requires work.
If you refresh your views from application code you may get stale data, but this isn't quite as painful as the single-column staleness of denormalization.
Triggers can be a bit of a pain.
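A sketch of the DIY approach: a plain results table stands in for the materialized view, and a refresh statement repopulates it (the summary table and its columns are hypothetical):
-- Results table standing in for the unsupported materialized view
CREATE TABLE report_object_counts (
    reportID     INT NOT NULL PRIMARY KEY,
    object_count INT NOT NULL
);

-- Refresh step, run from a trigger, a cron job, or application code
REPLACE INTO report_object_counts (reportID, object_count)
SELECT r.ID, COUNT(ro.reportID)
FROM reports r
LEFT JOIN report_objects ro ON ro.reportID = r.ID
GROUP BY r.ID;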
If you aren't sure, and it really matters, do some benchmarking.
EDIT: If your code has this query in one form or another across your code base, then that has the potential to cause headaches in the future, as you will have to remember to change the statements in all of those places if or when they change.
If, by doing the above, you have made your statements really simple and concise, then they may differ enough from each other for this not to be a problem.
You can do some things to help you out:
Put all of the related queries in a single place, i.e. a single class or script that handles this query in its various forms. This way at least all of the changes are limited to the one file.
You can, to help yourself out a bit more, do a bit of refactoring to remove duplication between the queries.
Also, if you feel the database information is too exposed to the code, you may want to abstract it behind a view.

SQL query is much faster if I create indexes

Is it ok if I create like 8 indexes inside a table which has 13 columns?
If I select data from it and sort the results by a key, the query is really fast, but if the sort field is not a key it's much slower. Like 40 times slower.
What I'm basically asking is if there are any side effects of having many keys in the database...
Creating indexes on a table slows down all write operations on it a little, but speeds up read operations on the relevant columns a lot. If your application is not going to be doing lots and lots of writes to that table (which is true of most applications) then you are going to be fine.
Don't create indexes that are redundant or unused. But do create indexes you need to optimize the queries you run.
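For example, if the slow sort in the question is on a column such as created_at (a hypothetical name), a single index on it lets MySQL read the rows in order instead of sorting them at query time:
CREATE INDEX idx_created_at ON mytable (created_at);

-- This ORDER BY can now be satisfied by the index:
SELECT id, title
FROM mytable
ORDER BY created_at DESC
LIMIT 50;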
You choose indexes in any table based on your queries. Each query may use a different index, so it pays to analyze your queries carefully. See my presentation MENTOR Your Indexes. I also cover similar information in the chapter on indexing in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
There is no specific rule about how many indexes is too many. In Oracle SQL Tuning Pocket Reference, author Mark Gurry says:
My recommendation is to avoid rules stating a site will not have any more than a certain number of indexes. The bottom line is that all SQL statements must run acceptably. There is ALWAYS a way to achieve this. If it requires 10 indexes on a table, then you should put 10 indexes on the table.
There are a couple of good tools to help you find redundant or unused indexes for MySQL in Percona Toolkit: http://www.percona.com/doc/percona-toolkit/pt-duplicate-key-checker.html and pt-index-usage.
This is a good question and everyone who works with mysql should know the answer. It is also commonly asked. Here is a link to one of them with a good answer:
Indexing every column in a table
In a nutshell, each new index requires space (especially if you use InnoDB - see the "Disadvantages of clustering" section in this article) and slows down INSERTs, UPDATEs and DELETEs.
Only you are in a position to decide whether the speedup you'll get in SELECTs, weighed against the frequency with which they run, is worth it. But whatever you eventually decide, make sure you base your decision on measurement, not guessing!
P.S. INSERTs, UPDATEs and DELETEs with WHERE clauses can also be sped up by indexes, but that's another topic...
The cost of an index in disk space is generally trivial. The cost of additional writes to update the index when the table changes is often moderate. The cost in additional locking can be severe.
It depends on the read vs write ratio on the table, and on how often the index is actually used to speed up a query.
Indexes use up disk space to store, and take time to create and maintain. Unused ones don't give any benefit. If there are lots of candidate indexes for a query, the query may be slowed down by the server choosing the "wrong" one.
Use those factors to decide whether you need an index.
It is usually possible to create indexes which will NEVER be used - for example, an index on a (not null) field with only two possible values is almost certainly going to be useless.
You need to EXPLAIN your own application's queries to make sure that the frequently-performed ones are using sensible indexes where possible, and create no more indexes than required to do that.
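EXPLAIN shows which index, if any, a query actually uses; a sketch with hypothetical table and column names:
EXPLAIN
SELECT id, title
FROM mytable
WHERE category_id = 7
ORDER BY created_at DESC;
-- In the output, key = NULL means no index was chosen, and
-- "Using filesort" under Extra means the ORDER BY is not index-backed.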
You can learn more by following these links:
For mysql:
http://www.mysqlfaqs.net/mysql-faqs/Indexes/What-are-advantages-and-disadvantages-of-indexes-in-MySQL
For DB2:
http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/c0005052.htm
Indexes improve read performance, but they increase size and degrade inserts/updates. Eight indexes seems like a bit too many to me; however, it depends on how often you typically update the table.
Assuming MySQL from tag, even though OP makes no mention of it.
You should edit your question to add the fact that you are performing ORDER BY operations as well (from a comment you posted on a solution). ORDER BY operations will also slow down queries (as will various other MySQL operations), because MySQL may have to create a temp table to produce the ordered result set (more info here). A lot of times, if the dataset allows it, I will pull the data I need and then order it at the application layer to avoid this penalty.
Your best bet is to EXPLAIN your most used queries, and check your slow query log.
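Enabling the slow query log takes only a couple of settings (MySQL 5.1 and later; the threshold and file path here are just examples):
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- log statements slower than 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';  -- example path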

Which is faster in SQL: many Many MANY tables vs one huge table?

I am in the process of creating a website where I need to have the activity for a user (similar to your inbox in stackoverflow) stored in sql. Currently, my teammates and I are arguing over the most effective way to do this; so far, we have come up with two alternate ways to do this:
Create a new table for each user and have the table name be theirusername_activity. Then when I need to get their activity (posting, being commented on, etc.) I simply get that table and see the rows in it...
In the end I will have a TON of tables
Possibly Faster
Have one huge table called activity, with an extra field for their username; when I want to get their activity I simply get the rows from that table "...WHERE username=".$loggedInUser
Fewer tables, cleaner
(assuming I index the tables correctly, will this still be slower?)
Any alternate methods would also be appreciated
"Create a new table for each user ... In the end I will have a TON of tables"
That is never a good way to use relational databases.
SQL databases can cope perfectly well with millions of rows (and more), even on commodity hardware. As you have already mentioned, you will obviously need usable indexes to cover all the possible queries that will be performed on this table.
Number 1 is just plain crazy. Can you imagine going to manage it and seeing all those tables?
Can you imagine the backup! Or the dump! That many CREATE TABLE statements... that would be crazy.
Get yourself a good index, and you will have no problem sorting through the records.
Here we are talking about MySQL. So why would it be faster to make separate tables?
Query cache efficiency: each insert from one user wouldn't empty the query cache for the others.
Memory and pagination: used tables would fit in the buffers, and unused data would simply not be loaded there.
But as everybody here has said, it seems quite crazy in terms of management. And in terms of performance, having a lot of tables adds another problem in MySQL: you may run out of file descriptors or simply wipe out your table cache.
It may be more important here to choose the right engine, like MyISAM instead of InnoDB, as this is an insert-only table. And as @RC said, a good partitioning policy would fix the memory and pagination problem by avoiding the loading of rarely used data into active memory buffers. This should be combined with an intelligent application design, where you avoid loading the entire activity history by default; if you reduce it to recent activity and restrict full history-table parsing to batch processes and advanced screens, you'll get a nice effect from the partitioning. You can even try a user-based partitioning policy.
For query cache efficiency, you'll get a bigger gain by using an application-level cache (like memcache), with per-user history elements saved there and emptied on each new insert.
You want the second option, and you add the userId (and possibly a separate table for userid, username, etc.).
If you do a lookup on that id on a properly indexed field, you'd only need something like log(n) steps to find your rows. This is hardly anything at all. It will be way faster, way clearer and way better than option 1. Option 1 is just silly.
In some cases, the first option is, in spite of not being strictly "the relational way", slightly better, because it makes it simpler to shard your database across multiple servers as you grow. (Doing this is precisely what allows wordpress.com to scale to millions of blogs.)
The key is to only do this with tables that are entirely independent from a user to the next -- i.e. never queried together.
In your case, option 2 makes the most sense: you'll almost certainly want to query the activity across all or some users at some point.
Use option 2, and not only index the username column, but partition (consider a hash partition) on that column as well. Partitioning on username will provide you some of the same benefits as the first option and allow you to keep your sanity. Partitioning and indexing the column this way will provide a very fast and efficient means of accessing data based on the username/user_key. When querying a partitioned table, the SQL Engine can immediately lop off partitions it doesn't need to scan as it can tell based off of the username value queried vs. the ability of that username to reside within a partition. (in this case only one partition could contain records tied to that user) If you have a need to shard the table across multiple servers in the future, partitioning doesn't hinder that ability.
You will also want to normalize the table by separating the username field (and any other elements in the table related to username) into its own table with a user_key. Ensure a primary key on the user_key field in the username table.
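A sketch of what that could look like; all names and the partition count are illustrative, and note that MySQL requires the partitioning column to be part of every unique key, which is why user_key is included in the primary key:
CREATE TABLE activity (
    id         BIGINT NOT NULL AUTO_INCREMENT,
    user_key   INT NOT NULL,
    action     VARCHAR(50) NOT NULL,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id, user_key),
    KEY idx_user (user_key)
)
PARTITION BY HASH (user_key)
PARTITIONS 16;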
This mostly depends on where you need to retrieve the values from. If it's a page for a single user, then use the first approach. If you are showing data for all users, you should use a single table. The multiple-table approach is also clean, but in SQL, if the number of records in a single table is very high, data retrieval becomes very slow.

What is the best strategy to store user searches for an email alert?

Users can do advanced searches (there are many possible parameters):
/search/?query=toto&topic=12&minimumPrice=0&maximumPrice=1000
I would like to store the search parameters (after the /search/?) for an email alert.
I have 2 possibilities:
Storing the raw request (query=toto&topicId=12&minimumPrice=0&maximumPrice=1000) in a table with a structure like id, parameters.
Storing the request in a structured table: id, query, topicId, minimumPrice, maximumPrice, etc.
Each solution has its pros and cons. Of course solution 2 is the cleaner one, but is it really worth the (over)effort?
If you have already implemented such a solution and have experience maintaining it, which is the best?
The better solution should be the best along each of these dimensions:
Rigidity
Fragility
Viscosity
Performance
Daniel's solution is likely the cleanest, but I take your point about performance. I'm not very familiar with PHP, but there should be some DB abstraction library that takes care of relations and multiple inserts so that you get the best performance, right? I only mention it because there may not be a real performance issue. Do you have load tests that point to an issue, perhaps?
Anyway, if it is between your original 2 solutions, I would have to select the first. Having a table with a column per parameter (like your solution #2) is just asking for trouble. If you add new parameters, you have to modify the table columns. And there is the ever-present issue of "what do we put to indicate not selected vs. left empty?"
So I don't agree that solution 2 is cleaner.
You could have a table consisting of three columns: search_id, key, value, with the first two forming the primary key. This way you can reconstruct a particular search if you have the ID of a saved search. It also allows you to add new search keywords without having to modify your table.
If you wish, you can also have key be a foreign key to another table containing valid search terms to ensure integrity. Whether you want to do that depends on your specific needs though.
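A sketch of that key/value structure (column names are illustrative; KEY is a reserved word in MySQL, so param_key is used instead):
CREATE TABLE saved_search_param (
    search_id   INT          NOT NULL,
    param_key   VARCHAR(64)  NOT NULL,
    param_value VARCHAR(255) NOT NULL,
    PRIMARY KEY (search_id, param_key)
);

-- Reconstructing one saved search:
SELECT param_key, param_value
FROM saved_search_param
WHERE search_id = 42;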
Well, that's completely dependent on what you want to do with the data. For the PHP part, you need to process it anyway, either at insertion or at selection time.
For a really large number of parameters, you may save some time with the 1st option on the database management/maintenance side, since you don't need to change anything about your database schema.
Daniel's answer is a good generic solution, but if you consider performance an issue, you may end up doing too many inserts on the database side for a single search (one for each parameter). Too many inserts is a common source of performance problems.
You know your resources.

Should I break a larger mysql table into multiple?

I have a pretty large social network type site I have been working on for about 2 years (high traffic and hundreds of files). I have been experimenting for the last couple of years with tweaking things for maximum performance under the traffic, and I have learned a lot. Now I have a huge task: I am planning to completely re-code my social network, so I am re-designing the MySQL DBs and everything.
Below is a photo I made up of a couple of MySQL tables that I have a question about. I currently have the login table, which is used in the login process; once a user is logged into the site they very rarely need to hit that table again unless editing an email or password. I then have a user table, which is basically the user's settings and profile data for the site. This is where I have questions: would it be better for performance to split the user table into smaller tables? For example, if you view the user table you will see several fields that I have marked as "setting_"; should I just create a separate settings table? I also have fields marked with "count", which could be total counts of comments, photos, friends, mail messages, etc. So should I create another table to store just the total counts of things?
The reason I have them all in 1 table now is because I was thinking maybe it would be better if I could cut down on MySQL queries; instead of hitting 3 tables to get information on every page load, I could hit 1.
Sorry if this is confusing, and thanks for any tips.
(image: http://img2.pict.com/b0/57/63/2281110/0/800/dbtable.jpg)
As long as you don't SELECT * FROM your tables, having 2 or 100 fields won't affect performance.
Just SELECT only the fields you're going to use and you'll be fine with your current structure.
should I just create a seperate setting table?
So should I create another table to store just the total count of things?
There is not a single correct answer for this; it depends on what your application does.
What you can do is measure and extrapolate the results in a dev environment.
On one hand, using a separate table will save you some space and the code will be easier to modify.
On the other hand, you may lose some performance (as you already suspect) by having to join information from different tables.
About the counts: I think it's fine to have them there. Although it is often said that it is better to calculate this kind of value on the fly, I don't think in this situation it hurts you at all.
But again, the only way to know what's better for you and your specific app is to measure and profile, and find out what the benefit of doing so would be. You might only gain a 2% improvement.
You'll need to compare performance testing results between the following:
Leaving it alone
Breaking it up into two tables
Using different queries to retrieve the login data and profile data (if you're not doing this already) with all the data in the same table
Also, you could implement some kind of caching strategy on the profile data if the usage data suggests this would be advantageous.
You should consider putting the counter columns and frequently updated timestamps in their own table: every time you bump them, the entire row is rewritten.
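A sketch of that split, with hypothetical names: the hot counters live in their own narrow table, so bumping one rewrites a tiny row instead of the whole profile row:
CREATE TABLE user_counters (
    user_id       INT NOT NULL PRIMARY KEY,
    comment_count INT NOT NULL DEFAULT 0,
    photo_count   INT NOT NULL DEFAULT 0,
    friend_count  INT NOT NULL DEFAULT 0
);

UPDATE user_counters
SET comment_count = comment_count + 1
WHERE user_id = 42;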
I wouldn't consider your user table terribly large in number of columns; just my opinion. I also wouldn't break that table into multiple tables unless you can find a case for removal of redundancy. Perhaps you have a lot of users who share the same settings; that would be a case for breaking the table out.
You should take into account the average size of a single row, in order to find out whether retrieval is expensive. You should also try to use indexes while looking for data.
The most important thing is to design properly, not just to split because "it looks large". Maybe the IP or IPs could go somewhere else... it depends on the data saved there.
Also, as the social network site using this data also handles the authentication and authorization processes (I am guessing so), the separation between the login and user tables should offer good performance, because the data in login is short enough, while the profile needs to be accessed only once, immediately after a successful login. Just apply the right tricks to improve DB performance and it's done.
(Remember to visualize tables as entities and name each one as an entity, not as a collection of them.)
Two things you will want to consider when deciding whether or not to break up a single table into multiple tables are:
MySQL likes small, consistent datasets. If you can structure your tables so that they have fixed row lengths, that will help performance, at the potential cost of disk space. One common approach, from what I can tell, is to keep fixed-length data in its own table while the variable-length data goes somewhere else.
Joins are in most cases less performant than not joining. If the data currently in your table will normally be accessed all at the same time, then it may not be worth splitting it up, as you will be slowing down both inserts and quite possibly reads. However, if there is some data in that table that is not accessed as often, that would be a good candidate for moving out of the table for performance reasons.
I can't find a resource online to substantiate this next statement, but I do recall that in a MySQL performance talk given by Jay Pipes, he said the MySQL optimizer has issues once you get more than 8 joins in a single query (MySQL 5.0.*). I am not sure how accurate that magic number is, but regardless, joins will usually take longer than queries against a single table.
