Would stored procedures be beneficial in my situation? - php

I recently started working for a fairly small business that runs a small website. I overheard a coworker mention that either our site or our MySQL databases get hit ~87 times a second.
I was also tasked today with reorganizing some tables in these databases. I was taught in school that good database design dictates that, to represent a many-to-many relationship between two tables, I should use a third table as a middleman of sorts. (This third table would contain the ids of the two related rows in the two tables.)
Currently we use two separate databases, totalling to a little less than 40 tables, with no table having more than 1k rows. Right now, some PHP scripts use a third table to relate certain rows that has a third column that is used to store a string of comma separated ids if a row in one table relates to more than one row in some other table(s). So to use an id from the third column, they have to fetch the string, split it, and pick out the proper id.
When I mentioned that we should switch to using the third table properly, like good design dictates, they said that it would cause too much overhead for such small tables, because they would have to use several join statements to get the data they wanted.
Finally, my question: would creating stored procedures for these joins mitigate the impact these joins would have on the system?
Thanks a bunch, sorry for the lengthy explanation!

By the sound of things you should really try to redesign your database schema.
"two separate databases, totalling to a little less than 40 tables, with no table having more than 1k rows"
Sounds like it's not properly normalized - or it has been far too aggressively normalized and would benefit from some polymorphism.
"comma separated ids"
Oh dear - surrogate keys - not intrinsically bad but often a sign of bad design.
"a third table to relate certain rows that has a third column that is used to store a string of comma separated ids"
So it's a very long way from normalised - this is really bad.
"they said that it would cause too much overhead for such small tables"
Time to start polishing up your resume I think. Sounds like 'they' know very little about DBMS systems.
But if you must persevere with this - it's a lot easier to emulate a badly designed database from a well designed one (hint - use views) than vice versa. Redesign the database offline and compare the performance of tuned queries - it will run at least as fast. Add views to allow the old code to run unmodified and compare the amount of code you need to perform key operations.
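A minimal sketch of the "use views" hint, written in Python with SQLite standing in for MySQL so that it runs without a server. The table, view, and column names (item_link, legacy_item_link, related_ids) are hypothetical and not taken from the poster's schema. The view rebuilds the comma separated string from a normalized link table, so old code could in principle keep reading the string while new code uses joins.

```python
import sqlite3

# Assumption: SQLite replaces MySQL here purely to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized link table: one row per related pair (hypothetical names).
    CREATE TABLE item_link (
        item_id    INTEGER NOT NULL,
        related_id INTEGER NOT NULL,
        PRIMARY KEY (item_id, related_id)
    );
    INSERT INTO item_link VALUES (1, 10), (1, 11), (2, 12);

    -- View that emulates the old "string of comma separated ids" column.
    CREATE VIEW legacy_item_link AS
        SELECT item_id, group_concat(related_id, ',') AS related_ids
        FROM item_link
        GROUP BY item_id;
""")

def legacy_ids(item_id):
    """Return the emulated comma separated id string, or None if no links exist."""
    row = conn.execute(
        "SELECT related_ids FROM legacy_item_link WHERE item_id = ?",
        (item_id,),
    ).fetchone()
    return row[0] if row else None

print(legacy_ids(1))
```

In MySQL the corresponding aggregate is GROUP_CONCAT, whose default separator is also a comma. The order of ids inside the string is not guaranteed in this sketch.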

I don't understand how storing a comma-separated list of IDs in a single column, and having to parse that list to get all associated rows, is less complex than a simple table join.
Moving your queries into a stored procedure normally won't provide any sort of benefit. But if you absolutely have to use the comma-separated list of values that represent foreign key associations, then a stored procedure may improve performance. Perhaps in your stored procedure you could declare a temporary table (see Create table variable in MySQL for example), and then populate the temporary table with one row for every value contained in your comma-separated string.
I'm not sure what kind of performance gain you would get by doing that though, considering that, as you mentioned, there aren't a lot of rows in any of the tables. The whole exercise seems a bit silly. Ditching the comma-separated list of IDs would be the best way to go.

It will be both quicker and simpler to do it in the database than in PHP; that's what database engines are good at. Make indexes on the keys (InnoDB indexes primary keys automatically and adds an index for each foreign key constraint) and the joins will be fast; to be honest, with tables that tiny, the joins will almost always be fast.
Stored procedures don't really come into the picture, for two reasons; mainly, they're not going to make any difference to the impact (not that there's any impact anyway - you will be improving app performance by doing this in the DB rather than at the PHP level).
Secondly, avoid MySQL stored procedures like the plague, they're goddamn awful to write and debug if you've ever worked in the stored procedure languages for any other DB at all.
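As a rough illustration of the indexed join recommended above, here is a small sketch in Python using SQLite in place of MySQL. The articles, tags, and article_tag tables are invented for the example; the question does not say what the real tables hold. The point is that one join against the junction table replaces fetching a string, splitting it in PHP, and issuing follow-up queries.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tags     (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);

    -- Junction table: one row per (article, tag) pair instead of a CSV string.
    CREATE TABLE article_tag (
        article_id INTEGER NOT NULL REFERENCES articles(id),
        tag_id     INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (article_id, tag_id)
    );
    -- Extra index so lookups starting from the tag side are also indexed.
    CREATE INDEX idx_article_tag_tag ON article_tag (tag_id);

    INSERT INTO articles VALUES (1, 'Intro'), (2, 'Joins');
    INSERT INTO tags VALUES (1, 'php'), (2, 'mysql'), (3, 'design');
    INSERT INTO article_tag VALUES (1, 1), (2, 1), (2, 2), (2, 3);
""")

def tags_for_article(article_id):
    """Return the tag names for one article with a single join."""
    rows = conn.execute(
        """SELECT t.name
           FROM article_tag AS link
           JOIN tags AS t ON t.id = link.tag_id
           WHERE link.article_id = ?
           ORDER BY t.name""",
        (article_id,),
    ).fetchall()
    return [name for (name,) in rows]

print(tags_for_article(2))
```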

Related

MySQL Database Optimization for Large Volumes of Repeating Data

I'm building a medium sized (100,000 entries) table in MySQL, and I'm trying to optimize it for speed. The entries contain some data that is transactional in nature, this data will obviously be kept in MySQL. The remainder of the data will not change over the life of the table nor is it well suited to a table format (i.e. some entries will contain fields that other entries will not, leading to a lot of 'null' values). Further, much of the data in this second part will repeat, meaning that there may only be 500-1000 unique sets of data which are then paired with the entries in the table.
I'm considering three ways of organizing the data.
1) Leave all the data in MySQL in table format.
2) Serialize the non-unique data and save that data in a single MySQL field.
3) Serialize the non-unique data and save to a file in the hard disk, referenced by a pointer in the MySQL table.
My question is which format would you recommend and why? Which is going to be fastest, given that I will be running many queries on the database?
It sounds like you are describing a normalized database. This is very standard. You would have the "larger" entity as a single table with an id.
For the more voluminous data, you would have a reference to that id, called a foreign key. This is the structure that relational databases were designed for. Part of the meaning of "relational" is relationships between entities.
If you only have a few dozen columns, I wouldn't worry about some values being NULL in some rows and others being NULL in other rows. If you have multiple types of entities, then you can also reflect this in the data structure.
EDIT:
Normalization can have both good and bad effects on performance. In the case where it reduces the size of the data, then the performance is often better than with denormalized data. If you have proper index structures, then normalized data structures usually work pretty well.
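A minimal sketch of the foreign key structure this answer describes, using Python and SQLite as a stand-in for MySQL. The table names (entries, static_sets) and their columns are assumptions made up for illustration. Each repeated set of unchanging data is stored once, and every transactional entry points at it by id.

```python
import sqlite3

# Assumption: SQLite replaces MySQL; entries/static_sets are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- The 500-1000 unique sets of unchanging data, stored once each.
    CREATE TABLE static_sets (
        id          INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        color       TEXT              -- optional field, NULL when absent
    );
    -- The transactional entries; each one references a static set.
    CREATE TABLE entries (
        id            INTEGER PRIMARY KEY,
        amount        REAL NOT NULL,
        static_set_id INTEGER NOT NULL REFERENCES static_sets(id)
    );
    CREATE INDEX idx_entries_static_set ON entries (static_set_id);

    INSERT INTO static_sets VALUES (1, 'widget', 'red'), (2, 'gadget', NULL);
    INSERT INTO entries VALUES (1, 9.5, 1), (2, 12.0, 1), (3, 3.25, 2);
""")

def entry_with_details(entry_id):
    """Join one entry to its shared static data; returns None if the id is unknown."""
    return conn.execute(
        """SELECT e.amount, s.description, s.color
           FROM entries AS e
           JOIN static_sets AS s ON s.id = e.static_set_id
           WHERE e.id = ?""",
        (entry_id,),
    ).fetchone()

print(entry_with_details(3))
```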
Use one of the indexing engines, such as Sphinx; do not reinvent the wheel. Sphinx organizes data according to searching / querying options, is really fast, and can handle lots of data. If your database doesn't change often, you have to run the Sphinx indexer just once. One of the cons of this solution is that Sphinx index files are quite large.

How to scale mysql tables for growth

So I'm working on a site that will replace an older site with a lot of traffic, and I will also have a lot of data in the DB, so my question to you guys is: what is the best way to design MySQL tables for growth?
I was thinking of splitting, let's say, a table with 5 000 000 rows into 5 tables with 1 000 000 rows each and creating a relationship between the tables, but I guess this isn't a good option since I will spend a lot of resources and time figuring out which table my data is in.
Or can you guys give me some tips, maybe some useful articles?
No, you're absolutely right on the relationships. This technique is called normalization: you define separate tables because the individual tables change over time independently of the other tables.
So if you have a hotel database that keeps track of rooms and guests, then you know normalization is necessary because rooms and guests are independent of each other.
But you will have foreign keys/surrogate keys in each table (for instance, room_id) that relate a particular guest to a particular room.
Normalization, in your case, could help you optimize those 5 000 000 rows of yours, as it would not be optimal for a loop to go over 5 000 000 elements and retrieve all of the data.
Here is a strong example for why normalization is essential in database management.
Partitioning as mentioned in a comment is one way to go, but the first path to check out is even determining if you can break down the tables with the large amounts of data into workable chunks based on some internal data.
For instance, let's say you have a huge table of contacts. You can essentially break down the data into contacts that start from a-d, e-j, etc. Then when you go to add records you just make sure you add the records to the correct table (I'd suggest checking out stored procedures for handling this, so that the logic is regulated in the database). You'd also probably set up stored procedures to get data from the same tables. By doing this, however, you have to realize that using auto-incrementing IDs won't work correctly, as you won't be able to maintain unique IDs across all of the tables without doing some work yourself.
These of course are the simple solutions. There are tons of solutions for large data sets, which also include looking at other storage solutions, clustering, partitioning, etc. Doing some of these things manually yourself can give you a little bit of an understanding of some of the possible "manual solutions".

Which is faster in SQL: many Many MANY tables vs one huge table?

I am in the process of creating a website where I need to have the activity for a user (similar to your inbox in stackoverflow) stored in sql. Currently, my teammates and I are arguing over the most effective way to do this; so far, we have come up with two alternate ways to do this:
1) Create a new table for each user and have the table name be theirusername_activity. Then when I need to get their activity (posting, being commented on, etc.) I simply get that table and see the rows in it...
   In the end I will have a TON of tables
   Possibly Faster
2) Have one huge table called activity, with an extra field for their username; when I want to get their activity I simply get the rows from that table "...WHERE username=".$loggedInUser
   Less tables, cleaner
   (assuming I index the tables correctly, will this still be slower?)
Any alternate methods would also be appreciated
"Create a new table for each user ... In the end I will have a TON of tables"
That is never a good way to use relational databases.
SQL databases can cope perfectly well with millions of rows (and more), even on commodity hardware. As you have already mentioned, you will obviously need usable indexes to cover all the possible queries that will be performed on this table.
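To make the single-table option concrete, here is a small sketch in Python with SQLite standing in for MySQL. The column names and the index name idx_activity_username are assumptions. It binds the username as a query parameter rather than concatenating it into the SQL string, which also avoids SQL injection, and it asks the engine for its query plan to check that the lookup uses the index.

```python
import sqlite3

# Assumption: SQLite replaces MySQL; column and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        action   TEXT NOT NULL
    );
    CREATE INDEX idx_activity_username ON activity (username);
""")
conn.executemany(
    "INSERT INTO activity (username, action) VALUES (?, ?)",
    [("alice", "posted"), ("bob", "commented"), ("alice", "was commented on")],
)

def activity_for(username):
    """Fetch one user's activity, oldest first, with a bound parameter."""
    rows = conn.execute(
        "SELECT action FROM activity WHERE username = ? ORDER BY id",
        (username,),
    ).fetchall()
    return [action for (action,) in rows]

def query_plan(username):
    """Return SQLite's plan text for the per-user lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT action FROM activity WHERE username = ?",
        (username,),
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(str(row[-1]) for row in rows)

print(activity_for("alice"))
print(query_plan("alice"))
```

In MySQL the comparable check is to prefix the query with EXPLAIN and look at which key it reports.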
Number 1 is just plain crazy. Can you imagine having to manage it, and seeing all those tables?
Can you imagine the backup! Or the dump! That many CREATE TABLE statements... that would be crazy.
Get yourself a good index, and you will have no problem sorting through records.
Here we are talking about MySQL. So why would it be faster to make separate tables?
Query cache efficiency: each insert from one user wouldn't empty the query cache for others.
Memory & pagination: tables in use would fit in buffers, and unused data would simply not be loaded there.
But as everybody here has said, it seems quite crazy in terms of management. And in terms of performance, having a lot of tables adds another problem in MySQL: you may run out of file descriptors or simply wipe out your table cache.
It may be more important here to choose the right engine, like MyISAM instead of InnoDB, as this is an insert-only table. And as #RC said, a good partitioning policy would fix the memory & pagination problem by avoiding loading rarely used data into active memory buffers. This should be done with an intelligent application design as well, where you avoid loading the whole activity history by default; if you reduce it to recent activity and restrict parsing of the complete history table to batch processes and advanced screens, you'll get a nice effect from the partitioning. You can even try a user-based partitioning policy.
For query cache efficiency, you'll get a bigger gain by using an application-level cache (like memcache) with history-per-user elements saved there, emptying it at each new insert.
You want the second option, and you add the userId (and possibly a separate table for userid, username, etc.).
If you do a lookup on that id on a properly indexed field, you'd only need something like log(n) steps to find your rows. This is hardly anything at all. It will be way faster, way clearer and way better than option 1. Option 1 is just silly.
In some cases, the first option is, in spite of not being strictly "the relational way", slightly better, because it makes it simpler to shard your database across multiple servers as you grow. (Doing this is precisely what allows wordpress.com to scale to millions of blogs.)
The key is to only do this with tables that are entirely independent from a user to the next -- i.e. never queried together.
In your case, option 2 makes the most sense: you'll almost certainly want to query the activity across all or some users at some point.
Use option 2, and not only index the username column, but partition (consider a hash partition) on that column as well. Partitioning on username will provide you some of the same benefits as the first option and allow you to keep your sanity. Partitioning and indexing the column this way will provide a very fast and efficient means of accessing data based on the username/user_key. When querying a partitioned table, the SQL Engine can immediately lop off partitions it doesn't need to scan as it can tell based off of the username value queried vs. the ability of that username to reside within a partition. (in this case only one partition could contain records tied to that user) If you have a need to shard the table across multiple servers in the future, partitioning doesn't hinder that ability.
You will also want to normalize the table by separating the username field (and any other elements in the table related to username) into its own table with a user_key. Ensure a primary key on the user_key field in the username table.
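The partition pruning idea in this answer can be sketched conceptually in plain Python. This is not how MySQL implements hash partitioning internally: the crc32 hash, the partition count of 8, and the in-memory lists are all assumptions chosen only to show why a lookup by username needs to touch a single partition.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical partition count

# One in-memory list per partition, standing in for on-disk partitions.
partitions = [[] for _ in range(NUM_PARTITIONS)]

def partition_for(username):
    """Map a username to exactly one partition (crc32 is an illustrative choice)."""
    return zlib.crc32(username.encode("utf-8")) % NUM_PARTITIONS

def insert_activity(username, action):
    """Store the row in the partition its username hashes to."""
    partitions[partition_for(username)].append((username, action))

def activity_for(username):
    """Scan only the one partition that can contain this user's rows."""
    bucket = partitions[partition_for(username)]
    return [action for (user, action) in bucket if user == username]

insert_activity("alice", "posted")
insert_activity("bob", "commented")
insert_activity("alice", "liked")

print(partition_for("alice"), activity_for("alice"))
```

Because the hash is deterministic, every row for a given username lands in the same partition, which is what lets the other partitions be skipped.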
This mostly depends on where you need to retrieve the values. If it's a page for a single user, then use the first approach. If you are showing data for all users, you should use a single table. The multiple-table approach is also clean, but in SQL, if the number of records in a single table is very high, data retrieval is very slow.

MySQL many tables or few tables

I'm building a very large website. Currently it uses around 13 tables, and by the time it's done it should be about 20.
I came up with an idea to change the preferences table to use ID, Key, Value instead of many columns; however, I have recently thought I could also store other data inside the table.
Would it be efficient / smart to store almost everything in one table?
Edit: Here is some more information. I am building a social network that may end up with thousands of users. MySQL Cluster will be used when the site is launched. For now I am testing on a development VPS; however, everything will be moved to a dedicated server before launch. I know barely anything about NDB so this should be fun :)
This model is called EAV (entity-attribute-value).
It is usable for some scenarios; however, it's less efficient due to larger records, a larger number of joins, and the impossibility of creating composite indexes on multiple attributes.
Basically, it's used when entities have lots of attributes which are extremely sparse (rarely filled) and/or cannot be predicted at design time, like user tags, custom fields etc.
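A small sketch of the trade-off described in this answer, in Python with SQLite standing in for MySQL. The table names (preferences_eav, preferences_wide) and the theme/language attributes are invented for illustration. Filtering on two attributes needs a self-join in the EAV layout, while the column-per-attribute layout can serve the same filter from one composite index.

```python
import sqlite3

# Assumption: SQLite replaces MySQL; tables and attributes are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- EAV layout: one row per (user, attribute) pair.
    CREATE TABLE preferences_eav (
        user_id    INTEGER NOT NULL,
        pref_key   TEXT NOT NULL,
        pref_value TEXT,
        PRIMARY KEY (user_id, pref_key)
    );
    INSERT INTO preferences_eav VALUES
        (1, 'theme', 'dark'),  (1, 'language', 'en'),
        (2, 'theme', 'light'), (2, 'language', 'en');

    -- Conventional layout: one column per attribute, with a composite index.
    CREATE TABLE preferences_wide (
        user_id  INTEGER PRIMARY KEY,
        theme    TEXT,
        language TEXT
    );
    CREATE INDEX idx_wide_theme_language ON preferences_wide (theme, language);
    INSERT INTO preferences_wide VALUES (1, 'dark', 'en'), (2, 'light', 'en');
""")

def eav_users(theme, language):
    """EAV: each extra attribute in the filter costs another self-join."""
    rows = conn.execute(
        """SELECT t.user_id
           FROM preferences_eav AS t
           JOIN preferences_eav AS l ON l.user_id = t.user_id
           WHERE t.pref_key = 'theme'    AND t.pref_value = ?
             AND l.pref_key = 'language' AND l.pref_value = ?
           ORDER BY t.user_id""",
        (theme, language),
    ).fetchall()
    return [user_id for (user_id,) in rows]

def wide_users(theme, language):
    """Wide table: the same filter is a plain WHERE on two columns."""
    rows = conn.execute(
        "SELECT user_id FROM preferences_wide "
        "WHERE theme = ? AND language = ? ORDER BY user_id",
        (theme, language),
    ).fetchall()
    return [user_id for (user_id,) in rows]

print(eav_users("dark", "en"), wide_users("dark", "en"))
```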
Granted, I don't know too much about large database designs, but from what I've seen, even extremely large applications store their things in a very small number of tables (20GB per table).
For me, I would rather have more info in 1 table, as it means that data is not littered everywhere and that I don't have to perform operations on multiple tables. Though 1 table also means messy (usually for me, each object would have its own table, and an object is something you have in your application logic, like a User class or a BlogPost class).
I guess what I'm trying to say is: do whatever makes sense. Don't put information on the same thing in 2 different tables, and don't put information on 2 things in 1 table. Stick with 1 table describing only one kind of object (this is very difficult to explain, but if you do object-oriented programming, you should understand).
nope. preferences should be stored as-they-are (in users table)
for example private messages can't be stored in users table ...
you don't have to think about joining different tables ...
I would first say that 20 tables is not a lot.
In general (it's hard to say from the limited info you give) the key-value model is not as efficient speed wise, though it can be more efficient space wise.
I would definitely not do this. Basically, the reason is that if you have a large set of data stored in a single table, you will see performance issues pretty fast when constantly querying the same table. Then think about the joins and complexity of the queries you're going to need (depending on your site)... not a task I would personally like to undertake.
Using multiple tables splits the data into smaller sets, so the resources required for a query are lower and, as an extra bonus, it's easier to program!
There are some applications for doing this but they are rare: more or less only when you have a large table with a ton of columns, most of which aren't going to have a value.
I hope this helps :-)
I think 20 tables in a project is not a lot. I do see your point and interest in using EAV but I don't think it's necessary. I would stick to tables in 3NF with proper FK relationships etc and you should be OK :)
The simple answer is that 20 tables won't make it a big DB, and MySQL won't need any optimization for that. So focus on clean DB structures and normalization instead.

Normalization or Alternative with MySQL

I'm building a site using PHP and MySQL that needs to store a lot of properties about users (for example their DOB, height, weight etc.), which is fairly simple (a single table with lots of properties, almost all of them required).
However, the system also needs to store other information, such as their spoken languages, instrumental abilities, etc. All in all there are over a dozen such characteristics. By default I assumed I would create a separate table (called maybe languages) and then a link table with a composite id (user_id, language_id).
The problem I foresee though is when visitors attempt to search for users using these criteria. The dataset we're looking to use will have over 15,000 users at time of launch and the primary function will be searching and refining users. That means hundreds of queries daily, and the prospect of using queries with up to a dozen or more JOINs in them is not appealing.
So my question is, is there an alternative that's going to be more efficient? One way I was thinking is storing the M2M values as a CSV of IDs in the user table and then running a LIKE query against it. I know LIKE isn't the best, but is it better than a join?
Any possible solutions will be much appreciated.
Do it with joins. Then, if your performance goals are not met, try something else.
Start with a normalized database (e.g. a languages table, linked to the users table by a mapping table) to make sure your data is represented cleanly and logically.
If you have performance problems, examine your queries and make sure you have suitable indexes.
If you dislike repeatedly coding up queries with many joins, define some views.
If views are very slow to query, consider materialized views.
If you have several thousand records and a few hundred queries per day (really, that's pretty small and low-usage), these techniques will allow your site to run at full speed, with no compromise on data integrity. If you need to scale to many millions of records and millions of queries per day, even these techniques may not be enough; in which case, investigate caching and denormalization.
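A rough sketch of the normalized design next to the CSV-plus-LIKE alternative from the question, in Python with SQLite standing in for MySQL. The sample users, the language ids, and the language_csv column are assumptions made up for the comparison. Besides performance, the sketch shows a correctness problem with LIKE on a CSV of ids: a pattern for id 1 also matches id 11.

```python
import sqlite3

# Assumption: SQLite replaces MySQL; the data and the language_csv column are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        language_csv TEXT NOT NULL   -- the CSV-of-IDs alternative, kept for comparison
    );
    CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE user_language (
        user_id     INTEGER NOT NULL REFERENCES users(id),
        language_id INTEGER NOT NULL REFERENCES languages(id),
        PRIMARY KEY (user_id, language_id)
    );
    CREATE INDEX idx_user_language_language ON user_language (language_id);

    INSERT INTO languages VALUES (1, 'English'), (2, 'French'), (11, 'Welsh');
    INSERT INTO users VALUES (1, 'Ann', '1,2'), (2, 'Ben', '11'), (3, 'Cy', '1');
    INSERT INTO user_language VALUES (1, 1), (1, 2), (2, 11), (3, 1);
""")

def speakers_of_all(language_names):
    """Users who speak every language in the list (names assumed distinct)."""
    placeholders = ",".join("?" for _ in language_names)
    sql = f"""
        SELECT u.name
        FROM users AS u
        JOIN user_language AS ul ON ul.user_id = u.id
        JOIN languages AS l ON l.id = ul.language_id
        WHERE l.name IN ({placeholders})
        GROUP BY u.id, u.name
        HAVING COUNT(DISTINCT l.id) = ?
        ORDER BY u.name
    """
    params = list(language_names) + [len(language_names)]
    return [name for (name,) in conn.execute(sql, params)]

def csv_like_match(language_id):
    """The LIKE approach: a substring match against the CSV column."""
    rows = conn.execute(
        "SELECT name FROM users WHERE language_csv LIKE ? ORDER BY name",
        (f"%{language_id}%",),
    )
    return [name for (name,) in rows]

print(speakers_of_all(["English", "French"]))
print(csv_like_match(1))  # also returns Ben, whose only language id is 11
```

The LIKE pattern with a leading wildcard also cannot use an ordinary index, so it scans every user row, whereas the join can use the indexes on the mapping table.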
