Scaling a database horizontally using PHP mysql

Problem statement: I am working on an application in which a user can follow other users (like Twitter or some e-commerce sites) and get their updates on his wall. It concerns merchants and users: a user can follow any merchant, and a user can himself be a merchant, so it is really users following other users (a many-to-many relation).
Issue: The easiest way to go about it was to have a junction table with the columns
id (auto-increment) | follower_user_id | followed_user_id. But I am not sure how well it will scale as the table grows vertically. If a user follows 100 people, there will be 100 rows for that single user. In that case, I worry that a query for the followers of any user will take longer to execute.
Research: I tried studying Twitter and other websites and their DB designs, but they use different databases (graph-based, NoSQL, etc.) to solve their problems. In our case it's MySQL. I also looked into using a caching mechanism, but I would like to know if there is any way I could store the values horizontally, i.e. each user has his followers in a single row (comma-separated values proved tedious when I tried them).
Can I have a separate database for this feature, something like a NoSQL database (Mongo, etc.)? What impact would it have on performance in different cases?
If my approach of going with the easiest way is right, how can I improve the performance for, say, 5-10k users (looking at a small base for now)? Would basic MySQL queries work well?
Please help me with your inputs on this.

The system I use (my personal preference) is to add two columns to the users table, following and followers, and store a simple encrypted JSON array in each with the IDs of the followers and of the users being followed.
The only drawback is that when querying you have to decrypt it and then json_decode it, but it has worked fine for me for almost 2 years.

After going through the comments and doing some research, I came to the conclusion that it would be better to go the normal way: create the followers table, add some indexing, and use a caching mechanism for it.
Indexing: as suggested, composite indexes would work well.
For caching I am planning to use Memcache!
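
The conclusion above (a followers table, composite indexes, and a cache in front of it) could look roughly like the sketch below. It is only an illustration under assumptions: SQLite in memory stands in for MySQL so the code runs as-is, a plain dict stands in for a Memcache client, and the names idx_followed, followers_of and follow are made up for this example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE followers (
        follower_user_id INTEGER NOT NULL,
        followed_user_id INTEGER NOT NULL,
        -- composite key: serves lookups of whom a given user follows
        PRIMARY KEY (follower_user_id, followed_user_id)
    );
    -- reverse composite index: serves lookups of who follows a given user
    CREATE INDEX idx_followed
        ON followers (followed_user_id, follower_user_id);
""")

cache = {}  # hypothetical stand-in for a Memcache client


def followers_of(user_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"followers:{user_id}"
    if key not in cache:
        rows = conn.execute(
            "SELECT follower_user_id FROM followers"
            " WHERE followed_user_id = ? ORDER BY follower_user_id",
            (user_id,),
        )
        cache[key] = [row[0] for row in rows]
    return cache[key]


def follow(follower_id, followed_id):
    """Write to the database, then drop the now-stale cache entry."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO followers VALUES (?, ?)",
        (follower_id, followed_id),
    )
    cache.pop(f"followers:{followed_id}", None)


follow(1, 42)
follow(2, 42)
first_read = followers_of(42)   # read from the database, then cached
follow(3, 42)                   # invalidates the cached list for user 42
second_read = followers_of(42)  # re-read from the database
print(first_read, second_read)
```

Invalidating on write keeps the cached list from going stale; whether that or a short expiry time fits better depends on how often users follow and unfollow.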

Related

Must I have a common DB for all users active sessions?

Here I come again ;)
I am doing an application where each user will have their own DB.
Is it ok if I store session for each user in their individual DB? Or is it for some reason convenient to have active sessions in a common DB for all users?
Sorry about my question, I am kind of new to this level. :) I am working with PHP and MySQL, if that makes any difference, although I think the question is language independent.

In a typical application, there will only be one database with several tables, where each table can have several records.
Sessions
You can just save sessions the same way you would add any other record to a database.
Profile Details / Friendship
This is where relationships take place.
Consider the image below. Credits to the owner on w3stack(dot)org.
Focus on the three tables above: Users, Friendships, and Friends (a virtual table). Ignore the virtual table concept for now, so that you don't get confused.
It is really a BAD, and I mean BAD, approach to create an individual database for each user. What if you thought of adding a "following" and "follower" feature to your application? You would need to add another table and re-add all those friends from another DB. If UserA has 100 friends, each with their own database, you wouldn't want to query all 100 of those databases.
To end, just use a single DB, and identify the relationships according to your application's features. It is important to plan your structure before you actually implement it. Happy coding!

Using Sqlite in a web application

I'm currently developing an application which allows doctors to dynamically generate invoices. Each doctor requires 6 different database tables, and there could be around 50 doctors connected and working with the database (writing and reading) at the same time.
What I want to know is whether the design of my application is sound. For each doctor, I create a personal SQLite3 database (all databases are secured) which only he can connect to. I'll have around 200 SQLite databases; is that a problem? I thought it could be better than using one big MySQL database for everyone.
Is this solution viable? Will I run into problems? I have never built an application with so many users, but I thought this could be the best solution.

Firstly, to answer your question: no, you probably will not have any significant problems if a single sqlite database is used only by one person (user) at a time. If you highly value certain edge cases, like the ability to move some users/databases to another server, this might be a very good solution.
But it is not a terribly good design. The usual way is to have all data in the same database, and tables having a field which identifies which rows belong to which users. The application code is responsible for maintaining security (i.e. not to let users see data which doesn't belong to them), and indexes in the database (which you should use in all cases, even in your own design) are responsible for making it fast.
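
A minimal sketch of that usual design, under assumptions: one shared invoices table whose doctor_id column says which doctor owns each row, an index on that column, and application code that scopes every query to the logged-in doctor. The table, column and function names are hypothetical, and in-memory SQLite is used only so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (
        id        INTEGER PRIMARY KEY,
        doctor_id INTEGER NOT NULL,  -- identifies which doctor owns the row
        patient   TEXT    NOT NULL,
        amount    REAL    NOT NULL
    );
    -- the index keeps per-doctor lookups fast as the shared table grows
    CREATE INDEX idx_invoices_doctor ON invoices (doctor_id);
""")
conn.executemany(
    "INSERT INTO invoices (doctor_id, patient, amount) VALUES (?, ?, ?)",
    [(1, "Alice", 50.0), (1, "Bob", 75.0), (2, "Carol", 120.0)],
)


def invoices_for(doctor_id):
    """Application-level security: every query is scoped by the owner id."""
    rows = conn.execute(
        "SELECT patient, amount FROM invoices"
        " WHERE doctor_id = ? ORDER BY id",
        (doctor_id,),
    )
    return rows.fetchall()


doctor_1 = invoices_for(1)
doctor_2 = invoices_for(2)
print(doctor_1)
print(doctor_2)
```

Routing all reads through one helper like invoices_for makes it harder to forget the doctor_id filter in some corner of the application.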
There are a large number of tutorials which could help you to make a better database design; a random google result is http://www.profsr.com/sql/sqless02.htm .

Implementing a "Follow" user feature using PHP and Mysql

Issue: I am working on a kind of e-commerce platform which has sellers and buyers. In my case a seller can also be a buyer, i.e. every user can both buy and sell.
So I have a single table called users. Now I want to implement a follow vendor/user feature, wherein the user can click follow and then sees all the goods listed by that vendor under his account (until he unfollows).
My traditional approach was to have a table with a key and two columns to store the follower and the followed, e.g.:
| id | userId | vendorId | So the table keeps growing as users go on following others. But if I have a user following many people (say 100), my query may take a lot of time to select 100 records for each user.
Question: How can I implement the follow mechanism? Is there a better approach than this? I am using PHP and MySQL.
Research: I tried going through how Facebook and Pinterest handle it, but that seemed a bit too big for me to learn now, as I don't expect that many users immediately. Do I need to use memcache to enhance the performance and avoid recurring queries? Can I use a document database in any sense in parallel with MySQL?
I would like a simple yet powerful implementation that would scale if my user base grows gradually to a few thousand.
Any help or insights would be very helpful.

Since, from my understanding of this scenario, a user may follow many vendors, and a vendor may have many followers, this constitutes a many<->many relationship, and thus the only normalised way to achieve this in a database schema should be through using a link table, exactly as you described.
As for performance, I wouldn't worry too much about it: since the table can be indexed on userId and vendorId, the queries should be fine.

The junction table is probably the best approach, but a lot still depends on your clustered index.
A table clustered on the surrogate key id can make adding new records a bit faster.
A table clustered on (userId, vendorId) will make queries that look for the vendors a certain user follows faster.
A table clustered on (vendorId, userId) will make queries that look for the users that follow a certain vendor faster.
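
One way to cover both lookup directions is to cluster on one column order and add a secondary index for the other. In MySQL's InnoDB the primary key is the clustered index; the sketch below uses a SQLite WITHOUT ROWID table as a runnable stand-in for that, which is an assumption of this example, as are the names follows, idx_vendor_user and plan. The exact text of the query plan differs between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (
        userId   INTEGER NOT NULL,
        vendorId INTEGER NOT NULL,
        -- rows are stored in (userId, vendorId) order, i.e. clustered on it
        PRIMARY KEY (userId, vendorId)
    ) WITHOUT ROWID;
    -- secondary index for the opposite direction
    CREATE INDEX idx_vendor_user ON follows (vendorId, userId);
""")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)", [(1, 7), (1, 8), (2, 7)]
)


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)).fetchall()
    return " | ".join(row[-1] for row in rows)


# Vendors a given user follows: answered from the clustered primary key.
vendors_plan = plan("SELECT vendorId FROM follows WHERE userId = ?")
# Users following a given vendor: answered from the secondary index.
followers_plan = plan("SELECT userId FROM follows WHERE vendorId = ?")
print(vendors_plan)
print(followers_plan)

vendors_of_user_1 = [
    row[0]
    for row in conn.execute(
        "SELECT vendorId FROM follows WHERE userId = ? ORDER BY vendorId",
        (1,),
    )
]
print(vendors_of_user_1)
```

Which order to cluster on is a judgment call: pick the direction that is queried most often and let the secondary index serve the other.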

simple multiple user selections/options

I see many implementations such as the Facebook like, forum karma, mark as read on forum posts and other simple options and selections available to multiple users on a given item.
I know I can implement this in mysql by creating a table which links say post IDs to liker user IDs for say, a like system.
My problem is, on a page with lots of posts, I will have to make a lookup for every post. I use prepared statements so that makes it faster for me.
Is there another way to implement these systems, if not, are there optimisations like database types or other tweaks that can make this faster?
Basically, is there a powerful, fast implementation of a many-to-many database interaction?
EDIT:
I'm using Opera Mini, so I have issues with the AJAX and JS for commenting.
Right now, I have a table with two columns. One for user id and the other for post id. Both are indexed and are used in foreign key constraints.
I'm thinking of making a compound primary key across the two.
My main issue is with the karma. I allow users to vote on each post. The problem is that, for each post, I need to get the total votes and determine whether the current user has already voted, so that I can either allow or disallow voting.
My site allows many users to host their own sites and so I need to seriously optimize this.
Someone suggested I use memory tables for this.
NOTE:
I can't use memcached.

I strongly suggest using something other than a MySQL DB. I've written an OpenSocial app which had both heavy writes and reads to a database. It all started with a MySQL DB, and I even switched to a dedicated master-slave replication setup. But to no avail: it was expensive and it didn't scale very well.
The final solution was to use a NoSQL DB which made the most out of RAM. My choice was MongoDB, which has an active community and solved my problem very well. MongoDB proved to be highly scalable.

Still a little hazy about what you've got so far, but I'll start it off and keep adding stuff if need be:
Make sure you're indexing
Minimum lookups - get the list of posts that will show up on the page, then use that list to match the likes, whether they've viewed the article, etc.
Using numbers - make sure all your comparisons are with numbers
If you're running queries, don't run a single query for each post
Is there a limit in your query? - make sure you use that
De-normalization is not a sin
You can partition your databases to decrease lookups (e.g. if data is older than 60 days and barely touched, move it to a secondary database/table, so the size of your table is not huge)
e.g. SELECT * FROM user_liked WHERE post_id IN (1,2,3)
instead of
SELECT * FROM user_liked WHERE post_id = 1
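
Applied to the karma case from the question, the IN (...) advice could be sketched as one grouped query that returns, for every post on the page, the vote total and whether the current user has already voted. This is an illustration under assumptions: the post_votes table, its columns and the vote_summary helper are hypothetical, and in-memory SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_votes (
        post_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        vote    INTEGER NOT NULL,       -- +1 or -1
        PRIMARY KEY (post_id, user_id)  -- at most one vote per user per post
    );
""")
conn.executemany(
    "INSERT INTO post_votes VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 11, 1), (2, 10, -1), (3, 12, 1)],
)


def vote_summary(post_ids, current_user):
    """One round trip for the whole page instead of one query per post."""
    if not post_ids:
        return {}
    placeholders = ",".join("?" for _ in post_ids)
    sql = f"""
        SELECT post_id,
               SUM(vote)        AS total,
               MAX(user_id = ?) AS has_voted  -- 1 if any row is this user's
        FROM post_votes
        WHERE post_id IN ({placeholders})
        GROUP BY post_id
    """
    rows = conn.execute(sql, [current_user, *post_ids])
    return {pid: (total, bool(voted)) for pid, total, voted in rows}


summary = vote_summary([1, 2, 3], current_user=10)
print(summary)
```

Posts without any votes simply do not appear in the result, so the caller would treat a missing post id as a total of zero and not yet voted.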

Philipp Keller wrote a bunch of articles on tag systems based on MySQL a few years ago. Just like liking, tagging establishes a many-to-many relationship between a thing (a tag, an article being liked) and a user. The logic in his articles should be directly applicable to your problem as well.
Check out the comments as well.
http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
Database Schemas for Tagging solutions
http://www.pui.ch/phred/archives/2005/05/tags-with-mysql-fulltext.html
Abusing the MySQL FULLTEXT indices for tagging and tag search (requires MyISAM, I'd not go there).
http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
Performance Tests of tagging systems

Should I break a larger mysql table into multiple?

I have a pretty large social-network-type site that I have been working on for about 2 years (high traffic and hundreds of files). I have been experimenting for the last couple of years with tweaking things for maximum performance under that traffic, and I have learned a lot. Now I have a huge task: I am planning to completely re-code my social network, so I am redesigning the MySQL DBs and everything.
Below is a picture I made of a couple of MySQL tables that I have a question about. I currently have the login table, which is used in the login process; once a user is logged into the site, they very rarely need to hit that table again unless editing an email or password. I then have a user table, which is basically the user's settings and profile data for the site. This is where I have questions: would it be better for performance to split the user table into smaller tables? For example, if you view the user table you will see several fields that I have marked as "setting_"; should I just create a separate settings table? I also have fields marked with "count", which could be the total count of comments, photos, friends, mail messages, etc. So should I create another table to store just the total counts of things?
The reason I have them all in one table now is that I thought it would be better to cut down on MySQL queries: instead of hitting 3 tables to get information on every page load, I could hit 1.
Sorry if this is confusing, and thanks for any tips.
Image: http://img2.pict.com/b0/57/63/2281110/0/800/dbtable.jpg

As long as you don't SELECT * FROM your tables, having 2 or 100 fields won't affect performance.
Just SELECT only the fields you're going to use and you'll be fine with your current structure.

"should I just create a separate setting table?"
"So should I create another table to store just the total count of things?"
There is no single correct answer to this; it depends on how your application behaves.
What you can do is measure in a dev environment and extrapolate from the results.
On the one hand, using a separate table will save you some space and make the code easier to modify.
On the other hand, you may lose some performance (as you already suspect) by having to join information from different tables.
As for the counts, I think it's fine to keep them there. Although it is often said that it is better to calculate this kind of thing, I don't think it will hurt you at all in this situation.
But again, the only way to know what's better for you and your specific app is to measure and profile, and find out what the benefit of the change would be. You would probably only gain a 2% improvement.

You'll need to compare performance testing results between the following:
Leaving it alone
Breaking it up into two tables
Using different queries to retrieve the login data and profile data (if you're not doing this already) with all the data in the same table
Also, you could implement some kind of caching strategy on the profile data if the usage data suggests this would be advantageous.

You should consider putting the counter columns and frequently updated timestamps in their own table: every time you bump them, the entire row is written.
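
A rough sketch of that split, with hypothetical table and column names (users, user_counts, comment_count, photo_count) and in-memory SQLite standing in for MySQL: the hot counters live in a narrow table keyed by the user id, so bumping one only rewrites that small row, and the profile page joins the two tables when it needs both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- wide, rarely updated profile data
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        about_me TEXT
    );
    -- narrow, frequently updated counters, one row per user
    CREATE TABLE user_counts (
        user_id       INTEGER PRIMARY KEY REFERENCES users (user_id),
        comment_count INTEGER NOT NULL DEFAULT 0,
        photo_count   INTEGER NOT NULL DEFAULT 0
    );
""")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'long profile text')")
conn.execute("INSERT INTO user_counts (user_id) VALUES (1)")


def bump_comment_count(user_id):
    """Touches only the small counters row, not the wide profile row."""
    conn.execute(
        "UPDATE user_counts SET comment_count = comment_count + 1"
        " WHERE user_id = ?",
        (user_id,),
    )


for _ in range(3):
    bump_comment_count(1)

# The profile page still gets everything in a single query via a join.
profile = conn.execute(
    "SELECT u.name, c.comment_count, c.photo_count"
    " FROM users u JOIN user_counts c ON c.user_id = u.user_id"
    " WHERE u.user_id = ?",
    (1,),
).fetchone()
print(profile)
```

Whether the cheaper writes outweigh the extra join is something to measure against the real workload.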

I wouldn't consider your user table terribly large in number of columns, just my opinion. I also wouldn't break that table into multiple tables unless you can find a case for removing redundancy. Perhaps you have a lot of users who have the same settings; that would be a case for breaking the table out.

You should take into account the average size of a single row in order to find out whether retrieval is expensive. You should also try to use indexes when looking up data.
The most important thing is to design properly, not just to split because "it looks large". Maybe the IP or IPs could go somewhere else; it depends on the data saved there.
Also, as the social network site using this data presumably also handles the authentication and authorization processes, the separation between the login and user tables should offer good performance, because the data in login is "short enough", while the profile can be accessed only once, immediately after a successful login. Just apply the right techniques to improve DB performance and you're done.
(Remember to think of tables as entities and name them after a single entity, not a collection of them.)

Two things you will want to consider when deciding whether or not to break up a single table into multiple tables are:
MySQL likes small, consistent datasets. If you can structure your tables so that they have fixed row lengths that will help performance at the potential cost of disk space. One thing that from what I can tell is common is taking fixed length data and putting it in its own table while the variable length data will go somewhere else.
Joins are in most cases less performant than not joining. If the data currently in your table will normally be accessed all at the same time then it may not be worth splitting it up as you will be slowing down both inserts and quite potentially reads. However, if there is some data in that table that does not get accessed as often then that would be a good candidate for moving out of the table for performance reasons.
I can't find a resource online to substantiate this next statement but I do recall in a MySQL Performance talk given by Jay Pipes that he said the MySQL optimizer has issues once you get more than 8 joins in a single query (MySQL 5.0.*). I am not sure how accurate that magic number is but regardless joins will usually take longer than queries out of a single table.
