usage of database model for my site - php

I am building a social networking site. i hope for some high traffic in it. i am using php and mysql in it. i already started with RDBMS kind of database. I read that many high traffic sites use key value database model. In my situation which one should i go for ? and i guess it would be better to decide it at this early stage itself

For now, stick with MySQL in a traditional RDBMS format if that is what you are most familiar with. Getting your site up and running as fast as possible is WAY more important than worrying about scale issues at the 1st stages of building a site.
That being said, it doesn't hurt to keep scale concerns in mind as you design parts of the system. MySQL is already very good at some basic scaleability pieces, such as sharding, so you will probably be just fine for quite a while. Having a good DB design, with plenty of indexes, will also keep you running if you do hit sufficiently high traffic levels.
Since you expect high traffic volume (don't we all?), I would highly suggest logging / tracking the load on your server so that you can measure the actual traffic and determine if you truly do need to scale (up or out are both good options depending on the load characteristics)

I think it would be good if you had the key value tables for relationships between friends.
For example person a could have 500 friends and person b could have 100. That means that to prevent duplicated data being copied over and over again in one table, you would get the id of the person one and id of the his friends and put them in a table. This will result in faster searches and inserts and updates because your are working with integers.
E.g
table friends_relationship
id - friend
1 - 2
1 - 3
2 - 4
3 - 4
you need to make sure that the relationships are unique

Related

Multiple client tables or large overall table

I've recently taken over a project linking to a large MySQL DB that was originally designed many years ago and need some help.
Currently the DB has 5 tables per client that store their users information, transaction history, logs etc. However we currently have ~900 clients that have applied to use our services, with an average of 5 new clients applying weekly. So the DB has grown to nearly 5000 tables and ever increasing. Many of our clients do not end up using our services so their tables are all empty but still in the DB.
The original DB designer says it was created this way so if a table was ever compromised it would not reveal information on any other client.
As I'm redesigning the project in PHP I'm thinking of redesigning the DB to have an overall user, transaction history, log etc tables using the clients unique id to reference them.
Would this approach be correct or should the DB stay as is?
Could you see any possible security / performance concerns
Thanks for all your help
You should redesign the system to have just five tables, with a separate column identifying which client the row pertains to. SQL handles large tables well, so you shouldn't have to worry about performance. In fact, having many, many tables can be a hinderance to performance in many cases.
This has many advantages. You will be able to optimize the table structures for all clients at once. No more trying to add an index to 300 tables to meet some performance objective. Managing the database, managing the tables, backing things up -- all of these should be easier with a single table.
You may find that the database even gets smaller in size. This is because, on average, each of those thousands of tables has a half-paged filled at the end. This will go from thousands of half-pages to just one.
The one downside is security. It is easier to put security on tables than one rows in tables. If this is a concern, you may need to think about these requirements.
This may just be a matter of taste, but I would find it far more natural - and thus maintainable - to store this information in as few tables as possible. Also most if not all database ORMs will be expecting a structure like this, and there is no reason to reinvent that wheel.
From the perspective of security, it sounds like this project could be described as a web app. Obviously I don't know the realities of the business logic you're dealing with, but it seems like regardless of the table permissions all access to the database would be via the code base, in which case the app itself needs full permissions for all tables - nullifying any advantage of keeping the tables separated.
If there is a compelling reason for the security measures - say, different services that feed data into the DB independently of the web app, I would still explore ways to handle that authentication at the application layer instead of at the database layer. It will be much easier to handle your security rules in that way. Instead of having rules set in 5000+ different places, a single security rule of 'only let a user view a row of data if their user id equals the user_id column" is far simpler, easier to understand, and therefore far more maintainable (and possibly more secure).
Different people approach databases in different ways. I am a web developer, so I view databases as the place to store my data and nothing more, as it's always a dedicated and generally single-purpose DB installation, and I handle all other logic at the application level. There are people who view databases as the application itself, who make far more extensive use of built-in security features for their massive, distributed, multi-user systems - but I honestly don't know enough about those scenarios to comment on exactly where that line should be drawn.

card game engine with sql

i am planning to create a card game engine using sql, the game consits of 4 human players and cards are in an sql table, now every thing is done regarding game logic and points, each game is manged by a seperate sql table, and players are able to create rooms
each room shall have a game table contains cards data with each player represnted in a column and a seperate chat table
if there was 1000 games running in the same
time and each time a card played then a requst is made to the server
either to remove a card from a players deck record player score and
total game score, can this be handled in a single sql database
without delayes and performance issues?
can i use global temporary tables ##sometable for each game room or
do i have to create the tables manually and delete them after the
game ends?
i would like also to know if storing chat data in a single sql
table would make issues, one thing i thought of is saving chat data
for all open rooms in a single datatable with a game id column, but
would this give some performance issues if there was thausands of
lines of chat data?
also what about a database for each game, would that be an over
kill?
How such applications are managed normally?
do i have to use multiple servers and distribute the running games
on them?
any ideas you have about optimizing such things
You should consider a memory-based cache system such as Velocity or Memcached to address the performance issues.
Yes. The discussion of how to scale a task like this is a long one.
You could. But you should rather consider a smarter model whereby multiple games occur in a single table.
I would use SQL Server Service Broker for the chat
Yes.
I recommend you break your questions up into multiple questions so that contributers who specialise in a single aspect of your problem domain can contribute accordingly.
I don't know how PHP works; but I am fairly sure that it would be far more efficient for a lot of the game logic to occur client-side. Making a server call for every game action would work, my opinion is just that it is sub-optimal.
Yes, I would expect live players to have at least 1 second delay before making their moves and only one play is making a move at a time per game. So roughly 1000 transactions per second peak for 1000 games. Not an excessive load on modern architectures.
There is more overhead in most DBMSs for creating and destroying tables. Keep it all in the same table.
Chat would be fine in a single table. You could keep performance up by archiving chat from previous inactive games and removing from the primary live db.
Yes, very inefficient. Added complexity for no gain.
Not sure what you are asking.
Only as you scale. I would imagine you would start with a single db server until you needed more capacity.
Good design db design from the beginning from someone with experience will go a long ways. Don't waste too much time micro-optimizing at the get go or you will never get off the ground. Optimize as you need to as you scale.
The short version is that relational DB such as SQL Server are not very useful for games because they cannot efficiently store heavily structured hierarchical data
I would still advocate avoiding SQL, but there are now many more options in the NoSQL and for real performance you should consider using a Fast Temporary Storage such as Redis or Memcache
You can quickly look at Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs Neo4j vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris comparison
Optimizing is a different topic entirely .. to wide and project specific .

Multi-tenant PHP SaaS - Separate DB's for each client, or group them?

You'll have to bear with me here for possibly getting some of the terminology slightly wrong as I wasn't even aware that this fell into the whole 'multi-tenant' 'software as a service' category, but here it does.
I've developed a membership system (in PHP) for a client. We're now looking at offering it as a completely hosted solution for our other clients, providing a subdomain (or even their own domain).
The options I seem to have on the table, as far as data storage goes are:
Option 1 - Store everything in 1 big database, and have a 'client_id' field on the tables that need it (there would be around 30 tables that it would apply to), and have a 'clients' table storing their main settings, details, etc and the domain to map to them. This then just sets a globally accessible variable containing their individual client id - I'd obviously have to modify every single query to check for the client_id column.
Option 2 - Have a master table with the 'shared reference' tables, and the 'clients' table. Then have 'blocks' of other databases, which each contain, say 10 clients. The client would get their own database tables, prefixed with their client ID. This adds a little bit of security to protect against seeing other client data if something went really wrong.
Option 3 - Exactly the same as option 2, except you have 1 database for each and every client, completely isolating them from other clients, and theoretically providing a bit more protection that if 1 client's tables were hacked or otherwise damaged, it wouldn't affect anyone else. The biggest downside is that when deploying a new client, an entire database, user and password need setting up, etc. Could this possibly also cause a fair amount of overhead, or would it be pretty much the same as if you had everyone in one database?
A few points as well - some of these clients will have 5000+ 'customers' along with all the details for those customers - this is why option 1 may be a bit of an issue - if I've got 100 clients, that could equal over half a million rows in 1 table.
Am I correct in thinking Option 3 would be the best way to go in a situation where security of customer data (and payment information) is key. From recommendations I've had, a few people have said to go with option 1 because 'its easier' however I really don't see it that way. I see it as a potential bottleneck down the line, as surely I can move clients around much easier if they have their own database.
(FYI The system is PHP based with MySQL)
Option 3 is the most scalable. While at first it may seem more complicated, it can be completely automated and will save you headaches on the future. You can also scale more efficiently by having client databases on multiple servers for increased performance.
I agree with Ozzy - I did this for an online database product. We had one master database that basically had a glorified user table. Each customer had their own database. What was great about this is that I could move one customers database from server A to server B easily [mysql] and could do so with command line tools in a pinch. Also doing maintenance on large tables, dropping/adding indexes can really screw up your application, especially if say, adding an index locks the table [mysql]. It affects everyone. With, presumably, smaller databases you are more immune to this and have more options when you need to roll out schema level changes. I just like the flexibility.
When many years ago I designed a platform for building SaaS applications in PHP, I opted for the third option: multi tenant code and single tenant databases.
In my experience, that is the most scalable option, but it also needs a set of scripts to propagate changes when updating code, DB schemes, enabling applications to a tenant, etc.
So a lot of my effort went in building a component based, extensible engine to fully automate all those tasks and minimize system administration stuff. I strongly advise to build such an architecture if you want to adopt the third option.

Practicality of multiple databases per client vs one database

I'm going to try to make this as brief as possible while covering all points - I work as a PHP/MySQL developer currently. I have a mobile app idea with a friend and we're going to start developing it.
I'm not saying it's going to be fantastic, but if it catches on, we're going to have a LOT of data.
For example, we'd have "clients," for lack of a better term, who would have anywhere from 100-250,000 "products" listed. Assuming the best, we could have hundreds of clients.
The client would edit data through a web interface, the mobile interface would just make calls to the web server and return JSON (probably).
I'm a lowly cms-developing kinda guy, so I'm not sure how to handle this. My question is more or less about performance; the most I've ever seen in a MySQL table was 340k, and it was already sort of slow (granted it wasn't the best server either).
I just can't fathom a table with 40 million rows (and potential to continually grow) running well.
My plan was to have a "core" database that held the name of the "real" database, so the user would come in and try to access a client's data, it would go to the core database and figure out which database to get the information from.
I'm not concerned with data separation or data security (it's not private information)
Yes, it's possible and my company does it. I'm certainly not going to say it's smart, though. We have a SAAS marketing automation system. Some client's databases have 1 million+ records. We deal with a second "common" database that has a "fulfillment" table tracking emails, letters, phone calls, etc with over 4 million records, plus numerous other very large shared tables. With proper indexing, optimizing, maintaining a separate DB-only server, and possibly clustering (which we don't yet have to do) you can handle a LOT of data......in many cases, those who think it can only handle a few hundred thousand records work on a competing product for a living. If you still doubt whether it's valid, consider that per MySQL's clustering metrics, an 8 server cluster can handle 2.5million updates PER SECOND. Not too shabby at all.....
The problem with using two databases is juggling multiple connections. Is it tough? No, not really. You create different objects and reference your connection classes based on which database you want. In our case, we hit the main database's company class to deduce the client db name and then build the second connection based on that. But, when you're juggling those connections back and forth you can run into errors that require extra debugging. It's not just "Is my query valid?" but "Am I actually getting the correct database connection?" In our case, a dropped session can cause all sorts of PDO errors to fire because the system no longer can keep track of which client database to access. Plus, from a maintainability standpoint, it's a scary process trying to push table structure updates to 100 different live database. Yes, it can be automated. But one slip up and you've knocked a LOT of people down and made a ton of extra work for yourself. Now, calculate the extra development and testing required to juggle connections and push updates....that will be your measure of whether it's worthwhile.
My recommendation? Find a host that allows you to put two machines on the same local network. We chose Linode, but who you use is irrelevant. Start out with your dedicated database server, plan ahead to do clustering when it's necessary. Keep all your content in one DB, index and optimize religiously. Finally, find a REALLY good DB guy and treat him well. With that much data, a great DBA would be a must.

How to increase performance for MySQL database

How to increase the performance for mysql database because I have my website hosted in shared server and they have suspended my account because of "too many queries"
the stuff asked "index" or "cache" or trim my database
I don't know what does "index" and cache mean and how to do it on php
thanks
What an index is:
Think of a database table as a library - you have a big collection of books (records), each with associated data (author name, publisher, publication date, ISBN, content). Also assume that this is a very naive library, where all the books are shelved in order by ISBN (primary key). Just as the books can only have one physical ordering, a database table can only have one primary key index.
Now imagine someone comes to the librarian (database program) and says, "I would like to know how many Nora Roberts books are in the library". To answer this question, the librarian has to walk the aisles and look at every book in the library, which is very slow. If the librarian gets many requests like this, it is worth his time to set up a card catalog by author name (index on name) - then he can answer such questions much more quickly by referring to the catalog instead of walking the shelves. Essentially, the index sets up an 'alternative ordering' of the books - it treats them as if they were sorted alphabetically by author.
Notice that 1) it takes time to set up the catalog, 2) the catalog takes up extra space in the library, and 3) it complicates the process of adding a book to the library - instead of just sticking a book on the shelf in order, the librarian also has to fill out an index card and add it to the catalog. In just the same way, adding an index on a database field can speed up your queries, but the index itself takes storage space and slows down inserts. For this reason, you should only create indexes in response to need - there is no point in indexing a field you rarely search on.
What caching is:
If the librarian has many people coming in and asking the same questions over and over, it may be worth his time to write the answer down at the front desk. Instead of checking the stacks or the catalog, he can simply say, "here is the answer I gave to the last person who asked that question".
In your script, this may apply in different ways. You can store the results of a database query or a calculation or part of a rendered web page; you can store it to a secondary database table or a file or a session variable or to a memory service like memcached. You can store a pre-parsed database query, ready to run. Some libraries like Smarty will automatically store part or all of a page for you. By storing the result and reusing it you can avoid doing the same work many times.
In every case, you have to worry about how long the answer will remain valid. What if the library got a new book in? Is it OK to use an answer that may be five minutes out of date? What about a day out of date?
Caching is very application-specific; you will have to think about what your data means, how often it changes, how expensive the calculation is, how often the result is needed. If the data changes slowly, it may be best to recalculate and store the result every time a change is made; if it changes often but is not crucial, it may be sufficient to update only if the cached value is more than a certain age.
Setup a copy of your application locally, enable the mysql query log, and setup xdebug or some other profiler. The start collecting data, and testing your application. There are lots of guides, and books available about how to optimize things. It is important that you spend time testing, and collecting data first so you optimize the right things.
Using the data you have collected try and reduce the number of queries per page-view, Ideally, you should be able to get everything you need in less 5-10 queries.
Look at the logs and see if you are asking for the same thing twice. It is a bad idea to request a record in one portion of your code, and then request it again from the database a few lines later unless you are sure the value is likely to have changed.
Look for queries embedded in loop, and try to refactor them so you make a single query and simply loop on the results.
The select * you mention using is an indication you may be doing something wrong. You probably should be listing fields you explicitly need. Check this site or google for lots of good arguments about why select * is evil.
Start looking at your queries and then using explain on them. For queries that are frequently used make sure they are using a good index and not doing a full table scan. Tweak indexes on your development database and test.
There are a couple things you can look into:
Query Design - look into more advanced and faster solutions
Hardware - throw better and faster hardware at the problem
Database Design - use indexes and practice good database design
All of these are easier said than done, but it is a start.
Firstly, sack your host, get off shared hosting into an environment you have full control over and stand a chance of being able to tune decently.
Replicate that environment in your lab, ideally with the same hardware as production; this includes things like RAID controller.
Did I mention that you need a RAID controller. Yes you do. You can't achieve decent write performance without one - which needs a battery backed cache. If you don't have one, each write needs to physically hit the disc which is ruinous for performance.
Anyway, back to read performance, once you've got the machine with the same spec RAID controller (and same discs, obviously) as production in your lab, you can try to tune stuff up.
More RAM is usually the cheapest way of achieving better performance - make sure that you've got MySQL configured to use it - which means tuning storage-engine specific parameters.
I am assuming here that you have at least 100G of data; if not, just buy enough ram that your entire DB fits in ram then read performance is essentially solved.
Software changes that others have mentioned such as optimising queries and adding indexes are helpful too, but only once you've got a development hardware environment that enables you to usefully do performance work - i.e. measure performance of your application meaningfully - which means real hardware (not VMs), which is consistent with the hardware environment used in production.
Oh yes - one more thing - don't even THINK about deploying a database server on a 32-bit OS, it's a ruinous waste of good ram.
Indexing is done on the database tables in order to speed queries. If you don't know what it means you have none. At a minumum you should have indexes on every foriegn key and on most fileds that are used frequently in the where clauses of your queries. Primary keys should have indexes automatically assuming you set them up to begin with which I would find unlikely in someone who doesn't know what an index is. Are your tables normalized?
BTW, since you are doing a division in your math (why I haven't a clue), you should Google integer math. You may neot be getting correct results.
You should not select * ever. Instead, select only the data you need for that particular call. And what is your intention here?
order by votes*1000+((1440 - ($server_date - date))/60)2+visites600 desc
You may have poorly-written queries, and/or poorly written pages that run too many queries. Could you give us specific examples of queries you're using that are ran on a regular basis?
sure
this query to fetch the last 3 posts
select * from posts where visible = 1 and date > ($server_date - 86400) and dont_show_in_frontpage = 0 order by votes*1000+((1440 - ($server_date - date))/60)*2+visites*600 desc limit 3
what do you think?

Categories