We have a 10-year-old website that has been using SMF. We have now written our very own forum script, but since we are inexperienced developers, we have no idea about optimization. Our messages table is too big (about 2 GB including indexes, 2,654,193 rows in total). SMF handled this table really quickly, but our new forum script causes a high system load average.
Here is the query list: http://i.imgur.com/NPm0DmM.jpg
Here is the table structure and indexes: http://i.imgur.com/FwPdMoI.jpg
Note: We use APC for acceleration and Memcached for caching. I'm a hundred percent sure that the messages table (and maybe the topics table) is slowing our website down.
This is just the right moment to learn all about SQL indexing.
Proper indexing is THE way to improve SQL performance. Indexing has to be done by developers.
Consider starting here (it's the free web edition of my book SQL Performance Explained):
http://use-the-index-luke.com/
Major disclaimer: all links go to my own content.
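To make that concrete for your case: a forum's hottest query is usually "all messages of one topic, in posting order", and a composite index serves that very well. A minimal sketch, assuming columns named id_topic and id_msg (the names are guesses, since I only have your screenshots to go by):

    -- Lets MySQL find one topic's messages and return them in order
    -- without a filesort.
    CREATE INDEX idx_messages_topic_msg
        ON messages (id_topic, id_msg);

    -- A typical page query that can use this index:
    SELECT *
      FROM messages
     WHERE id_topic = 123
     ORDER BY id_msg
     LIMIT 0, 20;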
Related
I have an optimization question.
The PHP web application that I have recently started working with has several large tables in a MySQL database. The information in these tables should be accessible at all times for business purposes, which makes them grow really big eventually.
The tables are regularly written to and recent records are frequently selected.
The previous developers came up with a very weird practice for optimizing the system. They created a separate database for storing recent records in order to keep the tables compact, and they sync the tables once a record grows "old" (more than 24 hours old).
The application uses the current date to pick the right database when performing a SELECT query.
This is a very weird solution in my opinion. We had a big argument over it, and I am looking to change this. However, before I do, I decided to ask:
1) Has anyone ever come across anything similar before? I mean, a separate database for recent records.
2) What are the most common practices to optimize databases for this particular case?
Any opinions are welcome, as there are many ways one can go at this point.
Try using an index:
CREATE INDEX
That improves how the information is accessed and queried.
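For the "recent records" pattern you describe, the first thing to check is an index on the timestamp column you filter by. A minimal sketch, assuming a table called records with a created_at column (both names are made up):

    CREATE INDEX idx_records_created_at
        ON records (created_at);

    -- Recent rows then come from a cheap index range scan
    -- instead of a full table scan:
    SELECT *
      FROM records
     WHERE created_at >= NOW() - INTERVAL 1 DAY;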
I believe RANGE partitioning could help you.
The solution is to partition the table based on a date range.
By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan. Maintenance tasks, such as rebuilding indexes or backing up a table, can run more quickly.
The MySQL documentation can be useful; check this out:
https://dev.mysql.com/doc/refman/5.5/en/partitioning-columns-range.html
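As a rough sketch of what that looks like (the table and column names are made up, and the ranges would of course match your own data):

    CREATE TABLE logs (
        id         INT UNSIGNED NOT NULL,
        created_at DATE NOT NULL,
        message    VARCHAR(255)
    )
    PARTITION BY RANGE COLUMNS (created_at) (
        PARTITION p2013 VALUES LESS THAN ('2014-01-01'),
        PARTITION p2014 VALUES LESS THAN ('2015-01-01'),
        PARTITION pmax  VALUES LESS THAN (MAXVALUE)
    );

Queries that filter on created_at only touch the matching partitions, and dropping an old partition is far cheaper than a big DELETE.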
We have a database of about 500 000 records that has non-normalized data (Vehicles for sale). We have a master MySQL DB and to enable fast searching, we update a Solr index whenever changes are made. Most of our data is served from the Solr index due to the complex nature of the joins and relationships in the MySQL DB.
We have started to run into problems with the speed and integrity of updates from within Solr. When we push updates using a soft commit, we find that it takes ~1 second for the changes to become visible. While it isn't a big issue at the moment, we are concerned that the problem will get worse, and we want to have a solution before we get there.
We are after some guidance on what solutions we should be looking at:
How big is our dataset in comparison to other solutions using Solr in this manner?
We are only using 1 server for Solr at the moment. What is the split point to move to clustering and will that help or hinder our update problem?
One solution we have seen is using a NoSQL DB for some of the data. Do NoSQL DBs have better performance on a record-by-record level?
Are there some other options that might be worth looking into?
I'll answer your questions in sequence:
1) No, your dataset is not that huge. Anything below 1 million records is fine for Solr.
2) Using a single Solr server is not a good option. Try SolrCloud; it is the best way to get Solr into high availability, and it will improve your performance.
3) Both SQL and NoSQL databases have their advantages and disadvantages. It depends on your dataset. In general, NoSQL databases are faster.
4) I suggest going with SolrCloud. It is fast and reliable.
We are planning to create an advertisement network. As any normal online advertisement network, we would provide ad serving, reporting (stats) and a little browsing site for publishers/advertisers.
Because the application would receive a huge number of impression (ad serving) requests, it must be able to quickly insert data to log impressions and clicks, and to keep counts of impressions and clicks for every publisher/advertiser. This data would then be used to monitor impressions/clicks from publishers and to generate reports.
Right now we have planned the whole system to be based on PHP, MySQL (InnoDB), php-eAccelerator, and Memcached (just to store active ads).
Problems/Issues
Scaling...
I seriously feel that our application is not going to scale well when our traffic grows.
MySQL INSERTs and UPDATEs would surely be the bottleneck. Also, how do we distribute all of this across multiple servers so that our application can scale with load?
Can anyone please help propose a structure for the application, especially for impression logging and calculation? Would MongoDB be a better solution in any way?
Any help would be highly appreciated.
I've built several high-volume statistical collection systems using MySQL. They perform fairly well as long as you keep ahead of the scaling curve with careful planning. In particular, if you're doing lots of INSERT or UPDATE queries (heavy writes), you'll need to keep your rows small, using an INT that references a look-up table instead of VARCHAR columns for instance, and pay careful attention to how big your indexes are getting.
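As an illustration of the "narrow rows" point, here is a minimal sketch of an impressions table that stores only small fixed-width columns and pushes strings out into look-up tables (all names here are made up, not taken from your schema):

    -- The string lives once, in the look-up table...
    CREATE TABLE publishers (
        publisher_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name         VARCHAR(255) NOT NULL
    ) ENGINE=InnoDB;

    -- ...while the high-volume table keeps every row as small as possible.
    CREATE TABLE impressions (
        impression_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        publisher_id  INT UNSIGNED NOT NULL,
        ad_id         INT UNSIGNED NOT NULL,
        created_at    DATETIME NOT NULL,
        KEY idx_publisher_date (publisher_id, created_at)
    ) ENGINE=InnoDB;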
Always, always simulate your schema with massive amounts of test data. Abuse it to the breaking point, fix it, and abuse it all over again. You want to see smoke or you're not trying hard enough. Remember, hardware makes a massive difference, so be careful to use something as close as possible to the deployment target. Your SSD notebook will blow the doors off of a server with 15K enterprise drives in a RAID10 setup, for example, if you're doing heavy writes.
That being said, you might want to look at Redis. It's not a relational database, but it's several orders of magnitude faster than MySQL for things like "add one to column X" or "give me Y count for Z interval" type operations.
I am planning to create a card game engine using SQL. The game consists of 4 human players, the cards are stored in an SQL table, everything is done regarding game logic and points, each game is managed by a separate SQL table, and players are able to create rooms.
Each room has a game table containing the card data, with each player represented in a column, and a separate chat table.
If there were 1000 games running at the same time, and each time a card is played a request is made to the server either to remove a card from a player's deck or to record the player's score and the total game score, can this be handled in a single SQL database without delays and performance issues?
Can I use global temporary tables (##sometable) for each game room, or do I have to create the tables manually and delete them after the game ends?
I would also like to know whether storing chat data in a single SQL table would cause issues. One thing I thought of is saving the chat data for all open rooms in a single table with a game id column, but would this cause performance issues if there were thousands of lines of chat data?
Also, what about a database for each game? Would that be overkill?
How are such applications normally managed?
Do I have to use multiple servers and distribute the running games across them?
Any ideas you have about optimizing such things are welcome.
You should consider a memory-based cache system such as Velocity or Memcached to address the performance issues.
Yes. The discussion of how to scale a task like this is a long one.
You could. But you should rather consider a smarter model whereby multiple games occur in a single table (see the sketch at the end of this answer).
I would use SQL Server Service Broker for the chat.
Yes.
I recommend you break your questions up into multiple questions so that contributors who specialise in a single aspect of your problem domain can contribute accordingly.
I don't know how PHP works, but I am fairly sure that it would be far more efficient for a lot of the game logic to occur client-side. Making a server call for every game action would work; my opinion is just that it is sub-optimal.
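For the single-table model suggested above, here is a minimal sketch of how multiple games can share one set of tables; every table and column name here is an assumption:

    -- One row per game room instead of one table per game.
    CREATE TABLE games (
        game_id    INT IDENTITY(1,1) PRIMARY KEY,
        created_at DATETIME NOT NULL DEFAULT GETDATE()
    );

    -- Every card of every running game lives here, keyed by game_id.
    CREATE TABLE game_cards (
        game_id   INT NOT NULL REFERENCES games(game_id),
        player_no TINYINT NOT NULL,      -- 1..4
        card_code CHAR(2) NOT NULL,      -- e.g. 'AS' for the ace of spades
        PRIMARY KEY (game_id, player_no, card_code)
    );

    -- Chat for all rooms in one table, also keyed by game_id.
    CREATE TABLE game_chat (
        chat_id   BIGINT IDENTITY(1,1) PRIMARY KEY,
        game_id   INT NOT NULL REFERENCES games(game_id),
        player_no TINYINT NOT NULL,
        message   NVARCHAR(500) NOT NULL,
        sent_at   DATETIME NOT NULL DEFAULT GETDATE()
    );

Playing a card is then a DELETE (or UPDATE) on game_cards filtered by game_id, rather than a change to a per-game table.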
Yes. I would expect live players to have at least a 1-second delay before making their moves, and only one player is making a move at a time per game. So that is roughly 1000 transactions per second peak for 1000 games, which is not an excessive load on modern architectures.
There is more overhead in most DBMSs for creating and destroying tables. Keep it all in the same table.
Chat would be fine in a single table. You could keep performance up by archiving chat from previous, inactive games and removing it from the primary live DB (see the sketch at the end of this answer).
Yes, very inefficient. Added complexity for no gain.
Not sure what you are asking.
Only as you scale. I would imagine you would start with a single db server until you needed more capacity.
Good DB design from the beginning, by someone with experience, will go a long way. Don't waste too much time micro-optimizing at the get-go or you will never get off the ground. Optimize as you need to as you scale.
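For the chat archiving point above, a minimal sketch. It assumes a single game_chat table keyed by game_id (as in the earlier sketch), a games table with a hypothetical is_closed flag, and a game_chat_archive table with the same columns as game_chat:

    -- Copy chat rows belonging to finished games into the archive...
    INSERT INTO game_chat_archive (chat_id, game_id, player_no, message, sent_at)
    SELECT c.chat_id, c.game_id, c.player_no, c.message, c.sent_at
      FROM game_chat AS c
      JOIN games     AS g ON g.game_id = c.game_id
     WHERE g.is_closed = 1;

    -- ...then remove them from the live table.
    DELETE c
      FROM game_chat AS c
      JOIN games     AS g ON g.game_id = c.game_id
     WHERE g.is_closed = 1;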
The short version is that relational DBs such as SQL Server are not very useful for games because they cannot efficiently store heavily structured hierarchical data.
I would still advocate avoiding SQL, but there are now many more options in the NoSQL space, and for real performance you should consider using fast temporary storage such as Redis or Memcached.
You can take a quick look at a Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs Neo4j vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris comparison.
Optimizing is a different topic entirely; it is too wide and too project-specific.
I've been working on a new site of mine for a couple of days now which will be retrieving almost all of its most-used content from a MySQL database. Seeing as the database and website are still under development, the tables are really small at the moment and speed is of no concern yet.
But you know what they say, a little bit of hard work now saves you a headache later on.
Now I'm only 17, the only database I've ever been taught was through Microsoft Access, and we were practically given the database completed - we learned up to 3NF, but that was about it.
I remember reading once, when I was looking to pull data (randomly) out of a database, that large databases were taking several seconds or even minutes to complete a single query, so this got me thinking. In a fraction of a second I can submit a search to Google, Google processes the query and returns the result, and then my browser renders it, all done in the blink of an eye. And Google has billions of records to search through. And they're also doing this for millions of users simultaneously.
I'm thinking, how do they do it? I know that they have huge data centers, but still.
I realize that it probably comes down to the design of the database, how it's been optimized, and obviously the configuration. And I guess that's my question really. Could someone please tell me how to design high performance databases for millions/billions of rows (yes, I'm being optimistic), and possibly point me towards some good reading material to help me learn further?
Also, all my queries are done via PHP, if that's at all relevant to any answers.
The blog http://highscalability.com/ has some good articles and pointers to how companies handle large problems.
Specifically related to MySQL, you can Google for craigslist.org's use of MySQL.
http://www.slideshare.net/jzawodn/mysql-and-search-at-craigslist
First the good news... MySQL scales well (depending on the hardware) to at least hundreds of millions of rows.
Once you get to a certain point, a single database server will have trouble managing the load. That's when you get into the realm of partitioning or sharding... spreading the load across multiple database servers using any one of a number of different schemes (e.g. putting unrelated tables on different servers, spreading a single table across multiple servers e.g. by using the ID or date range as a partitioning key).
SQL does shard, but is not fundamentally designed to shard well. There's a whole category of storage alternatives collectively referred to as NoSQL that are designed to solve that very problem (MongoDB, Cassandra, HBase are a few).
When you use SQL at very large scale, you run into any number of issues such as making data model changes across a DB server farm, trouble keeping up with data backups, etc. That's a very complex topic, and people that solve it well are rare. For a glimpse at the issues, have a look at http://gigaom.com/cloud/facebook-trapped-in-mysql-fate-worse-than-death/
When selecting a database platform for a specific project, benchmark the solution early and often to understand whether or not it will meet the performance requirements that you envision. Having a framework to do that will help you learn about scalability, and will help you decide whether to invest effort in improving the data storage part of your solution, and will help you know where best to invest your time.
No one can tell you how to design databases. It comes after much reading and many hours of working on them. A good design is the product of many, many years of doing them, though. As you've only seen Access, you have little database knowledge yet. Search through Amazon.com and you'll get tons of titles. For someone who is just starting, almost any of them will do.
I mean no disrespect. I've been there, and I'm also a tutor for some people learning programming/database design. I do know that there's no silver bullet or shortcut for the work you have ahead.
If you intend to work with high-performance databases, you should keep something in mind: their design is per application. A good design depends on learning more and more about how the app's users interact with the system, the usage patterns, etc. The things you'll learn from books will give you options; using them will depend heavily on the scenario.
Good luck!
It doesn't all come down to the design of the database, though that is indeed a big part of it. The guys who made Google are geniuses, and if I'm not completely wrong about Google, you won't be able to find out exactly how they do what they do. Also, I know that years back they had more than 10,000 computers processing queries, and today they probably have many more. I also suspect they cache most of the recent/popular keywords. And all the websites have been indexed and analyzed using an unknown algorithm which makes sure the computers don't have to look through all the words on every page.
In fact, Google crawls the entire internet around every 14 days, so when you do a search you do not search the entire internet. Your search gets broken down into keywords and then these keywords are used to narrow the number of relevant pages - and I'm pretty sure all pages have already been analyzed for important and/or relevant keywords before you even thought of visiting google.com.
Have a look at this question.
Have a look into Sphinx server.
http://sphinxsearch.com/
Craigslist uses that for their search engine. Basically, you give it a source and it indexes whatever you want (a MySQL database/table, text files, etc.). If it works for Craigslist, it should work for you.
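As a rough illustration (the index name and search terms here are made up): once Sphinx has built an index from a MySQL table, you can query it through its MySQL-compatible SphinxQL interface and then fetch the matching rows from MySQL by ID:

    SELECT id
      FROM messages_idx
     WHERE MATCH('performance tuning')
     LIMIT 20;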