How to work with big tables in MySQL + PHP

Let's say we have a MySQL table with user posts. We need to find all posts where user_id=1 and show them in descending order by date. But the more posts the table holds, the slower the search becomes, right? What if there are 10,000 posts in the table and we need to find just 3 of them? How long will that take? How can we optimize it? Can you please explain how to design the data correctly, or just give the general concept?

This is a bit long for a comment.
A table with 10,000 rows is not a large table. SQL databases regularly handle queries on tables tens, hundreds, even thousands of times bigger than that.
You need to learn about indexes and partitioning. A good place to start is with the MySQL documentation on these topics.
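For the concrete query in the question, a composite index is usually all it takes. A minimal sketch in PHP/PDO, assuming a posts table with illustrative column names (user_id, created_at, body) and hypothetical connection details:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=blog;charset=utf8mb4', 'user', 'pass');

// With a composite index, MySQL jumps straight to user_id = 1 and can
// read the rows in date order (scanning the index backwards for DESC),
// so LIMIT 3 touches only 3 rows no matter how big the table gets.
$pdo->exec('CREATE INDEX idx_user_date ON posts (user_id, created_at)');

$stmt = $pdo->prepare(
    'SELECT body, created_at
       FROM posts
      WHERE user_id = ?
      ORDER BY created_at DESC
      LIMIT 3'
);
$stmt->execute([1]);
$posts = $stmt->fetchAll(PDO::FETCH_ASSOC);

With that index in place, 10,000 rows or 10 million rows makes little difference to this particular query.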

Related

Single vs multiple blog tables

I'm creating a classic PHP blog and have a dilemma about a single-table versus two-table MySQL approach.
In the two-table case, current posts would live in an actual table (100 rows max) and archived posts in an archive table (20,000 rows max).
Both tables have the same structure.
The actual table is queried very often; the archive table much less so.
But sometimes there are JOIN and UNION queries covering both tables.
Logically, performance is much better on a smaller table, but is that, in my case, enough reason to create two tables instead of a single one?
There is also a third solution: a single table with two partitions, actual (100 rows) and archive (20,000 rows).
What to do?
You wrote:
Logically, performance is much better on a smaller table
With respect, your intuition about this is entirely incorrect for tables containing fewer than about ten million rows. A purpose of SQL is to allow rapid retrieval of a few items from among many. Thousands of person-years of programmer labor (not an exaggeration) have gone into making this kind of thing very fast. You won't be able to outsmart that collective effort.
Put your items in one table. If you need to distinguish between active and inactive items, make a column called active or some such thing, and retrieve them with WHERE active=1 or some such query term.
If you think you're having performance problems, you can add indexes to your tables. Read this: https://use-the-index-luke.com/
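A sketch of the one-table approach with an active flag, assuming a posts table and id/created_at columns (names are illustrative):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=blog;charset=utf8mb4', 'user', 'pass');

// One table for everything; a flag separates live posts from archived ones.
$pdo->exec('ALTER TABLE posts ADD COLUMN active TINYINT(1) NOT NULL DEFAULT 1');
$pdo->exec('CREATE INDEX idx_active ON posts (active)');

// Fetching current posts stays fast even with 20,000+ archived rows.
$stmt = $pdo->query('SELECT * FROM posts WHERE active = 1 ORDER BY created_at DESC');
$current = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Archiving a post is a one-column UPDATE, not a move between tables.
$archive = $pdo->prepare('UPDATE posts SET active = 0 WHERE id = ?');
$archive->execute([42]);

The JOIN and UNION queries across both kinds of posts also stop being a problem, since everything is already in one table.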
While designing databases, don't think only about how you will store the data; think about all the possible cases:
How are you going to retrieve and update information?
Will there be different views and different permissions for different people?
In your case, archive seems like a subset of the actual table. So a single table would be preferred, with a column for keeping track of archived posts.

Is DynamoDB the right option for this use case?

I want to love DynamoDB, but the major drawback is the query/scan on the whole DB to pull the results for one query. Would I be better off sticking with MySQL, or is there another solution I should be aware of?
Uses:
Newsfeed items (Pulls most recent items from table where id in x,x,x,x,x)
User profile relationships (users follow and friend each other)
User lists (users can have up to 1,000 items in one list)
I am happy to mix and match database solutions. The main use is lists.
There will be a few million lists eventually, ranging from 5 to 1,000 items per list. The list table is formatted as follows: list_id (bigint) | order (int(1)) | item_text (varchar(500)) | item_text2 (varchar(12)) | timestamp (int(11))
The main queries on this DB would be on the 'list_relations' table:
SELECT item_text FROM lists WHERE list_id = 539830
I suppose this is my main question: can we get all items for a particular list_id without a slow query/scan? And by 'slow', do people mean a second, or a few minutes?
Thank you
I'm not going to address whether or not it's a good choice or the right choice, but you can do what you're asking. I have a large DynamoDB instance with vehicle VINs as the hash key and something else for my range key, plus a secondary index on VIN and a timestamp field, and I am able to make fast queries over thousands of records for specific vehicles across timestamp ranges, no problem.
Constructing your schema in DynamoDB requires different considerations than building in MySQL.
You want to avoid scans as much as possible, which means picking your hash key carefully.
Depending on your exact queries, you may also need multiple tables that hold the same data but with different hash keys to suit your querying needs.
You also did not mention the LSI and GSI features of DynamoDB; these also help queryability, but they have their own sets of drawbacks. It is difficult to advise further without knowing more details about your requirements.
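To the main question: with list_id as the table's partition key, a Query (not a Scan) returns all items for one list directly, typically in milliseconds rather than seconds. A sketch using the AWS SDK for PHP; the table and attribute names come from the question, while the region and key layout are assumptions:

<?php
require 'vendor/autoload.php';

use Aws\DynamoDb\DynamoDbClient;

$client = new DynamoDbClient([
    'region'  => 'us-east-1',  // assumption: your region
    'version' => 'latest',
]);

// Assumes list_id is the partition key and order is the sort key, so one
// Query retrieves a whole list, already sorted, with no Scan involved.
$result = $client->query([
    'TableName'                 => 'lists',
    'KeyConditionExpression'    => 'list_id = :id',
    'ExpressionAttributeValues' => [':id' => ['N' => '539830']],
    'ProjectionExpression'      => 'item_text',
]);

foreach ($result['Items'] as $item) {
    echo $item['item_text']['S'], "\n";
}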

PHP forum database optimisation

I've been thinking about creating a forum in PHP so I did a little research to see what the standard is for the tables that people create in the database. On most websites I've looked up, they always choose to have one table for the threads and a second for the posts on the threads.
Having a table for the threads seems perfectly rational to me, but one table to hold all the posts on all the threads seems like a little too much. Would it be better to create a table for each thread to hold that thread's posts, instead of sticking a few hundred thousand posts in one table?
The tables should represent the structure of the data in your database. If you have 2 objects, which in this case are your threads and your posts, you should put them in 2 tables.
Trust me, it will be a nightmare trying to figure out the right table to show for each post if you do it the way you're thinking. What would the SQL look like? Something like
SELECT *
FROM PostTable17256
and you would have to dynamically construct this query on each request.
However, by using 1 table, you can simply get a ThreadID and pass it as a variable to your query.
SELECT *
FROM Posts
WHERE ThreadID = $ThreadID
Relational databases are designed to have tables which hold lots of rows. You would probably be surprised what DBAs consider to be a "lot" by the way. A table with 1,000,000 rows is considered small to medium in most places.
Nope, nope, nope. Databases love huge tables. Splitting posts into multiple tables will cause many, many headaches.
Storing posts in one table is the best solution.
MySQL can easily hold millions of rows in a table.
Creating multiple tables would cause a few problems.
For example, you would not be able to JOIN posts from different threads.
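A sketch of the single-table design in PHP/PDO (column names are illustrative), including the kind of cross-thread JOIN that per-thread tables would make impossible:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=forum;charset=utf8mb4', 'user', 'pass');
$threadId = 17256;  // example values
$userId   = 5;

// All posts live in one table keyed by thread_id.
$stmt = $pdo->prepare('SELECT * FROM posts WHERE thread_id = ? ORDER BY created_at');
$stmt->execute([$threadId]);
$posts = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Cross-thread queries stay trivial, e.g. one user's posts in every thread:
$stmt = $pdo->prepare(
    'SELECT t.title, p.body, p.created_at
       FROM posts p
       JOIN threads t ON t.id = p.thread_id
      WHERE p.user_id = ?
      ORDER BY p.created_at DESC'
);
$stmt->execute([$userId]);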

How to deal with big tables in MySQL? (DB design for a Game)

Suppose I have a Games table, a Players table, and a Users table.
Each Game has many Players
Each Player has a status, like 'dead' or 'alive'. So players are not users.
Each Player has a User
The structure looks fine so far. But I was wondering: there may be thousands of games a day (this kind of game is very short in duration), and each game has 10 or 20 players, so I may end up with millions of rows in the Players table in less than a year. And I need to keep each player stored in the table even after the game ends, because I want to be able to replay any game. I'm worried about performance at that point; selects and updates will become slower and slower, right?
Any thoughts?
Essentially it is a question of scalability. Because scalability is a problem almost all popular web sites, games, etc. encounter, there is a range of solutions to it. First of all, given a reasonable database design and use of indexes, modern databases can handle millions of rows of data just fine. If your game is so popular that the amount of data grows beyond what modern databases can handle, and you have some business model for your enterprise, you will probably be earning enough to hire top-notch experts to help you with that problem.
If you are just starting to implement the game, I would suggest you leave the fine-tuning of the database and queries for later; odds are the performance bottlenecks will be in different places than where you expect them. Do not optimize prematurely :)
Yes, over time there could be some performance problems, but proper indexing on these tables' fields will make things much easier for you.
Track the SELECT and UPDATE queries your application runs against these tables and index accordingly.
You can refer to How MySQL Uses Indexes and EXPLAIN Output Format in the MySQL documentation.
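A sketch of the indexing being suggested, with illustrative column names for the Players table:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=game;charset=utf8mb4', 'user', 'pass');

// Index the columns your SELECTs and UPDATEs actually filter on.
$pdo->exec('CREATE INDEX idx_players_game ON players (game_id)');
$pdo->exec('CREATE INDEX idx_players_user ON players (user_id)');

// Replaying a game now reads only that game's 10-20 rows via the index,
// regardless of how many millions of rows the table holds.
$stmt = $pdo->prepare('SELECT * FROM players WHERE game_id = ?');
$stmt->execute([12345]);

// EXPLAIN tells you whether the index is actually being used.
foreach ($pdo->query('EXPLAIN SELECT * FROM players WHERE game_id = 12345') as $row) {
    print_r($row);
}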
You can also add some logic to archive games or records after some time (say, 1 or 2 months) into another table with the same structure.
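A sketch of that archiving idea, assuming a players_archive table with the same structure and an ended_at column on games (both hypothetical):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=game;charset=utf8mb4', 'user', 'pass');

// Move players of games that ended over 2 months ago into an
// identically-structured archive table, inside one transaction
// so no row is lost or duplicated (assumes InnoDB).
$pdo->beginTransaction();
$pdo->exec(
    'INSERT INTO players_archive
     SELECT p.*
       FROM players p
       JOIN games g ON g.id = p.game_id
      WHERE g.ended_at < NOW() - INTERVAL 2 MONTH'
);
$pdo->exec(
    'DELETE p
       FROM players p
       JOIN games g ON g.id = p.game_id
      WHERE g.ended_at < NOW() - INTERVAL 2 MONTH'
);
$pdo->commit();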

Is naming tables september_2010 acceptable and efficient for large data sets dependent on time?

I need to store about 73,200 records per day consisting of 3 points of data: id, date, and integer.
Some members of my team suggest creating tables using the month as the table name (september_2010), while others suggest having one table with lots of data in it...
Any suggestions on how to deal with this amount of data? Thanks.
I recommend against that. I call this antipattern Metadata Tribbles. It creates multiple problems:
You need to remember to create a new table every month or else your app breaks.
Querying aggregates against all rows regardless of month is harder.
Updating a date potentially means moving a row from one table to another.
It's harder to guarantee the uniqueness of pseudokeys across multiple tables.
My recommendation is to keep it in one table until and unless you've demonstrated that the size of the table is becoming a genuine problem, and you can't solve it any other way (e.g. caching, indexing, partitioning).
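If the table ever does get that big, MySQL's native partitioning gives you the monthly split without the monthly table names. A sketch matching the question's three columns (names are illustrative):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=metrics;charset=utf8mb4', 'user', 'pass');

// One logical table; MySQL routes rows to monthly partitions internally.
// Note the partitioning column must be part of the primary key.
$pdo->exec(
    "CREATE TABLE readings (
        id         BIGINT NOT NULL AUTO_INCREMENT,
        reading_at DATE   NOT NULL,
        value      INT    NOT NULL,
        PRIMARY KEY (id, reading_at)
     )
     PARTITION BY RANGE (TO_DAYS(reading_at)) (
        PARTITION p201009 VALUES LESS THAN (TO_DAYS('2010-10-01')),
        PARTITION p201010 VALUES LESS THAN (TO_DAYS('2010-11-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
     )"
);

// Queries are unchanged; MySQL prunes to the relevant partition(s).
$stmt = $pdo->prepare('SELECT * FROM readings WHERE reading_at BETWEEN ? AND ?');
$stmt->execute(['2010-09-01', '2010-09-30']);

Dropping a month later becomes ALTER TABLE readings DROP PARTITION p201009, which is far faster than a bulk DELETE.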
Seems like it should be just fine holding everything in one table. It will make retrieval much easier in the future to maintain 1 table, as opposed to 12 tables per year. At 73,200 records per day it will take you almost 4 years to hit 100,000,000, which is still well within MySQL's capabilities.
Absolutely not.
It will ruin the relationships between tables.
Table relations are built on field values, not table names.
Especially for this very table, which will grow by just ~300 MB a year.
So in 100 days you have 7.3M rows, roughly 27M a year. 27M rows isn't a lot anymore. MySQL can handle tables with millions of rows. It really depends on your hardware and your query types and query frequency.
But you should be able to partition that table natively (MySQL supports partitioning as of 5.1); what you're describing is the old SQL Server method of partitioning. After building those monthly tables you'd build a view that concatenates them together to look like one big table, which is essentially what partitioning does, except it's all under the covers and fully optimized.
Usually the manual approach creates more trouble than it's worth: it's more maintenance, your queries need more logic, and it's painful to pull data from more than one period.
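For reference, the manual approach described above looks something like this (table names hypothetical); native partitioning makes it unnecessary:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=metrics;charset=utf8mb4', 'user', 'pass');

// The hand-rolled "partitioning": monthly tables glued together by a view.
$pdo->exec(
    'CREATE VIEW readings_all AS
     SELECT * FROM readings_2010_09
     UNION ALL
     SELECT * FROM readings_2010_10'
);
// Every new month means rebuilding the view -- exactly the maintenance
// burden warned about above, and what native partitioning automates.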
We store 200+ million time-based records in one (MyISAM) table, and queries are still blazingly fast.
You just need to ensure there's an index on your time/date column and that your queries make use of it (e.g. a query that messes around with DATE_FORMAT or similar on a date column will likely not use the index). I wouldn't put them in separate tables just for the sake of retrieval performance.
One thing that gets very painful with such a large number of records is when you have to delete old data; this can take a long time (10 minutes to 2 hours to e.g. wipe a month's worth of data in tables with hundreds of millions of rows). For that reason we've partitioned the tables, and use a time_dimension relation table for managing the periods instead of simple date/datetime columns or strings/varchars representing dates.
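An example of the index point above: the first query below can use an index on the date column, while the second cannot, because wrapping the column in DATE_FORMAT() forces MySQL to evaluate the function for every row (table name illustrative):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=metrics;charset=utf8mb4', 'user', 'pass');

// Index-friendly: the column is compared bare, so a range scan applies.
$good = $pdo->prepare(
    'SELECT * FROM readings WHERE reading_at >= ? AND reading_at < ?'
);
$good->execute(['2010-09-01', '2010-10-01']);

// Index-hostile: the function on the column defeats the index.
$bad = $pdo->prepare(
    "SELECT * FROM readings WHERE DATE_FORMAT(reading_at, '%Y-%m') = ?"
);
$bad->execute(['2010-09']);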
Some members of my team suggest creating tables using the month as the table name (september_2010), while others suggest having one table with lots of data in it...
Don't listen to them. You're already storing a date stamp, what about different months makes it a good idea to split the data that way? The engine will handle the larger data sets just fine, so splitting by month does nothing but artificially segregate the data.
My first reaction is: Aaaaaaaaahhhhhhhhh!!!!!!
Table names should not embed data values. You don't say what the data means, but supposing for the sake of argument it is, I don't know, temperature readings. Just imagine trying to write a query to find all the months in which average temperature increased over the previous month. You'd have to loop through table names. Worse yet, imagine trying to find all 30-day periods -- i.e. periods that might cross month boundaries -- where temperature increased over the previous 30-day period.
Indeed, just retrieving an old record would go from a trivial operation -- "select * where id=whatever" -- to a complex operation requiring the program to generate table names from the date on the fly. If you didn't know the date, you would have to scan through all the tables, searching each one for the desired record. Yuck.
With all the data in one properly-normalized table, queries like the above are pretty trivial. With separate tables for each month, they're a nightmare.
Just make the date part of the index and the performance penalty of having all the records in one table should be very small. If the size of the table really becomes a performance problem, I could dimly see making one table for archive data with all the old stuff and one for current data with everything you retrieve regularly. But don't create hundreds of tables. Most database engines have ways to partition your data across multiple drives using "table spaces" or the like. Use the sophisticated features of the database if necessary, rather than hacking together a crude simulation.
Depends on what searches you'll need to do. If normally constrained by date, splitting is good.
If you do split, consider naming the tables like foo_2010_09 so the tables will sort alphanumerically.
What is your DB platform?
In SQL Server 2005+ you can partition on date.
My bad, I didn't notice the tag. #thetaiko is right though; this is well within MySQL's capabilities.
I would say it depends on how the data is used. If most queries run over the complete data set, it would be an overhead to always join the tables back together again.
If most of the time you only need part of the data (by date), it can make sense to segment the table into smaller pieces.
For the naming, I would use tablename_yyyymm.
Edit: you should then also think about another layer between the DB and your app to handle the segmented tables depending on the date given, which can get pretty complicated.
I'd suggest dropping the year and just having one table per month, named after the month. Archive your data annually by renaming all the tables $MONTH_$YEAR and re-creating the month tables. Or, since you're storing a timestamp with your data, just keep appending to the same tables. I assume by virtue of the fact that you're asking the question in the first place, that segregating your data by month fits your reporting requirements. If not, then I'd recommend keeping it all in one table and periodically archiving off historical records when performance gets to be an issue.
I agree that this idea complicates your database needlessly. Use a single table. As others have pointed out, it's not nearly enough data to warrant extra handling. Unless you use SQLite, your database will handle it well.
However, it also depends on how you want to access the data. If the old entries are really only there for archival purposes, then the archive pattern is an option. It's common for versioning systems to have the infrequently used data separated out. In your case you'd only want everything older than 1 year to move out of the main table. And this is strictly a database administration task, not application behavior. The application would only join the current table and the _archive table, if at all. Again, this highly depends on the use case. Are the old entries generally needed? Is there too much data to process regularly?
