I have an online iPhone turn-based game, with lots of games running at the same time. I'm in the process of optimizing the code, since both I and the server have crashed today.
This is the setup:
Right now I have one table, "matches" (70 fields of data for each row), that keeps track of all the active matches. Every 7 seconds, the iPhone connects, downloads all the matches in the "matches" table that the player is active in, and updates the UI on the iPhone.
This worked great until about 1,000 people downloaded the game and played. The server crashed.
So to optimize, I figure I can create a new table called "matches_needs_update". This table has 2 columns: name and id. The "id" is the same as the match's id in the "matches" table. When a match is updated, it's put in this table.
Now, instead of searching through the whole "matches" table, the query just checks if the player has any matches that need to be updated, and then gets those matches from the "matches" table.
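Something like this (just a sketch; the column types are placeholders):

CREATE TABLE matches_needs_update (
    name VARCHAR(64),    -- placeholder type
    id   INT NOT NULL    -- the same id as the match in the "matches" table
);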
My question is twofold:
Is this the optimal solution?
If a player is active in, say, 10 matches, is there a good way to get those 10 matches from the "matches" table at the same time, or do I need a for loop doing 10 queries, one for each match:
"SELECT * FROM matches WHERE id = ?"
Thanks in advance
You need to get out of the database. Look to memcache or Redis.
I suggest APC, as you're on PHP and I assume you're doing this from a single MySQL database. It's easy to install, and it was slated to be included by default from PHP 6 onwards. Keep this one table in memory and it will fly.
Your database looks really small. A table like this should return results within milliseconds, and even hundreds of queries per second should work without any problems.
A couple of traditional pointers:
Make sure you pool your connections. You should never have to open a connection at the moment a customer needs the data.
Make sure there is an index on "user is in match" so that the result will be fetched from the index.
I'm sure you have enough memory to hold the entire structure in the cache and with these small tables no additional config should be needed.
Make sure your schema is normalized. One table for users, one for matches, and one linking users to matches (a sketch follows below).
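A minimal sketch of that layout, with hypothetical table and column names:

CREATE TABLE users (
    id   INT PRIMARY KEY,
    name VARCHAR(64)
);

CREATE TABLE matches (
    id INT PRIMARY KEY
    -- ...plus the ~70 match fields...
);

CREATE TABLE match_players (
    match_id INT NOT NULL,
    user_id  INT NOT NULL,
    PRIMARY KEY (match_id, user_id),
    KEY idx_user (user_id)    -- the "user is in match" index mentioned above
);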
It's time to start caching things, e.g. memcache and APC.
As for looping through the matches... that is the wrong way to go about it.
How is a user connected to a match? By an xref table? Or does the match table have something like player1, player2?
Looping through queries is not the way to go. Properly indexing your tables and doing a join to pull all the active matches by a userId would be more efficient (see the sketch below). Given the number of users, you may also want to (if you haven't) split the tables up for active and inactive games.
If there are 6,000 active games and 3,000,000 inactive, it's extremely beneficial to partition these tables.
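A sketch of that join, reusing the hypothetical match_players xref table from the answer above; it also answers the second half of the question, pulling all of a player's updated matches in one round trip instead of a 10-query loop:

SELECT m.*
FROM matches m
JOIN match_players mp ON mp.match_id = m.id
JOIN matches_needs_update u ON u.id = m.id
WHERE mp.user_id = ?

And if you already have the 10 match ids in hand, a single SELECT * FROM matches WHERE id IN (?, ?, ...) also avoids the loop.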
Assumptions
If A is a friend of B, B is also a friend of A.
I searched for this question and there are already lots of questions on Stack Overflow. But all of them suggest the same approach.
They are creating a table friend with three columns: from, to, and status. This serves both purposes: who sent the friend request, as well as who are friends if the status is accepted.
But this means if there are m users and each user has n friends, then I will have m × n rows in the friends table.
What I was thinking is to store the friends list in a text column. For every user I have a single row, and a friends column which will have all accepted friends' IDs separated by a character, say |, which I can explode to get the full friends list. Similarly, I will have another column named pending requests. When a request is accepted, the ID moves from the pending requests column to the friends column.
Now, this should significantly reduce the entries in the table and the search time.
The only overhead will be when I have to delete a friend: I will have to retrieve the friend string, search for the ID of the friend to be deleted, delete the ID, and update the column. However, this is almost negligible if I assume a user cannot have more than 2,000 friends.
I assume that I am definitely forgetting some situations, or that this approach has certain pitfalls. So please correct me if so.
The answer is NO! Do not try to implement this idea - it's a complete disaster.
I am going to describe more precisely why:
Relations. You are storing just keys separated with |. What if you want to display a list with the names of friends? You will have to get the list, explode it, and make n more queries to the DB. With a relation table (from | to | status) you will be able to do that with one JOIN (see the sketch at the end of this answer).
Deletions. Just horrible.
Inserts. For every insert you will need to do a SELECT + UPDATE instead of an INSERT.
Types. You should keep items in the DB as they are, so integers as integers. Converting ints into strings and back can cause errors and bugs.
No ORM support. In the future you will probably leave plain PHP for some framework. Keep in mind that none of them will support your idea.
Search time?
Please do some tests. A search with WHERE + PRIMARY KEY is very fast.
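A minimal sketch of the conventional design, with hypothetical table and column names:

CREATE TABLE friend (
    from_id INT NOT NULL,
    to_id   INT NOT NULL,
    status  TINYINT NOT NULL,    -- e.g. 0 = pending, 1 = accepted
    PRIMARY KEY (from_id, to_id)
);

-- All accepted friends of user 42, with their names, in one JOIN:
SELECT u.id, u.name
FROM friend f
JOIN users u ON u.id = f.to_id
WHERE f.from_id = 42 AND f.status = 1;

Deleting a friend is then a single indexed DELETE instead of a fetch-explode-rewrite round trip.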
I want to love DynamoDB, but the major drawback is the query/scan over the whole DB to pull the results for one query. Would I be better off sticking with MySQL, or is there another solution I should be aware of?
Uses:
Newsfeed items (pulls most recent items from the table where id in x,x,x,x,x)
User profile relationships (users follow and friend each other)
User lists (users can have up to 1,000 items in one list)
I am happy to mix and match database solutions. The main use is lists.
There will be a few million lists eventually, ranging from 5 to 1,000 items per list. The list table is formatted as follows:
list_id (bigint) | order (int(1)) | item_text (varchar(500)) | item_text2 (varchar(12)) | timestamp (int(11))
The main queries on this DB would be on the 'list_relations' table:
SELECT item_text FROM lists WHERE list_id = 539830
I suppose this is my main question: can we get all items for a particular list_id without a slow query/scan? And by 'slow', do people mean a second? Or a few minutes?
Thank you
I'm not going to address whether or not it's a good choice or the right choice, but you can do what you're asking. I have a large DynamoDB instance with vehicle VINs as the hash key and something else as my range key, plus a secondary index on VIN and a timestamp field, and I am able to make fast queries over thousands of records for specific vehicles across timestamp searches, no problem.
Constructing your schema in DynamoDB requires different considerations than building in MySQL.
You want to avoid scans as much as possible, which means picking your hash key carefully.
Depending on your exact queries, you may also need multiple tables that hold the same data under different hash keys.
You also did not mention the LSI and GSI features of DynamoDB; these also help your query-ability, but have their own sets of drawbacks. It is difficult to advise further without knowing more details about your requirements.
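For comparison, if you stay on MySQL, the main lists query above needs no scan at all. A sketch using the column names from the question (order is a reserved word in MySQL, hence the backticks):

CREATE TABLE lists (
    list_id     BIGINT NOT NULL,
    `order`     INT NOT NULL,
    item_text   VARCHAR(500),
    item_text2  VARCHAR(12),
    `timestamp` INT NOT NULL,
    PRIMARY KEY (list_id, `order`)    -- one sorted index range read per list
);

SELECT item_text FROM lists WHERE list_id = 539830;

The DynamoDB analogue is list_id as the hash key and order as the range key, so fetching a list is a Query rather than a Scan.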
I'm working on an already-made Facemash-like script. It's a script that shows two pictures, and the user chooses which picture looks better to him.
I wanted to create a small improvement that won't show a user a combination of two pictures he has already voted on.
I tried to do this in two ways, but neither of them is good enough or comfortable for the user.
First one - the two pictures are chosen at random. After a vote, a new record is created in the database with this specific combination and the value of the vote. If the combination of two pictures already exists as a record in the database, the page shows the historical vote, and after a few seconds the page refreshes, making another random combination.
Second one - the moment the names of pictures are added to the database, the script creates all possible combinations as records in the database. It's a good way, because the script pulls from the database a random record that doesn't contain any result yet, and saves it with a value after the vote, so there is no way to get repeats. The main problem with this way shows up when adding new pictures: the database becomes huge right at the start (n pictures mean n(n-1)/2 pair rows, so 2,000 pictures already mean almost 2 million records), and creating all possible combinations up front takes forever.
Because of that I'm looking for another solution. I would be glad to hear even a small piece of advice that might help me find a way.
Your first approach scales better; you just want to avoid showing a historical vote. You need to keep a history of votes anyway, so use that history as a filter. In the SELECT statement you are using to get the random faces, LEFT JOIN on the history table to use the join as a filter.
Example:
SELECT faces.uid AS f_uid
FROM faces
LEFT JOIN votes ON votes.user_id = ?    -- the current user's id
    AND (faces.uid = votes.face_id1 OR faces.uid = votes.face_id2)
WHERE votes.uid IS NULL    -- no join match = this user never voted on this face
ORDER BY RAND() LIMIT 2
That will make sure they never see the same face twice. It will become slower the more faces a user votes on. It won't be noticeably slower until they have done many hundreds of votes.
That said, you could change the LIMIT to something like 20 and cache the result (e.g. in the session). You then have the next 10 pairings (20/2 = 10) ready to go. That is sort of a combination of approaches 1 & 2.
I have a table which stores high scores for a game. The game has many levels, and scores are ordered by score DESC (which is an index), filtered by a level ID. Would partitioning on this level ID column produce the same result as creating many separate level tables (one for each level ID)? I need to separate out the level data somehow, as I'm expecting tens of millions of entries. I hear partitioning could speed this up while leaving my tables normalised.
Also, I have an unknown number of levels in my game (levels may be added or removed at any time). Can I specify to partition on this level ID column and have new partitions automatically created when a new (distinct) level ID is added to the highscore table? I may start with 10 separate levels but end up with 50, with all my data still kept in one table, but in many partitions. Do I have to index the level ID to make this work?
Thanks in advance for your advice!
Creating an index on a single column is good, but creating an index that contains two columns would be a better solution based on the information you have given. Since your queries filter on the level and then order by score, the level column should come first in the index. I would run:
alter table highscores add index (columnLevel, columnScore);
This will make performance much better. From a database point of view, no matter what highscores you are looking for, the database will know where to search for them.
On that note, if you can (and you are using MyISAM tables), you could also run:
alter table highscores order by columnLevel, columnScore;
which will then group all your data together, so that even though the database KNOWS where each bit is, it can find all the records that belong to one another nearby - which means less hard drive work - and therefore quicker results.
That second operation, too, can make a HUGE difference. My PC at work (a horrible old machine that was top of the range in the nineties) has a database with several million records in it that I built - nothing huge, about 2.5 GB of data including indexes - and performance was dragging, but ordering the data for the indexes improved query time from about 1.5 minutes per query to around 8 seconds. That's JUST due to hard drive speed in being able to get to all the sectors that contain the data.
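On the partitioning half of the question: MySQL will not create LIST or RANGE partitions automatically when a new level ID appears, but HASH partitioning spreads rows over a fixed number of partitions with no per-level maintenance, and a WHERE on the level column then prunes to a single partition. A sketch, reusing the hypothetical column names above (note that the partitioning column must be part of every unique key on the table, including the primary key):

ALTER TABLE highscores
    PARTITION BY HASH (columnLevel)
    PARTITIONS 16;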
If you plan to store data for different users, what about having 2 tables - one with all the information about the different levels, another with one row for every user along with his scores in XML/JSON?
I have recently written a survey application that has done its job and all the data is gathered. Now I have to analyze the data, and I'm having some time issues.
I have to find out how many people selected what option and display it all.
I'm using this query, which does do its job:
SELECT COUNT(*)
FROM survey
WHERE users = ? AND `table` = ? AND col = ? AND row = ? AND selected = ?
GROUP BY users, `table`, col, row, selected
As is evident from the "?", I'm using MySQLi (in PHP) to fetch the data when needed, but I fear this is what makes it so slow.
The table consists of all the elements above (plus a unique ID), and all of them are integers.
To explain some of the fields:
Each survey was divided into 3 or 4 tables (sized from 2x3 to 5x5) with a 1 to 10 happiness grade to select from. (Questions are on the right and top of the table; you answer where the questions intersect.)
users - age groups
table, row, col - explained above
selected - the chosen grade (explained above)
Now, with the surveys complete and around 1 million entries in the table, the query is getting very slow. Sometimes it takes about 3 minutes; sometimes (I guess) the time limit expires and you get no data at all. I also don't have access to the full database, just my empty "testing" one, since the customer is kinda paranoid :S (and his server seems to be a bit slow).
Now (after the initial essay) my questions are: I left indexing out intentionally because, with a lot of data being written during the survey, it would have been a bad idea. But since no new data is coming in at this point, would it make sense to index all the fields of the table? How much sense does it make to index integers that never go above 10? (As you can guess, I haven't got a clue about indexes.) Do I need the primary unique ID in this table?
I read somewhere that indexing may help GROUP BY, but only if you group by the first columns of the index (and since my ID is first and, from my point of view, useless - can I remove it and gain anything by it?).
Is there another way to write my query that would basically do the same thing but in a shorter period of time?
Thanks for all your suggestions in advance!
Add an index on the columns that you GROUP BY or filter on in the WHERE. So that's ONE index incorporating users, table, col, row and selected in your case.
Some quick rules:
Combine the fields so that the WHERE columns come first and the GROUP BY elements come last.
If you have other queries that use only part of it (e.g. users, table, col and selected), then put the missing value (row, in this example) last.
Don't use too many indexes, as each one slows table updates marginally - so on a really large system you need to balance queries against indexes.
Edit: do you need the GROUP BY at all? users, table, col, row and selected are all pinned by the WHERE, so every row the query sees already belongs to a single group and the GROUP BY can be dropped entirely.
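A sketch of both suggestions, assuming the column names from the question (`table` is a reserved word in MySQL, hence the backticks):

ALTER TABLE survey
    ADD INDEX idx_survey_all (users, `table`, col, row, selected);

-- Since no new rows are being written, one pass over the table can also
-- replace the many per-combination queries and return every count at once:
SELECT users, `table`, col, row, selected, COUNT(*) AS votes
FROM survey
GROUP BY users, `table`, col, row, selected;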