I've been thinking this problem over for a while now and have yet to come up with a good way to accomplish it, especially when it comes to keeping the tables reasonably normalized and the data manageable.
I have a 33 x 24 grid map, totaling 792 tiles, and each tile should have an integer value stored with it. It gets slightly more complex from there: each of those 792 tile values will in turn have another 100 small integer values (likely nothing more than two digits) attached to it, making 79,200 unique values to store in total.
Theoretically there would be a "tblMap" table with a unique mapID and 792 related fields per mapID. Each of those 792 fields would then have a related table holding its 100 fields of data.
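Sketched out very roughly, that's something like this (the names and types are just placeholders I'm imagining, with the repeated columns abbreviated):

CREATE TABLE tblMap (
    mapID   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    tile001 INT,
    tile002 INT,
    -- ... 789 more tile columns ...
    tile792 INT
);

CREATE TABLE tblTile (
    mapID    INT UNSIGNED NOT NULL,     -- which map
    tileNo   SMALLINT UNSIGNED NOT NULL, -- which of the 792 tiles
    value001 TINYINT,
    -- ... 98 more value columns ...
    value100 TINYINT
);

which is an awful lot of columns, hence my question.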
This is essentially a large 2D tile map that will be displayed to and updated by a user on a website. The data should not be stored locally for users. I'm using MySQL to store the data and PHP to access and manipulate it.
So what would be the best way to store this: in MySQL, or possibly by saving and loading a file with the values? What would the table structure for something like this look like? And is it a bad thing to have 792 fields in a single table (I kind of think so, but don't see any normalization rules about it)?
I hope that is clear and makes some sense. I searched around and was able to find related topics, but nothing that was specific to this type of case.
Thanks for any insight!
I'm new to MySQL so please forgive me if this question is too "basic".
We imported data from another database to MySQL. In two of the tables, there are large gaps in the ID field. For example, in one table, IDs 1 to 5438 have smaller gaps but then the next few IDs are 5823, 6612, 7880, 8577, 12541 and it continues like this to 54189. Then it jumps to 441739936 and continues to increase with large gaps in between to 3872082950. I'm assuming that when we start adding data to this table the next ID will be 3872082951 (it's set to auto-increment). The table only has 5234 rows.
My questions are:
Is there any problem with having these large gaps? Will it negatively impact query response time? Are there any other negative side effects of having these large gaps?
Is it fine to leave it as is or are we better off renumbering the IDs so they are sequential without gaps?
There is no problem or penalty in allowing large gaps to remain in the database, and no impact on performance. Auto-increment IDs must be unique, but there's no need for them to be consecutive.
1) No, it won't.
2) Probably you should not. Remember that an ID may be used to reference a row from another table, so if you renumber them, you can break those relationships.
There isn't any problem with having the gaps, as Bill stated, but be careful to set a data type for the field that can cope with the size of the values (BIGINT, etc.).
You can read more here: http://dev.mysql.com/doc/refman/5.7/en/example-auto-increment.html and find the Numeric Data types here: http://dev.mysql.com/doc/refman/5.7/en/numeric-types.html
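For example, something along these lines would widen the column (the table and column names here are assumptions; adjust them to your schema):

ALTER TABLE imported_table
    MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;

That ensures the auto-increment column can hold values like 3872082950, which is already beyond the range of a signed INT.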
I'm attempting a redesign of the database schema for a website where we have a dynamic form filled out by users based on the category they fit into. The end result of this dynamic form is that I have 100+ columns in the database to store their answers, but in any given case fewer than 10 will actually have a value.
I'm looking to tighten up the database as much as possible and remove NULL values where I can. One solution I am considering is to use an EAV table to hold the answers to the variable questions.
So table 1 would hold their name and a few other bits, and table 2 would have one row per question holding the question and its answer, with a foreign key linking back to table 1.
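Roughly this (table and column names are placeholders; the real design would have a bit more to it):

CREATE TABLE form_submission (
    submission_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name          VARCHAR(255) NOT NULL
    -- plus a few other fixed bits of information
);

CREATE TABLE form_answer (
    submission_id INT UNSIGNED NOT NULL,  -- FK to form_submission
    question      VARCHAR(100) NOT NULL,  -- or a question_id into a questions table
    answer        VARCHAR(255),
    KEY idx_submission (submission_id)
);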
I've done some slapdash calculations: if they were to get 1,000 forms filled in a day (a high estimate, though not impossible in the longer term), I'm looking at up to 2,600,000 rows in table 2 per year (1000 * 10 * 260 working days). My primary key will go up as far as 4 billion, so that isn't a problem, but I'm concerned that after a few years, with 5-10 million records, performance will seriously degrade.
I'm mostly concerned with performance. As far as structure goes, EAV is very much preferable: it allows my client to add new questions without having to touch the database, and it keeps my schema much tighter. I'm not concerned about how this complicates my code if it is the right data solution.
I have a grid on which I'd like to show some elements, and I plan to store these elements in a database. But with a grid of 30 by 20 there are 600 possible coordinates to store with a value, and on top of that the grid will have 2 overlapping layers, which means each user could have up to 1,200 coordinates.
This seems like a lot to me since I'm planning to implement this for a project that will have a large number of users.
What would be the most efficient way to store a large number of coordinates in a database, and to access coordinates for a large number of users?
Absent other information, the normal database structure would be a table with columns such as:
CoordinateId
UserId
xCoordinate
yCoordinate
Value1
Value2
Having a couple thousand rows per user would normally not be a problem, particularly if you index the table properly (such as having a userId index).
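As a minimal sketch (the types and index are assumptions on my part, and I've added a Layer column for the two overlapping layers mentioned in the question):

CREATE TABLE user_grid (
    CoordinateId INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    UserId       INT UNSIGNED NOT NULL,
    Layer        TINYINT UNSIGNED NOT NULL,  -- 1 or 2
    xCoordinate  TINYINT UNSIGNED NOT NULL,  -- 0-29
    yCoordinate  TINYINT UNSIGNED NOT NULL,  -- 0-19
    Value1       INT,
    Value2       INT,
    KEY idx_user (UserId)
);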
But . . . the real answer depends on how this will be used. Will new values be added? Can the coordinates change? How strict are the performance requirements?
So I've got this form with an array of checkboxes to search for an event. When you create an event, you choose one or more of the checkboxes and the event gets created with these "attributes". What is the best way to store these attributes in a MySQL database if I want to filter results when searching for events? Would creating several columns with boolean values be best, or possibly a new table holding only the checkbox values?
I'm pretty sure serializing is out of the question because I wouldn't be able to query the serialized string for whether a checkbox was ticked or not, right?
Thanks
You can use the SET datatype or a separate table that you join. Either will work.
I would not do a bunch of columns though.
You can search a SET column easily using FIND_IN_SET(), but it isn't indexed, so it depends on how many rows you expect (up to a few thousand is probably OK - it's a very fast search).
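A rough sketch of the SET approach (the attribute names are just examples, not from the question):

CREATE TABLE events (
    event_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    attributes SET('outdoor', 'family', 'free', 'live_music')
);

-- all events flagged as 'free'
SELECT event_id FROM events WHERE FIND_IN_SET('free', attributes) > 0;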
The normal solution is a separate table, with one column being the ID of the event and the second column being the attribute, stored using the ENUM datatype (don't use TEXT; it's slower).
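Something like this (again, names and the attribute list are placeholders):

CREATE TABLE event_attributes (
    event_id  INT UNSIGNED NOT NULL,
    attribute ENUM('outdoor', 'family', 'free', 'live_music') NOT NULL,
    PRIMARY KEY (event_id, attribute)
);

-- events that have the 'free' attribute
SELECT event_id FROM event_attributes WHERE attribute = 'free';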
Create separate columns, or store them all in one column using a bit mask.
One way would be to create a new table with a column for each checkbox, as already described by others. I'll not add to that.
However, another way is to use a bitmask. You have just one column, myCheckboxes, and store the values as an int. Then in the code you have constants, or another appropriate way to store the correlation between each checkbox and its bit. For example:
CHECKBOX_ONE 1
CHECKBOX_TWO 2
CHECKBOX_THREE 4
CHECKBOX_FOUR 8
...
CHECKBOX_NINE 256
Remember to always use the next power of two for new values, otherwise you'll get values that overlap.
So, if the first two checkboxes have been checked, you should have 3 as the value of myCheckboxes for that row. If you have ONE and FOUR checked, you'd have 9 as the value of myCheckboxes, and so on. When you want to see which rows have, say, checkboxes ONE, THREE and NINE checked, your query would be:
SELECT * FROM myTable WHERE myCheckboxes & 1 AND myCheckboxes & 4 AND myCheckboxes & 256;
This query will return only rows that have all of these checkboxes marked as checked.
You should also use bitwise operations when storing and reading the data.
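For example (this assumes an id column on myTable; 4 is the CHECKBOX_THREE bit):

-- mark CHECKBOX_THREE as checked without touching the other bits
UPDATE myTable SET myCheckboxes = myCheckboxes | 4 WHERE id = 42;

-- mark it as unchecked again
UPDATE myTable SET myCheckboxes = myCheckboxes & ~4 WHERE id = 42;

-- read back whether it is checked (returns 1 or 0)
SELECT (myCheckboxes & 4) <> 0 AS three_checked FROM myTable WHERE id = 42;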
This is a very efficient way when it comes to speed. You have just a single column, probably just a SMALLINT, and your searches are pretty fast. This can make a big difference if you have several different collections of checkboxes that you want to store and search through. However, it makes the values harder to understand. If you see the value 261 in the DB, it won't be easy for a human to immediately see that it means checkboxes ONE, THREE and NINE have been checked, whereas that is much easier with separate columns for each checkbox. This normally isn't an issue, because humans don't need to manually poke the database, but it's worth mentioning.
From the coding perspective it's not much of a difference, but you'll have to be careful not to corrupt the values: it's far easier to mess up a single int than data stored in separate columns. So test carefully when adding new stuff. All that said, the speed and low-memory benefits can be very big if you have a ton of different collections.
This question has come up on many different occasions for me, but it's hard to explain without giving a specific example. So here goes:
Let's imagine for a moment that we are creating an issue tracker database in PHP/MySQL. There is a "tasks" table. Now you need to keep track of the people who are associated with a particular task (they have commented on it, or whatever). These people will get an email when a task changes.
There are two ways to handle this type of situation. One is to create a separate table, task_participants:
CREATE TABLE IF NOT EXISTS `task_participants` (
`task_id` int(10) unsigned NOT NULL,
`person_id` int(10) unsigned NOT NULL,
UNIQUE KEY `task_id_person_id` (`task_id`,`person_id`)
);
You would then query this table with SELECT person_id FROM task_participants WHERE task_id = 'XXX'.
If there are 5,000 tasks and each task has 4 participants on average (the reporter, the person for whom the task was done, the solver and one commenter), then the task_participants table would have 5000 * 4 = 20,000 rows.
There is also another way: create a field in the tasks table and store a serialized array (JSON or PHP serialize()) of person_ids. Then there would be no need for this 20,000-row table.
What are your comments? Which way would you go?
Go with the multiple records. It promotes database normalization, and normalization is very important. A serialized value is no fun to maintain or update. With multiple records I can let the database do the work with INSERT, UPDATE and DELETE. Also, a multivalued column limits your future joins.
Definitely do the cross reference table (the first option you listed). Why?
First of all, do not worry about the size of the cross reference table. Relational databases would have been out on their ear decades ago if they could not handle the scale of a simple cross reference table. Stop worrying about 20K or 200K records, etc. In fact, if you're going to worry about something like this, it's better to start worrying about why you've chosen a relational DB instead of a key-value DB. After that, and only when it actually starts to be a problem, then you can start worrying about adding an index or other tuning techniques.
Second, if you serialize the association info, you're probably opaque-ifying a whole dimension of your data that only your specialized JSON-enabled app can query. Serializing data into a single cell of a table typically only makes sense if the embedded structure (a) contains no data you would ever need to query outside your app, (b) is not something whose internals you need to query efficiently (e.g., avg count(*) of people with tasks), and (c) is just something you either don't have time to model out properly or is still in a prototypical state. I say probably above because it's not usually the case that data worth persisting fits these criteria.
Finally, by serializing your data, you are now forced to perform any computation on that serialized data in your code, which is just a big waste of time you could have spent doing something more productive. Your database can already slice and dice that data any way you need, yet because the data is not in a format it understands, you now have to do that in your code. And imagine what happens when you change the serialized data structure in V2.
I won't say there aren't use cases for serializing data (I've done it myself), but based on your case above, this probably isn't one of them.
There are a couple of great answers already, but they explain things in rather theoretical terms. Here's my (essentially identical) answer, in plain English:
1) 20k records is nothing to MySQL. If it gets up into the 20 million record range, then you might want to start getting concerned - but it still probably won't be an issue.
2) OK, let's assume you've gone with concatenating all the people involved with a ticket into a single field. Now... Quick! Tell me how many tickets Alice has touched! I have a feeling that Bob is screwing things up and Charlie is covering for him - can you get me a list of tickets that they both worked on, divided up by who touched them last?
With a separate table, MySQL itself can find answers to all kinds of questions about who worked on what tickets and it can find them fast. With everything crammed into a single field, you pretty much have to resort to using LIKE queries to find the (potentially) relevant records, then post-process the query results to extract the important data and summarize it yourself.
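For example, counting Alice's tickets becomes a simple query (this sketch assumes a persons table with person_id and name, which isn't shown in the question):

SELECT COUNT(*) AS tickets_touched
FROM task_participants tp
JOIN persons p ON p.person_id = tp.person_id
WHERE p.name = 'Alice';

With a serialized column you'd be stuck doing something like WHERE participants LIKE '%alice%' and then cleaning up the false matches in PHP.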