I am designing a database which will store JSON strings of various sizes. I'm considering using different tables (each with two columns, 'id' and 'data') to store different string sizes (TINYTEXT through LONGTEXT). In this case each table would be searched, starting with the table containing the smallest string sizes.
I'm also considering using a single table with a single string size and using multiple rows to store large JSON strings.
...or I could just create a table with a large VARCHAR size and save myself some development time.
There are two points that I am designing around:
In some cases, MySQL stores small pieces of data "in row", which helps performance. What does this mean, and how can I take advantage of it?
In some cases, MySQL processes a VARCHAR at its largest possible size. When does this happen, and how can I avoid it?
From the database's point of view there is no particular "good" length for a VARCHAR. However, try to keep the maximum row size under 8 KB, including non-clustered indexes. That way you avoid MySQL storing data out of row, which hampers performance.
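For reference, a minimal sketch of the simplest variant the question describes: a single table with an id and one bounded VARCHAR column. The table name, column size, and connection details are made up for illustration, not taken from the question.

$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');

// One table for all strings; a modest declared length keeps typical rows small
// enough to stay in-page rather than being pushed to overflow storage.
$pdo->exec("
    CREATE TABLE IF NOT EXISTS json_store (
        id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        data VARCHAR(2048) NOT NULL
    ) ENGINE=InnoDB
");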
Use 255. See:
Why historically do people use 255 not 256 for database field magnitudes?
Although, as a side note, if you are working with PHP and trying to insert strings longer than the column allows, you will need to truncate them to the maximum column size on the PHP side before inserting, or you will hit an error (in strict SQL mode; otherwise the value is silently truncated with a warning).
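A rough sketch of that truncation step in PHP, assuming $pdo is an open PDO connection, $json holds the incoming string, and the target column is the illustrative VARCHAR(2048) `data` column from the sketch above:

$maxLen = 2048;                          // must match the column definition
$value  = mb_substr($json, 0, $maxLen);  // truncate in characters, not bytes

$stmt = $pdo->prepare('INSERT INTO json_store (data) VALUES (?)');
$stmt->execute([$value]);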
I'm creating a table to store millions of records: 86,400 seconds a day × 365 days × 10 years = 315,360,000 rows, with only three columns (DATETIME, DECIMAL, and SMALLINT), with the DATETIME as the index.
I'm thinking of converting the DATETIME into an unsigned INT (PHP time()) to reduce the storage. With the DATETIME, DECIMAL, and SMALLINT columns, the table comes to 2.5 GB; I have not yet tried replacing the DATETIME with INT.
Insertion into this table is a one-time job, and I'll run lots of SELECT statements for analytical purposes, so I'm changing the engine from InnoDB to MyISAM.
Any thoughts or suggestions?
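If you do go the INT route, here is a minimal sketch of the idea. The table and column names, the DECIMAL precision, and the connection details are all made up; UNIX_TIMESTAMP()/FROM_UNIXTIME() do the conversion on the SQL side, time() on the PHP side.

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// INT UNSIGNED is 4 bytes; DATETIME is 5 bytes in MySQL 5.6.4+ (8 bytes earlier).
$pdo->exec("
    CREATE TABLE IF NOT EXISTS readings (
        ts    INT UNSIGNED  NOT NULL,
        value DECIMAL(10,2) NOT NULL,
        flag  SMALLINT      NOT NULL,
        KEY idx_ts (ts)
    ) ENGINE=MyISAM
");

// Insert using PHP's time(), then select a day's worth of rows by range.
$pdo->prepare('INSERT INTO readings (ts, value, flag) VALUES (?, ?, ?)')
    ->execute([time(), 123.45, 1]);

$stmt = $pdo->prepare('SELECT FROM_UNIXTIME(ts) AS t, value, flag FROM readings
                       WHERE ts >= UNIX_TIMESTAMP(?) AND ts < UNIX_TIMESTAMP(?)');
$stmt->execute(['2015-01-01', '2015-01-02']);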
Indexes are one of the things used to get a faster search on a table:
http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html
At a basic level, an index lets the engine avoid iterating over the entire table when searching for the rows you asked for.
Some of the things stated in the link regarding the use of indexes:
To find the rows matching a WHERE clause quickly.
To eliminate rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows (the most selective index).
See if that suits your needs.
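To check whether a particular SELECT actually uses an index, EXPLAIN is the usual tool. A quick sketch run through PDO against the hypothetical `readings` table from the earlier sketch:

// $pdo is an open PDO connection.
$plan = $pdo->query(
    'EXPLAIN SELECT value FROM readings
     WHERE ts BETWEEN UNIX_TIMESTAMP("2015-01-01") AND UNIX_TIMESTAMP("2015-01-02")'
)->fetchAll(PDO::FETCH_ASSOC);

print_r($plan); // look at the "key" and "rows" columns: idx_ts should be used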
I'm currently scraping virtual currency transaction data off a webpage. The transactions consist of time/date, a description, price, and new balance.
Results are paginated. I can fetch 20 at a time. My goal is to have an accurate record of all entries in a separate database. There are a very large number of transactions occurring, and transactions can occur at any time, including between fetching different pages.
Time/date is measured to the minute, so multiple transactions can occur in the same minute. Descriptions can also be the same (for example the same item can be sold in the same quantity to the same person multiple times). Both price and balance could also overlap.
I am storing a timestamp, price, balance, and data parsed from the description in multiple fields. I need to be able to tell quickly whether an entry is already in the database. The best I can do is to ensure that each entry has a unique combination of time/date, description, price, and balance. The issue with a composite key is that I don't want to store the full description in the database. (This would double the database size.)
My solution that I came up with was to create a BIGINT hash based on those fields, which would be used as a UNIQUE field in the database. I found that the probability of a collision (based on the birthday attack formula) would be less than 1% for up to 61 million entries, which is a satisfactory probability, since the number of entries I'm planning to track is in the neighbourhood of 40k-2m.
My question is: based on my application and goals, which hashing algorithm would you recommend, and how can I get its values into a BIGINT-sized number without losing any of the properties of the algorithm? The most important thing is to avoid collisions, as each one would affect the integrity of the data. Unless you have a better idea, my plan was to concatenate the data into a string (with separators between fields) and then feed it into the function. Short code snippets are much appreciated!
Because I don't care about security here, I used SHA-1. PHP's sha1() returns the 160-bit hash as a 40-character hexadecimal string. A BIGINT is 8 bytes, so we keep the first 16 hex characters (each hex character encodes half a byte) and use base_convert to turn them into a base-10 string for storage. Note that the result can exceed the signed BIGINT range, so the column should be BIGINT UNSIGNED, and base_convert may lose a few low-order bits of precision on numbers this large.
function hashToBigInt($string) {
    // Take the first 16 hex characters (8 bytes) of the SHA-1 digest
    // and convert them to a decimal string for a BIGINT UNSIGNED column.
    return base_convert(substr(sha1($string), 0, 16), 16, 10);
}
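For completeness, one way to use it for de-duplication, assuming a `transactions` table with a BIGINT UNSIGNED `hash` column declared UNIQUE (table and column names are illustrative):

// Concatenate the fields with a separator, as described in the question.
$hash = hashToBigInt($timestamp . '|' . $description . '|' . $price . '|' . $balance);

// INSERT IGNORE silently skips the row if the UNIQUE hash already exists.
$stmt = $pdo->prepare('INSERT IGNORE INTO transactions (hash, ts, price, balance)
                       VALUES (?, ?, ?, ?)');
$stmt->execute([$hash, $timestamp, $price, $balance]);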
Thanks everyone for all your help!
Is searching through numbers (INT) faster than searching through characters (VARCHAR) in a MySQL database?
Regards,
Thijs
In practice the difference will be small; it grows with the relative length of the two fields. The question presupposes that the two are interchangeable, which is a very bad assumption given that MySQL natively supports an ENUM data type.
If it's that important to you, why not measure it?
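A crude way to measure it, assuming two otherwise identical tables (here called items_int and items_varchar) that differ only in the type of the searched status column; everything below is made up for illustration:

// $pdo is an open PDO connection.
function timeQuery(PDO $pdo, $sql, array $params, $runs = 100) {
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $stmt = $pdo->prepare($sql);
        $stmt->execute($params);
        $stmt->fetchAll();
    }
    return (microtime(true) - $start) / $runs; // average seconds per query
}

$intTime = timeQuery($pdo, 'SELECT * FROM items_int     WHERE status = ?', [3]);
$strTime = timeQuery($pdo, 'SELECT * FROM items_varchar WHERE status = ?', ['shipped']);
printf("INT: %.5fs  VARCHAR: %.5fs\n", $intTime, $strTime);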
Yes, it should be. An INT is only 4 bytes, while a text (VARCHAR) value might be considerably bigger.
Besides, if you have an index on the field you are searching, that index will be smaller. Hence you might need fewer disk accesses to do an index scan.
If the INT and the VARCHAR consume the same amount of space, the difference should be negligible, even though INT will probably still come out on top.
Databases don't use black magic. They need to physically access data like everyone else. Table rows consume disk space, and reading 100 MB is faster than reading 200 MB. Always.
Therefore, this affects everything. Smaller rows mean more rows per "block". More rows per block means more rows fetched per disk access. Fewer blocks in total means that a larger percentage of the rows will fit in the various buffer caches.
One of the things that always worries me in MySQL is that my string fields will not be large enough for the data that needs to be stored. The PHP project I'm currently working on will need to store strings whose lengths may vary wildly.
Not being familiar with how MySQL stores string data, I'm wondering if it would be overkill to use a larger data type like TEXT for strings that will probably often be less than 100 characters. What does MySQL do with highly variable data like this?
See this: http://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html
VARCHAR(M), VARBINARY(M): L + 1 bytes if column values require 0–255 bytes, L + 2 bytes if values may require more than 255 bytes
BLOB, TEXT: L + 2 bytes, where L < 2^16
So in the worst case you're using one extra byte per value when using TEXT: with a single-byte character set, a 100-character string takes 100 + 1 = 101 bytes in a VARCHAR column versus 100 + 2 = 102 bytes in a TEXT column.
As for indexing: you can create a normal index on a TEXT column, but you must give a prefix length - e.g.
CREATE INDEX part_of_name ON customer (name(10));
Moreover, TEXT columns allow you to create and query fulltext indexes when using the MyISAM engine.
On the other hand, TEXT columns are not stored together with the table, so performance could, theoretically, become an issue in some cases (benchmark to see about your specific case).
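As an illustration of the fulltext indexes mentioned above, a sketch assuming a MyISAM table `articles` with a TEXT column `body` (all names are made up):

// $pdo is an open PDO connection.
$pdo->exec('CREATE FULLTEXT INDEX ft_body ON articles (body)');

$stmt = $pdo->prepare('SELECT id, title FROM articles WHERE MATCH(body) AGAINST(?)');
$stmt->execute(['mysql storage engines']);
$matches = $stmt->fetchAll(PDO::FETCH_ASSOC);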
In recent versions of MySQL, VARCHAR fields can be quite long - up to 65,535 characters depending on character set and the other columns in the table. It is very efficient when you have varying length strings. See:
http://dev.mysql.com/doc/refman/5.1/en/char.html
If you need longer strings than that, you'll probably just have to suck it up and use TEXT.
I'm trying to do an INSERT into a mysql db and it fails when any of the values are longer than 898 characters. Is there somewhere to get or, better, set this maximum value? I'll hack the string into chunks and store 'em in separate rows if I must, but I'd like to be able to insert up to 2k at a time.
I'm guessing this is a PHP issue, as using LONGTEXT or BLOB fields should give more than enough space in the db.
Thanks.
Side Note:
When you get into working with large blobs and text columns, you need to watch out for the MySQL max_allowed_packet variable. I believe it defaults to at least 1M.
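Checking it, and raising it for the running server if you have the required privilege, looks roughly like this (16 MB is just an example value; a permanent change belongs in my.cnf under [mysqld]):

// $pdo is an open PDO connection.
$row = $pdo->query("SHOW VARIABLES LIKE 'max_allowed_packet'")->fetch(PDO::FETCH_ASSOC);
echo $row['Value'], " bytes\n";

// Requires the SUPER privilege; affects new connections only.
$pdo->exec('SET GLOBAL max_allowed_packet = 16777216'); // 16 MB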
I'm assuming this is a varchar column you're trying to insert into? If so, I assume the maximum length has been set to 898 or 900 or something like that.
In MySQL 5 the total row size can be up to 65,535 bytes, so a VARCHAR can be defined to whatever size keeps the total row size under that limit.
If you need larger, use TEXT (up to 65,535 bytes) or LONGTEXT (up to about 4 GB).
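If it does turn out to be a column length limit rather than a packet limit, a quick way to check and widen the column (the table and column names are placeholders for your own):

// $pdo is an open PDO connection.
// Show the current definition of the column.
print_r($pdo->query('SHOW COLUMNS FROM my_table LIKE "body"')->fetch(PDO::FETCH_ASSOC));

// Widen it, e.g. to TEXT, so values of a couple of KB fit comfortably.
$pdo->exec('ALTER TABLE my_table MODIFY body TEXT NOT NULL');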