All,
I'm writing a web app that will receive user-generated text content. Some of those inputs will be a few words; some will be several sentences long. In more than 90% of cases, the inputs will be less than 800 characters. Inputs need to be searchable. Inputs will be in various character sets, including Asian ones. The site and the DB both use utf8.
I roughly understand the trade-offs between VARCHAR and TEXT. What I am envisioning is to have both a VARCHAR table and a TEXT table, and to store each input in one or the other depending on its size (this should be doable in the PHP script).
What do you think of having several tables for data based on its size? Also, would it make any sense to create several VARCHAR tables for various size ranges? My guess is that I will get a large number of user inputs clustered around a few key sizes.
Thanks,
JDelage
Storing values in one column vs another depending on size of input is going to add a heck of a lot more complexity to the application than it'll be worth.
As for VARCHAR vs TEXT in MySQL, here's a good discussion of that: MySQL: Large VARCHAR vs TEXT.
The "tricky" part is doing a full-text search on this field. Older MySQL versions support FULLTEXT indexes only on the MyISAM storage engine; InnoDB has supported them since MySQL 5.6. Also note that, at the cost of complicating the system architecture, it may be worthwhile to use something like Apache Solr, as it performs full-text search much more efficiently. A lot of people keep most of their data in MySQL and use something like Solr just to full-text index that text column and then run fancy searches against that index.
Re: Unicode. I've used Solr for full-text indexing of text with Unicode characters just fine.
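Comments are correct. Using the TEXT datatype instead of VARCHAR costs at most 1 extra byte per value.
Storage requirements (L = length of the stored value in bytes):
VARCHAR(M): L + 1 byte if the column's values require 0-255 bytes, L + 2 bytes if they may require more than 255 bytes
TEXT: L + 2 bytes
Note that a utf8 VARCHAR(800) may need up to 2,400 bytes, so it already uses the 2-byte prefix, just like TEXT.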
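To make the length-prefix rule concrete, here is a small Python sketch that applies it. The helper names and the charset table are my own, for illustration only; the point is that the VARCHAR prefix depends on the column's declared maximum in bytes, not on the value actually stored. It ignores other per-row overhead.

```python
# Worst-case bytes per character for the utf8 family of MySQL charsets.
MAX_BYTES_PER_CHAR = {"utf8mb3": 3, "utf8": 3, "utf8mb4": 4}


def length_prefix_bytes(col_type: str, max_chars: int = 0, charset: str = "utf8mb4") -> int:
    """Length-prefix size for VARCHAR(max_chars) or TEXT, per the storage rules above."""
    if col_type == "TEXT":
        return 2  # TEXT always stores a 2-byte length
    if col_type == "VARCHAR":
        # Decided by the column's *declared* maximum size in bytes.
        max_bytes = max_chars * MAX_BYTES_PER_CHAR[charset]
        return 1 if max_bytes <= 255 else 2
    raise ValueError(f"unsupported column type: {col_type}")


def stored_size(value: str, col_type: str, max_chars: int = 0, charset: str = "utf8mb4") -> int:
    """Approximate bytes for one value: UTF-8 data plus the length prefix."""
    return len(value.encode("utf-8")) + length_prefix_bytes(col_type, max_chars, charset)


# A utf8 VARCHAR(800) can hold up to 2400 bytes, so it needs 2 bytes, same as TEXT.
print(length_prefix_bytes("VARCHAR", 800, "utf8"))  # 2
print(length_prefix_bytes("TEXT"))                   # 2
```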
The way I see it is you have two options:
Store it as TEXT: it costs at most one additional byte of storage, plus some unknown amount of additional processing on search.
Store it as VARCHAR and create an additional table named A_LOT_OF_TEXT with the structure (INT row_id_of_varchar_table, TEXT). If the data is small enough, put it in the VARCHAR column; otherwise, put a predefined value there instead, for example 'THE_DATA_YOU_ARE_LOOKING_FOR_IS_IN_TABLE_NAMED_A_LOT_OF_TEXT' or simply NULL, and put the real data into table A_LOT_OF_TEXT.
Related
So I am working on a small project and using PHP/MySQL so far.
I wanted to know how I should go about storing very long text (let's say a user inputs a lot of paragraphs at once). Is using SQL to save the text a good idea? Or should I save the huge amount of text using some other method instead?
Thanks!
EDIT: I forgot to mention that I am currently using LONGTEXT to store the text. I just wanted to know whether it's a good approach to store such an amount of text in a DB.
It's fine to store long text in a MySQL database; just make sure you use an appropriate column type:
TEXT - holds a string with a maximum length of 65,535 bytes (64 KB)
MEDIUMTEXT - holds a string with a maximum length of 16,777,215 bytes (16 MB)
LONGTEXT - holds a string with a maximum length of 4,294,967,295 bytes (4 GB)
These limits are in bytes, so with a multibyte character set such as utf8 the maximum number of characters is correspondingly lower.
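Because these limits are in bytes, the encoded length is what matters for Asian or other multibyte text. Here is a small Python sketch (the function name is made up) that picks the smallest of the three types for a given string, assuming UTF-8 storage. In practice, the server's max_allowed_packet setting also limits how large a single value you can send.

```python
# Maximum sizes in bytes, in increasing order.
TEXT_TYPES = [
    ("TEXT", 65_535),
    ("MEDIUMTEXT", 16_777_215),
    ("LONGTEXT", 4_294_967_295),
]


def smallest_text_type(value: str) -> str:
    """Smallest TEXT type whose byte limit fits the UTF-8 encoding of value."""
    size = len(value.encode("utf-8"))
    for name, max_bytes in TEXT_TYPES:
        if size <= max_bytes:
            return name
    raise ValueError("value is too large for any TEXT type")


print(smallest_text_type("a" * 30_000))   # TEXT (30,000 bytes)
print(smallest_text_type("あ" * 30_000))  # MEDIUMTEXT (90,000 bytes)
```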
I have some text data I would like to store in a mysql database. I currently have the data stored in a variable as a string.
I'm concerned that the table will become quite large due to the amount of text data I have for each row.
Therefore, what is the easiest way (preferably using built-in PHP functions) to compact this string data into a format suited to storage and retrieval?
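You could gzip the string with gzencode().
That's pretty standard, so it should be reversible from other languages if you need to.
I would advise storing a Base64 version of the result, because the gzip output is binary; keep in mind that Base64 adds about a third to the compressed size.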
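In PHP this would be base64_encode(gzencode($text)) to store and gzdecode(base64_decode($stored)) to read it back. Here is the same idea sketched in Python, which also illustrates the "reversible from other languages" point, since gzip.compress produces the same gzip format as gzencode(). The function names are hypothetical. Very short strings can actually get larger, because gzip adds a header of its own.

```python
import base64
import gzip


def pack(text: str) -> str:
    """Gzip the UTF-8 bytes, then Base64-encode them so they fit in a text column."""
    return base64.b64encode(gzip.compress(text.encode("utf-8"))).decode("ascii")


def unpack(stored: str) -> str:
    """Reverse pack(): Base64-decode, gunzip, then decode UTF-8."""
    return gzip.decompress(base64.b64decode(stored)).decode("utf-8")


text = "user generated text " * 200
stored = pack(text)
print(len(text), len(stored))  # the packed form is much shorter for repetitive text
```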
If you're using InnoDB, you can enable compression on entire tables, which doesn't impact your code at all.
ALTER TABLE database.tableName ENGINE='InnoDB' ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
You can lower KEY_BLOCK_SIZE (valid values are 1, 2, 4, 8 and 16) to get more compression, depending on the data, but this adds more CPU overhead.
After testing a range of tables, I found a KEY_BLOCK_SIZE of 8 to be a good balance of compression vs performance.
How do I best choose a size for a VARCHAR/TEXT/... column in a MySQL database (let's assume the text the user types into a textarea should be at most 500 characters), considering that the user might also use formatting (HTML/BB code/...), which is not visible to the user and should not count toward the 500-character limit?
1) Theoretically, to prevent any error, the VARCHAR size would have to be almost unlimited if the user, e.g., uses 20 links like this (http://[huge number of chars]) or similar. Or not?
2) Should/could you save the formatting in a separate column, e.g. so that an index (like FULLTEXT) doesn't receive wrong values (words that appear in the formatting but not in the actual text)?
If yes, how is this best done? Do you record the position at which the formatting was used, save that position along with the formatting, and put this information back together when outputting?
(php/mysql, java script, jquery)
Thank you very much in advance!
A good solution is to take the number of formatting characters into account.
If you don't, then to avoid data loss you need to reserve much more space for the text in the database and check the length of the record before saving it, or use a TEXT column.
Keeping the same data twice in one table is not a good solution. It all depends on your project, but it is usually better to filter out the formatting in PHP.
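One way to do that filtering is to strip the markup, count only the visible characters, and store the stripped text in the column that gets the FULLTEXT index (in PHP, strip_tags() does the stripping). Here it is sketched in Python with the standard html.parser module, assuming the formatting is HTML (BB code would need its own parser). visible_text and within_limit are hypothetical names, and 500 is the limit from the question.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text between tags; entities like &amp; are decoded."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def visible_text(markup: str) -> str:
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


def within_limit(markup: str, limit: int = 500) -> bool:
    """True if the user-visible text (not the markup) fits the limit."""
    return len(visible_text(markup)) <= limit


post = '<a href="http://example.com/a/very/long/url">my link</a> and <b>bold</b> text'
print(visible_text(post))  # my link and bold text
```

The markup column itself still has to be sized generously (or be TEXT), since long URLs count against it.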
In my project a user can write comments [plain text], view other users' comments, and delete their own comments, but cannot edit a comment.
In this case, which should I use:
TEXT or VARCHAR(4048)?
What are the advantages and disadvantages of TEXT versus a large VARCHAR (like 4000)?
Is it secure enough if I replace only '<' with '&lt;' and '>' with '&gt;' to make sure everything is fine?
[I don't want to convert all the other characters, like ' " &, to save space; I just want to make sure users cannot write JavaScript.]
There will be a limit on the front end.
VARCHAR is usually faster to retrieve when the size is reasonable, as it is stored within the table row, whereas TEXT may be stored outside the row with a pointer to its location (the details depend on the storage engine and row format).
Thanks
(You have multiple questions; I will address the one that is in the title.)
The only difference between VARCHAR(4000) and TEXT is that an INSERT will truncate to either 4000 characters or 65,535 bytes, respectively (in non-strict SQL mode; in strict mode, the INSERT fails with an error instead).
For limits smaller than 4000, there are cases where the temp table in a complex SELECT will run faster with, for example, VARCHAR(255) than with TINYTEXT. For that reason, I feel that one should never use TINYTEXT.
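To illustrate the characters-versus-bytes difference, here is a rough Python simulation of what a non-strict INSERT keeps. The function is hypothetical and only models the two limits, not MySQL itself; utf8mb4 storage is assumed.

```python
TEXT_MAX_BYTES = 65_535


def truncate_like_mysql(value: str, col_type: str, max_chars: int = 0) -> str:
    """Approximate what a non-strict INSERT keeps for VARCHAR(max_chars) or TEXT."""
    if col_type == "VARCHAR":
        return value[:max_chars]  # the VARCHAR limit counts characters
    if col_type == "TEXT":
        cut = value.encode("utf-8")[:TEXT_MAX_BYTES]  # the TEXT limit counts bytes
        # errors="ignore" drops a multibyte character split by the cut
        return cut.decode("utf-8", errors="ignore")
    raise ValueError(f"unsupported column type: {col_type}")


comment = "あ" * 30_000  # 3 bytes per character in UTF-8
print(len(truncate_like_mysql(comment, "VARCHAR", 4000)))  # 4000
print(len(truncate_like_mysql(comment, "TEXT")))           # 21845
```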
To protect yourself from XSS attacks, encode the text using the htmlentities() function (htmlspecialchars() also works and converts only the characters that are special in HTML).
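Replacing only < and > keeps users from opening a tag in element content, but it is not enough once the text ends up inside an attribute value. Here is a Python sketch of the difference, where html.escape plays the role of PHP's htmlspecialchars(); naive_escape is a hypothetical name for the approach from the question.

```python
import html


def naive_escape(text: str) -> str:
    """Only < and >, as proposed in the question."""
    return text.replace("<", "&lt;").replace(">", "&gt;")


payload = '" onmouseover="alert(1)'

# The double quote is left alone, so the payload breaks out of the attribute.
unsafe = '<span title="' + naive_escape(payload) + '">hi</span>'
# html.escape also converts & " and ' (quote=True is the default).
safe = '<span title="' + html.escape(payload) + '">hi</span>'

print(unsafe)  # <span title="" onmouseover="alert(1)">hi</span>
print(safe)    # <span title="&quot; onmouseover=&quot;alert(1)">hi</span>
```

Encoding when the comment is output, rather than before it is stored, also keeps the stored text at its original length.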
Other than that, the choice of datatype has most to do with how big the content will be. If it may exceed 4048 characters, then use a text datatype. If many posts will be large, using a text datatype may reduce wasted data space and may perform slightly better than a giant VARCHAR, but it depends on your situation; you would be best off testing the alternatives.
I generally prefer varchar because it's easier to deal with from a coding perspective, if nothing else, and fall back to text if the contents may exceed the size of a varchar.
It depends on the application's behavior. Space taken inside the table's pages (blocks) by a large inline column leaves less room for the other columns and reduces the number of rows per page. If MySQL performs a full table scan, many more pages must be read, which is inefficient.
So it depends on your SQL queries.
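A back-of-the-envelope Python sketch of that density argument. It assumes the default 16 KB InnoDB page (innodb_page_size) and ignores page headers, per-record overhead and free space, so the numbers are only rough upper bounds; the function names are hypothetical.

```python
PAGE_SIZE = 16 * 1024  # default innodb_page_size


def rows_per_page(avg_row_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Rough upper bound on rows per page; ignores headers and free space."""
    return max(1, page_size // avg_row_bytes)


def pages_to_scan(row_count: int, avg_row_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Pages a full table scan must read, using ceiling division."""
    per_page = rows_per_page(avg_row_bytes, page_size)
    return -(-row_count // per_page)


# One million rows: short rows vs rows carrying ~2 KB of inline text.
print(pages_to_scan(1_000_000, 100))    # 6135
print(pages_to_scan(1_000_000, 2_000))  # 125000
```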
I am writing a web application in PHP that will store large numbers of blocks of arbitrary-length text. Is MySQL well suited for this task with a LONGTEXT field or similar, or should I store each block of text in its own file and use a MySQL table for indexes and filenames? Think online bulletin board type stuff, like how you would store each user's posts.
Yes, MySQL is the way to go. A flat file would take much longer to search, etc.
MySQL all the way. Much more efficient.