In my project a user can write a comment (plain text), view other users' comments, and delete their own comments, but cannot update a comment.
Which should I use in this case:
TEXT or VARCHAR(4048)?
What are the advantages and disadvantages of TEXT versus a large VARCHAR (around 4000)?
Is it secure enough to replace only '<' with '&lt;' and '>' with '&gt;'?
(To save space I don't want to convert all the others, such as ' " & and so on; I just want to make sure a user cannot write JavaScript.)
There will be a length limit on the front end.
VARCHAR is usually faster to retrieve when the size is reasonable, because it is stored within the table row, whereas TEXT is stored off the table with a pointer to its location.
Thanks
(You have multiple questions; I will address the one that is in the title.)
The only difference between VARCHAR(4000) and TEXT is that an INSERT will truncate to either 4000 characters or 65535 bytes, respectively.
For smaller values than 4000, there are cases where the temp table in a complex SELECT will run faster with, for example, VARCHAR(255) than TINYTEXT. For that reason, I feel that one should never use TINYTEXT.
To protect yourself from XSS attacks, encode the comment using the htmlentities function.
Other than that, the choice of datatype has most to do with how big the content will be. If it may exceed 4048 characters, use a text datatype. If many posts will be large, a text datatype may reduce wasted data space and may perform slightly better than a giant varchar, but that depends on your situation, so it is best to test the alternatives.
I generally prefer varchar because it's easier to deal with from a coding perspective, if nothing else, and fall back to text if the contents may exceed the size of a varchar.
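On the escaping part of the question, here is a minimal sketch of why replacing only '<' and '>' may not be enough. It is written in Python, with html.escape standing in for PHP's htmlspecialchars/htmlentities. The helper name escape_angle_only and the sample comment are made up for illustration, and the sketch assumes the comment could one day be rendered inside an attribute value.

```python
import html

def escape_angle_only(text):
    """The approach from the question: replace only '<' and '>'."""
    return text.replace("<", "&lt;").replace(">", "&gt;")

# Hypothetical comment that contains no angle brackets at all.
comment = '" onmouseover="alert(1)'

# Inside an attribute value, the bare double quote closes the attribute
# and the rest of the comment is parsed as a new attribute.
unsafe = '<input value="%s">' % escape_angle_only(comment)

# Escaping quotes and '&' as well keeps the comment inside the attribute.
safe = '<input value="%s">' % html.escape(comment, quote=True)

print(unsafe)
print(safe)
```

The extra characters only cost space if the escaped form is what gets stored. Escaping at output time leaves the stored comment at its original length.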
It depends on the application's behavior. Space allocated inside the table's blocks leaves less room for other columns and lowers the density of data in each block. If MySQL uses a full table scan, it has to scan more blocks, which is inefficient.
So it depends on your SQL queries.
Related
Difficult question to phrase, so let me explain.
As part of an RSS caching system I'm inserting a lot of rows into a DB, several times a day. One of the columns is 'snippet', for the description node in the RSS feeds.
Sometimes this node is far longer than I want, since the corresponding DB column is of type TINYTEXT (max: 255 bytes).
So, in terms of computation/memory, is it better for me to truncate via PHP before insertion, or just feed the whole, too-long string to MySQL and have it do the truncation?
Both of course work, but I wondered if one was better practice than the other.
In cases like this it's probably best to measure. If you don't notice a difference then it doesn't matter.
My intuition tells me that, since your snippet size is very small and the plain text can be very big, it would be better to truncate beforehand. Take the performance hit in PHP so you don't spend a lot of time sending a large query to MySQL.
For readability and code clarity it would also be better to do the truncation in PHP because that makes it explicit. You can even do clever truncating by word or by sentence.
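A minimal sketch of truncating at a word boundary before insertion, written in Python rather than PHP. The function name truncate_words and the "..." suffix are illustrative choices, not something from the question. The limit counts characters, so multi-byte text would need a byte-based check against a 255-byte column.

```python
def truncate_words(text, limit=255, ellipsis="..."):
    """Cut text to at most `limit` characters, preferring a word boundary."""
    if len(text) <= limit:
        return text
    if limit <= len(ellipsis):
        # No room for the suffix: fall back to a plain hard cut.
        return text[:limit]
    cut = text[: limit - len(ellipsis)]
    # Back up to the last space so a word is not split in half;
    # if there is no space at all, keep the hard cut.
    head, sep, _ = cut.rpartition(" ")
    if sep:
        cut = head
    return cut.rstrip() + ellipsis

print(truncate_words("The quick brown fox jumps", 15))
```

Doing this in application code makes the limit visible to whoever reads the insert logic, instead of relying on the column definition to cut the value silently.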
I have a column VARCHAR(1000) in MySQL DB.
So when PHP tries to insert more than 1000 characters, the rest of the text is rejected.
On the site I placed a textarea with maxlength=1000,
but on the server side PHP passes the submitted text through the htmlspecialchars function, so if the text was it's called "nothing", it becomes it's called &quot;nothing&quot;.
The problem is that the text can grow beyond 1000 characters even though exactly 1000 characters were typed.
Can you help me find the right way, right function etc. to insert all characters which user typed?
Don't use htmlspecialchars when storing in the database. Use it (or htmlentities) when retrieving from the DB and writing to the browser.
You must also watch out for UTF-8 which can have characters that take up more space than one byte each.
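A small sketch of both effects, in Python with html.escape standing in for PHP's htmlspecialchars. The exact entities differ slightly between the two (Python emits &#x27; for an apostrophe, while PHP with ENT_QUOTES emits &#039;), but the length growth is the same kind of problem. The helper name lengths is made up for illustration.

```python
import html

def lengths(text):
    """Return (character count, UTF-8 byte count) for a string."""
    return len(text), len(text.encode("utf-8"))

# What the user typed into the textarea.
typed = "it's called \"nothing\""

# Escaping before storage makes the stored value longer than what was typed.
escaped = html.escape(typed, quote=True)
print(len(typed), len(escaped))

# Non-ASCII text: few characters, but more bytes once encoded as UTF-8.
print(lengths("日本語"))
```

Storing typed unchanged (through a parameterized query) and escaping only when writing to the browser keeps the stored length equal to what the maxlength attribute allowed.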
The idea behind VARCHAR (over CHAR, for example) is that it allows the database to store fewer bytes on disk when the field is not fully used.
For example, if you write "xx" into a VARCHAR(1000), only two characters are stored in your physical database, not 1000. (Note that I said characters here, not bytes. The actual number of bytes will be higher to allow for wider variable-width UTF-8 codes, the stored string length, any word and disk buffer padding, etc. But clearly you will be storing far fewer bytes than if you used CHAR.)
So let SQL do some of the work for you. Make the field size 3000 or so, or at least big enough that you will never have a problem, even if a large percentage of the symbols are wider than 8 bits.
How to best choose a size for a varchar/text/... column in a (mysql) database (let's assume the text the user can type into a text area should be max 500 chars), considering that the user also might use formatting (html/bb code/...), which is not visible to the user and should not affect the max 500 chars text size...??
1) theoretically, to prevent any error, the varchar size has to be almost endless, if the user e.g. uses 20 links like this (http://[huge number of chars]) or whatever... - or not?
2) should/could you save formatting in a separate column, to e.g. not give an index (like FULLTEXT) wrong values (words that are contained in formatting but not in the real text)?
If yes, how to best do this? do you remember at which point the formatting was used, save this point and the formatting and when outputting put this information together?
(php/mysql, java script, jquery)
Thank you very much in advance!
A good solution is to factor in the number of formatting characters.
If you do not, then to avoid data loss you need to allow much more space for the text in the database and check the length of the record before saving, or use a TEXT column.
Keeping the same data twice in one table is not a good solution. It all depends on your project, but usually it is better to filter the formatting in PHP.
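One way to keep the 500-character limit independent of the markup is to strip the formatting and count only what remains. Below is a minimal sketch in Python rather than PHP, assuming the formatting is HTML; BB code would need its own stripping rule. The names TextOnly, visible_text, and within_limit are hypothetical.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collects only the text between tags, dropping the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def visible_text(markup):
    """Return the text a reader would see, without tags or attributes."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)

def within_limit(markup, limit=500):
    """Check the limit against visible characters only."""
    return len(visible_text(markup)) <= limit

sample = 'Hello <a href="http://example.com/a/very/long/link">world</a>!'
print(visible_text(sample), within_limit(sample))
```

The column itself still has to be large enough for the full markup, so the visible-text limit and the column size are two separate decisions.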
All,
I'm writing a web app that will receive user generated text content. Some of those inputs will be a few words, some will be several sentence long. In more than 90% of cases, the inputs will be less than 800 characters. Inputs need to be searchable. Inputs will be in various character sets, including Asian. The site and the db are based on utf8.
I understand roughly the tradeoffs between VARCHAR and TEXT. What I am envisioning is to have both a VARCHAR and a TEXT table, and to store inputs on one or the other depending on their size (this should be doable by the PHP script).
What do you think of having several tables for data based on its size? Also, would it make any sense to create several VARCHAR tables for various size ranges? My guess is that I will get a large number of user inputs clustered around a few key sizes.
Thanks,
JDelage
Storing values in one column vs another depending on size of input is going to add a heck of a lot more complexity to the application than it'll be worth.
As for VARCHAR vs TEXT in MySQL, here's a good discussion about that, MySQL: Large VARCHAR vs TEXT.
The "tricky" part is doing a full-text search on this field. Before MySQL 5.6 this required the MyISAM storage engine, as it was the only one that supported full-text indexes; InnoDB has supported them since 5.6. Also of note: sometimes, at the cost of complicating the system architecture, it is worthwhile to use something like Apache Solr, as it performs full-text search much more efficiently. A lot of people keep most of the data in their MySQL database and use something like Solr just for full-text indexing of that text column and for running fancy searches against that index later.
Re: Unicode. I've used Solr for full-text indexing of text with Unicode characters just fine.
Comments are correct. You are adding at most 1 byte by using the TEXT datatype over VARCHAR.
Storage Requirements:
VARCHAR Length of string + 1 byte (+ 2 bytes if the column can hold more than 255 bytes)
TEXT Length of string + 2 bytes
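The arithmetic can be sketched as follows, in Python. This assumes the length-prefix rule from the MySQL storage requirements table: VARCHAR uses a 1-byte prefix when the column can hold at most 255 bytes and a 2-byte prefix otherwise, and TEXT always uses 2 bytes. The function names are made up, and max_bytes_per_char=3 assumes MySQL's 3-byte utf8 character set.

```python
def varchar_bytes(value, max_chars, max_bytes_per_char=3):
    """Approximate stored size of `value` in a VARCHAR(max_chars) column."""
    data = len(value.encode("utf-8"))
    # The prefix size depends on the column's maximum byte length,
    # not on the length of this particular value.
    prefix = 1 if max_chars * max_bytes_per_char <= 255 else 2
    return data + prefix

def text_bytes(value):
    """Approximate stored size of `value` in a TEXT column."""
    return len(value.encode("utf-8")) + 2

print(varchar_bytes("hello", 50))
print(varchar_bytes("hello", 800))
print(text_bytes("hello"))
```

Under these assumptions, a VARCHAR(800) column in utf8 carries the same 2-byte prefix as TEXT, so the storage difference disappears for the sizes discussed in the question.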
The way I see it is you have two options:
Hold it in TEXT. It will cost a single additional byte of storage and some additional processing power on search.
Hold it in VARCHAR and create an additional table named A_LOT_OF_TEXT with the structure (int row_id_of_varchar_table, TEXT). If the data is small enough, put it in the VARCHAR; otherwise put a predefined value there instead of the data, for example 'THE_DATA_YOU_ARE_LOOKING_FOR_IS_IN_TABLE_NAMED_A_LOT_OF_TEXT' or simply NULL, and put the real data in the A_LOT_OF_TEXT table.
I know about VARCHAR, TEXT, etc. but I'm hoping for something that has a fixed width because I value the boost in efficiency, even if it's minor. I can't seem to find anything in the documentation, and my only other option is to split a message into pieces and store each in a separate column.
No, there is no such datatype. Did you actually measure if using CHAR in place of VARCHAR gives you any measurable performance gain? Remember what they say about premature optimization?
Splitting the message into many columns and then stitching it back together will probably cost more than any speed gained from using CHAR instead of VARCHAR.