So I am working on a small project using PHP/MySQL.
I wanted to know how I should go about storing very long text (let's say a user inputs a lot of paragraphs at once). Is it a good idea to save the text in SQL, or should I store large amounts of text using some other method instead?
Thanks!
EDIT: Forgot to mention that I am currently using LONGTEXT to store the text. I just wanted to know whether storing such a large amount of text in a DB is a good approach.
It's fine to store long text in a MySQL database; just make sure you use an appropriate column type. Note that the limits are in bytes, so with a multibyte character set such as UTF-8 fewer characters fit:
TEXT - Holds a string with a maximum length of 65,535 bytes
MEDIUMTEXT - Holds a string with a maximum length of 16,777,215 bytes
LONGTEXT - Holds a string with a maximum length of 4,294,967,295 bytes
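Because these limits are byte counts, how many characters fit depends on the character set: with utf8mb4 a character takes 1 to 4 bytes. A minimal Python sketch of that arithmetic, assuming the column uses UTF-8 (the helper names are hypothetical):

```python
# Maximum sizes of MySQL's TEXT types, in bytes.
TEXT_LIMITS = {
    "TEXT": 2**16 - 1,        # 65,535
    "MEDIUMTEXT": 2**24 - 1,  # 16,777,215
    "LONGTEXT": 2**32 - 1,    # 4,294,967,295
}


def fits(text: str, type_name: str, encoding: str = "utf-8") -> bool:
    """True if `text`, once encoded, is within the byte limit of `type_name`."""
    return len(text.encode(encoding)) <= TEXT_LIMITS[type_name]


def guaranteed_chars(type_name: str, max_bytes_per_char: int = 4) -> int:
    """Characters that always fit, whatever they are (utf8mb4 uses up to 4 bytes each)."""
    return TEXT_LIMITS[type_name] // max_bytes_per_char


if __name__ == "__main__":
    print(fits("a" * 65_535, "TEXT"))      # True: 1 byte per character
    print(fits("é" * 40_000, "TEXT"))      # False: 80,000 bytes in UTF-8
    print(guaranteed_chars("TEXT"))        # 16383
    print(guaranteed_chars("MEDIUMTEXT"))  # 4194303
```

So even plain TEXT holds roughly 16,000 characters in the worst case, and LONGTEXT is far more than paragraphs of user input will ever need.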
Related
I have a VARCHAR(1000) column in a MySQL database.
So when PHP tries to insert more than 1000 characters, the rest of the text is cut off.
On the site I placed a textarea with maxlength=1000,
but on the server side PHP runs the submitted text through the htmlspecialchars function, so if the text was it's called "nothing", it becomes it's called &quot;nothing&quot;.
The problem is that the text can grow beyond 1000 characters even if the user typed exactly 1000 (each " becomes the six-character &quot;).
Can you help me find the right way (the right function, etc.) to insert every character the user typed?
Don't use htmlspecialchars when storing in the database. Use it (or htmlentities) when retrieving from the DB and writing to the browser.
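A minimal sketch of "store raw, escape on output", in Python for illustration: html.escape stands in for htmlspecialchars (it also escapes the apostrophe, as &#x27;), and the function names are hypothetical:

```python
import html

MAX_CHARS = 1000  # mirrors the VARCHAR(1000) column from the question


def prepare_for_storage(user_text: str) -> str:
    """Validate the raw input; it is stored unescaped."""
    if len(user_text) > MAX_CHARS:  # len() counts characters, like VARCHAR(N)
        raise ValueError(f"input is {len(user_text)} characters, limit is {MAX_CHARS}")
    return user_text


def render_for_browser(stored_text: str) -> str:
    """Escape only when writing the stored text into an HTML page."""
    return html.escape(stored_text, quote=True)


if __name__ == "__main__":
    raw = 'it\'s called "nothing"'
    stored = prepare_for_storage(raw)
    print(len(stored))                 # 21: what goes into the column
    print(render_for_browser(stored))  # it&#x27;s called &quot;nothing&quot;
```

The column then holds exactly what the user typed, so its length is no longer inflated by entities.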
You must also watch out for UTF-8, in which a single character can take up more than one byte.
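A small Python sketch of the difference between characters and bytes (the helper name is hypothetical). PHP draws the same distinction: strlen() counts bytes while mb_strlen() counts characters, so a server-side length check meant to count characters should use mb_strlen(). On the MySQL side, the N in VARCHAR(N) counts characters, but TEXT limits and the row size limit are in bytes, and MySQL's legacy utf8 character set stores at most 3 bytes per character, so 4-byte characters such as emoji require utf8mb4.

```python
from typing import Tuple


def char_and_byte_length(text: str) -> Tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    print(char_and_byte_length("nothing"))  # (7, 7)
    print(char_and_byte_length("日本語"))    # (3, 9)
    print(char_and_byte_length("😀"))        # (1, 4): needs utf8mb4 in MySQL
```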
The idea behind VARCHAR (as opposed to CHAR, for example) is that it lets the database write fewer bytes to disk when the field is not fully used.
For example, if you write "xx" into a VARCHAR(1000), only two characters are stored in your physical database, not 1000. (Note that I said characters here, not bytes. The actual number of bytes will be higher to allow for wider variable-width UTF-8 codes, the stored string length, any word and disk buffer padding, etc. But you will clearly be storing far fewer bytes than if you used CHAR.)
So let the database do some of the work for you. Make the field size 3000 or so, or at least big enough that you will never have a problem, even if a large percentage of the characters are wider than 8 bits.
I am storing serialized data in MySQL and am unsure which field type to choose.
One example of the serialized data output is below:
string(393) "a:3:{s:4:"name";s:22:"PACMAN-Appstap.net.rar";s:8:"trackers";a:6:{i:0;s:30:"http://tracker.ccc.de/announce";i:1;s:42:"http://tracker.openbittorrent.com/announce";i:2;s:36:"http://tracker.publicbt.com/announce";i:3;s:23:"udp://tracker.ccc.se:80";i:4;s:35:"udp://tracker.openbittorrent.com:80";i:5;s:29:"udp://tracker.publicbt.com:80";}s:5:"files";a:1:{s:22:"PACMAN-Appstap.net.rar";i:4147632;}}"
The length of the data can vary greatly, up to around 20,000 characters.
I understand that I should not use a TEXT data type, as character set conversion could corrupt the data.
I am stuck when it comes to choosing between VARBINARY, BLOB, MEDIUMBLOB, etc.
Let's say I use VARBINARY(20000): does this mean that I can safely insert a string of length 20000, and that if it is longer the insert is discarded?
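I agree with PLB that you should use BLOB. The length attribute specifies how many bytes can be stored in the column. Note that VARBINARY does not pad unused space (that is the fixed-length BINARY type); both VARBINARY and BLOB store only the actual data plus a length prefix. The main practical difference is that a VARBINARY column counts its full declared length against MySQL's 65,535-byte row size limit, while a BLOB column contributes only a few bytes to it.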
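On the question of what happens with VARBINARY(20000): the column holds up to 20,000 bytes. A longer value is rejected with a "Data too long" error when strict SQL mode (STRICT_TRANS_TABLES, the default since MySQL 5.7) is active; without strict mode it is truncated with only a warning, which would silently break serialized data. Checking the byte length in the application before inserting avoids depending on the server's mode. A minimal Python sketch (the names are hypothetical):

```python
VARBINARY_MAX = 20_000  # the VARBINARY(20000) column from the question


def fits_varbinary(data: bytes, max_len: int = VARBINARY_MAX) -> bool:
    """True if `data` fits the column; VARBINARY(N) limits are in bytes."""
    return len(data) <= max_len


if __name__ == "__main__":
    ascii_payload = ("x" * 20_000).encode("utf-8")  # 20,000 bytes
    utf8_payload = ("é" * 10_001).encode("utf-8")   # 10,001 characters, 20,002 bytes
    print(fits_varbinary(ascii_payload))  # True
    print(fits_varbinary(utf8_payload))   # False
```

The second case shows why "20,000 characters" and "20,000 bytes" are not the same limit.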
But as PLB said, only use this if you absolutely must, because it slows down the whole DB in most cases. A better solution would be to store the files in your server's filesystem and save the file's path in the DB.
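A minimal sketch of the filesystem alternative, in Python for illustration. The content-addressed layout (files named by their SHA-256 hash) and the function names are assumptions, not part of the original answer; only the returned relative path would be stored in the database column.

```python
import hashlib
import tempfile
from pathlib import Path


def save_to_filesystem(data: bytes, storage_dir: Path) -> str:
    """Write data to a file named after its SHA-256 hash and return the
    relative path, which is what would be stored in the database column."""
    digest = hashlib.sha256(data).hexdigest()
    relative = Path(digest[:2]) / digest  # one subdirectory per hash prefix keeps directories small
    target = storage_dir / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return relative.as_posix()


def load_from_filesystem(relative_path: str, storage_dir: Path) -> bytes:
    """Read the data back using the path fetched from the database."""
    return (storage_dir / relative_path).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        path = save_to_filesystem(b'a:1:{i:0;s:3:"foo";}', root)
        print(path)  # the hash-based relative path
        print(load_from_filesystem(path, root))
```

Naming files by hash means identical payloads share one file; the trade-off is that backups and deletes now have to keep the database and the directory in sync.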
I am serializing a lot of arrays in PHP that are to be stored in a MySQL database.
The length of the final string can vary greatly, anywhere between 2,000 and 100,000+; I was wondering what the best column type for this would be.
I currently have it set as LONGTEXT, but I feel this is overkill! The database is already active and has around 3 million rows; this is a new column which will be added soon.
Thanks
Always use a BLOB data type for serialized data: it is binary-safe, and as long as it is large enough the data will not get cut off and break the serialization. If there is no maximum to the length of the final string, you will need LONGBLOB. If you know the data won't reach 2^24 bytes, you could use MEDIUMBLOB. MEDIUMBLOB holds about 16 MB while LONGBLOB holds about 4 GB, so I would say you're pretty safe with MEDIUMBLOB.
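On the "overkill" worry in the question: per MySQL's storage requirements, a BLOB-family value takes its length in bytes plus a length prefix of 1 (TINYBLOB), 2 (BLOB), 3 (MEDIUMBLOB) or 4 (LONGBLOB) bytes, and the TEXT types mirror this. So choosing LONGBLOB over MEDIUMBLOB costs one byte per row. A Python sketch of the arithmetic for the 3 million rows mentioned (helper names are hypothetical; engine-level overhead such as off-page storage is ignored):

```python
# (type, maximum bytes, length-prefix bytes), per MySQL's storage requirements.
BLOB_TYPES = [
    ("TINYBLOB", 2**8 - 1, 1),
    ("BLOB", 2**16 - 1, 2),
    ("MEDIUMBLOB", 2**24 - 1, 3),
    ("LONGBLOB", 2**32 - 1, 4),
]


def smallest_blob_type(size_in_bytes: int) -> str:
    """Smallest BLOB type whose maximum size covers `size_in_bytes`."""
    for name, max_bytes, _ in BLOB_TYPES:
        if size_in_bytes <= max_bytes:
            return name
    raise ValueError("too large for LONGBLOB")


def prefix_overhead(type_name: str, rows: int) -> int:
    """Total bytes spent on length prefixes for `rows` values of `type_name`."""
    prefix = {name: p for name, _, p in BLOB_TYPES}[type_name]
    return prefix * rows


if __name__ == "__main__":
    print(smallest_blob_type(100_000))  # MEDIUMBLOB
    extra = prefix_overhead("LONGBLOB", 3_000_000) - prefix_overhead("MEDIUMBLOB", 3_000_000)
    print(extra)  # 3000000 bytes, about 2.9 MiB across the whole table
```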
Why a binary data type? Text data types in MySQL have a character set, and MySQL converts text between character sets, for example when data is stored as Latin-1 but read out as UTF-8 (because of the database driver's connection encoding setting). PHP's serialized strings record each string's length in bytes, so a conversion that changes the byte count shifts the offsets and breaks the serialized data. PHP's serialized strings are binary data, not text in any specific encoding.
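The offset problem can be reproduced in a few lines of Python. serialize_php_string below is a hypothetical, simplified helper that mimics PHP's serialize() format for a single string only (s:<length>:"<value>";, where the length is a byte count); the Latin-1 to UTF-8 step simulates a character set conversion between storage and connection:

```python
import re

# Matches one serialized PHP string: s:<byte length>:"<payload>";
_SERIALIZED_STR = re.compile(rb'^s:(\d+):"(.*)";$', re.DOTALL)


def serialize_php_string(value: bytes) -> bytes:
    """Mimic PHP's serialize() for a single string: the prefix is a BYTE count."""
    return b's:%d:"%s";' % (len(value), value)


def length_prefix_is_valid(serialized: bytes) -> bool:
    """True if the declared length matches the actual payload length in bytes."""
    match = _SERIALIZED_STR.match(serialized)
    if match is None:
        return False
    return int(match.group(1)) == len(match.group(2))


if __name__ == "__main__":
    original = serialize_php_string("héllo".encode("latin-1"))  # b's:5:"h\xe9llo";'
    # Simulate the column or connection converting Latin-1 bytes to UTF-8:
    converted = original.decode("latin-1").encode("utf-8")      # b's:5:"h\xc3\xa9llo";'
    print(length_prefix_is_valid(original))   # True
    print(length_prefix_is_valid(converted))  # False: the payload is now 6 bytes
```

A BLOB column never performs that conversion, so the bytes PHP wrote are the bytes it reads back.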
You should choose BLOB (as Marc B noted) per the PHP manual for serialize():
"Note that this [outputs] a binary string which may include null bytes, and needs to be stored and handled as such. For example, serialize() output should generally be stored in a BLOB field in a database, rather than a CHAR or TEXT field."
Source: http://php.net/serialize
Of course J.Money's input regarding sizes must be borne in mind as well - even BLOB has its limits, and if you are going to exceed them then you would need MEDIUMBLOB or LONGBLOB.
How should I choose the size of a VARCHAR/TEXT/... column in a (MySQL) database (let's assume the text the user can type into a textarea should be at most 500 characters), given that the user might also use formatting (HTML, BBCode, ...) that is not visible to the user and should not count toward the 500-character limit?
1) Theoretically, to prevent any error, the VARCHAR size would have to be almost endless if the user, e.g., includes 20 links like http://[huge number of characters]. Or not?
2) Should/could you save the formatting in a separate column, e.g. so that an index (like FULLTEXT) does not pick up wrong values (words that appear in the formatting but not in the real text)?
If yes, how best to do this? Do you record the position at which the formatting was used, save that position together with the formatting, and combine the two when outputting?
(PHP/MySQL, JavaScript, jQuery)
Thank you very much in advance!
A good solution is to take the formatting characters into account in the length.
If you do not, then to avoid data loss you need to reserve much more space for the text in the database and check the length of the record before saving it, or use a TEXT column.
Keeping the same data twice in one table is not a good solution. It all depends on your project, but it is usually better to filter out the formatting in PHP.
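A minimal sketch of filtering the formatting before checking the 500-character limit, in Python for illustration; in PHP, strip_tags() plays a similar role for HTML. The names and the HTML-only handling are assumptions (BBCode would need its own parser), and the column itself must still be sized for the full markup:

```python
from html.parser import HTMLParser

VISIBLE_LIMIT = 500  # the 500-character limit from the question


class _TextCollector(HTMLParser):
    """Collects text content, dropping tags and their attributes (e.g. long hrefs)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def visible_text(markup: str) -> str:
    """Text a reader would actually see, with all tags removed."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush any buffered text
    return "".join(collector.parts)


def within_visible_limit(markup: str, limit: int = VISIBLE_LIMIT) -> bool:
    return len(visible_text(markup)) <= limit


if __name__ == "__main__":
    print(visible_text('<b>Hi</b> <a href="http://example.com/very/long">there</a>'))  # Hi there
```

Under this scheme a link with a very long URL counts only its visible label toward the limit, which is why the raw column still needs headroom (or a TEXT type).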
All,
I'm writing a web app that will receive user-generated text content. Some of those inputs will be a few words, some will be several sentences long. In more than 90% of cases, the inputs will be fewer than 800 characters. Inputs need to be searchable. Inputs will be in various character sets, including Asian ones. The site and the DB are based on utf8.
I understand roughly the tradeoffs between VARCHAR and TEXT. What I am envisioning is to have both a VARCHAR and a TEXT table, and to store inputs on one or the other depending on their size (this should be doable by the PHP script).
What do you think of having several tables for data based on its size? Also, would it make any sense to create several VARCHAR tables for various size ranges? My guess is that I will get a large number of user inputs clustered around a few key sizes.
Thanks,
JDelage
Storing values in one column vs another depending on size of input is going to add a heck of a lot more complexity to the application than it'll be worth.
As for VARCHAR vs TEXT in MySQL, here's a good discussion about that, MySQL: Large VARCHAR vs TEXT.
The "tricky" part is doing a full-text search on this field. Before MySQL 5.6 this required the MyISAM storage engine, as it was the only one that supported full-text indexes; InnoDB has supported FULLTEXT indexes since MySQL 5.6. Also of note: at the cost of complicating the system architecture, it can be worthwhile to use something like Apache Solr, as it performs full-text search much more efficiently. A lot of people keep most of their data in MySQL and use something like Solr just for full-text indexing of that text column and for running fancy searches against that index.
Re: Unicode. I've used Solr for full-text indexing of text with Unicode characters just fine.
Comments are correct: using the TEXT data type instead of VARCHAR adds at most 1 byte per value.
Storage Requirements:
VARCHAR: length of string in bytes + 1 byte (if the column can require at most 255 bytes) or + 2 bytes (otherwise)
TEXT: length of string in bytes + 2 bytes
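Which VARCHAR prefix applies depends on the column's maximum byte length, i.e. the declared character count times the maximum bytes per character of its character set (3 for MySQL's utf8, 4 for utf8mb4). A Python sketch of that arithmetic (helper names are hypothetical); for the roughly 800-character inputs in the question, VARCHAR already needs the 2-byte prefix, the same as TEXT:

```python
TEXT_PREFIX_BYTES = 2


def varchar_prefix_bytes(max_chars: int, max_bytes_per_char: int) -> int:
    """1-byte length prefix if the column can need at most 255 bytes, else 2."""
    return 1 if max_chars * max_bytes_per_char <= 255 else 2


def stored_size(value: str, prefix_bytes: int, encoding: str = "utf-8") -> int:
    """Bytes used for one value: encoded data plus its length prefix."""
    return len(value.encode(encoding)) + prefix_bytes


if __name__ == "__main__":
    print(varchar_prefix_bytes(800, 3))  # 2: utf8, 2,400 bytes maximum
    print(varchar_prefix_bytes(800, 4))  # 2: utf8mb4, 3,200 bytes maximum
    print(varchar_prefix_bytes(60, 4))   # 1: 240 bytes fits a 1-byte prefix
    print(stored_size("xx", varchar_prefix_bytes(1000, 4)))  # 4
    print(stored_size("xx", TEXT_PREFIX_BYTES))              # 4
```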
The way I see it is you have two options:
Hold it in TEXT: it will cost at most one additional byte of storage per row, plus some additional processing on search.
Hold it in VARCHAR, and create an additional table named A_LOT_OF_TEXT with the structure (int row_id_of_varchar_table, TEXT). If the data is small enough, put it in the VARCHAR column; otherwise put a predefined placeholder value there instead, for example 'THE_DATA_YOU_ARE_LOOKING_FOR_IS_IN_TABLE_NAMED_A_LOT_OF_TEXT' or simply NULL, and put the real data in table A_LOT_OF_TEXT.
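The second option, sketched in Python with the standard library's sqlite3 module standing in for MySQL so it runs without a server. It uses the NULL variant; the inputs table, the 800-character cutoff and the function names are hypothetical (in MySQL the id column would also need AUTO_INCREMENT):

```python
import sqlite3
from typing import Optional

SHORT_LIMIT = 800  # hypothetical cutoff, taken from the "90% under 800 characters" figure


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE inputs (
            id INTEGER PRIMARY KEY,
            short_text VARCHAR(800)  -- NULL when the text lives in a_lot_of_text
        );
        CREATE TABLE a_lot_of_text (
            row_id INTEGER PRIMARY KEY REFERENCES inputs(id),
            body TEXT NOT NULL
        );
        """
    )


def save_input(conn: sqlite3.Connection, text: str) -> int:
    """Store short text inline; store long text in the overflow table."""
    if len(text) <= SHORT_LIMIT:
        cur = conn.execute("INSERT INTO inputs (short_text) VALUES (?)", (text,))
    else:
        cur = conn.execute("INSERT INTO inputs (short_text) VALUES (NULL)")
        conn.execute(
            "INSERT INTO a_lot_of_text (row_id, body) VALUES (?, ?)",
            (cur.lastrowid, text),
        )
    return cur.lastrowid


def load_input(conn: sqlite3.Connection, input_id: int) -> Optional[str]:
    """Return the text from whichever table holds it, or None if the id is unknown."""
    row = conn.execute(
        """
        SELECT COALESCE(i.short_text, o.body)
        FROM inputs AS i
        LEFT JOIN a_lot_of_text AS o ON o.row_id = i.id
        WHERE i.id = ?
        """,
        (input_id,),
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    short_id = save_input(conn, "a few words")
    long_id = save_input(conn, "x" * 5000)
    print(load_input(conn, short_id))
    print(len(load_input(conn, long_id)))
```

Every read now needs a LEFT JOIN and every write a branch, which is the extra application complexity the first answer in this thread warns about.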