Creating MySQL table for user-input text strings - php

Using a PHP form to insert the text strings from users into the table, and another form to pull it later on another page, what would be the best method of creating a table in MySQL for strings of text, and what options when creating the table would likely be necessary to best handle text strings?
The complicating factor, I suppose, is that the text that would exist in the table doesn't exist yet (it would need to be input through the form, etc.). I am unsure if this is why I've had trouble, or if it's just my relative inexperience; either way, I am unsure of what, precisely, an ideal table configuration would be.
Since I don't want to store any other data beyond this user input (like I said, just strings of text, i.e. a sentence), I assumed I only needed one column when creating the table, but I was unsure of this as well; it seems possible I am just overlooking something about how SQL works.

I'll put my comments into an answer now:
Consider the estimated maximum length of such strings to decide whether to use VARCHAR fields or TEXT fields in MySQL.
Quoting from the MySQL manual (BTW, a good read for your purpose):
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions.
http://dev.mysql.com/doc/refman/5.0/en/char.html
It is said that VARCHAR is faster; for a good summary, see MySQL: Large VARCHAR vs. TEXT?
Consider having at least a second field called id (INT, primary key, auto-increment) so you can reference those strings later. Consider having a field referencing the author of that string. A field storing the date and time when the string was put into the database would be a good idea as well. A sketch follows below.
Use mysqli or PDO instead of mysql, which is deprecated.
See here; there are links to good tutorials in the first answer: How do I migrate my site from mysql to mysqli?
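Putting those pieces together, a minimal sketch might look like this (table and column names, the VARCHAR length, and the connection details are all assumptions to adapt to your setup):

-- one row per submitted string, with a surrogate key, author reference and timestamp
CREATE TABLE user_strings (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    author_id INT UNSIGNED NULL,
    body VARCHAR(1000) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

// inserting the submitted text with a prepared statement (PDO)
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO user_strings (author_id, body) VALUES (?, ?)');
// $authorId: however your application identifies the author
$stmt->execute(array($authorId, $_POST['text']));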

Related

How to get UUID in php created from trigger in mySQL [duplicate]

I have a table field of type varchar(36), and I want to generate its value dynamically in MySQL, so I used this code:
$sql_code = "insert into table1 (id, text) values (uuid(), 'some text');";
mysql_query($sql_code);
How can I retrieve the generated UUID immediately after inserting the record?
char(36) is better
You cannot. The only solution is to perform two separate queries:
SELECT UUID()
INSERT INTO table1 (id, text) VALUES ($uuid, 'text')
where $uuid is the value retrieved in the first step.
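A sketch of that two-step approach using PDO rather than the deprecated mysql extension (assuming an existing connection in $pdo):

// step 1: ask MySQL for a fresh UUID
$uuid = $pdo->query('SELECT UUID()')->fetchColumn();
// step 2: use it in the insert via a prepared statement
$stmt = $pdo->prepare('INSERT INTO table1 (id, text) VALUES (?, ?)');
$stmt->execute(array($uuid, 'some text'));
// $uuid is now known to the application without a further round trip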
You can do everything you need to with SQL triggers. The following SQL adds triggers on tablename.table_id to automatically create the primary key UUID when inserting, then stores the newly created ID in an SQL variable for retrieval later. (NEW values can only be modified in a BEFORE INSERT trigger, so the UUID generation and the variable capture are split into two triggers.)
DELIMITER $$
CREATE TRIGGER `tablename_newid_before`
BEFORE INSERT ON `tablename`
FOR EACH ROW
BEGIN
    -- only generate a UUID if the caller did not supply a key value
    IF ASCII(NEW.table_id) = 0 THEN
        SET NEW.table_id = UNHEX(REPLACE(UUID(), '-', ''));
    END IF;
END$$
CREATE TRIGGER `tablename_newid_after`
AFTER INSERT ON `tablename`
FOR EACH ROW
SET @last_uuid = NEW.table_id$$
DELIMITER ;
As a bonus, it stores the UUID in binary form in a binary(16) field, to save storage space and greatly increase query speed.
Edit: the trigger should check for an existing column value before inserting its own UUID, in order to mimic the ability to provide values for table primary keys in MySQL - without this, any values passed in would always be overridden by the trigger. The example has been updated to use ASCII() = 0 to check for the presence of a primary key value in the INSERT, which will also detect empty string values for a binary field.
Edit 2: it has since been pointed out to me in a comment that setting @last_uuid in a BEFORE INSERT trigger sets the variable even if the row insert fails. The example now captures the variable in a separate AFTER INSERT trigger; whilst I feel this is a totally fine approach under general circumstances, it may have issues with row replication under clustered or replicated databases. If anyone knows, I would love to hear about it!
To read the new row's insert ID back out, just run SELECT @last_uuid.
When querying and reading such binary values, the MySQL functions HEX() and UNHEX() will be very helpful, as will writing your query values in hex notation (preceded by 0x). The PHP-side code for your original question, given this type of trigger applied to table1, would be:
// insert row
$sql = "INSERT INTO table1 (text) VALUES ('some text')";
mysql_query($sql);
// get the last inserted UUID
$sql = "SELECT HEX(@last_uuid)";
$result = mysql_query($sql);
$row = mysql_fetch_row($result);
$id = $row[0];
// perform a query using said ID
mysql_query("SELECT * FROM table1 WHERE id = 0x" . $id);
Following up in response to @ina's comment:
A UUID is not a string, even if MySQL chooses to represent it as such. It's binary data in its raw form, and those dashes are just MySQL's friendly way of representing it to you.
The most efficient storage for a UUID is to create it as UNHEX(REPLACE(UUID(),'-','')) - this will remove that formatting and convert it back to binary data. Those functions will make the original insertion slower, but all following comparisons you do on that key or column will be much faster on a 16-byte binary field than a 36-character string.
For one, character data requires parsing and localisation. Any strings coming into the query engine are generally collated automatically against the character set of the database, and some APIs (WordPress comes to mind) even run CONVERT() on all string data before querying. Binary data doesn't have this overhead. For another, your char(36) is actually allocating 36 characters, which means (if your database is UTF-8) each character could be as long as 3 or 4 bytes, depending on the version of MySQL you are using. So a char(36) can range anywhere from 36 bytes (if it consists entirely of low-ASCII characters) to 144 bytes if it consists entirely of high-order UTF-8 characters. This is much larger than the 16 bytes we have allocated for our binary field.
Any logic performed on this data can be done with UNHEX(), but is better accomplished by simply escaping data in queries as hex, prefixed with 0x. This is just as fast as reading a string, gets converted to binary on the fly and directly assigned to the query or cell in question. Very fast.
Reading data out is slightly slower - you have to call HEX() on all binary data read out of a query to get it in a useful format if your client API doesn't deal well with binary data (PHP in particular will usually determine that binary strings === null, and will break them if they are manipulated without first calling bin2hex(), base64_encode() or similar) - but this overhead is about as minimal as character collation and, more importantly, is only incurred on the actual cells SELECTed, not on all the cells involved in the internal computations of a query result.
So of course, all these small speed increases are very minimal and other areas result in small decreases - but when you add them all up binary still comes out on top, and when you consider use cases and the general 'reads > writes' principle it really shines.
... and that's why binary(16) is better than char(36).
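If you want to verify the size claim yourself, this runs in any MySQL client:

-- a UUID as binary data is 16 bytes; as text it is 36 characters
SELECT LENGTH(UNHEX(REPLACE(UUID(), '-', ''))) AS binary_bytes,  -- 16
       LENGTH(UUID()) AS char_bytes;                             -- 36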
It's pretty easy, actually - you can pass this to MySQL and it will return the inserted ID:
set @id = UUID();
insert into <table> (<col1>, <col2>) values (@id, 'another value');
select @id;
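One caveat worth spelling out: @id is a connection-scoped user variable, so all three statements have to run over the same database connection. A sketch with mysqli (connection details and table/column names are placeholders):

$db = new mysqli('localhost', 'user', 'pass', 'test');
$db->query("SET @id = UUID()");                  // generate on the server
$db->query("INSERT INTO table1 (id, text) VALUES (@id, 'another value')");
$row = $db->query("SELECT @id")->fetch_row();    // read it back on the same link
$insertedUuid = $row[0];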
Depending on how the uuid() function is implemented, this is very bad programming practice - if you try to do this with binary logging enabled (i.e. in a cluster), then the insert will most likely fail. Ivan's suggestion looks like it might solve the immediate problem; however, I thought that only returned the value generated for an auto-increment field - indeed, that's what the manual says.
Also, what's the benefit of using a uuid()? It's computationally expensive to generate, requires a lot of storage, increases the cost of querying the data, and is not cryptographically secure. Use a sequence generator or auto-increment instead.
Regardless of whether you use a sequence generator or a uuid, if you must use this as the only unique key on the database, then you'll need to assign the value first, read it back into PHP-land, and embed/bind the value as a literal in the subsequent insert query.

Is there a way to compress a MySQL column where values repeat very often?

I have an InnoDB table with a VARCHAR column, with tens of thousands of instances of the same text under it. Is there a way to compact it on the fly in order to save space? Is some kind of INDEX enough?
Can't InnoDB see that the values are the same, and use less space by internally assigning them some ID or whatever?
If the task is as simple as it seems, then what you are looking for is normalisation.
In simple terms, what you have to do is make this column contain foreign keys into another table, which holds the actual values. New values are stored in the other table; when a value already exists there, you do not need to make another entry for it. Form this relation between the tables, and a huge amount of space will be saved in your original table.
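A minimal sketch of that relation (all names here are hypothetical):

-- lookup table holding each distinct string exactly once
CREATE TABLE string_values (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    value VARCHAR(255) NOT NULL,
    UNIQUE KEY uq_value (value)
) ENGINE=InnoDB;

-- the original table now stores a small integer instead of the repeated text
ALTER TABLE main_table
    ADD COLUMN value_id INT UNSIGNED NOT NULL,
    ADD CONSTRAINT fk_main_value FOREIGN KEY (value_id) REFERENCES string_values (id);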
I suggest you read up on redundancy and normalisation.
Hope it solves your problem.
You can use the MySQL ENUM data type. It stores the values as indexes, but upon SELECT you see the text value.
Here is the documentation:
http://dev.mysql.com/doc/refman/5.7/en/enum.html
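For illustration, a minimal sketch of an ENUM column (table and values are hypothetical):

-- stored internally as a small integer index, returned as text on SELECT
CREATE TABLE tickets (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    status ENUM('open', 'pending', 'closed') NOT NULL DEFAULT 'open'
);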
The cons are that not all databases support the ENUM type, so you may find that a problem if some day you decide to switch databases.
There are also some other limitations, pointed out here:
http://dev.mysql.com/doc/refman/5.7/en/enum.html#enum-limits

Combine Multiple Rows in MySQL into JSON or Serialize

I currently have a database structure for dynamic forms as such:
grants_app_id   user_id   field_name   field_value
5               42434     full_name    John Doe
5               42434     title        Programmer
5               42434     email        example@example.com
I found this very difficult to manage, and it filled up the number of rows in the database very quickly. I have varying field_names, which can add up to 78 rows, so it proved to be very costly when making updates to the field_values or simply searching them. I would like to combine the rows and use either JSON or PHP serialize to greatly reduce the impact on the database. Does anyone have any advice on how I should approach this? Thank you!
This would be the expected output:
grants_app_id   user_id   data
5               42434     {"full_name":"John Doe", "title":"Programmer", "email":"example@example.com"}
It seems you don't have a simple primary key in those rows.
Speeding up the current solution:
create an index for (grants_app_id, user_id)
add an auto-incrementing primary key
switch from field_name to field_id
The index will make retrieving full forms a lot more fun (while taking a bit of extra time on insert).
The primary key allows you to update a row by specifying a single value backed by a unique index, which should generally be really fast.
You probably already have some definition of the fields. Add integer IDs and use them to speed up the process, as less data is stored, compared, and indexed.
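In SQL, those three suggestions might look like this (the table name grants_data is a stand-in, since the question doesn't name it):

-- composite index for fetching a whole form at once
ALTER TABLE grants_data ADD INDEX idx_form (grants_app_id, user_id);
-- surrogate primary key for fast, unambiguous single-row updates
ALTER TABLE grants_data ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
-- integer reference into a (new) table of field definitions, replacing field_name
ALTER TABLE grants_data ADD COLUMN field_id INT UNSIGNED NULL;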
Switching to a JSON-Encoded variant
Converting arrays to JSON and back can be done by using json_encode and json_decode since PHP 5.2.
How can you switch to JSON?
Possibly the best way would be to use a PHP script (or similar) to retrieve all the data from the old table, group it correctly, and insert it into a fresh table; afterwards you may switch names, and so on. This is an offline approach.
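A sketch of that offline conversion (old_table, new_table, and an existing PDO connection in $pdo are assumptions):

// group the old rows per form, then write one JSON row each
$rows = $pdo->query('SELECT grants_app_id, user_id, field_name, field_value FROM old_table');
$forms = array();
foreach ($rows as $r) {
    $forms[$r['grants_app_id'] . ':' . $r['user_id']][$r['field_name']] = $r['field_value'];
}
$ins = $pdo->prepare('INSERT INTO new_table (grants_app_id, user_id, data) VALUES (?, ?, ?)');
foreach ($forms as $key => $fields) {
    list($appId, $userId) = explode(':', $key);
    $ins->execute(array($appId, $userId, json_encode($fields)));
}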
An alternative would be to add a new column and indicate by field_name=NULL that the new column contains the data. Afterwards you are free to convert data at any time or store only new data as JSON.
Use JSON?
While it is certainly tempting to have all the data in one row, there are some things to remember:
with all fields stored in a single text field, searching for a value inside one field may become a two-phase approach, as a % inside any LIKE can match across other fields' values. Also, LIKE '%field:value%' is not easily optimized by indexing the column.
changing a single field means updating all stored fields. As long as you are sure only one process changes the data at any given time this is OK; otherwise there tend to be more problems.
the JSON column needs to be big enough to hold field names + values + separators, which can be a lot. Also, miscalculating the length of a long value in any field means truncation, with the risk of losing all information on all fields after the long value.
So in your case, even with 78 different fields, it may still be better to have a row per form, user, and field. (It may even turn out that JSON is more practical for forms with few fields.)
As explained in this question, you have to remember that to MySQL, JSON is just more text.

MySQL phpMyAdmin: a few questions

I am quite new to the MySQL phpMyAdmin environment, and I would like to clear up a few areas:
1. I need a text field that should hold up to around 500 characters. Does that have to be a "TEXT" field? Is the application responsible for enforcing the length?
2. Indexes. I understand that when I signify a field as "indexed", that field gets a pointer table, and any WHERE clause that includes the field is optimized by it (log n complexity). But what happens if I signify a field as indexed after the fact, say after it has some rows in it? Can I issue a command like "walk through all of that table and index that field"?
3. When I mark fields as indexed, I sometimes see them in phpMyAdmin as having a keyname. When I write PHP that accesses the table by the indexed field, does it take extra effort on my side to use that keyname shown in the "structure" view, or is the keyname used behind the scenes and I should not care about it whatsoever?
4. I sometimes get keynames referencing two or more fields together; the fields show one on top of the other. I don't know how it happened, but I need them to index only one field. What is going on?
5. I use UTF-8 values in my DB. When I created it, I think I marked it as utf8_unicode_ci, and some fields are marked as utf8_general_ci. Does it matter? Can I go back and change the whole DB definition to utf8_general_ci?
I think that was quite a bit,
I thank you in advance!
Ted
First, be aware that this is not per se about phpMyAdmin, but more about MySQL / databases.
1) An index means that you make a list (most of the time a tree) of the values that are present, so that you can easily find the row with those values. Such a tree can be built just as easily after you insert values as before. Mind you, this means all the "add to index" operations are executed together, so it's not something you want to do on a "live" table with loads of entries, but you can add an index whenever you want. Just add the index and it will be built, whether the table is empty or already in use.
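Concretely, "adding an index after the fact" is a single statement (names are placeholders):

-- builds the index from the rows already in the table
ALTER TABLE my_table ADD INDEX idx_my_column (my_column);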
2) I don't know what you mean by this. Indexes have a name; it doesn't really matter what it is. A (primary) key is an index, but not all indexes are keys.
3) You don't need to "force" MySQL to use a key; the optimizer knows best how and when to use keys. If your keys are correct they are used; if they are not correct they can't be used, so you can't force it. In other words: don't think about it :)
4) phpMyAdmin creates a composite key if you mark two fields as keys at the same time. This is annoying and can be wrong. If you search for both things at once you can use the composite key, but if you search on only the second field you can't. Just mark them as keys one at a time, or issue the correct SQL command manually.
5) You can change whatever you like, but I don't know what will happen to your values. Better check manually :)
If you need a field to contain 500 characters, you can do that with VARCHAR. Just set its length to 500.
You don't index field by field, you index a whole column. So it doesn't matter if the table has data in it. All the rows will be indexed.
Not a question
The indexes will be used whenever they can. You only need to worry about using the same columns that you have indexed in the WHERE section of your query. Read about it here
You can add as many columns as you wish in an index. For example, if you add columns "foo", "bar" and "ming" to an index, your database will be speed optimized for searches using those columns in the WHERE clause, in that order. Again, the link above explains it all.
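A short illustration of that ordering rule (hypothetical table t):

ALTER TABLE t ADD INDEX idx_fbm (foo, bar, ming);
-- can use the index (leftmost columns first):
SELECT * FROM t WHERE foo = 1 AND bar = 2;
-- cannot use the index (skips the leftmost column):
SELECT * FROM t WHERE bar = 2 AND ming = 3;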
I don't know the full answer, but I'm 100% sure that if you use only UTF-8 values in the database, it won't matter. You can change this later, though, as explained in this Stack Overflow question: How to convert an entire MySQL database characterset and collation to UTF-8?
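For a single table, such a conversion boils down to something like this (back up your data first; my_table is a placeholder):

ALTER TABLE my_table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;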
I would recommend you scrap phpMyAdmin for HeidiSQL, though. HeidiSQL is a Windows client that manages all your MySQL servers. It has lots of cool functions, like copying a table or database directly from one MySQL server to another. Try it out (it's free).

MySQL Unique hash insertion

So, imagine a MySQL table with a few simple columns, an auto-increment, and a hash (varchar, UNIQUE).
Is it possible to give MySQL a query that will add a row and generate a unique hash without multiple queries?
Currently, the only way I can think of to achieve this is with a while loop, which I worry would become more and more processor-intensive the more entries there were in the db.
Here's some pseudo-PHP - obviously untested, but it gets the general idea across:
while (!query("INSERT INTO table (hash) VALUES ('" . generate_hash() . "')")) {
    // found a conflict, try again
}
In the above example, the hash column would be UNIQUE, and so the query would fail. The problem is, say there's 500,000 entries in the db and I'm working off of a base36 hash generator, with 4 characters. The likelihood of a conflict would be almost 1 in 3, and I definitely can't be running 160,000 queries. In fact, any more than 5 I would consider unacceptable.
So, can I do this with pure SQL? I would need to generate a base62, 6 char string (like: "j8Du7X", chars a-z, A-Z, and 0-9), and either update the last_insert_id with it, or even better, generate it during the insert.
I can handle basic CRUD with MySQL, but even JOINs are a little outside of my MySQL comfort zone, so excuse my ignorance if this is cake.
Any ideas? I'd prefer to use either pure MySQL or PHP & MySQL, but hell, if another language can get this done cleanly, I'd build a script and AJAX it too.
Thanks!
This is our approach for a similar project, where we wanted to generate unique coupon codes.
First, we used an AUTO_INCREMENT primary key. This ensures uniqueness and query speed.
Then, we created a base24 numbering system using A, B, C, etc., without O and I, because someone might have mistaken them for 0 or 1.
Then we converted the auto-increment integer to our base24 number. For example, 0=A, 1=B, 28=BE, 1458965=EKNYF. We used base24 because long base10 numbers need fewer characters in base24.
Then we created a separate column in our table, coupon_code. This was not indexed.
We took the base24 value and inserted 3 random characters into it, drawn from the digits plus I and O (characters not used in our base24 alphabet). For example, EKNYF could turn into 1EKON6F or EK2NY3F9. This was our coupon code, and we inserted it into our coupon_code column. It's unique and random.
So, when the user uses code EK2NY3F9, all we have to do is remove the non-alphabet characters (2, 3 and 9) and we get EKNYF, which we convert back to 1458965. We just select the primary key 1458965 and then compare the coupon_code column with EK2NY3F9.
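For illustration, a sketch of the base24 conversion described above (the alphabet skips I and O; the random-character insertion and stripping steps are left out):

// A=0, B=1, ... with I and O omitted
function to_base24($n) {
    $alphabet = 'ABCDEFGHJKLMNPQRSTUVWXYZ';
    $out = '';
    do {
        $out = $alphabet[$n % 24] . $out;
        $n = (int) ($n / 24);
    } while ($n > 0);
    return $out;
}

function from_base24($s) {
    $alphabet = 'ABCDEFGHJKLMNPQRSTUVWXYZ';
    $n = 0;
    for ($i = 0; $i < strlen($s); $i++) {
        $n = $n * 24 + strpos($alphabet, $s[$i]);
    }
    return $n;
}

echo to_base24(1458965);   // EKNYF
echo from_base24('EKNYF'); // 1458965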
I hope this helps.
If your heart is set on using base-36, 4-character hashes (the hashspace is only 1,679,616), you could probably pre-generate a table of hashes that aren't already in the other table. Then finding a unique hash is as simple as moving it from the "unused" table to the "used" table, which is O(1).
If your table could conceivably become 1/3 full, you might want to consider expanding your hashspace, since it will probably fill up in your lifetime. Once the space is full, you will no longer be able to find unique hashes, no matter what algorithm you use.
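A sketch of the claim-a-hash step under that scheme (table names are hypothetical; InnoDB transactions assumed):

START TRANSACTION;
-- lock one free hash so concurrent requests don't grab the same value
SELECT hash INTO @h FROM unused_hashes LIMIT 1 FOR UPDATE;
INSERT INTO used_hashes (hash) VALUES (@h);
DELETE FROM unused_hashes WHERE hash = @h;
COMMIT;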
What is this hash a hash of? It seems like you just want a randomly generated unique VARCHAR column? What's wrong with the auto increment?
Anyway, you should just use a bigger hash - find an MD5 function (if you're actually hashing something), or a UUID generator with more than 4 characters - and yes, you could use a while loop, but just generate a big enough hash that conflicts are incredibly unlikely.
As others have suggested, what's wrong with an auto-increment field? If you want an alphanumeric value, you could simply convert the int to an alphanumeric string in base 36. This could be implemented in almost any language.
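For example, with PHP's built-in base_convert():

$code = base_convert(500000, 10, 36);        // "apsw"
$id   = (int) base_convert($code, 36, 10);   // back to 500000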
Going with zneak's comment, why don't you use an auto-increment column? Save the hash in another (non-unique) field and concatenate the id to it (dynamically), so you give the user [hash][id]. You can parse it back out in pure SQL using the substring functions.
Since you have to have the hash, the user can't look at other records by incrementing the id.
So, just in case someone runs across a similar issue: I'm using a UNIQUE field, and I'll be using a PHP hash function to generate the hashes; if the insert comes back with an error, I'll try again.
Hopefully, because of the low likelihood of conflict, it won't get slow.
You could also check the MySQL functions UUID() and UUID_SHORT(). Those functions generate UUIDs that are globally unique by definition. You won't have to double-check if your PHP-generated hash string already exists.
I think in several cases these functions can also fit your project's requirements. :-)
If you already have the table filled with some content, you can alter it with the following (this uses a generated column, available from MySQL 5.7):
ALTER TABLE `page` ADD COLUMN `hash` char(64) AS (SHA2(`content`, 256)) AFTER `content`;
This adds the hash column right after the content one, and generates the hash for existing and new records alike, with no need to change your INSERT statements.
If you add a UNIQUE index to the column (after having removed duplicates), inserts will only succeed if the content is not already in the table. This prevents duplicates.
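The "prevent duplicates" part might then look like this (the index name is a placeholder):

ALTER TABLE `page` ADD UNIQUE INDEX `uq_hash` (`hash`);
-- with the unique index in place, duplicate content can be skipped silently:
INSERT IGNORE INTO `page` (`content`) VALUES ('some content');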
