I have a VARCHAR column for passwords in my table, and in some scripts I run queries like:
length(column_varchar) < 10
My question is: if I put an index on this column, will it help? Should I use a FULLTEXT index in this case? Or is no index needed at all?
Another question: do I need an index on every column that will be used in a WHERE clause?
Thanks in advance.
Indexes index content (field values), not the length of a field, so no index can help with the above query. (N.B. you could have a separate field that stores the content length and index that field.) Also, passwords should be stored in a hashed format, so all password lengths should be the same, or at least length should not be a selection criterion.
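A minimal sketch of that idea, assuming MySQL 5.7+ and a hypothetical users table: a stored generated column keeps the length in sync and can be indexed.

ALTER TABLE users
  ADD COLUMN password_length INT AS (CHAR_LENGTH(password)) STORED,
  ADD INDEX idx_password_length (password_length);

-- the range predicate can now use the index
SELECT * FROM users WHERE password_length < 10;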
No, you should not index all columns that appear in a WHERE clause. Selecting the optimal index structure is a complicated and very broad topic. Always consider the following points when deciding which fields (or combinations of fields) to index:
Indexes speed up selects, but slow down data modification, since the index has to be updated as well, not just the column's value.
As a rule, MySQL uses only one index per table in a query.
MySQL uses the selectivity of the indexes to determine which one to use. A field that can have only two values (yes/no, true/false) is not selective enough, so do not bother indexing it.
Always use the EXPLAIN command to check which indexes your queries use.
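For example (users and email are made-up names):

-- check the "key" column of the output; NULL means no index is used
EXPLAIN SELECT * FROM users WHERE email = 'someone@example.com';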
You've got two questions here; in general you should split questions up.
Anyway, the first: "Will indexing a column help where you are testing its length?"
No, it won't. The only way to improve performance here would be to have an additional column that holds the length of the value in column_varchar and index that.
You wrote in the comments that you are storing hashes, so the lengths will all be the same. I have to guess that some passwords are NULL and therefore not hashed, or that you are migrating from unhashed to hashed passwords.
The second question: should you index all fields in a WHERE clause? This is not an automatic yes, which is why there are books written about query optimisation.
It depends on how much benefit you will get from the index, and that depends on the nature of the data.
The main trade off is between insert speed and query speed. Indexes slow inserts and speed up queries.
The next thing to consider is selectivity. If the column you are indexing can take only three distinct values, for example, the index will rarely narrow the results enough to be worth its maintenance cost.
In this specific case, you have evenly distributed data (because it is hashed), you have great selectivity (MD5 has few collisions), and you expect to query mostly by a single term, so you should definitely index this column.
I have an InnoDB table with a VARCHAR column that contains tens of thousands of instances of the same text. Is there a way to compact it on the fly in order to save space? Is some kind of INDEX enough?
Can't InnoDB see that the values are the same, and use less space by internally assigning them some ID or whatever?
If the task is as simple as it seems, then what you are looking for is normalisation.
In simple terms, make this column hold foreign keys to another table that stores the actual values. Store new values in the other table; when a value already exists, you do not need another entry for it. Form this relation between the tables, and a huge amount of space will be saved in your original table.
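A rough sketch of that normalisation, with invented table and column names:

-- lookup table that stores each distinct text value exactly once
CREATE TABLE text_values (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  value VARCHAR(255) NOT NULL,
  UNIQUE KEY uq_value (value)
) ENGINE=InnoDB;

-- the original table now holds only a small integer reference
ALTER TABLE my_table
  ADD COLUMN value_id INT UNSIGNED NOT NULL,
  ADD CONSTRAINT fk_value FOREIGN KEY (value_id) REFERENCES text_values (id);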
I suggest you read up on redundancy and normalisation.
Hope it solves your problem.
You can use the MySQL ENUM data type. It stores the values as indexes, but on SELECT you see the text value.
Here is the documentation:
http://dev.mysql.com/doc/refman/5.7/en/enum.html
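For example (table and values are placeholders):

-- each row stores a small integer internally, but SELECT returns the text
CREATE TABLE articles (
  id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  status ENUM('draft', 'published', 'archived') NOT NULL DEFAULT 'draft'
) ENGINE=InnoDB;

INSERT INTO articles (status) VALUES ('published');
SELECT status FROM articles; -- returns 'published', not 2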
The downside is that not all databases support the ENUM type, so it may become a problem if you decide to switch databases some day.
There are also some other limitations, pointed out here:
http://dev.mysql.com/doc/refman/5.7/en/enum.html#enum-limits
I currently have a database structure for dynamic forms as such:
grants_app_id | user_id | field_name | field_value
5             | 42434   | full_name  | John Doe
5             | 42434   | title      | Programmer
5             | 42434   | email      | example@example.com
I found this very difficult to manage, and it fills the database with rows very quickly. My field_names vary, so a single form can take up to 78 rows, which proved very costly when updating the field_values or simply searching them. I would like to combine the rows and use either JSON or PHP serialize to greatly reduce the impact on the database. Does anyone have advice on how I should approach this? Thank you!
This would be the expected output:
grants_app_id | user_id | data
5             | 42434   | {"full_name":"John Doe", "title":"Programmer", "email":"example@example.com"}
It seems you don't have a simple primary key in those rows.
Speeding up the current solution:
create an index for (grants_app_id, user_id)
add an auto-incrementing primary key
switch from field_name to field_id
The index will make retrieving full forms a lot more fun (while costing a bit of extra time on insert).
The primary key allows you to update a row by specifying a single value backed by a unique index, which should generally be really fast.
You probably already have some definition of the fields. Add integer IDs and use them to speed up the process, since less data is stored, compared and indexed.
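A sketch of those three changes in one statement; the table name grants_app_fields and the index name are my own placeholders:

ALTER TABLE grants_app_fields
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST,
  ADD COLUMN field_id INT UNSIGNED NULL,
  ADD INDEX idx_app_user (grants_app_id, user_id);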
Switching to a JSON-Encoded variant
Converting arrays to JSON and back can be done by using json_encode and json_decode since PHP 5.2.
How can you switch to JSON?
Probably the best way would be to use a PHP script (or similar) to retrieve all data from the old table, group it correctly, and insert it into a fresh table. Afterwards you can switch the table names, etc. This is an offline approach.
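If you are on MySQL 5.7.22 or newer, the grouping step of that conversion can even be done in SQL with JSON_OBJECTAGG (the old table's name is again a placeholder):

SELECT grants_app_id,
       user_id,
       JSON_OBJECTAGG(field_name, field_value) AS data
FROM grants_app_fields
GROUP BY grants_app_id, user_id;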
An alternative would be to add a new column and indicate with field_name = NULL that the new column contains the data. Afterwards you are free to convert the data at any time, or to store only new data as JSON.
Use JSON?
While it is certainly tempting to have all the data in one row, there are some things to remember:
with all fields stored in a single text field, searching for a value inside one field may become a two-phase approach, since a % inside a LIKE can match across other fields' values. Also, LIKE '%field:value%' is not easily optimised by indexing the column.
changing a single field means rewriting all stored fields. As long as you are sure only one process changes the data at any given time this is OK; otherwise there tend to be more problems.
the JSON column needs to be big enough to hold field names + values + separators, which can be a lot. Also, if you miscalculate and a long value exceeds the column size, the value is truncated and you risk losing all information on all fields after the long one.
So in your case, even with 78 different fields, it may still be better to have one row per form, user and field. (It may even turn out that JSON is more practical for forms with few fields.)
As explained in this question, you have to remember that JSON is just another piece of text to MySQL.
I have a MySQL/PHP performance related question.
I need to store a list of indices associated with each record in a table. Each list contains 1000 indices. I need to be able to quickly access any index value in the list associated with a given record. I am not sure of the best way to go. I've thought of the following approaches and would like your input on them:
Store the list in a string as comma-separated values or as JSON. Probably terrible performance, since I need to extract the whole list out of the DB into PHP only to retrieve a single value. Parsing the string won't exactly be fast either... I could keep a number of expanded lists in a least-recently-used cache on the PHP side to reduce load.
Make a list table with 1001 columns that stores the list and its primary key. I'm not sure how costly this is in terms of storage. It also feels like abusing the system. And then, what if I need to store 100000 indices?
Only store the name of a binary file containing my indices in SQL, and perform an fopen(); fseek(); fread(); fclose() cycle for each access? Not sure how the filesystem cache will react to that. If it goes badly, there are many solutions available to address the issues... but that sounds a bit overkill, no?
What do you think of that?
What about a good old one-to-many relationship?
records
-------
id int
record ...
indices
-------
record_id int
index varchar
Then:
SELECT *
FROM records
LEFT JOIN indices
ON records.id = indices.record_id
WHERE indices.index = 'foo'
The standard solution is to create another table, with one row per (record, index) pair, and add a MySQL index to allow fast searches:
CREATE TABLE IF NOT EXISTS `table_list` (
`IDrecord` int(11) NOT NULL,
`item` int(11) NOT NULL,
KEY `IDrecord` (`IDrecord`)
)
Change the item's type according to your needs - I used int in my example.
The most logical solution would be to put each value in its own row. Adding a MySQL index on that column will enable the DBMS to quickly locate a value, and should improve performance.
The reasons for not going with your other options are as follows:
Option 1
Storing multiple values in one MySQL cell is a violation of the first normal form of database normalisation. You can read up on it here.
Option 3
This relies heavily on files outside the database. You want to keep your data storage as localised as possible, to make it easier to maintain in the future.
I am quite new to the MySQL / phpMyAdmin environment, and I would like some help with a few areas:
1. I need a text field that should hold up to around 500 characters.
Does that have to be a TEXT field? Does the application have to be responsible for enforcing the length?
2. Indexes. I understand that when I mark a field as indexed, that field gets a pointer table, and any WHERE clause that includes it is optimised to search by that field (log n complexity). But what happens if I mark a field as indexed after the fact, say after the table already has some rows in it? Can I issue a command like "walk through the whole table and index that field"?
3. When I mark fields as indexed, I sometimes see them in phpMyAdmin as having a keyname. When I write PHP that accesses the table by the indexed field, do I have to make any extra effort to use the keyname shown in the "structure" view, or is the keyname used behind the scenes and I shouldn't care about it at all?
4. I sometimes get keynames referencing two or more fields together; the fields show one on top of the other. I don't know how that happened, but I need them to index only one field. What is going on?
5. I use UTF-8 values in my DB. When I created it, I think I marked it as utf8_unicode_ci, and some fields are marked as utf8_general_ci. Does that matter? Can I go back and change the whole DB definition to utf8_general_ci?
I think that was quite a bit,
I thank you in advance!
Ted
First, be aware that this is not per se about phpMyAdmin, but more about MySQL / databases in general.
1)
An index means that you make a list (most of the time a tree) of the values that are present, so you can easily find the rows with given values. This tree can be built just as easily after you insert values as before. Mind you, this means all the "add to index" work is done at once, so it is not something you want to do on a "live" table with loads of entries, but you can add an index whenever you want. Just add the index and it will be built, whether the table is empty or already "used".
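For example (table and column names are just placeholders):

-- builds the index over all existing rows; new rows are indexed automatically
ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);
-- CREATE INDEX idx_customer_id ON orders (customer_id); is equivalent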
2)
I don't know exactly what you mean by this. Indexes have a name; it doesn't really matter what it is. A (primary) key is an index, but not all indexes are keys.
3)
You don't need to "force" MySQL to use a key; the optimizer knows best how and when to use keys. If your keys are correct they are used; if they are not correct they can't be used, so you can't force it. In other words: don't think about it :)
4)
phpMyAdmin makes a composite key if you mark two fields as a key at the same time. This is annoying and can be wrong. If you search on both fields at once, the composite key can be used, but if you search on only one of them it often can't. Just mark them as keys one at a time, or run the correct SQL command manually.
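So instead of one composite key, something like this (names invented) gives you two independent single-column indexes:

ALTER TABLE customers
  ADD INDEX idx_last_name (last_name),
  ADD INDEX idx_city (city);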
5)
You can change whatever you like, but I don't know what will happen to your values. Better check manually :)
If you need a field to contain 500 characters, you can do that with VARCHAR; just set its length to 500.
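For example (table and column names are illustrative); MySQL itself enforces the limit, rejecting longer values in strict mode and truncating them otherwise:

CREATE TABLE notes (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  body VARCHAR(500) NOT NULL
) ENGINE=InnoDB;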
You don't index field by field, you index a whole column, so it doesn't matter whether the table already has data in it; all the rows will be indexed.
Not a question
The indexes will be used whenever they can be. You only need to make sure the columns you have indexed appear in the WHERE clause of your query. Read about it here.
You can add as many columns as you wish to an index. For example, if you add the columns "foo", "bar" and "ming" to an index, your database will be optimised for searches using those columns in the WHERE clause, in that order. Again, the link above explains it all.
I don't know for certain, but if you use only UTF-8 values in the database it shouldn't matter. You can change it later though, as explained in this Stack Overflow question: How to convert an entire MySQL database characterset and collation to UTF-8?
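For reference, the conversion discussed there boils down to statements like the following (database and table names are placeholders); check your data afterwards:

-- change the default for new tables/columns in the database
ALTER DATABASE mydb CHARACTER SET utf8 COLLATE utf8_general_ci;
-- convert an existing table, including its columns
ALTER TABLE mytable CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;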
I would recommend swapping phpMyAdmin for HeidiSQL, though. HeidiSQL is a Windows client that manages all your MySQL servers. It has lots of handy features, like copying a table or database directly from one MySQL server to another. Try it out (it's free).
So, imagine a MySQL table with a few simple columns, an auto-increment, and a hash (VARCHAR, UNIQUE).
Is it possible to give MySQL a query that will insert a row and generate a unique hash for it, without multiple queries?
Currently, the only way I can think of to achieve this is with a while loop, which I worry would become more and more processor-intensive the more entries there are in the DB.
Here's some pseudo-PHP; obviously untested, but it gets the general idea across:
while (!query("INSERT INTO my_table (hash) VALUES ('" . generate_hash() . "');")) {
    // found a conflict (duplicate hash), try again
}
In the above example, the hash column would be UNIQUE, so the query would fail on a conflict. The problem is, say there are 500,000 entries in the DB and I'm working off a base36 hash generator with 4 characters: the likelihood of a conflict would be almost 1 in 3, and I definitely can't be running 160,000 queries. In fact, anything more than 5 I would consider unacceptable.
So, can I do this with pure SQL? I would need to generate a base62, 6-character string (like "j8Du7X": chars a-z, A-Z and 0-9), and either update the last_insert_id with it or, even better, generate it during the insert.
I can handle basic CRUD with MySQL, but even JOINs are a little outside of my MySQL comfort zone, so excuse my ignorance if this is cake.
Any ideas? I'd prefer to use either pure MySQL or PHP & MySQL, but hell, if another language can get this done cleanly, I'd build a script and AJAX it too.
Thanks!
This is our approach for a similar project, where we wanted to generate unique coupon codes.
First, we used an AUTO_INCREMENT primary key. This ensures uniqueness and query speed.
Then, we created a base24 numbering system using A, B, C, etc., leaving out O and I because someone might mistake them for 0 or 1.
Then we converted the auto-increment integer to our base24 number. For example, 0=A, 1=B, 28=BE, 1458965=EKNYF. We used base24 because long base10 numbers need fewer characters in base24.
Then we created a separate column in our table, coupon_code. This was not indexed.
We took the base24 value and inserted 3 random digits, or the letters I and O (which were not used in our base24), into it. For example, EKNYF could turn into 1EKON6F or EK2NY3F9. This was our coupon code, and we inserted it into our coupon_code column. It's unique and random.
So, when the user uses code EK2NY3F9, all we have to do is remove the inserted characters (2, 3 and 9) to get EKNYF, which we convert back to 1458965. We then select the row with primary key 1458965 and compare its coupon_code column with EK2NY3F9.
I hope this helps.
If your heart is set on using base-36, 4-character hashes (the hash space is only 1,679,616), you could pre-generate a table of hashes that aren't already in the other table. Then finding a unique hash would be as simple as moving it from the "unused" table to the "used" table, which is O(1).
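A rough sketch of that pool idea (table names are mine); claiming a hash should happen inside a transaction so two clients can't grab the same one:

CREATE TABLE unused_hashes (
  hash CHAR(4) NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

START TRANSACTION;
SELECT hash INTO @h FROM unused_hashes LIMIT 1 FOR UPDATE;
INSERT INTO my_table (hash) VALUES (@h);
DELETE FROM unused_hashes WHERE hash = @h;
COMMIT;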
If your table could conceivably be 1/3 full, you might want to consider expanding your hash space, since it will probably fill up in your lifetime. Once the space is full, you will no longer be able to find unique hashes no matter what algorithm you use.
What is this hash a hash of? It seems like you just want a randomly generated unique VARCHAR column? What's wrong with the auto increment?
Anyway, you should just use a bigger hash: find an MD5 function (if you're actually hashing something) or a UUID generator with more than 4 characters. And yes, you could use a while loop, but just generate a big enough value that conflicts are incredibly unlikely.
As others have suggested, what's wrong with an auto-increment field? If you want an alphanumeric value, you could simply convert the int to an alphanumeric string in base 36. This could be implemented in almost any language.
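In MySQL itself the conversion can be done with CONV(), which handles bases up to 36 (the table name is a placeholder):

-- convert the auto-increment id to a base-36 string (digits 0-9 and A-Z)
SELECT id, CONV(id, 10, 36) AS short_code FROM my_table;
-- e.g. CONV(1679615, 10, 36) = 'ZZZZ'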
Going with zneak's comment, why don't you use an auto-increment column? Save the hash in another (non-unique) field and concatenate the id to it (dynamically), so you give the user [hash][id]. You can parse it back out in pure SQL using the substring functions.
Since you have to have the hash, the user can't look at other records by incrementing the id.
So, just in case someone runs across a similar issue: I'm using a UNIQUE field and a PHP hash function to insert the hashes; if the insert comes back with an error, I'll try again.
Hopefully, because of the low likelihood of conflicts, it won't get slow.
You could also check out the MySQL functions UUID() and UUID_SHORT(). These functions generate UUIDs that are globally unique by definition, so you won't have to double-check whether your PHP-generated hash string already exists.
I think in several cases these functions can also fit your project's requirements. :-)
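For example (the table and column in the INSERT are placeholders):

-- UUID() returns a 36-character string; UUID_SHORT() returns a 64-bit integer
SELECT UUID(), UUID_SHORT();
INSERT INTO my_table (hash) VALUES (UUID());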
If the table is already filled with content, you can alter it with the following:
ALTER TABLE `page` ADD COLUMN `hash` char(64) AS (SHA2(`content`, 256)) AFTER `content`
This will add the hash column right after the content column and generate hashes for existing records as well as new ones, without any need to change your INSERT statements.
If you add a UNIQUE index to the column (after removing duplicates), your inserts will only succeed if the content is not already in the table. This will prevent duplicates.
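For instance (the index name is arbitrary):

-- fails if two rows currently share the same hash, so de-duplicate first
ALTER TABLE `page` ADD UNIQUE INDEX `uq_page_hash` (`hash`);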