Which collation is preferred when sorting is not required? - php

I have a table e.g.:
create table T1(Id int primary key auto_increment, Value text)
Value is used to store "textual" data but rows are never sorted according to the Value column.
Which collation should be preferred for Value?
Would utf8mb4_bin be a better choice or utf8mb4_general_ci?

That looks fine. I certainly wouldn't use a case-insensitive collation if it wasn't needed (as per your case) since it may result in slower queries (though I doubt it would be used for non-textual fields anyway).
You should keep in mind, however, that collation is not just for sorting, but for selection as well (e.g., the where clause). If you're only going to retrieve rows based on columns other than Value, that shouldn't matter.
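To make the difference concrete, here is a small sketch (assuming MySQL and the T1 table from the question):
-- If Value is declared with utf8mb4_general_ci, this matches 'abc', 'ABC', 'Abc', ...
SELECT Id FROM T1 WHERE Value = 'abc';
-- Forcing a binary comparison (or declaring the column with utf8mb4_bin)
-- matches only the exact byte sequence 'abc'
SELECT Id FROM T1 WHERE Value COLLATE utf8mb4_bin = 'abc';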
In any case, I'm actually not a big fan of case-insensitive collations being done by the database itself, since I'd rather keep the database running as blindingly fast as possible, and use my own methods to handle case issues (such as an extra indexed column holding lower-cased last names, and updated with insert/update triggers to maintain consistency with the rest of the row).
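A rough sketch of that extra-column idea (assuming MySQL; the customers table and column names are made up for illustration):
ALTER TABLE customers
  ADD COLUMN last_name_lower VARCHAR(100),
  ADD INDEX idx_last_name_lower (last_name_lower);

CREATE TRIGGER customers_bi BEFORE INSERT ON customers
  FOR EACH ROW SET NEW.last_name_lower = LOWER(NEW.last_name);

CREATE TRIGGER customers_bu BEFORE UPDATE ON customers
  FOR EACH ROW SET NEW.last_name_lower = LOWER(NEW.last_name);

-- Case-insensitive lookups then hit the index directly:
SELECT * FROM customers WHERE last_name_lower = LOWER('McDonald');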
Basically, I'm a Luddite :-) but nobody ever complains about how big their databases are, only about how slow.

Is there a way to compress a MySQL column where values repeat very often?

I have an InnoDB table with a VARCHAR column, with tens of thousands of instances of the same text in it. Is there a way to compact it on-the-fly in order to save space? Is some kind of INDEX enough?
Can't InnoDB see that the values are the same, and use less space by internally assigning them some ID or whatever?
If the task is as simple as it seems, then what you are looking for is normalisation.
In simple terms, what you have to do is make this column hold foreign keys into another table, which stores the actual values. New values get a new entry in that other table; when a value already exists you simply reference the existing entry instead of storing the text again. Form this relation between the tables and a huge amount of space will be saved in your original table.
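A minimal sketch of what that could look like (table and column names are only illustrative):
-- Lookup table holding each distinct text value exactly once
CREATE TABLE text_values (
  id INT PRIMARY KEY AUTO_INCREMENT,
  value VARCHAR(255) NOT NULL,
  UNIQUE KEY uq_value (value)
);

-- The original table now stores a small integer instead of the repeated text
CREATE TABLE main_table (
  id INT PRIMARY KEY AUTO_INCREMENT,
  value_id INT NOT NULL,
  FOREIGN KEY (value_id) REFERENCES text_values (id)
);

-- Reading the text back is a simple JOIN
SELECT m.id, v.value
FROM main_table m
JOIN text_values v ON v.id = m.value_id;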
I suggest you read up on redundancy and normalisation.
Hope it solves your problem.
You can use the MySQL ENUM data type. It stores the values as indexes, but upon select you see the text value.
Here is the documentation:
http://dev.mysql.com/doc/refman/5.7/en/enum.html
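For example (a sketch; the table and values are made up):
CREATE TABLE tickets (
  id INT PRIMARY KEY AUTO_INCREMENT,
  status ENUM('open', 'pending', 'closed') NOT NULL DEFAULT 'open'
);

-- Internally MySQL stores a 1- or 2-byte index, but SELECT returns the text:
INSERT INTO tickets (status) VALUES ('pending');
SELECT status FROM tickets;   -- 'pending'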
Cons are that not all databases support the ENUM type, so you may find that to be a problem if some day you decide to switch databases.
There are also some other limitations pointed out here:
http://dev.mysql.com/doc/refman/5.7/en/enum.html#enum-limits

Combine Multiple Rows in MySQL into JSON or Serialize

I currently have a database structure for dynamic forms as such:
grants_app_id   user_id   field_name   field_value
5               42434     full_name    John Doe
5               42434     title        Programmer
5               42434     email        example#example.com
I found this very difficult to manage, and it filled up the number of rows in the database very quickly. I have different field_names that can add up to 78 rows per form, so it proved to be very costly when making updates to the field_values or simply searching them. I would like to combine the rows and use either JSON or PHP serialize to greatly reduce the impact on the database. Does anyone have any advice on how I should approach this? Thank you!
This would be the expected output:
grants_app_id   user_id   data
5               42434     {"full_name":"John Doe", "title":"Programmer", "email":"example#example.com"}
It seems you don't have a simple primary key in those rows.
Speeding up the current solution:
create an index for (grants_app_id, user_id)
add an auto-incrementing primary key
switch from field_name to field_id
The index will make retrieving full forms a lot more fun (while taking a bit of extra time on insert).
The primary key allows you to update a row by specifying a single value backed by a unique index, which should generally be really fast.
You probably already have some definition of fields. Add integer-IDs and use them to speed up the process as less data is stored, compared, indexed, ...
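A sketch of those three steps (assuming the current table is called form_values; all names are illustrative):
ALTER TABLE form_values
  ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ADD INDEX idx_app_user (grants_app_id, user_id);

-- Small lookup table for the field definitions
CREATE TABLE fields (
  field_id SMALLINT PRIMARY KEY AUTO_INCREMENT,
  field_name VARCHAR(64) NOT NULL UNIQUE
);
-- form_values would then store field_id instead of the repeated field_name text.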
Switching to a JSON-Encoded variant
Converting arrays to JSON and back can be done by using json_encode and json_decode since PHP 5.2.
How can you switch to JSON?
Possibly the current best way would be to use a PHP-Script (or similar) to retrieve all data from the old table, group it correctly and insert it into a fresh table. Afterwards you may switch names, ... This is an offline approach.
An alternative would be to add a new column and indicate by field_name=NULL that the new column contains the data. Afterwards you are free to convert data at any time or store only new data as JSON.
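If the server is MySQL 5.7.22 or newer, the grouping for the offline approach could even be done in SQL, e.g. (a sketch; grants_app_fields and grants_app_json are made-up names for the old and new tables):
INSERT INTO grants_app_json (grants_app_id, user_id, data)
SELECT grants_app_id,
       user_id,
       JSON_OBJECTAGG(field_name, field_value)
FROM grants_app_fields
GROUP BY grants_app_id, user_id;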
Use JSON?
While it is certainly tempting to have all data in one row, there are some things to remember:
with all fields stored in a single text field, searching for a value inside a field can become a two-phase approach, as a % inside any LIKE can skip into other fields' values. Also, LIKE '%field:value%' is not easily optimized by indexing the column.
changing a single field means rewriting all stored fields. As long as you are sure only one process changes the data at any given time this is OK; otherwise there tend to be more problems.
the JSON column needs to be big enough to hold field names + values + separators, which can be a lot. Also, if you miscalculate and a long value overflows the column, it gets truncated, with the risk of losing all information on all fields after the long value.
So in your case, even with 78 different fields it may still be better to have a row per form, user and field. (It may even turn out that JSON is more practical for forms with few fields.)
As explained in this question you have to remember that JSON is only some other text to MySQL.

Mysql phpMyAdmin few questions:

I am quite new to the MySQL / phpMyAdmin environment, and I would like to ask a few things:
1. I need a text field that should hold up to around 500 characters. Does that have to be a "TEXT" field? Is the application responsible for enforcing the length?
2. Indexes. I understand that when I mark a field as "indexed", that field gets a pointer table and any WHERE clause involving it is optimized (log n complexity). But what happens if I mark a field as indexed after the fact, say after the table already has some rows in it? Can I issue a command like "walk through all of that table and index that field"?
3. When I mark fields as indexed, I sometimes see them in phpMyAdmin as having a keyname. When I write PHP that accesses the table by the indexed field, do I have to make an extra effort to use the keyname shown in the "structure" view, or is the keyname used behind the scenes and I should not care about it whatsoever?
4. I sometimes get keynames referencing two or more fields together; the fields show one on top of the other. I don't know how it happened, but I need them to index only one field. What is going on?
5. I use UTF-8 values in my db. When I created it, I think I marked it as utf8_unicode_ci, and some fields are marked as utf8_general_ci. Does it matter? Can I go back and change the whole DB definition to utf8_general_ci?
I think that was quite a bit,
I thank you in advance!
Ted
First, be aware that this is not per se something about phpMyAdmin, but more about MySQL / databases in general.
1)
An index means that you make a list (most of the time a tree) of the values that are present. This way you can easily find the row with that/those values. This tree can be made just as easily after you insert values as before. Mind you, this means that all the "add to index" work is done in one go, so it's not something you want to do on a "live" table with loads of entries. But you can add an index whenever you want: just add it and the index will be built, whether the table is empty or already 'used' (see the short example at the end of this answer).
2)
I don't know what you mean by this. Indexes have a name, it doesn't really matter what it is. A (primary) key is an index, but not all indexes are keys.
3)
You don't need to 'force' mysql to use a key, the optimizer knows best how and when to use keys. If your keys are correct they are used, if they are not correct they can't be used so you can't force it: in other words: don't think about it :)
4)
phpMyAdmin makes a composite key if you mark 2 fields as keys at the same time. This is annoying and can be wrong. If you search for those 2 things at once you can use the composite key, but if you search for only one of them (other than the leftmost) you can't. Just mark them as keys one at a time, or use the correct SQL command manually.
5)
You can change whatever you like, but I don't know what will happen with your values. Better check manually :)
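To make 1) concrete: adding an index after the fact is a single statement (a sketch with made-up names; assuming MySQL):
ALTER TABLE my_table ADD INDEX idx_my_field (my_field);
-- or, equivalently:
CREATE INDEX idx_my_field ON my_table (my_field);
-- MySQL builds the index from the rows that are already there;
-- on a large, busy table this can take a while.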
If you need a field to contain 500 characters, you can do that with VARCHAR. Just set its length to 500.
You don't index field by field, you index a whole column. So it doesn't matter if the table has data in it. All the rows will be indexed.
Not a question
The indexes will be used whenever they can. You only need to worry about using the same columns that you have indexed in the WHERE section of your query. Read about it here
You can add as many columns as you wish in an index. For example, if you add columns "foo", "bar" and "ming" to an index, your database will be speed optimized for searches using those columns in the WHERE clause, in that order. Again, the link above explains it all.
I don't know for certain, but I'm fairly sure that if you use only UTF-8 values in the database, it won't matter. You can change this later though, as explained in this Stackoverflow question: How to convert an entire MySQL database characterset and collation to UTF-8?
I would recommend you scrap PHPMyAdmin for HeidiSQL though. HeidiSQL is a windows client that manages all your MySQL servers. It has lots of cool functions, like copying a table or database directly from one MySQL server to another. Try it out (it's free)

Will a MySQL database result be slowed down in relation to the number of columns in a table?

Using PHP, I am building an application that is MySQL database resource heavy, but I also need its data to be very flexible. Currently there are a number of tables which have an array of different columns (including some text, longtext, int, etc.), and in the future I would like to expand the number of columns of these tables whenever new data-groups are required.
My question is, if I have a table with, say, 10 columns, and I expand this to 40 columns in the future, would a SQL query (via PHP) be slowed down considerably?
Assuming the initial, small query only looks up the original 10 columns and is not a SELECT * query, I would like to know whether more resources or processing are used simply because the source table is now much larger.
Also, will the database in general run slower or be much larger due to many columns now constantly remaining as NULL values (eg, whenever a new entry that only requires the first 10 columns is inserted)?
MyISAM and InnoDB behave differently in this regard, for various reasons.
For instance, InnoDB will partition disk space for each column on disk regardless of whether it has data in it, while MyISAM will compress the tables on disk. In a case where there are large amounts of empty columns, InnoDB will be wasting a lot of space. On the other hand, InnoDB does row-level locking, which means that (with caveats) concurrent read / writes to the same table will perform better (MyISAM does a table-level lock on write).
Generally speaking, it's probably not a good idea to have many columns in one table, particularly for volatility reasons. For instance, in InnoDB (possibly MyISAM also?), re-arranging columns or changing the type of a column (e.g. VARCHAR(128) -> VARCHAR(255)) in the middle of a table requires that all data in columns to the right be moved around on disk to make (or remove) space for the altered column.
With respect to your overall database design, it's best to aim for as many columns as possible to be not null, which saves space (you don't need the null flag on the column, and you don't store empty data) and also increases query and index performance. If many records will have a particular column set to null, you should probably move it to a foreign key relationship and use a JOIN. That way disk space and index overhead is only incurred for records that are actually holding information.
Likely, the best solution would be to create a new table with the additional fields and JOIN the tables when necessary. The original table remains unchanged, keeping its speed, but you can still get to the extra fields.
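A rough sketch of that approach (the item / item_extra names are only illustrative):
CREATE TABLE item_extra (
  item_id INT PRIMARY KEY,
  long_description TEXT,
  notes TEXT,
  FOREIGN KEY (item_id) REFERENCES item (id)
);

-- Only rows that actually have extra data get a row in item_extra;
-- the LEFT JOIN returns NULL for the rest.
SELECT i.*, e.long_description, e.notes
FROM item i
LEFT JOIN item_extra e ON e.item_id = i.id;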
Optimization is not a trivial question; nothing can be predicted.
In general the short answer is: yes, it will be slower (because the DBMS at least needs to read more data from disk and send it, obviously).
But how much slower is very dependent on each particular case. You might not even see the difference, or it could be 10x slower.
In all likelihood, no it won't be slowed down considerably.
However, a better question to ask is: Which method of adding more fields results in a more elegant, understandable, maintainable, cost effective solution?
Usually the answer is "It depends." It depends on how the data is accessed, how the requirements will change, how the data is updated, and how fast the tables grow.
You can divide one master table into multiple transaction tables; you will get much faster results than you are getting now. Also make the primary key a UNIQUE KEY in all the transaction tables as well as the master table. It really helps make your queries faster.
Thanks.

Key problem: Which key strategy should I use in my database?

Problem: When I use an auto-incrementing primary key in my database, this happens all the time:
I want to store an Order with 10 Items. The ordered Items belong to the Order. So I store the order, ask the database for the last inserted id (which is dangerous when it comes to concurrency, right?), and then store the 10 Items with the foreign key (order_id).
So I always have to do:
INSERT ...
last_inserted_id = db.lastInsertId();
INSERT ...
INSERT ...
INSERT ...
and I believe this prevents me from using transactions in almost all INSERT cases where I need a foreign key.
So... here are some solutions, and I don't know if they're really good:
A) Don't use auto_increment keys! Use a key table?
Key Table would have two fields: table_name, next_key. Every time I need a key for a table to insert a new dataset, I first ask for the next_key by calling a special static KeyGenerator class method. This does a SELECT and an UPDATE, if possible in one transaction (would that work?). Of course I would request that for every affected table. Next, I can INSERT my entire object graph in one transaction without playing ping-pong with the database, because I already know the keys in advance.
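A sketch of what I have in mind (assuming MySQL/InnoDB; names are made up). The SELECT ... FOR UPDATE would lock the row so two concurrent requests can't hand out the same key:
CREATE TABLE key_table (
  table_name VARCHAR(64) PRIMARY KEY,
  next_key   INT NOT NULL
);

START TRANSACTION;
SELECT next_key FROM key_table WHERE table_name = 'orders' FOR UPDATE;
UPDATE key_table SET next_key = next_key + 1 WHERE table_name = 'orders';
COMMIT;
-- The value read above can then be used as the Order id and as the
-- order_id on all Item rows, since it is known before the INSERTs.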
B) Using GUUID / UUID algorithm for keys?
These are supposed to be really unique worldwide, and they're LARGE. I mean ... L_A_R_G_E. So a large amount of memory would go into these gigantic keys. Indexing will be hard, right? And data retrieval will be a pain for the database - at least I guess - integer keys are much faster to handle. On the other hand, these also provide some security: visitors can't iterate over all orders or all users or all pictures any more by just incrementing the id parameter.
C) Stick with auto_incremented keys?
OK, if so, what about transactions like the one described in the example above? How can I solve that? Maybe by inserting a ghost row first and then doing a transaction with one UPDATE + n INSERTs?
D) What else?
When storing orders, you need transactions to prevent situations where only half your products are added to the database.
Depending on your database and your connector, the value returned by the last-insert-id function might be transaction-independent. For instance, with MySQL, mysql_insert_id returns the identifier for the last query from that particular client (without being affected by what other clients are doing concurrently).
Which database are you using?
Yes, typically inserting a record and then trying to select it again to find the auto-generated key is bad, especially if you are using a naive select max(id) from table query. This is because, as soon as two threads are creating records, max(id) may not actually return the last id your current thread used.
One way to avoid this is to create a sequence in the database. From your code you select sequence.NextValue, then use that value to execute your inserts (or you can craft a more complex SQL statement that does the selection and the inserts in one go). Sequences are atomic / thread-safe.
In MySQL you can ask for the last inserted id from the execution results which I believe will always give you the correct answer.
Sql Server supports SCOPE_IDENTITY (Transact-SQL) which should take care of your transaction issue and concurrency issue.
I would say stick with auto_increment.
(Assuming you are using MySQL)
"ask the database for the last inserted id (which is dangerous when it comes to concurrency, right?)"
If you use MySQL's last_insert_id() function, you only see what happened in your session. So this is safe. You mention this:
db.last_insert_id()
I don't know what framework or language it is, but I would assume that it uses MySQL's last_insert_id() under the covers (if not, it is a pretty useless database abstraction framework).
" I believe this prevents me from using transactions in almost all INSERT cases w"
I don't see why. Please explain.
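For what it's worth, a minimal sketch of the pattern (assuming MySQL/InnoDB; the orders / order_items tables are made up) - the whole order can go into one transaction just fine:
START TRANSACTION;
INSERT INTO orders (customer_id) VALUES (42);
-- LAST_INSERT_ID() is scoped to this connection, so it is safe
-- even while other clients are inserting concurrently
SET @order_id = LAST_INSERT_ID();
INSERT INTO order_items (order_id, product_id) VALUES (@order_id, 1);
INSERT INTO order_items (order_id, product_id) VALUES (@order_id, 2);
COMMIT;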
D) Sequence: may not be available in your DBMS, but if it is, it solves your problem elegantly.
For Postgresql, have a look at Sequence Functions
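A short sketch of the sequence approach in PostgreSQL (table names are illustrative):
CREATE SEQUENCE order_id_seq;

BEGIN;
INSERT INTO orders (id, customer_id)
VALUES (nextval('order_id_seq'), 42);
-- currval() returns the value nextval() produced earlier in this
-- same session, so it is concurrency-safe
INSERT INTO order_items (order_id, product_id)
VALUES (currval('order_id_seq'), 1);
COMMIT;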
There is no final and general answer to this question.
Auto-incrementing columns are easy to use when you add new records. Using them as foreign keys within the same transaction is not so straightforward: you need database-specific commands to get the newly created key. This approach is common for certain databases, for instance SQL Server.
Sequences seem harder to use, because you need to get a key before you insert a row, but in the end they are easier to use as foreign keys. This approach is common for certain databases, for instance Oracle.
When you use Hibernate or NHibernate, it is discouraged to use auto incrementing keys, because some optimizations are not possible anymore. Using a hi-lo algorithm which uses an additional table is recommended.
Guids are strong, for instance when sharing data between different databases, systems, disconnected scenarios, import / export etc. In many databases, most of the tables contain only a few hundred records, so memory and performance are not such an issue. When using NHibernate, you can get a guid generator which produces sequential guids, because some databases perform better when keys are sequential.
