Why shouldn't commas be included in sql field names? - php

People keep telling me that spaces shouldn't be included in column names. I was just wondering, why is that? It is an issue I am having with a few database tables I am creating for school. The field names include Preble and Darke.
Instead, they need to be "Preble County (OH)" and "Darke County (OH)". If they were row names, I would just create an ID # column, and natural join that with a table displaying names that I want them to be ("Darke County (OH)" instead of "Darke").
However, I have no idea how I can go about changing these names since they are my field names. Can anyone help me out? Any help would be greatly appreciated.

tl;dr: Trying to include commas (or other special characters) generally indicates a fundamental flaw with the column name [2].
This is a bad database design. It is bad because it tries to encode information into a column name [1]. This is not the point of a column name! A column name is merely a friendly moniker for an element of a record/tuple - nothing more. Under Relational Algebra (RA), and thus SQL, it is the record/tuple that contains the information.
Besides just leading to a schema that is hard to deal with (extra quoting syntax) that requires hard-coding queries based on changing information (column names and multiplicity), it is also impossible to use with a number of RA techniques in a flexible manner. RA can only generally handle multiplicity across records - and, as discovered, this includes joins.
Instead, the schema should look similar to, say:
County   State  Other Columns
=======  =====  =============
Darke    OH     ..
Preble   OH     ..
Where the Key is, say, (State,County) and "Other Columns" are dependent upon the key. Of course the model should be correctly normalized in relation to all the other information that is captured.
Note that there is no information stored in the column names presented above: the names are merely friendly monikers representing the information stored in each column.
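As a minimal sketch of that layout, the following uses Python's built-in sqlite3 module as a stand-in for MySQL. The table name county_stats, the population column and its dummy values are assumptions made up for illustration. The display label is built in the query instead of being stored in a column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE county_stats (
        county     TEXT NOT NULL,
        state      TEXT NOT NULL,
        population INTEGER,           -- hypothetical stand-in for "Other Columns"
        PRIMARY KEY (state, county)
    )
    """
)
conn.executemany(
    "INSERT INTO county_stats (county, state, population) VALUES (?, ?, ?)",
    [("Darke", "OH", 1), ("Preble", "OH", 2)],  # dummy values, not real data
)

# Build the human-friendly name at query time from the stored values.
labels = [
    row[0]
    for row in conn.execute(
        "SELECT county || ' County (' || state || ')' "
        "FROM county_stats ORDER BY county"
    )
]
print(labels)
```

The || operator is the SQLite/ANSI spelling of string concatenation; in MySQL the same expression would normally be written with CONCAT().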
[1] Now a PIVOT transformation, which is primarily for human output/display, can be performed as needed. This is where a column in the output table (not schema table!) is generated per set of record values. However, this is a secondary issue and should not affect the primary schema.
If using SQL Server, one could first UNPIVOT the information-filled column names, perform the join and then re-PIVOT (given a known set of column names). However, I have no idea how this would be done in MySQL - a messy dynamic query, perhaps. In any case, this is an approach I would avoid.
[2] While special characters are not allowed in bare identifiers, it is possible to use such names when used in quoted identifiers - e.g. `Vancouver, WA` or, with ANSI quotes, "Vancouver, WA". However, keep reading the rest of this response which argues against using such identifiers.
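To make the quoting point concrete, here is a small sketch using Python's sqlite3 module (an assumption; the thread is about MySQL, where backticks play the same role by default). The table name demo and the stored value are made up. It only shows that the ANSI-quoted name is accepted, not that it is a good idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ANSI double quotes turn an otherwise illegal name into a valid identifier.
conn.execute('CREATE TABLE demo ("Vancouver, WA" INTEGER)')
conn.execute('INSERT INTO demo ("Vancouver, WA") VALUES (?)', (7,))

value = conn.execute('SELECT "Vancouver, WA" FROM demo').fetchone()[0]

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
column_names = [row[1] for row in conn.execute("PRAGMA table_info(demo)")]
print(value, column_names)
```

Every query that touches the column now has to repeat the quoting, which is the "extra quoting syntax" cost mentioned in the answer.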

Related

MySQL: enumeration tables - use only one or many?

Until now, I have been creating a separate enumeration table for each type, with columns ID and CODE (money_type, payment_type, shipping_type ...).
Is it better to keep every enumeration table separate, or to have only one table with columns ID, CODE, TYPE ... where TYPE would be "money", "payment", "shipping"?
In my system there will be at least 50 enumeration tables.
From a pure technical point of view, there is little against creating a giant lookup table for such purposes. However, there are some valid business reasons against doing so:
Security. You may not want the same user to be able to edit all lookup data. If you store all of them in a single table, then restricting access to certain records only can be difficult. Obviously, you can do that with views, but if you create as many views as lookup types, then what's the point of having a single table in the first place?
Configurability. In many cases these lookup tables may hold more data than just an id and a human readable description, and the additional data would be specific to that given parameter only. For example, on a tax code lookup table you may be able to specify that the given tax code is applicable to domestic transactions only, and not to cross-border ones (such as VAT). If you have a giant table holding all configuration data, then such customization is a lot more difficult.
Obviously, if the purpose of having these lookup tables is to provide a human readable description (e.g. a definitions table you can use for GUI), then you can have a single table for that. Otherwise I would go for the 1 lookup table per parameter approach, even if you need to have 50 lookup tables.
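A rough sketch of the "one lookup table per parameter" approach, using Python's sqlite3 module in place of MySQL. The table names, the domestic_only column and the code values are hypothetical; the point is only that tax_code can carry an attribute that would make no sense for payment_type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- A plain lookup: just an id and a code.
    CREATE TABLE payment_type (
        id   INTEGER PRIMARY KEY,
        code TEXT NOT NULL UNIQUE
    );

    -- A lookup that needs an attribute of its own.
    CREATE TABLE tax_code (
        id            INTEGER PRIMARY KEY,
        code          TEXT NOT NULL UNIQUE,
        domestic_only INTEGER NOT NULL DEFAULT 0  -- 1 = not valid cross-border
    );

    INSERT INTO payment_type (code) VALUES ('CARD'), ('CASH');
    INSERT INTO tax_code (code, domestic_only) VALUES ('VAT', 1), ('EXEMPT', 0);
    """
)

# Only tax codes that may be used on cross-border transactions.
cross_border_codes = [
    row[0]
    for row in conn.execute(
        "SELECT code FROM tax_code WHERE domestic_only = 0 ORDER BY code"
    )
]
print(cross_border_codes)
```

In a single giant lookup table, domestic_only would have to exist (and be NULL or meaningless) for every money, payment and shipping row as well.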

Is it necessary to validate column names when submitting an SQL Query?

In my SQL Queries I am submitting data from forms filled out by the user, and as shown here it is not possible to parameterize my column names with PDO. This is important because the column names in the query are inserted dynamically based on the field names in the form. I can rather easily validate the column names submitted in the $_POST array by simply pulling them out of the database and throwing out any that don't match. Is this a good thing to do to avoid SQL injection or is simply a waste of system resources (as it effectively doubles the execution of any request that relies on the Database)?
Is this a good thing to do to avoid SQL injection
No.
or is simply a waste of system resources
No.
It cannot be a waste as it's just a simple select from the system table.
But it can still be a sort of injection when a user isn't allowed to touch some fields. Say there is an (imaginary) field "user_role" filled in by the site admin; if a user is able to set it in the POST, they can alter their access privileges.
So, hardcoding (whitelisting) allowed fields is the only reliable way.
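A minimal sketch of such a hard-coded whitelist, written in Python rather than PHP. The users table, the allowed column names and the build_update helper are all hypothetical. Column names come only from the fixed set, while values are passed as bound parameters.

```python
# Columns the user is allowed to change; "user_role" is deliberately absent.
ALLOWED_COLUMNS = frozenset({"first_name", "last_name", "email"})


def build_update(form_fields, user_id):
    """Return (sql, params) for an UPDATE restricted to whitelisted columns."""
    rejected = sorted(set(form_fields) - ALLOWED_COLUMNS)
    if rejected:
        raise ValueError(f"columns not allowed: {rejected}")
    if not form_fields:
        raise ValueError("nothing to update")

    columns = sorted(form_fields)  # fixed order keeps the SQL predictable
    assignments = ", ".join(f"{column} = ?" for column in columns)
    sql = f"UPDATE users SET {assignments} WHERE id = ?"
    params = [form_fields[column] for column in columns] + [user_id]
    return sql, params


sql, params = build_update({"email": "a@example.com", "first_name": "Ann"}, 5)
print(sql, params)
```

Because the identifiers that reach the SQL string are members of ALLOWED_COLUMNS, a posted "user_role" field is rejected before any query is built.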
as it effectively doubles the execution of any request that relies on the Database
Man. Databases are intended to be queried. It's their only purpose. A database that cannot sustain a simple select query is nonsense. Queries are different. An insert is way heavier than 10 selects. You have to distinguish queries by quality, not quantity.
the column names in the query are inserted dynamically based on the field names in the form.
Though for insert/update queries this is quite true, for SELECT queries it is a BIG SIGN of bad design. I can stand variable field names in the WHERE/ORDER BY clauses, but if you have to vary them in the field list or table name clauses - your database design is wrong for sure.
Aside from hard-coding the list of columns, you could build a list of columns via another table in your database that you want to allow column querying from, such as
QuerableSources
SrcTable      SrcColumn      DescriptToUser
============  =============  ===============
SomeTable     SomeColumn     Column used for
AnotherTable  AnotherColumn  Something Else
etc.
Then, you build for example a combobox for a user to pick the "DescriptToUser" content for easier readability, and YOU control the valid column and table source.
As for the VALUE they are searching for, DEFINITELY Scrub / clean it to prevent SQL-Injection.
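The table-driven variant could look roughly like the sketch below (Python with sqlite3 standing in for PHP and MySQL). The search helper and the sample rows in SomeTable are assumptions. The user only ever submits a description and a value: the description is looked up in QuerableSources, and the value is bound as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE QuerableSources (SrcTable TEXT, SrcColumn TEXT, DescriptToUser TEXT);
    INSERT INTO QuerableSources VALUES ('SomeTable', 'SomeColumn', 'Column used for');

    CREATE TABLE SomeTable (SomeColumn TEXT);
    INSERT INTO SomeTable VALUES ('needle'), ('hay');
    """
)


def search(conn, description, value):
    """Search the table/column registered under the given description."""
    row = conn.execute(
        "SELECT SrcTable, SrcColumn FROM QuerableSources WHERE DescriptToUser = ?",
        (description,),
    ).fetchone()
    if row is None:
        raise ValueError("unknown search field")
    table, column = row  # identifiers come from our own table, not from the user
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{column}" = ?'
    return conn.execute(sql, (value,)).fetchall()


matches = search(conn, "Column used for", "needle")
print(matches)
```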
You can hard-code the column names to make it faster. You can also cache the pulled table description, so that you don't need to update the code every time table schema changes.

Is a country name better stored as an integer/number in a database rather than a string containing its name?

I'd like to know because I'm working on a search form and using thinking sphinx and for filtered attributes it seems like only integers are accepted but my countries are stored in the db as strings containing their names.
So I will be creating my own country list with strings to show in the select menu and values as integers to store in the db. Just wondering why the developer of the gem (ruby on rails) that I'm using decided to create an array containing just strings.
This question isn't programming language specific. A database is a database.
Kind regards.
I would suggest keeping the countries in a separate table with a unique ID assigned to each of them. It does no harm and makes the database structure more flexible. This way you can add more information related to the countries if you ever need to, and relate other tables to them.
Looking up rows by a string might even be a performance issue. Querying another table to find the country IDs should not add much load.
Of course the country name will always be a string; the question is only whether to use the country name directly as the primary key, or to use a separate integer column as the primary key.
My stand is to use the integer as the primary key, so that updating a country name later is easier (unlikely, but it could happen). As for searching by country name, just create an index on it.
Storing a main lookup table with the integer mapping of a country would be better.
Then the country can be referenced in other tables as the country id(integer).
One reason is that when we access any data related to a country, comparing the queried country string against the string stored in the DB is a more expensive comparison than comparing two numbers.
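A sketch of the lookup-table layout these answers describe, using Python's sqlite3 module rather than MySQL. The table names, user names and IDs are made up. The UNIQUE constraint on name gives the index suggested above, and other tables reference the integer id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE      -- UNIQUE also creates an index on name
    );
    CREATE TABLE users (
        id         INTEGER PRIMARY KEY,
        username   TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES countries(id)
    );
    INSERT INTO countries (id, name) VALUES (1, 'Swaziland'), (2, 'Norway');
    INSERT INTO users (username, country_id)
        VALUES ('alice', 1), ('bob', 1), ('carol', 2);
    """
)

# Resolve the name once, then filter on the integer column.
country_id = conn.execute(
    "SELECT id FROM countries WHERE name = ?", ("Norway",)
).fetchone()[0]
norway_users = [
    row[0]
    for row in conn.execute(
        "SELECT username FROM users WHERE country_id = ?", (country_id,)
    )
]

# A rename touches exactly one row; every referencing row sees the new name.
conn.execute("UPDATE countries SET name = 'Eswatini' WHERE id = 1")
renamed = conn.execute(
    "SELECT c.name FROM users u JOIN countries c ON c.id = u.country_id "
    "WHERE u.username = 'alice'"
).fetchone()[0]
print(norway_users, renamed)
```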
I think you should store the countries in a separate table, say with id, name and short-name columns, and then you can use its id for searching.
My personal opinion is that storing country names as strings in the database fits better, because country names (usually) do not change and there are no "updates" on them.
So I think it would be legitimate to store them as strings.

Mysql phpMyAdmin few questions:

I am quite new to the MySQL/phpMyAdmin environment, and I would like to have some areas clarified.
1. I need a text field that should hold up to around 500 characters.
Does that have to be a "TEXT" field? Is the application responsible for enforcing the length?
2. Indexes. I understand that when I mark a field as "indexed", that field gets a pointer table, and each command with a WHERE clause on that field is optimized (log n complexity). But what happens if I mark a field as indexed after the fact, say after it already has some rows in it? Can I issue a command like "walk through all that table and index that field"?
3. When I mark fields as indexed, I sometimes get them in phpMyAdmin as having the keyname.
4. When I access the table by the indexed field from PHP, does it take extra effort on my side to use the keyname shown in the "structure" view, or is that keyname used behind the scenes so that I should not care about it at all?
5. I sometimes get keynames referencing two or more fields together. The fields show one on top of the other. I don't know how it happened, but I need them to index only one field. What is going on?
6. I use UTF-8 values in my db. When I created it, I think I marked it as utf8_unicode_ci, and some fields are marked as utf8_general_ci. Does it matter? Can I go back and change the whole DB definition to be utf8_general_ci?
I think that was quite a bit,
I thank you in advance!
Ted
First, be aware that this is not per se something about phpMyAdmin, but more about MySQL / databases.
1)
An index means that you make a list (most of the time a tree) of the values that are present. This way you can easily find the row with that value or those values. This tree can be built just as easily after you insert values as before. Mind you, this means that all the "add to index" operations are done together, so it is not something you want to do on a "live" table with loads of entries. But you can add an index whenever you want. Just add the index and it will be built, either for an empty table or for a 'used' one.
2)
I don't know what you mean by this. Indexes have a name, it doesn't really matter what it is. A (primary) key is an index, but not all indexes are keys.
3)
You don't need to 'force' mysql to use a key, the optimizer knows best how and when to use keys. If your keys are correct they are used, if they are not correct they can't be used so you can't force it: in other words: don't think about it :)
4)
phpMyAdmin makes a composite key if you mark 2 fields as key at the same time. This is annoying and can be wrong. If you search for 2 things at once, you can use the composite key, but if you search for just one thing, you may not be able to. Just mark them as a key one at a time, or use the correct SQL command manually.
5)
You can change whatever you like, but I don't know what will happen with your values. Better check manually :)
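For the point above about adding an index to a table that already holds rows, here is a small sketch with Python's sqlite3 module (SQLite, not MySQL, so the plan output differs from MySQL's EXPLAIN). The people table, its generated rows and the index name are made up. The same CREATE INDEX statement works on an empty or a populated table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people (name, city) VALUES (?, ?)",
    [(f"name{i}", f"city{i % 10}") for i in range(1000)],  # generated dummy rows
)


def plan(sql):
    """Return SQLite's query plan for sql as one string (detail is column 3)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM people WHERE name = 'name42'"

plan_before = plan(query)  # no index yet: the whole table is scanned

# Index the existing rows after the fact; no special "walk the table" command.
conn.execute("CREATE INDEX idx_people_name ON people (name)")

plan_after = plan(query)   # the optimizer now picks the index on its own
print(plan_before)
print(plan_after)
```

The query text is identical before and after, which also illustrates that the application never refers to the index by its keyname.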
1. If you need a field to contain 500 characters, you can do that with VARCHAR. Just set its length to 500.
2. You don't index field by field, you index a whole column. So it doesn't matter if the table has data in it. All the rows will be indexed.
3. Not a question.
4. The indexes will be used whenever they can. You only need to worry about using the same columns that you have indexed in the WHERE section of your query. Read about it here
5. You can add as many columns as you wish in an index. For example, if you add columns "foo", "bar" and "ming" to an index, your database will be speed optimized for searches using those columns in the WHERE clause, in that order. Again, the link above explains it all.
6. I don't know. I'm 100% sure that if you use only UTF-8 values in the database, it won't matter. You can change this later though, as explained in this Stack Overflow question: How to convert an entire MySQL database characterset and collation to UTF-8?
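The "in that order" remark about a multi-column index on foo, bar and ming can be sketched as follows, again with Python's sqlite3 module as a stand-in (MySQL's optimizer and EXPLAIN output are different, so treat this only as an illustration of the left-prefix idea). The table t, the payload column and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (foo INTEGER, bar INTEGER, ming INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(i % 7, i % 11, i % 13, f"row{i}") for i in range(500)],  # dummy rows
)
conn.execute("CREATE INDEX idx_foo_bar_ming ON t (foo, bar, ming)")


def plan(sql):
    """Return SQLite's query plan for sql as one string (detail is column 3)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Starts with the leading index column: the index can be used.
plan_prefix = plan("SELECT * FROM t WHERE foo = 1 AND bar = 2")

# Skips the leading columns: here the index does not help and the table is scanned.
plan_no_prefix = plan("SELECT * FROM t WHERE ming = 3")

print(plan_prefix)
print(plan_no_prefix)
```

This is also why an accidental composite key over two fields is not a substitute for a separate index on the second field.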
I would recommend you scrap phpMyAdmin for HeidiSQL though. HeidiSQL is a Windows client that manages all your MySQL servers. It has lots of cool functions, like copying a table or database directly from one MySQL server to another. Try it out (it's free).

How to apply normalization on mysql using php

I don't really have any idea how to do this, although I've done some reading on the topic. All I know is that it is used to make the data in the database more efficient and easier to handle, and that it can also be used to save disk space. Lastly, if you use normalization, you will have to create more tables.
Now I have a lot of questions to ask.
First, how will normalization help to save disk space, or whatever space is occupied by the database?
Second, is it possible to add data to multiple tables using only 1 query?
Please help, I'm just a newbie wanting to learn from you. Thanks.
Ok, couple of things:
php has got nothing to do with this. normalization is about modelling data
normalization is not about saving disk space. It is about organizing data so that it is easily maintainable, which in turn is a way to maintain data-integrity.
normalization is typically described in a few stages or 'normal forms'. In practice, people that design relational databases often intuitively 'get it right' most of the time. But it is still good to be aware of the normal forms and what their characteristics are. There is a lot of documentation on that on the internet (e.g. http://en.wikipedia.org/wiki/Database_normalization), and you should certainly do your own research, but the most important stages are:
unnormalized data: in this stage, data is not truly tabular ('relational'). There is a lot of discussion of what tabular really means, and experts disagree with one another, but most people agree that data is unnormalized in case there are multi-valued attributes (=columns that can for one row contain lists as value), or in case there are repeating groups (=multiple columns or multiple groups of columns for storing the same type of data)
Example of multi-valued column: person (first_name, last_name, phonenumbers)
Here, phonenumbers implies there could be more phonenumbers, stored in one column
Example of repeating group: person(first_name, last_name, child1_first_name, child1_birth_date, child2_first_name, child2_birth_date..., childN_first_name, childN_birth_date)
Here, the person table has a number of column pairs (child_first_name, child_birth_date) to store the person's children.
Note that something like order (shipping_address, billing_address) is not a repeating group: the addresses for billing and shipping may be similar pieces of data, but each has its own distinct role for an order, both just represent a different aspect of an order. child1 thru child10 do not - children do not have specific roles, and the list of children is variable (you never know how many groups you should reserve in advance)
In both cases, multi-valued columns and repeating groups, you basically have "nested table" structure - a table within a table. Data is said to be in 1NF (first normal form) if neither of these occur.
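As a rough illustration of removing the multi-valued phonenumbers column, the sketch below moves phone numbers into their own table, one row per number. It uses Python's sqlite3 module as a stand-in for MySQL; the person_phone table, the id column and the sample person are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT
    );
    -- One row per phone number instead of a list inside one column.
    CREATE TABLE person_phone (
        person_id   INTEGER NOT NULL REFERENCES person(id),
        phonenumber TEXT NOT NULL,
        PRIMARY KEY (person_id, phonenumber)
    );
    INSERT INTO person VALUES (1, 'Ada', 'Example');
    INSERT INTO person_phone VALUES (1, '555-0100'), (1, '555-0101');
    """
)

phones = [
    row[0]
    for row in conn.execute(
        "SELECT phonenumber FROM person_phone WHERE person_id = ? "
        "ORDER BY phonenumber",
        (1,),
    )
]
print(phones)
```

The same split applies to the child1 ... childN repeating group: a child table with one row per child removes the need to guess N in advance.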
The 1NF is about structural characteristics: the tabular form of the data. All subsequent normal forms have to do with eliminating redundancy. Redundancy occurs when the same information is independently stored multiple times. Redundancy is bad: if you want to change some fact, you have to change it in multiple places. If you forget to change one of them, you have inconsistent data - the data is contradicting itself.
There are a lot of processes that can eliminate redundancy, each leading to a higher normal form, all the way from 1nf up to 6nf. However, typically most databases are adequately normalized at 3nf (or a slight variation of that called Boyce-Codd normal form, BCNF). You should study 2nf and 3nf, but the principle is very simple: a table is adequately normalized, if:
the table is in 1nf
the table has a key (a column or column combination whose values are required, and which uniquely identifies a row - ie. there can be only one row having that combination of values in the key columns)
there are no functional dependencies between the non-key columns
non-key columns are not functionally dependent upon part of the key (but are completely functionally dependent upon the entire key).
functional dependency means that a column's value can be derived from another column. simple example:
order_item (order_id, item_number, customer_id, product_code, product_description, amount)
let's assume (order_id, item_number) is the key. product_code and product_description are functionally dependent upon each other: for one particular product_code, you will always find the same product description (as if product description is a function of product_code). The problem is now: suppose a product description changes for a particular product code; you then have to change all orders that use that product_code. Forget only one and you have an inconsistent database.
The way to solve it is to create a new product table with (product_code, product_description), having (product_code) as key, and then instead of storing all product fields in order, only store a reference to a row in the product table in the order_item records (in this case, order_item should only keep product_code, which is sufficient to look up a row in the product table and find the product_description)
So as you can see, with this solution you do actually save space (by not storing all these product descriptions in each order_item that happens to order the product) and you do get more tables (product is split off from order_item). But just remember that it is not done to save disk space: it is done to eliminate redundancy, thus making the data easier to maintain, because now you only have to change one row in the product table to change the description.
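The inconsistency described above can be reproduced in a few lines. The sketch below uses Python's sqlite3 module instead of MySQL, and the sample rows ('P1', 'Widget', and so on) are invented. It updates the description on one order only and then looks for product codes that ended up with more than one description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE order_item (
        order_id            INTEGER,
        item_number         INTEGER,
        customer_id         INTEGER,
        product_code        TEXT,
        product_description TEXT,
        amount              INTEGER,
        PRIMARY KEY (order_id, item_number)
    )
    """
)
conn.executemany(
    "INSERT INTO order_item VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 10, "P1", "Widget", 2),  # made-up sample rows
        (2, 1, 11, "P1", "Widget", 1),
    ],
)

# The description changes, but only one of the two orders gets updated.
conn.execute(
    "UPDATE order_item SET product_description = 'Widget v2' WHERE order_id = 1"
)

# Product codes whose description is no longer a function of the code.
conflicts = [
    row[0]
    for row in conn.execute(
        "SELECT product_code FROM order_item "
        "GROUP BY product_code HAVING COUNT(DISTINCT product_description) > 1"
    )
]
print(conflicts)
```

With a separate product table keyed on product_code, the second description has nowhere to live, so this query could never return a row.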
There are a lot of similar questions on StackOverflow already, for example, Can someone please give an example of 1NF, 2NF and 3NF in plain english?
Look in the Related sidebar to the right for a bunch of them. That'll get you started.
As for your specific questions:
Normalization saves disk space by reducing redundant data storage. This has another benefit: if you have multiple copies of a given entity attribute in your database, they can get out of sync, while if you have a normalized database and use referential integrity, this cannot happen.
The INSERT statement references only one table. A TRIGGER on the insert statement can add rows to other tables, but there's no way to supply data to the trigger other than those columns in the table that spawned it.
When you need to insert dependent rows after inserting a row to the parent table, use the LAST_INSERT_ID() function to retrieve the auto-generated primary key value of the last INSERT statement in your session.
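The parent-then-child pattern can be sketched like this in Python with sqlite3, where cursor.lastrowid plays the role that LAST_INSERT_ID() plays in MySQL. The orders and order_item tables and their contents are hypothetical. Both inserts run inside one transaction so that either all rows are stored or none are.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        customer TEXT NOT NULL
    );
    CREATE TABLE order_item (
        order_id     INTEGER NOT NULL REFERENCES orders(id),
        product_code TEXT NOT NULL
    );
    """
)

# "with conn" commits on success and rolls back if an exception is raised.
with conn:
    cursor = conn.execute("INSERT INTO orders (customer) VALUES (?)", ("alice",))
    order_id = cursor.lastrowid  # auto-generated key of the row just inserted
    conn.executemany(
        "INSERT INTO order_item (order_id, product_code) VALUES (?, ?)",
        [(order_id, "P1"), (order_id, "P2")],
    )

item_count = conn.execute(
    "SELECT COUNT(*) FROM order_item WHERE order_id = ?", (order_id,)
).fetchone()[0]
print(order_id, item_count)
```

It is still two INSERT statements, one per table; the transaction only makes them succeed or fail together.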
I think you will learn this when you start creating the schema for your database.
Please think reverse when you add a field that exists somewhere else in your database.
By reverse I mean, ask yourself: if I have to modify the field, how many queries do I have to run?
You will probably end up with the answer that you have to run the query 2 or X times to modify the content of your column.
Keep it simple, that means assign an ID to each content you have duplicated in your database.
For example taking column address
this is not good
update clients set address = 'new address' where clientid=500;
update orders set address = 'new address' where orderid=300;
a good approach would be
to create an addresses table
// and run a single query
update addresses set address = 'new address' where addressid=100;
Then use the address id 100 everywhere in your database tables as a foreign key reference (clients + orders). This way the id 100 never changes, but if you update the content of the address, all linked tables will pick up the change.
Level 3 of normalization is enough this time for you.
Normalization is a set of rules. The more you follow, the higher a "level" of normalisation your database has. In general, level 3 is the highest level sought after.
Normalised data is theoretically "purer" than non-normalised data. This makes it easier to reason about, and it removes redundancy, which reduces the chance of data getting out of sync.
From a practical viewpoint however, normalised data isn't always the best design, even if it is in theory. If you don't really know the finer points, aiming for normalised data isn't such a bad idea though.
In phpMyAdmin > 4.3.0, under Structure -> Table structure, the following options appear above the table:
"Print" "Propose table structure" "Track table" "Move columns" "Improve table structure". Under "Improve table structure" you get a wizard which says:
Improve table structure (Normalization):
Select up to what step you want to normalize
First step of normalization (1NF)
Second step of normalization (1NF+2NF)
Third step of normalization (1NF+2NF+3NF)
To question 2: No, it is not possible to insert data into multiple tables with one query.
See the INSERT syntax.
In addition to other answers, you can also search here on SO for normalization and find e.g. the question: Normalization in MySQL
