Is it necessary to validate column names when submitting an SQL Query? - php

In my SQL queries I am submitting data from forms filled out by the user, and as shown here it is not possible to parameterize column names with PDO. This matters because the column names in the query are inserted dynamically based on the field names in the form. I can rather easily validate the column names submitted in the $_POST array by pulling the real column names out of the database and throwing out any that don't match. Is this a good way to avoid SQL injection, or is it simply a waste of system resources (since it effectively doubles the database work for any request that relies on the database)?

Is this a good thing to do to avoid SQL injection
No.
or is simply a waste of system resources
No.
It cannot be a waste, as it's just a simple select from a system table.
But it can still open the door to another sort of injection when a user isn't allowed to set certain fields. Say there is an (imaginary) field "user_role" that only the site admin should fill in; if a user can supply it in the POST data, they can alter their own access privileges.
So, hardcoding (whitelisting) the allowed fields is the only reliable way.
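A minimal sketch of such a whitelist in PHP, assuming a PDO connection in $pdo and a hypothetical users table; only whitelisted field names, never raw input, are interpolated into the SQL:

    // Hardcoded whitelist of fields the user may set; anything else is dropped.
    $allowedFields = ['first_name', 'last_name', 'email'];
    $columns = [];
    $values = [];
    foreach ($_POST as $field => $value) {
        if (in_array($field, $allowedFields, true)) {
            $columns[] = "`$field`"; // safe: taken from the whitelist, not from input
            $values[] = $value;      // values are bound as parameters below
        }
    }
    if ($columns) {
        $placeholders = implode(', ', array_fill(0, count($columns), '?'));
        $sql = 'INSERT INTO users (' . implode(', ', $columns) . ") VALUES ($placeholders)";
        $pdo->prepare($sql)->execute($values);
    }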
as it effectively doubles the execution of any request that relies on the Database
Man, databases are intended to be queried. That's their only purpose. A database that cannot sustain a simple select query is nonsense. Queries also differ in cost: a single insert is way heavier than ten selects. You have to distinguish queries by their cost, not their count.
the column names in the query are inserted dynamically based on the field names in the form.
Though for insert/update queries that is quite true, for SELECT queries it is a BIG SIGN of bad design. I can accept variable field names in the WHERE/ORDER BY clauses, but if you have to vary the field list or the table name itself, your database design is wrong for sure.

Aside from hard-coding the list of columns, you could drive it from another table in your database listing the sources you want to allow column querying from, such as:
QueryableSources
SrcTable      SrcColumn      DescriptionToUser
SomeTable     SomeColumn     Column used for
AnotherTable  AnotherColumn  Something else
etc.
Then you build, for example, a combobox from which the user picks the "DescriptionToUser" text for readability, while YOU control which table and column are actually queried.
As for the VALUE they are searching for, definitely scrub/clean it (or, better, bind it as a parameter) to prevent SQL injection.
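A rough sketch of that lookup-table idea, with hypothetical names throughout and the searched value bound as a parameter:

    // Resolve the user's combobox choice back to a real table/column,
    // so only values stored in QueryableSources ever reach the SQL.
    $stmt = $pdo->prepare('SELECT SrcTable, SrcColumn FROM QueryableSources WHERE DescriptionToUser = ?');
    $stmt->execute([$_POST['source']]);
    if ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        $sql = "SELECT * FROM `{$row['SrcTable']}` WHERE `{$row['SrcColumn']}` = ?";
        $query = $pdo->prepare($sql);
        $query->execute([$_POST['value']]); // the searched value is parameterized
    }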

You can hard-code the column names to make it faster. You can also cache the pulled table description, so that you don't need to update the code every time the table schema changes.
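A small sketch of that caching idea, assuming the APCu extension and a hypothetical users table:

    // Re-read the schema only when the cache is cold, instead of on every request.
    $columns = apcu_fetch('users_columns', $hit);
    if (!$hit) {
        $columns = $pdo->query('DESCRIBE users')->fetchAll(PDO::FETCH_COLUMN);
        apcu_store('users_columns', $columns, 3600); // refresh the column list hourly
    }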


PHP HTML Filter Feature

What would be the best way to create a filter feature in PHP, so the user can select which records they want displayed in a table?
For example, take a database of addresses. I want to add a feature so the user can select which county they want displayed in a table (a filter feature).
I plan to just run a foreach to loop through the rows and add each county to a checkbox group, so the user can check which counties they want displayed and I can base my MySQL query on that. But I figure it would take time, especially since I'll have 5,000 or more records.
What would be the most convenient way to achieve this and is there a command or a feature in PHP to get all the unique values in a column so I can list them in the filter box?
Focus more on the SQL aspect of the problem. Try:
SELECT DISTINCT column_name FROM table_name;
Also, you could design your database so that the counties are foreign keys to a separate COUNTY table. This will remove a huge amount of redundancy, as the database is no longer filled with repeated copies of the same names.
NOTE: Never retrieve all the entries from the database and run foreach loops over them; try to retrieve as few rows as possible from the database.
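A minimal sketch of both steps, assuming a PDO connection in $pdo and a hypothetical addresses table:

    // Populate the filter box with the unique counties only.
    $counties = $pdo->query('SELECT DISTINCT county FROM addresses ORDER BY county')
                    ->fetchAll(PDO::FETCH_COLUMN);
    foreach ($counties as $county) {
        $label = htmlspecialchars($county);
        echo "<label><input type=\"checkbox\" name=\"county[]\" value=\"$label\"> $label</label>\n";
    }

    // On submit, fetch only the selected counties, with every value parameterized.
    $selected = isset($_POST['county']) ? (array) $_POST['county'] : [];
    if ($selected) {
        $in = implode(', ', array_fill(0, count($selected), '?'));
        $stmt = $pdo->prepare("SELECT * FROM addresses WHERE county IN ($in)");
        $stmt->execute($selected);
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    }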

In Doctrine/PDO, how do I fetch sql query column names

I'm writing a piece of software that generates a pdf report out of a raw user-defined SQL query execution.
The pdf contains a simple table with rows containing SQL result rows. I'd like to add a table header with column names retrieved along with the SQL results.
The column headers in SQL may have various structures, e.g.:
a) select * from users;
b) select name, surname, email from users;
c) select name as UserName, surname as UserSurname, email as UserEmail from users;
So far I fetch the SQL results as an associative array, take the keys of the first row, and treat them as column names.
This works only if there is at least one row in the result set, which is a serious flaw in this approach.
I could generate the pdf with a "No results" label.
I could run a regex on a SQL query for named columns and execute describe table x, but this is plain ridiculous.
I also have even more ridiculous ideas, but that's not the way.
Is there anybody having any idea for solving this?
I use Doctrine on MySQL for this, but a plain PDO approach would be just as good as a Doctrine one.
EDIT
Right after posting this question it came to my mind I could generate a view out of my SQL query, then run SHOW COLUMNS FROM randomViewName; and drop the view immediately afterwards.
It's hacky and needs some db security work (I can handle that), but it's a working candidate.
What do you think?
It may not be a perfect solution, but I went along with the approach mentioned in the question.
I create a MySQL view. This gives me a db object which is queryable and has exactly the column names I want.
describe nameOfViewYouJustCreated;
gives me exactly what I need.
The view is dropped afterwards.
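A rough PDO sketch of that view trick, assuming $pdo is connected and $userSql holds the user-defined query (which is raw by design here):

    // Wrap the user query in a throwaway view so MySQL reports the column names,
    // even when the query itself returns zero rows.
    $viewName = uniqid('tmp_report_'); // hypothetical unique-name scheme
    $pdo->exec("CREATE VIEW `$viewName` AS $userSql");
    $columns = $pdo->query("DESCRIBE `$viewName`")->fetchAll(PDO::FETCH_COLUMN); // the Field column
    $pdo->exec("DROP VIEW `$viewName`");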

mysql show table / columns - performance question

I'm working on a basic php/mysql CMS and have a few questions regarding performance.
When viewing a blog page (or other sortable data) from the front-end, I want to allow a simple 'sort' variable to be added to the querystring, allowing posts to be sorted by any column. Obviously I can't accept just anything from the querystring, and need to make sure the column exists on the table.
At the moment I'm using
SHOW TABLES;
to get a list of all of the tables in the database, then looping the array of table names and performing
SHOW COLUMNS FROM tbl_name;
on each.
My worry is that my CMS might take a performance hit here. I thought about using a static array of the table names but need to keep this flexible as I'm implementing a plugin system.
Does anybody have any suggestions on how I can keep this more concise?
Thank you
If you're using MySQL 5+, you'll find the information_schema database useful for this task. In it you can access information about tables, columns, and references via simple SQL queries. For example, you can check whether a specific column exists in a table:
SELECT COUNT(*) FROM information_schema.COLUMNS
WHERE
    TABLE_SCHEMA = 'your_database_name' AND
    TABLE_NAME = 'your_table' AND
    COLUMN_NAME = 'your_column';
And here is how to list the tables in which a specific column exists:
SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.COLUMNS WHERE COLUMN_NAME = 'your_column';
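And a quick PHP sketch of the first check, assuming a PDO connection in $pdo, a hypothetical posts table, and a sort parameter in the querystring:

    // Validate the user-supplied sort column against information_schema
    // before it is ever placed into the ORDER BY clause.
    $requested = isset($_GET['sort']) ? $_GET['sort'] : '';
    $stmt = $pdo->prepare(
        'SELECT COUNT(*) FROM information_schema.COLUMNS
         WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = ? AND COLUMN_NAME = ?'
    );
    $stmt->execute(['posts', $requested]);
    $sort = $stmt->fetchColumn() ? $requested : 'id'; // fall back to a default column
    $posts = $pdo->query("SELECT * FROM posts ORDER BY `$sort`")->fetchAll(PDO::FETCH_ASSOC);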
Since you're currently hitting the db twice before you run your actual query, you might want to consider just wrapping the actual query in a try block. If the query works, you've done one operation instead of three; if it fails, you've still wasted only one query instead of potentially two.
The important caveat (as usual!) is that any user input must be cleaned before doing this.
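A sketch of that idea; the column name is still restricted to identifier characters first (hypothetical posts table and sort parameter again):

    // Allow only identifier-like characters, then simply try the query;
    // an unknown column raises an exception we can fall back from.
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION); // make failures throw
    $sort = (isset($_GET['sort']) && preg_match('/^[A-Za-z0-9_]+$/', $_GET['sort']))
        ? $_GET['sort'] : 'id';
    try {
        $posts = $pdo->query("SELECT * FROM posts ORDER BY `$sort`")->fetchAll();
    } catch (PDOException $e) {
        $posts = $pdo->query('SELECT * FROM posts ORDER BY id')->fetchAll();
    }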
You could query the table up front and store the columns in a cache layer (e.g. memcache or APC). You could then set the expiry to infinite and only delete and re-create the cache entry when a plugin has been newly added, updated, etc.
I guess the best bet is to put all that stuff you're getting from SHOW TABLES etc. in a file already and just include it, instead of running that every time. Or implement some sort of caching if the project is still in development and you think the fields will change.

Ids from mysql massive insert from simultaneous sources

I've got an application in PHP & MySQL where users write to and read from a particular table. One of the write modes is a batch, doing only one query with multiple value rows. The table has an ID which auto-increments.
The idea is that for each row inserted into the table, a copy is inserted into a separate table as a history log, including the ID that was generated.
The problem is that multiple users can do this at once, and I need to be sure that the IDs read back are the correct ones.
Can I be sure that if I do for example:
INSERT INTO table1 VALUES ('','test1'),('','test2')
that the IDs generated are sequential?
How can I get the IDs that were just inserted, and be sure those are really the ones from my insert?
I've thought of LOCK TABLES, but the users shouldn't notice it.
Hope I made myself clear...
Building an application that requires generated IDs to be sequential usually means you're taking the wrong approach: what happens when you have to delete a value some day? Are you going to re-sequence the entire table? Much better to just let the values fall where they may, using a primary key to prevent duplication.
Based on the current implementations of MyISAM and InnoDB, yes. However, this is not guaranteed to hold in the future, so I would not rely on it.
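For what it's worth, a sketch of reading the generated range back; it leans on the fact that LAST_INSERT_ID() returns the first ID of a multi-row insert, and on the batch getting consecutive IDs (true for current MyISAM/InnoDB simple inserts, as noted above). The history_log table name is hypothetical:

    // Batch insert; PDO::exec() returns the number of rows inserted.
    $count = $pdo->exec("INSERT INTO table1 (name) VALUES ('test1'), ('test2')");
    $firstId = (int) $pdo->lastInsertId(); // first ID generated for the batch
    $lastId = $firstId + $count - 1;

    // Copy the new rows into the history log using that ID range.
    $pdo->exec("INSERT INTO history_log (id, name)
                SELECT id, name FROM table1 WHERE id BETWEEN $firstId AND $lastId");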

How to apply normalization on mysql using php

Please, I don't have any idea, although I've done some reading on the topic. All I know is that it is used to make the data in the database more efficient and easier to handle, that it can also be used to save disk space, and that if you use normalization you will end up generating more tables.
Now I have a lot of questions to ask.
First, how will normalization help to save disk space, or whatever space is occupied by the database?
Second, is it possible to add data to multiple tables using only one query?
Please help, I'm just a newbie wanting to learn from you. Thanks.
Ok, couple of things:
PHP has got nothing to do with this; normalization is about modelling data.
Normalization is not about saving disk space. It is about organizing data so that it is easily maintainable, which in turn is a way to maintain data integrity.
Normalization is typically described in a few stages, or 'normal forms'. In practice, people who design relational databases often intuitively 'get it right' most of the time, but it is still good to be aware of the normal forms and their characteristics. There is a lot of documentation on this on the internet (e.g. http://en.wikipedia.org/wiki/Database_normalization), and you should certainly do your own research, but the most important stages are:
Unnormalized data: in this stage, data is not truly tabular ('relational'). There is a lot of discussion about what tabular really means, and experts disagree with one another, but most people agree that data is unnormalized if there are multi-valued attributes (columns that can contain a list of values in a single row) or repeating groups (multiple columns, or multiple groups of columns, for storing the same type of data).
Example of multi-valued column: person (first_name, last_name, phonenumbers)
Here, phonenumbers implies there could be several phone numbers, stored in one column.
Example of repeating group: person(first_name, last_name, child1_first_name, child1_birth_date, child2_first_name, child2_birth_date..., childN_first_name, childN_birth_date)
Here, the person table has a number of column pairs (child_first_name, child_birth_date) to store the person's children.
Note that something like order (shipping_address, billing_address) is not a repeating group: the addresses for billing and shipping may be similar pieces of data, but each has its own distinct role for an order; both just represent a different aspect of an order. child1 through childN do not: children do not have specific roles, and the list of children is variable (you never know how many groups you should reserve in advance).
In both cases, multi-valued columns and repeating groups, you basically have a "nested table" structure: a table within a table. Data is said to be in 1NF (first normal form) if neither of these occurs.
1NF is about structural characteristics: the tabular form of the data. All subsequent normal forms have to do with eliminating redundancy. Redundancy occurs when the same information is independently stored multiple times. Redundancy is bad: if you want to change some fact, you have to change it in multiple places, and if you forget to change one of them you have inconsistent data, data that contradicts itself.
There are a lot of processes that can eliminate redundancy, each leading to a higher normal form, all the way from 1NF up to 6NF. However, most databases are adequately normalized at 3NF (or a slight variation of it called Boyce-Codd normal form, BCNF). You should study 2NF and 3NF, but the principle is very simple: a table is adequately normalized if:
the table is in 1nf
the table has a key (a column or column combination whose values are required, and which uniquely identifies a row, i.e. there can be only one row having that combination of values in the key columns)
there are no functional dependencies between the non-key columns
non-key columns are not functionally dependent upon part of the key (but are completely functionally dependent upon the entire key).
Functional dependency means that a column's value can be derived from another column. A simple example:
order_item (order_id, item_number, customer_id, product_code, product_description, amount)
Let's assume (order_id, item_number) is the key. product_code and product_description are functionally dependent upon each other: for one particular product_code, you will always find the same product_description (as if product_description were a function of product_code). The problem is now: suppose the product description changes for a particular product code; you have to change all orders that use that product_code. Forget only one and you have an inconsistent database.
The way to solve it is to create a new product table with (product_code, product_description), having (product_code) as key, and then, instead of storing all product fields in the order, store only a reference to a row in the product table in the order_item records (in this case, order_item should keep only product_code, which is sufficient to look up a row in the product table and find the product_description).
So as you can see, with this solution you do actually save space (by not storing the product description in every order_item that happens to order the product), and you do get more tables (product is split off from order_item). But remember that it is not about saving disk space: it is about eliminating redundancy, thus making it easier to maintain the data, because now you only have to change one row in the product table to change a description.
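A sketch of that split in SQL, using the names from the example above (column types are assumptions):

    -- Products are stored once, keyed by product_code.
    CREATE TABLE product (
        product_code        VARCHAR(20) PRIMARY KEY,
        product_description VARCHAR(255) NOT NULL
    );

    -- Order lines reference the product instead of copying its description.
    CREATE TABLE order_item (
        order_id     INT NOT NULL,
        item_number  INT NOT NULL,
        customer_id  INT NOT NULL,
        product_code VARCHAR(20) NOT NULL,
        amount       DECIMAL(10,2) NOT NULL,
        PRIMARY KEY (order_id, item_number),
        FOREIGN KEY (product_code) REFERENCES product (product_code)
    );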
There are a lot of similar questions on StackOverflow already, for example, Can someone please give an example of 1NF, 2NF and 3NF in plain english?
Look in the Related sidebar to the right for a bunch of them. That'll get you started.
As for your specific questions:
Normalization saves disk space by reducing redundant data storage. This has another benefit: if you have multiple copies of a given entity attribute in your database, they can get out of sync, while if you have a normalized database and use referential integrity, this cannot happen.
The INSERT statement references only one table. A TRIGGER on the insert statement can add rows to other tables, but there's no way to supply data to the trigger other than those columns in the table that spawned it.
When you need to insert dependent rows after inserting a row to the parent table, use the LAST_INSERT_ID() function to retrieve the auto-generated primary key value of the last INSERT statement in your session.
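A brief sketch of that pattern, with hypothetical parent/child tables:

    // Insert the parent row, then reuse its generated ID for the dependent rows.
    $pdo->prepare('INSERT INTO orders (customer_id) VALUES (?)')->execute([42]);
    $orderId = (int) $pdo->lastInsertId(); // wraps MySQL's LAST_INSERT_ID()

    $item = $pdo->prepare('INSERT INTO order_items (order_id, product_code, amount) VALUES (?, ?, ?)');
    $item->execute([$orderId, 'ABC-1', 2]);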
I think you will learn this when you start creating the schema for your database.
Please think in reverse when you add a field that already exists somewhere else in your database.
By reverse I mean, ask yourself: if I have to modify the field, how many queries will I have to run?
You will probably end up with the answer that you would have to run the query two or X times to modify the content of your column.
Keep it simple: assign an ID to each piece of content that is duplicated in your database.
For example, take an address column. This is not good:
update clients set address = 'new address' where clientid=500;
update orders set address = 'new address' where orderid=300;
A good approach would be to create an addresses table and run a single query:
update addresses set address = 'new address' where addressid=100;
Then use the address ID 100 everywhere in your database tables as a foreign key reference (clients + orders). This way the ID 100 itself never changes, but if you update the content of the address, all linked tables pick up the change.
Level 3 of normalization (third normal form) is enough for you this time.
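A sketch of that addresses split (column definitions are assumptions):

    -- Shared addresses live in one place...
    CREATE TABLE addresses (
        addressid INT AUTO_INCREMENT PRIMARY KEY,
        address   VARCHAR(255) NOT NULL
    );

    -- ...and clients and orders just point at them.
    ALTER TABLE clients ADD COLUMN addressid INT,
        ADD FOREIGN KEY (addressid) REFERENCES addresses (addressid);
    ALTER TABLE orders ADD COLUMN addressid INT,
        ADD FOREIGN KEY (addressid) REFERENCES addresses (addressid);

    -- Now a single update fixes the address everywhere it is referenced.
    UPDATE addresses SET address = 'new address' WHERE addressid = 100;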
Normalization is a set of rules. The more of them you follow, the higher the "level" of normalisation your database has. In general, level 3 is the highest level sought after.
Normalised data is theoretically "purer" than non-normalised data. This makes it easier to reason about, and it removes redundancy, which reduces the chance of data getting out of sync.
From a practical viewpoint, however, normalised data isn't always the best design, even if it is in theory. If you don't really know the finer points, aiming for normalised data isn't such a bad idea though.
In phpMyAdmin > 4.3.0, under Structure -> Table structure, above the table you get:
"Print", "Propose table structure", "Track table", "Move columns", "Improve table structure". "Improve table structure" opens a wizard which says:
Improve table structure (Normalization):
Select up to what step you want to normalize
First step of normalization (1NF)
Second step of normalization (1NF+2NF)
Third step of normalization (1NF+2NF+3NF)
To question 2: no, it is not possible to insert data into multiple tables with one query.
See the INSERT syntax.
In addition to other answers, you can also search here on SO for normalization and find e.g. the question: Normalization in MySQL
