Avoid hard coding in mysql query - php

I have a table of ban reasons:
id | short_name | description
1  | virus      | Virus detected in file
2  | spam       | Spammy file
3  | illegal    | Illegal content
When I ban a file for being a virus, in my code I do this:
$file -> banVirus();
Which inserts the file id and ban reason into a table:
"INSERT INTO `banned_files` VALUES (61234, 1)"
My question is: is it a problem that I have hard-coded the value 1 to indicate a virus file?
Should I use defines in my config, like define('VIRUS', 1), so I can replace the 1 with a constant? Or does it not matter at all?

If the id is an auto incrementing field, then it is a very big problem! Since the ids are automatically generated, it's hard to guarantee their stability; i.e. they may change.
If the id is something you manually assigned, it's not such a big problem, but it's still bad practice, because magic numbers easily lead to confusion and mistakes. Who knows what "1" means when reading your code?
So either way, you'd be better off assigning a stable, readable id to each case.
I agree with @Tenner that it also hardly makes sense to have a table for this static, unchanging data to begin with. Your banned_files table should have a column like this:
reason ENUM('virus', 'spam', 'illegal') NOT NULL
You need nothing more in your database. When outputting this for the user, you can add a readable reason with a simple array through your PHP code.

Since you have a fixed (and small) number of parameters, I'd be tempted to make the IDs an enum in your code and not even include them as a separate database table at all.
Think about something like gender, which has two (or more) options, all fixed. (We won't be adding multiple new genders anytime soon.) I guarantee most registration systems don't have a GENDER table with two entries in it.
So, table banned_files would be something like this:
id | reason
--------+------------
12345 | 1
67890 | 2
and your code would contain enums as necessary:
enum BanReason {
Virus = 1,
Spam = 2,
Illegal = 3
}
(please convert to PHP; I'm a C# developer!)
In PHP:
$aBanReason = array(
'Virus' => 1,
'Spam' => 2,
'Illegal' => 3
);
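The same mapping can also be written as class constants, which gives each id a readable name at the call site. This is only a minimal sketch: the class name BanReason, its constants and the describe() helper are hypothetical, and the descriptions are copied from the table in the question.

```php
<?php
// Hypothetical names: BanReason and its members are illustrative only.
final class BanReason
{
    const VIRUS = 1;
    const SPAM = 2;
    const ILLEGAL = 3;

    // Human-readable text kept in code instead of a lookup table.
    const DESCRIPTIONS = [
        self::VIRUS => 'Virus detected in file',
        self::SPAM => 'Spammy file',
        self::ILLEGAL => 'Illegal content',
    ];

    public static function describe(int $reason): string
    {
        if (!array_key_exists($reason, self::DESCRIPTIONS)) {
            throw new InvalidArgumentException("Unknown ban reason: $reason");
        }
        return self::DESCRIPTIONS[$reason];
    }
}

// Values that would be bound to a prepared INSERT for file 61234.
$params = [61234, BanReason::VIRUS];
echo BanReason::describe($params[1]), "\n";
```

Reading BanReason::VIRUS in the calling code says what the 1 means, and an unknown id fails loudly instead of being stored silently.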

Related

Storing in array or in fields? Which is better? [closed]

Today I was working on my website and I asked myself a simple question.
Is storing an array with all the information better than saving each value in a separate field?
For example if I store a word, a password and a number in one field on the database in this way
+-------------+----------------------------------------------------------------+
| Field | Value |
+-------------+----------------------------------------------------------------+
| all | ["test","fa26be19de6bff93f70bc2308434e4a440bbad02","25468684888"] |
+-------------+----------------------------------------------------------------+
Is it better than saving it in this way?
+-------------+------------------------------------------+
| Field | Value |
+-------------+------------------------------------------+
| word | test |
| password | fa26be19de6bff93f70bc2308434e4a440bbad02 |
| number | 25468684888 |
+-------------+------------------------------------------+
I think that the first method is faster than the last one because you need only to SELECT one field and not three or more. What do you think about it?
The second method. By far.
You should never put more than one piece of data into a single column.
A single row of data should contain all the information you need:
id | name  | password
1  | Fluff | itsASecret
2  | Flupp | Ohnoes
Basically, it has to do with updates, selects, searches and pretty much everything that databases do. They are made to do it on single columns, not little bits of data inside a string.
Taking your example, how do you update the password? How do you put an index on the user ID?
What if you also had a bit of data called "NumberOfVotes"? If you had it all in one column in a pseudo-array, how would you get a tally of all the votes cast by all users? Would you REALLY want to pull each entry out into PHP, explode it, add it to the running total and THEN display how many votes have been cast? What if you had a million users?
If you store each piece of data in its own column, you can do a tally really easily like this:
select
sum(NumberOfVotes)
from
yourTableName
Edit (Reply to faster query):
Absolutely not. The time it takes to complete a query comes down to two things:
1) Time it takes to execute the query
2) Time it takes to return all the data.
In this case, the time it takes to return the data will be the same; after all, the database is returning the same number of bytes. However, with tables that are properly set up, just FINDING the right data will be faster by orders of magnitude.
As an example of how difficult it would be to simply USE a table that has the various bits of information all jumbled together, try to write a query to update the "number" value in the row that starts with the word "test".
Having said that, there are possibly some potential cases where it can in fact be okay to store multiple "fields" of data in one column. I once saw (and copied) an exceptionally interesting permissions system for users that stored the various permissions in binary and each digit in the number equated to being allowed/not being allowed to perform a certain type of action. That was however one interesting example - and is pretty much what I would call an exception that proves the rule :)
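A rough sketch of what such a binary permissions field might look like follows. The flag names and the hasPermission() helper are made up for illustration and are not taken from the system mentioned above.

```php
<?php
// Hypothetical permission flags; each one occupies its own binary digit.
const PERM_READ   = 1; // binary 001
const PERM_WRITE  = 2; // binary 010
const PERM_DELETE = 4; // binary 100

function hasPermission(int $permissions, int $flag): bool
{
    // True only when every bit of $flag is set in $permissions.
    return ($permissions & $flag) === $flag;
}

$permissions = PERM_READ | PERM_WRITE; // stored as the single integer 3

var_dump(hasPermission($permissions, PERM_WRITE));  // bool(true)
var_dump(hasPermission($permissions, PERM_DELETE)); // bool(false)

$permissions |= PERM_DELETE;  // grant delete
$permissions &= ~PERM_WRITE;  // revoke write; the value is now 5
```

This works in one column because each flag is read and changed with a single bitwise operation, so no string parsing is involved.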
"I think that the first method is faster"
This is actually your main problem. You are comparing the solutions only from an "is it faster" point of view, while you have no measurement to tell whether there is any difference at all or, even if there is, whether such a difference matters. So your only reason is a false one, while you completely overlook the truly important reasons, like proper database design.
Saving in separate fields is a lot more flexible as you are then able to easily search/manipulate data using SQL queries, whereas if they were in an array you would frequently find yourself needing to parse data outside SQL. Consider the following example:
+-------------+----------------------------------------------------------------+
| Field | Value |
+-------------+----------------------------------------------------------------+
| all | ["1","fa26be19de6bff93f70bc2308434e4a440bbad02","25468684888"] |
+-------------+----------------------------------------------------------------+
Using the above table, suppose you need to find the number field for the user with id 1. There is nothing reliable to search for: you can't simply do a query for the value 1 somewhere in the all field, as that would match every row containing the digit 1!
You'll also encounter this problem when changing data in your DB, as you'll have to get the current array, parse it, change the value, then reinsert it.
Also you'll need to put some form of ID as a field to act as a primary key.
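To make the "get, parse, change, reinsert" round trip concrete, here is a small sketch of the PHP side only. The string is the sample value from the table above; the new number and the assumption that index 2 holds the number are illustrative.

```php
<?php
// The value of the `all` column, as it would be fetched by a first query.
$stored = '["1","fa26be19de6bff93f70bc2308434e4a440bbad02","25468684888"]';

// 1. Parse the whole array just to reach one value.
$fields = json_decode($stored, true);

// 2. Change the "number" entry; index 2 is a convention only the code knows.
$fields[2] = '25468684999';

// 3. Re-encode everything so it can be written back with a second query.
$updated = json_encode($fields);

echo $updated, "\n";
```

Nothing in the database knows that position 2 is the number, so every script touching this column has to repeat the same convention.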
However with separate fields for each value, it's fairly simple:
+-------------+------------------------------------------+
| Field | Value |
+-------------+------------------------------------------+
| id | 1 |
| password | fa26be19de6bff93f70bc2308434e4a440bbad02 |
| number | 25468684888 |
+-------------+------------------------------------------+
SELECT `number` FROM mytable WHERE id = 1
The second option is better because it's more readable and maintainable.
If someone who didn't write the code has to maintain it, the first option is terrible.
If you ever need to change a field, or add a field, likewise, the first option is a nightmare.
The second option requires much less work.
Keep it simple!
I think the given example is trivial, and that's why the answer for this specific example is the second method. But there are times when the first method is far easier to implement. For example, say you create pages for a website dynamically from an admin panel, and at the start you don't know all the values that will be used on every page. You put the general options in separate columns, as in the second method, and add something like a page_data column to store a serialized object. You should only use a serialized object for data that is not likely to change individually, since it is treated as a single piece of data.
In your code you fetch the serialized object, unserialize it and use it as normal. This way you can add page-specific data that is not generalized for every page, while the pages still share the same structure.
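A minimal sketch of that pattern is below. The column name page_data comes from the answer; the row contents are invented, and json_encode()/json_decode() stand in here for serialize()/unserialize(), which is a substitution on my part rather than what the answer prescribes.

```php
<?php
// Hypothetical page-specific settings that have no column of their own.
$pageSpecific = ['banner' => 'summer.jpg', 'columns' => 3];

// Hypothetical row: fixed columns plus one free-form page_data column.
$row = [
    'id' => 7,
    'title' => 'Landing page',
    'page_data' => json_encode($pageSpecific), // written as one opaque value
];

// Reading: decode once, then use it like any other array.
$data = json_decode($row['page_data'], true);
echo $data['banner'], ' / ', $data['columns'], "\n";
```

The fixed columns stay searchable and indexable, while page_data is only ever read and written as a whole.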

mysql reorder rows with unique constraint

I'm having some trouble coming up with an efficient solution to this problem. Maybe I am making it more complicated than needs to be. I have a table like this:
thing_id | user_id | order
1        | 1       | 0
2        | 1       | 1
3        | 1       | 2
The user may mess around with their things and it may happen that they change thing 1 to thing 3, and thing 3 to thing 1. In my case, it is not that the user is explicitly changing the order. Rather, they are modifying their bank of things, and they may change the thing in slot 1 to be the thing in slot 3, and vice versa. So if the user performs this operation, the table should look like this:
thing_id | user_id | order
3        | 1       | 0
2        | 1       | 1
1        | 1       | 2
What complicates this is that (thing_id, user_id) has a unique constraint, so doing sequential updates does not quite work. If I try to UPDATE tbl SET thing_id=3 WHERE thing_id=1, the unique constraint is broken.
The order column is purely for show, in order to make an alphabetized list. So I suppose I could use PHP to check the order and figure things out like that, but this introduces code that really has nothing to do with the important stuff. I'd like to find a solution that is purely/mostly SQL.
Also, along the same lines, if I were to insert a new row into the table, I would want the order value to be 3. Is there an efficient way to do this in SQL, without first having to SELECT MAX(order) WHERE user_id=1?
My comment seems to have gotten some traction, so I'm posting it as an answer... To avoid your problem, add a new column, without constraints, and just use that for user desired updates.
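Why aren't you updating the order instead of the thing_id? Note that order is a reserved word in MySQL, so the column name has to be quoted with backticks:
UPDATE tbl
SET `order` = 2
WHERE thing_id=1;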
Each row represents a "thing-user" pair. The data is the ordering that you want to use. You don't want to change the entity ("thing-user"). You want to change the data.
By the way, you'll then have to do some additional work to keep the values in the order column unique.
If you switched this around and put the unique constraint on user_id, order, then it would make sense to update the thing_id.

MySQL name and surname in 2 columns vs name in 1 column

One simple question that I couldn't find any answers to:
Should the name be in 2 different DB columns (name / surname) or in 1 column (name + surname)?
In all the projects I have had, they were in 2 different columns, but now I have to start a new project and I was wondering how best to store it. The 2 different columns gave me a bit of trouble and sometimes slowed performance down. Please note this very important thing:
A very important part of the public part of the site will be an advanced search, and it WILL search for the full name in about 200k records.
So, what do you suggest? 2 columns or 1? I am inclined towards the 1 column solution because I cannot find any advantages in using 2, but maybe I am wrong?
EDIT
Thank you for the answers. The only reason for this question was for the performance issue, I need all the extra boost I can get.
The point of a relational database is to relate data. If you store a full name (e.g. John Smith) in a single field, you lose the ability to easily separate out the first and last names.
If you store them in separate fields, you can VERY easily rejoin them into a single full name, but it's quite difficult to reliably pull a name apart into separate first + last name components.
Two columns is much more flexible. Eg.
Do you ever want to sort by surname?
Do you ever want to address the person formally (eg: Dear Mr Cosmin)?
Will you ever want to search by surname and not forename, or vice versa?
200K records is a trivial amount in any properly designed database.
You may find this an interesting read on the subject of names
With two columns, you can sort by surname without having to do expensive substring operations in your select statement. It is easy to do a CONCAT to get the full name in situations that call for it, but harder to parse the last name out of names such as "John Doe-Smith" or "John Doe III".
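The asymmetry is easy to see in a few lines of PHP. Both helpers below are hypothetical and only illustrate the point: joining is trivial, while a naive split that treats the last word as the surname goes wrong on one of the names mentioned above.

```php
<?php
// Joining two columns into a display name is trivial.
function fullName(string $first, string $last): string
{
    return trim($first . ' ' . $last);
}

// A naive split (hypothetical helper) that treats the last word as the surname.
function naiveSplit(string $full): array
{
    $parts = explode(' ', trim($full));
    $last = array_pop($parts);
    return ['first' => implode(' ', $parts), 'last' => $last];
}

echo fullName('John', 'Doe-Smith'), "\n";

// The surname comes out as "III", which is wrong.
print_r(naiveSplit('John Doe III'));
```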
Using 2 columns helps you in:
easy sorting data by surname
communication with user by name (eg. "Hello Michael" used on many websites etc.)
displaying a lot of data in multiple columns (you can display only surname when you have no space on screen)
Names stored in the format "Surname Name" are still easy to sort, but may be seen as inelegant in some countries.
In my opinion, it's better to design it as two different columns, because that gives you various ways to handle the record. As for the performance issue, add an index on the two columns to make searching faster.
There are times when you want to search for John Doe and have the reversed form, Doe John, still match. That's one advantage of having separate fields for the name.
Sample design of schema,
CREATE TABLE PersonList
(
ID INT AUTO_INCREMENT,
FirstName VARCHAR(25),
LastName VARCHAR(25),
-- other fields here,
CONSTRAINT tb_pk PRIMARY KEY (ID),
CONSTRAINT tb_uq UNIQUE (FirstName, LastName)
)

json column vs multiple columns

I don't even know if calling it a serialized column is right, but let me explain. I have a users table and I want to store each user's phone numbers (cellphone, home, office, etc.). I was thinking of making a column for each number type, but then another idea came to mind: what if I save a JSON string in a single column? That way I will never have a column that will probably never be used, and I can turn the string into a PHP array when reading the data from the database. I would like to hear the pros and cons of this practice. Maybe it is just a bad idea, but first I want to know what other people have to say about it.
thanks
Short Answer, Multiple columns.
Long Answer:
For the love of all that is holy in the world, please do not store multiple data sets in a single text column.
I am assuming you will have a table that will either be
+------------------------------+ +----------------------+
| User | cell | office | home | OR | User | JSON String |
+------------------------------+ +----------------------+
First, I will say that neither of these is the best solution, but if you had to pick between the two, the first is better. There are a couple of reasons, but mainly the ability to modify and query specific values is really important. Think about the algorithm to modify the second option.
SELECT `JSON` FROM `table` WHERE `User` = ?
Then you have to do a search and replace in either your server-side or client-side language.
Finally, you have to write the JSON string back.
This solution totals 2 queries and a search-and-replace algorithm. No good!
Now think about the first solution.
SELECT * FROM `table` WHERE `User` = ?
Then you can do a simple JSON encode to send it down
To modify you only need one Query.
UPDATE `table` SET `cell` = ? WHERE `User` = ?
To update more than one, it's again a simple single query.
UPDATE `table` SET `cell` = ?, `home` = ? WHERE `User` = ?
This is clearly better, but it is not best.
There is a third solution. Say you want a user to be able to insert any number of phone numbers.
Let's use a relation table for that, so now you have two tables.
+-------------------------------------+
+---------+ | Phone |
| Users | +-------------------------------------+
+---------+ | user_name| phone_number | type |
| U_name | +-------------------------------------+
+---------+
Now you can query all the phone numbers of a user via a join, with something like this:
SELECT Users.*, Phone.* FROM Phone JOIN Users ON Phone.user_name = Users.U_name WHERE Users.U_name = ?
Inserts are just as easy and type checking is easy too.
Remember, this is a simple example, but SQL really provides a ton of power for your data structure. You should use it rather than avoid it.
I would only do this with non-essential data, for example, the user's favorite color, favorite type of marsupial (obviously 'non-essential' is for you to decide). The problem with doing this for essential data (phone number, username, email, first name, last name, etc) is that you limit yourself to what you can accomplish with the database. These include indexing fields, using ORDER BY clauses, or even searching for a specific piece of data. If later on you realize you need to perform any of these tasks it's going to be a major headache.
Your best bet in this situation is using a relational table for one-to-many objects, e.g. UserPhoneNumbers. It would have 3 columns: user_id, phone_number, and type. The user_id lets you link the rows in this table to the appropriate User table row, the phone_number is self-explanatory, and the type could be 'home', 'cell', 'office', etc. This lets you still perform the tasks I mentioned above, and it also has the added benefit of not wasting space on empty columns, as you only add rows to this table as you need to.
I don't know how familiar you are with MySQL, but if you haven't heard of database normalization and query JOINs, now is a good time to start reading up on them :)
Hope this helps.
If you work with JSON, there are more elegant options than MySQL. I would recommend using either another database that works better with JSON, like MongoDB, or a wrapper for SQL like Persevere, http://www.persvr.org/Documentation (see "Perstore")
I'm not sure what the advantages of this approach would be. You say "so, i will never have a column that probably will never be used..." What I think you meant was (in your system) that sometimes a user may not have a value for each type of phone number available, and that being the case, why store records with empty columns?
Storing records with some empty columns is not necessarily bad. However, if you wanted to normalize your database, you could have a separate table for user_phonenumber, and create a 1:many relationship between user and user_phonenumber records. The user_phonenumber table would basically have four columns:
id (primary key)
userid (foreign key to user table)
type (e.g. cellphone, home, office, etc.)
value (the phone number)
Constraints would be that id is a primary key, userid is a foreign key for user.id, and type would be an enum (of all possible phone number types).

Is there a way (or best practice) to markup the head (<th> equivalent) of a CSV document?

I am exporting data from a database using PHP to convert it into a CSV. I figured it'd be useful to provide the first row with a title (similar to the <th> element in HTML) so the end user would understand the column's meanings. Example
=============
| id | name |
=============
| 0 | tim |
| 1 | tom |
=============
Which would look like this as a CSV
id, name
0, tim
1, tom
Is there a way to mark up the first row's columns, or do anything differently, so that programs that often read CSVs (for example Microsoft Excel) will treat it accordingly? I.e. is there a semantic hook to inform the client (possibly Excel, but not restricted to it) that this is a column header?
Nope. And to make it even more fun, there's nothing that says that the header line has to be present at all. Good times, good times...
One key thing to avoid with CSVs is using ID as the first characters in the file. Lowercase id or double-quoted "ID" is acceptable, but if Excel comes across upper-case ID at the start of the file, it tries to open the file as a SYLK file and fails.
The best practice I can think of myself is to make the headings the first row only. But this is obviously common sense.
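Since the export is done in PHP, writing the headings as the first row could look roughly like this. It is a sketch using the sample data from the question; the php://memory stream is only there to keep the example self-contained, and a real export would write to a file or to the response instead.

```php
<?php
// Sample data from the question; real rows would come from the database.
$header = ['id', 'name'];
$rows = [[0, 'tim'], [1, 'tom']];

$out = fopen('php://memory', 'w+');

// The header is just the first row; CSV has no separate header markup.
fputcsv($out, $header, ',', '"', '\\');
foreach ($rows as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}

rewind($out);
$csv = stream_get_contents($out);
fclose($out);

echo $csv;
```

Keeping the first heading as lowercase id also steers clear of the SYLK problem described above.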
