preferred way to make a unique column id in sql database - php

Hi all, I just wanted to know if there is a preferred way of creating a unique id within an SQL table. So far I have tried auto-incrementing the number, but this started to cause problems for future planning. I then thought about using PHP's rand(), turning the result into a string, and inserting that into the database, but there is a high chance of two numbers being the same given the amount of data that will be in the database.
I was just wondering if there are any suggestions for a preferred way to create a unique column id.
I'm open to all suggestions. There are just a couple of requirements: the id needs to be easy enough to use within other tables in inner joins and left outer joins, and it will also be used as a file name for a download section and an upload section.

Use a GUID. On Windows, PHP's COM extension provides:
string com_create_guid ( void )
Or use PHP's built-in uniqid():

<?php
echo uniqid();
?>
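Note that uniqid() is based on the current time in microseconds, so it is not guaranteed to be collision-free. As a minimal sketch (assuming PHP 7+ for random_bytes(); the function name is just an example), a random version 4 UUID can be generated like this:
<?php
// Sketch: random UUID v4 from 16 cryptographically secure random bytes.
function uuid_v4()
{
    $bytes = random_bytes(16);
    // Set the version (4) and RFC 4122 variant bits.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    // Format as the usual 8-4-4-4-12 hex groups.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
echo uuid_v4(); // e.g. 3f8a2c1d-9e4b-4c6a-8d2f-1b7e5a9c0d3e
?>
This also fits the other requirements in the question: it joins like any other string key and is safe to use as a file name.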


Many to many vs one row

I'm interested in how and why a many-to-many relationship is better than storing the information in one row.
Example: I have two tables, Users and Movies (very big data). I need to establish a "viewed" relationship between them.
I have two ideas:
Make another column in the Users table called "views", where I store the ids of the movies this user has viewed as a string, for example: "2,5,7...". Then I process this information in PHP.
Make a new table users_movies (many to many), with columns user_id and movie_id. A row with user_id=5 and movie_id=7 means that user 5 has viewed movie 7.
I'm interested in which of these methods is better and WHY. Please consider that the data is quite big.
The second method is better in just about every way. Not only will you utilize your DB's indexes to find records faster, it will also make modifications far, far easier.
Approach 1) could answer the question "Which movies has user X viewed?" with SQL along the lines of "...FIND_IN_SET(movie_id, user_movielist)...". But the other way round ("Which users have viewed movie X?") won't work at the SQL level.
That's why I would always go for approach 2): a clear, normalized structure where both directions are simple joins.
It's just about the needs you have. If you need performance then you must accept redundancy of the information and add a column. If your main goal is to respect the normalization paradigm then you should not have redundancy at all.
When I have to make this type of choice I try to weigh the space lost to redundancy against the frequency and performance of the query of interest.
A few more thoughts.
In your first situation, if you look up a particular user you can easily get the list of ids for the films they have seen. But you would then need a separate query to get the details, such as the titles of those movies. This might be one query using IN with the list of ids, or one query per film id. Either way it is inefficient and clunky.
With MySQL there is a possible fudge to do the join in this situation using the FIND_IN_SET() function (although a downside is that you are straying into non-standard SQL). You could join your table of films to the users using ON FIND_IN_SET(film.id, users.film_id) > 0. However this is not going to use an index for the join, and it involves a function (which, while quick for what it does, will be slow when performed on thousands of rows).
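Concretely, a sketch of that join (film and users as named in the snippet above):
SELECT film.*
FROM users
JOIN film ON FIND_IN_SET(film.id, users.film_id) > 0
WHERE users.id = 5   -- scans the whole film table; no index can be used for the join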
If you wanted to find all the users who had viewed any film a particular user had viewed, it is a bit more difficult. You can't just use FIND_IN_SET, as it requires a single string and a comma-separated list. As a single query you would need to join the particular user to the film table to get a lot of intermediate rows, and then join that back against the users again (using FIND_IN_SET) to find the other users.
There are ways in SQL to split up a comma separated list of values, but they are messy and anyone who has to maintain such code will hate it!
These are all fudges. With the 2nd solution these things are easy to do, and any resulting joins can easily use indexes (possibly whole queries can be answered from indexes without touching the actual data).
A further issue with the first solution is data integrity. You will have to manually check that a film doesn't appear twice for a user (with the 2nd solution this can easily be enforced using a unique key). You also cannot just add a foreign key to ensure that any film id stored for a user actually exists. Further, you will have to manually ensure that nothing enters a character string in your delimited list of ids.
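As a sketch of the 2nd solution's schema (assuming InnoDB and existing users(id) and movies(id) tables):
CREATE TABLE users_movies (
    user_id  INT NOT NULL,
    movie_id INT NOT NULL,
    PRIMARY KEY (user_id, movie_id),   -- a film cannot appear twice for a user
    FOREIGN KEY (user_id)  REFERENCES users (id),
    FOREIGN KEY (movie_id) REFERENCES movies (id)
);
-- Both directions are now simple, index-backed joins:
SELECT m.* FROM movies m JOIN users_movies um ON um.movie_id = m.id WHERE um.user_id = 5;
SELECT u.* FROM users u JOIN users_movies um ON um.user_id = u.id WHERE um.movie_id = 7;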

Reasons not to use GROUP_CONCAT?

I just discovered the amazingly useful MySQL function GROUP_CONCAT. It appears so useful and over-simplifying that I'm actually afraid of using it, mainly because it's been quite some time since I started in web programming and I've never seen it used anywhere. A sample of awesome usage would be the following:
Table clients holds clients (you don't say...), one row per client, with unique IDs.
Table currencies has 3 columns: client_id, currency and amount.
Now if I wanted to get user 15's name from the clients table along with his balances, with the "old" method of array overwriting I would have to use the following SQL:
SELECT id, name, currency, amount
FROM clients LEFT JOIN currencies ON clients.id = client_id
WHERE clients.id = 15
Then in PHP I would have to loop through the result set and do an array overwrite (which I'm really not a big fan of, especially with massive result sets) like:
$result = array();
foreach ($stmt->fetchAll() as $row) {
    $result[$row['id']]['name'] = $row['name'];
    $result[$row['id']]['currencies'][$row['currency']] = $row['amount'];
}
However with the newly discovered function I can use this
SELECT id, name, GROUP_CONCAT(currency) as currencies, GROUP_CONCAT(amount) as amounts
FROM clients LEFT JOIN currencies ON clients.id = client_id
WHERE clients.id = 15
GROUP BY clients.id
Then on the application level things are so awesome and pretty:
$results = $stmt->fetchAll();
foreach ($results as $k => $v) {
    $results[$k]['currencies'] = array_combine(
        explode(',', $v['currencies']),
        explode(',', $v['amounts'])
    );
}
The question I would like to ask is: are there any drawbacks to using this function, in performance or anything else at all? To me it just looks like pure awesomeness, which makes me think there must be a reason people don't use it more often.
EDIT:
I want to ask, eventually, what are the other options besides array overwriting to end up with a multidimensional array from a MySQL result set, because if I'm selecting 15 columns it's a really big pain in the neck to write that beast..
Using GROUP_CONCAT() usually invokes the group-by logic and creates temporary tables, which are usually a big negative for performance. Sometimes you can add the right index to avoid the temp table in a group-by query, but not in every case.
As @MarcB points out, the default length limit of a group-concatenated string is pretty short, and many people have been confused by truncated lists. You can increase the limit with group_concat_max_len.
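For reference, the limit can be raised per session like this (the value shown is just an illustration; the default is only 1024):
SET SESSION group_concat_max_len = 1000000;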
Exploding a string into an array in PHP does not come for free. Just because you can do it in one function call in PHP doesn't mean it's the best for performance. I haven't benchmarked the difference, but I doubt you have either.
GROUP_CONCAT() is a MySQLism. It is not supported widely by other SQL products. In some cases (e.g. SQLite), they have a GROUP_CONCAT() function, but it doesn't work exactly the same as in MySQL, so this can lead to confusing bugs if you have to support multiple RDBMS back-ends. Of course, if you don't need to worry about porting, this is not an issue.
If you want to fetch multiple columns from your currencies table, then you need multiple GROUP_CONCAT() expressions. Are the lists guaranteed to be in the same order? That is, does the third field in one list correspond to the third field in the next list? The answer is no -- not unless you specify the order with an ORDER BY clause inside the GROUP_CONCAT().
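A sketch of the question's query with both lists forced into the same order (sorting both on currency):
SELECT clients.id, name,
       GROUP_CONCAT(currency ORDER BY currency) as currencies,
       GROUP_CONCAT(amount ORDER BY currency) as amounts
FROM clients LEFT JOIN currencies ON clients.id = client_id
WHERE clients.id = 15
GROUP BY clients.id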
I usually favor your first code format, use a conventional result set, and loop over the results, saving to a new array indexed by client id, appending the currencies to an array. This is a straightforward solution, keeps the SQL simple and easier to optimize, and works better if you have multiple columns to fetch.
I'm not trying to say GROUP_CONCAT() is bad! It's really useful in many cases. But trying to make any one-size-fits-all rule to use (or to avoid) any function or language feature is simplistic.
The biggest problem that I see with GROUP_CONCAT is that it is highly specific to MySQL: if you want to port your code to run against any other platform, you would have to rewrite every query that uses GROUP_CONCAT. For example, your first query is a lot more portable; you can probably run it against any major RDBMS engine without changing a single character.
If you are fine with working only with MySQL (say, because you are writing a tool that is meant to be specific to MySQL), the queries with GROUP_CONCAT would probably go faster, because the RDBMS does more of the work for you, saving on the size of the data transfer.

Is this the optimal MySQL database schema for a website that can become huge?

I'm sketching out a database layout for a website that has the potential to become huge, with hundreds of queries a minute.
I was thinking about doing the following:
user table
id
name
(few more fields)
Pages (this one will become the biggest table)
id
title
img
text
restaurant (this is the column that connects the pages to the user table; I was planning on creating an index on it to increase speed)
So I'm wondering if creating an index on the 'restaurant' column will increase the speed of my queries, or if there is any other way to speed things up?
Thanks in advance!
If you need to do some query like :
select *
from pages
where restaurant = ...
Or like :
select *
from user
inner join pages on pages.restaurant = user.id
where user.name = '...'
Or any other condition on the restaurant column, then you'll probably want to add an index on that column, to avoid scanning all rows of the pages table.
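For example (the index name is just illustrative):
CREATE INDEX idx_pages_restaurant ON pages (restaurant);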
But note that useful/necessary indexes will almost always depend on the kind of queries you'll be doing.
Which means that it's not quite possible to accurately guess which indexes you'll need: first, you need to know how you will access your data.
Note: you should read the How MySQL Uses Indexes section of MySQL's manual; it contains stuff that's interesting to know ;-)
As a test, you can always run your query in your preferred tool and add EXPLAIN in front. This will show you what indices are being used and/or which temporary tables had to be created etc.
EXPLAIN select *
from pages
where restaurant = ...
If you're using the InnoDB storage engine, you should not just use 'an index' but a FOREIGN KEY. That way you also reduce potential integrity problems.
Suggestion: do not use restaurant as a name. Add some more tables and it will become difficult to keep track of what references what. Why not call it user_id? (This is a matter of personal preference, though.)
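A sketch combining both suggestions (assumes InnoDB; the rename is optional):
ALTER TABLE pages CHANGE restaurant user_id INT NOT NULL;
ALTER TABLE pages ADD CONSTRAINT fk_pages_user
    FOREIGN KEY (user_id) REFERENCES user (id);
-- InnoDB also creates an index on user_id here if one does not already exist.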

Optimize Select Query from a table with Millions Rows

I am developing a website with the self-hosted WordPress CMS.
On one of the pages, I run a function that queries the WordPress database to check whether a post has already been posted; I compare the title to check it.
Here is my query:
$wpdb->get_row("SELECT id FROM wp_posts WHERE post_title = '" . $title . "'", 'ARRAY_A');
So I am checking whether $title has been posted or not, but I am afraid that if the number of posts grows, let's say to 1 million posts, the query will be very slow.
Any suggestion on how to make this query faster? I have heard about CREATE INDEX and MySQL caching but I don't understand how to implement them. Any explanations and reference suggestions will be highly appreciated.
Try this:
CREATE INDEX IX_wp_posts_post_title ON wp_posts (post_title)
The creation of the index will take a long time but afterward your queries should be close to instant.
Create indexes on your tables based on the columns most commonly used when querying data, such as post_title here.
Additionally, because you are building the SQL SELECT statement on the fly like this, you are wide open to SQL injection attacks; you should escape the string and preferably use parameterized query calls.
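A sketch of the same lookup parameterized with WordPress's $wpdb->prepare() (%s is the placeholder for the title string):
<?php
// prepare() escapes and quotes $title before it reaches MySQL.
$row = $wpdb->get_row(
    $wpdb->prepare("SELECT ID FROM {$wpdb->posts} WHERE post_title = %s", $title),
    ARRAY_A
);
?>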
Not quite sure what you are trying to achieve here. It doesn't seem like the end of the world to have two posts with the same title.
More concerning, though, is that your code is totally SQL-injectable. Read up on that, and use parameterised queries.
Creating an index is easy.
create index myindex on mytable ( columnname );
This will help selects... but if you really have millions of rows, you might be better off getting some proper database advice; you may need to partition your data.
If you are checking to see whether a particular post is already in your database, you should use the post's id to test instead of its title: one, because it is a guaranteed unique identifier (assuming it is the primary key), and two, because the query will be able to search for it far, far faster.

Count line breaks in a field and order by

I have a field in a recipes table whose values were inserted using mysql_real_escape_string. I want to count the number of line breaks in that field and order the records by this number.
P.S. the field is called Ingredients.
Thanks everyone
This would do it:
SELECT *, LENGTH(Ingredients) - LENGTH(REPLACE(Ingredients, '\n', '')) as Count
FROM Recipes
ORDER BY Count DESC
The way I am getting the number of line breaks is a bit of a hack, however, and I don't think there's a better way. I would recommend keeping a column that stores the number of line breaks if performance is a huge issue. For medium-sized data sets, though, I think the above should be fine.
If you wanted to have a cache column as described above, you would do:
UPDATE Recipes
SET IngredientAmount = LENGTH(Ingredients) - LENGTH(REPLACE(Ingredients, '\n', ''))
After that, whenever you are updating/inserting a new row, you could calculate the amount (probably with PHP) and fill in this column beforehand. Or, if you're into that sort of thing, try out triggers.
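A sketch of the trigger route (MySQL syntax, column names as above; you would want a matching BEFORE UPDATE trigger as well):
CREATE TRIGGER recipes_count_linebreaks
BEFORE INSERT ON Recipes
FOR EACH ROW
SET NEW.IngredientAmount = LENGTH(NEW.Ingredients) - LENGTH(REPLACE(NEW.Ingredients, '\n', ''));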
I'm assuming a lot here, but from what I'm reading in your post, you could change your database structure a little bit, and both solve this problem and open your dataset up to more interesting uses.
If you separate ingredients into their own table, and use a linking table to index which ingredients occur in which recipes, it'll be much easier to be creative with data manipulation. It becomes easier to count ingredients per recipe, to find similarities in recipes, to search for recipes containing sets of ingredients, etc. Your data would also be more normalized and smaller (storing one global list of all ingredients vs. storing a set for each recipe).
If you're using a single text entry field to enter ingredients for a recipe now, you could break that input up by lines and use each line as an ingredient when saving to the database. You can use something like PHP's built-in levenshtein() or similar_text() functions to deal with misspelled ingredient names and keep the data as normalized as possible without having to hand-groom your [users'] data entry too much.
This is just a suggestion, take it as you like.
You're going a bit beyond the capabilities and intent of SQL here. You could write a stored procedure to scan the string and return the number and then use this in your query.
However, I think you should revisit the design of whatever is inserting the Ingredients so that you avoid searching the strings of every row whenever you run this query. Add a 'num_linebreaks' column, calculate the number of line breaks, and set this column when you're adding the Ingredients.
If you've no control over the app that's doing the insertion, then you could use a trigger to keep num_linebreaks up to date.
Got it, thanks. The PHP code looks like:
$check = explode("\r\n", $_POST['ingredients']);
$lines = count($check);
So how could I update all the information in the table, setting Ingred_count based on the Ingredients field, in one fell swoop for previous records?
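Assuming the column is named Ingred_count as in the comment above, a one-time backfill along the lines of the earlier UPDATE might look like:
UPDATE Recipes
SET Ingred_count = LENGTH(Ingredients) - LENGTH(REPLACE(Ingredients, '\n', ''));
-- Counting '\n' also works for Windows-style '\r\n' breaks (one '\n' per break);
-- add 1 if you want the number of lines rather than the number of breaks.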
