Best MySQL datatype

I am trying to figure out a good way to store a series of points in a database.
One way I thought of is to use a text column and a delimiter to separate the points, so it would look like this in the db:
+----------------------------+
| x_points | 1:2:3:4:5... |
| y_points | 1:2:3:4:5... |
+----------------------------+
Then the web app would pull the points and plot them on a canvas.
Is there a better way to store points to a line in a db?
For complex functions there could be 1000 points per graph, plus a delimiter for each point, so that adds up to a boatload of characters.
Because of the comments, I will try to elaborate more. I am using a canvas to plot functions that a user inputs. The user will also be able to draw on the graph, and I would like to store the line data of the drawing as well. I figured both could be stored the same way, and the points only need to be computed once.
A sample scenario: a user plots y=x^2 and then circles the y-intercept. They could then link to that canvas, and it would redraw the graph and their circle around the y-intercept. Of course this is a simplistic example, but I cannot figure out how best to store the points on the canvas.
Hope this helps more.

This is called a one-to-many relationship. You have one shape and many actual points. You would usually want to put the points into a separate table, like this.
+----------------------+   +-----------------+
| Shape                |   | Point_Values    |
+----------------------+   +-----------------+
| shape_id (INT)       |   | shape_id (INT)  |
| shape_name (VARCHAR) |   | point_x (INT)   |
+----------------------+   | point_y (INT)   |
                           +-----------------+
To make a new shape, insert a new row into the Shape table. Then insert one or more rows into the Point_Values table. When you want to get the values back, use a join, like this:
SELECT s.shape_name, v.point_x, v.point_y
FROM Shape s
JOIN Point_Values v ON s.shape_id = v.shape_id
WHERE s.shape_id = 5
This has the advantage of being very flexible. Each shape can have 0 or more points, and because every row stores exactly one x and one y value, the number of x and y values is implicitly always equal.
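As a sketch of the insert step, assuming PHP with PDO and the Point_Values table above, a hypothetical helper can turn a list of [x, y] pairs into a single multi-row INSERT with bound parameters, so storing a 1000-point graph takes one statement rather than 1000:

```php
<?php
// Hypothetical helper (not from the answer): builds one multi-row INSERT
// for the Point_Values table plus the values to bind with PDO.
// Each point is an [x, y] pair, so x and y always stay paired.
function buildPointInsert(int $shapeId, array $points): array
{
    if ($points === []) {
        // "INSERT ... VALUES" with no rows is invalid SQL
        throw new InvalidArgumentException('No points to insert');
    }
    $rows = [];
    $params = [];
    foreach ($points as [$x, $y]) {
        $rows[] = '(?, ?, ?)';
        array_push($params, $shapeId, $x, $y);
    }
    $sql = 'INSERT INTO Point_Values (shape_id, point_x, point_y) VALUES '
        . implode(', ', $rows);
    return [$sql, $params];
}

// Usage, assuming $pdo is an open PDO connection:
// [$sql, $params] = buildPointInsert(5, [[1, 1], [2, 4], [3, 9]]);
// $pdo->prepare($sql)->execute($params);
```

SQL tables have no inherent row order, so if the drawing order of a line matters, you would also need a sequence column to ORDER BY; such a column is an addition, not part of the schema above.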

Related

How can I implement a language system in my mysql database?

I am creating a video player application with php and mysql.
The application has videos that are gathered in playlists like this:
Playlists table:
+----+--------------+------+
| id | name         | lang |
+----+--------------+------+
|  1 | Introduction |    1 |
+----+--------------+------+
Videos table:
+----+--------+-------------+
| id | name   | playlist_id |
+----+--------+-------------+
|  1 | Video1 |           1 |
|  2 | Video2 |           1 |
+----+--------+-------------+
It worked fine until now, but I need to build a search that finds videos by name and language.
I thought of creating another field called lang in the videos table, but then I realized that this might violate the database normalization rules, because I would be repeating unnecessary information.
What can I do to select the videos without creating another field? Or do I need to create a new one with the repeated information?
EDIT:
A LEFT JOIN of both tables is not a solution, because in the future I may add a new table that links to playlists, such as courses.

You can add a LANGUAGE_ID column to the Videos table as a foreign key referencing Playlists.lang.
Try the solution above.
Hope this helps.

You need to be clear about which attribute you want to assign to which entity (playlist, video or possibly course). You can assign language ids to both playlists and videos independently. Who is to say that you are not allowed to include a video with a language id of 2 in a playlist that carries a language id of 1? (This could, for example, be a video in a foreign language that you want to appear in a playlist in your own language.)
To search for suitable items you should then definitely use some kind of join (on video.playlist_id=playlist.id). The resulting table will contain both video.language_id and playlist.language_id. As I have tried to explain above, this is not redundant information, since they refer to different entities.
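A sketch of such a join-based search, assuming PHP with PDO and lowercase table names playlists and videos with the columns shown in the question. The helper name searchVideos() is hypothetical, and it filters on the playlist's language only:

```php
<?php
// Hypothetical helper: search videos by name and by the language of the
// playlist they belong to. The language is stored only on playlists;
// the JOIN supplies it at query time, so videos needs no copy of it.
function searchVideos(PDO $pdo, string $term, int $langId): array
{
    $sql = 'SELECT v.id, v.name
              FROM videos v
              JOIN playlists p ON p.id = v.playlist_id
             WHERE p.lang = ?
               AND v.name LIKE ?
             ORDER BY v.name';
    $stmt = $pdo->prepare($sql);
    // Note: % and _ typed by the user are not escaped in this sketch
    $stmt->execute([$langId, '%' . $term . '%']);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

If videos later get their own language id, as suggested above, the WHERE clause can test v.language_id, p.lang, or both, depending on which entity the search is really about.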

Group coordinates by proximity to each other

I'm building a REST API, so the answer can't include Google Maps or JavaScript stuff.
In our app, we have a table containing posts that looks like this:
ID | latitude | longitude | other_stuff
1 | 50.4371243 | 5.9681102 | ...
2 | 50.3305477 | 6.9420498 | ...
3 | -33.4510148 | 149.5519662 | ...
We have a view with a map that shows all the posts around the world.
Hopefully we will have a lot of posts, and it would be ridiculous to show thousands and thousands of markers on the map. So we want to group them by proximity, so that we end up with something like 2-3 markers per continent.
To be clear, we need marker clustering like the image shown at https://github.com/googlemaps/js-marker-clusterer.
I've done some research and found that k-means seems to be part of the solution.
As I am really bad at math, I tried a couple of PHP libraries, like this one: https://github.com/bdelespierre/php-kmeans, which seems to do a decent job.
However, there is a drawback: I have to process the whole table each time the map is loaded. Performance-wise, it's awful.
So I would like to know if someone has already dealt with this problem, or if there is a better solution.

I kept searching and found an alternative to k-means: geohash.
Wikipedia explains what it is better than I can: Wiki geohash
But to summarize: the world map is divided into a grid of 32 cells, and each cell is given an alphanumeric character.
Each cell is in turn divided into 32 cells, and so on, for 12 levels.
So if I do a GROUP BY on the first letter of the hash, I get my clusters for the lowest zoom level. If I want more precision, I just need to group by the first N letters of the hash.
So all I've done is add one field to my table and generate the hash corresponding to each row's coordinates:
ID | latitude | longitude | geohash | other_stuff
1 | 50.4371243 | 5.9681102 | csyqm73ymkh2 | ...
2 | 50.3305477 | 6.9420498 | p24k1mmh98eu | ...
3 | -33.4510148 | 149.5519662 | 8x2s9674nd57 | ...
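To show what generating the hash involves, here is a from-scratch sketch of standard geohash encoding in PHP. The function name geohashEncode() is hypothetical and is not the API of the library mentioned in the PS, so compare its output with your library before mixing the two:

```php
<?php
// Minimal geohash encoder (a from-scratch illustration, not the API of the
// library mentioned in the PS). Bits alternate between longitude and
// latitude, starting with longitude; every 5 bits become one base-32 char.
function geohashEncode(float $lat, float $lon, int $precision = 12): string
{
    if ($precision < 1) {
        throw new InvalidArgumentException('precision must be at least 1');
    }
    $base32 = '0123456789bcdefghjkmnpqrstuvwxyz';
    $latRange = [-90.0, 90.0];
    $lonRange = [-180.0, 180.0];
    $hash = '';
    $bits = 0;         // bits collected for the current character
    $value = 0;        // value of the current 5-bit group
    $isLonBit = true;  // even bits refine longitude, odd bits latitude

    while (strlen($hash) < $precision) {
        if ($isLonBit) {
            $mid = ($lonRange[0] + $lonRange[1]) / 2;
            if ($lon >= $mid) {
                $value = ($value << 1) | 1;   // upper half: bit 1
                $lonRange[0] = $mid;
            } else {
                $value <<= 1;                 // lower half: bit 0
                $lonRange[1] = $mid;
            }
        } else {
            $mid = ($latRange[0] + $latRange[1]) / 2;
            if ($lat >= $mid) {
                $value = ($value << 1) | 1;
                $latRange[0] = $mid;
            } else {
                $value <<= 1;
                $latRange[1] = $mid;
            }
        }
        $isLonBit = !$isLonBit;
        if (++$bits === 5) {
            $hash .= $base32[$value];
            $bits = 0;
            $value = 0;
        }
    }
    return $hash;
}
```

Because each extra character only subdivides the previous cell, a shorter hash of the same point is always a prefix of the longer one, which is what makes grouping by the first N characters work.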
Now, if I want to get my clusters, I just have to run a simple query (selecting the prefix too, so each count can be matched to its cell):
SELECT SUBSTRING(geohash, 1, 2) AS cell, COUNT(*) AS nb_markers FROM mtable GROUP BY cell;
In SUBSTRING, 2 is the level of precision (the prefix length) and must be between 1 and 12.
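A sketch of that query wrapped in PHP with PDO, assuming the mtable layout above. The helper name fetchClusters() is hypothetical, and the AVG() columns are an addition that could serve as each marker's position. SUBSTR is used because it is accepted by MySQL as well as SQLite:

```php
<?php
// Hypothetical helper: one row per geohash cell with its marker count and
// the average position of the posts in it (a possible marker position).
function fetchClusters(PDO $pdo, int $precision): array
{
    if ($precision < 1 || $precision > 12) {
        throw new InvalidArgumentException('precision must be between 1 and 12');
    }
    // $precision is a validated int, so it is safe to format into the SQL
    $sql = sprintf(
        'SELECT SUBSTR(geohash, 1, %d) AS cell,
                COUNT(*) AS nb_markers,
                AVG(latitude) AS lat,
                AVG(longitude) AS lng
           FROM mtable
          GROUP BY cell
          ORDER BY cell',
        $precision
    );
    return $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}
```

Grouping by prefix is a fixed grid rather than true proximity: two posts a few meters apart can sit on either side of a cell border and land in different clusters. For a zoomed-out overview that is usually acceptable.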
PS: the library I used to generate my hash

"horizontal" vs. "vertical" table design, SQL

Apologies if this has been covered thoroughly in the past - I've seen some related posts but haven't found anything that satisfies me with regards to this specific scenario.
I've recently been looking over a relatively simple game with around 10k players. In the game you can catch and breed pets that have certain attributes (e.g. wings, horns, manes). There's currently a table in the database that looks something like this:
-------------------------------------------------------------------------------
| pet_id | wings1 | wings1_hex | wings2 | wings2_hex | horns1 | horns1_hex | ...
-------------------------------------------------------------------------------
| 1 | 1 | ffffff | NULL | NULL | 2 | 000000 | ...
| 2 | NULL | NULL | NULL | NULL | NULL | NULL | ...
| 3 | 2 | ff0000 | 1 | ffffff | 3 | 00ff00 | ...
| 4 | NULL | NULL | NULL | NULL | 1 | 0000ff | ...
etc...
The table goes on like that and currently has 100+ columns, but in general a single pet will only have around 1-8 of these attributes. A new attribute is added every 1-2 months which requires table columns to be added. The table is rarely updated and read frequently.
I've been proposing that we move to a more vertical design scheme for better flexibility as we want to start adding larger volumes of attributes in the future, i.e.:
----------------------------------------------------------------
| pet_id | attribute_id | attribute_color | attribute_position |
----------------------------------------------------------------
| 1 | 1 | ffffff | 1 |
| 1 | 3 | 000000 | 2 |
| 3 | 2 | ffffff | 1 |
| 3 | 1 | ff0000 | 2 |
| 3 | 3 | 00ff00 | 3 |
| 4 | 3 | 0000ff | 1 |
etc...
The old developer has raised concerns that this will create performance issues, as users very frequently search for pets with specific attributes (e.g. must have these attributes, must have at least one in this colour or position, must have > 30 attributes). Currently the search is quite fast because no JOINs are required, but introducing a vertical table would presumably mean an additional join for every attribute searched and would also triple the number of rows or so.
The first part of my question: does anyone have recommendations regarding this? I'm not particularly experienced with database design or optimisation.
I've run tests for a variety of cases, but they've been largely inconclusive: the times vary quite significantly for all of the queries that I ran (between half a second and 20+ seconds). So the second part of my question is whether there's a more reliable way of profiling query times than using microtime(true) in PHP.
Thanks.

This is called the Entity-Attribute-Value-Model, and relational database systems are really not suited for it at all.
To quote someone who deems it one of the five errors not to make:
So what are the benefits that are touted for EAV? Well, there are none. Since EAV tables will contain any kind of data, we have to PIVOT the data to a tabular representation, with appropriate columns, in order to make it useful. In many cases, there is middleware or client-side software that does this behind the scenes, thereby providing the illusion to the user that they are dealing with well-designed data.
EAV models have a host of problems.
Firstly, the massive amount of data is, in itself, essentially unmanageable.
Secondly, there is no possible way to define the necessary constraints -- any potential check constraints will have to include extensive hard-coding for appropriate attribute names. Since a single column holds all possible values, the datatype is usually VARCHAR(n).
Thirdly, don't even think about having any useful foreign keys.
Finally, there is the complexity and awkwardness of queries. Some folks consider it a benefit to be able to jam a variety of data into a single table when necessary -- they call it "scalable". In reality, since EAV mixes up data with metadata, it is a lot more difficult to manipulate data even for simple requirements.
The solution to the EAV nightmare is simple: Analyze and research the users' needs and identify the data requirements up-front. A relational database maintains the integrity and consistency of data. It is virtually impossible to make a case for designing such a database without well-defined requirements. Period.
The table goes on like that and currently has 100+ columns, but in general a single pet will only have around 1-8 of these attributes.
That looks like a case for normalization: Break the table into multiple, for example one for horns, one for wings, all connected by foreign key to the main entity table. But do make sure that every attribute still maps to one or more columns, so that you can define constraints, data types, indexes, and so on.

Do the join. The database was specifically designed to support joins for your use case. If there is any doubt, then benchmark.
EDIT: A better way to profile the queries is to run them directly in the mysql command-line client, which reports how long each query took. Timing with PHP's microtime() also includes other latencies (Apache, PHP, server resource allocation, the network if you connect to a remote MySQL instance, etc.).
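If you do want to time from PHP, a sketch that runs the query several times and reports the median is less noisy than a single microtime(true) pair. The helper name medianMillis() and the default of 5 runs are assumptions, and the result still includes the PHP and network overhead mentioned above:

```php
<?php
// Hypothetical helper: runs $fn $runs times and returns the median
// wall-clock time in milliseconds. hrtime() (PHP 7.3+) is monotonic,
// unlike microtime(); the median damps one-off spikes such as cold caches.
function medianMillis(callable $fn, int $runs = 5): float
{
    if ($runs < 1) {
        throw new InvalidArgumentException('runs must be at least 1');
    }
    $times = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                      // nanoseconds
        $fn();
        $times[] = (hrtime(true) - $start) / 1e6;   // to milliseconds
    }
    sort($times);
    $mid = intdiv($runs, 2);
    return $runs % 2 ? $times[$mid] : ($times[$mid - 1] + $times[$mid]) / 2;
}

// Usage, assuming $pdo and $sql already exist:
// echo medianMillis(fn () => $pdo->query($sql)->fetchAll()), " ms\n";
```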

What you are proposing is called 'normalization'. This is exactly what relational databases were made for - if you take care of your indexes, the joins will run almost as fast as if the data were in one table.
Actually, they might even go faster: instead of loading 1 table row with 100 columns, you can just load the columns you need. If a pet only has 8 attributes, you only load those 8.
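The concern about one extra JOIN per searched attribute can often be avoided with the vertical table. As a sketch, assuming the table is called pet_attribute (a hypothetical name) with the columns shown in the question, "must have all of these attributes" becomes a single GROUP BY ... HAVING query, a technique usually called relational division:

```php
<?php
// Hypothetical helper: builds SQL + params for "pets that have ALL of the
// given attributes" against the vertical pet_attribute table, using one
// GROUP BY/HAVING instead of one JOIN per attribute.
function petsWithAllAttributes(array $attributeIds): array
{
    $ids = array_values(array_unique(array_map('intval', $attributeIds)));
    if ($ids === []) {
        throw new InvalidArgumentException('At least one attribute id is required');
    }
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    // A pet qualifies when every requested attribute id occurs for it;
    // count($ids) is a locally computed int, so it is safe to concatenate
    $sql = "SELECT pet_id
              FROM pet_attribute
             WHERE attribute_id IN ($placeholders)
             GROUP BY pet_id
            HAVING COUNT(DISTINCT attribute_id) = " . count($ids);
    return [$sql, $ids];
}
```

With the question's sample rows, asking for attributes 1 and 3 matches pets 1 and 3. To stay fast, this relies on an index that starts with attribute_id, for example on (attribute_id, pet_id).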

This question is very subjective. If you have the resources to update the middleware to reflect each column that is added, then by all means go with horizontal: there is nothing safer and easier to learn than a fixed structure. One thing to remember: any time you update a table's structure, you have to update each of its dependencies, unless there is some catch-all like *, which I suggest you stay away from unless you are just dumping data to a screen and the order of columns is irrelevant.
With that said, vertical is the way to go if you don't have all of your requirements in place or don't want to update code in n different places. Most of the time you just need storage containers for data. I would segregate things like numbers, dates, binary, and text into separate columns to preserve some data integrity, but there is nothing wrong with vertical storage, as long as you know how to structure queries that bring back the data in the appropriate format.
FYI, WordPress uses vertical data storage for the majority of the dynamic content it has to store for its millions of users.

First, from a database point of view, your data should grow vertically, not horizontally, so adding a new column for every attribute is not a good design. Second, this is a very common scenario in DB design, and the way to solve it is to create three tables: the 1st for Pets, the 2nd for Attributes, and the 3rd a mapping table between these two. Here is an example:
Table 1 (Pet)
Pet_ID | Pet_Name
1 | Dog
2 | Cat
Table 2 (Attribute)
Attribute_ID | Attribute_Name
1 | Wings
2 | Eyes
Table 3 (Pet_Attribute)
Pet_ID | Attribute_ID | Attribute_Value
1 | 1 | 0
1 | 2 | 2
About Performance:
Pet_ID and Attribute_ID together form the primary key of Pet_Attribute, which is indexed (http://developer.mimer.com/documentation/html_92/Mimer_SQL_Engine_DocSet/Basic_concepts4.html), so looking up a pet's attributes is very fast. And this is the right way to solve the problem. Hope it is clear now.
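A sketch of the three tables as DDL run through PDO. The column types, the NOT NULL constraints and the index name idx_attribute_pet are assumptions, not part of the answer. The extra index matters because a composite primary key on (Pet_ID, Attribute_ID) helps lookups by pet, while a search by attribute needs an index that starts with Attribute_ID:

```php
<?php
// Sketch of the Pet / Attribute / Pet_Attribute design from the answer.
// Types, NOT NULLs and the index name are illustrative assumptions.
function createPetSchema(PDO $pdo): void
{
    $pdo->exec('CREATE TABLE Pet (
        Pet_ID INT PRIMARY KEY,
        Pet_Name VARCHAR(50) NOT NULL)');
    $pdo->exec('CREATE TABLE Attribute (
        Attribute_ID INT PRIMARY KEY,
        Attribute_Name VARCHAR(50) NOT NULL)');
    // Composite primary key: fast "attributes of pet X", and the same
    // attribute cannot be stored twice for one pet.
    $pdo->exec('CREATE TABLE Pet_Attribute (
        Pet_ID INT NOT NULL,
        Attribute_ID INT NOT NULL,
        Attribute_Value INT,
        PRIMARY KEY (Pet_ID, Attribute_ID),
        FOREIGN KEY (Pet_ID) REFERENCES Pet (Pet_ID),
        FOREIGN KEY (Attribute_ID) REFERENCES Attribute (Attribute_ID))');
    // "Pets with attribute Y" needs an index that starts with Attribute_ID
    $pdo->exec('CREATE INDEX idx_attribute_pet ON Pet_Attribute (Attribute_ID, Pet_ID)');
}
```

If a pet can carry the same attribute more than once (wings1 and wings2 in the question), the primary key would also need a position column.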

Database Normalisation and Data Entry (admin backend)

Take a look at the items table below. As you can see, this table is not normalized: name should be in a separate table to normalize it.
mysql> select * from items;
+---------+--------+-----------+------+
| item_id | cat_id | name | cost |
+---------+--------+-----------+------+
| 1 | 102 | Mushroom | 5.00 |
| 2 | 2 | Mushroom | 5.40 |
| 3 | 173 | Pepperoni | 4.00 |
| 4 | 109 | Chips | 1.00 |
| 5 | 35 | Chips | 1.00 |
+---------+--------+-----------+------+
This table is not normalized because on the backend admin site, staff simply select a category and type in the item name to add data quickly. It is very quick. There are hundreds of identical item names, but the cost is not always the same.
If I do normalize this table to something like this:
mysql> select * from items;
+---------+--------+--------------+------+
| item_id | cat_id | item_name_id | cost |
+---------+--------+--------------+------+
| 1 | 102 | 1 | 5.00 |
| 2 | 2 | 1 | 5.40 |
| 3 | 173 | 2 | 4.00 |
| 4 | 109 | 3 | 1.00 |
| 5 | 35 | 3 | 1.00 |
+---------+--------+--------------+------+
mysql> select * from item_name;
+--------------+-----------+
| item_name_id | name |
+--------------+-----------+
| 1 | Mushroom |
| 2 | Pepperoni |
| 3 | Chips |
+--------------+-----------+
Now how can I add items (data) on the admin backend, from a data entry point of view, now that this table has been normalized? I don't want a dropdown to select the item name: there will be thousands of different item names, and it would take a lot of time to find the item name and then type in the cost.
There needs to be a way to add items as quickly as possible. What is the solution to this? I have developed the backend in PHP.
Also, what is the solution for editing the item name? Staff might rename an item completely, for example from Fish Kebab to Chicken Kebab, and that would affect all the categories without them realising it. There will also be some spelling mistakes that need correcting, like F1sh Kebab, which should be Fish Kebab (this is where normalized tables are useful: I would see the item name updated in every category).

I don't want a dropdown to select the item name: there will be thousands of different item names, and it would take a lot of time to find the item name and then type in the cost.
There are options for selecting existing items other than drop down boxes. You could use autocompletion, and only accept known values. I just want to be clear there are UI friendly ways to achieve your goals.
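As a sketch of the autocompletion idea, assuming PHP with PDO and the item_name table from the question, the backend can return the names that start with what the staff member has typed so far; the front end (not shown) would call it as they type. The function names escapeLike() and suggestItemNames() and the limit of 10 are hypothetical:

```php
<?php
// Escape LIKE wildcards so input such as "50%" is matched literally.
// "!" is the escape character because ESCAPE '!' behaves the same in
// MySQL and SQLite (a backslash would need different quoting in each).
function escapeLike(string $text): string
{
    return strtr($text, ['!' => '!!', '%' => '!%', '_' => '!_']);
}

// Return up to $limit existing names starting with what was typed.
function suggestItemNames(PDO $pdo, string $typed, int $limit = 10): array
{
    $stmt = $pdo->prepare(
        "SELECT item_name_id, name
           FROM item_name
          WHERE name LIKE ? ESCAPE '!'
          ORDER BY name
          LIMIT " . max(1, $limit)
    );
    $stmt->execute([escapeLike($typed) . '%']);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

A prefix pattern such as 'Mush%' can use an ordinary index on item_name.name, whereas a pattern that starts with % cannot.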
As for whether to do so or not, that is up to you. If the product names are varied slightly, is that a problem? Can small data integrity issues like this be corrected with batch jobs or similar if they are a problem?
Decide what your data should look like first, based on the design of your system. Worry about the best way to structure a UI after you've made that decision. Like I said, there are usable ways to design UI regardless of your data structuring.

I think you are good to go with your current design: since name is the product name and not the category name, you probably want to avoid cases where renaming a single product would rename too many of them at once.
Normalization is a good thing but you have to measure it against your specific needs and in this case I really would not add an extra table item_name as you shown above.
just my two cents :)

What dependencies is your table supposed to represent? What are the keys? Based on what you've said, I don't see how your second design is any more normalized than your first.
Presumably the determinants of "name" in the first design are the same as the determinants of "item_name_id" in the second? If so, then moving name to another table won't make any difference to the normal forms satisfied by your items table.
User interface design has nothing to do with database design. You cannot let the UI drive the database design and expect sensible results.

You need to validate the data and check for existence prior to adding it to see if it's a new value.
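// $pdo is an open PDO connection; $catId and $cost come from the same form.
// (The old mysql_* functions were removed in PHP 7, so this uses PDO.)
$value = trim($_POST['userSubmittedValue'] ?? '');
// Never trust user input: bind it as a parameter instead of building the SQL string
$stmt = $pdo->prepare('SELECT item_name_id FROM item_name WHERE name = ?');
$stmt->execute([$value]);
$itemNameId = $stmt->fetchColumn();   // false when the name does not exist yet
if ($itemNameId === false)
{
    // This is a new value: add it to item_name first and use its new id
    $pdo->prepare('INSERT INTO item_name (name) VALUES (?)')->execute([$value]);
    $itemNameId = $pdo->lastInsertId();
}
// Either way, add the record to the items table with the item_name_id
$pdo->prepare('INSERT INTO items (cat_id, item_name_id, cost) VALUES (?, ?, ?)')
    ->execute([$catId, $itemNameId, $cost]);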

There needs to be a way to add items as quickly as possible. What is the solution to this? I have developed the backend in PHP.
User interface issues and database structure are separate issues. For a given database structure, there are usually several user-friendly ways to present and change the data. Data integrity comes from the database. The user interface just needs to know where to find unique values. The programmer decides how to use those unique values. You might use a drop-down list, pop up a search form, use autocomplete, compare what the user types to the elements in an array, or query the database to see whether the value already exists.
From your description, it sounds like you had a very quick way to add data in the first place: "staff simply select a category and type in the item name to add data quickly". (Replacing "mushroom" with '1' doesn't have anything to do with normalization.)
Also, what is the solution for editing the item name? Staff might rename an item completely, for example from Fish Kebab to Chicken Kebab, and that would affect all the categories without them realising it.
You've allowed the wrong person to edit item names. Seriously.
This kind of issue arises in every database application. Allow only someone trained and trustworthy to make these kinds of changes. (See your DBMS docs for GRANT and REVOKE. Also take a look at ON UPDATE RESTRICT.)
In our production database at work, I can insert new states (for the United States), and I can change existing state names to whatever I want. But if I changed "Alabama" to "Kyrgyzstan", I'd get fired. Because I'm supposed to know better than to do stuff like that.
But even though I'm the administrator, I can't edit a San Francisco address and change its ZIP code to '71601'. The database "knows" that '71601' isn't a valid ZIP code for San Francisco. Maybe you can add a table or two to your database, too. I can't tell from your description whether something like that would help you.
On systems where I'm not the administrator, I'd expect to have no permissions to insert rows into the table of states. In other tables, I might have permission to insert rows, but not to update or delete them.
There will also be some spelling mistakes that need correcting, like F1sh Kebab, which should be Fish Kebab
The lesson is the same. Some people should be allowed to update items.name, and some people should not. Revoke permissions, restrict cascading updates, increase data integrity using more tables, or increase training.

Find Similar Descriptions in Database PHP/MySQL

We are building a help desk application for running our service company, and I am trying to figure out how to assist the call center people in assigning a category based on the problem description from the customer.
My primary idea is to compare the description the customer gave to prior descriptions, and use the category most commonly assigned in those prior service calls.
Any ideas how to do it?
My description field is a blob field as some descriptions are quite long. I would prefer to find a way to do this that requires the least system resources.
Thanks for any input :)
Mike

I'm a person of custom code; I don't feel the job is done right if you use big, bloated systems, so take this with a grain of salt if you are not wanting to code this yourself. However, this might not be as hard as you're making it; yes, I would definitely go with a tagging system. However, it doesn't have to be so complicated.
Here's how I would handle it:
First, make a database with 3 tables: one each for categories, tags, and 'links' (links between categories and tags).
Then, create a PHP function that initializes an array (empty works just fine) and pushes new (lowercased) words if they don't exist. An example of this might be:
<?php
// Pass the new description to this function. It returns the unique,
// lowercased words of the description (the "dictionary" used below).
function getCategory($description)
{
    // Lowercase it all
    $description = strtolower($description);
    // Replace anything that isn't a letter, a number or whitespace with a
    // space. Don't take out the spaces; we need them to split the words!
    $description = preg_replace('~[^a-z0-9\s]+~', ' ', $description);
    // Collapse runs of whitespace into a single space and trim the ends
    $description = trim(preg_replace('~\s+~', ' ', $description));
    if ($description === '') {
        return [];
    }
    // Now the description just contains words with a single space in
    // between them. Let's break them up.
    $dict = explode(' ', $description);
    // And keep the unique ones (array_values re-indexes the array)
    $dict = array_values(array_unique($dict, SORT_STRING));
    // If you wanted to, you could trim either common words you specify,
    // or any words under, say, 4 characters. Up to you!
    return $dict;
}
?>
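Next, populate your database however you want: make a few categories and some tags, and then link them together (if you want to get fancy, use the InnoDB engine and declare foreign keys from the links table, so every link must point at an existing tag and category; InnoDB also indexes foreign key columns, which can make lookups a bit quicker).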
Table `Categories`
|-------------------------|
| Column: Category |
| Rows: |
| Food |
| Animals |
| Plants |
| |
|-------------------------|
Table `Tags`
|-------------------------|
| Column: Tag |
| Rows: |
| eat |
| hamburger |
| meat |
| leaf |
| stem |
| seed |
| fur |
| hair |
| claws |
| |
|-------------------------|
Table `Links`
|-------------------------|
| Columns: tag, category |
| Rows: |
| eat, Food |
| hamburger, Food |
| meat, Food |
| leaf, Food |
| leaf, Plants |
| stem, Plants |
| fur, Animals |
| ... |
|-------------------------|
The links table does store one row per link, but each row only needs to hold the two keys (ideally small integer ids rather than repeated strings), so it stays compact even with many links. This keeps the database size down.
Now, for the kicker: a clever MySQL query to the database, which follows these steps:
For each category, sum up the tags belonging both to the category and the description dictionary (which we created in the earlier PHP function).
Sort them from greatest to least
Pull the top 1 or 3 or however many suggested categories you'd like!
This will get you a nice list of categories that have the highest matching count of tags. How you want to craft the MySQL query is up to you.
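A sketch of that query, run through PDO. It assumes the Links table stores the tag and category names exactly as in the layout above (columns tag and category) and that $words is the array returned by getCategory(); the helper name suggestCategories() is hypothetical:

```php
<?php
// Hypothetical helper: for each category, count how many of the
// description's words appear as tags linked to it, best matches first.
function suggestCategories(PDO $pdo, array $words, int $top = 3): array
{
    if ($words === []) {
        return [];   // "IN ()" would be invalid SQL
    }
    $placeholders = implode(', ', array_fill(0, count($words), '?'));
    $stmt = $pdo->prepare(
        "SELECT category, COUNT(*) AS score
           FROM Links
          WHERE tag IN ($placeholders)
          GROUP BY category
          ORDER BY score DESC, category
          LIMIT " . max(1, $top)
    );
    $stmt->execute(array_values($words));
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Usage, assuming $pdo is an open PDO connection:
// $suggestions = suggestCategories($pdo, getCategory($description));
```

Because the score is a COUNT(*) over link rows, duplicate links between a tag and a category (the weighting idea in the last paragraph) automatically raise that category's score.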
While this seems like a lot of setup, it really isn't. You have 3 tables at most, one or two PHP functions and a few MySQL queries. The database will only be as big as the categories, the tags and the references to both (in the links table; references don't take up much space!)
To update the database, simply add any tags that don't exist yet to the tags table and link them to the category you decided to assign to the description. This will broaden your database's range of tags and will, over time, tune it to your descriptions (i.e. make it more accurate).
If you wanted to get really detailed, you'd insert duplicate links between categories and tags to create a sort of weighted tag system, which would make your system even more accurate.
