In a multilanguage website, should I reference the language with numbers or keywords?
For example, if an English speaker picks from a list of services, the list should be in English, while a Spanish speaker should see the same list in Spanish.
The list of services is selected from a table in a database, each service has a unique number to identify it, and something to identify in what language the service is written.
What I'm asking is: which is better, using a number to identify the language or using a language code?
Example:
hypothetical table of services:
id | service | service_id | lang
------------------------------------
0 | cooking | 1 | en
1 | driving | 2 | en
2 | singing | 3 | en
3 | running | 4 | en
4 | cocinar | 1 | es
5 | conducir | 2 | es
6 | cantar | 3 | es
7 | correr | 4 | es
VS
id | service | service_id | lang
------------------------------------
0 | cooking | 1 | 1
1 | driving | 2 | 1
2 | singing | 3 | 1
3 | running | 4 | 1
4 | cocinar | 1 | 2
5 | conducir | 2 | 2
6 | cantar | 3 | 2
7 | correr | 4 | 2
Here every language gets a numeric ID.
I can see that the language-code approach makes the database more human-readable, but why should that matter if the server handles it all anyway? Numbers are easier for the server, but then I would have to assign a number to every language.
So which approach do you think is better and why?
I would almost always normalize such things, but this may be a rare exception for the following reasons:
An nchar(2) column occupies only 4 bytes, the same as an int column. Therefore, performance should not be affected, especially if you set the collation to ordinal.
The two-character language codes are in international standards which are extremely unlikely to ever be changed. So massive updates should not be an issue.
So the arguments for normalization do not really apply in this case.
There is an ISO-standardized set of language codes (ISO 639). I'd just use those, as in example 1. You should probably also have a secondary table that maps the short two- or three-letter codes to the long, spelled-out names.
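A minimal sketch of that layout, using Python's built-in sqlite3 only so that it runs as-is. The table names `language` and `service_translation` and the helper `services_for` are assumptions for illustration, not names from the question; the same idea should carry over to other engines with minor syntax changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE language (
    code TEXT PRIMARY KEY,   -- ISO 639-1 code, e.g. 'en'
    name TEXT NOT NULL       -- long, spelled-out name
);
CREATE TABLE service_translation (
    service_id INTEGER NOT NULL,
    lang       TEXT NOT NULL REFERENCES language(code),
    service    TEXT NOT NULL,
    PRIMARY KEY (service_id, lang)   -- one row per service per language
);
INSERT INTO language VALUES ('en', 'English'), ('es', 'Spanish');
INSERT INTO service_translation VALUES
    (1, 'en', 'cooking'), (2, 'en', 'driving'),
    (1, 'es', 'cocinar'), (2, 'es', 'conducir');
""")


def services_for(lang_code):
    """Return the service names for one language, ordered by service_id."""
    rows = conn.execute(
        "SELECT service FROM service_translation WHERE lang = ? ORDER BY service_id",
        (lang_code,),
    )
    return [name for (name,) in rows]


spanish = services_for("es")
```

The language code acts as a natural key here, so the lookup table is only needed when the long name has to be displayed.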
Related
For an online game, I have a table that contains all the plays, and some information on those plays, like the difficulty setting etc.:
+---------+---------+------------+------------+
| play-id | user-id | difficulty | timestamp |
+---------+---------+------------+------------+
| 1 | abc | easy | 1335939007 |
| 2 | def | medium | 1354833214 |
| 3 | abc | easy | 1354833875 |
| 4 | abc | medium | 1354833937 |
+---------+---------+------------+------------+
In another table, after the game has finished, I store some stats related to that specific game, like the score etc:
+---------+----------------+--------+
| play-id | type | value |
+---------+----------------+--------+
| 1 | score | 201487 |
| 1 | enemies_killed | 17 |
| 1 | gems_found | 4 |
| 2 | score | 110248 |
| 2 | enemies_killed | 12 |
| 2 | gems_found | 7 |
+---------+----------------+--------+
Now, I want to make a distribution graph so users can see in what score percentile they are. So I basically want the boundaries of the percentiles.
If this were per score, I could simply rank the scores and start from there, but it needs to be per high score. So mathematically, I would need to take each user's high score, sort those, and then find the percentiles.
I'm not sure what the best approach is here.
On one hand, constructing an array that holds all the highscores seems like a performance heavy thing to do, because it needs to cycle through both tables and match the scores and the users (the first table holds around 10M rows).
On the other hand, making a separate table with the highscore of users would make things easier, but it feels like it's against the rules of avoiding data redundancy.
Another approach that came to mind was doing the performance heavy thing once a week and keep the result in a separate table, or doing the performance heavy stuff on only a (statistically relevant) subset of the data.
Or maybe I'm completely missing the point here and should use a completely different database setup?
What's the best practice here?
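For what it's worth, the "heavy" computation can be sketched as a single join plus a per-user MAX. This is only an illustration under assumptions: the tables are named `play` and `stat`, the hyphenated column names are written with underscores, the timestamp column is called `ts`, the fourth play belongs to a made-up user `ghi`, the scores for plays 3 and 4 are invented, and `percentile_boundary` is a hypothetical helper using the nearest-rank definition.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE play (play_id INTEGER PRIMARY KEY, user_id TEXT, difficulty TEXT, ts INTEGER);
CREATE TABLE stat (play_id INTEGER, type TEXT, value INTEGER);
INSERT INTO play VALUES
    (1, 'abc', 'easy',   1335939007),
    (2, 'def', 'medium', 1354833214),
    (3, 'abc', 'easy',   1354833875),
    (4, 'ghi', 'medium', 1354833937);
INSERT INTO stat VALUES
    (1, 'score', 201487), (2, 'score', 110248),
    (3, 'score', 150000), (4, 'score', 90000);   -- plays 3 and 4: invented scores
""")

# One high score per user, sorted ascending.
highscores = [row[0] for row in conn.execute("""
    SELECT MAX(s.value) AS highscore
    FROM play p
    JOIN stat s ON s.play_id = p.play_id
    WHERE s.type = 'score'
    GROUP BY p.user_id
    ORDER BY highscore
""")]


def percentile_boundary(sorted_values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


median_highscore = percentile_boundary(highscores, 50)
```

The sorted list, or just the boundaries derived from it, could be written to a summary table on a schedule, which is the weekly-refresh option already considered in the question.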
I need to design a DB model for a backend module where the user can translate page content into multiple languages. The things that will be translated are basic words, phrases, link names, titles, field names, and field values. They should also be grouped so I can find them by group name. For example, if there is a select field on a page with different colors as options, I should be able to select all of them by group name.
So here is what I have at the moment:
lang
+----+---------+
| id | name |
+----+---------+
| 1 | english |
| 2 | german |
+----+---------+
lang_entity
+----+------------+-------------+-------+-------+
| id | module | group | name | order |
+----+------------+-------------+-------+-------+
| 1 | general | | hello | 0 |
| 2 | accounting | colorSelect | one | 1 |
| 3 | accounting | colorSelect | two | 2 |
| 4 | accounting | colorSelect | three | 3 |
+----+------------+-------------+-------+-------+
lang_entity_translation
+----+---------+----------------+-------------+
| id | lang_id | lang_entity_id | translation |
+----+---------+----------------+-------------+
| 1 | 1 | 1 | Hello |
| 2 | 2 | 1 | Guten tag |
| 3 | 1 | 2 | One |
| 4 | 2 | 2 | Ein |
| 5 | 1 | 3 | Two |
| 6 | 2 | 3 | Zwei |
| 7 | 1 | 4 | Three |
| 8 | 2 | 4 | Drei |
+----+---------+----------------+-------------+
So the lang table holds the available languages.
The lang_entity table holds the entities that can be translated into different languages.
The module column just groups them by page module in the backend translation module. It also makes it possible to have entities with the same name in different modules.
The group column, as mentioned, is needed for selects and possibly other places where multiple values are used. It also lets the user add and order entities within one group.
And the lang_entity_translation table holds the translations for each entity in each language.
So my question is: are there visible flaws in this kind of design? Would you recommend something different?
Also a bonus question: I really don't like the lang_entity table name. Do you have a better idea for the name of a table that holds all the words/phrases that are translated? :)
Edit: similar, but not a duplicate. The linked question is about translating dynamic products and having a separate table for each translated type. I'm talking about translating whole page content, including groups, in a single table.
I don't understand the order column of lang_entity, but then I probably don't need to.
The setup looks sane, but make sure you add foreign key constraints from lang_entity_translation to lang and lang_entity.
As for naming, I would call the table phrase or translatable.
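A sketch of what those constraints might look like, run through Python's sqlite3 so it is self-contained. Two things here are assumptions rather than part of the question: the columns `group` and `order` are renamed to `group_name` and `sort_order` (both original names are reserved words in SQL and would need quoting), and the UNIQUE constraints plus the `translations` helper are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE lang (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE lang_entity (
    id         INTEGER PRIMARY KEY,
    module     TEXT NOT NULL,
    group_name TEXT,
    name       TEXT NOT NULL,
    sort_order INTEGER NOT NULL DEFAULT 0,
    UNIQUE (module, group_name, name)
);
CREATE TABLE lang_entity_translation (
    id             INTEGER PRIMARY KEY,
    lang_id        INTEGER NOT NULL REFERENCES lang(id),
    lang_entity_id INTEGER NOT NULL REFERENCES lang_entity(id),
    translation    TEXT NOT NULL,
    UNIQUE (lang_id, lang_entity_id)   -- at most one translation per language
);
INSERT INTO lang VALUES (1, 'english'), (2, 'german');
INSERT INTO lang_entity VALUES
    (2, 'accounting', 'colorSelect', 'one',   1),
    (3, 'accounting', 'colorSelect', 'two',   2),
    (4, 'accounting', 'colorSelect', 'three', 3);
INSERT INTO lang_entity_translation (lang_id, lang_entity_id, translation) VALUES
    (1, 2, 'One'), (2, 2, 'Ein'), (1, 3, 'Two'),
    (2, 3, 'Zwei'), (1, 4, 'Three'), (2, 4, 'Drei');
""")


def translations(module, group_name, lang_name):
    """Return the translated values of one group, in display order."""
    rows = conn.execute("""
        SELECT t.translation
        FROM lang_entity e
        JOIN lang_entity_translation t ON t.lang_entity_id = e.id
        JOIN lang l ON l.id = t.lang_id
        WHERE e.module = ? AND e.group_name = ? AND l.name = ?
        ORDER BY e.sort_order
    """, (module, group_name, lang_name))
    return [text for (text,) in rows]


german_options = translations("accounting", "colorSelect", "german")
```

With the foreign keys in place, a translation row pointing at a missing language or entity is rejected by the database instead of silently becoming an orphan.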
We had a similar situation about seven years ago.
We used a separate column for each language. For a name, for example, we had
Name_Eng, Name_Ger, Name_Spa. We supported 7-10 languages.
Each name had one common ID across all languages.
Based on the language selected in the UI, we passed the language code to the back end, where a stored procedure appended it to the column name.
For example, if English was selected we passed "Eng", built the column name Name_Eng, and fetched the data with a dynamic query.
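If anyone goes this route, the language code should be checked against a fixed list before it is spliced into the column name, since column names cannot be bound as query parameters. A rough sketch of that idea follows; the `item` table, its sample row, `ALLOWED_SUFFIXES`, and `localized_name` are all made up for illustration and are not the original stored procedure.

```python
import sqlite3

# Hypothetical whitelist of language suffixes that have a Name_* column.
ALLOWED_SUFFIXES = {"Eng", "Ger", "Spa"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, Name_Eng TEXT, Name_Ger TEXT, Name_Spa TEXT);
INSERT INTO item VALUES (1, 'Hello', 'Hallo', 'Hola');   -- invented sample row
""")


def localized_name(item_id, lang_suffix):
    """Fetch the name column matching lang_suffix, rejecting unknown suffixes."""
    if lang_suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported language: {lang_suffix!r}")
    column = f"Name_{lang_suffix}"  # safe: the suffix was checked against the whitelist
    row = conn.execute(
        f"SELECT {column} FROM item WHERE id = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None


german_name = localized_name(1, "Ger")
```

The row ID is still passed as a bound parameter; only the column name is built dynamically.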
I want to retrieve rows from a table based on some search criteria. The results have to meet those criteria and should additionally form a sample of the data that is as diverse as possible. A query with the same search criteria should always return the same sample, though, so using RAND() in the query is not an option. The results are displayed on a PHP-driven website.
Example
I've got a table of accommodations, e.g. hotels, bed and breakfasts, or campgrounds. For every accommodation there is additional information such as rating, budget, city, and region. So the table basically looks like this:
| id | name | type | rating | budget | city | region |
| 1 | A Name | Hotel | 2 | 2 | New York | East |
| 2 | B Name | B&B | 3 | 2 | New York | East |
| 3 | C Name | Hotel | 4 | 3 | New York | East |
| 4 | A Name | Hotel | 3 | 4 | Chicago | Central |
| 5 | D Name | B&B | 4 | 3 | Chicago | Central |
| 6 | E Name | Hotel | 2 | 2 | Omaha | Central |
| 7 | F Name | Hotel | 5 | 4 | Omaha | Central |
| 8 | G Name | Camping | 2 | 4 | Yosemite | West |
I now need a query that gets ten accommodations from e.g. region='Central' which contains as many cities as possible, as many accommodation types as possible, cheap accommodations as well as expensive ones and so on. I don't need a mathematically perfect solution, just something consistent.
Idea 1
I could query the table multiple times, with several different where clauses and mix the results. But querying multiple times is a big drawback for a web application.
Idea 2
I could introduce an additional column random that is filled by a random value while inserting the data and then do an order by random. The drawback of this solution is that random is a bad heuristic for what I want to accomplish.
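One deterministic alternative to a random column, sketched under assumptions: rank each row within its city by id, then order by that rank, so the first rows returned cycle through the cities before any city repeats. The table name `accommodation` and the function `diverse_sample` are hypothetical, and this only spreads the sample across cities; type or budget would need to be folded into the ranking in a similar way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accommodation (
    id INTEGER PRIMARY KEY, name TEXT, type TEXT,
    rating INTEGER, budget INTEGER, city TEXT, region TEXT
);
INSERT INTO accommodation VALUES
    (1, 'A Name', 'Hotel',   2, 2, 'New York', 'East'),
    (2, 'B Name', 'B&B',     3, 2, 'New York', 'East'),
    (3, 'C Name', 'Hotel',   4, 3, 'New York', 'East'),
    (4, 'A Name', 'Hotel',   3, 4, 'Chicago',  'Central'),
    (5, 'D Name', 'B&B',     4, 3, 'Chicago',  'Central'),
    (6, 'E Name', 'Hotel',   2, 2, 'Omaha',    'Central'),
    (7, 'F Name', 'Hotel',   5, 4, 'Omaha',    'Central'),
    (8, 'G Name', 'Camping', 2, 4, 'Yosemite', 'West');
""")


def diverse_sample(region, limit):
    """Return ids for a region, round-robin across cities, stable between calls."""
    rows = conn.execute("""
        SELECT a.id
        FROM accommodation a
        WHERE a.region = :region
        ORDER BY
            -- position of this row within its own city (0 = first)
            (SELECT COUNT(*) FROM accommodation b
             WHERE b.region = a.region AND b.city = a.city AND b.id < a.id),
            a.id
        LIMIT :limit
    """, {"region": region, "limit": limit})
    return [row_id for (row_id,) in rows]


central = diverse_sample("Central", 3)
```

Because the ordering depends only on the stored data, the same criteria always produce the same sample, and it is still a single query.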
I have a question about designing an effective database structure for accounting codes. The table I have in mind looks like this:
| ID | Code | Name | Parent |
| 1 | 1 | Assets | |
| 2 | 1 | Tangible Fixed Assets | 1 |
| 3 | 1 | Building | 2 |
| 4 | 2 | Intangible Fixed Assets| 1 |
| 5 | 1 | CopyRights | 4 |
I've been thinking about using three tables: tbl_lvl1 for the top-level parents, tbl_lvl2 for the first-level children, and tbl_lvl3 for the second-level children. I also found out about recursive queries, which need only one table, but writing a recursive query in MySQL seems difficult.
And the result I want to display in PHP is something like this:
| Code | Name |
| 1 | Assets |
| 11 | Tangible Fixed Assets |
| 111 | Building |
| 12 | Intangible Fixed Assets |
| 121 | CopyRights |
Which structure should I use: three tables or one? Thank you.
You're looking for a search tree, and I'd especially suggest a B-tree.
A search tree, generally speaking, allows you to hierarchically search for all sub-nodes in a single query through nested intervals.
There are literally dozens of implementations, so you don't need to dig deep into the details, even though I would suggest doing so, as it's a major data structure that you should be familiar with.
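For comparison, here is a sketch of the single-table option from the question, using a plain parent column (an adjacency list) and a recursive common table expression. Note that this is not the nested-interval technique described above. It assumes CopyRights sits under Intangible Fixed Assets (ID 4), which is what the expected code 121 implies, and the table name `account` is made up. It runs on SQLite here; MySQL 8.0 and later also accept WITH RECURSIVE, though string concatenation there would normally use CONCAT() rather than `||`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    id     INTEGER PRIMARY KEY,
    code   INTEGER NOT NULL,                 -- code within the parent
    name   TEXT NOT NULL,
    parent INTEGER REFERENCES account(id)    -- NULL for top-level rows
);
INSERT INTO account VALUES
    (1, 1, 'Assets',                  NULL),
    (2, 1, 'Tangible Fixed Assets',   1),
    (3, 1, 'Building',                2),
    (4, 2, 'Intangible Fixed Assets', 1),
    (5, 1, 'CopyRights',              4);
""")

# Walk down from the roots, appending each level's code to the parent's full code.
chart = conn.execute("""
    WITH RECURSIVE tree(id, full_code, name) AS (
        SELECT id, CAST(code AS TEXT), name
        FROM account
        WHERE parent IS NULL
        UNION ALL
        SELECT a.id, tree.full_code || a.code, a.name
        FROM account a
        JOIN tree ON a.parent = tree.id
    )
    SELECT full_code, name FROM tree ORDER BY full_code
""").fetchall()
```

Plain concatenation only gives unambiguous codes while every level uses a single digit; with more than nine children per parent, a separator or zero-padding would be needed.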
Okay, so let's say that we have 4 columns and 3 rows of data.
|user_id|pick_1|pick_2|pick_3|
------------------------------
|fred   |C++   |java  |php   |
------------------------------
|eric   |java  |C++   |php   |
------------------------------
|sam    |C++   |php   |java  |
------------------------------
So right now, users are entering their favorite languages. The first pick (pick_1) is the favorite programming language, the second pick (pick_2) is the second favorite, and so on.
How can I organize this so that I can assign a point value according to which column a programming language is in? For example, pick_1 could give 3 points, pick_2 2 points, and pick_3 1 point.
So when you tally up the scores, C++ will have 8 points, java will have 6 points, and php will have 4 points.
That way I can give an overall ranking of which programming languages tend to be favored. Like so:
|rank|language|points|
----------------------
| 1 | C++ | 8 |
----------------------
| 2 | java | 6 |
----------------------
| 3 | php | 4 |
----------------------
It doesn't even need to be a point system; I just couldn't think of another way to rank the languages from most liked to least liked. So if there's another way to get the same result, then please let me know. Otherwise, how would I do this? Preferably in just MySQL. I am currently using PHP.
Thank you for reading.
You need a simpler structure:
User_ID | Pick | Points
Fred | C++ | 3
Fred | java | 2
Fred | php | 1
This way you can do a simple SUM(points) with GROUP BY pick.
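A quick sketch of that structure with the three users from the question, run through Python's sqlite3 so it is self-contained. The table name `pick` and the alias `total` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pick (user_id TEXT, pick TEXT, points INTEGER);
INSERT INTO pick VALUES
    ('fred', 'C++',  3), ('fred', 'java', 2), ('fred', 'php',  1),
    ('eric', 'java', 3), ('eric', 'C++',  2), ('eric', 'php',  1),
    ('sam',  'C++',  3), ('sam',  'php',  2), ('sam',  'java', 1);
""")

# Total points per language, highest first.
ranking = conn.execute("""
    SELECT pick, SUM(points) AS total
    FROM pick
    GROUP BY pick
    ORDER BY total DESC
""").fetchall()
```

The totals come out as 8, 6, and 4 for C++, java, and php, matching the ranking asked for in the question.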
For a SQL-only solution, I would normalize your structure and put the picks in a separate table:
users: user_id; user_name
picks: pick_id; user_id; language; points;
then you would have your data in 2 tables:
| user_id | user_name |
-----------------------
| 1 | Fred |
-----------------------
| 2 | Eric |
-----------------------
| 3 | Sam |
-----------------------
| pick_id | user_id | language | points |
---------------------------------------------
| 1       | 1       | C++      | 3      |
---------------------------------------------
| 2       | 1       | Java     | 2      |
---------------------------------------------
| 3       | 1       | php      | 1      |
---------------------------------------------
| 4       | 2       | Java     | 3      |
---------------------------------------------
| 5       | 2       | C++      | 2      |
---------------------------------------------
| 6       | 2       | php      | 1      |
---------------------------------------------
| 7       | 3       | C++      | 3      |
---------------------------------------------
| 8       | 3       | php      | 2      |
---------------------------------------------
| 9       | 3       | Java     | 1      |
---------------------------------------------
And then use the following query to fetch the desired result:
SELECT language, SUM(points) AS total_points
FROM users
JOIN picks ON users.user_id = picks.user_id
GROUP BY language
ORDER BY total_points DESC
As seen in this fiddle
This way it's also easy to add constraints so that people cannot vote for a language more than once, or give the same number of points to two different languages.
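Those constraints could look roughly like the following sketch, run through Python's sqlite3 so it is self-contained. The CHECK range of 1 to 3 and the helper `try_pick` are assumptions added for illustration; the two UNIQUE constraints are the ones described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
CREATE TABLE picks (
    pick_id  INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    language TEXT NOT NULL,
    points   INTEGER NOT NULL CHECK (points BETWEEN 1 AND 3),
    UNIQUE (user_id, language),   -- one vote per language per user
    UNIQUE (user_id, points)      -- each point value used once per user
);
INSERT INTO users VALUES (1, 'Fred');
INSERT INTO picks (user_id, language, points) VALUES (1, 'C++', 3);
""")


def try_pick(user_id, language, points):
    """Insert a pick; return False if any constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO picks (user_id, language, points) VALUES (?, ?, ?)",
                (user_id, language, points),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Here a second vote for C++ by Fred, or a second language given 3 points, is refused by the database rather than by application code.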