MySQL Database I18N, a JSON approach?

UPDATE: I came across this question of mine again after some years, and I now know this is a very bad approach. Please don't use it. You can always use additional tables for i18n (for example products and products_lang), with separate entries for every locale: better for indexes, better for search, etc.
I'm trying to implement i18n in a MySQL/PHP site.
I've read answers stating that "i18n is not part of database normally", which I think is a somewhat narrow-minded view.
What about product names, or, as in my case, a menu structure and contents stored in the db?
I would like to know what you think of my approach, taking into account that the set of languages should be extensible, so I'm trying to avoid the "one column for each language" solution.
One solution would be to use a reference (id) for the string to translate and for every translatable column have a table with primary key, string id, language id and translation.
Another solution I thought of is to use JSON. So a menu entry in my db would look like:
idmenu  label
------  -------------------------------------------
5       {"en":"Homepage", "it":"pagina principale"}
What do you think of this approach?

"One solution would be to use a reference (id) for the string to translate and for every translatable column have a table with primary key, string id, language id and translation."
I implemented this once. I took the existing database schema, looked for all tables with translatable text columns, and for each such table created a separate table containing only those text columns, plus a language id and an id tying it to the "data" row in the original table. So if I had:
create table product (
id int not null primary key
, sku varchar(12) not null
, price decimal(8,2) not null
, name varchar(64) not null
, description text
)
I would create:
create table product_text (
product_id int not null
, language_id int not null
, name varchar(64) not null
, description text
, primary key (product_id, language_id)
, foreign key (product_id) references product(id)
, foreign key (language_id) references language(id)
)
And I would query like so:
SELECT product.id
, COALESCE(product_text.name, product.name) name
, COALESCE(product_text.description, product.description) description
FROM product
LEFT JOIN product_text
ON product.id = product_text.product_id
AND 10 = product_text.language_id
(10 would happen to be the language id which you're interested in right now.)
As you can see the original table retains the text columns - these serve as default in case no translation is available for the current language.
So there is no need to create a separate table for each text column, just one table for all text columns (per original table).
As others pointed out, the JSON idea has the problem that it is nearly impossible to query, which in turn means you cannot extract only the translation you need at a particular time.
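The fallback behaviour of that query can be checked with a small sketch. This is only an illustration under assumptions: it uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL, the sample rows are made up, the language table is left out, and the helper name localized_products is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema mirrors the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    sku TEXT NOT NULL,
    price NUMERIC NOT NULL,
    name TEXT NOT NULL,
    description TEXT
);
CREATE TABLE product_text (
    product_id INTEGER NOT NULL REFERENCES product(id),
    language_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    description TEXT,
    PRIMARY KEY (product_id, language_id)
);
-- Made-up sample data: only product 1 has a translation for language 10.
INSERT INTO product VALUES (1, 'SKU-1', 9.99, 'Chair', 'A wooden chair');
INSERT INTO product VALUES (2, 'SKU-2', 19.99, 'Table', 'A wooden table');
INSERT INTO product_text VALUES (1, 10, 'Sedia', 'Una sedia di legno');
""")

def localized_products(conn, language_id):
    """Return (id, name, description), falling back to the default text."""
    return conn.execute(
        """
        SELECT product.id,
               COALESCE(product_text.name, product.name) AS name,
               COALESCE(product_text.description, product.description) AS description
        FROM product
        LEFT JOIN product_text
               ON product.id = product_text.product_id
              AND product_text.language_id = ?
        ORDER BY product.id
        """,
        (language_id,),
    ).fetchall()

rows = localized_products(conn, 10)
for row in rows:
    print(row)
```

Product 1 comes back with its translated text, while product 2 keeps the default text from the product table because it has no product_text row for that language.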

This is not extensible. You lose all the advantages of using a relational database. With an approach like yours you might as well use serialize(), which decodes much faster, and even store the data in files. There is no particular reason to use SQL with such structures.
I see no problem with using one column per language. That is even easier when programming a CMS. A relational database is not only for storing data: it is for working with data rationally (e.g. using powerful built-in mechanisms) and for controlling the structure and integrity of the data.

First thought: this would obviously break exact searching in SQL, e.g. WHERE label='Homepage'.
Second: a user running a search could see unwanted results (e.g. when the query is found in another language's string).
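Both points can be reproduced with a tiny sketch. It assumes a hypothetical menu table holding the JSON label from the question, and uses Python's sqlite3 module as a stand-in for MySQL.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical menu table storing all translations as one JSON string.
conn.execute("CREATE TABLE menu (idmenu INTEGER PRIMARY KEY, label TEXT NOT NULL)")
conn.execute(
    "INSERT INTO menu VALUES (?, ?)",
    (5, json.dumps({"en": "Homepage", "it": "pagina principale"})),
)

# Exact matching no longer works: the column holds the whole JSON document.
exact = conn.execute("SELECT idmenu FROM menu WHERE label = 'Homepage'").fetchall()

# A substring search matches text from any language, wanted or not.
fuzzy = conn.execute(
    "SELECT idmenu FROM menu WHERE label LIKE ?", ("%pagina%",)
).fetchall()

print(exact)
print(fuzzy)
```

The exact match finds nothing, and a search for an Italian word matches the row even if the visitor is browsing in English.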

I would recommend keeping a single primary language in the database and using an extra sub-system to maintain the translations. This is the standard approach for web applications like Drupal. Most likely in the domain of your software/application there will be a single translation for each primary language string, so you don't have to worry about context or ambiguity. (In fact for best user experience you should strive to have unique labels for unique functionality anyway).
If you want to roll your own table, you could have something like:
create table translations (
id int not null primary key
, source varchar(255) not null -- the text in the primary language
, lang varchar(5) not null -- the language of the translation
, translation varchar(255) not null -- the text of the translation
)
You probably want more than 2 characters for the language, since you'll likely want en_US, en_GB etc.
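A lookup against such a table could be sketched as follows. This is a minimal illustration, not the answerer's code: it uses Python's sqlite3 module in place of MySQL, the UNIQUE (source, lang) constraint is my own assumption, and the helper name translate is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE translations (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,       -- the text in the primary language
    lang TEXT NOT NULL,         -- the language of the translation
    translation TEXT NOT NULL,  -- the text of the translation
    UNIQUE (source, lang)       -- assumption: one translation per string and language
);
INSERT INTO translations (source, lang, translation)
VALUES ('Homepage', 'it', 'pagina principale');
""")

def translate(conn, source, lang):
    """Return the translation of source, or source itself if none exists."""
    row = conn.execute(
        "SELECT translation FROM translations WHERE source = ? AND lang = ?",
        (source, lang),
    ).fetchone()
    return row[0] if row else source

print(translate(conn, "Homepage", "it"))
print(translate(conn, "Homepage", "fr"))
```

Falling back to the primary-language string keeps the page usable while a translation is still missing.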

Related

What are the options, with +ves and -ves, for store and retrieval of 400-500k fields of user data?

Context
I'm implementing a website to help people learn a foreign language.
I'm working in PHP and PDO. My database backend is MySQL. (For those who are interested, the front end is all done in HTML5, CSS and Javascript.)
The essence of this question is how best to plan and structure the backend of a web app which requires storing lots of individual items of data for many users.
What I Have Already, and What I Want to Do
I have four database tables:
1. Contains every word of a corpus of texts in the language, with lemma and morphological tagging. (350,000+ rows)
2. Contains a dictionary of words, with lemma numbers that match table 1. (6-7,000 rows)
3. Contains a list of grammar morphemes that need to be learnt. (500-1,000 rows)
4. Contains a list of users.
I want users to have a score for how well they know every word in the corpus. For each word:
Score for recognition of lexeme meaning.
x3 different scores for different aspects of grammar parsing relevant to this particular language.
I also want users to have a score for how well they know the different grammar morphemes. In other words, for each user, I want to store and retrieve up to 400-500k fields.
What I Would Like to Know
I'm pretty sure that I can't store all this data for each user in a database table, because the number of columns required far exceeds the maximum allowed in SQL (from my research: 1k, or maybe 4k on some systems).
At present, the only options I know about are storing the data in an xml file for each user, or in a csv file for each user.
What are my options? What are the +ves and -ves of these options? Thanks for your time and help.

I strongly recommend using (a) join table(s):
Word ID
User ID
Lexeme Score
x3 grammar Score
With a PK of (UserID, WordID) (and maybe a secondary key on WordID) you get a table, that has a max of 350k*Usercount rows, accessed only (or mostly) via PK, with close-to-perfect index locality, which seems quite manageable.
Edit
Assuming the word and user tables each have an integer PK called id and the score is a positive int, to create your join table you would need:
CREATE TABLE scores (
wordID INT NOT NULL,
userID INT NOT NULL,
lexscore INT UNSIGNED DEFAULT NULL,
gramscoreA INT UNSIGNED DEFAULT NULL,
gramscoreB INT UNSIGNED DEFAULT NULL,
gramscoreC INT UNSIGNED DEFAULT NULL,
PRIMARY KEY(userID, wordID)
);
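To see how such a table behaves in use, here is a minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL. The helper names set_lexscore and scores_for_user and the sample ids are hypothetical. Rows are created only for words a user has actually been scored on, so the table stays far below the theoretical maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE scores (
    wordID INTEGER NOT NULL,
    userID INTEGER NOT NULL,
    lexscore INTEGER DEFAULT NULL,
    gramscoreA INTEGER DEFAULT NULL,
    gramscoreB INTEGER DEFAULT NULL,
    gramscoreC INTEGER DEFAULT NULL,
    PRIMARY KEY (userID, wordID)
)
""")

def set_lexscore(conn, user_id, word_id, score):
    """Create the (user, word) row if it is missing, then store the score."""
    conn.execute(
        "INSERT OR IGNORE INTO scores (userID, wordID) VALUES (?, ?)",
        (user_id, word_id),
    )
    conn.execute(
        "UPDATE scores SET lexscore = ? WHERE userID = ? AND wordID = ?",
        (score, user_id, word_id),
    )

def scores_for_user(conn, user_id):
    """Fetch every stored score row for one user via the primary key prefix."""
    return conn.execute(
        "SELECT wordID, lexscore, gramscoreA, gramscoreB, gramscoreC "
        "FROM scores WHERE userID = ? ORDER BY wordID",
        (user_id,),
    ).fetchall()

set_lexscore(conn, 7, 1001, 3)
set_lexscore(conn, 7, 1001, 5)  # updates the existing row, no duplicate
set_lexscore(conn, 7, 2002, 1)
print(scores_for_user(conn, 7))
```

In MySQL itself the two statements in set_lexscore could be collapsed into a single INSERT ... ON DUPLICATE KEY UPDATE.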

mysql - Many tables to one table - multiple entries

I have a system which has (for the sake of a simple example) tables consisting of Customers, Vendors, Products, Sales, etc. Total 24 tables at present.
I want to add the ability to have multiple notes for each record in these tables. E.g., Each Customers record could have 0 to many notes, each Vendor record could have 0 to many notes, etc.
My thinking is that I would like to have one "Notes" table, indexed by a Noteid and Date/time stamp. The actual note data would be a varchar(255).
I am looking for a creative way to bi-directionally tie the source tables to the Notes table. The idea of having 24 foreign key type cross reference tables or 24 Notes tables doesn't really grab me.
Programming is being done in PHP with Apache server. Database is mysql/InnoDB.
Open to creative ideas.
Thanks
Ralph

I would suggest a table like this:
note_id : int autoincrement primary
type_id : int, foreign key from Customers, Vendors, Products etc
type : varchar, code indicating the type, like Vendors, VENDORS or just V
note : varchar, the actual note
CREATE TABLE IF NOT EXISTS `notes` (
`note_id` int(11) NOT NULL AUTO_INCREMENT,
`type_id` int(11) NOT NULL,
`type` varchar(20) CHARACTER SET utf8 NOT NULL,
`note` varchar(255) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`note_id`)
)
With a setup like that you can have multiple notes for each type, like Vendors, and also hold notes for multiple types.
data sample
note_id  type_id  type       note
-------  -------  ---------  ------------------------
1        45       Vendors    a note
2        45       Vendors    another note
3        3        Customers  a note for customer #3
4        67       Products   a note for product #67
SQL sample
select note from notes where type='Vendors' and type_id=45
To reduce table size, I would prefer aliases for the types, like V, P, C and so on.

Don't do a "universal" table, e.g.
id, source_table, source_record_id, note_text
This might sound good in theory, but you can NOT join this table against your others without writing dynamic SQL.
It's far better to simply add a dedicated notes field to every table. This eliminates any need for dynamic sql, and the extra space usage will be minimal if you use varchar/text fields, since those aren't stored in-table anyways.

I've done a structure like this before where I used a format like this:
id (int)
target_type (enum/varchar)
target_id (int)
note (text)
Each data element then just has to query for its own type, so for your customer object you would query for the notes attached to it like this:
SELECT * FROM notes where target_type='customer' AND target_id=$this->id
You can also link target_type to the actual class, so that you write to the database using get_class($this) to fill out target type, in which case a single function inside of the Note class could take in any other object type you have.
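The class-name idea can be sketched as follows. This is an illustration only, written in Python with sqlite3 standing in for PHP and MySQL: type(obj).__name__ plays the role of get_class($this), the Customer and Vendor classes and the helpers add_note and notes_for are hypothetical, and the index on (target_type, target_id) is my own addition. The values are passed as bound parameters rather than interpolated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    target_type TEXT NOT NULL,
    target_id INTEGER NOT NULL,
    note TEXT NOT NULL
)
""")
# Assumption: an index on the lookup pair keeps per-object queries cheap.
conn.execute("CREATE INDEX notes_target ON notes (target_type, target_id)")

class Customer:
    def __init__(self, id):
        self.id = id

class Vendor:
    def __init__(self, id):
        self.id = id

def target_type(obj):
    """Derive the stored type code from the object's class name."""
    return type(obj).__name__.lower()

def add_note(conn, obj, text):
    conn.execute(
        "INSERT INTO notes (target_type, target_id, note) VALUES (?, ?, ?)",
        (target_type(obj), obj.id, text),
    )

def notes_for(conn, obj):
    rows = conn.execute(
        "SELECT note FROM notes WHERE target_type = ? AND target_id = ? ORDER BY id",
        (target_type(obj), obj.id),
    ).fetchall()
    return [row[0] for row in rows]

add_note(conn, Customer(3), "a note for customer #3")
add_note(conn, Vendor(3), "a note for vendor #3")
add_note(conn, Customer(3), "another customer note")
print(notes_for(conn, Customer(3)))
```

A customer and a vendor that share the same id keep separate notes because the type is part of every lookup.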

In my opinion, there isn't a clean solution for this.
option 1: Master entity table
Every (relevant) row of every (relevant) table has a master entry inside a table (let's call it entities_tbl). The id of each derived table isn't an autoincrement; it's a foreign key referencing the master table.
Now you can easily link the notes table with the master entity id.
PRO: It's an object-oriented idea. Like a base "Object" class which is the father of every other class. Also, each entity has a unique id across the database.
CON: It's a mess. Every entity ID is scattered among (at least) two tables. You'd need JOINs every single time, and the master entity table will be HUGE (it will contain the same number of rows as the sum of every other child table, combined)
option 2: meta-attributes
inside the notes table, the primary key would contain an autoincrement, the entity_id and item_table_name. This way you can easily extract the notes of any entity from any table.
PRO: Easy to write, easy to populate
CON: It needs meta-values to extract real values. No foreign keys to grant referential integrity, messy and sloppy joins, table names as where conditions.
option 3: database denormalization
(sigh, I never thought I would ever suggest this)
Add a column inside each table where you need notes. Store the notes as JSON-encoded strings. (This denormalizes the database, because you introduce non-atomic values.)
PRO: easy and fast to write, uses some form of standard even for future database users, the notes are centralized and easily accessible from each entity
CON: the database isn't normalized, poor search and comparison between notes

PHP & MySQL - Which variable type best for this array?

Question: Stated in title
Project: My PHP Mafia/Mobsters Strategy Game
Reason for Asking: I'm unsure how I would word this question into a somewhat relevant Google search.
I would like to know which MySQL variable type I should use for an array. I will explode this array into an ID list including all people in that player's mob.
EXAMPLE MYSQL DATA BELOW:
PlayerId   Mob                          // Labels
134        '23','59','12','53','801'    // Values
This will then be exploded using explode() in PHP into a bunch of ints containing the IDs of players in that person's mob.
I would like the mob field to have an unlimited character length so that players can have HUGEE mobs.
I think I may be able to simply use longtext or the set type but I'm not completely sure. I don't want any errors later on once I release the game and I want my methods to stay clean and correct.
Thank you so much for taking the time to read this, I hope you can help. :)

You should create a table that associates players with mobs:
CREATE TABLE PlayerMobs (
PlayerID INT UNSIGNED NOT NULL,
MobID INT UNSIGNED NOT NULL,
PRIMARY KEY (PlayerID, MobID),
FOREIGN KEY (PlayerID) REFERENCES Players (PlayerID),
FOREIGN KEY (MobID) REFERENCES Mobs (MobID)
);
And then join it with your other tables in queries as required.
I have added FOREIGN KEY constraints to ensure that only valid PlayerID and MobID values exist in the PlayerMobs table, but note that these constraints currently only work with the InnoDB storage engine.
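A short sketch of how the association table replaces the comma-separated list, assuming Python's sqlite3 module as a stand-in for MySQL. The Players and Mobs tables are reduced to bare id columns, the composite primary key is my own assumption, and the helper name mob_members and the sample ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;  -- SQLite only enforces foreign keys when asked to
CREATE TABLE Players (PlayerID INTEGER PRIMARY KEY);
CREATE TABLE Mobs (MobID INTEGER PRIMARY KEY);
CREATE TABLE PlayerMobs (
    PlayerID INTEGER NOT NULL REFERENCES Players (PlayerID),
    MobID INTEGER NOT NULL REFERENCES Mobs (MobID),
    PRIMARY KEY (PlayerID, MobID)
);
""")

conn.executemany("INSERT INTO Players VALUES (?)", [(23,), (59,), (12,)])
conn.execute("INSERT INTO Mobs VALUES (1)")
conn.executemany("INSERT INTO PlayerMobs VALUES (?, 1)", [(23,), (59,), (12,)])

def mob_members(conn, mob_id):
    """Return the player ids in a mob: one row per member, nothing to explode()."""
    rows = conn.execute(
        "SELECT PlayerID FROM PlayerMobs WHERE MobID = ? ORDER BY PlayerID",
        (mob_id,),
    ).fetchall()
    return [row[0] for row in rows]

print(mob_members(conn, 1))

# The foreign key rejects ids that do not exist in Players.
try:
    conn.execute("INSERT INTO PlayerMobs VALUES (999, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

A mob can grow without any column length limit, since each new member is just one more row.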

You can try
CREATE TABLE PlayerMobs (
PlayerId INT UNSIGNED NOT NULL,
MobId TEXT NOT NULL);

I did this a very long time ago, but I happened to come across the question when browsing through my account.
There aren't any specific "mobs." Your "mob" is basically like a friends list. It's not a group. It's just a bunch of people connected to you.
I believe I simply made a field for "mob members" and put the other players' IDs in it, separated by commas; then in the PHP I exploded the string with a comma as the delimiter.

Dynamic project data management with forms and mysql

I am currently responsible for creating a web based project management application for the department I work in. Each project has a bunch of different data points that describe it and my first plan was to just set up a table in the mysql database and an html form to manage the data in that table.
My managers just let me know they will need to be able to add/delete data points for the projects in case their work flow and project tracking changes. (This would be something that happens MAYBE a few times a year if at all)
So I am attempting to figure out the best way to go about storing this data in MySQL. The first approach that came to mind was to give them an interface that allows them to add columns to the 'projects' table, and to have a 'master' table that tracks all the column names and data types. But that feels like a REALLY bad idea and a bit of a nightmare to maintain.
Another possible option would be to have the interface add a new table that stores all the information for that data point AND the id of the project that is using the data.
I understand that both of these could be really screwy ways of doing things. If there is a better way I would love to hear about it. If I need to clarify something let me know.
Thank you for your time

CREATE TABLE projects (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(50) NOT NULL
);
CREATE TABLE datapoints (
id INT PRIMARY KEY AUTO_INCREMENT,
projectid INT NOT NULL,
name VARCHAR(50) NOT NULL,
value VARCHAR(250) NOT NULL,
INDEX(projectid),
INDEX(name)
);
If you want something fancier, do one or more of:
Put datapoint names in a table and reference them instead of naming them in table datapoints
Have datapoints have a field for each of numeric, pit, text, longtext OR use different tables
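With this layout, adding a new data point is an INSERT rather than a schema change. A minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL; the helper names add_datapoint and project_datapoints and the sample data point names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL
);
CREATE TABLE datapoints (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    projectid INTEGER NOT NULL REFERENCES projects (id),
    name TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE INDEX datapoints_projectid ON datapoints (projectid);
CREATE INDEX datapoints_name ON datapoints (name);
""")

def add_datapoint(conn, project_id, name, value):
    """Attach a named value to a project; values are stored as text."""
    conn.execute(
        "INSERT INTO datapoints (projectid, name, value) VALUES (?, ?, ?)",
        (project_id, name, str(value)),
    )

def project_datapoints(conn, project_id):
    """Collect all data points of one project into a name -> value dict."""
    rows = conn.execute(
        "SELECT name, value FROM datapoints WHERE projectid = ? ORDER BY id",
        (project_id,),
    ).fetchall()
    return dict(rows)

project_id = conn.execute(
    "INSERT INTO projects (name) VALUES (?)", ("Website redesign",)
).lastrowid
add_datapoint(conn, project_id, "budget", 12000)
add_datapoint(conn, project_id, "owner", "Dana")
print(project_datapoints(conn, project_id))
```

Because every value is stored as text, numeric comparisons and sorting need a cast, which is the motivation for the typed-field variant mentioned above.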

Figuring out the most effective way for a custom database server (PHP)

I just came across the idea of writing a special database which will fit for exactly one purpose. I have looked into several other database-systems and came to the conclusion that I need a custom type. However my question is not about if it is a good idea, but how to implement this best.
The application itself is written in php and needs to write to a custom database system.
Because there can be simultaneous read/write operations I can forget the idea of implementing the database directly into my application. (correct me please if I'm wrong).
That means I have to create 2 scripts:
The database-server-script
The application.
This means that the application has to communicate with the server. My idea was using php in cli mode for the database-server. The question is, if this is effective, or if I should look into a programming language like c++ to develop the server application? The second question is then the communication. When using php in cli mode I thought about giving a serialized-array-query as a param. When using c++ should I still do it serialized? or maybe in json, or whatever?
I have to note that a database to search through can consist of several thousand entries, so I don't know whether PHP is really the right choice.
Secondly, I have to note that queries aren't strings which have to be parsed, but an array giving a key/value filter or dataset. The only more complex thing the database server has to be able to do is compare strings like MySQL's LIKE '%VALUE%', which could be slow with several thousand entries.
Thanks for the Help.

"writing a special database which will fit for exactly one purpose"
I presume you mean a custom database management system.
I'm having a lot of trouble understanding why this would ever be necessary.
"Databases and tables like usual databases have. But I don't have columns. Each entry can have its own columns, except for the id"
That's not a very good reason for putting yourself (and your users) through a great deal of pain and effort.
"I could use MySQL id | serialized data... but then much fun searching over a specific parameter in an entry"
So what's wrong with a fully polymorphic model implemented on top of a relational database:
CREATE TABLE relation (
id INTEGER NOT NULL auto_increment,
....
PRIMARY KEY (id)
);
CREATE TABLE col_string (
relation_id INTEGER NOT NULL, /* references relation.id */
name VARCHAR(20),
val_string VARCHAR(40),
PRIMARY KEY (relation_id, name)
);
CREATE TABLE col_integer (
relation_id INTEGER NOT NULL, /* references relation.id */
name VARCHAR(20),
val_integer INTEGER,
PRIMARY KEY (relation_id, name)
);
CREATE TABLE col_float (
relation_id INTEGER NOT NULL, /* references relation.id */
name VARCHAR(20),
val_float FLOAT,
PRIMARY KEY (relation_id, name)
);
... and tables for BLOBs, DATEs, etc
Or if scalability is not a big problem....
CREATE TABLE all_cols (
relation_id INTEGER NOT NULL, /* references relation.id */
name VARCHAR(20),
ctype ENUM('string','integer','float',...),
val_string VARCHAR(40),
val_integer INTEGER,
val_float FLOAT,
...
PRIMARY KEY (relation_id, name)
);
Yes, inserts and selecting 'rows' is more complicated than for a normal relational table - but a lot simpler than writing your own DBMS from scratch. And you can wrap most of the functionality in stored procedures. The method described would also map easily to a NoSQL db.
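To give a feel for the extra work involved, here is a rough sketch of inserting, reassembling and searching entries on top of the per-type tables. It is an illustration under assumptions: Python's sqlite3 module stands in for MySQL, only the string and integer tables are included, and the helper names insert_entry, load_entry and find_like as well as the sample fields are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relation (id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE col_string (
    relation_id INTEGER NOT NULL REFERENCES relation (id),
    name TEXT NOT NULL,
    val_string TEXT,
    PRIMARY KEY (relation_id, name)
);
CREATE TABLE col_integer (
    relation_id INTEGER NOT NULL REFERENCES relation (id),
    name TEXT NOT NULL,
    val_integer INTEGER,
    PRIMARY KEY (relation_id, name)
);
""")

# Maps a Python type to its (table, value column); names are fixed, not user input.
TABLES = {str: ("col_string", "val_string"), int: ("col_integer", "val_integer")}

def insert_entry(conn, fields):
    """Create one entry whose 'columns' are whatever keys fields contains."""
    rel_id = conn.execute("INSERT INTO relation DEFAULT VALUES").lastrowid
    for name, value in fields.items():
        table, column = TABLES[type(value)]
        conn.execute(
            f"INSERT INTO {table} (relation_id, name, {column}) VALUES (?, ?, ?)",
            (rel_id, name, value),
        )
    return rel_id

def load_entry(conn, rel_id):
    """Reassemble an entry by reading every per-type table."""
    entry = {"id": rel_id}
    for table, column in TABLES.values():
        query = f"SELECT name, {column} FROM {table} WHERE relation_id = ?"
        for name, value in conn.execute(query, (rel_id,)):
            entry[name] = value
    return entry

def find_like(conn, name, needle):
    """Ids of entries whose string field contains needle (LIKE '%VALUE%')."""
    rows = conn.execute(
        "SELECT relation_id FROM col_string "
        "WHERE name = ? AND val_string LIKE ? ORDER BY relation_id",
        (name, f"%{needle}%"),
    ).fetchall()
    return [row[0] for row in rows]

a = insert_entry(conn, {"title": "Blue widget", "stock": 4})
b = insert_entry(conn, {"title": "Red gadget", "color": "red"})
print(find_like(conn, "title", "widget"))
print(load_entry(conn, b))
```

Each entry carries its own set of fields, yet searching on a specific parameter stays an ordinary indexed SQL query instead of a scan over serialized data.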
