Localization with mysql/PHP

I'm currently building a multilingual website using PHP and MySQL and am wondering what the best approach regarding localization is with regards to my data model. Each table contains fields that need to be translated (e.g. name, description....) in several languages.
The first idea was to create a field for each language (e.g. name_en, name_de, name_fr) and retrieve the appropriate field from PHP using a variable (e.g. $entry['name_' . LANGUAGE]). While it would work, this approach has in my opinion many drawbacks:
- you need as many occurrences of each field as you have languages (bearing in mind you can have en-US, en-CA, en-GB...)
- if you add or remove languages you need to modify the database structure accordingly
- if you have untranslated fields, they are still created for each entry, which doesn't seem very optimized
The second idea is to create a translation table that can be used to store the translation of any field of any table in the database:
----------------
translation
----------------
id INT
table_name VARCHAR
field_name VARCHAR
value VARCHAR
language_id VARCHAR
The table_name and field_name will allow identifying which table and which field the translation is about, while language_id will indicate which language that translation is for. The idea is to create models that would replace the value of the translatable fields (e.g. name, description) with their corresponding translation based on the language selected by the user.
Can you see drawbacks with this approach? Have you got suggestions to make?
Thanks.

The main drawback is that you destroy the relational model by storing metadata like table name and field name as application data. Your queries would be ugly and inefficient.
Another drawback is that you are limited only to one data type of the translatable data. Your table structure would define
value VARCHAR(255)
which means that data needing only a small field would still always be stored in a VARCHAR(255). And if you want it to be more universal still, so that it can also hold large text, you need to define it as
value TEXT
which is even worse.
The popular model is the following. For every entity you define the fields which are not language dependent and those which are language dependent and create always 2 tables. For example:
products
--------
id
price
status
picture
products_translations
--------
product_id
language_id
name VARCHAR(100)
description TEXT
This is the proper relational approach. Of course, it also has drawbacks, the major one being that you always have to join 2 tables to fetch items, and adding or updating data becomes a bit more complex.
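To make the two-table layout concrete, here is a minimal sketch of it with the join needed to read one product in one language. It is written in Python against an in-memory SQLite database purely so it can be run as-is; the sample rows, the composite primary key and the fetch_product helper are my own assumptions, not part of the answer above, and the same SQL should carry over to MySQL with minor type changes.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; sample data is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    id      INTEGER PRIMARY KEY,
    price   REAL,
    status  TEXT,
    picture TEXT
);
CREATE TABLE products_translations (
    product_id  INTEGER NOT NULL REFERENCES products(id),
    language_id TEXT    NOT NULL,
    name        VARCHAR(100),
    description TEXT,
    PRIMARY KEY (product_id, language_id)  -- assumed: one row per product and language
);
INSERT INTO products VALUES (1, 9.99, 'active', 'mug.png');
INSERT INTO products_translations VALUES
    (1, 'en', 'Mug',   'A ceramic mug'),
    (1, 'de', 'Tasse', 'Eine Keramiktasse');
""")


def fetch_product(product_id, language_id):
    """Return (id, price, name, description) for one product in one language."""
    return conn.execute(
        """
        SELECT p.id, p.price, t.name, t.description
        FROM products AS p
        JOIN products_translations AS t ON t.product_id = p.id
        WHERE p.id = ? AND t.language_id = ?
        """,
        (product_id, language_id),
    ).fetchone()


print(fetch_product(1, "de"))
```

A missing translation simply yields no row with this inner join, so the application has to decide what to show in that case.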

Not sure if this answer will satisfy you, but I discern between two types of texts:
static
dynamic
Static text is provided by yourself for general application text that users have no influence on: stuff like form input labels and introductory text. I use gettext for those, so I can send it off to professional translators if I need it translated.
Dynamic text is text provided by the user of the application, which seems to be what you're talking about. Personally, I discern dynamic text into 2 different types as well.
generally applicable
specific
An example of the general type would be options inside of HTML select elements, or a tagging system. They're not specific to a single content element, but (can) apply to multiple ones.
Examples for a specific text would be the actual content inside of a CMS like an article, or a product description in an online shop.
For the first one I use a kind of central lookup table with a hash of the actual, original text as the index, which I refer to as a foreign key in tables where I use that string. Then you look up that hash in the central table to echo the real text behind it (of course, you ought to use some sort of caching here).
For the latter one I use a classic content table with columns for every content area specific to that logical content unit and a row for each language.
Thus far it's working out pretty well.
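For what it's worth, the central lookup table for the "generally applicable" strings could look roughly like the sketch below. Everything concrete in it is an assumption on my part: the table and column names (strings, tags, label_hash), the choice of SHA-1 as the hash, the per-language rows, and the plain dictionary used as a cache. It uses Python with in-memory SQLite only so that it runs without a server.

```python
import hashlib
import sqlite3


def text_key(original):
    """Hash of the original (source-language) text, used as the lookup key."""
    return hashlib.sha1(original.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE strings (
    hash     TEXT NOT NULL,   -- hash of the original text
    language TEXT NOT NULL,
    text     TEXT NOT NULL,
    PRIMARY KEY (hash, language)
);
CREATE TABLE tags (
    id         INTEGER PRIMARY KEY,
    label_hash TEXT NOT NULL  -- refers to strings.hash
);
""")

key = text_key("Kitchen")
conn.executemany(
    "INSERT INTO strings VALUES (?, ?, ?)",
    [(key, "en", "Kitchen"), (key, "de", "Küche")],
)
conn.execute("INSERT INTO tags VALUES (1, ?)", (key,))

_cache = {}  # naive in-process cache; a real site would use something shared


def translate(hash_, language):
    """Look up the text behind a hash, caching hits and misses."""
    cache_key = (hash_, language)
    if cache_key not in _cache:
        row = conn.execute(
            "SELECT text FROM strings WHERE hash = ? AND language = ?",
            cache_key,
        ).fetchone()
        _cache[cache_key] = row[0] if row else None
    return _cache[cache_key]


def tag_label(tag_id, language):
    """Resolve a tag's label in the requested language, or None."""
    row = conn.execute(
        "SELECT label_hash FROM tags WHERE id = ?", (tag_id,)
    ).fetchone()
    return translate(row[0], language) if row else None


print(tag_label(1, "de"))
```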

Related

Creating a entity framework SQL table structure

For the past couple of years I've been working on my own lightweight PHP CMS that I use for my personal projects. The one thing it's missing is an easy database solution.
I am looking to create a simple content type database framework in which I can specify a new type (user, book, event, etc.) and then be able to load everything related to it automatically.
For some content types, there could be fields that can only have 1 value and some that can have zero to many values so I will use a new table for these. Take the example:
table: event
columns: id, name, description, date
table: event_people:
columns: id_event, id_user
table: event_pictures:
columns: id_event, picture
Events will have a bunch of fields that contain a value such as the description, but there could also be a bunch of pictures and people going to it.
I want to be able to create a generic PHP class that will load all the information on a content type. My current thought process is to make entity loader function that I can give it an id and type:
Entity::load($id, "event");
From this I was going to get all of the tables with the prefix of "event", load all of the data with the passed in ID and then store it in a multidimensional array. I feel like there is probably a more efficient way for this however. I'd like to stay away from having a config file someplace that specifies all of the content types and their child tables because I want to be able to add a new child table and have it pick it up automatically.
Is there any way to store this relationship directly within the MySQL table? I don't do a lot of database work and I've just recently started to use foreign keys (what a life saver). Would it be more efficient to see which tables have a foreign key related to the id column in the event table, and if so how would this be done? I'm also open to different ways of storing this information.
Note: I'm doing this just for fun so please don't refer me to use any premade frameworks. I'd like to create this myself.

I think your approach of searching for all tables with prefix name event is sensible. The only way I can think to be more efficient is to have an "entity_relationship" table that you could query. It would allow you flexibility in your naming convention, avoid naming conflicts, and this lookup should be more efficient than a pattern match search.
Whenever a new object type with its own table is added, you could make an entry in the relationship table.
INSERT INTO entity_relationship VALUES
('event','event_people'),
('event','event_pictures'),
('event','event_documents'),
('event','event_performers');
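A loader built on that relationship table might look something like this sketch. It is in Python with in-memory SQLite rather than PHP/MySQL so it can be run directly, and it makes several assumptions that are not in the answer: the column names parent and child, the convention that every child table has a foreign key column called id_<parent> (taken from the question's example tables), and the load_entity function itself. Table names are interpolated into the SQL, so they must only ever come from the trusted relationship table, never from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows behave like dicts
conn.executescript("""
CREATE TABLE entity_relationship (parent TEXT, child TEXT);
CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT, description TEXT, date TEXT);
CREATE TABLE event_people   (id_event INTEGER, id_user INTEGER);
CREATE TABLE event_pictures (id_event INTEGER, picture TEXT);

INSERT INTO entity_relationship VALUES
    ('event', 'event_people'),
    ('event', 'event_pictures');
INSERT INTO event VALUES (1, 'Launch', 'Product launch', '2024-05-01');
INSERT INTO event_people VALUES (1, 10), (1, 11);
INSERT INTO event_pictures VALUES (1, 'stage.jpg');
""")


def load_entity(entity_id, entity_type):
    """Load the main row plus all registered child rows into nested dicts."""
    children = [
        r["child"]
        for r in conn.execute(
            "SELECT child FROM entity_relationship WHERE parent = ? ORDER BY child",
            (entity_type,),
        )
    ]
    if not children:
        # Also guards the identifier interpolation below.
        raise ValueError(f"unknown entity type: {entity_type!r}")

    row = conn.execute(
        f'SELECT * FROM "{entity_type}" WHERE id = ?', (entity_id,)
    ).fetchone()
    if row is None:
        return None

    entity = {entity_type: dict(row)}
    fk = f"id_{entity_type}"  # assumed naming convention for the foreign key
    for child in children:
        rows = conn.execute(f'SELECT * FROM "{child}" WHERE "{fk}" = ?', (entity_id,))
        entity[child] = [dict(r) for r in rows]
    return entity


print(load_entity(1, "event"))
```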

Most efficient method to store MYSQL options and values in database

I'm hitting a dead end with the best practice for storing a large amount of options and values in my MySQL database and then assigning them to properties. The way I usually do this (example is for real estate) is to create a table called "pool", then have an auto increment value as the ID and a varchar to store the value, in this case "Above Ground", and another row for "In-ground". Then in my property table I would have a column for "has_pool" with the proper ID value from the "pool" table assigned. Obviously the problem is that with hundreds of options (fireplace, water view, etc.) for each property, my number of database tables will get very large, very fast, and my left joins would get out of control on the front end.
Can someone point me in the right direction on what the best practice would be to easily populate new values for the property attributes and keep the query count down to a minimum? I feel like there is a simple solution but my research so far has not made it apparent to me. Thank you!

One way you could do this is to create an 'options' table with three columns: id, menuId, value
Create another table called menus, with two fields; id and name.
Add the menu names (pool, fireplace etc.) to the menus table, and then add the possible values to the options table, including the id of the menu it is related to.
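Here is a rough sketch of those two tables and how a property's attributes could then be read in one query. The answer does not say how options get attached to a property, so the property_options link table is my assumption, as are the sample rows and the property_attributes helper. It is Python with in-memory SQLite only so it runs as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE menus (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE options (
    id     INTEGER PRIMARY KEY,
    menuId INTEGER NOT NULL REFERENCES menus(id),
    value  TEXT NOT NULL
);
-- Assumed link table: which options apply to which property.
CREATE TABLE property_options (
    property_id INTEGER NOT NULL,
    option_id   INTEGER NOT NULL REFERENCES options(id),
    PRIMARY KEY (property_id, option_id)
);

INSERT INTO menus VALUES (1, 'pool'), (2, 'fireplace');
INSERT INTO options VALUES
    (1, 1, 'Above Ground'),
    (2, 1, 'In-ground'),
    (3, 2, 'Wood-burning');
INSERT INTO property_options VALUES (100, 2), (100, 3);
""")


def property_attributes(property_id):
    """Return {menu name: option value} for one property using a single query."""
    rows = conn.execute(
        """
        SELECT m.name, o.value
        FROM property_options AS po
        JOIN options AS o ON o.id = po.option_id
        JOIN menus   AS m ON m.id = o.menuId
        WHERE po.property_id = ?
        ORDER BY m.name
        """,
        (property_id,),
    ).fetchall()
    return dict(rows)


print(property_attributes(100))
```

Adding a new kind of attribute is then just an insert into menus and options; no new table or extra join is needed.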

I'd store all the values serialized (e.g. JSON or XML or YAML) into a blob, and then define inverted index tables for attributes I want to be searchable.
I describe this technique and alternatives in my presentation Extensible Data Modeling with MySQL.
Also see http://bret.appspot.com/entry/how-friendfeed-uses-mysql
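As a very small illustration of the serialized-blob idea, the sketch below stores each property's attributes as JSON and maintains one inverted index table for a single searchable attribute. The table names (entities, index_pool), the choice of "pool" as the indexed attribute and both helper functions are assumptions for illustration, not taken from the presentation or the linked article. Python with in-memory SQLite is used so it can be run directly.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL            -- all attributes, serialized as JSON
);
-- One inverted index table per attribute that must be searchable.
CREATE TABLE index_pool (
    pool      TEXT    NOT NULL,
    entity_id INTEGER NOT NULL REFERENCES entities(id),
    PRIMARY KEY (pool, entity_id)
);
""")


def save_property(entity_id, attrs):
    """Store the blob and keep the index table in step with it."""
    with conn:  # one transaction, so blob and index cannot diverge
        conn.execute(
            "INSERT INTO entities VALUES (?, ?)", (entity_id, json.dumps(attrs))
        )
        if "pool" in attrs:
            conn.execute(
                "INSERT INTO index_pool VALUES (?, ?)", (attrs["pool"], entity_id)
            )


def find_by_pool(pool):
    """Search through the index table, then decode the matching blobs."""
    rows = conn.execute(
        """
        SELECT e.body
        FROM index_pool AS i
        JOIN entities AS e ON e.id = i.entity_id
        WHERE i.pool = ?
        ORDER BY e.id
        """,
        (pool,),
    )
    return [json.loads(body) for (body,) in rows]


save_property(1, {"pool": "In-ground", "fireplace": "Wood-burning"})
save_property(2, {"water_view": True})
save_property(3, {"pool": "Above Ground"})
print(find_by_pool("In-ground"))
```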

Tables' schema for MySQL driven multi-language website

I'm working on a multi-language website and trying to design an optimal db schema for this purpose.
As you see, there are 2 tables: langs and menu. My idea is the following:
For example, let's look at generating multi-language navigation from a MySQL table. In the PHP backend, while generating the navigation from the database:
Get all data from the menu table's rows
Left join the second table, langs (on the name field of the menu table)
and get the data from the selected language's column (e.g. en, ru)
What do you think: is this an optimal approach, or is there a more efficient solution? Please give me database-related answers, not file-based ones (e.g. gettext).

It would be better if the langs table contained a language column, with one row for each translation. This allows you to add another language later and only have to apply data changes (rather than having to re-write queries).
As you've already implied, performing a left join and falling back to a default language held directly in the menu table is also a good idea (in which case you don't need to hold a translation for e.g. en in the langs table, since the English version will always be available).
You may also want to consider storing country-specific translations (if, say, there are multiple Spanish-speaking countries where the translations can differ), and following a fallback strategy of looking for: language-country, then language, then English translations.
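The fallback chain can be expressed with two left joins and COALESCE, roughly as in this sketch. The layout is an assumption based on the suggestions above: menu holds the English default in its name column, and langs holds one row per translation keyed by menu_id and a locale such as 'es' or 'es-MX'. The sample data and the menu_labels helper are made up. It is Python with in-memory SQLite so it runs without a server; the SQL itself should work in MySQL too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE menu (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL              -- English default, always present
);
CREATE TABLE langs (
    menu_id INTEGER NOT NULL REFERENCES menu(id),
    locale  TEXT    NOT NULL,       -- 'es' or 'es-MX'
    text    TEXT    NOT NULL,
    PRIMARY KEY (menu_id, locale)
);
INSERT INTO menu VALUES (1, 'Home'), (2, 'Computer'), (3, 'Contact');
INSERT INTO langs VALUES
    (1, 'es',    'Inicio'),
    (2, 'es',    'Ordenador'),
    (2, 'es-MX', 'Computadora');
""")


def menu_labels(locale):
    """Labels in menu order: language-country first, then language, then English."""
    language = locale.split("-")[0]
    rows = conn.execute(
        """
        SELECT m.id, COALESCE(exact.text, lang.text, m.name)
        FROM menu AS m
        LEFT JOIN langs AS exact ON exact.menu_id = m.id AND exact.locale = ?
        LEFT JOIN langs AS lang  ON lang.menu_id  = m.id AND lang.locale  = ?
        ORDER BY m.id
        """,
        (locale, language),
    ).fetchall()
    return [label for _id, label in rows]


print(menu_labels("es-MX"))
```

Adding another language is then purely a data change: new rows in langs, with no change to the query.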

You can normalize further like so:
tokenname: id, token (e.g. (1, program), (2, menu))
localizations: id, tokenname_id, lang, text
Tables that reference strings then refer to the string by tokenname.id as a foreign key.
Note that you will need some kind of mini templating language for text localizations.

How to design a generic database whose layout may change over time?

Here's a tricky one - how do I programmatically create and interrogate a database whose contents I can't really foresee?
I am implementing a generic input form system. The user can create PHP forms with a WYSIWYG layout and use them for any purpose he wishes. He can also query the input.
So, we have three stages:
a form is designed and generated. This is a one-off procedure, although the form can be edited later. This designs the database.
someone or several people make use of the form - say for daily sales reports, stock keeping, payroll, etc. Their input to the forms is written to the database.
others, maybe management, can query the database and generate reports.
Since these forms are generic, I can't predict the database structure - other than to say that it will reflect HTML form fields and consist of the data input from a collection of edit boxes, memos, radio buttons and the like.
Questions and remarks:
A) how can I best structure the database, in terms of tables and columns? What about primary keys? My first thought was to use the control name to identify each column, then I realized that the user can edit the form and rename, so that maybe "name" becomes "employee" or "wages" becomes ":salary". I am leaning towards a unique number for each.
B) how best to key the rows? I was thinking of a timestamp to allow me to query and a column for the row Id from A)
C) I have to handle column rename/insert/delete. For deletion, I am unsure whether to delete the data from the database. Even if the user is not inputting it from the form any more he may wish to query what was previously entered. Or there may be some legal requirements to retain the data. Any gotchas in column rename/insert/delete?
D) For the querying, I can have my PHP interrogate the database to get column names and generate a form with a list where each entry has a database column name, a checkbox to say if it should be used in the query and, based on column type, some selection criteria. That ought to be enough to build searches like "position = 'senior salesman' and salary > 50k".
E) I probably have to generate some fancy charts - graphs, histograms, pie charts, etc for query results of numerical data over time. I need to find some good FOSS PHP for this.
F) What else have I forgotten?
This all seems very tricky to me, but I am database n00b - maybe it is simple to you gurus?
Edit: please don't tell me not to do it. I don't have any choice :-(
Edit: in real life I don't expect column rename/insert/delete to be frequent. However it is possible that after running for a few months a change to the database might be required. I am sure this happens regularly. I fear that I have worded this question badly and that people think that changes will be made willy-nilly every 10 minutes or so.
Realistically, my users will define a database when they lay out the form. They might get it right first time and never change it - especially if they are converting from paper forms. Even if they do decide to change, this might only happen once or twice ever, after months or years - and that can happen in any database.
I don't think that I have a special case here, nor that we should be concentrating on change. Perhaps better to concentrate on linkage - what's a good primary key scheme? Say, perhaps, for one text input, one numerical and a memo?

"This all seems very tricky to me, but I am database n00b - maybe it is simple to you gurus?"
Nope, it really is tricky. Fundamentally what you're describing is not a database application, it is a database application builder. In fact, it sounds as if you want to code something like Google App Engine or a web version of MS Access. Writing such a tool will take a lot of time and expertise.
Google has implemented flexible schemas by using its BigTable platform. It allows you to flex the schema pretty much at will. The catch is, this flexibility makes it very hard to write queries like "position = 'senior salesman' and salary > 50k".
So I don't think the NoSQL approach is what you need. You want to build an application which generates and maintains RDBMS schemas. This means you need to design a metadata repository from which you can generate dynamic SQL to build and change the users' schemas and also generate the front end.
Things your metadata schema needs to store
For schema generation:
foreign key relationships (an EMPLOYEE works in a DEPARTMENT)
unique business keys (there can be only one DEPARTMENT called "Sales")
reference data (permitted values of EMPLOYEE.POSITION)
column data type, size, etc
whether column is optional (i.e. NULL or NOT NULL)
complex business rules (employee bonuses cannot exceed 15% of their salary)
default value for columns
For front-end generation
display names or labels ("Wages", "Salary")
widget (drop down list, pop-up calendar)
hidden fields
derived fields
help text, tips
client-side validation (associated JavaScript, etc)
That last points to the potential complexity in your proposal: a regular form designer like Joe Soap is not going to be able to formulate the JS to (say) validate that an input value is between X and Y, so you're going to have to derive it using templated rules.
These are by no means exhaustive lists, it's just off the top of my head.
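To give a feel for what "generate dynamic SQL from a metadata repository" means in the simplest case, here is a sketch that turns a few field descriptions into a CREATE TABLE statement. It is only an illustration under assumptions: FieldMeta, create_table_sql, the identifier whitelist and the three allowed types are all invented here, the GUID key is stored as text, and SQLite (via Python) stands in for whatever RDBMS you target. Note that the display label lives in the metadata only and never reaches the schema.

```python
import re
import sqlite3
import uuid
from dataclasses import dataclass

IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")  # whitelist for generated names
ALLOWED_TYPES = {"INTEGER", "REAL", "TEXT"}


@dataclass
class FieldMeta:
    column: str            # stable schema name, never shown to users
    label: str             # display label, free to change later
    sql_type: str
    nullable: bool = True


def create_table_sql(table, fields):
    """Build a CREATE TABLE statement from metadata, rejecting unsafe names."""
    for name in [table] + [f.column for f in fields]:
        if not IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    columns = ["id TEXT PRIMARY KEY"]  # GUID stored as text
    for f in fields:
        if f.sql_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {f.sql_type!r}")
        columns.append(f"{f.column} {f.sql_type}" + ("" if f.nullable else " NOT NULL"))
    return f"CREATE TABLE {table} ({', '.join(columns)})"


fields = [
    FieldMeta("position", "Position", "TEXT", nullable=False),
    FieldMeta("salary", "Wages", "REAL"),
]
ddl = create_table_sql("employees", fields)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute(
    "INSERT INTO employees VALUES (?, ?, ?)",
    (str(uuid.uuid4()), "senior salesman", 62000),
)

# Renaming the label on the form does not touch the schema.
fields[1].label = "Salary"
print(ddl)
```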
For primary keys I suggest you use a column of GUID datatype. Timestamps aren't guaranteed to be unique, although if you run your database on an OS which goes to six places (i.e. not Windows) it's unlikely you'll get clashes.
last word
'My first thought was to use the control name to identify each column, then I realized that the user can edit the form and rename, so that maybe "name" becomes "employee" or "wages" becomes ":salary". I am leaning towards a unique number for each.'
I have built database schema generators before. They are hard going. One thing which can be tough is debugging the dynamic SQL. So make it easier on yourself: use real names for tables and columns. Just because the app user now wants to see a form titled HEADCOUNT it doesn't mean you have to rename the EMPLOYEES table. Hence the need to separate the displayed label from the schema object name. Otherwise you'll find yourself trying to figure out why this generated SQL statement failed:
update table_11123
set col_55542 = 'HERRING'
where col_55569 = 'Bootle'
/
That way madness lies.

In essence, you are asking how to build an application without specifications. Relational databases were not designed so that you can do this effectively. The common approach to this problem is an Entity-Attribute-Value design and for the type of system in which you want to use it, the odds of failure are nearly 100%.
It makes no sense for example, that the column called "Name" could become "Salary". How would a report where you want the total salary work if the salary values could have "Fred", "Bob", 100K, 1000, "a lot"? Databases were not designed to let anyone put anything anywhere. Successful database schemas require structure which means effort with respect to specifications on what needs to be stored and why.
Therefore, to answer your question, I would rethink the problem. The entire approach of trying to make an app that can store anything in the universe is not a recipe for success.

Like Thomas said, a relational database is not a good fit for your problem. However, you may want to take a look at NoSQL databases like MongoDB.

See this article:
http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/
for someone else's experience of your problem.

This is for A) & B), and is not something I have done but thought it was an interesting idea that Reddit put to use, see this link (look at Lesson 3):
http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html

Not sure about the database, but for charts, instead of using PHP, I recommend looking into using JavaScript (http://www.reynoldsftw.com/2009/02/6-jquery-chart-plugins-reviewed/). The advantages are that some of the processing for chart display is offloaded to the client side and the charts can be interactive.

The other respondents are correct that you should be very cautious with this approach because it is more complex and less performant than the traditional relational model - but I've done this type of thing to accommodate departmental differences at work, and it worked fine for the amount of use it got.
Basically I set it up like this, first - a table to store some information about the Form the user wants to create (obviously, adjust as you need):
--************************************************************************
-- Create the User_forms table
--************************************************************************
create table User_forms
(
form_id integer identity,
name varchar(200),
status varchar(1),
author varchar(50),
last_modifiedby varchar(50),
create_date datetime,
modified_date datetime
)
Then a table to define the fields to be presented on the form including any limits
and the order and page they are to be presented (my app presented the fields as a
multi-page wizard type of flow).
--************************************************************************
-- Create the field configuration table to hold the entry field configuration
--************************************************************************
create table field_configuration
(
field_id integer identity,
form_id SMALLINT,
status varchar(1),
fieldgroup varchar(20),
fieldpage integer,
fieldseq integer,
fieldname varchar(40),
fieldwidth integer,
description varchar(50),
minlength integer,
maxlength integer,
maxval varchar(13),
minval varchar(13),
valid_varchars varchar(20),
empty_ok varchar(1),
all_caps varchar(1),
value_list varchar(200),
ddl_queryfile varchar(100),
allownewentry varchar(1),
query_params varchar(50),
value_default varchar(20)
);
Then my Perl code would loop through the fields in order for page 1 and put them on the "wizard form" ... and the "next" button would present the page 2 fields in order etc.
I had JavaScript functions to enforce the limits specified for each field as well ...
Then a table to hold the values entered by the users:
--************************************************************************
-- Field to contain the values
--************************************************************************
create table form_field_values
(
session_Id integer identity,
form_id integer,
field_id integer,
value varchar(MAX)
);
That would be a good starting point for what you want to do, but keep an eye on performance as it can really slow down any reports if they add 1000 custom fields. :-)
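To show what reporting against such a values table involves, here is a sketch of the question's example search ("position = 'senior salesman' and salary > 50k"). It is not my original Perl/SQL Server code: it is Python with in-memory SQLite, the tables are trimmed down, and I assume a submission_id column that groups all values entered in one form submission. Each extra search condition costs another self-join and numeric values need a cast, which is where the performance warning comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE field_configuration (
    field_id  INTEGER PRIMARY KEY,
    form_id   INTEGER,
    fieldname TEXT
);
CREATE TABLE form_field_values (
    submission_id INTEGER,   -- assumed: groups the values of one submission
    form_id       INTEGER,
    field_id      INTEGER,
    value         TEXT
);
INSERT INTO field_configuration VALUES (1, 1, 'position'), (2, 1, 'salary');
INSERT INTO form_field_values VALUES
    (1, 1, 1, 'senior salesman'), (1, 1, 2, '62000'),
    (2, 1, 1, 'senior salesman'), (2, 1, 2, '41000'),
    (3, 1, 1, 'clerk'),           (3, 1, 2, '70000');
""")


def find_submissions(position, min_salary):
    """Submissions where position matches and salary exceeds min_salary."""
    rows = conn.execute(
        """
        SELECT pos.submission_id
        FROM form_field_values AS pos
        JOIN field_configuration AS pf
          ON pf.field_id = pos.field_id AND pf.fieldname = 'position'
        JOIN form_field_values AS sal
          ON sal.submission_id = pos.submission_id
        JOIN field_configuration AS sf
          ON sf.field_id = sal.field_id AND sf.fieldname = 'salary'
        WHERE pos.value = ?
          AND CAST(sal.value AS REAL) > ?   -- values are stored as text
        ORDER BY pos.submission_id
        """,
        (position, min_salary),
    ).fetchall()
    return [submission_id for (submission_id,) in rows]


print(find_submissions("senior salesman", 50000))
```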

So what RDF database do I use for a product attribute situation where I initially thought about using EAV?

I have a similar issue to the one described in How to design a product table for many kinds of product where each product has many parameters
I am convinced to use RDF now, only because of one of the comments made by Bill Karwin in the answer to the above issue,
but I already have a database in MySQL and the code is in PHP.
1) So what RDF database should I use?
2) Do I combine the approaches? Meaning I have class table inheritance in the MySQL database and just the weird product attributes in RDF? I don't think I should move everything to an RDF database, since it is only the products and the wide array of possible attributes and values that is giving me the problem.
3) What PHP resources and articles should I look at that will help me in the creation of this?
4) More articles or resources that help me better understand RDF in the context of the above challenge of building something that will hold all sorts of products' attributes and values will be greatly appreciated. I tend to work better when I have a conceptual understanding of what is going on.
Do bear in mind I am a complete novice at this, and my knowledge of programming and databases is average at best.

Ok, one of the main benefits of RDF is global identity, so if you use surrogate ids in your RDBMS schema, then you could assign the surrogate ids from a single sequence. This will make certain RDF extensions to your schema easier.
So in your case you would use a single sequence for the ids of products and other entities in your schema (maybe users etc.)
You should probably keep essential fields in normalized RDBMS tables, for example a products table with the fields which have a cardinality of 0-1.
The others you can keep in an additional table.
e.g. like this
create table product (
product_id int primary key
-- add the core data columns here, each preceded by a comma
)
create table product_meta (
product_id int,
property_id int,
value varchar(255)
)
create table properties (
property_id int primary key,
namespace varchar(255),
local_name varchar(255)
)
If you want also reference data in dynamic form you can add the following meta table :
create table product_meta_reference (
product_id int,
property_id int,
reference int
)
Here reference refers via the surrogate id to another entity in your schema.
And if you want to provide metadata for another table, let's say user, you can add the following table :
create table user_meta (
user_id int,
property_id int,
value varchar(255)
)
I put this as a different answer because it is a more specific suggestion than my first answer.
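As a sketch of how the product_meta and properties tables stay "RDF mappable", the snippet below reads the dynamic attributes of one product back as subject/predicate/object triples. The example.org namespace and product URI pattern, the sample rows, the name column on product and the product_triples helper are all hypothetical. It is Python with in-memory SQLite so it can be run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT                 -- stands in for the core data columns
);
CREATE TABLE properties (
    property_id INTEGER PRIMARY KEY,
    namespace   VARCHAR(255),
    local_name  VARCHAR(255)
);
CREATE TABLE product_meta (
    product_id  INTEGER REFERENCES product(product_id),
    property_id INTEGER REFERENCES properties(property_id),
    value       VARCHAR(255)
);
INSERT INTO product VALUES (1, 'Kettle');
INSERT INTO properties VALUES
    (1, 'http://example.org/ns#', 'color'),
    (2, 'http://example.org/ns#', 'voltage');
INSERT INTO product_meta VALUES (1, 1, 'red'), (1, 2, '230');
""")


def product_triples(product_id):
    """Return the dynamic attributes as (subject, predicate, object) tuples."""
    rows = conn.execute(
        """
        SELECT pr.namespace || pr.local_name, m.value
        FROM product_meta AS m
        JOIN properties AS pr ON pr.property_id = m.property_id
        WHERE m.product_id = ?
        ORDER BY pr.property_id
        """,
        (product_id,),
    )
    subject = f"http://example.org/product/{product_id}"  # hypothetical URI scheme
    return [(subject, predicate, value) for predicate, value in rows]


for triple in product_triples(1):
    print(triple)
```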

1 & 3) As you're using PHP and MySQL, your best bet would be either ARC 2 (although the website states this is a preview release, this is the release you want) or RAP, both of which support database-backed storage, allowing you to store your data in MySQL.
ARC 2 seems to be more popular and well maintained in my experience.
2) Depends how much of your existing code base would have to change if you move everything to RDF and what kinds of queries you do with the existing data in your database. Some things may be quicker using RDF and SPARQL while some may be slower.

I haven't used RDF with PHP, but in general if you use two persistence technologies in one project then the result is probably more complex and risky than using one alone.
If you stay with the RDBMS approach you can make it more RDF like by introducing the following attributes :
use a single sequence for all surrogate ids, this way you get unique identifiers, which is a requirement for RDF
use base tables for mandatory properties and extension tables with subject + property + values columns for additional data
You don't have to use an RDF engine to keep your data in RDF mappable form.
Compared to EAV, RDF is a more expressive paradigm for dynamic data modeling.
