I recently joined a project and am now working on improving its internationalization. The technologies used are PHP/MySQL/Zend Framework/Dojo. I18n is implemented using gettext, almost as described here link to SO question in the second answer.
But I have encountered one problem. Some of the information specific to certain DB tables is stored within those tables in enum-type columns. For example, there is a field usr_online_status in the table "user" which can be either 'online' or 'offline'. There are many such tables with enum fields containing values like ('yes', 'no'), ('download', 'upload') and so on. Of course this info is displayed in English regardless of the language the user has chosen.
I would like to solve this inconvenience, but I don't know the best way to do it in terms of performance and ease of implementation.
I see two possible options:
1) Make language-specific dictionary tables for each table which uses such enums.
2) Dump all the values from the enums, translate them, and make a script which could, on demand, alter every table and replace those enums with the required translations.
But there may be simpler or better solutions for this problem.
What would you do ?
Thanks for your answers.
UPD1
Important remark: info from the enums is not only displayed in the GUI but is also used in search. For example, there is a grid on a webpage which contains info about users. You can type 'line' in a search field and the result will be only those users whose info matches '%line%', for example the 'online' status.
You definitely want dictionary tables: only with these can 2 different users of the app work in different languages at the same time.
I recommend putting some of these dictionary tables into PHP though, as this has proven to be quite an unintrusive and performant way of doing it - e.g.
$translation = array('yes' => 'Ja', 'no' => 'Nein' /* ... */);
//...
$row = mysql_fetch_row($qry);
// $row[1] holds yes/no
$row[1] = $translation[$row[1]];
//...
$translation could be require_once()'ed depending on the current user's language preferences, the URL or whatever.
Basically you trade some RAM for speed and ease of use.
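To make the include pattern concrete, here is a minimal sketch - the lang/ directory and file layout are assumptions for illustration, not anything prescribed above:
// lang/de.php - one such hypothetical file per language, each returning its dictionary
return array('yes' => 'Ja', 'no' => 'Nein', 'online' => 'Online', 'offline' => 'Offline');

// in the request handling code:
$lang = 'de'; // e.g. from the user's preferences or the URL; validate against a whitelist
$translation = require 'lang/' . $lang . '.php';
echo $translation['offline']; // prints "Offline"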
UPDATE:
With Gior312 adding the info about search, here is my solution for it: have the reverse translation in a DB table (you might even use it to generate $translation via a script):
CREATE TABLE translations (
    id INT PRIMARY KEY AUTO_INCREMENT,
    languageid INT NOT NULL,
    enumword VARCHAR(m) NOT NULL,
    langword VARCHAR(n) NOT NULL,
    -- m and n to your needs
    INDEX (languageid)
    -- other indices to your needs
)
Now, where the search up until now was
$line = ...; // maybe coming from $_POST['line'] via mysql_real_escape_string()
$sql = "SELECT * FROM sometable WHERE somefield LIKE '%$line%'";
What you now do is
$line = ...; // maybe coming from $_POST['line'] via mysql_real_escape_string()
$sql = "SELECT enumword FROM translations WHERE languageid=$currentlanguageid AND langword LIKE '%$line%'";
$qry = mysql_query($sql);
// fetch the resulting enumwords into the array $enumwords
$enumwords = array();
while ($row = mysql_fetch_row($qry)) {
    $enumwords[] = $row[0];
}
$enumlist = implode("','", $enumwords);
// this assumes that the enumword values contain nothing that needs to be escaped
$sql = "SELECT * FROM sometable WHERE somefield IN ('$enumlist')";
The rationale behind treating forward and reverse translation differently is:
There will be many more lines in the code where you display than where you search, so the unintrusiveness of the forward translation is more important.
The forward translation has to be done PER ROW (with a join), the reverse only PER QUERY, so the performance of the forward translation is more important than the performance of the reverse translation.
Related
I have a Website in 3 languages.
Which is the best way to structure the DB?
1) Create 3 tables, one for every language (e.g. Product_en, Product_es, Product_de) and retrieve data from the appropriate table with an identifier:
e.g. on the php page I have a string:
$language = 'en';
so I get the data simply with
SELECT * FROM Product_$language
2) Create 1 table with:
ID LANGUAGE NAME DESCR
and post on the page only
WHERE LANGUAGE = '$language'
3) Create 1 table with:
ID NAME_EN DESCR_EN NAME_ES DESCR_ES NAME_DE DESCR_DE
Thank you!
I'd rather go for the second option.
The first option seems not flexible enough for searching records. What if you need to search across two languages? The best you can do then is to UNION the results of two SELECT statements. The third one has data redundancy: it feels like you need a language-specific copy of every name and description.
The second one is very flexible and handy. You can do whatever operations you want without adding special methods, unless you want to pivot the records.
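As a minimal sketch of option 2 - the table and column names here are illustrative assumptions:
CREATE TABLE product (
    id INT NOT NULL,
    language CHAR(2) NOT NULL,   -- 'en', 'es', 'de'
    name VARCHAR(255) NOT NULL,
    descr TEXT,
    PRIMARY KEY (id, language)
);

-- fetch all products in the current language
SELECT id, name, descr FROM product WHERE language = 'en';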
I would opt for option one or two. Which one really depends on your application and how you plan to access your data. When I have done similar localization in the past, I have used the single table approach.
My preference for this approach is that you don't need to change the DB schema at all should you add additional localizations. You also should not need to change your related code in this case, as the language identifier just becomes another value that is used in the query.
That way you would be killing the database in no time.
Just do a table like:
TABLE languages with fields:
-- product name
-- product description
-- two-letter language code
This will allow you not only to have a better-structured database, but you could even have products with only one translation. If you want, you can even show the default language if no other is specified. You'll do that programmatically of course, but I think you get the idea.
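The fallback can be done programmatically as described, or even in SQL; a hedged sketch, assuming a hypothetical product_translation table with columns product_id, lang, name and 'en' as the default language:
-- requested language if present, default language otherwise
SELECT COALESCE(t.name, d.name) AS name
FROM product_translation d
LEFT JOIN product_translation t
    ON t.product_id = d.product_id AND t.lang = 'es'
WHERE d.product_id = 42 AND d.lang = 'en';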
Some languages and courses (based on language) are defined in two tables. The language table is referenced in the course table to relate each course to a particular language. I also have a notes table that contains the notes of a specific course and is related to the course table. Now I have two issues.
In the code I need to take some specific action for the Spanish language only. How should I handle this, given that languages are entered by users and we won't know the Spanish language ID in advance? If I use text (the language name), then each time I need to fetch the ID for Spanish from the language table and then fetch all courses related to it from the course table.
Suppose Spanish notes are stored in four separate sections while other notes have only one section. Should I use the same table with four columns (one for each section) or two tables (notes and spanish_notes)? The former will leave three columns blank for other languages' notes, and I don't think that is good.
One quick solution to your first issue about multiple languages is to use language codes such as 'en', 'es', 'fr' etc. For instance, your language table could have both id and code columns, while your content tables have an FK on code. You could then get this language code from the request's Accept-Language header, or from somewhere else.
For the second question, in terms of normalization it is better to have a separate table for Spanish notes. It is way better for many reasons, such as redundancy and dependency concerns.
EDIT: PS. You could also have a look at language codes from here and HTTP Accept-Language from here.
Some inputs:
There could be 2 ways of doing this:
a. When languages are entered by users, use a SELECT dropdown box for accepting user input. For each SELECT option, you can set the language name as the text and the language ID as the value. This way you will know the language ID as soon as the form is submitted (see the sketch after this answer).
b. You can use a MySQL INNER JOIN between the "language" and "courses" tables, something like:
SELECT *
FROM `language` `l`
INNER JOIN `courses` `c` ON `l`.`language_id` = `c`.`language_id`
WHERE `l`.`language_name` = 'spanish';
I think it's okay to keep all notes for all sections in the single table. So, for the other 3 columns that will only contain Spanish notes, you can set them to accept NULL values.
Hope it helps.
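To illustrate point (a), a minimal sketch - the language_id/language_name columns match the query above, the surrounding code is assumed:
// build the language dropdown from the "language" table
$result = mysql_query('SELECT language_id, language_name FROM language');
echo '<select name="language_id">';
while ($row = mysql_fetch_assoc($result)) {
    echo '<option value="' . (int)$row['language_id'] . '">'
       . htmlspecialchars($row['language_name']) . '</option>';
}
echo '</select>';
// after submit, $_POST['language_id'] holds the chosen language ID directly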
Background
I am creating a MySQL database to store items such as courses where there may be many attributes to a single course. For example:
A single course may have any or all of the following attributes:
Title (varchar)
Secondary Title (varchar)
Description (text)
Date
Time
Specific Location (varchar; eg. White Hall Room 7)
General Location (varchar; eg. Las Vegas, NV)
Location Coords (floats; eg. lat, long)
etc.
The database is set up as follows:
A table storing specific course info:
courses table:
Course_ID (a Primary Key unique ID for each course)
Creator_ID (a unique ID for the creator)
Creation_Date (datetime of course creation)
Modified_Date (where this is the most recent timestamp the course was modified)
The table storing each course's multiple attributes is set up as follows:
course_attributes table:
Attribute_ID (a unique ID for each attribute)
Course_ID (reference to the specific course the attribute is for)
Attribute (varchar defining the attribute; eg. 'title')
Value (text containing value of specified attribute; eg. 'Title Of My Course')
Desire
I would like to search this database using Sphinx search. With this search, different fields carry different weights; for example, 'title' would be more important than 'description'.
Specific search fields that I wish to have are:
Title
Date
Location (string)
Location (geo - lat/long)
The Question
Should I define a View in MySQL to organize the attributes according to 'title', 'description', etc., or is there a way to define my sphinx.conf file to understand specific attributes?
I am open to all suggestions to solving this problem, whether it be rearrangement of the database/tables or the way in which I search.
Let me know if you need any additional details to help me find a solution.
Thanks in advance for the help
Update
OK, so after reading some of the answers, I feel that I should provide some additional information.
Latitude / Longitude
The latitude/longitude attributes are created by me internally after receiving the general location string. I can generate the values in any way I wish, meaning that I can store them together in a single lat/long attribute as 'float lat, float long' values or any other desired format. This is done only after they have been generated from the initial location string and verified. This is to guard against malformed data as #X-Zero and #Cody have suggested.
Keep in mind that the latitude and longitude were merely illustrating the need to have that field be searchable, nothing more. It is simply another attribute; one of many.
Weighting Search Results
I know how to add weights to results in a Sphinx search query:
$cl->setFieldWeights( array('title'=>1000, 'description'=>500) );
This causes the title column to have a higher weight than the description column if the structure was as #X-Zero suggested. My question was more directed to how one would apply the above logic with the current table definition.
Database Structure, Views, and Efficiency
Using my introductory knowledge of Views, I was thinking that I could possibly create something that displays a row for each course where each attribute is its own column. I don't know how to accomplish this or if it's even possible.
I am not the most confident with database structures, but the reason I set my tables up as described was because there are many cases where not all of the fields will be completed for every course and I was attempting to be efficient [yes, it seems as though I've failed].
I was thinking that using my current structure, each attribute would contain a value and would therefore cause no wasted space in the table. Alternatively, if I had a table with tons of potential attributes, I would think there would be wasted space. If I am incorrect, I am happy to learn why my understanding is wrong.
Let me preface this by saying that I've never even heard of Sphinx, nor (obviously) used it. However, from a database perspective...
Doing multi-domain columns like this is a terrible (I will hunt you down and kill you) idea. For one thing, it's impossible to index or sort meaningfully, period. You also have to pray that you don't get a latitude attribute with textual data (and because this can only be enforced programmatically, I'm going to guarantee this will happen) - doing so will cause all distance-based formulas to crash. And speaking of location, what happens if somebody stores a latitude without a longitude (note that this is possible regardless of whether you are storing a single GeoLocation attribute, or the pair)?
Your best bet is to do the following:
Figure out which attributes will always be required. These belong in the course table (...mostly).
For each related set of optional attributes, create a table. For example, location (although this should probably be required...), which would contain Latitude/Longitude, City, State, Address, Room, etc. Allow the columns to be nullable (in sets - add constraints so users can't add just longitude and not latitude).
For every set of common queries add a view. Even (perhaps especially) if you persist in using your current design, use a view. This promotes separation between the logical and physical implementations of the database. (This assumes searching by SQL.) You will then be able to search by specifying view_column is null or view_column = input_parameter or whichever.
For weighted searching (assuming dynamic weighting) your query will need to use left joins (inside the view as well - please document this), and use prepared-statement host-parameters (just save yourself the trouble of trying to escape things yourself). Check each set of parameters (both lat and long, for example), and assign the input weighting to a new column (per attribute), which can be summed up into a 'total' column (which must be over some threshold).
EDIT:
Using views:
For your structure, what you would normally do is left join to the attributes table multiple times (once for each attribute needed), keying off the attribute (which should really be an int FK to a lookup table; you don't want both 'title' and 'Title' in there) and joining on course_id - the value would be included as part of the select. Using this technique, it would be simple to then get the list of columns, which you can then apparently weight in Sphinx.
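A minimal sketch of that view - it keeps the attribute names as text literals for brevity, matching the current course_attributes definition rather than the recommended int FK:
CREATE VIEW course_flat AS
SELECT c.Course_ID,
       t.Value AS title,
       d.Value AS description
FROM courses c
LEFT JOIN course_attributes t
    ON t.Course_ID = c.Course_ID AND t.Attribute = 'title'
LEFT JOIN course_attributes d
    ON d.Course_ID = c.Course_ID AND d.Attribute = 'description';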
The problem with this is if you need to do any data conversion - you are betting that you'll be able to find all conversions if the type ever changes. When using strongly typed columns, this ranges from trivial (the likelihood is that you end up with a uniquely named column) to unnecessary (views usually take their datatype definitions from the fields in the query); with your architecture, you'll likely end up looking through too many false positives.
Database efficiency:
You're right, unfilled columns are wasted space. Usually, when something is optional(ish), that means you may need an additional table, which is why I suggested splitting off location into its own table: this prevents events which don't need a location (... what?) from 'wasting' the space, but then forces any event that defines a location to specify all required information. There's an additional benefit to splitting it off this way: if multiple events all use the same location (... not at the same time, we hope), a cross-reference table will save you a lot of space. Way more than your attributes table ever could (you're still having to store the complete location per event, after all). If you still have a lot of 'optional' attributes, I hear that NoSQL is made for these kinds of things (but I haven't really looked into it). Other than that, though, the cost of an additional table is trivial; the cost of the data inside may not be, but the space required is weighed against the perceived value of the data stored. Remember that disk space is relatively cheap - it's developer/maintainer time that is expensive.
Side note for addresses:
You are probably going to want to create an address table. This would be completely divorced from the event information, and would include (among other things) the precomputed latitude/longitude (in the recommended datatype - I don't know what it is, but it's for sure not a comma-separated string). You would then have an event_address table that would be the cross-reference between the events and where they take place - if there is additional information (such as room), that should be kept in a location table that is referenced (instead of referencing address directly). Once a lat/long value is computed, you should never need to change it.
Thoughts on later updates for lat/long:
While specifying the lat/long values yourself is better, you're going to want to make them a required part of the address table (or part of/in addition to a purely lat/long only table). Frankly, multi-value columns (delimited lists) of any sort are just begging for trouble - you keep having to parse them every time you search on them (among other related issues). And the moment you make them separate rows, one of the pair will eventually get dropped - Murphy himself will personally intervene, if necessary. Additionally, updating them at different times from the addresses will result in an address having a lat/long pair that does not match; your best bet is to compute this at insertion time (there are a number of webservices to find this information for you).
Multi-domain tables:
With a multi-domain table, you're basically betting that the domain key (attribute) will never become out of sync with the value (err, value). I don't care how good you are: somewhere, somehow, it's going to happen. At my company, we had one of these in our legacy application (it stored FK links and which files the FKs refer to, along with an attribute). At one point an application was installed in production which promptly began storing the correct file links, but the FK links to a different file, for a given class of attribute. Thankfully, there were audit records in another file which allowed this to be reversed (... as near as they were able to tell).
In summary:
Revisit your required/optional data. Don't be afraid to create additional tables, each for a single entity, with every column for a single domain; you will also need relationship tables. You may also wish to place your audit data (last_updated_time) in a set of separate tables (single-domain tables will help immensely in this regard).
In the sphinx config you define your index and the SQL queries that populate it. You can define basic attributes, see Sphinx Attributes
Sphinx also supports geo searches on lat/long, but the values need to be expressed in radians, definitely not text columns like you have. I agree with X-Zero that storing lat/lng values as strings is a bad idea.
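As a rough sketch of such a config - the connection details are placeholders, and the query reuses the left-join pivot idea from above, converting to radians in SQL:
source courses_src
{
    type     = mysql
    sql_host = localhost
    sql_user = user
    sql_pass = pass
    sql_db   = mydb

    # pivot the attribute rows into columns; RADIANS() because Sphinx geo functions expect radians
    sql_query = \
        SELECT c.Course_ID, t.Value AS title, d.Value AS description, \
               RADIANS(lat.Value) AS latitude, RADIANS(lon.Value) AS longitude \
        FROM courses c \
        LEFT JOIN course_attributes t   ON t.Course_ID = c.Course_ID AND t.Attribute = 'title' \
        LEFT JOIN course_attributes d   ON d.Course_ID = c.Course_ID AND d.Attribute = 'description' \
        LEFT JOIN course_attributes lat ON lat.Course_ID = c.Course_ID AND lat.Attribute = 'latitude' \
        LEFT JOIN course_attributes lon ON lon.Course_ID = c.Course_ID AND lon.Attribute = 'longitude'

    sql_attr_float = latitude
    sql_attr_float = longitude
}

index courses
{
    source = courses_src
    path   = /var/lib/sphinx/courses
}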
This is a "theoretical" question.
I'm having trouble defining the question so please bear with me.
When you have several related tables in a database - for example a table that holds "users" and a table that holds "phones",
where both "phones" and "users" have a column called "user_id" - you might run:
select users.user_id, name, phone from users left outer join phones on phones.user_id = users.user_id;
the query will provide me with rows of all the users whether they have a phone or not.
If a user has several phones, his name will be returned in 2 rows as expected.
columns=>|user_id|name|phone|
row0 = > | 0 |fred|NULL|
row1 = > | 1 |paul|tlf1|
row2 = > | 1 |paul|tlf2|
the name "paul" in the case above is a necessary duplicate which in the RDMS's eye's is not a duplicate at all!
It will then be handled by some server side scripting language, for example php.
How are these "necessary duplicates" actually handled in real websites or applications?
As in, how are the rows "mapped" into some usable object model?
p.s. if you decide to post examples, post them for php, mysql, sqlite if possible.
edit:
Thank you for providing answers; each answer has interpreted the question differently, and as such each is different and correct in its own way.
I have come to the conclusion that if round trips are expensive, this will be the best way, along with Jakob Nilsson-Ehle's solution, which was fitting for the theoretical question.
If round trips are cheap, I will do separate selects for phones and users as 9000 suggests; if I need to show a single phone for every user, I will give the phones a priority column and join it with the user select as Ollie Jones correctly suggests.
Even though for real-life applications I'm using 9000's answer, I think that for this unrealistic question Jakob Nilsson-Ehle's solution is the most appropriate.
The thing I would probably do in this case in PHP would be to use the user_id as a key in a PHP array and then use that to continuously update the users.
A very simple example would be
$result = mysql_query('SELECT users.user_id, name, phone FROM users LEFT OUTER JOIN phones ON phones.user_id = users.user_id');
$users = array();
while ($row = mysql_fetch_assoc($result)) {
    $uid = $row['user_id'];
    if (!array_key_exists($uid, $users)) {
        $users[$uid] = array('name' => $row['name'], 'phones' => array());
    }
    if ($row['phone'] !== null) { // a user without any phone keeps an empty phones list
        $users[$uid]['phones'][] = $row['phone'];
    }
}
Of course, depending on your programming style and the complexity of the user data, you might define a User class or something and populate the data, but that is fundamentally how I would do it.
Your data model inherently allows a user to have 0, 1, or more phones.
You could get your database to return either 0 or 1 phone items for each user by employing a nasty hack, like choosing the numerically smallest phone number. (MIN(phone) ... GROUP BY user). But numerically comparing phone numbers makes very little sense.
Your problem of ambiguity (which of several phone numbers) points to a problem in your data model design. Take a look, if you will, at some common telephone-directory apps. A speed-dial app on a mobile phone is a good example. Mostly they offer ways to put in multiple phone numbers, but they always have the concept of a primary phone number.
If you add a column to your phone table indicating number priority, make it part of your primary (unique) key, and declare that priority=1 means the user's primary number, your app will not have this ambiguity any more.
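A minimal sketch of that change - the table and column names are assumptions based on the discussion:
ALTER TABLE phones
    ADD COLUMN priority INT NOT NULL DEFAULT 1,
    ADD PRIMARY KEY (user_id, priority);  -- assumes phones has no primary key yet

-- exactly one row per user: the primary phone, or NULL if the user has none
SELECT u.user_id, u.name, p.phone
FROM users u
LEFT OUTER JOIN phones p
    ON p.user_id = u.user_id AND p.priority = 1;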
You can't easily get a tree structure from an RDBMS, only a table structure. And you want a tree: [(user1, (phone1, phone2)), (user2, (phone2, phone3))...]. You can optimize towards different goals, though.
Round-trips are more expensive than sending extra info: go with your current solution. It fetches username multiple times, but you only have one round-trip per entire phone book. May make sense if your overburdened MySQL host is 1000 miles away.
Sending extra info is more expensive than round-trips, or you want more clarity: as #martinho-fernandes suggests, only fetch user IDs with the phones, then fetch the user details in another query. I'd stick with this approach unless your entire user details are just a short username. With SQLite I'd stick with it at all times, just for the sake of clarity.
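A minimal sketch of that two-query approach, in the same mysql_* style used elsewhere on this page (table and column names as in the question):
// query 1: collect phones keyed by user_id
$result = mysql_query('SELECT user_id, phone FROM phones');
$phones = array();
while ($row = mysql_fetch_assoc($result)) {
    $phones[$row['user_id']][] = $row['phone'];
}

// query 2: user details, attaching each user's phone list (empty if none)
$result = mysql_query('SELECT user_id, name FROM users');
$users = array();
while ($row = mysql_fetch_assoc($result)) {
    $uid = $row['user_id'];
    $users[$uid] = array(
        'name'   => $row['name'],
        'phones' => isset($phones[$uid]) ? $phones[$uid] : array(),
    );
}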
Sounds like you're confusing the object data model with the relational data model - understanding how they differ, both in general and in the specifics of your application, is essential to writing OO code on top of a relational database.
Trivial ORM is not the solution.
There are ORM mapping technologies such as Hibernate - however, these do not scale well. IME, the best solution is using a factory pattern to manage the mapping properly.
I am building an inventory tracking system for internal use at my company. I am working on the database structure and want to get some feedback on which design is better*.
I need a recursive (I might be using this term wrong...) system where a part could be made up of zero or more parts. I thought of two ways to do this but am not sure which one to use. I am not an expert in database design, so maybe there is a third option that I haven't thought of.
Option 1:
Two tables: one with just part_id, and one with part_id, sub_part_id (which refers to another part_id) and quantity. So in the first table part_id would be unique, and in the second table there could be zero or more rows showing all the parts that make up a certain part.
Option 2:
One table with part_id and assembly. assembly would be a text field that looks something like this: part_id,quantity;part_id,quantity;.... I would then use the PHP explode() function to split on the semicolons and again on the commas to get an array of the sub-parts.
I hope this all makes sense. I am using PHP/MySQL.
*community wiki because this may be subjective.
Generally, option 1 is preferable to option 2, not least because some of the part IDs in the assembly would themselves be assemblies.
You do have to deal with recursive or tree-structured queries. That is not particularly easy in any dialect of SQL. Some systems have better support for them than others. Oracle has its CONNECT BY PRIOR system (weird, but it sort of works), and DB2 has recursive WITH clauses, and ...
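As a hedged sketch of the recursive WITH approach - assuming option 1's link table is named part_structure(part_id, sub_part_id, quantity); DB2 has supported this for a long time, MySQL only from 8.0:
WITH RECURSIVE bom (part_id, sub_part_id, quantity, depth) AS (
    SELECT part_id, sub_part_id, quantity, 1
    FROM part_structure
    WHERE part_id = 42            -- the assembly being exploded
    UNION ALL
    SELECT ps.part_id, ps.sub_part_id, ps.quantity, bom.depth + 1
    FROM part_structure ps
    JOIN bom ON ps.part_id = bom.sub_part_id
)
SELECT * FROM bom;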
NEVER, never ever use procedural languages like PHP or C# to process data structures when you have a database engine for that. Relational data structures are much faster, more flexible, and safer than storing text. Forget about option 2.
You could use recursive UDFs to retrieve the whole tree with no big fuss about it.
How about a nullable foreign key on the same table? Something like:
CREATE TABLE part (
    part_id int not null auto_increment primary key,
    parent_part_id int null,
    constraint fk_parent_part foreign key (parent_part_id) references part (part_id)
)
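A short usage example for this self-referencing design - note that it models a single-parent tree, so a part used in several assemblies, or with a quantity, would still need a separate link table as in option 1:
INSERT INTO part (part_id, parent_part_id) VALUES (1, NULL); -- top-level assembly
INSERT INTO part (part_id, parent_part_id) VALUES (2, 1);    -- sub-part of part 1
INSERT INTO part (part_id, parent_part_id) VALUES (3, 1);

-- direct children of part 1
SELECT part_id FROM part WHERE parent_part_id = 1;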
Definitely not option 2. That is a recipe for trouble. The correct answer depends on how many potential levels of assemblies are possible, and how you think of the assemblies. Do you think of an assembly (a composite object consisting of 2 or more atomic parts) as a part in its own right, which can itself be used as a sub-part in another assembly? Or are assemblies a fundamentally different kind of thing from an atomic part?
If the former is the case, then put all assemblies and parts in one table with a PartID, and add a second table that just has the construction details for those parts that are composed of multiple other parts (which themselves may be assemblies of yet more atomic parts). This second table would look like this:
ConstructionDetails
PartId, SubPartId, QuantityRequired
If you think of things more like the second way, then put only the atomic parts in the first table, and put the assemblies in the second table:
Assemblies
AssemblyId, PartId, QuantityRequired