How should multi-language content be stored: in the database or in files? There is a lot of content: page content, reference tables, page title bars, metadata, etc. What is the basic way to approach this? Will every table need additional columns for each language? If there are 50 languages (a number that will keep growing, since this is a worldwide social site and the eventual goal is to support as many languages as possible), does that mean 50 extra columns per table? Or is there a better way?
There is a mixture of dynamic system and user content + static content.
Scalability and performance are important. Being developed in PHP and MySQL.
Users will be able to change the language on any page from the footer. The language can be either session based or preference based; I'm not sure which is the better route.
If you have a variable, essentially unknown-today number of languages, then this definitely should NOT be multiple columns in a record. Basically, the search key on this table should be something like message id plus language id, or maybe screen id plus message id plus language id. Then you have a separate record for each language for each message.
If you try to cram all the languages into one record, your maintenance will become a nightmare. Every time you add another language to the app, you will have to go through every program to add "else if language=='Tagalog' then text=column62" or whatever. Make it part of the search key and then you're just reading "where messageId='Foobar' and language=current_language", and you pass the current language around. If you have a new language, nothing should have to change except adding the new language to the list of valid language codes some place.
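A minimal sketch of that keyed layout (table and column names are illustrative, not from the original):

```sql
-- One row per message per language; the composite primary key is the lookup key.
CREATE TABLE message_translation (
    message_id  VARCHAR(64) NOT NULL,   -- e.g. 'Foobar'
    language    CHAR(5)     NOT NULL,   -- e.g. 'en', 'tl'
    text        TEXT        NOT NULL,
    PRIMARY KEY (message_id, language)
);

-- Adding a language is just inserting rows; no schema change needed:
INSERT INTO message_translation (message_id, language, text)
VALUES ('Foobar', 'tl', '...');

-- Lookup, passing the current language around:
SELECT text
FROM message_translation
WHERE message_id = 'Foobar' AND language = 'en';
```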
So really the question is:
blah blah blah. Should I keep my data in flat files or a database?
Short answer: whichever you find easier to work with. Depending on how you structure it, the file-based approach can be faster than the database approach. OTOH, get it wrong and the performance impact will be huge. The database approach enforces a more consistent structure from the start, so if you're making it up as you go along, the database approach will probably pay off in the long run.
eventual goal is to have as many languages as possible) then 50 extra columns per table?
No.
If you need to change your database schema (or the file structure) every time you add a new language (or new content) then your schema is wrong. If you don't understand how to model data properly then I'd strongly recommend the database approach for the reasons given.
You should also learn how to normalize your data - even if you ultimately choose to use a non-relational database for keeping the data in.
You may find this useful:
PHP UTF-8 cheatsheet
The article describes how to design the database for a multilingual website and which PHP functions to use.
Definitely start with a well-defined model so your design doesn't care whether data comes from a file, a DB, or even memCache or something like that. It's probably best to make a single call per page to get an object that contains all the fields for that single page, rather than multiple calls. Then you can just reference that single returned object to get each localised field. Behind the scenes you can then code the repository access and test it. Personally I'd probably go with the DB approach over a file: you don't have to worry about concurrent file access, and it's probably easier to deploy changes. Again, you don't have to worry about files being locked by reads while you're deploying new files; it's just a DB update.
See this link about PHP IoC; that might help you, as it would allow you to abstract from your code what type of repository is used to hold the data. That way, if you go with one approach and later want to change it, you won't have to do so much rework.
There's no reason you need to stick with one data source for all "content". There is dynamic content that will be regularly added to or updated, and then there is relatively static content that only rarely gets modified. Then there is peripheral content, like system messages and menu text, vs. primary content—what users are actually here to see. You will rarely need to search or index your peripheral content, whereas you probably do want to be able to run queries on your primary content.
Dynamic content and primary content should be placed in the database in most cases. Static peripheral content can be placed in the database or not. There's no point in putting it in the database if the site is being maintained by a professional web developer who will likely find it more convenient to just edit a .pot or .po file directly using command-line tools.
Search SO for the tags i18n and l10n for more info on implementing internationalization/localization. As for how to design a database schema, that is a subject deserving of its own question. I would search for questions on normalization as suggested by symcbean as well as look up some tutorials on database design.
Related
I am creating an application with a click to call button on an html page.
There will be one person manning the phone. I want this person to be able to set a variable with a boolean value on my server: 1 is available, 0 is unavailable.
I could create a single field SQL table but this feels like overkill, or I could read and write to a text file containing just one character.
What is the most correct way to store a single value?
I know it seems like overkill to use a small database table for this.
If your application already uses a database, this is by far the best way to proceed. Your database technology has all kinds of support for storing data so it doesn't get lost. But, don't stand up a database and organize your application to use it just for this one data point; a file will be easier in that case.
(WordPress does something similar; it uses a table called wp_options containing a lot of one-off settings values.)
I suggest your table contain two columns (or maybe more), agent_id and available. Then, if you happen to add another person taking telephone calls, your app will be ready to handle that growth. Your current person can have agent_id = 0.
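A sketch of that two-column layout (column names from the answer, table name assumed):

```sql
CREATE TABLE agent_availability (
    agent_id  INT     NOT NULL PRIMARY KEY,
    available TINYINT NOT NULL DEFAULT 0   -- 1 = available, 0 = unavailable
);

-- The single current person, as suggested:
INSERT INTO agent_availability (agent_id, available) VALUES (0, 1);

-- Toggling availability is a one-row update:
UPDATE agent_availability SET available = 0 WHERE agent_id = 0;
```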
If you have a DB set up, I'd use it.
That's what DBs are for: persisting changeable data. Otherwise you are basically writing your own separate DB system for the sake of one setting, which would be uberkill in my eyes!
There is value in consistency and flexibility. What if I suddenly need to store an expected return time? How do I do this in a text file? How do I differentiate the column? How do I manipulate the data? MySQL already answers all these questions for you.
As a team member, I'd expect most of my dev colleagues (and new hires) to know how to use MySQL.. I wouldn't want them to have to work with, extend or debug a separate bespoke file persistence system that I had tacked on.
If you are worried about having lots of one row tables dotted about, you could use a single table for miscellaneous singular config variables which need updating regularly.
We have a table like this:
Table: `setting`
Columns: `key_string` VARCHAR, `value` VARCHAR
And could store your variable as
['key_string' => 'telephone_service_available', 'value' => '1']
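A hedged sketch of creating and updating such a row (MySQL syntax; names taken from the answer above):

```sql
CREATE TABLE setting (
    key_string VARCHAR(64)  NOT NULL PRIMARY KEY,
    value      VARCHAR(255) NOT NULL
);

-- Insert-or-update in one statement:
INSERT INTO setting (key_string, value)
VALUES ('telephone_service_available', '1')
ON DUPLICATE KEY UPDATE value = VALUES(value);

-- Read it back:
SELECT value FROM setting WHERE key_string = 'telephone_service_available';
```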
In this specific case, a simple file check (does a file exist or not?) is probably the simplest thing you can do here. It also has the benefit that you only need to check whether the file exists; you don't have to read its contents.
But if you ever need even one more piece of information, you will have to take a completely different route.
It depends on what you want to do with the information afterwards.
If you use it within a web application, store it in the session.
Or try a flat-file database like SQLite (no active DBMS needed). It's easy to use and very easy to extend.
Or represent the on/off state simply by creating a file: if the file is not there, the value is off.
So I am working on this website where people can post articles. My colleague suggested storing all the meta-data of an article (user, title, dates etc) in a table and the actual article body as a file in the server.
A data structure would look like:
post_id  post_user_id  post_title     post_body  post_date   etc
-------  ------------  -------------  ---------  ----------  ---
1        1             My First Post  1_1.txt    2014-07-07  ...
2        1             My First Post  2_1.txt    2014-07-07  ...
Now we would get the record of the post and then locate where it is by
$post_id . "_" . $post_user_id . ".txt";
He says that this will reduce the size of the tables and, in the long run, make them faster to access. I am not sure about this and wanted to ask if there are any issues with this design.
The first risk that pops into my mind is data corruption. Following this design, you are splitting the information into two fragments, even though both pieces are dependent on one another:
A file has to exist for each metadata entry (or you'll end up with not found errors for entries supposed to exist).
A metadata entry has to exist for each file (or you'll end up with garbage).
Using the database only has one big advantage: it is very probably relational. This means that you can actually set up rules to prevent the two scenarios above from occurring (you could use ON DELETE CASCADE, for instance, or put every piece of information in one table). Keeping these relations intact between two data backends is going to be a tricky thing to set up.
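For instance, when both fragments live in the database, a foreign key with ON DELETE CASCADE keeps them from drifting apart (table and column names here are illustrative):

```sql
CREATE TABLE post_metadata (
    post_id INT PRIMARY KEY,
    title   VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE post_body (
    post_id INT PRIMARY KEY,
    body    MEDIUMTEXT NOT NULL,
    FOREIGN KEY (post_id) REFERENCES post_metadata (post_id)
        ON DELETE CASCADE
) ENGINE=InnoDB;

-- Deleting a metadata row now removes the body automatically,
-- so neither orphaned bodies nor dangling metadata can occur.
```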
Another important thing to remember : data stored in a SQL database isn't sent to a magical place far from your drive. When you add an entry into your database, you write to your database files. For instance, those files are stored in /var/lib/mysql for MySQL engines. Writing to other files does not make that much of a difference...
Next thing : time. Accessing a database is fast once it's opened, all it takes is query processing. Accessing files (and that is, once per article) may be heavier : files need to be opened (including privileges checks, ...), read (line-by-line according to your buffer size) and closed. Of course, you can add to that all the programming it would take to link those files to their metadata...
To me, this design adds unnecessary complexity to the application. You could store everything in the database and centralise it. You'll use pretty much the same amount of disk space in both cases, yet looking up and accessing each article file separately (while keeping it connected with its DB metadata) will certainly waste some time.
Design for simplicity; add complexity only where you must. (Eric S. Raymond)
This could look like a good idea if those posts are NEVER edited. Accessing a file can take a while, and if your users want to edit their posts many times, storing the content in a file is not a great idea. SQL handles large text values (such as WYSIWYG text) well; don't be afraid to store them in your Post table.
Additionally, your filesystem will take far more time to read and write data stored in files than the database will.
Everything will depend on the number of posts you want to store, and on whether or not your users can edit their posts.
I would agree: in a production environment, it is generally recommended to let the file system keep track of files and the database hold on to the metadata.
However, I have mostly heard this philosophy applied to BLOB types and images. Because even large articles are relatively small, a TEXT data type can suffice and even makes it easier to edit, draw from, and search as needed.
(Hence I agree with Rémi Delhaye, who answered this just as I was writing this post.)
Filesystem is much more likely to have higher latency and files can 'go missing' where a database record is less likely to.
If the contents of the field is too large in the case of SQL Server then you could look at the FileStream API in newer versions.
Really though, either approach is valid in my opinion. With a file you don't have to worry about the database mangling the content if you make a mistake during escaping or something.
Beware if you're writing your code on a case-insensitive filesystem and running on a case-sensitive one in production: filename case matters, so it can be another way to lose access to your files later on, or unexpectedly once the application is deployed.
I am working on a website which utilizes a table for presenting old and newly passed laws. As such it requires that we have a large volume of data in the tables, and we constantly have to add more data to the tables.
I know already how to construct the table through CSS and HTML; however, due to the sheer volume of data which we are dealing with, I would like to know if there is a way to create a separate admin page where we can just plug in the law information and have it automatically added to the table rather than having to physically code in all of the information through HTML.
I also have a second question: I would like to add some tabs at the top of the table which allow users to sort laws based on the year they were passed. An example of this can be seen at this site: CT Legislation | 2014 | General Assembly | Passed | LegiScan. It has several tabs at the top which allow users to sort legislation. My question is: what coding language is required to add this to a table?
A CMS may or may not do it for you. What really would be good is to use Parse to hold all your data. Take a look at Data Storage and Cloud Code. You can add new laws whenever you want, and you would configure Parse to dynamically add the data to your table for you.
You can use a variety of languages for this type of solution. If it were all web based, it would likely use PHP for building a password-protected admin page. PHP would also be used along with SQL to send and receive data from the database, and CSS/HTML could handle the table and content styling.
For the database you could use MySQL.
If this is beyond what you are comfortable with, a content management system (CMS) would be a great option. They set up the entire backend for the user and have an interface that will allow someone who knows zero code or html/CSS to put a pretty decent website together.
The great part about using WordPress is that it lends itself well to someone looking to learn more about development. You can see how things can be set up with code to achieve certain outcomes, and you can learn more and more as you work with the system on increasingly deeper levels.
Another option is using Google Drive. There are tabs and a table, and it is cloud based, so you can share it with the people you want to have access. Anyone you choose can add or delete entries, and it keeps very good track of what was changed and who made the changes. It is really easy to go back and fix things if they have been messed up.
I'm starting an Incident Tracking System for IT, and it's likely my first PHP project.
I've been designing it in my mind based on software I've seen, like vBulletin, and I'd like it to have i18n and editable styles.
So my first question goes here:
What is the best method to store these things, knowing they will likely be static? I've been thinking about getting the file content with PHP, showing it in a text editor, and when a save is made, replacing the old file (making a copy if it has never been edited before, so we keep the "original").
I think this would be considerably faster than using MySQL and storing the language/style there.
What about security here? Should I create a .htaccess that asks for a password on this folder?
I know how to do a replacement using foreach, getting an array from the database and using str_replace($name, $value, $file), but if I store the language in a file, can I build an associative array from its contents (like JSON)?
Thanks a lot, and sorry for so many questions; I'm a newbie.
This is what I'm doing in my CMS:
For each plugin/program/entity (you name it) I develop, I create a /translations folder.
I put all my translations there, in files named like el.txt, de.txt, uk.txt, and so on, for all languages.
I store the translation data as JSON, because it's easy to write to, easy to read from, and easiest for everyone to contribute theirs.
Files can easily be UTF-8 encoded in-file without messing with databases, making it possible to read them in file mode (just JSON.parse them).
On installation of such plugins, I just loop through all the translations and put them in the database, one language per table row (e.g. a data column of TEXT datatype).
For each page render I query the database once to fetch the row for the selected language, call json_decode() on the whole result to get it in one go, then put it in $_SESSION so that subsequent requests get flash-speed translated strings for the currently selected language.
The whole thing was developed with both performance and compatibility in mind.
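A sketch of that one-JSON-map-per-language row layout (table and column names are illustrative, not from the original):

```sql
CREATE TABLE plugin_translation (
    plugin   VARCHAR(64) NOT NULL,
    language CHAR(5)     NOT NULL,
    data     TEXT        NOT NULL,  -- the whole JSON translation map for this language
    PRIMARY KEY (plugin, language)
);

-- One query per page render fetches the selected language's map,
-- which the application then json_decode()s and caches in the session:
SELECT data FROM plugin_translation
WHERE plugin = 'myplugin' AND language = 'el';
```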
The benefit of storing on the HDD vs. the DB is that backups won't waste as much space: once a file is backed up, it doesn't take up tape on the next day, whereas a DB gets fully backed up every day and takes up an increasing amount of space. The downside to writing to disk is that it increases the chance of somebody uploading something malicious, and they might be clever enough to figure out how to execute it. You just need to be more careful, that's all.
Yes, use .htaccess to limit any action on a writable folder. Good job thinking ahead of that risk.
Your approach sounds like a good strategy.
Good luck.
I'm learning web-centric programming by writing myself a blog, using PHP with a MySQL database backend. This should replace my current (Drupal based) blog.
I've decided that a post should contain some data: id, userID, title, content, time-posted. That makes a nice schema for a database table. I'm having issues deciding how I want to organize the storage of content, though.
I could either:
Use a file-based system. The database table content would then be a URL to a locally-located file, which I'd then read, format, and display.
Store the entire contents of the post in content, i.e. put it into the database.
If I went with (1), searching the contents of posts would be slightly problematic: I'd be limited to metadata searching, or I'd have to read the contents of each file when searching (although I don't know how much of a problem that would be; grep -ir "string" . isn't too slow...). However, images (if any) would be referenced by a URL, so referencing content would at least be an internally consistent methodology, and I'd easily be able to reuse the content, as text files are ridiculously easy to work with compared to an SQL database file.
Going with (2), though, I could use a longtext. The content would then need to be sanitised before I tried to put it into the tuple, and I'm limited by size (although, it's unlikely that I'd write a 4GB blog post ;). Searching would be easy.
I don't (currently) see which way would be (a) easier to implement, (b) easier to live with.
Which way should I go / how is this normally done? Any further pros / cons for either (1) or (2) would be appreciated.
For the 'current generation', implementing a database is pretty much your safest bet. As you mentioned, it's pretty standard, and you outlined all of the fun stuff. Most SQL instances have a fairly powerful FULLTEXT (or equivalent) search.
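As an illustration of that full-text support (MySQL syntax; table and column names are assumed, matching the schema sketched in the question):

```sql
CREATE TABLE post (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    title   VARCHAR(255),
    content LONGTEXT,
    FULLTEXT KEY ft_post (title, content)
) ENGINE=InnoDB;  -- InnoDB supports FULLTEXT since MySQL 5.6

-- The MATCH column list must mirror the FULLTEXT index definition:
SELECT id, title
FROM post
WHERE MATCH (title, content)
      AGAINST ('database blog' IN NATURAL LANGUAGE MODE);
```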
You'll probably have just as much architecture to write between the two you outlined, especially if you want one to have the feature-parity of the other.
The up-and-coming technology is a key/value store, commonly referred to as NoSQL. With this, you can store your content and metadata into separate individual documents, but in a structured way that makes searching and retrieval quite fast. Some common NoSQL engines are mongo, CouchDB, and redis (among others).
Ultimately this comes down to personal preference, along with a few use-case considerations. You didn't really outline what is important to you as far as conveniences and your application. Any one of these would be just fine for a personal or development blog. Building an entire platform with multiple contributors is a different conversation.
13 years ago I tried your option 1 (having external files for text content), not with a blog, but with a CMS. And I ended up shoveling it all back into the database for easier handling. It's much easier to do global replaces on the database than at the text-file level. With large numbers of posts you run into trouble with directory sizes and access speed, or you have to manage subdirectory schemes, etc. Stick to the database-only approach.
There are some tools that make working with text files easier than the built-in MySQL functions do, but with a command-line client like mysql and mysqldump you can easily extract any texts to the filesystem level, work on them with standard tools, and re-load them into the database. What MySQL really lacks is built-in support for regex search/replace, but even for that you'll find a patch if you're willing to recompile MySQL.