Internationalization of a PHP website - php

I am currently working on a project / website that I need to make available in several languages. The site was built with PHP / MySQL and a lot of JavaScript (jQuery). I have no idea where to start and I was hoping somebody could give me some hints. I would like opinions on the best approach to take, whether there are good tools for such a PHP site, and what to do with the existing scripts, or rather with the text inside the scripts that needs to be translated as well. Has anybody done something like this before who could guide me down the right path :) ??
Thanks

There are a number of ways of tackling this. None of them is "the best way", and all of them have problems in the short term or the long term. The very first thing to say is that multilingual sites are not easy: translators are lovely people but hard to work with, and most programmers see the problem as a purely technical one. There is also another dimension, outside the scope of this answer, as to whether you are translating or localising. Localising involves looking at the target audience's cultural mores and then tailoring language, style, layout, colour, typeface etc. to that culture. Finally, do not use MT (machine translation) for anything serious or anything that needs to be accurate, and when hiring translators ensure that they translate from a foreign language into their native language, so that they understand all the nuances of the target language.
Right. Solutions. If you do not want to rewrite the site, simply clone the site you have and translate the copies into the target languages. Assuming the code base is stable, you can use a VCS to manage any code changes. You can tweak individual parts of each site to fit the target language. For example, French text is on average 30% longer than the equivalent English text, so using one site to deliver both means you may (will) have formatting problems and need to swap a different CSS file in and out depending on the language. It might seem a clunky way to do it, but then how long are the sites going to exist? The management overhead of doing it this way may well be less than that of the other options.
Second way without rebuilding. Replace all content in the current site with tags and put the text for each language in files or DB tables. Work out the user's desired language (do you have registered users who can set a preference, do you want to read the browser's language tag, or will the URL, dot-com, dot-fr, dot-de, make the choice?) and then replace the tags with text in the target language. You then need to address the sizing issues and the image issues separately. This is in effect what frameworks like Symfony and Zend do to implement l10n.
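To make the tag-replacement idea concrete, here is a minimal sketch. It is written in Python only so that it is short and easy to run; the same shape carries over to PHP. The `{{key}}` tag syntax, the `CATALOGS` table and the function names are all made up for illustration, and the language sniffing simply takes the first supported entry of the browser's Accept-Language header without honouring the q weights.

```python
import re

# Hypothetical per-language string tables. In a real site these would
# live in files or DB tables, as described above.
CATALOGS = {
    "en": {"greeting": "Welcome", "signup": "Sign up"},
    "fr": {"greeting": "Bienvenue", "signup": "Inscription"},
}
DEFAULT_LANG = "en"


def pick_language(accept_language: str) -> str:
    """Return the first supported language listed in an Accept-Language header.

    Simplification: entries are tried in the order they appear and the
    ";q=" weights are ignored.
    """
    for part in accept_language.split(","):
        tag = part.split(";")[0].strip().lower()  # drop any ";q=0.8" weight
        primary = tag.split("-")[0]               # "fr-CA" -> "fr"
        if primary in CATALOGS:
            return primary
    return DEFAULT_LANG


def render(template: str, lang: str) -> str:
    """Replace {{key}} tags with the string for `lang`.

    Falls back to the default language, then to the bare key, so a
    missing translation is visible instead of crashing the page.
    """
    def lookup(match):
        key = match.group(1)
        return CATALOGS[lang].get(key, CATALOGS[DEFAULT_LANG].get(key, key))

    return re.sub(r"\{\{(\w+)\}\}", lookup, template)


lang = pick_language("fr-CA,fr;q=0.9,en;q=0.8")
print(render("<h1>{{greeting}}</h1><a>{{signup}}</a>", lang))
```

Note that this only swaps text. The sizing and image issues mentioned above still have to be handled separately, for example with a per-language stylesheet.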
Then you could rebuild with a framework or with gettext and possibly have a cleaner solution, but remember that frameworks were designed to solve other problems, not translation, and the translation component has come into the framework as a partial solution, not a full one.
The big problem with all the solutions is ongoing maintenance, because you have not only a code base but also multiple language bases to maintain. Unless your all-in-one solution is really clever and effective, the ongoing task will be difficult.

Related

Best practice for dynamically translating content into different languages

I am the project manager on a website that needs to be converted into multiple languages. I am trying to figure out what the best option to go with is. I don't have a problem paying for something, but I just want to make sure it will work properly.
One option I have thought of is to (somehow) integrate Google Translate, so that when the user clicks on the language they want to read the page in, Google translates the page into that language. I did work with Google Translate a little bit, but I found it to be a little clumsy. Maybe I am not using it properly.
Another alternative, definitely not the best idea but a backup if need be, is to put the content in a database and pull the content depending on the user's language. The only problem I have with this is that changing one word in the English version would mean changing it in every other language.
I am open to any other idea. I can clarify the project more, if need be.
As someone who speaks several languages, I can assure you that Google Translate often misses the mark. In many cases their translations are embarrassing, especially when you try to translate individual words or phrases without a sufficient context. Some language pairs are better than others, but overall this is not an option at this point.
Compiled languages have the advantage of static i18n, where a different version of the code is compiled for each UI language.
Database-driven dynamic i18n is a bad option, and almost all programming frameworks try to avoid it. I would recommend, therefore, that you look for an i18n solution that works with properties (text) files to lookup translated strings. In PHP this is gettext or intl.
Note also that i18n involves not only translation of text, but it also requires appropriate localization of dates, numbers, currencies, etc.
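As a rough illustration of that last point, the sketch below formats the same number and date for two locales. It is in Python for brevity, and the `FORMATS` table is hand-written for this example only; a real project would rely on the platform's locale data (in PHP, the intl extension mentioned above) rather than maintain such a table itself.

```python
from datetime import date

# Hand-written conventions for two locales, for illustration only.
FORMATS = {
    "en_US": {"thousands": ",", "decimal": ".", "date": "%m/%d/%Y"},
    "de_DE": {"thousands": ".", "decimal": ",", "date": "%d.%m.%Y"},
}


def format_number(value: float, loc: str) -> str:
    """Format `value` with two decimals using the separators of `loc`."""
    conv = FORMATS[loc]
    text = f"{value:,.2f}"  # English-style grouping, e.g. "1,234,567.89"
    # Swap the separators through a placeholder so they do not clobber
    # each other.
    return (
        text.replace(",", "\0")
        .replace(".", conv["decimal"])
        .replace("\0", conv["thousands"])
    )


def format_date(day: date, loc: str) -> str:
    """Format `day` in the numeric date order used by `loc`."""
    return day.strftime(FORMATS[loc]["date"])


for loc in FORMATS:
    print(loc, format_number(1234567.891, loc), format_date(date(2024, 3, 5), loc))
```

The same string "03/05/2024" means 5 March to a US reader and 3 May to most Europeans, which is why this cannot be left to the translators alone.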
I don't have a problem paying for something, but I just want to make sure it will work properly.
Based on that statement of yours, I would suggest that hiring a firm that specializes in translation will be your best bet. Then just put up multiple links that lead to the different language versions of your website.
Problems that you might encounter:
Adjusting contents, some translations might be too short, some might be too long.
Using google translate can ruin your site, because sometimes it fails especially for some languages.

a two or three-pane hierarchical list app with several scrollable areas - what language handles that well?

My question is in bold (see below) but I hope to provide some insight into my issues just in case it helps anyone who could answer my question.
I'm not sure which framework or language is the best for this job but I'd like to make an app that has two or three independently scrollable areas, similar to a 2 column template, that are all dynamically updated. Users can populate each list separately as they go and yah know ... I really don't want the app to move slowly in any way. It needs to be snappy, with almost knee-jerk reactions to input.
I've read some interesting things about speeds for each language that I'm considering (php and python - possibly java or ruby) and well, I can't really decide for myself since I don't yet know what's going on to create any slowdowns.
The app would be very simple requiring basic information on a user and letting the user basically grow their own database of lists themselves. Some fancy things would happen on the site per the users input or list updates like color changes or pictures and numbers that change .. maybe a graph or something.
All that said, I hope my question(s) is simple and answerable:
What is a useful programming language and/or framework for making and handling user created hierarchical lists that would ultimately grow in complexity? And how does the language differ from the others for tasks like ones as described in this post?
Just trying to find the right/best/effective tool for this particular job.
By the way, I'm new to programming but have covered the basic tutorials for python (plus some django) and php via youtube mostly - I've got a few books in the queue. be gentle
I don't think you will conclusively be able to decide on a language for your task based on clear evidence of it outperforming others for similar task.
There are too many variables to factor in.
My take is that a well-built app in one technology will perform about as well as a well-built app in another. The difference wouldn't be enough to make a fuss over.
Some things that are of note:
The client side code, HTML, CSS and Javascript would need to be top notch. You need to
Combine that with a good host, a nicely constructed and tuned database, a good lean method of communication back and forth between client and server.
Use gzip and caching, minify and combine scripts and stylesheets, and make fewer HTTP requests.
Architect the application with performance in mind from the get go.
If you are new to programming then the language/framework will be the least of your problems. You tried a little Python and some PHP. Which language did you find easier to grasp? Whichever one that was, pick it and just start writing.
As you work on your project and become a better programmer you can revisit the language/framework debate (although, to be honest, the programmer influences the site performance more than the framework).

Building cms for my bachelor degree and need some advice

I'm currently starting to write my own CMS in PHP from the ground up using CakePHP (or should I use something else?) for my bachelor's degree, and I'm thinking about the various things that will need to be done.
One of the things I cannot figure out is whether I should use a single file (for example, index.php will handle everything, and will include everything) or whether I should break up my CMS into a few smaller files.
so my main questions are
is cakePHP a good choice?
use one file for everything or use multiple files?
do you have any good general advice on building more complex websites using php or any best-practices advice (i don't really understand why they don't teach us this in school)
Using a single entry point or multiple entry points becomes a moot point if you are using most frameworks. CakePHP for instance has an index.php file and all you end up doing is defining models, views, and controllers for different parts of your project. I would imagine that most frameworks these days work this way.
Alternatively, if you choose to roll your own framework and system for managing this, which given this is for a bachelor's degree may be (1) a lot of extra work but (2) more revealing and more instructive, I can speak from experience that I found having a single entry point to be useful.
It enables you to have a common code path for set-up stuff: things like enabling E_STRICT, E_NOTICE, etc. for debugging and reliability purposes. Things like sanitizing form inputs to work around the magic-quotes setting. Yes you can do that from an include 'globals.php' but:
Putting everything in one place also lets you come up with a standard file-naming convention and an __autoload handler that will help remove any include or require directives except for perhaps one. Means you can add classes and such without having to also remember to update a master file.
And this is entirely subjective, but I have found that it's easier to create simpler URLs using this. Instead of /volunteers/communities.php?id=Hedrick_Summit I can do /volunteers/communities/Hedrick_Summit which is more pleasing to me.
As for the choice of CakePHP, I have briefly toyed around with that framework. What I don't like about frameworks in general is they often have to be too general, to the point it results in extra cruft and slower page rendering. And the moment you have to do something that pushes the boundaries of the framework, and you will, you end up fighting the framework.
But to be fair, CakePHP seems to be adequate and generally well-designed. I personally took issue with the ORM layer but that was me striving for perfection and actually trying to do work in the SQL query. It has a reputation for being slow, but unless you're trying to build the next Facebook you should be fine.
Using a single file "entry point" gives you more flexibility when it comes to routing requests to various logic - you'll only ever have to worry about filtering one spot in a request chain.
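To show what a single entry point looks like in practice, here is a minimal front-controller sketch. It is in Python to keep it short; in PHP the `dispatch` function would be the body of index.php, with the web server rewriting every request to that file. The route table, the `route` decorator and the handler are hypothetical names invented for this example, and the URL is the one used in the answer above.

```python
import re

ROUTES = []  # list of (compiled pattern, handler) pairs


def route(pattern):
    """Register a handler for URLs matching `pattern` (a regex)."""
    def register(handler):
        ROUTES.append((re.compile(f"^{pattern}$"), handler))
        return handler
    return register


@route(r"/volunteers/communities/(?P<community_id>\w+)")
def show_community(community_id):
    return f"Community page for {community_id}"


def dispatch(path):
    """The single entry point: every request passes through here.

    Common set-up (error reporting, input sanitizing, autoloading)
    would go at the top of this function, in one place.
    """
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler(**match.groupdict())
    return "404 Not Found"


print(dispatch("/volunteers/communities/Hedrick_Summit"))
print(dispatch("/no/such/page"))
```

Adding a page then means adding one handler. Nothing else has to change, and any filtering you need only has to be written once, in `dispatch`.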
These are really subjective questions.
I, once, wrote a CMS in php from ground up for my 3rd year project.
What I did was basically:
Checking how other people did it (Plume CMS and CMSmadesimple were a good start)
I didn't use any framework (that was a requirement)
and Yes, I used index.php with multiple params to handle different pages.
Answer is yes, use multiple files in multiple directories; it makes all the difference in the world when you need to debug or scale.
I would advise you to keep in mind the MVC (Model-View-Controller) pattern.
It is one of the most commonly used (and often misused) patterns in the CMS field.
Also, don't be afraid to look at what other people are doing. Read the code from Joomla, Drupal and other open source CMSs. Have a look at languages other than PHP too, to get a fuller picture of the possibilities.
Don't try to simply re-invent the wheel. Even if this is simply a Uni assignment, try to put something new on your CMS. Something that would push me to use yours instead of other CMS.
is cakePHP a good choice?
That's a highly subjective question and as such unanswerable. Though, if you want to experiment with architecture (eg. compare front controllers to page controllers), you probably should build more from scratch, as a lot of those decisions have already been made by the writers of said framework (And a lot of other frameworks, for the matter).
use one file for everything or use multiple files?
It's called a front controller (single entrypoint) or page controllers (multiple entry points). Get a copy of Patterns of Enterprise Application Architecture by M. Fowler.
do you have any good general advice on building more complex websites using php or any best-practices advice (i don't really understand why they don't teach us this in school)
There are billions of CMSs. Find some of them and analyse them to find out what they did and how they differ from each other. Trying to categorise the different approaches and compare their strengths/weaknesses could make for a good paper.

Better to use multiple language files or 1?

From your experience, is it better to use 1 language file or multiple smaller language files for each language in a PHP project using the gettext extension? I am not even sure if it is possible to use multiple files; it is hard for me to test since the server caches the language files.
I am doing multiple languages on a social network site. So far I have done just the signup page, which is about 1 out of 200 pages to go, and it has 35 text strings to translate. At this pace the language file for each language would be really large, so I was thinking maybe it would be better to have different language files for different pages, or perhaps for sections like the forums section and the blogs section. But if it makes no difference then I would rather not waste my time making multiple smaller files for each language.
I realize every situation is different and the only real answer is to test it, but I am hoping to avoid that this time and just get some opinions from people more experienced. This is my first time using gettext, thanks
I would make the language files module-based. With gettext you need to specify a locale for each language. It would fit best to have separate .po/.mo files for each module or big part of your site.
That's my opinion. :-)
I typically automate the process and have multiple languages in multiple files by using a database to edit the site (using a simple db lookup). This lets me hire translators to come in and verify the current translation easily. Deploying to production then is simply turning the database into a set of language files.
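The "turn the database into language files" step might look roughly like the sketch below (in Python for brevity). The row layout `(lang, msgid, msgstr)` is an assumption about how such a translation table could be organised, not a description of any particular tool. The output uses the plain msgid/msgstr entry syntax of gettext .po files; a complete .po file would also need the header entry that declares the charset, which is left out here.

```python
from collections import defaultdict


def po_escape(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a .po string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_po(rows):
    """Group (lang, msgid, msgstr) rows by language.

    Returns a dict mapping each language code to the body of its .po
    file. Writing the files and compiling them to .mo is left out.
    """
    by_lang = defaultdict(list)
    for lang, msgid, msgstr in rows:
        entry = f'msgid "{po_escape(msgid)}"\nmsgstr "{po_escape(msgstr)}"\n'
        by_lang[lang].append(entry)
    return {lang: "\n".join(entries) for lang, entries in by_lang.items()}


# Hypothetical rows, as they might come back from the translation table.
rows = [
    ("fr", "Sign up", "Inscription"),
    ("fr", "Log in", "Connexion"),
    ("de", "Sign up", "Registrieren"),
]
for lang, body in rows_to_po(rows).items():
    print(f"--- {lang}.po ---")
    print(body)
```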
From experience I would break the languages down on a per-file basis, as the management overhead of one large file becomes heavy and there is great scope for duplication and mistakes.
The other advantage is that, by using a directory structure and naming convention, the correct language can be selected programmatically more easily than with one large file, and it is easier to write management tools at a later stage in the project.
It is also worth looking at some of the formats other people use. Many of the frameworks use this sort of structure: Dashcode, Symfony, Zend etc. There is also an XML format, XLIFF, which is built to handle translation and integrates with many of the tools that translators use.
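Here is a small sketch of selecting a per-module file by convention, using Python's standard gettext module because it follows the same directory layout as GNU gettext: `<localedir>/<lang>/LC_MESSAGES/<domain>.mo`, with one domain per module. The `locale` directory and the `forums` module name are placeholders for this example.

```python
import gettext
from pathlib import Path

LOCALE_DIR = Path("locale")  # placeholder location for the compiled catalogs


def catalog_path(lang: str, module: str) -> Path:
    """Where gettext expects the compiled catalog for one module."""
    return LOCALE_DIR / lang / "LC_MESSAGES" / f"{module}.mo"


def load_module_strings(lang: str, module: str):
    """Load the catalog for `module` in `lang`.

    fallback=True returns a NullTranslations object, which hands back
    the untranslated source string, when no .mo file is found. That
    keeps a page usable while its translation is still missing.
    """
    return gettext.translation(
        module, localedir=str(LOCALE_DIR), languages=[lang], fallback=True
    )


print(catalog_path("fr", "forums").as_posix())
forums = load_module_strings("fr", "forums")
print(forums.gettext("Sign up"))  # the source string if no catalog exists
```

With this layout each page only loads the catalog for its own module, and a management tool can find every file for a language by walking a single directory.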
Multiple files are the best way to go, but things can get disorganized.
We've just launched a free new service called String which solves most of the problems of managing multiple language files - like a basecamp for localization. You can either import existing files, or start from scratch with keys and strings in the system. When you're ready, you can export the files again to run your app. It works with PHP (array), PHP (define), po, yaml, ini and .strings formats.
String allows you to collaborate with translators easily - you just invite them to a project and set their language permissions. Translators can leave comments and questions on each string if they need more info - and you can revert strings back using the History function if things aren't quite right.
Anyway enough sales pitch!
Check it out at http://mygengo.com/string - we'd love your feedback.

Looking for ideas on a computer science course project [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
Hey. I'm taking a course titled Principles of Programming Languages, and I need to decide on a project to do this summer. Here is a short version of what the project needs to accomplish:
"The nature of the project is language processing. Writing a Scheme/Lisp processor is a project of this type. A compiler for a language like C or Pascal is also a potential project of this type. Some past students have done projects related to databases and processing SQL. Another possible project might relate to pattern matching and manipulating XML. Lisp, Pascal, and C usually result in the most straight forward projects."
I am very interested in web technologies, and have some experience with PHP, MySql, JavaScript, etc. and I would like to do something web oriented, but I'm having trouble coming up with any ideas. I also want this to be a worthwhile project that could have some significance, instead of just doing the same thing as everyone else in class.
Any ideas? Thanks!
EDIT: I really like the idea of a LaTeX to XHTML/MathML translator, and I passed the idea to my instructor, who wrote back:
"I think the idea is interesting, my question (and yours) is whether it is appropriate.
I think of LaTeX as a low-level mark-up language. I'm wondering if converting this to XHTML or MathML is really a change in levels and complexity. I think you can make your point with a little more discussion and some examples. You might also think of some other mark-up constructs which made it easier to describe equations."
Any ideas on how to convince him this may be appropriate, or any extensions of this idea that could work for the goals of my project?
Thanks for all the responses so far!
Hm, neat! Maybe:
1. A web-based language interpreter. eg, a very simple assembly interpreter in javascript, or a PHP-based C interpreter (PHP script reads C code, and executes it in some sort of sandboxed kind of way. Obviously it would only be able to implement a small subset of the C language)
2. Maybe some automated way to transform PHP data structures (like PHP arrays) into SQL queries, and vice versa. That kind of stuff has already been done, but you might be able to do something which (for example) takes an SQL query and creates the array datastructure that would be needed to "hold" the information returned by the SQL. It could support complex things like JOINS and GROUP BYs.
3. Maybe a C-to-PHP compiler? (or a PHP-to-C compiler, to be able to run simple PHP code natively. Use this with any combination of languages)
edit:
4. Maybe a regex-to-C parser. That is, something that takes a regex, and generates C code to match that pattern. Or something which takes a regex, and converts it into an FSM which represents the "mathematical" translation of that expression. Or the opposite - something which takes an FSM for a CFL and generates the perl-syntax regex for it.
5. Maybe an XML-to-PHP/MySQL parser. eg, an XML file might contain information about a database and fields, and then your program creates the SQL to create those tables, or the HTML/PHP code for the forms.
Best of luck!
I'd stay away from PHP and MySQL for a project like this. Both are commercial platforms that have compromised a lot of core CS principles in order to gain market share and solve user's problems. Given what you've described it sounds like the point of this project is to think about how programming languages are processed. Javascript The Language (not the browser API) might be a good choice here. Writing a processor/interpreter/compiler for Javascript or using Javascript itself to write a processor/interpreter/compiler for another language would meet the criteria for the assignment. Writing a Javascript "minifier" that removes all unnecessary white space (for smaller file sizes) while maintaining the program's functionality is another possible project.
Here's something I'd love: a PHP-based LaTeX-to-MathML translator. It wouldn't have to do everything, but if I could just cut-and-paste mathematical formulas written in valid LaTeX code into a window and have the script parse it and convert it into valid MathML, that'd be awesome.
Let me expand on this some more. The current state of scientific publishing on the web isn't great. Titles, headers, section numbers, tables, etc. can all be done in HTML, but for mathematical and chemical formulas which depend on precise two-dimensional formatting, scientific authors have only second-class options:
Publish their work in pdf format, which looks great but has a (comparably) huge file size and doesn't do hyperlinking well, or
Use something like latex-to-html, which converts formulas into .gif files (or some similar image file), which are semantically meaningless and thus don't lend themselves to indexing or searching.
Moreover, neither of these options allow for mathematical formulas to be generated programmatically, which would be helpful to the education community (think randomly-generated online homework).
Publishing scientific work in MathML would solve all of these issues, but it has a few issues of its own, namely:
It's really too verbose to code by hand. I mean, you can do it, but c'mon.
The scientific community uses LaTeX for publishing, they're happy with it (for good reason), and they're not about to learn another mathematical markup language when they've got their own research and lesson-planning to do.
Browser support for MathML is currently pretty limited. I know this, and I don't mean to stick my head in the sand about it.
In other words: scientific authors know LaTeX, they use it daily, it's the de facto standard for authoring scientific content. MathML isn't and won't ever be the way math and science is authored, but it's the only semantically rich way to put hypertext mathematics on the web. Browser support for MathML is weak because nobody uses it; nobody uses it because it's too hard to write by hand. Now, maybe this is wishful thinking, but I have to believe that if it were only easier to write MathML, more scientists and mathematicians, especially the early-adopter types, would at least try it, and this would inspire browsers (especially open-source browsers) to improve their support, which would then lead to more authors using it, etc.
Here's where the translator comes in: Until the barrier-to-entry for MathML drops, it'll never be widely adopted. A simple LaTeX-to-MathML converter would take care of that. It would reduce the barrier-to-entry for MathML to near zero. If it leads to widespread use of and better support for MathML, it would be a major benefit to the scientific and education communities.
I finished this course last semester :)
IMHO the best way to go is to build an expression evaluator. Build the simplest expression evaluator you can.
Then add these features in order as many as you like:
1- constant symbols, just place holders for variables. your evaluator should ask for their values after parsing the expression.
2- imperative-style variables. Like variables in any imperative language, where the user can change the value of a symbol anywhere in code.
3- simple control-statements. 'if-else' and pretest while loop are the simplest to consider.
4- arrays. if you really want your expression evaluator to be really like a programming language. It would be interesting if you add variable dimension arrays to your 'language'. you have to build a generic mapping function for your arrays.
Now you got a real programming language. To be a useful one, you might add subroutines.
So the list continues:
5- subroutines. This is a little harder than the previous features, but it should not be impossible :)
6- build a simple math library for your new language in your language itself! And that is the fun part in my opinion ;)
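As a starting point for that list, here is one way the "simplest expression evaluator" with constant symbols (feature 1) could look. This is only a sketch, in Python, using a recursive-descent parser that evaluates as it parses. The grammar and all names are my own choices, and the symbol values are passed in as a dict rather than asked for interactively.

```python
import re

TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()]")


def tokenize(text):
    """Split `text` into numbers, names and operators."""
    tokens = TOKEN.findall(text)
    # Anything the pattern skipped (other than whitespace) is an error.
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError(f"unexpected character in {text!r}")
    return tokens


class Parser:
    """Recursive descent over:
         expression := term (('+' | '-') term)*
         term       := factor (('*' | '/') factor)*
         factor     := NUMBER | NAME | '-' factor | '(' expression ')'
    """

    def __init__(self, tokens, symbols):
        self.tokens, self.pos, self.symbols = tokens, 0, symbols

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.take()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        if token == "(":
            value = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token == "-":
            return -self.factor()
        if token[0].isdigit():
            return float(token)
        if token in self.symbols:
            return self.symbols[token]  # constant symbol (feature 1)
        raise SyntaxError(f"unexpected token or unknown symbol {token!r}")


def evaluate(text, symbols=None):
    parser = Parser(tokenize(text), symbols or {})
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 * (3 + 4) - x / 2", {"x": 10}))
```

Features 2 to 6 would grow out of this: assignment turns the symbol table into mutable state, and control statements usually mean building a tree first and evaluating it afterwards instead of evaluating while parsing.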
Sebesta's book is a good overview of the well-known imperative programming languages.
You shouldn't view creating an implementation of a particular language as insignificant. Everyone probably wants to be a famous programmer and not many people achieve it. This is a great opportunity to be familiar with very cool uncommon languages. (Lisp, APL, etc) If this is your first time creating a compiler/interpreter then it will also be a better choice to go with an already existent language (so you can see what design elements are needed to create a successful language.)
Significant ideas typically arise from necessity. People began using a language because they either needed it or it was a lot easier to accomplish the task they wanted to do. I don't think you will find the answer or the motivation to start a project from scratch here. That being said, I've always thought it would be cool to have a language that uses processor native byte code to create dynamic websites (without using something like cgi).
In response to your edit, here are some latex ideas:
LaTeX-to-ASCII pretty print, perhaps just for a small subset of TeX
LaTeX-to-Maple/Mathcad/Mathematica script, so that equations can be imported or edited or solved (don't know if that already exists)
Javascript LaTeX translator. basically, as you type, it does a translation from latex to html/css/.gif/whatever, so you can see your math "live" as you type it, kinda like the stackoverflow text editor.
Perhaps some sort of latex macros for expressing C code or something? Or how about this: often, C code is doing math: "det = (b*b - 4*a*c); det_sqrt = sqrt(det); etc" How about something which takes C (or java or whatever) code, which is performing a series of arithmetic assignments, and converts it into a nicely-formatted latex list of equations that are human-readable (ie, a \begin{eqnarray} block)
Or something that does the opposite: take a listing of latex computations or equations, and generates C code which declares the requisite variables, gets requisite user input, and performs the computations listed in your latex?
Why not write some sort of interface that can be interpreted/compiled down to the appropriate web technology of the users choice?
Or something like a Python to C compiler?
Just something I thought of recently: write a Ruby interpreter in Lisp.
Something that could be interesting to work on is a regexp-to-automaton converter using Glushkov's algorithm. Here are some key features that could be implemented:
Syntactic analysis of the regexp
Transformation into an automaton using Glushkov's algorithm
Generating random phrases matching the regexp with that automaton / Validating phrases
Exporting automatons using XML
That's not a very long assignment so you may be able to handle it in a few months
You can try to make a scripting language in the vein of nadvsh if you want to do something interesting, but it might be too removed from what your instructor is expecting of you.
New Adventure Shell (nadvsh)
If you want to process language you can write a UIMA program. UIMA stands for Unstructured Information Management Architecture; it was developed by IBM at a cost of about 45 million dollars and is now available as open source. Basically UIMA is ascii codecs to analyse text documents to find patterns. It is made to find things where there is no order (finding needles in haystacks). It uses XML and C.
The web is a rich area for doing work with languages. Take a look at a popular web framework like Ruby on Rails, and you'll find that much of its productivity comes from the fact that it implements a domain specific language well suited to web applications. Ruby just so happened to be a good language to implement such a language because of its dynamic nature, but the power comes from the language they created from it.
In your case, perhaps you could try designing your own domain specific language using a language that you are familiar with, such as PHP, to implement the essential core of a web framework:
routing URLs to pages
generating pages dynamically using a template (and maybe implement your own template syntax!)
connecting objects to underlying databases (object relational mapping)
If you are really ambitious, instead of building from an existing language, you could build your own language from the ground up (lexer, parser, code generator, etc) to do this.
You can get ideas from this massive list.
Writing a compiler for C or Pascal will likely take you months or years if you are not a compiler guru.
Write a simple web server. It will be fun and might prove useful as a simple and free solution. I once met a guy who said he did something like this and used for simple customer sites. Yours could become a useful thing as well.
