Switch gettext translated language with original language

Switch gettext translated language with original language - php

I started my PHP application with all text in German, then used gettext to extract all strings and translate them to English.
So, now I have a .po file with all msgids in German and msgstrs in English. I want to switch them, so that my source code contains the English as msgids for two main reasons:
More translators will know English, so it is only appropriate to serve them up a file with msgids in English. I could always switch the file before I give it out and after I receive it, but naaah.
It would help me to write English object & function names and comments if the content text was also English. I'd like to do that, so the project is more open to other Open Source collaborators (more likely to know English than German).
I could do this manually and this is the sort of task where I anticipate it will take me more time to write an automated routine for it (because I'm very bad with shell scripts) than do it by hand. But I also anticipate despising every minute of manual computer labour (feels like an oxymoron, right?) like I always do.
Has someone done this before? I figured this would be a common problem, but couldn't find anything. Many thanks ahead.
Sample Problem:
<title><?=_('Routinen')?></title>
#: /users/ruben/sites/v/routinen.php:43
msgid "Routinen"
msgstr "Routines"
I thought I'd narrow the problem down. The switch in the .po-file is no issue of course, it is as simple as
preg_replace('/msgid "(.+)"\nmsgstr "(.+)"/', '/msgid "$2"\nmsgstr "$1"/', $str);
The problem for me is the routine that searches my project folder files for _('$msgid') and substitutes _('msgstr') while parsing the .po-file (which is probably not even the most elegant way, after all the .po-file contains comments which contain all file paths where the msgid occurs).
After fooling around with akirk's answer a little, I ran into some more problems.
Because I have a mixture of _('xxx') and _("xxx") calls, I have to be careful about (un)escaping.
Double quotes " in msgids and msgstrs have to be unescaped, but the slashes can't be stripped, because it may be that the double quote was also escaped in PHP
Single quotes have to be escaped when they're replaced into PHP, but then they also have to be changed in the .po-file. Luckily for me, single quotes only appear in English text.
msgids and msgstrs can have multiple lines, then they look like this
msgid = ""
"line 1\n"
"line 2\n"
msgstr = ""
"line 1\n"
"line 2\n"
plural forms are of course skipped at the moment, but in my case that's not an issue
poedit wants to remove strings as obsolete that seem successfully switched and I have no idea why this happens in (many) cases.
I'll have to stop working on this for tonight. Still it seems using the parser instead of RegExps wouldn't be overkill.

I built on akirk's answer and wanted to preserve what I came up with as an answer here, in case somebody has the same problem.
This is not recursive, but that could easily change of course. Feel free to comment with improvements, I will be watching and editing this post.
$po = file_get_contents("locale/en_GB/LC_MESSAGES/messages.po");
$translations = array(); // german => english
$rawmsgids = array(); // find later
$msgidhits = array(); // record success
$msgstrs = array(); // find later
preg_match_all('/msgid "(.+)"\nmsgstr "(.+)"/', $po, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$german = str_replace('\"','"',$match[1]); // unescape double quotes (could misfire if you escaped double quotes in PHP _("bla") but in my case that was one case versus many)
$english = str_replace('\"','"',$match[2]);
$en_sq_e = str_replace("'","\'",$english); // escape single quotes
$translations['_(\''. $german . '\''] = '_(\'' . $en_sq_e . '\'';
$rawmsgids['_(\''. $german . '\''] = $match[1]; // find raw msgid with searchstr as key
$translations['_("'. $match[1] . '"'] = '_("' . $match[2] . '"';
$rawmsgids['_("'. $match[1] . '"'] = $match[1];
$translations['__(\''. $german . '\''] = '__(\'' . $en_sq_e . '\'';
$rawmsgids['__(\''. $german . '\''] = $match[1];
$translations['__("'. $match[1] . '"'] = '__("' . $match[2] . '"';
$rawmsgids['__("'. $match[1] . '"'] = $match[1];
$msgstrs[$match[1]] = $match[2]; // msgid => msgstr
}
foreach (glob("*.php") as $file) {
$code = file_get_contents($file);
$filehits = 0; // how many replacements per file
foreach($translations AS $msgid => $msgstr) {
$hits = 0;
$code = str_replace($msgid,$msgstr,$code,$hits);
$filehits += $hits;
if($hits!=0) $msgidhits[$rawmsgids[$msgid]] = 1; // this serves to record if the msgid was found in at least one incarnation
elseif(!isset($msgidhits[$rawmsgids[$msgid]])) $msgidhits[$rawmsgids[$msgid]] = 0;
}
// file_put_contents($file, $code); // be careful to test this first before doing the actual replace (and do use a version control system!)
echo "$file : $filehits <br>";
echo $code;
}
/* debug */
$found = array_keys($msgidhits, 1, true);
foreach($found AS $mid) echo $mid . " => " . $msgstrs[$mid] . "\n\n";
echo "Not Found: <br>";
$notfound = array_keys($msgidhits, 0, true);
foreach($notfound AS $mid) echo $mid . " => " . $msgstrs[$mid] . "\n\n";
/*
following steps are still needed:
* convert plurals (ngettext)
* convert multi-line msgids and msgstrs (format mentioned in question)
* resolve uniqueness conflict (msgids are unique, msgstrs are not), so you may have duplicate msgids (poedit finds these)
*/

See http://code.activestate.com/recipes/475109-regular-expression-for-python-string-literals/ for a good python-based regular expression for finding string literals, taking escapes into account. Although it's python, this might be quite good for multiline strings and other corner cases.
See http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/poswap.html for a ready, out-of-the-box base language swapper for .po files.
For instance, the following command line will convert german-based spanish translation to english-based spanish translation. You just have to ensure that your new base language (english) is 100% translated before starting conversion:
poswap -i de-en.po -t de-es.po -o en-es.po
And finally to swap english po file to german po file, use swappo:
http://manpages.ubuntu.com/manpages/hardy/man1/swappo.1.html
After swapping files, some manual polishing of resultant files might be required. For instance headers might be broken and some duplicate texts might occur.

So if I understand you correctly you'd like to replace all German gettext calls with English ones. To replace the contents in the directory, something like this could work.
$po = file_get_contents("translation.pot");
$translations = array(); // german => english
preg_match_all('/msgid "(.+)"\nmsgstr "(.+)"/', $po, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$translations['_("'. $match[1] . '")'] = '_("' . $match[2] . '")';
$translations['_(\''. $match[1] . '\')'] = '_(\'' . $match[2] . '\')';
}
foreach (glob("*.php") as $file) {
$code = file_get_contents($file);
$code = str_replace(array_keys($translations), array_values($translations), $code);
//file_put_contents($file, $code);
echo $code; // be careful to test this first before doing the actual replace (and do use a version control system!)
}

Related

PHP string replace with characters inbetween tags

I need help with my PHP, I'm using str_ireplace() and I want to filter something out and replace it with what I have.
I find it hard to explain what I am talking about so I will give an example below:
This is what I need
$string = "<error> " . md5(rand(0, 1000)) . time() . " </error> Test:)";
then I want to remove and replace the whole <error> .... </error> with nothing.
So the end outcome should just print 'Test:)'.

Your question is not perfectly clear, but I believe I may understand what you are asking. This code may do the trick:
$string = " " . md5(rand(0, 1000)) . time() . " Test:)";
$newstring = preg_replace("/.*?\ /i", "", $string);
This uses regular expressions to filter out everything that comes before the space (and also removes the space)

Find a pattern within two or more sets of text

I have lots of data that I need to search through for certain patterns.
Problem is when looking for said patterns I have no reference to what I'm looking for.
Or in other words, I have two paragraphs. Each on similar topics. I need to be able to compare both paragraphs and find patterns. Phrases said in both paragraphs and how many times both were said.
Can't seem to find the solution because preg_match and other functions your required to supply the things your looking for.
Example paragraphs
Paragraph 1:
Bee Pollen is made by honeybees, and is the food of the young bee. It
is considered one of nature's most completely nourishing foods as it
contains nearly all nutrients required by humans. Bee-gathered pollens
are rich in proteins (approximately 40% protein), free amino acids,
vitamins, including B-complex, and folic acid.
Paragraph 2:
Bee Pollen is made by honeybees. It is required for the fertilization
of the plant. The tiny particles consist of 50/1,000-millimeter
corpuscles, formed at the free end of the stamen in the heart of the
blossom, nature's most completely nourishing foods. Every variety of
flower in the universe puts forth a dusting of pollen. Many orchard
fruits and agricultural food crops do, too.
So from those examples these patterns:
Bee Pollen is made by honeybees
and:
nature's most completely nourishing foods
Both phrases are found in both paragraphs.

This is potentially a complex question depending on whether you're looking for similar phrases or phrases that match word for word.
Finding exact word-for-word matches is quite simple all you need to do is split on common breaks like punctuation marks (e.g. .,;:) and perhaps on conjunctions as well (e.g. and or). However, the problem comes when you come to, for example, adjectives two phrases might be exactly the same but have one word different, like so:
The world is spinnnig around its axis at a tremendous speed.
The world is spinning around its axis at a magnificent speed.
This won't match because tremendous and magnificent are used in place of one another. Potentially you could work around this, however, that would be a more complex question.
Answer
If we stick to the simple side of things we can achieve phrase matching with just a few lines of code (4 in this example; not including the formatting for comments/readability).
$wordSplits = 'and or on of as'; //List of words to split on
preg_match_all('/(?<m1>.*?)([.,;:\-]| '.str_replace(' ', ' | ', trim($wordSplits)).' )/i', $para1, $matches1);
preg_match_all('/(?<m2>.*?)([.,;:\-]| '.str_replace(' ', ' | ', trim($wordSplits)).' )/i', $para2, $matches2);
$commonPhrases = array_filter( //Removes blank $key=>$value pairs
array_intersect( //Finds matching paterns
array_map(function($item){
return(strtolower(trim($item))); //Cleans array for $para1 values - removes leading and following spaces
}, $matches1['m1']),
array_map(function($item){
return(strtolower(trim($item))); //Cleans array for $para2 values - removes leading and following spaces
}, $matches2['m2'])
)
);
var_dump($commonPhrases);
/**
OUTPUT:
array(2) {
[0]=>
string(31) "bee pollen is made by honeybees"
[5]=>
string(41) "nature's most completely nourishing foods"
}
/*
The above code will find matches splitting both on punctuation (defined in [...] of the preg_match_all pattern) it will also concatenate the word list (matching only words in the word list with a preceding and following space).
Wordlist
You can change the word list to include any breaks you like, editing the list until you get the phrases you are after, examples:
$wordSplits = 'and or';
$wordSplits = 'and but if or';
$wordSplits = 'a an as and by but because if in is it of off on or';
Punctuation
You can add any punctuation marks you like into the list (between [ and ]), however remember that some characters do have special meanings and may need to be escaped (or placed appropriately): - and ^ should become \- and \^ or be placed where their special meaning doesn't come into play.
You may consider changing:
([.,;:\-]|
To:
([.,;:\-] | //Adding a space before the pipe
So that you only split punctuation marks which are followed by a space. For example: this would mean that items like 50,000 won't be split.
Spaces and breaks
You may also consider changing the spaces to \s so that tabs and newlines etc are included and not just spaces. Like so:
'/(?<m1>.*?)([.,;:\-]|\s'.str_replace(' ', '\s|\s', trim($wordSplits)).'\s)/i'
This would also apply to:
([.,;:\-]\s|
If you decide to go down that route.

I've been working on this code, don't know if it suits your needs... Feel free to expand it!
$p1 = "Bee Pollen is made by honeybees, and is the food of the young bee. It is considered one of nature's most completely nourishing foods as it contains nearly all nutrients required by humans. Bee-gathered pollens are rich in proteins (approximately 40% protein), free amino acids, vitamins, including B-complex, and folic acid.";
$p2 = "Bee Pollen is made by honeybees. It is required for the fertilization of the plant. The tiny particles consist of 50/1,000-millimeter corpuscles, formed at the free end of the stamen in the heart of the blossom, nature's most completely nourishing foods. Every variety of flower in the universe puts forth a dusting of pollen. Many orchard fruits and agricultural food crops do, too.";
// Strip strings of periods etc.
$p1 = strtolower(str_replace(array('.', ',', '(', ')'), '', $p1));
$p2 = strtolower(str_replace(array('.', ',', '(', ')'), '', $p2));
// Extract words from first paragraph
$w1 = explode(" ", $p1);
// Build search string
$search = '';
$found = array();
foreach ($w1 as $word) {
//echo 'Word: ' . $word . "<br />";
$search .= ' ' . $word;
$search = trim($search);
//echo '. . Search string: '. $search . "<br /><br />";
if (substr_count($p2, $search)) {
$old_search = $search;
$num_occured = substr_count($p2, $search);
//echo " . . . found!" . "<br /><br /><br />";
$add = TRUE;
} else {
//echo " . . . not found! Generating new search string: " . $word . '<br />';
if ($add) {
$found[] = array('pattern' => $old_search, 'occurences' => $num_occured);
$add = FALSE;
}
$old_search = '';
$search = $word;
}
}
print_r($found);
The above code finds occurences of patterns from the first string in the second one.
I'm sure it can be written better, but since it's past midnight (local time), I'm not as "fresh" as I'd like to be...
Codepad-link

PHP substr doesn't echo anything

I'm having problems with this code, and the PHP method 'substr' is playing up. I just don't get it. Here's a quick introduction what I'm trying to achieve. I have this massive XML-document with email-subscribers from Joomla. I'm trying to import it to Mailchimp, but Mailchimp have some rules for the syntax of the ways to import emails to a list. So at the moment the syntax is like this:
<subscriber>
<subscriber_id>615</subscriber_id>
<name><![CDATA[NAME OF SUBSCRIBER]]></name>
<email>THE_EMAIL#SOMETHING.COM</email>
<confirmed>1</confirmed>
<subscribe_date>THE DATE</subscribe_date>
</subscriber>
I want to make a simple PHP-script that takes all those emails and outputs them like this:
[THE_EMAIL#SOMETHING.COM] [NAME OF SUBSCRIBER]
[THE_EMAIL#SOMETHING.COM] [NAME OF SUBSCRIBER]
[THE_EMAIL#SOMETHING.COM] [NAME OF SUBSCRIBER]
[THE_EMAIL#SOMETHING.COM] [NAME OF SUBSCRIBER]
If I can do that, then I can just copy paste it into Mailchimp.
Now here's my PHP-script, so far:
$fileName = file_get_contents('emails.txt');
foreach(preg_split("/((\r?\n)|(\r\n?))/", $fileName) as $line){
if(strpos($line, '<name><![CDATA[')){
$name = strpos($line, '<name><![CDATA[');
$nameEnd = strpos($line, ']]></name>', $name);
$nameLength = $nameEnd-$name;
echo "<br />";
echo " " . strlen(substr($line, $name, $nameLength));
echo " " . gettype(substr($line, $name, $nameLength));
echo " " . substr($line, $name, $nameLength);
}
if(strpos($line, '<email>')){
$var1 = strpos($line, '<email>');
$var2 = strpos($line, '</email>', $var1);
$length = $var2-$var1;
echo substr($line, $var1, $length);
}
}
The first if-statement works as it should. It identifies, if there's an ''-tag on the line, and if there is, then it finds the end-tag and outputs the email with the substr-method.
The second if-statement is annoying me. If should do the same thing as the first if-statement, but it doesn't. The length is the correct length (I've checked). The type is the correct type (I've checked). But when I try to echo it, then nothing happens. The script still runs, but it doesn't write anything.
I've played around with it quite a lot and seem to have tried everything - but I can't figure it out.

Warning
This function may return Boolean FALSE, but may also return a non-Boolean value which evaluates to FALSE. Please read the section on Booleans for more information. Use the === operator for testing the return value of this function.
You should be using if(strpos($line,'...') !== false) {
That aside, your file seems to be XML, so you should use an XML parser lest you fall under the pony he comes.
DOMDocument is a good one. You could do something like this:
$dom = new DOMDocument();
$dom->load("emails.txt");
$subs = $dom->getElementsByTagName('subscriber');
$count = $subs->length;
for( $i=0; $i<$l; $i++) {
$sub = $subs->item($i);
echo $sub->getElementsByTagName('email')->item(0)->nodeValue;
echo " ";
echo $sub->getElementsByTagName('name')->item(0)->nodeValue;
echo "\n";
}
This will output the names and emails in the format you described.

So there's a few things wrong with this, including the strpos command which will actually return 0 if it finds the tag at the beginning of the line, which doesn't appear to be what you intend.
Also, if the XML is not formatted exactly as you have, with each opening and closing tag on the one line, then your logic will fail as well.
It's not a good idea to re-invent XML processing for this reason...
Here as others have proposed, is a better solution to the problem*.
$xml = simplexml_load_file('emails.txt');
foreach( $xml->subscriber as $sub )
{
// Note that SimpleXML is aware of CDATA, and only outputs the text
$output = '[' . $sub->name . ']' . ' ' . '[' . $sub->email . ']';
}
*This assumes that you XML is valid, i.e. "subscriber" blocks are contained in a single parent at the top level. You can of course use simplexml documentation to adjust for your use case.

PHP using prefix tags to linkify text

I'm trying to write a code library for my own personal use and I'm trying to come up with a solution to linkify URLs and mail links. I was originally going to go with a regex statement to transform URLs and mail addresses to links but was worried about covering all the bases. So my current thinking is perhaps use some kind of tag system like this:
l:www.google.com becomes http://www.google.com and where m:john.doe#domain.com becomes john.doe#domain.com.
What do you think of this solution and can you assist with the expression? (REGEX is not my strong point). Any help would be appreciated.

Maybe some regex like this :
$content = "l:www.google.com some text m:john.doe#domain.com some text";
$pattern = '/([a-z])\:([^\s]+)/'; // One caracter followed by ':' and everything who goes next to the ':' which is not a space or tab
if (preg_match_all($pattern, $content, $results))
{
foreach ($results[0] as $key => $result)
{
// $result is the whole matched expression like 'l:www.google.com'
$letter = $results[1][$key];
$content = $results[2][$key];
echo $letter . ' ' . $content . '<br/>';
// You can put str_replace here
}
}

Find specific text in multiple TXT files in PHP

I want to find a specific text string in one or more text files in a directory, but I don't know how. I have Googled quite a long time now and I haven't found anything. Therefor I'm asking you guys how I can fix this?
Thanks in advance.

If it is a Unix host you're running on, you can make a system call to grep in the directory:
$search_pattern = "text to find";
$output = array();
$result = exec("/path/to/grep -l " . escapeshellarg($search_pattern) . " /path/to/directory/*", $output);
print_r($output);
// Prints a list of filenames containing the pattern

You can get what you need without the use of grep. Grep is a handy tool for when you are on the commandline but you can do what you need with just a bit of PHP code.
This little snippet for example, gives you results similar to grep:
$path_to_check = '';
$needle = 'match';
foreach(glob($path_to_check . '*.txt') as $filename)
{
foreach(file($filename) as $fli=>$fl)
{
if(strpos($fl, $needle)!==false)
{
echo $filename . ' on line ' . ($fli+1) . ': ' . $fl;
}
}
}

If you're on a linux box, you can grep instead of using PHP. For php specifically, you can iterate over the files in a directory, open each as a string, find the string, and save the file if the string exists.

Just specify a file name, get the contents of the file, and do regex matching against the file contents. See this and this for further details regarding my code sample below:
$fileName = '/path/to/file.txt';
$fileContents = file_get_contents($fileName);
$searchStr = 'I want to find this exact string in the file contents';
if ($fileContents) { // file was retrieved successfully
// do the regex matching
$matchCount = preg_match_all($searchStr, $fileContents, $matches);
if ($matchCount) { // there were matches
// $match[0] will contain the entire string that was matched
// $matches[1..n] will contain the match substrings
}
} else { // file retrieval had problems
}
Note: This will work irrespective of whether or not you're on a linux box.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.