I am looking for a simple way to create an Excel or CSV file from a .po localization file.
I couldn't find one via Google, so I'm thinking of writing it myself in PHP.
The PO file has this structure:
msgid "Titre"
msgstr "Titre"
So I guess I need my PHP script to parse the .po file, looking for "the first bit of text between quotes after each occurrence of the keyword msgstr".
I assume that's a job for a regex, so I tried that, but it does not return anything:
$po_file = '/path/to/messages.po';
if (!file_exists($po_file)) {
    die("you got the filepath wrong dude.");
}
$str = file_get_contents($po_file);
// find all occurrences of msgstr "SOMETHING"
preg_match('#^msgstr "([^/]+)"#i', $str, $matches);
$msgstr = $matches[1];
There is a nice PEAR library, File_Gettext.
If you look at its source, File/Gettext/PO.php, you'll see the regex pattern you need:
$matched = preg_match_all('/msgid\s+((?:".*(?<!\\\\)"\s*)+)\s+' .
                          'msgstr\s+((?:".*(?<!\\\\)"\s*)+)/',
                          $contents, $matches);
for ($i = 0; $i < $matched; $i++) {
    $msgid = substr(rtrim($matches[1][$i]), 1, -1);
    $msgstr = substr(rtrim($matches[2][$i]), 1, -1);
    $this->strings[parent::prepare($msgid)] = parent::prepare($msgstr);
}
Or just use the PEAR lib:
include 'File/Gettext/PO.php';
$po = new File_Gettext_PO();
$po->load('/path/to/messages.po'); // parse the file before reading it back
$poArray = $po->toArray();
foreach ($poArray['strings'] as $msgid => $msgstr) {
    // write your csv as you like...
}
After searching on Google I found this code, which extracts the comment, msgid and msgstr. It works!
$contents = file_get_contents("file.po");
$regex = '/^#\s*(.+?)\nmsgid "(.+?)"\nmsgstr "(.+?)"/m';
$matched = preg_match_all($regex, $contents, $matches);
$array = array();
for ($i = 0; $i < $matched; $i++) {
    $array[] = array('comment' => $matches[1][$i],
                     'msgid'   => $matches[2][$i],
                     'msgstr'  => $matches[3][$i]);
}
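To cover the whole round trip the question asks for (PO in, CSV out), here is a minimal sketch. It assumes every msgid and msgstr fits on a single line: multi-line continuation strings, plural forms and msgctxt are not handled (such entries are silently skipped). po_to_rows() and rows_to_csv() are hypothetical helper names, not part of any library.

```php
<?php
// Sketch: convert simple one-line .po entries to CSV.
// Assumptions: no continuation strings, plurals or msgctxt.
// po_to_rows() and rows_to_csv() are hypothetical helper names.

function po_to_rows(string $contents): array
{
    // A quoted PO string: runs of non-quote/non-backslash chars or escape pairs
    $q = '"((?:[^"\\\\]|\\\\.)*)"';
    preg_match_all('/^msgid\s+' . $q . '\s+msgstr\s+' . $q . '/m', $contents, $m, PREG_SET_ORDER);

    $rows = [];
    foreach ($m as $entry) {
        if ($entry[1] === '') {
            continue; // the header entry has an empty msgid
        }
        // Turn PO escapes such as \" and \n back into real characters
        $rows[] = [stripcslashes($entry[1]), stripcslashes($entry[2])];
    }
    return $rows;
}

function rows_to_csv(array $rows): string
{
    $fh = fopen('php://temp', 'r+');
    fputcsv($fh, ['msgid', 'msgstr'], ',', '"', '\\');
    foreach ($rows as $row) {
        fputcsv($fh, $row, ',', '"', '\\');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}
```

Usage would then be something like file_put_contents('messages.csv', rows_to_csv(po_to_rows(file_get_contents('/path/to/messages.po')))); and Excel can open the resulting CSV directly. fputcsv() takes care of quoting fields that contain commas or quotes, which is the main reason to prefer it over building lines by hand.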
I have a search string $str (something like "test"), a wrap string $wrap (something like "|") and a text string $text (something like "This is a test Text").
In this example $str occurs once in $text. What I want now is a function that wraps $str with the wrap defined in $wrap and outputs the modified text (even if $str occurs more than once in $text).
But it shouldn't output the whole text, just 1-2 of the words before $str and 1-2 of the words after $str, plus "..." (only if it isn't the first or last word). It should also be case-insensitive.
$str = "Text";
$wrap = "<span>|</span>";
$text = "This is a really long Text where the word Text appears about 3 times Text";
Output would be:
"...long <span>Text</span> where...word <span>Text</span> appears...times <span>Text</span>"
My code (obviously doesn't work):
$tempar = preg_split("/$str/i", $text);
if (count($tempar) <= 2) {
    $result = "... ".substr($tempar[0], -7).$wrap.substr($tempar[1], 7)." ...";
} else {
    $amount = substr_count($text, $str);
    for ($i = 0; $i < $amount; $i++) {
        $result = $result.".. ".substr($tempar[$i], -7).$wrap.substr($tempar[$i+1], 0, 7)." ..";
    }
}
If you have a tip or a solution, don't hesitate to let me know.
I have taken your approach and made it more flexible. If $str or $wrap changes you could have escaping issues within the regex pattern so I have used preg_quote.
Note that I added $placeholder to make it clearer, but you can use $placeholder = "|" if you don't like [placeholder].
function wrapInString($str, $text, $element = 'span') {
    $placeholder = "[placeholder]"; // The string that will be replaced by the match
    $wrap = "<{$element}>{$placeholder}</{$element}>"; // Dynamic string that can handle more than just span
    $strExp = preg_quote($str, '/');
    $matches = [];
    $matchCount = preg_match_all("/(\w+\s+)?(\w+\s+)?({$strExp})(\s+\w+)?(\s+\w+)?/i", $text, $matches);
    $response = '';
    for ($i = 0; $i < $matchCount; $i++) {
        if (strlen($matches[1][$i])) {
            $response .= '...';
        }
        if (strlen($matches[2][$i])) {
            $response .= $matches[2][$i];
        }
        $response .= str_replace($placeholder, $matches[3][$i], $wrap);
        if (strlen($matches[4][$i])) {
            $response .= $matches[4][$i];
        }
        if (strlen($matches[5][$i]) && $i == $matchCount - 1) {
            $response .= '...';
        }
    }
    return $response;
}
$text = "text This is a really long Text where the word Text appears about 3 times Text";
var_dump(wrapInString('text', $text));
string(107) "<span>text</span> This...long <span>Text</span> where...<span>Text</span> appears...times <span>Text</span>"
The i regex modifier is what makes the matching case-insensitive; remove it if you want case-sensitive matching. Each match keeps the casing it has in $text.
If I understand your question correctly, just a little bit of implode and explode magic is needed:
$text = "This is a really long Text where the word Text appears about 3 times Text";
$arr = explode("Text", $text);
print_r(implode('<span>Text</span>', $arr));
If you specifically need the span tags rendered as HTML, just write them that way:
$arr = explode("Text", $text);
print_r(implode('<span>Text</span>', $arr));
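Note that explode() is case-sensitive, and the implode() glue always inserts the literal word "Text", whatever casing was in the original. A minimal sketch of a case-insensitive variant using preg_replace_callback(), which keeps the matched casing; it assumes "|" in $wrap is the placeholder, as in the question, and wrapAll() is a hypothetical name:

```php
<?php
// Sketch: wrap every case-insensitive occurrence of $str, keeping the casing
// found in $text. Assumes '|' in $wrap is the placeholder (as in the question);
// wrapAll() is a hypothetical name.
function wrapAll(string $str, string $wrap, string $text): string
{
    $pattern = '/' . preg_quote($str, '/') . '/i';
    return preg_replace_callback($pattern, function ($m) use ($wrap) {
        return str_replace('|', $m[0], $wrap); // $m[0] is the text that matched
    }, $text);
}

echo wrapAll('text', '<span>|</span>', 'A Text and a text');
// A <span>Text</span> and a <span>text</span>
```

A callback is used instead of a "$0" replacement string so that characters such as "$" or "\" inside $wrap cannot be misread as backreferences.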
Use the pattern below to get your word and the 1-2 words before and after it.
In PHP the code could be:
$str = "Text";
$wrap = "<span>|</span>";
$text = "This is a really long Text where the word Text appears about 3 times Text";
$temp = str_replace('|', $str, $wrap); // <span>Text</span>
// find the pattern and the 1-2 words before and after it
// (to make it case-sensitive, delete 'i' from the pattern)
if (preg_match_all('/((\w+\s+){1,2}|^)' . preg_quote($str, '/') . '((\s+\w+){1,2}|$)/i', $text, $match)) {
    $res = array_map(function ($x) use ($str, $temp) { return '... ' . str_replace($str, $temp, $x) . ' ...'; }, $match[0]);
    echo implode(' ', $res);
}
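A sketch that aims at exactly the output given in the question: it splits the text into words, collects a window of $context words around each match, merges windows that touch, and only adds "..." where words were actually left out (so not before the first word or after the last one). It assumes words are separated by whitespace and compared whole and case-insensitively (ASCII only, via strcasecmp()); punctuation attached to a word prevents a match. excerpt() is a hypothetical name.

```php
<?php
// Sketch: show each occurrence of $str with up to $context words on either side,
// wrapped with $wrap ('|' = placeholder), joining the snippets with "...".
// Assumptions: whitespace-separated words, whole-word ASCII case-insensitive
// matching. excerpt() is a hypothetical name.
function excerpt(string $str, string $wrap, string $text, int $context = 1): string
{
    $words = preg_split('/\s+/', trim($text));
    $last = count($words) - 1;

    // Collect [from, to] word ranges around every match, merging ranges that touch
    $ranges = [];
    foreach ($words as $i => $word) {
        if (strcasecmp($word, $str) !== 0) {
            continue;
        }
        $from = max(0, $i - $context);
        $to = min($last, $i + $context);
        $n = count($ranges);
        if ($n > 0 && $from <= $ranges[$n - 1][1] + 1) {
            $ranges[$n - 1][1] = $to; // overlaps or abuts the previous range
        } else {
            $ranges[] = [$from, $to];
        }
    }
    if (!$ranges) {
        return '';
    }

    $snippets = [];
    foreach ($ranges as [$from, $to]) {
        $parts = [];
        for ($i = $from; $i <= $to; $i++) {
            $parts[] = strcasecmp($words[$i], $str) === 0
                ? str_replace('|', $words[$i], $wrap) // keep the original casing
                : $words[$i];
        }
        $snippets[] = implode(' ', $parts);
    }

    $out = implode('...', $snippets);
    if ($ranges[0][0] > 0) {
        $out = '...' . $out; // words were skipped before the first snippet
    }
    if (end($ranges)[1] < $last) {
        $out .= '...'; // words were skipped after the last snippet
    }
    return $out;
}
```

With the question's $str, $wrap and $text, excerpt($str, $wrap, $text) returns "...long <span>Text</span> where...word <span>Text</span> appears...times <span>Text</span>"; pass 2 as the fourth argument to get two words of context instead of one.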
I want to ask 2 questions about URL conversion in PHP.
Question 1: I need to convert text into links. I've written my own regex and also read many forums, but all the solutions depend on www. or (ht|f)tp(s), and I need a regex that will convert domain names even without www and http in the text, for example:
I like stackoverflow.com very much
I like <a href='http://stackoverflow.com'>stackoverflow.com</a> very much
Of course it must handle periods, commas, etc., like:
I like stackoverflow.com.
I like <a href='http://stackoverflow.com'>stackoverflow.com</a>.
And one more question: links with URL-encoded characters on Wikipedia are displayed as they are, but on other sites they are displayed as URL-encoded strings (%XX%XX%XX). How does Wikipedia do this? Thanks!
For your first question, I would not recommend doing that: it is very difficult to know whether a word containing a dot is a domain name or not, and people often forget to put a space after the dot in the middle of a paragraph.
For your second question, it is simple: you URL-encode the link in the href, but not the text between the opening and closing a tags. For example:
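A minimal PHP sketch of that idea: the title is percent-encoded only inside the href attribute, while the readable text goes between the tags. The base URL and the function names wiki_link() and readable_link() are hypothetical; the second helper assumes you only have an already-encoded URL and recovers the readable form with rawurldecode().

```php
<?php
// Sketch: percent-encode only the href and show readable text as the link label.
// wiki_link(), readable_link() and the base URL are hypothetical.
function wiki_link(string $base, string $title): string
{
    $href = $base . rawurlencode($title); // e.g. "Café" becomes "Caf%C3%A9"
    return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
        . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</a>';
}

// If you only have an encoded URL, decode it for display but keep it encoded in href.
function readable_link(string $encodedUrl): string
{
    return '<a href="' . htmlspecialchars($encodedUrl, ENT_QUOTES, 'UTF-8') . '">'
        . htmlspecialchars(rawurldecode($encodedUrl), ENT_QUOTES, 'UTF-8') . '</a>';
}

echo wiki_link('https://example.org/wiki/', 'Café au lait');
// <a href="https://example.org/wiki/Caf%C3%A9%20au%20lait">Café au lait</a>
```

htmlspecialchars() is applied to both parts so that quotes or angle brackets in a title cannot break the generated markup.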
The auto_link() function from CodeIgniter's URL helper can help you (note that it only links URLs starting with http(s):// or www., and it relies on CodeIgniter's safe_mailto() for e-mail addresses):
if (!function_exists('auto_link')) {
    function auto_link($str, $type = 'both', $popup = FALSE) {
        if ($type != 'email') {
            if (preg_match_all("#(^|\s|\()((http(s?)://)|(www\.))(\w+[^\s\)\<]+)#i", $str, $matches)) {
                $pop = ($popup == TRUE) ? " target=\"_blank\" " : "";
                for ($i = 0; $i < count($matches['0']); $i++) {
                    $period = '';
                    if (preg_match("|\.$|", $matches['6'][$i])) {
                        $period = '.';
                        $matches['6'][$i] = substr($matches['6'][$i], 0, -1);
                    }
                    $str = str_replace($matches['0'][$i],
                        $matches['1'][$i].'<a href="http'.$matches['4'][$i].'://'.
                        $matches['5'][$i].$matches['6'][$i].'"'.$pop.'>http'.
                        $matches['4'][$i].'://'.$matches['5'][$i].$matches['6'][$i].'</a>'.
                        $period, $str);
                }
            }
        }
        if ($type != 'url') {
            if (preg_match_all("/([a-zA-Z0-9_\.\-\+]+)@([a-zA-Z0-9\-]+)\.([a-zA-Z0-9\-\.]*)/i", $str, $matches)) {
                for ($i = 0; $i < count($matches['0']); $i++) {
                    $period = '';
                    if (preg_match("|\.$|", $matches['3'][$i])) {
                        $period = '.';
                        $matches['3'][$i] = substr($matches['3'][$i], 0, -1);
                    }
                    $str = str_replace($matches['0'][$i], safe_mailto($matches['1'][$i].'@'.$matches['2'][$i].'.'.$matches['3'][$i]).$period, $str);
                }
            }
        }
        return $str;
    }
}
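If you still want to link bare domains, one way to keep false positives down is to accept only a fixed list of top-level domains. The sketch below assumes such a list (the three TLDs are a deliberately short, hypothetical choice), keeps a trailing "." or "," outside the link, skips e-mail addresses and domains that already follow "http://", and does not handle text that already contains <a> tags. link_domains() is a hypothetical name.

```php
<?php
// Sketch: link bare domain names such as "stackoverflow.com" in plain text.
// Assumptions: only the TLDs in $tlds count (hypothetical short list), trailing
// punctuation stays outside the link, input contains no existing <a> tags.
function link_domains(string $text, array $tlds = ['com', 'net', 'org']): string
{
    $tldAlt = implode('|', array_map(function ($t) {
        return preg_quote($t, '~');
    }, $tlds));
    $pattern = '~(?<![\w.@/-])'                      // not inside a word, e-mail or URL
        . '((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+' // one or more labels ending in "."
        . '(?:' . $tldAlt . '))'                     // a known TLD
        . '(?!\.?[\w-])~i';                          // not followed by more name characters

    return preg_replace_callback($pattern, function ($m) {
        return "<a href='http://" . $m[1] . "'>" . $m[1] . '</a>';
    }, $text);
}

echo link_domains('I like stackoverflow.com.');
// I like <a href='http://stackoverflow.com'>stackoverflow.com</a>.
```

The trade-off is the one described in the first answer: anything with an unlisted TLD is left alone, and a missing space after a sentence-ending dot can still produce a false link.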
I want, for example, to scan lines such as $lang['foo1']='foo2'; from a PHP file, so I tried this, but it doesn't work:
$file = "../lang/lang.en.php";
if(file_exists($file)) {
$text = fopen($file, 'r+');
$content = trim(file_get_contents($file, NULL, NULL, 221));
$i = 0;
do {
$n = sscanf($content, "\$lang['%s']=%s;", $s1[$i], $s2[$i]);
echo $s1[$i].'==>'.$s2[$i];
} while($s1[$i]! = '' && $s2[$i] != '');
}
What is my problem?
You should just include('../lang/lang.en.php') like a normal PHP file.
Also, it's possible to make lang.en.php return an array directly with return, http://nl3.php.net/manual/en/function.return.php
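A small sketch of both suggestions. Temporary files stand in for ../lang/lang.en.php so the example runs on its own; with the real file you would write include '../lang/lang.en.php'; for the existing format, or $strings = include '../lang/lang.en.php'; if the file ends with a return statement.

```php
<?php
// Sketch: let PHP load the language file instead of parsing it by hand.
// Temporary files stand in for ../lang/lang.en.php so the example runs alone.

// 1) Existing format: the file assigns into $lang, and include runs it in this scope.
$existing = tempnam(sys_get_temp_dir(), 'lang');
file_put_contents($existing, "<?php\n\$lang['foo1']='foo2';\n\$lang['PAGE_TITLE']='Meine Webseite Titel';\n");
$lang = [];
include $existing;
unlink($existing);

// 2) Alternative format: the file ends with "return [...];" and include
//    evaluates to that array, so no global variable is needed.
$returning = tempnam(sys_get_temp_dir(), 'lang');
file_put_contents($returning, "<?php\nreturn ['foo1' => 'foo2', 'PAGE_TITLE' => 'Meine Webseite Titel'];\n");
$strings = include $returning;
unlink($returning);

foreach ($strings as $key => $value) {
    echo $key, '==>', $value, "\n"; // same output format as the question's loop
}
```

Either way PHP itself parses the file, so quoting, escapes and whitespace around "=" are handled for free, which is the fragile part of the sscanf() approach.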
I would recommend using preg_match_all() in your case.
// Matches lines such as
// $lang['PAGE_TITLE']='Meine Webseite Titel';
$content = file_get_contents('../lang/lang.en.php');
preg_match_all('~\$lang\[\'(.+?)\'\]\s*=\s*\'(.*?)\';~', $content, $result);
$lang = array_combine($result[1], $result[2]); // key => translated string
I'm trying to parse the resources contained in a resources.arsc file, as discussed in this question. I know the AndroidManifest.xml file identifies resources located in the .arsc file. I have successfully managed to parse the header of the .arsc file, but I can't figure out how to parse the resources themselves.
Can somebody please help me figure out how to parse the resources contained in an .arsc file?
My parsing code so far:
$doc = fopen('resources.arsc', 'r+');
$res[$i] = _unpack('V', fread($doc, 4));
for ($i = 0, $j = $res[6]; $i <= $j; $i++) {
$word = fread($doc, 4);
$stroffs[] = _unpack('V', $word);
$strings = array();
$curroffs = 0;
foreach($stroffs as $offs){
//read length
$len = _unpack('v', fread($doc, 2));
//read string
$str = fread($doc, $len*2);
$str = '';
$wd = fread($doc, 2);
$strings[] = mb_convert_encoding($str, 'gbk', 'UTF-16LE');
//curr offset
$curroffs += ($len+1)*2 + 2;
$tpos = ftell($doc);
//fseek($doc, $tpos + $tpos % 4);
$i = 0;
$xmls = $strings;
// and then... does anybody know the format, or how to continue parsing?
function read_doc_past_sentinel(&$doc){
$pos = ftell($doc);
$count= 0;
while($word = fread($doc, 4)){
if(_unpack('V', $word)==-1)break;
$n = 1;
if ($count < $n){
while($word = peek_doc($doc, 4)){
if(_unpack('V', $word) != -1)break;
fread($doc, 4);
if(isset($count) && $count >= $n)break;
echo 'skip '.$n.' chars<br />';
function peek_doc(&$doc, $size){
$data = fread($doc, $size);
fseek($doc, ftell($doc)-$size);
return $data;
function _unpack($m, $b){
//if(!$b)return '';
$res = unpack($m, $b);
return $res[1];
This is a fairly complicated binary file. You will need way more code than that to parse it. :)
My suggestion would be to use the same code that the platform does -- that is the ResTable and related classes found here:
Note that ResourceTypes.h also has definitions for the complete structure of the resource table (which the classes there use to parse it).
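As a starting point in PHP, the chunk layout from ResourceTypes.h can be read with unpack(). This sketch only walks the top-level chunks of the table (string pool, packages) using the ResChunk_header fields (uint16 type, uint16 headerSize, uint32 size, all little-endian); decoding strings, type specs and entries is left out. list_arsc_chunks() and read_chunk_header() are hypothetical names.

```php
<?php
// Sketch: list the top-level chunks of a resources.arsc blob.
// Layout from ResChunk_header in ResourceTypes.h; entry and string decoding
// is NOT done here. Function names are hypothetical.

const RES_STRING_POOL_TYPE = 0x0001;
const RES_TABLE_TYPE = 0x0002;
const RES_TABLE_PACKAGE_TYPE = 0x0200;

function read_chunk_header(string $data, int $offset): array
{
    // 'v' = unsigned 16-bit LE, 'V' = unsigned 32-bit LE (offset argument needs PHP 7.1+)
    return unpack('vtype/vheaderSize/Vsize', $data, $offset);
}

function list_arsc_chunks(string $data): array
{
    $table = read_chunk_header($data, 0);
    if ($table['type'] !== RES_TABLE_TYPE) {
        throw new RuntimeException('Not a resource table chunk');
    }
    // ResTable_header = ResChunk_header + uint32 packageCount
    $packageCount = unpack('V', $data, 8)[1];

    $chunks = [];
    $offset = $table['headerSize'];
    $end = min($table['size'], strlen($data));
    while ($offset + 8 <= $end) {
        $h = read_chunk_header($data, $offset);
        if ($h['type'] === RES_STRING_POOL_TYPE && $offset + 12 <= $end) {
            // ResStringPool_header continues with uint32 stringCount
            $h['stringCount'] = unpack('V', $data, $offset + 8)[1];
        }
        $chunks[] = ['offset' => $offset] + $h;
        if ($h['size'] < 8) {
            break; // malformed size; stop instead of looping forever
        }
        $offset += $h['size'];
    }
    return ['packageCount' => $packageCount, 'chunks' => $chunks];
}
```

Usage would be list_arsc_chunks(file_get_contents('resources.arsc')). Also note that unpack('V', ...) is unsigned, so on 64-bit PHP an all-ones word (0xFFFFFFFF) comes back as 4294967295, not -1; comparisons against -1 like the ones in the question will never be true there.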
You may also just be able to use the aapt tool. This has a number of options for parsing resource-related data out of an .apk:
aapt d[ump] [--values] WHAT file.{apk} [asset [asset ...]]
badging Print the label and icon for the app declared in APK.
permissions Print the permissions from the APK.
resources Print the resource table from the APK.
configurations Print the configurations in the APK.
xmltree Print the compiled xmls in the given assets.
xmlstrings Print the strings of the given compiled xml assets.
If there is some other data you want that isn't available with those commands, consider modifying the tool's code in frameworks/base/tools/aapt to parse out what you need. This tool uses ResTable to parse the resources.
I currently have the following code:
$content = "
<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>";
I need to find a way to create an array of name => value pairs, e.g. Manufacturer => John Deere.
Can anyone help me with a simple code snippet? I tried some regex, but it doesn't even work to extract the names or values, e.g.:
$pattern = "/<name>Manufacturer<\/name><value>(.*)<\/value>/";
preg_match_all($pattern, $content, $matches);
$st_selval = $matches[1][0];
You don't want to use regex for this. Try something like SimpleXML.
Well, why don't you start with this:
$content = "<root>" . $content . "</root>";
$xml = new SimpleXMLElement($content);
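Continuing from there, a sketch that builds the name => value array with SimpleXML. It assumes the <name> and <value> elements strictly alternate, so the i-th name belongs to the i-th value:

```php
<?php
// Sketch: pair the i-th <name> with the i-th <value>.
// Assumes names and values strictly alternate in the fragment.
$content = "<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>";

$xml = new SimpleXMLElement('<root>' . $content . '</root>');
$names = $xml->name;   // all <name> children, in document order
$values = $xml->value; // all <value> children, in document order

$result = [];
for ($i = 0; $i < count($names); $i++) {
    $result[(string) $names[$i]] = (string) $values[$i]; // cast to get the text
}
print_r($result);
```

The (string) casts matter: without them the array would hold SimpleXMLElement objects rather than plain strings.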
Although some of the answers posted using regular expressions MAY work, you should get in the habit of using the correct tool for the job, and regular expressions are not the correct tool for parsing XML.
I'm using your $content variable:
$preg1 = preg_match_all('#<name>([^<]+)#', $content, $name_arr);
$preg2 = preg_match_all('#<value>([^<]+)#', $content, $val_arr);
$array = array_combine($name_arr[1], $val_arr[1]);
This is rather simple and can be solved with a regex:
$name = '<name>\s*([^<]+)</name>\s*';
$value = '<value>\s*([^<]+)</value>\s*';
$pattern = "|$name$value|";
preg_match_all($pattern, $content, $matches);
# create hash
$stuff = array_combine($matches[1], $matches[2]);
# display
print_r($stuff);
First of all, never use regex to parse XML...
You could do this with an XPath query...
First, wrap the content in a root tag to make the parser happy (if it doesn't already have one):
$content = '<root>' . $content . '</root>';
Then, load the document:
$dom = new DomDocument();
$dom->loadXML($content);
Then, initialize the XPath object:
$xpath = new DomXpath($dom);
Write your query (wrapping it in string() makes evaluate() return plain text instead of a node list):
$xpathQuery = 'string(//name[text()="Manufacturer"]/following-sibling::value[1])';
Then, execute it:
$manufacturer = $xpath->evaluate($xpathQuery);
If I did the XPath right, $manufacturer should be John Deere...
You can see the docs on DomXpath, a basic primer on XPath, and a bunch of XPath examples...
Edit: If you'd rather not use the following-sibling axis, you can do this instead of the XPath query:
$xpathQuery = '//name[text()="Manufacturer"]';
$elements = $xpath->query($xpathQuery);
$manufacturer = $elements->item(0)->nextSibling->nodeValue;
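DOMXPath implements XPath 1.0, which includes the following-sibling axis, so all pairs can also be read in one pass. A sketch, assuming each <name> is followed by its own <value>:

```php
<?php
// Sketch: read every name/value pair with DOMXPath.
// Assumes each <name> is followed by its matching <value>.
$content = '<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value>';

$dom = new DOMDocument();
$dom->loadXML('<root>' . $content . '</root>');
$xpath = new DOMXPath($dom);

$pairs = [];
foreach ($xpath->query('//name') as $nameNode) {
    // The second argument makes the expression relative to this <name> node;
    // string(...) returns plain text instead of a node list.
    $pairs[$nameNode->textContent] = $xpath->evaluate('string(following-sibling::value[1])', $nameNode);
}
print_r($pairs);
```

Unlike nextSibling, the axis query still works if whitespace text nodes sit between </name> and <value>.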
I think this is what you're looking for:
$content = "<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>";
$pattern = "(\<name\>([^<]*)\<\/name\>\<value\>([^<]*)\<\/value\>)";
preg_match_all($pattern, $content, $matches);
$arr = array();
$count = count($matches[1]); // number of name/value pairs found
for ($i = 0; $i < $count; $i++) {
    $arr[$matches[1][$i]] = $matches[2][$i];
}
/* This is an example of how to use it */
echo "Location: " . $arr["Location"] . "<br><br>";
/* This is the array */
print_r($arr);
If your array has a lot of elements, don't call count() in the for loop condition; calculate the value first and then use it as a constant.
My PHP may be slightly off, but here's some PHP code to give some direction:
$pattern = '|<name>([^<]*)</name>\s*<value>([^<]*)</value>|';
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
$arr = array();
for ($i = 0; $i < count($matches); $i++) {
    $arr[$matches[$i][1]] = $matches[$i][2];
}
$arr is the array in which you want to store the name/value pairs.
Using XMLReader:
$content = '<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>';
$content = '<content>' . $content . '</content>';
$output = array();
$reader = new XMLReader();
$reader->XML($content);
$currentKey = null;
$currentValue = null;
while ($reader->read()) {
    // Only element start tags matter; their text is read with readString()
    if ($reader->nodeType !== XMLReader::ELEMENT) {
        continue;
    }
    switch ($reader->name) {
        case 'name':
            $currentKey = $reader->readString();
            break;
        case 'value':
            $currentValue = $reader->readString();
            break;
    }
    if (isset($currentKey) && isset($currentValue)) {
        $output[$currentKey] = $currentValue;
        $currentKey = null;
        $currentValue = null;
    }
}
$reader->close();
print_r($output);
The output is:
[Manufacturer] => John Deere
[Year] => 2001
[Location] => NSW
[Hours] => 6320