PHP exec output is getting truncated because of accents

I'm building a website on a Linux server that shows information about MKV files using the mkvmerge command line tool, but I'm facing a big issue when running the command $info = shell_exec("mkvmerge -J '".$chemin_fichier."'");
When the output of the command contains accents, the output is truncated:
Expected output:
{
"container": {
"properties": {
"is_providing_timestamps": true,
"title": "Le Bel Été 2019"
},
"type": "Matroska"
}
}
Actual output:
{
"container": {
"properties": {
"is_providing_timestamps": true,
"title": "Le Bel
I found on the web that I needed to change the language (locale) of the environment using putenv() and setlocale(). I did that, but it didn't work. Yet I can define PHP variables containing accents, so this is quite strange.
Anyway, when I run the same file on my own computer with WampServer, or run the same command in my Linux server's terminal, I get the correct output. So I think the problem comes from PHP (7.3) or Apache (2.4).
Do you have any idea? Feel free to ask for extra details :)

OK, so I figured it out!
It turns out that, on my server, the "fr_FR.utf8" locale does not work for accents and special characters, while "en_US.utf8" does and seems to be the only solution.
The default locale on my server was POSIX, which does not allow accents either.
To make things work, place the following lines in your PHP script:
$locale = 'en_US.utf8';
setlocale(LC_ALL, $locale);
putenv('LC_ALL='.$locale);
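The same fix can be wrapped in a small helper and combined with safer quoting of the file path. Below is a minimal sketch that assumes the en_US.utf8 locale is installed (check with `locale -a`). The helper names useUtf8Locale and mkvInfo are hypothetical. The path is passed through escapeshellarg() instead of being concatenated between single quotes, and the locale is set first because escapeshellarg() may drop non-ASCII characters while PHP's own locale is still C/POSIX.

```php
<?php
// Sets the locale for PHP itself and for every child process started afterwards.
// Assumption: the en_US.utf8 locale is installed on the server.
function useUtf8Locale(string $locale = 'en_US.utf8'): bool
{
    // setlocale() returns false when the locale is not installed on this machine.
    $ok = setlocale(LC_ALL, $locale) !== false;
    // putenv() changes the environment that shell_exec() passes to the child process.
    putenv('LC_ALL=' . $locale);
    return $ok;
}

// Hypothetical helper: decode mkvmerge's JSON description of a file.
// Call useUtf8Locale() first: escapeshellarg() may drop non-ASCII characters
// while PHP's own locale is still C/POSIX.
function mkvInfo(string $path): ?array
{
    // escapeshellarg() quotes the path safely, even if it contains quotes or spaces.
    $json = shell_exec('mkvmerge -J ' . escapeshellarg($path));
    if (!is_string($json)) {
        return null; // the command failed or printed nothing
    }
    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
}

// Usage (the path is illustrative):
// useUtf8Locale();
// $info = mkvInfo('/srv/videos/Le Bel Été 2019.mkv');
// echo $info['container']['properties']['title'] ?? '(no title)';
```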

Related

How to find out the character-encoding standard that has been used in a PHP file?

I'm using PHP 7.2.11 on my laptop, which runs Windows 10 Home Single Language 64-bit.
I've installed Apache/2.4.35 (Win32) and PHP 7.2.10 using the latest version of XAMPP.
I typed the code below into a file named demo.php:
<?php
$string1 = "Hel\xE1lo"; //Tried hexadecimal equivalent code-point from ISO-8859-1
echo $string1;
?>
Running the above program in my web browser gave the following output:
Hel�lo
Then I made a small change to the program and rewrote it as follows:
<?php
$string1 = "Hel\xC3\xA1lo"; //Tried hexadecimal equivalent code-point from UTF-8, C form
echo $string1;
?>
Running the modified program in my web browser gave the following output (indeed the expected result):
Helálo
This made me wonder about the following.
Is there any built-in function or mechanism in PHP that tells me which character encoding has been used in the current file?
P.S.: I know that in PHP a string literal is encoded in whatever encoding the script file is saved in. I want to know whether there is some built-in function, mechanism, or workaround that tells me the character encoding used in the file under consideration.

This function must be in the same file whose encoding is to be determined.
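// Returns 'UTF-8', 'iso-8859-1', 'cp850' or false.
// The literal 'ä' below is stored in whatever encoding this file was saved in,
// so comparing its bytes with known encodings of ä reveals the file's encoding.
function getPageCoding(){
    $codes = array(
        'UTF-8' => "\xc3\xa4",
        'iso-8859-1' => "\xe4",
        'cp850' => "\x84",
    );
    return array_search('ä', $codes);
}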
echo getPageCoding();
Demo: https://3v4l.org/UVvBM
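If you only need to know whether a file is valid UTF-8, an alternative (not the method above) is to read the file's raw bytes and validate them. The minimal sketch below relies on preg_match() with the u modifier, which fails on malformed UTF-8. One limitation: a pure-ASCII file is valid in both UTF-8 and ISO-8859-1, so this check cannot tell them apart. The function names are hypothetical.

```php
<?php
// Returns true if $bytes is well-formed UTF-8.
// preg_match() with the u modifier returns false (not 0) on malformed UTF-8.
function looksLikeUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Returns null if the file cannot be read.
function fileLooksLikeUtf8(string $path): ?bool
{
    $bytes = @file_get_contents($path);
    return $bytes === false ? null : looksLikeUtf8($bytes);
}

// Usage: check the script that is currently running.
// var_dump(fileLooksLikeUtf8(__FILE__));
```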

Fix indentation of PHP code files with php-cs-fixer

I have several hundred horribly indented PHP files with mixed tabs and spaces (and probably mixed line endings too), and I would like to fix them with php-cs-fixer v2+.
I have configured php-cs-fixer to my needs, and the code is cleaned up accordingly, except for the indentation. To pin down the problem, I tried a minimal configuration, shown below, but I cannot get the indentation fixer to work:
<?php
// php_cs.dist (php-cs-fixer v2 configuration)
return PhpCsFixer\Config::create()
    ->setRules([
        '@PSR2' => true,
        'indentation_type' => true,
        'braces' => ['position_after_functions_and_oop_constructs' => 'same'],
    ])
    ->setIndent("\t")
    ->setLineEnding("\r\n");
Currently, I run this on my Windows box using the following command (here for a single file):
php-cs-fixer.bat fix new_user.php --config /full/windowspath/to/php_cs.dist
Just in case, here is the generated php_cs.cache file (which contains the rules actually applied, in JSON):
{
    "php": "5.6.31",
    "version": "2.6.0:v2.6.0#5642a36a60c11cdd01488d192541a89bb44a4abf",
    "rules": {
        "blank_line_after_namespace": true,
        "braces": {
            "position_after_functions_and_oop_constructs": "same"
        },
        "class_definition": true,
        "elseif": true,
        "function_declaration": true,
        "indentation_type": true,
        "line_ending": true,
        "lowercase_constants": true,
        "lowercase_keywords": true,
        "method_argument_space": {
            "ensure_fully_multiline": true
        },
        "no_break_comment": true,
        "no_closing_tag": true,
        "no_spaces_after_function_name": true,
        "no_spaces_inside_parenthesis": true,
        "no_trailing_whitespace": true,
        "no_trailing_whitespace_in_comment": true,
        "single_blank_line_at_eof": true,
        "single_class_element_per_statement": {
            "elements": ["property"]
        },
        "single_import_per_statement": true,
        "single_line_after_imports": true,
        "switch_case_semicolon_to_colon": true,
        "switch_case_space": true,
        "visibility_required": true,
        "encoding": true,
        "full_opening_tag": true
    },
    "hashes": {
        "new_students.org_.php": -151826318
    }
}
And here is some badly indented sample file content.
<?php
session_start();
include 'connect.php';
include 'functions.php';
$test= "abc";
$additional_studs = "";
if (date('m') == 12 and $term='SP') {
$yr_suffix = date('y') + 1;
} else {
$yr_suffix = date('y');
}
function dup_stud($id, $conn)
{//...
}
$i = 0;
I am most annoyed by lines like $test="abc"; and include 'connect.php'; that have one or more leading tabs/spaces and do not get properly indented.
I am open to alternative approaches. Others must have faced formatting issues like this before.
I have also tried NetBeans, which happens to format the source beautifully, but it is tedious to open each file manually and apply the source formatting via shortcut.

You should use the braces fixer to enforce indentation:
The body of each structure MUST be enclosed by braces. Braces should be properly placed. Body of braces should be properly indented.
indentation_type simply enforces consistency.
But since both fixers are already included in @PSR2, the code should be fixed correctly.
See the relevant sections in the README.
Using your code, php-cs-fixer 2.6 produces the following:
<?php
$test= "abc";
$additional_studs = "";
if (date('m') == 12 and $term='SP') {
$yr_suffix = date('y') + 1;
} else {
$yr_suffix = date('y');
}
function dup_stud($id, $conn)
{//...
}
$i = 0;
The indentation is only partly fixed here.
I reduced it to the code below:
<?php
echo "a";
echo "b";
echo "c";
It looks like a bug in php-cs-fixer.

I will answer my own question based on the findings that led me to a resolution.
The formatting basically worked, but the catch for me was the indentation: wherever there were leading spaces or tabs, certain lines kept sticking out after the fix.
Since neither php-cs-fixer nor phpcbf was able to fix the indentation properly, I took desperate measures. As a preparatory step, I trimmed all leading whitespace from every line with sed, in a script like this:
sed "s/^[ \t]*//" -i test.php
Then I processed some of the prepared files again with php-cs-fixer and phpcbf to find out which one does a better job of formatting them according to PSR-2. It's embarrassing, but both fixers failed again, this time showing different shortcomings (i.e. bugs). To cut a long story short, I finally learned that chaining the two tools produces properly formatted code files. What a mess.
So, after sed, I run phpcbf
phpcbf --standard="PSR2" test.php
followed by
php-cs-fixer fix test.php --rules=@PSR2
And all of a sudden I have beautifully PSR-2 formatted PHP files. Not the most efficient way, but it does the job.
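The answer processes one file at a time, while the question involves several hundred files. Below is a minimal PHP sketch that runs the same three steps for each file. It assumes phpcbf and php-cs-fixer are installed and on the PATH. The helper names (stripLeadingWhitespace, formatterCommands, formatFile) are hypothetical.

```php
<?php
// Step 1: equivalent of `sed "s/^[ \t]*//" -i file.php`.
// Caution: like the sed command, it also strips leading whitespace inside
// heredocs and multi-line strings, so review the changes.
function stripLeadingWhitespace(string $source): string
{
    // The m modifier makes ^ match at the start of every line; only spaces and tabs are removed.
    return preg_replace('/^[ \t]+/m', '', $source);
}

// Steps 2 and 3: the two commands from the answer, with the path shell-quoted.
function formatterCommands(string $path): array
{
    $arg = escapeshellarg($path);
    return [
        'phpcbf --standard=PSR2 ' . $arg,
        'php-cs-fixer fix ' . $arg . ' --rules=@PSR2',
    ];
}

function formatFile(string $path): bool
{
    $source = file_get_contents($path);
    if ($source === false) {
        return false; // unreadable file: leave it untouched
    }
    file_put_contents($path, stripLeadingWhitespace($source));
    foreach (formatterCommands($path) as $command) {
        // The exit status is ignored: phpcbf can exit non-zero even after fixing a file.
        exec($command);
    }
    return true;
}

// Usage: format every .php file below ./src (the directory name is illustrative).
// $files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator('src'));
// foreach ($files as $file) {
//     if ($file->isFile() && $file->getExtension() === 'php') {
//         formatFile($file->getPathname());
//     }
// }
```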
Some additional comments:
If you would like to apply additional fixer rules, I would suggest doing so in a fourth step, using a different, more complete php_cs configuration on top of the PSR-2 baseline formatting (because, you know, there are more fixer issues...).
I suggest using 4 spaces for indentation, as required by PSR-2. In my experience, things get even more complicated if you insist on tabs.
The described procedure wouldn't be necessary if php-cs-fixer and phpcbf didn't have so many issues. I will report them one after another, and hopefully in the future the same result can be achieved in one go.

As for alternatives: I also had problems with automatic code formatting in Visual Studio Code. I tried several formatters, but only phpfmt solved my problems with indentation and brace placement. It also has many customization options, but I didn't test them, since I didn't need them.

The OP says:
I am open to alternative approaches. Others must have faced formatting issues like this before.
Our PHP Formatter will indent files nicely. Here is the OP's "badly indented" sample processed by the PHP Formatter:
C:\>DMSFormat PHP~v7 \temp\test.php
PHP~v7 PrettyPrinter Version 1.3.17
Copyright (C) 2004-2016 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit
DMS_PHP~v7_INPUT_ENCODING=ISO-8859-1
DMS_PHP~v7_OUTPUT_ENCODING=ISO-8859-1
Parsing \temp\test.php [encoding ISO-8859-1 +CRLF +LF +CR +NEL +1 /^I]
<?php
include 'connect.php';
include 'functions.php';
$test="abc";
$additional_studs="";
if (date('m') == 12 and $term='SP') {
$yr_suffix=date('y')+1;
}
else {
$yr_suffix=date('y');
}
function dup_stud($id,$conn) { //...
}
$i=0;
(I had to add <?php to the start of the file to make it legal.)
This example was run from a file to the console. You can also process one file into another file, or run an entire list of files using a project file [this is probably what the OP wants].
The PHP formatter uses a real PHP parser to process the source text and build an abstract syntax tree, and a special prettyprinter to print the AST back to nicely formatted text. It can't screw up the file.

GBP £ symbol in ASCII php file being converted to Â£ on live server (transferring with git)

I have a piece of PHP code that was written in Notepad++ on a Windows 7 machine.
The encoding in Notepad++ is set to "Encode to ANSI" (ASCII).
I am then doing this in my code:
utf8_encode("£")
so that I am sure to get the UTF-8 version of the £ symbol.
All works perfectly fine on the local server.
But when I push it up to my live server, I get all sorts of UTF-8 encoding errors in PHP.
Is something in the git push/pull process corrupting this, or is it perhaps a locale setting on the live server?
Both local and live servers run Ubuntu 12.04.
Thanks
Update 1
The actual error I'm getting is:
invalid byte sequence for encoding "UTF8": 0xa3'
(This is a PostgreSQL error.)
Another difference between local and live is that live is served over HTTPS and local over plain HTTP (both on Apache).
Update 2
Running:
file -bi script.php
on both local and live produces:
text/x-php; charset=iso-8859-1
So it seems as if the encoding of the file is intact?
Update 3
Looking at the local Postgres installation it has the following settings:
ENCODING = 'UTF8'
LC_COLLATE = 'en_GB.UTF-8'
LC_CTYPE = 'en_GB.UTF-8'
Whereas live has:
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
I'm going to see if I can swap the collate types to match local and see if that helps
Update 4
I'm doing this, which is what ultimately results in the failing piece of code on live (not local):
setlocale(LC_MONETARY, 'en_GB');
$equivFinal = utf8_encode("£") . money_format('%.2n', $equivFinal);
Update 5
I'm getting closer to the issue.
On local the string is produced as
Â£1.00
On live the string is produced as
Â£ï¿½1.00
So for some reason the live server is adding more crap in when doing the UTF8 conversion
Update 6
Ok so I've pinned it down to this:
setlocale(LC_MONETARY, 'en_GB');
Logger::getInstance(__NAMESPACE__)->info("TEST 01= " .money_format('%.2n', 1.00));
On local it outputs
TEST 01= 1.00
As expected
On live it outputs
TEST 01= ï¿½1.00
With the random characters added to the start, which is what is causing my utf8 issue as it's croaking on that.
Any idea why money_format would do that on one server and not another?

Finally nailed it: it's money_format.
If you don't specify a locale, or specify it incorrectly, it just does its own thing.
So I was doing
setlocale(LC_MONETARY, 'en_GB');
and on local that meant money_format simply left the £ off the start of the output,
but on live it meant that money_format put in the Unicode "WTF" character.
Doing it properly for Ubuntu, i.e.
setlocale(LC_MONETARY, 'en_GB.UTF-8');
means money_format outputs the £ at the front, so I don't need my utf8 rubbish.
Update 1
Better still, don't bother with setlocale; I'm just going to do this:
utf8_encode("£") . money_format('%!.2n', $equivFinal);
which formats the money and excludes the currency symbol prefix.
And better still, just use number_format:
utf8_encode("£") . number_format($equivFinal, 2);
I've learned something new :)
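For what it's worth, money_format() was deprecated in PHP 7.4 and removed in PHP 8.0, and utf8_encode() is deprecated as of PHP 8.2, so the number_format() route ages best. The minimal sketch below takes that route and also avoids depending on the encoding of the .php file by writing £ as its UTF-8 byte escape. It assumes the database expects UTF-8, as the Postgres error in the question suggests. formatGbp is a hypothetical name.

```php
<?php
// "\xC2\xA3" is the UTF-8 byte sequence for £. Because it is written with ASCII-only
// escapes, the result is valid UTF-8 whatever encoding the .php file is saved in.
function formatGbp(float $amount): string
{
    return "\xC2\xA3" . number_format($amount, 2);
}

// Usage:
// echo formatGbp(1234.5); // prints £1,234.50 in a UTF-8 page or terminal
```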

The issue is that you can't save a raw GBP symbol inside an ASCII file.

Never use weird characters in your source code, because no matter how much they "should" work, you always run into problems like this. (You can come up with your own definition of "weird", but mine is anything you can't type on a US-English keyboard without resorting to Alt codes.)
To get around this restriction, concatenate in the result of the chr() function. (Use the following code snippet to find the parameter you need to pass to chr(); it is 163 in this case.)
<?php echo(ord('£')); ?>
So in your case the line would read:
$equivFinal = chr(163) . money_format('%.2n', $equivFinal);
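One caveat follows from how the bytes work. chr(163) produces the single byte 0xA3. That byte is £ in ISO-8859-1/Windows-1252, but on its own it is not valid UTF-8, and it is exactly the byte PostgreSQL rejected in the question ("invalid byte sequence for encoding "UTF8": 0xa3"). Likewise, ord('£') returns 163 only if the file is saved as ISO-8859-1. In a UTF-8 file it returns 194 (0xC2, the first of two bytes). The small sketch below checks this using the preg_match() u-modifier trick. isValidUtf8 is a hypothetical helper.

```php
<?php
// Returns true if $bytes is well-formed UTF-8 (preg_match() returns false otherwise).
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$latin1Pound = chr(163);   // 0xA3: £ in ISO-8859-1 / Windows-1252
$utf8Pound   = "\xC2\xA3"; // £ in UTF-8, written as ASCII-only escapes

var_dump(isValidUtf8($latin1Pound)); // bool(false): a UTF-8 database rejects it
var_dump(isValidUtf8($utf8Pound));   // bool(true)
```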

PHP exec change encoding

I need to handle UTF-8 filenames with PHP's exec command. The problem is that exec does not seem to understand UTF-8. I use something like this:
echo exec('locale charmap');
which returns ANSI_X3.4-1968.
Looking at this SO question, the solution looks like this:
echo exec('LANG=de_DE.utf8; locale charmap');
But I still get the same output: ANSI_X3.4-1968
On the other hand, if I execute this PHP command on the bash command line:
php -r "echo exec('LANG=de_DE.UTF8 locale charmap');"
the output is UTF-8.
So the questions are:
Why is the result different when the PHP command is executed in bash versus in the Apache module/web page?
How can I set UTF-8 for exec when it runs inside a website as an Apache module?

To answer my own question, I found the following solution:
set the locale environment variable from PHP.
$locale='de_DE.UTF-8';
setlocale(LC_ALL,$locale);
putenv('LC_ALL='.$locale);
echo exec('locale charmap');
This sets and returns UTF-8, so I'm able to pass special characters and umlauts to Linux shell commands.
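An alternative that leaves PHP's own environment untouched is to pass the locale only to the child process, through the $env argument of proc_open(). Below is a minimal sketch; execWithEnv is a hypothetical helper. Note that $env replaces the child's entire environment, so PATH has to be passed along too. The de_DE.UTF-8 locale must still be installed on the server.

```php
<?php
// Runs $command through the shell with exactly the variables in $env and returns its stdout.
function execWithEnv(string $command, array $env): ?string
{
    $descriptors = [1 => ['pipe', 'w']]; // capture stdout; stdin and stderr are inherited
    $process = proc_open($command, $descriptors, $pipes, null, $env);
    if (!is_resource($process)) {
        return null;
    }
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($process);
    return $output === false ? null : $output;
}

// Usage: the child process sees only LC_ALL and PATH.
// echo execWithEnv('locale charmap', ['LC_ALL' => 'de_DE.UTF-8', 'PATH' => '/usr/bin:/bin']);
```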

This solves it for me (source: this comment here):
<?php
putenv('LANG=en_US.UTF-8');
$command = escapeshellcmd('python3 myscript.py');
$output = shell_exec($command);
echo $output;
?>

I had a similar problem. My program was returning German letters such as üäöß. Here is my code:
$programResult = shell_exec('my script');
The variable $programResult contained the German umlauts, but they were badly encoded. To encode them properly, you can call the utf8_encode() function (which converts from ISO-8859-1 to UTF-8):
$programResult = shell_exec('my script');
$programResult = utf8_encode($programResult);
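utf8_encode() assumes its input is ISO-8859-1. Calling it on output that is already UTF-8 therefore double-encodes it (ü becomes Ã¼), and the function is deprecated as of PHP 8.2. Below is a minimal sketch of a safer variant that converts only when the output is not already valid UTF-8. ensureUtf8 is a hypothetical name, and the byte conversion is the same ISO-8859-1-to-UTF-8 mapping that utf8_encode() performs. This is a heuristic: a short Latin-1 string can occasionally also be valid UTF-8.

```php
<?php
// Converts ISO-8859-1 text to UTF-8 unless it is already valid UTF-8.
function ensureUtf8(string $text): string
{
    if (preg_match('//u', $text) === 1) {
        return $text; // already valid UTF-8: leave it alone
    }
    $out = '';
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($text[$i]);
        if ($byte < 0x80) {
            $out .= $text[$i]; // ASCII is identical in both encodings
        } else {
            // ISO-8859-1 code points 0x80-0xFF become two-byte UTF-8 sequences
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}

// Usage ((string) turns a null/false result from shell_exec() into ''):
// $programResult = ensureUtf8((string) shell_exec('my script'));
```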

PHP Gettext - No translation

I am trying to use the PHP gettext extension in order to translate some strings. All functions appear to return the correct values but calling gettext()/_() returns the original string only. The PO/MO files seem correct and I believe I have set the directories up correctly. I am running WAMP Server with PHP 5.3.10 on Windows (also tried running 5.3.4 and 5.3.8 because I have the installations).
Firstly, see /new2/www/index.php:
$locale = 'esn'; # returns Spanish_Spain.1252 in var dump
putenv("LC_ALL={$locale}"); // Returns TRUE
setlocale(LC_ALL, $locale); // Returns 'Spanish_Spain.1252'
$domain = 'messages';
bindtextdomain($domain, './locale'); // Returns C:\wamp\www\new2\www\locale
bind_textdomain_codeset($domain, 'UTF-8'); // Returns UTF-8
textdomain($domain); // Returns 'messages'
print gettext("In the dashboard"); // Prints the original text, not the translation.
exit;
I have created the following file structure:
www/new2/www/locale/Spanish_Spain.1252/LC_MESSAGES/messages.mo
I have also tried replacing Spanish_Spain.1252 with: es_ES, esn, esp, Spanish, and Spanish_Spain.
The PO file used to generate the MO is like so (only the relevant entry given):
#: C:\wamp\www\new2/www/index.php:76
msgid "In the dashboard"
msgstr "TRANSLATED es_ES DASHBOARD"
This was generated using PoEdit. I have restarted Apache after adding any new .MO file. Also note that I was previously using Zend_Translate with Gettext and it was translating correctly. I wish to rely on the native gettext extension, though, in part because I am attempting to create a lightweight framework of my own.
Any help would be appreciated.
Edit: Amended directory structure. Note - will be able to try recent answers within 24hrs.

I set this up on my XAMPP instance and figured it out.
Flat out, setlocale does not work on Windows, so what it returns is irrelevant.
On Windows, you set the locale using the standard language/country codes (in this case es_ES, which is Spanish as spoken in Spain).
Under your locale directory, create es_ES/LC_MESSAGES/. This is where your messages.mo file lives.
$locale = 'es_ES';
putenv("LC_ALL={$locale}"); // Returns TRUE
$domain = 'messages';
bindtextdomain($domain, './locale');
bind_textdomain_codeset($domain, 'UTF-8');
textdomain($domain); // Returns 'messages'
print gettext("In the dashboard");
exit;
I am not sure whether this made a difference, but I did two things when creating the PO file. In Poedit, under File -> Preferences, I changed the line ending format to Windows. And after I created the initial PO file with Poedit, I opened it in Notepad++ and switched the encoding to UTF-8, since Poedit did not do this.
I hope this at least points you in the right direction.
References
PHP Localization Tutorial on Windows
Country Codes
Language Codes

Your code mentions this as the return value from bindtextdomain:
C:\wamp\www\new2\www\locale
With the setlocale of Spanish_Spain.1252 and textdomain of messages, calls to gettext will look in this path:
C:\wamp\www\new2\www\locale\Spanish_Spain.1252\LC_MESSAGES\messages.mo
But you created the file structure of:
www/new2/locale/Spanish_Spain.1252/LC_MESSAGES/messages.mo
(the www/ segment between new2/ and locale/ is missing)
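A small diagnostic along the lines of this answer is to build the path gettext would look in for each candidate locale name and report which ones actually contain the catalog. The sketch below assumes the <dir>/<locale>/LC_MESSAGES/<domain>.mo layout described above. moPath and findCatalogs are hypothetical helper names.

```php
<?php
// Path of the compiled catalog for one locale: <dir>/<locale>/LC_MESSAGES/<domain>.mo
// (forward slashes also work with PHP's file functions on Windows).
function moPath(string $dir, string $locale, string $domain): string
{
    return rtrim($dir, '/\\') . '/' . $locale . '/LC_MESSAGES/' . $domain . '.mo';
}

// Returns the candidate locale names for which a .mo file exists.
function findCatalogs(string $dir, array $locales, string $domain): array
{
    $found = [];
    foreach ($locales as $locale) {
        if (is_file(moPath($dir, $locale, $domain))) {
            $found[] = $locale;
        }
    }
    return $found;
}

// Usage (candidate names taken from the question):
// print_r(findCatalogs(__DIR__ . '/locale', ['Spanish_Spain.1252', 'es_ES', 'esn', 'esp'], 'messages'));
```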
Edit
Okay, so that didn't help. I've created a test script on Windows, using Poedit like you:
$locale = "Dutch_Netherlands.1252";
putenv("LC_ALL=$locale"); // 'true'
setlocale(LC_ALL, $locale); // 'Dutch_Netherlands.1252'
bindtextdomain("messages", "./locale"); // 'D:\work\so\l10n\locale'
textdomain("messages"); // 'messages'
echo _("Hello world"); // 'Hallo wereld'
My folder structure is like this:
D:\work\so\l10n\
\locale\Dutch_Netherlands.1252\LC_MESSAGES\messages.mo
\locale\Dutch_Netherlands.1252\LC_MESSAGES\messages.po
\test.php
Hope it helps, although it looks almost identical to yours. A few things I found online:
It's important to set the character set in the .po file.
Spaces inside the localization file might have a UTF8 alternative, so be wary of key lookups failing. Probably the best thing to test first is keys without spaces at all.

A suggestion: you may need the full locale for the .mo file. This is probably Spanish_Spain.UTF8 or esn_esn.UTF8 or esn_esp.UTF8 (not 1252, as you change the code base).
To track which directory it's looking in, you can install Process Monitor (http://technet.microsoft.com/en-us/sysinternals/bb896645). It spews out bucket loads of info, but you should be able to find out which file/directory is being looked for.
(My other thought is to check file permissions - but if you already had something similar in Zend_Translate, then probably not the cause, but worth checking anyway).
Sorry if not good - but might give you a clue.

Look here. It works for me on Windows and also on Linux. The last values in the array work for Windows. A list of language names can be found here. My catalogs are in:
./locales/en/LC_MESSAGES/domain.mo
./locales/cs/LC_MESSAGES/domain.mo
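setlocale() itself accepts an array of candidate names and returns the first one the operating system accepts, or false if none is available. That is one way to implement the "array of names, Windows names last" approach mentioned here. Below is a minimal sketch; chooseLocale is a hypothetical helper and the candidate names are illustrative.

```php
<?php
// setlocale() tries each candidate in order and returns the first name the OS accepts,
// or false if none of them is available.
function chooseLocale(array $candidates)
{
    $chosen = setlocale(LC_ALL, $candidates);
    if ($chosen !== false) {
        // Also export it, as the answers above do with putenv(); some gettext builds
        // (reportedly on Windows) read the environment rather than setlocale().
        putenv('LC_ALL=' . $chosen);
    }
    return $chosen;
}

// Usage (requires the gettext extension):
// chooseLocale(['es_ES.UTF-8', 'es_ES', 'Spanish_Spain.1252', 'esp']);
// bindtextdomain('messages', __DIR__ . '/locale');
// bind_textdomain_codeset('messages', 'UTF-8');
// textdomain('messages');
// echo _('In the dashboard');
```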

I have never tried using gettext on Windows, but each time I had problems with gettext on linux systems, the reason was that an appropriate language pack was not installed.

Another possible problem: when you change your *.po and *.mo files, you have to restart the Apache server. If that is inconvenient, a workaround is to always rename these files to a new name, so that they will be reloaded.
