PHP preg_replace_callback with unicode - php

I wrote a simple script below to simulate my problem.
Both my string and pattern contain unicode characters.
Basically, if I run it from the command line (php -f test.php), it prints "match" as expected.
But if I run it through a web server (Apache, http://localhost/test.php), it prints "no match".
I am using PHP 5.3.
Any idea why it behaves differently?
How do I make it work through web server?
thanks.
<?php
function myCallback($matches) {
return $matches[0];
}
$value = 'aaa äää';
$pattern = '/(\bäää)/u';
$value = preg_replace_callback($pattern, 'myCallback', $value, -1, $count);
if ($count > 0) {
echo "match";
} else {
echo 'no match';
}
?>

Try changing default_charset using ini_set('default_charset', 'utf-8').
If it works, it means that CLI and Apache PHP configs have separate php.ini configurations and perhaps this variable is set differently, or maybe based on environment.
You can leave that in as a solution or find an alternative.
Cheers,
Dan

Check your test.php for it to have the correct headers. In PHP you should state:
header('Content-Type: text/html; charset=utf-8');
As in your HTML head:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
By default it is set to ISO-8859-1, and maybe that is causing the problem. Here you can find some more information about handling multiple encodings (if UTF-8 is not acceptable) and about UTF-8 itself: http://devlog.info/2008/08/24/php-and-unicode-utf-8/
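One way to narrow this down is to check whether the regex engine rejects the subject as invalid UTF-8, rather than simply failing to match. The following is a minimal sketch, not the asker's code: describeMatch() is a hypothetical helper, the scalar type hints assume PHP 7 or later, and the byte escapes stand in for a script saved as UTF-8 versus ISO-8859-1.

```php
<?php
// Hypothetical helper: run the callback replace and report why nothing matched.
function describeMatch(string $pattern, string $subject): string
{
    $count = 0;
    $result = preg_replace_callback(
        $pattern,
        function (array $matches) {
            return $matches[0];
        },
        $subject,
        -1,
        $count
    );

    // With the /u modifier, a subject that is not valid UTF-8 makes the
    // function return NULL instead of reporting zero matches.
    if ($result === null) {
        return preg_last_error() === PREG_BAD_UTF8_ERROR
            ? 'invalid UTF-8'
            : 'regex error';
    }

    return $count > 0 ? 'match' : 'no match';
}

// \x{e4} is U+00E4 (ä), written as an escape so the pattern itself
// does not depend on how this file is saved.
$pattern = '/(\b\x{e4}{3})/u';

$utf8Subject   = "aaa \xC3\xA4\xC3\xA4\xC3\xA4"; // "aaa äää" as UTF-8 bytes
$latin1Subject = "aaa \xE4\xE4\xE4";             // "aaa äää" as ISO-8859-1 bytes

echo describeMatch($pattern, $utf8Subject), "\n";
echo describeMatch($pattern, $latin1Subject), "\n"; // invalid UTF-8
```

If the web server run reports "invalid UTF-8" while the command line run does not, the bytes reaching the regex differ between the two environments, which points at encoding configuration rather than at the pattern.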

Related

How to find out the character-encoding standard that has been used in a PHP file?

I'm using PHP 7.2.11 on my laptop that runs on Windows 10 Home Single Language 64-bit operating system.
I've installed Apache/2.4.35 (Win32) and PHP 7.2.10 using the latest version of XAMPP.
I typed the code below into a file named demo.php:
<?php
$string1 = "Hel\xE1lo"; //Tried hexadecimal equivalent code-point from ISO-8859-1
echo $string1;
?>
When I ran this program in my web browser, it gave the following output:
Hel�lo
Then I made a small change to the program and rewrote the code as below:
<?php
$string1 = "Hel\xC3\xA1lo"; //Tried hexadecimal equivalent code-point from UTF-8, C form
echo $string1;
?>
When I ran the changed program in my web browser, it gave the following output (the expected result):
Helálo
This raised a question for me.
Is there any built-in function or mechanism in PHP that will tell me which character encoding has been used in the current file?
P.S.: I know that in PHP a string is encoded in whatever way it is encoded in the script file. I want to know whether there is a built-in function, a mechanism, or any other workaround that will tell me the character encoding used in the file under consideration.
This function must be in the same file whose encoding is to be determined.
//return 'UTF-8', 'iso-8859-1',.. or false
function getPageCoding(){
$codes = array(
'UTF-8' => "\xc3\xa4",
'iso-8859-1' => "\xe4",
'cp850' => "\x84",
);
return array_search('ä',$codes);
}
echo getPageCoding();
Demo: https://3v4l.org/UVvBM

PHP strpos() not working

I am trying to get PHP to search a text file for a string. I know the string exists in the text, PHP can display all the text, and yet strpos returns false.
Here is my code:
<?php
$pyscript = "testscript.py";
//$path = "C:\\Users\\eneidhart\\Documents\\Python Scripts\\";
$process_path = "C:\\Users\\eneidhart\\Documents\\ProcessList.txt";
//$processcmd = "WMIC /OUTPUT: $process PROCESS get Caption,Commandline,Processid";
$process_file = fopen($process_path, "r") or die("Unable to open file!");
$processes = fread($process_file);
if (strpos($processes, $pyscript) !== FALSE) {
echo "$pyscript found";
} elseif (strpos($processes, $pyscript) === FALSE) {
echo "$pyscript NOT found :(";
} else {
echo "UHHHHHHHH...";
}
echo "<br />";
while (!feof($process_file)) {
echo fgets($process_file)."<br />";
}
fclose($processfile);
echo "End";
?>
The while loop will print out every line of the text file, including
python.exe python testscript.py
but strpos still can't seem to find "testscript.py" anywhere in it.
The final goal of this script is not necessarily to read that text file, but to check whether or not a particular python script is currently running. (I'm working on Windows 7, by the way.) The text file was generated using the commented out $processcmd and I've tried having PHP return the output of that command like this:
$result = `$processcmd`;
but no value was returned. Something about the format of this output seems to be disagreeing with PHP, which would explain why strpos isn't working, but this is the only command I know of that will show me which python script is running, rather than just showing me that python.exe is running. Is there a way to get this text readable, or even just a different way of getting PHP to recognize that a python script is running?
Thanks in advance!
EDIT:
I think I found the source of the problem. I created my own text file (test.txt) which only contained the string I was searching for, and used file_get_contents as was suggested, and that worked, though it did not work for the original text file. Turns out that the command listed under $processcmd creates a text file with Unicode encoding, not ANSI (which my test.txt was encoded in). Is it possible for that command to create a text file with a different encoding, or even simpler, tell PHP to use Unicode, not ANSI?
You can use the functions preg_grep() and file():
$process_path = "C:\\Users\\eneidhart\\Documents\\ProcessList.txt";
$results = preg_grep('/\btestscript\.py\b/', file($process_path));
if(count($results)) {
echo "string was found";
}
You should follow the advice given in the first comment and use either:
file_get_contents($process_path);
or
fread($process_file, filesize($process_path));
If that fix is not enough and there is actually a problem with strpos (which shouldn't be the case), you can use:
preg_match("/.*testscript\.py.*/", $processes)
NB: Really try to use strpos rather than preg_match here, because the documentation advises against preg_match when you only need to check whether one string contains another.
Well, I found the answer. Thanks to those of you who suggested using file_get_contents(), as I would not have gotten here without that advice. Turns out that WMIC outputs Unicode, and PHP did not like reading that. The solution was another command which converts Unicode to ANSI:
cmd.exe /a /c TYPE unicode_file.txt > ansi_file.txt
I hope this helps, for those of you out there trying to check if a particular python script is working, or if you're just trying to work with WMIC.
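An alternative to converting the file with cmd.exe is to decode it inside PHP before searching. This is only a sketch under assumptions: it assumes the WMIC output is UTF-16LE (optionally starting with a byte order mark), it needs the mbstring extension, and containsScriptName() is a hypothetical helper name. The sample string is built in memory so the example runs without the real ProcessList.txt.

```php
<?php
// Hypothetical helper: decode UTF-16LE text to UTF-8, then search it.
function containsScriptName(string $raw, string $needle): bool
{
    // Drop a UTF-16LE byte order mark if one is present.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $raw = substr($raw, 2);
    }

    // Assumption: the remaining bytes are UTF-16LE.
    $text = mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');

    return strpos($text, $needle) !== false;
}

// Simulate one line of WMIC output stored as UTF-16LE with a BOM.
$line   = "python.exe python testscript.py\r\n";
$sample = "\xFF\xFE" . mb_convert_encoding($line, 'UTF-16LE', 'UTF-8');

// Searching the raw bytes fails because every character is followed by a NUL byte.
var_dump(strpos($sample, 'testscript.py'));             // bool(false)
var_dump(containsScriptName($sample, 'testscript.py')); // bool(true)
```

With a real file, the raw bytes would come from file_get_contents($process_path) as suggested above.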

UTF-8 dates doesn't encode properly

I am having a hard time with character sets. I suspect that my function that displays the date returns non-UTF-8 characters (in août, the û is replaced by a question mark inside a diamond).
Everything is fine on my local server, but when I push my code to my staging server, the date is not displayed properly.
My PHP files are saved as UTF-8 without a BOM.
If I inspect my output page, the headers indicate UTF-8.
My local machine is a Mac with MAMP installed, and my staging server runs CentOS with cPanel.
Here is the part I suspect causing problem :
$langCode = "fr_FR"; /* Also tried fr_FR.UTF-8 */
setlocale(LC_ALL, $langCode);
$monthName = _(strftime("%B",strtotime($dateStr)));
echo $monthName; /* Also tried utf8_encode($monthName), which worked on my staging server but not on my local server! I'm using */
I finally found a way to track down the bug and fix it.
setlocale(LC_ALL, 'fr_FR');
var_dump(mb_detect_encoding(_(strftime("%B",strtotime($dateStr)))));
The dump returned UTF-8 locally and FALSE on the staging server.
PHP.net documentation about mb_detect_encoding()
Return Values ¶
The detected character encoding or FALSE if the encoding cannot be
detected from the given string.
So the charset can't be detected. I tried to force it again:
setlocale(LC_ALL, 'fr_FR.UTF-8');
var_dump(mb_detect_encoding(_(strftime("%B",strtotime($dateStr)))));
This time the dump returned UTF-8 both locally and on the staging server. So I rolled back my code to see why my first attempt with fr_FR.UTF-8 had not worked, and I realized I had been using utf8_encode() at the time. As user deceze points out in a comment on that function's documentation:
In fact, applying this function to text that is not encoded in ISO-8859-1 will most likely simply garble that text.
Thank you for your help everyone !
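The garbling described above can be reproduced without any locale setup. The sketch below is illustrative only: it uses mb_convert_encoding() from ISO-8859-1 to UTF-8 as a stand-in for utf8_encode(), writes "août" as explicit bytes so the result does not depend on how the file is saved, and toUtf8Once() is a hypothetical helper. The validity check is a heuristic, since some ISO-8859-1 byte sequences also happen to be valid UTF-8.

```php
<?php
$utf8   = "ao\xC3\xBBt"; // "août" as UTF-8 bytes
$latin1 = "ao\xFBt";     // "août" as ISO-8859-1 bytes

// Converting text that is already UTF-8 treats each byte as a Latin-1
// character and encodes it again, which produces mojibake.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($doubleEncoded), "\n"; // 616fc383c2bb74

// Hypothetical helper: convert only when the input is not valid UTF-8.
function toUtf8Once(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8, leave it alone
    }
    // Assumption: anything else is ISO-8859-1.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

var_dump(toUtf8Once($utf8) === $utf8);   // bool(true)
var_dump(toUtf8Once($latin1) === $utf8); // bool(true)
```

This matches the behaviour in the question: the conversion helps where strftime() returns ISO-8859-1 and hurts where it already returns UTF-8. Setting the locale to fr_FR.UTF-8 on both machines, as in the answer, avoids the need for any conversion.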
Put this meta tag in your HTML, inside <head></head>:
<meta charset="UTF-8">
It seems your server is configured to send the header
content-type: text/html; charset=UTF-8
by default. You could change your server configuration, or you could add at the very start
<?php
header("content-type: text/html; charset=UTF-8");
?>
to set this header yourself.
You need to use:
<?php
$conn = mysql_connect("localhost","root","root");
mysql_select_db("test");
mysql_query("SET NAMES 'utf8'", $conn);//put this line after you select db.

php utf-8 does not work

I tried adding header("Content-Type: text/html; charset=utf-8"); but it does not work.
<?PHP
header("Content-Type: text/html; charset=utf-8");
function randStr($rts=20){
$act_chars = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ";
$act_val = "";
for($act=0; $act <$rts ; $act++)
{
mt_srand((double)microtime()*1000000);
$act_val .= $act_chars[mt_rand(0,strlen($act_chars)-1)];
}
return $act_val;
}
$dene = randStr(16);
print "$dene";
?>
output
K��A�CÞZU����EJ
There are several mistakes. Don't use [] or strlen() on a multibyte string. Use mb_substr() and mb_strlen().
Also set the internal encoding of the multibyte extension to UTF-8:
mb_internal_encoding("UTF-8");
You could also set the encoding on a per-function basis. See the specific function signatures for further details.
Here is an improved version of your code:
header('Content-Type: text/html; charset=utf-8');
mb_internal_encoding("UTF-8");
function randStr($rts = 20) {
$act_chars = 'ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ';
$act_val = '';
$act_chars_last = mb_strlen($act_chars) - 1;
for($act = 0; $act < $rts; $act++) {
$act_val .= mb_substr($act_chars, mt_rand(0, $act_chars_last), 1);
}
return $act_val;
}
$dene = randStr(16);
print $dene;
Replaced double quotes with single quotes (saves a tiny amount of time).
Removed the mt_srand() calls because seeding is done automatically as of PHP 4.2.0 (thanks to feeela for mentioning that in a comment).
Saved the string length - 1 in a variable, so PHP doesn't need to recompute it in every loop step.
If you want more insight into this topic, check out the link from deceze in the answer below: What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.
The problem is that you're assembling a random gobbledygook of bytes and are telling the browser to interpret it as UTF-8. The standard PHP str functions assume one byte = one character. By randomly picking a multi-byte string apart with those you're not going to get whole characters, only bytes.
If you need encoding aware functions operating on a character level rather than a byte level, use the mb_* functions.
And read What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.
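The difference between bytes and characters is easy to see on a three-character string. This is a small sketch, assuming the mbstring extension is available; the letters are written as UTF-8 byte escapes so the result does not depend on how the file is saved.

```php
<?php
// "AÇĞ" as UTF-8: A is one byte, Ç and Ğ are two bytes each.
$chars = "A\xC3\x87\xC4\x9E";

echo strlen($chars), "\n";             // 5 (bytes)
echo mb_strlen($chars, 'UTF-8'), "\n"; // 3 (characters)

// Offset access returns a single byte: here the first half of Ç,
// which is not valid UTF-8 on its own.
$byte = $chars[1];
var_dump(mb_check_encoding($byte, 'UTF-8')); // bool(false)

// mb_substr() works on characters and returns the whole of Ç.
$char = mb_substr($chars, 1, 1, 'UTF-8');
var_dump($char === "\xC3\x87"); // bool(true)
```

Picking a random byte offset, as the original randStr() does, therefore produces lone half-characters, which the browser renders as replacement marks.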
Instead of using the mb_* functions, you could alternatively set the following variables in php.ini if you have access to it:
; The following mbstring variables should be set via php.ini or the vhost configuration (httpd.conf);
; they do not work per directory or via .htaccess
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.func_overload = 7
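That way strlen and similar functions would behave like their mb_* counterparts (strlen like mb_strlen, and so on). String offsets such as $act_chars[$i] are not affected by the overload and still index bytes, so that part of the function would still need mb_substr. Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so this only works on older versions.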
See php.net – Function Overloading Feature for more details.
See also:
http://www.php.net/manual/en/mbstring.configuration.php#ini.mbstring.language
http://www.php.net/manual/en/mbstring.configuration.php#ini.mbstring.func-overload
http://www.php.net/manual/en/mbstring.configuration.php#ini.mbstring.internal-encoding

shell_exec() to call a php file returns incorrect output character encoding

Ok, here's my scenario.
I have file.php that contains the following:
<?php
$output = shell_exec("php output.php");
echo $output;
?>
And the output.php contains the following:
<?php
echo "This is my output!";
?>
When I run file.php from a web browser, I get the following output:
‹ ÉÈ,V¢ÜJ…üÒ’‚ÒEÿÿp³*š
However, when I run the same php output.php directly from the shell, I get the correct output:
This is my output!
Now I'm well aware that this is some sort of encoding issue, but I cannot for the life of me figure out how to resolve it. I've tried setting the language using putenv('LANG=en_US.UTF-8');. I also tried using header('Content-Type: text/html; charset=UTF-8'); and even tried to determine what encoding is being output using mb_detect_encoding($out, 'UTF-8', true);, all without result.
exec() produces the same, malformed output.
I would really appreciate if anyone can shed some light on this and can possibly provide some insight on what is happening between the shell_exec and the output of the file to cause the output to be malformed.
The problem was the PHP output was being compressed twice, due to output compression being enabled.
The solution is to disable zlib.output_compression either by an entry in your .htaccess file, or by including the following at the top of your .php file:
ini_set('zlib.output_compression', 'Off');
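To confirm that the garbled text is compressed data rather than a character encoding problem, one option is to look for the gzip magic bytes at the start of the captured output. The sketch below rests on assumptions: it assumes the compression is gzip (which starts with the bytes 0x1F 0x8B), it needs the zlib extension, and decodeIfGzipped() is a hypothetical helper. The compressed sample is produced with gzencode() so the example does not depend on shell_exec().

```php
<?php
// Hypothetical helper: undo gzip compression if the data looks gzipped.
function decodeIfGzipped(string $output): string
{
    // gzip streams start with the two magic bytes 0x1F 0x8B.
    if (strncmp($output, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($output);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $output;
}

// Stand-in for what shell_exec() returned when compression was enabled.
$compressed = gzencode("This is my output!");

var_dump(decodeIfGzipped($compressed) === "This is my output!"); // bool(true)
var_dump(decodeIfGzipped("plain text") === "plain text");        // bool(true)
```

This is meant as a diagnostic only. Disabling zlib.output_compression, as described above, is the actual fix.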
