I'm testing the exact same functionality in two different environments: a local development environment and a staging server. Both run the exact same code.
When I send a curl request to the endpoint in each environment, I get two different results:
Local (php 5.4)
//this was the desired output
&lt;p&gt;&lt;span&gt;Awesome water shooting power&lt;/span&gt;&lt;/p&gt;
Staging (php 5.3)
//none of the html chars are changed.
<p><span>Awesome water shooting power</span></p>
The actual string of text is being run through htmlspecialchars in the following way:
htmlspecialchars( $req->get('description') )
Should I be passing all of the other arguments to htmlspecialchars explicitly in order to make it behave the same way in any environment? Or is there something at the php.ini level that could be causing this?
From the PHP documentation for htmlspecialchars:
Encoding: Defines encoding used in conversion. If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.
So, based on that, I tried setting the arguments explicitly, so they would not silently default to different things.
htmlspecialchars( $string , ENT_COMPAT, 'UTF-8' );
Now the output is the same in both environments.
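A minimal sketch of one way to keep this consistent across environments: wrap the call in a helper so the flags and encoding are always passed explicitly. The name escape_html is hypothetical and not part of the code above.

```php
<?php
// Hypothetical helper: always pass flags and encoding explicitly so the
// result does not depend on the PHP version's default encoding.
function escape_html($text)
{
    return htmlspecialchars($text, ENT_COMPAT, 'UTF-8');
}

echo escape_html('<p><span>Awesome water shooting power</span></p>'), "\n";
// prints: &lt;p&gt;&lt;span&gt;Awesome water shooting power&lt;/span&gt;&lt;/p&gt;
```

Calling the helper everywhere instead of bare htmlspecialchars means a later change of encoding only has to be made in one place.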
Related
I have a problem with htmlentities on one of my websites. I get a UTF-8 encoded string from the database, but when I pass it to htmlentities it returns an empty string.
After some testing I determined that the root of the problem is the PHP installation. Why? Because if I switch the PHP version in the Nginx config (sites-available file...) it works.
With "PHP Version 5.6.35-1+ubuntu14.04.1+deb.sury.org+1" it works correctly, but with "PHP Version 5.5.9-1ubuntu4.22" it doesn't. The database is the same and the files are the same; everything is identical apart from the PHP version.
Some testing code:
$text = "señal";
mb_detect_encoding($text); //returns 'UTF-8'
htmlspecialchars($text); //returns ''
htmlentities($text); //returns ''
htmlentities(utf8_encode($text)); //returns 'SEÑAL'
Is there something in the PHP configuration that I need to look at? Thanks!
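One possible cause (an assumption, since the actual bytes coming from the database are not shown) is that the string is not really valid UTF-8. When 'UTF-8' is passed explicitly, htmlentities returns an empty string for invalid input. The sketch below validates the bytes first and converts them before escaping; it assumes the mbstring extension is loaded and that the data is actually ISO-8859-1.

```php
<?php
// "señal" as ISO-8859-1 bytes (assumed here; written with an escape so the
// example does not depend on how this file is saved).
$text = "se\xF1al";

// mb_check_encoding validates the bytes instead of guessing.
var_dump(mb_check_encoding($text, 'UTF-8'));        // bool(false)

// Invalid UTF-8 input makes htmlentities return an empty string.
var_dump(htmlentities($text, ENT_QUOTES, 'UTF-8')); // string(0) ""

// Convert to UTF-8 first, then escape.
$utf8 = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
var_dump(htmlentities($utf8, ENT_QUOTES, 'UTF-8')); // string(12) "se&ntilde;al"
```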
I'm using PHP 7.2.11 on my laptop, which runs Windows 10 Home Single Language (64-bit).
I've installed Apache/2.4.35 (Win32) and PHP 7.2.10 using the latest version of XAMPP.
I typed the code below into a file named demo.php:
<?php
$string1 = "Hel\xE1lo"; //Tried hexadecimal equivalent code-point from ISO-8859-1
echo $string1;
?>
Running this program in my web browser gave the output below:
Hel�lo
Then I made a small change and rewrote the code as below:
<?php
$string1 = "Hel\xC3\xA1lo"; //Tried hexadecimal equivalent code-point from UTF-8, C form
echo $string1;
?>
Running the changed program in my web browser gave the output below (the expected result):
Helálo
That left me with a question.
Is there any built-in function or mechanism in PHP that will tell me which character encoding was used for the current file?
P.S.: I know that in PHP a string is encoded in whatever way it is encoded in the script file. I want to know whether there is a built-in function, mechanism, or other workaround that will tell me the character encoding of the file under consideration.
This function must be in the same file whose encoding is to be determined.
//return 'UTF-8', 'iso-8859-1',.. or false
function getPageCoding(){
$codes = array(
'UTF-8' => "\xc3\xa4",
'iso-8859-1' => "\xe4",
'cp850' => "\x84",
);
return array_search('ä',$codes);
}
echo getPageCoding();
Demo: https://3v4l.org/UVvBM
I have a web server with a hundred applications developed on PHP 5.2.3, Apache 2.2.4, MySQL 5.0.37 (databases with the utf8_general_ci collation).
I set up a new machine, to which I will port the web server, with PHP 5.5.12 (default_charset=UTF-8), Apache 2.4.9 (HTML head with content="text/html; charset=utf-8"), and MySQL 5.6.17 (test database with the utf8_general_ci collation).
In the PHP scripts I often called htmlentities in the form htmlentities($var) (not the best way, but I was a beginner), where $var is text read from MySQL that contains special characters like "è" (when I save to the database I use set var=_utf8'è').
The problem is that on the new server htmlentities returns nothing (the same code on the old server returns the correct &egrave;).
After some googling I found a solution, which is to rewrite the call as htmlentities(utf8_encode($var)), but correcting a hundred applications that way...
Is there a solution (with .ini variables, a database charset change, or similar) that keeps the "old" behavior of htmlentities?
[EDIT]
Thanks to CBroe's suggestion, I can solve the MySQL side of the problem by calling mysql_set_charset when I connect to the DB (in a common function).
But the problem remains for generic conversion, for example if I want to print the euro symbol and use htmlentities instead of remembering the HTML code.
As another note: htmlentities("è",ENT_QUOTES,'UTF-8') returns nothing, while htmlentities("è",ENT_QUOTES,'ISO-8859-1') and htmlentities("è",ENT_QUOTES,'') return the right result.
PS:
The problem is the same if I pass a string containing a special character, like "abcdè".
[EDIT]
I found a solution for the same problem on the ODBC connection:
https://www.saotn.org/php-56-default_charset-change-may-break-html-output/
so with default_charset = "iso-8859-1" set, the old applications still work fine.
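The results above are what you would expect if the script file itself is saved as ISO-8859-1, so that the literal "è" is the single byte 0xE8 (an assumption; the file encoding is not stated in the question). A small sketch using byte escapes, so it behaves the same however the example file is saved:

```php
<?php
$latin1 = "abcd\xE8";     // "abcdè" as ISO-8859-1 bytes
$utf8   = "abcd\xC3\xA8"; // "abcdè" as UTF-8 bytes

// The encoding argument has to match the bytes actually passed in.
var_dump(htmlentities($latin1, ENT_QUOTES, 'UTF-8'));      // string(0) ""
var_dump(htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1')); // string(12) "abcd&egrave;"
var_dump(htmlentities($utf8, ENT_QUOTES, 'UTF-8'));        // string(12) "abcd&egrave;"
```

Changing default_charset works because it changes which encoding htmlentities assumes when none is given; passing the encoding explicitly achieves the same thing per call.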
I have a website on a host that recently switched from PHP 5.2 to 5.4 and required us to choose a new php.ini file: 5.4 plain, 5.4 solo (just one php.ini file used throughout the site), or 5.4 fast.
I do not know which one I was using before the switch, but after I made it (I chose 5.4 solo), I noticed that a part of my website that depends on mbstring (multibyte characters) no longer works.
Specifically, it opens a text file full of characters, uses that in an encryption script, and stores the garbled-looking result in the MySQL database. To retrieve it, the data is run through the script again, decrypted, and displayed on the screen.
This worked just fine until the 5.4 change. Now it appears that it's unable to retrieve (open?) the text file. I have tested this with a non-multibyte character version and that works fine, so I don't think the issue is with the code, but rather with the way PHP is treating multibyte chars...and I suspect, just a hunch, that this is fixable by tweaking the PHP.ini file somehow. Zend.multibyte seems to be PHP's new thing.
My problem is that I have no idea what to tweak. I tried several different Zend.multibyte/mbstring combos and that didn't work.
I know that everything works up until a string is sent for encryption. It comes back as a null value, instead of a garbled string. I feel like something in the string is being rejected by PHP and thus it's failing...offering nothing instead of the string it should.
Does anyone have a thought as to what might be happening and why my script no longer works with 5.4? I have checked and the mbstring module IS loaded, with default values in the php.ini.
Any suggestions would be great...I'm totally stumped. Even some additional reports or ways to test or narrow down the problem would be fantastic.
Thank you!
Here is some code, where I think the problem is:
$this->s1 = "";
$s1array = array("a1.txt", "a2.txt", "a3.txt");
foreach ($s1array as $i => $value) {
$myFile = "../a/dir/somewhere/$s1array[$i]";
$fh = fopen($myFile, 'r');
$theData = fgets($fh);
fclose($fh);
$this->s1 .= html_entity_decode($theData, ENT_NOQUOTES, 'UTF-8');
}
The files ../a/dir/somewhere/a1.txt and ../a/dir/somewhere/a2.txt (etc.) contain semicolon-delimited strings of HTML character references, for example: & #x0fb0f;& #x02c97;& #x00436;& #x10833;& #x00514; (I added the spaces so it would show the code, not the rendered characters!).
But now, for some reason, the code above isn't returning any results. If I assign the result to a variable and echo that variable, there's nothing. But if I assign $this->s1 = "abcde"; (or a longer string) and skip the foreach part, it works. So something in this code no longer works in 5.4. Can anyone tell what's going on here? Thank you!
Why do you use fopen and so on for text files when you could use file_put_contents and file_get_contents? They are mostly wrappers around fopen, fread and so on. I have never had any problems with UTF-8 using those two functions.
Also make sure everything (PHP, the database if you are using one, and the .php files themselves) is encoded in or using UTF-8. There is nothing funnier than *.php files in, for example, latin2 and everything else in UTF-8.
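A rough sketch of that suggestion, with explicit checks so a failed read or an invalid result is visible instead of silently producing nothing. The temporary file and its contents are made up for illustration and only stand in for a1.txt. Note that file_get_contents reads the whole file, whereas the fgets call in the question reads only the first line. mb_check_encoding assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical sample data standing in for a1.txt.
$path = tempnam(sys_get_temp_dir(), 'ent');
file_put_contents($path, "&#x00436;&#x00514;\n");

$raw = file_get_contents($path);
if ($raw === false) {
    // A failed read is reported explicitly instead of silently yielding nothing.
    exit("Could not read $path\n");
}

$decoded = html_entity_decode(trim($raw), ENT_NOQUOTES, 'UTF-8');
var_dump(mb_check_encoding($decoded, 'UTF-8')); // is the result valid UTF-8?
echo bin2hex($decoded), "\n";                   // inspect the exact bytes
unlink($path);
```

Dumping the bytes at each stage (after reading, after decoding, before encryption) should narrow down which step starts returning an empty value.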
I have a piece of PHP code that was written in Notepad++ on a Windows 7 machine.
The encoding in Notepad++ is set to "Encode to ANSI" (ASCII).
I am then doing this in my code:
utf8_encode("£")
so I am sure to get the UTF-8 friendly version of the £ symbol.
All works perfectly fine on the local server.
But when I push it up to my live server I'm getting all sorts of issues with utf8 encoding errors in php.
Is something in the git push/pull process corrupting this, or is it perhaps a locale setting on the live server?
Both local and live servers run ubuntu 12.04
Thanks
Update 1
The actual error I'm getting is
invalid byte sequence for encoding "UTF8": 0xa3
(This is a Postgres SQL error)
The other difference between local and live is that live is served over https and local over plain http (both Apache).
Update 2
Running:
file -bi script.php
on both local and live produces:
text/x-php; charset=iso-8859-1
So it seems as if the encoding of the file is intact?
Update 3
Looking at the local Postgres installation it has the following settings:
ENCODING = 'UTF8'
LC_COLLATE = 'en_GB.UTF-8'
LC_CTYPE = 'en_GB.UTF-8'
Whereas live has:
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
I'm going to see if I can swap the collate types to match local and see if that helps
Update 4
I'm doing this, which is ultimately what results in the failing piece of code on live (not local):
setlocale(LC_MONETARY, 'en_GB');
$equivFinal = utf8_encode("£") . money_format('%.2n', $equivFinal);
Update 5
I'm getting closer to the issue.
On local the string is produced as
£1.00
On live the string is produced as
£�1.00
So for some reason the live server is adding more crap in when doing the UTF8 conversion
Update 6
Ok so I've pinned it down to this:
setlocale(LC_MONETARY, 'en_GB');
Logger::getInstance(__NAMESPACE__)->info("TEST 01= " .money_format('%.2n', 1.00));
On local it outputs
TEST 01= 1.00
As expected.
On live it outputs
TEST 01= �1.00
With the random characters added to the start, which is what is causing my UTF-8 issue, as it's choking on that.
Any idea why money_format would do that on one server and not another?
Finally nailed it.
It's money_format.
If you don't specify a locale, or specify it incorrectly, then it just does its own thing.
So I was doing
setlocale(LC_MONETARY, 'en_GB');
and on local that meant money_format just left the £ off the start of the output,
but on live it meant that money_format put in a byte that shows up as the Unicode replacement character (�).
Doing it properly for Ubuntu with
setlocale(LC_MONETARY, 'en_GB.UTF-8');
means money_format comes out with £ at the front, and therefore I don't need my utf8 rubbish.
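Since setlocale returns false when the requested locale is not installed, a quick check like the following sketch can show what each server actually applies. The second spelling, 'en_GB.utf8', is an assumed alternative name that some systems use; whether either exists depends on the machine.

```php
<?php
// Try a couple of common spellings of the locale name; setlocale returns
// the name it actually applied, or false if none of them is installed.
$applied = setlocale(LC_MONETARY, 'en_GB.UTF-8', 'en_GB.utf8');

if ($applied === false) {
    echo "en_GB.UTF-8 is not installed on this machine\n";
} else {
    $conv = localeconv();
    // Inspect the currency symbol's raw bytes: c2a3 is a UTF-8 pound sign,
    // a3 alone would be the ISO-8859-1 byte.
    echo $applied, ': ', bin2hex($conv['currency_symbol']), "\n";
}
```

Running this on both local and live should show directly whether the two servers resolve the locale to different encodings.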
Update 1
Better still, don't bother with setlocale and I'm just going to do this:
utf8_encode("£") . money_format('%!.2n', $equivFinal);
Which basically formats the money and excludes the symbol prefix
and then better still just use number_format and do
utf8_encode("£") . number_format($equivFinal, 2);
I've learnt something new :)
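A small sketch of that last variant. As a substitution of my own, it writes the pound sign as its UTF-8 bytes so the source file stays pure ASCII and no utf8_encode call is needed. The amount is made up for illustration. One reason to prefer this route is that money_format was removed in PHP 8.0, while number_format is still available.

```php
<?php
// The pound sign written as its UTF-8 bytes (U+00A3 -> C2 A3), so the source
// file stays pure ASCII and the result does not depend on the file encoding.
$pound = "\xC2\xA3";

$equivFinal = 1234.5; // sample amount, made up for illustration
$formatted = $pound . number_format($equivFinal, 2);

echo $formatted, "\n"; // prints: £1,234.50
```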
The issue is that you can't save a raw GBP symbol inside an ASCII file.
Never use weird characters in your source code, because no matter how much they "should" work you always run into problems like this. (You can come up with your own definition of "weird", but mine is anything you can't type on a US English keyboard without resorting to alt codes.)
To get around this restriction, concatenate the result of the chr() function instead. (Use the following code snippet to find the parameter you need to pass to chr; it is 163 in this case.)
<?php echo(ord('£')); ?>
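so in your case the line would read (chr(163) is the single ISO-8859-1 byte 0xA3, so it still needs utf8_encode before it goes into a UTF-8 database):
$equivFinal = utf8_encode(chr(163)) . money_format('%.2n', $equivFinal);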
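To see why the raw byte matters here, it can help to compare what chr(163) produces with its UTF-8 form. This is only a sketch; it uses mb_convert_encoding (which assumes the mbstring extension is loaded) in place of utf8_encode for the conversion.

```php
<?php
$latin1Pound = chr(163); // the single byte 0xA3 (pound sign in ISO-8859-1)
$utf8Pound = mb_convert_encoding($latin1Pound, 'UTF-8', 'ISO-8859-1');

echo bin2hex($latin1Pound), "\n"; // prints: a3
echo bin2hex($utf8Pound), "\n";   // prints: c2a3

// A lone 0xA3 is not valid UTF-8, which matches the Postgres error above.
var_dump(mb_check_encoding($latin1Pound, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8Pound, 'UTF-8'));   // bool(true)
```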