Get file name with diacritics in PHP - php

There are a lot of topics about diacritics/accents in PHP, but none of them solved my problem.
I have this code:
<!DOCTYPE html>
<html lang="sk">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
</head>
<body>
<?php
$items = scandir("test/");
echo $items[3];
?>
</body>
</html>
$items[3] is ľšá.png but it displays: ğšá.png
I tried:
foreach(mb_list_encodings() as $chr){
echo mb_convert_encoding($items[3], 'UTF-8', $chr) ." : ".$chr."<br>";
}
But none of them is right for me.
I also tried to put this before scandir():
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
ini_set('default_charset', 'utf-8');
But no change.
It is very strange, because my website had always worked until I noticed the issue today, and I did not change any code.

You tried to convert from single-byte encodings to UTF-8 (a multi-byte encoding), but the wrong file name you see contains two-character sequences in place of single letters, so it is already UTF-8!
You need to convert it from UTF-8, and for me it worked like this:
mb_convert_encoding($items[3], "ISO-8859-15", 'UTF-8'); // to ISO-8859-15, from UTF-8
Personally, I use iconv:
echo iconv("UTF-8","ISO-8859-15",$items[3]); // from UTF-8, to ISO-8859-15
but I think it makes no big difference which one you use, as long as it works.
I also suggest checking the file names on your web server, in case they were accidentally converted when they were uploaded.
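Before picking a conversion direction, it can help to look at the raw bytes that scandir() returned. The following is only a sketch: filenameToUtf8() is a hypothetical helper, and the ISO-8859-2 fallback is an assumption (a common legacy encoding for Slovak), not something the question confirms.

```php
<?php
// Hypothetical helper: return $name as UTF-8.
// Assumption: a name that is not valid UTF-8 is encoded in $fallback.
function filenameToUtf8(string $name, string $fallback = 'ISO-8859-2'): string
{
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name; // already UTF-8; converting again would corrupt it
    }
    $converted = iconv($fallback, 'UTF-8', $name);
    return $converted === false ? $name : $converted;
}

// "ľšá.png" written out as UTF-8 bytes, standing in for $items[3].
$name = "\xC4\xBE\xC5\xA1\xC3\xA1.png";
echo bin2hex($name), "\n";       // hex dump of what the filesystem returned
echo filenameToUtf8($name), "\n"; // valid UTF-8, so it is passed through unchanged
```

If the hex dump shows two bytes per accented letter (c4be, c5a1, c3a1), the name is already UTF-8 and the problem is more likely in how the page is served or displayed.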

Related

How to make sure that php files are actually saved with utf-8 encoding?

I'm running into an annoying PHP/UTF-8/NetBeans encoding problem. I spent all night yesterday searching Google and Stack Overflow for solutions. I tried many things, and I can't get it to work. I know I'm doing something wrong...
I have a PHP project that needs to print French characters like é è ç à, but instead of the letter it outputs �
I added this to my file:
<!DOCTYPE html>
<html>
<?php
header('Content-type: text/html; charset=utf-8');
?>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
</head>
Some users suggested modifying netbeans.conf by adding -J-Dfile.encoding=UTF-8 to the netbeans_default_options line.
Still the problem remains...
I right-clicked on the project -> Properties -> Sources, and my encoding is set to windows-1252. I tried to change it to UTF-8, but it gives me this warning: Changing to UTF-8 may cause some files to be unreadable. Would you like to continue?
I clicked yes but the problem remains...
So how can I make sure my file is encoded in UTF-8? Easy way please.
What I have done to resolve this kind of issue is to use:
iconv("ISO-8859-1", "UTF-8", $value);
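To answer the "is my file really UTF-8?" part directly, one option is to test the file's bytes. This is a minimal sketch: isUtf8File() is a hypothetical helper name, and the temporary files exist only for the demonstration. Note that a file containing only ASCII also counts as valid UTF-8.

```php
<?php
// Hypothetical helper: report whether a file's bytes form valid UTF-8.
function isUtf8File(string $path): bool
{
    $bytes = file_get_contents($path);
    return $bytes !== false && mb_check_encoding($bytes, 'UTF-8');
}

// Demo: "café" saved once as UTF-8 (é = C3 A9) and once as windows-1252 (é = E9).
$utf8File   = tempnam(sys_get_temp_dir(), 'enc');
$cp1252File = tempnam(sys_get_temp_dir(), 'enc');
file_put_contents($utf8File, "caf\xC3\xA9");
file_put_contents($cp1252File, "caf\xE9");

var_dump(isUtf8File($utf8File));   // bool(true)
var_dump(isUtf8File($cp1252File)); // bool(false)

unlink($utf8File);
unlink($cp1252File);
```

Running the helper on the project's own .php files (for example isUtf8File(__FILE__)) shows whether the editor actually saved them as UTF-8.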

PHP Encoding of Special Characters iso-8859-1

My PHP script parses a web site and pulls out an HTML DIV that looks like this (and saves it as a string)
<div id="merchantinfo">The following merchants: Nautica®, Brookstone®, Teds® ©2012 Blabla</div>
I store this as $merchantList (string).
However, when I output the data to the webpage
echo $merchantList;
The encoding gets messed up and displays as:
Nautica®, Brookstone®, Teds® ©2012 Blabla
I tried adding the following to the display page:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
But that didn't do anything. --Thanks
EDIT:
For the question, the accepted answer is correct.
But I realized my actual issue was slightly different.
The initial parsing using DOMDocument::loadHTML had already mangled the UTF-8 encoding, causing the string to save as
<div id="merchantinfo">The following merchants: Nauticaî, Brookstoneî, Tedsî ©2012 Blabla</div>
This was solved by:
$html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");
$dom->loadHTML($html);
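The 'HTML-ENTITIES' pseudo-encoding used above is deprecated as of PHP 8.2. A possible alternative, sketched here under the assumption that the input is a UTF-8 string, is to convert non-ASCII characters to numeric entities with mb_encode_numericentity() before handing the markup to DOMDocument. The sample markup is made up for the demonstration.

```php
<?php
// Assumption: $html is UTF-8. The bytes C2 AE and C2 A9 are "®" and "©".
$html = "<div id=\"merchantinfo\">Nautica\xC2\xAE \xC2\xA92012</div>";

// Map every code point from U+0080 upward to a numeric entity such as &#174;.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$ascii   = mb_encode_numericentity($html, $convmap, 'UTF-8');

$dom = new DOMDocument();
$dom->loadHTML($ascii);

// DOMDocument returns text as UTF-8, so the original characters come back intact.
$text = $dom->getElementsByTagName('div')->item(0)->textContent;
echo bin2hex($text), "\n";
```

Because the string passed to loadHTML() is pure ASCII, the parser no longer has to guess the encoding.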
Use:
ini_set('default_charset', 'UTF-8');
And do not use iso-8859-1. Use UTF-8.
From the mojibake you posted the input string is utf-8, not iso-8859-1.
You just need to use the htmlspecialchars_decode function, for example:
$string = '&quot;hello dude&quot;';
$decodechars = htmlspecialchars_decode($string);
echo $decodechars; // output : "hello dude"

read arabic characters from text files in PHP

I want to read Arabic characters from a text file and show them,
but they are shown as strange symbols like �
and they can't be compared with any characters.
Let's try to work on it together.
I will assume your code looks like this:
1- Your file should be written in UTF-8 encoding; the code handles the case where it is not.
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<?php
$file_path = "c:/home/user/text.txt";
$data = file_get_contents($file_path);
// if the file is already UTF-8, skip this line
$arabic_data = iconv("windows-1256", "UTF-8", $data);
echo $arabic_data;
?>
If you are a Linux user, you can also use iconv on the command line, which is more powerful and easier.
I'll update my answer if you need more help or provide more info.
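Once the text is UTF-8, individual characters can be compared like any other strings. The sketch below assumes the source bytes are Windows-1256 and that PHP 7.4 or later is available for mb_str_split(); windows1256ToUtf8() is a hypothetical helper name, and the sample bytes stand in for the contents of the file.

```php
<?php
// Hypothetical helper: convert Windows-1256 bytes to UTF-8.
function windows1256ToUtf8(string $data): string
{
    $converted = iconv('WINDOWS-1256', 'UTF-8', $data);
    if ($converted === false) {
        throw new RuntimeException('Conversion from Windows-1256 failed');
    }
    return $converted;
}

// Four Arabic letters (seen, lam, alef, meem) as Windows-1256 bytes.
$raw  = "\xD3\xE1\xC7\xE3";
$text = windows1256ToUtf8($raw);

// Split into characters, not bytes, before comparing.
$chars = mb_str_split($text, 1, 'UTF-8');
var_dump(count($chars));            // int(4)
var_dump($chars[0] === "\xD8\xB3"); // bool(true): first letter is U+0633 (seen)
```

Comparing byte by byte (for example with $text[0]) would not work here, because each Arabic letter takes two bytes in UTF-8.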
Try creating a .htaccess file in the root of your web application and put this line in it: AddDefaultCharset UTF-8
You need to know the encoding of these files. It can be UTF-8, it can be some transliteration, it can be something else. Then you have to convert to UTF-8 (not needed if the files are already UTF-8) and output the result in the page, with the page declaring itself as UTF-8, as in Mira's comment:
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
is a life saver.
If you are working with PHP, use
echo '<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />';
in your script.
Change the file encoding to UTF-8:
open the file in any editor and change the encoding to UTF-8 from within the editor.

PHP Display Special Characters

When I output the text £3.99 per M² from an XML file, the browser displays
it as Â£3.99 per MÂ². The XML file is in UTF-8 format. I wonder how to fix
this.
Make sure you're outputting UTF-8. That conversion sounds like your source is UTF-8, yet you're telling the browser to expect something else (Latin1?). You should send a header indicating to the browser UTF-8 is coming up, and you should have the correct meta header:
<?php
header ('Content-type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<?php echo "£3.99 per M²"; ?>
</body>
</html>
This should work correctly.
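To see why the mismatch produces extra characters, the effect can be reproduced in PHP itself. This is only an illustration of the mechanism, assuming the browser falls back to ISO-8859-1; the byte escapes are used so the example does not depend on how the script file is saved.

```php
<?php
// "£3.99 per M²" as UTF-8 bytes: £ is C2 A3 and ² is C2 B2.
$utf8 = "\xC2\xA33.99 per M\xC2\xB2";

// Treat each byte as one ISO-8859-1 character, as a browser does when it is
// told (or guesses) Latin-1 for a page that is really UTF-8.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $misread, "\n"; // Â£3.99 per MÂ²
```

The stray "Â" is the byte C2 shown as its own character, which is a typical sign of UTF-8 text being read as Latin-1.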
You should encode HTML entities.
You could try:
htmlentities($str, ENT_QUOTES, "UTF-8");
Look here for a complete reference
If you still have problems, you sometimes also have to decode the string with utf8_decode() first,
so you can try:
$str = utf8_decode($str);
$str = htmlentities($str, ENT_QUOTES);

PHP decode GB2312

I'm working on an IMAP email script and I have some lines encoded in GB2312 (which I assume is a Chinese encoding) that look like this: =?GB2312?B?foobarbazetc
How can I start working with this string? I checked mb_list_encodings() and this one is not listed.
If you have the base64-decoded data, then use mbstring or iconv. If you have the raw header, then use mbstring.
<?php
$t = "\xc4\xe3\xba\xc3\n";
echo iconv('GB2312', 'UTF-8', $t);
echo mb_convert_encoding($t, 'UTF-8', 'GB2312');
mb_internal_encoding('UTF-8');
echo mb_decode_mimeheader("=?gb2312?b?xOO6ww==?=");
?>
Ignacio solved the meat of the problem with mb_decode_mimeheader() but for future reference these links are also helpful:
http://developer.loftdigital.com/blog/php-utf-8-cheatsheet
http://www.herongyang.com/PHP-Chinese/PHP-UTF-8-Chinese-String-Literals.html
The specific header string I was working with:
$subject = "=?GB2312?B?tPC4tDogUXVvdGF0aW9uIFBJSSBwcm9kdWN0cyA=?= =?GB2312?B?Rk9CIFNoYW5naGFpIG9yIE5pbmdibyBwb3J0?=";
This required a page header of
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
and PHP
mb_internal_encoding('utf-8');
echo mb_decode_mimeheader($subject)."<br />";
to output
主题: Quotation PII products FOB Shanghai or Ningbo port
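For reference, a header in the =?charset?B?payload?= form can also be taken apart by hand, which shows what mb_decode_mimeheader() has to do internally. This is a rough sketch, not a full MIME parser: decodeEncodedWord() is a hypothetical helper that handles a single Base64 encoded-word only, and it assumes the local iconv build knows the charset named in the header.

```php
<?php
// Hypothetical helper: decode one Base64 ("B") encoded-word to UTF-8.
// Returns null if the input does not match the expected shape.
function decodeEncodedWord(string $word): ?string
{
    if (!preg_match('/^=\?([^?]+)\?B\?([^?]*)\?=$/i', $word, $m)) {
        return null;
    }
    $raw = base64_decode($m[2], true); // strict mode: reject invalid Base64
    if ($raw === false) {
        return null;
    }
    $utf8 = iconv(strtoupper($m[1]), 'UTF-8', $raw);
    return $utf8 === false ? null : $utf8;
}

// Same sample word as in the answer above; the payload decodes to C4 E3 BA C3.
$decoded = decodeEncodedWord('=?gb2312?b?xOO6ww==?=');
echo bin2hex($decoded), "\n"; // e4bda0e5a5bd
```

A real subject line can contain several encoded-words and quoted-printable ("Q") parts, so mb_decode_mimeheader() remains the more practical choice.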
