I will be brief. My FTP function returns wrong encoding of filenames
$conn_id = ftp_connect("site.com");
ftp_login($conn_id, "login", "pass");
ftp_pasv($conn_id, true);
$buff = ftp_nlist($conn_id, "./");
print_r($buff);
-> // result
array() {
[0]=> "��.txt"
}
The file name has Windows-1251 encoding.
I tried to connect to FTP via nodejs but it also returns something creepy — òð.txt.
My desktop client (WinSCP) however works fine with this.
PS: I tried to use utf8_encode - but that's also not working for me.
If the encoding is off, you could try to change it using mb_convert_encoding. The code below should output the correct value, provided the source encoding really is Windows-1251.
<?php
echo mb_convert_encoding($buff[0], "UTF-8");
//or
echo mb_convert_encoding($buff[0], "UTF-8", "windows-1251");
?>
If it doesn't work, you can try to find the right encoding by looping over every encoding mbstring knows, using something like
<?php
foreach(mb_list_encodings() as $chr){
echo mb_convert_encoding($buff[0], 'UTF-8', $chr)." : ".$chr."<br>";
}
?>
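For the filename in the question, the two characters that nodejs showed as òð are the bytes 0xF2 0xF0, and read as Windows-1251 those bytes are Cyrillic letters. A minimal sketch of converting a whole listing, assuming the server really sends Windows-1251 (the helper name listingToUtf8 is made up for illustration):

```php
<?php
// Hypothetical helper: convert every name returned by ftp_nlist()
// from an assumed source encoding (Windows-1251 here) to UTF-8.
function listingToUtf8(array $names, string $from = 'Windows-1251'): array
{
    return array_map(
        function ($name) use ($from) {
            return mb_convert_encoding($name, 'UTF-8', $from);
        },
        $names
    );
}

// Bytes 0xF2 0xF0 are what a Latin-1 viewer displays as "òð".
print_r(listingToUtf8(["\xF2\xF0.txt"]));
```

With those sample bytes the converted name is тр.txt. If the server uses a different code page, pass it as the second argument.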
Many (but not all) FTP servers support UTF-8 encoding of pathnames. You can turn this feature on by issuing the 'OPTS UTF8 ON' command before the ftp_nlist call.
ftp_raw($conn_id, 'OPTS UTF8 ON');
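Since not every server accepts the command, it may be worth checking the reply before trusting the listing. A rough sketch, assuming $conn is an already logged-in connection and treating any 2xx reply code as success (both function names are hypothetical):

```php
<?php
// True when the first line of an FTP reply starts with a 2xx
// (positive completion) reply code.
function isPositiveFtpReply(array $replyLines): bool
{
    return isset($replyLines[0]) && preg_match('/^2\d\d/', $replyLines[0]) === 1;
}

// Ask the server to switch to UTF-8 pathnames and report whether it agreed.
// $conn is assumed to be a connection from ftp_connect() after ftp_login().
function enableUtf8Pathnames($conn): bool
{
    $reply = ftp_raw($conn, 'OPTS UTF8 ON');
    return is_array($reply) && isPositiveFtpReply($reply);
}
```

If enableUtf8Pathnames() returns false, the names from ftp_nlist are probably still in the server's own code page and need converting.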
First, set the content type on your page.
header('Content-Type: text/html; charset=utf-8');
Then try this; hope it helps:
str_replace(array('%82','%94','+'),array('é','ö',' '),urlencode($folder_name));
It's not the best way, but it works for me. If you URL-encode a string, it changes the awkward characters into escapes such as %82, and you can then replace these with the characters you want.
You can also try using the iconv function. Hopefully it will solve your problem.
I have a CSV file. If I set the charset to ISO-8859-2 (Eastern Europe) in Libre Calc, it renders the characters correctly, but the server's locale is set to EN-UK and there I cannot read the characters correctly. For example:
it returns T�t instead of Tót.
I tried many things like:
echo (mb_detect_encoding("T�t","ISO-8859-2","UTF-8"));
I know probably the char does not exist in UTF-8 but I tried.
Also tried to setup the correct charset in the header:
header('Content-Type: text/html; charset=iso-8859-2');
echo "T�th";
but it returns TÄĹźËth instead of Tóth.
Please help me solve this, thanks in advance
I advise against setting the header to charset=iso-8859-2. It is usual to work with UTF-8. If the data is available in a different encoding, it should be converted to UTF-8 and then processed as CSV. The following example code can be kept this simple because the newline characters are the same in UTF-8 and ISO-8859-2.
$fileName = "yourpath/Iso8859_2.csv";
$fp = fopen($fileName,"r");
while($row = fgets($fp)){
$strUtf8 = mb_convert_encoding($row,'UTF-8','ISO-8859-2');
$arr = str_getcsv($strUtf8);
var_dump($arr);
}
fclose($fp);
The exact encoding of the CSV file must be known. mb_detect_encoding is not suitable for determining the encoding of a file.
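To see why detection cannot replace knowing the encoding, note that the same bytes are valid in several single-byte encodings and simply mean different letters. A small sketch with made-up sample bytes (the helper decodeAs is only for illustration):

```php
<?php
// Decode raw bytes to UTF-8 under an encoding chosen by the caller.
function decodeAs(string $raw, string $encoding): string
{
    return mb_convert_encoding($raw, 'UTF-8', $encoding);
}

$raw = "T\xF5th"; // hypothetical sample: one non-ASCII byte, 0xF5

echo decodeAs($raw, 'ISO-8859-2'), "\n"; // Tőth
echo decodeAs($raw, 'ISO-8859-1'), "\n"; // Tõth
```

Both results are plausible text, so nothing in the bytes themselves says which reading is the right one.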
I can't figure out what I'm doing wrong. I'm getting file content from the database. When I echo the content, everything displays just fine, when I write it to a file (.html) it breaks. I've tried iconv and a few other solutions, but I just don't understand what I should put for the first parameter, I've tried blanks, and that didn't work very well either. I assume it's coming out of the DB as UTF-8 if it's echoing properly. Been stuck a little while now without much luck.
function file($fileName, $content) {
if (!file_exists("out/".$fileName)) {
$file_handle = fopen(DOCROOT . "out/".$fileName, "wb") or die("can't open file");
fwrite($file_handle, iconv('UTF-8', 'UTF-8', $content));
fclose($file_handle);
return TRUE;
} else {
return FALSE;
}
}
Source of the html file looks like.
Comes out of the DB like this:
<h5>Текущая стабильная версия CMS</h5>
goes in file like this
<h5>Ð¢ÐµÐºÑƒÑ‰Ð°Ñ ÑÑ‚Ð°Ð±Ð¸Ð»ÑŒÐ½Ð°Ñ Ð²ÐµÑ€ÑÐ¸Ñ CMS</h5>
EDIT:
Turns out the root of the problem was Apache serving the files incorrectly. Adding
AddDefaultCharset utf-8
to my .htaccess file fixed it. Hours wasted... At least I learned something though.
Edit: The database encoding does not seem to be the issue here, so this part of the answer is retained for information only
I assume it's coming out of the DB as UTF-8
This is most likely your problem. What database type do you use? Have you set the character encoding and collation for the database, the table, the connection and the transfer?
If I was to hazard a guess, I would say your table is MySQL and that your MySQL collation for the database / table / column should all be UTF8_general_ci ?
However, for some reason MySQL's utf8 is not actually full UTF-8: it stores at most 3 bytes per character rather than 4, so it cannot store the whole UTF-8 character set; see UTF-8 all the way through.
So you need to go through every table and column in your MySQL database and change it from utf8_ to utf8mb4_ (note: available since MySQL 5.5.3), which allows up to 4 bytes per character and so covers the whole range of UTF-8 characters.
Also if you do any PHP work on the data strings be aware you should be using mb_ PHP functions for multibyte encodings.
And finally, you need to specify a connection character set for the database. Don't run with the default one, as it will almost certainly not be utf8mb4, and then you can have the correct data in the database but have it repackaged as 3-byte utf8 on the connection before being treated as 4-byte UTF-8 by PHP at the other end.
Hope this helps, and if your DB is not MySQL, let us know what it is!
Edit:
function file($fileName, $content) {
if (!file_exists("out/".$fileName)) {
$file_handle = fopen(DOCROOT . "out/".$fileName, "wb") or die("can't open file");
fwrite($file_handle, iconv('UTF-8', 'UTF-8', $content));
fclose($file_handle);
return TRUE;
} else {
return FALSE;
}
}
Your $file_handle is trying to open a file inside an if statement that will only run if the file does not exist.
Your iconv call is worthless here, as it converts from "utf-8" to, er, "utf-8". Character detection is extremely haphazard and hard for programs to do correctly, so it's generally advised not to try to work out or guess what a character encoding is; you need to know what it is and tell the function.
The comment by Dean is actually very important. The HTML should have a <meta charset="UTF-8"> inside <head>.
That iconv call is actually not useful and, if you are right that you are getting your content as UTF-8, it is not necessary.
You should check the character set of your database connection. Your database can be encoded in UTF-8 but the connection could be in another character set.
Good luck!
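The asker's own fix suggests the bytes on disk were fine and only the announced charset was wrong. A minimal sketch of how one might check both halves of that, assuming the script file itself is saved as UTF-8 (both function names are invented for the example):

```php
<?php
// Write a string to a file and report whether the bytes on disk
// are identical to the bytes that were passed in.
function writesUnchanged(string $path, string $content): bool
{
    file_put_contents($path, $content);
    return file_get_contents($path) === $content;
}

// Decode UTF-8 bytes as if they were Windows-1252, which is roughly what
// a browser does when the server announces the wrong charset.
function misreadAsWindows1252(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

$path = tempnam(sys_get_temp_dir(), 'enc'); // throwaway file for the demo
var_dump(writesUnchanged($path, '<h5>Тек</h5>')); // bool(true)
echo misreadAsWindows1252('Тек'), "\n";            // Ð¢ÐµÐº
unlink($path);
```

The second line of output matches the start of the broken heading in the question, which points at the charset declaration rather than at fwrite.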
I'm having some problem with mkdir
I'm using XAMPP on Windows. When I try to create a directory, the result is not what it should be. For example,
mkdir(JPATH_SITE.'/images/projects/'.$region_folder.'/'.$project_folder, 0777, true);
should create something like
/images/projects/Ленинградская_область/Ленинградская_область_1
but it creates a directory like:
/images/projects/Ленинградская_область/Ленинградская_область_1
Is it something to do with encoding, or with the OS?
Windows filenames, as PHP sees them here, are not encoded in UTF-8 but in the system code page, such as Windows-1252 or Windows-1251.
Try this:
$dirname = JPATH_SITE.'/images/projects/'.$region_folder.'/'.$project_folder;
// Replace "UTF-8" with the actual input charset if it is not UTF-8.
// For Russian names the target should be Windows-1251; Windows-1252 cannot represent Cyrillic.
$dirname = iconv("UTF-8","Windows-1251",$dirname);
mkdir($dirname, 0777, true);
// Alternatively, use iconv on your Russian variables only, before building $dirname.
// Remember that you might need to change UTF-8 to another input charset.
$region_folder = iconv("UTF-8","Windows-1251",$region_folder);
$project_folder = iconv("UTF-8","Windows-1251",$project_folder);
Read more about iconv here: PHP iconv()
Also useful for guessing your charset encoding: mb_detect_encoding()
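A conversion to a legacy code page can silently lose characters, so it may help to verify the result before calling mkdir. A sketch using a round trip through mb_convert_encoding; the helper name toCodePage and the choice of exception are assumptions for illustration:

```php
<?php
// Hypothetical helper: convert a UTF-8 name to a legacy code page and
// reject names the code page cannot represent (checked by converting back).
function toCodePage(string $utf8Name, string $codePage): string
{
    $converted = mb_convert_encoding($utf8Name, $codePage, 'UTF-8');
    if (mb_convert_encoding($converted, 'UTF-8', $codePage) !== $utf8Name) {
        throw new InvalidArgumentException("Name is not representable in $codePage");
    }
    return $converted;
}
```

For example, toCodePage($region_folder, 'Windows-1251') would return the converted name for a Russian folder name, while the same call with 'Windows-1252' would throw instead of producing question marks.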
I am creating a file using php fwrite() and I know all my data is in UTF8 ( I have done extensive testing on this - when saving data to db and outputting on normal webpage all work fine and report as utf8.), but I am being told the file I am outputting contains non utf8 data :( Is there a command in bash (CentOS) to check the format of a file?
When using vim it shows the content as:
Donâ~#~Yt do anything .... Itâ~#~Ys a
great site with
everything....Weâ~#~Yve only just
launched/
Any help would be appreciated: Either confirming the file is UTF8 or how to write utf8 content to a file.
UPDATE
To clarify how I know I have data in UTF8, I have done the following:
The DB is set to utf8. When saving data to the database I run this first:
$enc = mb_detect_encoding($data);
$data = mb_convert_encoding($data, "UTF-8", $enc);
Just before I run fwrite I check the data with the following. Note that each piece of data returns 'IS utf-8':
if (strlen($data)==mb_strlen($data, 'UTF-8')) print 'NOT UTF-8';
else print 'IS utf-8';
Thanks!
If you know the data is in UTF8 then you want to set up the header.
I wrote a solution in answer to another thread.
The solution is the following: as the UTF-8 byte-order mark is \xef\xbb\xbf, we should add it to the start of the document.
<?php
function writeStringToFile($file, $string){
$f=fopen($file, "wb");
$string="\xEF\xBB\xBF".$string; // prepend the UTF-8 BOM; this is what makes the magic
fputs($f, $string);
fclose($f);
}
?>
You can adapt it to your code, basically you just want to make sure that you write a UTF8 file (as you said you know your content is UTF8 encoded).
fwrite() itself is binary-safe, but a stream opened in text mode can have its data altered by translation on some systems, so your data, correctly encoded or not, might get mangled on the way out.
To be on the safe side, you should use fopen() with the binary mode flag, that is b. Afterwards, fwrite() will save your string data "as-is", which in PHP means binary data, because strings in PHP are binary strings.
Background: some systems distinguish between text and binary data. The binary flag explicitly tells PHP to use binary output on such systems. When you deal with UTF-8 you should take care that the data does not get mangled. Handling the string data as binary data prevents that.
However: if, contrary to what you say in your question, the UTF-8 encoding of the data has not been preserved, then your encoding is already broken and even binary-safe handling will keep it broken. Still, with the binary flag you ensure that it is not the fwrite() part of your application that is breaking things.
It has been rightly written in another answer here that you cannot know the encoding if you have only the data. However, you can check whether the data validates as UTF-8, which gives you at least some chance to check the encoding. I've posted a PHP function that does this in a UTF-8 related question, so it might be of use if you need to debug things: Answer to: SimpleXML and Chinese. Look for can_be_valid_utf8_statemachine, that's the name of the function.
//add BOM to fix UTF-8 in Excel
fputs($fp, $bom =( chr(0xEF) . chr(0xBB) . chr(0xBF) ));
I find this piece works for me :)
The problem is your data is double encoded. I assume your original text is something like:
Don’t do anything
with ’, i.e., not the straight apostrophe, but the right single quotation mark.
If you write a PHP script with this content and encoded in UTF-8:
<?php
//File in UTF-8
echo utf8_encode("Don’t"); //this will double encode
You will get something similar to your output.
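The double encoding can be reproduced byte by byte. The sketch below uses mb_convert_encoding from ISO-8859-1, which performs the same transformation as utf8_encode(); the function names are invented for the example:

```php
<?php
// Treat UTF-8 bytes as ISO-8859-1 and encode them to UTF-8 again.
function doubleEncode(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// Undo exactly one layer of that double encoding.
function undoDoubleEncode(string $doubled): string
{
    return mb_convert_encoding($doubled, 'ISO-8859-1', 'UTF-8');
}

$quote = "\xE2\x80\x99"; // U+2019 RIGHT SINGLE QUOTATION MARK in UTF-8
echo bin2hex(doubleEncode($quote)), "\n";                   // c3a2c280c299
echo bin2hex(undoDoubleEncode(doubleEncode($quote))), "\n"; // e28099
```

The doubled form is â followed by two control characters, which is consistent with the â~#~Y sequence vim shows in the question.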
$handle = fopen($file,"w");
fwrite($handle, pack("CCC",0xef,0xbb,0xbf)); // UTF-8 BOM
fwrite($handle,$content); // $content (assumed name) holds the UTF-8 data to write
fclose($handle);
I know all my data is in UTF8 - wrong.
Encoding is not the format of a file. So check the charset in the headers of the page you are taking the data from:
header("Content-type: text/html; charset=utf-8;");
And check whether the data really is in a multi-byte encoding:
if (strlen($data)==mb_strlen($data, 'UTF-8')) print 'not UTF-8';
else print 'utf-8';
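Note that comparing strlen with mb_strlen only shows whether the string contains multi-byte sequences; a pure ASCII string prints 'not UTF-8' even though ASCII is valid UTF-8. A stricter check, sketched here with mb_check_encoding (the wrapper name isValidUtf8 is made up):

```php
<?php
// Validate that a byte string is well-formed UTF-8.
function isValidUtf8(string $data): bool
{
    return mb_check_encoding($data, 'UTF-8');
}

var_dump(isValidUtf8('plain ASCII'));       // bool(true): ASCII is valid UTF-8
var_dump(isValidUtf8("Don\xE2\x80\x99t")); // bool(true): proper UTF-8 quote
var_dump(isValidUtf8("Don\x92t"));         // bool(false): lone Windows-1252 quote byte
```

Even this only proves the bytes are well-formed; double-encoded text is also valid UTF-8, so a true result does not guarantee the text is what you intended.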
There could be a reason for this:
first, the information you get from the database may not be UTF-8.
If you are sure it is, use this; I always use it and it works:
$file= fopen('../logs/logs.txt','a');
fwrite($file,PHP_EOL."_____________________output_____________________".PHP_EOL);
fwrite($file,print_r($value,true));
The only thing I had to do is add a UTF8 BOM to the CSV, the data was correct but the file reader (external application) couldn't read the file properly without the BOM
Try this simple method: add the following at the top of the page, before the <body> tag:
<head>
<meta charset="utf-8">
</head>