Arabic characters and UTF-8 in aria2 - php

I use aria2 to download files over XML-RPC. When I start a download like this in PHP:
$client->aria2_addUri( array($url), array("dir"=>'/home/amir/دانلود') );
it creates a folder named شسÛب instead of دانلود. I posted about this on the aria2 forums, and they said aria2 has no problem as long as the string is sent to it as UTF-8.
So I set a UTF-8 header and converted the string to UTF-8, but it still doesn't work:
header('Content-type:application/json; charset=utf-8');
$dir_on_server = mb_convert_encoding($dir_on_server, 'UTF-8');
What do you think?

Try accessing the file or folder via the browser.
You can do this by writing a .htaccess file with the content "Options Indexes" so that your folders are listed. (I can even access them via HTTP.)
I created several files and folders with a script in which the GET value file or folder determines the name, and I tried it with Japanese and Arabic characters. They are not shown correctly over FTP (in my case file names appear only as "?????"), but they are displayed correctly when a script reads them.
The problem might be the program you use to access your FTP. WinSCP, for example, has UTF-8 set to "auto" by default, so forcing it might help. (I have to admit that this does not work on my side; maybe my Linux server does not support UTF-8 file names, which could also be the problem for you.)
PS:
Also make sure your PHP file is encoded (saved) as UTF-8 without BOM, since you're using a literal UTF-8 string.
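To check what is actually being sent, it can help to look at the raw bytes of the directory string before passing it to aria2. This is only a minimal sketch, assuming the mbstring extension is loaded; `describe_string()` is a hypothetical helper name and not part of PHP or aria2:

```php
<?php
// Hypothetical helper: reports whether a string is valid UTF-8 and shows its bytes.
function describe_string(string $s): array
{
    return [
        'valid_utf8' => mb_check_encoding($s, 'UTF-8'),
        'hex'        => bin2hex($s),
    ];
}

// If this file is saved as UTF-8, each Arabic letter below occupies two bytes.
$info = describe_string('/home/amir/دانلود');
var_dump($info['valid_utf8']);
echo $info['hex'], PHP_EOL;
```

If `valid_utf8` comes back false, or the hex dump shows one byte per Arabic letter, the file was probably saved in a single-byte legacy encoding rather than UTF-8.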
EDIT:
Also, if you still intend to use mb_convert_encoding, better add the optional parameter "from_encoding".
I tested this with Japanese in a Shift-JIS encoded file:
$text = "A strange string to pass, maybe with some 日本語の characters.";
echo mb_convert_encoding($text, 'UTF-8');
and it is not displayed correctly even though my browser is set to UTF-8, so the automatic detection of the source encoding is evidently not always right.
This, for example, works for me:
$text = "A strange string to pass, maybe with some 日本語の characters.";
echo mb_convert_encoding($text, 'UTF-8', 'SJIS'); //from SJIS(SHIFT-JIS)
This little script is handy for finding out which source encoding to pass for your Arabic characters:
http://www.php.net/manual/de/function.mb-convert-encoding.php#97902
But converting is not necessary if the file is already in UTF-8; it only makes sense if the file is in some Arabic legacy encoding, so I don't think this brings you any closer to a solution.
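The two points above (skip conversion for UTF-8, otherwise name the source encoding explicitly) can be sketched roughly as follows. `to_utf8()` is a hypothetical helper, and the source encoding passed to it is an assumption the caller has to supply:

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, otherwise convert from a
// caller-supplied source encoding instead of relying on auto-detection.
function to_utf8(string $s, string $assumedSource): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', $assumedSource);
}

// Simulate Shift-JIS input by converting a UTF-8 literal first
// (assumes this file itself is saved as UTF-8).
$sjis = mb_convert_encoding('日本語', 'SJIS', 'UTF-8');
echo to_utf8($sjis, 'SJIS'), PHP_EOL;
```

The validity check is a heuristic: some legacy byte sequences happen to be valid UTF-8 as well, so it cannot replace knowing the real source encoding.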
EDIT2:
I tried a different FTP program: FileZilla displays my files and folders with Japanese names, and the Arabic one, correctly. (I was using WinSCP 4.3.4 before.)

Related

PHP - How to save a file in Windows-1252?

I work on a system that automates signature generation for Outlook. The part that generates the .htm files works great, but now I also need to add files in .txt format. If I use the content without changing the encoding, all my accented characters are converted to different values, for example "é" becomes "Ã©" and "ô" becomes "Ã´".
This clearly looked like an encoding conflict of some sort. I tried to correct it by converting the input text to the "Windows-1252" encoding.
$myText = iconv( mb_detect_encoding( $myText ) , "Windows-1252//TRANSLIT", $myText);
But it didn't change anything. I also tried with :
$myText = mb_convert_encoding($myText, "Windows-1252");
And it didn't work either. For both of these tests, I checked the file type with Atom (my editor) and it recognises these files as UTF-8. But when I check in the terminal with file -I signature.txt, it reports signature.txt: text/plain; charset=iso-8859-1
Note that if I manually change the encoding to Windows-1252 in Atom, the characters are correct.
Has anyone met the same problem? Is there another way in PHP to specify the encoding of the file?
I figured it out. The code to use was (as pointed out by @Powerlord):
$monTexteTXT = mb_convert_encoding($monTexteTXT, "Windows-1252", "UTF-8");
I got a false negative when I first tried this solution, because the characters looked broken when I opened the file in my editor. But once it was opened with Outlook it was fine.
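Because the editor's guess is unreliable here, checking the written bytes directly is a safer test. A minimal sketch, assuming the input text is UTF-8 and mbstring is available; `write_windows1252()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: converts UTF-8 text to Windows-1252 and writes it to $path.
function write_windows1252(string $path, string $utf8Text): bool
{
    // The explicit third argument tells mbstring what the input encoding is.
    $bytes = mb_convert_encoding($utf8Text, 'Windows-1252', 'UTF-8');
    return file_put_contents($path, $bytes) !== false;
}
```

After writing, `bin2hex(file_get_contents($path))` should show a single byte `e9` for each "é"; the two bytes `c3a9` would mean the text is still UTF-8.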

utf8_encode does not encode special characters ě/š/č/ř/ž/ý/á, etc

I have the following problem which seems to have no solution and I am absolutely disgusted.
I have an Android application where users can upload files to my server and then access them. When a user opens his account, the server uses scandir() and then json_encode() to send the list of his files and folders to the app. And here is the problem:
If a user uploads a file with special characters (Válcování stupHovitých vzorko za tepla.pptx) and the file name is not UTF-8 encoded, I can't pass it through json_encode, because I get a UTF-8 error. So I tried calling utf8_encode() on each file name and it worked, BUT if there is a file or folder with special characters like č/š/ě/ř/ž/á/ý/í/é etc. and I call utf8_encode() on it, I get a mess in my application: instead of a folder named č, I get the name Ä.
I have tried nearly everything from htmlspecialchars() to iconv(), but I can't find a method that returns the files and folders on my server with their proper names.
Yes, it does not. The doc reads:
utf8_encode — Encodes an ISO-8859-1 string to UTF-8
I'm not sure what encoding your file names use, but it's definitely not ISO-8859-1.
You need to use mb_convert_encoding to convert between arbitrary encodings, passing whatever the actual source encoding is as the third argument. E.g.
$utfStr = mb_convert_encoding('č/š/ě/ř/ž/á/ý/í/é', 'UTF-8', 'ISO-8859-15');
If you don't know the client's encoding, you may need to use mb_detect_encoding, which does not always work and is not always accurate.
To avoid this mess, I would recommend doing it the other way round: send a UTF-8 encoded file name from your Android app rather than converting it on the server.
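If the names do have to be cleaned up on the server, a rough sketch could look like this. `names_to_json()` is a hypothetical helper, and ISO-8859-2 as the fallback encoding is only an assumption (it covers the Czech letters listed in the question); the real source encoding has to be confirmed:

```php
<?php
// Hypothetical helper: returns JSON for a list of file names, converting any
// name that is not valid UTF-8 from an assumed legacy encoding first.
function names_to_json(array $names, string $assumedSource = 'ISO-8859-2'): string
{
    $clean = array_map(
        function (string $name) use ($assumedSource): string {
            return mb_check_encoding($name, 'UTF-8')
                ? $name
                : mb_convert_encoding($name, 'UTF-8', $assumedSource);
        },
        array_values($names)
    );
    // JSON_UNESCAPED_UNICODE keeps the characters readable instead of \uXXXX.
    return json_encode($clean, JSON_UNESCAPED_UNICODE);
}
```

Without such a step, json_encode() returns false for a name containing invalid UTF-8 and json_last_error() reports JSON_ERROR_UTF8, which matches the error described in the question.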

Bug with php file converted from ansi to utf-8

I have a few PHP script files encoded in ANSI. Now that I have converted my website to HTML5, I need everything in UTF-8, so that accents in these files are displayed correctly without any PHP conversion through iconv(). I used Notepad++ to set the encoding of my scripts to UTF-8 and saved the files. Most are fine and accents are displayed correctly; only the main script now blocks everything, and the server returns just a white page, without any error message, even with ini_set('error_reporting', 'E_ALL')!
When I change the encoding back to ANSI in Notepad++ and save the file without any other change, it works again (except that the accents are not displayed correctly without iconv()).
I also tried using a PHP script to change the encoding with ...$file = iconv('ISO-8859-1','UTF-8', $file);... but the result is exactly the same!
I wrote a short PHP script to look for high character values, but the highest values seem to be the usual French accents like é, è, etc., which are also present in other files and pose no problem. I removed other special characters, without any effect...
The problem is that the file is large, more than 4500 lines, and I'm not sure how to proceed to correct this. Has anyone had this problem, or does anyone have an idea?
The issue was with the "£" (pound) character, which I used a lot as a delimiter in preg_match("£(...)£", "...", $string) and preg_replace calls.
For some reason these characters were not accepted after the conversion. I had to replace all of them, and only then did it work fine in UTF-8... Apparently they are no longer a problem now that the file is converted, and I can use them again.
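One plausible explanation, which is an assumption and not confirmed by the poster, is that "£" is a single byte in ISO-8859-1 but two bytes in UTF-8, so it no longer behaves as a single-character regex delimiter. A hedged sketch of the safer alternative, using an ASCII delimiter and the /u modifier:

```php
<?php
// "£" encoded as UTF-8 is two bytes (0xC2 0xA3); in ISO-8859-1 it is one (0xA3).
$poundUtf8 = "\xC2\xA3";
echo strlen($poundUtf8), PHP_EOL; // byte length, not character count

// Sketch: use a single-byte ASCII delimiter (#) and /u for UTF-8 patterns.
$subject = "prix: 42 " . $poundUtf8;
$m = [];
if (preg_match('#(\d+)\s*' . $poundUtf8 . '#u', $subject, $m) === 1) {
    echo $m[1], PHP_EOL;
}
```

With an ASCII delimiter the pound sign can still appear inside the pattern itself, as it does here.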

Encoding issue with Apache , displaying diamond characters in browser

Please help me set up an Apache server on CentOS. It looks like an encoding issue, but I have not been able to resolve it yet.
Instead of the HTML content, Chrome and Firefox display the HTML source; IE 9 works fine. A � character is displayed after each "<" symbol.
http://pdf.gen.in/index1.htm
The second problem is with PHP. It displays the PHP source code of http://pdf.gen.in/index.php with similar diamond characters wherever it encounters a "<" character. The PHP issue seems to be related to the first one.
Those files are encoded as UTF-16LE. For the static HTML page, you might be able to get it to work by setting the charset correctly in the MIME type (it's currently text/html; charset=UTF-8). I don't know how strong PHP's Unicode support is. Try using UTF-8 instead; it's generally better supported due to its partial overlap with ASCII.
You should use a decent text editor, and always set the encoding of PHP/HTML files to "UTF-8 without BOM".
Create a file named "test.php", paste the code below, and save it with "UTF-8 without BOM" encoding; then it will work just fine.
<?php
phpinfo();
?>
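If many files are affected, re-encoding them by script is an option. A minimal sketch, assuming the files really are UTF-16LE as diagnosed above; `utf16le_to_utf8()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: converts UTF-16LE bytes (with or without BOM) to UTF-8.
function utf16le_to_utf8(string $bytes): string
{
    // Drop the UTF-16LE byte order mark (FF FE) if present.
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        $bytes = substr($bytes, 2);
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}
```

In UTF-16LE every ASCII character is followed by a zero byte, which would explain why a replacement character shows up after each "<" when the page is read as UTF-8.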

PHP fwrite function to write txt file in utf-8 encoding

I have made a form where a user writes a message in Arabic and submits it with a submit button. The message is saved in the database, and I need to create a .txt file on the server for another application, which shows something like this:
د پوليسو Ù¾Ø
I successfully used the fopen and fwrite functions to create my txt files.
When I open the file in Notepad, the Arabic text is shown correctly,
but when I open it in Eclipse I get something like this:
د پوليسو پر روزنيز مرکز توغندويي بريد وشو
Afterwards, when I save the txt file in Notepad with UTF-8 encoding, the unreadable text above changes to Arabic.
But I can't do that manually for every message.
I searched a lot on the internet and tried the following:
I saved the script in UTF-8
I used the utf8_encode function
I set ini_set('default_charset', 'UTF-8');
I added <meta http-equiv="Content-Type" content="text/html; charset=utf-8; encoding=utf-8" />
I changed the mode passed to fopen to "wb", where b is for binary
I would be very glad for any solution; I have worked on this issue continuously for the last week. I know the problem is in the encoding, so how can I write UTF-8 encoded files using PHP?
If the text displays fine in one program but not another, that just means one program interprets the file correctly while the other doesn't. Most likely Notepad sets a UTF-8 BOM on the file when you save it again, so Eclipse now automatically recognizes that it's UTF-8 encoded. Without that, Eclipse assumes latin-1 or some other encoding as the default.
Two options:
change your Eclipse preferences to open files as UTF-8 by default
set a BOM on the file when writing it, see Encoding a string as UTF-8 with BOM in PHP
A BOM can be helpful for making programs recognize UTF-8 but can also cause problems in other programs that don't expect or want BOMs. Whether to use a BOM or not depends on your intended use and target audience.
In Eclipse you need to set the encoding in the menu Edit > Set Encoding...
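For the BOM option mentioned above, a minimal sketch might look like the following. `write_utf8_with_bom()` is a hypothetical helper name, and it assumes the text passed in is already UTF-8:

```php
<?php
// Hypothetical helper: writes UTF-8 text preceded by the UTF-8 BOM (EF BB BF).
function write_utf8_with_bom(string $path, string $utf8Text): bool
{
    $fh = fopen($path, 'wb');
    if ($fh === false) {
        return false;
    }
    $ok = fwrite($fh, "\xEF\xBB\xBF" . $utf8Text) !== false;
    fclose($fh);
    return $ok;
}
```

Whether the consuming application tolerates the three extra bytes at the start of the file should be checked before adopting this.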
