PHP: Converting garbled non-Unicode characters back to Unicode

I've been trying to convert some special, non-Unicode characters back to the original ones, but I didn't get what I needed.
Let's take an example. I have a file with a Persian name and I want to upload it. When I upload it, the name changes to some strange characters like Ø. I don't know why (and I don't want to know), but when I start to download it with IDM, the strange name changes back to the original.
Original File Name: ترمینال_گنبدکاووس.rar
Strange File Name: ترمینال_گنبدکاووس.rar
And this is not just my problem: file names in every language get converted to their own strange characters. In another example, when I convert Babylon dictionaries (BGL) to StarDict (mdx) format, the phonetics are converted to strange characters again.
Is there a reliable way to convert all these characters back to the original using PHP?
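Characters like Ø are typical of UTF-8 bytes being decoded as Windows-1252 (or ISO-8859-1) somewhere along the way. Assuming that is what happened here, a minimal sketch of reversing it with mbstring is to encode the garbled text back to Windows-1252, which recovers the original UTF-8 bytes. The helper name `fixMojibake` is hypothetical, and the round-trip check is only a heuristic: it cannot repair text that went through a different code page, and a few Windows-1252 byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) have no mapping, so letters whose UTF-8 form contains them (for example ف) may not survive.

```php
<?php
// Hypothetical helper: undo UTF-8 text that was mistakenly decoded as Windows-1252.
// Assumes the mbstring extension is available.
function fixMojibake(string $garbled): string
{
    // Re-encoding the garbled characters to Windows-1252 yields the original bytes.
    $candidate = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');

    // Accept the result only if the conversion was lossless and the bytes are valid UTF-8;
    // otherwise the input probably was not this kind of mojibake, so return it untouched.
    $lossless = mb_convert_encoding($candidate, 'UTF-8', 'Windows-1252') === $garbled;
    if ($lossless && mb_check_encoding($candidate, 'UTF-8')) {
        return $candidate;
    }
    return $garbled;
}

// Simulate the damage for demonstration: read the UTF-8 bytes as Windows-1252.
$original = 'ترمینال_گنبدکاووس.rar';
$garbled  = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

echo $garbled, PHP_EOL;              // the Ø-style garbled name
echo fixMojibake($garbled), PHP_EOL; // ترمینال_گنبدکاووس.rar
```

Because correct Persian text cannot be represented in Windows-1252, the round-trip check makes the helper leave already-correct names alone.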

Related

Accents issue doc to txt

First I wanted to convert a PDF file to HTML, but the API can't do that.
So I tried converting the PDF to TXT, but I had a lot of problems with multiple spaces and line breaks...
So I tried (again) converting the PDF to Word. The Word output is perfect.
Unfortunately, ConvertApi can't convert Word to HTML, and I couldn't find a free library to convert Word to HTML.
So I tried (again and again) converting Word to TXT.
Now I have accent problems in the TXT file:
régime becomes r‚gime
matières becomes matiŠres
contrôle becomes contr“le
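The pattern in these examples (é → ‚, è → Š, ô → “) matches text saved in the DOS code page 850 (CP850) and then displayed as Windows-1252. Assuming the Word-to-TXT conversion wrote the file in CP850, a sketch of re-decoding it with mbstring follows; the helper name and the file path in the comment are hypothetical.

```php
<?php
// Assumption: the .txt produced by the Word-to-text conversion is encoded in CP850
// (a DOS/OEM code page), which explains é -> ‚, è -> Š, ô -> “ when viewed as Windows-1252.
function cp850ToUtf8(string $raw): string
{
    return mb_convert_encoding($raw, 'UTF-8', 'CP850');
}

// Raw CP850 bytes for the three words from the question.
$samples = ["r\x82gime", "mati\x8Ares", "contr\x93le"];
foreach ($samples as $raw) {
    echo cp850ToUtf8($raw), PHP_EOL; // régime, matières, contrôle
}

// For a real file (hypothetical path):
// $text = cp850ToUtf8(file_get_contents('output.txt'));
```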

file_get_contents returns bizarre characters from raw text file

This is very bizarre. I have a .txt file on my Windows server. I'm using file_get_contents to retrieve it, but the first several characters show up as a diamond with a question mark inside. I've tried recreating the file from scratch and it's the same result. What's really bizarre is that other files don't have this issue.
Also, if I put a * at the start of the file it seems to fix it, but if I try to open the file and do it with PHP it's still messed up.
The start of the file in question begins with: Trinity Cannon - that's a direct copy and paste from the text file. I've tried re-typing it and the first few characters are always that diamond with a question mark.
$myfile='C:\\inetpub\\wwwroot\\fastpitchscores\\data\\2020.txt';
$fh = file_get_contents($myfile);
echo $fh; // Trinity Cannon
echo $fh[0]; // �
It sounds like whatever editor you originally used to create the file added a UTF Byte Order Mark (BOM) at the beginning of the file.
You typically can't edit the BOM from within an editor. If your editor has an encoding conversion feature, try converting to ASCII. For example, in Notepad++ use Encoding->Encode in ANSI.
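If re-saving the file is not an option, the BOM can also be removed in PHP after reading. A minimal sketch, assuming the file is UTF-8 with a BOM or UTF-16 with a BOM; the helper name is hypothetical, and the UTF-16 branches use mbstring to convert the rest of the file to UTF-8.

```php
<?php
// Hypothetical helper: remove a leading byte order mark so $text[0] is the first real character.
function toUtf8WithoutBom(string $text): string
{
    // UTF-8 BOM: EF BB BF
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        return substr($text, 3);
    }
    // UTF-16 little-endian BOM: FF FE (convert the remainder to UTF-8)
    if (strncmp($text, "\xFF\xFE", 2) === 0) {
        return mb_convert_encoding(substr($text, 2), 'UTF-8', 'UTF-16LE');
    }
    // UTF-16 big-endian BOM: FE FF
    if (strncmp($text, "\xFE\xFF", 2) === 0) {
        return mb_convert_encoding(substr($text, 2), 'UTF-8', 'UTF-16BE');
    }
    return $text;
}

// The literal below stands in for file_get_contents($myfile).
$fh = toUtf8WithoutBom("\xEF\xBB\xBFTrinity Cannon");
echo $fh[0], PHP_EOL; // T
```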

Unable to save file with Thai characters filename

I'm currently trying to save a PDF file using the mPDF library. When I output the file with an English filename, the filename is displayed correctly, but if the filename contains any Thai characters it comes out garbled.
My mPDF output code:
$save_file = $s_code.'_'.$classroom.'.pdf';
$mpdf->Output('../../../upload/'.$save_file,'F');
With English filename it displayed correctly.
t10024_201.pdf
With Thai characters it doesn't.
เธ—เธช10024_201.pdf
I can't figure out what causes the problem.
The filename is restricted to the character set supported by the device (in this case, the server where mPDF generates your PDFs), so this doesn't reflect a problem with mPDF itself. (If you can write Thai characters inside the PDF, only the filename fails to show them.)
You may need to configure the Content-Disposition header of the web server's response that delivers the PDF file. As an example, see this blog post that describes how a ColdFusion application developer dealt with saving files with French characters.
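For files served over HTTP rather than written to disk, a common way to send a non-ASCII download name is the `filename*` parameter (RFC 6266 with the RFC 5987 encoding) next to an ASCII `filename` fallback. A sketch; the helper name and the fallback rule (drop non-ASCII bytes, quotes and backslashes) are assumptions, not something from the answer.

```php
<?php
// Hypothetical helper: build a Content-Disposition value that carries a UTF-8 filename.
function contentDisposition(string $utf8Name): string
{
    // ASCII-only fallback for old clients: keep printable ASCII, minus quotes and backslashes.
    $fallback = preg_replace('/[^\x20-\x7E]/', '', $utf8Name);
    $fallback = str_replace(['"', '\\'], '', $fallback);
    if ($fallback === '') {
        $fallback = 'download';
    }
    // RFC 5987 extended value: charset, empty language tag, percent-encoded UTF-8 bytes.
    return 'attachment; filename="' . $fallback . '"; filename*=UTF-8\'\'' . rawurlencode($utf8Name);
}

// In a web request you would send it with:
// header('Content-Disposition: ' . contentDisposition($save_file));
echo contentDisposition('ทส10024_201.pdf'), PHP_EOL;
```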
Thank you so much, Anson W Han.
It's about character encoding. I finally found a solution: I simply convert the filename to the Thai encoding TIS-620 using iconv, and it displays correctly.
The code:
$mpdf->Output('../../../upload/'.iconv("UTF-8", "TIS-620",$save_file),'F');
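The garbled name in the question (เธ—เธช) is what UTF-8 Thai looks like when read in the Thai Windows code page, which suggests the filesystem expected a Thai ANSI encoding. The answer's iconv() call can be wrapped so a failed conversion raises an error instead of silently passing `false` to mPDF. This is a sketch: the helper name is hypothetical, TIS-620 support depends on the iconv implementation installed on the server, and on PHP 7.1+ under Windows the filesystem functions accept UTF-8 names directly, so the conversion may not be needed there.

```php
<?php
// Hypothetical wrapper around the iconv() call from the answer:
// convert a UTF-8 filename to TIS-620 before handing it to the filesystem.
function thaiFilename(string $utf8Name): string
{
    $converted = iconv('UTF-8', 'TIS-620', $utf8Name);
    if ($converted === false) {
        // iconv() returns false when a character has no TIS-620 mapping.
        throw new RuntimeException("Cannot represent '$utf8Name' in TIS-620");
    }
    return $converted;
}

$save_file = 'ทส10024_201.pdf'; // built as $s_code . '_' . $classroom . '.pdf' in the question
echo bin2hex(thaiFilename($save_file)), PHP_EOL; // b7ca31303032345f3230312e706466
// $mpdf->Output('../../../upload/' . thaiFilename($save_file), 'F');
```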

Cannot generate PDF for some specific simplified chinese characters via TCPDF

I have existing code that generates PDF files via TCPDF. It works fine in most cases, even with non-English characters, but now, when the content contains either of two simplified Chinese characters, 喆 (code point 21894, U+5586) or 旻 (code point 26107, U+65FB), all Chinese characters are rendered as rectangles (invalid characters).
I checked uni2cid_ag15.php, and I can find the mappings for those two characters; the mapped CIDs are correct. Does anyone know why the Chinese characters are converted incorrectly when these specific characters are present?
References:
https://raw.githubusercontent.com/adobe-type-tools/cmap-resources/master/cmapresources_gb1-5/cid2code.txt
https://github.com/tecnickcom/TCPDF/blob/master/fonts/uni2cid_ag15.php
Thanks in advance for any advice.
I found the solution: use the encoding "GB18030" instead of "GB2312" with PHP's mb_convert_encoding function. With that change, those characters are generated in the PDF without problems.
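One way to catch this kind of problem before it reaches TCPDF is to check whether the text survives a round trip through the target encoding; mbstring replaces unmappable characters with `?`, so a lossy round trip means the charset is too small. GB18030 covers all of Unicode, so it always passes. A sketch with a hypothetical helper name; no expected output is claimed for the GB2312 line.

```php
<?php
// Hypothetical helper: does $text survive a round trip through $encoding unchanged?
// A lossy conversion (characters replaced by '?') means the target charset is too small.
function fitsEncoding(string $text, string $encoding): bool
{
    $encoded = mb_convert_encoding($text, $encoding, 'UTF-8');
    return mb_convert_encoding($encoded, 'UTF-8', $encoding) === $text;
}

$name = '喆旻'; // U+5586 and U+65FB from the question
var_dump(fitsEncoding($name, 'GB18030')); // bool(true)
var_dump(fitsEncoding($name, 'GB2312'));
```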

special characters in url filename cause problems

I have the following URL: www.mywebsite.com/upload/server/php/files/foto/test/Aston_Martin_DBS_V12_coupé_(rear)_b-w.jpg
This file is uploaded through a script. The file exists on the server.
However, because of the special character (é) in the URL, I am experiencing some problems.
The filename on the server is Aston_Martin_DBS_V12_coup%C3%A9_(rear)_b-w.jpg, which is correct. However somehow my browser (Chrome) requests this page as ISO-8859-1 instead of UTF-8.
Therefore, I get a 404.
I am using the jQuery File Upload plugin.
I deleted my previous answer and wrote a new one:
Websites usually don't contain files with non-standard characters in their names. Such characters are usually removed, or sometimes replaced by similar standard characters (Polish ą to a, ś to s). For example, I rename files manually, or, when I have a lot of files, I just use a bash or PHP script that removes or replaces those characters in filenames on the server.
Anyway, if you HAVE TO use the original filenames, you have to decode them from ISO-8859-1 and encode them as UTF-8.
Take a look at the PHP code fragment here:
how to serve HTTP files with special characters
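The 404 is consistent with the two byte encodings of é producing different URLs: UTF-8 gives `%C3%A9`, ISO-8859-1 gives `%E9`. The sketch below shows both, then converts a Latin-1 name to UTF-8 with mbstring as the answer suggests; the variable names are illustrative only.

```php
<?php
$utf8Name   = 'coupé';                                               // this source file is UTF-8
$latin1Name = mb_convert_encoding($utf8Name, 'ISO-8859-1', 'UTF-8'); // the bytes "coup\xE9"

echo rawurlencode($utf8Name), PHP_EOL;   // coup%C3%A9
echo rawurlencode($latin1Name), PHP_EOL; // coup%E9

// Going the other way: decode an ISO-8859-1 name and re-encode it as UTF-8.
$fixed = mb_convert_encoding($latin1Name, 'UTF-8', 'ISO-8859-1');
var_dump($fixed === $utf8Name); // bool(true)
```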
Some special characters cause problems in URLs when used in filenames,
like
+, #, %, &
For files that are accessed through a URL, use filenames that don't contain those characters,
for example:
$filename = str_replace(array(" ", "&", "'", "+", "#", "%"), "-", $filename);
and it will work fine.
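A slightly more general version of the str_replace() approach combines the transliteration idea from the previous answer (ą → a, ś → s) with a whitelist of URL-safe characters. The mapping table is a small illustrative assumption, not a complete transliteration, and the helper name is hypothetical; any character outside the whitelist becomes `-`.

```php
<?php
// Hypothetical helper: make a filename safe to use in a URL path.
function safeFilename(string $name): string
{
    // Illustrative transliteration table; extend it for the languages you need.
    $map = [
        'ą' => 'a', 'ć' => 'c', 'ę' => 'e', 'ł' => 'l', 'ń' => 'n',
        'ó' => 'o', 'ś' => 's', 'ź' => 'z', 'ż' => 'z',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'à' => 'a', 'ô' => 'o',
    ];
    $name = strtr($name, $map);
    // Replace each run of characters outside [A-Za-z0-9._-] (spaces, &, ', +, #, %, ...) with '-'.
    return preg_replace('/[^A-Za-z0-9._-]+/', '-', $name);
}

echo safeFilename('Aston_Martin_DBS_V12_coupé_(rear)_b-w.jpg'), PHP_EOL;
// Aston_Martin_DBS_V12_coupe_-rear-_b-w.jpg
```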
If the filename itself contains percent-encoded sequences (a literal % character), you will need to encode each % as %25 in your URL. Try accessing Aston_Martin_DBS_V12_coup%25C3%25A9_(rear)_b-w.jpg
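rawurlencode() produces that escaping automatically. Because it also encodes `/`, it should be applied to the filename segment only, not to the whole path. A sketch using the directory from the question as a relative path (the `$url` variable is illustrative):

```php
<?php
// The file on disk is literally named "...coup%C3%A9...", so each '%' must itself become %25.
$onDisk = 'Aston_Martin_DBS_V12_coup%C3%A9_(rear)_b-w.jpg';

// Encode only the last path segment; rawurlencode() would also escape '/'.
$url = '/upload/server/php/files/foto/test/' . rawurlencode($onDisk);
echo $url, PHP_EOL;
// /upload/server/php/files/foto/test/Aston_Martin_DBS_V12_coup%25C3%25A9_%28rear%29_b-w.jpg
```

rawurlencode() follows RFC 3986, so the parentheses are escaped as well; browsers and servers treat `%28`/`%29` and literal parentheses the same in a path.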
