Unable to save file with Thai characters filename - php

I'm trying to save a PDF file using the mPDF library. When I output the file with an English filename, the name is displayed correctly, but if the filename contains any Thai characters it comes out garbled.
My mPDF output code:
$save_file = $s_code.'_'.$classroom.'.pdf';
$mpdf->Output('../../../upload/'.$save_file,'F');
With an English filename it is displayed correctly:
t10024_201.pdf
With Thai characters it isn't:
เธ—เธช10024_201.pdf
I can't figure out what causes the problem.

The filename is restricted to the character set supported by the device (in this case, the server where mPDF generates your PDFs), so this isn't a problem with mPDF itself, assuming you can write Thai characters inside the PDF and only the filename fails to show them.
You may also need to configure the Content-Disposition header of the web server's response when serving the PDF file. As an example, see this blog post that describes how a ColdFusion application developer dealt with saving files with French characters.

Thank you so much, Anson W Han.
It's about character encoding. I finally found a solution: I simply convert the filename to a Thai encoding (TIS-620) using iconv, and it displays correctly.
The code:
$mpdf->Output('../../../upload/'.iconv("UTF-8", "TIS-620",$save_file),'F');
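A minimal sketch of the same conversion wrapped in a helper, assuming the server's filesystem expects a single-byte charset such as TIS-620. The function name filenameForFilesystem and the exception on failure are illustrative and not part of mPDF; iconv() returns false when a character can't be represented, so the sketch checks for that instead of writing a file with a broken name.

```php
<?php
// Hypothetical helper: convert a UTF-8 filename to the charset the
// filesystem expects, failing loudly instead of producing a garbled name.
function filenameForFilesystem(string $utf8Name, string $fsCharset): string
{
    if (strcasecmp($fsCharset, 'UTF-8') === 0) {
        return $utf8Name; // nothing to convert
    }
    // iconv() returns false (and raises a notice) on unconvertible input.
    $converted = @iconv('UTF-8', $fsCharset, $utf8Name);
    if ($converted === false) {
        throw new RuntimeException("Cannot represent filename in $fsCharset");
    }
    return $converted;
}
```

With the code from the question this would be called as $mpdf->Output('../../../upload/' . filenameForFilesystem($save_file, 'TIS-620'), 'F');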

Related

file_get_contents returns bizarre characters from raw text file

This is very bizarre. I have a .txt file on my Windows server. I'm using file_get_contents to retrieve it, but the first several characters show up as a diamond with a question mark inside. I've tried recreating the file from scratch and get the same result. What's really bizarre is that other files don't have this issue.
Also, if I put a * at the start of the file it seems to fix it, but if I try to open the file and do it with PHP it's still messed up.
The start of the file in question begins with: Trinity Cannon - that's a direct copy and paste from the text file. I've tried re-typing it and the first few characters are always that diamond with a question mark.
$myfile='C:\\inetpub\\wwwroot\\fastpitchscores\\data\\2020.txt';
$fh = file_get_contents($myfile);
echo $fh; // Trinity Cannon
echo $fh[0]; // �
It sounds like whatever editor you used to originally create the file added a UTF Byte Order Mark (BOM) at the beginning of the file.
You typically can't edit the BOM from within an editor. If your editor has an encoding conversion function, try converting to ASCII. For example, in Notepad++ use Encoding->Encode in ANSI.
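If re-saving the file isn't an option, the BOM can also be dropped after reading. This is a rough sketch that assumes the mark is the UTF-8 BOM (bytes EF BB BF); the helper name stripUtf8Bom is made up for illustration, and a UTF-16 file would need a real conversion instead.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 byte order mark, if present.
function stripUtf8Bom(string $contents): string
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($contents, $bom, 3) === 0) {
        return substr($contents, 3); // skip the three BOM bytes
    }
    return $contents; // no BOM, return unchanged
}
```

In the question's code this would be applied as $fh = stripUtf8Bom(file_get_contents($myfile)); before echoing.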

Missing characters in filled pdf using PDFTk with encoding UTF-8

I'm trying to fill PDF documents using PDFtk. The script works fine and fills the inputs in the form, but I don't get special characters (Polish charset: UTF-8 or ISO-8859-2).
Script: https://github.com/mikehaertl/php-pdftk
The weird thing is that the generated PDF actually has the Polish characters when I click on a field.
Before click:
After click on field:
The default encoding is set to UTF-8. The problem is that PDFtk can't use characters outside standard ASCII with FDF form fill. It doesn't allow multi-byte characters.
What I did:
Added fonts to the PDF files (checked, and the files have the fonts)
Created fields in the PDF files with the default font (Arial)
Changed the encoding in the script (function fillForm) to ISO-8859-2
Changed the encoding of the data values (iconv or mb_convert_encoding)
Changed both the function's encoding and the data value encoding to ISO-8859-2
Flattened the PDF after filling the form
Read all the topics about this problem on Stack Overflow and Google
UPDATE (25.03.2016): I found out that the PDF documents work fine on some computers. Some people see the Polish characters and others don't. All of us have the right fonts (with the Polish charset). I used the default Arial or Times New Roman. The fonts are also embedded in the file.
Any ideas?
You need to run pdftk with need_appearances as an argument.
Kudos to the guys from this issue on GitHub.
I had a similar issue.
I solved it with the utf8_decode function, e.g. utf8_decode('Łukasz').
The best results (without flattening) I got when creating the FDF file with the UTF-8 values encoded into UTF-16BE:
chr(0xfe) . chr(0xff) . str_replace(array('\\', '(', ')'), array('\\\\', '\(', '\)'), mb_convert_encoding($string, 'UTF-16BE'));
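The one-liner above can be read as the following sketch. The function name fdfUtf16Value is hypothetical, and the source encoding is passed explicitly as UTF-8 rather than relying on mbstring's default, which is an assumption about the input. The escaping works on raw bytes, because any backslash or parenthesis byte inside an FDF literal string has to be escaped, including bytes that are one half of a UTF-16 code unit.

```php
<?php
// Hypothetical helper: build the body of an FDF literal string from UTF-8 text.
function fdfUtf16Value(string $utf8): string
{
    // Convert to big-endian UTF-16, naming the source encoding explicitly.
    $utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    // Escape the bytes that delimit or escape FDF/PDF literal strings.
    $escaped = str_replace(['\\', '(', ')'], ['\\\\', '\\(', '\\)'], $utf16);
    // Prefix the UTF-16BE byte order mark so readers treat it as Unicode text.
    return "\xFE\xFF" . $escaped;
}
```

The result would then be wrapped in parentheses when written as a /V value in the FDF file.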
Your library works quite well, but e.g. when I open the PDF generated with it directly in Safari on macOS, it does not show Polish characters until I click the field. When I open it with Adobe Reader it works fine.
I could not find out how to change the font, so my solution was to use iText: https://itextpdf.com/en/resources/examples/itext-5/filling-out-forms
I wrote this for my project: https://github.com/dddeeemmmooonnn/pdf_form_filler

XML file isn't UTF-8 encoded when created in PHP

I'm trying to output an XML file using PHP, and everything is right except that the file that is created isn't UTF-8 encoded, it's ANSI. (I see that when I open the file and do Save as...)
I was using
$dom = new DOMDocument('1.0', 'UTF-8');
but I found that non-English characters don't appear in the output.
While searching for a solution I first tried adding
header("Content-Type: application/xml; charset=utf-8");
at the beginning of the PHP script, but it says:
Extra content at the end of the document
Below is a rendering of the page up to the first error.
I've tried some other suggestions, like not passing 'UTF-8' when creating the document but setting it separately:
$doc->encoding = 'UTF-8';
but the result was the same.
I used
$doc->save("filename.xml");
to save the file, and I've tried changing it to
$doc->saveXML();
but the non-English characters still didn't appear.
Any ideas?
ANSI is not a real encoding. It's a word that basically means "whatever encoding my Windows computer is configured to use". Getting ANSI is a clear sign of relying on a default encoding somewhere.
In order to generate valid UTF-8 output, you have to feed all XML functions with proper UTF-8 input. The most straightforward way to do that is to save your PHP source code as UTF-8 and then just type some non-English letters. If you are reading data from external sources (such as a database), you need to ensure that the complete toolchain makes proper use of encodings.
In any case, using "Save as" in an undisclosed piece of software is not a reliable way to determine the file encoding.
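As a small sketch of that advice, the snippet below feeds DOMDocument a string that is known to be UTF-8 and then checks the serialized bytes directly instead of trusting an editor's "Save as" dialog. The element names and the sample value are made up for illustration.

```php
<?php
// Build a document declared as UTF-8 and give it UTF-8 input.
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('names');
// "\u{0141}" is the letter Ł, written as an escape so the source file's
// own encoding cannot interfere with the example.
$root->appendChild($doc->createElement('name', "\u{0141}ukasz"));
$doc->appendChild($root);

$xml = $doc->saveXML();

// Verify the output programmatically rather than by eye.
$isValidUtf8 = mb_check_encoding($xml, 'UTF-8');
```

If the input came from a database or a file in another encoding, it would have to be converted to UTF-8 before being handed to createElement.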

Arabic characters and UTF-8 in aria2

I use aria2 over XML-RPC to download files, and when I start a download like this in PHP:
$client->aria2_addUri( array($url), array("dir"=>'/home/amir/دانلود') );
it creates a folder named شسÛب instead of دانلود. I posted about this in the aria2 forums, and they said aria2 has no problem if the string is sent to it as UTF-8.
So I sent a UTF-8 header and converted the string to UTF-8, but it doesn't work:
header('Content-type:application/json; charset=utf-8');
$dir_on_server = mb_convert_encoding($dir_on_server, 'UTF-8');
What do you think?
Try accessing the file or folder via the browser, for example by writing a .htaccess file with the content "Options Indexes" so that your folders are listed. (I can even access them via HTTP.)
I created multiple files and folders with a script where a GET value determines the name of the file or folder, and tried it with Japanese and Arabic characters. They aren't shown correctly over FTP (in my case the names appear only as "?????"), but they are displayed correctly if you read them by script.
The problem might be the program you're using to access your FTP. WinSCP, for example, has UTF-8 set to "auto" by default, so forcing it might work. (Although I have to admit that it doesn't work on my side; maybe my Linux server doesn't support UTF-8 filenames, which could also be the problem for you.)
PS:
Also make sure your PHP file is saved as UTF-8 without BOM, since you're using a hard-coded UTF-8 string.
EDIT:
Also, if you still intend to use mb_convert_encoding, it's better to pass the optional "from_encoding" parameter.
I tested this with Japanese in a Shift-JIS encoded file:
$text = "A strange string to pass, maybe with some 日本語の characters.";
echo mb_convert_encoding($text, 'UTF-8');
and it doesn't display correctly although my browser is set to UTF-8, so omitting from_encoding evidently doesn't always pick the right source encoding.
So this, for example, works for me:
$text = "A strange string to pass, maybe with some 日本語の characters.";
echo mb_convert_encoding($text, 'UTF-8', 'SJIS'); //from SJIS(SHIFT-JIS)
This little script is handy for finding the from_encoding value you need for your Arabic characters:
http://www.php.net/manual/de/function.mb-convert-encoding.php#97902
But converting isn't necessary if the string is already UTF-8; it only makes sense if it's in some Arabic encoding, so I don't think this gets you much closer to a solution.
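To make the from_encoding point concrete without depending on how the script file itself is saved, here is a rough round-trip sketch. The sample text is written with Unicode escapes, and the variable names are only for illustration.

```php
<?php
// 日本語 written as escapes, so this string is UTF-8 regardless of file encoding.
$original = "\u{65E5}\u{672C}\u{8A9E}";

// Simulate text that arrives in Shift-JIS.
$sjis = mb_convert_encoding($original, 'SJIS', 'UTF-8');

// The Shift-JIS bytes are not valid UTF-8, so they can't simply be passed through.
$sjisLooksLikeUtf8 = mb_check_encoding($sjis, 'UTF-8');

// Naming the source encoding explicitly recovers the text.
$roundTrip = mb_convert_encoding($sjis, 'UTF-8', 'SJIS');
```

The same pattern would apply to an Arabic source encoding such as Windows-1256, provided the actual source encoding is known.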
EDIT2:
I tried a different FTP program: FileZilla displays my files and folders with Japanese names, and the Arabic one, correctly. (I was using WinSCP 4.3.4 before.)

Encoding issue with Apache , displaying diamond characters in browser

Please help me set up an Apache server on CentOS. It looks like an encoding issue, but I haven't been able to resolve it yet.
Instead of the rendered HTML content, Chrome and Firefox display the HTML source, with a � character after each "<" symbol. IE 9 works fine.
http://pdf.gen.in/index1.htm
The second problem is with PHP. It displays the PHP source code of http://pdf.gen.in/index.php with similar diamond characters wherever it encounters a "<" character. The PHP issue seems to be related to the first one.
Those files are encoded as UTF-16LE. For the static HTML page, you might be able to get it to work by setting the charset correctly in the MIME type (it's currently text/html; charset=UTF-8). I don't know how strong PHP's Unicode support is. Try using UTF-8 instead; it's generally better supported due to its partial overlap with ASCII.
You should use a decent text editor, and always set the encoding of PHP/HTML files to "UTF-8 without BOM".
Create a file named "test.php", paste in the code below and save it with "UTF-8 without BOM" encoding; then it will work just fine.
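If the files can't easily be re-saved in an editor, a one-off conversion is another option. This is only a sketch assuming the input really is UTF-16LE, optionally starting with the FF FE byte order mark; the helper name utf16leToUtf8 is invented for the example.

```php
<?php
// Hypothetical helper: convert UTF-16LE bytes to UTF-8, dropping a leading BOM.
function utf16leToUtf8(string $bytes): string
{
    // FF FE is the little-endian UTF-16 byte order mark.
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        $bytes = substr($bytes, 2);
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}
```

The converted string could then be written back with file_put_contents, giving a file that matches the charset=UTF-8 header the server already sends.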
<?php
phpinfo();
?>
