What kind of encoding is this string?


To keep it simple:
I moved a bunch of folders to my Linux VPS.
This is the original name:
Rodinné záležitosti
And the folder's name became:
Rodinn#U00e9 z#U00e1le#U017eitosti
And when I open the folder in a browser, this is the URL:
www.localhost/folders/Rodinn%23U00e9%20z%23U00e1le%23U017eitosti/
How can I get the folder URL from the string "Rodinné záležitosti"?

Your title asks a different question than your body.
For the title question: it is Unicode. Each #Uxxxx sequence is the code point of one non-ASCII character.
For the body question: I am guessing that you have not set a character encoding. Try UTF-8, as it is the recommended standard and hopefully the encoding that whatever function or software you use to read the folder name assumes.
Please share the code you use to read the folder names, so that it's possible to show you how to set your character encoding.
FYI:
Rodinn#U00e9 z#U00e1le#U017eitosti
with a bit of a clean up:
Rodinn\u00e9 z\u00e1le\u017eitosti
ran through a unicode to text converter:
Rodinné záležitosti
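To go the other way, from the original name to the folder URL, something like the following might work. It is only a sketch: the function names are made up for illustration, the "#U" + four hex digits scheme is inferred from the single folder name shown above (characters outside the Basic Multilingual Plane are not handled), and it assumes the mbstring extension is available and the input is UTF-8.

```php
<?php
// Hypothetical helpers; the "#Uxxxx" scheme is inferred from the example above.

// Replace every non-ASCII character with "#U" plus its code point as 4 hex digits.
function escapeFolderName(string $name): string
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        static function (array $m): string {
            return sprintf('#U%04x', mb_ord($m[0], 'UTF-8'));
        },
        $name
    );
}

// Reverse direction: turn "#Uxxxx" sequences back into UTF-8 characters.
function unescapeFolderName(string $name): string
{
    return preg_replace_callback(
        '/#U([0-9a-fA-F]{4})/',
        static function (array $m): string {
            return mb_chr(hexdec($m[1]), 'UTF-8');
        },
        $name
    );
}

// URL path segment for the folder as it is stored on disk.
function folderUrlSegment(string $name): string
{
    return rawurlencode(escapeFolderName($name));
}

echo folderUrlSegment("Rodinn\u{e9} z\u{e1}le\u{17e}itosti"), "\n";
```

rawurlencode is used rather than urlencode because it encodes the space as %20, which matches the URL in the question.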

Related

Arabic characters and UTF-8 in aria2

I use aria2 to download via XML_RPC, and when I start a download like this in PHP:
$client->aria2_addUri( array($url), array("dir"=>'/home/amir/دانلود') );
it creates a folder named Ø´Ø³ÛØ¨ instead of دانلود. I posted a related question in the aria2 forums, and they said aria2 has no problem if the string is sent to aria2 as UTF-8.
So I sent a UTF-8 header and converted the string to UTF-8, but it does not work:
header('Content-type:application/json; charset=utf-8');
$dir_on_server = mb_convert_encoding($dir_on_server, 'UTF-8');
What do you think?

Try accessing the file or folder via the browser.
You can do this by writing a .htaccess file with the content "Options Indexes" so that your folders are listed. (I can even access them via HTTP.)
I created multiple files and folders with a script in which a GET value (file or folder) determines the name, and I tried it with Japanese and Arabic characters. Although they are not shown correctly over FTP (in my case the file names appear only as "?????"), they are displayed correctly if you read them by script.
The problem might be the program you use to access your FTP. WinSCP, for example, has UTF-8 set to "auto" by default, so forcing it might work. (I have to admit that it's not working on my side; maybe my Linux server does not support UTF-8 file names, which could also be the problem for you.)
PS:
Also make sure your PHP file is encoded (saved) in UTF-8 without BOM, since you're using a constant UTF-8 string.
EDIT:
Also, if you still intend to use mb_convert_encoding, better add the optional parameter "from_encoding".
I tested this with Japanese in a Shift-JIS encoded file:
$text = "A strange string to pass, maybe with some 日本語の characters.";
echo mb_convert_encoding($text, 'UTF-8');
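and it is not displayed correctly although my browser is set to UTF-8, so the function does not reliably work out the source encoding on its own (when from_encoding is omitted, PHP falls back to its configured internal encoding rather than inspecting the string).
So this, for example, works for me: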
$text = "A strange string to pass, maybe with some 日本語の characters.";
echo mb_convert_encoding($text, 'UTF-8', 'SJIS'); //from SJIS(SHIFT-JIS)
This little script is handy for finding out which from_encoding value you need for your Arabic characters:
http://www.php.net/manual/de/function.mb-convert-encoding.php#97902
But converting is not necessary if the file is already in UTF-8; it only makes sense if the file is in some Arabic encoding, so I don't think this brings you much closer to a solution.
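One way to avoid converting a string that is already UTF-8 is to check it first. The helper below is a sketch, not part of the original answer: toUtf8 is a hypothetical name, and $assumedSource has to be whatever encoding your data really uses (SJIS here only mirrors the test above). Note that a non-UTF-8 string can occasionally pass the validity check by coincidence, so treat it as a heuristic.

```php
<?php
// Hypothetical helper: convert only when the input is not already valid UTF-8.
function toUtf8(string $text, string $assumedSource): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        // Already valid UTF-8 (plain ASCII also passes); leave it alone.
        return $text;
    }
    // Explicit from_encoding, as recommended above.
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

// Build a Shift-JIS byte string to simulate text from a SJIS-encoded file.
$utf8 = "\u{65E5}\u{672C}\u{8A9E}";
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

var_dump(toUtf8($sjis, 'SJIS') === $utf8); // converted
var_dump(toUtf8($utf8, 'SJIS') === $utf8); // passed through unchanged
```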
EDIT2:
I tried a different FTP program: FileZilla displays my files and folders, including the ones with Japanese and Arabic names, correctly. (I was using WinSCP 4.3.4 before.)

TCPDF - missing Swedish characters

I'm using TCPDF in CodeIgniter to generate a PDF file.
I have a link from a view to a controller function that takes parameters. One parameter is name = 'Högskolan'.
When I read this parameter in the controller and display it in the PDF, it is sometimes displayed as 'Högskolan' and sometimes as 'Hgskolan' (missing the Swedish character). This happens only in IE, and not always.
There are also differences between saving the file with File->Save as and File->Save.
With the first option the file is saved as 'Hgskolan.pdf', with the second as 'Högskolan.pdf'.
What could cause these issues? Any ideas?
Thanks.

I would suggest doing something like this: convert accented characters to their plain ASCII equivalents.
You will find it works better to output without the accents (provided it doesn't substantially change the word, of course).
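A minimal sketch of such a conversion, assuming a small hand-written mapping is enough for your names. asciiFold and the map below are illustrative only and cover just a few characters; the keys are written as code point escapes so the result does not depend on how the PHP file itself is saved.

```php
<?php
// Hypothetical helper: replace a few accented letters with ASCII look-alikes.
function asciiFold(string $text): string
{
    // Illustrative map only; extend it for the characters you actually need.
    static $map = [
        "\u{e5}" => 'a', // å
        "\u{e4}" => 'a', // ä
        "\u{f6}" => 'o', // ö
        "\u{c5}" => 'A', // Å
        "\u{c4}" => 'A', // Ä
        "\u{d6}" => 'O', // Ö
        "\u{e9}" => 'e', // é
    ];
    // strtr with an array replaces each key wherever it occurs in the string.
    return strtr($text, $map);
}

echo asciiFold("H\u{f6}gskolan"), ".pdf\n";
```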
As anttir suggested, it's probably a browser-specific or system issue with those characters. Can you test the output in another browser or on another platform to isolate the issue?
I'm not 100% sure if it's TCPDF tripping you up there, or the browser. You can test that with something like Fiddler (http://fiddler2.com/) or Charles (http://www.charlesproxy.com/) [both debugging proxies].

Php: Reading special characters in a variable from a required file

There are three files: index.php, config.php and [language prefix].php.
Index sets basic settings (like the include path) and passes control to config.php.
Config sets a lot more things, including the language, and once it knows the language it requires the language file.
The language file stores some vars for static translation (like $menu=array('foo','bar','etc');)
I've done and tested everything locally, but when I uploaded it to the server, every variable containing special characters (like áéíóúâêîôû etc.) that was declared outside index.php (either in config.php or in the language file) came out as invalid characters (�). If I declare it inside index.php, the characters appear normally.
As it worked locally, I assume it is due to a server setting. What could be the problem? (I send UTF-8 headers and the files are UTF-8 encoded.)
More info:
I have a script that translates the date across languages, and it contains words like "Sábado" which are printed correctly. The script itself is included by the template, but the vars are set and used inside the same file. Can require change the encoding of a file?

Save your files encoded as utf-8, set the page charset to be utf-8 and if there is data from db also set connection as utf-8.
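To check the first point on the server, a small diagnostic along these lines may help. It is a sketch under assumptions: nonUtf8Files is a hypothetical helper, it needs the mbstring extension, and it only tells you whether a file's bytes are valid UTF-8 (an upload that re-encoded a file would show up here).

```php
<?php
// Hypothetical diagnostic: return the paths whose contents are not valid UTF-8.
function nonUtf8Files(array $paths): array
{
    $bad = [];
    foreach ($paths as $path) {
        $bytes = file_get_contents($path);
        // Unreadable files are reported too, so they do not go unnoticed.
        if ($bytes === false || !mb_check_encoding($bytes, 'UTF-8')) {
            $bad[] = $path;
        }
    }
    return $bad;
}

// Possible usage at the end of index.php, after all require/include calls:
//   header('Content-Type: text/html; charset=utf-8');
//   print_r(nonUtf8Files(get_included_files()));
```

get_included_files() lists every file pulled in by include or require, so config.php and the language file are covered without naming them.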

Use a text editor with which you can specify the encoding as UTF-8. I use TextPad. I also found with TextPad that when you are saving, you need to specify the file format as UNIX as opposed to PC. I don't do my main editing with TextPad, but it's useful for specifying the encoding.
UPDATE: I recently found that TextPad doesn't handle UTF-8 encoding well, so I switched to Notepad++. Another encoding problem I had was quickly resolved after switching from TextPad to Notepad++.

Encoding issue with Apache , displaying diamond characters in browser

Please help me set up an Apache server on CentOS. It looks like an encoding issue, but I have not been able to resolve it yet.
Instead of the rendered HTML content, Chrome and Firefox display the HTML source; IE 9 works fine. A � character is displayed after each "<" symbol.
http://pdf.gen.in/index1.htm
The second problem is with PHP. http://pdf.gen.in/index.php displays the PHP source code with similar diamond characters wherever a "<" character occurs. The PHP issue seems to be related to the first one.

Those files are encoded with UTF-16LE. For the static HTML page, you might be able to get it to work by setting the charset correctly in the MIME type (it's currently text/html; charset=UTF-8). I don't know how strong PHP's Unicode support is. Try using UTF-8 instead; it's generally better supported due to its partial overlap with ASCII.
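If re-saving the files in an editor is not convenient, the conversion could be sketched in PHP as below. utf16leToUtf8 is a hypothetical helper and assumes the mbstring extension. In UTF-16LE every ASCII character is followed by a zero byte, which would plausibly explain the stray character after each "<".

```php
<?php
// Hypothetical helper: convert UTF-16LE bytes to UTF-8, dropping a leading BOM.
function utf16leToUtf8(string $bytes): string
{
    // FF FE is the UTF-16LE byte order mark; remove it so it does not
    // end up as a BOM character at the start of the UTF-8 output.
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        $bytes = substr($bytes, 2);
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}

// "<p>" as UTF-16LE with BOM: each ASCII byte is followed by 0x00.
echo utf16leToUtf8("\xFF\xFE<\x00p\x00>\x00"), "\n";
```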

You should use a decent text editor, and always set the encoding of PHP/HTML files to "UTF-8 without BOM".
Create a file named "test.php", paste the code below into it and save it with "UTF-8 without BOM" encoding; then it will work just fine.
<?php
phpinfo();
?>

How to avoid echoing character 65279 in php?

I have encountered a problem similar to one described here (and in other places):
on an Ajax callback I get an xmlhttp.responseText that seems OK (when I alert it, it shows the right text), but when I use an 'if' statement to compare it to the string, it returns false.
(I am also the one who wrote the server-side code returning that string.) After much studying of the string, I discovered that it had an "invisible character" as its first character, a character that was not shown. If I copied it to Notepad and then deleted the first character, nothing was deleted until I pressed Delete again.
I did a charCodeAt(0) on the string returned in xmlhttp.responseText, and it returned 65279.
Googling it reveals that it is some sort of Unicode control character (the byte order mark) that is supposed to indicate "big-endian" or "little-endian" encoding.
So now I know the cause of the problem, but why is that character being echoed?
In the PHP source I simply use
echo 'the string'...
and it apparently somehow outputs [chr(65279)]the string...
Why? And how can I avoid it?

To conclude, and to spell out the solution:
Windows Notepad adds the BOM character (the 3 bytes EF BB BF) to files saved with UTF-8 encoding.
PHP doesn't seem to be bothered by it, unless you include one PHP file in another;
then things get messy and strings get displayed with character 65279 prepended to them.
You can edit the file with another text editor such as Notepad++ and use the encoding
"Encode in UTF-8 without BOM",
and this seems to fix the problem.
Alternatively, you can save the other PHP file with ANSI encoding in Notepad, and this also seems to work (that is, as long as you don't actually use any extended characters in the file, I guess).

If you want to print a string that contains the ZERO WIDTH NO-BREAK SPACE char (e.g., by including an external non-PHP file), try the following code:
echo preg_replace("/\xEF\xBB\xBF/", "", $string);
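The pattern above removes the three BOM bytes wherever they occur in the string. If you only want to drop a BOM at the very start, a variant could look like this (stripLeadingBom is a hypothetical name, not a built-in):

```php
<?php
// Hypothetical helper: remove a UTF-8 BOM (EF BB BF) only at the start of a string.
function stripLeadingBom(string $text): string
{
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

echo stripLeadingBom("\xEF\xBB\xBFthe string"), "\n";
```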

If you are using Linux or Mac, here is an elegant solution to get rid of the BOM character in PHP.
If you are using WordPress (25% of Internet websites are powered by WordPress), chances are that a plugin or the active theme is introducing the BOM character because of a file that contains a BOM (maybe that file was edited on Windows). If that's the case, go to your wp-content/themes/ folder and run the following command:
grep -rl $'\xEF\xBB\xBF' .
This will search for files with a BOM. If there are .php files in the results, then do this for each of them:
Rename the file to something like filename.bom.bak.php
Open the file in your editor and copy its content to the clipboard.
Create a new file and paste the content from the clipboard.
Save the file with the original name filename.php
If you are doing this locally, you will eventually need to re-upload the new files to the server.
If the grep command returns no results and you are using WordPress, another place to check for BOM files is the /wp-content/plugins folder. Go there and run the command again. Alternatively, you can deactivate all the plugins and then check whether the problem reappears as you activate them again one by one.
If you are not using WordPress, go to the root of your project folder and run the command to find files with a BOM. If any file is found, follow the four-step procedure described above.
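Where grep is not available (for example on shared hosting), the search could be approximated in PHP. This is a sketch with a hypothetical function name; unlike the grep command, it only looks at the first three bytes of each .php file, which is where an editor-added BOM sits.

```php
<?php
// Hypothetical helper: list .php files under $dir that start with a UTF-8 BOM.
function findBomFiles(string $dir): array
{
    $found = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if (!$file->isFile() || $file->getExtension() !== 'php') {
            continue;
        }
        // Read only the first three bytes of the file.
        $head = file_get_contents($file->getPathname(), false, null, 0, 3);
        if ($head === "\xEF\xBB\xBF") {
            $found[] = $file->getPathname();
        }
    }
    sort($found);
    return $found;
}

// Possible usage from the project root:
//   print_r(findBomFiles(__DIR__));
```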

You can also remove the character in javascript with:
myString = myString.replace(String.fromCharCode(65279), "" );

I had this problem and changed my encoding to UTF-8 without BOM, ANSI, etc., with no luck. My problem was caused by using a PHP include in the HTML body. Moving the include above my HTML (above the !DOCTYPE tag) resolved the issue.
Once I knew what my issue was, I tested the include, include_once and require functions. Every attempt to include a file from within the HTML body created the extra BOM character (65279) at the spot where the PHP code would start.
I also tried assigning the result of the include to a variable, i.e. $result = include("myfile.txt");, with the same extra character being added.
Please note that moving the include above the HTML does not stop the extra character from being output; it only moves it out of my data and out of the content area.

In addition to the above, I just had this issue when pulling some data from a MySQL database (charset set to UTF-8). The cause was the HTML tags: I allowed some basic ones like <p> and <a>, and when I displayed the data on the page, I saw the &#65729 character when looking through Dev Tools in Chrome.
So I removed the tags from the table, and that removed the &#65729 issue (and the blank line above where the text was to be displayed).
I just wanted to add to this, since my Rep isn't high enough to actually comment on the answer.
EDIT: Using VIM I was able to remove the BOM with :set nobomb and you can confirm the presence of the BOM with :set bomb? which will display either bomb or nobomb

I use "Dreamweaver CC 2015". By default it has this option enabled: "include BOM signature" or something like that. When you click Save As in the File menu, the window that appears has a "Unicode Options.." button, where you can disable the BOM option. Remember to re-save all your files that way. Or you can simply go to Preferences, disable the BOM option and save all your files.

I'm using the PhpStorm IDE to develop PHP pages.
I had this problem and used this IDE option to remove any BOM characters, which solved it:
File -> Remove BOM
Try to find an option like this in your IDE.

Probably something on the server. If you know it's there, I would just bypass it until solved.
myString = myString.substring(1)
Chops off the first character.

In Atom, it shows up as whitespace at the start of the document, before <?php

A Linux solution to find and remove this character from a file is to use sed -i 's/\xEF\xBB\xBF//g' your-filename-here

My solution is to create a PHP file with this content:
<?php
header("Content-Type:text/html;charset=utf-8");
?>
Save it as ANSI, then have every other PHP file require/include it before any HTML or PHP code.
