I have string like this - "Divizia NaÅ¢ionalÄ", and want to convert it to "Divizia Naţională", which is correct string. What heppand, how to remove this special characters? Hich PHP function I can use? Now to convert it to " Devizia Nationala" which is best readable for everyone?
Use UTF8
Be sure your data source is UTF8
Learn how to force browsers to display UTF8
Don't use other encodings, convert everything to UTF8 as soon as possible
Your best support on this mission is iconv()
Related
I have a URL like: domain.tld/Σχετικά_με_μας
[edit]
Reading the $_SERVER['REQUEST_URI'] I get to work with:
%CE%A3%CF%87%CE%B5%CF%84%CE%B9%CE%BA%CE%AC_%CE%BC%CE%B5_%CE%BC%CE%B1%CF%82
[/edit]
In PHP I need to convert it to HTML, I get pretty far with:
htmlentities(urldecode($navstring), ENT_QUOTES, 'UTF-8');
It results in:
Σχετικά_με_μας
but the 'ά' becomes 'ά' But I need it converted to
ά
I'dd really appreciate help. I need a universal solution, not a "string replace"
I have been playing around a little, and the following worked. Use mb-convert-encoding instead of htmlentities.:
mb_convert_encoding(urldecode($navstring),'HTML-ENTITIES','UTF-8');
//string(90) "domain.tld/Σχετικά_με_μας"
See mb-convert-encoding
Information
All modern web browsers understand UTF-8 character encoding.
My advice would be :
Always know the character encoding of the data you are using.
Store your data with UTF-8.
Output data with UTF-8
The mbstring php extension doesn't just manipulate Unicode strings. It also converts multibyte strings between various character encodings.
Use the mb_detect_encoding() (ref) and mb_convert_encoding() (ref 2) functions to convert Unicode strings from one character encoding to another.
PHP Needs to know !
You also need to tell PHP that you are working with UTF-8, to tell him the default value, you can do it in your php.ini file :
default_charset = "UTF-8";
That default value is added to the default Content-Type header returned by PHP unless you specified it with the header() function :
header('Content-Type: application/json;charset=utf-8');
Keep in mind
The default character set is used by a lot of functions in PHP such as :
htmlentities()
htmlspecialchars()
all the mbstring functions
...
i am trying to get some information from a webpage however it is in a different encoding is there an easy way to convert to utf8 and then use it?
For example i am getting these urls which i will need to visit
http://www.mega.co.il/jsfweb/cat/טופו/
http://www.mega.co.il/jsfweb/cat/גבינה_מלוחה/
http://www.mega.co.il/jsfweb/cat/גבינה_לארוח/
http://www.mega.co.il/jsfweb/cat/גבינה_מותכת/
http://www.mega.co.il/jsfweb/cat/גבינה_צהובה/
http://www.mega.co.il/jsfweb/cat/גבינה_לבנה/
http://www.mega.co.il/jsfweb/cat/קוטג/
how do i turn that to utf8 and then urlencode in php?
You can try function html_entity_decode() to decode that entities. To change decoding, use mb_convert_encoding(). I have no experience with Hebrew, so I don't know if it would work.
What is the best way to convert user input to UTF-8?
I have a simple form where a user will pass in HTML, the HTML can be in any language and it can be in any character encoding format.
My question is:
Is it possible to represent everything as UTF-8?
What can I use to effectively convert any character encoding to UTF-8 so that I can parse it with PHP string functions and save it to my database and subsequently echo out using htmlentities?
I am trying to work out how to best implement this - advice and links appreciated.
I am making use of Codeigniter and its input class to retrieve post data.
A few points I should make:
I need to convert HTML special characters to their respective entities
It might be a good idea to accept encoding and return it in that same encoding. However, my web app is making use of :
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
This might have an adverse effect on things.
Specify accept-charset in your <form> tag to tell the browser to submit user-entered data encoded in UTF-8:
<form action="foo" accept-charset="UTF-8">...</form>
See here for a complete guide on HOW TO Use UTF-8 Throughout Your Web Stack.
Is it possible to represent everything as UTF-8?
Yes, UTF-8 is a Unicode encoding, so you can use any character defined in Unicode. That's the best you can do with a computer to date.
What can I use to effectively convert any character encoding to UTF-8
iconv lets you convert virtually any encoding to any other encoding. But, for that you have to know what encoding you're dealing with. You can't say "iconv, whatever this is, make it UTF-8!". That's unfortunately not how it works. You can only say "iconv, I have this string here in BIG5, please convert that to UTF-8.".
If you're only dealing with form data in UTF-8 though, you'll probably never need to convert anything.
so that I can parse it with PHP string functions
"PHP string functions" work on bytes. They don't care about characters or encodings. Depending on what you want to do, working with naive PHP string functions on UTF-8 text will give you bad results. Use encoding-aware string functions in the MB extension for any multi-byte encoding string manipulation.
save it to my database
Just make sure your database stores text in UTF-8 and you have set your database connection to UTF-8 (i.e. the database knows you're sending it UTF-8 data). You should be able to specify that in the CodeIgniter database connection settings.
subsequently echo out using htmlentities?
Just echo htmlentities($text), nothing more you need to do.
However, my web app is making use of : <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
This might have an adverse effect on things.
Not at all. It just signals to the browser that your page is encoded in UTF-8. Now you just need to make sure that's actually the case (as you're trying to do anyway). It also implies to the browser that it should send UTF-8 to the server. You can make that explicit with the accept-charset attribute on forms.
May I recommend What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text, which might help you understand more.
1) Is it possible to represent everything as UTF-8?
Yes, everything defined in UNICODE. That's the most you can get nowadays, and there is room for the future that UNICODE can support.
2) What can I use to effectively convert any character encoding to UTF-8 so that I can parse it with PHP string functions and save it to my database and subsequently echo out using htmlentities?
The only thing you need to know is the actual encoding of your data. If you want your webapplication to support UTF-8 for input and output, the frontend needs to signal that it supports UTF-8. See Character Encodings for a guide regarding your applications user-interface.
Within PHP you need to feed any function with the encoding it supports. Some need to have the encoding specified, for some you need to convert it. Always check the function docs if it supports what you ask for. Additionally check your PHP configuration.
Related:
Preparing PHP application to use with UTF-8
How to detect malformed utf-8 string in PHP?
If you want to change the encoding of a string you can try
$utf8_string = mb_convert_encoding( $yourBadString , 'UTF-8' );
I found out that the only thing that works out for UTF-8 encoding is setting inside my config.php
putenv('LC_ALL=en_US.utf8'); // or whatever language you need
setlocale(LC_ALL, 'en_US.utf8'); // or whatever language you need
bindtextdomain("mydomain", dirname(__FILE__) . "/../language");
textdomain("mydomain");
EDIT :
Is it possible to represent everything as UTF-8?
Yes, these is what you need to ensure :
html : headers/meta-header set to utf-8
all files saved as utf-8
database collation, tables and data encoding to utf-8
What can I use to effectively convert any character encoding to UTF-8
You can use utf8_encode (Since for a system set up mainly for Western European languages, it will generally be ISO-8859-1 or its close relation,ref) before saving it into your database.
// eg
$name = utf8_encode($this->input->post('name'));
And as i mention before, you need to make sure database collation, tables and data encoding to utf-8. In CI, at your database connection config
// Make sure have these lines
$db['default']['char_set'] = 'utf8';
$db['default']['dbcollat'] = 'utf8_general_ci';
i wanna convert to original string of “Cool†..Origingal string is cool . (' is backquote)
It seems that you just forgot to specify the character encoding properly.
Because “ is what you get when the character “ (U+201C) encoded in UTF-8 (0xE2809C) is interpreted with a single-byte character encoding like Windows-1252 (default character encoding in some browsers) where 0xE2, 0x80, and 0x9C represent the characters â, €, and œ respectively.
So just make sure to specify your character encoding properly. Or if you actually want to use Windows-1252 as your output character encoding, you can convert your UTF-8 data with mb_convert_encoding, iconv or similar functions.
There's a wide variety of character encoding functions in PHP, especially if you have access to the multibyte string functions. (mb_string is thankfully enabled on most PHP installs.)
What you need to do is convert the encoding of the original string to the encoding you require, but as I don't know what encoding has been used/is required all I can suggest is that you could try using the mb_convert_encoding function, possibly after using mb_detect_encoding on the original string.
Incidentally, I'd highly recommend attempting to keep all data in UTF-8, (text files, HTML encoding, database connections/data, etc.) as you'll make your life a lot easier this way.
Can php convert strings with all charset encodes to utf8?
Solutions that don't works:
utf8_encode($string) - but its only Encodes an ISO-8859-1 string to UTF-8?
iconv($incharset, $outcharset,$text) - but how can be find string current encodding?
(only can be if string part of html dom document, not just string)
thanks
It is possible to convert a string from any encoding supported by iconv() into UTF-8 in PHP.
but how can be find string current encodding?
You should never need to "find" the current encoding: Your script should always know what it is. Any resource you query, if properly encoded, will give you its encoding in the content-type header or through other means.
As Artefacto says, there is the possibility of using mb_detect_encoding() but this is not a reliable method. The data flow of the program should always have it defined what encoding a string is in (and preferably work with UTF-8 internally) - that's the way to go.
In general, you cannot know the encoding a given string using.
All you can do is guess. There's mb_detect_encoding, which doesn't really work well and then there are more complex heuristics, such as those used by browsers, which employ language cues.