I'm developing a site with CodeIgniter that supports multiple languages. When a user searches in their native language, the first page of results is fine, but when I paginate the results the search term in the URL is not decoded.
This is the URL which is used to paginate.
When I print the URI segment I get %E0%B4%AE
I tried urlencode and urldecode, but then I got different characters, like à´®
Can anyone tell me how I can decode this type of character set?
While urldecode is what you should be using, the reason that you are getting the wrong output printed is probably because the output page's encoding hasn't been set to UTF-8, and is thus defaulting to ISO-8859-1. Hence, while the characters have been decoded correctly by PHP, the browser then interprets the characters in the wrong encoding, resulting in incorrect display.
To fix the problem, send a charset in the Content-type header before any output like so:
header('Content-type: <type>; charset=utf-8');
If your output page is HTML, you could alternatively use this tag in the head:
<meta charset="utf-8">
If you take the second option, be sure to place the tag as early as possible in the head, as browsers do not scan past the first 1024 bytes of the page for this declaration.
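As a rough illustration of the explanation above, the following sketch decodes the segment from the question and then simulates what a browser does when it reads the decoded bytes as ISO-8859-1. It assumes the mbstring extension is available; the variable names are purely illustrative.

```php
<?php
// The raw URI segment as printed in the question.
$segment = '%E0%B4%AE';

// rawurldecode() turns each %XX escape back into a single byte.
$decoded = rawurldecode($segment);

// Three bytes that together form one UTF-8 character (U+0D2E).
$hex = bin2hex($decoded); // "e0b4ae"

// Simulate a browser reading those bytes as ISO-8859-1:
// each byte becomes its own character, which gives "à´®".
$misread = mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');
```

The decoded bytes are already correct; only the charset declared for the output page decides whether they are shown as one character or three.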
Related
I have a problem when using accents in HTML. My page sometimes loads with all characters correct and sometimes with the typical strange characters like Ã; refreshing the page may load it correctly or incorrectly. This seems completely random, but the first load after clearing the cache is always wrong.
Of course I have the meta line in the head:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
The file has a .php extension. I don't know if this is relevant, but I include the following two lines in the PHP section:
header("Content-Type: text/html;charset=UTF-8");
ini_set('default_charset', 'UTF-8');
Thanks
Those settings tell the browser which encoding you claim to be using, but they don't change the encoding of the data itself.
If your data is not UTF-8 encoded, you need to convert it in your code using something like the utf8_encode() function (which only converts from ISO-8859-1) or the mb_convert_encoding() function.
You can use the mb_detect_encoding() function to find out what encoding your data is in, and then convert accordingly.
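A minimal sketch of that detect-then-convert approach follows. The helper name to_utf8 and its default candidate list are assumptions made for illustration, and the mbstring extension is assumed to be loaded.

```php
<?php
/**
 * Hypothetical helper: return $text as UTF-8, guessing the source
 * encoding from a short list of candidates.
 */
function to_utf8(string $text, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Already valid UTF-8: leave it alone to avoid double encoding.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Strict mode rejects candidates for which the bytes are invalid.
    $detected = mb_detect_encoding($text, $candidates, true);
    if ($detected === false) {
        throw new InvalidArgumentException('Could not detect the encoding');
    }

    return mb_convert_encoding($text, 'UTF-8', $detected);
}
```

Checking for valid UTF-8 first is a deliberate choice here: converting text that is already UTF-8 is a common cause of the Ã-style garbage described in the question.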
I'm using PHP and cURL to access a remote API. It returns a JSON result. The API returns user-posted content, so I expected some odd characters here and there. However, very simple characters such as – or ’ are being echoed out via PHP as Chinese characters (I'm aware those aren't true dashes or apostrophes, but rather some equivalent). Nonetheless, other websites manage to display them fine, so I'm not sure why they're echoed out as Chinese characters in my case.
For example: the character ’ echoes out as 鈥檙.
I've tried various PHP methods at my disposal to get them to encode or display correctly, including:
htmlentities()
utf8_encode()
htmlspecialchars()
and none make a difference.
Additionally, I've checked and my page does have
<meta charset="utf-8">
in the <head> element.
Am I missing an obvious solution? I feel like I must be.
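鈥檙 is not a special character; it is Unicode text. Plain ASCII characters fit in a single byte,
whereas a non-ASCII Unicode character such as ’ takes several bytes in UTF-8 (three in this case), so decoding those bytes with the wrong charset produces unrelated characters.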
Have you tried removing
<meta charset="utf-8">
The API's HTTP Content-Type should give you an idea of the character encoding. You need to view the headers returned by your curl request to see what encoding you're receiving. Running curl from the command line will show you:
curl -v http://...
For example, curl -v google.com shows:
Content-Type: text/html; charset=UTF-8
Then you need to be sure that you are respecting that character encoding in your database, and in your HTML meta tag.
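To respect the declared encoding in PHP code rather than by eye, something along these lines could work. Both function names are hypothetical, the fallback to UTF-8 when no charset is declared is an assumption, and mbstring is assumed to be available.

```php
<?php
/**
 * Hypothetical helper: pull the charset out of a Content-Type header
 * value, or return null when none is declared.
 */
function charset_from_content_type(string $contentType): ?string
{
    if (preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

/**
 * Hypothetical helper: convert a response body to UTF-8 based on the
 * charset declared in its Content-Type header.
 */
function body_to_utf8(string $body, string $contentType): string
{
    // Assumption: treat a missing charset as UTF-8.
    $charset = charset_from_content_type($contentType) ?? 'UTF-8';

    return $charset === 'UTF-8'
        ? $body
        : mb_convert_encoding($body, 'UTF-8', $charset);
}
```

With PHP's cURL extension, the header value can be read after the request with curl_getinfo($ch, CURLINFO_CONTENT_TYPE) and passed in as the second argument.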
So, I was just an idiot. I failed to notice that there was a conflicting meta tag on MY page adding the WRONG charset. Thanks to all who took time to try and help.
I have converted results from a web scrape from DOMNodeLists to strings:
$node = $the_sentence->item(0);
$the_sentence = "{$node->nodeName} - {$node->nodeValue}";
However, now when I print out the result, it includes whatever tag the text had in the page as well as the Â character:
Before:
"This is the sentence"
Now:
"h2 - This is the Âsentence Â"
Any ideas how I can get rid of these characters? Thanks for any help.
This looks like a character set problem.
Have a look at the source page and see what character set it is encoded in. This might be in a Content-Type HTTP header, or it might be in a <meta> tag at the start of the document. Then, when you handle the data, make sure that everything you do handles it in the same format.
You probably want to store the data in UTF-8. Thus, if you capture in another format, in general it is a good idea to convert it from that charset to UTF-8; this will mean you can capture from a wide range of sources and store it in the same database. Look at iconv in the PHP manual if you wish to learn more about charset conversion.
Are you printing the output to console or a browser? If the former, note that some consoles (old versions of Windows in particular) do not handle UTF-8 well at all. If you are echoing to a browser, make sure your character set is set to "UTF-8" in your own HTML.
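The Â in the output above is typically the first byte (C2) of a UTF-8 non-breaking space (C2 A0) being shown in a single-byte charset. Besides declaring UTF-8 on the output page, the non-breaking spaces can be normalised away. The helper below is a hypothetical sketch that assumes the text is valid UTF-8, which is what PHP's DOM extension returns from nodeValue.

```php
<?php
/**
 * Hypothetical helper: replace non-breaking spaces (U+00A0) with
 * ordinary spaces, collapse runs of whitespace, and trim the result.
 */
function clean_node_text(string $text): string
{
    // The /u flag makes PCRE treat pattern and subject as UTF-8.
    $collapsed = preg_replace('/[\s\x{00A0}]+/u', ' ', $text);

    // preg_replace() returns null on invalid UTF-8; fall back to the input.
    return trim($collapsed ?? $text);
}
```

It could be applied as clean_node_text($node->nodeValue) before the string is built.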
I know there are a number of posts about UTF-8 encoding issues, but I am failing to convert a string to UTF-8.
I have a string "beløp" in PHP.
When I print this string in an iframe, it is printed as "bel�p".
After that I tried utf8_encode("beløp"); and got the output "bel�p".
Then I tried iconv("UTF-8", "ISO-8859-1", "beløp"); and got the output "bel ".
Finally I tried utf8_encode(utf8_decode("beløp")); and got the output "bel?p".
Please let me know where I'm going wrong and how I can fix it.
This
bel�p
is an indication that you are outputting a non-UTF-8 character in a UTF-8 context.
Make sure your file is encoded in UTF-8 (I don't know what editor you're using, but Notepad++ and Sublime Text have a "Save with encoding..." option), and check that the top of your HTML page contains
<meta charset="utf-8">
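To see why the replacement character appears, it can help to look at the bytes. The sketch below writes "beløp" with explicit byte escapes so that it does not depend on how the example file itself is saved; it assumes the mbstring extension, and the variable names are illustrative.

```php
<?php
// "beløp" as raw bytes in two encodings.
$latin1 = "bel\xF8p";      // ISO-8859-1: ø is the single byte F8
$utf8   = "bel\xC3\xB8p";  // UTF-8: ø is the two bytes C3 B8

// The lone F8 byte is not valid UTF-8, so a UTF-8 page shows "�" for it.
$latin1IsValidUtf8 = mb_check_encoding($latin1, 'UTF-8'); // false
$utf8IsValidUtf8   = mb_check_encoding($utf8, 'UTF-8');   // true

// Converting the ISO-8859-1 bytes yields the correct UTF-8 string.
$fixed = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Converting text that is already UTF-8 double-encodes it ("belÃ¸p").
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
```

A string literal typed into a PHP file takes on whatever encoding the file is saved in, which is why re-saving the file as UTF-8 changes the output.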
Hi, it's fixed. The problem was that my file was not encoded in UTF-8.
I fixed it by replacing "bel�p" with "beløp".
The reason your conversion does not work is that the original text "beløp" was not in ISO-8859-1. utf8_encode() only works for conversions from that encoding. What could work for this type of issue is to use the mb_detect_encoding function (http://php.net/manual/en/function.mb-detect-encoding.php) to find out which encoding the text is originally in, then use iconv to convert from the detected encoding to UTF-8. When this is done, make sure, as mentioned in earlier comments, that UTF-8 is declared as the encoding in the header.
Note that PHP's mb_detect_encoding is not very reliable and can make mistakes when detecting the encoding, especially if you do not have a large amount of text. To ensure that all text is displayed correctly at all times, make sure that all processing is done in the same encoding. If you get the text from external sources or web services, you should always check the headers for the correct encoding before the text is processed.
I have a script which caches a number of RSS feeds, however I have noticed that I've started getting strange characters appearing in the page where I output the cached contents (Stored in DB).
For instance the RSS feed contains the characters: Introducing…: ...
Which should read: Introducing...: ...
However my page displays it as: Introducingâ€¦: ...
It seems that these strange characters are actually being stored in the database like this.
Can anyone suggest where I might be going wrong?
Do I need to encode on the way into the database and then decode on the way out?
You need to make sure that the encoding of the RSS feed is the same as in your DB. Otherwise you first need to convert the content.
The encoding of the feed should be in the XML header:
<?xml version="1.0" encoding="UTF-8"?>
You can use this function to convert it to the encoding you use in the DB (preferably UTF-8):
http://php.net/manual/function.mb-convert-encoding.php
When you use UTF-8, make sure you also set the database connection to UTF-8. For example, in MySQL (which spells the charset name utf8, or utf8mb4 for full 4-byte support):
SET NAMES 'utf8';
Then set the correct output content type as described by Anthony Williams. Ideally, do both: set the META Content-Type and send the Content-Type HTTP header.
Since your application seems to decode the HTML entities of that cached RSS feed before writing them to the DB, you may also re-encode them on output, so that they appear as you received them in the first place:
<?php echo htmlentities($string, ENT_QUOTES, 'UTF-8'); ?>
The fact that there are 3 bad characters in the output suggests that the RSS feed is being interpreted so that the HTML character reference is converted to its three-byte UTF-8 encoding, which the page then displays in a single-byte charset.
Try setting the text encoding of your display page to UTF-8 by adding the following to the output HTML in the <head> section:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Alternatively, since this is PHP you can set the HTTP header directly:
<?php
header("Content-Type: text/html; charset=UTF-8");
?>
However, a better solution might be to avoid converting the entity in the first place. Have you got a call to html_entity_decode() in the code that retrieves the RSS feed? If so, then it might be wise to remove it.
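The byte-level reasoning can be checked with a short sketch. It assumes the feed stores the ellipsis as the numeric reference &#8230; (the exact form used in the feed is not known) and that the mbstring extension is available.

```php
<?php
// Assumed form of the cached feed title: a numeric character reference.
$fromFeed = 'Introducing&#8230;: ...';

// html_entity_decode() turns the reference into the real character,
// which UTF-8 encodes as three bytes (E2 80 A6).
$decoded = html_entity_decode($fromFeed, ENT_QUOTES, 'UTF-8');
$ellipsisHex = bin2hex(mb_substr($decoded, 11, 1, 'UTF-8')); // "e280a6"

// Read as Windows-1252, those three bytes become three characters.
$misread = mb_convert_encoding("\xE2\x80\xA6", 'UTF-8', 'Windows-1252');

// htmlspecialchars() escapes only markup characters, so the UTF-8
// ellipsis passes through unchanged and relies on the page charset.
$safe = htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');
```

Either route works as long as it is consistent: keep the reference as it arrived, or decode it and serve the page as UTF-8.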