jQuery autocomplete special character (Norwegian) problems - php

I'm using jQuery's autocomplete function on my Norwegian site. When I type the Norwegian characters æ, ø and å, the autocomplete function suggests words that contain the character, but not words that start with it. It seems I've managed to encode Norwegian characters in the middle of words, but not at the start of words.
I'm using a PHP script with my own function for encoding Norwegian characters to UTF-8 and generating the autocomplete list.
This is really frustrating!
Code:
PHP code:
$q = strtolower($_REQUEST["q"]);
if (!$q) return;

// Replaces each entry of $from with the entry at the same position in $to.
function rewrite($string){
    $to = array('%E6','%F8','%E5','%F6','%EB','%E4','%C6','%D8','%C5','%C4','%D6','%CB', '%FC', '+', ' ');
    $from = array('æ', 'ø', 'å', 'ä', 'ö', 'ë', 'æ', 'ø', 'å', 'ä', 'ö', 'ë', '-', '-');
    $string = str_replace($from, $to, $string);
    return $string;
}

// $items is an array containing suggestion words.
foreach ($items as $key=>$value) {
    if (strpos(strtolower(rewrite($key)), $q) !== false) {
        echo utf8_encode($key)."\n";
    }
}
jQuery code:
$(document).ready(function(){
    $("#autocomplete").autocomplete("/search_words.php", {
        position: 'after',
        selectFirst: false,
        minChars: 3,
        width: 240,
        cacheLength: 100,
        delay: 0
    });
});

The bug (I think):
strtolower() will not lowercase special characters.
Therefore, capital special characters (Ä, Æ, Ø, Å, etc.) are never converted by your rewrite() function.
If I understand the code correctly, a query for Øygarden (notice the capital Ø) would leave the first character in its original form, Ø, while you are comparing against the urlencode()d form, which should be %C3%98.
You should use mb_convert_case() instead, specifying UTF-8 as the encoding.
Let me know whether this solves it.
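A minimal sketch of the mb_convert_case() suggestion, assuming the query arrives as UTF-8. The helper name lower_utf8() is made up for illustration and is not part of the original code:

```php
<?php
// Hypothetical helper: lowercases a UTF-8 string, including non-ASCII letters.
function lower_utf8(string $text): string
{
    return mb_convert_case($text, MB_CASE_LOWER, 'UTF-8');
}

// "\xC3\x98" is the UTF-8 byte sequence for Ø, written with escapes so the
// example does not depend on how this file itself is saved.
$query = "\xC3\x98ygarden";

echo lower_utf8($query), "\n"; // øygarden (when the output is viewed as UTF-8)
```

mb_strtolower($text, 'UTF-8') should give the same result; either way the encoding is passed explicitly rather than relying on the default.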
General re-writing suggestions
Your code could be replaced entirely with standard PHP functions, which handle all Unicode characters instead of just those you list and are therefore less prone to bugs. I think the functionality of your custom rewrite() function could be replaced by
urldecode()
iconv()
You would then get proper UTF-8 encoded data that you no longer need to utf8_encode().
That approach might be cleaner and work for all characters. It may also fix whatever bug there is (if the bug is in your code).
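As a rough sketch of how urldecode() and iconv() could be combined: the helper decode_query() is hypothetical, and it assumes the percent-encoded bytes are ISO-8859-1, which is what the %E6, %F8 and %E5 values in the question suggest. If the client already sends UTF-8, the iconv() step should be skipped.

```php
<?php
// Hypothetical helper: decodes a percent-encoded query string and converts it to UTF-8.
// Assumption: the decoded bytes are ISO-8859-1 (Latin-1).
function decode_query(string $raw): string
{
    // "%F8" becomes the single byte 0xF8; "+" becomes a space.
    $bytes = urldecode($raw);

    // Every ISO-8859-1 byte has a UTF-8 equivalent, so this conversion cannot fail.
    return iconv('ISO-8859-1', 'UTF-8', $bytes);
}

$decoded = decode_query('%F8ygarden');
echo bin2hex($decoded), "\n"; // c3b8796761726465 6e without the space: the ø is now two bytes
```

Note that PHP already applies URL decoding to $_GET and $_REQUEST values, so in a real request handler only the iconv() step may be needed.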

I'm using a similar configuration but with Danish characters (æ, ø and å) and I do not have a problem with any characters. Are you sure you are encoding all characters correctly?
My response contains a |-delimited list of values. All values are UTF-8 encoded (that's how they are stored in the database), and I set the content type to text/plain; charset=utf-8 using PHP's header() function. That last step is not needed for it to work, though.
Frank

Thank you for all answers and help. I certainly learned some new things about PHP and encoding :)
But the solution that worked for me was this:
I found out that the jQuery autocomplete function actually UTF-8 encodes and lowercases special characters before sending them to the PHP script. So when I write out the array of suggestions, I use my rewrite() function to encode the special characters, and in the comparison I only have to lowercase everything.
Now it works great!

I had a similar problem. The solution in my case was to use PHP's urldecode() function to convert the string back to its original form and then send the query to the database.

Related

Change encoding from windows-1251 to utf-8

I'm trying to decode files created in windows-1251 and encode them to UTF-8. Everything works except some special characters such as ÅÄÖåäö. E.g. Ä becomes Ž, which I then fix with preg_replace as shown below; that works fine:
$file = preg_replace("/\Ž/", 'Ä', $file);
I'm having trouble with Å, which shows up as <U+008F>. I see that this translates to "single shift three", and I can't seem to use preg_replace on it.
You have two major builtin functions to do the job, just pick one:
Multibyte String:
$file = mb_convert_encoding($file, 'UTF-8', 'Windows-1251');
iconv:
$file = iconv('Windows-1251', 'UTF-8', $file);
To determine why your homebrew alternative doesn't work we'd need to spend some time reviewing the complete codebase but I can think of some potential issues:
You're working with mixed encodings yet you aren't using hexadecimal notation or string entities of any kind. It's also unclear what encoding the script file itself is saved as.
There's no \Ž escape sequence in PCRE (no idea what the intention was).
Perhaps you're replacing some strings more than once.
Last but not least, have you compiled a complete and correct character mapping database of at least the 128 code points that differ between both encodings?
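To illustrate the point about hexadecimal notation, here is a small sketch that feeds known Windows-1251 bytes through mb_convert_encoding() and checks the result. The helper name cp1251_to_utf8() and the sample word are made up for the example:

```php
<?php
// Hypothetical helper: converts Windows-1251 bytes to UTF-8 and fails loudly
// if the result is not valid UTF-8.
function cp1251_to_utf8(string $bytes): string
{
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1251');
    if (!mb_check_encoding($utf8, 'UTF-8')) {
        throw new RuntimeException('Conversion did not produce valid UTF-8');
    }
    return $utf8;
}

// The Russian word "Привет" as Windows-1251 bytes, in hexadecimal notation,
// so the example does not depend on the encoding of this script file.
$sample = "\xCF\xF0\xE8\xE2\xE5\xF2";

echo bin2hex(cp1251_to_utf8($sample)), "\n"; // d09fd180d0b8d0b2d0b5d182
```

Windows-1251 is a Cyrillic code page and contains no Å, Ä or Ö, so if those letters are expected in the output, the files may really be in a different source encoding. That is an observation about the code page, not something the question confirms.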

How to escape special characters in Code Igniter

I'm building a website in CodeIgniter and I'm trying to save products in a database. I don't know in advance what kind of products I'll get; they come from an XML file. Sometimes there are special characters in it, and CodeIgniter crashes.
I fixed it like this:
$search = array('â', 'à', 'á', 'ã', 'ä', 'å', 'Â', '€', 'š', '¢', 'Ã');
$string = str_replace($search, '', $string);
But this is not nice, and I haven't covered all possible special characters.
My question is: is there an easier way to get this done?
Sorry for the short answer but I think you have issues regarding handling UTF8 encoded characters and need to read this (a guide to working with PHP, MySQL and UTF8 character encoding - other guides are available ) -
http://www.toptal.com/php/a-utf-8-primer-for-php-and-mysql
I believe you should take a look at the convert_accented_characters() function of the Text Helper in CI:
It transliterates high ASCII characters to low ASCII equivalents, useful when non-English characters need to be used where only standard ASCII characters are safely used, for instance, in URLs.
$string = convert_accented_characters($string);
This function uses a companion config file, application/config/foreign_chars.php, to define the to and from arrays for transliteration.

Converting Æ to "Ae" In PHP With Str_replace?

For reasons justified by business logic, I need to convert the character "Æ" to "Ae" in a string. However, despite the fact that mb_detect_encoding() tells me the string is UTF-8, I can't figure out how to do this. (And for other reasons of business logic, it would be an issue to htmlentities() the string before replacing it, as other Google searches have suggested.)
What I tried first was this, using the test string "Æther":
return str_replace("Æ", 'Ae', $string);
Unfortunately, that doesn't actually find the Æ in the text, returning "Æther".
return str_replace(chr(195), 'Ae', $string);
That finds the Æ and replaces it, but adds an unknown character afterwards, changing it to the not-usable "Ae�ther." So I tried this:
$ae_character = mb_convert_encoding('&#' . intval(195) . ';', 'UTF-8', 'HTML-ENTITIES');
return str_replace($ae_character, 'Ae', $string);
Which again failed to find the Æ character in the string. I know it's a UTF-8 issue of some sort, but I'm honestly stumped as to how to search for and replace this without adding the extra character afterwards. Any ideas?
<?php
$x = 'Æmystr';
print str_replace('Æ', 'AE', $x); // prints: AEmystr
?>
That code works just fine; what I believe you're missing is the encoding of your file. Your .php file should be saved as UTF-8. This can be done in most text editors and IDEs, e.g. Eclipse, EditPlus, Notepad++, and even Notepad on Windows 7.
When saving, bring up the Save/Save As dialog; there is normally an Encoding dropdown or set of radio buttons near the Save button that lets you choose between ANSI, UTF-8 and others.
On *nix I believe most editors have this option too, I'm just not sure where. If you get it working and later edit and save the file with an editor that only writes ANSI, the character will be overwritten with an unknown character.
As to why the code below didn't work:
return str_replace(chr(195), 'Ae', $string);
It's because Æ is encoded in UTF-8 as two bytes, 195 and 134 (0xC3 0x86). chr(195) is only the first of them, so the second byte is left behind as an unknown character. Try this:
print str_replace(chr(195).chr(134), 'AE', $x);
That should replace it as well, and it might even be preferable because it does not depend on the encoding of the file.
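The difference between the two calls can be checked at the byte level. This is only a sketch; the variable names are illustrative, and the string is built from escapes so that it is UTF-8 regardless of how the script file is saved:

```php
<?php
$ae   = "\xC3\x86";      // UTF-8 bytes of Æ (U+00C6): 195 and 134
$text = $ae . 'ther';

// Replacing only the first byte leaves the second byte (0x86) behind.
$partial = str_replace(chr(195), 'Ae', $text);

// Replacing both bytes removes the whole character.
$full = str_replace(chr(195) . chr(134), 'Ae', $text);

echo bin2hex($partial), "\n"; // 41658674686572 ("Ae", stray 0x86, "ther")
echo $full, "\n";             // Aether
```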

Replace unicode character

I am trying to replace a certain character in a string with another. They are quite obscure latin characters. I want to replace character (hex) 259 with 4d9, so I tried this:
str_replace("\x02\x59","\x04\xd9",$string);
This didn't work. How do I go about this?
EDIT: Additional information.
Thanks bobince, that has done the trick. Although, I want to replace the uppercase schwa also and it is not working for some reason. I calculated U+018F (Ə) as UTF-8 0xC68F and this is to be replaced with U+04D8 (0xD398):
$string = str_replace("\xC9\x99", "\xD3\x99", $_POST['string_with_schwa']); //lc 259->4d9
$string = str_replace( "\xC6\8F", "\xD3\x98" , $string); //uc 18f->4d8
I am copying the 'Ə' into a textbox and posting it. The first str_replace works fine on the lowercase, but does not detect the uppercase in the second str_replace, strange. It remains as U+018F. Guess I could run the string through strtolower but this should work though.
U+0259 Latin Small Letter Schwa is only encoded as the byte sequence 0x02,0x59 in the UTF-16BE encoding. It is very unlikely you will be working with byte strings in the UTF-16BE encoding as it's not an ASCII-compatible encoding and almost no-one uses it.
The encoding you want to be working with (the only ASCII-superset encoding to support both Latin Schwa and Cyrillic Schwa, as it supports all Unicode characters) is UTF-8. Ensure your input is in UTF-8 format (if it is coming from form data, serve the page containing the form as UTF-8). Then, in UTF-8, the character U+0259 is represented using the byte sequence 0xC9,0x99.
str_replace("\xC9\x99", "\xD3\x99", $string);
If you make sure to save your .php file as UTF-8-no-BOM in the text editor, you can skip the escaping and just directly say:
str_replace('ə', 'ә', $string);
A couple of possible suggestions. Firstly, remember that you need to assign the new value to $string, i.e.:
$string = str_replace("\x02\x59","\x04\xd9",$string);
Secondly, verify that your byte stream occurs in the $string. I mention this because your hex string begins with a low-byte, so you'll need to make sure your $string is not UTF8 encoded.
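For the uppercase case from the question's edit, one thing worth checking is the escape itself: the second call there writes "\xC6\8F", which lacks the x before 8F, so PHP reads it as the byte 0xC6 followed by a literal backslash, "8" and "F". A sketch with both mappings written out in full follows; the helper name latin_to_cyrillic_schwa() is made up, and the input is assumed to be UTF-8:

```php
<?php
// Hypothetical helper: maps Latin schwa to Cyrillic schwa in a UTF-8 string.
function latin_to_cyrillic_schwa(string $text): string
{
    $map = [
        "\xC9\x99" => "\xD3\x99", // U+0259 ə -> U+04D9 ә
        "\xC6\x8F" => "\xD3\x98", // U+018F Ə -> U+04D8 Ә
    ];
    // strtr() with an array replaces each key with its value in a single pass.
    return strtr($text, $map);
}

$input = "\xC6\x8F and \xC9\x99";
echo bin2hex(latin_to_cyrillic_schwa($input)), "\n"; // d39820616e6420d399
```

On PHP 7.2 or later, mb_chr(0x018F, 'UTF-8') can be used to double-check a hand-calculated byte sequence.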

PHP and character encoding problem with  character

I'm having a problem where PHP (5.2) cannot find the character 'Â' in a string, though it is clearly there.
I realize the underlying problem has to do with character encoding, but unfortunately I have no control over the source content. I receive it as UTF-8, with those characters already in the string.
I would simply like to remove it from the string, but strpos(), str_replace(), preg_replace(), trim(), etc. cannot correctly identify it.
My string is this:
"Â Â Â A lot of couples throughout the World "
If I do this:
$string = str_replace('Â','',$string);
I get this:
"� � � A lot of couples throughout the World"
I even tried utf8_encode() and utf8_decode() before the str_replace, with no luck.
What's the solution? I've been throwing everything I can find at it...
$string = str_replace('Â','',$string);
How is this 'Â' encoded? If your script file is saved as ISO-8859-1, the string 'Â' is encoded as the single byte 0xC2, while its UTF-8 representation is the two bytes 0xC3 0x82. PHP's str_replace() works at the byte level, i.e. it only "knows" single-byte characters.
See http://docs.php.net/intro.mbstring
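One possible reading of the broken output, offered as an assumption rather than something the question confirms: the source text contains UTF-8 non-breaking spaces (U+00A0, bytes 0xC2 0xA0), which look like "Â " in a single-byte view. Removing only 0xC2 would then leave an orphaned 0xA0, which is displayed as "�". Under that assumption, a sketch with a hypothetical helper strip_nbsp() would be:

```php
<?php
// Hypothetical helper. Assumes the stray "Â" is really the first byte of a
// UTF-8 non-breaking space (0xC2 0xA0) and replaces the whole two-byte sequence.
function strip_nbsp(string $text): string
{
    return trim(str_replace("\xC2\xA0", ' ', $text));
}

$input = "\xC2\xA0 \xC2\xA0 \xC2\xA0 A lot of couples throughout the World ";
echo strip_nbsp($input), "\n"; // A lot of couples throughout the World
```

Running bin2hex() on the original string is a quick way to see which bytes are actually present before deciding what to replace.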
I use this:
function replaceSpecial($str){
    $chunked = str_split($str, 1);
    $str = "";
    foreach ($chunked as $chunk) {
        $num = ord($chunk);
        // Keep only bytes 32 to 123 (space through "{"); every other byte,
        // including all non-ASCII bytes, is dropped.
        if ($num >= 32 && $num <= 123) {
            $str .= $chunk;
        }
    }
    return $str;
}
From the PHP Manual Comment Page:
http://www.php.net/manual/en/function.preg-replace.php#96847
And from StackOverflow:
Remove accents without using iconv
