I've been working with Arabic characters for a while now.
Look at this:
$string = "السلام";
Works perfectly when I print it.
But I want to get the last letter, "م".
I've tried
$string[strlen($string]-1)];
Tried substring too.
Getting this output: �
SOLVED:
Forgot to add: mb_internal_encoding("UTF-8");
Thanks a lot guys!
You're trying to use byte-oriented operations on a multi-byte string (UTF-8? UTF-16?). You need to use the mb_*() functions to work with multi-byte strings: http://php.net/mb_substr
Try this:
<?php
mb_internal_encoding("UTF-8");
$string = "السلام";
echo mb_substr($string, -1);
?>
Your code is also incorrect (there is a syntax error):
$string[strlen($string]-1)];
                      ^-- should be )
The closing parenthesis after -1 is also superfluous. The syntactically valid form is:
$string[strlen($string)-1];
You should use mb_strlen for multibyte strings. These characters take more than one byte, therefore when you fetch them with native non-mb functions, you take only one part of the character, which is usually some gibberish. mb_* functions take care of that.
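To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext-mbstring is available.
$string = "السلام";

$byteCount = strlen($string);              // counts bytes
$charCount = mb_strlen($string, 'UTF-8');  // counts characters

// Indexing by byte offset returns a single byte, i.e. half of a two-byte letter.
$lastByte = $string[$byteCount - 1];

// mb_substr() with a negative start returns the whole last character.
$lastChar = mb_substr($string, -1, null, 'UTF-8');

echo $byteCount, " bytes, ", $charCount, " characters\n"; // 12 bytes, 6 characters
echo bin2hex($lastByte), "\n";                            // 85 (a lone continuation byte)
echo $lastChar, "\n";                                     // م
```

The lone byte 0x85 is not valid UTF-8 on its own, which is why the browser shows the replacement character.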
Related
I have the following code:
$text = 'Tomáš';
echo strpos($text, "č");
# result is 4
I believe they are different chars so why is PHP telling me they are the same?
What is going on and how can I correct this?
The encoding you chose to save your source code file in cannot encode the characters you're trying to save. Whatever characters PHP is seeing, it's not comparing the strings you think it is. Save your source code in an encoding that can encode all characters, preferably UTF-8.
You should try with mb_strpos function.
Performs a multi-byte safe strpos() operation based on number of characters. The first character's position is 0, the second character position is 1, and so on.
With a regular setup, it returns false for me.
However, if you have trouble with such special characters, using mb_strpos instead of strpos should help.
http://php.net/manual/en/function.mb-strpos.php
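A short sketch of what the two functions report when the file really is saved as UTF-8 (an assumption here, as is the availability of mbstring). It searches for "š" as well, to show the difference between byte and character offsets:

```php
<?php
// Assumption: this file is saved as UTF-8 and ext-mbstring is available.
$text = 'Tomáš';

// "č" does not occur in the string, so strpos() returns false.
$missing = strpos($text, 'č');

// "š" does occur. strpos() reports a byte offset, and "á" occupies two bytes.
$byteOffset = strpos($text, 'š');

// mb_strpos() reports a character offset.
$charOffset = mb_strpos($text, 'š', 0, 'UTF-8');

var_dump($missing);    // bool(false)
var_dump($byteOffset); // int(5)
var_dump($charOffset); // int(4)
```

If strpos() finds "č" in "Tomáš", the bytes PHP sees are probably not the characters shown in the editor, which points back to the file encoding.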
I am using the stripos function to check if a string is located inside another string, ignoring any cases.
Here is the problem:
stripos("ø", "Ø")
returns false. While
stripos("Ø", "Ø")
returns true.
As you might see, it looks like the function does NOT do a case-insensitive search in this case.
The function has the same problems with characters like Ææ and Åå. These are Danish characters.
Use mb_stripos() instead. It's character set aware and will handle multi-byte character sets. stripos() is a holdover from the good old days when there was only ASCII and all chars were only 1 byte.
You need mb_stripos.
mb_stripos will take care of this.
As the other solutions say, try first with mb_stripos(). But if using this function doesn't help, check the encoding of your php file. Convert it to UTF-8 and save it. That did the trick for me after hours of research.
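As a rough illustration of the answers above, the following sketch wraps mb_stripos() in a small helper. The helper name contains_ci is hypothetical, and the code assumes UTF-8 input with mbstring loaded.

```php
<?php
// Hypothetical helper (not part of PHP): case-insensitive "contains" check.
// Assumption: both strings are UTF-8 and ext-mbstring is available.
function contains_ci(string $haystack, string $needle): bool
{
    // mb_stripos() returns an int offset or false, so compare strictly;
    // a match at offset 0 would otherwise be treated as "not found".
    return mb_stripos($haystack, $needle, 0, 'UTF-8') !== false;
}

$offset = mb_stripos("Blåbærgrød", "GRØD", 0, 'UTF-8');

var_dump(contains_ci("ø", "Ø")); // bool(true)
var_dump(contains_ci("æ", "Æ")); // bool(true)
var_dump($offset);               // int(6)
```

The strict comparison matters for the original example too: a match at the start of the string returns 0, which is falsy in a loose check.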
I have gone through the following question:
substr() not working but it did not work for me :(
I am facing the same problem. I am using nicEditor, and at the time of insert I do htmlentities(addslashes(urlencode($description)))
When I view the description, it is displayed correctly, but when I use substr() it returns nothing.
like:
substr($description,0,10)
$description contains the content and it is fine, present in db, works without substr()
Please provide a var_dump() of $description and a bit more of the code that runs before $description is filled in, so we can see whether there is another problem.
Try this one
Use mb_substr for multibyte character encodings like UTF-8. substr just counts bytes, while mb_substr counts characters.
substr() works with single-byte encodings only.
http://php.net/manual/en/function.mb-substr.php
Source: PHP Substr Function Trimming Problem
This happens because in UTF-8 characters are not restricted to one
byte, they have variable length to match Unicode characters, between 1
and 4 bytes.
A safe way of cutting these strings without losing anything is by
using the mb_substr PHP function instead. It works almost the same way
as substr but the difference is that you can add a new parameter to
specify the encoding type, whether is UTF-8 or a different encoding.
Source: http://osc.co.cr/extracting-a-substring-from-a-utf-8-string-in-php/
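The effect described in the quote can be checked directly. The sketch below uses an arbitrary Arabic sample string (not from the question) and assumes a UTF-8 source file with mbstring loaded:

```php
<?php
// Assumption: this file is saved as UTF-8 and ext-mbstring is available.
// Sample text only; every Arabic letter here takes two bytes in UTF-8.
$text = "مرحبا بالعالم";

// substr() cuts after 5 bytes: two whole letters plus the first half of a third.
$byteCut = substr($text, 0, 5);

// mb_substr() cuts after 5 characters.
$charCut = mb_substr($text, 0, 5, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): ends mid-character
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
var_dump(strlen($charCut));                     // int(10): 5 characters, 10 bytes
```

A string that ends in the middle of a character is invalid UTF-8, and functions such as htmlentities() may return an empty string for invalid input, which is one possible reason for getting "nothing" back.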
I am using substr to get the first 20 characters of a string. It works fine in normal situations, but with RTL languages (UTF-8) it gives me wrong results (only about 10 characters are shown). I have searched the web but found nothing useful to solve this issue. This is my line of code:
substr($article['CBody'],0,20);
Thanks in advance.
If you’re working with strings encoded as UTF-8 you may lose
characters when you try to get a part of them using the PHP substr
function. This happens because in UTF-8 characters are not restricted
to one byte, they have variable length to match Unicode characters,
between 1 and 4 bytes.
You can use mb_substr(). It works almost the same way as substr, but it accepts an additional parameter that specifies the encoding, whether that is UTF-8 or something else.
Try this:
$str = mb_substr($article['CBody'], 0, 20, 'UTF-8');
echo $str; // output as UTF-8; utf8_decode() would convert to ISO-8859-1 and turn RTL characters into "?"
Hope this helps.
Use mb_substr() instead. It handles multi-byte characters.
http://php.net/manual/en/function.mb-substr.php
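For the "first 20 characters" use case, a small wrapper is one option. The function name excerpt and the "..." suffix are assumptions made for illustration; only mb_strlen() and mb_substr() come from PHP itself.

```php
<?php
// Hypothetical helper (not from the thread): return at most $maxChars
// characters of a UTF-8 string, appending a suffix only when text was cut.
// Assumption: input is valid UTF-8 and ext-mbstring is available.
function excerpt(string $text, int $maxChars = 20, string $suffix = '...'): string
{
    if (mb_strlen($text, 'UTF-8') <= $maxChars) {
        return $text; // short enough, return unchanged
    }
    return mb_substr($text, 0, $maxChars, 'UTF-8') . $suffix;
}

// Usage with a 30-letter sample string:
$long = str_repeat('م', 30);
echo mb_strlen(excerpt($long, 20, ''), 'UTF-8'), "\n"; // 20
echo excerpt('short text'), "\n";                      // short text
```

Passing the encoding explicitly avoids depending on mb_internal_encoding() having been set elsewhere.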
I have the problem described in the title.
If I use
preg_match_all('/\pL+/u', $_POST['word'], $new_word);
and I type "hello à and ì", the new_word returned is "hello and".
Why?
Someone advised me to specify all characters I want to convert in this way
preg_match_all('/\pL+/u', $_POST['word'], 'aäeëioöuáéíóú');
, but I want my application to work with all existing accents (for a multilanguage website).
Can you help me?
Thanks.
EDIT: To clarify, I use this regex to strip punctuation. It strips all punctuation correctly, but Unicode characters are returned wrongly; in fact they are not returned at all.
EDIT 2: I am sorry, I explained this very badly.
The problem is not in preg_match_all but in
str_word_count($my_key, 2, 'aäáàeëéèiíìoöóòuúù');
I had to manually specify accented characters but I think there are many others. Right?
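\pL matches any Unicode letter (it does not match spaces or punctuation), and the u modifier makes the pattern and subject be treated as UTF-8. Make sure that $_POST['word'] is a UTF-8 encoded string. If it is not, try utf8_encode() before matching (this assumes the input is ISO-8859-1) or check the encoding of your HTML form. In my tests, your example works like a charm.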
You may use this together with count() to get the number of words. Then you need not care about the possible characters. \pL will do this for you. This should do the trick:
$string = "áll thât words wíth ìntérnâtiønal çhårs";
preg_match_all('/\pL+/u', $string, $words);
echo count($words[0]); // returns: 6
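The same idea can be packaged as a small function. This is only a sketch: the name count_words_unicode is hypothetical, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): count runs of Unicode letters.
// Assumption: $text is valid UTF-8; with the /u modifier, preg_match_all()
// returns false for invalid UTF-8, which is treated as zero words here.
function count_words_unicode(string $text): int
{
    // preg_match_all() returns the number of full pattern matches.
    $count = preg_match_all('/\pL+/u', $text);
    return $count === false ? 0 : $count;
}

echo count_words_unicode("áll thât words wíth ìntérnâtiønal çhårs"), "\n"; // 6
echo count_words_unicode("hello à and ì"), "\n";                            // 4
```

Because \pL+ only matches letters, words containing apostrophes or hyphens (for example "don't") are counted as two; extending the character class would be needed if that matters.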
Try using mb_ereg_match() (instead of preg_match()) from Multibyte String PHP library. It is specially made for working with multibyte strings.