ucwords() doesn't capitalize non-ASCII characters like ö, ü, ä and õ,
so I need a solution that turns "öösel" into "Öösel".
Is there a simple way to do it with a regexp, or do I have to check all the characters manually?
If you have the mbstring extension installed, you can use the mb_convert_case function, specifying MB_CASE_TITLE as the $mode parameter.
You could try strtoupper(), which works fine for me with French.
Sorry, I hadn't noticed the question was about ucwords...
Otherwise, this should work:
mb_convert_case($string, MB_CASE_TITLE, "UTF-8");
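A short comparison, assuming the mbstring extension is loaded. MB_CASE_TITLE uppercases the first letter of every word and lowercases the rest. If you only want the very first character uppercased and the rest left alone (like ucfirst()), a small helper built from mb_substr() and mb_strtoupper() is one way to do it; the name ucfirst_utf8 is hypothetical (PHP 8.4 also ships a built-in mb_ucfirst()).

```php
<?php
// Requires the mbstring extension.

// Hypothetical helper: uppercase only the first character, keep the rest as-is.
function ucfirst_utf8(string $s, string $encoding = 'UTF-8'): string
{
    $first = mb_substr($s, 0, 1, $encoding);
    $rest  = mb_substr($s, 1, null, $encoding);
    return mb_strtoupper($first, $encoding) . $rest;
}

$word = "öösel";

// ucwords() works on single bytes, so the two-byte "ö" is left unchanged.
echo ucwords($word), "\n";

// Title case: first letter of each word upper, the rest lower.
echo mb_convert_case($word, MB_CASE_TITLE, "UTF-8"), "\n";           // Öösel
echo mb_convert_case("ÖÖSEL JA PÄEV", MB_CASE_TITLE, "UTF-8"), "\n"; // Öösel Ja Päev

// ucfirst-style: only the very first character changes.
echo ucfirst_utf8("öösel ja PÄEV"), "\n";                            // Öösel ja PÄEV
```

Which one to pick depends on whether the rest of the string should be normalized (MB_CASE_TITLE) or preserved (the helper).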
Aside from the other answers, which suffer from the same problems as ucwords, you might consider keeping this variation in your toolbox.
I am using the stripos function to check whether a string occurs inside another string, ignoring case.
Here is the problem:
stripos("ø", "Ø")
returns false, while
stripos("Ø", "Ø")
finds a match (it returns position 0).
As you can see, the function does NOT seem to do a case-insensitive search for these characters.
The function has the same problem with characters like Ææ and Åå, which are also Danish characters.
Use mb_stripos() instead. It's character set aware and will handle multi-byte character sets. stripos() is a holdover from the good old days when there was only ASCII and all chars were only 1 byte.
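A minimal sketch of the difference, assuming the mbstring extension and UTF-8 strings. Note that a match at position 0 is falsy, so the result should always be compared strictly against false.

```php
<?php
// Requires the mbstring extension.

$haystack = "ø";
$needle   = "Ø";

// stripos() lowercases byte by byte (ASCII only in PHP 8.2+), so the
// UTF-8 byte sequences for "ø" and "Ø" never compare equal.
var_dump(stripos($haystack, $needle));                // bool(false)

// mb_stripos() case-folds whole characters in the given encoding.
var_dump(mb_stripos($haystack, $needle, 0, "UTF-8")); // int(0)

// Position 0 is a valid match, so test with !== false.
if (mb_stripos("Blåbærgrød", "ÆR", 0, "UTF-8") !== false) {
    echo "found\n";
}
```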
You need mb_stripos.
mb_stripos will take care of this.
As the other answers say, try mb_stripos() first. If that doesn't help, check the encoding of your PHP file: convert it to UTF-8 and save it. That did the trick for me after hours of research.
I was looking for this for a while, but couldn't find an answer. I need to change a string to lowercase in PHP.
Of course, this can be done with strtolower(), but I was wondering whether it's possible to do it via preg_replace().
I noticed that in vim you can use the \L or \U modifiers with back references to change the case to lower or upper.
Is something like that possible in PHP, i.e. in the second argument of preg_replace()? The reason I want to change the case via preg_replace() is that I heard it might work better for UTF-8 strings (not sure if that's true).
Thanks.
You should actually just use
mb_strtolower($str, 'UTF-8')
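That way you specify that the encoding is UTF-8, and everything should work.
Edit: sorry, I had strtoupper; changed it to lower. Also, you can leave off 'UTF-8', in which case the function uses the current internal encoding (see mb_internal_encoding(), which defaults to UTF-8 since PHP 5.6) rather than detecting the encoding of the string.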
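A quick comparison, assuming the mbstring extension is loaded; the sample string is just illustration.

```php
<?php
// Requires the mbstring extension.

$str = "ÖÖSEL ÄRKAB LINN";

// strtolower() only maps ASCII A-Z (PHP 8.2+, or the default C locale),
// so the multi-byte letters stay uppercase.
echo strtolower($str), "\n";

// mb_strtolower() works on whole characters in the given encoding.
echo mb_strtolower($str, 'UTF-8'), "\n"; // öösel ärkab linn

// Without the second argument, the internal encoding is used.
mb_internal_encoding('UTF-8');
echo mb_strtolower($str), "\n";          // öösel ärkab linn
```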
Doing it with preg_replace() is practically impossible.
Its replacement string has no \L or \U modifiers, so to change the case you would need to call strtolower() / strtoupper() on each match, which preg_replace() cannot do on its own.
Go with the function Dave suggested.
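If you really do need vim-style case changes inside a regex replacement, preg_replace_callback() (a different function from preg_replace()) lets a callback transform each match. A minimal sketch emulating `\L\1`, assuming PHP 7.4+ for the arrow function and the mbstring extension; the KEY=value input is only an illustration.

```php
<?php
// Requires PHP 7.4+ (arrow functions) and the mbstring extension.

$input = "ÄÄR=Öö PÄEV=Öösel";

// Rough equivalent of vim's :s/\(\w\+\)=/\L\1=/g
// The /u modifier makes \pL match whole UTF-8 letters instead of bytes.
$output = preg_replace_callback(
    '/(\pL+)=/u',
    fn(array $m): string => mb_strtolower($m[1], 'UTF-8') . '=',
    $input
);

echo $output, "\n"; // äär=Öö päev=Öösel
```

Only the captured keys are lowercased; the values after "=" are left as they were.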
I've been working with Arabic characters for a while now.
Look at this:
$string = "السلام";
It prints perfectly.
But I want to get the last letter, "م".
I've tried
$string[strlen($string]-1)];
I tried taking a substring too,
and got this output: �
SOLVED:
Forgot to add: mb_internal_encoding("UTF-8");
Thanks a lot guys!
You're using byte-oriented operations on a multi-byte string (UTF-8? UTF-16?). You need the mb_*() functions to work with multi-byte strings: http://php.net/mb_substr
Try this:
<?php
mb_internal_encoding("UTF-8");
$string = "السلام";
echo mb_substr($string, -1);
?>
Your code also has a syntax error:
$string[strlen($string]-1)];
                      ^--should be )
$string[strlen($string)-1];
You should use mb_strlen for multibyte strings. These characters take more than one byte each, so when you fetch them with the native (non-mb) functions you get only part of a character, which usually shows up as gibberish. The mb_* functions take care of that.
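A small sketch of what goes wrong, assuming the mbstring extension and a UTF-8 source file. Every letter of "السلام" is two bytes in UTF-8, so byte indexing picks up only the second byte of "م".

```php
<?php
// Requires the mbstring extension; the file must be saved as UTF-8.

$string = "السلام";

// strlen() counts bytes; each of these Arabic letters is 2 bytes in UTF-8.
echo strlen($string), "\n";              // 12
echo mb_strlen($string, 'UTF-8'), "\n"; // 6

// Byte indexing returns only the last byte of "م", which prints as "�".
$lastByte = $string[strlen($string) - 1];
echo bin2hex($lastByte), "\n";           // 85

// mb_substr() with a negative start counts characters from the end.
echo mb_substr($string, -1, 1, 'UTF-8'), "\n"; // م
```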
I have the problem described in the title.
If I use
preg_match_all('/\pL+/u', $_POST['word'], $new_word);
and type "hello à and ì", the new_word returned is *hello and *
Why?
Someone advised me to list all the characters I want to match, like this:
preg_match_all('/\pL+/u', $_POST['word'], 'aäeëioöuáéíóú');
but I want my application to work with every accented character (it's a multilingual website).
Can you help me?
Thanks.
EDIT: I should add that I use this regex to strip punctuation. It removes all the punctuation correctly, but Unicode characters are returned wrongly; in fact, they are not returned at all.
EDIT 2: Sorry, I explained this very badly.
The problem is not in preg_match_all but in
str_word_count($my_key, 2, 'aäáàeëéèiíìoöóòuúù');
I had to specify the accented characters manually, but I think there are many others. Right?
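\pL (with the u modifier) matches any Unicode letter, accented ones included; it does not match spaces or punctuation. Make sure that $_POST['word'] is a UTF-8 encoded string. If it isn't, try utf8_encode() (which converts from ISO-8859-1 and is deprecated as of PHP 8.2) before matching, or check the encoding of your HTML form. In my tests, your example works like a charm.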
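One way to guard against non-UTF-8 input, assuming the mbstring extension. The helper to_utf8 is hypothetical, and it assumes that anything which is not valid UTF-8 was sent as ISO-8859-1; adjust the fallback to whatever charset your form actually uses. mb_convert_encoding() is used here instead of the deprecated utf8_encode().

```php
<?php
// Requires the mbstring extension.

// Hypothetical helper: return $s as UTF-8, assuming invalid UTF-8 input
// is in the $fallback charset (ISO-8859-1 by default).
function to_utf8(string $s, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', $fallback);
}

$latin1 = "h\xE9llo \xE0";  // "héllo à" as ISO-8859-1 bytes

// With /u, PCRE rejects invalid UTF-8 and the match fails outright.
var_dump(preg_match_all('/\pL+/u', $latin1)); // bool(false)

$word = to_utf8($latin1);
preg_match_all('/\pL+/u', $word, $matches);
echo implode(', ', $matches[0]), "\n";         // héllo, à
```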
You can use the \pL+ pattern together with count() to get the number of words. Then you don't need to care about which characters might occur; \pL handles that for you. This should do the trick:
$string = "áll thât words wíth ìntérnâtiønal çhårs";
preg_match_all('/\pL+/u', $string, $words);
echo count($words[0]); // prints 6
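If you also need the word positions that str_word_count($text, 2, ...) returns, PREG_OFFSET_CAPTURE can supply them. A sketch under these assumptions: the function name unicode_word_positions is hypothetical, the offsets are byte offsets (as with str_word_count()), and words may contain but not start with ' or -, which only roughly mirrors str_word_count()'s rules.

```php
<?php
// Hypothetical Unicode-aware stand-in for str_word_count($text, 2, $charlist).
// Returns [byte offset => word] for every run of Unicode letters.
function unicode_word_positions(string $text): array
{
    // \pL starts a word; letters, ' and - may follow.
    preg_match_all('/\pL[\pL\'-]*/u', $text, $m, PREG_OFFSET_CAPTURE);

    $words = [];
    foreach ($m[0] as [$word, $offset]) {
        $words[$offset] = $word;  // offset is in bytes, not characters
    }
    return $words;
}

print_r(unicode_word_positions("Öösel, ja päeval!"));
```

No character list has to be maintained, so any accented letter is picked up.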
Try using mb_ereg_match() (instead of preg_match()) from the Multibyte String (mbstring) extension. It is made specifically for working with multibyte strings.
Why do PHP's str_replace and many other string functions mess up strings containing special characters such as 'é' and 'à'? And how can I fix this?
str_replace is not multi-byte (Unicode) aware. Use the corresponding mb_* functions instead.
In your case, mb_ereg_replace() sounds like the right option. You could also just use the PCRE regex functions (preg_*) with the u (UTF-8) modifier.
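A sketch of where the byte-level behaviour bites, assuming the PHP file and the strings are all UTF-8. str_replace() compares raw bytes, so replacing one complete UTF-8 sequence with another works; the trouble starts with byte-level constructs such as a regex character class used without the u modifier.

```php
<?php
// Assumes this file is saved as UTF-8.

$text = "été à";

// str_replace() matches the whole byte sequence of "é", so this is fine.
echo str_replace('é', 'e', $text), "\n";        // ete à

// Without /u, [éà] is a set of single bytes ("é" and "à" are 2 bytes each),
// so every byte is replaced separately.
echo preg_replace('/[éà]/', '_', $text), "\n";  // __t__ __

// With /u, the pattern and subject are treated as UTF-8 characters.
echo preg_replace('/[éà]/u', '_', $text), "\n"; // _t_ _
```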
PHP wasn't developed from the ground up to support UTF-8 natively. Instead of putting the character literal in your replacement, it may help to specify it by its entity reference or hex code and replace that, e.g. \x{3094} in a PCRE pattern with the u modifier, or "\u{3094}" in a double-quoted string (PHP 7+). I think that's more consistently supported.
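A small sketch of the escape forms, assuming PHP 7+ and the PCRE u modifier. It also shows why a plain "\x3094" does not mean U+3094: in a PHP string, \x takes at most two hex digits.

```php
<?php
// "\x" in a double-quoted string takes at most two hex digits,
// so "\x3094" is "\x30" ("0") followed by the literal "94".
var_dump("\x3094");                 // string(3) "094"

// PHP 7+ escape for a Unicode code point, emitted as UTF-8 bytes.
$vu = "\u{3094}";
echo bin2hex($vu), "\n";            // e38294

// Inside a PCRE pattern with the /u modifier, use \x{...} instead.
echo preg_replace('/\x{00E9}/u', 'e', "caf\u{00E9}"), "\n"; // cafe
```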
That said, it would help to see your actual problem, with more code.