Should I be using mb_convert_case with MB_CASE_TITLE or ucwords? Or something else? What will the differences be?
It depends.
mb_convert_case() is multibyte safe. ucwords() is not.
mb_convert_case() requires the mbstring extension, which is not always available. ucwords() is always available.
So if your application will only ever use single-byte encodings, ucwords() gives you better portability.
But if your application might need to process multi-byte encodings then ucwords() will fail you.
function uc_words($string){
    // Title-case every word, treating the input as UTF-8
    return mb_convert_case($string, MB_CASE_TITLE, "UTF-8");
}
MB means multibyte, so mb_convert_case() can convert non-ASCII characters; ucwords() can only convert ASCII.
If you use ucwords() on "moj šal", you get "Moj šal"; if you use mb_convert_case(), you get "Moj Šal". That's the difference.
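A minimal sketch comparing the two calls on that example. It assumes the mbstring extension is loaded and that the source file is saved as UTF-8. In PHP 8.2 and later ucwords() only changes ASCII letters regardless of locale; in older versions its result could also depend on setlocale().

```php
<?php
// Source file assumed to be UTF-8, so 'š' is the two bytes C5 A1.
$text = 'moj šal';

// ucwords() works byte by byte, so the multibyte 'š' is left alone.
$bytewise = ucwords($text);

// mb_convert_case() decodes UTF-8 first, so 'š' is title-cased to 'Š'.
$multibyte = mb_convert_case($text, MB_CASE_TITLE, 'UTF-8');

echo $bytewise, PHP_EOL;  // Moj šal
echo $multibyte, PHP_EOL; // Moj Šal
```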
Related
Huh, looking at all those string functions, I sometimes get confused. Some people use the mb_ functions all the time, others use the plain ones, so the question is simple:
When should I use mb_strpos(), and when should I go with the plain strpos()?
And yes, I'm aware that mb_ stands for multibyte, but does that really mean that if I'm working only with UTF-8 encoded strings, I should stick with the mb_ functions?
Thanks in advance!
You should use the mb_ functions whenever you expect to work with text that's not pure ASCII. That said, you can use the regular string functions, even with UTF-8, as long as every string you use them on contains only ASCII characters.
strpos('foobar', 'foo') // fine in any (ASCII-compatible) encoding, including UTF-8
strpos('ふーばー', 'ばー') // returns 6, a byte offset; mb_strpos() returns the character offset 2
Yes, if you're working with UTF-8 (a multibyte encoding: one character can take more than one byte), you should use the mb_* functions.
The non-mb functions work on bytes, not characters. That is fine when 1 character == 1 byte, but that's not the case with UTF-8, for example.
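To make the bytes-versus-characters point concrete, here is a small sketch, assuming mbstring is loaded and the file is saved as UTF-8. Each of the four characters below takes 3 bytes, so byte-based functions count 12 and can cut a character in half.

```php
<?php
$s = 'ふーばー'; // 4 characters, each 3 bytes in UTF-8

// Byte-based functions count and cut bytes.
$bytes = strlen($s);                    // 12
$brokenPrefix = substr($s, 0, 2);       // first 2 bytes: half of 'ふ'

// mb_* functions count and cut characters.
$chars = mb_strlen($s, 'UTF-8');        // 4
$prefix = mb_substr($s, 0, 2, 'UTF-8'); // 'ふー'

var_dump(mb_check_encoding($brokenPrefix, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($prefix, 'UTF-8'));       // bool(true)
```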
I'd say yes. Here's the description from the PHP documentation:
mbstring provides multibyte specific string functions that help you deal with multibyte encodings in PHP. In addition to that, mbstring handles character encoding conversion between the possible encoding pairs. mbstring is designed to handle Unicode-based encodings such as UTF-8 and UCS-2 and many single-byte encodings for convenience....
If you're not sure that the mbstring extension is loaded, check first, because mbstring is not enabled by default.
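One way to do that check, sketched under the assumption that input is valid UTF-8: extension_loaded() reports whether mbstring is available, and the hypothetical helper char_length() falls back to counting characters with a PCRE pattern using the u modifier when it is not.

```php
<?php
/**
 * Hypothetical helper: character length of a UTF-8 string.
 * Uses mbstring when it is loaded, otherwise counts code points with PCRE.
 */
function char_length(string $s): int
{
    if (extension_loaded('mbstring')) {
        return mb_strlen($s, 'UTF-8');
    }
    // With /u each match of '.' is one UTF-8 character; /s lets '.' match newlines too.
    // preg_match_all() returns false on invalid UTF-8, which is reported as an error.
    $count = preg_match_all('/./su', $s);
    if ($count === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $count;
}

echo char_length('ふーばー'), PHP_EOL; // 4
```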
I have a set of strings, some of which contain non-ASCII characters.
How can I get strings with only ASCII characters using a PHP script?
Thanks a lot in advance for any guidance.
<?php
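// Remove every run of bytes outside printable ASCII (0x20 space through 0x7E '~')
echo preg_replace('/[^\x20-\x7E]+/', '', 'Standard ASCII and some gärbägè'); // Standard ASCII and some grbg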
?>
Probably the easiest option is to use the iconv() function (if the iconv extension is available) with either the //IGNORE or the //TRANSLIT option (see the documentation), if that behavior suits your needs.
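A minimal sketch of the //TRANSLIT route, with the regex approach as a fallback. The helper name to_ascii() is hypothetical, and the exact transliteration depends on the iconv implementation and the LC_CTYPE locale ('ä' may come out as 'a', 'ae', or '?'), so no output is promised beyond "ASCII only".

```php
<?php
/**
 * Hypothetical helper: reduce a UTF-8 string to ASCII.
 * Tries iconv's //TRANSLIT first, then falls back to stripping non-printable-ASCII bytes.
 */
function to_ascii(string $s): string
{
    if (function_exists('iconv')) {
        // '@' suppresses the notice iconv() may emit for characters it cannot convert.
        $out = @iconv('UTF-8', 'ASCII//TRANSLIT', $s);
        if ($out !== false) {
            return $out;
        }
    }
    // Fallback: drop every byte outside 0x20-0x7E.
    return preg_replace('/[^\x20-\x7E]+/', '', $s);
}

echo to_ascii('Standard ASCII and some gärbägè'), PHP_EOL;
```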
Hey, I'm using PHP 5 and need to communicate with another server that works entirely in Unicode. I need to convert every string to Unicode before sending it over. This seems like an easy task, but I haven't found a way to do it yet. Is there a simple function that returns a Unicode string, e.g. convert_to_unicode("the string i'm sending")?
You can use the utf8_encode() and utf8_decode() functions, but note that they only convert between ISO-8859-1 and UTF-8 (and are deprecated as of PHP 8.2). For other encodings, you may need the Multibyte String (mb_*) functions.
You can use any of:
utf8_encode() / utf8_decode()
The mb_* Multibyte String functions; in your case, see mb_convert_encoding()
The iconv extension and its iconv() function.
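A short sketch of mb_convert_encoding(), whose argument order is (string, to_encoding, from_encoding). It assumes mbstring is loaded; the input bytes are written as escapes so the example does not depend on the source file's encoding.

```php
<?php
// 'é' in ISO-8859-1 is the single byte E9.
$latin1 = "caf\xE9";

// ISO-8859-1 -> UTF-8: 'é' becomes the two bytes C3 A9.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// UTF-8 -> UTF-16LE: each character here becomes two bytes, low byte first.
$utf16le = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');

echo bin2hex($utf8), PHP_EOL;    // 636166c3a9
echo bin2hex($utf16le), PHP_EOL; // 630061006600e900
```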
You can use the utf8_encode() function.
OK, iconv worked. The catch is that the other end is a Windows server, so I had to use little-endian: UTF-16LE works. Here's the working code:
iconv("UTF-8", "UTF-16LE", "data to send")
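For completeness, a sketch of the round trip: encoding for the UTF-16LE peer and decoding its reply back to UTF-8. It assumes the iconv extension is loaded; the explicit "LE" variant is written without a byte-order mark, which is why the output is exactly two bytes per ASCII character.

```php
<?php
$payload = 'data to send';

// Encode for the little-endian UTF-16 peer.
$wire = iconv('UTF-8', 'UTF-16LE', $payload);

// Decode a reply from the peer back into UTF-8.
$back = iconv('UTF-16LE', 'UTF-8', $wire);

echo strlen($wire), PHP_EOL;  // 24: two bytes per ASCII character, no BOM
var_dump($back === $payload); // bool(true)
```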
Why do PHP's str_replace() and many other string functions mess up strings with special characters such as 'é' and 'à'? And how can I fix this?
str_replace() is not multibyte (Unicode) aware; it works on bytes. Use the corresponding mb_* functions instead.
In your case, mb_ereg_replace() sounds like the right option. You could also use the PCRE regex functions and specify the u (UTF-8) modifier.
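A small sketch of why the u modifier matters, assuming the source file is saved as UTF-8. Without it, a character class containing 'é' and 'à' is really a set of individual bytes, so each byte of an accented letter is matched separately.

```php
<?php
$subject = 'café à'; // UTF-8: 'é' = C3 A9, 'à' = C3 A0

// Without /u the class [éà] is the byte set {C3, A9, A0},
// so each byte of 'é' and 'à' is replaced on its own.
$bytewise = preg_replace('/[éà]/', '_', $subject);

// With /u the pattern and subject are read as UTF-8,
// so each accented letter is a single character.
$utf8 = preg_replace('/[éà]/u', '_', $subject);

echo $bytewise, PHP_EOL; // caf__ __
echo $utf8, PHP_EOL;     // caf_ _
```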
PHP wasn't designed from the ground up to support UTF-8 natively. Instead of writing the character literal in your replacement, it may help to specify its code point, e.g. \x{3094} in a PCRE pattern with the u modifier (or "\u{3094}" in a double-quoted string, since PHP 7); I think that's more consistently supported.
That said, it would help to see your actual problem, with more code.
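A sketch of the code-point escapes mentioned in that answer, assuming PHP 7 or later and a UTF-8 source file. Note that a plain "\x3094" does not mean U+3094, because \x in a double-quoted string takes at most two hex digits.

```php
<?php
// "\u{...}" in a double-quoted string yields the UTF-8 bytes of a code point.
$eAcute = "\u{E9}"; // 'é', the bytes C3 A9

// \x takes at most two hex digits, so this is "\x30" ('0') followed by "94".
$notVu = "\x3094";  // '094'

// In a PCRE pattern with /u, \x{...} matches a single code point.
$result = preg_replace('/\x{E9}/u', 'e', 'café'); // 'cafe'
```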
In PHP, what is the difference between strtolower and mb_strtolower?
If I want to convert a submitted email address to lowercase, which one should I use? Are there any email addresses like this: Name#Domain-Test.com
If there are such addresses, should I still convert the submitted email address to lowercase?
strtolower() doesn't work for Polish characters:
<?php strtolower("mĄkA"); ?>
will return: mĄka
The best solution is to use mb_strtolower():
<?php mb_strtolower("mĄkA",'UTF-8'); ?>
will return: mąka
See strtolower() & mb_strtolower() in PHP Manual
What is the difference between strtolower and mb_strtolower?
The mb_* functions work with multibyte strings. The manual says:
By contrast to strtolower(), 'alphabetic' is determined by the Unicode character properties. Thus the behaviour of this function is not affected by locale settings and it can convert any characters that have 'alphabetic' property, such as A-umlaut (Ä).
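A quick sketch of that difference on German and Polish letters, assuming mbstring is loaded and a UTF-8 source file. The strtolower() result shown is what PHP 8.2+ (or any PHP running in the "C" locale) produces; older versions under a single-byte locale could behave differently.

```php
<?php
$word = 'ÄÖÜ MĄKA';

// strtolower() changes one byte at a time and only lowercases A-Z here,
// so the multibyte letters stay uppercase.
$bytewise = strtolower($word);

// mb_strtolower() uses Unicode case mappings, so every letter is lowercased.
$unicode = mb_strtolower($word, 'UTF-8');

echo $bytewise, PHP_EOL; // ÄÖÜ mĄka
echo $unicode, PHP_EOL;  // äöü mąka
```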
-
Is there any email like this : Name#Domain-Test.com
Yes, I suppose there could be email addresses like that. In practice, most mail systems treat addresses as case-insensitive, so I don't bother changing their case. (Strictly speaking, RFC 5321 allows the local part before the @ to be case-sensitive; only the domain is guaranteed to be case-insensitive.)
The mb_ functions work with multibyte (Unicode) strings as well. Email addresses shouldn't be case-sensitive, so there isn't much reason to convert them to lowercase.
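If you do want a canonical form, one compromise is to lowercase only the domain, which is case-insensitive, and keep the local part as typed. The helper normalize_email() below is hypothetical and assumes an ASCII domain (internationalized domain names would need IDN handling); it splits at the last '@' because a quoted local part may itself contain '@'.

```php
<?php
/**
 * Hypothetical helper: lowercase the domain of an address, keep the local part.
 */
function normalize_email(string $address): string
{
    $at = strrpos($address, '@');
    if ($at === false) {
        return $address; // no '@': leave the input untouched
    }
    $local  = substr($address, 0, $at);
    $domain = substr($address, $at + 1);
    // strtolower() is enough here because the domain is assumed to be ASCII.
    return $local . '@' . strtolower($domain);
}

echo normalize_email('John.Doe@Example.COM'), PHP_EOL; // John.Doe@example.com
```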
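If you use strtolower() on a UTF-8 string while a single-byte locale such as ISO-8859-1 is active, you can corrupt the string. In particular, the byte 0xC3, which is uppercase 'A' with tilde ('Ã') in Latin-1 and also the lead byte of many 2-byte UTF-8 characters, is converted to 0xE3, lowercase 'a' with tilde ('ã'). As of PHP 8.2, strtolower() ignores the locale and only converts ASCII letters, so this no longer happens.
mb_strtolower() can be much slower than strtolower(). If you have a database connection, you may want to let the database convert your strings to lowercase (for example with SQL's LOWER()); latin1/latin9 (ISO-8859-1/15) and other encodings are supported there as well.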