Changing case with regex - php

I was looking for this for a while, but was not able to find any answer. I need to change a string to lowercase in PHP.
Off course, this can be done by using strtolower(), but I was wondering if its possible to do it via preg_replace().
I noticed that in vim one can use \L or \U modifiers in the back references to change the case to lower or upper.
Is something like that possible to do in PHP, i.e. in the second argument in preg_replace()? The reason why I wanna change the case via preg_replace() is that I heard that it might work better for UTF8 strings (not sure if its true).
Thanks.

You should actually just use
mb_strtolower($str, 'UTF-8')
That way you specify utf-8 is the encoding, and all should work well.
Edit: sorry had strtoupper, changed to lower. Also, you can leave off utf-8 and it should automatically detect the encoding and use the right one.

Doing with preg_replace is practically impossible.
This is because you need to pass the strtolower() / strtoupper() as a parameter to preg_replace function. Since preg_replace cannot act on their own.
Go with the function what Dave suggested.

Related

How to get only integer value Started from Symbol from given string

I need a integer value which started from £ and £ , I try to do with regrex but I only getting value which starting from £.
Here I use the regrex Like this.
if(preg_match('/(\£[0-9]+(\.[0-9]{2})?)/',$vals,$matches))
{
$main[]= str_replace('£','',$matches[0]);
}
I am not familiar with regrex. so please share any solution. any help would highly appriciated.Thank you.
From your question I understand that you are having troubles with character encodings, so first of all I would suggest you to address this issue one step before, it is really important to resolve encoding issues in the earliest possible step.
Back to the question, first off, to avoid falling deeper into the charset encoding hell, I would recommend you to write your regexp literal in HEX, because otherwise the charset encoding in which you save your PHP files would affect the result. I.E. if you do something like this:
preg_match('/(£|£)(\d+)', ...)
It would match "£" and "£" (binary) if you save your source code in ISO-8859-1, but it would actually match "£" and "£" (binary) if you chose to save your source code in UTF-8 (which might be a good idea in general). So be careful with this, and verify what your editor/IDE is doing!
My suggestion thus is to write it this way, which is equivalent for ISO-8859-1 and UTF-8:
preg_match('/(\xa3|\xc2\xa3)(\d+)', ...) // match "£" and "£"
Also I suggest to make use of the sub-pattern capture feature of regular expressions, so you don't have to str_replace() afterwards, this way:
if (preg_match('/(?:\xa3|\xc2\xa3)([0-9]+(?:\.[0-9]{2})?)/', $data, $regp)) {
$main[] = $regp[1];
}
The "?:" at after the "(" means "this is a sub-pattern, but don't capture it".
Note that you can also replace preg_match with preg_match_all and you will find in $regp[1] the array of all matching numbers already prepared.
Try with this modified regex:
(?:£|£)([0-9]+(\.[0-9]{2})?)
It should do the trick. But it will return you decimal values also, because of the:
(.[0-9]{2})?
You can remove it and it will return only the integer part after £|£

How to make Stripos("UTF-8") in PHP? [duplicate]

I am using the stripos function to check if a string is located inside another string, ignoring any cases.
Here is the problem:
stripos("ø", "Ø")
returns false. While
stripos("Ø", "Ø")
returns true.
As you might see, it looks like the function does NOT do a case-insensitive search in this case.
The function has the same problems with characters like Ææ and Åå. These are Danish characters.
Use mb_stripos() instead. It's character set aware and will handle multi-byte character sets. stripos() is a holdover from the good old days when there was only ASCII and all chars were only 1 byte.
You need mb_stripos.
mb_stripos will take care of this.
As the other solutions say, try first with mb_stripos(). But if using this function doesn't help, check the encoding of your php file. Convert it to UTF-8 and save it. That did the trick for me after hours of research.

using the php 5.4's new constant ENT_DISALLOWED in htmlentities

There is a string I'm trying to output in an htmlencoded way, and the htmlentities() function always returns an empty string.
I know exactly why it does so. Well, I am not running PHP 5.4 I got the latest PHP 5.3 flavor installed.
The question is how I am gonna be able to htmlencode a string which has invalid code unit sequences.
According to the manual, ENT_SUBSTITUTE is the way to go. But this constant is not defined in PHP 5.3.X.
I did this:
if (!defined('ENT_SUBSTITUTE')) {
define('ENT_SUBSTITUTE', 8);
}
still no luck. htmlentities is still returning empty string.
I wanted to try ENT_DISALLOWED instead, but I cannot find its corresponding long value for it.
So my question is two folded
What's the constant value of PHP 5.4's ENT_DISALLOWED?
How do I make sure that a string containing non UTF-8 characters (such as the smart quotes), can be cleared out of them? - Not just the smart quotes but anything that causes htmlentities() to return blank string.
It is true that htmlentities() in PHP 5.3 does not have the ENT_SUBSTITUTE flag, however it has the (not really suggested) ENT_IGNORE flag. Be ware of the note and try to understand it before use:
Using this flag is discouraged as it » may have security implications.
It is far better that you understand why there is a problem with the input string in the first place. Most often users are only missing to specify the correct encoding.
E.g. first re-encode the string into UTF-8, then pass it to htmlspecialchars() or htmlentities(). Speaking of smart-quotes you are probably using a Windows-1252 encoded string. You won't even need to convert that one before use, you can just specify the charset properly (PHP 5.3):
htmlentities($string, ENT_QUOTES, $encoding = 'Windows-1252');
Naturally this only works if the input $string is encoded in Windows-1252 (CP1252). Find out the correct encoding first, then it's normally no problem. For non-supported encodings re-encode into a supported one first, for example with iconv or mb_string.
As you say, these constants were added in 5.4.0. The thing is, the support is new to 5.4.0 as well. Meaning you can pass whatever values you want, older htmlentities will not understand it.
As it is most probably the case, php changelog is quite misleading.

need capitalize words with special chars in PHP

ucwords doesn't capitalize foreign chars like öüäõ
so I need a solution, which will make "öösel" into "Öösel"
Is there a simple way to do it with regexp or I have to check all the characters manually?
If you have the mbstring extension installed, you can use the mb_convert_case function, specifying MB_CASE_TITLE as the $mode parameter.
You can give a try to strtoupper() which works fine for me with French.
Sorry I hadn't seen it was ucwords...
Otherwise, this should work:
mb_convert_case($string, MB_CASE_TITLE, "UTF-8");
Aside from the other answers, which suffer from the same problems as ucwords, you might take a look at keeping this variation in your toolbox.

why php's str_replace mess up the strings with special chars

why php's str_replace and many other string functions mess up the strings with special chars such ('é' 'à' ..) ? and how to fix this problem ?
str_replace is not multi-byte (unicode) aware. use the according mb_* functions instead
in your place mb_ereg_replace sounds like the right option. you could as well just use the PCRE regex functions and specifying the X flag
PHP wasn't developed from the ground up to natively support UTF8. It may be useful to instead of specify the character literal, specify the entity reference / hex code of that in your replacement, eg \x3094 and replace that, I think it's more consistently supported.
Though it would help seeing your direct issue at hand, with more code.

Categories