multi-byte function to replace preg_match_all? - php

I'm looking for a multi-byte function to replace preg_match_all(). I need one that will give me an array of matched strings, like the $matches argument from preg_match(). The function mb_ereg_match() doesn't seem to do it -- it only gives me a boolean indicating if there were any matches.
Looking at the mb_* functions page, I don't see anything offhand that replaces the functionality of preg_match(). What do I use?
Edit: I'm an idiot. I originally posted this question asking for a replacement for preg_match, which of course is ereg_match. However, both of those return only the first result. What I wanted was a replacement for preg_match_all, which returns all matching texts. Anyway, the u modifier works in my case for preg_match_all, as hakre pointed out.

Have you taken a look at mb_ereg?
Additionally, you can pass a UTF-8 encoded string to preg_match using the u modifier, which might be the kind of multi-byte support you need. If your data is in another encoding, the other option is to convert it to UTF-8 first and then convert the results back.
See as well an answer to a related question: Are the PHP preg_functions multibyte safe?
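As a minimal sketch of the u-modifier approach (the sample text and the \p{L}+ pattern are illustrative assumptions, not taken from the question), preg_match_all can collect every match from a UTF-8 string:

```php
<?php
// Assumes this file and the subject string are UTF-8 encoded.
$text = 'naïve café, déjà vu';

// \p{L}+ matches runs of Unicode letters; the u modifier makes PCRE
// treat both the pattern and the subject as UTF-8.
preg_match_all('/\p{L}+/u', $text, $matches);

// $matches[0] holds every full match, in order of appearance.
print_r($matches[0]);
```

Without the u modifier the subject is processed byte by byte, so the accented letters would likely be split apart.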

PHP: preg_grep manual
$matches = preg_grep('/(needles|to|find)/u', $inputArray);
Returns an array indexed using the keys from the input array.
Note the /u modifier, which makes the pattern and the input strings be treated as UTF-8.
Hope it helps others.
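A small sketch of how the keys are preserved (the array contents are made up for illustration):

```php
<?php
// Hypothetical input with string keys; values are UTF-8.
$inputArray = ['a' => 'café', 'b' => 'tea', 'c' => 'thé'];

// Keep only the elements that contain the letter é.
$matches = preg_grep('/é/u', $inputArray);

// The surviving elements keep their original keys ('a' and 'c').
print_r($matches);
```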

Related

str_replace for UTF-16 characters

I have some strings containing characters such as \x{1f601} which I want to replace with some text.
When I do this using preg_replace, it would be something like:
preg_replace('/\x{1f601}/u', '######', $str)
However, this doesn't seem to work with str_replace:
str_replace("\x{1f601}", '######', $str)
How can I make such replacements work with str_replace?
preg_replace uses a Perl-compatible regular expression (PCRE) engine, where \x{1f601} in a pattern with the u modifier denotes the code point U+1F601. str_replace is not regex-based: it does a plain, literal string replacement, so it searches for exactly the text you pass it.
Pasting your preg_replace pattern into regex101 gives this explanation:
matches the character 😁 with position 0x1f601 (128513 decimal or 373001 octal) in the character set
The same replacement can be done without regex by copying and pasting the smiley character directly into the str_replace call:
$str = str_replace("😁", '######', $str);
Or, by reading deceze's comment which gives you a clean, small solution.
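One way to write the character without pasting it, sketched here under the assumption of PHP 7.0 or later (this is not necessarily what the comment proposes): double-quoted strings accept the \u{...} code point escape, whereas \x{...} is regex syntax that PHP string literals do not interpret. The sample string is made up.

```php
<?php
// Requires PHP 7.0+ for the \u{...} escape in double-quoted strings.
$str = "Hi \u{1F601}!";

// "\u{1F601}" is expanded to the UTF-8 bytes of the character,
// so plain str_replace can find it.
$result = str_replace("\u{1F601}", '######', $str);

echo $result, PHP_EOL;
```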
Additional:
Because you are working with multibyte characters, it may be useful to explore mb_str_replace (on GitHub), which is a companion to (but not part of) PHP's mbstring collection of functions.
Finally:
Why do you need str_replace when you are already using preg_replace? Also, please read the manual, which states all of this fairly clearly.

How to transform a string to lowercase with preg_replace

I'm stuck on this and cannot find a solution.
I would like to transform a string to lower case using preg_replace.
I just cannot create the right regex.
The reason is that the normal strtolower does not support Unicode characters.
I know that I could use mb_strtolower, but this function seems to be quite slow and, besides, not everyone has mbstring support.
Any clue?
Regards,
Radek
EDIT: OK, thanks a lot for your help, guys. I think my approach was not quite correct.
I think it would be much better to use this: How do I detect non-ASCII characters in a string? and then use either strtolower or, if available, mb_strtolower accordingly.
A regex cannot change characters by itself; it can only reorder them, add characters, or delete some of them.
There is preg_replace_callback and the /e flag (deprecated in PHP 5.5 and removed in PHP 7.0), but they can only apply existing functions to the matches, and therefore can't do better than strtolower.
If you can't rely on the existence of the mb_strtolower function, you will have to implement it yourself.
You shouldn't use preg_replace for this, because preg_replace is meant to match a certain pattern and replace it with something else. What you want is to replace every single uppercase character with its lowercase counterpart, so there is no need to match a pattern.
mb_strtolower would be the way to go, and if you don't have the mb_ functions you'll have to write a function yourself using a lot of str_replace calls...
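A rough sketch of that fallback idea. The function name lower_utf8 and the character map are hypothetical, and the map is deliberately incomplete; a real fallback would need an entry for every uppercase letter you expect.

```php
<?php
// Hypothetical helper: lower-case a UTF-8 string, preferring mbstring.
function lower_utf8(string $s): string
{
    if (function_exists('mb_strtolower')) {
        return mb_strtolower($s, 'UTF-8');
    }

    // Fallback without mbstring: map a few accented capitals by hand
    // (illustrative subset only), then let strtolower handle ASCII.
    $map = ['É' => 'é', 'È' => 'è', 'À' => 'à', 'Ü' => 'ü'];

    return strtolower(strtr($s, $map));
}

echo lower_utf8('ÉCOLE'), PHP_EOL;
```

Before PHP 8, strtolower depends on the current locale, so the fallback branch assumes a locale that leaves non-ASCII bytes untouched (such as the default "C" locale).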

PHP: is there an isLetter() function or equivalent?

I am no PHP expert. I am looking for the PHP equivalent of isLetter() in Java, but I can't find it. Does it exist?
I need to extract letters from a given string and make them lower case, for example: "Ap.ér4i5T i6f;" should give "apéritif". So, yes, there are accented characters in my strings.
ctype_alpha().
In addition to regex / preg_replace, you can also use strtoupper($string) and strtolower($string), if you need to universally upper-case a string. As Konrad mentioned, preg_replace is probably your best bet though.
http://php.net/manual/en/function.strtoupper.php
http://www.php.net/manual/en/function.strtolower.php
In PHP (and in Java) you wouldn’t use isLetter to implement it, you’d rather replace all characters that aren’t letters using a regular expression:
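echo preg_replace('/\P{L}/u', '', $input);
Look up the documentation of preg_replace and the regex pattern syntax description, in particular the relevant Unicode character classes. The u modifier is needed so that the input is treated as UTF-8; without it, accented letters are processed byte by byte.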
You could probably use the php-slugs source code, with appropriate modifications.
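Putting the pieces together for the example from the question, as a sketch (the helper name letters_only is hypothetical, and the input is assumed to be valid UTF-8):

```php
<?php
// Hypothetical helper: strip everything that is not a Unicode letter.
function letters_only(string $input): string
{
    // \P{L} matches any character that is NOT a letter.
    // preg_replace returns null on error (for example invalid UTF-8),
    // in which case this sketch falls back to an empty string.
    return preg_replace('/\P{L}+/u', '', $input) ?? '';
}

$letters = letters_only('Ap.ér4i5T i6f;');

// Lower-case with mbstring when available; plain strtolower only
// handles ASCII letters reliably.
$lower = function_exists('mb_strtolower')
    ? mb_strtolower($letters, 'UTF-8')
    : strtolower($letters);

echo $lower, PHP_EOL;
```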

php what is the equivalent of preg_match but does not require regex?

In PHP, is there an equivalent to preg_match that does not require the use of regex? There is str_replace() for preg_replace. Is there something for preg_match?
Update: I am only looking to replace a known string with another. Using regex just seems like overkill.
I have the string "This is a [test1], and not a [test2]" and I want to match them with "[test1]" and "[test2]".
If you mean find a string within another string without using regex, you can use strpos
if (strpos('hello today', 'hello') !== false) {
// string found
}
Since I am not sure what result you are looking for I can't say if this is exactly what you are looking for.
You can use strpos to see if an occurrence of one string is in another.
To answer your question: yes, PHP has functions that do this without regex. The manual says:
"Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster."
However, they cannot replace preg_match completely.
First, str_replace() is not a replacement for preg_replace(). str_replace() replaces all occurrences of a literal search string with the replacement string, whereas preg_replace() replaces content selected by a regular expression (that's not the same thing).
A lot of things require regex (and that's good), so you can't simply replace it with a single PHP function.
Most developers use preg_match because they want to use the matches (the third parameter which will get set by the function).
I can not think of a function that will return or set the same information, as done with matches.
If however, you are using preg_match without regex then you might not care as much about the matches.
If you are using preg_match to see if there is a "match" and just that then I'd suggest using strpos instead, since it is much more efficient at seeing if one string is found in another.
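For the string in the question, a regex-free sketch could look like this (the replacement texts 'first' and 'second' are made up for illustration):

```php
<?php
$subject = 'This is a [test1], and not a [test2]';

// strpos returns the offset of the first occurrence, or false if absent.
// Compare with !== false, because a match at offset 0 is also "falsy".
$found = strpos($subject, '[test1]') !== false;

// str_replace accepts arrays, so several known strings can be
// swapped in one call without any regex.
$result = str_replace(['[test1]', '[test2]'], ['first', 'second'], $subject);

echo $result, PHP_EOL;
```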

PHP ereg_replace questions

A couple of PHP ereg_replace questions. I have an array of names:
$ssKeywords = array("Str", "Int", "String", "As", "Integer", "Variant");
However when I use this ereg_replace:
foreach($arKeyword as $aWord) {
$sCode = ereg_replace($aWord, "<span class='ssKeyword'>".$aWord."</span>", $sCode);
}
It only finds "Str" or "Int", not the full match (for example "String" or "Integer"). Apparently ereg_replace is greedy, so why is it not looking for the full match?
I managed to get comments working using preg_replace.
Do you need to use ereg? It was deprecated in PHP 5.3 and removed in PHP 7.0. I suggest you use the preg functions, which are also more efficient.
This information is available at php.net/ereg
Instead of searching for one term at a time better search for all of them at a time:
$sCode = preg_replace('/(?:'.implode('|', array_map('preg_quote', $ssKeywords)).')/', '<span class="ssKeyword">$0</span>', $sCode);
And if you sort the terms by length, longest first, you will find Integer instead of just Int:
usort($ssKeywords, function ($a, $b) { return strlen($b) - strlen($a); });
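The two steps combined, as a sketch. The \b word boundaries and the sample line of code are assumptions added for illustration; they are not part of the answer above.

```php
<?php
$ssKeywords = ['Str', 'Int', 'String', 'As', 'Integer', 'Variant'];

// Longest keywords first, so "Integer" is tried before "Int".
usort($ssKeywords, function ($a, $b) {
    return strlen($b) - strlen($a);
});

// Escape each keyword for use inside a /-delimited pattern.
$quoted = array_map(function ($word) {
    return preg_quote($word, '/');
}, $ssKeywords);

// \b (an added assumption) restricts matches to whole words.
$pattern = '/\b(?:' . implode('|', $quoted) . ')\b/';

$sCode = 'Dim n As Integer';
$out = preg_replace($pattern, '<span class="ssKeyword">$0</span>', $sCode);

echo $out, PHP_EOL;
```

Because everything happens in one pass, text inside a span that was just inserted is never scanned again for shorter keywords.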
Your problem has nothing to do with ereg_replace. Not only is preg_replace a better bet, but in your case you aren't using regular expressions at all; you're just doing a plain string match. So str_replace would be quicker and clearer.
The problem is that you're doing:
foreach($arKeyword as $aWord) {
which loops from the first to the last element of the array, testing the whole string against each of the keywords in the order you declared them. You declared ‘Int’ first, so any ‘Integer’ in the string will get replaced by “<span class="ssKeyword">Int</span>eger” before the loop gets as far as the ‘Integer’ keyword. By which time, with the “</span>” in the way, it'll never match.
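Change the array order so that the longer keywords come first, so that ‘Integer’ is matched before ‘Int’. Be aware that a shorter keyword processed later in the loop can still match inside the markup already inserted for a longer one, so replacing all keywords in a single pass is safer.
If you're doing a plain-text match, then str_replace is more efficient and avoids unnecessary regex overhead. If you do need regular expressions at any time in the future, use the preg_* functions, because ereg_* is deprecated and preg_* is faster.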
With regard to your question about "greedy", that refers to when you're actually using regular expressions. For example if you have the text:
Hello World! Hello World!
And use a regex like this:
/Hell(.+)rld!/
Then it will match the entire string because the + operator is greedy and finds as much as it can on one line. You'd need to do this to stop it being greedy and match each of the phrases:
/Hell(.+?)rld!/
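The difference can be checked with preg_match_all, as in this short sketch:

```php
<?php
$text = 'Hello World! Hello World!';

// Greedy: .+ grabs as much as possible, so there is a single match
// that spans both phrases.
preg_match_all('/Hell(.+)rld!/', $text, $greedy);

// Lazy: .+? stops at the first possible "rld!", giving two matches.
preg_match_all('/Hell(.+?)rld!/', $text, $lazy);

print_r($greedy[0]);
print_r($lazy[0]);
```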
