Migrate from ereg_replace to preg_replace - php

I'm just migrating from PHP 5.2 to 5.3, a lot of hard work! Is the following OK, or would you do something differently?
$cleanstring = ereg_replace("[^A-Za-z0-9]^[,]^[.]^[_]^[:]", "", $critvalue);
to
$cleanstring = preg_replace("[^A-Za-z0-9]^[,]^[.]^[_]^[:]", "", $critvalue);
Thanks all

As a follow-up to cletus's answer:
I'm not familiar with the POSIX regex syntax (ereg_*) either, but based on your criteria the following should do what you want:
$cleanstring = preg_replace('#[^a-zA-Z0-9,._:]#', '', $critvalue);
This removes everything except a-z, A-Z, 0-9, and the punctuation characters , . _ and :.
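A quick way to check the pattern is to run it on a sample string. This is only a sketch; the input value below is made up for illustration.

```php
<?php
// Hypothetical input: only letters, digits and , . _ : should survive.
$critvalue = 'user_name: "J. Doe", id=42!';

// Inside [...] a leading ^ negates the class, so this matches every
// character that is NOT in the allowed set and replaces it with nothing.
$cleanstring = preg_replace('#[^a-zA-Z0-9,._:]#', '', $critvalue);

echo $cleanstring, PHP_EOL; // user_name:J.Doe,id42
```

Spaces, quotes, = and ! are all stripped because none of them is listed in the class.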

I'm not that familiar with the ereg_* functions but your preg version has a couple of problems:
Outside a character class, ^ means beginning of string, so with it in the middle it won't match anything; and
You need to delimit your regular expression;
An example:
$out = preg_replace('![^0-9a-zA-Z]+!', '', $in);
Note I'm using ! to delimit the regex but you could just as easily use /, ~ or whatever. The above removes everything except numbers and letters.
See Pattern Syntax, specifically Delimiters.
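To illustrate the point about delimiters, here is a small sketch that runs the same pattern with three different delimiters. The sample input is an assumption; the choice of delimiter should not change the result as long as the character does not appear unescaped inside the pattern.

```php
<?php
// Hypothetical sample input.
$in = 'abc-123 / def!';

// Same pattern, three different delimiters.
$patterns = [
    '![^0-9a-zA-Z]+!',
    '/[^0-9a-zA-Z]+/',
    '~[^0-9a-zA-Z]+~',
];

$results = [];
foreach ($patterns as $pattern) {
    // Remove every run of characters that are not letters or digits.
    $results[] = preg_replace($pattern, '', $in);
}

print_r($results); // three times "abc123def"
```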

Related

Make a multiple delimiter REGEX for preg_split

I need to split multiple lines in multiple files by different delimiters. I think preg_split should do the job, but I have never worked with PCRE regular expressions. I could also change all my delimiters to be consistent, but that adds unnecessary calculations.
Q: My delimiters consist of (,)(;)(|)(space) and I am curious how to build such a regex.
Put the characters in square brackets []:
$parts = preg_split('/[,;| ]/', $string, -1, PREG_SPLIT_NO_EMPTY);
You can also use \s instead of a space character, which matches all kinds of whitespace, such as tabs and newlines.
Try this:
$string = "foo:bar|it;is:simple";
print_r(preg_split ( '/,|;|\||\s/' , $string ));
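Combining the two suggestions, a character class with \s plus PREG_SPLIT_NO_EMPTY might look like the sketch below. The sample string is invented; the + quantifier is an addition so that runs of delimiters count as one split point.

```php
<?php
// Hypothetical input mixing all four delimiter kinds, plus a tab.
$string = "foo,bar;baz|qux  quux\tlast";

// [,;|\s]+ matches one or more of: comma, semicolon, pipe, any whitespace.
// -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops empty pieces.
$parts = preg_split('/[,;|\s]+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts); // foo, bar, baz, qux, quux, last
```

Inside a character class the pipe is a literal character, so it does not need the backslash that the alternation form requires.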

preg_replace or mb_ereg_replace in this case?

I have this RegEx for matching whitespace in Unicode:
/^[\pZ\pC]+|[\pZ\pC]+$/u
I'm not even sure of what it does, but it seems to work. Now, in this case, which function applies better and why?
$str = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);
or
$str = mb_ereg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);
The first one works. The second one doesn't.
Tried it out again, mb_ereg_replace doesn't actually support those Unicode char escapes. And it doesn't use regex delimiters. (See Oniguruma)
preg_replace uses the PCRE regex engine, which supports both.
Anyway, there is no such thing as a "better" application. It's either functioning, or not.
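For what it's worth, here is a sketch of what the preg_replace version does. \pZ matches Unicode separators (spaces of all kinds) and \pC matches control and other invisible characters, so the pattern trims both from the start and the end of the string. The sample string is an assumption, and the "\u{...}" escapes need PHP 7 or later.

```php
<?php
// "\u{00A0}" is a no-break space, "\u{2003}" an em space.
$str = "\u{00A0}\t hello world \u{2003}\n";

// First alternative trims the start, second trims the end.
// The /u modifier makes PCRE treat pattern and subject as UTF-8.
$trimmed = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);

var_dump($trimmed); // string(11) "hello world"
```

The space between the two words is kept because neither alternative can match in the middle of the string.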

What is the regular expression for space and alpha-numeric

I'm using an Ajax check function to validate an inserted category name, which should contain only alphanumeric characters and spaces.
I've used eregi_replace with the following regular expression: [a-zA-Z0-9_]+
$check = eregi_replace('([a-zA-Z0-9_]+)', "", $catname);
But when I insert a category name such as hello world, it fails because the expression does not accept a space; if I write it as helloworld, it works. So I understood that the error must be in the regular expression I'm using.
So what is the correct regular expression that filters out any special characters and allows only alphanumeric characters and spaces?
Thanks a lot
A character class matching letters, numbers, the underscore and space would be
[\w ]
You should not be using any of the POSIX regular expression functions, as they are deprecated as of PHP 5.3 (and were removed in PHP 7.0). Instead, use their superior counterparts from the PCRE suite.
Change your regular expression to:
([A-Za-z0-9_]+(?: +[A-Za-z0-9_]+)*)
I realize that it is not as straightforward as you might have hoped. Things to note:
The identifier must start with a non-space
If there are spaces, they should be between words and not matched at the end
?: is used to prevent an extra grouping in your expression, but is not required
The + after the space character allows multiple spaces between words. You can enforce a single space by removing it, but in some solutions, it is a better practice to normalize the space internally with a preg_split that matches on " +" (a space with a plus sign) and then use implode(" ", $array). But eh... if you are just validating, this should be fine.
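The normalization idea from the last note could be sketched like this. The sample value is made up, and the ^ and \z anchors around the pattern are an addition so that the whole string has to match, not just a part of it.

```php
<?php
// Hypothetical user input with stray and repeated spaces.
$catname = '  hello   world  ';

// Split on runs of spaces and drop the empty pieces at the edges...
$words = preg_split('/ +/', $catname, -1, PREG_SPLIT_NO_EMPTY);

// ...then glue the words back together with exactly one space.
$normalized = implode(' ', $words);

// Validate the normalized value against the suggested pattern.
$isValid = preg_match(
    '/^[A-Za-z0-9_]+(?: +[A-Za-z0-9_]+)*\z/',
    $normalized
) === 1;

var_dump($normalized, $isValid); // "hello world", true
```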
You've got it nearly right; just add \s into your square brackets and "hello world" will pass.
([A-Za-z0-9_\s]+)
I got some help from an old friend, and I've tested it and it works perfectly. Thank you all for the answers and comments; they were very helpful to me.
This works perfectly:
$check = eregi_replace('(^[a-zA-Z0-9 ]*$)', "", $catname);
Alphanumeric and white space regular expression
@Phil
Yours works perfectly but will still pass an underscore ~ thanks
@Michael Hays
I don't know why it didn't work for whitespace, but your comments are very helpful ~ thanks
@kjetilh
I will read more about preg ~ thanks
@Alastair
Works fine if I replace \s with just a space! ~ thanks
The eregi functions are deprecated as of PHP 5.3. Use the preg functions instead.
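A possible preg equivalent for the check, as a rough sketch: instead of replacing and looking at what is left, match the whole string directly. The function name is hypothetical, and using + instead of * (so an empty name is rejected) is an assumption about what the form should accept.

```php
<?php
// Hypothetical helper: true if $catname holds only ASCII letters,
// digits and spaces. The i modifier replaces eregi's case-insensitivity.
function isValidCategoryName(string $catname): bool
{
    // ^ and \z anchor the match to the whole string.
    return preg_match('/^[a-z0-9 ]+\z/i', $catname) === 1;
}

var_dump(isValidCategoryName('hello world')); // bool(true)
var_dump(isValidCategoryName('hello_world')); // bool(false)
var_dump(isValidCategoryName(''));            // bool(false)
```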

Is it ok to use £ as delimiter in preg_replace?

I am converting an eregi_replace function I found to preg_replace, but the eregi string has about every character on the keyboard in it. So I tried to use £ as the delimiter, and it is working currently, but I wonder if it might cause problems because it is a non-standard character?
Here is the eregi:
function makeLinks($text) {
    $text = eregi_replace('(((f|ht){1}tp://)[-a-zA-Z0-9#:%_\+.~#?&//=]+)',
        '\\1', $text);
    $text = eregi_replace('([[:space:]()[{}])(www.[-a-zA-Z0-9#:%_\+.~#?&//=]+)',
        '\\1\\2', $text);
    return $text;
}
and the preg:
function makeLinks($text) {
    $text = preg_replace('£(((f|ht){1}tp://)[-a-zA-Z0-9#:%_\+.~#?&//=]+)£i',
        '\\1', $text);
    $text = preg_replace('£([[:space:]()[{}])(www.[-a-zA-Z0-9#:%_\+.~#?&//=]+)£i',
        '\\1\\2', $text);
    return $text;
}
£ is problematic because it isn't an ASCII character. It's from the Latin-1 charset and will only work if your PHP script also uses the 8bit representation. Should your file be encoded as UTF-8, then £ will be represented as two bytes. And PCRE in PHP will trip over that. (At least my version does.)
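The two-byte claim is easy to verify. In this small sketch the pound sign is written as a code point escape (PHP 7+), so the result does not depend on how the script file itself is saved.

```php
<?php
// U+00A3 POUND SIGN, produced as UTF-8 by the "\u{...}" escape.
$pound = "\u{00A3}";

echo strlen($pound), PHP_EOL;  // 2 (strlen counts bytes, not characters)
echo bin2hex($pound), PHP_EOL; // c2a3
```

PCRE expects a single-byte delimiter, so in a UTF-8 file the opening delimiter would effectively be only the first of those two bytes.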
You can use parentheses to delimit a regex rather than a single character, for example:
preg_replace('(abc/def#ghi)i', ...);
That would probably be nicer than trying to find an obscure character that's not (yet) part of your expression.
You can use the unicode character, just to be sure.
\u00A3
Watch out for the ereg functions and unicode support.
http://www.regular-expressions.info/php.html
http://www.regular-expressions.info/characters.html
Long live the Queen.
As @Chris pointed out, you can use paired bracket characters as delimiters, but they have to be properly balanced throughout the regex. For example, '<<>' won't work, but '<<>>' will. You can use any of (), [], {} or <>, but I recommend the braces or the square brackets; parentheses are too common in regexes, and angle brackets are used in escape sequences like (?>...) (atomic group) and (?<=...) (lookbehind).
But I'm with @Brad on this one: why not just escape the delimiter character with a backslash whenever it appears in the regex?
You would know the data being parsed better than we would. As far as regex is concerned, it's no different than any other ASCII value.
Though I have to ask: what's wrong with a traditional delimiter and then just escaping it? Or using a class with a character range?
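Both alternatives mentioned in the answers, paired delimiters and escaping a conventional one, can be sketched as follows. The URL pattern is a simplified stand-in for the one in the question, and the sample text and the [ ] markers in the replacement are assumptions chosen only to make the matches visible.

```php
<?php
// Hypothetical sample text.
$text = 'See http://example.com/a?b=1 and ftp://files.example.org/x now';

// Option 1: braces as paired delimiters, so / # and ~ need no escaping
// (any braces inside the pattern would have to be balanced).
$withBraces = preg_replace(
    '{((?:f|ht)tp://[-a-zA-Z0-9@:%_+.~#?&/=]+)}i',
    '[$1]',
    $text
);

// Option 2: a conventional ~ delimiter, with the literal ~ escaped.
$withTilde = preg_replace(
    '~((?:f|ht)tp://[-a-zA-Z0-9@:%_+.\~#?&/=]+)~i',
    '[$1]',
    $text
);

echo $withBraces, PHP_EOL;
// See [http://example.com/a?b=1] and [ftp://files.example.org/x] now
var_dump($withBraces === $withTilde); // bool(true)
```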

Simple preg_replace

I can't figure out preg_replace at all; it just looks like Chinese to me. Anyway, I just need to remove "&page-X" from a string if it's there.
X being a number of course. If anyone has a link to a useful preg_replace tutorial for beginners, that would also be handy!
Actually the basic syntax for regular expressions, as supported by preg_replace and friends, is pretty easy to learn. Think of it as a string describing a pattern with certain characters having special meaning.
In your very simple case, a possible pattern is:
&page-\d+
With \d meaning a digit (numeric characters 0-9) and + meaning: Repeat the expression right before + (here: \d) one or more times. All other characters just represent themselves.
Therefore, the pattern above matches any of the following strings:
&page-0
&page-665
&page-1234567890
Since the preg functions use a Perl-compatible syntax and regular expressions are denoted between slashes (/) in Perl, you have to surround the pattern in slashes:
$after = preg_replace('/&page-\d+/', '', $before);
Actually, you can use other characters as well:
$after = preg_replace('#&page-\d+#', '', $before);
For a full reference of supported syntax, see the PHP manual.
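Put together, a small sketch with a few made-up query strings shows the pattern removing the page part wherever it appears and leaving strings without it untouched:

```php
<?php
// Hypothetical inputs: page part at the end, in the middle, and absent.
$urls = [
    'index.php?cat=3&page-12',
    'index.php?cat=3&page-7&sort=asc',
    'index.php?cat=3',
];

$cleaned = [];
foreach ($urls as $before) {
    // \d+ matches one or more digits after the literal "&page-".
    $cleaned[] = preg_replace('/&page-\d+/', '', $before);
}

print_r($cleaned);
// index.php?cat=3
// index.php?cat=3&sort=asc
// index.php?cat=3
```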
preg_replace uses Perl-Compatible Regular Expression for the search pattern. Try this pattern:
preg_replace('/&page-\d+/', '', $str)
See the pattern syntax for more information.
$outputstring = preg_replace('/&page-\d+/', "", $inputstring);
preg_replace()
preg_replace('/&page-\d+/', '', $string)
Useful information:
Using Regular Expressions with PHP
http://articles.sitepoint.com/article/regular-expressions-php
