Why don't regular expressions from regexlib.com work in PHP?

I found a regex on http://regexlib.com/REDetails.aspx?regexp_id=73
It's for matching a telephone number with an international code, like so:
^(\(?\+?[0-9]*\)?)?[0-9_\- \(\)]*$
When I use it with PHP's preg_match, the expression fails. Why is that?

You need to surround it with / delimiters:
preg_match('/^(\(?\+?[0-9]*\)?)?[0-9_\- \(\)]*$/', $phoneNumber)
And make sure you don't leave out the backslashes (\).

Because preg_match expects the regex to be delimited, usually with slashes (but, as correctly noted below, other characters are possible as long as they are matched):
preg_match('/^(\(?\+?[0-9]*\)?)?[0-9_ ()-]*$/', $subject)
Apart from that, the original regex was copied incorrectly: several characters were left unescaped. The original on regexlib has a few warts, too (some characters are escaped needlessly).
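As a rough sketch of the delimiter point, the same pattern can be wrapped in slashes or in another matching pair such as tildes. The variable names and sample inputs below are made up for illustration.

```php
<?php
// The pattern from the answer, written with two different delimiters.
$withSlashes = '/^(\(?\+?[0-9]*\)?)?[0-9_ ()-]*$/';
$withTildes  = '~^(\(?\+?[0-9]*\)?)?[0-9_ ()-]*$~';

// Made-up sample inputs, not taken from the question.
$samples = ['+1 (555) 010-9999', 'not a number'];

foreach ($samples as $phoneNumber) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $a = preg_match($withSlashes, $phoneNumber);
    $b = preg_match($withTildes, $phoneNumber);
    echo $phoneNumber, ': ', $a, ' ', $b, "\n";
}
```

Both delimiter styles should give the same result for each input; the choice only affects which character has to be escaped inside the pattern.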

Related

Regex to match characters that must be escaped in a PHP regex

I've had a look at this question, which shows what characters need to be escaped. However, I'm having a lot of trouble constructing a regex that will match any instance of one of those characters in a string.
For some background on the problem, I'm implementing a simple word-for-word (or term-for-term if you prefer) translation database where users enter language pairs, and can then trigger translations on blocks of text. The problem comes when users enter strings like "Yes/No". So, in PHP, I need to escape the string to be matched, and place it like this:
"/\b".$target."\b/"
So, what do I need to be looking at in terms of a preg_replace?
You want to use preg_quote(). As the documentation clearly states:
preg_quote() takes str and puts a backslash in front of every character that is part of the regular expression syntax. This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
Or use \Q ... \E (whatever is between \Q and \E is treated as literal characters, not as regular expression syntax).
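A minimal sketch of how preg_quote() might be applied to the "Yes/No" case. Note that / is the pattern delimiter rather than regex syntax, so preg_quote() only escapes it when the delimiter is passed as the second argument. The helper name translateTerm and the sample strings are hypothetical.

```php
<?php
// Hypothetical helper: replace whole-word occurrences of $target in $text.
function translateTerm(string $text, string $target, string $replacement): string
{
    // The second argument makes preg_quote() escape the "/" delimiter too.
    $pattern = '/\b' . preg_quote($target, '/') . '\b/';
    return preg_replace($pattern, $replacement, $text);
}

echo translateTerm('Answer Yes/No please', 'Yes/No', 'Oui/Non'), "\n";
```

One thing this sketch does not handle: a replacement containing $ or a backslash may be read as a backreference by preg_replace().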

PHP Regex Help (converting from preg_match_all to preg_replace)

I'm having a bit of difficulty converting some regex from being used in preg_match_all to being used in preg_replace.
Basically, via regex only, I would like to match uppercase characters that are preceded by either a space, the beginning of the text, or a hyphen. This is not a problem; I have the following for this, which works well:
preg_match_all('/(?<= |\A|-)[A-Z]/',$str,$results);
echo '<pre>' . print_r($results,true) . '</pre>';
Now, what I'd like to do, is to use preg_replace to only return the string with the uppercase characters that match my criteria above. If I port the regex straight into preg_replace, then it obviously replaces the characters I want to keep.
Any help would be much appreciated :)
Also, I'm fully aware regex isn't the best solution for this in terms of efficiency, but nonetheless I would like to use preg_replace.
According to De Morgan's laws,
if you want to keep letters that are
A-Z, and
preceded by [space], \A, or -
then you'd want to remove characters that are
not A-Z, or
not preceded by [space], \A, or -
Perhaps this (replace match with empty string)?
/[^A-Z]|(?<! |\A|-)./
See example here.
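A small sketch of that inverted pattern in use, with a made-up input string:

```php
<?php
// Hypothetical sample input.
$str = 'Hello World-Foo bAr';

// Remove every character that is not A-Z, or that is not preceded by
// a space, the start of the text, or a hyphen.
$kept = preg_replace('/[^A-Z]|(?<! |\A|-)./', '', $str);

echo $kept, "\n"; // prints HWF
```

The "A" in "bAr" is dropped because it is uppercase but preceded by "b", which is the second half of the De Morgan split.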
I think it will be something like this:
$sString = preg_replace('#.*?(?<= |\A|-)([A-Z])([a-z]+)#m',"$1", $sString);

PHP Regex for checking space or certain characters after string

I need a regex that can check for a space, line break, etc. after a string.
The conditions are:
Allow the special characters ., _, -, + inside the string, e.g. #hello.world, #hello_world, #helloworld, etc.
Discard anything, including special characters, that has no alphanumeric string after it, e.g. #helloworld.<space>, #helloworld-<space>, #helloworld.?, etc. must be parsed as #helloworld
My existing regex is /#([A-Za-z0-9+_.-]+)/, which works perfectly for Condition #1, but there still seems to be a problem with Condition #2.
I am using the above regex in preg_replace().
Solution:
$str = preg_replace('/#[\w+.\-]+\b/', '[[$0]]', $str);
This works perfectly.
Tested with
http://gskinner.com/RegExr/
You can use word boundaries to easily find the position between an alphanumeric character and a non-alphanumeric character:
$str = preg_replace('/#[\w+.\-]+\b/', '[[$0]]', $str);
Working example: http://ideone.com/0ShCm
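To illustrate how the word boundary trims trailing punctuation, here is a short sketch with a made-up input. It uses / as the delimiter so that the literal # inside the pattern needs no escaping.

```php
<?php
// Hypothetical sample text containing tags with trailing punctuation.
$str = 'See #hello.world. and #helloworld- now';

// [\w+.\-]+ is greedy, then backtracks until \b finds a word character
// followed by a non-word character (or the end of the string).
$tagged = preg_replace('/#[\w+.\-]+\b/', '[[$0]]', $str);

echo $tagged, "\n"; // prints See [[#hello.world]]. and [[#helloworld]]- now
```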
Here's an idea:
Use strrev to reverse the string
Use strcspn to find the longest prefix of the reversed string that does not contain any alphanumeric characters
Cut the prefix off with substr
Reverse the string again; this is your final result
See it in action.
I'm not taking into account any requirement that restricts the legal characters in the string to some subset, but you can use your regular expression for that (or even strspn, which might be faster).
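The four steps above could be sketched roughly as follows. The function name trimTrailingNonAlnum is made up, and the sketch assumes plain ASCII input, since strrev works on bytes.

```php
<?php
// Hypothetical helper implementing the reverse / strcspn / substr / reverse idea.
function trimTrailingNonAlnum(string $s): string
{
    $alnum = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

    // Step 1: reverse, so the trailing characters come first.
    $reversed = strrev($s);

    // Step 2: length of the leading run that contains no alphanumeric character.
    $skip = strcspn($reversed, $alnum);

    // Steps 3 and 4: cut that run off and reverse back.
    return strrev(substr($reversed, $skip));
}

echo trimTrailingNonAlnum('#helloworld.?'), "\n"; // prints #helloworld
```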
The reason is that the regex reads the string as a whole. If you want to parse out everything after the alphanumeric section, you might have to do something like end(explode()) and check whether the last piece is valid, removing it if it isn't; but then you'd have to check the end for every possible explode point, i.e. ., -, ~, etc.
Then again, another trap you might run into is that, in the case of an item or anything with an alphanumeric value, it might just parse everything after the last alphanumeric character.
Sorry that this isn't much help, but I figured thinking aloud does help.

Why does this PHP regex not match for accented characters?

I'm writing a quick PHP page, and I need to ignore any Strings with accented characters. I am using this preg_match() string on each word:
"[ÀÁÅÃÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]"
(Quite a brute force method I know, but apparently [a-zA-Z] can match for accented characters)
But the function never seems to return true when it searches Strings with accented characters (Examples: "cheap…", "gustaría"...)
I haven't used Regex before, so please point out any stupid mistakes I'm making here!
PHP regexes need delimiters, like so:
preg_match('/[ÀÁÅÃÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]/', "gustaría");
Note that it's also preferable to use single quotes for a regex, because inside a double-quoted string PHP could mistake a dollar sign for the start of a variable.
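One further point that may matter here, offered as a sketch rather than a diagnosis: if the strings are UTF-8, each accented letter is several bytes long, and without the u modifier PCRE compares single bytes. The example below assumes UTF-8 source and input, and its ranges are an assumed shorthand for the asker's list (they also cover a few letters that list omits).

```php
<?php
// Assumes this file and the input strings are UTF-8 encoded.
// The ranges cover the accented letters from U+00C0 to U+00FF,
// skipping the multiplication and division signs.
$accented = '/[À-ÖØ-öø-ÿ]/u';

foreach (['cheap', 'gustaría'] as $word) {
    $label = preg_match($accented, $word) === 1 ? 'accented' : 'plain';
    echo $word, ': ', $label, "\n";
}
```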

RegEx string "preg_replace"

I need to do a "find and replace" on about 45k lines of a CSV file and then put this into a database.
I figured I should be able to do this with PHP and preg_replace but can't seem to figure out the expression...
The lines consist of one field and are all in the following format:
"./1/024/9780310320241/SPSTANDARD.9780310320241.jpg" or "./t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg"
The first part will always be a period, the second part will always be one alphanumeric character, the third will always be three alphanumeric characters and the fourth should always be between 1 and 13 alphanumeric characters.
I came up with the following which seems to be right however I will openly profess to not knowing very much at all about regular expressions, it's a little new to me! I'm probably making a whole load of silly mistakes here...
$pattern = "/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z]{1,13}\/)$/";
$new = preg_replace($pattern, " ", $i);
Anyway any and all help appreciated!
Thanks,
Phil
The only mistake I see is the end-of-string anchor $, which should be removed. Your expression is also missing the _ character:
/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z_]{1,13}\/)/
A more general pattern would be to just exclude the /:
/^(\.\/[^\/]{1}\/[^\/]{3}\/[^\/]{1,13}\/)/
You should use PHP's built-in CSV parser to extract the values from the CSV before matching any patterns.
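A minimal sketch of that order of operations, assuming each line holds a single quoted field as in the question. It uses str_getcsv() for one line (fgetcsv() would be the equivalent when reading from a file handle), and the prefix pattern is just one plausible choice.

```php
<?php
// Hypothetical one-field CSV line, quoted as in the question.
$line = '"./1/024/9780310320241/SPSTANDARD.9780310320241.jpg"';

// Separator, enclosure and escape character are passed explicitly.
$fields = str_getcsv($line, ',', '"', '\\');

// Strip the directory prefix from the unquoted field value.
$name = preg_replace('#^\./\w/\w{3}/\w{1,13}/#', '', $fields[0]);

echo $name, "\n"; // prints SPSTANDARD.9780310320241.jpg
```

Parsing first means the pattern never has to account for the surrounding quotes.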
I'm not sure I understand what you're asking. Do you mean every line in the file looks like that, and you want to process all of them? If so, this regex would do the trick:
'#^.*/#'
That simply matches everything up to and including the last slash, which is what your regex would do if it weren't for that rogue '$' everyone's talking about. If there are other lines in other formats that you want to leave alone, this regex will probably suit your needs:
'#^\./\w/\w{3}/\w{1,13}/#'
Notice how I changed the regex delimiter from '/' to '#' so I don't have to escape the slashes inside. You can use almost any punctuation character for the delimiters (but of course they both have to be the same).
The $ means the end of the string. So your pattern would match ./1/024/9780310320241/ and ./t/fla/8204909_flat/ if they were alone on their line. Remove the $ and it will match the first four parts of your string, replacing them with a space.
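The effect of the anchor can be seen in a short sketch using one of the sample paths from the question. The underscore has been added to the last character class so that 8204909_flat can match.

```php
<?php
$path = './t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg';

// With the trailing $, the pattern must reach the end of the string, so
// nothing matches and preg_replace() returns the input unchanged.
$anchored = preg_replace(
    '/^\.\/[0-9a-zA-Z]\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z_]{1,13}\/$/',
    ' ',
    $path
);

// Without it, only the leading directories are replaced by a space.
$prefixOnly = preg_replace(
    '/^\.\/[0-9a-zA-Z]\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z_]{1,13}\//',
    ' ',
    $path
);

echo $anchored, "\n", $prefixOnly, "\n";
```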
$pattern = "/(\.\/[0-9a-z]{1}\/[0-9a-z]{3}\/[0-9a-z\_]+\.(jpg|bmp|jpeg|png))\n/is";
I just saw that your example string doesn't end with /, so maybe you should remove it from the end of your pattern. Also, an underscore is used in the filename and should be in the character class.
