How can I validate regex? - php

I'd like to test the validity of a regular expression in PHP, preferably before it's used. Is the only way to do this actually trying a preg_match() and seeing if it returns FALSE?
Is there a simpler/proper way to test for a valid regular expression?

// This is valid, both opening ( and closing )
var_dump(preg_match('~Valid(Regular)Expression~', '') === false);
// This is invalid, no opening ( for the closing )
var_dump(preg_match('~InvalidRegular)Expression~', '') === false);
As the user pozs said, also consider putting @ in front of preg_match() (@preg_match()) in a testing environment to suppress warnings or notices.
To validate a RegExp just run it against null (no need to know the data you want to test against upfront). If it returns explicit false (=== false), it's broken. Otherwise it's valid though it need not match anything.
So there's no need to write your own RegExp validator. It's wasted time...

I created a simple function that can be called to check for preg errors:
function is_preg_error()
{
$errors = array(
PREG_NO_ERROR => 'Code 0 : No errors',
PREG_INTERNAL_ERROR => 'Code 1 : There was an internal PCRE error',
PREG_BACKTRACK_LIMIT_ERROR => 'Code 2 : Backtrack limit was exhausted',
PREG_RECURSION_LIMIT_ERROR => 'Code 3 : Recursion limit was exhausted',
PREG_BAD_UTF8_ERROR => 'Code 4 : Malformed UTF-8 data',
PREG_BAD_UTF8_OFFSET_ERROR => 'Code 5 : The offset didn\'t correspond to the beginning of a valid UTF-8 code point',
);
return $errors[preg_last_error()];
}
You can call this function using the following code:
preg_match('/(?:\D+|<\d+>)*[!?]/', 'foobar foobar foobar');
echo is_preg_error();
Alternative - Regular Expression Online Tester
RegExr
PHP Regular Expression Tester
Regular Expression Tool

If you want to dynamically test a regex preg_match(...) === false seems to be your only option. PHP doesn't have a mechanism for compiling regular expressions before they are used.
Also, you may find preg_last_error a useful function.
On the other hand if you have a regex and just want to know if it's valid before using it there are a bunch of tools available out there. I found rubular.com to be pleasant to use.
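As a rough sketch of combining the preg_match(...) === false check with preg_last_error: the helper name describe_preg_result() is made up for illustration, and preg_last_error_msg() is only available from PHP 8.0, so the sketch falls back to the numeric code on older versions. Depending on the PHP version, the code reported for a pattern that fails to compile may be unspecific; the warning text usually carries more detail.

```php
<?php
// Hypothetical helper (not part of PHP): returns 'ok' when preg_match() ran,
// otherwise a short description of the failure.
function describe_preg_result(string $pattern, string $subject): string
{
    // @ hides the warning PHP emits when the pattern cannot be compiled.
    $result = @preg_match($pattern, $subject);
    if ($result !== false) {
        return 'ok';
    }
    if (function_exists('preg_last_error_msg')) {
        return preg_last_error_msg(); // built in since PHP 8.0
    }
    return 'PCRE error code ' . preg_last_error();
}
```

A call such as describe_preg_result('/foo(/', 'foobar') should return something other than 'ok', while a well-formed pattern returns 'ok' whether or not it matches.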

You can check to see if it is a syntactically correct regex with this nightmare of a regex, if your engine supports recursion (PHP should).
You cannot, however, algorithmically tell whether it will give the results you want without running it.
From: Is there a regular expression to detect a valid regular expression?
/^((?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>)?(?1)??\)|\(\?(?:R|[+-]?\d+)\))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*)$/

Without actually executing the regex you have no way to be sure it's valid. I recently implemented a similar RegexValidator for Zend Framework. It works just fine.
<?php
class Nuke_Validate_RegEx extends Zend_Validate_Abstract
{
/**
* Error constant
*/
const ERROR_INVALID_REGEX = 'invalidRegex';
/**
* Error messages
* @var array
*/
protected $_messageTemplates = array(
self::ERROR_INVALID_REGEX => "This is a regular expression PHP cannot parse.");
/**
* Runs the actual validation
* @param string $pattern The regular expression we are testing
* @return bool
*/
public function isValid($pattern)
{
if (@preg_match($pattern, "Lorem ipsum") === false) {
$this->_error(self::ERROR_INVALID_REGEX);
return false;
}
return true;
}
}

You can validate your regular expression with a regular expression, up to a certain limit. Check out this Stack Overflow answer for more info.
Note: a "recursive regular expression" is not a regular expression, and this extended version of regex doesn't match extended regexes.
A better option is to use preg_match and match against NULL, as @Claudrian said.

I am not sure if it supports PCRE, but there is a Chrome extension over at https://chrome.google.com/webstore/detail/cmmblmkfaijaadfjapjddbeaoffeccib called RegExp Tester. I have not used it as yet myself so I cannot vouch for it, but perhaps it could be of use?

I'd be inclined to set up a number of unit tests for your regex. This way not only would you be able to ensure that the regex is indeed valid but also effective at matching.
I find using TDD is an effective way to develop regex and means that extending it in the future is simplified as you already have all of your test cases available.
The answers to this question explain well how to set up your unit tests.
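To illustrate the test-driven idea without assuming any particular test framework, here is a minimal table-driven sketch. The function name check_regex_cases() and the sample pattern are assumptions made for the example; in a real project the same cases would normally live in PHPUnit tests or similar.

```php
<?php
// Hypothetical helper: runs a pattern against a table of subject => expected
// result pairs and returns the subjects whose outcome differs from the expectation.
function check_regex_cases(string $pattern, array $cases): array
{
    $failures = [];
    foreach ($cases as $subject => $shouldMatch) {
        // Numeric-looking array keys become integers, hence the cast.
        $result = @preg_match($pattern, (string) $subject);
        if ($result === false) {
            throw new InvalidArgumentException("Invalid pattern: $pattern");
        }
        if (($result === 1) !== $shouldMatch) {
            $failures[] = (string) $subject;
        }
    }
    return $failures;
}
```

An empty return value means every case behaved as expected, so the pattern is both valid and effective for the listed inputs.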

So in summary, for all those coming to this question you can validate regular expressions in PHP with a function like this.
preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred. - PHP Manual
/**
* Return an error message if the regular expression is invalid
*
* @param string $regex string to validate
* @return string
*/
function invalidRegex($regex)
{
if(preg_match($regex, null) !== false)
{
return '';
}
$errors = array(
PREG_NO_ERROR => 'Code 0 : No errors',
PREG_INTERNAL_ERROR => 'Code 1 : There was an internal PCRE error',
PREG_BACKTRACK_LIMIT_ERROR => 'Code 2 : Backtrack limit was exhausted',
PREG_RECURSION_LIMIT_ERROR => 'Code 3 : Recursion limit was exhausted',
PREG_BAD_UTF8_ERROR => 'Code 4 : Malformed UTF-8 data',
PREG_BAD_UTF8_OFFSET_ERROR => 'Code 5 : The offset didn\'t correspond to the beginning of a valid UTF-8 code point',
);
return $errors[preg_last_error()];
}
Which can be used like this.
if($error = invalidRegex('/foo//'))
{
die($error);
}

You could use valid() from T-Regx
pattern('InvalidRegular)Expression')->valid(); // bool (false)

Just use the easy way: check whether preg_match() returns false:
// $look holds the regex body to test
$look = "your_regex_string";
if (preg_match("/".$look."/", "test_string") !== false) {
//regex_valid
} else {
//regex_invalid
}

You should try to match the regular expression against NULL. If the result is FALSE (=== FALSE), there was an error.
In PHP >= 5.5, you can use the following to automatically get the built-in error message, without needing to define your own function to get it:
// For PHP >= 8, use the built-in str_ends_with() instead of this function.
// Taken from https://www.php.net/manual/en/function.str-ends-with.php#125967
function endsWith($haystack, $needle) {
$length = strlen($needle);
return $length > 0 ? substr($haystack, -$length) === $needle : true;
}
function test_regex($regex) {
preg_match($regex, NULL);
$constants = get_defined_constants(true)['pcre'];
foreach ($constants as $key => $value) {
if (!endsWith($key, '_ERROR')) {
unset($constants[$key]);
}
}
return array_flip($constants)[preg_last_error()];
}
Note that the call to preg_match() will still throw a warning for invalid regular expressions. The warning can be caught with a custom error handler using set_error_handler().
See Can I try/catch a warning?.
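The set_error_handler() approach mentioned above could look roughly like this. It is only a sketch: regex_compile_error() is a hypothetical name, and the pattern is tested against an empty string rather than NULL because passing NULL to a string parameter is deprecated in recent PHP versions.

```php
<?php
// Hypothetical helper: returns null when the pattern compiles, otherwise the
// text of the warning PHP emitted while trying to use it.
function regex_compile_error(string $pattern): ?string
{
    $message = null;
    // Temporarily capture warnings instead of letting them reach the output.
    set_error_handler(function (int $severity, string $text) use (&$message): bool {
        $message = $text;
        return true; // mark the warning as handled
    });
    try {
        $result = preg_match($pattern, '');
    } finally {
        restore_error_handler(); // always put the previous handler back
    }
    if ($result !== false) {
        return null;
    }
    return $message ?? 'Unknown PCRE error';
}
```

The returned text is whatever PHP reported, so it tends to be more specific than the preg_last_error() code for syntax errors.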

According to the PCRE reference, there is no way to test the validity of an expression before it's used. But I think that if someone uses an invalid expression, it's a design error in the application, not a run-time one, so you should be fine.

Related

what is equivalent of =~ of ruby in php?

I am a Rubyist trying to implement some of my code in PHP, and I am not able to find the equivalent PHP code for this particular def. Can anyone help me out? Thanks in advance.
def check_condition(str)
str =~ SOME_REGEX
end
In PHP it looks like:
function check_condition($str) {
return preg_match(SOME_REGEX, $str);
}
Unfortunately there is no regex-match operator in PHP, unlike some other languages. You'll have to call a function. Follow the manual of preg_match() and the manual page about the so-called Perl-compatible regular expressions (PCRE) in general.
Something additional. After reading the manual page of preg_match() you know that the function returns an integer, the number of matches found. As it returns after the first match, this can only be 0 or 1. Because of PHP's loose typing, this works in loose comparisons like:
if(check_condition($str)) { ....
if(check_condition($str) == true) { ...
But it would not work in a strict comparison:
if(check_condition($str) === true) { ...
Therefore it would be a good idea to cast the return value of preg_match:
function check_condition($str) {
return (boolean) preg_match(SOME_REGEX, $str);
}
Update
I have thought a little more about my last suggestion and I see a problem with it. preg_match() returns an integer if all is working fine but boolean FALSE if an error occurs, for example because of a syntax error in the regex pattern. Therefore you will not be aware of errors if you just cast to boolean. I would use exceptions to signal that an error happened:
function check_condition($str) {
$ret = preg_match(SOME_REGEX, $str);
if($ret === FALSE) {
$error = error_get_last();
throw new Exception($error['message']);
}
return (boolean) $ret;
}
Have a look at preg_match:
if (preg_match('/regex/', $string)) {
return 1;
}
Isn't it preg_match?
function check_condition($str) {
return preg_match(SOME_REGEX,$str);
}
I don't think there is an equivalent.
preg_match returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred.
=~, however, returns the position where the match starts, or nil if there is no match. Since nil is false and all numbers including zero are true, boolean operations are possible.
puts "abcdef" =~ /def/ #=> 3 # don't know how to get this from a RegExp in PHP
puts "Matches" if "abcdef"=~ /def/ #=> Matches
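For what it's worth, the match position can be obtained in PHP with the PREG_OFFSET_CAPTURE flag of preg_match(). The sketch below wraps that in a hypothetical helper, match_position(), which returns null where Ruby would return nil. Note that the offset is in bytes, and that 0 is falsy in PHP, so the result should be compared against null rather than used directly in an if.

```php
<?php
// Hypothetical helper mimicking Ruby's =~ operator:
// byte offset of the first match, or null when there is no match.
function match_position(string $pattern, string $subject): ?int
{
    $result = preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
    if ($result === false) {
        throw new RuntimeException('preg_match() failed, error code ' . preg_last_error());
    }
    // With PREG_OFFSET_CAPTURE each entry is [matched text, offset].
    return $result === 1 ? $matches[0][1] : null;
}

var_dump(match_position('/def/', 'abcdef')); // int(3)
var_dump(match_position('/xyz/', 'abcdef')); // NULL
```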

php test if string ends with _text + number

I have a couple of issues with this if statement which checks if a string ends with "address".
E.g this matches user_address, user_address1, user_address99, etc. Which is correct.
The problem is that it also matches user_address_location which is not correct.
I only want this to match if:
It ends with _address
Also if it has a number on the end e.g _address2
/* Only establish an address field as ADDRESS if follows "user_address1" format */
if((stristr($column,"address") !== false)) {
$parts = explode("_", $column);
if (!empty($parts)) {
// address code
}
}
This might be a decent place to use a regex
if(preg_match('/^user_address\d*$/', $column) === 1){
}
Use regular expressions:
if(preg_match('/_address(\d+)?$/', $column))
{
}
if you are doing a lot of string comparing and manipulation this web application will be very useful to you: http://gskinner.com/RegExr/
It allows you to develop regular expressions against content with live feedback on matches and replacements.
You can make use of $ of regular expressions here. When using regular expressions $ specifies the end of a string
So, you can search for this regular expression:
$regexp = '/^.*_address\d*$/';
^ is the start, .* indicates any number of any characters, _address is what you want to search for, \d* says it can have digits after address, and $ indicates the end of the string. The slashes are the delimiters that preg_match requires.
You can read more about regular expressions, and preg_match on php.net
Change if((stristr($column,"address") !== false)) { to if(preg_match("/^user_address[0-9]*$/", $column) == 1) {
I wanted to find a way to do it without regular expressions; sorry, I should have made that clear in the question.
But I think I have found a way, not sure if it can be improved or performance compared to the regular expressions.
/* First establish it could be an address field e.g "user_address1" format */
$column_address = strpos($column, "_address");
if($column_address !== false) {
// remove everything before "_address"
$last = substr($column, $column_address);
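// Check if "_address" OR remove "_address" and check that only digits remain
// (ctype_digit also accepts "0" and rejects "2x", unlike intval)
if($last == "_address" || ctype_digit(substr($last, 8))){
// address code
}
}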

Check if a string does not contains a specific substring [duplicate]

This question already has answers here:
How do I check if a string contains a specific word?
In SQL we have NOT LIKE %string%
I need to do this in PHP.
if ($string NOT LIKE %word%) { do something }
I think that can be done with strpos(), but I can't figure out how.
I need exactly that comparison in valid PHP.
if (strpos($string, $word) === FALSE) {
... not found ...
}
Note that strpos() is case sensitive, if you want a case-insensitive search, use stripos() instead.
Also note the ===, forcing a strict equality test. strpos CAN return a valid 0 if the 'needle' string is at the start of the 'haystack'. By forcing a check for an actual boolean false (rather than a falsy 0), you eliminate that false positive.
Use strpos. If the string is not found it returns false, otherwise something that is not false. Be sure to use a type-safe comparison (===) as 0 may be returned and it is a falsy value:
if (strpos($string, $substring) === false) {
// substring is not found in string
}
if (strpos($string, $substring2) !== false) {
// substring2 is found in string
}
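On PHP 8.0 and later the same check can be written with the built-in str_contains(), which returns a real boolean and so avoids the 0-versus-false pitfall. The following is a sketch only; lacks_substring() is a made-up name, and the fallback branch is an assumption about how one might keep older PHP versions working.

```php
<?php
// Hypothetical helper: true when $needle does NOT occur in $haystack.
function lacks_substring(string $haystack, string $needle): bool
{
    if (function_exists('str_contains')) {
        return !str_contains($haystack, $needle); // PHP 8.0+
    }
    // Older PHP: an empty needle is treated as always contained,
    // which matches the behaviour of str_contains().
    return $needle !== '' && strpos($haystack, $needle) === false;
}
```

For example, lacks_substring('hello world', 'hello') is false even though the match sits at position 0.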
Use:
if(stripos($str,'job') === false){
// 'job' does not occur in $str (checked case-insensitively): do your work
}
<?php
// Use this function and Pass Mixed string and what you want to search in mixed string.
// For Example :
$mixedStr = "hello world. This is john duvey";
$searchStr= "john";
if(strpos($mixedStr,$searchStr) !== false) {
echo "Your string here";
}else {
echo "String not here";
}
Kind of depends on your data, doesn't it? strpos('a foolish idea','fool') will show a match, but may not be what you want. If dealing with words, perhaps
preg_match("!\b$word\b!i",$sentence)
is wiser. Just a thought.
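A slightly more defensive sketch of the word-boundary idea: when $word comes from user input it should be passed through preg_quote() so that characters such as . or ? are not interpreted as regex syntax. The function name contains_word() is invented for this example, and \b only behaves as expected when the word starts and ends with a letter, digit or underscore.

```php
<?php
// Hypothetical helper: true when $word occurs in $sentence as a whole word,
// compared case-insensitively.
function contains_word(string $sentence, string $word): bool
{
    // preg_quote() escapes regex metacharacters and the '/' delimiter.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_match($pattern, $sentence) === 1;
}
```

With this, 'fool' is not found in 'a foolish idea' but is found in 'He is no Fool.'; negate the result to get the NOT LIKE behaviour from the question.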

Regex error: 'Warning: ereg() [function.ereg]: REG_ERANGE' in PHP

The code below gives me this mysterious error, and I cannot fathom it. I am new to regular expressions and am stumped. The regular expression should validate any international phone number.
Any help would be much appreciated.
function validate_phone($phone)
{
$phoneregexp ="^(\+[1-9][0-9]*(\([0-9]*\)|-[0-9]*-))?[0]?[1-9][0-9\- ]*$";
$phonevalid = 0;
if (ereg($phoneregexp, $phone))
{
$phonevalid = 1;
}else{
$phonevalid = 0;
}
}
Hmm well the code you pasted isn't quite valid, I fixed it up by adding the missing quotes, missing delimiters, and changed preg to preg_match. I didn't get the warning.
Edit: after seeing the other comment, you meant "ereg" not "preg"... that gives the warning. Try using preg_match() instead ;)
<?php
function validate_phone($phone) {
$phoneregexp ='/^(\+[1-9][0-9]*(\([0-9]*\)|-[0-9]*-))?[0]?[1-9][0-9\- ]*$/';
$phonevalid = 0;
if (preg_match($phoneregexp, $phone)) {
$phonevalid = 1;
} else {
$phonevalid = 0;
}
}
validate_phone("123456");
?>
If this is PHP, then the regex must be enclosed in quotes. Furthermore, what's preg? Did you mean preg_match?
Another thing. PHP knows boolean values. The canonical solution would rather look like this:
return preg_match($regex, $phone) !== 0;
EDIT: Or, using ereg:
return ereg($regex, $phone) !== FALSE;
(Here, the explicit test against FALSE isn't strictly necessary but since ereg returns a number upon success I feel safer coercing the value into a bool).
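Putting the pieces together, a boolean-returning version based on preg_match() might look like the sketch below. It reuses the pattern from the question, with the hyphen moved to the end of the last character class so that it is unambiguously literal; that placement, and the strict comparison with 1 (which treats a PCRE error the same as "no match"), are choices made for this example.

```php
<?php
// Sketch: returns true when $phone matches the question's pattern.
function validate_phone(string $phone): bool
{
    // Hyphen placed last in the class, where it is always a literal.
    $pattern = '/^(\+[1-9][0-9]*(\([0-9]*\)|-[0-9]*-))?[0]?[1-9][0-9 -]*$/';
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $phone) === 1;
}
```

Called as validate_phone("123456"), this returns true, whereas a string such as "abc" returns false.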
It's the [0-9\\- ] part of your RE - it's not escaping the "-" properly. Change it to [0-9 -] and you should be OK (a "-" at the last position in a character class is treated as literal, not part of a range specification).
Just to provide some reference material please read
Regular Expressions (Perl-Compatible)
preg_match()
or if you'd like to stick with the POSIX regexp:
Regular Expression (POSIX Extended)
ereg()
The correct sample code has already been given above.

Coalescing regular expressions in PHP

Suppose I have the following two strings containing regular expressions. How do I coalesce them? More specifically, I want to have the two expressions as alternatives.
$a = '# /[a-z] #i';
$b = '/ Moo /x';
$c = preg_magic_coalesce('|', $a, $b);
// Desired result should be equivalent to:
// '/ \/[a-zA-Z] |Moo/'
Of course, doing this as string operations isn't practical because it would involve parsing the expressions, constructing syntax trees, coalescing the trees and then outputting another regular expression equivalent to the tree. I'm completely happy without this last step. Unfortunately, PHP doesn't have a RegExp class (or does it?).
Is there any way to achieve this? Incidentally, does any other language offer a way? Isn't this a pretty normal scenario? Guess not. :-(
Alternatively, is there a way to check efficiently if either of the two expressions matches, and which one matches earlier (and if they match at the same position, which match is longer)? This is what I'm doing at the moment. Unfortunately, I do this on long strings, very often, for more than two patterns. The result is slow (and yes, this is definitely the bottleneck).
EDIT:
I should have been more specific – sorry. $a and $b are variables, their content is outside of my control! Otherwise, I would just coalesce them manually. Therefore, I can't make any assumptions about the delimiters or regex modifiers used. Notice, for example, that my first expression uses the i modifier (ignore casing) while the second uses x (extended syntax). Therefore, I can't just concatenate the two because the second expression does not ignore casing and the first doesn't use the extended syntax (and any whitespace therein is significant!).
I see that porneL actually described a bunch of this, but this handles most of the problem. It cancels modifiers set in previous sub-expressions (which the other answer missed) and sets modifiers as specified in each sub-expression. It also handles non-slash delimiters (I could not find a specification of what characters are allowed here so I used ., you may want to narrow further).
One weakness is it doesn't handle back-references within expressions. My biggest concern with that is the limitations of back-references themselves. I'll leave that as an exercise to the reader/questioner.
// Pass as many expressions as you'd like
function preg_magic_coalesce() {
$active_modifiers = array();
$expression = '/(?:';
$sub_expressions = array();
foreach(func_get_args() as $arg) {
// Determine modifiers from sub-expression
if(preg_match('/^(.)(.*)\1([eimsuxADJSUX]+)$/', $arg, $matches)) {
$modifiers = preg_split('//', $matches[3]);
if($modifiers[0] == '') {
array_shift($modifiers);
}
if($modifiers[(count($modifiers) - 1)] == '') {
array_pop($modifiers);
}
$cancel_modifiers = $active_modifiers;
foreach($cancel_modifiers as $key => $modifier) {
if(in_array($modifier, $modifiers)) {
unset($cancel_modifiers[$key]);
}
}
$active_modifiers = $modifiers;
} elseif(preg_match('/(.)(.*)\1$/', $arg)) {
$cancel_modifiers = $active_modifiers;
$active_modifiers = array();
}
// If expression has modifiers, include them in sub-expression
$sub_modifier = '(?';
$sub_modifier .= implode('', $active_modifiers);
// Cancel modifiers from preceding sub-expression
if(count($cancel_modifiers) > 0) {
$sub_modifier .= '-' . implode('-', $cancel_modifiers);
}
$sub_modifier .= ')';
$sub_expression = preg_replace('/^(.)(.*)\1[eimsuxADJSUX]*$/', $sub_modifier . '$2', $arg);
// Properly escape slashes
$sub_expression = preg_replace('/(?<!\\\)\//', '\\\/', $sub_expression);
$sub_expressions[] = $sub_expression;
}
// Join expressions
$expression .= implode('|', $sub_expressions);
$expression .= ')/';
return $expression;
}
Edit: I've rewritten this (because I'm OCD) and ended up with:
function preg_magic_coalesce($expressions = array(), $global_modifier = '') {
if(!preg_match('/^((?:-?[eimsuxADJSUX])+)$/', $global_modifier)) {
$global_modifier = '';
}
$expression = '/(?:';
$sub_expressions = array();
foreach($expressions as $sub_expression) {
$active_modifiers = array();
// Determine modifiers from sub-expression
if(preg_match('/^(.)(.*)\1((?:-?[eimsuxADJSUX])+)$/', $sub_expression, $matches)) {
$active_modifiers = preg_split('/(-?[eimsuxADJSUX])/',
$matches[3], -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
}
// If expression has modifiers, include them in sub-expression
if(count($active_modifiers) > 0) {
$replacement = '(?';
$replacement .= implode('', $active_modifiers);
$replacement .= ':$2)';
} else {
$replacement = '$2';
}
$sub_expression = preg_replace('/^(.)(.*)\1(?:(?:-?[eimsuxADJSUX])*)$/',
$replacement, $sub_expression);
// Properly escape slashes if another delimiter was used
$sub_expression = preg_replace('/(?<!\\\)\//', '\\\/', $sub_expression);
$sub_expressions[] = $sub_expression;
}
// Join expressions
$expression .= implode('|', $sub_expressions);
$expression .= ')/' . $global_modifier;
return $expression;
}
It now uses (?modifiers:sub-expression) rather than (?modifiers)sub-expression|(?cancel-modifiers)sub-expression but I've noticed that both have some weird modifier side-effects. For instance, in both cases if a sub-expression has a /u modifier, it will fail to match (but if you pass 'u' as the second argument of the new function, that will match just fine).
Strip delimiters and flags from each. This regex should do it:
/^(.)(.*)\1([imsxeADSUXJu]*)$/
Join expressions together. You'll need non-capturing parenthesis to inject flags:
"(?$flags1:$regexp1)|(?$flags2:$regexp2)"
If there are any back references, count capturing parenthesis and update back references accordingly (e.g. properly joined /(.)x\1/ and /(.)y\1/ is /(.)x\1|(.)y\2/ ).
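The first two steps can be sketched compactly as follows. This is an illustration under stated assumptions, not a complete solution: the names strip_pattern() and join_alternatives() are made up, only the i, m, s and x flags are accepted (not every PHP modifier can be written as an inline option), paired delimiters such as {…} are not handled, and back references are not renumbered.

```php
<?php
// Hypothetical helper: split a delimited pattern into [body, flags],
// normalising the body so it can live between '/' delimiters.
function strip_pattern(string $regex): array
{
    if (preg_match('/^(.)(.*)\1([imsx]*)$/s', $regex, $m) !== 1) {
        throw new InvalidArgumentException("Unsupported pattern: $regex");
    }
    [, $delim, $body, $flags] = $m;
    if ($delim !== '/') {
        $body = str_replace('\\' . $delim, $delim, $body); // unescape old delimiter
        $body = str_replace('/', '\\/', $body);            // escape the new one
    }
    return [$body, $flags];
}

// Hypothetical helper: join patterns as alternatives, keeping each
// pattern's flags local to its own non-capturing group.
function join_alternatives(string ...$regexes): string
{
    $parts = [];
    foreach ($regexes as $regex) {
        [$body, $flags] = strip_pattern($regex);
        $parts[] = "(?$flags:$body)";
    }
    return '/' . implode('|', $parts) . '/';
}
```

For the two expressions from the question, join_alternatives('# /[a-z] #i', '/ Moo /x') yields '/(?i: \/[a-z] )|(?x: Moo )/', so the first alternative ignores case and treats its spaces as significant while the second ignores whitespace in the pattern.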
EDIT
I’ve rewritten the code! It now contains the changes that are listed as follows. Additionally, I've done extensive tests (which I won’t post here because they’re too many) to look for errors. So far, I haven’t found any.
The function is now split into two parts: There’s a separate function preg_strip which takes a regular expression and returns an array containing the bare expression (without delimiters) and an array of modifiers. This might come in handy (it already has, in fact; this is why I made this change).
The code now correctly handles back-references. This was necessary for my purpose after all. It wasn’t difficult to add, the regular expression used to capture the back-references just looks weird (and may actually be extremely inefficient, it looks NP-hard to me – but that’s only an intuition and only applies in weird edge cases). By the way, does anyone know a better way of checking for an uneven number of matches than my way? Negative lookbehinds won't work here because they only accept fixed-length strings instead of regular expressions. However, I need the regex here to test whether the preceding backslash is actually escaped itself.
Additionally, I don’t know how good PHP is at caching anonymous create_function use. Performance-wise, this might not be the best solution but it seems good enough.
I’ve fixed a bug in the sanity check.
I’ve removed the cancellation of obsolete modifiers since my tests show that it isn't necessary.
By the way, this code is one of the core components of a syntax highlighter for various languages that I’m working on in PHP since I’m not satisfied with the alternatives listed elsewhere.
Thanks!
porneL, eyelidlessness, amazing work! Many, many thanks. I had actually given up.
I've built upon your solution and I'd like to share it here. I didn't implement re-numbering back-references since this isn't relevant in my case (I think …). Perhaps this will become necessary later, though.
Some Questions …
One thing, @eyelidlessness: Why do you feel the necessity to cancel old modifiers? As far as I see it, this isn't necessary since the modifiers are only applied locally anyway.
Ah yes, one other thing. Your escaping of the delimiter seems overly complicated. Care to explain why you think this is needed? I believe my version should work as well but I could be very wrong.
Also, I've changed the signature of your function to match my needs. I also think that my version is more generally useful. Again, I might be wrong.
BTW, you should now realize the importance of real names on SO. ;-) I can't give you real credit in the code. :-/
The Code
Anyway, I'd like to share my result so far because I can't believe that nobody else ever needs something like that. The code seems to work very well. Extensive tests are yet to be done, though. Please comment!
And without further ado …
/**
* Merges several regular expressions into one, using the indicated 'glue'.
*
* This function takes care of individual modifiers so it's safe to use
* <em>different</em> modifiers on the individual expressions. The order of
* sub-matches is preserved as well. Numbered back-references are adapted to
* the new overall sub-match count. This means that it's safe to use numbered
* back-references in the individual expressions!
* If {@link $names} is given, the individual expressions are captured in
* named sub-matches using the contents of that array as names.
* Matching pair-delimiters (e.g. <code>"{…}"</code>) are currently
* <strong>not</strong> supported.
*
* The function assumes that all regular expressions are well-formed.
* Behaviour is undefined if they aren't.
*
* This function was created after a {@link https://stackoverflow.com/questions/244959/
* StackOverflow discussion}. Much of it was written or thought of by
* “porneL” and “eyelidlessness”. Many thanks to both of them.
*
* @param string $glue A string to insert between the individual expressions.
* This should usually be either the empty string, indicating
* concatenation, or the pipe (<code>|</code>), indicating alternation.
* Notice that this string might have to be escaped since it is treated
* like a normal character in a regular expression (i.e. <code>/</code>)
* will end the expression and result in an invalid output.
* @param array $expressions The expressions to merge. The expressions may
* have arbitrary different delimiters and modifiers.
* @param array $names Optional. This is either an empty array or an array of
* strings of the same length as {@link $expressions}. In that case,
* the strings of this array are used to create named sub-matches for the
* expressions.
* @return string A string representing a regular expression equivalent to the
* merged expressions. Returns <code>FALSE</code> if an error occurred.
*/
function preg_merge($glue, array $expressions, array $names = array()) {
// … then, a miracle occurs.
// Sanity check …
$use_names = ($names !== null and count($names) !== 0);
if (
$use_names and count($names) !== count($expressions) or
!is_string($glue)
)
return false;
$result = array();
// For keeping track of the names for sub-matches.
$names_count = 0;
// For keeping track of *all* captures to re-adjust backreferences.
$capture_count = 0;
foreach ($expressions as $expression) {
if ($use_names)
$name = str_replace(' ', '_', $names[$names_count++]);
// Get delimiters and modifiers:
$stripped = preg_strip($expression);
if ($stripped === false)
return false;
list($sub_expr, $modifiers) = $stripped;
// Re-adjust backreferences:
// We assume that the expression is correct and therefore don't check
// for matching parentheses.
$number_of_captures = preg_match_all('/\([^?]|\(\?[^:]/', $sub_expr, $_);
if ($number_of_captures === false)
return false;
if ($number_of_captures > 0) {
// NB: This looks NP-hard. Consider replacing.
$backref_expr = '/
( # Only match when not escaped:
[^\\\\] # guarantee an even number of backslashes
(\\\\*?)\\2 # (twice n, preceded by something else).
)
\\\\ (\d) # Backslash followed by a digit.
/x';
$sub_expr = preg_replace_callback(
$backref_expr,
// Closure in place of create_function(), which was removed in PHP 8.
function ($m) use ($capture_count) {
return $m[1] . '\\' . ((int)$m[3] + $capture_count);
},
$sub_expr
);
$capture_count += $number_of_captures;
}
// Last, construct the new sub-match:
$modifiers = implode('', $modifiers);
$sub_modifiers = "(?$modifiers)";
if ($sub_modifiers === '(?)')
$sub_modifiers = '';
$sub_name = $use_names ? "?<$name>" : '?:';
$new_expr = "($sub_name$sub_modifiers$sub_expr)";
$result[] = $new_expr;
}
return '/' . implode($glue, $result) . '/';
}
/**
* Strips a regular expression string of its delimiters and modifiers.
* Additionally, normalize the delimiters (i.e. reformat the pattern so that
* it could have used '/' as delimiter).
*
* @param string $expression The regular expression string to strip.
* @return array An array whose first entry is the expression itself, the
* second an array of modifiers. If the argument is not a valid regular
* expression, returns <code>FALSE</code>.
*
*/
function preg_strip($expression) {
if (preg_match('/^(.)(.*)\\1([imsxeADSUXJu]*)$/s', $expression, $matches) !== 1)
return false;
$delim = $matches[1];
$sub_expr = $matches[2];
if ($delim !== '/') {
// Replace occurrences of the escaped delimiter with its unescaped
// version and escape the new delimiter.
$sub_expr = str_replace("\\$delim", $delim, $sub_expr);
$sub_expr = str_replace('/', '\\/', $sub_expr);
}
$modifiers = $matches[3] === '' ? array() : str_split(trim($matches[3]));
return array($sub_expr, $modifiers);
}
PS: I've made this posting community wiki editable. You know what this means …!
I'm pretty sure it's not possible to just put regexps together like that in any language - they could have incompatible modifiers.
I'd probably just put them in an array and loop through them, or combine them by hand.
Edit: If you're doing them one at a time as described in your edit, you may be able to run the second one on a substring (from the start up to the earliest match). That might help things.
function preg_magic_coalesce($split, $re1, $re2) {
$re1 = rtrim($re1, "\/#is");
$re2 = ltrim($re2, "\/#");
return $re1.$split.$re2;
}
You could do it the alternative way like this:
$a = '# /[a-z] #i';
$b = '/ Moo /x';
$a_matched = preg_match($a, $text, $a_matches);
$b_matched = preg_match($b, $text, $b_matches);
if ($a_matched && $b_matched) {
// Index 0 holds the full match; neither pattern has a capture group.
$a_pos = strpos($text, $a_matches[0]);
$b_pos = strpos($text, $b_matches[0]);
if ($a_pos == $b_pos) {
if (strlen($a_matches[0]) == strlen($b_matches[0])) {
// $a and $b matched the exact same string
} else if (strlen($a_matches[0]) > strlen($b_matches[0])) {
// $a and $b started matching at the same spot but $a is longer
} else {
// $a and $b started matching at the same spot but $b is longer
}
} else if ($a_pos < $b_pos) {
// $a matched first
} else {
// $b matched first
}
} else if ($a_matched) {
// $a matched, $b didn't
} else if ($b_matched) {
// $b matched, $a didn't
} else {
// neither one matched
}
