php test if string ends with _text + number - php

I have a couple of issues with this if statement which checks if a string ends with "address".
E.g this matches user_address, user_address1, user_address99, etc. Which is correct.
The problem is that it also matches user_address_location which is not correct.
I only want this to match if:
Is ends with _address
Also if it has a number on the end e.g _address2
/* Only establish an address field as ADDRESS if follows "user_address1" format */
if((stristr($column,"address") !== false)) {
$parts = explode("_", $column);
if (!empty($parts)) {
// address code
}
}

This might be a decent place to use a regex
if(preg_match('/^user_addesss\d*$/', $column) === 1){
}

Use regular expressions:
if(preg_match('/_address(\d+)?$/', $column))
{
}
if you are doing a lot of string comparing and manipulation this web application will be very useful to you: http://gskinner.com/RegExr/
It allows you to develop regular expressions against content with live feedback on matches and replacements.

You can make use of $ of regular expressions here. When using regular expressions $ specifies the end of a string
So, you can search for this regular expression:
$regexp = "^.*_address\d+$";
^ is the start, .* indicates any number of any characters _address is what you want to search for, \d+ says it can have numbers after address, and $ indicates end of string.
You can read more about regular expressions, and preg_match on php.net

change if((stristr($column,"address") !== false)) { to if(preg_match("/^user_address[0-9]*/", $column) == 1) {

I wanted to find a way to do it without regular expressions, sorry I should of made that clear in the question.
But I think I have found a way, not sure if it can be improved or performance compared to the regular expressions.
/* First establish it could be an address field e.g "user_address1" format */
$column_address = strpos($column, "_address");
if($column_address !== false) {
// remove everything before "_address"
$last = substr($column, $column_address);
// Check if "_address" OR remove "_address" and check if int
if($last == "_address" || intval(substr($last, 8))){

Related

check words with preg_match

I have some words with | between each one and I have tried to use preg_match to detect if it's containing target word or not.
I have used this:
<?php
$c_words = 'od|lom|pod|dyk';
$my_word = 'od'; // only od not pod or other word
if (preg_match('/$my_word/', $c_words))
{
echo 'ok';
}
?>
But it doesn't work correctly.
Please help.
No need for regular expressions. The functions explode($delimiter, $str); and in_array($needle, $haystack); will do everything for you.
// splits words into an array
$array = explode('|', $c_words);
// check if "$my_word" exists in the array.
if(in_array($my_word, $array)) {
// YEP
} else {
// NOPE
}
Apart from that, your regular expression would match other words containing the same sequence too.
preg_match('/my/', 'myword|anotherword'); // true
preg_match('/another/', 'myword|anotherword'); // true
That's exactly why you shouldn't use regular expressions in this case.
You can't pass a variable into a string with single quotes, you need to use either
preg_match("/$my_word/", $c_words);
Or – and I find that cleaner :
preg_match('/' .$my_word. '/', $c_words);
But for something as simple as that I don't even know if I'd use a Regex, a simple if (strpos($c_words, $my_word) !== 0) should be enough.
You are using preg_match() the wrong way. Since you're using | as a delimiter you can try this:
if (preg_match('/'.$all_words.'/', $my_word, $c_words))
{
echo 'ok';
}
Read the documentation for preg_match().

Regex to match Youtube URL's

I am trying to validate a Youtube URL using regex:
preg_match('~http://youtube.com/watch\?v=[a-zA-Z0-9-]+~', $videoLink)
It kind of works, but it can match URL's that are malformed. For example, this will match ok:
http://www.youtube.com/watch?v=Zu4WXiPRek
But so will this:
http://www.youtube.com/watch?v=Zu4WX£&P!ek
And this wont:
http://www.youtube.com/watch?v=!Zu4WX£&P4ek
I think it's because of the + operator. It's matching what seems to be the first character after v=, when it needs to try and match everything behind v= with [a-zA-Z0-9-]. Any help is appreciated, thanks.
To provide an alternative that is larger and much less elegant than a regex, but works with PHP's native URL parsing functions so it might be a bit more reliable in the long run:
$url = "http://www.youtube.com/watch?v=Zu4WXiPRek";
$query_string = parse_url($url, PHP_URL_QUERY); // v=Zu4WXiPRek
$query_string_parsed = array();
parse_str($query_string, $query_string_parsed); // an array with all GET params
echo($query_string_parsed["v"]); // Will output Zu4WXiPRek that you can then
// validate for [a-zA-Z0-9] using a regex
The problem is that you are not requiring any particular number of characters in the v= part of the URL. So, for instance, checking
http://www.youtube.com/watch?v=Zu4WX£&P!ek
will match
http://www.youtube.com/watch?v=Zu4WX
and therefore return true. You need to either specify the number of characters you need in the v= part:
preg_match('~http://youtube.com/watch\?v=[a-zA-Z0-9-]{10}~', $videoLink)
or specify that the group [a-zA-Z0-9-] must be the last part of the string:
preg_match('~http://youtube.com/watch\?v=[a-zA-Z0-9-]+$~', $videoLink)
Your other example
http://www.youtube.com/watch?v=!Zu4WX£&P4ek
does not match, because the + sign requires that at least one character must match [a-zA-Z0-9-].
Short answer:
preg_match('%(http://www.youtube.com/watch\?v=(?:[a-zA-Z0-9-])+)(?:[&"\'\s])%', $videoLink)
There are a few assumptions made here, so let me explain:
I added a capturing group ( ... ) around the entire http://www.youtube.com/watch?v=blah part of the link, so that we can say "I want get the whole validated link up to and including the ?v=movieHash"
I added the non-capturing group (?: ... ) around your character set [a-zA-Z0-9-] and left the + sign outside of that. This will allow us to match all allowable characters up to a certain point.
Most importantly, you need to tell it how you expect your link to terminate. I'm taking a guess for you with (?:[&"\'\s])
?) Will it be in html format (e.g. anchor tag) ? If so, the link in href will obviously end with a " or '.
?) Or maybe there's more to the query string, so there would be an & after the value of v.
?) Maybe there's a space or line break after the end of the link \s.
The important piece is that you can get much more accurate results if you know what's surrounding what you are searching for, as is the case with many regular expressions.
This non-capturing group (in which I'm making assumptions for you) will take a stab at finding and ignoring all the extra junk after what you care about (the ?v=awesomeMovieHash).
Results:
http://www.youtube.com/watch?v=Zu4WXiPRek
- Group 1 contains the http://www.youtube.com/watch?v=Zu4WXiPRek
http://www.youtube.com/watch?v=Zu4WX&a=b
- Group 1 contains http://www.youtube.com/watch?v=Zu4WX
http://www.youtube.com/watch?v=!Zu4WX£&P4ek
- No match
a href="http://www.youtube.com/watch?v=Zu4WX&size=large"
- Group 1 contains http://www.youtube.com/watch?v=Zu4WX
http://www.youtube.com/watch?v=Zu4WX£&P!ek
- No match
The "v=..." blob is not guaranteed to be the first parameter in the query part of the URL. I'd recommend using PHP's parse_url() function to break the URL into its component parts. You can also reassemble a pristine URL if someone began the string with "https://" or simply used "youtube.com" instead of "www.youtube.com", etc.
function get_youtube_vidid ($url) {
$vidid = false;
$valid_schemes = array ('http', 'https');
$valid_hosts = array ('www.youtube.com', 'youtube.com');
$valid_paths = array ('/watch');
$bits = parse_url ($url);
if (! is_array ($bits)) {
return false;
}
if (! (array_key_exists ('scheme', $bits)
and array_key_exists ('host', $bits)
and array_key_exists ('path', $bits)
and array_key_exists ('query', $bits))) {
return false;
}
if (! in_array ($bits['scheme'], $valid_schemes)) {
return false;
}
if (! in_array ($bits['host'], $valid_hosts)) {
return false;
}
if (! in_array ($bits['path'], $valid_paths)) {
return false;
}
$querypairs = explode ('&', $bits['query']);
if (count ($querypairs) < 1) {
return false;
}
foreach ($querypairs as $querypair) {
list ($key, $value) = explode ('=', $querypair);
if ($key == 'v') {
if (preg_match ('/^[a-zA-Z0-9\-_]+$/', $value)) {
# Set the return value
$vidid = $value;
}
}
}
return $vidid;
}
Following regex will match any youtube link:
$pattern='#(((http(s)?://(www\.)?)|(www\.)|\s)(youtu\.be|youtube\.com)/(embed/|v/|watch(\?v=|\?.+&v=|/))?([a-zA-Z0-9._\/~#&=;%+?-\!]+))#si';

Get more backreferences from regexp than parenthesis

Ok this is really difficult to explain in English, so I'll just give an example.
I am going to have strings in the following format:
key-value;key1-value;key2-...
and I need to extract the data to be an array
array('key'=>'value','key1'=>'value1', ... )
I was planning to use regexp to achieve (most of) this functionality, and wrote this regular expression:
/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/
to work with preg_match and this code:
for ($l = count($matches),$i = 1;$i<$l;$i+=2) {
$parameters[$matches[$i]] = $matches[$i+1];
}
However the regexp obviously returns only 4 backreferences - first and last key-value pairs of the input string. Is there a way around this? I know I can use regex just to test the correctness of the string and use PHP's explode in loops with perfect results, but I'm really curious whether it's possible with regular expressions.
In short, I need to capture an arbitrary number of these key-value; pairs in a string by means of regular expressions.
You can use a lookahead to validate the input while you extract the matches:
/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/
(?=(?:\w++-[^;-]++;?)++$) is the validation part. If the input is invalid, matching will fail immediately, but the lookahead still gets evaluated every time the regex is applied. In order to keep it (along with the rest of the regex) in sync with the key-value pairs, I used \G to anchor each match to the spot where the previous match ended.
This way, if the lookahead succeeds the first time, it's guaranteed to succeed every subsequent time. Obviously it's not as efficient as it could be, but that probably won't be a problem--only your testing can tell for sure.
If the lookahead fails, preg_match_all() will return zero (false). If it succeeds, the matches will be returned in an array of arrays: one for the full key-value pairs, one for the keys, one for the values.
regex is powerful tool, but sometimes, its not the best approach.
$string = "key-value;key1-value";
$s = explode(";",$string);
foreach($s as $k){
$e = explode("-",$k);
$array[$e[0]]=$e[1];
}
print_r($array);
Use preg_match_all() instead. Maybe something like:
$matches = $parameters = array();
$input = 'key-value;key1-value1;key2-value2;key123-value123;';
preg_match_all("/(\w+)-([^-;]+)/", $input, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$parameters[$match[1]] = $match[2];
}
print_r($parameters);
EDIT:
to first validate if the input string conforms to the pattern, then just use:
if (preg_match("/^((\w+)-([^-;]+);)+$/", $input) > 0) {
/* do the preg_match_all stuff */
}
EDIT2: the final semicolon is optional
if (preg_match("/^(\w+-[^-;]+;)*\w+-[^-;]+$/", $input) > 0) {
/* do the preg_match_all stuff */
}
No. Newer matches overwrite older matches. Perhaps the limit argument of explode() would be helpful when exploding.
what about this solution:
$samples = array(
"good" => "key-value;key1-value;key2-value;key5-value;key-value;",
"bad1" => "key-value-value;key1-value;key2-value;key5-value;key-value;",
"bad2" => "key;key1-value;key2-value;key5-value;key-value;",
"bad3" => "k%ey;key1-value;key2-value;key5-value;key-value;"
);
foreach($samples as $name => $value) {
if (preg_match("/^(\w+-\w+;)+$/", $value)) {
printf("'%s' matches\n", $name);
} else {
printf("'%s' not matches\n", $name);
}
}
I don't think you can do both validation and extraction of data with one single regexp, as you need anchors (^ and $) for validation and preg_match_all() for the data, but if you use anchors with preg_match_all() it will only return the last set matched.

Regex error: 'Warning: ereg() [function.ereg]: REG_ERANGE' in PHP

The code below gives me this mysterious error, and i cannot fathom it. I am new to regular expressions and so am consequently stumped. The regular expression should be validating any international phone number.
Any help would be much appreciated.
function validate_phone($phone)
{
$phoneregexp ="^(\+[1-9][0-9]*(\([0-9]*\)|-[0-9]*-))?[0]?[1-9][0-9\- ]*$";
$phonevalid = 0;
if (ereg($phoneregexp, $phone))
{
$phonevalid = 1;
}else{
$phonevalid = 0;
}
}
Hmm well the code you pasted isn't quite valid, I fixed it up by adding the missing quotes, missing delimiters, and changed preg to preg_match. I didn't get the warning.
Edit: after seeing the other comment, you meant "ereg" not "preg"... that gives the warning. Try using preg_match() instead ;)
<?php
function validate_phone($phone) {
$phoneregexp ='/^(\+[1-9][0-9]*(\([0-9]*\)|-[0-9]*-))?[0]?[1-9][0-9\- ]*$/';
$phonevalid = 0;
if (preg_match($phoneregexp, $phone)) {
$phonevalid = 1;
} else {
$phonevalid = 0;
}
}
validate_phone("123456");
?>
If this is PHP, then the regex must be enclosed in quotes. Furthermore, what's preg? Did you mean preg_match?
Another thing. PHP knows boolean values. The canonical solution would rather look like this:
return preg_match($regex, $phone) !== 0;
EDIT: Or, using ereg:
return ereg($regex, $phone) !== FALSE;
(Here, the explicit test against FALSE isn't strictly necessary but since ereg returns a number upon success I feel safer coercing the value into a bool).
It's the [0-9\\- ] part of your RE - it's not escaping the "-" properly. Change it to [0-9 -] and you should be OK (a "-" at the last position in a character class is treated as literal, not part of a range specification).
Just to provide some reference material please read
Regular Expressions (Perl-Compatible)
preg_match()
or if you'd like to stick with the POSIX regexp:
Regular Expression (POSIX Extended)
ereg()
The correct sample code has already been given above.

Coalescing regular expressions in PHP

Suppose I have the following two strings containing regular expressions. How do I coalesce them? More specifically, I want to have the two expressions as alternatives.
$a = '# /[a-z] #i';
$b = '/ Moo /x';
$c = preg_magic_coalesce('|', $a, $b);
// Desired result should be equivalent to:
// '/ \/[a-zA-Z] |Moo/'
Of course, doing this as string operations isn't practical because it would involve parsing the expressions, constructing syntax trees, coalescing the trees and then outputting another regular expression equivalent to the tree. I'm completely happy without this last step. Unfortunately, PHP doesn't have a RegExp class (or does it?).
Is there any way to achieve this? Incidentally, does any other language offer a way? Isn't this a pretty normal scenario? Guess not. :-(
Alternatively, is there a way to check efficiently if either of the two expressions matches, and which one matches earlier (and if they match at the same position, which match is longer)? This is what I'm doing at the moment. Unfortunately, I do this on long strings, very often, for more than two patterns. The result is slow (and yes, this is definitely the bottleneck).
EDIT:
I should have been more specific – sorry. $a and $b are variables, their content is outside of my control! Otherwise, I would just coalesce them manually. Therefore, I can't make any assumptions about the delimiters or regex modifiers used. Notice, for example, that my first expression uses the i modifier (ignore casing) while the second uses x (extended syntax). Therefore, I can't just concatenate the two because the second expression does not ignore casing and the first doesn't use the extended syntax (and any whitespace therein is significant!
I see that porneL actually described a bunch of this, but this handles most of the problem. It cancels modifiers set in previous sub-expressions (which the other answer missed) and sets modifiers as specified in each sub-expression. It also handles non-slash delimiters (I could not find a specification of what characters are allowed here so I used ., you may want to narrow further).
One weakness is it doesn't handle back-references within expressions. My biggest concern with that is the limitations of back-references themselves. I'll leave that as an exercise to the reader/questioner.
// Pass as many expressions as you'd like
function preg_magic_coalesce() {
$active_modifiers = array();
$expression = '/(?:';
$sub_expressions = array();
foreach(func_get_args() as $arg) {
// Determine modifiers from sub-expression
if(preg_match('/^(.)(.*)\1([eimsuxADJSUX]+)$/', $arg, $matches)) {
$modifiers = preg_split('//', $matches[3]);
if($modifiers[0] == '') {
array_shift($modifiers);
}
if($modifiers[(count($modifiers) - 1)] == '') {
array_pop($modifiers);
}
$cancel_modifiers = $active_modifiers;
foreach($cancel_modifiers as $key => $modifier) {
if(in_array($modifier, $modifiers)) {
unset($cancel_modifiers[$key]);
}
}
$active_modifiers = $modifiers;
} elseif(preg_match('/(.)(.*)\1$/', $arg)) {
$cancel_modifiers = $active_modifiers;
$active_modifiers = array();
}
// If expression has modifiers, include them in sub-expression
$sub_modifier = '(?';
$sub_modifier .= implode('', $active_modifiers);
// Cancel modifiers from preceding sub-expression
if(count($cancel_modifiers) > 0) {
$sub_modifier .= '-' . implode('-', $cancel_modifiers);
}
$sub_modifier .= ')';
$sub_expression = preg_replace('/^(.)(.*)\1[eimsuxADJSUX]*$/', $sub_modifier . '$2', $arg);
// Properly escape slashes
$sub_expression = preg_replace('/(?<!\\\)\//', '\\\/', $sub_expression);
$sub_expressions[] = $sub_expression;
}
// Join expressions
$expression .= implode('|', $sub_expressions);
$expression .= ')/';
return $expression;
}
Edit: I've rewritten this (because I'm OCD) and ended up with:
function preg_magic_coalesce($expressions = array(), $global_modifier = '') {
if(!preg_match('/^((?:-?[eimsuxADJSUX])+)$/', $global_modifier)) {
$global_modifier = '';
}
$expression = '/(?:';
$sub_expressions = array();
foreach($expressions as $sub_expression) {
$active_modifiers = array();
// Determine modifiers from sub-expression
if(preg_match('/^(.)(.*)\1((?:-?[eimsuxADJSUX])+)$/', $sub_expression, $matches)) {
$active_modifiers = preg_split('/(-?[eimsuxADJSUX])/',
$matches[3], -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
}
// If expression has modifiers, include them in sub-expression
if(count($active_modifiers) > 0) {
$replacement = '(?';
$replacement .= implode('', $active_modifiers);
$replacement .= ':$2)';
} else {
$replacement = '$2';
}
$sub_expression = preg_replace('/^(.)(.*)\1(?:(?:-?[eimsuxADJSUX])*)$/',
$replacement, $sub_expression);
// Properly escape slashes if another delimiter was used
$sub_expression = preg_replace('/(?<!\\\)\//', '\\\/', $sub_expression);
$sub_expressions[] = $sub_expression;
}
// Join expressions
$expression .= implode('|', $sub_expressions);
$expression .= ')/' . $global_modifier;
return $expression;
}
It now uses (?modifiers:sub-expression) rather than (?modifiers)sub-expression|(?cancel-modifiers)sub-expression but I've noticed that both have some weird modifier side-effects. For instance, in both cases if a sub-expression has a /u modifier, it will fail to match (but if you pass 'u' as the second argument of the new function, that will match just fine).
Strip delimiters and flags from each. This regex should do it:
/^(.)(.*)\1([imsxeADSUXJu]*)$/
Join expressions together. You'll need non-capturing parenthesis to inject flags:
"(?$flags1:$regexp1)|(?$flags2:$regexp2)"
If there are any back references, count capturing parenthesis and update back references accordingly (e.g. properly joined /(.)x\1/ and /(.)y\1/ is /(.)x\1|(.)y\2/ ).
EDIT
I’ve rewritten the code! It now contains the changes that are listed as follows. Additionally, I've done extensive tests (which I won’t post here because they’re too many) to look for errors. So far, I haven’t found any.
The function is now split into two parts: There’s a separate function preg_split which takes a regular expression and returns an array containing the bare expression (without delimiters) and an array of modifiers. This might come in handy (it already has, in fact; this is why I made this change).
The code now correctly handles back-references. This was necessary for my purpose after all. It wasn’t difficult to add, the regular expression used to capture the back-references just looks weird (and may actually be extremely inefficient, it looks NP-hard to me – but that’s only an intuition and only applies in weird edge cases). By the way, does anyone know a better way of checking for an uneven number of matches than my way? Negative lookbehinds won't work here because they only accept fixed-length strings instead of regular expressions. However, I need the regex here to test whether the preceeding backslash is actually escaped itself.
Additionally, I don’t know how good PHP is at caching anonymous create_function use. Performance-wise, this might not be the best solution but it seems good enough.
I’ve fixed a bug in the sanity check.
I’ve removed the cancellation of obsolete modifiers since my tests show that it isn't necessary.
By the way, this code is one of the core components of a syntax highlighter for various languages that I’m working on in PHP since I’m not satisfied with the alternatives listed elsewhere.
Thanks!
porneL, eyelidlessness, amazing work! Many, many thanks. I had actually given up.
I've built upon your solution and I'd like to share it here. I didn't implement re-numbering back-references since this isn't relevant in my case (I think …). Perhaps this will become necessary later, though.
Some Questions …
One thing, #eyelidlessness: Why do you feel the necessity to cancel old modifiers? As far as I see it, this isn't necessary since the modifiers are only applied locally anyway.
Ah yes, one other thing. Your escaping of the delimiter seems overly complicated. Care to explain why you think this is needed? I believe my version should work as well but I could be very wrong.
Also, I've changed the signature of your function to match my needs. I also thing that my version is more generally useful. Again, I might be wrong.
BTW, you should now realize the importance of real names on SO. ;-) I can't give you real credit in the code. :-/
The Code
Anyway, I'd like to share my result so far because I can't believe that nobody else ever needs something like that. The code seems to work very well. Extensive tests are yet to be done, though. Please comment!
And without further ado …
/**
* Merges several regular expressions into one, using the indicated 'glue'.
*
* This function takes care of individual modifiers so it's safe to use
* <em>different</em> modifiers on the individual expressions. The order of
* sub-matches is preserved as well. Numbered back-references are adapted to
* the new overall sub-match count. This means that it's safe to use numbered
* back-refences in the individual expressions!
* If {#link $names} is given, the individual expressions are captured in
* named sub-matches using the contents of that array as names.
* Matching pair-delimiters (e.g. <code>"{…}"</code>) are currently
* <strong>not</strong> supported.
*
* The function assumes that all regular expressions are well-formed.
* Behaviour is undefined if they aren't.
*
* This function was created after a {#link https://stackoverflow.com/questions/244959/
* StackOverflow discussion}. Much of it was written or thought of by
* “porneL” and “eyelidlessness”. Many thanks to both of them.
*
* #param string $glue A string to insert between the individual expressions.
* This should usually be either the empty string, indicating
* concatenation, or the pipe (<code>|</code>), indicating alternation.
* Notice that this string might have to be escaped since it is treated
* like a normal character in a regular expression (i.e. <code>/</code>)
* will end the expression and result in an invalid output.
* #param array $expressions The expressions to merge. The expressions may
* have arbitrary different delimiters and modifiers.
* #param array $names Optional. This is either an empty array or an array of
* strings of the same length as {#link $expressions}. In that case,
* the strings of this array are used to create named sub-matches for the
* expressions.
* #return string An string representing a regular expression equivalent to the
* merged expressions. Returns <code>FALSE</code> if an error occurred.
*/
function preg_merge($glue, array $expressions, array $names = array()) {
// … then, a miracle occurs.
// Sanity check …
$use_names = ($names !== null and count($names) !== 0);
if (
$use_names and count($names) !== count($expressions) or
!is_string($glue)
)
return false;
$result = array();
// For keeping track of the names for sub-matches.
$names_count = 0;
// For keeping track of *all* captures to re-adjust backreferences.
$capture_count = 0;
foreach ($expressions as $expression) {
if ($use_names)
$name = str_replace(' ', '_', $names[$names_count++]);
// Get delimiters and modifiers:
$stripped = preg_strip($expression);
if ($stripped === false)
return false;
list($sub_expr, $modifiers) = $stripped;
// Re-adjust backreferences:
// We assume that the expression is correct and therefore don't check
// for matching parentheses.
$number_of_captures = preg_match_all('/\([^?]|\(\?[^:]/', $sub_expr, $_);
if ($number_of_captures === false)
return false;
if ($number_of_captures > 0) {
// NB: This looks NP-hard. Consider replacing.
$backref_expr = '/
( # Only match when not escaped:
[^\\\\] # guarantee an even number of backslashes
(\\\\*?)\\2 # (twice n, preceded by something else).
)
\\\\ (\d) # Backslash followed by a digit.
/x';
$sub_expr = preg_replace_callback(
$backref_expr,
create_function(
'$m',
'return $m[1] . "\\\\" . ((int)$m[3] + ' . $capture_count . ');'
),
$sub_expr
);
$capture_count += $number_of_captures;
}
// Last, construct the new sub-match:
$modifiers = implode('', $modifiers);
$sub_modifiers = "(?$modifiers)";
if ($sub_modifiers === '(?)')
$sub_modifiers = '';
$sub_name = $use_names ? "?<$name>" : '?:';
$new_expr = "($sub_name$sub_modifiers$sub_expr)";
$result[] = $new_expr;
}
return '/' . implode($glue, $result) . '/';
}
/**
* Strips a regular expression string off its delimiters and modifiers.
* Additionally, normalize the delimiters (i.e. reformat the pattern so that
* it could have used '/' as delimiter).
*
* #param string $expression The regular expression string to strip.
* #return array An array whose first entry is the expression itself, the
* second an array of delimiters. If the argument is not a valid regular
* expression, returns <code>FALSE</code>.
*
*/
function preg_strip($expression) {
if (preg_match('/^(.)(.*)\\1([imsxeADSUXJu]*)$/s', $expression, $matches) !== 1)
return false;
$delim = $matches[1];
$sub_expr = $matches[2];
if ($delim !== '/') {
// Replace occurrences by the escaped delimiter by its unescaped
// version and escape new delimiter.
$sub_expr = str_replace("\\$delim", $delim, $sub_expr);
$sub_expr = str_replace('/', '\\/', $sub_expr);
}
$modifiers = $matches[3] === '' ? array() : str_split(trim($matches[3]));
return array($sub_expr, $modifiers);
}
PS: I've made this posting community wiki editable. You know what this means …!
I'm pretty sure it's not possible to just put regexps together like that in any language - they could have incompatible modifiers.
I'd probably just put them in an array and loop through them, or combine them by hand.
Edit: If you're doing them one at a time as described in your edit, you maybe be able to run the second one on a substring (from the start up to the earliest match). That might help things.
function preg_magic_coalasce($split, $re1, $re2) {
$re1 = rtrim($re1, "\/#is");
$re2 = ltrim($re2, "\/#");
return $re1.$split.$re2;
}
You could do it the alternative way like this:
$a = '# /[a-z] #i';
$b = '/ Moo /x';
$a_matched = preg_match($a, $text, $a_matches);
$b_matched = preg_match($b, $text, $b_matches);
if ($a_matched && $b_matched) {
$a_pos = strpos($text, $a_matches[1]);
$b_pos = strpos($text, $b_matches[1]);
if ($a_pos == $b_pos) {
if (strlen($a_matches[1]) == strlen($b_matches[1])) {
// $a and $b matched the exact same string
} else if (strlen($a_matches[1]) > strlen($b_matches[1])) {
// $a and $b started matching at the same spot but $a is longer
} else {
// $a and $b started matching at the same spot but $b is longer
}
} else if ($a_pos < $b_pos) {
// $a matched first
} else {
// $b matched first
}
} else if ($a_matched) {
// $a matched, $b didn't
} else if ($b_matched) {
// $b matched, $a didn't
} else {
// neither one matched
}

Categories