PHP preg_match optional slash not working properly - php

I have this regex:
`^(?:/(?P<cat1>[^/\.]+/?)?)(?:(?P<cat2>[^/\.]+/?)?)(?:(?P<cat3>[^/\.]+/?)?)(?:(?P<cat4>[^/\.]+/?)?)(?:(?P<slug>[^/\.]+))-(?:(?P<id>[0-9]++))$`u
Which should work with
/cat-one/product-14
/cat-one/cat-two/product-14
/cat-one/cat-two/cat-three/product-14
/cat-one/cat-two/cat-three/cat-four/product-14
The problem is that only the fourth one works correctly:
Array
(
[cat1] => cat-one
[cat2] => cat-two
[cat3] => cat-three
[cat4] => cat-four
[slug] => product
[id] => 14
)
For the first three, the 'slug' parameter gets only one letter and the category before it captures the rest of the slug:
Array
(
[cat1] => cat-one
[cat2] => cat-two
[cat3] => produc
[cat4] =>
[slug] => t
[id] => 14
)
I know the optional / is causing the problem, but I need it to match something else in the code, and this regex is generated dynamically, so I cannot add a special case for this situation only.
(?P<cat1>[^/\.]+/?)?)
How can I make the / optional but still get the result I need?
Thanks!
LE (later edit): The problem in the possible duplicate (preg match) was that I had different optional parameters and preg_match was not matching them accordingly. The question above is different: here a /? breaks my slug in two.

Don't make the / optional since the groups are optional.
This leaves the slug-id intact each time.
^/(?:(?<cat1>[^/.\r\n]+/)?)(?:(?<cat2>[^/.\r\n]+/)?)(?:(?<cat3>[^/.\r\n]+/)?)(?:(?<cat4>[^/.\r\n]+/)?)(?:(?<slug>[^/.\r\n]+))-(?:(?<id>[0-9]++))$
https://regex101.com/r/Z64x8l/1
Readable regex
^ /
(?:
(?<cat1> [^/.\r\n]+ / )? # (1)
)
(?:
(?<cat2> [^/.\r\n]+ / )? # (2)
)
(?:
(?<cat3> [^/.\r\n]+ / )? # (3)
)
(?:
(?<cat4> [^/.\r\n]+ / )? # (4)
)
(?:
(?<slug> [^/.\r\n]+ ) # (5)
)
-
(?:
(?<id> [0-9]++ ) # (6)
)
$
Note: \r\n was added for multiline purposes. If you have a single-line string, just take it out.
Also, if you believe there may be more nesting before slug-id that you don't account for, just add (?:[^/.\r\n]+/)* before the slug named group.
This will always keep the slug-id at the end.
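A minimal sketch of the pattern above in use, assuming PHP 7.1 or later. The helper name parseProductPath and the returned array shape are hypothetical, and the redundant (?: ) wrappers are dropped for brevity. Because each category group keeps its own trailing slash, the sketch trims it after matching.

```php
<?php
// Each optional category group owns its trailing slash, so the slug stays whole.
$pattern = '~^/(?<cat1>[^/.\r\n]+/)?(?<cat2>[^/.\r\n]+/)?(?<cat3>[^/.\r\n]+/)?(?<cat4>[^/.\r\n]+/)?(?<slug>[^/.\r\n]+)-(?<id>[0-9]++)$~u';

// Hypothetical helper: returns the categories, slug and id, or null when the path does not match.
function parseProductPath(string $pattern, string $path): ?array
{
    if (!preg_match($pattern, $path, $m)) {
        return null;
    }
    $cats = [];
    foreach (['cat1', 'cat2', 'cat3', 'cat4'] as $name) {
        // Unmatched optional groups come back as empty strings (or are missing).
        if (isset($m[$name]) && $m[$name] !== '') {
            $cats[] = rtrim($m[$name], '/');
        }
    }
    return ['cats' => $cats, 'slug' => $m['slug'], 'id' => (int) $m['id']];
}

$one  = parseProductPath($pattern, '/cat-one/product-14');
$four = parseProductPath($pattern, '/cat-one/cat-two/cat-three/cat-four/product-14');
```

With one category, $one['slug'] is the full word product rather than a single letter, which is the behavior the question asked for.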

Related

Regex: Capturing multiple instances in one word group

I'm not good at Regex and I've been trying for hours now so I hope you can help me. I have this text:
✝his is *✝he* *in✝erne✝*
I need to capture (using PREG_OFFSET_CAPTURE) only the ✝ in a word surrounded with *, so I only need to capture the last three ✝ in this example. The output array should look something like this:
[0] => Array
(
[0] => ✝
[1] => 17
)
[1] => Array
(
[0] => ✝
[1] => 32
)
[2] => Array
(
[0] => ✝
[1] => 44
)
I've tried using (✝) but of course this will select all instances, including the ones in words without asterisks. Then I've tried \*[^ ]*(✝)[^ ]*\* but this only gives me the last instance in one word. I've tried many other variations but all were wrong.
To clarify: The asterisk can be at any place in the string, but always at the beginning and end of a word. The opening asterisk is always preceded by a space, except at the beginning of the string, and the closing asterisk is always followed by a space, except at the end of the string. I must add that punctuation marks can be inside these asterisks. ✝ is exactly (and only) what I need to capture and can be at any position in a word.
You could make use of the \G anchor to get iterative matches between the *. The anchor matches either at the start of the string, or at the end of the previous match.
(?:\*|\G(?!^))[^&*]*(?>&(?!#)[^&*]*)*\K✝(?=[^*]*\*)
Explanation
(?: Non capture group
\* Match *
| Or
\G(?!^) Assert the end of the previous match, not at the start
) Close non capture group
[^&*]* Match 0+ times any char except & and *
(?> Atomic group
&(?!#) Match & only when not directly followed by #
[^&*]* Match 0+ times any char except & and *
)* Close atomic group and repeat 0+ times
\K Clear the match buffer (forget what is matched until now)
✝ Match literally
(?=[^*]*\*) Positive lookahead, assert a * at the right
Regex demo | Php demo
For example
$re = '/(?:\*|\G(?!^))[^&*]*(?>&(?!#)[^&*]*)*\K✝(?=[^*]*\*)/m';
$str = '✝his is *✝he* *in✝erne✝*';
preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);
print_r($matches[0]);
Output
Array
(
[0] => Array
(
[0] => ✝
[1] => 16
)
[1] => Array
(
[0] => ✝
[1] => 31
)
[2] => Array
(
[0] => ✝
[1] => 43
)
)
Note: the offsets are 1 less than the expected ones because offsets are counted from 0. See PREG_OFFSET_CAPTURE
If you want to match more variations, you could use a non capturing group and list the ones that you would accept to match. If you don't want to cross newline boundaries you can exclude matching those in the negated character class.
(?:\*|\G(?!^))[^&*\r\n]*(?>&(?!#)[^&*\r\n]*)*\K&#(?:x271D|169);(?=[^*\r\n]*\*)
Regex demo
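A small sketch of the entity variant, assuming the subject string carries the symbol as the HTML entity &#x271D; (the form this last pattern targets). PREG_OFFSET_CAPTURE reports byte offsets, and the offsets 16, 31 and 43 in the output above are consistent with that 8-byte entity form; a literal multi-byte ✝ in the subject would give different numbers.

```php
<?php
// Entity-aware pattern from the answer, with \r\n excluded in every negated class.
$re = '/(?:\*|\G(?!^))[^&*\r\n]*(?>&(?!#)[^&*\r\n]*)*\K&#(?:x271D|169);(?=[^*\r\n]*\*)/';

// Assumption: the symbol is entity-encoded in the subject.
$str = '&#x271D;his is *&#x271D;he* *in&#x271D;erne&#x271D;*';

preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);

// Keep only the byte offsets of the matched entities.
$offsets = array_column($matches[0], 1);
print_r($offsets);
```

The first entity (offset 0) is skipped because it is not inside a pair of asterisks.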

Validate url parameters with preg_match

Valid example
12[red,green],13[xs,xl,xxl,some other text with chars like _&-##%]
number[anythingBut ()[]{},anythingBut ()[]{}](,number[anythingBut ()[]{},anythingBut ()[]{}]) or nothing
Full match 12[red,green]
Group 1 12
Group 2 red,green
Full match 13[xs,xl,xxl,some other text with chars like _&-##%]
Group 1 13
Group 2 xs,xl,xxl,some other text with chars like _&-##%
Not valid example
13[xs,xl,xxl 9974-?ds12[dfgd,dfgd]]
What I tried is this: (\d+(?=\[))\[([^\(\[\{\}\]\)]+)\], regex101 link with what I tried, but this also matches wrong input like given in the example.
If you just need to validate the input, you can add some anchors:
^(?:\d+\[[^\(\[\{\}\]\)]+\](?:,|$))+$
Regex101
If you also need to get all the matching parts, you can use another regex. Using only one will not work well.
$in = '12[red,green],13[xs,xl,xxl,some other text with chars like _&-##%],13[xs,xl,xxl 9974-?ds12[dfgd,dfgd]]';
preg_match_all('/(\d+)\[([^][{}()]+)(?=\](?:,|$))/', $in, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => 12[red,green
[1] => 13[xs,xl,xxl,some other text with chars like _&-##%
)
[1] => Array
(
[0] => 12
[1] => 13
)
[2] => Array
(
[0] => red,green
[1] => xs,xl,xxl,some other text with chars like _&-##%
)
)
Explanation:
/ : regex delimiter
(\d+) : group 1, 1 or more digits
\[ : open square bracket
( : start group 2
[^][{}()]+ : 1 or more of any character that is not an opening or closing parenthesis, curly brace, or square bracket
) : end group 2
(?= : positive lookahead, make sure what follows is
\] : a close square bracket
(?:,|$) : non capture group, a comma or end of string
) : end lookahead
/ : regex delimiter
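The two-step approach (validate first, then extract) could be sketched as follows. The function name parseOptions and the returned structure are hypothetical. The validation step here uses a slightly stricter variant than the anchored pattern above, item(,item)*, so that a trailing comma is rejected; that choice is an assumption, not part of the original answer.

```php
<?php
// Hypothetical helper: returns a list of ['id' => int, 'values' => string[]],
// an empty list for an empty input, or null when the input is malformed.
function parseOptions(string $input): ?array
{
    if ($input === '') {
        return []; // "or nothing" is treated as valid
    }
    $item = '\d+\[[^][{}()]+\]';
    // Step 1: the whole string must be a comma-separated list of items.
    if (!preg_match('/^' . $item . '(?:,' . $item . ')*$/', $input)) {
        return null;
    }
    // Step 2: the input is known to be well formed, so extraction can be simple.
    preg_match_all('/(\d+)\[([^][{}()]+)\]/', $input, $sets, PREG_SET_ORDER);
    $result = [];
    foreach ($sets as $set) {
        $result[] = ['id' => (int) $set[1], 'values' => explode(',', $set[2])];
    }
    return $result;
}

$ok  = parseOptions('12[red,green],13[xs,xl]');
$bad = parseOptions('13[xs,xl,xxl 9974-?ds12[dfgd,dfgd]]');
```

Because extraction only runs after validation succeeds, the lookahead from the answer is no longer needed in the second pattern.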

Regex - avoid any unnecessary searches preg_match() PHP

Hello, I have a small problem with my regular expression.
A simple example:
$pattern='/^(a([0-9]|[a-z])?|b(\=|\?)?)$/';
$subject='b=';
returns array:
Array
(
[0] => b=
[1] => b=
[2] =>
[3] => =
)
Index number 2 in this array comes from a(...)?. My question: can I avoid this field in my result? I have a very long pattern and 90% of my array is empty. Can I remove these empty fields with some magic characters?
Edit:
In my pattern i have something like that:
n(o|h)?(\+|\-|\(([+]?[0-9]+);([+]?[0-9]+)\))?
It will match strings like no+ or n(12;15). Can I make it simpler? I also have more alternatives like this one, which means I have something like this:
/^(n(o|h)?(\+|\-|\(([+]?[0-9]+);([+]?[0-9]+)\))?|i(o|h)?(\+|\-|\(([+]?[0-9]+);([+]?[0-9]+)\))?)$/
Regards
After reading your pattern, I assume that you can make it simpler with this version:
\A([in][oh]?)([+-]|\(\+?[0-9]+;\+?[0-9]+\))\z
demo
Note that I don't know exactly which captures you need, but you can add them as you want.
details:
\A # anchor for the start of the string
( # capture group 1:
[in] # an 'i' or an 'n'
[oh]? # an 'o' or an 'h' (optional)
)
( # capture group 2:
[+-] # a '+' or a '-'
| # OR
\(\+?[0-9]+;\+?[0-9]+\)
)
\z # anchor for the end of the string
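A short sketch of both ideas. The first pattern is a hypothetical rewrite of the simple example from the question, using non-capturing groups (?:...) and a character class so that only the group that is actually needed lands in the result. The second is the simplified pattern above applied to a few sample inputs; the sample strings are assumptions.

```php
<?php
// Non-capturing groups keep the alternation but add no empty slots to $m.
$simple = '/^(?:a[0-9a-z]?|b([=?])?)$/';
preg_match($simple, 'b=', $m);
print_r($m); // full match plus the single capture

// The simplified pattern from the answer: two captures, character classes
// instead of single-character alternations.
$pattern = '/\A([in][oh]?)([+-]|\(\+?[0-9]+;\+?[0-9]+\))\z/';
$results = [];
foreach (['no+', 'n(12;15)', 'ih-', 'nx+'] as $subject) {
    $results[$subject] = preg_match($pattern, $subject, $groups)
        ? array_slice($groups, 1) // drop the full match, keep the captures
        : null;
}
print_r($results);
```

Note that in this pattern the second group is mandatory, so a bare n or no does not match; add a ? after the group if the suffix should be optional, as it was in the question.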

preg_match_all for words in and outside of brackets

I have been sitting for hours trying to figure out a regExp for a preg_match_all function in PHP.
My problem is that I want two different things from the string.
Say you have the string "Code is fun [and good for the brain.] But the [brain is] tired."
What I need from this is an array of all the words outside of the brackets, plus the text inside each pair of brackets as one string.
Something like this
[0] => Code
[1] => is
[2] => fun
[3] => and good for the brain.
[4] => But
[5] => the
[6] => brain is
[7] => tired.
Help much appreciated.
You could try the below regex also,
(?<=\[)[^\]]*|[.\w]+
DEMO
Code:
<?php
$data = "Code is fun [and good for the brain.] But the [brain is] tired.";
$regex = '~(?<=\[)[^\]]*|[.\w]+~';
preg_match_all($regex, $data, $matches);
print_r($matches);
?>
Output:
Array
(
[0] => Array
(
[0] => Code
[1] => is
[2] => fun
[3] => and good for the brain.
[4] => But
[5] => the
[6] => brain is
[7] => tired.
)
)
The first part, with the lookbehind, (?<=\[)[^\]]* matches all the characters inside the square brackets [], and the second part [.\w]+ matches one or more word characters or dots from the remaining string.
You can use the following regex:
(?:\[([\w .!?]+)\]+|(\w+))
The regex contains two alternatives: one to match everything inside the two square brackets, and one to capture every other word.
This assumes that the part inside the square brackets doesn't contain any characters other than letters, digits, spaces, _, !, ., and ?. In case you need to add more punctuation, it should be easy enough to add it to the character class.
If you don't want to be that specific about what should be captured, then you can use a negated character class instead — specify what not to match instead of specifying what to match. The expression then becomes: (?:\[([^\[\]]+)\]|(\w+))
Explanation:
(?: # Begin non-capturing group
\[ # Match a literal '['
( # Start capturing group 1
[\w .!?]+ # Match everything in between '[' and ']'
) # End capturing group 1
\] # Match literal ']'
| # OR
( # Begin capturing group 2
\w+ # Match rest of the words
) # End capturing group 2
) # End non-capturing group
Demo
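Since this pattern puts bracketed text in group 1 and plain words in group 2, the result still has to be flattened into one list. A minimal sketch, using the negated-class variant; the word alternative is widened to [\w.]+ here (an assumption, borrowed from the first answer) so that "tired." keeps its trailing period:

```php
<?php
$data = "Code is fun [and good for the brain.] But the [brain is] tired.";

// Group 1: text inside [...]; group 2: a word outside brackets.
preg_match_all('/\[([^\[\]]+)\]|([\w.]+)/', $data, $sets, PREG_SET_ORDER);

$tokens = [];
foreach ($sets as $set) {
    // When group 1 did not participate it is an empty string and group 2 holds the word.
    $tokens[] = $set[1] !== '' ? $set[1] : $set[2];
}
print_r($tokens);
```

PREG_SET_ORDER keeps each match together with its groups, which makes the per-match choice between group 1 and group 2 straightforward.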

Validating US phone number with php/regex

EDIT: I've mixed and modified two of the answers given below to form the full function which now does what I had wanted and then some... So I figured I'd post it here in case anyone else comes looking for this same thing.
/*
 * Function to analyze string against many popular formatting styles of phone numbers
 * Also breaks phone number into its respective components
 * 3-digit area code, 3-digit exchange code, 4-digit subscriber number
 * After which it validates the 10 digit US number against NANPA guidelines
 */
function validPhone($phone) {
    $format_pattern = '/^(?:(?:\((?=\d{3}\)))?(\d{3})(?:(?<=\(\d{3})\))?[\s.\/-]?)?(\d{3})[\s\.\/-]?(\d{4})\s?(?:(?:(?:(?:e|x|ex|ext)\.?\:?|extension\:?)\s?)(?=\d+)(\d+))?$/';
    $nanpa_pattern = '/^(?:1)?(?(?!(37|96))[2-9][0-8][0-9](?<!(11)))?[2-9][0-9]{2}(?<!(11))[0-9]{4}(?<!(555(01([0-9][0-9])|1212)))$/';
    //Set array of variables to false initially
    $valid = array(
        'format' => false,
        'nanpa' => false,
        'ext' => false,
        'all' => false
    );
    //Check data against the format analyzer
    if(preg_match($format_pattern, $phone, $matchset)) {
        $valid['format'] = true;
    }
    //If formatted properly, continue
    if($valid['format']) {
        //Set array of new components
        $components = array(
            'ac' => $matchset[1], //area code
            'xc' => $matchset[2], //exchange code
            'sn' => $matchset[3], //subscriber number
            'xn' => isset($matchset[4]) ? $matchset[4] : '', //extension number (index 4 is absent when no extension matched)
        );
        //Set array of number variants
        $numbers = array(
            'original' => $matchset[0],
            'stripped' => substr(preg_replace('/\D/', '', $matchset[0]), 0, 10)
        );
        //Now let's check the first ten digits against NANPA standards
        if(preg_match($nanpa_pattern, $numbers['stripped'])) {
            $valid['nanpa'] = true;
        }
        //If the NANPA guidelines have been met, continue
        if($valid['nanpa']) {
            if(!empty($components['xn'])) {
                if(preg_match('/^[\d]{1,6}$/', $components['xn'])) {
                    $valid['ext'] = true;
                }
            }
            else {
                $valid['ext'] = true;
            }
        }
        //If the extension number is valid or non-existent, continue
        if($valid['ext']) {
            $valid['all'] = true;
        }
    }
    return $valid['all'];
}
You can resolve this using a lookahead assertion. Basically what we're saying is I want a series of specific letters, (e, ex, ext, x, extension) followed by one or more number. But we also want to cover the case where there's no extension at all.
Side note: you don't need brackets around single characters like [\s] or the [x] that follows. Also, you can group characters that are meant to be in the same spot, so instead of \s?\.?/?, you can use [\s\./]?, which means "one of any of those characters".
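To illustrate the side note: a run of optional single characters accepts several delimiters in a row, while a character class accepts at most one. A minimal sketch with two shortened, hypothetical patterns (7-digit numbers only):

```php
<?php
// Three optional delimiters in sequence: up to three delimiter characters can match.
$sequence = '/^\d{3}\s?\.?\/?\d{4}$/';
// One optional character class: at most one delimiter character can match.
$class = '/^\d{3}[\s.\/]?\d{4}$/';

$messy = '456 ./7890';
$clean = '456.7890';

$sequenceAcceptsMessy = preg_match($sequence, $messy) === 1;
$classAcceptsMessy    = preg_match($class, $messy) === 1;
$classAcceptsClean    = preg_match($class, $clean) === 1;

var_dump($sequenceAcceptsMessy, $classAcceptsMessy, $classAcceptsClean);
```

So the character class is not just shorter; it is also stricter than the sequence of optional characters.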
Here's an update with regex that resolves your comment here as well. I've added the explanation in the actual code.
<?php
$sPattern = "/^
(?: # Area Code
(?:
\( # Open Parentheses
(?=\d{3}\)) # Lookahead. Only if we have 3 digits and a closing parentheses
)?
(\d{3}) # 3 Digit area code
(?:
(?<=\(\d{3}) # Closing Parentheses. Lookbehind.
\) # Only if we have an open parentheses and 3 digits
)?
[\s.\/-]? # Optional Space Delimiter
)?
(\d{3}) # 3 Digits
[\s\.\/-]? # Optional Space Delimiter
(\d{4})\s? # 4 Digits and an Optional following Space
(?: # Extension
(?: # Lets look for some variation of 'extension'
(?:
(?:e|x|ex|ext)\.? # First, abbreviations, with an optional following period
|
extension # Now just the whole word
)
\s? # Optional Following Space
)
(?=\d+) # This is the Lookahead. Only accept that previous section IF it's followed by some digits.
(\d+) # Now grab the actual digits (the lookahead doesn't grab them)
)? # The Extension is Optional
$/x"; // /x modifier allows the expanded and commented regex
$aNumbers = array(
'123-456-7890x123',
'123.456.7890x123',
'123 456 7890 x123',
'(123) 456-7890 x123',
'123.456.7890x.123',
'123.456.7890 ext. 123',
'123.456.7890 extension 123456',
'123 456 7890',
'123-456-7890ex123',
'123.456.7890 ex123',
'123 456 7890 ext123',
'456-7890',
'456 7890',
'456 7890 x123',
'1234567890',
'() 456 7890'
);
foreach($aNumbers as $sNumber) {
if (preg_match($sPattern, $sNumber, $aMatches)) {
echo 'Matched ' . $sNumber . "\n";
print_r($aMatches);
} else {
echo 'Failed ' . $sNumber . "\n";
}
}
?>
And The Output:
Matched 123-456-7890x123
Array
(
[0] => 123-456-7890x123
[1] => 123
[2] => 456
[3] => 7890
[4] => 123
)
Matched 123.456.7890x123
Array
(
[0] => 123.456.7890x123
[1] => 123
[2] => 456
[3] => 7890
[4] => 123
)
Matched 123 456 7890 x123
Array
(
[0] => 123 456 7890 x123
[1] => 123
[2] => 456
[3] => 7890
[4] => 123
)
Matched (123) 456-7890 x123
Array
(
[0] => (123) 456-7890 x123
[1] => 123
[2] => 456
[3] => 7890
[4] => 123
)
Matched 123.456.7890x.123
Array
(
[0] => 123.456.7890x.123
[1] => 123
[2] => 456
[3] => 7890
[4] => 123
)
Matched 123.456.7890 ext. 123
Array
(
[0] => 123.456.7890 ext. 123
[1] => 123
[2] => 456
[3] => 7890
[4] => 123
)
Matched 123.456.7890 extension 123456
Array
(
[0] => 123.456.7890 extension 123456
[1] => 123
[2] => 456
[3] => 7890
[4] => 123456
)
Matched 123 456 7890
Array
(
[0] => 123 456 7890
[1] => 123
[2] => 456
[3] => 7890
)
Matched 123-456-7890ex123
Array
(
[0] => 123-456-7890ex123
[1] => 123
[2] => 456
[3] => 7890
[4] => 123
)
Matched 123.456.7890 ex123
Array
(
[0] => 123.456.7890 ex123
[1] => 123
[2] => 456
[3] => 7890
[4] => 123
)
Matched 123 456 7890 ext123
Array
(
[0] => 123 456 7890 ext123
[1] => 123
[2] => 456
[3] => 7890
[4] => 123
)
Matched 456-7890
Array
(
[0] => 456-7890
[1] =>
[2] => 456
[3] => 7890
)
Matched 456 7890
Array
(
[0] => 456 7890
[1] =>
[2] => 456
[3] => 7890
)
Matched 456 7890 x123
Array
(
[0] => 456 7890 x123
[1] =>
[2] => 456
[3] => 7890
[4] => 123
)
Matched 1234567890
Array
(
[0] => 1234567890
[1] => 123
[2] => 456
[3] => 7890
)
Failed () 456 7890
The current REGEX
/^[\(]?(\d{0,3})[\)]?[\.]?[\/]?[\s]?[\-]?(\d{3})[\s]?[\.]?[\/]?[\-]?(\d{4})[\s]?[x]?(\d*)$/
has a lot of issues, resulting in it matching all of the following, among others:
(0./ -000 ./-0000 x00000000000000000000000)
()./1234567890123456789012345678901234567890
\)\-555/1212 x
I think this REGEX is closer to what you're looking for:
/^(?:(?:(?:1[.\/\s-]?)(?!\())?(?:\((?=\d{3}\)))?((?(?!(37|96))[2-9][0-8][0-9](?<!(11)))?[2-9])(?:\((?<=\(\d{3}))?)?[.\/\s-]?([0-9]{2}(?<!(11)))[.\/\s-]?([0-9]{4}(?<!(555(01([0-9][0-9])|1212))))(?:[\s]*(?:(?:x|ext|extn|ex)[.:]*|extension[:]?)?[\s]*(\d+))?$/
or, exploded:
<?
$pattern =
'/^ # Matches from beginning of string
(?: # Country / Area Code Wrapper [not captured]
(?: # Country Code Wrapper [not captured]
(?: # Country Code Inner Wrapper [not captured]
1 # 1 - CC for United States and Canada
[.\/\s-]? # Character Class ('.', '/', '-' or whitespace) for allowed (optional, single) delimiter between Country Code and Area Code
) # End of Country Code
(?!\() # Lookahead, only allowed if not followed by an open parenthesis
)? # Country Code Optional
(?: # Opening Parenthesis Wrapper [not captured]
\( # Opening parenthesis
(?=\d{3}\)) # Lookahead, only allowed if followed by 3 digits and closing parenthesis [lookahead never captured]
)? # Parentheses Optional
((?(?!(37|96))[2-9][0-8][0-9](?<!(11)))?[2-9]) # 3-digit NANPA-valid Area Code [captured]
(?: # Closing Parenthesis Wrapper [not captured]
\( # Closing parenthesis
(?<=\(\d{3}) # Lookbehind, only allowed if preceded by 3 digits and opening parenthesis [lookbehind never captured]
)? # Parentheses Optional
)? # Country / Area Code Optional
[.\/\s-]? # Character Class ('.', '/', '-' or whitespace) for allowed (optional, single) delimiter between Area Code and Central-office Code
([0-9]{2}(?<!(11))) # 3-digit NANPA-valid Central-office Code [captured]
[.\/\s-]? # Character Class ('.', '/', '-' or whitespace) for allowed (optional, single) delimiter between Central-office Code and Subscriber number
([0-9]{4}(?<!(555(01([0-9][0-9])|1212)))) # 4-digit NANPA-valid Subscriber Number [captured]
(?: # Extension Wrapper [not captured]
[\s]* # Character Class for allowed delimiters (optional, multiple) between phone number and extension
(?: # Wrapper for extension description text [not captured]
(?:x|ext|extn|ex)[.:]* # Abbreviated extensions with character class for terminator (optional, multiple) [not captured]
| # OR
extension[:]? # The entire word extension with character class for optional terminator
)? # Marker for Extension optional
[\s]* # Character Class for allowed delimiters (optional, multiple) between extension description text and actual extension
(\d+) # Extension [captured if present], required for extension wrapper to match
)? # Entire extension optional
$ # Matches to end of string
/x'; // /x modifier allows the expanded and commented regex
?>
This modification provides several improvements.
It creates a configurable group of items that can match as the extension. You can add additional delimiters for the extension. This was the original request. The extension also allows for a colon after the extension delimiter.
It converts the sequence of 4 optional delimiters (dot, whitespace, slash or hyphen) into a character class that matches only a single one.
It groups items appropriately. In the given example, you can have the opening parentheses without an area code between them, and you can have the extension mark (space-x) without an extension. This alternate regular expression requires either a complete area code or none and either a complete extension or none.
The 4 components of the number (area code, central office code, phone number and extension) are the back-referenced elements that feed into $matches in preg_match().
Uses lookahead/lookbehind to require matched parentheses in the area code.
Allows for a 1- to be used before the number. (This assumes that all numbers are US or Canada numbers, which seems reasonable since the match is ultimately made against NANPA restrictions.) It also disallows mixing a country code prefix with an area code wrapped in parentheses.
It merges in the NANPA rules to eliminate non-assignable telephone numbers.
It eliminates area codes in the form 0xx, 1xx, 37x, 96x, x9x and x11, which are invalid NANPA area codes.
It eliminates central office codes in the form 0xx and 1xx (invalid NANPA central office codes).
It eliminates numbers with the form 555-01xx (non-assignable from NANPA).
It has a few minor limitations. They're probably unimportant, but are being noted here.
There is nothing in place to require that the same delimiter is used repeatedly, allowing for numbers like 800-555.1212, 800/555 1212, 800 555.1212 etc.
There is nothing in place to restrict the delimiter after an area code with parentheses, allowing for numbers like (800)-555-1212 or (800)/5551212.
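If the first limitation matters, one possible way to require the same delimiter throughout is a backreference. This is only a sketch on a deliberately reduced pattern (10 digits, no parentheses, no NANPA rules, no extension), not a change to the full expression above:

```php
<?php
// Group 2 captures the first delimiter (possibly empty); \2 demands the same one again.
$consistent = '/^(\d{3})([.\/\s-]?)(\d{3})\2(\d{4})$/';

$samples = ['800-555-1212', '800 555 1212', '8005551212', '800-555.1212', '800/555 1212'];
$accepted = [];
foreach ($samples as $sample) {
    $accepted[$sample] = preg_match($consistent, $sample) === 1;
}
print_r($accepted);
```

When no delimiter is used, group 2 matches the empty string, so \2 also matches the empty string and plain 10-digit input still passes.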
The NANPA rules are adapted from the following REGEX, found here: http://blogchuck.com/2010/01/php-regex-for-validating-phone-numbers/
/^(?:1)?(?(?!(37|96))[2-9][0-8][0-9](?<!(11)))?[2-9][0-9]{2}(?<!(11))[0-9]{4}(?<!(555(01([0-9][0-9])|1212)))$/
Why not convert any series of letters to "x"? That way, all the possibilities are normalized to "x".
OR
Check for 3 digits, 3 digits, 4 digits, then 1 or more digits, and disregard any other characters in between.
Regex:
([0-9]{3}).*?([0-9]{3}).*?([0-9]{4}).+?([0-9]{1,})
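A quick sketch of how this loose pattern behaves; the helper name looseParts and the sample numbers are hypothetical. Note that the trailing .+?([0-9]{1,}) makes the extension mandatory, so a number without an extension does not match as written.

```php
<?php
$loose = '/([0-9]{3}).*?([0-9]{3}).*?([0-9]{4}).+?([0-9]{1,})/';

// Hypothetical helper: returns the four digit groups, or null when there is no match.
function looseParts(string $pattern, string $phone): ?array
{
    return preg_match($pattern, $phone, $m) ? array_slice($m, 1) : null;
}

$withX   = looseParts($loose, '123-456-7890 x123');
$withExt = looseParts($loose, '(123) 456-7890 ext. 42');
$noExt   = looseParts($loose, '1234567890');
```

Because the pattern is unanchored and ignores whatever sits between the digit groups, it is better suited to extraction than to validation.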
Alternatively, you could use some pretty simple and straightforward JavaScript to force the user to enter in a much more specified format. The Masked Input Plugin ( http://digitalbush.com/projects/masked-input-plugin/ ) for jQuery allows you to mask an HTML input as a telephone number, only allowing the person to enter a number in the format xxx-xxx-xxxx. It doesn't solve your extension issues, but it does provide for a much cleaner user experience.
Well, you could modify the regex, but it won't be very nice -- should you allow "extn"? How about "extentn"? How about "and then you have to dial"?
I think the "right" way to do this is to add a separate, numerical, extension form box.
But if you really want the regex, I think I've fixed it up. Hint: you don't need [x] for a single character, x will do.
/^\(?(\d{0,3})\)?(\.|\/|\s|\-)?(\d{3})(\.|\/|\s|\-)?(\d{4})\s?(x|ext)?(\d*)$/
You allowed a dot, a slash, a dash, and a whitespace character. You should allow only one of these options. You'll need to update the references to $matches; with the delimiter groups added, the useful groups are now 1, 3, 5, and 7.
P.S. This is untested, since I don't have a reference implementation of PHP running. Apologies for mistakes, please let me know if you find any and I'll try to fix them.
Edit
This is summed up much better than I can here.
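A sketch of the simplified pattern with each delimiter alternation written as one balanced group, (\.|\/|\s|\-)?. With the parentheses balanced this way, the digit parts land in groups 1, 3, 5 and 7. The helper name splitPhone and its array keys are hypothetical.

```php
<?php
$pattern = '/^\(?(\d{0,3})\)?(\.|\/|\s|\-)?(\d{3})(\.|\/|\s|\-)?(\d{4})\s?(x|ext)?(\d*)$/';

// Hypothetical helper: picks the digit groups out of $m, or returns null on no match.
function splitPhone(string $pattern, string $phone): ?array
{
    if (!preg_match($pattern, $phone, $m)) {
        return null;
    }
    // Groups 2, 4 and 6 hold the delimiters and the x/ext marker, so they are skipped.
    return [
        'area'       => $m[1],
        'exchange'   => $m[3],
        'subscriber' => $m[5],
        'extension'  => $m[7],
    ];
}

$full  = splitPhone($pattern, '(123) 456-7890 x123');
$short = splitPhone($pattern, '456-7890');
```

This pattern still accepts unbalanced input such as a lone opening parenthesis, which is the weakness the lookahead-based answers address.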
