Check for url slash count pattern - php

I am trying to write a regular expression to match invalid URL patterns.
I want to match the following pattern:
/article/test-string/
The URL above is invalid, but the following are valid:
/article/abc/test-string/ and /article/xyz/abc/test-string/
I want to match only URLs that have a single segment after /article/.
Please help. I tried the following, but it matches all of them:
/article/(.*)/$

.* matches 0 or more of any character, including /, so /article/(.*)/$ will match every URI that contains /article/ and ends with a slash.
You can use this regex to allow only one non-slash component after /article/:
$re = '~^/article/[^/]*/$~';
[^/]* # matches 0 or more of any character that is not /
/$ # matches / in the end
~ is used as regex delimiter to avoid escaping /
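As a quick check, the pattern above can be run against the three URLs from the question. This is a minimal sketch; the variable names $urls and $invalid are illustrative and not part of the original answer.

```php
<?php
// Pattern from the answer: one non-slash segment between /article/ and the final /.
$re = '~^/article/[^/]*/$~';

$urls = [
    '/article/test-string/',
    '/article/abc/test-string/',
    '/article/xyz/abc/test-string/',
];

// Collect the URLs that the pattern flags as invalid.
$invalid = [];
foreach ($urls as $url) {
    if (preg_match($re, $url) === 1) {
        $invalid[] = $url;
    }
}

print_r($invalid);
```

Note that [^/]* also accepts an empty segment, so /article// matches as well; [^/]+ would require at least one character if that matters for your URLs.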

~^/article/(.*)+/(.*)/$~gm
^ assert position at start of a line
/article/ matches the characters /article/ literally (case sensitive)
1st Capturing group (.*)+
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
/ matches the character / literally
2nd Capturing group (.*)
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
/ matches the character / literally
$ assert position at end of a line
g modifier: global. All matches (don't return on first match). This flag exists on regex101 only; PHP has no g modifier, so use preg_match_all() instead.
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
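// Same pattern without the g flag; single quotes keep the $ literal.
$re = '~^/article/(.*)+/(.*)/$~m';
$str = "/article/xyz/abc/test-string/\n/article/test-string/";
// preg_match_all() collects every match; with the m modifier each line is tested separately.
preg_match_all($re, $str, $matches);
print_r($matches[0]);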
source https://regex101.com/

Related

Regular expression to find empty functions

I would like to use a regular expression that finds only functions that are empty in php files
For example
function name_not_important()
{
}
Regex can be function\s[^\(]+\([^)]*\)(\n)*{(\n)*}
From https://regex101.com/:
function matches the characters function literally (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
Match a single character not present in the list below [^\(]+
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\( matches the character ( literally (case sensitive)
\( matches the character ( literally (case sensitive)
Match a single character not present in the list below [^)]*
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
) matches the character ) literally (case sensitive)
\) matches the character ) literally (case sensitive)
1st Capturing Group (\n)*
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
\n matches a line-feed (newline) character (ASCII 10)
{ matches the character { literally (case sensitive)
2nd Capturing Group (\n)*
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
\n matches a line-feed (newline) character (ASCII 10)
} matches the character } literally (case sensitive)
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
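Note: This regex assumes that only newline characters (no spaces, tabs, or other indentation) appear between the closing parenthesis and the opening brace, and between the two braces.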
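The limitation in the note can be seen by running the pattern on a small sample. The sketch below is illustrative only: the sample source, the function names in it, and the more tolerant \s* variant are assumptions, not part of the original answer.

```php
<?php
// Made-up sample source with one strictly formatted empty function,
// one empty function with indentation, and one non-empty function.
$source = "function empty_one()\n{\n}\n\n"
        . "function indented() {\n    }\n\n"
        . "function not_empty()\n{\n    return 1;\n}\n";

// The pattern from the answer, wrapped in ~ delimiters.
$strict = '~function\s[^\(]+\([^)]*\)(\n)*{(\n)*}~';

// Assumed variant: \s* also tolerates spaces and tabs around the braces.
$tolerant = '~function\s+[^\(]+\([^)]*\)\s*\{\s*\}~';

preg_match_all($strict, $source, $strictMatches);
preg_match_all($tolerant, $source, $tolerantMatches);

echo count($strictMatches[0]), ' strict, ', count($tolerantMatches[0]), ' tolerant', PHP_EOL;
```

Neither pattern understands comments, type hints after the parameter list, or nested parentheses in default values, so treat the results as candidates to review rather than a definitive list.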

Match regular expression specific character quantities in any order

I need to match a series of strings that:
Contain at least 3 numbers
0 or more letters
0 or 1 - (not more)
0 or 1 \ (not more)
These characters can be in any position in the string.
The regular expression I have so far is:
([A-Z0-9]*[0-9]{3,}[\/]?[\-]?[0-9]*[A-Z]*)
This matches all of the following cases except the first one:
02ABU-D9435
013DFC
1123451
03323456782
ADS7124536768
03SDFA9433/0
03SDFA9433/
03SDFA9433/1
A41B03423523
O4AGFC4430
I think perhaps I am being too prescriptive about positioning. How can I update this regex to match all possibilities?
PHP PCRE
The following would not match:
01/01/2018 [multiple / or -]
AA-AA [no numbers]
Thanks
One option is to use lookaheads to assert at least 3 digits, fewer than 2 backslashes, and fewer than 2 hyphens.
(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)
About the pattern
(?<!\S) Assert what is on the left is not a non whitespace char
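(?=(?:[^\d\s]*\d){3}) Assert that what is on the right contains 3 digits, each optionally preceded by characters that are neither digits nor whitespace
(?!(?:[^\s-]*-){2}) Assert that what is on the right does not contain 2 hyphens, each optionally preceded by characters that are neither whitespace nor hyphens
(?!(?:[^\s\\]*\\){2}) Assert that what is on the right does not contain 2 backslashes, each optionally preceded by characters that are neither whitespace nor backslashes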
[A-Z0-9/\\-]+ Match any of the listed 1+ times
(?!\S) Assert what is on the right is not a non whitespace char
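A minimal sketch of how this pattern could be used from PHP, assuming the candidates arrive in one whitespace-separated string. The input text is made up, and the nowdoc syntax is only a convenience to avoid doubling the backslashes.

```php
<?php
// A nowdoc keeps every backslash exactly as written, so the pattern needs no extra PHP escaping.
$pattern = <<<'REGEX'
~(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)~
REGEX;

// Hypothetical input: candidate strings separated by whitespace.
$text = '02ABU-D9435 AA-AA 1-2-3 013DFC';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

Note that this pattern limits backslashes, as the list in the question states; forward slashes are accepted by the final character class but are not counted.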
Your patterns can be checked with positive/negative lookaheads anchored at the start of the string:
at least 3 digits -> find (not necessarily consecutive) 3 digits
no more than 1 '-' -> assert absence of (not necessarily consecutive) 2 '-' characters
no more than 1 '/' -> assert absence of (not necessarily consecutive) 2 '/' characters
0 or more letters -> no check needed.
If these conditions are met, any content is permitted.
The regex implementing this:
^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$
Remark
This solution assumes that each string tested resides on its own line, i.e. the strings are not merely separated by whitespace.
If the strings are separated by whitespace, choose the solution of user @TheFourthBird (which is essentially the same as this one but caters for the whitespace separation).
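A short sketch of the line-based variant in PHP, assuming the candidates sit one per line in a single string. The m modifier is added here so that ^ and $ apply to each line; the sample lines are taken from the question.

```php
<?php
// One candidate per line.
$lines = "02ABU-D9435\n013DFC\n01/01/2018\nAA-AA\n03SDFA9433/1";

// Pattern from the answer; the m modifier makes ^ and $ match at every line boundary.
$pattern = '~^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$~m';

preg_match_all($pattern, $lines, $matches);
print_r($matches[0]);
```

Because the final .* accepts any character, this pattern only enforces the counts; it does not restrict the strings to uppercase letters and digits.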
You can test the condition for both the hyphen and the slash in a single lookahead, using a capture group and a backreference:
~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~
In detail:
~ # using the tilde as the pattern delimiter avoids escaping the slashes in the pattern
\A # start of the string
(?! .* ([-/]) .* \1 ) # negative lookahead:
# check that there is no more than one hyphen and no more than one slash
(?: [A-Z/-]* \d ){3,} # at least 3 digits
[A-Z/-]* # any remaining characters up to the end of the string
\z # end of the string
~
If you are not familiar with lookaheads, note that these three subpatterns all start matching from the same position (here, the beginning of the string):
\A
(?! .* ([-/]) .* \1 )
(?: [A-Z/-]* \d ){3,}
This is possible only because the first two are zero-width assertions: they are simple tests that don't consume any characters.
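As an illustration, the backreference pattern can be applied to one string at a time. This is a sketch using a few of the sample values from the question; the variable names are illustrative.

```php
<?php
// Pattern from the answer above; \A and \z anchor it to the whole string.
$pattern = '~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~';

$candidates = ['02ABU-D9435', '03SDFA9433/', 'O4AGFC4430', '01/01/2018', 'AA-AA'];

// Keep only the strings that satisfy all conditions.
$valid = [];
foreach ($candidates as $candidate) {
    if (preg_match($pattern, $candidate) === 1) {
        $valid[] = $candidate;
    }
}

print_r($valid);
```

01/01/2018 is rejected by the lookahead (the slash repeats), and AA-AA fails because it contains no digits.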

Laravel validate url name and protocol

I need to validate a URL. I want to allow only the main URL of a site, for example:
http://example.com
https://example.com
I need to prevent these URLs on my site:
http://example.com/page/blahblahblah
https://example.com/other/bloa
I use regex:
'url' => ['required', 'url', 'regex:/((http:|https:)\/\/)[^\/]+/']
When a user enters http://example.com/page/blahblahblah, the validation passes. Why? My regex is not working.
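The validation passes because your regex is not anchored: it succeeds as soon as any part of the input, here the http://example.com prefix, matches. You can use the following anchored pattern to ensure a URL does not contain subdirectories: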
^(?:\S+:\/\/)?[^\/]+\/?$
Explanation:
^ asserts position at start of the string
Non-capturing group (?:\S+:\/\/)?
? Quantifier: matches between zero and one times, as many times as possible, giving back as needed (greedy)
\S+ matches any non-whitespace character (equal to [^\r\n\t\f\v ])
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
: matches the character : literally (case sensitive)
\/ matches the character / literally (case sensitive)
\/ matches the character / literally (case sensitive)
Match a single character not present in the list below [^\/]+
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\/ matches the character / literally (case sensitive)
\/? matches the character / literally (case sensitive)
? Quantifier: matches between zero and one times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
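The difference between the two patterns can be checked outside Laravel with plain preg_match(). This is a minimal sketch; the variable names are illustrative.

```php
<?php
// The unanchored pattern from the question and the anchored pattern from the answer.
$unanchored = '/((http:|https:)\/\/)[^\/]+/';
$anchored   = '/^(?:\S+:\/\/)?[^\/]+\/?$/';

$deep = 'http://example.com/page/blahblahblah';
$base = 'https://example.com';

// The unanchored pattern succeeds on the deep URL because a prefix of it matches.
$unanchoredOnDeep = preg_match($unanchored, $deep);
// The anchored pattern must cover the whole string, so the extra path makes it fail.
$anchoredOnDeep = preg_match($anchored, $deep);
$anchoredOnBase = preg_match($anchored, $base);

var_dump($unanchoredOnDeep, $anchoredOnDeep, $anchoredOnBase);
```

In the Laravel rule the same anchored pattern would go after the regex: prefix, keeping the array form of the rule list as in the question.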
You could write a custom validator that uses a combination of filter_var and parse_url.
Something like the following will do the job:
<?php
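$url = "http://example.com/page/blahblahblah";
// Reject anything that is not a syntactically valid URL.
if (!filter_var($url, FILTER_VALIDATE_URL)) {
    return false;
}
// Split the URL and rebuild only the scheme and host, dropping any path.
$parts = parse_url($url);
echo "{$parts['scheme']}://{$parts['host']}";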
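One way the filter_var and parse_url idea could be turned into a reusable check is sketched below. The helper name isBaseUrl is hypothetical, and the choices to allow a trailing slash and to accept only http and https are assumptions based on the examples in the question.

```php
<?php
// Hypothetical helper: the name and the exact rules are assumptions, not from the answer.
function isBaseUrl(string $url): bool
{
    // Reject anything that is not a syntactically valid URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $parts  = parse_url($url);
    $scheme = strtolower($parts['scheme'] ?? '');
    $path   = $parts['path'] ?? '';

    // Accept only http(s) URLs with no path (or a bare slash), query, or fragment.
    return in_array($scheme, ['http', 'https'], true)
        && ($path === '' || $path === '/')
        && !isset($parts['query'])
        && !isset($parts['fragment']);
}

var_dump(isBaseUrl('http://example.com'));
var_dump(isBaseUrl('https://example.com/other/bloa'));
```

Inside Laravel, a function like this could be called from a custom validation rule; how the rule is registered depends on the framework version, so that part is left out here.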

Regular expression to fix my long string that only partially repeats format

I have this string that I want to clean up using PHP and regex:
Name/__text,Password/__text,Profile/__text,Locale/__text,UserType/__text,Passwor
dUpdateDate/__text,Columns/0/Name/__text,Columns/0/Label/__text,Columns/0/Order/
__text,Columns/1/Name/__text,Columns/1/Label/__text,Columns/1/Order/__text,Colum
ns/2/Name/__text,Columns/2/Label/__text,Columns/2/Order/__text,Columns/3/Name/__
text,Columns/3/Label/__text,Columns/3/Order/__text,Columns/4/Name/__text,Columns
/4/Label/__text,Columns/4/Order/__text,Columns/5/Name/__text,Columns/5/Label/__t
ext,Columns/5/Order/__text,Columns/6/Name/__text,Columns/6/Label/__text,Columns/
6/Order/__text,Columns/7/Name/__text,Columns/7/Label/__text,Columns/7/Order/__te
xt,Columns/8/Name/__text,Columns/8/Label/__text,Columns/8/Order/__text,Columns/9
/Name/__text,Columns/9/Label/__text,Columns/9/Order/__text,Columns/10/Name/__tex
t,Columns/10/Label/__text,Columns/10/Order/__text,Columns/11/Name/__text,Columns
/11/Label/__text,Columns/11/Order/__text,Columns/12/Name/__text,Columns/12/Label
/__text,Columns/12/Order/__text,Columns/13/Name/__text,Columns/13/Label/__text,C
olumns/13/Order/__text,MailAddress/__text,Description/__text,Columns/14/Name/__t
ext,Columns/14/Label/__text,Columns/14/Order/__text,Columns/15/Name/__text,Colum
ns/15/Label/__text,Columns/15/Order/__text
I want it to be Password,Profile,Locale,UserType,PasswordUpdateDate,Name,Label,Order...
I want to remove the /text or /__text after each word, but only some of the words are preceded by something like Columns/0/ that also has to be removed.
I tried this (below) regular expression in the regex tester, but it misses the first few items that don't have the Columns/2/ type of thing before it. I can't use a regex that will grab what's before /__text, because the / before the word is optional, like for the first Name. Any ideas how to do this? It's tough to search for this pattern or info on how to create it. Any help would be great!
[A-Za-z\/0-9]+\/([A-Za-z]+)\/[__text]
It is probably easier to just match what you want and then join the matches with commas. Match a word (\w+) followed by /__text:
preg_match_all('#(\w+)/__text#', $string, $matches);
$result = implode(',', $matches[1]);
If the names can contain other characters (First_Name, First-Name, Firstname0, etc.), you could replace (\w+) with a character class such as ([A-Za-z0-9]+) and add whatever else you need to it.
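To see the match-and-join approach end to end, here is a small runnable sketch. The $string value is a shortened sample of the input from the question, not the full string.

```php
<?php
// Shortened sample of the string from the question.
$string = 'Name/__text,Password/__text,Columns/0/Name/__text,Columns/0/Label/__text';

// Capture the word that sits directly before each /__text.
preg_match_all('#(\w+)/__text#', $string, $matches);

// $matches[1] holds the captured words in order of appearance.
$result = implode(',', $matches[1]);

echo $result, PHP_EOL;
```

Because only the word immediately before /__text is captured, any Columns/N/ prefix is skipped without needing a separate rule for it.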
Regex:
(\w+)\/__text(?:(,)(?:Columns\/\d+\/)*)*
Explanation:
/(\w+)\/__text(?:(,)(?:Columns\/\d+\/)*)*/g
1st Capturing Group (\w+)
\w+ matches any word character (equal to [a-zA-Z0-9_])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\/ matches the character / literally (case sensitive)
__text matches the characters __text literally (case sensitive)
Non-capturing group (?:(,)(?:Columns\/\d+\/)*)*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (,)
, matches the character , literally (case sensitive)
Non-capturing group (?:Columns\/\d+\/)*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Columns matches the characters Columns literally (case sensitive)
\/ matches the character / literally (case sensitive)
\d+ matches a digit (equal to [0-9])
\/ matches the character / literally (case sensitive)
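The answer does not show how the pattern is applied. One plausible reading, given that it captures the word in group 1 and the comma in group 2, is a replacement with $1$2. The sketch below works under that assumption and uses a shortened sample of the input.

```php
<?php
// Shortened sample of the string from the question.
$string = 'Name/__text,Password/__text,Columns/0/Name/__text,Columns/0/Label/__text';

// Pattern from the answer; PHP has no g flag, preg_replace() already replaces every match.
$pattern = '/(\w+)\/__text(?:(,)(?:Columns\/\d+\/)*)*/';

// Assumption: keep the word (group 1) and the comma (group 2), drop everything else.
$result = preg_replace($pattern, '$1$2', $string);

echo $result, PHP_EOL;
```

A Columns/N/ prefix at the very start of the string would not be removed by this pattern, since the prefix is only consumed when it follows a comma.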

Need help understanding preg_match regular expression

A regular expression in preg_match is given as /server\-([^\-\.\d]+)(\d+)/. Can someone help me understand what this means? I see that the string starts with server-, but I don't get ([^\-\.\d]+)(\d+).
[ ] -> Match anything inside the square brackets for ONE character position once and only once, for example, [12] means match the target to 1 and if that does not match then match the target to 2 while [0123456789] means match to any character in the range 0 to 9.
- -> The - (dash) inside square brackets is the 'range separator' and allows us to define a range, in our example above of [0123456789] we could rewrite it as [0-9].
You can define more than one range inside a list, for example, [0-9A-C] means check for 0 to 9 and A to C (but not a to c).
NOTE: To test for - inside brackets (as a literal) it must come first or last, that is, [-0-9] will test for - and 0 to 9.
^ -> The ^ (circumflex or caret) inside square brackets negates the expression (we will see an alternate use for the circumflex/caret outside square brackets later), for example, [^Ff] means anything except upper or lower case F and [^a-z] means everything except lower case a to z.
You can find more explanations in the source of this information: http://www.zytrax.com/tech/web/regex.htm
If you want to test a regex, you can try this tool: http://gskinner.com/RegExr/
Here's the explanation:
# server\-([^\-\.\d]+)(\d+)
#
# Match the characters “server” literally «server»
# Match the character “-” literally «\-»
# Match the regular expression below and capture its match into backreference number 1 «([^\-\.\d]+)»
# Match a single character NOT present in the list below «[^\-\.\d]+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# A - character «\-»
# A . character «\.»
# A single digit 0..9 «\d»
# Match the regular expression below and capture its match into backreference number 2 «(\d+)»
# Match a single digit 0..9 «\d+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
You can use programs such as RegexBuddy if you intend to work with regexes and are willing to spend some funds.
You can also use this free web based explanation utility.
^ at the start of a bracket expression means "not one of the following characters"
\- and \. are the literal - and . characters
\d is a digit
[^\-\.\d]+ means one or more characters that are not listed inside the brackets, so one or more of anything that is not a -, a . or a digit
(\d+) captures one or more digits
Here is the explanation given by the Perl module YAPE::Regex::Explain
The regular expression:
(?-imsx:server\-([^\-\.\d]+)(\d+))
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
server                   'server'
----------------------------------------------------------------------
\-                       '-'
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
[^\-\.\d]+               any character except: '\-', '\.', digits
                         (0-9) (1 or more times (matching the
                         most amount possible))
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
(                        group and capture to \2:
----------------------------------------------------------------------
\d+                      digits (0-9) (1 or more times (matching
                         the most amount possible))
----------------------------------------------------------------------
)                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
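To make the two capture groups concrete, the pattern can be tried on a sample value. The host name below is hypothetical and chosen only to exercise the pattern.

```php
<?php
// Hypothetical host name used only to exercise the pattern.
$host = 'server-web12.example.com';

// Group 1: one or more characters that are not -, . or a digit.
// Group 2: the digits that follow immediately after.
$found = preg_match('/server\-([^\-\.\d]+)(\d+)/', $host, $m);

if ($found === 1) {
    echo "name: {$m[1]}, number: {$m[2]}", PHP_EOL;
}
```

Here the first group stops at the first digit, and the second group stops at the dot, since a dot is not a digit.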
