variable length masking with preg_replace


I am masking all characters between single quotes (inclusive) within a string using preg_replace_callback(). I would prefer to use only preg_replace() if possible, but I haven't been able to figure out how. Any help would be appreciated.
This is what I have using preg_replace_callback() which produces the correct output:
function maskCallback( $matches ) {
return str_repeat( '-', strlen( $matches[0] ) );
}
function maskString( $str ) {
return preg_replace_callback( "('.*?')", 'maskCallback', $str );
}
$str = "TEST 'replace''me' ok 'me too'";
echo $str,"\n";
echo maskString( $str ),"\n";
Output is:
TEST 'replace''me' ok 'me too'
TEST ------------- ok --------
I have tried using:
preg_replace( "/('.*?')/", '-', $str );
but the dashes get consumed, e.g.:
TEST -- ok -
Everything else I have tried doesn't work either. (I'm obviously not a regex expert.) Is this possible to do? If so, how?

Yes, you can do it (assuming that the quotes are balanced). Example:
$str = "TEST 'replace''me' ok 'me too'";
$pattern = "~[^'](?=[^']*(?:'[^']*'[^']*)*+'[^']*\z)|'~";
$result = preg_replace($pattern, '-', $str);
The idea is: you can replace a character if it is a quote or if it is followed by an odd number of quotes.
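As a minimal sketch, the lookahead pattern above can be run against the sample string from the question. The variable name `$masked` is only illustrative, and the balanced-quotes assumption still applies.

```php
<?php
// Mask every quote, plus every non-quote character that still has an odd
// number of quotes between it and the end of the string (that is, a
// character sitting inside a quoted run). Assumes balanced quotes.
$str     = "TEST 'replace''me' ok 'me too'";
$pattern = "~[^'](?=[^']*(?:'[^']*'[^']*)*+'[^']*\\z)|'~";
$masked  = preg_replace($pattern, '-', $str);

echo $str, "\n";
echo $masked, "\n"; // TEST ------------- ok --------
```

Each match is exactly one character long, which is why a fixed one-character replacement preserves the original length.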
To mask only the characters between the quotes and leave the quotes themselves intact:
$pattern = "~(?:(?!\A)\G|(?:(?!\G)|\A)'\K)[^']~";
$result = preg_replace($pattern, '-', $str);
The pattern matches a character only when it is contiguous to the previous match (in other words, when it comes immediately after the last match) or when it is preceded by a quote that is not contiguous to the previous match.
\G is the position after the last match (at the beginning, it is the start of the string).
pattern details:
~ # pattern delimiter
(?: # non capturing group: describe the two possibilities
# before the target character
(?!\A)\G # at the position in the string after the last match
# the negative lookahead ensures that this is not the start
# of the string
| # OR
(?: # (to ensure that the quote is not a closing quote)
(?!\G) # not contiguous to a previous match
| # OR
\A # at the start of the string
)
' # the opening quote
\K # remove all previous characters from the match result
# (only one quote here)
)
[^'] # a character that is not a quote
~
Note that since the closing quote is not matched by the pattern, the non-quote characters that follow it can't be matched either, because there is no previous match for them to be contiguous to.
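A short sketch of the \G-based pattern in use, again on the question's sample string. The variable name `$inner` is an assumption made for the example.

```php
<?php
// Mask only the characters between the quotes; the quotes themselves stay.
$str     = "TEST 'replace''me' ok 'me too'";
$pattern = '~(?:(?!\A)\G|(?:(?!\G)|\A)\'\K)[^\']~';
$inner   = preg_replace($pattern, '-', $str);

echo $inner, "\n"; // TEST '-------''--' ok '------'
```

The pattern is written in a single-quoted PHP string here, so only the quote characters need escaping; \A, \G and \K reach the regex engine unchanged.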
EDIT:
The (*SKIP)(*FAIL) way:
Instead of testing whether a single quote is not a closing quote with (?:(?!\G)|\A)' as in the previous pattern, you can break the match contiguity on closing quotes using the backtracking control verbs (*SKIP) and (*FAIL) (which can be shortened to (*F)).
$pattern = "~(?:(?!\A)\G|')(?:'(*SKIP)(*F)|\K[^'])~";
$result = preg_replace($pattern, '-', $str);
Since the pattern fails on each closing quote, the following characters will not be matched until the next opening quote.
The pattern may be more efficient written like this:
$pattern = "~(?:\G(?!\A)(?:'(*SKIP)(*F))?|'\K)[^']~";
(You can also use (*PRUNE) in place of (*SKIP).)
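The two (*SKIP)(*F) variants can be compared side by side with a small sketch like the following. The variable names are illustrative only; both patterns are expected to leave the quotes in place and mask what lies between them.

```php
<?php
$str = "TEST 'replace''me' ok 'me too'";

// First form: fail explicitly when the next character is a closing quote.
$skipFail = '~(?:(?!\A)\G|\')(?:\'(*SKIP)(*F)|\K[^\'])~';
// Second form: the same idea with the verbs folded into the \G branch.
$compact  = '~(?:\G(?!\A)(?:\'(*SKIP)(*F))?|\'\K)[^\']~';

$a = preg_replace($skipFail, '-', $str);
$b = preg_replace($compact, '-', $str);

echo $a, "\n"; // TEST '-------''--' ok '------'
echo $b, "\n"; // TEST '-------''--' ok '------'
```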

Short answer : It's possible !!!
Use the following pattern
' # Match a single quote
(?= # Positive lookahead, this basically makes sure there is an odd number of single quotes ahead in this line
(?:(?:[^'\r\n]*'){2})* # Match anything except single quote or newlines zero or more times followed by a single quote, repeat this twice and repeat this whole process zero or more times (basically a pair of single quotes)
(?:[^'\r\n]*'[^'\r\n]*(?:\r?\n|$)) # You guessed, this is to match a single quote until the end of line
)
| # or
\G(?<!^) # Preceding contiguous match (not beginning of line)
[^'] # Match anything that's not a single quote
(?= # Same as above
(?:(?:[^'\r\n]*'){2})* # Same as above
(?:[^'\r\n]*'[^'\r\n]*(?:\r?\n|$)) # Same as above
)
|
\G(?<!^) # Preceding contiguous match (not beginning of line)
' # Match a single quote
Make sure to use the m modifier.
Online demo.
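As a hedged sketch, the pattern above could be passed to preg_replace() like this. The nowdoc layout, the rule labels in the comments, and the variable names are choices made for this example, not part of the original answer; the x modifier is added so the three alternatives can sit on separate lines.

```php
<?php
// x: ignore whitespace and allow # comments inside the pattern.
// m: let ^ and $ match at line boundaries, as the answer requires.
$pattern = <<<'REGEX'
~
  ' (?= (?:(?:[^'\r\n]*'){2})* (?:[^'\r\n]*'[^'\r\n]*(?:\r?\n|$)) )             # rule 1: opening quote
| \G(?<!^) [^'] (?= (?:(?:[^'\r\n]*'){2})* (?:[^'\r\n]*'[^'\r\n]*(?:\r?\n|$)) ) # rule 2: quoted character
| \G(?<!^) '                                                                    # rule 3: closing quote
~xm
REGEX;

$str    = "TEST 'replace''me' ok 'me too'";
$masked = preg_replace($pattern, '-', $str);

echo $masked, "\n"; // TEST ------------- ok --------
```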
Long answer : It's a pain :)
If not only you but your whole team loves regex, you might consider using this pattern, but remember that it is insane and quite difficult for beginners to grasp. Readability (almost) always comes first.
I'll break down how I arrived at this regex:
1) We first need to know what we actually want to replace, we want to replace every character (including the single quotes) that's between two single quotes with a hyphen.
2) If we're going to use preg_replace() that means our pattern needs to match one single character each time.
3) So the first step would be obvious : '.
4) We'll use \G which means match beginning of string or the contiguous character that we matched earlier. Take this simple example ~a|\Gb~. This will match a or b if it's at the beginning or b if the previous match was a. See this demo.
5) We don't want a match at the beginning of the string, so we'll use \G(?<!^).
6) Now we need to match anything that's not a single quote ~'|\G(?<!^)[^']~.
7) Now begins the real pain, how do we know that the above pattern wouldn't go match c in 'ab'c ? Well it will, we need to count the single quotes...
Let's recap:
a 'bcd' efg 'hij'
^ It will match this first
^^^ Then it will match these individually with \G(?<!^)[^']
^ It will match since we're matching single quotes without checking anything
^^^^^ And it will continue to match ...
What we want could be done in those 3 rules:
a 'bcd' efg 'hij'
1 ^ Match a single quote only if there is an odd number of single quotes ahead
2 ^^^ Match individually those characters only if there is an odd number of single quotes ahead
3 ^ Match a single quote only if there was a match before this character
8) Checking if there is an odd number of single quotes could be done if we knew how to match an even number :
(?: # non-capturing group
(?: # non-capturing group
[^'\r\n]* # Match anything that's not a single quote or newline, zero or more times
' # Match a single quote
){2} # Repeat 2 times (We'll be matching 2 single quotes)
)* # Repeat all this zero or more times. So we match 0, 2, 4, 6 ... single quotes
9) An odd number would be easy now, we just need to add :
(?:
[^'\r\n]* # Match anything that's not a single quote or newline, zero or more times
' # Match a single quote
[^'\r\n]* # Match anything that's not a single quote or newline, zero or more times
(?:\r?\n|$) # End of line
)
10) Merging above in a single lookahead:
(?=
(?: # non-capturing group
(?: # non-capturing group
[^'\r\n]* # Match anything that's not a single quote or newline, zero or more times
' # Match a single quote
){2} # Repeat 2 times (We'll be matching 2 single quotes)
)* # Repeat all this zero or more times. So we match 0, 2, 4, 6 ... single quotes
(?:
[^'\r\n]* # Match anything that's not a single quote or newline, zero or more times
' # Match a single quote
[^'\r\n]* # Match anything that's not a single quote or newline, zero or more times
(?:\r?\n|$) # End of line
)
)
11) Now we need to merge all 3 rules we defined earlier:
~ # A delimiter
#################################### Rule 1 ####################################
' # A single quote
(?= # Lookahead to make sure there is an odd number of single quotes ahead
(?: # non-capturing group
(?: # non-capturing group
[^'\r\n]* # Match anything that's not a single quote or newline, zero or more times
' # Match a single quote
){2} # Repeat 2 times (We'll be matching 2 single quotes)
)* # Repeat all this zero or more times. So we match 0, 2, 4, 6 ... single quotes
(?:
[^'\r\n]* # Match anything that's not a single quote or newline, zero or more times
' # Match a single quote
[^'\r\n]* # Match anything that's not a single quote or newline, zero or more times
(?:\r?\n|$) # End of line
)
)
| # Or
#################################### Rule 2 ####################################
\G(?<!^) # Preceding contiguous match (not beginning of line)
[^'] # Match anything that's not a single quote
(?= # Lookahead to make sure there is an odd number of single quotes ahead
(?: # non-capturing group
(?: # non-capturing group
[^'\r\n]* # Match anything that's not a single quote or newline, zero or more times
' # Match a single quote
){2} # Repeat 2 times (We'll be matching 2 single quotes)
)* # Repeat all this zero or more times. So we match 0, 2, 4, 6 ... single quotes
(?:
[^'\r\n]* # Match anything that's not a single quote or newline, zero or more times
' # Match a single quote
[^'\r\n]* # Match anything that's not a single quote or newline, zero or more times
(?:\r?\n|$) # End of line
)
)
| # Or
#################################### Rule 3 ####################################
\G(?<!^) # Preceding contiguous match (not beginning of line)
' # Match a single quote
~x
Online regex demo.
Online PHP demo

Well, just for the fun of it and I seriously wouldn't recommend something like that because I try to avoid lookarounds when they are not necessary, here's one regex that uses the concept of 'back to the future':
(?<=^|\s)'(?!\s)|(?!^)(?<!'(?=\s))\G.
regex101 demo
Okay, it's broken down into two parts:
1. Matching the beginning single quote
(?<=^|\s)'(?!\s)
The rules that I believe should be established here are:
There should be either ^ or \s before the beginning quote (hence (?<=^|\s)).
There is no \s after the beginning quote (hence (?!\s)).
2. Matching the things inside the quote, and the ending quote
(?!^)\G(?<!'(?=\s)).
The rules that I believe should be established here are:
The character can be any character (hence .)
The match is 1 character long and following the immediate previous match (hence (?!^)\G).
There should be no single quote, that is itself followed by a space, before it (hence (?<!'(?=\s)) and this is the 'back to the future' part). This effectively will not match a \s that is preceded by a ' and will mark the end of the characters wrapped between single quotes. In other words, the closing quote will be identified as a single quote followed by \s.
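A small sketch of this pattern applied with preg_replace(). It relies on the assumptions stated above: an opening quote follows whitespace or the start of the string, and a closing quote is followed by whitespace or the end of the string. The variable names are illustrative.

```php
<?php
$str     = "TEST 'replace''me' ok 'me too'";
$pattern = '~(?<=^|\s)\'(?!\s)|(?!^)(?<!\'(?=\s))\G.~';
$masked  = preg_replace($pattern, '-', $str);

echo $masked, "\n"; // TEST ------------- ok --------
```

Because the closing quote is recognised only by the whitespace after it, input such as an apostrophe inside a word would need different rules.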

Related

Match regular expression specific character quantities in any order

I need to match a series of strings that:
Contain at least 3 numbers
0 or more letters
0 or 1 - (not more)
0 or 1 \ (not more)
These characters can be in any position in the string.
The regular expression I have so far is:
([A-Z0-9]*[0-9]{3,}[\/]?[\-]?[0-9]*[A-Z]*)
This matches the following data in the following cases. The only one that does not match is the first one:
02ABU-D9435
013DFC
1123451
03323456782
ADS7124536768
03SDFA9433/0
03SDFA9433/
03SDFA9433/1
A41B03423523
O4AGFC4430
I think perhaps I am being too prescriptive about positioning. How can I update this regex to match all possibilities?
PHP PCRE
The following would not match:
01/01/2018 [multiple / or -]
AA-AA [no numbers]
Thanks

One option could be using lookaheads to assert 3 digits, not 2 backslashes and not 2 times a hyphen.
(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)
About the pattern
(?<!\S) Assert what is on the left is not a non whitespace char
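(?=(?:[^\d\s]*\d){3}) Assert that what is on the right is, 3 times, zero or more chars that are neither digits nor whitespace, followed by a digit
(?!(?:[^\s-]*-){2}) Assert that what is on the right is not, 2 times, zero or more chars other than whitespace or a hyphen, followed by a hyphen
(?!(?:[^\s\\]*\\){2}) Assert that what is on the right is not, 2 times, zero or more chars other than whitespace or a backslash, followed by a backslash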
[A-Z0-9/\\-]+ Match any of the listed 1+ times
(?!\S) Assert what is on the right is not a non whitespace char
Regex demo

Your patterns can be checked with positive/negative lookaheads anchored at the start of the string:
at least 3 digits -> find (not necessarily consecutive) 3 digits
no more than 1 '-' -> assert absence of (not necessarily consecutive) 2 '-' characters
no more than 1 '/' -> assert absence of (not necessarily consecutive) 2 '/' characters
0 or more letters -> no check needed.
If these conditions are met, any content is permitted.
The regex implementing this:
^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$
Check out this Regex101 demo.
Remark
This solution assumes that each string tested resides on its own line, i.e. the strings are not merely separated by whitespace.
If the strings are separated by whitespace, choose the solution of user @TheFourthBird (which is essentially the same as this one but caters for the whitespace separation).
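A minimal sketch that runs the anchored-lookahead regex from this answer over a few of the question's samples. The `~` delimiter (so the slash needs no escaping), the `$samples` list, and the `$results` array are assumptions made for the example.

```php
<?php
$pattern = '~^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*/){2}).*$~';
$samples = ['02ABU-D9435', '013DFC', '03SDFA9433/0', '01/01/2018', 'AA-AA'];

$results = [];
foreach ($samples as $s) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$s] = preg_match($pattern, $s) === 1;
    printf("%-14s %s\n", $s, $results[$s] ? 'match' : 'no match');
}
```

The first three samples should match; `01/01/2018` is rejected for having two slashes and `AA-AA` for having no digits.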

You can test the condition for both the hyphen and the slash in a single lookahead using a capture group and a backreference:
~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~
demo
Details:
~ # using the tilde as the pattern delimiter avoids having to escape the slashes in the pattern
\A # start of the string
(?! .* ([-/]) .* \1 ) # negative lookahead:
# check that there's at most one hyphen and at most one slash
(?: [A-Z/-]* \d ){3,} # at least 3 digits
[A-Z/-]* # any remaining characters until the end of the string
\z # end of the string.
~
If you are not familiar with this technique: these three subpatterns all start from the same position (in this case, the beginning of the string):
\A
(?! .* ([-/]) .* \1 )
(?: [A-Z/-]* \d ){3,}
This is possible only because the first two are zero-width assertions: simple tests that don't consume any characters.
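The backreference version can be exercised the same way. This is only a sketch: the helper name `isValidCode` and the extra samples `1-2/3` and `1-2-3` are made up for illustration.

```php
<?php
// Hypothetical helper wrapping the pattern from this answer.
function isValidCode(string $s): bool
{
    // ([-/]) captures a hyphen or slash; \1 then looks for the SAME
    // character again, so one hyphen plus one slash is still accepted.
    return preg_match('~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~', $s) === 1;
}

var_dump(isValidCode('02ABU-D9435')); // bool(true)
var_dump(isValidCode('1-2/3'));       // bool(true)
var_dump(isValidCode('1-2-3'));       // bool(false)
var_dump(isValidCode('01/01/2018'));  // bool(false)
var_dump(isValidCode('AA-AA'));       // bool(false)
```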

Regex conditional match for escaped apostrophe

$str = "'ei-1395529080',0,0,1,1,'Name','email#domain.com','Sentence with \'escaped apostrophes\', which \'should\' be on one line!','no','','','yes','6.50',NULL";
preg_match_all("/(')?(.*?)(?(1)(?!\\\\)'),/s", $str.',', $values);
print_r($values);
I'm trying to write a regex with these goals:
Return an array of , separated values (note I append to $str on line 2)
If the array item starts with an ', match the closing '
But, if it is escaped like \', keep capturing the value until a ' with no preceding \ is found
If you try out those lines, it misbehaves when it encounters \',
Can anyone please explain what is happening and how to fix it? Thanks.

This is how I would go about solving this:
('(?>\\.|.)*?'|[^\,]+)
Regex101
Explanation:
( Start capture group
' Match an apostrophe
(?> Atomically match the following
\\. Match \ literally and then any single character
|. Or match just any single character
) Close atomic group
*?' Match previous group 0 or more times until the first '
|[^\,] OR match any character that is not a comma (,)
+ Match the previous regex [^\,] one or more times
) Close capture group
A note about how the atomic group works:
Say I had this string 'a \' b'
The atomic group (?>\\.|.) will match this string in the following way at each step:
'
a
\'
b
'
If the match ever fails in the future, it will not attempt to match \' as \, ' but will always match/use the first option if it fits.
If you need help escaping the regex, here's the escaped version: ('(?>\\\\.|.)*?'|[^\\,]+)
although i spent about 10 hours writing regex yesterday, i'm not too experienced with it. i've researched escaping backslashes but was confused by what i read. what's your reason for not escaping in your original answer? does it depend on different languages/platforms? ~OP
Section on why you have to escape regex in programming languages.
When you write the following string:
"This is on one line.\nThis is on another line."
Your program will interpret the \n as an escape sequence (a line break) and see it the following way:
"This is on one line.
This is on another line."
In a regular expression, this can cause a problem. Say you wanted to match all characters that were not line breaks. This is how you would do that:
"[^\n]*"
However, the \n is interpreted as a line break when written in a programming language's string literal and will be seen the following way:
"[^
]*"
Which, I'm sure you can tell, is wrong. So to fix this, we escape strings. By placing a backslash in front of the first backslash, we can tell the programming language to look at \n differently (or any other escape sequence: \r, \t, \\, etc.). On a basic level, escaping trades the original escape sequence \n for another escape sequence followed by a plain character: \\, then n. This is how escaping affects the regex above.
"[^\\n]*"
The way the programming language will see this is the following:
"[^\n]*"
This is because \\ is an escape sequence that means "When you see \\ interpret it literally as \". Because \\ has already been consumed and interpreted, the next character to read is n and therefore is no longer part of the escape sequence.
So why do I have 4 backslashes in my escaped version? Let's take a look:
(?>\\.|.)
So this is the original regex we wrote. We have two consecutive backslashes. This section (\\.) of the regular expression means "Whenever you see a backslash and then any character, match". To preserve this interpretation for the regex engine, we have to escape each, individual backslash.
\\ \\ .
So all together it looks like this:
(?>\\\\.|.)
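To see the escaped pattern at work in PHP, here is a hedged sketch. The sample data is a shortened variant of the question's string, stored in a nowdoc so its backslashes stay exactly as typed; the outer capture group is dropped because only the whole match is used.

```php
<?php
$str = <<<'CSV'
'ei-1395529080',0,'Sentence with \'escaped apostrophes\', which \'should\' stay together!','no','',NULL
CSV;

// In a single-quoted PHP string, four backslashes reach the regex engine
// as two, which the engine reads as one literal backslash.
$pattern = '/\'(?>\\\\.|.)*?\'|[^,]+/';
preg_match_all($pattern, $str, $matches);

print_r($matches[0]); // six values; the long sentence stays in one piece
```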

Something like this:
(?:'([^'\\]*(?:\\.[^'\\]*)*)'|([^,]+))
# (?:'([^'\\]*(?:\\.[^'\\]*)*)'|([^,]+))
#
# Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Greedy quantifiers
#
# Match the regular expression below «(?:'([^'\\]*(?:\\.[^'\\]*)*)'|([^,]+))»
# Match this alternative (attempting the next alternative only if this one fails) «'([^'\\]*(?:\\.[^'\\]*)*)'»
# Match the character “'” literally «'»
# Match the regex below and capture its match into backreference number 1 «([^'\\]*(?:\\.[^'\\]*)*)»
# Match any single character NOT present in the list below «[^'\\]*»
# Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# The literal character “'” «'»
# The backslash character «\\»
# Match the regular expression below «(?:\\.[^'\\]*)*»
# Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the backslash character «\\»
# Match any single character that is NOT a line break character (line feed) «.»
# Match any single character NOT present in the list below «[^'\\]*»
# Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# The literal character “'” «'»
# The backslash character «\\»
# Match the character “'” literally «'»
# Or match this alternative (the entire group fails if this one fails to match) «([^,]+)»
# Match the regex below and capture its match into backreference number 2 «([^,]+)»
# Match any character that is NOT a “,” «[^,]+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
https://regex101.com/r/pO0cQ0/1
preg_match_all('/(?:\'([^\'\\\\]*(?:\\\\.[^\'\\\\]*)*)\'|([^,]+))/', $subject, $result, PREG_SET_ORDER);
for ($matchi = 0; $matchi < count($result); $matchi++) {
// @todo here use $result[$matchi][1] to match quoted strings (to then process escaped quotes)
// @todo here use $result[$matchi][2] to match unquoted strings
}

explanation of preg_replace function in PHP [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 7 years ago.
The preg_replace() function has so many possible values, like:
<?php
$patterns = array('/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/', '/^\s*{(\w+)}\s*=/');
$replace = array('\3/\4/\1\2', '$\1 =');
echo preg_replace($patterns, $replace, '{startDate} = 1999-5-27');
What does:
\3/\4/\1\2
And:
/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/','/^\s*{(\w+)}\s*=/
mean?
Is there any information available to help understand the meanings at one place? Any help or documents would be appreciated! Thanks in Advance.

Take a look at http://www.tutorialspoint.com/php/php_regular_expression.htm
\3 is the captured group 3
\4 is the captured group 4
...and so on...
\w means any word character.
\d means any digit.
\s means any white space.
+ means match the preceding pattern one or more times.
* means match the preceding pattern zero or more times.
{n,m} means match the preceding pattern at least n and at most m times.
{n} means match the preceding pattern exactly n times.
{n,} means match the preceding pattern at least n times.
(...) is a captured group.

So, the first thing to point out, is that we have an array of patterns ($patterns), and an array of replacements ($replace). Let's take each pattern and replacement and break it down:
Pattern:
/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/
Replacement:
\3/\4/\1\2
This takes a date and converts it from a YYYY-M-D format to a M/D/YYYY format. Let's break down its components:
/ ... / # The starting and trailing slash mark the beginning and end of the expression.
(19|20) # Matches either 19 or 20, capturing the result as \1.
# \1 will be 19 or 20.
(\d{2}) # Matches any two digits (must be two digits), capturing the result as \2.
# \2 will be the two digits captured here.
- # Literal "-" character, not captured.
(\d{1,2}) # Either 1 or 2 digits, capturing the result as \3.
# \3 will be the one or two digits captured here.
- # Literal "-" character, not captured.
(\d{1,2}) # Either 1 or 2 digits, capturing the result as \4.
# \4 will be the one or two digits captured here.
This match is replaced by \3/\4/\1\2, which means:
\3 # The one or two digits captured in the 3rd set of `()`s, representing the month.
/ # A literal '/'.
\4 # The one or two digits captured in the 4th set of `()`s, representing the day.
/ # A literal '/'.
\1 # Either '19' or '20'; the first two digits captured (first `()`s).
\2 # The two digits captured in the 2nd set of `()`s, representing the last two digits of the year.
Pattern:
/^\s*{(\w+)}\s*=/
Replacement:
$\1 =
This takes a variable name encoded as {variable} and converts it to $variable = <date>. Let's break it down:
/ ... / # The starting and trailing slash mark the beginning and end of the expression.
^ # Matches the beginning of the string, anchoring the match.
# If the following character isn't matched exactly at the beginning of the string, the expression won't match.
\s* # Any whitespace character. This can include spaces, tabs, etc.
# The '*' means "zero or more occurrences".
# So, the whitespace is optional, but there can be any amount of it at the beginning of the line.
{ # A literal '{' character.
(\w+) # Any 'word' character (a-z, A-Z, 0-9, _). This is captured in \1.
# \1 will be the text contained between the { and }, and is the only thing "captured" in this expression.
} # A literal '}' character.
\s* # Any whitespace character. This can include spaces, tabs, etc.
= # A literal '=' character.
This match is replaced by $\1 =, which means:
$ # A literal '$' character.
\1 # The text captured in the 1st and only set of `()`s, representing the variable name.
# A literal space.
= # A literal '=' character.
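The two replacements can also be applied one at a time to watch each step, as in this sketch. Splitting the array call into two separate calls, and the names `$step1` and `$step2`, are choices made for the example; passing both arrays at once, as in the question, applies the pairs in the same order.

```php
<?php
$subject = '{startDate} = 1999-5-27';

// Step 1: reorder the date; \3 = month, \4 = day, \1\2 = four-digit year.
$step1 = preg_replace('/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/', '\3/\4/\1\2', $subject);

// Step 2: turn the {name} prefix into $name.
$step2 = preg_replace('/^\s*{(\w+)}\s*=/', '$\1 =', $step1);

echo $step1, "\n"; // {startDate} = 5/27/1999
echo $step2, "\n"; // $startDate = 5/27/1999
```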
Lastly, I wanted to show you a couple of resources. The regex-format you're using is called "PCRE", or Perl-Compatible Regular Expressions. Here is a quick cheat-sheet on PCRE for PHP. Over the last few years, several tools have been popping up to help you visualize, explain, and test regular expressions. One is Regex 101 (just Google "regex tester" or "regex visualizer"). If you look here, this is an explanation of the first RegEx, and here is an explanation of the second. There are others as well, like Debuggex, Regex Tester, etc. But I find the detailed match breakdown on Regex 101 to be pretty useful.

Need help understanding preg_match regular expression

A regular expression in preg_match is given as /server\-([^\-\.\d]+)(\d+)/. Can someone help me understand what this means? I see that the string starts with server- but I don't get ([^\-\.\d]+)(\d+).

[ ] -> Match anything inside the square brackets for ONE character position once and only once, for example, [12] means match the target to 1 and if that does not match then match the target to 2 while [0123456789] means match to any character in the range 0 to 9.
- -> The - (dash) inside square brackets is the 'range separator' and allows us to define a range, in our example above of [0123456789] we could rewrite it as [0-9].
You can define more than one range inside a list, for example, [0-9A-C] means check for 0 to 9 and A to C (but not a to c).
NOTE: To test for - inside brackets (as a literal) it must come first or last, that is, [-0-9] will test for - and 0 to 9.
^ -> The ^ (circumflex or caret) inside square brackets negates the expression (we will see an alternate use for the circumflex/caret outside square brackets later), for example, [^Ff] means anything except upper or lower case F and [^a-z] means everything except lower case a to z.
You can find more explanations in the source where I got this information: http://www.zytrax.com/tech/web/regex.htm
And if you want to test, you can try this one: http://gskinner.com/RegExr/

Here's the explanation:
# server\-([^\-\.\d]+)(\d+)
#
# Match the characters “server” literally «server»
# Match the character “-” literally «\-»
# Match the regular expression below and capture its match into backreference number 1 «([^\-\.\d]+)»
# Match a single character NOT present in the list below «[^\-\.\d]+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# A - character «\-»
# A . character «\.»
# A single digit 0..9 «\d»
# Match the regular expression below and capture its match into backreference number 2 «(\d+)»
# Match a single digit 0..9 «\d+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
You can use programs such as RegexBuddy if you intend to work with regexes and are willing to spend some funds.
You can also use this free web based explanation utility.

^ means not one of the following characters inside the brackets
\- \. are the - and . characters
\d is a digit
[^\-\.\d]+ means one or more characters that are not listed inside the brackets, so one or more of anything that is not a -, a . or a digit.
(\d+) one or more digits
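A quick sketch showing what the two groups capture. The host name `server-web12.example.com` is invented purely for illustration.

```php
<?php
// Group 1: one or more characters that are not "-", "." or a digit.
// Group 2: the run of digits that follows.
$found = preg_match('/server\-([^\-\.\d]+)(\d+)/', 'server-web12.example.com', $m);

if ($found === 1) {
    echo $m[1], "\n"; // web
    echo $m[2], "\n"; // 12
}
```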

Here is the explanation given by the perl module YAPE::Regex::Explain
The regular expression:
(?-imsx:server\-([^\-\.\d]+)(\d+))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
server 'server'
----------------------------------------------------------------------
\- '-'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^\-\.\d]+ any character except: '\-', '\.', digits
(0-9) (1 or more times (matching the
most amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

Regular Expression (preg_match)

This is the code that is not working:
<?php
$matchWith = " http://videosite.com/ID123 ";
preg_match_all('/\S\/videosite\.com\/(\w+)\S/i', $matchWith, $matches);
foreach($matches[1] as $value)
{
print 'Hyperlink';
}
?>
What I want is that it should not display the link if it has a whitespace before or after.
So now it should display nothing. But it still displays the link.

This can still match, capturing ID12: the trailing \S is satisfied by the 3, and the leading \S by a / of http://, neither of which is a space. You can try:
preg_match_all('/^\S*\/videosite\.com\/(\w+)\S*$/i', $matchWith, $matches);
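The difference between the two patterns can be checked with a short sketch; the variable names `$loose` and `$strict` are illustrative.

```php
<?php
$matchWith = " http://videosite.com/ID123 ";

// Original pattern: \S only needs one non-space character on each side,
// so a "/" and the "3" satisfy it and the capture shrinks to "ID12".
preg_match_all('/\S\/videosite\.com\/(\w+)\S/i', $matchWith, $loose);

// Anchored pattern: the surrounding spaces now prevent any match.
preg_match_all('/^\S*\/videosite\.com\/(\w+)\S*$/i', $matchWith, $strict);

print_r($loose[1]);  // one entry: ID12
print_r($strict[1]); // empty array
```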

So, you don't want it to display if there is any whitespace. Something like this should work, though I didn't test it.
preg_match_all('/^\S+?videosite\.com\/(\w+)\S+?$/i', $matchWith, $matches);

You can try this. It works:
if (preg_match('%^\S*?/videosite\.com/(\w+)(?!\S+)$%i', $subject, $regs)) {
#$result = $regs[0];
}
But I am positive that after I post this, you will update your question :)
Explanation:
"
^ # Assert position at the beginning of the string
\S # Match a single character that is a “non-whitespace character”
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\/ # Match the character “/” literally
videosite # Match the characters “videosite” literally
\. # Match the character “.” literally
com # Match the characters “com” literally
\/ # Match the character “/” literally
( # Match the regular expression below and capture its match into backreference number 1
\w # Match a single character that is a “word character” (letters, digits, etc.)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
\S # Match a single character that is a “non-whitespace character”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
"

It would probably be simpler to use this regex:
'/^http:\/\/videosite\.com\/(\w+)$/i'
I believe you are referring to the white space before http, and the white space after the directory. So, you should use the ^ character to indicate that the string must start with http, and use the $ character at the end to indicate that the string must end with a word character.
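A minimal sketch of the fully anchored regex, tried on the padded string from the question and on a trimmed copy. The variable names are assumptions made for the example.

```php
<?php
$pattern = '/^http:\/\/videosite\.com\/(\w+)$/i';

// With the surrounding spaces, ^ and $ cannot both be satisfied.
$padded = preg_match($pattern, ' http://videosite.com/ID123 ', $m1);

// Without them, the whole string matches and group 1 holds the ID.
$clean = preg_match($pattern, 'http://videosite.com/ID123', $m2);

var_dump($padded); // int(0)
var_dump($clean);  // int(1)
echo $m2[1], "\n"; // ID123
```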
