Regex to detect the colon and sides of it?

Regex to detect the colon and sides of it? - php

First see my string please:
$a = "[ child : parent ]";
How can I detect that the pattern is:
[(optional space)word or character(optional space) : (optional space)word or character(optional space)]

You can catch this as follows in PHP:
Your regular expression is /\[ *\w+ *: *\w+ *]/
You would write code that would look like this to see if it matched.
if (preg_match('/regex/', $string)) {
// do things
}
Explanation of the Regular Expression
There is a backslash (\) before the open bracket because
[ has special meaning in regular expressions. The backslash
prevents its special meaning from being used.
The asterisk (*) matches 0 or more of the previous character expression. In this
case, it matches 0 or more spaces. If you instead used the
expression \s*, it would match 0 or more white-space characters
(space, tab, line break). Finally, if you wanted it to match 0 or 1
of the previous character, you would use ? instead of *.
The plus (+) matches 1 or more of the previous character expression. The \w character expression matches a letter, digit, or underscore. If you don't want underscores to match, you should instead use a character class. For example, you could use [A-Za-z0-9].
You can find more information on regular expressions at http://www.regular-expressions.info and http://www.regular-expressions.info/php.html

From your sample text I'd say you mean a human word and not \w regex word
preg_match('/\[ ?([a-z]+) ?: ?([a-z]+) ?\]/i', $a, $matches);
Explained demo: http://regex101.com/r/hB2oV9
$matches will save both values, test with var_dump($matches);

I'm not sure on the php-specific version of regex, but this should work:
\[ ?\w+ ? : ?\w+ ?\]
Here is a site that I've used in the past to find regular expressions for my needed patterns.

use this regex \[\s*\w+\s*:\s*\w+\s*\]

I would probably do it like this
preg_match('/^\[\s?\w+\s+:\s+\w+\s?\]$/', $string)

Related

Regex match section within string

I have a string foo-foo-AB1234-foo-AB12345678. The string can be in any format, is there a way of matching only the following pattern letter,letter,digits 3-5 ?
I have the following implementation:
preg_match_all('/[A-Za-z]{2}[0-9]{3,6}/', $string, $matches);
Unfortunately this finds a match on AB1234 AND AB12345678 which has more than 6 digits. I only wish to find a match on AB1234 in this instance.
I tried:
preg_match_all('/^[A-Za-z]{2}[0-9]{3,6}$/', $string, $matches);
You will notice ^ and $ to mark the beginning and end, but this only applies to the string, not the section, therefore no match is found.
I understand why the code is behaving like it is. It makes logical sense. I can't figure out the solution though.

You must be looking for word boundaries \b:
\b\p{L}{2}\p{N}{3,5}\b
See demo
Note that \p{L} matches a Unicode letter, and \p{N} matches a Unicode number.
You can as well use your modified regex \b[a-zA-Z]{2}[0-9]{3,5}\b. Note that using anchors makes your regex match only at the beginning of a string (with ^) or/and at the end of the string (with $).
In case you have underscored words (like foo-foo_AB1234_foo_AB12345678_string), you will need a slight modification:
(?<=\b|_)\p{L}{2}\p{N}{3,5}(?=\b|_)

You have to end your regular expression with a pattern for a non-digit. In Java this would be \D, this should be the same in PHP.

Character classes strange behavior in alternations in regular expressions

I'm trying to write a simple regular expression that recognizes a sequence of characters that are not columns or are escaped columns.
I.e:
foo:bar //Does not match
but
foo\:bar //Does match
By my knowledge of Regular Languages, such language can be described by the regular expression
/([^:]|\\[:])*/
You can see a graphical representation of this expression in the wonderful tool Regexper
Using php's preg_match (that is based on the PCRE engine), such expression does not match "foo\:bar".
However, if substitute the class with the single char:
/([^:]|\\:)*/
the expression matches.
Do you have an explanation for this? Is this a sort of limitation of the PCRE engine on character classes?
PS: Testing the first expression on RegExr, that is based on AS3 Regexp engine, does not offer a match, while changing the alternation order:
/(\\[:]|[^:])*/
it does match, while the same expression does not match in PCRE.

preg_match() accepts a regular expression pattern as a string, so you need to double escape everything.
^(?:[^:\\\\]|\\\\:)+$
This matches one or more characters that are not colons or escape characters [^:\\\\], or an escaped colon \\\\:.
Why your first regular expression didn't work: /([^:]|\\[:])*/.
This matches a non-colon [^:], or it matches \\[:] which matches a literal [ followed by a literal : and then a literal ].
Why this works : /([^:]|\\:)*/ ?
This matches a non-colon [^:], or it matches a literal \\: so it effectively matches everything.
Edit: Why /([^:]|E[:])*/ won't match fooE:bar ?
This is what happens: [^:] matches the f then it matches o then the other o then it matches the E, now it finds a colon : and it can't match it, but since by default the PCRE engine doesn't look for the longest possible match it is satisfied with what is has matched so far and stops right there and returns fooE as a match without trying the other alternative E[:] (which is equal by the way to E:) at all.
If you want to match the entire sequence then you will to use an expression like this one:
/([^:E]|E[:])*/
This prevents [^:] from consuming that E.

You can try this. This allow the secuence \\: to have a chance before the negated character class [^:].
^(?:\\:|[^:])+$
If you use the values in the alternation bar inverted as in ^((?:[^:]|\\:)+$ it will not match escaped colon \: because the first alternative will consume the slash (\) before the second expression have a chance to try.

regular expressions for url parser

<?php
$string = 'user34567';
if(preg_match('/user(^[0-9]{1,8}+$)/', $string)){
echo 1;
}
?>
I want to check if the string have the word user follows by number that can be 8 symbols max.

You're very close actually:
if(preg_match('/^user[0-9]{1,8}$/', $string)){
The anchor for "must match at start of string" should be all the way in front, followed by the "user" literal; then you specify the character set [0-9] and multiplier {1,8}. Finally, you end off with the "must match at end of string" anchor.
A few comments on your original expression:
The ^ matches the start of a string, so writing it anywhere else inside this expression but the beginning will not yield the expected results
The + is a multiplier; {1,8} is one too, but only one multiplier can be used after an expression
Unless you're intending to use the numbers that you found in the expression, you don't need parentheses.
Btw, instead of [0-9] you could also use \d. It's an automatic character group that shortens the regular expression, though this particular one doesn't save all too many characters ;-)

By using ^ and $, you are only matching when the pattern is the only thing on the line. Is that what you want? If so, use the following:
preg_match( '/^user[0-9]{1,8}[^0-9]$/' , $string );
If you want to find this pattern anywhere in a line, I would try:
preg_match( '/user[0-9]{1,8}[^0-9]/' , $string );
As always, you should use a reference tool like RegexPal to do your regular expression testing in isolation.

You were close, here is your regex : /^user[0-9]{1,8}$/

try the following regex instead:
/^user([0-9]{1,8})$/

Use this regex:
/^user\d{1,8}$/

What are those characters in a regular expression?

I found this regex that works correctly but I didn't understand what is # (at the start) and at the end of the expression. Are not ^ and $ the start/end characters?
preg_match_all('#^/([^/]+)/([^/]+)/$#', $s, $matches);
Thanks

The matched pattern contains many /, thus the # is used as regex delimeter. These are identical
/^something$/
and
#^something$#
If you have multiple / in your pattern the 2nd example is better suited to avoid ugly masking with \/. This is how the RE would like like with using the standard // syntax:
/^\/([^\/]+)\/([^\/]+)\/$/

About #:
That's a delimiter of the regular expression itself. It's only meaning is to tell which delimiter is used for the expression. Commonly / is used, but others are possible. PCRE expressions need a delimiter with preg_match or preg_match_all.
About ^:
Inside character classes ([...]), the ^ has the meaning of not if it's the first character.
[abc] : matching a, b or c
[^abc] : NOT matching a, b or c, match every other character instead

Also # at the start and the end here are custom regex delimiters. Instead of the usual /.../ you have #...#. Just like perl.

These are delimiters. You can use any delimiter you want, but they must appear at the start and end of the regular expression.
Please see this documentation for a detail insight in to regular expressions:
http://www.php.net/manual/en/pcre.pattern.php

You can use pretty much anything as delimiters. The most common one is /.../, but if the pattern itself contains / and you don't want to escape any and all occurrences, you can use a different delimiter. My personal preference is (...) because it reminds me that $0 of the result is the entire pattern. But you can do anything, <...>, #...#, %...%, {...}... well, almost anything. I don't know exactly what the requirements are, but I think it's "any non-alphanumeric character".

Let me break it down:
# is the first character, so this is the character used as the delimiter of the regular expression - we know we've got to the end when we reach the next (unescaped) one of these
^ outside of a character class, this means the beginning of the string
/ is just a normal 'slash' character
([^/]+) This is a bracketed expression containing at least one (+) instance of any character that isn't a / (^ at the beginning of a character class inverts the character class - meaning it will only match characters that are not in this list)
/ again
([^/]+) again
/ again
$ this matches the end of the string
# this is the final delimeter, so we know that the regex is now finished.

How to make dot match newline characters using regular expressions

I have a string that contains normal characters, white charsets and newline characters between <div> and </div>.
This regular expression doesn't work: /<div>(.*)<\/div>. It is because .* doesn't match newline characters. How can I do this?

You need to use the DOTALL modifier (/s).
'/<div>(.*)<\/div>/s'
This might not give you exactly what you want because you are greedy matching. You might instead try a non-greedy match:
'/<div>(.*?)<\/div>/s'
You could also solve this by matching everything except '<' if there aren't other tags:
'/<div>([^<]*)<\/div>/'
Another observation is that you don't need to use / as your regular expression delimiters. Using another character means that you don't have to escape the / in </div>, improving readability. This applies to all the above regular expressions. Here's it would look if you use '#' instead of '/':
'#<div>([^<]*)</div>#'
However all these solutions can fail due to nested divs, extra whitespace, HTML comments and various other things. HTML is too complicated to parse with Regex, so you should consider using an HTML parser instead.

To match all characters, you can use this trick:
%\<div\>([\s\S]*)\</div\>%

You can also use the (?s) mode modifier. For example,
(?s)/<div>(.*?)<\/div>

There shouldn't be any problem with just doing:
(.|\n)
This matches either any character except newline or a newline, so every character. It solved it for me, at least.

An option would be:
'/<div>(\n*|.*)<\/div>/i'
Which would match either newline or the dot identifier matches.

There is usually a flag in the regular expression compiler to tell it that dot should match newline characters.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Regex to detect the colon and sides of it? - php

First see my string please: $a = "[ child : parent ]"; How can I detect that the pattern is: [(optional space)word or character(optional space) : (optional space)word or character(optional space)]

From your sample text I'd say you mean a human word and not \w regex word preg_match('/\[ ?([a-z]+) ?: ?([a-z]+) ?\]/i', $a, $matches); Explained demo: http://regex101.com/r/hB2oV9 $matches will save both values, test with var_dump($matches);

I'm not sure on the php-specific version of regex, but this should work: \[ ?\w+ ? : ?\w+ ?\] Here is a site that I've used in the past to find regular expressions for my needed patterns.

use this regex \[\s\w+\s:\s\w+\s\]

I would probably do it like this preg_match('/^\[\s?\w+\s+:\s+\w+\s?\]$/', $string)

Related

Regex match section within string

Character classes strange behavior in alternations in regular expressions

regular expressions for url parser

What are those characters in a regular expression?

How to make dot match newline characters using regular expressions

Categories

Resources