Regex match section within string - php

I have a string foo-foo-AB1234-foo-AB12345678. The string can be in any format. Is there a way of matching only the following pattern: letter, letter, then 3-5 digits?
I have the following implementation:
preg_match_all('/[A-Za-z]{2}[0-9]{3,6}/', $string, $matches);
Unfortunately this finds a match on AB1234 AND AB12345678 which has more than 6 digits. I only wish to find a match on AB1234 in this instance.
I tried:
preg_match_all('/^[A-Za-z]{2}[0-9]{3,6}$/', $string, $matches);
You will notice ^ and $ to mark the beginning and end, but this only applies to the string, not the section, therefore no match is found.
I understand why the code is behaving like it is. It makes logical sense. I can't figure out the solution though.

You must be looking for word boundaries \b:
\b\p{L}{2}\p{N}{3,5}\b
Note that \p{L} matches a Unicode letter, and \p{N} matches a Unicode number.
You can also use your own regex with a small modification: \b[a-zA-Z]{2}[0-9]{3,5}\b. Note that using anchors makes your regex match only at the beginning of the string (with ^) and/or at the end of the string (with $).
In case you have underscored words (like foo-foo_AB1234_foo_AB12345678_string), you will need a slight modification:
(?<=\b|_)\p{L}{2}\p{N}{3,5}(?=\b|_)
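As a rough check of the patterns above, the sketch below runs them through preg_match_all on the two sample strings from this thread. The variable names are made up for illustration, and the u modifier is an addition here (an assumption that the input is UTF-8) so that \p{L} and \p{N} work on characters rather than bytes.

```php
<?php
// Sample strings taken from the question and from the answer above.
$hyphenated  = 'foo-foo-AB1234-foo-AB12345678';
$underscored = 'foo-foo_AB1234_foo_AB12345678_string';

// ASCII version: two letters, 3 to 5 digits, a word boundary on each side.
preg_match_all('/\b[A-Za-z]{2}[0-9]{3,5}\b/', $hyphenated, $ascii);

// Unicode version; the u modifier is an assumption (UTF-8 input).
preg_match_all('/\b\p{L}{2}\p{N}{3,5}\b/u', $hyphenated, $unicode);

// Underscore-aware version: "_" is a word character, so \b alone does not
// fire between "_" and "A"; the lookarounds accept "_" as a separator too.
preg_match_all('/(?<=\b|_)\p{L}{2}\p{N}{3,5}(?=\b|_)/u', $underscored, $under);

print_r($ascii[0]);   // AB1234 only; AB12345678 has too many digits
print_r($unicode[0]); // AB1234 only
print_r($under[0]);   // AB1234 only
```

The full matches are in index 0 of the result array because none of these patterns uses a capturing group.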

You have to end your regular expression with a pattern for a non-digit. In Java this would be \D; it should be the same in PHP.
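One thing to keep in mind with a trailing \D is that it consumes a character, so the non-digit ends up in the match and the pattern cannot match at the very end of the string. A negative lookahead is a common alternative, though it is a swapped-in technique rather than what this answer proposes. A minimal sketch, with illustrative variable names:

```php
<?php
$string = 'foo-foo-AB1234-foo-AB12345678';

// Trailing \D: the non-digit ("-") becomes part of the match.
preg_match_all('/[A-Za-z]{2}[0-9]{3,5}\D/', $string, $withNonDigit);

// Negative lookahead: only asserts that no digit follows, consumes nothing.
preg_match_all('/[A-Za-z]{2}[0-9]{3,5}(?![0-9])/', $string, $withLookahead);

// At the end of a string there is no character left for \D to match.
$endNonDigit  = preg_match('/[A-Za-z]{2}[0-9]{3,5}\D/', 'foo-AB1234');
$endLookahead = preg_match('/[A-Za-z]{2}[0-9]{3,5}(?![0-9])/', 'foo-AB1234');

print_r($withNonDigit[0]);  // AB1234- (note the trailing hyphen)
print_r($withLookahead[0]); // AB1234
var_dump($endNonDigit, $endLookahead); // int(0), int(1)
```

Neither pattern checks what comes before the two letters, so a left-hand boundary is still needed if longer letter runs should be excluded.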

Related

Regular expression for 8 to 10 letter words

I need a regular expression that matches either 8 letter words ending in "tion" or 10 letter words ending in "able".
Here is what I came up with, but for some reason http://regex101.com tells me there are no matches when I try to match a string.
My idea is as follows:
([a-z]{4}^\btion\b|[a-z]{6}^\bable\b)
Link to regex101 - Here
\b matches a word boundary. You should only have this at the beginning and end of the word, not before the suffix. You can take it outside the grouping parentheses, since all the alternatives are supposed to match at word boundaries.
\b([a-z]{4}tion|[a-z]{6}able)\b
You don't need ^ at all; it matches the beginning of the string, which can never occur in the middle of a word.
Try this one:
\b([a-z]{4}tion|[a-z]{6}able)\b
You use ^\b between the variable section (e.g. [a-z]{4}) and the constant suffix (e.g. tion), and that breaks the match. ^ means "beginning of the string (or of a line)" and \b means "word boundary". Putting them in the middle of the pattern makes little sense: once [a-z]{4} has consumed characters, the regex is no longer at the beginning of the string, so ^ can never match there.
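To see the difference in PHP, here is a small sketch comparing the original pattern with the corrected one. The sample sentence is invented for illustration: "location" has 8 letters, "station" 7, "comparable" 10 and "predictable" 11.

```php
<?php
// Hypothetical sample text, not from the question.
$text = 'location station comparable predictable';

// Original attempt: ^ sits after [a-z]{4} / [a-z]{6}, so it can never match.
$broken = preg_match('/([a-z]{4}^\btion\b|[a-z]{6}^\bable\b)/', $text);

// Corrected pattern: word boundaries only around the whole word.
preg_match_all('/\b([a-z]{4}tion|[a-z]{6}able)\b/', $text, $matches);

var_dump($broken);     // int(0)
print_r($matches[0]);  // location, comparable
```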

Php regex that matches substring followed by any length of character and then comma

I have a long string containing Copyright: 'any length of unknown string here',
what regex should I write to exactly match this as substring in a string?
I tried this: preg_replace('/Copyright:(.*?)/', 'mytext', $str); but it's not working. It only matches the Copyright: part.
A lazily quantified pattern at the end of the pattern will always match no text in case of *? and 1 char only in case of +?, i.e. will match as few chars as possible to return a valid match.
You need to make sure you get to the ', by putting them into the pattern:
'/Copyright:.*?\',/'
               ^^^
The ? in your group 1 (.*?) makes it lazy, i.e. it matches as few characters as possible. Making it greedy and adding the closing ', to the pattern would solve it:
Copyright:(.*)',
However, that would match everything up to the last ', in that same line. If you have more text in that line, make sure to limit the pattern further. The grouping () is only there to make the pattern easier to read; you can do without the parentheses.
I usually use Regxr.com to test my regular expressions. There are also many other similar tools online. Note that this one has a great UX but does not support lookbehind.
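The behaviours described in these answers can be compared side by side. The sketch below uses invented sample strings and the replacement text 'mytext' from the question; it is meant as an illustration, not as the only way to write the pattern.

```php
<?php
// Hypothetical input strings.
$str = "Page footer. Copyright: 'Acme Corp 2015', all rights reserved.";
$two = "Copyright: 'A', x 'B', y";

// Lazy quantifier at the very end: .*? matches nothing, so only
// "Copyright:" is replaced.
$lazyAtEnd = preg_replace('/Copyright:(.*?)/', 'mytext', $str);

// Lazy quantifier followed by ', : stops at the first quote-comma pair.
$lazyFixed = preg_replace('/Copyright:.*?\',/', 'mytext', $str);

// Lazy vs greedy when the line contains more than one ', sequence.
$lazyTwo   = preg_replace('/Copyright:.*?\',/', 'mytext', $two);
$greedyTwo = preg_replace('/Copyright:.*\',/', 'mytext', $two);

echo $lazyAtEnd, "\n"; // Page footer. mytext 'Acme Corp 2015', all rights reserved.
echo $lazyFixed, "\n"; // Page footer. mytext all rights reserved.
echo $lazyTwo, "\n";   // mytext x 'B', y
echo $greedyTwo, "\n"; // mytext y
```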

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.
A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should see if you can discover what characters are valid in a Reddit name or in a Reddit username and modify the [a-zA-Z0-9_-] charset accordingly.
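Applied to the example string from the question, the call above can be sketched like this. The allowed character set is the answer's guess, not Reddit's actual naming rules.

```php
<?php
$string = 'Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2';

// "#" is used as the delimiter so the slashes in the pattern need no escaping.
// \1 in the replacement refers to the text captured by the first group.
$linked = preg_replace(
    '#(/(?:u|r)/[a-zA-Z0-9_-]+)#',
    '[\1](http://reddit.com\1)',
    $string
);

echo $linked, "\n";
```

preg_replace replaces every match in one call, which covers the "all matches in a single go" requirement.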
You are looking for a regular expression.
A basic pattern starts out as a fixed string: /u/ or /r/, each of which matches exactly that text. The two can be combined as /(?:u|r)/, which matches either one. Next you want to match everything from that point up to a space. Use a negated character class, [^ ], which matches any character that is not a space, and apply the quantifier * to match as many such characters as possible: /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ), to ensure the match is preceded by a space, so you're not matching part of a longer word. That results in (?<= )/(?:u|r)/[^ ]*. Wrap all of that in parentheses to make a capturing group: ((?<= )/(?:u|r)/[^ ]*). This captures the matched text so it can be used in the replacement. You can then express the replacement using the \1 reference to the first captured group: [\1](http://reddit.com\1).
In PHP you would pass the matching pattern, replacement pattern, and subject string to the preg_replace function.
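A quick sketch of what the lookbehind version matches, using made-up sample strings. It also shows two side effects worth knowing: [^ ]* picks up trailing punctuation, and a name at the very start of the string is skipped because nothing precedes it.

```php
<?php
// "#" delimiters avoid having to escape the slashes in the pattern.
$pattern = '#(?<= )/(?:u|r)/[^ ]*#';

// Hypothetical sample text.
preg_match_all($pattern, 'See /r/php, or ask /u/someone', $found);

// No space before the first character, so the lookbehind fails here.
$atStart = preg_match($pattern, '/r/php is nice');

print_r($found[0]);  // "/r/php," (with the comma) and "/u/someone"
var_dump($atStart);  // int(0)
```

If those cases matter, restricting the character class to the characters allowed in names is one way around the punctuation issue.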
In my opinion regex would be overkill for such a simple operation. If you just want to replace instances of "/r/x" with "[/r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)" you should use str_replace, although preg_replace will need less code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
Use regex for intricate search-and-replace tasks. You can also use the regular expression generator at http://www.jslab.dk/tools.regex.php if you have something complex to capture in the string.

I can't get preg_match to test if the entire string matches the regex

I'm using this regular expression to test if a username is valid:
[A-Za-z0-9 _]{3,12} when I test it for matches in a text editor with the string test'ing, it highlights 'test' and 'ing', but when I use the following code in PHP:
if(!preg_match('/[A-Za-z0-9 _]{3,12}/', $content)) where $content is test'ing, it still returns true even though it should return FALSE.
Is there something wrong with my regular expression? I need:
Minimum length 3, max 12 {3,12}
No spaces/underscores in front or after the string, and no spaces/underscores in a row anywhere
(I'm using additional checks for this because I'm not very good with regex)
Only alphanumerics, spaces and underscores allowed [A-Za-z0-9 _]
Thanks in advance...
You're missing the anchors in the regular expression, so the regex can comfortably match 3 characters in the character class anywhere in the string. This is not what you want. You want to check if your regex matches against the entire string. For that, you need to include the anchors (^ and $).
if(!preg_match('/^[A-Za-z0-9 _]{3,12}$/', $content))
                 ^                   ^
^ asserts the position at the beginning of the string and $ asserts position at the end of the string. It's important to note that these meta characters do not actually consume characters. They're zero-width assertions.
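The effect of the anchors can be checked with the string from the question. The last two calls go slightly beyond the answer: by default $ also matches just before a trailing newline, and \z (end of subject) is one way to rule that out. Variable names are illustrative.

```php
<?php
$content = "test'ing";

// Unanchored: "test" alone satisfies the pattern, so this returns 1.
$unanchored = preg_match('/[A-Za-z0-9 _]{3,12}/', $content);

// Anchored: the whole string must consist of allowed characters.
$anchored = preg_match('/^[A-Za-z0-9 _]{3,12}$/', $content);
$valid    = preg_match('/^[A-Za-z0-9 _]{3,12}$/', 'test_ing');

// $ tolerates one trailing newline; \z does not.
$dollarNewline = preg_match('/^[A-Za-z0-9 _]{3,12}$/', "testing\n");
$strictNewline = preg_match('/^[A-Za-z0-9 _]{3,12}\z/', "testing\n");

var_dump($unanchored, $anchored, $valid);   // int(1), int(0), int(1)
var_dump($dollarNewline, $strictNewline);   // int(1), int(0)
```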
Further reading:
Regex Anchors on regular-expressions.info
The Stack Overflow Regular Expressions FAQ
You should add start and end anchors (^ and $):
if(!preg_match('/^[A-Za-z0-9 _]{3,12}$/', $content))
The anchor ^ matches the start of the string and $ matches the end. That way, it will only match if the whole string satisfies your regex.
Hope that helps

what does the regular expression (?<!-) mean

I'm trying to understand a piece of code and came across this regular expression used in PHP's preg_replace function.
'/(?<!-)color[^{:]*:[^{#]*$/i'
This bit... (?<!-)
doesn't appear in any of my reg-exp manuals. Does anyone know what this means, please? (Google doesn't return anything; I don't think symbols work in Google.)
The ?<! at the start of a parenthetical group is a negative lookbehind. It asserts that the word color (strictly, the position just before its c) is not immediately preceded by a - character.
So, for a more concrete example, it would match color in the strings:
color
+color
someTextColor
But it will fail on something like -color or background-color. Also note that the engine will not technically "match" whatever precedes the c, it simply asserts that it is not a hyphen. This can be an important distinction depending on the context (illustrated on Rubular with a trivial example; note that only the b in the last string is matched, not the preceding letter).
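The cases listed above can be verified with a short loop. Only the (?<!-)color part of the original pattern is used here to keep the sketch small, and the variable names are illustrative.

```php
<?php
// Case-insensitive "color" that is not immediately preceded by a hyphen.
$pattern = '/(?<!-)color/i';

$subjects = ['color', '+color', 'someTextColor', '-color', 'background-color'];

$results = [];
foreach ($subjects as $subject) {
    // preg_match returns 1 on a match and 0 otherwise.
    $results[$subject] = preg_match($pattern, $subject);
}
print_r($results); // 1, 1, 1, 0, 0 in the order listed above

// The lookbehind is zero-width: the preceding character is checked
// but is not part of the match.
preg_match($pattern, 'my color', $m);
var_dump($m[0]); // string(5) "color"
```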
PHP uses Perl-compatible regular expressions (PCRE) for the preg_* functions. From perldoc perlre:
"(?<!pattern)"
A zero-width negative look-behind assertion. For example "/(?<!bar)foo/" matches any occurrence of "foo" that does not follow "bar". Works only for fixed-width look-behind.
I'm learning regular expressions using Python's re module!
http://docs.python.org/library/re.html
Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.
