regular expressions multiple statements - php

I have half my problem working. The problem is: I need to match words that are either 7 letters long and starting with st OR 9 letters long ending with tion. I have code that works for the first half of the question: st\w{5}\s. This will match a 7 letter word such as 'startin' in the example: start startin starting.
However, I can't seem to add the second half. (st\w{5}\s)|(tion\w{5}) does not work when trying to find 'startin' and 'attention' in: start startin starting attention.
Thanks.

You'll want to look for the word boundaries \b(?:(st\w{5})|(\w{5}tion))\b

Use word boundaries, for example:
\b(st[a-z]{5}|[a-z]{5}tion)\b
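As a minimal sketch of how the word-boundary pattern could be applied in PHP, the snippet below runs it against the sample text from the question. The variable names are illustrative, and the pattern assumes lowercase ASCII words only.

```php
<?php
// Sample text taken from the question.
$text = 'start startin starting attention';

// \b pins both alternatives to whole words:
//   st[a-z]{5}   -> "st" plus 5 letters (7 letters in total)
//   [a-z]{5}tion -> 5 letters plus "tion" (9 letters in total)
$pattern = '/\b(?:st[a-z]{5}|[a-z]{5}tion)\b/';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]); // contains "startin" and "attention"
```

Without the boundaries, the first alternative would also match the first seven letters of "starting".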

Related

Get 'XXX' value using a RegEx in PHP

I need some help building a regex to get the value of XXX in the following set of possible matches:
+58XXXYYYYYYY
+580XXXYYYYYYY
0XXXYYYYYYY
XXXYYYYYYY
These are phone numbers, so XXX is dynamic and will not always hold the same value. The regex is intended to be used in PHP, so I know I should use the preg_match() function, but I have no idea about the regex. Can anyone give me some advice on this?
This sounds like it matches your requirements:
(\d{3})\d{7}$
With a Live Demo
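A rough sketch of that pattern used with preg_match(), assuming the input contains nothing after the last digit. The sample numbers are made up to cover the four formats in the question.

```php
<?php
// Hypothetical sample numbers, one per format listed in the question.
$numbers = ['+584141234567', '+5804141234567', '04141234567', '4141234567'];

$codes = [];
foreach ($numbers as $number) {
    // Capture the 3 digits that precede the final 7 digits of the string.
    if (preg_match('/(\d{3})\d{7}$/', $number, $m)) {
        $codes[] = $m[1];
    }
}
print_r($codes); // every entry is "414"
```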
If you want to match the last 3 numbers, you could use /([0-9]{3,3})$/.
The parentheses indicate a capturing group (what you are looking for). Inside that group you want to match a pattern [...] of any digit 0-9 exactly 3 times {3,3}. And finally you want to match the last occurrence of this pattern, so the $ indicates the end of the line.
A very handy tool for building simple regex queries is Debuggex. I use it all the time!

Retrieve 0 or more matches from comma separated list inside parenthesis using regex

I am trying to retrieve matches from a comma-separated list that is located inside parentheses using a regular expression. (I also retrieve the version number in the first capture group, though that's not important to this question.)
What's worth noting is that the expression should ideally handle all possible cases, where the list could be empty or could have more than 3 entries = 0 or more matches in the second capture group.
The expression I have right now looks like this:
SomeText\/(.*)\s\(((,\s)?([\w\s\.]+))*\)
The string I am testing this on looks like this:
SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)
Result of this is:
1. [6-11] `1.0.4`
2. [32-52] `, Macbook Pro Retina`
3. [32-34] `, `
4. [34-52] `Macbook Pro Retina`
The desired result would look like this:
1. [6-11] `1.0.4`
2. [32-52] `debug`
3. [32-34] `OS X 10.11.2`
4. [34-52] `Macbook Pro Retina`
According to the image above (as far as I can see), the expression should work on the test string. What is the cause of the weird results and how could I improve the expression?
I know there are other ways of solving this problem, but I would like to use a single regular expression if possible. Please don't suggest other options.
When dealing with a varying number of groups, regex ain't the best. Solve it in two steps.
First, break down the statement using a simple regex:
SomeText\/([\d.]*) \(([^)]*)\)
1. [9-14] `1.0.4`
2. [16-55] `debug, OS X 10.11.2, Macbook Pro Retina`
Then just explode the second result by ',' to get your groups.
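The two steps could look roughly like this in PHP. Note one deliberate swap: instead of a plain explode(',') this sketch uses preg_split() so that the spaces after each comma are dropped and an empty list "()" yields an empty array. The variable names are illustrative.

```php
<?php
$input = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';

$version = null;
$items = [];

// Step 1: capture the version and the raw contents of the parentheses.
if (preg_match('~SomeText/([\d.]*) \(([^)]*)\)~', $input, $m)) {
    $version = $m[1];
    // Step 2: split on commas, swallowing the whitespace around them.
    // PREG_SPLIT_NO_EMPTY turns an empty list into an empty array.
    $items = preg_split('/\s*,\s*/', trim($m[2]), -1, PREG_SPLIT_NO_EMPTY);
}

print_r([$version, $items]);
```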
Probably the \G anchor works best here for binding the match to an entry point. This regex is designed for input that is always similar to the sample that is provided in your question.
(?<=SomeText\/|\G(?!^))[(,]? *\K[^,)(]+
(?<=SomeText\/|\G) the lookbehind is the part where matches should be glued to
\G matches where the previous match ended (?!^) but don't match start
[(,]? * matches an optional opening parenthesis or comma followed by any amount of space
\K resets beginning of the reported match
[^,)(]+ matches the wanted characters, that are none of ( ) ,
Demo at regex101 (grab matches of $0)
Another idea with use of capture groups.
SomeText\/([^(]*)\(|\G(?!^),? *([^,)]+)
This one, without the lookbehind, is a bit more accurate (it also requires the opening parenthesis), performs better (needs fewer steps), and is probably easier to understand and maintain.
SomeText\/([^(]*)\( the entry anchor and version is captured here to $1
|\G(?!^),? *([^,)]+) or glued to previous match: capture to $2 one or more characters, that are not , ) preceded by optional space or comma.
Another demo at regex101
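For illustration, the capture-group variant could be consumed from PHP along these lines. This is only a sketch: the loop that sorts matches into a version and a list of items is an assumption about how one might use the result, not part of the original answer.

```php
<?php
$input = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';
$pattern = '~SomeText/([^(]*)\(|\G(?!^),? *([^,)]+)~';

// PREG_SET_ORDER returns one array per match, in the order they were found.
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

$version = null;
$items = [];
foreach ($matches as $m) {
    if (($m[2] ?? '') !== '') {
        // Second alternative: a list entry glued to the previous match by \G.
        $items[] = $m[2];
    } else {
        // First alternative: the entry anchor, which carries the version.
        $version = trim($m[1]);
    }
}

print_r([$version, $items]);
```

With an empty list such as "SomeText/1.0.4 ()" only the first alternative matches, so $items stays empty.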
Actually, stribizhev was close:
(?:SomeText\/([^() ]*)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\))
Just had to make that one class expect at least one match
(?:SomeText\/([0-9.]+)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\)) is a little more clear as long as the version number is always numbers and periods.
I wanted to come up with something more elegant than this (though this does actually work):
SomeText\/(.*)\s\(([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?\)
Obviously, the
([^\,]+)?\,?\s?
is repeated 6 times.
(It can be repeated any number of times and it will work for any number of comma-separated items equal to or below that number of times).
I tried to shorten the long, repetitive list of ([^\,]+)?\,?\s? above to
(?:([^\,]+)\,?\s?)*
but it doesn't work and my knowledge of regex is not currently good enough to say why not.
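One likely reason, shown as a small sketch: in PCRE a capture group inside a repetition is overwritten on every iteration, so only the value from the last iteration is reported. The anchors ^ and $ are added here just to make the demonstration self-contained.

```php
<?php
$list = 'debug, OS X 10.11.2, Macbook Pro Retina';

// The group ([^,]+) takes part in three iterations of the outer (?:...)*,
// but $m[1] only holds what it captured the last time around.
preg_match('/^(?:([^,]+),?\s?)*$/', $list, $m);

print_r($m); // $m[0] is the whole list, $m[1] is "Macbook Pro Retina"
```

This is also why the original expression in the question reports only the final list entry.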
This should solve your problem. Use the code you already have and add something like this. It will determine where commas are in your string and delete them.
Use trim() to delete white spaces at the start or the end.
$a = strpos($line, ",");
$line = trim(substr($line, 55-$a));
I hope this helps you!

Highlight a sub string from middle of the string with preg_replace

I want to make the middle part of my number string bold.
I have a number string :
$nmbr="55113741659856";
I want to highlight 4 numbers in the middle, starting from the 6th position
......7416......
and replace them with bold letters
<b>7416</b>
My current code is failing to do what I want
$nmbr="55113741659856";
preg_replace("/d+([0-9]{4,6})/i","<b>$1</b>",$nmbr);
Your help is much appreciated.
Thanks.
I want to highlight 4 numbers in the middle, starting from the 6th position
I'd do:
$nmbr="55113741659856";
echo preg_replace("/^(\d{5})(\d{4})/", "$1<b>$2</b>", $nmbr); // 55113<b>7416</b>59856
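If the position and width need to vary, the same idea can be wrapped in a small function. This is a sketch only: highlightDigits() and its parameters are hypothetical names, not something from the original answer.

```php
<?php
// Hypothetical helper: bold $length digits after skipping the first $skip digits.
function highlightDigits(string $digits, int $skip, int $length): string
{
    // Builds e.g. /^(\d{5})(\d{4})/ for $skip = 5 and $length = 4.
    $pattern = sprintf('/^(\d{%d})(\d{%d})/', $skip, $length);

    // $1 puts the skipped digits back, $2 is the part wrapped in <b>.
    // Fall back to the input if preg_replace() reports an error (null).
    return preg_replace($pattern, '$1<b>$2</b>', $digits) ?? $digits;
}

echo highlightDigits('55113741659856', 5, 4); // 55113<b>7416</b>59856
```

If the string is shorter than $skip + $length digits, the pattern does not match and the input is returned unchanged.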
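You forgot the backslash in \d+.
preg_replace("/\d+([0-9]{4,6})/i","<b>$1</b>",$nmbr);
The reason is:
\d matches a digit, whereas a bare d matches the literal letter "d".
And you missed the \ here. Note that with the backslash restored, the greedy \d+ consumes every digit except the last four and the whole match is replaced, so the result is <b>9856</b>. To bold digits at a fixed position, the leading digits have to be captured and put back as well.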

Quick PHP regex for digit format

I just spent hours figuring out how to write a regular expression in PHP that I need to only allow the following format of a string to pass:
(any digit)_(any digit)
which would look like:
219211_2
so far I tried a lot of combinations, I think this one was the closest to the solution:
/(\\d+)(_)(\\d+)/
also if there was a way to limit the range of the last number (the one after the underline) to a certain amount of digits (ex. maximal 12 digits), that would be nice.
I am still learning regular expressions, so any help is greatly appreciated, thanks.
The following:
\d+_\d{1,12}(?!\d)
Will match "anywhere in the string". If you need to have it either "at the start", "at the end" or "this is the whole thing", then you will want to modify it with anchors
^\d+_\d{1,12}(?!\d) - must be at the start
\d+_\d{1,12}$ - must be at the end
^\d+_\d{1,12}$ - must be the entire string
demo: http://regex101.com/r/jG0eZ7
Explanation:
\d+ - at least one digit
_ - literal underscore
\d{1,12} - between 1 and 12 digits
(?!\d) - followed by "something that is not a digit" (negative lookahead)
The last thing is important otherwise it will match the first 12 and ignore the 13th. If your number happens to be at the end of the string and you used the form I originally had [^\d] it would fail to match in that specific case.
Thanks to #sln for pointing that out.
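As a rough illustration of the difference between the anchored and the unanchored forms, the sketch below wraps the whole-string pattern in a function and checks the lookahead variant on a longer string. isValidCode() is a hypothetical name and the sample inputs are made up.

```php
<?php
// Hypothetical helper: true only if the entire string is digits, "_", 1-12 digits.
function isValidCode(string $value): bool
{
    return preg_match('/^\d+_\d{1,12}$/', $value) === 1;
}

var_dump(isValidCode('219211_2'));             // valid
var_dump(isValidCode('219211_1234567890123')); // 13 digits after "_": rejected

// Unanchored search inside a longer string; (?!\d) stops the pattern
// from accepting just the first 12 digits of a 13-digit run.
$found   = preg_match('/\d+_\d{1,12}(?!\d)/', 'id 219211_2 ok');
$tooLong = preg_match('/\d+_\d{1,12}(?!\d)/', 'id 219211_1234567890123 ok');
```

Here $found is 1 and $tooLong is 0, because no prefix of the 13-digit run is followed by a non-digit.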
You don't need double escaping \\d in PHP.
Use this regex:
"/^(\d+)_(\d{1,12})$/"
\d{1,12} will match 1 to 12 digits
Better to use line start/end anchors to avoid matching unexpected input
Try this:
$regex = '~^(\d+)_(\d+)$~';
$input = '219211_2';
if (preg_match($regex, $input, $result)) {
    print_r($result);
}
Just try with following regex:
^(\d+)_(\d{1,12})$

Regex to remove year from a string PHP

I have created a database of cigarette and trading cards. Each set title has a year associated with it. For example, 1943 or 2011. The year is always 4 characters long, but can be anywhere in the string.
Could someone please help me create a regex that will find the year in the string. I tried '/d{4}\b/' but it is failing.
(19|20)[0-9][0-9]
This will read in only 1900 and 2000 ranged dates.
Try this one :
/\b\d{4}\b/
It will match exactly 4 digits bounded by non-word characters (or by the start or end of the string).
d{4}\b will match four d's at a word boundary. You forgot the backslash in the character class: should be \d{4}\b. Depending on the input data you may also want to consider adding another word boundary (\b) at the beginning.
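A sketch of the word-boundary version used to strip the year from a title. removeYear() is a hypothetical helper, the sample titles are invented, and restricting the year to 19xx or 20xx is an assumption about the data.

```php
<?php
// Hypothetical helper: remove a standalone 19xx/20xx year from a set title.
function removeYear(string $title): string
{
    // \b on both sides leaves longer digit runs such as "120045" untouched;
    // the leading \s* also removes the whitespace in front of the year.
    $withoutYear = preg_replace('/\s*\b(?:19|20)\d{2}\b/', '', $title);

    // preg_replace() returns null on error; fall back to the original title.
    return trim($withoutYear ?? $title);
}

echo removeYear('1943 Wills Cigarette Cards'), "\n"; // Wills Cigarette Cards
echo removeYear('Topps 2011 Baseball'), "\n";        // Topps Baseball
echo removeYear('Set 120045 Cards'), "\n";           // Set 120045 Cards
```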
Here is a full solution:
$stringWithYear = '1990 New York Marathon';
$stringNoYear = preg_replace('/(19|20)[0-9][0-9]/', '', $stringWithYear);
echo trim($stringNoYear); // outputs 'New York Marathon'
This works too: preg_match("/^1[0-9]{3}$/", $value)
It only matches when the whole value is a four-digit year starting with 1. You could change it according to your requirements.
