Php preg_match issue not working - php

I am trying to find a php preg_match that can match:
"2-20 to 2-25"
from this text:
user levels 2-20 to 2-25 not ready
I tried
preg_match("/([0-9]+) to ([0-9]+)/", $vars[1] , $matchesto);
but the result is:
"20 to 2"
Any help appreciated.

Your pattern is almost correct; just include the dashes and adjust the capture group:
([-0-9]+ to [-0-9]+)
Example:
https://regex101.com/r/eD6lQ2/1

That's because [0-9]+ matches one or more digits but won't match a hyphen (-).
Try this:
$pattern = '~([0-9]+-[0-9]+) to ([0-9]+-[0-9]+)~';
preg_match($pattern, $vars[1], $matchesto);
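To make the difference concrete, here is a minimal sketch that runs the pattern from the question and the hyphen-aware pattern against the same sample text. The variable names $text, $before and $after are only illustrative.

```php
<?php
$text = 'user levels 2-20 to 2-25 not ready';

// Pattern from the question: digits only, so the match starts after the
// first hyphen and stops before the second one.
preg_match('/([0-9]+) to ([0-9]+)/', $text, $before);

// Hyphen included on both sides of " to ".
preg_match('/([0-9]+-[0-9]+) to ([0-9]+-[0-9]+)/', $text, $after);

echo $before[0], "\n"; // 20 to 2
echo $after[0], "\n";  // 2-20 to 2-25
```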

You can use "\d" to match the digits:
<?php
$str = 'user levels 2-20 to 2-25 not ready';
$matches = array();
preg_match('/(\d+-\d+) to (\d+-\d+)/', $str, $matches);
var_dump($matches);
Output:
array(3) {
[0]=>
string(12) "2-20 to 2-25"
[1]=>
string(4) "2-20"
[2]=>
string(4) "2-25"
}

Related

What is the pattern to search for any string that follows the format "CEC0000-0000"?

The zeros can be any digits, but each part must have exactly four digits, so it could be CEC0152-2005,
with a "-" between the two parts, of course.
I used www.txt2re.com to generate this pattern, but it didn't help me.
Maybe,
^[A-Z]{3}[0-9]{4}-[0-9]{4}$
or,
^CEC[0-9]{4}-[0-9]{4}$
might work fine.
Test
$re = '/^[A-Z]{3}[0-9]{4}-[0-9]{4}$/m';
$str = 'CEC0152-2005
CEC0152-2019
CEC0152-1999
CEC0152-19991';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(3) {
[0]=>
array(1) {
[0]=>
string(12) "CEC0152-2005"
}
[1]=>
array(1) {
[0]=>
string(12) "CEC0152-2019"
}
[2]=>
array(1) {
[0]=>
string(12) "CEC0152-1999"
}
}
If you wish to simplify, modify, or explore the expression, regex101.com explains it in its top-right panel, and you can also see there how it matches against sample inputs.
If after the dash we'd have a four-digit year,
^[A-Z]{3}[0-9]{4}-[12][0-9]{3}$
^CEC[0-9]{4}-[12][0-9]{3}$
might also work fine, I guess.
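As a rough sketch of how the year-constrained variant could be used for validation: the helper name isCecCode is hypothetical, and using \z instead of $ is my own assumption, so that a trailing newline is not accepted.

```php
<?php
// Hypothetical helper: true when $code is "CEC", four digits, a hyphen,
// and a four-digit number starting with 1 or 2.
function isCecCode(string $code): bool
{
    // \z anchors at the very end of the string (assumption: no trailing newline allowed).
    return preg_match('/^CEC[0-9]{4}-[12][0-9]{3}\z/', $code) === 1;
}

var_dump(isCecCode('CEC0152-2005'));  // bool(true)
var_dump(isCecCode('CEC0152-19991')); // bool(false)
var_dump(isCecCode('CEC0152-3005'));  // bool(false)
```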

Finding hashtags in Text

Yes, there are lots of hashtag regexes available here, but none suits my needs, and none actually solves the problem.
The Regex should consider the following hashtags as valid:
#validhashtag
#valid_hashtag
#validhashtag_with_space_before_or_after
#valid_hashtag_chars_öÖäÄüÜß
...and not valid should be:
ipsum#notvalid //Not valid: Connected to Word
http://google.com/#results //Not valid: Same as above
#not-valid
#not!valid
Allowed Characters should be:
a-Z,0-9,öÖäÄüÜß,_
Max length should be 50 characters.
The main problem is the part where the hashtag is "connected" to another piece of text. I don't know how to solve that problem.
This is what I attempted to do
/([\p{Pc}\p{N}\p{L}\p{Mn}]{1,50})/u
That one works pretty well but doesn't handle the "word#hashtag" problem.
I think your original expression is pretty great; we'd just modify it to:
^\s*#([\p{Pc}\p{N}\p{L}\p{Mn}]{1,50})$
Test
$re = '/^\s*#([\p{Pc}\p{N}\p{L}\p{Mn}]{1,50})$/um';
$str = '#validhashtag
#valid_hashtag
#validhashtag_with_space_before_or_after
#valid_hashtag_chars_öÖäÄüÜß
ipsum#notvalid //Not valid: Connected to Word
http://google.com/#results //Not valid: Same as above
#not-valid
#not!valid';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
Output
array(4) {
[0]=>
array(2) {
[0]=>
string(13) "#validhashtag"
[1]=>
string(12) "validhashtag"
}
[1]=>
array(2) {
[0]=>
string(14) "#valid_hashtag"
[1]=>
string(13) "valid_hashtag"
}
[2]=>
array(2) {
[0]=>
string(41) " #validhashtag_with_space_before_or_after"
[1]=>
string(39) "validhashtag_with_space_before_or_after"
}
[3]=>
array(2) {
[0]=>
string(35) "#valid_hashtag_chars_öÖäÄüÜß"
[1]=>
string(34) "valid_hashtag_chars_öÖäÄüÜß"
}
}
You may use either of the two below:
/(?<!\S)#\w+(?!\S)/u
/(?<!\S)#[\w\p{M}\p{Pc}]+(?!\S)/u
If you want to restrict the length of the word part, keep your {1,50} quantifier: /(?<!\S)#\w{1,50}(?!\S)/u.
Also note: even with the u modifier, \w does not match the same chars that are considered "word" chars in .NET, Java, or Python re. You may decide to include other classes to fill the gap and use [\w\p{M}\p{Pc}]+ instead of just \w, where \p{M} matches any diacritics and \p{Pc} matches any connector punctuation.
Details
(?<!\S) - a whitespace or start of string required right before
# - a # sign
\w+ - 1+ word chars (NOTE: if you want to restrict the length to 1 to 50, replace + with {1,50}; also note that the u modifier lets the PCRE engine match any Unicode letters and digits with the \w shorthand)
[\w\p{M}\p{Pc}]+ - matches 1+ word chars, diacritics (\p{M}) and connector punctuation (\p{Pc}, considered word chars in .NET regex)
(?!\S) - a whitespace or end of string required right after.
PHP demo:
$s = "#validhashtag
#valid_hashtag
#validhashtag_with_space_before_or_after
#valid_hashtag_chars_öÖäÄüÜß
...and not valid shoulw be:
ipsum#notvalid //Not valid: Connected to Word
http://google.com/#results //Not valid: Same as above
#not-valid
#not!valid";
if (preg_match_all('~(?<!\S)#\w+(?!\S)~u', $s, $matches)) {
    print_r($matches[0]);
}
Output:
Array
(
[0] => #validhashtag
[1] => #valid_hashtag
[2] => #validhashtag_with_space_before_or_after
[3] => #valid_hashtag_chars_öÖäÄüÜß
)
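Putting the pieces together, the following is a sketch of a helper that applies the whitespace lookarounds, the wider character class, and the 50-character limit in one pattern. The function name extractHashtags and the sample sentence are made up for illustration, and the capture group (used to return the tag without the leading #) is my own addition.

```php
<?php
// Hypothetical helper: returns the hashtags in $text without the leading "#".
function extractHashtags(string $text): array
{
    // (?<!\S) : start of string or whitespace before "#"
    // {1,50}  : at most 50 word chars, diacritics, or connector punctuation
    // (?!\S)  : end of string or whitespace after the tag
    preg_match_all('/(?<!\S)#([\w\p{M}\p{Pc}]{1,50})(?!\S)/u', $text, $m);
    return $m[1];
}

$tags = extractHashtags('intro #valid_hashtag ipsum#notvalid #not-valid #grüße');
print_r($tags); // valid_hashtag, grüße
```

A tag longer than 50 characters is rejected as a whole rather than truncated, because the trailing (?!\S) fails when more word characters follow.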

PHP preg_split, split by same characters

I'm trying to split a string with preg_split. Here's an example of the string:
111235622411
I want the output to be like this:
$arr[0] = "111";
$arr[1] = "2";
$arr[2] = "3";
$arr[3] = "5";
$arr[4] = "6";
$arr[5] = "22";
$arr[6] = "4";
$arr[7] = "11";
So if there's the same characters one after the other, I want them in the same "chunk". I just can't come up with the regular expression I should use. I'm sorry if some of the terms are wrong, because it has been some time since I coded PHP before.
I would use preg_match_all():
$string = '111235622411';
preg_match_all('/(.)\1*/', $string, $matches);
var_dump($matches[0]);
\1 references the previously captured group (.) (any single character). This feature is called a backreference. The regex repeats the previously matched character; the greedy * makes it match as many identical characters as possible, which is what the question asks for.
Output:
array(8) {
[0]=>
string(3) "111"
[1]=>
string(1) "2"
[2]=>
string(1) "3"
[3]=>
string(1) "5"
[4]=>
string(1) "6"
[5]=>
string(2) "22"
[6]=>
string(1) "4"
[7]=>
string(2) "11"
}
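The same backreference idea works for any characters, not just digits. A small sketch, wrapped in a hypothetical helper named chunkRuns; the s modifier is my own addition so that "." also matches newlines.

```php
<?php
// Hypothetical helper: returns the runs of identical characters in $input.
function chunkRuns(string $input): array
{
    // (.) captures one character; \1* then repeats that same character.
    preg_match_all('/(.)\1*/s', $input, $matches);
    return $matches[0];
}

print_r(chunkRuns('111235622411')); // 111, 2, 3, 5, 6, 22, 4, 11
print_r(chunkRuns('aaabccdd'));     // aaa, b, cc, dd
```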
You can use this regex:
(.)(?=\1)\1+|\d
And instead of splitting it, take the matches.
$matches = null;
$returnValue = preg_match_all('/(.)(?=\\1)\\1+|\\d/', '111235622411', $matches);
$matches[0] will then contain what you want. As @hek2mgl has suggested, you can also use the simpler /(\d)\1*/.
A simple solution is to run preg_match_all with this regex:
(\d)\1*
Explanation of the regex:
(\d): first capturing group; \d matches a digit [0-9].
\1: matches the same text as most recently matched by the first capturing group.
*: quantifier, between zero and unlimited times.
The PHP code would be:
$re = "/(\\d)\\1*/";
$str = "111235622411";
preg_match_all($re, $str, $matches);
print_r($matches[0]);
You can access the first match, which is "111", as $matches[0][0], the second, which is "2", as $matches[0][1], and so on.
Hope it's useful!

PHP regex backreference not working

I wrote a regex pattern which works perfectly when I test it in Regexr, but when I use it in my PHP code it doesn't always match when it should match.
The regular expression, including some examples that should and shouldn't match.
Example PHP code that should match but doesn't:
preg_match('/^([~]{3,})\s*([\w-]+)?\s*(?:\{([\w-\s]+)\})?\s*(\2[\w-]+)?\s*$/', "~~~ {class} lang", $matches);
var_dump($matches);
I believe the problem is caused by the backreference in the last capture group (\2[\w-]+); however, I can't quite figure out how to fix this.
Because the backreference \2 refers to a group that did not participate in the match (group 2), it can never succeed here. So remove \2 from the regex.
^([~]{3,})\s*([\w-]+)?\s*(?:\{([-\w\s]+)\})?\s*([\w-]+)?\s*$
~~~ {class} lang
Group 1: ~~~
Group 2: missing (nothing between ~~~ and {class})
Group 3: class
Group 4: lang
The problem is caused by capturing group #2, you have made this group optional. So since it may or may not exist, you need to make your backreference optional as well or else it always looks for a required group.
However, since all groups are optional I would just recurse the subpattern of the second group.
^(~{3,})\s*([\w-]+)?\s*(?:{([^}]+)})?\s*((?2))?\s*$
Example:
$str = '~~~ {class} lang';
preg_match('/^(~{3,})\s*([\w-]+)?\s*(?:{([^}]+)})?\s*((?2))?\s*$/', $str, $matches);
var_dump($matches);
Output
array(5) {
[0]=> string(16) "~~~ {class} lang"
[1]=> string(3) "~~~"
[2]=> string(0) "" # Returns "" for optional groups that don't exist
[3]=> string(5) "class"
[4]=> string(4) "lang"
}
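The difference between a backreference and subpattern recursion can be seen in a smaller, made-up pattern: \1 must repeat the exact text that group 1 captured, while (?1) re-runs the pattern of group 1 and accepts any text that pattern matches. The variable names are illustrative only.

```php
<?php
// \1 must repeat the exact text captured by group 1.
$sameText = preg_match('/^(\d+)-\1$/', '12-12');
$otherText = preg_match('/^(\d+)-\1$/', '12-345');

// (?1) re-runs the subpattern \d+ and accepts any digits.
$recursed = preg_match('/^(\d+)-(?1)$/', '12-345');

var_dump($sameText);  // int(1)
var_dump($otherText); // int(0)
var_dump($recursed);  // int(1)
```

This is also why the recursion-based pattern accepts '~~~ lang {class} lang' as well as strings where the two language names differ.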
The other answers helped me figure out why it wasn't working. However, both of them would give a positive match for $str = '~~~ lang {class} lang'; which I didn't want.
I fixed it by changing capturing group 2 to ([\w-]*) so that even if there is no string at that place, the capturing group exists but remains empty. This way all of the following strings match:
$str = '~~~ lang {no-lines float left} ';
$str = '~~~ {class} ';
$str = '~~~ lang';
$str = '~~~ {class } lang ';
$str = '~~~';
$str = '~~~lang{class}';
But this one won't:
$str = '~~~ css {class} php';
Full solution:
$str = '~~~ {class} lang';
preg_match('/^([~]{3,})\s*([\w-]*)\s*(?:\{([\w\s-]+)\})?\s*(\2[\w-]+)?\s*$/', $str, $matches);
var_dump($matches);
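The reason this works can be shown with a reduced, made-up pattern. In PCRE, a backreference to a group that did not participate in the match fails, whereas a group that participated and captured an empty string lets the backreference match the empty string.

```php
<?php
// (a)? : when "a" is absent the group does not participate, so \1 fails.
$optionalGroup = preg_match('/^(a)?\1b$/', 'b');

// (a?) : the group always participates and may capture "", so \1 matches "".
$optionalContent = preg_match('/^(a?)\1b$/', 'b');

var_dump($optionalGroup);   // int(0)
var_dump($optionalContent); // int(1)
```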

Need Regexp help PHP

I have, for example, strings such as "7-th Road", "7th number some other words" or "Some word 8-th word".
I need to get the first occurrence of a number and all the symbols that follow it, up to the first space.
So for the examples above I need the values "7-th", "7th", "8-th".
And then, from matches like "7-th", I need to extract only the numbers for other operations.
Thanks in advance!
The regex should be /(\d+)([^\d\s]+)\s/; the number ends up in group 1 ($1) and the trailing characters in group 2 ($2).
Sample Code:
$string = '7-th Road';
preg_match_all('/(\d+)([^\d\s]+)\s/', $string, $result, PREG_PATTERN_ORDER);
var_dump($result[1]);
array(1) {
[0]=> string(1) "7"
}
var_dump($result[2]);
array(1) {
[0]=> string(3) "-th"
}
Are you asking for something like this?
#(\d+)-?(?:st|nd|rd|th)#
If you would like to get just the numbers from the text, use:
preg_match_all('/(\d+)(?:-?th)?/', '7-th", "7th", "8-th', $matches);
But if you would like to remove 'th' or a similar suffix, just do a replacement:
preg_replace('/(\d+)-?th/', '$1', 'some string');
Not sure about the last one...
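For the two-step task in the question (first the token up to the next space, then only its digits), one possible sketch does both in a single preg_match call. The helper name firstNumberToken and the returned array keys are hypothetical, and it assumes the token of interest always starts at the first digit in the string.

```php
<?php
// Hypothetical helper: finds the first token that starts with a digit.
function firstNumberToken(string $text): ?array
{
    // \d+ is the first run of digits; \S* extends the match to the next whitespace.
    if (preg_match('/(\d+)\S*/', $text, $m) !== 1) {
        return null; // no digit in the text
    }
    return ['token' => $m[0], 'number' => (int) $m[1]];
}

foreach (['7-th Road', '7th number some other words', 'Some word 8-th word'] as $s) {
    $r = firstNumberToken($s);
    echo $r['token'], ' => ', $r['number'], "\n";
}
// 7-th => 7
// 7th => 7
// 8-th => 8
```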
