Regex: Split string on number/string? - php

Consider the following:
700italic
regular
300bold
300bold900
Each of those is a separate example; only one of the rows will be processed at a time.
Expected outcome:
// 700italic
array(
0 => 700
1 => italic
)
// regular
array(
0 => regular
)
// 300bold
array(
0 => 300
1 => bold
)
// 300bold900
array(
0 => 300
1 => bold
2 => 900
)
I made the following:
(\d*)(\w*)
But it's not enough. It kind of works when I only have two "parts" (number|string or string|number), but if I add a third "segment" it won't work.
Any suggestions?

You could use preg_split instead. Then you can use lookarounds that match a position between a digit and a letter:
$result = preg_split('/(?<=\d)(?=[a-z])|(?<=[a-z])(?=\d)/i', $input);
Note that \w matches digits (and underscores), too, in addition to letters.
The alternative (using a matching function) is to use preg_match_all and match only digits or letters for every match:
preg_match_all('/\d+|[a-z]+/i', $input, $result);
Instead of captures you will now get a single match for each of the desired elements in the resulting array (in $result[0]). But you only want the array in the end, so you don't really care where they come from.
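A minimal sketch that runs both approaches from this answer over the four sample inputs from the question. The helper names splitByLookaround and splitByMatching are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: split at every digit/letter boundary using lookarounds.
function splitByLookaround(string $input): array
{
    return preg_split('/(?<=\d)(?=[a-z])|(?<=[a-z])(?=\d)/i', $input);
}

// Hypothetical helper: collect every run of digits or letters instead.
function splitByMatching(string $input): array
{
    preg_match_all('/\d+|[a-z]+/i', $input, $matches);
    return $matches[0];
}

foreach (['700italic', 'regular', '300bold', '300bold900'] as $input) {
    echo $input, ' => ', implode(' | ', splitByLookaround($input)), PHP_EOL;
}
// 700italic => 700 | italic
// regular => regular
// 300bold => 300 | bold
// 300bold900 => 300 | bold | 900
```

The two helpers only differ when the input contains other characters: for '300-bold' the split version returns the string unchanged as a single piece, while the matching version drops the hyphen and returns 300 and bold.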

You could use the PREG_SPLIT_DELIM_CAPTURE flag.
Example:
<?php
$key= "group123425";
$pattern = "/(\d+)/";
$array = preg_split($pattern, $key, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($array);
?>
Check this post as well.

You're looking for preg_split:
preg_split(
'((\d+|\D+))', $subject, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
)
Or preg_match_all:
preg_match_all('(\d+|\D+)', $test, $matches) && $matches = $matches[0];

You should match it instead of splitting it.
Still you can split it using
(?<=\d)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=\d)

You can use a pattern like this:
(\d*)([a-zA-Z]*)(\d*)
Or you can use preg_match_all with a pattern like this:
'/(?:[a-zA-Z]+|\d+)/'
Then you can match an arbitrary number of segments, each consisting of only letters or only digits.

Maybe something like this:
(\d*)(bold|italic|regular)(\d*)
or
(\d*)([a-zA-Z]*)(\d*)
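As a rough sketch of how the fixed three-group pattern could be used: groups that match nothing come back as empty strings, so they need to be filtered out. The function name splitThreeSegments is hypothetical, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper built on the fixed three-group pattern above.
function splitThreeSegments(string $input): array
{
    // Anchored so the whole string must be digits, letters, digits (each optional).
    if (!preg_match('/^(\d*)([a-zA-Z]*)(\d*)$/', $input, $m)) {
        return [];
    }
    // Drop the full match at index 0, then drop groups that matched nothing.
    $groups = array_slice($m, 1);
    return array_values(array_filter($groups, fn($s) => $s !== ''));
}

print_r(splitThreeSegments('300bold900')); // 300, bold, 900
print_r(splitThreeSegments('regular'));    // regular
```

A string with a fourth segment, such as 'bold300bold', does not match this pattern at all and yields an empty array here. That is the limit of a fixed number of groups.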

Related

How to capture substrings that start with a hashtag?

I'm looking for PHP code to split words from text using regex, preg_match(), preg_split(), or any other way.
$textstring="one #two three #four #five";
I need to get #two,#four, and #five saved as array elements.
Try this:
$text="one #two three #four #five";
$parts = array_filter(
explode(' ', $text), // split into words
function($word) {
// filter out all that don't start with '#' by keeping all that do
return strpos($word,"#")===0;
// returns true when the word starts with "#", false otherwise
}
);
print_r($parts);
You can see it here: https://3v4l.org/YBnu3
You may also want to read up on array_filter
Use a negated character class after the # symbol for highest pattern efficiency:
Pattern:
#[^ ]+ #this will match the hashtag then one or more non-space characters.
Code:
$in='one #two three #four #five';
var_export(preg_match_all('/#[^ ]+/',$in,$out)?$out[0]:'failed'); // access [0] of result
Output:
array (
0 => '#two',
1 => '#four',
2 => '#five',
)
Try splitting by this: \b #
preg_split("/\\b #/", $text)

RegEx Named Capturing Groups in PHP

I have the following regex to capture a list of numbers (it will be more complex than this eventually):
$list = '10,9,8,7,6,5,4,3,2,1';
$regex =
<<<REGEX
/(?x)
(?(DEFINE)
(?<number> (\d+) )
(?<list> (?&number)(,(?&number))* )
)
^(?&list)/
REGEX;
$matches = array();
if (preg_match($regex,$list,$matches)==1) {
print_r($matches);
}
Which outputs:
Array ( [0] => 10,9,8,7,6,5,4,3,2,1 )
How do I capture the individual numbers in the list in the $matches array? I don't seem to be able to do it, despite putting a capturing group around the digits (\d+).
EDIT
Just to make it clearer, I want to eventually use recursion, so explode is not ideal:
$match =
<<<REGEX
/(?x)
(?(DEFINE)
(?<number> (\d+) )
(?<member> (?&number)|(?&list) )
(?<list> \( ((?&number)|(?&member))(,(?&member))* \) )
)
^(?&list)/
REGEX;
The purpose of a (?(DEFINE)...) section is only to define named sub-patterns you can use later in the define section itself or in the main pattern. Since these sub-patterns are not defined in the main pattern they don't capture anything, and a reference (?&number) is only a kind of alias for the sub-pattern \d+ and doesn't capture anything either.
Example with the string: 1abcde2
If I use this pattern: /^(?<num>\d).....(?&num)$/ only 1 is captured in the group num; (?&num) doesn't capture anything, it's only an alias for \d. The pattern /^(?<num>\d).....\d$/ produces exactly the same result.
Another point to clarify: with PCRE (the PHP regex engine), a capture group (named or not) can only store one value, even if you repeat it.
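The one-value-per-group behaviour can be seen with a short sketch. The shortened list and the patterns are illustrative only.

```php
<?php
$list = '10,9,8';

// The group (\d+) is repeated, but PCRE keeps only its last iteration.
preg_match('/^(?:(\d+),?)+$/', $list, $m);
echo $m[1], PHP_EOL; // 8

// preg_match_all applies the pattern repeatedly, so every number is kept.
preg_match_all('/\d+/', $list, $all);
echo implode(' ', $all[0]), PHP_EOL; // 10 9 8
```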
The main problem of your approach is that you are trying to do two things at the same time:
you want to check the format of the string.
you want to extract an unknown number of items.
Doing this is only possible in particular situations, but impossible in general.
For example, with a flat list like: $list = '10,9,8,7,6,5,4,3,2,1'; where there are no nested elements, you can use a function like preg_match_all to reuse the same pattern several times in this way:
if (preg_match_all('~\G(\d+)(,|$)~', $list, $matches) && !end($matches[2])) {
// \G ensures that results are contiguous
// you have all the items in $matches[1]
// if the last item of $matches[2] is empty, this means
// that the end of the string is reached and the string
// format is correct
echo '<°)))))))>';
}
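A small sketch that wraps the \G-based check in a function, to show how a malformed list gets rejected. The name parseFlatList and the choice to return null on failure are assumptions made for this example.

```php
<?php
// Hypothetical wrapper around the \G-based check shown above.
function parseFlatList(string $list): ?array
{
    $ok = preg_match_all('~\G(\d+)(,|$)~', $list, $matches)
        && end($matches[2]) === '';
    return $ok ? $matches[1] : null;
}

var_dump(parseFlatList('10,9,8'));   // the three numbers as strings
var_dump(parseFlatList('10,9,x,8')); // NULL: the run of matches stops at "x"
var_dump(parseFlatList('10,9,'));    // NULL: trailing comma
```

Because of \G, each match has to start exactly where the previous one ended, so the first invalid character ends the run and the last separator captured is a comma rather than the empty end-of-string match.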
Now if you have a nested list like $list = '10,9,(8,(7,6),5),4,(3,2),1'; and you want for example to check the format and to produce a tree structure like:
[ 10, 9, [ 8, [ 7, 6 ], 5 ], 4 , [ 3, 2 ], 1 ]
You can't do it in a single pass. You need one pattern to check the whole string format and another pattern to extract elements (and a recursive function to use it).
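One possible shape for that two-pass approach, as a sketch only: a recursive pattern validates the format, then a simple token pattern plus a recursive function builds the tree. The names LIST_FORMAT, buildTree and parseList are invented for this example, and returning null for malformed input is an assumption.

```php
<?php
// Pass 1: a recursive pattern that only checks the format.
const LIST_FORMAT = '~
    (?(DEFINE)
        (?<item> \d+ | \( (?&list) \) )
        (?<list> (?&item) (?: , (?&item) )* )
    )
    \A (?&list) \z
~x';

// Pass 2 (hypothetical helper): consume tokens and build the nested array.
function buildTree(array &$tokens): array
{
    $tree = [];
    while ($tokens) {
        $token = array_shift($tokens);
        if ($token === '(') {
            $tree[] = buildTree($tokens); // descend into the sub-list
        } elseif ($token === ')') {
            return $tree;                 // sub-list finished
        } else {
            $tree[] = (int) $token;
        }
    }
    return $tree;
}

function parseList(string $list): ?array
{
    if (!preg_match(LIST_FORMAT, $list)) {
        return null; // malformed input
    }
    // Commas carry no structure once the format is known to be valid.
    preg_match_all('~\d+|[()]~', $list, $m);
    return buildTree($m[0]);
}

echo json_encode(parseList('10,9,(8,(7,6),5),4,(3,2),1')), PHP_EOL;
// [10,9,[8,[7,6],5],4,[3,2],1]
```

Validating first keeps the tree builder simple: it can assume balanced parentheses and never has to report errors itself.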
<<<FORGET_THIS_IMMEDIATELY
As an aside you can do it with eval and strtr, but it's a very dirty and dangerous way:
eval('$result=[' . strtr($list, '()', '[]') . '];');
FORGET_THIS_IMMEDIATELY;
If you mean to get an array of the comma-delimited numbers, then use explode:
$numbers = explode(',', $matches[0]); // first argument is the delimiter to split on, second is the string to split
print_r($numbers);
output:
Array
(
[0] => 10
[1] => 9
[2] => 8
...
)
For this simple list, this would be enough (if you have to use a regular expression):
$string = '10,9,8,7,6,5,4,3,2,1';
$pattern = '/([\d]+),?/';
preg_match_all($pattern, $string, $matches);
print_r($matches[1]);

Regular expression in PHP being too greedy on words

I know I'm just being simple-minded at this point but I'm stumped. Suppose I have a textual target that looks like this:
Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.
Using this RegExp: \s[A-Z][A-Z]\d\d\d\d\s, how would I extract, individually, the first and second occurrences of the matching strings? "JH6781" and "RB1223", respectively. I guarantee that the matching string will appear exactly twice in the target text.
Note: I do NOT want to change the existing string at all, so str_replace() is not an option.
Erm... how about using this regex:
/\b[A-Z]{2}\d{4}\b/
It means 'match boundary of a word, followed by exactly two capital English letters, followed by exactly four digits, followed by a word boundary'. So it won't match 'TGX7777' (word boundary is followed by three letters - pattern match failed), and it won't match 'TX77777' (four digits are followed by another digit - fail again).
And that's how it can be used:
$str = "Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.";
preg_match_all('/\b[A-Z]{2}\d{4}\b/', $str, $matches);
var_dump($matches[0]);
// array
// 0 => string 'JH6781' (length=6)
// 1 => string 'RB1223' (length=6)
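The boundary behaviour described in this answer can be checked on isolated candidates. This is only a quick sketch; the candidate list mixes the answer's counter-examples with IDs from the sample text.

```php
<?php
$pattern = '/\b[A-Z]{2}\d{4}\b/';

// Candidates from the explanation above, plus IDs from the sample sentence.
foreach (['JH6781', 'TGX7777', 'TX77777', 'T5677', 'HG766'] as $candidate) {
    $result = preg_match($pattern, $candidate) ? 'match' : 'no match';
    echo $candidate, ': ', $result, PHP_EOL;
}
// JH6781: match
// TGX7777: no match
// TX77777: no match
// T5677: no match
// HG766: no match
```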
$s='Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother\'s HG766 id was RB1223.';
$n=preg_match_all('/\b[A-Z][A-Z]\d\d\d\d\b/',$s,$m);
gives the result $n=2, then
print_r($m);
gives the result
Array
(
[0] => Array
(
[0] => JH6781
[1] => RB1223
)
)
You could use a combination of preg_match with the offset parameter (the 5th) and strpos to select the first and second occurrences.
Alternatively you could use preg_match_all and just use the first two array entries.
<?php
// $regex and $subject are assumed to be defined as in the question
preg_match($regex, $subject, $first);
// start the second search just past the beginning of the first match
$offset = strpos($subject, $first[0]) + 1;
preg_match($regex, $subject, $second, 0, $offset);
?>
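A variant of the offset approach that skips strpos by asking preg_match for the match position through the PREG_OFFSET_CAPTURE flag. This is a sketch; the subject is a shortened version of the sample sentence and the regex is borrowed from the other answers.

```php
<?php
$subject = "his T5677 id was JH6781 and his little brother's HG766 id was RB1223.";
$regex   = '/\b[A-Z]{2}\d{4}\b/';

// With PREG_OFFSET_CAPTURE each match is a [text, byte offset] pair.
preg_match($regex, $subject, $m, PREG_OFFSET_CAPTURE);
[$first, $firstPos] = $m[0];

// Start the second search right after the end of the first match.
preg_match($regex, $subject, $m, 0, $firstPos + strlen($first));
$second = $m[0];

echo $first, ' ', $second, PHP_EOL; // JH6781 RB1223
```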

How do i break string into words at the position of number

I have some string data with alphanumeric values, like us01name, phc01name and others, i.e. letters + number + letters.
I would like to get the leading letters + number in the first string and the remainder in the second.
How can I do it in PHP?
You can use a regular expression:
// if statement checks there's at least one match
if(preg_match('/([A-Za-z]+[0-9]+)([A-Za-z]+)/', $string, $matches) > 0){
$firstbit = $matches[1];
$nextbit = $matches[2];
}
Just to break the regular expression down into parts so you know what each bit does:
( Begin group 1
[A-Za-z]+ One or more letters (upper or lower case)
[0-9]+ One or more digits
) End group 1
( Begin group 2
[A-Za-z]+ One or more letters (upper or lower case)
) End group 2
Try this code:
preg_match('~([^\d]+\d+)(.*)~', "us01name", $m);
var_dump($m[1]); // 1st string + number
var_dump($m[2]); // 2nd string
OUTPUT
string(4) "us01"
string(4) "name"
Even this more restrictive regex will also work for you:
preg_match('~([A-Z]+\d+)([A-Z]+)~i', "us01name", $m);
You could use preg_split on the digits with the delimiter capture flag. It returns all pieces, so you'd have to put them back together. However, in my opinion it is more intuitive and flexible than a complete-pattern regex. Plus, preg_split() is underused :)
Code:
$str = 'user01jason';
$pieces = preg_split('/(\d+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($pieces);
Output:
Array
(
[0] => user
[1] => 01
[2] => jason
)
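Putting the pieces back together could look like the following sketch, which assumes the "letters + digits + letters" shape from the question. The lookaround alternative at the end is an addition not found in the original answer.

```php
<?php
$str = 'user01jason';
$pieces = preg_split('/(\d+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

// Assumes the "letters + digits + letters" shape from the question,
// so $pieces holds exactly three elements.
$first  = $pieces[0] . $pieces[1]; // leading letters plus the number
$second = $pieces[2];              // trailing letters

echo $first, ' / ', $second, PHP_EOL; // user01 / jason

// Alternative: split once at the digit-to-letter boundary, no rejoining needed.
[$a, $b] = preg_split('/(?<=\d)(?=\D)/', $str, 2);
echo $a, ' / ', $b, PHP_EOL; // user01 / jason
```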

Regex For Get Last URL

I have:
stackoverflow.com/.../link/Eee_666/9_uUU/66_99U
What regex would capture /Eee_666/9_uUU/66_99U?
Eee_666, 9_uUU, and 66_99U are random values.
How can I solve it?
As simple as that:
$link = "stackoverflow.com/.../link/Eee_666/9_uUU/66_99U";
$regex = '~link/([^/]+)/([^/]+)/([^/]+)~';
# captures anything that is not a / in three different groups
preg_match_all($regex, $link, $matches);
print_r($matches);
Be aware though that it eats up any character except the / (including newlines), so you either want to exclude other characters as well or feed the engine only strings with your format.
See a demo on regex101.com.
You can use \K here to make it more thorough.
stackoverflow\.com/.*?/link/\K([^/\s]+)/([^/\s]+)/([^/\s]+)
See demo.
https://regex101.com/r/jC8mZ4/2
In case you don't know the length of the string:
$string = 'stackoverflow.com/.../link/Eee_666/9_uUU/66_99U';
$regexp = '~([^/]+$)~';
result:
group1 = 66_99U
Be careful: it may also capture a trailing end-of-line character.
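The end-of-line caveat can be reproduced with a short sketch. The trailing "\n" is added here on purpose to simulate input read from a file, and the second pattern is one possible way around it, not the answer's own.

```php
<?php
// A URL read from a file or form often ends with a line break.
$string = "stackoverflow.com/.../link/Eee_666/9_uUU/66_99U\n";

preg_match('~([^/]+)$~', $string, $m);
var_dump($m[1]); // string(7): "66_99U" followed by the newline

// Excluding whitespace from the class keeps the line break out of the capture.
preg_match('~([^/\s]+)\s*$~', $string, $clean);
var_dump($clean[1]); // string(6) "66_99U"
```

Calling trim() on the input before matching is another way to get the same result.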
For this kind of requirement, it's simpler to use preg_split combined with array_slice:
$url = 'stackoverflow.com/.../link/Eee_666/9_uUU/66_99U';
$elem = array_slice(preg_split('~/~', $url), -3);
print_r($elem);
Output:
Array
(
[0] => Eee_666
[1] => 9_uUU
[2] => 66_99U
)
