Retain Delimiters when Splitting String - php

Edit: OK, I can't read, thanks to Col. Shrapnel for the help. If anyone comes here looking for the same thing to be answered...
print_r(preg_split('/([\!|\?|\.|\!\?])/', $string, null, PREG_SPLIT_DELIM_CAPTURE));
Is there any way to split a string on a set of delimiters, and retain the position and character(s) of the delimiter after the split?
For example, using delimiters of ! ? . !? turning this:
$string = 'Hello. A question? How strange! Maybe even surreal!? Who knows.';
into this
array('Hello', '.', 'A question', '?', 'How strange', '!', 'Maybe even surreal', '!?', 'Who knows', '.');
Currently I'm trying to use print_r(preg_split('/([\!|\?|\.|\!\?])/', $string)); to capture the delimiters as a subpattern, but I'm not having much luck.

Your comment sounds like you've found the relevant flag, but your regex was a little off, so I'm going to add this anyway:
preg_split('/(!\?|[!?.])/', $string, null, PREG_SPLIT_DELIM_CAPTURE);
Note that this will leave spaces at the beginning of every string after the first, so you'll probably want to run them all through trim() as well.
Results:
$string = 'Hello. A question? How strange! Maybe even surreal!? Who knows.';
print_r(preg_split('/(!\?|[!?.])/', $string, null, PREG_SPLIT_DELIM_CAPTURE));
Array
(
[0] => Hello
[1] => .
[2] => A question
[3] => ?
[4] => How strange
[5] => !
[6] => Maybe even surreal
[7] => !?
[8] => Who knows
[9] => .
[10] =>
)

From PHP 8.1, passing null as the limit parameter of preg_split() is deprecated because an integer is expected. To allow an unlimited number of elements in the returned array, use 0 or -1 instead. (Demo)
To avoid empty elements in the returned array, I recommend PREG_SPLIT_NO_EMPTY as an additional flag. (Demo)
var_export(
preg_split(
'/(!\?|[!?.])/',
$string,
0,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
)
);
Since PHP8, it is technically possible to omit the limit parameter and declare flags by using named parameters.
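As a rough sketch of the named-parameter form (assuming PHP 8.0 or later; the variable names are only illustrative), the limit can be skipped entirely, and trim() can tidy up the leading spaces that the split leaves behind:

```php
<?php
$string = 'Hello. A question? How strange! Maybe even surreal!? Who knows.';

// Named argument (PHP 8.0+): skip $limit and pass only the flags.
$parts = preg_split(
    '/(!\?|[!?.])/',
    $string,
    flags: PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

// Remove the space left at the start of each sentence after the first.
$parts = array_map('trim', $parts);

var_export($parts);
```

On older PHP versions the same call works by passing 0 or -1 as the third argument instead of the named flags argument.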

Simply add the PREG_SPLIT_DELIM_CAPTURE to the preg_split function:
$str = 'Hello. A question? How strange!';
$var = preg_split('/([!?.])/', $str, 0, PREG_SPLIT_DELIM_CAPTURE);
$var = array(
0 => "Hello",
1 => ".",
2 => " A question",
3 => "?",
4 => " How strange",
5 => "!",
6 => "",
);

You can also split on the space after a ., !, ? or !?. But this can only be used if you can guarantee that there is a space after such a character.
You can do this by matching a space that is preceded by a positive lookbehind, (?<=\.|\?|!), which makes the regex
'/(?<=\.|\?|!) /'
Then check whether each resulting string ends with !?: if so, split off the last two characters as the delimiter. If not, split off the last character.
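A minimal sketch of that approach, assuming every sentence ends with one of the delimiters and is followed by a single space (the variable names are illustrative):

```php
<?php
$string = 'Hello. A question? How strange! Maybe even surreal!? Who knows.';

$result = [];
// Split on a space that directly follows ., ? or ! (the space itself is discarded).
foreach (preg_split('/(?<=\.|\?|!) /', $string) as $sentence) {
    // The delimiter is two characters long for "!?", otherwise one.
    $length = substr($sentence, -2) === '!?' ? 2 : 1;
    $result[] = substr($sentence, 0, -$length); // the text
    $result[] = substr($sentence, -$length);    // the delimiter
}

var_export($result);
```

Because the space is consumed by the split, no trim() pass is needed with this variant.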

Related

Check string words start from #

I have a string and need to get the list of all words that start with #
$string = "Hello #bablu, This is my friend #roshan. Say hi to all. Also, I introduce 1 friend that is amit#gmail.com.";
Now in this string, I need to get only bablu and roshan, not amit#gmail.com, because that is an email address. I tried explode() on #, but it splits the email address too.
$explode = explode('#',$string);
print_r($explode);
How can I get only # words in PHP?
[
0 => "",
1 => "bablu",
2 => "",
3 => "roshan",
4 => "amit",
5 => "gmail.com"
]
My expected answer would be:
[
0 => "bablu",
1 => "roshan"
]
explode() won't help here; all you need is preg_match_all:
$string = "Hello #bablu, This is my friend #roshan. Say hi to all. Also, I introduce 1 friend that is amit#gmail.com.";
preg_match_all('/\B#([a-zA-Z]+)/', $string, $matches);
print_r($matches[1]);
Output:
Array
(
[0] => bablu
[1] => roshan
)
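The \B matches at a position that is not a word boundary. In amit#gmail.com the # directly follows a word character, so there is a word boundary before it and the match fails, which is why the email address is ignored.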
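It can be done like this, by adding a space before the #:
$explode = explode(' #', $string);
Note that this only avoids splitting the email address; each element after the first begins with the tagged word but also contains the text that follows it.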

How do I split the letters and numbers to 2 arrays from a string in PHP

I would like to know how to split both letters and numbers in 2 separate arrays, for example if I have $string = "2w5d15h9s";, then I want it to become
$letters = ["w", "d", "h", "s"];
$numbers = [2, 5, 15, 9];
Anybody got any idea of how to do it?
I'm basically trying to make a ban command and I want to make it so you can specify a time for the ban to expire.
Use preg_split:
$string = "2w5d15h9s";
$letters = preg_split("/\d+/", $string);
array_shift($letters);
print_r($letters);
$numbers = preg_split("/[a-z]+/", $string);
array_pop($numbers);
print_r($numbers);
This prints:
Array
(
[0] => w
[1] => d
[2] => h
[3] => s
)
Array
(
[0] => 2
[1] => 5
[2] => 15
[3] => 9
)
Note that I am using array_shift or array_pop above to remove empty array elements which arise from the regex split. These empty entries occur because, for example, when splitting on digits the first character is a digit, which leaves behind an empty array element to the left of the first actual letter.
Using preg_split() with a PREG_SPLIT_NO_EMPTY flag is the most concise way that I can think of.
In terms of efficiency, preg_ functions are not famous for being fast. Scanning the string should arguably be done in one pass, but these micro-optimization considerations are probably not worth toiling over for such small input strings.
As for the patterns, \d+ means one or more consecutive digit characters and \D+ means one or more consecutive non-digit characters. The 0 is the limit parameter, which merely informs the function to have no limit (split the string as many times as it can).
Code: (Demo)
$string = "2w5d15h9s";
var_export(preg_split('/\d+/', $string, 0, PREG_SPLIT_NO_EMPTY)); // get non-numbers
var_export(preg_split('/\D+/', $string, 0, PREG_SPLIT_NO_EMPTY)); // get numbers
Output:
array (
0 => 'w',
1 => 'd',
2 => 'h',
3 => 's',
)
array (
0 => '2',
1 => '5',
2 => '15',
3 => '9',
)
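The single-pass idea mentioned above could be sketched with preg_match_all() and two capture groups. This is a swapped-in alternative rather than the answer's own method, and it assumes the input is always a number followed by exactly one lowercase letter:

```php
<?php
$string = "2w5d15h9s";

// One scan: group 1 collects the digits, group 2 the unit letter.
preg_match_all('/(\d+)([a-z])/', $string, $matches);

$numbers = array_map('intval', $matches[1]); // cast "15" to 15, etc.
$letters = $matches[2];

var_export($numbers);
var_export($letters);
```

Casting with intval gives integers as in the question's desired $numbers array, whereas preg_split() returns numeric strings.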

Split string after each number

I have a database full of strings that I'd like to split into an array. Each string contains a list of directions that begin with a letter (U, D, L, R for Up, Down, Left, Right) and a number to tell how far to go in that direction.
Here is an example of one string.
$string = "U29R45U2L5D2L16";
My desired result:
['U29', 'R45', 'U2', 'L5', 'D2', 'L16']
I thought I could just loop through the string, but I don't know how to tell if the number is one or more digits in length.
You can use preg_split to break up the string, splitting on something which looks like a U,L,D or R followed by numbers and using the PREG_SPLIT_DELIM_CAPTURE to keep the split text:
$string = "U29R45U2L5D2L16";
print_r(preg_split('/([UDLR]\d+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY));
Output:
Array (
[0] => U29
[1] => R45
[2] => U2
[3] => L5
[4] => D2
[5] => L16
)
Demo on 3v4l.org
A regular expression should help you:
<?php
$string = "U29R45U2L5D2L16";
preg_match_all("/[A-Z]\d+/", $string, $matches);
var_dump($matches);
Because this task is about text extraction and not about text validation, you can merely split on the zero-width position after one or more digits. In other words, match one or more digits, then forget them with \K so that they are not consumed while splitting.
Code: (Demo)
$string = "U29R45U2L5D2L16";
var_export(
preg_split(
'/\d+\K/',
$string,
0,
PREG_SPLIT_NO_EMPTY
)
);
Output:
array (
0 => 'U29',
1 => 'R45',
2 => 'U2',
3 => 'L5',
4 => 'D2',
5 => 'L16',
)
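If the direction and the distance are needed separately later on, a possible follow-up is to capture them as two groups. This is only a sketch: the $moves structure is a hypothetical choice, not something the question asks for.

```php
<?php
$string = "U29R45U2L5D2L16";

// PREG_SET_ORDER yields one [full match, direction, distance] array per step.
preg_match_all('/([UDLR])(\d+)/', $string, $matches, PREG_SET_ORDER);

$moves = [];
foreach ($matches as [, $direction, $distance]) {
    $moves[] = [$direction, (int) $distance];
}

var_export($moves);
```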

How to preg_split without losing a character?

I have a string like this
$string = "Hello; how are you;Hey, I am fine";
$new = preg_split("/;\w/", $string);
print_r($new);
I am trying to split the string only when there is no white-space between the words and ";". But when I do this, I lose the H from Hey. It's probably because the split happens through the recognition of ;H. Could someone tell me how to prevent this?
My output:
$array = [
0 => [
0 => 'Hello; how are you ',
1 => 0,
],
1 => [
0 => 'ey, I am fine',
1 => 21,
],
]
You might use a word boundary \b:
\b;\b
$string = "Hello; how are you;Hey, I am fine";
$new = preg_split("/\b;\b/", $string);
print_r($new);
Demo
Or a negative lookahead and negative lookbehind
(?<! );(?! )
Demo
Lookarounds cost more steps. In terms of pattern efficiency, a word boundary is better and maintains the intended "no-length" character consumption.
In well-formed English, you won't ever have to check for a space before a semi-colon, so only 1 word boundary seems sufficient (I don't know if malformed English is possible because it is not represented in your sample string).
If you want to acquire the offset value, preg_split() has a flag for that.
Code: (Demo)
$string = "Hello; how are you;Hey, I am fine";
$new = preg_split("/;\b/", $string, -1, PREG_SPLIT_OFFSET_CAPTURE);
var_export($new);
Output:
array (
0 =>
array (
0 => 'Hello; how are you',
1 => 0,
),
1 =>
array (
0 => 'Hey, I am fine',
1 => 19,
),
)
Use preg_split with the regex ;(?=\w); then you will not lose the H.
You are consuming the \w in your regex. You don't want that. Therefore, do this:
$new = preg_split("/;(?=\w)/", $string);
A capture group is defined in parentheses, but (?= ... ) is a lookahead: it asserts that the following text matches without consuming or capturing it.
Check it out here https://3v4l.org/Q77LZ
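To compare the patterns suggested in the answers above, a small sketch can run each of them against the sample string (the array keys are just labels chosen for illustration):

```php
<?php
$string = "Hello; how are you;Hey, I am fine";

$patterns = [
    'word boundaries'      => '/\b;\b/',
    'negative lookarounds' => '/(?<! );(?! )/',
    'positive lookahead'   => '/;(?=\w)/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // None of these patterns consumes the character after the semicolon.
    $results[$label] = preg_split($pattern, $string);
}

var_export($results);
```

All three keep the H for this input; they can differ on other inputs, for example when a space precedes the semicolon.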

PHP Regex with parentheses

I have the below string:
test 13 (8) end 12 (14:48 IN 1ST)
I need the output to be:
14:48 IN 1ST
or everything inside parentheses towards the end of the string.
I don't need, 8 which is inside first set of parentheses. There can be multiple sets of parentheses in the string. I only need to consider everything inside the last set of parentheses of input string.
Regex Explanation
.* Greedily consume everything up to the last set of parentheses
\( Starts with (
([^)]*) Captures 0 or more characters except )
\) Ends with )
preg_match
$str = "test 13 (8) end 12 () (14:48 IN 1ST) asd";
$regex = "/.*\(([^)]*)\)/";
preg_match($regex,$str,$matches);
$matches
array (
0 => 'test 13 (8) end 12 () (14:48 IN 1ST)',
1 => '14:48 IN 1ST',
)
Accept Empty preg_match_all
$str = "test 13 (8) end 12 () (14:48 IN 1ST) asd";
$regex = "/\(([^)]*)\)/";
preg_match_all($regex,$str,$matches);
$matches
array (
0 =>
array (
0 => '(8)',
1 => '()',
2 => '(14:48 IN 1ST)',
),
1 =>
array (
0 => '8',
1 => '',
2 => '14:48 IN 1ST',
),
)
Don't Accept Empty preg_match_all
$str = "test 13 (8) end 12 () (14:48 IN 1ST) asd";
$regex = "/\(([^)]+)\)/";
preg_match_all($regex,$str,$matches);
$matches
array (
0 =>
array (
0 => '(8)',
1 => '(14:48 IN 1ST)',
),
1 =>
array (
0 => '8',
1 => '14:48 IN 1ST',
),
)
I wouldn't use a regex for this, it's unnecessary.
Use strrpos and substr to extract the string that you need. It's simple, straightforward, and achieves the desired output.
It works by finding the last '(' in the string, and removing one character from the end of the string.
$str = "test 13 (8) end 12 (14:48 IN 1ST)";
echo substr( $str, strrpos( $str, '(') + 1, -1);
$str = "(dont want to capture the (8)) test 13 (8) end 12 (14:48 IN 1ST)";
echo substr( $str, strrpos( $str, '(') + 1, -1);
Demo
Edit: I should also note that my solution will work for all of the following cases:
Empty parentheses
One set of parentheses (i.e. the string before the desired grouping does not contain parentheses)
More than three sets of parentheses (as long as the desired grouping is located at the end of the string)
Any text following the last parenthesis grouping (per the edits below)
Final edit: Again, I cannot emphasize enough that using a regex for this is unnecessary. Here's an example showing that string manipulation is 3x - 7x faster than using a regex.
As per MetaEd's comments, my example / code can easily be modified to ignore text after the last parenthesis.
$str = "test 13 (8) end 12 (14:48 IN 1ST) fkjdsafjdsa";
$beginning = substr( $str, strrpos( $str, '(') + 1);
echo substr( $beginning, 0, strpos( $beginning, ')')) . "\n";
STILL faster than a regex.
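The string-function approach above could be wrapped in a small helper that also copes with missing parentheses. This is a sketch only: the function name lastParenthesized and the null-on-failure convention are assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: returns the text inside the last "(...)" group,
// or null when no complete group is found.
function lastParenthesized(string $str): ?string
{
    $open = strrpos($str, '(');        // position of the last "("
    if ($open === false) {
        return null;
    }
    $close = strpos($str, ')', $open); // first ")" after that "("
    if ($close === false) {
        return null;
    }
    return substr($str, $open + 1, $close - $open - 1);
}

echo lastParenthesized("test 13 (8) end 12 (14:48 IN 1ST) fkjdsafjdsa"), "\n";
```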
I would go with the following regex:
.*\(([^)]+)\)
\(.*\) will match the first and last parens. To prevent that, begin with .* which will greedily consume everything up to the final open paren. Then put a capture group around what you want to output, and you have:
.*\((.*)\)
This regex will do: .+\((.+?)\)$
Escape the parentheses, make the + non-greedy with ?, and make sure it's at the end of the line.
If there may be characters after it, try this instead:
.\).+\((.+?)\)
Which basically makes sure only the second parentheses will match. I would still prefer the first.
The easiest thing would be to split the string on ')' and then just grab everything from the last item in the resulting array up till '('... I know it's not strictly regex but it's close enough.
"test 13 (8) end 12 (14:48 IN 1ST)".split(/\)/);
This will produce an array whose non-empty elements are...
"test 13 (8"
and
" end 12 (14:48 IN 1ST"
(followed by a trailing empty string, because the input ends with ')').
Notice that no matter how many (xyz) you have in there you will end up with the last one in the last non-empty array item.
Then you just look through that last item for a '(' and if it's there grab everything behind it.
I suspect this will work faster than a straight regex approach, but I haven't tested, so can't guarantee that... regardless it does work.