For the life of me, I can't figure out how to write the regex to split this.
Let's say we have the sample text:
15HGH(Whatever)ASD
I would like to break it down into the following groups (numbers, individual letters, and the contents of parentheses):
15
H
G
H
Whatever
A
S
D
It can have any combination of the above such as:
15HGH
12ABCD
ABCD(Whatever)(test)
So far, I have managed to either break apart the numbers and letters or split off the parenthesized parts, but not both at once. For example, in this case:
<?php print_r(preg_split("/(\(|\))/", "5(Test)(testing)", -1, PREG_SPLIT_NO_EMPTY)); ?>
It will give me
Array
(
[0] => 5
[1] => Test
[2] => testing
)
I am not really sure what to put in the regex to match on only numbers and individual characters when combined. Any suggestions?
I don't know if preg_match_all will satisfy you:
$text = '15HGH(Whatever)ASD';
preg_match_all("/([a-z]+)(?=\))|[0-9]+|([a-z])/i", $text, $out);
echo '<pre>';
print_r($out[0]);
Array
(
[0] => 15
[1] => H
[2] => G
[3] => H
[4] => Whatever
[5] => A
[6] => S
[7] => D
)
I've got this (I don't know how the \n is written in the demo, but the substitution is working):
(\d+|\w|\([^)]++\)) There's not much to explain: it first tries to match a number, then a single character, and if neither matches, it tries to match a whole group between parentheses (they can't be nested).
Check this out using preg_match_all():
$string = '15HGH(Whatever)(Whatever)ASD';
preg_match_all('/\(([^\)]+)\)|(\d+)|([a-z])/i', $string, $matches);
$results = array_merge(array_filter($matches[1]),array_filter($matches[2]),array_filter($matches[3]));
print_r($results);
\(([^\)]+)\) --> Matches everything between parentheses
\d+ --> Numbers only
[a-z] --> Single letters only
i --> Case insensitive
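One thing to watch with the array_merge() line above: it concatenates the three capture groups one after another, so the tokens come back grouped by type rather than in the order they appear in the string, and array_filter() without a callback also drops a literal "0". A minimal sketch of one way around that, assuming a branch-reset group (?|...) so that every alternative captures into group 1 (the $tokens variable is only an illustrative name):

```php
<?php
$string = '15HGH(Whatever)(Whatever)ASD';

// (?|...) is a branch-reset group: each alternative numbers its capture
// group from 1, so $matches[1] holds the tokens in match order.
preg_match_all('/(?|\(([^)]+)\)|(\d+)|([a-z]))/i', $string, $matches);

$tokens = $matches[1];
print_r($tokens);
// 15, H, G, H, Whatever, Whatever, A, S, D
```

Because nothing is filtered afterwards, a token such as "0" survives as well.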
I have tried several times to write a pattern that validates that a given string is a natural number and splits it into single digits.
Given my limited understanding of regex, the closest thing I could come up with is:
^([1-9])([0-9])*$ or ^([1-9])([0-9])([0-9])*$ or something like that.
It only captures the first digit and the last digit, or the first, the second, and the last digit.
I wonder what I need to know to solve this problem. Thanks.
You may use a two step solution like
if (preg_match('~\A\d+\z~', $s)) { // if a string is all digits
print_r(str_split($s)); // Split it into chars
}
See a PHP demo.
A one step regex solution:
(?:\G(?!\A)|\A(?=\d+\z))\d
See the regex demo
Details
(?:\G(?!\A)|\A(?=\d+\z)) - either the end of the previous match (\G(?!\A)) or (|) the start of the string (\A) that is followed by 1 or more digits up to the end of the string ((?=\d+\z))
\d - a digit.
PHP demo:
$re = '/(?:\G(?!\A)|\A(?=\d+\z))\d/';
$str = '1234567890';
if (preg_match_all($re, $str, $matches)) {
print_r($matches[0]);
}
Output:
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => 7
[7] => 8
[8] => 9
[9] => 0
)
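To check that the two approaches agree, here is a small sketch that wraps each one in a function and compares the results for a valid, an invalid, and an empty input. The function names split_digits_two_step and split_digits_one_step are hypothetical, and returning an empty array for invalid input is an assumption made for the comparison:

```php
<?php
// Two-step version: validate first, then split into characters.
function split_digits_two_step(string $s): array
{
    return preg_match('~\A\d+\z~', $s) === 1 ? str_split($s) : [];
}

// One-step version: the \G-anchored pattern only starts matching
// when the whole string consists of digits.
function split_digits_one_step(string $s): array
{
    preg_match_all('/(?:\G(?!\A)|\A(?=\d+\z))\d/', $s, $matches);
    return $matches[0];
}

foreach (['1234567890', '12a4', ''] as $input) {
    // Prints bool(true) when both versions return the same array.
    var_dump(split_digits_two_step($input) === split_digits_one_step($input));
}
```

Both versions accept leading zeros; if a natural number must not start with 0, replacing \d+ with [1-9]\d* in the validation part is one option.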
I need to split my text into an array at every period, exclamation and question mark.
Example with a full-width period and exclamation mark:
$string = "日本語を勉強しているみんなを応援したいです。一緒に頑張りましょう!";
I am looking for the following output:
Array (
[0] => 日本語を勉強しているみんなを応援したいです。
[1] => 一緒に頑張りましょう! )
I need the same code to work with half-width.
Example with a mix of full-width and half-width:
$string = "Hi. I am Bob! Nice to meet you. 日本語を勉強しています。Do you understand me?";
Output:
Array (
[0] => Hi.
[1] => I am Bob!
[2] => Nice to meet you.
[3] => 日本語を勉強しています。
[4] => Do you understand me? )
I suck at regular expressions and can't figure out a solution nor find one.
I tried:
$string = preg_split('(.*?[。?!])', $string);
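First of all, you forgot your delimiters (most commonly a slash); PHP treats the outer parentheses of your pattern as the delimiters instead.
The \pP escape matches any Unicode punctuation character; you can see the rest of the special Unicode characters here.
You can split on \pP (remember the u modifier, meaning Unicode):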
<?php
$str = 'Hi. I am Bob! Nice to meet you. 日本語を勉強しています。Do you understand me?';
$array = preg_split('/(?<=\pP)\s*/u', $str, -1, PREG_SPLIT_NO_EMPTY); // -1 means no limit
print_r($array);
The PREG_SPLIT_NO_EMPTY is there to make sure that we don't include an empty match if your last character is punctuation.
Output:
Array
(
[0] => Hi.
[1] => I am Bob!
[2] => Nice to meet you.
[3] => 日本語を勉強しています。
[4] => Do you understand me?
)
Regex autopsy:
/ - the start delimiter - this must also come at the end before our modifiers
(?<=\pP) - a positive lookbehind matching \pP (a unicode punctuation - we could just use \pP, but then the punctuation would not be included in our final string - a positive lookbehind includes it)
\s* - a white space character matched 0 to infinity times - this is to make sure that we don't include the white space after the punctuation
/u - the end delimiter (/) and our modifier (u meaning "unicode")
Your first sentence would result in the following array:
Array
(
[0] => 日本語を勉強しているみんなを応援したいです。
[1] => 一緒に頑張りましょう!
)
Please note that this splits on all punctuation, including commas. For example, "This is my sentence, and it is very nice." becomes:
Array
(
[0] => This is my sentence,
[1] => and it is very nice.
)
This can be fixed by using a negative lookbehind in front of our positive lookbehind:
/(?<![,、;;"”\'’``])(?<=\pP)\s*/u
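A quick way to see the effect of the negative lookbehind is to run it on both kinds of input. This is only a sketch: it assumes the file is saved as UTF-8, and it uses a shortened exclusion class (comma, ideographic comma, semicolon) rather than the full list above.

```php
<?php
// Do not split after these marks, but do split after any other punctuation.
$pattern = '/(?<![,、;])(?<=\pP)\s*/u';

$plain = preg_split($pattern, 'This is my sentence, and it is very nice.', -1, PREG_SPLIT_NO_EMPTY);
print_r($plain); // one element: the comma no longer splits the sentence

$mixed = preg_split(
    $pattern,
    'Hi. I am Bob! Nice to meet you. 日本語を勉強しています。Do you understand me?',
    -1,
    PREG_SPLIT_NO_EMPTY
);
print_r($mixed); // five sentences, same as without the negative lookbehind
```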
I've been trying for a couple of days to split a string into letters and numbers. I've found various solutions, but they don't do what I expect: some of them only separate letters from digits and don't handle whole integers, floats, or negative numbers.
Here's an example:
$input = '-4D-3A'; // edit: the TEXT part can have multiple chars, i.e. -4AB-3A-5SD
$result = preg_split('/(?<=\d)(?=[a-z])|(?<=[a-z])(?=\d)/i', $input);
print_r($result);
Result:
Array ( [0] => -4 [1] => D-3 [2] => A )
And I need it to be [0] => -4 [1] => D [2] => -3 [3] => A
I've tried doing several changes but no result so far, could you please help me if possible?
Thank you.
Try this:
$input = '-4D-3A';
$result = preg_split('/(-?[0-9]+\.?[0-9]*)/i', $input, 0, PREG_SPLIT_DELIM_CAPTURE);
$result=array_filter($result);
print_r($result);
It will split by numbers BUT also capture the delimiter (the number), and array_filter() then removes the empty leading element,
giving: Array ( [1] => -4 [2] => D [3] => -3 [4] => A )
I've patterned a number as:
1. has optional negative sign (you may want to do + too)
2. followed by one or more digits
3. followed by an optional decimal point
4. followed by zero or more digits
Can anyone point out a solution to "-0." being accepted as a valid number?
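One possible answer to the "-0." question is to make the decimal point and the digits after it a single optional unit, so a point is only consumed when at least one digit follows. The sketch below assumes that rule is what is wanted; split_numbers_and_text is a hypothetical helper name. It also uses PREG_SPLIT_NO_EMPTY instead of array_filter(), which keeps the keys sequential and does not throw away a literal "0".

```php
<?php
// Hypothetical helper: split a string into signed numbers and the text between them.
function split_numbers_and_text(string $input): array
{
    // -?\d+(?:\.\d+)? : optional minus sign, digits, and a decimal point
    // only when at least one digit follows it.
    return preg_split(
        '/(-?\d+(?:\.\d+)?)/',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

print_r(split_numbers_and_text('-4D-3A'));      // -4, D, -3, A
print_r(split_numbers_and_text('-4AB-3A-5SD')); // -4, AB, -3, A, -5, SD
```

With this pattern, "-0." is read as the number -0 followed by a stray "." that stays in the text part.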
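How about this regex? (-?\d+|[a-zA-Z]+)
I tested the equivalent ([-]{,1}\d+|[a-zA-Z]+) on http://www.rubular.com/ and it seems to work as you want. Rubular runs Ruby, though; older PCRE versions used by PHP treat {,1} as literal text, so -? is the safer spelling there.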
I'm having some difficulty with preg_match. I'm trying to match roman numerals, like this:
$string='This is roman XI and some other ones: XMCIII, like this.XXVIII'."\n";
preg_match('/(\s|\.)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\s/',$string,$matches);
print_r($matches);
It should match any roman numeral preceded with whitespace or period and ending with whitespace. But it returns the following:
Array
(
[0] => XI
[1] =>
[2] =>
[3] => X
[4] => I
)
You have {0,4} or {0,3} ranges in your regex, which means that those parts are optional. You get spaces because space[nothing]space is a valid match.
You can simply filter the empty results out of your array using array_filter().
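An alternative to filtering afterwards is to stop the pattern from matching nothing in the first place. The sketch below is one way to do that, not the only one: it moves the surrounding whitespace/period into lookarounds and adds a lookahead that requires at least one numeral letter, then collects every numeral with preg_match_all().

```php
<?php
$string = 'This is roman XI and some other ones: XMCIII, like this.XXVIII' . "\n";

// (?<=[\s.])     preceded by whitespace or a period (not part of the match)
// (?=[MDCLXVI])  at least one numeral letter must follow, so no empty match
// (?=\s)         followed by whitespace (not part of the match)
$pattern = '/(?<=[\s.])(?=[MDCLXVI])M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})(?=\s)/';

preg_match_all($pattern, $string, $matches);
print_r($matches[0]); // XI, XXVIII
```

XMCIII is skipped because it is not a well-formed numeral and is followed by a comma rather than whitespace. Note that a standalone capital "I" in ordinary text would still be matched as the numeral 1.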
To split up a string, I came up with...
<?php
preg_match_all('/(\w)|(,.!?;)/', "I'm a little teapot, short and stout.", $matches);
print_r($matches[0]);
I thought this would separate each word (\w) and the specified punctuation (,.!?;).
For example: ["I'm", "a", "little", "teapot", ",", "short", "and", "stout", "."]
Instead I get:
Array
(
[0] => I
[1] => m
[2] => a
[3] => l
[4] => i
[5] => t
[6] => t
[7] => l
[8] => e
[9] => t
[10] => e
[11] => a
[12] => p
[13] => o
etc...
What am I doing wrong here?
Thanks in advance.
You have two faults:
The \w matches only a single character. You want to match multiple characters with \w+. Furthermore, \w matches only alphanumeric characters and the underscore. If you want to match other characters like ' you will need to include them: [\w'].
The (,.!?;) matches the character sequence ,.!?;. Instead you want to match any of these characters using [,.!?;].
The correct regex is:
'/[\w\']+|[,.!?;]/'
If you want to be more permissive you should use unicode character classes instead (allows letters, numbers, combining marks, dash characters and the apostrophe for words and punctuation for punctuation):
'/[\pL\pN\pM\p{Pd}\']+|\pP/u'
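As a quick check, both patterns can be run against the sentence from the question. This is a small sketch; the variable names $ascii and $unicode are only illustrative.

```php
<?php
$text = "I'm a little teapot, short and stout.";

// ASCII version: runs of word characters/apostrophes, or one punctuation mark.
preg_match_all('/[\w\']+|[,.!?;]/', $text, $ascii);

// Unicode version: \p{Pd} (dash punctuation) needs braces because it is a
// two-letter property name; the single-letter form \pP means all punctuation.
preg_match_all('/[\pL\pN\pM\p{Pd}\']+|\pP/u', $text, $unicode);

print_r($ascii[0]);
// I'm, a, little, teapot, ",", short, and, stout, "."
print_r($unicode[0]); // same tokens for this input
```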
Try this - sure it works as you want:
([\w]+)|[,.!?;]+
Also want to share with you one very useful service - online regex tester
You may want to try something like:
/([^,.!?; ]+)|([,.!?;])/