Regex to truncate string on closest expression/space with length limit - php

I want to get these results (from -> to):
# use string length limit = 3
1 {2 3} -> 1 # the string between the {} must be whole
1 2 3 -> 1 2
1 23 -> 1
{1} -> {1}
{1 2} -> empty
123456 -> 123 # if there are no spaces, cut the string by characters (except for {*} expressions); not required, but it would be nice
# one more example. Use string length limit = 5
{1} 2 -> {1} 2
123 45 -> 123
123 4 -> 123 4
Is there a way to do this in PHP with a single regular expression?
The length limit may be dynamic.
Similar question: Get first 100 characters from string, respecting full words (but my question also requires that {*} expressions be kept whole).
I tried: ^(.{1,3})({.*}|\s|$)

The idea here is to define your atomic bits, match as many of them as possible, and use a negative lookbehind to cap the match length. The same lookbehind also drops trailing whitespace (not sure whether that is needed, but I figured I'd throw it in).
The only other thing is a conditional expression that checks whether the string is a single uninterrupted run of characters and, if so, cuts it naively (for your 123456 -> 123 example).
function truncate($string, $length)
{
$regex = <<<REGEX
/
(?(DEFINE)
(?<chars> [^\s{}]+ )
(?<group> { (?&atom)* } )
(?<atom> (?&chars) | (?&group) | \s )
)
\A
(?(?=.*[\s{}])
(?&atom)*(?<! \s | .{{$length}}. ) |
.{0,$length}
)
/x
REGEX;
preg_match($regex, $string, $matches);
return $matches[0];
}
$samples = <<<'DATA'
1 {2 3}
1 2 3
1 23
{1}
{1 2}
123456
DATA;
foreach (explode("\n", $samples) as $sample) {
var_dump(truncate($sample, 3));
}
Output:
string(1) "1"
string(3) "1 2"
string(1) "1"
string(3) "{1}"
string(0) ""
string(3) "123"
And:
$samples = <<<'DATA'
{1} 2
123 45
123 4
DATA;
foreach (explode("\n", $samples) as $sample) {
var_dump(truncate($sample, 5));
}
Outputs:
string(5) "{1} 2"
string(3) "123"
string(5) "123 4"
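Since the question mentions that the length limit may be dynamic, here is a minimal sketch of the same pattern built with sprintf, so that the limit is always inserted as an integer. The function name truncateAtoms and the fallback to an empty string when preg_match does not match are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical variant of truncate(): same pattern, limit inserted via sprintf.
function truncateAtoms(string $string, int $length): string
{
    $regex = sprintf('/
        (?(DEFINE)
            (?<chars> [^\s{}]+ )
            (?<group> { (?&atom)* } )
            (?<atom>  (?&chars) | (?&group) | \s )
        )
        \A
        (?(?=.*[\s{}])
            (?&atom)*(?<! \s | .{%1$d}. ) |
            .{0,%1$d}
        )
    /x', $length);

    // Fall back to an empty string if the pattern does not match at all.
    return preg_match($regex, $string, $m) === 1 ? $m[0] : '';
}

var_dump(truncateAtoms('1 {2 3}', 3)); // string(1) "1"
var_dump(truncateAtoms('1 {2 3}', 7)); // string(7) "1 {2 3}"
```

With a limit of 7 the whole {2 3} group fits, so the full string is returned; with a smaller limit the group is dropped as a unit.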

A solution using the preg_match_all function with a specific regex pattern (note that it hard-codes the length limit of 3):
$str = '1 {2 3}
1 2 3
1 23
{1}
{1 2}
123456 ';
$re = '/^(\S \S{1}(?=\s)|\S(?= \S{2})|\{\S\}|\w{3}(?=\w))/m';
preg_match_all($re, $str, $matches);
// $matches[0] holds the truncated items, one per matching input line (you can `implode` it to get a single string)
print_r($matches[0]);
The output:
Array
(
[0] => 1
[1] => 1 2
[2] => 1
[3] => {1}
[4] => 123
)
Regex demo (check "Explanation" section at the right side)

Try this one:
/^([\w ]{1,3}(?= )|\w{1,3}|\{\w\})/gm
(The g flag is regex101 notation; in PHP, use preg_match_all with only the m flag.)
It works with the given samples: https://regex101.com/r/iF2tSp/3
1 {2 3}
1 2 3
1 23
{1}
{1 2}
123456
Match 1
Full match 0-1 `1`
Group 1. n/a `1`
Match 2
Full match 8-11 `1 2`
Group 1. n/a `1 2`
Match 3
Full match 14-15 `1`
Group 1. n/a `1`
Match 4
Full match 19-22 `{1}`
Group 1. n/a `{1}`
Match 5
Full match 29-32 `123`
Group 1. n/a `123`
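The regex101 listing above can be reproduced in PHP roughly as follows. This is a sketch that assumes the six sample lines are joined with "\n"; the variable names are illustrative only.

```php
<?php
// The six sample lines from the question, joined with newlines.
$samples = implode("\n", ['1 {2 3}', '1 2 3', '1 23', '{1}', '{1 2}', '123456']);

// Same pattern as above; "g" is dropped because preg_match_all() already finds all matches.
$pattern = '/^([\w ]{1,3}(?= )|\w{1,3}|\{\w\})/m';

preg_match_all($pattern, $samples, $matches);

// Five matches: the "{1 2}" line produces none.
print_r($matches[0]);
```

Like the regex101 demo, this hard-codes the limit of 3 and only recognises a single word character inside the braces.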

Related

PHP string split regular

The pattern is (Digits)*(A|B|DF|XY)+(Digits)+
I'm really confused about how to write this pattern.
I want to split strings like these in PHP. Can someone help me?
My input may look like any of these:
A1234
B 1239
1A123
12A123
1A 1234
12 A 123
1234 B 123456789
12 XY 1234567890
and I want to convert it to this (shown for the last input):
Array
(
[0] => 12
[1] => XY
[2] => 1234567890
)
<?php
$input = "12 XY 123456789";
print_r(preg_split('/\d*[(A|B|DF|XY)+\d+]+/', $input, 3));
//print_r(preg_split('/[\s,]+/', $input, 3));
//print_r(preg_split('/\d*[\s,](A|B)+[\s,]\d+/', $input, 3));
You may match and capture the numbers, letters, and numbers:
$input = "12 XY 123456789";
if (preg_match('/^(?:(\d+)\s*)?(A|B|DF|XY)(?:\s*(\d+))?$/', $input, $matches)){
array_shift($matches);
print_r($matches);
}
See the PHP demo and the regex demo.
^ - start of string
(?:(\d+)\s*)? - an optional sequence of:
(\d+) - Group 1: one or more digits
\s* - 0+ whitespaces
(A|B|DF|XY) - Group 2: A, B, DF or XY
(?:\s*(\d+))? - an optional sequence of:
\s* - 0+ whitespaces
(\d+) - Group 3: one or more digits
$ - end of string.
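To check the breakdown above against several of the sample inputs, the pattern can be wrapped in a small helper. This is only a sketch: the name splitCode, the null return for non-matching input, and the padding to three elements are assumptions, not part of the answer.

```php
<?php
// Hypothetical helper: returns [leading digits, letter code, trailing digits],
// or null when the input does not fit the pattern.
function splitCode(string $input): ?array
{
    if (preg_match('/^(?:(\d+)\s*)?(A|B|DF|XY)(?:\s*(\d+))?$/', $input, $m)) {
        // Drop the full match; pad so an absent trailing number still yields three elements.
        return array_pad(array_slice($m, 1), 3, '');
    }
    return null;
}

foreach (['A1234', 'B 1239', '1A123', '12 A 123', '12 XY 1234567890'] as $input) {
    echo $input, ' => ', json_encode(splitCode($input)), "\n";
}
```

An optional group that did not take part in the match shows up as an empty string, so 'A1234' yields an empty first element.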

Regex (preg_split): how do I split based on a delimiter, excluding delimiters included in a pair of quotes?

I want to split this:
1 2 3 4/5/6 "7/8 9" 10
into this:
1
2
3
4
5
6
"7/8 9"
10
with preg_split()
So my question is, how do I split based on a delimiter, excluding delimiters inside a pair of quotes?
I kind of want to avoid capturing the things in quotes first and would ideally like it to be a one liner.
You can use the following.
$text = '1 2 3 4/5/6 "7/8 9" 10';
$results = preg_split('~"[^"]*"(*SKIP)(*F)|[ /]+~', $text);
print_r($results);
Explanation:
On the left side of the alternation operator, we match anything in quotes and then force that subpattern to fail with (*F). The backtracking control verb (*SKIP) stops the regex engine from retrying at any position inside the quoted substring, so the quoted part is skipped as a whole. The right side of the alternation operator matches one or more spaces or forward slashes that are not inside quotes.
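To see what the (*SKIP)(*F) branch changes, here is a small comparison with a plain delimiter split. The input string is made up for illustration.

```php
<?php
$text = 'a/b "c d/e" f';

// Naive split: delimiters inside the quotes are split on as well.
$naive = preg_split('~[ /]+~', $text);
// ['a', 'b', '"c', 'd', 'e"', 'f']

// Skip-and-fail: quoted runs are matched first, then discarded as split points.
$quoted = preg_split('~"[^"]*"(*SKIP)(*F)|[ /]+~', $text);
// ['a', 'b', '"c d/e"', 'f']

print_r($naive);
print_r($quoted);
```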
Output
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => "7/8 9"
[7] => 10
)
You can use:
$s = '1 2 3 4/5/6 "7/8 9" 10';
$arr = preg_split('~("[^"]*")|[ /]+~', $s, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
print_r( $arr );
OUTPUT:
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => "7/8 9"
[7] => 10
)
Another way, with an optional group:
$arr = preg_split('~(?:"[^"]*")?\K[/\s]+~', $s);
The pattern "[^"]*"[/\s]+ matches a quoted part followed by one or more whitespace characters or slashes. But since you don't want to remove quoted parts, you put a \K after the quoted part. The \K discards everything matched to its left from the match result. With this trick, when a quoted part is found, the regex engine reports only the whitespace or slashes after it, and the split happens on them.
Since there is not always a quoted part before a space or a slash, you only need to make it optional with a non-capturing group (?:...) and a question mark ?.
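The effect of \K can be made visible with PREG_OFFSET_CAPTURE. The following is a sketch with a shortened input; the variable names are illustrative.

```php
<?php
$s = '"7/8 9" 10';

// The quoted part is consumed, but \K resets the start of the reported match.
preg_match('~(?:"[^"]*")?\K[/\s]+~', $s, $m, PREG_OFFSET_CAPTURE);
var_dump($m[0]); // the single space at offset 7, not the quoted text before it

// The same pattern applied to the full example from the question.
$parts = preg_split('~(?:"[^"]*")?\K[/\s]+~', '1 2 3 4/5/6 "7/8 9" 10');
print_r($parts);
```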

Split 4 digit numbers

I want to split a number that has 4 integer digits and a 3-digit decimal part.
Inputs:
Input 1 : 5546.263
Input 2 : 03739.712 /* (sometimes there is a leading zero) */
Result: (array)
Result of input 1 : 0 => 55 , 1 => 46.263
Result of input 2 : 0 => 37 , 1 => 39.712
P.S.: The inputs are GPS data and always have 4 digits before the decimal point and 3 digits after it, sometimes with a leading zero.
You could use the following function:
function splitNum($num) {
$num = ltrim($num, '0');
$part1 = substr($num, 0, 2);
$part2 = substr($num, 2);
return array($part1, $part2);
}
Test case 1:
print_r( splitNum('5546.263') );
Output:
Array
(
[0] => 55
[1] => 46.263
)
Test case 2:
print_r( splitNum('03739.712') );
Output:
Array
(
[0] => 37
[1] => 39.712
)
^0*([0-9]{2})([0-9\.]+) should work just fine and do what you want:
$input = '03739.712';
if (preg_match('/^0*([0-9]{2})([0-9\.]+)/', $input, $matches)) {
$result = array((int)$matches[1], (float)$matches[2]);
}
var_dump($result); //array(2) { [0]=> int(37) [1]=> float(39.712) }
Regex autopsy:
^ - the string MUST start here
0* - the character '0' repeated 0 or more times
([0-9]{2}) - a capturing group matching a digit between 0 and 9 repeated exactly 2 times
([0-9\.]+) - a capturing group matching a digit between 0 and 9 OR a period repeated 1 or more times
Optionally you can add $ to the end to specify that "the string MUST end here"
Note: Keep the 0* part even though the first match is cast to an int. Without it, a leading zero would be consumed by the two-digit group, so 03739.712 would be split into 03 and 739.712.
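Putting the pieces of the autopsy together, including the optional $ anchor, could look like the sketch below. The helper name splitGps and the null return for input that does not fit are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns [first two digits as int, remainder as float],
// or null when the input does not fit the pattern.
function splitGps(string $input): ?array
{
    // Same pattern as above, with $ added so trailing characters are rejected.
    if (preg_match('/^0*([0-9]{2})([0-9.]+)$/', $input, $m)) {
        return [(int) $m[1], (float) $m[2]];
    }
    return null;
}

var_dump(splitGps('5546.263'));
var_dump(splitGps('03739.712'));
var_dump(splitGps('5546.263x')); // NULL: the $ anchor rejects the trailing "x"
```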

regex match between 2 strings

For example I have the text
a1aabca2aa3adefa4a
I want to extract 2 and 3 with a regex between abc and def, so 1 and 4 should be not included in the result.
I tried this
if(preg_match_all('#abc(?:a(\d)a)+def#is', file_get_contents('test.txt'), $m, PREG_SET_ORDER))
print_r($m);
I get this
Array
(
[0] => Array
(
[0] => abca2aa3adef
[1] => 3
)
)
But I want this
Array
(
[0] => Array
(
[0] => abca2aa3adef
[1] => 2
[2] => 3
)
)
Is this possible with one preg_match_all call? How can I do it?
Thanks
preg_match_all(
'/\d # match a digit
(?=.*def) # only if followed by <anything> + def
(?!.*abc) # and not followed by <anything> + abc
/x',
$subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
This works on your example. It assumes that there is exactly one instance of abc and one of def per line in your string.
The reason why your attempt didn't work is that your capturing group (\d) that matches the digit is within another, repeated group (?:a(\d)a)+. With every repetition, the result of the capture is overwritten. This is how regular expressions work.
In other words - see what's happening during the match:
Current position   Current part of regex     Capturing group 1
--------------------------------------------------------------
a1a                no match, advancing...    undefined
abc                abc                       undefined
a2a                (?:a(\d)a)                2
a3a                (?:a(\d)a) (repeated)     3 (overwrites 2)
def                def                       3
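The overwriting described above can be observed directly. The sketch below also shows a plain two-step alternative (capture the whole run, then collect the digits); that second part is a different technique from the lookahead pattern in this answer and uses two calls rather than one.

```php
<?php
$subject = 'a1aabca2aa3adefa4a';

// Repeated capturing group: only the last repetition survives in group 1.
preg_match('/abc(?:a(\d)a)+def/', $subject, $m);
var_dump($m[1]); // string(1) "3"

// Two-step alternative: capture the whole repeated run, then pull out each digit.
preg_match('/abc((?:a\da)+)def/', $subject, $outer);
preg_match_all('/\d/', $outer[1], $inner);
$digits = $inner[0];
print_r($digits); // 2 and 3
```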
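You ask whether it is possible with a single preg_match_all call.
It is, at least for this example.
The following code outputs what you want, but note that the pattern hard-codes exactly two digits between abc and def.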
<?php
$subject='a1aabca2aa3adefa4a';
$pattern='/abc(?:a(\d)a+(\d)a)def/m';
preg_match_all($pattern, $subject, $all_matches,PREG_OFFSET_CAPTURE | PREG_PATTERN_ORDER);
$res[0]=$all_matches[0][0][0];
$res[1]=$all_matches[1][0][0];
$res[2]=$all_matches[2][0][0];
var_dump($res);
?>
Here is the output:
array
0 => string 'abca2aa3adef' (length=12)
1 => string '2' (length=1)
2 => string '3' (length=1)

regex to match 3 parts from a given string

Example input:
hjkhwe5boijdfg
I need to split this into 3 variables as below:
hjkhwe5 (any length, always ends in some number (can be any number))
b (always a single letter, can be any letter)
oijdfg (everything remaining at the end, numbers or letters in any combination)
I've got the PHP preg_match all set up but have no idea how to write this regex. Could someone give me a hand?
Have a try with:
$str = 'hjkhwe5boijdfg';
preg_match("/^([a-z]+\d+)([a-z])(.*)$/", $str, $m);
print_r($m);
output:
Array
(
[0] => hjkhwe5boijdfg
[1] => hjkhwe5
[2] => b
[3] => oijdfg
)
Explanation:
^ : beginning of the string
( : 1st group
[a-z]+ : 1 or more letters
\d+ : followed by 1 or more digit
) : end of group 1
( : 2nd group
[a-z] : 1 letter
) : end group 2
( : 3rd group
.* : any number of any char
) : end group 3
$ : end of the string
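One thing the breakdown above implies is that [a-z] only matches lowercase letters. If the input may contain uppercase letters (an assumption, since the question only says "any letter"), the i modifier can be added, as in this sketch:

```php
<?php
$pattern = '/^([a-z]+\d+)([a-z])(.*)$/';

var_dump(preg_match($pattern, 'hjkhwe5boijdfg')); // int(1)
var_dump(preg_match($pattern, 'HJKHWE5Boijdfg')); // int(0): [a-z] does not match uppercase

// Appending the i modifier makes both character classes case-insensitive.
preg_match($pattern . 'i', 'HJKHWE5Boijdfg', $m);
$parts = array_slice($m, 1);
print_r($parts);
```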
You can use preg_match as:
$str = 'hjkhwe5boijdfg';
if(preg_match('/^(\D*\d+)(\w)(.*)$/',$str,$m)) {
// $m[1] has part 1, $m[2] has part 2 and $m[3] has part 3.
}
