Find all occurrences of an "unknown" substring in a string with PHP

I have a string and I need to find all occurrences of some substrings in it, but I know only the initial characters of those substrings. How can I do this?
Example:
$my_string = "This is a text cointaining [substring_aaa attr], [substring_bbb attr] and [substring], [substring], [substring] and I'll try to find them!";
I know all substrings begin with '[substring' and end with a space char (before attr) or ']' char, so in this example I need to find substring_aaa, substring_bbb and substring and count how many occurrences for each one of them.
The result would be an associative array with the substrings as keys and the number of occurrences as values, for example:
$result = array(
'substring' => 3,
'substring_aaa' => 1,
'substring_bbb' => 1
);

Match [substring and then NOT ] zero or more times and then a ]:
preg_match_all('/\[(substring[^\]]*)\]/', $my_string, $matches);
$matches[1] will yield:
Array
(
[0] => substring_aaa attr
[1] => substring_bbb attr
[2] => substring
[3] => substring
[4] => substring
)
Then you can count the values:
$result = array_count_values($matches[1]);
After rereading the question, if you don't want what comes after a space (attr in this case) then:
preg_match_all('/\[(substring[^\]\s]*)[\]\s]/', $my_string, $matches);
For which $matches[1] will yield:
Array
(
[0] => substring_aaa
[1] => substring_bbb
[2] => substring
[3] => substring
[4] => substring
)
With the array_count_values yielding:
Array
(
[substring_aaa] => 1
[substring_bbb] => 1
[substring] => 3
)
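The two steps above (capture the names, then count them) can be wrapped in one small function. This is only a sketch: the function name countPrefixedTags and its parameters are made up for illustration, and it assumes the prefix should be matched literally, hence the preg_quote() call.

```php
<?php
// Hypothetical helper: count every bracketed name that starts with $prefix.
function countPrefixedTags(string $text, string $prefix): array
{
    // preg_quote() escapes regex metacharacters in the prefix; '/' is the delimiter.
    $pattern = '/\[(' . preg_quote($prefix, '/') . '[^\]\s]*)[\]\s]/';
    preg_match_all($pattern, $text, $matches);

    // $matches[1] holds the captured names; count how often each one occurs.
    return array_count_values($matches[1]);
}

$my_string = "This is a text containing [substring_aaa attr], [substring_bbb attr] and [substring], [substring], [substring] and I'll try to find them!";
$result = countPrefixedTags($my_string, 'substring');
print_r($result);
```

The keys appear in the order in which each name is first seen in the string.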

Related

Regex to get string between curly bracket tags in PHP

$HTML:
{list:start:Data}
{id} is having a title of {title}
{list:end:Data}
Data is dynamic and could be any string.
I am trying to loop over all the occurrences with the following code:
preg_match_all('/\{list:start:(.*?)\}(.*?)\{list:end:(.*?)\}/', $HTML, $match);
I want the following result:
$match = array(
array(
"string" => "Data",
"value" => "{id} is having a title of {title}"
)
);
but I get the following result:
$match = array(
[0] => Array
(
)
[1] => Array
(
)
[2] => Array
(
)
[3] => Array
(
)
);
So that isn't working, as $match contains only empty arrays. After a few hours of searching for a solution I am still no closer to a working result.
As an alternative to using .*? with the /s modifier (which makes the dot match newlines), you can use a negated character class.
If you don't want a match to run across lines that start with {list:, a negative lookahead rules out those matches.
^{list:start:([^}]+)}\R((?:(?!{list:).*\R)*+){list:end:[^}]+}
The pattern matches:
^ Start of string (or start of a line when the /m modifier is used, as in the example code)
{list:start: Match literally (Note that the { does not need to be escaped)
( Capture group 1
[^}]+ Match 1+ times any char except }
) Close group 1
} Match the closing }
\R match any unicode newline sequence
( Capture group 2
(?:(?!{list:).*\R)*+ Repeat matching whole lines as long as they do not start with {list:
) Close group 2
{list:end: Match literally
[^}]+ Match 1+ times any char except }
} Match the closing }
Example code
$re = '/^{list:start:([^}]+)}\R((?:(?!{list:).*\R)*){list:end:[^}]+}/m';
$str = '{list:start:Data}
{id} is having a title of {title}
{list:end:Data}
{list:start:Data}
{list:start:Data}
{id} is having a title of {title}
this is some text
{list:end:Data}';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r(array_map(function($x){
return [
"string" => $x[1],
"data" => trim($x[2])
];
}, $matches));
Output
Array
(
[0] => Array
(
[string] => Data
[data] => {id} is having a title of {title}
)
[1] => Array
(
[string] => Data
[data] => {id} is having a title of {title}
this is some text
)
)
Keep the braces escaped and add the /s modifier so that . also matches newlines and the pattern can span more than a single line. Below is a code example.
Code
<?php
$input = '
{list:start:Data}
{id} is having a title of {title}
{list:end:Data}
{list:start:Data1}
{id1} is having a title of {title1}
{list:end:Data1}
{list:start:Data2}
{id2} is having a title of {title2}
{list:end:Data2}
{list:start:Data3}
{id3} is having a title of {title3}
{list:end:Data3}
';
preg_match_all(
"/\\{list:start:(.+?)\\}(.*?)\\{list:end:(.+?)\\}/s",
$input,
$preg_matches
);
$matches = [];
foreach ($preg_matches[1] as $k => $v) {
$matches[] = [
"string" => trim($v),
"data" => trim($preg_matches[2][$k])
];
}
print_r($matches);
Output
Array
(
[0] => Array
(
[string] => Data
[data] => {id} is having a title of {title}
)
[1] => Array
(
[string] => Data1
[data] => {id1} is having a title of {title1}
)
[2] => Array
(
[string] => Data2
[data] => {id2} is having a title of {title2}
)
[3] => Array
(
[string] => Data3
[data] => {id3} is having a title of {title3}
)
)
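Both answers accept any name in the end tag. If the end tag should carry the same name as the start tag, a backreference can enforce that. The following is a sketch of that variation, not something either answer proposed; the sample input and the variable names $sets and $blocks are invented for the example.

```php
<?php
$input = "{list:start:Data}\n{id} is having a title of {title}\n{list:end:Data}\n"
       . "{list:start:Other}\n{name}\n{list:end:Other}\n";

// \1 requires the end tag to repeat the name captured by the start tag.
// The /s modifier lets . match newlines, so the body may span several lines.
preg_match_all(
    '/\{list:start:([^}]+)\}(.*?)\{list:end:\1\}/s',
    $input,
    $sets,
    PREG_SET_ORDER
);

$blocks = [];
foreach ($sets as $set) {
    // $set[1] is the tag name, $set[2] is everything between the two tags.
    $blocks[] = ['string' => $set[1], 'value' => trim($set[2])];
}
print_r($blocks);
```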

Regex - how to split string by commas, omitting commas in brackets

I have a string, say:
$str = "myTemplate, testArr => [1868,1869,1870], testInteger => 3, testString => 'test, can contain a comma'";
It basically represents a comma delimited list of parameters I need to parse.
I need to split this string in PHP (probably using preg_match_all) by commas (but omitting those in brackets and quotes) so the end result would be array of the following four matches:
myTemplate
testArr => [1868,1869,1870]
testInteger => 3
testString => 'test, can contain a comma'
The problem is with the array and string values. So any commas inside [ ] or ' ' or " " should not be considered as a delimiter.
There are many similar questions here, but I wasn't able to get it working for this particular situation. What would be the correct regex to get this result? Thank you!
You can use this lookaround based regex:
$str = "myTemplate, testArr => [1868,1869,1870], testInteger => 3, testString => 'test, can contain a comma'";
$arr = preg_split("/\s*,\s*(?![^][]*\])(?=(?:(?:[^']*'){2})*[^']*$)/", $str);
print_r( $arr );
There are 2 lookarounds used in this regex:
(?![^][]*\]) - Asserts comma is not inside [...]
(?=(?:(?:[^']*'){2})*[^']*$) - Asserts comma is not inside '...'
PS: This is assuming we don't have unbalanced/nested/escaped quotes and brackets.
Output:
Array
(
[0] => myTemplate
[1] => testArr => [1868,1869,1870]
[2] => testInteger => 3
[3] => testString => 'test, can contain a comma'
)
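A different way to get the same split, shown here only as a sketch, is to let the regex consume each bracketed or quoted span and throw it away with the PCRE verbs (*SKIP)(*F), so that only the remaining commas act as delimiters. It shares the assumption stated above: no nested or escaped brackets and quotes. The variable names are chosen for the example.

```php
<?php
$str = "myTemplate, testArr => [1868,1869,1870], testInteger => 3, testString => 'test, can contain a comma'";

// The first three alternatives match a whole [...], '...' or "..." span and then
// force a failure with (*SKIP)(*F); the engine resumes after the span, so commas
// inside it are never tried as delimiters. The last alternative is the real delimiter.
$pattern = '/\[[^\]]*\](*SKIP)(*F)|\'[^\']*\'(*SKIP)(*F)|"[^"]*"(*SKIP)(*F)|\s*,\s*/';

$parts = preg_split($pattern, $str);
print_r($parts);
```

Unlike the lookahead version, this one also skips commas inside double quotes, which the question mentions as a possibility.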
I would do it like this:
<?php
$str = "myTemplate, testArr => [1868,1869,1870], testInteger => 3, testString => 'test, can contain a comma'";
$pattern[0] = "[a-zA-Z]+,"; // textonly entry
$pattern[1] = "\w+\s*?=>\s*\[.*\]\s*,?"; // array type entry with value enclosed in square brackets
$pattern[2] = "\w+\s*?=>\s*\d+\s*,?"; // array type entry with decimal value
$pattern[3] = "\w+\s*?=>\s*\'.*\'\s*,?"; // array type entry with string value
$regex = implode('|', $pattern);
preg_match_all("/$regex/", $str, $matches);
// You can also use the one-liner commented out below if you don't want to use the array
//preg_match_all("/[a-zA-Z]+,|\w+\s*?=>\s*\[.*\]\s*,?|\w+\s*?=>\s*\d+\s*,?|\w+\s*?=>\s*\'.*\'\s*,?/", $str, $matches);
print_r($matches);
This is easier to manage and I can easily add/remove patterns if needed. It will output like
Array
(
[0] => Array
(
[0] => myTemplate,
[1] => testArr => [1868,1869,1870],
[2] => testInteger => 3,
[3] => testString => 'test, can contain a comma'
)
)

How to split a string containing numbers inside or outside parentheses using preg_match_all

I have a string that looks something like this:
535354 345356 3543674 34667 2345347 -3536 4532452 (234536 2345634 -4513453) (2345 -13254 13545)
The text between () is always at the end of the string (at least for now).
I need to split it into an array similar to this:
[0] => [0] 535354,345356,3543674,34667,2345347,-3536,4532452
[1] => [0] 234536,2345634,-4513453
=> [1] 2345,-13254,13545
What expression should I use for preg_match_all?
The best I could get with my limited knowledge is /([0-9]{1,}){1,}.*(?=(\(.*\)))/U but I still get some unwanted elements.
You may use a regex that will match chunks of numbers outside of parentheses and those inside with "~(?<=\()\s*$numrx\s*(?=\))|\s*$numrx~" where a $numrx stands for the number regex (that can be enhanced further).
The -?\d+(?:\s+-?\d+)* matches an optional -, 1 or more digits, and then 0+ sequences of 1+ whitespaces followed with optional - and 1+ digits. (?<=\()\s*$numrx\s*(?=\)) matches the same only if preceded with ( and followed with ).
See this PHP snippet:
$s = "535354 345356 3543674 34667 2345347 -3536 4532452 (234536 2345634 -4513453) (2345 -13254 13545)";
$numrx = "-?\d+(?:\s+-?\d+)*";
preg_match_all("~(?<=\()\s*$numrx\s*(?=\))|\s*$numrx~", $s, $m);
$res = array();
foreach ($m[0] as $k) {
array_push($res,explode(" ",trim($k)));
}
print_r($res);
Output:
Array
(
[0] => Array
(
[0] => 535354
[1] => 345356
[2] => 3543674
[3] => 34667
[4] => 2345347
[5] => -3536
[6] => 4532452
)
[1] => Array
(
[0] => 234536
[1] => 2345634
[2] => -4513453
)
[2] => Array
(
[0] => 2345
[1] => -13254
[2] => 13545
)
)
You can use this regex in preg_match_all:
$re = '/\d+(?=[^()]*[()])/';
RegEx Breakup:
\d+ # match 1 or more digits
(?= # lookahead start
[^()]* # match anything but ( or )
[()] # match ( or )
) # lookahead end
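Neither answer produces exactly the two-level layout asked for in the question (one entry for the numbers outside parentheses, one entry holding each parenthesized group). A possible sketch of that, assuming parentheses are never nested; the helper name splitNumbers is invented for the example:

```php
<?php
$s = "535354 345356 3543674 34667 2345347 -3536 4532452 (234536 2345634 -4513453) (2345 -13254 13545)";

// Hypothetical helper: split a chunk of whitespace-separated numbers into an array.
function splitNumbers(string $chunk): array
{
    return preg_split('/\s+/', trim($chunk), -1, PREG_SPLIT_NO_EMPTY);
}

// Capture the content of every (...) group.
preg_match_all('/\(([^()]*)\)/', $s, $m);

// Remove the groups; what is left is the text outside parentheses.
$outside = splitNumbers(preg_replace('/\([^()]*\)/', ' ', $s));

$result = [
    // [0]: the numbers outside parentheses, joined with commas
    [implode(',', $outside)],
    // [1]: one comma-joined string per parenthesized group
    array_map(fn($group) => implode(',', splitNumbers($group)), $m[1]),
];
print_r($result);
```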

PHP: split string based on array

Below is the data I'm trying to parse:
50‐59 1High300.00 Avg300.00
90‐99 11High222.00 Avg188.73
120‐1293High204.00 Avg169.33
The first section is a weight range, next is a count, followed by Highprice, ending with Avgprice.
As an example, I need to parse the data above into an array which would look like
[0]50-59
[1]1
[2]High300.00
[3]Avg300.00
[0]90-99
[1]11
[2]High222.00
[3]Avg188.73
[0]120‐129
[1]3
[2]High204.00
[3]Avg169.33
I thought about creating an array of what the possible weight ranges can be but I can't figure out how to use the values of the array to split the string.
$arr = array("10-19","20-29","30-39","40-49","50-59","60-69","70-79","80-89","90-99","100-109","110-119","120-129","130-139","140-149","150-159","160-169","170-179","180-189","190-199","200-209","210-219","220-229","230-239","240-249","250-259","260-269","270-279","280-289","290-299","300-309");
Any ideas would be greatly appreciated.
Hope this will work:
$string='50-59 1High300.00 Avg300.00
90-99 11High222.00 Avg188.73
120-129 3High204.00 Avg169.33';
$requiredData=array();
$dataArray=explode("\n",$string);
$counter=0;
foreach($dataArray as $data)
{
if(preg_match('#^([\d]+\-[\d]+) ([\d]+)([a-zA-Z]+[\d\.]+) ([a-zA-Z]+[\d\.]+)#', $data,$matches))
{
$requiredData[$counter][]=$matches[1];
$requiredData[$counter][]=$matches[2];
$requiredData[$counter][]=$matches[3];
$requiredData[$counter][]=$matches[4];
$counter++;
}
}
print_r($requiredData);
'#^([\d]+\-[\d]+) ([\d]+)([a-zA-Z]+[\d\.]+) ([a-zA-Z]+[\d\.]+)#'
I don't think that will work because of the space you have in the regex between the weight and count. The thing I'm struggling with is a row like this where there is no space. 120‐1293High204.00 Avg169.33 that needs to be parsed like [0]120‐129 [1]3 [2]High204.00 [3]Avg169.33
You are right. That can be remedied by limiting the number of weight digits to three and making the space optional.
'#^(\d+-\d{1,3}) *…
$arr = array('50-59 1High300.00 Avg300.00',
'90-99 11High222.00 Avg188.73',
'120-129 3High204.00 Avg169.33');
foreach($arr as $str) {
if (preg_match('/^(\d+-\d{1,3})\s*(\d+)(High\d+\.\d\d) (Avg\d+\.\d\d)/i', $str, $m)) {
array_shift($m); //remove group 0 (ie. the whole match)
$result[] = $m;
}
}
print_r($result);
Output:
Array
(
[0] => Array
(
[0] => 50-59
[1] => 1
[2] => High300.00
[3] => Avg300.00
)
[1] => Array
(
[0] => 90-99
[1] => 11
[2] => High222.00
[3] => Avg188.73
)
[2] => Array
(
[0] => 120-129
[1] => 3
[2] => High204.00
[3] => Avg169.33
)
)
Explanation:
/ : regex delimiter
^ : beginning of string
( : start group 1
\d+-\d{1,3} : 1 or more digits, a dash, and 1 to 3 digits, i.e. the weight range
) : end group 1
\s* : 0 or more whitespace characters
(\d+) : group 2, i.e. the count
(High\d+\.\d\d) : group 3, literal High followed by a price
(Avg\d+\.\d\d) : group 4, literal Avg followed by a price (preceded by a single space in the pattern)
/i : regex delimiter and case-insensitive modifier
To be more generic, you could replace High and Avg by [a-z]+
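As a rough illustration of that generic variant, here is the same loop with both labels replaced by [a-z]+. The input rows use the ASCII hyphen, and one of them has no space between weight range and count; $rows and $result are names picked for the example.

```php
<?php
$rows = [
    '50-59 1High300.00 Avg300.00',
    '120-1293High204.00 Avg169.33', // no space between weight range and count
];

$result = [];
foreach ($rows as $row) {
    // [a-z]+ (with /i) accepts any alphabetic label in place of High and Avg.
    if (preg_match('/^(\d+-\d{1,3})\s*(\d+)([a-z]+\d+\.\d\d) ([a-z]+\d+\.\d\d)/i', $row, $m)) {
        // Drop group 0 (the whole match) and keep the four captured fields.
        $result[] = array_slice($m, 1);
    }
}
print_r($result);
```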
This is a pattern you can trust:
/^((\d{0,2})0‐(?:\2)9) ?(\d{1,3})High(\d{1,3}\.\d{2}) ?Avg(\d{1,3}\.\d{2})/m
The other answers overlooked the digit pattern in the weight range substring. The range start integer always ends in 0, and the range end integer always ends in 9; the range always spans ten integers.
My pattern will capture the digits that precede the 0 in the starting integer and reference them immediately after the dash, then require that captured number to be followed by a 9.
I want to point out that your sample input was a little bit tricky because your ‐ is not the standard - that is between the 0 and = on my keyboard. This was a sneaky little gotcha for me to solve.
Method:
$text = '50‐59 1High300.00 Avg300.00
90‐99 11High222.00Avg188.73
120‐1293High204.00 Avg169.33';
preg_match_all(
'/^((\d{0,2})0‐(?:\2)9) ?(\d{1,3})High(\d{1,3}\.\d{2}) ?Avg(\d{1,3}\.\d{2})/m',
$text,
$matches,
PREG_SET_ORDER
);
var_export(
array_map(
fn($captured) => [
'weight range' => $captured[1],
'count' => $captured[3],
'Highprice' => $captured[4],
'Avgprice' => $captured[5]
],
$matches
)
);
Output:
array (
0 =>
array (
'weight range' => '50‐59',
'count' => '1',
'Highprice' => '300.00',
'Avgprice' => '300.00',
),
1 =>
array (
'weight range' => '90‐99',
'count' => '11',
'Highprice' => '222.00',
'Avgprice' => '188.73',
),
2 =>
array (
'weight range' => '120‐129',
'count' => '3',
'Highprice' => '204.00',
'Avgprice' => '169.33',
),
)
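The non-standard hyphen mentioned above can also be dealt with before matching, by normalizing it to the ASCII hyphen so that patterns written with a plain - apply. This is a sketch that assumes the character in the data is U+2010 (HYPHEN); other dash characters would need their own replacements.

```php
<?php
// "\u{2010}" is the Unicode HYPHEN character (escape syntax available since PHP 7).
$raw = "120\u{2010}1293High204.00 Avg169.33";

// Replace it with the ASCII hyphen-minus before matching.
$normalized = str_replace("\u{2010}", '-', $raw);

// A pattern written with the ASCII hyphen now matches the row.
$matched = preg_match(
    '/^(\d+-\d{1,3})\s*(\d+)(High\d+\.\d\d) (Avg\d+\.\d\d)/i',
    $normalized,
    $m
);
print_r($m);
```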

php regex split string by [%%%]

Hi, I need a preg_split regex that will split a string at substrings in square brackets.
This example input:
$string = 'I have a string containing [substrings] in [brackets].';
should provide this array output:
[0]= 'I have a string containing '
[1]= '[substrings]'
[2]= ' in '
[3]= '[brackets]'
[4]= '.'
After reading your revised question:
This might be what you want:
$string = 'I have a string containing [substrings] in [brackets].';
preg_split('/(\[.*?\])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
You should get:
Array
(
[0] => I have a string containing
[1] => [substrings]
[2] => in
[3] => [brackets]
[4] => .
)
Original answer:
preg_split('/%+/i', 'ot limited to 3 %%% so it can be %%%% or % or %%%%%, etc Tha');
You should get:
Array
(
[0] => ot limited to 3
[1] => so it can be
[2] => or
[3] => or
[4] => , etc Tha
)
Or if you want a minimum of 3 then try:
preg_split('/%%%+/i', 'Not limited to 3 %%% so it can be %%%% or % or %%%%%, etc Tha');
Have a go at http://regex.larsolavtorvik.com/
I think this is what you are looking for:
$array = preg_split('/(\[.*?\])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
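One detail worth knowing with this approach: when the string starts or ends with a bracketed part, preg_split returns empty strings at those positions. A small sketch of combining PREG_SPLIT_DELIM_CAPTURE with PREG_SPLIT_NO_EMPTY to drop them; the second sample string is made up for the example.

```php
<?php
// Keep the delimiters (the bracketed parts) and drop empty pieces.
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

$string = 'I have a string containing [substrings] in [brackets].';
$parts = preg_split('/(\[.*?\])/', $string, -1, $flags);

// A string that begins and ends with a bracketed part.
$edge = '[Tags] at both ends [here]';
$edgeParts = preg_split('/(\[.*?\])/', $edge, -1, $flags);

print_r($parts);
print_r($edgeParts);
```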
