How to get (in PHP) all substrings that match a regular expression? - php

I'm generating a regular expression of the form A | B | C ... automatically, by a program,
where A, B, C, ... are constant strings.
I need to find all the matches for this regular expression,
even if A, B, C, ... overlap or one of them is a substring of another.
Example:
preg_match_all ('/Hello World|Hello|lo World/i', 'xxxHello worldxxx', $m, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
var_export ($m);
It gives:
array (
0 =>
array (
0 =>
array (
0 => 'Hello world',
1 => 3, // start of match
)
)
)
I would need:
array (
0 =>
array (
0 =>
array (
0 => 'Hello world',
1 => 3, // start of match
)
1 =>
array (
0 => 'Hello',
1 => 3, // start of match
)
2 =>
array (
0 => 'lo world',
1 => 6, // start of match
)
)
)
Is there any way to get it?
Thanks

Run a preg_match_all for each expression.
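A minimal sketch of that idea, assuming the alternatives are available as plain strings in an array (the names $needles and $found are illustrative and not from the question). Each string is escaped with preg_quote() and matched on its own, so hits that overlap with another alternative are all reported:

```php
<?php
$subject = 'xxxHello worldxxx';
$needles = ['Hello World', 'Hello', 'lo World']; // hypothetical list of constant strings

$found = [];
foreach ($needles as $needle) {
    // Escape the string so it is matched literally, then search case-insensitively.
    $pattern = '/' . preg_quote($needle, '/') . '/i';
    if (preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE)) {
        // With PREG_OFFSET_CAPTURE each full match is a [text, offset] pair.
        foreach ($m[0] as [$text, $offset]) {
            $found[] = [$text, $offset];
        }
    }
}
var_export($found);
```

Each alternative gets its own pass over the subject, so the cost grows with the number of strings. Occurrences of the same string that overlap each other are still not reported by this sketch.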

I would use strpos:
$str = 'xxxHello worldxxx';
$arr = array('Hello World', 'Hello', 'World');
foreach($arr as $word) {
$pos = strpos(strtolower($str), strtolower($word));
echo "$word found at char $pos\n";
}
output:
Hello World found at char 3
Hello found at char 3
World found at char 9
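Note that strpos() only reports the first occurrence of each word. If every occurrence is needed, one option is to loop with stripos() and its offset argument. This is only a sketch, and the helper name findAllPositions is made up for illustration:

```php
<?php
// Hypothetical helper: returns every (possibly overlapping) position of $needle in $haystack.
function findAllPositions(string $haystack, string $needle): array
{
    $positions = [];
    if ($needle === '') {
        return $positions; // an empty needle has no meaningful positions
    }
    $offset = 0;
    // stripos() is the case-insensitive variant of strpos().
    while (($pos = stripos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + 1; // advance by one so overlapping hits are found too
    }
    return $positions;
}

$str = 'xxxHello worldxxx hello';
foreach (['Hello World', 'Hello', 'World'] as $word) {
    echo "$word found at: " . implode(', ', findAllPositions($str, $word)) . "\n";
}
```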

Related

Find all occurrences of a "unknown" substring in a string with PHP

I have a string and I need to find all occurrences of some substrings in it, but I only know the initial characters of the substrings. How can I do this?
Example:
$my_string = "This is a text cointaining [substring_aaa attr], [substring_bbb attr] and [substring], [substring], [substring] and I'll try to find them!";
I know all substrings begin with '[substring' and end with a space char (before attr) or ']' char, so in this example I need to find substring_aaa, substring_bbb and substring and count how many occurrences for each one of them.
The result would be an associative array with the substrings as keys and the number of occurrences as values, for example:
$result = array(
'substring' => 3,
'substring_aaa' => 1,
'substring_bbb' => 1
)
Match [substring, then any character other than ] zero or more times, then a ]:
preg_match_all('/\[(substring[^\]]*)\]/', $my_string, $matches);
$matches[1] will yield:
Array
(
[0] => substring_aaa attr
[1] => substring_bbb attr
[2] => substring
[3] => substring
[4] => substring
)
Then you can count the values:
$result = array_count_values($matches[1]);
After rereading the question, if you don't want what comes after a space (attr in this case) then:
preg_match_all('/\[(substring[^\]\s]*)[\]\s]/', $my_string, $matches);
For which $matches[1] will yield:
Array
(
[0] => substring_aaa
[1] => substring_bbb
[2] => substring
[3] => substring
[4] => substring
)
With the array_count_values yielding:
Array
(
[substring_aaa] => 1
[substring_bbb] => 1
[substring] => 3
)

Convert string to array at different character occurrence

Consider I have this string: 'aaaabbbaaaaaabbbb'. I want to convert it to an array so that I get the following result:
$array = [
'aaaa',
'bbb',
'aaaaaa',
'bbbb'
]
How to go about this in PHP?
PHP code demo
Regex: (.)\1{1,}
(.): Match and capture a single character in group 1.
\1: Backreference to whatever group 1 captured.
\1{1,}: Match that same character one or more additional times.
<?php
ini_set("display_errors", 1);
$string="aaaabbbaaaaaabbbb";
preg_match_all('/(.)\1{1,}/', $string,$matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => aaaa
[1] => bbb
[2] => aaaaaa
[3] => bbbb
)
[1] => Array
(
[0] => a
[1] => b
[2] => a
[3] => b
)
)
Or:
PHP code demo
<?php
$string="aaaabbbaaaaaabbbb";
$array=str_split($string);
$start=0;
$end= strlen($string);
$indexValue=$array[0];
$result=array();
$resultantArray=array();
while($start!=$end)
{
if($indexValue==$array[$start])
{
$result[]=$array[$start];
}
else
{
$resultantArray[]=implode("", $result);
$result=array();
$result[]=$indexValue=$array[$start];
}
$start++;
}
$resultantArray[]=implode("", $result);
print_r($resultantArray);
Output:
Array
(
[0] => aaaa
[1] => bbb
[2] => aaaaaa
[3] => bbbb
)
I have written a one-liner using only preg_split() that generates the expected result with no wasted memory (no array bloat):
Code (Demo):
$string = 'aaaabbbaaaaaabbbb';
var_export(preg_split('/(.)\1*\K/', $string, 0, PREG_SPLIT_NO_EMPTY));
Output:
array (
0 => 'aaaa',
1 => 'bbb',
2 => 'aaaaaa',
3 => 'bbbb',
)
Pattern:
(.) #match any single character
\1* #match the same character zero or more times
\K #keep what is matched so far out of the overall regex match
The real magic happens with the \K: it discards everything matched so far from the reported match, so each split happens at the zero-width position right after a run of identical characters.
The 0 parameter in preg_split() means "no limit on the number of pieces". This is the default behavior, but it needs to hold its place in the function call so that the next parameter is used as the flags argument.
The final parameter is PREG_SPLIT_NO_EMPTY, which removes any empty pieces.
Sahil's preg_match_all() method preg_match_all('/(.)\1{1,}/', $string,$matches); is a good attempt but it is not perfect for two reasons:
The first issue is that his use of preg_match_all() returns two subarrays which is double the necessary result.
The second issue is revealed when $string="abbbaaaaaabbbb";. His method will ignore the first lone character. Here is its output:
Array (
[0] => Array
(
[0] => bbb
[1] => aaaaaa
[2] => bbbb
)
[1] => Array
(
[0] => b
[1] => a
[2] => b
)
)
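One way to keep preg_match_all() and still catch a lone character is to allow zero repetitions of the backreference, i.e. \1* instead of \1{1,}. A small sketch that reads only the full matches from $matches[0] (the variable name $runs is illustrative):

```php
<?php
$string = 'abbbaaaaaabbbb';
// (.) captures one character; \1* then consumes any further copies of it (possibly none).
preg_match_all('/(.)\1*/', $string, $matches);
$runs = $matches[0]; // full matches only; $matches[1] holds the single captured characters
var_export($runs);
```

Because . does not match a newline by default, this sketch assumes single-line input.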
Sahil's second attempt produces the correct output, but requires much more code. A more concise non-regex solution could look like this:
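$array = str_split($string);
$result = [];
$last = "";
foreach ($array as $v) {
// Extend the current run while it is empty or made of the same character.
// A strict comparison is needed here: !$last would also be true for the run "0".
if ($last === "" || $last[0] === $v) {
$last .= $v;
} else {
$result[] = $last;
$last = $v;
}
}
$result[] = $last;
var_export($result);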

PHP: split string based on array

Below is the data I'm trying to parse:
50‐59 1High300.00 Avg300.00
90‐99 11High222.00 Avg188.73
120‐1293High204.00 Avg169.33
The first section is a weight range, next is a count, followed by Highprice, ending with Avgprice.
As an example, I need to parse the data above into an array which would look like
[0]50-59
[1]1
[2]High300.00
[3]Avg300.00
[0]90-99
[1]11
[2]High222.00
[3]Avg188.73
[0]120‐129
[1]3
[2]High204.00
[3]Avg169.33
I thought about creating an array of what the possible weight ranges can be but I can't figure out how to use the values of the array to split the string.
$arr = array("10-19","20-29","30-39","40-49","50-59","60-69","70-79","80-89","90-99","100-109","110-119","120-129","130-139","140-149","150-159","160-169","170-179","180-189","190-199","200-209","210-219","220-229","230-239","240-249","250-259","260-269","270-279","280-289","290-299","300-309");
Any ideas would be greatly appreciated.
Hope this will work:
$string='50-59 1High300.00 Avg300.00
90-99 11High222.00 Avg188.73
120-129 3High204.00 Avg169.33';
$requiredData=array();
$dataArray=explode("\n",$string);
$counter=0;
foreach($dataArray as $data)
{
if(preg_match('#^([\d]+\-[\d]+) ([\d]+)([a-zA-Z]+[\d\.]+) ([a-zA-Z]+[\d\.]+)#', $data,$matches))
{
$requiredData[$counter][]=$matches[1];
$requiredData[$counter][]=$matches[2];
$requiredData[$counter][]=$matches[3];
$requiredData[$counter][]=$matches[4];
$counter++;
}
}
print_r($requiredData);
'#^([\d]+\-[\d]+) ([\d]+)([a-zA-Z]+[\d\.]+) ([a-zA-Z]+[\d\.]+)#'
I don't think that will work because of the space you have in the regex
between the weight and count. The thing I'm struggling with is a row
like this where there is no space. 120‐1293High204.00 Avg169.33 that
needs to be parsed like [0]120‐129 [1]3 [2]High204.00 [3]Avg169.33
You are right. That can be remedied by limiting the number of weight digits to three and making the space optional.
'#^(\d+-\d{1,3}) *…
$arr = array('50-59 1High300.00 Avg300.00',
'90-99 11High222.00 Avg188.73',
'120-129 3High204.00 Avg169.33');
foreach($arr as $str) {
if (preg_match('/^(\d+-\d{1,3})\s*(\d+)(High\d+\.\d\d) (Avg\d+\.\d\d)/i', $str, $m)) {
array_shift($m); //remove group 0 (ie. the whole match)
$result[] = $m;
}
}
print_r($result);
Output:
Array
(
[0] => Array
(
[0] => 50-59
[1] => 1
[2] => High300.00
[3] => Avg300.00
)
[1] => Array
(
[0] => 90-99
[1] => 11
[2] => High222.00
[3] => Avg188.73
)
[2] => Array
(
[0] => 120-129
[1] => 3
[2] => High204.00
[3] => Avg169.33
)
)
Explanation:
/ : regex delimiter
^ : beginning of string
( : start group 1
\d+-\d{1,3} : 1 or more digits, a dash, and 1 to 3 digits, i.e. the weight range
) : end group 1
\s* : 0 or more whitespace characters
(\d+) : group 2, i.e. the count
(High\d+\.\d\d) : group 3, literal High followed by the price
(Avg\d+\.\d\d) : group 4, literal Avg followed by the price
/i : regex delimiter and case-insensitive modifier
To be more generic, you could replace High and Avg with [a-z]+.
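As a rough sketch of that more generic variant, the pattern below replaces the literal labels with [a-z]+ and makes both separators optional. It assumes the input uses the plain ASCII hyphen, and the sample rows are only illustrative:

```php
<?php
$rows = [
    '50-59 1High300.00 Avg300.00',
    '120-1293High204.00 Avg169.33', // no space between weight range and count
];
$result = [];
foreach ($rows as $row) {
    // [a-z]+ stands in for the literal labels; \s* makes both separators optional.
    if (preg_match('/^(\d+-\d{1,3})\s*(\d+)([a-z]+\d+\.\d\d)\s*([a-z]+\d+\.\d\d)/i', $row, $m)) {
        $result[] = array_slice($m, 1); // drop group 0 (the whole match)
    }
}
print_r($result);
```

One caveat: without a separator the split between range and count is ambiguous for two-digit ranges. A row such as 90-9911High222.00 Avg188.73 would be read as range 90-991 and count 1, because \d{1,3} is greedy.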
This is a pattern you can trust (Pattern Demo):
/^((\d{0,2})0‐(?:\2)9) ?(\d{1,3})High(\d{1,3}\.\d{2}) ?Avg(\d{1,3}\.\d{2})/m
The other answers overlooked the digit pattern in the weight range substring. The range start integer always ends in 0, and the range end integer always ends in 9; the range always spans ten integers.
My pattern will capture the digits that precede the 0 in the starting integer and reference them immediately after the dash, then require that captured number to be followed by a 9.
I want to point out that your sample input was a little bit tricky because your ‐ is not the standard - that is between the 0 and = on my keyboard. This was a sneaky little gotcha for me to solve.
Method (Demo):
$text = '50‐59 1High300.00 Avg300.00
90‐99 11High222.00Avg188.73
120‐1293High204.00 Avg169.33';
preg_match_all(
'/^((\d{0,2})0‐(?:\2)9) ?(\d{1,3})High(\d{1,3}\.\d{2}) ?Avg(\d{1,3}\.\d{2})/m',
$text,
$matches,
PREG_SET_ORDER
);
var_export(
array_map(
fn($captured) => [
'weight range' => $captured[1],
'count' => $captured[3],
'Highprice' => $captured[4],
'Avgprice' => $captured[5]
],
$matches
)
);
Output:
array (
0 =>
array (
'weight range' => '50‐59',
'count' => '1',
'Highprice' => '300.00',
'Avgprice' => '300.00',
),
1 =>
array (
'weight range' => '90‐99',
'count' => '11',
'Highprice' => '222.00',
'Avgprice' => '188.73',
),
2 =>
array (
'weight range' => '120‐129',
'count' => '3',
'Highprice' => '204.00',
'Avgprice' => '169.33',
),
)

Regex to match 2 or more words

I have a regex that tries to match 2 or more words, but it isn't working as it's supposed to. What am I doing wrong?
$string = "i dont know , do you know?";
preg_match("~([a-z']+\b){2,}~", $string, $match);
echo "<pre>";
print_r($match);
echo "</pre>";
Expected Result:
Array ( i dont know )
Actual Result:
Array ( )
This will match a string that contains 2 or more words:
/([a-zA-Z]+\s?\b){2,}/g (you can test it at http://www.regexr.com/)
PHP:
$string = "i dont know , do you know?";
preg_match("/([a-zA-Z]+\s?\b){2,}/", $string, $match);
echo "<pre>";
print_r($match);
echo "</pre>";
Note: do not use the /g flag in the PHP code.
This one should work: ~([\w']+(\s+|[^\w\s])){2,}~g, which also matches strings like "I do!" (again, drop the g flag in PHP).
I think you are misunderstanding how {} is used. To match two words:
preg_match_all('/([a-z]+)/i', 'one two', $match );
if( $match && count($match[1]) > 1 ){
....
}
Match is
array (
0 =>
array (
0 => 'one',
1 => 'two',
),
1 =>
array (
0 => 'one',
1 => 'two',
),
)
Match will have all matches of the pattern, so then it's trivial to just count them up...
When using
preg_match('/(\w+){2,}/', 'one two', $match );
Match is
array (
0 => 'one',
1 => 'e',
)
clearly not what you want.
The only way I see with preg_match is with this /([a-z]+\s+[a-z]+)/
preg_match ([a-z']+\b){2,} http://www.phpliveregex.com/p/frM
preg_match ([a-z]+\s+[a-z]+) http://www.phpliveregex.com/p/frO
Suggested
preg_match_all ([a-z]+) http://www.phpliveregex.com/p/frR ( may have to select preg_match_all on the site )
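The preg_match_all() suggestion can be sketched as follows. It relies on the return value, which is the number of full matches, and uses the string from the question; the character class with the apostrophe is taken from the original pattern, and $wordCount is an illustrative name:

```php
<?php
$string = "i dont know , do you know?";
// preg_match_all() returns the number of full matches (or false on error).
$wordCount = preg_match_all("/[a-z']+/i", $string, $match);
if ($wordCount >= 2) {
    // $match[0] holds every matched word in order of appearance.
    echo "Found $wordCount words: " . implode(' ', $match[0]) . "\n";
}
```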

How to count the number of matches? [duplicate]

I have a string like this:
$str = 'this is a string';
And this is my pattern: /i/g. There are three occurrences (as you can see, the string above contains three i's). Now I need to count them. How can I get that number?
You can use substr_count() as well as preg_match_all():
echo substr_count("this is a string", "i"); // will echo 3
echo $k_count = preg_match_all('/i/i', 'this is a string', $out); // will echo 3
Another method is to convert the string into an array and then count the values:
$arr = str_split('this is a string');
$counts = array_count_values($arr);
print_r($counts);
output:
Array
(
[t] => 2
[h] => 1
[i] => 3
[s] => 3
[ ] => 3
[a] => 1
[r] => 1
[n] => 1
[g] => 1
)
You should use substr_count().
$str = 'this is a string';
echo substr_count($str, "i"); // 3
You can also use mb_substr_count()
$str = 'this is a string';
echo mb_substr_count($str, "i"); // 3
substr_count — Count the number of substring occurrences
mb_substr_count — Count the number of substring occurrences
preg_match_all could be a better fit.
Here is an example:
<?php
$subject = "a test string a";
$pattern = '/a/i';
preg_match_all($pattern, $subject, $matches);
print_r($matches);
?>
Prints:
Array ( [0] => Array ( [0] => a [1] => a ) )
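To get just the number rather than the matches themselves, the return value of preg_match_all() can be used directly. A short sketch with the string from the question ($count is an illustrative name):

```php
<?php
$subject = 'this is a string';
// The return value is the number of full pattern matches,
// so the $matches argument can be omitted when only the count is needed.
$count = preg_match_all('/i/', $subject);
echo $count;
```

For a fixed single-character or literal needle like this one, substr_count() does the same job without a regex.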
