Operation on String - Substrings: Get Positions and Count - php

I have a problem with string operations in PHP. I have a string like this:
$words = "Ala ma kota a kot ma ale";
How do I get the number of occurrences of al in the string $words? I also need the starting index of every occurrence of al.
$count = substr_count($words, 'al');
I tried substr_count(), but it only returns the count. I need the index of each occurrence as well.
EDIT: adding the expected output:
number of al: 2, at index: 0, at index: 22

This is easily accomplished with preg_match_all() using the PREG_OFFSET_CAPTURE flag:
$words = "Ala ma kota a kot ma ale. All in Valhalla shall recall the fall.";
preg_match_all('~(al)~i', $words, $matches, PREG_OFFSET_CAPTURE);
print_r($matches[1]);
See a live demo at https://3v4l.org/Frsa5. The i modifier makes the match case-insensitive; remove it if you want case-sensitive matching. If you only want occurrences of al at the start of words, use \bal (\b = word boundary). The result is an array of matches and their offsets:
Array [
[0] => [
[0] => Al
[1] => 0
]
[1] => [
[0] => al
[1] => 21
]
[2] => [
[0] => Al
[1] => 26
]
[3] => [
[0] => al
[1] => 34
]
[4] => [
[0] => al
[1] => 37
]
[5] => [
[0] => al
[1] => 44
]
[6] => [
[0] => al
[1] => 51
]
[7] => [
[0] => al
[1] => 60
]
]
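A minimal sketch, assuming you want the exact output format from the question (the string and the "number of al" wording come from there; $positions is just an illustrative name). The full-pattern matches in $matches[0] already carry their offsets, so array_column() can collect them. Keep in mind that PREG_OFFSET_CAPTURE reports byte offsets, which only equal character positions for single-byte text.

```php
<?php
// Input string from the question.
$words = "Ala ma kota a kot ma ale";

$positions = [];
// i = case-insensitive, so the leading "Al" is found as well.
if (preg_match_all('~al~i', $words, $matches, PREG_OFFSET_CAPTURE)) {
    // Each entry of $matches[0] is [matched text, byte offset];
    // column 1 holds the offsets.
    $positions = array_column($matches[0], 1);
}

echo 'number of al: ' . count($positions);
foreach ($positions as $position) {
    echo ', at index: ' . $position;
}
echo "\n";
// prints: number of al: 2, at index: 0, at index: 21
```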
Edit: Since nothing else is being matched, you don't really need the (al) capture group. You can just remove the parentheses, match ~al~i, and get the results in $matches[0] (which contains the full pattern matches). I've left the capture group in place in case you want to use more complex matching rules in the future (and because I was too lazy to update the demo).

You could build a loop that keeps searching the string until you have found as many occurrences as substr_count() reported. Or you could wing it and make a loop that repeatedly calls strpos() and count the results at the end. Note that you need to make sure you're not finding the same position over and over; that's what the $position++ is there for in this piece of code.
This outputs exactly what you asked for in the comment, but with the correct position. (I hadn't seen the comment before.)
$position = 0;
$words = strtolower("Ala ma kota a kot ma ale");
$needle = 'al';
$positions = [];
do {
$position = strpos($words, $needle, $position);
if ($position !== false) {
array_push($positions, $position);
$position++;
}
} while ($position !== false);
echo 'number of al: '.count($positions);
foreach ($positions as $position) {
echo ', at index: '.$position;
}
Output:
number of al: 2, at index: 0, at index: 21
Edit: As eis noted, this can be simplified to:
<?php
$words = strtolower("Ala ma kota a kot ma alet ma ale");
$needle = 'al';
$positions = [];
$position = -1;
while (($position = strpos($words, $needle, $position + 1)) !== false) {
array_push($positions, $position);
}
echo 'number of al: ' . count($positions);
foreach ($positions as $position) {
echo ', at index: ' . $position;
}
Edit: As Markus AO said, you might want to advance the position by the length of the needle instead of by one ($position++). I think that depends on how you want to match:
a) If you want to match all occurrences, including overlapping ones (aba in abababa = 3, because a match may start inside the previous match), keep it this way.
b) If you want to match only non-overlapping occurrences (aba in abababa = 2), increment the position by the needle's length.
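To make the two options concrete, here is a sketch of a hypothetical helper, findPositions() (not part of the answers above), that switches between overlapping and non-overlapping search and uses stripos() instead of strtolower() for case-insensitive matching. For comparison, substr_count() counts only non-overlapping occurrences and is case-sensitive.

```php
<?php
// Hypothetical helper: returns every position of $needle in $haystack,
// case-insensitively.
// $overlapping = true  -> advance by 1 (option a)
// $overlapping = false -> advance by strlen($needle) (option b)
function findPositions(string $haystack, string $needle, bool $overlapping = true): array
{
    $positions = [];
    if ($needle === '') {
        return $positions; // an empty needle would "match" everywhere
    }
    $step = $overlapping ? 1 : strlen($needle);
    $offset = 0;
    while (($pos = stripos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + $step; // never exceeds strlen($haystack)
    }
    return $positions;
}

echo implode(', ', findPositions('abababa', 'aba')), "\n";                  // 0, 2, 4
echo implode(', ', findPositions('abababa', 'aba', false)), "\n";           // 0, 4
echo implode(', ', findPositions('Ala ma kota a kot ma ale', 'al')), "\n";  // 0, 21
```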


How can I write a regex to pick repeating patterns in php from a file

With this string from a file with similar lines,
03/21/19 11:20 LOC3 UNA:
03/21/19 11:40 LOC2 IN: NEW BD PN VO LVA
03/21/19 11:50 LOC3 OFF:
03/21/19 12:20 LOC2 IN: OLD XD AB VO LVA
I need to capture NEW, BD, PN, VO, LVA from the first IN: line, OLD, XD, AB, VO, LVA from the second one, and so on, ignoring the other lines.
This only captures the last term, 'VO' (a repeated capturing group only keeps the value of its last iteration):
IN:\s(([^\s]+)\s+)+.*LVA
You can match each chunk of non-whitespace text that comes after a specific substring (IN:), as long as another substring (LVA) occurs later on the line, using
preg_match_all('~(?:\G(?!\A)(?=.*LVA)|IN:)\h+\K\S+~', $s, $matches)
See the regex demo
Details
(?:\G(?!\A)(?=.*LVA)|IN:) - either the end of the previous match, provided LVA still occurs later on the same line (.* does not cross line breaks), or the IN: substring. In other words, after IN:, consecutive whitespace-separated tokens are matched for as long as LVA still follows on that line (the first token after IN: is matched regardless)
\h+ - 1+ horizontal whitespaces
\K - match reset operator
\S+ - 1+ non-whitespace chars.
PHP:
$s = "03/21/19 11:20 LOC2 IN: NEW BD PN VO LVA";
if (preg_match_all('~(?:\G(?!\A)(?=.*LVA)|IN:)\h+\K\S+~', $s, $matches)) {
print_r($matches[0]);
}
// => Array ( [0] => NEW [1] => BD [2] => PN [3] => VO [4] => LVA )
To get the matches grouped per line, wrap the IN: alternative of the first non-capturing group in a capturing group, and then check that submatch when building the final output: whenever it is set, a new line's matches begin. Something like:
$s = "03/21/19 11:20 LOC2 IN: NEW BD PN VO LVA
03/21/19 11:20 LOC2 IN: NEW BD PN VO LVA VB";
$res = [];
if (preg_match_all('~(?:\G(?!\A)(?=.*LVA)|(IN:))\h+\K\S+~', $s, $matches, PREG_SET_ORDER, 0)) {
$tmp = [];
foreach ($matches as $r) {
if (count($r) > 1) {
if (count($tmp)>0) {
$res[] = $tmp;
$tmp = [];
}
}
$tmp[] = $r[0];
}
if (count($tmp)>0) {
$res[] = $tmp;
}
}
print_r($res);
// => Array (
//    [0] => Array ( [0] => NEW [1] => BD [2] => PN [3] => VO [4] => LVA )
// [1] => Array ( [0] => NEW [1] => BD [2] => PN [3] => VO [4] => LVA )
// )
See the PHP demo.
If the string always has the same pattern, you can use a budget solution: explode on new lines and then on ": ", and take the last value.
$str = "03/21/19 11:20 LOC2 IN: NEW BD PN VO LVA
03/21/19 11:20 LOC2 IN: OLD XD AB VO LVA";
$result = [];
foreach(explode("\n", $str) as $line){
$tmp = explode(": ", $line);
$result[] = end($tmp);
}
var_dump($result);
https://3v4l.org/IbfBo
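A variant of the same idea, sketched under the assumption that every line you want contains the literal marker IN: (as in the question's sample). Unlike the version above, it skips the other lines and splits each value into separate tokens, which is what the question asked for.

```php
<?php
// Sample lines from the question; only the "IN:" lines should be used.
$str = "03/21/19 11:20 LOC3 UNA:
03/21/19 11:40 LOC2 IN: NEW BD PN VO LVA
03/21/19 11:50 LOC3 OFF:
03/21/19 12:20 LOC2 IN: OLD XD AB VO LVA";

$result = [];
foreach (preg_split('/\R/', $str) as $line) {
    $pos = strpos($line, 'IN:');
    if ($pos === false) {
        continue; // not an IN: line, ignore it
    }
    // Take everything after "IN:" and split it on runs of whitespace.
    $rest = trim(substr($line, $pos + strlen('IN:')));
    $result[] = preg_split('/\s+/', $rest, -1, PREG_SPLIT_NO_EMPTY);
}
print_r($result);
```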

Php preg_split separates a number with a comma into two different numbers

$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
I need to get this array:
Array ( [0] => Bid [1] => 12/20/2018 08:10 AM (PST) [2] => $8,000 [3] => 14 [4] => 0 [5] => [6] => 120270 [7] => $10,75 [8] => false )
I agree with Andreas about using preg_match_all(), but not with his pattern.
For stability, I recommend consuming the entire string from the beginning.
1. Match the label and its trailing colon: [^:]+:
2. Match zero or more whitespace characters: \s*
3. Forget what you matched so far: \K
4. Lazily match zero or more characters (giving back when possible, i.e. a minimal match): .*?
5. Look ahead and demand that the characters matched in #4 are immediately followed by optional whitespace and then either a comma, one or more characters that are neither commas nor colons (the next label), and a colon (,[^,:]+:), or the end of the string ($).
Code: (Demo)
$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
var_export(
preg_match_all(
'/[^:]+:\s*\K.*?(?=\s*(?:$|,[^,:]+:))/',
$line,
$out
)
? $out[0] // isolate fullstring matches
: [] // no matches
);
Output:
array (
0 => 'Bid',
1 => '12/20/2018 08:10 AM (PST)',
2 => '$8,000',
3 => '14',
4 => '0',
5 => '',
6 => '120270',
7 => '$10,75',
8 => 'false',
)
New answer according to the new request:
I use the same regex to split the string, and afterwards I remove the label (everything up to the first colon, plus the following spaces) from each part:
$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
$parts = preg_split("/(?<!\d),|,(?!\d)/", $line);
$result = array();
foreach($parts as $elem) {
$result[] = preg_replace('/^[^:]+:\h*/', '', $elem);
}
print_r ($result);
Output:
Array
(
[0] => Bid
[1] => 12/20/2018 08:10 AM (PST)
[2] => $8,000
[3] => 14
[4] => 0
[5] =>
[6] => 120270
[7] => $10,75
[8] => false
)
I'd use preg_match_all() instead.
Here the pattern looks for digits, a comma and digits; or just digits; or a word; each followed by a comma.
I append a comma to the string to make the regex simpler.
$line = "TRUE,59,m,10,500";
preg_match_all("/(\d+,\d+|\d+|\w+),/", $line . ",", $match);
var_dump($match);
https://3v4l.org/HQMgu
Even with a different order of the items this code will still produce a correct output: https://3v4l.org/SRJOf
A much better idea:
$parts=explode(',',$line,4); // explode() accepts a limit, in this case 4
Same result, less code.
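A quick sketch of what the limit does, using the same $line. Note that this only gives the right result when the comma-containing number is the last field, because everything after the third comma ends up in the final element.

```php
<?php
// Input from this answer.
$line = "TRUE,59,m,10,500";

// With a limit of 4, explode() returns at most 4 elements; the last one
// contains the rest of the string, commas included.
$parts = explode(',', $line, 4);
var_export($parts);
```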
I would keep it simple and do this
$line = "TRUE,59,m,10,500";
$parts = preg_split("/,/", $line);
//print_r ($parts);
$parts[3]=$parts[3].','.$parts[4]; //create a new part 3 from 3 and 4
//$parts[3].=','.$parts[4]; //alternative syntax to the above
unset($parts[4]);//remove old part 4
print_r ($parts);
I would also just use explode() rather than a regular expression.

PHP: Can preg_match include unmatched groups?

Can the preg_match() function include groups it did not find in the matches array?
Here is the pattern I'm using:
/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/
What I'm trying to do is parse a human-readable size into bytes. This pattern fits my requirement, but only if I can retrieve the matches in the absolute group order.
This can produce up to 5 match groups, which would result in a matches array with indices 0-5. However, if the string does not match all groups, the matches array may have, for example, group 5 actually at index 3.
What I'd like is for the final group in that pattern (5) to always be at the same index of the matches array. Because multiple groups are optional, it's very important that when reading the matches array we know which group in the expression got matched.
Example situation: The regex tester at regexr.com will show all 5 groups including those not matched always in the correct order. By enabling the "global" and "multi-line" flags and using the following text, you can hover over the blue matches for a good visual.
500.2 KiB
256M
700 Mb
1.2GiB
You'll notice that not all groups are always matched, however the group indexes are always in the correct order.
Edit: Yes, I did already try this in PHP with the following:
$matches = [];
$matchesC = 0;
$matchesN = 6;
if (!preg_match("/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/", $size, $matches) || ($matchesC = count($matches)) < $matchesN) {
print_r($matches);
throw new \Exception(sprintf("Could not parse size string. (%d/%d)", $matchesC, $matchesN));
}
When $size is "256M" that print_r($matches); returns:
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
)
Groups 4 and 5 are missing.
In PHP, non-participating groups are simply not initialized with an empty string value, so groups 4 and 5 are unset for the '256M' string, and preg_match() drops these trailing uninitialized groups from the end of the array. (A non-participating group that is followed by a participating one, like group 2 here, is reported as an empty string.)
In your case, you can make the capturing groups themselves non-optional and make the patterns inside them optional.
$arr = array('500.2 KiB', '256M', '700 Mb', '1.2GiB');
foreach ($arr as $s) {
if (preg_match('~^([0-9]+)(\.[0-9]+)?\s?([^ib]?)(i?)(b?)$~i', $s, $m)) {
print_r($m);
}
}
Output:
Array
(
[0] => 500.2 KiB
[1] => 500
[2] => .2
[3] => K
[4] => i
[5] => B
)
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
[4] =>
[5] =>
)
Array
(
[0] => 700 Mb
[1] => 700
[2] =>
[3] => M
[4] =>
[5] => b
)
Array
(
[0] => 1.2GiB
[1] => 1
[2] => .2
[3] => G
[4] => i
[5] => B
)
See the PHP demo.
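An alternative that is not used in the answers above: since PHP 7.2, preg_match() accepts the PREG_UNMATCHED_AS_NULL flag, which reports non-participating groups as null instead of an empty string. As far as I know, from PHP 7.4 on it also keeps trailing unmatched groups, so $m always has indexes 0-5. A sketch, assuming PHP 7.4 or later:

```php
<?php
// Assumes PHP 7.4+: with PREG_UNMATCHED_AS_NULL, groups that did not
// participate (including trailing ones) are reported as null.
$sizes = ['500.2 KiB', '256M', '700 Mb', '1.2GiB'];
$parsed = [];
foreach ($sizes as $size) {
    if (preg_match('/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/', $size, $m, PREG_UNMATCHED_AS_NULL)) {
        // $m always has indexes 0-5 here; null = group not matched.
        $parsed[$size] = $m;
    }
}
var_dump($parsed['256M']);
```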
You can use T-Regx, which handles such cases easily. It always checks whether a group was matched, even if it is the last group and unmatched. It can also differentiate between "" (matched, but empty) and null (unmatched):
pattern('^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$')
->match($size)
->first(function (Match $match) {
// whether the group was used in a pattern
$match->hasGroup(14);
// whether the group was matched, even if last or empty string
$match->matched(5);
// group, or default value if not matched
$match->group(5)->orReturn('unmatched');
});

Split string into array regex php

I need to split the string below into an array with keys, in this format:
$string = "(731) some text here with number 2 (220) some 54 number other text here";
should be converted into:
array(
'731' => 'some text here with number 2',
'220' => 'some 54 number other text here'
);
I have tried:
preg_split( '/\([0-9]{3}\)/', $string );
and got:
array (
0 => 'some text here',
1 => 'some other text here'
);
Code
$string = "(731) some text here with number 2 (220) some 54 number other text here";
preg_match_all("/\((\d{3})\) *([^( ]*(?> +[^( ]+)*)/", $string, $matches);
$result = array_combine($matches[1], $matches[2]);
var_dump($result);
Output
array(2) {
[731]=>
string(28) "some text here with number 2"
[220]=>
string(30) "some 54 number other text here"
}
ideone demo
Description
The regex uses
\((\d{3})\) to match 3 digits in parentheses and captures it (group 1)
\ * (a space repeated zero or more times) to match the spaces between keys and values
([^( ]*(?> +[^( ]+)*) to match everything up to the next (, without the trailing spaces, and capture it (group 2)
This subpattern matches exactly the same as [^(]*(?<! ) but more efficiently, based on the unrolling-the-loop technique.
*Note, though, that I am assuming a value can never contain a (. If that is not the case, let me know and I will modify the pattern accordingly.
After that, we have $matches[1] with keys and $matches[2] with values. Using array_combine() we generate the desired array.
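To check the claim that the unrolled subpattern behaves like [^(]*(?<! ), here is a small sketch that runs both versions on the sample string and compares the resulting arrays (the variable names are illustrative):

```php
<?php
$string = "(731) some text here with number 2 (220) some 54 number other text here";

// Unrolled version from the answer, and the simpler form it is compared to.
$unrolled = '/\((\d{3})\) *([^( ]*(?> +[^( ]+)*)/';
$simple   = '/\((\d{3})\) *([^(]*(?<! ))/';

preg_match_all($unrolled, $string, $a);
preg_match_all($simple, $string, $b);

$resultA = array_combine($a[1], $a[2]);
$resultB = array_combine($b[1], $b[2]);
var_dump($resultA === $resultB); // bool(true)
```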
Try this:
$string = "(731) some text here with number 2 (220) some 54 number other text here";
$a = preg_split('/\s(?=\()/', $string);//split by spaces preceding the left bracket
$res = array();
foreach($a as $v){
$r = preg_split('/(?<=\))\s/', $v);//split by spaces following the right bracket
if(isset($r[0]) && isset($r[1])){
$res[trim($r[0],'() ')] = trim($r[1]);//trim brackets and spaces
}
}
print_r($res);
Output:
Array
(
[731] => some text here with number 2
[220] => some 54 number other text here
)
DEMO
If you want to limit it only to numbers in brackets that have exactly 3 digits, just modify the lookahead in the first split:
$a = preg_split('/\s(?=\([0-9]{3}\))/', $string);
You can try this one:
<?php
$str="(731) some text here (220) some other text here";
echo $str .'<br>';
$arr1=explode('(', $str);
$size_arr=count($arr1);
$final_arr=array();
for($i=1;$i<$size_arr; $i++){
$arr2=explode(')', $arr1[$i]);
$final_arr[$arr2[0]]=trim($arr2[1]);
}
echo '<pre>';
print_r($final_arr);
?>
I tried to keep the syntax simple; I hope everybody can understand it.
I'm pretty sure that defining the keys is not possible, as the regex will add matches continuously.
I would define two regexes,
one for the keys:
preg_match_all("/(\()([0-9]*)(\))\s/", $input_lines, $output_array);
you will find your keys in $output_array[2].
And one for the texts (which looks almost the same):
preg_split("/(\()([0-9]*)(\))\s/", $input_lines);
After that, you can build your custom array by iterating over both. The first element returned by preg_split() is an empty string (the text before the first key), so skip it.
Make sure to trim the strings in the second array when inserting.
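Putting the two regexes together might look like this; a sketch assuming the string starts with a key, so the first piece returned by preg_split() is empty and can be dropped:

```php
<?php
// Input string from the question.
$input_lines = "(731) some text here with number 2 (220) some 54 number other text here";
$pattern = '/(\()([0-9]*)(\))\s/';

// Regex 1: the keys are in capture group 2.
preg_match_all($pattern, $input_lines, $output_array);
$keys = $output_array[2];

// Regex 2: the texts between the keys.
$texts = preg_split($pattern, $input_lines);
array_shift($texts); // drop the empty piece before the first key

// Trim the texts and pair them with the keys.
$result = array_combine($keys, array_map('trim', $texts));
print_r($result);
```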
Using preg_replace_callback(), you can quickly achieve what you want (as long as only the key parentheses contain 3 digits):
$string = "(731) some text here with number 2 (220) some 54 number other text here";
$array = array();
preg_replace_callback('~(\((\d{3})\))(.*?)(?=(?1)|\Z)~s', function($match) use (&$array) {
$array[$match[2]] = trim($match[3]);
}, $string);
var_dump($array);
Output:
array(2) {
[731]=>
string(28) "some text here with number 2"
[220]=>
string(30) "some 54 number other text here"
}
Maybe you can add the PREG_SPLIT_DELIM_CAPTURE flag to preg_split(). From the preg_split manual page (http://php.net/manual/en/function.preg-split.php):
PREG_SPLIT_DELIM_CAPTURE
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
So if you change your code to:
$results = preg_split('/\(([0-9]+)\)/s', $data, -1, PREG_SPLIT_DELIM_CAPTURE);
You will obtain an array similar to:
Array
(
[0] => KS/M/ 2013/1238
[1] => 220
[2] => 23/12/2013
[3] => 300
[4] =>
[5] => 731
[6] => VALDETE BUZA ADEM JASHARI- PRIZREN, KS
[7] => 526
[8] =>
[9] => 591
[10] =>
[11] => 740
[12] =>
[13] => 540
[14] => DEINA
[15] => 546
[16] =>
[17] => 511
[18] => 3 Preparatet për zbardhim dhe substancat tjera për larje rrobash; preparatet për pastrim, shkëlqim, fërkim dhe gërryerje; sapunët; parfumet, vajrat esencialë, preparatet kozmetike, losionet për flokë, pasta për dhembe
14 Metalet e cmueshme dhe aliazhet e tyre; mallrat në metale të cmueshme ose të veshura me to, që nuk janë përfshire në klasat tjera; xhevahirët, gurët e cmueshëm; instrumentet horologjike dhe kronometrike (për matjen dhe regjistrimin e kohës)
25 Rrobat, këpucët, kapelat
35 Reklamim, menaxhim biznesi; administrim biznesi; funksione zyre
)
Then loop over the array, ignoring the first element in this case:
$myArray = array();
$myKey = '';
foreach ($results as $k => $v) {
if ( ($k > 0) && ($myKey == '')) {
$myKey = $v;
} else if ($k > 0) {
$myArray[$myKey] = $v;
$myKey = '';
}
}
EDIT: This answer was written for the following data:
$data ='KS/M/ 2013/1238 (220) 23/12/2013 (300)
(731) VALDETE BUZA ADEM JASHARI- PRIZREN, KS (526)
(591)
(740)
(540) DEINA (546)
(511) 3 Preparatet për zbardhim dhe substancat tjera për larje rrobash; preparatet për pastrim, shkëlqim, fërkim dhe gërryerje; sapunët; parfumet, vajrat esencialë, preparatet kozmetike, losionet për flokë, pasta për dhembe
14 Metalet e cmueshme dhe aliazhet e tyre; mallrat në metale të cmueshme ose të veshura me to, që nuk janë përfshire në klasat tjera; xhevahirët, gurët e cmueshëm; instrumentet horologjike dhe kronometrike (për matjen dhe regjistrimin e kohës)
25 Rrobat, këpucët, kapelat
35 Reklamim, menaxhim biznesi; administrim biznesi; funksione zyre';
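For the simpler string from the question, the same PREG_SPLIT_DELIM_CAPTURE idea can be sketched with array_chunk() to walk the key/value pairs instead of tracking $myKey. This assumes the text before the first key can be discarded:

```php
<?php
// Input string from the question.
$string = "(731) some text here with number 2 (220) some 54 number other text here";

// The captured digits are returned between the text pieces:
// first an empty string, then key, text, key, text.
$parts = preg_split('/\((\d+)\)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

$result = [];
// Skip the text before the first key, then take key/text pairs.
foreach (array_chunk(array_slice($parts, 1), 2) as $pair) {
    [$key, $value] = $pair + [1 => '']; // a key without text gets ''
    $result[$key] = trim($value);
}
print_r($result);
```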

Replacing based on position in string

Is there a way using regex to replace characters in a string based on position?
For instance, one of my rewrite rules for a project I’m working on is “replace o with ö if o is the next-to-last vowel and even numbered (counting left to right).”
So, for example:
heabatoik would become heabatöik (o is the next-to-last vowel, as well as the fourth vowel)
habatoik would not change (o is the next-to-last vowel, but is the third vowel)
Is this possible using preg_replace in PHP?
Starting with the beginning of the subject string, you want to match 2n + 1 vowels followed by an o, but only if the o is followed by exactly one more vowel:
$str = preg_replace(
'/^((?:(?:[^aeiou]*[aeiou]){2})*)' . # 2n vowels, n >= 0
'([^aeiou]*[aeiou][^aeiou]*)' . # odd-numbered vowel
'o' . # even-numbered vowel is o
'(?=[^aeiou]*[aeiou][^aeiou]*$)/', # exactly one more vowel
'$1$2ö',
'heaeafesebatoik');
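A sketch that wraps the even-numbered pattern in a hypothetical function (replaceEvenNextToLastO is an illustrative name) and runs it on the examples from the question. Like the pattern itself, it only treats lowercase a, e, i, o, u as vowels:

```php
<?php
// Replace o with ö when o is the next-to-last vowel and its position
// among the vowels (counting from the left) is even.
function replaceEvenNextToLastO(string $word): string
{
    return preg_replace(
        '/^((?:(?:[^aeiou]*[aeiou]){2})*)' . // 2n vowels, n >= 0
        '([^aeiou]*[aeiou][^aeiou]*)' .      // odd-numbered vowel
        'o' .                                // even-numbered vowel is o
        '(?=[^aeiou]*[aeiou][^aeiou]*$)/',   // exactly one more vowel
        '$1$2ö',
        $word
    );
}

foreach (['heabatoik', 'habatoik', 'heaeafesebatoik'] as $word) {
    echo $word, ' => ', replaceEvenNextToLastO($word), "\n";
}
```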
To do the same but for an odd-numbered o, match 2n leading vowels rather than 2n + 1:
$str = preg_replace(
'/^((?:(?:[^aeiou]*[aeiou]){2})*)' . # 2n vowels, n >= 0
'([^aeiou]*)' . # followed by non-vowels
'o' . # odd-numbered vowel is o
'(?=[^aeiou]*[aeiou][^aeiou]*$)/', # exactly one more vowel
'$1$2ö',
'habatoik');
If a pattern doesn't match, preg_replace() performs no replacement, so it's safe to run them in sequence if that's what you're trying to do.
You can use preg_match_all() to split the string into vowel/non-vowel parts and process those.
For example, something like:
preg_match_all("/([aeiou])|([^aeiou]+)/",
$in,
$out, PREG_PATTERN_ORDER);
Depending on your specific needs, you may need to modify the placement of ()*+? in the regex.
I'd like to expand on Schmitt's answer (I don't have enough points to add a comment; I'm not trying to steal his thunder). I would use the PREG_OFFSET_CAPTURE flag, as it returns not only the vowels but also their locations. This is my solution:
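const LETTER = 0;   // index of the matched text in each PREG_OFFSET_CAPTURE pair
const LOCATION = 1; // index of the byte offset in each pair
$string = 'heabatoik';
preg_match_all('/[aeiou]/', $string, $matches, PREG_OFFSET_CAPTURE);
$out = $matches[0]; // list of [letter, offset] pairs, one per vowel
$lastElement = count($out) - 1; // -1 for last element index based 0
// the second-to-last vowel is vowel number $lastElement (counting from 1),
// so it is even-numbered when $lastElement is even
if ($lastElement >= 2 && $lastElement % 2 == 0 &&
    $out[$lastElement - 1][LETTER] == 'o') {
    // replace just that one byte with ö
    $string = substr_replace($string, 'ö', $out[$lastElement - 1][LOCATION], 1);
}
echo $string; // heabatöik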
Note: this is what the matches array looks like:
preg_match_all('/[aeiou]/', 'heabatoik', $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
Array
(
[0] => Array
(
[0] => Array
(
[0] => e
[1] => 1
)
[1] => Array
(
[0] => a
[1] => 2
)
[2] => Array
(
[0] => a
[1] => 4
)
[3] => Array
(
[0] => o
[1] => 6
)
[4] => Array
(
[0] => i
[1] => 7
)
)
)
This is how I would do it:
$str = 'heabatoik';
$vowels = preg_replace('#[^aeiou]+#i', '', $str);
$length = strlen($vowels);
if ( $length >= 3 && $length % 2 && $vowels[$length - 2] == 'o' ) {
$str = preg_replace('#o([^aeiou]*[aeiou][^aeiou]*)$#', 'ö$1', $str);
}
