Is there a way using regex to replace characters in a string based on position?
For instance, one of my rewrite rules for a project I’m working on is “replace o with ö if o is the next-to-last vowel and even numbered (counting left to right).”
So, for example:
heabatoik would become heabatöik (o is the next-to-last vowel, as well as the fourth vowel)
habatoik would not change (o is the next-to-last vowel, but is the third vowel)
Is this possible using preg_replace in PHP?
Starting with the beginning of the subject string, you want to match 2n + 1 vowels followed by an o, but only if the o is followed by exactly one more vowel:
$str = preg_replace(
'/^((?:(?:[^aeiou]*[aeiou]){2})*)' . # 2n vowels, n >= 0
'([^aeiou]*[aeiou][^aeiou]*)' . # odd-numbered vowel
'o' . # even-numbered vowel is o
'(?=[^aeiou]*[aeiou][^aeiou]*$)/', # exactly one more vowel
'$1$2ö',
'heaeafesebatoik');
To do the same but for an odd-numbered o, match 2n leading vowels rather than 2n + 1:
$str = preg_replace(
'/^((?:(?:[^aeiou]*[aeiou]){2})*)' . # 2n vowels, n >= 0
'([^aeiou]*)' . # followed by non-vowels
'o' . # odd-numbered vowel is o
'(?=[^aeiou]*[aeiou][^aeiou]*$)/', # exactly one more vowel
'$1$2ö',
'habatoik');
If one doesn't match, then it performs no replacement, so it's safe to run them in sequence if that's what you're trying to do.
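As a minimal sketch, the first pattern can be wrapped in a helper and run against the two words from the question. The function name umlautEvenNextToLastO is made up for illustration, and the file is assumed to be saved as UTF-8 so that ö is written correctly.

```php
<?php
// Hypothetical helper wrapping the "even-numbered o" pattern from above.
function umlautEvenNextToLastO(string $word): string
{
    $pattern = '/^((?:(?:[^aeiou]*[aeiou]){2})*)' // 2n vowels, n >= 0
             . '([^aeiou]*[aeiou][^aeiou]*)'      // one more vowel (odd-numbered)
             . 'o'                                // the even-numbered vowel is o
             . '(?=[^aeiou]*[aeiou][^aeiou]*$)/'; // exactly one more vowel follows
    return preg_replace($pattern, '$1$2ö', $word);
}

echo umlautEvenNextToLastO('heabatoik'), PHP_EOL; // heabatöik (o is the 4th vowel)
echo umlautEvenNextToLastO('habatoik'), PHP_EOL;  // habatoik (o is the 3rd vowel)
```

Because the pattern is anchored with ^ and the lookahead ends in $, at most one o per word can ever be replaced.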
You can use preg_match_all to split the string into vowel/non-vowel parts and process that.
e.g. something like
preg_match_all("/([aeiou])|([^aeiou]+)/",
$in,
$out, PREG_PATTERN_ORDER);
Depending on your specific needs, you may need to modify the placement of ()*+? in the regex.
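The split-and-process idea could look roughly like the sketch below. It is only one way to do it: the helper name replaceNextToLastEvenO and the simplified token pattern (no capture groups) are my own assumptions, not part of the answer above.

```php
<?php
// Hypothetical helper: tokenize first, then apply the rule by vowel position.
function replaceNextToLastEvenO(string $in): string
{
    // Each token is either a single vowel or a run of non-vowels.
    preg_match_all('/[aeiou]|[^aeiou]+/', $in, $out);
    $tokens = $out[0];

    // Remember which token indexes hold vowels, in order.
    $vowelIndexes = [];
    foreach ($tokens as $i => $token) {
        if (strlen($token) === 1 && strpos('aeiou', $token) !== false) {
            $vowelIndexes[] = $i;
        }
    }

    $count = count($vowelIndexes);
    // With an odd vowel count, the next-to-last vowel is even-numbered.
    if ($count >= 3 && $count % 2 === 1) {
        $target = $vowelIndexes[$count - 2];
        if ($tokens[$target] === 'o') {
            $tokens[$target] = 'ö';
        }
    }
    return implode('', $tokens);
}

echo replaceNextToLastEvenO('heabatoik'), PHP_EOL; // heabatöik
echo replaceNextToLastEvenO('habatoik'), PHP_EOL;  // habatoik
```

Working on tokens keeps the positional rule in plain PHP, which is easier to adjust than one large regex if more rewrite rules are added later.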
I'd like to expand on Schmitt's answer. (I don't have enough points to add a comment; I'm not trying to steal his thunder.) I would use the PREG_OFFSET_CAPTURE flag, as it returns not only the vowels but also their locations. This is my solution:
const LETTER = 0;
const LOCATION = 1;
$string = 'heabatoik';
preg_match_all('/[aeiou]/', $string, $out, PREG_OFFSET_CAPTURE);
$vowels = $out[0]; // list of [letter, offset] pairs
$lastElement = count($vowels) - 1; // -1 because the index is 0-based
// if the second-to-last vowel's location in the string is even
// and the second-to-last vowel is beside the last vowel
if ($lastElement >= 1 &&
$vowels[$lastElement - 1][LOCATION] % 2 == 0 &&
$vowels[$lastElement - 1][LOCATION] + 1 == $vowels[$lastElement][LOCATION])
$string = substr_replace($string, 'ö', $vowels[$lastElement - 1][LOCATION], 1);
Note: this is what the matches look like:
preg_match_all('/[aeiou]/', 'heabatoik', $out, PREG_OFFSET_CAPTURE);
print_r($out);
Array
(
[0] => Array
(
[0] => Array
(
[0] => e
[1] => 1
)
[1] => Array
(
[0] => a
[1] => 2
)
[2] => Array
(
[0] => a
[1] => 4
)
[3] => Array
(
[0] => o
[1] => 6
)
[4] => Array
(
[0] => i
[1] => 7
)
)
)
This is how I would do it:
$str = 'heabatoik';
$vowels = preg_replace('#[^aeiou]+#i', '', $str);
$length = strlen($vowels);
if ( $length % 2 && $vowels[$length - 2] == 'o' ) {
$str = preg_replace('#o([^o]+)$#', 'ö$1', $str);
}
I have about 200 lines in text file
values can be like
$array = ['.1','1.5','0.10','.8'....];
I am looking for a specific regex pattern to replace values of the form .number, like '.1' or '.8':
preg_replace('/\.(\d+)/','0${0}',$array[0]);
It works, but for the value 1.5 the output is 10.5, which is wrong.
If all of your values are float or integer expressions, then you can bluntly use "start of string followed by literal dot": (Demo)
$array = ['.1', '1.5', '0.10', '.8'];
var_export(preg_replace('/^\./', '0.', $array));
If you need to make sure that the dot at the start of the string is followed by a digit, you could add a lookahead: (Demo)
var_export(preg_replace('/^\.(?=\d)/', '0.', $array));
Either way, you don't need to leverage any capture groups or backreferences.
It may be useful for you to know that preg_replace() will happily iterate your array on its own -- no additional loop needs to be written.
If you'd like to directly manipulate your txt file, just add an m pattern modifier so that ^ means the start of each line. (Demo)
$txt = <<<TEXT
.1
1.5
0.10
.8
TEXT;
echo preg_replace('/^\.(?=\d)/m', '0.', $txt);
Output:
0.1
1.5
0.10
0.8
Using \.(\d+) gets a partial match in 1.5: it matches .5
The replacement is then the untouched 1, plus 0, plus .5, resulting in 10.5
Instead, you can replace only the values that start with a dot, matching a single following digit: ^\.\d
If it should be the only value on the line, you can match all the digits and append an anchor for the end of the string as well: ^\.\d+$
$array = ['.1','1.5','0.10','.8'];
foreach($array as $a) {
echo preg_replace('/^\.\d/','0$0', $a). PHP_EOL;
}
Output
0.1
1.5
0.10
0.8
See a PHP demo
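To make the difference concrete, here is a small sketch that runs the unanchored pattern from the question and the fully anchored pattern side by side on the same array. The variable names are just for illustration.

```php
<?php
$values = ['.1', '1.5', '0.10', '.8'];

// Unanchored: the pattern also matches the ".5" inside "1.5"
// and the ".10" inside "0.10".
$unanchored = preg_replace('/\.(\d+)/', '0${0}', $values);

// Anchored at both ends: only values that are exactly ".digits" change.
$anchored = preg_replace('/^\.\d+$/', '0$0', $values);

print_r($unanchored); // 0.1, 10.5, 00.10, 0.8
print_r($anchored);   // 0.1, 1.5, 0.10, 0.8
```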
If you have an array of values similar to the one you presented, then this can be solved without regular expressions. It's easy enough to use floatval; for example, floatval('.8') will return 0.8. A general example is below:
$input = ['.1','1.5','0.10','.8'];
$output = array_map('floatval', $input);
print_r($output); // => Array ( [0] => 0.1 [1] => 1.5 [2] => 0.1 [3] => 0.8 )
Or if you need to strictly have strings:
$input = ['.1','1.5','0.10','.8'];
$output = array_map(function($item) {
return strval(floatval($item));
}, $input);
print_r($output); // => Array ( [0] => 0.1 [1] => 1.5 [2] => 0.1 [3] => 0.8 )
Or a short version with support for arrow functions (PHP >= 7.4):
$input = ['.1','1.5','0.10','.8'];
$output = array_map(fn($item) => strval(floatval($item)), $input);
print_r($output); // => Array ( [0] => 0.1 [1] => 1.5 [2] => 0.1 [3] => 0.8 )
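One thing the outputs above already show: going through floatval turns '0.10' into 0.1, so trailing zeros are lost. If the original formatting matters, a regex-based replacement leaves it alone. A quick sketch comparing the two (variable names are only illustrative, and the arrow function needs PHP >= 7.4):

```php
<?php
$input = ['.1', '1.5', '0.10', '.8'];

// Round trip through float: normalizes the value, drops trailing zeros.
$viaFloat = array_map(fn($item) => strval(floatval($item)), $input);

// Regex: only prepends a 0 to a leading dot, everything else is untouched.
$viaRegex = preg_replace('/^\.(?=\d)/', '0.', $input);

print_r($viaFloat); // 0.1, 1.5, 0.1, 0.8
print_r($viaRegex); // 0.1, 1.5, 0.10, 0.8
```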
If you need to normalize such values inside the text, then you can use preg_replace_callback + floatval
$input = "Test text: .1 + 1.5 = 1.6 and 000.10 - .8 = .2";
$output = preg_replace_callback("|\d*\.\d+|", fn($item) => floatval($item[0]), $input);
print_r($output); // => "Test text: 0.1 + 1.5 = 1.6 and 0.1 - 0.8 = 0.2"
You are not checking the context before the . char.
You can check if the ' char is present before the . + digits:
preg_replace("/'\K\.\d+/",'0$0',$array[0]);
See the regex demo. Details:
' - a ' char
\K - forget the ' matched char
\. - a . char
\d+ - one or more digits
The 0$0 pattern replaces with 0 + the whole match value. Note the unambiguous ${0} backreference form is only required if the backreference is followed with a literal digit, not when it is preceded with it.
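A short sketch of that last point, using made-up inputs. The patterns here are only for demonstration and are not part of the solution above.

```php
<?php
// A literal 0 BEFORE the reference is unambiguous, so plain $0 is fine.
$prefixed = preg_replace('/^\.\d+/', '0$0', '.8');

// A literal digit AFTER the reference needs braces:
// '${1}1' means "group 1, then a literal 1".
$braced = preg_replace('/(\d)/', '${1}1', '5');

// Without braces, '$11' is read as a reference to group 11,
// which this pattern does not have.
$ambiguous = preg_replace('/(\d)/', '$11', '5');

echo $prefixed, PHP_EOL;        // 0.8
echo $braced, PHP_EOL;          // 51
var_dump($ambiguous === '51');  // bool(false)
```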
I have a database full of strings that I'd like to split into an array. Each string contains a list of directions that begin with a letter (U, D, L, R for Up, Down, Left, Right) and a number to tell how far to go in that direction.
Here is an example of one string.
$string = "U29R45U2L5D2L16";
My desired result:
['U29', 'R45', 'U2', 'L5', 'D2', 'L16']
I thought I could just loop through the string, but I don't know how to tell if the number is one or more spaces in length.
You can use preg_split to break up the string, splitting on something which looks like a U,L,D or R followed by numbers and using the PREG_SPLIT_DELIM_CAPTURE to keep the split text:
$string = "U29R45U2L5D2L16";
print_r(preg_split('/([UDLR]\d+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY));
Output:
Array (
[0] => U29
[1] => R45
[2] => U2
[3] => L5
[4] => D2
[5] => L16
)
Demo on 3v4l.org
A regular expression should help you:
<?php
$string = "U29R45U2L5D2L16";
preg_match_all("/[A-Z]\d+/", $string, $matches);
var_dump($matches);
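Note that preg_match_all fills $matches with one sub-array per group, so the list you want is in $matches[0]. As a sketch, assuming you also want the letter and the number separately (the capture groups and variable names are my additions, and the letters are limited to U, D, L, R as in the question):

```php
<?php
$string = "U29R45U2L5D2L16";

// Group 1 captures the direction letter, group 2 the distance.
preg_match_all('/([UDLR])(\d+)/', $string, $matches);

$steps     = $matches[0];                       // full matches
$letters   = $matches[1];                       // directions only
$distances = array_map('intval', $matches[2]);  // distances as integers

print_r($steps); // U29, R45, U2, L5, D2, L16
```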
Because this task is about text extraction and not about text validation, you can merely split on the zero-width position after one or more digits. In other words, match one or more digits, then forget them with \K so that they are not consumed while splitting.
Code: (Demo)
$string = "U29R45U2L5D2L16";
var_export(
preg_split(
'/\d+\K/',
$string,
0,
PREG_SPLIT_NO_EMPTY
)
);
Output:
array (
0 => 'U29',
1 => 'R45',
2 => 'U2',
3 => 'L5',
4 => 'D2',
5 => 'L16',
)
I've been trying for a couple of days to split a string into letters and numbers. I've found various solutions, but they do not work up to my expectations (some of them only separate letters from digits, not from integers, float numbers or negative numbers).
Here's an example:
$input = '-4D-3A'; // edit: the TEXT part can have multiple chars, i.e. -4AB-3A-5SD
$result = preg_split('/(?<=\d)(?=[a-z])|(?<=[a-z])(?=\d)/i', $input);
print_r($result);
Result:
Array ( [0] => -4 [1] => D-3 [2] => A )
And I need it to be [0] => -4 [1] => D [2] => -3 [3] => A
I've tried doing several changes but no result so far, could you please help me if possible?
Thank you.
try this:
$input = '-4D-3A';
$result = preg_split('/(-?[0-9]+\.?[0-9]*)/i', $input, 0, PREG_SPLIT_DELIM_CAPTURE);
$result=array_filter($result);
print_r($result);
It will split by numbers BUT also capture the delimiter (number)
giving: Array ( [1] => -4 [2] => D [3] => -3 [4] => A )
I've patterned the number as:
1. has optional negative sign (you may want to do + too)
2. followed by one or more digits
3. followed by an optional decimal point
4. followed by zero or more digits
Can anyone point out the solution to "-0." being valid number?
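One possible variation, offered as a sketch rather than a definitive fix: make the decimal point and the digits after it optional as a unit, so "-0." is not consumed as a number, and use PREG_SPLIT_NO_EMPTY instead of array_filter (array_filter would also throw away a piece that is exactly "0"). The function name splitNumbersAndText is hypothetical.

```php
<?php
// Hypothetical helper: split into signed numbers and the text between them.
function splitNumbersAndText(string $input): array
{
    return preg_split(
        '/(-?\d+(?:\.\d+)?)/', // optional sign, digits, optional ".digits"
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

print_r(splitNumbersAndText('-4D-3A'));        // -4, D, -3, A
print_r(splitNumbersAndText('-4AB-3.5A-5SD')); // -4, AB, -3.5, A, -5, SD
print_r(splitNumbersAndText('0A'));            // 0, A
```

With PREG_SPLIT_NO_EMPTY the result is also indexed from 0 without gaps.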
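How about this regex? ([-]{,1}\d+|[a-zA-Z]+)
I tested it out on http://www.rubular.com/ and it seems to work as you want. Note that Rubular uses Ruby's regex engine; in PHP the optional sign is more safely written as -?, since PCRE has not always treated {,1} as a quantifier.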
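In PHP, a matching approach along those lines could be sketched like this. It assumes integers only, and it spells the optional sign as -? instead of [-]{,1}.

```php
<?php
$input = '-4D-3A';

// Match either an optionally negative integer or a run of letters.
preg_match_all('/-?\d+|[a-zA-Z]+/', $input, $m);
$result = $m[0];

print_r($result); // -4, D, -3, A
```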
So I know if you pass in the flag PREG_OFFSET_CAPTURE you get the index of the regex match in the original "haystack", but what if I want the index of the match within the whole match?
Simple example:
Original String: "Have a <1 + 2> day today"
My regular expression /<1 ([+|-]) 2>/
So in the example I am matching whatever symbol is between the 1 and 2. If I did this in preg_match with the PREG_OFFSET_CAPTURE flag, the index for the matched symbol would be 10. I really would like it to return 3 though.
Is there any way to do this?
The only way is to subtract the whole pattern offset (7) from the capturing group offset (10): 10-7=3
$group_offset = $matches[1][1] - $matches[0][1];
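Put together with the example from the question, a minimal sketch looks like this (variable names other than $matches are just for illustration):

```php
<?php
$subject = 'Have a <1 + 2> day today';

// With PREG_OFFSET_CAPTURE each entry is [matched text, offset in $subject].
preg_match('/<1 ([+|-]) 2>/', $subject, $matches, PREG_OFFSET_CAPTURE);

$whole_offset = $matches[0][1];                 // where "<1 + 2>" starts
$group_offset = $matches[1][1] - $whole_offset; // position of "+" inside it

echo $whole_offset, PHP_EOL; // 7
echo $group_offset, PHP_EOL; // 3
```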
You could use a more tricky way by using preg_replace_callback:
$string = 'I have a <1 + 2> day today and a foo <4 - 1> week.';
$match = array();
preg_replace_callback('/<\d+ ([+|-]) \d+>/', function($m)use(&$match){
$match[] = array($m[0], $m[1], strpos($m[0], $m[1]));
}, $string); // PHP 5.3+ required (anonymous function)
print_r($match);
Output:
Array
(
[0] => Array
(
[0] => <1 + 2>
[1] => +
[2] => 3
)
[1] => Array
(
[0] => <4 - 1>
[1] => -
[2] => 3
)
)
Let's say I have random block of text:
EAMoAAQAABwEBAAAAAAAAAAAAAAABAgMFBgcIBAkBAQABBQEBAAAAAAAAAAAAAAAGAgMEBQcBCBAAAQMDAgMEBQcIBQgGCwEAAQACAxEEBSEGMRIHQVFhE3GBIhQIkaGxwTJCI9FScoKSojMV8GLCUxbhstKDo7M0ZHOTJEQlF/HiQ2PDVHSExEUmGBEBAAIBAgMDCAgCCgMBAQEAAAECAxEEITEFQRIGUWFxgZGhIhPwscHRMlIUB0Jy4fGCkqLCI1MVFrLSQ2IzF//aAAwDAQACEQMRAD8A7+QEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEEDwXkzpxHgusxi7NrnXF3G0NBLhzAkAeAqVH934r6bt57uTPSJ8ne1n2Rqycezy35VlRttwYu5DXNlLOcczOdpHM3hUUqtLs/wBxulZonXJ8vjp8caa+eOa5k6flrPLVcIbm3n/gytf4NcCVKtj1XbbqNcOSuT+W0W+pi3x2rzjRWWxUCAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAggV5It2Uy8GNYAWmW6kr5MDftO8T3BRXxR4s2/SccTb48lvw0jnPnn8tfP6o1Ze02ds08OERzlid+/P5Orp5BHEeFuxxa0Dxpx9a+fOu+Iup9Tmfm30p+Ss92vr/N6bat/t67fDyjWfLLG79pt45YpAA8NdUAg9ngolTFNbedtqWi0avVicv5bLKFr2kSRltHaahrXCnylZcd6k208rDy4ItxlkUr5+XnZE1zxq0h3KfUQqv1GWsxeI0tHKY1rPtjRgVivKZU7HebrS491ybX+TWnO7V7PEn7w+f0rpPhb9zdxt7Rj3szkx/n/AI6+n88f4vTyebno8Wr3qTGvun7mawSxzsbNC4Pje0Oa9pqCD2grv+3z0zUi9Ji1bRrEx2wjtqzWdJ5wqq8pEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQU
SPECIFICATIONS:
patternABC >= 2 characters = groupABC IF groupABC occurs more than once
groupABC + (groupABC)n = sequence WHERE n >= 1 AND sequence > 6 characters
** A sequence needs to be > 6 characters in order to be evaluated
BREAKDOWN:
How do I find any repeating patterns that occur in sequence?
QEBAQEBAQEBAQEBAQEBAQEBA
I also want to count how many times each group repeats:
QEBA QEBA QEBA QEBA QEBA QEBA = 6
Also the sequence must be > 6 characters in order to be evaluated:
NO GOOD: AA AA AA
GOOD: AA AA AA AA
It would be ideal if the output could be stored in an associative array, with duplicate entries removed:
QEBA => 6, AA => 4, QEBA => 3, AA => 8, (QEBA => 6)<- REMOVE
Does anyone have the time & the inclination to tackle this problem?
You rock if you do!
$str = 'EAMoAAQAABwEBAAAAAAAAAAAAAAABAgMFBgcIBAkBAQABBQEBAAAAAAAAAAAAAAAGAgMEBQcBCBAAAQMDAgMEBQcIBQgGCwEAAQACAxEEBSEGMRIHQVFhE3GBIhQIkaGxwTJCI9FScoKSojMV8GLCUxbhstKDo7M0ZHOTJEQlF/HiQ2PDVHSExEUmGBEBAAIBAgMDCAgCCgMBAQEAAAECAxEEITEFQRIGUWFxgZGhIhPwscHRMlIUB0Jy4fGCkqLCI1MVFrLSQ2IzF//aAAwDAQACEQMRAD8A7+QEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEEDwXkzpxHgusxi7NrnXF3G0NBLhzAkAeAqVH934r6bt57uTPSJ8ne1n2Rqycezy35VlRttwYu5DXNlLOcczOdpHM3hUUqtLs/wBxulZonXJ8vjp8caa+eOa5k6flrPLVcIbm3n/gytf4NcCVKtj1XbbqNcOSuT+W0W+pi3x2rzjRWWxUCAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAggV5It2Uy8GNYAWmW6kr5MDftO8T3BRXxR4s2/SccTb48lvw0jnPnn8tfP6o1Ze02ds08OERzlid+/P5Orp5BHEeFuxxa0Dxpx9a+fOu+Iup9Tmfm30p+Ss92vr/N6bat/t67fDyjWfLLG79pt45YpAA8NdUAg9ngolTFNbedtqWi0avVicv5bLKFr2kSRltHaahrXCnylZcd6k208rDy4ItxlkUr5+XnZE1zxq0h3KfUQqv1GWsxeI0tHKY1rPtjRgVivKZU7HebrS491ybX+TWnO7V7PEn7w+f0rpPhb9zdxt7Rj3szkx/n/AI6+n88f4vTyebno8Wr3qTGvun7mawSxzsbNC4Pje0Oa9pqCD2grv+3z0zUi9Ji1bRrEx2wjtqzWdJ5wqq8pEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQU';
preg_match_all( '/(\S{2,}?)\1+/', $str, $matches );
// Remove duplicates
$matches[0] = array_unique( $matches[0] );
$results = array();
foreach ( $matches[0] as $key => $value ) {
if ( strlen( $value ) > 6 ) {
$repeated = $matches[1][$key];
$results[] = array( $repeated => count( explode( $repeated, $value ) ) - 1 );
}
}
print_r($results);
/*
[AA] => 7
[QEBA] => 93
[CAgI] => 18
[EBAQ] => 18
*/
The above assumes a sequence is composed of non-space characters.
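If a flat associative array is what you are after, here is a sketch of the same idea on a short, made-up test string. Two things are assumptions on my part: the helper name countRepeatedGroups, and keeping the largest count when the same group shows up in more than one sequence (an associative array cannot hold the same key twice).

```php
<?php
// Hypothetical helper: group => number of repeats, for sequences > 6 chars.
function countRepeatedGroups(string $text): array
{
    // Same pattern as above; PREG_SET_ORDER yields [sequence, group] pairs.
    preg_match_all('/(\S{2,}?)\1+/', $text, $matches, PREG_SET_ORDER);

    $results = [];
    foreach ($matches as [$sequence, $group]) {
        if (strlen($sequence) <= 6) {
            continue; // too short to be evaluated
        }
        $count = substr_count($sequence, $group);
        // Assumption: keep the highest count seen for each group.
        $results[$group] = max($results[$group] ?? 0, $count);
    }
    return $results;
}

print_r(countRepeatedGroups('QEBAQEBAQEBA AAAAAAAA AAAAAA QEBAQEBAQEBA'));
// QEBA => 3, AA => 4 (the six-character run of A is skipped)
```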
Get the sequences with preg_match_all('/(?:(.{6,})\1)/',$inputText,$sequences)
(note: sequences will be saved in $sequences)
Explained RegEx demo: http://regex101.com/r/rW4nE2
Use array_unique() to get rid of duplicates.
Loop through each sequence and:
Get the groups with preg_match_all('/(.+?)(\1)(\1)?/',$sequence,$groups)
Explained RegEx demo: http://regex101.com/r/pC3pB7
Use count() if you need to.