Convert string with no delimiters into associative multidimensional array - php

I need to parse a string that has no delimiting character to form an associative array.
Here is an example string:
*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times
Every "key" (which precedes its "value") is comprised of an asterisk (*) followed by two alphanumeric characters.
I use this regex pattern: /\*[A-Z0-9]{2}/
This is my preg_split() call:
$attributes = preg_split('/\*[A-Z0-9]{2}/', $line);
This works to isolate the "value", but I also need to extract the "key" to form my desired associative array.
What I get looks like this:
$matches = [
0 => 'the title',
1 => 'the author',
2 => 'other useless infos',
3 => 'other useful infos',
4 => 'some delimiters can be there multiple times'
];
My desired output is:
$matches = [
'*01' => 'the title',
'*35' => 'the author',
'*A7' => 'other useless infos',
'*AE' => [
'other useful infos',
'some delimiters can be there multiple times',
],
];

Use the PREG_SPLIT_DELIM_CAPTURE flag of the preg_split function to also get the captured delimiter (see documentation).
So in your case:
# The -1 is the limit parameter (no limit)
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);
Now you have element 0 of $attributes as everything before the first delimiter and then alternating the captured delimiter and the next group so you can build your $matches array like this (assuming that you do not want to keep the first group):
for($i=1; $i<sizeof($attributes)-1; $i+=2){
$matches[$attributes[$i]] = $attributes[$i+1];
}
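To make the alternating layout concrete, here is a small sketch on a shortened input (the two-field `$line` is an assumption for illustration, not the full string from the question). Because the string starts with a delimiter, element 0 is an empty string:

```php
<?php
// Shortened sample input (assumed for illustration)
$line = '*01the title*35the author';

// Wrapping the pattern in parentheses makes preg_split() return the delimiters too
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

var_export($attributes);
// Index 0 holds the text before the first delimiter (empty here),
// odd indexes hold the keys, even indexes hold the values.
```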
In order to account for delimiters being present multiple times you can adjust the line inside the for loop to check whether this key already exists and in that case create an array.
Edit: a possibility to create an array if necessary is to use this code:
$matches = [];
for($i=1; $i<sizeof($attributes)-1; $i+=2){
$key = $attributes[$i];
if(array_key_exists($key, $matches)){
if(!is_array($matches[$key])){
$matches[$key] = [$matches[$key]];
}
array_push($matches[$key], $attributes[$i+1]);
} else {
$matches[$attributes[$i]] = $attributes[$i+1];
}
}
The downstream code can certainly be simplified, especially if you put all values in (possibly single element) arrays.
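As a rough sketch of that simplification, assuming it is acceptable for every key to map to a list (even when there is only one value), the loop body can shrink to a single append:

```php
<?php
$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

$matches = [];
for ($i = 1; $i + 1 < count($attributes); $i += 2) {
    // $attributes[$i] is the captured key, $attributes[$i + 1] is its value.
    // Appending with [] creates the list on first use, so no is_array() check is needed.
    $matches[$attributes[$i]][] = $attributes[$i + 1];
}

print_r($matches);
```

The trade-off is that consumers always read `$matches['*01'][0]` instead of `$matches['*01']`, which is more uniform but differs from the output shape requested in the question.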

You may match and capture the keys into Group 1 and all the text before the next delimiter into Group 2 where the delimiter is not the same as the first one captured. Then, in a loop, check all the keys and values and split those values with the delimiter pattern where it appears one or more times.
The regex is
(\*[A-Z0-9]{2})(.*?)(?=(?!\1)\*[A-Z0-9]{2}|$)
See the regex demo.
Details
(\*[A-Z0-9]{2}) - Delimiter, Group 1: a * and two uppercase letters or digits
(.*?) - Value, Group 2: any 0+ chars other than line break chars, as few as possible
(?=(?!\1)\*[A-Z0-9]{2}|$) - up to the delimiter pattern (\*[A-Z0-9]{2}) that is not equal to the text captured in Group 1 ((?!\1)) or end of string ($).
See the PHP demo:
$re = '/(\*[A-Z0-9]{2})(.*?)(?=(?!\1)\*[A-Z0-9]{2}|$)/';
$str = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
$res = [];
if (preg_match_all($re, $str, $m, PREG_SET_ORDER, 0)) {
foreach ($m as $kvp) {
$tmp = preg_split('~\*[A-Z0-9]{2}~', $kvp[2]);
if (count($tmp) > 1) {
$res[$kvp[1]] = $tmp;
} else {
$res[$kvp[1]] = $kvp[2];
}
}
print_r($res);
}
Output:
Array
(
[*01] => the title
[*35] => the author
[*A7] => other useless infos
[*AE] => Array
(
[0] => other useful infos
[1] => some delimiters can be there multiple times
)
)

OK, I'm answering my own question on how to handle the same delimiter occurring multiple times.
Thanks to @markus-ankenbrand for the start:
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);
$matches = [];
for ($i = 1; $i < sizeof($attributes) - 1; $i += 2) {
if (isset($matches[$attributes[$i]]) && is_array($matches[$attributes[$i]])) {
$matches[$attributes[$i]][] = $attributes[$i + 1];
} elseif (isset($matches[$attributes[$i]]) && !is_array($matches[$attributes[$i]])) {
$currentValue = $matches[$attributes[$i]];
$matches[$attributes[$i]] = [$currentValue];
$matches[$attributes[$i]][] = $attributes[$i + 1];
} else {
$matches[$attributes[$i]] = $attributes[$i + 1];
}
}
The fat if/else statement does not look very nice, but it does what it needs to do.
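One possible way to slim the branching down while keeping the same output shape (scalars for single values, arrays for repeated keys) is to lean on the `(array)` cast, which wraps a string in a one-element array and leaves an array untouched. This is only a sketch; the function name `parseAttributes` is made up for illustration:

```php
<?php
// Hypothetical helper name; the regex is the one used in this thread.
function parseAttributes(string $line): array
{
    $parts = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);
    $result = [];
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        $key = $parts[$i];
        $value = $parts[$i + 1];
        if (!array_key_exists($key, $result)) {
            // First time we see this key: store the plain string
            $result[$key] = $value;
            continue;
        }
        // Repeated key: (array) turns a string into [string] and keeps arrays as they are
        $result[$key] = array_merge((array) $result[$key], [$value]);
    }
    return $result;
}

$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
var_export(parseAttributes($line));
```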

Here is a functional-style approach that doesn't require duplicate-keyed values to be consecutively written in the input string.
Use preg_match_all() to isolate the two components of each subexpression in the input string.
Use array_map() to replace each row of indexed match values with a single, associative element.
Use the spread operator (...) to unpack the newly modified matches array as individual associative arrays and feed that to array_merge_recursive(). The native behavior of array_merge_recursive() is to only create subarray structures where necessary.
Code: (Demo)
$str = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
var_export(
array_merge_recursive(
...array_map(
fn($row) => [$row[1] => $row[2]],
preg_match_all(
'/(\*[A-Z\d]{2})(.+?)(?=$|\*[A-Z\d]{2})/',
$str,
$m,
PREG_SET_ORDER
) ? $m : []
)
)
);
Output:
array (
'*01' => 'the title',
'*35' => 'the author',
'*A7' => 'other useless infos',
'*AE' =>
array (
0 => 'other useful infos',
1 => 'some delimiters can be there multiple times',
),
)
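To see the array_merge_recursive() behavior in isolation, here is a minimal sketch with hand-written rows (the `$rows` data is made up for illustration): string keys that occur once stay scalar, while repeated string keys are collected into a subarray.

```php
<?php
// Each row is a single-element associative array, as produced by the array_map() step
$rows = [
    ['*01' => 'the title'],
    ['*AE' => 'first'],
    ['*AE' => 'second'],
];

// Spread the rows as separate arguments; only the repeated key becomes a subarray
$merged = array_merge_recursive(...$rows);

var_export($merged);
```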

Related

PHP Regex with recursively filter string and append sub-string to array

I'm trying to extract the parts of a string AFTER a "/" character using PHP PCRE (regular expressions), NOT the PHP substr() function. I would like to test whether the initial string has multiple "/" characters using regular expressions together with preg_match() or preg_match_all().
I am able to select for a SINGLE iteration using a regular expression.
<?php
$rules = array(
'dbl' => "/(?'d'[^/]+)/(?'p'[^/]+)", // '.../a/a' DOUBLE ITERATION
'single' => "/(?'d'[\w\-]+)",// '.../a' SINGLE ITERATION
'multiple' => "" //MULTIPLE ITERATION
);
$string = "a/b/c/d/e";
foreach ( $rules as $action => $rule ) {
if ( preg_match_all( '~^'.$rule.'$~i', $string, $params ) ) {
switch ($action) {
case 'multiple':
$arr = explode("/", $string);
print_r($arr);
//do something
...
}
}
}
?>
I know this is because of my lack of sufficient knowledge of Regular Expressions, however, I need a dynamic Regex code to match the condition that the initial string has multiple "/" characters and then recursively store these substrings to an array.
I would approach this differently: I would first explode $string on / and then apply logic based on the number of elements in the results.
<?php
$string = "a/b/c/d/e";
$arr = explode("/", $string);
if (count($arr) > 2) {
print_r($arr);
// do something knowing there were 2 or more slashes in $string
}
?>
If you need different actions for 0, 1 or 2 slashes, add elseif blocks testing for fewer elements in $arr and put the corresponding actions there.
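A possible shape for that branching is sketched below. The function name `classifyBySlashes` and the returned labels are assumptions for illustration; they only loosely mirror the rule names from the question.

```php
<?php
// Hypothetical helper: decide which action applies from the number of slashes
function classifyBySlashes(string $string): string
{
    $parts = explode('/', $string);
    $slashes = count($parts) - 1; // n slashes always yield n + 1 parts

    if ($slashes >= 2) {
        return 'multiple';
    } elseif ($slashes === 1) {
        return 'single';
    }
    return 'none';
}

echo classifyBySlashes('a/b/c/d/e'), "\n";
echo classifyBySlashes('a/b'), "\n";
echo classifyBySlashes('a'), "\n";
```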
To answer the question, using Wiktor Stribiżew's regex code:
<?php
$rules = array(
'dbl' => "/(?'d'[^/]+)/(?'p'[^/]+)", // '.../a/a' DOUBLE ITERATION
'single' => "/(?'d'[\w\-]+)",// '.../a' SINGLE ITERATION
'multiple' => "/[^/]+(?:/[^/]+){2,}/?" //MULTIPLE ITERATION
);
$string = "a/b/c/d/e";
foreach ( $rules as $action => $rule ) {
if ( preg_match_all( '~^'.$rule.'$~i', $string, $params ) ) {
switch ($action) {
case 'multiple':
$arr = explode("/", $string);
print_r($arr);
//do something
...
}
}
}
?>
For others who reference this resource, kindly upvote Wiktor Stribiżew's answer once (and if) he posts it.

explode string on multiple words

There is a string like this:
$string = 'connector:rtp-monthly direction:outbound message:error writing data: xxxx yyyy zzzz date:2015-11-02 10:20:30';
This string is from user Input. So it will never have the same order. It's an input field which I need to split to build a DB query.
Now I would like to split the string based on words given in an array() that acts as a mapper containing the words I need to find in the string. It looks like this:
$mapper = array(
'connector' => array('type' => 'string'),
'direction' => array('type' => 'string'),
'message' => array('type' => 'string'),
'date' => array('type' => 'date'),
);
Only the keys of the $mapper will be relevant. I've tried with foreach and explode like:
$parts = explode(':', $string);
But the problem is: there can be colons elsewhere in the string, and I must not explode there. I only need to split where a colon directly follows a mapper key. The mapper keys in this case are:
connector // in this case split if "connector:" is found
direction // until "direction:" is found
message // until "message:" is found
date // until "date:" is found
But remember, the user input can vary. The string will always change, and the order of the string and of the mapper array() will never be the same. So I'm not sure whether explode is the right way to go or whether I should use a regex, and if so, how to do it.
The desired result should be an array looking something like this:
$desired_result = array(
'connector' => 'rtp-monthly',
'direction' => 'outbound',
'message' => 'error writing data: xxxx yyyy zzzz',
'date' => '2015-11-02 10:20:30',
);
Help is much appreciated.
The trickier part of this is matching the original string. You can do it with a regex with the help of positive lookahead assertions:
$pattern = "/(connector|direction|message|date):(.+?)(?= connector:| direction:| message:| date:|$)/";
$subject = 'connector:rtp-monthly direction:outbound message:error writing data: xxxx yyyy zzzz date:2015-11-02 10:20:30';
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER );
$returnArray = array();
foreach($matches as $item)
{
$returnArray[$item[1]] = $item[2];
}
In this Regex /(connector|direction|message|date):(.+?)(?= connector:| direction:| message:| date:|$)/, you're matching:
(connector|direction|message|date) - find a keyword and capture it;
: - followed by a colon;
(.+?) - followed by any character many times non greedy, and capture it;
(?= connector:| direction:| message:| date:|$) - up until the next keyword or the end of the string, using a non-capturing look-ahead positive assertion.
The result is:
Array
(
[connector] => rtp-monthly
[direction] => outbound
[message] => error writing data: xxxx yyyy zzzz
[date] => 2015-11-02 10:20:30
)
I didn't use the mapper array just to make the example clear, but you could use implode to put the keywords together.
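A sketch of that implode() idea follows, assuming the `$mapper` array from the question and that the mapper keys should be escaped with preg_quote() before they go into the pattern (the escaping step is an addition, not part of the original answer):

```php
<?php
// Mapper and input copied from the question
$mapper = [
    'connector' => ['type' => 'string'],
    'direction' => ['type' => 'string'],
    'message'   => ['type' => 'string'],
    'date'      => ['type' => 'date'],
];
$subject = 'connector:rtp-monthly direction:outbound message:error writing data: xxxx yyyy zzzz date:2015-11-02 10:20:30';

// Escape each key and join them into one alternation: connector|direction|message|date
$keys = implode('|', array_map(fn($key) => preg_quote($key, '/'), array_keys($mapper)));

// Same structure as the hand-written pattern: key, colon, lazy value, lookahead for " key:" or end
$pattern = '/(' . $keys . '):(.+?)(?= (?:' . $keys . '):|$)/';

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

$result = [];
foreach ($matches as $item) {
    $result[$item[1]] = $item[2];
}
print_r($result);
```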
Our aim is to make one array that contains the values of two arrays that we extract from the string. Two arrays are necessary because there are two string delimiters to consider.
Try this:
$parts = array();
$large_parts = explode(" ", $string);
for($i=0; $i<count($large_parts); $i++){
$small_parts = explode(":", $large_parts[$i]);
$parts[$small_parts[0]] = $small_parts[1];
}
$parts should now contain the desired array, provided that no value contains a space or an additional colon.
Hope you get sorted out.
Here you are. The regex is there to "catch" the key (any sequence of characters, excluding whitespace and ":"). Starting from there, I use explode() to "recursively" split the string. Tested and works well.
$string = 'connector:rtp-monthly direction:outbound message:error writing data date:2015-11-02';
$element = "(.*?):";
preg_match_all( "/([^\s:]*?):/", $string, $matches);
$result = array();
$keys = array();
$values = array();
$counter = 0;
foreach( $matches[0] as $id => $match ) {
$exploded = explode( $matches[ 0 ][ $id ], $string );
$keys[ $counter ] = $matches[ 1 ][ $id ];
if( $counter > 0 ) {
$values[ $counter - 1 ] = $exploded[ 0 ];
}
$string = $exploded[ 1 ];
$counter++;
}
$values[] = $string;
$result = array();
foreach( $keys as $id => $key ) {
$result[ $key ] = $values[ $id ];
}
print_r( $result );
You could use a combination of a regular expression and explode(). Consider the following code:
$str = "connector:rtp-monthly direction:outbound message:error writing data date:2015-11-02";
$regex = "/([^:\s]+):(\S+)/i";
// first group: match any character except ':' and whitespaces
// delimiter: ':'
// second group: match any character which is not a whitespace
// will not match writing and data
preg_match_all($regex, $str, $matches);
$mapper = array();
foreach ($matches[0] as $match) {
list($key, $value) = explode(':', $match, 2);
$mapper[$key][] = $value;
}
Additionally, you might want to think of a better way to store the strings in the first place (JSON? XML?).
Using preg_split() to explode() by multiple delimiters in PHP
Just a quick note here. To explode() a string using multiple delimiters in PHP, you will have to make use of regular expressions. Use the pipe character to separate your delimiters.
$string = 'connector:rtp-monthly direction:outbound message:error writing data: xxxx yyyy zzzz date:2015-11-02 10:20:30';
$chunks = preg_split('/(connector|direction|message)/',$string,-1, PREG_SPLIT_NO_EMPTY);
// Print_r to check response output.
echo '<pre>';
print_r($chunks);
echo '</pre>';
PREG_SPLIT_NO_EMPTY – To return only non-empty pieces.
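The snippet above splits on the bare keywords, so the keys themselves are dropped from the chunks. As a hedged variant (not the original author's code), one could split on the whitespace in front of each keyword using a lookahead, which keeps "key:value" together, and then separate key and value with explode() limited to two parts. The keyword list here also includes `date`, which is an assumption taken from the question's mapper:

```php
<?php
$string = 'connector:rtp-monthly direction:outbound message:error writing data: xxxx yyyy zzzz date:2015-11-02 10:20:30';

// Split only on whitespace that is followed by "keyword:"; the lookahead consumes nothing,
// so each chunk still starts with its keyword.
$chunks = preg_split('/\s+(?=(?:connector|direction|message|date):)/', $string);

$result = [];
foreach ($chunks as $chunk) {
    // Limit of 2 keeps any further colons inside the value
    [$key, $value] = explode(':', $chunk, 2);
    $result[$key] = $value;
}
print_r($result);
```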

Highlight match result in subject string from preg_match_all()

I am trying to highlight the subject string with the returned $matches array from preg_match_all(). Let me start off with an example:
preg_match_all("/(.)/", "abc", $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
This will return:
Array
(
[0] => Array
(
[0] => Array
(
[0] => a
[1] => 0
)
[1] => Array
(
[0] => a
[1] => 0
)
)
[1] => Array
(
[0] => Array
(
[0] => b
[1] => 1
)
[1] => Array
(
[0] => b
[1] => 1
)
)
[2] => Array
(
[0] => Array
(
[0] => c
[1] => 2
)
[1] => Array
(
[0] => c
[1] => 2
)
)
)
What I want to do in this case is to highlight the overall consumed data AND each backreference.
Output should look like this:
<span class="match0">
<span class="match1">a</span>
</span>
<span class="match0">
<span class="match1">b</span>
</span>
<span class="match0">
<span class="match1">c</span>
</span>
Another example:
preg_match_all("/(abc)/", "abc", $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
Should return:
<span class="match0"><span class="match1">abc</span></span>
I hope this is clear enough.
I want to highlight overall consumed data AND highlight each backreference.
Thanks in advance. If anything is unclear, please ask.
Note: It must not break html. The regex AND input string are both unknown by the code and completely dynamic. So the search string can be html and the matched data can contain html-like text and what not.
This seems to behave right for all the examples I've thrown at it so far. Note that I've broken the abstract highlighting part from the HTML-mangling part for reusability in other situations:
<?php
/**
* Runs a regex against a string, and returns a version of that string with matches highlighted
* the outermost match is marked with [0]...[/0], the first sub-group with [1]...[/1] etc
*
* @param string $regex Regular expression ready to be passed to preg_match_all
* @param string $input
* @return string
*/
function highlight_regex_matches($regex, $input)
{
$matches = array();
preg_match_all($regex, $input, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
// Arrange matches into groups based on their starting and ending offsets
$matches_by_position = array();
foreach ( $matches as $sub_matches )
{
foreach ( $sub_matches as $match_group => $match_data )
{
$start_position = $match_data[1];
$end_position = $start_position + strlen($match_data[0]);
$matches_by_position[$start_position]['START'][] = $match_group;
$matches_by_position[$end_position]['END'][] = $match_group;
}
}
// Now proceed through that array, annotating the original string
// Note that we have to pass through BACKWARDS, or we break the offset information
$output = $input;
krsort($matches_by_position);
foreach ( $matches_by_position as $position => $matches )
{
$insertion = '';
// First, assemble any ENDING groups, nested highest-group first
if ( isset($matches['END']) )
{
krsort($matches['END']);
foreach ( $matches['END'] as $ending_group )
{
$insertion .= "[/$ending_group]";
}
}
// Then, any STARTING groups, nested lowest-group first
if ( isset($matches['START']) )
{
ksort($matches['START']);
foreach ( $matches['START'] as $starting_group )
{
$insertion .= "[$starting_group]";
}
}
// Insert into output
$output = substr_replace($output, $insertion, $position, 0);
}
return $output;
}
/**
* Given a regex and a string containing unescaped HTML, return a blob of HTML
* with the original string escaped, and matches highlighted using <span> tags
*
* @param string $regex Regular expression ready to be passed to preg_match_all
* @param string $raw_html
* @return string HTML ready to display :)
*/
function highlight_regex_as_html($regex, $raw_html)
{
// Add the (deliberately non-HTML) highlight tokens
$highlighted = highlight_regex_matches($regex, $raw_html);
// Escape the HTML from the input
$highlighted = htmlspecialchars($highlighted);
// Substitute the match tokens with desired HTML
$highlighted = preg_replace('#\[([0-9]+)\]#', '<span class="match\\1">', $highlighted);
$highlighted = preg_replace('#\[/([0-9]+)\]#', '</span>', $highlighted);
return $highlighted;
}
NOTE: As hakra has pointed out to me on chat, if a sub-group in the regex can occur multiple times within one overall match (e.g. '/a(b|c)+/'), preg_match_all will only tell you about the last of those matches - so highlight_regex_matches('/a(b|c)+/', 'abc') returns '[0]ab[1]c[/1][/0]' not '[0]a[1]b[/1][1]c[/1][/0]' as you might expect/want. All matching groups outside that will still work correctly though, so highlight_regex_matches('/a((b|c)+)/', 'abc') gives '[0]a[1]b[2]c[/2][/1][/0]' which is still a pretty good indication of how the regex matched.
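The limitation described in the note can be checked directly with a small sketch: for a repeated group, preg_match_all() reports only the last iteration of that group.

```php
<?php
// Group 1 is repeated by the + quantifier, so it matches "b" and then "c"
preg_match_all('/a(b|c)+/', 'abc', $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

$whole  = $matches[0][0]; // overall match with its offset
$group1 = $matches[0][1]; // only the LAST iteration of group 1 is kept

var_export($whole);
var_export($group1);
```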
Reading your comment under the first answer, I'm pretty sure you did not really formulate the question as you intended to. However, following what you concretely ask for, that is:
$pattern = "/(.)/";
$subject = "abc";
$callback = function($matches) {
if ($matches[0] !== $matches[1]) {
throw new InvalidArgumentException(
sprintf('you do not match the requirements, go away: %s'
, print_r($matches, 1))
);
}
return sprintf('<span class="match0"><span class="match1">%s</span></span>'
, htmlspecialchars($matches[1]));
};
$result = preg_replace_callback($pattern, $callback, $subject);
Before you start to complain, first take a look at where your description of the problem falls short. I have the feeling you actually want to parse the result for matches. However, you want to do sub-matches. That does not work unless you also parse the regular expression to find out which groups are used. That is not the case so far, neither in your question nor in this answer.
So please treat this example as working only for one subgroup, which must also be the whole pattern, as a requirement. Apart from that, this is fully dynamic.
Related:
How to get all captures of subgroup matches with preg_match_all()?
Ignore html tags in preg_replace
I am not too familiar with posting on Stack Overflow, so I hope I don't mess this up. I do this in almost the same way as @IMSoP, though slightly differently:
I store the tags like this:
$tags[ $matched_pos ]['open'][$backref_nr] = "open tag";
$tags[ $matched_pos + $len ]['close'][$backref_nr] = "close tag";
As you can see, almost identical to @IMSoP.
Then I construct the string like this, instead of inserting and sorting like @IMSoP does:
$finalStr = "";
for ($i = 0; $i <= strlen($text); $i++) {
if (isset($tags[$i])) {
foreach ($tags[$i] as $tag) {
foreach ($tag as $span) {
$finalStr .= $span;
}
}
}
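// The loop runs one step past the last character so closing tags at the very end are emitted
if ($i < strlen($text)) {
$finalStr .= $text[$i];
}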
}
Where $text is the text used in preg_match_all()
I think my solution is slightly faster than @IMSoP's since he has to sort every time, but I am not sure.
My main worry right now is performance. But it might just not be possible to make it work any faster than this?
I have been trying to get a recursive preg_replace_callback() thing going, but I've not been able to make it work so far. preg_replace_callback() seems to be very, very fast. Much faster than what I am currently doing anyway.
A quick mashup, why use regex?
$content = "abc";
$endcontent = "";
for($i = 0; $i < strlen($content); $i++)
{
$endcontent .= "<span class=\"match0\"><span class=\"match1\">" . $content[$i] . "</span></span>";
}
echo $endcontent;

Search an Array and remove entry if it doesn't contain A-Z or a A-Z with a dash

I have the following Array:
Array
(
[0] => text
[1] => texture
[2] => beans
[3] =>
)
I am wanting to get rid of entries that don't contain a-z or a-z with a dash. In this case array item 3 (contains just a space).
How would I do this?
Try with:
$input = array( /* your data */ );
function azFilter($var){
return preg_match('/^[a-z-]+$/i', $var);
}
$output = array_filter($input, 'azFilter');
Also, since PHP 5.3 it is possible to simplify this with an anonymous function:
$input = array( /* your data */ );
$output = array_filter($input, function($var){
return preg_match('/^[a-z-]+$/i', $var);
});
Try:
<?php
$arr = array(
'abc',
'testing-string',
'testing another',
'sdas 213',
'2323'
);
$tmpArr = array();
foreach($arr as $str){
if(preg_match("/^([-a-z])+$/i", $str)){
$tmpArr[] = $str;
}
}
$arr = $tmpArr;
?>
Output:
array
0 => string 'abc' (length=3)
1 => string 'testing-string' (length=14)
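For the data you have provided in your question, use the array_filter() function without a callback parameter. This filters out all empty elements, that is, values PHP treats as false such as '' or null. Note that a string containing only a space is not empty in that sense, so trim the values first if such entries must go too.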
$array = array( ... );
$array = array_filter($array);
If you need to filter the elements you described in your question text, then you need to add a callback function that will return true (valid) or false (invalid) depending on what you need. You might find the ctype_alpha() function useful for that.
$array = array( ... );
$array = array_filter($array, 'ctype_alpha');
If you need to allow dashes as well, you need to provide your own function as the callback:
$array = array( ... );
$array = array_filter($array, function($test) {return preg_match('(^[a-zA-Z-]+$)', $test);});
This sample callback function makes use of the preg_match() function with a regular expression. Regular expressions can be formulated to represent a specific group of characters, like a-z, A-Z and the dash - (minus sign) in this example.
OK, you can simply loop through the array. Create a regular expression to test whether each element matches your criteria. If it fails, use unset() to remove that element.
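A minimal sketch of that loop, assuming the same pattern the other answers use (letters and dashes only) and the sample data from the question:

```php
<?php
// Sample data from the question; the last entry contains just a space
$input = ['text', 'texture', 'beans', ' '];

foreach ($input as $index => $value) {
    // Remove every entry that is not made up solely of letters and dashes
    if (!preg_match('/^[a-z-]+$/i', $value)) {
        unset($input[$index]);
    }
}

// unset() leaves gaps in the numeric keys, so reindex if consecutive keys matter
$input = array_values($input);

print_r($input);
```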

PHP: preg_replace (x) occurrence?

I asked a similar question recently, but didn't get a clear answer because I was too specific. This one is more broad.
Does anyone know how to replace an (x) occurrence in a regex pattern?
Example: Let's say I wanted to replace the 5th occurrence of the regex pattern in a string. How would I do that?
Here is the pattern:
preg_replace('/{(.*?)\|\:(.*?)}/', 'replacement', $this->source);
@anubhava REQUESTED SAMPLE CODE (last function doesn't work):
$sample = 'blah asada asdas {load|:title} steve jobs {load|:css} windows apple ';
$syntax = new syntax();
$syntax->parse($sample);
class syntax {
protected $source;
protected $i;
protected $r;
// parse source
public function parse($source) {
// set source to protected class var
$this->source = $source;
// match all occurrences for regex and run loop
$output = array();
preg_match_all('/\{(.*?)\|\:(.*?)\}/', $this->source, $output);
// run loop
$i = 0;
foreach($output[0] as $key):
// perform run function for each occurrence, send first match before |: and second match after |:
$this->run($output[1][$i], $output[2][$i], $i);
$i++;
endforeach;
echo $this->source;
}
// run function
public function run($m, $p, $i) {
// if method is load perform actions and run inject
switch($m):
case 'load':
$this->inject($i, 'content');
break;
endswitch;
}
// this function should inject the modified data, but I'm still working on this.
private function inject($i, $r) {
$output = preg_replace('/\{(.*?)\|\:(.*?)\}/', $r, $this->source);
}
}
You're misunderstanding regular expressions: they're stateless, have no memory, and no ability to count, nothing, so you can't know that a match is the x'th match in a string - the regex engine doesn't have a clue. You can't do this kind of thing with a regex for the same reason as it's not possible to write a regex to see if a string has balanced brackets: the problem requires a memory, which, by definition, regexes do not have.
However, a regex engine can tell you all the matches, so you're better off using preg_match_all() to get a list of matches, and then modifying the string yourself using that information.
Update: is this closer to what you're thinking of?
<?php
class Parser {
private $i;
public function parse($source) {
$this->i = 0;
return preg_replace_callback('/\{(.*?)\|\:(.*?)\}/', array($this, 'on_match'), $source);
}
private function on_match($m) {
$this->i++;
// Do whatever processing you need on the match.
print_r(array('m' => $m, 'i' => $this->i));
// Return what you want the replacement to be.
return $m[0] . '=>' . $this->i;
}
}
$sample = 'blah asada asdas {load|:title} steve jobs {load|:css} windows apple ';
$parse = new Parser();
$result = $parse->parse($sample);
echo "Result is: [$result]\n";
Which gives...
Array
(
[m] => Array
(
[0] => {load|:title}
[1] => load
[2] => title
)
[i] => 1
)
Array
(
[m] => Array
(
[0] => {load|:css}
[1] => load
[2] => css
)
[i] => 2
)
Result is: [blah asada asdas {load|:title}=>1 steve jobs {load|:css}=>2 windows apple ]
A much simpler and cleaner solution, which also deals with backreferences:
function preg_replace_nth($pattern, $replacement, $subject, $nth=1) {
return preg_replace_callback($pattern,
function($found) use (&$pattern, &$replacement, &$nth) {
$nth--;
if ($nth==0) return preg_replace($pattern, $replacement, reset($found) );
return reset($found);
}, $subject,$nth );
}
echo preg_replace_nth("/(\w+)\|/", '${1} is the 4th|', "|aa|b|cc|dd|e|ff|gg|kkk|", 4);
outputs |aa|b|cc|dd is the 4th|e|ff|gg|kkk|
As is already said, a regex has no state and you can't do this by just passing an integer to pinpoint the exact match for replacement ... you could wrap the replacement into a method which finds all matches and replaces only the nth match given as integer
<?php
function replace_nth_occurence ( &$haystack, $pattern, $replacement, $occurence) {
preg_match_all($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE);
if(array_key_exists($occurence-1, $matches[0])) {
$haystack = substr($haystack, 0, $matches[0][$occurence-1][1]).
$replacement.
substr($haystack,
$matches[0][$occurence-1][1] +
strlen($matches[0][$occurence-1][0])
);
}
}
$haystack = "test0|:test1|test2|:test3|:test4|test5|test6";
printf("%s \n", $haystack);
replace_nth_occurence( $haystack, '/\|:/', "<=>", 2);
printf("%s \n", $haystack);
?>
This is the alternative approach:
$parts = preg_split('/\{((?:.*?)\|\:(?:.*?))\}/', $this->source, -1, PREG_SPLIT_DELIM_CAPTURE);
$parts will contain original string parts at even offsets [0] [2] [4] [6] [8] [10] ...
And the matched delimiters will be at [1] [3] [5] [7] [9]
To find the 5th occurrence, for example, you could then modify element $n*2 - 1, which would be element [9] in this case:
$parts[5*2 - 1] = $replacement;
Then reassemble everything:
$output = implode($parts);
There is no literal way to match occurrence 5 of pattern /pat/. But you could match /^(.*?(?:pat.*?){4,4})pat/ and replace by \1repl. This will replace the first 4 occurrences, plus anything following, with the same, and the fifth with repl.
If /pat/ contains capture groups you would need to use the non-capturing equivalent for the first N-1 matches. The replacing pattern should reference the captured groups starting from \\2.
The implementation looks like:
function replace_occurrence($pat_cap,$pat_noncap,$repl,$sample,$n)
{
$nmin = $n-1;
return preg_replace("/^(.*?(?:$pat_noncap.*?){".
"$nmin,$nmin".
"})$pat_cap/",$r="\\1$repl",$sample);
}
My first idea was to use preg_replace with a callback and do the counting in the callback, as other users have (excellently) demonstrated.
Alternatively you can use preg_split keeping the delimiters, using PREG_SPLIT_DELIM_CAPTURE, and do the actual replacement in the resulting array. PHP only captures what's between capturing parens, so you'll either have to adapt the regex or take care of other captures yourself. Assuming 1 capturing pair, then captured delimiters will always be in the odd numbered indexes: 1, 3, 5, 7, 9, .... You'll want index 9; and implode it again.
This does imply you'll need to have a single capturing group around the whole delimiter pattern:
$sample = "blah asada asdas {load|:title} steve jobs {load|:css} windows apple\n";
$sample .= $sample . $sample; # at least 5 occurrences
$parts = preg_split('/(\{.*?\|\:.*?\})/', $sample, -1, PREG_SPLIT_DELIM_CAPTURE);
$parts[9] = 'replacement';
$return = implode('', $parts);
