PHP: preg_replace (x) occurrence? - php

I asked a similar question recently, but didn't get a clear answer because I was too specific. This one is more broad.
Does anyone know how to replace an (x) occurrence in a regex pattern?
Example: Let's say I wanted to replace the 5th occurrence of the regex pattern in a string. How would I do that?
Here is the pattern:
preg_replace('/{(.*?)\|\:(.*?)}/', 'replacement', $this->source);
@anubhava requested sample code (the last function doesn't work):
$sample = 'blah asada asdas {load|:title} steve jobs {load|:css} windows apple ';
$syntax = new syntax();
$syntax->parse($sample);
class syntax {
protected $source;
protected $i;
protected $r;
// parse source
public function parse($source) {
// set source to protected class var
$this->source = $source;
// match all occurrences for regex and run loop
$output = array();
preg_match_all('/\{(.*?)\|\:(.*?)\}/', $this->source, $output);
// run loop
$i = 0;
foreach($output[0] as $key):
// perform run function for each occurrence, send first match before |: and second match after |:
$this->run($output[1][$i], $output[2][$i], $i);
$i++;
endforeach;
echo $this->source;
}
// run function
public function run($m, $p, $i) {
// if method is load perform actions and run inject
switch($m):
case 'load':
$this->inject($i, 'content');
break;
endswitch;
}
// this function should inject the modified data, but I'm still working on this.
private function inject($i, $r) {
$output = preg_replace('/\{(.*?)\|\:(.*?)\}/', $r, $this->source);
}
}

You're misunderstanding regular expressions: they're stateless, have no memory, and no ability to count, nothing, so you can't know that a match is the x'th match in a string - the regex engine doesn't have a clue. You can't do this kind of thing with a regex for the same reason as it's not possible to write a regex to see if a string has balanced brackets: the problem requires a memory, which, by definition, regexes do not have.
However, a regex engine can tell you all the matches, so you're better off using preg_match_all() to get a list of matches, and then modifying the string yourself using that information.
Update: is this closer to what you're thinking of?
<?php
class Parser {
private $i;
public function parse($source) {
$this->i = 0;
return preg_replace_callback('/\{(.*?)\|\:(.*?)\}/', array($this, 'on_match'), $source);
}
private function on_match($m) {
$this->i++;
// Do whatever processing you need on the match.
print_r(array('m' => $m, 'i' => $this->i));
// Return what you want the replacement to be.
return $m[0] . '=>' . $this->i;
}
}
$sample = 'blah asada asdas {load|:title} steve jobs {load|:css} windows apple ';
$parse = new Parser();
$result = $parse->parse($sample);
echo "Result is: [$result]\n";
Which gives...
Array
(
[m] => Array
(
[0] => {load|:title}
[1] => load
[2] => title
)
[i] => 1
)
Array
(
[m] => Array
(
[0] => {load|:css}
[1] => load
[2] => css
)
[i] => 2
)
Result is: [blah asada asdas {load|:title}=>1 steve jobs {load|:css}=>2 windows apple ]

A much simpler and cleaner solution, which also deals with backreferences:
function preg_replace_nth($pattern, $replacement, $subject, $nth=1) {
return preg_replace_callback($pattern,
function($found) use (&$pattern, &$replacement, &$nth) {
$nth--;
if ($nth==0) return preg_replace($pattern, $replacement, reset($found) );
return reset($found);
}, $subject,$nth );
}
echo preg_replace_nth("/(\w+)\|/", '${1} is the 4th|', "|aa|b|cc|dd|e|ff|gg|kkk|", 4);
outputs |aa|b|cc|dd is the 4th|e|ff|gg|kkk|

As is already said, a regex has no state and you can't do this by just passing an integer to pinpoint the exact match for replacement ... you could wrap the replacement into a method which finds all matches and replaces only the nth match given as integer
<?php
function replace_nth_occurence ( &$haystack, $pattern, $replacement, $occurence) {
preg_match_all($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE);
if(array_key_exists($occurence-1, $matches[0])) {
$haystack = substr($haystack, 0, $matches[0][$occurence-1][1]).
$replacement.
substr($haystack,
$matches[0][$occurence-1][1] +
strlen($matches[0][$occurence-1][0])
);
}
}
$haystack = "test0|:test1|test2|:test3|:test4|test5|test6";
printf("%s \n", $haystack);
replace_nth_occurence( $haystack, '/\|:/', "<=>", 2);
printf("%s \n", $haystack);
?>

This is the alternative approach:
$parts = preg_split('/(\{.*?\|\:.*?\})/', $this->source, -1, PREG_SPLIT_DELIM_CAPTURE);
$parts will contain the original string parts at even offsets [0] [2] [4] [6] [8] [10] ...
And the matched delimiters will be at [1] [3] [5] [7] [9]
To find the 5th occurrence, for example, you could then modify element $n*2 - 1, which would be element [9] in this case:
$parts[5*2 - 1] = $replacement;
Then reassemble everything:
$output = implode($parts);

There is no literal way to match occurrence 5 of pattern /pat/. But you could match /^(.*?(?:pat.*?){4,4})pat/ and replace by \1repl. This will replace the first 4 occurrences, plus anything following, with the same, and the fifth with repl.
If /pat/ contains capture groups you would need to use the non-capturing equivalent for the first N-1 matches. The replacing pattern should reference the captured groups starting from \\2.
The implementation looks like:
function replace_occurrence($pat_cap, $pat_noncap, $repl, $sample, $n)
{
    // Group 1 swallows the first $n - 1 occurrences; $pat_cap then matches the $n-th.
    $nmin = $n - 1;
    return preg_replace(
        "/^(.*?(?:$pat_noncap.*?){" . "$nmin,$nmin" . "})$pat_cap/",
        "\\1$repl",
        $sample
    );
}
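A minimal usage sketch of the approach above, assuming the tag syntax from the question. The function name replace_nth_regex_only and the sample string are made up for illustration. It follows the same idea as replace_occurrence, but writes the first group as ${1} so that a replacement starting with a digit is not read as part of the group number.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the answer above).
function replace_nth_regex_only($pat_cap, $pat_noncap, $repl, $subject, $n)
{
    $skip = $n - 1; // occurrences to step over before the one we replace
    $regex = '/^(.*?(?:' . $pat_noncap . '.*?){' . $skip . '})' . $pat_cap . '/';
    // Group 1 is everything before the n-th occurrence; groups $2, $3, ... belong to $pat_cap.
    return preg_replace($regex, '${1}' . $repl, $subject);
}

$sample = 'a {x|:1} b {x|:2} c {x|:3} d';
$result = replace_nth_regex_only(
    '\{(.*?)\|\:(.*?)\}',   // capturing version, used for the n-th occurrence
    '\{.*?\|\:.*?\}',       // non-capturing version, used for the skipped ones
    '[$2=$3]',
    $sample,
    2
);
echo $result, "\n"; // a {x|:1} b [x=2] c {x|:3} d
```

If the subject has fewer than $n occurrences, the string comes back unchanged. Since . does not match newlines by default, a multi-line subject would probably need the s modifier added to the pattern.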

My first idea was to use preg_replace with a callback and do the counting in the callback, as other users have (excellently) demonstrated.
Alternatively you can use preg_split keeping the delimiters, using PREG_SPLIT_DELIM_CAPTURE, and do the actual replacement in the resulting array. PHP only captures what's between capturing parens, so you'll either have to adapt the regex or take care of other captures yourself. Assuming 1 capturing pair, then captured delimiters will always be in the odd numbered indexes: 1, 3, 5, 7, 9, .... You'll want index 9; and implode it again.
This does imply you'll need to have a single capturing group that wraps the whole pattern:
$sample = "blah asada asdas {load|:title} steve jobs {load|:css} windows apple\n";
$sample .= $sample . $sample; # at least 5 occurrences
$parts = preg_split('/(\{.*?\|\:.*?\})/', $sample, -1, PREG_SPLIT_DELIM_CAPTURE);
$parts[9] = 'replacement';
$return = implode('', $parts);
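The same idea can be wrapped into a small function so the index arithmetic is not hard-coded. This is only a sketch: the name replace_nth_via_split is hypothetical, and it assumes the pattern has exactly one capturing group around the whole delimiter, as described above.

```php
<?php
// Hypothetical helper; assumes $pattern has exactly one capturing group
// that wraps the entire delimiter.
function replace_nth_via_split(string $pattern, string $replacement, string $subject, int $n): string
{
    $parts = preg_split($pattern, $subject, -1, PREG_SPLIT_DELIM_CAPTURE);
    $index = 2 * $n - 1; // delimiters sit at the odd indexes 1, 3, 5, ...
    if ($n >= 1 && isset($parts[$index])) {
        $parts[$index] = $replacement;
    }
    // If there is no n-th occurrence the string is reassembled unchanged.
    return implode('', $parts);
}

$sample = 'blah {load|:title} steve {load|:css} apple';
echo replace_nth_via_split('/(\{.*?\|\:.*?\})/', 'X', $sample, 2), "\n";
// blah {load|:title} steve X apple
```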

Related

Pattern Matching for a value in tuple PHP ( Regular Expressions? )

I'm having a really hard time understanding RegEx in general, so I have no clue how it could be used for a problem like this.
So here we have a tuple
$tuple = "(12342,43244)";
And what I try to do is get:
$value_one = 12342;
So from (value_one,value_two) get value_one.
I know it can be possible with explode( ',', $tuple ) and then delete the 1st character '(' out of the 1st element in exploded array, but that seems super sloppy, is there a way to pattern match in this manner in PHP?
Here is the simplest preg_match example with the \(([0-9]+) regex that matches a (, and captures into Group 1 one or more digits from 0 to 9 range:
$tuple = "(12342,43244)";
if (preg_match('~\(([0-9]+)~', $tuple, $m))
{
echo $m[1];
}
See the IDEONE demo
Wrapped into a function:
function retFirstDigitChunk($input) {
if (preg_match('~\(([0-9]+)~', $input, $m)) {
return $m[1];
} else {
return "";
}
}
See another demo
Or, to get both as an array:
function retValues($input) {
if (preg_match('~\((-?[0-9]+)\s*,\s*(-?[0-9]+)~', $input, $m)) {
return array('left'=>$m[1], 'right'=>$m[2]);
} else {
return "";
}
}
$tuple = "(12342,43244)";
print_r(retValues($tuple));
Output: Array( [left] => 12342 [right] => 43244 )
You have to search for the number preceded by an opening parenthesis and followed by a comma. The pattern is:
$value_one = preg_replace('/\((\d+),.*/', '$1', $tuple);
If you are looking for something efficient, try to avoid the use of regex when possible:
$result = explode(',', ltrim($tuple, '('))[0];
or
sscanf($tuple, '(%[^,]', $result);
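If both values are needed as integers, sscanf() can read them in one go. A small sketch, assuming the tuple always holds two plain integers; the variable names $assigned and $value_two are only illustrative.

```php
<?php
$tuple = "(12342,43244)";

// When output variables are passed, sscanf() returns how many values it assigned.
$assigned = sscanf($tuple, '(%d,%d)', $value_one, $value_two);

if ($assigned === 2) {
    echo $value_one, ' ', $value_two, "\n"; // 12342 43244
}
```

Checking the return value is a cheap way to notice input that does not have the expected shape.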

PHP Regex for a specific numeric value inside a comma-delimited integer number string

I am trying to get the integer on the left and right for an input from the $str variable using REGEX. But I keep getting the commas back along with the integer. I only want integers not the commas. I have also tried replacing the wildcard . with \d but still no resolution.
$str = "1,2,3,4,5,6";
function pagination()
{
global $str;
// Using number 4 as an input from the string
preg_match('/(.{2})(4)(.{2})/', $str, $matches);
echo $matches[0]."\n".$matches[1]."\n".$matches[1]."\n".$matches[1]."\n";
}
pagination();
How about using a CSV parser?
$str = "1,2,3,4,5,6";
$line = str_getcsv($str);
$target = 4;
foreach($line as $key => $value) {
if($value == $target) {
echo $line[($key-1)] . '<--low high-->' . $line[($key+1)];
}
}
Output:
3<--low high-->5
or a regex could be
$str = "1,2,3,4,5,6";
preg_match('/(\d+),4,(\d+)/', $str, $matches);
echo $matches[1]."<--low high->".$matches[2];
Output:
3<--low high->5
The only flaw with these approaches is if the number is the start or end of range. Would that ever be the case?
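If the number can sit at the start or end of the range, one way to cover that is to look up its position and treat a missing neighbour as null. This is a sketch under that assumption; the function name neighbours and the 'low'/'high' keys are made up for illustration.

```php
<?php
// Hypothetical helper: returns the values left and right of $target,
// or null for a side that does not exist. Returns null if $target is absent.
function neighbours(string $csv, string $target): ?array
{
    $items = explode(',', $csv);
    $pos = array_search($target, $items, true);
    if ($pos === false) {
        return null;
    }
    return [
        'low'  => $items[$pos - 1] ?? null, // null when $target is the first item
        'high' => $items[$pos + 1] ?? null, // null when $target is the last item
    ];
}

print_r(neighbours('1,2,3,4,5,6', '4')); // low => 3, high => 5
```

Because the comparison is on whole list items, multi-digit numbers such as 14 are not confused with 4.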
I believe you're looking for Regex Non Capture Group
Here's what I did:
$regStr = "1,2,3,4,5,6";
$regex = "/(\d)(?:,)(4)(?:,)(\d)/";
preg_match($regex, $regStr, $results);
print_r($results);
Gives me the results:
Array ( [0] => 3,4,5 [1] => 3 [2] => 4 [3] => 5 )
Hope this helps!
Given your function name I am going to assume you need this for pagination.
The following solution might be easier:
$str = "1,2,3,4,5,6,7,8,9,10";
$str_parts = explode(',', $str);
// reset and end return the first and last element of an array respectively
$start = reset($str_parts);
$end = end($str_parts);
This prevents your regex from having to deal with your numbers getting into the double digits.

Highlight match result in subject string from preg_match_all()

I am trying to highlight the subject string with the returned $matches array from preg_match_all(). Let me start off with an example:
preg_match_all("/(.)/", "abc", $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
This will return:
Array
(
[0] => Array
(
[0] => Array
(
[0] => a
[1] => 0
)
[1] => Array
(
[0] => a
[1] => 0
)
)
[1] => Array
(
[0] => Array
(
[0] => b
[1] => 1
)
[1] => Array
(
[0] => b
[1] => 1
)
)
[2] => Array
(
[0] => Array
(
[0] => c
[1] => 2
)
[1] => Array
(
[0] => c
[1] => 2
)
)
)
What I want to do in this case is to highlight the overall consumed data AND each backreference.
Output should look like this:
<span class="match0">
<span class="match1">a</span>
</span>
<span class="match0">
<span class="match1">b</span>
</span>
<span class="match0">
<span class="match1">c</span>
</span>
Another example:
preg_match_all("/(abc)/", "abc", $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
Should return:
<span class="match0"><span class="match1">abc</span></span>
I hope this is clear enough.
I want to highlight overall consumed data AND highlight each backreference.
Thanks in advance. If anything is unclear, please ask.
Note: It must not break html. The regex AND input string are both unknown by the code and completely dynamic. So the search string can be html and the matched data can contain html-like text and what not.
This seems to behave right for all the examples I've thrown at it so far. Note that I've broken the abstract highlighting part from the HTML-mangling part for reusability in other situations:
<?php
/**
* Runs a regex against a string, and return a version of that string with matches highlighted
* the outermost match is marked with [0]...[/0], the first sub-group with [1]...[/1] etc
*
* @param string $regex Regular expression ready to be passed to preg_match_all
* @param string $input
* @return string
*/
function highlight_regex_matches($regex, $input)
{
$matches = array();
preg_match_all($regex, $input, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
// Arrange matches into groups based on their starting and ending offsets
$matches_by_position = array();
foreach ( $matches as $sub_matches )
{
foreach ( $sub_matches as $match_group => $match_data )
{
$start_position = $match_data[1];
$end_position = $start_position + strlen($match_data[0]);
$matches_by_position[$start_position]['START'][] = $match_group;
$matches_by_position[$end_position]['END'][] = $match_group;
}
}
// Now proceed through that array, annotating the original string
// Note that we have to pass through BACKWARDS, or we break the offset information
$output = $input;
krsort($matches_by_position);
foreach ( $matches_by_position as $position => $matches )
{
$insertion = '';
// First, assemble any ENDING groups, nested highest-group first
if ( isset($matches['END']) )
{
krsort($matches['END']);
foreach ( $matches['END'] as $ending_group )
{
$insertion .= "[/$ending_group]";
}
}
// Then, any STARTING groups, nested lowest-group first
if ( isset($matches['START']) )
{
ksort($matches['START']);
foreach ( $matches['START'] as $starting_group )
{
$insertion .= "[$starting_group]";
}
}
// Insert into output
$output = substr_replace($output, $insertion, $position, 0);
}
return $output;
}
/**
* Given a regex and a string containing unescaped HTML, return a blob of HTML
* with the original string escaped, and matches highlighted using <span> tags
*
* @param string $regex Regular expression ready to be passed to preg_match_all
* @param string $raw_html
* @return string HTML ready to display :)
*/
function highlight_regex_as_html($regex, $raw_html)
{
// Add the (deliberately non-HTML) highlight tokens
$highlighted = highlight_regex_matches($regex, $raw_html);
// Escape the HTML from the input
$highlighted = htmlspecialchars($highlighted);
// Substitute the match tokens with desired HTML
$highlighted = preg_replace('#\[([0-9]+)\]#', '<span class="match\\1">', $highlighted);
$highlighted = preg_replace('#\[/([0-9]+)\]#', '</span>', $highlighted);
return $highlighted;
}
NOTE: As hakre has pointed out to me on chat, if a sub-group in the regex can occur multiple times within one overall match (e.g. '/a(b|c)+/'), preg_match_all will only tell you about the last of those matches - so highlight_regex_matches('/a(b|c)+/', 'abc') returns '[0]ab[1]c[/1][/0]' not '[0]a[1]b[/1][1]c[/1][/0]' as you might expect/want. All matching groups outside that will still work correctly though, so highlight_regex_matches('/a((b|c)+)/', 'abc') gives '[0]a[1]b[2]c[/2][/1][/0]' which is still a pretty good indication of how the regex matched.
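The limitation can be seen directly in what preg_match_all() reports, without going through the highlighter. A short sketch; the variable names $m and $m2 are just for illustration.

```php
<?php
// A repeated group only keeps its LAST iteration.
preg_match_all('/a(b|c)+/', 'abc', $m, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
print_r($m[0][1]);  // ['c', 2] - the 'b' iteration is gone

// Wrapping the repetition in an outer group recovers the full span.
preg_match_all('/a((b|c)+)/', 'abc', $m2, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
print_r($m2[0][1]); // ['bc', 1]
print_r($m2[0][2]); // ['c', 2]
```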
Reading your comment under the first answer, I'm pretty sure you did not really formulate the question as you intended to. However, following what you concretely ask for, that is:
$pattern = "/(.)/";
$subject = "abc";
$callback = function($matches) {
if ($matches[0] !== $matches[1]) {
throw new InvalidArgumentException(
sprintf('you do not match the requirements, go away: %s'
, print_r($matches, 1))
);
}
return sprintf('<span class="match0"><span class="match1">%s</span></span>'
, htmlspecialchars($matches[1]));
};
$result = preg_replace_callback($pattern, $callback, $subject);
Before you start to complain, first take a look at where your description of the problem falls short. I have the feeling you actually want to parse the result for matches. However, you want to do sub-matches. That does not work unless you also parse the regular expression to find out which groups are used. That is not the case so far, neither in your question nor in this answer.
So please use this example only for a single subgroup, which must also be the whole pattern as a requirement. Apart from that, this is fully dynamic.
Related:
How to get all captures of subgroup matches with preg_match_all()?
Ignore html tags in preg_replace
I am not too familiar with posting on stackoverflow so I hope I don't mess this up. I do this in almost the same way as @IMSoP, however, slightly different:
I store the tags like this:
$tags[ $matched_pos ]['open'][$backref_nr] = "open tag";
$tags[ $matched_pos + $len ]['close'][$backref_nr] = "close tag";
As you can see, almost identical to @IMSoP.
Then I construct the string like this, instead of inserting and sorting like @IMSoP does:
$finalStr = "";
for ($i = 0; $i <= strlen($text); $i++) {
if (isset($tags[$i])) {
foreach ($tags[$i] as $tag) {
foreach ($tag as $span) {
$finalStr .= $span;
}
}
}
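// The loop runs one step past the end so tags stored at strlen($text) are still emitted.
if ($i < strlen($text)) {
$finalStr .= $text[$i];
}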
}
Where $text is the text used in preg_match_all()
I think my solution is slightly faster than @IMSoP's since he has to sort every time and what not. But I am not sure.
My main worry right now is performance. But it might just not be possible to make it work any faster than this?
I have been trying to get a recursive preg_replace_callback() thing going, but I've not been able to make it work so far. preg_replace_callback() seems to be very, very fast. Much faster than what I am currently doing anyway.
A quick mashup, why use regex?
$content = "abc";
$endcontent = "";
for($i = 0; $i < strlen($content); $i++)
{
$endcontent .= "<span class=\"match0\"><span class=\"match1\">" . $content[$i] . "</span></span>";
}
echo $endcontent;

How to return only named groups with preg_match or preg_match_all?

Example:
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d\d\d\d-\d\d-\d\d)|i', $string, $arr_result);
print_r($arr_result);
Returns:
Array
(
[0] => 2010-07-18
[date] => 2010-07-18
[1] => 2010-07-18
)
But I want it to be:
Array
(
[date] => 2010-07-18
)
In PHP's PDO object there is an option that is filtering results from database by removing these duplicate numbered values : PDO::FETCH_ASSOC. But I haven't seen similar modifier for the PCRE functions in PHP yet.
How to return only named groups with preg_match or preg_match_all?
This is currently (PHP7) not possible.
You will always get a mixed type array, containing numeric and named keys.
Let's quote the PHP manual (http://php.net/manual/en/regexp.reference.subpatterns.php):
This subpattern will then be indexed in the matches array by its
normal numeric position and also by name.
To solve the problem the following code snippets might help:
1. filter the array by using an is_string check on the array key (for PHP5.6+)
$array_filtered = array_filter($array, "is_string", ARRAY_FILTER_USE_KEY);
2. foreach over the elements and unset if array key is_int() (all PHP versions)
/**
* @param array $array
* @return array
*/
function dropNumericKeys(array $array)
{
foreach ($array as $key => $value) {
if (is_int($key)) {
unset($array[$key]);
}
}
return $array;
}
It's a simple PHP function named dropNumericKeys(). It's meant for post-processing a matches array after a preg_match*() run that uses named groups. The function accepts an $array, iterates over it, and unsets all keys of integer type, leaving keys of string type untouched. Finally, the function returns the array, which now contains only the named keys.
Note: The function is for PHP downward compatibility. It works on all versions. The array_filter solution relies on the constant ARRAY_FILTER_USE_KEY, which is only available on PHP 5.6+. See http://php.net/manual/de/array.constants.php#constant.array-filter-use-key
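Putting the array_filter variant together with the date example from the question could look like the sketch below. The wrapper name preg_match_named is hypothetical, and it assumes PHP 7.0+ for the type declarations.

```php
<?php
// Hypothetical wrapper: runs preg_match() and keeps only the named groups.
function preg_match_named(string $pattern, string $subject): array
{
    if (preg_match($pattern, $subject, $matches) !== 1) {
        return []; // no match (or an error): nothing to report
    }
    // Named groups have string keys; the numeric duplicates have int keys.
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

$string = "This is some text written on 2010-07-18.";
$named = preg_match_named('/(?<date>\d{4}-\d{2}-\d{2})/', $string);
print_r($named); // Array ( [date] => 2010-07-18 )
```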
preg_match does not (yet) have a flag or option that makes it return only named matches, so what you want is not directly possible. However, you can remove all items with non-matching keys from your matches array to get what you're looking for (here 'name' and 'likes' stand for whichever group names your pattern defines):
$matches = array_intersect_key($matches, array_flip(array('name', 'likes')));
I do not think you can make preg_* do it, but you can do it with a simple loop. But I don't see why those elements pose a problem.
It is also possible to unset all numeric indexes before returning:
foreach (range(0, floor(count($arr_result) / 2)) as $index) {
unset($arr_result[$index]);
}
Similar to the answer that hakre posted above, I use this snippet to get just the named parameters:
$subject = "This is some text written on 2010-07-18.";
$pattern = '|(?<date>\d\d\d\d-\d\d-\d\d)|i';
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
echo '<pre>Before Diff: ', print_r($matches, 1), '</pre>';
$matches = array_diff_key($matches[0], range(0, count($matches[0])));
echo '<pre>After Diff: ', print_r($matches, 1), '</pre>';
...which produces this:
Before Diff: Array
(
[0] => Array
(
[0] => 2010-07-18
[date] => 2010-07-18
[1] => 2010-07-18
)
)
After Diff: Array
(
[date] => 2010-07-18
)
I read in your post that the concern is possible memory overhead later on, etc.
In that case, why not solve it with an unset():
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d{4}-\d{2}-\d{2})|i', $string, $arr_result);
$date = array("date" => $arr_result['date']);
unset($arr_result, $string); // delete the original preg_match array and string
print_r($date);
//or create a new:
// $arr_result = $date;
//print_r($arr_result);
You could use T-Regx and go with group() or namedGroups() which only returns named capturing groups.
<?php
$subject = "This is some text written on 2010-07-18.";
pattern('(?<date>\d\d\d\d-\d\d-\d\d)', 'i')->match($subject)->first(function ($match) {
$date = $match->get('date');
// 2010-07-18
$groups = $match->namedGroups();
// [
// 'date' => '2010-07-18'
// ]
});
I combined some of the code introduced above; this final version works on PHP 5.6+:
$re = '/\d+\r\n(?<start>[\d\0:]+),\d+\s--\>\s(?<end>[\d\0:]+),.*\r\nHOME.*\r\nGPS\((?<x>[\d\.]+),(?<y>[\d\.]+),(?<d>[\d\.]+)\)\sBAROMETER\:(?<h>[\d\.]+)/';
$str= file_get_contents($srtFile);
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
echo '<pre>';
$filtered=array_map(function ($d){
return $array_filtered = array_filter($d, "is_string", ARRAY_FILTER_USE_KEY);
},$matches);
var_dump($filtered);
If you are interested in what it does: it reads position data from an SRT file that DJI drones generate while recording video.
Try this:
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d\d\d\d-\d\d-\d\d)|i',$string,$arr_result);
echo $arr_result['date'];

PHP: Best way to extract text within parenthesis?

What's the best/most efficient way to extract text set between parentheses? Say I wanted to get the string "text" from the string "ignore everything except this (text)" in the most efficient manner possible.
So far, the best I've come up with is this:
$fullString = "ignore everything except this (text)";
$start = strpos('(', $fullString);
$end = strlen($fullString) - strpos(')', $fullString);
$shortString = substr($fullString, $start, $end);
Is there a better way to do this? I know in general using regex tends to be less efficient, but unless I can reduce the number of function calls, perhaps this would be the best approach? Thoughts?
i'd just do a regex and get it over with. unless you are doing enough iterations that it becomes a huge performance issue, it's just easier to code (and understand when you look back on it)
$text = 'ignore everything except this (text)';
preg_match('#\((.*?)\)#', $text, $match);
print $match[1];
So, actually, the code you posted doesn't work: substr()'s parameters are $string, $start and $length, and strpos()'s parameters are $haystack, $needle. Slightly modified:
$str = "ignore everything except this (text)";
$start = strpos($str, '(');
$end = strpos($str, ')', $start + 1);
$length = $end - $start;
$result = substr($str, $start + 1, $length - 1);
Some subtleties: I used $start + 1 in the offset parameter in order to help PHP out while doing the strpos() search on the second parenthesis; we increment $start one and reduce $length to exclude the parentheses from the match.
Also, there's no error checking in this code: you'll want to make sure $start and $end do not === false before performing the substr.
As for using strpos/substr versus regex; performance-wise, this code will beat a regular expression hands down. It's a little wordier though. I eat and breathe strpos/substr, so I don't mind this too much, but someone else may prefer the compactness of a regex.
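With the error checking mentioned above added, the strpos/substr version might look like this. It is only a sketch: the function name between_parens and the choice to return null on failure are assumptions.

```php
<?php
// Hypothetical helper: text between the first '(' and the next ')',
// or null when either parenthesis is missing.
function between_parens(string $str): ?string
{
    $start = strpos($str, '(');
    if ($start === false) {
        return null;
    }
    $end = strpos($str, ')', $start + 1);
    if ($end === false) {
        return null;
    }
    // Skip the '(' itself and stop just before the ')'.
    return substr($str, $start + 1, $end - $start - 1);
}

echo between_parens("ignore everything except this (text)"), "\n"; // text
```

The strict === false comparison matters here, because a '(' at position 0 would otherwise be treated as "not found".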
Use a regular expression:
if( preg_match( '!\(([^\)]+)\)!', $text, $match ) )
$text = $match[1];
i think this is the fastest way to get the words between the first parenthesis in a string.
$string = 'ignore everything except this (text)';
$string = explode(')', (explode('(', $string)[1]))[0];
echo $string;
The already posted regex solutions - \((.*?)\) and \(([^\)]+)\) - do not return the innermost strings between an open and close brackets. If a string is Text (abc(xyz 123) they both return a (abc(xyz 123) as a whole match, and not (xyz 123).
The pattern that matches substrings (use with preg_match to fetch the first and preg_match_all to fetch all occurrences) in parentheses without other open and close parentheses in between is, if the match should include parentheses:
\([^()]*\)
Or, you want to get values without parentheses:
\(([^()]*)\) // get Group 1 values after a successful call to preg_match_all, see code below
\(\K[^()]*(?=\)) // this and the one below get the values without parentheses as whole matches
(?<=\()[^()]*(?=\)) // less efficient, not recommended
Replace * with + if there must be at least 1 char between ( and ).
Details:
\( - an opening round bracket (must be escaped to denote a literal parenthesis as it is used outside a character class)
[^()]* - zero or more characters other than ( and ) (note these ( and ) do not have to be escaped inside a character class as inside it, ( and ) cannot be used to specify a grouping and are treated as literal parentheses)
\) - a closing round bracket (must be escaped to denote a literal parenthesis as it is used outside a character class).
The \(\K part in an alternative regex matches ( and omits it from the match value (with the \K match reset operator). (?<=\() is a positive lookbehind that requires a ( to appear immediately to the left of the current location, but the ( is not added to the match value since lookbehind (lookaround) patterns are not consuming. (?=\)) is a positive lookahead that requires a ) char to appear immediately to the right of the current location.
PHP code:
$fullString = 'ignore everything except this (text) and (that (text here))';
if (preg_match_all('~\(([^()]*)\)~', $fullString, $matches)) {
print_r($matches[0]); // Get whole match values
print_r($matches[1]); // Get Group 1 values
}
Output:
Array ( [0] => (text) [1] => (text here) )
Array ( [0] => text [1] => text here )
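For comparison, here is a sketch of the \K variant on the same input. With it, the values without parentheses arrive as the whole match, so no capture group is needed.

```php
<?php
$fullString = 'ignore everything except this (text) and (that (text here))';

// \( is consumed but dropped from the match by \K; (?=\)) checks for ')' without consuming it.
if (preg_match_all('~\(\K[^()]*(?=\))~', $fullString, $matches)) {
    print_r($matches[0]); // Array ( [0] => text [1] => text here )
}
```

The outer "(that ...)" is skipped because its content runs into another ( before any ) is reached.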
This is sample code to extract all the text between '[' and ']' and store it in 2 separate arrays (i.e. text inside the brackets in one array and text outside the brackets in another array):
function extract_text($string)
{
$text_outside=array();
$text_inside=array();
$t="";
for($i=0;$i<strlen($string);$i++)
{
if($string[$i]=='[')
{
$text_outside[]=$t;
$t="";
$t1="";
$i++;
while($i < strlen($string) && $string[$i]!=']')
{
$t1.=$string[$i];
$i++;
}
$text_inside[] = $t1;
}
else {
if($string[$i]!=']')
$t.=$string[$i];
else {
continue;
}
}
}
if($t!="")
$text_outside[]=$t;
var_dump($text_outside);
echo "\n\n";
var_dump($text_inside);
}
Output:
extract_text("hello how are you?");
will produce:
array(1) {
[0]=>
string(18) "hello how are you?"
}
array(0) {
}
extract_text("hello [http://www.google.com/test.mp3] how are you?");
will produce
array(2) {
[0]=>
string(6) "hello "
[1]=>
string(13) " how are you?"
}
array(1) {
[0]=>
string(30) "http://www.google.com/test.mp3"
}
This function may be useful.
function getStringBetween($str, $from, $to, $withFromAndTo = false)
{
$sub = substr($str, strpos($str,$from)+strlen($from),strlen($str));
if ($withFromAndTo)
return $from . substr($sub,0, strrpos($sub,$to)) . $to;
else
return substr($sub,0, strrpos($sub,$to));
}
$inputString = "ignore everything except this (text)";
$outputString = getStringBetween($inputString, '(', ')');
echo $outputString;
// output will be: text
$outputString = getStringBetween($inputString, '(', ')', true);
echo $outputString;
// output will be: (text)
strpos() finds the position of the first occurrence of a substring in a string.
strrpos() finds the position of the last occurrence of a substring in a string.
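The difference matters once a string holds more than one pair of parentheses. A small sketch with a made-up sample string shows what each function reports and what slicing from the first ( to the last ) gives, compared with slicing to the first ).

```php
<?php
$s = 'a (b) c (d)';

$open      = strpos($s, '(');         // 2  - first '('
$firstEnd  = strpos($s, ')', $open);  // 4  - first ')' after it
$lastEnd   = strrpos($s, ')');        // 10 - last ')'

// First '(' to LAST ')': spans both groups.
$greedy = substr($s, $open + 1, $lastEnd - $open - 1);   // "b) c (d"
// First '(' to FIRST ')': only the first group.
$lazy   = substr($s, $open + 1, $firstEnd - $open - 1);  // "b"

echo $greedy, "\n", $lazy, "\n";
```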
function getStringsBetween($str, $start='[', $end=']', $with_from_to=true){
$arr = [];
$last_pos = 0;
$last_pos = strpos($str, $start, $last_pos);
while ($last_pos !== false) {
$t = strpos($str, $end, $last_pos);
$arr[] = ($with_from_to ? $start : '').substr($str, $last_pos + 1, $t - $last_pos - 1).($with_from_to ? $end : '');
$last_pos = strpos($str, $start, $last_pos+1);
}
return $arr; }
This is a small improvement on the previous answer that returns all matches in array form:
getStringsBetween('[T]his[] is [test] string [pattern]') will return:
