Highlight match result in subject string from preg_match_all()

I am trying to highlight the subject string with the returned $matches array from preg_match_all(). Let me start off with an example:
preg_match_all("/(.)/", "abc", $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
This will return:
Array
(
[0] => Array
(
[0] => Array
(
[0] => a
[1] => 0
)
[1] => Array
(
[0] => a
[1] => 0
)
)
[1] => Array
(
[0] => Array
(
[0] => b
[1] => 1
)
[1] => Array
(
[0] => b
[1] => 1
)
)
[2] => Array
(
[0] => Array
(
[0] => c
[1] => 2
)
[1] => Array
(
[0] => c
[1] => 2
)
)
)
What I want to do in this case is to highlight the overall consumed data AND each backreference.
Output should look like this:
<span class="match0">
<span class="match1">a</span>
</span>
<span class="match0">
<span class="match1">b</span>
</span>
<span class="match0">
<span class="match1">c</span>
</span>
Another example:
preg_match_all("/(abc)/", "abc", $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
Should return:
<span class="match0"><span class="match1">abc</span></span>
I hope this is clear enough: I want to highlight the overall consumed data AND each backreference.
Thanks in advance. If anything is unclear, please ask.
Note: the output must not break the HTML. The regex AND the input string are both unknown to the code and completely dynamic, so the subject string can be HTML and the matched data can contain HTML-like text and so on.

This seems to behave correctly for all the examples I've thrown at it so far. Note that I've separated the abstract highlighting part from the HTML-escaping part so it can be reused in other situations:
<?php
/**
* Runs a regex against a string and returns a version of that string with the matches highlighted:
* the outermost match is marked with [0]...[/0], the first sub-group with [1]...[/1], etc.
*
* @param string $regex Regular expression ready to be passed to preg_match_all
* @param string $input
* @return string
*/
function highlight_regex_matches($regex, $input)
{
$matches = array();
preg_match_all($regex, $input, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
// Arrange matches into groups based on their starting and ending offsets
$matches_by_position = array();
foreach ( $matches as $sub_matches )
{
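foreach ( $sub_matches as $match_group => $match_data )
{
// Skip named-group duplicates and groups that did not take part in this match (offset -1)
if ( !is_int($match_group) || $match_data[1] < 0 )
{
continue;
}
$start_position = $match_data[1];
$end_position = $start_position + strlen($match_data[0]);
$matches_by_position[$start_position]['START'][] = $match_group;
$matches_by_position[$end_position]['END'][] = $match_group;
}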
}
// Now proceed through that array, annotating the original string
// Note that we have to pass through BACKWARDS, or we break the offset information
$output = $input;
krsort($matches_by_position);
foreach ( $matches_by_position as $position => $matches )
{
$insertion = '';
// First, assemble any ENDING groups, nested highest-group first
if ( isset($matches['END']) )
{
rsort($matches['END']);
foreach ( $matches['END'] as $ending_group )
{
$insertion .= "[/$ending_group]";
}
}
// Then, any STARTING groups, nested lowest-group first
if ( isset($matches['START']) )
{
sort($matches['START']);
foreach ( $matches['START'] as $starting_group )
{
$insertion .= "[$starting_group]";
}
}
// Insert into output
$output = substr_replace($output, $insertion, $position, 0);
}
return $output;
}
/**
* Given a regex and a string containing unescaped HTML, return a blob of HTML
* with the original string escaped, and matches highlighted using <span> tags
*
* @param string $regex Regular expression ready to be passed to preg_match_all
* @param string $raw_html
* @return string HTML ready to display :)
*/
function highlight_regex_as_html($regex, $raw_html)
{
// Add the (deliberately non-HTML) highlight tokens
$highlighted = highlight_regex_matches($regex, $raw_html);
// Escape the HTML from the input
$highlighted = htmlspecialchars($highlighted);
// Substitute the match tokens with desired HTML
$highlighted = preg_replace('#\[([0-9]+)\]#', '<span class="match\\1">', $highlighted);
$highlighted = preg_replace('#\[/([0-9]+)\]#', '</span>', $highlighted);
return $highlighted;
}
NOTE: As hakre pointed out to me in chat, if a sub-group in the regex can match multiple times within one overall match (e.g. '/a(b|c)+/'), preg_match_all only reports the last of those matches, so highlight_regex_matches('/a(b|c)+/', 'abc') returns '[0]ab[1]c[/1][/0]', not '[0]a[1]b[/1][1]c[/1][/0]' as you might expect or want. Groups outside the repeated one still work correctly, though, so highlight_regex_matches('/a((b|c)+)/', 'abc') gives '[0]a[1]b[2]c[/2][/1][/0]', which is still a pretty good indication of how the regex matched.
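One limitation of the token approach is that the tokens are indistinguishable from input text: if the subject already contains something like "[1]", the final preg_replace() turns it into an unclosed <span>. Below is a minimal sketch (PHP 7+) that sidesteps this by escaping each plain-text segment with htmlspecialchars() and emitting the span tags directly. The name highlight_regex_html_direct() is hypothetical; it skips named-group duplicates and unmatched groups, and assumes zero-length captures can be ignored since they have nothing to highlight:

```php
<?php
/**
 * Hypothetical helper: highlight each capturing group of $regex in $input as
 * nested <span class="matchN"> elements, escaping all other text for HTML.
 */
function highlight_regex_html_direct(string $regex, string $input): string
{
    preg_match_all($regex, $input, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

    $open = [];  // byte offset => group numbers whose span opens there
    $close = []; // byte offset => number of spans that close there
    foreach ($matches as $set) {
        foreach ($set as $group => $data) {
            // Skip named-group duplicates, unmatched groups (offset -1)
            // and empty captures, which have nothing to highlight
            if (!is_int($group) || $data[1] < 0 || $data[0] === '') {
                continue;
            }
            $start = $data[1];
            $end = $start + strlen($data[0]);
            $open[$start][] = $group;
            $close[$end] = ($close[$end] ?? 0) + 1;
        }
    }

    // All offsets where the text must be split, in ascending order
    $boundaries = array_unique(array_merge(
        [0, strlen($input)],
        array_keys($open),
        array_keys($close)
    ));
    sort($boundaries);

    $html = '';
    $prev = 0;
    foreach ($boundaries as $pos) {
        // Escape the plain text between the previous boundary and this one
        $html .= htmlspecialchars(substr($input, $prev, $pos - $prev), ENT_QUOTES | ENT_SUBSTITUTE);
        // Close spans before opening new ones so adjacent matches stay siblings
        $html .= str_repeat('</span>', $close[$pos] ?? 0);
        // Groups were collected in ascending order, so group 0 wraps the others
        foreach ($open[$pos] ?? [] as $group) {
            $html .= '<span class="match' . $group . '">';
        }
        $prev = $pos;
    }
    return $html;
}

echo highlight_regex_html_direct('/(.)/', 'abc'), "\n";
```

For '/(.)/' on 'abc' this produces the three nested span pairs from the question (without the line breaks), while a literal "[1]" in the subject is left as plain text. Because every closing tag is the same </span>, only the number of tags closing at each offset matters.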

Reading your comment under the first answer, I'm pretty sure you did not formulate the question the way you intended. However, going by what you concretely asked for:
$pattern = "/(.)/";
$subject = "abc";
$callback = function($matches) {
if ($matches[0] !== $matches[1]) {
throw new InvalidArgumentException(
sprintf('you do not match the requirements, go away: %s'
, print_r($matches, 1))
);
}
return sprintf('<span class="match0"><span class="match1">%s</span></span>'
, htmlspecialchars($matches[1]));
};
$result = preg_replace_callback($pattern, $callback, $subject);
Before you start to complain, first take a look at where your description of the problem falls short. I have the feeling you actually want to parse the result for matches, including sub-matches. That does not work unless you also parse the regular expression to find out which groups it uses, and so far neither your question nor this answer does that.
So please treat this example as working only for a single subgroup, which must also span the whole pattern. Apart from that requirement, it is fully dynamic.
Related:
How to get all captures of subgroup matches with preg_match_all()?
Ignore html tags in preg_replace

I am not too familiar with posting on Stack Overflow, so I hope I don't mess this up. I do this in almost the same way as @IMSoP, only slightly differently:
I store the tags like this:
$tags[ $matched_pos ]['open'][$backref_nr] = "open tag";
$tags[ $matched_pos + $len ]['close'][$backref_nr] = "close tag";
As you can see, this is almost identical to @IMSoP's approach.
Then, instead of inserting and sorting like @IMSoP does, I construct the string like this:
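$finalStr = "";
$len = strlen($text);
// Run one position past the end so tags that close at the very end are emitted
for ($i = 0; $i <= $len; $i++) {
if (isset($tags[$i])) {
// Emit closing tags before opening tags at the same position
foreach (array('close', 'open') as $type) {
if (isset($tags[$i][$type])) {
foreach ($tags[$i][$type] as $span) {
$finalStr .= $span;
}
}
}
}
// There is no character at $i == $len, only trailing closing tags
if ($i < $len) {
$finalStr .= $text[$i];
}
}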
Here $text is the text passed to preg_match_all().
I think my solution is slightly faster than @IMSoP's, since his has to sort at every position, but I am not sure.
My main worry right now is performance. But maybe it just isn't possible to make this any faster?
I have been trying to get a recursive preg_replace_callback() approach going, but I haven't been able to make it work so far. preg_replace_callback() seems to be very fast, much faster than what I am currently doing anyway.

A quick mashup; why use a regex at all?
$content = "abc";
$endcontent = "";
// Loop while $i is less than the length (the original > never ran)
for($i = 0; $i < strlen($content); $i++)
{
$endcontent .= "<span class=\"match0\"><span class=\"match1\">" . $content[$i] . "</span></span>";
}
echo $endcontent;

Related

how to prevent preg_match/preg_match_all from creating unnecessary array elements [duplicate]

Example:
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d\d\d\d-\d\d-\d\d)|i', $string, $arr_result);
print_r($arr_result);
Returns:
Array
(
[0] => 2010-07-18
[date] => 2010-07-18
[1] => 2010-07-18
)
But I want it to be:
Array
(
[date] => 2010-07-18
)
PHP's PDO has a fetch mode, PDO::FETCH_ASSOC, that filters database results by removing these duplicate numbered values. But I haven't seen a similar modifier for the PCRE functions in PHP yet.

How to return only named groups with preg_match or preg_match_all?
This is currently (PHP7) not possible.
You will always get a mixed type array, containing numeric and named keys.
Let's quote the PHP manual (http://php.net/manual/en/regexp.reference.subpatterns.php):
This subpattern will then be indexed in the matches array by its
normal numeric position and also by name.
To solve the problem the following code snippets might help:
1. filter the array by using an is_string check on the array key (for PHP5.6+)
$array_filtered = array_filter($array, "is_string", ARRAY_FILTER_USE_KEY);
2. foreach over the elements and unset if array key is_int() (all PHP versions)
/**
* @param array $array
* @return array
*/
function dropNumericKeys(array $array)
{
foreach ($array as $key => $value) {
if (is_int($key)) {
unset($array[$key]);
}
}
return $array;
}
It's a simple PHP function named dropNumericKeys(), meant for post-processing a matches array after a preg_match*() run that uses named groups. The function accepts an $array, iterates over it and unsets all keys of integer type, leaving keys of string type untouched. Finally, it returns the array, which now contains only the named keys.
Note: this function is for backward compatibility; it works on all PHP versions. The array_filter solution relies on the constant ARRAY_FILTER_USE_KEY, which is only available in PHP 5.6+. See http://php.net/manual/de/array.constants.php#constant.array-filter-use-key
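Putting the array_filter variant to work on the example from the question: a minimal sketch (PHP 5.6+ for ARRAY_FILTER_USE_KEY; named_groups_only() is a hypothetical name) that also applies the same filter to every match set returned by preg_match_all() with PREG_SET_ORDER:

```php
<?php
// Hypothetical helper: keep only the string (named-group) keys of a match array.
// With ARRAY_FILTER_USE_KEY, array_filter() passes each key to is_string().
function named_groups_only(array $matches)
{
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d\d\d\d-\d\d-\d\d)|i', $string, $arr_result);
print_r(named_groups_only($arr_result)); // only the 'date' key remains

// With PREG_SET_ORDER each element is one match set, so filter each of them
preg_match_all('/(?<year>\d{4})-(?<month>\d{2})/', '2010-07 and 2011-08', $all, PREG_SET_ORDER);
$sets = array_map('named_groups_only', $all);
print_r($sets);
```

array_filter() preserves the remaining keys and their order, so each filtered set keeps the groups in the order they appear in the pattern.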

preg_match does not (yet) have any flag or option to make it return only named matches, so what you want is not directly possible. However, you can remove all items with non-matching keys from your matches array, and then you get what you're looking for:
$matches = array_intersect_key($matches, array_flip(array('name', 'likes')));

I do not think you can make preg_* do it, but you can do it with a simple loop. But I don't see why those elements pose a problem.

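It is also possible to unset all numeric indexes before returning. This relies on every capturing group being named, so that the numeric keys run from 0 to half the element count: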
foreach (range(0, floor(count($arr_result) / 2)) as $index) {
unset($arr_result[$index]);
}

Similar to the answer that hakre posted above, I use this snippet to get just the named parameters:
$subject = "This is some text written on 2010-07-18.";
$pattern = '|(?<date>\d\d\d\d-\d\d-\d\d)|i';
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
echo '<pre>Before Diff: ', print_r($matches, 1), '</pre>';
$matches = array_diff_key($matches[0], range(0, count($matches[0])));
echo '<pre>After Diff: ', print_r($matches, 1), '</pre>';
...which produces this:
Before Diff: Array
(
[0] => Array
(
[0] => 2010-07-18
[date] => 2010-07-18
[1] => 2010-07-18
)
)
After Diff: Array
(
[date] => 2010-07-18
)

I read in your post that you are worried these extra elements may overload memory later on, etc.
In that case, why not solve it with unset():
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d{4}-\d{2}-\d{2})|i', $string, $arr_result);
$date = array("date" => $arr_result['date']);
unset($arr_result, $string); // delete the original preg_match() result array and subject string
print_r($date);
//or create a new:
// $arr_result = $date;
//print_r($arr_result);

You could use T-Regx and go with group() or namedGroups() which only returns named capturing groups.
<?php
$subject = "This is some text written on 2010-07-18.";
pattern('(?<date>\d\d\d\d-\d\d-\d\d)', 'i')->match($subject)->first(function ($match) {
$date = $match->get('date');
// 2010-07-18
$groups = $match->namedGroups();
// [
// 'date' => '2010-07-18'
// ]
});

I combined some of the code introduced above; this is the final version, which works on PHP 5.6+:
$re = '/\d+\r\n(?<start>[\d\0:]+),\d+\s--\>\s(?<end>[\d\0:]+),.*\r\nHOME.*\r\nGPS\((?<x>[\d\.]+),(?<y>[\d\.]+),(?<d>[\d\.]+)\)\sBAROMETER\:(?<h>[\d\.]+)/';
$str= file_get_contents($srtFile);
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
echo '<pre>';
$filtered = array_map(function ($d) {
return array_filter($d, "is_string", ARRAY_FILTER_USE_KEY);
}, $matches);
var_dump($filtered);
In case you are wondering what it does: it reads position data from the .srt files that DJI drones generate while recording video.

Try this:
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d\d\d\d-\d\d-\d\d)|i',$string,$arr_result);
echo $arr_result['date'];

PHP Regex for a specific numeric value inside a comma-delimited integer number string

I am trying to get the integers to the left and right of a given input in the $str variable using a regex, but I keep getting the commas back along with the integers. I only want the integers, not the commas. I have also tried replacing the wildcard . with \d, but still no luck.
$str = "1,2,3,4,5,6";
function pagination()
{
global $str;
// Using number 4 as an input from the string
preg_match('/(.{2})(4)(.{2})/', $str, $matches);
echo $matches[0]."\n".$matches[1]."\n".$matches[1]."\n".$matches[1]."\n";
}
pagination();

How about using a CSV parser?
$str = "1,2,3,4,5,6";
$line = str_getcsv($str);
$target = 4;
foreach($line as $key => $value) {
if($value == $target) {
echo $line[($key-1)] . '<--low high-->' . $line[($key+1)];
}
}
Output:
3<--low high-->5
or a regex could be
$str = "1,2,3,4,5,6";
preg_match('/(\d+),4,(\d+)/', $str, $matches);
echo $matches[1]."<--low high->".$matches[2];
Output:
3<--low high->5
The only flaw with these approaches is if the number is the start or end of range. Would that ever be the case?
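To cover that edge case, here is a sketch (PHP 7.1+; neighbours() is a hypothetical name) that returns null for a missing neighbour instead of reading past either end of the array:

```php
<?php
// Hypothetical helper: the values just left and right of $target in a
// comma-separated list of integers, with null where $target sits at an end.
// Returns null if $target does not occur at all.
function neighbours(string $csv, int $target): ?array
{
    $values = array_map('intval', explode(',', $csv));
    $key = array_search($target, $values, true); // strict: compare as ints
    if ($key === false) {
        return null;
    }
    // ?? turns the out-of-range indexes -1 and count($values) into null
    return [$values[$key - 1] ?? null, $values[$key + 1] ?? null];
}

$mid = neighbours("1,2,3,4,5,6", 4);   // [3, 5]
$first = neighbours("1,2,3,4,5,6", 1); // [null, 2]
echo $mid[0] . '<--low high-->' . $mid[1], "\n";
```

Splitting first also means multi-digit values such as 10 or 123 need no special handling.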

I believe you're looking for a regex non-capturing group.
Here's what I did:
$regStr = "1,2,3,4,5,6";
$regex = "/(\d)(?:,)(4)(?:,)(\d)/";
preg_match($regex, $regStr, $results);
print_r($results);
Gives me the results:
Array ( [0] => 3,4,5 [1] => 3 [2] => 4 [3] => 5 )
Hope this helps!

Given your function name I am going to assume you need this for pagination.
The following solution might be easier:
$str = "1,2,3,4,5,6,7,8,9,10";
$str_parts = explode(',', $str);
// reset and end return the first and last element of an array respectively
$start = reset($str_parts);
$end = end($str_parts);
This saves your regex from having to deal with numbers that reach double digits.

Use regular expression to extract attribute value for custom tag

Thanks for taking a look at this. I'm using PHP. I have a string like so:
[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don't so much dance as rhythmically convulse.[/QUOTE]
And I want to pull out the values in the quotes and create an associative array like so:
["name" => "Max-Fischer", "post" => "486662533", "member" => "123"]
Then, I would like to remove the opening and closing [QUOTE] tags and replace them with custom HTML like so:
<blockquote>Max-Fischer wrote: I don't so much dance as rhythmically convulse.</blockquote>
So the main problem is creating the preg_match() or preg_replace() to handle first: grabbing the values out in an array, and second: removing the tags and replacing them with my custom content. I can figure out how to use the array to create the custom HTML, I just can't figure how to use regular expressions well enough to achieve it.
I tried a match like this to get the attribute values:
/(\S+)=[\"\']?((?:.(?![\"\']?\s+(?:\S+)=|[>\"\']))+.)[\"\']?/
But this only returns:
[QUOTE
And that's not even addressing how to put the values (if I can get them) into an array.
Thanks in advance for your time.
Cheers.

If the tag you're looking for is always going to be QUOTE, then perhaps something a little simpler is possible:
$s ='[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don\'t so much dance as rhythmically convulse.[/QUOTE]';
$r = '/\[QUOTE="(.*?)"\](.*)\[\/QUOTE\]/';
$m = array();
$arr = array();
preg_match($r, $s, $m);
// m[0] = the initial string
// m[1] = the string of attributes
// m[2] = the quote itself
foreach(explode(',', $m[1]) as $valuepair) { // split the attributes on the comma
preg_match('/\s*(.*): (.*)/', $valuepair, $mm);
// mm[0] = the attribute pairing
// mm[1] = the attribute name
// mm[2] = the attribute value
$arr[$mm[1]] = $mm[2];
}
print_r($arr);
print $m[2] . "\n";
this gives the following output:
Array
(
[name] => Max-Fischer
[post] => 486662533
[member] => 123
)
I don't so much dance as rhythmically convulse.
If you want to handle the case where there is more than one quote in the string, you can make the regex slightly less greedy and then use preg_match_all instead of preg_match:
$s ='[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don\'t so much dance as rhythmically convulse.[/QUOTE]';
$s .='[QUOTE="name: Some-Guy, post: 486562533, member: 1234"]Quidquid latine dictum sit, altum videtur[/QUOTE]';
$r = '/\[QUOTE="(.*?)"\](.*?)\[\/QUOTE\]/';
// the second group is now (.*?): the added ? makes it lazy (less greedy)
$m = array();
$arr = array();
preg_match_all($r, $s, $m, PREG_SET_ORDER);
// $m[0] = the first quote, $m[1] = the second quote (one element per quote found)
// $m[0][0] = the full matched string
// $m[0][1] = the string of attributes
// $m[0][2] = the quote itself
foreach($m as $match) { // since there is more than one quote, we loop and handle each individually
$quote = array();
foreach(explode(',', $match[1]) as $valuepair) { // split the attributes on the comma
preg_match('/\s*(.*): (.*)/', $valuepair, $mm);
// mm[0] = the attribute pairing
// mm[1] = the attribute name
// mm[2] = the attribute value
$quote[$mm[1]] = $mm[2];
}
$arr[] = $quote; // we now build a parent array, to hold each individual quote
}
print_r($arr);
This gives output like:
Array
(
[0] => Array
(
[name] => Max-Fischer
[post] => 486662533
[member] => 123
)
[1] => Array
(
[name] => Some-Guy
[post] => 486562533
[member] => 1234
)
)
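Neither snippet covers the second half of the question, replacing the tags with HTML. A sketch of that using preg_replace_callback() with the lazy pattern above; the function names and the 'Someone' fallback are hypothetical, and it assumes attribute values contain no commas:

```php
<?php
// Hypothetical helper: turn 'name: Max-Fischer, post: 486662533' into
// ['name' => 'Max-Fischer', 'post' => '486662533'].
// Assumes values contain no commas.
function parse_quote_attributes(string $attrString): array
{
    $attrs = [];
    foreach (explode(',', $attrString) as $pair) {
        // Split on the first colon only, then trim the surrounding spaces
        $parts = explode(':', $pair, 2);
        if (count($parts) === 2) {
            $attrs[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $attrs;
}

// Hypothetical helper: replace every [QUOTE="..."]...[/QUOTE] with a <blockquote>.
function render_quotes(string $text): string
{
    return preg_replace_callback(
        '/\[QUOTE="(.*?)"\](.*?)\[\/QUOTE\]/s', // lazy groups; /s lets quotes span lines
        function (array $m): string {
            $attrs = parse_quote_attributes($m[1]);
            $name = $attrs['name'] ?? 'Someone'; // hypothetical fallback
            return '<blockquote>' . htmlspecialchars($name, ENT_QUOTES) . ' wrote: '
                . htmlspecialchars($m[2], ENT_QUOTES) . '</blockquote>';
        },
        $text
    );
}

echo render_quotes('[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don\'t so much dance as rhythmically convulse.[/QUOTE]'), "\n";
```

Because the quote text is escaped with htmlspecialchars() and ENT_QUOTES, the apostrophe in "don't" comes out as &#039; in the generated HTML.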

I managed to solve your problem of getting an associative array. I hope it helps.
Here is the code:
$str = <<< PP
[QUOTE=" name : Max-Fischer,post : 486662533,member : 123 "]I don't so much dance as rhythmically convulse.[/QUOTE]
PP;
preg_match_all('/^\[QUOTE=\"(.*?)\"\](?:.*?)]$/', $str, $matches);
// Allow optional spaces around the colon and hyphens (e.g. Max-Fischer) in the values
preg_match_all('/([a-zA-Z0-9]+)\s*:\s*([^,]+?)\s*(?:,|$)/', $matches[1][0], $result);
$your_data = array_combine($result[1],$result[2]);
echo "<pre>";
print_r($your_data);

PHP: preg_replace (x) occurrence?

I asked a similar question recently, but didn't get a clear answer because I was too specific. This one is more broad.
Does anyone know how to replace the (x)th occurrence of a regex pattern?
Example: let's say I wanted to replace the 5th occurrence of the regex pattern in a string. How would I do that?
Here is the pattern:
preg_replace('/{(.*?)\|\:(.*?)}/', 'replacement', $this->source);
@anubhava REQUESTED SAMPLE CODE (the last function doesn't work):
$sample = 'blah asada asdas {load|:title} steve jobs {load|:css} windows apple ';
$syntax = new syntax();
$syntax->parse($sample);
class syntax {
protected $source;
protected $i;
protected $r;
// parse source
public function parse($source) {
// set source to protected class var
$this->source = $source;
// match all occurrences for regex and run loop
$output = array();
preg_match_all('/\{(.*?)\|\:(.*?)\}/', $this->source, $output);
// run loop
$i = 0;
foreach($output[0] as $key):
// perform run function for each occurrence, send first match before |: and second match after |:
$this->run($output[1][$i], $output[2][$i], $i);
$i++;
endforeach;
echo $this->source;
}
// run function
public function run($m, $p, $i) {
// if method is load perform actions and run inject
switch($m):
case 'load':
$this->inject($i, 'content');
break;
endswitch;
}
// this function should inject the modified data, but I'm still working on this.
private function inject($i, $r) {
$output = preg_replace('/\{(.*?)\|\:(.*?)\}/', $r, $this->source);
}
}

You're misunderstanding regular expressions: they're stateless, have no memory and no ability to count, so a match can't know that it is the x'th match in a string; the regex engine doesn't have a clue. You can't do this kind of thing with a regex for the same reason that it's not possible to write a classical regular expression that checks whether a string has balanced brackets: the problem requires memory, which, by definition, regular expressions do not have (PCRE's recursive patterns are an extension beyond classical regular expressions).
However, a regex engine can tell you all the matches, so you're better off using preg_match_all() to get a list of matches and then modifying the string yourself using that information.
Update: is this closer to what you're thinking of?
<?php
class Parser {
private $i;
public function parse($source) {
$this->i = 0;
return preg_replace_callback('/\{(.*?)\|\:(.*?)\}/', array($this, 'on_match'), $source);
}
private function on_match($m) {
$this->i++;
// Do whatever processing you need on the match.
print_r(array('m' => $m, 'i' => $this->i));
// Return what you want the replacement to be.
return $m[0] . '=>' . $this->i;
}
}
$sample = 'blah asada asdas {load|:title} steve jobs {load|:css} windows apple ';
$parse = new Parser();
$result = $parse->parse($sample);
echo "Result is: [$result]\n";
Which gives...
Array
(
[m] => Array
(
[0] => {load|:title}
[1] => load
[2] => title
)
[i] => 1
)
Array
(
[m] => Array
(
[0] => {load|:css}
[1] => load
[2] => css
)
[i] => 2
)
Result is: [blah asada asdas {load|:title}=>1 steve jobs {load|:css}=>2 windows apple ]

A much simpler and cleaner solution, which also deals with backreferences:
function preg_replace_nth($pattern, $replacement, $subject, $nth=1) {
return preg_replace_callback($pattern,
function($found) use (&$pattern, &$replacement, &$nth) {
$nth--;
if ($nth==0) return preg_replace($pattern, $replacement, reset($found) );
return reset($found);
}, $subject,$nth );
}
echo preg_replace_nth("/(\w+)\|/", '${1} is the 4th|', "|aa|b|cc|dd|e|ff|gg|kkk|", 4);
outputs |aa|b|cc|dd is the 4th|e|ff|gg|kkk|

As already said, a regex has no state, and you can't do this just by passing an integer to pinpoint the exact match to replace. Instead, you could wrap the replacement in a function that finds all matches and replaces only the nth match, given as an integer:
<?php
function replace_nth_occurence ( &$haystack, $pattern, $replacement, $occurence) {
preg_match_all($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE);
if(array_key_exists($occurence-1, $matches[0])) {
$haystack = substr($haystack, 0, $matches[0][$occurence-1][1]).
$replacement.
substr($haystack,
$matches[0][$occurence-1][1] +
strlen($matches[0][$occurence-1][0])
);
}
}
$haystack = "test0|:test1|test2|:test3|:test4|test5|test6";
printf("%s \n", $haystack);
replace_nth_occurence( $haystack, '/\|:/', "<=>", 2);
printf("%s \n", $haystack);
?>

This is an alternative approach:
$parts = preg_split('/(\{.*?\|\:.*?\})/', $this->source, -1, PREG_SPLIT_DELIM_CAPTURE);
$parts will contain the original string parts at even offsets [0] [2] [4] [6] [8] [10] ...
and the matched delimiters, braces included, at odd offsets [1] [3] [5] [7] [9] ...
To find the 5th occurrence, for example, you could then modify element $n*2 - 1, which would be element [9] in this case:
$parts[5*2 - 1] = $replacement;
Then reassemble everything:
$output = implode('', $parts);

There is no literal way to match occurrence 5 of pattern /pat/. But you could match /^(.*?(?:pat.*?){4,4})pat/ and replace it with \1repl. This replaces everything up to and including the first 4 occurrences with itself, and the fifth occurrence with repl.
If /pat/ contains capture groups, you need to use the non-capturing equivalent for the first N-1 matches. The replacement should then reference the captured groups starting from \\2.
The implementation looks like:
function replace_occurrence($pat_cap, $pat_noncap, $repl, $sample, $n)
{
$nmin = $n - 1;
// ${1} rather than \1, so a replacement starting with a digit is not read as \10, \11, ...
return preg_replace("/^(.*?(?:$pat_noncap.*?){" . $nmin . "," . $nmin . "})$pat_cap/", "\${1}$repl", $sample);
}

My first idea was to use preg_replace_callback() and do the counting in the callback, as other users have (excellently) demonstrated.
Alternatively, you can use preg_split() and keep the delimiters with PREG_SPLIT_DELIM_CAPTURE, then do the actual replacement in the resulting array. PHP only returns what's between capturing parentheses, so you'll either have to adapt the regex or take care of other captures yourself. Assuming a single capturing pair, the captured delimiters will always be at the odd indexes 1, 3, 5, 7, 9, ...; for the 5th occurrence you want index 9, and then you implode the array again.
This does imply you'll need a single capturing group around the whole delimiter:
$sample = "blah asada asdas {load|:title} steve jobs {load|:css} windows apple\n";
$sample .= $sample . $sample; # at least 5 occurrences
$parts = preg_split('/(\{.*?\|\:.*?\})/', $sample, -1, PREG_SPLIT_DELIM_CAPTURE);
$parts[9] = 'replacement';
$return = implode('', $parts);
