regex - finding multiple occurrences of a pattern and extracting a string [duplicate]

I have tried the non capturing group option ?:
Here is my data:
hello:"abcdefg"},"other stuff
Here is my regex:
/hello:"(.*?)"}/
Here is what it returns:
Array
(
[0] => Array
(
[0] => hello:"abcdefg"}
)
[1] => Array
(
[0] => abcdefg
)
)
I wonder, how can I make it so that [0] => abcdefg and [1] doesn't exist?
Is there any way to do this? I feel like it would be much cleaner and improve my performance. I understand that regex is simply doing what I told it to do, that is showing me the whole string that it found, and the group inside the string. But how can I make it only return abcdefg, and nothing more? Is this possible to do?
Thanks.
EDIT: I am using the regex on a website that says it uses perl regex. I am not actually using the perl interpreter
EDIT Again: apparently I misread the website. It is indeed using PHP, and it is calling it with this function: preg_match_all('/hello:"(.*?)"}/', 'hello:"abcdefg"},"other stuff', $arr, PREG_PATTERN_ORDER);
I apologize for this error, I fixed the tags.
EDIT Again 2: This is the website http://www.solmetra.com/scripts/regex/index.php

preg_match_all
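If you want a different captured string, you need to change your regex. Here I'm looking for one or more characters that are not a double quote, sitting between two " characters, where the opening quote directly follows a : (colon). The lookbehind (?<=:") and the lookahead (?=") check the surrounding characters without including them in the match.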
<?php
$string = 'hello:"abcdefg"},"other stuff';
$pattern = '!(?<=:")[^"]+(?=")!';
preg_match_all($pattern,$string,$matches);
echo $matches[0][0];
?>
Output
abcdefg
If you were to print_r($matches), you would see an outer array that holds one inner array per group. Because this pattern has no capture group, only index 0 (the whole match) exists, so the string is at $matches[0][0]. You will always have to deal with nested arrays when you use preg_match_all.
Array
(
[0] => Array
(
[0] => abcdefg
)
)
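If the goal is only to have the wanted text land in index 0, another option is the \K escape, which PCRE uses to reset the start of the reported match. This is a minimal sketch on the same sample input, not the only way to write it; the lookahead for "} is an assumption taken from the pattern in the question.

```php
<?php
$string = 'hello:"abcdefg"},"other stuff';

// \K drops everything matched so far (hello:") from the reported match.
// The lookahead (?="\}) requires the closing "} without consuming it.
$pattern = '/hello:"\K[^"]*(?="\})/';

preg_match_all($pattern, $string, $matches);
print_r($matches); // only index 0 exists, and $matches[0][0] is abcdefg
```

Since the pattern contains no capture group, there is no [1] entry at all.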
preg_replace
Alternatively, if you were to use preg_replace instead, you could replace all of the contents of the string except for your capture group, and then you wouldn't need to deal with arrays (but you need to know a little more about regex).
<?php
$string = 'hello:"abcdefg"},"other stuff';
$pattern = '!^[^:]+:"([^"]+)".+$!s';
$new_string = preg_replace($pattern,"$1",$string);
echo $new_string;
?>
Output
abcdefg
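One thing to keep in mind with the preg_replace approach: when the pattern does not match, preg_replace returns the subject unchanged, so you cannot tell a real result from a failed match. A hedged sketch of a safer variant follows; the helper name extractQuoted is hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper: returns the quoted value, or null when nothing matches.
function extractQuoted(string $input): ?string
{
    // Same pattern as in the preg_replace example above.
    if (preg_match('!^[^:]+:"([^"]+)".+$!s', $input, $m) === 1) {
        return $m[1];
    }
    return null;
}

$found = extractQuoted('hello:"abcdefg"},"other stuff'); // abcdefg
$missing = extractQuoted('no match here');                // null

// For comparison: preg_replace hands back the input when nothing matches.
$unchanged = preg_replace('!^[^:]+:"([^"]+)".+$!s', '$1', 'no match here');
```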

preg_match_all is returning exactly what it is supposed to.
The first element holds the entire string that matched the regex. Every other element holds a capture group.
If you just want the capture group, simply ignore the first element.
preg_match_all('/hello:"(.*?)"}/', 'hello:"abcdefg"},"other stuff', $arr, PREG_PATTERN_ORDER);
$captures = $arr[1]; // every group 1 capture, one entry per occurrence
$firstMatch = $arr[1][0]; // abcdefg
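To see how this behaves with more than one occurrence, here is a small sketch. The two-occurrence input string is made up for illustration; the flags PREG_PATTERN_ORDER and PREG_SET_ORDER are the standard preg_match_all options.

```php
<?php
// Made-up input with two occurrences of the pattern.
$subject = 'hello:"abc"},hello:"def"}';
$pattern = '/hello:"(.*?)"\}/';

// PREG_PATTERN_ORDER (the default): results are grouped by capture group.
preg_match_all($pattern, $subject, $byGroup, PREG_PATTERN_ORDER);
$captures = $byGroup[1]; // all group 1 captures: abc, def

// PREG_SET_ORDER: results are grouped by occurrence instead.
preg_match_all($pattern, $subject, $byMatch, PREG_SET_ORDER);
$firstOccurrence = $byMatch[0]; // full match and capture of occurrence 1

// array_column pulls index 1 out of every occurrence.
$sameCaptures = array_column($byMatch, 1);
```

Either way the full match is still reported; you just pick the index you need.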

Related

Php preg_match multiple occurrences, return unique array

I want to be able to extract certain parts of the string and return unique array. Here is my string:
$string = "
<div> some text goes here... **css/method|1|2**</div>
<div>**php/method|3|4**</div>
<div>**html|method|6|9** and more text here</div>
<div>**html/method|2|5**</div>
";
using preg_match_all()
$pattern = "/\*\*(.*?)\*\*/";
preg_match_all($pattern, $string, $matches);
I can extract all the parts from the string, but I need to go a step further and return only the following:
css, php and html.
the final array should look like this:
$result = array("css", "php", "html");
So basically, I need to eliminate duplicate values (in this case "html") and extract each value before the forward slash or pipe. I don't care about the method parts or anything that comes after.

The solution using preg_match_all and array_unique functions:
preg_match_all("~\*\*([^/|*]+)(?=[/|])~", $string, $matches);
$result = array_unique($matches[1]);
print_r($result);
The output:
Array
(
[0] => css
[1] => php
[2] => html
)
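(?=[/|]) is a positive lookahead assertion: the captured word must be immediately followed by / or |, but that character is not part of the match.
Update: the pattern above can also capture text that follows a closing ** (for example the < of a closing tag, because it is followed by /). To keep tags out of the match, use this pattern instead: ~\*\*([^/|*<>]+)(?=[/|])~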
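Putting the pieces together, here is a sketch that runs the tag-excluding pattern on the sample string from the question. The array_values call is an addition of mine: array_unique keeps the original keys, so reindexing gives a clean 0-based list.

```php
<?php
$string = "
<div> some text goes here... **css/method|1|2**</div>
<div>**php/method|3|4**</div>
<div>**html|method|6|9** and more text here</div>
<div>**html/method|2|5**</div>
";

// Capture the word after ** that is followed by / or |, excluding tag characters.
preg_match_all('~\*\*([^/|*<>]+)(?=[/|])~', $string, $matches);

// Drop duplicates, then reindex so the keys are 0, 1, 2.
$result = array_values(array_unique($matches[1]));
print_r($result); // css, php, html
```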

preg_replace_callback regex issue, match with (.*?) returns array

Given the string {{esc}}"Content"{{/esc}} ... {{esc}}"More content"{{/esc}}, I would like to output \"Content\" ... \"More content\"; that is, I am trying to escape the quotes inside a string. (This is a contrived example, so an answer along the lines of 'just use this library to do it' would be unhelpful.)
Here is my current solution:
return preg_replace_callback(
'/{{esc}}(.*?){{\/esc}}/',
function($m) {
return str_replace('"', '\\"', $m[1]);
},
$text
);
As you can see, I need to say $m[1], because a print_r reveals that $m looks like this:
Array
(
[0] => {{esc}}"Content"{{/esc}}
[1] => "Content"
)
or, for the second match,
Array
(
[0] => {{esc}}"More content"{{/esc}}
[1] => "More content"
)
My question is: why does my regex cause $m to be an array? Is there any way I can get the result of $m[1] as just a single variable $m?

The regex matches the string and puts the result into an array. On a match, the first element stores the whole matched string, and the remaining elements store the captured substrings.
preg_replace_callback() acts like preg_match():
$result = array();
preg_match('/{{esc}}(.*?){{\/esc}}/', $input_str, $result);
// On a match, $result is an array: [0] is the whole match, [1] the first capture.

With the help of Jack, I answered my own question here since srain did not make this point clear: The second element of the array is the result captured by the parenthesized subexpression (.*?), per the PHP manual. Indeed, there does not appear to be a convenient way to extract the string matched by this subexpression otherwise.
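The callback will always receive an array, but a named group can at least make the access more readable. The following is a sketch under two assumptions: PHP 7.4 or later (for the arrow function) and a group name, body, that I picked for illustration.

```php
<?php
$text = '{{esc}}"Content"{{/esc}} ... {{esc}}"More content"{{/esc}}';

$escaped = preg_replace_callback(
    // (?<body>...) is a named capture group; braces are escaped to be explicit.
    '/\{\{esc\}\}(?<body>.*?)\{\{\/esc\}\}/',
    // $m is still an array, but the capture is available under its name.
    fn(array $m): string => str_replace('"', '\\"', $m['body']),
    $text
);

echo $escaped, "\n";
```

The numeric index $m[1] keeps working as well; the name is just an alias for it.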

Split string depending on the existence of a leading character

In PHP, I need to split a string on every ":" character that is not preceded by a "*".
This is what using explode() does:
$string = "1*:2:3*:4";
explode(":", $string);
output: array("1*", "2", "3*", "4")
However the output I need is:
output: array("1*:2", "3*:4")
How would I achieve the desired output?

You're probably looking for preg_match_all() rather than explode(), as you are attempting a more complex split than explode() itself can handle. preg_match_all() will allow you to gather all of the parts of a string that match a specific pattern, expressed using a regular expression. The pattern you are looking for is something along the lines of:
anything except : followed by *: followed by anything but :
So, try this instead:
preg_match_all('/[^:]+\*:[^:]+/', $string, $matches);
print_r($matches);
Which will output something like:
Array
(
[0] => Array
(
[0] => 1*:2
[1] => 3*:4
)
)
You should be able to use this in much the same way as the result of explode(), despite the added dimension in the array: preg_match_all() groups its results by capture group, and since this pattern has none, everything ends up under index 0, the match of the whole expression.

$str = '1*:2:3*:4';
$res = preg_split('~(?<!\*):~',$str);
print_r($res);
will output
Array
(
[0] => 1*:2
[1] => 3*:4
)
The pattern uses a negative lookbehind, (?<!\*), and basically says:
split on [a colon that is not preceded by an asterisk]
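The two approaches are not interchangeable on every input. As a rough illustration, the sketch below uses a made-up string in which one piece has no "*:" inside it; preg_split keeps that piece, while the preg_match_all pattern skips it.

```php
<?php
// Made-up input: the first piece ("5") contains no "*:".
$str = '5:1*:2';

// Split on colons not preceded by an asterisk.
$split = preg_split('~(?<!\*):~', $str);

// Collect only pieces that contain "*:".
preg_match_all('/[^:]+\*:[^:]+/', $str, $matches);
$matched = $matches[0];

print_r($split);   // 5 and 1*:2
print_r($matched); // 1*:2 only
```

Which behaviour is right depends on whether pieces without "*:" should be kept.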

Get all occurrences of words between curly brackets

I have a text like:
This is a {demo} phrase made for {test}
I need to get
demo
test
Note: My text can have more than one block of {}, not always two. Example:
This is a {demo} phrase made for {test} written in {English}
I used this expression /{([^}]*)}/ with preg_match but it returns only the first word, not all words inside the text.

Use preg_match_all instead:
preg_match_all($pattern, $input, $matches);
It's much the same as preg_match, with the following stipulations:
Searches subject for all matches to the regular expression given in
pattern and puts them in matches in the order specified by flags.
After the first match is found, the subsequent searches are continued
on from end of the last match.
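To make the difference concrete, here is a short sketch that runs both functions on the three-block example from the question. The braces are escaped in the pattern purely to be explicit.

```php
<?php
$input = 'This is a {demo} phrase made for {test} written in {English}';
$pattern = '/\{([^}]*)\}/';

// preg_match stops after the first match.
preg_match($pattern, $input, $first);
$firstWord = $first[1]; // demo

// preg_match_all keeps searching from the end of each match.
$count = preg_match_all($pattern, $input, $all);
$words = $all[1]; // demo, test, English

echo implode("\n", $words), "\n";
```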

Your expression is correct, but you should be using preg_match_all() instead to retrieve all matches. Here's a working example of what that would look like:
$s = 'This is a {demo} phrase made for {test}';
if (preg_match_all('/{([^}]*)}/', $s, $matches)) {
echo join("\n", $matches[1]);
}
To also capture the position of each match, you can pass PREG_OFFSET_CAPTURE as the fourth parameter to preg_match_all. With that flag, each entry becomes a pair of the matched text and its byte offset in the subject:
if (preg_match_all('/{([^}]*)}/', $s, $matches, PREG_OFFSET_CAPTURE)) {
foreach ($matches[1] as $match) {
echo "{$match[0]} occurs at position {$match[1]}\n";
}
}

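Because { and } also belong to regex quantifier syntax (as in \d{3,}), it is safest to escape them when you mean the literal characters, even though PCRE treats a { that does not start a valid quantifier as a literal: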
<?php
$text = <<<EOD
this {is} some text {from}
which I {may} want to {extract}
some words {between} brackets.
EOD;
preg_match_all("!\{(\w+)\}!", $text, $matches);
print_r($matches);
?>
produces
Array
(
[0] => Array
(
[0] => {is}
[1] => {from}
[2] => {may}
[3] => {extract}
[4] => {between}
)
... etc ...
)
This example may be helpful to understand the use of curly brackets in regexes:
<?php
$str = 'abc212def3456gh34ij';
preg_match_all("!\d{3,}!", $str, $matches);
print_r($matches);
?>
which returns:
Array
(
[0] => Array
(
[0] => 212
[1] => 3456
)
)
Note that '34' is excluded from the results because the \d{3,} requires a match of at least 3 consecutive digits.

Matching the portions between pairs of braces with a regex is a quick and dirty patch compared with using a stack. If the braces can nest, or you need to parse and process the input string properly, use a stack instead.
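For what the stack idea might look like in practice, here is a minimal sketch. The function name extractBraced is hypothetical, and the handling of unbalanced braces (a stray } is ignored, an unclosed { yields nothing) is my own assumption, not something the answer specifies.

```php
<?php
// Hypothetical helper: returns the content of every balanced {...} pair,
// innermost pairs first, using a stack of opening-brace offsets.
function extractBraced(string $text): array
{
    $stack = [];
    $found = [];
    $length = strlen($text);

    for ($i = 0; $i < $length; $i++) {
        if ($text[$i] === '{') {
            $stack[] = $i; // remember where this block opened
        } elseif ($text[$i] === '}' && $stack !== []) {
            $start = array_pop($stack); // most recent unmatched opener
            $found[] = substr($text, $start + 1, $i - $start - 1);
        }
    }

    return $found;
}

$flat = extractBraced('This is a {demo} phrase made for {test}');
$nested = extractBraced('a {b {c} d} e');
```

On the nested input the inner block closes first, so it is reported before the outer one.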
