Get next word after preg match with PHP - php

How can I get the next word after a preg_match with PHP?
For example, if I have a string like this:
"This is a string, keyword next, some more text. keyword next-word."
I want to use preg_match to get the next word after "keyword", including if the word is hyphenated.
So in the case above, I want to return "next" and "next-word".
I've tried:
$string = "This is a string, keyword next, some more text. keyword next-word.";
$keywords = preg_split("/(?<=\keyword\s)(\w+)/", $string);
print_r($keywords);
Which just returns everything and doesn’t seem to work at all.
Any help is much appreciated.

Using your example this should work using preg_match_all:
// Set the test string.
$string = "This is a string, keyword next, some more text. keyword next-word. keyword another_word. Okay, keyword do-rae-mi-fa_so_la.";
// Set the regex.
$regex = '/(?<=\bkeyword\s)(?:[\w-]+)/is';
// Run the regex with preg_match_all.
preg_match_all($regex, $string, $matches);
// Dump the results for testing.
echo '<pre>';
print_r($matches);
echo '</pre>';
And the results I get are:
Array
(
[0] => Array
(
[0] => next
[1] => next-word
[2] => another_word
[3] => do-rae-mi-fa_so_la
)
)

Positive look behind is what you are looking for:
(?<=\bkeyword\s)([a-zA-Z-]+)
This should work with preg_match. PHP has no g modifier, so use preg_match_all to catch all matches.
Reference Question: How to match the first word after an expression with regex?
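As a minimal sketch of the lookbehind approach above, assuming the sample string from the question (the variable names are only illustrative): preg_match returns the first hit, while preg_match_all collects every one.

```php
<?php
$string = "This is a string, keyword next, some more text. keyword next-word.";
$regex  = '/(?<=\bkeyword\s)[a-zA-Z-]+/';

// preg_match stops at the first match.
preg_match($regex, $string, $first);
echo $first[0], "\n";

// preg_match_all collects every match into $all[0].
preg_match_all($regex, $string, $all);
echo implode(', ', $all[0]), "\n";
```

Note that the character class [a-zA-Z-] does not include digits or underscores; [\w-] would be the broader choice if those can appear in the next word.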

While regex is powerful, it is also hard for most of us to debug and memorize.
Getting the next word after a match is a very common string operation, and in this particular case it can be done without regex.
Simply explode the string into an array and search for the index of the keyword. This is useful because we can specify how many words forward or backward to look.
This matches the first occurrence + 1 word:
<?php
$string = explode(" ","This is a string, keyword next, some more text. keyword next-word.");
echo $string[array_search("keyword",$string) + 1];
/* OUTPUT next, */
By reversing the array, we can catch the last occurrence - 1 word:
<?php
$string = array_reverse(explode(" ","This is a string, keyword next, some more text. keyword next-word."));
echo $string[array_search("keyword",$string) - 1];
/* OUTPUT next-word. */
This performs well if we are making multiple searches, but of course the string must be kept short, since the whole string is held in memory.
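The two snippets above each return a single word with its punctuation still attached. As a rough sketch of how the same explode idea could collect every occurrence, assuming that trimming a small set of trailing punctuation marks is acceptable (the $next array and the punctuation list are illustrative choices, not part of the original answer):

```php
<?php
$words = explode(" ", "This is a string, keyword next, some more text. keyword next-word.");

$next = [];
// array_keys with a search value returns every index that holds "keyword".
foreach (array_keys($words, "keyword", true) as $i) {
    if (isset($words[$i + 1])) {
        // Strip trailing punctuation such as "," or "." from the following word.
        $next[] = rtrim($words[$i + 1], ".,;:!?");
    }
}
print_r($next);
```

Because the comparison is exact, a token such as "keyword," (with punctuation attached) would not be found; the regex answers handle that case more gracefully.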

Related

regex - finding multiple occurances of a pattern and extracting a string [duplicate]

I have tried the non capturing group option ?:
Here is my data:
hello:"abcdefg"},"other stuff
Here is my regex:
/hello:"(.*?)"}/
Here is what it returns:
Array
(
[0] => Array
(
[0] => hello:"abcdefg"}
)
[1] => Array
(
[0] => abcdefg
)
)
I wonder, how can I make it so that [0] => abcdefg and [1] doesn't exist?
Is there any way to do this? I feel like it would be much cleaner and improve my performance. I understand that regex is simply doing what I told it to do, that is showing me the whole string that it found, and the group inside the string. But how can I make it only return abcdefg, and nothing more? Is this possible to do?
Thanks.
EDIT: I am using the regex on a website that says it uses perl regex. I am not actually using the perl interpreter
EDIT Again: apparently I misread the website. It is indeed using PHP, and it is calling it with this function: preg_match_all('/hello:"(.*?)"}/', 'hello:"abcdefg"},"other stuff', $arr, PREG_PATTERN_ORDER);
I apologize for this error, I fixed the tags.
EDIT Again 2: This is the website http://www.solmetra.com/scripts/regex/index.php
preg_match_all
If you want a different captured string, you need to change your regex. Here I'm looking for anything that is not a double quote (") between two double quote characters, preceded by a colon (:).
<?php
$string = 'hello:"abcdefg"},"other stuff';
$pattern = '!(?<=:")[^"]+(?=")!';
preg_match_all($pattern,$string,$matches);
echo $matches[0][0];
?>
Output
abcdefg
If you were to print_r($matches) you would see that you have the default array and the matches in their own additional arrays. So to access the string you would need to use $matches[0][0] which provides the two keys to access the data. But you're always going to have to deal with arrays when you're using preg_match_all.
Array
(
[0] => Array
(
[0] => abcdefg
)
)
preg_replace
Alternatively, if you were to use preg_replace instead, you could replace all of the contents of the string except for your capture group, and then you wouldn't need to deal with arrays (but you need to know a little more about regex).
<?php
$string = 'hello:"abcdefg"},"other stuff';
$pattern = '!^[^:]+:"([^"]+)".+$!s';
$new_string = preg_replace($pattern,"$1",$string);
echo $new_string;
?>
Output
abcdefg
preg_match_all is returning exactly what it is supposed to.
The first element is the entire string that matched the regex. Every other element is a capture group.
If you just want the capture group, then just ignore the first element.
preg_match_all('/hello:"(.*?)"}/', 'hello:"abcdefg"},"other stuff', $arr, PREG_PATTERN_ORDER);
$firstMatch = $arr[1];
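To illustrate what $arr[1] contains when the pattern occurs more than once, here is a small sketch. The subject string is an assumption: it is the question's data with a second, made-up hello:"..." entry added.

```php
<?php
// Assumed input: the original data plus a second, invented occurrence.
$subject = 'hello:"abcdefg"},hello:"hijklmn"},"other stuff';
preg_match_all('/hello:"(.*?)"}/', $subject, $arr, PREG_PATTERN_ORDER);

// $arr[0] holds the full matches; $arr[1] holds the first capture group of each match.
$captures = $arr[1];
print_r($captures);
```

With PREG_PATTERN_ORDER, $arr[1] is itself an array with one entry per match, so $arr[1][0] is the first captured string.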

PHP preg_match_all REGEX find all "file(...)" functions

I'm trying to parse a string containing the dump of a PHP file, to match all the occurrences of the PHP native function file(). I'm using preg_match_all with a REGEX that is not fully working as expected.
In fact, because I'm looking to match all the occurrences of the file() function, I do not want to match results like $file(), $file or is_file().
This is the PHP code on which I'm trying to match all the occurrences of file():
<?php
$x = file('one.php');
file('two.php');
//
function foo($path)
{
return file($path);
}
function inFile()
{
return "should not be matched";
}file('next_to_brackets.php');
foo('three.php');
file('four.php'); // comment
$file = 'should not be matched';
$_file = 'inFile';
$_file();
file('five.php');
The REGEX I'm using is the following:
/[^A-Za-z0-9\$_]file\s*\(.*?(\n|$)/i
[^A-Za-z0-9\$_] Starts with anything except letters, numbers, underscores and dollar signs.
file Continue with "file" word.
\s* Capture any space after "file" word.
\( After the spaces there should be an opening parenthesis.
.*? Capture any list of characters (the arguments).
(\n|$) Stop capturing when a new line or the end of the haystack is found.
/i Used for case-insensitive matches.
With this PHP code for testing the result:
preg_match_all('/[^A-Za-z0-9\$_]file\s*\(.*?(\n|$)/i', $string, $matches);
print_r($matches[0]);
/*
//Prints:
Array
(
[0] => file('one.php');
[1] => file($path);
[2] => }file('next_to_brackets.php');
[3] =>
file('four.php'); // comment
[4] =>
file('five.php');
)
*/
For some reason, my REGEX is not returning the second occurrence, file('two.php');, even though it is a valid function call, not a variable. It is definitely caused by the fact that it's right below another occurrence of the match ($x = file('one.php');).
Any suggestions on how to match exact PHP functions in a string containing PHP code?
Thank you!
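One possible approach, offered as a sketch rather than as an answer from the original thread: the leading character class [^A-Za-z0-9\$_] has to consume a character, but the newline before file('two.php'); was already consumed by the (\n|$) at the end of the previous match. A negative lookbehind checks the preceding character without consuming it, and [^\n]* stops at the end of the line without consuming the newline. The sample code below is a shortened, assumed version of the question's input.

```php
<?php
// Shortened sample, loosely based on the code in the question.
$code = <<<'CODE'
$x = file('one.php');
file('two.php');
$file = 'should not be matched';
$_file();
is_file('x');
}file('three.php');
CODE;

// (?<![A-Za-z0-9$_]) asserts that "file" is not preceded by an identifier character or "$".
preg_match_all('/(?<![A-Za-z0-9$_])file\s*\([^\n]*/i', $code, $matches);
print_r($matches[0]);
```

This is still a textual heuristic: it would also match file( inside strings or comments, and method calls such as $obj->file(...). For reliable results on real PHP source, a tokenizer-based approach is likely the safer route.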

Php preg_match multiple occurrences, return unique array

I want to be able to extract certain parts of the string and return unique array. Here is my string:
$string = "
<div> some text goes here... **css/method|1|2**</div>
<div>**php/method|3|4**</div>
<div>**html|method|6|9** and more text here</div>
<div>**html/method|2|5**</div>
";
using preg_match_all()
$pattern = "/\*\*(.*?)\*\*/";
preg_match_all($pattern, $string, $matches);
I can extract all the parts from the string, but I need to go step further, and only return the following:
css, php and html.
the final array should look like this:
$result = array("css", "php", "html");
So basically, I need to eliminate duplicate values (in this case "html"), as well as extract each value before the slash or pipe. I don't care about the method parts or what comes after them.
The solution using preg_match_all and array_unique functions:
preg_match_all("~\*\*([^/|*]+)(?=[/|])~", $string, $matches);
$result = array_unique($matches[1]);
print_r($result);
The output:
Array
(
[0] => css
[1] => php
[2] => html
)
(?=[/|]) - positive lookahead assertion that requires the captured word to be followed by one of the characters / or |
Update: to keep tags out of the match, change the regex pattern to ~\*\*([^/|*<>]+)(?=[/|])~
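One detail worth knowing: array_unique keeps the original keys, so when a duplicate is not the last element the result has gaps in its indexes. A short sketch, using an assumed input string in which "html" appears first and third, shows how array_values can reindex the result:

```php
<?php
// Assumed sample input, not the string from the question.
$string = "**html/a** **css/b** **html|c** **php/d**";

preg_match_all("~\*\*([^/|*]+)(?=[/|])~", $string, $matches);

// array_unique preserves keys (here 0, 1, 3); array_values renumbers them from 0.
$unique = array_unique($matches[1]);
$result = array_values($unique);
print_r($result);
```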

find a specific word in string php

I have a text in PHP stored in the variable $row. I'd like to find the position of a certain group of words, and that's quite easy. What's not so easy is to make my code recognize whether the word it has found is exactly the word I'm looking for or just part of a larger word. Is there a way to do it?
Example of what I'd like to obtain
CODE:
$row= "some ugly text of some kind i'd like to find in someway";
$token= "some";
$pos= -1;
$counter= substr_count($row, $token);
for ($h=0; $h<$counter; $h++) {
$pos= strpos($row, $token, $pos+1);
echo $pos.' ';
}
OUTPUT:
what I obtain:
0 18 48
what I'd like to obtain
0 18
Any hint?
Use preg_match_all() with word boundaries (\b):
$search = preg_quote($token, '/');
preg_match_all("/\b$search\b/", $row, $m, PREG_OFFSET_CAPTURE);
Here, the preg_quote() statement is used to correctly escape the user input so that it can be used in our regular expression. Some characters have special meaning in regular expression language; without proper escaping, those characters keep their special meaning instead of being matched literally, and your regex might not work as intended.
In the preg_match_all() statement, we are supplying the following regex:
/\b$search\b/
Explanation:
/ - starting delimiter
\b - word boundary. A word boundary, in most regex dialects, is a position between a word character (\w) and a non-word character (\W).
$search - escaped search term
\b - word boundary
/ - ending delimiter
In simple English, it means: find all the occurrences of the given word some where it stands alone as a whole word.
Note that we're also using PREG_OFFSET_CAPTURE flag here. If this flag is passed, for every occurring match the appendant string offset will also be returned. See the documentation for more information.
To obtain the results you want, you can simply loop through the $m array and extract the offsets:
$result = implode(' ', array_map(function($arr) {
return $arr[1];
}, $m[0]));
echo $result;
Output:
0 18
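The pieces above can be combined into one small helper. This is only a sketch: findWordOffsets is a hypothetical name, not a built-in function. The second call uses an invented input to show why preg_quote matters, since an unescaped "." would also match the "x" in "axb".

```php
<?php
// Hypothetical helper: returns the byte offsets at which $word occurs as a whole word in $text.
function findWordOffsets(string $text, string $word): array
{
    // Escape regex metacharacters (and the "/" delimiter) in the search term.
    $search = preg_quote($word, '/');
    preg_match_all("/\b$search\b/", $text, $m, PREG_OFFSET_CAPTURE);
    // Each entry of $m[0] is [matched string, offset]; keep only the offsets.
    return array_column($m[0], 1);
}

echo implode(' ', findWordOffsets("some ugly text of some kind i'd like to find in someway", "some")), "\n";
echo implode(' ', findWordOffsets("a.b axb a.b", "a.b")), "\n";
```

Keep in mind that \b only behaves as expected when the search term starts and ends with a word character.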
What you're looking for is a combination of a regex with a word boundary pattern and the flag that returns the offset (PREG_OFFSET_CAPTURE).
PREG_OFFSET_CAPTURE
If this flag is passed, for every occurring match the appendant
string offset will also be returned. Note that this changes the
value of matches into an array where every element is an array
consisting of the matched string at offset 0 and its string offset
into subject at offset 1.
$row= "some ugly text of some kind i'd like to find in someway";
$pattern= "/\bsome\b/i";
preg_match_all($pattern, $row, $matches, PREG_OFFSET_CAPTURE);
And we get something like this:
Array
(
[0] => Array
(
[0] => Array
(
[0] => some
[1] => 0
)
[1] => Array
(
[0] => some
[1] => 18
)
)
)
And just loop through the matches and extract the offset where the needle was found in the haystack.
// store the positions of the match
$offsets = array();
foreach($matches[0] as $match) {
$offsets[] = $match[1];
}
// display the offsets
echo implode(' ', $offsets);
Use preg_match():
if(preg_match("/some/", $row))
// [..]
The first argument is a regex, which can match virtually anything you want to match. But, there are dire warnings about using it to match things like HTML.

preg_replace_callback regex issue, match with (.*?) returns array

Given the string {{esc}}"Content"{{/esc}} ... {{esc}}"More content"{{/esc}}, I would like to output \"Content\" ... \"More content\"; i.e., I am trying to escape the quotes inside a string. (This is a contrived example, though, so an answer with something like 'just use this library to do it' would be unhelpful.)
Here is my current solution:
return preg_replace_callback(
'/{{esc}}(.*?){{\/esc}}/',
function($m) {
return str_replace('"', '\\"', $m[1]);
},
$text
);
As you can see, I need to say $m[1], because a print_r reveals that $m looks like this:
Array
(
[0] => {{esc}}"Content"{{/esc}}
[1] => "Content"
)
or, for the second match,
Array
(
[0] => {{esc}}"More content"{{/esc}}
[1] => "More content"
)
My question is: why does my regex cause $m to be an array? Is there any way I can get the result of $m[1] as just a single variable $m?
The regex matches the string and puts the result into an array. If there is a match, the first element stores the whole matched string, and the remaining elements of the array are the captured substrings.
preg_replace_callback() acts like preg_match():
$result = array();
preg_match('/{{esc}}(.*?){{\/esc}}/', $input_str, $result);
// If the pattern matches, $result holds the whole match followed by the captures.
With the help of Jack, I answered my own question here since srain did not make this point clear: The second element of the array is the result captured by the parenthesized subexpression (.*?), per the PHP manual. Indeed, there does not appear to be a convenient way to extract the string matched by this subexpression otherwise.
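The callback always receives an array, but a named group can make the intent a little clearer than $m[1]. The following is a sketch under that assumption; the group name body is an arbitrary choice, and the input is the string from the question.

```php
<?php
$text = '{{esc}}"Content"{{/esc}} ... {{esc}}"More content"{{/esc}}';

$escaped = preg_replace_callback(
    '/{{esc}}(?<body>.*?){{\/esc}}/',
    function (array $m) {
        // $m[0] is the whole match; $m['body'] (also available as $m[1]) is the captured text.
        return str_replace('"', '\\"', $m['body']);
    },
    $text
);
echo $escaped, "\n";
```

The named group does not remove the array; it only gives the captured element a readable key alongside its numeric index.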
