PHP preg_match_all regex: find all "file(...)" function calls

I'm trying to parse a string containing the dump of a PHP file, to match all occurrences of the native PHP function file(). I'm using preg_match_all with a regex that is not fully working as expected.
Because I'm looking to match only calls to the file() function, I do not want to match results like $file(), $file or is_file().
This is the PHP code on which I'm trying to match all the occurrences of file():
<?php
$x = file('one.php');
file('two.php');
//
function foo($path)
{
return file($path);
}
function inFile()
{
return "should not be matched";
}file('next_to_brackets.php');
foo('three.php');
file('four.php'); // comment
$file = 'should not be matched';
$_file = 'inFile';
$_file();
file('five.php');
The REGEX I'm using is the following:
/[^A-Za-z0-9\$_]file\s*\(.*?(\n|$)/i
[^A-Za-z0-9\$_] Starts with anything except letters, numbers, underscores and dollar signs.
file Continue with the word "file".
\s* Match any whitespace after the word "file".
\( After the whitespace there should be an opening parenthesis.
.*? Lazily match any characters (the arguments).
(\n|$) Stop at the first newline or at the end of the haystack.
/i Used for case-insensitive matches.
With this PHP code for testing the result:
preg_match_all('/[^A-Za-z0-9\$_]file\s*\(.*?(\n|$)/i', $string, $matches);
print_r($matches[0]);
/*
//Prints:
Array
(
[0] => file('one.php');
[1] => file($path);
[2] => }file('next_to_brackets.php');
[3] =>
file('four.php'); // comment
[4] =>
file('five.php');
)
*/
For some reason, my regex does not return the second occurrence, file('two.php');, even though it is a valid function call, not a variable. It is definitely caused by the fact that it's right below another match ($x = file('one.php');).
Any suggestions on how to match exact PHP functions in a string containing PHP code?
Thank you!
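A likely explanation, with a sketch of a fix: the trailing (\n|$) consumes the newline at the end of the first match, and preg_match_all never reuses characters that an earlier match already consumed, so [^A-Za-z0-9\$_] has no character left to match in front of file('two.php'). The sketch below swaps that class for a negative lookbehind, (?<![\w$]), which only checks the preceding character without consuming it. It assumes you want the rest of each line back, as in the original pattern.

```php
<?php
// The sample code from the question, as a nowdoc so nothing is interpolated.
$string = <<<'PHPCODE'
<?php
$x = file('one.php');
file('two.php');
//
function foo($path)
{
return file($path);
}
function inFile()
{
return "should not be matched";
}file('next_to_brackets.php');
foo('three.php');
file('four.php'); // comment
$file = 'should not be matched';
$_file = 'inFile';
$_file();
file('five.php');
PHPCODE;

// (?<![\w$]) : the previous character (if any) must not be a letter, digit, _ or $;
//              it is only checked, not consumed, so the preceding newline stays available.
// file\s*\(  : the word "file", optional whitespace, then an opening parenthesis.
// .*         : the rest of the line ("." does not match "\n" without the s modifier).
preg_match_all('/(?<![\w$])file\s*\(.*/i', $string, $matches);
print_r($matches[0]);
```

This still matches file( inside comments or string literals; if that matters, PHP's tokenizer (token_get_all()) is a sturdier option than any regex.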

Related

Trying to create a regex in PHP that matches patterns inside a pattern

I have seen some regex examples where the string is "Test string: Group1Group2" and preg_match_all() is used to match text that exists inside the tags.
However, what I am trying to do is a bit different, where my string is something like this:
"some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)"
What I want to do is match the sections such as 'variable=123' that exist inside the parenthesis.
What I have so far is this:
if (preg_match_all("/\(([^\)]*?)\)/", $string_value, $matches))
{
print_r( $matches[1] );
}
But this just captures everything inside the parentheses as a single match, and doesn't match the individual parts.
Edit:
The desired output would be:
"variable1=123"
"variable2=743"
"variable3=535"
The output that I am getting is:
"variable1=123,variable2=743,variable3=535"
You can extract the matches you need with a single call to preg_match_all if the matches do not contain (, ) or ,:
$s = '"some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)"';
if (preg_match_all('~(?:\G(?!\A),|\()\K[^,]+(?=[^()]*\))~', $s, $matches)) {
print_r($matches[0]);
}
Details:
(?:\G(?!\A),|\() - either end of the preceding successful match and a comma, or a ( char
\K - match reset operator that discards all text matched so far from the current overall match memory buffer
[^,]+ - one or more chars other than a comma (use [^,]* if you expect empty matches, too)
(?=[^()]*\)) - a positive lookahead that requires zero or more chars other than ( and ) and then a ) immediately to the right of the current location.
I would do this:
preg_match("/\(([^\)]+)\)/", $string_value, $matches);
$result = explode(",", $matches[1]);
If your end result is an array of key => value then you can transform it into a query string:
preg_match("/\(([^\)]+)\)/", $string_value, $matches);
parse_str(str_replace(',', '&', $matches[1]), $result);
Which yields:
Array
(
[variable1] => 123
[variable2] => 743
[variable3] => 535
)
Or replace with a newline \n and use parse_ini_string().
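The parse_ini_string() route could look like this sketch, assuming the values are plain alphanumerics (the INI parser treats characters such as ; or quotes specially). In the default scanner mode the values come back as strings.

```php
<?php
$string_value = 'some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)';

if (preg_match('/\(([^)]+)\)/', $string_value, $matches)) {
    // "variable1=123,variable2=743,..." becomes one key=value pair per line.
    $result = parse_ini_string(str_replace(',', "\n", $matches[1]));
    print_r($result);
}
```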

Matching whole words between commas, or a comma at the beginning, or a comma at the end with Regex

I have a string like this:
page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags
I made this regex that I expect to get the whole tags with:
(?<=\,)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=\,)
I want it to match all the occurrences.
In this case:
page-9000 and rss-latest.
This regex checks whole words between commas just fine but it ignores the first and the last because it's not between commas (obviously).
I've also tried making it check whether the word is between commas OR has one comma at the beginning OR one comma at the end; however, that gives false positives, as it would match:
category-128
while the string contains:
page-category-128
Any help?
Try using the following pattern:
(?<=,|^)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=,|$)
The only change I have made is to add boundary markers ^ and $ to the lookarounds to also match on the start and end of the input.
Script:
$input = "page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags";
preg_match_all("/(?<=,|^)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=,|$)/", $input, $matches);
print_r($matches[1]);
This prints:
Array
(
[0] => page-9000
[1] => rss-latest
)
Here is a non-regex way using explode and array_intersect:
$arr1 = explode(',', 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags');
$arr2 = explode('|', 'rss-latest|listing-latest-no-category|category-128|page-9000');
print_r(array_intersect($arr1, $arr2));
Output:
Array
(
[0] => page-9000
[6] => rss-latest
)
The (?<=\,) and (?=,) lookarounds require a , on both sides of the matching pattern. You also want to match at the start/end of the string, so you need either to explicitly allow a , or the start/end of the string, or to use double-negating logic with negated character classes inside negative lookarounds.
You may use
(?<![^,])(?:rss-latest|listing-latest-no-category|category-128|page-9000)(?![^,])
Here, (?<![^,]) matches the start of string position or a , and (?![^,]) matches the end of string position or ,.
Now you do not even need a capturing group; you can get rid of its overhead by using a non-capturing group, (?:...). preg_match_all won't have to allocate memory for the submatches, and the resulting array will be much cleaner.
PHP code:
$re = '/(?<![^,])(?:rss-latest|listing-latest-no-category|category-128|page-9000)(?![^,])/m';
$str = 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags';
if (preg_match_all($re, $str, $matches)) {
print_r($matches[0]);
}
// => Array ( [0] => page-9000 [1] => rss-latest )
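If the tags come from an array, a sketch like the following builds the same double-negated pattern. The $tags array is a hypothetical input; each tag is escaped with preg_quote() in case it contains regex metacharacters.

```php
<?php
// Hypothetical list of tags to look for.
$tags = ['rss-latest', 'listing-latest-no-category', 'category-128', 'page-9000'];

// Escape each tag, then join them into one alternation.
$alternation = implode('|', array_map(function ($t) {
    return preg_quote($t, '/');
}, $tags));

// (?<![^,]) = start of string or a comma before; (?![^,]) = end of string or a comma after.
$re = '/(?<![^,])(?:' . $alternation . ')(?![^,])/';
$str = 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags';

preg_match_all($re, $str, $matches);
print_r($matches[0]);
```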

How to separate string to number in single word with PHP?

I have the word AK747. I use a regex to detect whether a string (at least 2 letters, e.g. AK) is followed by a number (at least two digits, e.g. 747).
EDIT: (sorry I wasn't clear on this, guys)
I need to do this because:
In some cases I need to split it so that a search matches AK-747. When I search for the string 'AK-747' with the keyword 'AK747', it won't find a match unless I use levenshtein in the database, so I prefer splitting AK747 into AK and 747.
My code:
$strNumMatch = preg_match('/^[a-zA-Z]{2,}[0-9]{2,}$/',
$value, $match);
if(isset($match[0]))
echo $match[0];
How do I split to array ['AK', '747'] for example with preg_split() or any other way?
$input = 'AK-747';
if (preg_match('/^([a-z]{2,})-?([0-9]{2,})$/i', $input, $result)) {
unset($result[0]);
}
print_r($result);
The output:
Array
(
[1] => AK
[2] => 747
)
You may try this:
preg_match('/[0-9]{2,}/', $value, $matches, PREG_OFFSET_CAPTURE);
$position = $matches[0][1];
$letters = substr($value, 0, $position);
$numbers = substr($value, $position);
This way you get the position of the first number and split there.
EDIT:
Starting from your original approach this could look somewhat like this:
$strNumMatch = preg_match('/^([a-zA-Z]{2,})([0-9]{2,})$/', $value, $match, PREG_OFFSET_CAPTURE);
if($strNumMatch){
$position = $match[2][1]; // offset of the digits group; the array is $match, not $matches
$letters = substr($value, 0, $position);
$numbers = substr($value, $position);
$alternative = $letters.'-'.$numbers;
}
preg_split() is a very sensible and direct call since you desire an indexed array containing the two substrings.
Code:
$input = 'AK-747';
var_export(preg_split('/[a-z]{2,}\K-?/i',$input));
Output:
array (
0 => 'AK',
1 => '747',
)
The \K means "restart the fullstring match". Effectively, everything to the left of \K is retained as the first element in the result array, and everything to the right (the optional hyphen) is omitted because it is treated as the delimiter.
Code:
I process a small battery of inputs to show what can be done and explain after the snippet.
$inputs=['AK747','AK-747','AK-','AK']; // variations as I understand them
foreach($inputs as $input){
echo "$input returns: ";
var_export(preg_split('/[a-z]{2,}\K-?/i',$input,2,PREG_SPLIT_NO_EMPTY));
echo "\n";
}
Output:
AK747 returns: array (
0 => 'AK',
1 => '747',
)
AK-747 returns: array (
0 => 'AK',
1 => '747',
)
AK- returns: array (
0 => 'AK',
)
AK returns: array (
0 => 'AK',
)
preg_split() takes a pattern that matches a variable substring and uses it as the delimiter. If - were present in every input string, then explode('-',$input) would be most appropriate. However, - is optional in this task, so the pattern must allow - to be optional (this is what the ? quantifier does in all of the patterns on this page).
Now, you can't just use a pattern like /-?/, because that would split the string at every character. To overcome this, you need to tell the regex engine the exact expected location of the optional -. You do this by placing [a-z]{2,} before the -? (the single intended delimiter).
The pattern /[a-z]{2,}-?/i does a fair job of finding the correct location for the optional hyphen, but now the trouble is, the leading letters in the string are included as part of the delimiting substring.
Sometimes, "lookarounds" can be used in regex patterns to match but not consume substrings. A "positive lookbehind" is used to match a preceding substring; however, unbounded "variable length lookbehinds" are not permitted in PHP (or in most other regex flavors). This is what the invalid pattern would look like: /(?<=[a-z]{2,})-?/i.
The way around this technicality is to "restart the fullstring match" using the \K token (aka a lookbehind alternative) just before the optional hyphen. To correctly target only the intended delimiter, the leading letters must be "matched/consumed" then "discarded" -- that's what \K does.
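To see the difference in practice, here is a short sketch: the lookbehind version fails to compile, so preg_split() returns false (the compilation warning is silenced with @ for this demonstration only), while the \K version splits as intended. The exact warning text depends on the PCRE version bundled with PHP.

```php
<?php
// Unbounded lookbehind: rejected by PCRE at compile time, so preg_split() returns false.
$lookbehind = @preg_split('/(?<=[a-z]{2,})-?/i', 'AK-747');
var_dump($lookbehind);

// \K: the letters are matched, then dropped from the match, so only "-?" acts as the delimiter.
$withK = preg_split('/[a-z]{2,}\K-?/i', 'AK-747');
var_export($withK);
```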
As for the inclusion of the 3rd and 4th parameter of preg_split()...
I've set the 3rd parameter to 2. This is just like the limit parameter that explode() has: it instructs the function to return no more than 2 output elements. For this case, I could have used -1 or 0 to mean "unlimited", but I could NOT leave the parameter out; it must be supplied so that the 4th parameter can be passed.
I've set the 4th parameter to PREG_SPLIT_NO_EMPTY which instructs the function to not generate empty output elements.
Ta-Da!
P.S. A preg_match_all() solution is as easy as using a pipe and two anchors:
$inputs=['AK747','AK-747','AK-','AK']; // variations as I understand them
foreach($inputs as $input){
echo "$input returns: ";
var_export(preg_match_all('/^[a-z]{2,}|\d{2,}$/i',$input,$out)?$out[0]:[]);
echo "\n";
}
// same outputs as above
You can make the - optional with ?.
/([A-Za-z]{2,}-?[0-9]{2,})/
https://regex101.com/r/tIgM4F/1

Get next word after preg match with PHP

How can I get the next word after a match with preg_match in PHP?
For example, If I have a string like this:
"This is a string, keyword next, some more text. keyword next-word."
I want to use preg_match to get the next word after “keyword”, including when the word is hyphenated.
So in the case above, I want to return “next” and “next-word”.
I’ve tried:
$string = "This is a string, keyword next, some more text. keyword next-word.";
$keywords = preg_split("/(?<=\keyword\s)(\w+)/", $string);
print_r($keywords);
Which just returns everything and doesn’t seem to work at all.
Any help is much appreciated.
Using your example this should work using preg_match_all:
// Set the test string.
$string = "This is a string, keyword next, some more text. keyword next-word. keyword another_word. Okay, keyword do-rae-mi-fa_so_la.";
// Set the regex.
$regex = '/(?<=\bkeyword\s)(?:[\w-]+)/is';
// Run the regex with preg_match_all.
preg_match_all($regex, $string, $matches);
// Dump the results for testing.
echo '<pre>';
print_r($matches);
echo '</pre>';
And the results I get are:
Array
(
[0] => Array
(
[0] => next
[1] => next-word
[2] => another_word
[3] => do-rae-mi-fa_so_la
)
)
A positive lookbehind is what you are looking for:
(?<=\bkeyword\s)([a-zA-Z-]+)
It should work with preg_match(). To catch all matches, use preg_match_all() instead; PHP has no g modifier.
Reference Question: How to match the first word after an expression with regex?
While regex is powerful, most of us also find it hard to debug and memorize.
This particular case, getting the next word after a match, is a very common string operation.
It can be done simply by exploding the string into an array and searching for the index. This is useful because we can specify how many words forward or backward to go.
This matches the word after the first occurrence:
<?php
$string = explode(" ","This is a string, keyword next, some more text. keyword next-word.");
echo $string[array_search("keyword",$string) + 1];
/* OUTPUT: next, */
By reversing the array, we can catch the word after the last occurrence (index - 1 in the reversed array):
<?php
$string = array_reverse(explode(" ","This is a string, keyword next, some more text. keyword next-word."));
echo $string[array_search("keyword",$string) - 1];
/* OUTPUT next-word. */
This performs well if we are making multiple searches, but of course the string must be kept short, since the whole array is held in memory.
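array_search() only returns the first matching index. A sketch that collects the word after every occurrence, assuming words are separated by single spaces and that trailing punctuation should be dropped:

```php
<?php
$string = "This is a string, keyword next, some more text. keyword next-word.";
$words = explode(' ', $string);

$next = [];
// array_keys() with a search value returns every index holding that value (strict comparison here).
foreach (array_keys($words, 'keyword', true) as $i) {
    if (isset($words[$i + 1])) {
        // Strip trailing punctuation left over from the space-based split.
        $next[] = rtrim($words[$i + 1], '.,;:!?');
    }
}
print_r($next);
```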

find a specific word in string php

I have a text in PHP stored in the variable $row. I'd like to find the position of a certain group of words, and that's quite easy. What's not so easy is making my code recognize whether the word it has found is exactly the word I'm looking for or part of a larger word. Is there a way to do it?
Example of what I'd like to obtain
CODE:
$row= "some ugly text of some kind i'd like to find in someway";
$token= "some";
$pos= -1;
$counter= substr_count($row, $token);
for ($h=0; $h<$counter; $h++) {
$pos= strpos($row, $token, $pos+1);
echo $pos.' ';
}
OUTPUT:
what I obtain:
0 18 48
what I'd like to obtain
0 18
Any hint?
Use preg_match_all() with word boundaries (\b):
$search = preg_quote($token, '/');
preg_match_all("/\b$search\b/", $row, $m, PREG_OFFSET_CAPTURE);
Here, the preg_quote() statement is used to correctly escape the user input so it can be used in our regular expression. Some characters have special meaning in the regular expression language; without proper escaping, those characters would keep their "special meaning" instead of being matched literally, and your regex might not work as intended.
In the preg_match_all() statement, we are supplying the following regex:
/\b$search\b/
Explanation:
/ - starting delimiter
\b - word boundary. A word boundary, in most regex dialects, is a position between a word character (\w) and a non-word character (\W).
$search - escaped search term
\b - word boundary
/ - ending delimiter
In simple English, it means: find all the occurrences of the given word some.
Note that we're also using PREG_OFFSET_CAPTURE flag here. If this flag is passed, for every occurring match the appendant string offset will also be returned. See the documentation for more information.
To obtain the results you want, you can simply loop through the $m array and extract the offsets:
$result = implode(' ', array_map(function($arr) {
return $arr[1];
}, $m[0]));
echo $result;
Output:
0 18
What you're looking for is a combination of Regex with a word boundaries pattern and the flag to return the offset (PREG_OFFSET_CAPTURE).
PREG_OFFSET_CAPTURE
If this flag is passed, for every occurring match the appendant
string offset will also be returned. Note that this changes the
value of matches into an array where every element is an array
consisting of the matched string at offset 0 and its string offset
into subject at offset 1.
$row= "some ugly text of some kind i'd like to find in someway";
$pattern= "/\bsome\b/i";
preg_match_all($pattern, $row, $matches, PREG_OFFSET_CAPTURE);
And we get something like this:
Array
(
[0] => Array
(
[0] => Array
(
[0] => some
[1] => 0
)
[1] => Array
(
[0] => some
[1] => 18
)
)
)
And just loop through the matches and extract the offset where the needle was found in the haystack.
// store the positions of the match
$offsets = array();
foreach($matches[0] as $match) {
$offsets[] = $match[1];
}
// display the offsets
echo implode(' ', $offsets);
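One caveat with \b: it only behaves as expected when the token starts and ends with word characters. For a token like C++, the \b after the final + requires a word character on one side, so a whole-word match followed by a space fails. A sketch using negative lookarounds instead, wrapped in a hypothetical helper wholeWordOffsets(), returns the byte offsets of standalone matches:

```php
<?php
// Hypothetical helper: byte offsets of $token where it is not part of a larger word.
function wholeWordOffsets(string $haystack, string $token): array
{
    // (?<!\w) and (?!\w): no word character immediately before or after the token.
    $pattern = '/(?<!\w)' . preg_quote($token, '/') . '(?!\w)/';
    preg_match_all($pattern, $haystack, $m, PREG_OFFSET_CAPTURE);
    // Each entry of $m[0] is [matched string, offset]; keep only the offsets.
    return array_column($m[0], 1);
}

$row = "some ugly text of some kind i'd like to find in someway";
echo implode(' ', wholeWordOffsets($row, 'some')), "\n";
echo implode(' ', wholeWordOffsets('I like C++ and C#', 'C++')), "\n";
```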
Use preg_match():
if(preg_match("/some/", $row))
// [..]
The first argument is a regex, which can match virtually anything you want to match. But, there are dire warnings about using it to match things like HTML.
