php regex preg_match allow only certain keywords - php

I'm trying use preg_match in an IF statement and return false if a string contains some templated functions not allowed.
Here are some example templated functions allowed:
{function="nl2br($value.field_30)"}
{function="substr($value.field_30,0,250)"}
{function="addslashes($listing.photo.image_title)"}
{function="urlencode($listing.link)"}
{function="AdZone(1)"}
These are mixed in with html etc.
Now I'd like this preg_match statement to return true if regex matches the code format but didn't contain one of the allowed function keywords:
if (preg_match('(({function=)(.+?)(nl2br|substr|addslashes|urlencode|AdZone)(.+?)\})',$string)) {
// found a function not allowed
} else {
// string contains only allowed functions or doesn't contain functions at all
}
Does anyone know how to do this?

Not quite sure what you're trying here, but if I were to make a regexp that matched a list of words (or function names as the case may be), I'd do somthing like
// add/remove allowed stuff here
$allowed = array( 'nl2br', 'substr', 'addslashes' );
// make the array into a branching pattern
$allowed_pattern = implode('|', $allowed);
// the entire regexp (a little stricter than yours)
$pattern = "/\{function=\"($allowed_pattern)\((.*?)\)\"\}/";
if( preg_match($pattern, $string, $matches) ) {
# string DOES contain an allowed function
# The $matches things is optional, but nice. $matches[1] will be the function name, and
# $matches[2] will be the arguments string. Of course, you could just do a
# preg_replace_callback() on everything instead using the same pattern...
} else {
# No allowed functions found
}
The $allowed array makes it easier to add/remove allowed function names, and the regexp is stricter about the curly brackets, quotes and general syntax, which is probably a good idea.
But first of all, flip the if..else branches, or use a !. preg_match is meant for, well, matching stuff in the string, not for matching stuff that isn't in there. So you can't really get it to return true for something that isn't there
Still, as Álvaro mentioned, regexps probably aren't the best way to go about this, and it is pretty risky to have functions exposed like that, no matter the rest of the code. If you just needed to match words it should work fine, but since it's function calls with arbitrary arguments... well. I can't really recommend it :)
Edit: First time around, I used preg_quote on the imploded string, but that of course just escapes the pipe characters, and then the pattern won't work. So skip preg_quote, but then just be sure that function names don't contain anything that might mess up the final pattern (e.g. run each function name through preg_quote before imploding the array)

Related

How can I camelcase a string in php

Is there a simple way I can have php camelcase a string for me? I am using the Laravel framework and I want to use some shorthand in a search feature.
It would look something like the following...
private function search(Array $for, Model $in){
$results = [];
foreach($for as $column => $value){
$results[] = $in->{$this->camelCase($column)}($value)->get();
}
return $results;
}
Called like
$this->search(['where-created_at' => '2015-25-12'], new Ticket);
So the resulting call in the search function I would be using is
$in->whereCreateAt('2015-25-12')->get();
The only thing is I can't figure out is the camel casing...
Have you considered using Laravel's built-in camel case functtion?
$camel = camel_case('foo_bar');
Full details can be found here:
https://laravel.com/docs/4.2/helpers#strings
So one possible solution that could be used is the following.
private function camelCase($string, $dontStrip = []){
/*
* This will take any dash or underscore turn it into a space, run ucwords against
* it so it capitalizes the first letter in all words separated by a space then it
* turns and deletes all spaces.
*/
return lcfirst(str_replace(' ', '', ucwords(preg_replace('/[^a-z0-9'.implode('',$dontStrip).']+/', ' ',$string))));
}
It's a single line of code wrapped by a function with a lot going on...
The breakdown
What is the dontStrip variable?
Simply put it is an array that should contain anything you don't want removed from the camelCasing.
What are you doing with that variable?
We are taking every element in the array and putting them into a single string.
Think of it as something like this:
function implode($glue, $array) {
// This is a native PHP function, I just wanted to demonstrate how it might work.
$string = '';
foreach($array as $element){
$string .= $glue . $element;
}
return $string;
}
This way you're essentially gluing all your elements in your array together.
What's preg_replace, and what's it doing?
preg_replace is a function that uses a regular expression (also known as regex) to search for and then replace any values that it finds, which match the desired regex...
Explanation of the regex search
The regex used in the search above implodes your array $dontStrip onto a little bit a-z0-9 which just means any letter A to Z as well as numbers 0 to 9. The little ^ bit tells regex that it's looking for anything that isn't whatever comes after it. So in this case it's looking for any and all things that aren't in your array or a letter or number.
If you're new to regex and you want to mess around with it, regex101 is a great place to do it.
ucwords?
This can be most easily though of as upper case words. It will take any word (a word being any bit of characters separated by a space) and it will capitalize the first letter.
echo ucwords('hello, world!');
Will print `Hello, World!'
Okay I understand what preg_replace is, what str_replace?
str_replace is the smaller, less powerful but still very useful little brother/sister to preg_replace. By this I mean that it has a similar use. str_replace doesn't regex, but does use a literal string so whatever you type into the first parameter is exactly what it will look for.
Side note, it is worth mentioning for anyone considering only using preg_replace where str_replace would work just as well. str_replace has been noted to be benchmarked a bit faster than preg_replace on larger apps.
lcfirst What?
Since about PHP 5.3 we have been able to use the lcfirst function, which much like ucwords, it's just a text manipulation function. `lcfirst turns the first letter into it's lower case form.
echo lcfirst('HELLO, WORLD!');
Will print 'hELLO, WORLD!'
Results
All this in mind the camelCase function uses distinct non-alphanumeric characters as break points to turn a string to a camelCase string.
There's a general purpose open source library that contains a method that performs case convertions for several popular case formats. library is called TurboCommons, and the formatCase() method inside the StringUtils does camel case conversion.
https://github.com/edertone/TurboCommons
To use it, import the phar file to your project and:
use org\turbocommons\src\main\php\utils\StringUtils;
echo StringUtils::formatCase('sNake_Case', StringUtils::FORMAT_CAMEL_CASE);
// will output 'sNakeCase'
You can use Laravel's built-in camel case helper function
use Illuminate\Support\Str;
$converted = Str::camel('foo_bar');
// fooBar
Full details can be found here:
https://laravel.com/docs/9.x/helpers#method-camel-case
Use the in-built Laravel Helper Function - camel_case()
$camelCase = camel_case('your_text_here');

alternative to if(preg_match() and preg_match())

I want to know if we can replace if(preg_match('/boo/', $anything) and preg_match('/poo/', $anything))
with a regex..
$anything = 'I contain both boo and poo!!';
for example..
From what I understand of your question, you're looking for a way to check if BOTH 'poo' and 'boo' exist within a string using only one regex. I can't think of a more elegant way than this;
preg_match('/(boo.*poo)|(poo.*boo)/', $anything);
This is the only way I can think of to ensure both patterns exists within a string disregarding order. Of course, if you knew they were always supposed to be in the same order, that would make it more simple =]
EDIT
After reading through the post linked to by MisterJ in his answer, it would seem that a more simple regex could be;
preg_match('/(?=.*boo)(?=.*poo)/', $anything);
By using a pipe:
if(preg_match('/boo|poo/', $anything))
You can use the logical or as mentioned by #sroes:
if(preg_match('/(boo)|(poo)/,$anything)) the problem there is that you don't know which one matched.
In this one, you will match "I contain boo","I contain poo" and "I contain boo and poo".
If you want to only match "I contain boo and poo", the problem is really harder to figure out Regular Expressions: Is there an AND operator?
and it seems that you will have to stick with the php test.
To take conditions literally
if(preg_match('/[bp]oo.*[bp]oo/', $anything))
You can achieve this by altering your regular expression, as others have pointed out in other answers. However, if you want to use an array instead, so you do not have to list a long regex pattern, then use something like this:
// Default matches to false
$matches = false;
// Set the pattern array
$pattern_array = array('boo','poo');
// Loop through the patterns to match
foreach($pattern_array as $pattern){
// Test if the string is matched
if(preg_match('/'.$pattern.'/', $anything)){
// Set matches to true
$matches = true;
}
}
// Proceed if matches is true
if($matches){
// Do your stuff here
}
Alternatively, if you are only trying to match strings then it would be much more efficient if you were to use strpos like so:
// Default matches to false
$matches = false;
// Set the strings to match
$strings_to_match = array('boo','poo');
foreach($strings_to_match as $string){
if(strpos($anything, $string) !== false)){
// Set matches to true
$matches = true;
}
}
Try to avoid regular expressions where possible as they are a lot less efficient!

Match array values against text [duplicate]

I have an array full of patterns that I need matched. Any way to do that, other than a for() loop? Im trying to do it in the least CPU intensive way, since I will be doing dozens of these every minute.
Real world example is, Im building a link status checker, which will check links to various online video sites, to ensure that the videos are still live. Each domain has several "dead keywords", if these are found in the html of a page, that means the file was deleted. These are stored in the array. I need to match the contents pf the array, against the html output of the page.
First of all, if you literally are only doing dozens every minute, then I wouldn't worry terribly about the performance in this case. These matches are pretty quick, and I don't think you're going to have a performance problem by iterating through your patterns array and calling preg_match separately like this:
$matches = false;
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
$matches = true;
}
}
You can indeed combine all the patterns into one using the or operator like some people are suggesting, but don't just slap them together with a |. This will break badly if any of your patterns contain the or operator.
I would recommend at least grouping your patterns using parenthesis like:
foreach ($patterns as $pattern)
{
$grouped_patterns[] = "(" . $pattern . ")";
}
$master_pattern = implode($grouped_patterns, "|");
But... I'm not really sure if this ends up being faster. Something has to loop through them, whether it's the preg_match or PHP. If I had to guess I'd guess that individual matches would be close to as fast and easier to read and maintain.
Lastly, if performance is what you're looking for here, I think the most important thing to do is pull out the non regex matches into a simple "string contains" check. I would imagine that some of your checks must be simple string checks like looking to see if "This Site is Closed" is on the page.
So doing this:
foreach ($strings_to_match as $string_to_match)
{
if (strpos($page, $string_to_match) !== false))
{
// etc.
break;
}
}
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
// etc.
break;
}
}
and avoiding as many preg_match() as possible is probably going to be your best gain. strpos() is a lot faster than preg_match().
// assuming you have something like this
$patterns = array('a','b','\w');
// converts the array into a regex friendly or list
$patterns_flattened = implode('|', $patterns);
if ( preg_match('/'. $patterns_flattened .'/', $string, $matches) )
{
}
// PS: that's off the top of my head, I didn't check it in a code editor
If your patterns don't contain many whitespaces, another option would be to eschew the arrays and use the /x modifier. Now your list of regular expressions would look like this:
$regex = "/
pattern1| # search for occurences of 'pattern1'
pa..ern2| # wildcard search for occurences of 'pa..ern2'
pat[ ]tern| # search for 'pat tern', whitespace is escaped
mypat # Note that the last pattern does NOT have a pipe char
/x";
With the /x modifier, whitespace is completely ignored, except when in a character class or preceded by a backslash. Comments like above are also allowed.
This would avoid the looping through the array.
If you're merely searching for the presence of a string in another string, use strpos as it is faster.
Otherwise, you could just iterate over the array of patterns, calling preg_match each time.
If you have a bunch of patterns, what you can do is concatenate them in a single regular expression and match that. No need for a loop.
What about doing a str_replace() on the HTML you get using your array and then checking if the original HTML is equal to the original? This would be very fast:
$sites = array(
'you_tube' => array('dead', 'moved'),
...
);
foreach ($sites as $site => $deadArray) {
// get $html
if ($html == str_replace($deadArray, '', $html)) {
// video is live
}
}
You can combine all the patterns from the list to single regular expression using implode() php function. Then test your string at once using preg_match() php function.
$patterns = array(
'abc',
'\d+h',
'[abc]{6,8}\-\s*[xyz]{6,8}',
);
$master_pattern = '/(' . implode($patterns, ')|(') . ')/'
if(preg_match($master_pattern, $string_to_check))
{
//do something
}
Of course there could be even less code using implode() inline in "if()" condition instead of $master_pattern variable.

Replacing Tags with Includes in PHP with RegExps

I need to read a string, detect a {VAR}, and then do a file_get_contents('VAR.php') in place of {VAR}. The "VAR" can be named anything, like TEST, or CONTACT-FORM, etc. I don't want to know what VAR is -- not to do a hard-coded condition, but to just see an uppercase alphanumeric tag surrounded by curly braces and just do a file_get_contents() to load it.
I know I need to use preg_match and preg_replace, but I'm stumbling through the RegExps on this.
How is this useful? It's useful in hooking WordPress.
Orion above has a right solution, but it's not really necessary to use a callback function in your simple case.
Assuming that the filenames are A-Z + hyphens you can do it in 1 line using PHP's /e flag in the regex:
$str = preg_replace('/{([-A-Z]+)}/e', 'file_get_contents(\'$1.html\')', $str);
This'll replace any instance of {VAR} with the contents of VAR.html. You could prefix a path into the second term if you need to specify a particular directory.
There are the same vague security worries as outlined above, but I can't think of anything specific.
You'll need to do a number of things. I'm assuming you can do the legwork to get the page data you want to preprocess into a string.
First, you'll need the regular expression to match correctly. That should be fairly easy with something like /{\w+}/.
Next you'll need to use all of the flags to preg_match to get the offset location in the page data. This offset will let you divide the string into the before, matching, and after parts of the match.
Once you have the 3 parts, you'll need to run your include, and stick them back together.
Lather, rinse, repeat.
Stop when you find no more variables.
This isn't terribly efficient, and there are probably better ways. You may wish to consider doing a preg_split instead, splitting on /[{}]/. No matter how you slice it you're assuming that you can trust your incoming data, and this will simplify the whole process a lot. To do this, I'd lay out the code like so:
Take your content and split it like so: $parts = preg_split('/[{}]/', $page_string);
Write a recursive function over the parts with the following criteria:
Halt when length of arg is < 3
Else, return a new array composed of
$arg[0] . load_data($arg[1]) . $arg[2]
plus whatever is left in $argv[3...]
Run your function over $parts.
You can do it without regexes (god forbid), something like:
//return true if $str ends with $sub
function endsWith($str,$sub) {
return ( substr( $str, strlen( $str ) - strlen( $sub ) ) === $sub );
}
$theStringWithVars = "blah.php cool.php awesome.php";
$sub = '.php';
$splitStr = split(" ", $theStringWithVars);
for($i=0;$i<count($splitStr);$i++) {
if(endsWith(trim($splitStr[$i]),$sub)) {
//file_get_contents($splitStr[$i]) etc...
}
}
Off the top of my head, you want this:
// load the "template" file
$input = file_get_contents($template_file_name);
// define a callback. Each time the regex matches something, it will call this function.
// whatever this function returns will be inserted as the replacement
function replaceCallback($matches){
// match zero will be the entire match - eg {FOO}.
// match 1 will be just the bits inside the curly braces because of the grouping parens in the regex - eg FOO
// convert it to lowercase and append ".html", so you're loading foo.html
// then return the contents of that file.
// BEWARE. GIANT MASSIVE SECURITY HOLES ABOUND. DO NOT DO THIS
return file_get_contents( strtolower($matches[1]) . ".html" );
};
// run the actual replace method giving it our pattern, the callback, and the input file contents
$output = preg_replace_callback("\{([-A-Z]+)\}", replaceCallback, $input);
// todo: print the output
Now I'll explain the regex
\{([-A-Z]+)\}
The \{ and \} just tell it to match the curly braces. You need the slashes, as { and } are special characters, so they need escaping.
The ( and ) create a grouping. Basically this lets you extract particular parts of the match. I use it in the function above to just match the things inside the braces, without matching the braces themselves. If I didn't do this, then I'd need to strip the { and } out of the match, which would be annoying
The [-A-Z] says "match any uppercase character, or a -
The + after the [-A-Z] means we need to have at least 1 character, but we can have up to any number.
Comparatively speaking, regular expression are expensive. While you may need them to figure out which files to load, you certainly don't need them for doing the replace, and probably shouldn't use regular expressions. After all, you know exactly what you are replacing so why do you need fuzzy search?
Use an associative array and str_replace to do your replacements. str_replace supports arrays for doing multiple substitutions at once. One line substitution, no loops.
For example:
$substitutions = array('{VAR}'=>file_get_contents('VAR.php'),
'{TEST}'=>file_get_contents('TEST.php'),
...
);
$outputContents = str_replace( array_keys($substitutions), $substitutions, $outputContents);

How do you perform a preg_match where the pattern is an array, in php?

I have an array full of patterns that I need matched. Any way to do that, other than a for() loop? Im trying to do it in the least CPU intensive way, since I will be doing dozens of these every minute.
Real world example is, Im building a link status checker, which will check links to various online video sites, to ensure that the videos are still live. Each domain has several "dead keywords", if these are found in the html of a page, that means the file was deleted. These are stored in the array. I need to match the contents pf the array, against the html output of the page.
First of all, if you literally are only doing dozens every minute, then I wouldn't worry terribly about the performance in this case. These matches are pretty quick, and I don't think you're going to have a performance problem by iterating through your patterns array and calling preg_match separately like this:
$matches = false;
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
$matches = true;
}
}
You can indeed combine all the patterns into one using the or operator like some people are suggesting, but don't just slap them together with a |. This will break badly if any of your patterns contain the or operator.
I would recommend at least grouping your patterns using parenthesis like:
foreach ($patterns as $pattern)
{
$grouped_patterns[] = "(" . $pattern . ")";
}
$master_pattern = implode($grouped_patterns, "|");
But... I'm not really sure if this ends up being faster. Something has to loop through them, whether it's the preg_match or PHP. If I had to guess I'd guess that individual matches would be close to as fast and easier to read and maintain.
Lastly, if performance is what you're looking for here, I think the most important thing to do is pull out the non regex matches into a simple "string contains" check. I would imagine that some of your checks must be simple string checks like looking to see if "This Site is Closed" is on the page.
So doing this:
foreach ($strings_to_match as $string_to_match)
{
if (strpos($page, $string_to_match) !== false))
{
// etc.
break;
}
}
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
// etc.
break;
}
}
and avoiding as many preg_match() as possible is probably going to be your best gain. strpos() is a lot faster than preg_match().
// assuming you have something like this
$patterns = array('a','b','\w');
// converts the array into a regex friendly or list
$patterns_flattened = implode('|', $patterns);
if ( preg_match('/'. $patterns_flattened .'/', $string, $matches) )
{
}
// PS: that's off the top of my head, I didn't check it in a code editor
If your patterns don't contain many whitespaces, another option would be to eschew the arrays and use the /x modifier. Now your list of regular expressions would look like this:
$regex = "/
pattern1| # search for occurences of 'pattern1'
pa..ern2| # wildcard search for occurences of 'pa..ern2'
pat[ ]tern| # search for 'pat tern', whitespace is escaped
mypat # Note that the last pattern does NOT have a pipe char
/x";
With the /x modifier, whitespace is completely ignored, except when in a character class or preceded by a backslash. Comments like above are also allowed.
This would avoid the looping through the array.
If you're merely searching for the presence of a string in another string, use strpos as it is faster.
Otherwise, you could just iterate over the array of patterns, calling preg_match each time.
If you have a bunch of patterns, what you can do is concatenate them in a single regular expression and match that. No need for a loop.
What about doing a str_replace() on the HTML you get using your array and then checking if the original HTML is equal to the original? This would be very fast:
$sites = array(
'you_tube' => array('dead', 'moved'),
...
);
foreach ($sites as $site => $deadArray) {
// get $html
if ($html == str_replace($deadArray, '', $html)) {
// video is live
}
}
You can combine all the patterns from the list to single regular expression using implode() php function. Then test your string at once using preg_match() php function.
$patterns = array(
'abc',
'\d+h',
'[abc]{6,8}\-\s*[xyz]{6,8}',
);
$master_pattern = '/(' . implode($patterns, ')|(') . ')/'
if(preg_match($master_pattern, $string_to_check))
{
//do something
}
Of course there could be even less code using implode() inline in "if()" condition instead of $master_pattern variable.

Categories