Regex to disallow two characters in a row - php

I'm trying to modify this regex pattern so that it disallows two specified characters in a row or at the start/end -
/^[^\!\"\£\$\%\^\&\*\(\)\[\]\{\}\#\~\#\/\>\<\\\*]+$/
So at the moment it prevents these characters anywhere in the string, but I also want to stop the following from happening with these characters:
any spaces, apostrophes ', underscores _, hyphens - or dots . appearing at the start or end of the string
also prevent any two of these characters in a row, i.e. '' or _._ or ' -__- ' .
Any help would be hugely appreciated.
Thanks a lot

One way
/^(?=[^!"£$%^&*()[\]{}#~#\/><\\*]+$)(?!.*[ '_.-]{2})[^ '_.-].*[^ '_.-]$/
Note, only tested as javascript regex, i.e.
var rex = /^(?=[^!"£$%^&*()[\]{}#~#\/><\\*]+$)(?!.*[ '_.-]{2})[^ '_.-].*[^ '_.-]$/;
rex.test('okay'); // true
rex.test('_not okay'); // false
Or, to match on disallowed patterns
/^[ '_.-]|[ '_.-]$|[!"£$%^&*()[\]{}#~#\/><\\*]|[ '_.-]{2}/
The first regex will only match strings that contain no disallowed patterns.
The one above will match any disallowed patterns in a string.
Update
Now tested briefly using PHP. In a double-quoted PHP string, the " in the character set needs to be escaped, and the backslash has to be doubled again so that a literal backslash still reaches the regex engine.
<?php
$test = 'some string';
$regex = "/^[ '_.-]|[ '_.-]$|[!\"£$%^&*()[\]{}#~#\/><\\\\*]|[ '_.-]{2}/";
if ( preg_match( $regex, $test ) ) {
    echo 'Disallowed!';
}

$tests[1] = "fail_.fail"; // doubles
$tests[] = "fail_-fail";
$tests[] = "fail_ fail";
$tests[] = "fail fail";
$tests[] = "fail -fail";
$tests[] = "pas.s_1";
$tests[] = "pa.s-s_2"; // singles
$tests[] = "pas.s_3";
$tests[] = "p.a.s.s_4";
$tests[10] = "pa s-s_5";
$tests[] = "fail fail'"; // pre or post-pended
$tests[] = " fail fail";
$tests[] = " fail fail";
$tests[] = "fail fail_";
$tests[15] = "fail fail-";
// The list of disallowed characters. There is no need to escape them here;
// preg_quote does that.
$exclude = array(" ", "'", "_", ".", "-");
$pattern = "#[" . preg_quote(join("", $exclude)) . "]{2,}#s";
// run through the simple test cases
foreach ($tests as $k => $test) {
    // fail early if the first or last character is disallowed
    if (
        in_array(substr($test, 0, 1), $exclude)
        || in_array(substr($test, -1), $exclude)
    ) {
        echo "$k - that's a fail" . PHP_EOL;
        continue;
    }
    // otherwise look for two or more disallowed characters in a row
    $result = preg_match($pattern, $test);
    if ($result === 1) {
        echo "$k - that's a fail" . PHP_EOL;
    } else {
        echo "$k - that's a pass" . PHP_EOL;
    }
}
Borrowing shamelessly from the other replies, I'd advocate using PHP's simple in_array to check the start and end of the string first, and failing early on discovering something bad.
If the test gets past that, then run a really simple regex.
Stick the lot into a function and return false on failure; that would remove quite a few of the verbose lines I added. You could even pass in the exclusion array as a parameter, but this seems a rather specific function, so that may be YAGNI.
eg
if( badString($exclude_array, $input) ) // do stuff
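A minimal sketch of what such a function could look like. The name badString comes from the call above; the signature, the choice to return true for a bad string (so the call above reads naturally), and the rejection of empty input are assumptions. It only covers the start/end and two-in-a-row rules, not the original list of forbidden characters.

```php
<?php
// Hypothetical helper: returns true when $input breaks either rule.
function badString(array $exclude, string $input): bool
{
    if ($input === '') {
        return true; // assumption: an empty string is rejected
    }

    // Fail early if the first or last character is in the exclusion list.
    if (in_array($input[0], $exclude, true) || in_array(substr($input, -1), $exclude, true)) {
        return true;
    }

    // Otherwise look for two or more excluded characters in a row.
    $pattern = '#[' . preg_quote(implode('', $exclude), '#') . ']{2,}#';

    return preg_match($pattern, $input) === 1;
}

$exclude = [' ', "'", '_', '.', '-'];
var_dump(badString($exclude, 'pa.s-s_2'));   // bool(false)
var_dump(badString($exclude, 'fail_.fail')); // bool(true)
var_dump(badString($exclude, 'fail fail_')); // bool(true)
```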

I'm not sure I understand the exact problem, but here's a suggestion:
<?php
$test = "__-Remove '' _._ or -__- but not foo bar '. _ \n";
$expected = 'Remove or but not foo bar';
// The list of disallowed characters. There is no need to escape.
// This will be done with the function preg_quote.
$excluded_of_bounds = "'_.-";
// Remove disallowed characters from start/end of the string.
// We add the space characters that should not be in the regexp.
$test = trim($test, $excluded_of_bounds . " \r\n");
// In two passes
$patterns = array(
    // 1/ Remove every run of successive disallowed characters,
    //    except for spaces.
    '#[' . preg_quote($excluded_of_bounds) . ']{2,}#',
    // 2/ Replace successive whitespace with a single space.
    '#\s{2,}#',
);
$remplacements = array('', ' ');
// Go!
$test = preg_replace($patterns, $remplacements, $test);
// bool(true)
var_dump($expected === $test);

Related

How to find exact words from an array in a string in PHP

I am searching a string for a group of words in an array to inform users if any are found. However, I get results that are not exact matches. Any ideas on how I can make it show only exact matches? My code looks like this:
<?php
// Profanity check
$profaneReport = "";
$allContent = "Rice Beans Class stite";
$profanity_list = "lass tite able";
$profaneWords = explode( ' ', $profanity_list );
$wordsFoundInProfaneList = []; // Create words an array
//search for the words;
foreach ( $profaneWords as $profane ) {
if ( stripos( $allContent, $profane ) !== false ) {
$wordsFoundInProfaneList[ $profane ] = true;
}
}
// check if bad words were found
if ( $wordsFoundInProfaneList !== 0 ) {
$profaneReportDesc = "Sorry, your content may contain such words as " . "<strong>" . implode( ", ", array_keys( $wordsFoundInProfaneList )) . '</strong>"';
} else {
$profaneReportDesc = "Good: No profanity was found in your content";
}
echo $profaneReportDesc;
?>
The code above returns "Sorry, your content may contain such words as lass, tite" even though neither is an exact match for a word in $allContent.
For the benefit of other users looking for an answer to a similar question, and building on Alex Howansky's comment, you can prepare the input string so that it is easier to convert into an array of words:
Remove all punctuation and anything else that could interfere with breaking the string into individual words by replacing every run of non-alphanumeric characters with a space (e.g. one,two.three becomes three separate words).
Convert the input string and the profanity string to lower case for easier comparison.
Explode both strings into arrays (this is why replacing the punctuation with spaces matters).
Intersect the arrays to find the words common to both.
You might also want to remove numerals from your input string, depending on how you want to handle numbers.
The complete code with detailed comments is as follows:
// Profanity check
$profaneReport = "";
$profanity_list = "hello TEN test commas";
$allContent = "Hello, world! This is a senTENce for testing. It has more than TEN words and contains some punctuation,like commas.";
/* Create an array of all words in lowercase (for easier comparison) */
$profaneWords = explode( ' ', strtolower($profanity_list) );
/* Replace everything but a-z and 0-9 (i.e. all punctuation etc.) in the sentence
with spaces, so we can break the sentence into words */
$alpha = preg_replace("/[^a-z0-9]+/", " ", strtolower($allContent));
/* Create an array of the words in the sentence */
$alphawords = explode( ' ', $alpha );
/* get all words that are in both arrays */
$wordsFoundInProfaneList = array_intersect ( $alphawords, $profaneWords);
// check if bad words were found, and display a message
if ( !empty($wordsFoundInProfaneList)) {
    $profaneReportDesc = "Sorry, your content may contain such words as " . "<strong>" . implode( ", ", $wordsFoundInProfaneList) . '</strong>';
} else {
    $profaneReportDesc = "Good: No profanity was found in your content";
}
echo $profaneReportDesc;
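As an alternative to splitting and intersecting, whole-word matching can also be sketched with a regex word boundary (\b). This is a different technique from the array_intersect approach above, not a replacement for it; the function name findWholeWords is hypothetical.

```php
<?php
// Hypothetical helper: returns the listed words that occur as whole words in $content.
function findWholeWords(string $content, array $words): array
{
    $found = [];
    foreach ($words as $word) {
        if ($word === '') {
            continue; // skip empty entries left over from explode()
        }
        // \b anchors the match at word boundaries; preg_quote makes the word literal.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $content) === 1) {
            $found[] = $word;
        }
    }

    return $found;
}

$profaneWords = explode(' ', 'lass tite able');
print_r(findWholeWords('Rice Beans Class stite', $profaneWords)); // no matches
print_r(findWholeWords('A lass was able', $profaneWords));        // lass, able
```

Because the check runs against the original string, no punctuation stripping or lower-casing is needed beforehand.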

How can I use regex to catch unquoted array indices in PHP code and quote them?

PHP 7.2 upgraded undefined constant errors from a notice to a warning, with advice that in future they will return a full-on error instead.
I am trying to identify a way to fix these via scripting, ideally via a regex that I can run to parse each PHP file on a site, find all offending bits of code, and fix them.
I've found multiple examples of how to fix one variant, but none for another, and it's that one that I'm looking for help with.
Here's an example file:
<?php
$array[foo] = "bar";
// this should become
// $array['foo'] = "bar"
echo "hello, my name is $array[foo] and it's nice to meet you";
// would need to become
// echo "hello, my name is " . $array['foo'] . " and it's nice to meet you";
?>
I've seen a lot of options to identify and change the first type, but none for the second, where the undefined constant is within a string. In that instance the parser would need to:
Replace $array[foo] with $array['foo']
Find the entire variable, end quotes beforehand, put a . either side, and then reopen quotes afterwards
Edit: ideally one regexp would deal with both examples in the sample code in one pass - i.e. add the ticks, and also add the quotes/dots if it identifies it’s within a string.
$array[foo] = "bar";
// this should become
// $array['foo'] = "bar"
Yes, this has always triggered a notice and has always been poor practice.
echo "hello, my name is $array[foo] and it's nice to meet you";
// would need to become
// echo "hello, my name is " . $array['foo'] . " and it's nice to meet you";
No, this style has never triggered a notice and does not now. In fact, it's used as an example in the PHP documentation. PHP is never going to remove the ability to interpolate array variables in strings.
Your first case is easy enough to catch with something like this:
$str = '$array[foo] = "bar";';
echo preg_replace("/(\\$[a-z_][a-z0-9_]*)\\[([a-z][a-z0-9_]*)\\]/", "$1['$2']", $str);
But of course needs to be caught only outside of a string.
As with any complex grammar, regular expressions will never be as reliable as a grammar-specific parser. Since you're parsing PHP code, your most accurate solution will be to use PHP's own token parser.
$php = <<< 'PHP'
<?php
$array[foo] = "bar"; // this line should be the only one altered.
$array['bar'] = "baz";
echo "I'm using \"$array[foo]\" and \"$array[bar]\" in a sentence";
echo 'Now I\'m not using "$array[foo]" and "$array[bar]" in a sentence';
PHP;
$tokens = token_get_all($php);
$in_dq_string = false;
$last_token = null;
$output = "";
foreach ($tokens as $token) {
    // Use the T_STRING constant rather than its numeric value, which differs between PHP versions.
    if ($last_token === "[" && is_array($token) && $token[0] === T_STRING && !$in_dq_string) {
        $output .= "'$token[1]'";
    } elseif (is_array($token)) {
        $output .= $token[1];
    } else {
        if ($token === "\"") {
            $in_dq_string = !$in_dq_string;
        }
        $output .= $token;
    }
    $last_token = $token;
}
echo $output;
Output:
<?php
$array['foo'] = "bar"; // this line should be the only one altered.
$array['bar'] = "baz";
echo "I'm using \"$array[foo]\" and \"$array[bar]\" in a sentence";
echo 'Now I\'m not using "$array[foo]" and "$array[bar]" in a sentence';
This code would need some edge cases accounted for, such as when you are intentionally using a constant as an array index.
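One way that edge case could be handled is sketched below, assuming the caller can supply a list of constant names that are used as indices on purpose. The function name quoteBareIndices and the $knownConstants parameter are hypothetical. The sketch only touches the direct $variable[name] form, and it also leaves heredoc bodies alone.

```php
<?php
// Hypothetical helper: quotes bare array indices unless they are listed as known constants.
function quoteBareIndices(string $source, array $knownConstants = []): string
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $inDoubleQuotes = false;
    $inHeredoc = false;
    $output = '';

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];

        // Single-character tokens arrive as plain strings.
        if (!is_array($token)) {
            if ($token === '"') {
                $inDoubleQuotes = !$inDoubleQuotes;
            }
            $output .= $token;
            continue;
        }

        if ($token[0] === T_START_HEREDOC) {
            $inHeredoc = true;
        } elseif ($token[0] === T_END_HEREDOC) {
            $inHeredoc = false;
        }

        // A bare index is a T_STRING sitting exactly between "[" and "]",
        // directly after a variable, outside any interpolated string.
        $isBareIndex = $token[0] === T_STRING
            && !$inDoubleQuotes
            && !$inHeredoc
            && $i >= 2
            && is_array($tokens[$i - 2]) && $tokens[$i - 2][0] === T_VARIABLE
            && $tokens[$i - 1] === '['
            && $i + 1 < $count && $tokens[$i + 1] === ']'
            && !in_array($token[1], $knownConstants, true);

        $output .= $isBareIndex ? "'" . $token[1] . "'" : $token[1];
    }

    return $output;
}

$source = '<?php $array[foo] = MY_CONST; $array[MY_KEY] = "x"; echo "hi $array[foo]";';
echo quoteBareIndices($source, ['MY_KEY']);
```

Requiring the closing ] right after the name keeps expressions such as $array[foo()] out of scope.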
This isn't perfect, but it should be safe to run multiple times.
$str = 'echo "hello, my name is $array[foo] and it\'s nice to meet you";';
echo preg_replace_callback('/\".*(\$.*\[[^\'].*[^\']\]).*\"/', function($match) {
$search = ['[', ']'];
$replace = ["['", "']"];
$array = '" . ' . str_replace($search, $replace, $match[1]) . ' . "';
return str_replace($match[1], $array, $match[0]);
}, $str);
The regex limits itself to double-quoted strings (\"). Within them we look for $var[val] without the ticks '. Once we've captured it, we run it through a callback that does a two-stage str_replace. The first stage inserts the ticks and wraps the matched $var[val] in closing and reopening double quotes with concatenation dots; the second substitutes the result back into the whole matched string.
It won't handle everything nicely, though. If you have $array[foo] $array[bar], it will wind up as
" . $array['foo'] . "" . $array['bar'] . "
Not pretty, but still valid code.

PHP: Deal with delimiters in dynamic regular expression

Let's say I want to create a PHP tool to dynamically check a string against regular expression pattern. There is one problem with that: delimiters.
I would like to be able to do the following (simplified example):
$pattern = $_POST['pattern'];
$matched = (preg_match($pattern, $_POST['string']) === 1);
I don't want the users to put delimiters in the input, just a pure pattern, like ^a[bc]+d. How to deal with delimiters? I could do this:
$pattern = '/' . $_POST['pattern'] . '/';
Or with any other possible delimiter, but what about escaping? Is placing \ before each character in the pattern, being the same one as the delimiter of my choice, enough? Like this:
$pattern = '/' . str_replace('/', '\\/', $_POST['pattern']) . '/';
What is a neat way to deal with delimiters?
You have to check the input to identify the delimiter if there is any and remove it. This way if the user follows the rules, you don't have to worry, but if they don't the delimiter is removed anyway. The delimiter can be identified by comparing the first and last character.
// with incorrect input.
$input = "/^a[bc]+d/"; // from $_POST['pattern']
$delim = "/";
if ($input[0] === $input[strlen($input) - 1]) {
$delim = $input[0];
}
$sInput = str_replace($delim,"",$input);
echo $sInput; // ^a[bc]+d
With correct input, you don't have to worry.
$input = "^a[bc]+d"; // from $_POST['pattern']
$delim = "/";
if ($input[0] === $input[strlen($input) - 1]) {
$delim = $input[0];
}
$sInput = str_replace($delim,"",$input);
echo $sInput; // ^a[bc]+d
$sInput is your sanitized pattern. preg_match still requires delimiters, so wrap it in $delim again before testing your string.
$matched = (preg_match($delim . $sInput . $delim, $_POST['string']) === 1);
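Another way to sidestep escaping altogether, sketched here under assumptions, is to pick a delimiter that does not occur in the user's pattern, so nothing inside it needs touching. The function names wrapPattern and matchesUserPattern and the list of candidate delimiters are hypothetical.

```php
<?php
// Hypothetical helper: wraps a bare pattern in the first candidate delimiter it does not contain.
function wrapPattern(string $pattern): ?string
{
    foreach (['/', '#', '~', '%', '@', '!'] as $delimiter) {
        if (strpos($pattern, $delimiter) === false) {
            return $delimiter . $pattern . $delimiter;
        }
    }

    return null; // every candidate delimiter occurs in the pattern
}

// Hypothetical helper: true/false for match/no match, null when the pattern is unusable.
function matchesUserPattern(string $pattern, string $subject): ?bool
{
    $regex = wrapPattern($pattern);
    if ($regex === null) {
        return null;
    }

    // preg_match returns false (and raises a warning) when the pattern does not compile.
    $result = @preg_match($regex, $subject);

    return $result === false ? null : $result === 1;
}

var_dump(matchesUserPattern('^a[bc]+d', 'abcd')); // bool(true)
var_dump(matchesUserPattern('a/b', 'xa/bx'));     // bool(true)
var_dump(matchesUserPattern('a[', 'a'));          // NULL
```

Returning null for an invalid pattern lets the caller tell "did not match" apart from "could not be compiled".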

Find exact string inside a string

I have two strings, "Mures" and "Maramures". How can I build a search function so that when someone searches for Mures, it returns only the posts that contain the word "Mures" and not the ones that contain the word "Maramures"? I have tried strstr so far, but it does not work.
You can do this with regex, and surrounding the word with \b word boundary
preg_match("~\bMures\b~",$string)
example:
$string = 'Maramures';
if ( preg_match("~\bMures\b~",$string) )
echo "matched";
else
echo "no match";
Use preg_match function
if (preg_match("/\bMures\b/i", $string)) {
echo "OK.";
} else {
echo "KO.";
}
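If the search term comes from user input rather than being hard-coded, the same word-boundary idea can be sketched as a small function. The name containsWord is hypothetical; preg_quote is used so that any regex metacharacters in the term are matched literally.

```php
<?php
// Hypothetical helper: true when $word occurs in $haystack as a whole word (case-insensitive).
function containsWord(string $haystack, string $word): bool
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';

    return preg_match($pattern, $haystack) === 1;
}

var_dump(containsWord('Maramures', 'Mures'));            // bool(false)
var_dump(containsWord('Targu Mures, Romania', 'Mures')); // bool(true)
```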
How do you check the result of strstr? Try this here:
$string = 'Maramures';
$search = 'Mures';
$contains = strstr(strtolower($string), strtolower($search)) !== false;
Maybe it's a dumb solution and there's a better one. But you can add spaces to the source and destination strings at the start and finish of the strings and then search for " Mures ". Easy to implement and no need to use any other functions :)
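The space-padding idea could be sketched as follows; the function name containsSpacedWord is hypothetical. Note that it only recognises words delimited by spaces, so a word followed directly by punctuation is not found.

```php
<?php
// Hypothetical helper: pads both strings with spaces and looks for the padded word.
function containsSpacedWord(string $text, string $word): bool
{
    return strpos(' ' . $text . ' ', ' ' . $word . ' ') !== false;
}

var_dump(containsSpacedWord('Mures county', 'Mures'));     // bool(true)
var_dump(containsSpacedWord('Maramures county', 'Mures')); // bool(false)
var_dump(containsSpacedWord('Mures, Romania', 'Mures'));   // bool(false), the comma is not a space
```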
You can do various things:
search for ' Mures ' (spaces around)
search case sensitive (so 'mures' will be found in 'Maramures' but 'Mures' won't)
use a regular expression to search in the string ( 'word boundary + Mures + word boundary') -- have a look at this too: Php find string with regex
function containsString($needle, $tag_array){
foreach($tag_array as $tag){
if(strpos($tag,$needle) !== False){
echo $tag . " contains the string " . $needle . "<br />";
} else {
echo $tag . " does not contain the string " . $needle;
}
}
}
$tag_array = ['Mures','Maramures'];
$needle = 'Mures';
containsString($needle, $tag_array);
A function like this would work... Might not be as sexy as preg_match though.
The very simple way should be similar to this, where $text is the string being searched.
$string = 'Mures';
if (preg_match("/$string/", $text)) {
    // Matched
} else {
    // Not matched
}

Sentence Comparison: Ignore Words

I need some help with sentence comparison.
$answer = "This is the (correct and) acceptable answer. Content inside the parenthesis are ignored if not present in the user's answer. If it is present, it should not count against them.";
$response = "This is the correct and acceptable answer. Content inside the parenthesis are ignored if not present in the user's answer. If it is present, it should not count against them.";
echo "<strong>Acceptable Answer:</strong>";
echo "<pre style='white-space:normal;'>$answer</pre><hr/>";
echo "<strong>User's Answer:</strong>";
echo "<pre>".$response."</pre>";
// strip content in brackets
$answer = preg_replace("/\([^)]*\)|[()]/", "", $answer);
// strip punctuation
$answer = preg_replace("/[^a-zA-Z 0-9]+/", " ", $answer);
$response = preg_replace("/[^a-zA-Z 0-9]+/", " ", $response);
$common = similar_text($answer, $response, $percent);
$orgcount = strlen($answer);
printf("The user's response has %d/$orgcount characters in common (%.2f%%).", $common, $percent);
Basically, what I want to do is ignore parenthesised words. For example, in the $answer string, correct and are in parentheses; because of this, I don't want these words to count against the user's response. So if the user has these words, it doesn't count against them. And if the user doesn't have these words, it doesn't count against them.
Is this possible?
Thanks to the comments, I've written a solution. Since it's a "long" process, I thought I'd put it in a function.
EDIT: After debugging, it turned out that strpos() was causing some trouble if the position was 0, so I added an OR statement:
$answer = "(This) is the (correct and) acceptable answer. (random this will not count) Content inside the parenthesis are ignored if not present in the user's answer. If it is present, it should not count against them.";
$response = "This is the correct and acceptable answer. Content inside the parenthesis are ignored if not present in the user's answer. If it is present, it should not count against them.";
echo 'The user\'s response has '.round(compare($answer, $response),2).'% characters in common'; // The user's response has 100% characters in common
function compare($answer, $response){
preg_match_all('/\((?P<parenthesis>[^\)]+)\)/', $answer, $parenthesis);
$catch = $parenthesis['parenthesis'];
foreach($catch as $words){
if(!strpos($response, $words) === false || strpos($response, $words) === 0){ // if it does exist then remove brackets
$answer = str_replace('('.$words.')', $words, $answer);
}else{ //if it does not exist remove the brackets with the words
$answer = str_replace('('.$words.')', '', $answer);
}
}
/* To sanitize */
$answer = preg_replace(array('/[^a-zA-Z0-9]+/', '/ +/'), array(' ', ' '), $answer);
$response = preg_replace(array('/[^a-zA-Z 0-9]+/', '/ +/'), array(' ', ' '), $response);
$common = similar_text($answer, $response, $percent);
return($percent);
}
