I am trying to get the smallest match of a regexp in php.
For example: If I have the words television and telephone and the user input is tel, my regexp should return the smallest word, in this case telephone.
For short I am trying to do like a search script. But for the letters missing in the user input I use this t[a-zA-Z0-9]{0,2}l[a-zA-Z0-9]{0,} so my last letter form the word will followed by N characters.
My question is: How can I do my REGEXP To show the smallest word.
Unfortunately you can't do it. Regex can match what you want, but it doesn't provide any functions to compare submatches. You have to match your whole string, and compare submatches by PHP code in your case.
// your array of matched words
$words = array(...);
$foundWordLength = null;
$foundWord = '';
foreach ($words as $word) {
if (strlen($word) < $foundWordLength || $foundWordLength === null) {
$wordLength = strlen($word);
$foundWord = $word;
}
}
echo $foundWord;
The only way I think you can achieve it using Regular Expressions, is to sort words in desirable order first, in your case from the shortest to the longest.
Then if you have a relatively small amount of words, for the sake of performance, words can be concatenated and checked for the first match simultaneously. It's possible because PHP RegExp implementation performs the search from left to right. See function search_short() in example below.
Anyway, the loop and checking for the word starting from the lowest will work as well. Check function search_long() in example below.
<?php
$given = [
'telephone',
'television',
];
// NB: Do not forget to sanitize user input, i.e. $query
echo (search_short($given, 'tele') ?: 'Nothing found') . PHP_EOL;
echo (search_long($given, 'tele') ?: 'Nothing found') . PHP_EOL;
echo (search_short($given, 't[a-zA-Z0-9]{0,2}l[a-zA-Z0-9]{0,}') ?: 'Nothing found') . PHP_EOL;
echo (search_long($given, 't[a-zA-Z0-9]{0,2}l[a-zA-Z0-9]{0,}') ?: 'Nothing found') . PHP_EOL;
/**
* #param string[] $given
* #param string $query
*
* #return null|string
*/
function search_short($given, $query)
{
// precalculating the length of each word, removing duplicates, sorting
$given = array_map(function ($word) {
return mb_strlen($word); // `mb_strlen()` is O(N) function, while `strlen()` is O(1)
}, array_combine($given, $given));
asort($given);
// preparing the index string
$index = implode(PHP_EOL, array_keys($given));
// and, finally, searching (the multiline flag is set)
preg_match(
sprintf('/^(?<word>%s\w*)$/mu', $query), // injecting the query word
$index,
$matches
);
// the final pattern looks like: "/^(?P<word>tele\w*)$/mui"
if (array_key_exists('word', $matches)) {
return $matches['word'];
}
return null;
}
/**
* #param string[] $given
* #param string $query
*
* #return null|string
*/
function search_long($given, $query)
{
$pattern = sprintf('/^(?<word>%s\w*)$/u', $query);
// precalculating the length of each word, removing duplicates, sorting
$given = array_map(function ($word) {
return mb_strlen($word);
}, array_combine($given, $given));
asort($given);
foreach ($given as $word => $count) {
if (preg_match($pattern, $word, $matches)) {
if (array_key_exists('word', $matches)) {
return $matches['word'];
}
}
}
return false;
}
Of course it's not the most efficient algorithm and might be improved in multiple ways. But for accomplishing this more information about the scope and usage needed.
A Regular Expression engine normally neither does have an intended memory to store complex conditions nor benefits from a programming language features to provide complex comparisons.
You can do your job with a few more lines if tagging php wasn't done aimlessly.
$str = 'television and telephone';
preg_match_all('/tel\w*/', $str, $matches);
usort($matches[0], function($a, $b) {
return strlen($a) <=> strlen($b);
});
echo $matches[0][0];
Related
I have an array like below:
$fruits = array("apple","orange","papaya","grape")
I have a variable like below:
$content = "apple";
I need to filter some condition like: if this variable matches at least one of the array elements, do something. The variable, $content, is a bunch of random characters that is actually one of these available in the array data like below:
$content = "eaplp"; // it's a dynamically random char from the actual word "apple`
what have I done was like the below:
$countcontent = count($content);
for($a=0;$a==count($fruits);$a++){
$countarr = count($fruits[$a]);
if($content == $fruits[$a] && $countcontent == $countarr){
echo "we got".$fruits[$a];
}
}
I tried to count how many letters these phrases had and do like if...else... when the total word in string matches with the total word on one of array data, but is there something that we could do other than that?
We can check if an array contains some value with in_array. So you can check if your $fruits array contains the string "apple" with,
in_array("apple", $fruits)
which returns a boolean.
If the order of the letters is random, we can sort the string alphabetically with this function:
function sorted($s) {
$a = str_split($s);
sort($a);
return implode($a);
}
Then map this function to your array and check if it contains the sorted string.
$fruits = array("apple","orange","papaya","grape");
$content = "eaplp";
$inarr = in_array(sorted($content), array_map("sorted", $fruits));
var_dump($inarr);
//bool(true)
Another option is array_search. The benefit from using array_search is that it returns the position of the item (if it's found in the array, else false).
$pos = array_search(sorted($content), array_map("sorted", $fruits));
echo ($pos !== false) ? "$fruits[$pos] found." : "not found.";
//apple found.
This will also work but in a slightly different manner.
I split the strings to arrays and sort them to match eachoter.
Then I use array_slice to only match the number of characters in $content, if they match it's a match.
This means this will match in a "loose" way to with "apple juice" or "apple curd".
Not sure this is wanted but figured it could be useful for someone.
$fruits = array("apple","orange","papaya","grape","apple juice", "applecurd");
$content = "eaplp";
$content = str_split($content);
$count = count($content);
Foreach($fruits as $fruit){
$arr_fruit = str_split($fruit);
// sort $content to match order of $arr_fruit
$SortCont = array_merge(array_intersect($arr_fruit, $content), array_diff($content, $arr_fruit));
// if the first n characters match call it a match
If(array_slice($SortCont, 0, $count) == array_slice($arr_fruit, 0, $count)){
Echo "match: " . $fruit ."\n";
}
}
output:
match: apple
match: apple juice
match: applecurd
https://3v4l.org/hHvp3
It is also comparable in speed with t.m.adams answer. Sometimes faster sometimes slower, but note how this code can find multiple answers. https://3v4l.org/IbuuD
I think this is the simplest way to answer that question. some of the algorithm above seems to be "overkill".
$fruits = array("apple","orange","papaya","grape");
$content = "eaplp";
foreach ($fruits as $key => $fruit) {
$fruit_array = str_split($fruit); // split the string into array
$content_array = str_split($content); // split the content into array
// check if there's no difference between the 2 new array
if ( sizeof(array_diff($content_array, $fruit_array)) === 0 ) {
echo "we found the fruit at key: " . $key;
return;
}
}
What about using only native PHP functions.
$index = array_search(count_chars($content), array_map('count_chars', $fruits));
If $index is not null you will get the position of $content inside $fruits.
P.S. Be aware that count_chars might not be the fastest approach to that problem.
With a random token to search for a value in your array, you have a problem with false positives. That can give misleading results depending on the use case.
On search cases, for example wrong typed words, I would implement a filter solution which produces a matching array. One could sort the results by calculating the levenshtein distance to fetch the most likely result, if necessary.
String length solution
Very easy to implement.
False positives: Nearly every string with the same length like apple and grape would match.
Example:
$matching = array_filter($fruits, function ($fruit) use ($content) {
return strlen($content) == strlen($fruit);
});
if (count($matching)) {
// do your stuff...
}
Regular expression solution
It compares string length and in a limited way containing characters. It is moderate to implement and has a good performance on big data cases.
False positives: A content like abc would match bac but also bbb.
Example:
$matching = preg_grep(
'#['.preg_quote($content).']{'.strlen($content).'}#',
$fruits
);
if (count($matching)) {
// do your stuff...
}
Alphanumeric sorting solution
Most accurate but also a slow approach concerning performance using PHP.
False positives: A content like abc would match on bac or cab.
Example:
$normalizer = function ($value) {
$tokens = str_split($value);
sort($tokens);
return implode($tokens);
};
$matching = array_filter($fruits, function ($fruit) use ($content, $normalizer) {
return ($normalizer($fruit) == $normalizer($content));
});
if (count($matching)) {
// do your stuff...
}
Here's a clean approach. Returns unscrambled value early if found, otherwise returns null. Only returns an exact match.
function sortStringAlphabetically($stringToSort)
{
$splitString = str_split($stringToSort);
sort($splitString);
return implode($splitString);
}
function getValueFromRandomised(array $dataToSearch = [], $dataToFind)
{
$sortedDataToFind = sortStringAlphabetically($dataToFind);
foreach ($dataToSearch as $value) {
if (sortStringAlphabetically($value) === $sortedDataToFind) {
return $value;
}
}
return null;
}
$fruits = ['apple','orange','papaya','grape'];
$content = 'eaplp';
$dataExists = getValueFromRandomised($fruits, $content);
var_dump($dataExists);
// string(5) "apple"
Not found example:
$content = 'eaplpo';
var_dump($dataExists);
// NULL
Then you can use it (eg) like this:
echo !empty($dataExists) ? $dataExists . ' was found' : 'No match found';
NOTE: This is case sensitive, meaning it wont find "Apple" from "eaplp". That can be resolved by doing strtolower() on the loop's condition vars.
How about looping through the array, and using a flag to see if it matches?
$flag = false;
foreach($fruits as $fruit){
if($fruit == $content){
$flag = true;
}
}
if($flag == true){
//do something
}
I like t.m.adams answer but I also have a solution for this issue:
array_search_random(string $needle, array $haystack [, bool $strictcase = FALSE ]);
Description: Searches a string in array elements regardless of the position of the characters in the element.
needle: the caracters you are looking for as a string
haystack: the array you want to search
strictcase: if set to TRUE needle 'mood' will match 'mood' and 'doom' but not 'Mood' and 'Doom', if set to FALSE (=default) it will match all of these.
Function:
function array_search_random($needle, $haystack, $strictcase=false){
if($strictcase === false){
$needle = strtolower($needle);
}
$needle = str_split($needle);
sort($needle);
$needle = implode($needle);
foreach($haystack as $straw){
if($strictcase === false){
$straw = strtolower($straw);
}
$straw = str_split($straw);
sort($straw);
$straw = implode($straw);
if($straw == $needle){
return true;
}
}
return false;
}
if(in_array("apple", $fruits)){
true statement
}else{
else statement
}
What I have so far:
$searchQuery = "keyword1 keyword2";
$searchArray = explode(" ", $searchQuery);
$stringToCheck = "keyword1 keyword3 keyword2";
foreach ($searchArray as $searchKeyword) {
if (strpos($stringToCheck, $searchKeyword) !== false) :
//do something
endif;
}
endif;
What I want is to display something only if ALL values in the search query are found in a string. With my current code above, if the string contains keyword1 and keyword2, it "does something" twice, once for each match. It also comes up as true if the string only contains keyword1 but not keyword2, in that case it displays the content once.
Solution
function str_contains_all($haystack, array $needles) {
foreach ($needles as $needle) {
if (strpos($haystack, $needle) === false) {
return false;
}
}
return true;
}
Usage:
$haystack = 'foo, bar, baz';
$needles = array('foo', 'bar', 'baz');
if (str_contains_all($haystack, $needles)) {
echo "Success\n";
// ...
}
Notes
Since you specified that you only wish to perform an action when the "haystack" string contains all the substring "needles", it is safe to return false as soon as you discover a needle that is not in the haystack.
Other
Using if (...): /* ... */ endif; is fairly uncommon in PHP from my experience. I think most developers would prefer the C-style if (...) { /* ... */ } syntax.
For anyone who might need another way. I came up with this while looking for a solution.
Ensure what's to be tested are both arrays;
$var1 = explode(' ',"a string to be turned to an array");
$var2 = ["test","a","string","array"];
Get the counts (or lengths) of both arrays;
$var1count = count($var1);
$var2count = count($var2);
//Get the intersection of both arrays with array_intersect() method
$intersect = array_intersect($var1,$var2);
//check if the count (or length) of the intersect is the same as the test var (eg $var1)
if(count($intersect)===$var1count)//it has all the words
FYI: case sensitive. You might want to convert all to lower or upper case
I am looking for a function like strpos() with two significant differences:
To be able to accept multiple needles. I mean thousands of needles at ones.
To search for all occurrences of the needles in the haystack and to return an array of starting positions.
Of course it has to be an efficient solution not just a loop through every needle. I have searched through this forum and there were similar questions to this one, like:
Using an array as needles in strpos
Define multiple needles using stripos
Can't search an array in PHP in_array for the presence of multiple needles
but nether of them was what I am looking for. I am using strpos just to illustrate my question better, probably something entirely different has to be used for this purpose.
I am aware of Zend_Search_Lucene and I am interested if it can be used to achieve this and how (just the general idea)?
Thanks a lot for Your help and time!
try preg match for multiple
if (preg_match('/word|word2/i', $str))
Checking for multiple strpos values
Here's some sample code for my strategy:
function strpos_array($haystack, $needles, $offset=0) {
$matches = array();
//Avoid the obvious: when haystack or needles are empty, return no matches
if(empty($needles) || empty($haystack)) {
return $matches;
}
$haystack = (string)$haystack; //Pre-cast non-string haystacks
$haylen = strlen($haystack);
//Allow negative (from end of haystack) offsets
if($offset < 0) {
$offset += $heylen;
}
//Use strpos if there is no array or only one needle
if(!is_array($needles)) {
$needles = array($needles);
}
$needles = array_unique($needles); //Not necessary if you are sure all needles are unique
//Precalculate needle lengths to save time
foreach($needles as &$origNeedle) {
$origNeedle = array((string)$origNeedle, strlen($origNeedle));
}
//Find matches
for(; $offset < $haylen; $offset++) {
foreach($needles as $needle) {
list($needle, $length) = $needle;
if($needle == substr($haystack, $offset, $length)) {
$matches[] = $offset;
break;
}
}
}
return($matches);
}
I've implemented a simple brute force method above that will work with any combination of needles and haystacks (not just words). For possibly faster algorithms check out:
Aho–Corasick string matching algorithm
Other Solution
function strpos_array($haystack, $needles, $theOffset=0) {
$matches = array();
if(empty($haystack) || empty($needles)) {
return $matches;
}
$haylen = strlen($haystack);
if($theOffset < 0) { // Support negative offsets
$theOffest += $haylen;
}
foreach($needles as $needle) {
$needlelen = strlen($needle);
$offset = $theOffset;
while(($match = strpos($haystack, $needle, $offset)) !== false) {
$matches[] = $match;
$offset = $match + $needlelen;
if($offset >= $haylen) {
break;
}
}
}
return $matches;
}
I know this doesn't answer the OP's question but wanted to comment since this page is at the top of Google for strpos with multiple needles. Here's a simple solution to do so (again, this isn't specific to the OP's question - sorry):
$img_formats = array('.jpg','.png');
$missing = array();
foreach ( $img_formats as $format )
if ( stripos($post['timer_background_image'], $format) === false ) $missing[] = $format;
if (count($missing) == 2)
return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));
If 2 items are added to the $missing array that means that the input doesn't satisfy any of the image formats in the $img_formats array. At that point you know that you can return an error, etc. This could easily be turned into a little function:
function m_stripos( $haystack = null, $needles = array() ){
//return early if missing arguments
if ( !$needles || !$haystack ) return false;
// create an array to evaluate at the end
$missing = array();
//Loop through needles array, and add to $missing array if not satisfied
foreach ( $needles as $needle )
if ( stripos($haystack, $needle) === false ) $missing[] = $needle;
//If the count of $missing and $needles is equal, we know there were no matches, return false..
if (count($missing) == count($needles)) return false;
//If we're here, be happy, return true...
return true;
}
Back to our first example using then the function instead:
$needles = array('.jpg','.png');
if ( !m_strpos( $post['timer_background_image'], $needles ) )
return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));
Of course, what you do after the function returns true or false is up to you.
It seems you are searching for whole words. In this case, something like this might help. As it uses built-in functions, it should be faster than custom code, but you have to profile it:
$words = str_word_count($str, 2);
$word_position_map = array();
foreach($words as $position => $word) {
if(!isset($word_position_map[$word])) {
$word_position_map[$word] = array();
}
$word_position_map[$word][] = $position;
}
// assuming $needles is an array of words
$result = array_intersect_key($word_position_map, array_flip($needles));
Storing the information (like the needles) in the right format will improve the runtime ( e.g. as you don't have to call array_flip).
Note from the str_word_count documentation:
For the purpose of this function, 'word' is defined as a locale dependent string containing alphabetic characters, which also may contain, but not start with "'" and "-" characters.
So make sure you set the locale right.
You could use a regular expression, they support OR operations. This would however make it fairly slow, compared to strpos.
How about a simple solution using array_map()?
$string = 'one two three four';
$needles = array( 'five' , 'three' );
$strpos_arr = array_map( function ( $check ) use ( $string ) {
return strpos( $string, $check );
}, $needles );
As return, you're going to have an array where the keys are the needles positions and the values are the starting positions, if found.
//print_r( $strpos_arr );
Array
(
[0] =>
[1] => 8
)
I'm looking for a simple way to find matching portions of two strings in PHP (specifically in the context of a URI)
For example, consider the two strings:
http://2.2.2.2/~machinehost/deployment_folder/
and
/~machinehost/deployment_folder/users/bob/settings
What I need is to chop off the matching portion of these two strings from the second string, resulting in:
users/bob/settings
before appending the first string as a prefix, forming an absolute URI.
Is there some simple way (in PHP) to compare two arbitrary strings for matching substrings within them?
EDIT: as pointed out, I meant the longest matching substring common to both strings
Assuming your strings are $a and $b, respectively, you can use this:
$a = 'http://2.2.2.2/~machinehost/deployment_folder/';
$b = '/~machinehost/deployment_folder/users/bob/settings';
$len_a = strlen($a);
$len_b = strlen($b);
for ($p = max(0, $len_a - $len_b); $p < $len_b; $p++)
if (substr($a, $len_a - ($len_b - $p)) == substr($b, 0, $len_b - $p))
break;
$result = $a.substr($b, $len_b - $p);
echo $result;
This result is http://2.2.2.2/~machinehost/deployment_folder/users/bob/settings.
Finding the longest common match can also be done using regex.
The below function will take two strings, use one to create a regex, and execute it against the other.
/**
* Determine the longest common match within two strings
*
* #param string $str1
* #param string $str2 Two strings in any order.
* #param boolean $case_sensitive Set to true to force
* case sensitivity. Default: false (case insensitive).
* #return string The longest string - first match.
*/
function get_longest_common_subsequence( $str1, $str2, $case_sensitive = false ) {
// First check to see if one string is the same as the other.
if ( $str1 === $str2 ) return $str1;
if ( ! $case_sensitive && strtolower( $str1 ) === strtolower( $str2 ) ) return $str1;
// We'll use '#' as our regex delimiter. Any character can be used as we'll quote the string anyway,
$delimiter = '#';
// We'll find the shortest string and use that to check substrings and create our regex.
$l1 = strlen( $str1 );
$l2 = strlen( $str2 );
$str = $l1 <= $l2 ? $str1 : $str2;
$str2 = $l1 <= $l2 ? $str2 : $str1;
$l = min( $l1, $l2 );
// Next check to see if one string is a substring of the other.
if ( $case_sensitive ) {
if ( strpos( $str2, $str ) !== false ) {
return $str;
}
}
else {
if ( stripos( $str2, $str ) !== false ) {
return $str;
}
}
// Regex for each character will be of the format (?:a(?=b))?
// We also need to capture the last character, but this prevents us from matching strings with a single character. (?:.|c)?
$reg = $delimiter;
for ( $i = 0; $i < $l; $i++ ) {
$a = preg_quote( $str[ $i ], $delimiter );
$b = $i + 1 < $l ? preg_quote( $str[ $i + 1 ], $delimiter ) : false;
$reg .= sprintf( $b !== false ? '(?:%s(?=%s))?' : '(?:.|%s)?', $a, $b );
}
$reg .= $delimiter;
if ( ! $case_sensitive ) {
$reg .= 'i';
}
// Resulting example regex from a string 'abbc':
// '#(?:a(?=b))?(?:b(?=b))?(?:b(?=c))?(?:.|c)?#i';
// Perform our regex on the remaining string
$str = $l1 <= $l2 ? $str2 : $str1;
if ( preg_match_all( $reg, $str, $matches ) ) {
// $matches is an array with a single array with all the matches.
return array_reduce( $matches[0], function( $a, $b ) {
$al = strlen( $a );
$bl = strlen( $b );
// Return the longest string, as long as it's not a single character.
return $al >= $bl || $bl <= 1 ? $a : $b;
}, '' );
}
// No match - Return an empty string.
return '';
}
It'll generate a regex using the shorter of the two strings, although performance will most likely be the same either way. It may incorrectly match strings with recurring substrings, and we're limited to matching strings of two characters or more, unless they are equal or one is a substring of the other. For Instance:
// Works as intended.
get_longest_common_subsequence( 'abbc', 'abc' ) === 'ab';
// Returns incorrect substring based on string length and recurring substrings.
get_longest_common_subsequence( 'abbc', 'abcdef' ) === 'abc';
// Does not return any matches, as all recurring strings are only a single character long.
get_longest_common_subsequence( 'abc', 'ace' ) === '';
// One of the strings is a substring of the other.
get_longest_common_subsequence( 'abc', 'a' ) === 'a';
Regardless, it functions using an alternate method and the regex can be refined to tackle additional situations.
I'm not sure to understand your full request, but the idea is:
Let A be your URL and B your "/~machinehost/deployment_folder/users/bob/settings"
search B in A -> you get an index i (where i is the position of the first / of B in A)
let l = length(A)
You need to cut B from (l-i) to length(B) to grab the last part of B (/users/bob/settings)
I have not tested yet, but if you really need, I can help you make this brilliant (ironical) solution work.
Note that it may be possible with regular expressions like
$pattern = "$B(.*?)"
$res = array();
preg_match_all($pattern, $A, $res);
Edit: I think your last comment invalidates my response. But what you want is finding substrings. So you can first start with a heavy algorithm trying to find B[1:i] in A for i in {2, length(B)} and then use some dynamic programming stuffs.
it does not seem to be an out of the box code out there for your requirement. So lets look for a simple way.
For this exercise I utilized two methods, one for finding the longest match, and another one to chop off the matching portion.
The FindLongestMatch() method, takes apart a path, piece by piece seeks for a match in the other path, keeping just one match, the longest one (no arrays, no sorting).
The RemoveLongestMatch() method takes the suffix or 'remainder' after the longest match found position.
Here the full source code:
<?php
function FindLongestMatch($relativePath, $absolutePath)
{
static $_separator = '/';
$splitted = array_reverse(explode($_separator, $absolutePath));
foreach ($splitted as &$value)
{
$matchTest = $value.$_separator.$match;
if(IsSubstring($relativePath, $matchTest))
$match = $matchTest;
if (!empty($value) && IsNewMatchLonger($match, $longestMatch))
$longestMatch = $match;
}
return $longestMatch;
}
//Removes from the first string the longest match.
function RemoveLongestMatch($relativePath, $absolutePath)
{
$match = findLongestMatch($relativePath, $absolutePath);
$positionFound = strpos($relativePath, $match);
$suffix = substr($relativePath, $positionFound + strlen($match));
return $suffix;
}
function IsNewMatchLonger($match, $longestMatch)
{
return strlen($match) > strlen($longestMatch);
}
function IsSubstring($string, $subString)
{
return strpos($string, $subString) > 0;
}
This is a representative subset of Test Cases:
//TEST CASES
echo "<br>-----------------------------------------------------------";
echo "<br>".$absolutePath = 'http://2.2.2.2/~machinehost/deployment_folder/';
echo "<br>".$relativePath = '/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);
echo "<br>-----------------------------------------------------------";
echo "<br>".$absolutePath = 'http://1.1.1.1/root/~machinehost/deployment_folder/';
echo "<br>".$relativePath = '/root/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);
echo "<br>-----------------------------------------------------------";
echo "<br>".$absolutePath = 'http://2.2.2.2/~machinehost/deployment_folder/users/';
echo "<br>".$relativePath = '/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);
echo "<br>-----------------------------------------------------------";
echo "<br>".$absolutePath = 'http://3.3.3.3/~machinehost/~machinehost/subDirectory/deployment_folder/';
echo "<br>".$relativePath = '/~machinehost/subDirectory/deployment_folderX/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);
Running previous Test Cases provides the following output:
http://2.2.2.2/~machinehost/deployment_folder/
/~machinehost/deployment_folder/users/bob/settings
Longuest match: ~machinehost/deployment_folder/
Suffix: users/bob/settings
http://1.1.1.1/root/~machinehost/deployment_folder/
/root/~machinehost/deployment_folder/users/bob/settings
Longuest match: root/~machinehost/deployment_folder/
Suffix: users/bob/settings
http://2.2.2.2/~machinehost/deployment_folder/users/
/~machinehost/deployment_folder/users/bob/settings
Longuest match: ~machinehost/deployment_folder/users/
Suffix: bob/settings
http://3.3.3.3/~machinehost/~machinehost/subDirectory/deployment_folder/
/~machinehost/subDirectory/deployment_folderX/users/bob/settings
Longuest match: ~machinehost/subDirectory/
Suffix: deployment_folderX/users/bob/settings
Maybe you can take the idea of this piece of code and turn it into something that you find useful for your current project.
Let me know if it worked for you too. By the way, Mr. oreX answer looks good too.
Try this.
http://pastebin.com/GqS3UiPD
How can I check if data submitted from a form or querystring has certain words in it?
I'm trying to look for words containing admin, drop, create etc in form [Post] data and querystring data so I can accept or reject it.
I'm converting from ASP to PHP. I used to do this using an array in ASP (keep all illegal words in a string and use ubound to check the whole string for those words), but is there a better (efficient) way to do this in PHP?
Eg: A string like this would be rejected: "The administrator dropped a blah blah" because it has admin and drop in it.
I intend using this to check usernames when creating accounts and for other things too.
Thanks
You could use stripos()
int stripos ( string $haystack , string $needle [, int $offset = 0 ] )
You could have a function like:
function checkBadWords($str, $badwords) {
foreach ($badwords as $word) {
if (stripos(" $str ", " $word ") !== false) {
return false;
}
}
return true;
}
And to use it:
if (!checkBadWords('something admin', array('admin')) {
// ...
}
strpos() will let you search for a substring within a larger string. It's quick and works well. It returns false if the string's not found, and a number (which could be zero, so you need to use === to check) if it finds the string.
stripos() is a case-insensitive version of the same.
I'm trying to look for words containing admin, drop, create etc in form [Post] data and querystring data so I can accept or reject it.
I suspect that you are trying to filter the string so it's suitable for including in something like a database query, or something like that. If this is the case, this is probably not a good way to go about it, and you'd need to actually need to escape the string using mysql_real_escape_string() or equivalent.
$badwords = array("admin", "drop",);
foreach (str_word_count($string, 1) as $word) {
foreach ($badwords as $bw) {
if (strpos($word, $bw) === 0) {
//contains word $word that starts with bad word $bw
}
}
}
For JGB146, here is a performance comparison with regular expressions:
<?php
function has_bad_words($badwords, $string) {
foreach (str_word_count($string, 1) as $word) {
foreach ($badwords as $bw) {
if (stripos($word, $bw) === 0) {
return true;
}
}
return false;
}
}
function has_bad_words2($badwords, $string) {
$regex = array_map(function ($w) {
return "(?:\\b". preg_quote($w, "/") . ")"; }, $badwords);
$regex = "/" . implode("|", $regex) . "/";
return preg_match($regex, $string) != 0;
}
$badwords = array("abc", "def", "ghi", "jkl", "mnop");
$string = "The quick brown fox jumps over the lazy dog";
$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
has_bad_words($badwords, $string);
}
echo "elapsed: ". (microtime(true) - $start);
$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
has_bad_words2($badwords, $string);
}
echo "elapsed: ". (microtime(true) - $start);
Example output:
elapsed: 0.076514959335327
elapsed: 0.29999899864197
So regular expressions are much slower.
You could use regular expression like this:
preg_match("~(admin)|(drop)|(another token)|(yet another)~",$subject);
building the pattern string from array
$pattern = implode(")|(", $banned_words);
$pattern = "~(".$pattern.")~";
function check($string, $array) {
foreach($array as $item) {
if( preg_match("/($item)/", $string) )
return true;
}
return false;
}
You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.
Originally, I was thinking you could do this with a simple preg_match() call (hence the downvote), however preg_match does not support arrays. Instead, you can do a replacement via preg_replace to have all rejected strings replaced with nothing, and then check to see if the string is changed. This is simple and avoids requiring a loop iteration for each rejected string.
$rejectedStrs = array("/admin/", "/drop/", "/create/");
if($input == preg_replace($rejectedStrs, "", $input)) {
//do stuff
} else {
//reject
}
Note also that you can provide case-insensitive searches by using the i flag on the regex patterns, changing the array of patterns to $rejectedStrs = array("/admin/i", "/drop/i", "/create/i");
On Efficiency
There has been some debate about the efficiency of doing it this way vs the accepted nested loop method. I ran some tests and found the preg_replace method executed around twice as fast as the nested loop. Here is the code and output of those tests:
$input = "You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement. You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.";
$input = "Short string with no matches";
$input2 = "Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. ";
$input3 = "Short string which loop will match quickly";
$input4 = "Longer string that will eventually be matches but first has a lot of words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words and then finally the word create near the end";
$start1 = microtime(true);
$rejectedStrs = array("/loop/", "/operation/", "/create/");
$p_matches = 0;
for ($i = 0; $i < 10000; $i++) {
if (preg_check($rejectedStrs, $input)) $p_matches++;
if (preg_check($rejectedStrs, $input2)) $p_matches++;
if (preg_check($rejectedStrs, $input3)) $p_matches++;
if (preg_check($rejectedStrs, $input4)) $p_matches++;
}
$start2 = microtime(true);
$rejectedStrs = array("loop", "operation", "create");
$l_matches = 0;
for ($i = 0; $i < 10000; $i++) {
if (loop_check($rejectedStrs, $input)) $l_matches++;
if (loop_check($rejectedStrs, $input2)) $l_matches++;
if (loop_check($rejectedStrs, $input3)) $l_matches++;
if (loop_check($rejectedStrs, $input4)) $l_matches++;
}
$end = microtime(true);
echo "preg_match: ".$start1." ".$start2."= ".($start2-$start1)."\nloop_match: ".$start2." ".$end."=".($end-$start2);
function preg_check($rejectedStrs, $input) {
if($input == preg_replace($rejectedStrs, "", $input))
return true;
return false;
}
function loop_check($badwords, $string) {
foreach (str_word_count($string, 1) as $word) {
foreach ($badwords as $bw) {
if (stripos($word, $bw) === 0) {
return true;
}
}
return false;
}
}
Output:
preg_match: 1281908071.4032 1281908071.9947= 0.5915060043335
loop_match: 1281908071.9947 1281908073.006=1.0112948417664
This is actually pretty simple, use substr_count.
And example for you would be:
if (substr_count($variable_to_search, "drop"))
{
echo "error";
}
And to make things even simpler, put your keywords (ie. "drop", "create", "alter") in an array and use foreach to check them. That way you cover all your words. An example
foreach ($keywordArray as $keyword)
{
if (substr_count($variable_to_search, $keyword))
{
echo "error"; //or do whatever you want to do went you find something you don't like
}
}