Detect a string contained in another discontinuously - PHP

Recently I've been working on a filter for bad content (such as advertising posts) on a BBS, and I wrote a function to detect whether one string is contained in another, not necessarily contiguously. The code is below:
$str = 'helloguys';
$substr1 = 'hlu';
$substr2 = 'elf';
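function detect($a, $b) // detects whether $a is contained in $b, in order
{
    $c = '';
    // use < rather than <= so the loops never read past the end of a string
    for ($i = 0; $i < strlen($a); $i++)
    {
        for ($j = 0; $j < strlen($b); $j++)
        {
            if ($a[$i] == $b[$j])
            {
                // keep only the part of $b after the matched character
                $b = substr($b, $j + 1);
                $c .= $a[$i];
                break;
            }
        }
    }
    return $c == $a;
}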
var_dump(detect($substr1,$str)); //true
var_dump(detect($substr2,$str)); //false
Since the filter runs before users' posts are accepted, I think efficiency is important here. Is there a better solution? Thanks!

A faster way is to convert $a into a regular expression and match it against $b, which leaves the optimization to the PCRE extension, which is written in C.
For example:
detect("hlu",$b) is equivalent to preg_match("/h.*l.*u/", $b)
(detect("hlu",$b) || detect("elf",$b)) is equivalent to preg_match("/(h.*l.*u|e.*l.*f)/", $b)
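Here is a minimal sketch of building that pattern automatically. The function name detect_regex is made up for this example, and it assumes single-byte characters. Each character goes through preg_quote() so that characters such as "." or "/" in $a are matched literally.

```php
<?php
// Hypothetical helper: builds a pattern such as /h.*?l.*?u/s from "hlu".
function detect_regex($needle, $haystack)
{
    if ($needle === '') {
        return true; // the empty string is trivially contained
    }
    // Quote each character so that '.', '*', '/' etc. are matched literally.
    $parts = array();
    foreach (str_split($needle) as $char) {
        $parts[] = preg_quote($char, '/');
    }
    // The "s" modifier lets "." match newlines as well.
    $pattern = '/' . implode('.*?', $parts) . '/s';
    return preg_match($pattern, $haystack) === 1;
}

var_dump(detect_regex('hlu', 'helloguys')); // bool(true)
var_dump(detect_regex('elf', 'helloguys')); // bool(false)
```

Whether this is actually faster than a plain loop depends on the input, so it is worth measuring on real posts.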

Not sure why you would want to do this, but I was bored:
function detect( $a,$b ) {
// note: only checks that every character of $a occurs in $b; order is ignored
return count( array_intersect( str_split($a), str_split($b) ) ) == strlen($a);
}
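For comparison, the order-sensitive check from the question can also be done in a single pass over the longer string. This is only a sketch; the name is_subsequence is made up here, and it assumes single-byte strings.

```php
<?php
// Hypothetical helper: returns true if the characters of $needle appear in
// $haystack in the same order, not necessarily next to each other.
function is_subsequence($needle, $haystack)
{
    $needleLength = strlen($needle);
    $haystackLength = strlen($haystack);
    $matched = 0; // number of characters of $needle matched so far

    // Walk through $haystack once, advancing in $needle on every match.
    for ($j = 0; $j < $haystackLength && $matched < $needleLength; $j++) {
        if ($haystack[$j] === $needle[$matched]) {
            $matched++;
        }
    }
    return $matched === $needleLength;
}

var_dump(is_subsequence('hlu', 'helloguys')); // bool(true)
var_dump(is_subsequence('elf', 'helloguys')); // bool(false)
```

Each character of the longer string is visited at most once, and no substrings are copied.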

Related

How to sort array by substring? (PHP)

I have created a method to sort an array with values like array('regdate','birthday','editdate'). It should move the elements containing the word date to the left, like this: array('regdate','editdate','birthday')
public function sortColumnsBySubstring($haystack, $substr){
if ($haystack == $substr) {
return 0;
}
return strpos($haystack, $substr) !== false ? -1 : 1;
}
However, it is not clear to me how to make this work. The examples I have found in the PHP manual show a function with no arguments, or closures. I use PHP version 5.2, so I cannot use closures.
All I can think of is usort($date_cols, $this->sortColumnsBySubstring($value, 'date')), but here $value is undefined, so that is not a solution.
The question is how to implement the function so that it works correctly.
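You need to pass the callback as an array. Since you are on PHP 5.2, use array() rather than the short [] syntax, which requires PHP 5.4:
usort($date_cols, array($this, 'sortColumnsBySubstring'));
usort() then calls the method with two elements of the array to compare. See Callbacks / Callables in the PHP docs.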
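To show how the pieces fit together without closures, here is a rough sketch. The class name ColumnSorter, its $keyword property and the method names are assumptions made for this example, not part of the asker's code.

```php
<?php
// Hypothetical class; the keyword is kept in a property because usort()
// passes only the two elements being compared to the callback.
class ColumnSorter
{
    private $keyword;

    public function __construct($keyword)
    {
        $this->keyword = $keyword;
    }

    // usort() calls this with two array elements at a time.
    public function compare($a, $b)
    {
        $aHas = strpos($a, $this->keyword) !== false;
        $bHas = strpos($b, $this->keyword) !== false;
        if ($aHas === $bHas) {
            return 0; // both or neither contain the keyword
        }
        return $aHas ? -1 : 1; // elements with the keyword go first
    }

    public function sort(array $columns)
    {
        // array($object, 'methodName') is a valid callback in PHP 5.2
        usort($columns, array($this, 'compare'));
        return $columns;
    }
}

$sorter = new ColumnSorter('date');
print_r($sorter->sort(array('regdate', 'birthday', 'editdate')));
```

With this input 'birthday' ends up last. The relative order of 'regdate' and 'editdate' is not guaranteed on older PHP versions, because usort() was not a stable sort before PHP 8.0.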
The first solution answers my original question:
function cmp($a, $b)
{
$adate = (strpos($a, 'date') !== false);
$bdate = (strpos($b, 'date') !== false);
if (!($adate ^ $bdate)) return strcmp($a, $b);
return $adate ? -1 : 1;
}
$a = array('birthday', 'regdate', 'editdate');
usort($a, 'cmp');
The second solution splits the values into two arrays, sorts them, and then merges them back. I have tried to use more time-related words to identify the values related to time.
private function getDateColumns(&$array)
{
$search_date_columns = array('date','datetime','timestamp','time','edited','changed','modified','created','datum');
$result = array( array(), array() );
foreach($array as $v1):
$found = false;
foreach($search_date_columns as $v2)
if ( strpos($v1, $v2)!==false )
{ $found = true; break; }
if ($found)
$result[0][] = $v1;
else
$result[1][] = $v1;
endforeach;
return $result;
}
It is used like this:
$date_cols = array('regdate','time','editdate','createdate','personal','mojedatum','edited','test','modified','changed','pokus','timestamp','hlava');
$arrays = $this->getDateColumns($date_cols);
rsort($arrays[0]);
$date_cols = array_merge($arrays[0], $arrays[1]);
unset($arrays);
print_r($date_cols);

Optimizing PHP algorithm

I have built a class that finds the smallest number divisible by all numbers in a given range.
This is my code:
class SmallestDivisible
{
private $dividers = array();
public function findSmallestDivisible($counter)
{
$this->dividers = range(10, 20);
for($x=1; $x<$counter; $x++) {
if ($this->testIfDevisibleByAll($x, $this->dividers) == true) {
return $x;
}
}
}
private function testIfDevisibleByAll($x, $dividers)
{
foreach($dividers as $divider) {
if ($x % $divider !== 0) {
return false;
}
}
return true;
}
}
$n = new SmallestDivisible();
echo $n->findSmallestDivisible(1000000000);
This class finds a number that is divisible by all numbers in the range from 10 to 20 ($this->dividers).
I know it works because I tested it with other, smaller ranges, but unfortunately it cannot find the solution for range(10, 20) within 30 seconds, which is the time after which a PHP script is halted.
The parameter passed to the findSmallestDivisible method is the upper bound of the numbers the script will inspect (i.e. from 1 to $counter, which is 1000000000 in this run).
I would be grateful for suggestions on how to optimize this script so that it executes faster.
Your solution is brute-force and simply horrible.
Instead, how about handling it mathematically? You're looking for the lowest common multiple of numbers in your range, so...
function gcd($n, $m) {
$n=abs($n); $m=abs($m);
list($n,$m) = array(min($m,$n),max($m,$n));
while($r = $m % $n) {
list($m,$n) = array($n,$r);
}
return $n;
}
function lcm($n, $m) {
return $m * ($n/gcd($n,$m));
}
function lcm_array($arr) {
while(count($arr) > 1) {
array_push($arr, lcm(array_shift($arr),array_shift($arr)));
}
return array_shift($arr);
}
var_dump(lcm_array(range(10,20)));
// result int(232792560)
This means your original code would have had to do 232,792,560 iterations. No wonder it took so long!
Your goal is an easy mathematical calculation named the least common multiple but using brute force to compute it is totally wrong (as you already found out).
The Wikipedia page lists several reasonable algorithms that can be used to compute it faster.
The one explained in the section "A method using a table" is really fast and doesn't require much memory. You keep only the leftmost column of the table (the numbers you want to get the lcm for) and the rightmost column (the current step). If you implement it I suggest you hardcode a list of prime numbers into your program to avoid computing them.
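A rough sketch of how that table method could look in PHP is below. The function name lcm_by_table and the hardcoded prime list are assumptions for this example, and it expects positive integers only (a zero in the input would loop forever).

```php
<?php
// Sketch of the "table" method: for each prime, divide every number that is
// divisible by it, and multiply the result by that prime once per pass.
function lcm_by_table(array $numbers, array $primes)
{
    $lcm = 1;
    foreach ($primes as $prime) {
        do {
            $divided = false;
            foreach ($numbers as $i => $n) {
                if ($n % $prime === 0) {
                    $numbers[$i] = (int) ($n / $prime);
                    $divided = true;
                }
            }
            if ($divided) {
                $lcm *= $prime; // this pass corresponds to one table row
            }
        } while ($divided);
    }
    return $lcm;
}

// Primes up to 20 are enough for numbers in the range 10..20.
$primes = array(2, 3, 5, 7, 11, 13, 17, 19);
var_dump(lcm_by_table(range(10, 20), $primes)); // int(232792560)
```

Only the current column of the table is kept in $numbers, so memory use stays proportional to the size of the input.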
Here is another solution I came up with.
In short, the algorithm calculates the LCM (least common multiple) for a group of numbers.
class Lcmx
{
public $currentLcm = 0;
private function gcd($a, $b)
{
if ($a == 0 || $b == 0)
return abs( max(abs($a), abs($b)) );
$r = $a % $b;
return ($r != 0) ?
$this->gcd($b, $r) :
abs($b);
}
private function lcm($a, $b)
{
return array_product(array($a, $b)) / $this->gcd($a, $b);
}
public function lcm_array($array = array())
{
$factors = $array;
while(count($factors) > 1) {
$this->currentLcm = $this->lcm(array_pop($factors), array_pop($factors));
array_push($factors, $this->currentLcm);
}
return $this;
}
}
$l = new Lcmx;
echo $l->lcm_array(range(1, 20))->currentLcm;
//232792560

Is it possible to validate a mathematical expression string?

I want to check that all brackets in a given string open and close properly, and also whether the string is a mathematical expression.
For example:
$str1 = "(A1+A2*A3)+A5+(B3^B5)*(C1*((A3/C2)+(B2+C1)))";
$str2 = "(A1+A2*A3)+A5)*C1+(B3^B5*(C1*((A3/C2)+(B2+C1)))";
$str3 = "(A1+A2*A3)+A5++(B2+C1)))";
$str4 = "(A1+A2*A3)+A5+(B3^B5)*(C1*(A3/C2)+(B2+C1))";
In the example above, $str1 and $str4 are valid strings.
Please help.
You'll need a kind of parser. I don't think you can handle this with a regular expression, because you have to check the number and order of the parentheses, including nested ones. The class below is a quick PHP port of a Python-based validator for parentheses in math expressions that I found:
class MathExpression {
private static $parentheses_open = array('(', '{', '[');
private static $parentheses_close = array(')', '}', ']');
protected static function getParenthesesType( $c ) {
if(in_array($c,MathExpression::$parentheses_open)) {
return array_search($c, MathExpression::$parentheses_open);
} elseif(in_array($c,MathExpression::$parentheses_close)) {
return array_search($c, MathExpression::$parentheses_close);
} else {
return false;
}
}
public static function validate( $expression ) {
$size = strlen( $expression );
$tmp = array();
for ($i=0; $i<$size; $i++) {
if(in_array($expression[$i],MathExpression::$parentheses_open)) {
$tmp[] = $expression[$i];
} elseif(in_array($expression[$i],MathExpression::$parentheses_close)) {
if (count($tmp) == 0 ) {
return false;
}
if(MathExpression::getParenthesesType(array_pop($tmp))
!= MathExpression::getParenthesesType($expression[$i])) {
return false;
}
}
}
if (count($tmp) == 0 ) {
return true;
} else {
return false;
}
}
}
//Mathematical expressions to validate
$tests = array(
'(A1+A2*A3)+A5+(B3^B5)*(C1*((A3/C2)+(B2+C1)))',
'(A1+A2*A3)+A5)*C1+(B3^B5*(C1*((A3/C2)+(B2+C1)))',
'(A1+A2*A3)+A5++(B2+C1)))',
'(A1+A2*A3)+A5+(B3^B5)*(C1*(A3/C2)+(B2+C1))'
);
// running the tests...
foreach($tests as $test) {
$isValid = MathExpression::validate( $test );
echo 'test of: '. $test .'<br>';
var_dump($isValid);
}
Well, I suppose the thing you are looking for is a context-free grammar or a pushdown automaton. It cannot be done using only regular expressions (at least there is no easy or nice way).
That is because you are dealing with nested structures. An idea for an implementation can be found here: Regular expression to detect semi-colon terminated C++ for & while loops
Use a regular expression that tells you how many opening brackets and closing brackets there are.
Then compare the two counts: if they are equal your expression is right, otherwise it is wrong.
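One caveat about counting alone: a string such as ")A1+A2(" has equal counts but is still wrong, so the order matters too. A minimal sketch that tracks the nesting depth instead is below. The function name brackets_balanced is made up, and it handles round brackets only and does not check operators.

```php
<?php
// Hypothetical helper: true if every ")" closes an earlier "(" and nothing
// is left open at the end.
function brackets_balanced($expression)
{
    $depth = 0;
    $length = strlen($expression);
    for ($i = 0; $i < $length; $i++) {
        if ($expression[$i] === '(') {
            $depth++;
        } elseif ($expression[$i] === ')') {
            $depth--;
            if ($depth < 0) {
                return false; // a ")" appeared before its "("
            }
        }
    }
    return $depth === 0;
}

var_dump(brackets_balanced(')A1+A2('));       // bool(false), although the counts are equal
var_dump(brackets_balanced('(A1+A2*A3)+A5')); // bool(true)
```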

Most efficient way to check if array element exists in string

I've been looking for a way to check if any of an array of values exists in a string, but it seems that PHP has no native way of doing this, so I've come up with the below.
My question - is there a better way of doing this, as this seems pretty inefficient? Thanks.
$match_found = false;
$referer = wp_get_referer();
$valid_referers = array(
'dd-options',
'dd-options-footer',
'dd-options-offices'
);
/** Loop through all referers looking for a match */
foreach($valid_referers as $string) :
$referer_valid = strstr($referer, $string);
if($referer_valid !== false) :
$match_found = true;
continue;
endif;
endforeach;
/** If there were no matches, exit the function */
if(!$match_found) :
return false;
endif;
Try the following function:
function contains($input, array $referers)
{
foreach($referers as $referer) {
if (stripos($input,$referer) !== false) {
return true;
}
}
return false;
}
if ( contains($referer, $valid_referers) ) {
// contains
}
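Another option, sketched here under the assumption that the list of values is small, is to join the values into one regular expression so that a single preg_match() call does the search. The function name contains_any is made up for this example.

```php
<?php
// Hypothetical helper: true if $input contains at least one of $needles,
// compared case-insensitively.
function contains_any($input, array $needles)
{
    if (count($needles) === 0) {
        return false; // an empty alternation would match everything
    }
    $quoted = array();
    foreach ($needles as $needle) {
        // Escape characters that have a special meaning in a pattern.
        $quoted[] = preg_quote($needle, '/');
    }
    // Builds one alternation such as /foo|bar/i
    $pattern = '/' . implode('|', $quoted) . '/i';
    return preg_match($pattern, $input) === 1;
}

$valid_referers = array('dd-options', 'dd-options-footer', 'dd-options-offices');
var_dump(contains_any('http://example.com/wp-admin/admin.php?page=dd-options', $valid_referers)); // bool(true)
```

For three short strings the loop above is probably just as fast, so this is mainly a matter of taste.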
What about this:
$exists = true;
array_walk($my_array, function($item, $key) use ($my_string, &$exists) {
$exists = $exists && (strpos($my_string, $item) !== FALSE);
});
var_dump($exists);
This checks whether all of the array values exist in the string. If even one is missing, you get a false result. Should you need to find out which ones are not present in the string, try this:
$exists = true;
$not_present = array();
array_walk($my_array, function($item, $key) use ($my_string, &$exists, &$not_present) {
if(strpos($my_string, $item) === FALSE) {
$not_present[] = $item;
$exists = false;
}
});
var_dump($exists);
var_dump($not_present);
First off, the alternative syntax is nice to use, but historically it's used in template files, since its structure stays readable when PHP is interleaved with HTML.
Second, if all your code does is check something, it's generally wise to return immediately once that condition is met:
$match_found = false;
$referer = wp_get_referer();
$valid_referers = array(
'dd-options',
'dd-options-footer',
'dd-options-offices'
);
/** Loop through all referers looking for a match */
foreach($valid_referers as $string) :
$referer_valid = strstr($referer, $string);
if($referer_valid !== false) :
$match_found = true;
break; // break here. You already know other values will not change the outcome
endif;
endforeach;
/** If there were no matches, exit the function */
if(!$match_found) :
return false;
endif;
// if you don't do anything after this return, it's identical to doing return $match_found
Now, as noted in some of the other posts in this thread, PHP has a number of functions that can help. Here are a couple more, which apply when the referer must equal one of the values exactly rather than merely contain it:
in_array($referer, $valid_referers);// returns true/false on match
$valid_referers = array(
'dd-options' => true,
'dd-options-footer' => true,
'dd-options-offices' => true
);// remapped to a dictionary instead of a standard array
isset($valid_referers[$referer]);// returns true/false on match
Ask if you have any questions.

What is the algorithm for parsing expressions in infix notation?

I would like to parse boolean expressions in PHP. As in:
A and B or C and (D or F or not G)
The terms can be considered simple identifiers. They will have a little structure, but the parser doesn't need to worry about that. It should just recognize the keywords and or not ( ). Everything else is a term.
I remember we wrote simple arithmetic expression evaluators at school, but I don't remember how it was done anymore. Nor do I know what keywords to look for in Google/SO.
A ready made library would be nice, but as I remember the algorithm was pretty simple so it might be fun and educational to re-implement it myself.
Recursive descent parsers are fun to write and easy to read. The first step is to write your grammar out.
Maybe this is the grammar you want.
expr = and_expr ('or' and_expr)*
and_expr = not_expr ('and' not_expr)*
not_expr = simple_expr | 'not' not_expr
simple_expr = term | '(' expr ')'
Turning this into a recursive descent parser is super easy. Just write one function per nonterminal.
def expr():
    x = and_expr()
    while peek() == 'or':
        consume('or')
        y = and_expr()
        x = OR(x, y)
    return x
def and_expr():
    x = not_expr()
    while peek() == 'and':
        consume('and')
        y = not_expr()
        x = AND(x, y)
    return x
def not_expr():
    if peek() == 'not':
        consume('not')
        x = not_expr()
        return NOT(x)
    else:
        return simple_expr()
def simple_expr():
    t = peek()
    if t == '(':
        consume('(')
        result = expr()
        consume(')')
        return result
    elif is_term(t):
        consume(t)
        return TERM(t)
    else:
        raise SyntaxError("expected term or (")
This isn't complete. You have to provide a little more code:
Input functions. consume, peek, and is_term are functions you provide. They'll be easy to implement using regular expressions. consume(s) reads the next token of input and throws an error if it doesn't match s. peek() simply returns a peek at the next token without consuming it. is_term(s) returns true if s is a term.
Output functions. OR, AND, NOT, and TERM are called each time a piece of the expression is successfully parsed. They can do whatever you want.
Wrapper function. Instead of just calling expr directly, you'll want to write a little wrapper function that initializes the variables used by consume and peek, then calls expr, and finally checks to make sure there's no leftover input that didn't get consumed.
Even with all this, it's still a tiny amount of code. In Python, the complete program is 84 lines, and that includes a few tests.
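Since the question asks about PHP, here is a rough PHP sketch of the same grammar that evaluates the expression directly instead of building a tree. The class name BoolExprParser and the $values lookup array (term name to boolean) are assumptions made for this example. A term is anything that appears as a key in $values.

```php
<?php
// Hypothetical class: one method per nonterminal of the grammar.
class BoolExprParser
{
    private $tokens = array();
    private $pos = 0;
    private $values = array();

    // Wrapper: tokenizes, parses, and checks for leftover input.
    public function evaluate($input, array $values)
    {
        // Tokens are "(", ")" or runs of non-space, non-bracket characters.
        preg_match_all('/\(|\)|[^\s()]+/', $input, $matches);
        $this->tokens = $matches[0];
        $this->pos = 0;
        $this->values = $values;
        $result = $this->expr();
        if ($this->peek() !== null) {
            throw new Exception('unexpected token: ' . $this->peek());
        }
        return $result;
    }

    // Returns the next token without consuming it, or null at the end.
    private function peek()
    {
        return isset($this->tokens[$this->pos]) ? $this->tokens[$this->pos] : null;
    }

    private function consume($expected)
    {
        if ($this->peek() !== $expected) {
            throw new Exception("expected '$expected'");
        }
        $this->pos++;
    }

    private function expr()
    {
        $x = $this->andExpr();
        while ($this->peek() === 'or') {
            $this->consume('or');
            $y = $this->andExpr(); // parse first, so input is always consumed
            $x = $x || $y;
        }
        return $x;
    }

    private function andExpr()
    {
        $x = $this->notExpr();
        while ($this->peek() === 'and') {
            $this->consume('and');
            $y = $this->notExpr();
            $x = $x && $y;
        }
        return $x;
    }

    private function notExpr()
    {
        if ($this->peek() === 'not') {
            $this->consume('not');
            return !$this->notExpr();
        }
        return $this->simpleExpr();
    }

    private function simpleExpr()
    {
        $t = $this->peek();
        if ($t === '(') {
            $this->consume('(');
            $result = $this->expr();
            $this->consume(')');
            return $result;
        }
        if ($t !== null && array_key_exists($t, $this->values)) {
            $this->consume($t);
            return (bool) $this->values[$t];
        }
        throw new Exception('expected term or (');
    }
}

$parser = new BoolExprParser();
$values = array('A' => true, 'B' => false, 'C' => true, 'D' => false, 'F' => false, 'G' => false);
var_dump($parser->evaluate('A and B or C and (D or F or not G)', $values)); // bool(true)
```

Evaluating while parsing keeps the sketch short. To get a tree instead, the methods could return nodes in place of booleans.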
Why not just use the PHP parser?
$terms=array('and','or','not','A','B','C','D'...);
$values=array('*','+','!',1,1,0,0,1....);
$expression="A and B or C and (D or F or not G)";
$expression=preg_replace($terms, $values,$expression);
$expression=preg_replace('^(+|-|!|1|0)','',$expression);
$result=eval($expression);
Actually, that 2nd regex is wrong (and only required if you need to prevent any code injection) - but you get the idea.
C.
I'd go with a Pratt parser. It's almost like recursive descent but smarter :) A decent explanation by Douglas Crockford (of JSLint fame) is here.
Dijkstra's shunting yard algorithm is the traditional one for going from infix to postfix/graph.
I've implemented the shunting yard algorithm as suggested by plinth. However, this algorithm just gives you the postfix notation, aka reverse Polish notation (RPN). You still have to evaluate it, but that's quite easy once you have the expression in RPN (described for instance here).
The code below might not be good PHP style, my PHP knowledge is somewhat limited. It should be enough to get the idea though.
$operators = array("and", "or", "not");
$num_operands = array("and" => 2, "or" => 2, "not" => 1);
$parenthesis = array("(", ")");
function is_operator($token) {
global $operators;
return in_array($token, $operators);
}
function is_right_parenthesis($token) {
global $parenthesis;
return $token == $parenthesis[1];
}
function is_left_parenthesis($token) {
global $parenthesis;
return $token == $parenthesis[0];
}
function is_parenthesis($token) {
return is_right_parenthesis($token) || is_left_parenthesis($token);
}
// check whether the precedence of $a is less than or equal to that of $b
function is_precedence_less_or_equal($a, $b) {
// "not" always comes first
if ($b == "not")
return true;
if ($a == "not")
return false;
if ($a == "or" and $b == "and")
return true;
if ($a == "and" and $b == "or")
return false;
// otherwise they're equal
return true;
}
function shunting_yard($input_tokens) {
$stack = array();
$output_queue = array();
foreach ($input_tokens as $token) {
if (is_operator($token)) {
while (count($stack) > 0 && is_operator($stack[count($stack)-1]) && is_precedence_less_or_equal($token, $stack[count($stack)-1])) {
$o2 = array_pop($stack);
array_push($output_queue, $o2);
}
array_push($stack, $token);
} else if (is_parenthesis($token)) {
if (is_left_parenthesis($token)) {
array_push($stack, $token);
} else {
while (count($stack) > 0 && !is_left_parenthesis($stack[count($stack)-1])) {
array_push($output_queue, array_pop($stack));
}
if (count($stack) == 0) {
echo ("parse error");
die();
}
$lp = array_pop($stack);
}
} else {
array_push($output_queue, $token);
}
}
while (count($stack) > 0) {
$op = array_pop($stack);
if (is_parenthesis($op))
die("mismatched parenthesis");
array_push($output_queue, $op);
}
return $output_queue;
}
function str2bool($s) {
if ($s == "true")
return true;
if ($s == "false")
return false;
die('$s doesn\'t contain valid boolean string: '.$s."\n");
}
function apply_operator($operator, $a, $b) {
if (is_string($a))
$a = str2bool($a);
if (!is_null($b) and is_string($b))
$b = str2bool($b);
if ($operator == "and")
return $a and $b;
else if ($operator == "or")
return $a or $b;
else if ($operator == "not")
return ! $a;
else die("unknown operator `$operator'");
}
function get_num_operands($operator) {
global $num_operands;
return $num_operands[$operator];
}
function is_unary($operator) {
return get_num_operands($operator) == 1;
}
function is_binary($operator) {
return get_num_operands($operator) == 2;
}
function eval_rpn($tokens) {
$stack = array();
foreach ($tokens as $t) {
if (is_operator($t)) {
if (is_unary($t)) {
$o1 = array_pop($stack);
$r = apply_operator($t, $o1, null);
array_push($stack, $r);
} else { // binary
$o1 = array_pop($stack);
$o2 = array_pop($stack);
$r = apply_operator($t, $o1, $o2);
array_push($stack, $r);
}
} else { // operand
array_push($stack, $t);
}
}
if (count($stack) != 1)
die("invalid token array");
return $stack[0];
}
// $input = array("A", "and", "B", "or", "C", "and", "(", "D", "or", "F", "or", "not", "G", ")");
$input = array("false", "and", "true", "or", "true", "and", "(", "false", "or", "false", "or", "not", "true", ")");
$tokens = shunting_yard($input);
$result = eval_rpn($tokens);
foreach($input as $t)
echo $t." ";
echo "==> ".($result ? "true" : "false")."\n";
You could use an LR parser to build a parse tree and then evaluate the tree to obtain the result. A detailed description, including examples, can be found on Wikipedia. If you haven't coded it yourself already, I will write a small example tonight.
The simplest way is to use regexes that convert your expression into PHP syntax and then use eval, as suggested by symcbean. But I'm not sure you would want to use that in production code.
The other way is to code your own simple recursive descent parser. It isn't as hard as it might sound. For a simple grammar such as yours (boolean expressions), you can easily code one from scratch. You can also use a parser generator similar to ANTLR for PHP; searching for a PHP parser generator would probably turn up something.
