Given the following (infix) expression:
(country = be or country = nl) and
(language = en or language = nl) and
message contains twitter
I'd like to create the following 4 infix notations:
message contains twitter and country = be and language = en
message contains twitter and country = be and language = nl
message contains twitter and country = nl and language = en
message contains twitter and country = nl and language = nl
So, basically, I would like to get rid of all ORs.
I already have a postfix notation for the first expression, so I'm currently trying to process that to get the desired notation. This particular situation, however, causes trouble.
(For illustration purposes, the postfix notation for this query would be:)
country be = country nl = or language en = language nl = or and message twitter contains and
Does anyone know of an algorithm to achieve this?
Break the problem into two steps: postfix to multiple postfix, postfix to infix. Each step is performed by "interpreting" a postfix expression.
For the postfix to multiple postfix interpreter: the stack values are collections of postfix expressions. The interpretation rules are as follows.
<predicate>: push a one-element collection containing <predicate>.
AND: pop the top two collections into C1 and C2. Using two nested loops, create a collection containing x y AND for every x in C1 and y in C2, and push this collection.
OR: pop the top two collections into C1 and C2. Push the union of C1 and C2.
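A minimal Python sketch of this first interpreter, run on the question's postfix query. It assumes space-separated tokens and a hypothetical operator set of "=", "contains", "and" and "or". The comparison operators are treated exactly like AND, which for one-element operand collections simply builds the predicate, so no separate predicate parsing is needed.

```python
BINARY_OPS = {"=", "contains", "and", "or"}


def expand_or(tokens):
    """Interpret a postfix token list, returning a list of OR-free postfix token lists."""
    stack = []  # each entry is a collection (list) of postfix expressions (token lists)
    for tok in tokens:
        if tok not in BINARY_OPS:
            stack.append([[tok]])  # operand: one-element collection
            continue
        right = stack.pop()  # top of the stack (C1 in the description)
        left = stack.pop()   # next one down (C2)
        if tok == "or":
            stack.append(left + right)  # OR: union (concatenation) of both collections
        else:
            # AND (and the comparisons): every pairing "x y OP", via two nested loops
            stack.append([x + y + [tok] for x in left for y in right])
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


query = ("country be = country nl = or language en = language nl = or and "
         "message twitter contains and").split()
for expr in expand_or(query):
    print(" ".join(expr))
```

The four printed lines are the four OR-free postfix expressions, one per combination of country and language.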
For the postfix to infix interpreter: the stack values are infix expressions.
<predicate>: push <predicate>.
AND: pop the top expression into y and the next one into x (so the operand order is preserved). Push the expression (x) and (y).
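The second interpreter can be sketched the same way in Python, assuming the same token set. As a small variation on the rule above, instead of always wrapping operands as (x) and (y), it records each sub-expression's top-level precedence and adds parentheses only where a looser-binding operand needs them.

```python
PRECEDENCE = {"or": 1, "and": 2, "=": 3, "contains": 3}
ATOM = 4  # bare operands bind tighter than any operator


def postfix_to_infix(tokens):
    """Interpret a postfix token list, returning an infix string."""
    stack = []  # entries are (infix text, precedence of its top-level operator)
    for tok in tokens:
        if tok not in PRECEDENCE:
            stack.append((tok, ATOM))
            continue
        right, right_prec = stack.pop()  # popped first = right operand
        left, left_prec = stack.pop()
        prec = PRECEDENCE[tok]
        # Parenthesize only operands whose operator binds more loosely.
        # (Assumes and/or are associative and comparisons only take atoms.)
        if left_prec < prec:
            left = f"({left})"
        if right_prec < prec:
            right = f"({right})"
        stack.append((f"{left} {tok} {right}", prec))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0][0]


print(postfix_to_infix(
    "country be = language en = and message twitter contains and".split()))
# country = be and language = en and message contains twitter
```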
These steps could be combined, but I wanted to present two examples of this technique.
It might be easiest to work with a tree representation. Use the shunting yard algorithm to build a binary tree representing the expression. A node in the tree might be:
class Node {
    const OP = 'operator';
    const LEAF = 'leaf';
    public $type = null;  // either Node::OP or Node::LEAF
    public $op = null;    // for operators: 'or', 'and', 'contains' or '='
    public $value = null; // for leaves, e.g. 'twitter'
    public $left = null;
    public $right = null;
}
Alternatively, you could use subclasses. In the shunting yard algorithm, you want to change the output steps so that they build a tree instead.
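For illustration, a Python sketch of shunting yard adapted to build a tree: wherever the classic algorithm would output an operator, it instead combines that operator with the top two operand subtrees. It assumes space-separated tokens (parentheses included, e.g. "( country = be or country = nl )"), balanced parentheses, left-associative operators and a hypothetical precedence table; nodes are plain tuples (op, left, right) and leaves are strings, rather than the Node class.

```python
PRECEDENCE = {"or": 1, "and": 2, "=": 3, "contains": 3}


def parse(tokens):
    """Shunting yard that builds a binary tree of (op, left, right) tuples with str leaves."""
    operands, operators = [], []

    def reduce_top():
        # Where classic shunting yard would emit an operator, build a node instead.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for tok in tokens:
        if tok == "(":
            operators.append(tok)
        elif tok == ")":
            while operators[-1] != "(":
                reduce_top()
            operators.pop()  # discard the "("
        elif tok in PRECEDENCE:
            # Left-associative: first reduce operators of equal or higher precedence.
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]):
                reduce_top()
            operators.append(tok)
        else:
            operands.append(tok)
    while operators:
        if operators[-1] == "(":
            raise ValueError("unbalanced parentheses")
        reduce_top()
    if len(operands) != 1:
        raise ValueError("malformed expression")
    return operands[0]


tree = parse("( country = be or country = nl ) and message contains twitter".split())
print(tree)
```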
Once you have a tree representation you need several algorithms.
First you need an algorithm to copy a tree:
// copy(), findOr() and copyLR() are methods of the same helper class
public function copy($node) {
    if ($node->type == Node::LEAF) {
        $node2 = new Node();
        $node2->type = Node::LEAF;
        $node2->value = $node->value;
        return $node2;
    }
    // Operator node: copy both subtrees recursively
    $node2 = new Node();
    $node2->type = Node::OP;
    $node2->op = $node->op;
    $node2->left = $this->copy($node->left);
    $node2->right = $this->copy($node->right);
    return $node2;
}
Next, an algorithm to find the first 'or' operator node:
public function findOr($node) {
    if ($node->type == Node::OP && $node->op == 'or') {
        return $node;
    } else if ($node->type == Node::OP) {
        $leftRes = $this->findOr($node->left);
        if (is_null($leftRes)) {
            return $this->findOr($node->right); // will be null or a found node
        }
        return $leftRes; // found one on the left, no need to walk the rest of the tree
    }
    return null; // leaf
}
And finally, an algorithm copyLR that keeps either the left (true) or the right (false) branch. It behaves like copy, except that when the node is $target, a copy of its left or right branch is returned in its place.
public function copyLR($node, $target, $leftRight) {
    if ($node === $target) { // identity check; == would compare properties
        return $leftRight ? $this->copy($node->left) : $this->copy($node->right);
    }
    if ($node->type == Node::LEAF) {
        $node2 = new Node();
        $node2->type = Node::LEAF;
        $node2->value = $node->value;
        return $node2;
    }
    // Operator node: recurse with copyLR so the target is replaced wherever it is
    $node2 = new Node();
    $node2->type = Node::OP;
    $node2->op = $node->op;
    $node2->left = $this->copyLR($node->left, $target, $leftRight);
    $node2->right = $this->copyLR($node->right, $target, $leftRight);
    return $node2;
}
The pieces are then put together:
$root = parse(); // result from the parsing step
$queue = array($root);
$output = array();
while (count($queue) > 0) {
    $base = array_shift($queue);
    $target = $this->findOr($base);
    if (is_null($target)) {
        $output[] = $base; // no 'or' operators found, so output it
    } else {
        // an 'or' operator was found: split into its left and right alternatives
        $left = $this->copyLR($base, $target, true);
        $right = $this->copyLR($base, $target, false);
        array_push($queue, $left, $right); // push both onto the end of the queue
    }
}
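The same queue-driven loop can be sketched in Python. To stay short, this version assumes trees are nested tuples (op, left, right) with string leaves, not the Node class. It locates the target by its path from the root rather than by comparing nodes, because two equal but distinct subtrees (for example two identical OR clauses) would otherwise both match; since tuples are immutable, only the nodes along that path are rebuilt.

```python
from collections import deque


def find_or(node, path=()):
    """Return the path (child indices, 1 = left, 2 = right) to the first 'or' node, or None."""
    if isinstance(node, str):
        return None  # leaf
    op, left, right = node
    if op == "or":
        return path
    found = find_or(left, path + (1,))
    if found is not None:
        return found  # found one on the left, no need to walk the rest
    return find_or(right, path + (2,))


def copy_lr(node, path, branch):
    """Rebuild the nodes along `path`, replacing the node at its end by child `branch`."""
    if not path:
        return node[branch]
    parts = list(node)
    parts[path[0]] = copy_lr(node[path[0]], path[1:], branch)
    return tuple(parts)


def eliminate_or(root):
    """Split trees on their first 'or' until every queued tree is OR-free."""
    queue, output = deque([root]), []
    while queue:
        base = queue.popleft()
        path = find_or(base)
        if path is None:
            output.append(base)  # no 'or' operators found, so output it
        else:
            queue.append(copy_lr(base, path, 1))  # keep the left alternative
            queue.append(copy_lr(base, path, 2))  # keep the right alternative
    return output


country = ("or", ("=", "country", "be"), ("=", "country", "nl"))
language = ("or", ("=", "language", "en"), ("=", "language", "nl"))
root = ("and", ("and", country, language), ("contains", "message", "twitter"))
for tree in eliminate_or(root):
    print(tree)
```

Because the queue is processed first-in first-out, the four results come out in the order be/en, be/nl, nl/en, nl/nl.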
I have a piece of PHP code that I am trying to port over to Python that I am unsure how to get working without references.
Essentially it's a config class that works like a tree: each key can have a simple value, or its own set of keys and values. Part of the class requires being able to set one specific part of the tree without having to send an entire new dict for a root key.
{ "caching": { "enabled": true }}
For example, the above could be a simple configuration, and calling the code below would change true to false:
Config::set('caching:enabled', false);
In order to accomplish this in PHP, I use references:
class Config
{
    private static $aValues = array();

    public static function set($key, $value)
    {
        if (strpos($key, ':') !== false) {
            $aKeys = explode(':', $key);
            $iCount = count($aKeys);
        } else {
            $aKeys = array($key);
            $iCount = 1;
        }
        $mData = &self::$aValues; // $mData is a reference into the static array
        for ($i = 0; $i < $iCount; ++$i)
        {
            if (!isset($mData[$aKeys[$i]])) {
                $mData[$aKeys[$i]] = array();
            }
            $mData = &$mData[$aKeys[$i]]; // re-point the reference one level deeper
            if ($i == ($iCount - 1)) {
                $mData = $value; // writes through the reference into the tree
            }
        }
    }
}
But if I try to do something similar in Python:
_dmValues = dict()
def set(key, value):
    global _dmValues
    if ':' in key:
        aKey = key.split(':')
        iCount = len(aKey)
    else:
        aKey = (key,)
        iCount = 1
    mData = _dmValues
    for i in range(0, iCount):
        if aKey[i] not in mData.keys():
            mData[aKey[i]] = dict()
        mData = mData[aKey[i]]
        if i == (iCount - 1):
            mData = value
It doesn't work: mData is the right value, but once I assign to it, it is no longer a reference into the dict.
How can I go about doing this? Is it even possible in Python, or should I just re-write my logic from scratch and give up on a perfect port?
You can make your set method as follows:
_dmValues = { "caching": { "enabled": True }}
def set(key, value):
    key1, key2 = key.split(':')  # this version handles exactly two levels
    mData = _dmValues
    if key1 in mData:
        if key2 in mData[key1]:
            mData[key1][key2] = value
set('caching:enabled', False)
print(_dmValues) # {'caching': {'enabled': False}}
It would probably be better, though, to drop the global and pass a reference to the dict as an argument:
def set(mData, key, value):
    key1, key2 = key.split(':')
    if key1 in mData:
        if key2 in mData[key1]:
            mData[key1][key2] = value
set(_dmValues, 'caching:enabled', False)
print(_dmValues) # {'caching': {'enabled': False}}
I played around with it more and realised I had the solution, I was just applying it improperly.
Each dictionary, even if it's part of a key of another dictionary, can be passed around by reference. Meaning if I change a key in that dictionary, it will change in the parent as well. Unfortunately I was changing the variable that was a reference to the dictionary, not the dictionary itself.
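A tiny Python illustration of that distinction: assigning to a name only rebinds that name, while assigning to a key mutates the shared dictionary that every other name still sees.

```python
config = {"caching": {"enabled": True}}

inner = config["caching"]   # a second name for the same nested dict
inner["enabled"] = False    # mutates that dict, so the change shows up in config
inner = {"enabled": None}   # rebinds the local name only; config is unaffected

print(config)  # {'caching': {'enabled': False}}
```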
This works perfectly:
mData = _dmValues
for i in range(0, iCount):
    if i == (iCount - 1):
        mData[aKey[i]] = value  # assign into the dict itself, not to the name mData
    else:
        if aKey[i] not in mData:
            mData[aKey[i]] = dict()
        mData = mData[aKey[i]]
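Put together as a self-contained Python function, the idea looks roughly like this. The name set_path and the explicit config argument (instead of the _dmValues global) are illustrative choices; dict.setdefault creates missing intermediate levels, playing the role of the isset check in the PHP version.

```python
def set_path(config, key, value, sep=":"):
    """Set a nested value, e.g. set_path(cfg, 'caching:enabled', False)."""
    *parents, last = key.split(sep)  # "caching:enabled" -> ["caching"], "enabled"
    node = config
    for part in parents:
        # Walk down one level, creating an empty dict if the key is missing.
        node = node.setdefault(part, {})
    node[last] = value  # assign through the container, not to the loop variable


config = {"caching": {"enabled": True}}
set_path(config, "caching:enabled", False)
set_path(config, "db:pool:size", 5)
print(config)  # {'caching': {'enabled': False}, 'db': {'pool': {'size': 5}}}
```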
I have an array of symbols (not only characters, but also syllables, such as 'p', 'pa', etc.) and I'm trying to come up with a good algorithm to identify words that can be created by concatenating those symbols.
e.g. given the array of symbols ('p', 'pa', 'aw'), the string 'paw' would be a positive match.
This is my current implementation (too slow):
function isValidWord($word,&$symbols){
$nodes = array($word);
while (count($nodes)>0){
$node = array_shift($nodes);
$nodeExpansions = array();
$nodeLength = strlen($node);
if (in_array($node,$symbols)) { return true; }
for ($len=$nodeLength-1;$len>0;$len--){
if (in_array(substr($node, 0, $len), $symbols)){
$nodeExpansions[] = substr($node, $len-$nodeLength);
}
}
$nodes = array_merge($nodeExpansions,$nodes);
}
return false;
}
It doesn't seem like a difficult problem; it's just a depth-first search on an (acyclic?) tree, but I'm struggling to come up with an implementation that is both memory- and CPU-efficient. Where can I find resources to learn about this kind of problem?
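This is usually known as the "word break" problem, and a common way to make it efficient is dynamic programming over prefix positions: each position in the word is examined once, instead of once per way of reaching it, as happens in the search above. A minimal Python sketch (the function name is hypothetical), using a set for constant-time symbol lookups and limiting candidate lengths to the longest symbol:

```python
def is_valid_word(word, symbols):
    """True if `word` is a concatenation of one or more strings from `symbols`."""
    if not word:
        return False
    symbol_set = set(symbols)
    max_len = max(map(len, symbol_set), default=0)
    # reachable[i] is True when word[:i] can be built from symbols.
    reachable = [False] * (len(word) + 1)
    reachable[0] = True
    for end in range(1, len(word) + 1):
        # Only symbols of at most max_len characters can end at position `end`.
        for start in range(max(0, end - max_len), end):
            if reachable[start] and word[start:end] in symbol_set:
                reachable[end] = True
                break
    return reachable[len(word)]


print(is_valid_word("paw", ["p", "pa", "aw"]))  # True, since "paw" = "p" + "aw"
```

This does at most len(word) * max_len substring lookups, regardless of how many overlapping ways there are to split the word.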
Also, here is a link to a script for testing it and comparing it to the solutions proposed in the comments below: http://ideone.com/zQ9Cie
I've also run into some really odd results: how can my current iterative method be 12x faster than the recursive one (proposed by @Waleed Khan) when I run them on my dev server, but 2x slower when I run them on my production server, considering both servers have almost identical configurations? (One is an EC2 micro instance and the other a VirtualBox container, but they both have the same OS, config, updates, PHP version and config, number of cores and available RAM.)
Not sure whether it's very efficient, but I would create a loop with an inner loop, both going through the given array containing the symbols. (Note that this only checks concatenations of exactly two symbols.)
<?php
$aSymbols = array('p', 'pa', 'aw');
$aDatabase = array('paw');
$aMatches = array();
for ($iCounter = 0; $iCounter < count($aSymbols); $iCounter++)
{
for ($yCounter = 0; $yCounter < count($aSymbols); $yCounter++)
{
$sString = $aSymbols[$iCounter].$aSymbols[$yCounter];
if (in_array($sString, $aDatabase))
{
$aMatches[] = $sString;
}
}
}
?>
The if check could be replaced by a regex match, too.
As @Waleed Khan suggested, I've tried improving my algorithm by using a Trie structure for the dictionary instead of a plain array, to speed up the search for matches.
function generateTrie(&$dictionary){
if (is_string($dictionary)){
$dictionary = array($dictionary);
}
if (!is_array($dictionary)){
throw new Exception(
"Invalid input argument for \$dictionary (must be array)",
500
);
}
$trie = array();
$dictionaryCount = count($dictionary);
for ($i=0;$i<$dictionaryCount;$i++){
$word = $dictionary[$i];
if (!is_string($word)){
throw new Exception(
"Invalid input argument for \$word (must be string)",
500
);
}
$wordLength = strlen($word);
$subTrie = &$trie;
for ($j=1;$j<$wordLength;$j++){
if (array_key_exists($subWord = substr($word,0,$j),$subTrie)){
$subTrie = &$subTrie[$subWord];
}
}
if (array_key_exists($word,$subTrie)){
continue;
}
$keys = array_keys($subTrie);
$subTrie[$word] = array(); // $word is known to be absent here (see the check above)
foreach ($keys as $testWordForPrefix){
if (substr($testWordForPrefix,0,$wordLength) === $word){
$subTrie[$word][$testWordForPrefix] = &$subTrie[$testWordForPrefix];
unset($subTrie[$testWordForPrefix]);
}
}
}
return $trie;
}
/**
* Checks if word is on dictionary trie
*/
function inTrie($word, &$trie){
$wordLen = strlen($word);
$node = &$trie;
$found = false;
for ($i=1;$i<=$wordLen;$i++){
$index = substr($word,0,$i);
if (isset($node[$index])){
$node = &$node[$index];
$found = true;
} else {
$found = false;
}
}
return $found;
}
/**
* Checks if a $word is a concatenation of valid $symbols using inTrie()
*
* E.g. `$word = 'paw'`, `$symbols = array('p', 'pa', 'aw')` would return
* true, because `$word = 'p'.'aw'`
*
*/
function isValidTrieWord($word,&$trie){
$nodes = array($word);
while (count($nodes)>0){
$node = array_shift($nodes);
if (inTrie($node,$trie)) { return true; }
$nodeExpansions = array();
$nodeLength = strlen($node);
for ($len=$nodeLength-1;$len>0;$len--){
if (inTrie(substr($node, 0, $len), $trie)){
$nodeExpansions[] = substr($node, $len-$nodeLength);
}
}
$nodes = array_merge($nodeExpansions,$nodes);
}
return false;
}
It doesn't make much of a difference for small dictionaries (where preg_match is still the fastest implementation by several orders of magnitude). But for medium-sized dictionaries (~10,000 symbols) where longer symbols are usually combinations of shorter ones, which is where preg breaks down and the other two implementations can take close to 25 seconds per word of 2-6 symbols, the Trie search takes only about 1 second. That's close enough for my needs (checking whether a given password is a combination of symbols from a given dictionary).
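For comparison, a more conventional character-level trie can be combined with the prefix-position idea from dynamic programming: from each reachable position, a single walk down the trie finds every symbol that starts there, so a large dictionary of overlapping symbols costs one walk per position rather than one lookup per candidate length. A Python sketch under those assumptions (the nested-dict trie and the END marker are illustrative choices, not the array layout used above):

```python
END = object()  # key marking that the path from the root to this node spells a symbol


def build_trie(symbols):
    root = {}
    for sym in symbols:
        node = root
        for ch in sym:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def is_valid_trie_word(word, trie):
    """True if `word` is a concatenation of one or more symbols stored in `trie`."""
    if not word:
        return False
    reachable = [False] * (len(word) + 1)  # reachable[i]: word[:i] is buildable
    reachable[0] = True
    for start in range(len(word)):
        if not reachable[start]:
            continue
        node = trie
        for end in range(start, len(word)):
            node = node.get(word[end])
            if node is None:
                break  # no symbol continues with this character
            if END in node:
                reachable[end + 1] = True  # word[start:end + 1] is a symbol
    return reachable[len(word)]


trie = build_trie(["p", "pa", "aw"])
print(is_valid_trie_word("paw", trie))  # True
```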
(See the whole script on http://ideone.com/zQ9Cie)
I have a particularly large graph, making it nearly impossible to traverse using recursion because of the excessive amount of memory it uses.
Below is my depth-first function, using recursion:
public function find_all_paths($start, $path)
{
$path[] = $start;
if (count($path)==25) /* Only want a path of maximum 25 vertices*/ {
$this->stacks[] = $path;
return $path;
}
$paths = array();
for($i = 0; $i < count($this->graph[$start])-1; $i++) {
if (!in_array($this->graph[$start][$i], $path)) {
$paths[] = $this->find_all_paths($this->graph[$start][$i], $path);
}
}
return $paths;
}
I would like to rewrite this function so it is non-recursive. I assume I will need to make a queue of some sort, and pop off values using array_shift() but in which part of the function, and how do I make sure the queued vertices are preserved (to put the final pathway on $this->stacks)?
It doesn't take exponential space: the number of paths in a tree is equal to the number of leaves, since every leaf has exactly one path from the root.
Here is a simple DFS search for an arbitrary binary tree:
// DFS: Parent-Left-Right
public function dfs_search($head, $key)
{
    $stack = array($head);
    $solution = array();
    while (count($stack) > 0)
    {
        $node = array_pop($stack);
        if ($node->val == $key)
        {
            $solution[] = $node;
        }
        // push right first, so that left is popped (visited) first
        if ($node->right !== null)
        {
            array_push($stack, $node->right);
        }
        if ($node->left !== null)
        {
            array_push($stack, $node->left);
        }
    }
    return $solution;
}
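What you need to find all paths in a tree is simply Branch & Fork: whenever you branch, each branch takes its own copy of the current path. Here is a short recursive branch & fork:
// Branch & Fork: returns every root-to-leaf path
public function dfs_branchFork($node, array $path = array())
{
    $path[] = $node; // PHP arrays are passed by value, so each branch works on its own copy
    if ($node->left === null && $node->right === null) return array($path); // leaf: one finished path
    $paths = array();
    foreach (array($node->left, $node->right) as $child)
        if ($child !== null) $paths = array_merge($paths, $this->dfs_branchFork($child, $path));
    return $paths;
}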
I want to dynamically access the value of a variable. Let's say I have this array:
$aData = array(
'test' => 123
);
The standard approach to print the value of the test key would be:
print $aData['test'];
However, if I have to work with a string representation of the variable (for dynamic purposes):
$sItem = '$aData[\'test\']';
how can I print the value of aData's test key? Neither of the examples below works:
print $$sItem;
print eval($sItem);
What would be the solution?
Your eval example is missing a return statement:
print eval("return $sItem;");
should do it:
$aData['test'] = 'foo';
$sItem = '$aData[\'test\']';
print eval("return $sItem;"); # foo
But using eval is normally not recommended. It can land you in hell's kitchen, because eval is evil.
Instead just parse the string and return the value:
$aData['test'] = 'foo';
$sItem = '$aData[\'test\']';
$r = sscanf($sItem, '$%[a-zA-Z][\'%[a-zA-Z]\']', $vName, $vKey);
if ($r === 2)
{
$result = ${$vName}[$vKey];
}
else
{
$result = NULL;
}
print $result; # foo
This can be done with some other form of regular expression as well.
As your syntax is very close to PHP, and actually a subset of it, there is an alternative if you want to validate the input before using eval. The method is to check the string against PHP tokens and only allow a subset of them. This does not fully validate the string (e.g. its syntax, or whether a variable is actually set) but makes it more strict:
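function validate_tokens($str, array $valid)
{
    $vchk = array_flip($valid);
    $tokens = token_get_all(sprintf('<?php %s', $str));
    array_shift($tokens); // drop the T_OPEN_TAG token
    foreach ($tokens as $token) {
        // token_get_all() returns arrays for most tokens, plain strings for single characters like '['
        $id = is_array($token) ? $token[0] : $token;
        if (!isset($vchk[$id])) return false;
    }
    return true;
}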
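You just pass an array of valid tokens to that function. Those are PHP token IDs; in your case they are the following (the numbers are from the PHP version this was written for and differ between PHP versions, so prefer the constant names):
T_LNUMBER (305) (probably)
T_VARIABLE (309)
T_CONSTANT_ENCAPSED_STRING (315)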
You can then use it, and it works with more complicated keys as well:
$aData['test'] = 'foo';
$aData['te\\\'[]st']['more'] = 'bar';
$sItem = '$aData[\'test\']';
$vValue = NULL;
if (validate_tokens($sItem, array(T_VARIABLE, T_CONSTANT_ENCAPSED_STRING, '[', ']')))
{
$vValue = eval("return $sItem;");
}
I used this in another answer, to the question "reliably convert string containing PHP array info to array".
No eval necessary if you have (or can get) the array name and key into separate variables:
$aData = array(
'test' => 123
);
$arrayname = 'aData';
$keyname = 'test';
print ${$arrayname}[$keyname]; // 123
You can just use it like an ordinary array:
$key = "test";
print $aData[$key];
Likewise $aData could itself be an entry in a larger array store.
Alternatively, you could extract the potential array keys with a regex and traverse an anonymous array with references (you should have mentioned it in your question if that is your case). See Set multi-dimensional array by key path from array values? and similar topics.
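A rough Python analogue of that approach, assuming the variables live in a dict (standing in for PHP's symbol table) and that keys are written as bare words or as single- or double-quoted strings: one regex pulls out the variable name, another matches each [key] in turn, and a loop walks down the nested containers. The function name, the all-keys-are-strings rule and the None-on-missing behaviour are illustrative choices.

```python
import re

NAME_RE = re.compile(r"\$?(\w+)")
KEY_RE = re.compile(r"""\[\s*(?:'([^']*)'|"([^"]*)"|(\w+))\s*\]""")


def lookup(expr, variables):
    """Resolve e.g. "$aData['test']['more']" against a dict of variables; None if missing."""
    head = NAME_RE.match(expr)
    if head is None:
        raise ValueError(f"no variable name in {expr!r}")
    value = variables.get(head.group(1))
    pos = head.end()
    while pos < len(expr):
        m = KEY_RE.match(expr, pos)  # the next [key] must start exactly here
        if m is None:
            raise ValueError(f"malformed key at position {pos} in {expr!r}")
        key = next(g for g in m.groups() if g is not None)  # the alternative that matched
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
        pos = m.end()
    return value


variables = {"aData": {"test": 123, "nested": {"more": "bar"}}}
print(lookup("$aData['test']", variables))  # 123
```

Unlike eval, nothing in the string is executed; anything that is not a name followed by bracketed keys is rejected.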
Personally, I use a construct like this to handle dynamic variable paths like varname[keyname] instead (similar to how PHP interprets GET parameters). It's just an eval in sheep's clothing (though I don't agree with the eval scaremongering). Note that the /e modifier was deprecated in PHP 5.5 and removed in PHP 7, so this only runs on older versions:
$val = preg_replace("/^(\w+)(\[(\w+)])$/e", '$\1["\3"]', "aData[test]");
The most straightforward solution in your case is to use eval().
But please be very, very, very careful when doing this! eval() will evaluate (and execute) any argument you pass to it as PHP code. So if you feed it something that comes from users, anyone could execute any PHP code on your server, which, it goes without saying, is a security hole the size of the Grand Canyon!
Edit: you will have to put a "print" or "echo" inside the evaluated code somehow. It will either have to be in $sItem ($sItem = 'echo $aData[\'test\'];';) or you will have to write your eval() call like this: eval('echo ' . $sItem . ';');
$sItem = '$aData[\'test\']';
eval('$someVar = '.$sItem.';');
echo $someVar;
Use eval() with great caution, as others have already explained.
You could use this function:
function getRecursive($path, array $data) {
// transform "foo['bar']" and 'foo["bar"]' to "foo[bar]"
$path = preg_replace('#\[(?:"|\')(.+)(?:"|\')\]#Uis', '[\1]', $path);
// get root
$i = strpos($path, '[');
$rootKey = substr($path, 0, $i);
if (!isset($data[$rootKey])) {
return null;
}
$value = $data[$rootKey];
$length = strlen($path);
$currentKey = null;
for (; $i < $length; ++$i) {
$char = $path[$i];
switch ($char) {
case '[':
if ($currentKey !== null) {
throw new InvalidArgumentException(sprintf('Malformed path, unexpected "[" at position %u', $i));
}
$currentKey = '';
break;
case ']':
if ($currentKey === null) {
throw new InvalidArgumentException(sprintf('Malformed path, unexpected "]" at position %u', $i));
}
if (!isset($value[$currentKey])) {
return null;
}
$value = $value[$currentKey];
if (!is_array($value)) {
return $value;
}
$currentKey = null;
break;
default:
if ($currentKey === null) {
throw new InvalidArgumentException(sprintf('Malformed path, unexpected "%s" at position %u', $char, $i));
}
$currentKey .= $char;
break;
}
}
if ($currentKey !== null) {
throw new InvalidArgumentException('Malformed path, must end with "]"');
}
return $value;
}
My problem is that it can't detect the individual words in a string. It's similar to a filter, tag or category, or something along those lines.
$title = "What iS YouR NAME?";
$english = Array( 'Name', 'Vacation' );
if(in_array(strtolower($title),$english)){
$language = 'english';
} else if(in_array(strtolower($title),$france)){
$language = 'france';
} else if(in_array(strtolower($title),$spanish)){
$language = 'spanish';
} else if(in_array(strtolower($title),$chinese)){
$language = 'chinese';
} else if(in_array(strtolower($title),$japanese)){
$language = 'japanese';
} else {
$language = null;
}
The output is null. =/
No real problem here... the string in $title isn't in any of the arrays you are testing with, even in lower case.
Try testing every word in each language array against the string instead.
$english = Array( 'Name', 'Vacation' );
$languages = array('english'=>$english,
                   'france'=>$france,
                   'spanish'=>$spanish,
                   'chinese'=>$chinese,
                   'japanese'=>$japanese);
while($line = mysqli_fetch_assoc($result)) { // $result from mysqli_query(); the old mysql_* functions were removed in PHP 7
    $title = $line['title'];
    // no strtolower() needed: stristr() is case-insensitive
    $language_found = false;
    foreach($languages as $language_name=>$language) {
        $idx = 0;
        while(($language_found === false) && $idx < count($language)) {
            $word = $language[$idx];
            if(stristr($title, $word) !== false) {
                $language_found = $language_name;
            }
            $idx++;
        }
    }
    // got language name or false...
}
You could also break the string up using explode, of course, and test each word in the resulting array against each language.
The output is null because the string "what is your name?" is not in any of the language arrays. Note that you should not hard-code the language names in your control logic.
Instead, a dictionary or array of languages (in the form of dictionaries or objects) allows future extension and separates data from control logic.
There are several logic issues:
The first problem is that you are trying to see if a multi-word string is in an array of single words. That will always fail to find a match. You would need to break up $title with explode and loop over the words.
i.e.:
$title_words = explode(' ', strtolower($title));
foreach($title_words as $word){
    //check language
    //now you can use in_array and expect some matches
}
However, that presents a second issue: what if a word is in multiple languages?
A third issue is that in your sample you have converted your search string to lowercase, but all the words in your array of matches start with an uppercase letter.
i.e. if you expect matches,
$english = Array( 'Name', 'Vacation' );
should be
$english = array( 'name', 'vacation' );
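Putting those three fixes together, a small Python sketch: the word lists live in one data structure (the language names and words below are made-up sample data, not from the question), everything is lowercased, the title is split into words, and each language is scored by how many of its words appear. Picking the best score is one simple way to handle words that belong to several languages; ties go to the language listed first.

```python
import re

# Sample word lists (illustrative only); in practice these would be loaded from data.
LANGUAGES = {
    "english": {"name", "vacation", "what", "your"},
    "spanish": {"nombre", "vacaciones", "cuál", "tu"},
}


def detect_language(title, languages=LANGUAGES):
    """Return the language whose word list matches the most words in `title`, or None."""
    words = set(re.findall(r"\w+", title.lower()))  # lowercase words, punctuation dropped
    scores = {lang: len(words & vocab) for lang, vocab in languages.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return None  # no word matched any language
    return best


print(detect_language("What iS YouR NAME?"))  # english
```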