PHP Search systems - how to use LIKE - php

The following code fetches the TITLE attribute for each post. I want to compare it to a search phrase $f using something similar to the way LIKE works with SQL.
<?php
$terms = $_GET['f'];
$searchs = get_posts($args);
foreach($searchs as $search){
$title = get_the_title($search);
Then we would need something like:
if($title is LIKE $f) { }
How would it work?

From the way you word this, I presume it cannot be done in the database with an actual LIKE or full-text search. I think what you're probably after is Regular Expressions, which are a little more advanced than the pattern matching provided by the SQL "LIKE" operator, but can be used for the same purpose.
PHP integrates the "Perl Compatible Regular Expressions" library. Have a look here: http://uk.php.net/manual/en/book.pcre.php or look for introductions to standard regular expression syntax online.
As a simple case, you might want to do this:
// Compose your matching pattern
// Be sure to escape user-supplied parts
// ".*" is roughly equivalent to "%" in LIKE
// the "/i" modifier means "case insensitive"
$pattern = '/.*' . preg_quote($_GET['f'], '/') . '.*/i'; // pass the delimiter so "/" in the input is escaped too
$searches = get_posts($args);
foreach($searches as $search)
{
$title = get_the_title($search);
if ( preg_match($pattern, $title) )
{
# FOUND A MATCH!
}
}

LIKE is an SQL operator, so it has to be used inside an SQL statement, like so:
SELECT * FROM posts WHERE title LIKE '%keyword%';
If you want to do your search in PHP, you can use strpos (or stripos for a case-insensitive match, which is closer to how LIKE usually behaves in MySQL) like so:
if(strpos($title, $keyword) !== FALSE) { }
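If you want the actual LIKE wildcards rather than a plain substring test, one option is to translate the LIKE pattern into a regular expression. This is only a sketch: likeToRegex() and isLike() are hypothetical helper names, and the LIKE escape character is not handled.

```php
<?php
// Hypothetical helper: translate an SQL LIKE pattern into an anchored PCRE pattern.
function likeToRegex(string $like): string
{
    $regex = '';
    // Split on the wildcards but keep them in the result
    foreach (preg_split('/([%_])/', $like, -1, PREG_SPLIT_DELIM_CAPTURE) as $part) {
        if ($part === '%') {
            $regex .= '.*';                    // % matches any run of characters
        } elseif ($part === '_') {
            $regex .= '.';                     // _ matches exactly one character
        } else {
            $regex .= preg_quote($part, '/');  // everything else is literal text
        }
    }
    // i: case-insensitive, s: "." also matches newlines, D: "$" only at the very end
    return '/^' . $regex . '$/isD';
}

// Hypothetical helper: true when $subject matches the LIKE pattern $like.
function isLike(string $subject, string $like): bool
{
    return preg_match(likeToRegex($like), $subject) === 1;
}
```

With this, isLike($title, '%' . $_GET['f'] . '%') behaves like title LIKE '%keyword%', except that any % or _ typed by the user is also treated as a wildcard.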

You need to look up MySQL or PostgreSQL full-text search. LIKE isn't what you want if you plan on doing "fuzzy" searches or need more power than simple string comparisons.

Related

Using each match in preg_replace() in PHP

I've read through multiple tutorials on regex and its associated functions but I am stumped on this one.
I have a really simple replace that looks for a specific delimiter and parses the name of a PHP variable. Here it is:
var_dump(preg_replace('/{{\$(.*?)}}/', ${$1}, $this->file));
I keep getting errors about php not liking the #1 in ${$1}. Fair enough, can't start a variable name with a number, I knew that...
So I tried:
var_dump(preg_replace('/{{\$(.*?)}}/', ${'$1'}, $this->file));
Same thing.
Yet if I try:
var_dump(preg_replace('/{{\$(.*?)}}/', '$1 yo', $this->file));
It works...
So, how do I get PHP to echo a variable named whatever $1 is?
For example:
$hola = yo;
$string = hello{{$hola}}hello{{$hola}};
var_dump(preg_replace('/{{\$(.*?)}}/', ${$1}, $string));
And the output would be:
helloyohelloyo
Spank you!
EDIT
I should also mention that I am aware that there is a standard recommendation on how to match PHP variables with regex, but I'd like to get it working with a regex that I fully understand first.
Like so:
$hola = 'yo';
$string = 'hello{{$hola}}hello{{$hola}}';
$result = preg_replace_callback('/\{\{\$(.*?)\}\}/', function ($matches) use ($hola) {
return ${$matches[1]};
}, $string);
var_dump($result);
preg_replace_callback calls a callback on every match.
In order to use the $hola variable inside the callback you need to explicitly make it available inside the function (use ($hola)).
All this said... I don't get it. What this code does is essentially what PHP already does out-of-the-box.
$hola = 'yo';
$string = "hello{$hola}hello{$hola}";
echo $string; // "helloyohelloyo"
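If the template comes from a file, so that plain string interpolation is not an option, a variant of the callback approach is to look placeholders up in an explicit array instead of resolving arbitrary variable names. A minimal sketch, assuming a hypothetical render() helper and placeholders that follow PHP's variable naming rules:

```php
<?php
// Hypothetical helper: replace {{$name}} placeholders with values from $vars.
function render(string $template, array $vars): string
{
    return preg_replace_callback(
        '/\{\{\$([A-Za-z_][A-Za-z0-9_]*)\}\}/',
        function (array $m) use ($vars): string {
            // Unknown placeholders are left untouched instead of raising a notice
            return array_key_exists($m[1], $vars) ? (string) $vars[$m[1]] : $m[0];
        },
        $template
    );
}

// render('hello{{$hola}}hello{{$hola}}', ['hola' => 'yo']) returns "helloyohelloyo"
```

Only the names present in the array can ever be substituted, so a template cannot read other variables that happen to be in scope.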

PHP regex parsing - splitting tokens in my own language. Is there a better way?

I am creating my own language.
The goal is to "compile" it to PHP or Javascript, and, ultimately, to interpret and run it on the same language, to make it look like a "middle-level" language.
Right now, I'm focusing on interpreting and running it in PHP.
At the moment, I'm using regex to split the string and extract the multiple tokens.
This is the regex I have:
/\:((?:cons#(?:\d+(?:\.\d+)?|(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|(?:[a-z]+(?:#[a-z]+)?|\^?[\~\&](?:[a-z]+|\d+|\-1)))/g
This is quite hard to read and maintain, even though it works.
Is there a better way of doing this?
Here is an example of the code for my language:
:define:&0:factorial
:param:~0:static
:case
:lower#equal:cons#1
:case:end
:scope
:return:cons#1
:scope:end
:scope
:define:~0:static
:define:~1:static
:require:static
:call:static#sub:^~0:~1 :store:~0
:call:&-1:~0 :store:~1
:call:static#sum:^~0:~1 :store:~0
:return:~0
:scope:end
:define:end
This defines a recursive function to calculate the factorial (not so well written, that isn't important).
The goal is to get what is after the :, including the #. :static#sub is a whole token, saving it without the :.
Everything is the same, except for the token :cons, which can take a value after. The value is a numerical value (integer or float, called static or dynamic in the language, respectively) or a string, which must start and end with ", supporting escaping like \". Multi-line strings aren't supported.
Variables are the ones with ~0, using ^ before will get the value to the above :scope.
Functions are similar, being used &0 instead and &-1 points to the current function (no need for ^&-1 here).
That said, is there a better way to get the tokens?
Here you can see it in action: http://regex101.com/r/nF7oF9/2
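[Update] To address the pattern's complexity and maintainability, you can split it across several lines using the x modifier (PCRE_EXTENDED) and add comments:
preg_match_all('/
# read constant (?)
\:((?:cons\#(?:\d+(?:\.\d+)?|
# read a string (?)
(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|
# read an identifier (?)
(?:[a-z]+(?:\#[a-z]+)?|
# read whatever
\^?[\~\&](?:[a-z]+|\d+|\-1)))
/x', $input, $matches);
Beware that in extended mode all whitespace in the pattern is ignored unless it is escaped or inside a character class (\n is normally "safe"), and an unescaped # starts a comment, which is why the literal # characters are written as \# here. PHP also has no g modifier; preg_match_all is what collects every match.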
Now, if you want to improve your lexer and parser, read on:
What (f)lex [the free equivalent of LEX] does is simply let you pass a list of regexps, each optionally assigned to a "group". You can also try ANTLR and its PHP target runtime to get the work done.
As for your request, I've made a lexer in the past, following the principle of FLEX. The idea is to cycle through the regexps like FLEX does:
// Each regexp must be anchored at the start of the input (e.g. '/^\s+/')
// and must never match the empty string, or the loop will not advance.
$regexp = [reg1 => STRING, reg2 => ID, reg3 => WS];
$input = ...;
$tokens = [];
while ($input !== '') {
    $best = null;
    $k = null;
    foreach ($regexp as $re => $kind) {
        if (preg_match($re, $input, $match)) {
            $best = $match[0];
            $k = $kind;
            break;
        }
    }
    if (null === $best) {
        throw new Exception("could not analyze input, invalid token");
    }
    $tokens[] = ['kind' => $k, 'value' => $best];
    $input = substr($input, strlen($best)); // move.
}
Since FLEX and Yacc/Bison integrate, the usual pattern is to read only up to the next token (that is, they don't loop over the whole input before parsing).
The $regexp array can be anything. I expected it to be a "regexp" => "kind" key/value map, but you can also use an array like this:
$regexp = [['reg' => '...', 'kind' => STRING], ...]
You can also enable/disable regexps using groups (the way FLEX groups work). For example, consider the following code:
class Foobar {
const FOOBAR = "arg";
function x() {...}
}
There is no need to activate the string regexp until you need to read an expression (here, the expression is what comes after the "="). And there is no need to activate the class identifier regexp when you are already inside a class.
FLEX's groups make it possible to read comments: a first regexp activates a group that ignores the other regexps until a given match is found (like "*/").
Note that this is a naïve approach: a lexer like FLEX actually generates an automaton, which uses different states to represent your needs (a regexp is itself an automaton).
It uses an algorithm based on packed indexes or something similar (I used the naïve "for each" because I did not understand the algorithm well enough), which is memory- and speed-efficient.
As I said, it was something I made in the past - something like 6/7 years ago.
It was on Windows.
It was not particularly quick (well it is O(N²) because of the two loops).
I also think that PHP was compiling the regexp each time. Now that I do Java, I use the Pattern implementation, which compiles the regexp once and lets you reuse it. I don't know whether PHP does the same by first looking in a regexp cache for an already compiled regexp.
I was using preg_match with an offset, to avoid doing the substr($input, ...) at the end.
You should try the ANTLR3 PHP code generation target, since the ANTLR grammar editor is pretty easy to use, and you will end up with much more readable and maintainable code :)
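For reference, the offset-based variant mentioned above (preg_match with an offset instead of substr) could look roughly like the sketch below. The token kinds and the rule table are assumptions made up for the sample language in the question, not something FLEX or ANTLR prescribes. The \G anchor makes each pattern match only at the current offset.

```php
<?php
// Sketch of a table-driven lexer; rule names and patterns are hypothetical.
function tokenize(string $input): array
{
    // Order matters: the ":cons#..." rules must come before the generic WORD rule.
    $rules = [
        'WS'     => '/\G\s+/',
        'NUMBER' => '/\G:cons#\d+(?:\.\d+)?/',
        'STRING' => '/\G:cons#"(?:\\\\.|[^"\\\\\r\n])*"/',
        'REF'    => '/\G:\^?[~&](?:-1|\d+|[a-z]+)/',
        'WORD'   => '/\G:[a-z]+(?:#[a-z]+)?/',
    ];
    $tokens = [];
    $offset = 0;
    $length = strlen($input);
    while ($offset < $length) {
        $matched = false;
        foreach ($rules as $kind => $re) {
            // The fifth argument is the offset at which \G anchors the match
            if (preg_match($re, $input, $m, 0, $offset) === 1 && $m[0] !== '') {
                if ($kind !== 'WS') {
                    // Store the token without its leading ":"
                    $tokens[] = ['kind' => $kind, 'value' => substr($m[0], 1)];
                }
                $offset += strlen($m[0]);
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            throw new RuntimeException("invalid token at offset $offset");
        }
    }
    return $tokens;
}
```

Each rule stays short enough to read on its own, and adding a token kind means adding one line to the table rather than another alternative inside a single large pattern.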

safely using the eval function in php: modifying user input to avoid security issues

I am taking over some webgame code that uses the eval() function in PHP. I know that this is potentially a serious security issue, so I would like some help vetting the code that checks its argument before I decide whether or not to nix that part of the code. Currently I have removed this section of code from the game until I am sure it's safe, but the loss of functionality is not ideal. I'd rather security-proof this than redesign the entire segment to avoid using eval(), assuming such a thing is possible. The relevant code snippet, which supposedly prevents malicious code injection, is below. $value is a user-input string which we know does not contain ";".
1 $value = eregi_replace("[ \t\r]","",$value);
2 $value = addslashes($value);
3 $value = ereg_replace("[A-z0-9_][\(]","-",$value);
4 $value = ereg_replace("[\$]","-",$value);
5 #eval("\$val = $value;");
Here is my understanding so far:
1) removes all whitespace from $value
2) escapes characters that would need it for a database call (why this is needed is not clear to me)
3) looks for alphanumeric characters followed immediately by \ or ( and replaces the combination of them with -. Presumably this is to remove anything resembling function calls in the string, though why it also removes the character preceding is unclear to me, as is why it would also remove \ after line 2 explicitly adds them.
4) replaces all instances of $ with - in order to avoid anything resembling references to php variables in the string.
So: have any holes been left here? And am I misunderstanding any of the regex above? Finally, is there any way to security-proof this without excluding ( characters? The string to be input is ideally a mathematical formula, and allowing ( would allow for manipulation of order of operations, which currently is impossible.
Evaluate the code inside a VM - see Runkit_Sandbox
Or create a parser for your math. I suggest you use the built-in tokenizer. You would need to iterate tokens and keep track of brackets, T_DNUMBER, T_LNUMBER, operators and maybe T_CONSTANT_ENCAPSED_STRING. Ignore everything else. Then you can safely evaluate the resulting expression.
A quick google search revealed this library. It does exactly what you want...
A simple example using the tokenizer:
$tokens = token_get_all("<?php {$input}");
$expr = '';
foreach($tokens as $token){
if(is_string($token)){
if(in_array($token, array('(', ')', '+', '-', '/', '*'), true))
$expr .= $token;
continue;
}
list($id, $text) = $token;
if(in_array($id, array(T_DNUMBER, T_LNUMBER)))
$expr .= $text;
}
$result = eval("return {$expr};"); // eval() takes no opening tag and needs "return" to yield a value
This will only work if the input is a valid math expression. Otherwise you'll get a parse error in your eval'd code because of empty brackets and the like. If you need to handle this too, sanitize the output expression inside another loop. This should take care of most of the invalid parts:
while(strpos($expr, '()') !== false)
$expr = str_replace('()', '', $expr);
$expr = trim($expr, '+-/*');
Matching what is allowed instead of removing some characters is the best approach here.
I see that you do not filter ` (backtick) that can be used to execute system commands. God only knows what else is also not prevented by trying to sanitize the string... No matter how many holes are found, there is no guarantee that there cannot be more.
Assuming that your language is not quite complex, it may not be that hard to implement it yourself without the use of eval.
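As an illustration of how small such an implementation can be, here is a sketch of a recursive-descent evaluator for +, -, *, / and parentheses that never calls eval. The class name MathEvaluator is hypothetical, and typed properties mean it assumes PHP 7.4 or later.

```php
<?php
// Sketch: evaluate arithmetic expressions without eval().
final class MathEvaluator
{
    private array $tokens = [];
    private int $pos = 0;

    public function evaluate(string $input): float
    {
        // \G chains the matches, so tokenizing stops at the first unsupported character
        preg_match_all('/\G\s*(\d+(?:\.\d+)?|[-+*\/()])\s*/', $input, $m);
        if (strlen(implode('', $m[0])) !== strlen($input)) {
            throw new InvalidArgumentException('Unsupported character in expression');
        }
        $this->tokens = $m[1];
        $this->pos = 0;
        $value = $this->expression();
        if ($this->pos !== count($this->tokens)) {
            throw new InvalidArgumentException('Unexpected token');
        }
        return $value;
    }

    private function peek(): ?string
    {
        return $this->tokens[$this->pos] ?? null;
    }

    // expression := term (('+' | '-') term)*
    private function expression(): float
    {
        $value = $this->term();
        while (in_array($this->peek(), ['+', '-'], true)) {
            $op = $this->tokens[$this->pos++];
            $rhs = $this->term();
            $value = $op === '+' ? $value + $rhs : $value - $rhs;
        }
        return $value;
    }

    // term := factor (('*' | '/') factor)*
    private function term(): float
    {
        $value = $this->factor();
        while (in_array($this->peek(), ['*', '/'], true)) {
            $op = $this->tokens[$this->pos++];
            $rhs = $this->factor();
            if ($op === '/' && $rhs == 0.0) {
                throw new InvalidArgumentException('Division by zero');
            }
            $value = $op === '*' ? $value * $rhs : $value / $rhs;
        }
        return $value;
    }

    // factor := number | '-' factor | '(' expression ')'
    private function factor(): float
    {
        $token = $this->peek();
        if ($token === null) {
            throw new InvalidArgumentException('Unexpected end of expression');
        }
        $this->pos++;
        if ($token === '-') {
            return -$this->factor();
        }
        if ($token === '(') {
            $value = $this->expression();
            if ($this->peek() !== ')') {
                throw new InvalidArgumentException('Missing closing parenthesis');
            }
            $this->pos++;
            return $value;
        }
        if (is_numeric($token)) {
            return (float) $token;
        }
        throw new InvalidArgumentException("Unexpected token '$token'");
    }
}
```

Because the input is only ever read as numbers and operators, things like function names, backticks or $ simply fail to tokenize and are rejected, while parentheses remain available for controlling the order of operations.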
The following code is our own attempt to answer the same sort of question:
$szCode = "whatever code you would like to submit to eval";
/* First check against language construct or instructions you don't allow such as (but not limited to) "require", "include", ..." : a simple string search will do */
if ( illegalInstructions( $szCode ) )
{
die( "ILLEGAL" );
}
/* This simple regex detects functions (spend more time on the regex to
fine-tune the function detection if needed) */
if ( preg_match_all( '/(?P<fnc>[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*) ?\(.*?\)/si',$szCode,$aFunctions,PREG_PATTERN_ORDER ) )
{
/* For each function call */
foreach( $aFunctions['fnc'] as $szFnc )
{
/* Check whether we can accept this function */
if ( ! isFunctionAllowed( $szFnc ) )
{
die( "'{$szFnc}' IS ILLEGAL" );
} /* if ( ! isFunctionAllowed( $szFnc ) ) */
}
}
/* If you got up to here ... it means that you accept the risk of evaluating
the PHP code that was submitted */
eval( $szCode );

Custom REGEXP Function to be used in a SQLITE SELECT Statement

I have an SQLite database file in which one of the table columns contains some simple regular expressions.
These Expressions are something like /foo(.?) or /foo/bar/(.?) and so on...
Well, when we try to match some text against a Regular Pattern, in PHP, we do:
preg_match( $pattern, $target, $matches )
Replacing the variables with the content, obviously.
What I would like to do is send ANY STRING as value of a WHERE Clause and, when searching the SQLITE Database's File, use each of the stored Regular Expressions to match a pattern in the given string.
I think that using PHP's sqlite_create_function() I can create some kind of routine to do this, but I don't know exactly how, since this is the first time I've developed with SQLite.
In case it's of interest, this is part of the MVC routing of a framework I'm developing.
Thank you very much, in advance.
You can use SQLiteDatabase::createFunction or PDO::sqliteCreateFunction.
I did something like this:
<?php
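function _sqliteRegexp($string, $pattern) {
    // SQLite evaluates "X REGEXP Y" as regexp(Y, X), so with the query below
    // $string is the URL literal and $pattern is the stored column value.
    // "~" is the delimiter because the stored patterns contain "/".
    return (int) preg_match('~^' . $pattern . '$~i', $string);
}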
$PDO->sqliteCreateFunction('regexp', '_sqliteRegexp', 2);
?>
Use:
SELECT route FROM routes WHERE pattern REGEXP 'your/url/string' LIMIT 1
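A self-contained way to try this out is an in-memory database. The sketch below assumes the pdo_sqlite extension is loaded; the routes table, its columns and the helper names createRouteDb() and findRoute() are made up for the example. Unlike the query above, it puts the URL on the left of REGEXP, so the callback receives the stored pattern first.

```php
<?php
// Sketch only: table, column and function names are hypothetical.
function createRouteDb(): PDO
{
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // SQLite rewrites "X REGEXP Y" as regexp(Y, X): the right-hand side arrives first.
    $pdo->sqliteCreateFunction('regexp', function ($pattern, $subject) {
        // "~" delimiter, since the stored patterns contain "/"
        return (int) preg_match('~^' . $pattern . '$~i', (string) $subject);
    }, 2);
    $pdo->exec('CREATE TABLE routes (pattern TEXT, route TEXT)');
    $insert = $pdo->prepare('INSERT INTO routes (pattern, route) VALUES (?, ?)');
    $insert->execute(['/foo(.?)', 'foo']);
    $insert->execute(['/foo/bar/(.?)', 'foobar']);
    return $pdo;
}

// Return the route whose stored pattern matches $url, or null if none does.
function findRoute(PDO $pdo, string $url): ?string
{
    $stmt = $pdo->prepare('SELECT route FROM routes WHERE :url REGEXP pattern LIMIT 1');
    $stmt->execute([':url' => $url]);
    $route = $stmt->fetchColumn();
    return $route === false ? null : $route;
}
```

A stored pattern that itself contains "~" would break the delimiter, so in real code the patterns should be validated when they are inserted.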
Is this what you are looking for?
e.g. SELECT * FROM `my_table` WHERE `my_column` REGEXP "\/foo(.?)"

match any part of a URL

Can anyone help me figure out how to search a URL and print a class based on its result for example:
http://www.?.com/garden-design.html
What I am trying to achieve is a switch in PHP that matches a term such as "garden" after the first slash and then prints a class. If it matched "construction" it would print a different class. There are only three, so I know which words to search for. This doesn't have to be dynamic.
Any help would be appreciated.
If it does not have to be dynamic, you could do it like this:
switch(true)
{
    // stripos() returns an offset (possibly 0) or false, so compare explicitly
    case stripos($_SERVER['REQUEST_URI'], 'garden') !== false:
        return 'garden';
    case stripos($_SERVER['REQUEST_URI'], 'construction') !== false:
        return 'construction';
    default:
        return 'default';
}
The above is quite explicit. A more compact solution could be
function getCategory($categories)
{
    foreach($categories as $category) {
        if (stripos($_SERVER['REQUEST_URI'], $category) !== false) {
            return $category;
        }
    }
    return null; // none of the keywords was found
}
$categories = array('garden', 'construction', 'indoor');
echo getCategory($categories);
This will not give you the first word after /, but just check if one of your keywords exists in the requested URI and return it.
You could also use parse_url to split the URI into its components and work with string functions on the path component, e.g.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
For a request to www.example.com/garden-design.html?foo=bar, this would give you just the path /garden-design.html, without the query string. In your scenario you'd probably still do the stripos on it then, so you can just as well do it directly on the URL instead of parsing it.
I'd have thought that making use of parse_url would probably be a good starting point. You could then explode the path component and do simple strpos comparisons against the three strings you're looking for, if a simple switch statement isn't sufficient. (Be sure to check for !== false if you go down the strpos route.)
This would potentially be faster than a regex based solution.
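A rough sketch of that parse_url idea, checking only the first path segment. The function name cssClassForUrl() and the 'default' fallback are assumptions for illustration:

```php
<?php
// Hypothetical helper: map the first path segment of a URL to a class name.
function cssClassForUrl(string $url, array $keywords, string $default = 'default'): string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return $default; // no path component, or the URL could not be parsed
    }
    // First path segment, e.g. "garden-design.html"
    $segments = explode('/', ltrim($path, '/'));
    $first = $segments[0];
    foreach ($keywords as $keyword) {
        // === 0 means the segment starts with the keyword
        if (stripos($first, $keyword) === 0) {
            return $keyword;
        }
    }
    return $default;
}
```

Called as cssClassForUrl($_SERVER['REQUEST_URI'], array('garden', 'construction', 'indoor')), it only looks at what follows the first slash, so a keyword appearing later in the path or in the query string does not count.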
You can try a regular expression:
/http:\/\/www\..+\.com\/(garden|construction|indoor)(-design)?\.html/
Then the first capture group ($matches[1] when using preg_match) would give you garden, construction or indoor as a string.
You can also use (.+) to collect any string at that spot.
Update: made "-design" optional.
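In PHP this might be wired up as follows. It is a sketch with a hypothetical function name, using # as the delimiter so the slashes in the URL need no escaping, and [^/]+ instead of .+ so the host part cannot run past the first slash:

```php
<?php
// Hypothetical helper: return the matched category, or null when the URL does not match.
function categoryFromUrl(string $url): ?string
{
    $pattern = '#^https?://www\.[^/]+\.com/(garden|construction|indoor)(?:-design)?\.html#i';
    return preg_match($pattern, $url, $matches) === 1 ? strtolower($matches[1]) : null;
}
```

The returned string can then be printed directly as the class name, with null standing for "no match".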
Take a look at the php function strpos(). With it you could do something like:
$url = "http://www.nothing.com/garden-design.html";
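// strpos() returns the match offset or false, so compare with !== false
if (strpos($url, "/garden-design.html") !== false)
    doSomething();
else if (strpos($url, "/construction-design.html") !== false)
    doSomethingElse();
else if (strpos($url, "/indoor-design.html") !== false)
    doSomethingElseInstead();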
String matching is generally quicker and more efficient than regular expressions.
