Custom REGEXP Function to be used in a SQLITE SELECT Statement

I have an SQLite database file in which one of the table columns holds some simple regular expressions.
These expressions look something like /foo(.?) or /foo/bar/(.?), and so on.
When we try to match some text against a regular expression in PHP, we do:
preg_match( $pattern, $target, $matches )
Replacing the variables with the content, obviously.
What I would like to do is pass any string as the value in a WHERE clause and, when searching the SQLite database, use each of the stored regular expressions to try to match the given string.
I think I can create some kind of routine for this with PHP's sqlite_create_function(), but I don't know exactly how, since this is the first time I have developed with SQLite.
In case you are interested, it's part of the MVC routing of a framework I'm developing.
Thank you very much in advance.

You can use SQLiteDatabase::createFunction or PDO::sqliteCreateFunction.
I did something like this:
<?php
// SQLite evaluates "X REGEXP Y" as regexp(Y, X), so with the query below
// the URL string arrives first and the stored pattern second.
function _sqliteRegexp($string, $pattern) {
    // '#' is the delimiter because the stored patterns contain '/'.
    return preg_match('#^' . $pattern . '$#i', $string) ? 1 : 0;
}
$PDO->sqliteCreateFunction('regexp', '_sqliteRegexp', 2);
?>
Use:
SELECT route FROM routes WHERE pattern REGEXP 'your/url/string' LIMIT 1
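For a self-contained check of the idea, here is a minimal sketch that wires the same kind of callback into an in-memory database. The routes table layout, its two rows, and the findRoute() helper are assumptions made up for illustration; the only SQLite behaviour relied on is that X REGEXP Y is evaluated as regexp(Y, X).

```php
<?php
// In-memory database so the sketch leaves no file behind.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// "pattern REGEXP :url" becomes regexp(:url, pattern): URL first, stored pattern second.
$pdo->sqliteCreateFunction('regexp', function ($url, $pattern) {
    // '#' delimiter because the stored patterns contain '/'.
    return preg_match('#^' . $pattern . '$#i', $url) ? 1 : 0;
}, 2);

// Hypothetical schema and rows, for illustration only.
$pdo->exec('CREATE TABLE routes (pattern TEXT, route TEXT)');
$insert = $pdo->prepare('INSERT INTO routes (pattern, route) VALUES (?, ?)');
$insert->execute(['/foo/bar/(.?)', 'FooBarController']);
$insert->execute(['/foo(.?)', 'FooController']);

// Hypothetical helper: returns the first matching route name, or null.
function findRoute(PDO $pdo, $url) {
    $stmt = $pdo->prepare('SELECT route FROM routes WHERE pattern REGEXP ? LIMIT 1');
    $stmt->execute([$url]);
    $route = $stmt->fetchColumn();
    return $route === false ? null : $route;
}

$route = findRoute($pdo, '/foo/bar/x');
```

Binding the URL as a parameter keeps user input out of the SQL text. The stored patterns are still trusted as regular expressions, so they should only come from your own code.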

Is this what you are looking for?
e.g. SELECT * FROM `my_table` WHERE `my_column` REGEXP "\/foo(.?)"

Related

PHP regex parsing - splitting tokens in my own language. Is there a better way?

I am creating my own language.
The goal is to "compile" it to PHP or JavaScript and, ultimately, to interpret and run it in the same language, to make it look like a "middle-level" language.
Right now, I'm focusing on interpreting it in PHP and running it.
At the moment, I'm using a regex to split the string and extract the tokens.
This is the regex I have:
/\:((?:cons#(?:\d+(?:\.\d+)?|(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|(?:[a-z]+(?:#[a-z]+)?|\^?[\~\&](?:[a-z]+|\d+|\-1)))/g
This is quite hard to read and maintain, even though it works.
Is there a better way of doing this?
Here is an example of the code for my language:
:define:&0:factorial
:param:~0:static
:case
:lower#equal:cons#1
:case:end
:scope
:return:cons#1
:scope:end
:scope
:define:~0:static
:define:~1:static
:require:static
:call:static#sub:^~0:~1 :store:~0
:call:&-1:~0 :store:~1
:call:static#sum:^~0:~1 :store:~0
:return:~0
:scope:end
:define:end
This defines a recursive function to calculate the factorial (not so well written, but that isn't important).
The goal is to get what comes after the :, including the #. For example, :static#sub is a single token, stored without the leading :.
All tokens follow the same form except :cons, which can take a value after it. The value is a number (integer or float, called static or dynamic in the language, respectively) or a string, which must start and end with " and supports escaping like \". Multi-line strings aren't supported.
Variables are written like ~0; putting ^ in front gets the value from the :scope above.
Functions are similar but use &0 instead, and &-1 points to the current function (no need for ^&-1 here).
That said, is there a better way to get the tokens?
Here you can see it in action: http://regex101.com/r/nF7oF9/2

[Update] To address the pattern's complexity and maintainability, you can split it up using PCRE_EXTENDED (the x modifier) and comments:
preg_match_all('/
# read constant (?)
\:((?:cons\#(?:\d+(?:\.\d+)?|
# read a string (?)
(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|
# read an identifier (?)
(?:[a-z]+(?:\#[a-z]+)?|
# read whatever
\^?[\~\&](?:[a-z]+|\d+|\-1)))
/x', $input, $matches)
Beware that in extended mode all whitespace in the pattern is ignored unless it is escaped or inside a character class (an escape such as \n is safe), and an unescaped # starts a comment, hence the \# above. PHP has no g modifier; preg_match_all() returns every match instead.
Now, if you want to improve your lexer and parser, read on:
What (f)lex [the free-software equivalent of LEX] does is simply let you pass a list of regexps and, optionally, a "group". You can also try ANTLR and its PHP Target Runtime to get the work done.
As for your request, I wrote a lexer in the past following the principle of FLEX. The idea is to cycle through the regexps like FLEX does:
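// Each pattern must be anchored with ^ so it only matches at the current position.
$regexp = [reg1 => STRING, reg2 => ID, reg3 => WS];
$input = ...;
$tokens = [];
while ($input !== '') {
    $best = null;
    $k = null;
    foreach ($regexp as $re => $kind) {
        // First matching pattern wins; empty matches are skipped so the loop always advances.
        if (preg_match($re, $input, $match) && $match[0] !== '') {
            $best = $match[0];
            $k = $kind;
            break;
        }
    }
    if (null === $best) {
        throw new Exception("could not analyze input, invalid token");
    }
    $tokens[] = ['kind' => $k, 'value' => $best];
    $input = substr($input, strlen($best)); // move past the consumed token.
}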
Since FLEX and Yacc/Bison integrate, the usual pattern is to read only until the next token (that is, they don't loop over the whole input before parsing).
The $regexp array can be anything. I expected it to be a "regexp" => "kind" key/value map, but you can also use an array like this:
$regexp = [['reg' => '...', 'kind' => STRING], ...]
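As a rough, runnable version of the loop above applied to the language from the question, here is a sketch using the array-of-arrays form. The token kinds (CONST, REF, WORD, WS), the four patterns, and the tokenize() name are my own assumptions; they only approximate the original regex.

```php
<?php
// Hypothetical rules, tried in order; every pattern is anchored with ^.
$rules = [
    // :cons#<number> or :cons#"string with \" escapes"
    ['reg' => '/^:cons#(?:\d+(?:\.\d+)?|"(?:\\\\.|[^"\\\\\r\n])*")/', 'kind' => 'CONST'],
    // :~0, :^~0, :&0, :&-1
    ['reg' => '/^:\^?[~&](?:[a-z]+|\d+|-1)/', 'kind' => 'REF'],
    // :define, :static#sub
    ['reg' => '/^:[a-z]+(?:#[a-z]+)?/', 'kind' => 'WORD'],
    ['reg' => '/^\s+/', 'kind' => 'WS'],
];

function tokenize($input, array $rules) {
    $tokens = [];
    while ($input !== '') {
        $matched = false;
        foreach ($rules as $rule) {
            if (preg_match($rule['reg'], $input, $m) && $m[0] !== '') {
                if ($rule['kind'] !== 'WS') {
                    // Whitespace is dropped; other tokens are stored without the leading ':'.
                    $tokens[] = ['kind' => $rule['kind'], 'value' => substr($m[0], 1)];
                }
                $input = (string) substr($input, strlen($m[0]));
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            throw new Exception('could not analyze input near: ' . substr($input, 0, 10));
        }
    }
    return $tokens;
}

$tokens = tokenize(':call:static#sub:^~0:~1 :store:~0', $rules);
```

Each rule stays short enough to read on its own, which is the maintainability gain over the single large pattern.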
You can also enable and disable regexps using groups (the way FLEX groups work). For example, consider the following code:
class Foobar {
    const FOOBAR = "arg";
    function x() {...}
}
There is no need to activate the string regexp until you need to read an expression (here, the expression is what comes after the "="). And there is no need to activate the class identifier regexp when you are already inside a class.
FLEX's groups also make it possible to read comments: a first regexp activates a group that ignores the other regexps until some match occurs (such as "*/").
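The group idea can be sketched in PHP as a table of states, each listing the only patterns that are active in it. This is a simplified illustration, not how FLEX is implemented; the state names, the rule layout, and lexWithStates() are assumptions of mine.

```php
<?php
// Hypothetical rule groups. Each rule is [pattern, token kind or null to discard,
// next state or null to stay in the current one].
$states = [
    'INITIAL' => [
        ['/^\/\*/', null, 'COMMENT'],   // "/*" activates the comment group
        ['/^[a-z]+/i', 'ID', null],
        ['/^\s+/', null, null],
    ],
    'COMMENT' => [
        ['/^\*\//', null, 'INITIAL'],   // "*/" switches back
        ['/^./s', null, null],          // everything else inside a comment is ignored
    ],
];

function lexWithStates($input, array $states) {
    $state = 'INITIAL';
    $tokens = [];
    while ($input !== '') {
        foreach ($states[$state] as [$re, $kind, $next]) {
            if (preg_match($re, $input, $m)) {
                if ($kind !== null) {
                    $tokens[] = ['kind' => $kind, 'value' => $m[0]];
                }
                $input = (string) substr($input, strlen($m[0]));
                $state = $next ?? $state;
                continue 2; // go on with the rules of the (possibly new) state
            }
        }
        throw new Exception("invalid token in state $state");
    }
    return $tokens;
}

$tokens = lexWithStates('foo /* not an id */ bar', $states);
```

While the COMMENT group is active the ID pattern is never tried, so the words inside the comment do not become tokens.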
Note that this is a naïve approach: a lexer like FLEX actually generates an automaton, which uses different states to represent your needs (a regexp is itself an automaton).
It uses an algorithm based on packed indexes or something similar (I used the naïve "for each" because I did not understand the algorithm well enough), which is efficient in both memory and speed.
As I said, this is something I made in the past, about six or seven years ago.
It was on Windows.
It was not particularly quick (well, it is O(N²) because of the two loops).
I also think that PHP was compiling the regexp each time. Now that I work in Java, I use the Pattern implementation, which compiles the regexp once and lets you reuse it. I don't know whether PHP does the same by first looking in a regexp cache for an already compiled regexp.
I was using preg_match with an offset, to avoid doing the substr($input, ...) at the end.
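A sketch of that offset variant: preg_match() takes the start position as its fifth argument, and the A (anchored) modifier forces the match to begin exactly there, since ^ would still refer to the start of the whole string. The rule set and the lexByOffset() name are hypothetical.

```php
<?php
// Hypothetical rules; the A modifier anchors each match at the given offset.
$rules = [
    'NUMBER' => '/\d+/A',
    'ID'     => '/[a-z_]+/Ai',
    'WS'     => '/\s+/A',
];

function lexByOffset($input, array $rules) {
    $tokens = [];
    $offset = 0;
    $length = strlen($input);
    while ($offset < $length) {
        foreach ($rules as $kind => $re) {
            // Fifth argument: start matching at $offset, no substr() copy needed.
            if (preg_match($re, $input, $m, 0, $offset) && $m[0] !== '') {
                $tokens[] = ['kind' => $kind, 'value' => $m[0]];
                $offset += strlen($m[0]);
                continue 2;
            }
        }
        throw new Exception("invalid token at offset $offset");
    }
    return $tokens;
}

$tokens = lexByOffset('abc 42', $rules);
```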
You should try the ANTLR3 PHP Code Generation Target: the ANTLR grammar editor is pretty easy to use, and you will end up with much more readable and maintainable code :)

str_replace: Replace string with a function

Just a simple question. I have a contact form stored in a function because it's just easier to call it on the pages where I want it.
Now, to extend usability, I want to search for {contactform} using str_replace.
Example:
function contactform(){
// bunch of inputs
}
$wysiwyg = str_replace('{contactform}', contactform(), $wysiwyg);
So basically, if {contactform} is found, replace it with the output of contactform().
Now I know that I can run the function before the replace and store its output in a variable, and then replace it with that same variable. But I'm interested to know if there is a better method than the one I have in mind.
Thanks

To answer your question, you could use PCRE and preg_replace_callback and then either modify your contactform() function or create a wrapper that accepts the matches.
I think your idea of running the function once and storing it in a variable makes more sense though.
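If you do want the callback route, a minimal sketch could look like the following. The contactform() body, the $allowed whitelist, and the expandPlaceholders() name are stand-ins for illustration; the point is that the function only runs when its placeholder actually occurs in the text.

```php
<?php
// Hypothetical stand-in for the real form markup.
function contactform() {
    return '<form>...</form>';
}

// Only names on this list may be called; unknown placeholders are left untouched.
$allowed = ['contactform'];

function expandPlaceholders($text, array $allowed) {
    return preg_replace_callback('/\{([a-z_]+)\}/', function ($m) use ($allowed) {
        // $m[0] is the whole placeholder, $m[1] the name between the braces.
        return in_array($m[1], $allowed, true) ? call_user_func($m[1]) : $m[0];
    }, $text);
}

$wysiwyg = expandPlaceholders('Intro {contactform} and {unknown}', $allowed);
```

The whitelist matters because the text comes from a WYSIWYG editor: without it, any {name} typed by an editor could call an arbitrary function.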

Your method is fine. I would store it in a variable if you are planning to use the output of contactform() more than once.
It might pay to use http://php.net/strpos to check whether {contactform} exists before running the str_replace function.
You could try both ways and, if your server supports it, benchmark:
<?php echo 'Memory Usage: '. (!function_exists('memory_get_usage') ? '0' : round(memory_get_usage()/1024/1024, 2)) .'MB'; ?>

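You may want to have a look at PHP's call_user_func(); more information here: http://php.net/call_user_func
$wysiwyg = 'Some string and {contactform}';
$find = '{contactform}';
// strpos() returns 0 for a match at the very start, so compare against false;
// the function name is the placeholder without its braces.
$output = strpos($wysiwyg, $find) !== false ? call_user_func(trim($find, '{}')) : '';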

Yes, there is: Write one yourself. (Unless there already is one, which is always hard to be sure in PHP; see my next point.)
Ah, there it is: preg_replace_callback(). Of course, it's one of the three regex libraries and as such, does not do simple string manipulation.
Anyway, my point is: Do not follow PHP's [non-]design guidelines. Write your own multibyte-safe string substitution function with a callback, and do not use call_user_func().

PHP Search systems - how to use LIKE

The following code fetches the TITLE attribute for each post. I want to compare it to a search phrase $f using something similar to the way LIKE works with SQL.
<?php
$terms = $_GET['f'];
$searchs = get_posts($args);
foreach($searchs as $search){
$title = get_the_title($search);
Then we would need something like:
if($title is LIKE $f) { }
How would it work?

From the way you word this, I presume it cannot be done in the database with an actual LIKE or full-text search. I think what you're probably after is Regular Expressions, which are a little more advanced than the pattern matching provided by the SQL "LIKE" operator, but can be used for the same purpose.
PHP integrates the "Perl Compatible Regular Expressions" library. Have a look here: http://uk.php.net/manual/en/book.pcre.php or look for introductions to standard regular expression syntax online.
As a simple case, you might want to do this:
// Compose your matching pattern
// Be sure to escape user-supplied parts (pass the delimiter to preg_quote as well)
// ".*" is roughly equivalent to "%" in LIKE
// the "/i" modifier means "case insensitive"
$pattern = '/.*' . preg_quote($_GET['f'], '/') . '.*/i';
$searches = get_posts($args);
foreach($searches as $search)
{
    $title = get_the_title($search);
    if ( preg_match($pattern, $title) )
    {
        # FOUND A MATCH!
    }
}
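If you want the full LIKE wildcards rather than just the %keyword% case, a small translation helper is one option. This is a sketch under assumptions: likeToRegex() and isLike() are hypothetical names, matching is case-insensitive (as LIKE usually is with MySQL's default collations), and LIKE's escape character is not handled.

```php
<?php
// Translate a LIKE pattern into a PCRE pattern:
// % becomes .*, _ becomes ., everything else is matched literally.
function likeToRegex($like) {
    $regex = '';
    foreach (preg_split('/([%_])/', $like, -1, PREG_SPLIT_DELIM_CAPTURE) as $part) {
        if ($part === '%') {
            $regex .= '.*';
        } elseif ($part === '_') {
            $regex .= '.';
        } else {
            $regex .= preg_quote($part, '/');
        }
    }
    // i = case-insensitive, s = "." also matches newlines, u = treat input as UTF-8.
    return '/^' . $regex . '$/isu';
}

function isLike($title, $like) {
    return preg_match(likeToRegex($like), $title) === 1;
}

$found = isLike('My Garden Design', '%garden%');
```

Inside the loop from the answer this would read if (isLike($title, '%' . $_GET['f'] . '%')) { ... }.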

LIKE is an SQL keyword (used here with MySQL); you have to use it in an SQL statement, like so:
SELECT * FROM posts WHERE title LIKE '%keyword%';
If you want to do your search using PHP, you can use strpos like so:
if(strpos($title, $keyword) !== FALSE) { }

You need to look up MySQL or PostgreSQL full-text search. LIKE isn't what you want if you plan on doing "fuzzy" searches or need more power than simple string comparisons.

replace all smileys by enumeration in php

Is there any fast (regex-based?) method to replace all smileys in a text, each with a unique corresponding identifier? For example, the first occurrence of :) should be replaced by smiley1, :)) by smiley2, and another occurrence of :) by smiley1 again. Furthermore, the identifier should stay the same across different input texts.
Any potential combination of the typical symbols (<5 chars?) such as :;-()&%}{[]D<>30_o should be recognizable.
Can this be done without generating a large array of all combinations? If so, how?

Are you looking for preg_replace_callback()? You can even use closures in PHP 5.3. I am not clear on what the objective is, so at this point this is the best I can provide; if you can clarify, maybe I can come up with some sample code.
Edit: here's an example from the PHP manual. It doesn't help in this case specifically, but if you just change the regex, the function and the string (basically everything, lol), then it will do the job:
<?php
echo preg_replace_callback('/-([a-z])/', function ($match) {
return strtoupper($match[1]);
}, 'hello-world');
// outputs helloWorld
?>

I don't understand why you can't do:
$STRING = str_replace(":))", "<img src=\"smiley1.jpg\">", $STRING);
$STRING = str_replace(":)", "<img src=\"smiley2.jpg\">", $STRING);
etc. It seems to be the simplest and most logical solution.

Obviously, it cannot be done with such a str_replace. How would you catch a ":)))" or maybe a "-.-", which is also not present in your list? Enumerating all potential smileys is a hard task, resulting in n!/(n-k)! candidates. Here, in the example provided above, n=18 and k=5...
Thus, I'm asking for a way to use a regex, but I don't know how to replace each combination of chars that is intended to represent a smiley with the same text every time.
Idea: is it possible to use a callback function in combination with a hash?

Yeah, Tim! That is exactly what came to my mind when writing the last post. So the solution is
<?php
echo preg_replace_callback("/([\)\(\[\]<>#-\.:;*+{}]{2,9})/", function ($match) {
return " ".md5($match[1])." ";
}, ':::-) :-)) nope (yeah) cool:) }:)');
?>
Thanks!
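If numbered identifiers like smiley1 and smiley2 are preferred over md5 hashes, the same callback can fill a lookup table as it goes. A sketch, assuming a punctuation-only smiley alphabet and a hypothetical enumerateSmileys() helper; the numbers follow the order of first appearance, so they only stay stable across texts if the same $map is passed in again.

```php
<?php
// Hypothetical helper: numbers each distinct smiley in order of first appearance.
// $map is taken by reference so it can be reused for further texts.
function enumerateSmileys($text, array &$map) {
    // Assumed alphabet: runs of 2 to 5 of these punctuation characters.
    $pattern = '/[:;\-()\[\]<>{}&%]{2,5}/';
    return preg_replace_callback($pattern, function ($m) use (&$map) {
        if (!isset($map[$m[0]])) {
            $map[$m[0]] = 'smiley' . (count($map) + 1);
        }
        return $map[$m[0]];
    }, $text);
}

$map = [];
$first  = enumerateSmileys('hi :) there :)) again :)', $map);
$second = enumerateSmileys('later :)) and ;-)', $map);
```

With the md5 approach no table is needed at all, which is what makes the identifiers independent of the input text.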

match any part of a URL

Can anyone help me figure out how to search a URL and print a class based on the result? For example:
http://www.?.com/garden-design.html
What I am trying to achieve is a switch in PHP that matches a term such as "garden" after the first slash following the domain and then prints a class. If it matched "construction", it would print a different class. There are only three, so I know which words to search for. This doesn't have to be dynamic.
Any help would be appreciated.

If it does not have to be dynamic, you could do it like this:
switch(true)
{
    // stripos() returns a position or false, so compare against false explicitly
    case stripos($_SERVER['REQUEST_URI'], 'garden') !== false:
        return 'garden';
    case stripos($_SERVER['REQUEST_URI'], 'construction') !== false:
        return 'construction';
    default:
        return 'default';
}
The above is quite explicit. A more compact solution could be
function getCategory($categories)
{
    foreach($categories as $category) {
        if (stripos($_SERVER['REQUEST_URI'], $category) !== false) {
            return $category;
        }
    }
}
$categories = array('garden', 'construction', 'indoor');
echo getCategory($categories);
This will not give you the first word after /; it just checks whether one of your keywords exists in the requested URI and returns it.
You could also use parse_url to split the URI into its components and work with string functions on the path component, e.g.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
In www.example.com/garden-design.html?foo=bar, this would give you just the path, /garden-design.html. In your scenario you'd probably still run stripos on it, so you can just as well do it directly on the URL instead of parsing it.

I'd have thought that parse_url would be a good starting point: you could then explode the path component and do simple strpos comparisons against the three strings you're looking for, if a simple switch statement isn't sufficient. (Be sure to check for !== false if you go down the strpos route.)
This would potentially be faster than a regex based solution.
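A minimal sketch of that parse_url and explode route, assuming a hypothetical categoryFromUrl() helper, the three keywords from the question, and that the keyword must sit at the start of the first path segment:

```php
<?php
// Hypothetical helper: returns the keyword that starts the first path segment,
// or $default when none of them does.
function categoryFromUrl($url, array $categories, $default = 'default') {
    $path = parse_url($url, PHP_URL_PATH);            // e.g. "/garden-design.html"
    if (!is_string($path)) {
        return $default;                              // no path, or a malformed URL
    }
    $segments = explode('/', ltrim($path, '/'));      // [0] is what follows the first slash
    foreach ($categories as $category) {
        // === 0 means the segment starts with the keyword (case-insensitively)
        if (stripos($segments[0], $category) === 0) {
            return $category;
        }
    }
    return $default;
}

$class = categoryFromUrl(
    'http://www.example.com/garden-design.html?foo=bar',
    ['garden', 'construction', 'indoor']
);
```

Unlike a plain stripos on the whole URL, this ignores keywords that only appear in the host name or the query string.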

You can try a regular expression:
#http://www\..+\.com/(garden|construction|indoor)(-design)?\.html#
Then $1 would give you garden, construction or indoor as a string. (The # delimiters avoid having to escape the slashes, and the literal dots are escaped.)
You can also use (.+) to collect any string at that spot.
Update: made "-design" optional.
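In PHP the capture group ends up in the $matches array of preg_match(). A short sketch, where the URL_PATTERN constant and classFromUrl() are hypothetical names; accepting https, anchoring the pattern at both ends, and ignoring case are my assumptions on top of the expression above.

```php
<?php
// '#' delimiters avoid escaping the slashes; dots that must be literal are escaped.
const URL_PATTERN = '#^https?://www\.[^/]+\.com/(garden|construction|indoor)(?:-design)?\.html$#i';

function classFromUrl($url) {
    if (preg_match(URL_PATTERN, $url, $matches) === 1) {
        return strtolower($matches[1]);   // first capture group, i.e. $1
    }
    return null;                          // no known category in this URL
}

$class = classFromUrl('http://www.nothing.com/garden-design.html');
```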

Take a look at the PHP function strpos(). With it you could do something like:
$url = "http://www.nothing.com/garden-design.html";
// strpos() returns false when there is no match, so compare with !== false
if (strpos($url, "/garden-design.html") !== false)
    doSomething();
else if (strpos($url, "/construction-design.html") !== false)
    doSomethingElse();
else if (strpos($url, "/indoor-design.html") !== false)
    doSomethingElseInstead();
String matching is generally quicker and more efficient than regular expressions.
