Get and replace quoted strings with regex - php

I'm trying to extract strings inside quotes.
I'm using a regex, but I have problems with escaped quotes.
For example, I have this:
$var = "SELECT * FROM TABLE WHERE USERNAME='Carasuman'";
preg_match_all('~([\'"])(.*?)\1~s', $var, $result);
$new = preg_replace('~([\'"])(.*?)\1~s',"<#################>",$var);
The code works perfectly. I get the replaced value in $new and the quoted value in $result[1]:
$new = "SELECT * FROM TABLE WHERE USERNAME=<#################>";
$result[1] = "Carasuman";
My problem is when I add an escaped quote inside the quotes:
$var = "SELECT * FROM TABLE WHERE USERNAME='Carasuman\'s'";
I got this:
$new = "SELECT * FROM TABLE WHERE USERNAME=<#################>'s";
$result[1] = "Carasuman\" //must be "Carasuman\'s";
How can I avoid this error and get $new and $result[1] as in the first example?
$new = "SELECT * FROM TABLE WHERE USERNAME=<#################>";
$result[1] = "Carasuman\'s";
Thanks!

For the match, you're never going to get Carasuman's without the \ as a single matched element, since a match can't skip over characters inside it. It's going to grab either Carasuman or Carasuman\'s. Just use str_replace to get rid of the backslash:
preg_match_all('~([\'"])(.*)\1~s', $var, $result);
$result[2] = str_replace('\\','',$result[2]);
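For the replace, the ? in the (.*?) group makes it ungreedy (lazy), meaning it stops at the first closing quote it finds. Remove the ? to make it greedy, meaning it keeps going until the last matching quote. Note that a greedy match also swallows everything between the first and the last quote if the string contains more than one quoted value: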
preg_replace('~([\'"])(.*)\1~s',"<#################>",$var);
Edit
Rather than doing the str_replace on $result[2] after the match, it would probably be better to do it beforehand on the initial string:
$var = str_replace("\\'","'",$var);
preg_match_all('~([\'"])(.*)\1~s', $var, $result);
$new = preg_replace('~([\'"])(.*)\1~s',"<#################>",$var);
You still need to make the wildcard greedy by changing (.*?) to (.*), so that the apostrophe in the name is included in the match/replace instead of being treated as the terminating single quote.

Why don't you do this:
$var = "SELECT * FROM TABLE WHERE USERNAME='" . mysql_real_escape_string($input) . "'";
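I don't think you necessarily need a regex. mysql_real_escape_string escapes your input properly, so you can just have $input = 'Carasuman\'s'; or $input = "Carasuman's"; (the mysql_* functions were removed in PHP 7; mysqli_real_escape_string or prepared statements are the current equivalents).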

To match quoted strings, you could use the regex '\'.*?(?:\\\\.[^\\\\\']*)*\'' and, for double-quoted strings, '".*?(?:\\\\.[^\\\\"]*)*"'
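An escape-aware pattern can also handle both quote types in one pass while keeping the match lazy, so several quoted values in one string stay separate. The following is a sketch, not the pattern from the answers above: the sample input (with a second, double-quoted value) and the $placeholder variable are made up for illustration. The quote character lands in capture group 1 and the quoted content in group 2.

```php
<?php
// Made-up input: one escaped apostrophe plus a second, double-quoted value.
$var = "SELECT * FROM TABLE WHERE USERNAME='Carasuman\\'s' AND CITY=\"Caracas\"";
$placeholder = '<#################>';

// Regex after PHP unescaping: (['"])((?:\\.|(?!\1)[^\\])*)\1
//   \\.          a backslash followed by any character (an escaped character)
//   (?!\1)[^\\]  any other character that is not the opening quote
$pattern = '~([\'"])((?:\\\\.|(?!\1)[^\\\\])*)\1~s';

preg_match_all($pattern, $var, $result);
$new = preg_replace($pattern, $placeholder, $var);

print_r($result[2]); // the quoted contents: Carasuman\'s and Caracas
echo $new, "\n";
```

Because an escaped character is consumed as a pair, the \' inside the name can no longer be mistaken for the closing quote.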

Related

preg_replace with a word in an array

I am trying to take the words in an array called keywords and replace each occurrence of them in a string with "as".
for($i = 0; $i<sizeof($this->keywords[$this->lang]); $i++)
{
$word = $this->keywords[$this->lang][$i];
$a = preg_replace("/\b$word\b/i", "as",$this->code);
}
It works if I replace the variable $word with a literal, as in /\bhello\b/i, which then replaces every hello with "as".
Is the approach I am using even possible?
Before it becomes a pattern, it is a double-quoted string, so the variable is interpolated; that is not the problem.
The problem is that you use a loop to replace several words and you store the result in $a:
In the first iteration, all occurrences of the first word in $this->code are replaced and the new string is stored in $a.
But the next iteration doesn't reuse $a as the third parameter to replace the next word; it always starts again from the original string $this->code.
Result: after the for loop, $a contains the original string with only the occurrences of the last word replaced with "as".
When you want to replace several words with the same string, one way is to build an alternation: word1|word2|word3.... This is easily done with implode:
$alternation = implode('|', $this->keywords[$this->lang]);
$pattern = '~\b(?:' . $alternation . ')\b~i';
$result = preg_replace($pattern, 'as', $this->code);
This way, the string is parsed only once and all the words are replaced in one pass.
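If the keywords can contain regex metacharacters, each one should be escaped before it goes into the alternation. A minimal sketch, assuming plain variables in place of $this->keywords[$this->lang] and $this->code (the sample words and text are made up):

```php
<?php
// Made-up stand-ins for $this->keywords[$this->lang] and $this->code.
$keywords = ['hello', 'a.b', 'world'];
$code     = 'Hello world, a.b is not axb or helloworld.';

// preg_quote() escapes metacharacters such as "." and the "~" delimiter.
$quoted = array_map(function ($word) {
    return preg_quote($word, '~');
}, $keywords);

$pattern = '~\b(?:' . implode('|', $quoted) . ')\b~i';
$result  = preg_replace($pattern, 'as', $code);

echo $result, "\n"; // as as, as is not axb or helloworld.
```

Without preg_quote, the keyword a.b would also match axb, because an unescaped dot matches any character.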
If you have a lot of words and a very long string:
Testing a long alternation has a significant cost. Even though the pattern starts with \b, which greatly reduces the number of positions where a match can begin, every candidate position still has to be tested against each alternative before the attempt succeeds or, more expensively, fails.
Only in this particular case, you can use another approach:
First, define a placeholder (a character or short string that can't occur in your string, let's say §) and insert it at every word boundary:
$temp = preg_replace('~\b~', '§', $this->code);
Then wrap each keyword in the placeholder (§word1§, §word2§, ...) and build an associative array in which every value is the replacement string:
$trans = [];
foreach ($this->keywords[$this->lang] as $word) {
$trans['§' . $word . '§'] = 'as';
}
Once you have done that, add an entry with the placeholder alone as key and an empty string as value. You can now use the fast strtr function to perform the replacement:
$trans['§'] = '';
$result = strtr($temp, $trans);
The only limitation of this technique is that it is case-sensitive.
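Put together, the placeholder approach might look like the sketch below. The keyword list and input text are made up, and it assumes § never occurs in the input.

```php
<?php
// Made-up stand-ins for $this->keywords[$this->lang] and $this->code.
$keywords    = ['hello', 'world'];
$code        = 'say hello to the world, not to helloworld';
$placeholder = '§'; // assumed to be absent from $code

// Mark every word boundary with the placeholder.
$temp = preg_replace('~\b~', $placeholder, $code);

// Map each delimited keyword to the replacement string.
$trans = [];
foreach ($keywords as $word) {
    $trans[$placeholder . $word . $placeholder] = 'as';
}
// Any placeholder that is not part of a keyword is simply removed.
$trans[$placeholder] = '';

// strtr() tries the longest keys first, so whole keywords win over the bare placeholder.
$result = strtr($temp, $trans);

echo $result, "\n"; // say as to the as, not to helloworld
```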
It will work if you keep it like below:
$a = preg_replace("/\b".$word."\b/i", "as",$this->code);

php preg_match fails when used with a variable

Excuse me, I am making a terrible mistake somewhere, but this is the situation:
In PHP I have:
$ln = "A/RADIUS ADMITS VALUE 20";
$trg = "A/RADIUS";
$matches = array();
$zz = preg_match('#$trg\sADMITS\s(VALUE)\s([^\s]+)#',$ln,$matches);
I want to capture the word "VALUE" (without the quotes) and the last word, here the string 20, given by ([^\s]+). That means one or more non-whitespace characters, right?
But $zz is 0, indicating no match, and $matches is empty. I also tried:
$zz = preg_match('#'.$trg.'\sADMITS\s(VALUE)\s([^\s]+)#',$ln,$matches);
Same problem.
Where is the mistake I am stupidly making?
You are using single quotes around the pattern, so $trg is not interpolated; variables are only expanded inside double-quoted strings. This should work:
$zz = preg_match("#$trg\sADMITS\s(VALUE)\s([^\s]+)#",$ln,$matches);
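A small sketch of the difference, using the values from the question. The preg_quote call is an addition, not part of the answer above: it guards against $trg containing regex metacharacters or the # delimiter.

```php
<?php
$ln  = "A/RADIUS ADMITS VALUE 20";
$trg = "A/RADIUS";

// Single quotes: no interpolation. The pattern contains the literal text "$trg",
// where "$" is an end-of-string anchor, so nothing can match.
$single = preg_match('#$trg\sADMITS\s(VALUE)\s(\S+)#', $ln);

// Build the pattern from the variable's value, escaped for use inside #...#.
$pattern = '#' . preg_quote($trg, '#') . '\sADMITS\s(VALUE)\s(\S+)#';
$found   = preg_match($pattern, $ln, $matches);

var_dump($single);   // int(0)
var_dump($found);    // int(1)
echo $matches[1], ' ', $matches[2], "\n"; // VALUE 20
```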

Parse WHERE clause with regex in php [duplicate]

This question already has answers here:
Regular expression to match common SQL syntax?
(13 answers)
Closed 9 years ago.
I need to retrieve table parent-child relationship from "WHERE" clause like this:
select ... large list of fields with aliases ...
from ... list of joined tables ...
where ((`db_name`.`catalog`.`group` = `db_name`.`catalog_group`.`iden`)
and (`db_name`.`catalog`.`iden` = `db_name`.`catalog_sub`.`parent`))
Is there a regex to get the identifiers from each condition? Say, in an array, element[0] is the table from the left side and element[1] is the table from the right. Identifier names may be anything, so only SQL operators like 'where', 'and', and '=' can serve as keys.
Any help would be greatly appreciated.
CLARIFY
I don't want to go through the query WHERE clause by WHERE clause. I just want the references as such. As far as I can see, a regex could replace all sequences
`.`
to
.
and then match all backticked pairs by
` # ` = ` # `
Backticks around identifiers are always present in my queries by default, and all string values are surrounded by double quotes. I thought this was not a complex task for a regex guru. Thanks in advance.
PS: Because the MyISAM engine does not support foreign-key references, I am forced to restore them manually.
ENDED with:
public function initRef($q) {
    $s = strtolower($q);
    // remove all string values within double quotes
    $s = preg_replace('|"(\w+)"|', '', $s);
    // split by 'where' clause
    $arr = explode('where', $s);
    if (isset($arr[1])) {
        // remove all spaces and parentheses
        $s = preg_replace('/\s|\(|\)/', '', $arr[1]);
        // replace `.` with .
        $s = preg_replace('/(`\.`)/', '.', $s);
        // replace `=` with =
        $s = preg_replace("/(`=`)/", "=", $s);
        // match pairs within ticks
        preg_match_all('/`.*?`/', $s, $matches);
        // recreate arr
        $arr = array();
        foreach ($matches[0] as $match) {
            $match = preg_replace('/`/', '', $match); // now remove all backticks
            $match = str_replace($this->db . '.', '', $match); // remove db_name
            $arr[] = explode('=', $match); // split by = sign
        }
        $this->pairs = $arr;
    } else {
        $this->pairs = 0;
    }
}
Using a regular expression seems like it won't help you. What if there are subqueries? What if your query contains a string with the text "WHERE" in it? Hakre mentioned it in a comment above, but your best bet really is using something that can actually interpret your SQL so that you can find what really is a proper WHERE clause and what is not.
If you insist on doing this the "wrong" way instead of by using some context aware parser, you would have to find the WHERE clause, for instance like this:
$parts = explode('WHERE', $query);
Assuming there is only one WHERE clause in your query, $parts[1] will then contain everything from the WHERE onwards. After that you would have to detect all valid clauses like ORDER BY, GROUP BY, LIMIT, etc. that could follow, and break off your string there. Something like this:
$parts = preg_split("/GROUP BY|ORDER BY|LIMIT/", $parts[1]);
$where = $parts[0];
You would have to check the documentation for your flavor of SQL and the types of queries (SELECT, INSERT, UPDATE, etc.) you want to support for a full list of keywords that you want to split on.
After that, it would probably help to remove all brackets because precedence is not relevant for your problem and they make it harder to parse.
$where = preg_replace("/[()]/", "", $where);
From that point onward, you'd have to split again to find all the separate conditions:
// \b keeps AND/OR/XOR from matching inside identifiers such as ORDERS
$conditions = preg_split("/\b(AND|OR|XOR)\b/", $where);
And lastly, you'd have to split on operators to get the right and left values:
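$idents = array();
foreach ($conditions as $c)
{
    // IS NOT comes before IS so the longer operator is tried first
    $idents[] = preg_split("/<>|=|>|<|\bIS NOT\b|\bIS\b/", trim($c));
}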
You would have to check that list of operators and add to it if needed. $idents now has all possible identifiers in it.
You may want to note that several of these steps (at the very least the last step) will also require trimming of the string to work properly.
Disclaimer: again, I think this is a very bad idea. This code only works if there is only one WHERE clause and even then it depends on a lot of assumptions. A complicated query will probably break this code. Please use a SQL parser/interpreter instead.
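For reference, the steps above can be strung together roughly as follows. This is only a sketch under the same assumptions as the disclaimer (a single WHERE clause, no subqueries, no keywords inside string literals); the function name extractConditionPairs and the sample query are made up, and matching is case-insensitive here because the query in the question is lower-case.

```php
<?php
// Hypothetical helper: returns one [left, right] pair per condition.
function extractConditionPairs(string $query): array
{
    // Everything after the first WHERE keyword.
    $parts = preg_split('/\bWHERE\b/i', $query, 2);
    if (count($parts) < 2) {
        return [];
    }

    // Cut off clauses that may follow the WHERE clause.
    $where = preg_split('/\b(?:GROUP BY|ORDER BY|LIMIT)\b/i', $parts[1])[0];

    // Parentheses and backticks are irrelevant for collecting identifiers.
    $where = str_replace(['(', ')', '`'], '', $where);

    $pairs = [];
    foreach (preg_split('/\b(?:AND|OR|XOR)\b/i', $where) as $condition) {
        // Split each condition on its comparison operator.
        $idents  = preg_split('/<>|=|>|<|\bIS NOT\b|\bIS\b/i', trim($condition));
        $pairs[] = array_map('trim', $idents);
    }
    return $pairs;
}

$query = 'select * from t where ((`db_name`.`catalog`.`group` = `db_name`.`catalog_group`.`iden`)
and (`db_name`.`catalog`.`iden` = `db_name`.`catalog_sub`.`parent`)) order by 1';

print_r(extractConditionPairs($query));
```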

What does '\" actually mean in PHP Syntax?

I have a piece of code and I keep getting syntax errors for code like this:
$query ="SELECT * from `jos_menu` where `id` = ".'\".$parent.'\";
Now when I reformat it as:
$query ="SELECT * from `jos_menu` where `id` = ".$parent;
That is, when I remove '\"
it works fine. So I am just wondering, what does '\" actually do?
\ is the escape character. It means the next character should be taken literally, without care for its special meaning.
In PHP, you would generally see '\" inside of a string if the string were delimited with double quotes (and the developer just wanted a preceding single quote).
It works fine because you have a numeric value, so MySQL automatically converts the string to a number for you. So you get 2 different queries (assuming that $parent = 42;):
SELECT * from `jos_menu` where `id` = 42
vs
SELECT * from `jos_menu` where `id` = "42"
It denotes an escaped character. The character that appears right after it is taken literally.
Your query is incorrectly escaped:
$query ="SELECT * from `jos_menu` where `id` = ".'\".$parent.'\";
//^ You mismatched the quotes from here
A correctly escaped query should be
$query ="SELECT * from `jos_menu` where `id` = \"$parent\"";
// ^ Note: here the " will be printed as-is within the query
For example,
If $parent was 2, then the query would be
SELECT * from `jos_menu` where `id` = "2"
The only problem with
$query ="SELECT * from `jos_menu` where `id` = ".'\".$parent.'\";
is that you missed a few ':
$query ="SELECT * from `jos_menu` where `id` = ".'\"'.$parent.'\"';
In PHP, a string can either be:
$var = 'This is a string';
Or
$var = "This is a string";
If you want to put " inside a string that you already started with ", you need to tell PHP that you don't want the second " to end the string, but to be part of the string itself. This is what \" does. It tells PHP not to give the " character any special meaning; normally, if you started the string with ", the next " would end it.
\ means: remove any "special" meaning from the next character.
This only works if the character after the \ would have had special meaning. Some examples:
Suppose we want to print Hello "World". I am a string!:
$var = "Hello "World". I am a string!";
In this example we will have errors. Since we started the string with ", the next " will close the string. So what PHP thinks:
" Start of string
Hello part of string variable.
" Hey, since I saw that the string was started with ", this must mean the end of it!
World" <-- Error
Stop processing and throw errors.
However, if we write:
$var = "Hello \"World\". I am a string!";
Now, PHP thinks:
" Start of string
Hello part of string variable
\ Ah, okay, the next character I should remove any special meaning
" Okay, this is immediately after \, so I just use it normally, as a ".
World part of string
\ Okay, the next character I will remove any special meaning
" This is now a normal "
. I am a string! - part of string variable.
" Ah! Since the string was started with ", this must be the ending.
; ends statement.
Hopefully this clarifies things for you.
A few things:
To make the next character a literal: '\'' // outputs a single '
Special characters: \n newline, \t tab character, etc.
The backslash escapes the next character after it; in your example this would work:
$query = "SELECT * from jos_menu where id = ".$parent;
But so would this:
$query = "SELECT * from jos_menu where id = $parent";
When escaping quotation marks, it depends on the type of quotes used. With double quotes, you can include the variable right in the string; just be careful when accessing arrays by key:
$var = "This \"works\" ".$fine.".";
$var = "This 'also' works just $fine.";
$var = "This $will['fail'].";
$var = "However, $this[will] work and so ".$will['this'].".";
Single-quoted strings behave differently: variables are not interpolated, and only \' and \\ are treated as escapes.
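The quoting rules discussed in this thread can be checked with a short script. It is a sketch with made-up values ($parent = 42 and a small $row array); nothing here touches a database.

```php
<?php
$parent = 42;            // made-up value
$row    = ['id' => 7];   // made-up array

// Escaped double quotes inside a double-quoted string; $parent is interpolated.
$a = "SELECT * from `jos_menu` where `id` = \"$parent\"";

// The same text from single-quoted pieces: inside '...' a " needs no escaping.
$b = 'SELECT * from `jos_menu` where `id` = "' . $parent . '"';

// Single quotes do not interpolate: this contains the literal text $parent.
$c = 'id = $parent';

// Array elements in double quotes: wrap in braces, or leave the key unquoted.
$d = "id = {$row['id']}";
$e = "id = $row[id]";

echo $a, "\n"; // SELECT * from `jos_menu` where `id` = "42"
var_dump($a === $b); // bool(true)
echo $c, "\n"; // id = $parent
echo $d, ' / ', $e, "\n"; // id = 7 / id = 7
```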

PHP: Regular Expression to delete text from a string if condition is true

I have a variable containing a string. The string can come in several ways:
// some examples of possible string
$var = 'Tom Greenleaf (as John Dunn Hill)'
// or
$var = '(screenplay) (as The Wibberleys) &'
// or
$var = '(novella "Four Past Midnight: Secret Window, Secret Garden")'
I need to clean up the string to get only the first element, like this:
$var = 'Tom Greenleaf'
$var = 'screenplay'
$var = 'novella'
I know this is a bit complicated to do but there is a pattern we can use.
Looking at the first variable, we can first check whether the string contains the text ' (as'. If it does, we need to delete ' (as' and everything that comes after it.
In the second variable, the same rule applies; we just also need to delete the parentheses.
In the third variable, where ' (as' is not found, we need to get the first word only and then delete the parentheses.
Well, that's all. I just need someone to help me do it because I'm new to PHP and I don't know regular expressions, or another way to do it.
Thanks!!!!
Actually, there's no need for a complex regex:
$var = 'Tom Greenleaf (as John Dunn Hill)';
// $var = '(novella "Four Past Midnight: Secret Window, Secret Garden")';
// $var = '(screenplay) (as The Wibberleys) &';
$ind = strpos($var, "(as");
if ($ind !== false) {
    // keep everything before "(as"
    $result = substr($var, 0, $ind);
} else {
    // no "(as": split on whitespace and take the first element
    $s = preg_split("/\s+/", $var);
    $result = $s[0];
}
// strip the parentheses and any surrounding whitespace
print trim(preg_replace("/\(|\)/", "", $result));
Try this regular expression:
(?:^[^(]+|(?<=^\()[^\s)]+)
This will get you either anything up to the first parenthesis (including the space before it, so trim the result) or the first word inside the first parenthesis. Together with preg_match:
preg_match('/(?:^[^(]+|(?<=^\\()[^\\s)]+)/', $var, $match);
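As a quick check, the lookbehind pattern can be run against the three sample strings from the question. The wrapper function firstElement is a made-up name, and the trim() call is an addition that drops the space left before the parenthesis.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function firstElement(string $var): string
{
    if (preg_match('/(?:^[^(]+|(?<=^\\()[^\\s)]+)/', $var, $match)) {
        return trim($match[0]);
    }
    return '';
}

$samples = [
    'Tom Greenleaf (as John Dunn Hill)',
    '(screenplay) (as The Wibberleys) &',
    '(novella "Four Past Midnight: Secret Window, Secret Garden")',
];

foreach ($samples as $sample) {
    echo firstElement($sample), "\n";
}
// Tom Greenleaf
// screenplay
// novella
```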
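Try this one (PCRE only accepts the repeated group name when the (?J) option is set, and the second alternative captures just the first word):
(?J)^\((?<MATCH>[^ )]+)|^(?<MATCH>[^ ]+)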
