I need to retrieve table parent-child relationships from a "WHERE" clause like this:
select ... large list of fields with aliases ...
from ... list of joined tables ...
where ((`db_name`.`catalog`.`group` = `db_name`.`catalog_group`.`iden`)
and (`db_name`.`catalog`.`iden` = `db_name`.`catalog_sub`.`parent`))
Is there a regex to get the identifiers from each condition? Say, in an array where element[0] is the table from the left side and element[1] is the table from the right. Identifier names may be anything, so only SQL operators like 'where', 'and' and '=' can serve as keys.
Any help would be greatly appreciated.
CLARIFICATION
I don't want to get the references WHERE clause by WHERE clause. I just want the references as such. As far as I can see, a regex could replace all sequences
`.`
to
.
and then match all backticked pairs by
` # ` = ` # `
Backticks around identifiers are always present in my queries by default. All string values are surrounded by double quotes by default. I thought it would not be a complex task for a regex guru. Thanks in advance.
PS: This is because the MyISAM engine does not support references, so I am forced to restore them manually.
I ended up with:
public function initRef($q) {
$s = strtolower($q);
// remove all string values within double quotes
$s = preg_replace('|"(\w+)"|', '', $s); // operate on the lowercased copy, not on $q
// split by 'where' clause
$arr = explode('where', $s);
if (isset($arr[1])) {
// remove all spaces and parenthesis
$s = preg_replace('/\s|\(|\)/', '', $arr[1]);
// replace `.` with .
$s = preg_replace('/(`\.`)/', '.', $s);
// replace `=` with =
$s = preg_replace("/(`=`)/", "=", $s);
// match pairs within ticks
preg_match_all('/`.*?`/', $s, $matches);
// recreate arr
$arr = array();
foreach($matches[0] as &$match) {
$match = preg_replace('/`/', '', $match); // now remove all backticks
$match = str_replace($this->db . '.', '', $match); // remove db_name
$arr[] = explode('=', $match); // split by = sign
}
$this->pairs = $arr;
} else {
$this->pairs = 0;
}
}
Using a regular expression seems like it won't help you. What if there are subqueries? What if your query contains a string with the text "WHERE" in it? Hakre mentioned it in a comment above, but your best bet really is using something that can actually interpret your SQL so that you can find what really is a proper WHERE clause and what is not.
If you insist on doing this the "wrong" way instead of by using some context aware parser, you would have to find the WHERE clause, for instance like this:
$parts = explode('WHERE', $query);
Assuming there is only one WHERE clause in your query, $parts[1] will then contain everything from the WHERE onwards. After that you would have to detect all valid clauses like ORDER BY, GROUP BY, LIMIT, etc. that could follow, and break off your string there. Something like this:
$parts = preg_split("/(GROUP BY|ORDER BY|LIMIT)/", $parts[1]);
$where = $parts[0];
You would have to check the documentation for your flavor of SQL and the types of queries (SELECT, INSERT, UPDATE, etc.) you want to support for a full list of keywords that you want to split on.
After that, it would probably help to remove all brackets because precedence is not relevant for your problem and they make it harder to parse.
$where = preg_replace("/[()]/", "", $where);
From that point onward, you'd have to split again to find all the separate conditions:
$conditions = preg_split("/\b(AND|OR|XOR)\b/", $where); // \b avoids splitting inside longer words
And lastly, you'd have to split on operators to get the right and left values:
$idents = array();
foreach ($conditions as $c)
{
// "IS NOT" must be listed before "IS", otherwise "IS" matches first
$idents[] = preg_split("/(<>|=|>|<|IS NOT|IS)/", $c);
}
You would have to check that list of operators and add to it if needed. $idents now has all possible identifiers in it.
You may want to note that several of these steps (at the very least the last step) will also require trimming of the string to work properly.
Disclaimer: again, I think this is a very bad idea. This code only works if there is only one WHERE clause and even then it depends on a lot of assumptions. A complicated query will probably break this code. Please use a SQL parser/interpreter instead.
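The steps above can be strung together roughly as follows. This is only a sketch under the same assumptions the answer lists (a single WHERE, no subqueries, no keywords inside string literals). The function name extractConditionPairs is made up for illustration, and the keyword matching here is case-insensitive with word boundaries, which goes slightly beyond the snippets above.

```php
<?php
// Hypothetical helper: combines the split steps described above.
function extractConditionPairs(string $query): array
{
    // Everything after the first WHERE keyword.
    $parts = preg_split('/\bWHERE\b/i', $query, 2);
    if (count($parts) < 2) {
        return array();
    }
    // Cut off clauses that may follow the WHERE clause.
    $where = preg_split('/\b(GROUP BY|ORDER BY|LIMIT)\b/i', $parts[1])[0];
    // Precedence is irrelevant here, so drop the parentheses.
    $where = preg_replace('/[()]/', '', $where);

    $pairs = array();
    foreach (preg_split('/\b(AND|OR|XOR)\b/i', $where) as $condition) {
        // Split each condition on its operator and trim both sides.
        $idents = preg_split('/<>|=|>|<|\bIS NOT\b|\bIS\b/i', $condition);
        $pairs[] = array_map('trim', $idents);
    }
    return $pairs;
}

$q = "select a from t where ((`db`.`catalog`.`group` = `db`.`catalog_group`.`iden`) "
   . "and (`db`.`catalog`.`iden` = `db`.`catalog_sub`.`parent`))";
$pairs = extractConditionPairs($q);
print_r($pairs);
```

Each element of $pairs holds the left and right side of one condition, still wrapped in backticks. Anything more complicated than this kind of query is likely to break it, as the disclaimer says.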
I have to modify a URL like this:
$string = "/st:1/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30";
Namely, I want to delete st:1 with a regex. I used:
preg_replace("/\/st:(.*)\//",'',$string)
but I got
end:2015-07-30
while I would like to get:
/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30
Same if I would like to delete fp:1.
You can use:
$string = preg_replace('~/st:[^/]*~','',$string);
[^/]* will only match till next /
You are using greedy matching with . that matches any character.
Use a more restricted pattern:
preg_replace("/\/st:[^\/]*/",'',$string)
The [^\/]* negated character class only matches 0 or more characters other than /.
Another solution would be to use lazy matching with the *? quantifier, but it is not as efficient as the negated character class.
FULL REGEX EXPLANATION:
\/st: - literal /st:
[^\/]* - 0 or more characters other than /.
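To see the difference side by side, here is a small sketch comparing the greedy pattern from the question with a lazy variant and the negated character class. The helper removeParam is a hypothetical name, added only to show how the same pattern could be reused for other keys such as fp.

```php
<?php
$string = "/st:1/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30";

// Greedy: .* runs to the end and backtracks to the LAST slash.
$greedy = preg_replace('~/st:(.*)/~', '', $string);

// Lazy: .*? stops at the first position followed by a slash (or the end).
$lazy = preg_replace('~/st:.*?(?=/|$)~', '', $string);

// Negated class: can never cross a slash in the first place.
$negated = preg_replace('~/st:[^/]*~', '', $string);

// Hypothetical helper: remove any "/key:value" segment.
function removeParam(string $path, string $key): string
{
    return preg_replace('~/' . preg_quote($key, '~') . ':[^/]*~', '', $path);
}

echo $greedy, "\n";
echo $lazy, "\n";
echo $negated, "\n";
echo removeParam($string, 'fp'), "\n";
```

Using ~ as the delimiter avoids having to escape every slash in the pattern.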
You need to add ? to your regex to make the quantifier lazy:
<?php
$string = "/st:1/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30";
echo preg_replace("/\/st:.*?(?=\/)/", '', $string); // lookahead keeps the slash that starts the next segment
?>
Output:- https://eval.in/397658
You can do the same for the other parameters.
Instead of using regex here, you should write parsing utility functions for your special format string. They are simple, they don't take too long to write, and they will make your life a lot easier:
function readPath($path) {
$parameters = array();
foreach(explode('/', $path) as $piece) {
// Here we make sure we have something
if ($piece == "") {
continue;
}
// This here is just a fancy way of splitting the array returned
// into two variables.
list($key, $value) = explode(':', $piece);
$parameters[$key] = $value;
}
return $parameters;
}
function writePath($parameters) {
$path = "";
foreach($parameters as $key => $value) {
$path .= "/" . implode(":", array($key, $value));
}
return $path;
}
Now you can work on it as a PHP array. In this case you would write:
$parameters = readPath($string);
unset($parameters['st']);
$string = writePath($parameters);
This makes for much more readable and reusable code. Additionally, since most of the time you are dealing with only slight variations of this format, you can just change the delimiters each time, or even abstract these functions to accept different delimiters.
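One way that abstraction could look is sketched below. The names readDelimited and writeDelimited and their parameters are hypothetical, not part of any library, and the second input string ("a=1;b=2") is made up purely to show different delimiters.

```php
<?php
// Hypothetical generalisation of readPath(): the separators are parameters.
function readDelimited(string $input, string $pairSep = '/', string $kvSep = ':'): array
{
    $parameters = array();
    foreach (explode($pairSep, $input) as $piece) {
        if ($piece === '') {
            continue; // skip the empty piece before a leading separator
        }
        // A limit of 2 keeps any further separators inside the value.
        $kv = explode($kvSep, $piece, 2);
        $parameters[$kv[0]] = isset($kv[1]) ? $kv[1] : '';
    }
    return $parameters;
}

// Hypothetical counterpart of writePath(): every pair gets a leading separator.
function writeDelimited(array $parameters, string $pairSep = '/', string $kvSep = ':'): string
{
    $out = '';
    foreach ($parameters as $key => $value) {
        $out .= $pairSep . $key . $kvSep . $value;
    }
    return $out;
}

$parameters = readDelimited("/st:1/sc:RsrlYQhSQvs=/fp:1");
unset($parameters['fp']);
$rebuilt = writeDelimited($parameters);
echo $rebuilt, "\n";

$other = readDelimited("a=1;b=2", ';', '=');
print_r($other);
```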
Another way to deal with this is to convert the string to conform to a normal path query, using something like:
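function readPath($path) {
// parse_str() returns nothing; it fills the array passed as second argument
parse_str(strtr($path, "/:", "&="), $parameters);
return $parameters;
}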
In your case, though, since you are using the "=" character in a URL, you would also need to URL-encode each value so that it does not conflict with the format. This would involve code structured similarly to the above.
I think my question is easy to solve, but I can't manage it.
I want to extract these words from my query string:
$string = "INSERT INTO table (a,b) VALUES ('foo', 'bar')";
Expected result:
array one = [a,b]
array two = [foo, bar]
There are many regex strategies you could use for this depending on how flexible you need it to be. Here is one very simple implementation which assumes that you know the string 'VALUES' is in all caps, and there is exactly one space before and after 'VALUES' and the two sets of parentheses.
$string = "INSERT INTO table (a,b) VALUES ('foo', 'bar')";
$matches = array();
// we escape literal parenthesis in the regex, and also add
// grouping parenthesis to capture the sections we're interested in
preg_match('/\((.*)\) VALUES \((.*)\)/', $string, $matches);
// make sure the matches you expect to be found are populated
// before referencing their array indices. index 0 is the match of
// our entire regex, indices 1 and 2 are our two sets of parens
if (!empty($matches[1]) && !empty($matches[2])) {
$column_names = explode(',', $matches[1]); // array of db column names
$values = explode(',', $matches[2]); // array of values
// you still have some clean-up work to do here on the data in those arrays.
// for instance there may be empty spaces that need to be trimmed from
// beginning/ending of some of the strings, and any of the values that were
// quoted need the quotation marks removed.
}
This is only a starting point, be sure to test it on your data and revise the regex as needed!
I recommend using a regex tester to test your regex string against actual query strings you need it to work on. http://regexpal.com/ (There are many others)
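The clean-up mentioned in the code comments could look something like this. It is a sketch that assumes no value contains a comma or an embedded quote; the pattern is also loosened a little (case-insensitive, flexible whitespace), which is my own variation rather than part of the answer above.

```php
<?php
$string = "INSERT INTO table (a,b) VALUES ('foo', 'bar')";

$column_names = array();
$values = array();

// Lazy first group stops at the parenthesis right before VALUES.
if (preg_match('/\((.*?)\)\s*VALUES\s*\((.*)\)/i', $string, $matches)) {
    // Trim whitespace around each column name.
    $column_names = array_map('trim', explode(',', $matches[1]));

    // Trim whitespace and surrounding quotes from each value.
    $values = array_map(function ($v) {
        return trim($v, " \t\n\r'\"");
    }, explode(',', $matches[2]));
}

print_r($column_names);
print_r($values);
```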
I am trying to use a PHP license system…
I would like to show users the status of their license.
The license Server gives me this:
name=Service_Name;nextduedate=2013-02-25;status=Active
I need the data separated like this:
$name = "Service_Name";
$nextduedate = "2013-02-25";
$status = "Active";
I have spent 2 days trying to solve this problem with preg_match_all, but I can't :(
This is basically a query string if you replace ; with &. You can try parse_str() like this:
$string = 'name=Service_Name;nextduedate=2013-02-25;status=Active';
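// pass a result array: the one-argument form no longer works in PHP 8
parse_str(str_replace(';', '&', $string), $data);
echo $data['name']; // Service_Name
echo $data['nextduedate']; // 2013-02-25
echo $data['status']; // Active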
This can rather simply be solved without regex. The use of explode() will help you.
$str = "name=Service_Name;nextduedate=2013-02-25;status=Active";
$split = explode(";", $str);
$structure = array();
foreach ($split as $element) {
$element = explode("=", $element);
${$element[0]} = $element[1]; // braces needed: PHP 7+ reads $$element[0] as ($$element)[0]
}
var_dump($name);
Though I urge you to use an array instead. Far more readable than inventing variables that didn't exist and are not explicitly declared.
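For comparison, the array version might look like this. It is a minimal sketch; the variable name $license is just an illustrative choice.

```php
<?php
$str = "name=Service_Name;nextduedate=2013-02-25;status=Active";

$license = array();
foreach (explode(';', $str) as $element) {
    if (strpos($element, '=') === false) {
        continue; // ignore pieces without a key=value shape
    }
    // A limit of 2 keeps any later "=" inside the value.
    list($key, $value) = explode('=', $element, 2);
    $license[$key] = $value;
}

echo $license['status'], "\n";
```

Everything the server sent is then available under one explicitly declared variable, and unknown keys cannot overwrite anything else in scope.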
It sounds like you just want to break the text down into separate lines along the semicolons, add a dollar sign at the front and then add spaces and quotes. I'm not sure you can do that in one step with a regular expression (or at least I don't want to think about what that regular expression would look like), but you can do it over multiple steps.
1. Use preg_split() to split the string into an array along the semicolons.
2. Loop over the array.
3. Use str_replace to replace each '=' with ' = "'.
4. Use string concatenation to add a $ to the front and a "; to the end of each string.
That should work, assuming your data doesn't include quotes, equal signs, semicolons, etc. within the data. If it does, you'll have to figure out the parsing rules for that.
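The steps above can be sketched as follows. Note that this produces lines of text that look like PHP assignments; it does not create the variables themselves.

```php
<?php
$string = 'name=Service_Name;nextduedate=2013-02-25;status=Active';

$lines = array();
// Split along the semicolons and loop over the pieces.
foreach (preg_split('/;/', $string) as $piece) {
    // Replace '=' with ' = "', then add the leading $ and the closing ";
    $lines[] = '$' . str_replace('=', ' = "', $piece) . '";';
}

echo implode("\n", $lines), "\n";
```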
Suppose I have the following string:
insert into table values ('text1;'); insert into table values ('text2')
How do I break those queries (get each individual query) using regular expressions?
I've found a very similar problem: Use regex to find specific string not in html tag ...but it uses a solution that is specific to .NET: a variable-length lookbehind (PHP complains that the lookbehind is not fixed length).
I would be very grateful if someone could give me some hints on how to deal with this problem.
The trick is to count how many unescaped quote characters you've passed. Assuming that the SQL is syntactically correct, semicolons after an even number of unescaped quote characters will be the ones you want, and semicolons after an odd number of unescaped quote characters will be part of a string literal. (Remember, string literals can contain properly escaped quote characters.)
If you want 100% reliability, you'll need a real SQL parser, like this. (I just Googled "SQL parser in PHP". I don't know if it works or not.)
EDIT:
I don't think it's possible to find pairs of unescaped quote characters using nothing but regex. Maybe a regex guru will prove me wrong, but it just seems too damn difficult to distinguish between escaped and unescaped quote characters in so many possible combinations. I tried look-behind assertions and backrefs with no success.
The following is not a pure-regex solution, but I think it works:
preg_match_all("/(?:([^']*'){2})*[^']*;/U", str_replace("\\'", "\0\1\2", $input), $matches);
$output = array_map(function($str){ return str_replace("\0\1\2", "\\'", $str); }, $matches[0]);
Basically, we temporarily replace escaped quote characters with a string of bytes that is extremely unlikely to occur, in this case \0\1\2. After that, all the quote characters that remain are the unescaped ones. The regex picks out semicolons preceded by an even number of quote characters. Then we restore the escaped quote characters. (I used a closure there, so it requires PHP 5.3 or later.)
If you don't need to deal with quote characters inside string literals, yes, you can easily do it with pure regex.
Assuming proper SQL syntax it would probably be best to split on the semicolon.
The following regexp will work but only if all quotes come in pairs.
/.+?\'.+?\'.*?;|.+?;/
To avoid escaped single quotes:
/.+?[^\\\\]\'.+?[^\\\\]\'.*?;|.+?;/
To handle Multiple pairs of single quotes.
/.+?(?:[^\\]\'.+?[^\\]\')+.*?;|.+?;/
Tested against the following data set:
insert into table values ('text1;\' ','2'); insert into table values ('text2'); insert into test3 value ('cookie\'','fly');
Returns:
insert into table values ('text1;\' ','2');
insert into table values ('text2');
insert into test3 value ('cookie\'','fly');
I have to admit this is a pretty dirty regexp. It would not handle any sort of SQL syntax errors at all. I enjoyed the challenge of coming up with a pure regex though.
How do you want to split them?
You can use explode( ' ', $query ) to transform the query into an array.
Or, if you want to get the text1 and text2 values with a regexp, you can use preg_match( '/(\'([\w]+)\')/', $query, $matches ), where $matches[1] is your value.
preg_match_all( '/([\w ]+([\w \';]+))/', $queries, $matches ) will give you all matches of this query pattern.
Regexes aren't always good at this type of thing. The following function should work though:
function splitQuery($query) {
$open = false;
$buffer = null;
$parts = array();
for($i = 0, $l = strlen($query); $i < $l; $i++) {
if ($query[$i] == ';' && !$open) {
$parts[] = trim($buffer);
$buffer = null;
continue;
}
if ($query[$i] == "'") {
$open = ($open) ? false: true;
}
$buffer .= $query[$i];
}
if ($buffer) $parts[] = trim($buffer);
return $parts;
}
Usage:
$str = "insert into table values ('text1;'); insert into table values ('text2')";
$str = splitQuery($str);
print_r($str);
Outputs:
Array
(
[0] => insert into table values ('text1;')
[1] => insert into table values ('text2')
)
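The function above toggles on every single quote, so a backslash-escaped quote inside a string literal would confuse it. A possible variant that skips such escapes is sketched below; splitQueryEscaped is a hypothetical name, and it assumes backslash is the escape character (doubled quotes such as '' already work with plain toggling).

```php
<?php
// Hypothetical variant of splitQuery() that respects backslash escapes.
function splitQueryEscaped(string $query): array
{
    $open = false;   // are we inside a quoted string?
    $buffer = '';
    $parts = array();
    for ($i = 0, $l = strlen($query); $i < $l; $i++) {
        $ch = $query[$i];
        if ($ch === '\\' && $open && $i + 1 < $l) {
            // Copy the backslash and the escaped character verbatim.
            $buffer .= $ch . $query[++$i];
            continue;
        }
        if ($ch === "'") {
            $open = !$open;
        } elseif ($ch === ';' && !$open) {
            // A semicolon outside any string ends the statement.
            if (trim($buffer) !== '') {
                $parts[] = trim($buffer);
            }
            $buffer = '';
            continue;
        }
        $buffer .= $ch;
    }
    if (trim($buffer) !== '') {
        $parts[] = trim($buffer);
    }
    return $parts;
}

$sql = "insert into table values ('text1;\\' ','2'); insert into table values ('text2'); "
     . "insert into test3 value ('cookie\\'','fly');";
$parts = splitQueryEscaped($sql);
print_r($parts);
```

This still ignores comments, other quoting styles and dialect-specific escaping, so a real SQL parser remains the safer choice for untrusted input.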
I have a really long string in a certain pattern such as:
userAccountName: abc userCompany: xyz userEmail: a#xyz.com userAddress1: userAddress2: userAddress3: userTown: ...
and so on. This pattern repeats.
I need to find a way to process this string so that I have the values of userAccountName:, userCompany:, etc. (i.e. preferably in an associative array or some such convenient format).
Is there an easy way to do this or will I have to write my own logic to split this string up into different parts?
Simple regular expressions like this userAccountName:\s*(\w+)\s+ can be used to capture matches and then use the captured matches to create a data structure.
If you can arrange for the data to be formatted as it is in a URL (ie, var=data&var2=data2) then you could use parse_str, which does almost exactly what you want, I think. Some mangling of your input data would do this in a straightforward manner.
You might have to use regex or your own logic.
Are you guaranteed that the string ": " does not appear anywhere within the values themselves? If so, you could possibly use explode to split the string into an array of alternating keys and values. You'd then have to walk through this array and format it the way you want. Here's a rough (probably inefficient) example I threw together quickly:
<?php
$keysAndValuesArray = explode(': ', $dataString);
$firstKeyName = 'userAccountName';
$associativeDataArray = array();
$currentIndex = -1;
$numItems = count($keysAndValuesArray);
for($i=0;$i<$numItems;$i+=2) {
if($keysAndValuesArray[$i] == $firstKeyName) {
$associativeDataArray[] = array();
++$currentIndex;
}
$associativeDataArray[$currentIndex][$keysAndValuesArray[$i]] = $keysAndValuesArray[$i+1];
}
var_dump($associativeDataArray);
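A plain split on ": " leaves each value glued to the following key when pairs are separated only by spaces, as in the sample string from the question. One possible way to get truly alternating keys and values is preg_split with PREG_SPLIT_DELIM_CAPTURE, sketched below. The function name parseRecords is hypothetical, and the sketch assumes that no value contains a word immediately followed by a colon.

```php
<?php
// Hypothetical helper: splits "key: value key: value ..." into records.
function parseRecords(string $data, string $firstKey = 'userAccountName'): array
{
    // Tokens come back as: text before first key, key, value, key, value, ...
    $tokens = preg_split('/\s*(\w+):\s*/', $data, -1, PREG_SPLIT_DELIM_CAPTURE);

    $records = array();
    $current = -1;
    for ($i = 1, $n = count($tokens); $i + 1 < $n; $i += 2) {
        // Start a new record whenever the first key of the pattern reappears.
        if ($tokens[$i] === $firstKey || $current < 0) {
            $records[] = array();
            ++$current;
        }
        $records[$current][$tokens[$i]] = $tokens[$i + 1];
    }
    return $records;
}

$data = "userAccountName: abc userCompany: xyz userEmail: a#xyz.com "
      . "userAddress1: userAddress2: userAccountName: def userCompany: uvw";
$records = parseRecords($data);
print_r($records);
```

Empty fields such as userAddress1 come out as empty strings rather than being dropped.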
If you can write a regexp (for my example I'm assuming there are no colons in the values), you can parse it with preg_split or preg_match_all like this:
<?php
$raw_data = "userAccountName: abc userCompany: xyz";
$raw_data .= " userEmail: a#xyz.com userAddress1: userAddress2: ";
$data = array();
// /([^:]*\s+)?/ part works because the regexp is "greedy"
if (preg_match_all('/([a-z0-9_]+):\s+([^:]*\s+)?/i', $raw_data,
$items, PREG_SET_ORDER)) {
foreach ($items as $item) {
$data[$item[1]] = $item[2];
}
print_r($data);
}
?>
If that's not the case, please describe the grammar of your string in a bit more detail.
PCRE is included in PHP and can respond to your needs using regexp like:
if ($c=preg_match_all ("/userAccountName: (?<userAccountName>\w+) userCompany: (?<userCompany>\w+) userEmail: /", $txt, $matches))
{
$userAccountName = $matches['userAccountName'];
$userCompany = $matches['userCompany'];
// and so on...
}
The most difficult part is getting the right regexp for your needs.
You can have a look at http://txt2re.com for some help.
I think the solution closest to what I was looking for, I found at http://www.justin-cook.com/wp/2006/03/31/php-parse-a-string-between-two-strings/. I hope this proves useful to someone else. Thanks everyone for all the suggested solutions.
If I were you, I would try to convert the string into JSON format with some regexps.
Then simply use JSON.