Dynamic regex to capture - php

Not sure if it is actually possible, but consider the following text:
INSERT INTO cms_download_history
SET
user_id = '{$userId}',
download_id = '{$fileId}',
remote_addr = '{$remote_addr}',
doa = GetDate()";
I want to change that to be:
INSERT INTO cms_download_history
(user_id,download_id,remote_addr,doa)
VALUES('{$userId}','{$fileId}','{$remote_addr}',GetDate());
Writing a regex to find and replace this one is easy because I know how many columns it has. But what if I want to do this for many similar queries without knowing the number of columns, e.g.:
INSERT INTO mystery_table
SET
col1 = val1
col2 = val2
.... unknown number of columns and values.
Is there a dynamic regex that I can write that would detect that example?

Actually, if all queries look like this, with only a variable number of columns, you can get the field names using a fairly simple regex:
(\w+)\s*=\s*['"].+?(?<!\\)['"],
Here is what it does:
It captures one or more word characters, if followed by:
Zero or more whitespace characters
An equal sign
Zero or more whitespace characters (again)
A ' or " (start of a String)
One or more characters
An unescaped ' or "
A comma
Note that this does assume that all values are strings. If you also need support for numbers, please let me know.
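As a rough sketch of how the whole rewrite from the question could be done in PHP: the function name setToValues is made up for this illustration, and it assumes every value is either a single-quoted string or a bare token without commas or spaces (such as GetDate()).

```php
<?php
// Hypothetical helper: rewrites "INSERT INTO t SET a = x, b = y"
// into "INSERT INTO t (a,b) VALUES(x,y);" for any number of columns.
function setToValues(string $sql): string
{
    // Split the statement into the table name and the "col = val, ..." list.
    if (!preg_match('/^\s*INSERT\s+INTO\s+(\w+)\s+SET\s+(.*?);?\s*$/is', $sql, $m)) {
        return $sql; // not in the expected form; leave it untouched
    }
    $columns = [];
    $values = [];
    // A value is a single-quoted string (with backslash escapes) or a bare token.
    $pairPattern = '/(\w+)\s*=\s*(\'(?:[^\'\\\\]|\\\\.)*\'|[^,\s]+)/';
    preg_match_all($pairPattern, $m[2], $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $columns[] = $pair[1];
        $values[] = $pair[2];
    }
    return sprintf(
        "INSERT INTO %s\n(%s)\nVALUES(%s);",
        $m[1],
        implode(',', $columns),
        implode(',', $values)
    );
}

// Nowdoc, so that {$userId} stays literal text as in the question.
$sql = <<<'SQL'
INSERT INTO cms_download_history
SET
user_id = '{$userId}',
download_id = '{$fileId}',
remote_addr = '{$remote_addr}',
doa = GetDate()
SQL;

echo setToValues($sql), "\n";
```

Values that contain commas or spaces outside quotes, such as CONCAT(a, b), are beyond this sketch and would need a real SQL parser.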

Related

REGEXP not working as expected, inside a PHP script

I have this issue with mysql when querying a DB inside PHP.
The PHP code is:
$Query = "SELECT COUNT(*) FROM theTable WHERE fieldValue REGEXP 'Dom-R[eéèêë]my'";
$DBR = mysql_query($Query,$Connection);
I am expecting this query to get things like, I mean find the number of those:
Dom-Remy
Dom-Rémy
Dom-Rèmy
...etc...
But I get nothing, I mean zero. What is wrong in the code? I have tried several variations, all equally not working.
This is a matter of Unicode characters.
What happens is that é, è, ê, ë... in your example are not stored as a single unit the way a plain e is: in UTF-8 each accented letter takes two bytes, so the accent effectively counts as something extra. This brings lots of complexities and rules that need to be followed in order to handle Unicode correctly.
You could do something like ([\x{0049}-\x{0130}]) to search for accented letters, but the exact expression may vary depending on whether you are going to use it in .NET, Java, JavaScript or PHP.
You could also check what code each character represents here:
http://www.fileformat.info/info/unicode/char/search.htm?q=%C4%B0&preview=entity
According to the official MySQL documentation, REGEXP matching is done in a byte-wise fashion:
The REGEXP and RLIKE operators compare characters by their
byte values and accented characters may not compare as equal even if a
given collation treats them as equal.
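The byte-wise effect can be reproduced in PHP, whose preg functions also work on bytes unless the /u modifier is given. This is only an analogy for MySQL's REGEXP, not MySQL code, and the variable names are made up for the illustration. The accented letters are written as UTF-8 byte escapes so the result does not depend on the encoding of the source file.

```php
<?php
// "Dom-Rémy" with é spelled out as its two UTF-8 bytes (0xC3 0xA9).
$name = "Dom-R\xC3\xA9my";

// The characters e é è ê ë, again as UTF-8 bytes.
$class = "e\xC3\xA9\xC3\xA8\xC3\xAA\xC3\xAB";

// Byte mode: the class matches exactly ONE byte, but é needs two,
// so the pattern cannot reach "my" and the match fails.
$byteWise = preg_match('/^Dom-R[' . $class . ']my$/', $name);

// UTF-8 mode (/u): the class matches one whole character.
$charWise = preg_match('/^Dom-R[' . $class . ']my$/u', $name);

var_dump(strlen($name)); // int(9): 8 visible characters, 9 bytes
var_dump($byteWise);     // int(0)
var_dump($charWise);     // int(1)
```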
If you can accept any character in place of [eéèêë], this should be sufficient:
$Query = "SELECT COUNT(*) FROM theTable WHERE fieldValue REGEXP '^Dom-R.+my$'";
If
the column's CHARACTER SET is utf8 or utf8mb4, and
your connection between the client and the MySQL server also uses one of those character sets, and
you are not using COLLATION utf8_bin, then
'Dom-Remy' = 'Dom-Rémy' = ...
WHERE ... = ... and WHERE ... LIKE ... will abide by the above. REGEXP (RLIKE) cannot be used, for the reasons already discussed.
This shows what is equal (for = and LIKE.)
If you are simply searching a string for Dom-Remy, use
fieldValue LIKE '%Dom-Remy%'
instead of REGEXP/RLIKE.
If you have something more complex that needs REGEXP, then start a new question with the details.

sql returning 0 rows even when where clause matches

I am trying to execute
SELECT * FROM `product_laptop` WHERE name = "Acer Sdfsdf"
MySQL returned an empty result set (i.e. zero rows). (Query took 0.0010 seconds.)
even though there is an entry with the name `Acer  Sdfsdf`
and name is also defined as unique key.
Maybe you should try LIKE to get the correct data:
SELECT * FROM `product_laptop` WHERE name LIKE "Acer%"
Would that suit your needs?
The = operator requires an exact match; using LIKE with the % wildcard will give you the results you want. It is no longer an exact match, though, and will allow all sorts of variations.
SELECT * FROM `product_laptop` WHERE name like "Acer%Sdfsdf"
With LIKE you can use the following two wildcard characters in the pattern:
% matches any number of characters, even zero characters.
_ matches exactly one character.
http://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html
You can use:
$name = preg_replace('/\s+/', '%', trim($name));
to convert all whitespaces to wildcards.
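A small sketch of that conversion, wrapped in a function whose name (toLikePattern) is made up for this example. It shows that a name typed with one space ends up as the same pattern as a name stored with two:

```php
<?php
// Hypothetical helper: trims the input and turns each run of
// whitespace into a single % wildcard for use with LIKE.
function toLikePattern(string $name): string
{
    return preg_replace('/\s+/', '%', trim($name));
}

echo toLikePattern("Acer Sdfsdf"), "\n";  // Acer%Sdfsdf
echo toLikePattern("Acer  Sdfsdf"), "\n"; // Acer%Sdfsdf
```

Because % also matches zero or more arbitrary characters, the resulting pattern matches "AcerSdfsdf" or "Acer X Sdfsdf" as well, so this is a loose match rather than a whitespace-only one. The pattern should still be passed to the query as a bound parameter rather than concatenated into the SQL.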

Having a little trouble with my RegEx today

I am having a bit of a time with my RegEx today
\('[\d',]+
In the string:
INSERT INTO `order_status_histories` VALUES ('3602','52efabe9-5f8c-4512-a994-3227c63dd20e','1','','Order recieved','2014-02-03 16:47:05','2014-02-03 16:47:05'),('3603','52eff713-54fc-4be0-9389-68d5c63dd20e','1','','Order recieved','2014-02-03 22:07:47','2014-02-03 22:07:47'),('3604','52effd1a-bc14-4095-97fd-6d46c63dd20e','1','','Order recieved','2014-02-03 22:33:30','2014-02-03 22:33:30')
As you can see, this is an INSERT statement. The first value of each record is the record's ID, which I do not need inserted, so I am trying to find all of them and simply blank them out. To do that I need to match the number, the two ' characters around it, and the , after it, so I thought I would start with the opening (.
The regex I posted here is grabbing what I need, plus a bit extra: it seems to be grabbing something like ('3670','5304 (for instance), i.e. it runs on into the start of the second value.
How can I do what I need here?
What about \('\d+', which explicitly looks for the opening ( and ', then digits, then ' and then ,
The character class [\d',] isn't doing what you want: it also matches the comma, the opening quote of the second field and the decimal digits after it, until it reaches a letter.
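Both patterns can be compared in PHP. The sample rows below are shortened versions of the ones in the question, and replacing the ID with NULL (so that an auto-increment column assigns a new one) is an assumption about what "blanking out" should mean here.

```php
<?php
// Shortened sample rows; the long UUID-like values are abbreviated.
$sql = "INSERT INTO `order_status_histories` VALUES "
     . "('3602','52efabe9','1'),('3603','52eff713','1')";

// Original pattern: the class also accepts ' and , so the match
// continues into the second field until it hits a letter.
preg_match("/\\('[\\d',]+/", $sql, $greedy);
echo $greedy[0], "\n"; // ('3602','52

// Suggested pattern: digits only, then the closing quote and comma.
$fixed = preg_replace("/\\('\\d+',/", "(NULL,", $sql);
echo $fixed, "\n";
```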

Select query to retrieve text column with hyphen, comma, period and parentheses

Thanks for taking time to read my question.
I have created a MySQL table, a HTML form and a program in PHP which connects the form to MySQL table and retrieves sequences for column Annotations which is text data type.
This column has characters and also has one or more of hyphen, comma, parentheses, period or spaces.
Please look at the following code that I used for select query:
$values=mysql_query("SELECT Sequence
FROM oats
WHERE Foldchange = '$Foldchange' AND
RustvsMockPvalue = '$RustvsMockpvalue' AND
Annotations REGEXP '%$Annotation%[-]+'");
Here $Annotation is the form variable which holds the value entered by the user in the form. Annotations is the column name in the MySQL table.
Annotations column has characters A-Z or a-z and one or more of hyphen, comma, space or parentheses like the following.
Sequence is another text column in the MySQL table but does not have ,./().
Example data from Annotations column:
ADP, ATP carrier protein,  mitochondrial precursor (ADP/ATP translocase) (Adenine nucleotide translocator) (ANT).
I am not able to retrieve Sequence column data when I search for any Annotations column data containing a comma, parentheses, a period or a slash. It works fine for records which do not have any of ,.()/.
I tried to use LIKE instead of REGEXP but it didn't work either.
A record from mysql table:(columns that you see below: contigid,source,genelength,rustmeans, mockmeans,foldchange,pvalue,rustmockteststatistic,Annotations and Sequence)
as_rcr_contig_10002 ORME1 2101 506.33 191 -2.18 2.21E-10 -6.35 Tesmin/TSO1-like, CXC domain containing protein. AACAATTCCCCTCAACCAACCTTTTATTTCATCCCATTTTTATCATCTGTCCGGTTACAGATTTTGCTTCCAGTTAGGTGCCACTTCTTCAAACGCTCAACCCTTACCCACTACCACCCCACCAAAACCAACCCCCCAAGATGCAGTTCATCACTCTCGCCGTTGCTTTTGCTTTCTTTGCTGGTGCCANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCTTTTGCTTTCTTTGCTGGTGCCACCTCGTCGCCGGTTTCCATGGACCCCAAAGCCGAGAAGTCCGGCTCCTCGGGATCCGGTGGCGCCCCTCTGGGCACTGCTAGCCCCTATCCCCAAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGTGGCCCTCAGTCGCCAGGCTCTGGCCAACCCGGTAGGATGCCATGGGGTAGCGACCAATCTGCCTACGGTGGTGGTTTCCCTTATGGATCATTCCCCTCGGTTTCGGGGCAATCCCAATCGACGGCCTATGCTCAAGCTCAATCATCCAGTTTCCCCTCAAACGGTGTCCCGACACACTCCTCGGCCTCCGCCCAAGCGCAATCATCCGGTCCTGGACAAGCTCAGGCAGCCGCTTCTGCCCAGGTTCCCGGCGGCCCCCACGGTCAAGGTTCTAACGGATTTGGCGCACAAGGCCAGTTTGGACAGAACGGGCAGAACGGCCTCTATGGTCAAGACGGCAATGGCTTTAGTGCCCAAGGCCAATTTGGACAGAGTGGACAGAATGGCTTCTATGGTCA
Could someone please help me in the correct syntax of the SELECT syntax? Thank you.
You need to familiarise yourself with regex - it's its own little language.
Use REGEXP with the right regex:
WHERE ...
AND Annotations REGEXP '[-A-Za-z(). ]+'
AND Annotations NOT REGEXP '[A-Za-z]+'
If mysql supported regex look aheads, this could be done in one test.
First of all, you are not using REGEXP properly.
You should check the differences between LIKE and REGEXP.
REGEXP uses regular expressions, which have a very particular syntax.
LIKE uses simple pattern matching with wildcard characters such as % and _.
Here you are using REGEXP with %, which is why it's not working: % is a wildcard for LIKE only.
In REGEXP, however, . and - are special characters that you need to escape too.
If you want to check for several characters, REGEXP is the way to go:
Annotations REGEXP '.*$Annotation.*[\-(),\.]+.*'
This matches:
.* : 0 to n characters
$Annotation : Your keyword
.* : 0 to n characters
[\-(),\.]+ : At least 1 character from the list : - ( ) , .
.* : 0 to n characters
Tell us if that matches your data.
Since we can't craft a regular expression that would work in your case without getting into some crazy matching schemes (ordering and so forth), you'll need to construct the SQL statement yourself to find what you're looking for. Luckily, you're using PHP.
Here I'm starting with a simple space-delimited entry. Remember that you can't wrap something in parentheses, because the parentheses might not match up in your result set.
$search_input = 'ADP ANT';
//example of array from a search page full of check boxes or fields
$annSearches = explode(' ',$search_input);
/*annSearches is now an array with ADP,ANT*/
$sql = "SELECT Sequence FROM oats WHERE Foldchange = '$Foldchange' AND RustvsMockPvalue = '$RustvsMockpvalue'";
foreach ($annSearches as $Annotation){
$sql .= " AND Annotations LIKE '%$Annotation%'";
}
The output SQL statement would look like this (wrapped for clarity):
SELECT Sequence FROM oats WHERE
Foldchange = '$Foldchange'
AND RustvsMockPvalue = '$RustvsMockpvalue'
AND Annotations LIKE '%ADP%'
AND Annotations LIKE '%ANT%';
If the query gets really long, this will get slower and slower, as MySQL has to check every record in the table against each of the LIKE conditions.
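Since the search terms come straight from a form, the same query can be assembled with placeholders instead of interpolated values. The following is only a sketch: the function name buildAnnotationQuery is made up, and it assumes the result is passed to a prepared statement (for example with PDO or mysqli) that binds the returned parameters in order.

```php
<?php
// Hypothetical helper: returns [sql, params] for use with a prepared statement.
function buildAnnotationQuery(string $foldchange, string $pvalue, string $searchInput): array
{
    // Split on any run of whitespace and drop empty pieces.
    $terms = preg_split('/\s+/', trim($searchInput), -1, PREG_SPLIT_NO_EMPTY);

    $sql = 'SELECT Sequence FROM oats WHERE Foldchange = ? AND RustvsMockPvalue = ?';
    $params = [$foldchange, $pvalue];

    foreach ($terms as $term) {
        $sql .= ' AND Annotations LIKE ?';
        // Escape the LIKE wildcards so % and _ typed by the user are literal.
        $params[] = '%' . addcslashes($term, '%_\\') . '%';
    }
    return [$sql, $params];
}

[$sql, $params] = buildAnnotationQuery('-2.18', '2.21E-10', 'ADP ANT');
echo $sql, "\n";
print_r($params);
```

With placeholders, commas, parentheses and quotes in the user's input no longer interfere with the SQL syntax.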
FULLTEXT SEARCH OPTION
Another way that you could potentially do this is to enable FULLTEXT search functionality on the Annotations field in the table in the database.
ALTER TABLE oats ADD FULLTEXT(Annotations);
This would allow you to do a search something like this:
SELECT Sequence FROM oats WHERE
Foldchange = '$Foldchange'
AND RustvsMockPvalue = '$RustvsMockpvalue'
AND MATCH(Annotations) AGAINST ('ADP ANT')

REGEXP: get the contents of a PHP variable, the assignation itself. Specifically, searching variables with SQL queries

I think I'm going crazy with this...
I've tried a lot of combinations and I can't get with the good one.
I need to find all the SQL queries in a PHP code after having read it with a file_get_contents().
Of course, all those queries are variable assignations like:
$sql1 = "
SELECT *
FROM users u
WHERE u.name LIKE '%".$name."%' AND ... ;
";
or
$sql2 = "
SELECT *
FROM users u
WHERE u.id = ".$user_id;
or
$sql3 = '
SELECT *
FROM users u
ORDER BY u.surname1 DESC
'; //this query blablabla.......
So you can see that there are many factors to take into account for PHP variables.
First I tried an approach based on matching the variable itself combined with matching its content...
I've tried to find specific words from SQL in a regex pattern too...
Whatever...
I don't know how to do it.
Should I match each variable and its assignment, capture the assigned value in a group and then loop through the matches searching for special SQL words? (That's what I have right now, but it doesn't work because of the assignment part of the regex.)
Or search directly for SQL queries with a good regex?
PHP variables (specifically strings) can contain concatenations with other variables, double- and single-quoted strings, comments after the ";" or in the middle...
So what can I do?
So far, that's my variable regex part:
$regex_variable = '\\$([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)\s*[\+\-\*\/\%\.\&\|\^\<\>]*=\s*';
Which I concatenate with $regex_sql which I've tried different forms:
//$regex_sql = '(["\'])(.*?)\2\s*;';
//$regex_sql = '(["\'])([^;]*?)\2\s*;';
//$regex_sql = '(?<!")\b\w+\b|(?<=")\b[^"]+';
//$regex_sql = '([^;]+)(?<=["\']);(?!["\'])';
//$regex_sql = '(.*?;)[^\\$]*';
None of those correctly works.
Can you help me please? I'm sure the best approach is to match the whole variable and then test the assigned value for special SQL words like SELECT, WHERE, UNION, ORDER, ...
Many thanks in advance!
Mark.
edit:
To add that of course, variables with queries could have any kind of form. Those from above are just simple examples.
We're talking about things such:
$s = 'insert into tabletest(a,b,c) values('asd','r32r32','fdfdf')';
or
$where = 'where a=2';
$sql="select distinct * from test ".$where;
or
$a = '
select *
from users
left outer join ...
inner join ...
left join ...
where ...
group by ...
having ...
order by ...
limit ...
...
';
or
...
Imagine a lot of programmers creating queries inside the code, each doing it their own way... :\
I've to get ALL of them. At least, maximise the results... ^^'
I suggest you take a look at the PHP Tokenizer - you can use it to tokenize your source (i.e. parse it so it is easier to work with), then look through the tokens for strings and variables that match your requirements, knowing that each ; token ends a statement.
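A minimal sketch of the tokenizer approach, using token_get_all() from PHP's tokenizer extension. The function name findSqlStrings and the short keyword list are assumptions made for this example; a real scan would probably need more keywords and would have to follow concatenations.

```php
<?php
// Hypothetical helper: returns the string literals in PHP source that look like SQL.
function findSqlStrings(string $source): array
{
    $found = [];
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as ";" or "=" are plain strings;
        // everything else is an array of [id, text, line].
        if (!is_array($token)) {
            continue;
        }
        [$id, $text] = $token;
        // Plain quoted strings, and the literal parts of interpolated strings.
        $isString = $id === T_CONSTANT_ENCAPSED_STRING
            || $id === T_ENCAPSED_AND_WHITESPACE;
        if ($isString && preg_match('/\b(select|insert|update|delete)\b/i', $text)) {
            $found[] = $text;
        }
    }
    return $found;
}

// Nowdoc, so the sample source is not interpreted while building the string.
$source = <<<'PHP'
<?php
$where = 'where a=2';
$sql = "select distinct * from test " . $where;
$greeting = 'hello';
PHP;

print_r(findSqlStrings($source));
```

The tokenizer keeps the surrounding quotes in the token text, and fragments such as 'where a=2' are only found if their keywords are added to the list.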
Don't know if this is what you are looking for :
preg_match_all('/\$.*?=(.*?)(?<=[\'"]);/s', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[1];
This will store all the assignments in $result. I tested it with all your samples.
Sorry if you wanted something else.
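As a hedged illustration of how that pattern could be combined with the keyword check mentioned in the question, here is a small run on made-up sample source (the variable names $subject and $queries and the keyword list are assumptions):

```php
<?php
// Sample source to scan, kept literal with a nowdoc.
$subject = <<<'PHP'
$sql1 = "
SELECT *
FROM users u
WHERE u.id = 1;
";
$label = 'hello';
PHP;

// Capture everything between "=" and a ";" that directly follows a quote.
preg_match_all('/\$.*?=(.*?)(?<=[\'"]);/s', $subject, $result, PREG_PATTERN_ORDER);

// Keep only the assigned values that mention an SQL keyword.
$queries = array_values(
    preg_grep('/\b(select|insert|update|delete)\b/i', $result[1])
);

print_r($queries);
```

The semicolon inside the query is skipped because it is not preceded by a quote. For the same reason, an assignment that ends in a bare variable, such as the $sql2 sample ending in .$user_id;, is not closed at its own semicolon and the match runs on to a later one.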
Explanation :
"
\$ # Match the character “\$” literally
. # Match any single character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
= # Match the character “=” literally
( # Match the regular expression below and capture its match into backreference number 1
. # Match any single character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
['\"] # Match a single character present in the list “'\"”
)
; # Match the character “;” literally
"
