I have a database with a keywords column.
I need to search the database based on the query entered by the user.
Every keyword has the word "outlet" at the end, but the user will only search for "gul ahmad", not "gul ahmad outlet". For this I used the following query, which worked fine and returned the complete result "Gul Ahmad Outlet":
$sql = "SELECT keywords FROM table WHERE keywords REGEXP '([[:blank:][:punct:]]|^)$keyword([[:blank:][:punct:]]|$)'";
Now I have 2 issues:
1. If the word "outlet" is in between the query words, the keyword is not found. E.g. if the user searches for "kohistan lahore" and the database has an outlet named "kohistan outlet lahore", the query does not find the keyword and returns an empty result. How can I tell the database to allow "outlet" in between, at the start or at the end when matching?
2. If a user searches for "nabeel's outlet", the database has it, but because of the " ' " the query returns an empty result.
What you can do is match your column values against just the first word of your search expression (e.g. nabeel's outlet). I believe this way you will be able to cover all your scenarios.
select
*
from `outlets`
where REPLACE(`name`,'\'','') regexp SUBSTRING_INDEX('nabeels outlet', ' ', 1)
Look at this fiddle and test yourself : http://sqlfiddle.com/#!9/b3000/21
Hope it helps.
Much simpler: [[:<:]]$keyword[[:>:]] -- This checks for "word boundary" instead of space or punctuation or start/end of string. And $keyword = "nabeel's" should not be a problem.
Don't you want to always tack on "outlet"?
REGEXP "[[:<:]]$keyword[[:>:]] outlet"
And, yes, you must escape certain things, such as the quotes that will be used to quote the regex string. PHP's addslashes() is one way.
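Both issues from the question can be handled together by building the pattern in PHP before it goes into the query. The following is only a sketch: the helper names escapeMysqlRegex() and buildOutletPattern() are made up for illustration, and it assumes the older MySQL regex library that understands [[:<:]] and [[:>:]] (MySQL 8.0 switched to ICU, where \b is the word-boundary marker instead). Each word is escaped, and an optional "outlet" is allowed between any two words.

```php
<?php
// Escape characters that are special in MySQL's POSIX-style REGEXP syntax.
function escapeMysqlRegex(string $text): string
{
    return preg_replace('/[\\\\^$.|?*+()\[\]{}]/', '\\\\$0', $text);
}

// Build a whole-word pattern that tolerates an "outlet" between the user's words.
function buildOutletPattern(string $userQuery): string
{
    $words = preg_split('/\s+/', trim($userQuery), -1, PREG_SPLIT_NO_EMPTY);
    $escaped = array_map('escapeMysqlRegex', $words);
    // Words are separated by whitespace, optionally with "outlet" in between.
    $body = implode('[[:space:]]+(outlet[[:space:]]+)?', $escaped);
    return '[[:<:]]' . $body . '[[:>:]]';
}

echo buildOutletPattern('kohistan lahore'), PHP_EOL;
// prints [[:<:]]kohistan[[:space:]]+(outlet[[:space:]]+)?lahore[[:>:]]
```

The resulting string is best passed as a bound parameter (for example WHERE keywords REGEXP ? with PDO or mysqli), which also takes care of the apostrophe in "nabeel's" without manual quoting.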
I've read quite a few similar posts but none solves my case, which could well be because of my lack of sufficient knowledge, so please bear with me.
One of the search options in my terminological dictionary is "whole words only". At first I was using
WHERE ".$source." RLIKE '[[:<:]]".$keyword."[[:>:]]'
However, this failed to match whole words for the first or second $keyword when there is more than one. Then I found
WHERE ".$source." REGEXP '[[:<:]]".$keyword."[[:>:]]'
and
WHERE ".$source." REGEXP '(^| )".$keyword."( |$)'
while searching these forums.
I just tested both of the above in my PhpMyAdmin and found out that the former executes in 0.0740 seconds, while the latter takes twice as long, 0.1440 seconds, so I guess I should stick with the former.
What bothers me the most is the huge discrepancy in results, e.g. searching for a single word ("tool"):
Using the [[:<:]] and [[:>:]] word boundary in PhpMyAdmin returns 34 results.
Using (^| ) and ( |$) in PhpMyAdmin returns 26 results.
Running the #1 regexp in my PHP script returns 34 results (this is the correct number).
Here's the whole MySQL block:
foreach($keywords as $keyword) {
$query = $db->query("SELECT * FROM ".DICTIONARY_TABLE." " .
"JOIN ".DICTIONARY_THEMES." ON ".DICTIONARY_TABLE.".theme_id = ".DICTIONARY_THEMES.".theme_id ".
"LEFT JOIN ".DICTIONARY_DEFINITIONS." ON ".DICTIONARY_TABLE.".term_id = ".DICTIONARY_DEFINITIONS.".term_id ".
"WHERE ".DICTIONARY_TABLE.".".$source." REGEXP '(^| )".$keyword."( |$)'".
//"WHERE ".DICTIONARY_TABLE.".".$source." REGEXP '[[:<:]]".$keyword."[[:>:]]'".
" ORDER BY ".DICTIONARY_TABLE.".theme_id, ".DICTIONARY_TABLE.".".$source."");
}
I've commented out the search option I'm not using.
Now, if I try TWO keywords, e.g. "cutting tool", I still get 34 results in the page. I'm unsure if I'm doing this right in PhpMyAdmin:
SELECT * FROM `asphodel_dictionary_terms` WHERE english REGEXP '[[:<:]]cutting[[:>:]]';
SELECT * FROM `asphodel_dictionary_terms` WHERE english REGEXP '[[:<:]]tool[[:>:]]'
This returns 44 results for "cutting" and 34 results for "tool". The query using (^| )... returns 37 + 26 results respectively.
Any feedback that would help me sort things out would be appreciated!
The database contains a total of 109,000 entries in the main table, there are 82 themes in the DICTIONARY_THEMES table and 727 entries in the DICTIONARY_DEFINITIONS table. Not a huge database and it won't grow much bigger.
You are getting different results because the two regexes are not identical.
(^| ) means : either the beginning of the string or a space (( |$) has the same meaning at end of string).
[[:<:]] and [[:>:]] are word boundaries : conceptually this refers to characters that separate words, and usually regex engines interpret it as something like : anything but a digit, a letter or an underscore.
So basically the first pattern is more restrictive than the second (space, beginning and end of string are word boundaries, but there are others).
If you have more than one keyword to search for, you would need to repeat the regex matches, like :
WHERE
".$source." RLIKE '[[:<:]]".$keyword1."[[:>:]]'
OR ".$source." RLIKE '[[:<:]]".$keyword2."[[:>:]]'
Or create a new regex by combining the keywords :
WHERE
".$source." RLIKE '[[:<:]](".$keyword1."|".$keyword2.")[[:>:]]'
NB: for this kind of search requirement, you should consider using MySQL Full-Text Search, which is built primarily for searching for whole words (there are prerequisites, though).
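Building on the combined-regex idea above, a small helper can assemble the pattern for any number of keywords. This is a sketch only: wholeWordPattern() is a hypothetical name, and the escaping list assumes MySQL's POSIX-style regex syntax. Note that the loop in the question appears to overwrite $query on every iteration, so only the last keyword's results survive, which would explain why "cutting tool" still gives 34 results; a single pattern needs only one query.

```php
<?php
// Build one whole-word pattern that matches any of the given keywords.
function wholeWordPattern(array $keywords): string
{
    $escaped = array_map(
        // Escape regex metacharacters so keywords are matched literally.
        fn(string $k): string => preg_replace('/[\\\\^$.|?*+()\[\]{}]/', '\\\\$0', $k),
        $keywords
    );
    // The alternation sits inside one group so both boundaries apply to every keyword.
    return '[[:<:]](' . implode('|', $escaped) . ')[[:>:]]';
}

echo wholeWordPattern(['cutting', 'tool']), PHP_EOL;
// prints [[:<:]](cutting|tool)[[:>:]]
```

This matches rows containing either word. To require all words, one REGEXP condition per keyword joined with AND would be needed instead.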
I have not been able to find a question close enough to what I am asking, so here is my problem:
I have a list of blacklisted words stored in a MySQL table. Then I have a sentence. I need to construct a MySQL query that searches for an occurrence of any of the blacklisted words in the sentence.
If there is just one match, the search may stop, as the sentence is not acceptable.
Can anyone help me construct this query? Thanks!
Edit
If possible, I would like to avoid jumping between PHP and MySQL. I can have two thousand or more blacklisted words. I would like to submit my file as a string/variable to MySQL, not build a table from it.
The closest one line SQL I get is:
SELECT keyword, STRCMP('this is my sentence with blacklisted word', keyword) FROM blacklist;
Maybe, my line goes in a good direction and can be improved to simply return TRUE or FALSE if a match was found?
You have to break your sentence into words and check whether your table contains any of those words by using IN. If you use PHP, you can do something like this:
$expression = "is there any blacklisted word here";
$words = str_word_count($expression, 1);
// Quote and escape each word so that the IN (...) list is valid SQL.
$quoted = array();
foreach ($words as $word) {
$quoted[] = "'" . mysqli_real_escape_string($db_conn, $word) . "'";
}
$sql = mysqli_query($db_conn, "select word from table_black_list_word where word in (" . implode(",", $quoted) . ")");
if ($row = mysqli_fetch_array($sql)) {
// the expression contains a blacklisted word
} else {
// the expression does not contain any blacklisted word
}
Assuming that you have a table with all blacklisted words you could construct a list of words from the sentence in php and pass it as an argument in the where clause query.
$words = str_word_count($string, 1);
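// addslashes() keeps words with an apostrophe (e.g. "don't") from breaking the quoting
$whereclause = join("','", array_map('addslashes', $words));
$whereclause = "('".$whereclause."')";
$query = "Select COUNT(words) from blacklisttable where words IN ".$whereclause;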
Then you can check if the result is equal to zero.
How about using mysql WHERE and LIKE :
WHERE
(
`sentence` LIKE '%blist1%'
OR `sentence` LIKE '%blist2%'
OR `sentence` LIKE '%blist3%'
OR `sentence` LIKE '%blist4%'
)
Now, using PHP, you can generate the WHERE statement from the blacklist array:
$blackList = array('blist1', 'blist2', 'blist3', 'blist4');
$conditions = array();
foreach ($blackList as $word) {
// addslashes() guards the quotes; % and _ inside a word still act as LIKE wildcards
$conditions[] = "`sentence` LIKE '%" . addslashes($word) . "%'";
}
$whereStatement = implode(" OR ", $conditions);
$query = "WHERE ( $whereStatement )";
After a lot of experimentation, I have found an answer to my own question:
SELECT SUM( 'this is my windows xp file' LIKE CONCAT('%', keyword, '%')) AS result FROM blacklist;
No need for multiple queries or to preprocess anything in PHP or multiple jumps between MySQL and PHP.
You can do it with the LOCATE function:
SELECT * FROM blacklist WHERE locate(keyword, 'the sentence') > 0
To search for the expression inside a sentence, you can just use a wildcard before and after the blacklisted word. Assuming your sentence is all in one column, you can filter for it in the where clause.
Try this:
SELECT *
FROM myTable
WHERE sentenceColumn LIKE '%blacklistedWord%'
SQLFiddle example.
EDIT
I'm sorry OP, I think I misread your question. I see now that you wanted to look for any number of blacklisted words in a sentence.
The following query pulls all sentences that have one of the blacklisted words inside them. However, the join produces one row per match. In other words, if a sentence has three blacklisted words, three rows will be returned. To correct that, you can group by sentence (or sentence id, whatever matches your table).
SELECT sentences.*
FROM sentences
JOIN blacklisted ON sentences.sentence LIKE CONCAT('%', blacklisted.word, '%')
GROUP BY sentences.id;
Here is an updated SQL fiddle. You'll notice that this checks for any sentence with the word 'blacklisted' or 'this'. Four rows should be returned.
I'm trying to match multiple words in Lucene as I could do in MySQL.
It's harder than I thought:
written in PHP:
my query for perfect match is:
$words = explode(" ", $words);
(text:(' . implode(" ", $words) . ')
but if the text is "a bunch of words I wrote", it won't match until I have typed everything.
Is there any way to force Lucene to behave exactly like MySQL's LIKE "%a bunc%" and retrieve the whole phrase?
Thanks in advance
EDIT:
I'm not using Lucene directly; I use Solr as a REST service. So I'm looking for the "plain grammar" to solve this problem, like select?q=*:* where the query is *:* (select all). If I have many words in the text field, as described above, I can't find any way to treat them as a single word.
If it were a single word, I could write "text:(beginningOfW*)" and it would find it.
If there are multiple words and I write "text:(beginning Of W*)", it only finds words beginning with W and ignores the other words.
Yes, Lucene is a full-text search engine API. I think you are looking for WildcardQuery:
Term term = new Term("whichField", "searchString");
Query query = new WildcardQuery(term);
Hits hits = indexSearcher.search(query);
Thanks for taking time to read my question.
I have created a MySQL table, an HTML form and a PHP program which connects the form to the MySQL table and retrieves sequences for the column Annotations, which has the text data type.
This column has characters and also has one or more of hyphen, comma, parentheses, period or spaces.
Please look at the following code that I used for select query:
$values=mysql_query("SELECT Sequence
FROM oats
WHERE Foldchange = '$Foldchange' AND
RustvsMockPvalue = '$RustvsMockpvalue' AND
Annotations REGEXP '%$Annotation%[-]+'");
Here $Annotation is the form variable which holds the value entered by the user in the form. Annotations is the column name in the MySQL table.
Annotations column has characters A-Z or a-z and one or more of hyphen, comma, space or parentheses like the following.
Sequence is another text column in the MySQL table but does not have ,./().
Example data from Annotations column:
ADP, ATP carrier protein, mitochondrial precursor (ADP/ATP translocase) (Adenine nucleotide translocator) (ANT).
I am not able to retrieve Sequence column data when I search for any Annotations column data containing a comma, parentheses, period or slash. It works fine for records which do not have these characters: ,.()/
I tried to use LIKE instead of REGEXP but it didn't work either.
A record from mysql table:(columns that you see below: contigid,source,genelength,rustmeans, mockmeans,foldchange,pvalue,rustmockteststatistic,Annotations and Sequence)
as_rcr_contig_10002 ORME1 2101 506.33 191 -2.18 2.21E-10 -6.35 Tesmin/TSO1-like, CXC domain containing protein. AACAATTCCCCTCAACCAACCTTTTATTTCATCCCATTTTTATCATCTGTCCGGTTACAGATTTTGCTTCCAGTTAGGTGCCACTTCTTCAAACGCTCAACCCTTACCCACTACCACCCCACCAAAACCAACCCCCCAAGATGCAGTTCATCACTCTCGCCGTTGCTTTTGCTTTCTTTGCTGGTGCCANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCTTTTGCTTTCTTTGCTGGTGCCACCTCGTCGCCGGTTTCCATGGACCCCAAAGCCGAGAAGTCCGGCTCCTCGGGATCCGGTGGCGCCCCTCTGGGCACTGCTAGCCCCTATCCCCAAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGTGGCCCTCAGTCGCCAGGCTCTGGCCAACCCGGTAGGATGCCATGGGGTAGCGACCAATCTGCCTACGGTGGTGGTTTCCCTTATGGATCATTCCCCTCGGTTTCGGGGCAATCCCAATCGACGGCCTATGCTCAAGCTCAATCATCCAGTTTCCCCTCAAACGGTGTCCCGACACACTCCTCGGCCTCCGCCCAAGCGCAATCATCCGGTCCTGGACAAGCTCAGGCAGCCGCTTCTGCCCAGGTTCCCGGCGGCCCCCACGGTCAAGGTTCTAACGGATTTGGCGCACAAGGCCAGTTTGGACAGAACGGGCAGAACGGCCTCTATGGTCAAGACGGCAATGGCTTTAGTGCCCAAGGCCAATTTGGACAGAGTGGACAGAATGGCTTCTATGGTCA
Could someone please help me with the correct syntax for the SELECT query? Thank you.
You need to familiarise yourself with regex - it's its own little language.
Use REGEXP with the right regex:
WHERE ...
AND Annotations REGEXP '[-A-Za-z(). ]+'
AND Annotations NOT REGEXP '[A-Za-z]+'
If mysql supported regex look aheads, this could be done in one test.
First of all, you are not using REGEXP properly.
You should check the differences between LIKE and REGEXP.
REGEXP uses regular expressions, which have a very particular syntax.
LIKE uses simple text matching with wildcard characters such as % or _.
Here you are using REGEXP with %, which is why it's not working: % is a wildcard for LIKE only.
In REGEXP, however, . and - (the latter inside a bracket expression) are special characters that you need to escape.
If you want to check for several characters, REGEXP is the way to go:
Annotations REGEXP '.*$Annotation.*[\-(),\.]+.*'
This matches:
.* : 0 to n characters
$Annotation : Your keyword
.* : 0 to n characters
[\-(),\.]+ : At least 1 character from the list : - ( ) , .
.* : 0 to n characters
Tell us if that matches your data.
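If a plain substring match is all that is needed, LIKE with a bound parameter avoids regex escaping altogether, and commas, parentheses, periods and slashes in the search text are matched literally. A minimal sketch, assuming a PDO connection; escapeLike(), containsPattern() and findSequences() are illustrative names, not part of the original code:

```php
<?php
// Escape the LIKE wildcards % and _ (and the backslash escape character itself).
function escapeLike(string $text): string
{
    return addcslashes($text, '\\%_');
}

// Wrap the escaped text in % so that LIKE matches it anywhere in the column.
function containsPattern(string $text): string
{
    return '%' . escapeLike($text) . '%';
}

// Hypothetical lookup: returns the Sequence values of all matching rows.
function findSequences(PDO $pdo, string $foldchange, string $pvalue, string $annotation): array
{
    $stmt = $pdo->prepare(
        'SELECT Sequence FROM oats
         WHERE Foldchange = ? AND RustvsMockPvalue = ? AND Annotations LIKE ?'
    );
    // Bound parameters mean quotes in the input never reach the SQL text.
    $stmt->execute([$foldchange, $pvalue, containsPattern($annotation)]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

With this, a search for "ATP carrier protein, mitochondrial precursor (ADP/ATP translocase)" is sent to MySQL as a single literal pattern, punctuation included.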
Since we can't craft a regular expression that would work in your case without getting into some crazy matching schemes (ordering and so forth), you'll need to construct the SQL statement yourself to find what you're looking for. Luckily, you're using PHP.
Here I'm starting with a simple space-delimited entry. Remember that you can't wrap something in parentheses, because the parentheses might not match up in your result set.
$search_input = 'ADP ANT';
//example of array from a search page full of check boxes or fields
$annSearches = explode(' ',$search_input);
/* $annSearches is now an array containing ADP and ANT */
$sql = "SELECT Sequence FROM oats WHERE Foldchange = '$Foldchange' AND RustvsMockPvalue = '$RustvsMockpvalue'";
foreach ($annSearches as $Annotation){
$sql .= " AND Annotations LIKE '%$Annotation%'";
}
The output SQL statement would look like this (wrapped for clarity):
SELECT Sequence FROM oats WHERE
Foldchange = '$Foldchange'
AND RustvsMockPvalue = '$RustvsMockpvalue'
AND Annotations LIKE '%ADP%'
AND Annotations LIKE '%ANT%';
If you build a really long query, this will get slower and slower, as MySQL has to run through every record in the table over and over to produce the results.
FULLTEXT SEARCH OPTION
Another way that you could potentially do this is to enable FULLTEXT search functionality on the Annotations field in the table in the database.
ALTER TABLE oats ADD FULLTEXT(Annotations);
This would allow you to do a search something like this:
SELECT Sequence FROM oats WHERE
Foldchange = '$Foldchange'
AND RustvsMockPvalue = '$RustvsMockpvalue'
AND MATCH(Annotations) AGAINST ('ADP ANT')
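The FULLTEXT query above uses the default natural-language mode, which returns rows containing any of the words, ranked by relevance. To require every word, as the LIKE version does, boolean mode with a + prefix can be used. A sketch under assumptions: booleanModeQuery() is a hypothetical helper, and short terms such as ADP or ANT may fall below the server's minimum indexed word length (ft_min_word_len defaults to 4 for MyISAM, innodb_ft_min_token_size to 3 for InnoDB), in which case they are not indexed at all.

```php
<?php
// Turn user input such as "ADP ANT" into "+ADP +ANT" so that every term is required.
function booleanModeQuery(string $input): string
{
    $terms = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);
    // Remove characters that act as operators in boolean mode.
    $terms = array_map(
        fn(string $t): string => preg_replace('/[+\-<>()~*"@]/', '', $t),
        $terms
    );
    // Drop terms that became empty after stripping the operators.
    $terms = array_filter($terms, fn(string $t): bool => $t !== '');
    return implode(' ', array_map(fn(string $t): string => '+' . $t, $terms));
}

echo booleanModeQuery('ADP ANT'), PHP_EOL;
// prints +ADP +ANT
```

The returned string would then be bound to a placeholder, e.g. MATCH(Annotations) AGAINST (? IN BOOLEAN MODE).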
I think I'm going crazy with this...
I've tried a lot of combinations and I can't find the right one.
I need to find all the SQL queries in a PHP code after having read it with a file_get_contents().
Of course, all those queries are variable assignments like:
$sql1 = "
SELECT *
FROM users u
WHERE u.name LIKE '%".$name."%' AND ... ;
";
or
$sql2 = "
SELECT *
FROM users u
WHERE u.id = ".$user_id;
or
$sql3 = '
SELECT *
FROM users u
ORDER BY u.surname1 DESC
'; //this query blablabla.......
So you can see that there are many factors to take into account for PHP variables.
First I tried an approach based on matching the variable itself combined with matching its content...
I also tried to find specific SQL words with a regex pattern...
Whatever...
I don't know how to do it.
Matching every variable and its assignment, capturing the assigned value and then looping through the matches searching for special SQL words? (That's what I have right now, but it doesn't work because of the assignment part of the regex.)
Directly searching for SQL queries with a good regex?
PHP variables (specifically strings) can contain concatenations with other variables, double- and single-quoted strings, comments after the ";" or in the middle...
So what can I do?
So far, this is the variable part of my regex:
$regex_variable = '\\$([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)\s*[\+\-\*\/\%\.\&\|\^\<\>]*=\s*';
Which I concatenate with $regex_sql, for which I've tried different forms:
//$regex_sql = '(["\'])(.*?)\2\s*;';
//$regex_sql = '(["\'])([^;]*?)\2\s*;';
//$regex_sql = '(?<!")\b\w+\b|(?<=")\b[^"]+';
//$regex_sql = '([^;]+)(?<=["\']);(?!["\'])';
//$regex_sql = '(.*?;)[^\\$]*';
None of those works correctly.
Can you help me please? I'm sure the best approach is to match the whole variable assignment and then test the assigned value for special SQL words like SELECT, WHERE, UNION, ORDER, ...
Thanks so much in advance!
Mark.
edit:
I should add that, of course, variables with queries could take any form. The ones above are just simple examples.
We're talking about things such as:
$s = 'insert into tabletest(a,b,c) values('asd','r32r32','fdfdf')';
or
$where = 'where a=2';
$sql="select distinct * from test ".$where;
or
$a = '
select *
from users
left outer join ...
inner join ...
left join ...
where ...
group by ...
having ...
order by ...
limit ...
...
';
or
...
Imagine a lot of programmers creating queries inside the code, each doing it their own way... :\
I have to get ALL of them. Or at least maximise the results... ^^'
I suggest you take a look at the PHP Tokenizer - you can use it to tokenize your source (i.e. parse it so it is easier to work with), then look through the tokens for strings and variables that match your requirements, knowing that each ; token ends a statement.
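As a rough illustration of the tokenizer approach, the sketch below walks the tokens returned by token_get_all() and keeps string literals that contain common SQL keywords. The function name findSqlStrings() and the keyword list are assumptions for the example; a real scan would probably need a longer list and some handling of concatenated pieces.

```php
<?php
// Collect string literals from PHP source that contain common SQL keywords.
function findSqlStrings(string $source): array
{
    $found = [];
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as ";" or "=" are plain strings; skip them.
        if (!is_array($token)) {
            continue;
        }
        [$id, $text] = $token;
        // T_CONSTANT_ENCAPSED_STRING: a quoted string without interpolation.
        // T_ENCAPSED_AND_WHITESPACE: the literal parts of an interpolated string.
        $isString = $id === T_CONSTANT_ENCAPSED_STRING || $id === T_ENCAPSED_AND_WHITESPACE;
        if ($isString && preg_match('/\b(select|insert|update|delete|where)\b/i', $text)) {
            $found[] = $text;
        }
    }
    return $found;
}

$source = '<?php $a = "select * from users where id = 1"; $b = "hello";';
print_r(findSqlStrings($source));
```

Because the tokenizer already knows where strings and comments begin and end, trailing comments and mixed quote styles do not need special handling here. The source passed in must include the opening <?php tag, as the contents of a file read with file_get_contents() normally would.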
Don't know if this is what you are looking for:
preg_match_all('/\$.*?=(.*?)(?<=[\'"]);/s', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[1];
This will store all the assignments in $result. I tested it with all your samples.
Sorry if you wanted something else.
Explanation :
"
\$ # Match the character “\$” literally
. # Match any single character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
= # Match the character “=” literally
( # Match the regular expression below and capture its match into backreference number 1
. # Match any single character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
['\"] # Match a single character present in the list “'\"”
)
; # Match the character “;” literally
"