Eclipse: Find query strings spread across multiple lines - php

Say a file opened in Eclipse has the following string
$stmt = "select addr from
student
where id=123";
$stmtA = "alter table tablename";
$stmtB = " delete from student
where school=ABC";
$var1 = "This is not a query.
Just a string";
I need to find all query statements that affect the student table and the school column. Searching with:
(?s)"(.*?)"
gives me all the strings that are quoted and spread over multiple lines. How do I enhance the regex above so that it only returns results that contain
1) select or alter or insert or delete keywords of MySQL,
and
2) student and school keywords.
I think with above two conditions satisfied I will be able to extract strings that hit student table and school column. Any help?

(?s)".*?(?:select|alter|insert|delete).*?(?:student|school).*?"
Using [^"]*? instead of .*? would probably be better, though, since it cannot run past the closing quote.
Edit:
Let's switch to lookaheads, as they are quite a cool tool for enforcing conditions (such as string length, the presence of a special character, or something similar):
(?s)".*?(?:select|alter|insert|delete)(?=[^"]*?student)(?=[^"]*?school).*?"
OK, if you're not interested in regex you can stop here. Otherwise, as a further example of lookaheads, the keyword check can be moved into a lookahead as well (note: this is slower):
(?s)"(?=[^"]*?(?:select|alter|insert|delete))(?=[^"]*?student)(?=[^"]*?school).*?"
If you have access to atomic groups, it's always better to do this (atomic grouping):
(?>select|alter|insert|delete)
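because once one of the words has matched, the engine will not backtrack into the group to try the other alternatives (they all start with a different letter, so none of the others could match at that position anyway).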
Finally, I guess you could use if/then/else:
(?s)".*?(?:select|alter|insert|delete).*?(?:(student)|school)(?(1).*?school|.*?student).*?"
Or something similar.
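For anyone who wants to try the lookahead idea outside Eclipse, here is a minimal Python sketch. It is an illustration under assumptions, not the exact regex from the answer: it uses [^"]* so every check stays inside one pair of quotes, adds \b word boundaries, and requires both student and school. The sample text is the one from the question; the function name is made up for the example.

```python
import re

# Sample text from the question.
SOURCE = '''$stmt = "select addr from
student
where id=123";
$stmtA = "alter table tablename";
$stmtB = " delete from student
where school=ABC";
$var1 = "This is not a query.
Just a string";
'''

# Each lookahead scans forward from the opening quote but cannot cross
# the next quote, so all three conditions apply to the same string.
QUERY_STRING = re.compile(
    r'"'
    r'(?=[^"]*\b(?:select|alter|insert|delete)\b)'  # an SQL keyword
    r'(?=[^"]*\bstudent\b)'                         # the table name
    r'(?=[^"]*\bschool\b)'                          # the column name
    r'[^"]*"',
    re.IGNORECASE,
)

def find_query_strings(text):
    """Return every quoted string that satisfies all three conditions."""
    return QUERY_STRING.findall(text)

matches = find_query_strings(SOURCE)
for m in matches:
    print(m)
```

Note that a pattern like this cannot tell an opening quote from a closing one, so code sitting between two strings can be mistaken for string content. For a quick search in an editor that is usually acceptable.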

Related

PHP How to check if a String is contained in a text from the database using php

I'm creating a paraphrasing system, where a user inputs text and the system paraphrases for them.
My database looks like this:
KeyWord: dainty
Synonyms1: choice; delicious; tasty; juicy; luscious; palatable; savoury
Synonyms2: ethereal; beautiful; fragile; charming; petite; frail; elegant
where KeyWord (varchar), Synonyms1 (text), and Synonyms2 (text) are database columns. The example above is one row of the table, with 3 fields and their values.
This is how it works: if the system finds, for example, a word like tasty, it can be replaced by any of the semicolon-separated words from either Synonyms1 or Synonyms2, or by the keyword, because they are all synonyms.
Let me explain how the word search works. The system first searches for the word in the KeyWord column; if the word is not found there, it goes on to search the Synonyms1 column, and so on.
My problem is checking for the user's specific word in the Synonyms1 or Synonyms2 columns. When I use the LIKE clause, the generic way of searching the database, the system does not search for a whole word; it searches for a sequence of characters. For example, let's assume the writer's text is: "Benson has an ice cube". The system assumes that ice was found inside choice. I don't want that; I want to search for a whole word.
If anyone has understood me, please help to solve this.
If I understand your question, you want to search for ice in columns Synonyms1 and Synonyms2 but make sure you do not inadvertently match a word such as choice.
If you have ever read or heard anything on the subject of database normalization, you will realize that your database does not even meet the requirements for 1NF (first normal form), because it has columns that consist of repeating values, which, as you have found out, makes searching inefficient and difficult. But let's move on:
A synonym column might just contain one word, so it might look like:
ethereal
Or:
ethereal; beautiful; fragile; charming; petite; frail; elegant
Thus the word you are looking for might be:
the entire column value
preceded by nothing and followed by a ;
preceded by a space and followed by a ;
preceded by a space and followed by nothing
So if your version of MySQL does not support regular expressions and you are looking for, say, the word ice in column Synonyms2, the WHERE clause should be:
WHERE (
Synonyms2 = 'ice'
OR
Synonyms2 like 'ice;%'
OR
Synonyms2 like '% ice;%'
OR
Synonyms2 like '% ice'
)
If you are running MySQL 8+, then:
WHERE regexp_like(Synonyms2, '( |^)ice(;|$)')
This states that ice must be preceded by either a space or the start of the string and followed by either a ; or the end of the string.
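The same whole-word rule can be tried outside the database. The Python sketch below is only an illustration: the function names are invented, and it assumes synonyms are separated by "; " as in the sample row. The first function applies the same regular expression as the regexp_like call; the second splits the list and compares entries exactly.

```python
import re

def contains_word(synonyms, word):
    """Same rule as '( |^)ice(;|$)': space or start before, ';' or end after."""
    pattern = r"(?: |^)" + re.escape(word) + r"(?:;|$)"
    return re.search(pattern, synonyms) is not None

def contains_word_split(synonyms, word):
    """Alternative: split on ';' and compare whole entries."""
    return word in [entry.strip() for entry in synonyms.split(";")]

row = "choice; delicious; tasty; juicy; luscious; palatable; savoury"
print(contains_word(row, "ice"))    # "ice" only occurs inside "choice"
print(contains_word(row, "tasty"))
```

The two functions differ for multi-word entries: the regex version finds ice in "dry ice; cold" because the word is preceded by a space, while the split version only accepts ice as a complete entry.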

How to implement Full Text search in InnoDB?

I have a query,
e.g.
the name column has "Rodrigue Dattatray Desilva".
I want to write a query in such a way that
if I search for 'gtl' and its letters match anywhere in the string, the row is returned.
I know that in PHP I can build a pattern like '%g%t%l%'.
But I want to know the MySQL way.
Note: I can search for anything, I am just giving above an example.
EDIT:
create table Test(id integer, title varchar(100));
insert into Test(id, title) values(1, "Rodrigue Dattatray Desilva");
select * from Test where title like '%g%t%l%';
Consider the above case, where "gtl" is the string I am trying to find in the title, but the search string can be anything.
The letters of gtl exist in the current title in that order, but not next to each other.
The easy answer is that you need an extra wildcard:
select * from Test where title like '%g%t%l%';
The query you posted does not have a wild card after the 'l', so would only match if the phrase ended with 'l'.
The more complicated answer is that you can also use regular expressions, which give you more power over the search.
The even more complicated answer is that performance of these string matching queries tends to be poor - the wild cards mean that indexes are usually ineffective. If you have a large number of rows in your table, full-text searching is much faster.
You can do the same in MySQL too.
You can use the LIKE keyword in MySQL.
% - The percent sign represents zero, one, or multiple characters
_ - The underscore represents a single character
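Since the search string can be anything, the '%g%t%l%' pattern has to be built from user input. The Python sketch below shows one way to do that; it is an assumption-based illustration in which SQLite (from the standard library) stands in for MySQL, the function names are invented, and % and _ in the input are escaped with a backslash so they are matched literally.

```python
import sqlite3

def subsequence_like_pattern(term, escape="\\"):
    """Turn 'gtl' into '%g%t%l%', escaping LIKE wildcards in the input."""
    escaped = [escape + ch if ch in ("%", "_", escape) else ch for ch in term]
    return "%" + "%".join(escaped) + "%"

def search_titles(conn, term):
    """Return titles that contain the characters of term in order."""
    rows = conn.execute(
        "SELECT title FROM Test WHERE title LIKE ? ESCAPE '\\'",
        (subsequence_like_pattern(term),),
    )
    return [title for (title,) in rows]

# Table and row from the question, created in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (id INTEGER, title VARCHAR(100))")
conn.execute("INSERT INTO Test (id, title) VALUES (1, 'Rodrigue Dattatray Desilva')")

print(subsequence_like_pattern("gtl"))
print(search_titles(conn, "gtl"))
```

Passing the pattern as a bound parameter rather than concatenating it into the SQL text keeps the query safe from injection. The performance caveat above still applies, because a leading % prevents normal index use.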

php Insert to db where another column contains some words

I have a new question because I couldn't find an answer anywhere.
I have a db table which contains 4 columns. My bot inserts an array into one column. Now I have to fill the other columns.
The filled column contains site links. Example: www.dizipub.com/person-of-interest-1-sezon-2-bolum-izle
I need to take the "person-of-interest" part and write it to another column as something like "Person of Interest", and also "1-sezon-2-bolum" as "Sezon 1 - Bölüm 1".
I couldn't find how to do this with PHP rather than SQL. I need to do it with the bot. Can someone help me with this, please?
database
There is a column named bolumlink where I put the links. As I said, I need to take some words from these links. For instance:
the dizi column needs to be filled with "Pretty Little Liars" in the first 9 rows.
It can be done with an SQL UPDATE combined with LIKE, which lets you select rows by pattern-based search using wildcards:
% matches any number of characters, even zero characters.
_ matches exactly one character.
update your_table set dizi = 'Pretty Little Liars' where bolumlink like '%pretty-little-liars%'
NOTE:
Updating your database using LIKE without a limit or conditions on unique columns can be dangerous. This code might affect the whole table if an empty string is passed.
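If the values have to be derived in the bot rather than in SQL, the link itself can be parsed. The Python sketch below is a rough illustration that assumes every link ends in a slug of the form name-N-sezon-M-bolum, like the example in the question; the function name and the list of small words left in lower case are made up for the example.

```python
import re

# Assumed slug layout: <show-name>-<season>-sezon-<episode>-bolum...
SLUG = re.compile(r"(?P<show>.+?)-(?P<season>\d+)-sezon-(?P<episode>\d+)-bolum")
SMALL_WORDS = {"of", "the", "and", "a", "an", "in"}

def parse_link(link):
    """Return (show title, season/episode label), or None if the slug differs."""
    slug = link.rstrip("/").rsplit("/", 1)[-1]
    m = SLUG.match(slug)
    if m is None:
        return None
    words = m.group("show").split("-")
    # Capitalize each word, keeping small words lower case unless they come first.
    title = " ".join(
        w if i > 0 and w in SMALL_WORDS else w.capitalize()
        for i, w in enumerate(words)
    )
    label = f"Sezon {int(m.group('season'))} - Bölüm {int(m.group('episode'))}"
    return title, label

print(parse_link("www.dizipub.com/person-of-interest-1-sezon-2-bolum-izle"))
```

The returned values can then be written to the other columns with an ordinary parameterized UPDATE keyed on the row's link.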

How do I find records when data entry has been inconsistent?

A group of people have been inconsistently entering data for a while.
Some people will enter this:
101mxeGte - TS 200-10
And other people will enter this
101mxeGte-TS-200-10
The sad thing is, those are supposed to be identical records.
They will also search inconsistently. If a record was entered one way, some people will search the other way.
Now, I know all about how you can fix data entry for the future, but that's NOT what I am asking about. I want to know how it is possible to:
Leave the data alone, but...
Search for the right thing.
Am I asking for the impossible here?
The best thing I have found so far was a suggestion to simply muck about with the existing data, using the REPLACE function in MySQL.
I am uncomfortable with this option, as it means it will certainly actively piss off half of the users. The unfocused angst of all is less than the active ire of half.
The problem is that it has to go both ways:
Entering spaces in the query has to find both space and not-space entries,
and NOT entering spaces ALSO has to find both space and not-space entries.
Thanks for any help you can offer!
The "ideal" solution is pretty straightforward:
Decide what is the canonical way of representing a record
When someone saves a record, canonicalize it before saving
When someone searches for a record, canonicalize the input before searching for it
You could also write a small program to convert all existing data to the canonical form (you will have the code for it anyway, as "canonicalize" in steps 2 and 3 require that you write code that does so).
Edit: some specific information on how to canonicalize
With the sample data you give, the algorithm might be:
Replace all spaces with hyphens
Replace all runs of one or more hyphens with a single hyphen (a regex would be easiest for this -- actually, a regex can do both steps in one go)
Is there any practical problem with this approach?
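As a concrete illustration of the one-regex version of those two steps, here is a minimal Python sketch. It assumes that spaces and hyphens are the only separators that vary, as in the sample data; the function name is invented for the example.

```python
import re

def canonicalize(value):
    """Collapse every run of whitespace and/or hyphens into a single hyphen."""
    return re.sub(r"[\s-]+", "-", value.strip())

print(canonicalize("101mxeGte - TS 200-10"))
print(canonicalize("101mxeGte-TS-200-10"))
```

Both spellings from the question come out identical, so the same function can be applied to stored values and to search input.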
Trim whitespaces from BOTH the existing data and the input of the search. That way the intended record(s) will always be returned. Hope your data size is small, though, because it's going to perform pretty poorly.
Edit: by "existing data" I meant "the query of existing data". My answer was based on assumption that the actual data could not be touched (which might not be correct).
If it were up to me, I'd have the data in the database updated with REPLACE, and on future searches remove all spaces from the input before looking up the given row.
Presumably your users enter the search terms (or record details, when creating a record) in an HTML form, which then goes to a PHP script. It looks like your data can always be written in a way that contains no spaces, so why don't you do this:
Run a query that strips spaces from the existing data
Add code in the PHP script(s) that receives the form(s), so that it strips spaces from submitted data - whether that data is to be used for search or for writing new data.
Edit: I guess you would also need to change some spaces to hyphens. Shouldn't be too hard to write logic to accomplish that.
Something like this.
pseudo code:
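// mysql_real_escape_string() was removed in PHP 7; with mysqli or PDO, bind the value in a prepared statement instead.
$myinput = mysql_real_escape_string('101mxeGte-TS-200-10');
// Strip spaces and hyphens from both the column and the input before comparing.
$query = " SELECT * FROM table1
WHERE REPLACE(REPLACE(f1, ' ', ''),'-','')
= REPLACE(REPLACE('$myinput', ' ', ''),'-','') ";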
Alternatively you might write your own function to trim the data so it can be compared.
DELIMITER $$
CREATE FUNCTION myTrim(AStr VARCHAR(255)) RETURNS VARCHAR(255) DETERMINISTIC
BEGIN
DECLARE Result VARCHAR(255);
SET Result = REPLACE(AStr, ' ','');
SET Result = ......
.....
RETURN Result;
END$$
DELIMITER ;
And then use this in your select
$query = " SELECT * FROM table1
WHERE MyTrim(f1) = MyTrim('$myinput') ";
Have you ever heard of SQL's LIKE?
http://dev.mysql.com/doc/refman/4.1/en/string-comparison-functions.html
There's also regex:
http://dev.mysql.com/doc/refman/4.1/en/regexp.html#operator_regexp
101mxeGte - TS 200-10
101mxeGte-TS-200-10
How about this?
SELECT 'justalnums' REGEXP '101mxeGte[[:blank:]]*(\-[[:blank:]]*)?TS[[:blank:]-]*200[[:blank:]-]*10'
Digits can be represented by [0-9] and letters by [a-z], [A-Z] or [a-zA-Z].
Append a + to match one or more of them. Parentheses allow you to group, and even capture what is inside the parentheses and reuse it later in a replace or something else.
RLIKE is the same as REGEXP.
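A separator-tolerant pattern like the one above does not have to be written by hand for each search. The Python sketch below builds one from whatever the user typed; it is an illustration only, uses Python's \s where the MySQL pattern uses [[:blank:]], and the function name is invented.

```python
import re

def tolerant_pattern(term):
    """Build a regex that ignores spaces and hyphens between the term's parts."""
    tokens = [t for t in re.split(r"[\s-]+", term.strip()) if t]
    return re.compile(r"[\s-]*".join(re.escape(t) for t in tokens))

stored = ["101mxeGte - TS 200-10", "101mxeGte-TS-200-10", "101mxeGte-TS-200-11"]
pattern = tolerant_pattern("101mxeGte TS 200 10")
hits = [value for value in stored if pattern.fullmatch(value)]
print(hits)
```

The search works in both directions: a query typed with spaces finds hyphenated records and a hyphenated query finds spaced ones, without touching the stored data.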

Finding strings similar to a given one by keywords, where each keyword has its own 'power'

This question is a challenge for me; my friend can't tell me how to do it, and he is a really good programmer (I think).
Users can put sentences into the database. When a user enters a sentence, it is saved in the sentences table.
Next, the sentence is split into words, and the soundex of each word is saved into the tags table together with the id of the sentence it came from.
Last, the soundex of each word is put into the weights table; if the same soundex is already there, the function adds 1 to the counter for this soundex.
(For those who don't know: soundex is a function that returns a phonetic representation (the way it sounds) of a string.)
Structure of the database:
One table, sentences, contains two columns: id and sentence.
Another table, tags, contains id (which is the id of a sentence) and tag (which is one word from the sentence).
The tag isn't really the plain word, but the soundex of this word.
The last table, weights, contains tag and weight (which is a number that tells us how many tags like this there are in the tags table).
My question is: how can I write a function which returns sentences similar to a given string?
It should use tags (the soundex of each word), and each tag should have its own power based on the weights table.
Tags that are used often are more important than more original tags. Can it be done in just one MySQL query?
Next question: I think this way of looking for similar sentences is good, but what about the speed of this function?
I need to use it very, very often on my site.
Well instead of having a weights table, why don't you have a table that relates tags to sentences? So have a table called sentence_tags with a sentence_id and a tag_id column. Then you can compute the weights by doing a join on those two tables, and still reference back to the sentence that contains the tag. You may as well store both the tag and the soundex in the tags table, while you're at it.
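To make the join idea concrete, here is a small Python sketch using SQLite from the standard library as a stand-in for MySQL. Everything in it is an assumption for illustration: the sample sentences, the function name, the single sentence_tags table holding the tag text directly, and the use of plain lower-case words in place of soundex codes (Python's standard library has no soundex function). The weight of each tag is computed at query time by counting its rows, and a sentence's score is the sum of the weights of the tags it shares with the input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sentences (id INTEGER PRIMARY KEY, sentence TEXT);
CREATE TABLE sentence_tags (sentence_id INTEGER, tag TEXT);
""")

# Hypothetical sample data; each distinct word becomes one tag row.
samples = {1: "the cat sat", 2: "the dog sat", 3: "a bird flew"}
for sid, text in samples.items():
    conn.execute("INSERT INTO sentences VALUES (?, ?)", (sid, text))
    conn.executemany(
        "INSERT INTO sentence_tags VALUES (?, ?)",
        [(sid, word) for word in set(text.split())],
    )

def similar(query):
    """Return (sentence_id, score) pairs, best match first."""
    tags = sorted(set(query.lower().split()))
    if not tags:
        return []
    placeholders = ",".join("?" * len(tags))
    sql = f"""
        SELECT st.sentence_id, SUM(w.weight) AS score
        FROM sentence_tags st
        JOIN (SELECT tag, COUNT(*) AS weight
              FROM sentence_tags GROUP BY tag) w ON w.tag = st.tag
        WHERE st.tag IN ({placeholders})
        GROUP BY st.sentence_id
        ORDER BY score DESC, st.sentence_id
    """
    return conn.execute(sql, tags).fetchall()

print(similar("the cat"))
```

This runs as a single query, which is what the question asks for. How fast it is on a large table would depend on indexing sentence_tags.tag and would need to be measured.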
Perhaps the Levenshtein distance is what you are looking for. It calculates the number of single-character edits (insertions, deletions and substitutions) needed to transform one word into another.
Do realize this is a costly operation.
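For reference, a minimal Python sketch of the Levenshtein distance using the usual dynamic-programming table, kept to two rows at a time. It is a generic textbook version, not tied to the asker's schema, and it shows where the cost comes from: the work grows with the product of the two string lengths.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))
```

Comparing one input against every stored sentence this way means one such computation per row, which is why it scales poorly for frequent lookups.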
Joe K's suggestion seems spot on for good database design.
Do not store information that can be extrapolated.
Meaning, use the join statement and PHP to calculate the weight at run-time.
I understand this may not be the right solution for your design, but often a few moments spent on smart database structure design will make everything work that much better.
