So let me give you some background: I have a blog system backed by a database, which holds the titles of the articles. I have tried to create a "related news" feature, and it is rather basic: it takes the title of the current page and splits it up like so (note that $row_object_title is just the title pulled from the database):
$row_object_title_lower = strtolower($row_object_title);
$keywords = explode(" ",$row_object_title_lower);
I then run it through my function:
exclude_conjuctions($keywords);
The code for that function (it looks for certain words and removes them from the array):
function exclude_conjuctions($array){
global $keywords_new;
$keywords_new = $array;
$conjuctions = array("here","to","and","but","or","nor","for");
$counter = count($keywords_new);
foreach($conjuctions as $conjuction){
// $i < $counter: the last valid index is $counter - 1
for($i=0;$i < $counter;$i++){
// isset() skips indexes already removed by an earlier pass
if (isset($keywords_new[$i]) && $keywords_new[$i] == $conjuction){
unset($keywords_new[$i]);
}
}
}
return $keywords_new;
}
So now I build my query to retrieve all articles that have the keywords in the title:
$sql = '';
foreach ($keywords_new AS $keyword)
{
if ($sql != '')
$sql .= ' OR ';
$sql .= "object_title LIKE '%$keyword%'";
}
$squery = 'SELECT object_title FROM example_table WHERE '.$sql;
It seems to be working okay, but there are times when it returns a title which does not share any words with the current article. I investigated, and it seems it matches parts of words, which is of course not what we want. If you are confused, take a look at this image:
http://puu.sh/7UhhW.jpg
Note how I search for "dow" and those letters are found in both the current title and the retrieved titles. I need it to return only related articles that have the full words in the title, not parts of words. What am I doing wrong? Maybe my MySQL query needs to be changed? Maybe there is a better solution? I would love some help.
This is a problem as you can imagine.
Thanks for the help in advance.
Try doing LIKE '% {$keyword} %'
Also, your query is vulnerable to SQL injection. See:
How can I prevent SQL injection in PHP?
EDIT : A better way to do this would be using a Regular Expression:
REGEXP '[[:<:]]{$keyword}[[:>:]]'
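instead of LIKE. Note that [[:<:]] and [[:>:]] work in MySQL versions before 8.0.4; from 8.0.4 on, the word-boundary marker is \b (written '\\b' inside a SQL string literal).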
Try using the === operator instead of the == operator to compare strings. A good reason why can be found here
Also, you are wrapping your query with % on each side. That says to return all matches that CONTAIN those strings. Thus 'Dow' is contained in 'Down' and would be returned. You probably want to add a space around the %'s to only get matches that equal your keywords.
Could you implement the "LIKE" search to include a preceding and succeeding space? You would possibly need to have three conditions though to cater for words at the start and end of sentence:
$sql .= "object_title LIKE '% $keyword %' OR object_title LIKE '$keyword %' OR object_title LIKE '% $keyword'";
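Putting the suggestions above together (space-delimited LIKE patterns, plus not interpolating keywords into the SQL), a minimal sketch could look like the following. buildWholeWordFilter() is a hypothetical helper name, the extra "= ?" condition for a title that consists of the keyword alone is my own addition, and the PDO usage in the comments is an assumption about how the query is executed.

```php
<?php
// Sketch: build a whole-word title filter with placeholders instead of
// interpolating keywords into the SQL string.
// buildWholeWordFilter() is a hypothetical helper, not part of the original code.
function buildWholeWordFilter(array $keywords, string $column = 'object_title'): array
{
    $clauses = [];
    $params  = [];
    foreach ($keywords as $keyword) {
        if ($keyword === '') {
            continue; // skip empty strings left over from double spaces
        }
        // Escape LIKE wildcards so a keyword such as "50%" is matched literally.
        $kw = addcslashes($keyword, '%_\\');
        // Word in the middle, at the start, at the end, or the whole title.
        $clauses[] = "($column LIKE ? OR $column LIKE ? OR $column LIKE ? OR $column = ?)";
        array_push($params, "% $kw %", "$kw %", "% $kw", $keyword);
    }
    return [implode(' OR ', $clauses), $params];
}

[$where, $params] = buildWholeWordFilter(['dow', 'jones']);
// With PDO this would be used roughly as:
//   $stmt = $pdo->prepare("SELECT object_title FROM example_table WHERE $where");
//   $stmt->execute($params);
```

Two caveats: the space-based patterns will not match a word followed by punctuation ("Dow,"), and an empty keyword list yields an empty WHERE clause, so check for that before running the query.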
I've got this code:
function searchMovie($query)
{
$this->db->where("film_name LIKE '%$query%'");
$movies = $this->db->get ("films", 40);
if($this->db->count > 0)
{
return $movies;
}
return false;
}
JavaScript code on my form's submit button strips all special characters (; : ' / etc.) from the query string and then redirects the user to the search URI (szukaj/query). So, for example, if film_name is Raj: wiara and the user searches for raj: wiara, the query becomes raj wiara and the user gets no results. I was thinking about exploding the query into single words and then running a SELECT for each word, but that would return the same movie multiple times. I don't want to change the JavaScript code, and I don't think I can store the film names without special characters like :.
Or maybe create another column in db for film_keywords and add there all words of movie separated by , or something and then search this column?
MySQL's Full Text Search functions are your friend here:
http://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html
It returns a series of matches with a relevance score, so you can return them in best-match order.
Warning: $this->db->where("film_name LIKE '%$query%'"); is open to SQL injection. Anyone can circumvent the JavaScript, so you must always clean up input server-side. This is best done with the DB library's own escaping or binding functions, not just by stripping characters, so check what the library you are using provides.
You could indeed explode your string, using this answer's solution.
function searchMovie($query)
{
$queries = preg_split('/[^a-z0-9.\']+/i', $query);
foreach ($queries as $keyword){
$this->db->where("film_name LIKE '%$keyword%'");
}
$movies = $this->db->get ("films", 40);
if($this->db->count > 0)
{
return $movies;
}
return false;
}
This will create multiple AND conditions for your db where, so the result will be filtered.
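As a rough sketch of the same idea without string interpolation: split the search text into words, then build one AND condition per word with a placeholder. buildFilmSearch() is a hypothetical helper, and how the resulting clause and parameters are passed to your db class is left open, since that depends on the library.

```php
<?php
// Sketch: split the search text into words and require every word (AND).
// buildFilmSearch() is a hypothetical helper; the column name comes from the question.
function buildFilmSearch(string $query, string $column = 'film_name'): array
{
    // Split on anything that is not a letter or digit (Unicode-aware),
    // so "raj: wiara" and "raj wiara" produce the same words.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $query, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        $words = []; // invalid UTF-8 input
    }
    $words = array_values(array_unique($words));

    $clauses = [];
    $params  = [];
    foreach ($words as $word) {
        $clauses[] = "$column LIKE ?";
        $params[]  = '%' . $word . '%';
    }
    return [implode(' AND ', $clauses), $params];
}

[$where, $params] = buildFilmSearch('raj: wiara');
```

Because all conditions go into a single query, each film appears at most once in the result, which avoids the duplicate problem of running one SELECT per word.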
I have a website where users can search posts by entering keywords.
I am using Sphinx for full-text search, and everything is working as expected.
But when I enter some special characters in the search query, the search doesn't complete and throws an error.
e.g.
keyword i search for :
hello)
my query for sphinxql :
SELECT id FROM index1 WHERE MATCH('hello)')
error i get :
index index1: syntax error, unexpected ')' near ')'
my php code looks like this
<?php
$sphinxql = mysqli_connect($sphinxql_host.':'.$sphinxql_port,'','') or die('ERROR');
$q = urldecode($_GET['q']);
$sphinxql_query = "SELECT id FROM $sphinx_index WHERE MATCH('".$q."') ";
?>
How can I escape user input and make sure the query won't break and still returns the result set?
You should use SQL escaping, to avoid SQL injection.
http://php.net/manual/en/mysqli.real-escape-string.php
$sphinxql_query = ".... MATCH('".mysqli_real_escape_string($sphinxql,$q)."') ";
... BUT you may ALSO want to escape the extended query syntax.
For a simple solution, see the FIRST THREE POSTS (after that it delves into misunderstanding) of this thread in the Sphinx forum:
http://sphinxsearch.com/forum/view.html?id=13619
The function in that thread can be used to make your query work. It will escape the ) and stop it being taken as an operator.
BUT it also means you WON'T be able to use any search operators, because it blindly escapes them ALL (which is the confusion later in the thread).
If you want to be able to use some or all operators, you need more advanced escaping (which I don't have a good solution for).
Edit: actually, let's go the whole hog...
<?php
//Escapes all the Extended syntax, so can accept anything the user throws at us.
function EscapeString ( $string ) {
$from = array ( '\\', '(',')','|','-','!','@','~','"','&', '/', '^', '$', '=' );
$to = array ( '\\\\', '\(','\)','\|','\-','\!','\@','\~','\"', '\&', '\/', '\^', '\$', '\=' );
return str_replace ( $from, $to, $string );
}
if ($allow_full_extended_syntax) {
$q = $_GET['q'];
// the user is responsible for providing valid query.
} elseif ($allow_partical_extended_syntax) {
$q = InteligentEscape($_GET['q']);
//I don't have this function, it would need to be created.
} else {
$q = EscapeString($_GET['q']);
// escapes ALL extended syntax. NO operators allowed
}
$sphinxql_query = ".... MATCH('".mysqli_real_escape_string($sphinxql,$q)."') ";
Then it sounds like you want both $allow_full_extended_syntax and $allow_partical_extended_syntax set to false. Which means no operators will work, because they will be fully escaped.
The EscapeString function needs to escape the < character as well. Also see the escapeString function in PECL sphinx for reference.
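A minimal sketch of an escaping function that also covers the < character mentioned above. escapeSphinxMatch() is a hypothetical name, and whether this character list is complete for a given Sphinx version is an assumption, so check it against the Sphinx documentation.

```php
<?php
// Sketch: escape extended-syntax operators, including '<' and '@'.
// The exact character list is an assumption; verify it for your Sphinx version.
function escapeSphinxMatch(string $text): string
{
    $special = ['\\', '(', ')', '|', '-', '!', '@', '~', '"', '&', '/', '^', '$', '=', '<'];
    $escaped = [];
    foreach ($special as $char) {
        $escaped[] = '\\' . $char;
    }
    // The backslash is first in both lists, so existing backslashes are
    // doubled before any new ones are added for the other characters.
    return str_replace($special, $escaped, $text);
}

$safe = escapeSphinxMatch('hello)');
```

The result still has to go through mysqli_real_escape_string() (or a prepared statement) before it is placed inside MATCH('...'), as in the answer above.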
I have an html form with multiple checkboxes. I pass those to php...
The values will come in like "G,BW" for example ("BW,G" needs to match as well)
in check.php, I need to take the values from $_GET and modify them for an sql query...
<?php
if(!empty($_GET['wireColor'])) {
foreach($_GET['wireColor'] as $colors) {
echo $colors; //make sure it's right, then comment out
}
}
$colors = rtrim($colors, ','); //Get rid of trailing , - not working right
$wireSearch = '\'REGEXP \'.*(^|,).$wireColor.(,|$)\''; //Trying to add the sql bits to pass to the query.
Ideally to get this passed:
$colors_lookup_sql = "SELECT * FROM parts WHERE ( wire_colors REGEXP '.*(^|,)$wireSearch(,|$)' and wire_colors REGEXP '.*(^|,)$wireSearch(,|$)' );";
Here's how the query should look at the end:
SELECT * FROM parts WHERE ( wire_colors REGEXP '.*(^|,)G(,|$)' and wire_colors REGEXP '.*(^|,)BW(,|$)' );
I'm having a hard time getting the regex bits into the query.
UPDATE
Here's what I have now:
<?php
if(!empty($_GET['wireColor'])) {
foreach($_GET['wireColor'] as $colors) {
$wireSearch = ' REGEXP \'.*(^|,)' .$colors.'(,|$)\' AND ';
}
}
$Search = rtrim($wireSearch, 'AND'); //Trying to trim the last "AND"
$colors_lookup_sql = "SELECT * FROM parts WHERE ( wire_colors $wireSearch% );";
Which gives me mostly what I need, but print/echo the results and I get:
$wireSearch ends up as: REGEXP '.*(^|,)G(,|$)' AND REGEXP '.*(^|,)BW(,|$)' AND Which is great - I just need to nuke the last "AND". The trim above replaces it with the second value instead though. Weird.
and $colors_lookup_sql ends up as: SELECT * FROM parts WHERE ( wire_colors REGEXP '.*(^|,)BW(,|$)' AND % );
But for some reason the first value in the array goes away, which I don't understand since it was present before the SQL statement.
I'm not sure about the REGEX inside the query since I haven't used it but:
$wireSearch = '\'REGEXP \'.*(^|,).$wireColor.(,|$)\'';
Here you have a problem, the variable $wireColor is inside a string, and you are using ' so anything inside is not read as a variable, it should be something like:
$wireSearch = '\'REGEXP \'.*(^|,)'.$wireColor.'(,|$)\'';
I can't say I entirely understand how your data is being stored, and I haven't worked with REGEXP much myself, but perhaps something like this would be a bit easier to work with:
$wireSearch = explode(",", $_GET['wireColor']);
$query = "SELECT * FROM parts WHERE wire_colors LIKE '%$wireSearch[0]%'
AND wire_colors LIKE '%$wireSearch[1]%'";
Not sure if this helps, but I thought I'd throw in the idea.
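MySQL regex supports word boundaries ([[:<:]] and [[:>:]] before MySQL 8.0.4, \\b from 8.0.4 on), and the operator is REGEXP, not REGEX. How about:
wire_colors REGEXP '[[:<:]]BW[[:>:]]' AND wire_colors REGEXP '[[:<:]]G[[:>:]]'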
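For the trailing "AND" problem in the question, one option is to collect the conditions in an array and join them at the end. A minimal sketch, where buildWireColorWhere() is a hypothetical helper and the letters-and-digits check is an assumption about what a colour code looks like:

```php
<?php
// Sketch: one REGEXP condition per selected colour, joined with AND.
// buildWireColorWhere() is a hypothetical helper.
function buildWireColorWhere(array $colors, string $column = 'wire_colors'): string
{
    $conditions = [];
    foreach ($colors as $color) {
        // Accept only plain letter/digit codes such as "G" or "BW"; this also
        // keeps quotes and regex metacharacters out of the SQL string.
        if (!is_string($color) || !ctype_alnum($color)) {
            continue;
        }
        $conditions[] = "$column REGEXP '(^|,)$color(,|\$)'";
    }
    // implode() puts AND only between conditions, so there is nothing to trim.
    return implode(' AND ', $conditions);
}

$where = buildWireColorWhere(['G', 'BW']);
```

In the question's loop, `$wireSearch = ...` overwrites the variable on every pass, which is why only the last colour survives; appending to an array (or using `.=`) keeps them all. The leading `.*` from the question is dropped here because REGEXP already matches anywhere in the value.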
Let's say I want to check whether
$string = "how can i do this";
already exists in my database, with the words in any order.
Like:
"how i can do this"
or
"i do this can how"
...
...
my database looks like:
id string
1 how can i do this
2 hello how are you
3 how i can do this world
4 another title
etc etc
Thanks
The number of possible combinations is n! (120 in your sample), so checking whether this string already exists is quite a complex task.
I would recommend to use the following algorithm:
Add new column StringHash to your table
On insert order your string (e.g. alphabetically), calculate its hash and store in StringHash:
"how can i do this" => "can do how i this" => md5("can+do+how+i+this")
If you want to check if a certain string exists in the db then again calculate its hash as described above and query the db on YourTable.StringHash
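The steps above could be sketched in PHP roughly like this. wordOrderHash() is a hypothetical helper name, and lower-casing the input first is my assumption (it makes "How" and "how" hash the same).

```php
<?php
// Sketch of the steps above: lower-case, split into words, sort, join, hash.
// wordOrderHash() is a hypothetical helper name.
function wordOrderHash(string $text): string
{
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    sort($words, SORT_STRING); // alphabetical order makes the original word order irrelevant
    return md5(implode('+', $words));
}

$hash = wordOrderHash('i do this can how');
// Store $hash in the StringHash column on insert, and compare against it on lookup.
```

A row with an extra word ("how i can do this world") gets a different hash, so it is not treated as a match.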
This is a tricky problem if you want to fix this in sql only, but that aside:
As @er.anuragjain says, you can do a query with LIKE %word%, but you would also get a hit on your example '3'.
So if you have a query like this:
SELECT * FROM table WHERE
column LIKE '%how%'
AND column LIKE '%can%'
AND column LIKE '%i%'
AND column LIKE '%do%'
AND column LIKE '%this%'
Then you also get number 3. So you need to check if there are no other words. You can do this by checking the word count (if you have 5 words and all of your words are in there, you are done.).
Checking wordcount is not trivial, but there is a trick. From several sources*:
SELECT LENGTH(total_words) - LENGTH(REPLACE(total_words, ' ', ''))+1
FROM tbl_test;
should do the trick. So check the LIKEs, and check the word count, and you're done. But I'm not really sure this is a pretty solution :)
http://www.webtechquery.com/index.php/2010/03/count-number-of-words-in-mysql-mysql-words-count/
and http://www.mwasif.com/2008/12/count-number-of-words-in-a-mysql-column/
(random google hits :) )
You can check whether the string you are searching in contains "how" and "can" and "I" and "do" and "this".
Something like this (I don't know the syntax in MySQL, but you can see the concept; shown here in PHP):
if (stripos($string, "how") !== false &&
stripos($string, "can") !== false &&
stripos($string, "I") !== false &&
stripos($string, "do") !== false &&
stripos($string, "this") !== false)
{
// you found the string
}
If you are using mysql then try this..
select * from tablename where columnname LIKE '%how%' AND columnname LIKE '%can%' AND columnname LIKE '%I%' AND columnname LIKE '%DO%' AND columnname LIKE '%This%';
If you have a dynamic value in $string, first convert it into an array by splitting on spaces. Then build a $condition variable from the array, append it to select * from tablename where, and run that query.
thanks
should be the wildcard call in mysql
select * From tablename Where columnname LIKE '%how%'
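You can use a regex. First create a function like this:
function part($str){
// collapse repeated whitespace, then split into words
$arr = preg_split('/\s+/', trim($str));
$rejex = '';
foreach($arr as $item){
// one lookahead per word, so the words may appear in any order
$rejex .= "(?=.*$item)";
}
return $rejex;
}
and then use SQL regex (lookaheads need the ICU regex engine of MySQL 8.0.4 or later, and the pattern must be quoted):
$sql = "SELECT * FROM `table` WHERE `column` REGEXP '".part($str)."'";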
I'd like to be able to use PHP to search an array (or better yet, a column of a MySQL table) for a particular string. However, my goal is for it to return the string it finds and the number of matching characters (in the right order), or some other measure of how reasonable the search results are, so that I can use that info to decide whether to display the top result by default or give the user a choice of the top few.
I know I can do something like
$citysearch = mysql_query(" SELECT city FROM $table WHERE city LIKE '$city' ");
but I can't figure out a way to determine how accurate it is.
The goal would be:
a) find "Milwaukee" if the search term were "milwakee" or something similar.
b) if the search term were "west", return things like "West Bend" and "Westmont".
Anyone know a good way to do this?
You should check out full text searching in MySQL. Also check out Zend's port of the Apache Lucene project, Zend_Search_Lucene.
More searching led me to the Levenshtein distance and then to similar_text, which proved to be the best way to do this.
similar_text("input string", "match against this", $pct_accuracy);
compares the strings and then saves the accuracy (as a percentage) in the third argument. The Levenshtein distance counts how many single-character delete, insert, or replace operations it would take to get from one string to the other, with an allowance for weighting each operation differently (e.g. you can make it cost more to replace a character than to delete one). It's apparently faster but less accurate than similar_text. Other posts I've read elsewhere mention that for strings of fewer than 10000 characters there's no functional difference in speed.
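To make the difference between the two functions concrete, here is a small sketch that runs both on the misspelling from the question. Both are built-in PHP functions; the variable names are just for illustration.

```php
<?php
$input = 'milwakee';
$city  = 'milwaukee';

// similar_text() returns the number of matching characters and writes a
// percentage into the third argument.
$matching = similar_text($input, $city, $percent);

// levenshtein() returns the number of single-character edits needed.
$edits = levenshtein($input, $city);

printf("%d matching chars, %.1f%% similar, %d edit(s)\n", $matching, $percent, $edits);
```

A higher similar_text percentage means a closer match, while for levenshtein a lower number is closer, so the two cannot be compared directly.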
I ended up using a modified version of something I found to make it work. This ends up saving the top 3 results (except in the case of an exact match).
$input = $_POST["searchcity"];
$accuracy = 0;
$runner1acc = 0;
$runner2acc = 0;
while ($cityarr = mysql_fetch_row($allcities)) {
$cityname = $cityarr[1];
$cityid = $cityarr[0];
$city = strtolower($cityname);
$diff = similar_text($input, $city, $tempacc);
// check for an exact match
if ($tempacc == '100') {
// closest word is this one (exact match)
$closest = $cityname;
$closestid = $cityid;
$accuracy = 100;
break;
}
if ($tempacc >= $accuracy) { // more accurate than current leader
$runner2 = $runner1;
$runner2id = $runner1id;
$runner2acc = $runner1acc;
$runner1 = $closest;
$runner1id = $closestid;
$runner1acc = $accuracy;
$closest = $cityname;
$closestid = $cityid;
$accuracy = $tempacc;
}
if (($tempacc < $accuracy)&&($tempacc >= $runner1acc)) { // new 2nd place
$runner2 = $runner1;
$runner2id = $runner1id;
$runner2acc = $runner1acc;
$runner1 = $cityname;
$runner1id = $cityid;
$runner1acc = $tempacc;
}
if (($tempacc < $runner1acc)&&($tempacc >= $runner2acc)) { // new 3rd place
$runner2 = $cityname;
$runner2id = $cityid;
$runner2acc = $tempacc;
}
}
echo "Input word: $input\n<BR>";
if ($accuracy == 100) {
echo "Exact match found: $closestid $closest\n";
} elseif ($accuracy > 70) { // for high accuracies, assumes that it's correct
echo "We think you meant $closestid $closest ($accuracy)\n";
} else {
echo "Did you mean:<BR>";
echo "$closestid $closest? ($accuracy)<BR>\n";
echo "$runner1id $runner1 ($runner1acc)<BR>\n";
echo "$runner2id $runner2 ($runner2acc)<BR>\n";
}
This can be very complicated, and I am not personally aware of any good 3rd party libraries although I'm sure they exist. Others may be able to suggest some canned solutions, though.
I have written something similar from scratch a few times in the past. If you go down that route, it is probably not something you'd want to do in PHP by itself as every query would involve getting all of the records and performing your calculations on them. It will almost certainly involve creating a set of index tables that meet your specifications.
For instance, you would have to come up with rules for how you imagine that "Milwaukee" could end up spelled "milwakee." My solution to this was to do vowel compression and duplication compression (not sure if these are actually search terms). So, milwaukee would be indexed as:
milwaukee
m_lw__k__
m_lw_k_
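As a sketch of how those three index terms could be generated: indexTerms() is a hypothetical helper, and treating only a, e, i, o, u as vowels is an assumption.

```php
<?php
// Sketch of the two compression steps. indexTerms() is a hypothetical helper,
// and treating only a, e, i, o, u as vowels is an assumption.
function indexTerms(string $word): array
{
    $word = strtolower($word);
    // Vowel compression: every vowel becomes an underscore.
    $vowelCompressed = preg_replace('/[aeiou]/', '_', $word);
    // Duplication compression: runs of the same character collapse to one.
    $dupCompressed = preg_replace('/(.)\1+/', '$1', $vowelCompressed);
    return [$word, $vowelCompressed, $dupCompressed];
}

$terms = indexTerms('milwaukee');
```

The same function would be run on each city name when building the index table and on the user's input at search time.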
When the search query came in for "milwaukee", I would run the same process on the text input, and then run a search on the index table for:
SELECT cityId,
COUNT(*)
FROM myCityIndexTable
WHERE term IN ('milwaukee', 'm_lw__k__', 'm_lw_k_')
GROUP BY cityId
When the search query came in for "milwakee", I would run the same process on the text input, and then run a search on the index table for:
SELECT cityId,
COUNT(*)
FROM myCityIndexTable
WHERE term IN ('milwakee', 'm_lw_k__', 'm_lw_k_')
GROUP BY cityId
In the case of Milwaukee (spelled correctly), it would return "3" for the count.
In the case of Milwakee (spelled incorrectly), it would return "1" for the count: only the fully compressed m_lw_k_ is in the index, since neither the raw spelling nor m_lw_k__ (which has only one vowel in the middle) matches an indexed term.
If you sort the results based on the count, you would end up meeting one of your rules, that "Milwaukee" would end up being sorted higher as a possible match than "Milwakee."
If you want to build this system in a generic way (as hinted by your use of $table in the query) then you'd probably need another mapping table somewhere in there to map your terms to the appropriate table.
I'm not suggesting this is the best (or even a good) way to go about this, just something I've done in the past that might prove useful to you if you plan to try and do this without a third party solution.
The most maddening result with LIKE is this one: "%man" will return every "woman" in the table!
For a listing, a not-too-bad solution is perhaps to keep shortening the search needle. In your case a match will come up once your search string is as short as "milwa".
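A minimal sketch of that shortening idea, done in PHP against an in-memory list. shrinkingSearch() and the minimum length of 3 are assumptions for illustration; against a database the inner loop would be a LIKE 'needle%' query instead.

```php
<?php
// Sketch: shorten the search term until something matches as a prefix.
// shrinkingSearch() and the minimum length of 3 are assumptions for illustration.
function shrinkingSearch(string $needle, array $haystack, int $minLength = 3): array
{
    $needle = strtolower($needle);
    while (strlen($needle) >= $minLength) {
        $matches = [];
        foreach ($haystack as $candidate) {
            // Case-insensitive "starts with" check, like LIKE 'needle%'.
            if (stripos($candidate, $needle) === 0) {
                $matches[] = $candidate;
            }
        }
        if ($matches) {
            return ['needle' => $needle, 'matches' => $matches];
        }
        $needle = substr($needle, 0, -1); // drop the last character and retry
    }
    return ['needle' => '', 'matches' => []];
}

$result = shrinkingSearch('milwakee', ['Milwaukee', 'West Bend', 'Westmont']);
```

Matching on the prefix only (no leading wildcard) avoids the "%man" problem, and the minimum length keeps the needle from shrinking to something that matches almost everything.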