I need to search a text column for occurrences of certain words.
For example, the column contents may look like:
KIT: TYPE WELD NECK ORIFICE FLANGE, CONSISTING OF TWO FLANGES WITH
JACK SCREWS BUT WITHOUT BOLTS AND GASKETS, RATING CLASS 600, RAISED
FACE, BORE TO MATCH .312 IN WALL, MATERIAL FORGED 304 STAINLESS STEEL
ASTM-A182 GRADE F304, SPECIFICATION: SP-50-13 REV: 1
The user then enters something like the following into a textbox:
ASTM-A182 F304 WELD NECK
Currently I use this code:
$sql = "SELECT * FROM commtable WHERE MATCH (ldisc) AGAINST ('" . $ldisc . "' IN NATURAL LANGUAGE MODE);";
But most records returned (it returns hundreds of records) don't contain all the search terms entered in the text field.
How can I fine-tune this full-text search (or use another method) to get results that more closely match what was entered?
EDIT:
Table type:
EDIT 2:
It is working now, added this code:
if ($ldisc != "") {
$parts = explode(" ", $ldisc);
$tempSQL = "SELECT * FROM commtable WHERE MATCH (ldisc) AGAINST (" . "'";
for ($i = 0; $i < count($parts); $i++) {
$tempSQL = $tempSQL . '+' . $parts[$i] . ' ';
}
$tempSQL = $tempSQL . "' IN BOOLEAN MODE);";
$sql = $tempSQL;
$result = mysqli_query($con, $sql);
}
And changed the minimum word length to 1.
This question sounds like basically what you're looking for: MySQL fulltext search - Only results that contain all words
NATURAL LANGUAGE MODE by its nature returns approximate matches. To match all words in BOOLEAN MODE, you must add a + in front of every required word. For example,
MATCH (ldisc) AGAINST ('+ASTM-A182 +F304 +WELD +NECK' IN BOOLEAN MODE)
You'll have to split the input string and add the + signs. How to do that is left as an exercise to the programmer. =)
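One way to do that split, as a minimal sketch. The helper name build_boolean_query is made up for illustration. It assumes that characters which boolean mode treats as operators (including the hyphen in ASTM-A182) can be replaced by spaces, so that each remaining piece becomes its own required word.

```php
<?php
// Hypothetical helper: turn free text into a boolean-mode expression
// in which every word is required.
function build_boolean_query($input) {
    // Replace characters that boolean mode treats as operators with spaces.
    $clean = preg_replace('/[+\-<>()~*"@]+/', ' ', $input);
    // Split on whitespace and drop empty pieces.
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
    // Prefix every word with + so that it must be present.
    $terms = array_map(function ($word) {
        return '+' . $word;
    }, $words);
    return implode(' ', $terms);
}

echo build_boolean_query('ASTM-A182 F304 WELD NECK'), "\n";
```

The returned string is probably best bound to a placeholder, for example AGAINST (? IN BOOLEAN MODE) in a prepared statement, rather than concatenated into the SQL.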
Change it to boolean mode:
"SELECT * FROM commtable WHERE MATCH (ldisc) AGAINST ('" . $ldisc . "' IN BOOLEAN MODE);";
Another thing to keep an eye on is ft_min_word_len (innodb_ft_min_token_size for InnoDB tables): words shorter than this minimum are not indexed.
As per my previous question, a software recruiter can enter a boolean text string, such as C++ AND ((UML OR Python) OR (not Perl)), which I will translate to SELECT * FROM candidates WHERE skill=C++ AND ((skill=UML OR skill=Python) OR (not skill=Perl)).
(I highlighted such as because some answers seem to think that I am only interested in this query. It is only an example. I seek a generic solution, coded in PHP. Maybe a regex? Just some code which finds every sub-term of a query, so that I can query the sub-terms individually.)
I would like to COUNT(*) the number of hits, but I would also be very interested to know how much each "sub-clause" (if that's the correct term) of the query contributed to the result.
E.g. there might have been 200 candidates with C++, but 50 were not suitable because they have neither UML nor Python experience.
So, using either PHP (and regex?) or MySQL, how can I break that down to see which parts of the search term contribute to the result?
I.e., break down skill=C++ AND ((skill=UML OR skill=Python) OR (not skill=Perl)) into COUNT(*) WHERE skill=C++ and COUNT(*) WHERE (skill=UML OR skill=Python), etc.
I don't know if MySQL has some sort of EXPLAIN for this, but suspect not, so that I will have to break out the SELECT as described and COUNT each sub-clause separately.
We need a method for splitting the conditions. However, we cannot treat ANDs and ORs as equal when splitting, because AND has higher precedence than OR.
So in an example like this:
Cond1 AND Cond2 OR Cond3
We cannot split by AND|OR because we would miss Cond1 AND Cond2 as a whole.
So the first thing to do is add extra parentheses (with regexes) where needed, so that the following algorithm splits the conditions correctly. In the previous example, it would be (Cond1 AND Cond2) OR Cond3.
Once that is set up, we use a regex to fetch the conditions for the current level. We need recursive regular expressions in order to match opening and closing parentheses.
Every condition is stored in an array and then sent to be processed recursively, because some conditions may be complex and contain nested conditions.
All these conditions and sub-conditions are stored in the array.
Once you have all the conditions (and sub-conditions), you have two alternatives for building the SQL.
The first option is a single query with no WHERE clause and one SUM for every condition. This is probably best if there are not that many rows in the table.
The second option is running multiple SELECT count(*) queries, one per condition.
Here is the PHP code. I also added an option to customize the maximum number of nesting levels when splitting the conditions.
You have a demo on Ideone, here.
<?php
$conditions = 'C++ AND ((UML OR Python) OR (not Perl))';
// Other tests...
//$conditions = "C++ AND Python OR Perl";
//$conditions = "C++ AND Python OR Perl OR (Perl AND (Ruby AND Docker AND (Lisp OR (C++ AND Ada) AND Java)))";
///////// CONFIGURATION /////////
$maxNest = 0; // Set to 0 for unlimited nest levels
/////////////////////////////////
print "Original Input:\n";
print $conditions . "\n\n";
// Add implicit parenthesis...
// For example: `A AND B OR C` should be: `(A AND B) OR C`
$addParenthesis = '/(?|(((?:\bNOT\b\s*+)?+[^)(\s]++|(?:\bNOT\b\s*+)?+[(](?:\s*+(?2)\s*+)*+[)])(?:\s*+\bAND\b\s*+((?2)))++)(?=\s*+\bOR\b\s*+)|\s*+\bOR\b\s*+\K((?1)))/im';
while (preg_match($addParenthesis, $conditions)) {
$conditions = preg_replace($addParenthesis, '(\1)', $conditions);
}
print "Input after adding implicit parenthesis (if needed):\n";
print $conditions . "\n\n";
// Optional cleanup: Remove useless NOT () parenthesis
$conditions = preg_replace('/[(]\s*((?:NOT\s*)?+(\S+))\s*[)]/i', '\1', $conditions);
// Optional cleanup: Remove useless NOT NOT...
$conditions = preg_replace('/\bNOT\s+NOT\b/i', '', $conditions);
$list_conditions = [$conditions];
function split_conditions($input, $level = 0) {
global $list_conditions, $maxNest;
if ($maxNest > 0 && $level >= $maxNest) { return; }
// If it is a logic operator, skip
if ( preg_match('/^\s*(?:AND|OR)\s*$/i', $input) ) {
return;
}
// Add condition to the list:
array_push($list_conditions, $input);
// Don't go on if this is a single filter
if ( preg_match('/^\s*(?:NOT\s+)?+[^)(\s]+\s*$/i', $input) ) {
return;
}
// Remove parentheses (if they exist) before evaluating sub expressions
// Do this only for level > 0. Level 0 is not guaranteed to have
// surrounding parentheses, so it may remove wanted parentheses
// such as in this expression: `(Cond1 AND Cond2) OR (Cond3 AND Cond4)`
if ($level > 0) {
$input = preg_replace('/^\s*(?:NOT\b\s*)?+[(](.*)[)]\s*$/i', '\1', $input);
}
// Fetch all sub-conditions at current level:
$next_conds = '/((?:\bNOT\b\s*+)?+[^)(\s]++|(?:\bNOT\b\s*+)?+[(](?:\s*+(?1)\s*+)*+[)])/i';
preg_match_all($next_conds, $input, $matches);
// Evaluate subexpressions
foreach ($matches[0] as $match) {
split_conditions($match, $level + 1);
}
}
split_conditions($conditions);
// Trim and remove duplicates
$list_conditions = array_unique(array_map(function($x){
return preg_replace('/^\s*|\s*$/', '', $x);
}, $list_conditions));
// Add columns
$list_conditions = array_map(function($x){
return preg_replace('/([^\s()]++)(?<!\bAND\b)(?<!\bOR\b)(?<!\bNOT\b)/i', "skill='$1'", $x);
}, $list_conditions);
print "Just the conditions...\n\n";
print_r($list_conditions);
print "\n\n";
print "Method 1) Single query with multiple SUM\n\n";
$sum_conditions = implode(",\n", array_map(function($x){
return " SUM( $x )";
}, $list_conditions));
$sumSQL = "SELECT\n$sum_conditions\nFROM candidates;";
print $sumSQL . "\n\n";
print "Method 2) Multiple queries\n\n";
$queries = implode("\n", array_map(function($x){
return "SELECT count(*) from candidates WHERE $x;";
}, $list_conditions));
print $queries . "\n\n";
While not the most elegant solution, MySQL's WITH ROLLUP modifier could be useful. See https://dev.mysql.com/doc/refman/8.0/en/group-by-modifiers.html
In its simplest form, you could write this query to count each unique skill:
SELECT skill, COUNT(skill) AS mycount
FROM cands
GROUP BY skill WITH ROLLUP
This will return the count for each skill, with a NULL row at the bottom holding the grand total, like this:
|skill |mycount |
|--------|---------|
|C++ | 2 |
|Java | 3 |
|Python | 4 |
|NULL | 9 |
By adding boolean operations, you could obtain a more complex result:
SELECT skill, COUNT(skill) AS mycount, SUM(IF(skill='C++' || skill='Python', 1, 0)) AS CorPython
FROM cands
GROUP BY skill WITH ROLLUP
With this second option, the CorPython column will sum up, in the last NULL row, the total number of people with "C++ or Python". You can make this boolean section as complex as necessary.
|skill |mycount |CorPython |
|--------|---------|-----------|
|C++ | 2 | 2 |
|Java | 3 | 0 |
|Python | 4 | 4 |
|NULL | 9 | 6 | <-- This is the value you want (6)
How about using the built-in MySQL full-text search capabilities? The return is automatically ranked with the best matches at the top.
You could create a new column which holds all the skills of the candidate. Then a search on that field would give you the ranked results.
Full-Text Search Functions
SELECT
count(*),
sum(skill='C++'),
sum(skill='UML'),
sum(skill='Python'),
sum(not skill='Perl')
FROM candidates WHERE TRUE
AND skill='C++'
AND (FALSE
OR (FALSE
OR skill='UML'
OR skill='Python')
OR (not skill='Perl')
)
Recompute a table from SELECT skill, COUNT(*) FROM tbl and the complements.
Provide the full table from step 1; let the recruiter eyeball the list.
To get fancier, simply strip parens, OR, AND from the text string to get the various skills mentioned. Then display only those.
But neither of these handles (UML OR Python) or non-adjacent things like (C++ and not Perl). Anyway, how many counts would you expect from your example? There is also (UML OR Python) AND C++ and several more.
Don't even think of parsing via SQL; use some client language. Or pose the question to the candidates.
Code hints
In Perl, one might do:
$str =~ s{[()]|AND|OR|NOT}{ }ig;
$str =~ s{ +}{ }g;
@skills = split(' ', $str);
The PHP code would use preg_replace and explode, but otherwise be similar. In your example, C++ AND ((UML OR Python) OR (not Perl)) would become the array ['C++', 'UML', 'Python', 'Perl']
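A rough PHP equivalent of that hint might look like the sketch below. The function name extract_skills is made up for illustration. Unlike the Perl lines, it uses preg_split instead of explode so that runs of spaces need no separate clean-up step, and it adds word boundaries around the keywords so that a skill name containing "OR" or "AND" is not cut apart.

```php
<?php
// Hypothetical helper mirroring the Perl hint: strip parentheses and the
// AND/OR/NOT keywords, then split what is left into skills.
function extract_skills($str) {
    // Word boundaries keep names that merely contain "OR" or "AND" intact.
    $str = preg_replace('/[()]|\b(?:AND|OR|NOT)\b/i', ' ', $str);
    // Split on runs of whitespace and drop empty pieces.
    return preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(extract_skills('C++ AND ((UML OR Python) OR (not Perl))'));
```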
Hi, that's not a big deal. Watch this:
$sqlresult = array('php', 'html', 'php', 'c++', 'perl');
// that is the array result from MySQL; now we count every term on its own, in PHP only
// now I create this
function getcount($word, $paragraph) {
    // preg_quote() escapes regex metacharacters such as the "+" in "c++"
    if (preg_match('/' . preg_quote($word, '/') . '/i', $paragraph))
        $count = 1;
    else
        $count = 0;
    return $count;
}
$finalresult = array(); // initialise once, outside the loop
foreach ($sqlresult as $key) {
    if (!isset($finalresult[$key]))
        $finalresult[$key] = 0;
    $finalresult[$key] += getcount($key, $key);
}
// now retrieve the results as follows
$php = "results for php word is $finalresult[php]";
$perl = "results for perl word is $finalresult[perl]";
echo $php;
echo $perl;
If you have a paragraph with many words, convert it to an array with PHP's explode function first and then follow the steps above.
For a big project this is not suitable; you need a good substitute for MySQL.
In that case I suggest Sphinx search.
After you run your query in Sphinx,
run this query:
SHOW META;
This gives every word in your search with its hit count. For more details, check http://sphinxsearch.com/docs/current/sphinxql-show-meta.html
I'm trying to get only the strings from a mysqli query that start with a certain letter, but it always outputs the whole array. Here's my code. I'm trying to get the first value, see if it starts with the correct letter, echo it if it does, then go on to the next.
$strSQL = "SELECT title FROM blogtable ORDER BY title ASC";
$titleResult = mysqli_query($con, $strSQL);
while($rowTitle = mysqli_fetch_array($titleResult))
{
$strTitle = $rowTitle['title'];
$subTitle = substr($strTitle,0,2);
$subNum = ord($subTitle);//This gets me the value of the first letter
if($subNum = $topLetter)//$topLetter = 65, which is capital A
{
echo $strTitle;
echo "<br>";
}
}
So the problem here is that, say if I have 3 things, and only 2 start with A, it will output all 3, but I just want the 2 that start with A.
Change your if statement.
if($subNum == $topLetter)
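Explanation: the == operator tests for equality, whereas = is the assignment operator, so the original condition assigns $topLetter to $subNum and is always true for a non-zero value.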
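To see the difference in isolation, here is a small sketch. The titles are made-up sample data standing in for the rows from blogtable.

```php
<?php
$topLetter = 65; // ASCII code of capital "A"
$titles = array('Apple', 'Banana', 'Avocado'); // made-up sample titles

$matches = array();
foreach ($titles as $title) {
    // ord() returns the code of the first byte of the string.
    // == compares; a single = would assign 65 and always be truthy.
    if (ord($title) == $topLetter) {
        $matches[] = $title;
    }
}
print_r($matches);
```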
Basically, the idea is that people filter search results on the length. However, length is stored in a text field, as this can be different from item to item. An item could be 16,20 metres, and another could be 8.1m.
How would one even start? Get all the possible values from the database, change them to the format that is filtered on (parse everything to numeric only, comma separated?), get the associated ID, and then only show all the info related to those IDs, or is there a way I haven't found yet?
Kind regards.
edit: Standardizing format is a solution, but not one I can apply in this situation. I am not allowed to do this. It gets worse - the filtering can have both a minimum and a maximum value. Minimum: 4 and maximum: 8 should show everything between those lengths. 6.1m, 7,82 metres, 5.
Because I couldn't change the way the database was set up (i.e. standardize it by keeping separate fields for the length itself, as a float/double, and for the unit suffix), I've decided to go with this approach.
First get all the possible lengths, then:
foreach($lengths as $length) {
$cLengths[$length['item_id']] = preg_replace("/[^0-9,.]/", "", str_replace(',', '.', $length['field_value']));
}
Assuming the page would then be called with &minLength=2&maxLength=10 in the URL:
$minLength = isset($_GET['minLength']) ? (float) $_GET['minLength'] : null;
$maxLength = isset($_GET['maxLength']) ? (float) $_GET['maxLength'] : null;
if (!is_null($minLength) && !is_null($maxLength)) {
    $filteredItems = array();
    foreach ($cLengths as $itemId => $cL) {
        if ($cL >= $minLength && $cL <= $maxLength) {
            $filteredItems[] = $itemId;
        }
    }
    // An empty IN() is a syntax error in MySQL, so match nothing instead
    $where .= $filteredItems
        ? 'item_id IN(' . implode(',', $filteredItems) . ') AND '
        : 'FALSE AND ';
}
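The same idea can be tried out without a database or a request, as in the sketch below. The function names parse_length and filter_by_length are made up for illustration, and the sample values are the ones mentioned in the question. It assumes each field holds a single number with an optional decimal comma or point.

```php
<?php
// Hypothetical helper: pull the numeric part out of a free-text length.
function parse_length($text) {
    // Treat a decimal comma as a decimal point, then drop everything
    // that is not a digit or a point.
    $clean = preg_replace('/[^0-9.]/', '', str_replace(',', '.', $text));
    // Values that do not reduce to a plain number are reported as null.
    return is_numeric($clean) ? (float) $clean : null;
}

// Hypothetical helper: return the ids whose length lies in [$min, $max].
function filter_by_length(array $lengths, $min, $max) {
    $ids = array();
    foreach ($lengths as $itemId => $text) {
        $value = parse_length($text);
        if ($value !== null && $value >= $min && $value <= $max) {
            $ids[] = $itemId;
        }
    }
    return $ids;
}

$lengths = array(
    1 => '16,20 metres',
    2 => '8.1m',
    3 => '6.1m',
    4 => '7,82 metres',
    5 => '5',
);
print_r(filter_by_length($lengths, 4, 8));
```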
I would recommend standardizing the format for length. It is hard to filter numeric values stored in different formats.
You can use the LIKE operator to search for any pattern in a string:
SELECT * FROM table WHERE length LIKE '%filtervalue%'
I would simply enforce a standard during your query. Assuming your length column is named "length" and your table is "table":
SELECT something, replace(length,',','.') + 0 as mylen FROM `table` HAVING mylen BETWEEN 16.2 AND 18
It would not be very fast. You can also use replace + 0 directly in the WHERE clause, which removes the need for HAVING:
SELECT something FROM `table` WHERE replace(length,',','.') + 0 BETWEEN 16.2 AND 18
Good Luck!
I made a simple query system with MySQL that shows me 100 records, which I fetch into my game, but I have a problem with the PHP code.
I want a 5-character gap between the fields in each row, so I use tabs (\t\t\t\t\t). The problem is that the values have different lengths: if one field holds a 10-character string and another a 2-character string, adding the same spacing after each gives different results:
2-char string + 5-char space = 7 chars and 10-char string + 5-char space = 15 chars
$query = "SELECT * FROM `scores` ORDER by `score` DESC LIMIT 100";
$result = mysql_query($query) or die('Query failed: ' . mysql_error());
$num_results = mysql_num_rows($result);
for($i = 0; $i < $num_results; $i++)
{
$row = mysql_fetch_array($result);
echo $i+1 . "-" . "\t\t Name: " .$row['name'] . "\t\t\t\t Device: " . $row['device'] . "\n \t\t Difficulty: " . $row['level']. "\t\t\t\t Score: " . $row['score'] . "\n\n";
}
Codes Output
1- Name: James Device: HTC OneX
Difficulty: Hard Score: 5760
2- Name: Erika_S Device: PC
Difficulty: Normal Score: 13780
...
My Desired Output
1- Name: James Device: HTC OneX
Difficulty: Hard Score: 5760
2- Name: Erika_S Device: PC
Difficulty: Normal Score: 13780
...
A tab is in fact a single character; how wide it is displayed depends on the viewer. If, for example, your IDE is set to 8 spaces per tab, that is what you get. There's a fantastic concept called elastic tabstops, but sadly it's only a concept.
Conclusion: you can't do what you described with tabs.
What you can do:
Calculate the needed spaces and hardcode them, but it's dirty and you shouldn't do this.
Use HTML tables.
Instead of $row['...'] use sprintf("%-15s", $row['...']), but in each place you'll need to adjust the number (-15) to what's really needed.
<?php
$s = 'monkey';
$t = 'many monkeys';
printf("[%s]\n", $s); // standard string output
printf("[%10s]\n", $s); // right-justification with spaces
printf("[%-10s]\n", $s); // left-justification with spaces
printf("[%010s]\n", $s); // zero-padding works on strings too
printf("[%'#10s]\n", $s); // use the custom padding character '#'
printf("[%10.10s]\n", $t); // left-justification but with a cutoff of 10 characters
?>
The above example will output:
[monkey]
[    monkey]
[monkey    ]
[0000monkey]
[####monkey]
[many monke]
read more at http://www.php.net/manual/en/function.sprintf.php
If you can't use printf, you can easily create your own function that does something similar and is enough for what you need:
function add_spaces($str, $total_len) {
    // Append as many spaces as are needed to reach $total_len
    return $str . str_repeat(' ', max(0, $total_len - strlen($str)));
}
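Applied to the score rows from the question, the sprintf approach could look like the sketch below. The function name format_score_row and the widths (4, 15 and 9) are assumptions chosen for the sample data. Note that sprintf pads by bytes, so multibyte names may still misalign, and the output only lines up in a monospaced font.

```php
<?php
// Hypothetical helper: pad each value to a fixed width so that the
// "Device:" and "Score:" columns start at the same position.
function format_score_row($rank, array $row) {
    return sprintf(
        "%-4s Name: %-15s Device: %s\n     Difficulty: %-9s Score: %s\n",
        $rank . '-',
        $row['name'],
        $row['device'],
        $row['level'],
        $row['score']
    );
}

// Made-up rows shaped like the `scores` table in the question.
$rows = array(
    array('name' => 'James', 'device' => 'HTC OneX', 'level' => 'Hard', 'score' => 5760),
    array('name' => 'Erika_S', 'device' => 'PC', 'level' => 'Normal', 'score' => 13780),
);

foreach ($rows as $i => $row) {
    echo format_score_row($i + 1, $row), "\n";
}
```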
I'm working on a legacy database table that has a phone no. field
but the problem is that all the entries (over 4k) are in varying
formats.
I was wondering if there was a quick way to fix them by looping through the records using PHP and updating the records to a particular phone format
4K doesn't sound like many records at all.
And I'd bet that the varying formats fall into a finite number of combinations.
I wonder if it'd be possible with a few judicious SQL queries. If your RDBMS has regular expressions and string replacement functions you could manage it with a few UPDATE instructions. If not, any language with the capability of querying a database, doing string replacement, and updating would do the job.
I agree 4k records isn't anything to worry about. I suggest querying the phone numbers and the primary id of the table, stripping all characters from phone number and then manipulating it to be the format you want. Or, you could keep it as only numbers and use your front-end to modify the number every time you display it. Below is a little untested script you can try to use. Doesn't handle extensions and expects there are 10 numbers.
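// mysqli_connect stuff; $con is assumed to be the open connection
$q = mysqli_query($con, "SELECT phone_id, phone_number FROM phones");
$update = mysqli_prepare($con, "UPDATE phones SET phone_number = ? WHERE phone_id = ?");
while ($r = mysqli_fetch_assoc($q)) {
    // Strip everything that is not a digit (ereg_replace was removed in PHP 7)
    $num = preg_replace('/[^0-9]/', '', $r['phone_number']);
    if (strlen($num) == 10) {
        $num = substr($num, 0, 3) . '-' . substr($num, 3, 3) . '-' . substr($num, -4);
    }
    $id = $r['phone_id'];
    mysqli_stmt_bind_param($update, 'si', $num, $id);
    mysqli_stmt_execute($update);
    // updated?
}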
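Before running any UPDATE, it may help to try the formatting step on a few sample values. The sketch below isolates that step in a function. The name format_phone and the sample numbers are made up, and like the script above it only handles 10-digit numbers without extensions.

```php
<?php
// Hypothetical helper: normalise a 10-digit number to NNN-NNN-NNNN.
function format_phone($raw) {
    // Keep digits only.
    $digits = preg_replace('/[^0-9]/', '', $raw);
    if (strlen($digits) != 10) {
        // Unexpected length: return the bare digits, as the script above does.
        return $digits;
    }
    return substr($digits, 0, 3) . '-' . substr($digits, 3, 3) . '-' . substr($digits, -4);
}

foreach (array('(555) 123-4567', '555.123.4567', '123-4567') as $sample) {
    echo $sample, ' => ', format_phone($sample), "\n";
}
```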