I use the following in MySQL to get titles in library sort order, e.g. The Godfather is sorted as Godfather.
SELECT apnumber, aptitle, IF( aptitle LIKE 'The %', SUBSTRING( aptitle, 5 ) , IF( aptitle LIKE 'A %', SUBSTRING( aptitle, 3 ) , IF( aptitle LIKE 'An %', SUBSTRING( aptitle, 4 ) , aptitle ) ) ) AS sorttitle etc...
What's the most efficient way in PHP to change a string into a library sorted version? That is, dropping any "An ", "A " or "The " at the beginning of a title. Am I correct in assuming I need to be looking at something like:
substr_replace( "The " , "" , 0)
I think you are generally correct, yes. As far as I'm aware, there is nothing built in to either PHP or MySQL to assist with this type of sorting. Code implementations I've seen have all used a set of substring replacement rules to get rid of unwanted prefixes, just as you are assuming above. Generally preg_replace() is used so that you can specify a match at the beginning of the string only:
preg_replace("/^The /",'',$title,1);
Something to consider if you're willing to forego a little disk space and normalization would be to store a second column called something like sort_title, and do the prefix removal before inserting the records. This would allow you to index and ORDER BY on the sort_title field within MySQL, and reduce the complexity of your SELECT statements.
As another aside, this would be a great extension to write for PHP!
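For what it's worth, here is a minimal sketch of the prefix-stripping idea, assuming English articles only. The function name librarySortTitle is made up for illustration. A single anchored, case-insensitive pattern covers all three prefixes, and the result can be used as a usort() key or stored in a sort_title column.

```php
<?php
// Hypothetical helper: strip one leading English article for sorting purposes.
function librarySortTitle(string $title): string
{
    // ^ anchors at the start; \s+ requires whitespace after the article,
    // so titles such as "Another Earth" or "Theory" are left untouched.
    return preg_replace('/^(?:The|An|A)\s+/i', '', $title, 1);
}

$titles = ['The Godfather', 'A Beautiful Mind', 'An Education', 'Another Earth'];

// Sort by the stripped title, case-insensitively, but keep the original titles.
usort($titles, function (string $a, string $b): int {
    return strcasecmp(librarySortTitle($a), librarySortTitle($b));
});

print_r($titles);
```

The original titles are kept for display; only the comparison uses the stripped form.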
Related
I am having issues structuring a SQLite statement. I am familiar with instr() and IN; however, is it possible to use them together? I have an array of multiple substrings, and I would like to see if a column contains any of them.
Here is an example of what I am trying to accomplish with no luck
$array = array('-SVA[rev1]', '-KINGS[rev2]', '-TBS[rev3]');
$query2 = $db->query("SELECT * FROM BOT_Downloads WHERE instr( FileName, IN ('$array') ) AND SeriesTitle = 'The Simple Truth' ");
I know using LIKE '% %' instead of instr() would be another way however that one is over my head as well when combining it with IN
Your logic needs to be completely revised. IN is a SQL operator that tests whether the value on its left is equal to any of the comma-separated values inside its parentheses; it is not a valid argument to the instr() function. You should be getting errors from SQLite when you try to execute that statement.
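Assuming the array elements end up joined with commas (for example via implode(); interpolating $array directly into a string yields just the word Array), your current code produces a SQL query that looks like: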
SELECT *
FROM BOT_Downloads
WHERE instr( FileName, IN ('-SVA[rev1],-KINGS[rev2],-TBS[rev3]') )
AND SeriesTitle = 'The Simple Truth';
I guarantee that is not what you want. You want to build a statement that looks more like this:
SELECT *
FROM BOT_Downloads
WHERE (
instr( FileName, '-SVA[rev1]') or
instr( FileName, '-KINGS[rev2]') or
instr( FileName, '-TBS[rev3]'))
AND SeriesTitle = 'The Simple Truth';
As an aside the PHP array function is not going to surround array elements with single quotes, so your code will build a statement that looks like:
SELECT *
FROM BOT_Downloads
WHERE instr( FileName, IN ('-SVA[rev1],-KINGS[rev2],-TBS[rev3]'))
AND SeriesTitle = 'The Simple Truth';
where you were probably shooting for something that looked more like this:
SELECT *
FROM BOT_Downloads
WHERE instr( FileName, IN ('-SVA[rev1]','-KINGS[rev2]','-TBS[rev3]'))
AND SeriesTitle = 'The Simple Truth';
However, as the IN clause is not a valid parameter to the instr function, either way is wrong. The IN clause requires each string literal to have its own delimiters; otherwise it's just going to be an equality test against the one string contained within the parentheses. The PHP array does not add those delimiters by default.
Continuation:
In continuation of my previous answer, I used the following SQL to create and populate a test table:
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS `BOT_Downloads` (
`FileName` TEXT,
`SeriesTitle` TEXT,
`Note` TEXT
);
INSERT INTO `BOT_Downloads` (FileName,SeriesTitle,Note) VALUES ('File-SVA[rev1].txt','The Simple Truth','matches filename and series title'),
('File-KINGS[rev2].txt','The Simple Truth','matches filename and series title'),
('File-TBS[rev3].txt','The Simple Truth','matches filename and series title'),
('File-WTH[rev1].txt','The Simple Truth','matches series title, doesn’t match filename'),
('File-KINGS[rev1].txt','The Simple Truth','matches series title, doesn’t match filename'),
('File-SVA[rev1].txt','War and Peace','matches filename, doesn’t match series title'),
('File-KINGS[rev2].txt','War and Peace','matches filename, doesn’t match series title'),
('File-TBS[rev3].txt','War and Peace','matches filename, doesn’t match series title'),
('File-WTH[rev1].txt','War and Peace','No matches'),
('File-KINGS[rev1].txt','War and Peace','No matches');
COMMIT;
I then ran the following query:
SELECT *
FROM BOT_Downloads
WHERE (
instr( FileName, '-SVA[rev1]') or
instr( FileName, '-KINGS[rev2]') or
instr( FileName, '-TBS[rev3]'))
AND SeriesTitle = 'The Simple Truth';
and got results of the nature you’re asking for:
"File-SVA[rev1].txt" "The Simple Truth" "matches filename and series title"
"File-KINGS[rev2].txt" "The Simple Truth" "matches filename and series title"
"File-TBS[rev3].txt" "The Simple Truth" "matches filename and series title"
The logic you are pursuing, using PHP to create an array and then expanding that into a SQL statement will not work if the elements of the array are strings because PHP will not automatically add the required string delimiters to the array element; this is why you have to manually include the single tick marks in your query in the IN statement. This would work with integers but is a bad approach IMO. Since you are not putting string delimiters around each individual string element you’re testing for, it appears to the SQL engine that you are seeking all rows where the file name contains the literal string ‘-SVA[rev1],-KINGS[rev2],-TBS[rev3]’ which is not what you are seeking, and this is why you are getting no results even though there are no syntax errors.
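If you would rather keep the substrings in a PHP array, one way to get the OR'ed instr() tests is to generate one placeholder per element and bind the values. This is only a sketch under a few assumptions: it uses PDO with the pdo_sqlite extension rather than whatever $db is in the question, and an in-memory table with three made-up rows stands in for the real one.

```php
<?php
// Assumes pdo_sqlite is available; the in-memory database is for illustration only.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE BOT_Downloads (FileName TEXT, SeriesTitle TEXT)');

$insert = $db->prepare('INSERT INTO BOT_Downloads (FileName, SeriesTitle) VALUES (?, ?)');
foreach ([
    ['File-SVA[rev1].txt', 'The Simple Truth'],
    ['File-WTH[rev1].txt', 'The Simple Truth'],
    ['File-TBS[rev3].txt', 'War and Peace'],
] as $row) {
    $insert->execute($row);
}

$needles = ['-SVA[rev1]', '-KINGS[rev2]', '-TBS[rev3]'];

// One "instr(FileName, ?) > 0" test per array element, joined with OR.
// Note: an empty $needles array would produce invalid SQL and needs its own check.
$conditions = implode(' OR ', array_fill(0, count($needles), 'instr(FileName, ?) > 0'));
$sql = "SELECT FileName FROM BOT_Downloads WHERE ($conditions) AND SeriesTitle = ?";

$stmt = $db->prepare($sql);
$stmt->execute(array_merge($needles, ['The Simple Truth']));
$matches = $stmt->fetchAll(PDO::FETCH_COLUMN);

print_r($matches);
```

Because the values are bound rather than pasted into the SQL text, no manual quoting of the array elements is needed.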
If you can concatenate the array items to this:
'-SVA[rev1],-KINGS[rev2],-TBS[rev3]'
then:
SELECT * FROM BOT_Downloads
WHERE
',' || '-SVA[rev1],-KINGS[rev2],-TBS[rev3]' || ',' LIKE '%,' || FileName || ',%'
AND
SeriesTitle = 'The Simple Truth'
Please help...
I need to be able to search for exact words in my database. I've already tried different methods.
Method 1
$param2 = "SELECT * from item WHERE prodname REGEXP '[[:<:]]($param)[[:>:]]'
order by CASE WHEN instr(prodname, '$param') = 0 then 1 else 0 end,
instr(prodname, '$param') ASC";
This works really well, but when I search for words containing \ or " it returns an error. I've already tried htmlspecialchars and mysql_real_escape_string, but the problem still exists.
Method 2
$param2 = "WHERE prodname LIKE '$param %' OR prodname LIKE '% $param'
OR prodname LIKE ' $param%' OR prodname LIKE '%$param '
OR prodname LIKE '% $param %'
order by CASE WHEN instr(prodname, '$param') = 0 then 1 else 0 end, instr(prodname, '$param') ASC";
This also works well, but when I type the exact product name, e.g. "STAMP PAD INK (RED)", it returns "NOT FOUND", although the product shows up when I type only "STAMP PAD INK". It works when I add '%$param%', but then when I type an exact word such as "INK", the word "WRINKLED" also shows up, and I don't want that.
I can't use fulltext..
For method 2 :
To get the exact product, use: LIKE '$param'
The % character matches any number of characters, even zero, but patterns such as '% $param %' still contain literal space characters, so they cannot match a value that consists of exactly $param.
More info: http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html#operator_like
I think you should try using the PHP function addslashes() for $param as in:
$param = addslashes($param);
This will make method 1 work.
mysql_real_escape_string's behavior is based on the default character set and I don't know which one you are using. Since it's not working for you, then addslashes probably will.
BTW, you may want to look further into how to use fulltext indexing, because REGEXP has the potential of becoming mind-numbingly slow as the database grows. And method 2 as you show it will always perform very slowly.
I have a few 10GB files with mysql info which I would like to filter for a specific table.
The queries look like this (though can have fewer or more line breaks):
SET INSERT_ID=2/\*!\*/;
at 858735202
121124 12:36:53 server id 1 end_log_pos 0 Query thread_id=9695754 exec_time=0 error_code=0
SET TIMESTAMP=1663753413/\*!\*/;
INSERT INTO `bank_accounts_daily`
(
`accounts_bank_md5` ,
`accounts_bank_payment_desc` ,
`accounts_bank_amount` ,
`accounts_bank_number` ,
`accounts_bank_sortcode` ,
`accounts_bank_currency` ,
`accounts_bank_date`,
`accounts_bank_code`
)
VALUES
(
'zxcvxzcvxzc4c9eeca78908296a2f007',
'NAMEJO M 1105294 BBP',
'278.50',
'645450441',
'20-55-19',
'1',
'26/55/2012',
'BBP'
)
/\*!\*/
I am using this, which works to retrieve every single statement:
preg_match_all('/(SET INSERT_ID=([0-9]+)\/\*\!\*\/\;)(.*?)(\/\*\!\*\/\;)(.*?)(\/\*\!\*\/)/s', $input, $output);
But when I attempt to expand it and add the extra pattern to match specifically the 'bank_accounts_daily' table, it does not retrieve anything (regardless of whether the backticks are escaped):
preg_match_all('/(SET INSERT_ID=([0-9]+)\/\*\!\*\/\;)(.*?)(\/\*\!\*\/\;)(.*?)(INSERT INTO \`bank_accounts_daily\`)(.*?)(\/\*\!\*\/)/s', $input, $output);
I do not understand why this is not working. I've tried variations without brackets, but nothing is working. Also - are there any potential problems with my approach that I am not seeing?
Try this regexp:
/(SET INSERT_ID=([0-9]+)\/\\\*\!\\\*\/\;)(.*?)(\/\\\*\!\\\*\/\;)(.*?)(INSERT INTO `bank_accounts_daily`)(.*?)(\/\\\*\!\\\*\/)/s
You weren't matching the backslashes in the /\*!\*/ markers. I don't see how the original regexp could have worked, since it also had this mistake.
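Here is a small check of that pattern, assuming the log really contains the literal backslashes shown in the question; the sample input is a shortened, made-up version of the excerpt. One PHP detail to watch: inside a single-quoted string, \\ collapses to a single backslash before the regex engine ever sees it, so this sketch puts both the pattern and the input in nowdoc blocks, which keep every backslash exactly as typed.

```php
<?php
// Shortened sample modelled on the excerpt in the question (an assumption,
// not real log data). Nowdoc syntax performs no escape processing.
$input = <<<'LOG'
SET INSERT_ID=2/\*!\*/;
SET TIMESTAMP=1663753413/\*!\*/;
INSERT INTO `bank_accounts_daily`
(`accounts_bank_code`)
VALUES
('BBP')
/\*!\*/
LOG;

// The pattern from the answer, character for character.
$pattern = <<<'RE'
/(SET INSERT_ID=([0-9]+)\/\\\*\!\\\*\/\;)(.*?)(\/\\\*\!\\\*\/\;)(.*?)(INSERT INTO `bank_accounts_daily`)(.*?)(\/\\\*\!\\\*\/)/s
RE;

$count = preg_match_all($pattern, $input, $output);

echo $count, "\n";        // number of statements matched
echo $output[2][0], "\n"; // the captured INSERT_ID value
```

On the "potential problems" part of the question: preg_match_all() needs the whole input in one string, so a 10GB file would have to be processed in pieces rather than loaded at once.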
I'm using Drupal 7 and trying to get results from the database using the LIKE operator, but it doesn't recognize my wildcards. I'm not sure if this is even a Drupal issue, or if I'm doing something wrong. Anyway, here's an example of the data I'm trying to match, along with my patterns:
Data to Match
a:2:{i:1;s:2:"17";i:2;s:1:"3";}
My like Queries
$pattern1 = 'a:2:{i:1;s:2:"17";i:2;s:1:"%";}'; // works
$pattern2 = 'a:2:{i:1;s:1:"%";i:2;s:1:"3";}'; // fails
$result = db_query(
"
SELECT pa.nid, pa.model, pa.combination
FROM {$Product_Adjustments} pa
WHERE pa.combination LIKE :pattern
",
array(
':pattern' => $pattern1
)
);
Additionally, I've tried the '_' wildcard, but that doesn't bring anything up either
Are you sure the pattern is correct? Notice pattern 1, the first string is 2 long, and in pattern 2 you're looking for one that's only 1 long. Are you sure that's right? Are the lengths of the individual pieces of that serialized data predictable enough to even query this way? It seems unlikely, and you'll probably have to store some normalized data instead.
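To make the length mismatch concrete, here is a quick sketch that runs the patterns from the question against the sample data. It assumes the pdo_sqlite extension and uses SQLite's LIKE (same % and _ wildcards) instead of Drupal's db_query; the helper likeMatches and the third pattern are my own additions for illustration.

```php
<?php
// Assumes pdo_sqlite; SQLite is used only as a convenient LIKE evaluator.
$db = new PDO('sqlite::memory:');
$check = $db->prepare('SELECT ? LIKE ?');

// Hypothetical helper: does $value match the LIKE pattern $pattern?
function likeMatches(PDOStatement $check, string $value, string $pattern): bool
{
    $check->execute([$value, $pattern]);
    $result = (bool) $check->fetchColumn();
    $check->closeCursor();
    return $result;
}

$data = 'a:2:{i:1;s:2:"17";i:2;s:1:"3";}';

$pattern1 = 'a:2:{i:1;s:2:"17";i:2;s:1:"%";}'; // wildcard only inside the quotes
$pattern2 = 'a:2:{i:1;s:1:"%";i:2;s:1:"3";}';  // hard-codes s:1, but the data says s:2
$pattern3 = 'a:2:{i:1;s:%:"%";i:2;s:1:"3";}';  // wildcard for the length prefix too

var_dump(likeMatches($check, $data, $pattern1));
var_dump(likeMatches($check, $data, $pattern2));
var_dump(likeMatches($check, $data, $pattern3));
```

The third pattern matches only because the length prefix is also a wildcard, which shows how fragile this kind of query on serialized data is.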
I'm trying to create a search engine for an inventory based site. The issue is that I have information inside bbtags (like in [b]test[/b] sentence, the test should be valued at 3, whereas sentence should be valued at 1).
Here is an example of an index:
My test sentence, my my (has a SKU of TST-DFS)
The Database:
|Product| word |relevancy|
| 1 | my | 3 |
| 1 | test | 1 |
| 1 |sentence| 1 |
| 1 | TST-DFS| 10 |
But how would I match TST-DFS if the user typed in TST DFS? I would like that SKU to have a relevancy of say 8, instead of the full 10..
I have heard that the FULL TEXT search feature in MySQL would help, but I can't seem to find a good way to do it. I would like to avoid things like UNIONS, and to keep the query as optimized as possible.
Any help with coming up with a good system for this would be great.
Thanks,
Max
But how would I match TST-DFS if the user typed in TST DFS?
I would like that SKU to have a relevancy of say 8, instead of the full 10..
If I got the question right, the answer is actually easy.
Well, if you forge your query a little before sending it to mysql.
Ok, let's say we have $query and it contains TST-DFS.
Are we gonna focus on word spans?
I suppose we should, as most search engines do, so:
$ok=preg_match_all('#\w+#',$query,$m);
Now if that pattern matched... $m[0] contains the list of words in $query.
This can be fine-tuned to your SKU, but matching against full words in an AND fashion is pretty much what the user presumes is happening (as it is on Google and Yahoo).
Then we need to cook a $expr expression that will be injected into our final query.
if(!$ok) { // the search string is non-alphanumeric
    $expr="false";
} else { // the search contains words that are now in $m[0]
    $expr='';
    foreach($m[0] as $word) {
        if($expr)
            $expr.=" AND "; // put an AND in between "LIKE" subexpressions
        $s_word=addslashes($word); // I put a s_ to remind me the variable
                                   // is safe to include in a SQL statement, that's me
        $expr.="word LIKE '%$s_word%'";
    }
}
Now $expr should look like "word LIKE '%TST%' AND word LIKE '%DFS%'"
With that value, we can build the final query:
$s_expr="($expr)";
$s_query=addslashes($query);
$s_fullquery=
"SELECT Product,word,if((word LIKE '$s_query'),relevancy,relevancy-2) as relevancy ".
"FROM some_index ".
"WHERE word LIKE '$s_query' OR $s_expr";
Which shall read, for "TST-DFS":
SELECT Product,word,if((word LIKE 'TST-DFS'),relevancy,relevancy-2) as relevancy
FROM some_index
WHERE word LIKE 'TST-DFS' OR (word LIKE '%TST%' AND word LIKE '%DFS%')
As you can see, in the first SELECT line, if the match is partial, mysql will return relevancy-2
In the third one, the WHERE clause, if the full match fails, $s_expr, the partial match query we cooked in advance, is tried instead.
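As a variation on the same idea, the partial-match expression can be built with placeholders instead of addslashes(), so the words are bound rather than pasted into the SQL. This is only a sketch: buildWordFilter is a hypothetical name, and the result is meant to be passed to whatever prepared-statement API you use.

```php
<?php
// Hypothetical helper: split the search string into words and return
// [SQL fragment, bound values] for an AND-ed partial match on "word".
function buildWordFilter(string $query): array
{
    if (!preg_match_all('#\w+#', $query, $m)) {
        // No word characters at all: a condition that never matches.
        return ['(0 = 1)', []];
    }

    $clauses = [];
    $params = [];
    foreach ($m[0] as $word) {
        $clauses[] = 'word LIKE ?';
        $params[] = '%' . $word . '%';
    }

    return ['(' . implode(' AND ', $clauses) . ')', $params];
}

[$expr, $params] = buildWordFilter('TST-DFS');

echo $expr, "\n";
print_r($params);
```

For "TST-DFS" this yields two LIKE clauses, one per word, mirroring the $expr string built above.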
I like to lower case everything and strip out special characters (like in a phone number or credit card I take everything out on both sides that isn't a number)
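Roughly what I mean, as a sketch; the function names normalizeTerm and digitsOnly are made up, and the phone number is a dummy value. If the same normalization is applied both when indexing and when searching, "TST-DFS" and "TST DFS" end up as the same key.

```php
<?php
// Hypothetical helper: lower-case the term and drop everything
// that is not an ASCII letter or digit.
function normalizeTerm(string $term): string
{
    return preg_replace('/[^a-z0-9]+/', '', strtolower($term));
}

// Hypothetical helper for phone or card numbers: keep digits only.
function digitsOnly(string $value): string
{
    return preg_replace('/\D+/', '', $value);
}

var_dump(normalizeTerm('TST-DFS') === normalizeTerm('tst dfs'));
echo normalizeTerm('TST-DFS'), "\n";
echo digitsOnly('(555) 010-9999'), "\n";
```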
Rather than try to create your own FTS solution, you could try to fit the MySQL FTS engine to your requirements. What I've seen done is create a new table to store your FTS data. Create a column for each different piece of data that you want to have a different relevance. For your sku field you could store the raw sku, with spaces, underscores, hyphens and any other special character intact. Then store a stripped down version with all these things removed. You may also want to store a version with leading zeros removed, as people often leave things like that out. You can store all these variations in the same column. Store your product name in another column, and the product description in another column. Create a separate index on each column. Then when you do your search, you can search each column individually, and multiply the rank of the results based on how important you think that column is. So you could multiply sku results by 10, title by 5 and leave description results as is. You may have to do a little experimentation to get the results you want, but it may ultimately be simpler than creating your own index.
Create a keywords table. Something along the lines of:
integer keywordId (autoincrement) | varchar keyword | int pointValue
Assign all possible keywords, skus, etc, into this table. Create another table, a post-keywords bridge, (assuming postId is the id you've assigned in your original table) along the lines of:
integer keywordId | integer postId
Once you have this, you can easily add keywords to each post as it is inserted. To calculate the total point value for a given post, a query such as the following should do the trick:
SELECT sum(pointValue) FROM keywordPostsBridge kpb
JOIN keywords k ON k.keywordId = kpb.keywordId
WHERE kpb.postId = YOUR_INTENDED_POST
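A runnable sketch of those two tables and the sum query, under some assumptions: it uses PDO with an in-memory SQLite database (pdo_sqlite) rather than MySQL, SQLite column types stand in for the ones above, and the sample rows are invented.

```php
<?php
// Assumes pdo_sqlite; the data below is made up for illustration.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE keywords (
    keywordId  INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword    TEXT NOT NULL,
    pointValue INTEGER NOT NULL
)');
$db->exec('CREATE TABLE keywordPostsBridge (
    keywordId INTEGER NOT NULL,
    postId    INTEGER NOT NULL
)');

// The keywords receive ids 1, 2 and 3 in insertion order.
$db->exec("INSERT INTO keywords (keyword, pointValue)
           VALUES ('my', 3), ('test', 1), ('TST-DFS', 10)");

// Post 1 carries 'my' and 'TST-DFS'; post 2 carries 'test'.
$db->exec('INSERT INTO keywordPostsBridge (keywordId, postId)
           VALUES (1, 1), (3, 1), (2, 2)');

$stmt = $db->prepare(
    'SELECT SUM(k.pointValue)
       FROM keywordPostsBridge kpb
       JOIN keywords k ON k.keywordId = kpb.keywordId
      WHERE kpb.postId = ?'
);
$stmt->execute([1]);
$total = (int) $stmt->fetchColumn();

echo $total, "\n"; // 3 + 10 for post 1
```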
I think the solution is quite straightforward unless I missed something.
Basically, run two searches: one is an exact match, the other is a LIKE match or regex match.
Join the two result sets together, LIKE match left join exact match. Then for example:
final_relevancy = (IFNULL(like_relevancy, 0) + IFNULL(exact_relevancy, 0) * 3) / 4
I didn't try this myself though. Just an idea.
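Spelled out in PHP, the weighting would look something like this; finalRelevancy is a made-up name, and null plays the role of SQL NULL for a row missing from one of the two result sets.

```php
<?php
// Hypothetical helper mirroring the formula above: the exact match counts
// three times as much as the LIKE match, and a missing value counts as 0.
function finalRelevancy(?float $likeRelevancy, ?float $exactRelevancy): float
{
    return (($likeRelevancy ?? 0) + ($exactRelevancy ?? 0) * 3) / 4;
}

echo finalRelevancy(8, null), "\n"; // LIKE match only
echo finalRelevancy(8, 10), "\n";   // LIKE match and exact match
```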
I would add a column that is stripped of all special characters and misspellings, and then upcased (or create a function that compares on text that has been stripped and upcased). That way your relevancy will be consistent.
/*
q and q1 - your table
this query takes too many resources;
turn it into an update query (a scheduled task, or call it on save if you are developing a new system)
*/
SELECT
CASE
WHEN word NOT REGEXP "^[a-zA-Z]+$"
/*chain many REPLACE calls for the junk characters,
or create a custom function,
or, if you have full DB access, install this: https://launchpad.net/mysql-udf-regexp
*/
THEN REPLACE(REPLACE( word, '-', ' ' ), '#', ' ')
ELSE word
END word ,
CASE
WHEN word NOT REGEXP "^[a-zA-Z]+$"
THEN 8
ELSE relevancy
END relevancy
FROM ( SELECT 'my' word,
3 relevancy
UNION
SELECT 'test' word,
1 relevancy
UNION
SELECT 'sentence' word,
1 relevancy
UNION
SELECT 'TST-DFS' word,
10 relevancy
)
q
UNION
SELECT *
FROM ( SELECT 'my' word,
3 relevancy
UNION
SELECT 'test' word,
1 relevancy
UNION
SELECT 'sentence' word,
1 relevancy
UNION
SELECT 'TST-DFS' word,
10 relevancy
)
q1
This is the code for the page where the query result is shown.
**I cannot use functions, although using them would make the work easier.**
<html>
<head>
</head>
<body>
<?php
//author S_A_KHAN
//date 10/02/2013
$dbcoonect=mysql_connect("127.0.0.1","root");
if (!$dbcoonect)
{
die ('unable to connect'.mysql_error());
}
else
{
echo "connection successfully <br>";
}
$data_base=mysql_select_db("connect",$dbcoonect);
if ($data_base==FALSE){
die ('unable to connect'.mysql_error($dbcoonect));
}
else
{
echo "connection successfully done<br>";
// Cast the request value to an integer so it cannot inject SQL into the query.
$SQLString = "select * from user where id= " . (int) $_GET["search"];
$QueryResult=mysql_query($SQLString,$dbcoonect);
echo "<table width='100%' border='1'>\n";
echo "<tr><th bgcolor=gray>Id</th><th bgcolor=gray>Name</th></tr>\n";
while (($Row = mysql_fetch_row($QueryResult)) !== FALSE) {
echo "<tr><td bgcolor=tan>{$Row[0]}</td>";
echo "<td bgcolor=tan>{$Row[1]}</td></tr>";
}
}
?>
</body>
</html>