Let me describe the problem based on the example below.
Let's say there is a string "abc12345" (it could be any string) and a table mytable with a column mycolumn of type varchar(100).
There are some rows that end with the character 5.
There are some rows that end with the characters 45.
There are some rows that end with the characters 345.
There are no rows that end with the characters 2345.
In this case these rows should be selected:
SELECT * FROM mytable WHERE mycolumn LIKE "%345"
That's because "345" is the longest right substring of "abc12345" that occurs at least once as the right substring of at least one string in the mycolumn column.
Any ideas how to write it in one query?
Thank you.
This is a brute force method:
select t.*
from (select t.*,
             dense_rank() over (order by (case when mycolumn like '%abc12345' then 1
                                               when mycolumn like '%bc12345' then 2
                                               when mycolumn like '%c12345' then 3
                                               when mycolumn like '%12345' then 4
                                               when mycolumn like '%2345' then 5
                                               when mycolumn like '%345' then 6
                                               when mycolumn like '%45' then 7
                                               when mycolumn like '%5' then 8
                                          end)
                               ) as seqnum
      from t
      where mycolumn like '%5' -- ensure at least one match
     ) t
where seqnum = 1;
This then inspires something like this:
select t.*
from (select t.*, s.i, max(s.i) over () as maxi
      from t join
           (select str, generate_series(1, length(str)) as i
            from (select 'abc12345' as str) s
           ) s
           on right(t.mycolumn, s.i) = right(s.str, s.i) -- compare suffixes, not prefixes
     ) t
where i = maxi;
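For anyone who wants to check the logic outside the database, here is a minimal Python sketch of what both queries are meant to compute. It assumes the rows fit in memory; the function name rows_with_longest_suffix and the sample data are made up for illustration.

```python
def rows_with_longest_suffix(rows, search):
    """Return the rows ending with the longest suffix of `search` that any row ends with."""
    best_len = 0
    # Try every suffix length, like generate_series(1, length(str)) in the query.
    for i in range(1, len(search) + 1):
        if any(row.endswith(search[-i:]) for row in rows):
            best_len = i
    if best_len == 0:
        return []  # not even the last character matches any row
    suffix = search[-best_len:]
    return [row for row in rows if row.endswith(suffix)]


sample_rows = ["x5", "y45", "z345", "w345", "q2346"]
print(rows_with_longest_suffix(sample_rows, "abc12345"))
```

If a row matches a suffix of length i it also matches every shorter suffix, so keeping the largest i that matched is enough.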
Interesting puzzle :)
The hardest part here is finding the length of the longest suffix that matches your pattern.
In MySQL you probably need to use either generating series or a UDF. Others proposed these already.
In PostgreSQL and other systems that provide regexp-based substring, you can use the following trick:
select v,
reverse(
substring(
reverse(v) || '#' || reverse('abcdefg')
from '^(.*).*#\1.*'
)) res
from table;
What it does is:
constructs a single string combining your string and suffix. Note, we reverse them.
we put # in between the strings. That's important: you need a character that doesn't exist in your string.
we extract a match from a regular expression, using substring, such that
it starts at the beginning of the string ^
matches any number of characters (.*)
can have some remaining characters .*
now we find #
now, we want the same string we matched with (.*) to be present right after #. So we use \1
and there can be some tail characters .*
we reverse the extracted string
Once you have the longest suffix, finding maximum length, and then finding all strings having the suffix of that length is trivial.
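The same trick can be tried in Python, whose re module also supports backreferences. This is only a sketch of the idea: the function name longest_common_suffix is hypothetical, and it assumes the separator # appears in neither string.

```python
import re


def longest_common_suffix(value, pattern, sep="#"):
    """Longest string that is a suffix of both `value` and `pattern`."""
    # Reverse both strings so that the common suffix becomes a common prefix.
    combined = value[::-1] + sep + pattern[::-1]
    # Group 1 is greedy, so the first successful match is the longest prefix
    # of reversed `value` that also appears right after the separator.
    match = re.match(r"^(.*).*" + re.escape(sep) + r"\1", combined)
    if match is None:
        return ""
    return match.group(1)[::-1]


print(longest_common_suffix("xx345", "abc12345"))
```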
Here's a SQLFiddle using PostgreSQL:
If you cannot restructure the table I would approach the problem this way:
Write an aggregate UDF LONGEST_SUFFIX_MATCH(col, str) in C (see an example in sql/udf_example.c in the MySQL source, search for avgcost)
SELECT @longest_match:=LONGEST_SUFFIX_MATCH(mycol, "abcd12345") FROM mytbl; SELECT * FROM mytbl WHERE mycol LIKE CONCAT('%', SUBSTR("abcd12345", -@longest_match))
If you could restructure the table, I do not have a complete solution yet, but the first thing I would do is add a special column mycol_rev obtained by reversing the string (via the REVERSE() function) and create a key on it, then use that key for lookups. Will post a full solution when I have a moment.
Update:
If you can add a reversed column with a key on it:
use a query of the form SELECT myrevcol FROM mytbl WHERE myrevcol LIKE CONCAT(SUBSTR(REVERSE('$search_string'), 1, $n),'%') LIMIT 1, performing a binary search on $n over the range from 1 to the length of $search_string to find the largest value of $n for which the query returns a row
SELECT * FROM mytbl WHERE myrevcol LIKE CONCAT(SUBSTR(REVERSE('$search_string'), 1, $found_n),'%')
This solution should be very fast as long as you do not have too many rows coming back. We will have a total of O(log(L)) queries where L is the length of the search string each of those being a B-tree search with the read of just one row followed by another B-tree search with the index read of only the needed rows.
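To make the binary search concrete, here is a rough Python sketch. It stands in for the database with a sorted list of reversed strings (playing the role of the key on the reversed column) and uses bisect for the prefix lookup. All names are hypothetical, and the real thing would issue the LIKE queries instead.

```python
import bisect


def has_prefix(sorted_rev, prefix):
    """True if any entry of the sorted list starts with `prefix` (one index probe)."""
    i = bisect.bisect_left(sorted_rev, prefix)
    return i < len(sorted_rev) and sorted_rev[i].startswith(prefix)


def longest_suffix_len(values, search):
    """Largest n such that some value ends with the last n characters of `search`."""
    sorted_rev = sorted(v[::-1] for v in values)  # stands in for the indexed reversed column
    rev = search[::-1]
    lo, hi = 0, len(rev)
    # Invariant: a prefix of length lo matches (trivially for 0); nothing longer than hi does.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_prefix(sorted_rev, rev[:mid]):
            lo = mid
        else:
            hi = mid - 1
    return lo


print(longest_suffix_len(["x5", "y45", "z345", "q2346"], "abc12345"))
```

The binary search is valid because matching is monotone: if a suffix of length n matches some row, every shorter suffix matches that row too.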
I have a SQL Server connection to an external table in my application and I need to make a query where one of the columns has wrong formatting, let's say, the format is alphanumeric without symbols but the column has data with dashes, apostrophes, dots, you name it. Is it possible to just query one of the columns with that filtered out? It'd really help me. I'm using Laravel and I know I can make an accessor to clean that out but the query is heavy.
This is an example:
Data sought: 322211564
Data found: 322'211'564
Also 322-211-564
EDIT: Just to clarify, I don't want to EXCLUDE data, but to "reformat" it without symbols.
EDIT: By the way, if you're curious using Laravel 5.7 apparently you can query the accessor directly if you have the collection already. I'm surprised but it does the trick.
A wild card guess, but perhaps this works:
WITH VTE AS(
SELECT *
FROM (VALUES('322''211''564'),
('322-211-564')) V(S))
SELECT S,
(SELECT '' + token
FROM dbo.NGrams8k(V.S,1) N
WHERE token LIKE '[A-z0-9]'
ORDER BY position
FOR XML PATH('')) AS S2
FROM VTE V;
This makes use of the NGrams8k function. If you need other acceptable characters you can simply add them to the pattern string ('[A-z0-9]').
If, for some reason, you don't want to use NGrams8k, you could create an inline tally table, which will perform a similar function:
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1 --10
CROSS JOIN N N2 --100
CROSS JOIN N N3 --1000
CROSS JOIN N N4 --10000 --Do we need any more than that? You may need less
),
VTE AS(
SELECT *
FROM (VALUES('322''211''564'),
('322-211-564')) V(S))
SELECT V.S,
(SELECT '' + SS.C
FROM Tally T
CROSS APPLY (VALUES(SUBSTRING(V.S,T.I,1))) SS(C)
WHERE SS.C LIKE '[A-z0-9]'
ORDER BY T.I
FOR XML PATH(''),TYPE).value('.','varchar(8000)') AS S2
FROM VTE V;
Also, just in case, I've used the TYPE format and the value function. If you then change your mind about not wanting any special characters and need an acceptable character like &, it won't be changed to &amp;.
Note for pattern-based string replacements, you can use a library like SQL Server Regex. Call RegexReplace on the string you want to transform:
select RegexReplace(col, '[^A-Za-z0-9]', '') from tbl
That call will remove any non-alphanumeric character.
To find all the rows where the column contains only alphanumeric characters:
select col from tbl where col not like '%[^A-Za-z0-9]%'
The like pattern consists of:
% - Matches 0 or more characters.
[^A-Za-z0-9] - Matches any character not in A-Z, a-z, and 0-9. The ^ symbol at the beginning of the character class means characters that do not match.
By using not like your query will reject strings that contain a non-alphanumeric character anywhere in the string.
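If the cleanup ends up on the application side instead, the same character class does both jobs. A small Python sketch (the helper names strip_non_alnum and is_clean are made up for illustration):

```python
import re

# Same character class as in the queries above: anything that is not A-Z, a-z or 0-9.
NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def strip_non_alnum(value):
    """Remove every non-alphanumeric character, like RegexReplace(col, '[^A-Za-z0-9]', '')."""
    return NON_ALNUM.sub("", value)


def is_clean(value):
    """True when the value has no non-alphanumeric character, like the NOT LIKE test."""
    return NON_ALNUM.search(value) is None


for raw in ["322'211'564", "322-211-564", "322211564"]:
    print(raw, strip_non_alnum(raw), is_clean(raw))
```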
I've got a database table mytable with a column name in Varchar format, and column date with Datetime values. I'd like to count names with certain parameters grouped by date. Here is what I do:
SELECT
CAST(t.date AS DATE) AS 'date',
COUNT(*) AS total,
SUM(LENGTH(LTRIM(RTRIM(t.name))) > 4
AND (LOWER(t.name) LIKE '%[a-z]%')) AS 'n'
FROM
mytable t
GROUP BY
CAST(t.date AS DATE)
It seems that there's something wrong with range syntax here, if I just do LIKE 'a%' it does count properly all the fields starting with 'a'. However, the query above returns 0 for n, although should count all the fields containing at least one letter.
You write:
It seems that there's something wrong with range syntax here
Indeed so. MySQL's LIKE operator (and SQL generally) does not support range notation, merely simple wildcards.
Try MySQL's nonstandard RLIKE (a.k.a. REGEXP), for fuller-featured pattern matching.
I believe LIKE is just for searching for parts of a string, but it sounds like you want to implement a regular expression to search for a range.
In that case, use REGEXP instead. For example (simplified):
SELECT * FROM mytable WHERE name REGEXP "[a-z]"
Your current query is looking for a string of literally "[a-z]".
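A quick way to see the difference is the sketch below. Note the assumptions: it runs against SQLite through Python's sqlite3 module rather than MySQL. SQLite's LIKE also has no character classes, and its REGEXP operator only works once a regexp function is registered, which the sketch does with Python's re.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" by calling regexp(Y, X).
    return 1 if re.search(pattern, value) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)


def matching(names, condition):
    """Names for which the SQL condition (with ? standing for the name) is true."""
    return [n for n in names
            if conn.execute(f"SELECT {condition}", (n,)).fetchone()[0]]


names = ["alice", "12345", "[a-z]"]
like_hits = matching(names, "? LIKE '%[a-z]%'")    # brackets are taken literally
regexp_hits = matching(names, "? REGEXP '[a-z]'")  # brackets are a character class
print(like_hits, regexp_hits)
```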
Updated:
SELECT
CAST(t.date AS DATE) AS 'date',
COUNT(*) AS total,
SUM(LENGTH(LTRIM(RTRIM(t.name))) > 4
AND (LOWER(t.name) REGEXP '[a-z]')) AS 'n'
FROM
mytable t
GROUP BY
CAST(t.date AS DATE)
I believe you want to use REGEXP '[a-z]' instead of LIKE. (Anchoring it as '^[a-z]$' would match only names that consist of a single letter.)
You have regex in your LIKE statement, which doesn't work. You need to use RLIKE or REGEXP.
SELECT CAST(t.date AS DATE) AS date,
       COUNT(*) AS total
FROM mytable AS t
WHERE t.name REGEXP '[a-zA-Z]' -- no % wildcards: REGEXP matches anywhere in the string
GROUP BY CAST(t.date AS DATE)
HAVING SUM(LENGTH(LTRIM(RTRIM(t.name)))) > 4
Also, just FYI, MySQL is terrible with strings, so you really should trim before you insert into the database. That way you don't pay for the trimming every time you select.
I have a table column that contains strings separated by commas, like so:
Algebraic topology,Riemannian geometries
Classical differential geometry,Noncommutative geometry,Integral transforms
Dark Matter
Spectral methods,Dark Energy,Noncommutative geometry
Energy,Functional analytical methods
I am trying to find the MySQL rows that have a given string between commas. For example, if I search for Noncommutative geometry, I want to select these two rows:
Classical differential geometry,Noncommutative geometry,Integral transforms
Spectral methods,Dark Energy,Noncommutative geometry
This is what I tried
SELECT * FROM `mytable` WHERE `col` LIKE '%Noncommutative geometry%'
which works fine, but the problem is that if I search for Energy I want to select only the row
Energy,Functional analytical methods
but my code gives the two rows
Energy,Functional analytical methods
Spectral methods,Dark Energy,Noncommutative geometry
which is not what I am looking for. Is there a way to fix this so that it only finds the rows that have the string between commas?
Give these a try, using the REGEXP operator:
SELECT * FROM `mytable`
WHERE `col` REGEXP '(^|.*,)Noncommutative geometry(,.*|$)'
SELECT * FROM `mytable`
WHERE `col` REGEXP '(^|.*,)Energy(,.*|$)'
The expression being used ('(^|.*,)$searchTerm(,.*|$)') requires the search term to be either preceded by a comma or the beginning of the string, and followed by either a comma or the end of the string.
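To sanity-check the pattern before running it against the table, it can be tried in Python on the sample rows from the question. This is only a sketch: the helper name rows_containing_item is made up, and re.escape is added on the assumption that search terms may contain regex metacharacters.

```python
import re

ROWS = [
    "Algebraic topology,Riemannian geometries",
    "Classical differential geometry,Noncommutative geometry,Integral transforms",
    "Dark Matter",
    "Spectral methods,Dark Energy,Noncommutative geometry",
    "Energy,Functional analytical methods",
]


def rows_containing_item(rows, term):
    """Rows where `term` is a whole comma-separated item, not just a substring."""
    # Same shape as the REGEXP above: start or comma before, comma or end after.
    pattern = re.compile(r"(^|.*,)" + re.escape(term) + r"(,.*|$)")
    return [row for row in rows if pattern.search(row)]


print(rows_containing_item(ROWS, "Energy"))
```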
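You can do it like this, covering the term in the middle, at the start, at the end, and as the only entry:
SELECT * FROM `mytable` WHERE `col` LIKE '%,$yourString,%'
or `col` LIKE '$yourString,%'
or `col` LIKE '%,$yourString'
or `col` = '$yourString'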
I want to choose a simple naming convention for a table in my database.
That means I have to name each row with a random two-word string.
For example
ID NAME
1 ROMEL SUMPI
2 BORMI SUIEMOD
and so on.
Each value in the NAME column should be unique.
How can I do this in PHP with a PostgreSQL database?
Thank you in advance.
It sounds like you might want a pair of randomly-selected words from a dictionary. It's kind of hard to tell given the lack of clarity of the question.
RANDOM DICTIONARY WORDS
The best way to pick random dictionary words is probably at the PHP end by using a pass-phrase generator that does it for you.
You can do it in PostgreSQL using a table dictionary with one word per row, though:
SELECT word FROM dictionary ORDER BY random() LIMIT 2;
Performance will be truly awful with a large dictionary. It can be done much faster if the dictionary doesn't change and there's a unique word_id with no gaps in the numbering, allowing you to write:
CREATE OR REPLACE FUNCTION get_random_word() RETURNS text AS $$
SELECT word FROM dictionary
WHERE word_id = (
SELECT width_bucket(random(), 0, 1, (SELECT max(word_id) FROM dictionary))
);
$$ LANGUAGE sql;
SELECT get_random_word() || ' ' || get_random_word();
against a table like this:
CREATE TABLE dictionary(word_id serial primary key, word text UNIQUE NOT NULL);
This will only produce consistent results if there are no gaps in the word numbering and if word_id is unique or the PRIMARY KEY. It can produce the same word twice. If you want to avoid that you'll need a recursive CTE or some PL/PgSQL.
RANDOM GIBBERISH
If you actually want truly random strings, that's already well covered here on Stack Overflow. See How do you create a random string that's suitable for a session ID in PostgreSQL? among others; look at this search.
To ensure uniqueness, just add a UNIQUE constraint. Have your application test to see if a unique_violation was raised when you INSERTed the row, and insert it with a new random ID if a violation occurred. If you like you can automate this with a PL/PgSQL helper procedure, though it'll still be subject to races between concurrent inserts in different transactions.
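The retry-on-violation idea might look roughly like this. To keep the sketch self-contained it uses Python with an in-memory SQLite database rather than PHP and PostgreSQL, so the exception is sqlite3.IntegrityError instead of a unique_violation. The table, the word list and the function name are all made up for illustration.

```python
import random
import sqlite3

WORDS = ["romel", "sumpi", "bormi", "tavon", "quill"]  # stand-in for a real dictionary

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")


def insert_with_unique_name(conn, rng, max_attempts=100):
    """Insert a row with a random two-word name, retrying when the name is taken."""
    for _ in range(max_attempts):
        name = " ".join(rng.sample(WORDS, 2))  # two different words
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO tbl (name) VALUES (?)", (name,))
            return name
        except sqlite3.IntegrityError:
            continue  # UNIQUE constraint hit: pick another name
    raise RuntimeError("could not find a free name")


rng = random.Random(0)
names = [insert_with_unique_name(conn, rng) for _ in range(10)]
print(names)
```

Letting the constraint decide, rather than checking with a SELECT first, means the database stays the single source of truth about which names are taken.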
I suspect you actually mean "arbitrary" unique names. Simple way could be:
INSERT INTO tbl (id, name)
SELECT g, 'name'::text || g || ' surname' || g
FROM generate_series(1, 1000) g;
.. to generate 1000 distinct names - not random at all, but unique.
To generate 100 names consisting of two words with 3 - 10 random letters from A-Z:
INSERT INTO tbl (id, name)
SELECT g%100
,left(string_agg(chr(65 + (random() * 25)::int), ''), 3 + (random() * 7)::int)
|| ' ' ||
left(string_agg(chr(65 + (random() * 25)::int), ''), 3 + (random() * 7)::int)
FROM generate_series(1, 1000) g
GROUP BY 1
ORDER BY 1;
ASCII code of 'A' is 65, the one of 'Z' is 90. Fortunately, the range between spans the basic upper case alphabet. You can find out with the ascii() function, which is the reverse of chr():
SELECT ascii('A')
The second method doesn't guarantee uniqueness, but duplicates are extremely unlikely with just a few hundred names. Eliminating possible duplicates is trivial. Add another SELECT layer where you GROUP BY name and pick min(id).
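If the names are generated in application code instead, duplicates can be dropped before anything is inserted. A minimal Python sketch of the same idea (two words of 3 to 10 random letters from A-Z); the function names are hypothetical:

```python
import random
import string


def random_word(rng, min_len=3, max_len=10):
    """A word of min_len..max_len random upper-case letters."""
    length = rng.randint(min_len, max_len)  # inclusive on both ends
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(length))


def unique_names(count, rng):
    """Generate `count` distinct two-word names."""
    names = set()
    while len(names) < count:  # a duplicate simply does not grow the set
        names.add(f"{random_word(rng)} {random_word(rng)}")
    return sorted(names)


names = unique_names(100, random.Random(42))
print(names[:3])
```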
What should I use, to search for a keyword with mysql ?
I have a word and in the query I have
wordless
something word something
someword some
else
other
wordother
thingword
I want to output every row that contains the word, but with the rows that start with the word listed first. For example:
wordless - will be first because word is at the start of wordless
wordother should also be among the first rows. After them come something word something and so on: every row that contains word, but always with the rows that begin with word first.
EDIT:
SELECT *, MATCH(add_songName) AGAINST('d' IN BOOLEAN MODE) AS `score` FROM songs WHERE MATCH(add_songName) AGAINST('d') ORDER BY `score` DESC
Here I'm searching for d, but it gives me an error:
Can't find FULLTEXT index matching the column list
Try to use Levenshtein algorithm in MySQL.
Levenshtein distance is a metric for measuring the amount of difference between two sequences, here strings. By default MySQL does not have this function, but you can write and add one.
Please take a look at the code here and add that code as a system function in MySQL, please see the example below on how to get the similarity of two strings.
Please see: https://github.com/rakesh-sankar/Tools/blob/master/MySQL/Levenshtein.txt
Example:
SELECT column1, LEVENSHTEIN(column1, 'matchme') AS perfectmatch FROM sometable ORDER BY perfectmatch ASC
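For reference, the metric itself is short to write down. Below is a plain Python sketch of the standard dynamic-programming version. It is not the code from the link above, just an illustration of what such a function computes.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when the characters match
            ))
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))
```

A distance of 0 means the strings are identical, and larger values mean the strings differ more.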