Regular expression for query to SQL Server - php

I have a SQL Server connection to an external table in my application, and I need to query it, but one of the columns is badly formatted: it should be alphanumeric without symbols, but it contains dashes, apostrophes, dots, you name it. Is it possible to query that column with those characters filtered out? It'd really help me. I'm using Laravel and I know I can make an accessor to clean that up, but the query is heavy.
This is an example:
Data sought: 322211564
Data found: 322'211'564
Also 322-211-564
EDIT: Just to clarify, I don't want to EXCLUDE data, but to "reformat" it without symbols.
EDIT: By the way, if you're curious: with Laravel 5.7 you can apparently query the accessor directly if you already have the collection. I'm surprised, but it does the trick.

A wild card guess, but perhaps this works:
WITH VTE AS(
SELECT *
FROM (VALUES('322''211''564'),
('322-211-564')) V(S))
SELECT S,
(SELECT '' + token
FROM dbo.NGrams8k(V.S,1) N
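WHERE token LIKE '[A-Za-z0-9]'
ORDER BY position
FOR XML PATH('')) AS S2
FROM VTE V;
This makes use of the NGrams8k function. If you need other acceptable characters, you can simply add them to the pattern string ('[A-Za-z0-9]'). Spell out both letter ranges rather than writing [A-z]: under a binary collation, A-z also covers the characters that sort between Z and a ([, \, ], ^, _ and `).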
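To make the mechanics concrete, here is a minimal Python sketch of the same idea (not T-SQL): split the value into 1-character n-grams with their positions, as NGrams8k does, keep only the tokens in the allowed set, and concatenate them in position order. `ngrams` and `keep_alnum` are hypothetical helpers that only imitate the query's behaviour.

```python
def ngrams(s, n=1):
    """Hypothetical stand-in for dbo.NGrams8k: (1-based position, token) pairs."""
    return [(i + 1, s[i:i + n]) for i in range(len(s) - n + 1)]


def keep_alnum(s):
    """Keep only ASCII letters and digits, concatenated in position order."""
    kept = [
        tok
        for _, tok in sorted(ngrams(s, 1))       # ORDER BY position
        if tok.isascii() and tok.isalnum()       # like LIKE '[A-Za-z0-9]'
    ]
    return "".join(kept)                         # like the FOR XML PATH('') concatenation


for raw in ["322'211'564", "322-211-564"]:
    print(raw, "->", keep_alnum(raw))
```

The isascii() check keeps the filter to plain A-Z, a-z and 0-9, matching the explicit ranges in the pattern.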
If, for some reason, you don't want to use NGrams8k, you could create an inline tally table, which will perform a similar function:
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1 --10
CROSS JOIN N N2 --100
CROSS JOIN N N3 --1000
CROSS JOIN N N4 --10000 --Do we need any more than that? You may need less
),
VTE AS(
SELECT *
FROM (VALUES('322''211''564'),
('322-211-564')) V(S))
SELECT V.S,
(SELECT '' + SS.C
FROM Tally T
CROSS APPLY (VALUES(SUBSTRING(V.S,T.I,1))) SS(C)
WHERE SS.C LIKE '[A-Za-z0-9]'
ORDER BY T.I
FOR XML PATH(''),TYPE).value('.','varchar(8000)') AS S2
FROM VTE V;
Also, just in case, I've used the TYPE directive and the value() method. If you later change your mind about not wanting any special characters and need to allow a character like &, it won't be changed to &amp;.
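A rough Python analogy of why TYPE and .value() matter, using the standard library's XML escaping helpers as a stand-in for what SQL Server does when it serializes the concatenation as XML text and then reads it back (this models the entity handling only, not FOR XML itself):

```python
from xml.sax.saxutils import escape, unescape

kept = "AT&T 322"  # characters kept by the filter, including an allowed '&'

# FOR XML PATH('') on its own returns XML text, so '&' is entitized.
as_xml_text = escape(kept)
# TYPE plus .value('.', 'varchar(8000)') reads the text node back,
# which turns the entity into the original character again.
as_value = unescape(as_xml_text)

print(as_xml_text)  # AT&amp;T 322
print(as_value)     # AT&T 322
```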

Note: for pattern-based string replacements, you can use a library like SQL Server Regex. Call RegexReplace on the string you want to transform:
select RegexReplace(col, '[^A-Za-z0-9]', '') from tbl
That call will remove any non-alphanumeric character.
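The pattern itself can be tried out with Python's re.sub, which applies the same character class. This only illustrates what '[^A-Za-z0-9]' removes; it is not the SQL Server Regex library's API, and `strip_non_alnum` is a hypothetical name.

```python
import re


def strip_non_alnum(value):
    # Same pattern as the RegexReplace call: delete every character
    # that is not an ASCII letter or digit.
    return re.sub(r"[^A-Za-z0-9]", "", value)


print(strip_non_alnum("322'211'564"))  # 322211564
```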
To find all the rows where the column contains only alphanumeric characters:
select col from tbl where col not like '%[^A-Za-z0-9]%'
The like pattern consists of:
% - Matches 0 or more characters.
[^A-Za-z0-9] - Matches any single character not in A-Z, a-z or 0-9. A ^ at the beginning of a character class negates it, so the class matches only characters that are not listed.
By using NOT LIKE, the query rejects strings that contain a non-alphanumeric character anywhere in the string.
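The NOT LIKE filter boils down to "no character outside the class anywhere in the value". A small Python sketch of the same test, with hypothetical sample rows:

```python
import re

# Any single character outside A-Z, a-z, 0-9 (the bracket part of the LIKE pattern).
NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def only_alnum(value):
    # LIKE '%[^A-Za-z0-9]%' is true if such a character occurs anywhere,
    # so NOT LIKE keeps only values that contain none of them.
    return NON_ALNUM.search(value) is None


rows = ["322211564", "322'211'564", "322-211-564"]
print([r for r in rows if only_alnum(r)])  # ['322211564']
```

Like the SQL version, an empty string passes the check, since it contains no offending character.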

Related

Starts with any Digit using Eloquent ORM in Laravel [duplicate]

I've got a database table mytable with a column name in Varchar format, and column date with Datetime values. I'd like to count names with certain parameters grouped by date. Here is what I do:
SELECT
CAST(t.date AS DATE) AS 'date',
COUNT(*) AS total,
SUM(LENGTH(LTRIM(RTRIM(t.name))) > 4
AND (LOWER(t.name) LIKE '%[a-z]%')) AS 'n'
FROM
mytable t
GROUP BY
CAST(t.date AS DATE)
It seems that there's something wrong with the range syntax here: if I just do LIKE 'a%', it correctly counts all the fields starting with 'a'. However, the query above returns 0 for n, although it should count all the fields containing at least one letter.
You write:
It seems that there's something wrong with range syntax here
Indeed so. MySQL's LIKE operator, like standard SQL's, supports only the simple wildcards % and _, not range notation such as [a-z]. (SQL Server's T-SQL LIKE is an exception and does accept bracketed ranges.)
Try MySQL's nonstandard RLIKE (a.k.a. REGEXP), for fuller-featured pattern matching.
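To see why the original query counts nothing, here is a rough Python model of MySQL's LIKE in which only % and _ are wildcards and everything else, brackets included, is literal. `mysql_like` is a hypothetical helper; it ignores escape characters and approximates the default case-insensitive collations with re.IGNORECASE.

```python
import re


def mysql_like(value, pattern):
    """Rough model of MySQL LIKE: only % and _ are wildcards (no escapes handled)."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    # LIKE must match the whole value.
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


print(mysql_like("alice", "a%"))                # True
print(mysql_like("alice", "%[a-z]%"))           # False: brackets are literal here
print(mysql_like("x[a-z]y", "%[a-z]%"))         # True: the literal text "[a-z]"
print(re.search("[a-z]", "alice") is not None)  # True: a regex treats [a-z] as a range
```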
I believe LIKE is just for searching for parts of a string, but it sounds like you want to implement a regular expression to search for a range.
In that case, use REGEXP instead. For example (simplified):
SELECT * FROM mytable WHERE name REGEXP "[a-z]"
Your current query is looking for a string of literally "[a-z]".
Updated:
SELECT
CAST(t.date AS DATE) AS 'date',
COUNT(*) AS total,
SUM(LENGTH(LTRIM(RTRIM(t.name))) > 4
AND (LOWER(t.name) REGEXP '[a-z]')) AS 'n'
FROM
mytable t
GROUP BY
CAST(t.date AS DATE)
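I believe you want to use WHERE t.name REGEXP '[a-z]' instead of LIKE. Leave out the ^ and $ anchors: '^[a-z]$' would only match names that consist of exactly one letter.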
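Two details matter when moving from LIKE to REGEXP: % has no special meaning in a regex, and ^/$ anchor the pattern to the whole value. Python's re.search, which like REGEXP succeeds when the pattern matches anywhere in the value, shows the difference on a few hypothetical names:

```python
import re

names = ["Bob", "x", "1234", "50%"]


def matches(pattern):
    # Keep the names in which the pattern matches somewhere (case-insensitively).
    return [n for n in names if re.search(pattern, n, flags=re.IGNORECASE)]


print(matches(r"[a-z]"))    # ['Bob', 'x']: contains at least one letter
print(matches(r"^[a-z]$"))  # ['x']: anchors require exactly one letter
print(matches(r"%[a-z]%"))  # []: % is a literal character in a regex
```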
You have regex in your LIKE statement, which doesn't work. You need to use RLIKE or REGEXP.
SELECT CAST(t.date AS DATE) AS date,
COUNT(*) AS total
FROM mytable AS t
WHERE t.name REGEXP '[a-zA-Z]'
GROUP BY CAST(t.date AS DATE)
HAVING SUM(LENGTH(LTRIM(RTRIM(t.name))) > 4)
Also, just FYI: MySQL is slow at string manipulation, so you really should trim values before you insert them into the database. That way you don't pay that overhead every time you select.

select rows with longest substring of the string

Let me describe the problem based on the example below.
Let's say there is a string "abc12345" (it could be any string) and a table mytable with a column mycolumn of type varchar(100).
There are some rows that end with the last character 5.
There are some rows that end with the last characters 45.
There are some rows that end with the last characters 345.
There are no rows that end with the last characters 2345.
In this case these rows should be selected:
SELECT * FROM mytable WHERE mycolumn LIKE "%345"
That's because "345" is the longest right substring of "abc12345" that occurs at least once as the right substring of at least one string in the mycolumn column.
Any ideas how to write it in one query?
Thank you.
This is a brute force method:
select t.*
from (select t.*,
dense_rank() over (order by (case when mycolumn like '%abc12345' then 1
when mycolumn like '%bc12345' then 2
when mycolumn like '%c12345' then 3
when mycolumn like '%12345' then 4
when mycolumn like '%2345' then 5
when mycolumn like '%345' then 6
when mycolumn like '%45' then 7
when mycolumn like '%5' then 8
end)
) as seqnum
from mytable t
where mycolumn like '%5' -- ensure at least one match
) t
where seqnum = 1;
This then inspires something like this (generate_series() in the select list is PostgreSQL syntax):
select t.*
from (select t.*, s.i, max(s.i) over () as maxi
from mytable t join
(select str, generate_series(1, length(str)) as i
from (select 'abc12345' as str) s
) s
on right(t.mycolumn, i) = right(str, i)
) t
where i = maxi;
Interesting puzzle :)
The hardest part here is finding the length of the longest suffix of your search string that also occurs as a suffix of some value in the column.
In MySQL you probably need either a generated number series or a UDF; others have already proposed these.
In PostgreSQL and other systems that provide regexp-based substring, you can use the following trick:
select mycolumn,
reverse(
substring(
reverse(mycolumn) || '#' || reverse('abc12345')
from '^(.*).*#\1.*'
)) as res
from mytable;
What it does is:
constructs a single string combining your column value and your search string; note that we reverse both.
we put # between the two strings. This is important: you need a separator character that does not occur in either string.
we extract a match from a regular expression, using substring, such that
it starts at the beginning of the string ^
matches any number of characters (.*)
can have some remaining characters .*
now we find #
now, we want the same string we matched with (.*) to be present right after #. So we use \1
and there can be some tail characters .*
we reverse the extracted string
Once you have the longest matching suffix for each row, finding the maximum length and then selecting all rows that have a suffix of that length is trivial.
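Python's re module supports the same backreference construct, so the trick and the final "trivial" step can be sketched outside the database. This is an illustration under the answer's assumption that # occurs in neither string; `common_suffix` and `rows_with_longest_suffix` are hypothetical names, and the rows are made-up sample data.

```python
import re


def common_suffix(value, search):
    """Longest common suffix via the reverse + '#' + backreference trick.

    Assumes '#' occurs in neither string, as the answer requires.
    """
    combined = value[::-1] + "#" + search[::-1]
    # Greedy (.*) is tried longest-first; \1 must reappear right after '#'.
    m = re.match(r"^(.*).*#\1", combined)
    return m.group(1)[::-1]


def rows_with_longest_suffix(rows, search):
    suffixes = {row: common_suffix(row, search) for row in rows}
    best = max((len(s) for s in suffixes.values()), default=0)
    if best == 0:
        return []  # no row shares even the last character
    return [row for row in rows if len(suffixes[row]) == best]


rows = ["xx5", "yy45", "zz345", "q345", "abc"]
print(rows_with_longest_suffix(rows, "abc12345"))  # ['zz345', 'q345']
```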
If you cannot restructure the table, I would approach the problem this way:
Write an aggregate UDF LONGEST_SUFFIX_MATCH(col, str) in C (see an example in sql/udf_example.c in the MySQL source, search for avgcost)
SELECT @longest_match := LONGEST_SUFFIX_MATCH(mycol, "abc12345") FROM mytbl;
SELECT * FROM mytbl WHERE mycol LIKE CONCAT('%', SUBSTR("abc12345", -@longest_match));
If you could restructure the table, I do not have a complete solution yet, but the first thing I would do is add a special column myrevcol holding the reversed string (via the REVERSE() function), create an index on it, and then use that index for lookups. Will post a full solution when I have a moment.
Update:
If you can add a reversed column with a key on it:
use a query of the form SELECT myrevcol FROM mytbl WHERE myrevcol LIKE CONCAT(LEFT(REVERSE('$search_string'), $n), '%') LIMIT 1, performing a binary search on $n over the range 1 to the length of $search_string to find the largest $n for which the query returns a row
SELECT * FROM mytbl WHERE myrevcol LIKE CONCAT(LEFT(REVERSE('$search_string'), $found_n), '%')
This solution should be very fast as long as not too many rows come back. There will be O(log(L)) queries in total, where L is the length of the search string, each of them a B-tree search that reads just one row, followed by one more B-tree search that reads only the needed index rows.
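The binary search works because matching is monotone: if some row ends with the last k characters of the search string, it also ends with every shorter suffix. A minimal Python sketch, with a sorted list of reversed strings standing in for the index on myrevcol and bisect standing in for the B-tree lookup (`has_prefix`, `prefix_range` and `longest_suffix_rows` are hypothetical helpers):

```python
from bisect import bisect_left


def has_prefix(keys, prefix):
    """Like 'WHERE myrevcol LIKE prefix% LIMIT 1' on a sorted index."""
    i = bisect_left(keys, prefix)
    return i < len(keys) and keys[i].startswith(prefix)


def prefix_range(keys, prefix):
    """Index range of all sorted keys starting with prefix (a range scan)."""
    start = end = bisect_left(keys, prefix)
    while end < len(keys) and keys[end].startswith(prefix):
        end += 1
    return start, end


def longest_suffix_rows(rows, search):
    keys = sorted(row[::-1] for row in rows)  # stands in for the indexed myrevcol
    rev = search[::-1]
    lo, hi = 0, len(search)  # a suffix of length lo is known to match (0 trivially)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_prefix(keys, rev[:mid]):
            lo = mid
        else:
            hi = mid - 1
    if lo == 0:
        return 0, []
    start, end = prefix_range(keys, rev[:lo])
    return lo, [key[::-1] for key in keys[start:end]]


print(longest_suffix_rows(["xx5", "yy45", "zz345", "q345", "abc"], "abc12345"))
# (3, ['q345', 'zz345'])
```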

How to search for particular strings between commas in MySQL?

I have a table column that contains strings separated by commas, like so:
Algebraic topology,Riemannian geometries
Classical differential geometry,Noncommutative geometry,Integral transforms
Dark Matter
Spectral methods,Dark Energy,Noncommutative geometry
Energy,Functional analytical methods
I am trying to find the rows in which the search string appears as a whole comma-separated item. For example, if I search for Noncommutative geometry, I want to select these two rows:
Classical differential geometry,Noncommutative geometry,Integral transforms
Spectral methods,Dark Energy,Noncommutative geometry
This is what I tried
SELECT * FROM `mytable` WHERE `col` LIKE '%Noncommutative geometry%'
which works fine, but the problem is that if I search for Energy, I want to select only the row
Energy,Functional analytical methods
but my query returns these two rows:
Energy,Functional analytical methods
Spectral methods,Dark Energy,Noncommutative geometry
which is not what I am looking for. Is there a way to fix this so that it only finds the rows that have the string between commas?
Give these a try, using the REGEXP operator:
SELECT * FROM `mytable`
WHERE `col` REGEXP '(^|.*,)Noncommutative geometry(,.*|$)'
SELECT * FROM `mytable`
WHERE `col` REGEXP '(^|.*,)Energy(,.*|$)'
The expression being used ('(^|.*,)$searchTerm(,.*|$)') requires the search term to be either preceded by a comma or the beginning of the string, and followed by either a comma or the end of the string.
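The same whole-item test can be tried in Python with the question's sample rows. re.escape is added so that characters such as ( or + in a search term are taken literally; in MySQL you would likewise need to escape such characters before putting them into the pattern. `rows_with_item` is a hypothetical helper.

```python
import re

rows = [
    "Algebraic topology,Riemannian geometries",
    "Classical differential geometry,Noncommutative geometry,Integral transforms",
    "Dark Matter",
    "Spectral methods,Dark Energy,Noncommutative geometry",
    "Energy,Functional analytical methods",
]


def rows_with_item(term):
    # Same shape as '(^|.*,)term(,.*|$)': the term must fill a whole item.
    pattern = re.compile(r"(^|.*,)" + re.escape(term) + r"(,.*|$)")
    return [row for row in rows if pattern.search(row)]


print(rows_with_item("Energy"))  # ['Energy,Functional analytical methods']
```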
You can do it like this:
SELECT * FROM `mytable` WHERE `col` LIKE '%,$yourString,%'
or `col` LIKE '$yourString,%'
or `col` LIKE '%,$yourString'
or `col` = '$yourString'
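The combined conditions amount to four cases, sketched in Python below; the last check covers a value consisting of just the one item (such as Dark Matter), which none of the three LIKE patterns matches on its own. `is_item` is a hypothetical helper, and unlike MySQL's default collations it compares case-sensitively.

```python
def is_item(col, term):
    # Middle item, first item, last item, or the whole value is the item.
    return (
        f",{term}," in col
        or col.startswith(f"{term},")
        or col.endswith(f",{term}")
        or col == term
    )


print(is_item("Spectral methods,Dark Energy,Noncommutative geometry", "Energy"))  # False
print(is_item("Dark Matter", "Dark Matter"))  # True
```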

Separating comma broken words in PHP and selecting extra information

I've been trying to look into this for a while now but can't find an answer that explains the code properly. Basically, I have a MySQL table with 'connections' relevant to a user. These connections are separated with commas:
connection1, connection2, connection3, connection4, etc
What I need to do is to separate each one into an array like so:
$connection1
$connection2
$connection3
$connection4
Then for each of these I need to be able to select certain information from a different table, so for example:
(SELECT name,id FROM users WHERE username = (all of the connections above))
Could any of you let me know how this would be possible? Thank you
You can use FIND_IN_SET to do a JOIN, or you can join against a table of integers and use that with SUBSTRING_INDEX to get the values from the CSV string.
Normally a comma separated list in a database is the sign of poor design. Better to split them off into another table, with one row for each item in the comma separated list.
EDIT: Example of how to do it using a table of integers:
SELECT name,id
FROM users a
INNER JOIN (SELECT TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(concat(Connection, ','), ',', aCnt), ',', -1)) AS aConnection
FROM Connections
CROSS JOIN (
SELECT a.i+b.i*10+c.i*100 + 1 AS aCnt
FROM integers a, integers b, integers c) Sub1
WHERE (LENGTH(Connection) + 1 - LENGTH(REPLACE(Connection, ',', ''))) >= aCnt) Sub2
ON a.username = Sub2.aConnection
This relies on a table called integers with a single column called i, containing 10 rows with the values 0 to 9. Cross joining it against itself gives a range of numbers, in this case 1 to 1000 (0 to 999, plus 1), which is then limited by the number of items in the field you are splitting up. Each number is then used as the count for SUBSTRING_INDEX to pick out one item from the string.
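To see how the numbers and the nested SUBSTRING_INDEX calls pick out one item per number, here is a Python model. `substring_index` imitates MySQL's SUBSTRING_INDEX(str, delim, count) for positive and negative counts, and the range over a_cnt plays the role of the filtered cross join; .strip() mirrors a TRIM for the spaces after the commas in the question's data. Both helper names are hypothetical.

```python
def substring_index(s, delim, count):
    """Python model of MySQL SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything right of the count-th from the end
    return ""


def split_csv(connection):
    # commas + 1 = number of items, like the LENGTH/REPLACE condition
    n_items = len(connection) + 1 - len(connection.replace(",", ""))
    return [
        substring_index(substring_index(connection + ",", ",", a_cnt), ",", -1).strip()
        for a_cnt in range(1, n_items + 1)
    ]


print(split_csv("connection1, connection2, connection3"))
# ['connection1', 'connection2', 'connection3']
```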
It is simple. Use PHP's explode function.
$text = "connection1, connection2, connection3, connection4";
// explode() splits on the comma; trim() removes the space after each comma
$res = array_map('trim', explode(',', $text));
foreach ($res as $id) {
echo $id, "\n";
}
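For the second half of the question (selecting name and id for every username in the list), one option is an IN list with one placeholder per item, so the values are bound rather than pasted into the SQL. Below is a self-contained Python sketch that uses an in-memory SQLite database as a stand-in for MySQL; the users table contents are hypothetical sample data. With PDO in PHP the same pattern applies: a string of ? placeholders and an array of values passed to execute().

```python
import sqlite3

text = "connection1, connection2, connection3, connection4"
names = [part.strip() for part in text.split(",") if part.strip()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (username, name) VALUES (?, ?)",
    [("connection1", "Alice"), ("connection3", "Carol"), ("someone_else", "Eve")],
)

# One placeholder per item, so the values are bound rather than pasted into SQL.
placeholders = ", ".join("?" for _ in names)
query = f"SELECT name, id FROM users WHERE username IN ({placeholders}) ORDER BY id"
print(conn.execute(query, names).fetchall())  # [('Alice', 1), ('Carol', 2)]
```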

Regex to Parse MySQL Column Type for Floats, Decimals, Doubles

In MySQL, the result of DESCRIBE table_name or SHOW COLUMNS FROM table_name contains a Type field, which for floats, decimals and doubles can look like decimal(15,3) or just decimal. The same applies to float and double.
I currently have:
'/^float|decimal|double(?:\((\d+),(\d+)\))?$/'
How do I need to modify this regular expression so that the (15,3) would be optional?
Update (2):
Even though the (15,3) is optional, I still need to capture the two values if they are there. Adding the ? worked but now it doesn't capture the 15 and 3. Suggestions?
'/^(?:float|decimal|double)(?:\((\d+),(\d+)\))?/'
Surround the \((\d+),(\d+)\) part with a non-capturing group ((?:)) and make it optional (?).
By the way: Is the $ at the end missing intentionally?
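A quick way to check the capture behaviour is Python's re module, which handles this pattern the same way as PCRE (the engine behind PHP's preg_match) for these inputs. With the $ anchor restored, both numbers are captured when present and the groups are None when absent; PHP reports absent groups differently (trailing unmatched groups can simply be missing from $matches).

```python
import re

# Same pattern as in the question, with the $ anchor restored.
TYPE_RE = re.compile(r"^(?:float|decimal|double)(?:\((\d+),(\d+)\))?$")

for column_type in ["decimal(15,3)", "decimal", "float(7,4)", "double"]:
    m = TYPE_RE.match(column_type)
    print(column_type, m.groups() if m else None)
# decimal(15,3) ('15', '3')
# decimal (None, None)
# float(7,4) ('7', '4')
# double (None, None)
```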
You just need to append ? to the group you want to make optional. It means "zero or one". So you would need the following (because this group is capturing, the two numbers end up in groups 2 and 3):
'/^(?:float|decimal|double)(\((\d+),(\d+)\))?/'
Try reading the data from the information_schema.columns table.
For example:
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, COLUMN_TYPE FROM information_schema.columns;
DATA_TYPE returns decimal
COLUMN_TYPE returns decimal(5,2)
The NUMERIC_PRECISION and NUMERIC_SCALE columns of the same table hold the two numbers separately.
SELECT * FROM mytable WHERE mycolumn REGEXP "^(((float|decimal|double)(\\([[:digit:]]+,[[:digit:]]+\\))?))$";
This will match both of your cases. It expects float, decimal or double, either alone or followed by (at least one digit,at least one digit), e.g. decimal(15,3).
Maybe this blog post (in Spanish) can help you:
http://qpassablog.blogspot.com.es/2014/02/expresiones-regulares-en-mysql.html
Ok, sorry for the inconvenience. This is the solution:
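To check whether the field value is a decimal number, you can use the following expression (the dot is written as [.] so that it matches only a literal decimal point; a bare . would match any character):
REGEXP '^[[:digit:]]+[.]?[[:digit:]]*$'
Example: select '3.222' REGEXP '^[[:digit:]]+[.]?[[:digit:]]*$' as prueba;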
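A bare . in a regex matches any character, so a pattern written with .{0,1} also accepts values such as 3x222. A quick Python comparison (POSIX [[:digit:]] is written as [0-9] here, since Python's re does not support POSIX classes):

```python
import re

loose = re.compile(r"^[0-9]+.{0,1}[0-9]*$")  # unescaped '.' = any character
strict = re.compile(r"^[0-9]+[.]?[0-9]*$")   # '[.]' = a literal decimal point

for value in ["3.222", "3x222", "42"]:
    print(value, bool(loose.match(value)), bool(strict.match(value)))
# 3.222 True True
# 3x222 True False
# 42 True True
```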
The INFORMATION_SCHEMA database is very useful because it splits the type information into separate columns. However, it can be incredibly slow to access. On my web host it takes 95 seconds!
