Is it possible to use a query to swap positions of characters within the SAME value/column?
Example, I have a large table with a column that lists sizes like so:
12x24 Bulletin
7x14 Bulletin
14x48 Bulletin, etc.
The problem is now my boss has decided that he wants it to read:
Bulletin 12x24
Bulletin 7x14
Bulletin 14x48
Is it possible to swap those positions with a query or a regular expression? Or do I have to update each record manually (very time consuming, as there are approx. 500 records to modify)? Thanks for your help!
Is there always a single space between these two values? If so, you can select your records like this:
SELECT CONCAT(SUBSTRING_INDEX(details, ' ', -1),' ',SUBSTRING_INDEX(details, ' ', 1)) AS value FROM tableName
SQL Fiddle: http://sqlfiddle.com/#!2/a04d8d/6
Table schema for this query I used:
CREATE TABLE tableName (
id int auto_increment primary key,
details varchar(30));
INSERT INTO tableName (id, details)
VALUES (NULL,'12x24 Bulletin'), (NULL,'7x14 Bulletin');
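One caveat with the SUBSTRING_INDEX approach: it keeps only the first and the last space-separated token, so a value with three or more words would lose its middle part. Here is a minimal Python sketch of the same logic for sanity-checking sample values before running the query. The `substring_index` helper is a hypothetical re-implementation written for this illustration, not MySQL's own code.

```python
def substring_index(s, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX (hypothetical helper)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything after the count-th delimiter from the right
    return ""

def swap_first_last(s):
    # Mirrors CONCAT(SUBSTRING_INDEX(s, ' ', -1), ' ', SUBSTRING_INDEX(s, ' ', 1))
    return substring_index(s, " ", -1) + " " + substring_index(s, " ", 1)

print(swap_first_last("12x24 Bulletin"))        # two tokens: clean swap
print(swap_first_last("12x24 Large Bulletin"))  # three tokens: the middle one is dropped
```

If any of the 500 records has more than two words, the REPLACE-based answers are the safer choice.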
Here's what seems natural to me, though not too flexible:
UPDATE SOMETABLE
SET SOMECOLUMN = CONCAT('Bulletin ', REPLACE(SOMECOLUMN, ' Bulletin', ''))
WHERE SOMECOLUMN LIKE '% Bulletin'
;
Yes. It is possible.
The "trick" is to write an expression that returns the new value to be assigned.
It looks like you could achieve what you want by finding occurrences of ' Bulletin' and replacing that with an empty string, and pre-pending the remaining value with 'Bulletin '.
Here's a demonstration...
-- test cases
CREATE TABLE foo (id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, mycol VARCHAR(255));
INSERT INTO foo (mycol) VALUES ('12x24 Bulletin'),('7x14 Bulletin'),('14x48 Bulletin');
-- test expression to return new value to be assigned
SELECT mycol AS oldval
, CONCAT('Bulletin ',REPLACE(mycol,' Bulletin','')) AS newval
FROM foo
WHERE mycol LIKE '% Bulletin%';
-- use that expression in an UPDATE statement to replace the value
UPDATE foo
SET mycol = CONCAT('Bulletin ',REPLACE(mycol,' Bulletin',''))
WHERE mycol LIKE '% Bulletin%';
There are some gotchas to be aware of in the UPDATE statement, such as referencing a column in an expression after it's been assigned a value. (That's not a problem in the example above, but it can be in the more general case.)
Note: The MySQL REGEXP operator returns a boolean, not a string, so no, you can't use it to return a replacement value. (MySQL 8.0 and later provide a separate REGEXP_REPLACE() function for that.) You could use REGEXP in place of the LIKE operator in the examples above.
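Before running an UPDATE like this against real data, it can help to rehearse it on a throwaway copy. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL: SQLite writes CONCAT(a, b) as `a || b`, while REPLACE and LIKE behave the same for this case. The extra 'Stapler 12' row is only there to show that non-matching rows are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory scratch database, nothing is persisted
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, mycol TEXT)")
conn.executemany(
    "INSERT INTO foo (mycol) VALUES (?)",
    [("12x24 Bulletin",), ("7x14 Bulletin",), ("Stapler 12",)],
)

# Same expression as the MySQL UPDATE, with || in place of CONCAT().
conn.execute(
    "UPDATE foo SET mycol = 'Bulletin ' || REPLACE(mycol, ' Bulletin', '') "
    "WHERE mycol LIKE '% Bulletin'"
)

rows = [r[0] for r in conn.execute("SELECT mycol FROM foo ORDER BY id")]
print(rows)
```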
FOLLOWUP
The above assumes you want to update the values stored in a table. If you only want to do the modification at the time you pull the data, you could use an expression in the SELECT list, similar to the above.
INSERT INTO foo (mycol) VALUES ('Stapler 12');
SELECT IF(mycol LIKE '% Bulletin'
,CONCAT('Bulletin ',REPLACE(mycol,' Bulletin',''))
,mycol
) AS mycol
FROM foo;
If those examples hold true, then you can do the swapping by doing:
update t
set col = concat_ws(' ', substring_index(col, ' ', -1), substring_index(col, ' ', 1))
where col like '% Bulletin';
Alternatively, you could just do this when you query on the column:
select concat_ws(' ', substring_index(col, ' ', -1), substring_index(col, ' ', 1)) as size,
<other columns>
from t;
However, the existence of the problem suggests that you are stuffing two types of information into a single column. You should consider having two columns -- in your case, one would contain 'Bulletin' (some sort of "sizetype" column) and another would have the dimensions. You might even want to split the dimensions into additional columns.
Actually, as I write this, I suspect that you are missing an entity in your database. This column should actually be a foreign key to a sizes table. That table would have one row per size thingee. You can then have various columns for representing the size to suit you, your boss, your boss's boss, or whoever.
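A rough sketch of what that sizes table could look like, again using Python's sqlite3 module as a stand-in for MySQL. All table and column names here (`sizes`, `items`, `size_type`, `width`, `height`) are made up for illustration. Once the parts are stored separately, the display order becomes a formatting decision in the query rather than a data fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sizes (
    size_id   INTEGER PRIMARY KEY,
    size_type TEXT    NOT NULL,   -- e.g. 'Bulletin'
    width     INTEGER NOT NULL,
    height    INTEGER NOT NULL
);
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    size_id INTEGER NOT NULL REFERENCES sizes(size_id)
);
INSERT INTO sizes (size_type, width, height) VALUES ('Bulletin', 12, 24), ('Bulletin', 7, 14);
INSERT INTO items (size_id) VALUES (1), (2);
""")

# "Type first" label; putting the dimensions first only means reordering the pieces.
labels = [r[0] for r in conn.execute(
    "SELECT s.size_type || ' ' || s.width || 'x' || s.height "
    "FROM items i JOIN sizes s ON s.size_id = i.size_id ORDER BY i.item_id"
)]
print(labels)
```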
Related
I have an enum column in my table and I am trying to show the values it allows in a drop-down. So first I have written this query to get the column_type and column_name
"SELECT `COLUMN_NAME`,`DATA_TYPE`,`COLUMN_TYPE` FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_SCHEMA`='devsbh' AND `TABLE_NAME`='modules' AND `COLUMN_NAME` NOT IN ('created_at', 'updated_at')"
then I do this to get out the enum values like so
<?php
$regex = "/'(.*?)'/";
preg_match_all( $regex , $modules->COLUMN_TYPE , $enum_array );
$enum_fields = $enum_array[1];
?>
and I display like so
PS:Using laravel's blade template engine.
{!! Form::select($modules->COLUMN_NAME,$enum_fields) !!}
Everything is correct up until here. When I try to store the selection, it tries to save 0 for Y and 1 for N. How can I make each key the same as its enum value?
The values of $enum_fields as per the console are [0]=>Y , [1]=>N.
You can use the array_combine function to make each key the same as its value, like so
<?php
$regex = "/'(.*?)'/";
preg_match_all( $regex , $modules->COLUMN_TYPE , $enum_array );
$keyValueSame = array_combine($enum_array[1],$enum_array[1]);
?>
now the key and value of the $keyValueSame array will have the same value.
the values of $keyValueSame as per the console is [Y]=>Y , [N]=>N.
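For readers who want to check the idea outside PHP, here is the same two-step logic as a small Python sketch. The sample `column_type` string is an assumption about what INFORMATION_SCHEMA returns for a Y/N enum column.

```python
import re

column_type = "enum('Y','N')"  # assumed sample COLUMN_TYPE value

values = re.findall(r"'(.*?)'", column_type)  # same pattern as the PHP regex
options = dict(zip(values, values))           # analogue of array_combine($v, $v)
print(options)
```

With the keys equal to the values, the form submits 'Y' or 'N' instead of the list positions 0 or 1.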
You are probably better off just hard coding the enumerations into your codebase.
Assuming you are using MySQL, the idea behind enumerations is really to restrict the values in the column to a specific set - basically saying "if the value is not one of these strings, don't allow it.".
Enumerations are not designed to be changed often (if at all). In fact, you may run into issues if you do try to alter them: it can take some database gymnastics, especially if you have lots of records.
If your "lookup" data will change, and you need it to be database stored, make the column a foreign key to another table containing your lookup fields.
If you are stuck with the enumerations, just hard code the list in your dropdown.
Getting the Value:
I've got the levenshtein_ratio function, from here, queued up in my MySQL database. I run it in the following way:
$stmt = $db->prepare("SELECT r_id, val FROM table WHERE levenshtein_ratio(:input, someval) > 70");
$stmt->execute(array('input' => $input));
$result = $stmt->fetchAll();
if(count($result)) {
foreach($result as $row) {
$out .= $row['r_id'] . ', ' . $row['val'];
}
}
And it works a treat, exactly as expected. But I was wondering, is there a nice way to also get the value that levenshtein_ratio() calculates?
I've tried:
$stmt = $db->prepare("SELECT levenshtein_ratio(:input, someval), r_id, val FROM table WHERE levenshtein_ratio(:input, someval) > 70");
$stmt->execute(array('input' => $input));
$result = $stmt->fetchAll();
if(count($result)) {
foreach($result as $row) {
$out .= $row['r_id'] . ', ' . $row['val'] . ', ' . $row[0];
}
}
and it does technically work (I get the percentage from the $row[0]), but the query is a bit ugly, and I can't use a proper key to get the value, like I can for the other two items.
Is there a way to somehow get a nice reference for it?
I tried:
$stmt = $db->prepare("SELECT r_id, val SET output=levenshtein_ratio(:input, someval) FROM table WHERE levenshtein_ratio(:input, someval) > 70");
modelling it after something I found online, but it didn't work, and ends up ruining the whole query.
Speeding It Up:
I'm running this query for an array of values:
foreach($parent as $input){
$stmt = ...
$stmt->execute...
$result = $stmt->fetchAll();
... etc
}
But it ends up being remarkably slow. Like 20s slow, for an array of only 14 inputs and a DB with about 350 rows, which is expected to be in the 10,000's soon. I know that putting queries inside loops is naughty business, but I'm not sure how else to get around it.
EDIT 1
When I use
$stmt = $db->prepare("SELECT r_id, val SET output=levenshtein_ratio(:input, someval) FROM table WHERE levenshtein_ratio(:input, someval) > 70");
surely that's costing twice the time as if I only calculated it once? Similar to having $i < sizeof($arr); in a for loop?
To clean up the column names you can use AS to give the function result an alias. You can then refer to that alias in a HAVING clause (not WHERE), so the function call only appears once in the query.
$stmt = $db->prepare("SELECT r_id, levenshtein_ratio(:input, someval) AS val FROM table HAVING val > 70");
If it is still too slow you might consider a C library (UDF) like https://github.com/juanmirocks/Levenshtein-MySQL-UDF
doh - forgot to switch "where" to "having", as spencer7593 noted.
I'm assuming that `someval` is an unqualified reference to a column in the table. While you may understand that without looking at the table definition, someone else reading the SQL statement can't tell. As an aid to future readers, consider qualifying your column references with the name of the table or (preferably) a short alias assigned to the table in the statement.
SELECT t.r_id
, t.val
FROM `table` t
WHERE levenshtein_ratio(:input, t.someval) > 70
That function in the WHERE clause has to be evaluated for every row in the table. There's no way to get MySQL to build an index on that. So there's no way to get MySQL to perform an index range scan operation.
It might be possible to get MySQL to use an index for the query, for example, if the query had an ORDER BY t.val clause, or if there is a "covering index" available.
But that doesn't get around the issue of needing to evaluate the function for every row. (If the query had other predicates that excluded rows, then the function wouldn't necessarily need be evaluated for the excluded rows.)
Adding the expression to the SELECT list really shouldn't be too expensive if the function is declared to be DETERMINISTIC. A second call to a DETERMINISTIC function with the same arguments can reuse the value returned for the previous execution. (Declaring a function DETERMINISTIC essentially means that the function is guaranteed to return the same result when given the same argument values. Repeated calls will return the same value. That is, the return value depends only on the argument values, and doesn't depend on anything else.)
SELECT t.r_id
, t.val
, levenshtein_ratio(:input, t.someval) AS lev_ratio
FROM `table` t
WHERE levenshtein_ratio(:input2, t.someval) > 70
(Note: I used a distinct bind placeholder name for the second reference because PDO doesn't handle "duplicate" bind placeholder names as we'd expect. It's possible that this has been corrected in more recent versions of PDO. The first "fix" for the issue was an update to the documentation noting that bind placeholder names should appear only once in a statement; if you need two references to the same value, use two different placeholder names and bind the same value to both.)
If you don't want to repeat the expression, you could move the condition from the WHERE clause to the HAVING, and refer to the expression in the SELECT list by the alias assigned to the column.
SELECT t.r_id
, t.val
, levenshtein_ratio(:input, t.someval) AS lev_ratio
FROM `table` t
HAVING lev_ratio > 70
The big difference between WHERE and HAVING is that the predicates in the WHERE clause are evaluated when the rows are accessed. The HAVING clause is evaluated much later, after the rows have been accessed. (That's a brief explanation of why the HAVING clause can reference columns in the SELECT list by their alias, but the WHERE clause can't do that.)
If that's a large table, and a large number of rows are being excluded, there might be a significant performance difference when using the HAVING clause: a much larger intermediate set may be created.
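Another way to get a named column without writing the call twice is a derived table: compute the ratio in an inner SELECT and filter on its alias in the outer WHERE (MySQL accepts this form too, as long as the derived table has an alias). A minimal sketch, using Python's sqlite3 module as a stand-in for MySQL and PDO. The `levenshtein_ratio` defined here is an assumed definition (percentage similarity relative to the longer string), not necessarily the stored function from the question, and the table `t` and its rows are made up.

```python
import sqlite3

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def levenshtein_ratio(a, b):
    # Assumed definition: similarity as a percentage of the longer string's length.
    longest = max(len(a), len(b))
    return 100.0 if longest == 0 else (1 - levenshtein(a, b) / longest) * 100

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read row["lev_ratio"] instead of row[0]
conn.create_function("levenshtein_ratio", 2, levenshtein_ratio)
conn.execute("CREATE TABLE t (r_id INTEGER PRIMARY KEY, val TEXT, someval TEXT)")
conn.executemany("INSERT INTO t (val, someval) VALUES (?, ?)",
                 [("a", "kitten"), ("b", "sitten"), ("c", "dog")])

rows = conn.execute(
    "SELECT x.r_id, x.val, x.lev_ratio FROM ("
    "  SELECT r_id, val, levenshtein_ratio(?, someval) AS lev_ratio FROM t"
    ") AS x WHERE x.lev_ratio > 70 ORDER BY x.r_id",
    ("kitten",),
).fetchall()
for row in rows:
    print(row["r_id"], row["val"], row["lev_ratio"])
```

In PDO the equivalent of `row["lev_ratio"]` is simply `$row['lev_ratio']` once the expression has an alias.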
To get an "index used" for the query, a covering index is the only option I see.
... ON `table` (r_id, val, someval)
With that, MySQL can satisfy the query from the index, without needing to lookup pages in the underlying table. All of the column values the query needs are available from the index.
FOLLOWUP
To get an index created, we would need to create a column, e.g.
lev_ratio_foo FLOAT
and pre-populate with the result from the function
UPDATE `table` t
SET t.lev_ratio_foo = levenshtein_ratio('foo', t.someval)
;
Then we could create an index, e.g.
... ON `table` (lev_ratio_foo, val, r_id)
And re-write the query
SELECT t.r_id
, t.val
, t.lev_ratio_foo
FROM `table` t
WHERE t.lev_ratio_foo > 70
With that query, MySQL can make use of an index range scan operation on an index with lev_ratio_foo as the leading column.
Likely, we would want to add BEFORE INSERT and BEFORE UPDATE triggers to maintain the value, when a new row is added to the table, or the value of the someval column is modified.
That pattern could be extended, additional columns could be added for values other than 'foo'. e.g. 'bar'
UPDATE `table` t
SET t.lev_ratio_bar = levenshtein_ratio('bar', t.someval)
Obviously that approach isn't going to be scalable for a broad range of input values.
I am searching welds.welder_id and welds.bal_welder_id which are lists of unique welder IDs separated by spaces by the users.
The record set looks like 99,199,99 w259,w259 259 5-a
99,199,259,5-a and w259 are unique welder id numbers
I cannot use the MySQL INSTR() function by itself, as a search for "99" will pull up records with "199".
Users on each project format their welder IDs a different way (000,a000,0aa) usually to match their customer's records.
I really want to avoid using PHP code for a number of reasons.
To select records with "w259" in the welder_id OR in the bal_welder_id columns, my query looks like this.
SELECT * FROM `welds`
WHERE `omit`=0
AND( (`welder_id`='w259' OR `bal_welder_id`='w259')
OR (`welder_id` LIKE 'w259 %' OR `bal_welder_id` LIKE 'w259 %')
OR (`welder_id` LIKE '% w259' OR `bal_welder_id` LIKE '% w259')
OR (INSTR(`welder_id`, ' w259 ') > 0 OR INSTR(`bal_welder_id`,' w259 ') > 0))
ORDER BY `date_welded` DESC
LIMIT 100;
It works but it takes 0.0030 seconds with 1300 test records on my workstation's SSD.
The actual DB will have hundreds of thousands after a year or two.
Is there a better way?
Thanks.
If I understand your question correctly, one option is to use FIND_IN_SET(str, strlist) string function, which returns the position of the string str in the comma separated string list strlist, for example:
SELECT FIND_IN_SET('b','a,b,c,d');
will return 2. Since your string is not separated by commas, but by spaces, you could use REPLACE() to replace spaces with commas. Your query can be like this:
SELECT * FROM `welds`
WHERE
`omit`=0
AND
(FIND_IN_SET('w259', REPLACE(welder_id, ' ', ','))>0
OR
FIND_IN_SET('w259', REPLACE(bal_welder_id, ' ', ','))>0)
The optimizer, however, cannot do much, since FIND_IN_SET cannot make use of an index, even if one is present. I would suggest normalizing your table, if possible.
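To make the normalization suggestion concrete, here is a rough sketch using Python's sqlite3 module as a stand-in for MySQL. The link table `weld_welders` and its index are hypothetical names. Each space-separated ID becomes its own row, so the lookup is a plain equality test that an index can serve, and "99" can no longer match "199". Only `welder_id` is migrated here; `bal_welder_id` could go into the same table with an extra column saying which role the welder had.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE welds (id INTEGER PRIMARY KEY, welder_id TEXT);
CREATE TABLE weld_welders (weld_id INTEGER, welder TEXT);   -- hypothetical link table
CREATE INDEX idx_weld_welders ON weld_welders (welder, weld_id);
""")
conn.executemany("INSERT INTO welds (id, welder_id) VALUES (?, ?)",
                 [(1, "99"), (2, "199"), (3, "99 w259"), (4, "w259 259 5-a")])

# One-off migration: split each space-separated list into one row per welder.
for weld_id, ids in conn.execute("SELECT id, welder_id FROM welds").fetchall():
    conn.executemany("INSERT INTO weld_welders VALUES (?, ?)",
                     [(weld_id, w) for w in ids.split()])

def welds_for(welder):
    """Return the ids of welds that list exactly this welder ID."""
    sql = "SELECT DISTINCT weld_id FROM weld_welders WHERE welder = ? ORDER BY weld_id"
    return [r[0] for r in conn.execute(sql, (welder,))]

print(welds_for("99"))
print(welds_for("w259"))
```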
I am using the following query (simplified for here) to check if a string contains a "watch-word" where the watch words are contained in a MySQL table:
$sql = "SELECT ww_id FROM watch_words WHERE ww_word IN (" . $string . ")";
This works perfectly for single words, but now I need to make it work for phrases (i.e. the field ww_word may contain more than one word). All I can think of are things like reading the whole table into an array and then doing multiple loops to compare against combinations of the words in the string, but I'm sure (hoping) there's a better way.
EDIT: Thanks for the suggestions, but as pointed out by Mike Brant, the needle is in MySQL and the haystack in PHP - not the "usual" way around (like a search form for instance). I need to check if a string (actually a message) contains one or more "watch phrases" - like a bad-language filter (but not that).
Sample table thus:
CREATE TABLE `watch_words` (
`ww_id` int(11) NOT NULL AUTO_INCREMENT,
`ww_word` varchar(250) NOT NULL,
PRIMARY KEY (`ww_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=6;
INSERT INTO `watch_words` VALUES (1, 'foo bar');
INSERT INTO `watch_words` VALUES (2, 'nice sunny day');
INSERT INTO `watch_words` VALUES (3, 'whatever');
INSERT INTO `watch_words` VALUES (4, 'my full name here');
INSERT INTO `watch_words` VALUES (5, 'keyword');
So string "What a nice sunny day we're having" should return a match, whereas "What a lovely sunny day..." wouldn't. TIA.
Use LIKE for pattern matching:
$sql = "SELECT ww_id FROM watch_words WHERE ww_word LIKE '%" . $string . "%'";
or, since the phrases live in the table and the message comes from PHP, swap the two operands:
$sql = "SELECT ww_id FROM watch_words WHERE '" . $string . "' LIKE CONCAT('%', ww_word, '%')";
As a side note, the query is vulnerable to SQL injection if the value(s) come from outside. Please take a look at the article below to learn how to prevent it. By using prepared statements you can avoid putting single quotes around values.
How to prevent SQL injection in PHP?
You will likely need to take a different approach here. You have the needle in MySQL and the haystack in PHP. With LIKE (which is what you use for string matches, not IN), MySQL works fine when the haystack is in a MySQL table and the needle is in the application (in the LIKE pattern).
There is no convenient reverse matching to pass MySQL the haystack and have it apply a needle from a field in a table against it.
You will likely need to select your needles out of the database and compare them to the haystack in your application.
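The application-side comparison could look roughly like the Python sketch below (the same loop translates directly to PHP with preg_match). It assumes the rows of `watch_words` have already been fetched as (id, phrase) pairs. The function name `matching_ids` and the choice of case-insensitive whole-word matching are assumptions for illustration.

```python
import re

def matching_ids(message, watch_words):
    """Return the ids of watch phrases that occur as whole words in message."""
    hits = []
    for ww_id, phrase in watch_words:
        # Lookarounds keep 'keyword' from matching inside 'keywords'.
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, message, flags=re.IGNORECASE):
            hits.append(ww_id)
    return hits

# Sample rows copied from the question's INSERT statements.
watch_words = [(1, "foo bar"), (2, "nice sunny day"), (3, "whatever"),
               (4, "my full name here"), (5, "keyword")]

print(matching_ids("What a nice sunny day we're having", watch_words))
print(matching_ids("What a lovely sunny day...", watch_words))
```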
I'm having trouble with the sql below. Basically I have rows that contains strings according to the format: 129&c=cars. I only want the digits part, e.g. 129. The sql query is:
$result = mysql_query("SELECT * FROM " . $db_table . " WHERE id LIKE '" . $id . "%'");
Why doesn't % work? I can't use %...% because it catches too much.
I would actually recommend using regular expressions for the matching, but unfortunately, there is no way to capture the matching part with MySQL. You will have to do the extraction in PHP. If you have an array containing all the results called $array:
$array = preg_replace('/^(\d+).*/', '$1', $array);
You can use the MySQL 'regexp' stuff in the WHERE clause to reduce the amount of data retrieved to just the rows you want. The basic form of your query would look like:
SELECT * FROM table WHERE field REGEXP '^$id&'
where $id is inserted by PHP and the data you want is always at the start of the field and followed by a &. If not, adjust the regex to suit, of course.
MySQL's regex engine can't do capturing, unfortunately, so you'll still have to do some parsing in PHP as soulmerge showed above, but with the 'where regexp' stuff in MySQL, you'll only have to deal with rows you know contain the data you want, not the entire table.
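The two pieces of logic in these answers can be tried out in a few lines of Python before wiring them into PHP and SQL. This is only a sketch; the helper names `leading_digits` and `has_id` are made up. It also shows why `LIKE '129%'` matches too much: a row starting with 1290 begins with 129 as well, whereas anchoring on the trailing & does not have that problem.

```python
import re

def leading_digits(value):
    """Return the run of digits at the start of value, or None if there is none."""
    m = re.match(r"\d+", value)
    return m.group(0) if m else None

def has_id(value, wanted_id):
    # Same idea as REGEXP '^<id>&': the id must be followed directly by '&'.
    return re.match(re.escape(wanted_id) + "&", value) is not None

print(leading_digits("129&c=cars"))
print(has_id("129&c=cars", "129"))
print(has_id("1290&c=cars", "129"))
```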
Using a query like this:
SELECT *
FROM mytable
WHERE id >= '0' COLLATE UTF8_BIN
AND id < ':' COLLATE UTF8_BIN
will return all strings that start with a digit and make your expression sargable, i.e. an index on id can be used.
This will make your query run faster.
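The range works because ':' is the character immediately after '9' in ASCII, so under a binary comparison every string whose first character is a digit sorts between '0' (inclusive) and ':' (exclusive). A quick Python check of that reasoning (Python compares strings by code point, which matches binary collation for these characters):

```python
def starts_with_digit(s):
    # Half-open range ['0', ':') covers exactly the strings whose first character is 0-9.
    return "0" <= s < ":"

print(chr(ord("9") + 1))                # the character right after '9'
print(starts_with_digit("129&c=cars"))
print(starts_with_digit("abc"))
```

Note that this only selects rows that begin with some digit. Matching one particular id still needs a condition like the REGEXP shown earlier.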