Splitting value in MySQL - php

I want to update a field on a really huge (1m rows) table. I want to update it from:
+-----------------------------------------------------------+
| ref |
+-----------------------------------------------------------+
| 0001___000000000003616655___IVANTI UK___TEMPLATE MATERIAL |
+-----------------------------------------------------------+
to:
+-------------------------------+
| ref |
+-------------------------------+
| IVANTI UK___TEMPLATE MATERIAL |
+-------------------------------+
So basically it's just changing the ref (which is not fixed length) from the sid___sku___mfr___pnum format to the mfr___pnum format.
In PHP I'd do it like so (pseudocode):
list($p['sid'], $p['sku'], $p['mfr'], $p['pnum']) = explode('___', $row['ref']);
$row['ref'] = $p['mfr'] . '___' . $p['pnum'];
I'm wondering if it's possible to do this directly in MySQL with a performant query.

select SUBSTRING_INDEX(ref,'___',-2) from test
0001___000000000003616655___IVANTI UK___TEMPLATE MATERIAL
=>
IVANTI UK___TEMPLATE MATERIAL
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_substring-index
SUBSTRING_INDEX(str,delim,count)
Returns the substring from string str before count occurrences of the
delimiter delim. If count is positive, everything to the left of the
final delimiter (counting from the left) is returned. If count is
negative, everything to the right of the final delimiter (counting
from the right) is returned. SUBSTRING_INDEX() performs a
case-sensitive match when searching for delim.
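To make the negative count easier to follow, here is a minimal Python sketch of the documented behaviour. The helper name substring_index is hypothetical and only imitates the MySQL function for simple, non-overlapping delimiters; it is not part of any library.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Rough Python imitation of MySQL's SUBSTRING_INDEX(str, delim, count)."""
    if count == 0 or delim == "":
        return ""
    parts = text.split(delim)
    if count > 0:
        # Keep everything left of the count-th delimiter, counting from the left.
        return delim.join(parts[:count])
    # Keep everything right of the |count|-th delimiter, counting from the right.
    return delim.join(parts[count:])


ref = "0001___000000000003616655___IVANTI UK___TEMPLATE MATERIAL"
new_ref = substring_index(ref, "___", -2)
print(new_ref)  # IVANTI UK___TEMPLATE MATERIAL
```

Note that a value with fewer than two delimiters comes back unchanged, so rows that are already in the mfr___pnum format would not be damaged by applying the same expression again.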

Related

PHP/MySQL table update issue while checking condition in column

I have the above table, tblCompInfo. The product_id value is not 100% accurate and I need to fix it. I have a total of 543847 rows with 25 different companies and 12 different products.
The URL is 100% accurate. As you can see from the image, the values highlighted in RED are wrong and the values in GREEN are what they should be updated to.
TASK:
I need to update Product_id by parsing the URL, extracting the INTEGER and checking it against the product table: if it's a product, assign that value, else assign 0.
SOLUTION:
I have two solutions in mind:
1. EXPORT the entire DATA to an Excel CSV, change it, and UPLOAD it back to the DATABASE, which means I would spend my entire week working in Excel.
2. Since I use the Laravel framework, I can write a function in PHP that fetches the DATA company by company and UPDATES the table in a foreach loop with a condition.
PROBLEM:
So, to make my life easy, I wrote the PHP function with a simple solution and it works, BUT I get a MEMORY ALLOCATION PROBLEM.
$companyID = ??;
$tblCompInfos = tblCompInfo::where('company_id', '=', $companyID)->get();
foreach($tblCompInfos as $tblCompInfo)
{
$actual_link = $tblCompInfo->url;
$pathlink = parse_url($actual_link, PHP_URL_PATH);
$product_id_from_url = preg_replace("/[^0-9]/", "" , $pathlink);
$FindIfItsInProductTable = Product::find($product_id_from_url);
$real_product_id = $FindIfItsInProductTable == null ? 0 : $product_id_from_url;
DB::table('tblCompInfo')->where('company_id', '=', $companyID)->where('url', '=', $tblCompInfo->url)->update(array(
'product_id' => $real_product_id,
));
echo $actual_link."-".$real_product_id."=".$tblCompInfo->product_id."<br>";
}
If it were a local server, I would have updated my php.ini with more memory and done the job.
However, this has to be done on a LIVE server, where I have no control over php.ini.
What should I do? How can I do this without running into a memory issue?
Can anyone help?
Try this:
UPDATE [table_name] SET product_id = CONVERT(SUBSTR(url, LOCATE('products/', url)+9, LOCATE('/compare',url)-(LOCATE('products/', url)+9)),UNSIGNED INTEGER)
But this will only work if every url field has the suffix /compare.
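The offsets in that expression are easy to get wrong, so here is a small Python sketch that mirrors the arithmetic on the sample URL. The helpers locate and substr are hypothetical stand-ins for the 1-based MySQL functions LOCATE() and SUBSTR(); the idea is that the length argument has to be the distance from the first digit to '/compare'.

```python
def locate(needle: str, haystack: str) -> int:
    """1-based position of needle in haystack, 0 if absent (like MySQL LOCATE)."""
    return haystack.find(needle) + 1


def substr(text: str, start: int, length: int) -> str:
    """1-based substring for a positive start (like MySQL SUBSTR(str, pos, len))."""
    if start < 1 or length < 1:
        return ""
    return text[start - 1:start - 1 + length]


url = "http://example.com/products/12/compare"

# 'products/' is 9 characters long, so the ID starts 9 characters after it.
start = locate("products/", url) + 9
# The ID ends right before '/compare'.
length = locate("/compare", url) - start

product_id = substr(url, start, length)
print(start, length, product_id)  # 29 2 12
```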
If you use MariaDB, you can use REGEXP_REPLACE to make the changes, like this:
UPDATE your_table
SET url = REGEXP_REPLACE(url,'[0-9]+',Product_id)
WHERE Product_id > 0;
sample
MariaDB [your_schema]> SELECT REGEXP_REPLACE('http://example.com/products/12/compare','[0-9]+','99');
+--------------------------------------------------------------------+
| REGEXP_REPLACE('http://example.com/products/12/compare','[0-9]+','99') |
+--------------------------------------------------------------------+
| http://example.com/products/99/compare |
+--------------------------------------------------------------------+
1 row in set (0.00 sec)
MariaDB [your_schema]>
I have a pretty odd idea, but it can work.
Look at this query:
SELECT
'http://example.com/products/12/compare' as url,
'http://example.com/products/' as check1,
'http://example.com/termsets/' as check2,
'http://example.com/products/12/compare' REGEXP 'http://example.com/products/' as regexp_check1, -- check 1
SUBSTRING('http://example.com/products/12/compare', LOCATE('http://example.com/products/','http://example.com/products/12/compare')+LENGTH('http://example.com/products/'),1 ) as test1,
SUBSTRING('http://example.com/products/12/compare', LOCATE('http://example.com/products/','http://example.com/products/12/compare')+LENGTH('http://example.com/products/'),1 ) REGEXP "^[0-9]+$" as test1_only_num,
SUBSTRING('http://example.com/products/12/compare', LOCATE('http://example.com/products/','http://example.com/products/12/compare')+LENGTH('http://example.com/products/'),2 ) as test11,
SUBSTRING('http://example.com/products/12/compare', LOCATE('http://example.com/products/','http://example.com/products/12/compare')+LENGTH('http://example.com/products/'),2 ) REGEXP "^[0-9]+$" as test11_only_num,
SUBSTRING('http://example.com/products/12/compare', LOCATE('http://example.com/products/','http://example.com/products/12/compare')+LENGTH('http://example.com/products/'),3 ) as test111,
SUBSTRING('http://example.com/products/12/compare', LOCATE('http://example.com/products/','http://example.com/products/12/compare')+LENGTH('http://example.com/products/'),3 ) REGEXP "^[0-9]+$" as test111_only_num;
Result :
+----------------------------------------+------------------------------+------------------------------+---------------+-------+----------------+--------+-----------------+---------+------------------+
| url | check1 | check2 | regexp_check1 | test1 | test1_only_num | test11 | test11_only_num | test111 | test111_only_num |
+----------------------------------------+------------------------------+------------------------------+---------------+-------+----------------+--------+-----------------+---------+------------------+
| http://example.com/products/12/compare | http://example.com/products/ | http://example.com/termsets/ | 1 | 1 | 1 | 12 | 1 | 12/ | 0 |
+----------------------------------------+------------------------------+------------------------------+---------------+-------+----------------+--------+-----------------+---------+------------------+
url, check1 and check2 are only there to display the values I'm using. This is just the main idea; the query is not usable as-is, of course.
Logic with check1
You check with a REGEXP whether check1 is present in your URL. If yes, regexp_check1 is 1, else it's 0.
ONLY if regexp_check1 is 1, you SUBSTRING your URL to take the part that is located AFTER the check1 string. You take the first character AFTER it (test1), then the first two characters (test11), then the first three characters (test111), and so on, up to the maximum length your ID_PRODUCT can have (6 or 7, for example).
You run a REGEXP on each substring you isolated to check whether it is numeric only (test1 is numeric only, test11 is numeric only, test111 is not numeric only).
Then you know that the content of test11 is your ID.
Then you do the same thing with check2 if regexp_check1 was 0, then with a possible check3 (which would contain http://www.comadso.dk/products/, for example), and so on for every prefix you can have.
Maybe my idea is a bad one, but hey, if it seems dumb but works, it's not dumb!
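The same logic can be sketched in a few lines of Python, which may help when checking the SQL against sample URLs. The prefix list and the function name extract_id are assumptions made for illustration; the fallback of 0 follows the question's "else assign 0".

```python
import re

# Hypothetical list of known URL prefixes (check1, check2, ...).
KNOWN_PREFIXES = [
    "http://example.com/products/",
    "http://example.com/termsets/",
]


def extract_id(url: str, prefixes=KNOWN_PREFIXES) -> int:
    """Return the run of digits that directly follows a known prefix, else 0."""
    for prefix in prefixes:
        pos = url.find(prefix)
        if pos == -1:
            continue  # this prefix is not in the URL, try the next one
        rest = url[pos + len(prefix):]
        match = re.match(r"[0-9]+", rest)  # longest leading numeric-only part
        if match:
            return int(match.group())
    return 0


print(extract_id("http://example.com/products/12/compare"))  # 12
```

Matching the longest leading run of digits replaces the step-by-step test1, test11, test111 probing, since a regular expression can find where the numeric part ends in one pass.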

How to find the length of a chinese phrase in a MySQL database with SQL?

For example, this is my table, which is called example:
--------------------------
| id | en_word | zh_word |
--------------------------
| 1 | Internet| 互联网 |
--------------------------
| 2 | Hello | 你好 |
--------------------------
and so on...
And I tried using this SQL Query:
SELECT * FROM `example` WHERE LENGTH(`zh_word`) = 3
For some reason, it doesn't return the three-character words, but instead returns a lot of single-character entries.
Why is this? Can this be fixed? I tried this out in phpMyAdmin.
But when I did it with JavaScript:
"互联网".length == 3; // true
And it seems to work fine. So how come it doesn't work in MySQL?
You should use CHAR_LENGTH instead of LENGTH.
LENGTH() returns the length of the string measured in bytes.
CHAR_LENGTH() returns the length of the string measured in characters.
LENGTH returns the length in bytes (and Chinese characters are multibyte).
Use CHAR_LENGTH to get the length in characters.
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_char-length
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_length
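The difference is easy to reproduce outside the database. The following Python sketch assumes the column is stored as UTF-8; the helper names byte_length and char_length are hypothetical and simply mirror what LENGTH() and CHAR_LENGTH() measure.

```python
def byte_length(text: str) -> int:
    """Number of bytes in the UTF-8 encoding (what LENGTH() measures)."""
    return len(text.encode("utf-8"))


def char_length(text: str) -> int:
    """Number of characters (what CHAR_LENGTH() measures)."""
    return len(text)


for word in ["互联网", "你好", "网", "Hello"]:
    print(word, char_length(word), byte_length(word))
```

Each of these Chinese characters takes three bytes in UTF-8, so a single character such as 网 has a byte length of 3, which is why the original query returned single-character rows.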

Mysql string check on equals is false for the same values

I have a problem with MySQL.
I have a table with information parsed from websites. A strange string interpretation appears:
the query
select id, address from pagesjaunes_test where address = substr(address,1,length(address)-1)
returns a set of values instead of none.
At the beginning I ran functions such as:
address = replace(address, '\n', '')
address = replace(address, '\t', '')
address = replace(address, '\r', '')
address = replace(address, '\r\n', '')
address = trim(address)
but the problem still persists.
The values of the 'address' field contain some French characters, but the query also returned values that contain only alphanumeric English characters.
Another test: I tried to check the length of the strings, and strlen() from PHP and LENGTH() from MySQL display different results! Sometimes the difference is 2 characters, sometimes 1, without any specific "rule".
Visually, I can't see any spaces, tabs or anything else.
After I modified an address manually (I deleted the whole string and typed it again), the problem was solved, but I have ~6000 values, so this is not a solution :)
What can the problem be?
I suppose the strings may contain something like an "empty char", but how do I detect and remove it?
Thanks
P.S.
The problem is not just the length. I need to join this table with another one, using a condition that checks whether the values of the 'address' fields are equal. Even though the fields have the same collation and the tables have the same collation, the query reports that no addresses match.
E.g.
For query:
SELECT p.address,char_length(p.address) , r.address, char_length(r.address)
FROM `pagesjaunes_test` p
LEFT JOIN restaurants r on p.name=r.name
WHERE
p.postal_code=r.postal_code
and p.address!=r.address
and p.phone=''
and p.cuisines=''
LIMIT 10
So: p.address!=r.address
The result is:
+-------------------------------------+------------------------+--------------------------+------------------------+
| address | char_length(p.address) | address | char_length(r.address) |
+-------------------------------------+------------------------+--------------------------+------------------------+
| Dupin Marc13 quai Grands Augustins | 34 | 13 quai Grands Augustins | 24 |
| 39 r Montpensier | 16 | 39 r Montpensier | 16 |
| 8 r Lord Byron | 14 | 3 r Balzac | 10 |
| 162 r Vaugirard | 15 | 162 r Vaugirard | 15 |
| 32 r Goutte d'Or | 16 | 32 r Goutte d'Or | 16 |
| 2 r Casimir Périer | 18 | 2 r Casimir Périer | 18 |
| 20 r Saussier Leroy | 19 | 20 r Saussier Leroy | 19 |
| Senes Douglas22 r Greneta | 25 | 22 r Greneta | 12 |
| Ngov Ly Mey44 r Tolbiac | 23 | 44 r Tolbiac | 12 |
| 33 r N-D de Nazareth | 20 | 33 r N-D de Nazareth | 20 |
+-------------------------------------+------------------------+--------------------------+------------------------+
As you can see, "162 r Vaugirard" and "20 r Saussier Leroy" contain only ASCII characters and have the same length, but aren't equal!
Maybe have a look at the encoding of the MySQL text fields. UTF-8 encodes every non-ASCII character with 2 to 4 bytes; only ASCII characters are encoded with a single byte.
MySQL is UTF-8 aware and counts characters correctly with CHAR_LENGTH() (LENGTH() counts bytes).
PHP's plain string functions such as strlen() aren't UTF-8 aware and count bytes.
So if PHP counts more than MySQL, this is probably the cause, and you could have a look at utf8_decode().
Best regards from Salzburg!
The official documentation says:
Returns the length of the string str, measured in bytes. A multi-byte character counts as multiple bytes. This means that for a string containing five two-byte characters, LENGTH() returns 10, whereas CHAR_LENGTH() returns 5.
So, use CHAR_LENGTH instead :)
select id, address from pagesjaunes_test
where address = substr(address, 1, char_length(address) - 1)
Finally, I found the problem. After I changed the collation to ascii_general_ci, all non-ASCII characters were transformed to "?". Some spaces were also replaced with "?". When I checked the initial values, MySQL's ORD() function returned 160 (instead of 32) for these spaces. So,
UPDATE pagesjaunes_test SET address = TRIM(REPLACE(REPLACE(address, CHAR(160), ' '), '  ', ' '))
resolved my problem.
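Character 160 is the non-breaking space, which looks identical to an ordinary space (character 32) on screen. As a rough illustration of how such characters can be spotted and cleaned up, here is a Python sketch; the function names suspicious_chars and normalize_spaces are hypothetical, and the sample string is constructed for the example rather than taken from the real table.

```python
import re


def suspicious_chars(text: str):
    """List (index, code point) for every control or non-ASCII character."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text)
            if ord(ch) < 32 or ord(ch) > 126]


def normalize_spaces(text: str) -> str:
    """Turn non-breaking spaces into plain spaces, collapse runs, and trim."""
    text = text.replace("\u00a0", " ")
    return re.sub(r" +", " ", text).strip()


dirty = "162\u00a0r Vaugirard"   # looks the same as the clean value when printed
clean = "162 r Vaugirard"

print(dirty == clean)                    # False
print(suspicious_chars(dirty))           # [(3, 'U+00A0')]
print(normalize_spaces(dirty) == clean)  # True
```

Both strings have the same number of characters, which matches the symptom in the question: equal CHAR_LENGTH values, yet the comparison fails.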

Searching for a partial match in a SQL database with PHP

I have a PHP file that searches a SQL database. It takes a string from a textbox and tries to match it against various attributes in the database. Here is the code that performs the search:
if ($filter['meta_info']) {
$search_string = $filter['meta_info'];
unset($filter['meta_info']);
$m_intSortField = null;
$m_strWhere .= (($m_strWhere) ? " AND " : "")."(MATCH (`courses`.`assigned_id`,`courses`.`title`,`courses`.`author`,`courses`.`keywords`,`courses`.`abstract`,`courses`.`objective`,`courses`.`summary`,`courses`.`copyright`,`courses`.`notes`) AGAINST ('".mysql_escape_string($search_string)."' IN BOOLEAN MODE))";
}
My problem is that I want it to return courses that have a partial match on the assigned ID, not just a complete match. Does anyone know how I could do this?
Turn off strict mode in your MySQL options, or use LIKE.
SELECT id,name from LESSONS where name LIKE "English%";
returns
| id | Name
| 2 | English Literature
| 8 | English Language

SQL string match numbers, varying length

I am looking up which exchange services which telephone numbers, from a table of fragmentary numbers that show which exchange services them.
So my table contains, for example:
id |exchcode |exchname |easting|northin|leadin |
-----------------------------------------------------------------
12122 |SNL/UC |SANDAL |43430 |41306 |1924240 |
12123 |SNL/UC |SANDAL |43430 |41306 |1924241 |
881 |SNL/UD |SANDAL |43430 |41306 |1924249 |
2456 |BD/BCC/1 |BRADFORD CABLE |41627 |43262 |192421 |
4313 |NEY/UB |NORMANTON |43847 |42289 |192422 |
12124 |SNL/UC |SANDAL |43430 |41306 |192425 |
9949 |OBE/UB |HORBURY OSSETT |42857 |41971 |192428 |
9987 |OBE/UB |WAKEFIELD |42857 |41971 |1924 |
(sorry, formatting a bit rubbish)
leadin is the leading part of the phone number I have to match (stored as a VARCHAR, not a number).
I am supplied with a phone number, 1924283777 (not real).
How do I query to get the best match from the above table (it should pick exchange id 9949)? Or do I deal with it in code (PHP) after I've done the query?
tl;dr: variable length for values of leadin column, want best match with a number longer than leadin.
I would think something like
WHERE ? LIKE concat(leadin, '%') order by length(leadin) desc limit 1
(I haven't checked the function names, and I'm not certain that this will work in MySQL; I'm pretty sure it will work in one of the SQL dialects I've used.)
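The idea behind that WHERE clause is a longest-prefix match: keep every row whose leadin is a prefix of the supplied number, then pick the longest one. Here is a minimal Python sketch of the same selection using the rows from the question; the function name best_exchange and the tuple layout are assumptions for illustration.

```python
# (id, exchname, leadin) taken from the table in the question.
ROWS = [
    (12122, "SANDAL", "1924240"),
    (12123, "SANDAL", "1924241"),
    (881, "SANDAL", "1924249"),
    (2456, "BRADFORD CABLE", "192421"),
    (4313, "NORMANTON", "192422"),
    (12124, "SANDAL", "192425"),
    (9949, "HORBURY OSSETT", "192428"),
    (9987, "WAKEFIELD", "1924"),
]


def best_exchange(number: str, rows=ROWS):
    """Return the row with the longest leadin that prefixes number, or None."""
    # Same filter as: WHERE ? LIKE CONCAT(leadin, '%')
    matches = [row for row in rows if number.startswith(row[2])]
    # Same ordering as: ORDER BY LENGTH(leadin) DESC LIMIT 1
    return max(matches, key=lambda row: len(row[2]), default=None)


print(best_exchange("1924283777"))  # (9949, 'HORBURY OSSETT', '192428')
```

Doing the selection in the query keeps the result to a single row, while doing it in application code, as sketched here, requires fetching the candidate rows first.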
