I have a set of text files containing personal data. I want to use a script to run through them and write them to a MySQL database.
What I have done so far is read the text file into a string, explode the string at " " and then run through the array to find the required information. There are a couple of problems with that, though, so maybe someone knows a better solution altogether.
The base text file looks like this:
Name: Marius
Address 1: Street
Address 2: Town
Address 3: Country
etc
The problem is that I have no way of knowing how long the 'value' of, e.g., Street is. It could be just 'Somestreet', or 'Some Street', or 'Some Street 1337'. This goes for pretty much every field.
Is there maybe a method to turn that text into an array, where for every item the string ending with ":" is used as the key and every following item without ending ":" is considered the value?
As Adam hints in the comments: If there is no tab, cr/lf or similar delimiter, there is not enough information in your string to parse the data by.
If there is only a limited set of keys, you can provide this set as additional information.
Since you want to insert the data into a mysql table, I assume that the keys are limited and unique.
With an array of these key strings, your parser algorithm would work this way:
1 explode the string at ":"
2 loop through the resulting array, where you should find a structure like this:
[some or no value][space or other delimiter][known key]
2.1 remember the key from the previous loop (empty as default)
2.2 since the keys should be unique, loop through the array of known keys
2.2.1 if a key matches the current string structure from the right, you can
strip this key off the right side of the string, trim the string and you
have the current value
2.2.2 if you found a value, put both into the result array:
$arr[previous key] = [current value]
This should do the trick.
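The steps above can be sketched roughly as follows. This is only an illustration in Python rather than PHP; the function name parse_record and the key list are made up for the example, and it assumes values never contain a ":" themselves.

```python
def parse_record(text, known_keys):
    """Split text at ':' and peel known keys off the right end of each chunk."""
    parts = text.split(":")
    # Try longer keys first so a short key cannot shadow a longer one.
    keys_by_length = sorted(known_keys, key=len, reverse=True)
    result = {}
    prev_key = None
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        chunk = part.strip()
        value = chunk
        next_key = None
        if not is_last:
            for key in keys_by_length:
                if chunk.endswith(key):
                    next_key = key
                    value = chunk[: len(chunk) - len(key)].strip()
                    break
        # The value found in this chunk belongs to the key from the previous chunk.
        if prev_key is not None and (next_key is not None or is_last):
            result[prev_key] = value
        if next_key is not None:
            prev_key = next_key
    return result


record = parse_record(
    "Name: Marius Address 1: Some Street 1337 Address 2: Town Address 3: Country",
    ["Name", "Address 1", "Address 2", "Address 3"],
)
```

Here record maps each known key to its trimmed value, regardless of how many words the value has.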
I'm using implode to insert a few values into one row in a MySQL database.
implode(' ', $_POST['tag']);
Assume I have a table named product with a column named tags, and three different values were inserted into it like this:
usb adapter charger
I have tried using the LIKE operator (%), but that didn't work.
$sql = "SELECT * FROM product WHERE tags='%usb%'";
How can I extract only one value from the imploded array using WHERE in a MySQL query?
I agree with the comments about redesigning the database. At first read it seems that LIKE would definitely get the result you want, but after reading @Patrick Q's pan/panther example, it makes a lot of sense that LIKE is not really a good solution. There are ways to get exactly the tag string you're looking for, but they may hurt performance and the query will be longer and more complex. The following demonstrates how the query would look with your current tags data:
MySQL query:
SELECT tags,
SUBSTRING_INDEX(SUBSTRING_INDEX(tags,' ',FIND_IN_SET('usb',REPLACE(tags,' ',','))),' ',-1) v
FROM mytable
HAVING v = 'usb';
As you can see, several functions are needed just to get the exact string from the data cell. Since your example data is separated by spaces and FIND_IN_SET expects comma-separated values, REPLACE is applied to the tags column first to replace spaces with commas. SUBSTRING_INDEX is then applied twice to extract the string at the position returned by FIND_IN_SET. Finally, HAVING keeps only the rows whose extracted tag is the one you're looking for.
Further demo here : https://www.db-fiddle.com/f/joDa7MNcQL2RakTgBa7qBM/3
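For comparison, here is a shorter way to get a whole-tag match that is not from the answer above: pad the column and the search term with the delimiter so that "pan" cannot match inside "panther". The sketch uses Python's sqlite3 as a stand-in for MySQL, and the sample rows are invented; in MySQL the padded expression would be CONCAT(' ', tags, ' ') LIKE '% usb %'. It assumes tags never contain % or _.

```python
import sqlite3


def rows_with_tag(conn, tag):
    # Surround both the column and the tag with spaces so only whole tags match.
    sql = "SELECT tags FROM product WHERE ' ' || tags || ' ' LIKE ?"
    return [row[0] for row in conn.execute(sql, (f"% {tag} %",))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (tags TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?)",
    [("usb adapter charger",), ("pan lid",), ("panther toy",)],
)

usb_rows = rows_with_tag(conn, "usb")
pan_rows = rows_with_tag(conn, "pan")
```

Like the longer query, this cannot use an index on tags, so the redesign suggested in the comments is still the better fix.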
I have a question about searching in MySQL. I couldn't find an answer for a long time.
I use Symfony 3.1 and have the following situation:
I have a column site_languages (longtext, (DC2Type:simple_array)) in SQL with values such as "0,1,7", "11,15,27", etc.
So my question is:
How can I select rows whose site_languages array includes a search value?
I tried LIKE and checked REGEXP, but if, for example, I search for the value "1", it also returns rows containing 11, 51, 111, etc. I was thinking about storing values like "[1],[15]", but I think there must be an easier and more correct solution?
If you'd like to solve this with a regex, you can use (^|,)1(,|$). It matches a 1 that is preceded by either the beginning of the string or a comma, and followed by either a comma or the end of the string.
You should consider normalizing your DB table. If you insist on keeping it this way, you could save it as JSON, then retrieve all rows and iterate over them; that is a bad idea in general, but acceptable for a short table.
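To see how the (^|,)1(,|$) pattern behaves on values like the ones in the question, here is a small check in Python. The helper name contains_value is made up for the example; the same pattern string is what you would pass to REGEXP.

```python
import re


def contains_value(csv_value, wanted):
    # Build (^|,)<wanted>(,|$) so the value must be a whole list item.
    pattern = r"(^|,)" + re.escape(wanted) + r"(,|$)"
    return re.search(pattern, csv_value) is not None


hit = contains_value("0,1,7", "1")
miss = contains_value("11,51,111", "1")
```

hit is true because 1 is a complete item, while miss is false because 1 only occurs inside longer numbers.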
I want to store a large amount (~thousands) of strings and be able to perform matches using wildcards.
For example, here is a sample content:
Folder1
Folder1/Folder2
Folder1/*
Folder1/Folder2/Folder3
Folder2/Folder*
*/Folder4
*/Fo*4
(each line has additional data too, like tags, but the matching is only against that key)
Here is an example of what I would like to match against the data:
Folder1
Folder1/Folder2/Folder3
Folder3
(* being a wildcard here, it can be a different character)
I naively considered storing it in a MySQL table and using % wildcards with the LIKE operator, but MySQL indexes only work for characters to the left of the wildcard, and in my case it can be anywhere (e.g. %/Folder3).
So I'm looking for a fast solution, that could be used from PHP. And I am open: it can be a separate server, a PHP library using files with regex, ...
Have you considered using MySQL's regular expression engine? Try something like this:
SELECT *
FROM your_table
WHERE your_query_string REGEXP pattern_column
This will return rows with regex keys that your query string matches. I expect it will perform better than running a query to pull all of the data and doing the matching in PHP.
More info here: http://dev.mysql.com/doc/refman/5.1/en/regexp.html
You might want to use a multicore approach to run that search in a fraction of the time. For search and matching I would recommend FPGAs, but that is probably the hardest way to do it. Consider this article on using CUDA, which reports searches about 16x faster than usual. On multicore CPU systems you can use POSIX, or a cluster of computers (MPI, for example), to do the job, or you can call a Gearman service to run the searches using advanced algorithms.
Were it me, I'd store the key field twice: once forward and once reversed (see MySQL's REVERSE function). You can then search the index with left(main_field) and left(reverse_field). It won't help when you have a wildcard both in the middle of the string and at the beginning (e.g. "*Folder1*Folder2"), but it will when you have a wildcard at the beginning or the end.
E.g. if you want to search */Folder1, then search where left(reverse_field, 8) = '1redloF/';
For Folder1/*/FolderX, search where left(reverse_field, 8) = 'XredloF/' and left(main_field, 8) = 'Folder1/'
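A rough in-memory illustration of the forward-plus-reversed idea, in Python rather than SQL. The names build_index and search are made up, and the length check is an addition that is not in the SQL above: it stops the prefix and suffix from overlapping on a short key.

```python
def build_index(keys):
    # Store every key twice: as written and reversed.
    return [(key, key[::-1]) for key in keys]


def search(index, prefix="", suffix=""):
    """Match keys of the form <prefix>*<suffix> using two prefix comparisons."""
    reversed_suffix = suffix[::-1]
    min_length = len(prefix) + len(suffix)
    return [
        main
        for main, rev in index
        if len(main) >= min_length
        and main.startswith(prefix)
        and rev.startswith(reversed_suffix)
    ]


index = build_index(
    ["Folder1", "Folder2/Folder1", "Folder1/a/FolderX", "Folder1/FolderX", "Folder3"]
)
ends_with_folder1 = search(index, suffix="/Folder1")
between = search(index, prefix="Folder1/", suffix="/FolderX")
```

In a database both comparisons are prefix lookups, which is the case a B-tree index handles well.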
If your strings represent some kind of hierarchical structure (as it looks like in your sample content) rather than "real" files, and since you say you are open to alternative solutions, why not consider something like a file-based index?
Choose a new directory like myindex
Create an empty file for each entry using the string key as location & file name in myindex
Now you can find matches using glob - thanks to the hierarchical file structure a glob search should be much faster than searching up all your database entries.
If needed you can match the results to your MySQL data - thanks to your MySQL index on the key this action will be very fast.
But don't forget to update the myindex structure on INSERT, UPDATE or DELETE in your MySQL database.
This solution will only compete on a huge data set (but not too huge, as @Kyle mentioned) with a hierarchy that is deep rather than wide.
EDIT
Sorry, this would only work if the wildcards are in your search terms, not in the stored strings themselves.
As the wildcards (*) are in your data and not in your queries I think you should start with breaking up your data into pieces. You should create an index-table having columns like:
dataGroup INT(11),
exactString varchar(100),
wildcardEnd varchar(100),
wildcardStart varchar(100),
If you have a value like "Folder1/Folder2" store it in "exactString" and assign the ID of the value in the main data table to "dataGroup" in the above index table.
If you have a value like "Folder1/*" store a value of "Folder1/" to "wildcardEnd" and again assign the id of the value in the main table to the "dataGroup" field in above Table.
You can then do a match within your query using:
indexTable.wildcardEnd = LEFT('Folder1/WhatAmILookingFor/Data', LENGTH(indexTable.wildcardEnd))
This will truncate the search string ('Folder1/WhatAmILookingFor/Data') to "Folder1/" and then match it against the wildcardEnd field. I assume mysql is clever enough not to do the truncate for every row but to start with the first character and match it against every row (using B-Tree indexes).
A value like "*/Folder4" will go into the field "wildcardStart", but reversed. To cite Missy Elliott: "Is it worth it, let me work it / I put my thing down, flip it and reverse it" (http://www.youtube.com/watch?v=Ke1MoSkanS4). So store a value of "4redloF/" in "wildcardStart". Then a WHERE like the following will match rows:
indexTable.wildcardStart = LEFT(REVERSE('Folder1/WhatAmILookingFor/Folder4'), LENGTH(indexTable.wildcardStart))
Of course, you could do the "REVERSE" already in your application logic.
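The two WHERE conditions can be mimicked in Python to check the logic. This is only a sketch: the rows below are invented, each row is assumed to fill exactly one of the three columns, and the multi-record case for patterns with two wildcards is not handled.

```python
INDEX_ROWS = [
    {"dataGroup": 1, "exactString": "Folder1/Folder2"},
    {"dataGroup": 2, "wildcardEnd": "Folder1/"},    # stored pattern "Folder1/*"
    {"dataGroup": 3, "wildcardStart": "4redloF/"},  # stored pattern "*/Folder4", reversed
]


def matching_groups(query, rows):
    reversed_query = query[::-1]  # the REVERSE done in application logic
    hits = []
    for row in rows:
        if "exactString" in row:
            matched = query == row["exactString"]
        elif "wildcardEnd" in row:
            # Same idea as wildcardEnd = LEFT(query, LENGTH(wildcardEnd))
            stored = row["wildcardEnd"]
            matched = query[: len(stored)] == stored
        else:
            # Same idea as wildcardStart = LEFT(REVERSE(query), LENGTH(wildcardStart))
            stored = row["wildcardStart"]
            matched = reversed_query[: len(stored)] == stored
        if matched:
            hits.append(row["dataGroup"])
    return hits


groups = matching_groups("Folder1/WhatAmILookingFor/Folder4", INDEX_ROWS)
```

For that query, groups contains data groups 2 and 3: one through the leading "Folder1/" and one through the trailing "/Folder4".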
Now about the tricky part. Something like "*/Fo*4" should get split up into two records:
# Record 1
dataGroup ==> id of "*/Fo*4" in data table
wildcardStart ==> oF/
wildcardEnd ==> /Fo
# Record 2
dataGroup ==> id of "*/Fo*4" in data table
wildcardStart ==> 4
Now if you match something you have to take care that every index-record of a dataGroup gets returned for a complete match and that no overlapping occurs. This could also get solved in SQL but is beyond this question.
A database isn't the right tool for these kinds of searches. You can still use a database (any database and any structure) to store the strings, but you have to write the code to do all the searches in memory. Load all the strings from the database (a few thousand strings is really no big deal), cache them and run your search/match algorithm on them.
You will probably have to code the algorithm yourself, because the standard tools will be overkill for what you are trying to achieve and there is no guarantee that they will be able to do exactly what you need.
I would build a regex representation of your wildcard-based strings and run those regexes on your input. You will probably have to do some work until you get the regex right, but it will be the fastest way to go.
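A minimal sketch of that regex idea, in Python rather than PHP. It assumes that * may match any run of characters, including "/", which the question does not spell out; the function names are made up for the example.

```python
import re


def wildcard_to_regex(pattern, wildcard="*"):
    # Escape the literal pieces and join them with ".*" where the wildcard stood.
    literal_parts = [re.escape(part) for part in pattern.split(wildcard)]
    return re.compile(".*".join(literal_parts))


def matching_patterns(query, patterns):
    # fullmatch: the stored pattern has to cover the whole query string.
    return [p for p in patterns if wildcard_to_regex(p).fullmatch(query)]


STORED = [
    "Folder1",
    "Folder1/Folder2",
    "Folder1/*",
    "Folder1/Folder2/Folder3",
    "Folder2/Folder*",
    "*/Folder4",
    "*/Fo*4",
]

deep = matching_patterns("Folder1/Folder2/Folder3", STORED)
none = matching_patterns("Folder3", STORED)
```

With a few thousand stored patterns you would compile each regex once and cache it instead of recompiling per query.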
I suggest reading the keys and their associated payload into a binary tree ordered alphanumerically by key. If your keys are not terribly "clumped", you can avoid the (slight additional) overhead of building a balanced tree. You can also avoid any tree maintenance code since, if I understand your problem correctly, the data will be changing frequently and it would be simplest to rebuild the tree rather than add/remove/update nodes in place. The overhead of reading into the tree is similar to performing an initial sort, and tree traversal to search for your value is straightforward and much more efficient than just running a regex against a bunch of strings. You may even find while working it through that the wildcards in the tree lead to some shortcuts for pruning the search space. A quick search shows lots of resources and PHP snippets to get you started.
If you run SELECT folder_col, count(*) FROM your_sample_table group by folder_col do you get duplicate folder_col values (ie count(*) greater than 1)?
If not, that means you can produce an SQL that would generate a valid sphinx index (see http://sphinxsearch.com/).
I wouldn't recommend doing text search on a large collection of data in MySQL. You need a database to store the data, but that would be it. For searching, use a search engine like:
Solr (http://lucene.apache.org/solr/)
Elastic Search (http://www.elasticsearch.org/)
Sphinx (http://sphinxsearch.com/)
Those services will allow you to do all sorts of funky text searches (including wildcards) in the blink of an eye ;-)
I need to sort through a column in my database. This column is my category structure; the data in it is city names, but the names are not written the same way for each city. I may have 20-40 values that refer to the same city but are written differently, and I need a script that can interpret them and change them to a single value.
So I may have two values in the city column, say (england > london) and (westlondon), but I need to change both to just london. Is there a script out there that can interpret the existing values and change them to the value I want? I know the difficult way of doing this one by one, but wondered if there was a script in any language that could do it.
I've done this sort of data clean-up plenty of times and I'm afraid I don't know of anything easier than just writing your own fixes.
One thing I can recommend is making the process repeatable. Have a replacement table with something like (rulenum, pattern, new_value). Then, work on a copy of the relevant bits of your table so you can just re-run the whole script.
Then you can start with the obvious matches (just see what looks plausible) and move to more obscure ones. Eventually you'll have 50 without matches and you can just manually patch those entries.
Making it repeatable is important because you'll be bound to find mis-matches in your first few attempts.
So, something like (syntax untested):
CREATE TABLE matches (rule_num int PRIMARY KEY, pattern text, new_value text);
CREATE TABLE cityfix AS
SELECT id, city AS old_city, '' AS new_city, 0 AS match_num FROM locations;
-- UPDATE ... FROM is SQL Server-style syntax; in MySQL the equivalent form is
-- UPDATE cityfix c JOIN matches m ON ... SET ... WHERE ...
UPDATE c SET c.new_city = m.new_value, c.match_num = m.rule_num
FROM cityfix AS c JOIN matches m ON c.old_city LIKE m.pattern
WHERE c.match_num = 0;
-- Review results, add new patterns to matches, repeat UPDATE
-- If you need to, you can drop table cityfix and recreate it.
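The same repeatable loop can be tried outside the database first. Below is a rough Python sketch under a few assumptions: rules are (rule_num, pattern, new_value) tuples with LIKE-style patterns, matching is case-insensitive, and the lowest rule number wins. The names like_to_regex and apply_rules and the sample data are made up.

```python
import re


def like_to_regex(pattern):
    # Translate LIKE wildcards: % is any run of characters, _ is one character.
    pieces = []
    for ch in pattern:
        if ch == "%":
            pieces.append(".*")
        elif ch == "_":
            pieces.append(".")
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces), re.IGNORECASE | re.DOTALL)


def apply_rules(cities, rules):
    compiled = [(num, like_to_regex(pat), new) for num, pat, new in sorted(rules)]
    fixed, unmatched = {}, []
    for city in cities:
        for num, regex, new_value in compiled:
            if regex.fullmatch(city):
                fixed[city] = (new_value, num)  # remember which rule fired
                break
        else:
            unmatched.append(city)  # left for a new rule or a manual patch
    return fixed, unmatched


rules = [(1, "%london%", "london"), (2, "%mancheste%", "manchester")]
fixed, unmatched = apply_rules(
    ["england > london", "westlondon", "Manchester", "leeds"], rules
)
```

Rerunning apply_rules after adding a rule plays the same role as repeating the UPDATE, and unmatched is the list to review.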
Just an idea: 16K is not that much. First use Perl's DBI (I'm assuming you are going to use Perl) to fetch that city column and store it in a hash (city name as the key). Then find an algorithm that suits your needs (performance-wise) to iterate over the hash keys, and use String::Diff to find matching intersections (read about it, it can definitely help you out), storing the result as the value. You can then use the hash to update the database, with the key as the old value and the value as the new value.
I'm looking for a MySQL equivalent of what str_replace is for PHP. I want to replace all instances of one word with another, I want to run a query that will replace all "apples" with "oranges".
The reason why:
UPDATE fruits SET name='oranges' WHERE name='apples';
isn't going to work for my situation is that I often have multiple words in a table row, separated by commas, like "apples, pears, pineapples". In this case I want just apples to be replaced by oranges, and pears and pineapples to stay intact.
Is there any way to do this?
You have a database design problem, as Ignacio has pointed out. Instead of packing separate pieces of information into a single column, that column should become a separate table with one piece of information per row. For instance, if that "fruits" field is in a table called "hats", you would have one table "hats" with a column "hat_id" but no information about fruits, and a second table "hat_fruits" with two columns, "hat_id" and "fruit_name". In your example, the given hat would have three rows in "hat_fruits", one for each fruit.
Once you implement this design (if you have control of the database design) you can go back to using the simple UPDATE command you originally had. In addition, you will be able to index by fruit type, search more easily, use less disk space, validate fruit names, and not have any arbitrary limit on the number of fruits that fit into the database.
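Here is what that two-table layout might look like, sketched with Python's sqlite3 as a stand-in for MySQL. The column types and the composite primary key are assumptions added for the example; only the table and column names come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hats (hat_id INTEGER PRIMARY KEY);
    CREATE TABLE hat_fruits (
        hat_id     INTEGER NOT NULL REFERENCES hats(hat_id),
        fruit_name TEXT    NOT NULL,
        PRIMARY KEY (hat_id, fruit_name)
    );
    INSERT INTO hats VALUES (1);
    INSERT INTO hat_fruits VALUES (1, 'apples'), (1, 'pears'), (1, 'pineapples');
    """
)

# With one fruit per row, the plain UPDATE touches only exact matches.
conn.execute("UPDATE hat_fruits SET fruit_name = 'oranges' WHERE fruit_name = 'apples'")

fruits = [
    row[0]
    for row in conn.execute(
        "SELECT fruit_name FROM hat_fruits WHERE hat_id = 1 ORDER BY fruit_name"
    )
]
```

After the update, the hat's fruits are oranges, pears and pineapples, with pineapples untouched.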
That said, if you absolutely cannot fix the database structure, you might try something like this:
REPLACE(REPLACE(CONCAT(',', fruits, ','), ', ', ','), ',apples,', ',oranges,')
This monstrosity first wraps the fruits field in leading and trailing commas, then removes any spaces after commas. This should give you a string in which fruit names are unambiguously delimited by commas. Finally, it replaces ,apples, (note the delimiters) with ,oranges,.
After that, of course, you ought to strip off the beginning and ending commas and put back the spaces after the commas (that's left as an exercise for the reader).
Update: Okay, I couldn't resist looking it up:
REPLACE(TRIM(',' FROM REPLACE(REPLACE(CONCAT(',', fruits, ','), ', ', ','), ',apples,', ',oranges,')), ',', ', ')
Note that this isn't tested and I'm not a MySQL expert anyway; I don't know if MySQL has function nesting issues or anything like that.
PS: Don't tell anyone I was the one who showed you this!
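If the replacement can be done in application code instead of SQL, the same delimiter-aware idea is much shorter. A sketch in Python; replace_item is a made-up name, and it assumes items are separated by commas with optional surrounding spaces.

```python
def replace_item(csv_value, old, new):
    # Compare whole items only, so "pineapples" never matches "apples".
    items = [item.strip() for item in csv_value.split(",")]
    return ", ".join(new if item == old else item for item in items)


updated = replace_item("apples, pears, pineapples", "apples", "oranges")
```

updated keeps pears and pineapples as they were and swaps only the standalone apples item.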
Not reliably. There is REPLACE(), but that will only work until you decide to add pineapples to your menu.
Putting your database in First Normal Form and then using UPDATE is the only reliable solution.
I think you want to use REPLACE():
REPLACE(str,from_str,to_str)
Returns the string str with all occurrences of the string from_str replaced by the string to_str. REPLACE() performs a case-sensitive match when searching for from_str.
The statement below will replace all occurrences of 'apples' with 'oranges' in the 'Name' column for all rows in the 'fruits' table.
UPDATE fruits SET Name=REPLACE(Name,'apples','oranges')
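One caveat with a plain substring replace, shown here with Python's str.replace, which replaces every occurrence just like REPLACE() does: it also rewrites matches inside longer words. The helper name naive_replace is made up for the example.

```python
def naive_replace(value, old, new):
    # Plain substring replacement: no notion of word or item boundaries.
    return value.replace(old, new)


result = naive_replace("apples, pears, pineapples", "apples", "oranges")
```

Here result ends in "pineoranges", which is the pineapples problem mentioned in the other answers.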