I'm setting up to gather long-term statistics. They will be recorded in small blocks that I'm planning to store together in one TEXT field, latest first, something like this:
[date:03.01.2016,data][date:02.01.2016,data][date:01.01.2016,data]...
It will be more frequent than that (this is just a sample), but it should remain small enough to keep recording for decades, yet big enough to make me want to optimize it.
I'm looking for 2 things:
Can you append to the front of a field in mysql?
Can you read the field partially, just the first 100 characters for example?
The blocks will be fixed length so I can accurately estimate how many characters I need to download to display statistics for X time period.
The answer to your two questions is "yes":
update t
set field = concat($newval, field)
where id = $id;
And:
select left(field, 100)
from t
where id = $id;
(These assume that you have multiple rows in the table.)
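A rough sketch of the arithmetic behind the partial read, written in PHP for illustration. Because every block has the same length, the number of characters to request is simply blocks times block length. The block layout, the 4-character payload, and the helper name charsForBlocks are assumptions, and the data is assumed to be plain ASCII. Here substr() stands in for MySQL's LEFT(field, n) and string concatenation stands in for CONCAT($newval, field).

```php
<?php
// Hypothetical helper: characters needed to cover the newest $blocks blocks,
// given that every block has the same fixed length.
function charsForBlocks(int $blocks, int $blockLength): int
{
    return $blocks * $blockLength;
}

// Assumed layout: "[date:DD.MM.YYYY,xxxx]" with a 4-character payload.
$blockLength = strlen('[date:03.01.2016,aaaa]');

// Field contents, latest block first.
$field = '[date:02.01.2016,bbbb][date:01.01.2016,cccc]';

// Prepending a block mirrors: SET field = CONCAT($newval, field)
$field = '[date:03.01.2016,aaaa]' . $field;

// Reading the newest two blocks mirrors: SELECT LEFT(field, n)
$newestTwo = substr($field, 0, charsForBlocks(2, $blockLength));

// Cut the fixed-length blocks apart again.
$blocks = str_split($newestTwo, $blockLength);
```

The same number returned by charsForBlocks() would be passed as the second argument of LEFT() in the query above.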
That said, your method of storing the data is absolutely not the right thing to do in a relational database.
Presumably, you want a table that looks something like this:
create table t (
tId int auto_increment primary key,
creationDate date,
data <something>
);
(This may be more complicated if data should be multiple columns.)
Then you insert into the table:
insert into t(creationDate, data)
select $date, $data;
And you can fetch the most recent row:
select t.*
from t
order by tId desc
limit 1;
All of these are just examples, because your question doesn't give a complete picture of the data.
I'm making a car part system, to store all the parts inside mysql and then search for them.
Part adding goes like this:
you select up to 280 parts and add all the car info, then all the parts are serialized and put into mysql along with all the car info in a single row.
(for this example I'll say that my current database has 1000 cars and all of those cars have 280 parts selected)
The problem is that when I have 1000 cars, each with 280 parts, PHP and MySQL start getting slow and take a long time to load the data, because the number of parts is 1000*280=280 000.
I use foreach on all of the cars and then put each part into another array.
The final array has 280 000 items, and then I filter it by the parts selected in the search, so out of 280 000 parts it may only have to print about 12 500 parts (if someone is searching for 50 different parts at the same time and 250 cars have those parts).
Example database: http://pastebin.com/aXrpgeBP
$q=mysql_query("SELECT `id`,`brand`,`model`,`specification`,`year`,`fueltype`,`capacity`,`parts`,`parts_num` FROM `warehouse`");
while($r=mysql_fetch_assoc($q)){
$partai=unserialize($r['parts']);
unset($r['parts']); //unsetting unserialized parts so the whole car parts won't be passed into the final parts-only array
foreach($partai as $part){
$r['part']=$parttree[$part]; //$parttree is an array with all the part names and $part is the part id, so this returns the part name by its id.
$r['part_id']=$part; // saves the part id for later filtering selected by the search
$final[]=$r;
}
}
$selectedparts=explode('|', substr($_GET['selected'], 0,strlen($_GET['selected'])-1)); //exploding selected part ids from data sent by jquery into an array
foreach($final as $f){
if(in_array($f['part_id'], $selectedparts)){
$show[]=$f; //filtering only the parts that need to be shown
}
}
echo json_encode($show);
This is the code I use to put all the cars' parts into arrays and then send them as JSON to the browser.
I'm not working on the pagination at the moment, but I'll be adding it later to show only 10 parts.
Could a solution be to index all the parts into a different table once every 24 hours (because new parts will be added daily) and then stress MySQL more than PHP? PHP is doing all the hard work now.
Or to use something like memcached to store the final unfiltered array once every 24 hours and then just filter the parts that need to be shown with PHP?
These are the options I considered, but I know there must be a better way to solve this.
Yes, you should definitely put more emphasis on MySQL. Don't serialize the parts for each car into a single row of a single column. That's terribly inefficient.
Instead, make yourself a parts table, with columns for the various data items that describe each part.
part_id: an autoincrement ID
car_id: which car this is a part of
partnumber: the part's external part number (barcode number?)
etc.
Then, use JOIN operations.
Also, why don't you use a WHERE clause in your SELECT statement, to retrieve just the car you want?
Edit
If you're looking for a part, you definitely want a separate parts table. Then you can do a SQL search something like this.
SELECT w.id, w.model, w.specification, w.year, w.fueltype,
p.partnumber
FROM warehouse w
JOIN parts p ON (w.id = p.car_id)
WHERE p.partnumber = 'whatever-part-number-you-want'
This will take milliseconds, even if you have 100K cars in your system, if you index it right.
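When the search sends several part ids at once, the same JOIN can take an IN list. The following is only a sketch: it assumes the ids arrive in the "12|57|103|" format from the question, that they are stored in the partnumber column of the parts table described above, and that the query is later run through a PDO prepared statement. The function names are made up for illustration.

```php
<?php
// Turn a "12|57|103|" string into a list of integer ids, skipping empty pieces.
function parseSelectedParts(string $selected): array
{
    $pieces = array_filter(explode('|', $selected), function ($piece) {
        return $piece !== '';
    });
    return array_values(array_map('intval', $pieces));
}

// Build the SQL with one "?" placeholder per id, plus the values to bind.
// Table and column names follow the answer above and are assumptions.
function buildPartSearch(array $ids): array
{
    if ($ids === []) {
        return ['', []]; // nothing selected, nothing to query
    }
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    $sql = "SELECT w.id, w.model, w.specification, w.year, w.fueltype, p.partnumber
            FROM warehouse w
            JOIN parts p ON (w.id = p.car_id)
            WHERE p.partnumber IN ($placeholders)";
    return [$sql, $ids];
}

[$sql, $params] = buildPartSearch(parseSelectedParts('12|57|103|'));
// With PDO this would run as:
// $stmt = $pdo->prepare($sql); $stmt->execute($params);
```

Binding the values instead of concatenating them into the SQL string keeps user input out of the query text.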
Your query should be something like:
<?php
// Explode the selected part ids sent by jQuery (the string ends with a trailing '|') and cast each one to int
$selectedparts = array_map('intval', explode('|', rtrim($_GET['selected'], '|')));
$where = ' id < 0 '; // always false; it only seeds the OR chain below
foreach ($selectedparts as $a) {
    // Caveat: this is a substring match on the serialized data, so id 1 also matches 12
    $where .= " OR `parts` LIKE '%" . $a . "%'";
}
$query = "SELECT * FROM `warehouse` WHERE " . $where . " ORDER BY `id` ASC"; // this is your query
//.... rest of your code
?>
Yes, look into has-many relationships: a car has many parts.
http://net.tutsplus.com/tutorials/databases/sql-for-beginners-part-3-database-relationships/
Then you can use an inner join to get the specified parts. You can do a where clause to match the specific partIds to filter out unwanted parts or cars.
I'm trying to get data out of two tables with an inner join. I can select the data and use a foreach loop to print it out, but I get multiple sets of the same data.
This is my SQL statement
SELECT workout.*, exercise.ExerciseName, exercise.Sets, exercise.Reps, exercise.Weight
FROM workout
INNER JOIN exercise
ON workout.WorkoutID = exercise.WORKOUTID
WHERE workout.WorkoutID = 1
It brings back WorkoutID, UserID, WorkoutName, & Description three times despite it being the same information. I assume this is because ExerciseName, Sets, Reps, & Weight are different for each. This is problematic when I loop through the data to echo it out as it prints out the data 3 times, once for each different exercise.
How do I get WorkoutID, UserID, WorkoutName, & Description once and continue to get the different ExerciseName, Sets, Reps, & Weights. If that is even possible.
Thanks.
How do you want to see WorkoutID, UserID, WorkoutName, & Description only one time if your user is linked to 3 exercises with 3 different weights? (Correct me if I misread the question.)
If the weights are the same and you want to see the information once, you can add the following keyword:
select DISTINCT [...]
To sum up my comments as an answer:
When you have multiple rows that contain information about a single entity like the detail of a workout in this case, there is no grouping possible to remove the repetition of the workout id and user id which will always be the same. What you want to do is iterate on the result for each id.
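Iterating on the result could look roughly like this in PHP. The sample rows only imitate the shape of the JOIN result (the values are invented), and groupByWorkout is a hypothetical helper name. The workout columns are stored once per WorkoutID and each row contributes one exercise.

```php
<?php
// Sample rows shaped like the JOIN result; the values are invented.
$rows = [
    ['WorkoutID' => 1, 'UserID' => 7, 'WorkoutName' => 'Legs', 'Description' => 'Lower body',
     'ExerciseName' => 'Squat', 'Sets' => 3, 'Reps' => 10, 'Weight' => 60],
    ['WorkoutID' => 1, 'UserID' => 7, 'WorkoutName' => 'Legs', 'Description' => 'Lower body',
     'ExerciseName' => 'Lunge', 'Sets' => 3, 'Reps' => 12, 'Weight' => 20],
    ['WorkoutID' => 1, 'UserID' => 7, 'WorkoutName' => 'Legs', 'Description' => 'Lower body',
     'ExerciseName' => 'Leg press', 'Sets' => 4, 'Reps' => 8, 'Weight' => 100],
];

// Collect the repeated workout columns once and nest the exercises under them.
function groupByWorkout(array $rows): array
{
    $workouts = [];
    foreach ($rows as $row) {
        $id = $row['WorkoutID'];
        if (!isset($workouts[$id])) {
            $workouts[$id] = [
                'WorkoutID'   => $id,
                'UserID'      => $row['UserID'],
                'WorkoutName' => $row['WorkoutName'],
                'Description' => $row['Description'],
                'exercises'   => [],
            ];
        }
        $workouts[$id]['exercises'][] = [
            'ExerciseName' => $row['ExerciseName'],
            'Sets'         => $row['Sets'],
            'Reps'         => $row['Reps'],
            'Weight'       => $row['Weight'],
        ];
    }
    return $workouts;
}

$workouts = groupByWorkout($rows);

// Print the workout header once, then one line per exercise.
foreach ($workouts as $workout) {
    echo $workout['WorkoutName'] . ': ' . $workout['Description'] . "\n";
    foreach ($workout['exercises'] as $exercise) {
        echo '  ' . $exercise['ExerciseName'] . ' ' . $exercise['Sets'] . 'x' . $exercise['Reps']
            . ' @ ' . $exercise['Weight'] . "\n";
    }
}
```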
I have an application with images stored in multiple categories, currently being stored by category ID in a column as a space separated list (eg. 1 5 23 2).
I have a query from a search filter, which is currently an array of IDs, (eg. 1 5).
Ideally, I'd find a solution using something like WHERE IN that would see if any of my array values exist in the stored column, although I don't see an easy solution.
At the moment I have to query all the images, bring them into PHP and check there, using "array_intersect". I see this as being a problem if I have 100,000s of images in the future to pull and then check.
Can anyone think of an elegant solution? The application is still in development, so I could arguably change the structure of my tables.
I think adding a map table would probably be best here which maps the image_id with the category_id.
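As a sketch of how the existing data might be moved into such a map table, the snippet below splits one space-separated list into (image_id, category_id) pairs, each of which would become one row. The column names and the helper toMapRows are assumptions, not part of the original schema.

```php
<?php
// Split a space-separated category list into rows for a hypothetical
// image/category map table.
function toMapRows(int $imageId, string $categoryList): array
{
    $rows = [];
    $ids = preg_split('/\s+/', trim($categoryList), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($ids as $categoryId) {
        $rows[] = ['image_id' => $imageId, 'category_id' => (int) $categoryId];
    }
    return $rows;
}

// Image 42 currently stores its categories as "1 5 23 2".
$rows = toMapRows(42, '1 5 23 2');
```

Once the pairs are stored one per row, the search filter becomes a plain WHERE category_id IN (...) on the map table instead of an array_intersect in PHP.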
Refactor your database tables! Use something like this:
table_image
id int,
name text,
content text,
...
and a second table for the categories:
table_category
id int,
image_id int,
category int
This way, you can store categories in a separate table using foreign keys. Now you can do simple SQL queries like
SELECT table_image.id FROM table_image, table_category WHERE table_image.id = table_category.image_id AND (table_category.category = $cat_arr[0] OR table_category.category = $cat_arr[1] ...)
H Hatfield has the best answer. If you really must use a single column (which I do not recommend), you could store the categories as a comma-separated list instead of spaces. You can then use the MySQL function FIND_IN_SET, like this:
WHERE FIND_IN_SET('3', categoryListColumnName) > 0 OR FIND_IN_SET('15', categoryListColumnName) > 0
Using your current database design you could use a LIKE query:
WHERE categoryListColumnName LIKE '% 3 %' OR categoryListColumnName LIKE '% 15 %'
and add more ORs for every category you want to find. When using this query, you have to make sure your space-separated list starts and ends with a space, otherwise it won't work.
Let me just reiterate, that these methods will work, but they are not recommended.
I am trying to create a script that finds a matching percentage between my table rows. For example, in my MySQL database the table products contains the field name (indexed, FULLTEXT) with values like
LG 50PK350 PLASMA TV 50" Plasma TV Full HD 600Hz
LG TV 50PK350 PLASMA 50"
LG S24AW 24000 BTU
Aircondition LG S24AW 24000 BTU Inverter
As you may see all of them have some same keyword. But the 1st name and 2nd name are more similar. Additionally, 3rd and 4th have more similar keywords between them than 1st and 2nd.
My MySQL DB has thousands of product names. What I want is to find those names that have more than a certain percentage (let's say 60%) of similarity.
For example, as I said, 1st, 2nd (and any other name) that match between them with more than 60%, will be echoed in a group-style-format to let me know that those products are similar. 3rd and 4th and any other with more than 60% matching will be echoed after in another group, telling me that those products match.
If it is possible, it would be great to echo the keywords that satisfy all the grouped matching names. For example LG S24AW 24000 BTU is the keyword that is contained in 3rd and 4th name.
At the end I will create a list of all those keywords.
What I have now is the following query (as Jitamaro suggested)
Select t1.name, t2.name From products t1, products t2
that creates a new name field next to all other names. Excuse me that I don't know how to explain it right but this is what it does: (The real values are product names like above)
Before the query
-name-
A
B
C
D
E
After the query
-name- -name-
A A
B A
C A
D A
E A
A B
B B
C B
D B
E B
.
.
.
Is there a way, either with MySQL or PHP, to find the matching names and extract the keywords as I described above? Please share code examples.
Thank you community.
Query the DB with LIKE OR REGEXP:
SELECT * FROM product WHERE product_name LIKE '%LG%';
SELECT * FROM product WHERE product_name REGEXP "LG";
Loop the results and use similar_text():
$a = "LG 50PK350 PLASMA TV 50\" Plasma TV Full HD 600Hz"; // DB value
$b = "LG TV 50PK350 PLASMA 50\"" ; // USER QUERY
$i = similar_text($a, $b, $p);
echo("Matched: $i Percentage: $p%");
//outputs: Matched: 21 Percentage: 58.3333333333%
Your second example matches 62.0689655172%:
$a = "LG S24AW 24000 BTU"; // DB value
$b = "Aircondition LG S24AW 24000 BTU Inverter" ; // USER QUERY
$i = similar_text($a, $b, $p);
echo("Matched: $i Percentage: $p%");
You can define a threshold percentage, let's say 40%, above which products count as a match.
Please note that similar_text() is case SensItivE so you should lower case the string.
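Putting the lowercasing and the threshold together might look like the sketch below. The helper names similarityPercent and similarNames are made up, and the 60% threshold is taken from the question rather than being a recommended value.

```php
<?php
// Case-insensitive similarity percentage between two names, via similar_text().
function similarityPercent(string $a, string $b): float
{
    similar_text(strtolower($a), strtolower($b), $percent);
    return (float) $percent;
}

// Keep only the candidates whose similarity to $query reaches $threshold.
function similarNames(string $query, array $candidates, float $threshold): array
{
    $matches = [];
    foreach ($candidates as $name) {
        $percent = similarityPercent($query, $name);
        if ($percent >= $threshold) {
            $matches[$name] = $percent;
        }
    }
    arsort($matches); // best match first
    return $matches;
}

// Product names from the question.
$names = [
    'LG 50PK350 PLASMA TV 50" Plasma TV Full HD 600Hz',
    'LG TV 50PK350 PLASMA 50"',
    'LG S24AW 24000 BTU',
    'Aircondition LG S24AW 24000 BTU Inverter',
];

$matches = similarNames('LG S24AW 24000 BTU', $names, 60.0);
```

Because similar_text() compares every candidate in full, this only scales if the SQL query has already narrowed the candidate list.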
As for your second question, the levenshtein() function (in PHP; MySQL has no built-in equivalent) would be a good candidate.
When I look at your examples, I consider how I would try to find similar products based on the title. From your two examples, I can see one thing in each line that stands out above anything else: the model numbers. 50PK350 probably doesn't show up anywhere other than as related to this one model.
Now, MySQL itself isn't designed to deal with questions like this, but some bolt-on tools above it are. Part of the problem is that querying across all those fields in all positions is expensive. You really want to split it up a certain way and index that. The similarity class of Lucene will grant a high score to words that rarely appear across all data, but do appear as a high percentage of your data. See High level explanation of Similarity Class for Lucene?
You should also look at Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?
Scoring each word against the Lucene similarity class ought to be faster and more reliable. The sum of your scores should give you the most related products. For the TV, I'd expect to see exact matches first, then some others of the same size, then brand, then TVs in general, etc.
Whatever you do, realize that unless you alter the data structures by using another tool on top of the SQL system to create better data structures, your queries will be too slow and expensive. I think Lucene is probably the way to go. Sphinx or other options not mentioned may also be up for consideration.
This is trickier than it seems and there is information missing in your post:
How are people going to use this auto-complete function?
Is it relevant that you can find all names for a product? Because apparently not all stores name their products similarly so a clerk might not be able to find the product (s)he found.
Do you have information about which product names are for the same product?
Is it relevant from which store you're searching? where is this auto-complete used?
Should the auto-complete really only suggest products that match all the words you typed? (it's not so hard, technically, to correct typos)
I think you need a clearer picture of what you (or better yet: the users) want this auto-complete function to do.
An auto-complete function is very much a user-friendly type feature. It aids the user, possibly in a fuzzy way so there is no single right answer. You have to figure out what works best, not what is easiest to do technically.
First figure out what you want, then worry about technology.
One possible solution is to use the Damerau-Levenshtein distance. It could be used like this:
select *
from products p
where DamerauLevenstein(p.name, '*user input here*')<=*X*
You'll have to figure out the X that suits your needs best. It should be an integer greater than zero. You could have it hard-coded, parameterized, or calculated as needed.
The trickiest thing here is DamerauLevenstein. It has to be a stored function that implements the Damerau-Levenshtein algorithm. I don't have MySQL here, so I might write it for you later today.
Update: MySQL does not support arrays in stored procedures, so there is no way to implement Damerau-Levenshtein in MySQL except by using a temporary table for each function call, and that will result in terrible performance. So you have two options: loop through the results in PHP with levenshtein() like Alix Axel suggests, or migrate your database to PostgreSQL, where arrays are supported.
There is also the option of creating a user-defined function, but this requires writing the function in C, linking it to MySQL, and possibly rebuilding MySQL, so this way you'll just add more headaches.
Your approach seems sound. For matching similar products, I would suggest a trigram search. There's a pretty decent explanation of how this works along with the String::Trigram Perl module.
I would suggest using trigram search to get a list of matches, perhaps coupled with some manual review depending on how much data you have to deal with and how frequent you need to add new products. I've found this approach to work quite well in practice.
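To give an idea of what a trigram comparison does, here is a small PHP sketch. It is not the String::Trigram module's algorithm: it simply collects the distinct three-character substrings of each lowercased name and scores two names by shared trigrams divided by all distinct trigrams (a Jaccard ratio). The function names are hypothetical.

```php
<?php
// Collect the distinct three-character substrings of a lowercased string.
function trigrams(string $s): array
{
    $s = strtolower(trim($s));
    $set = [];
    for ($i = 0; $i <= strlen($s) - 3; $i++) {
        $set[substr($s, $i, 3)] = true; // array keys act as a set
    }
    return $set;
}

// Similarity between 0.0 and 1.0: shared trigrams / all distinct trigrams.
function trigramSimilarity(string $a, string $b): float
{
    $ta = trigrams($a);
    $tb = trigrams($b);
    $shared = count(array_intersect_key($ta, $tb));
    $total = count($ta) + count($tb) - $shared;
    return $total === 0 ? 0.0 : $shared / $total;
}

$airconPair = trigramSimilarity('LG S24AW 24000 BTU', 'Aircondition LG S24AW 24000 BTU Inverter');
$mixedPair  = trigramSimilarity('LG S24AW 24000 BTU', 'LG TV 50PK350 PLASMA 50"');
```

In a real setup the trigrams would be stored in an indexed table so that candidates can be looked up without comparing every pair of names.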
Maybe you want to find the longest common substring of the 2 strings? Then you need to compute a suffix tree for your strings; see http://en.wikipedia.org/wiki/Longest_common_substring_problem.
If you want to check all names against each other you need a cross join in mysql. There are many ways to achieve this:
1. Select a, b From t1, t2
2. Select a, b From t1 Join t2
3. Select a, b From t1 Cross Join t2
Then you can loop through the result. This is the same as creating a 2D array with n^2 elements, in which each name is paired with every name (including itself).
P.S.: Select t1.name, t2.name From products t1, products t2
It sounds like you've gone through all this trouble to explain a complex scenario, then said that you want to ignore the optimal answers and just get us to give you the "handshake" protocol (everything is compared to everything that hasn't been compared to it yet). So... pseudocode:
select * from table order by id
while (result) {
select * from table where id > result_id
}
That will do it.
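In PHP, the "handshake" idea from the pseudocode above could be sketched as a nested loop in which the inner index always starts one past the outer index, so no pair is produced twice and nothing is paired with itself. The function name uniquePairs is made up for illustration.

```php
<?php
// Return every unordered pair exactly once ("handshake" comparison).
function uniquePairs(array $items): array
{
    $items = array_values($items); // make sure indexes run 0..n-1
    $pairs = [];
    $n = count($items);
    for ($i = 0; $i < $n - 1; $i++) {
        // Start after $i: earlier items have already been paired with this one.
        for ($j = $i + 1; $j < $n; $j++) {
            $pairs[] = [$items[$i], $items[$j]];
        }
    }
    return $pairs;
}

$pairs = uniquePairs(['A', 'B', 'C', 'D', 'E']);
```

For n names this gives n(n-1)/2 pairs, so 10 pairs for the 5 names here, compared with 25 rows from the cross join.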
If your database simply had a UPC code as one of its fields, and this field was well-maintained, i.e., you could trust that it was entered correctly by the database maintainer and correctly reflected what the item was -- then you wouldn't need to do all of the work you suggest.
An even better idea might be to have a UPC field in your next database -- and constrain it as unique.
Database users attempt to put an-already-existing UPC into the database -- they get an error.
Database maintains its integrity.
And if such a database maintained its integrity -- the necessity of doing what you suggest never arises.
This probably doesn't help much with your current task (apologies) -- but for a future similar database -- you might wish to think about it...
I'd advise you to use a full-text search engine like Sphinx. It lets you implement any algorithm you want. For example, you may use "quorum" or "any" searches.
It seems that you might always want to return the shortest string? That's more of a question than anything. But then you might have something like...
SELECT * FROM products
WHERE product_name LIKE '%LG%'
ORDER BY LENGTH(product_name) ASC
LIMIT 1
This is a clustering problem, which can be solved by a data mining method (http://en.wikipedia.org/wiki/Cluster_analysis). It requires a lot of memory and computation-intensive operations, which makes it unsuitable for a database engine. Otherwise, separate data mining, text mining, or business analytics software wouldn't exist.
This question is similar :) to this one:
What is the best way to implement a substring search in SQL?
Trigrams can easily find similar rows, and in that question I posted a PHP+MySQL+trigram solution.
You can use LIKE to find similar product names within the table. For example:
SELECT * FROM product WHERE product_name LIKE 'LG%';
Here is another idea (but I'm voting for levenshtein()):
Create a temporary table of all words used in names and their frequencies.
Choose range of results (most popular words are probably words like LCD or LED, most unique words could be good, they might be product actual names).
Suggest for each of result words either:
results with those words
results containing longest substring (like this: http://forums.mysql.com/read.php?10,277997,278020#msg-278020 ) of those words.
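The word-frequency step could be prototyped in PHP before building the temporary table. In this sketch each word is counted once per name, so the number is "how many product names contain this word"; that choice, and the function name wordFrequencies, are assumptions.

```php
<?php
// Count in how many names each lowercased word appears, most frequent first.
function wordFrequencies(array $names): array
{
    $counts = [];
    foreach ($names as $name) {
        $words = preg_split('/\s+/', strtolower(trim($name)), -1, PREG_SPLIT_NO_EMPTY);
        // array_unique: a word repeated inside one name counts only once.
        foreach (array_unique($words) as $word) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    arsort($counts);
    return $counts;
}

// Product names from the question.
$freq = wordFrequencies([
    'LG 50PK350 PLASMA TV 50" Plasma TV Full HD 600Hz',
    'LG TV 50PK350 PLASMA 50"',
    'LG S24AW 24000 BTU',
    'Aircondition LG S24AW 24000 BTU Inverter',
]);
```

Words that appear in nearly every name (here "lg") say little about a specific product, while words shared by only a few names (such as "s24aw" or "50pk350") are the likelier model identifiers.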
OK, I think I was trying to implement a very similar thing. It can work the same way as the Google Chrome address box: when you type the address, it gives you suggestions. This is what you are trying to achieve, as far as I can tell.
I cannot give you exact solution to that but some advice.
You need to implement the dropdown box where someone starts to enter the product they are looking for
Then you need to get the current value of the dropdown and then run query like guy posted above. Can be "SELECT * FROM product WHERE product_name LIKE 'LG%';"
Save results of the query
Refresh the page
Add the results of the query to the dropdown
Note:
You need to save the query results somewhere, such as a text file with the HTML code, i.e. <option>LG TS 600</option>. These values will be used for populating your option box after the page refresh. You need to set up a session for each user so that the same user gets the same results; otherwise, if several users use the search at the same time, the results could clash. With the search id and session id you can then match them. You can save them in a file or a table; a table would be more convenient. In my view, what you are looking for is really a whole subsystem.
I hope it helps.