I think this is a fairly simple problem, though it's a little difficult to explain.
Let's start with a list of state symbols - state flags, flowers, birds, etc. The rows in my MySQL database match each symbol with the URL at which its page appears. So if field Designation = 'flower', then the URL will be arizona/flower, new-york/flower, or something like that.
However, some states have multiple symbols in various categories; e.g. two state birds, three state flowers, etc. Making it still more confusing, I haven't yet figured out if I should describe all of a state's flower symbols on one page or make a unique page for each flower symbol.
For the time being, I'm playing it by ear, as indicated by the following database data. In this example, Arizona has two flower symbols, each discussed on the same page (URL = arizona/flower).
+----------+-----------------+-------------+
| State-ID | URL | Designation |
+----------+-----------------+-------------+
| us-az | arizona/flower | flower |
| us-az | arizona/flower | flower |
| us-az | arizona/tree | tree |
| us-fl | florida/mammal | mammal |
| us-fl | florida/mammal2 | mammal |
| us-fl | florida/mammal2 | mammal |
| us-fl | florida/mammal3 | mammal |
| us-fl | florida/bird | bird |
| us-fl | florida/bird2 | bird |
+----------+-----------------+-------------+
However, Florida has FOUR official mammals. The first one (the Florida panther) is discussed at mysite/florida/mammal. The two marine mammals are discussed at mysite/florida/mammal2, and the state horse is featured at mysite/florida/mammal3.
So here's my question: How can I write a query that 1) distinguishes between single designations (e.g. Arizona's state tree) and multiple designations (e.g. Arizona's state flowers) and 2) also tells me if the multiple designations are linked to a single URL or multiple URLs?
It will take me a while to iron out all the kinks, but, for now, it would be very helpful if I had a query that listed ONLY multiple URLs. For example, it wouldn't even display Arizona's state tree. The query could serve as sort of a snapshot of my list of symbols, helping me identify all the multiple designations and which of them are linked to single URLs vs multiple URLs.
I'm working with PHP and MySQL on a Mac.
P.S. I should point out that there are additional fields, including one that gives symbols a specific value (e.g. 'Florida panther', 'manatee', 'dolphin', 'Cracker Horse').
SELECT `State-ID`, Designation, COUNT(*) all_count, COUNT(DISTINCT URL) url_count
FROM YourTable
GROUP BY `State-ID`, Designation
url_count will tell you if they're linked to the same URL or different ones. You can add HAVING url_count > 1 to restrict the results only to items that link to multiple URLs.
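As a rough check, here is a minimal sketch that runs the query above against the sample rows from the question, with HAVING COUNT(*) > 1 added so that only multiple designations remain. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name symbols is hypothetical. SQLite quotes the State-ID column with double quotes where MySQL would use backticks.

```python
import sqlite3

# Sample rows copied from the question; the table name "symbols" is an assumption.
ROWS = [
    ("us-az", "arizona/flower", "flower"),
    ("us-az", "arizona/flower", "flower"),
    ("us-az", "arizona/tree", "tree"),
    ("us-fl", "florida/mammal", "mammal"),
    ("us-fl", "florida/mammal2", "mammal"),
    ("us-fl", "florida/mammal2", "mammal"),
    ("us-fl", "florida/mammal3", "mammal"),
    ("us-fl", "florida/bird", "bird"),
    ("us-fl", "florida/bird2", "bird"),
]


def summarize(rows):
    """Return (state, designation, all_count, url_count) for multiple designations only."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE symbols ("State-ID" TEXT, URL TEXT, Designation TEXT)')
    conn.executemany("INSERT INTO symbols VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT "State-ID", Designation,
               COUNT(*) AS all_count,
               COUNT(DISTINCT URL) AS url_count
        FROM symbols
        GROUP BY "State-ID", Designation
        HAVING COUNT(*) > 1            -- drops single designations such as arizona/tree
        ORDER BY "State-ID", Designation
        """
    ).fetchall()
    conn.close()
    return result


summary = summarize(ROWS)
for state, designation, all_count, url_count in summary:
    kind = "one shared URL" if url_count == 1 else f"{url_count} different URLs"
    print(state, designation, all_count, kind)
```

Here url_count = 1 means all the symbols share one page, and url_count > 1 means they are spread over several pages.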
You can use the HAVING clause to filter on aggregate functions:
SELECT `State-ID`, Designation, COUNT(*) AS cnt
FROM stateTable
GROUP BY `State-ID`, Designation
HAVING COUNT(*) > 1
That will give you all State-ID,Designation combinations that have more than one row in the table. If you also group by URL you can determine if they share a URL.
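One possible way to read "also group by URL" is sketched below: keep only the State-ID/Designation pairs that have more than one row, then count the rows per URL inside each pair. This is an illustration on the question's sample data using Python's sqlite3 module in place of MySQL. The table name symbols and the aliases sid, des and multi are made up for the example.

```python
import sqlite3

# Sample rows copied from the question; "symbols" is a hypothetical table name.
ROWS = [
    ("us-az", "arizona/flower", "flower"),
    ("us-az", "arizona/flower", "flower"),
    ("us-az", "arizona/tree", "tree"),
    ("us-fl", "florida/mammal", "mammal"),
    ("us-fl", "florida/mammal2", "mammal"),
    ("us-fl", "florida/mammal2", "mammal"),
    ("us-fl", "florida/mammal3", "mammal"),
    ("us-fl", "florida/bird", "bird"),
    ("us-fl", "florida/bird2", "bird"),
]


def rows_per_url(rows):
    """For multiple designations only, count how many symbols sit at each URL."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE symbols ("State-ID" TEXT, URL TEXT, Designation TEXT)')
    conn.executemany("INSERT INTO symbols VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT s."State-ID", s.Designation, s.URL, COUNT(*) AS rows_at_url
        FROM symbols AS s
        JOIN (
            -- State/designation pairs that occur more than once
            SELECT "State-ID" AS sid, Designation AS des
            FROM symbols
            GROUP BY "State-ID", Designation
            HAVING COUNT(*) > 1
        ) AS multi
          ON multi.sid = s."State-ID" AND multi.des = s.Designation
        GROUP BY s."State-ID", s.Designation, s.URL
        ORDER BY s."State-ID", s.Designation, s.URL
        """
    ).fetchall()
    conn.close()
    return result


per_url = rows_per_url(ROWS)
for row in per_url:
    print(row)
```

A rows_at_url value above 1 marks a page that covers several symbols, such as florida/mammal2 in the sample data.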
Basically you should read up on grouping and aggregation in MySQL: 12.15.1 GROUP BY (Aggregate) Functions and MySQL SELECT Syntax
Related
I want to set up a system that allows for, say, 200 different translations per post. However most translations wouldn't exist, so there'd be a lot of empty datasets. How much of a performance and storage hit is it if I save every language (including empty ones) as a specific column? I.E.
English | Arabic | Mandarin | Russian | French | German
Potato | | | | Pomme de Terre |
Orange | | | | Orange |
Peach | | | | |
I wouldn't cycle through the whole list very often, I'd use a session variable or usersetting and then load directly from that column if it exists, with a fallback to a default language, and perhaps after that a full search.
if (exists(french)) {
    echo french
} else if (exists(english)) {
    echo english
} else {
    echo links to non-null language
}
I'd assume that, if I tell the server which column to go to, the overhead in terms of processing would be negligible? I also assume that an empty cell would be negligible in terms of storage? However I don't know for sure, and it could potentially be a huge mistake.
The reason I'd want to work like this is so I could assign language codes, instead of every installed instance having a different order (e.g. english|french|german|mandarin versus english|mandarin|german|french).
To prevent XY-problems, here's a more global formulation:
I want to set up a system that allows for many languages, but I expect that in most cases only 1 or two are used. What would be an efficient way to store?
Keyword: Relational database.
You will want to use multiple tables.
Let's say that the default language is English; then your "words" table will implicitly contain the English words.
Words:
Id | Word
1 | Potato
2 | Orange
Languages:
Id | Name
1 | Norwegian
2 | Danish
Translations:
Word | Language | Translated
1 | 1 | Potet
2 | 1 | Oransje
1 | 2 | Kartoffel
2 | 2 | Appelsin
Then you can do (pseudo sql, you can look up the language and word ids first, or use a more advanced query):
SELECT Translated FROM Translations WHERE Word = (the word id) and Language = (the language id)
This comes with the benefit that it's very simple to list all the languages you support, all the Words you support, and also all translated words for a specific language (or, find all NON translated words for a language).
A specific query for translating "Potato" into "Danish" would look like:
SELECT Translated FROM Translations
JOIN Words ON Words.Id = Translations.Word
JOIN Languages ON Languages.Id = Translations.Language
WHERE Languages.Name = 'Danish' AND Words.Word = 'Potato'
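The question also asks for a fallback to a default language when a translation is missing. A minimal sketch of that idea follows, using the three tables above with Python's sqlite3 module standing in for MySQL. The helper names build_db and translate are hypothetical, and 'Peach' is taken from the question's table as a word with no translations. COALESCE falls back to the English word stored in Words.

```python
import sqlite3


def build_db():
    """Create the Words/Languages/Translations tables with the sample rows from the answer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Words (Id INTEGER PRIMARY KEY, Word TEXT NOT NULL);
        CREATE TABLE Languages (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Translations (
            Word INTEGER NOT NULL REFERENCES Words(Id),
            Language INTEGER NOT NULL REFERENCES Languages(Id),
            Translated TEXT NOT NULL,
            PRIMARY KEY (Word, Language)   -- one row per existing translation, no empty cells
        );
        INSERT INTO Words VALUES (1, 'Potato'), (2, 'Orange'), (3, 'Peach');
        INSERT INTO Languages VALUES (1, 'Norwegian'), (2, 'Danish');
        INSERT INTO Translations VALUES
            (1, 1, 'Potet'), (2, 1, 'Oransje'),
            (1, 2, 'Kartoffel'), (2, 2, 'Appelsin');
        """
    )
    return conn


def translate(conn, word, language):
    """Return the translation, or the default (English) word if none exists."""
    row = conn.execute(
        """
        SELECT COALESCE(t.Translated, w.Word)
        FROM Words AS w
        LEFT JOIN Languages AS l ON l.Name = ?
        LEFT JOIN Translations AS t ON t.Word = w.Id AND t.Language = l.Id
        WHERE w.Word = ?
        """,
        (language, word),
    ).fetchone()
    return row[0] if row else None   # None: the word itself is unknown


conn = build_db()
print(translate(conn, "Potato", "Danish"))
print(translate(conn, "Peach", "Danish"))
```

Because only existing translations get a row, supporting 200 languages costs nothing for the languages nobody has filled in.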
I've got a search puzzle I need to solve, but my skillset is minimal, so apologies if I don't explain this well. To try and demonstrate the problem, here is an example of the data in two database columns:
| Start address | End address |
-----------------------------------------------
1 | Essex | Moortown, Leeds |
2 | Place A, London | Place B, Manchester |
3 | Townsville, Essex | Leeds Town Hall |
4 | Essex Trading Estate | Another Leeds Estate |
5 | Somewhere, Devon | Yeoville |
6 | ... | ... |
If, for example, a user submits "21 Some Street, Essex" and "Leeds some place" in the corresponding form fields, I need to search the MySQL database and pull back the top X number of best matches, which in this example would be rows 1, 3 and 4, as they all contain Essex in the first column and Leeds in the second.
I can see that PHP has two functions similar_text() and levenshtein(), which may help with this, but I'm not sure which is the more appropriate for this sort of part matching and how to get the most accurate search results. I've not found anything similar within MySQL.
Is anybody with experience of this able to give me any pointers, please?
Cheers
Andy
You can use a simple LIKE comparison to get the required search output. To avoid case-sensitivity problems, apply LOWER (or UPPER) to the columns. (TABLE is a reserved word, so substitute your real table name for your_table.)
SELECT start_address, end_address FROM your_table
WHERE LOWER(start_address) LIKE '%urfirststring%'
AND LOWER(end_address) LIKE '%ursecondstring%'
Pass your search strings in lowercase as well.
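A single LIKE on the whole input would not match the question's example, since "21 Some Street, Essex" is not a substring of "Essex". One way around that, sketched here purely as an assumption about what "best match" should mean, is to split each input into words, count how many words occur in the corresponding column, and rank rows by that count. The table name routes, the 4-letter minimum word length and the helper names are all hypothetical. Python's sqlite3 module stands in for MySQL.

```python
import re
import sqlite3

# Sample rows from the question; the table name "routes" is an assumption.
ROUTES = [
    (1, "Essex", "Moortown, Leeds"),
    (2, "Place A, London", "Place B, Manchester"),
    (3, "Townsville, Essex", "Leeds Town Hall"),
    (4, "Essex Trading Estate", "Another Leeds Estate"),
    (5, "Somewhere, Devon", "Yeoville"),
]


def tokens(text):
    """Lowercased words of 4+ letters; house numbers and short words are ignored."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 4]


def best_matches(conn, start_query, end_query, limit=3):
    """Rank rows by how many search words appear in each address column."""
    start_words, end_words = tokens(start_query), tokens(end_query)
    if not start_words or not end_words:
        return []
    # Each LIKE evaluates to 1 or 0, so the sum is the number of matching words.
    start_expr = " + ".join(["(LOWER(start_address) LIKE ?)"] * len(start_words))
    end_expr = " + ".join(["(LOWER(end_address) LIKE ?)"] * len(end_words))
    sql = f"""
        SELECT id, start_score + end_score AS score
        FROM (
            SELECT id, {start_expr} AS start_score, {end_expr} AS end_score
            FROM routes
        ) AS scored
        WHERE start_score > 0 AND end_score > 0   -- both columns must match something
        ORDER BY score DESC, id
        LIMIT ?
    """
    params = [f"%{w}%" for w in start_words + end_words] + [limit]
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE routes (id INTEGER PRIMARY KEY, start_address TEXT, end_address TEXT)")
conn.executemany("INSERT INTO routes VALUES (?, ?, ?)", ROUTES)

matches = best_matches(conn, "21 Some Street, Essex", "Leeds some place")
print(matches)
```

The search words are bound as parameters rather than pasted into the SQL string, which also guards against SQL injection from the form fields.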
I am trying to create a relatively simple hierarchical tagging system that can be searched. Here's how it works as of now, this is the MySQL table structure:
--------------------------------------------
id | tag | parentID | topParentID |
--------------------------------------------
1 | Boston | NULL | NULL |
--------------------------------------------
2 | Events | 1 | 1 |
--------------------------------------------
3 | June 30th | 2 | 1 |
--------------------------------------------
4 | NYC | NULL | NULL |
--------------------------------------------
5 | Attractions | 4 | 4 |
--------------------------------------------
So, if a user types Boston in the search bar, they will be delivered the suggestions "Boston Events" and "Boston Events June 30th". Similarly, if they type NYC in the search bar, they will be delivered "NYC Attractions" as a suggestion.
Also, if someone typed Events into the search bar, they would get the suggestion "Boston Events" or if they typed June 30th, they would get the suggestion "Boston Events June 30th"
I've messed around with code to do this, and I can definitely break the query string into keywords then search the tag table for each of the keywords and return matches, but I have not found the correct way to return the full tag strings in the format I mentioned above.
Well, you can join the same table twice. Suppose we have $id, the id of the current tag:
SELECT
tags.id,
tags.tag,
parent_tags.id AS parent_id,
parent_tags.tag AS parent_tag,
parent2_tags.id AS top_parent_id,
parent2_tags.tag AS top_parent_tag
FROM
tags
INNER JOIN
tags AS parent_tags
ON
tags.parentID = parent_tags.id
INNER JOIN
tags AS parent2_tags
ON
tags.topParentID = parent2_tags.id
WHERE
tags.id=$id
But for tags one level below the root it will return the same row as both parent and top parent, because in your data parentID = topParentID for those tags. Also note that the INNER JOINs return nothing for root tags, whose parentID is NULL.
Actually, this is a very primitive solution, allowing only 2 levels of hierarchy to be displayed in 1 request. If you want to support any number of levels, read about nested sets on Stack Overflow. There is also a great book: "Joe Celko's Trees and Hierarchies in SQL for Smarties".
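As an alternative to nested sets (and not what this answer proposes), a recursive common table expression can build the full tag string for any depth using only the parentID column. The sketch below runs on the question's sample data with Python's sqlite3 module. MySQL supports WITH RECURSIVE from version 8.0, where you would typically use CONCAT() instead of || and may need to CAST the first path column to a wide enough type. The CTE name paths is made up.

```python
import sqlite3

# (id, tag, parentID) rows from the question; topParentID is not needed here.
TAGS = [
    (1, "Boston", None),
    (2, "Events", 1),
    (3, "June 30th", 2),
    (4, "NYC", None),
    (5, "Attractions", 4),
]


def full_paths(rows):
    """Return {tag id: full tag string from the root down to that tag}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT, parentID INTEGER)")
    conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH RECURSIVE paths(id, path) AS (
            -- Start from the root tags...
            SELECT id, tag FROM tags WHERE parentID IS NULL
            UNION ALL
            -- ...and append each child to its parent's path.
            SELECT t.id, p.path || ' ' || t.tag
            FROM tags AS t
            JOIN paths AS p ON t.parentID = p.id
        )
        SELECT id, path FROM paths ORDER BY id
        """
    ).fetchall()
    conn.close()
    return dict(result)


paths = full_paths(TAGS)
for tag_id, path in paths.items():
    print(tag_id, path)
```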
I think that you may delete the topParentID column and add one called "level" (Boston would have level 0, events level 1, June 30th level 2).
So you could order by this level column and implode the values to get something like what you want.
You can do that without the level column, but I think it will be a lot more work on the PHP side.
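To give a feel for the application-side work without a level column, here is a small sketch that follows parentID links upward to build each full tag string, then suggests every multi-tag string that contains the search term. It is written in Python rather than PHP, the function names are hypothetical, and the matching rule (an exact, case-insensitive match on one tag in the chain) is an assumption about the desired behaviour.

```python
# (id, tag, parentID) rows from the question, as they might be fetched from MySQL.
TAGS = [
    (1, "Boston", None),
    (2, "Events", 1),
    (3, "June 30th", 2),
    (4, "NYC", None),
    (5, "Attractions", 4),
]


def full_path(tags_by_id, tag_id):
    """Walk up the parentID chain and return the tags from root to leaf."""
    parts = []
    while tag_id is not None:
        tag, parent_id = tags_by_id[tag_id]
        parts.append(tag)
        tag_id = parent_id
    return list(reversed(parts))


def suggest(rows, term):
    """Suggest full tag strings (2+ tags long) that contain the search term."""
    tags_by_id = {tag_id: (tag, parent_id) for tag_id, tag, parent_id in rows}
    term = term.lower()
    suggestions = []
    for tag_id in sorted(tags_by_id):
        parts = full_path(tags_by_id, tag_id)
        if len(parts) > 1 and term in (p.lower() for p in parts):
            suggestions.append(" ".join(parts))
    return suggestions


print(suggest(TAGS, "Boston"))
print(suggest(TAGS, "June 30th"))
```

This loads the whole tag table into memory, which is fine for a small tag set but would need rethinking for a large one.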
I am planning on setting up a filter system (refine your search) in my ecommerce stores. You can see an example here: http://www.bettymills.com/shop/product/find/Air+and+HVAC+Filters
Platforms such as PrestaShop, OpenCart and Magento have what's called a Layered Navigation.
My question is what is the difference between the Layered Navigation in platforms such as Magento or PrestaShop in comparison to using something like Solr or Lucene for faceted navigation.
Can a similar result be accomplished via just php and mysql?
A detailed explanation is much appreciated.
Layered Navigation == Faceted Search.
They are the same thing, but Magento et al. use different wording, probably to be catchy. As far as I know, Magento supports both Solr faceted search and the MySQL one. The main difference is the performance.
Performance is the main trade-off.
To do faceted search in MySQL requires you to join tables, while Solr indexes the document facets automatically for filtering. You can generally achieve fast response times using Solr (<100ms for a multi-facet search query) on average hardware. While MySQL will take longer for the same search, it can be optimized with indexes to achieve similar response times.
The downside to Solr is that it requires you to configure, secure and run yet another service on your server. It can also be pretty CPU and memory intensive depending on your configuration (Tomcat, jetty, etc.).
Faceted search in PHP/MySQL is possible, and not as hard as you'd think.
You need a specific database schema, but it's feasible. Here's a simple example:
product
+----+------------+
| id | name |
+----+------------+
| 1 | blue paint |
| 2 | red paint |
+----+------------+
classification
+----+----------+
| id | name |
+----+----------+
| 1 | color |
| 2 | material |
| 3 | dept |
+----+----------+
product_classification
+------------+-------------------+-------+
| product_id | classification_id | value |
+------------+-------------------+-------+
| 1 | 1 | blue |
| 1 | 2 | latex |
| 1 | 3 | paint |
| 1 | 3 | home |
| 2 | 1 | red |
| 2 | 2 | latex |
| 2 | 3 | paint |
| 2 | 3 | home |
+------------+-------------------+-------+
So, say someone searches for paint, you'd do something like:
SELECT p.* FROM product p WHERE name LIKE '%paint%';
This would return both entries from the product table.
Once your search has executed, you can fetch the associated facets (filters) of your result using a query like this one:
SELECT c.id, c.name, pc.value FROM product p
LEFT JOIN product_classification pc ON pc.product_id = p.id
LEFT JOIN classification c ON c.id = pc.classification_id
WHERE p.name LIKE '%paint%'
GROUP BY c.id, pc.value
ORDER BY c.id;
This'll give you something like:
+------+----------+-------+
| id | name | value |
+------+----------+-------+
| 1 | color | blue |
| 1 | color | red |
| 2 | material | latex |
| 3 | dept | home |
| 3 | dept | paint |
+------+----------+-------+
So, in your result set, you know that there are products whose color is blue or red, that the only material they are made from is latex, and that they can be found in departments home and paint.
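Faceted navigation usually also shows how many products sit behind each filter value. A hedged variation on the facet query above adds COUNT(DISTINCT p.id) for that purpose. It is run here on the sample tables with Python's sqlite3 module instead of MySQL, and the function names build_db and facet_counts are hypothetical.

```python
import sqlite3


def build_db():
    """Create the three sample tables from the answer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE classification (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product_classification (
            product_id INTEGER, classification_id INTEGER, value TEXT
        );
        INSERT INTO product VALUES (1, 'blue paint'), (2, 'red paint');
        INSERT INTO classification VALUES (1, 'color'), (2, 'material'), (3, 'dept');
        INSERT INTO product_classification VALUES
            (1, 1, 'blue'), (1, 2, 'latex'), (1, 3, 'paint'), (1, 3, 'home'),
            (2, 1, 'red'),  (2, 2, 'latex'), (2, 3, 'paint'), (2, 3, 'home');
        """
    )
    return conn


def facet_counts(conn, keyword):
    """Return (facet name, value, number of matching products) for a keyword search."""
    return conn.execute(
        """
        SELECT c.name, pc.value, COUNT(DISTINCT p.id) AS product_count
        FROM product AS p
        JOIN product_classification AS pc ON pc.product_id = p.id
        JOIN classification AS c ON c.id = pc.classification_id
        WHERE p.name LIKE ?
        GROUP BY c.id, c.name, pc.value
        ORDER BY c.id, pc.value
        """,
        (f"%{keyword}%",),
    ).fetchall()


conn = build_db()
facets = facet_counts(conn, "paint")
for name, value, count in facets:
    print(f"{name}: {value} ({count})")
```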
Once a user selects a facet, just modify the original search query:
SELECT p.* FROM product p
LEFT JOIN product_classification pc ON pc.product_id = p.id
WHERE
p.name LIKE '%paint%' AND (
(pc.classification_id = 1 AND pc.value = 'blue') OR
(pc.classification_id = 3 AND pc.value = 'home')
)
GROUP BY p.id
HAVING COUNT(p.id) = 2;
So, here the user is searching for keyword paint, and includes two facets: facet blue for color, and home for department. This'll give you:
+----+------------+
| id | name |
+----+------------+
| 1 | blue paint |
+----+------------+
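The filtered query above can be generated from whatever facets the user has ticked. The sketch below builds the OR list and the HAVING count from a dictionary of selections, assuming at most one selected value per classification. It runs on the sample data with Python's sqlite3 module in place of MySQL, and build_db and search are hypothetical names. All values are bound as parameters.

```python
import sqlite3


def build_db():
    """Create the sample tables from the answer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product_classification (
            product_id INTEGER, classification_id INTEGER, value TEXT
        );
        INSERT INTO product VALUES (1, 'blue paint'), (2, 'red paint');
        INSERT INTO product_classification VALUES
            (1, 1, 'blue'), (1, 2, 'latex'), (1, 3, 'paint'), (1, 3, 'home'),
            (2, 1, 'red'),  (2, 2, 'latex'), (2, 3, 'paint'), (2, 3, 'home');
        """
    )
    return conn


def search(conn, keyword, facets):
    """facets maps classification_id -> selected value, e.g. {1: 'blue', 3: 'home'}."""
    params = [f"%{keyword}%"]
    sql = "SELECT p.id, p.name FROM product AS p"
    if facets:
        clauses = " OR ".join(
            ["(pc.classification_id = ? AND pc.value = ?)"] * len(facets)
        )
        sql += (
            " JOIN product_classification AS pc ON pc.product_id = p.id"
            f" WHERE p.name LIKE ? AND ({clauses})"
            " GROUP BY p.id, p.name"
            # A product qualifies only if it matched every selected facet.
            " HAVING COUNT(DISTINCT pc.classification_id) = ?"
        )
        for classification_id, value in facets.items():
            params += [classification_id, value]
        params.append(len(facets))
    else:
        sql += " WHERE p.name LIKE ?"
    sql += " ORDER BY p.id"
    return conn.execute(sql, params).fetchall()


conn = build_db()
print(search(conn, "paint", {1: "blue", 3: "home"}))
print(search(conn, "paint", {}))
```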
So, in conclusion: although it's available out-of-the-box in Solr, it's possible to implement it in SQL fairly easily.
Magento Enterprise Edition has an implementation of Solr with faceted search. Still you need to configure Solr to index the correct data; i.e. Solr runs on Java on a host with a specific port. Magento connects to it through a given url. When Magento sets up the faceted search, it does a request to Solr and processes the received xml into a form on the frontend.
The difference would be one of speed. Requests to Solr are very fast. If you have 100,000+ products in your shop and want quick responses to search requests, you can use Solr. But if you have a separate server for the Magento database with a lot of memory, you can also just use Magento's built-in MySQL-based faceted search. If you don't have money to spend on Magento EE, you can use this Solr implementation, though I do not have any experience with it.
Out of the box, Solr supports calculated facets and range facets, lets you select or exclude a facet, and lets you declare whether a facet is single-valued or multi-valued, all at a very low CPU/RAM cost.
On the other hand, it takes some time to configure and secure the Solr installation, and it also takes some time to crawl your data.
You can create faceted search with just PHP and MySQL; Drupal Faceted Search is a good example. But if you already use Solr, you get faceted search included for free.
We are building a help desk application for running our service company, and I am trying to figure out how to assist the call center people in assigning a category based on the problem description from the customer.
My primary idea is to compare the description the customer gave to prior descriptions, and use the category most commonly assigned to similar prior service calls.
Any ideas how to do it?
My description field is a blob field as some descriptions are quite long. I would prefer to find a way to do this that requires the least system resources.
Thanks for any input :)
Mike
I'm a person of custom code; I don't feel the job is done right if you use big, bloated systems, so take this with a grain of salt if you are not wanting to code this yourself. However, this might not be as hard as you're making it; yes, I would definitely go with a tagging system. However, it doesn't have to be so complicated.
Here's how I would handle it:
First, make a database with 3 tables; one for categories, tags, and 'links' (links between categories and tags).
Then, create a PHP function that initializes an array (empty works just fine) and pushes new (lowercased) words if they don't exist. An example of this might be:
<?php
// Pass the new description to this
// function.
function getCategory($description)
{
// Lowercase it all
$description = strtolower($description);
// Replace anything that isn't a letter, a number or a space with a space.
// (trim() only strips characters from the two ends of the string, so
// preg_replace() is used here instead.)
$description = preg_replace('~[^a-z0-9 ]+~', ' ', $description);
// Kill extra whitespace
$description = preg_replace('~\s\s+~', ' ', $description);
$description = trim($description);
// Now the description should just contain words with a single space in between them.
// Let's break them up.
$dict = explode(" ", $description);
// And find the unique ones...
$dict = array_unique($dict, SORT_STRING);
// If you wanted to, you could trim either common words you specify,
// or any words under, say, 4 characters. Up to you!
return $dict;
}
?>
Next, populate your database how you want; make a few categories and some tags, and then link them together (if you want to get fancy, switch the MySQL engine to InnoDB and make relationships. Makes things a bit quicker!)
Table `Categories`
|-------------------------|
| Column: Category |
| Rows: |
| Food |
| Animals |
| Plants |
| |
|-------------------------|
Table `Tags`
|-------------------------|
| Column: Tag |
| Rows: |
| eat |
| hamburger |
| meat |
| leaf |
| stem |
| seed |
| fur |
| hair |
| claws |
| |
|-------------------------|
Table `Links`
|-------------------------|
| Columns: tag, category |
| Rows: |
| eat, Food |
| hamburger, Food |
| meat, Food |
| leaf, Food |
| leaf, Plants |
| stem, Plants |
| fur, Animals |
| ... |
|-------------------------|
If you give tags and categories integer ids and have the links table reference those ids (InnoDB foreign keys can enforce the references), each link row stores only two small integers instead of repeating the text. The foreign keys themselves do not save space; the compactness comes from storing ids rather than strings.
Now, for the kicker, a clever MySQL query to the database, which follows these steps:
For each category, sum up the tags belonging both to the category and the description dictionary (which we created in the earlier PHP function).
Sort them from greatest to least
Pull the top 1 or 3 or however many suggested categories you'd like!
This will get you a nice list of categories that have the highest matching count of tags. How you want to craft the MySQL query is up to you.
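The three steps above can be sketched as a single GROUP BY query. The example below is one possible shape, not the only one: it uses the sample links from the table above, Python's sqlite3 module in place of MySQL, and a hypothetical function name suggest_categories. The word list is built in Python here, where the answer builds it in PHP.

```python
import re
import sqlite3

# (tag, category) links from the answer's sample table.
LINKS = [
    ("eat", "Food"),
    ("hamburger", "Food"),
    ("meat", "Food"),
    ("leaf", "Food"),
    ("leaf", "Plants"),
    ("stem", "Plants"),
    ("fur", "Animals"),
]


def suggest_categories(conn, description, limit=3):
    """Return (category, score) pairs, best match first."""
    # Unique lowercased words of the description (the "dictionary").
    words = sorted(set(re.findall(r"[a-z0-9]+", description.lower())))
    if not words:
        return []
    placeholders = ", ".join("?" * len(words))
    sql = f"""
        SELECT category, COUNT(*) AS score      -- step 1: matching tags per category
        FROM Links
        WHERE tag IN ({placeholders})
        GROUP BY category
        ORDER BY score DESC, category           -- step 2: greatest to least
        LIMIT ?                                 -- step 3: top N suggestions
    """
    return conn.execute(sql, words + [limit]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Links (tag TEXT, category TEXT)")
conn.executemany("INSERT INTO Links VALUES (?, ?)", LINKS)

suggestions = suggest_categories(conn, "The stem and leaf have gone brown")
print(suggestions)
```

Because the score is a plain COUNT(*), duplicate links between a tag and a category count more than once, which gives a simple form of weighting.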
While this seems like a lot of setup, it really isn't. You have 3 tables at most, one or two PHP functions and a few MySQL queries. The database will only be as big as the categories, the tags and the references to both (in the links table; references don't take up much space!)
To update the database, simply put in tags that don't exist to the tags database and link them to the category you decided to assign to the description. This will broaden your database's range of tags and will, over time, get your database more tuned to your descriptions (i.e. more accurate).
If you wanted to get really detailed, you'd insert duplicate links between categories and tags to create a sort of weighted tag system, which would make your system even more accurate.