I've got a tree-algorithm and database problem I'd like to have an answer to.
I've got a few areas, let's say 20. Each of these areas has about 20 sub-areas.
These parent areas are spread out on a map. Some of these parent areas are close to each other.
The database looks like this: [area_id, title, parent_id]. Some areas have multiple children, and there's a root node containing all areas (the Adjacency List Model).
As a picture, it would look like this:
As I said, the different areas can be close to each other (or far away). I'd like some way to tie, say, Area 1 and Area 5 together because I know they're close, and Area 1 is also close to Area 4. Now here's the problem: suppose Area 4 is also close to Area 5.
It would look something like this:
Doesn't that make it an infinite loop? I want Area 1 to be close to Area 4, but Area 4 is also close to Area 1.
I'd like to offer a search where you can select "search nearby areas": you pick one area, then search the areas near it. I could use some tips on how to solve this with a database and PHP.
I've been looking around this forum for help, but I don't really know the "name" of this problem. I'd be happy if someone could point me in the right direction or help me out directly in this thread.
Thanks, guys. If there's anything else you need to know, I'll try to answer as soon as possible.
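A cycle such as 1-4, 4-5, 5-1 is not a problem in itself. It only turns into an infinite loop if a traversal keeps revisiting areas. Here is a minimal Python sketch that assumes the "close to" relation is symmetric and held in an adjacency mapping. The `neighbors` dict and `nearby_areas` are hypothetical names. It finds every area within a given number of hops and tracks visited areas, so the loop cannot happen:

```python
from collections import deque


def nearby_areas(neighbors, start, max_hops=1):
    """Return the areas reachable from `start` within `max_hops` steps.

    `neighbors` maps an area id to the set of area ids that are close to it.
    The `visited` set stops the search from looping around cycles like 1-4-5.
    """
    visited = {start}
    result = set()
    queue = deque([(start, 0)])
    while queue:
        area, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the requested distance
        for other in neighbors.get(area, ()):
            if other not in visited:
                visited.add(other)
                result.add(other)
                queue.append((other, hops + 1))
    return result


# Areas 1, 4 and 5 are all close to each other (a cycle); 7 is only close to 5.
neighbors = {1: {4, 5}, 4: {1, 5}, 5: {1, 4, 7}, 7: {5}}
print(sorted(nearby_areas(neighbors, 1)))              # [4, 5]
print(sorted(nearby_areas(neighbors, 1, max_hops=2)))  # [4, 5, 7]
```

With `max_hops=1` this is simply a lookup of direct neighbors, which is all a "search nearby areas" feature needs.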
For anything dealing with proximity, I would certainly store geospatial information (if these are real areas/regions) and then apply a radial search, which can be done with anything from simple to complex queries and calculations.
If, on the other hand, these places are fictional, it might be worth giving each one a fake location, even if it's just a simple x,y coordinate system. This again lets you perform radial searches, which you can enlarge or shrink to your needs, or simply order the results by ascending distance from the selected site.
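Here is a minimal Python sketch of the fake-coordinates idea. It assumes each area gets a made-up `(x, y)` position; the `areas` dict and `radial_search` are illustrative names. The search keeps areas within `radius` of the selected one and orders them by ascending distance. For real latitude/longitude, a great-circle formula such as haversine would replace the planar distance.

```python
import math


def radial_search(areas, origin_id, radius):
    """Return (area_id, distance) pairs within `radius` of `origin_id`, nearest first.

    `areas` maps area id -> (x, y) on a made-up planar grid.
    """
    ox, oy = areas[origin_id]
    hits = []
    for area_id, (x, y) in areas.items():
        if area_id == origin_id:
            continue
        dist = math.hypot(x - ox, y - oy)  # Euclidean distance on the plane
        if dist <= radius:
            hits.append((area_id, dist))
    hits.sort(key=lambda pair: pair[1])
    return hits


areas = {1: (0, 0), 4: (3, 4), 5: (1, 1), 9: (20, 20)}
print(radial_search(areas, 1, radius=6))  # area 5 first, then area 4; area 9 is out of range
```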
To subdivide an area, you need a rectangle that you can split along its axes. Look at k-d trees, R-trees, quadtrees, and spatial indexes. I can also recommend my PHP Hilbert curve class. The Hilbert curve is a "monster" (space-filling) curve that completely fills the plane. You can find the class at phpclasses.org.
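The Hilbert-curve idea can be sketched in Python. This is not the phpclasses.org class, whose API isn't shown here. It is the standard bit-manipulation conversion that maps a cell `(x, y)` on an `n x n` grid (`n` a power of two) to its position `d` along the curve. Consecutive positions are always adjacent cells, so sorting areas by `d` tends to keep nearby areas close together, though not always. `hilbert_index` and the grid positions are hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve covering an n x n grid.

    `n` must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Sort some areas (placed on a hypothetical 8 x 8 grid) by curve position.
areas = {"Area 1": (1, 1), "Area 4": (2, 1), "Area 5": (1, 2), "Area 9": (7, 6)}
print(sorted(areas, key=lambda name: hilbert_index(8, *areas[name])))
```

A range query on the stored index then approximates a proximity search, which is why curve orders like this are used in spatial indexes.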
I finally solved it using a kind of "neighboring" SELECT statement.
I did this by creating another table that contains the neighbor relationships. The table looks like this: [table_id, area_id, neighbor_area_id]
I added all the available neighbor pairs to it. With some INNER JOINs and a subquery I managed to get what I wanted, so a search can be done for all areas neighboring the selected one.
The SQL statement looked like this:
SELECT adds.title, categories.title, areas.title
FROM adds
INNER JOIN categories ON categories.category_id = adds.category_id
INNER JOIN areas ON adds.area_id = areas.area_id
WHERE areas.area_id IN (SELECT area_neighbors.neighbor_area_id
                        FROM area_neighbors
                        WHERE area_neighbors.area_id = 25)
   OR adds.area_id = 25
This gives me all ads in area 25 and in the areas neighboring it.
I can't say whether this is the smartest or best solution, but it works for me. Hope this helps someone, and thanks for all the replies!
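The neighbor-table approach can be tried end to end with Python's built-in `sqlite3` module. This is a minimal sketch with a few assumptions. The table and column names follow the post, but the sample rows and the `add_id` column are made up. Each closeness relation is stored in both directions (the hypothetical `add_neighbors` helper), so the lookup works whichever area is selected. The query only follows one hop, so a cycle like 1-4-5 never causes a loop.

```python
import sqlite3

SCHEMA = """
CREATE TABLE areas (area_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE adds (add_id INTEGER PRIMARY KEY, title TEXT,
                   category_id INTEGER, area_id INTEGER);
CREATE TABLE area_neighbors (table_id INTEGER PRIMARY KEY,
                             area_id INTEGER, neighbor_area_id INTEGER);
"""

QUERY = """
SELECT adds.title, categories.title, areas.title
FROM adds
INNER JOIN categories ON categories.category_id = adds.category_id
INNER JOIN areas ON adds.area_id = areas.area_id
WHERE areas.area_id IN (SELECT area_neighbors.neighbor_area_id
                        FROM area_neighbors
                        WHERE area_neighbors.area_id = ?)
   OR adds.area_id = ?
ORDER BY adds.add_id
"""


def add_neighbors(conn, a, b):
    """Store a closeness relation in both directions (a -> b and b -> a)."""
    conn.executemany(
        "INSERT INTO area_neighbors (area_id, neighbor_area_id) VALUES (?, ?)",
        [(a, b), (b, a)],
    )


def ads_near(conn, area_id):
    """Ads in `area_id` itself or in any of its direct neighbors."""
    return conn.execute(QUERY, (area_id, area_id)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO areas VALUES (?, ?)",
                 [(1, "Area 1"), (4, "Area 4"), (5, "Area 5"), (9, "Area 9")])
conn.execute("INSERT INTO categories VALUES (1, 'Bikes')")
conn.executemany("INSERT INTO adds (title, category_id, area_id) VALUES (?, ?, ?)",
                 [("Red bike", 1, 1), ("Blue bike", 1, 4), ("Old bike", 1, 9)])
add_neighbors(conn, 1, 4)
add_neighbors(conn, 1, 5)
add_neighbors(conn, 4, 5)
print(ads_near(conn, 5))
```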
Related question
I have a question very similar to this one:
Search GROUP_CONCAT using LIKE
I think my question is pretty much identical, but in CodeIgniter, so I'm looking for the same answer translated into CodeIgniter's Active Record syntax...
My question
To provide some background, in my case I have a many-to-many relationship with three tables:
companies which has two fields (company_id and company_name)
sectors which has two fields (sector_id and text)
companies_sectors which has two fields (company_id and sector_id)
(One company can operate in multiple sectors, multiple companies operate in the same sector.)
I have grouped by company to show sectors.text as a GROUP_CONCAT field, and I have given this concatenated field an alias at the select level:
$this->db->select('sectors.sector_id, GROUP_CONCAT(DISTINCT sectors.text SEPARATOR "; ") as sector_text', false);
I want to include a filter that selects rows where sector_text (the GROUP_CONCAT field) includes the text from the query form. I understand that, because I want to filter on an aggregated list, I should use HAVING and not WHERE. Per the answer to the question linked above, MySQL supports HAVING ... LIKE, but I was struggling to replicate this in CodeIgniter's Active Record. In fact, my understanding is that CodeIgniter's ->like() produces WHERE x LIKE y, which is not what I am looking for...
At the moment I am only using a LIKE:
$this->db->like('sectors.text', $this->input->post('sector_text') );
But this filters before the grouping, which means the output will only show the sector that was searched for. For example, if Company A operates in "fishing" and "shipping" while Company B operates only in "fishing", and a user searched for "fishing", I want the result to show:
Company A - Fishing; Shipping
Company B - Fishing
(This is the desired result!)
But at the moment I am only getting:
Company A - Fishing
Company B - Fishing
... which I think is because I used like(), which filters before grouping?
Can someone please assist? Many thanks in advance!
PS: If I can also use the alias sector_text instead of sectors.text, that would be ideal (I think I have read that HAVING allows you to use the alias?)
I found the answer! I just modified the "LIKE" to:
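$sector_text = $this->input->post('sector_text');
// escape() quotes and escapes the user input so it cannot break out of the SQL string
$this->db->having('sector_text LIKE ' . $this->db->escape('%' . $sector_text . '%'));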
It feels a little off doing it this way rather than purely through Active Record, but it works. If there is a solution that stays entirely within Active Record, please let me know, as I would probably prefer that!
Many thanks!
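To see why HAVING gives the desired result where WHERE/like() did not, here is a small reproduction using Python's `sqlite3` with the question's three tables. The sample rows are made up. SQLite's `group_concat` does not accept `DISTINCT` together with a custom separator, so `DISTINCT` is dropped here. The concatenated expression is also repeated in HAVING instead of relying on the alias. The search term is passed as a bound parameter rather than interpolated into the SQL.

```python
import sqlite3


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies (company_id INTEGER PRIMARY KEY, company_name TEXT);
        CREATE TABLE sectors (sector_id INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE companies_sectors (company_id INTEGER, sector_id INTEGER);
        INSERT INTO companies VALUES (1, 'Company A'), (2, 'Company B');
        INSERT INTO sectors VALUES (1, 'Fishing'), (2, 'Shipping');
        INSERT INTO companies_sectors VALUES (1, 1), (1, 2), (2, 1);
    """)
    return conn


BASE = """
SELECT c.company_name, group_concat(s.text, '; ') AS sector_text
FROM companies c
JOIN companies_sectors cs ON cs.company_id = c.company_id
JOIN sectors s ON s.sector_id = cs.sector_id
{where}
GROUP BY c.company_id, c.company_name
{having}
ORDER BY c.company_name
"""


def filter_with_where(conn, term):
    # Filters individual sector rows before grouping: other sectors disappear.
    sql = BASE.format(where="WHERE s.text LIKE ?", having="")
    return conn.execute(sql, (f"%{term}%",)).fetchall()


def filter_with_having(conn, term):
    # Filters whole groups after grouping: every sector of a matching company is kept.
    sql = BASE.format(where="", having="HAVING group_concat(s.text, '; ') LIKE ?")
    return conn.execute(sql, (f"%{term}%",)).fetchall()


conn = build_demo()
print(filter_with_where(conn, "fishing"))
print(filter_with_having(conn, "fishing"))
```

SQLite does not guarantee the order of values inside `group_concat`, so Company A may come back as "Fishing; Shipping" or "Shipping; Fishing".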
I'm currently building a webshop. The shop lets users filter products by category, plus a couple of optional additional filters such as brand, color, etc.
At the moment, various properties are stored in different places, but I'd like to switch to a tag-based system. Ideally, my database should store tags with the following data:
product_id
tag_url_alias (unique)
tag_type (unique) (category, product_brand, product_color, etc.)
tag_value (not unique)
First objective
I would like to search for product_ids that are associated with anywhere between 1 and 5 particular tags. The tags are extracted from an SEO-friendly URL, so I will be retrieving a unique string (the tag_url_alias) for each tag, but I won't know the tag_type.
The search will be an intersection, so it should return the product_ids that match all of the provided tags.
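One common way to express "products that carry all of these tags" in SQL is relational division with GROUP BY and HAVING. Here is a sketch using Python's `sqlite3`. It assumes a hypothetical link table `product_tags(product_id, tag_url_alias)` with one row per product/tag pair, so the alias is unique per tag but not per row. `products_with_all_tags` and the sample rows are illustrative.

```python
import sqlite3


def products_with_all_tags(conn, aliases):
    """Return product ids tagged with every alias in `aliases` (an intersection)."""
    aliases = list(dict.fromkeys(aliases))  # drop duplicate aliases, keep order
    if not aliases:
        raise ValueError("at least one tag alias is required")
    placeholders = ", ".join("?" for _ in aliases)
    sql = f"""
        SELECT product_id
        FROM product_tags
        WHERE tag_url_alias IN ({placeholders})
        GROUP BY product_id
        HAVING COUNT(DISTINCT tag_url_alias) = ?
        ORDER BY product_id
    """
    rows = conn.execute(sql, (*aliases, len(aliases))).fetchall()
    return [product_id for (product_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_tags (product_id INTEGER, tag_url_alias TEXT)")
conn.executemany("INSERT INTO product_tags VALUES (?, ?)", [
    (1, "shoe"), (1, "black"), (1, "adidas"),
    (2, "shoe"), (2, "black"), (2, "nike"),
    (3, "shoe"), (3, "white"), (3, "adidas"),
])
print(products_with_all_tags(conn, ["shoe", "black", "adidas"]))  # [1]
```

An index on `(tag_url_alias, product_id)` would let the WHERE step avoid a full scan. How well this holds up at 100,000 products is untested here.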
Second objective
Besides displaying the products that match the current filter, I would also like to display the product-count for other categories and filters which the user might supply.
For instance, my current search is for products that match the tags:
Shoe + Black + Adidas
Now, a visitor of the shop might be looking at the resulting products and wonder which black shoes other brands have to offer. So they might go to the "brand" filter and choose one of the other listed brands. Let's say there are 2 other options (in practice there will probably be many more), resulting in the following searches:
Shoe + Black + Nike > 103 results
Shoe + Black + K-swiss > 0 results
In this case, if they see the brand "K-swiss" listed as an available choice in their filter and pick it, their search will return 0 results.
This is obviously rather disappointing to the user... I'd much rather know in advance that switching the brand from "Adidas" to "K-swiss" will return 0 results, and simply remove that option from the filter.
Same thing goes for categories, colors, etc.
In practice, this means a single page view would return not only the filtered product list described in my first objective, but potentially hundreds of similar yet different lists: one for each filter value that could replace an existing filter value or be added to the existing ones.
Capacity
I suspect my database will eventually contain:
between 250 and 1,000 unique tags
And it will contain:
between 10,000 and 100,000 unique products
Current Ideas
I did some Google searches and found the following article: http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
Judging by that article, running hundreds of queries to achieve the second objective is going to be painfully slow. The "toxy" example might work for my needs and might be acceptable for my first objective, but it would be unacceptably slow for the second.
I was thinking I might run individual queries that match one tag to its associated product_ids, cache those results, and then calculate intersections on them. But should I calculate these intersections in MySQL or in PHP? If I use MySQL, is there a particular way I should cache these individual queries, or are the right indexes all I need?
I imagine it's also possible to cache the intersections between two of these tag/product_id sets. The number of intersections would be limited by the fact that a tag_type can have only one particular value, but I'm not sure how to manage this type of caching efficiently. Again, I don't know whether I should do this in MySQL or in PHP, and if in MySQL, what the best way would be to store and combine these cached results.
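If the per-tag product sets are cached (for example in memcached), the intersections are cheap to compute in application code. Here is a minimal Python sketch under two assumptions: a hypothetical in-memory `tag_products` cache maps each tag alias to its set of product ids, and `tag_types` maps each alias to its type. It computes the current result and, for every alternative value of one tag type, how many products the modified filter would return. Values with a count of 0 can then be hidden.

```python
def filter_products(tag_products, tags):
    """Intersect the cached product-id sets of all selected tags."""
    sets = [tag_products[t] for t in tags]
    return set.intersection(*sets) if sets else set()


def facet_counts(tag_products, tag_types, selected, facet_type):
    """For each tag of `facet_type`, count products if it replaced (or joined) the filter.

    Any selected tag of `facet_type` is swapped out for the candidate value.
    """
    base = [t for t in selected if tag_types[t] != facet_type]
    if base:
        base_set = filter_products(tag_products, base)
    else:
        base_set = set().union(*tag_products.values())  # no other filters: all products
    return {tag: len(base_set & tag_products[tag])
            for tag, ttype in tag_types.items() if ttype == facet_type}


tag_products = {"shoe": {1, 2, 3, 4}, "black": {1, 2, 4}, "adidas": {1, 3},
                "nike": {2, 4}, "k-swiss": {5}}
tag_types = {"shoe": "category", "black": "product_color", "adidas": "product_brand",
             "nike": "product_brand", "k-swiss": "product_brand"}
selected = ["shoe", "black", "adidas"]

print(filter_products(tag_products, selected))  # {1}
counts = facet_counts(tag_products, tag_types, selected, "product_brand")
print({tag: n for tag, n in counts.items() if n > 0})  # brands worth showing
```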
The Sphinx search engine can do this for you. It is very fast and can even handle word forms, which can be useful for SEO requests.
In Sphinx terms, make a "product" document, index it by tags, choose the proper ranker for the query (e.g. MATCH_ALL_WORDS), and run a batch request with the different tag combinations to get the best results.
Don't forget to use a cache such as memcached.
I haven't tested this yet, but it should be possible to satisfy your second objective with one query rather than several hundred...
The query below illustrates how this should work in general.
The idea is to combine the different requests into one, group by the value of the tag type being varied (here, the brand), and collect only those values that have results.
SELECT t3.tag_value, COUNT(*) AS product_count
FROM tagtable t1, tagtable t2, tagtable t3
WHERE t1.product_id = t2.product_id
  AND t2.product_id = t3.product_id
  AND t1.tag_type = 'category' AND t1.tag_value = 'Shoe'
  AND t2.tag_type = 'product_color' AND t2.tag_value = 'Black'
  AND t3.tag_type = 'product_brand'
GROUP BY t3.tag_value
HAVING COUNT(*) > 0
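A quick way to check this query is to run it against a toy `tagtable` in SQLite from Python. The rows are invented, the color and category values are bound as parameters, and an ORDER BY is added for stable output. Brands with no matching products (K-swiss for black shoes here) simply do not appear, so the shop can hide them. This handles one facet (brand) per query, so each other facet type would need its own query of the same shape. That is still far fewer queries than one per value.

```python
import sqlite3

FACET_SQL = """
SELECT t3.tag_value, COUNT(*) AS product_count
FROM tagtable t1, tagtable t2, tagtable t3
WHERE t1.product_id = t2.product_id
  AND t2.product_id = t3.product_id
  AND t1.tag_type = 'category' AND t1.tag_value = ?
  AND t2.tag_type = 'product_color' AND t2.tag_value = ?
  AND t3.tag_type = 'product_brand'
GROUP BY t3.tag_value
HAVING COUNT(*) > 0
ORDER BY t3.tag_value
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tagtable (product_id INTEGER, tag_type TEXT, tag_value TEXT)")
rows = []
for pid, (cat, color, brand) in enumerate([
    ("Shoe", "Black", "Adidas"),
    ("Shoe", "Black", "Nike"),
    ("Shoe", "Black", "Nike"),
    ("Shoe", "White", "K-swiss"),
    ("Shirt", "Black", "K-swiss"),
], start=1):
    # Three tag rows per product: its category, color and brand.
    rows += [(pid, "category", cat), (pid, "product_color", color),
             (pid, "product_brand", brand)]
conn.executemany("INSERT INTO tagtable VALUES (?, ?, ?)", rows)

print(conn.execute(FACET_SQL, ("Shoe", "Black")).fetchall())  # [('Adidas', 1), ('Nike', 2)]
```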
I'm really hoping someone can help me with this. I have a number of product attribute types that users can select from to refine the products returned to them on screen. For each product attribute type, I want to list all attributes that relate to either the selected category or the search term. Once the user has made their selections, I still want to display every attribute that relates to the category or search term. It should be a clickable link only if its product count is greater than zero. Attributes with a product count of zero should still be listed, but unclickable. An example of what I'm trying to achieve can be found in the left-hand menu of the ASOS website:
http://www.asos.com/Women/Dresses/Cat/pgecategory.aspx?cid=8799#state=Rf961%3D3340%2C3341%40Rf-200%3D20&parentID=Rf-300&pge=0&pgeSize=20&sort=-1
Initially I tried using just joins to achieve this, but I couldn't get it to work. So I decided to create a temporary table for each attribute type, holding a list of all the attributes that relate to the main query. I then ran a refined query against it with a LEFT JOIN. Here's my code:
$query = "CREATE TEMPORARY TABLE temp_table
    SELECT su_types.id, type AS item FROM su_types
    INNER JOIN su_typerefs ON su_types.id = su_typerefs.id
    INNER JOIN su_pref ON su_typerefs.mykey = su_pref.mykey
    WHERE wp_category_id = 40 GROUP BY su_typerefs.id";
$sudb->query($query);
if ($sudb->affected_rows > 0) {
    $query = "SELECT temp_table.id, item, COUNT(su_typerefs.mykey) AS product_count FROM temp_table
        LEFT JOIN su_typerefs ON temp_table.id = su_typerefs.id
        LEFT JOIN su_pref ON su_typerefs.mykey = su_pref.mykey
        LEFT JOIN su_stylerefs ON su_pref.mykey = su_stylerefs.mykey
        LEFT JOIN su_productrefs ON su_pref.mykey = su_productrefs.mykey
        WHERE wp_category_id = 40 AND su_stylerefs.id IN (91) AND su_productrefs.id IN (54) AND su_typerefs.id IN (159)
        GROUP BY su_typerefs.id";
    if ($itemresults = $sudb->query($query)) {
        while ($itemresult = $itemresults->fetch_array(MYSQLI_ASSOC)) {
            $id = $itemresult['id'];
            $item = $itemresult['item'];
            $product_count = $itemresult['product_count'];
            build_link($list_type, $item, $product_count, $id);
        }
    }
}
In the above example the first query selects all the product types that relate to a particular category, say dresses. And the second query is based on the refinements the user has made on the category, in this example this is product, product type and style. A user can also refine their search by colour, fit, fabric and design.
There are a couple of issues with this:
1) The number of results returned by the second query does not match the results of the first. Using the above as an example, I wish to list all products that relate to the chosen category, then use the second query to return the product count for each of these products, as described above. So if the temporary table returns trousers, jeans and skirts, I expect these three items to be displayed on screen based on the conditions applied in the second query. However, my results may only show trousers and jeans if there is no match for skirts in the second query. I thought that using a LEFT JOIN would mean that all the results of the temporary table would be displayed.
2) Also, I wonder if I'm doing this the most efficient way. I have a total of 8 attribute groups and therefore need to do the above 8 times. If the user chooses to refine the results using all 8 attribute groups, then in addition to the temp table join there will be a total of 9 joins for each type. It's taking a while to execute; is there a better way to do this? There are approximately half a million products in the table, and this will probably grow to five times that once my site goes live.
I really hope all that I have written makes sense and I'd really appreciate the stackoverflow community's help with this, if anyone can help. I apologise for the essay ;). Thanks in advance
To answer your first question; yes, a LEFT JOIN will indeed keep all data from the initial table. That, however, isn't the problem.
The reason you lose empty categories is most likely the WHERE condition, which filters out results based on the data in the joined tables. (I say "most likely" because I don't fully know your DB structure.)
If all items for a category get filtered out (possibly including the NULL values produced by the LEFT JOIN), you will no longer get that category back from the query. Also, the GROUP BY is done on a joined column, which might effectively wipe out your other categories as well.
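The effect described above can be reproduced with a toy schema in Python's `sqlite3`. The `types` and `products` tables are hypothetical simplifications of the question's `su_*` tables, not the real schema. Putting the filter in WHERE drops the type that has no matching rows. Putting it in the LEFT JOIN's ON clause, and grouping by the base table's id, keeps it with a count of 0. Applying this to the real multi-table schema would need adapting (for example a filtered subquery per attribute) and is untested here.

```python
import sqlite3


def build_db():
    """Toy stand-in for the question's schema: product types plus products with a style."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE types (id INTEGER PRIMARY KEY, item TEXT);
        CREATE TABLE products (mykey INTEGER PRIMARY KEY, type_id INTEGER, style_id INTEGER);
        INSERT INTO types VALUES (1, 'trousers'), (2, 'jeans'), (3, 'skirts');
        INSERT INTO products VALUES (10, 1, 91), (11, 1, 91), (12, 2, 91), (13, 3, 92);
    """)
    return conn


def counts_filter_in_where(conn, style_id):
    # The style test runs after the join, so 'skirts' (no style-91 rows) is dropped.
    return conn.execute("""
        SELECT t.item, COUNT(p.mykey) AS product_count
        FROM types t
        LEFT JOIN products p ON p.type_id = t.id
        WHERE p.style_id = ?
        GROUP BY t.id, t.item
        ORDER BY t.id
    """, (style_id,)).fetchall()


def counts_filter_in_on(conn, style_id):
    # The style test is part of the join condition, so unmatched types survive with 0.
    return conn.execute("""
        SELECT t.item, COUNT(p.mykey) AS product_count
        FROM types t
        LEFT JOIN products p ON p.type_id = t.id AND p.style_id = ?
        GROUP BY t.id, t.item
        ORDER BY t.id
    """, (style_id,)).fetchall()


conn = build_db()
print(counts_filter_in_where(conn, 91))  # [('trousers', 2), ('jeans', 1)]
print(counts_filter_in_on(conn, 91))     # [('trousers', 2), ('jeans', 1), ('skirts', 0)]
```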
As for the second question, you already say it's slow, so it's probably not the way to go if you want things to be fast ;) (okay, obvious answer, low-hanging fruit, etc.). What you might want to do is first get a collection of keys from the filterable categories, and then use that data to select items.
This saves you from joining your entire products table into a temp table (at least, that's what I think you're doing), which will of course take long with that many rows. Selecting a list of matching IDs from the given attributes also lets you make (more) use of your indexes, which a temp table probably won't have. Whether this is possible and feasible depends mainly on your schema's structure, but I hope it points you in the direction you want to go :)
I want all nodes of a certain type to have a rank (or at least to be sortable in Views by this rank). The rank is a score based on several criteria: the Voting API (5-star rating) average, the Voting API number of votes, the number of comments, etc. Any suggestions on how to achieve this?
You would have to create your own Views sort handler, in which you calculate the ranking score and sort the nodes by it. Start by looking at the Views documentation, then find some modules that do something similar and look at their code. Views can be a bit overwhelming at first, but stick with it and experiment, and you will find out how to do it.
This may help you
Fivestar uses percentage voting. You want to sort by the results of the vote: use the "Voting API percent vote result (average)" field, and it should do the trick.
Views is the simplest way to do this: look at the sort criteria, then add the fields that you want.
We often see "related items". For instance, blogs have related posts, book sites have related books, etc. My question is: how is that relevance computed? It can't just be tags, because I often see related items that don't share a tag. For instance, when searching for "pink", a related item could have a "purple" tag.
Does anyone have any idea?
There are many ways to calculate similarity of two items, but for a straightforward method, take a look at the Jaccard Coefficient.
http://en.wikipedia.org/wiki/Jaccard_index
Which is: J(A,B) = |intersection(A,B)| / |union(A,B)|, i.e. the number of shared tags divided by the number of distinct tags across both items.
So let's say you want to compute the coefficient of two items:
Item A, which has the tags "books, school, pencil, textbook, reading"
Item B, which has the tags "books, reading, autobiography"
intersection(A,B) = books, reading
union(A,B) = books, school, pencil, textbook, reading, autobiography
so J(A,B) = 2/6 ≈ 0.333
So the most related item to A would be the item which results in the highest Jaccard Coefficient when paired with A.
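The calculation can be written directly with Python sets. `jaccard` and `most_related` are illustrative names, and returning 0.0 for two empty tag sets is an arbitrary convention chosen for this sketch.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two tag collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty tag sets
    return len(a & b) / len(a | b)


def most_related(item, candidates):
    """Name of the candidate whose tags give the highest Jaccard score with `item`."""
    return max(candidates, key=lambda name: jaccard(item, candidates[name]))


item_a = ["books", "school", "pencil", "textbook", "reading"]
item_b = ["books", "reading", "autobiography"]
print(round(jaccard(item_a, item_b), 3))  # 0.333
```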
Here are some of the ways:
Manually connecting them. Put up a table with the fields item_id and related_item_id, then make an interface to insert the connections. Useful to relate two items that are related but have no resemblance or do not belong to the same category/tag (or in an uncategorized entry table). Example: Bath tub and rubber ducky
Pull up some items that belong to the same category or have a similar tag. The idea is that those items must be somewhat related since they are in the same category. Example: in the page viewing LCD monitors, there are random LCD monitors (with same price range/manufacturer/resolution) in the "Related items" section.
Do a text search matching the current item's name (and/or description) against other items in the table. You get the idea.
To get a simple list of related items based on tags, the basic solution goes like this:
3 tables, one with items, one with tags and one with the connection. The connection table consists of two columns, one for each id from the remaining tables. An entry in the connection table links a tag with an item by putting their respective ids in a row.
Now, to get that list of related items.
Fetch all items that share at least one tag with the original item. Be sure to fetch the tags along with the items, and then use a simple rating mechanism to determine which item shares the most tags with the original one: each shared tag increases the relation relevancy by one.
Depending on your tagging habits, it might be smart to add a counter-mechanism that prevents large, overarching tags from skewing the relevancy. To achieve this, you could give greater weight to tags below a certain threshold of applications (the number of items a tag has been applied to). A threshold that has generally worked nicely for me is total_number_of_tag_applications / total_number_of_tags, which is the average number of applications per tag. If a tag's application count is below that average, it increases the relation relevancy by two instead of one.
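Here is a minimal Python sketch of this shared-tag scoring, including the suggested weighting. A tag whose application count is below the average (total tag applications / number of distinct tags) counts double. The in-memory `item_tags` mapping stands in for the three-table items/tags/connection schema and is hypothetical.

```python
from collections import Counter


def related_items(item_tags, item_id):
    """Rank other items by shared tags, counting below-average-usage tags twice.

    `item_tags` maps item id -> set of tags (stand-in for the connection table).
    """
    usage = Counter(tag for tags in item_tags.values() for tag in tags)
    average = sum(usage.values()) / len(usage)  # total applications / distinct tags
    scores = {}
    for other, tags in item_tags.items():
        if other == item_id:
            continue
        shared = item_tags[item_id] & tags
        if shared:
            scores[other] = sum(2 if usage[t] < average else 1 for t in shared)
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


item_tags = {
    "post1": {"php", "mysql", "tagging"},
    "post2": {"php", "arrays"},
    "post3": {"php", "tagging"},
    "post4": {"php", "css"},
    "post5": {"mysql", "css"},
}
print(related_items(item_tags, "post1"))
```

Here "php" is on 4 of 5 posts (above the 2.2 average), so sharing it adds only 1, while the rarer "mysql" and "tagging" each add 2.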
It can be based on more than tags; for example, it can use an average over each word appearing in a paragraph, then titles, etc.
I would say they use an ontology for that, which adds more useful features to the application.
It can also be based on "people who bought this book also bought".
No matter how it's done, you will need some sort of connection between your items, and those connections will mostly be made by human beings.
This is my implementation (Gist) of the Jaccard index with PostgreSQL and Ruby on Rails...
Here is an implementation of the Jaccard index between two texts based on bigrams:
https://packagist.org/packages/darkopetreski/textcategorization
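For free text rather than tag lists, the same coefficient can be applied to sets of character bigrams. This is a generic Python sketch of that idea, not the API of the linked package. Lowercasing and stripping whitespace are choices made here for illustration.

```python
def bigrams(text):
    """Set of adjacent character pairs in `text`, lowercased, whitespace removed."""
    s = "".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigram_jaccard(a, b):
    """Jaccard coefficient of the two texts' bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


print(bigram_jaccard("night", "nacht"))  # only "ht" is shared: 1 of 7 distinct bigrams
```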