omit search results

I am using the Foursquare API with PHP.
I am performing a search for venues in the nightlife category:
$params = array("near"=>"92101", "radius"=>"800", "intent"=>"checkin",
"categoryId"=>"4d4b7105d754a06376d81259", "limit"=>"50");
$venues = $foursquare->GetPublic("venues/search", $params);
This works as expected... kind of. The problem is that restaurants that have been sub-categorized as bars are filling up my return limit, so in that search I may only get a few actual nightlife venues. It would be very helpful if I could omit venues that have certain categories: get 50 nightlife venues, but not the ones also labeled as food.
I have searched around and keep re-reading the search endpoint page, hoping I overlooked an omit feature. Any help?

We have had the same problem (with different category types).
What we ended up doing is performing several searches with specific categories. The categoryId field accepts multiple comma-delimited categories, so we sometimes executed up to 3 searches with multiple categoryIds.
So instead of asking for a single category, your request would look like this (no 'bars'; I just picked a couple of random nightlife categories):
$params = array("near"=>"92101", "radius"=>"800", "intent"=>"checkin",
"categoryId"=>"4bf58dd8d48988d11f941735,4bf58dd8d48988d121941735,...", "limit"=>"50");
And then do another request with the general nightlife category:
$params = array("near"=>"92101", "radius"=>"800", "intent"=>"checkin",
"categoryId"=>"4d4b7105d754a06376d81259", "limit"=>"50");
And merge the results.
Two things to note with this solution:
You will probably get overlapping results from multiple searches, as venues sometimes have more than one category (as you found out already), so remember to handle the duplicates. We first scanned all the results and kept only the unique ones according to the Foursquare ID, then started our processing.
This solution does not scale well with the Foursquare API: doing 40 searches will not work. (But there is no other way of getting what you need without it, so I am still writing the entire solution here.)
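The merge step could look roughly like the sketch below. It assumes each search response has already been decoded into a plain array of venues, and that every venue carries an 'id' and a 'categories' list whose entries have their own 'id'. The helper name mergeVenues and the sample IDs are made up for illustration. Because the search endpoint has no omit option, the sketch also drops unwanted venues on the client side; the excluded IDs have to be the category IDs exactly as they appear on the venues.

```php
<?php
// Hypothetical helper: merge several decoded venue lists, keep one entry per
// venue ID, and drop venues that carry any of the excluded category IDs.
function mergeVenues(array $resultSets, array $excludedCategoryIds = array())
{
    $merged = array();
    foreach ($resultSets as $venues) {
        foreach ($venues as $venue) {
            if (isset($merged[$venue['id']])) {
                continue; // already kept from an earlier search
            }
            $categoryIds = array();
            foreach ($venue['categories'] as $category) {
                $categoryIds[] = $category['id'];
            }
            if (array_intersect($categoryIds, $excludedCategoryIds)) {
                continue; // venue is also labeled with an unwanted category
            }
            $merged[$venue['id']] = $venue;
        }
    }
    return array_values($merged);
}

// Made-up sample data standing in for two decoded search responses.
$searchA = array(
    array('id' => 'v1', 'categories' => array(array('id' => 'cat-bar'))),
    array('id' => 'v2', 'categories' => array(array('id' => 'cat-bar'), array('id' => 'cat-food'))),
);
$searchB = array(
    array('id' => 'v1', 'categories' => array(array('id' => 'cat-bar'))),
    array('id' => 'v3', 'categories' => array(array('id' => 'cat-club'))),
);
$venues = mergeVenues(array($searchA, $searchB), array('cat-food'));
```

Filtering after the fact means a page of 50 results can shrink, which is one more reason to run the additional, more specific searches.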

Related

PHP - Sort and order an array of objects by category name

I am learning PHP and have decided to code my own OOP MVC framework. Now, I have realized several times already that it might not be the smartest move but I mean to see this out to the end. And then onwards...
My issue is creating a listing sidebar based on categories and a second based on year-month-postname.
I am officially stuck on the first one, let alone the second damn option. I have included some code and a description of what I have tried. The lack of OOP info on the net is daunting, or maybe it is because I am searching for the wrong thing, I don't know. But the tutorials have not given me any insight as to how to actually do this in a way where my database is in a model file and my class logic is in the class file.
Sorting logic should be like this Array-Object-Propertyname-Value.
The value, as I hope is easy to understand in my example below, is the category name, e.g. JavaScript, PHP, HTML. By that category I wish to sort my blog posts, but not in a way that requires me to manually input the category names into the code. I want to allow users to enter categories if they so choose.
I also wish to display the blog posts inside said category, let's say the 5 most recent. But that should not be too hard with a
for ($i = 0; $i < 5; $i++)
nested inside whatever solution in the end will work for the category sort.
I have tried MySQLi procedural solutions drawn from multiple Google searches and tutorials. I can do it that way, but I don't want to do it procedurally. I tried a foreach loop and nesting multiple foreach loops, but I cannot get around either the problem of duplicates based on the shared category name or, when grouping in the SQL query, the results with the same category being grouped so that only the first one in each group is displayed. While loops work with procedural mysqli, but with PDO in my case they produce infinite loops, no matter what condition I try to set.
So foreach is the way to go, I believe. I have read up on loops and array sorting but I've yet to find a solution. I thought of sorting by key because that is what I need, but found no applicable solutions.
It's easy to display the category names and dates and all that. But with category I always get duplicates.
I've tried some logic where I assign category names to variables, only to have them all be different variables, meaning I still have duplicates, or to have the variable overwritten with each iteration.
Also, array sorts haven't worked, because I haven't gotten any to work with sorting either on a property or after converting the objects to a multidimensional array. Granted, that may be because I am a beginner and have not understood the syntax, but I am not going to post them all here.
If you think an array sorting function will do the trick then perhaps give me an example and I will look into it with some new perspective hopefully.
PDO query :
'SELECT * FROM postTable
INNER JOIN userTable ON postTable.postUserId = userTable.id
INNER JOIN postCategories ON postTable.postCategoryId = postCategories.categoryName
ORDER BY postTable.postDate DESC'
I also tried adding
GROUP BY categoryName
but that resulted in only one entry per category being shown when using var_dump. Side note: the same happens when grouping by creation date. Is there another layer added to the array when using GROUP BY in the SQL command, and did I miss that in the docs?
PDO returns to view file :
$this->stmt->fetchAll(PDO::FETCH_OBJ);
this all gets passed into an array of
$results
and then that is sent to the php on the view page where the resulting array has this structure with var_dump.
array() {
[0]=>
object(stdClass)# () {
["categoryName"]=>
string() "Help"
}
[1]=>
object(stdClass)# () {
["categoryName"]=>
string() "Me"
}
and so on.
Note - also tried using -
fetchAll(PDO::FETCH_ASSOC);
But I've had similar failures with any attempt at sorting, or at showing each category name only once with all entries under that category displayed correctly, not just one per category.
I will be checking back when I finish work tomorrow, so in about 20-22 hours from the time of posting.
If you need any more info just let me know and I'll post it.

You can order by multiple columns. Use:
ORDER BY categoryName, postDate DESC
This will keep all the posts in the same category together, and in decreasing date order within each category.
See How can i list has same id data with while loop in PHP? for how you can output the results, showing a heading for each category.
If you just want to get the 5 latest posts in each category, see Using LIMIT within GROUP BY to get N results per group?
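On the PHP side, the rows from fetchAll(PDO::FETCH_OBJ) can then be grouped under one heading per category. The following is only a sketch under the assumption that the rows arrive ordered as above; groupByCategory and the postTitle property are hypothetical names, and the sample rows stand in for the real $results.

```php
<?php
// Group rows (already ordered by categoryName, postDate DESC) under their
// category name and keep at most $perCategory posts per category.
function groupByCategory(array $rows, $perCategory = 5)
{
    $grouped = array();
    foreach ($rows as $row) {
        $name = $row->categoryName;
        if (!isset($grouped[$name])) {
            $grouped[$name] = array(); // first time this category is seen
        }
        if (count($grouped[$name]) < $perCategory) {
            $grouped[$name][] = $row;
        }
    }
    return $grouped;
}

// Sample rows standing in for the PDO result set.
$results = array(
    (object) array('categoryName' => 'HTML', 'postTitle' => 'Forms'),
    (object) array('categoryName' => 'PHP', 'postTitle' => 'Closures'),
    (object) array('categoryName' => 'PHP', 'postTitle' => 'Arrays'),
);

foreach (groupByCategory($results) as $category => $posts) {
    echo $category, "\n";              // one heading per category
    foreach ($posts as $post) {
        echo '  ', $post->postTitle, "\n";
    }
}
```

Because the category name becomes the array key, each heading appears once no matter how many posts share it, which avoids the duplicate problem without GROUP BY.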

Proper way of doing a Twitter-like follower and retweets system on CakePHP 2?

I'm super new to CakePHP. I've searched everywhere for this but I can't seem to be able to get it right, or find any sort of orientation. I still don't get how the whole HABTM thing works and I'm expecting to learn more from this.
I'm trying to do a Twitter-like system, with users and followers, and posts (tweets) and shares (retweets). I've set up the users and posts models, and join tables for followers (between users and users) and shares (between users and posts). How should I set up my model associations? I've been trying several ways but I'm not certain on whether I'm doing it right or not.
And the other question is, what would be the proper find query to get all posts by the people I follow, plus the posts they've shared, without getting all the unnecessary data like user info and such, just the posts in one array? Is it possible with find in one query, or should I do several and then merge the arrays? Plus, it would be extremely useful to understand how to properly filter and limit this rather complex query (obtaining a "posts timeline" between certain date ranges, limit the posts to a certain amount, or both).
I know my question is a little bit silly, but I swear I've done a lot of research and I can't seem to be able to get it right. So any help, especially with the query part, would be greatly appreciated.
Thanks!

So these would be some weird relationships. I feel you should have the following tables:
Users (with alias Followers) hasMany tweets
Posts belongsTo Users
UsersFollowers (A HABTM table)
To make this work on just three tables, Posts would need to be a threaded table. In essence, if a person retweets (shares) a post, a new record is created with the id of the original post in the new post's parent_id column. Then, when the record is fetched, the model can pull up the additional data and include it in the feed.
The alias aspect of Users allows for the follower part to be done in just one table. To find followers of a person, search with one key of the table (follower_id) and to find the people a person follows just search with the other key (users_id).
As for the second part of your question, finds should be pretty easy in this setup, but you might want to read up on Containable and threaded queries. You could include timestamp columns in the tables so you could later do a search by date feature (or a post timeline).
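To make the data model concrete, here is a framework-free sketch of the timeline logic using plain arrays in place of the tables. It is not CakePHP code; the column names (user_id, follower_id, parent_id, created) and the timeline function are assumptions for illustration only.

```php
<?php
// In-memory stand-ins for the join table and the threaded posts table.
$usersFollowers = array(
    array('user_id' => 2, 'follower_id' => 1), // user 1 follows user 2
    array('user_id' => 3, 'follower_id' => 1), // user 1 follows user 3
);
$posts = array(
    array('id' => 10, 'user_id' => 2, 'parent_id' => null, 'body' => 'Hello', 'created' => '2013-01-01 10:00:00'),
    array('id' => 11, 'user_id' => 4, 'parent_id' => null, 'body' => 'Original', 'created' => '2013-01-02 09:00:00'),
    // A share: a new row by user 3 pointing at post 11 through parent_id.
    array('id' => 12, 'user_id' => 3, 'parent_id' => 11, 'body' => null, 'created' => '2013-01-03 08:00:00'),
);

// Posts and shares by everyone $followerId follows, newest first.
function timeline($followerId, array $usersFollowers, array $posts, $limit = 20)
{
    $followed = array();
    foreach ($usersFollowers as $link) {
        if ($link['follower_id'] === $followerId) {
            $followed[] = $link['user_id'];
        }
    }
    $byId = array();
    foreach ($posts as $post) {
        $byId[$post['id']] = $post;
    }
    $feed = array();
    foreach ($posts as $post) {
        if (!in_array($post['user_id'], $followed, true)) {
            continue;
        }
        if ($post['parent_id'] !== null && isset($byId[$post['parent_id']])) {
            $post['original'] = $byId[$post['parent_id']]; // attach the shared post
        }
        $feed[] = $post;
    }
    usort($feed, function ($a, $b) {
        return strcmp($b['created'], $a['created']); // newest first
    });
    return array_slice($feed, 0, $limit);
}

$feed = timeline(1, $usersFollowers, $posts);
```

In a real application the same steps would be pushed into the query (a condition on the followed user ids, an ORDER BY on the timestamp and a LIMIT) rather than done in PHP.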

PHP, MySQL, Efficient tag-driven search algorithm

I'm currently building a webshop. This shop allows users to filter products by category, and by a couple of optional, additional filters such as brand, color, etc.
At the moment, various properties are stored in different places, but I'd like to switch to a tag-based system. Ideally, my database should store tags with the following data:
product_id
tag_url_alias (unique)
tag_type (unique) (category, product_brand, product_color, etc.)
tag_value (not unique)
First objective
I would like to search for product_ids that are associated with anywhere between 1 and 5 particular tags. The tags are extracted from an SEO-friendly URL. So I will be retrieving a unique string (the tag_url_alias) for each tag, but I won't know the tag_type.
The search will be an intersection, so my search should return the product_id's that match all of the provided tags.
Second objective
Besides displaying the products that match the current filter, I would also like to display the product-count for other categories and filters which the user might supply.
For instance, my current search is for products that match the tags:
Shoe + Black + Adidas
Now, a visitor of the shop might be looking at the resulting products and wonder which black shoes other brands have to offer. So they might go to the "brand" filter, and choose any of the other listed brands. Lets say they have 2 different options (in practice, this will probably have many more), resulting in the following searches:
Shoe + Black + Nike > 103 results
Shoe + Black + K-swiss > 0 results
In this case, if they see the brand "K-swiss" listed as an available choice in their filter, their search will return 0 results.
This is obviously rather disappointing to the user... I'd much rather know in advance that switching the "brand" from "adidas" to "k-swiss" will return 0 results, and simply remove the entire option from the filter.
Same thing goes for categories, colors, etc.
In practice this would mean a single page view would not only return the filtered product list described in my primary objective, but potentially hundreds of similar yet different lists. One for each filter value that could replace another filter value, or be added to the existing filter values.
Capacity
I suspect my database will eventually contain:
between 250 and 1.000 unique tags
And it will contain:
between 10.000 and 100.000 unique products
Current Ideas
I did some Google searches and found the following article: http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
Judging by that article, running hundreds of queries to achieve the 2nd objective, is going to be a painfully slow route. The "toxy" example might work for my needs and it might be acceptable for my First objective, but it would be unacceptably slow for the Second objective.
I was thinking I might run individual queries that match 1 tag to its associated product_ids, cache those queries, and then calculate intersections on the results. But do I calculate these intersections in MySQL or in PHP? If I use MySQL, is there a particular way I should cache these individual queries, or is supplying the right indexes all I need?
I would imagine it's also quite possible to maybe even cache the intersections between two of these tag/product_id sets. The amount of intersections would be limited by the fact that a tag_type can have only one particular value, but I'm not sure how to efficiently manage this type of caching. Again, I don't know if I should do this in MySQL or in PHP. And if I do this in MySQL, what would be the best way to store and combine this type of cached results?

Using the Sphinx search engine can do this magic for you. It is VERY fast, and it can even handle word forms, which can be useful with SEO requests.
In terms of Sphinx, make a document ("product"), index by tags, choose a proper ranker for the query (e.g. MATCH_ALL_WORDS), and run a batch request with different tag combinations to get the best results.
Don't forget to use a cache like memcached or any other.

I did not test this yet, but it should be possible to have one query to satisfy your second objective rather than triggering several hundred queries...
The query below illustrates how this should work in general.
The idea is to combine the three different requests at once and group by the dedicated value and collect only those which have any results.
SELECT t3.tag_value, COUNT(*) FROM tagtable t1, tagtable t2, tagtable t3 WHERE
t1.product_id = t2.product_id AND
t2.product_id = t3.product_id AND
t1.tag_type='yourcategoryforShoe' AND t1.tag_value='Shoe' AND
t2.tag_type='product_color' AND t2.tag_value='Black' AND
t3.tag_type='brand'
GROUP BY t3.tag_value
HAVING count(*) > 0
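If the per-tag product lists are cached, as the question considers, the same counts can also be computed in PHP with array intersections. This is a rough sketch only: $productsByTag stands for a hypothetical cache keyed by tag_url_alias, the IDs are made up, and both function names are illustrative.

```php
<?php
// Hypothetical cache: tag_url_alias => product_ids that carry the tag.
$productsByTag = array(
    'shoe'    => array(1, 2, 3, 4),
    'black'   => array(2, 3, 4, 7),
    'adidas'  => array(2, 9),
    'nike'    => array(3, 4),
    'k-swiss' => array(8),
);

// First objective: product_ids that match ALL of the given tags.
function productsMatchingAll(array $tags, array $productsByTag)
{
    $result = null;
    foreach ($tags as $tag) {
        $ids = isset($productsByTag[$tag]) ? $productsByTag[$tag] : array();
        $result = ($result === null) ? $ids : array_values(array_intersect($result, $ids));
    }
    return ($result === null) ? array() : $result;
}

// Second objective: for each candidate tag, how many products would remain
// if it were added to the fixed tags. Options with 0 results are left out.
function facetCounts(array $fixedTags, array $candidateTags, array $productsByTag)
{
    $base = productsMatchingAll($fixedTags, $productsByTag);
    $counts = array();
    foreach ($candidateTags as $tag) {
        $ids = isset($productsByTag[$tag]) ? $productsByTag[$tag] : array();
        $n = count(array_intersect($base, $ids));
        if ($n > 0) {
            $counts[$tag] = $n;
        }
    }
    return $counts;
}

$matches = productsMatchingAll(array('shoe', 'black', 'adidas'), $productsByTag);
$brands  = facetCounts(array('shoe', 'black'), array('adidas', 'nike', 'k-swiss'), $productsByTag);
```

The base set (Shoe + Black) is intersected once and reused for every brand, so each extra filter option costs one intersection instead of one full query. Whether this beats the single grouped SQL query at 100.000 products would need measuring.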

Weighing search results

PHP / MySQL backend. I've got a database full of movies YouTube-style. Each video has a name and category. Videos and categories have a m:n relationship.
I'd like my visitors to be able to search for videos by entering the search terms in one search field. I can't figure out how to return the best search results based on category matches and occurrences in the name.
What's the best way to go about something like this? Scoring? => Check for each search term whether it occurs in the name of the video; if so, award the video a point; check if the video is in categories that are also contained in the search query; if so, award it a point. Sort it by number points received? That sounds very expensive in terms of CPU usage.

Using Full-Text Search may help: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html#function_match
You can test several columns at once against an expression.

First, use full-text search. It can be either MySQL full-text search or some kind of external full-text search engine. I recommend Sphinx. It is very fast and simple, and it can even be integrated with MySQL using SphinxSE (so search indexes look like tables in MySQL). However, you have to install and configure it.
Second, think about splitting search results by search type. Any kind of full-text search will return a list of matched items sorted by relevancy. You can search by all fields and get a single list. This is a bad idea because hits by name and hits by category will be mixed. To solve this you can do multiple searches: search by name first, then search by category.
As a result you'll have two matching sets, and you have a lot of options for how to display them. Some ideas:
Merge the 2 sets based on the relevancy rate returned by the search engine. This looks like the result of one single query, but you know what each item is (name hit or category hit), so you can highlight this.
Do the same merge as above but assign different weights to the different sets, for example relevancy = 0.7*name_relevancy+0.3*category_relevancy. This will make search results more natural.
Split results into tabs/groups, e.g. 'There are N titles and M categories matching your query'.
Use bands when displaying results. For each page (assuming you are splitting search results using a paginator) display N items from the first set and M items from the second set (you can display the sets one by one or shuffle the items). If there are not enough items in one of the sets, just get more items from the other set, so there are always M+N items per page.
Any other way you can imagine.
And you can use this method for any kind of field: name, category, actor, director, etc. However, the more fields you use, the more search queries you have to execute.
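The weighted merge from the list above can be sketched in a few lines of PHP. The sketch assumes each search returns an array mapping a video ID to a relevancy score; mergeWeighted is a hypothetical helper, and the 0.7 and 0.3 weights are just the example values from above.

```php
<?php
// Combine two result sets (video ID => relevancy) into one ranked list using
// relevancy = nameWeight * name_relevancy + categoryWeight * category_relevancy.
function mergeWeighted(array $nameHits, array $categoryHits, $nameWeight = 0.7, $categoryWeight = 0.3)
{
    $combined = array();
    foreach ($nameHits as $id => $score) {
        $combined[$id] = $nameWeight * $score;
    }
    foreach ($categoryHits as $id => $score) {
        $previous = isset($combined[$id]) ? $combined[$id] : 0.0;
        $combined[$id] = $previous + $categoryWeight * $score;
    }
    arsort($combined); // highest combined relevancy first, IDs kept as keys
    return $combined;
}

// Made-up scores: video 2 is found by both searches.
$ranked = mergeWeighted(
    array(1 => 1.0, 2 => 0.5),
    array(2 => 1.0, 3 => 1.0)
);
```

A video missing from one set simply contributes 0 for that part, so name-only and category-only hits still appear, just lower in the list.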

I don't think you can avoid looking at the title and category of every movie for each search. So the CPU usage for that is a given. If you are concerned about the CPU usage of the sort, it would be negligible in most cases, since you would only be sorting the items that have more than zero points.
Having said that, what you probably want is a system that is partially rule-based and partially point-based. For instance, if you have a title that is equal to the search term, it should come first, regardless of points. Architect your search such that you can easily add rules and tweak points as you see fit to yield the best results.
Edit: In the event of an exact title match, you can take advantage of a DB index and not search the whole table. Optionally, the same goes for category.
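A partially rule-based, partially point-based ranking might be sketched like this. Everything here is illustrative: the video arrays with 'name' and 'categories' keys, the function names, and the single rule (an exact title match always comes first) are assumptions, not a finished design.

```php
<?php
// Points: +1 for each search term found in the name, +1 for each term that
// equals one of the video's categories. Rule: flag an exact title match.
function scoreVideo(array $video, $query)
{
    $query = strtolower(trim($query));
    $terms = preg_split('/\s+/', $query, -1, PREG_SPLIT_NO_EMPTY);
    $name = strtolower($video['name']);
    $categories = array_map('strtolower', $video['categories']);
    $points = 0;
    foreach ($terms as $term) {
        if (strpos($name, $term) !== false) {
            $points++;
        }
        if (in_array($term, $categories, true)) {
            $points++;
        }
    }
    return array('exact' => ($name === $query), 'points' => $points);
}

// Keep only videos that scored, exact title matches first, then by points.
function rankVideos(array $videos, $query)
{
    $scored = array();
    foreach ($videos as $video) {
        $score = scoreVideo($video, $query);
        if ($score['exact'] || $score['points'] > 0) {
            $scored[] = $video + $score;
        }
    }
    usort($scored, function ($a, $b) {
        if ($a['exact'] !== $b['exact']) {
            return $a['exact'] ? -1 : 1;
        }
        return $b['points'] - $a['points'];
    });
    return $scored;
}

$videos = array(
    array('name' => 'Funny Cats Compilation', 'categories' => array('Comedy', 'Cats')),
    array('name' => 'Funny Cats', 'categories' => array('Animals')),
    array('name' => 'Cooking pasta', 'categories' => array('Food')),
);
$ranked = rankVideos($videos, 'funny cats');
```

Only videos with at least one point reach the sort, which is why the sorting cost stays small compared with the scan itself.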

How to find "related items" in PHP

We often see 'related items'. For instance, in blogs we have related posts, in books we have related books, etc. My question is: how do we compute that relevancy? If it's just tags, I often see related items that do not have the same tag. For instance, when searching for 'pink', a related item could have a 'purple' tag.
Does anyone have any idea?

There are many ways to calculate similarity of two items, but for a straightforward method, take a look at the Jaccard Coefficient.
http://en.wikipedia.org/wiki/Jaccard_index
Which is: J(a,b) = |intersection(a,b)| / |union(a,b)|, i.e. the number of shared tags divided by the number of distinct tags overall.
So lets say you want to compute the coefficient of two items:
Item A, which has the tags "books, school, pencil, textbook, reading"
Item B, which has the tags "books, reading, autobiography"
intersection(A,B) = books, reading
union(A,B) = books, school, pencil, textbook, reading, autobiography
so J(a,b) = 2/6 = .333
So the most related item to A would be the item which results in the highest Jaccard Coefficient when paired with A.
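In PHP the coefficient for two tag lists can be computed with the built-in array functions. This is a minimal sketch; the function name jaccard is my own, and returning 0 for two empty tag lists is an arbitrary choice.

```php
<?php
// Jaccard coefficient of two tag lists: shared tags / distinct tags overall.
function jaccard(array $a, array $b)
{
    $a = array_unique($a);
    $b = array_unique($b);
    $union = count(array_unique(array_merge($a, $b)));
    if ($union === 0) {
        return 0.0; // both lists empty: treat as unrelated
    }
    return count(array_intersect($a, $b)) / $union;
}

$itemA = array('books', 'school', 'pencil', 'textbook', 'reading');
$itemB = array('books', 'reading', 'autobiography');
printf("%.3f\n", jaccard($itemA, $itemB)); // prints 0.333
```

To find the most related item, compute the coefficient against every other item and keep the highest value.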

Here are some of the ways:
Manually connecting them. Put up a table with the fields item_id and related_item_id, then make an interface to insert the connections. Useful to relate two items that are related but have no resemblance or do not belong to the same category/tag (or in an uncategorized entry table). Example: Bath tub and rubber ducky
Pull up some items that belong to the same category or have a similar tag. The idea is that those items must be somewhat related since they are in the same category. Example: in the page viewing LCD monitors, there are random LCD monitors (with same price range/manufacturer/resolution) in the "Related items" section.
Do a text search matching current item's name (and or description) against other items in the table. You get the idea.

To get a simple list of related items based on tags, the basic solution goes like this:
3 tables: one with items, one with tags and one with the connections. The connection table consists of two columns, one for each id from the other two tables. An entry in the connection table links a tag with an item by putting their respective ids in a row.
Now, to get that list of related items:
Fetch all items which share at least one tag with the original item. Be sure to fetch the tags along with the items, and then use a simple rating mechanism to determine which item shares the most tags with the original one. Each shared tag increases the relation-relevancy by one.
Depending on your tagging habits, it might be smart to add some counter-mechanism to prevent large overarching tags from distorting the relevancy. To achieve this, you could give greater weight to tags that are applied fewer times than a certain threshold. A threshold which has generally worked nicely for me is total_number_of_tag_applications/total_number_of_tags, which is the average number of applications per tag. If a tag's application count is smaller than the average, the relation-relevancy is increased by two instead of one.
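The rating mechanism with the average-based threshold could be sketched as follows. The input format (item ID => list of tags, as it might be assembled from the connection table), the function name relatedItems and the sample data are assumptions made for the example.

```php
<?php
// Score every other item by the tags it shares with $itemId: +1 per shared
// tag, or +2 when the tag is applied less often than the average tag.
function relatedItems($itemId, array $itemTags)
{
    if (!isset($itemTags[$itemId])) {
        return array();
    }
    // How many items each tag is applied to.
    $usage = array();
    foreach ($itemTags as $tags) {
        foreach (array_unique($tags) as $tag) {
            $usage[$tag] = isset($usage[$tag]) ? $usage[$tag] + 1 : 1;
        }
    }
    if (count($usage) === 0) {
        return array();
    }
    // total applications / total tags = average applications per tag
    $average = array_sum($usage) / count($usage);

    $scores = array();
    foreach ($itemTags as $otherId => $tags) {
        if ($otherId === $itemId) {
            continue;
        }
        $score = 0;
        foreach (array_intersect(array_unique($tags), $itemTags[$itemId]) as $tag) {
            $score += ($usage[$tag] < $average) ? 2 : 1;
        }
        if ($score > 0) {
            $scores[$otherId] = $score;
        }
    }
    arsort($scores); // most related first
    return $scores;
}

// Made-up sample: 'books' is an overarching tag, 'school' and 'rare' are not.
$itemTags = array(
    'a' => array('books', 'school', 'rare'),
    'b' => array('books', 'rare'),
    'c' => array('books'),
    'd' => array('school'),
);
$related = relatedItems('a', $itemTags);
```

With this data the average is 7/3 applications per tag, so 'school' and 'rare' count double while 'books' counts once.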

It can be more than a tag; for example, it can be the average of each word appearing in a paragraph, and then in titles, etc.

I would say they use ontology for that which adds more great features to the application.

It can also be based on "people who bought this book also bought".
No matter how, you will need some sort of connection between your items, and those connections will mostly be made by human beings.

This is my implementation(GIST) of Jaccard index with PostgreSQL, and Ruby on Rails...

Here is an implementation of jaccard index between two texts based on bigrams.
https://packagist.org/packages/darkopetreski/textcategorization
