Filtering within a REST API - PHP

We're currently in the process of building a RESTful API. Now, it's a matter of what the best way of tackling filtering is.
We have /products. /products returns all given products you have access to. Now, let's say you want the products where the description matches exactly 'No description'. You'd get /products?description=No+description.
Now, ideally we would have more filter options: show only products where the stock is more than or equal to 1 but less than 10, or show only products where the name ends in black or starts with white. What is the best practice for doing this? Would we use logical operators in the URL, and how would we escape wildcards?
Current state of affairs is:
/products?product_name=%25black will find all products with names ending in black.
or
/products?product_name=white%25 will find all products with names starting with white.
%25 is the encoded form of %. So far so good.
But what if someone wants to find a product whose name contains a literal % character? Or wants to filter products by stock level? Would it be best to introduce min_stock and max_stock, or is it possible (and do we even want) to use logical operators (?stock=>=1&stock=<=5)? Is there a standard for handling URLs in situations like this?
Are we overthinking this? Is it possible? Should we skip filtering on our end and let users figure it out themselves?

The REST paradigm is about resources (everything you access is a resource) and human readability. That's why you make your listing URL plural.
That said, if you want to filter in several different ways (with =, LIKE, regex...), I think you have two possibilities:
First, create a separate filter for each mode: product_name_exact, product_name_like, product_name_regex. This resembles the way Django (Python) handles filtering, and it's quite elegant.
Second, create one query field plus a query_mode field. This is roughly how the Bing API works.
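As a rough illustration of the first option, here is a minimal PHP sketch that turns suffixed query parameters into a WHERE clause for a prepared statement. The function name buildFilter, the whitelist, the _min/_max suffixes, the use of * as the API wildcard and the choice of ! as the LIKE escape character are all assumptions made for this example, not an established standard.

```php
<?php
/**
 * Turn query parameters such as ?product_name_like=*black&stock_min=1
 * into a WHERE clause plus values for a prepared statement.
 * All names here are hypothetical.
 */
function buildFilter(array $query): array
{
    // Whitelist: column name => filter suffixes allowed for it.
    $filterable = [
        'product_name' => ['exact', 'like'],
        'stock'        => ['min', 'max'],
    ];
    // Each suffix maps to one SQL comparison with a placeholder.
    $operators = [
        'exact' => '= ?',
        'like'  => "LIKE ? ESCAPE '!'",
        'min'   => '>= ?',
        'max'   => '<= ?',
    ];

    $clauses = [];
    $values  = [];
    foreach ($query as $key => $value) {
        $key = (string) $key;
        $pos = strrpos($key, '_');
        if ($pos === false || !is_string($value)) {
            continue;
        }
        // Split "stock_min" into column "stock" and suffix "min".
        $column = substr($key, 0, $pos);
        $suffix = substr($key, $pos + 1);
        if (!in_array($suffix, $filterable[$column] ?? [], true)) {
            continue; // unknown column or operator: ignore it
        }
        if ($suffix === 'like') {
            // Escape SQL wildcards so a literal % or _ can be searched,
            // then translate the API wildcard * into SQL's %.
            $value = str_replace(['!', '%', '_'], ['!!', '!%', '!_'], $value);
            $value = str_replace('*', '%', $value);
        }
        $clauses[] = $column . ' ' . $operators[$suffix];
        $values[]  = $value;
    }
    return [implode(' AND ', $clauses), $values];
}
```

With this scheme a literal percent sign never collides with the wildcard: ?product_name_like=100%25* searches for names starting with "100%". Column names come only from the whitelist, so nothing from the URL is concatenated into the SQL.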

Related

How to match similar products for a price comparison website

I am working on a small price/product comparison website. It's a niche website for laptops and tablets, built in PHP.
My question is how to match similar products from different merchants. When the product has an EAN/ISBN, a simple %LIKE% can do it. But the data feeds I get have a lot of products missing the EAN or any other unique ID. How do price comparison websites deal with this?
I'm thinking of searching for string similarity between product names, but I don't want to match "Acer iconia tab a500" and "acer iconia tab a500 case" as similar products. Any ideas?
Thank you!
To implement the comparison you have to assign tags to the products. When a person searches for a product, list the other products that have the same tag.
E.g. for a laptop the tags could be laptop, acer, 14", $500 (price), etc.
So when someone searches for laptop, list all the laptops so that they can choose two of them and compare.
Hope you get the concept.
I faced a similar problem. There are different solutions.
You can find similar items with some search technology (full-text search engines can be helpful) or by using data mining methods (have a look at named entity recognition for recognizing brand, model, color, etc., and especially at machine learning methods for text mining). The latter can be much more accurate if you do it well.
With both methods, you can then add fuzzy string comparison for words that can be written in different ways, plus general predefined rules to eliminate wrong items. For example, comparing prices can separate an item from its accessories even when they have very similar titles.
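To make the "similar title plus price rule" idea concrete, here is a minimal PHP sketch. It uses Jaccard similarity over title words as a simple stand-in for a real search engine or machine learning model, and the function names, the array keys title and price, and both thresholds are hypothetical values chosen for illustration.

```php
<?php
// Lowercase a title and split it into unique alphanumeric words.
function titleTokens(string $title): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

// Jaccard similarity: shared words divided by all distinct words.
function titleSimilarity(string $a, string $b): float
{
    $ta = titleTokens($a);
    $tb = titleTokens($b);
    $union = count(array_unique(array_merge($ta, $tb)));
    if ($union === 0) {
        return 0.0;
    }
    return count(array_intersect($ta, $tb)) / $union;
}

/**
 * Treat two offers as the same product when their titles are similar
 * enough AND their prices are within $maxRatio of each other.
 * The price rule is what separates a tablet from its case.
 */
function sameProduct(array $a, array $b, float $minSimilarity = 0.75, float $maxRatio = 1.5): bool
{
    if (titleSimilarity($a['title'], $b['title']) < $minSimilarity) {
        return false;
    }
    $low  = min($a['price'], $b['price']);
    $high = max($a['price'], $b['price']);
    return $low > 0 && $high / $low <= $maxRatio;
}
```

In this sketch "Acer iconia tab a500" and "acer iconia tab a500 case" share four of five words, so the titles alone look similar; the price ratio is what rejects the pair.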

Cleaning up text? "The Beatles" to "Beatles, The"

I'm working on a website with lyrics from all kinds of bands, much like Lyrics.com I guess. What I have right now is a page that echoes the name of the band, the title of the song and the text itself from the database. I would like to properly categorize this.
Take for example "Strawberry Fields Forever" by "The Beatles".
I would like to categorize this as "B" as in "Beatles". And on Example.com/b/ list every band that starts with the letter B. My question:
The name of the band is The Beatles but "The" should be dropped. How would I do this? Making two columns in the database, author and author-clean, would be way too much work.
Also, my URL currently is example.com/lyrics.php?id=1. I would like this to look like example.com/b/beatles/strawberry-fields-forever. From Googling I understand this can be done with .htaccess? Is my database designed correctly for this right now? This is what it looks like ATM:
(darn, can't post images -- here is plain text)
id (int10)
title (varchar255)
author (varchar255)
lyrics (text)
I was thinking I need another column, e.g. category, with the value b for this example (as in Beatles), to more easily list all bands starting with B and to make sure the .htaccess thing is possible?
Quoting the question: 'The name of the band is The Beatles but "The" should be dropped. How would I do this? Making two columns in the database, author and author-clean, would be way too much work.'
While this might appear to be more initial work, you'd find that it is a solution which would require less work in the long run.
If you were to pre-index the authors by how they are supposed to be searched, then you can let SQL do all of the work for you when it comes to returning results.
Storing the data properly in the database is always preferred over doing complex processing (over and over) when pulling the data out. Space is a lot cheaper than processing power, not to mention how much faster this would end up being in the long run.
There are a few ways you can accomplish goal number one. The best way would either be a preg_replace like Trendee suggests or even breaking the string into an array and then searching for instances of words you'd like to replace. The array version is cool because you can easily shuffle stuff around.
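A minimal sketch of the preg_replace approach might look like the following. The function names are hypothetical, only a leading "The" is handled, and the category letter assumes plain ASCII band names; the idea is to compute these values once when a band is saved and store them, rather than on every page view.

```php
<?php
// Move a leading "The" to the end: "The Beatles" becomes "Beatles, The".
// Names without a leading "The" are returned unchanged.
function sortableName(string $name): string
{
    return preg_replace('/^(the)\s+(.+)$/i', '$2, $1', trim($name));
}

// First character of the sortable name, lowercased, for URLs such as /b/.
function categoryLetter(string $name): string
{
    return strtolower(substr(sortableName($name), 0, 1));
}
```

Because the pattern requires whitespace after "The", a name like "Theory of a Deadman" is left alone.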
As for the second goal, you're looking at mod_rewrite. When you go to your URL example.com/b/beatles/strawberry-fields-forever, a rewrite rule says "treat each segment between the slashes as if it were part of a query string", and you define what each one is. So in reality, your URL is:
?category=b&band=beatles&song=strawberry-fields-forever.
There are tons of examples online of how to do this.
I think this might be of use.
http://php.net/manual/en/function.preg-replace.php

Need an algorithm to find near-duplicate text values

I run a photo website where users are free to enter any tag they like, even tags not used before. As a result, a photo of an insect may sometimes be tagged as "insect" whilst somebody else tags it as "insects".
I'd like to keep the free-tagging capability, yet would like to have a way to filter out such near-duplicates. The total collection of tags is currently at 1,500. My idea is to read all of them from the DB into memory and then run an algorithm on them that displays "suspects".
My idea of a suspect is that x% of the characters in the string are the same (same char and order), where x is configurable. I could probably code a really inefficient way to do this but I was wondering if there is an existing solution to this problem?
Edit: Forgot to mention: just sorting the tags isn't enough, as that would require me to go through the entire set to find dupes.
There are some flaws in your logic. For example, what happens when the plural of a word differs substantially from the singular (e.g. person vs. people, or even candy vs. candies)?
If English is the primary language, check out Soundex which allows phonetic matches. Also consider using a crowd-sourced synonym model where users can create links to existing tags.
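PHP ships a soundex() function, so the phonetic idea can be tried with very little code. The sketch below (the function name soundexSuspects is made up) groups tags by their Soundex code and reports every group with more than one member. Note that it inherits the limitation mentioned above: "person" and "people" get different codes and will not be grouped.

```php
<?php
// Group tags by Soundex code; any group with two or more tags is a "suspect" set.
function soundexSuspects(array $tags): array
{
    $groups = [];
    foreach ($tags as $tag) {
        $groups[soundex($tag)][] = $tag;
    }
    // Keep only the codes shared by more than one tag.
    return array_values(array_filter($groups, fn(array $g) => count($g) > 1));
}
```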
Maybe the algorithm you are looking for is approximate string matching:
http://en.wikipedia.org/wiki/Approximate_string_matching
For a given word, you can compare it against the list of words and, if the 'distance' is small, add it to the suspects.
A fast implementation is to use dynamic programming like the Needleman–Wunsch algorithm.
I have made a blog example of this in C# where you can configure the 'distance' using a matrix character lookup file.
http://kunuk.wordpress.com/2010/10/17/dynamic-programming-example-with-c-using-needleman-wunsch-algorithm/
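For a PHP site, a simpler starting point than a hand-written Needleman–Wunsch implementation is the built-in levenshtein() function, which computes plain edit distance (a different but related dynamic-programming measure). The sketch below compares every pair of tags; the function name and the default distance of 1 are assumptions. With 1,500 tags that is roughly 1.1 million comparisons, which should be acceptable for an occasional clean-up job.

```php
<?php
/**
 * Return all pairs of tags whose edit distance is at most $maxDistance.
 * Tags are lowercased and de-duplicated first.
 */
function nearDuplicates(array $tags, int $maxDistance = 1): array
{
    $tags  = array_values(array_unique(array_map('strtolower', $tags)));
    $pairs = [];
    $n     = count($tags);
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            if (levenshtein($tags[$i], $tags[$j]) <= $maxDistance) {
                $pairs[] = [$tags[$i], $tags[$j]];
            }
        }
    }
    return $pairs;
}
```

Raising $maxDistance finds more candidates (such as "candy" and "candies") at the cost of more false positives among short tags.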
Is "either contains the other" good enough? You could run a SQL query something like this, if your images are in a database (which would only make sense):
SELECT * FROM ImageTags WHERE INSTR('theNewTag', TagName) > 0 OR INSTR(TagName, 'theNewTag') > 0 LIMIT 1;
If you really want to do this efficiently, I would suggest some sort of JavaScript implementation that displays possibilities as the user is typing in a tag. Not only will it save the user time to see five suggestions as they type, it will also stop them from typing "suspects" when "suspect" shows up as a suggestion. That is, of course, unless they really want "suspects".
You could load a huge list of words and narrow it down as the user types. I get the feeling that this could be very simple, especially if you only want to anticipate correctly spelled words. If someone misses a letter, they'll probably go back to fix it when they see a list of suggestions that isn't at all what they meant to type. And when they do type a word correctly, it will pop up in the suggestions.

CakePHP urls with unique ids

I have seen URLs such as this on some CakePHP websites: http://sample.com/posts/WordPress_get_URL_based_on_page_post_name-O8C
What would the O8C part be? In my current setup I pass the title and ID to the URL to give each item a nice URL, e.g. http://driz.co.uk/cake/portfolio/view/NA_Software-4, but my ID is just a number. How would I change it to get a 3-character ID that mixes numbers and letters?
Thanks
I guess the tiny number is just a short slug.
If you already use integers for your records, I don't see the point of adding the overhead of creating tiny slugs. Also, the tiny slug won't always have 3 characters once you have a decent number of records. Tiny slugs make the most sense if you need a short URL, for example in emails, on Twitter and in similar use cases.
However, if you want to use them, the CakeDC Utils plugin comes with a TinySluggable behavior.
https://github.com/CakeDC/utils
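If you only want to see the general idea without the plugin, one simple way to get a short mixed letter/number code from an integer ID is to write it in base 36. The sketch below is not how TinySluggable works, and there is no indication that the site in the question does this; the function names are hypothetical. For illustration, the integer 31404 happens to encode as O8C.

```php
<?php
// Encode a numeric primary key as an uppercase base-36 string.
function encodeId(int $id): string
{
    return strtoupper(base_convert((string) $id, 10, 36));
}

// Decode the base-36 string back into the numeric primary key.
function decodeId(string $code): int
{
    return (int) base_convert(strtolower($code), 36, 10);
}
```

Three base-36 characters cover IDs up to 46655, after which the code grows to four characters. This matches the point above that a tiny slug will not stay at 3 characters forever.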

PHP Detect Pages Genre/Category

I was wondering if there was any way to detect a page's genre/category.
Possibly there is a way to find keywords or something?
Unfortunately I don't have any idea so far, so I don't have any code to show you.
But if anybody has any ideas at all, let me know.
Thanks!
EDIT #Nican
Perhaps there is a way to set, let's say, 10 categories (Entertainment, Funny, Tech).
Then create keywords for these categories (Funny = Laughter, Funny, Joke etc.).
Then search through a webpage (maybe fetched using cURL) for these keywords and assign it to the right category.
Hope that makes sense.
What you are talking about is basically what Google Adsense and similar services do, and it's based on analyzing the content of a page and matching it to topics. Generally, this kind of stuff is beyond what you would call simple programming / development and would require significant resources to be invested to get it to work "right".
A basic system might work along the following lines:
Get page content
Get X most commonly used words (omitting stuff like "and" "or" etc.)
Get words used in headings
Assign weights to different words according to a set of factors (is used in heading, is used in more than one paragraph, is used in link anchors)
Match the filtered words against a database of words related to a specific "category"
If cumulative score > threshold, classify site as belonging to category
Rinse and repeat
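The steps above can be sketched in PHP roughly as follows. This is a deliberately naive illustration: the function name, the keyword lists, the heading weight of 2 and the threshold of 3 are all made-up values, and fetching the page and extracting its headings are left out.

```php
<?php
/**
 * Score page text against per-category keyword lists and return the
 * categories whose cumulative score exceeds $threshold, best first.
 * Words found in headings count $headingWeight times extra.
 */
function classify(string $body, string $headings, array $categories,
                  int $threshold = 3, int $headingWeight = 2): array
{
    // Lowercase the text and split it into words.
    $toWords = fn(string $text): array => str_word_count(strtolower($text), 1);
    $bodyCounts    = array_count_values($toWords($body));
    $headingCounts = array_count_values($toWords($headings));

    $matches = [];
    foreach ($categories as $category => $keywords) {
        $score = 0;
        foreach ($keywords as $keyword) {
            $score += ($bodyCounts[$keyword] ?? 0)
                    + $headingWeight * ($headingCounts[$keyword] ?? 0);
        }
        if ($score > $threshold) {
            $matches[$category] = $score;
        }
    }
    arsort($matches); // highest score first
    return $matches;
}
```

Keywords are expected in lowercase, for example ['funny' => ['joke', 'laughter', 'funny'], 'tech' => ['php', 'server', 'code']].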
Folksonomy may be a way of accomplishing what you're looking for:
http://en.wikipedia.org/wiki/Folksonomy
For instance, in Drupal they have a Folksonomy module:
http://drupal.org/node/19697 (Note this module appears to be dead, see http://drupal.org/taxonomy/term/71)
Couple that with a tag cloud generator, and you may get somewhere:
http://drupal.org/project/searchcloud
Plus, a little more complexity may be able to derive mapped relationships to other terms, especially if you control the structure of the tagging options.
http://intranetblog.blogware.com/blog/_archives/2008/5/22/3707044.html
EDIT
In general, the type of system you're trying to build relies on unique word values on a page. So you would need to...
Get unique word values from your content (index values or create a bot to crawl your site)
Remove all words and symbols you can't use (at, the, or, and, etc...)
Count the number of times the unique words appear on the page
Add them to some type of datastore so you can call them based on the relationships you're mapping
If you have a root label system in place, associate those values with the word counts on the page (such as a query or derived table)
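The word-counting part of these steps could look something like this in PHP. It is a minimal sketch: the function name topWords is hypothetical, the stop-word list must be supplied by the caller, and storing the result in a datastore is left out.

```php
<?php
// Count how often each word that is not a stop word appears in the page,
// and return the $top most frequent words with their counts.
function topWords(string $text, array $stopWords, int $top = 10): array
{
    // Drop HTML tags, lowercase, and split into words.
    $words  = str_word_count(strtolower(strip_tags($text)), 1);
    // Remove words that carry no meaning for classification.
    $words  = array_diff($words, $stopWords);
    $counts = array_count_values($words);
    arsort($counts); // most frequent first
    return array_slice($counts, 0, $top, true);
}
```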
This is very general, and there are a number of ways this can be implemented/interpreted. Folksonomies are meant to "crowdsource" much of the effort for you, in a "natural way", as long as you have a user base that will contribute.
