I'm trying to build a PHP-based location search. I want it to be as 'smart' as possible, able to find addresses as well as hotels, museums, etc.
I'm currently using the Google Geocoding API, but the problem is that it only seems to find addresses (when I input a hotel name it finds either nothing or some location on the other side of the planet).
I looked further and found the Places API, which can find all kinds of businesses and other locations. The problem is, I don't think it can find normal addresses (though correct me if I'm wrong).
So my ideal situation would be being able to look for addresses AND other places at the same time. I would like to receive either a list of results sorted by relevance (as determined by Google), or only the most relevant result.
Thanks in advance!
Wouter Florign,
Your current problem has a few components:
(1) the request/response from the Google Geocoding API
(2) the request/response from the Google Places API
(3) your workflow to process the responses/data from the above two API calls.
The main objective of your code is to maintain consistency between related and dependent objects without sacrificing code reusability (the continuation of your workflow depends on your API responses). To ensure this, you can use the Observer pattern to wait for your requests to complete before continuing your workflow. The reason for using the Observer pattern rather than promises is that PHP is almost completely single-threaded. Because of this, a promise-based implementation will block your script until it completes.
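To make the Observer idea concrete, here is a minimal sketch (in Python for brevity; the same shape works in PHP). All class and function names here are hypothetical: two request objects notify a shared observer, and the workflow continues only once both responses are in.

```python
class ApiRequest:
    """Subject: notifies attached observers when its (simulated) response arrives."""
    def __init__(self, name):
        self.name = name
        self._observers = []

    def attach(self, callback):
        self._observers.append(callback)

    def complete(self, response):
        # Called when the HTTP response comes back.
        for callback in self._observers:
            callback(self.name, response)

results = {}

def collect(name, response):
    results[name] = response
    if {"geocode", "places"} <= results.keys():
        # Both responses are in: continue the workflow (merge, rank, ...)
        results["merged"] = results["geocode"] + results["places"]

geocode = ApiRequest("geocode")
places = ApiRequest("places")
geocode.attach(collect)
places.attach(collect)
geocode.complete(["123 Main St"])
places.complete(["Hotel Foo"])
```

The observer (`collect`) is the single place where "are we ready to continue?" is decided, which keeps the two API calls decoupled from the merging step.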
If you feel more comfortable using promises, you can have your promise fork from the main script (using the PCNTL family of functions). This allows your promise code to run in the background while the main script continues. It makes active use of pcntl_fork, which lets you fork a new process. When the promise completes, the child process exits and reports back. This approach has drawbacks, the biggest being that the child can only message the main process via signals.
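The fork-and-continue idea can be sketched like this (Python's os.fork is the direct analogue of PHP's pcntl_fork; a pipe is used here to carry the result back, which is one workaround for the messaging limitation, and the helper name is hypothetical):

```python
import os

def run_in_child(task):
    """Fork a child process to run `task`; the parent continues immediately.
    Returns (pid, read_fd) so the parent can collect the result later."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child process
        os.close(r)
        os.write(w, str(task()).encode())
        os.close(w)
        os._exit(0)                   # never return into the parent's code
    os.close(w)                       # parent keeps only the read end
    return pid, r

pid, r = run_in_child(lambda: 6 * 7)
# ... the parent is free to do other work here ...
os.waitpid(pid, 0)                    # reap the child when ready
output = os.read(r, 1024).decode()
os.close(r)
```

This is POSIX-only (fork does not exist on Windows), which mirrors the PCNTL caveat in PHP.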
Another caveat:
I implemented something very similar to this a couple of years ago, and I believe I ran into the same problem.
In my case, I was able to leverage the Yelp API. This API is really fantastic. All you have to do is perform a GET request against the Search API using the optional longitude/latitude parameters (there is also a radius parameter to limit your search). With this, I was able to get all kinds of information about businesses at a given location (hotels, restaurants, professional services such as doctors, dentists, and physical therapists), and I was able to sort it by various metrics (satisfaction, relevance, etc.).
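As a sketch of what that GET request looks like, here is the query-string assembly (Python for brevity; the endpoint and parameter names follow Yelp's search API as I remember it, so double-check them against the current Yelp documentation):

```python
from urllib.parse import urlencode

def build_search_url(term, latitude, longitude, radius_m):
    """Assemble a Yelp-style business search URL (endpoint/params illustrative)."""
    base = "https://api.yelp.com/v3/businesses/search"
    params = {
        "term": term,
        "latitude": latitude,
        "longitude": longitude,
        "radius": radius_m,       # metres; Yelp caps the radius
        "sort_by": "best_match",  # or "rating", "distance", ...
    }
    return base + "?" + urlencode(params)

# e.g. hotels near central Amsterdam within 5 km
url = build_search_url("hotel", 52.37, 4.89, 5000)
```

The real request would also carry an Authorization header with your API key; sorting by relevance is then done server-side via the `sort_by` parameter.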
Please let me know if you have any questions!
Related
I would like to implement an API using PHP and MySQL that can handle several thousand requests per second.
I haven't built this kind of API before. If you have experience implementing a similar task, could you please tell me what the steps are?
How can I implement an API that can handle thousands of requests per second?
I would be glad if you could explain with sample code.
Thanks in advance for your help.
Based on the details described in the post, you likely want an asynchronous, stateless architecture, so that requests don't block resources and the system scales more easily (it always sounds easier than it actually is ;)).
Without knowing to what other services these servers would connect (it certainly doesn’t make things easier), I’d go for Elixir/Erlang as programming language and use Phoenix as a framework.
You get a robust functional language with a lot of great built-in features/modules (e.g. Mnesia, hot code upgrades while live) that scales well (it's good at utilizing all the cores of your server).
If you need to queue up requests to the second-tier servers, an AMQP broker (e.g. RabbitMQ) might be a good option (it holds/stores the requests for the servers).
That works pretty well if it's stateless: the client asks one thing, the server responds once, and it's done with the task. If you have many requests because clients ask for updates every single second, you're better off switching to a stateful connection and using WebSockets, so the server can push updates back to a lot of clients and cut down on a lot of the chatter.
All of this is from a 'high up' view. In the end, it depends on what kind of services you want to provide, as that narrows down what the 'suitable tool' would be. My suggestion is one possibility which I think isn't far off (Node.js, mentioned before, is also more than valid).
Well, you need to consider several factors, such as:
Authenticating the API. Your API should only be called by users that are authorized and authenticated.
Caching API results. Your API should cache the results of API calls. This allows it to respond more quickly and to handle more requests per second. Memcache can be used to cache the results of API calls.
The API architecture. RESTful APIs have less overhead than SOAP-based APIs. SOAP-based APIs have better support for authentication; they are also better structured than RESTful APIs.
API documentation. Your API should be well documented and easy for users to understand.
API scope. Your API should have a well-defined scope. For example, will it be used over the internet as a public API, or as a private API inside a corporate intranet?
Device support. When designing your API, keep in mind the devices that will consume it: for example smartphones, desktop applications, browser-based applications, server applications, etc.
API output format. When designing your API, keep in mind the format of the output. For example, will the output contain user-interface-related data or just plain data? One popular approach is separation of concerns (https://en.wikipedia.org/wiki/Separation_of_concerns): for example, separating the backend and frontend logic.
Rate limiting and throttling. Your API should implement rate limiting and throttling to prevent overuse and misuse.
API versioning and backward compatibility. Your API should be carefully versioned. For example, if you update your API, the new version should still support older API clients, and should continue to do so until all clients have migrated to the new version.
API pricing and monitoring. The usage of your API should be monitored, so you know who is using it and how. You may also charge users for using your API.
Metric for success. You should also decide which metric to use for measuring the success of your API: for example, the number of API calls per second, or monetary earnings from your API. Development activities such as research, publication of articles, open-source code, participation in online forums, etc. may also be considered when determining its success.
Estimation of cost. You should also calculate the cost of developing and deploying your API. For example, how much time will it take to produce a usable version? How much of your development time does the API consume?
Updating your API. You should also decide how often to update your API: for example, how often new features should be added. Keep backward compatibility in mind, so that updating your API does not negatively affect your clients.
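The rate limiting and throttling point above is commonly implemented as a token bucket per client. A minimal sketch (Python for brevity; class and parameter names are hypothetical, and a real deployment would keep the bucket state in something shared like Redis):

```python
import time

class TokenBucket:
    """Allow up to `rate` requests/second per client, with bursts up to `capacity`."""
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A fake clock makes the behaviour deterministic for illustration.
t = [0.0]
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: t[0])
burst = [bucket.allow() for _ in range(3)]   # two allowed, third refused
t[0] += 1.0                                  # one second later: one token refilled
later = bucket.allow()
```

Requests that return False would get an HTTP 429 response, ideally with a Retry-After header.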
Good answer. I think one thing to keep in mind is where the bottleneck is. Many times, the bottleneck isn't the API server itself but the data access patterns at the persistence layer.
Think about how you access your data. For posting new items, a lot of the processing can often be deferred and handled asynchronously to the original request. For example, if you're resizing an image or sending an email, you can integrate RabbitMQ or SQS to queue up a job, which workers process later. Queues are great at buffering work: if a server goes down, jobs simply stay queued and are processed once it's back online.
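The enqueue-then-work-later shape can be sketched in-process (Python for brevity; `queue.Queue` stands in for RabbitMQ/SQS, and the job names are hypothetical):

```python
import queue
import threading

jobs = queue.Queue()
done = []

def worker():
    """Drain jobs one at a time, like a RabbitMQ/SQS consumer would."""
    while True:
        job = jobs.get()
        if job is None:                    # sentinel: shut down
            break
        kind, payload = job
        done.append(f"{kind}:{payload}")   # stand-in for resize/send-email work
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# The web request handler only enqueues and returns immediately.
jobs.put(("resize_image", "photo.jpg"))
jobs.put(("send_email", "user@example.com"))
jobs.put(None)
t.join()
```

With a real broker, the queue additionally survives process restarts, which is exactly the buffering property described above.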
On the query side, it is important to understand how indexing works and how data is stored. There are different types of indexes: hash tables, for example, give you constant access time, but you cannot perform range queries with them. The easiest case is simple, denormalized data objects queried by identifier, which can be served from an index. If your data is more complex and you need to do heavy joins or aggregations, you can look at precomputed values stored in something like Redis or Memcached.
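The hash-versus-ordered-index trade-off can be shown in a few lines (Python; a sorted list with binary search stands in for a B-tree index):

```python
from bisect import bisect_left, bisect_right

# Hash index: O(1) lookups by exact key, but no range scans.
by_id = {101: "alice", 102: "bob", 205: "carol"}
hit = by_id[102]

# Ordered index (a sorted list standing in for a B-tree):
# supports range queries via binary search.
keys = sorted(by_id)
lo, hi = bisect_left(keys, 100), bisect_right(keys, 199)
in_range = keys[lo:hi]          # all ids in [100, 199]
```

This is why MySQL uses B-tree indexes by default: `WHERE id BETWEEN 100 AND 199` needs ordering, which a hash index cannot provide.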
I am defining the specs for a live activity feed on my website. I have the backend data model done, but the open question is the actual implementation: my development team is unsure of the best way to make the feeds work. Is this purely done by writing custom code, or do we need existing frameworks to make the feeds work in real time? One suggestion thrown at me was to use reverse AJAX. Someone mentioned having the client poll the server every x seconds, but I don't like this because it creates unwanted server traffic when there are no updates. A push engine like Lightstreamer, pushing from server to browser, was also mentioned.
So in the end: what is the way to go? Is it code-related, purely pushing SQL queries, using frameworks, using platforms, etc.?
My platform is written in PHP (CodeIgniter) and the DB is MySQL.
The activity stream will have lots of activities. There are 42 components in the social network I am developing, and each component has roughly 30 unique activities that can be streamed.
Check out http://www.stream-hub.com/
I have been using superfeedr.com with Rails and I can tell you it works really well. Here are a few facts about it:
Pros
Julien, the lead developer, is very helpful when you encounter a problem.
Immediate push of new feed entries for feeds that support PubSubHubbub.
JSON responses, which are easy to parse however you'd like.
A retrieve API, in case the update callback fails and you need to fetch the latest entries for a given feed.
Cons
Documentation is not up to the standard I would like, so you'll likely end up searching the web for obscure implementation details.
You can't control how often Superfeedr fetches each feed; they use a secret algorithm to determine that.
The web interface lets you manage your feeds, but becomes difficult to use when you subscribe to a lot of them.
The subscription verification mechanism works synchronously, so you need to make sure the object URL is ready for the Superfeedr callback to hit it (they do provide an async option, which does not seem to work well).
Overall I would recommend superfeedr as a good solution for what you need.
How does Copyscape use the Google API?
The AJAX API works only in browsers with JavaScript enabled, so that API is not used. The SOAP API is not used either: it is not allowed for commercial use, and no more than 100 queries are allowed per day.
Copyscape does not use a Google API; instead it uses Google search directly, performing a simple cURL request to http://www.google.com/search?q=Search Keywords here. It then uses regexp patterns to extract the titles, descriptions, and links and shows them to the user. This strictly violates Google's terms of service, which could get them banned, so they use proxies (or some other IP-hiding method) to hide their IP for each search.
From their FAQ they have explained how they do it.
Where does Copyscape get its results?
Copyscape uses Google and Yahoo! as search providers, under agreed
terms. These search providers send standard search results to
Copyscape, without any post-processing. Copyscape uses complex
proprietary algorithms to modify these search results in order to
provide a plagiarism checking service. Any charges are for
Copyscape's value-added services, not for the provision of search
results by the search providers.
http://www.copyscape.com/faqs.php#providers
Analysis
Copyscape's FAQ makes it clear that they have special agreements with Google and Yahoo. I am fairly sure Copyscape is using a search solution similar (probably undisclosed, but similar) to Google's enterprise search offering, provided by the search engines themselves.
Copyscape is not scraping results; it is fetching API-based formats like JSON and XML, which is good for the providers (Google and Yahoo) in terms of bandwidth and response time. I concluded this from my previous attempts to scrape Google search results with Python using phrase searches ("phrase matching"): there is no known way for a scraping bot to bypass the 503 errors Google returns after a couple of hundred results (at 100-search or 50-search intervals).
They obviously did not go the route of browser automation, shuttling data between web drivers and a language like Python. I have tried that, and it gave similar results, except that the automated searcher needs manual intervention for the captcha before the scraping can continue. I also tried a recent bypass, which was patched within minutes. Surely they are not doing any automated scraping from search engines, and even if they were, it would not work long term.
How are they using their special privilege?
Since they have paid for / negotiated special terms, they can automate against the special APIs. They are either using Google Search Enterprise and Yahoo Search Marketing Enterprise, or they have some more specialized solution.
Not Using List
Regular/free APIs (not sure whether Google and Yahoo made them free for them)
Scrapers (Scrapy, Beautiful Soup, Selenium, etc.)
Using List
Enterprise-level APIs
Server-side Bash/Python/Ruby/PHP scripts for scalability and the like
Hoping
I hope someone from Copyscape leaks some information so that people won't have to guess. Copyscape should also have more competition: there are only a handful of plagiarism checkers out there (probably 1-10) that are highly reliable and well regarded.
We have a client that wants a store locator on their website. I've been asked to find a webservice that will allow us to send a zipcode as a request and have it return locations within x radius. We found this, but it's maintained by a single person, and doesn't look like it gets updated or supported very well. We're looking for something commercial, ideally that updates their zipcode database at least once per quarter, and that has a well-documented API with PHP accessibility. I won't say price isn't an object, but right now we just want some ideas, and my google-fu has failed me.
I've already posted this over on the webmasters forum, but thought I'd cover my bases and post here too.
I've repurposed this outstanding script to conquer this same challenge. It's free, has been very reliable, and is relatively quick.
In my script, I have addresses stored in the DB. So rather than show a page to enter addresses, I simply pass them as a string and let the magic happen.
He says it in the app, but make sure that if you go this route you get your own Google Maps API key. It won't work with his!
If you want a slightly less technical approach, here's a MySQL query you could run on your locations (you'd have to add lat/long columns to your DB or set up a geocoding service) to give you the distance as the crow flies.
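The "crow flies" calculation such a query performs is the haversine formula; here is a sketch of it (in Python for illustration; the MySQL version computes the same expression over the lat/long columns):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/long points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))   # 6371 km: mean Earth radius

# e.g. New York to Los Angeles, roughly 3900-4000 km
d = haversine_km(40.7128, -74.0060, 34.0522, -118.2437)
```

In SQL you'd put this expression in the SELECT, ORDER BY it, and filter with HAVING distance < radius; for large tables, a bounding-box pre-filter on indexed lat/long columns keeps it fast.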
Google Maps has a geocoder as well and it geocodes to the specific address.
It's limited to x number of requests but that shouldn't be a big deal if your site is small and if you cache. You can get more requests if you pay.
It can be accessed via javascript or via PHP (and there are several prewritten PHP modules out there)
Link here:
http://code.google.com/apis/maps/documentation/javascript/v2/services.html
(I worked for a company that did upwards of 800,000 requests a day, so it's stable and fast :) )
PostcodeAnywhere has a Store Locator feature - I think it's pay per use, but I've used their other products before and they're very cheap.
http://www.postcodeanywhere.co.uk/store-locator-tool/
Let's say I have a website with a lot of information on our products. I'd like some of our customers (including us!) to be able to look up our products via various methods, including:
1) Pulling data from AJAX calls that return data in cool, JavaScripty ways;
2) Creating iPhone applications that use that data;
3) Having other web applications use that data for their own ends.
Normally, I'd just create an API and be done with it. However, this data is in fact mildly confidential - which is to say that we don't want our competitors to be able to look up all our products every morning and then automatically set their prices to undercut us. And we also want to be able to look at who might be abusing the system, so if someone's making ten million complex calls to our API a day and bogging down our server, we can cut them off.
My next logical step would then be to create a developer key to restrict access, which would work fine for web apps, but not so much for AJAX calls. (As I see it, the key would have to be provided in the JavaScript, which is plaintext and easily seen, and hence there's actually no security at all. Particularly since we'd be using our own developer keys on our site to make these AJAX calls.)
So my question: after looking around at OAuth and OpenID for some time, I'm not sure there is a solution that would handle all three of the above. Is there some sort of canonical "best practice" for developer keys, or can OAuth and OpenID handle AJAX calls easily in some fashion I have yet to grok, or am I missing something entirely?
I think that 2-legged OAuth is what you want to satisfy #2 and #3. For #1 I would suggest that instead of the customer making JS requests directly against your application, they could instead proxy those requests through their own web application.
A midway solution is to require an API key, and then demand that whoever uses it doesn't call your API directly from the AJAX, but instead wraps their calls in a server-side request, e.g.:
AJAX -> customer server -> your server -> customer server -> user
Creating a simple PHP API for interested parties shouldn't be too tricky, and your own iPhone applications would obviously cut out the middle man, shipping with their own API key.
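One common way to make such an API key usable without exposing a shared secret in browser JavaScript is HMAC request signing: the customer's server signs each proxied request, and your server verifies it. A minimal sketch (Python for brevity; the message format and parameter names are hypothetical):

```python
import hashlib
import hmac

SECRET = b"per-customer-secret"   # issued alongside the API key, kept server-side

def sign(api_key, path, timestamp, secret):
    """Signature the customer's *server* attaches before proxying the AJAX call."""
    message = f"{api_key}|{path}|{timestamp}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify(api_key, path, timestamp, signature, secret, now, max_age=300):
    """Your server checks the signature and rejects stale or tampered requests."""
    if now - timestamp > max_age:
        return False
    expected = sign(api_key, path, timestamp, secret)
    return hmac.compare_digest(expected, signature)

ts = 1_700_000_000
sig = sign("key123", "/products/42", ts, SECRET)
ok = verify("key123", "/products/42", ts, sig, SECRET, now=ts + 10)
tampered = verify("key123", "/products/99", ts, sig, SECRET, now=ts + 10)
```

Because the secret never leaves the two servers, the browser only ever sees the signed request, and the timestamp check limits replay; this also gives you the per-key usage trail needed to spot and cut off abusers.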
OAuth and OpenID are unlikely to have much to do with the AJAX calls directly. Most likely, you'll have some sort of authorization filter in front of your AJAX handler that checks a cookie, and maybe that cookie is set as a result of an OpenID authentication.
It seems like this is coming down to a question of "how do I prevent screen scraping." If only logged-in customers get to see the prices, that's one thing, but assuming you're like most retail sites and your barrier to customer sign-up is as low as possible, that doesn't really help.
And, hey, if your prices aren't available, you don't get to show up in search engines like Froogle or Nextag or PriceGrabber. But that's more of a business strategy decision, not a programming one.