PHP script that retrieves information from specified websites

I am trying to create a script that gathers information (description, image, price, seller name) from an Amazon product listing, based on an entered product ASIN. Functionality should be similar to this one: http://www.savings.com/pricejump
I tried using DOM to retrieve the HTML elements, but I am concerned about an IP ban if I make too many requests in a short period of time. I plan to make several hundred requests per day with this script.
Can you please share some useful links on this subject? I really don't know which direction to head in.
Any help would be highly appreciated.

You could use their affiliate program, which is free and would allow enough requests to accomplish what you asked. Plus it provides a web services API, which is much cleaner and easier to use than screen scraping.
https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html

Related

How to scrape amazon search with php?

Any ideas? I am new to PHP and am having a lot of trouble with cURL and DOMDocument, so please write or show me an example. I was thinking of using DOMDocument, but I cannot figure out how to get Amazon to search a user's input from my site and display certain parts of the results, such as price, category, etc.
There are several methods, using file_get_contents, the Simple HTML DOM parser (https://simplehtmldom.sourceforge.io/) and cURL, which I've had varying luck with, but eventually Amazon starts flagging my requests with robot checks. I originally used the API, but Amazon locked that down with minimum traffic rules that my budding web service can't meet, rendering it useless.
There is currently no easy or effective way to pull Amazon data consistently, though I'm experimenting with randomizing user agents and using proxies.
Use the Product Advertising API instead of scraping. http://docs.aws.amazon.com/AWSECommerceService/latest/DG/ItemSearch.html
The Product Advertising API would actually be the best resource for this, although it gives you limited results, and I believe they may revoke your access if no affiliate transaction occurs within 180 days, so it does limit you to some extent depending on your use. I'm not 100% sure, but my understanding is that you may need a professional seller account or an affiliate membership.

Track usage of Javascript on third party site

I have a snippet of JavaScript that I allow people to use to display content from my site as a daily reminder/quote widget. The data itself comes from a PHP source, i.e. the code is written in PHP but displayed on third-party sites by embedding a snippet of JavaScript.
I want to be able to accurately track how many unique websites are using my script to display the daily quote.
I can check awstats etc for number of hits to the page but that isn't the most accurate way of doing it.
I don't want to require registration before people can use it, as I feel it would deter users.
I don't want to create web services (REST/SOAP etc.) either, as that creates an obstacle for users to implement on their side, so leaving it as an embed works best at the moment.
I can't seem to find a solution, what are my options if any?

Legal script that scrapes and indexes?

I want to create a website that scrapes certain websites (specified by me) to collect data and pricing and then offer that data as search results on my own site. So basically like a search engine, but for specific sites, indexed in a specific way. I can write this myself, but would like to know:
Is it legal? Can I grab for example, all the items off ebay, put it in a search engine and allow users to search ebay using my site?
What if I make money off this?
Are there any popular PHP scripts that already do this?
The legal aspect has been covered. I found a way around this (well, I got permission from the people creating the content), so the only real question is: what can I use to crawl the content, keeping in mind that each site will have different rules that I will have to set up? It must also be clever enough not to spider the same content twice.
Is it legal?
Yes. And no. Probably.
There isn't one set of laws covering the entire planet, and SO isn't really for legal advice. You need to find a lawyer in your jurisdiction.
My own thoughts are that you would probably be okay in most jurisdictions as long as you use only the information. So, no eBay logos, no representations that you may be associated with them and so on.
But I am not a lawyer (though I deal a lot with the US sub-species as part of my work), certainly not your lawyer, and this advice (which isn't legal advice) is worth every cent you paid for it, which is ZERO!
What if I make money off this?
Good for you :-) Make mega-bucks. But see above point.
Are there any popular PHP scripts that already do this?
That's the bit I can't answer. My experience with PHP ranges somewhere between zero and nothing.
The legality is a bit shady in this area. You should look for the presence of a robots.txt ( http://www.robotstxt.org/robotstxt.html ) file to first determine if the website welcomes web spiders.
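As a rough illustration of the robots.txt check, here is a minimal PHP sketch. The function names `disallowedPaths` and `isPathAllowed` are made up for this example, and the parser is deliberately simplified: it only reads `Disallow` rules in groups addressed to `User-agent: *` and treats them as plain path prefixes, ignoring `Allow` lines and wildcards.

```php
<?php
// Hypothetical helper: collect the Disallow prefixes that apply to all
// crawlers ("User-agent: *") from the text of a robots.txt file.
function disallowedPaths(string $robotsTxt): array
{
    $paths = [];
    $applies = false; // true while inside a group that names "User-agent: *"
    $inRules = false; // true once the current group has started listing rules
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($field, $value) = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            if ($inRules) { // a User-agent line after rules starts a new group
                $applies = false;
                $inRules = false;
            }
            if ($value === '*') {
                $applies = true;
            }
        } else {
            $inRules = true;
            if ($field === 'disallow' && $applies && $value !== '') {
                $paths[] = $value;
            }
        }
    }
    return $paths;
}

// Hypothetical helper: a path is allowed unless it starts with a
// disallowed prefix.
function isPathAllowed(string $path, array $disallowed): bool
{
    foreach ($disallowed as $prefix) {
        if (strpos($path, $prefix) === 0) {
            return false;
        }
    }
    return true;
}

// Usage sketch (example.com is a placeholder host):
// $rules = disallowedPaths(file_get_contents('https://example.com/robots.txt'));
// if (isPathAllowed('/products/123', $rules)) { /* fetch the page */ }
```

A crawler would fetch robots.txt once per site, cache the resulting rules, and call `isPathAllowed` before each request.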
Also, there is a very good PHP search script called sphider ( http://www.sphider.eu/ ), you should have a look at.
EDIT:
I can't see many websites having an issue with you taking snippets of their website and then linking users onto the webpage which the content came from. However, if you plan on just taking all their content and displaying it on your own website in order to make profit, I can only assume many web sites would have an issue as they are the ones who should be profiting off the content.
1) Is it legal? Can I grab for example, all the items off ebay, put it in a search engine and allow users to search ebay using my site?
This is technically feasible. You can build a PHP script that does this quite easily. I would say that it is borderline illegal, however, because by scraping content from somebody else's site you will be using their intellectual property, their data, without permission.
2) What if I make money off this?
Then the original owners of the data are very likely to come after you, issue a cease and desist notice then sue you. An organization as large as ebay could do this without blinking.
3) Are there any popular PHP scripts that already do this?
Because of the questionable legal nature of your question, I highly doubt there are any scripts that already do this.
The correct technique for getting data from eBay and other large data providers is to use APIs, or application programming interfaces. These are interfaces designed for programs to communicate with each other. This has the benefit of being significantly more efficient than page scraping, while also being a known legal way to get data from a provider.
More information about the ebay specific API can be found here; http://developer.ebay.com/common/api/

Recommend a web service that handles location within a specific radius?

We have a client that wants a store locator on their website. I've been asked to find a webservice that will allow us to send a zipcode as a request and have it return locations within x radius. We found this, but it's maintained by a single person, and doesn't look like it gets updated or supported very well. We're looking for something commercial, ideally that updates their zipcode database at least once per quarter, and that has a well-documented API with PHP accessibility. I won't say price isn't an object, but right now we just want some ideas, and my google-fu has failed me.
I've already posted this over on the webmasters forum, but thought I'd cover my bases and post here too.
I've repurposed this outstanding script to conquer this same challenge. It's free, has been very reliable, and is relatively quick.
In my script, I have addresses stored in the DB. So rather than show a page to enter addresses, I simply pass them as a string and let the magic happen.
He says it in the app, but if you go this route, make sure you get your own Google Maps API key. It won't work with his!
If you want a less technical approach, here's a MySQL query you could run on your locations (you'd have to add lat/long to your DB or set up a geocoding service) to give you distance as the crow flies.
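The "as the crow flies" calculation can also be done in PHP once each location has a latitude and longitude. The following is only a sketch using the haversine formula, not the query the answer refers to; the function names and the `lat`/`lng` array keys are assumptions made for this example.

```php
<?php
// Great-circle distance in miles between two lat/long points (haversine).
function haversineMiles(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusMiles = 3958.8; // mean Earth radius
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = pow(sin($dLat / 2), 2)
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * pow(sin($dLon / 2), 2);
    // min() guards against rounding pushing sqrt($a) slightly above 1
    return 2 * $earthRadiusMiles * asin(min(1.0, sqrt($a)));
}

// Hypothetical helper: keep the stores within $radiusMiles of a point,
// nearest first. Each store is assumed to be an array with 'lat' and 'lng'.
function storesWithinRadius(array $stores, float $lat, float $lon, float $radiusMiles): array
{
    $hits = [];
    foreach ($stores as $store) {
        $distance = haversineMiles($lat, $lon, $store['lat'], $store['lng']);
        if ($distance <= $radiusMiles) {
            $hits[] = $store + ['distance' => $distance];
        }
    }
    usort($hits, function ($a, $b) {
        return $a['distance'] <=> $b['distance'];
    });
    return $hits;
}
```

For a large table it is usually cheaper to do this filtering in the database, but the PHP version is easy to test and works for a few hundred stores.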
Google Maps has a geocoder as well and it geocodes to the specific address.
It's limited to x number of requests but that shouldn't be a big deal if your site is small and if you cache. You can get more requests if you pay.
It can be accessed via javascript or via PHP (and there are several prewritten PHP modules out there)
Link here:
http://code.google.com/apis/maps/documentation/javascript/v2/services.html
(I worked for a company that did upwards of 800,000 requests a day, so it's stable and fast :) )
PostcodeAnywhere has a Store Locator feature - I think it's pay per use, but I've used their other products before and they're very cheap.
http://www.postcodeanywhere.co.uk/store-locator-tool/

Scraping from wsj.com or finance.yahoo.com

I want to display on a wordpress page the total volume of shares traded on the NYSE stock exchange the last 2 weeks that it's been open. What is the best way to go about doing this?
Yahoo Finance lets you export their data.
For a ticker, on the left sidebar there is a link to Historical Prices. On the bottom of that page there is a link "Download To Spreadsheet".
You could pass that to fgetcsv to parse it.
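A minimal sketch of the `fgetcsv` step follows. It assumes the downloaded file has a header row containing a `Volume` column and lists the most recent trading day first; both are assumptions about the export format, and `totalVolume` is a name invented for this example.

```php
<?php
// Sum the Volume column over the first $days data rows of a CSV string.
// Assumes a header row with a "Volume" column and newest-first ordering.
function totalVolume(string $csv, int $days): int
{
    // fgetcsv reads from a stream, so load the string into memory first.
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $csv);
    rewind($stream);

    $header = fgetcsv($stream, 0, ',', '"', '\\');
    $volumeCol = is_array($header) ? array_search('Volume', $header, true) : false;
    if ($volumeCol === false) {
        fclose($stream);
        throw new InvalidArgumentException('No "Volume" column found');
    }

    $volumes = [];
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv returns [null] for a blank line
        }
        $volumes[] = (int) $row[$volumeCol];
    }
    fclose($stream);

    return array_sum(array_slice($volumes, 0, $days));
}

// Usage sketch: two weeks of trading is roughly 10 rows.
// echo totalVolume(file_get_contents('table.csv'), 10);
```

If the file is saved to disk, the stream from `fopen('table.csv', 'r')` can be passed to `fgetcsv` directly instead of going through `php://memory`.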
Scraping websites for data is generally seen as unethical, depending on your intentions and the frequency of the scrape. The bandwidth isn't free, you know. Instead, you should hopefully be able to find a data feed which has been designed to be consumed by other sites, such as yours.
Not knowing very much about your domain, I wouldn't really know what to search for, but here are some guesses:
The NYSE website seems to offer a subscription data feed
Look around the Yahoo Finance page here
Yahoo would be your best bet, as they have an unofficial API documented here:
http://www.gummy-stuff.org/Yahoo-data.htm
Tons of apps/widgets rely on this, so I can't see it going away.
It has in fact gone away, because Yahoo asked that it be taken down.
From first glance, this url would give you what you need: http://finance.yahoo.com/d/quotes.csv?s=^NYA&f=v
