Scraping from wsj.com or finance.yahoo.com - php

I want to display on a WordPress page the total volume of shares traded on the NYSE over the last two weeks that the exchange was open. What is the best way to go about doing this?

Yahoo Finance lets you export their data.
For a ticker, on the left sidebar there is a link to Historical Prices. On the bottom of that page there is a link "Download To Spreadsheet".
You could pass that to fgetcsv to parse it.
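As a rough sketch of the fgetcsv approach, the snippet below sums the volume over the most recent trading days of a downloaded file. It assumes the CSV has a header row with a "Volume" column and lists the newest day first; the function name and the sample rows are made up for illustration, so check them against the file you actually download.

```php
<?php
// Sum the Volume column over the most recent trading days of a
// historical-prices CSV. Assumes a header row containing "Volume" and
// rows ordered newest first (both are assumptions about the export).
function sumRecentVolume(string $csv, int $days = 10): int
{
    // Put the CSV text into an in-memory stream so fgetcsv() can read it.
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $csv);
    rewind($handle);

    $header = fgetcsv($handle, 0, ',', '"', '\\');
    $volumeIndex = ($header === false) ? false : array_search('Volume', $header, true);
    if ($volumeIndex === false) {
        fclose($handle);
        throw new RuntimeException('No Volume column found');
    }

    $total = 0;
    $rowsRead = 0;
    while ($rowsRead < $days && ($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (!isset($row[$volumeIndex])) {
            continue; // skip blank or short lines
        }
        $total += (int) $row[$volumeIndex];
        $rowsRead++;
    }
    fclose($handle);

    return $total;
}

// Made-up sample data in the assumed layout.
$sample = "Date,Open,High,Low,Close,Volume\n"
        . "2010-01-05,10,11,9,10,300\n"
        . "2010-01-04,10,11,9,10,200\n"
        . "2010-01-01,10,11,9,10,100\n";

echo sumRecentVolume($sample, 2), "\n"; // prints 500
```

Two weeks of trading is normally ten rows, which is why the default for $days is 10. In a real page you would pass in the contents of the downloaded file instead of $sample.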

Scraping websites for data is generally seen as unethical, depending on your intentions and the frequency of the scrape. The bandwidth isn't free, you know. Instead, you should hopefully be able to find a data feed which has been designed to be consumed by other sites, such as yours.
Not knowing very much about your domain, I wouldn't really know what to search for, but here are some guesses:
The NYSE website seems to offer a subscription data feed
Look around the Yahoo Finance page here

Yahoo would be your best bet, as they have an unofficial API documented here:
http://www.gummy-stuff.org/Yahoo-data.htm
Tons of apps and widgets rely on this, so I can't see it going away.
It has in fact gone away, because Yahoo asked that it be taken down.
At first glance, this URL would give you what you need: http://finance.yahoo.com/d/quotes.csv?s=^NYA&f=v


Associate Adwords info with Contacts / Conversions on my website

My company has been running ads on AdWords for quite a while now.
We provide a special kind of construction service.
Since our solution is very complex, some people don't quite understand what we are offering but convert into a contact anyway. This happens because:
1 - They don't read the FAQ or the website content.
2 - They are just collecting quotes for a service they need, and simply copy and paste their question into our contact form.
3 - Our service name is very hard to target correctly, since the keyword is open to interpretation and has a bunch of meanings.
I could give the keyword as an example, but I don't know the exact English translation.
Anyway, we receive a bunch of contacts. I can track conversions in AdWords and Analytics, and I can see which campaigns, keywords, placements, etc. are performing well, but here is what I really need:
I need to know where THAT GUY who wrote that stupid question came from.
I need to be able to associate AdWords data with the e-mails I receive through the contact form.
I need to be able to, at the end of the month, collect the 1,000 contacts from my CRM which are classified as "bad contact" and check where they came from. I could, for instance, find out that a specific placement in the Display Network simply sucks: I get conversions, but those conversions are bad. Or maybe a specific keyword simply doesn't work because people always mistake it for something else.
I need to be able to look at a very good contact that came in and trace it back to exactly which campaign originated it, which keyword, which search term, etc.
At my disposal I have, of course, the AdWords resources. I looked everywhere and could not find how to do it with AdWords alone.
I also have a CodeIgniter / PHP website. Maybe there is a way to read the AdWords / Analytics cookies.
I also know enough to work with the AdWords API, in case that helps.
Any help from you guys is welcome.
Sorry for my English - not my first language.
Best regards.
You might try using the eCommerce module for Analytics. Each time a conversion occurs, create a new product and fill its fields (SKU, Price, Order ID, etc.) with the data you need to identify the contact.
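A complementary idea on the PHP side, and not the eCommerce approach described above, is to capture the campaign parameters from the landing-page URL and store them with the contact. The sketch below assumes AdWords auto-tagging is enabled (so a gclid parameter is appended to the landing URL) or that the ads are tagged manually with utm_* parameters; the function name and the sample values are hypothetical.

```php
<?php
// Pick out campaign-identifying parameters from a landing-page query
// so they can be stored alongside the contact form submission.
function extractTrackingParams(array $query): array
{
    // gclid comes from AdWords auto-tagging; the utm_* names are the usual
    // manual-tagging parameters. Which ones appear depends on your setup.
    $wanted = ['gclid', 'utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content'];

    $found = [];
    foreach ($wanted as $name) {
        if (isset($query[$name]) && is_string($query[$name]) && $query[$name] !== '') {
            $found[$name] = $query[$name];
        }
    }
    return $found;
}

// In a real page this would be $_GET on the first request of the visit,
// kept in the session and later saved next to the contact in the CRM.
$landingQuery = ['gclid' => 'EXAMPLE123', 'utm_campaign' => 'spring', 'page' => '2'];
print_r(extractTrackingParams($landingQuery));
```

With the click identifier saved per contact, the "bad contact" list from the CRM could later be matched against click-level reports. Whether that matching is practical depends on what your AdWords reports expose.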

Legal script that scrapes and indexes?

I want to create a website that scrapes certain websites (specified by me) to collect data and pricing and then offer that data as search results on my own site. So basically like a search engine, but for specific sites, indexed in a specific way. I can write this myself, but would like to know:
Is it legal? Can I grab for example, all the items off ebay, put it in a search engine and allow users to search ebay using my site?
What if I make money off this?
Are there any popular PHP scripts that already do this?
The legal aspect has been covered. I found a way around this (well, I got permission from the persons creating the content)... so the only real question is: what can I use to crawl the content, especially keeping in mind that each site will have different rules that I will have to set up? It must also be clever enough not to spider the same content twice.
Is it legal?
Yes. And no. Probably.
There isn't one set of laws covering the entire planet, and SO isn't really for legal advice; you need to find a lawyer in your jurisdiction.
My own thoughts are that you would probably be okay in most jurisdictions as long as you use only the information. So, no eBay logos, no representations that you may be associated with them and so on.
But I am not a lawyer (though I deal a lot with the US sub-species as part of my work), certainly not your lawyer, and this advice (which isn't legal advice) is worth every cent you paid for it, which is ZERO!
What if I make money off this?
Good for you :-) Make mega-bucks. But see above point.
Are there any popular PHP scripts that already do this?
That's the bit I can't answer. My experience with PHP ranges somewhere between zero and nothing.
The legality is a bit shady in this area. You should first look for a robots.txt file ( http://www.robotstxt.org/robotstxt.html ) to determine whether the website welcomes web spiders.
Also, there is a very good PHP search script called Sphider ( http://www.sphider.eu/ ) that you should have a look at.
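To make the robots.txt check concrete, and to address the asker's point about not spidering the same content twice, here is a minimal sketch. It is deliberately simplified: the parser only honours Disallow rules in the "User-agent: *" group and ignores Allow rules and wildcards, and the function names and URLs are made up for illustration.

```php
<?php
// Very small robots.txt check: returns false when $path starts with a
// Disallow rule in the "User-agent: *" group. Ignores Allow rules,
// wildcards and agent-specific groups, so treat it as a first filter only.
function isPathAllowed(string $robotsTxt, string $path): bool
{
    $inStarGroup = false;
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // drop comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($field, $value) = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            $inStarGroup = ($value === '*');
        } elseif ($field === 'disallow' && $inStarGroup && $value !== '') {
            if (strpos($path, $value) === 0) {
                return false;
            }
        }
    }
    return true;
}

// Reduce a URL to a canonical key so the same page is not fetched twice:
// lower-case scheme and host, drop the fragment, sort query parameters.
function canonicalUrl(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $url; // not an absolute URL; leave it untouched
    }
    $scheme = strtolower($parts['scheme'] ?? 'http');
    $host   = strtolower($parts['host']);
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path   = $parts['path'] ?? '/';
    $query  = '';
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
        ksort($params);
        $query = '?' . http_build_query($params);
    }
    return $scheme . '://' . $host . $port . $path . $query;
}

$robots = "User-agent: *\nDisallow: /private/\n";
$queue  = [
    'http://Example.com/item?b=2&a=1#top',
    'http://example.com/item?a=1&b=2',   // same page as the first entry
    'http://example.com/private/x',      // disallowed by robots.txt
];

$seen = [];
foreach ($queue as $url) {
    $key  = canonicalUrl($url);
    $path = parse_url($key, PHP_URL_PATH) ?: '/';
    if (isset($seen[$key]) || !isPathAllowed($robots, $path)) {
        echo "skip  $url\n";
        continue;
    }
    $seen[$key] = true;
    echo "crawl $key\n";
}
```

Keeping the per-site rules (which query parameters matter, which paths to skip) in a small configuration array per site would let the same loop serve every site you have permission to crawl.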
EDIT:
I can't see many websites having an issue with you taking snippets of their website and then linking users onto the webpage which the content came from. However, if you plan on just taking all their content and displaying it on your own website in order to make profit, I can only assume many web sites would have an issue as they are the ones who should be profiting off the content.
1) Is it legal? Can I grab for example, all the items off ebay, put it in a search engine and allow users to search ebay using my site?
This is technically feasible. You can build a PHP script that does this quite easily. I would say that it is borderline illegal, however, because by scraping content from somebody else's site you will be using their intellectual property and their data without permission.
2) What if I make money off this?
Then the original owners of the data are very likely to come after you, issue a cease and desist notice, and then sue you. An organization as large as eBay could do this without blinking.
3) Are there any popular PHP scripts that already do this?
Because of the questionable legal nature of your question, I highly doubt there are any scripts that already do this.
The correct technique for getting data from eBay and other large data providers is to use their APIs, or application programming interfaces. These are interfaces designed for programs to communicate with each other. This has the benefit of being significantly more efficient than page-scraping, while also being a known legal way to get data from a provider.
More information about the eBay-specific API can be found here: http://developer.ebay.com/common/api/

iTunes App Store Web Scraping

I'm looking to have a user enter an app ID on a website, save the information about their app (in a MySQL database), and then display that information on the website.
Would anyone mind sharing the code or process that would be used to do this, or pointing me toward tutorials where I can learn how to do it?
If you could help me out at all I will be very grateful. Thanks.
You CANNOT do this via screen scraping. Read Apple's Terms of Use:
Your Use of the Site: You may not use any “deep-link”, “page-scrape”, “robot”, “spider” or other automatic device, program, algorithm or methodology, or any similar or equivalent manual process, to access, acquire, copy or monitor any portion of the Site or any Content, or in any way reproduce or circumvent the navigational structure or presentation of the Site or any Content, to obtain or attempt to obtain any materials, documents or information through any means not purposely made available through the Site. Apple reserves the right to bar any such activity.
What you need to do is investigate Apple's Partner Program, which includes a program for developers. I believe it would grant you access to an API where you could directly query for and receive the info you want (such as app descriptions) to display on your site, and perhaps even earn a commission on sales you generate when people purchase something from Apple via links on your site.
Odds are that the other site you see displaying such info from Apple's store has an affiliate/partner arrangement with Apple. (If not, it's just a matter of time until they get blocked from the site, find cease and desist letters in the mail from Apple's lawyers, get sued, or some combination of those three.)
You should really look at these first.
https://stackoverflow.com/questions/822380/how-legal-is-screen-scraping
https://stackoverflow.com/questions/396778/legalities-of-screen-scraping
Knowing Apple, they'll probably sue you. They have sued for less. Or IOW who haven't they sued?
If you want to save the information to your database, then you might want to look at the program Appfetcher at http://www.altraware.com.
It uses the XML that Apple provides, so there should be no legality issues.
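Another route that avoids scraping is Apple's public iTunes Search API, which offers a lookup-by-ID endpoint returning JSON. This is a different feed from the XML mentioned above. The sketch below reflects my understanding of that API: the endpoint and the field names (trackName, sellerName, price, description) should be verified against Apple's current documentation and terms, and the sample response is hand-written for illustration.

```php
<?php
// Build the lookup URL for a numeric app ID entered by the user.
function lookupUrl(string $appId): string
{
    if (!preg_match('/^\d+$/', $appId)) {
        throw new InvalidArgumentException('App ID must be numeric');
    }
    return 'https://itunes.apple.com/lookup?id=' . $appId;
}

// Pull a few fields out of a lookup response body.
// Returns null when the response contains no result.
function parseLookupResponse(string $json): ?array
{
    $data = json_decode($json, true);
    if (!is_array($data) || empty($data['results'][0])) {
        return null;
    }
    $app = $data['results'][0];
    return [
        'name'        => $app['trackName'] ?? null,
        'seller'      => $app['sellerName'] ?? null,
        'price'       => $app['price'] ?? null,
        'description' => $app['description'] ?? null,
    ];
}

// Hand-written sample in the assumed response shape (not real data).
$sampleJson = '{"resultCount":1,"results":[{"trackName":"Example App",'
            . '"sellerName":"Example Co","price":0.99,"description":"Demo"}]}';

// In a real page: $json = file_get_contents(lookupUrl($userSuppliedId));
print_r(parseLookupResponse($sampleJson));
```

The returned array maps directly onto columns of a MySQL table, so it could be saved with a prepared INSERT statement and read back when rendering the page.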

How do I show "daily hits" in any URL (visits/page loads)?

I need to count page views (from any URL on my site, including search pages) and show them on my site, but I can't manage to make it work. I wanted to show the number of times a page is loaded daily, but at this point I don't really care whether I get page views, unique visitors, or any other kind of visits, as long as I have some kind of counter.
Is there an easy way to do it?
Thanks
Yeah. The easiest way is to use Google Analytics.
I would suggest one of the free web statistics programs out there to just analyze your web logs. They'll be more fully featured than just counting visits, and there will be no overhead of database transactions just because someone is visiting a page.
http://awstats.sourceforge.net/
First, I'd have to say it: displaying number of views is SO 2000.
Well, now to the actual question, you'll have to identify each page and find out how flexible that can be:
/?p=1
/?p=1&q=2
/?p=1&s=1
Those might be the paths and might be referring to the same object, so you'll have to grab it and parse it if necessary. Now, just save it to a table in your database and increase the counter each time a new view is there.
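A minimal sketch of that idea follows. It assumes only the p parameter identifies a page, and it uses a hypothetical MySQL table named page_hits with a unique key on (page_key, hit_date); the table, column and function names are all made up for illustration.

```php
<?php
// Reduce a request URI to the part that identifies the page, keeping only
// the query parameters that matter (by assumption, just "p").
function pageKey(string $uri, array $significantParams = ['p']): string
{
    $path = parse_url($uri, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        $path = '/';
    }
    $queryString = parse_url($uri, PHP_URL_QUERY);
    $params = [];
    if (is_string($queryString)) {
        parse_str($queryString, $params);
    }
    $kept = array_intersect_key($params, array_flip($significantParams));
    ksort($kept);
    return $kept ? $path . '?' . http_build_query($kept) : $path;
}

// Increment today's counter for a page. MySQL syntax; relies on the
// hypothetical page_hits table having a unique key on (page_key, hit_date).
function recordHit(PDO $pdo, string $key): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO page_hits (page_key, hit_date, hits) VALUES (?, CURDATE(), 1)
         ON DUPLICATE KEY UPDATE hits = hits + 1'
    );
    $stmt->execute([$key]);
}

echo pageKey('/?p=1'), "\n";
echo pageKey('/?p=1&q=2'), "\n";
echo pageKey('/?p=1&s=1'), "\n";
```

All three example paths map to the same key, /?p=1, so they share one counter. On a live page you would call recordHit($pdo, pageKey($_SERVER['REQUEST_URI'])) once per request.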
Back on Visonary Software Solutions' track. I would use a Google Analytics-based solution too, perhaps you will use it on your site anyway. I did a quick search and found a tutorial on how to create counters like you wanted, displaying Analytics data. It doesn't look so complicated.
http://www.webresourcesdepot.com/feedcount-like-google-analytics-counter/
As far as I can tell, there are quite a lot of extensions for this purpose for the popular CMSs:
For Drupal: http://drupal.org/project/google_analytics_counter
For WordPress: http://analytics.blogspot.com/2009/05/share-your-google-analytics-data-with.html

Using the Twitter API

I am trying to build a small useful application with twitter. I will publish it as an open source project once I am done. I am trying to decide what is the best way to do the following:
I want to get the latest 200 tweets from Washington for example and see the most important thing these 200 tweets share. For example, if 20 tweets have tweeted the same link, this is probably an important story in Washington. Or if 50 tweets mentioned (This specific subject) it means this is important and I could get information about it.
What is the best way to do that? And is there a better way to get this information without fetching the latest 200 tweets (other than trends)?
If this is not clear enough, please ask and I will clear it up.
Thank you all for the help.
I don't think there is going to be any "custom" trending available, so you are going to have to parse out the links from the search results yourself.
You would use the Search API:
http://search.twitter.com/search.atom?geocode=40.757929%2C-73.985506%2C25km
After that, it should be pretty trivial to maintain a list of trends and links over the past 24 hours.
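The link-counting step could look roughly like this. It assumes you have already extracted the tweet texts from the search results into an array of strings; the function name and sample tweets are made up, and the URL pattern is intentionally loose.

```php
<?php
// Count how many tweets mention each link, most shared first.
// A link is counted once per tweet even if it appears twice in the text.
function countSharedLinks(array $tweets): array
{
    $links = [];
    foreach ($tweets as $text) {
        if (preg_match_all('~https?://[^\s]+~i', $text, $matches)) {
            // Trim punctuation that often trails a URL at the end of a sentence.
            $cleaned = array_map(function ($url) {
                return rtrim($url, '.,;:!?)');
            }, $matches[0]);
            foreach (array_unique($cleaned) as $url) {
                $links[] = $url;
            }
        }
    }
    $counts = array_count_values($links);
    arsort($counts); // highest count first, keys preserved
    return $counts;
}

$tweets = [
    'Big news http://example.com/a',
    'Read this: http://example.com/a.',
    'Something else http://example.com/b',
    'No link here',
];

print_r(countSharedLinks($tweets));
```

Links on Twitter are frequently shortened, so the same story may show up under several different short URLs. Resolving them to their final destination before counting would probably give better results.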
I would suggest you use a PHP Twitter library that already does what you describe.
Please have a look at this question to find a library which fits your needs https://stackoverflow.com/questions/422879/best-twitter-php-library
