Once a day we want to download Google Finance data for 6,100+ stock symbols.
Right now we go to this URL for each stock symbol and fetch the data:
http://www.google.com/finance?q=NYSE:AA&fstype=ii
Getting the data like this uses a lot of bandwidth and slows down the server.
Is there a better way to get the data from Google?
There is a Google Finance API, but it is not available in PHP.
The Google Finance API is available to any web scripting language. The API is simply a description of, and instructions for using, their REST service. From PHP you use something like cURL to call the REST service, retrieve the results it outputs in XML format, and then parse the XML to display, store, or do whatever else with the information retrieved.
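To illustrate the general pattern (not the actual Google Finance endpoint - the URL and element names below are placeholders), a minimal PHP sketch might look like this:

<?php
// Minimal cURL + SimpleXML pattern for consuming a REST service that
// returns XML. The URL and element names are placeholders only.
$url = 'https://example.com/finance/feed?q=NYSE:AA'; // hypothetical endpoint

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response as a string
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$response = curl_exec($ch);
if ($response === false) {
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);

// Parse the XML and walk the elements you care about.
$xml = simplexml_load_string($response);
foreach ($xml->entry as $entry) { // hypothetical element name
    echo (string) $entry->title, "\n";
}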
Note that even though they don't have a PHP example like they do for many of their APIs, all of their APIs use the same sort of system, so the examples provided for something like the Google Spreadsheets Data API or the Google Documents List API are a valid starting point for the Google Finance API. The differences between them are the parameters passed, the data returned, and the URL you call with those parameters.
If you can live without the data coming from Google, have you looked at the Yahoo Finance API? It's pretty flexible and allows the downloading of multiple symbols at once (though you may not necessarily want to do all 6100 at once).
For example, you can do:
http://finance.yahoo.com/d/quotes.csv?s=XOM+BBDb.TO+JNJ+MSFT&f=snd1l1yr
More detail on how to use the API is nicely written up at:
http://www.gummy-stuff.org/Yahoo-data.htm
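If it helps, here is a minimal PHP sketch of fetching and parsing that CSV; the symbols and the f= field string are just examples taken from the URL above, and for 6,100+ symbols you would batch the list (e.g. with array_chunk) rather than request them all at once:

<?php
// Fetch quotes for several symbols in one request from the quotes.csv
// endpoint shown above, then parse the CSV response line by line.
$symbols = array('XOM', 'JNJ', 'MSFT');
$url = 'http://finance.yahoo.com/d/quotes.csv?s=' . implode('+', $symbols)
     . '&f=snd1l1yr';

$csv = file_get_contents($url);
if ($csv === false) {
    die('Download failed');
}

foreach (explode("\n", trim($csv)) as $line) {
    $fields = str_getcsv($line); // one line of fields per symbol
    print_r($fields);
}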
Related
I have a client who has a feed of leads containing Name, IP, Address, and Opt-In Time/Date, and I want them to be able to post that data to my hosted SQL database. If you are familiar with lead generation you will get what I'm trying to do.
I'd also like to know whether it's possible to write a script and place it on my server so that when someone posts a CSV file to it, the script automatically inserts the data from the CSV into the SQL server.
Is this possible? And are there any tutorials or reference manuals, sources, etc. I can use to accomplish this?
The answer to your question is yes.
You can go about this in two ways:
Write an API for your database which is consumed by those wishing to search, write to or query your database. To do this you can use any language you are comfortable with. Note that PHP, XML and Python are not interchangeable things: XML is a format specification, describing what the data should look like while it's being transported between two systems, so you can use any programming language that provides XML libraries to write your code. In addition to XML, JSON has emerged as the more popular transport format, especially for mobile and web applications. (A minimal sketch of such an endpoint follows after the second option below.)
The second option is to use a service like Apigee, Google Cloud Endpoints or Mashery, which automate a lot of this process for you. Each requires its own amount of effort (with Google Cloud Endpoints perhaps requiring the most). For example, Apigee will automatically create an API for you, as long as you can provide it access to your data source.
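To make the first option concrete, here is a minimal sketch of a PHP endpoint that accepts a posted CSV file and inserts each row into SQL; the database credentials, table and column names are hypothetical:

<?php
// receive_leads.php - accepts a CSV upload and writes each row to SQL.
// Credentials, table and column names below are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=leads_db', 'user', 'pass');

if (!isset($_FILES['csv']) || $_FILES['csv']['error'] !== UPLOAD_ERR_OK) {
    http_response_code(400);
    exit('No CSV file posted');
}

$stmt = $pdo->prepare(
    'INSERT INTO leads (name, ip, address, optin_at) VALUES (?, ?, ?, ?)'
);

$handle = fopen($_FILES['csv']['tmp_name'], 'r');
while (($row = fgetcsv($handle)) !== false) {
    if (count($row) < 4) {
        continue; // skip blank/malformed lines
    }
    $stmt->execute($row); // expects: Name, IP, Address, OptIn Time/Date
}
fclose($handle);

echo 'Import complete';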
I want to collect data for a particular keyword for the last seven days, without user authentication, on Twitter. The problem is that the result set for one day alone was more than 3,000 tweets, which quickly blocks my app due to rate limiting. I need a workaround. In fact I don't need the data itself, just the count for each day (though that is probably not possible). Could you please advise me on how to get around this? I am using the Search API, but I am open to using any API.
One more question: is it possible to collect public posts at regular intervals (all posts, without a query term)? If so, I could save them in my database and perform the search there.
This sounds like a job for the Streaming API. You can think of it as setting a keyword and opening a firehose: you will receive tweets containing your keyword until you close the connection. The Streaming API is designed for persistent connections tracking a limited number of keywords, and you log in with what is basically a default user.
The 140 PHP Development Framework is a great help when working with the Twitter Streaming API in PHP.
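As a rough sketch of what keyword tracking looks like with the Phirehose library that 140dev builds on (the credentials and keyword are placeholders, and newer versions of the Streaming API require OAuth rather than the basic-auth constructor shown here):

<?php
// Track a keyword on the Twitter Streaming API using Phirehose.
// Credentials and the tracked keyword are placeholders.
require_once 'Phirehose.php';

class KeywordConsumer extends Phirehose
{
    // Called once for every tweet received on the open connection.
    public function enqueueStatus($status)
    {
        $tweet = json_decode($status, true);
        if (isset($tweet['text'])) {
            // Store the tweet (or just bump a per-day counter) here.
            echo $tweet['created_at'], ': ', $tweet['text'], "\n";
        }
    }
}

$consumer = new KeywordConsumer('username', 'password', Phirehose::METHOD_FILTER);
$consumer->setTrack(array('your-keyword'));
$consumer->consume(); // blocks, processing tweets as they arrive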
Resources:
Twitter Streaming API information: https://dev.twitter.com/docs/streaming-apis
140 Twitter Streaming API framework: http://140dev.com/free-twitter-api-source-code-library/
I'm looking at feeding Dojo charts with data from Google Analytics, within a Zend Framework app. Has anyone done this, or does anyone have an overview of how I would go about it? I see there is a dojox.data.GoogleSearchStore. Does it make sense to have a dojox.data.GoogleAnalyticsStore, and is anyone working on something like this?
I recently did a project doing exactly this: presenting data from the Google Analytics API using Dojo charts. I'm not sure the approach I used was the best, but I can at least give you some pointers.
Daniel Hartmann has a proposal for a Zend_Gdata_Analytics component. It hasn't been approved yet; however, you can find his code on GitHub, and it works perfectly. I used it to get all the data I needed from Analytics.
The Google Analytics API itself is quite powerful, but it takes a while to get your head around it. Try to understand the difference between dimensions and metrics from Google's docs. It helps if you think of the service as building queries that return a table of data (like SQL), rather than just one value. In that table, each metric you add to the query adds a column of data to the result, and dimensions are used to restrict and group the data overall. So, for example:
// $ga is an authenticated Zend_Gdata_Analytics service instance
$query = $ga->newDataQuery()
    ->addDimension(Zend_Gdata_Analytics_DataQuery::DIMENSION_DATE)
    ->addMetric(Zend_Gdata_Analytics_DataQuery::METRIC_VISITS)
    ->addMetric(Zend_Gdata_Analytics_DataQuery::METRIC_VISITORS)
    ->addMetric(Zend_Gdata_Analytics_DataQuery::METRIC_PAGEVIEWS);
This gives you the total visits, visitors and page views for each day.
Analytics sometimes takes a few seconds to respond to queries (especially complex ones), so you'll want to cache the data. In my case I was pulling it at regular intervals via cron and storing it in a database.
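For what it's worth, the caching side can be as simple as a cron-run script that upserts one row per day into a local table. This is only a sketch: fetch_analytics_rows() stands in for the Zend_Gdata_Analytics query above, and the table name is hypothetical:

<?php
// cache_analytics.php - run from cron at regular intervals.
function fetch_analytics_rows()
{
    // Stand-in: run the Zend_Gdata_Analytics query shown earlier and map
    // each feed entry to array(date, visits, visitors, pageviews).
    return array();
}

$pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
$stmt = $pdo->prepare(
    'REPLACE INTO analytics_daily (day, visits, visitors, pageviews)
     VALUES (?, ?, ?, ?)' // REPLACE INTO is MySQL-specific
);

foreach (fetch_analytics_rows() as $row) {
    $stmt->execute($row); // e.g. array('20110101', 123, 98, 456)
}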
On the Dojo side, I don't think dojox.data.GoogleSearchStore will help you. I used a combination of dojo.data.ItemFileWriteStore, dojox.charting.DataSeries and Zend_Dojo_Data, but I don't think my requirements were typical. I'd suggest starting with the basics: get your graphs working with sample (hardcoded) data before you try to drop in your analytics data. There are some tutorials on sitepen.com which I found useful.
Good luck!
How does Copyscape use the Google API?
The AJAX API works only in browsers with JavaScript enabled, so that API is not used. The SOAP API is not used either, because it is not allowed for commercial use and permits no more than 100 queries per day.
Copyscape does not use a Google API; it uses Google search itself. It makes a simple cURL request to http://www.google.com/search?q= followed by the search keywords, then uses regex patterns to extract the titles, descriptions and links and shows them to the user. But this strictly violates Google's terms of service, which could get them banned, so they use proxies (or some other IP-hiding method) to hide their IP for each search.
Their FAQ explains how they do it:
Where does Copyscape get its results?
Copyscape uses Google and Yahoo! as search providers, under agreed terms. These search providers send standard search results to Copyscape, without any post-processing. Copyscape uses complex proprietary algorithms to modify these search results in order to provide a plagiarism-checking service. Any charges are for Copyscape's value-added services, not for the provision of search results by the search providers.
http://www.copyscape.com/faqs.php#providers
Analysis
Copyscape's FAQ makes it 100% clear that they have special agreements with Google and Yahoo. I am about 80% sure they are using a search solution similar (probably undisclosed, but similar) to Google Enterprise Search, provided by the search engines.
Copyscape does not scrape results; it fetches API-based formats like JSON and XML, which is also better for the providers (Google and Yahoo) in terms of bandwidth and response time. I base this on my own attempts to scrape Google search results with Python using phrase searches ("phrase matching"): there is no known way for a scraping bot to get past the 503 responses Google starts returning after a couple of hundred results (at 50- or 100-search intervals).
They are obviously not doing browser automation and passing data between web drivers and a language like Python either. I have tried that, and it gives similar results, except that the automated searcher needs manual intervention for the CAPTCHA before the scraping can continue. I also tried a recent bypass, which was patched within minutes. They are certainly not doing automated scraping of the search engines, and even if they were, it would not work long term.
How are they using their special privilege?
Since they have paid for, or negotiated, special terms, they can automate against special APIs. They are either using Google Search Enterprise and Yahoo Search Marketing Enterprise, or they have an even more specialized solution.
Not using:
Regular/free APIs (not sure whether Google and Yahoo made these free for them)
Scrapers (Scrapy, Beautiful Soup, Selenium, etc.)
Using:
Enterprise-level APIs
Server-side Bash/Python/Ruby/PHP scripts, for scalability and the like
Hoping
I hope someone from Copyscape leaks some information so that people won't have to keep guessing. Copyscape should also have more competition: there are probably only 1-10 plagiarism checkers out there that are highly reliable and well regarded.
I've got a couple of affiliate sites and would like to bring together the earnings reports from several Amazon sites into one place, for easier viewing and analysis.
I get the impression that cURL can be used to get external webpage content, which I could then scrape to obtain the necessary info. However, I've hit a wall in trying to log in to the Associates reports using cURL.
Has anyone done this and do you have any advice?
I am working on an open-source project called PHP-OARA. It allows you to get your data from the different affiliate networks, and it's part of the AffJet project.
We have solved the problem with Amazon (it wasn't easy), and a PHP class is available to get the data from your Associates account. If you like it, you can even cooperate to make it better.
I hope it helps!
You can do this, but you'll need to make use of cookies with cURL: http://www.electrictoolbox.com/php-curl-cookies/. But I'd be willing to bet some cash that Amazon offers an API to get the data you want, although the last time I dealt with their web services it was a nightmare, probably because I was using the PHP SOAP extension with Amazon's SOAP API.
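As a minimal sketch of the cookie handling that article describes (the URLs and form field names are placeholders, not Amazon's actual ones):

<?php
// Log in with cURL, saving the session cookies, then reuse them to
// request a logged-in page. URLs and form fields are placeholders.
$cookieFile = '/tmp/assoc_cookies.txt';

// 1) POST credentials; cURL writes the session cookies to $cookieFile.
$ch = curl_init('https://example.com/login'); // hypothetical login URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, array(
    'email'    => 'you@example.com',
    'password' => 'secret',
));
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile); // save cookies here
curl_exec($ch);
curl_close($ch);

// 2) Request the report page, sending the saved cookies back.
$ch = curl_init('https://example.com/reports/earnings'); // hypothetical
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send saved cookies
$html = curl_exec($ch);
curl_close($ch);

// $html now holds the logged-in report page, ready for scraping.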