Collect data over a period - php

I want to collect data for a particular keyword over the last seven days on Twitter, without user authentication. The problem is that the result set for a single day was more than 3000 tweets, which quickly gets my app blocked by the rate limit. I need a workaround for this. In fact, I don't need the data itself, just the count for each day (probably this is not possible). Could you please advise me on how to get around this? I am using the Search API, and I am open to using any API.
One more question: is it possible to collect public posts at regular intervals (all posts, without a query term)? If so, I could save them in my database and run the search against that.

This sounds like a job for the Streaming API. You can think of it as setting a keyword and opening a firehose: you receive tweets containing your keyword until you close the connection. The Streaming API is designed for persistent connections that track a limited number of keywords. You log in with what is basically a default user.
The 140dev PHP framework is a great help when working with the Twitter Streaming API in PHP.
Resources:
Twitter Streaming API Information -
https://dev.twitter.com/docs/streaming-apis
140 Twitter Streaming API Framework -
http://140dev.com/free-twitter-api-source-code-library/
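Since only a count per day is needed, one option is to count tweets as they arrive instead of storing them. This is a minimal sketch under assumptions: each tweet from the stream has already been decoded into a dict with `created_at` (in Twitter's classic `Wed Aug 27 13:08:45 +0000 2008` format) and `text` keys, and reading the stream itself (140dev or another client) is left out. A stream only delivers tweets posted while the connection is open, so this counts going forward rather than back-filling the last seven days.

```python
from collections import Counter
from datetime import datetime, timezone

# Twitter's classic created_at format, e.g. "Wed Aug 27 13:08:45 +0000 2008".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def count_per_day(tweets, keyword):
    """Count tweets whose text mentions `keyword`, grouped by UTC calendar day.

    `tweets` is any iterable of dicts with "created_at" and "text" keys
    (an assumed shape for tweets decoded from the streaming connection).
    """
    counts = Counter()
    needle = keyword.lower()
    for tweet in tweets:
        if needle not in tweet["text"].lower():
            continue  # the stream may also match on fields other than the text
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        day = created.astimezone(timezone.utc).date().isoformat()
        counts[day] += 1
    return counts
```

In a long-running collector you would keep one `Counter` alive and periodically write its totals to your database, which keeps storage down to one row per day.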

Related

How to scrape Amazon search with PHP?

Any ideas? I am new to PHP and am having a lot of trouble with cURL and DOMDocument, so please write or show me an example. I was thinking of using DOMDocument, but I cannot figure out how to get Amazon to search for a user's input from my site and display certain parts of the results, such as price, category, etc.
There are several methods: file_get_contents, a "simple HTML DOM" plugin (https://simplehtmldom.sourceforge.io/), and cURL. I've had varying luck with all of them, but eventually Amazon starts flagging my requests with robot checks. I originally used the API, but Amazon locked it down with minimum-traffic rules that my budding web service can't meet, which made it useless for me.
There is currently no easy, effective way to pull Amazon data consistently, though I'm experimenting with randomized user agents and proxies.
Use the Product Advertising API instead of scraping. http://docs.aws.amazon.com/AWSECommerceService/latest/DG/ItemSearch.html
The Product Advertising API would actually be the best resource for this, although it gives you limited results, and I believe Amazon may revoke your access if no affiliate transaction occurs within 180 days, so it does limit you to some extent depending on your use. I'm not 100% sure, but my understanding is that you may also need a professional seller account or an affiliate membership.

Twitter API similar to Google Alerts

I am trying to create a web application with functionality similar to Google Alerts (by similar I mean that the user can provide an email address to which alerts are sent, daily or hourly). The only limitation is that alerts are based solely on a certain keyword or hashtag. I think I have found the fundamental API needed for this web application:
https://dev.twitter.com/docs/api/1/get/search
The problem is that I still don't know all the web technologies needed for this application to work properly. For example, do I have to store all of the searched keywords in a database? Do I have to keep polling with AJAX requests all the time to keep my database updated? What if the keyword the user provided is very popular right now and has thousands of tweets in just an hour (not to mention that several emails might request several trending topics)?
By the way, I am trying to build this application using PHP. So please let me know what kinds of techniques I need to learn for such a web app (and maybe some references). Any kind of help will be appreciated. Thanks in advance :)
Regards,
Felix Perdana
I guess you should store users' e-mail addresses and search keywords (or whatever) in the database.
Then your app should make the API queries itself (so it has to run on a server) to get the relevant data, and then send that data to all the users.
Here is the algorithm:
1. The user adds a request on a page like http://www.google.ru/alerts
2. You store the e-mail address and keyword in the database.
3. Your server runs a script (you can loop it or use cron) that queries Twitter for data.
4. The script processes all the data and sends it to the users' e-mail addresses.
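The four steps above can be sketched as one pass of the cron job. This is an illustration, not a finished service: `search` and `send_email` are hypothetical callables you would supply (one wrapping the Twitter search call, one wrapping your mailer), and the `alerts` table is an assumed schema. Querying once per distinct keyword, rather than once per subscriber, is what keeps a popular keyword shared by many e-mail addresses from multiplying your API calls.

```python
import sqlite3


def create_alerts_table(conn):
    # One row per (e-mail, keyword) subscription; the primary key prevents duplicates.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alerts ("
        " email TEXT NOT NULL, keyword TEXT NOT NULL,"
        " PRIMARY KEY (email, keyword))"
    )


def subscribe(conn, email, keyword):
    conn.execute(
        "INSERT OR IGNORE INTO alerts (email, keyword) VALUES (?, ?)", (email, keyword)
    )


def run_alert_job(conn, search, send_email):
    """Run one cron pass: search each distinct keyword once, then mail every subscriber.

    Returns the number of e-mails sent.
    """
    sent = 0
    keywords = [row[0] for row in conn.execute("SELECT DISTINCT keyword FROM alerts")]
    for keyword in keywords:
        results = search(keyword)  # hypothetical wrapper around the search API
        if not results:
            continue  # nothing new, so no e-mail for this keyword
        for (email,) in conn.execute("SELECT email FROM alerts WHERE keyword = ?", (keyword,)):
            send_email(email, keyword, results)  # hypothetical mailer
            sent += 1
    return sent
```

Scheduled hourly or daily from cron, this replaces the "keep polling with AJAX" idea: the browser is only involved when the user subscribes.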

Netflix: Fetching new remote data when it becomes readily available

I hope this is an appropriate question: I'm using the Netflix API, and I'm wondering what the best way is to automatically receive new data when it becomes available (in this case, recently watched films when a Netflix user finishes watching one). The only way I can think of is spamming requests at intervals to query their feed. And would PHP be my best bet?
That's right, Netflix doesn't provide any push notifications through their API. You'll have to poll their feed periodically, but not too often: your consumer key is limited to a certain number of requests per second and requests per day.
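Periodic polling that stays within both limits can be sketched as below. The limits are parameters because the real values depend on your consumer key's terms, and `fetch_feed` is a hypothetical callable returning `(item_id, item)` pairs from the user's feed; nothing here is the actual Netflix client API. Remembering seen IDs means each poll only hands back films that are new since the last one.

```python
class PollBudget:
    """Decide whether another request fits within per-second and per-day limits."""

    def __init__(self, per_second, per_day):
        self.min_gap = 1.0 / per_second  # minimum seconds between requests
        self.per_day = per_day
        self.day = None
        self.used_today = 0
        self.last = None

    def try_acquire(self, now):
        """Record a request at `now` (epoch seconds) if allowed; return True if it was."""
        day = int(now // 86400)  # UTC day number
        if day != self.day:
            self.day, self.used_today = day, 0  # new day: reset the daily counter
        if self.used_today >= self.per_day:
            return False
        if self.last is not None and now - self.last < self.min_gap:
            return False
        self.last = now
        self.used_today += 1
        return True


def poll_once(fetch_feed, seen):
    """Fetch the feed once and return only items whose IDs were not seen before."""
    new_items = []
    for item_id, item in fetch_feed():  # hypothetical feed accessor
        if item_id not in seen:
            seen.add(item_id)
            new_items.append(item)
    return new_items
```

A driver loop would call `budget.try_acquire(time.time())` before each `poll_once` and `time.sleep()` for a few minutes between attempts; polling far less often than the per-second limit allows leaves headroom in the daily quota.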
I'm not sure enough about what you're trying to do to say whether PHP would be the right choice. OAuth libraries are available for pretty much every major language, so it's up to you.

Tweet contest logic (Twitter)

Disclaimer: I have no Twitter API experience, nor had I used Twitter until today.
I've been given the task of creating a 'tweeting contest' - if anyone has Twitter API experience and/or has done this in the past, I would appreciate any useful tips that you may have.
So the basic rules are that, in order to enter the contest, a user must follow the contest's Twitter account and must retweet a specific message, such as 'just entered a contest for http://foo.com/contest'.
Questions:
1. To get the entrants, I have to parse the contest's RSS feed, but http://twitter.com/statuses/user_timeline/21586418.rss seems to list only the last few posts, so I would probably have to use the Twitter API to get all messages. Can someone recommend documentation or a page that covers this?
2. I'm not sure whether I should store the users in a local XML file or rely on querying the Twitter API. If I store them, I would have a locally cached copy of the users... A database would be overkill, so if I were to store them, an XML file would be better, right?
3. Related to #1, should I actually parse for the exact message the user has to tweet, e.g. "just entered a contest", when I go through the feed of all the tweets? Or is there some sort of tagging system I can use?
4. Related to #1, I would have to determine whether the user is a follower. I can't determine that by parsing an entry/tweet, so would I have to query the user's ID and check whom he/she follows?
You could search for the URL, but the best approach would be to use a hashtag:
just entered #supercoolcontest for http://foo.com/contest
You can search for occurrences of #supercoolcontest that contain the required contest URL or whatever other keywords you might want. This ensures users don't have to be text-precise when retweeting, and it also gives people a trackable way to talk about the contest in general.
You can pull all tweets with a hashtag by using the search API:
http://search.twitter.com/search.json?q=%23supercoolcontest
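Note the `%23` in that URL: a raw `#` would be treated as a URL fragment and never reach the server. A small sketch of building the query safely with the standard library follows; the `search.twitter.com` endpoint is taken from the answer and has since been retired, so treat the base URL as illustrative only.

```python
from urllib.parse import urlencode

# Endpoint from the answer above; the old search.twitter.com API is no longer available.
SEARCH_ENDPOINT = "http://search.twitter.com/search.json"


def hashtag_search_url(hashtag):
    """Build a search URL for a hashtag, percent-encoding the leading '#' as %23."""
    query = "#" + hashtag.lstrip("#")  # accept the tag with or without '#'
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query})
```

For example, `hashtag_search_url("supercoolcontest")` reproduces the URL above.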
This is probably the most efficient approach, since you are guaranteed to only pull the tweets you're interested in, instead of n tweets from n users, only a tiny fraction of which has anything to do with you.
Every time you scrape that API feed (every n minutes), insert any new unique users. I'd use a database: it's not hard or time-consuming to set one up with a table or two, and it's easier to query against later.
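A minimal sketch of that "insert new unique users" step, assuming SQLite and tweets already flattened into dicts with hypothetical `id`, `user_id`, `screen_name`, and `text` keys (the real API payload nests user data differently). Making `user_id` the primary key and using `INSERT OR IGNORE` lets the database do the de-duplication, so re-scraping overlapping results is harmless.

```python
import sqlite3


def create_entrants_table(conn):
    # user_id as primary key: each Twitter user can enter only once.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entrants ("
        " user_id INTEGER PRIMARY KEY,"
        " screen_name TEXT NOT NULL,"
        " tweet_id INTEGER NOT NULL)"
    )


def record_entrants(conn, tweets, contest_url):
    """Insert the author of each qualifying tweet once; return how many were new."""
    new_count = 0
    for tweet in tweets:
        if contest_url not in tweet["text"]:
            continue  # hashtag matched but the required URL is missing
        cur = conn.execute(
            "INSERT OR IGNORE INTO entrants (user_id, screen_name, tweet_id) VALUES (?, ?, ?)",
            (tweet["user_id"], tweet["screen_name"], tweet["id"]),
        )
        new_count += cur.rowcount  # 0 when the user was already recorded
    conn.commit()
    return new_count
```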
To answer your last question, you do need to make a separate API call to determine if a given user follows another user.
I know this is an old question and is probably no longer relevant to meder, but I want to point out that there is now another way to solve this problem: Twitter's Streaming API http://dev.twitter.com/pages/streaming_api. The advantage of this approach is that you tell Twitter to send you all the tweets that meet certain conditions as soon as they are posted.
With the Search API, you need to poll Twitter for new tweets constantly, and there is a bigger chance that some of them will be missing from the search results. With the Streaming API, you keep a connection to Twitter open and process the tweets as they arrive. Twitter doesn't guarantee that you will get every tweet that meets the conditions, but in my experience the risk of missing some is much lower.

Find Users based on Twitter Friends

I have an app that pulls in tweets with a certain hashtag. When it finds the hashtag, the app automatically creates a user if one doesn't exist. When a user logs in via Twitter, I want to be able to present them with their friends who are also using the app. The problem is that for Twitter users with a ton of friends, the API returns at most 100 per response, so I'd have to hit the API 10 times to get the friends of someone with 1000 friends.
Also, when pulling the friends' info, should I just cache the friends in an array and move the matches to a separate array so I don't have to hit the API again?
Given that most Twitter apps have a per-hour limit on API calls, you really should cache pretty much everything. Check the cache for the data before pulling down any information.
If you are worried about how up to date the data is, put a timestamp in the cache. When you read something from the cache, check whether the time elapsed since it was stored exceeds some defined amount (depending on how fresh your data needs to be and how often you can hit the server with requests); if it does, go and refresh the data.
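The timestamped cache described above can be sketched in a few lines. The `fetch` callable is a hypothetical stand-in for whatever API call fills a cache miss, and the injectable `clock` exists only so the expiry logic can be exercised without waiting; this is in-memory only, so a real app would back it with its database or a cache server.

```python
import time


class TimedCache:
    """Cache values with the time they were fetched; refetch once older than max_age seconds."""

    def __init__(self, max_age, clock=time.time):
        self.max_age = max_age
        self.clock = clock
        self._entries = {}  # key -> (fetched_at, value)

    def get(self, key, fetch):
        """Return the cached value for `key`, calling fetch(key) only if missing or stale."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] <= self.max_age:
            return entry[1]  # fresh enough: no API call
        value = fetch(key)  # hypothetical API call, e.g. fetching a user's friend IDs
        self._entries[key] = (now, value)
        return value
```

Choosing `max_age` is the trade-off the answer describes: a longer age saves API calls, a shorter one keeps friend lists more current.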
This is a little like writing a good web crawler (which Jeff Atwood seems to suggest has only been done by Google). It is easy to write something that will attempt to pull down everything from the internet at once but it is more difficult to write something that will do it in a sustainable, manageable way.
Twitter have been sensible in forcing people to think through these issues by placing a "per-hour access count" on their API.
I found an API call that returns just the IDs of a Twitter user's friends; it returns up to 5000 per request and tries to return them all. The docs for the call are here: http://apiwiki.twitter.com/Twitter-REST-API-Method:-friends%C2%A0ids
What I did was take the response from the API call and build a SQL statement using IN. This way, I can now handle all my sorting and so forth in SQL, rather than doing a nasty array comparison.
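The IN approach might look like the sketch below, shown with SQLite and a hypothetical `users(twitter_id, screen_name)` table. Binding the IDs as parameters instead of pasting them into the SQL string avoids injection and quoting problems. SQLite caps bound parameters per statement (999 by default before version 3.32.0, 32766 after), so very large friend lists may need to be split into chunks on older builds; MySQL has no comparably small cap.

```python
import sqlite3


def app_users_among(conn, friend_ids):
    """Return (twitter_id, screen_name) rows for app users whose ID is in friend_ids."""
    ids = list(friend_ids)
    if not ids:
        return []  # "IN ()" is not valid SQL, so short-circuit
    placeholders = ", ".join("?" * len(ids))  # "?, ?, ?" for three IDs
    sql = (
        "SELECT twitter_id, screen_name FROM users"
        f" WHERE twitter_id IN ({placeholders})"
        " ORDER BY screen_name"  # sorting handled by the database, as in the answer
    )
    return conn.execute(sql, ids).fetchall()
```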
