Disclaimer: I have no Twitter API experience, nor had I used Twitter until today.
I've been given the task of creating a 'tweeting contest' - if anyone has Twitter API experience and/or has done this in the past, I would appreciate any useful tips that you may have.
So the basic rules are that in order to enter the contest, a user must follow the contest's Twitter account and must retweet a specific message, such as 'just entered a contest for http://foo.com/contest'.
Questions:
To get the entrants, I have to parse the RSS feed of the contest. http://twitter.com/statuses/user_timeline/21586418.rss seems to list only the last few posts, so I would probably have to use the Twitter API to get all the messages. Can someone recommend documentation or a page that covers this?
I'm not exactly sure whether I should store the users in a local XML file or rely on querying the Twitter API. If I store them, I would have a cached local copy of the users. A database seems like overkill, so if I were to store them, an XML file would be the better choice, right?
Related to #1: should I parse for the exact message the user has to tweet (e.g. the literal string "just entered a contest") when I go through the feed of tweets, or is there some sort of tagging system I can use?
Related to #1: I would also have to determine whether each user is a follower. I can't determine that from an entry/tweet alone, so would I have to query by the user's ID and check the people he/she follows?
You could search for the URL, but the best approach would be to use a hashtag:
just entered #supercoolcontest for http://foo.com/contest
You can search for incidences of #supercoolcontest which contain the required contest URL or whatever other keywords you might want. This will ensure users don't have to be text-precise when retweeting, and also gives people a way to talk about the contest in a general way that is trackable.
You can pull all tweets with a hashtag by using the search API:
http://search.twitter.com/search.json?q=%23supercoolcontest
This is probably the most efficient approach, since you are guaranteed to pull only the tweets you're interested in, instead of n tweets from n users, only a tiny fraction of which have anything to do with you.
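A minimal sketch of collecting unique entrants out of that search feed (the JSON field names here are assumptions modeled on the old search API response; verify them against whichever API version you target):

```python
import json

# Sample payload shaped like the old search API response (an assumption;
# check the real field names before relying on them).
sample = json.loads("""
{"results": [
    {"from_user": "alice", "text": "just entered #supercoolcontest for http://foo.com/contest"},
    {"from_user": "bob",   "text": "just entered #supercoolcontest for http://foo.com/contest"},
    {"from_user": "alice", "text": "#supercoolcontest is great"}
]}
""")

def unique_entrants(payload, required_url="http://foo.com/contest"):
    """Collect unique screen names whose tweet contains the contest URL."""
    seen = set()
    for tweet in payload["results"]:
        if required_url in tweet["text"]:
            seen.add(tweet["from_user"])
    return sorted(seen)

print(unique_entrants(sample))  # ['alice', 'bob']
```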
Every time you scrape that API feed (every n minutes), insert the new unique users. I'd use a database - it's neither hard nor time-consuming to stand up something with a table or two, and it's easier to query against later.
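Standing up that table really is only a few lines; a sketch using SQLite (the table and column names are made up), where a UNIQUE constraint makes repeated scrapes idempotent:

```python
import sqlite3

# In-memory table for entrants; UNIQUE makes repeated polls idempotent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entrants (screen_name TEXT UNIQUE, entered_at TEXT)")

def record_entrant(conn, screen_name):
    # INSERT OR IGNORE silently skips users seen on a previous poll.
    conn.execute(
        "INSERT OR IGNORE INTO entrants (screen_name, entered_at) "
        "VALUES (?, datetime('now'))",
        (screen_name,),
    )

for name in ["alice", "bob", "alice"]:   # "alice" shows up in two polls
    record_entrant(conn, name)

count = conn.execute("SELECT COUNT(*) FROM entrants").fetchone()[0]
print(count)  # 2
```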
To answer your last question, you do need to make a separate API call to determine if a given user follows another user.
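That follower check boils down to building the lookup request and reading one flag out of the response. A sketch assuming a friendships/show-style endpoint (the base URL and response shape below are assumptions from the 1.1-era docs; confirm them for your API version):

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api.twitter.com/1.1"  # assumption: adjust to your API version

def friendship_show_url(source, target):
    """Build the URL for a friendships/show-style lookup (endpoint assumed)."""
    qs = urlencode({"source_screen_name": source, "target_screen_name": target})
    return f"{API_BASE}/friendships/show.json?{qs}"

def is_follower(response_json):
    """Read the follow relationship out of a friendships/show-style response."""
    payload = json.loads(response_json)
    return payload["relationship"]["source"]["followed_by"]

# Sample response body (shape assumed, not captured from a live call).
sample = '{"relationship": {"source": {"followed_by": true}}}'
print(is_follower(sample))  # True
```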
I know this is an old question and is probably no longer relevant to meder; nonetheless, I want to point out that there is now another way to solve this problem, using Twitter's Streaming API: http://dev.twitter.com/pages/streaming_api. The advantage of this approach is that you tell Twitter to send you all the tweets that meet certain conditions right when they are generated.
With the search API you need to poll Twitter for new tweets all the time, and there is a bigger chance that some of them will be missing from the search results. With the streaming API you keep an open connection to Twitter and process the tweets as they come. Twitter won't guarantee that you will get every tweet that meets your conditions, but in my experience the risk is much lower.
I want to collect data for a particular keyword over the last seven days, without user authentication on Twitter. The problem is that the result set for a single day was itself more than 3,000 tweets, which quickly blocks my app due to rate limiting. I need a workaround. In fact I don't need the tweets themselves, just the count for each day (though that may not be possible). Could you please advise? I am using the search API, but I am open to using any API.
One more question: is it possible to collect public posts at regular intervals (all posts, without a query term)? If so, I could save them in my database and perform the search there.
This sounds like a job for the streaming API. You can think of it as setting a keyword and opening a firehose: you will receive tweets containing your keyword until you close the connection. The streaming API is designed for persistent connections tracking a limited number of keywords, and you log in with what is basically a default user.
This 140 PHP Development Framework is a great help in working with the Twitter streaming API in PHP.
Resources:
Twitter Streaming API Information - https://dev.twitter.com/docs/streaming-apis
140 Twitter Streaming API Framework - http://140dev.com/free-twitter-api-source-code-library/
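Framework aside, the core of consuming a filtered stream is just reading statuses as they arrive and keeping the ones that match the tracked keyword. A sketch of that loop in Python (the stream here is simulated with a list; a real client would read from the open HTTP connection):

```python
def track(stream, keyword):
    """Yield only the statuses mentioning the tracked keyword,
    the way a streaming 'track' filter delivers them as they arrive."""
    for status in stream:
        if keyword.lower() in status["text"].lower():
            yield status

# Simulated stream of incoming statuses.
simulated_stream = [
    {"text": "just entered #supercoolcontest"},
    {"text": "unrelated chatter"},
    {"text": "I love #SuperCoolContest too"},
]

matches = list(track(simulated_stream, "#supercoolcontest"))
print(len(matches))  # 2
```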
I am trying to create a web application with functionality similar to Google Alerts (by similar I mean the user can provide an email address for the alert to be sent to, daily or hourly). The only limitation is that it alerts users based only on a certain keyword or hashtag. I think I have found the fundamental API needed for this web application:
https://dev.twitter.com/docs/api/1/get/search
The problem is that I still don't know all the web technologies needed for this application to work properly. For example, do I have to store all of the searched keywords in a database? Do I have to keep polling with AJAX requests all the time in order to keep my database updated? What if a keyword the user provided is so popular right now that it might get thousands of tweets in a single hour (not to mention that there might be several emails requesting several trending topics)?
By the way, I am trying to build this application using PHP. So please let me know what kind of techniques I need to learn for such a web app (and some references, maybe). Any kind of help will be appreciated. Thanks in advance :)
Regards,
Felix Perdana
I guess you should store users' e-mails and search keywords (or whatever) in the database.
Then your app should make API queries (so it should run on a server) to get the relevant data, and then send that data to all of the users.
To understand, here is the algorithm:
1. A user adds his request on a page like http://www.google.ru/alerts
2. You store his e-mail and keyword in the database.
3. Your server runs a script (in a loop, or via cron) which queries Twitter for new data.
4. Your script processes all the data and sends it to the users' e-mails.
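The matching step between the stored subscriptions and the freshly fetched tweets might look roughly like this (a sketch; the subscription and tweet shapes are invented for illustration, and the actual e-mail sending is left out):

```python
from collections import defaultdict

# Invented sample data: subscriptions stored in the DB, tweets just fetched.
subscriptions = [
    {"email": "felix@example.com", "keyword": "php"},
    {"email": "felix@example.com", "keyword": "twitter"},
    {"email": "jane@example.com",  "keyword": "php"},
]
fetched_tweets = [
    {"text": "learning PHP today"},
    {"text": "the twitter api changed again"},
]

def build_digests(subscriptions, tweets):
    """Group matching tweets per e-mail address, ready to be sent out."""
    digests = defaultdict(list)
    for sub in subscriptions:
        for tweet in tweets:
            if sub["keyword"].lower() in tweet["text"].lower():
                digests[sub["email"]].append(tweet["text"])
    return dict(digests)

digests = build_digests(subscriptions, fetched_tweets)
print(digests["jane@example.com"])  # ['learning PHP today']
```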
Facebook has a feature that shows instant auto-suggestion results in various situations, such as searching and message sending.
I think I am correct in terming this functionality 'auto-suggestion'.
If a user has 1,000 friends and wishes to send a message to one of them, Facebook will suggest the friend's name after the user types a few characters.
My question is: when pulling data out of the database to find friends (or in any such situation) and then working with it, which technique does Facebook use to keep auto-suggestion fast?
Is it caching, or something else? I would like to know in detail, as I am planning to build a social networking site. My scripting language is PHP.
I think a good chunk of it is not so much PHP, although Facebook is known to compile its PHP with HipHop.
A more important factor, IMO, is the database side of things. The query is probably as optimised as it can be, fetching back only what it needs, and caching will probably also come into play - i.e. the user's friends have already been retrieved, quite likely with the most frequently contacted friends first. Facebook also has tons and tons of database servers, which can only help speed.
Hope that helps
Probably a data structure like a patricia trie or a ternary search tree - something like SuggestTree.
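To make the idea concrete, here is a tiny prefix-trie sketch in Python (a real patricia trie compresses chains of nodes and caps the per-node suggestion lists, but the lookup contract is the same):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.names = []  # names passing through this node (capped in real systems)

class SuggestTrie:
    """Tiny prefix trie: each node remembers which names pass through it,
    so a suggestion lookup is a single walk down the prefix."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, name):
        node = self.root
        for ch in name.lower():
            node = node.children.setdefault(ch, TrieNode())
            node.names.append(name)

    def suggest(self, prefix):
        node = self.root
        for ch in prefix.lower():
            if ch not in node.children:
                return []
            node = node.children[ch]
        return node.names

trie = SuggestTrie()
for friend in ["Andrey", "Albert", "Alice", "Bob"]:
    trie.insert(friend)

print(trie.suggest("al"))  # ['Albert', 'Alice']
```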
Auto-suggesting over 1,000 or even 5,000 entries is not that hard. You retrieve the whole friend list and store it in an indexed JavaScript array (for example, we indexed by first letter, so friends['a'] = ['andrey', 'albert']), and then you are actually searching in memory over a small subset.
The invite window is built in a similar fashion: you build an index of names -> DOM elements, perform the DOM manipulation offline, and attach results only for the people that match the searched term.
The friend list is most likely cached in memcached, and Facebook warms up its caches as early as it can - it does not wait until the friend list is actually needed before putting it in memcache. So: it's retrieved from memcached, stored in local storage, and searched with efficient JavaScript. No DB is involved here.
P.S. I'm not speaking for facebook, but for a similar solution we've designed to handle fast auto-suggest / invite dialog on 5000+ entries.
I'm trying to sample the relative frequencies of regular tweets vs retweets vs replies in the public timeline; however, I can't seem to access the latter two. Is there any way to pull down public replies and retweets using the Twitter API? (For the record, I'm using PHP, but I think this is more of an API question.) Or, alternatively, is there any way to empirically determine the relative fractions of retweets/replies/neither that exist on Twitter?
Edit: I should have made this explicit, and I apologize: the problem is that the API seems to have eliminated replies and retweets from the statuses/public_timeline REST call, leaving only regular tweets. My question, then, is whether there is a way to access public replies and retweets in addition to regular tweets, given that this particular method call does not seem to work. I hope that clears things up.
If you would like to get those frequences for live status updates, you could consume Twitter's Streaming API. Since you want to use PHP, Phirehose allows you to consume the streams easily.
From there you would just examine each status update, looking for whatever markers you like to determine whether they are retweets, replies, etc.. Twitter Text (PHP) might be useful (even if you just borrow the regexes).
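A rough classifier along those lines might look like this (the patterns are deliberately simplified; Twitter Text's regexes handle many more edge cases, such as unicode names and old-style "via @user" retweets):

```python
import re

RETWEET = re.compile(r"^RT @\w+")   # old-style manual retweet
REPLY   = re.compile(r"^@\w+")      # status starting with a mention
MENTION = re.compile(r"@\w+")       # any mention anywhere in the text

def classify(text):
    """Bucket a status into retweet / reply / mention / plain tweet."""
    if RETWEET.match(text):
        return "retweet"
    if REPLY.match(text):
        return "reply"
    if MENTION.search(text):
        return "mention"
    return "tweet"

print(classify("RT @alice great post"))  # retweet
print(classify("@bob thanks!"))          # reply
print(classify("lunch with @carol"))     # mention
print(classify("just plain text"))       # tweet
```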
A quick run of the above against the "Sample" stream (~1% of public statuses) showed:
2,986 replies
1,481 retweets
6,706 mentions
10,000 tweets
You can extract that information by parsing the tweet text, looking for "RT @username" or "@username" patterns.
I have an app where I pull in tweets with a certain hashtag. When I find the hashtag, the app automatically creates a user if one doesn't exist. When the user logs in via Twitter, I want to be able to present them with their friends who are also using the app. The problem is that for Twitter users with a ton of friends, the maximum response is 100 users, so I'd have to hit the API 10 times to get the friends of someone with 1,000 friends.
Also, when pulling the friends' info, should I just cache the friends in an array and move them to a matched array so I don't have to hit the API again?
Given that most Twitter apps have a per-hour limit on API calls, you really should cache pretty much everything. Check the cache first before pulling down any information.
If you are worried about how up to date the data is, put a timestamp in the cache. When you access something from the cache, check whether the time elapsed since it was stored is larger than some defined amount (depending on how fresh your data needs to be and how hard you can afford to hit the server with requests), and if it is, go and refresh the data.
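A sketch of that timestamped cache (the freshness window and the fetch callback are stand-ins for your real API call):

```python
import time

class TimedCache:
    """Store entries alongside the time they were cached; refetch when stale."""
    def __init__(self, max_age_seconds, fetch):
        self.max_age = max_age_seconds
        self.fetch = fetch          # called on a miss or a stale entry
        self.store = {}             # key -> (timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None:
            stored_at, value = entry
            if time.time() - stored_at < self.max_age:
                return value        # fresh enough: no API call spent
        value = self.fetch(key)     # stale or missing: hit the API once
        self.store[key] = (time.time(), value)
        return value

calls = []
cache = TimedCache(300, lambda key: calls.append(key) or f"data-for-{key}")
cache.get("friends:alice")
cache.get("friends:alice")          # served from cache, no second fetch
print(len(calls))  # 1
```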
This is a little like writing a good web crawler (which Jeff Atwood seems to suggest has only been done by Google). It is easy to write something that will attempt to pull down everything from the internet at once but it is more difficult to write something that will do it in a sustainable, manageable way.
Twitter have been sensible in forcing people to think through these issues by placing a "per-hour access count" on their API.
I found an API call that returns just the IDs of a Twitter user's friends. It returns upwards of 5,000 IDs, attempting to return all of them. The docs for the call are here: http://apiwiki.twitter.com/Twitter-REST-API-Method:-friends%C2%A0ids
What I did was take the response from the API call and create a SQL statement utilizing IN. This way I can now handle all my sorting and so forth via SQL, rather than doing a nasty array comparison.
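The IN-clause trick looks something like this (SQLite used as a stand-in; the table, columns, and sample IDs are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (twitter_id INTEGER, screen_name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)

# IDs as returned by the friends/ids call (sample values).
friend_ids = [2, 3, 999]

# Build one parameterised IN clause instead of comparing arrays in app code.
placeholders = ",".join("?" * len(friend_ids))
rows = conn.execute(
    f"SELECT screen_name FROM users WHERE twitter_id IN ({placeholders}) "
    "ORDER BY screen_name",
    friend_ids,
).fetchall()
print([r[0] for r in rows])  # ['bob', 'carol']
```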