I have an app where I pull in tweets with a certain hashtag. When I find the hashtag, the app automatically creates a user if they don't already exist. When the user logs in via Twitter, I want to be able to present them with their friends who are also using the app. The problem is that for Twitter users with a ton of friends, each response returns at most 100 users, so I'd have to hit the API 10 times to get the friends of someone with 1000 friends.
Also, when pulling the friends info, should I just cache the friends in an array and move to a matched array so I don't have to hit the API again?
Given that most Twitter apps have a per hour limit on API calls you really should cache pretty much everything. Check the cache to see if you have the data first before pulling down any information.
If you are worried about how up to date the data is, put a timestamp in the cache. When you read something from the cache, check whether its age exceeds some defined limit (depending on how fresh your data needs to be and how often you can afford to hit the server with requests), and if it does, go and refresh the data.
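As a rough illustration of the timestamp check described above, here is a minimal sketch in Python. The class name `TimestampedCache`, the `fetch` callback and the `max_age_seconds` parameter are hypothetical names chosen for the example; in a real app `fetch` would wrap whatever Twitter API call you are caching.

```python
import time


class TimestampedCache:
    """In-memory cache that refreshes an entry once it is older than max_age_seconds."""

    def __init__(self, max_age_seconds, clock=time.time):
        self.max_age = max_age_seconds
        self.clock = clock          # injectable so the expiry logic is easy to test
        self._entries = {}          # key -> (stored_at, value)

    def get(self, key, fetch):
        """Return the cached value for key, calling fetch() only if it is missing or stale."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] <= self.max_age:
            return entry[1]         # still fresh: no API call needed
        value = fetch()             # missing or stale: hit the API once
        self._entries[key] = (now, value)
        return value
```

With something like `cache.get("friends:12345", lambda: fetch_friends(12345))`, where `fetch_friends` is your own function, the API is only called again after the entry has aged past the limit.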
This is a little like writing a good web crawler (which Jeff Atwood seems to suggest has only been done by Google). It is easy to write something that will attempt to pull down everything from the internet at once but it is more difficult to write something that will do it in a sustainable, manageable way.
Twitter have been sensible in forcing people to think through these issues by placing a "per-hour access count" on their API.
I found an API call that returns just the IDs of a Twitter user's friends. It can return upwards of 5000 IDs and tries to return all of them. The docs for the call are here: http://apiwiki.twitter.com/Twitter-REST-API-Method:-friends%C2%A0ids
I took the response from the API call and built a SQL statement using IN. This way I can handle all my sorting and so forth in SQL, rather than doing a nasty array compare.
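A minimal sketch of that IN approach, using Python's built-in sqlite3 module. The `users` table and its `twitter_id` and `screen_name` columns are assumptions made for the example, not the actual schema. The IDs are bound as parameters rather than pasted into the SQL string, and they are sent in chunks because some databases cap the number of bound parameters per statement.

```python
import sqlite3


def find_registered_friends(conn, friend_ids, chunk_size=500):
    """Return (twitter_id, screen_name) rows for the friend IDs that exist in our users table."""
    ids = list(friend_ids)
    matches = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # One "?" placeholder per ID, so the values are bound safely by the driver.
        placeholders = ",".join("?" * len(chunk))
        sql = (
            "SELECT twitter_id, screen_name FROM users "
            f"WHERE twitter_id IN ({placeholders})"
        )
        matches.extend(conn.execute(sql, chunk).fetchall())
    # Sorted here because the rows come from several chunks; when all IDs fit
    # into a single statement, an ORDER BY clause does the same job in SQL.
    return sorted(matches, key=lambda row: row[1].lower())
```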
Related
I'm not sure where is the best place to ask this question, so maybe if this is the wrong place, someone can help move this question to a better group?
It has elements of programming, user experience and database, but doesn't really fit well into any one category!
I need to take data and display it in a graph on my site. This data is available from an API.
But I can't decide whether it is best to just get this data from the API "live" when needed, or to save data from the API to a local database (on my own server).
Both methods have pros and cons.
Getting the data live means more URL requests, more latency, and if the site is used by many users, may limit the API access. I 'assume' the site will always be available if using the data live. The API data is also restricted to the past 2000 historical data points.
If I use a cron job to request the data, say once an hour, and save it to my own database, then I am only calling the API once every hour. Accessing my own database should be faster than calling an API from a URL GET request when drawing my page. And if my site is up, then my database will be up, so I don't need to worry about the API site uptime. And I can store as many historical data points as I want to, if I am storing the data myself.
But it seems wasteful to simply duplicate data that is already existing elsewhere.
There could be millions of data points. Is it really sensible to store perhaps 50 million pieces of data on my own server, when they already exist on an API?
From the user's perspective, there shouldn't be any difference as to which method I choose - other than perhaps if my site is up and the API site is down, in which case there would be missing data on my site.
I am torn between these two options and don't know how best to proceed with this.
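For comparison, the cron-based option described above could look roughly like the following sketch. It assumes each data point is a `(timestamp, value)` pair and stores it in a hypothetical `data_points` table; the function that actually calls the API is left out because its shape depends on the API in question.

```python
import sqlite3


def save_points(conn, points):
    """Store (timestamp, value) pairs, skipping timestamps already saved. Returns rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_points ("
        "ts INTEGER PRIMARY KEY, value REAL NOT NULL)"
    )
    before = conn.execute("SELECT COUNT(*) FROM data_points").fetchone()[0]
    # INSERT OR IGNORE makes the hourly job safe to re-run: overlapping
    # windows from the API do not create duplicates.
    conn.executemany(
        "INSERT OR IGNORE INTO data_points (ts, value) VALUES (?, ?)", points
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM data_points").fetchone()[0]
    return after - before
```

An hourly cron entry would then run a small script that fetches the latest points and passes them to `save_points`, while the page that draws the graph reads only from the local table.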
I am trying to create a web application with functionality similar to Google Alerts (by similar I mean that the user can provide an email address for the alert to be sent to, daily or hourly). The only limitation is that it only sends alerts to the user based on a certain keyword or hashtag. I think that I have found the fundamental API needed for this web application.
https://dev.twitter.com/docs/api/1/get/search
The problem is that I still don't know all the web technologies needed for this application to work properly. For example, do I have to store all of the searched keywords in a database? Do I have to keep polling with Ajax requests all the time in order to keep my database updated? What if the keyword the user provided is very popular right now and might have thousands of tweets in just an hour (not to mention that several e-mail addresses might request several trending topics)?
By the way, I am trying to build this application using PHP. So please let me know, what kind of techniques I need to learn for such web app (and some references maybe)? Any kind of help will be appreciated. Thanks in advance :)
Regards,
Felix Perdana
I guess you should store the users' e-mails and search keywords (or whatever) in the database.
Then your app should make API queries (so it should be run by a server) to get the relevant data, and send that data to all the users.
To make it concrete, here is the algorithm:
1. The user submits a request on a page like http://www.google.ru/alerts
2. You store their e-mail and keyword in the database.
3. Your server then runs a script (you can loop it or use cron) that queries Twitter for data.
4. Your script processes all the data and sends it to the users' e-mails.
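The server-side part of these steps can be sketched as follows. This is only an outline under assumptions: `search` and `send_email` are hypothetical callbacks standing in for the real Twitter query and the real mailer, and `subscriptions` stands in for the rows read from the database. Grouping by keyword means a popular keyword is queried once per run, however many users subscribed to it.

```python
from collections import defaultdict


def run_alerts(subscriptions, search, send_email):
    """One pass of the alert job. subscriptions is an iterable of (email, keyword) pairs."""
    by_keyword = defaultdict(list)
    for email, keyword in subscriptions:
        by_keyword[keyword].append(email)

    sent = 0
    for keyword, emails in by_keyword.items():
        tweets = search(keyword)        # one API query per distinct keyword
        if not tweets:
            continue                    # nothing new, so no e-mail for this keyword
        for email in emails:
            send_email(email, keyword, tweets)
            sent += 1
    return sent
```

Running this from cron hourly or daily matches the two alert frequencies mentioned in the question.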
Facebook has a feature that shows instant auto-suggestion results in various situations, such as searching, sending messages, etc.
I think I am correct in calling this functionality 'auto-suggestion'.
If a user has 1000 friends and wishes to send a message to one of them, Facebook will suggest the friend's name after a few characters are typed.
My question is: when pulling the data out of the database to find friends (or in any similar situation) and then working with it, which technique does FB use to keep auto-suggestion fast?
Is it caching the data, or something else? I'd like to know the details, as I am planning to build a social networking site. My scripting language is PHP.
I think a good chunk of it is not so much down to PHP, although Facebook is known to use HipHop to compile its PHP.
A more important factor, IMO, is the database side of things. The query is probably as optimised as it can be, getting back only what it needs. Caching probably also comes into play, i.e. the user's friends have already been retrieved, quite likely with the most frequently contacted friends returned first. Facebook also has tons and tons of database servers, which can only help speed.
Hope that helps
Probably a data structure like a Patricia trie or a ternary search tree,
or a suggest tree such as suggesttree.
Auto-suggesting with 1000 or even 5000 entries is not that hard. You retrieve the whole friend list and store it in an indexed JavaScript array (for example, we used the first letter as the index, so friends['a'] = [andrey, albert]), and then you are searching in memory within a small subset.
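To illustrate the first-letter index, here is a small sketch. The answer describes doing this in JavaScript in the browser; the same structure is shown in Python here, and the function names `build_index` and `suggest` are made up for the example.

```python
from collections import defaultdict


def build_index(names):
    """Group names by their lowercased first letter, e.g. index['a'] = ['Andrey', 'Albert']."""
    index = defaultdict(list)
    for name in names:
        if name:
            index[name[0].lower()].append(name)
    return index


def suggest(index, typed, limit=10):
    """Return up to `limit` names starting with the typed text (case-insensitive)."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    # Only the bucket for the first typed letter is scanned, not the whole list.
    bucket = index.get(prefix[0], [])
    return [name for name in bucket if name.lower().startswith(prefix)][:limit]
```

For a few thousand friends each bucket holds a few hundred names at most, so the scan runs on every keystroke without any server round trip.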
The invite window is built in a similar fashion - you build an index of names -> DOM elements, you perform the DOM manipulation offline, and you attach only the results for people that match the search term.
The friend list is most likely cached in memcached, and Facebook warms up its caches as early as it can - it does not wait until the friend list is actually needed before putting it in memcached. So the list is retrieved from memcached, stored in local storage, and searched with efficient JavaScript. No DB involved here.
P.S. I'm not speaking for facebook, but for a similar solution we've designed to handle fast auto-suggest / invite dialog on 5000+ entries.
I'm currently developing an app for iOS devices. The app downloads data from a WordPress blog, but fetches a nonce token first. In testing, fetching the token takes about 2~3 seconds, which is a lot, considering it's a mobile device that should have the data ready in a few seconds. On top of this, the data itself still has to be downloaded, which takes another 4~5 seconds.
In the data-fetching-method there are several security-measures taken, for example a secret string that needs to match on both the web-server and device (of course encrypted), and some sort of simple UDID-validation + some header and useragent-tests. Is this enough, or do I really need the nonces? It's not like there is any sensitive data being passed through, and if it was, I'd of course encrypt it further.
Is it really necessary for me to use nonces?
Thank you.
If you are downloading public data, there's no need for the nonce authentication stuff.
If you are going to be modifying data on the server, or fetching data that is not public or otherwise has some kind of access control around it, then you'll need whatever mechanism Wordpress requires to gain access (which it sounds like is a nonce-based token approach).
If it's taking a few seconds to get that token, how about fetching it on app startup/resume in the background?
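The background-fetch idea can be sketched like this. The app in question is an iOS app, so this Python version only shows the shape of the technique: `TokenPrefetcher` and the `fetch_token` callback are hypothetical names, and a real implementation would also need to re-fetch the token once it expires.

```python
from concurrent.futures import ThreadPoolExecutor


class TokenPrefetcher:
    """Starts fetching the token immediately so it is usually ready when first needed."""

    def __init__(self, fetch_token):
        executor = ThreadPoolExecutor(max_workers=1)
        # Kick off the slow request right away, e.g. at app startup or resume.
        self._future = executor.submit(fetch_token)
        # No further work will be submitted; the pending fetch still completes.
        executor.shutdown(wait=False)

    def token(self, timeout=None):
        """Return the token, blocking only if the background fetch has not finished yet."""
        return self._future.result(timeout=timeout)
```

By the time the user triggers the first data download, the 2~3 second token request has often already completed, so only the download time remains visible.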
Disclaimer: I have no Twitter API experience nor have I used Twitter until today
I've been given the task of creating a 'tweeting contest' - if anyone has Twitter API experience and/or has done this in the past, I would appreciate any useful tips that you may have.
So the basic rules are that in order for a user to enter the contest, said user must follow the contest's twitter and must retweet with a specific message, such as 'just entered a contest for http://foo.com/contest'.
Questions:
1. To get the entrants, I have to parse the rss feed of the contest, http://twitter.com/statuses/user_timeline/21586418.rss seems to only list the last few posts so I would probably have to interact with the Twitter API in order to get all messages. Can someone recommend documentation or a page that covers this?
2. I'm not exactly sure if I should store the actual users in a local xml file or rely on querying the Twitter API, if I store them I would have a cache local copy of users... a database would be overkill and if I were to store them it would be better off in an xml file, right?
3. Related to #1, should I actually parse for the exact message which the user has to tweet, eg "just entered a contest", the exact string when I parse through the data feed of all the tweets? Or is there some sort of tagging system I can use?
4. Related to #1, I would have to determine whether the user is a follower or not, so I can't determine that by parsing an entry/tweet, I would have to query the user's id and grab statistics from the people he/she follows?
You could search for the URL, but the best approach would be to use a hashtag:
just entered #supercoolcontest for http://foo.com/contest
You can search for incidences of #supercoolcontest which contain the required contest URL or whatever other keywords you might want. This will ensure users don't have to be text-precise when retweeting, and also gives people a way to talk about the contest in a general way that is trackable.
You can pull all tweets with a hashtag by using the search API:
http://search.twitter.com/search.json?q=%23supercoolcontest
This is probably the most efficient approach, since you are guaranteed to only pull the tweets you're interested in, instead of n tweets from n users, only a tiny fraction of which has anything to do with you.
Every time you scrape that API feed (every n minutes), insert new unique users. I'd use a database - not hard or time consuming to stand something up with a table or two. Easier to query against later.
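A minimal sketch of the "insert new unique users" step, using SQLite. The tweet fields `id`, `from_user` and `text` are assumptions about what each search result contains, and the `entrants` table is made up for the example; adjust both to whatever the search response actually returns.

```python
import sqlite3


def record_entrants(conn, tweets, required_text):
    """Save each user who tweeted the required text once; return all entrants so far."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entrants ("
        "screen_name TEXT PRIMARY KEY, tweet_id INTEGER)"
    )
    for tweet in tweets:
        # The hashtag search already narrowed the results; this checks the contest URL too.
        if required_text.lower() in tweet["text"].lower():
            # The primary key plus INSERT OR IGNORE keeps one row per user across runs.
            conn.execute(
                "INSERT OR IGNORE INTO entrants (screen_name, tweet_id) VALUES (?, ?)",
                (tweet["from_user"].lower(), tweet["id"]),
            )
    conn.commit()
    rows = conn.execute("SELECT screen_name FROM entrants ORDER BY screen_name")
    return [row[0] for row in rows]
```

Calling this every n minutes with the latest search results builds up the entrant list, and the follower check can then be made once per stored user rather than once per tweet.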
To answer your last question, you do need to make a separate API call to determine if a given user follows another user.
I know this is an old question and is probably not relevant to meder anymore. Nonetheless, I want to point out that there is now another way to solve this problem: Twitter's Streaming API, http://dev.twitter.com/pages/streaming_api. The advantage of this approach is that you tell Twitter to send you all the tweets that meet certain conditions as soon as they are generated.
With the search API you need to poll Twitter for new tweets all the time, and there is a bigger chance that some of them will be missing from the search results. With the streaming API you keep an open connection to Twitter and process the tweets as they come. Twitter won't guarantee that you will get all the tweets that meet the conditions, but in my experience the risk is much lower.