A client would like me to add their Twitter stream to their website homepage, using a custom solution built in PHP.
The Twitter API obviously has a limited number of calls you can make to it per hour, so I can't automatically ping Twitter every time someone refreshes my client's homepage.
The client's website is purely HTML at the moment and so there is no database available. My solution must therefore only require PHP and the local file system (e.g. saving a local XML file with some data in it).
So, given these limited criteria, what's the best way for me to access the Twitter API - via PHP - without hitting my API call limit within a few minutes?
Once you can pull down a timeline and display it, it will be quite easy to add some file-based caching on top:
Check the age of the cache. Is it more than 5 minutes old? If so:
fetch the latest information
regenerate the HTML for output
save the finished HTML to disk
Then display the cached, pre-prepared HTML.
PEAR's Cache_Lite will do all you need on the caching layer.
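If you'd rather roll the caching by hand, here is a minimal sketch of that flow, assuming a fetch_twitter_timeline_html() helper (a placeholder name) that calls the Twitter API and builds the markup:

<?php
$cacheFile = __DIR__ . '/cache/twitter.html';
$maxAge    = 5 * 60; // five minutes

// Serve the cached HTML if it is still fresh
if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
    readfile($cacheFile);
    exit;
}

// Cache is stale or missing: rebuild it
$html = fetch_twitter_timeline_html(); // placeholder: call the Twitter API and build the markup
if ($html !== false) {
    file_put_contents($cacheFile, $html, LOCK_EX);
    echo $html;
} elseif (is_file($cacheFile)) {
    // the API call failed, so fall back to the stale copy rather than show nothing
    readfile($cacheFile);
}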
a cron job (not likely - if there's not even a database, then there are no cron jobs either)
Write the microtime() to a file. On a page view, compare the current timestamp to the saved one. If the difference is greater than N minutes, pull the new tweet feed and write the current timestamp to the file.
If the front page is a static HTML file not calling any PHP, include an image <img src="scheduler.php"/> that returns a 1px transparent GIF (at least that's how we did it back in the day) and does your Twitter pulling silently.
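A rough, untested sketch of that scheduler.php trick; update_twitter_cache() is a placeholder for whatever actually pulls the feed and writes the cache:

<?php
// scheduler.php - serves a 1x1 transparent GIF and refreshes the Twitter cache as a side effect
header('Content-Type: image/gif');
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');

// hand the image back to the browser before doing the slow work, where possible
if (function_exists('fastcgi_finish_request')) {
    fastcgi_finish_request();
}

update_twitter_cache(); // placeholder: pull the feed and rewrite the cached file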
Or do you mean the local-local filesystem, as in "my/the customer's computer, not the server" local?
In that case:
Get some server with a cron job or scheduler and PHP.
Write a script that reads the feed and saves it to a file.
Write the file to the customer's server using FTP (rough sketch below).
Display the feed using JavaScript (yes, AJAX also works with static files as data sources). jQuery or some such lib is great for this.
Or: create the tweet-displaying HTML file locally and upload it (but be careful, because you may overwrite updates on the server).
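If you do go the fetch-elsewhere-and-push route, the upload step is only a few FTP calls. A small sketch; the URL, credentials and paths are all placeholders:

<?php
// Fetch the feed and save it locally (URL and paths are placeholders)
$feed = file_get_contents('https://example.com/twitter-feed.xml');
file_put_contents('/tmp/tweets.xml', $feed);

// Push the file to the customer's web server over FTP
$ftp = ftp_connect('ftp.example.com');
ftp_login($ftp, 'username', 'password');
ftp_pasv($ftp, true);
ftp_put($ftp, '/public_html/tweets.xml', '/tmp/tweets.xml', FTP_BINARY);
ftp_close($ftp);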
IMO: for small sites you often just don't need a fully grown SQL database anyway. Filesystems are great. A combination of scandir(), preg_match() and carefully chosen file names is often good enough.
And you can actually do a lot of front-end processing (like displaying XML) using beautiful JavaScript.
Since we don't know your server config, I suggest you set up a cron job (assuming you're on a Linux box). If you have something like cPanel in a shared hosting environment, then it shouldn't be much of an issue. You need to write a script that is called by cron and that will get the latest tweets and write them to a file (XML?). You can schedule cron to run every 30 minutes, or whatever you want.
You may want to use TweetPHP by Tim Davies. http://lab.lostpixel.net/classes/twitter/ - This class has lots of features, including the one you want: showing your client's timeline.
The page shows good examples of how to use it.
You can then put the output of this in a file or database. If you want a site visitor's request to update the database or the file, say every 5 minutes, you can set a session variable holding a timestamp and only allow another update if that timestamp is at least 5 minutes old.
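A quick sketch of that session-timestamp throttle; refresh_tweet_file() is a placeholder for the code that actually fetches the timeline, and note that session state is per visitor, so this only limits how often any one visitor triggers a refresh:

<?php
session_start();

$interval = 5 * 60; // five minutes

if (!isset($_SESSION['last_tweet_update'])
    || time() - $_SESSION['last_tweet_update'] >= $interval) {
    refresh_tweet_file();                      // placeholder: fetch the timeline and rewrite the file/DB
    $_SESSION['last_tweet_update'] = time();   // note: stored per visitor, not globally
}

// render the page from the stored file or database as usual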
Hope this helps
My suggestion: Create a small simple object to hold the cache date and an array of tweets. Every time someone visits the page, it performs the following logic:
A) Does the file exist?
Yes: read it into a variable.
No: proceed to step D).
B) Unserialize the variable (the PHP pair serialize()/unserialize() will do just fine).
C) Compare the age of the stored cache with the current time (a Unix timestamp will do).
If they are more than 5 minutes apart:
D) Get the newest tweets from Twitter, update the object, serialize it and write it to the cache again. Keep the newest tweets for printing, too.
If not: just read the tweets from the cache.
E) Print the tweets.
The simplest and easiest way to serialize the object is the serialize()/unserialize() pair. If you're not willing to put in the effort of making the object, you could just use a 2D array; serialize() will work just fine. Have a look at http://php.net/serialize
Considering you have no cPanel access, it's the best solution, since you won't have access to PEAR packages, cron or any other simpler solutions.
array(
    'lastrequest' => 123,
    'tweets'      => array()
)
Now in your code, put a check to see whether the lastrequest timestamp in the datastore is more than X seconds old. If it is, then it's time to update your data.
Serialize and store the array in a file; pretty simple.
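A minimal sketch of that serialize()/unserialize() cache; fetch_latest_tweets() stands in for the actual Twitter API call:

<?php
$cacheFile = __DIR__ . '/tweets.cache';
$maxAge    = 5 * 60;

$cache = array('lastrequest' => 0, 'tweets' => array());

if (is_file($cacheFile)) {
    $cache = unserialize(file_get_contents($cacheFile));
}

if (time() - $cache['lastrequest'] > $maxAge) {
    $cache['tweets']      = fetch_latest_tweets(); // placeholder: hit the Twitter API here
    $cache['lastrequest'] = time();
    file_put_contents($cacheFile, serialize($cache), LOCK_EX);
}

foreach ($cache['tweets'] as $tweet) {
    // print each tweet however the design requires
    echo htmlspecialchars($tweet), '<br>';
}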
I'm building a PHP application which has a database containing approximately 140 URLs.
The goal is to download a copy of the contents of these web pages.
I've already written code which reads the URL's from my database then uses curl to grab a copy of the page. It then gets everything between <body> </body>, and writes it to a file. It also takes into account redirects, e.g. if I go to a URL and the response code is 302, it will follow the appropriate link. So far so good.
This all works OK for a number of URLs (maybe 20 or so), but then my script times out because max_execution_time is set to 30 seconds. I don't want to override or increase this, as I feel that's a poor solution.
I've thought of two workarounds but would like to know if these are a good/bad approach, or if there are better ways.
The first approach is to use a LIMIT on the database query so that it splits the task up into 20 rows at a time (i.e. run the script 7 separate times, if there were 140 rows). I understand that with this approach it still needs to call the script, download.php, 7 separate times, so I would need to pass in the LIMIT figures.
The second is to have a script where I pass in the ID of each individual database record I want the URL for (e.g. download.php?id=2) and then make multiple AJAX requests to it (download.php?id=2, download.php?id=3, download.php?id=4, etc.). Based on $_GET['id'] it could do a query to find the URL in the database, etc. In theory I'd be doing 140 separate requests, as it's a one-request-per-URL setup.
I've read some other posts which have pointed to queueing systems, but these are beyond my knowledge. If this is the best way then is there a particular system which is worth taking a look at?
Any help would be appreciated.
Edit: There are 140 URLs at the moment, and this is likely to increase over time. So I'm looking for a solution that will scale without hitting any timeout limits.
I don't agree with your logic. If the script is running OK and just needs more time to finish, give it more time; that is not a poor solution. What you are suggesting makes things more complicated and will not scale well as your URLs increase.
I would suggest moving your script to the command line, where there is no time limit, and not using the browser to execute it.
When you have a list of unknown size which will take an unknown amount of time, asynchronous calls are the way to go.
Split your script into a single page download (like you proposed, download.php?id=X).
From the "main" script get the list from the database, iterate over it and send an ajax call to the script for each one. As all the calls will be fired all at once, check for your bandwidth and CPU time. You could break it into "X active task" using the success callback.
You can either set the download.php file to return success data or to save it to a database with the id of the website and the result of the call. I recommend the later because you can then just leave the main script and grab the results at a later time.
You can't increase the time limit indefinitively and can't wait indefinitively time to complete the request, so you need a "fire and forget" and that's what asynchronous call does best.
As #apokryfos pointed out, depending on the timing of this sort of "backups" you could fit this into a task scheduler (like chron). If you call it "on demand", put it in a gui, if you call it "every x time" put a chron task pointing the main script, it will do the same.
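A bare-bones sketch of the per-URL worker (download.php) that the AJAX calls would hit; the DSN, credentials and table/column names are assumptions:

<?php
// download.php?id=N - fetch one URL and store the result
$id  = isset($_GET['id']) ? (int) $_GET['id'] : 0;
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass'); // placeholder credentials

$stmt = $pdo->prepare('SELECT url FROM urls WHERE id = ?');
$stmt->execute(array($id));
$url = $stmt->fetchColumn();

if ($url === false) {
    exit(json_encode(array('id' => $id, 'ok' => false, 'error' => 'unknown id')));
}

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the 302 redirects you mentioned
$body = curl_exec($ch);
curl_close($ch);

$save = $pdo->prepare('UPDATE urls SET body = ?, fetched_at = NOW() WHERE id = ?');
$save->execute(array($body, $id));

header('Content-Type: application/json');
echo json_encode(array('id' => $id, 'ok' => $body !== false));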
What you are describing sounds like a job for the console. The browser is for the users to see, but your task is something that the programmer will run, so use the console. Or schedule the file to run with a cron job or anything similar that is handled by the developer.
Execute all the requests simultaneously using stream_socket_client(). Save all the socket handles in an array.
Then loop over that array with stream_select() to read the responses.
It's almost like multi-tasking within PHP.
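A rough sketch of that pattern; the hosts are just examples, it speaks plain HTTP only, and the connects here are still made one by one - only the reads are multiplexed with stream_select():

<?php
// Open one socket per URL (hard-coded example hosts)
$targets   = array('www.example.com' => '/', 'www.example.org' => '/');
$sockets   = array();
$responses = array();

foreach ($targets as $host => $path) {
    $s = stream_socket_client("tcp://$host:80", $errno, $errstr, 10);
    if ($s === false) {
        continue;
    }
    stream_set_blocking($s, false);
    fwrite($s, "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    $sockets[$host]   = $s;
    $responses[$host] = '';
}

// Multiplex the reads until every response is complete
while ($sockets) {
    $read   = array_values($sockets);
    $write  = null;
    $except = null;
    if (stream_select($read, $write, $except, 30) === false) {
        break;
    }
    foreach ($read as $s) {
        $host = array_search($s, $sockets, true);
        $responses[$host] .= fread($s, 8192);
        if (feof($s)) {
            fclose($s);
            unset($sockets[$host]);
        }
    }
}
// $responses now holds the raw HTTP responses (headers + body) keyed by host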
I am currently working on a project that runs online tournaments. Normally admins of the site will generate brackets when it is time for the tournaments to start, but we have run into inconsistent start times, etc., and I am looking into proper ways to automate this process.
I have looked into running cron jobs every X minutes to check if a bracket needs to be generated, but I am worried about issues with overlapping cron jobs, having to create/manage cron jobs through cPanel, etc.
I was thinking about other solutions and thought it would be great if a user could load a page, the backend checks timestamps and determines if the bracket should be generated. An event is then fired/set to begin the auto-generation process elsewhere so it does not impact user load times. I just do not know the best route of going about this.
PS: I just need an idea of the direction I should be looking in so I can learn how to solve this issue; I am not looking to copy and paste code. I just haven't been able to find anything. All of my search results provide cron job examples.
EDIT
After thinking about things, could something like this work?
$(document).ready(function() {
    // fire-and-forget call to the route that checks timestamps and kicks off bracket generation
    $.ajax('Full Url Path Here');
});
I don't need to pass user input or return any data; I simply need a way to fire an event, and it would be easy to include this only when needed via a helper class. Also, I won't necessarily have to worry about users attempting to access it: I can restrict the route to AJAX-only requests, and since nothing is used as input or returned as output, what can happen?
You could do it every time a user loads a page (idea not tested, but theoretically possible):
1) Create a file and store the timestamp of the last time you updated the database.
2) Every time a user loads a page, read that timestamp and check if 15 minutes have passed.
3) If 15 minutes passed: Run a background script (with shell_exec?) that will do what you want and update the timestamp when it's done executing.
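A rough sketch of those three steps; the file name and the shell command are assumptions, and generate_brackets.php would hold the actual generation logic:

<?php
$stampFile = __DIR__ . '/last_bracket_run.txt';
$interval  = 15 * 60; // fifteen minutes

$last = is_file($stampFile) ? (int) file_get_contents($stampFile) : 0;

if (time() - $last >= $interval) {
    // write the timestamp first so concurrent page loads don't all fire the job
    file_put_contents($stampFile, time(), LOCK_EX);

    // run the heavy work in the background so this page load isn't slowed down
    shell_exec('php ' . __DIR__ . '/generate_brackets.php > /dev/null 2>&1 &');
}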
One obvious flaw with this system is that if you have no visitors in, let's say, a 30-minute window, you will miss 2 updates. Though I guess that if you have no visitors, there is also no point in generating brackets?
I have access to a traffic data server from where I get XML files with the information that I need. (Example: Point A to Point B: travel time 20 min, distance 18 miles, etc.)
I download the XML file (which is archived), extract it, then process it and store it in a DB. I only allow the XML file to be downloaded on request, and only if 5 minutes have passed since the last download. The XML on the traffic server gets updated every 30 seconds to maybe 5 minutes. During the 5-minute period, any user requesting the webpage will retrieve the data from the DB (no update), therefore limiting the number of requests made to the traffic server.
My problem with my current approach is that when I get new XML file the whole process takes some time (3-7 seconds) and that makes the user wait too much before getting anything. However, when no XML download is needed and all the data gets displayed straight from the DB the process is very fast.
The archived XML is about 100-200KB, while the unarchived one is about 2MB. The XML file contains traffic data from 3 or 4 states, while I only need the data for one state. That is why I currently use the DB method.
Is this approach a good one? I was wondering if I should just extract the data directly from the downloaded XML file for every request and limit somehow how often the XML file gets downloaded from the traffic server. Or, can anyone point me to a better way?
Sample of the XML file
This is how it looks on my website
You need to download the XML each time it changes, but only if you'll have active users within the period of time it takes to download the file.
As you can't foresee the future, you don't know whether or not you'll get a request from a user within the next 7 seconds.
You can however possibly find out with a HEAD request if the XML file has been updated.
So you could create yourself a service that downloads the XML from the remote system each time it changes. In case the data is indeed not needed that often, you can configure that service not to check and/or download as often.
The rest of your system can stay independent of it, as long as you can work out the best configuration for the download service from statistical analysis of your users' behaviour.
If you need this to be even more real-time, you'd have to configure the new service based on changing data from the other system, and then you'd need to start interchanging data bidirectionally between those two systems, which is more complicated and can lead to more side effects. But from the numbers you give, that level of detail probably isn't needed anyway, so I wouldn't worry about it.
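As for the HEAD request idea, here's a sketch using curl, assuming the traffic server sends a usable Last-Modified header (not all servers do); the URL and the import_traffic_xml() helper are placeholders:

<?php
// Ask for headers only and compare Last-Modified with what we stored last time
$ch = curl_init('https://traffic.example.com/data.xml.gz'); // placeholder URL
curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FILETIME, true);        // ask curl to parse Last-Modified
curl_exec($ch);
$remoteTime = curl_getinfo($ch, CURLINFO_FILETIME); // -1 if the server didn't say
curl_close($ch);

$stampFile = __DIR__ . '/xml_last_modified.txt';
$stored    = is_file($stampFile) ? (int) file_get_contents($stampFile) : 0;

if ($remoteTime > 0 && $remoteTime > $stored) {
    import_traffic_xml(); // placeholder: download, extract and import the XML into the DB
    file_put_contents($stampFile, $remoteTime);
}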
I am thinking about converting a Visual Basic application (that takes pipe-delimited files and imports them into a Microsoft SQL database) into a PHP page. One of these files is on average about 9 megabytes in size. (I couldn't be accurate about the number of lines involved, but I'd say it's about 20 thousand.)
One of the advantages being that any changes made to the page would be automatically 'deployed' to the intended user (currently when I make changes to the visual basic app, which was originally created by someone else, I have to put the latest version on all the PCs of the people that use it).
Problem is these imports can take like two minutes to complete. Two minutes is a long time in website-time so I would like to provide feedback to the user to indicate that the page hasn't failed/timed out and is definitely doing something.
The best idea that I can think of so far is to use AJAX to do it incrementally. Say, import 1000 records at a time, then feed back, import the next 1000, feed back, and so on.
Are there better ways of doing this sort of thing that wouldn't require me to learn new programming languages or download apps or libraries?
You don't have to make the Visual Basic -> PHP switch. You can stick with VB syntax in ASP or ASP.NET applications. With an ASP based solution, you can reuse plenty of the existing code so it won't be learning a new language / starting from scratch.
As for how to present a long running process to the user, you're looking for "Asynchronous Handlers" - the basic premise being the user visits a web page (A) which starts the process page (B).
(A) initiates (B), reports starting to the user and sets the page to reload in n seconds.
(B) does all the heavy lifting - just like your existing VB app. Progress is stored in some shared space (a flat file, a database, a memory cache, etc)
Upon reload, (A) reports current progress of (B) by read-only accessing the shared space (B) is keeping progress in.
Scope of (A):
Look for a running (B) process - report its status if found, or initiate a fresh (B) process. Since (B) appears to be based on the existence of files (from your description), you might grant (A) the ability to determine if there's any point in calling (B) or not (i.e. if files exist, call (B), else report: nothing to do), or you may wish to keep the scopes entirely separate and always call (B).
Report progress of (B).
Should take very little time to execute; you may want to include an HTTP refresh header so the user automatically gets updates.
Scope of (B):
Same as existing VB script – look for files, load… yada yada yada.
Should take similar time to execute as existing VB script (2 minutes)
Potential Improvements:
(A) could use an AJAX interface, so instead of a page-reload (HTTP refresh), an AJAX call is made every n seconds and simply the status box is updated. Some sort of animated icon (swirling wheel) will give the user the impression something is going on between refreshes.
It sounds like (B) could benefit from a multi-threaded approach (loading multiple files at once) depending on whether the files are related. As pointed out by Ponies, there may be a better strategy for such a load, but that's a different topic altogether :)
Some sort of semaphore/flag approach may be required if page (A) could be hit simultaneously by multiple users and (B) takes a few seconds to start up and report status.
Both (A) and (B) can be developed in PHP or ASP technology.
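If you go the PHP route, here is a very small sketch of the shared-space idea, with (B) writing its progress to a flat file and (A) reading it for the AJAX/refresh poll; the file location and JSON shape are assumptions:

<?php
// progress.php - shared-space helpers used by (B), plus the read-only side for (A)

define('PROGRESS_FILE', sys_get_temp_dir() . '/import_progress.json'); // assumed location

// (B) calls this every few hundred rows while importing
function report_progress($done, $total) {
    $payload = array('done' => $done, 'total' => $total, 'updated' => time());
    file_put_contents(PROGRESS_FILE, json_encode($payload), LOCK_EX);
}

// (A) includes this file and echoes the current status for its AJAX poll / page refresh
function current_progress() {
    if (!is_file(PROGRESS_FILE)) {
        return array('done' => 0, 'total' => 0, 'updated' => 0);
    }
    return json_decode(file_get_contents(PROGRESS_FILE), true);
}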
How are you importing the data into the database? Ideally, you should be using SQL Server's BULK INSERT, which would likely speed things up. But it's still a matter of uploading the file for parsing...
I don't think it's worth the effort to get status of insertions - most sites only display an animated gif/etc (like the hourglass, etc) to indicate that the system is processing things but no real details.
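For what it's worth, if you do end up calling BULK INSERT from PHP with the sqlsrv driver, it looks roughly like this; the server, database and table names are placeholders, and the file path must be visible to the SQL Server machine:

<?php
$conn = sqlsrv_connect('localhost\\SQLEXPRESS', array( // placeholder server name
    'Database' => 'MyDb',
    'UID'      => 'user',
    'PWD'      => 'pass',
));

// pipe-delimited file, one record per line
$sql = "BULK INSERT dbo.ImportTable
        FROM 'C:\\imports\\datafile.txt'
        WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\\n')";

$result = sqlsrv_query($conn, $sql);
if ($result === false) {
    print_r(sqlsrv_errors());
}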
I am working on my bachelor's project and I'm trying to figure out a simple dilemma.
It's a website of a football club. There is some information that will be fetched from the website of national football association (basically league table and matches history). I'm trying to decide the best way to store this fetched data. I'm thinking about two possibilities:
1) I will set up a cron job that will run let's say every hour. It will call a script that will fetch the league table and all other data from the website and store them in a flat file.
2) I will use Zend_Cache object to do the same, except the data will be stored in cached files. The cache will get updated about every hour as well.
Which is the better approach?
I think the answer can be found in why you want to cache the file. Is it to place minimal load on the external server by only updating the cache every so often, or is it to keep pages loading fast because the file takes a long time to download or process?
If it's only to respect the other server, and fetching/processing the page takes little noticeable time for the end user, I'd just implement Zend_Cache. It's simple: you don't have to worry about one script downloading the page and another script loading the downloaded data (plus the cron job).
If the cache is also needed because fetching/processing the page is significant, I'd still use Zend_Cache; however, I'd set the cache to expire every 2 hours, and set up a cron job (or something similar) to manually update the cache every hour. Sure, this adds back the complexity of two scripts (or at least adding a request flag to manually refresh the cache), but should the cron job fail, you're still fine.
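For reference, the Zend_Cache side of either option is only a few lines. A rough sketch; fetch_league_table() is a placeholder for the code that actually pulls the data from the association's site:

<?php
require_once 'Zend/Cache.php';

$cache = Zend_Cache::factory(
    'Core',
    'File',
    array('lifetime' => 3600, 'automatic_serialization' => true), // one hour
    array('cache_dir' => '/tmp/cache/')
);

if (($table = $cache->load('league_table')) === false) {
    $table = fetch_league_table(); // placeholder: fetch and parse the remote data
    $cache->save($table, 'league_table');
}

// $table is now either freshly fetched or served from the cache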
Well, if you choose 1 it adds some complexity, because you have to use cron as well (not that cron is overly complex), and then you have to test that the data file is complete before using it, or deal with moving files from a temp location after they have been downloaded and parsed into the proper format.
If you use 2, it eliminates much of 1, except that on the request where the cache has expired you have to wait for the download/parse.
I would say 1 is the better option, but 2 is going to be easier to implement and less prone to error. That said, it's fairly trivial to implement things in the cron script to prevent the negatives I describe. So I would probably go with 1.