I have a website set up to check if certain channels on ustream.com and livestream.com are live or not.
The way it currently works is that it queries a database table of channels and then, for each channel, uses the ustream.com or livestream.com API to check whether it is live, and it does this every time someone visits the site.
The problem is that in just the first half-day after going live the site received over 350 visits, and because people keep refreshing the page that added up to 15,000 hits. Which is great, except that it is overloading the database.
I am thinking I need to use a cron job to build a cached page that refreshes every few minutes, so that the database and the APIs are queried far fewer times per hour.
Can someone give me some pointers on how to go about doing that? I know how to set up a cron job, but how do I create a cached page that is constantly being updated?
Or if you have a better solution I'd like to hear it.
This isn't a paid job, I built it as a free service to help people know which livestreamers are currently live at any particular moment.
Here is a link to the site,
http://freedomfighterstreams.com/
I am using Codeigniter MVC framework.
I would recommend this caching library for CodeIgniter, since it's more customizable than CI's built-in one:
https://github.com/philsturgeon/codeigniter-cache/blob/master/README.md
You'll find useful examples there.
Your results will be cached as files.
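Whichever cache backend you end up using, the cron-driven approach described in the question boils down to a script that rebuilds a cache file every few minutes and a page that only ever reads that file. A minimal sketch in plain PHP, where get_channels() and check_if_live() are hypothetical stand-ins for the channels-table query and the ustream/livestream API calls:

```php
<?php
// build_cache.php -- run from cron every few minutes, e.g.:
//   */5 * * * * php /path/to/build_cache.php
// get_channels() and check_if_live() are hypothetical helpers standing in
// for your channels-table query and the ustream/livestream API calls.

$statuses = array();
foreach (get_channels() as $channel) {
    $statuses[$channel->name] = check_if_live($channel);
}

// Write atomically so a visitor never reads a half-written file.
$tmp = '/path/to/cache/live_status.json.tmp';
file_put_contents($tmp, json_encode($statuses));
rename($tmp, '/path/to/cache/live_status.json');
```

The controller then just reads the pre-built file instead of touching the database or the APIs on every visit:

```php
// In the CodeIgniter controller: no DB or API calls per request,
// just read the file the cron job keeps fresh.
$statuses = json_decode(file_get_contents('/path/to/cache/live_status.json'), true);
$this->load->view('channels', array('statuses' => $statuses));
```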
(Posting this as an answer, not a comment, because of the privileges.)
This issue has been quite the brain teaser for me for a little while. Apologies if I write quite a lot; I just want to be clear about what I've already tried, etc.
I will explain the idea of my problem as simply as possible, as the complexities are pretty irrelevant.
We may have up to 80-90 users on the site at any one time. They will likely all be accessing the same page, which I will call result.php. They will, however, be accessing different records via a GET variable for the ID (result.php?ID=456). It is likely that fewer than 3 or 4 users will be on an individual record at any one time, and there are upwards of 10,000 records.
I need to know, with less than a 20-25 second margin of error (this is very important), who is on that particular ID on that page, and update the page accordingly. Removing their name once they are no longer on the page, once again as soon as possible.
At the moment, I am using a jQuery script which calls a php file, reading from a database of "Currently Accessing" usernames who are accessing this particular ID, and only if the date at which they accessed it is within the last 25 seconds. The file will also remove all entries older than 5 minutes, to keep the table tidy.
This was alright with 20 or 30 users, but now that the load has more than doubled, I am noticing that this is a particularly slow method.
What other methods are available to me? Has anyone had any experience in a similar situation?
Everything we use at the moment is coded in PHP with a little jQuery. We are running on a server managed offsite by a hosting company, if that matters.
I have come across something called Comet or a Comet Server which sounds like it could potentially be of assistance, but it also sounds extremely complicated for my purposes and far beyond my understanding at the moment.
Look into websockets for a real-time socket connection. You could use websockets to push out updates in real time (instead of polling), ensuring that changes to the 'currently online users' list are sent within milliseconds.
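If you want to experiment with this in PHP itself, one option (my suggestion, not something the answer above requires) is the Ratchet library. A minimal sketch of a socket server that simply relays every message a client sends, e.g. "viewing:456", to all other connected clients:

```php
<?php
// A minimal Ratchet-based relay. Assumes Ratchet is installed via
// Composer (cboden/ratchet); every message a client sends is broadcast
// to all other connected clients.
require __DIR__ . '/vendor/autoload.php';

use Ratchet\MessageComponentInterface;
use Ratchet\ConnectionInterface;
use Ratchet\Server\IoServer;
use Ratchet\Http\HttpServer;
use Ratchet\WebSocket\WsServer;

class ViewerRelay implements MessageComponentInterface
{
    protected $clients;

    public function __construct()
    {
        $this->clients = new \SplObjectStorage;
    }

    public function onOpen(ConnectionInterface $conn)
    {
        $this->clients->attach($conn);
    }

    public function onMessage(ConnectionInterface $from, $msg)
    {
        foreach ($this->clients as $client) {
            if ($client !== $from) {
                $client->send($msg); // push the update out immediately
            }
        }
    }

    public function onClose(ConnectionInterface $conn)
    {
        $this->clients->detach($conn);
    }

    public function onError(ConnectionInterface $conn, \Exception $e)
    {
        $conn->close();
    }
}

IoServer::factory(new HttpServer(new WsServer(new ViewerRelay())), 8080)->run();
```

The page's JavaScript would open a WebSocket to this server, announce which ID it is viewing, and listen for the same announcements from others.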
What you want is an in-memory cache with a service layer that maintains the state of activity on the site. Using memcached might be a good starting point. Your pseudo-code would be something like the steps below (a rough PHP sketch follows the list):
On page access, make a call to CurrentUserService
CurrentUserService takes as a parameter the page you're accessing and who you are.
Each time you call it, it removes whatever you were accessing before from the cache.
Then it adds what you're currently accessing.
Then it compiles a list of who else is accessing the same thing based on the current state in the cache.
It returns this list, which your page processes and displays.
If you record when someone accesses a page, you can set a timeout for when the service stops 'counting' them as accessing the page.
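A rough translation of that pseudo-code using the PHP Memcached extension; the class name, key names and TTLs are illustrative assumptions, not a drop-in implementation:

```php
<?php
// Sketch of the "CurrentUserService" idea on top of the Memcached extension.
// Keys assume simple usernames and numeric page IDs.
class CurrentUserService
{
    const TIMEOUT = 25; // seconds before a viewer stops being "counted"

    private $cache;

    public function __construct()
    {
        $this->cache = new Memcached();
        $this->cache->addServer('127.0.0.1', 11211);
    }

    // Called on every page access; returns who else is viewing this page.
    public function touch($username, $pageId)
    {
        // 1. Remove the user from whatever page they were viewing before.
        $previous = $this->cache->get("last_page_$username");
        if ($previous !== false && $previous != $pageId) {
            $viewers = $this->cache->get("viewers_$previous") ?: array();
            unset($viewers[$username]);
            $this->cache->set("viewers_$previous", $viewers, 300);
        }

        // 2. Add them to the page they are viewing now.
        $viewers = $this->cache->get("viewers_$pageId") ?: array();
        $viewers[$username] = time();
        $this->cache->set("viewers_$pageId", $viewers, 300);
        $this->cache->set("last_page_$username", $pageId, 300);

        // 3. Return everyone seen on this page within the timeout window.
        $cutoff = time() - self::TIMEOUT;
        return array_keys(array_filter($viewers, function ($seen) use ($cutoff) {
            return $seen > $cutoff;
        }));
    }
}
```

Each load of result.php would then call touch($username, $_GET['ID']) and display the returned list, replacing the per-request database table.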
I'm working on a campus project using PHP and CodeIgniter.
It's only a project for my assignment.
It turns out that I have to limit the bandwidth used by each user (an application user, not a Linux user).
I have no idea how to implement this.
I wonder if someone knows the logic or has ever worked on a similar task.
What I basically need is: how do I track that user's (application user, not Linux user) bandwidth?
Should I count every request and response for that user?
How do I count image and static file downloads for a specific application user?
Any hints are greatly appreciated.
Thanks,
Ivan
One of the only ways I can think of (using PHP) is to parse the webserver's access.log and add up the bandwidth for each client.
The next time a page is loaded and the client has reached a set limit, you can then run whatever code you want.
Parsing the log on each page load does seem like it would be time consuming, though.
That's how some website statistics programs get that information.
EDIT
Also, some log files get archived at certain points. Mine, for example, gets a fresh start every Sunday at 6am, so if a user was browsing during that time their access history would disappear after 6. Saving each client's bandwidth total in a database is a way to keep that information permanently.
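A hedged sketch of the log-parsing part, assuming the common Apache combined log format where the response size in bytes follows the status code (the log path and regex would need to match your server's actual LogFormat, and tying totals to an application user rather than an IP would mean logging the username somewhere as well):

```php
<?php
// Sum the bytes served to each client IP from an Apache combined-format log.
// Lines whose size field is "-" (no body sent) are simply skipped.
$totals = array();

foreach (file('/var/log/apache2/access.log') as $line) {
    // e.g.: 1.2.3.4 - - [10/Oct/2011:13:55:36 +0000] "GET /img/x.png HTTP/1.1" 200 12345 ...
    if (preg_match('/^(\S+) \S+ \S+ \[.*?\] ".*?" \d{3} (\d+)/', $line, $m)) {
        $ip    = $m[1];
        $bytes = (int) $m[2];
        $totals[$ip] = (isset($totals[$ip]) ? $totals[$ip] : 0) + $bytes;
    }
}

// Persist the totals (e.g. UPDATE users SET bytes_used = bytes_used + ...)
// so they survive log rotation, as mentioned above.
print_r($totals);
```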
I'm constructing a social networking website and I'm currently testing Ajax on certain features.
Right now, I have 3 separate calls coming from my include (which is on every page) that will check the database for new messages, achievements, and notifications and update the specific divs.
My biggest worry is that 3 separate calls would eventually lead to performance issues.
I have no way of really seeing how a large user-base would affect this, because my site is currently in beta and I haven't advertised yet. So I have a limited number of people to test with. When I do advertise and gain more members, I don't want to run into any hiccups.
I currently have the calls being made every 20 seconds. I was hoping someone could give me some advice on how long I should set the intervals. I have no way of currently knowing if 20 seconds would be too much, or if I could even set it to 10 and be fine.
Any advice would be appreciated. Thanks.
Perhaps you should look into server push, so the data is pushed to the client when there are new messages and so on. That way, instead of constant polling consuming resources, resources are only used when there actually is something new to push to the client.
Some server push servers are:
Ajax Push Engine
Nginx Push Module
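If a dedicated push server is more than you want to set up, a simpler (if cruder) flavour of the same idea is long polling: the PHP endpoint holds the request open until something changes or a timeout passes, so the client still sees updates quickly without firing a request every few seconds. A rough sketch, where count_new_items() is a hypothetical helper that checks the database for messages, achievements and notifications newer than a timestamp:

```php
<?php
// poll.php - long-polling sketch: hold the connection open for up to
// 30 seconds and answer as soon as there is something new to report.
set_time_limit(40);

$since    = isset($_GET['since']) ? (int) $_GET['since'] : time();
$deadline = time() + 30;

do {
    $counts = count_new_items($since); // e.g. array('messages' => 2, 'achievements' => 0, ...)
    if (array_sum($counts) > 0) {
        break;
    }
    sleep(2); // don't hit the database more than every couple of seconds
} while (time() < $deadline);

header('Content-Type: application/json');
echo json_encode(array('ts' => time(), 'counts' => $counts));
```

The jQuery on the page would call poll.php again as soon as each response arrives, passing back the returned timestamp.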
For a homework project, I'm creating a PHP driven website which main function is aggregating news about various university courses.
The main problem is this: (almost) every course has its own website. These are usually just plain HTML or built using some simple free CMS.
As a student participating in 6-7 courses, you go through 6-7 websites almost every day, checking whether there is any news. The idea behind the project is that you don't have to do that; instead, you just check the aggregation site.
My idea is the following: each time a student logs in, go through his course list. For every course, fetch its website (recursively, like with wget) and create a hash value of it. If the hash is different from the one stored in the database, we know the site has changed, and we notify the student.
So, what do you think, is this reasonable way to achieve the functionality?
And if yes, what is (technically) the best way to go about this? I was checking php_curl, but I don't know if it can get a website recursively.
Furthermore, there's a slight problem: I have somewhat limited resources, only a few MB of quota on the public (university) server. However, if that's a big problem, I could use a separate hosting solution.
Thanks :)
Just use file_get_contents, or cURL if you absolutely have to (for example, if you need cookies).
You can use your hashing trick to check for modifications but it's not very elegant. What you want to know is when was it last changed. I doubt this information is on the website, but maybe they offer an RSS feed or some webservice or API you can use for this purpose.
Don't worry about doing recursive requests. Just make a new request each time.
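A small sketch of that fetch-and-hash cycle; $courseUrls and the load_hash()/save_hash()/notify_student() helpers are placeholders for however you store and report things:

```php
<?php
// For each course URL, fetch the page, hash it, and compare with the
// hash stored on the previous run.
foreach ($courseUrls as $courseId => $url) {
    $html = @file_get_contents($url);
    if ($html === false) {
        continue; // site unreachable right now, try again next time
    }

    $hash = md5($html);
    if ($hash !== load_hash($courseId)) {
        save_hash($courseId, $hash); // remember the new state
        notify_student($courseId);   // the course page has changed
    }
}
```

Storing one 32-character hash per course keeps you well within a few MB of quota.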
"When all else fails, build a scraper"
I am really close to finishing up on a project that I've been working on. I have done websites before, but never on my own and never a site that involved user generated data.
I have been reading up on things that should be considered before you go live and I have some questions.
1) Staging... (deploying updates without affecting users). I'm not really sure what this would entail, since I'm sure that any type of update would affect users in some way. Does this mean some type of temporary downtime for every update? Can somebody please explain this, and a solution to it as well?
2) Limits... I'm using the Kohana framework and its Auth module for logging users in. I was wondering if this already has some type of limit (on login attempts) built in, and if not, what would be the best way to implement this (save attempts in the database, a cookie, etc.). If this is not what's meant by limits, can somebody elaborate?
Edit: I think a good way to do this would be to freeze logins for a period of time (say 15 minutes), or to display a captcha, after a handful (10 or so) of unsuccessful login attempts.
3) Caching... Like I said, this is my first site built around user content. Considering that, should I cache it?
4) Back Ups... How often should I backup my (MySQL) database, and how should I back it up (MySQL export?).
The site is currently up, though not finished, if anybody wants to look at it and see whether anything pops out that should be looked at or fixed: Clashing Thoughts.
If there is anything else I overlooked that's not already in the list linked to above, please let me know.
Edit: If anybody has any advice on getting the word out (marketing), I'd appreciate that too.
Thanks.
EDIT: I've made the changes, and the site is now live.
1) Most sites that push frequent updates, or that have a massive update that will take some time, use a staging/beta domain such as beta.example.com that is restricted to staff until the changes are released to the main site for the public.
2) If you use cookies, then users can just disable cookies and have infinite login attempts, so your efforts will go to waste. So yeah, use the database instead; how you want it to keep track is up to you (see the sketch after this list).
3) It depends on what type of content it is and how much there is. If you have a lot of different variables, keep only the key variables that identify the data in the database and keep all the additional data in a cache, so that database queries run faster. You will be able to quickly find the results you want and then just open the cache file associated with them.
4) It's up to you; it really depends on traffic. If you're only getting 2 or 3 new pieces of data per day, you probably don't want to waste the time and space backing it up every day. P.S. MySQL exports work just fine; I find them much easier to import and work with.
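On point 2, a database-backed throttle can be as small as a single table of failed attempts. A sketch with made-up table and column names (Kohana's query builder could stand in for the raw PDO here):

```php
<?php
// Hypothetical table: failed_logins(username VARCHAR, attempted_at DATETIME).
// Before verifying the password, refuse the login if there have been
// 10 or more failures for this username within the last 15 minutes.
function too_many_attempts(PDO $db, $username)
{
    $stmt = $db->prepare(
        'SELECT COUNT(*) FROM failed_logins
         WHERE username = ? AND attempted_at > NOW() - INTERVAL 15 MINUTE'
    );
    $stmt->execute(array($username));
    return $stmt->fetchColumn() >= 10;
}

// Call this whenever a password check fails.
function record_failed_attempt(PDO $db, $username)
{
    $stmt = $db->prepare(
        'INSERT INTO failed_logins (username, attempted_at) VALUES (?, NOW())'
    );
    $stmt->execute(array($username));
}
```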
1) You will want to keep taking your site down for updates to a minimum. I tend to let jobs build up, and then do a big update at the end of the month.
2) In terms of limiting login attempts: cookies are simple to implement but not fool-proof; they will stop the majority of users but can be easily circumvented, so it would be best to choose another way. Using the database is better, though a bit more complicated to implement, and it does add a little more load on the database.
3) Caching depends greatly on how often content is updated or changes. If content changes a lot it may not be worth caching, but if it is largely static then something like memcache or APC will be of use.
4) You should always make regular backups. I do one daily via a cron job to my home server, although a weekly one would suffice.
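For point 4, the usual way to automate a MySQL export is a nightly cron job around mysqldump; paths and credentials below are placeholders:

```
# Dump and gzip the database every night at 3am, keeping one file per weekday.
0 3 * * * mysqldump -u backup_user -p'secret' mydb | gzip > /backups/mydb-$(date +\%a).sql.gz
```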
Side notes: YSlow indicates that:
you are not serving up Expires headers on your CSS or images (this causes pages to load more slowly and costs you more bandwidth)
you have CSS files that are not served with gzip compression (same issues; both points are addressed in the snippet below)
also consider moving your static content (CSS, images, etc.) to a separate domain or a CDN for faster load times
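Assuming the site runs on Apache with mod_expires and mod_deflate available (an assumption on my part), both of the first two points can be handled with a few lines of .htaccess:

```
# Far-future expiry for static assets (mod_expires)
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType text/css "access plus 1 month"
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType image/gif "access plus 1 month"
</IfModule>

# Gzip-compress text responses (mod_deflate)
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/css application/javascript
</IfModule>
```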