I would like to create a caching system that will bypass some mechanisms in order to improve the performance.
I have some examples:
1-) I have a dynamic PHP page that is updated every hour. The page content is the same for every user. So in this case I can either:
a) create an HTML page that is regenerated every hour. In this case I would like to bypass PHP, so there should be a static page, and if the database is updated, a new HTML file is generated. How can I do this? I can create a crontab script that generates the HTML file, but that does not seem like an elegant way.
b) cache the output in memory, so the web server updates the content every hour. I guess I need a memory cache module for the web server. There is an unofficial memcache module for lighttpd, but it does not seem stable. I have also heard of a memcache module for nginx, but I don't know whether this is possible with it. This way seems more elegant, but how? Any ideas? (Again, I would like to bypass PHP in this case.)
Another example is a dynamic PHP page that is updated every hour, where only the user details part is fully dynamic (a user logs in or out and sees his/her status in that section).
Again, how can I create a caching system for this page? I think that if I can find a solution for the first example, I can load that part with AJAX and use the same solution for the rest of the page. Am I correct?
edit: I guess I was not clear. I would like to bypass PHP completely. The PHP script will run once an hour; after that, no PHP call will be made. I would like to remove its overhead.
Thanks in advance,
Go with static HTML. Every hour simply update a static HTML file with your output. You'll want to use an hourly cron to run a PHP script to fopen() and fwrite() to the file. There's no need to hit PHP to retrieve the page whatsoever. Simply make a .htaccess mod_rewrite redirection rule for that particular page to maintain your current URL naming.
Although not very elegant, static HTML with gzip compression is, to me, more efficient and would use less bandwidth.
An example of using cron to run a PHP script hourly:
# Run this command in your console to open the crontab editor
crontab -e
Enter these values:
01 * * * * php -f /path/to/staticHtmlCreater.php > /dev/null
The last portion discards the script's standard output, so cron does not mail it to you. This cron entry runs at one minute past every hour.
UPDATE
Either I missed the section regarding your dynamic user profile information or it was added after my initial comment. If you are only using a single server, I would suggest switching to APC, which provides both opcode caching and a caching mechanism faster than memcached (for a single-server application). If the user's profile data is below the fold (below the user's window view), you could potentially wait to make the AJAX request until the user scrolls down to a specified point. You can see this functionality used on the Facebook status page.
If this is just a single web server, you could just use PHP's APC module to cache the contents of the page. It's not really designed to cache entire pages, but it should do in a pinch.
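Edit: I forgot to mention that APC isn't shipped with PHP, but it can be installed from PECL. It was planned to ship as part of PHP 6, which was never released; since PHP 5.5, PHP bundles OPcache for opcode caching, and APCu covers the user-data cache.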
A nice way to do it is to store the static content in a file. Things should work like this:
your PHP script is called
if your content file was last modified more than 1 hour ago (check with filemtime($yourFile))
re-generate the content + store it in the file + send it back to the client
else
send the file content as is (with readfile($yourFile), or echo file_get_contents($yourFile))
Works great in every case, even under heavy load.
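The flow above can be sketched in PHP as follows. This is a minimal sketch rather than a definitive implementation: the function name cachedPage and its parameters are hypothetical, and the one-hour lifetime is passed in as $maxAge. Note that PHP still runs on every request with this approach; only the expensive generation is skipped.

```php
<?php
// Hypothetical helper: return the cached HTML, regenerating it when the
// cache file is missing or older than $maxAge seconds.
function cachedPage(string $cacheFile, int $maxAge, callable $generate): string
{
    clearstatcache(true, $cacheFile);   // make sure filemtime() is not stale

    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
        return file_get_contents($cacheFile);   // fresh enough: serve as is
    }

    $html = $generate();                          // the expensive part
    file_put_contents($cacheFile, $html, LOCK_EX); // store for later requests
    return $html;
}
```

A page script would then do something like echo cachedPage('/path/to/page.html', 3600, $buildPage); where $buildPage is whatever callable produces the page.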
I'm creating a website that requires a file to be generated and stored on the server periodically (an XML feed for iTunes). The page is generated using ExpressionEngine. I discovered that the website's current server has a very restricted cPanel and doesn't have access to cron.
So I'm considering two options: find an alternative way to access cron jobs (if they are available), or find an alternative way to create regularly scheduled tasks.
Regarding the first option, how would I go about determining if a server has cron available? I'm not sure how useful this would be anyway since I don't think the server allows shell access (it's a very basic setup for people who aren't tech savvy).
Regarding the second option, a friend mentioned to me that the functionality of cronjobs can just be done in PHP. How would I go about this?
Or am I perhaps overthinking this? The page in ExpressionEngine that outputs the XML file is domain.com/itunes/itunes_feed. This just has some EE tags that output the relevant XML, and the resulting page is in .xml format. Is it enough to just submit the above URL to iTunes, or does it have to be a URL to an actual pre-existing file on the server?
Option 1
Simply contact your host and ask whether they support cron jobs and, if so, how to set them up.
Option 2
I only just set up my own set of cron jobs yesterday..
Create a PHP file that runs the code you want.
Set up an account on https://www.easycron.com/
Upload your PHP file to EasyCron.
Set the times at which you would like your PHP code to run.
Simple as that! Does that make sense?
This isn't the best method for the task, but how would you run a Zend view as a cron job?
The view is used to generate a file using an output buffer and then save the file on the server, it runs once a day.
Would it just be a matter of calling the url of action of the controller with curl:
50 23 * * * curl http://pclite.com/statistics/generate
The application requires authentication, though.
If you are the admin of the server, I would not do it that way.
I would write a PHP page that uses curl to download and save the file. Since you are coding it in PHP, you can simulate the login procedure: put the username and password in the PHP file, and make sure the file is saved where you want it.
Then I would call this PHP file once a day from cron using Lynx, a text browser. That way you don't have to record any username or password in the cron job, and the PHP file fetches whatever you want.
Since you said that this is not the best method for such a task, I won't say it again :D
If the cron job runs on the same server as your web server, you could check the client IP and skip authentication if it matches the server's own address. If an "attacker" can send requests to the application from your own server, you have a far more serious security issue anyway.
So, yes. If you skip authentication when the IP is the same, you just need to call the URL.
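A minimal sketch of that check, under stated assumptions: the function name isRequestFromThisServer is hypothetical, it receives the $_SERVER array, and it assumes no reverse proxy sits in front of the application (behind a proxy, REMOTE_ADDR is the proxy's address, which would make every request look local).

```php
<?php
// Hypothetical helper: true when the request comes from the machine the
// application itself runs on (loopback or the server's own address).
function isRequestFromThisServer(array $server): bool
{
    $client = $server['REMOTE_ADDR'] ?? null;
    if ($client === null) {
        return false;   // no client address available: do not trust it
    }

    $local = ['127.0.0.1', '::1'];
    if (isset($server['SERVER_ADDR'])) {
        $local[] = $server['SERVER_ADDR'];
    }

    return in_array($client, $local, true);
}
```

The action behind the cron URL could call isRequestFromThisServer($_SERVER) and fall back to the normal login check when it returns false.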
As with any other class, Zend_View can be instantiated from anywhere, and in particular it can render to a variable. This means that you do not need to call the whole web application if all you want to do is render something.
As stated, your other option is to have an entry point in the application and call it to get the result. But if you're just saving a file to the server, it could be seen as a better approach to have the cron job be a standalone script that does all of this itself. This way you also take some load off your web application. That may not be so relevant now, but what if in the future you want to call this endpoint several times per day for a lot of users?
So, you can create a CLI script that includes Zend_View and renders within itself. As always with Zend Framework, the implementation choice is left entirely to you.
I've seen many web apps that implement progress bars, however, my question is related to the non-uploading variety.
Many PHP web applications (phpBB, Joomla, etc.) implement a "smart" installer to not only guide you through the installation of the software, but also keep you informed of what it's currently doing. For instance, if the installer was creating SQL tables or writing configuration files, it would report this without asking you to click. (Basically, sit-back-and-relax installation.)
Another good example is with Joomla's Akeeba Backup (formerly Joomla Pack). When you perform a backup of your Joomla installation, it makes a full archive of the installation directory. This, however, takes a long time, and hence requires updates on the progress. However, the server itself has a limit on PHP script execution time, and so it seems that either
1) The backup script is able to bypass it.
2) Some temp data is stored so that the archive is appended to (if archive appending is possible).
3) Client scripts call the server's PHP every so often to perform actions.
My general guess (not specific to Akeeba) is with #3, that is:
Web page JS -> POST foo/installer.php?doaction=1 SESSID=foo2
Server -> ERRCODE SUCCESS
Web page JS -> POST foo/installer.php?doaction=2 SESSID=foo2
Server -> ERRCODE SUCCESS
Web page JS -> POST foo/installer.php?doaction=3 SESSID=foo2
Server -> ERRCODE SUCCESS
Web page JS -> POST foo/installer.php?doaction=4 SESSID=foo2
Server -> ERRCODE FAIL Reason: Configuration.php not writable!
Web page JS -> Show error to user
I'm 99% sure this isn't the case, since that would create a very nasty dependency on the user to have Javascript enabled.
I guess my question boils down to the following:
How are long running PHP scripts (on web servers, of course) handled and are able to "stay alive" past the PHP maximum execution time? If they don't "cheat", how are they able to split the task up at hand? (I notice that Akeeba Backup does acknowledge the PHP maximum execution time limit, but I don't want to dig too deep to find such code.)
How is the progress displayed via AJAX+PHP? I've read that people use a file to indicate progress, but to me that seems "dirty" and puts a bit of strain on I/O, especially for live servers with 10,000+ visitors running the aforementioned script.
The environment for this script is where safe_mode is enabled, and the limit is generally 30 seconds. (Basically, a restrictive, free $0 host.) This script is aimed at all audiences (will be made public), so I have no power over what host it will be on. (And this assumes that I'm not going to blame the end user for having a bad host.)
I don't necessarily need code examples (although they are very much appreciated!), I just need to know the logic flow for implementing this.
Generally, this sort of thing is stored in the $_SESSION variable. As far as the execution timeout goes, what I typically do is have a JavaScript timer that, every x seconds, sets the innerHTML of an update-status div to the output of a PHP script. When this script executes, it doesn't "wait" or anything like that. It merely grabs the current status from the session (which is updated by the script(s) actually performing the installation) and outputs it in whatever fancy form I see fit (status bar, etc.).
I wouldn't recommend any direct I/O for status updates. You're correct in that it is messy and inefficient. I'd say $_SESSION is definitely the way to go here.
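One possible logic flow, sketched under assumptions: the work is split into small steps, each request runs steps until a time budget well below the execution limit is spent, and the position is kept in a state array that would live in $_SESSION. The function name runSteps and the state key 'next' are hypothetical. Note that with PHP's default file-based session handler, a request holds a lock on the session until it finishes or calls session_write_close(), so a separate status request will wait for a running worker request unless the worker releases the session.

```php
<?php
// Hypothetical step runner: each HTTP request performs work until the time
// budget is used up, then reports progress so the client can call again.
function runSteps(array $steps, array &$state, float $budgetSeconds): array
{
    $started = microtime(true);
    $state['next'] = $state['next'] ?? 0;   // index of the next step to run
    $total = count($steps);

    while ($state['next'] < $total) {
        $steps[$state['next']]();           // run one unit of work
        $state['next']++;
        if (microtime(true) - $started >= $budgetSeconds) {
            break;                          // stop before max_execution_time
        }
    }

    return [
        'done'    => $state['next'] >= $total,
        'percent' => $total === 0 ? 100 : (int) floor($state['next'] * 100 / $total),
    ];
}
```

The client (JavaScript, or a meta refresh for users without JavaScript) would keep requesting the script until 'done' is true, with the state array initialised to an empty array in the session before the first call.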
I have a web service backed by a MySQL database. I would like to create a cache file to improve performance. The idea is that once in a while we read data from the DB and generate a text file. My question is:
What if a client-side user is accessing the file while we are generating it?
We are using LAMP. PHP has flock() to handle concurrency problems, but my understanding is that it only helps when two PHP processes access the file simultaneously. Our case is different.
I don't know whether this will cause issues at all. If so, how can I prevent it?
Thanks,
Don't use locking.
If your cache file is /tmp/cache.txt, always regenerate the cache to /tmp/cache2.txt and then do a
mv /tmp/cache2.txt /tmp/cache.txt
or
rename('/tmp/cache2.txt','/tmp/cache.txt')
The mv/rename operation is atomic if it happens inside the same filesystem; no locking is needed.
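A minimal sketch of the write-then-rename approach, assuming a POSIX filesystem. The function name writeCacheAtomically is hypothetical; the temporary file is created in the same directory as the target so that both are on the same filesystem.

```php
<?php
// Hypothetical helper: write $contents to a temporary file, then rename it
// over the cache file so readers see either the old or the new version.
function writeCacheAtomically(string $cacheFile, string $contents): bool
{
    // Same directory as the target, hence the same filesystem.
    $tmp = tempnam(dirname($cacheFile), 'cache');
    if ($tmp === false) {
        return false;
    }

    if (file_put_contents($tmp, $contents) === false) {
        unlink($tmp);
        return false;
    }

    chmod($tmp, 0644);   // tempnam() creates the file readable by owner only

    return rename($tmp, $cacheFile);
}
```

Readers simply open the cache file as usual; a reader that already has the old file open keeps reading the old contents.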
All sorts of optimisation options here:
1) Are you using the MySQL query cache? That can take a huge load off the database to start with.
2) You could pull the file through a web proxy like squid (or Apache configured as a reverse caching proxy). I do this all the time and it's a really handy technique - generate the file by fetching it from a url using wget for example (that way you can have it in a cron job). The web proxy takes care of either delivering the same file that was there before, or regenerating it if needs be.
3) You don't want to be rolling your own file locking solution in this scenario.
Depending on your scenario, you could also consider caching pages in something like memcache, which is fantastic for high-traffic scenarios, but possibly beyond the scope of this question.
You can use A -> B switching to avoid this issue.
E.g.: keep two copies of the cache file, A and B, and have the program read them via a symlink, C.
When the program builds the cache, it modifies the file that is not "current", i.e. if C links to A, it updates B. Once the update is complete, it switches the symlink to B.
Next time, it updates A and switches the symlink to A once the update is complete.
This way clients never read a file while it is being updated.
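The switch itself could be sketched like this, assuming a POSIX system where symlinks are available. The function name switchCacheLink is hypothetical. Instead of deleting and recreating C, which would leave a moment with no link at all, the sketch creates a temporary link and renames it over C.

```php
<?php
// Hypothetical helper: point the symlink $link at $newTarget without a
// window in which the link is missing.
function switchCacheLink(string $link, string $newTarget): bool
{
    $tmpLink = $link . '.tmp.' . getmypid();

    if (!symlink($newTarget, $tmpLink)) {
        return false;
    }

    // Renaming the temporary link over the old one replaces it in one step.
    if (!rename($tmpLink, $link)) {
        unlink($tmpLink);
        return false;
    }

    return true;
}
```

The updater would write the non-current file first (A or B) and call switchCacheLink only once that write has finished.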
When a client accesses the file, it reads it as it is at that moment.
flock() is for when two PHP processes access the file simultaneously.
I would solve it like this:
While generating the new text file, save it to a temporary file (cache.tmp), that way the old file (cache.txt) is being accessed like before.
When generation is done, delete the old file and rename the new file.
To avoid problems during that short period of time, your code should check whether cache.txt exists and retry for a short period of time.
Trivial, but that should do the trick.
I've heard of two caching techniques for the PHP code:
When a PHP script generates output, it stores it in local files. When the script is called again, it checks whether a file with the previous output exists and, if so, returns the content of this file. It's mostly done by playing around with the "output buffer". Something like this is described in this article.
Using a kind of opcode caching plugin, where the compiled PHP code is stored in memory. The most popular of these is APC; there is also eAccelerator.
Now the question is whether it makes any sense to use both techniques or just one of them. I think that the first method is a bit complicated and time-consuming to implement, while the second one seems simple: you just need to install the module.
I use PHP 5.3 (PHP-FPM) on Ubuntu/Debian.
BTW, are there any other methods to cache PHP code or output, which I didn't mention here? Are they worth considering?
You should always have an opcode cache like APC. Its purpose is to skip re-parsing and re-compiling your code on each request (PHP 5.5 and later bundle one, OPcache). On older versions, it's a simple install on any server and doesn't require you to write or change any code.
However, caching opcodes doesn't do anything to speed up the actual execution of your code. Your bottlenecks are usually time spent talking to databases or reading from and writing to disk. Caching the output of your program avoids unnecessary resource usage and can speed up responses by orders of magnitude.
You can do output caching many different ways at many different places along your stack. The first place you can do it is in your own code, as you suggested, by buffering output, writing it to a file, and reading from that file on subsequent requests.
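The buffering step can be sketched as below. This is only an illustration of capturing output with PHP's output buffer; the function name renderToString is hypothetical, and what is done with the captured string (writing it to a cache file, for instance) is left to the caller.

```php
<?php
// Hypothetical helper: run $render and return everything it printed
// instead of sending it to the client.
function renderToString(callable $render): string
{
    ob_start();                 // start capturing output
    try {
        $render();
    } catch (Throwable $e) {
        ob_end_clean();         // drop partial output, keep buffers balanced
        throw $e;
    }
    return (string) ob_get_clean();   // fetch the buffer and stop capturing
}
```

A caller could write the result with file_put_contents() on a cache miss and echo it, then serve the stored file directly on later requests.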
That still requires executing your PHP code on each request, though. You can cache output at the web server level to skip that as well. Crafting a set of mod_rewrite rules will allow Apache to serve the static files instead of the PHP code when they exist, but you'll have to regenerate the cached versions manually or with a scheduled task, since your PHP code won't be running on each request to do so.
You can also stick a proxy in front of your web server and use that to cache output. Varnish is a popular choice these days and can serve hundreds of times more requests per second with caching than Apache running your PHP script on the same server. The cache is created and configured at the proxy level, so when it expires, the request passes through to your script, which runs as it normally would to generate the new version of the page.
For me, opcode caches, file caches, etc. are mostly useful for reducing database calls.
They can't speed up your code itself. However, they improve page load times by serving your visitors from the cache.
For me, APC is good enough on a VPS or dedicated server when I need to cache widgets or objects to spare my MySQL server.
If I have more than two servers, I like to use Memcached, which is good at caching in memory. However, it is up to you; not everyone likes Memcached, and not everyone likes APC.
For caching whole web pages: I have run a lot of WordPress sites, and I have used APC, Memcache and file caches through cache plugins like W3 Total Cache. In my own experience, a file cache is good for caching whole pages, and a memory cache is good for caching objects.
A file cache will increase your CPU load if your hard drive is slow, and a memory cache is terrible if you don't have enough memory on your VPS.
An SSD reads and writes files very quickly, but memory is always faster. However, a human can't see the difference between these speeds. Pick a method based on your project and your server (RAM, disk); or are you on shared web hosting?
If I am on shared hosting, without root permission and without access to php.ini, I like to use phpFastCache, a simple file cache with only set, get, stats and delete methods.
In addition, I like to use .htaccess to cache static files like images, JS and CSS via HTTP headers. This speeds up your pages for visitors and saves server bandwidth.
And if you cache whole pages, using .htaccess to redirect to the static .html cache is a great thing.
APC or some other opcode cache may end up bundled with PHP, but I am sure that no cache can speed up your code itself. Caches are used to:
Reduce database / query calls.
Improve page load speed by serving from the cache.
Save your API transactions (like Bing) or cURL requests...
etc...
A lot of times, when it comes to PHP web applications, the database is the bottleneck. As such, one of the best things you can do is to use memcached to cache results in memory. You can also use something like xhprof to profile your code, and really dial in on what's taking the most time.
Yes, those are two different cache-techniques, and you've understood them correctly.
But beware regarding 1):
1.) Caching script-generated output to files or proxies may cause problems if the content changes rapidly.
2.) XCache exists too and is easy to install on Ubuntu.
regards,
/t
I don't know if this really would work, but I came across a performance problem with a PHP script that I had. I have a plain text file that stores data as a title and a URL tab separated with each record separated by a new line. My script grabs the file at each URL and saves it to its own folder.
Then I have another page that actually displays the local files (in this case, pictures) and I use a preg_replace() to change the output of each line from the remote url to a relative one so that it can be displayed by the server. My tab separated file is now over 1 MB and it takes a few SECONDS to do the preg_replace(), so I decided to look into output caching. I couldn't find anything definitive, so I figured I would try my own hand at it and here's what I came up with:
When I request the page to view stuff locally, I try to read it from a variable in the global scope. If this is empty, it might be that the application hasn't run yet and the global needs to be populated. In that case, I read from an output file (a plain HTML file that literally contains everything to output), save the contents to the global variable, and then display the output from the global.
Now, when the script runs to update the tab separated file, it updates the output file and the global variable. This way, the portion of the script that actually does the stuff that runs slowly only runs when the data is being updated.
Now I haven't tried this yet, but theoretically, this should improve my performance a lot, although it does actually still run the script, but the data would never be out of date and I should get a much better load time.
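A rough sketch of the read path, with one caveat stated up front: a PHP global (or static) variable lives only for the duration of a single request, so it cannot carry data from one page view to the next; across requests it is the output file that acts as the cache. The function name cachedOutput is hypothetical, and the in-memory copy below only saves repeated reads within the same request.

```php
<?php
// Hypothetical helper: return the pre-rendered output, reading the file at
// most once per request. The static array is reset on every new request.
function cachedOutput(string $outputFile): string
{
    static $memo = [];   // per-request memory, not shared between requests

    if (!isset($memo[$outputFile])) {
        // Assumes the updater script has already written the output file.
        $memo[$outputFile] = (string) file_get_contents($outputFile);
    }

    return $memo[$outputFile];
}
```

To keep the data in memory between requests, a shared store such as APC or memcached, both mentioned earlier in this thread, would be needed instead of a global.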
Hope this helps.