Caching a page to multiple files or a single file - PHP

I'm developing a wiki-like web app, and each page has 5 individually editable content parts.
I have a simple caching class that saves the rendered parts to a file.
If a part of the page has not changed, it loads the cached copy; if it has changed, it re-renders that part and then saves it to the cache.
Because the page has 5 parts that are separately editable, I am saving each part as its own file, so when an edit is made, only that part is re-rendered and cached.
But this also means that on every page load, 5 files are read and included.
Is it better to do it this way, or save the entire page in a single cache file?

That depends on multiple factors, I guess...
site load
file sizes of pieces
update frequency
are updates likely to happen on more than one subitem?
...
I would optimize for viewing the site, because that happens a lot more frequently than making a change, I suppose. So I would cache it in one file.
The only way to know is to measure it... with the microtime() function, you can compare the script execution time at different points and across different test runs.
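For example, a minimal timing sketch; the part names and cache paths here are made up for illustration:

    <?php
    // Compare reading five cached part files against one combined cache file.
    // Adjust the hypothetical paths below to your own cache layout.
    $parts = ['header', 'nav', 'body', 'sidebar', 'footer'];

    $start = microtime(true);
    $page = '';
    foreach ($parts as $part) {
        $page .= file_get_contents("cache/page-42.$part.html");
    }
    $multiFile = microtime(true) - $start;

    $start = microtime(true);
    $page = file_get_contents('cache/page-42.full.html');
    $singleFile = microtime(true) - $start;

    printf("5 part files: %.5f s, single file: %.5f s\n", $multiFile, $singleFile);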


Is it possible to cache a whole HTML/PHP page, except a part of it, in the browser?

Is it possible to cache a whole page in the browser, but not cache one part of it?
For example, I have a page with a date. Only the date changes daily; the rest of the page never changes. How should I cache such a page in the browser?
Can a page cached in the browser contain dynamic content?
Actually, I am new to caching and do not understand how it works with dynamic content and browser caching. Is it right that, from the moment a dynamic page is cached, it is always served as it was at the time of caching, and new dynamic content is not displayed?
I am not asking about server-side caching, only about browser-side caching.
There is no specific mechanism for excluding part of a page from caching, but you can use a few tricks:
You can cache the whole page and change the part you want with an iframe.
You can cache the whole page and change the part you want with AJAX.
You can cache the whole page and change the part you want with a JavaScript file.
I have not checked the iframe solution and am not sure it works.
If the JS files are cached, you can add a version to their file names, like scripts.v.2.3.js, and load them by version name.
You can't really cache "part" of a file dynamically. You can, however, cache separate assets: the more you split your page into separate assets, the more of them you can cache individually.
Your index.html could have a cache setting of no-cache (using the Cache-Control header).
Your logo.png could have a long cache lifetime of, say, 10 days.
Now if you want certain elements to change while the core stays the same, then I believe this is a better job for JavaScript. What you could do is write a JavaScript function to display the date; then you can fully cache the HTML page and the JavaScript file, and since the raw content never changes (only the manipulation of the DOM does), you have very few client->server requests.
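As a small sketch of that idea (the file name and cache lifetime are arbitrary), the page itself can be cached while a few lines of JavaScript render the changing date:

    <?php
    // page.php (hypothetical): let the browser cache the whole page for a day;
    // the changing date is rendered client-side, so the cached copy never looks stale.
    header('Cache-Control: public, max-age=86400');
    ?>
    <p>Today is <span id="today"></span>.</p>
    <script>
      // Fill in the dynamic part in the browser instead of on the server.
      document.getElementById('today').textContent = new Date().toDateString();
    </script>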

Overwriting files properly

I am trying to manage caching on a heavily used web page written in PHP. I have marked some cacheable sections of PHP code which I want to execute only once, ahead of time, when the administrator makes changes in the CMS. For this, I use the following method:
I have a file (for example "index-source.php") with some marked areas of PHP code which can be interpreted on their own. When the admin changes some settings, these marked parts are executed and replaced with their result (for example, the MySQL queries that read menu items from the DB are replaced with the generated HTML menu). The resulting file is saved as a new "index.php", which still contains some PHP code that can't be optimized by caching.
Now to my problem
Assume that this server is heavily loaded, meaning there are, for example, 100 requests per second, each of which require()s the file index.php. If I use file_put_contents() to overwrite this index.php with the new pre-cached version, is there any risk that some requests will be interrupted because of a locked or not fully overwritten file? Basically, I want to update my PHP file and be sure that PHP will include either the complete old version or the complete new version of that file, or wait a few milliseconds until the file is overwritten. I don't want PHP to fail the require or load a partially overwritten file.
Is that possible? Thanks
file_put_contents is not what you want.
Have a look at this project, and dive into the source to get a feel for what challenges you may have to face as well as the solution chosen.
https://github.com/PHPSocialNetwork/phpfastcache
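If you prefer to stay with plain files, a common pattern (a sketch, not necessarily what phpfastcache does internally) is to write the new version to a temporary file in the same directory and then rename() it over the old one. On POSIX systems, rename() within the same filesystem is atomic, so a concurrent require() sees either the complete old file or the complete new one:

    <?php
    // Hypothetical helper: atomically replace a cached PHP file.
    function replace_cached_file(string $target, string $contents): void
    {
        // Write to a temp file in the same directory (same filesystem as $target).
        $tmp = tempnam(dirname($target), 'cache_');
        file_put_contents($tmp, $contents);
        // rename() swaps the path atomically; readers never see a partial file.
        rename($tmp, $target);
        // If OPcache is enabled, you may also want opcache_invalidate($target, true);
    }

    // Usage (hypothetical): $newSource holds the freshly generated index.php contents.
    // replace_cached_file(__DIR__ . '/index.php', $newSource);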

What is full page caching

I am working on Magento (EE). I came across the term "full page caching". Can anyone please tell me what "full page caching" is in Magento or in Zend?
Caching the full page?
As in, everything that is generated by a script is written out as HTML and served next time, improving performance (by reducing load and not having to generate the page for every visit).
However this comes as the disadvantage of having occasionally out of date pages.
If your website isn't getting a significant number of hits, enabling full page caching (caching all of the HTML) is going to make little difference.
Magento is a shopping website CMS.
It simply means that, to boost the performance of the website, it will cache (store) the HTML output of a particular page. For example, your homepage: every time a user opens your homepage, the PHP behind it has to fetch the information from the database, render it with the related views and then output the final HTML, which is a lot of processing.
Instead, caching will store the HTML output and, when a user comes in, show the cached HTML output rather than going to the database again. However, the lifetime of the cache has to be defined, although modern cache plugins will check for changes in the output data and update the cache accordingly.
Simple?
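A bare-bones sketch of the same idea in plain PHP (this is only an illustration, not Magento's actual full page cache; the paths and lifetime are examples):

    <?php
    // Serve the stored HTML if it is fresh, otherwise render the page, store it, send it.
    $cacheFile = __DIR__ . '/cache/' . md5($_SERVER['REQUEST_URI']) . '.html';
    $ttl = 600; // seconds

    if (is_file($cacheFile) && filemtime($cacheFile) > time() - $ttl) {
        readfile($cacheFile);               // cache hit: no database work at all
        exit;
    }

    ob_start();                             // cache miss: render the page normally
    require __DIR__ . '/render_page.php';   // hypothetical script doing the DB queries and templating
    file_put_contents($cacheFile, ob_get_contents(), LOCK_EX);
    ob_end_flush();                         // send the freshly rendered page to the visitor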

Keeping track of links or references to image files and deleting unused ones (PHP/Database)

I need a way to remove "unused" images from my filesystem, i.e. images that are never accessed from any point in my website (doesn't matter if I break external links. I might disable external hotlinking altogether). What's the best way of going about this? Regular users can add multiple attachments to topics/posts and content contributers can bulk upload large numbers of images which can be used in articles or image galleries.
The problem is that the images could be referenced in any of the following ways:
From user content (text/html, possibly Markdown or BBCode) stored in the database
Hardcoded into an HTML page
Hardcoded into a PHP file
Hardcoded into a CSS file
As an "attachment" field in a database table, usually containing only the filename itself with no path, because the application assumes that it would be in a certain folder.
And to top it off, the path of the image could be an absolute or relative HTTP or PHP path and may or may not be built with string concatenation in PHP.
So obviously find/replace or regexing the database or filesystem is out of the question. But luckily for you and me, this system isn't fully implemented yet and I don't need anything that deals with an existing hoard of images. I just need to set up some efficient structure that will allow this in the future.
Some ideas I've thought of:
Intercepting the HTTP request for the image with PHP, and keeping track of the HTTP_REFERER. The problem with this is that just because no one has clicked on a link at the time of checking this doesn't mean the link doesn't exist.
Use extreme database normalization - i.e. make a table for images and use foreign keys for anything that references it. However this would result in making a metric craptonne of many-to-many relationships (and the crosstables) in addition to being impractical for any regular user to use.
Backup all the images and delete them, and check every single 404 request and run a script each time that attempts to find the image from the backup folder and puts it in the "real" folder. The problem is that this cache would have to be purged every so often and the server might be strained when rebuilding the cache.
Ideas/suggestions? Is this just something you have to ignore and live with even if you're making a site with a ridiculous amount of images? Even if it's not worth it, how would something work just for proof-of-concept (I added the garbage-collection tag just because this might be going into that area conceptually).
I will admit that my experience with this was simpler than yours. I had no 'user generated content' so to speak, and my images were all in templates or in the database with a full path. But what I did was create a perl script that:
Analyzed my HTML templates, database table, and CSS, and generated a list of files. In the HTML it looked for <img> tags, in the CSS it looked for any .png, .jp*g, or .gif regex matches, and the tables were easy because I had an Image table for the image data.
The file list was then deduplicated.
The script iterated through the list and wrote a CSV like filename,(CSS filename|HTML filename|DBTABLE),(exists|notexists) for auditing.
In another iteration it renamed all files not in the list by appending .del to the filename.
After regression testing I called the script with a -docleanup flag, which told it to go through and delete all the .del-suffixed files.
If for whatever reason an image was tagged as .del and shouldn't have been, I just manually renamed it back to its original form.
A couple of notes: I realize that I could have made this script 'smoother' and done multiple things in multiple steps, but its use grew over time and I wanted clearly delineated processing steps so it couldn't ever run amok. I used the CSV to go back and clean up the information where the image didn't exist.
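For a rough PHP equivalent of the audit step (assuming, purely for illustration, that images live in one directory and that references can be found with a simple filename regex):

    <?php
    // Sketch: collect image filenames referenced in templates/CSS, then list
    // files on disk that were never referenced. Directory names are examples.
    $referenced = [];
    foreach (glob('templates/*.{html,php,css}', GLOB_BRACE) as $source) {
        if (preg_match_all('/[\w.-]+\.(?:png|jpe?g|gif)/i', file_get_contents($source), $m)) {
            foreach ($m[0] as $name) {
                $referenced[strtolower($name)] = true;
            }
        }
    }

    foreach (glob('images/*.{png,jpg,jpeg,gif}', GLOB_BRACE) as $file) {
        if (!isset($referenced[strtolower(basename($file))])) {
            // Audit first (e.g. write to a CSV); rename to "$file.del" only after review.
            echo "unreferenced: $file\n";
        }
    }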

Viewing large text file in a browser

I need to write a text file viewer (not the directory tree, but the actual file contents) for use in a browser. It will be used to view large files. I want to give the user the ability to actually, ummm, browse the file, i.e. prev page & next page buttons, while each page shows only a portion of the file.
Two questions:
Is there any way to pass the file descriptor through POST (or something) so that on each page I can keep reading from an already open file, and not start all over again (again - huge files)?
Is there a way to read the file backwards? That would be very useful for browsing back in a file.
Any other implementation ideas are very welcome. Thanks
Keeping the file open between requests is not a good idea - you don't have to "start all over again" - just maintain an offset and use fseek() to jump to that offset. That way, you can also implement the "backwards jumping".
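A minimal sketch of that offset-based approach (the parameter name, path and chunk size here are arbitrary):

    <?php
    // viewer.php?offset=0 — serve one "page" of a large file starting at the given offset.
    $file   = '/path/to/huge.log';                     // example path
    $length = 64 * 1024;                               // one screenful; tune to taste
    $offset = max(0, (int) ($_GET['offset'] ?? 0));

    $fh = fopen($file, 'rb');
    fseek($fh, $offset);
    $chunk = fread($fh, $length);
    fclose($fh);

    header('Content-Type: text/plain; charset=utf-8');
    echo $chunk;
    // "Next" links to ?offset = $offset + strlen($chunk);
    // "Previous" links to ?offset = max(0, $offset - $length) — that is the backwards jump.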
Cut your huge files into smaller files once, and then serve the small files to the user.
You should consider pagination. If you're concerned about the user being frustrated by needing to click "next" too often, you could make each chunk reasonably large (so a normal reader pages every 20min).
Another option is the Chunked-Encoding transfer type: Wikipedia Entry. This would allow your server to respond quickly and give the user something to read while it streams the rest of the file over the network (rather than the server needing to read in the whole file and send it all at once). This could dramatically improve the perceived performance compared to serving the files normally, but it still consumes a lot of bandwidth for your server.
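On the PHP side, streaming the file incrementally might look roughly like this (most servers apply chunked transfer encoding automatically when no Content-Length header is set):

    <?php
    // Stream a large file in small pieces so the first bytes reach the browser quickly.
    $fh = fopen('/path/to/huge.log', 'rb');   // example path
    while (!feof($fh)) {
        echo fread($fh, 8192);
        if (ob_get_level() > 0) {
            ob_flush();                        // drain PHP's output buffer if one is active
        }
        flush();                               // push this piece to the client before reading the next one
    }
    fclose($fh);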
You might be able to simulate a large document with Javascript and AJAX, but only send pieces at a time for better performance.
Consider sending a few pages worth of your document and attaching listeners to the scroll event of your browser. Over time or as the user scrolls down you AJAX more chunks. This creates a few annoying UX edge cases, like:
Scroll bar indicates a much smaller document than there actually is
You might be able to avoid this by filling in the bottom of your document with many page breaks, but it'll be difficult to make the length perfect.
Scrolling past the point of currently-available content will show a blank page.
You could detect this using JavaScript and display a "loading" icon to let the user know what's going on.
Built-in "find" feature doesn't work
Hard to avoid this without the user downloading the entire document, but you could provide your own search feature for them to use instead (not as good but perhaps adequate).
Really though, you're probably best off with pagination with medium-sized pages. It's a very well understood design pattern that's relatively easy (compared to the other options, at least) to implement and make fast.
Hope that helps!
