Storing TYPO3 cache files in AWS EFS - php

I have an application built on TYPO3 CMS and it's hosted on AWS. The architecture is like this:
Auto scaling Group
Load Balancer
Two instances hosting the application
Sometimes when opening the application, we get a PHP error:
The temporary cache file /var/www/htdocs/typo3temp/Cache/Code/fluid_template/file.tmp could not be written
The exception is generated by the file FileBackend.php:
if ($result === false) {
    throw new \TYPO3\CMS\Core\Cache\Exception('The cache file "' . $cacheEntryPathAndFilename . '" could not be written.', 1222361632);
}
The full content of the file HERE.
I guess the reason for this error is that the load balancer is sending traffic to the other instance, where the file was not generated. Am I right?
To resolve this error, I am thinking that instead of storing the temporary files on the volumes of the instances, we should store them on a shared EFS.
Is that technically possible, TYPO3-wise?
P.S: TYPO3 v6.2
Thank you.

This answer applies mostly to the Fluid cache specifically; it can also be used for other caches, but it should not be used for essential caches like "cache_core". You've got a couple of options:
You can mark the particular cache "frozen", which means no new entries will be allowed. An exception is thrown when trying to set an entry in a frozen cache.
You can distribute the specific filesystem location containing the files (but you should be careful not to distribute too much, as this can negatively impact performance on some scaling setups).
I think you want the second option here which allows expired or new Fluid cache entries to be written, but distributes the resulting class files so they may be loaded on any of the slaves.
The frozen cache is only an option if you are able to 100% pre-generate all the compiled Fluid classes (which depending on your setup may not even be possible with a full crawl of the site).
Unfortunately you are on TYPO3 6.2 - had you been on v8 I would certainly recommend https://github.com/NamelessCoder/typo3-cms-fluid-precompiler-module as a nice way to control where those classes get compiled and stored in a cache, and catching all templates (when they exist in standard paths).
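To make the second option concrete, here is a minimal sketch of pointing the compiled Fluid class cache at a directory on the shared EFS mount via typo3conf/AdditionalConfiguration.php. The mount path is an assumption, and you should verify that the file backend in your exact 6.2 release accepts an absolute cacheDirectory outside the web root:
// typo3conf/AdditionalConfiguration.php
// Store compiled Fluid classes on the shared EFS mount instead of the
// instance-local typo3temp/Cache/Code/fluid_template directory.
$GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations']['fluid_template']['options']['cacheDirectory']
    = '/mnt/efs/typo3temp/Cache/Code/fluid_template/';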

In the fileadmin file storage record, you may specify the path for temporary files - and use a different file storage for them.
So create a new AWS file storage, and set the fileadmin temp directory to a directory in that new file storage. See the documentation at https://docs.typo3.org/typo3cms/FileAbstractionLayerReference/singlehtml/Index.html#processed-files

Related

Too many cache file in system/cache folder codeigniter

I have a music site developed on CodeIgniter with the YouTube API and no database. Recently I have noticed that too many files are getting generated in the system/cache folder.
How can I stop these cache files from being generated? Note that I am not using any caching method.
First of all, system/cache is NOT the default CodeIgniter cache directory. I would not store cache files in there, as it's the framework's main folder. The default is application/cache.
By default, CI does NOT cache anything, so your application was built with caching.
You said you don't use a database, so I assume it's not DB caching.
Check your app for something like "$this->load->driver('cache'".
Caching can be loaded WITHOUT additional parameters, like
$this->load->driver('cache'); OR with parameters, like
$this->load->driver('cache', array('adapter' => 'xxx'));
https://www.codeigniter.com/userguide3/libraries/caching.html
Now, in your app search for $this->cache->save OR $this->cache->file->save
If you find this, it means you are using CI caching.
The problem is, you cannot just remove the cache loading, as the app initializes the cache object and your app will fail unless you rewrite all places where caching is used.
Now, you have a few choices:
1. Just clean the cache dir with some script periodically via cron.
2. You can change the cache folder permissions to NON-writable, which will generate warnings in your logs, so logging should be disabled. This is not the right way IMHO, as it can cause fatal errors/blank pages, but it is one possible solution. If file caching is used, this should not cause issues, while in other cases it could.
3. You can extend the caching library and simply create an empty cache SAVE function. In that case your files will not be saved (see the sketch after this list).
4. You can cache to memcached, if you have it on your server. Well, if your caching is written like $this->cache->file->{operation}, then you will need to update all of those to $this->cache->memcached->{operation}. If caching is written like $this->cache->{operation}, you can just adjust the configuration to something like
$this->load->driver('cache', array('adapter' => 'memcached'));
and set the memcached server info in the config file (config/memcached.php).
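A minimal sketch of option 3, assuming CodeIgniter 3's driver-extension conventions and the default 'MY_' subclass prefix (adjust class and file names to your install):
// application/libraries/Cache/drivers/MY_Cache_file.php
class MY_Cache_file extends CI_Cache_file {

    // Override save() so nothing is ever written to the cache directory,
    // while the rest of the app keeps working as if the write succeeded.
    public function save($id, $data, $ttl = 60, $raw = FALSE)
    {
        return TRUE;
    }
}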
You said you are not using any caching method, so you should not find any of the code I put above.
The last thing I can think of is
$this->output->cache(xxx);
where xxx is the cache time in minutes.
It forces the entire generated page to be cached;
if you find such lines, you can try commenting them out and see what happens.
https://www.codeigniter.com/user_guide/general/caching.html
There is a good note there: "If you change configuration options that might affect your output, you have to manually delete your cache files."
If absolutely none of the examples above is found, you might be using some custom-made caching.
Good luck!
Put this in your common controller
$this->output->delete_cache();

Laravel Cache: how do identify what writes data to it

Recently, in one of our projects that use Laravel 5.4, we have noticed that some data is being cached in /storage/framework/cache/data - we are using file cache. The contents of the files in the cache are things like: 1529533237i:1;. Several files are created in the cache throughout the day with content similar to that. So many files are created that we have to clean this cache periodically in order not to run into disk space issues by running out of inodes.
I know that an alternative to using file cache are things like Redis or Memcache, but the issue is, we're not sure what is this data being cached or what component of the project is caching it. We do use several external libraries so it could be one of many, but we don't know for sure what. I've already looked into all configuration files of the project, but couldn't identify anything that is obviously controlling data caching.
Are there any recommendations on trying to identify which piece of code is writing this data so we can better handle the caching of this data, whatever it may be?
Laravel has several events that are dispatched during caching.
Create a new listener that listens on the Illuminate\Cache\Events\KeyWritten event. You could log the backtrace to see exactly what leads to specific items being cached.
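A minimal sketch of such a listener, registered inline in the boot() method of the default EventServiceProvider of a Laravel 5.4 app (the log format is just an example):
// app/Providers/EventServiceProvider.php
use Illuminate\Cache\Events\KeyWritten;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;

public function boot()
{
    parent::boot();

    Event::listen(KeyWritten::class, function (KeyWritten $event) {
        // Log the cache key together with a backtrace so the code path
        // that wrote the entry shows up in storage/logs/laravel.log.
        Log::debug('Cache key written: ' . $event->key, [
            'backtrace' => (new \Exception())->getTraceAsString(),
        ]);
    });
}
Once you know which keys (and therefore which library) are responsible, you can decide whether to disable that caching or move it to Redis/Memcached.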

Caching a fully dynamic website

I made a dynamic site that has over 20,000 pages, and once a page is created there is no need to update it for at least one month, or even a year. So I'm caching every page when it is first created and then delivering it from a static HTML file.
I'm running a PHP script (the whole CMS is in PHP) with if (file_exists($filename)) to first search for the filename from the URL in the cache-files directory; if it matches, I deliver it, otherwise I generate the page and cache it for later use. Though the site is dynamic, my URLs do not contain ?&=; I achieve this by using - as a separator and breaking the URL into an array.
What I want to know is: will it create any problem to search for a file in such a huge directory?
I saw a few Q&As like this where it says that there should not be a problem with the number of files I can store in a directory on an ext2 or ext3 file system (I guess my server has ext3), but the speed of creating a new file will decrease rapidly once there are over 20-30,000 files.
Currently I'm on a shared host and I must cache files. My host has a soft limit of 100,000 files in my whole box, which is good enough so far.
Can someone please give me a better idea about how to cache the site?
You shouldn't place all of the 20K files in a single directory.
Divide them into directories (by letter, for example), so you access:
a/apple-pie-recipe
j/john-doe-for-presidency
etc.
That would allow you to place more files with fewer constraints on the file system, which would increase the speed (since the FS doesn't need to find your file among 20k others in one directory; it only needs to look through about a hundred).
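A minimal sketch of that layout (the base directory, helper name and generate_page() call are just placeholders for your existing code):
// Map a page slug to a one-letter subdirectory inside the cache.
function cache_path($slug, $baseDir = '/path/to/cache-files')
{
    $letter = strtolower(substr($slug, 0, 1));
    return $baseDir . '/' . $letter . '/' . $slug . '.html';
}

$file = cache_path('apple-pie-recipe');
if (file_exists($file)) {
    readfile($file);                    // serve the cached copy
} else {
    $html = generate_page();            // your existing page builder (assumed)
    @mkdir(dirname($file), 0755, true); // create the letter directory on demand
    file_put_contents($file, $html);
    echo $html;
}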
there should not be problem with number of files I can store on directory with ext2 or ext3
That's rather an old document - there are 2 big differences between ext2 and ext3 - journalling is one, the other is H-TREE indexing of directories (which reduces the impact of storing lots of files in the same directory). While it's trivial to add journalling to an ext2 filesystem and mount it as ext3, this does not give the benefits of dir_index - this requires a full fsck.
Regardless of the filesystem, using a nested directory structure makes the system a lot more manageable and scalable - and avoids performance problems on older filesystems.
(I've been doing 3 other things since I started writing this and see someone else has suggested something similar - however, Madara's approach doesn't give an evenly balanced tree; OTOH, having a semantic path may be more desirable.)
e.g.
define('GEN_BASE_PATH', '/var/data/cache-files');
define('GEN_LEVELS', 2);

// Build a nested path like <base>/a/b/<rest-of-hash> from an id,
// taking one hex character of the md5 per directory level.
function gen_file_path($id)
{
    $key = md5($id);
    $fname = '';
    for ($x = 0; $x < GEN_LEVELS; $x++) {
        $fname .= substr($key, 0, 1) . "/";
        $key = substr($key, 1);
    }
    return GEN_BASE_PATH . "/" . $fname . $key;
}
However, the real way to solve the problem would be to serve the content with the right headers and run a caching reverse proxy in front of the webserver (though this isn't really practical for a very low-volume site).
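For the headers part, a minimal sketch of what the PHP script could emit so that an upstream reverse proxy (or the browser) can cache the page; the one-month lifetime mirrors the question and is only an example:
// Let downstream caches keep the generated page for 30 days.
header('Cache-Control: public, max-age=2592000');
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', filemtime($filename)) . ' GMT');
readfile($filename);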

How to self-update PHP+MySQL CMS?

I'm writing a CMS on PHP+MySQL. I want it to be self-updatable (through one click in the admin panel). What are the best practices?
How do I compare the current version of the CMS and the version of the update (the application itself and the database)? Should it just download a zip archive, unzip it and overwrite the files (but then what to do with files that are no longer used)? How do I check that an update was downloaded correctly? Also, it supports modules, and I want these modules to be downloadable from the admin panel of the CMS.
And how should I update MySQL tables?
Keep your code in a separate location from configuration and otherwise variable files (uploaded images, cache files, etc.)
Keep the modules separate from the main code as well.
Make sure your code has file system permissions to change itself (use SuPHP for example).
If you do these, simplest would be to completely download the new version (no incremental patches), and unzip it to a directory adjacent to the one containing the current version. Because there won't be variable files inside the code directory, you can just remove or rename the old one and rename the new one to replace it.
You can keep the version number in a global constant in the code.
As for MySQL, there's no other way than making an upgrade script for every version that changes the DB layout. Even automatic solutions to change the table definition can't know how to update the existing data.
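A minimal sketch of such an upgrade-script runner, assuming a settings table that stores the current schema version and a migrations/ directory with one statement per numbered .sql file (all names here are illustrative):
// upgrade.php -- apply pending schema migrations in order.
$pdo = new PDO('mysql:host=localhost;dbname=cms', 'user', 'pass');

$current = (int) $pdo->query(
    "SELECT value FROM settings WHERE name = 'schema_version'"
)->fetchColumn();

foreach (glob(__DIR__ . '/migrations/*.sql') as $file) {
    $version = (int) basename($file, '.sql');   // e.g. 0002.sql -> 2
    if ($version > $current) {
        // Each file is assumed to contain a single ALTER/CREATE statement.
        $pdo->exec(file_get_contents($file));
        $pdo->prepare("UPDATE settings SET value = ? WHERE name = 'schema_version'")
            ->execute(array($version));
        $current = $version;
    }
}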
A slightly more experimental solution could be to use something like the phpsvnclient library.
With features:
List all files in a given SVN repository directory
Retrieve a given revision of a file
Retrieve the log of changes made in a repository or in a given file between two revisions
Get the repository latest revision
This way you can see if there are new files, removed files or updated files and only change those in your local application.
I reckon this will be a little harder to implement, but the benefit would probably be that it is easier and quicker to add updates to your CMS.
You have two scenarios to deal with:
The web server can write to files.
The web server can not write to files.
This just dictates whether you will be decompressing a ZIP file or using FTP to update the files. In either case, your first step is to take a dump of the database and a backup of the existing files, so that the user can roll back if something goes horribly wrong. As others have said, it's important to keep anything that the user will likely customize out of the scope of the update. WordPress does this nicely. If a user has made changes to core logic code, they are likely smart enough to resolve any merge conflicts on their own (and smart enough to know that a one-click upgrade is probably going to lose their modifications).
Your second step is to make sure that your script doesn't die if the browser is closed. This is a process that really should not be interrupted. You could accomplish this via ignore_user_abort(true);, or some other means. Or, if you like, allow the user to check a box that says "Keep going even if I get disconnected". I'm assuming that you'll be handling errors internally.
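For example, at the very top of the upgrade script (a small sketch; whether you also lift the time limit is up to you):
// Keep the upgrade running even if the admin closes the browser tab.
ignore_user_abort(true);
// The upgrade may take longer than PHP's default execution limit.
set_time_limit(0);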
Now, depending on permissions, you can either:
Compress the files to be updated to the system /tmp directory
Compress the files to be updated to a temporary file in the home directory
Then you are ready to:
Download and decompress the update in situ, that is, in place, or
Download and decompress the update to the system's /tmp directory and use FTP to update the files in the web root
You can then:
Apply any SQL changes as needed
Ask the user if everything went OK
Roll back if things went badly
Clean up your temp directory in the system /tmp directory, or any staging files in the user's web root / home directory.
The most important aspect is making sure you can roll back changes if things go bad. The other thing to ensure is that if you use /tmp, you check the permissions of your staging area; 0600 should do nicely.
Take a look at how WordPress and others do it. If your choice of licenses and theirs agree, you might even be able to reuse some of that code.
Good luck with your project.
There is a SQL library called SQLOO (that I created) that attempts to solve this problem. It's a little rough still, but the basic idea is that you set up the SQL schema in PHP code and then SQLOO changes the current database schema to match the code. This allows the SQL schema and the attached PHP code to be changed together and in much smaller chunks.
http://code.google.com/p/sqloo/
http://code.google.com/p/sqloo/source/browse/#svn/trunk/example <- examples
Based on experience with a number of applications, CMS and otherwise, this is a common pattern:
Upgrades are generally one-way. It's possible to take a snapshot of full system state for a restore upon failure, but to restore usually entails losing any data/content/logs added to the system since the upgrade. Performing an incremental rollback can put data at risk if something were not converted properly (e.g. database table changes, content conversions, foreign key constraints, index creation, etc.) This is especially true if you've made customizations that rollback scripts couldn't possibly account for.
Upgrade files are packaged with some means of authentication/verification, such as md5 or sha1 hashes and/or digital signature to ensure it came from a trusted source and was not tampered. This is particularly important for automated upgrade processes. Suppose a hacker exploited a vulnerability and told it to upgrade from a rogue source.
Application should be in an offline mode during the upgrade.
Application should perform a self-check after an upgrade.
I agree with Bart van Heukelom's answer, it's the most usual way of doing it.
The only other option would be to turn your CMS into a bunch of remote Web Services/scripts and external CSS/JS files that you host in one location only.
Then everyone using your CMS would connect to your central "CMS server" and all that would be on their (calling) server is a bunch of scripts to call your Web Services/scripts that do all the processing and output. If you went down this route you'd need to identify/authenticate each request so that you returned the corresponding data for the given CMS user.

Optimize PHP framework loading

I have a custom built application framework written in PHP which I have been tasked to optimize. This framework is a shared codebase which loads MVC "modules" to provide various functionality. Each module is a directory containing multiple PHP classes for controllers and models and PHP files for views.
The entire framework loads for almost all requests, including images and stylesheets. This is because the modules are designed to be self contained packages, and they may contain images, stylesheets, javascripts or other static files within them. Because of this, there is overhead in serving what would normally be a very simple request because the system has to load all the modules just to determine what modules are available from which to pull static files.
The general process for handling any given URI is as follows:
All base system classes are included
A global exception handler and some global variables are set
A system-wide configuration file is read. (This is a file filled with PHP statements to set config variables)
A connection to the database is made
The modules folder is scanned via opendir() and each module is verified to be valid and free of syntax errors, and then included.
A second configuration file is loaded which sets up configuration for the modules
A new instance of each module is created (calling its __construct() method and possibly creating other database connections, performing individual startup routines, etc.)
The URI is examined and passed off to the appropriate module(s)
Steps 1 - 7 will almost always be exactly the same. They will always perform the exact same operations unless new modules are installed or the configuration file is changed. My question is, what could be done to optimize the process? Ideally, I'd like some sort of way of handling multiple requests, similar to the way KeepAlive requests work. All the overhead of initializing all modules seems like a waste just to readfile() a single image or css file, just to have that same overhead again to serve another request.
Is there any way to reduce the overhead of a framework like this? (I don't even know if anyone can help me without studying all the code, this may be a hopeless question)
It's generally a bad idea to tie up a dynamic web server thread serving static content. Apache, IIS, Nginx, et. al. already do everything you need to serve up these files. If each static asset is located somewhere within the public docroot and has a unique URL, you shouldn't need to worry about PHP being involved in loading them.
Furthermore, if you can ensure that your cache-related headers (ETag, Last-Modified, etc.) are being generated correctly, each client should only request each file once. Free caching == win!
Is there a reason all of the modules need to be loaded for every request? Why not allow controllers to specify which modules they require to be loaded, and only load those which are requested?
Why not move step 8 before step 5? Examine the URL first, then load modules based on the results.
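A rough sketch of that idea; the modules/ layout and Module.php entry point are assumptions about this particular framework:
// Hypothetical front controller: resolve the module from the URI first,
// then include only that module instead of scanning the whole folder.
$path   = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');
$parts  = explode('/', $path);
$module = $parts[0] !== '' ? basename($parts[0]) : 'default'; // basename() blocks "../" tricks
$moduleFile = __DIR__ . '/modules/' . $module . '/Module.php';

if (is_file($moduleFile)) {
    require $moduleFile;   // steps 5-7 now run for one module only
} else {
    header('HTTP/1.1 404 Not Found');
    exit;
}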
Another one:
each module is verified to be valid and free of syntax errors, and then included.
Are you really syntax-checking files before include()ing them? If so, why is this necessary?
