I am creating a simple file manager for a CMS. I included a function that calculates the total size of a folder, but I noticed that it increases load times considerably.
My first idea was an option to enable or disable those calculations and only display the size of individual files. But now I am wondering about the impact on the server if, at some point, hypothetically, the option is enabled and the directory contains a folder whose size is 1 TB or more (assuming the file system allows it).
What do you think would happen to the server if it received a request to perform those big calculations? Would it be better to remove the option?
PHP can overload the system if you do the calculation in PHP alone. Instead, use PHP's ability to call out to your operating system and let it handle the calculation; as far as I know, the OS caches its indexing results, so computing the folder size there is not a problem.
reference: how-to-get-directory-size-in-php
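For example, something along these lines; this is only a minimal sketch, the function name is made up and the du shortcut assumes a Linux host:

    <?php
    // Sketch: get a directory's size by shelling out to `du` on Linux,
    // with a pure-PHP walk as the fallback (the slow path the question describes).
    function directory_size($dir)
    {
        // Fast path: let the OS (and its caches) do the work.
        if (stripos(PHP_OS, 'Linux') === 0) {
            $out = shell_exec('du -sb ' . escapeshellarg($dir));
            if ($out) {
                return (int) $out;   // du prints "<bytes>\t<path>"
            }
        }

        // Fallback: recurse in PHP.
        $size = 0;
        $it = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
        );
        foreach ($it as $file) {
            $size += $file->getSize();
        }
        return $size;
    }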
Related
Even if a few duplicate questions seem to exist, I think this one is unique. I'm not asking whether there are any limits; it's only about performance drawbacks in the context of Apache, or the Unix file system in general.
Let's say I request a file from an Apache server
http://example.com/media/example.jpg
does it matter how many files there are in the same directory "media"?
The reason I'm asking is that my PHP application generates images on the fly.
Once an image is created, the script places it at the very location that would otherwise trigger the PHP script via mod_rewrite. So if the file exists, Apache skips the whole PHP execution and serves the static image directly instead; a kind of gateway cache, if you want to call it that.
Apache has basically two things to do:
Check if the file exists
Serve the file or forward the request to PHP
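For context, a minimal .htaccess sketch of that pattern might look like the following; generate.php and the exact URL pattern are placeholders, not the real setup:

    RewriteEngine On
    # If the requested file already exists on disk, Apache serves it directly
    # and the rule below never fires.
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^media/(.+)\.jpg$ generate.php?name=$1 [L,QSA]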
So far I have about 25,000 files totaling roughly 8 GB in this single directory, and I expect that to grow at least tenfold over the next few years.
While I don't face any issues managing these files, I have the slight feeling that requesting them via HTTP keeps getting slower. So I wondered whether that is really what happens, or whether it's just my subjective impression.
Most file systems based on the Berkeley FFS will degrade in performance with large numbers of files in one directory due to multiple levels of indirection.
I don't know about other file systems like HFS or NTFS, but my suspicion is that they may well suffer from the same issue.
I once had to deal with a similar issue and ended up using a map for the files.
I think it was something like md5('myfilename-00001'), yielding (for example) e5948ba174d28e80886a48336dcdf4a4, which I then stored in a file named e5/94/8ba174d28e80886a48336dcdf4a4. A map file then mapped 'myfilename-00001' to 'e5/94/8ba174d28e80886a48336dcdf4a4'. This not-quite-elegant solution worked for my purposes, and it only took a little bit of code.
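A rough PHP sketch of that scheme (the function name and base directory are illustrative):

    <?php
    // Hash the logical name, then use the first two byte-pairs of the hash
    // as subdirectory levels, as described above.
    function hashed_path($name, $baseDir)
    {
        $hash = md5($name); // example hash from the text: e5948ba174d28e80886a48336dcdf4a4
        return rtrim($baseDir, '/') . '/'
            . substr($hash, 0, 2) . '/'   // e5
            . substr($hash, 2, 2) . '/'   // 94
            . substr($hash, 4);           // 8ba174d28e80886a48336dcdf4a4
    }

    // The logical-name -> hashed-path mapping then goes into a map file or database,
    // e.g. 'myfilename-00001' => 'e5/94/8ba174d28e80886a48336dcdf4a4'.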
I am in the process of rewriting some scripts that parse machine-generated logs from Perl to PHP.
The files range from 20 MB to 400 MB.
I am trying to decide whether I should use file() or the fopen()+fgets() combo to go through the file, for better performance.
Here is the basic run-through:
I check the file size before opening it, and if the file is larger than 100 MB (a pretty rare case, but it does happen from time to time) I go the fopen()+fgets() route, since I only bumped the script's memory limit to 384 MB and any file larger than 100 MB would risk causing a fatal error. Otherwise, I use file().
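Roughly, the branch looks like this; a minimal sketch, where process_line() and the placeholder path stand in for the real parsing code:

    <?php
    // Sketch of the size check described above.
    $path = '/var/log/machine/example.log'; // placeholder path

    if (filesize($path) > 100 * 1024 * 1024) {
        // Large file: stream it line by line to keep memory flat.
        $fh = fopen($path, 'r');
        while (($line = fgets($fh)) !== false) {
            process_line($line);
        }
        fclose($fh);
    } else {
        // Small file: slurp it in one call.
        foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            process_line($line);
        }
    }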
In both methods I only go through the file once, from beginning to end, line by line.
Here is the question: is it worth keeping the file() part of the code to deal with the small files? I don't know exactly how file() works in PHP (I use the FILE_SKIP_EMPTY_LINES flag as well): does it map the file into memory directly, or does it read it into memory line by line as it goes through? I ran some benchmarks; performance is pretty close, the average difference is about 0.1 s on a 40 MB file, and file() has the advantage over fopen()+fgets() about 80% of the time (out of 200 tests on the same file set).
Dropping the file() part would certainly save some memory, and considering I have three instances of the same script running at the same time, it could save me 1 GB worth of memory on a 12 GB system that also hosts the database and other crap. But I don't want to hurt the script's performance either, since around 10k of these logs come in per day and a 0.1 s difference really adds up.
Any suggestion would help and TIA!
I would suggest sticking with one mechanism, like foreach (new \SplFileObject('file.log') as $line). Split your input files and process them in parallel, 2-3 processes per CPU core. Bonus: give them lower priority than the database on the same system. In PHP, this would mean spawning off N copies of the script at once, where each copy has its own file list or directory. Since you're talking about a rewrite and IO performance is an issue, consider other platforms with more capabilities here, e.g. Java 7 NIO, Node.js asynchronous IO, C# TPL.
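A minimal version of that single mechanism might look like this; the file name is a placeholder:

    <?php
    // One streaming mechanism for every file size.
    $file = new SplFileObject('file.log');
    $file->setFlags(
        SplFileObject::READ_AHEAD
        | SplFileObject::SKIP_EMPTY
        | SplFileObject::DROP_NEW_LINE
    );

    foreach ($file as $line) {
        // parse $line ...
    }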
I've got nearly a million images for my site, and they are stored in one folder on my Windows server.
Since opening this folder directly on the desktop drives me and my CPU crazy, I am wondering whether fetching one of them with my PHP script for an HTTP request is also laborious. So, will separating them into different folders improve performance?
No, the performance does not depend on the number of files that are in a directory. The reason why opening the folder in Windows explorer is slow is because it has to render icons and various other GUI related things for each file.
When the web server fetches a file, it doesn't need to do that. It just (more or less) directly goes to the location of the file on the disk.
EDIT: Millions is kind of pushing the limits of your file system (I assume NTFS in your case). It appears that anything over 10,000 files in a directory starts to degrade your performance. So not only from a performance standpoint, but from an organizational standpoint as well, you may want to consider separating them into subdirectories.
Often the best answer in a case like this is to benchmark it. It shouldn't be too hard to create a program that opens 1000 hard-coded file names and closes them. Run the test on your million-plus directory and another directory containing only those 1000 files being tested and see if there's a difference.
Not only does the underlying file system make a difference, but accessing over a network can affect the results too.
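A throwaway benchmark along those lines could be as simple as the following sketch; the file list path is a placeholder:

    <?php
    // Crude benchmark: time how long it takes to open and close a fixed set
    // of files. /tmp/filelist.txt would contain the 1000 hard-coded paths.
    $files = file('/tmp/filelist.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    $start = microtime(true);
    foreach ($files as $path) {
        $fh = fopen($path, 'rb');
        if ($fh) {
            fclose($fh);
        }
    }
    printf("Opened %d files in %.4f seconds\n", count($files), microtime(true) - $start);

Run it once against the million-plus directory and once against a directory holding only those 1000 files, and compare the timings.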
Separating your files into separate directories will most likely help performance, but as mark suggests, it's probably worth benchmarking.
I'm developing a webapp in PHP, and the core library is 94kb in size at this point. While I think I'm safe for now, how big is too big? Is there a point where the script's size becomes an issue, and if so can this be ameliorated by splitting the script into multiple libraries?
I'm using PHP 5.3 and Ubuntu 10.04 32bit in my server environment, if that makes any difference.
I've googled the issue, and everything I can find pertains to PHP upload size only.
Thanks!
Edit: To clarify, the 94kb file is a single file that contains all my data access and business logic, and a small amount of UI code that I have yet to extract to its own file.
Do you mean you have one file that is 94 KB in size, or that your whole library is 94 KB in total?
Regardless, as long as you aren't piling everything into one file and you're organizing your library into different files, your file sizes should remain manageable.
If a single PHP file is starting to hit a few hundred KB, you have to think about why that file is getting so big and refactor the code to make sure that everything is logically organized.
I've used PHP applications that probably included several megabytes worth of code; the main thing if you have big programs is to use a code caching tool such as APC on your production server. That will cache the compiled (to byte code) PHP code so that it doesn't have to process every file for every page request and will dramatically speed up your code.
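On a PHP 5.3 setup that usually means something like the following php.ini fragment; the values are only examples, tune them to your application:

    ; Sketch of an APC opcode-cache configuration (example values).
    extension=apc.so
    apc.enabled=1
    apc.shm_size=64M
    ; In production you can skip re-checking file mtimes on every request:
    apc.stat=0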
I am doing some tests (LAMP):
Basically I have two versions of my custom framework.
A normal version that includes ~20 files.
A lite version that has everything inside one single big file.
The more I use my lite version, the more I see a decrease in load time, i.e. from 0.01 s for the normal version to 0.005 s for the lite version.
Let's consider just the "include" part. I always thought PHP would store the included .php files in memory so the file system doesn't have to retrieve them at every request.
Do you think condensing all the classes/functions into one big file is worth the "chaos"?
Or is there a setting to tell PHP to keep the required PHP files in memory?
Thanks
(PHP 5.3.x, Apache 2.x, Debian 6 on a dedicated server)
Don't cripple your development by mushing everything up in one file.
A speed up of 5ms is nothing compared to the pain you will feel maintaining such a beast.
To put it another way, a single incorrect index in your database can give you orders of magnitude more slowdown.
Your page would load faster using the "normal" version and omitting one 2kb image.
Don't do it, really just don't.
Or you can do this:
Leave the code as it is (located in many different files)
Combine the files into one when you are ready to upload to the production server
Here's what I use:
cat js/* > all.js
yuicompressor all.js -o all.min.js
First I combine them into a single file, then I minify it with the YUI Compressor.
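The same idea can be applied to the PHP side with a small build step; the following is only a sketch, under the assumption that every source file starts with an opening <?php tag, has no closing tag, and uses no per-file declare/namespace statements:

    <?php
    // build.php - concatenate the framework's class files into one include
    // for production. Paths are placeholders.
    $out = "<?php\n";
    foreach (glob(__DIR__ . '/framework/*.php') as $file) {
        $code = file_get_contents($file);
        $out .= preg_replace('/^<\?php\s*/', '', $code, 1) . "\n";
    }
    file_put_contents(__DIR__ . '/framework.lite.php', $out);
    echo "Wrote framework.lite.php\n";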