PHP script: How big is too big?

I'm developing a webapp in PHP, and the core library is 94kb in size at this point. While I think I'm safe for now, how big is too big? Is there a point where the script's size becomes an issue, and if so can this be ameliorated by splitting the script into multiple libraries?
I'm using PHP 5.3 and Ubuntu 10.04 32bit in my server environment, if that makes any difference.
I've googled the issue, and everything I can find pertains to PHP upload size only.
Thanks!
Edit: To clarify, the 94kb file is a single file that contains all my data access and business logic, and a small amount of UI code that I have yet to extract to its own file.

Do you mean you have one file that is 94 KB in size, or that your whole library is 94 KB in total?
Regardless, as long as you aren't piling everything into one file and you organize your library into separate files, the file size should remain manageable.
If a single PHP file is starting to hit a few hundred KB, you should think about why that file is getting so big and refactor the code so that everything is logically organized.
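For example, once the library is split into one class per file, a simple autoloader keeps the includes manageable. A minimal sketch (PHP 5.3+), assuming a lib/ directory whose layout mirrors the class names:

// Hypothetical layout: lib/ClassName.php per class; adjust to your structure.
spl_autoload_register(function ($class) {
    $path = __DIR__ . '/lib/' . str_replace('\\', '/', $class) . '.php';
    if (is_file($path)) {
        require $path;
    }
});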

I've used PHP applications that probably included several megabytes' worth of code; the main thing, if you have big programs, is to use a code-caching tool such as APC on your production server. It caches the compiled (byte-code) PHP so that every file doesn't have to be re-parsed on every page request, which will dramatically speed up your code.
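As a quick sanity check that such a cache is actually active (an illustrative snippet, not from the original answer):

if (extension_loaded('apc') && ini_get('apc.enabled')) {
    echo "APC is loaded and enabled; compiled scripts are cached in shared memory.\n";
} else {
    echo "No opcode cache detected; every request re-parses and re-compiles each file.\n";
}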


Do many files in a single directory cause longer loading time under Apache?

Even though a few duplicate questions seem to exist, I think this one is unique. I'm not asking whether there are any limits; it's only about performance drawbacks in the context of Apache, or the Unix file system in general.
Let's say I request a file from an Apache server
http://example.com/media/example.jpg
does it matter how many files there are in the same directory "media"?
The reason I'm asking is that my PHP application generates images on the fly.
Once created, the image is placed at the same location that would otherwise trigger the PHP script via mod_rewrite. If the file exists, Apache skips the PHP execution entirely and serves the static image directly. A kind of gateway cache, if you want to call it that.
Apache has basically two things to do:
Check if the file exists
Serve the file or forward the request to PHP
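The rewrite setup described above might look roughly like this (a minimal .htaccess sketch; the directory and generator script name are assumptions):

RewriteEngine On
# Serve the static image directly if it already exists on disk...
RewriteCond %{REQUEST_FILENAME} !-f
# ...otherwise hand the request to the (hypothetical) PHP generator.
RewriteRule ^media/(.+)$ generate.php?file=$1 [L,QSA]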
So far I have about 25,000 files, totalling about 8 GB, in this single directory, and I expect that to grow at least tenfold over the next few years.
While I don't have any trouble managing these files, I have the slight feeling that requesting them via HTTP keeps getting slower, so I wondered whether that is really happening or whether it's just my subjective impression.
Most file systems based on the Berkeley FFS will degrade in performance with large numbers of files in one directory due to multiple levels of indirection.
I don't know about other file systems like HFS or NTFS, but my suspicion is that they may well suffer from the same issue.
I once had to deal with a similar issue and ended up using a map for the files.
I think it was something like md5('myfilename-00001') yielding (for example) e5948ba174d28e80886a48336dcdf4a4, which I then stored in a file named e5/94/8ba174d28e80886a48336dcdf4a4. A map file then mapped 'myfilename-00001' to 'e5/94/8ba174d28e80886a48336dcdf4a4'. This not-quite-elegant solution worked for my purposes, and it only took a little bit of code.
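A rough PHP sketch of that mapping scheme (the function name and base directory are made up for illustration):

function shardedPath($name, $baseDir = 'media')
{
    $hash = md5($name);
    // Split the hash into two 2-character directory levels plus the remainder,
    // e.g. e5/94/8ba174d28e80886a48336dcdf4a4
    return $baseDir . '/' . substr($hash, 0, 2) . '/' . substr($hash, 2, 2) . '/' . substr($hash, 4);
}

// A separate map file (or database table) then links the original name,
// e.g. 'myfilename-00001', to the sharded path.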

PHP reads text files very slowly

I have a large number of files in a directory and I'm using PHP to read each one into a string. For example, a file's path looks like this: filerootdir/dir1/dir2/dir3/dir4/dir5/dir6/file.txt.
I have a million such txt files. Depending on a request parameter, PHP reads the corresponding txt file and displays it as part of the webpage. I'm testing the PHP program on Windows 7 Pro right now. When a file's absolute path is short, e.g. filerootdir/dir1/file.txt, it loads pretty fast, but when the absolute path is long, it is VERY slow. I'm wondering if there is a better solution for this problem.
I'm testing my program under WAMP on Windows, but it will eventually be moved to LAMP. Will the file-loading code run faster on Linux servers? Could this be a problem with the Windows operating system?
The code I'm using looks like the following:
if (file_exists($filePath . ".html")) {
    $code = file_get_contents($filePath . ".html");
}
Thanks very much!
You might consider storing the data in a database; with this number of records, especially if they are small files, a database will probably be more efficient. Before you do, read up on indexes: they can pull the right record out of billions in a tiny fraction of a second.
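A hedged sketch of what that could look like with PDO (the DSN, table and column names are assumptions, not from the question):

$pdo = new PDO('mysql:host=localhost;dbname=content', 'user', 'pass');
$stmt = $pdo->prepare('SELECT body FROM pages WHERE slug = ? LIMIT 1');
$stmt->execute(array($slug));  // an index on pages.slug keeps this lookup fast
$code = $stmt->fetchColumn();  // false if no matching row exists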

PHP file() vs fopen()+fgets() performance debate

I am in the process of rewriting some scripts that parse machine-generated logs from Perl to PHP.
The files range from 20 MB to 400 MB.
I am trying to decide whether to use file() or the fopen()+fgets() combo to go through each file, for the best performance.
Here is the basic run-through:
I check the file size before opening it, and if the file is larger than 100 MB (a pretty rare case, but it does happen from time to time) I go the fopen()+fgets() route, since I only bumped the memory limit for the script to 384 MB and any file larger than 100 MB would risk a fatal error. Otherwise, I use file().
I only go through the file once, from beginning to end, line by line, with either method.
Here is the question: is it worth keeping the file() part of the code to deal with the small files? I don't know exactly how file() works in PHP (I use the FILE_SKIP_EMPTY_LINES flag as well): does it map the whole file into memory at once, or does it read it into memory line by line as it goes? I ran some benchmarks, and performance is pretty close; the average difference is about 0.1 s on a 40 MB file, and file() has the advantage over fopen()+fgets() about 80% of the time (out of 200 tests on the same file set).
Dropping the file() part would certainly save me some memory, and considering that I have 3 instances of the same script running at the same time, it could save me 1 GB of memory on a 12 GB system that's also hosting the database and other crap. But I don't want to let the script's performance slip either, since there are something like 10k of these logs coming in per day, so that 0.1 s difference actually adds up.
Any suggestion would help and TIA!
I would suggest sticking with one mechanism, like foreach (new \SplFileObject('file.log') as $line). Split your input files and process them in parallel, 2-3 processes per CPU core. Bonus: lower priority than the database on the same system. In PHP, this would mean spawning off N copies of the script at once, where each copy has its own file list or directory. Since you're talking about a rewrite and IO performance is an issue, consider other platforms with more capabilities here, e.g. Java 7 NIO, Node.js asynchronous IO, or the C# TPL.
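Expanding on that suggestion, a minimal sketch of the streaming approach (the parsing function is a hypothetical stand-in):

$file = new SplFileObject('file.log');
$file->setFlags(SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY | SplFileObject::DROP_NEW_LINE);

foreach ($file as $line) {
    // SplFileObject reads one line at a time, so memory use stays flat
    // regardless of file size.
    parse_log_line($line);
}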

Average PHP file size, or recommended limit?

I quickly wrote this up (http://www.ionfish.org/php-shrink/): a user uploads a .php file with comments and whitespace in it, and it "minifies" it to a smaller file size.
It does exactly what http://devpro.it/remove_phpcomments/ does, except it's web-based and instant. After you click upload, you get a download prompt to save the processed file.
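For what it's worth, PHP's built-in php_strip_whitespace() performs essentially this kind of comment and whitespace stripping; a minimal sketch with hypothetical file names:

// Write a stripped copy of a script; the file names are placeholders.
$minified = php_strip_whitespace('input.php');
file_put_contents('input.min.php', $minified);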
My questions:
Is a 2-megabyte upload limit enough? Should I make it 4, 8, etc. ?
Is the output (requires you test it with some random PHP) satisfactory, or should it be tweaked?
Would there be any use for this for the general public, and should I add support to minify HTML, CSS, JS, and even C++ and Python etc. ?
EDIT: Changed to 250K for now, will see if it suffices.
A file that needs such minification is a sign of bad practice.
The size of a single PHP file should never be a problem if you follow good practice.
You should not be uploading files during your deployment anyway. Instead you should be checking files out from your VCS, and you don't want 'minified' files in your VCS.
Such minification will not improve site performance either, since every serious project uses opcode caching.
Conclusion: Such a service is not needed.

Web speed: is it worth it to put every included file into a single one?

I am doing some tests (LAMP):
Basically I have 2 versions of my custom framework.
A normal version, which includes ~20 files.
A lite version, which has everything inside one single big file.
Using the lite version more and more, I am seeing a decrease in load time, i.e. from 0.01 s for the normal version to 0.005 s for the lite version.
Let's consider just the "include" part. I always thought PHP would keep the included .php files in memory so that the file system doesn't have to retrieve them on every request.
Do you think condensing all the classes/functions into one big file is worth the "chaos"?
Or is there a setting to tell PHP to keep the required PHP files in memory?
Thanks
(php5.3.x, apache2.x, debian 6 on a dedicated server)
Don't cripple your development by mushing everything into one file.
A speed-up of 5 ms is nothing compared to the pain you will feel maintaining such a beast.
To put it another way, a single missing or incorrect index in your database can cause orders of magnitude more slowdown.
Your page would load faster by using the "normal" version and omitting one 2 KB image.
Don't do it, really just don't.
Or you can do this:
Leave the code as it is (located in many different files)
Combine them in one file when you are ready to upload it to the production server
Here's what I use:
cat js/* > all.js
yuicompressor all.js -o all.min.js
First I combine them into a single file and then I minify it with the YUI Compressor.
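A naive PHP-side equivalent of that combine step could look like this (the directory and output file names are assumptions, and it ignores edge cases such as closing tags and namespaces):

$out = "<?php\n";
foreach (glob('src/*.php') as $path) {
    // Strip comments/whitespace, then drop the leading "<?php" tag so the
    // concatenated result stays one valid script.
    $code = php_strip_whitespace($path);
    $out .= preg_replace('/^<\?php\s*/', '', $code) . "\n";
}
file_put_contents('build/all.php', $out);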
