I have a PHP file that is 90 KB. It does pretty much all of the back-end work for my site and is loaded on every page. However, if I split the functions in this file across multiple files and only loaded what each page needs, I could reduce the average amount loaded per page to about 45 KB.
Is loading a 90 KB PHP file with "everything on it" for each page going to slow my site down? Does it make more sense to split the 90 KB file into smaller files and only load what is necessary for each page? Or is 90 KB small enough that it shouldn't matter?
Turn on OPcache and PHP will keep the compiled form of the script in memory, and then it doesn't matter.
When a file is accessed your OS will usually cache that file in RAM until the memory needs to be reclaimed, and then it doesn't matter.
What is in which file is far better classified as an organization problem for a project and its maintainers, and this incredibly marginal performance consideration [say it with me] doesn't matter.
Thanks to Sammitch from the comments.
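For reference, turning OPcache on is usually just a php.ini change along these lines (the values here are illustrative defaults, not tuned recommendations):
zend_extension=opcache.so
opcache.enable=1
opcache.memory_consumption=128
opcache.max_accelerated_files=10000
opcache.revalidate_freq=2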
Related
I want to sort a very large file of approximately 80 GB; the file contains last names, one per line (separated by \r\n).
Because I am on shared resources with 1 GB of RAM available, is there a way to do this without putting all the data into an array in PHP?
I earlier came up with the idea of picking up 1 GB chunks and sorting each chunk into its own file, which left me with 80 sorted 1 GB files on the file system. But this approach quickly fell apart when I realized I still need to merge those 80 files with each other; for example, reading just two 1 GB files already takes 2048 MB, which exceeds my RAM, so how would that work?
Considering the resources available to me, is this a possible task? Is there any other way?
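For what it's worth, the chunk-then-merge idea can be made to fit in 1 GB of RAM, because a k-way merge only ever needs to hold one line per chunk in memory at a time. A minimal sketch of the merge step (untested; the /tmp/chunk_*.txt and /tmp/sorted.txt paths are made-up placeholders):
$handles = [];
$heads   = [];
foreach (glob('/tmp/chunk_*.txt') as $path) {    // the pre-sorted chunk files
    $h    = fopen($path, 'r');
    $line = fgets($h);
    if ($line === false) {
        fclose($h);                              // empty chunk, skip it
        continue;
    }
    $handles[] = $h;
    $heads[]   = rtrim($line, "\r\n");           // one buffered line per chunk
}

$out = fopen('/tmp/sorted.txt', 'w');
while ($heads) {
    // find the smallest current head; an SplMinHeap would make this O(log k)
    $minKey = null;
    foreach ($heads as $k => $v) {
        if ($minKey === null || strcmp($v, $heads[$minKey]) < 0) {
            $minKey = $k;
        }
    }

    fwrite($out, $heads[$minKey] . "\r\n");      // emit the smallest name

    $next = fgets($handles[$minKey]);
    if ($next === false) {
        fclose($handles[$minKey]);               // this chunk is exhausted
        unset($handles[$minKey], $heads[$minKey]);
    } else {
        $heads[$minKey] = rtrim($next, "\r\n");
    }
}
fclose($out);
Memory use stays at roughly one line per chunk plus PHP's stream buffers, regardless of the total file size.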
I've noticed that loading an image into imagick ($im = new Imagick($sFilename);) in php is taking 0.6 seconds for an 8MB image. This seems a bit slow to me, so I tried a test and read the file in using file_get_contents instead. About 0.005 seconds. Better. A bit too good tbh, I guess there's some caching going on there?
But I can load the same file a dozen times into imagick and it's always ~0.6 seconds.
Can I tell file_get_contents to bypass the system cache somehow, to give me a better idea of the raw speed with which an 8MB file can be retrieved from my hard drives?
Is there anything that can be done to speed up imagick? Or is 0.6 seconds for this operation completely normal?
The server has two 7200rpm HP sata drives in RAID 1.
Thanks.
"Is there anything that can be done to speed up imagick?"
Buy a faster CPU.
"Or is 0.6 seconds for this operation completely normal?"
Yes.
"This seems a bit slow to me ... but it seems a long time for that."
"I guess there's some caching going on there?"
You're just guessing that something should be faster... and you're comparing it to a completely different operation. file_get_contents just reads the bytes of the file off the disk. Creating an image from a JPG means the computer has to read the bytes off the disk and then decode the compressed data into the actual image data.
If you want to see how much work goes into handling that compressed data, you can see it easily by writing the image out in an uncompressed format, e.g.
$imagick = new Imagick("./testImage.jpg");   // decodes the JPEG on load
$imagick->setImageFormat('BMP');             // BMP is uncompressed
$imagick->writeImage("./output.bmp");        // writes the raw pixel data
And yes, this is longer than it is reasonable for an HTTP request to spend processing, which is just one more reason why it's a good idea not to run Imagick inside the web server, but to run it as a background task instead.
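One low-tech way to push it into the background from PHP (a sketch, assuming a Linux host and a hypothetical resize.php CLI script that does the Imagick work):
$cmd = 'php /var/www/jobs/resize.php ' . escapeshellarg($sFilename);
exec($cmd . ' > /dev/null 2>&1 &');   // detach, so the HTTP request returns immediately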
I am in the process of rewriting some scripts that parse machine-generated logs from Perl to PHP.
The files range from 20 MB to 400 MB.
I am trying to decide whether I should use file() or the fopen()+fgets() combo to go through the file, for better performance.
Here is the basic run-through:
I check the file size before opening it, and if the file is larger than 100 MB (a pretty rare case, but it does happen from time to time) I go the fopen+fgets route, since I only bumped the script's memory limit to 384 MB and any file larger than 100 MB has a chance of causing a fatal error. Otherwise, I use file().
In both methods I only go through the file once, from beginning to end, line by line.
Here is the question: is it worth keeping the file() part of the code to deal with the small files? I don't know exactly how file() works in PHP (I use the FILE_SKIP_EMPTY_LINES flag as well); does it map the file into memory directly, or does it shove it into memory line by line while going through it? I ran some benchmarks on it; performance is pretty close, with an average difference of about 0.1 s on a 40 MB file, and file() has the advantage over fopen+fgets about 80% of the time (out of 200 tests on the same file set).
Dropping the file() part could certainly save me some memory, and considering I have three instances of the same script running at the same time, it could save 1 GB worth of memory on a 12 GB system that is also hosting the database and other crap. But I don't want to let the performance of the script slip either, since around 10k of these logs come in per day and a 0.1 s difference really adds up.
Any suggestions would help. TIA!
I would suggest sticking with one mechanism, like foreach (new \SplFileObject('file.log') as $line). Split your input files and process them in parallel, 2-3 processes per CPU core. Bonus: you can run them at a lower priority than the database on the same system. In PHP, this means spawning off N copies of the script at once, where each copy has its own file list or directory. Since you're talking about a rewrite and IO performance is an issue, also consider platforms with more capabilities here, e.g. Java 7 NIO, Node.js asynchronous IO, or C#'s TPL.
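A minimal sketch of that SplFileObject loop, with flags that roughly mirror file()'s skip-empty-lines behaviour (the path and process_line() are placeholders):
$log = new SplFileObject('/path/to/machine.log');
$log->setFlags(
    SplFileObject::READ_AHEAD |
    SplFileObject::SKIP_EMPTY |
    SplFileObject::DROP_NEW_LINE
);

foreach ($log as $line) {
    process_line($line);   // stand-in for your parser; memory use stays flat
}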
I'm developing a webapp in PHP, and the core library is 94kb in size at this point. While I think I'm safe for now, how big is too big? Is there a point where the script's size becomes an issue, and if so can this be ameliorated by splitting the script into multiple libraries?
I'm using PHP 5.3 and Ubuntu 10.04 32bit in my server environment, if that makes any difference.
I've googled the issue, and everything I can find pertains to PHP upload size only.
Thanks!
Edit: To clarify, the 94kb file is a single file that contains all my data access and business logic, and a small amount of UI code that I have yet to extract to its own file.
Do you mean you have one file that is 94 KB in size, or that your whole library is 94 KB in total?
Regardless, as long as you aren't piling everything into one file and you're organizing your library into different files, your file size should remain manageable.
If a single PHP file is starting to hit a few hundred KB, you have to think about why that file is getting so big and refactor the code to make sure that everything is logically organized.
I've used PHP applications that probably included several megabytes worth of code; the main thing if you have big programs is to use a code caching tool such as APC on your production server. That will cache the compiled (to byte code) PHP code so that it doesn't have to process every file for every page request and will dramatically speed up your code.
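If you go the APC route, it is typically switched on with a couple of php.ini directives along these lines (the shared-memory size here is only illustrative):
extension=apc.so
apc.enabled=1
apc.shm_size=64M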
I'm using PHP to make a simple caching system, but I'm going to be caching up to 10,000 files in one run of the script. At the moment I'm using a simple loop with
$file = "../cache/".$id.".htm";
$handle = fopen($file, 'w');
fwrite($handle, $temp);
fclose($handle);
($id being a random string which is assigned to a row in a database)
but it seems a little bit slow. Is there a better method of doing that? Also, I read somewhere that on some operating systems you can't store thousands and thousands of files in one single directory; is this relevant to CentOS or Debian? Bear in mind this folder may well end up having over a million small files in it.
Simple questions, I suppose, but I don't want to start scaling this code and then find out I'm doing it wrong; I'm only testing with caching 10-30 pages at a time at the moment.
Remember that in UNIX, everything is a file.
When you put that many files into a directory, something has to keep track of those files. If you do:
ls -la
You'll probably notice that the '.' entry has grown to some size. This is where all the info on your 10,000 files is stored.
Every seek and every write into that directory will involve parsing that large directory entry.
You should implement some kind of directory hashing system. This'll involve creating subdirectories under your target dir.
e.g.
/somedir/a/b/c/yourfile.txt
/somedir/d/e/f/yourfile.txt
This'll keep the size of each directory entry quite small, and speed up IO operations.
The number of files you can effectively use in one directory depends not on the operating system but on the filesystem.
You can split your cache dir effectively by taking the md5 hash of the filename, taking the first 1, 2 or 3 characters of it, and using that as a directory. Of course, you have to create the directory if it doesn't exist, and use the same approach when retrieving files from the cache.
For a few tens of thousands, 2 characters (256 subdirs from 00 to ff) would be enough.
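A minimal sketch of that md5 bucketing, reusing the ../cache base directory from the question (the cache_path() helper name is made up):
function cache_path($id, $baseDir = '../cache')
{
    $bucket = substr(md5($id), 0, 2);        // 256 buckets: 00 .. ff
    $dir    = $baseDir . '/' . $bucket;
    if (!is_dir($dir)) {
        mkdir($dir, 0775, true);             // create the bucket on first use
    }
    return $dir . '/' . $id . '.htm';
}

file_put_contents(cache_path($id), $temp);   // writing into the bucketed path
$html = file_get_contents(cache_path($id));  // reading uses the same mapping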
File I/O in general is relatively slow. If you are looping over thousands of files and writing them to disk, the slowness could be normal.
I would move that over to a nightly job if that's a viable option.
You may want to look at memcached as an alternative to the filesystem. Keeping the cache in memory will give a huge performance boost.
http://php.net/memcache/
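A minimal sketch using the Memcache extension linked above (the host, port, and one-hour TTL are illustrative):
$memcache = new Memcache();
$memcache->connect('127.0.0.1', 11211);            // default memcached host/port

$memcache->set('page_' . $id, $temp, 0, 3600);     // cache the page for an hour
$html = $memcache->get('page_' . $id);             // returns false on a miss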