PHP script to delete zero byte files

I'm having a problem with zero byte files. Sometimes, randomly it seems, the server I'm working with adds zero byte files into a directory. These files break another script. I can delete the files manually with no problem, but because of the extremely tight controls on the server, I can't do things like run batch scripts or cron jobs.
What I think I need is a small script on the front page (the only page, actually) that will run a script every time someone visits. It won't get huge traffic. The script would target a specific directory and delete zero byte files.
I've been experimenting with just something as basic as finding and displaying file sizes, and I'm not having much luck. I've even searched online for solutions to similar problems and I haven't found anything.
I don't expect you to do my coding for me (although I wouldn't turn it down! ; ) ), but if someone could help me with a simple way of even just displaying ONLY the zero byte file names, I might be able to proceed on my own from there. I just can't find a way that makes sense to me. And sorry to say, I have essentially no control over the server.

You can use the DirectoryIterator class to loop through the files in the specified directory, test each one with getSize(), and unlink() those that are zero bytes.
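A minimal sketch of that approach, assuming PHP 7 or later and that the web server user has permission to delete files in the target directory. The function name delete_zero_byte_files is made up for illustration; only DirectoryIterator, getSize() and unlink() come from the answer above.

```php
<?php
// Delete zero-byte regular files directly inside $dir.
// Returns the names of the files that were actually removed.
// delete_zero_byte_files() is a hypothetical helper name.
function delete_zero_byte_files(string $dir): array
{
    // First pass: collect candidates, so nothing is deleted mid-iteration.
    $empty = [];
    foreach (new DirectoryIterator($dir) as $entry) {
        // Skip ".", "..", subdirectories and anything that is not a plain file.
        if ($entry->isDot() || !$entry->isFile()) {
            continue;
        }
        if ($entry->getSize() === 0) {
            $empty[$entry->getFilename()] = $entry->getPathname();
        }
    }

    // Second pass: remove them and remember which deletions succeeded.
    $deleted = [];
    foreach ($empty as $name => $path) {
        if (unlink($path)) {
            $deleted[] = $name;
        }
    }
    return $deleted;
}
```

Calling delete_zero_byte_files() with the path of the affected directory at the top of the front page would run the cleanup on every visit. To only display the names first, print the collected list and leave out the unlink() pass.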

Related

Do many files in a single directory cause longer loading time under Apache?

Even if there seem to exist a few duplicate questions, I think this one is unique. I'm not asking if there are any limits, it's only about performance drawbacks in context of Apache. Or unix file system in general.
Let's say I request a file from an Apache server
http://example.com/media/example.jpg
does it matter how many files there are in the same directory "media"?
The reason I'm asking is that my PHP application generates images on the fly.
Once an image is created, the application stores it at the same location that mod_rewrite would otherwise route to the PHP script. If the file exists, Apache will skip the whole PHP execution and directly serve the static image instead. Some kind of gateway cache, if you want to call it that.
Apache has basically two things to do:
1. Check if the file exists
2. Serve the file or forward the request to PHP
Till now, I have about 25.000 files with about 8 GB in this single directory. I expect it to grow at least 10 times in the next years.
While I don't face any issues managing these files, I have the slight feeling that it keeps getting slower when requesting them via HTTP. So I wondered if this is really what happens or if it's just my subjective impression.
Most file systems based on the Berkeley FFS will degrade in performance with large numbers of files in one directory due to multiple levels of indirection.
I don't know about other file systems like HFS or NTFS, but my suspicion is that they may well suffer from the same issue.
I once had to deal with a similar issue and ended up using a map for the files.
I think it was something like md5 myfilename-00001 yielding (for example): e5948ba174d28e80886a48336dcdf4a4 which I then put into a file named e5/94/8ba174d28e80886a48336dcdf4a4. Then a map file mapped 'myfilename-00001' to 'e5/94/8ba174d28e80886a48336dcdf4a4'. This not-quite-elegant solution worked for my purposes and it only took a little bit of code.
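The path layout described above can be sketched as follows. The function names sharded_path and store_sharded are hypothetical, and the split into two 2-character directory levels simply follows the example hash in the answer. Because md5() is deterministic, the path can be recomputed from the name at any time; a separate map file is only needed to go back from a path to the original name.

```php
<?php
// Map a logical file name to a two-level hashed path such as
// "e5/94/8ba174d28e80886a48336dcdf4a4" (hypothetical helper).
function sharded_path(string $name): string
{
    $hash = md5($name); // always 32 hexadecimal characters
    return substr($hash, 0, 2) . '/' . substr($hash, 2, 2) . '/' . substr($hash, 4);
}

// Write $data under $baseDir using the sharded layout and return the full path.
function store_sharded(string $baseDir, string $name, string $data): string
{
    $full = rtrim($baseDir, '/') . '/' . sharded_path($name);
    $dir = dirname($full);
    // Create the two directory levels on demand.
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }
    file_put_contents($full, $data);
    return $full;
}
```

With two hexadecimal characters per level there are at most 256 entries in each of the two directory levels, so the files are spread over up to 65,536 leaf directories.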

Aptana Studio 3 with PHP - constant indexing

I'm using Aptana Studio 3 with several big PHP projects (10.000+ files) and it suffers from very slow indexing of PHP files.... which takes 10-20 minutes to complete and starts every time at the startup of Aptana, and also sometimes at random moments, for example when synchronizing with SVN...
In the progress view I get multiple 'Indexing new PHP Modules' items.
All the time it is doing this Aptana is unusably slow. I don't get why this indexing starts over and over again on files that aren't new at all!
I already turned off automatic refreshes and automatic build. If I exclude 'PHP' from the 'Project Natures' in the properties of the projects, the indexing stops, but then I don't have code completion in PHP files.
I cleaned all projects, created a new workspace, etc. and nothing helps... This happens on multiple PCs (Windows), so I guess more people see this behaviour.
Any possible solutions?
UPDATE
I added the folder of my workspace to the 'ignore'-folders of my virus scanner (Microsoft Security Essentials). At first this seemed to work, but then the indexing started again...
Seems like you did the right steps to try and resolve it, and it also seems we should have a ticket for that, so I created one at https://jira.appcelerator.org/browse/APSTUD-4500 (please add yourself as a 'watcher').
One more thing to try is to break down a big project into a few smaller ones (whenever possible, of course). The indexer creates a binary index file for each project, and the size of this file is proportional to the number of classes, functions, variables and constants in the project. If, for some reason (e.g. a bug), this file gets corrupted, a re-index will happen, so having multiple smaller projects may help with that. Again... just an idea.

How many PHP includes are too many?

Each page on my website is rendered using PHP.
Each PHP file uses around 10 includes. So for every page that is displayed, the server needs to fetch 10 files, in addition to the rest of its functions (MySQL, etc).
Should I combine them into a single include file? Will that make ANY difference to the real-world speed? It's not a trivial task as there would be a spaghetti of variable scope to sort out.
Include files are processed on the server, so they're not "fetched" by the browser. The performance difference of using includes vs. copy and pasting the code or consolidating files is so negligible (and I'm guessing we're talking about in the 10 ms to 100 ms range, at the absolute most), that it isn't at all worth it.
Feel free to include and require to your heart's content. Clean code is substantially more important than shaving less than 100 ms off a page load. If you're building something where timing is that critical, you shouldn't be using PHP anyway.
What takes time is figuring out where the files are actually located in the include path. If you have multiple locations in your include path, PHP will search each location until it either finds the file or fails (in which case it throws an error). That's why you should put the location where most of the included files are found first in the include path.
If you use absolute paths in your include path, PHP will cache the path in the realpath cache, but note that this gets stale very quickly. So yes, including ten files is potentially slower than including one large file, simply because PHP has to check the include path more often. However, unless your web server is a really weak machine, ten files are not enough to make an impact. This only becomes interesting when you include hundreds of files or have many locations to search, in which case you should use an opcode cache anyway.
Also note that when including files, it is not good practice to include each and every file right at the beginning, because you might be including files that are never called by your application for a specific request.
Reference
http://de2.php.net/manual/en/ini.core.php#ini.include-path
http://de2.php.net/manual/en/ini.core.php#ini.sect.performance
http://en.wikipedia.org/wiki/List_of_PHP_accelerators
Although disk I/O operations are among the biggest performance-eaters, a regular site won't notice a reasonable number of includes.
Before you hit any problems with includes, you probably already would have some opcode cache that eliminates this problem too.
include and require only open files on the server side, but that might be time consuming depending on the hardware, filesystem, etc.
Anyway, if you can, use autoloader. Only needed files will be loaded that way.
Then if you think included files are a source of slowdown (and I think there are a lot of other places to look for improvement first), you can try to merge the files automatically. You still have one file per class when developing, but you can build a file that contains every class definition so that there is only one include (something like cat <all your included file>.php > to_include.php).

Deleting large chunks of PHP effectively

I've just inherited a project, and been told that an entire folder, "includes/" needs to be removed due to licensing issues -- We don't have the right to redistribute the files in that folder, so we need to cut our dependencies on them, and fix whatever breaks. I've been told "Less than 5% of the lines in that folder are ever even called by our program", but I have no way of verifying this.
There are about 50 files in the folder, each with a couple hundred lines of code. There is no unit testing currently in place. There's one master file, include.php, that require()s all 49 other files, so I can't just grep for any file doing import() on includes/.*.
This is about as much detail as I've really figured out at this point. I spent all last week reading through the files in the includes/ folder, and it won't be hard to rewrite any of this, but I'm having trouble deciding where to start. I tried deleting the folder and slowly fixing things that break, but I'm afraid that this route will cause me to miss some crucial functions in my rewrite.
Can anyone point me in a direction to get started? Are there tools that will simplify this process? I'm looking at xdebug right now, but I'm not sure exactly how I'd use it for this.
You may want to search for "php code coverage." That should help you figure out what code is used. For instance, this appears like it might help:
http://www.xdebug.org/docs/code_coverage
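As a rough sketch of how that coverage data could be narrowed down to the folder in question: xdebug_start_code_coverage() and xdebug_get_code_coverage() are real Xdebug functions, while the helper names executed_lines_in and log_includes_coverage, and the '/includes/' needle, are assumptions made for illustration. The helper expects the array shape Xdebug returns, file path => [line number => hit marker].

```php
<?php
// Keep only coverage entries whose file path contains $needle and
// reduce each one to the sorted list of executed line numbers.
// executed_lines_in() is a hypothetical helper.
function executed_lines_in(array $coverage, string $needle): array
{
    $result = [];
    foreach ($coverage as $file => $lines) {
        if (strpos($file, $needle) === false) {
            continue;
        }
        // Positive values mark executed lines; negative values (unused or
        // dead code) only appear when the matching Xdebug flags are set.
        $executed = array_keys(array_filter($lines, function ($hits) {
            return $hits > 0;
        }));
        sort($executed);
        if ($executed) {
            $result[$file] = $executed;
        }
    }
    return $result;
}

// Hypothetical hook: start coverage now, log the used lines at shutdown.
// Does nothing when the Xdebug extension is not loaded.
function log_includes_coverage(string $needle = '/includes/'): void
{
    if (!function_exists('xdebug_start_code_coverage')) {
        return;
    }
    xdebug_start_code_coverage();
    register_shutdown_function(function () use ($needle) {
        error_log(print_r(executed_lines_in(xdebug_get_code_coverage(), $needle), true));
    });
}
```

Calling log_includes_coverage() at the top of the entry script and then exercising the application would list, per request, which lines under includes/ actually ran. Lines that never show up are candidates for removal, though only for the code paths that were exercised.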
Your initial approach isn't bad at all. It's certainly a reasonable place to start:
1. Delete the code that isn't allowed.
2. Try to run what's left.
3. If things break: create a stub for a method that is now missing, and set it to return some sensible "default" value for now.
4. Go to 2.
Then, itemize all the things that were missing, and make a sensible schedule to re-implement each thing.
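A stub of the kind described in the steps above might look like this. The function name legacy_format_price is invented for illustration; the idea is that each stub logs its call site, so the itemized list of missing functionality builds itself while the application runs.

```php
<?php
// Hypothetical stub for a function that used to live in includes/.
// The function_exists() guard avoids a redeclaration error if the
// original is still loaded somewhere.
if (!function_exists('legacy_format_price')) {
    function legacy_format_price($amount)
    {
        // Frame 0 of the backtrace holds the file and line of the caller.
        $frame = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 1)[0];
        error_log(sprintf(
            'STUB legacy_format_price called from %s:%d',
            $frame['file'] ?? 'unknown',
            $frame['line'] ?? 0
        ));
        return ''; // temporary default until the real rewrite exists
    }
}
```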
I would start by grepping for files that reference include.php. Check through them if they're manageable, one by one. Then I'd grep for each of the functions in the /include/*php files. See if they're called anywhere, find 'em, replace 'em.
Because PHP is so dynamically typed, I don't think there's going to be a tool for this.
(Eagerly awaiting someone to prove me wrong because I have similar tasks all the time... )
See SD PHP Test Coverage Tool. It will provide a visual view of what code actually executes, as well as a report on what parts of files are used (including "no parts", which is your cue that the code is a likely candidate to delete).
It doesn't require any hand-modifications of your code, or any unit tests to run it.
To answer my own question, I wound up using xdebug profiler to do the job, as I was initially investigating (after a friend's suggestion prompted me to take a second look).
In my /etc/php5/apache2/conf.d/xdebug.ini (on ubuntu 9.10), I set xdebug.profiler_enable=1 and xdebug.profiler_output_dir=/var/log/xdebug/, then loaded up the resulting cachegrind files with KCacheGrind and just ran a search on filenames for "includes/".
Now I have a mountain of work ahead of me to remove all this, but at least I've got a good overview of what I'll be modifying!

Will including unnecessary php files slow down website?

The question might prompt some people to say a definitive YES or NO almost immediately, but please read on...
I have a simple website where there are 30 php pages (each has some php server side code + HTML/CSS etc...). No complicated hierarchy, nothing. Just 30 pages.
I also have a set of purely back-end php files - the ones that have code for saving stuff to database, doing authentication, sending emails, processing orders and the like. These will be reused by those 30 content-pages.
I have a master php file to which I send a parameter. This specifies which one of those 30 files is needed and it includes the appropriate content-page. But each one of those may require a variable number of back-end files to be included. For example one content page may require nothing from back-end, while another might need the database code, while something else might need the emailer, database and the authentication code etc...
I guess whatever back-end page is required, can be included in the appropriate content page, but one small change in the path and I have to edit tens of files. It will be too cumbersome to check which content page is requested (switch-case type of thing) and include the appropriate back-end files, in the master php file. Again, I have to make many changes if a single path changes.
Being lazy, I included ALL back-end files in the master file so that no content page can request something that is not included.
First question - is this a good practice? if it is done by anyone at all.
Second, will there be a performance problem or any kind of problem due to me including all the back-end files regardless of whether they are needed?
EDIT
The website gets anywhere between 3000 - 4000 visits a day.
You should benchmark. Time the execution of the same page with different includes. But I guess it won't make much difference with 30 files.
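One way to take that measurement is sketched below, using only microtime() and include_once. The function name time_includes is made up, and the file list you pass in is whatever set of back-end files you want to compare.

```php
<?php
// Time how long it takes to include a list of files, in milliseconds.
// include_once loads each file at most once per request, so take every
// measurement in a fresh request or CLI process.
// Note: files included here run in this function's variable scope;
// functions and classes they define are still global.
function time_includes(array $files): float
{
    $start = microtime(true);
    foreach ($files as $file) {
        include_once $file;
    }
    return (microtime(true) - $start) * 1000.0;
}
```

Running it once with all back-end files and once with only the files a given page needs, and comparing the two numbers over several requests, shows whether the difference is large enough to matter.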
But you can save yourself the time and just enable APC in the php.ini (it is a PECL extension, so you need to install it). It will cache the parsed content of your files, which will speed things up significantly.
BTW: There is nothing wrong with laziness, it's even a virtue ;)
If your site is object-oriented I'd recommend using auto-loading (http://php.net/manual/en/language.oop5.autoload.php).
This uses a magic method (__autoload) to look for a class when needed (it's lazy, just like you!), so if a particular page doesn't need all the classes, it doesn't have to get them!
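Note that __autoload was deprecated in PHP 7.2 and removed in PHP 8.0; spl_autoload_register() is the supported way to do the same thing. A minimal sketch, assuming one class per file named after the class; the function name register_simple_autoloader and the directory layout are assumptions:

```php
<?php
// Register an autoloader that maps a class name to "<baseDir>/<ClassName>.php".
// register_simple_autoloader() is a hypothetical helper.
function register_simple_autoloader(string $baseDir): void
{
    spl_autoload_register(function (string $class) use ($baseDir): void {
        // Turn namespace separators into directory separators.
        $file = rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $class) . '.php';
        // Only load the file if it exists; otherwise let PHP try the next
        // registered autoloader or report the missing class itself.
        if (is_file($file)) {
            require $file;
        }
    });
}
```

After one call to register_simple_autoloader() in the master file, a content page that never touches, say, the emailer class never causes its file to be read.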
Again, though, this depends on if it is object-oriented or not...
It will slow down your site, though probably not by a noticeable amount. It doesn't seem like a healthy way to organize your application, though; I'd rethink it. Try to separate the application logic (e.g. most of the server-side code) from the presentation layer (e.g. the HTML/CSS).
It's not a bad practice if the files are small and contain just definitions and settings.
If they actually run code, or are extremely large, it will cause a performance issue.
Now, if your site has 3 visitors an hour, who cares? If you have 30000, that's another issue, and you need to work harder to minimize that.
You can mitigate some of the disadvantages of PHP code compilation by using XCache. This PHP module caches the compiled PHP opcode, which reduces compile time and improves performance.
Considering the size of your website; if you haven't noticed a slowdown, why try to fix it?
When it comes to larger sites, the first thing you should do is install APC. Even though your current method of including files might not benefit as much from APC as it could, APC will still do an amazing job speeding stuff up.
If response-speed is still problematic, you should consider including all your files. APC will keep a cached version of your sourcefiles in memory, but can only do this well if there are no conditional includes.
Only when your PHP application is at a size where memory exhaustion is a big risk (note that for most large-scale websites Memory is not the bottleneck) you might want to conditionally include parts of your application.
Rasmus Lerdorf (the man behind PHP) agrees: http://pooteeweet.org/blog/538
As others have said, it shouldn't slow things down much, but it's not 'ideal'.
If the main issue is that you're too lazy to go changing the paths for all the included files (if the path ever needs to be updated in the future), then you can use a constant to define the path in your main file, and use the constant any time you need to include/require a file.
define('PATH_TO_FILES', '/var/www/html/mysite/includes/go/in/here/');
require_once PATH_TO_FILES.'database.php';
require_once PATH_TO_FILES.'sessions.php';
require_once PATH_TO_FILES.'otherstuff.php';
That way if the path changes, you only need to modify one line of code.
It will indeed slow down your website, mostly because of the relatively slow loading and processing of PHP. The more code you include, the slower the application will get.
I live by "include as little as possible, as much as necessary", so I usually just include my config and session handling for everything, and then each page includes just what it needs using an include path defined in the config include. For path changes you still only need to change one file.
If you include everything the slowdown won't be noticeable until you get a lot of page hits (several hits per second) so in your case just including everything might be ok.
