Find and replace in multiple files - php

OK, what's the best solution in PHP to search through the contents of a bunch of files for a certain string and replace it with something else?
Exactly like Notepad++ does it, but obviously I don't need the interface for that.

foreach (glob("path/to/files/*.txt") as $filename)
{
$file = file_get_contents($filename);
file_put_contents($filename, preg_replace("/regexhere/","replacement",$file));
}
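A slightly more defensive variant of the loop above, as a rough sketch only: it skips directories and unreadable files, and writes a file back only when the replacement actually changed something. The function name replaceInFiles is made up for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: replaces $pattern with $replacement in every file
// matched by $globPattern and returns the names of the files it rewrote.
function replaceInFiles(string $globPattern, string $pattern, string $replacement): array
{
    $changed = [];
    foreach (glob($globPattern) ?: [] as $filename) {
        if (!is_file($filename)) {
            continue; // skip directories that happen to match the glob
        }
        $original = file_get_contents($filename);
        if ($original === false) {
            continue; // unreadable file
        }
        $updated = preg_replace($pattern, $replacement, $original);
        // preg_replace returns null on error; only write real changes
        if ($updated !== null && $updated !== $original) {
            file_put_contents($filename, $updated);
            $changed[] = $filename;
        }
    }
    return $changed;
}
```

Calling replaceInFiles("path/to/files/*.txt", "/regexhere/", "replacement") would then return the list of files that were modified, which is handy for logging.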

So I recently ran into an issue in which our web host converted from PHP 5.2 to 5.3 and in the process it broke our installation of Magento. I did some individual tweaks that were suggested, but found that there were still some broken areas. I realized that most of the problems were related to an issue with the "toString" function present in Magento and the now deprecated PHP split function. Seeing this, I decided that I would try to create some code that would find and replace all the various instances of the broken functions. I managed to succeed in creating the function, but unfortunately the shot-gun approach didn't work. I still had errors afterwards. That said, I feel like the code has a lot of potential and I wanted to post what I came up with.
Please use this with caution, though. I'd recommend zipping a copy of your files so that you can restore from a backup if you have any issues.
Also, you don't necessarily want to use this as is. I'm providing the code as an example. You'll probably want to change what is replaced.
The code finds and replaces text in every file in the folder it is placed in and in all sub-folders. I have it set up so that it only looks at files with the extension php, but you could change that as needed. As it searches, it lists the files it changes. To use this code, save it as "ChangePHPText.php" and upload that file to wherever you need the changes to happen. You can then run it by loading the page associated with that name, for example mywebsite.com/ChangePHPText.php.
<?php
## Replaces toString( with invoke( and split( with preg_split(
function FixPHPText( $dir = "./" ){
$d = new RecursiveDirectoryIterator( $dir );
foreach( new RecursiveIteratorIterator( $d, 1 ) as $path ){
if( is_file( $path ) && substr($path, -3)=='php' && substr($path, -17) != 'ChangePHPText.php'){
$orig_file = file_get_contents($path);
$new_file = str_replace("toString(", "invoke(",$orig_file);
$new_file = str_replace(" split(", " preg_split(",$new_file);
$new_file = str_replace("(split(", "(preg_split(",$new_file);
if($orig_file != $new_file){
file_put_contents($path, $new_file);
echo "$path updated<br/>";
}
}
}
}
echo "----------------------- PHP Text Fix START -------------------------<br/>";
$start = (float) array_sum(explode(' ',microtime()));
echo "<br/>*************** Updating PHP Files ***************<br/>";
echo "Changing all PHP containing toString to invoke and split to preg_split<br/>";
FixPHPText( "." );
$end = (float) array_sum(explode(' ',microtime()));
echo "<br/>------------------- PHP Text Fix COMPLETED in:". sprintf("%.4f", ($end-$start))." seconds ------------------<br/>";
?>
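The str_replace calls above only catch split( when it follows a space or an opening parenthesis. As a hedged alternative, here is a sketch that uses a regular expression with a lookbehind instead; the function name replaceBareSplitCalls is made up for illustration and PHP 7 or later is assumed. Note that it would also rename a user-defined "function split(" declaration, and that preg_split expects a delimited pattern while split did not, so renamed calls may still need their pattern arguments fixed by hand.

```php
<?php
// Hypothetical helper: rewrites bare split( calls to preg_split( while
// leaving names such as str_split(, preg_split( and ->split( alone.
function replaceBareSplitCalls(string $source): string
{
    // The lookbehind rejects a preceding word character, '>', ':' or '$',
    // so methods, static calls and longer function names do not match.
    return preg_replace('/(?<![\w>:$])split\s*\(/', 'preg_split(', $source);
}
```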

Related

Recursively Delete Matching Files via PHP

I recently reworked the naming convention of some images on our website. When I uploaded the images with altered names, I ended up getting the images duplicated, one copy with the old naming convention and one with the new naming convention. The images numbered in the thousands and so I didn't want to manually delete them all.
So I decided that I needed to figure out a php script that would be capable of deleting the old images from the site. Luckily the old images were consistently named with either an ending of f.jpg or s.jpg. So all I had to do is find all the files with those endings and delete them. I thought it was a fairly straightforward thing, but for whatever reason the several different solutions I found listed online didn't work right. I ended up going back to some old code I had posted on Stackoverflow for a different purpose and reworked it for this. I'm posting that code as the answer to my problem in case it might be useful to anyone else.
Below is my solution for finding files matching a certain naming convention in a selected folder and its sub-folders and deleting them. To make it work for your situation, you'll want to place it above the directory that you want to clean out, and you'll specify that folder by replacing the part where I have ./media/catalog/. You'll also want to replace the criteria I have selected, namely (substr($path, -5)=='f.jpg' || substr($path, -5)=='s.jpg'). Note that the 5 in the preceding code is the number of characters being matched at the end of the path. If you wanted to simply match ".jpg" you would replace the 5 with a 4.
As always, when working with code that can affect a lot of files, be sure to make a backup in case the code doesn't work the way you expect it to.
<?php #stick ClearOldJpg.php above the folder you want to delete
function ClearOldJpg(){
$iterator = new RecursiveIteratorIterator(new RecursiveDirectoryIterator("./media/catalog/"));
$files = iterator_to_array($iterator, true);
// iterate over the directory
foreach ($files as $path) {
if( is_file( $path ) && (substr($path, -5)=='f.jpg' || substr($path, -5)=='s.jpg')){
unlink($path);
echo "$path deleted<br/>";
}
}
}
$start = (float) array_sum(explode(' ',microtime()));
echo "*************** Deleting Selected Files ***************<br/>";
ClearOldJpg( );
$end = (float) array_sum(explode(' ',microtime()));
echo "<br/>------------------- Deleting selected files COMPLETED in:". sprintf("%.4f", ($end-$start))." seconds ------------------<br/>";
?>
One fun bonus of this code is that it will list the files being deleted and tell how long it took to run.
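If you would rather review the matches before deleting anything, the search can be separated from the unlink step. This is only a sketch under the same assumptions as the script above (matching on the end of the path); the function name findFilesBySuffix is made up for illustration and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: returns every file under $dir whose name ends with
// one of $suffixes, so the list can be reviewed before anything is deleted.
function findFilesBySuffix(string $dir, array $suffixes): array
{
    $matches = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $fileInfo) {
        if (!$fileInfo->isFile()) {
            continue;
        }
        $path = $fileInfo->getPathname();
        foreach ($suffixes as $suffix) {
            // compare the tail of the path with the suffix, whatever its length
            if (substr($path, -strlen($suffix)) === $suffix) {
                $matches[] = $path;
                break;
            }
        }
    }
    sort($matches);
    return $matches;
}
```

Once the list looks right, something like array_map('unlink', findFilesBySuffix('./media/catalog/', ['f.jpg', 's.jpg'])) would do the actual deleting.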

unlink files with a case-insensitive (glob-like) pattern

I have two folders: one holds the videos and the second holds the configuration files for each video (3 files per video). Right now, if I want to delete a video I have to delete the files by hand.
I found this:
<?php
$filename = 'name.of.the.video.xml';
$term = str_replace(".xml","", $filename);
$dirPath = ("D:/test/");
foreach (glob($dirPath.$term.".*") as $removeFile)
{
unlink ($removeFile);
}
?>
An echo will return:
D:/test/name.of.the.video.jpg
D:/test/name.of.the.video.srt
D:/test/name.of.the.video.xml
This works and it helps me a lot, but I have a problem.
Not all file names use the same case, e.g.:
Name.of.The.video.jpg
Name.Of.The.Video.xml
If a file name in the folder is not identical to $filename, including case, the search returns nothing.
So, my question is: how can I make that search case-insensitive?
Thank you.
You are using the glob function, which is case-sensitive, so it is the wrong function for getting this list of files.
You should therefore first normalize the filenames in the directory so they all share the same case (e.g. all lowercase), or use another method to get the directory listing case-insensitively. I suggest the first; however, if that is not an option, why don't you glob for all files first and then filter the list using preg_grep, which allows you to specify case-insensitive patterns?
Which leads me to the point that it's more practical to use DirectoryIterator with a RegexIterator:
$filename = 'name.of.the.video.xml';
$term = basename($filename, ".xml");
$files = new DirectoryIterator($dirPath);
$filesFiltered = new RegexIterator($files, sprintf('(^%s\\..*$)i', preg_quote($term)));
foreach($filesFiltered as $file)
{
printf("delete: %s\n", $file);
unlink($file->getPathname());
}
A good example of the flexibility of the iterator code is your changed requirement: do that for two directories at once. You just create two DirectoryIterators and append one to the other with an AppendIterator. Job done. The rest of the code stays the same:
...
$files = new AppendIterator();
$files->append(new DirectoryIterator($dirPath1));
$files->append(new DirectoryIterator($dirPath2));
...
Voilà. Sounds good? glob is okay for quick jobs that need nothing more. For everything else involving directory operations, start to consider the SPL. It has much more power.
Is strcasecmp() a valid function for this? It's a case-insensitive string comparison function.
Surely if you know the file name and you can echo it out, you can pass this to unlink()?
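For completeness, the glob-then-preg_grep idea mentioned earlier in this answer could look roughly like this. It is a sketch only: the function name findFilesIgnoringCase is made up for illustration, it assumes PHP 7 or later, and it assumes the directory path is joined to the file name with a forward slash.

```php
<?php
// Hypothetical helper: lists the files in $dirPath whose names start with
// "$term." regardless of letter case.
function findFilesIgnoringCase(string $dirPath, string $term): array
{
    // take every entry in the directory, then filter in PHP
    $all = glob(rtrim($dirPath, '/') . '/*') ?: [];
    // anchor on the last path separator so only the file name is compared
    $pattern = '~(^|/)' . preg_quote($term, '~') . '\.[^/]*$~i';
    return array_values(preg_grep($pattern, $all));
}
```

The returned paths can be passed straight to unlink(), e.g. array_map('unlink', findFilesIgnoringCase('D:/test/', 'name.of.the.video')).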

PHP script that sends an email listing file changes that have happened in a directory/subdirectories

I have a directory with a number of subdirectories that users add files to via FTP. I'm trying to develop a php script (which I will run as a cron job) that will check the directory and its subdirectories for any changes in the files, file sizes or dates modified. I've searched long and hard and have so far only found one script that works, which I've tried to modify - original located here - however it only seems to send the first email notification showing me what is listed in the directories. It also creates a text file of the directory and subdirectory contents, but when the script runs a second time it seems to fall over, and I get an email with no contents.
Anyone out there know a simple way of doing this in php? The script I found is pretty complex and I've tried for hours to debug it with no success.
Thanks in advance!
Here you go:
$log = '/path/to/your/log.js';
$path = '/path/to/your/dir/with/files/';
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path), RecursiveIteratorIterator::SELF_FIRST);
$result = array();
foreach ($files as $file)
{
if (is_file($file = strval($file)) === true)
{
$result[$file] = sprintf('%u|%u', filesize($file), filemtime($file));
}
}
if (is_file($log) !== true)
{
file_put_contents($log, json_encode($result), LOCK_EX);
}
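// are there any differences? (array_diff_assoc compares keys as well as values,
// so a new file whose size|mtime equals that of another file is still detected)
if (count($diff = array_diff_assoc($result, json_decode(file_get_contents($log), true))) > 0)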
{
// send email with mail(), SwiftMailer, PHPMailer, ...
$email = 'The following files have changed:' . "\n" . implode("\n", array_keys($diff));
// update the log file with the new file info
file_put_contents($log, json_encode($result), LOCK_EX);
}
I am assuming you know how to send an e-mail. Also, please keep in mind that the $log file should be kept outside the $path you want to monitor; otherwise every update to the log would itself register as a change.
After reading your question a second time, I noticed that you want to check whether the files change. I'm only checking the size and modification date; if you really want to check whether the file contents differ, I suggest you use a hash of the file, so this:
$result[$file] = sprintf('%u|%u', filesize($file), filemtime($file));
Becomes this:
$result[$file] = sprintf('%u|%u|%s', filesize($file), filemtime($file), md5_file($file));
// or
$result[$file] = sprintf('%u|%u|%s', filesize($file), filemtime($file), sha1_file($file));
But bear in mind that this will be much more expensive, since the hash functions have to open and read all the contents of your 1-5 MB CSV files.
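Note that the comparison above only looks at the current listing, so files that were deleted since the last run are not reported. If you need that too, a rough sketch of a fuller comparison follows; the function name compareSnapshots is made up for illustration, and it assumes both arrays have the [path => "size|mtime"] shape built above.

```php
<?php
// Hypothetical helper: compares two snapshots of the form
// [path => "size|mtime"] and reports what was added, changed or removed.
function compareSnapshots(array $old, array $new): array
{
    $added = array_keys(array_diff_key($new, $old));
    $removed = array_keys(array_diff_key($old, $new));
    // files present in both snapshots whose signature differs
    $changed = array_keys(array_diff_assoc(array_intersect_key($new, $old), $old));
    return ['added' => $added, 'changed' => $changed, 'removed' => $removed];
}
```

You would call it with the decoded log as $old and the fresh $result as $new, then build the e-mail body from the three lists.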
I like sfFinder so much that I wrote my own adaption:
http://www.symfony-project.org/cookbook/1_0/en/finder
https://github.com/homer6/altumo/blob/master/source/php/Utils/Finder.php
Simple to use, works well.
However, for your use, depending on the size of the files, I'd put everything in a git repository. It's easy to track then.
HTH

PHP, search and delete files from directory - performance

I want to delete cache files in a directory; the directory can contain up to 50,000 files. I currently use this function.
// Deletes all files in $type directory that start with $start
function clearCache($type,$start)
{
$open = opendir($GLOBALS['DOC_ROOT']."/cache/".$type."/");
while( ($file = readdir($open)) !== false )
{
if ( strpos($file, $start) === 0 ) // === 0 means the name starts with $start
{
unlink($GLOBALS['DOC_ROOT']."/cache/".$type."/".$file);
}
}
closedir($open);
}
This works fine and it is fast, but is there any faster way to do this? (scandir seems to be slow.) I can't move the cache to memory, obviously.
Thanks,
hamlet
You may want to take a look at the glob function, as it may be even faster... it relies on the C library's glob function to do its work.
I haven't tested this, but I think this would work:
foreach (glob($GLOBALS['DOC_ROOT']."/cache/".$type."/".$start."*") as $file) {
unlink($file);
}
Edit: glob returns each match with the directory prefix used in the pattern, so $file can be passed to unlink directly. The trailing * is needed so that every file starting with $start is matched.
Either glob as suggested before or, if you can be certain there won't be malicious input, by issuing the command directly to the system via exec(sprintf('rm %s/sess*', realpath($path)));, which should be fastest.
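If you want to stay in pure PHP and avoid the shell, here is a sketch of the same prefix-based cleanup using FilesystemIterator. I have not benchmarked it against opendir/readdir or glob, so treat it as an option to measure rather than a guaranteed speed-up. The function name clearCacheByPrefix is made up for illustration and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: deletes every file in $dir whose name starts with
// $prefix and returns how many files were removed.
function clearCacheByPrefix(string $dir, string $prefix): int
{
    $targets = [];
    foreach (new FilesystemIterator($dir, FilesystemIterator::SKIP_DOTS) as $fileInfo) {
        // strncmp compares only the first strlen($prefix) bytes of the name
        $isMatch = strncmp($fileInfo->getFilename(), $prefix, strlen($prefix)) === 0;
        if ($isMatch && $fileInfo->isFile()) {
            $targets[] = $fileInfo->getPathname();
        }
    }
    // delete after the listing is complete, so the directory is not
    // modified while it is being read
    $deleted = 0;
    foreach ($targets as $path) {
        if (unlink($path)) {
            $deleted++;
        }
    }
    return $deleted;
}
```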

How can I improve this PHP code?

I have the PHP code below, which helps me get a photo's thumbnail image path in a script.
It takes a supplied value like this from a MySQL DB: '2/34/12/thepicture.jpg'
It then turns it into this: '2/34/12/thepicture_thumb1.jpg'
I am sure there is a better-performing way of doing this, and I am open to any help.
Also, on a page with 50 users this would run 50 times to get 50 different photos.
// the photo as it is pulled from the DB; it has the folders and filename as one value
$photo_url = '2/34/12/thepicture.jpg';
//build the full photo filepath
$file = $site_path. 'images/userphoto/' . $photo_url;
// make sure file name is not empty and the file exist
if ($photo_url != '' && file_exists($file)) {
//get file info
$fil_ext1 = pathinfo($file);
$fil_ext = $fil_ext1['extension'];
$fil_explode = '.' . $fil_ext;
$arr = explode($fil_explode, $photo_url);
// add "_thumb" or else "_thumb1" inbetween
// the file name and the file extension 2/45/12/photo.jpg becomes 2/45/12/photo_thumb1.jpg
$pic1 = $arr[0] . "_thumb" . $fil_explode;
//make sure the thumbnail image exist
if (file_exists("images/userphoto/" . $pic1)) {
//return the thumbnail image url
$img_name = $pic1;
}
}
One thing I am curious about is how it uses pathinfo() to get the file's extension. Since the extension will always be 3 characters, would other methods of getting this value perform better?
Is there a performance problem with this code, or are you just optimizing prematurely? Unless the performance is bad enough to be a usability issue and the profiler tells you that this code is to blame, there are much more pressing issues with this code.
To answer the question: "How can I improve this PHP code?" Add whitespace.
Performance-wise, if you're calling built-in PHP functions the performance is excellent because you're running compiled code behind the scenes.
Of course, calling all these functions when you don't need to isn't a good idea. In your case, the pathinfo function returns the various paths you need. You call the explode function on the original name when you can build the file name like this (note, the 'filename' is only available since PHP 5.2):
$fInfo = pathinfo($file);
// pathinfo() returns the extension without the leading dot, so add it back
$thumb_name = $fInfo['dirname'] . '/' . $fInfo['filename'] . '_thumb.' . $fInfo['extension'];
If you don't have PHP 5.2, then the simplest way is to ignore that function and use strrpos and substr:
// gets the position of the last dot
$lastDot = strrpos($file, '.');
// first bit gets everything before the dot,
// second gets everything from the dot onwards
$thumbName = substr($file, 0, $lastDot) . '_thumb1' . substr($file, $lastDot);
The best optimization for this code is to increase its readability:
// make sure file name is not empty and the file exist
if ( $photo_url != '' && file_exists($file) ) {
// Get information about the path as stored in the DB, so the result
// stays relative to images/userphoto/
$path_info = pathinfo($photo_url);
// determine the thumbnail name
// add "_thumb" or else "_thumb1" in between
// the file name and the file extension 2/45/12/photo.jpg
// becomes 2/45/12/photo_thumb.jpg
$pic1 = "{$path_info['dirname']}/{$path_info['filename']}_thumb.{$path_info['extension']}";
// if this calculated thumbnail file exists, use it in place of
// the image name
if ( file_exists( "images/userphoto/" . $pic1 ) ) {
$img_name = $pic1;
}
}
I have broken up the components of the function using line breaks, and used the information returned from pathinfo() to simplify the process of determining the thumbnail name.
Updated to incorporate feedback from @DisgruntledGoat
Why are you even concerned about the performance of this function? Assuming you call it only once (say, when the "main" filename is generated) and store the result, its runtime should be essentially zero compared to DB and filesystem access. If you're calling it on every access to re-compute the thumbnail path, well, that's wasteful but it's still not going to be significantly impacting your runtime.
Now, if you want it to look nicer and be more maintainable, that's a worthwhile goal.
The easiest way to fix this is to generate thumbnails for all user profile pics beforehand and keep them around, so you don't keep resizing.
$img_name = preg_replace('/^(.*)(\..*?)$/', '\1_thumb\2', $file);
Edit: bbcode disappeared with \.
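Pulling the pathinfo() suggestions from the answers above together, a small helper could look like the sketch below. The function name thumbnailPath and its default suffix are made up for illustration; it assumes PHP 7 or later and that the thumbnail lives next to the original file.

```php
<?php
// Hypothetical helper: inserts $suffix between the file name and its
// extension, e.g. 2/34/12/thepicture.jpg -> 2/34/12/thepicture_thumb1.jpg.
function thumbnailPath(string $photoUrl, string $suffix = '_thumb1'): string
{
    $info = pathinfo($photoUrl);
    // pathinfo() reports "." as dirname when the path has no directory part
    $dirname = $info['dirname'] ?? '.';
    $dir = ($dirname === '.') ? '' : $dirname . '/';
    // the extension key is absent for names without a dot
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';
    return $dir . $info['filename'] . $suffix . $ext;
}
```

The result can then be checked once with file_exists($site_path . 'images/userphoto/' . thumbnailPath($photo_url)) before it is used.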
