Search and Replace Files in Directory - php

<span class="itemopener">82 top</span> <span class="allopener">all</span>
How can I change above to:
<span class="itemopener">top</span> <span class="allopener">82</span>
with PHP on an html file that contains around 30 of those HTML snippets.
Note: 82 can be any integer above 1.
Also, I want to run this script from a new file that I place in a directory, which will run the search and replace once for each of the 8000 HTML files in that directory (the script mustn't timeout before done - perhaps some feedback.)

I wrote a function that performs the replacement on a single line:
function replace($row){
$replaced = preg_replace_callback("~(\<span class=\"itemopener\"\>)(\d{1,5})\s(top\</span\>.*\<span class=\"allopener\"\>).{3}(\</span\>)~iU", function($matches){
$str = $matches[1] . $matches[3] . $matches[2] . $matches[4];
return $str;
}, $row);
return $replaced;
}
$s = '<span class="itemopener">82 top</span> <span class="allopener">all</span>';
$replaced = replace($s);
echo "<pre>" . print_r($replaced, 1) . "</pre>";
exit();
Working demo of the function
If you read each file line by line and do a simple check for whether the line contains the spans you want to replace, you can pass those lines to this function.
With the number of files you specified, though, it will take some time.
To scan all files in a path you can use my answer there: scandir
With a little editing you can make it read only .htm files and return whatever structure you want.
Then take all the scanned .htm files and process them with something like this:
$allScannedFiles = array("......");
foreach($allScannedFiles as $key => $path){
$file = file_get_contents($path);
$lines = explode(PHP_EOL, $file);
$modifiedFile = "";
foreach($lines as $line){
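// strpos() returns 0 for a match at the start of the line, so compare against false explicitly
if(strpos($line, "span") !== false && strpos($line, "itemopener") !== false){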
$line = replace($line);
}
$modifiedFile .= $line . PHP_EOL;
}
file_put_contents($path, $modifiedFile);
}
I wrote this snippet from memory, so some testing is needed.
Then run it, go make yourself a coffee and wait :)
If it times out, you can increase the PHP timeout. How to do that is asked and answered here: how to increase timeout in php
Alternatively, you can try loading the files as DOMDocument and doing the replacements on that class: documentation of DomDocument
But if any of the files contains invalid HTML, that may cause you problems.
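To tie the pieces above together, here is a minimal sketch of a driver that lifts the time limit and prints some progress while it works. The names swapSpans() and processDirectory() and the $reportEvery parameter are made up for this example, and the simplified pattern assumes the second span always contains the literal text "all"; treat it as a starting point, not a tested solution for all 8000 files.

```php
<?php
// Hypothetical helper: moves the number from the first span into the second.
// Assumption: the second span always contains the literal text "all".
function swapSpans(string $html): string
{
    $out = preg_replace(
        '~(<span class="itemopener">)(\d+)\s+(top</span>\s*<span class="allopener">)all(</span>)~i',
        '$1$3$2$4',
        $html
    );
    return $out ?? $html; // preg_replace() returns null on error
}

// Hypothetical driver: rewrites every *.html file in $dir and reports progress.
function processDirectory(string $dir, int $reportEvery = 500): int
{
    set_time_limit(0); // remove the execution time limit for this run
    $files = glob(rtrim($dir, '/') . '/*.html') ?: [];
    $total = count($files);
    $changed = 0;
    foreach ($files as $n => $path) {
        $old = file_get_contents($path);
        if ($old === false) {
            continue; // unreadable file: leave it untouched
        }
        $new = swapSpans($old);
        if ($new !== $old) {
            file_put_contents($path, $new);
            $changed++;
        }
        if (($n + 1) % $reportEvery === 0 || $n + 1 === $total) {
            echo ($n + 1) . " of $total files processed\n";
            flush(); // push the progress line out to the client
        }
    }
    return $changed;
}
```

Placed in a file inside the directory, it could be called as processDirectory(__DIR__). Running it from the command line usually avoids web server timeouts altogether, and working on a copy of the files first is a good idea.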

I'm using the function created by @Jimmmy (I replaced the range \d{2} with \d{1,5} because "Note: 82 can be any integer above 1") and added the file search (tested it and it works great):
<?php
function replace($row){
$replaced = preg_replace_callback("~(\<span class=\"itemopener\"\>)(\d{1,5})\s(top\</span\>.*\<span class=\"allopener\"\>).{3}(\</span\>)~iU", function($matches){
$str = $matches[1] . $matches[3] . $matches[2] . $matches[4];
return $str;
}, $row);
return $replaced;
}
foreach ( glob( "*.html" ) as $file ) // GET ALL HTML FILES IN DIRECTORY.
{ $lines = file( $file ); // GET WHOLE FILE AS ARRAY OF STRINGS.
for ( $i = 0; $i < count( $lines ); $i++ ) // CHECK ALL LINES IN ARRAY.
$lines[ $i ] = replace( $lines[ $i ] ); // REPLACE PATTERN IF FOUND.
file_put_contents( $file,$lines ); // SAVE ALL ARRAY IN FILE.
}
?>

Related

How to get preg_replace() to delete text between two tags?

I'm trying to make a function in PHP that can delete code within two tags from all .js file within one folder and all its subfolders. So far everything works except preg_replace(). This is my code:
<?php
deleteRealtimeTester('test');
function deleteRealtimeTester($folder_path)
{
foreach (glob($folder_path . '/*.js') as $file)
{
$string = file_get_contents($file);
$string = preg_replace('#//RealtimeTesterStart(.*?)//RealtimeTesterEnd#', 'test2', $string);
$file_open = fopen($file, 'wb');
fwrite($file_open, $string);
fclose($file_open);
}
$subfolders = array_filter(glob($folder_path . '/*'), 'is_dir');
if (sizeof($subfolders) > 0)
{
for ($i = 0; $i < sizeof($subfolders); $i++)
{
echo $subfolders[$i];
deleteRealtimeTester($subfolders[$i]);
}
}
else
{
return;
}
}
?>
As mentioned, I want to delete everything inside these tags and the tags themselves:
//RealtimeTesterStart
//RealtimeTesterEnd
It is important that the tags contain the forward slashes, and also that if a file contains several of these tag pairs, only the code from //RealtimeTesterStart to //RealtimeTesterEnd is deleted and not the code from //RealtimeTesterEnd to the next //RealtimeTesterStart.
I hope that someone can help me.
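You could also change your regex to use the [\s\S] character class, which matches any character, including line breaks.
So use the following (the lazy *? quantifier stops at the first //RealtimeTesterEnd, so separate blocks are removed one by one instead of everything between the first start tag and the last end tag):
preg_replace('#//RealtimeTesterStart[\s\S]*?//RealtimeTesterEnd#', '', $string);
This removes everything from //RealtimeTesterStart to the nearest //RealtimeTesterEnd, including the tags themselves.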
I'm assuming that //RealtimeTesterStart, //RealtimeTesterEnd and the code in between are on different lines? In PCRE, . does NOT match newlines by default. You need to use the s modifier (and you don't need the () unless you need the captured text for the replacement):
#//RealtimeTesterStart.*?//RealtimeTesterEnd#s
Also, look at GLOB_ONLYDIR for glob instead of array_filter. Also, also, maybe file_put_contents instead of fopen etc.
Maybe something like:
foreach (glob($folder_path . '/*.js') as $file) {
$string = file_get_contents($file);
$string = preg_replace('#//RealtimeTesterStart.*?//RealtimeTesterEnd#s', 'test2', $string);
file_put_contents($file, $string);
}
foreach(glob($folder_path . '/*', GLOB_ONLYDIR) as $subfolder) {
deleteRealtimeTester($subfolder);
}
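A small sketch of why the lazy quantifier matters when a file contains more than one tagged block. The sample string is invented for the illustration; only the two patterns come from the discussion above.

```php
<?php
// Invented sample: two tagged blocks with ordinary code between them.
$js = "a();\n//RealtimeTesterStart\nb();\n//RealtimeTesterEnd\nc();\n"
    . "//RealtimeTesterStart\nd();\n//RealtimeTesterEnd\ne();\n";

// Lazy: each match stops at the nearest end tag, so c() survives.
$lazy = preg_replace('#//RealtimeTesterStart.*?//RealtimeTesterEnd#s', '', $js);

// Greedy: one match runs from the first start tag to the last end tag, so c() is lost.
$greedy = preg_replace('#//RealtimeTesterStart.*//RealtimeTesterEnd#s', '', $js);

var_dump(strpos($lazy, 'c();') !== false);   // bool(true)
var_dump(strpos($greedy, 'c();') !== false); // bool(false)
```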

Search only the first column of a text file

I have a text file with the following contents:
---> 12455 ---> 125 ---> KKK
---> 11366 ---> 120 ---> LLL
---> 12477 ---> 120 ---> YYY
I am using the following PHP code to search the file for "---> 124" and I get the following results:
---> 12455 ---> 125 ---> KKK
---> 12477 ---> 120 ---> YYY
but I want the results to be like this:
---> 12455
---> 12477
I want it to return only the first column.
<?php
$file = 'mytext.txt';
$searchfor = '---> ' . "124";
// the following line prevents the browser from parsing this as HTML.
header('Content-Type: text/plain');
// get the file contents, assuming the file to be readable (and exist)
$contents = file_get_contents($file);
// escape special characters in the query
$pattern = preg_quote($searchfor, '/');
// finalise the regular expression, matching the whole line
$pattern = "/^.*$pattern.*\$/m";
// search, and store all matching occurrences in $matches
if(preg_match_all($pattern, $contents, $matches)) {
echo implode($matches[0]);
} else {
echo "No matches found";
}
?>
Change your approach a little bit. Instead of storing the search term and separator in a single string, use two variables.
$sep = '--->';
$searchfor = '124';
$pattern = "/^$sep\s+($searchfor\d+)\s+.*/m";
// search, and store all matching occurrences in $matches
if(preg_match_all($pattern, $contents, $matches)){
echo implode(' ', $matches[1])."\n";
}
Outputs:
12455 12477
Demo.
First of all, separate your concerns:
Read the file
Parse the content
Search
Using iterators, you can achieve something great here, but it needs a deeper understanding of OOP and the Iterator interface. What I'd recommend is a simpler approach:
<?php
// Read the file line by line
$handle = fopen('file.txt', 'r');
while (($content = fgets($handle)) !== false) {
    // Parse the line: element [0] is empty, element [1] is the first column
    $content = explode('---> ', $content);
    // Analyse the line: keep it when the first column starts with 124
    if (isset($content[1]) && strpos($content[1], '124') === 0) {
        echo '---> ' . trim($content[1]) . "\n";
    }
}
fclose($handle);
That should be it; just adapt it as you see fit. I haven't tested the code here!
Change "/^.*$pattern.*\$/m" to "/$pattern\d*/i"
and then change echo implode($matches[0]); to foreach($matches[0] as $item) echo "$item<br />\r\n";
If the structure is always as you have shown, then:
Read the file line by line;
explode() each line by space;
Read element [1] of the result.
This seems the most logical approach to me. There is no need for a regex here, because it will be slower than a simple explode() operation.
Here is an example:
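$handle = fopen( 'file.txt', 'r' );
if ( $handle ) {
    while ( ( $line = fgets( $handle ) ) !== false ) {
        $matches = explode( ' ', $line );
        // $matches[0] is the "--->" separator, $matches[1] is the first column
        if ( isset( $matches[1] ) && strpos( $matches[1], '124' ) === 0 )
            echo $matches[1] . '<br/>';
    }
    fclose( $handle );
}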
try this:
--->\s\d{5}
A regex is overkill here; a simple explode('---> ', $str) and selecting every third element, starting at index 1, would suffice:
$file = file_get_contents('file.txt');
$lines = explode('---> ', $file);
$col = array(); // stays an empty array when nothing matches
for($i=1; $i<count($lines); $i=$i+3)
    if(strpos($lines[$i], '124')!==false)
        $col[intdiv($i, 3)] = /*'---> ' . */$lines[$i];
print_r($col);
That seems to work just fine. Uncomment the comment above if you want the ---> included in the output. Also, the resulting $col array is indexed by the zero-based row number where the match was found. Just replace [intdiv($i, 3)] with [] if you don't want that.
Furthering this:
function SearchFileByColumn($contents, $col_num, $search, $col_count = 3) {
    $res = array(); // returned as-is when nothing matches
    $segs = explode('---> ', $contents);
    for($i=$col_num; $i<count($segs); $i=$i+$col_count)
        if(strpos($segs[$i], $search) !== false)
            $res[] = $segs[$i];
    return $res;
}
$results = SearchFileByColumn($contents, 1, '124');
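If the output should look exactly like the lines asked for in the question ("---> 12455"), anchoring the pattern at the start of each line and capturing only the first column is another option. This is a sketch; firstColumnMatches() is a made-up name, and it assumes every line starts with "--->" followed by spaces or tabs.

```php
<?php
// Hypothetical helper: returns the first column (with its "--->" marker)
// of every line whose first value starts with $prefix.
function firstColumnMatches(string $contents, string $prefix): array
{
    // ^ with the m modifier anchors at each line start, so later columns never match.
    $pattern = '/^(--->[ \t]+' . preg_quote($prefix, '/') . '\d*)(?=\s|$)/m';
    preg_match_all($pattern, $contents, $m);
    return $m[1];
}

$contents = "---> 12455 ---> 125 ---> KKK\n"
          . "---> 11366 ---> 120 ---> LLL\n"
          . "---> 12477 ---> 120 ---> YYY\n";

echo implode("\n", firstColumnMatches($contents, '124')), "\n";
```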

Don't echo things with certain characters

I have a PHP program that looks at a log file and prints it to a page (code below). I don't want the user of said website to be able to look at any line containing a /. I know I could use trim to delete certain characters, but is there a way to delete the entire line? For example, I want to keep something like "Hello" and delete something like /xx.xx.xx.xx connected. All the lines I wish to delete have the same common key, /. People's names in said log file have <>s around them, so I must use htmlspecialchars.
$file = file_get_contents('/path/to/log', true);
$file = htmlspecialchars($file);
echo nl2br($file);
Thanks for your help!
EDIT:
Thanks for all of the answers, currently tinkering with them!
EDIT2:
final code:
<?php
$file = file_get_contents('/path/to/log', true);
// Separate by line
$lines = explode(PHP_EOL, $file);
foreach ($lines as $line) {
if (strpos($line, '/') === false) {
$line = htmlspecialchars($line . "\n");
echo nl2br($line);
}
}
?>
Do you mean, like this?
$file = file_get_contents('/path/to/log', true);
// Separate by line
$lines = explode(PHP_EOL, $file);
foreach ($lines as $line) {
if (strpos($line, '/') === false) {
// If the line doesn't contain a "/", echo it
echo $line . PHP_EOL;
}
}
For anyone wondering, PHP_EOL is the PHP constant for "end of line" and promotes consistency between different systems (Windows, UNIX, etc.).
If you are iterating through the file line by line, you can check with preg_match whether the line contains the / character and skip the echo if it does. If you are not, first split the contents at the newlines and iterate over that array.
If you don't want to split the file you can probably use preg_replace with a regexp such as (^|\n).*/.*(\n|$) and replace with empty string.
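A compact variant of the split-and-filter idea, sketched with preg_grep() and its PREG_GREP_INVERT flag, which keeps only the lines that do not match. The function name visibleLogLines() is invented for this example.

```php
<?php
// Hypothetical helper: drops every line containing "/" and escapes the rest.
function visibleLogLines(string $log): string
{
    // \R matches any line-break sequence, so Windows and UNIX logs both work.
    $lines = preg_split('/\R/', $log, -1, PREG_SPLIT_NO_EMPTY);
    // PREG_GREP_INVERT returns the lines that do NOT match the pattern.
    $kept = preg_grep('~/~', $lines, PREG_GREP_INVERT);
    return implode("\n", array_map('htmlspecialchars', $kept));
}

echo nl2br(visibleLogLines("Hello\n/10.0.0.1 connected\n<Bob> hi\n"));
```

Escaping after filtering means the check for / runs on the raw text, before htmlspecialchars() has altered the line.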
Use the str_replace function -
http://php.net/manual/en/function.str-replace.php. Alternate solution (before escaping the special characters) -
/* pattern /\/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\sconnected/ = /xx.xx.xx.xx connected */
/* pattern will be replaced with "newtext" */
$file = file_get_contents("/path/to/log", true);
$lines = explode("\n", $file);
foreach ($lines as $line) {
    $correctline = preg_replace( '/\/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\sconnected/', 'newtext', $line );
    echo $correctline . "\n"; // explode() removed the line breaks, so add one back
}
<?php
$file = file_get_contents("/path/to/log", true);
$lines = explode("\n", $file);
foreach ($lines AS $num => $line)
{
if ( strpos($line, "/") === false ) // Line doesn't contain "/"
{
echo htmlspecialchars($line) . "\n";
}
}
?>

PHP file to open and modify other php (With find and replace via regex?)

I want to do the following
I want to create a .php file (executed via cron jobs) that will insert this code, $files[] = 'example.php';, into another PHP file (paste.php). It has to find the latest $files[] line, matching something like the regex $files[] = '(AnythingHere)';, and insert the new line after it. The file can contain any number of pages, so I have no way of knowing the count in advance.
<?php
if (!isset($php_file)) {
$files[] = 'page1.php';
$files[] = 'page2.php';
$files[] = 'page3.php';
$files[] = 'page4.php';
$file = $files[ rand(0,count($files)) ];
I hope you guys understand what I want; can anyone help me out with this one?
If you have ONLY $files[] = '...' lines in paste.php, you can simply append to the file:
$line = '$files[] = "pageX.php";' . PHP_EOL;
file_put_contents('paste.php', $line, FILE_APPEND);
Or, if you want to insert after the last $files[] entry:
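$yourNewLine = '$files[] = "pageX.php";'; // this is an example. put your "line" prm here
$filename = 'paste.php';
$lines = file($filename);
$lines = array_reverse($lines);
$count = count($lines);
for ( $i = 0; $i < $count; $i++ )
{
    // in the reversed array, the first match is the last $files[] line of the file
    if ( strpos(ltrim($lines[$i]), '$files[] = ') === 0 )
    {
        // inserting before index $i here puts the new line after it once reversed back
        array_splice($lines, $i, 0, $yourNewLine.PHP_EOL);
        break; // also ends the loop cleanly when no line matches at all
    }
}
$lines = array_reverse($lines);
file_put_contents($filename, $lines);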
Instead of doing it this way, how about instead setting your files array in a script and then include it at the top. This way you can reference the array directly and still only have to edit the file listing in only one place.
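A rough sketch of that idea: keep the list in its own file that simply returns an array, and let the cron script rewrite that file. The helper name addPage() and the file name files.php are made up here.

```php
<?php
// Hypothetical helper: adds $page to the list stored in $listPath.
// The list file is expected to contain "<?php return [...];".
function addPage(string $listPath, string $page): array
{
    $files = [];
    if (is_file($listPath)) {
        $files = require $listPath; // evaluates the file and receives its array
    }
    if (!in_array($page, $files, true)) {
        $files[] = $page; // skip duplicates so repeated cron runs stay harmless
    }
    // var_export() produces valid PHP source for the array.
    $code = '<?php return ' . var_export($files, true) . ";\n";
    file_put_contents($listPath, $code, LOCK_EX);
    return $files;
}
```

The page script would then start with $files = require 'files.php'; and could pick an entry with $files[array_rand($files)], so no regex editing of paste.php is needed.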
Quick and dirty first-fit solution:
1. Open the file
2. Read each line until you find one matching your regex for $files[] = ...
3. Read more lines until you find one that doesn't match the regex
4. Write each line read in 2 and 3 to the output file
5. Insert your new line into the output
6. Write the rest of the input to the output
This may not be the best way to approach the problem, drawbacks being that you have to read each line in and compare it with your regex until you find your insertion point. You'll also probably have a temporary file for output which you'll then rename to the original filename.
You'll have 2 while loops:
while (line does not match): read next line
and then
while (line does match): read next line
Someone who knows PHP better than I do might be able to come up with something a bit cleaner, but if you're just looking for something quick to get the job done, this ought to work.
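For what it's worth, the two loops can be folded into a single pass in PHP. The sketch below is one possible reading of the steps above; insertAfterRun() is an invented name, and it writes to a temporary file next to the original before renaming it.

```php
<?php
// Hypothetical helper: inserts $newLine right after the first run of lines matching $regex.
function insertAfterRun(string $path, string $regex, string $newLine): bool
{
    $in = fopen($path, 'r');
    if ($in === false) {
        return false;
    }
    $tmp = $path . '.tmp';
    $out = fopen($tmp, 'w');
    if ($out === false) {
        fclose($in);
        return false;
    }
    $inRun = false;     // true once a matching line has been seen
    $inserted = false;  // true once the new line has been written
    $last = '';
    while (($line = fgets($in)) !== false) {
        $isMatch = preg_match($regex, $line) === 1;
        if ($inRun && !$isMatch && !$inserted) {
            fwrite($out, $newLine . "\n"); // first non-matching line after the run
            $inserted = true;
        }
        $inRun = $inRun || $isMatch;
        fwrite($out, $line);
        $last = $line;
    }
    if ($inRun && !$inserted) {
        // The run reached the end of the file.
        if (substr($last, -1) !== "\n") {
            fwrite($out, "\n");
        }
        fwrite($out, $newLine . "\n");
        $inserted = true;
    }
    fclose($in);
    fclose($out);
    if ($inserted) {
        return rename($tmp, $path);
    }
    unlink($tmp); // nothing matched: leave the original untouched
    return false;
}
```

A call might look like insertAfterRun('paste.php', '/^\s*\$files\[\]\s*=/', "\$files[] = 'example.php';").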
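Having this code:
$filesArray = array('page1.php','page2.php','page3.php','page4.php','page5.php',);
then get the PHP file with $data = file("path/to/editable_file.php"); and insert the new name before the closing parenthesis ($newfilename is assumed to hold the name to add):
foreach($data as $i => $line)
{
    // append the new entry just before the closing ");" of the array
    $updated = preg_replace('/(\$filesArray\s*=\s*array\([^)]*)\);/', '$1' . "'" . $newfilename . "',);", $line, 1, $count);
    if($count > 0)
    {
        $data[$i] = $updated;
        // file() keeps the line endings, so the lines can be joined as they are
        file_put_contents("path/to/editable_file.php", implode("", $data));
        break;
    }
}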

PHP - alpha sort lines from several files in one directory and save them to files of "x" lines max in alpha named folders

This below goes through files in a directory, reads them and saves them in files of 500 lines max to a new directory.
This works great for me (thanks Daniel) but, I need a modification.
I would like to save to alphanumerically named files.
Sorting the array alphanumerically (it is already lowercase) would be the first step, I assume.
Grab all of the lines in each $incoming."/*.txt" that start with "a" and put them into a folder at $save500."/a", but with a max of 500 lines per file.
(I guess it would be best to start with the first at the top of the sort so "0" not "a" right?)
All the lines that start with a number, go into $save500."/num".
None of the lines will start with anything but a-z0-9.
This will allow me to search my files for a match more efficiently using this flatfile method. Narrowing it down to one folder.
$nextfile=0;
if (glob("" . $incoming . "/*.txt") != false){
$nextfile = count(glob("" . $save500 . "/*.txt"));
$nextfile++;
}
else{$nextfile = 1;}
/**/
$files = glob($incoming."/*.txt");
$lines = array();
foreach($files as $file){
$lines = array_merge($lines, file($file, FILE_SKIP_EMPTY_LINES | FILE_IGNORE_NEW_LINES));
}
$lines = array_unique($lines);
/*this would put them all in one file*/
/*file_put_contents($dirname."/done/allofthem.txt", implode("\n", $lines));*/
/*this breaks them into files of 500*/
foreach (array_chunk($lines, 500) as $chunk){
file_put_contents($save500 . "/" . $nextfile . ".txt", implode("\n", $chunk));
$nextfile++;
}
Each still need to be in a max of 500 lines.
I will graduate to mysql later on. Only been doing this a couple months now.
As if that were not enough, I even thought of splitting on the first two characters, making directories with subfolders a/0 through z/z!
The approach above could be wrong, since there have been no responses.
But I want a word like aardvark appended to 1.txt in the a/a folder, unless 1.txt already has 500 lines, in which case it is saved to 2.txt in a/a.
So xenia would be appended to 1.txt in the x/e folder, unless it already has 500 lines, in which case 2.txt is created and the word is saved there.
I will then be able to search for those words more efficiently without loading a ton into memory or looping through files /lines that won't contain a match.
Thanks everyone!
I wrote some code here that should do what you're looking for. It's no performance beauty, but it should do the job. Try it in a safe environment; there is no guarantee against data loss ;)
Comment if there are any errors, it's pretty late here ;) I have to get some sleep ;)
NOTE: This one only works if every line has at least 2 characters! ;)
$nextfile=0;
if (glob("" . $incoming . "/*.txt") != false){
$nextfile = count(glob("" . $save500 . "/*.txt"));
$nextfile++;
}
else
{
$nextfile = 1;
}
$files = glob($incoming."/*.txt");
$lines = array();
foreach($files as $file){
$lines = array_merge($lines, file($file, FILE_SKIP_EMPTY_LINES | FILE_IGNORE_NEW_LINES));
}
$lines = array_unique($lines);
/*this would put them all in one file*/
/*file_put_contents($dirname."/done/allofthem.txt", implode("\n", $lines));*/
/*this breaks them into files of 500*/
// sort array
sort($lines);
// outer grouping
$groups = groupArray($lines, 0);
$group_keys = array_keys($groups);
foreach($group_keys as $cKey) {
// inner grouping
$groups[$cKey] = groupArray($groups[$cKey], 1);
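foreach($groups[$cKey] as $innerKey => $innerArray) {
    // file_put_contents() does not create missing directories, so create them first
    $dir = $save500 . "/" . $cKey . "/" . $innerKey;
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }
    $nextfile = 1;
    foreach(array_chunk($innerArray, 500) as $chunk) {
        file_put_contents($dir . "/" . $nextfile . ".txt", implode("\n", $chunk));
        $nextfile++;
    }
}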
}
function groupArray($data, $offset) {
$grouped = array();
foreach($data as $cLine) {
$key = substr($cLine, $offset, 1);
if(!isset($grouped[$key])) {
$grouped[$key] = array($cLine);
}
else
{
$grouped[$key][] = $cLine;
}
}
return $grouped;
}
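The question also asks for lines that start with a digit to go into a single "num" folder, which the grouping above does not do. A possible sketch of just the folder-picking step follows; bucketFor() and groupByBucket() are invented names, and the "_" placeholder for one-character lines is purely an assumption.

```php
<?php
// Hypothetical helper: returns the relative folder for one non-empty, lowercase line.
function bucketFor(string $line): string
{
    $first = $line[0];
    if (ctype_digit($first)) {
        return 'num'; // every line starting with 0-9 shares one folder
    }
    // Assumption: one-character lines use "_" as their second-level folder.
    $second = strlen($line) > 1 ? $line[1] : '_';
    return $first . '/' . $second;
}

// Hypothetical helper: sorts the lines and groups them by target folder.
function groupByBucket(array $lines): array
{
    sort($lines, SORT_STRING);
    $buckets = [];
    foreach ($lines as $line) {
        if ($line === '') {
            continue; // nothing to key an empty line on
        }
        $buckets[bucketFor($line)][] = $line;
    }
    return $buckets;
}
```

Each bucket could then be passed through array_chunk($bucket, 500) and written to numbered files inside $save500 . "/" . $folder, creating the folder first.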
