Find specific text in multiple TXT files in PHP


I want to find a specific text string in one or more text files in a directory, but I don't know how. I have Googled for quite a long time now and haven't found anything. Therefore I'm asking you guys how I can do this.
Thanks in advance.

If it is a Unix host you're running on, you can make a system call to grep in the directory:
$search_pattern = "text to find";
$output = array();
$result = exec("/path/to/grep -l " . escapeshellarg($search_pattern) . " /path/to/directory/*", $output);
print_r($output);
// Prints a list of filenames containing the pattern

You can get what you need without the use of grep. Grep is a handy tool when you are on the command line, but you can do what you need with just a bit of PHP code.
This little snippet, for example, gives you results similar to grep:
$path_to_check = '';
$needle = 'match';
foreach (glob($path_to_check . '*.txt') as $filename)
{
    foreach (file($filename) as $fli => $fl)
    {
        if (strpos($fl, $needle) !== false)
        {
            echo $filename . ' on line ' . ($fli + 1) . ': ' . $fl;
        }
    }
}

If you're on a Linux box, you can use grep instead of PHP. In PHP specifically, you can iterate over the files in a directory, read each one into a string, search for the text, and record the file name if the text is found.
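The steps described above can be sketched roughly as follows. This is only an illustration: the helper name find_files_containing() and the temporary demo files are made up for the example, and the match is a plain case-sensitive substring search.

```php
<?php
// Hypothetical helper (not from the answers above): returns the paths of all
// files matching the glob $pattern whose contents contain $needle.
function find_files_containing(string $pattern, string $needle): array
{
    $hits = [];
    // glob() returns false on error, so fall back to an empty list
    foreach (glob($pattern) ?: [] as $filename) {
        if (!is_file($filename)) {
            continue; // skip directories and other non-files
        }
        $contents = file_get_contents($filename);
        if ($contents !== false && strpos($contents, $needle) !== false) {
            $hits[] = $filename;
        }
    }
    return $hits;
}

// Self-contained demo in a temporary directory (file names are made up).
$dir = sys_get_temp_dir() . '/find_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/a.txt", "first line\nhas a match here\n");
file_put_contents("$dir/b.txt", "nothing to see\n");

$found = find_files_containing("$dir/*.txt", 'match');
print_r(array_map('basename', $found));
```

Reading whole files with file_get_contents() is fine for small text files; for very large ones, the line-by-line approach from the earlier answer uses less memory.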

Just specify a file name, get the contents of the file, and do regex matching against the file contents. The code sample below uses file_get_contents() and preg_match_all():
$fileName = '/path/to/file.txt';
$fileContents = file_get_contents($fileName);
$searchStr = 'I want to find this exact string in the file contents';
if ($fileContents !== false) { // file was retrieved successfully
    // build a delimited pattern; preg_quote() escapes regex metacharacters
    // so that the search string is matched literally
    $pattern = '/' . preg_quote($searchStr, '/') . '/';
    $matchCount = preg_match_all($pattern, $fileContents, $matches);
    if ($matchCount) { // there were matches
        // $matches[0] will contain every occurrence of the full match
        // $matches[1..n] will contain the capture groups, if the pattern has any
    }
} else { // file retrieval had problems
}
Note: This will work irrespective of whether or not you're on a Linux box.

Related

How to replace PHP code in a file having newlines

I have some files to change at the click of a button. For this, the old string to replace and the new one are both saved in a database.
On the button click, a function runs that finds the old string in the PHP file and replaces it with the new one. (The final goal is to automate the PHP edits in a web application after an update.)
My problem is that it works perfectly on short strings (without newlines), but as soon as the string contains a newline, nothing happens.
This is my current code:
$path = '/mypath/' . $item['path'];
$old_code = $item['old_code'];
$new_code = $item['new_code'];
}
$pos = strpos(file_get_contents($path), $old_code);
$file = file_get_contents($path);
$str = str_replace($old_code, $new_code, $file);
file_put_contents($path, $str);
$pos is set (not false) only if my $old_code doesn't contain any newline.
I tried to use preg_match to remove \n, but the problem is that when I write my edits back to the file with file_put_contents, every newline will also disappear.
Example of code for which str_replace does not work:
echo "ok"; echo 'hey there is some spaces before'
echo 'this is a sentence';
$menu = ['test1', 'test200'];
print_r($menu);
$url = "/link/to/test";
$div = "echo \"<div class='central_container' align='center'>\";";
Do you have any idea how to resolve this?
Thanks

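str_replace() is not limited to single lines; it matches multi-line strings too, as long as they are byte-for-byte identical. A likely cause here (an assumption, since the file contents aren't shown) is that the line endings or the indentation differ between the string stored in the database and the file, for example \r\n versus \n. You have two options.
Either normalize the line endings on both sides before calling str_replace(), or replace str_replace() with preg_replace() and a pattern that tolerates whitespace differences. https://regex101.com/ can help with the latter: it also has a code generator you can use once your regex is finished.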
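One thing worth checking before reaching for a regex is whether the line endings in the file and in the stored snippet actually match. The sketch below assumes the mismatch is \r\n versus \n, which the question does not confirm; normalize_newlines() is a made-up helper name and the sample strings are invented.

```php
<?php
// Hypothetical helper: convert Windows (\r\n) and old Mac (\r) line endings to \n.
function normalize_newlines(string $text): string
{
    return str_replace(["\r\n", "\r"], "\n", $text);
}

$fileContents = "echo 'a';\r\necho 'b';\r\necho 'c';\r\n"; // file saved with \r\n
$oldCode = "echo 'a';\necho 'b';";                          // snippet stored with \n
$newCode = "echo 'x';\necho 'y';";

// Without normalization the multi-line snippet is not found.
var_dump(strpos($fileContents, $oldCode)); // bool(false)

// With both sides normalized, a plain str_replace() handles the newlines.
$updated = str_replace(
    normalize_newlines($oldCode),
    normalize_newlines($newCode),
    normalize_newlines($fileContents)
);
echo $updated;
```

Note that the result is written back with \n line endings throughout. If the original endings must be preserved, they would have to be detected first and restored afterwards.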

How do I properly log only files changed with PHP shell_exec?

I have created a script which copies changed files from a development site to a live site that works flawlessly.
I'm now trying to log which files were changed and then add that list to a DB table that keeps track of changes.
I use shell_exec to run rsync for the copy and then am trying to trim the output and add \n for formatting.
The output is something like "sending incremental file list portalMaint.php sent 27,659 bytes received 81 bytes 55,480.00 bytes/sec total size is 101,582,367 speedup is 3,661.95".
Here is the code I have:
$command = "sudo -S rsync -av ".$exclude." ".$source." ".$dest." --delete 2>&1";
// --- Issue command and check for errors.
$exErrors = shell_exec($command);
// stripos() can return 0 for a match at the start of the string, so compare against false
if (stripos($exErrors, "error:") !== false || stripos($exErrors, "[sudo]") !== false) {
    $error = "Uh-OH, we have a problem! Don't Panic!";
    $errors = $exErrors;
    include("head.php");
    include("template_".$currentPage.".html");
    include("foot.php");
    exit();
}else{
    $filesCopied = $exErrors;
    $filesCopied = substr($filesCopied, 0, strrpos($filesCopied, " sent "));
    $filesCopied = preg_replace("/\s+/", "\n", $filesCopied);
}
This does NOT work. $filesCopied ends up being blank.
If I comment out $filesCopied = substr($filesCopied, 0, strrpos($filesCopied, " sent ")); I get the entire output unformatted.
What am I doing wrong? I just need the files that were changed 1 per line.
Thanks.

If your unformatted output always has the same pattern:
sending incremental file list file1.php sent ...
sending incremental file list file2.php sent ...
sending incremental file list file3.php sent ...
you can use preg_match_all() to capture the file names into an array ($result here holds the shell output):
if (preg_match_all('/file list (.*?) sent /', $result, $matches)) {
    $filesCopied = $matches[1];
} else {
    echo 'Pattern does not match';
}

Found the answer!
As it turns out, although the shell output was echoing as one line, it was masking line breaks. So when it came to formatting the output, the regex wasn't matching.
What I did was send the shell output to a file, where I could see the line breaks.
So I took the shell output variable $filesCopied and ran it through preg_replace() using \R as the pattern. I found that here: Replace multiple newlines, tabs, and spaces
Thank you @Anggara, as your code was better than mine for formatting and is what I am using.
Here is my final code:
$command = "sudo -S rsync -av ".$exclude." ".$source." ".$dest." --delete 2>&1";
// --- Issue command and check for errors.
$exErrors = shell_exec($command);
// stripos() can return 0 for a match at the start of the string, so compare against false
if (stripos($exErrors, "error:") !== false || stripos($exErrors, "[sudo]") !== false) {
    $error = "Uh-OH, we have a problem! Don't Panic!";
    $errors = $exErrors;
    include("head.php");
    include("template_".$currentPage.".html");
    include("foot.php");
    exit();
}else{
    // -- Strip invisible line breaks
    $filesCopiedRaw = preg_replace('#\R+#', ' ', $exErrors);
    // -- Strip all but files and folders from string and build an array
    preg_match_all('/file list (.*?) sent /', $filesCopiedRaw, $matches);
    $result = $matches[1];
    // -- Convert array to single string
    $filesCopied = "";
    foreach($result as $file) {
        $filesCopied .= $file." ";
    }
    // -- Replace spaces in string with line breaks
    $filesCopied = preg_replace("/\s+/", "\n", $filesCopied);
}
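Since the shell output turned out to contain real line breaks, another option is to split on them directly instead of flattening the output and re-splitting on spaces. The following is only a sketch under assumptions: rsync_changed_files() is a made-up helper, the sample output is reconstructed from the text quoted in the question (the second file name is invented), and the parser assumes the header line and the "sent ..." summary line look exactly as shown.

```php
<?php
// Hypothetical helper: extract the per-file lines from `rsync -av` output.
// Assumes a "sending incremental file list" header and a summary starting with "sent ".
function rsync_changed_files(string $output): array
{
    $files = [];
    foreach (preg_split('/\R/', $output) as $line) {
        $line = trim($line);
        if ($line === '' || $line === 'sending incremental file list') {
            continue; // skip blank lines and the header
        }
        if (strpos($line, 'sent ') === 0) {
            break; // summary reached, nothing after this is a file
        }
        $files[] = $line; // note: "deleting ..." lines from --delete are kept as-is
    }
    return $files;
}

$sample = "sending incremental file list\n"
        . "portalMaint.php\n"
        . "css/site.css\n"
        . "\n"
        . "sent 27,659 bytes  received 81 bytes  55,480.00 bytes/sec\n"
        . "total size is 101,582,367  speedup is 3,661.95\n";

$filesCopied = implode("\n", rsync_changed_files($sample));
echo $filesCopied, "\n";
```

Working line by line also keeps file names that contain spaces intact, which the space-based splitting above would break apart.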

How to highlight strings in a file to download?

I have a text file "output.txt" which is the output of a shell command. I want to highlight certain words, taken from $_GET['word'], in that file and then offer it for download via a link.
I have seen multiple questions, but none of them seems to work in this case.
Highlight multiple keywords from a given string
highlight the word in the string, if it contains the keyword
Code:
$cmd = shell_exec("/usr/local/bin/clustalw2 -infile=input.txt -tree -type=protein -case=upper &");
$file = 'output.txt';
$content = explode("\n",file_get_contents("output.txt"));
$keyword = $_GET['word'];
$content = str_replace($keyword,'<span style="color:red">'.$keyword.'</span>',$content);
$file = file_put_contents($file, $content);
echo "<a href='http://some.thing.com/folder/output.txt'>Download Result file</a>";
It gives no error, but it doesn't highlight the text either.

I suspect the trouble is with opening the initial file. The following did work for me.
I created /home/input.txt:
this is a test
that may
or may not work.
Then ran:
$cmd = shell_exec(" cp /home/input.txt output.txt");
$file = 'output.txt';
$content = explode("\n",file_get_contents("output.txt"));
$keyword = "may";
$content = str_replace($keyword,'<span style="color:red">'.$keyword.'</span>',$content);
$file = file_put_contents($file, implode("\n", $content));
echo "<a href='http://some.thing.com/folder/output.txt'>Download Result file</a>";
And the output.txt is now:
this is a test
that <span style="color:red">may</span>
or <span style="color:red">may</span> not work.
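Because the keyword comes from $_GET and the result contains HTML markup, it may be worth escaping both the text and the keyword before inserting the span. Here is a minimal sketch of that idea; highlight_keyword() is a made-up name, the match is case-insensitive by choice, and the sample text is adapted from the test file above.

```php
<?php
// Hypothetical helper: HTML-escape $text, then wrap each occurrence of
// $keyword (case-insensitive) in a red <span>.
function highlight_keyword(string $text, string $keyword): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    $needle  = htmlspecialchars($keyword, ENT_QUOTES, 'UTF-8');
    if ($needle === '') {
        return $escaped; // nothing to highlight
    }
    // preg_quote() makes the keyword match literally inside the pattern
    $pattern = '/' . preg_quote($needle, '/') . '/i';
    return preg_replace($pattern, '<span style="color:red">$0</span>', $escaped);
}

$text = "that may\nor May not work <b>";
$highlighted = highlight_keyword($text, 'may');
echo $highlighted;
```

One limitation of this sketch: a keyword such as "amp" would also match inside an entity like &amp;amp;. Also keep in mind that the spans only render as colour when the result is served as HTML; in a plain .txt download they appear as literal tags.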

Find/Replace part of text in PHP and convert to HTML

I have a large number of ASCII text files and am listing out the contents of each using the code below:
<?php
$file = $_GET['file'];
$orig = file_get_contents($file);
$a =htmlentities($orig);
echo $a;
?>
Some strings of text in each ASCII file are references to file names of other files and I'm trying to find and replace them with a Hyperlink to that file.
For example, a text file might be called "LAB_E143.txt" which looks like this:
LAB_E143:
LDX $#FF ; load X with $FF
JSR LAB_E151 ; jump to this location
and what I'm trying to find & replace are references beginning with "LAB_" (e.g. LAB_E151 in the example above) so that it displays the text as a Hyperlink with a href of:
http:\\capture.php?file=lab_e151.txt
Clicking on that link will then display the contents of that particular text file and so on. All the references begin with "LAB_" followed by 4 variable characters.
I've tried str_replace but am struggling to parse the 4 variable characters each time.
Any help / pointers greatly appreciated

You should use Regex for such cases. As shudder mentioned, preg_replace_callback should be the best function to use for this purpose.
Detect all references with the following Regex: /LAB_(?<id>\S{4})/
Write a function to replace the matches with the <a> tag
That's it.
$text = 'LAB_8435 Lorem ipsum dolor sit amet. LAB_8337 Amet.';
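$formattedText = preg_replace_callback('/LAB_(?<id>\S{4})/', function ($matches) {
    // link to the file named after the reference, e.g. capture.php?file=lab_e151.txt
    return '<a href="capture.php?file=' . rawurlencode(strtolower($matches[0]) . '.txt') . '">'.$matches[0].'</a>';
}, $text);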
echo $formattedText;

Warning: you want to display a file from a specific folder, so make sure that the user can't change the path with the provided string (file whitelist, filename sanitization), because otherwise it would be possible to do some serious damage.
I suggest not giving a clue that the link is directly connected with the included file name. Instead of /capture.php?file=lab_e151.txt you may have /capture.php?id=e151 and then something like this:
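$id = isset($_GET['id']) ? $_GET['id'] : ''; // in PHP 7+: $id = $_GET['id'] ?? '';
// anchor the pattern so that the whole value must be exactly 4 alphanumeric characters
if (!preg_match('/\A[0-9A-Za-z]{4}\z/', $id)) { die('Invalid link'); }
$file = 'lab_' . $id . '.txt';
//...
$convertToLink = function ($matches) {
    // $matches[1] is the 4-character id captured after "LAB_"
    return '<a href="capture.php?id=' . strtolower($matches[1]) . '">' . $matches[0] . '</a>';
};
$code = preg_replace_callback('/LAB_([0-9A-Za-z]{4})/', $convertToLink, $string);
echo '<pre>' . $code . '</pre>';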
If those 4 chars are hex number then you may use this pattern instead: /LAB_([0-9A-Fa-f]{4})/
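Combined with the htmlentities() call from the question, the order of operations matters: escape the raw text first, then insert the links, so that the generated tags are not escaped themselves. A rough sketch, assuming the id-based URL scheme suggested above; render_listing() is a made-up name and the sample listing is shortened from the question.

```php
<?php
// Hypothetical helper: escape an assembly listing for HTML output, then turn
// every LAB_xxxx reference into a link of the form capture.php?id=xxxx.
function render_listing(string $source): string
{
    // Escape first; the pattern below only contains characters that
    // htmlentities() leaves untouched, so references still match afterwards.
    $escaped = htmlentities($source, ENT_QUOTES, 'UTF-8');

    return preg_replace_callback(
        '/LAB_([0-9A-Za-z]{4})/',
        function (array $m): string {
            return '<a href="capture.php?id=' . strtolower($m[1]) . '">' . $m[0] . '</a>';
        },
        $escaped
    );
}

$listing = "LAB_E143:\nJSR LAB_E151 ; jump to this location\n";
$html = render_listing($listing);
echo '<pre>' . $html . '</pre>';
```

Because the captured id is restricted to letters and digits, it can be placed in the href without further encoding.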

Switch gettext translated language with original language

I started my PHP application with all text in German, then used gettext to extract all strings and translate them to English.
So, now I have a .po file with all msgids in German and msgstrs in English. I want to switch them, so that my source code contains the English as msgids for two main reasons:
More translators will know English, so it is only appropriate to serve them up a file with msgids in English. I could always switch the file before I give it out and after I receive it, but naaah.
It would help me to write English object & function names and comments if the content text was also English. I'd like to do that, so the project is more open to other Open Source collaborators (more likely to know English than German).
I could do this manually and this is the sort of task where I anticipate it will take me more time to write an automated routine for it (because I'm very bad with shell scripts) than do it by hand. But I also anticipate despising every minute of manual computer labour (feels like an oxymoron, right?) like I always do.
Has someone done this before? I figured this would be a common problem, but couldn't find anything. Many thanks ahead.
Sample Problem:
<title><?=_('Routinen')?></title>
#: /users/ruben/sites/v/routinen.php:43
msgid "Routinen"
msgstr "Routines"
I thought I'd narrow the problem down. The switch in the .po-file is no issue of course, it is as simple as
$str = preg_replace('/msgid "(.+)"\nmsgstr "(.+)"/', 'msgid "$2"' . "\n" . 'msgstr "$1"', $str);
The problem for me is the routine that searches my project folder files for _('$msgid') and substitutes _('msgstr') while parsing the .po-file (which is probably not even the most elegant way, after all the .po-file contains comments which contain all file paths where the msgid occurs).
After fooling around with akirk's answer a little, I ran into some more problems.
Because I have a mixture of _('xxx') and _("xxx") calls, I have to be careful about (un)escaping.
Double quotes " in msgids and msgstrs have to be unescaped, but the slashes can't be stripped, because it may be that the double quote was also escaped in PHP
Single quotes have to be escaped when they're replaced into PHP, but then they also have to be changed in the .po-file. Luckily for me, single quotes only appear in English text.
msgids and msgstrs can have multiple lines, then they look like this
msgid ""
"line 1\n"
"line 2\n"
msgstr ""
"line 1\n"
"line 2\n"
plural forms are of course skipped at the moment, but in my case that's not an issue
poedit wants to remove strings as obsolete that seem successfully switched and I have no idea why this happens in (many) cases.
I'll have to stop working on this for tonight. Still it seems using the parser instead of RegExps wouldn't be overkill.

I built on akirk's answer and wanted to preserve what I came up with as an answer here, in case somebody has the same problem.
This is not recursive, but that could easily change of course. Feel free to comment with improvements, I will be watching and editing this post.
$po = file_get_contents("locale/en_GB/LC_MESSAGES/messages.po");
$translations = array(); // german => english
$rawmsgids = array(); // find later
$msgidhits = array(); // record success
$msgstrs = array(); // find later
preg_match_all('/msgid "(.+)"\nmsgstr "(.+)"/', $po, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    $german = str_replace('\"','"',$match[1]); // unescape double quotes (could misfire if you escaped double quotes in PHP _("bla") but in my case that was one case versus many)
    $english = str_replace('\"','"',$match[2]);
    $en_sq_e = str_replace("'","\'",$english); // escape single quotes
    $translations['_(\''. $german . '\''] = '_(\'' . $en_sq_e . '\'';
    $rawmsgids['_(\''. $german . '\''] = $match[1]; // find raw msgid with searchstr as key
    $translations['_("'. $match[1] . '"'] = '_("' . $match[2] . '"';
    $rawmsgids['_("'. $match[1] . '"'] = $match[1];
    $translations['__(\''. $german . '\''] = '__(\'' . $en_sq_e . '\'';
    $rawmsgids['__(\''. $german . '\''] = $match[1];
    $translations['__("'. $match[1] . '"'] = '__("' . $match[2] . '"';
    $rawmsgids['__("'. $match[1] . '"'] = $match[1];
    $msgstrs[$match[1]] = $match[2]; // msgid => msgstr
}
foreach (glob("*.php") as $file) {
    $code = file_get_contents($file);
    $filehits = 0; // how many replacements per file
    foreach($translations AS $msgid => $msgstr) {
        $hits = 0;
        $code = str_replace($msgid,$msgstr,$code,$hits);
        $filehits += $hits;
        if($hits!=0) $msgidhits[$rawmsgids[$msgid]] = 1; // this serves to record if the msgid was found in at least one incarnation
        elseif(!isset($msgidhits[$rawmsgids[$msgid]])) $msgidhits[$rawmsgids[$msgid]] = 0;
    }
    // file_put_contents($file, $code); // be careful to test this first before doing the actual replace (and do use a version control system!)
    echo "$file : $filehits <br>";
    echo $code;
}
/* debug */
$found = array_keys($msgidhits, 1, true);
foreach($found AS $mid) echo $mid . " => " . $msgstrs[$mid] . "\n\n";
echo "Not Found: <br>";
$notfound = array_keys($msgidhits, 0, true);
foreach($notfound AS $mid) echo $mid . " => " . $msgstrs[$mid] . "\n\n";
/*
following steps are still needed:
* convert plurals (ngettext)
* convert multi-line msgids and msgstrs (format mentioned in question)
* resolve uniqueness conflict (msgids are unique, msgstrs are not), so you may have duplicate msgids (poedit finds these)
*/
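For the multi-line step listed as still open, one possible direction is to read the .po file line by line rather than with a single regex. The sketch below is not part of the original routine: parse_po_pairs() is a made-up helper, the German sample strings are invented, and comments, plural forms and msgctxt are simply ignored. The strings are kept in their escaped .po form (so \n stays a backslash followed by n).

```php
<?php
// Hypothetical helper: collect singular msgid => msgstr pairs from .po text,
// joining the quoted continuation lines of multi-line entries.
function parse_po_pairs(string $po): array
{
    $pairs = [];
    $msgid = null;
    $msgstr = null;
    $current = null; // field that continuation lines currently belong to

    // Store the finished entry (if complete) and reset the state.
    $flush = function () use (&$pairs, &$msgid, &$msgstr) {
        if ($msgid !== null && $msgstr !== null && $msgid !== '') {
            $pairs[$msgid] = $msgstr; // the header entry (empty msgid) is skipped
        }
        $msgid = $msgstr = null;
    };

    foreach (preg_split('/\R/', $po) as $line) {
        $line = trim($line);
        if (preg_match('/^msgid "(.*)"$/', $line, $m)) {
            $flush();
            $msgid = $m[1];
            $current = 'msgid';
        } elseif (preg_match('/^msgstr "(.*)"$/', $line, $m)) {
            $msgstr = $m[1];
            $current = 'msgstr';
        } elseif ($current !== null && preg_match('/^"(.*)"$/', $line, $m)) {
            if ($current === 'msgid') {
                $msgid .= $m[1];
            } else {
                $msgstr .= $m[1];
            }
        } else {
            $current = null; // blank line, comment, or unsupported keyword
        }
    }
    $flush();
    return $pairs;
}

$po = <<<'PO'
#: /users/ruben/sites/v/routinen.php:43
msgid "Routinen"
msgstr "Routines"

msgid ""
"Zeile 1\n"
"Zeile 2\n"
msgstr ""
"line 1\n"
"line 2\n"
PO;

$pairs = parse_po_pairs($po);
print_r(array_flip($pairs)); // swapped direction: English => German
```

array_flip() silently keeps only one entry when two msgids share the same msgstr, which is the uniqueness conflict mentioned in the list above, so duplicates would need to be detected separately.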

See http://code.activestate.com/recipes/475109-regular-expression-for-python-string-literals/ for a good Python-based regular expression for finding string literals, taking escapes into account. Although it's Python, this might work quite well for multi-line strings and other corner cases.
See http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/poswap.html for a ready, out-of-the-box base language swapper for .po files.
For instance, the following command line will convert a German-based Spanish translation to an English-based Spanish translation. You just have to ensure that your new base language (English) is 100% translated before starting the conversion:
poswap -i de-en.po -t de-es.po -o en-es.po
And finally, to swap the English .po file to a German .po file, use swappo:
http://manpages.ubuntu.com/manpages/hardy/man1/swappo.1.html
After swapping files, some manual polishing of the resulting files might be required. For instance, headers might be broken and some duplicate texts might occur.

So if I understand you correctly you'd like to replace all German gettext calls with English ones. To replace the contents in the directory, something like this could work.
$po = file_get_contents("translation.pot");
$translations = array(); // german => english
preg_match_all('/msgid "(.+)"\nmsgstr "(.+)"/', $po, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    $translations['_("'. $match[1] . '")'] = '_("' . $match[2] . '")';
    $translations['_(\''. $match[1] . '\')'] = '_(\'' . $match[2] . '\')';
}
foreach (glob("*.php") as $file) {
    $code = file_get_contents($file);
    $code = str_replace(array_keys($translations), array_values($translations), $code);
    //file_put_contents($file, $code);
    echo $code; // be careful to test this first before doing the actual replace (and do use a version control system!)
}
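One caveat with passing arrays to str_replace() is that the replacements are applied one after another, so the output of an earlier replacement can be replaced again by a later one. strtr() with an array works in a single pass and avoids that. The example below is a made-up illustration (the German/English word pair is chosen to trigger the effect), not taken from the project in question.

```php
<?php
// German "Geschenk" translates to "Gift", while German "Gift" translates to "Poison",
// so a sequential replace can translate the same call twice.
$translations = [
    '_("Geschenk")' => '_("Gift")',
    '_("Gift")'     => '_("Poison")',
];
$code = 'echo _("Geschenk"), _("Gift");';

// str_replace() applies each pair in order to the already-modified string.
$sequential = str_replace(array_keys($translations), array_values($translations), $code);

// strtr() scans the original string once and never re-replaces its own output.
$singlePass = strtr($code, $translations);

echo $sequential, "\n"; // echo _("Poison"), _("Poison");
echo $singlePass, "\n"; // echo _("Gift"), _("Poison");
```

Whether this matters depends on the vocabulary: it only bites when a German msgid happens to equal the English translation of another string.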
