Reading a file and extracting data using a regex in PHP

I am trying to echo out the names/paths of the files that are written in logfile.txt. For that, I use a regex to match everything before the first occurrence of : and output it. I am reading logfile.txt line by line:
<?php
$logfile = fopen("logfile.txt", "r");
if ($logfile) {
    while (($line = fgets($logfile)) !== false) {
        if (preg_match_all("/[^:]*/", $line, $matched)) {
            foreach ($matched as $val) {
                foreach ($val as $read) {
                    echo '<pre>'. $read . '</pre>';
                }
            }
        }
    }
    fclose($logfile);
} else {
    die("Unable to open file.");
}
?>
However, I get the entire contents of the file instead. The desired output would be:
/home/user/public_html/an-ordinary-shell.php
/home/user/public_html/content/execution-after-redirect.html
/home/user/public_html/paypal-gateway.html
Here is the content of logfile.txt:
-------------------------------------------------------------------------------
/home/user/public_html/an-ordinary-shell.php: Php.Trojan.PCT4-1 FOUND
/home/user/public_html/content/execution-after-redirect.html: {LDB}VT-malware33.UNOFFICIAL FOUND
/home/user/public_html/paypal-gateway.html: Html.Exploit.CVE.2015_6073
Extra question: How do I skip reading the first two lines (namely the dashes and the empty line)?

Here you go:
<?php
# in real use, load the file as a single string:
# $data = file_get_contents("logfile.txt");
# sample data for this specific purpose
$data = <<<DATA
-------------------------------------------------------------------------------
/home/user/public_html/an-ordinary-shell.php: Php.Trojan.PCT4-1 FOUND
/home/user/public_html/content/execution-after-redirect.html: {LDB}VT-malware33.UNOFFICIAL FOUND
/home/user/public_html/paypal-gateway.html: Html.Exploit.CVE.2015_6073
DATA;
$regex = '~^(/[^:]+):~m';
# ^        - anchor the match to the beginning of a line
# (/[^:]+) - capture a slash followed by one or more characters that are NOT a colon
# :        - the colon that ends the path
# m        - multiline mode, so ^ matches at the start of every line
preg_match_all($regex, $data, $files);
# $files[0] holds the full matches (with the colon), $files[1] the captured paths
print_r($files);
?>
It even skips both your lines, see a demo on ideone.com.

preg_match_all returns all occurrences of the pattern. Since [^:]* matches any run of characters that don't contain :, including an empty one, the first path line yields four matches:
/home/user/public_html/an-ordinary-shell.php
an empty string
 Php.Trojan.PCT4-1 FOUND
and another empty string.
To obtain a single result, use preg_match; for this task, though, explode should suffice.
To skip lines you don't want, you can, for example, build a generator function that yields only the good lines. You can also use a stream filter.
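As a rough sketch of the explode and generator ideas above: the helper name pathLines and the inline sample data are assumptions made for illustration, and in real use the lines would come from file('logfile.txt').

```php
<?php
// Hypothetical helper: yields only the lines that look like "/path: verdict".
function pathLines(iterable $lines): Generator
{
    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        // Skip the dashed separator, empty lines, and lines without a colon.
        if ($line === '' || $line[0] !== '/' || strpos($line, ':') === false) {
            continue;
        }
        yield $line;
    }
}

// Sample data standing in for file('logfile.txt').
$lines = [
    "-----\n",
    "\n",
    "/home/user/public_html/an-ordinary-shell.php: Php.Trojan.PCT4-1 FOUND\n",
    "/home/user/public_html/paypal-gateway.html: Html.Exploit.CVE.2015_6073\n",
];

$paths = [];
foreach (pathLines($lines) as $line) {
    // A limit of 2 splits on the first colon only.
    $paths[] = explode(':', $line, 2)[0];
}

print_r($paths);
```

Because the generator filters lazily, the same loop works unchanged on a large log read line by line.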

Related

How to remove lines containing a specific character (#) from file and echo out the data using PHP

I have a text file with data that looks like this:
#dacdcadcasvsa
#svsdvsd
#
#sfcnakjncfkajnc
I want to keep this line
and this one
How can I remove all lines containing # and echo out the lines that don't so it looks like:
I want to keep this line
and this one
All I know is that I have to get_file_contents($filename). Would I have to put it in an array?
Any tips and guidance would be appreciated.

Using file() and foreach()
$lines = file("a.txt");
foreach ( $lines as $line ) {
    if ( $line[0] != '#' ){
        echo $line;
    }
}
Just update the name of the file.

You can replace all the comment lines with empty strings before you output.
<div style="white-space: pre-line;">
<?= preg_replace('/^#.*\n/m', '', file_get_contents($filename)) ?>
</div>

You're thinking along the right lines; although the PHP method (function) you need is actually file_get_contents(), not get_file_contents() (as per your question).
Let's break it down:
We need a way of separating out our data into sortable chunks. As you stated, the best way to do this is using an array.
We could do this using the hash symbol (#) as a delimiter, but this would mean the last chunk of text is a mixture of text we want to remove and text we want to keep. Instead, we'll be using line breaks as our delimiter.
Once the data has been separated, we can work on removing those lines that begin with a hash symbol.
Our code will look something like this:
<?php
// Get the file contents
$fileContents = file_get_contents('my_file.txt'); // This could be any file extension
// Split the file by new lines
$contentsArr = preg_split('/\r\n|\r|\n/', $fileContents);
// Function for removing items from an array - https://stackoverflow.com/questions/9993168/remove-item-from-array-if-item-value-contains-searched-string-character?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
function myFilter($string) {
    // Keep only the lines that do not contain a hash symbol
    return strpos($string, '#') === false;
}
// Remove comment items from the array of lines (not from the raw string)
$newFileContents = array_filter($contentsArr, 'myFilter');
// Concatenate and echo out the result, one line per row
echo implode("\n", $newFileContents);

An alternate because I was bored:
foreach(preg_grep('/^#/', file($filename), PREG_GREP_INVERT) as $line) {
echo $line;
}
Read file lines into an array
Get all lines NOT starting with ^ the # character
Loop those lines

php - regex exact matches

I have the following strings:
Falchion-Case
P90-ASH-WOOD-WELL-WORN
I also have the following URLS which are inside a text file:
http://csgo.steamanalyst.com/id/115714004/FALCHION-CASE-KEY-
http://csgo.steamanalyst.com/id/115716486/FALCHION-CASE-
http://csgo.steamanalyst.com/id/2018/P90-ASH-WOOD-WELL-WORN
I'm looping through each line in the text file and checking if the string is found inside the URL:
// Read from file
if (stristr($item, "stattrak") === FALSE) {
$lines = file(public_path().'/csgoanalyst.txt');
foreach ($lines as $line) {
// Check if the line contains the string we're looking for, and print if it does
if(preg_match('/(?<!\-)('.$item.')\b/',$line) != false) { // case insensitive
echo $line;
break;
}
}
}
This works perfectly when $item = P90-ASH-WOOD-WELL-WORN. However, when $item = Falchion-Case, it matches both URLs, when only the second one (http://csgo.steamanalyst.com/id/115716486/FALCHION-CASE-) is valid.

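Try modifying your regex to match at the end of the line, assuming the item is the last thing on the line:
'/('.$item.')$/'
Here $ anchors the match to this position:
http://csgo.steamanalyst.com/id/115714004/FALCHION-CASE-KEY- <<end of line
Basically, this does an "ends with" type of match, so an item followed by -KEY- is rejected. Because your URLs end with a hyphen, you can also do this:
'/('.$item.')\-?$/'
to optionally match an ending hyphen. Add the i flag if the match should be case-insensitive.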

You can also use a negative lookahead to negate that unwanted case:
preg_match('/(?<!-)'. preg_quote($item) .'(?!-\w)/i', $line);
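A minimal sketch of how that lookahead pattern behaves on the URLs from the question. The helper name findItemUrl is an assumption for illustration, and the URL list is inlined rather than read from csgoanalyst.txt.

```php
<?php
// Hypothetical helper: returns the first URL that contains $item
// without a leading hyphen and without a further "-word" after it.
function findItemUrl(string $item, array $urls): ?string
{
    // preg_quote() escapes regex metacharacters in $item; i = case-insensitive.
    $pattern = '/(?<!-)' . preg_quote($item, '/') . '(?!-\w)/i';
    foreach ($urls as $url) {
        if (preg_match($pattern, $url) === 1) {
            return $url;
        }
    }
    return null;
}

$urls = [
    'http://csgo.steamanalyst.com/id/115714004/FALCHION-CASE-KEY-',
    'http://csgo.steamanalyst.com/id/115716486/FALCHION-CASE-',
    'http://csgo.steamanalyst.com/id/2018/P90-ASH-WOOD-WELL-WORN',
];

echo findItemUrl('Falchion-Case', $urls), "\n";
echo findItemUrl('P90-ASH-WOOD-WELL-WORN', $urls), "\n";
```

The trailing hyphen in FALCHION-CASE- is accepted because the lookahead only rejects a hyphen that is followed by a word character.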

Regex Match Exact Number at beginning (like 99 but not 999)

This should be a simple task, but searching for it all day I still can't figure out what I'm missing
I'm trying to open a file using PHP's glob() that begins with a specific number
Example filenames in a directory:
1.txt
123.txt
10 some text.txt
100 Some Other Text.txt
The filenames always begin with a unique number (which is what i need to use to find the right file) and are optionally followed by a space and some text and finally the .txt extension
My problem is that no matter what I do, if i try to match the number 1 in the example folder above it will match every file that begins with 1, but I need to open only the file that starts with exactly 1, no matter what follows it, whether it be a space and text or just .txt
Some example regex that does not succeed at the task:
filepath/($number)*.txt
filepath/([($number)])( |*.?)*.txt
filepath/($number)( |*.?)*.txt
I'm sure there's a very simple solution to this... If possible I'd like to avoid loading every single file into a PHP array and using PHP to check every item for the one that begins with only the exact number, when surely regex can do it in a single action
A bonus would be if you also know how to turn the optional text between the number and the extension into a variable, but that is entirely optional as it's my next task after I figure this one out

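The regex you want to use is: ^99(\D*\.txt)$
$re = '/^99(\D*\.txt)$/';
preg_match($re, $str, $matches);
This will match:
99.txt
99files.txt
but not:
199.txt
999.txt
99
99.txt.xml
99filesoftxt.dat
The ( ) around \D*\.txt create a capturing group that will contain the rest of your file name (everything after the number). Note that \D* is used rather than \D+, because \D+\.txt would require at least one character between the number and .txt and so would reject 99.txt.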

I believe this is what you want OP:
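$regex = '/^' . $number . '[^0-9][\S\s]+/';
This anchors the match at the start of the name, then matches the number, then any character that isn't a digit, then any other characters. Without the ^ anchor, 11.txt would match on its second 1. If the number is 1, this would match: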
1.txt
1abc.txt
1 abc.txt
1_abc.txt
1qrx.txt
But it would not match:
1
12.txt
2.txt
11.txt
1.

Here you go:
<?php
function findFileWithNumericPrefix($filepath, $prefix)
{
    if (($dir = opendir($filepath)) === false) {
        return false;
    }
    while (($filename = readdir($dir)) !== false) {
        if (preg_match("/^$prefix\D/", $filename) === 1) {
            closedir($dir);
            return $filename;
        }
    }
    closedir($dir);
    return false;
}
$file = findFileWithNumericPrefix('/base/file/path', 1);
if ($file !== false) {
echo "Found file: $file";
}
?>
With your example directory listing, the result is:
Found file: 1.txt
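For the bonus part of the question (getting the optional text between the number and the extension), one possible sketch uses a capture group. The helper name matchNumberedFile and the 'file'/'label' keys are assumptions for illustration; the file names are passed in as an array here, but would normally come from scandir() or glob().

```php
<?php
// Hypothetical helper: finds the file whose name starts with exactly $number
// and returns the name plus the optional text before ".txt".
function matchNumberedFile(array $filenames, int $number): ?array
{
    // The number, then optionally a non-digit plus anything, then ".txt".
    $pattern = '/^' . $number . '(\D.*)?\.txt$/';
    foreach ($filenames as $name) {
        if (preg_match($pattern, $name, $m) === 1) {
            // $m[1] is unset when nothing sits between number and extension.
            return ['file' => $name, 'label' => trim($m[1] ?? '')];
        }
    }
    return null;
}

$names = ['1.txt', '123.txt', '10 some text.txt', '100 Some Other Text.txt'];

print_r(matchNumberedFile($names, 10));
print_r(matchNumberedFile($names, 1));
```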

You can use a regex like this:
^10\D.*txt$
^--- use the number you want
For instance:
$re = "/^10\\D.*txt$/m";
$str = "1.txt\n123.txt\n10 some text2.txt\n100 Some Other2 Text.txt";
preg_match_all($re, $str, $matches);
// will match only "10 some text2.txt"

Test pipe delimited format with regular expression in php

I need to verify the string is in the following format, otherwise I need to return a warning message.
purple|grape juice
green|lettuce is good in salad
yellow|monkeys like bananas
red|water melon is delicious
I need it in the pipe delimited data like above, each line is split into two chunks then a new line.
If there are 3 pipes in one line, then it is not correct.

This RegEx will validate the whole string to your specs: ^(\w+\|[\w ]+(\n|$))+$
If one of the lines is invalid, preg_match() will return 0 (no match), which evaluates as false in a condition.
Note: I assumed left part will have numbers and letters only and the second the same plus space
See it live here: http://regex101.com/r/iV0tC4
So all the code you need is:
if (!preg_match($regex,$string)) trigger_error("Invalid string!",E_USER_WARNING);
Live code: http://codepad.viper-7.com/i3Hcjs
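A small sketch of that whole-string check in use. The function name isPipeDelimited and the sample strings are assumptions for illustration; the pattern is the one given above, wrapped in / delimiters, and it expects \n line endings.

```php
<?php
// Hypothetical wrapper around the pattern above, with delimiters added.
function isPipeDelimited(string $text): bool
{
    $regex = '/^(\w+\|[\w ]+(\n|$))+$/';
    // preg_match() returns 1 on a match and 0 otherwise.
    return preg_match($regex, $text) === 1;
}

$good = "purple|grape juice\ngreen|lettuce is good in salad";
$bad  = "red|water melon|is delicious";

var_dump(isPipeDelimited($good)); // bool(true)
var_dump(isPipeDelimited($bad));  // bool(false)
```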

I'm not sure what you are trying to get, but if I guessed right, you need something like this:
<?php
$lines = explode("\n", $str);
foreach ($lines as $lineIndex => $oneLine) {
    // more than two chunks means the line has more than one pipe
    if (count(explode('|', $oneLine)) > 2) {
        echo "You have an error in line " . $lineIndex;
    }
}
?>

A RegExp match is all you need. Split the string by new lines, then run the RegExp match on each line, if the match returns false then end the function early (short circuit). The RegExp is essentially START_OF_LINE (^) and anything not a pipe ([^\|]) and a pipe (\|) and anything not a pipe ([^\|]) and END_OF_LINE ($).
function verify($str) {
$regex = '/^[^\|]*\|[^\|]*$/';
$all = explode("\n", $str);
foreach($all as $line) {
if(preg_match($regex, $line) == false)
return false;
}
return true;
}
echo verify("bad") === false;
echo verify("bad|bad|bad") === false;
echo verify("abc|123\nbad") === false;
echo verify("abc|123\nbad|bad|bad") === false;
echo verify("good|good") === true;
echo verify("good|good\nnice|nice") === true;

PHP RegExp Replace

I am currently trying to add tokens to a CMS using PHP.
The user can enter (into a WYSIWYG Editor) a string such as [my_include.php]. We would like to extract anything with this format, and turn it into an include of the following format:
include('my_include.php');
Can anyone assist with composing the RegExp and extraction process to allow this? Ideally, I would like to extract them all into a single array, so that we can provide some checking before parsing it as the include();?
Thanks!

preg_replace('~\[([^\]]+)\]~', 'include "\\1";', $str);
Working sample: http://ideone.com/zkwX7

You'll either want to go with preg_match_all(), run the results in a loop and replace whatever you found, or use preg_replace_callback(). The former might be a bit faster than the following callback solution, but is a bit more tricky if PREG_OFFSET_CAPTURE and substr_replace() are used.
<?php
function handle_replace_thingie($matches) {
// build a file path
$file = '/path/to/' . trim($matches[1]);
// do some sanity checks, like file_exists, file-location (not that someone includes /etc/passwd or something)
// check realpath(), file_exists()
// limit the readable files to certain directories
if (false) {
return $matches[0]; // return original, no replacement
}
// assuming the include file outputs its stuff we need to capture it with an output buffer
ob_start();
// execute the include
include $file;
// grab the buffer's contents
$res = ob_get_contents();
ob_end_clean();
// return the contents to replace the original [foo.php]
return $res;
}
$string = "hello world, [my_include.php] and [foo-bar.php] should be replaced";
$string = preg_replace_callback('#\[([^\[]+)\]#', 'handle_replace_thingie', $string);
echo $string, "\n";
?>

Using preg_match_all(), you could do this:
$matches = array();
// If we've found any matches, do stuff with them
if (preg_match_all('/\[([^\]]+\.php)\]/i', $input, $matches))
{
    // $matches[1] holds the captured file names without the brackets
    foreach ($matches[1] as $match)
    {
        // Any validation code goes here
        include_once("/path/to/" . $match);
    }
}
The regex used here is \[([^\]]+\.php)\]. It matches only bracketed names ending in .php, so if the user types [hello], for example, it won't match. Using [^\]]+ instead of .+ keeps two tokens on the same line from being merged into one match.
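To collect all tokens into a single array and check them before anything is included, a sketch along these lines may help. The helper name extractIncludeTokens and the $allowed whitelist are assumptions for illustration; a whitelist is only one possible validation strategy.

```php
<?php
// Hypothetical helper: collects the names inside [brackets] that end in .php.
function extractIncludeTokens(string $input): array
{
    preg_match_all('~\[([^\]]+\.php)\]~i', $input, $matches);
    // $matches[1] is the list of captured names, without the brackets.
    return $matches[1];
}

// Assumed whitelist of files the CMS is willing to include.
$allowed = ['my_include.php', 'foo-bar.php'];

$input  = 'hello [my_include.php], [../etc/passwd.php] and [hello]';
$tokens = extractIncludeTokens($input);

// Keep only tokens that appear verbatim in the whitelist.
$safe = array_values(array_filter($tokens, function ($token) use ($allowed) {
    return in_array($token, $allowed, true);
}));

print_r($tokens);
print_r($safe);
```

Only the names left in $safe would then be passed on to include.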
