Getting an exact preg_replace match from a .txt file [duplicate] - php

So I have got this far on my own, but it looks like I have found the limit of my PHP knowledge (which isn't very much at all!). This script is for filtering filenames (game ROMs/ISOs etc). It has other ways of filtering too, but I've just highlighted the section I'm trying to add. I want an external .txt file I can put names of files in like so (separated by a single line break):
Pacman 2 (USA)
Space Invaders (USA)
Asteroids (USA)
Something Else (Europe)
And then running the script will search the directory and place any matching filenames in the "removed" folder. It loops fine with all the other filtering techniques it uses. I'm just trying to add my own (unsuccessfully!)
$gameList = trim(shell_exec("ls -1"));
$gameArray = explode("\n", $gameList);
$file = file_get_contents('manualremove.txt');
$manualRemovePattern = '/(' . str_replace(PHP_EOL, "|", $file) . ')/';
shell_exec('mkdir -p Removed');
foreach($gameArray as $thisGame) {
if(!$thisGame) continue;
// Probably already been removed
if(!file_exists($thisGame)) continue;
if(preg_match ($manualRemovePattern , $thisGame)) {
echo "{$thisGame} is on the manual remove list. Moving to Removed folder.\n";
shell_exec("mv \"{$thisGame}\" Removed/");
continue;
}
}
So this works when I put names of games with no spaces or brackets in the .txt file. But spaces or brackets (or both) break its functionality. Could someone help me out?
Many thanks!

Replace the fourth line in the code you supplied with
$manualRemovePattern = "/(?:" . implode("|", array_map(function($i) {
return preg_quote(trim($i), "/");
}, explode(PHP_EOL, $file))) . ')/';
The main idea is:
Split the file contents you obtained into lines with explode(PHP_EOL, $file)
Then you need to iterate over the array and modify each item in the array (which can be done with array_map)
Modifying the array items involves adding escaping \ before any special regex metacharacter and a regex delimiter chosen by you (in this case, /), and this is done with preg_quote(trim($i), "/")
Note I remove any leading/trailing spaces with trim from the array items - just in case.
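To match them as whole words, use word boundaries. Note that \b only matches between a word character (letter, digit or underscore) and a non-word character, so this is only reliable for entries that start and end with a word character; an entry ending in a bracket, such as Pacman 2 (USA), will not match when it is followed by a dot or a space: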
$manualRemovePattern = '/\b(?:' . implode('|', array_map(function($i) {
return preg_quote(trim($i), '/');
}, explode(PHP_EOL, $file))) . ')\b/';
To match them as whole strings, use ^/$ anchors:
$manualRemovePattern = '/^(?:' . implode('|', array_map(function($i) {
return preg_quote(trim($i), '/');
}, explode(PHP_EOL, $file))) . ')$/';
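One thing to watch with explode(PHP_EOL, $file): if the .txt file ends with a line break, the last array item is an empty string, and an empty alternative in the pattern matches every filename. Below is a minimal sketch that skips blank entries. The helper name buildRemovePattern and the optional-extension suffix are assumptions for illustration, not part of the original script.

```php
<?php
// Hypothetical helper: builds one alternation pattern from a list of names.
function buildRemovePattern(array $names): string
{
    $quoted = [];
    foreach ($names as $name) {
        $name = trim($name);
        if ($name === '') {
            continue; // a blank entry would become an empty alternative that matches everything
        }
        $quoted[] = preg_quote($name, '/');
    }
    // Assumption: each entry is the file name without its extension.
    return '/^(?:' . implode('|', $quoted) . ')(?:\.[^.]+)?$/';
}

// Stand-in for the lines of manualremove.txt, including a trailing blank line.
$names = ["Pacman 2 (USA)", "Space Invaders (USA)", ""];
$pattern = buildRemovePattern($names);

var_dump(preg_match($pattern, 'Pacman 2 (USA).zip'));      // int(1)
var_dump(preg_match($pattern, 'Pacman 2 (USA) [b1].zip')); // int(0)
var_dump(preg_match($pattern, 'Tetris (USA).zip'));        // int(0)
```

In the real script the names could come from file('manualremove.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES), which already drops the line breaks and empty lines.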

Related

PHP to search a word within txt file and echo the whole line [duplicate]

This question already has an answer here:
How to find a whole word in a string in PHP without accidental matches?
(1 answer)
Closed 2 years ago.
This might look like a duplicate, but it's a different issue. I'll almost copy/paste another question, but I'm asking about a different problem. Since that thread's owner asked it very clearly, I will describe it the way he did.
I have a normal text files with each line having data in the following format.
Username | Age | Street
Now what I want to do is search for the Username in the file and, when it is found, print the whole line. The question below does this perfectly, with one main problem:
PHP to search within txt file and echo the whole line
Issue: If you have the name "Tobias" and search for "Tobi", it will find it and display "Tobias", but I only want to match the whole word that you're using as the search string. If I search for "Tobi", it should only find "Tobi" and not "Tobias" or any other string containing "Tobi".
It works using this solution: https://stackoverflow.com/a/4366744/14071499
But that solution has its own issue: it only prints the string I am searching for and doesn't print the whole line.
So how can I search for a word and print the whole line afterwards, without also finding other strings that merely contain the word?
The Code I have so far:
<?php
$file = 'ids.txt';
$searchfor = $_POST['search'];
// the following line prevents the browser from parsing this as HTML.
header('Content-Type: text/plain');
// get the file contents, assuming the file to be readable (and exist)
$contents = file_get_contents($file);
// escape special characters in the query
$pattern = preg_quote($searchfor, '/');
// finalise the regular expression, matching the whole line
$pattern = "/\b{$pattern}.*\$/m";
// search, and store all matching occurrences in $matches
if(preg_match_all($pattern, $contents, $matches)){
echo "Found matches:\n";
echo implode("\n", $matches[0]);
}
else{
echo "No matches found";
}
?>
This answer doesn't take the fields in your source data into account, since at the moment you're just bulk-matching the raw text and are interested in getting full lines. There is a much simpler way to accomplish this: use file(), which loads each line into an array element, and preg_grep(), which filters an array with a regular expression. Implemented as follows:
$lines = file('ids.txt', FILE_IGNORE_NEW_LINES|FILE_SKIP_EMPTY_LINES); // lines as array
$search = preg_quote($_POST['search'], '~');
$matches = preg_grep('~\b' . $search . '\b~', $lines);
foreach($matches as $line => $match) {
echo "Line {$line}: {$match}\n";
}
Note that to match only complete words rather than substrings, you need word boundaries \b on both sides of the pattern. The loop above outputs both the matching line and its line number (0-indexed), since preg_grep preserves the array keys.
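If the search should apply to the Username field only, rather than to any word on the line, the fields can be split first. This is a rough sketch assuming the "Username | Age | Street" layout from the question; the helper name findLinesByUsername and the sample rows are made up for illustration.

```php
<?php
// Hypothetical helper: returns the lines whose first field equals $username exactly.
function findLinesByUsername(array $lines, string $username): array
{
    $found = [];
    foreach ($lines as $line) {
        // Split "Username | Age | Street" into fields and strip the padding spaces.
        $fields = array_map('trim', explode('|', $line));
        if ($fields[0] === $username) {
            $found[] = $line;
        }
    }
    return $found;
}

// Sample data standing in for ids.txt (made up for illustration).
$lines = [
    'Tobias | 31 | Main Street',
    'Tobi | 27 | Oak Avenue',
];

print_r(findLinesByUsername($lines, 'Tobi'));
```

Because the comparison is an exact string match on one field, no regular expression or escaping is needed, and a street name that happens to contain the search word is not reported.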
<?php
$file = "ids.txt";
$search = $_POST["search"];
header("Content-Type: text/plain");
$contents = file_get_contents($file);
$lines = explode("\n", $contents);
foreach ($lines as $line) {
// preg_quote() escapes regex metacharacters in the user-supplied search term
if (preg_match("/\b" . preg_quote($search, "/") . "\b/", $line)) {
echo $line . "\n";
}
}

GlobIterator RegEx

I have a string with a wildcard at the end, but I don't know how many characters that string will be. How can I use GlobIterator and RegexIterator to match similar file names? The second match returns all the files from a directory, but I don't want that. I need a proper regular expression. I don't want to match the last set before the extension (ex. the files sized 250M, 500M, etc.)
$iterator = new GlobIterator($this->srcDir . $identifier . ".*");
MATCH ON
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.*
This returns the correct files.
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.250m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.500m.jpg
MATCH ON
/var/www/import/2014047-0216/YukonGold.A2014047.1620.*
Returns the files:
/var/www/import/2014047-0216/YukonGold.A2014047.1620.250m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.500m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.250m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.500m.jpg
EXPECTED OUTPUT
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.*
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.250m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.500m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.*
/var/www/import/2014047-0216/YukonGold.A2014047.1620.250m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.500m.jpg
You should use it inside a RegexIterator:
// Notice that there is no expansion pattern used here
$path = '/var/www/import/2014047-0216/YukonGold.A2014047.1620.';
$re = '~\Q' . $path . '\E(?:[^.]+\.)?\w+$~';
$regexIterator = new RegexIterator(new GlobIterator("{$path}*"), $re);
foreach ($regexIterator as $filename) {
echo $filename . "\n";
}
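To see what that pattern keeps and rejects without touching the filesystem, the same regular expression can be run over a plain list of paths with preg_grep. This is only a sketch; the candidate list simply repeats the file names from the question.

```php
<?php
$path = '/var/www/import/2014047-0216/YukonGold.A2014047.1620.';
// \Q...\E makes the path literal; after it, at most one more dot-separated part
// is allowed before the extension.
$re = '~\Q' . $path . '\E(?:[^.]+\.)?\w+$~';

$candidates = [
    $path . '250m.jpg',
    $path . '500m.jpg',
    $path . '721.250m.jpg',
    $path . '721.500m.jpg',
];

$kept = preg_grep($re, $candidates);
print_r(array_values($kept));
```

Only the two names ending in 1620.250m.jpg and 1620.500m.jpg survive, because the 721 variants have one dot-separated part too many for the pattern.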

explode csv file on delimiter (;) and delimiter(,)?

When I explode a CSV file on the delimiter (;),
the explode works for files from some Excel programs and fails for others.
Likewise, when I explode a CSV file on the delimiter (,),
the explode works for files from some Excel programs and fails for others.
How can I make the explode work for all versions of Excel?
How can I determine the right delimiter to explode on?
Here is the code:
if (!function_exists('create_csv')) {
function create_csv($query, &$filename = false, $old_csv = false) {
if(!$filename) $filename = "data_export_".date("Y-m-d").".csv";
$ci = &get_instance();
$ci->load->helper('download');
$ci->load->dbutil();
$delimiter = ";";
$newline = "\r\n";
$csv = "Data:".date("Y-m-d").$newline;
if($old_csv)
$csv .= $old_csv;
else
$csv .= $ci->dbutil->csv_from_result($query, $delimiter, $newline);
$columns = explode($newline, $csv);
$titles = explode($delimiter, $columns[1]);
$new_titles = array();
foreach ($titles as $item) {
array_push($new_titles, lang(trim($item,'"')));
}
$columns[1] = implode($delimiter, $new_titles);
$csv = implode($newline, $columns);
return $csv;
}
}
Sometimes I set $delimiter = ";";
and sometimes $delimiter = ",";
Thanks.
You can use a helper function to detect the best delimiter, like:
public function find_delimiter($csv)
{
$delimiters = array(',', '.', ';');
$bestDelimiter = false;
$count = 0;
foreach ($delimiters as $delimiter)
if (substr_count($csv, $delimiter) > $count) {
$count = substr_count($csv, $delimiter);
$bestDelimiter = $delimiter;
}
return $bestDelimiter;
}
If you have an idea of the expected data (number of columns) then this might work as a good guess, and could be a good alternative to comparing which occurs the most (depending on what kind of data you're expecting).
It would work even better if you have a header record, I'd imagine. (You could put in a check for specific header values)
Sorry for not fitting it into your code, but I am not really sure what those calls you are making do, but you should be able to fit it around.
$expected_num_of_columns = 10;
$delimiter = "";
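foreach (array(",", ";") as $test_delimiter) {
$fid = fopen($filename, "r");
$csv_row = fgetcsv($fid, 0, $test_delimiter);
// close the handle before a possible break so it is never leaked
fclose($fid);
// fgetcsv() returns false on failure, so check for an array before counting
if (is_array($csv_row) && count($csv_row) == $expected_num_of_columns) {
$delimiter = $test_delimiter;
break;
}
}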
if (empty($delimiter)) {
die ("Input file did not contain the correct number of fields (" . $expected_num_of_columns . ")");
}
Don't use this if, for example, all or most of the fields contain non-integer numbers (e.g. a list of monetary amounts) and the file has no header record, because files separated by ; are most likely to use , as the decimal point, so there could be the same number of commas and semicolons.
The short answer is, you probably can't unless you can apply some heuristic to determine the file format. If you don't know and can't detect the format of the file you're parsing, then parsing it is going to be difficult.
However, once you have determined the delimiter (or required a particular one), you will probably find that PHP's built-in fgetcsv is easier and more accurate than a manual explode-based strategy.
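As a rough illustration of why a CSV-aware parser is more accurate than explode, here is a sketch using str_getcsv, the string counterpart of fgetcsv. The sample content is made up, and the delimiter is assumed to be already known.

```php
<?php
// Sample semicolon-separated content (made up); note the quoted field containing a semicolon.
$csv = "name;city\r\n\"Smith; John\";Oslo\r\n";
$delimiter = ';'; // assumed to be already known or required

$rows = [];
foreach (preg_split('/\r\n|\n|\r/', trim($csv)) as $line) {
    // str_getcsv() honours quoting; explode() would split inside "Smith; John".
    $rows[] = str_getcsv($line, $delimiter, '"', '\\');
}

print_r($rows);
```

Splitting on line breaks first is itself a simplification: a quoted field may legally contain a line break, which fgetcsv on a file handle deals with and this sketch does not.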
There is no way to be 100% sure you are targeting the real delimiter. All you can do is guess.
You should start by finding the right delimiter, then explode the CSV on this delimiter.
To find the delimiter, basically, you want a function that counts the number of , and the number of ; and that returns the greater.
Something like :
$array = explode(find_delimiter($csv), $csv);
Hope it helps ;)
Edit : Your find_delimiter function could be something like :
function find_delimiter($csv)
{
$arrDelimiters = array(',', '.', ';');
$arrResults = array();
foreach ($arrDelimiters as $delimiter)
{
$arrResults[$delimiter] = count(explode($delimiter, $csv));
}
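// arsort() sorts in place, highest count first, and keeps the delimiter keys
// (rsort() returns a bool and would discard the keys)
arsort($arrResults);
return array_keys($arrResults)[0];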
}
Well, it looks like you know for certain that your delimiter will be "," or ";". This is a good place to start. Thus, you may try to replace all commas (,) with semicolons (;), and then explode on the semicolon only. However, this approach will definitely cause problems in some cases, because some lines of your CSV files could look like this:
"name,value",other name,other value,last name;last value
Here the delimiter is the comma if the CSV file has four columns. However, by changing commas to semicolons you would get five columns, which would be incorrect. So replacing one delimiter with another is not a good approach.
But still, if your CSV file is correctly formatted, then you can find the correct delimiter in any of its lines. So, you may try to create a function like find_delimiter($csvLine) as proposed by @johnkork, but the problem is that the function itself can't know which delimiter to search for. However, you know all the possible delimiters, so you may try to create another, quite similar, function like delimiter_exists($csvLine, $delimiter) which returns true or false.
But even the function delimiter_exists($csvLine, $delimiter) is not enough. Why? Because for the CSV line shown above, both "," and ";" would be reported as existing delimiters. With the comma it would be a CSV file with four columns, and with the semicolon it would be two columns.
Thus, there is no universal way to get exactly what you want. However, there is one more thing you can check: the first line of the CSV file, which is the header, assuming your CSV files have one. Headers usually (though not necessarily) contain nothing but the alphanumeric column names, separated by the delimiter. So, you may try to create a function like delimiter_exists($csvHeader, $delimiter), whose implementation could look like this:
function delimiter_exists($csvHeader, $delimiter) {
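// preg_quote() keeps delimiters such as "." or "|" from being read as regex syntax
return (bool)preg_match('/' . preg_quote($delimiter, '/') . '/', $csvHeader);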
}
For your specific case you may use it like this:
$csvHeader = "abc;def";
$delimiter = delimiter_exists($csvHeader, ',') ? ',' : ';';
Hope this helps!
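The header idea can be taken one step further by counting how many fields each candidate delimiter produces on the header line, using a parser that respects quoting. This is only a sketch; the helper name guessDelimiterFromHeader and the sample headers are assumptions for illustration.

```php
<?php
// Hypothetical helper: picks the candidate that splits the header into the most fields.
function guessDelimiterFromHeader(string $headerLine, array $candidates = [',', ';']): ?string
{
    $best = null;
    $bestCount = 1; // a delimiter must produce at least two fields to count
    foreach ($candidates as $candidate) {
        $count = count(str_getcsv($headerLine, $candidate, '"', '\\'));
        if ($count > $bestCount) {
            $bestCount = $count;
            $best = $candidate;
        }
    }
    return $best; // null when no candidate splits the header
}

var_dump(guessDelimiterFromHeader('id;name;price'));     // string(1) ";"
var_dump(guessDelimiterFromHeader('"last, first",age')); // string(1) ","
var_dump(guessDelimiterFromHeader('singlecolumn'));      // NULL
```

It remains a guess: a header whose candidates produce the same number of fields is resolved in favour of the first candidate listed.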

Regex Match Exact Number at beginning (like 99 but not 999)

This should be a simple task, but after searching for it all day I still can't figure out what I'm missing.
I'm trying to open a file using PHP's glob() that begins with a specific number.
Example filenames in a directory:
1.txt
123.txt
10 some text.txt
100 Some Other Text.txt
The filenames always begin with a unique number (which is what I need to use to find the right file), optionally followed by a space and some text, and finally the .txt extension.
My problem is that no matter what I do, if I try to match the number 1 in the example folder above, it matches every file that begins with 1. I need to open only the file that starts with exactly 1, no matter what follows it, whether that is a space and text or just .txt.
Some example regex that does not succeed at the task:
filepath/($number)*.txt
filepath/([($number)])( |*.?)*.txt
filepath/($number)( |*.?)*.txt
I'm sure there's a very simple solution to this... If possible I'd like to avoid loading every single file into a PHP array and using PHP to check every item for the one that begins with only the exact number, when surely regex can do it in a single action
A bonus would be if you also know how to turn the optional text between the number and the extension into a variable, but that is entirely optional as it's my next task after I figure this one out
The regex you want to use is: ^99(\D*\.txt)$
$re = "/^99(\D*\.txt)$/";
preg_match($re, $str, $matches);
This will match:
99.txt
99files.txt
but not:
199.txt
999.txt
99
99.txt.xml
99filesoftxt.dat
The ( ) around \D*\.txt creates a capturing group that contains the rest of the file name after the number. \D* allows zero non-digit characters, so a bare 99.txt also matches.
I believe this is what you want OP:
$regex = '/^' . $number . '[^0-9][\S\s]+/';
This matches the number at the start of the name, then any character that isn't a digit, then at least one more character. If the number is 1, this would match:
1.txt
1abc.txt
1 abc.txt
1_abc.txt
1qrx.txt
But it would not match:
1
12.txt
2.txt
11.txt
1.
Here you go:
<?php
function findFileWithNumericPrefix($filepath, $prefix)
{
if (($dir = opendir($filepath)) === false) {
return false;
}
while (($filename = readdir($dir)) !== false) {
if (preg_match("/^$prefix\D/", $filename) === 1) {
closedir($dir);
return $filename;
}
}
closedir($dir);
return false;
}
$file = findFileWithNumericPrefix('/base/file/path', 1);
if ($file !== false) {
echo "Found file: $file";
}
?>
With your example directory listing, the result is:
Found file: 1.txt
You can use a regex like this:
^10\D.*txt$
^--- use the number you want
For instance:
$re = "/^10\\D.*txt$/m";
$str = "1.txt\n123.txt\n10 some text2.txt\n100 Some Other2 Text.txt";
preg_match_all($re, $str, $matches);
// will match only "10 some text2.txt"
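For the bonus part of the question (getting the optional text between the number and the extension), a capture group can be added. This is a minimal sketch that assumes the naming scheme described in the question: number, optionally a space and some text, then .txt. The helper name matchNumberedFile is hypothetical.

```php
<?php
// Hypothetical helper: finds the name with exactly this numeric prefix and returns
// the optional text between the number and ".txt" ("" when there is none).
function matchNumberedFile(array $filenames, int $number): ?array
{
    // Number, then either ".txt" directly or a space plus captured text and ".txt".
    $pattern = '/^' . $number . '(?: (.*))?\.txt$/';
    foreach ($filenames as $filename) {
        if (preg_match($pattern, $filename, $m) === 1) {
            return ['file' => $filename, 'text' => $m[1] ?? ''];
        }
    }
    return null;
}

$filenames = ['1.txt', '123.txt', '10 some text.txt', '100 Some Other Text.txt'];
print_r(matchNumberedFile($filenames, 1));  // file => 1.txt, text => ""
print_r(matchNumberedFile($filenames, 10)); // file => 10 some text.txt, text => "some text"
```

glob() patterns cannot express "not followed by a digit" on their own, so one option is to let glob() narrow the candidates (for example everything starting with the number) and apply a check like this to the base names it returns.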

php simplest case regex replacement, but backtraces not working

Hacking up what I thought was the second simplest type of regex (extract a matching string from some strings, and use it) in php, but regex grouping seems to be tripping me up.
Objective
take a ls of files, output the commands to format/copy the files to have the correct naming format.
Resize copies of the files to create thumbnails. (not even dealing with that step yet)
Failure
My code fails at the regex step, because although I just want to filter out everything except a single regex group, when I get the results, it's always returning the group that I want -and- the group before it, even though I in no way requested the first backtrace group.
Here is a fully functioning, runnable version of the code on the online ide:
http://ideone.com/2RiqN
And here is the code (with a cut down initial dataset, although I don't expect that to matter at all):
<?php
// Long list of image names.
$file_data = <<<HEREDOC
07184_A.jpg
Adrian-Chelsea-C08752_A.jpg
Air-Adams-Cap-Toe-Oxford-C09167_A.jpg
Air-Adams-Split-Toe-Oxford-C09161_A.jpg
Air-Adams-Venetian-C09165_A.jpg
Air-Aiden-Casual-Camp-Moc-C09347_A.jpg
C05820_A.jpg
C06588_A.jpg
Air-Aiden-Classic-Bit-C09007_A.jpg
Work-Moc-Toe-Boot-C09095_A.jpg
HEREDOC;
if($file_data){
$files = preg_split("/[\s,]+/", $file_data);
// Split up the files based on the newlines.
}
$rename_candidates = array();
$i = 0;
foreach($files as $file){
$string = $file;
$pattern = '#(\w)(\d+)_A\.jpg$#i';
// Use the second regex group for the results.
$replacement = '$2';
// This should return only group 2 (any number of digits), but instead group 1 is somehow always in there.
$new_file_part = preg_replace($pattern, $replacement, $string);
// Example good end result: <img src="images/ch/ch-07184fs.jpg" width="350" border="0">
// Save the rename results for further processing later.
$rename_candidates[$i]=array('file'=>$file, 'new_file'=>$new_file_part);
// Rename the images into a standard format.
echo "cp ".$file." ./ch/ch-".$new_file_part."fs.jpg;";
// Echo out some commands for later.
echo "<br>";
$i++;
if($i>10){break;} // Just deal with the first 10 for now.
}
?>
Intended result for the regex: 788750
Intended result for the code output (multiple lines of): cp air-something-something-C485850_A.jpg ./ch/ch-485850.jpg;
What's wrong with my regex? Suggestions for simpler matching code would be appreciated as well.
Just a guess:
$pattern = '#^.*?(\w)(\d+)_A\.jpg$#i';
This includes the whole filename in the match. Otherwise preg_replace() will really only substitute the end of each string - it only applies the $replacement expression on the part that was actually matched.
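An alternative that sidesteps the replacement issue is to extract the digits with preg_match instead of rewriting the string with preg_replace. This is a sketch only: it uses three of the file names from the question and assumes the digits sit directly before _A.jpg.

```php
<?php
$files = ['07184_A.jpg', 'Adrian-Chelsea-C08752_A.jpg', 'C05820_A.jpg'];

$commands = [];
foreach ($files as $file) {
    // Capture only the digits that sit directly before "_A.jpg".
    if (preg_match('/(\d+)_A\.jpg$/i', $file, $m) === 1) {
        $commands[] = "cp {$file} ./ch/ch-{$m[1]}fs.jpg;";
    }
}

echo implode("\n", $commands), "\n";
```

Since only the captured group is used, the unmatched start of the file name never ends up in the result, and the leading zero in 07184 is kept.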
Scandir and explode
You know what? A simpler way to do it in PHP is to use a scandir and explode combo:
$dir = scandir('/path/to/directory');
foreach($dir as $file)
{
$ext = pathinfo($file,PATHINFO_EXTENSION);
if($ext!='jpg') continue;
$a = explode('-',$file); //grab the end of the string after the -
$newfilename = end($a); //if there is no dash just take the whole string
$newlocation = './ch/ch-'.str_replace(array('C','_A'),'', basename($newfilename,'.jpg')).'fs.jpg';
echo "#copy($file, $newlocation)\n";
}
#and you are done :)
explode: basically a filename like blah-2.jpg is turned into an array('blah','2.jpg'), and taking the end() of that gets the last element. It's almost the same as array_pop().
Working Example
Here's my ideone code: http://ideone.com/gLSxA
