GlobIterator RegEx - php

I have a string with a wildcard at the end, but I don't know how many characters the wildcard part will be. How can I use GlobIterator and RegexIterator to match similar file names? The second match below returns every file in the directory that starts with the prefix, but I don't want that. I need a proper regular expression. The wildcard should cover only the last segment before the extension (e.g. the size part: 250m, 500m, etc.), not any additional dot-separated segments.
$iterator = new GlobIterator($this->srcDir . $identifier . ".*");
MATCH ON
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.*
This returns the correct files.
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.250m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.500m.jpg
MATCH ON
/var/www/import/2014047-0216/YukonGold.A2014047.1620.*
Returns the files:
/var/www/import/2014047-0216/YukonGold.A2014047.1620.250m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.500m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.250m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.500m.jpg
EXPECTED OUTPUT
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.*
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.250m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.721.500m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.*
/var/www/import/2014047-0216/YukonGold.A2014047.1620.250m.jpg
/var/www/import/2014047-0216/YukonGold.A2014047.1620.500m.jpg

You should use it inside a RegexIterator:
// Notice that there is no expansion pattern used here
$path = '/var/www/import/2014047-0216/YukonGold.A2014047.1620.';
$re = '~\Q' . $path . '\E(?:[^.]+\.)?\w+$~';
$regexIterator = new RegexIterator(new GlobIterator("{$path}*"), $re);
foreach ($regexIterator as $filename) {
    echo $filename . "\n";
}
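The expression can be checked without touching the filesystem. The following is a minimal sketch, not the answer's own code: it applies the same idea to the four file names from the question with preg_grep, and it swaps \Q...\E for preg_quote (my substitution) so the prefix stays literal even if it ever contained \E. The helper name filterByPrefix is hypothetical.

```php
<?php
// Sample paths copied from the question; nothing is read from disk.
$base = '/var/www/import/2014047-0216/YukonGold.A2014047.';
$files = [
    $base . '1620.250m.jpg',
    $base . '1620.500m.jpg',
    $base . '1620.721.250m.jpg',
    $base . '1620.721.500m.jpg',
];

// Hypothetical helper: keep paths made of the literal prefix, at most one
// dot-free segment, and an extension.
function filterByPrefix(array $files, string $prefix): array
{
    $re = '~^' . preg_quote($prefix, '~') . '(?:[^.]+\.)?\w+$~';
    return array_values(preg_grep($re, $files));
}

print_r(filterByPrefix($files, $base . '1620.'));     // the two 1620.<size>.jpg files
print_r(filterByPrefix($files, $base . '1620.721.')); // the two 1620.721.<size>.jpg files
```

With real files, the same expression can be passed to RegexIterator exactly as in the answer above.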

Related

Getting an exact preg_replace match from a .txt file [duplicate]

So I have got this far on my own, but it looks like I have found the limit of my PHP knowledge (which isn't very much at all!). This script is for filtering filenames (game ROMs/ISOs etc.). It has other ways of filtering too, but I've just highlighted the section I'm trying to add. I want an external .txt file I can put names of files in like so (separated by a single line break):
Pacman 2 (USA)
Space Invaders (USA)
Asteroids (USA)
Something Else (Europe)
And then running the script will search the directory and place any matching filenames in the "removed" folder. It loops fine with all the other filtering techniques it uses. I'm just trying to add my own (unsuccessfully!)
$gameList = trim(shell_exec("ls -1"));
$gameArray = explode("\n", $gameList);
$file = file_get_contents('manualremove.txt');
$manualRemovePattern = '/(' . str_replace(PHP_EOL, "|", $file) . ')/';
shell_exec('mkdir -p Removed');
foreach($gameArray as $thisGame) {
    if(!$thisGame) continue;
    // Probably already been removed
    if(!file_exists($thisGame)) continue;
    if(preg_match ($manualRemovePattern , $thisGame)) {
        echo "{$thisGame} is on the manual remove list. Moving to Removed folder.\n";
        shell_exec("mv \"{$thisGame}\" Removed/");
        continue;
    }
}
So this is working when I put names of games with no spaces or brackets in the .txt file. But spaces or brackets (or both) are breaking its functionality. Could someone help me out?
Many thanks!
Replace the fourth line in the code you supplied with
$manualRemovePattern = "/(?:" . implode("|", array_map(function($i) {
    return preg_quote(trim($i), "/");
}, explode(PHP_EOL, $file))) . ')/';
The main idea is:
Split the file contents you obtained into lines with explode(PHP_EOL, $file)
Then you need to iterate over the array and modify each item in the array (which can be done with array_map)
Modifying the array items involves adding escaping \ before any special regex metacharacter and a regex delimiter chosen by you (in this case, /), and this is done with preg_quote(trim($i), "/")
Note I remove any leading/trailing spaces with trim from the array items - just in case.
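To match them as whole words, use word boundaries. Note that \b only matches next to a word character, so an entry ending in ) will not match when it is followed by another non-word character, such as the dot before a file extension:
$manualRemovePattern = '/\b(?:' . implode('|', array_map(function($i) {
    return preg_quote(trim($i), '/');
}, explode(PHP_EOL, $file))) . ')\b/';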
To match them as whole strings, use ^/$ anchors:
$manualRemovePattern = '/^(?:' . implode('|', array_map(function($i) {
    return preg_quote(trim($i), '/');
}, explode(PHP_EOL, $file))) . ')$/';
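One caveat with all three variants: if manualremove.txt ends with a line break (or contains a blank line), explode produces an empty entry, and an empty alternative in the pattern matches every file name. Below is a minimal sketch of one way to guard against that, assuming the list is read line by line; the helper name buildRemovePattern is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: build one alternation pattern from a list of literal names.
// Returns null when the list holds no usable entries.
function buildRemovePattern(array $names): ?string
{
    $quoted = [];
    foreach ($names as $name) {
        $name = trim($name);
        if ($name === '') {
            continue; // an empty alternative would match every file name
        }
        $quoted[] = preg_quote($name, '/');
    }
    return $quoted ? '/(?:' . implode('|', $quoted) . ')/' : null;
}

// With the real file, file() can do the splitting:
// $names = file('manualremove.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$pattern = buildRemovePattern(['Pacman 2 (USA)', 'Space Invaders (USA)', '']);

var_dump(preg_match($pattern, 'Pacman 2 (USA).zip'));  // int(1)
var_dump(preg_match($pattern, 'Asteroids (USA).zip')); // int(0)
```

The null return lets the caller skip the manual-remove step entirely when the list is empty, instead of running preg_match with a pattern that matches everything.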

PHP get list of files matching a pattern and sort by oldest file

If I have a directory where the list of files looks like this:
Something_2015020820.txt
something_5294032944.txt.a
something_2015324234.txt
Something_2014435353.txt.a
and I want to get the list of files sorted by oldest date (not by filename), the result should include anything that matches something_xxxxxxx.txt. So anything that ends with ".a" is not included. In this case it should return
Something_2015020820.txt
something_2015324234.txt
I did some Google searching and it seems like I can use glob
$listOfFiles = glob($this->directory . DIRECTORY_SEPARATOR . $this->pattern . "*");
but I'm not sure about the pattern.
It would be awesome if you could provide both case sensitive and insensitive pattern. The match pattern will have to be something_number.txt
It's a bit tricky using glob. It does support some limited pattern matching. For example you can do
glob("*.[tT][xX][tT]");
This will match file.txt and file.TXT, but it will also match file.tXt and file.TXt. Plain glob patterns have no way to specify something like file.(txt or TXT); the GLOB_BRACE flag adds that (glob("*.{txt,TXT}", GLOB_BRACE)), but it is not available on every platform. If the extra matches are not a problem, great! Otherwise, you'll have to first use this method to at least narrow the results down, and then perform some additional processing afterwards, for example with array_filter and a regex.
A better option might be to use PHP's Iterator classes, so you can specify much more advanced rules.
$directoryIterator = new RecursiveDirectoryIterator($directory);
$iteratorIterator = new RecursiveIteratorIterator($directoryIterator);
$fileList = new RegexIterator($iteratorIterator, '/^.*\.(txt|TXT)$/');
foreach($fileList as $file) {
    echo $file;
}
You can use scandir to get all your files, then preg_match to filter those files down to ones that don't end with .a. filemtime can then be used to get the time each file was last modified. Finally, array_multisort can sort by that time while maintaining the key/filename association.
This works as I expected:
date_default_timezone_set('America/New_York'); //change to your timezone
$directory = 'Your Directory/'; // keep the trailing slash; it is prepended to each file name
$files = scandir($directory);
$time = array();
foreach($files as $file) {
    if(!preg_match('~\.a$~', $file) && !is_dir($directory . $file)) {
        $time["$file"] = filemtime($directory . $file);
    }
}
array_multisort($time);
Then to handle the outputting you could do something like:
foreach($time as $file => $da_time){
    echo $file . ' '. date('m/d/Y H:i:s', $da_time) . "\n";
}
The preg_match pattern can have i added after the closing ~ delimiter so the regex is case insensitive; or you could just change the a to [aA].
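The snippet above only excludes names ending in .a. A minimal sketch that also requires the something_number.txt shape from the question, with a switch for case sensitivity, could look like this. The function name listOldestFirst and its parameters are hypothetical, and the directory is assumed to exist and be readable.

```php
<?php
// Hypothetical helper: return names shaped like something_<digits>.txt
// from $directory, ordered from oldest to newest modification time.
function listOldestFirst(string $directory, bool $caseInsensitive = true): array
{
    // The i modifier makes "Something_..." and "something_..." both match.
    $pattern = '~^something_\d+\.txt$~' . ($caseInsensitive ? 'i' : '');
    $times = [];
    foreach (scandir($directory) as $file) {
        $path = $directory . DIRECTORY_SEPARATOR . $file;
        if (preg_match($pattern, $file) && is_file($path)) {
            $times[$file] = filemtime($path);
        }
    }
    asort($times); // sort by modification time, keeping file names as keys
    return array_keys($times);
}
```

Because the pattern is anchored with ^ and $, names such as something_5294032944.txt.a are rejected without a separate check. asort is used here instead of array_multisort simply because only one array needs sorting.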

Retrieve image from S3 bucket matching a pattern (PHP)

I have an S3 bucket full of images whose naming follows a simple pattern. The first 6 digits group images by listing number; the trailing digit(s) are non-sequential, but follow a reliable pattern (0 thru 99). I'm capturing the six digits that start the filename in a variable $ln.
/*
https://s3.amazonaws.com/stroupenwmls2/602665_10.jpg
https://s3.amazonaws.com/stroupenwmls2/602665_12.jpg
https://s3.amazonaws.com/stroupenwmls2/602665_13.jpg
https://s3.amazonaws.com/stroupenwmls2/602665_15.jpg
*/
What I want to do is populate a 'listing' img src attribute with the url to an image, if one exists for that listing (if not, I provide a no-image.jpg). And I'm looping thru many different listings to create my web page.
I'm struggling with the logic to grab the first image that matches the $listing variable. Here is what I've tried, with no luck (just produces a 0):
$bucket = 'https://s3.amazonaws.com/stroupenwmls2/';
$ln = '602665';
$string = $bucket . $ln . '_';
// match the pattern '_xx.jpg', with 1 or 2 numbers
$image = preg_match('/^_[0-9]{1,2}\.(jpg|jpeg|png|gif)/i', $string);
Then in my web app:
<img src="<?php echo $image ?>">
I'm an idiot when it comes to using preg_match, what I really need is some sort of wildcard parameter. I'm sure I'm making this way too complicated.
The problem is that you're not matching against the image paths; you're matching against what I assume you intend to be part of your regular expression. See below:
$bucket = 'https://s3.amazonaws.com/stroupenwmls2/';
$ln = '602665';
// escape the literal URL prefix (including the / delimiter used below),
// then append the variable part. Note that PHP concatenates with . and not +
$re = preg_quote($bucket . $ln . '_', '/') . '[0-9]{1,2}\.(jpg|jpeg|png|gif)';
// let's say you have an array called $img_list;
// loop through each path in the list, searching strings
// that match the regular expression constructed in $re.
// if you find a match, return it.
// you'd probably want to define a function to do this for you,
// and call it with the $listing and array as parameters.
foreach ($img_list as $img) {
    // this returns either 0 or 1 depending on match.
    // return the first one, and we're done.
    if (preg_match('/^' . $re . '/i', $img)) {
        return $img;
    }
}
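Wrapped in a function, as the comments suggest, the loop might look like the sketch below. The name findFirstImage, its parameters, and the fallback default are hypothetical, and the list of image URLs is assumed to have been obtained elsewhere (the answer does not say how).

```php
<?php
// Hypothetical helper: return the first URL in $imgList that belongs to
// listing $ln, or $fallback when the listing has no image.
function findFirstImage(array $imgList, string $bucket, string $ln, string $fallback = 'no-image.jpg'): string
{
    // preg_quote keeps the dots and slashes of the URL literal.
    $re = '/^' . preg_quote($bucket . $ln . '_', '/') . '[0-9]{1,2}\.(?:jpg|jpeg|png|gif)$/i';
    foreach ($imgList as $img) {
        if (preg_match($re, $img) === 1) {
            return $img;
        }
    }
    return $fallback;
}

$bucket = 'https://s3.amazonaws.com/stroupenwmls2/';
$imgList = [
    $bucket . '602665_10.jpg',
    $bucket . '602665_12.jpg',
];

echo findFirstImage($imgList, $bucket, '602665'), "\n"; // first URL for listing 602665
echo findFirstImage($imgList, $bucket, '999999'), "\n"; // no-image.jpg
```

The returned string can then go straight into the img src attribute, which covers the no-image case without an extra branch in the template.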

Glob pattern matching the first part of a file name

In a directory I have filenames like 123X1.jpg, 23X1.jpg, 23X2.jpg, 4123X1.jpg.
I need the glob pattern to only get listed files starting with a required string.
For example:
'23X' -> 23X1.jpg, 23X2.jpg
'123X' -> 123X1.jpg
The last part of the pattern is always an X. The first part is a number.
It's trivial with glob():
print_r(glob('/path/to/23X*.jpg'));
print_r(glob('/path/to/123X*.jpg'));
You can try RegexIterator
$fi = new FilesystemIterator(__DIR__, FilesystemIterator::SKIP_DOTS);
$regex = new RegexIterator($fi, "/\dX[a-z\d]+/i");
foreach($regex as $file) {
    echo (string) $file, PHP_EOL;
}
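Note that the expression above is not anchored, so it also matches inside longer names (the 3X1 in 123X1.jpg, for instance). A minimal sketch of an anchored variant that takes the required leading string as input follows; it works on a plain array of file names, and the helper name filterByLeadingString is hypothetical.

```php
<?php
// Hypothetical helper: keep names that start with exactly $prefix (e.g. '23X').
function filterByLeadingString(array $names, string $prefix): array
{
    // ^ pins the prefix to the start of the name, so '23X' cannot match '123X1.jpg'.
    $re = '/^' . preg_quote($prefix, '/') . '[a-z\d]*\.jpg$/i';
    return array_values(preg_grep($re, $names));
}

$names = ['123X1.jpg', '23X1.jpg', '23X2.jpg', '4123X1.jpg'];

print_r(filterByLeadingString($names, '23X'));  // 23X1.jpg, 23X2.jpg
print_r(filterByLeadingString($names, '123X')); // 123X1.jpg
```

The file names could come from scandir or from an iterator's getFilename(); that part is left out here.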

How to extract only part of string in PHP?

I have a following string and I want to extract image123.jpg.
..here_can_be_any_length "and_here_any_length/image123.jpg" and_here_also_any_length
image123 can be any length (newimage123456 etc) and with extension of jpg, jpeg, gif or png.
I assume I need to use preg_match, but I am not really sure and like to know how to code it or if there are any other ways or function I can use.
You can use:
if(preg_match('#".*?\/(.*?)"#',$str,$matches)) {
    $filename = $matches[1];
}
Alternatively you can extract the entire path between the double quotes using preg_match and then extract the filename from the path using the function basename:
if(preg_match('#"(.*?)"#',$str,$matches)) {
    $path = $matches[1]; // extract the entire path.
    $filename = basename($path); // extract file name from path.
}
What about something like this :
$str = '..here_can_be_any_length "and_here_any_length/image123.jpg" and_here_also_any_length';
$m = array();
if (preg_match('#".*?/([^\.]+\.(jpg|jpeg|gif|png))"#', $str, $m)) {
    var_dump($m[1]);
}
Which, here, will give you :
string(12) "image123.jpg"
I suppose the pattern could be a bit simpler: you could skip the extension check, for instance, and accept any kind of file, but I'm not sure that would suit your needs.
Basically, here, the pattern :
starts with a "
takes any number of characters until a / : .*?/
then takes any number of characters that are not a . : [^\.]+
then checks for a dot : \.
then comes the extension -- one of those you decided to allow : (jpg|jpeg|gif|png)
and, finally, the end of pattern, another "
And the whole portion of the pattern that corresponds to the filename is surrounded by (), so it's captured and returned in $m[1]
$string = '..here_can_be_any_length "and_here_any_length/image123.jpg" and_here_also_any_length';
$data = explode('"',$string);
$basename = basename($data[1]);
