Regex to match numbers separated by dash (-) and get substring - php

I have a lot of image files in a directory whose names contain an ID surrounded by a description of what the image shows.
This is an example of the files in that directory:
de-te-mo-01-19-1084 moldura.JPG, ce-ld-ns-02-40-0453 senal.JPG, dp-bs-gu-01-43-1597-guante.JPG, am-ca-tw-04-30-2436 Tweter.JPG, am-ma-ac-02-26-0745 aceite.JPG, ca-cc-01-43-1427-F.jpg
What I want is to get the ID of the image (nn-nn-nnnn, where n is a digit) and rename the file to that substring.
The result from the list above would be like: 01-19-1084.JPG, 02-40-0453.JPG, 01-43-1597.JPG, 04-30-2436.JPG, 02-26-0745.JPG, 01-43-1427.jpg.
This is the code I'm using to loop over the directory:
$dir = "images";
// Open a known directory, and proceed to read its contents
if (is_dir($dir)) {
if ($dh = opendir($dir)) {
while (($file = readdir($dh)) !== false) {
if($sub_str = preg_match($patern, $file))
{
rename($dir.'/'.$file, $sub_str.'JPG');
}
}
closedir($dh);
}
}
So, what should my $patern be to get what I want?

$pattern should look like this:
$pattern = "/^.*(\d{2}-\d{2}-\d{4}).*\.jpg$/i";
This pattern checks the file name and captures the ID as a match group. Also, preg_match returns an integer (1 on a match, 0 otherwise), not the matched string. The matches are returned through the third parameter of the function. The body of the while loop should look like this:
if (preg_match($pattern, $file, $matches))
{
// Keep the renamed file inside $dir
rename($dir.'/'.$file, $dir.'/'.$matches[1].'.JPG');
}
$matches is an array holding the full matched string followed by the captured groups.

Wouldn't that be just like:
^.*([0-9]{2})-([0-9]{2})-([0-9]{4}).*\.jpg$
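Explanation:
^ Start of string
.* Any character, zero or more times
([0-9]{2}) 2 digits
- A literal - character
([0-9]{2}) 2 digits
- A literal - character
([0-9]{4}) 4 digits
.* Any character, zero or more times
\.jpg The extension, with the . escaped so it is not treated as a wildcard
$ End of string
This gives you 3 groups, one per pair of (), so you have to use indexes 1, 2 and 3 and join them with - again. Add the i modifier if the pattern should also match the upper-case .JPG extension.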
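Putting the answers above together, here is a minimal sketch of the renaming step. It assumes every ID has the form nn-nn-nnnn and that the original extension should be kept exactly as it is; the helper name idFileName is made up for illustration.

```php
<?php
// Hypothetical helper: builds the new file name from the ID and the original extension.
// Returns null when the name has no nn-nn-nnnn ID or is not a .jpg/.JPG file.
function idFileName(string $file): ?string
{
    // Group 1 captures the ID, group 2 the extension; /i accepts .jpg and .JPG.
    if (preg_match('/(\d{2}-\d{2}-\d{4}).*\.(jpg)$/i', $file, $m) === 1) {
        return $m[1] . '.' . $m[2];
    }
    return null;
}

// Sample names taken from the question, plus one file that should be skipped.
$samples = ['de-te-mo-01-19-1084 moldura.JPG', 'ca-cc-01-43-1427-F.jpg', 'notes.txt'];
foreach ($samples as $file) {
    $new = idFileName($file);
    echo $file, ' => ', $new ?? '(skipped)', PHP_EOL;
}
```

Inside the directory loop, the returned name would then go into rename($dir.'/'.$file, $dir.'/'.$new), skipping any file for which the helper returns null.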

Related

Ajax search POST to php

I need some help with refining my current search.
I have folder with images that are named as:
20171116-category_title.jpg (where first number is date yyyymmdd)
My current search looks like this:
<?php
// string to search in a filename.
if(isset($_POST['question'])){
$searchString = $_POST['question'];
}
// image files in my/dir
$imagesDir = '';
$files = glob($imagesDir . '*.{jpg,jpeg,png,gif}', GLOB_BRACE);
// array populated with files found
// containing the search string.
$filesFound = array();
// iterate through the files and determine
// if the filename contains the search string.
foreach($files as $file) {
$name = pathinfo($file, PATHINFO_FILENAME);
// determines if the search string is in the filename.
if(strpos(strtolower($name), strtolower($searchString))) {
$filesFound[] = $file;
}
}
// output the results.
echo json_encode($filesFound, JSON_UNESCAPED_UNICODE);
?>
And this works just fine but...
I would like to limit the search to the part of the .jpg name that contains the "title" after the underscore "_", and after that (if possible) expand the search as follows:
Run a double search if the AJAX POST sends the format abc+xyz, where the delimiter "+" practically means 2 queries.
The first part (abc) targets the "category", which sits between the hyphen and the underscore. The second part (xyz), which is basically my first question, searches only among the results already found for that category.
Your tips are more than welcome!
Thank you!
For the first part of your question, the exact pattern you use depends on the format of your category strings. If you will never have underscores _ in the category, here's one solution:
foreach($files as $file) {
// $name = "20171116-category_title"
$name = pathinfo($file, PATHINFO_FILENAME);
// $title = "title", assuming your categories will never have "_".
// The regular expression matches 8 digits, followed by a hyphen,
// followed by anything except an underscore, followed by an
// underscore, followed by anything
$title = preg_filter('/\d{8}-[^_]+_(.+)/', '$1', $name);
// Now search based on your $title, not $name
// *NOTE* this test is not safe, see update below.
if(strpos(strtolower($title), strtolower($searchString))) {
If your categories can or will have underscores, you'll need to adjust the regular expression based on some format you can be sure of.
For your 2nd question, you need to first separate your query into addressable parts. Note though that + is typically how spaces are encoded in URLs, so using it as a delimiter means you will never be able to use search terms with spaces. Maybe that's not a problem for you, but if it is, you should try another delimiter, or maybe simpler would be to use separate search fields, e.g. 2 inputs on your search form.
Anyway, using +:
if(isset($_POST['question'])){
// $query will be an array with 0 => category term, and 1 => title term
$query = explode('+', $_POST['question']);
}
Now in your loop you need to identify not just the $title part of the filename, but also the $category:
$category = preg_filter('/\d{8}-([^_]+)_.+/', '$1', $name);
$title = preg_filter('/\d{8}-[^_]+_(.+)/', '$1', $name);
Once you have those, you can use them in your final test for a match:
if( strpos(strtolower($category), strtolower($query[0])) && strpos(strtolower($title), strtolower($query[1])) ) {
UPDATE
I just noticed your match test has a problem. strpos can return 0 if a match is found starting at position 0. 0 is a falsy result, which means your test will fail even though there's a match. You need to explicitly test against FALSE, as described in the docs:
if( strpos(strtolower($category), strtolower($query[0])) !== FALSE
&& strpos(strtolower($title), strtolower($query[1])) !== FALSE ) {
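For reference, the whole test can be folded into one small function. This is only a sketch under the same assumption as above (names look like yyyymmdd-category_title and the category never contains an underscore); the function name matchesQuery and the sample query are made up for illustration.

```php
<?php
// Hypothetical helper: true when both terms are found in their part of the name.
function matchesQuery(string $name, string $categoryTerm, string $titleTerm): bool
{
    // Group 1 is the category (no underscores), group 2 is the title.
    if (preg_match('/^\d{8}-([^_]+)_(.+)$/', $name, $m) !== 1) {
        return false;
    }
    // stripos() is case-insensitive; compare with false so a hit at position 0 counts.
    return stripos($m[1], $categoryTerm) !== false
        && stripos($m[2], $titleTerm) !== false;
}

// Split "abc+xyz" into the category term and the title term.
[$categoryTerm, $titleTerm] = explode('+', 'cat+tit', 2);

var_dump(matchesQuery('20171116-category_title', $categoryTerm, $titleTerm)); // bool(true)
var_dump(matchesQuery('20171116-other_title', $categoryTerm, $titleTerm));    // bool(false)
```

Using one preg_match with two groups avoids running two preg_filter calls per file, and stripos removes the need for strtolower on both sides.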

Append to PHP variable, but before '.jpg' / '.png' etc

So I have created a basic code to upload images. The user uploads 2 images and when they are being processed/uploaded I have a small bit of code to make sure that file names aren't the same when they get uploaded to the server
if(file_exists($imgpath1)){
$imgpath1 = $imgpath1 . $random;
}
if(file_exists($imgpath2)){
$imgpath2 = $imgpath2 . $random;
}
Let's say $imgpath1 = "images/user/1.jpg" to begin with (before the PHP above is run)
and $random is a random number generated at the start of the script, let's say $random = '255'.
The code works perfectly, and the images still display correctly, but it is adding the '255' ($random) directly to the end of the filepath, so $imgpath1 = "images/user/1.jpg255" after the code above has run.
The file extension won't always be .jpg obviously, it could be .png, .bmp and so on...
How can I make the $random (255 in this instance) go before the ".jpg" in the filepath? I have tried researching on Google but I can't seem to word it correctly to find any useful answers.
Thanks
You can try this code :
$filename = "file.jpg";
$append = "001";
function append_filename($filename, $append) {
preg_match ("#^(.*)\.(.+?)$#", $filename , $matches);
return $matches[1].$append.'.'.$matches[2];
}
echo append_filename($filename, $append);
It gives : file001.jpg
http://www.tehplayground.com/#JFiiRpjBX (Ctrl+ENTER for test)
You could do it like this:
This extracts the path and filename before the last period ($regs[1]) and the extension after it ($regs[2]).
if (preg_match('/^(.*)\.([^.].*)$/i', $imgpath1, $regs)) {
// The period itself is not captured, so add it back before the extension
$myfilename = $regs[1] . $random . '.' . $regs[2];
} else {
$myfilename = $imgpath1;
}
Works with filenames like /path/subpath/filename.jpg or /path/subpath/filename.saved.jpg, etc.
What the Regex means:
# ^(.*)\.([^.].*)$
#
# Assert position at the beginning of the string «^»
# Match the regular expression below and capture its match into backreference number 1 «(.*)»
# Match any single character that is not a line break character «.*»
# Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the character “.” literally «\.»
# Match the regular expression below and capture its match into backreference number 2 «([^.].*)»
# Match any character that is NOT a “.” «[^.]»
# Match any single character that is not a line break character «.*»
# Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
You can use the pathinfo function to get the required parts, and rebuild the path with the random part added:
if(file_exists($imgpath1)){
$pathinfo = pathinfo($imgpath1);
$imgpath1 = $pathinfo['dirname'] .'/' . $pathinfo['filename'] . $random . '.' . $pathinfo['extension'];
}
Note that your $random variable needs to be a unique ID, otherwise you can still get collisions.
You will also need to filter out bad characters (users may be on a different filesystem than your server, etc.). Often it's just easier to replace the whole name with uniqid() . '.' . $pathinfo['extension'];
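As a rough sketch of the pathinfo approach wrapped in a reusable function: the name appendBeforeExtension and the "_1, _2, ..." suffix scheme below are assumptions for illustration, not something from the question. It also handles names without a directory or without an extension.

```php
<?php
// Hypothetical helper: inserts $suffix between the file name and its extension.
function appendBeforeExtension(string $path, string $suffix): string
{
    $info = pathinfo($path);
    // 'dirname' is '.' for a bare file name; 'extension' is absent when there is none.
    $dir = $info['dirname'] === '.' ? '' : $info['dirname'] . '/';
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';
    return $dir . $info['filename'] . $suffix . $ext;
}

// Hypothetical usage: keep trying numbered suffixes until the name is free.
$target = 'images/user/1.jpg';
$candidate = $target;
for ($i = 1; file_exists($candidate); $i++) {
    $candidate = appendBeforeExtension($target, '_' . $i);
}
echo $candidate, PHP_EOL;
```

Looping on file_exists sidesteps the collision problem of a single random number, at the cost of one filesystem check per attempt.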

Find string pattern from end

I am trying to find if a string is of a format <initial_part_of_name>_0.pdf i.e.
Find if it ends with .pdf (could be eliminated using rtrim)
The initial part is followed by an _ underscore.
The underscore is followed by a whole number (0, 1, 2, ... , etc.)
What would be the best way to achieve this? I have tried combinations of string functions such as strpos (to find the position), but could not find a way to search from the end of the string.
Any pointers would be appreciated!
Edit:
Sample strings:
public://Big_Data_Tutorial_part4_0.pdf
public://Big_Data_Tutorial_part4_1.pdf
public://Big_Data_Tutorial_part4_3.pdf
The reason why I need to check is to avoid duplicate files which are stored with the _<number> appended.
You can use the preg_match() function to match patterns:
preg_match("/(.*)_(\d+)\.pdf$/", "<initial_part_of_name>_0.pdf",$arr);
In $arr[1], you will get the <initial_part_of_name>;
in $arr[2], you will get the number after the underscore.
A way without arrays or regex:
$str1 = "public://Big_Data_Tutorial_part4_0a.pdf"; // no match because 0a
$str2 = "public://Big_Data_Tutorial_part4_1.pdf"; // match
$str3 = "public://Big_Data_Tutorial_part4_3.pdf"; // match
$last_part = strrchr($str1, "_");
if (trim(strstr($last_part, ".", true), "_0..9") == "" && strstr($last_part, ".") == ".pdf") {
echo "match";
}
$str = '<initial_part_of_name>_0.pdf';
$exploded = explode('.', $str);
echo $exploded[1];
This could be done using a regex. Something like
^.+_\d+\.pdf$
Implementation:
$filename = '<initial_part_of_name>_0.pdf';
if(preg_match('/^.+_\d+\.pdf$/i', $filename)){
// name pattern matched
}
SOLUTION 2
use pathinfo()
$filename = '<initial_part_of_name>_0.pdf';
$path_parts = pathinfo($filename);
if(strtolower($path_parts['extension']) == 'pdf') {
if(preg_match('/_\d+$/', $path_parts['filename'])){
// name pattern Matched
}
} else {
// Not a PDF file
}
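Since the goal is to detect duplicates stored with _<number> appended, it may help to get the base name and the number back in one step. The following is a sketch only; the function name parseNumberedPdf and the shape of the returned array are assumptions for illustration.

```php
<?php
// Hypothetical helper: splits "<base>_<number>.pdf" into its parts, or returns null.
function parseNumberedPdf(string $name): ?array
{
    // (.+) is greedy, so the LAST underscore followed by digits and .pdf is used.
    if (preg_match('/^(.+)_(\d+)\.pdf$/i', $name, $m) !== 1) {
        return null;
    }
    return ['base' => $m[1], 'number' => (int) $m[2]];
}

// Sample strings from the question, plus two that should not match.
$samples = [
    'public://Big_Data_Tutorial_part4_0.pdf',
    'public://Big_Data_Tutorial_part4_3.pdf',
    'public://Big_Data_Tutorial_part4_0a.pdf',
    'public://report.pdf',
];
foreach ($samples as $name) {
    $parts = parseNumberedPdf($name);
    echo $name, ' => ', $parts === null ? 'no match' : $parts['base'] . ' #' . $parts['number'], PHP_EOL;
}
```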

php - regex exact matches

I have the following strings:
Falchion-Case
P90-ASH-WOOD-WELL-WORN
I also have the following URLS which are inside a text file:
http://csgo.steamanalyst.com/id/115714004/FALCHION-CASE-KEY-
http://csgo.steamanalyst.com/id/115716486/FALCHION-CASE-
http://csgo.steamanalyst.com/id/2018/P90-ASH-WOOD-WELL-WORN
I'm looping through each line in the text file and checking if the string is found inside the URL:
// Read from file
if (stristr($item, "stattrak") === FALSE) {
$lines = file(public_path().'/csgoanalyst.txt');
foreach ($lines as $line) {
// Check if the line contains the string we're looking for, and print if it does
if(preg_match('/(?<!\-)('.$item.')\b/',$line) != false) { // case insensitive
echo $line;
break;
}
}
}
This works perfectly when $item = P90-ASH-WOOD-WELL-WORN. However, when $item = Falchion-Case it matches both URLs, when only the second one, http://csgo.steamanalyst.com/id/115716486/FALCHION-CASE-, is valid.
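Try modifying your regex to anchor the match at the end of the line, assuming the item is the last thing on the line:
'/('.$item.')$/'
Anchored this way, a line such as
http://csgo.steamanalyst.com/id/115714004/FALCHION-CASE-KEY- can no longer match, because KEY- sits between the item and the end of the line.
This is basically an "ends with" type of match. Since your URLs end with a hyphen, you can also use
'/('.$item.')\-?$/'
to optionally match a trailing hyphen.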
You can also use a negative lookahead to negate that unwanted case:
preg_match('/(?<!-)'. preg_quote($item) .'(?!-\w)/i', $line);
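To see the lookaround version at work on the URLs from the question, here is a small self-contained sketch. The function name lineMatchesItem is made up for illustration, and passing '/' as the second argument to preg_quote is an addition so that the delimiter is escaped as well.

```php
<?php
// Hypothetical helper: true when $item appears in $line as a complete name,
// i.e. not preceded by "-" and not followed by "-" plus another word character.
function lineMatchesItem(string $item, string $line): bool
{
    $pattern = '/(?<!-)' . preg_quote($item, '/') . '(?!-\w)/i';
    return preg_match($pattern, $line) === 1;
}

$lines = [
    'http://csgo.steamanalyst.com/id/115714004/FALCHION-CASE-KEY-',
    'http://csgo.steamanalyst.com/id/115716486/FALCHION-CASE-',
    'http://csgo.steamanalyst.com/id/2018/P90-ASH-WOOD-WELL-WORN',
];
foreach ($lines as $line) {
    if (lineMatchesItem('Falchion-Case', $line)) {
        echo $line, PHP_EOL; // only the second URL is printed
    }
}
```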

Regex Match Exact Number at beginning (like 99 but not 999)

This should be a simple task, but searching for it all day I still can't figure out what I'm missing
I'm trying to open a file using PHP's glob() that begins with a specific number
Example filenames in a directory:
1.txt
123.txt
10 some text.txt
100 Some Other Text.txt
The filenames always begin with a unique number (which is what I need to use to find the right file) and are optionally followed by a space and some text and finally the .txt extension.
My problem is that no matter what I do, if I try to match the number 1 in the example folder above it will match every file that begins with 1, but I need to open only the file that starts with exactly 1, no matter what follows it, whether it be a space and text or just .txt
Some example regex that does not succeed at the task:
filepath/($number)*.txt
filepath/([($number)])( |*.?)*.txt
filepath/($number)( |*.?)*.txt
I'm sure there's a very simple solution to this... If possible I'd like to avoid loading every single file into a PHP array and using PHP to check every item for the one that begins with only the exact number, when surely regex can do it in a single action
A bonus would be if you also know how to turn the optional text between the number and the extension into a variable, but that is entirely optional as it's my next task after I figure this one out
The regex you want to use is: ^99(\D*\.txt)$
$re = "/^99(\D*\.txt)$/";
preg_match($re, $str, $matches);
This will match:
99.txt
99files.txt
but not:
199.txt
999.txt
99
99.txt.xml
99filesoftxt.dat
The ( ) around \D*\.txt create a capturing group that contains the rest of the file name after the number. \D* allows zero non-digit characters, so a bare 99.txt matches too.
I believe this is what you want OP:
$regex = '/^' . $number . '[^0-9][\S\s]+/';
This anchors the match at the start of the name, then matches the number, then any character that isn't a digit, then one or more of any other characters. If the number is 1, this would match:
1.txt
1abc.txt
1 abc.txt
1_abc.txt
1qrx.txt
But it would not match:
1
12.txt
2.txt
11.txt
1.
Here you go:
<?php
function findFileWithNumericPrefix($filepath, $prefix)
{
if (($dir = opendir($filepath)) === false) {
return false;
}
while (($filename = readdir($dir)) !== false) {
if (preg_match("/^$prefix\D/", $filename) === 1) {
closedir($dir);
return $filename;
}
}
closedir($dir);
return false;
}
$file = findFileWithNumericPrefix('/base/file/path', 1);
if ($file !== false) {
echo "Found file: $file";
}
?>
With your example directory listing, the result is:
Found file: 1.txt
You can use a regex like this:
^10\D.*txt$
^--- use the number you want
For instance:
$re = "/^10\\D.*txt$/m";
$str = "1.txt\n123.txt\n10 some text2.txt\n100 Some Other2 Text.txt";
preg_match_all($re, $str, $matches);
// will match only "10 some text2.txt"
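For the bonus part of the question (getting the optional text between the number and the extension), a sketch along these lines may help. It assumes the text, when present, is always separated from the number by a single space, as described in the question; the function name textAfterNumber is made up for illustration.

```php
<?php
// Hypothetical helper: returns the optional text between the number and ".txt",
// an empty string when there is no text, or null when the name does not match.
function textAfterNumber(string $filename, int $number): ?string
{
    // ^number, then optionally a space plus text (group 1), then .txt at the end.
    if (preg_match('/^' . $number . '(?: (.*))?\.txt$/', $filename, $m) !== 1) {
        return null;
    }
    // Group 1 is missing from $m when the optional part did not take part in the match.
    return $m[1] ?? '';
}

$names = ['1.txt', '123.txt', '10 some text.txt', '100 Some Other Text.txt'];
foreach ($names as $name) {
    $text = textAfterNumber($name, 10);
    if ($text !== null) {
        echo $name, ' => "', $text, '"', PHP_EOL;
    }
}
```

To avoid scanning the whole directory, the candidate list could first be narrowed with glob($dir . '/' . $number . '*.txt') and then filtered with this function on basename() of each result.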
