Regular Expression to detect a file extension - php

I need a regular expression that checks whether a filename string ends with .pdf, case-insensitively: it should match .pdf, .PDF, .Pdf, or any other mix of upper and lower case, in case the user has messy filenames.
This will be used in PHP 5. I know I could write a separate condition for each case variant, but I'm sure there's a more elegant way to do this.

There is nothing wrong with a regex, but there is also a ready-made function for dissecting a path and extracting the extension from it:
echo pathinfo("/path/to/myfile.txt", PATHINFO_EXTENSION); // txt (no leading dot)
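A small sketch of what pathinfo() actually returns, using made-up sample paths: the extension comes back without the leading dot, keeps its original case, and is an empty string when the name has no extension.

```php
<?php
// pathinfo() with PATHINFO_EXTENSION returns only the part after the LAST dot,
// without the dot itself; a path without an extension yields an empty string.
$paths = array("/path/to/myfile.txt", "/path/to/report.final.PDF", "/path/to/README");

foreach ($paths as $path) {
    $ext = pathinfo($path, PATHINFO_EXTENSION);
    echo $path, " => '", $ext, "'\n";
}
// /path/to/myfile.txt => 'txt'
// /path/to/report.final.PDF => 'PDF'
// /path/to/README => ''
```

Because the case is preserved ('PDF'), a case-insensitive comparison is still needed afterwards.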

How about this one? I don't know what language you are using, but here is a regex that matches anything ending in .pdf:
/.+[.][Pp][Dd][Ff]$/
My bad, I'm half asleep. PHP it is.
I don't know PHP, but that regex should work.

Another possibility is to lowercase the extension with strtolower():
strtolower(pathinfo("/path/file.pDf", PATHINFO_EXTENSION)) == "pdf"
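The same idea wrapped in a helper, as a sketch. hasExtension() is a hypothetical name, and it swaps strtolower() plus == for strcasecmp(), which compares two strings ignoring case and returns 0 when they are equal.

```php
<?php
// Hypothetical helper: true if $path's extension equals $wanted, ignoring case.
// pathinfo() returns the extension without the dot, so $wanted has no dot either.
function hasExtension($path, $wanted)
{
    return strcasecmp(pathinfo($path, PATHINFO_EXTENSION), $wanted) === 0;
}

var_dump(hasExtension("/path/file.pDf", "pdf"));     // bool(true)
var_dump(hasExtension("/path/file.pdf.txt", "pdf")); // bool(false): last extension is txt
var_dump(hasExtension("/path/pdf", "pdf"));          // bool(false): no extension at all
```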

As others have noted, extracting the extension would work; otherwise, you can do something like this:
preg_match('/.*\.pdf$/i', "match_me.pDf", $matches);

If your string consists of a single filename, here is a simple regex solution:
if (preg_match('/\.pdf$/i', $filename)) {
    // It's a PDF file
}
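One caveat with `$`: in PCRE it also matches just before a final newline, so "report.pdf\n" passes `/\.pdf$/i`. A minimal sketch with a hypothetical isPdf() helper that uses `\z` (the true end of the string) instead; adding the D modifier (`/\.pdf$/iD`) has the same effect.

```php
<?php
// Hypothetical helper: does the filename end in ".pdf", in any letter case?
// \z anchors at the very end of the string; unlike $, it does not also
// match before a trailing newline.
function isPdf($filename)
{
    return preg_match('/\.pdf\z/i', $filename) === 1;
}

$names = array("report.pdf", "REPORT.PDF", "messy.Pdf", "notes.pdf.txt", "pdf", "report.pdf\n");
foreach ($names as $name) {
    // json_encode() makes the trailing "\n" visible in the output.
    echo json_encode($name), " => ", isPdf($name) ? "yes" : "no", "\n";
}
// yes, yes, yes, no, no, no
```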

Related

PHP Regex - URL - link to a file

How can I identify via preg_match whether a string containing a URL actually points to a file rather than to a regular page? For example:
www.example.com/a.png
www.example.com/a/b/c/d.mp4
www.example.com/e/f/h.xls
If I just explode on "." and check the last element, it will not work. Also, I don't have the complete list of possible extensions and want to write something generic.
Thanks.
/\/.+\.(?!php5?$)[a-zA-Z0-9]{1,4}$/
(php and php5 are example blacklist entries here.)
Or
explode on "." and call array_pop() on the result.
I suggest using a whitelist instead of a blacklist: add only the allowed extensions.
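A sketch of the whitelist approach, assuming the built-in parse_url() and pathinfo() functions are acceptable. whitelistedExtension() and the sample whitelist are hypothetical. Taking only the URL path first means a query string such as ?download=1 does not end up in the extension.

```php
<?php
// Hypothetical helper: return the lower-cased extension of the file a URL
// points to, or null if it has none or it is not on the whitelist.
function whitelistedExtension($url, array $allowed)
{
    // PHP_URL_PATH keeps only the path (no scheme, host, query or fragment).
    // Without a scheme, parse_url() treats the whole string as a path.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true) ? $ext : null;
}

$allowed = array('png', 'mp4', 'xls'); // example whitelist, adjust as needed
foreach (array('www.example.com/a.png',
               'www.example.com/a/b/c/d.mp4',
               'http://www.example.com/e/f/h.xls?download=1',
               'http://www.example.com/index.php',
               'http://www.example.com/about') as $url) {
    var_dump(whitelistedExtension($url, $allowed)); // png, mp4, xls, NULL, NULL
}
```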

Regex pattern to detect a link but not an image

I've been fooling around with regex for a while. A few days ago I started modifying a regex pattern I found some time ago. It detects all hyperlinks; my version should detect only hyperlinks and not images. For example,
http://domain.com/someimage.jpg
shouldn't be detected. But my version still partly matches an image URL, and I don't know how to solve this.
The original regex:
/(https?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,10}(\/\S*)?/i
Link to my version:
http://regexr.com/38rv9
Please help. Thanks!
You just need to require a whitespace character at the end. (PHP has no g modifier; use preg_match_all() to find every match.)
/((https?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,10}(\/(?:(\S(?!jpg|jpeg|png|gif))*))?)\s/i
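In PHP the role of JavaScript's g flag is played by preg_match_all(). A sketch applying this answer's pattern (with only the i modifier) to a made-up sample sentence; group 1 holds the link without the trailing whitespace. Note that a link at the very end of the string, with no whitespace after it, is not matched by this pattern.

```php
<?php
// The pattern from this answer, without the JavaScript-only "g" flag.
$pattern = '/((https?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,10}(\/(?:(\S(?!jpg|jpeg|png|gif))*))?)\s/i';

$text = "see http://domain.com/page.html and http://domain.com/someimage.jpg here";

// preg_match_all() collects every match, which is what "g" does in JavaScript.
preg_match_all($pattern, $text, $matches);
print_r($matches[1]); // only http://domain.com/page.html
```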
I would accomplish this by making sure that what the user clicks does NOT end with an image file extension. You mention you are using PHP, so use ONE conditional statement that matches your original regex:
/(https?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,10}(\/\S*)?/i
but does not match any common image file extension at the END of the string:
/\.(?:jpe?g|gif|png|tif)$/i
This works no matter what text precedes the image file extension; preg_match() will be useful to accomplish this.
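A minimal sketch of that single condition. isNonImageLink() is a hypothetical name; it assumes ^ and $ anchors are added to the original link pattern (so the whole string is tested) and uses an alternation group for the image extensions, since a [...] character class matches single characters rather than whole extensions.

```php
<?php
// Hypothetical helper: a URL counts as a "link" if it matches the original
// hyperlink pattern AND does not end in a common image extension.
function isNonImageLink($url)
{
    $linkPattern  = '/^(https?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,10}(\/\S*)?$/i';
    $imagePattern = '/\.(?:jpe?g|gif|png|tif)$/i';

    return preg_match($linkPattern, $url) === 1
        && preg_match($imagePattern, $url) === 0;
}

var_dump(isNonImageLink('http://domain.com/page.html'));     // bool(true)
var_dump(isNonImageLink('http://domain.com/someimage.jpg')); // bool(false)
var_dump(isNonImageLink('https://domain.com/photo.PNG'));    // bool(false)
var_dump(isNonImageLink('not a url'));                       // bool(false)
```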

Regex to match base name of files with multiple extensions

I'm trying to match files of the following structure in PHP.
Input:
filename.ext1
filename.ext1.ext2
filename.ext3.ext2.ext1
filename.ext4.ext2.ext1.ext4
file name with spaces and no way of knowing how long.ext1
file name with spaces and no way of knowing how long.ext1.ext2
file name with spaces and no way of knowing how long.ext2.ext1.ext3
file name with spaces and no way of knowing how long.ext3.ext1.ext4.ext3
Output:
filename
filename
filename
filename
file name with spaces and no way of knowing how long
file name with spaces and no way of knowing how long
file name with spaces and no way of knowing how long
file name with spaces and no way of knowing how long
What I've already attempted (doesn't work of course and I already understand why):
^(?P<basename>.*)(\.ext4)|(\.ext3)|(\.ext2)|(\.ext1).*$
I'd like to extract the base name of the file and basically strip all extensions, because there's no way of knowing in which order they may appear. I've tried several solutions presented here but they did not work for me. The extensions could be anything alphanumeric of any length.
I'm fairly new to regular expressions and am confused that apparently you cannot simply search forward to the first dot and remove it including everything that comes after.
To learn, I'd also love to see how to do the reverse and just match all the extensions including the first dot.
Update:
I didn't think about file names that contain dots. So obviously my thinking regarding "searching forward" is flawed. Does anyone have a solution for the case
file name with spaces and no. way of knowing how long.ext3.ext1.ext4.ext3
or even
file name with spaces and no way of knowing.how.long.ext3.ext1.ext4.ext3
The latter one would quite possibly only work when certain extensions are given. So please assume ext1-4 are given but are in an unpredictable sequence.
Quick and dirty:
preg_replace("/\.(ext1|ext2|ext3|ext4)/i", "", $filename)
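There's no need to use regular expressions for this; PHP has the built-in function basename() for that. (Note that its optional $suffix argument strips only the one suffix you pass, so on its own it won't remove a chain of unknown extensions.)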
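A quick check with one of the question's sample names shows how far the built-in functions get; both calls below remove only the last extension.

```php
<?php
$file = "file name with spaces.ext3.ext1.ext4.ext3";

// basename() removes only the suffix you pass, and only once.
echo basename($file, ".ext3"), "\n";           // file name with spaces.ext3.ext1.ext4
// PATHINFO_FILENAME drops only the last extension.
echo pathinfo($file, PATHINFO_FILENAME), "\n"; // file name with spaces.ext3.ext1.ext4
```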
Does something simple like this work for you?
^[^.]*
Basically, it just matches the string before the first dot.
This regex should work for you:
^.+?(?=\.[^.]*$)
Online Demo: http://regex101.com/r/uT2oK5
This matches the file name up to the very last dot only. See all the examples included in the link.
am confused that apparently you cannot simply search forward to the first dot and remove it including everything that comes after.
Since regexes are read from left to right, looking for a single dot will lead you straight to the first dot. That means you could use:
preg_replace("/\..*/", "", $filename);
.* matches any characters except newlines.
If the filename has dots, this obviously won't work, since part of the filename will then be removed.
As per your update, if you know the specific extensions, you can use something like this:
preg_replace("/(?:\.ext[1-4])+$/m", "", $filename);
regex101 demo
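The question also asked for the reverse: matching all the extensions including the first dot. A sketch, assuming the known-extensions case (.ext1 to .ext4): the same (?:\.ext[1-4])+$ pattern used with preg_match() to capture the chain instead of removing it (the m modifier is dropped because a single filename has no line breaks).

```php
<?php
$filename = "file name with spaces and no way of knowing.how.long.ext3.ext1.ext4.ext3";

// One or more of .ext1 to .ext4, anchored at the end of the string.
// The leftmost possible match starts at the first ".ext", so the whole chain is captured.
if (preg_match('/(?:\.ext[1-4])+$/', $filename, $m)) {
    echo $m[0], "\n"; // .ext3.ext1.ext4.ext3
}
```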
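More generally, if you have an array of extensions at your disposal, you could use something like this (each extension is quoted separately, because preg_quote() would otherwise escape the | separators as well):
$exts = array(".ext1", ".ext2", ".ext3", ".ext4");
$quoted = array_map(function ($e) { return preg_quote($e, "/"); }, $exts);
$result = preg_replace("/(?:" . join("|", $quoted) . ")+$/m", "", $filename);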
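The array-based approach as a reusable function, as a sketch; stripExtensions() is a hypothetical name. It escapes each extension individually with preg_quote(), so the | between them remains a regex alternation, and it keeps dots that are part of the base name.

```php
<?php
// Hypothetical helper: strip any trailing run of the given extensions.
function stripExtensions($filename, array $exts)
{
    // Quote each extension on its own ("." becomes "\."), then join with "|".
    $quoted = array_map(function ($ext) {
        return preg_quote($ext, '/');
    }, $exts);
    return preg_replace('/(?:' . implode('|', $quoted) . ')+$/', '', $filename);
}

$exts = array(".ext1", ".ext2", ".ext3", ".ext4");
echo stripExtensions("filename.ext4.ext2.ext1.ext4", $exts), "\n";
// filename
echo stripExtensions("file name with spaces and no way of knowing.how.long.ext3.ext1.ext4.ext3", $exts), "\n";
// file name with spaces and no way of knowing.how.long
```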
.*(?=\.)
Try this: it matches everything before the last dot, even if there's a dot in the file name.
This is easy with plain old PHP functions; no need for a fancy regex.
$name = substr($filename, 0, strpos($filename, '.'));
This won't work for filenames that contain a dot, like your updated example; to handle those, you would likely need to know in advance which extensions you'll encounter.
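One more edge case for the substr()/strpos() approach: when the name has no dot at all, strpos() returns false and substr($filename, 0, false) returns an empty string. A sketch with a hypothetical beforeFirstDot() helper that guards against that:

```php
<?php
// Hypothetical helper: everything before the first dot, or the whole name
// when there is no dot (strpos() returns false in that case).
function beforeFirstDot($filename)
{
    $pos = strpos($filename, '.');
    return $pos === false ? $filename : substr($filename, 0, $pos);
}

echo beforeFirstDot("filename.ext3.ext2.ext1"), "\n"; // filename
echo beforeFirstDot("README"), "\n";                  // README
```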

How to catch filetypes in malformed URLs

Just wondering how I can extract or match the specific file type, since there are a lot of malformed URLs and directories.
So I need a good regex to match only the real ones.
http://domain.com/1/image.jpg <-match .jpg
http://domain.com/1/image_1.jpg/.gif <-match first .jpg
http://domain.com/1/image_1.jpg/image.png <-match first .jpg
http://domain.com/1/image_1.jpg <-match .jpg
http://domain.com/1/image.jpg.jpeg <-match only the first .jpg
http://domain.com/1/.jpg <-not match
http://domain.com/.jpg.jpg <- not match
/1/.jpg <-not match
/.jpg.png <-match the first jpg
/image.jpg.png <-match the first jpg
I'm trying with this piece of code:
preg_match_all('([a-zA-Z0-9.-_](jpg))i', $url, $matches);
Any ideas?
preg_match('(^(http://domain\.com/\w.*?\.jpg))i', $url, $matches);
This will match everything from the start of the string up to the first .jpg. The first character after the domain must be a letter, digit, or _.
Parsing URLs with regular expressions is usually a bad idea. See Getting parts of a URL (Regex) for a related question. In particular, look at this answer, then realize that parse_url might be a good start. Take $result['path'] and use a file name parsing API on it to extract the extension.
I'm not sure exactly what you are asking for though.
http://domain.com/1/image_1.jpg/.gif <-match first .jpg
http://domain.com/1/image_1.jpg/image.png <-match first .jpg
In both of these cases, image_1.jpg is a perfectly valid directory name. You could split the path on '/' and check each segment for "validity".
Edit: I just noticed that you need this to work with relative URLs as well. parse_url does not work well in that case.
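A sketch of splitting the path on '/' and checking each segment. It works directly on a path string, so it does not depend on parse_url() and also handles relative paths. The "validity" rule is a hypothetical one chosen for illustration: the first segment with a non-empty name followed by .jpg, .jpeg, .png or .gif wins. Under this rule /.jpg.png is not matched, which differs from one line of the question's list, so adjust the rule to your data.

```php
<?php
// Hypothetical rule: return the extension of the first path segment that has
// a non-empty name before a known image extension, or null if none does.
function firstImageExtension($path)
{
    foreach (explode('/', $path) as $segment) {
        // Name without dots, then the extension, then either another dot or the end.
        if (preg_match('/^[^.]+\.(jpe?g|png|gif)(?=\.|$)/i', $segment, $m)) {
            return strtolower($m[1]);
        }
    }
    return null;
}

var_dump(firstImageExtension('/1/image_1.jpg/image.png')); // string(3) "jpg"
var_dump(firstImageExtension('/1/image.jpg.jpeg'));        // string(3) "jpg"
var_dump(firstImageExtension('/image.jpg.png'));           // string(3) "jpg"
var_dump(firstImageExtension('/1/.jpg'));                  // NULL
```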

I need a regex to return true for 01.mp4

I want to sort out specific files depending on their names. I want a regex that returns true for file names like 01.mp4, 99.avi, 05.mpg.
The file extensions must match exactly the ones I want, and the part before the extension can be at most 2 characters long. The first part is done, but the file extensions aren't working. I need some help.
The regex that I have is
/^[0-9]{1,2}\.[mp4|mpg|avi]*/
but it also returns true for 01.4mp4 and 01.4mp4m.
Try this:
/^[0-9]{1,2}\.(?:mp4|mpg|avi)$/
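A quick way to check this pattern against the names from the question plus a couple of extra made-up cases. The (?:...) group matches whole alternatives rather than single characters, and the $ anchor rejects anything after the extension. If uppercase extensions such as .MP4 should also pass, an i modifier would be needed.

```php
<?php
$pattern = '/^[0-9]{1,2}\.(?:mp4|mpg|avi)$/';

// The first three should pass; the rest should be rejected.
foreach (array('01.mp4', '99.avi', '05.mpg', '01.4mp4', '01.4mp4m', '123.mp4', '01.pm4') as $name) {
    echo $name, ' => ', preg_match($pattern, $name) ? 'true' : 'false', "\n";
}
```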
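I played a bit with my original regex and came up with this:
/^[0-9]{1,2}\.[mp4|mpg|avi]{3}$/
and this works like a charm :) (Note that [mp4|mpg|avi] is a character class, so this also accepts names such as 01.pm4 or 01.aaa; the (?:mp4|mpg|avi) group is stricter.)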
/^[0-9]{1,2}\.[mp4|mpg|avi]?/
(This has the same character-class problem and no $ anchor, so it also accepts 01.4mp4.)
