Regular expression filename matching - php

I am writing a PHP function in Drupal to detect duplicate file uploads and attempting to compare the uploaded filename to previously uploaded files.
I have example files of:
trees-nature_0.jpg
trees-nature_1.jpg
trees-nature0.jpg
trees-nature.jpg
I am trying to match all of them using the following code:
file_scan_directory('image/uploads', "/trees-nature[*]?.jpg/");
However, all I get back is trees-nature.jpg.
I would appreciate some correction.

Your regex is not correct. Use:
file_scan_directory('image/uploads', '/trees-nature.*?\.jpg/');

You can use the following:
file_scan_directory('image/uploads', '/trees-nature(.*?)\.jpg/');
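Corrections:
[ ] cannot be used as parentheses; in a regex it defines a character class, so [*] matches a literal asterisk.
* is not a wildcard in regex; it is a quantifier meaning "zero or more of the preceding element", so you have to use .* instead.
. also has a special meaning here (any character), so you need to escape it as \. to match a literal dot.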
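The corrections above can be checked without Drupal. This is a minimal sketch, assuming the four example names sit in a plain array and using preg_grep as a stand-in for file_scan_directory (the variable names are illustrative, and the ^ and $ anchors are an addition not present in the answers):

```php
<?php
// Example filenames from the question (stand-in for a directory listing).
$files = [
    'trees-nature_0.jpg',
    'trees-nature_1.jpg',
    'trees-nature0.jpg',
    'trees-nature.jpg',
];

// Original pattern: [*]? is an optional literal asterisk and the unescaped
// dot matches any single character, so only "trees-nature.jpg" fits.
$original = array_values(preg_grep('/trees-nature[*]?.jpg/', $files));

// Corrected pattern: .* matches any run of characters, \. is a literal dot.
// The anchors (an assumption here) keep partial names from matching.
$corrected = array_values(preg_grep('/^trees-nature.*\.jpg$/', $files));

print_r($original);
print_r($corrected);
```

Without the anchors, a name such as my-trees-nature.jpg.bak would also match, which may or may not be wanted for duplicate detection.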

Related

Accept only single dot while getting files?

When I try to upload an image using PHP, it accepts files ending with .php.jpg, for example acp_1.php.jpg. How can I stop a user from uploading this kind of file?
Using this regex:
^[^\.\s]+\.[^\.\s]+$
This ensures that the name contains exactly one dot, with one or more non-dot, non-whitespace characters on each side of it, which does the job you asked for.
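You can use strpos($string, '.php').
Then just add a condition such as if (strpos($string, '.php') !== false) and reject the upload in that case. The strict comparison matters because strpos() returns 0 when the match is at the start of the string.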
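Both suggestions can be tried side by side. The following is a small sketch, not a complete upload validator; the helper names has_single_dot and contains_php are hypothetical and exist only for this illustration:

```php
<?php
// Hypothetical helper: true when the name has exactly one dot with
// non-dot, non-whitespace characters on both sides.
function has_single_dot(string $name): bool
{
    return preg_match('/^[^.\s]+\.[^.\s]+$/', $name) === 1;
}

// Hypothetical helper: true when ".php" appears anywhere in the name.
// strpos() returns 0 for a match at position 0, hence the strict check.
function contains_php(string $name): bool
{
    return strpos($name, '.php') !== false;
}

var_dump(has_single_dot('acp_1.php.jpg')); // two dots, so rejected
var_dump(has_single_dot('photo.jpg'));     // one dot, so accepted
var_dump(contains_php('acp_1.php.jpg'));   // ".php" found
```

The single-dot rule is the stricter of the two: it also rejects names such as my.holiday.jpg, which the strpos() check would let through.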

PHP Regex - URL - link to a file

How can I identify via preg_match that a string containing a URL is actually pointing to a file and not to a valid page. For example:
www.example.com/a.png
www.example.com/a/b/c/d.mp4
www.example.com/e/f/h.xls
If I just do an explode on "." and check last index, it will not work. Also, I don't have the complete list of possible extensions and want to write something generic.
Thanks.
\/.+\.(?!php|php5)[a-zA-Z0-9]{1,4}
(php and php5 are examples for blacklist here)
Or
explode on . and do an array_pop on it.
I suggest using a whitelist instead of a blacklist: accept only the extensions you explicitly allow.
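The whitelist idea could look roughly like the sketch below. It assumes that parse_url() and pathinfo() are acceptable in place of a single preg_match call; the function name url_points_to_file and the contents of $allowed are illustrative only:

```php
<?php
// Hypothetical helper: true when the URL path ends in a whitelisted extension.
function url_points_to_file(string $url, array $allowed): bool
{
    // Drop any query string or fragment before looking at the extension.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $ext !== '' && in_array($ext, $allowed, true);
}

// Illustrative whitelist; a real list depends on the application.
$allowed = ['png', 'mp4', 'xls'];

var_dump(url_points_to_file('www.example.com/a.png', $allowed));
var_dump(url_points_to_file('www.example.com/index.php', $allowed));
```

Because a URL without a scheme is treated as a bare path, a host-only string such as www.example.com yields the "extension" com, which the whitelist then rejects.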

Regex to match base name of files with multiple extensions

I'm trying to match files of the following structure in PHP.
Input:
filename.ext1
filename.ext1.ext2
filename.ext3.ext2.ext1
filename.ext4.ext2.ext1.ext4
file name with spaces and no way of knowing how long.ext1
file name with spaces and no way of knowing how long.ext1.ext2
file name with spaces and no way of knowing how long.ext2.ext1.ext3
file name with spaces and no way of knowing how long.ext3.ext1.ext4.ext3
Output:
filename
filename
filename
filename
file name with spaces and no way of knowing how long
file name with spaces and no way of knowing how long
file name with spaces and no way of knowing how long
file name with spaces and no way of knowing how long
What I've already attempted (doesn't work of course and I already understand why):
^(?P<basename>.*)(\.ext4)|(\.ext3)|(\.ext2)|(\.ext1).*$
I'd like to extract the base name of the file and basically strip all extensions, because there's no way of knowing in which order they may appear. I've tried several solutions presented here but they did not work for me. The extensions could be anything alphanumeric of any length.
I'm fairly new to regular expressions and am confused that apparently you cannot simply search forward to the first dot and remove it including everything that comes after.
To learn, I'd also love to see how to do the reverse and just match all the extensions including the first dot.
Update:
I didn't think about file names that contain dots. So obviously my thinking regarding "searching forward" is flawed. Does anyone have a solution for the case
file name with spaces and no. way of knowing how long.ext3.ext1.ext4.ext3
or even
file name with spaces and no way of knowing.how.long.ext3.ext1.ext4.ext3
The latter one would quite possibly only work when certain extensions are given. So please assume ext1-4 are given but are in an unpredictable sequence.
Quick and dirty:
preg_replace("/\.(ext1|ext2|ext3|ext4)/i", "", $filename)
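There's no need to use regular expressions for this; PHP has the built-in function basename() for that. Note, however, that basename() only strips a single suffix, which you must pass explicitly as its second argument.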
Does something simple like this work for you?
^[^.]*
Basically, it just matches the string before the first dot.
This regex should work for you:
^.+?(?=\.[^.]*$)
Online Demo: http://regex101.com/r/uT2oK5
This matches the file name up to the very last dot only. See all the examples included in the link.
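To make the "very last dot only" behaviour concrete, here is a short sketch. The wrapper function base_before_last_dot is hypothetical, and the comparison with pathinfo() is an addition for illustration:

```php
<?php
// Hypothetical helper: return everything before the last dot,
// or the whole name when it contains no dot at all.
function base_before_last_dot(string $name): string
{
    // The lookahead requires a dot followed only by non-dot characters
    // up to the end of the string, i.e. the final extension.
    return preg_match('/^.+?(?=\.[^.]*$)/', $name, $m) === 1 ? $m[0] : $name;
}

echo base_before_last_dot('filename.ext1'), "\n";           // filename
echo base_before_last_dot('filename.ext3.ext2.ext1'), "\n"; // filename.ext3.ext2

// pathinfo() strips only the last extension as well.
echo pathinfo('filename.ext3.ext2.ext1', PATHINFO_FILENAME), "\n";
```

So this pattern removes one extension per pass; stripping a whole chain of extensions needs either repeated application or a pattern that knows the extensions.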
"am confused that apparently you cannot simply search forward to the first dot and remove it including everything that comes after."
Since regexes are read from left to right, looking for a single dot will lead you straight to the first dot. That said, you would thus be able to use:
preg_replace("/\..*/", "", $filename);
.* matches any characters except newlines.
If the filename has dots, this obviously won't work, since part of the filename will then be removed.
As per update, if you have the specific extensions, you can use something like this:
preg_replace("/(?:\.ext[1-4])+$/m", "", $filename);
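More generally, you could use something like this if you have an array of extensions at your disposal. Each extension is quoted separately, because calling preg_quote() on the joined string would also escape the | separators:
$exts = array(".ext1", ".ext2", ".ext3", ".ext4");
$quoted = array_map(function ($ext) { return preg_quote($ext, "/"); }, $exts);
$result = preg_replace("/(?:". join("|", $quoted) .")+$/m", "", $filename);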
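The known-extensions approach can be exercised against the dotted file names from the question's update. This is a sketch under the assumption that the extension list is given in advance; the function name strip_known_extensions is hypothetical:

```php
<?php
// Hypothetical helper: strip any trailing run of the given extensions.
function strip_known_extensions(string $filename, array $exts): string
{
    // Quote each extension on its own; quoting the joined string
    // would also escape the "|" separators and break the alternation.
    $quoted = array_map(function ($ext) {
        return preg_quote($ext, '/');
    }, $exts);
    $pattern = '/(?:' . implode('|', $quoted) . ')+$/';
    return preg_replace($pattern, '', $filename);
}

$exts = ['.ext1', '.ext2', '.ext3', '.ext4'];

echo strip_known_extensions('filename.ext4.ext2.ext1.ext4', $exts), "\n";
echo strip_known_extensions(
    'file name with spaces and no way of knowing.how.long.ext3.ext1.ext4.ext3',
    $exts
), "\n";
```

Because the pattern is anchored at the end of the string, dots inside the base name (such as knowing.how.long) are left alone.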
.*(?=\.)
Try this. It will match everything before the last dot, even if there's a dot in the file name.
This is easy with just plain old php functions. No need for fancy regex.
$name = substr($filename, 0, strpos($filename, '.'));
This won't work for filenames that contain a dot, like your updated example; to handle those, you would need to know in advance which extensions you are likely to encounter.

replicate preg replace with javascript

Is it possible to replicate this with javascript?
preg_replace('/(.gif|.jpg|.png)/', '_thumb$1', $f['logo']);
EDIT - I am now getting the following error for this piece of code:
unterminated string literal
$('#feed').prepend('<div class="feed-item"><img src="'+html.logo.replace(/(.gif|.jpg|.png)/g, "_thumb$1")+'"/>
<div class="content">'+html.content+'</div></div>').fadeIn('slow');
There are a couple of problems with the code you are trying to replicate:
It matches "extensions" even if they aren't at the end of the filename.
The dot in a regular expression matches (nearly*) any character, not just a period.
Try this instead:
'abc.jpg'.replace(/\.(jpg|gif|png)$/, '_thumb$&')
I'm assuming that the string you are trying to replace contains only a single filename.
*See the documentation for PCRE_DOTALL.
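The two problems listed above are visible on the PHP side too. The sketch below uses a made-up filename, my_png_logo.gif, chosen only because it triggers both problems; the anchored pattern mirrors the suggested JavaScript one, with $0 as PHP's whole-match reference:

```php
<?php
$logo = 'my_png_logo.gif'; // illustrative input, not from the question

// Original pattern: the unescaped dot lets "_png" match ".png", and
// nothing ties the match to the end of the name.
$loose = preg_replace('/(.gif|.jpg|.png)/', '_thumb$1', $logo);

// Escaped dot plus end anchor: only the real extension is touched.
$strict = preg_replace('/\.(gif|jpg|png)$/', '_thumb$0', $logo);

echo $loose, "\n";  // my_thumb_png_logo_thumb.gif
echo $strict, "\n"; // my_png_logo_thumb.gif
```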
Yes, except that in JavaScript, replace is a string's method, so it would be rearranged a little (also, the array/object notation is slightly different):
f.logo.replace(/\.(gif|jpg|png)/, '_thumb.$1');
somestringvar.replace(/(.gif|.jpg|.png)/, replacementValue)

Extract multiple lines of text from string PHP

I have a string of registry keys that looks like this:
ThreatName REG_SZ c:\temp Protection Code REG_SZ a Check ThreatName REG_SZ c:\windows Protection
I want to extract "c:\WHATEVER" from the string. It occurs multiple times between the words "ThreatName REG_SZ" and "Protection".
How can I extract "c:\WHATEVER" multiple times using PHP?
One way to do it is with regular expressions. Here is some example code (untested).
$string = "ThreatName REG_SZ c:\\temp Protection
Code REG_SZ c:\\a Check
ThreatName REG_SZ c:\\windows Protection";
preg_match_all("~.* REG_SZ (.*) ~iU", $string, $matches);
print_r($matches);
If you want to understand more, see the PHP manual for preg_match_all(), or search for regular expressions for more information on them. Basically, the pattern captures whatever follows REG_SZ up to the next space (the U modifier makes the quantifiers ungreedy, so the capture stops at the first space rather than the last one). The .* matches everything except the newline character. If the value is spread across several lines, the 's' modifier will help resolve that.
EDIT: Saw that you wanted them all. This should work for all of them.
EDIT: Fixed the regex to include "ThreatName", not sure if this is dynamic. Also added extra slashes to the string as they were being parsed as characters.
I am not sure whether you will have to use addslashes() on the string or not, but it may be needed.
Removed the isset as it was not necessary.
EDIT: Modified the code given that the correct output formatting was omitted. The updated method will work, but if the directory has a space in it, chances are it will only pull the first part of the directory.
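One way around the space limitation is to anchor the pattern on both surrounding markers, so the capture ends at " Protection" rather than at the first space. This is a sketch of that variation, not the answerer's code; the third entry (c:\program files) is an invented value added to show a path containing a space:

```php
<?php
// Single quotes keep the backslashes literal (no "\t" tab surprises).
$string = 'ThreatName REG_SZ c:\temp Protection Code REG_SZ a Check '
        . 'ThreatName REG_SZ c:\windows Protection '
        . 'ThreatName REG_SZ c:\program files Protection';

// Lazy capture between the two fixed markers; entries that do not start
// with "ThreatName REG_SZ" (such as the "Code" entry) are skipped.
preg_match_all('~ThreatName REG_SZ (.+?) Protection~', $string, $matches);

// $matches[1] holds only the captured paths.
print_r($matches[1]);
```

If the values can span several lines, adding the s modifier lets the dot match newlines as well.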
