PHP string manipulation help with timestamped files

I'm trying to work out the best way to remove a timestamp from a filename using PHP's string functions. The timestamp is separated from the rest of the filename by an underscore on the left and by the dot that starts the file extension on the right (e.g. myfile_12343434.jpg). I only ever want the text prior to the underscore, although its length can vary. What's the best way to deal with this? Thanks!

Edit: to leave the extension intact (including e.g. .gd2 and .JPEG), do this:
$new = preg_replace("/_\\d+(\\.[a-z0-9]+)\$/i","\\1",$orig);
This effectively removes only the "_123" part, in a not-so-pretty way: it matches the extension too and puts it back via \1. For the purists among us, here is a version with a lookahead assertion, which matches only the timestamp:
$new = preg_replace("/_\\d+(?=\\.[0-9a-z]+\$)/i","",$orig);

You could use this:
$filename = explode("_", $orig_filename)[0];
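If the part before the timestamp can itself contain underscores, taking the first piece from explode() would cut it short. A minimal sketch of a non-regex alternative, assuming the timestamp always follows the last underscore (base_before_timestamp is a made-up helper name, not a PHP function):

```php
<?php
// Hypothetical helper: returns the part of a file name before its last underscore.
function base_before_timestamp(string $filename): string
{
    $pos = strrpos($filename, '_');   // offset of the last underscore, or false if none
    return $pos === false ? $filename : substr($filename, 0, $pos);
}

echo base_before_timestamp('myfile_12343434.jpg'), "\n";  // myfile
echo base_before_timestamp('my_file_12343434.jpg'), "\n"; // my_file
```

A name without any underscore is returned unchanged, which may or may not be what you want.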

The best way is to use preg_replace() to specify an exact match. A good start is something like the following (which will also preserve the extension):
$new = preg_replace("/_\d+/","",$orig);
But since this is a unix timestamp, we can do better by specifying the length of the numeric portion that it will match on:
$new = preg_replace("/_\d{1,11}/","",$orig);
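Neither pattern above is anchored, so an "_2" in the middle of a name would be removed as well. A sketch that combines the length limit with a lookahead for the final extension, assuming the timestamp always sits directly before that extension (strip_timestamp is a hypothetical name):

```php
<?php
// Hypothetical helper: strips a "_<digits>" group that sits right before the extension.
function strip_timestamp(string $filename): string
{
    // (?=\.[^.]+$) is a lookahead: the digits must be followed by the last extension.
    return preg_replace('/_\d{1,11}(?=\.[^.]+$)/', '', $filename);
}

echo strip_timestamp('myfile_12343434.jpg'), "\n";    // myfile.jpg
echo strip_timestamp('my_2_file_12343434.jpg'), "\n"; // my_2_file.jpg
```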

Related

Regex to match base name of files with multiple extensions

I'm trying to match files of the following structure in PHP.
Input:
filename.ext1
filename.ext1.ext2
filename.ext3.ext2.ext1
filename.ext4.ext2.ext1.ext4
file name with spaces and no way of knowing how long.ext1
file name with spaces and no way of knowing how long.ext1.ext2
file name with spaces and no way of knowing how long.ext2.ext1.ext3
file name with spaces and no way of knowing how long.ext3.ext1.ext4.ext3
Output:
filename
filename
filename
filename
file name with spaces and no way of knowing how long
file name with spaces and no way of knowing how long
file name with spaces and no way of knowing how long
file name with spaces and no way of knowing how long
What I've already attempted (doesn't work of course and I already understand why):
^(?P<basename>.*)(\.ext4)|(\.ext3)|(\.ext2)|(\.ext1).*$
I'd like to extract the base name of the file and basically strip all extensions, because there's no way of knowing in which order they may appear. I've tried several solutions presented here but they did not work for me. The extensions could be anything alphanumeric of any length.
I'm fairly new to regular expressions and am confused that apparently you cannot simply search forward to the first dot and remove it including everything that comes after.
To learn, I'd also love to see how to do the reverse and just match all the extensions including the first dot.
Update:
I didn't think about file names that contain dots. So obviously my thinking regarding "searching forward" is flawed. Does anyone have a solution for the case
file name with spaces and no. way of knowing how long.ext3.ext1.ext4.ext3
or even
file name with spaces and no way of knowing.how.long.ext3.ext1.ext4.ext3
The latter one would quite possibly only work when certain extensions are given. So please assume ext1-4 are given but are in an unpredictable sequence.

Quick and dirty:
preg_replace("/\.(ext1|ext2|ext3|ext4)/i", "", $filename)

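There's no need to use regular expressions for this; PHP has the built-in function basename() for that. Note, though, that basename() only removes one trailing suffix, and only if you pass that suffix as its second argument.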

Does something simple like this work for you?
^[^.]*
Basically it just matches string before first dot.

This regex should work for you:
^.+?(?=\.[^.]*$)
Online Demo: http://regex101.com/r/uT2oK5
This will find file names before very last dot only. See all the examples included in the link.

am confused that apparently you cannot simply search forward to the first dot and remove it including everything that comes after.
Since regexes are read from left to right, looking for a single dot will lead you straight to the first dot. That said, you would thus be able to use:
preg_replace("/\..*/", "", $filename);
.* matches any characters except newlines.
If the filename has dots, this obviously won't work, since part of the filename will then be removed.
As per update, if you have the specific extensions, you can use something like this:
preg_replace("/(?:\.ext[1-4])+$/m", "", $filename);
regex101 demo
In a broader perspective, you could use something like this if you have an array of extensions at your disposition:
$exts = array(".ext1", ".ext2", ".ext3", ".ext4");
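// Quote each extension separately: preg_quote() on the joined string would also escape the "|"
$result = preg_replace("/(?:". join("|", array_map('preg_quote', $exts)) .")+$/m", "", $filename);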
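The question also asked for the reverse: matching just the run of extensions. A rough sketch that returns both halves at once, assuming the known extensions are passed without their leading dots (split_known_extensions is an invented name):

```php
<?php
// Hypothetical helper: splits a name into [base name, trailing run of known extensions].
function split_known_extensions(string $filename, array $exts): array
{
    // Quote each extension on its own, then join with "|" so the alternation survives.
    $alternatives = implode('|', array_map(
        function ($ext) { return preg_quote($ext, '/'); },
        $exts
    ));
    // The lazy (.*?) grows only until the remainder consists entirely of known extensions.
    if (preg_match('/^(.*?)((?:\.(?:' . $alternatives . '))+)$/is', $filename, $m)) {
        return array($m[1], $m[2]);
    }
    return array($filename, '');      // no known extension at the end
}

$exts = array('ext1', 'ext2', 'ext3', 'ext4');
list($base, $extensions) = split_known_extensions(
    'file name with spaces and no way of knowing.how.long.ext3.ext1.ext4.ext3',
    $exts
);
echo $base, "\n";       // file name with spaces and no way of knowing.how.long
echo $extensions, "\n"; // .ext3.ext1.ext4.ext3
```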

.*(?=\.)
Try this? It will match everything before the last dot, even if there's a dot in the file name.

This is easy with just plain old php functions. No need for fancy regex.
$name = substr($filename, 0, strpos($filename, '.'));
This won't work for filenames which have a . like your updated example, however in order to achieve this you would likely need to know in advance the extensions which you are likely to encounter.

decoding Timestamps explicitly encoded in a filename

Quite a while ago I decided to change the way filenames are formatted when users upload a file to my website, so that they include the Unix timestamp. This was a great idea, and it proved very useful after the timestamps associated with files in my database were unfortunately wiped. The filename structure has since changed considerably, and I am now trying to recover the valid timestamp from the filenames; so far I have only been able to recover half.
At first, I structured the filename like this:
dcd1322318879.png - letters followed by the Unix timestamp.
Then I allowed people to log in and made the filename much more human-friendly. They now read as follows:
username-09092012-183422.jpeg, where username is the logged-in user's name. My problem is that I am awful with regex. For the previous filenames (without the user data and human-friendly date) my code was:
$newDate = preg_replace("/[^0-9]/","", $data['name']);
However, that obviously will not work now, as usernames can contain numbers, and those digits would then be included in the final number for the date.
I am wondering if there is any way I can resolve this issue. I guess the hyphens (-) I added will be useful as a separator, but I have no idea where to start. I just need the last two parts of the filename, which contain the date and time, so that I can convert them into Unix timestamps.
Any help is greatly appreciated.

There you go, yet another way, including translation to unix timestamp:
// Anchor on the date and time at the end, so digits in the username are ignored
preg_match('/(\d{8})-(\d{6})\.[^.]+$/', $string, $matches);
$d = DateTime::createFromFormat('dmY His', $matches[1].' '.$matches[2]);
$timestamp = $d->getTimestamp();

Try this:
preg_match_all("/(\d{8})-(\d+)/", $str, $matches);

The easiest way is using
$array = explode ("-", $filename);
print_r ($array);
unless you really want to go with regex, but I don't think it's needed, since the system is the one doing all the creating, meaning there will be no user input. It can also depend on how you have your script set up.

From your question, I assume that the hyphens can be used as the structure for the regex. That is, usernames and dates won't contain them.
So then, this pretty simple pattern would work:
/(.*?)-(.*?)-(.*)/
Group 1 will contain the username, group 2 contains the date, and group 3 contains the rest of the filename, here the time and extension (note that everything after the second hyphen can be anything at all, including more hyphens).
(the .*? is the dot, quantified at 'zero or more times' by the *, and the ? makes the quantifier lazy, so that the . doesn't just greedily race to the end of the string)
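To see what each group actually captures, here is a small sketch applying that pattern to the sample name from the question (split_on_hyphens is just an illustrative wrapper):

```php
<?php
// Hypothetical wrapper: returns the three captured groups, or null if there is no match.
function split_on_hyphens(string $name): ?array
{
    if (preg_match('/(.*?)-(.*?)-(.*)/', $name, $m)) {
        return array($m[1], $m[2], $m[3]);
    }
    return null;
}

print_r(split_on_hyphens('username-09092012-183422.jpeg'));
// Groups: "username", "09092012", "183422.jpeg"
```

The third group still carries the extension, so it would need trimming before being used as a time.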

To parse all three components from your file name, try this:
<?php
preg_match(
"/^([\w]+)-(\d{8})-(\d{6})/",
"username-09092012-183422.jpeg",
$matches);
echo "user=[{$matches[1]}]<br/>date=[{$matches[2]}]<br/>time=[{$matches[3]}]<br/>";
?>
Which produces:
user=[username]
date=[09092012]
time=[183422]
You might also consider incorporating the date in your file names as YYYYMMDD. For example:
'09092012' => '20120909'
This will make it easier to sort your data by date should you ever wish to do so.
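Putting the pieces together, a sketch that recovers a Unix timestamp from either naming scheme. It rests on assumptions the question does not confirm: that 09092012 means day-month-year, that the times were recorded in UTC, and that the old scheme always used a 10-digit timestamp. timestamp_from_filename is a made-up name.

```php
<?php
// Hypothetical helper: returns the Unix timestamp encoded in a file name, or null.
function timestamp_from_filename(string $name, string $tz = 'UTC'): ?int
{
    // Newer scheme: "<username>-DDMMYYYY-HHMMSS.<ext>". Anchoring at the end means
    // digits or hyphens inside the username are ignored.
    if (preg_match('/-(\d{8})-(\d{6})\.[^.]+$/', $name, $m)) {
        $d = DateTime::createFromFormat('dmY His', $m[1] . ' ' . $m[2], new DateTimeZone($tz));
        return $d === false ? null : $d->getTimestamp();
    }
    // Older scheme: letters followed by a 10-digit timestamp, e.g. "dcd1322318879.png".
    if (preg_match('/^[a-z]+(\d{10})\.[^.]+$/i', $name, $m)) {
        return (int) $m[1];
    }
    return null;
}

var_dump(timestamp_from_filename('dcd1322318879.png'));           // int(1322318879)
var_dump(timestamp_from_filename('user99-09092012-183422.jpeg')); // timestamp for 2012-09-09 18:34:22 UTC
```

If the server clock was not in UTC when the files were named, pass the appropriate zone name as the second argument.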

php string manipulation nonrandom sort

I am trying to rearrange a 4-character string that's being fed in from a user into a different order. For example, they might type "abcd", which I then turn into "bcad".
Here is an example of my attempt, which is not working :P
<?php
$mixedDate = $_REQUEST['userDate'];
$formatted_date = firstSubString($mixedDate,2).secondSubString($mixedDate,3).thirdSubString($mixedDate,1).fourthSubString($mixedDate,4);
//... maybe some other stuff here then echo formatted_date
?>
Any help would be appreciated.

Copied from comment:
You could pretty simply do this by doing something like:
$formatted_date = $mixedDate[1].$mixedDate[2].$mixedDate[0].$mixedDate[3];
That way, you don't have to bother with calling a substring method many times, since you're just moving individual characters around.

<?php
$mixedDate = $_REQUEST['userDate'];
$formatted_date = $mixedDate[1].$mixedDate[2].$mixedDate[0].$mixedDate[3];
echo $formatted_date;
?>
The square-bracket offset syntax allows you to get just that one character from your string. (The older curly-brace form, $mixedDate{1}, was deprecated in PHP 7.4 and removed in PHP 8.0.)
It should be noted that this works correctly on your sample string: if $_REQUEST['userDate'] is abcd, it is turned into bcad.

Look into explode() in PHP (the old split() function was deprecated in PHP 5.3 and removed in PHP 7). It takes a delimiter and a string, then splits the string into an array. Either force the user to use a certain format or use a regex on the input string to put the date into a known format, like dd/mm/yyyy or dd-mm-yyyy, then use the hyphen or / as the delimiter.
Once the string is split into an array, you can rearrange it any way you like.
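As a rough illustration of the split-then-rearrange idea, assuming the input is dd-mm-yyyy or dd/mm/yyyy and the desired output is mm-dd-yyyy (swap_day_month is a hypothetical helper; preg_split() is used here so either delimiter is accepted):

```php
<?php
// Hypothetical helper: turns "dd-mm-yyyy" or "dd/mm/yyyy" into "mm-dd-yyyy".
function swap_day_month(string $date): ?string
{
    $parts = preg_split('#[-/]#', $date);   // split on either delimiter
    if (count($parts) !== 3) {
        return null;                        // not in the expected three-part format
    }
    list($day, $month, $year) = $parts;
    return $month . '-' . $day . '-' . $year;
}

echo swap_day_month('21-12-2010'), "\n"; // 12-21-2010
echo swap_day_month('21/12/2010'), "\n"; // 12-21-2010
```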

That is very simple.
If
$mixedDate = '21-12-2010';
then try this:
echo substr($mixedDate, 3, 2).'-'.substr($mixedDate, 0, 2).'-'.substr($mixedDate, 6);
this will result in
12-21-2010
This is assuming the format is fixed.

Use str_split() to break the string into single characters:
$char_array = str_split($input_string);
If you know exactly what order you want, and you only have four characters, then from here you can actually just do it the way you wanted from your question, and concatenate the array elements back into a single string. The array is zero-indexed, so turning "abcd" into "bcad" looks like this:
$output_string = $char_array[1].$char_array[2].$char_array[0].$char_array[3];
If your needs are more complex, you can sort and implode the string:
Use sort() to put the characters into order:
sort($char_array);
Or use one of the other related sorting functions that PHP provides if you need a different sort order. If you need a sort order which is specific to your requirements, you can use usort(), which allows you to write a function which defines how the sorting works.
Then re-join the characters into a single string using implode():
$output_string = implode($char_array);
Hope that helps.
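For a fixed, non-alphabetical order like the one in the question, a small sketch that takes the desired order as a list of zero-based positions may be easier to adapt than hard-coded indexes. It assumes single-byte characters, and reorder_chars is an invented name:

```php
<?php
// Hypothetical helper: builds a new string by picking characters in the given order.
function reorder_chars(string $input, array $order): string
{
    $chars = str_split($input);   // zero-indexed array of single characters
    $out = '';
    foreach ($order as $index) {
        if (!isset($chars[$index])) {
            throw new InvalidArgumentException("No character at index $index");
        }
        $out .= $chars[$index];
    }
    return $out;
}

echo reorder_chars('abcd', array(1, 2, 0, 3)), "\n"; // bcad
```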

regex to get current page or directory name?

I am trying to get the page or last directory name from a URL.
For example, if the URL is http://www.example.com/dir/ I want it to return dir, and if the passed URL is http://www.example.com/page.php I want it to return page. Notice I do not want the trailing slash or file extension.
I tried this:
$regex = "/.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*/i";
$name = strtolower(preg_replace($regex,"$2",$url));
I ran this regex in PHP and it returned nothing. (however I tested the same regex in ActionScript and it worked!)
So what am I doing wrong here, how do I get what I want?
Thanks!!!

Don't use / as the regex delimiter if it also contains slashes. Try this:
$regex = "#^.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*$#i";

You may try to escape the "/" in the middle; unescaped, it simply closes your regex. So this may work:
$regex = "/.*\.(com|gov|org|net|mil|edu)\/([a-z_\-]+).*/i";
You may also make the regex somewhat more general, but that's another problem.
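A quick sketch of the delimiter fix in action, using "#" so the slash needs no escaping. It uses preg_match() rather than preg_replace(), since preg_replace() hands back the whole URL unchanged when nothing matches; first_path_word is a made-up name.

```php
<?php
// Hypothetical helper wrapping the question's pattern with "#" as the delimiter.
function first_path_word(string $url): ?string
{
    $regex = '#^.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*$#i';
    return preg_match($regex, $url, $m) ? strtolower($m[2]) : null;
}

echo first_path_word('http://www.example.com/dir/'), "\n";     // dir
echo first_path_word('http://www.example.com/page.php'), "\n"; // page
```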

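You can use this:
$parts = explode('/', rtrim($url, '/'));
$last = array_pop($parts);
Then apply a simple regex to remove any file extension. (array_pop() takes its argument by reference, so the exploded array needs to be in a variable first; trimming the trailing slash keeps the last element from being empty.)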

Assuming you want to match the entire address after the domain portion:
$regex = "%://[^/]+/([^?#]+)%i";
The above assumes a URL of the format extension://domainpart/everythingelse.

Then again, it seems that the problem here isn't that your regex isn't powerful enough, just mistyped (closing delimiter in the middle of the string). I'll leave this up for posterity, but I strongly recommend you check out PHP's parse_url() function.
This should adequately deliver:
substr($s = basename($_SERVER['REQUEST_URI']), 0, strrpos($s,'.') ?: strlen($s))
But this is better:
preg_replace('/[#\.\?].*/','',basename($path));
Although, your example is short, so I cannot tell if you want to preserve the entire path or just the last element of it. The preceding example will only preserve the last piece, but this should save the whole path while being generic enough to work with just about anything that can be thrown at you:
preg_replace('~(?:/$|[#\.\?].*)~','',substr(parse_url($path, PHP_URL_PATH),1));

As much as I personally love using regular expressions, more 'crude' (for want of a better word) string functions might be a good alternative for you. The snippet below uses sscanf to parse the path part of the URL for the first bunch of letters.
$url = "http://www.example.com/page.php";
$path = parse_url($url, PHP_URL_PATH);
sscanf($path, '/%[a-z]', $part);
// $part = "page";

This expression:
(?<=^[^:]+://[^.]+(?:\.[^.]+)*/)[^/]*(?=\.[^.]+$|/$)
Gives the following results:
http://www.example.com/dir/ dir
http://www.example.com/foo/dir/ dir
http://www.example.com/page.php page
http://www.example.com/foo/page.php page
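Apologies in advance if this is not valid PHP regex - I tested it using RegexBuddy. (PCRE, which PHP's preg_* functions use, rejects lookbehinds that contain unbounded quantifiers such as +, so this pattern would need adapting before it runs in PHP.)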

Save yourself the regular expression and make PHP's other functions feel more loved.
$url = "http://www.example.com/page.php";
$filename = pathinfo(parse_url($url, PHP_URL_PATH), PATHINFO_FILENAME);
Warning: for PHP 5.2 and up.
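To cover the directory case from the question as well, a minimal sketch that trims a trailing slash before calling pathinfo() (last_segment_name is a hypothetical helper, and returning an empty string for a URL without a path is just one possible choice):

```php
<?php
// Hypothetical helper: last path segment of a URL, without slash or file extension.
function last_segment_name(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);   // null when the URL has no path part
    if (!is_string($path)) {
        return '';
    }
    return pathinfo(rtrim($path, '/'), PATHINFO_FILENAME);
}

echo last_segment_name('http://www.example.com/dir/'), "\n";         // dir
echo last_segment_name('http://www.example.com/foo/page.php'), "\n"; // page
```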

Whitelist in php

I have an input for users where they are supposed to enter their phone number. The problem is that some people write their phone number with hyphens and spaces in them. I want to put the input through a filter to remove such things and store only digits in my database.
I figured that I could do some str_replace() for the whitespaces and special chars.
However I think that a better approach would be to pick out just the digits instead of removing everything else. I think that I have heard the term "whitelisting" about this.
Could you please point me in the direction of solving this in PHP?
Example: I want the input "0333 452-123-4" to result in "03334521234"
Thanks!

This is a non-trivial problem because there are lots of colloquialisms and regional differences. Please refer to What is the best way for converting phone numbers into international format (E.164) using Java? It's Java but the same rules apply.
I would say that unless you need something more fully-featured, keep it simple. Create a list of valid regular expressions and check the input against each until you find a match.
If you want it really simple, simply remove non-digits:
$phone = preg_replace('![^\d]+!', '', $phone);
By the way, just picking out the digits is, by definition, the same as removing everything else. If you mean something different you may want to rephrase that.

$number = filter_var(str_replace(array("+","-"), '', $number), FILTER_SANITIZE_NUMBER_INT);
filter_var() with FILTER_SANITIZE_NUMBER_INT removes everything except digits, pluses and minuses, and str_replace() gets rid of those signs.
Or you could use preg_replace():
$number = preg_replace('/[^0-9]/', '', $number);

You could do it two ways. Iterate through each index in the string, and run is_numeric() on it, or you could use a regular expression on the string.
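A sketch of the character-by-character approach. It swaps is_numeric() for ctype_digit(), which checks for digits only and comes from the ctype extension (enabled by default in most builds); digits_only is an invented name.

```php
<?php
// Hypothetical helper: keeps only the digit characters of the input.
function digits_only(string $input): string
{
    $digits = '';
    foreach (str_split($input) as $char) {
        if (ctype_digit($char)) {   // true only for the characters 0-9
            $digits .= $char;
        }
    }
    return $digits;
}

echo digits_only('0333 452-123-4'), "\n"; // 03334521234
```

The regular-expression route, preg_replace('/\D+/', '', $input), should give the same result in one call.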

On the client side I do recommend using some formatting that you design when creating the form. This is good for zip or telephone fields. Take a look at this jquery plugin for a reference. It will make things much easier later on the server side.
