Preg replace numbers if more than 5 digits - php

I have an image upload form. After the user submits the form, my script processes the image and cleans the image filename (I'm appending a unique series of numbers to the end of the filename to prevent duplicate filenames).
I often receive filenames (after processing) such as
"c-id-1333-l-id-1298491-aid-3951-id-13995346097186883-im-193-1.jpg"
How can I preg_replace the numbers if they are more than 5 digits long? Numbers with 5 digits or fewer should be retained. The above example should give "c-id-1333-l-id--aid-3951-id--im-193-1.jpg" (don't mind the multiple consecutive dashes [-]; my script can handle this).

You can use the following to do this.
$str = 'c-id-1333-l-id-1298491-aid-3951-id-13995346097186883-im-193-1.jpg';
$str = preg_replace('/\d{5,}/', '', $str);
var_dump($str);
Explanation:
\d{5,} # digits (0-9) (at least 5 times)
Output:
string(41) "c-id-1333-l-id--aid-3951-id--im-193-1.jpg"
If you want to retain numbers of 5 digits or fewer (as the question asks), use \d{6,} instead.
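A small sketch that compares the two quantifiers on the question's filename and on a made-up name containing a 5-digit and a 6-digit run. The function name stripLongNumbers and its $minDigits parameter are illustrative, not part of the answer:

```php
<?php
// Remove every run of $minDigits or more consecutive digits.
// With $minDigits = 6, runs of 5 digits or fewer are kept, as the question asks.
function stripLongNumbers(string $name, int $minDigits = 6): string
{
    return preg_replace('/\d{' . $minDigits . ',}/', '', $name);
}

$file = 'c-id-1333-l-id-1298491-aid-3951-id-13995346097186883-im-193-1.jpg';
echo stripLongNumbers($file), "\n";                     // c-id-1333-l-id--aid-3951-id--im-193-1.jpg
echo stripLongNumbers('img-12345-123456.jpg'), "\n";    // img-12345-.jpg
echo stripLongNumbers('img-12345-123456.jpg', 5), "\n"; // img--.jpg
```

Both patterns give the same result on the question's filename, because its long runs have 7 and 17 digits. They only differ when a run is exactly 5 digits long.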

Is the user supplied name important?
If it is not, one technique I like to use to normalize file names is to simply hash them with something like sha1 or even md5, then add your timestamp, IDs or whatever else to that. This takes care of a lot of issues with special characters in file names, such as ".", "\" and "/" (dot-dot-slash and directory traversal).
@hwnd's regular expression should do the trick, but I thought I would throw this out there.
So with hashing you'd get cleaner (but less meaningful) names like this:
da39a3ee5e6b4b0d3255bfef95601890afd80709.jpg
then you can add your unique numbers on
da39a3ee5e6b4b0d3255bfef95601890afd80709-1234568764564558.jpg
You could even salt the filename with the timestamp first and then hash it, so that all filenames are 40 characters long. The chance of a hash collision is negligible, even with tens of thousands of files. If you want extra margin, just move up to sha256 etc.
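A minimal sketch of the salted-hash idea. It assumes the timestamp is prepended to the original name before hashing, and that only a short alphanumeric extension is worth keeping. Both choices and the name hashedFilename are illustrative:

```php
<?php
// Assumption: the timestamp is prepended to the original name before hashing,
// and only a short lowercase alphanumeric extension is carried over.
function hashedFilename(string $originalName, ?int $timestamp = null): string
{
    $timestamp = $timestamp ?? time();
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    // Drop anything that is not a plain 1-10 character extension.
    $ext = preg_match('/^[a-z0-9]{1,10}$/', $ext) ? '.' . $ext : '';
    // sha1() returns 40 lowercase hex characters, so no dots or slashes survive.
    return sha1($timestamp . $originalName) . $ext;
}

echo hashedFilename('../../etc/my photo.JPG', 1400000000), "\n";
```

The hex digest contains only 0-9 and a-f, so path characters from the user-supplied name cannot reach the stored filename.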

Related

Cryptographically secure random ASCII-string in PHP

I know about random_bytes() in PHP 7, and I want to use it to generate a cryptographically secure (i.e. hard to guess) random string for use as a one-time token or for longer-term storage in a cookie.
Unfortunately, I don't know how to convert the output of random_bytes() to a string consisting only of human readable characters, so browsers don't get confused. I know about bin2hex(), but I'd prefer to use the full ASCII-range instead of hex numbers, for the sake of more bits per length.
Any ideas?
Unfortunately, Peter O. deleted his answer after it received negative attention in a review queue, perhaps because he phrased it as a question. I believe it is a legitimate answer, so I will reprise it.
One easy solution is to encode your random data into the base64 alphabet using base64_encode(). This will not produce the "full ASCII-range" as you have requested but it will give you most of it. An even larger ASCII range is output by a suitable base85 encoder, but php does not have a built-in one. You can probably find plenty of open-source base85 encoders for php though. In my opinion the decrease in length of base85 over base64 is unlikely to be worth the extra code you have to maintain.
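A minimal sketch of the base64 suggestion, using random_bytes() for the randomness. The URL-safe variant is an extra: it swaps + and / for - and _ and drops the = padding, which is convenient for cookies and URLs. The function names are illustrative:

```php
<?php
// base64 of CSPRNG output: every 3 bytes become 4 characters,
// so 24 bytes (192 bits) give a 32-character string with no padding.
function randomToken(int $bytes = 24): string
{
    return base64_encode(random_bytes($bytes));
}

// URL/cookie-friendly variant: replace '+' and '/' and strip '=' padding.
function randomUrlSafeToken(int $bytes = 24): string
{
    return rtrim(strtr(base64_encode(random_bytes($bytes)), '+/', '-_'), '=');
}

echo randomToken(), "\n";
echo randomUrlSafeToken(), "\n";
```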
I personally just use a GUID library and concatenate a couple of GUIDs to get a long, unique token string. You also have the option to remove the dashes to make the source harder to recognize. If you want to make it even more complex, you can randomly cut the string back by up to 10 characters so that its length is unknown.
I use this library for generating my GUIDs
https://packagist.org/packages/ramsey/uuid
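require 'vendor/autoload.php'; // Composer autoloader for ramsey/uuid
use Ramsey\Uuid\Uuid;
$token = Uuid::uuid4() . '-' . Uuid::uuid4(); // two random (version 4) UUIDs joined by a dash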
Sorry, I overlooked the part about you wanting to use the full range of 26 letters plus digits... I'm not sure I have an answer for you in this respect, but you can have faith in how hard a UUID4 is to guess. This is especially true when you join a couple together and obfuscate the length by up to 10 characters to make guessing more complex.
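If you'd rather not pull in a Composer package, here is a dependency-free sketch. It builds version 4 UUIDs directly from random_bytes(); this is a stand-in, not the ramsey/uuid API. It then joins two UUIDs, strips the dashes and trims 0 to 10 characters, as described above. Each version 4 UUID carries 122 random bits. uuid4() and uuidToken() are illustrative names:

```php
<?php
// Stand-in for a UUID library: a random (version 4) UUID built from
// 16 bytes of random_bytes() output.
function uuid4(): string
{
    $b = random_bytes(16);
    $b[6] = chr((ord($b[6]) & 0x0f) | 0x40); // version nibble = 4
    $b[8] = chr((ord($b[8]) & 0x3f) | 0x80); // RFC 4122 variant bits (10xx)
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($b), 4));
}

// Two UUIDs with dashes removed (64 hex chars), then 0-10 chars trimmed off the end.
function uuidToken(): string
{
    $token = str_replace('-', '', uuid4() . uuid4());
    return substr($token, 0, strlen($token) - random_int(0, 10));
}

echo uuidToken(), "\n";
```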
Actually, if you can safely generate an array of random numbers in the range of printable ASCII codes, you can convert each code into its ASCII character and implode them together into a single string.
function randomAsciiString($length) {
    return implode('', array_map(
        function($value) {
            return chr($value); // ASCII code -> character
        },
        array_map(
            function($value) {
                return random_int(33, 126); // printable ASCII, excluding space
            },
            array_fill(0, $length, null) // one placeholder per output character
        )
    ));
}
echo randomAsciiString(128); // normal 128-character string
echo randomAsciiString(random_int(118, 128)); // obfuscated length (118 to 128 characters) for extra complexity
Of course, be mindful that this uses every printable character on a standard keyboard, and some of those characters will upset contexts that are sensitive to them (e.g. quotes).
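One way to avoid the problem characters is to draw from an explicit alphabet with random_int(), which picks each index uniformly. This is a swapped-in technique rather than the array_map version above. The function name and the example alphabet are assumptions; the alphabet is printable ASCII minus quotes, backslash, backtick and space:

```php
<?php
// Hypothetical "safe" alphabet: printable ASCII without ', ", \, ` and space.
const SAFE_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$%&()*+,-./:;<=>?@[]^_{|}~';

// Build a string of $length characters, each chosen uniformly from $alphabet.
function randomStringFromAlphabet(int $length, string $alphabet = SAFE_ALPHABET): string
{
    $max = strlen($alphabet) - 1;
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= $alphabet[random_int(0, $max)]; // CSPRNG-backed, no modulo bias
    }
    return $out;
}

echo randomStringFromAlphabet(32), "\n";
```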
Let's consider the letters to be used. For the sake of simplicity, I will assume that you intend to use only uppercase and lowercase English letters. That gives 26 uppercase and 26 lowercase letters, for 52 different possible values. We can view a byte array of n elements as an n-digit number in base 256 and convert this number into base 52, where A is 0, B is 1, C is 2, ..., a is 26, ..., z is 51. Converting these digits into the corresponding letters then yields the text you want.
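A sketch of that base conversion in plain PHP. It does schoolbook long division of the base-256 digits by 52, so no GMP or BCMath extension is needed. As with any numeric conversion, leading zero bytes disappear, so the output length varies slightly. bytesToBase52 is an illustrative name:

```php
<?php
// Convert a byte string (a base-256 number, most significant byte first)
// into base 52, using A-Z for 0-25 and a-z for 26-51.
function bytesToBase52(string $bytes): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $digits = array_values(unpack('C*', $bytes)); // base-256 digits
    $out = '';
    while (count($digits) > 0) {
        // Divide the whole number by 52; keep the quotient digits for the next round.
        $quotient = [];
        $remainder = 0;
        foreach ($digits as $d) {
            $acc = $remainder * 256 + $d;
            $q = intdiv($acc, 52);
            $remainder = $acc % 52;
            if ($q > 0 || count($quotient) > 0) {
                $quotient[] = $q; // skip leading zeros
            }
        }
        $out = $alphabet[$remainder] . $out; // remainder is the next base-52 digit
        $digits = $quotient;
    }
    return $out;
}

echo bytesToBase52(random_bytes(32)), "\n";
```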

What set of chars is php's uniqid composed of?

I would like to write a simple regular expression for PHP's uniqid. I checked the uniqid manual, looking for the set of characters used in the return value, but the documentation only mentions that:
@return string the unique identifier, as a string.
And
With an empty prefix, the returned string will be 13 characters long. If more_entropy is true, it will be 23 characters.
I would like to know what characters can I expect in the return value. Is it a hex string? How to know for sure? Where to find something more about the uniqid function?
The documentation doesn't specify the string contents; only its length. Generally, you shouldn't depend on it. If you print the value between a pair of delimiters, like quotation marks, you could use them in the regular expression:
"([^"]+)" ($1 contains the value)
As long as you develop for a particular PHP version, you can inspect its implementation and assume that it doesn't change. If you upgrade, you should check whether the assumption is still valid.
A comment in the uniqid documentation describes that it is essentially a hexadecimal number with an optional numeric suffix:
if (more_entropy) {
    uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec, php_combined_lcg() * 10);
} else {
    uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}
Which gives you two possible output formats:
uniqid() - 13 characters, hexadecimal number
uniqid('', true) - 23 characters, hexadecimal number with a floating-point suffix computed elsewhere
If the value is surrounded by delimiters other than alphanumeric characters and dots, you could use one of these simple regular expressions to grab the value in either of the two formats:
[0-9a-f]+
[.0-9a-f]+
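To check a value on its own, rather than pull it out of surrounding text, anchored versions of those patterns can be combined. This sketch assumes the two formats described above (13 hex characters, optionally followed by a one-digit float with 8 decimals) and an empty prefix. isUniqid is an illustrative name:

```php
<?php
// Matches uniqid() (13 hex chars) and uniqid('', true) (13 hex chars + "d.dddddddd").
// The D modifier keeps $ from also matching before a trailing newline.
function isUniqid(string $value): bool
{
    return preg_match('/^[0-9a-f]{13}(?:\d\.\d{8})?$/D', $value) === 1;
}

var_dump(isUniqid(uniqid()));
var_dump(isUniqid(uniqid('', true)));
var_dump(isUniqid('not-a-uniqid'));
```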
If you need 100% format guarantee for any PHP version, you could write your own function based on sprintf.
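A minimal sketch of such a function, assuming you want the same layout as the C code quoted above. gettimeofday() supplies the seconds and microseconds, and random_int() stands in for PHP's internal php_combined_lcg(). The sketch makes no attempt to keep two calls within the same microsecond distinct. formatUniqid is an illustrative name:

```php
<?php
// Reproduce uniqid()'s layout: 8 hex digits of seconds + 5 hex digits of
// microseconds, optionally followed by a "d.dddddddd" suffix.
function formatUniqid(string $prefix = '', bool $moreEntropy = false): string
{
    $t = gettimeofday(); // array with integer 'sec' and 'usec' keys
    $id = sprintf('%s%08x%05x', $prefix, $t['sec'], $t['usec']);
    if ($moreEntropy) {
        // Stand-in for php_combined_lcg() * 10: a value in [0, 10) with 8 decimals.
        $id .= sprintf('%.8F', random_int(0, 999999999) / 100000000);
    }
    return $id;
}

echo formatUniqid(), "\n";
echo formatUniqid('', true), "\n";
```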
I admit that it is unlikely that uniqid would change significantly; I would rather expect new functions or extensions to be created to provide different formats. Another comment in the uniqid documentation shows an RFC 4122 compliant UUID implementation. There was also a discussion on Stack Overflow about it.
I found this on the php site: http://www.php.net/manual/en/function.uniqid.php#95001
If this is to be believed then the 13 character version is entirely hex.
However the 23 character version has:
14 characters (hex)
then a dot
then another 8 characters (decimal)
If you need to be entirely sure, you can verify this yourself: http://sandbox.onlinephpfunctions.com/code/c04c7854b764faee2548180eddb8c23288dcb5f7

Password validation php regex

I'm new to regex.
I need to validate passwords in PHP against the following password policy using a regex:
Passwords:
Must have a minimum of 8 characters
Must have 2 numbers
Symbols allowed are: ! @ # $ % *
I have tried the following: /^(?=.*\d)(?=.*[A-Za-z])[0-9A-Za-z!@#$%]$/
The following matches exactly your requirements: ^(?=.*\d.*\d)[0-9A-Za-z!@#$%*]{8,}$
Online demo (you don't need the modifiers; they are just there for testing purposes).
Explanation
^ : match begin of string
(?=.*\d.*\d) : positive lookahead, check if there are 2 digits
[0-9A-Za-z!@#$%*]{8,} : match digits, letters and !@#$%* 8 or more times
$ : match end of string
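A small sketch that wraps the pattern above in a PHP function. It treats the allowed symbols as ! @ # $ % *. The D modifier is an addition: it stops $ from also matching just before a trailing newline. isValidPassword is an illustrative name:

```php
<?php
// Policy: 8+ characters, at least 2 digits, only letters, digits and ! @ # $ % *.
function isValidPassword(string $password): bool
{
    return preg_match('/^(?=.*\d.*\d)[0-9A-Za-z!@#$%*]{8,}$/D', $password) === 1;
}

var_dump(isValidPassword('abc$%*12'));  // bool(true)
var_dump(isValidPassword('abcdefg1'));  // bool(false): only one digit
var_dump(isValidPassword('abcd 1234')); // bool(false): space not allowed
```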
I would first try to find two numbers using non-regex code (or preg_match_all('/[0-9]/', $password) >= 2), then validate against:
^[!@#$%*a-zA-Z0-9]{8,}$
This should be faster and easier to understand. Doing it with a single regex requires a lookahead, which as far as I know basically scans the string twice, though I'm not sure about the PHP internals on that one.
Be prepared for a lot of complaints about passwords not being accepted. I personally have a large subset of passwords that wouldn't validate against those restrictions. Also nonsensical passwords like 12345678 would validate, or even 11111111, but not f4#f#faASvCXZr$%%zcorrecthorsebatterystaple.
if (preg_match('/^[!@#$%*a-zA-Z0-9]{8,}$/', $password) && preg_match_all('/[0-9]/', $password) >= 2)
{
    // password meets the policy
}
Full Strong Password Validation With PHP
Min 8 chars long
Min One Digit
Min One Uppercase
Min One Lower Case
Min One Special Chars
/^(?=\S*[a-z])(?=\S*[A-Z])(?=\S*\d)(?=\S*\W)\S{8,}$/
Demo here

How to check a password's content and length using array functions

A user enters a password, say 'tomorrow1234'. I'm aware that I can split it into an array with str_split, but after that, I want to go through each value and search them for things such as capitalization, number, or white space.
How would I go about doing this?
This is an old standby function I use to validate password complexity. It requires that the password contains uppercase and lowercase letters, as well as non-alpha characters. Length checks are trivial and are handled elsewhere.
function checkPasswordComplexity($password) { // function name is illustrative
    $req_regex = array(
        '/[A-Z]/',    // uppercase
        '/[a-z]/',    // lowercase
        '/[^A-Za-z]/' // non-alpha
    );
    foreach ($req_regex as $regex) {
        if (!preg_match($regex, $password)) {
            return NULL; // a required character class is missing
        }
    }
    return TRUE; // every required character class is present
}
I use the array and a loop so it's easy to add/remove conditions if necessary.
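For comparison with the str_split approach the question asks about, here is a sketch that walks each character and records what it finds using the ctype functions. It assumes ASCII input, because str_split splits bytes, and the ctype extension, which is enabled by default. describePassword is an illustrative name:

```php
<?php
// Walk the password one byte at a time and record which kinds of
// characters appear (uppercase, lowercase, digit, whitespace).
function describePassword(string $password): array
{
    $found = ['upper' => false, 'lower' => false, 'digit' => false, 'space' => false];
    foreach (str_split($password) as $char) {
        if (ctype_upper($char)) {
            $found['upper'] = true;
        } elseif (ctype_lower($char)) {
            $found['lower'] = true;
        } elseif (ctype_digit($char)) {
            $found['digit'] = true;
        } elseif (ctype_space($char)) {
            $found['space'] = true;
        }
    }
    return $found;
}

print_r(describePassword('tomorrow1234'));
```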
Sounds like you're trying to verify password strength.
Check out this web page. A specific answer for your case would be fairly complex to write, but you can use regex to check for things like capitalization, symbols and digits. This page has several examples you could modify for your needs.
http://www.cafewebmaster.com/check-password-strength-safety-php-and-regex
This is what I would use:
(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$
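This checks for at least 8 characters, at least one uppercase and one lowercase letter, and at least one digit or special character. It also rejects a password that starts with a dot or a newline.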

php regular expression to filter out junk

So I have an interesting problem: I have a string, and for the most part I know what to expect:
http://www.someurl.com/st=????????
Except in this case, the ?'s are either upper case letters or numbers. The problem is, the string has garbage mixed in: the string is broken up into 5 or 6 pieces, and in between there's lots of junk: unprintable characters, foreign characters, as well as plain old normal characters. In short, stuff that's apt to look like this: Nyþ=mî;ëMÝ×nüqÏ
Usually the last 8 characters (the ?'s) are together right at the end, so at the moment I just have PHP grab the last 8 chars and hope for the best. Occasionally, that doesn't work, so I need a more robust solution.
The problem is technically unsolvable, but I think the best solution is to grab characters from the end of the string while they are uppercase or numeric. If I get 8 or more, assume that is correct. Otherwise, find the st= and grab characters going forward, as many as I need to fill up the 8-character quota. Is there a regex way to do this, or will I need to roll up my sleeves and go nested-loop style?
Update:
To clear up some confusion, I get an input string like this:
[garbage]http:/[garbage]/somewe[garbage]bsite.co[garbage]m/something=[garbage]????????
except that the garbage is in unpredictable locations in the string (though the end is never garbage) and has unpredictable length (at least, I haven't been able to find a pattern in either). Usually the ?s are all together, hence my just grabbing the last 8 chars. Sometimes they aren't, which results in some missing data and returned garbage :-\
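A sketch of the heuristic described in the question. It assumes the code is 8 uppercase letters or digits and the marker is st=; pass something= or similar for the updated format. As the question says, it can still be fooled when the junk next to the code itself contains uppercase letters or digits. extractCode is an illustrative name:

```php
<?php
// Heuristic: take the trailing run of [A-Z0-9]; if it is short, fill the
// missing characters from the run that starts right after the last marker.
function extractCode(string $input, string $marker = 'st=', int $want = 8): ?string
{
    // Step 1: uppercase letters/digits at the very end of the string.
    $tail = preg_match('/[A-Z0-9]+$/', $input, $m) ? $m[0] : '';
    if (strlen($tail) >= $want) {
        return substr($tail, -$want);
    }
    // Step 2: the characters immediately after the last marker.
    $pos = strrpos($input, $marker);
    if ($pos === false) {
        return null; // no marker: give up rather than guess
    }
    $need = $want - strlen($tail);
    $after = substr($input, $pos + strlen($marker));
    $head = preg_match('/^[A-Z0-9]+/', $after, $h) ? $h[0] : '';
    if (strlen($head) < $need) {
        return null;
    }
    return substr($head, 0, $need) . $tail;
}

var_dump(extractCode('http://www.someurl.com/st=ABC;x;DEFGH'));
```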
$var = '†http://þ=www.ex;üßample-website.î;ëcomÝ×ü/joy_hÏere.html'; // test case
$clean = join(
    array_filter(
        str_split($var, 1),
        function ($char) {
            return (
                array_key_exists(
                    $char,
                    array_flip(array_merge(
                        range('A','Z'),
                        range('a','z'),
                        range((string)'0',(string)'9'),
                        array(':','.','/','-','_')
                    ))
                )
            );
        }
    )
);
Hah, that was a joke. Here's a regex for you:
$clean = preg_replace('/[^A-Za-z0-9:.\/_-]/','',$var);
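Running that pattern on the joke answer's test case shows what survives. Every byte outside A-Z, a-z, 0-9 and : . / _ - is dropped. Because no u modifier is used, this includes each byte of the multibyte characters:

```php
<?php
$var = '†http://þ=www.ex;üßample-website.î;ëcomÝ×ü/joy_hÏere.html'; // test case from the answer above
$clean = preg_replace('/[^A-Za-z0-9:.\/_-]/', '', $var);
echo $clean, "\n"; // http://www.example-website.com/joy_here.html
```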
As stated, the problem is unsolvable. If the garbage can contain "plain old normal characters", and the garbage can fall at the end of the string, then you cannot know whether the target string in this sample is "ABCDEFGH" or "BCDEFGHI":
__http:/____/somewe___bsite.co____m/something=__ABCDEFGHI__
What do these values represent? If you want to retain all of it, just without having to deal with garbage in your database, maybe you should hex-encode it using bin2hex().
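A quick illustration of that suggestion with made-up input. bin2hex() doubles the size but turns any byte sequence into plain hex, and hex2bin() reverses it exactly:

```php
<?php
$raw = "st=\xFEAB\x00CD";  // example bytes, including unprintable ones
$hex = bin2hex($raw);      // only 0-9 and a-f, safe for any text column
$back = hex2bin($hex);     // restores the exact original bytes
echo $hex, "\n";           // 73743dfe4142004344
```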
You can use this regular expression:
if (preg_match('/[\'^£$%&*()}{@#~?><>,|=_+¬-]/', $string) === 1)
{
    // $string contains at least one of these special characters
}
