preg_match admits only two consecutive lowercases - php

I want to check whether a password contains:
at least 2 lowercase letters
at least 1 uppercase letter
at least 2 of a selected set of special characters
The problem is that when I verify this, it accepts two lowercase letters only if they are consecutive, like this: paSWORD.
If I enter pASWORd, it returns an error.
This is the code:
preg_match("/^(?=.*[a-z]{2})(?=.*[A-Z])(?=.*[_|!|#|#|$|%|^|&|*]{2}).+$/")
I don't see where the problem is and how to fix it.

You're looking for [a-z]{2} in your regex. That is two consecutive lowercases!
I will go out on a limb and suggest that it is probably better to individually check each of your three conditions in separate regexes rather than trying to be clever and do it in one.
I've added some extra parentheses, which may get your original idea to work for non-consecutive lowercase/special chars, but I think the expression is overcomplex.
preg_match("/^(?=(.*[a-z]){2})(?=.*[A-Z])(?=(.*[_!##$%^&*]){2}).+$/")
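The suggestion above to check each condition separately could look roughly like the sketch below. The function name passwordMeetsRules is made up for illustration, and the special-character set is assumed from the question's pattern (with the duplicated # collapsed).

```php
<?php
// Hypothetical helper: one small regex per rule instead of one combined pattern.
function passwordMeetsRules(string $password): bool
{
    // preg_match_all() returns the number of matches, so each rule is a simple count.
    $lower   = preg_match_all('/[a-z]/', $password);
    $upper   = preg_match_all('/[A-Z]/', $password);
    $special = preg_match_all('/[_!#$%^&*]/', $password); // assumed character set

    return $lower >= 2 && $upper >= 1 && $special >= 2;
}

var_dump(passwordMeetsRules('pASWORd#!'));  // bool(true): the lowercase letters need not be adjacent
var_dump(passwordMeetsRules('PASSWORD#!')); // bool(false): no lowercase letters
```

Because each rule is counted on its own, changing a minimum later only means changing one number.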

You can use this pattern to check the three rules:
preg_match("/(?=.*[a-z].*[a-z])(?=.*[A-Z])(?=.*[_!##$%^&*].*[_!##$%^&*])/", $str);
but if you want to allow only letters and these special characters, you must add:
preg_match("/^(?=.*[a-z].*[a-z])(?=.*[A-Z])(?=.*[_!##$%^&*].*[_!##$%^&*])[a-zA-Z_!##$%^&*]+$/", $str);
A way without regex:
$str = '*MauriceAimeLeJambon*';
$chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_!##$%^&*';
$state = array('lower' => 2, 'upper' => 1, 'special' => 2);
$res = false; // stays false for an empty string
$strlength = strlen($str);
for ($i=0; $i<$strlength; $i++) {
    $pos = strpos($chars, $str[$i]);
    if ($pos !== false) {
        // positions 0-25 are lowercase, 26-51 uppercase, 52 and above special
        if ($pos<26) { if ($state['lower']) $state['lower']--; }
        elseif ($pos<52) { if ($state['upper']) $state['upper']--; }
        elseif ($state['special']) $state['special']--;
    } else { $res = false; break; }
    $res = !$state['lower'] && !$state['upper'] && !$state['special'];
}
var_dump($res);
(This version gives the same result as the second pattern. If you want the same result as the first pattern, just remove the else {} and move the last line out of the for loop.)

Related

How to check the first 2 characters ( case-insensitive ) in PHP?

I have a search input box.
I already check for the first 2 characters.
$first_two = substr($search,0,2);
They need to start with BS, so I check them like this :
if ($search AND $first_two == 'bs' ){
// ... do stuffs
}
After that, I realized that I don't want the check to depend on case.
I want the first two characters to be matched case-insensitively.
I want to allow BS, Bs, bs and bS.
What do I need to change in my if to allow this?
Use a single case for your comparison; this example uses lower case:
$first_two = substr($search,0,2);
if ($search AND strtolower($first_two) == 'bs' ){
// ... do stuffs
}
This way BS, Bs, bS will all be converted to bs and your comparison will always work
Use strcasecmp() like this:
if ($search AND strcasecmp($first_two , "bs") == 0)
//^If both strings are the same (case-insensitive) the function returns 0
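As a variation on the strcasecmp() idea, the built-in strncasecmp() compares only the first n bytes, so the separate substr() step is not needed. This is just a sketch; the wrapper name startsWithBs is made up for illustration.

```php
<?php
// Hypothetical wrapper; strncasecmp() itself is a built-in PHP function.
function startsWithBs(string $search): bool
{
    // Compare at most the first 2 bytes, ignoring case; 0 means "equal".
    return strncasecmp($search, 'bs', 2) === 0;
}

var_dump(startsWithBs('Bs-1234')); // bool(true)
var_dump(startsWithBs('b'));       // bool(false): too short to start with "bs"
```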

Password Validation With Multiple Rules

I'm attempting to write a regex in PHP that validates the following:
At least 10 chars
Has at least 2 Upper-case characters
Has at least 2 Numbers OR Symbols
I've looked at just about every reference I can find but, to no avail.
I guess I can test individually, but that makes me very sad :(
Can someone please help? (And send me to a spot where I can learn in plain English Reg Ex?)
This picture is worth more than 1000 words
(and that's a lot of entropy)
(image via XKCD)
With this in mind you might want to consider dropping rules 2 & 3 if password length is higher than X (say.. 20) or increase the minimum to at least 16 characters (as the only rule).
As for your requirement:
As opposed to having one big, ugly, hard-to-maintain, advanced RegExp you might want to break the problem in smaller parts and tackle each bit separately using dedicated functions.
For this you could look at ctype_* functions, count_chars() and MultiByte String Functions.
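A minimal sketch of that "smaller parts" approach, assuming plain single-byte (ASCII) passwords. The helper names countMatching and isValidPassword are made up for illustration; ctype_upper(), ctype_digit() and ctype_punct() are the built-in ctype functions.

```php
<?php
// Hypothetical helper: counts the characters of $string for which $test returns true.
function countMatching(string $string, callable $test): int
{
    $count = 0;
    foreach (str_split($string) as $char) {
        if ($test($char)) {
            $count++;
        }
    }
    return $count;
}

// Hypothetical validator for the three rules in the question.
function isValidPassword(string $password): bool
{
    if (strlen($password) < 10) { // consider mb_strlen() for multibyte input
        return false;
    }
    $upper = countMatching($password, 'ctype_upper');
    $numOrSymbol = countMatching($password, 'ctype_digit')
                 + countMatching($password, 'ctype_punct');

    return $upper >= 2 && $numOrSymbol >= 2;
}

var_dump(isValidPassword('PassWord12')); // bool(true)
var_dump(isValidPassword('Password12')); // bool(false): only one uppercase letter
```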
Now the ugly:
This advanced RegEx will return 1 (match) or 0 (no match) according to your rules:
preg_match('/^(?=.{10,}$)(?=.*?[A-Z].*?[A-Z])(?=.*?([\x20-\x40\x5b-\x60\x7b-\x7e\x80-\xbf]).*?(?1).*?$).*$/',$string);
Test demo here: http://regex101.com/r/qE9eB2
1st part (LookAhead) : (?=.{10,}$) will check string length and continue if it has at least 10 characters. You could drop this and do a check with strlen() or even better mb_strlen().
2nd part (also a LookAhead): (?=.*?[A-Z].*?[A-Z]) will check for the presence of 2 UPPERCASE characters. You could also do a $upper=preg_replace('/[^A-Z]/','',$string) instead and check that $upper contains at least two chars.
3rd LookAhead uses a character class: [\x20-\x40\x5b-\x60\x7b-\x7e\x80-\xbf] with hex escaped character ranges for numbers and common symbols (pretty much all the symbols one could find on an average keyboard). You could also do a $sym=preg_replace('/[a-zA-Z]/','',$string) instead, which strips the letters, and check that $sym contains at least two chars. Note: to make it shorter I used a subroutine call (?1), which reuses the pattern of group 1, so the same character class is not repeated.
For learning, the most comprehensive RegExp reference I know of is: regular-expressions.info
You can use lookaheads to make sure that what you are looking for is contained appropriately.
/(?=.*[A-Z].*[A-Z])(?=.*[^a-zA-Z].*[^a-zA-Z]).{10,}/
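For reference, this is roughly how the lookahead pattern above could be called from PHP. The wrapper name matchesPolicy and the sample passwords are made up for illustration.

```php
<?php
// Hypothetical wrapper around the lookahead pattern shown above.
function matchesPolicy(string $password): bool
{
    $pattern = '/(?=.*[A-Z].*[A-Z])(?=.*[^a-zA-Z].*[^a-zA-Z]).{10,}/';
    // preg_match() returns 1 on a match, 0 on no match and false on error.
    return preg_match($pattern, $password) === 1;
}

var_dump(matchesPolicy('PassWord12')); // bool(true)
var_dump(matchesPolicy('Password12')); // bool(false): only one uppercase letter
var_dump(matchesPolicy('PaWo12'));     // bool(false): shorter than 10 characters
```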
I have always preferred good old procedural code for handling stuff like this. Regular expressions can be useful but they can also be a little cumbersome, especially for code maintenance and quick scanning (regular expressions are not exactly examples of readability).
function strContains($string, $contains, $n = 1, $exact = false) {
    $length = strlen($string);
    $tally = 0;
    for ($i = 0; $i < $length; $i++) {
        if (strpos($contains, $string[$i]) !== false) {
            $tally++;
        }
    }
    return ($exact ? $tally == $n : $tally >= $n);
}
function validPassword($password) {
    if (strlen($password) < 10) {
        return false;
    }
    $upperChars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $upperCount = 2;
    if (strContains($password, $upperChars, $upperCount) === false) {
        return false;
    }
    $numSymChars = '0123456789!"#$%&\'()*+,-./:;<=>?#[\\]^_`{|}~';
    $numSymCount = 2;
    if (strContains($password, $numSymChars, $numSymCount) === false) {
        return false;
    }
    return true;
}

Regex expression for matching all duplicate substrings of any length

Let's say we have a string: "abcbcdcde"
I want to identify all substrings that are repeated in this string using regex (i.e. no brute-force iterative loops).
For the above string, the result set would be: {"b", "bc", "c", "cd", "d"}
I must confess that my regex is far more rusty than it should be for someone with my experience. I tried using a backreference, but that'll only match consecutive duplicates. I need to match all duplicates, consecutive or otherwise.
In other words, I want to match any character(s) that appears for the >= 2nd time. If a substring occurs 5 times, then I want to capture each of occurrences 2-5. Make sense?
This is my pathetic attempt thus far:
preg_match_all( '/(.+)(.*)\1+/', $string, $matches ); // Way off!
I tried playing with look-aheads but I'm just butchering it. I'm doing this in PHP (PCRE) but the problem is more or less language-agnostic. It's a bit embarrassing that I'm finding myself stumped on this.
Your problem is recursi ... you know what, forget about recursion! =p it wouldn't really work well in PHP and the algorithm is pretty clear without it as well.
function find_repeating_sequences($s)
{
    $res = array();
    while ($s) {
        $i = 1; $pat = $s[0];
        while (false !== strpos($s, $pat, $i)) {
            $res[$pat] = 1;
            // expand pattern and try again
            $pat .= $s[$i++];
        }
        // move the string forward
        $s = substr($s, 1);
    }
    return array_keys($res);
}
Out of interest, I wrote Tim's answer in PHP as well:
function find_repeating_sequences_re($s)
{
    $res = array();
    preg_match_all('/(?=(.+).*\1)/', $s, $matches);
    foreach ($matches[1] as $match) {
        $length = strlen($match);
        if ($length > 1) {
            for ($i = 0; $i < $length; ++$i) {
                for ($j = $i; $j < $length; ++$j) {
                    $res[substr($match, $i, $j - $i + 1)] = 1;
                }
            }
        } else {
            $res[$match] = 1;
        }
    }
    return array_keys($res);
}
I've let them fight it out in a small benchmark of 800 bytes of random data:
$data = base64_encode(openssl_random_pseudo_bytes(600));
Each code is run for 10 rounds and the execution time is measured. The results?
Pure PHP - 0.014s (10 runs)
PCRE - 40.86s <-- ouch!
It gets weirder when you look at 24k bytes (or anything above 1k really):
Pure PHP - 4.565s (10 runs)
PCRE - 0.232s <-- WAT?!
It turns out that the regular expression broke down after 1k characters and so the $matches array was empty. These are my .ini settings:
pcre.backtrack_limit => 1000000 => 1000000
pcre.recursion_limit => 100000 => 100000
It's not clear to me how a backtrack or recursion limit would have been hit after only 1k of characters. But even if those settings are "fixed" somehow, the results are still obvious, PCRE doesn't seem to be the answer.
I suppose writing this in C would speed it up somewhat, but I'm not sure to what degree.
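One way to tell an empty result apart from a PCRE failure such as the one described above is to check the return value of preg_match_all() and then preg_last_error(). The sketch below assumes the same lookahead pattern; the function name find_repeats_checked is made up for illustration.

```php
<?php
// Hypothetical wrapper: same pattern as above, but failures are reported instead of ignored.
function find_repeats_checked(string $s): array
{
    $result = preg_match_all('/(?=(.+).*\1)/', $s, $matches);
    if ($result === false) {
        // preg_last_error() returns a PREG_*_ERROR code, e.g. PREG_BACKTRACK_LIMIT_ERROR.
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }
    return array_values(array_unique($matches[1]));
}

print_r(find_repeats_checked('abcbcdcde'));
```

With a check like this, the benchmark would have raised an exception on the long input instead of reporting a misleadingly fast run.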
Update
With some help from hakre's answer I put together an improved version that increases performance by ~18% after optimizing the following:
Remove the substr() calls in the outer loop to advance the string pointer; this was a left over from my previous recursive incarnations.
Use the partial results as a positive cache to skip strpos() calls inside the inner loop.
And here it is, in all its glory (:
function find_repeating_sequences3($s)
{
    $res = array();
    $p = 0;
    $len = strlen($s);
    while ($p != $len) {
        $pat = $s[$p]; $i = ++$p;
        while ($i != $len) {
            if (!isset($res[$pat])) {
                if (false === strpos($s, $pat, $i)) {
                    break;
                }
                $res[$pat] = 1;
            }
            // expand pattern and try again
            $pat .= $s[$i++];
        }
    }
    return array_keys($res);
}
You can't get the required result in a single regex because a regex will match either greedily (finding bc...bc) or lazily (finding b...b and c...c), but never both. (In your case, it does find c...c, but only because c is repeated twice.)
But once you've found a repeated substring of length > 1, it logically follows that all the smaller "substrings of that substring" must also be repeated. If you want to get them spelled out for you, you need to do this separately.
Taking your example (using Python because I don't know PHP):
>>> results = set(m.group(1) for m in re.finditer(r"(?=(.+).*\1)", "abcbcdcde"))
>>> results
{'d', 'cd', 'bc', 'c'}
You could then go and apply the following function to each of your results:
def substrings(s):
    return [s[start:stop] for start in range(len(s))
            for stop in range(start+1, len(s)+1)]
For example:
>>> substrings("123456")
['1', '12', '123', '1234', '12345', '123456', '2', '23', '234', '2345', '23456',
 '3', '34', '345', '3456', '4', '45', '456', '5', '56', '6']
The closest I can get is /(?=(.+).*\1)/
The purpose of the lookahead is to allow the same characters to be matched more than once (for instance, c and cd). However, for some reason it doesn't seem to be getting the b...
Interesting question. I basically took the function in Jack's answer and tried to see whether the number of tests could be reduced.
I first tried to search only half the string, but it turned out that creating the pattern to search for via substr each time was way too expensive. The way it is done in Jack's answer, by appending one character per iteration, looks much better. Then I ran out of time, so I could not look further into it.
However, while looking for such an alternative implementation, I at least found that some of the differences in the algorithm I had in mind could be applied to Jack's function as well:
There is no need to cut the beginning of the string in each outer iteration, as the search is already done with offsets.
If the rest of the subject to search for a repetition is shorter than the needle, you do not need to search for the needle.
If the needle has already been searched for, you don't need to search again.
Note: This is a memory trade-off. If you have many repetitions, you will use a similar amount of memory. However, if you have few repetitions, then this variant uses more memory than before.
The function:
function find_repeating_sequences($string) {
    $result = array();
    $start = 0;
    $max = strlen($string);
    while ($start < $max) {
        $pat = $string[$start];
        $i = ++$start;
        while ($max - $i > 0) {
            $found = isset($result[$pat]) ? $result[$pat] : false !== strpos($string, $pat, $i);
            if (!$result[$pat] = $found) break;
            // expand pattern and try again
            $pat .= $string[$i++];
        }
    }
    return array_keys(array_filter($result));
}
So just see this as an addition to Jack's answer.

How to compare version #'s inside a string?

I need to compare two version #'s to see if one is greater than the other and am having a really hard time doing so.
version 1: test_V10.1.0.a.1#example
version 2: test_V9.7.0_LS#example
I've tried stripping all non numeric characters out so I would be left with:
version1: 10101
version2: 970
Which drops the 'a' from 10.1.0.a.1 so that's no good, and I've tried taking everything between 'test_' and '#' then stripping out anything to the right of an underscore '_' and the underscore itself, but then I still have to strip out the 'V' at the beginning of the string.
Even if I can get down to just 10.1.0.a.1 and 9.7.0, how can I compare these two? How can I know if 10.1.0.a.1 is greater than 9.7.0? If I strip the decimals out I'm still left with a non numeric character in 1010a1, but I need that character in case say the release version I'm comparing this to is 10.1.0.b.1, this would be greater than 10.1.0.a.1.
This is driving me nuts, has anyone dealt with this before? How did you compare the values? I'm using php.
Shouldn't you be using version_compare($ver1, $ver2)?
http://php.net/manual/en/function.version-compare.php
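A rough sketch of how that could be applied to the strings in the question. The helper extractVersion and its regex are assumptions made for illustration (the version is taken to follow "_V" and to consist of letters, digits and dots); version_compare() is the built-in function.

```php
<?php
// Hypothetical helper: pulls "10.1.0.a.1" out of "test_V10.1.0.a.1#example".
function extractVersion(string $tag): ?string
{
    // Assumption: the version starts after "_V" and stops at the first other character.
    return preg_match('/_V([0-9A-Za-z.]+)/', $tag, $m) ? $m[1] : null;
}

$v1 = extractVersion('test_V10.1.0.a.1#example'); // "10.1.0.a.1"
$v2 = extractVersion('test_V9.7.0_LS#example');   // "9.7.0"

var_dump(version_compare($v1, $v2, '>'));                    // bool(true): 10 > 9
var_dump(version_compare('10.1.0.b.1', '10.1.0.a.1', '>'));  // bool(true): "b" (beta) ranks above "a" (alpha)
```

version_compare() treats a and b as alpha and beta, which happens to give the ordering asked for here; other letters would not necessarily sort alphabetically.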
I think you want to consider working with a regex to parse out the "number" part of the version numbers - "10.1.0.a.1" and "9.7.0". After that, you can split by '.' to get two "version arrays".
With the version arrays, you pop elements off them until you find a higher number. Whichever array it came from is the higher version number. If either array runs out, it's a lesser version number (unless all the remaining elements are "0" or "a" or whatever semantics you use to say "base version", e.g., "10.0.0.a.0" == "10.0"). If both run out at the same time, then they're equal.
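Use explode('.', $versionNum):
$ver1 = '10.1.0.a.1';
$ver2 = '10.1.0';
$arr1 = explode('.', $ver1);
$arr2 = explode('.', $ver2);
$min = min(count($arr1), count($arr2));
$greater = null; // stays null if the versions are equal
for ($i = 0; $i < $min; $i++)
{
    if ($arr1[$i] > $arr2[$i])
    {
        $greater = $ver1;
        break;
    }
    elseif ($arr1[$i] < $arr2[$i])
    {
        $greater = $ver2;
        break;
    }
}
// All shared parts are equal: the version with more parts is the greater one
if ($greater === null && count($arr1) != count($arr2))
    $greater = (count($arr1) > count($arr2)) ? $ver1 : $ver2;
echo $greater;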
The following regular expression will match everything between "test_V" and "#example" and throw it into an array called $matches[1]
$pattern = '/test_V(.*?)(?:_.*?)?#example/i';
$string = 'version 1: test_V10.1.0.a.1#example version 2: test_V9.7.0_LS#example';
if(preg_match_all($pattern,$string,$matches))
{
    print_r($matches[1]);
}
returns
Array
(
[0] => 10.1.0.a.1
[1] => 9.7.0
)
This will give you a head start in figuring out how you want to pull apart your fairly complex version number.

Split a long string not using space

If I have sentences like this:
$msg = "hello how are you?are you fine?thanks.."
and I wish to separate it into 3 parts (or whatever number).
So I'm doing this:
$msglen = strlen($msg);
$seperate = ($msglen /3);
$a = 0;
for($i=0;$i<3;$i++)
{
$seperate = substr($msg,$a,$seperate)
$a = $a + $seperate;
}
So the output should be..
hello how are
[a space here->] you?are you [<-a space here]
fine?thanks..
So is it possible to split in the middle of a word instead of having a space at the start or end of a separated message?
Such as "thank you" -> "than" and "k you" instead of "thank" and " you".
I'm writing a conversion function, and a space at the start or end of a part affects the conversion. The space itself is needed for the conversion, so I can't ignore or delete it.
Thanks.
I take it you can't use trim because the message formed by the joined up strings must be unchanged?
That could get complicated. You could make something that tests for a space after the split, and if a space is detected, makes the split one character earlier. Fairly easy, but what if you have two spaces together? Or a single lettered word? You can of course recursively test this way, but then you may end up with split strings of lengths that are very different from each other.
You need to properly define the constraints you want this to function within.
Please state exactly what you want to do - do you want each section to be equal? Is the splitting in between words of a higher priority than this, so that the lengths do not matter much?
EDIT:
Then, if you aren't worried about the length, you could do something like this (starting with Erik's code and then changing the lengths by moving characters around the spaces):
$msg = "hello how are you?are you fine?thanks..";
$parts = split_without_spaces ($msg, 3);
function split_without_spaces ($msg, $parts) {
    $parts = str_split(trim($msg), ceil(strlen($msg)/$parts));
    /* Used trim above to make sure that there are no spaces at the start
       and end of the message, we can't do anything about those spaces */
    // Looping to (count($parts) - 1) because the last part will not need manipulation
    for ($i = 0; $i < (count($parts) - 1) ; $i++ ) {
        $k = $i + 1;
        // Checking the last character of the split part and the first of the next part for a space
        if (substr($parts[$i], -1) == ' ' || $parts[$k][0] == ' ') {
            // If we move characters from the first part to the next:
            $num1 = 1;
            $len1 = strlen($parts[$i]);
            // Searching for the last two consecutive non-space characters
            while ($parts[$i][$len1 - $num1] == ' ' || $parts[$i][$len1 - $num1 - 1] == ' ') {
                $num1++;
                if ($len1 - $num1 - 2 < 0) return false;
            }
            // If we move characters from the next part to the first:
            $num2 = 1;
            $len2 = strlen($parts[$k]);
            // Searching for the first two consecutive non-space characters
            while ($parts[$k][$num2 - 1] == ' ' || $parts[$k][$num2] == ' ') {
                $num2++;
                if ($num2 >= $len2 - 1) return false;
            }
            // Compare to see what we can do to move the lowest no of characters
            if ($num1 > $num2) {
                $parts[$i] .= substr($parts[$k], 0, $num2);
                $parts[$k] = substr($parts[$k], -1 * ($len2 - $num2));
            }
            else {
                $parts[$k] = substr($parts[$i], -1 * ($num1)) . $parts[$k];
                $parts[$i] = substr($parts[$i], 0, $len1 - $num1);
            }
        }
    }
    return ($parts);
}
This takes care of multiple spaces and single-letter words. However, if they exist, the lengths of the parts may be very uneven. It could get messed up in extreme cases: if you have a string made up mainly of spaces, it could return one part as being empty, or return false if it can't manage the split at all. Please test it out thoroughly.
EDIT2:
By the way, it'd be far better for you to change your approach in some way :) I seriously doubt you'd actually have to use a function like this in practice. Well.. I hope you do actually have a solid reason to, it was somewhat fun coming up with it.
If you simply want to eliminate leading and trailing spaces, consider trim to be used on each result of your split.
If you want to split the string into exact thirds it is not known where the cut will be, maybe in a word, maybe between words.
Your code can be simplified to:
$msg = "hello how are you?are you fine?thanks..";
$parts = str_split($msg, ceil(strlen($msg)/3));
Note that ceil() is needed, otherwise you might get 4 elements out because of rounding.
You're probably looking for str_split, chunk_split or wordwrap.
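To compare the three functions mentioned above, here is a small sketch using the message from the question. The chunk size of 13 and the "|" separator are arbitrary choices made for illustration.

```php
<?php
$msg = "hello how are you?are you fine?thanks..";

// str_split(): fixed-size pieces; spaces stay wherever the cut happens to fall.
$fixed = str_split($msg, (int) ceil(strlen($msg) / 3));

// chunk_split(): same cut points, but returns one string with a separator after every chunk.
$joined = chunk_split($msg, 13, "|");

// wordwrap(): breaks at spaces where it can, so piece lengths vary and the
// space at each break is dropped (which may not suit the question's conversion).
$wrapped = explode("\n", wordwrap($msg, 13, "\n", true));

print_r($fixed);
echo $joined, "\n";
print_r($wrapped);
```

Only str_split() and chunk_split() keep every character of the original message, so they are the closer fit when the spaces must be preserved.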
