Regular expression to disallow double underscores - PHP

I am writing the user registration part of my website. I have a simple regular expression as follows:
if(preg_match("/^[a-z0-9_]{3,15}$/", $username)){
// OK...
}else{
echo "error";
exit();
}
I don't want to let users have usernames like '___' or 'x________y'. This is the function I just wrote to replace double underscores:
function replace_repeated_underScores($string){
$final_str = '';
$str_len = strlen($string);
$prev_char = '';
for($i = 0; $i < $str_len; $i++){
// Start at index 1 so that the second character is compared with the first.
if($i > 0){
$prev_char = $string[$i - 1];
}
$this_char = $string[$i];
if($prev_char == '_' && $this_char == '_'){
}else{
$final_str .= $this_char;
}
}
return $final_str;
}
And it works just fine, but I wonder if I could also check this with regular expression and not another function.
I would appreciate any help.

Just add a negative look-ahead to check whether there is a double underscore in the name.
/^(?!.*__)[a-z0-9_]{3,15}$/
(?!pattern), called zero-width negative look-ahead, will check that it is not possible to find the pattern, ahead in the string from the "current position" (current position is the position that the regex engine is at). It is zero-width, since it doesn't consume text in the process, as opposed to the part outside. It is negative, since the match would only continue if there is no way to match the pattern (all possibilities are exhausted).
The pattern is .*__, so the match will only continue if the engine cannot find a match for .*__, i.e. there is no double underscore __ ahead in the string. Since the group does not consume text, you will still be at the start of the string when it starts to match the later part of the pattern, [a-z0-9_]{3,15}$.
If you lowercase the input with strtolower first, uppercase usernames are already accepted. It is still possible to do the validation with the regex directly by adding the case-insensitive flag i:
/^(?!.*__)[a-z0-9_]{3,15}$/i
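As a rough sketch of how the pattern above behaves, the snippet below wraps it in a helper and runs it against a few usernames. The function name isValidUsername and the sample names are assumptions made for illustration; they are not from the question.

```php
<?php
// Hypothetical helper: 3-15 characters from [a-z0-9_], case-insensitive,
// rejected as soon as two underscores appear next to each other.
function isValidUsername(string $username): bool
{
    return preg_match('/^(?!.*__)[a-z0-9_]{3,15}$/i', $username) === 1;
}

// Sample inputs chosen for illustration only.
foreach (['user_name', 'x________y', '___', 'Ab_1'] as $name) {
    echo $name, ' => ', isValidUsername($name) ? 'ok' : 'rejected', "\n";
}
```

Because the look-ahead sits right after ^, it scans the whole string once before the character class starts matching, so a double underscore anywhere in the name causes a rejection.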

Related

Follow cloaked link with chain of numbers to front

I'm building a small link-cloaking script, but I need to match each link name followed by a varying number, e.g. 'mylinkname1-1597'. The number is always an integer.
The problem is that I never know the number in advance, so I was thinking of using a regex, but something is failing.
Here's what I got now:
$pattern = '/-([0-9]+)/';
$v = $_GET['v'];
if ($v == 'mylinkname1'.'-'.$pattern) {$link = 'http://example1.com/';}
if ($v == 'mylinkname2'.'-'.$pattern) {$link = 'http://example2.com/';}
if ($v == 'mylinkname3'.'-'.$pattern) {$link = 'http://example3.com/';}
header("Location: $link") ;
exit();
The dash is already in the pattern so you don't have to add it in the if clause.
You can omit the capturing group around the digits -[0-9]+, and you have to use the pattern with preg_match.
You might update the format of the if statements to:
$pattern = '-[0-9]+';
if (preg_match("/mylinkname1$pattern/", $v)) {$link = 'http://example1.com/';}
To prevent mylinkname1-1597 being part of a larger word, you might surround the pattern with the anchors ^ and $ to assert the start and end of the string, or with word boundaries \b.
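A minimal sketch of the anchored variant might look like the following. The LINKS table and the resolveLink function are hypothetical names introduced here; the link names and URLs simply mirror the ones in the question.

```php
<?php
// Hypothetical lookup table; names and URLs mirror the question.
const LINKS = [
    'mylinkname1' => 'http://example1.com/',
    'mylinkname2' => 'http://example2.com/',
    'mylinkname3' => 'http://example3.com/',
];

// Returns the target URL, or null when $v is not "<name>-<digits>".
function resolveLink(string $v): ?string
{
    foreach (LINKS as $name => $url) {
        // ^ and $ anchor the match; preg_quote escapes the literal name.
        if (preg_match('/^' . preg_quote($name, '/') . '-[0-9]+$/', $v) === 1) {
            return $url;
        }
    }
    return null;
}

var_dump(resolveLink('mylinkname1-1597'));
var_dump(resolveLink('xmylinkname1-1597'));
```

Returning null for unknown input lets the caller decide what to do before sending a Location header.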
There is no need for regular expressions here at all: just split the string on the hyphen and match only the first part. I also recommend a switch statement when you have three or more if/else branches:
$v=explode('-',$_GET['v']);
switch ($v[0]) {
case "mylinkname1":
$link = 'http://example1.com/';
break;
case "mylinkname2":
$link = 'http://example2.com/';
break;
case "mylinkname3":
$link = 'http://example3.com/';
break;
default:
// Unknown name: stop here, because $link is not set and the redirect below would fail.
exit("something not right");
}
header("Location: $link") ;
exit();

Is every letter in the alphabet in a string at least once?

Was wondering if there was a more efficient way to detect if a string contains every letter in the alphabet one or more times using regex?
I appreciate any suggestions
$str = str_split(strtolower('We promptly judged antique ivory buckles for the next prize'));
$az = str_split('abcdefghijklmnopqrstuvwxyz');
$count = 0;
foreach($az as $alph) {
foreach($str as $z) {
if($alph == $z) {
$count++;
break;
}
}
}
Just use array_diff:
count(array_diff($az, $str)) > 0; // true if at least one letter is missing, i.e. not a pangram
You can do that with a regex, but it is neither optimal nor fast; @hjpotter's way is far faster:
var_dump(strlen(preg_replace('~[^a-z]|(.)(?=.*\1)~i', '', $str)) == 26);
It removes all non letter characters, all duplicate letters (case insensitive), and compares the string length with 26.
[^a-z] matches any non letter character
(.) captures a letter in group 1
(?=.*\1) checks if the same letter is somewhere else (on the right)
the i modifier makes the pattern case insensitive
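To make the pieces above concrete, here is a small sketch that wraps the same pattern in a function taking a plain string. The name isPangramRegex is an assumption for illustration, and the sketch assumes single-line ASCII input, since . does not match a newline without the s modifier.

```php
<?php
// Hypothetical wrapper around the pattern explained above.
// Assumes $text is a single-line ASCII string (not an array).
function isPangramRegex(string $text): bool
{
    // Drop non-letters, and drop every letter that occurs again further right,
    // so only the last occurrence of each distinct letter survives.
    $unique = preg_replace('~[^a-z]|(.)(?=.*\1)~i', '', $text);
    return strlen($unique) === 26;
}

var_dump(isPangramRegex('We promptly judged antique ivory buckles for the next prize'));
var_dump(isPangramRegex('hello world'));
```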
I don't have any regex answer. But without regex you can try using PHP's count_chars function.
For example:
$test_string = 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz';
echo count(count_chars($test_string, 1));
Gives you 26 - which is the number of unique chars from $test_string with a frequency greater than zero.
Your current program will print pangram for all strings having 26+ letters, which means even aaa... is a pangram.
In your inner loop, you can just break if any character from a-z is not found:
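function is_pangram($str) {
// A string shorter than 26 characters cannot contain every letter.
if (strlen($str) < 26) return false;
$az = str_split('abcdefghijklmnopqrstuvwxyz');
foreach ($az as $char) {
// stripos is case-insensitive; bail out on the first missing letter.
if (stripos($str, $char) === false) {
return false;
}
}
// Every letter was found.
return true;
}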
A regex is not optimal in this situation. An alternative approach would be using array_map and substr_count.
Make an array of Booleans with length 26. Then you can loop through your string just once. In pseudo code (because I don't know PHP):
Boolean b[26]; // Initialized to false
count = 0;
Loop for each char c in string
if (not b[c]) then
++count;
b[c] = true
end
if (count == 26)
break; // All present;
end
end
// If count < 26 then not all present
You need to figure out how to make the character index into the array, but that shouldn't be too hard.
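One possible PHP rendering of the pseudocode above is sketched below. The function name isPangramSinglePass is an assumption, and the index is derived with ord(), which assumes plain ASCII input.

```php
<?php
// Hypothetical single-pass check: one boolean flag per letter a-z.
function isPangramSinglePass(string $text): bool
{
    $seen = array_fill(0, 26, false); // all flags start as false
    $count = 0;
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        // Map 'a'..'z' (after lowercasing) to 0..25; skip everything else.
        $index = ord(strtolower($text[$i])) - ord('a');
        if ($index < 0 || $index > 25) {
            continue;
        }
        if (!$seen[$index]) {
            $seen[$index] = true;
            if (++$count === 26) {
                return true; // all letters present, stop early
            }
        }
    }
    return false; // fewer than 26 distinct letters were seen
}

var_dump(isPangramSinglePass('We promptly judged antique ivory buckles for the next prize'));
```

The early return mirrors the break in the pseudocode: the loop stops as soon as the 26th distinct letter turns up.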

What is the Regex for Only One # Character?

I am looking for a Regex to use in PHP in order to match one character; the # symbol.
For example, if I typed: P#ssword into an input, the Regex will match. If I typed P##ssword into an input, the regex will not match.
Here is my PHP Code that I am using:
<?php
session_start();
if($_SERVER['REQUEST_METHOD'] == "POST") {
$conn=mssql_connect('d','dd','d');
mssql_select_db('d',$conn);
if(! $conn )
{
die('Could not connect: ' . mssql_get_last_message());
}
$username = ($_POST['username']);
$password = ($_POST['password']);
if (preg_match("[\W]",$_POST["password"]))
{
if (!preg_match("^[^#]*#[^#]*$",$_POST["password"]))
{
header("location:logingbm.php");
} else {
}
}
if(!filter_var($_POST['username'], FILTER_VALIDATE_EMAIL))
{
if ($_POST["username"])
{
if ($_POST["password"])
{
$result = mssql_query("SELECT * FROM staffportal WHERE email='".$username."' AND
password='".$password."'");
if(mssql_num_rows($result) > 0) {
$_SESSION['staff_logged_in'] = 1;
$_SESSION['username'] = $username;
}}}} else {
if ($_POST["password"])
{
$result = mssql_query("SELECT * FROM staffportal WHERE email='".$username."' AND
password='".$password."'");
if(mssql_num_rows($result) > 0) {
$_SESSION['staff_logged_in'] = 1;
$_SESSION['username'] = $username;
}}}}
if(!isset($_SESSION['staff_logged_in'])) {
header("location:logingbm.php");
echo "<script>alert('Incorrect log-in information!');</script>";
} else {
header("location:staffportal.php");
}
?>
Other lightweight approaches...
Without regex
Just use substr_count:
<?php
$str1 = "pa#s#s";
$str2 = "pa#ss";
echo (substr_count($str1,"#")==1)?"beauty\n":"abject\n"; // abject
echo (substr_count($str2,"#")==1)?"beauty\n":"abject\n"; // beauty
With regex
EDIT: just saw that Sam wrote something equivalent.
If you want to use regex, you could use this fairly simple regex:
#
How? With this code:
<?php
$str1 = "pa#s#s";
$str2 = "pa#ss";
$regex = "~#~";
echo (preg_match_all($regex,$str1,$m)==1)?"beauty\n":"abject\n"; // abject
echo (preg_match_all($regex,$str2,$m)==1)?"beauty\n":"abject\n"; // beauty
The easiest way would be to use the return value of preg_match_all().
Returns the number of full pattern matches (which might be zero), or FALSE if an error occurred.
Example:
$count = preg_match_all('/#/', $password, $matches);
Non-regex solution (based on @cdhowie's comment):
$string = 'P#ssword';
$length = strlen($string);
$count = 0;
for($i = 0; $i < $length; $i++) {
if($string[$i] === '#') {
$count++;
}
}
This works because you can access the characters of a string as you would the elements of a normal array (with $var = 'foo';, $var[0] is 'f').
As I said in my comment, your pattern needs delimiters /, #, ~ or whatever you want (see the PHP doc for that and test yourself).
To quickly check that a string contains exactly one #, you can do this:
if (preg_match('~\A[^#]*#[^#]*\z~', $yourstr))
echo 'There is one #';
else
echo 'There is more than one # or zero #';
This regexp will do what you want:
^[^#]*#[^#]*$
This matches any line that contains one and only one #.
Explanation
^ matches the beginning of the line
[^#]* matches everything before the #
# matches the # character
[^#]* matches everything after the #
$ matches the end of the line
Use
preg_match("~^[^#]*#[^#]*$~", $passwd); // Returns 1 if $passwd contains exactly one # character (~ is used as the delimiter because # appears inside the pattern)
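As a quick sanity check of the approaches above, the sketch below puts the regex check and the substr_count check side by side. Both function names are assumptions introduced for illustration.

```php
<?php
// Hypothetical helpers; both return true only for exactly one '#'.
function hasExactlyOneHashRegex(string $s): bool
{
    // ~ is the delimiter, so # can appear unescaped inside the pattern.
    return preg_match('~\A[^#]*#[^#]*\z~', $s) === 1;
}

function hasExactlyOneHashCount(string $s): bool
{
    return substr_count($s, '#') === 1;
}

foreach (['P#ssword', 'P##ssword', 'Password'] as $candidate) {
    echo $candidate, ': ',
        var_export(hasExactlyOneHashRegex($candidate), true), ' / ',
        var_export(hasExactlyOneHashCount($candidate), true), "\n";
}
```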
Here's what your regex code means:
If there is at least one non-word character in the string ([\W]), there must be exactly one # character. There may be any number of any other characters before and after the #: letters, digits, control characters, punctuation, anything. Anything but #.
What I'm wondering is, are you trying to say there can be no more than one # (i.e. zero or one)? That's pretty simple, conceptually; just get rid of the first regex check ("[\W]") and change the second regex to this:
"~^[^#]*(?:#[^#]*)?$~"
In other words:
Start by consuming any characters that are not #. If you see a #, go ahead and consume it, then resume matching whatever non-# characters remain. If that doesn't leave you at the end of the string, it can only mean there was more than one #. Abandon the attempt immediately and report a failed match.
Of course, this still leaves you with the problem of which other characters you want to allow. I'm pretty sure [^#]* is not what you want.
Also, "[\W]" may be working as you intended, but it's only by accident. You could have written it "/\W/" or "~\W~" or "(\W)" and it would work just the same. You may have meant those square brackets to form a character class, but they're not even part of the regex; they're the regex delimiters.
So why did it work, you ask? \W is a predefined character class, equivalent to [^\w]. You can use it inside a regular character class, but it works fine on its own.
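For what it's worth, the "zero or one #" idea discussed above could be tried out with a sketch like this. The function name hasAtMostOneHash is hypothetical, and ~ is used as the delimiter so that # needs no escaping.

```php
<?php
// Hypothetical helper: true when the string contains zero or one '#'.
function hasAtMostOneHash(string $s): bool
{
    // Optional group: one '#' followed by any run of non-'#' characters.
    return preg_match('~\A[^#]*(?:#[^#]*)?\z~', $s) === 1;
}

var_dump(hasAtMostOneHash('Password'));
var_dump(hasAtMostOneHash('P#ssword'));
var_dump(hasAtMostOneHash('P##ssword'));
```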

filtering words from text with exploits

I have a filter which catches bad words like 'ass', 'fuck', etc. Now I am trying to handle exploits like "f*ck" and "sh/t".
One thing I could do is match each word against a dictionary of bad words that includes such exploits. But this is pretty static and not a good approach.
Another thing I could do is use the Levenshtein distance: words with a Levenshtein distance of 1 would be blocked. But this approach is also prone to false positives.
if(!ctype_alpha($text)&& levenshtein('shit', $text)===1)
{
//match
}
I am looking for some way of using regex. Maybe I can combine the Levenshtein distance with regex, but I could not figure it out.
Any suggestion is greatly appreciated.
As stated in the comments, it is hard to get this right. This snippet, far from perfect, will check for matches where letters are substituted for the same number of other characters.
It may give you a general idea of how you could solve this, although much more logic is needed if you want to make it smarter. This filter, for instance will not filter 'fukk', 'f ck', 'f**ck', 'fck', '.fuck' (with leading dot) or 'fück', while it does probably filter out '++++' to replace it with 'beep'. But it also filters 'f*ck', 'f**k', 'f*cking' and 'sh1t', so it could do worse. :)
An easy way to make it better, is to split the string in a smarter way, so punctuation marks aren't glued to the word they are adjacent to. Another improvement could be to remove all non-alphabetic characters from each word, and check if the remaining letters are in the same order in a word. That way, 'f\/ck' would also match 'fuck'. Anyway, let your imagination run wild, but be careful for false positives. And trust me that 'they' will always find a way to express themselves in a way that bypasses your filter.
<?php
$badwords = array('shit', 'fuck');
$text = 'Man, I shot this f*ck, sh/t! fucking fucker sh!t fukk. I love this. ;)';
$words = explode(' ', $text);
// Loop through all words.
foreach ($words as $word)
{
$naughty = false;
// Match each bad word against each word.
foreach ($badwords as $badword)
{
// If the word is shorter than the bad word, it's okay.
// It may be bigger. I've done this mainly, because in the example given,
// 'f*ck,' will contain the trailing comma. This could be easily solved by
// splitting the string a bit smarter. But the added benefit, is that it also
// matches derivatives, like 'f*cking' or 'f*cker', although that could also
// result in more false positives.
if (strlen($word) >= strlen($badword))
{
$wordOk = false;
// Check each character in the string.
for ($i = 0; $i < strlen($badword); $i++)
{
// If the letters don't match, and the letter is an actual
// letter, this is not a bad word.
if ($badword[$i] !== $word[$i] && ctype_alpha($word[$i]))
{
$wordOk = true;
break;
}
}
// If the word is not okay, break the loop.
if (!$wordOk)
{
$naughty = true;
break;
}
}
}
// Echo the censored word.
echo $naughty ? 'beep ' : ($word . ' ');
}
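The improvement mentioned earlier (strip the non-alphabetic characters from a word, then check whether the remaining letters appear in the same order in a bad word) might be sketched roughly as follows. The function names and the extra "at least half of the letters remain" threshold are assumptions added here to cut down on false positives; they are not part of the original snippet, and short inputs can still be flagged wrongly.

```php
<?php
// Hypothetical helper: true if the characters of $needle occur in $haystack
// in the same order (not necessarily adjacent).
function isSubsequence(string $needle, string $haystack): bool
{
    $pos = 0;
    $len = strlen($haystack);
    for ($i = 0; $i < strlen($needle); $i++) {
        while ($pos < $len && $haystack[$pos] !== $needle[$i]) {
            $pos++;
        }
        if ($pos === $len) {
            return false; // ran out of haystack before finding this letter
        }
        $pos++;
    }
    return true;
}

// Hypothetical check for a word with letters swapped for other characters.
function looksObfuscated(string $word, string $badword): bool
{
    $letters = preg_replace('/[^a-z]/', '', strtolower($word));
    // Only consider words that contained at least one non-letter.
    $hadSubstitution = strlen($letters) < strlen($word);
    // Assumed threshold: at least half of the bad word's letters must remain.
    $enoughLetters = strlen($letters) * 2 >= strlen($badword);
    return $hadSubstitution && $enoughLetters && isSubsequence($letters, $badword);
}

var_dump(looksObfuscated('f\/ck', 'fuck'));
var_dump(looksObfuscated('this.', 'shit'));
```

Words made only of letters are deliberately left to the exact-match filter, so this check only adds coverage for the substituted forms.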

PHP regular expression allowing at most 1 '.' or '_' character in string, and '.' or '_' can't be at beginning or end of string

I am writing a PHP validation for a user registration form. I have a function set up to validate a username which uses Perl-compatible regular expressions. How can I edit it so that one of the requirements of the regular expression is AT MOST a single . or _ character, while NOT allowing that character at the beginning or end of the string? So, for example, things like "abc.d", "nicholas_smith", and "20z.e" would be valid, but things like "abcd.", "a_b.C", and "_nicholassmith" would all be invalid.
This is what I currently have but it does not add in the requirements about . and _ characters.
function isUsernameValid()
{
if(preg_match("/^[A-Za-z0-9_\.]*(?=.{5,20}).*$/", $this->username))
{
return true; //Username is valid format
}
return false;
}
Thank you for any help you may bring.
if (preg_match("/^[a-zA-Z0-9]+[._]?[a-zA-Z0-9]+$/", $this->username)) {
// there is at most one . or _, and it's not at the beginning or end
}
You can combine this with the string length check:
function isUsernameValid() {
$length = strlen($this->username);
if (5 <= $length && $length <= 20
&& preg_match("/^[a-zA-Z0-9]+[._]?[a-zA-Z0-9]+$/", $this->username)) {
return true;
}
return false;
}
You could probably do the whole lot using just one regex, but it would be much harder to read.
You can use the following pattern; I have divided it into multiple lines to make it more understandable:
$pattern = "";
$pattern.= "%^"; // Start pattern, anchored to the start of the string.
$pattern.= "[a-z0-9]+"; // Some alphanumeric chars, at least one.
$pattern.= "[._]?"; // At most one "_" or one ".".
$pattern.= "[a-z0-9]+"; // Some alphanumeric chars, at least one.
$pattern.= '$%i'; // Anchor to the end of the string; end pattern, case-insensitive.
And then you can use this pattern in your function/method:
function isUsernameValid() {
// $pattern is defined here
return preg_match($pattern, $this->username) > 0;
}
Here is a commented, tested regex which also enforces the additional (unspecified but implied) length requirement of 5 to 20 characters:
function isUsernameValid($username) {
if (preg_match('/ # Validate User Registration.
^ # Anchor to start of string.
(?=[a-z0-9]+(?:[._][a-z0-9]+)?\z) # One inner dot or underscore max.
[a-z0-9._]{5,20} # Match from 5 to 20 valid chars.
\z # Anchor to end of string.
/ix', $username)) {
return true;
}
return false;
}
Note: No need to escape the dot inside the character class.
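To see how the single-regex version treats the examples from the question, here is a compact sketch with the same pattern written on one line. The function name isUsernameFormatValid is an assumption used to avoid clashing with the functions above.

```php
<?php
// Hypothetical one-line form of the commented regex above.
function isUsernameFormatValid(string $username): bool
{
    // Look-ahead: alphanumerics, optionally one inner . or _, then end of string.
    // Main part: 5 to 20 characters from the allowed set.
    $pattern = '/^(?=[a-z0-9]+(?:[._][a-z0-9]+)?\z)[a-z0-9._]{5,20}\z/i';
    return preg_match($pattern, $username) === 1;
}

$samples = ['abc.d', 'nicholas_smith', '20z.e', 'abcd.', 'a_b.C', '_nicholassmith'];
foreach ($samples as $sample) {
    echo $sample, ' => ', isUsernameFormatValid($sample) ? 'valid' : 'invalid', "\n";
}
```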
