Regex Challenge - either ... or - php

I haven't been able to figure this one out.
I need to match all of the following strings by matching whole together with its surrounding underscores (in one regex statement):
whole_anything
anything_whole
anything_whole_anything
but it must NOT match this
anythingwholeanything
anything_wholeanything
anythingwhole_anything
That means: write a regex that matches the phrase whole only if it has an underscore before it, after it, or both, and not if there are no underscores.
The following
preg_match("/(whole_|_whole_|_whole)/", $string)
is not a solution ;)
2015/02/09 Edit: added conditions 5. and 6. for clarification

You could reduce the number of cases in the alternatives:
preg_match('/(_whole_?|whole_)/', $string);
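If there's an underscore before, the underscore after is optional. But if there's no underscore before, the underscore after is required. Note that this only checks for an underscore on one side, so it still matches anything_wholeanything and anythingwhole_anything.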
You can use a PHP variable to solve the problem of putting the word twice:
$word = preg_quote('whole', '/');
preg_match("/(_{$word}_?|{$word}_)/", $string);

Another alternative. This way we check for the existence of a word boundary or _ both before and after whole, but we exclude the word whole by itself through a negative lookahead.
(?!\bwhole\b)((?:_|\b)whole(?:_|\b))
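A quick way to check a candidate pattern against all six cases from the question is a small test loop. The sketch below uses the lookaround pattern above; the function name matchesWhole and the layout of the $cases array are illustrative choices, not part of the original answer.

```php
<?php
// Hypothetical helper: true if "whole" has an underscore on at least one side
// and an underscore or word boundary on the other side.
function matchesWhole(string $subject): bool
{
    $pattern = '/(?!\bwhole\b)((?:_|\b)whole(?:_|\b))/';
    return preg_match($pattern, $subject) === 1;
}

// The six cases from the question, mapped to the expected result.
$cases = [
    'whole_anything'          => true,
    'anything_whole'          => true,
    'anything_whole_anything' => true,
    'anythingwholeanything'   => false,
    'anything_wholeanything'  => false,
    'anythingwhole_anything'  => false,
];

foreach ($cases as $subject => $expected) {
    $actual = matchesWhole($subject);
    printf("%-24s %s\n", $subject, $actual === $expected ? 'ok' : 'MISMATCH');
}
```

Swapping a different pattern into matchesWhole makes it easy to see which of the six conditions a given answer actually satisfies.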

You could exclude all alphanumeric characters before and after. Unfortunately you can't use \w because _ is considered a word character:
([^a-zA-Z0-9])_?whole_?([^a-zA-Z0-9])
That excludes alphanumeric characters immediately before and after, while the underscore in front, behind, or both is optional. If no underscore exists, it can't match when whole is preceded or followed by a letter or number. Note that, as written, the pattern requires a non-alphanumeric character on both sides, so it will not match when whole_ starts the string or _whole ends it. You could change the character classes to include special characters and the like.

Related

preg_match wildcard, require at least one character

The preg_match below matches 'empty' (0 characters) against the wildcard. I want to disable that:
preg_match('/site.com\/subsection\/.*?/', $page_url);
So the thing above should match site.com/subsection/subpage, but shouldn't match the root dir site.com/subsection/
How can I adjust the regex above? Thanks in advance!
The .*? at the end of the pattern can match an empty string. To require one or more characters, use .+ instead:
'/site\.com\/subsection\/.+/'
Now it requires at least one character after site.com/subsection/.
Note the dot must be escaped to match a literal dot.
Also, it might be a good idea to use regex delimiters other than / (as OcuS suggests in the comments below) if you have many slashes in the pattern itself. I usually use tildes:
'~site\.com/subsection/.+~'
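As a minimal sketch of the difference, the snippet below wraps the tilde-delimited pattern in a helper and runs it against the two URLs from the question. The function name isSubpage is an assumption made for illustration.

```php
<?php
// Hypothetical helper: true only when something follows "site.com/subsection/".
function isSubpage(string $pageUrl): bool
{
    // "~" as delimiter avoids escaping the slashes; ".+" needs at least one character.
    return preg_match('~site\.com/subsection/.+~', $pageUrl) === 1;
}

var_dump(isSubpage('site.com/subsection/subpage')); // bool(true)
var_dump(isSubpage('site.com/subsection/'));        // bool(false)
```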

regexp need to find word with minus and others

I currently have a few types of code that I need to find and replace with a regex:
{word1/G_KP8zXsDp8/word2}
{word1/GKP8zXsDp8/word2}
{word1/G-KP8zXsDp8/word2}
My pattern right now is /({word1\/)(\w+)\/(\w+)/. It finds the 1st and 2nd cases but not the 3rd, and I need the code to end up in match[2] ($2).
The problem is that \w doesn't match a hyphen by default; it is shorthand for the character class [a-zA-Z0-9_].
To fix this, update your regex to include - as well: use [\w-]+ instead of \w+.
/({word1\/)([\w-]+)\/(\w+)/
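To see the effect on the three sample inputs, here is a small sketch. The helper name extractCode is hypothetical, and the opening brace is escaped as \{ purely for clarity; that escape is an addition, not part of the pattern in the answer.

```php
<?php
// Hypothetical helper: returns the middle segment (capture group 2), or null
// when the text does not contain the expected {word1/.../word2 shape.
function extractCode(string $text): ?string
{
    // [\w-]+ accepts letters, digits, underscore and hyphen.
    $pattern = '/(\{word1\/)([\w-]+)\/(\w+)/';
    return preg_match($pattern, $text, $m) === 1 ? $m[2] : null;
}

$samples = [
    '{word1/G_KP8zXsDp8/word2}',
    '{word1/GKP8zXsDp8/word2}',
    '{word1/G-KP8zXsDp8/word2}',
];

foreach ($samples as $sample) {
    var_dump(extractCode($sample));
}
```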

Strip trailing non-word character(s)

I need to strip any non-alphanumeric characters from the end of strings using PHP's preg_replace:
Word One, Two, -
Word One, Two,[space]
Word One, Two,
Word One, Two
should all become Word One, Two.
I have tried preg_replace('/(.+)\\W+$/', '$1', 'Word One, Two, -'); but this only strips the last non-word character. I also tried '/(.+)\\W*$/' as I assumed this would make it work if 0 or 1 non-word characters are found (as I need) but it then doesn't match at all. I think I need to make the \W greedy but I'm not sure how. Any ideas? Also, please feel free to explain to me what I am doing wrong so I don't find myself haunting the SO regex tag ;-)
This is because (.+) eats up all other characters, including non-word characters. The regex engine starts matching the string and starts out with all characters in the capturing group. Only then does it notice that the \W at the end of the string won't fit, so it backs up, tentatively allowing a single character to be matched by the \W. But a single character is all that's needed to satisfy the \W+, so it stops there and strips only that single character. That's also the reason why (.+)\W*$ doesn't work at all: \W* is content with matching nothing.
Use
preg_replace('/\\W+$/', '', $foo);
instead. This avoids the problem by just replacing trailing non-word characters without even trying to match something else.
Another option would be
preg_replace('/(.+?)\\W+$/', '$1', $foo);
which would use a lazy quantifier (+?) for the capturing group. This quantifier tries satisfying the match while matching as little as possible (as opposed to + which tries to match as much as possible as we saw above). But generally I'd avoid replacing parts of the match by themselves if you can avoid it. To strip things from a string you certainly don't need to match more than you need to strip.
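The three variants discussed here can be compared side by side. This is a minimal sketch; the helper name stripTrailingNonWord and the variable names are illustrative only.

```php
<?php
// Hypothetical helper using the simplest pattern: delete the trailing run of
// non-word characters and touch nothing else.
function stripTrailingNonWord(string $input): string
{
    return (string) preg_replace('/\W+$/', '', $input);
}

$input = 'Word One, Two, -';

// Greedy group: (.+) grabs everything, then gives back a single character.
$greedy = preg_replace('/(.+)\W+$/', '$1', $input);

// Lazy group: (.+?) expands only as far as needed, leaving the whole tail to \W+.
$lazy = preg_replace('/(.+?)\W+$/', '$1', $input);

$plain = stripTrailingNonWord($input);

var_dump($greedy); // string(15) "Word One, Two, "
var_dump($lazy);   // string(13) "Word One, Two"
var_dump($plain);  // string(13) "Word One, Two"
```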
What your regex is doing is looking for the maximum possible amount of any character, while still keeping at least one non-word at the end.
What you need to do is just drop the (.+), and use:
preg_replace("/\W+$/","",$input);

Regular expression for validating a username?

I'm still kinda new to using Regular Expressions, so here's my plight. I have some rules for acceptable usernames and I'm trying to make an expression for them.
Here they are:
1-15 Characters
a-z, A-Z, 0-9, and spaces are acceptable
Must begin with a-z or A-Z
Cannot end in a space
Cannot contain two spaces in a row
This is as far as I've gotten with it.
/^[a-zA-Z]{1}([a-zA-Z0-9]|\s(?!\s)){0,14}[^\s]$/
It works, for the most part, but doesn't match a single character such as "a".
Can anyone help me out here? I'm using PCRE in PHP if that makes any difference.
Try this:
/^(?=.{1,15}$)[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*$/
The look-ahead assertion (?=.{1,15}$) checks the length and the rest checks the structure:
[a-zA-Z] ensures that the first character is an alphabetic character;
[a-zA-Z0-9]* allows any number of following alphanumeric characters;
(?: [a-zA-Z0-9]+)* allows any number of sequences of a single space (not \s that allows any whitespace character) that must be followed by at least one alphanumeric character (see PCRE subpatterns for the syntax of (?:…)).
You could also remove the look-ahead assertion and check the length with strlen.
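A minimal sketch of that strlen variant follows. The function name isValidUsername is hypothetical, and the D modifier is an addition not present in the answer: it keeps $ from matching before a trailing newline, which the length check alone would let through.

```php
<?php
// Hypothetical helper combining the structural pattern with a strlen() check.
function isValidUsername(string $name): bool
{
    // Structure only: starts with a letter, then alphanumeric runs separated
    // by single spaces. The D modifier is an assumption added for this sketch.
    $structure = '/^[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*$/D';
    $length = strlen($name);

    return $length >= 1 && $length <= 15 && preg_match($structure, $name) === 1;
}

var_dump(isValidUsername('a'));           // bool(true)
var_dump(isValidUsername('John Smith'));  // bool(true)
var_dump(isValidUsername('John  Smith')); // bool(false): two spaces in a row
var_dump(isValidUsername('John '));       // bool(false): ends in a space
var_dump(isValidUsername('1abc'));        // bool(false): starts with a digit
```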
Make everything after your first character optional:
^[a-zA-Z](?:(?:[a-zA-Z0-9]|\s(?!\s)){0,13}[^\s])?$
The main problem with your regexp is that it needs at least two characters to have a match:
one for the [a-zA-Z]{1} part
one for the [^\s] part
Besides this problem, I see some parts of your regexp that could be improved:
The [^\s] class will match any character except whitespace, so a dot or semicolon will be accepted. Try using the [a-zA-Z0-9] class here to ensure the character is a valid one.
You can delete the {1} at the beginning, as the regexp matches exactly one character by default.

Check a variable using regex

I'm about to create a registration form for my website. I need to check the variable and accept it only if it contains letters, numbers, _ or -.
How can I do it with regex? I used to work with them in preg_replace(), but I think that doesn't apply here. Also, I know that the "ereg" function is dead. Any solutions?
This regex is pretty common these days:
if(preg_match('/^[a-z0-9\-\_]+$/i',$username))
{
// Ok
}
Use preg_match:
preg_match('/^[\w-]+$/D', $str)
Here \w describes letters, digits and the _, so [\w-]+ matches one or more letters, digits, _, and -. ^ and $ are so-called anchors that denote the beginning and end of the string respectively. The D modifier ensures that $ matches only at the very end of the string and not before a trailing line break.
Note that the letters and digits that are matched by \w depend on the current locale and might include other letters or digits than just [a-zA-Z0-9]. So if you just want these, use them explicitly. And if you want to allow more than these, you could also try character classes that are described by Unicode character properties, like \p{L} for all Unicode letters.
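The effect of the D modifier is easiest to see with an input that ends in a newline. A short sketch, with isValidIdentifier as a hypothetical wrapper name:

```php
<?php
// Hypothetical wrapper around the pattern above, including the D modifier.
function isValidIdentifier(string $str): bool
{
    return preg_match('/^[\w-]+$/D', $str) === 1;
}

$withNewline = "user_1\n";

// Without D, $ also matches just before a final newline.
var_dump(preg_match('/^[\w-]+$/', $withNewline)); // int(1)

// With D, $ matches only at the very end of the string.
var_dump(isValidIdentifier($withNewline));        // bool(false)
var_dump(isValidIdentifier('user_1'));            // bool(true)
```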
Try preg_match(). http://php.net/manual/en/function.preg-match.php
