Regular Expressions PHP - php

I'm new to regex in PHP, and I want to know how to match words that are equal or similar to each other.
Example:
I have the words "designer" and "design". Searching for "designer" in "design" returns false, but searching for "design" in "designer" returns a match. I need to match both cases using one preg_match statement.
Can anyone help me?

I believe you are looking for stemming:
http://en.wikipedia.org/wiki/Stemming
If you are only looking to match on those two words then do as nickb suggested and keep it simple. If you are seeking to replicate this matching on many words then you could use this PorterStemmer class: http://tartarus.org/~martin/PorterStemmer/php.txt

What I think you're looking for is an optional match:
/design(?:er)?/
The parentheses group the "er", "?:" makes the group non-capturing, and the "?" that follows makes the group optional.
In more general terms, if you want to capture a word or any longer version of that word:
/design\w*/
That matches on "design" and zero or more ("*") word characters ("\w").
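The same idea can be wrapped in a small function so that any base word works, not only "design". This is a minimal sketch; the helper name matchesStem() is hypothetical, and it assumes each word is tested on its own rather than searched for inside a longer text:

```php
<?php
// Hypothetical helper: true if $word equals $stem or is a longer form of it.
function matchesStem(string $stem, string $word): bool
{
    // preg_quote() escapes regex metacharacters in the stem;
    // ^ and $ force the whole word to match; \w* allows any suffix.
    $pattern = '/^' . preg_quote($stem, '/') . '\w*$/i';
    return preg_match($pattern, $word) === 1;
}

var_dump(matchesStem('design', 'design'));   // bool(true)
var_dump(matchesStem('design', 'designer')); // bool(true)
var_dump(matchesStem('design', 'desk'));     // bool(false)
```

Note that \w* accepts any suffix, so "designated" would also count as a match. If that is too loose, the stemming approach from the other answer is the stricter option.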

Related

How to match these strings using preg_match

preg_match('/"\'<>&/', 'misiek"')
Why doesn't it work?
As stated in the comments, it does exactly what you told it to do. Your pattern checks whether the string contains the exact substring "'<>& anywhere (inside the single-quoted PHP string, \' is just an escaped ').
So with your pattern, the following strings would result in a match:
"'<>&
LOREM "'<>& IPSUM
Since both of these include the pattern you searched for. However, LO"R'EM<>IPS&UM would not return a match, because you are not checking for the individual characters, only the complete pattern.
If you change your pattern to:
/["\'<>&]/
You instead look for a list of characters. This will return true if any of the characters wrapped in brackets are found.
misiek - would in this case not match
LO"R'EM<>IPS&UM - would match
mis&iek - would match
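The difference between the two patterns can be checked side by side. A short sketch, where the variable names and the list of inputs are only for illustration:

```php
<?php
$literal   = '/"\'<>&/';   // the five characters as one exact sequence
$charClass = '/["\'<>&]/'; // any single one of the five characters

$inputs = ['misiek', 'misiek"', 'mis&iek', 'LO"R\'EM<>IPS&UM', 'LOREM "\'<>& IPSUM'];

foreach ($inputs as $input) {
    // preg_match() returns 1 on a match and 0 otherwise.
    printf(
        "%-20s literal=%d class=%d\n",
        $input,
        preg_match($literal, $input),
        preg_match($charClass, $input)
    );
}
```

Only the last input contains the full sequence, so it is the only one the literal pattern accepts. The character class accepts every input except plain misiek.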
You can test your regex patterns as well as build them on this site:
https://regex101.com
There you'll also find the available modifiers you can use and how / why to use them.
Good luck!
I am guessing: could it be that you want to match a string containing at least one of the characters listed in your regular expression? In that case you should do the following:
$res=preg_match('/["\'<>&]/' , 'misiek"');
And the result should be positive ($res===1), see here:
http://rextester.com/KYNGYI23753

Regex lookbehind position of words

The thing was not very nice.
My regex so far:
(?<!not\s)(?<!n't\s)(nice|friendly|excelent|comfortable|easy access|good|clean|beautiful)
I want to match the adjectives (nice, friendly, ...) only if the sentence does not contain the words "not" or "n't" (as in wasn't, isn't), i.e. only in positive sentences.
But this regex works only for sentences like:
The thing was very not nice.
How can I write a lookbehind that checks that "not" or "n't" does not appear at any position before my adjectives?
You could use a negative look ahead to check there isn't any not or n't anywhere in the string, and \K to throw out the part you don't want:
^(?!.*(?:not|n't)).*\K(nice|friendly|excellent|comfortable|easy access|good|clean|beautiful)
(?!...) will fail if what's inside it matches, (?:not|n't) is a non capturing group.
Keep in mind that this is a pretty simple check, though. It wouldn't match nice in "This is nice but not pretty." If you want a more in-depth understanding of the syntax, you'll have to dig deeper.
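To see how the lookahead pattern behaves, here is a small test run. The sample sentences are made up for illustration:

```php
<?php
// Same pattern as above, written as a single-quoted PHP string (hence n\'t).
$pattern = '/^(?!.*(?:not|n\'t)).*\K(nice|friendly|excellent|comfortable|easy access|good|clean|beautiful)/';

$sentences = [
    'The thing was very nice.',
    'The thing was not very nice.',
    "The thing wasn't very nice.",
    'This is nice but not pretty.',
];

foreach ($sentences as $sentence) {
    // Because of \K, $m[0] holds only the adjective, not the text before it.
    $found = preg_match($pattern, $sentence, $m) === 1 ? $m[0] : '(no match)';
    echo $sentence, ' => ', $found, "\n";
}
```

Only the first sentence yields a match. Because the lookahead searches for the bare letters "not", words such as "nothing" or "another" would also block a match; adding word boundaries (\bnot\b) is one possible refinement.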
Lookbehinds must be fixed-length.
It would perhaps be easier to divide the task. First check if the sentence contains "nice", "friendly" etc., and then in a separate conditional, check if it doesn't contain a negation.
This would make it easier to detect double-negatives too :p
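The two-step idea could look roughly like this. It is only a sketch: the function name positiveAdjective() is hypothetical, and the negation check is deliberately naive:

```php
<?php
// Hypothetical helper: returns the adjective found in a sentence,
// or null if there is no adjective or the sentence contains a negation.
function positiveAdjective(string $sentence): ?string
{
    $adjectives = '/\b(nice|friendly|excellent|comfortable|easy access|good|clean|beautiful)\b/i';
    $negation   = '/\bnot\b|n\'t\b/i';

    // Step 1: is there an adjective at all?
    if (preg_match($adjectives, $sentence, $m) !== 1) {
        return null;
    }
    // Step 2: reject the sentence if it contains "not" or "...n't".
    if (preg_match($negation, $sentence) === 1) {
        return null;
    }
    return $m[1];
}

var_dump(positiveAdjective('The thing was very nice.'));     // string(4) "nice"
var_dump(positiveAdjective('The thing was not very nice.')); // NULL
var_dump(positiveAdjective("It wasn't clean."));             // NULL
```

Keeping the two checks separate means the negation rule can later be replaced, for example by counting negations to handle double negatives, without touching the adjective list.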

Regex expression to match one of many strings in php

I am totally new to regex. I want to check whether the value is any one of the following:
cs,ch,es,ee,it,me
So far I have tried:
if (preg_match("/^[cs|ch|es|ee|it|me]{2}$/",$val))
echo "true";
else
echo "false";
It works fine for the true cases, but it also returns true for reversed values such as sc, hc, etc.
Also, it would be really helpful if you could point me to some good sources/books for learning regex in PHP.
Remove the character class [] from your regex and wrap the alternatives in () instead. Also remove the {2}, as it's not necessary anymore.
if (preg_match("/^(cs|ch|es|ee|it|me)$/",$val))
That should do it.
You need to use () instead of []
/^(cs|ch|es|ee|it|me)$/
Note: when using parentheses, do not use {2}.
So your final code is:
if (preg_match("/^(cs|ch|es|ee|it|me)$/",$val))
echo "true";
else
echo "false";
To learn regex for PHP, I suggest this book; it's a good one for quick reference. Or refer to this question for more.
You must use the grouping delimiters (parentheses). The character class delimiters (square brackets) are used for matching ranges of characters.
/^(cs|ch|es|ee|it|me)$/
If you only use the regular expressions to match something (and not capture anything) then you can use the (?:) grouping.
/^(?:cs|ch|es|ee|it|me)$/
One of the better websites for learning regular expressions is regular-expressions.info.
Do you know what [] does?
Let's take an example: [abcdef]
It matches any one of the letters listed in the square brackets. Suppose you provide: ^[cs|ch|es|ee|it|me]{2}$
The class then matches a single character from the set cs|heitm
You can repeat a letter inside the brackets as often as you like, but it only counts once.
So, with {2}, the pattern matches any two-character string made up of the characters cs|heitm
It will therefore match cs, hs, |s, etc.
Hope that makes sense :)
The corrected regex should be
/^(cs|ch|es|ee|it|me)$/
This matches the exact literal words rather than individual letters.
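The difference is easy to verify by running both patterns over a few values. A quick sketch; the variable names and test values are only for illustration:

```php
<?php
$wrong = '/^[cs|ch|es|ee|it|me]{2}$/'; // character class: any two of c s | h e i t m
$right = '/^(?:cs|ch|es|ee|it|me)$/';  // alternation: exactly one of the six codes

foreach (['cs', 'it', 'sc', 'hc', '|s', 'xx'] as $val) {
    printf(
        "%-3s class=%d group=%d\n",
        $val,
        preg_match($wrong, $val),
        preg_match($right, $val)
    );
}
```

The character class accepts everything except xx, while the alternation accepts only cs and it from this list. For a fixed list of literal values like this, in_array($val, ['cs', 'ch', 'es', 'ee', 'it', 'me'], true) is an alternative that avoids regex altogether.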

Match words in a string which are not in anchor of link with regex

I'm trying to find some words (or expressions, e.g. two words) in a string, but only where they are not inside the anchor text of a link (the string contains HTML code and is usually UTF-8 encoded). The plan is to replace those words with links afterwards.
I'm not really good with regex. I've searched the web and Stack Overflow and found two regex patterns that help me, but each of them has an issue. I'm hoping someone can help me combine the two into one that works.
First pattern: /('.$tag.')(?![^<]*<\/a>)/is
This pattern finds the words, but if, for example, I'm trying to find "express" in the string:
In computing, a regular expression provides a concise and flexible means...
...I don't expect to find a match, yet a match is found inside the word "expression".
Second pattern: \'(?!((<.*?)|(<a.*?)))(\b'.$tag.'\b)(?!(([^<>]*?)>)|([^>]*?</a>))\'is
This pattern doesn't have the previous issue, but if the word or expression I'm trying to find ends with a special UTF-8 character, I don't get a match.
Example word: apă
Example string: ...care transformă umiditatea din aer în apă potabilă. Dacă iniţial a fost creată pentru situaţia ţărilor...
Assuming the second regular expression works for you (I haven't tested it, and I really don't think you should use regexes for this kind of thing), all you need to do is add the u modifier, as @hakre said:
\'(?!((<.*?)|(<a.*?)))(\b'.$tag.'\b)(?!(([^<>]*?)>)|([^>]*?</a>))\'isu
Personally, I'd use DOMDocument for this task.
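A rough sketch of what the DOMDocument route could look like. This is not the answerer's code: the function name linkTagOutsideAnchors(), the wrapper div and the $url parameter are all hypothetical, and it assumes the input is a UTF-8 HTML fragment:

```php
<?php
// Hypothetical helper: wraps each whole-word occurrence of $tag in a link,
// skipping any text that already sits inside an <a> element.
function linkTagOutsideAnchors(string $html, string $tag, string $url): string
{
    libxml_use_internal_errors(true); // tolerate imperfect HTML
    $doc = new DOMDocument();
    // The XML prolog is a common trick to make libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Unicode-aware whole-word match: no letter, digit or _ on either side.
    $pattern = '/(?<![\p{L}\p{N}_])(' . preg_quote($tag, '/') . ')(?![\p{L}\p{N}_])/iu';

    // Copy the node list first, because the loop modifies the tree.
    $textNodes = iterator_to_array(
        $xpath->query('//div[@id="wrap"]//text()[not(ancestor::a)]')
    );
    foreach ($textNodes as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no occurrence in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) { // odd indexes hold the captured word
                $a = $doc->createElement('a');
                $a->setAttribute('href', $url);
                $a->appendChild($doc->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($xpath->query('//div[@id="wrap"]')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo linkTagOutsideAnchors(
    'express and <a href="#">express</a> expression',
    'express',
    '/express-page'
);
```

The XPath expression does the "not inside an anchor" filtering, so the regex only has to handle the whole-word check. The explicit \p{L} lookarounds with the u modifier are used instead of \b so that words ending in non-ASCII letters, such as apă, are treated as whole words.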

Regex - Match ( only ) words with mixed chars

I'm writing my anti-spam/bad-words filter and I need, if possible,
to match (detect) only words formed from mixed characters, like fr1&nd$ and not friends.
Is this possible with regex?
Best regards!
Of course it's possible with regex! You're not asking to match nested parentheses! :P
But yes, this is the kind of thing regular expressions were built for. An example:
/\S*[^\w\s]+\S*/
This will match all of the following:
#ss
as$
a$s
#$s
a$$
#s$
#$$
It will not match this:
ass
Which I believe is what you want. How it works:
\S* matches 0 or more non-space characters. [^\w\s]+ matches only symbols (anything that is neither a word character nor whitespace) and requires 1 or more of them, so at least one symbol must be present. The trailing \S* again matches 0 or more non-space characters (symbols and letters).
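In PHP, the pattern can be tried with preg_match_all(). A minimal sketch using the sample string from the question:

```php
<?php
$str = 'fr1&nd$ and not friends';

// Collect every whitespace-delimited word that contains at least one symbol.
preg_match_all('/\S*[^\w\s]+\S*/', $str, $matches);

print_r($matches[0]); // the only entry is "fr1&nd$"
```

Ordinary punctuation counts as a symbol here, so words like "don't" or a word followed by a comma would be reported too. A filter built on this pattern would probably need to strip or whitelist such punctuation first.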
If I may be allowed to suggest a better strategy, in Perl you can store a regex in a variable. I don't know if you can do this in PHP, but if you can, you can construct a list of variables like such:
$a = /[aA#]/ # regex that matches all a-like symbols
$b = /[bB]/
$c = /[cC(]/
# etc...
Or:
$regex = array( 'a' => /[aA#]/, 'b' => /[bB]/, 'c' => /[cC(]/, ... );
So that way, you can match "friend" in all its permutations with:
/$f$r$i$e$n$d/
Or:
/$regex['f']$regex['r']$regex['i']$regex['e']$regex['n']$regex['d']/
Granted, the second one looks unnecessarily verbose, but that's PHP for you. I think the second one is probably the best solution, since it stores them all in a hash, rather than all as separate variables, but I admit that the regex it produces is a bit ugly.
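PHP has no regex literals, but the same approach works with plain strings: keep one character class per letter in an array and concatenate them into a pattern. A sketch under that assumption; the $lookalikes map and the buildPattern() helper are hypothetical and would need to be extended for real use:

```php
<?php
// Hypothetical map from a letter to a character class of look-alikes.
$lookalikes = [
    'f' => '[fF]',
    'r' => '[rR]',
    'i' => '[iI1!]',
    'e' => '[eE3&]',
    'n' => '[nN]',
    'd' => '[dD]',
];

// Builds a pattern for $word by replacing each letter with its class.
function buildPattern(string $word, array $map): string
{
    $pattern = '';
    foreach (str_split($word) as $letter) {
        // Letters without an entry in the map are matched literally.
        $pattern .= $map[$letter] ?? preg_quote($letter, '/');
    }
    return '/' . $pattern . '/';
}

$pattern = buildPattern('friend', $lookalikes);

var_dump(preg_match($pattern, 'fr1&nd$')); // int(1)
var_dump(preg_match($pattern, 'FRIEND'));  // int(1)
var_dump(preg_match($pattern, 'fried'));   // int(0)
```

Building the pattern from a map keeps the list of substitutions in one place, so a new look-alike character only has to be added once.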
It is possible, you will not have very pretty regex rules, but you can match basically any pattern that you can describe using regex. The tricky part is describing it.
I would guess that you would have a bunch of regex rules to detect bad words like so:
To detect fr1&nd$, friends, fr**nd* you can use a regex like the following (note that * must be listed in every class where it may stand in for a letter):
/fr[1iI*][&eE*]nd[s$Sz*]/
Doing something like this for each rule will find all the variations of possible characters in the brackets. Pick up a regex guide for more info.
(I'm assuming for a badwords filter you would want friend as well as frie**, you may want to mask the bad word as well as all possible permutations)
Didn't test this thoroughly, but this should do it:
(\w+)*(?<=[^A-Za-z ])
You could build some regular expressions like the following:
\p{L}+[\d\p{S}]+\S*
This will match any sequence of one or more letters (\p{L}+, see Unicode character properties), followed by one or more digits or symbols ([\d\p{S}]+) and any following non-whitespace characters (\S*).
$str = 'fr1&nd$ and not friends';
preg_match('/\p{L}+[\d\p{S}]+\S*/', $str, $match);
var_dump($match);
