Matching multiples - php

I need to match the following using preg_match()
cats and dogs
catsAndDogs
i like cats and dogs
etc. So I simply stripped out the spaces, lowercased it, and used that as my pattern in preg_match():
'/catsanddogs/i'
But now I need to match the following too:
cats+and+dogs
cats_and_dogs
cats+and_Dogs
So is there a quick and easy way to do a string replace multiple times, other than nesting the calls? I want to end up with the same pattern to match against.
Thanks.

Try this expression: '/cats([+_-])?and([+_-])?dogs/i'
Edit: I just saw that you may only want specific separator combinations around "and". If that's right, then you should use this expression, which lists the allowed combinations explicitly:
'/cats(\+and\+|\+and_|_and_|and)dogs/i'
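A quick way to check the optional-separator idea against the inputs from the question is sketched below. Note one assumption that is not in the answer above: the separator class here also accepts a space, so that "cats and dogs" matches too.

```php
<?php
// Assumption: a space is added to the separator class; the answer's [+_-] has none.
$pattern = '/cats[+_ -]?and[+_ -]?dogs/i';

$subjects = [
    'cats and dogs',
    'catsAndDogs',
    'i like cats and dogs',
    'cats+and+dogs',
    'cats_and_dogs',
    'cats+and_Dogs',
    'cats or dogs', // control case that should not match
];

$results = [];
foreach ($subjects as $subject) {
    // preg_match returns 1 on a match, 0 on no match
    $results[$subject] = preg_match($pattern, $subject) === 1;
}

var_export($results);
```

Only the control case should come back false.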

I would go with @ITroubs's answer in this situation; however, you can do multiple character/string replacements with strtr as follows:
$trans = array(' ' => '','+' => '','-' => '', '_' => '');
$str = 'cats+and_dogs';
echo strtr($str, $trans); // prints: catsanddogs
Read the documentation carefully before use.
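As a sketch of the "normalize first, then match one fixed pattern" approach, str_replace also accepts an array of search strings, so no nesting is needed. The helper name normalize is made up for this example.

```php
<?php
// Hypothetical helper: strip the separators and lowercase the input,
// so that a single fixed pattern such as /catsanddogs/ is enough.
function normalize(string $s): string
{
    return strtolower(str_replace([' ', '+', '-', '_'], '', $s));
}

foreach (['cats and dogs', 'catsAndDogs', 'cats+and_Dogs', 'i like cats and dogs'] as $s) {
    $n = normalize($s);
    echo $n, ' => ', preg_match('/catsanddogs/', $n), "\n";
}
```

The trade-off is that stripping separators also lets "cat sand dogs" through, so this only fits if such inputs are acceptable.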

OK, I don't think you have defined your matching rules well, but here is my simplified version:
(c|C)ats([+_ ][aA]|A)nd([+_ ][dD]|D)ogs
Probably you want to be case insensitive because you used /i in your pattern, but I would like to chip in with another approach.
The main difference from the other answers is in how the word boundaries are expressed. I use [+_ ][aA]|A, so that the regex will match 'cats and' or 'catsAnd', but not 'catsand'. The bottom line is that I would only match camel-case text if there is no separator in between.
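The behaviour described above can be checked with a short script. The test strings are picked for illustration, and the pattern is deliberately used without the /i flag.

```php
<?php
// Case-sensitive on purpose: a capital letter or a separator must mark each word start.
$pattern = '/(c|C)ats([+_ ][aA]|A)nd([+_ ][dD]|D)ogs/';

$checks = [];
foreach (['cats and dogs', 'catsAndDogs', 'cats+and_Dogs', 'catsanddogs'] as $s) {
    $checks[$s] = preg_match($pattern, $s) === 1;
}

var_export($checks);
```

Only 'catsanddogs' should be rejected, since it has neither separators nor camel-case capitals.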

Related

preg_replace - similar patterns

I have a string that contains something like "LAB_FF, LAB_FF12" and I'm trying to use preg_replace to look for both patterns and replace them with different strings using a pattern match of:
/LAB_[0-9A-F]{2}|LAB_[0-9A-F]{4}/
So input would be
LAB_FF, LAB_FF12
and the output would need to be
DAB_FF, HAD_FF12
Problem is, for the second string, it interprets it as "LAB_FF" instead of "LAB_FF12" and so the output is
DAB_FF, DAB_FF
I've tried splitting the input line out using 2 different preg_match statements, the first looking for the {2} pattern and the second looking for the {4} pattern. This sort of works in that I can get the correct output into 2 separate strings but then can't combine the two strings to give the single amended output.
\b is a word boundary. It forces the match to end where the word ends, instead of stopping at the first prefix that fits the pattern.
https://regex101.com/r/upY0gn/1
$pattern = "/\bLAB_[0-9A-F]{2}\b|\bLAB_[0-9A-F]{4}\b/";
Regarding the comment on the other answer about how to replace the string, this is one way.
Each capture group whose alternative fails is left as an empty entry in the output array; in this case that is the first group.
Then it's just a matter of substr.
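$re = '/(\bLAB_[0-9A-F]{2}\b)|(\bLAB_[0-9A-F]{4}\b)/';
$str = 'LAB_FF12';
preg_match($re, $str, $matches);
var_dump($matches);
// Index 0 is unused so that the indexes line up with the capture groups.
$substitutes = ["", "DAB", "HAD"];
for ($i = 1; $i < count($matches); $i++) {
    if ($matches[$i] != "") {
        // Drop the "LAB" prefix and prepend the substitute for this group.
        $result = $substitutes[$i] . substr($matches[$i], 3);
        break;
    }
}
echo $result;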
https://3v4l.org/gRvHv
You can specify exact amounts in one set of curly braces, e.g. {2,4}.
Just tested this and seems to work:
/LAB_[0-9A-F]{2,4}/
LAB_FF, LAB_FFF, LAB_FFFF
EDIT: My mistake, that actually matches anything between 2 and 4 characters. Alternation takes the first alternative that matches, so if you put the longer alternative first it will be preferred, e.g.
/LAB_([0-9A-F]{4}|[0-9A-F]{2})/
LAB_FF, LAB_FFFF
EDIT2: The following will match LAB_ followed by an even number of characters:
/LAB_([0-9A-F]{2})+/
LAB_FF, LAB_FFFF, LAB_FFFFFF...
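To get the single amended output the question asks for, one option is preg_replace_callback, which lets the replacement depend on what was matched. The sketch below assumes the prefix is decided purely by the length of the hex part (2 characters gives DAB, 4 gives HAD); the function name relabel is made up for this example.

```php
<?php
// Hypothetical helper: pick the new prefix from the length of the hex part.
function relabel(string $input): string
{
    return preg_replace_callback(
        // Longer alternative first; \b stops LAB_FF12 from matching as LAB_FF.
        '/\bLAB_([0-9A-F]{4}|[0-9A-F]{2})\b/',
        function (array $m): string {
            $prefix = strlen($m[1]) === 4 ? 'HAD' : 'DAB';
            return $prefix . '_' . $m[1];
        },
        $input
    );
}

echo relabel('LAB_FF, LAB_FF12'), "\n"; // DAB_FF, HAD_FF12
```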

Replace from one custom string to another custom string

How can I replace a string starting with 'a' and ending with 'z'?
Basically, I want to be able to do the same thing as str_replace but be indifferent to the values in between two strings in a 'haystack'.
Is there a built-in function for this? If not, how would I go about efficiently writing a function that accomplishes it?
That can be done with a regular expression (regex for short).
Here is a simple example:
$string = 'coolAfrackZInLife';
$replacement = 'Stuff';
$result = preg_replace('/A.*Z/', $replacement, $string);
echo $result;
The above example will print coolStuffInLife
A little explanation of the given regex /A.*Z/:
- The slashes indicate the beginning and end of the regex;
- A and Z are the start and end characters between which you need to replace;
- . matches any single character
- * matches zero or more of the preceding element (in our case, any run of characters)
- You can optionally use + instead of *, which will match only if there is something in between
Take a look at Rubular.com for a simple way to test your regexes. It also provides a short regex reference.
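For arbitrary start and end strings rather than single letters, the same idea can be wrapped in a small function. This is only a sketch: the name replace_between is invented here, it uses a lazy .*? so that it stops at the first end marker (a choice the answer above does not make), and preg_quote escapes any regex metacharacters in the markers.

```php
<?php
// Hypothetical helper: replace everything from $start through the first following $end.
function replace_between(string $haystack, string $start, string $end, string $replacement): string
{
    $pattern = '/' . preg_quote($start, '/') . '.*?' . preg_quote($end, '/') . '/';

    // A callback keeps "$1" or "\1" inside $replacement from being read as backreferences.
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($replacement): string {
            return $replacement;
        },
        $haystack
    );
}

echo replace_between('coolAfrackZInLife', 'A', 'Z', 'Stuff'), "\n"; // coolStuffInLife
echo replace_between('price: [old] now', '[', ']', '$1'), "\n";     // price: $1 now
```

Without the s modifier, . does not match newlines, so markers on different lines will not be paired.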
$string = "I really want to replace aFGHJKz with booo";
$new_string = preg_replace('/a[a-zA-Z]+z/', 'boo', $string);
echo $new_string;
Be wary of the regex: do you want to find the first z or the last z? Can only letters appear in between, or alphanumerics too? There are various scenarios you'd need to explain before I could expand on the regex.
Use preg_replace so you can use regex patterns.
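The "first or last" question comes down to greedy versus lazy quantifiers. A small illustration, using a made-up input with two candidate end characters:

```php
<?php
$string = 'coolAfrackZInLifeZEnd';

// Greedy: .* runs to the LAST Z it can reach.
$greedy = preg_replace('/A.*Z/', 'Stuff', $string);

// Lazy: .*? stops at the FIRST Z after the A.
$lazy = preg_replace('/A.*?Z/', 'Stuff', $string);

echo $greedy, "\n"; // coolStuffEnd
echo $lazy, "\n";   // coolStuffInLifeZEnd
```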

Regex matching optional section

So I have two possible strings here for example.
/user/name
and
/user/name?redirect=1
I'm trying to figure out the proper regex to match either with a result of:
Array ([0] => /user/name [1] => user [2] => name)
I think the part I'm having an issue with is that the question mark and the GET query after it are optional and will only be there some of the time. I've tried many different things and can't seem to come up with a regex to match the strings whether the ?** is there or not.
Don't use a regex.
Use parse_url() and explode():
$result = parse_url("/here/is/a/path?query=string");
$pieces = explode("/", $result['path']);
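Applied to the strings from the question, that approach might look like the sketch below. The function name path_segments is hypothetical, and the trim call is an addition so that the leading slash does not produce an empty first element.

```php
<?php
// Hypothetical helper: return the path segments of a URL, ignoring any query string.
function path_segments(string $url): array
{
    $path = parse_url($url, PHP_URL_PATH); // "/user/name" for both inputs below
    return explode('/', trim($path, '/'));
}

print_r(path_segments('/user/name'));            // user, name
print_r(path_segments('/user/name?redirect=1')); // user, name
```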
? is the "zero-or-one" quantifier. So you could append (\?.*)? to your regex, which will optionally match zero or one instances of a literal question-mark followed by any number of characters.
In regex you can mark something as optional using the ? quantifier. So for instance, the regex n?ever matches both ever and never.
In your case, you might want something like /([A-Za-z0-9]+)/([A-Za-z0-9]+)(\?redirect=1)?
This will match /.../... (given the "..." consist of letters and numbers) or /.../...?redirect=1
If there are more possible flags that could come after the question mark than simply redirect=1, try the more general:
/([A-Za-z0-9]+)/([A-Za-z0-9]+)(\?[A-Za-z0-9]+=[A-Za-z0-9]+)?(&[A-Za-z0-9]+=[A-Za-z0-9]+)*
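In PHP the pattern needs delimiters; choosing something other than / (here ~) avoids having to escape the slashes. The following is a sketch with an invented function name, match_path, and it uses a non-capturing group (?:...) for the optional query so that only the two path segments are captured.

```php
<?php
// Hypothetical helper: return the two path segments, or null if the subject does not fit.
function match_path(string $subject): ?array
{
    // (?:\?.*)? optionally matches a literal "?" and whatever follows it.
    $pattern = '~^/([A-Za-z0-9]+)/([A-Za-z0-9]+)(?:\?.*)?$~';

    if (preg_match($pattern, $subject, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

var_export(match_path('/user/name'));
var_export(match_path('/user/name?redirect=1'));
var_export(match_path('/user'));
```

With this pattern $m[0] still contains the query string when one is present; only the capture groups are query-free.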
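preg_match('{^/(user)/(name)(?=\?redirect=1|$)}', $subject, $matches);
This uses a lookahead assertion: the path must be followed either by ?redirect=1 or by the end of the string. The lookahead won't be included in the match itself, so $matches[0] is /user/name in both cases.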
But like the other answers suggest you shouldn't use regex to parse URLs. Just posting the actual answer to the specific question for completeness.

Replacing a string inside a string in PHP

I have strings in my application that users can send via a form, and they can optionally replace words in that string with replacements that they also specify. For example if one of my users entered this string:
I am a user string and I need to be parsed.
And chose to replace and with foo the resulting string should be:
I am a user string foo I need to be parsed.
I need to somehow find the starting position of what they want to replace, replace it with the word they want and then tie it all together.
Could anyone write this up or at least provide an algorithm? My PHP skills aren't really up to the task :(
Thanks. :)
$result = preg_replace('/\band\b/i', 'foo', $subject);
will find all occurrences of and where it's a word on its own and replace them with foo. \b ensures that there is a word boundary before and after and.
Use preg_replace. You don't need to think so hard about this, though you will have to learn a little bit about regexes. :)
Read up on str_replace, or for more complex replacements, on regular expressions and preg_replace.
Examples for both:
<?php
$str = 'I am a user string and I need to be parsed.';
echo str_replace( 'and', 'foo', $str ) . "\n";
echo preg_replace( '/and/', 'foo', $str ) . "\n";
?>
In response to the comments of this answer, note that both examples above will replace every occurrence of the search string (and), even when it happens to be within another word.
To take care of that, you could add the word separators to the str_replace call (see the comments for an example), but this gets quite complicated when you want to handle all common word separators (spaces, commas, dots, exclamation marks, question marks etc.).
An easier way to fix this problem is to use the power of regular expressions and make sure the search string is not found within another word. See Tim Pietzcker's example using \b for a possible solution.
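Because both the search word and the replacement come from users here, a sketch along the following lines may be safer than pasting them straight into a pattern. The function name replace_word is made up; preg_quote escapes regex metacharacters in the search word, and the callback prevents sequences like $1 in the replacement from being treated as backreferences.

```php
<?php
// Hypothetical helper: replace whole-word occurrences of a user-supplied word.
function replace_word(string $subject, string $search, string $replacement): string
{
    $pattern = '/\b' . preg_quote($search, '/') . '\b/i';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($replacement): string {
            return $replacement;
        },
        $subject
    );
}

echo replace_word('I am a user string and I need to be parsed. Sandy understands.', 'and', 'foo'), "\n";
// I am a user string foo I need to be parsed. Sandy understands.
```

Note that \b only behaves as a word boundary when the search word itself starts and ends with a letter, digit or underscore.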

Regex - Match ( only ) words with mixed chars

I'm writing my anti-spam/bad-words filter and, if possible, I need
to match (detect) only words formed from mixed characters, like fr1&nd$ and not friends.
Is this possible with regex?
Best regards!
Of course it's possible with regex! You're not asking to match nested parentheses! :P
But yes, this is the kind of thing regular expressions were built for. An example:
/\S*[^\w\s]+\S*/
This will match all of the following:
#ss
as$
a$s
#$s
a$$
#s$
#$$
It will not match this:
ass
Which I believe is what you want. How it works:
\S* matches 0 or more non-space characters. [^\w\s]+ matches only the symbols (it will match anything that isn't a word character or whitespace), and matches 1 or more of them (so a symbol character is required). Then the \S* again matches 0 or more non-space characters (symbols and letters).
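A quick PHP check of this pattern on a made-up sentence shows both what it catches and one side effect worth knowing about: ordinary punctuation attached to a word also counts as a symbol.

```php
<?php
$text = 'fr1&nd$ and not friends, ok';

// Collect every whitespace-delimited token that contains at least one symbol.
preg_match_all('/\S*[^\w\s]+\S*/', $text, $m);

print_r($m[0]); // fr1&nd$ and "friends," (the trailing comma counts as a symbol)
```

Digits are word characters, so a digit-only substitution such as fr13nds would not be flagged by this pattern.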
If I may be allowed to suggest a better strategy, in Perl you can store a regex in a variable. I don't know if you can do this in PHP, but if you can, you can construct a list of variables like such:
$a = /[aA#]/ # regex that matches all a-like symbols
$b = /[bB]/
$c = /[cC(]/
# etc...
Or:
$regex = array( 'a' => /[aA#]/, 'b' => /[bB]/, 'c' => /[cC(]/, ... );
So that way, you can match "friend" in all its permutations with:
/$f$r$i$e$n$d/
Or:
/$regex['f']$regex['r']$regex['i']$regex['e']$regex['n']$regex['d']/
Granted, the second one looks unnecessarily verbose, but that's PHP for you. I think the second one is probably the best solution, since it stores them all in a hash, rather than all as separate variables, but I admit that the regex it produces is a bit ugly.
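PHP has no regex literals, but the same idea works with plain strings: keep a map from each letter to a character class and join the pieces into one pattern. The sketch below is an assumption-laden illustration; the look-alike sets in $classes and the function name build_pattern are invented for the example.

```php
<?php
// Hypothetical map from each letter to a character class of look-alike characters.
$classes = [
    'f' => '[fF]',
    'r' => '[rR]',
    'i' => '[iI1!*]',
    'e' => '[eE3&*]',
    'n' => '[nN]',
    'd' => '[dD]',
    's' => '[sS$5z]',
];

// Build one regex for a word by concatenating the class for each of its letters.
function build_pattern(string $word, array $classes): string
{
    $parts = [];
    foreach (str_split(strtolower($word)) as $ch) {
        // Letters without an entry are matched literally.
        $parts[] = $classes[$ch] ?? preg_quote($ch, '/');
    }
    return '/' . implode('', $parts) . '/';
}

$pattern = build_pattern('friends', $classes);
echo $pattern, "\n";
var_dump(preg_match($pattern, 'fr1&nd$')); // int(1)
var_dump(preg_match($pattern, 'friends')); // int(1)
```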
It is possible. You will not have very pretty regex rules, but you can match basically any pattern that you can describe using regex. The tricky part is describing it.
I would guess that you would have a bunch of regex rules to detect bad words like so:
To detect fr1&nd$, friends, fr**nd* you can use a regex like:
/fr[1iI*][&eE*]nd[s$Sz*]/
Doing something like this for each rule will find all the variations of possible characters in the brackets. Pick up a regex guide for more info.
(I'm assuming for a badwords filter you would want friend as well as frie**, you may want to mask the bad word as well as all possible permutations)
Didn't test this thoroughly, but this should do it:
(\w+)*(?<=[^A-Za-z ])
You could build some regular expressions like the following:
\p{L}+[\d\p{S}]+\S*
This will match any sequence of one or more letters (\p{L}+, see Unicode character properties), followed by one or more digits or symbols ([\d\p{S}]+) and any following non-whitespace characters (\S*).
$str = 'fr1&nd$ and not friends';
preg_match('/\p{L}+[\d\p{S}]+\S*/', $str, $match);
var_dump($match);
