Building a regex expression for PHP

Building a regex expression for PHP - php

I am stuck trying to create a regex that will allow for letters, numbers, and the following chars: _ - ! ? . ,
Here is what I have so far:
/^[-\'a-zA-Z0-9_!\?,.\s]+$/ //not escaping the ?
and this version too:
/^[-\'a-zA-Z0-9_!\?,.\s]+$/ //attempting to escape the ?
Neither of these seem to be able to match the following:
"Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least. Hey look, an underscore_ I wonder if it will match this time around?"
Can somebody point out what I am doing wrong? I must point out that my script takes the user input (the paragraph in quotes in this case) and strips all white space so actual input has no white space.
Thanks!
UPDATE:
Thanks to Lix's advice, this is what I have so far:
/^[-\'a-zA-Z0-9_!\?,\.\s]+$/
However, it's still not working??
UPDATE2
Ok, based on input this is what's happening.
User inputs string, then I run the string through following functions:
$comment = preg_replace('/\s+/', '',
htmlspecialchars(strip_tags(trim($user_comment_orig))));
So in the end, user input is just a long string of chars without any spaces. Then that string of chars is run using:
preg_match("#^[-_!?.,a-zA-Z0-9]+$#",$comment)
What could possibly be causing trouble here?
FINAL UPDATE:
Ended up using this regex:
"#[-'A-Z0-9_?!,.]+#i"
Thanks all! lol, ya'll are going to kill me once you find out where my mistake was!
Ok, so I had this piece of code:
if(!preg_match($pattern,$comment) || strlen($comment) < 2 || strlen($comment) > 60){
GEEZ!!! I never bothered to look at the strlen part of the code. Of course it was going to fail every time...I only allowed 60 chars!!!!

When in doubt, it's always safe to escape non alphanumeric characters in a class for matching, so the following is fine:
/^[\-\'a-zA-Z0-9\_\!\?\,\.\s]+$/
When run through a regular expression tester, this finds a match with your target just fine, so I would suggest you may have a problem elsewhere if that doesn't take care of everything.
I assume you're not including the quotes you used around the target when actually trying for a match? Since you didn't build double quote matching in...
Can somebody point out what I am doing wrong? I must point out that my script takes the user input (the paragraph in quotes in this case) and strips all white space so actual input has no white space.
in which case you don't need the \s if it's working correctly.

I got the following code to work as expected to (running php5):
<?php
$pattern = "#[-'A-Z0-9_?!,.\s]+#i";
$string = "Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least. Hey look, an underscore_ I wonder if it will match this time around?";
$results = array();
preg_match($pattern, $string, $results);
echo '<pre>';
print_r($results);
echo '</pre>';
?>
The output from print_r($results) was as following:
Array
(
[0] => Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least. Hey look, an underscore_ I wonder if it will match this time around?
)
Tested on http://writecodeonline.com/php/.

It's not necessary to escape most characters inside []. However, \s will not do what you want inside the expression. You have two options: either manually expand (/^[-\'a-zA-Z0-9_!?,. \t\n\r]+$/) or use alternation (/^(?:[-\'a-zA-Z0-9_!?,.]|\s)+$/).
Note that I left the \ before the ' because I'm assuming you're putting this in a PHP string and I wouldn't want to suggest a syntax error.

The only characters with a special meaning within a character class are:
the dash (since it can be used as a delimiter for ranges), except if it is used at the beginning (since in this case it is no part of any range),
the closing bracket,
the backslash.
In "pure regex parlance", your character class can be written as:
[-_!?.,a-zA-Z0-9\s]
Now, you need to escape whatever needs to be escaped according to your language and how strings are written. Given that this is PHP, you can take the above sample as is. Note that \s is interpreted in character classes as well, so this will match anything which is matched by \s outside of a character class.
While some manuals recommend using escapes for safety, knowing the general regex rules for character classes and applying them leads to shorter and easier to read results ;)

Related

Extract text between brakets tags in php using Regex

I have the following content in a string (query from the DB), example:
$fulltext = "Thank you so much, {gallery}art-by-stephen{/gallery}. As you know I fell in love with it from the moment I saw it and I couldn’t wait to have it in my home!"
So I only want to extract what it is between the {gallery} tags, I'm doing the following but it does not work:
$regexPatternGallery= '{gallery}([^"]*){/gallery}';
preg_match($regexPatternGallery, $fulltext, $matchesGallery);
if (!empty($matchesGallery[1])) {
echo ('<p>matchesGallery: '.$matchesGallery[1].'</p>');
}
Any suggestions?

Try this:
$regexPatternGallery= '/\{gallery\}(.*)\{\/gallery\}/';
You need to escape / and { with a \ before it. And you where missing start and end / of the pattern.
http://www.phpliveregex.com/p/fn1

Similar to Andreas answer but differ in ([^"]*?)
$regexPatternGallery= '/\{gallery\}([^"]*?)\{\/gallery\}/';
Don't forget to put / at the beginning and the end of the Regex string. That's a must in PHP, different from other programming languages.
{,},/ are characters that can be confused as a Regex logic, so you have to escape it using \ like \{.
Use ? to make the string to non-greedy, thus saves memory. It avoids error when facing this kind of string "blabla {galery}you should only get this{/gallery} but you also got this instead.{/gallery} Rarely happens but be careful anyway".

Try this RegEx:
\{gallery\}(.*?)\{\/gallery\}
The problem with your RegEx was that you did not escape the / in the closing {gallery}. You also need to escape { and }.
You should use .*? for a lazy match, otherwise if there are 2 tags in one string, it will combine them. I.e. {gallery}by-joe{/gallery} and {gallery}by-tim{/gallery} would end up as:
by-joe{/gallery} and {gallery}by-tim
However, using a lazy match, you would get 2 results:
by-joe
by-tim
Live Demo on Regex101

Preg_match when string is sometimes a single word?

I'm trying to pull a word out of an email subject line to use as a category for attached email. Preg_match works great as long as it's not just a single word (which is what I'd like to do anyway). If there is only one word in the subject line, I just get an empty array. I've tried to treat $matches as just a variable in that case, but that doesn't work either. Can anyone tell me if preg_match will work on a single word, or what the better way to do this would be?
Thanks very much

Assuming \b(?:word1|word2|word3)\b
The reason it wont match "word1" is because you included a word separator, the \b.
What you can do is just simply always inject the word separator:
preg_match("\b(?:word1|word2|word3)\b", "." . $subject . ".", $matches);
Crude but effective.

preg_match will work on a string one character long. I think that the issue here is probably your regex. My guess is that you're testing for whitespace and because it isn't finding any it says that there is no match. Try appending '^([^\s]*)$|' to your regex and I wager it will start picking up those one word values. ([^\s] means give me anything which has no spaces in it, | means 'or'. By adding it to the front of your regex, it will include things without whitespace or whatever you already had)

Replacing [[wiki:Title]] with a link to my wiki

I'm looking for a simple replacement of [[wiki:Title]] into Title.
So far, I have:
$text = preg_replace("/\[\[wiki:(\w+)\]\]/","\\1", $text);
The above works for single words, but I'm trying to include spaces and on occasion special characters.
I get the \w+, but \w\s+ and/or \.+ aren't doing anything.
Could someone improve my understanding of basic regex? And I don't mean for anyone to simply point me to a webpage.

\w\s+ means "a word-character, followed by 1 or more spaces". You probably meant (\w|\s)+ ("1 or more of a word character or a space character").
\.+ means "one or more dots". You probably meant .+ (1 or more of any character - except newlines, unless in single-line mode).
The more robust way is to use
\[wiki:(.+?)\]
This means "1 or more of any character, but stop at first position where the rest matches", i.e. stop at first right bracket in this case. Without ? it would look for the longest available match - i.e. past the first bracket.

You need to use \[\[wiki:([\w\s]+)\]\]. Notice square brackets around \w\s.
If you are learning regular expressions, you will find this site useful for testing: http://rexv.org/

You're definitely getting there, but you've got a couple syntax errors.
When you're using multiple character classes like \w and \s, in order to match within that group, you have to put them in [square brackets] like so... ([\w\s]+) this basically means one or more of words or white space.
Putting a backslash in front of the period escapes it, meaning the regex is searching for a period.
As for matching special characters, that's more of a pain. I tried to come up with something quickly, but hopefully someone else can help you with that.
(Great cheat sheet here, I keep a copy on my desk at all times: http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/ )

Match a regular expression against any non-character or number

Ok, here again.
I'll promise to study deeply the regular expression soon :P
Language: PhP
Problem:
Match if some badword exist inside a string and do something.
The word must be not included inside a "greater word". I mean if i'll search for "rob" (sorry Rob, i'm not thinking you're a badword), the word "problem have to pass without check.
I'd googled around but found nothing good for me. So, I thought something like this:
If i match the word with after and before any character of the following:
.
,
;
:
!
?
(
)
+
-
[whitespace]
I can simulate a check against single word inside a string.
Finally the Questions:
There's a better way to do it?
If not, which will be the correct regexp to consider [all_that_char]word[all_that_char]?
Thanks in advance to anyone would help!
Maybe this is a very stupid question but today is one of that day when move our neurons causes an incredible headache :|

Look up \b (word boundary):
Matches at the position between a word
character (anything matched by \w) and
a non-word character (anything matched
by [^\w] or \W) as well as at the
start and/or end of the string if the
first and/or last characters in the
string are word characters.
(http://www.regular-expressions.info/reference.html)
So: \brob\b matches rob, but not problem.

You can use \b, see Whole word bounderies.

replace exact match in php

im new to regular expressions in php.
I have some data in which some of the values are stored as zero(0).What i want to do is to replace them with '-'. I dont know which value will get zero as my database table gets updated daily thats why i have to place that replace thing on all the data.
$r_val=preg_replace('/(0)/','-',$r_val);
The code im using is replacing all the zeroes that it finds for eg. it is even replacing zero from 104.67,giving the output 1-4.56 which is wrong. i want that data where value is exact zero that must be replaced by '-' not every zero that it encounter.
Can anyone please help!!
Example of the values that $r_val is having :-
10.31,
391.05,
113393,
15.31,
1000 etc.

This depends alot on how your data is formatted inside $r_val, but a good place to start would be to try:
$r_val = preg_replace('/(?<!\.)\b0\b(?!\.)/', '-', $r_val);
Where \b is a 0-length character representing the start or end of a 'word'.
Strange as it may sound, but the Perl regex documentation is actually really good for explaining the regex part of the preg_* functions, since Perl is where the functionality is actually implemented.

Again, it would be more than helpful if you could supply an example of what the $r_val string really looks like.
Note that \b matches at word boundaries, which would also turn a string like "0.75" into "-.75". Not a desirable result, I guess.

Whilst the other answer does work, it seems overly complex to me. I think you need only to use the ^ and $ chars either side of 0.
$r_val = preg_replace('/^0+$/', '&#45', $r_val);
^ indicates the regex should match from the beginning of the line.
$ indicates the regex should match to the end of the line.
+ means match this pattern 1 or more times
I altered the minus sign to it's html code equivalent too. Paranoid, yes, but we are dealing with numbers after all, so I though throwing a raw minus sign in there might not be the best idea.

Why not just do this?
if ( $r_val == 0 )
$r_val = '-';
You do not need to use a regex for this. In fact, I'd advise against doing so for performance reasons. The operation above is approximately 20x faster than the regex solution.
Also, the PHP manual advises against using regexes for simple replacements:
If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of ereg_replace() or preg_replace().
http://us.php.net/manual/en/function.str-replace.php
Hope that helps!

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Building a regex expression for PHP - php

Related

Extract text between brakets tags in php using Regex

Preg_match when string is sometimes a single word?

Replacing [[wiki:Title]] with a link to my wiki

Match a regular expression against any non-character or number

replace exact match in php

Categories

Resources