I have a form on a site that is collecting user details. There was a fool submitting the form with a name like "Barneycok" from different IP addresses, so I learned how to block that name from going through on the form.
I learned a little regex, just enough to write this little piece:
if (preg_match('/\b(\w*arneycok)\b/', $FirstName)){
$error.= "<li><font face=arial><strong>Sorry, an error occurred. Please try again later.</strong><br><br>";
$errors=1;
}
The code has worked perfectly and I never got that name coming through anymore. However, recently, someone is entering a string of numbers on the name field.
The string looks like this:
123456789
123498568
123477698
12346897w
If you notice, the first 4 characters are constant throughout.
So how do I add that in my regex above so that if the name starts with "1234", it will simply match that regex and give the user the error code?
Your help will be greatly appreciated.
Jaime
This will match a $FirstName that starts with 1234. To also match a specific word like Barneycok, use (b|B)arneycok.
Regex: ^\s*1234|\b(?:b|B)arneycok\b
1. ^\s*1234 matches 1234 at the start of the string, optionally preceded by whitespace.
2. | works like an OR condition.
3. \b(?:b|B)arneycok\b matches the whole word barneycok or Barneycok.
Try this code snippet here
if (preg_match('/^1234|\b(?:b|B)arneycok\b/i', $FirstName))
{
$error.= "<li><font face=arial><strong>Sorry, an error occurred. Please try again later.</strong><br><br>";
$errors = 1;
}
The following regex will work.
^1234.*
For the sake of providing the best possible pattern to protect your site, I'd like to offer this:
/^\s*1234|barneycok/i
This will match a string that has 1234 as its first non-white characters as well as a string that contains the substring barneycok (case insensitively).
You will notice that the pattern:
omits the leading word boundary (letting it catch abarneycok),
doesn't bother with a non-capturing group with a pipe between B and b (because it is pointless when using the i flag)
omits the trailing word boundary (letting it catch barneycoka)
uses the i flag so that bArNeYcOk is caught.
You can implement the pattern with:
if(preg_match('/^\s*1234|barneycok/i',$FirstName)){
$error.="<li><font face=arial><strong>Sorry, an error occurred. Please try again later.</strong><br><br>";
$errors=1;
}
On SO, it is important that the very best answers are posted and awarded the green tick, because sub-optimal answers run the risk of performing poorly for the OP as well as teaching future SO readers bad practices / sloppy coding writing habits. I hope you find this refinement helpful and informative.
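As a minimal sketch, the pattern above can be wrapped in a small helper and tried against the names from the question. The function name isBlockedName is hypothetical and not part of the original post.

```php
<?php
// Hypothetical helper: returns true when the name should be rejected.
function isBlockedName(string $name): bool
{
    // ^\s*1234  : "1234" as the first non-whitespace characters
    // barneycok : the substring anywhere, case-insensitively (i flag)
    return preg_match('/^\s*1234|barneycok/i', $name) === 1;
}

foreach (['Barneycok', 'xbArNeYcOka', '123456789', ' 12346897w', 'Jaime', 'A1234'] as $name) {
    echo $name, ' => ', isBlockedName($name) ? 'blocked' : 'ok', PHP_EOL;
}
```

Only names whose first non-whitespace characters are 1234 are caught by the first alternative, so a name such as A1234 still passes.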
I searched everywhere but I couldn't find the right regex for my verification.
I have a $string, and I want to make sure it contains at least one uppercase letter and one number. No other characters are allowed, just numbers and letters. It is for a password requirement.
John8 = good
joHn8 = good
jo8hN = good
I will use preg_match function
The uppercase letter and the number can be anywhere in the word, not only at the beginning or end.
This should work, but is a bit of a mess. Consider using multiple checks for readability and maintainability...
preg_match('/^[A-Za-z0-9]*([A-Z][A-Za-z0-9]*\d|\d[A-Za-z0-9]*[A-Z])[A-Za-z0-9]*$/', $password);
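The "multiple checks" idea could look roughly like the sketch below. The function name isValidPassword is hypothetical, and the D flag (so that $ does not match before a trailing newline) is an addition that the original answer does not mention.

```php
<?php
// Hypothetical helper splitting the rule into three readable checks.
function isValidPassword(string $password): bool
{
    return preg_match('/^[A-Za-z0-9]+$/D', $password) === 1  // only letters and digits
        && preg_match('/[A-Z]/', $password) === 1            // at least one uppercase letter
        && preg_match('/[0-9]/', $password) === 1;           // at least one digit
}

foreach (['John8', 'joHn8', 'jo8hN', 'john8', 'John'] as $candidate) {
    echo $candidate, ' = ', isValidPassword($candidate) ? 'good' : 'bad', PHP_EOL;
}
```

Each rule sits on its own line, so adding or relaxing a requirement later means touching a single check.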
Use lookahead:
preg_match('/^(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]+$/', $string);
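To make the lookahead version easier to read, here is the same pattern written with the x flag so that each part can carry a comment. The sample inputs other than the three from the question are made up for illustration.

```php
<?php
// Same pattern as above; the x flag allows whitespace and # comments inside it.
$pattern = '/
    ^
    (?=.*[A-Z])     # somewhere ahead there is an uppercase letter
    (?=.*[0-9])     # somewhere ahead there is a digit
    [a-zA-Z0-9]+    # the whole string is letters and digits only
    $
/x';

foreach (['John8', 'joHn8', 'jo8hN', 'john8', 'JOHN', 'Jo-hn8'] as $candidate) {
    printf("%-7s %s\n", $candidate, preg_match($pattern, $candidate) === 1 ? 'good' : 'bad');
}
```

Both lookaheads are tested at position 0 and consume nothing, which is why the uppercase letter and the digit may appear in either order.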
Use this regex pattern
^([A-Z]+([a-z0-9]+))$
Preg_match
preg_match('~^([A-Z]+([a-z0-9]+))$~',$str);
Your request needs a precise syntax description and a lot of examples to back it up. Only 3 or 4 examples are not enough; the question is too open.
For last confirmed update:
preg_match('/^([a-z]*\d+[a-z]*[A-Z][a-z]*|[a-z]*[A-Z][a-z]*\d+[a-z]*)$/',$str)
History
first solution preg_match('/^[A-Z][a-z]+\d+$/',$str)
After your edit1: preg_match('/^[a-z]*[A-Z][a-z]*\d+$/',$str)
After your comment about utf8: hmm... please add the valid language to your question. For example, is "José11" a valid string?
After your edit2 ("jo8hN" is valid): can the number appear more than once? I suppose not. Is "8N" valid? I suppose yes. preg_match('/^([a-z]*\d+[a-z]*[A-Z][a-z]*|[a-z]*[A-Z][a-z]*\d+[a-z]*)$/',$str) You can add more possibilities with "|" in this regex.
I'm having a little trouble figuring out the pattern to identify the beginning of inline replies/forwards in an email body. There are some easier ones that simply begin with something like "Begin forwarded message", but the replies are a little more complicated:
On 12-06-13 10:56 AM, "John Doe" <john.doe#some.tld> wrote:
Obviously the constants will be "On" and "wrote:". I'd like to be able to find only the first match and then either wrap everything after it in a div with display:none applied or even just eliminate it using substr($body,0, POSITION_OF_MATCH).
One of the issues I'm having is that it's not catching the FIRST occurrence, and second is that I can't get the greediness to work properly.
My progress (having fallen back to at least a partially working version) so far is:
preg_match("/On [^>]* wrote:/i",$content,$matches,PREG_OFFSET_CAPTURE);
Any help would be greatly appreciated!
You can probably break this down by elements; so you basically have:
On DATE, "NAME" <EMAIL> wrote:
You can then characterize DATE, NAME, and EMAIL.
DATE is composed of numbers, dashes, spaces, colons, and letters. However, it ends with a comma, so you can use that instead.
NAME is composed of letters and spaces, though it is delimited by quotes, and you can probably handle that.
EMAIL is a bit more complicated, but emails cannot contain the character >, so you should be able to capture everything but that.
So you basically get:
On [anything but comma], "[anything but "]" <[anything but >]> wrote:
Which, in regex, is something like:
/^On ([^,]+), \"([^\"]+)\" <([^>]+)> wrote:$/
Then, when using preg_match, you can get your matches from some $matches array, indices 1 through 3.
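A small sketch of reading those three groups follows. The surrounding body text is made up for illustration, and the m flag is an addition here so that ^ and $ can match at line boundaries inside a multi-line body.

```php
<?php
// Sample body is made up for illustration; the header line is from the question.
$body = "Thanks, see you then.\n"
      . "On 12-06-13 10:56 AM, \"John Doe\" <john.doe#some.tld> wrote:\n"
      . "> Are we still on for Friday?";

// The m flag (an addition here) lets ^ and $ match at line boundaries inside the body.
if (preg_match('/^On ([^,]+), "([^"]+)" <([^>]+)> wrote:$/m', $body, $matches)) {
    // Index 0 is the whole match; indices 1 to 3 are the captured groups.
    [, $date, $name, $email] = $matches;
    echo "date: $date\nname: $name\nemail: $email\n";
}
```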
I wonder how your current version works at all, because you cannot possibly match the closing >. But you could do something like this:
$content = preg_replace('/(On [^>]*> wrote:).*$/s', '$1', $content);
Which will match the first On ... wrote: and everything after that up until the end of the string. And replace it by just the On ... wrote:.
I suggest
$email = preg_match('/^On [^"]*"[^"]*" <([^>]*)> wrote:$/', $str, $re) ? $re[1] : '';
I appreciate the other answers, but none of them really took into account the many possible variations in the reply strings I was dealing with, that might have been my fault for not explaining properly or providing more options. I've +1'd everyone for their efforts though.
The final solution which seems to be working best after a day of fiddling with it on and off is this:
/On (Mon|Tue|Wed|Thu|Fri|Sat|Sun|[[:digit:]]{1,2})(.*?) wrote:/i
The option list that it begins with covers a range of different reply types that start with "On Tue..." or "On 23..." or "On 1...", etc. ensuring that the greediness I was complaining about wasn't taking in too much from random "on" strings elsewhere, the (.*?) takes care of the rest of the name/email portion, finally following up with "wrote:" to finish it off.
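Combined with the substr idea from the question, the final pattern could be used along these lines. This is only a sketch: the function name stripQuotedReply is hypothetical, and trimming trailing whitespace is an extra step not mentioned in the thread.

```php
<?php
// Hypothetical helper: cut the body at the first "On ... wrote:" header.
function stripQuotedReply(string $body): string
{
    $pattern = '/On (Mon|Tue|Wed|Thu|Fri|Sat|Sun|[[:digit:]]{1,2})(.*?) wrote:/i';

    // With PREG_OFFSET_CAPTURE, $matches[0] is [matched text, byte offset].
    if (preg_match($pattern, $body, $matches, PREG_OFFSET_CAPTURE)) {
        return rtrim(substr($body, 0, $matches[0][1]));
    }
    return $body; // no reply header found
}

$body = "Sounds good. On a side note, thanks.\n"
      . "On 12-06-13 10:56 AM, \"John Doe\" <john.doe#some.tld> wrote:\n"
      . "> old text";
echo stripQuotedReply($body), PHP_EOL;
```

The "On a side note" in the sample body is not treated as a header, because "On" must be followed by a day name or one to two digits.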
I want to create a pregmatch pattern which applies to:
http://site.local/app/**/admin
text. I created something that looks good, but it also passes
http://site.local/app/vf/adming
which I don't want. The pattern I created:
preg_match('/http:\/\/site.local\/app\/.*\\/admin/', $siteUrl)
How should it be corrected?
Btw: operators/admins, I created this thread previously and since then that account is disabled. https://stackoverflow.com/questions/11139579/i-need-a-regexp Now that you see, I really tried it hard, may I get that account back? If not, I understand
[a-zA-Z] allows only letters and {1,5} limits the length to between 1 and 5. If you want to allow numbers, just change it to [a-zA-Z0-9].
$site = 'http://site.local/app/at/admin';
if(preg_match('/^http:\/\/site.local\/app\/[a-zA-Z]{1,5}\/admin$/', $site)){
echo 1;
}
Use ^ and $ to "tell" regex the start and end of your pattern.
preg_match('/^http:\/\/site.local\/app\/(.*)\/admin$/', 'http://site.local/app/abcd/admin');
preg_match('/^http:\/\/site.local\/app\/(.*)\/admin$/', 'http://site.local/app/abcd/admins');
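A quick way to see the effect of the anchors is to wrap the check in a function and dump the results. In this sketch the function name isAdminUrl is hypothetical, the dots in site.local are escaped, and [^\/]+ (exactly one path segment) is a stricter stand-in for (.*); those last two are assumptions about what the asker wants.

```php
<?php
// Hypothetical helper; [^\/]+ (one path segment) is a stricter stand-in for (.*).
function isAdminUrl(string $url): bool
{
    return preg_match('/^http:\/\/site\.local\/app\/[^\/]+\/admin$/', $url) === 1;
}

var_dump(isAdminUrl('http://site.local/app/abcd/admin'));  // anchored match
var_dump(isAdminUrl('http://site.local/app/abcd/admins')); // trailing "s" breaks the $ anchor
```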
I would say the problem is that .* matches all characters where you actually want to match two *.
/^http:\/\/site.local\/app\/[\*]{2}\/admin$/
Should do it...
Edit: To explain myself to the person who marked this down.
The asker said he wanted a preg_match to match the
text
http://site.local/app/**/admin
I did just that. How can you mark me down for understanding English?
But to satisfy the asker, because he did mean any characters and any number of characters between app and admin, here is the amended version:
/^http:\/\/site.local\/app\/.*\/admin$/
I'm building this regex with a positive lookahead in it. Basically it must select all text in the line up to the last period that precedes a ":" and add a "|" to the end to delimit it. Some sample text is below. I am testing this in gskinner and editpadpro, which apparently has full grep regex support, so if I could get the answers in that form I'd appreciate it.
The regex below works to a degree but I am unsure if it is correct. Also it falls down if the text contains brackets.
Finally I would like to add another ignore rule like the one that ignores but includes "Co." in the selection. This second ignore rule would ignore but include periods that have a single Capital letter before them. Sample text below too. Thanks for all the help.
^(?:[^|]+\|){3}(.*?)[^(?:Co)]\.(?=[^:]*?\:)
121| Ryan, T.N. |2001. |I like regex. But does it like me (2) 2: 615-631.
122| O' Toole, H.Y. |2004. |(Note on the regex). Pages 90-91 In: Ryan, A. & Toole, B.L. (Editors) Guide to the regex functionality in php. Timmy, Tommy& Stewie, Quohog. * Produced for Family Guy in Quohog.
I don't think I understand what you want to do. But this part [^(?:Co)] is definitely not correct.
With the square brackets you are creating a character class, because of the ^ it is a negated class. That means at this place you don't want to match one of those characters (?:Co), in other words it will match any other character than "?)(:Co".
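The difference is easy to check in code. The sketch below uses PHP's preg_match purely for illustration (the question targets other tools), and the negative lookbehind in the second half is a swapped-in technique, shown only as one possible way to express "a period not preceded by Co" if that was the intent.

```php
<?php
// [^(?:Co)] is a negated character class: one character that is not ( ? : C o )
var_dump(preg_match('/^[^(?:Co)]$/', 'x')); // int(1): "x" is not in the set
var_dump(preg_match('/^[^(?:Co)]$/', 'C')); // int(0): "C" is in the set
var_dump(preg_match('/^[^(?:Co)]$/', '?')); // int(0): "?" is in the set

// A negative lookbehind is one way to say "a period not preceded by Co".
var_dump(preg_match('/(?<!Co)\./', 'Acme Co.'));  // int(0)
var_dump(preg_match('/(?<!Co)\./', 'The end.'));  // int(1)
```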
Update:
I don't think it's possible. How should I distinguish between L. Co. or something similar and the end of the sentence?
But I found another error in your regex. The last part (?=[^:]*?\:) should be (?=[^.]*?\:) if you want to match the last dot before the :. With your expression it will match on the first dot.
This seems to do what you want.
(.*\.)(?=[^:]*?:)
It quite simply matches all text up to the last full stop that occurs before the colon.
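As a rough illustration in PHP (the question itself targets other tools), the pattern can be used in a replacement that appends the "|" delimiter. The function name delimitBeforeColon is hypothetical; the sample line is the first one from the question.

```php
<?php
// Hypothetical helper: append "|" after the last period that still has a colon after it.
function delimitBeforeColon(string $line): string
{
    // $1 is everything up to that period; the lookahead itself consumes nothing.
    return preg_replace('/(.*\.)(?=[^:]*?:)/', '$1|', $line) ?? $line;
}

echo delimitBeforeColon('121| Ryan, T.N. |2001. |I like regex. But does it like me (2) 2: 615-631.'), PHP_EOL;
// 121| Ryan, T.N. |2001. |I like regex.| But does it like me (2) 2: 615-631.
```

A line without any colon is returned unchanged, because the lookahead can never succeed.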
I am trying to clean up user submitted comments in PHP using regex but have become rather stuck and confused!
Is it possible using regex to:
Remove punctuation repeated more than twice so that:
OMG it was AWESOME!!!! becomes OMG it was AWESOME!!
!!!!!!!!!!.........------ becomes !!..--
!?!?!? becomes !?
Remove duplicate words or phrases (for example, when a user has copied and pasted a message) so:
spamspamspamspam becomes spam
I love copy and paste. I love copy and paste. I love copy and paste. becomes I love copy and paste.
Remove collections of letters and spaces longer than say 10 letters in caps:
I LOVE CAPITALS THEY ARE SO AWESOME becomes I love capitals they are so awesome
GOOD that sounds stays the same
Any suggestions you have?
This is for a student system (hence the urge to at least try and tidy up what they post), although I do not wish to go as far as filtering it or blocking their messages, just "correct" it with some regex.
Thanks for your time,
Edit:
If it isn't possible using regex (or regex mixed with other PHP), how would you do it?
1:
// same punctuation repeated more than 2 times
$string = preg_replace('#([?!.-])\1{2,}#', '$1$1', $string);
// sequence of different punctuations repeated more than one time
$string = preg_replace('#([?!.-][?!.-]+?)\1+#', '$1', $string);
2:
// any sequence of characters repeated more than one time
$string = preg_replace('#(.{2,}?)\1+#', '$1', $string);
3:
// sequence of uppercase letters and spaces
function tolower_cb($match) {
return strtolower($match[0]);
}
$string = preg_replace_callback('#([A-Z ]{10,})#', 'tolower_cb', $string);
Try it here: http://codepad.org/iQsZ2vJ0
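Put together, the three steps might be chained like this. It is only a sketch: the function name tidyComment is hypothetical, the order of the steps is an assumption, and an arrow function replaces the named callback.

```php
<?php
// Hypothetical wrapper applying the three steps in order.
function tidyComment(string $text): string
{
    // 1a: same punctuation mark three or more times -> two
    $text = preg_replace('#([?!.-])\1{2,}#', '$1$1', $text);
    // 1b: repeated sequence of different punctuation -> one copy
    $text = preg_replace('#([?!.-][?!.-]+?)\1+#', '$1', $text);
    // 2: any sequence of 2+ characters repeated back to back -> one copy
    $text = preg_replace('#(.{2,}?)\1+#', '$1', $text);
    // 3: runs of 10+ uppercase letters and spaces -> lowercase
    return preg_replace_callback(
        '#[A-Z ]{10,}#',
        fn(array $m): string => strtolower($m[0]),
        $text
    );
}

echo tidyComment('OMG it was AWESOME!!!!'), PHP_EOL;
echo tidyComment('spamspamspamspam'), PHP_EOL;
```

Note that step 3 lowercases the entire run, so "I LOVE CAPITALS THEY ARE SO AWESOME" comes out with a lowercase leading "i"; restoring the capital (for example with ucfirst) would be a separate step.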
A good rule of thumb is to never, ever try to "fix" user input. If a user wants to type 4 exclamation points after a sentence, then allow it. There is no reason not to.
You should be more concerned with injection attacks than with things like this.