regex (preg_match) woes - php

I'm sorry if this has been asked before, but I just can't get a straight answer from the interwebs today!
I need to validate a form field and check if there are 3 and only 3 (no more, no less), uppercase letters.
My sorry attempts at regex have so far all failed - I thought that
/^[A-Z]{3}$/
would do the job, but nix. Any takers?!

/^[A-Z]{3}$/ checks the string for:
beginning_of_the_string -> exactly_three_uppercase_letters -> end_of_line
No other characters are valid in the string with this regexp.
I only tried it with a JS regexp, though, and I think it behaves the same in PHP. Could you provide the full code (or at least the relevant part) of your PHP script?
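Since the asker's code isn't shown, the following is only a minimal sketch of how the check might be wired up in PHP. The helper name isThreeUppercase is made up for illustration. It uses \z instead of $ because in PCRE a plain $ also matches just before a trailing newline, and it shows that stray whitespace from a form field (one possible cause, which is an assumption) makes the match fail.

```php
<?php
// Hypothetical helper name; not from the original post.
function isThreeUppercase(string $value): bool
{
    // \z anchors at the very end of the string, whereas $ also
    // matches just before a trailing newline.
    return preg_match('/^[A-Z]{3}\z/', $value) === 1;
}

var_dump(isThreeUppercase('ABC'));    // bool(true)
var_dump(isThreeUppercase('ABCD'));   // bool(false)
var_dump(isThreeUppercase("ABC\n"));  // bool(false); /^[A-Z]{3}$/ would accept this
var_dump(isThreeUppercase(' ABC'));   // bool(false): stray whitespace breaks the match
```

If whitespace turns out to be the culprit, calling trim() on the field before validating would be one way to handle it.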

Related

Regular Expression (regex) match of base64_decode concatenated using PHP

So I've been trying to build a regex for the past couple of hours, and I'm starting to go crazy wondering whether this is even possible or worthwhile.
I have a script that scans PHP files, checking MD5 sums against known malicious files and searching for certain strings. Most recently I've come across files where, instead of using base64_decode directly in the PHP file, they build the name by concatenating strings into a variable so the scanner doesn't pick it up.
As an example here's the latest one I found:
$a='bas'.'e6'.'4_d'.'ecode';eval($a
So because the scanner searches for base64_decode this file wasn't picked up as they are using PHP to concatenate base64_decode in a variable, and then call the variable.
Forgive me because I've just started with regex, but is it even possible to search for something like this using regex? I understand how to write a regex that matches that exact one (and was able to), but what if they used this instead:
$a='b'.'ase'.'64_d'.'ecode';eval($a
It wouldn't be picked up because the regex was looking for ' then b then a, etc etc.
I've already added
(eval)\(\$[a-z]
to send me an email as a notice to check the file. I'll have to let it run for a couple of days and see how many false positives show up, but my main concern is with base64_decode.
If someone could please shed some light on this for me and maybe point me in the right direction, I would greatly appreciate it.
Thanks!!
You can use this regexp:
b\W*a\W*s\W*e\W*6\W*4\W*_\W*d\W*e\W*c\W*o\W*d\W*e
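It searches for base64_decode with any run of non-word characters (anything other than letters, digits, and underscores) interspersed between its characters, so the quotes, dots, and spaces used for concatenation are skipped.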
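As a rough sketch, the pattern could be dropped into a scanner like this. The function name looksLikeBase64Decode and the constant are hypothetical, and the i flag is an addition of this sketch (PHP function names are case-insensitive, so BASE64_DECODE would also work as a call). Expect some false positives, since \W* will happily skip over any punctuation.

```php
<?php
// The answer's pattern with PHP delimiters added. The i flag is an
// addition of this sketch, not part of the original answer.
const BASE64_DECODE_PATTERN = '/b\W*a\W*s\W*e\W*6\W*4\W*_\W*d\W*e\W*c\W*o\W*d\W*e/i';

// Hypothetical helper: true when the source contains base64_decode,
// either written out or split across concatenated string literals.
function looksLikeBase64Decode(string $source): bool
{
    return preg_match(BASE64_DECODE_PATTERN, $source) === 1;
}

$samples = [
    "\$a='bas'.'e6'.'4_d'.'ecode';eval(\$a",
    "\$a='b'.'ase'.'64_d'.'ecode';eval(\$a",
    'echo "hello";',
];

foreach ($samples as $sample) {
    echo $sample, ' => ', looksLikeBase64Decode($sample) ? 'flagged' : 'clean', "\n";
}
```

Both obfuscated samples from the question are flagged, while the harmless echo line is not.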

Checking the structure of a string using preg_match

I don't have a deep knowledge of regular expressions (I just started learning them today). I have a website, and I want to ask how to validate a 6-character security code in either of these forms:
1. LNLNLN
or
2. NLNLNL
Where L = Letter and N = Number
I am not sure of the best way to do this, but I have seen people using preg_match() to validate data. I found that using this regular expression works:
^[a-zA-Z][0-9][a-zA-Z][0-9][a-zA-Z][0-9]|^[0-9][a-zA-Z][0-9][a-zA-Z][0-9][a-zA-Z]
but this seems pretty long. I wonder if there is any way that I can check this more easily? Thank you
Use repetition:
^([a-zA-Z][0-9]){3}|^([0-9][a-zA-Z]){3}
Then use the escape sequence \d for digits:
^([a-zA-Z]\d){3}|^(\d[a-zA-Z]){3}
With the i (case-insensitive) option you can shorten it even further:
^([a-z]\d){3}|^(\d[a-z]){3}
preg_match('/^([a-z]\d){3}|^(\d[a-z]){3}/i', $string)
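One thing to watch: the patterns above are only anchored at the start, so a longer string that begins with a valid code still matches. If the code must be exactly 6 characters, a variant anchored at the end might look like the sketch below. The helper name isSecurityCode is made up for illustration.

```php
<?php
// Hypothetical helper name, used only for illustration.
function isSecurityCode(string $code): bool
{
    // (?: ... ) groups without capturing; \z requires the match to end
    // exactly at the end of the string, so extra characters are rejected.
    return preg_match('/^(?:(?:[a-z]\d){3}|(?:\d[a-z]){3})\z/i', $code) === 1;
}

var_dump(isSecurityCode('A1b2C3'));   // bool(true)
var_dump(isSecurityCode('1a2b3c'));   // bool(true)
var_dump(isSecurityCode('a1b2c3d'));  // bool(false): the unanchored pattern accepts this
```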

Basic Regular Expression for

For some reason I always get stuck making anything past extremely basic regular expressions.
I'm trying to make a regular expression that kind of looks like a URL. I only want basic checking.
I would like it to match the following patterns where X is "something".
X://X.X
X://X.X... etc.
X.X
X.X... etc
If the string contains one of these patterns, it is sufficient checking for me. This way a url like www.example.com:8888 will still match. I have tried many different REGEX combinations with preg_match and cannot seem to get any to behave the way I want it to. I have consulted many other related REGEX questions on SO but my readings have not helped me.
Any help? I will be happy to provide more information if you would like but I don't know what else you would need.
It takes practice but here is one that I made using a regex tester (http://www.regextester.com/) to check my pattern:
^.+(:\/\/|\.)([a-zA-Z0-9]+\.)+.+
My approach is to build the pattern slowly from the beginning, adding one piece at a time. This cheatsheet is extremely helpful for remembering what everything is: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
Basically the pattern starts at the beginning of the string and checks for any characters followed by either :// or . then checks for groupings of letters and numbers followed by a . ending with any number of characters.
The pattern could probably be improved with groupings to not pass on invalid characters. But this one was quick and dirty. You could replace the first and last . with the characters that would be valid.
UPDATE
Per the comments here is an updated pattern:
^.+?(:\/\/|\.)?([a-zA-Z0-9]+?\.)+.+
/^(.+:\/\/)?[^.]+\.[^.\/]+([.\/][^.\/]+)*$/
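To see how the last pattern behaves on the shapes from the question, here is a small sketch. The function name looksLikeUrl and the sample strings (other than www.example.com:8888) are assumptions made for illustration.

```php
<?php
// Hypothetical wrapper around the last pattern above.
function looksLikeUrl(string $candidate): bool
{
    // Optional "X://" prefix, then "X.X", then any number of
    // further ".X" or "/X" segments up to the end of the string.
    return preg_match('/^(.+:\/\/)?[^.]+\.[^.\/]+([.\/][^.\/]+)*$/', $candidate) === 1;
}

$candidates = [
    'http://example.com',
    'www.example.com:8888',
    'example.com/path/page.html',
    'example',
];

foreach ($candidates as $candidate) {
    printf("%-28s %s\n", $candidate, looksLikeUrl($candidate) ? 'match' : 'no match');
}
```

The first three strings match, and the bare word without a dot does not. This is deliberately loose checking, as requested, not real URL validation.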

Regular expression for counting sentences in a block of text [duplicate]

Possible Duplicate:
PHP - How to split a paragraph into sentences.
I have a block of text that I would like to separate into sentences, what would be the best way of doing this? I thought of looking for '.','!','?' characters, but I realized there were some problems with this, such as when people use acronyms, or end a sentence with something like !?. What would be the best way to handle this? I figured there would be some regex that could handle this, but I'm open to a non-regex solution if that fits the problem better.
Regex isn't the best solution for this problem. You'd be better served by creating a parsing library: something where you can easily create logic blocks to distinguish one thing from another. You'll need to come up with a set of rules for breaking the text up into the chunks you'd like to see.
"Are you sure?" he asked.
Doesn't that mess things up when using regex? However, with a parser you could actually see
<start quote><capitalization>are you sure<question><end quote>he asked<period>
that with simple rules could say "that's one sentence."
Unfortunately there is no perfect solution for this, for the very reasons you stated. If it is content that you can somehow control, or if you can force a specified delimiter after every sentence, that would be ideal. Beyond that, all you can really do is look for (\.|!|\?)+ and maybe even throw in a \s after that, since most people put 1 or 2 spaces between one sentence and the next.
I think the biggest problem is the possible existence of abbreviations! For example, in a JavaDoc summary sentence you must write something like Prof.&nbsp;Knuth so that the Javadoc generator doesn't think that the first sentence ends after Prof..
This is a problem I don't know how anyone can reliably handle. The only approximate solution I could imagine is the use of an abbreviation dictionary.
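For what it's worth, the two ideas above (split on terminators followed by whitespace, then consult an abbreviation dictionary) can be combined in a few lines. This is only an approximate sketch: the function name splitSentences and the tiny default abbreviation list are assumptions, and a real dictionary would need to be much larger.

```php
<?php
// Approximate sentence splitter. The function name and the default
// abbreviation list are illustrative assumptions, not a complete solution.
function splitSentences(string $text, array $abbreviations = ['Prof.', 'Dr.', 'Mr.', 'Mrs.']): array
{
    // Split on whitespace that directly follows ., ! or ?
    $chunks = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $sentences = [];
    $buffer = '';
    foreach ($chunks as $chunk) {
        $buffer = $buffer === '' ? $chunk : $buffer . ' ' . $chunk;

        // If the chunk ends with a known abbreviation, the split was a
        // false alarm: keep accumulating instead of closing the sentence.
        $words = explode(' ', $buffer);
        if (in_array(end($words), $abbreviations, true)) {
            continue;
        }
        $sentences[] = $buffer;
        $buffer = '';
    }
    if ($buffer !== '') {
        $sentences[] = $buffer;
    }
    return $sentences;
}

print_r(splitSentences('I met Prof. Knuth today. Really!? Yes.'));
```

With this input the result is three sentences, and "Prof. Knuth" stays together. An abbreviation that is missing from the list will still split the text in the wrong place.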

Regex problem Email test

I have a problem with the pattern below:
/([A-Z0-9]+[A-Z0-9\.\_\+\-]*){3,64}@(([A-Z0-9]+([-][A-Z0-9])*){2,}\.)+([A-Z0-9]+([-][A-Z0-9])*){2,}/i
It matches email addresses, and I have a problem with this rule:
[A-Z0-9\.\_\+\-]*
If I remove the star it works, but I want these characters to appear 0 or more times. I tested it on http://regexpal.com/ and it works there, but with preg_match_all (PHP) it doesn't.
Thanks
Why not use PHP's filter_var()?
filter_var('test@email.com', FILTER_VALIDATE_EMAIL)
There is no good regex to validate email addresses. If you absolutely MUST use regex, then maybe have a look at Validate an E-Mail Address with PHP, the Right Way. Although, this is by no means a perfect measure either.
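A minimal sketch of the filter_var() approach: the function returns the filtered value on success and false on failure, so the result should be compared against false explicitly. The wrapper name isValidEmail and the sample inputs are assumptions for illustration.

```php
<?php
// Hypothetical wrapper; filter_var() returns the address itself when it
// is valid and false otherwise, so compare strictly against false.
function isValidEmail(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(isValidEmail('test@email.com'));  // bool(true)
var_dump(isValidEmail('test#email.com'));  // bool(false): no @ sign
var_dump(isValidEmail('not-an-email'));    // bool(false)
```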
Edit: After some digging, I came across Mailparse.
Mailparse is an extension for parsing and working with email messages. It can deal with RFC 822 and RFC 2045 (MIME) compliant messages.
Mailparse is stream based, which means that it does not keep in-memory copies of the files it processes - so it is very resource efficient when dealing with large messages.
First of all, there are plenty of resources for this available. A quick search for "email validation regex" yields tons of results... Including This One...
Secondly, the problem is not in the * character. The problem is in the whole block.
([A-Z0-9]+[A-Z0-9\.\_\+\-]*){3,64}
Look at what that's doing. It's basically saying match as many alpha-numerics as possible, then match as many alpha-numerics with other characters as possible, then repeat at least 3 and at most 64 times. That could be a LOT of characters...
Instead, you could do:
([A-Z0-9][A-Z0-9\.\_\+\-]{2,63})
Which will match at most 64 characters before the @.
Oh, and this is the pain of parsing emails with regex
There are plenty of other resources for validating email addresses (Including filter_var). Do some searching and see how the popular frameworks do it...
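To make the difference concrete, here is a small comparison of quantifying the whole group versus quantifying only the character class. It is a sketch: the leading ^ and the trailing @ were added to both patterns so that only the part before the @ is tested, and the two sample addresses are illustrative inputs.

```php
<?php
// Both patterns get a leading ^ and a trailing @ (additions of this
// sketch) so that only the part before the @ is being compared.
$groupQuantified = '/^([A-Z0-9]+[A-Z0-9\.\_\+\-]*){3,64}@/i';
$classQuantified = '/^[A-Z0-9][A-Z0-9\.\_\+\-]{2,63}@/i';

$samples = ['a5________@gmail.com', 'a____a___a___@gmail.com'];

foreach ($samples as $email) {
    printf(
        "%-26s group: %d  class: %d\n",
        $email,
        preg_match($groupQuantified, $email),
        preg_match($classQuantified, $email)
    );
}
```

The group-quantified pattern rejects the first address because every repetition of the group has to start with a letter or digit, and there are only two of those before the @. The class-quantified pattern accepts both.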
Try this regex :
/^[A-Z0-9][A-Z0-9\.\_\+\-]{3,64}@([A-Z0-9][-A-Z0-9]*\.)+[A-Z0-9]{2,}$/i
But like @Russell Dias said, you shouldn't use regex for emails.
While I agree with Russell Dias, I believe your issue is with this entire block:
([A-Z0-9]+[A-Z0-9\.\_\+\-]*){3,64}
Basically you are saying you want:
Letters or numbers, 1 or more times
Letters, numbers, or the characters . _ + -, 0 or more times
Repeat the above between 3 and 64 times
You have the quantifier after the whole group:
([A-Z0-9]+[A-Z0-9\.\_\+\-]*){3,64}
So this requires a minimum of 3 alphanumeric characters (each repetition of the group must start with one), and something like this:
a5________@gmail.com
will not work, but this:
a____a___a___@gmail.com
will. Better to find a ready-made, well-tested regex.
Also, you don't have start and end anchors (^ and $), so something like this will pass:
&^$##&$^##&aaa5a55a55a@gmail.comADA;'DROP TABLE :)
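The effect of the missing anchors is easy to reproduce. The sketch below uses the pattern from the "Try this regex" answer, written with @ as the separator, once without and once with ^ and $. The variable names and the clean sample address are assumptions for illustration.

```php
<?php
// Same pattern twice: without anchors it only has to match somewhere
// inside the input; with ^ and $ the whole input has to be an address.
$unanchored = '/[A-Z0-9][A-Z0-9\.\_\+\-]{3,64}@([A-Z0-9][-A-Z0-9]*\.)+[A-Z0-9]{2,}/i';
$anchored   = '/^[A-Z0-9][A-Z0-9\.\_\+\-]{3,64}@([A-Z0-9][-A-Z0-9]*\.)+[A-Z0-9]{2,}$/i';

$junk  = '&^$##&$^##&aaa5a55a55a@gmail.comADA;\'DROP TABLE :)';
$clean = 'user.name@example.com';

var_dump(preg_match($unanchored, $junk));   // int(1): matches the address buried in the junk
var_dump(preg_match($anchored, $junk));     // int(0)
var_dump(preg_match($anchored, $clean));    // int(1)
```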
