Is there a good way to test whether a string is a regex or a normal string in PHP?
Ideally I want to write a function to run a string through, that returns true or false.
I had a look at preg_last_error():
<?php
preg_match('/[a-z]/', 'test');
var_dump(preg_last_error());
preg_match('invalid regex', 'test');
var_dump(preg_last_error());
?>
Obviously the first one is not an error and the second one is, but preg_last_error() returns int 0 both times.
Any ideas?
The simplest way to test if a string is a regex is:
if( preg_match("/^\/.+\/[a-z]*$/i",$regex))
This will tell you if a string has a good chance of being intended as a regex. However, there are many strings that would pass that check but fail as a regex: unescaped slashes in the middle, unknown modifiers at the end, mismatched parentheses, etc. could all cause problems.
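To make the gap between "looks like a regex" and "compiles as a regex" concrete, here is a minimal sketch. The helper names looksLikeRegex() and compiles() are made up for this illustration; the compile check relies on preg_match() returning false when the pattern cannot be compiled.

```php
<?php
// Shape check from above: leading slash, a body, closing slash, optional letter modifiers.
function looksLikeRegex(string $s): bool
{
    return preg_match("/^\/.+\/[a-z]*$/i", $s) === 1;
}

// Compile check: preg_match() returns false if the pattern itself is broken.
// The @ hides the warning PHP emits in that case.
function compiles(string $s): bool
{
    return @preg_match($s, '') !== false;
}

foreach (['/[a-z]+/i', '/[blah/', '/a/b/', 'plain text'] as $candidate) {
    printf(
        "%-12s shape=%s compiles=%s\n",
        $candidate,
        var_export(looksLikeRegex($candidate), true),
        var_export(compiles($candidate), true)
    );
}
```

The second and third candidates pass the shape check but do not compile (an unclosed character class and an unknown modifier, respectively).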
The reason preg_last_error() returned 0 is that the "invalid regex" failure is none of these:
PREG_INTERNAL_ERROR (an internal error)
PREG_BACKTRACK_LIMIT_ERROR (excessively forcing backtracking)
PREG_RECURSION_LIMIT_ERROR (excessively recursing)
PREG_BAD_UTF8_ERROR (badly formatted UTF-8)
PREG_BAD_UTF8_OFFSET_ERROR (offset to the middle of a UTF-8 character)
Here is a good answer that shows how:
https://stackoverflow.com/a/12941133/2519073
if(@preg_match($yourPattern, '') === false){
    //pattern is broken
}else{
    //pattern is real
}
The only easy way to test if a regex is valid in PHP is to use it and check if a warning is thrown.
ini_set('track_errors', 'on');
$php_errormsg = '';
@preg_match('/[blah/', '');
if($php_errormsg) echo 'regex is invalid';
However, using arbitrary user input as a regex is a bad idea. There have been security holes (buffer overflow => remote code execution) in the PCRE engine before, and it might be possible to create specially crafted long regexes that require lots of CPU/memory to compile or execute.
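The track_errors setting and $php_errormsg were deprecated in PHP 7.2 and removed in PHP 8.0, so the snippet above only works on older versions. A rough equivalent, assuming no custom error handler is installed that swallows the warning, is to read it back with error_get_last(). The function name regexError() is invented for this sketch.

```php
<?php
// Returns null when the pattern compiles, otherwise the text of the PHP warning.
function regexError(string $pattern): ?string
{
    error_clear_last();                      // forget any earlier error (PHP 7.0+)
    if (@preg_match($pattern, '') !== false) {
        return null;                         // the pattern compiled and ran
    }
    $last = error_get_last();                // still populated even though @ hid the warning
    return $last['message'] ?? 'unknown error';
}

var_dump(regexError('/[a-z]/'));   // NULL
var_dump(regexError('/[blah/'));   // a string describing the compilation failure
```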
Why not just use...another regex? Three lines, no @ kludges or anything:
// Test this string
$str = "/^[A-Za-z ]+$/";
// Compare it to a regex pattern that simulates any regex
$regex = "/^\/[\s\S]+\/$/";
// Will it blend?
echo (preg_match($regex, $str) ? "TRUE" : "FALSE");
Or, in function form, even prettier:
// A static method, so it has to live inside a class.
public static function isRegex($str0) {
    $regex = "/^\/[\s\S]+\/$/";
    return preg_match($regex, $str0) === 1; // true or false rather than 1 or 0
}
This doesn't test validity, but the question asks how to tell whether a string is a regex or a normal string, and it does do that.
I have a problem with the strpos & substr function, thank you for your help:
$temp = "U:hhp|E:123#gmail.com,P:h123";
$find_or = strpos($temp,"|");
$find_and = strpos($temp,",");
$find_user = substr($temp,2,$find_or-2);
$find_email = substr($temp,$find_or+3,$find_and);
$find_passeord = substr($temp,$find_and+3,strlen($temp));
echo("$find_user+$find_email+$find_passeord<br/>");
/************************************/
Why is the output like this?
hhp+123#gmail.com,P:h123 +h123
But I want this:
hhp+123#gmail.com,h123
The problem is that $find_and is the index of the comma, but the third argument to substr() needs to be the length of the substring, not the ending index. So
$find_email = substr($temp,$find_or+3,$find_and);
should be
$find_email = substr($temp,$find_or+3,$find_and-$find_or-3);
For $find_passeord you can omit the 3rd argument, since the default is the end of the string.
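Putting both corrections together, the original snippet would look roughly like this (same variable names as in the question, only the substr() arguments changed):

```php
<?php
$temp = "U:hhp|E:123#gmail.com,P:h123";
$find_or  = strpos($temp, "|");   // 5
$find_and = strpos($temp, ",");   // 21

// substr(string, start, length): the third argument is a length, not an end index.
$find_user     = substr($temp, 2, $find_or - 2);
$find_email    = substr($temp, $find_or + 3, $find_and - $find_or - 3);
$find_passeord = substr($temp, $find_and + 3);   // no length: runs to the end of the string

echo "$find_user+$find_email+$find_passeord";    // hhp+123#gmail.com+h123
```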
However, this would be simpler with a regular expression:
if (preg_match('/^U:(.*?)\|E:(.*?),P:(.*)/', $temp, $match)) {
list($whole, $user, $email, $password) = $match;
}
If you have control over the input, I would suggest:
$temp = "U:hhp|E:123#gmail.com|P:h123";
list($user, $email, $password) = explode("|",$temp);
$user = explode(":",$user)[1];
$email = explode(":",$email)[1];
$password = explode(":",$password)[1];
If not, I still recommend exploding the string into parts and working your way down to what you need. https://3v4l.org/ is a great site for testing PHP code; here is an example of this working: https://3v4l.org/upEGG
Echoing what Barmar just said in a comment, regular expressions are definitely the best way to "break up a string." (It is quite literally much of what they are for.) In PHP, this is the preg_ family of functions (e.g. preg_match, preg_match_all, preg_replace).
The million-dollar idea behind a "regular expression" is that it is a string-matching pattern. If the string "matches" that pattern, you can easily extract the exact substrings which matched portions of it.
In short, all of the strpos/substr logic that you are right now wrestling with ... "goes away!" Poof.
For example, this pattern: ^(.*)\|(.*),(.*)$ ...
It says: "Anchored at the beginning of the string ^, capture () a pattern consisting of zero or more occurrences of any character (.*), until you encounter a literal | (written \|, because a bare | means alternation). Now, for the second group, proceed until you find a ,. Then, for the third group, proceed to take any character until the end of the string $."
You can "match" that regular expression and simply be handed all three of these groups! (As well as "the total string that matched.") And you didn't have to "write" a thing!
There are thousands of web pages by now which discuss this "remarkable 'programming language' within a single cryptic string." But it just might be the most pragmatically-useful technology for any practitioner to know, and every programming language somehow implements it, more-or-less following the precedent first set by the (still active) programming language, Perl.
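As a small illustration of being "handed all three groups", here is that pattern applied to the string from the question, with the | escaped as \| so that it is read as a literal character. The variable names $first, $second and $third are just placeholders for this sketch.

```php
<?php
$temp = "U:hhp|E:123#gmail.com,P:h123";

if (preg_match('/^(.*)\|(.*),(.*)$/', $temp, $match)) {
    // $match[0] is the whole matched string; $match[1..3] are the captured groups.
    [$whole, $first, $second, $third] = $match;
    echo "$first / $second / $third";   // U:hhp / E:123#gmail.com / P:h123
}
```

The groups still carry their U:, E: and P: prefixes; putting those prefixes into the pattern, outside the parentheses, would strip them.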
I want to check a particular string is present in a content without considering space differences or newline characters.
Case 1
input string: this is a sample test
checking word: asample test
Case 2
input string: sample test1 \n new line content
checking word : test1 new
You can use the following logic: remove all whitespace characters from both the pattern string and the string to analyze, then check whether the string contains the pattern.
PROTOTYPE:
$input1='this is a sample test';
$inputFiltered1= preg_replace('/\s+/', '', $input1); // str_replace() would treat '\s' as literal text
$pattern1='asample test';
$patternFiltered1= preg_replace('/\s+/', '', $pattern1);
$input2="sample test1 \n new line content"; // double quotes, so \n is a real newline
$inputFiltered2= preg_replace('/\s+/', '', $input2);
$pattern2='test1 new';
$patternFiltered2= preg_replace('/\s+/', '', $pattern2);
if (strpos($inputFiltered1, $patternFiltered1) !== false) {
echo 'true';
}
if (strpos($inputFiltered2, $patternFiltered2) !== false) {
echo 'true';
}
Method #1: Pure regex, $input is unmodified
$pattern='~'.preg_replace('~\S(?!$)\K\s*~','\s*',$check).'~';
if(preg_match($pattern,$input)){
echo "Found: $check";
}else{
echo "Did Not Find: $check";
}
Method #2: Regex modification of $input and $check
if (strpos(preg_replace('~\s+~', '', $input), preg_replace('~\s+~', '', $check))!==false){
echo "Found: $check";
}else{
echo "Did Not Find: $check";
}
Method #3: Regex-free modification of $input and $check
$whitechars=[' ',"\t","\r","\n"]; // hardcode whitespace characters
if (strpos(str_replace($whitechars, '', $input), str_replace($whitechars, '', $check))!==false){
echo "Found: $check";
}else{
echo "Did Not Find: $check";
}
Now, you might be asking: "Which one should I choose for my project?"
Answering that properly will first depend on the size and content of your $input data, then the number of times you'll be running this snippet, then personal preference, and probably least importantly the size and content of $check.
If your $input data is relatively large, then you will want to avoid performing any modifications on the string because of the impact on speed. Assuming your $check string is usually going to be quite small, modifying this value alone is going to produce minimal "drag" on execution time. As a general answer, I would recommend Method #1; although it uses two preg_ calls, only $check is rewritten, and that is a very small string. I should explain that Method #1 only prepares the $check value by placing \s* between all visible characters. If there are any whitespace characters in the string, they are removed during pattern preparation. This assumes there are no leading or trailing whitespace characters in $check; otherwise, call trim() or refine the pattern preparation.
If you want to prepare both $input and $check by removing all occurrences of whitespace characters, then the most direct approach would be to call preg_replace() on one or more \s characters and replace with an empty string (Method #2). This will tell the regex engine to isolate substrings of whitespace and remove them via a "single scan" of the given string. By comparison, if you want to avoid regex functions, you can use str_replace() to perform the same task. However, using Method #3 means that you will need to individually "inform" the function of all characters to be removed. str_replace() doesn't know what \s means. Unfortunately, by listing the whitespace characters in an array, the function will be performing n "iterated scans" of the string AND it can only replace one character at a time.
To crystallize, if you have a string that contains a[space][newline][newline][tab][space]b, then preg_replace() will scan the string once and make a single replacement. If you call str_replace() on the same string, it will make four scans (remember: [' ',"\t","\r","\n"]) and make five separate replacements.
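The replacement counts can be checked directly, since both functions accept an optional by-reference count argument. This sketch only reports the number of replacements made, not the number of scans; the variable names are arbitrary.

```php
<?php
$subject = "a \n\n\t b";   // a[space][newline][newline][tab][space]b

// Fifth argument of preg_replace() and fourth of str_replace() receive the replacement count.
$viaRegex = preg_replace('~\s+~', '', $subject, -1, $regexCount);
$viaStr   = str_replace([' ', "\t", "\r", "\n"], '', $subject, $strCount);

echo "$viaRegex ($regexCount), $viaStr ($strCount)";   // ab (1), ab (5)
```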
I suppose the message that I am driving home is: Regular Expressions sometimes get a bad rap, but this seems like a compelling case to select a regex method over a non-regex method (unless you benchmark your actual project and find that the regex methods are in fact negatively impacting performance).
Is there a regexp to check whether a string is a valid PHP regexp?
In the back office of my application, the administrator defines the data types and can define a pattern as a regexp, for example /^[A-Z][a-zA-Z]+[a-z]$/. In the front office, I use this pattern to validate user entries.
In my code I use the PHP preg_match function:
preg_match($pattern, $user_entries);
The first argument must be a valid PHP regexp. How can I be sure that $pattern is a valid regexp, since it is user input from my back office?
Any ideas?
Execute it and catch any warnings.
$track_errors = ini_get('track_errors');
ini_set('track_errors', 'on');
$php_errormsg = '';
@preg_match($regex, 'dummy');
$error = $php_errormsg;
ini_set('track_errors', $track_errors);
if($error) {
// do something. $error contains the PHP warning thrown by the regex
}
If you just want to know if the regex fails or not you can simply use preg_match($regex, 'dummy') === false - that won't give you an error message though.
As a work-around, you could just try and use the regex and see if an error occurs:
function is_regex($pattern)
{
    return @preg_match($pattern, "") !== false;
}
The function preg_match() returns false on error, and an int when it executes without error.
Background: I don't know if regular expressions themselves form a regular grammar, i.e. whether it's even possible in principle to verify a regex with a regex. The generic approach is to start parsing and checking if an error occurs, which is what the workaround does.
Technically, any expression can be a valid regular expression...
So, the validity of a regular expression will depend on the rules you want to respect.
I would:
Identify the rules your regex must follow
Use a preg_match of your own, or some combination of substr to validate the pattern
You could use T-Regx library:
<?php
if (pattern('invalid {{')->valid()) {
    // the pattern is valid
}
https://t-regx.com/docs/is-valid
if(preg_match("/" . $filter . "/i", $node)) {
echo $node;
}
This code filters a variable to decide whether to display it or not. An example entry for $filter would be "office" or "164(.*)976".
I would like to know whether there is a simple way to say "if $filter does not match in $node" in the form of a regular expression.
So... not an "if(!preg_match" but more of a $filter = "!office" or "!164(.*)976", but one that works?
This can be done if you definitely want to use a "negative regex" instead of simply inverting the result of the positive regex:
if(preg_match("/^(?:(?!" . $filter . ").)*$/i", $node)) {
echo $node;
}
will match a string if it doesn't contain the regex/substring in $filter.
Explanation: (taking office as our example string)
^ # Anchor the match at the start of the string
(?: # Try to match the following:
(?! # (unless it's possible to match
office # the text "office" at this point)
) # (end of negative lookahead),
. # Any character
)* # zero or more times
$ # until the end of the string
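A quick way to try this out is to wrap the pattern in a function. The name lacksFilter() is made up for this sketch, and $filter is inserted into the pattern unescaped, exactly as in the snippet above, so it is treated as a regex rather than as literal text.

```php
<?php
// True when $filter does NOT match anywhere in $node.
// Note: . does not match newlines by default, so a multi-line $node would need the s modifier.
function lacksFilter(string $filter, string $node): bool
{
    return preg_match("/^(?:(?!" . $filter . ").)*$/i", $node) === 1;
}

var_dump(lacksFilter('office', 'Head Office'));        // bool(false)
var_dump(lacksFilter('office', 'Warehouse'));          // bool(true)
var_dump(lacksFilter('164(.*)976', '164-000-976'));    // bool(false)
var_dump(lacksFilter('164(.*)976', '164-000'));        // bool(true)
```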
The (?!...) negative assertion is what you're looking for.
To exclude a certain string from appearing anywhere in the subject you can use this double assertion method:
preg_match('/(?=^((?!not_this).)+$) (......)/xs', $string);
It still allows you to specify an arbitrary (......) main regex. But you could just leave that out if you only want to forbid a string.
Answer number 2 by mario is the correct answer, and here is why:
First to answer the comment by Justin Morgan,
I'm curious, do you have any idea what the performance of this would
be as opposed to the !preg_match() approach? I'm not in a place where
I can test them both. – Justin Morgan Apr 19 '11 at 21:53
Consider the gate logic for a moment.
When to negate preg_match(): when looking for a match and you want the condition to be 1) true for the absence of the desired regex, or 2) false for the regex being present.
When to use a negative assertion in the regex: when looking for a match and you want the condition to be true if the string ONLY matches the regex, and fail if anything else is found. This is necessary if you really need to test for undesirable characters while allowing omission of permitted characters.
Negating the result of (preg_match() === 1) only tests if the regex is present. If 'bar' is required, and numbers aren't allowed, the following won't work:
if (preg_match('/bar/', 'foo2bar') === 1) {
    echo "found 'bar'"; // but a number is here, so fail.
}
if (preg_match('/[0-9]/', 'foobar') === 0) {
    echo "no numbers found"; // but didn't test for 'bar', so fail.
}
So, in order to really test multiple regexes, a beginner would test using multiple preg_match() calls... we know this is a very amateur way to do it.
So, the Op wants to test a string for possible regexes, but the conditional may only pass as true if the string contains at least one of them. For most simple cases, simply negating preg_match() will suffice, but for more complex or extensive regex patterns, it won't. I will use my situation for a more real-life scenario:
Say you want to have a user form for a person's name, particularly a last name. You want your system to accept all letters regardless of case and placement, accept hyphens, accept apostrophes, and exclude all other characters. We know that matching a regex for all undesired characters is the first thing we think of, but imagine you are supporting UTF-8... that's a lot of characters! Your program will be nearly as big as the UTF-8 table just on a single line! I don't care what hardware you have, your server application has a finite limit on how long a command can be, not to mention the limit of 200 parenthesized subpatterns, so the ENTIRE UTF-8 character table (minus [A-Z], [a-z], -, and ') is too long, never mind that the program itself will be HUGE!
Since we won't use an if (!preg_match('.#\\$\%... (this can be quite long and impossible to evaluate) on a string to see if the string is bad, we should instead test the easier way, with a negative lookahead assertion in the regex, then negate the overall result using:
<?php
$string = "O'Reilly-Finlay";
if (preg_match('/(?![a-z\'-])./is', $string) === 0) {
echo "the given string matched exclusively for regex pattern";
// should not work on error, since preg_match returns false, which is not an int (we tested for identity, not equality)
} else {
echo "the given string did not match exclusively to the regex pattern";
}
?>
If we only looked for the regex /[a-z\'-]/i, all we say is "match the string if it contains ANY of those things", so bad characters aren't tested. If we negated at the function, we say "return false if we find a match that contained any of these things". This isn't right either, so we need to say "return false if we match ANYTHING not in the regex", which is done with a negative lookahead. I know the bells are going off in someone's head, and they are thinking wildcard expansion style... no, lookahead doesn't do this; it just negates the test at each position and continues. So it checks the first character against the regex; if the character is allowed, it moves on until it finds a disallowed character or the end. When it finishes, preg_match() returns 1 if a disallowed character was found (and puts it in the match array), or 0 if none was. In short, a negative assertion on regex 'a' is the opposite of matching regex 'b', where 'b' contains EVERYTHING ELSE not matchable by 'a'. Great for when 'b' would be ungodly extensive.
Note: if my regex has an error in it, I apologize... I have been using Lua for the last few months, so I may be mixing up my regex rules. Otherwise, (?!...) is the proper negative lookahead syntax for PHP.
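For comparison, here is a minimal sketch of the "nothing outside the allowed set" check written two ways: with a negative lookahead and with a negated character class. Both function names are invented for this example, and both only cover ASCII letters; supporting other alphabets would need something like \p{L} together with the u modifier, which is not shown here.

```php
<?php
// Lookahead form: find any single character that is NOT a letter, apostrophe or hyphen.
function onlyNameCharsLookahead(string $s): bool
{
    return preg_match('/(?![a-z\'-])./is', $s) === 0;
}

// Negated character class: the same test without a lookahead.
function onlyNameCharsNegatedClass(string $s): bool
{
    return preg_match('/[^a-z\'-]/i', $s) === 0;
}

var_dump(onlyNameCharsLookahead("O'Reilly-Finlay"));      // bool(true)
var_dump(onlyNameCharsNegatedClass("O'Reilly-Finlay"));   // bool(true)
var_dump(onlyNameCharsLookahead("O'Reilly2"));            // bool(false)
var_dump(onlyNameCharsNegatedClass("O'Reilly2"));         // bool(false)
```

Both versions treat an empty string as valid, because there is no disallowed character to find; add a separate length check if that matters.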
I have an input field where both regular text and sprintf tags can be entered.
Example: some text here. %1$s done %2$d times
How do I validate the sprintf parts so it's not possible to enter them wrong, like %$1s?
The text is UTF-8, and as far as I know regexes only match Latin-1 characters.
www.regular-expressions.info does not list /u anywhere, which I think is used to tell that string is unicode.
Is the best way to just search the whole input field string for % or $ and if either found then apply the regex to validate the sprintf parts ?
I think the regex would be: /%\d\$(s|d|u|f)/u
I originally used Gumbo's regex to parse sprintf directives, but I immediately ran into a problem when trying to parse something like %1.2f. I ended up going back to PHP's sprintf manual and wrote the regex according to its rules. By far I'm not a regex expert, so I'm not sure if this is the cleanest way to write it:
/%(?:\d+\$)?[+-]?(?:[ 0]|'.{1})?-?\d*(?:\.\d+)?[bcdeEufFgGosxX]/
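To see what this pattern accepts, here is a small sketch that anchors it with ^ and $ so that it tests one directive at a time. The function name isSprintfDirective() and the sample directives are made up for the illustration.

```php
<?php
// True when $directive is exactly one sprintf directive according to the pattern above.
function isSprintfDirective(string $directive): bool
{
    // Single-quoted PHP string, so the ' inside the pattern is written as \'.
    $regex = '/^%(?:\d+\$)?[+-]?(?:[ 0]|\'.{1})?-?\d*(?:\.\d+)?[bcdeEufFgGosxX]$/';
    return preg_match($regex, $directive) === 1;
}

foreach (['%s', '%1$s', '%1.2f', '%05d', '%$1s'] as $d) {
    echo $d, ' => ', var_export(isSprintfDirective($d), true), "\n";
}
```

Only the last sample, %$1s, is rejected.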
The UTF-8 modifier is not necessary unless you use UTF-8 in your pattern. Besides that, the sprintf format is more complex. Try the following:
/%(?:\d+\$)?[dfsu]/
This would match both the %s and %1$s format.
But if you want to check every occurrence of % and whether a valid sprintf() format is following, regular expressions would not be a good choice. A sequential parser would be better.
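A rough sketch of such a sequential check: walk the text from one % to the next and test what follows each of them. The function name firstInvalidDirectiveOffset() is hypothetical, the simple [dfsu] pattern above is reused for the directive itself, and treating %% as a literal percent sign is an assumption of this sketch.

```php
<?php
// Returns the offset of the first % that does not start a valid directive, or null if all are fine.
function firstInvalidDirectiveOffset(string $text): ?int
{
    $offset = 0;
    while (($pos = strpos($text, '%', $offset)) !== false) {
        // \G anchors the match at the offset passed as the fifth argument.
        if (preg_match('/\G%(?:%|(?:\d+\$)?[dfsu])/', $text, $m, 0, $pos) !== 1) {
            return $pos;
        }
        $offset = $pos + strlen($m[0]);   // continue after the directive just consumed
    }
    return null;
}

var_dump(firstInvalidDirectiveOffset('some text here. %1$s done %2$d times'));   // NULL
var_dump(firstInvalidDirectiveOffset('bad %$1s'));                               // int(4)
```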
This is what I ended up with, and it's working.
// Always use server validation even if you have JS validation
if (!isset($_POST['input']) || empty($_POST['input'])) {
// Do stuff
} else {
$matches = explode(' ',$_POST['input']);
$validInput = true;
foreach ($matches as $m) {
// Check if a slice contains %$[number] as it indicates a sprintf format
if (preg_match('/[%\d\$]+/',$m) > 0) {
// Match found. Now check if its a valid sprintf format
if ($validInput === false || preg_match('/^%(?:\d+\$)?[dfsu]$/u',$m)===0) { // no match found
$validInput = false;
break; // Invalid sprintf format found. Abort
}
}
}
if ($validInput === false) {
// Do stuff when input is NOT valid
}
}
Thank you Gumbo for the regex pattern that matches both with and without order marking.
Edit: I realized that searching for % is wrong, since nothing will be checked if it's forgotten/omitted. Above is the new code.
"$validInput === false ||" can be omitted in the last if-statement, but I included it for completeness.