PHP very strict complex string validation using preg_match

I am trying to validate a string against the following regular expression which has been imposed upon me:
[-,.:; 0-9A-Z&#$£¥€'"«»‘’“”?!/\\()\[\]{}<>]{3}[-,.:; 0-9A-Z&#$£¥€'"«»‘’“”?!/\\()\[\]{}<>*=#%+]{0,157}
Can anybody help with writing a preg_match in PHP to validate an input string against this? I am struggling because:
my knowledge of regex isn't that great in the first place
I see special characters in the regex itself which I feel sure PHP won't be happy about me inserting directly into a string (e.g. $£¥€)
In vain hope I just tried sticking it into preg_match, escaping the double quotes, thus:
$ste = "Some string input";
if(preg_match("/[-,.:; 0-9A-Z&#$£¥€'\"«»‘’“”?!/\\()\[\]{}<>]{3}[-,.:; 0-9A-Z&#$£¥€'\"«»‘’“”?!/\\()\[\]{}<>*=#%+]{0,157}/",$ste))
{
echo "OK";
}
else
{
echo "Not OK";
}
Thanks in advance!!

PHP will be perfectly happy with the "special" characters in the expression, provided you do the following:
Make sure the input string is encoded with UTF-8 encoding.
Make sure your PHP program file is saved using UTF-8 encoding. (Obviously you'll need to use UTF-8 encoding in all other parts of your system too, or you'll get borked characters showing up somewhere along the line, but that's outside the scope of this question.)
Add the u modifier to the end of the regex pattern string to tell the regex parser to handle UTF-8 characters, i.e.:
preg_match("/....../u", ...);
                    ^
                    add this
Other than that, you've got it pretty much spot on already.
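Two details in the attempt from the question may still cause trouble, so here is a hedged sketch rather than a definitive fix. In a double-quoted PHP string, a $ followed by a byte such as £ can be read as the start of a variable name, and an unescaped / inside the pattern ends a /-delimited regex early. The sketch below sidesteps both by keeping the character classes in nowdoc strings and using ~ as the delimiter. The function name is hypothetical, and the ^, $ and D additions assume that the whole input must match.

```php
<?php
// Hypothetical helper; the name is not from the original question.
function matchesImposedPattern(string $input): bool
{
    // Nowdoc strings: no variable interpolation and no backslash
    // processing, so PCRE receives the classes exactly as written.
    $first = <<<'RE'
[-,.:; 0-9A-Z&#$£¥€'"«»‘’“”?!/\\()\[\]{}<>]
RE;
    $rest = <<<'RE'
[-,.:; 0-9A-Z&#$£¥€'"«»‘’“”?!/\\()\[\]{}<>*=#%+]
RE;
    // "~" is the delimiter because "/" occurs inside the classes.
    // u: treat pattern and subject as UTF-8.
    // D: "$" matches only at the very end, not before a trailing newline.
    $pattern = '~^' . $first . '{3}' . $rest . '{0,157}$~uD';

    return preg_match($pattern, $input) === 1;
}

var_dump(matchesImposedPattern('£10 *OFF*')); // first three characters come from the stricter class
var_dump(matchesImposedPattern('abc'));       // lowercase letters are not in either class
```

The file containing this code has to be saved as UTF-8 for the literal £, ¥, € and quote characters to line up with the u modifier.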

You can do that:
if (preg_match('~^[ -$&-),-<>?A-\]{}£¥€«»‘’“”]{3}[ -?A-\]{}£¥€«»‘’“”]{0,157}$~u', $ste))
echo 'OK';
else
echo 'Not OK';
I have added the "u" modifier for Unicode, and reduced the size of the character classes using ranges (for example, ,-< means all characters between , and < in the Unicode table).
Most importantly, I have added the anchors ^ and $, which mean the start and the end of the string respectively.
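To see why the anchors matter, here is a small sketch. It uses a simplified stand-in pattern (three uppercase letters), not the imposed expression, and the sample input is made up.

```php
<?php
// Simplified stand-in patterns, used only to illustrate anchoring.
$unanchored = '~[A-Z]{3}~u';
$anchored   = '~^[A-Z]{3}$~u';

$input = 'lower ABC lower';

// Without anchors, preg_match succeeds as soon as any substring fits.
var_dump(preg_match($unanchored, $input));

// With anchors, the whole string has to fit, so this input is rejected.
var_dump(preg_match($anchored, $input));
```

Without the anchors, a validation check like the one in the question would accept any input that merely contains three allowed characters somewhere.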

Related

How to get only integer value Started from Symbol from given string

I need the integer value that follows £ or Â£. I tried to do this with a regex, but I only get values that start with Â£.
Here is the regex I use:
if(preg_match('/(\£[0-9]+(\.[0-9]{2})?)/',$vals,$matches))
{
$main[]= str_replace('£','',$matches[0]);
}
I am not familiar with regex, so please share any solution. Any help would be highly appreciated. Thank you.

From your question I understand that you are having troubles with character encodings, so first of all I would suggest you to address this issue one step before, it is really important to resolve encoding issues in the earliest possible step.
Back to the question: first off, to avoid falling deeper into charset encoding hell, I would recommend writing your regexp literal in hex, because otherwise the charset encoding in which you save your PHP files would affect the result. For example, if you do something like this:
preg_match('/(£|Â£)(\d+)/', ...)
It would match "£" and "Â£" (binary) if you save your source code in ISO-8859-1, but it would actually match "Â£" and "Ã‚Â£" (binary) if you chose to save your source code in UTF-8 (which might be a good idea in general). So be careful with this, and verify what your editor/IDE is doing!
My suggestion thus is to write it this way, which is equivalent for ISO-8859-1 and UTF-8:
preg_match('/(\xa3|\xc2\xa3)(\d+)/', ...) // match "£" and "Â£"
Also I suggest to make use of the sub-pattern capture feature of regular expressions, so you don't have to str_replace() afterwards, this way:
if (preg_match('/(?:\xa3|\xc2\xa3)([0-9]+(?:\.[0-9]{2})?)/', $data, $regp)) {
$main[] = $regp[1];
}
The "?:" right after the "(" means "this is a sub-pattern, but don't capture it".
Note that you can also replace preg_match with preg_match_all and you will find in $regp[1] the array of all matching numbers already prepared.
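As a rough sketch of that preg_match_all variant (the sample input is made up, and the byte escapes assume the two encodings discussed above):

```php
<?php
// Made-up sample: "\xa3" is a Latin-1 pound sign, "\xc2\xa3" is the
// UTF-8 encoding of the same character. "C: 3" has no symbol at all.
$data = "A: \xa310, B: \xc2\xa37.50, C: 3";

// No "u" modifier here: the subject is matched as raw bytes, which is
// what lets both encodings of the pound sign be accepted.
preg_match_all('/(?:\xa3|\xc2\xa3)([0-9]+(?:\.[0-9]{2})?)/', $data, $regp);

// $regp[1] holds only the captured numbers, without the currency symbol.
print_r($regp[1]);
```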

Try with this modified regex:
(?:£|Â£)([0-9]+(\.[0-9]{2})?)
It should do the trick. But it will also return decimal values, because of the:
(\.[0-9]{2})?
You can remove it, and it will return only the integer part after £|Â£

PHP regex to avoid inserting digits in field while working with Unicode

I have a form in my application that will be filled in using Unicode (non-English) characters. Since the name field should contain only letters, I have to treat digits as an error. However, I don't know how to write a regular expression that works with Unicode.
Please help me.

preg_match ('/\d/u', $string) will do.
The u modifier makes it safe to use on unicode strings
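A minimal sketch of such a check follows. It uses \p{Nd} (decimal digits in any script) instead of \d to make the intent explicit; the function name is hypothetical and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper; assumes $name is valid UTF-8. If it is not,
// preg_match returns false and this function reports "no digit".
function containsDigit(string $name): bool
{
    // \p{Nd} matches a decimal digit from any script, not only 0-9.
    return preg_match('/\p{Nd}/u', $name) === 1;
}

var_dump(containsDigit('Škofja Loka')); // letters and a space only
var_dump(containsDigit('علی۱۲'));       // ends with two Persian digits
```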

I got this from a previous answer, but I forget which one. It's now in my snippet library:
$post = '9999, škofja loka';
echo preg_match('/^\\d{4},[\\s\\p{L}]+$/u', $post);

regexunicode - Accented characters are removed when using preg_match_all

I have the problem described in the title.
If I use
preg_match_all('/\pL+/u', $_POST['word'], $new_word);
and I type hello à and ì the new_word returned is *hello and *
Why?
Someone advised me to specify all the characters I want to convert, in this way:
preg_match_all('/\pL+/u', $_POST['word'], 'aäeëioöuáéíóú');
But I want my application to work with all existing accents (for a multilingual website).
Can you help me?
Thanks.
EDIT: To be specific, I use this regex to strip punctuation. It strips punctuation well, but Unicode characters are returned wrongly; in fact, they are not returned at all.
EDIT 2: I am sorry, I explained this very badly.
The problem is not in preg_match_all but in
str_word_count($my_key, 2, 'aäáàeëéèiíìoöóòuúù');
I had to specify the accented characters manually, but I think there are many others. Right?

\pL matches any Unicode letter (not spaces, digits, or punctuation), so \pL+ picks out each run of letters. Make sure that $_POST['word'] is a UTF-8 encoded string. If it is not, convert it before matching (utf8_encode() handles ISO-8859-1 input) or check the encoding of your HTML form. In my tests, your example works like a charm.
You may use this together with count() to get the number of words. Then you need not care about the possible characters. \pL will do this for you. This should do the trick:
$string = "áll thât words wíth ìntérnâtiønal çhårs";
preg_match_all('/\pL+/u', $string, $words);
echo count($words[0]); // returns: 6
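If the position-keyed result of str_word_count($my_key, 2, ...) is what is needed, something along these lines might serve as a Unicode-aware substitute. This is only a sketch: the function name is hypothetical, the input is assumed to be valid UTF-8, and the keys are byte offsets.

```php
<?php
// Hypothetical helper; assumes $text is valid UTF-8.
function unicodeWordPositions(string $text): array
{
    $words = [];
    // PREG_OFFSET_CAPTURE returns [matched text, byte offset] pairs.
    if (preg_match_all('/\pL+/u', $text, $m, PREG_OFFSET_CAPTURE)) {
        foreach ($m[0] as [$word, $byteOffset]) {
            // Offsets count bytes, so accented letters advance them by more than one.
            $words[$byteOffset] = $word;
        }
    }
    return $words;
}

print_r(unicodeWordPositions('hello à and ì'));
```

No list of accented characters has to be maintained, since \pL covers letters from every script.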

Try using mb_ereg_match() (instead of preg_match()) from Multibyte String PHP library. It is specially made for working with multibyte strings.

PHP Json_Encode strange characters?

I am using JSON_ENCODE in PHP to output data.
When it gets to this word: Æther it outputs \u00c6ther.
Anyone know of a way to make json output that character or am I going to have to change the text to not have that character in it?

That's the unicode version of the character. JavaScript should handle it properly. You'll notice the slash before it which means that it's an escape sequence. The u indicates it's a unicode code point and the hex digits represent the actual character.

That is working as specified. The RFC ( http://www.ietf.org/rfc/rfc4627.txt ) indicates that any character may be escaped, and your average printable character can be written in the \uXXXX format.
Any JSON parser that cannot understand a character escaped in that way is not compliant with the standard. Work on resolving that problem rather than trying to coax PHP into misbehaving as well.
(It is legal to put UTF-8 characters into JSON strings without escaping them as well, with a few exceptions, but the safe approach of escaping anything questionable is wise.)
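For completeness, a short sketch of both forms. JSON_UNESCAPED_UNICODE is a json_encode flag available since PHP 5.4; whether to use it depends on what consumes the output, so treat this as an illustration rather than a recommendation.

```php
<?php
$word = 'Æther';

// Default behaviour: non-ASCII characters are written as \uXXXX escapes.
echo json_encode($word), "\n";

// With the flag, the character is emitted as literal UTF-8 instead.
echo json_encode($word, JSON_UNESCAPED_UNICODE), "\n";

// Either form decodes back to the same string.
var_dump(json_decode('"\u00c6ther"') === $word);
```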

preg_replace - NULL result?

Here's a small example (download, rename to .php and execute it in your shell):
test.txt
Why does preg_replace return NULL instead of the original string?
\x{2192} is the same as HTML "&rarr;" ("→").

I had a NULL response when my regular expression included the u (UTF-8) PCRE modifier. If your source text is not valid UTF-8 and you use this modifier, you'll get a NULL result.
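One way to confirm that this is the cause is to check preg_last_error() after the call. A minimal sketch, with a made-up subject that contains ISO-8859-1 bytes:

```php
<?php
// Made-up subject: "\xE9" and "\xE0" are "é" and "à" in ISO-8859-1,
// but on their own they are not valid UTF-8 sequences.
$subject = "caf\xE9 \xE0 la carte";

$result = preg_replace('/\x{2192}/u', '->', $subject);

if ($result === null && preg_last_error() === PREG_BAD_UTF8_ERROR) {
    // The subject, not the pattern, is the problem: it must be converted
    // to UTF-8 (or the u modifier dropped) before matching.
    echo "Subject is not valid UTF-8\n";
}
```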

From the documentation on preg_replace():
Return Values
preg_replace() returns an array if the
subject parameter is an array, or a
string otherwise.
If matches are found, the new subject
will be returned, otherwise subject
will be returned unchanged or NULL if
an error occurred.
In your pattern, I don't think the u flag is supported. WRONG
Edit: It seems like some kind of encoding issue with the subject. When I erase '147 3.2 V6 - GTA (184 kW)' and manually re-type it everything seems to work.
Edit 2: In the pattern you provided, there are 3 spaces that seem to be giving issues to the regex engine. When I convert them to decimal their value is 160 (as opposed to normal space 32). When I replace those spaces with normal ones it seems to work.
I've replaced the offending spaces with underscores below:
'147 3.2 V6 - GTA (184 kW)'
'147 3.2_V6 - GTA_(184_kW)'
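If the odd spaces are indeed single 0xA0 bytes (a non-breaking space in ISO-8859-1, which is an assumption about the file's encoding), they could be normalised before matching. A sketch:

```php
<?php
// Assumption: the subject is ISO-8859-1, so each non-breaking space is
// the single byte 0xA0. In UTF-8 it would be the two bytes "\xC2\xA0".
$subject = "147 3.2\xA0V6 - GTA\xA0(184\xA0kW)";

// Replace every 0xA0 byte with an ordinary space (0x20).
$cleaned = str_replace("\xA0", ' ', $subject);

// The cleaned subject is plain ASCII, so the u modifier no longer fails.
var_dump(preg_replace('/\x{2192}/u', '->', $cleaned));
```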

You are using single quotes, which means the only thing that you can escape is other single quotes. To enable escape sequences (e.g. \x32), use double quotes ("").
I am not a UTF8 expert, but the escape code \x2192 is not correct either. You can do: \x21\x92 to get both bytes into your string, but you may want to look at utf8_encode and utf8_decode
Your source string has invalid characters in it, or something. PHP gives:
Warning: preg_replace(): Compilation failed: invalid UTF-8 string at offset 0 in test.php on line 7

I believe there is also a fault in your Regex expression: ~\x{2192}~u
Try replacing what I have and see if that works out for you: /\x{2192}/u
