Regular Expression for Non-Alphanumeric, Non-Symbol Characters - php

I am writing a simple form pre-validator. I am looking for a PHP regular expression to match any irregular symbols, that is, non-Latin and non-numeric characters, and characters which are not included in normal English punctuation (basically, match characters not in the second through fourth columns of a standard four-column ASCII table). Any regexp wizards out there who could help me out?

The second through fourth columns can be translated into a simple regexp:
/[^ -~]/
It matches any character that is not between space (0x20) and tilde (0x7E).
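A minimal sketch of the pre-validator built on that pattern. The function name irregularChars and the choice to return null for invalid UTF-8 are assumptions for illustration; the u modifier is also an addition to the answer's pattern, so that multibyte characters are reported whole.

```php
<?php
// Sketch: list the characters that fall outside printable ASCII,
// i.e. outside the range space (0x20) .. tilde (0x7E) matched by /[^ -~]/.
// The u modifier (an addition to the answer's pattern) reads the input as
// UTF-8, so a character such as 'é' is reported whole, not as two bytes.
function irregularChars(string $input): ?array
{
    if (preg_match_all('/[^ -~]/u', $input, $matches) === false) {
        return null; // input is not valid UTF-8
    }
    // Report each offending character once, in order of first appearance.
    return array_values(array_unique($matches[0]));
}

print_r(irregularChars('Price: 5€, naïve café'));
```

A form pre-validator could then reject any field for which irregularChars() does not return an empty array.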

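Another answer gives the long and short of it as this PCRE: [^\x00-\x7F]
It matches any character outside the 7-bit ASCII range. Unlike [^ -~], it does not flag ASCII control characters such as tab or newline.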
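To see how the two patterns differ in practice, this sketch runs both over a few hypothetical sample strings. The helper names outsidePrintable and outsideAscii are illustrative only.

```php
<?php
// Compare the two answers' patterns. Neither uses the u modifier, so a
// multibyte UTF-8 character is tested byte by byte; for a yes/no check
// that is enough, because every such byte is above 0x7F.
function outsidePrintable(string $s): bool
{
    return preg_match('/[^ -~]/', $s) === 1;       // flags tab, newline, é, DEL
}

function outsideAscii(string $s): bool
{
    return preg_match('/[^\x00-\x7F]/', $s) === 1; // flags é, but not tab
}

foreach (['Hello, world!', "tab\there", 'café'] as $s) {
    printf("%-16s [^ -~]: %d  [^\\x00-\\x7F]: %d\n",
        json_encode($s), outsidePrintable($s), outsideAscii($s));
}
```

Which one fits depends on whether control characters in a form field should count as "irregular".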

Related

php: strip everything except alphanumeric unicode and two characters

I am trying to strip all punctuation from a text, but since the text is in Spanish I can't use [A-Za-z0-9].
I have found this regex:
trim(preg_replace('#[^\p{L}\p{N}]+#u', ' ', $str));
which seems to do the job, but I would like to keep two special characters # and #, how can I achieve that?
Extra question: How can I delete all strings that are just numbers? e.g. 123 would be deleted but not as5623.
Thanks in advance!
You can simply add those characters to your negated class to retain them. And be sure to change your pattern delimiters to something other than # as well.
~[^\p{L}\p{N}##]+~u
To remove all strings that are numbers, you can place word boundaries \b around your pattern.
\b\d+\b
Note: A word boundary does not consume any characters. It asserts that on one side there is a word character, and on the other side there is not.
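Putting both parts of the answer together, a sketch assuming UTF-8 text and assuming the two characters to keep are # and @ (the @ is a hypothetical stand-in; substitute your own). The function name cleanText is also illustrative.

```php
<?php
// Sketch: keep Unicode letters (\p{L}), numbers (\p{N}) and two extra
// characters. Assumption: '#' and '@' are the two characters to keep.
// '~' is used as the delimiter so '#' can appear inside the class.
function cleanText(string $str): string
{
    // 1. Replace each run of any other characters with a single space.
    $str = preg_replace('~[^\p{L}\p{N}#@]+~u', ' ', $str);
    // 2. Delete tokens made only of digits ("123", but not "as5623").
    $str = preg_replace('~\b\d+\b~u', '', $str);
    // 3. Collapse the leftover whitespace and trim both ends.
    return trim(preg_replace('~\s+~u', ' ', $str));
}

echo cleanText('¡Hola! Teléfono 123, código as5623 #etiqueta @usuario.'), PHP_EOL;
```

Because # is a non-word character, there is a word boundary between it and a following digit, so an input like "#1" loses its digit and becomes "#". Check whether that is what you want.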
You can use POSIX character classes too.
/[^[:alnum:]##]+/
For the two special characters, you just have to add them inside the character class.
To delete all words that contain only digits, the following regex would work:
/\b[[:digit:]]+\b/

Secure regular expression that restricts specific special characters

I tried to create a regular expression with the specification below:
any alphabetic character (at least one)
any numeric character (at least one)
no spaces
accept all special characters (except ",;&|')
^(?=.*[0-9])(?=.*[a-z])(?!.*\s)((?!.*[",;&|'])|(?=(.*\W){1,}))(?!.*[",;&|'])$
This is the one I tried.
What can I do with this?
The question is still vague; please provide some examples of accepted strings.
Just to get you started, you can use a character class in a negative lookahead.
Don't forget the start and end anchors:
Regex:
/^(?=.*?\d)(?=.*?[a-z])(?!.*?[ ",;&|']).+$/i
This regex matches one or more characters, requires at least one digit and at least one letter a-z (case-insensitively, because of the i flag), and rejects any string that contains a space or one of ",;&|'.
Live Demo: http://www.rubular.com/r/nxdi79ZcRx
In PHP, use it like this (the single quote is escaped inside the PHP string):
'/^(?=.*?\d)(?=.*?[a-z])(?!.*?[ ",;&|\']).+$/i'
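A sketch of how the pattern might be wrapped in a PHP function; isAccepted and the sample strings are hypothetical. The one change to the answer's pattern is the D modifier: without it, $ also matches just before a final newline, so "abc123\n" would pass.

```php
<?php
// Sketch: the answer's pattern, plus the D modifier (our addition) so that
// '$' only matches at the very end and "abc123\n" is not accepted.
// Inside the single-quoted PHP string, \' is the escaped quote character.
const RULE = '/^(?=.*?\d)(?=.*?[a-z])(?!.*?[ ",;&|\']).+$/iD';

function isAccepted(string $input): bool
{
    // preg_match() returns 1 for a match, 0 for none, false on error.
    return preg_match(RULE, $input) === 1;
}

foreach (['abc123', 'P@ss#99', 'abc 123', 'abc;123', 'abcdef'] as $s) {
    echo str_pad($s, 10), isAccepted($s) ? 'accepted' : 'rejected', PHP_EOL;
}
```

The class rejects only the literal space; if tabs and other whitespace should be refused too, \s could be added inside the negative lookahead's character class.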

Regex to find string containing special characters in text

I'm trying to formulate a regular expression that will allow me to find a string within a piece of text if the string exists on its own, i.e. not within another word (but surrounded by special characters is OK).
/\bword\b/i
The above regex works fine and finds "word" in the text. The problem comes when the word I want to find is something like "c++". In this case it matches any occurrence of the "c" character on its own. I've tried escaping the "+" characters, but it doesn't make any difference. I'm assuming that because "+" is a non-word character, I'm possibly going down the wrong route and using word boundaries is not what I should be doing.
So I guess the question is: how can I use a regular expression to find a string in a piece of text, on its own, regardless of whether the string is alphanumeric or contains special characters? In the following piece of text it should match the 3 occurrences of "c++":
c++
(c++)
perl/c++/assembly
But it should not match on the following:
maniac++
c++abc
This is intended so that my script can tell if a specific skill exists within a user's CV/resume. I'm using this with PHP's preg_match_all() function.
I've done a lot of searching but can't come up with a solution, hopefully someone with good regex knowledge can help.
Try this:
/(?<!\w)(c\+\+)(?!\w)/
The (?<!\w) is a negative lookbehind clause, meaning that a word character should not immediately precede your pattern. The (?!\w) part is negative lookahead, meaning that a word character should not immediately follow.
Hope this helps!
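To reuse these lookarounds for any skill name, a sketch could escape the name with preg_quote() so that characters such as + and # are taken literally. findSkill is a hypothetical helper, and the i flag (case-insensitive matching) is an assumption about what a CV search wants.

```php
<?php
// Sketch: count stand-alone occurrences of a skill name in a CV.
// (?<!\w) / (?!\w) replace \b, which fails when the name starts or ends
// with a non-word character such as '+'. preg_quote() escapes regex
// metacharacters in the name, including the '/' delimiter.
function findSkill(string $skill, string $text): int
{
    $pattern = '/(?<!\w)' . preg_quote($skill, '/') . '(?!\w)/i';
    return (int) preg_match_all($pattern, $text);
}

$cv = "c++\n(c++)\nperl/c++/assembly\nmaniac++\nc++abc";
echo findSkill('c++', $cv), PHP_EOL;
```

On the sample text from the question this counts the three expected occurrences and skips "maniac++" and "c++abc".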

Regex for word characters in any language

Testing the PHP regex engine, I see that it considers only [0-9A-Za-z_] to be word characters. Letters of non-ASCII languages, such as Hebrew, are not matched as word characters with [\w]. Are there any PHP or Perl regex escape sequences which will match a letter in any language? I could add ranges for each alphabet that I expect to be used, but users will always surprise us with unexpected languages!
Note that this is not for security filtering but rather for tokenizing a text.
Try [\pL_] (with the u modifier for UTF-8 text); see the reference at
http://php.net/manual/en/regexp.reference.unicode.php
Try \p{L} if you don't want to use a character class []. It matches any kind of letter from any language.
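A sketch of tokenizing with \p{L}, assuming UTF-8 text. \p{N} and the underscore are added to mirror what \w covers for ASCII; the function name tokenize and the sample string are illustrative.

```php
<?php
// Sketch: split text into word tokens in any script.
// \p{L} = any letter, \p{N} = any number, plus '_' as in \w.
// The u modifier makes PCRE read the subject as UTF-8; without it each
// byte would be tested on its own and multibyte letters would break apart.
function tokenize(string $text): array
{
    preg_match_all('/[\p{L}\p{N}_]+/u', $text, $matches);
    return $matches[0];
}

print_r(tokenize('שלום world, привет_мир 42!'));
```

If the text contains combining marks, such as Hebrew vowel points, adding \p{M} to the class keeps them inside the token.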

Match Arabic/English Alphanumeric using Regex

I would like to have a regular expression that matches:
Arabic letters.
English alphanumeric.
3 Spaces maximum.
4 Underscores maximum.
Any order.
I tried various solutions but couldn't solve it.
Here is what I have now:
preg_match('#^([^\W_]*\s){0,3}[^\W_]*$#', $username)
The above expression allows:
3 spaces maximum
English alphanumerics
No underscore allowed
You can check whether your regex flavour supports \p{Arabic} or \p{InArabic}. PHP's preg_* functions use PCRE, which supports Unicode script names such as \p{Arabic} (use the u modifier for UTF-8 text).
Also experiment with the mb_ereg_match() function: http://si2.php.net/manual/en/function.mb-ereg-match.php
If that doesn't work, there is no other option than explicitly writing all Arabic characters into the expression. Messy, but it does the job.
Since you are using PHP, for the code's manageability you can first list all Arabic characters in a string variable and then add that variable to the regex.
I don't know about Arabic characters, but the following regexp should match the others:
([a-zA-Z0-9]{1,})\s{0,3}_{0,4}
This will match
(Alphanumeric)(0-3 spaces)(0-4 underscores)
If there are more than 4 underscores, the extra ones are left out of the match.
If there are more than 3 spaces, the part after the 3 spaces is not matched.
EDIT:
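For Arabic letters: first declare a string containing all Arabic letters,
so you'll have
$arabic='all_arabic_letters';
Then your regexp string will be (with / delimiters, which preg_match requires, and the u modifier so the UTF-8 letters are read as whole characters):
$regex='/[' . $arabic . ']{1,}([a-zA-Z0-9]{1,})\s{0,3}_{0,4}/u';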
And match it as follows:
preg_match($regex, $username);
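A sketch that covers all the listed requirements in any order, using the \p{Arabic} script property from the first answer instead of a hand-written list. isValidUsername is a hypothetical name, and "space" is assumed to mean the ASCII space only. Note that \p{Arabic} matches characters of the Arabic script, which is a broader set than letters alone, so check it against your data.

```php
<?php
// Sketch: Arabic-script characters, English letters and digits, spaces and
// underscores in any order, with at most 3 spaces and 4 underscores.
// Assumes UTF-8 input; 'space' means the ASCII space character only.
function isValidUsername(string $username): bool
{
    // Allowed characters only. u = UTF-8, D = '$' must be the real end.
    if (preg_match('/^[\p{Arabic}a-zA-Z0-9 _]+$/uD', $username) !== 1) {
        return false;
    }
    // The limits read more clearly as counts than as extra lookaheads.
    return substr_count($username, ' ') <= 3
        && substr_count($username, '_') <= 4;
}

var_dump(isValidUsername('محمد_ali 99'));
```

Checking the allowed characters with one regex and the limits with substr_count() keeps "any order" trivial, since neither check depends on position.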
