PHP preg_match: any letter but no numbers (and symbols) - php

I am trying to set a validation rule for a field in my form that checks that the input only contains letters.
At first I tried to make a function that returned true if there were no numbers in the string, for that I used preg_match:
function my_format($str)
{
return preg_match('/^([^0-9])$', $str);
}
No matter how many times I look at the PHP manual, I can't work out how to create the pattern I want. What's wrong with what I wrote?
But I'd like to extend the question: I want the input text to contain any letter but no numbers or symbols, such as question marks, exclamation marks, and anything else you can imagine. BUT the letters I want are not only a-z: I want letters with all kinds of accents, such as those used in Spanish, Portuguese, Swedish, Polish, Serbian, Icelandic...
I guess this is no easy task, and hard or impossible to do with preg_match. Is there any library that covers my exact needs?

If you're using UTF-8 encoded input, go for a Unicode regex, using the u modifier.
This one would match a string that only consists of letters and any kind of whitespace/invisible separators:
preg_match('~^[\p{L}\p{Z}]+$~u', $str);
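A minimal sketch of how that pattern could be wrapped for form validation. The function name is_letters_only is hypothetical, and the code assumes both the source file and the input are valid UTF-8:

```php
<?php
// Hypothetical helper (not from the original answer): true when $str
// consists only of Unicode letters and separators.
function is_letters_only(string $str): bool
{
    // \p{L} = any letter, \p{Z} = any separator (spaces and the like).
    // The u modifier makes PCRE read pattern and subject as UTF-8.
    return preg_match('~^[\p{L}\p{Z}]+$~u', $str) === 1;
}

var_dump(is_letters_only('José Ångström'));  // bool(true)
var_dump(is_letters_only('abc123'));         // bool(false)
var_dump(is_letters_only('really?'));        // bool(false)
```

Note that \p{L} does not cover combining marks (\p{M}), so input that stores an accent as a separate code point would be rejected unless it is normalized first or \p{M} is added to the class.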

function my_format($str)
{
return preg_match('/^\p{L}+$/u', $str);
}
Simpler than you think!
\p{L} matches any kind of letter from any language. The u modifier is needed so that UTF-8 input is read as characters rather than as single bytes.

First of all, Merry Christmas.
You are on the right track with the first one; it is just missing the closing / delimiter and a + to match one or more non-digit characters:
preg_match('/^([^0-9]+)$/', $str);
As you can see, 0-9 is a range, from 0 to 9. The same applies to other cases, such as a-z or A-Z: inside a character class the '-' is special and indicates a range. For 0-9 you can use the shorthand \d:
preg_match('/^([^\d]+)$/', $str);
For symbols, if your list is the punctuation characters . , " ' ? ! ; : # $ % & ( ) * + - / < > = @ [ ] \ ^ _ { } | ~, there is a shorthand.
preg_match('/^([^[:punct:]]+)$/', $str);
Combined you get:
preg_match('/^([^[:punct:]\d]+)$/', $str);
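A short sketch of what the combined pattern accepts and rejects. It is a deny-list, so anything that is neither ASCII punctuation nor a digit still passes, whitespace included; the test strings are made up for illustration:

```php
<?php
// Deny-list pattern from the answer: no ASCII punctuation, no digits.
$pattern = '/^([^[:punct:]\d]+)$/';

var_dump(preg_match($pattern, 'abc'));      // int(1)
var_dump(preg_match($pattern, 'abc def'));  // int(1): a space is neither punctuation nor a digit
var_dump(preg_match($pattern, 'abc1'));     // int(0)
var_dump(preg_match($pattern, 'abc!'));     // int(0)
```

If only letters should be accepted, an allow-list such as \p{L} with the u modifier is usually the tighter choice.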

Use the [:alpha:] POSIX character class.
function my_format($str) {
return preg_match('/^[[:alpha:]]+$/u', $str);
}
The outer [] is the character class that contains the POSIX class; the + makes it match one or more alphabetic characters, and the ^ and $ anchors make sure the whole string is checked. The [:alpha:] class matches accented characters as well.
If you want to include whitespace, just add \s to the class:
preg_match('/^[[:alpha:]\s]+$/u', $str);
EDIT: Sorry, I misread your question when I looked over it a second time and thought you wanted punctuation. I've taken it back out.
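A small sketch of why the ^ and $ anchors matter when a pattern is used for validation. The variable names $loose and $strict are only for illustration:

```php
<?php
$loose  = '/[[:alpha:]]+/u';    // succeeds if a letter occurs anywhere
$strict = '/^[[:alpha:]]+$/u';  // every character must be a letter

var_dump(preg_match($loose, 'abc123'));   // int(1)
var_dump(preg_match($strict, 'abc123'));  // int(0)
var_dump(preg_match($strict, 'abc'));     // int(1)
```

Without anchors, preg_match reports success as soon as one run of letters is found, so a string with digits in it still passes.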

Related

PHP: Find all words starting with ":"

Could you help me with PHP function/regex that in given text finds all words starting with character ":" ?
..in other words all substrings that start with ":" and are separated with " " (a space)
Since :word should probably be valid, and I guess :word:another should be considered two words, then you cannot say that there is always a space.
Words in natural languages can be followed by dots and other characters.
In digital input, they can be followed by end of line.
I suggest using this regexp:
~:\w+~
It takes any : character followed by at least one word character and ends at the first character that is not a word character.
Example: on RegExr.com
You can also try ~:\w+\b~, where \b is a word boundary (here, the end of the word), but I don't think it is necessary here.
Note: \w stands for [a-zA-Z0-9_] meaning it catches underscores _ and digits 0-9 as well. It works pretty much like variable/function naming in PHP
EDIT (some notes on usage):
You said that you want to extract all words prefixed with : (for example :word) from a given text, which I take to mean input containing arbitrary content. The easiest way to do that is the preg_match_all() function with the PREG_PATTERN_ORDER flag.
Example:
$regex = '~(:\w+)~';
if (preg_match_all($regex, $input, $matches, PREG_PATTERN_ORDER)) {
foreach ($matches[1] as $word) {
echo $word .'<br/>';
}
}
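The same idea can be sketched as a function that returns the matches instead of echoing them. The name find_colon_words and the sample text are hypothetical:

```php
<?php
// Hypothetical helper: returns every ":word" token found in $text.
function find_colon_words(string $text): array
{
    // With the default PREG_PATTERN_ORDER flag, $matches[0] holds
    // every full match of the pattern.
    preg_match_all('~:\w+~', $text, $matches);
    return $matches[0];
}

echo implode(' ', find_colon_words('Send :name to :city_2, then :done.')), "\n";
// :name :city_2 :done
```

Because the pattern does not depend on a trailing space, :word:another yields two separate matches.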
regex: /:\w+\s/
\w matches any word character
\s matches any whitespace character
This would work (PHP has no g modifier; preg_match_all() finds every match instead):
preg_match_all('/:\w+\s/', $var, $matches);
Sorry, I don't use PHP, but I suppose your problem is that PHP reserves the ":" character for some reason in its regex implementation?
Well, in that case, you can still catch any word beginning with ":" and ending with whitespace this way:
(...)
match('^\x3A\S*\s');
("3A" is the hexadecimal value of 58, which is the ASCII code for ":")
This should work, I think...

Capitalize city names when they start with apostrophe

I have an issue with right capitalization of (dutch) city names when they start with an apostrophe. For instance I could have the names:
'S-HERTOGENBOSCH or 's gravendeel or 'T Harde
What I would like to do is to bring all to lowercase and then capitalize the following letter after the prefix 'S or 's or 'T. So the outcome should be:
's-Hertogenbosch and 's Gravendeel and 't Harde
I'm thinking about using a Regex to do this but am not quite sure yet how this should be done. Could someone point me in the right direction?
Thanks!
You could use preg_replace_callback.
$city = strtolower("'S-HERTOGENBOSCH");
echo preg_replace_callback("/('(s|t)( |\-))([a-z])/", function($matches) {
return $matches[1] . ucfirst($matches[4]);
}, $city);
The pattern uses multiple subpatterns, whose results are reassembled in the callback function:
('(s|t)( |\-)) # Apostrophe, then 's' or 't', then '<space>' or '-'
([a-z]) # The following lowercased character
Note that I've wrapped the first part into an additional subpattern. This makes reassembling it simpler.
Try the following function, which is based on sanchises' regex (I edited it slightly):
function dutch_city_name($name) {
$name = strtolower(trim($name));
$matches = array();
preg_match("/'([a-z])( |-)[a-z]*/", $name, $matches);
if(count($matches) == 0) {
return $name;
}
return "'".$matches[1].$matches[2].ucfirst(substr($name, 3, strlen($name) - 3));
}
I tried it and it is working.
First, I would like to recommend websites like regex101.com or an equivalent. Then, let me talk you through a very basic regex:
-You want the literal "'" character followed by exactly one other character, which you would like to match in order to 'uncapitalize' it,
-and then a whole word
Basically, you need to match something of the form '([a-zA-Z])(?: |\-)[a-zA-Z]*. From left to right:
' Literal '
([a-zA-Z]) Single character in the alphabet, lower- or uppercase. Is a capturing group.
(?: |\-) Either a space or a dash. Is not a capturing group.
[a-zA-Z]* A series of characters in the alphabet. Could be ([a-zA-Z]*) if you want to capture this bit too.
Now that you have your matching, all you need to do is replace it with the uncapitalized version, for example using a PHP function.
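One way this could look in PHP, as a sketch rather than a definitive implementation. The function name normalize_dutch_city is hypothetical, and the code assumes plain ASCII city names and lowercases the input before matching:

```php
<?php
// Hypothetical helper built on the pattern described above.
function normalize_dutch_city(string $name): string
{
    $name = strtolower(trim($name));
    $result = preg_replace_callback(
        // Group 1: apostrophe, one letter, then a space or dash.
        // Group 2: the first letter of the actual name.
        "/^('[a-z](?: |-))([a-z])/",
        function (array $m): string {
            return $m[1] . strtoupper($m[2]);
        },
        $name,
        1,
        $count
    );
    // Names without the apostrophe prefix just get a capital first letter.
    return $count > 0 ? $result : ucfirst($name);
}

echo normalize_dutch_city("'S-HERTOGENBOSCH"), "\n"; // 's-Hertogenbosch
echo normalize_dutch_city("'s gravendeel"), "\n";    // 's Gravendeel
echo normalize_dutch_city("'T Harde"), "\n";         // 't Harde
```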
Here's one without regex. It simply checks if the first character is an apostrophe and if so, skips the character after the apostrophe when searching for the first letter to capitalize.
function capitalizeCityName($name) {
$name = strtolower(trim($name));
$i = ($name !== '' && $name[0] === "'") ? 2 : 0; // start at 0 when there is no apostrophe prefix
for(; $i<strlen($name); $i++) {
if(ctype_alpha($name[$i])) {
$name[$i] = strtoupper($name[$i]);
break;
}
}
return $name;
}
print capitalizeCityName("'T Harde"); //'t Harde
print capitalizeCityName("Harde"); //Harde
I don't know whether the PHP replace function you want to use supports changing the case of letters in a dynamic replacement string, but the following worked with the Perl regular expression engine in the text editor UltraEdit v21.10.
Search string:
'([STst])(\W)(\w)([\w\-]+)
Replace string:
'\L\1\E\2\U\3\E\L\4\E
or
'\l\1\2\u\3\L\4\E
The search string matches:
a straight apostrophe,
followed by character s or t in any case marked for backreferencing as string 1,
a single non word character marked for backreferencing as string 2,
a single word character marked for backreferencing as string 3,
1 or more additional word characters or hyphens marked for backreferencing as string 4.
The replace string:
keeps the apostrophe,
first marked string (character s or t in any case) converted to lower case,
second marked string unmodified,
third marked string (first word character of city name) converted to upper case,
fourth marked string converted to lower case.
Explanation of the special characters in replace string:
\l ... convert only next character to lower case.
\u ... convert only next character to upper case.
\L ... convert all characters up to \E to lower case.
\U ... convert all characters up to \E to upper case.
Note: The case conversion works only for the ASCII letters A-Za-z and not for language specific, localized letters like German umlauts, characters with an accent, etc.
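PHP's preg_replace replacement strings have no \L, \U, \l or \u escapes, so a PHP version of this search and replace would need a callback. A rough sketch, with the hypothetical name fix_city_prefix and the same four groups as the search string above:

```php
<?php
// Hypothetical helper: the four groups mirror the search string above.
function fix_city_prefix(string $text): string
{
    return preg_replace_callback(
        "/'([STst])(\W)(\w)([\w\-]+)/",
        function (array $m): string {
            // 1: s or t -> lower case, 2: separator unchanged,
            // 3: first letter -> upper case, 4: rest -> lower case.
            return "'" . strtolower($m[1]) . $m[2]
                 . strtoupper($m[3]) . strtolower($m[4]);
        },
        $text
    );
}

echo fix_city_prefix("'S-HERTOGENBOSCH"), "\n"; // 's-Hertogenbosch
echo fix_city_prefix("'T Harde"), "\n";         // 't Harde
```

The same ASCII-only limitation applies here, since strtolower and strtoupper are used for the case conversion.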

Preg_replace with PHP and MySQL, allowing dashes and brackets

I have a website where people can add content and when they type in titles, all characters are filtered when parsing to MySQL with PHP, only allowing members to write text and numbers. But I want to allow dashes (-) and brackets/parenthesis (()). Currently, I have:
$video_title = preg_replace('#[^A-za-z0-9 ?!.,]#i', '', $_POST['video_title']);
What shall I add or remove to the preg_replace function to allow these characters?
Just add ( ) and - to the character class; inside a class the parentheses need no escaping, and the dash is literal when it comes last:
[^a-z0-9 ?!.,()-]
You only need a-z once if the pattern is case-insensitive.
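Put together, the filter could look like the sketch below. The function name clean_title and the sample title are made up for illustration:

```php
<?php
// Hypothetical helper: strips every character that is not explicitly allowed.
function clean_title(string $title): string
{
    // The dash comes last in the class so it is taken literally;
    // the i modifier covers upper-case letters.
    return preg_replace('#[^a-z0-9 ?!.,()-]#i', '', $title);
}

echo clean_title('Intro (Part 2) - "HD" #1!'), "\n";
// Intro (Part 2) - HD 1!
```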
This is not really an answer, but it didn't fit well in the comment box.
Note that A-z may not do what you expect in a regexp character class: it matches all characters whose ASCII code lies between those of A and z, which includes all upper- and lowercase letters, but also a bunch of punctuation characters:
echo join("", preg_grep("/[A-z]/", array_map("chr", range(0, 255)))) . "\n";
Outputs:
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz

PHP regular expression to match alpha-numeric strings with some (but not all) punctuation

I've written a regular expression in PHP to allow strings that are alpha-numeric with any punctuation except & or @. Essentially, I need to allow anything on a standard American keyboard with the exception of those two characters. It took me a while to come up with the following regex, which seems to be doing what I need:
if (ereg("[^]A-Za-z0-9\[!\"#$%'()*+,./:;<=>?^_`{|}~\-]", $test_string)) {
// error message goes here
}
Which brings me to my question... is there a better, simpler, or more efficient way?
Have a look at character ranges:
#[!-%'-?A-~]+#
This will exclude the characters & (0x26) and @ (0x40).
Looking at an ASCII table, you can see how this works:
The exclamation mark is the first character in the ASCII set that is not whitespace or a control character. The first range matches everything from it up to and including the % character, which immediately precedes the ampersand. The next range runs from ' up to ?, stopping just before the @ character, which lies between ? and A. After that, we match everything up to the end of the printable ASCII character set, which is ~.
Update
To make things more readable, you might also consider doing this in two steps:
First, filter out anything outside the printable ASCII range.
#[!-~]+#
In a second step, filter out your undesired characters, or simply do a strpos on those characters.
At the end, you can compare the result with what you started with to see whether it contained any undesired characters.
Alternatively, you could use a regex such as this for the second step.
/[^@&]+/
The steps are interchangeable, and doing a strpos on @ or & as the first step, to identify bad characters, may perform better.
What about this:
[^&@]
with preg_match
$str = 'a';
var_dump(preg_match('~^[^&@]+$~', $str)); // int(1)
$str = '&';
var_dump(preg_match('~^[^&@]+$~', $str)); // int(0)
$str = '!';
var_dump(preg_match('~^[^&@]+$~', $str)); // int(1)
I think rather than testing for all the alphanumeric characters, you can simply check for @ and & and negate the result?
$reg = '/@|&/';
if(!preg_match($reg, "YOUR STRING CAN GO HERE")){
// your code goes here
}
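The two-step idea from the answers above can be sketched as follows. The function name is_allowed is hypothetical, and the characters to reject are passed in by the caller rather than hard-coded:

```php
<?php
// Hypothetical helper: true when $s is printable, non-space ASCII and
// contains none of the characters listed in $forbidden.
function is_allowed(string $s, string $forbidden): bool
{
    // Step 1: only characters from "!" (0x21) to "~" (0x7E).
    // \z anchors at the very end, so a trailing newline is rejected too.
    if (preg_match('#^[!-~]+\z#', $s) !== 1) {
        return false;
    }
    // Step 2: strpbrk() returns false when none of the forbidden
    // characters occur in the string.
    return strpbrk($s, $forbidden) === false;
}

var_dump(is_allowed('Hello,World!', '&@'));  // bool(true)
var_dump(is_allowed('R&D', '&@'));           // bool(false)
var_dump(is_allowed('user@example', '&@'));  // bool(false)
```

Like the range-based pattern, this sketch does not allow spaces; adding a space to the class in step 1 would change that.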

How can I check with a regex that a string contains only certain allowed characters?

I need a special regular expression and have no experience in them whatsoever, so I am turning to you guys on this one.
I need to validate a classifieds title field so it doesn't have any special characters in it, almost.
Only letters and numbers should be allowed, and also the three Swedish letters å, ä, ö (upper- or lowercase).
Besides the above, these should also be allowed:
The "&" sign.
Parentheses "()"
Mathematical signs "-", "+", "%", "/", "*"
Dollar and Euro signs
One accent signed letter: "é". // Only this one is required
Double quote and single quote signs.
The comma "," and point "." signs
Try this:
^[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.'"-]*$
Breakdown:
^ = matches the start of the string
[...]* = matches any characters (or ranges) inside the brackets zero or more times
$ = matches the end of the string
Updated with all the suggestions from the comments. Thanks guys!
There is a security flaw in the accepted answer:
^[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.'"-]*$
This will generate a true response for empty strings as * is for 0 or more occurrences.
Here is a more secure version:
^[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.'"-]+$
The + responds true to 1 or more occurrences.
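A minimal PHP sketch of the stricter pattern. The function name is_valid_title is hypothetical, and the u modifier is an addition that assumes the source file and the input are UTF-8:

```php
<?php
// Hypothetical helper: validates a classifieds title against the allow-list.
function is_valid_title(string $title): bool
{
    // + rejects the empty string; u makes PCRE read å, ä, ö, é and €
    // as single UTF-8 characters rather than as separate bytes.
    $pattern = '~^[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.\'"-]+$~u';
    return preg_match($pattern, $title) === 1;
}

var_dump(is_valid_title('Café & bar, 50% off'));  // bool(true)
var_dump(is_valid_title('Säljes: cykel'));        // bool(false), ":" is not allowed
var_dump(is_valid_title(''));                     // bool(false)
```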
More information can be found at https://regexr.com/
PHP has a variety of functions that can help with text validation. You may find them more appropriate than a straight regex. Consider strip_tags(), htmlspecialchars(), htmlentities()
Also, if you are running PHP 5.2 or later, you can use the excellent Filter functions, which were designed for exactly this situation.
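As a rough sketch, a regex check can be run through filter_var() with FILTER_VALIDATE_REGEXP, which returns the input string on success and false otherwise. The pattern and sample titles below are assumptions for illustration, with UTF-8 input assumed:

```php
<?php
// The allow-list pattern is passed through the filter's "regexp" option.
$options = ['options' => [
    'regexp' => '~^[\s\da-zA-ZåäöÅÄÖ&()+%/*$€é,.\'"-]+$~u',
]];

$ok  = filter_var('Cykel (röd), 500 kr', FILTER_VALIDATE_REGEXP, $options);
$bad = filter_var('Cykel <script>', FILTER_VALIDATE_REGEXP, $options);

var_dump($ok !== false);  // bool(true), $ok is the input string itself
var_dump($bad);           // bool(false)
```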
^[\sa-zA-Z0-9åäö&()+%/*$€é,.'"-]*$
will match all the required characters.
In PHP:
if (preg_match('#^[\sa-zA-Z0-9åäö&()+%/*$€é,.\'"-]*$#iu', $subject)) {
# Successful match
} else {
# Match attempt failed
}
