Match Polish characters in PHP with preg_match - php

I am trying to do some server-side validation in PHP. I have tried hard but still have not found a solution. I want to allow only Polish characters in the input.
For this I have used:
preg_match('/^[\x{0104}-\x{017c}]*$/u',$titles)
This doesn't work, however.
Does anyone have an idea how to write it properly?

To match Polish letters only, you just need a character class:
[a-pr-uwy-zA-PR-UWY-ZąćęłńóśźżĄĆĘŁŃÓŚŹŻ]
Use it as:
preg_match('/^[A-PR-UWY-ZĄĆĘŁŃÓŚŹŻ]*$/iu',$titles)
Note that there are no Q, V and X in the Polish alphabet, but since they appear in some loanwords (taxi), you may want to allow these letters as well. In that case, use the regex '/^[A-ZĄĆĘŁŃÓŚŹŻ]*$/iu'.
IDEONE demo
if (preg_match('/^[A-PR-UWY-ZĄĆĘŁŃÓŚŹŻ]*$/iu', "spółka")) {
echo "The whole string contains only Polish letters";
}
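To see why the range from the question fails, it may help to compare both patterns on a few inputs. The following is only a sketch: the variable names are illustrative, the file is assumed to be saved as UTF-8, and the "\u{...}" escapes assume PHP 7 or later. The range U+0104 to U+017C leaves out all ASCII letters as well as ó (U+00F3), while it lets through non-Polish letters such as č (U+010D).

```php
<?php
$rangeOnly = '/^[\x{0104}-\x{017c}]*$/u';     // pattern from the question
$polish    = '/^[A-PR-UWY-ZĄĆĘŁŃÓŚŹŻ]*$/iu';  // explicit Polish alphabet

$results = [];
foreach (['spółka', 'ąę', "\u{010D}"] as $word) {
    $results[$word] = [
        'range'  => preg_match($rangeOnly, $word),
        'polish' => preg_match($polish, $word),
    ];
}

// "spółka" fails the range (s, p, k, a and ó lie outside it),
// while "č" passes the range although it is not a Polish letter.
var_dump($results);
```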

Related

Why is ctype_alnum unhelpful in matching culture-agnostic alphanumerics?

Let's suppose that I have a text in a variable called $text and I want to validate it, so that it can contain spaces, underscores, dots, letters from any language and any digits. Since I am a total noob with regular expressions, I thought I could work around learning them, like this:
if (!ctype_alnum(str_replace(".", "", str_replace(" ", "", str_replace("_", "", $text))))) {
//invalid
}
This correctly considers the following inputs as valid:
foobarloremipsum
foobarloremipsu1m
foobarloremi psu1m
foobar._remi psu1m
So far, so good. But if I enter my name, Lajos Árpád, which contains non-English letters, it is considered invalid. The documentation says:
Returns TRUE if every character in text is either a letter or a digit,
FALSE otherwise.
Source.
I suppose that a setting needs to be changed to allow non-English letters, but how can I use ctype_alnum to return true if and only if $text contains only letters or digits in a culture-agnostic fashion?
Alternatively, I am aware that some spooky regular expression can be used to resolve the issue, including things like \p{L} which is nice, but I am interested to know whether it is possible using ctype_alnum.
You need to use setlocale with category set to LC_CTYPE and the appropriate locale for the ctype_* family of functions to work on non-English characters.
Note that the locale you pass to setlocale needs to actually be installed on the system, otherwise it won't work. The best way to remedy this situation is to use a portable solution, such as the one given in this answer to a similar question.
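The linked answer is not reproduced here, so the following is only a guess at what a locale-independent check could look like, built on the \p{L} property mentioned in the question. The function name isValidText is hypothetical, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: accepts letters (\p{L}) and decimal digits (\p{Nd})
// from any script, plus space, dot and underscore. Expects UTF-8 input.
function isValidText(string $text): bool
{
    return preg_match('/^[\p{L}\p{Nd} ._]+$/u', $text) === 1;
}

var_dump(isValidText('Lajos Árpád'));
var_dump(isValidText('foobar._remi psu1m'));
var_dump(isValidText('foo#bar'));
```

Unlike the ctype_alnum approach, this does not depend on which locales are installed on the server.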

Regular expression testing with Greek characters php

Regular expressions are getting the better of me today. I've been searching google finding code that people say works ... and it doesn't for me!
Background:
I want to validate user input into a text field. The input can contain:
1). English characters and numbers
or
2). Greek characters and numbers
If symbols (&%!) etc are found it should not match!
Currently:
Been testing this RegEx in PHP
/^[\p{Greek}\s\d a-zA-Z]+/u
testing with "γφψη 677" works
testing with "γφψη 677###hello" works too!! - BUT I WANT THIS TO FAIL
<?php
header("content-type: text/html;charset=utf-8");
if (preg_match_all('/^[\p{Greek}\s\d a-zA-Z]+/u', 'γφψη 677###hello')){
echo "ok!";
}
else {
echo "no!";
}
?>
Any ideas?
Thank you in advance!!
The problem with your regex /^[\p{Greek}\s\d a-zA-Z]+/u is that it tells the engine where to start matching but gives no instruction about where the match must end, so a valid prefix is enough for it to succeed. Changing your regex to /^[\p{Greek}\s\d a-zA-Z]+$/u (notice the $ at the end) should fix the problem.
The ^ and $ combination instructs the regex engine to start matching at the beginning of the string (^) and to finish at the end of it ($).
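A quick way to check the difference is to run both patterns against the two inputs from the question. This is a minimal sketch with illustrative variable names; it uses preg_match rather than preg_match_all, since a single yes/no answer is all that is needed.

```php
<?php
$unanchored = '/^[\p{Greek}\s\d a-zA-Z]+/u';   // original pattern
$anchored   = '/^[\p{Greek}\s\d a-zA-Z]+$/u';  // with the trailing $

$results = [];
foreach (['γφψη 677', 'γφψη 677###hello'] as $input) {
    $results[$input] = [
        'unanchored' => preg_match($unanchored, $input),
        'anchored'   => preg_match($anchored, $input),
    ];
}

// Only the anchored pattern rejects the input containing "###".
var_dump($results);
```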

Accented character validation in PHP

What is a good one-liner php regex for checking first/last name fields with accented characters (in case someone's name was Pièrre), that could match something like:
<?php
$strErrorMessage = null;
if(!preg_match('/\p{L}0-9\s-+/u', trim($_POST["firstname"])))
$strErrorMessage = "Your first name can only contain valid characters, ".
"spaces, minus signs, or numbers.";
?>
This tries to use unicode verification, from this post, but doesn't work correctly. The solution seems pretty hard to google.
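Aside from the general difficulty of validating names, you need to put your characters into a character class. As written, /\p{L}0-9\s-+/u matches only a sequence like "Ä0-9 ------": one letter, the literal text "0-9", one whitespace character and one or more hyphens. What you wanted to do is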
/^[\p{L}0-9\s-]+$/u
Additionally, I added anchors; they ensure that the regex has to match the complete string.
As ex3v mentioned, you should probably add \p{M} to that class so that combining marks are matched as well. See Unicode properties.
/^[\p{L}\p{M}0-9\s-]+$/u
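The effect of \p{M} shows up when the same name arrives in decomposed form, that is, as a base letter followed by a combining accent. A small sketch, with illustrative variable names and assuming PHP 7 or later for the "\u{...}" escapes:

```php
<?php
$lettersOnly = '/^[\p{L}0-9\s-]+$/u';
$withMarks   = '/^[\p{L}\p{M}0-9\s-]+$/u';

$precomposed = "Pi\u{00E8}rre";   // è as a single code point
$decomposed  = "Pie\u{0300}rre";  // e followed by a combining grave accent

$results = [
    'precomposed/lettersOnly' => preg_match($lettersOnly, $precomposed),
    'decomposed/lettersOnly'  => preg_match($lettersOnly, $decomposed),
    'precomposed/withMarks'   => preg_match($withMarks, $precomposed),
    'decomposed/withMarks'    => preg_match($withMarks, $decomposed),
];

// Only the decomposed form is rejected when \p{M} is missing.
var_dump($results);
```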

REGEX at least one uppercase and one number

I searched everywhere but I couldn't find the right regex for my verification.
I have a $string and I want to make sure it contains at least one uppercase letter and one number. No other characters are allowed, just numbers and letters. It is for a password requirement.
John8 = good
joHn8 = good
jo8hN = good
I will use preg_match function
The uppercase letter and the number can be anywhere in the word, not only at the beginning or end.
This should work, but is a bit of a mess. Consider using multiple checks for readability and maintainability...
preg_match('/^[A-Za-z0-9]*([A-Z][A-Za-z0-9]*\d|\d[A-Za-z0-9]*[A-Z])[A-Za-z0-9]*$/', $password);
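The "multiple checks" suggestion could look roughly like the sketch below. The function name isValidPassword is hypothetical, and the three rules are taken from the question: only ASCII letters and digits, at least one uppercase letter, at least one digit.

```php
<?php
// Hypothetical helper: each rule is a separate, easy-to-read check.
function isValidPassword(string $password): bool
{
    $onlyAlnum    = preg_match('/^[A-Za-z0-9]+$/', $password) === 1;
    $hasUppercase = preg_match('/[A-Z]/', $password) === 1;
    $hasDigit     = preg_match('/[0-9]/', $password) === 1;

    return $onlyAlnum && $hasUppercase && $hasDigit;
}

var_dump(isValidPassword('jo8hN'));   // example from the question
var_dump(isValidPassword('john8'));   // no uppercase letter
```

Keeping the rules apart also makes it straightforward to report which requirement failed.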
Use lookahead:
preg_match('/^(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]+$/', $string);
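Each lookahead asserts one requirement from the start of the string without consuming anything, and the final character class then restricts the whole string to letters and digits. A short sketch that runs the pattern over the question's examples plus a few strings that should fail (the extra test strings are my own):

```php
<?php
$pattern = '/^(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]+$/';

$checks = [];
foreach (['John8', 'joHn8', 'jo8hN', 'john8', 'JOHN', 'John 8'] as $candidate) {
    $checks[$candidate] = preg_match($pattern, $candidate) === 1;
}

// The first three pass; the others lack an uppercase letter,
// lack a digit, or contain a disallowed space.
var_dump($checks);
```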
Use this regex pattern:
^([A-Z]+([a-z0-9]+))$
Preg_match
preg_match('~^([A-Z]+([a-z0-9]+))$~',$str);
Your request needs a precise syntax description and a lot of examples to back it up. Only 3 or 4 examples are not enough; they leave the problem very open.
For the last confirmed update:
preg_match('/^([a-z]*\d+[a-z]*[A-Z][a-z]*|[a-z]*[A-Z][a-z]*\d+[a-z]*)$/',$str)
History
First solution: preg_match('/^[A-Z][a-z]+\d+$/',$str)
After your edit 1: preg_match('/^[a-z]*[A-Z][a-z]*\d+$/',$str)
After your comment about UTF-8: hmm... please add the valid language to your question. For example, is "José11" a valid string?
After your edit 2 ("jo8hN" is valid): can the number repeat? I suppose not. Is "8N" valid? I suppose yes. preg_match('/^([a-z]*\d+[a-z]*[A-Z][a-z]*|[a-z]*[A-Z][a-z]*\d+[a-z]*)$/',$str) You can add more possibilities with "|" in this regex.

Match words in a string which are not in anchor of link with regex

I'm trying to find some words (or expressions, such as two words) in a string which are not inside the anchor of a link (the string contains HTML code and is usually UTF-8 encoded). The plan is to replace those words with links afterwards.
I'm not really good with regex. I've searched the web and Stack Overflow and found two regex patterns which help me, but each of them has an issue. I'm hoping someone can help me combine the two to get a good one.
First pattern: /('.$tag.')(?![^<]*<\/a>)/is
This pattern finds the words, but if, for example, I try to find "express" in the string:
In computing, a regular expression provides a concise and flexible means...
...I don't expect a match; however, one is found inside the word "expression".
Second pattern: \'(?!((<.*?)|(<a.*?)))(\b'.$tag.'\b)(?!(([^<>]*?)>)|([^>]*?</a>))\'is
This pattern doesn't have the previous issue, but if the last character of the word or expression I'm looking for is a special UTF-8 character, I don't get a match.
Example word: apă
Example string: ...care transformă umiditatea din aer în apă potabilă. Dacă iniţial a fost creată pentru situaţia ţărilor...
Assuming the second regular expression works for you (I haven't tested it, and I really don't think you should use regexes for this kind of task), all you need to do is add the u modifier, as #hakre said:
\'(?!((<.*?)|(<a.*?)))(\b'.$tag.'\b)(?!(([^<>]*?)>)|([^>]*?</a>))\'isu
Personally, I'd use DOMDocument for this task.
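A rough sketch of what the DOMDocument route might look like follows. It is not taken from the answer: the function name linkOutsideAnchors and the $url parameter are hypothetical, and the code assumes the dom extension is available and that the input is a UTF-8 HTML fragment. The idea is to visit only text nodes that have no <a> ancestor and to wrap whole-word matches in new links.

```php
<?php
// Hypothetical helper: wraps whole-word matches of $tag in a link to $url,
// skipping any text that already sits inside an <a> element.
function linkOutsideAnchors(string $html, string $tag, string $url): string
{
    $previous = libxml_use_internal_errors(true); // tolerate sloppy HTML
    $doc = new DOMDocument();
    // The XML encoding prefix is a common trick to make libxml read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrap  = $xpath->query('//div[@id="wrap"]')->item(0);

    // Unicode-aware word boundaries via lookarounds; the capture group makes
    // preg_split return the matched words along with the text between them.
    $pattern = '/(?<![\p{L}\p{N}])(' . preg_quote($tag, '/') . ')(?![\p{L}\p{N}])/iu';

    // Copy the node list first, because the loop modifies the tree.
    $textNodes = iterator_to_array($xpath->query('.//text()[not(ancestor::a)]', $wrap));
    foreach ($textNodes as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {               // odd indexes hold the matches
                $link = $doc->createElement('a');
                $link->setAttribute('href', $url);
                $link->appendChild($doc->createTextNode($part));
                $fragment->appendChild($link);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo linkOutsideAnchors('umiditatea din aer în apă potabilă', 'apă', '/apa');
```

Because the word boundaries are expressed with \p{L} and \p{N} lookarounds, a search for "express" should leave "expression" alone, and a word ending in a non-ASCII letter such as "apă" can still be matched.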
