PHP regular expression any kind of letter from any language


I'm trying to create my own routing in PHP using regex.
My example returns true when the name is in Latin script, but false when the name is in Arabic:
preg_match('#^(en/users/(?<name>[\p{L}\p{Nd}\_\-\+]+))$#', 'en/users/علي+عثمان')
What am I doing wrong?

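The essential fix is the pattern modifier u, which makes PCRE treat the pattern and the subject as UTF-8. Without it the subject is read byte by byte, so the multibyte Arabic characters are not recognised as letters. With u set, \p{L} already matches Arabic letters; \p{Arabic} is only needed if you want to name the Arabic script explicitly, and \p{Ll} is redundant because \p{L} includes it.
Like so: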
preg_match('#^(en/users/([\p{L}\p{Ll}\p{Arabic}\p{Nd}\_\-\+]+))$#u', 'en/users/علي+عثمان')
Working example: https://ideone.com/Zwrnpg
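As a minimal sketch of that point, assuming the script file and the subject string are both UTF-8 encoded, the pattern below relies on \p{L} alone and keeps the named group from the question. The variable names are illustrative only.

```php
<?php
// Assumption: this file and the subject string are UTF-8 encoded.
$path = 'en/users/علي+عثمان';

// \p{L} = any letter, \p{Nd} = any decimal digit.
// The trailing 'u' switches PCRE into UTF-8 mode.
$pattern = '#^en/users/(?<name>[\p{L}\p{Nd}_+-]+)$#u';

$matched = preg_match($pattern, $path, $m);

// Read the captured name back from the named group.
$name = $matched === 1 ? $m['name'] : null;

var_dump($matched, $name);
```

Inside a character class the underscore and plus sign need no escaping, and a hyphen placed last is taken literally.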

Related

How to optimize lowercase and capitalize formats extensions? [duplicate]

How can I make the following regex ignore case sensitivity? It should match all the correct characters but ignore whether they are lower or uppercase.
G[a-b].*

Assuming you want the whole regex to ignore case, you should look for the i flag. Nearly all regex engines support it:
/G[a-b].*/i
string.match("G[a-b].*", "i")
Check the documentation for your language/platform/tool to find how the matching modes are specified.
If you want only part of the regex to be case insensitive (as my original answer presumed), then you have two options:
Use the (?i) and [optionally] (?-i) mode modifiers:
(?i)G[a-b](?-i).*
Put all the variations (i.e. lowercase and uppercase) in the regex - useful if mode modifiers are not supported:
[gG][a-bA-B].*
One last note: if you're dealing with Unicode characters besides ASCII, check whether or not your regex engine properly supports them.
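Since the question names no language, here is a hedged PHP sketch of both options: the trailing i flag for the whole pattern, and the inline (?i) ... (?-i) modifiers for part of it. The sample strings and variable names are made up for illustration.

```php
<?php
// Whole pattern case-insensitive via the trailing 'i' flag.
$whole = preg_match('/G[a-b].*/i', 'gBxyz');

// Only the "G[a-b]" part is case-insensitive; "xyz" stays case-sensitive
// because (?-i) switches the mode off again.
$partialLower = preg_match('/(?i)G[a-b](?-i)xyz/', 'gBxyz');
$partialUpper = preg_match('/(?i)G[a-b](?-i)xyz/', 'gBXYZ');

var_dump($whole, $partialLower, $partialUpper);
```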

Depends on implementation
but I would use
(?i)G[a-b].
VARIATIONS:
(?i) case-insensitive mode ON
(?-i) case-insensitive mode OFF
Modern regex flavors allow you to apply modifiers to only part of the regular expression. If you insert the modifier (?im) in the middle of the regex then the modifier only applies to the part of the regex to the right of the modifier. With these flavors, you can turn off modes by preceding them with a minus sign (?-i).
Description is from the page:
https://www.regular-expressions.info/modifiers.html

A regular expression to validate 'abc' while ignoring case:
(?i)(abc)

The i flag is normally used for case insensitivity. You don't give a language here, but it'll probably be something like /G[ab].*/i or /(?i)G[ab].*/.

Just for the sake of completeness I wanted to add the solution for regular expressions in C++ with Unicode:
std::tr1::wregex pattern(szPattern, std::tr1::regex_constants::icase);
if (std::tr1::regex_match(szString, pattern))
{
...
}

JavaScript
If you want to make it case insensitive just add i at the end of regex:
'Test'.match(/[A-Z]/gi) //Returns ["T", "e", "s", "t"]
Without i
'Test'.match(/[A-Z]/g) //Returns ["T"]

In JavaScript you should pass the i flag to the RegExp constructor as stated in MDN:
const regex = new RegExp('(abc)', 'i');
regex.test('ABc'); // true

As I discovered from this similar post (ignorecase in AWK), on old versions of awk (such as on vanilla Mac OS X), you may need to use 'tolower($0) ~ /pattern/'.
IGNORECASE or (?i) or /pattern/i will either generate an error or return true for every line.

C#
using System.Text.RegularExpressions;
...
Regex.Match(
input: "Check This String",
pattern: "Regex Pattern",
options: RegexOptions.IgnoreCase)
specifically: options: RegexOptions.IgnoreCase

[gG][aAbB].* is probably the simplest solution if the pattern is not too complicated or long.

Addition to the already-accepted answers:
Grep usage:
Note that for grepping it is simply the addition of the -i option. Ex: grep -rni regular_expression to search for this 'regular_expression' 'r'ecursively, case 'i'nsensitive, showing line 'n'umbers in the result.
Also, here's a great tool for verifying regular expressions: https://regex101.com/
References:
man pages (man grep)
http://droptips.com/using-grep-and-ignoring-case-case-insensitive-grep

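In Kotlin (Java itself has no Regex class), the Regex constructor has the form
Regex(pattern: String, option: RegexOption)
So to ignore case, use
option = RegexOption.IGNORE_CASE
In plain Java, the equivalent is Pattern.compile(pattern, Pattern.CASE_INSENSITIVE).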

Kotlin:
"G[a-b].*".toRegex(RegexOption.IGNORE_CASE)

You can also convert the string you are going to check to lower case first, and then use only lower-case characters in your pattern.

You can practice regex in Visual Studio and Visual Studio Code using find/replace.
You need to select both Match Case and Regular Expressions for case-sensitive regular expressions. Otherwise [A-Z] won't behave case-sensitively.

Why is ctype_alnum unhelpful in matching culture-agnostic alphanumerics?

Let's suppose that I have a text in a variable called $text and I want to validate it, so that it can contain spaces, underscores, dots and any letters from any languages and any digits. Since I am a total noob with regular expressions, I thought I can work-around learning it, like this:
if (!ctype_alnum(str_replace(".", "", str_replace(" ", "", str_replace("_", "", $text))))) {
//invalid
}
This correctly considers the following inputs as valid:
foobarloremipsum
foobarloremipsu1m
foobarloremi psu1m
foobar._remi psu1m
So far, so good. But if I enter my name, Lajos Árpád, which contains non-English letters, then it is considered to be invalid.
Returns TRUE if every character in text is either a letter or a digit,
FALSE otherwise.
Source.
I suppose that a setting needs to be changed to allow non-English letters, but how can I use ctype_alnum to return true if and only if $text contains only letters or digits in a culture-agnostic fashion?
Alternatively, I am aware that some spooky regular expression can be used to resolve the issue, including things like \p{L} which is nice, but I am interested to know whether it is possible using ctype_alnum.

You need to use setlocale with category set to LC_CTYPE and the appropriate locale for the ctype_* family of functions to work on non-English characters.
Note that the locale that you're using with setlocale needs to actually be installed on the system, otherwise it won't work. The best way to remedy this situation is to use a portable solution, given in this answer to a similar question.
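The linked answer is not reproduced here, but one locale-independent approach, assuming $text is valid UTF-8, is the \p{L} route the question already mentions. The function name below is hypothetical, not a PHP built-in.

```php
<?php
// Hypothetical helper (not part of PHP): accepts letters and decimal digits
// from any script, plus space, underscore and dot.
// Assumption: $text is valid UTF-8.
function isAgnosticAlnumText(string $text): bool
{
    // \A and \z anchor the whole string; 'u' enables UTF-8 mode.
    return preg_match('/\A[\p{L}\p{Nd} _.]+\z/u', $text) === 1;
}

var_dump(isAgnosticAlnumText('Lajos Árpád'));
var_dump(isAgnosticAlnumText('foobar._remi psu1m'));
var_dump(isAgnosticAlnumText('foo!bar'));
```

This does not depend on which locales are installed. If the input may contain decomposed accents (a base letter followed by a combining mark), \p{M} would have to be added to the character class.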

PHP Regex : several stopping characters with Positive lookbehind

Hi stackoverflow community !
I'm trying to use a simple regex expression in PHP based on a Positive lookbehind. My objective is to extract everything in a URL between a domain name and a set of specific characters (? or & or /). I want to extract "bar" on those examples :
foo.com/bar?
foo.com/bar&
foo.com/bar/
I tried
(?<=foo\.com\/)[^/?&]+
It works fine on the test platform,
but not with preg_match in PHP 5.3.x: the error thrown suggests that I can't use several stopping characters, although it works with one.
I also tried a combination of positive lookbehind/lookahead, but the issue remains the same.
What did I do wrong ?

In PHP, unlike (say) JavaScript, you can't use the regex-delimiter without escaping it, even inside a character class. So, you need to change this:
"/(?<=foo\.com\/)[^/?&]+/"
to this:
"/(?<=foo\.com\/)[^\/?&]+/"

Escape the slashes, including the one inside the character class:
preg_match("/(?<=foo\.com\/)[^\/?&]+/", "http://www.foo.com/bar?", $result);
or use another delimiter
preg_match("#(?<=foo\.com/)[^/?&]+#", "http://www.foo.com/bar?", $result);
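A small sketch of the alternative-delimiter variant run against the three URL shapes from the question. The sample URLs are extended slightly for illustration, and the variable names are assumptions.

```php
<?php
// '#' is the delimiter, so '/' needs no escaping anywhere in the pattern.
$pattern = '#(?<=foo\.com/)[^/?&]+#';

$urls = [
    'http://www.foo.com/bar?',
    'foo.com/bar&x=1',
    'foo.com/bar/baz',
];

$results = [];
foreach ($urls as $url) {
    // $m[0] holds the text between "foo.com/" and the first stop character.
    $results[] = preg_match($pattern, $url, $m) === 1 ? $m[0] : null;
}

print_r($results);
```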

Hebrew regex match not working in php

This is my current regex code to validate English letters & numbers:
const CANONICAL_FMT = '[0-9a-z]{1,64}';
public static function isCanonical($str)
{
return preg_match('/^(?:' . self::CANONICAL_FMT . ')$/', $str);
}
Pretty straightforward. Now I want to change that to validate only Hebrew, underscore
and numbers. So I changed the code to:
public static function isCanonical($str)
{
return preg_match('/^(?:[\u0590-\u05FF\uFB1D-\uFB40]+|[\w]+)$/i', $str);
}
But it doesn't work. I basically took the Hebrew Unicode range from Wikipedia.
What is wrong here?

I was able to get it to work much more easily, using the /u flag and the \p{Hebrew} Unicode character property:
return preg_match('/^(?:\p{Hebrew}+|\w+)$/iu', $str);
Working example: http://ideone.com/gSlmh

If you want preg_match() to work properly with UTF-8, you might have to enable the u modifier (quoting) :
This modifier turns on additional functionality of PCRE that is
incompatible with Perl. Pattern strings are treated as UTF-8.
In your case, note also that PCRE does not support the \uXXXX escape, so instead of the following regex:
/^(?:[\u0590-\u05FF\uFB1D-\uFB40]+|[\w]+)$/i
you would write the code points as \x{XXXX} and add the u modifier:
/^(?:[\x{0590}-\x{05FF}\x{FB1D}-\x{FB40}]+|[\w]+)$/iu
(Note the additional u at the end.)

You need the /u modifier to add support for UTF-8.
Make sure you convert your Hebrew input to UTF-8 if it's in some other code page or character set.
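Putting these answers together, a hedged sketch of a validator for "Hebrew, underscore and numbers only" might look like this. The function name is hypothetical, the 1 to 64 length limit is carried over from the question's original CANONICAL_FMT, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical stand-in for the question's isCanonical() method.
// Assumption: $str is UTF-8 encoded.
function isCanonicalHebrew(string $str): bool
{
    // \p{Hebrew} matches characters of the Hebrew script; with the 'u'
    // modifier the {1,64} quantifier counts characters, not bytes.
    return preg_match('/\A[\p{Hebrew}0-9_]{1,64}\z/u', $str) === 1;
}

var_dump(isCanonicalHebrew('שלום_123'));
var_dump(isCanonicalHebrew('shalom'));
```

Unlike the \w alternative in the answers above, this character class rejects Latin letters, which matches the stated goal of accepting only Hebrew, underscore and digits.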

php regular expression assistance bold a filename

I am not very good with regular expressions in PHP. I am trying to write an expression that finds all file names such as /file-name-here.php and makes them bold.
This expression works in Flash but not in PHP, and it also doesn't accept the '-'. I'm not sure why I can't get it to work with preg_replace:
/(https?://)?(www\.)?([a-zA-Z0-9_%]*)\b\.[a-z]{2,4}(\.[a-z]{2})?((/[a-zA-Z0-9_%]*)+)?(\.[a-z]*)?/g

I think you need to escape your forward slashes. Also drop the trailing g: PHP has no g modifier, because preg_replace already replaces every match:
/(https?:\/\/)?(www\.)?([a-zA-Z0-9_%]*)\b\.[a-z]{2,4}(\.[a-z]{2})?((\/[a-zA-Z0-9_%]*)+)?(\.[a-z]*)?/
Or you could use a different delimiter (in PHP, the first character is the delimiter for the regular expression):
#(https?://)?(www\.)?([a-zA-Z0-9_%]*)\b\.[a-z]{2,4}(\.[a-z]{2})?((/[a-zA-Z0-9_%]*)+)?(\.[a-z]*)?#
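For the narrower goal stated in the question (bolding paths such as /file-name-here.php), a simplified sketch could look like the following. The pattern here is an illustrative assumption, not the asker's URL regex; it adds '-' to the character class so that hyphenated names are accepted.

```php
<?php
// Illustrative pattern: a slash, a name made of word characters, '%' or '-',
// a dot, and a 2-4 letter extension. '#' is the delimiter, so '/' is literal.
$pattern = '#/[\w%-]+\.[a-z]{2,4}\b#i';

$text = 'See /file-name-here.php for details';

// $0 refers to the whole match; preg_replace replaces every occurrence.
$bolded = preg_replace($pattern, '<b>$0</b>', $text);

echo $bolded, "\n";
```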
