regex \p{L} problems - php

I'm using this for my validation:
$space = "[:blank:]";
$number = "0-9";
$letter = "\p{L}";
$specialchar = "-_.:!='\/&*%+,()";
...
$default = "/^[".$space.$number.$letter.$specialchar."]*$/";
if (!preg_match($all, $input)){
$error = true;
}
The problem I have is this:
Everything works except "ü". "Ü" is accepted but "ü" is not, and I don't know why.
\p{L} should accept all letters, including special letters, so I don't understand why it's not working.
Does anyone have an idea what I can do?
The data I'm trying to validate is a POST value from a registration form.
P.S. If I use \p{L}ü I get an error like this:
Compilation failed: range out of order in character class at offset 23 in...

Escape the dash:
$specialchar = "\-_.:!='\/&*%+,()";
# here __^
Also add the /u modifier for Unicode matching:
$default = "/^[".$space.$number.$letter.$specialchar."]*$/u";
# here __^
Test:
$space = "[:blank:]";
$number = "0-9";
$letter = "\p{L}";
$specialchar = "\-_.:!='\/&*%+,()";
$default = "/^[".$space.$number.$letter.$specialchar."]*$/u";
// wrong variable name ^^^^^^^^^^^^ in your example.
$input = 'üÜ is but ';
if (!preg_match($default, $input)){
    echo "KO\n";
} else {
    echo "OK\n";
}
Output:
OK

The problem is the position of the hyphen. Within a character class, you can place a literal hyphen as the first or last character of the class. If you place the hyphen anywhere else, you need to escape it in order to add it to your class.
$specialchar = "_.:!='\/&*%+,()-";
You also need to add the u (Unicode) modifier to your regular expression. This modifier turns on additional PCRE functionality and makes the pattern and subject strings be treated as UTF-8. In addition, you have the wrong variable in the pattern.
$default = "/^[".$space.$number.$letter.$specialchar."]*$/u";

The - is a special character inside a character class, used to specify a range of characters. When you build the regex by string concatenation, it is safest to escape it as \-.
Apart from the fix above, there is another problem in "\-_.:!='\/&*%+,()". Do you want to include both \ and /, or only one of them? As written, only / is included, because \/ is just the escaped delimiter.
If you want to include both, you should specify it as "\-_.:!='\\\\\/&*%+,()".
If you don't want to escape /, you can change the delimiter / used when building $default to a character that does not occur in your regex, for example ~. In that case the list of special characters needs one \ fewer: "\-_.:!='\\\\/&*%+,()".
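As a rough sketch of the points made in the answers above (escaped hyphen, a ~ delimiter and the u modifier), the class can also be built with preg_quote() so that nothing has to be escaped by hand. The variable names and sample inputs are only illustrative, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Extra characters to allow, written literally (this list includes both \ and /).
$special = "-_.:!='\\/&*%+,()";

// preg_quote() escapes the hyphen, the backslash and other regex metacharacters.
// Passing '~' as the delimiter means / can stay unescaped.
$class = '[:blank:]0-9\p{L}' . preg_quote($special, '~');

// The u modifier makes \p{L} work on UTF-8 characters instead of single bytes.
$pattern = '~^[' . $class . ']*$~u';

foreach (['üÜ is but', 'a-b_c (1+2)', 'back\\slash', 'tab<here>'] as $input) {
    echo $input, ' => ', preg_match($pattern, $input) === 1 ? 'OK' : 'KO', "\n";
}
```

The last sample contains < and >, which are not in the allowed list, so it is the only one expected to be rejected.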

Related

Create a function to find a specific word in the title

The titles on my website have the following format:
It's no use going back to yesterday, because at that time I was... Lewis Carroll
It is always: The phrase… (author).
I want to delete everything after the ellipsis (…), leaving only the sentence as the title. I thought of creating a function in PHP that would split the title into parts, put them in an array and then work through each part, looking for the only pattern I have in the title, which is the ellipsis, and then delete everything after it. But when I do that, position X of my array contains the following:
was...
Position 8 of the array holds the word together with the ellipsis, and I don't know how to find a pattern to delete the author from the title, since my pattern was the ellipsis. Any ideas?
<?php
$a = get_the_title(155571);
$search = '... ';
if(preg_match("/{$search}/i", $a)) {
echo 'true';
}
?>
The code above finds the ellipsis, but I need to get the title into an array so that I can delete the unwanted part. I tried something like this:
<?php
define('WP_USE_THEMES', false);
require('./wp-blog-header.php');
global $wpdb;
$title_array = explode(' ', get_the_title(155571));
$search = '... ';
if (array_key_exists("/{$search}/i",$title_array)) {
echo "true";
}
?>
I started doing it this way, but it doesn't work, any ideas?
Thanks,

If you use a regex, you need to escape the search string, as preg_quote() would do, because a dot is a metacharacter in a pattern.
But in your simple case I would not use a regex at all and would just search for the three dots from the end of the string.
Note: if the ellipsis is added by the browser rather than being part of the title string, there is no way to detect it in PHP.
$title = 'The phrase... (author).';
echo getPlainTitle($title);
function getPlainTitle(string $title) {
    $rpos = strrpos($title, '...');
    return ($rpos === false) ? $title : substr($title, 0, $rpos);
}
will output
The phrase

First of all, since you're working with regular expressions, you need to remember that . has a special meaning there: it means "any character". So /... / just means "any three characters followed by a space", which isn't what you want. To match a literal . you need to escape it as \.
Secondly, rather than searching or splitting, you could achieve what you want by replacing part of the string. For instance, you could find everything after the ellipsis, and replace it with an empty string. To do that you want a pattern of "dot dot dot followed by anything", where "anything" is spelled .*, so \.\.\..*
$title = preg_replace('/\.\.\..*/', '', $title);
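The question writes the ellipsis both as three dots (...) and as the single character (…), so a variant of the replacement that accepts either form may be useful. This is a minimal sketch, assuming the title is valid UTF-8; the function name stripAfterEllipsis is made up for the example.

```php
<?php
// Hypothetical helper: cuts the title at the first ellipsis, whichever form it takes.
function stripAfterEllipsis(string $title): string
{
    // \.{3} is three literal dots; \x{2026} is the single ellipsis character.
    // The u modifier treats pattern and subject as UTF-8; s lets . match newlines.
    $result = preg_replace('/(?:\.{3}|\x{2026}).*/us', '', $title);

    // preg_replace() returns null on error (for example invalid UTF-8 input),
    // in which case the title is returned unchanged.
    return $result ?? $title;
}

echo stripAfterEllipsis("It's no use going back to yesterday, because at that time I was... Lewis Carroll"), "\n";
echo stripAfterEllipsis('The phrase… (author).'), "\n";
```

Unlike the strrpos() version, this cuts at the first ellipsis in the title rather than the last one, which only matters if the phrase itself contains an ellipsis.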

Symfony ExpressionLanguage evaluate string with dashes

I'm trying to evaluate some strings containing dashes with the Symfony ExpressionLanguage component.
Here is what I've got so far:
...
$string = 'user.chuck-norris.getId()';
$language = new ExpressionLanguage();
$evaluated = $language->evaluate($string, $users);
...
This gives me the following error:
Variable "norris" is not valid around position 12. (Symfony\Component\ExpressionLanguage\SyntaxError)
If I replace the dash "-" with an underscore "_", it works, but I have a slug system that uses dashes and I don't want to change it if I can avoid it.
Is there any solution?
Thanks

As Yonel stated, dashes are interpreted as an operator.
So for this to work, I just have to replace the dashes with underscores:
$string = 'user.chuck_norris.getId()';
And then, before making the request, replace _ with -:
$value = str_replace('_', '-', $value);
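The round trip described above could be sketched as follows. The two helper names are hypothetical, and the sketch assumes slugs never contain underscores themselves, because otherwise the reverse mapping would be ambiguous. It only shows the string conversion and does not call ExpressionLanguage.

```php
<?php
// Hypothetical helper: makes a slug usable as a name inside an expression.
function slugToIdentifier(string $slug): string
{
    return str_replace('-', '_', $slug);
}

// Hypothetical helper: turns the name back into the slug.
// Assumes the original slug contained no underscores.
function identifierToSlug(string $identifier): string
{
    return str_replace('_', '-', $identifier);
}

$expression = 'user.' . slugToIdentifier('chuck-norris') . '.getId()';
echo $expression, "\n";
echo identifierToSlug('chuck_norris'), "\n";
```

Converting only the slug, rather than the whole expression string, keeps any real subtraction operators elsewhere in the expression untouched.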

regex case insensitive and with/without whitespace

Not being that knowledgeable about regex patterns, and after reading all the wikis and references I could find, I'm having problems altering a pattern for word detection and highlighting.
I found a function in another Stack Overflow answer that did everything I needed, but I have now found that it misses a few things.
The function is:
function ParserGlossario($texto, $termos) {
    $padrao = '\1\2\3';
    if (empty($termos)) {
        return $texto;
    }
    if (is_array($termos)) {
        $substituir = array();
        $com = array();
        foreach ($termos as $key => $value) {
            $key = $value;
            $value = $padrao;
            // $key = '([\s])(' . $key . ')([\s\.\,\!\?\<])';
            $key = '([\s])(' . $key . ')([\s\.\,\!\?\<])';
            $substituir[] = '|' . $key . '|ix';
            $com[] = empty($value) ? $padrao : $value;
        }
        return preg_replace($substituir, $com, $texto);
    } else {
        $termos = '([\s])(' . $termos . ')([\s])';
        return preg_replace('|'.$termos.'|i', $padrao, $texto);
    }
}
Some words are not being highlighted (the ones marked with red arrows):
And I don't know if it helps, but here is the array of "terms" that is used to search the text:
EDIT. The string being searched is just plain text:
Abaxial Xxxxx acaule Acaule xxxxxx xxx; xxxxx xxx Abaxial esporos.
abaxial
EDIT. Added PHP code fiddle
http://phpfiddle.org/main/code/079ad24318f554d9f2ba
Any help? I really don't know much about regexes...

Try
$key = '(^|\b)(' . $key . ')\b';
instead of
$key = '([\s])(' . $key . ')([\s\.\,\!\?\<])';
That should help. Your matches will still be in the second group, but there will be no third group, and I think the first should not be touched, so I believe this
$padrao = '\1\2\3';
would be better as
$padrao = '$2';
I also forgot (sorry) that you need to change
$substituir[] = '|' . $key . '|ix';
to
$substituir[] = '#' . $key . '#ix';
because the new pattern contains a |, which would clash with the | delimiter.
I would also use a string
$com = empty($value) ? $padrao : $value;
instead of an array, which is not needed in this case.

Let us look at the value of $key for one array element, for example acaule.
([\s])(acaule)([\s\.\,\!\?\<])
There are 3 capturing groups, defined by the 3 pairs of (...).
The first capturing group matches any whitespace character. If there is no whitespace character before the word, as for Abaxial at the beginning of the string, the word is ignored.
Putting \s into a character class, i.e. within [...], is not really needed here, as \s is itself a character class. ([\s]) and (\s) are equivalent.
The second capturing group matches just the word from the array.
The third capturing group matches
either any whitespace character,
or a period,
or a comma,
or an exclamation mark,
or a question mark, i.e. the standard punctuation marks,
or a left angle bracket (from an HTML or XML tag).
A semicolon or colon is not matched, and no other non-word character produces a positive match either.
If none of those characters follows the word, as for abaxial at the end of the string, there is no match.
By the way: ([\s.,!?<]) is equivalent to ([\s\.\,\!\?\<]), as only \ and ] (always) and - and ^ (depending on position) must be escaped with a backslash within a character class definition to be interpreted as literal characters. It is also a good idea to escape [ with a backslash within [...] for easier reading.
So it is clear why Abaxial at the beginning of the string and abaxial at the end of the string are not matched.
But why is Acaule not matched?
The word acaule directly before it is matched with a space on its left and a space on its right, as required for a positive match. That match consumes the space to the right of acaule, so no whitespace character remains to the left of Acaule.
There is \b, which means word boundary and does not match any character. It could be used together with \W*? in place of ([\s]) and ([\s\.\,\!\?\<]) to avoid matching substrings within a word.
Something like this would be possible:
$key = '(\W*?)(\b' . $key . '\b)(\W*?)';
\W*? means any non-word character, 0 or more times, non-greedy.
\W? means any non-word character, 0 or 1 times, and could also be used in the first and third capturing groups if that works better for the replacement.
But the right search string depends on what you want as the result of the replacement.
I don't have a PHP interpreter installed and therefore can't try out what your PHP code does on replacement, so I don't know what you would like to see after the replacement is done on the provided example string.
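To illustrate the word-boundary idea from the answers above, here is a minimal sketch that matches all terms in one pass. The function name highlightTerms and the <b> wrapper are assumptions made for the example, since the replacement markup used in the question is not shown; the sample text is the one from the question with the trailing abaxial appended.

```php
<?php
// Hypothetical helper: wraps each glossary term in <b>...</b> (the tag is an assumption).
function highlightTerms(string $text, array $terms): string
{
    if ($terms === []) {
        return $text;
    }

    // Quote each term so that regex metacharacters inside it are matched literally.
    $quoted = array_map(fn (string $t): string => preg_quote($t, '#'), $terms);

    // \b matches a position, not a character, so adjacent words can both match.
    $pattern = '#\b(' . implode('|', $quoted) . ')\b#iu';

    return preg_replace($pattern, '<b>$1</b>', $text);
}

$text = 'Abaxial Xxxxx acaule Acaule xxxxxx xxx; xxxxx xxx Abaxial esporos. abaxial';
echo highlightTerms($text, ['abaxial', 'acaule']), "\n";
```

Because \b consumes nothing, the words at the start and end of the string and the second of two adjacent terms are all eligible, which are the three cases the original pattern missed.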

preg_split : How to get what's before the split

I'm having some issues with the preg_split function.
I would like to get what is before the delimiter instead of what is after it.
I've found some leads suggesting that the following code would do the trick:
$var = end(preg_split('/\./',$string));
echo($var[0]);
But when I do that, I only get the first character and not all the characters before the dot.
Here is my code:
$item = "software_technical_item.TI";
$joint = end(preg_split('/\./',$item));
I obviously get "TI" in $joint, but I would like to get "software_technical_item". Would someone know how to do that?
Thanks,
Corentin.

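The dot is a special character in regex that matches any character, so it has to be escaped in order to match a literal dot, which your pattern already does. The problem is end(), which returns the last element of the array. Take the first element of the split result instead: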
$string = "software_technical_item.TI";
$var = preg_split('/\./',$string);
echo($var[0]);
Output:
software_technical_item
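Since the delimiter here is a fixed string, the same result can be had without a regex. The following is only a sketch of two alternatives, using the sample value from the question; the variable names $before and $viaStrstr are illustrative.

```php
<?php
$item = 'software_technical_item.TI';

// explode() splits on a literal string, so the dot needs no escaping.
// The limit of 2 stops splitting after the first dot.
$before = explode('.', $item, 2)[0];

// strstr() with its third argument set to true returns the part before
// the first dot, or false when the string contains no dot at all.
$viaStrstr = strstr($item, '.', true);

echo $before, "\n";
echo $viaStrstr, "\n";
```

The two differ when there is no dot: explode() then yields the whole string, while strstr() yields false, so the choice depends on how that case should be handled.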

php preg match a-zA-Z and only one space between 2 or more words

For my PHP script I have this code:
if (!preg_match("/[^A-Za-z]/", $usersurname))
$usersurname_valid = 1;
This worked until I realized that a surname can be two or more words... doh.
Can anyone tell me how to write this code if I want to allow one space between two words? For example:
Jan Klaas is currently rejected but should be allowed, and Jan Klaas Martijn and so on should be allowed too.
Even better would be a preg_replace that replaces two or more spaces with one, so when you write Jan(space)(space)Klaas or Jan(space)(space)(space)(space)Klaas, it would return Jan(space)Klaas.
I searched around for a while, but somehow I just can't get this space matching to work.
PS: Once I get this working, I will of course apply it to the middle and last names too.
===========================================
EDIT: After you helping me out, I re-wrote my code to:
// validate usersurname
$usersurname = preg_replace("/\s{2,}/"," ", $usersurname);
if (!preg_match("/^[A-Za-z]+(\s[A-Za-z]+)*$/",$usersurname))
$usersurname_valid = 1;
// validate usermidname
$usermidname = preg_replace("/\s{2,}/"," ", $usermidname);
if (!preg_match("/^[A-Za-z]+(\s[A-Za-z]+)*$/",$usermidname))
$usermidname_valid = 1;
// validate userforename
$userforename = preg_replace("/\s{2,}/"," ", $userforename);
if (!preg_match("/^[A-Za-z]+(\s[A-Za-z]+)*$/",$userforename))
$userforename_valid = 1;
and the error notifications
elseif ($usersurname_valid !=1)
echo ("<p id='notification'>Only alphabetic character are allowed for the last name. $usersurname $usermidname $userforename</p>");
// usermidname character validation
elseif ($usermidname_valid !=1)
echo ("<p id='notification'>Only alphabetic character are allowed for the middle name. $usersurname $usermidname $userforename</p>");
// userforename character validation
elseif ($userforename_valid !=1)
echo ("<p id='notification'>Only alphabetic character are allowed for the (EDIT) first name. $usersurname $usermidname $userforename</p>");
Replacing the spaces works well, and I need this preg_match to check for A-Za-z plus space. I think in this case it doesn't matter if it matches more than one space, because they are replaced anyway, right?
EDIT:
Solution for my case:
$usersurname = preg_replace("/\s{2,}/"," ", $usersurname);
if (!preg_match("/[^A-Za-z ]/", $usersurname))
This does the work. Thanks for helping out, J0HN

Well, to solve the problem you have in mind:
if (!preg_match("/^[A-Za-z]+(\s[A-Za-z]+)*$/",$usersurname)) { ... }
But that is only part of the solution, and it's not bulletproof. Look at the list of common mistakes when handling names.
So you would do better to rethink your validation approach.
Replacing multiple spaces is simpler to do as a separate instruction, something like
$processed_usersurname = preg_replace("/\s{2,}/"," ", $usersurname);
This matches any run of two or more consecutive whitespace characters (space, tab, line feed and carriage return) and replaces it with a single space.
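Putting the two steps together, a minimal sketch might look like this. The function name normalizeName is hypothetical, and the sketch keeps the question's restriction to ASCII letters, which, as noted above, is not a bulletproof rule for real names.

```php
<?php
// Hypothetical helper: returns the normalized name, or null when it is invalid.
function normalizeName(string $name): ?string
{
    // Collapse every run of whitespace to one space, then drop leading
    // and trailing whitespace.
    $name = trim(preg_replace('/\s+/', ' ', $name));

    // One or more ASCII-letter words, separated by exactly one space.
    if (preg_match('/^[A-Za-z]+(?: [A-Za-z]+)*$/', $name) === 1) {
        return $name;
    }

    return null;
}

var_dump(normalizeName('Jan    Klaas'));
var_dump(normalizeName('Jan Klaas Martijn'));
var_dump(normalizeName('J4n Klaas'));
```

Note that the check succeeds when preg_match() returns 1, so the name is treated as valid on a match rather than on a failed match.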
