PHP Validate string characters are UK or US Keyboard characters - php

What is the easiest or best way in PHP to validate true or false that a string only contains characters that can be typed using a standard US or UK keyboard with the keyboard language set to UK or US English?
To be a little more specific, I mean using a single key depression with or without using the shift key.
I think the characters are the following: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~`!@#$%^&*()_-+={[}]|\:;"'<,>.?/£ and Space

You can cover every printable ASCII character with [ -~] (i.e. the range from space to tilde). Then add £ as well (you might need other characters too, such as ± and §; check the US and UK keyboard layouts for that). Because £, ± and § are multibyte in UTF-8, the pattern needs the u modifier.
Something like:
if(preg_match('#^[ -~£±§]*$#u', $string)) {
// valid
}
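A minimal sketch of the same idea wrapped in a function. The helper name isUsUkKeyboardString is hypothetical, and the sketch assumes both the source file and the input are UTF-8. It compares the result with 1 so that a preg_match() error (for example on invalid UTF-8) counts as "not valid".

```php
<?php
// Hypothetical helper name; the character list mirrors the answer above.
function isUsUkKeyboardString(string $input): bool
{
    // [ -~] spans printable ASCII (space through tilde). The u modifier makes
    // PCRE treat pattern and subject as UTF-8, so £, ± and § are single
    // characters. \A and \z anchor at the very start and end of the string.
    return preg_match('/\A[ -~£±§]*\z/u', $input) === 1;
}

var_dump(isUsUkKeyboardString('Price: £5 + 10%')); // bool(true)
var_dump(isUsUkKeyboardString('naïve'));            // bool(false)
var_dump(isUsUkKeyboardString("tab\there"));        // bool(false), tab is below space
```

Whether tab or newline should be accepted is a design choice; add \t or \n to the character class if they should be.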

The following regular expression may be of use to you:
/^([a-zA-Z0-9!"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~\t ])*$/
Use it like this (inside a single-quoted PHP string the quote and the backslash have to be escaped once more; the m modifier is left out so that ^ and $ anchor the whole string rather than each line):
$result = (bool)preg_match('/^([a-zA-Z0-9!"#$%&\'()*+,\-.\/:;<=>?@[\\\\\]^_`{|}~\t ])*$/', $input);
Or create a reusable function from this code:
function testUsUkKeyboard($input)
{
return (bool)preg_match('/^([a-zA-Z0-9!"#$%&\'()*+,\-.\/:;<=>?@[\\\\\]^_`{|}~\t ])*$/', $input);
}

It is easier to check for characters that must not appear than for those that may. First you need a list of the invalid characters: you can take these from the byte range 128 - 255, whereas 0 - 127 is the regular (ASCII) key set. Note that this is a byte-level check, so a multibyte character such as £ in a UTF-8 string is rejected as well.
To create the invalid array you can do:
$chars = range(128,255);
The array above contains the byte values of all extended (non-ASCII) characters.
Then check the string in question against it. People say to use regex, but I don't really think that's needed:
$string = "testing a plain string";
$valid = true;
for($s=0;$s<strlen($string);$s++)
{
if(in_array(ord($string[$s]),$chars))
{
//Invalid: stop at the first byte outside 0 - 127
$valid = false;
break;
}
}

Related

Selecting thousands separator character with RegEx

I need to change the thousands separator in a given string that has numbers in it.
What RegEx code can select ONLY the thousands separator character in the string?
It needs to select the comma only when there are digits around it. For example, only in 123,456 do I need to select and replace the ,
I'm converting English numbers into Persian (e.g. Hello 123 becomes Hello ۱۲۳). Now I need to replace the separator with the Persian version too, but I don't know how to select it with regex. E.g. Hello 121,534 must become Hello ۱۲۱/۵۳۴
The character that needs to be replaced is , with /
Use a regular expression with lookarounds.
$new_string = preg_replace('/(?<=\d),(?=\d)/', '/', $string);
(?<=\d) means there has to be a digit before the comma, (?=\d) means there has to be a digit after it. But since these are lookarounds, they're not included in the match, so they don't get replaced.
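As a rough sketch, the lookaround replacement can be combined with the digit conversion mentioned in the question. The helper name toPersianNumbers is hypothetical, and the "/" separator is simply taken from the question; the separator is replaced first, while the digits are still ASCII and \d can see them.

```php
<?php
// Hypothetical helper; "/" as the target separator comes from the question.
function toPersianNumbers(string $text): string
{
    // Replace only commas that have an ASCII digit on both sides.
    $text = preg_replace('/(?<=\d),(?=\d)/', '/', $text);

    // Map ASCII digits 0-9 to the Persian digits U+06F0..U+06F9.
    $ascii   = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
    $persian = ["\u{06F0}", "\u{06F1}", "\u{06F2}", "\u{06F3}", "\u{06F4}",
                "\u{06F5}", "\u{06F6}", "\u{06F7}", "\u{06F8}", "\u{06F9}"];

    return str_replace($ascii, $persian, $text);
}

// The comma after "Hello" has no digits around it and is left alone.
echo toPersianNumbers('Hello, world 121,534'), "\n";
```

Because the lookarounds consume nothing, numbers with several groups such as 1,234,567 have every separator replaced in one pass.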
According to your question, the main problem you face is converting English numbers into Persian ones.
In PHP there is a library available that can format and parse numbers according to the locale. You can find it in the class NumberFormatter, which makes use of the Unicode Common Locale Data Repository (CLDR) to handle, in the end, all languages known to the world.
So converting a number 123,456 from en_GB (or en_US) to fa_IR is shown in this little example:
$string = '123,456';
$float = (new NumberFormatter('en_GB', NumberFormatter::DECIMAL))->parse($string);
var_dump(
(new NumberFormatter('fa_IR', NumberFormatter::DECIMAL))->format($float)
);
Output:
string(14) "۱۲۳٬۴۵۶"
This shows roughly how to convert the number. I'm not familiar with Persian, so please excuse me if I used the wrong locale here. There may also be options to choose which character is used for grouping, but the point of the example is that existing libraries already take care of number conversion. You don't need to re-invent this; handling every locale correctly is not something a single person could reasonably do alone.
With the number conversion clarified, the question remains how to apply it to the whole text. Why not locate all the candidate places, try to parse each match and, if (and only if) parsing succeeds, convert it to the other locale?
Luckily, the NumberFormatter::parse() method returns false if parsing fails (more detailed error reporting is available if you are interested), so this is workable.
The regular expression only needs a pattern that matches a number (the longest match wins), and the replacement can be done in a callback. In the following example the translation is deliberately verbose so that the actual parsing and formatting are more visible:
# some text
$buffer = <<<TEXT
it need to only select , when there is number around it. for example only
when 123,456 i need to select and replace "," I'm converting English
numbers into Persian (e.g: "Hello 123" becomes "Hello ۱۲۳"). now I need to
replace the Decimal separator with Persian version too. but I don't know how
I can select it with regex. e.g: "Hello 121,534" most become
"Hello ۱۲۱/۵۳۴" The character that needs to be replaced is , with /
TEXT;
# prepare formatters
$inFormat = new NumberFormatter('en_GB', NumberFormatter::DECIMAL);
$outFormat = new NumberFormatter('fa_IR', NumberFormatter::DECIMAL);
$bufferWithFarsiNumbers = preg_replace_callback(
'(\b[1-9]\d{0,2}(?:[ ,.]\d{3})*\b)u',
function (array $matches) use ($inFormat, $outFormat) {
[$number] = $matches;
$result = $inFormat->parse($number);
if (false === $result) {
return $number;
}
return sprintf("< %s (%.4f) = %s >", $number, $result, $outFormat->format($result));
},
$buffer
);
echo $bufferWithFarsiNumbers;
Output:
it need to only select , when there is number around it. for example only
when < 123,456 (123456.0000) = ۱۲۳٬۴۵۶ > i need to select and replace "," I'm converting English
numbers into Persian (e.g: "Hello < 123 (123.0000) = ۱۲۳ >" becomes "Hello ۱۲۳"). now I need to
replace the Decimal separator with Persian version too. but I don't know how
I can select it with regex. e.g: "Hello < 121,534 (121534.0000) = ۱۲۱٬۵۳۴ >" most become
"Hello ۱۲۱/۵۳۴" The character that needs to be replaced is , with /
Here the trick is simply to connect the text with the number conversion by using preg_replace_callback with a regular expression pattern. The pattern should match the needs in your question and is relatively easy to refine, since it only has to describe the whole number part; false positives are filtered out by the NumberFormatter class:
 pattern for Unicode UTF-8 strings
                                 |
(\b[1-9]\d{0,2}(?:[ ,.]\d{3})*\b)u
 |                  |         |
 |         grouping character |
 |                            |
word boundary ----------------+
Edit:
To match only the same grouping character across multiple thousands blocks, capture it in a named group and use a back-reference to it for the repetition. (A subroutine call such as (?&grouping_char) would re-run the character class and accept a different separator each time, so \k<grouping_char> is used here.)
(\b[1-9]\d{0,2}(?:(?<grouping_char>[ ,.])\d{3}(?:\k<grouping_char>\d{3})*)?\b)u
(This is harder to read.)
To finalize the answer, only the return clause needs to be condensed to return $outFormat->format($result); and the $outFormat NumberFormatter might need some more configuration, but since it is available in the closure, this can be done when it is created.
I hope this is helpful and opens up a broader picture than looking for a solution only at the point where you hit a wall. Regex alone is often not the answer. I'm sure there are regex experts who can give you a reasonably stable one-liner, but the context in which it is used will not be as stable. That is not to say there is only one answer. Rather, combining different layers (divide and conquer) lets you rely on a stable number conversion even while you are still unsure how to write a regex pattern for an English number.
You can write a regex to capture numbers with a thousands separator, and then join the two numeric parts with the separator you want:
$text = "Hello, world, 121,534" ;
$pattern = "/([0-9]{1,3}),([0-9]{3})/" ;
$new_text = preg_replace($pattern, "$1X$2", $text); // replace the comma with 'X', keep the captured groups intact.
echo $new_text ; // Hello, world, 121X534
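In PHP you can do that using str_replace:
$a="Hello 123,456";
echo str_replace(",", "X", $a);
This will output: Hello 123X456. Note that str_replace replaces every comma in the string, including commas that are not between digits.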

how to check a password's Content and length using an array Functions

A user enters a password, say 'tomorrow1234'. I'm aware that I can split it into an array with str_split, but after that, I want to go through each value and search them for things such as capitalization, number, or white space.
How would I go about doing this?
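This is an old standby function I use to validate password complexity. It requires that the password contains upper and lowercase letters, as well as non-alpha characters. Length checks are trivial and are handled elsewhere.
// The function name and the final return value are assumed; only the body was posted.
function checkPasswordComplexity($password) {
$req_regex = array(
'/[A-Z]/', //uppercase
'/[a-z]/', //lowercase
'/[^A-Za-z]/' //non-alpha
);
foreach($req_regex as $regex) {
if( !preg_match($regex, $password) ) {
return NULL; // a required character class is missing
}
}
return true; // assumed success value
}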
I use the array and a loop so it's easy to add/remove conditions if necessary.
Sounds like you're trying to verify password strength.
Check out this web page. A specific answer for your case would be fairly complex to write, but you can use regex to check for things like capitalization, symbols and digits. The page has several examples you could modify for your needs.
http://www.cafewebmaster.com/check-password-strength-safety-php-and-regex
This is what I would use:
(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$
It requires at least 8 characters, one uppercase letter, one lowercase letter, and either a digit or a special (non-word) character.
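The question asked about walking the array returned by str_split. A minimal sketch of that approach, assuming an ASCII password and using the ctype_* functions; the helper name describePassword and the keys of the returned array are hypothetical.

```php
<?php
// Hypothetical helper: reports the length and which kinds of characters occur.
function describePassword(string $password): array
{
    $found = [
        'length' => strlen($password),
        'upper'  => false,
        'lower'  => false,
        'digit'  => false,
        'space'  => false,
    ];

    // str_split() without a length argument yields one byte per element,
    // which is fine for ASCII input but not for multibyte characters.
    foreach (str_split($password) as $char) {
        if (ctype_upper($char)) { $found['upper'] = true; }
        if (ctype_lower($char)) { $found['lower'] = true; }
        if (ctype_digit($char)) { $found['digit'] = true; }
        if (ctype_space($char)) { $found['space'] = true; }
    }

    return $found;
}

print_r(describePassword('tomorrow1234'));
```

The caller can then apply whatever policy it likes, for example rejecting passwords where 'space' is true or 'length' is below a chosen minimum.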

Splitting string containing letters and numbers not separated by any particular delimiter in PHP

Currently I am developing a web application that fetches the Twitter stream, and I am trying to build some natural language processing of my own.
Since my data is from Twitter (limited to 140 characters), many words are shortened or, as in this case, written without a space.
For example:
"Hi, my name is Bob. I m 19yo and 170cm tall"
Should be tokenized to:
- hi
- my
- name
- bob
- i
- 19
- yo
- 170
- cm
- tall
Notice that 19 and yo in 19yo have no space between them. I use this mostly for extracting numbers with their units.
Put simply, what I need is a way to 'explode' each token that contains a number into chunks of digits and chunks of letters, without any delimiter.
'123abc' will be ['123', 'abc']
'abc123' will be ['abc', '123']
'abc123xyz' will be ['abc', '123', 'xyz']
and so on.
What is the best way to achieve it in PHP?
I found something close to it, but it's in C# and specifically for day/month splitting: How do I split a string in C# based on letters and numbers
You can use preg_split
$string = "Hi, my name is Bob. I m 19yo and 170cm tall";
$parts = preg_split("/(,?\s+)|((?<=[a-z])(?=\d))|((?<=\d)(?=[a-z]))/i", $string);
var_dump ($parts);
When matching against the digit-letter boundary, the regular expression match must be zero-width. The characters themselves must not be included in the match. For this the zero-width lookarounds are useful.
http://codepad.org/i4Y6r6VS
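An alternative sketch that matches the tokens instead of splitting between them: preg_match_all collects every run of letters or run of digits, which also drops punctuation such as the period after "Bob". The helper name tokenize is hypothetical, the pattern only covers ASCII letters, and no stop-word filtering is attempted.

```php
<?php
// Hypothetical helper: returns lowercase runs of ASCII letters or of digits.
function tokenize(string $text): array
{
    // Each match is either one or more letters or one or more digits,
    // so "19yo" yields "19" and "yo" without needing a delimiter.
    preg_match_all('/[a-z]+|\d+/i', $text, $matches);

    return array_map('strtolower', $matches[0]);
}

print_r(tokenize('Hi, my name is Bob. I m 19yo and 170cm tall'));
print_r(tokenize('abc123xyz'));
```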
How about this:
Extract the numbers from the string with a regex and store them in an array, then replace each number in the string with a special placeholder that holds its position. After parsing the string, which now consists only of placeholders and ordinary characters, put the numbers from the array back into their reserved places.
Just an idea, but it might work for you.
EDIT:
Try running this short code; hopefully the output makes my point clear. (This code doesn't work on codepad; I don't know why.)
<?php
$str = "Hi, my name is Bob. I m 19yo and 170cm tall";
preg_match_all("#\d+#", $str, $matches);
$str = preg_replace("!\d+!", "#SPEC#", $str);
print_r($matches[0]);
print $str;

PHP - preg_match()

Alright, so I want the user to be able to enter every character from A-Z and every number from 0-9, but I don't want them entering "special characters".
Code:
if (preg_match("/^[a-zA-Z0-9]$/", $user_name)) {
#Stuff
}
How is it possible for it to check all of the characters given, and then check if those were matched? I've tried preg_match_all(), but I didn't honestly understand much of it.
Like if a user entered "FaiL65Mal", I want it to allow it and move on. But if they enter "Fail{]^7(,", I want it to appear with an error.
You just need a quantifier in your regex:
Zero or more characters *:
/^[a-zA-Z0-9]*$/
One or more characters +:
/^[a-zA-Z0-9]+$/
Your regex as is will only match a string with exactly one character that is either a letter or number. You want one of the above options for zero or more or one or more, depending on if you want to allow or reject the empty string.
Your regular expression needs to be changed to
/^[a-zA-Z0-9]{1,8}$/
For usernames between 1 and 8 characters. Just adjust the 8 to the appropriate number and perhaps the 1.
Currently your expression matches exactly one character.
Please keep in mind that preg_match() and the other preg_*() functions can be misleading: preg_match() returns 0 when there is no match and false when an error occurs, so a simple if treats an error like a non-match instead of throwing.
Consider using T-Regx:
if (pattern('^[a-zA-Z0-9]{1,8}$')->matches($input))
{
// Matches! :)
}
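Without an extra library, the same distinction can be made by comparing the return value of preg_match() strictly. The sketch below is only an illustration: the helper name isValidUsername is hypothetical, it uses the + quantifier from the first answer, and the D modifier is added so that $ does not match before a trailing newline.

```php
<?php
// Hypothetical helper: true only when every character is a letter or digit.
function isValidUsername(string $userName): bool
{
    $result = preg_match('/^[a-zA-Z0-9]+$/D', $userName);

    // false means the regex engine failed, which is different from "no match".
    if ($result === false) {
        throw new RuntimeException('preg_match failed, error code ' . preg_last_error());
    }

    return $result === 1;
}

var_dump(isValidUsername('FaiL65Mal'));  // bool(true)
var_dump(isValidUsername('Fail{]^7(,')); // bool(false)
```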

How can I test if an input field contains foreign characters?

I have an input field in a form. Upon pushing submit, I want to validate to make sure the user entered non-latin characters only, so any foreign language characters, like Chinese among many others. Or at the very least test to make sure it does not contain any latin characters.
Could I use a regular expression for this? What would be the best approach for this?
I am validating in both javaScript and in PHP. What solutions can I use to check for foreign characters in the input field in both programming languages?
In PHP, you can check the Unicode script property Latin. That's probably closest to what you want.
So if preg_match('/\p{Latin}/u', $subject) returns 1, then there is at least one Latin character in your $subject.
JavaScript did not support this when this answer was written; you had to construct the valid Unicode ranges manually. Engines that implement ES2018 accept Unicode property escapes such as /\p{Script=Latin}/u.
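A short PHP sketch of that check, assuming UTF-8 input; the helper name containsLatinLetter is hypothetical. Digits, spaces and punctuation belong to the Common script, so they do not count as Latin here.

```php
<?php
// Hypothetical helper: true if at least one Latin-script character occurs.
function containsLatinLetter(string $subject): bool
{
    $result = preg_match('/\p{Latin}/u', $subject);

    // With the u modifier, preg_match() returns false on invalid UTF-8 input.
    if ($result === false) {
        throw new InvalidArgumentException('Subject could not be matched (invalid UTF-8?)');
    }

    return $result === 1;
}

var_dump(containsLatinLetter('שלום עולם')); // bool(false)
var_dump(containsLatinLetter('héllo'));     // bool(true), accented letters are Latin too
```

To accept only non-Latin input, the form handler would reject the value whenever this function returns true.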
In JavaScript, at least, you can use hex escapes inside character range expressions:
var rlatins = /[\u0000-\u007f]/;
You can then test whether a string contains any character in that range (basic ASCII, which also covers digits, spaces and punctuation) like this:
if (rlatins.test(someString)) {
alert("ROMANI ITE DOMUM");
}
You're trying to check if all letters are not Latin, but you do accept accented letters.
A simple solution is to validate the string using the regex (this is useful if you have a validation plugin):
/^[^a-z]+$/i
^...$ - Match from start to end
[^...] - characters that are not
a-z - A through Z,
+ - with at least one character
/i - ignoring case (could also be done as /^[^a-zA-Z]+$/ )
Another option is simply to look for a letter:
/[a-z]/i
This regex will match if the string contains a letter, so you can mark the input as invalid.
In JavaScript you can check that easily with if:
var s = "שלום עולם";
if(s.match(/^[^a-z]+$/i)){
}
or
if(!s.match(/[a-z]/i))
PHP has a different syntax, but the same regular expressions work there with preg_match. Always repeat the validation in PHP, because checks done in JavaScript can be bypassed by the client.
