how do you validate characters AND words in regex?

The Context
I'm in need of a bit of code that takes a very simple math string and runs PHP's eval() function. For example ...
$math = '25 * (233 - 1.5)';
echo eval("return $math;"); // returns 5787.5
However eval() is quite dangerous in the wrong hands, so the variable must be scrubbed. For the above, for example, a simple preg_replace would be ...
$math = '25 * (233 - 1.5)';
$replace = '/[^0-9\(\)\.\,\+\-\*\/\s]/';
$math = preg_replace($replace, '', $math);
echo eval("return $math;"); // returns 5787.5
... which ensures $math only contains valid characters (parentheses, .,+-*/, spaces and digits) and no malicious code.
The Question
I want to allow a few very specific words (PHP math functions), such as pow, pi, min, max, etc.
What's the cleanest way to validate both characters and words in regex?
So if given this string ...
pow(25,2) / pi(); hack the pentagon;
... how would I remove everything that wasn't in the $replace regex, but preserve the words pow and pi?

In PHP, you can match the words that you want to keep and then skip over them with the (*SKIP)(*FAIL) approach.
You can also shorten the character class by removing the backslashes (with the - moved to the end so it stays literal), and if you use a delimiter other than / you don't have to escape the / either.
As you are replacing the matched characters in the character class with an empty string, you can use a + quantifier to match one or more consecutive characters and do a single replacement per run.
\b(?:p(?:i|ow)|m(?:in|ax))\b(*SKIP)(*FAIL)|[^0-9().,+*/\s-]+
The pattern matches
\b(?:p(?:i|ow)|m(?:in|ax))\b Match either pi, pow, min or max as a whole word
(*SKIP)(*FAIL)| What is matched so far should not be part of the match result
[^0-9().,+*/\s-]+ Match 1+ times any char except the listed chars in the negated character class
If you don't want the spaces at the start and end, consider calling trim() on $math.
$math = 'pow(25,2) / pi(); hack the pentagon;';
$replace = '~\b(?:p(?:i|ow)|m(?:in|ax))\b(*SKIP)(*FAIL)|[^0-9().,+*/\s-]+~';
$math = preg_replace($replace, '', $math);
echo eval("return $math;"); // returns 198.94367886487
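As a minimal sketch of the same idea with a configurable whitelist: sanitize_math and its $allowedWords parameter are hypothetical names, not part of the answer above, and the arrow function assumes PHP 7.4 or later. Note that this only filters characters and words; the result is not guaranteed to be a valid expression, so eval() can still fail on it.

```php
<?php
// Hypothetical helper: removes everything except digits, math punctuation,
// whitespace and a small whitelist of function names.
function sanitize_math(string $input, array $allowedWords = ['pow', 'pi', 'min', 'max']): string
{
    // Quote each word so it is safe to embed in the pattern.
    $words = implode('|', array_map(fn($w) => preg_quote($w, '~'), $allowedWords));

    // Whitelisted words are matched first and then skipped via (*SKIP)(*FAIL);
    // any other run of disallowed characters is matched and removed.
    $pattern = '~\b(?:' . $words . ')\b(*SKIP)(*FAIL)|[^0-9().,+*/\s-]+~';

    return trim(preg_replace($pattern, '', $input));
}

echo sanitize_math('pow(25,2) / pi(); hack the pentagon;'); // pow(25,2) / pi()
```

Building the alternation from an array keeps the whitelist in one place when more functions need to be allowed.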

Related

split a value into two and then reverse the value in php

I have a value like this: 73b6424b. I want to split the value into two parts, 73b6 and 424b, then swap the two parts and concatenate them to get 424b73b6. I have already done it this way:
$substr_device_value = '73b6424b';
$first_value = substr($substr_device_value, 0, 4);
$second_value = substr($substr_device_value, 4, 4);
$final_value = $second_value.$first_value;
I am looking for an easier way than what I have done. Is it possible? If so, please suggest an approach.

You may use
preg_replace('~^(.{4})(.{4})$~', '$2$1', $s)
Details
^ - matches the string start position
(.{4}) - captures any 4 chars into Group 1 ($1)
(.{4}) - captures any 4 chars into Group 2 ($2)
$ - end of string.
The '$2$1' replacement pattern swaps the values.
NOTE: If you want to pre-validate the data before swapping, you may replace . pattern with a more specific one, say, \w to only match word chars, or [[:alnum:]] to only match alphanumeric chars, or [0-9a-z] if you plan to only match strings containing digits and lowercase ASCII letters.
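A short sketch that combines the swap with the pre-validation mentioned in the note. The function name swap_halves and the choice of [0-9a-f] as the allowed characters are assumptions for illustration; adjust the class to your data.

```php
<?php
// Hypothetical helper: swaps the two 4-character halves of an 8-character
// lowercase hex string, or returns null if the input has another shape.
function swap_halves(string $s): ?string
{
    // The D modifier makes $ match only at the very end of the string.
    $result = preg_replace('~^([0-9a-f]{4})([0-9a-f]{4})$~D', '$2$1', $s, 1, $count);

    // $count is 0 when the pattern did not match, i.e. the input was invalid.
    return $count === 1 ? $result : null;
}

echo swap_halves('73b6424b'); // 424b73b6
```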

Find and split a string by the first character that is not 0

I wanted to know how I could split a string at the first character that is not 0, e.g.
$ID = 'ABC-000000160810';
I want to split the ID so it looks like this:
$split_ID = 160810;
I tried just taking the last 6 digits, but the number of digits is not always six, so I need to split at the first non-zero digit. What is the easiest way to achieve this?
Thanks.

Here's a way using a regular expression:
$id = 'ABC-000000160810';
preg_match('/-0*([1-9][0-9]*)/', $id, $matches);
$split_id = $matches[1];
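The snippet above reads $matches[1] without checking whether the pattern matched. A minimal sketch with that check added, where the function name extract_number is a hypothetical wrapper:

```php
<?php
// Hypothetical helper: returns the digits after the dash without leading
// zeroes, or null when the ID does not have the expected shape.
function extract_number(string $id): ?string
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match('/-0*([1-9][0-9]*)/', $id, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

echo extract_number('ABC-000000160810'); // 160810
```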

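You can use ltrim if you only want to remove leading zeroes. ltrim only strips characters from the start of the string, so the ABC- prefix has to be cut off first.
$ID = 'ABC-000000160810';
$split_ID = ltrim(substr($ID, 4), '0'); // 160810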

Use ltrim to remove leading characters.
$id = 'ABC-00001234';
$numeric = ltrim(mb_substr($id, mb_strpos($id, '-') + 1), '0');
echo $numeric; // 1234
The above requires the mbstring extension to be enabled. If you encounter an error, either enable the extension or use the non-multibyte functions substr and strpos. Probably you should get in the habit of using the mb_ string functions.
This should also work:
const CHAR_MASK = 'a..zA..Z-0';
$id = 'ABC-00001234';
$numeric = ltrim($id, CHAR_MASK);
echo $numeric; // 1234

For your example "ABC-000000160810" you might use a regex that matches the first part up to the first non-zero digit and then uses \K to drop the previously consumed characters from the final match.
[^-]+-0+\K[1-9][0-9]*
[^-]+ Match any character except - one or more times, using a negated character class
- Match a literal -
0+ Match a zero one or more times (use 0* instead if the number may have no leading zeroes)
\K Reset the starting point of the reported match
[1-9][0-9]* Match your value, starting with a digit 1-9
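A small sketch of how this pattern could be used from PHP. Because of \K, the value ends up in $matches[0] rather than in a capture group; the function name number_after_zeroes is hypothetical.

```php
<?php
// Hypothetical helper: uses \K so the full match ($matches[0]) contains only
// the digits that follow the leading zeroes.
function number_after_zeroes(string $id): ?string
{
    if (preg_match('/[^-]+-0+\K[1-9][0-9]*/', $id, $matches) === 1) {
        return $matches[0];
    }
    // No match, e.g. there is no dash or no zero directly after it.
    return null;
}

echo number_after_zeroes('ABC-000000160810'); // 160810
```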

You can substr off the ABC- part and multiply by 1 to turn it into a number, which drops the leading zeroes.
$ID = "ABC-000000160810";
echo substr($ID, 4) * 1; // 160810

Using regex to extract first half of string

I have variable strings like the below:
The.Test.String.A01Y18.123h.WIB-DI.DO5.1.K.314-ECO
The.Regex.F05P78.123h.WIB-DI.DO5.1.K.314-EYT
Word.C05F78.342T.DSW-RF.EF5.2.F.342-DDF
I would like to extract this part of each string dynamically in PHP. I was looking at using regex but haven't had much success:
The.Test.String.A01Y18
The.Regex.F05P78
Word.C05F78
And ultimately to:
The Test String A01Y18
The Regex F05P78
Word C05F78
The first part of the text is variable in length, with the words separated by periods. The next part always has the same length and follows this pattern:
One letter, two digits, one letter, two digits (e.g. C05F78)
Anything in the string after that is what I would like to remove.

that's it
$x=array(
"The.Test.String.A01Y18.123h.WIB-DI.DO5.1.K.314-ECO",
"The.Regex.F05P78.123h.WIB-DI.DO5.1.K.314-EYT",
"Word.C05F78.342T.DSW-RF.EF5.2.F.342-DDF"
);
for ($i=0, $tmp_count=count($x); $i<$tmp_count; ++$i) {
echo str_replace(".", " ", preg_replace("/^(.+?)([a-z]{1}[0-9]{2}[a-z]{1}[0-9]{2})\..+$/i", "\\1\\2", $x[$i]))."<br />";
}

Using this regular expression should work, replacing each of your strings with the first capturing group:
^((?:\w+\.)+\w\d{2}\w\d{2}).*
See demo at http://regex101.com/r/fR3pM6
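A minimal sketch of applying this expression in PHP and then turning the periods into spaces, as the question asks. The function name extract_title is hypothetical; if the pattern does not match, preg_replace returns the input unchanged.

```php
<?php
// Hypothetical helper: keeps everything up to and including the
// letter-2digits-letter-2digits code, then replaces periods with spaces.
function extract_title(string $line): string
{
    // Replace the whole line with the first capturing group.
    $prefix = preg_replace('/^((?:\w+\.)+\w\d{2}\w\d{2}).*/', '$1', $line);
    return str_replace('.', ' ', $prefix);
}

$lines = [
    'The.Test.String.A01Y18.123h.WIB-DI.DO5.1.K.314-ECO',
    'The.Regex.F05P78.123h.WIB-DI.DO5.1.K.314-EYT',
    'Word.C05F78.342T.DSW-RF.EF5.2.F.342-DDF',
];
foreach ($lines as $line) {
    echo extract_title($line), "\n";
}
```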

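This works too:
preg_match('/.*[\w\d]{6}/', $stringVariable, $matches)
.* greedily matches any characters and backtracks to the last place where 6 consecutive word characters follow ([\w\d]{6}; the \d is redundant because \w already includes digits). This relies on no later segment containing 6 or more consecutive word characters, which holds for the sample strings.
Result:
Match 1: The.Test.String.A01Y18
Match 2: The.Regex.F05P78
Match 3: Word.C05F78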

Regular expression to match an exact number of occurrence for a certain character

I'm trying to check if a string has a certain number of occurrences of a character.
Example:
$string = '123~456~789~000';
I want to verify if this string has exactly 3 instances of the character ~.
Is that possible using regular expressions?

Yes
/^[^~]*~[^~]*~[^~]*~[^~]*$/
Explanation:
^ ... $ means the whole string in many regex dialects
[^~]* a string of zero or more non-tilde characters
~ a tilde character
The string can have as many non-tilde characters as necessary, appearing anywhere in the string, but must have exactly three tildes, no more and no less.

As a single character is technically a substring, and the task is to count the number of its occurrences, I suppose the most efficient approach is to use the dedicated PHP function substr_count:
$string = '123~456~789~000';
if (substr_count($string, '~') === 3) {
// string is valid
}
Obviously, this approach won't work if you need to count the number of pattern matches (for example, while you can count the number of '0' in your string with substr_count, you'd better use preg_match_all to count digits).
Yet for this specific question it should be faster overall, as substr_count is optimized for one specific goal, counting substrings, whereas preg_match_all is more general-purpose.

I believe this should work for a variable number of characters:
^(?:[^~]*~[^~]*){3}$
The advantage here is that you just replace 3 with however many you want to check.
To make it more efficient, it can be written as
^[^~]*(?:~[^~]*){3}$
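A sketch of building the second pattern for an arbitrary character and count. The function name has_exactly is hypothetical, and it assumes $char is a single one-byte character.

```php
<?php
// Hypothetical helper: true if $s contains exactly $n occurrences of $char.
// Assumes $char is one single-byte character.
function has_exactly(string $s, string $char, int $n): bool
{
    // Escape the character so it is safe inside the pattern and the class.
    $c = preg_quote($char, '/');

    // ^[^c]*(?:c[^c]*){n}$ with the D modifier so $ only matches at the end.
    $pattern = '/^[^' . $c . ']*(?:' . $c . '[^' . $c . ']*){' . $n . '}$/D';

    return preg_match($pattern, $s) === 1;
}

var_dump(has_exactly('123~456~789~000', '~', 3)); // bool(true)
```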

This is what you are looking for. preg_match_all returns the number of full pattern matches:
<?php
$string = '123~456~789~000';
$total = preg_match_all('/~/', $string);
echo $total; // Shows 3

php regex - need 2 groups captured

I need 2 groups captured: 1-expr (can be empty); 2-essi
see code below
$s = 'regular expr<span>essi</span>on contains';
function my_func($matches){
//I need 2 groups captured
//$matches[1] - "expr" (see $s before span) - can be empty, but I still need to capture it
//$matches[2] - "essi" (between spans)
}
$pattern = "???";
echo preg_replace_callback($pattern, my_func, $s);

$pattern = "~(\w*)<span>(\w+)</span>~";
This should do the trick.
If the second group should be able to match empty strings as well, replace the + by another *. Note that \w will match letters, digits and underscores. If that is too much or insufficient, replace it by an appropriate character class.
One more thing: preg_replace_callback expects a callable, so hand in the function name as a string ('my_func') or pass a closure instead of the bare name.
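Putting the pieces together, a minimal runnable sketch. The body of my_func (upper-casing the first group and bracketing the second) is only an illustrative assumption, since the question does not say what the callback should return.

```php
<?php
$s = 'regular expr<span>essi</span>on contains';

function my_func(array $matches): string
{
    // $matches[1] is the word part right before <span> (may be an empty string),
    // $matches[2] is the text between the span tags.
    return strtoupper($matches[1]) . '[' . $matches[2] . ']';
}

$pattern = '~(\w*)<span>(\w+)</span>~';

// The callback is passed by name as a string.
$result = preg_replace_callback($pattern, 'my_func', $s);
echo $result; // regular EXPR[essi]on contains
```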
