A quicker PHP regex test - php

I need to sanitize incoming JSON that typically bears the form
'["P4950Zp550","P4950Zp575","P4950Zp600","P5000Zp550","P5000Zp575","P5000Zp600","P4975Zp550","P4975Zp600"]'
with the number of digits following each P|M, p|m varying between 3 and 5. json_decoding this and then applying the test
preg_match('/(P|M){1}[0-9]{3,5}Z(p|m){1}[0-9]{3,5}/',$value)
eight times in a foreach loop (I always have eight values in the array) would be trivial. However, I am wondering whether I could write a single regex that does this in one go, without first having to json_decode the incoming string. The regex above is at the limit of my regex knowledge.

Decode the JSON and then use a loop:
$json = '["P4950Zp550","P4950Zp,575","P4950Zp600","P5000Zp550","P5000Zp,575","P5000Zp600","P4975Zp550","P4975Zp600"]';
$array = json_decode($json, true);
foreach ($array as $value) {
    if (!preg_match('/^[PM]\d{3,5}Z[pm]\d{3,5}$/', $value)) {
        echo "Invalid value: $value<br>\n";
    }
}
Trying to parse the original JSON with a regexp is a bad idea.
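A minimal sketch of the decode-then-validate approach wrapped in one function. The name isValidCodeList and the default count of eight are assumptions taken from the question, not part of any library.

```php
<?php
// Hypothetical helper: returns true only if $json decodes to an array of
// exactly $expected strings that all match the P/M...Z p/m... format.
function isValidCodeList(string $json, int $expected = 8): bool
{
    $values = json_decode($json, true);
    if (!is_array($values) || count($values) !== $expected) {
        return false; // invalid JSON, not an array, or wrong number of items
    }
    foreach ($values as $value) {
        // The D modifier makes $ match only at the very end of the string.
        if (!is_string($value) || !preg_match('/^[PM]\d{3,5}Z[pm]\d{3,5}$/D', $value)) {
            return false;
        }
    }
    return true;
}

$good = '["P4950Zp550","P4950Zp575","P4950Zp600","P5000Zp550","P5000Zp575","P5000Zp600","P4975Zp550","P4975Zp600"]';
var_dump(isValidCodeList($good));
```

Returning a boolean rather than echoing keeps the check reusable; the caller decides how to report a failure.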

If you truly, deeply want to validate the 8-element array in a single pass while it is still a json string, you can use this:
Pattern: ~^\["[PM]\d{3,5}Z[pm]\d{3,5}(?:","[PM]\d{3,5}Z[pm]\d{3,5}){7}"]$~
This matches the first element, then the seven that follow, all wrapped appropriately.
Code:
$string = '["P4950Zp550","P4950Zp575","P4950Zp600","P5000Zp550","P5000Zp575","P5000Zp600","P4975Zp550","P4975Zp600"]';
if (preg_match('~^\["[PM]\d{3,5}Z[pm]\d{3,5}(?:","[PM]\d{3,5}Z[pm]\d{3,5}){7}"]$~', $string)) {
    echo "pass";
} else {
    echo "fail";
}
// outputs: pass
Sometimes, you just want to bulk validate input :]

See if this helps
(\"[PM]\d{3,5}Z[pm](\,)?\d{3,5}\"(\,)?)*
Here, the expression is enclosed in ()* which groups the inner expression and looks for any number of occurrences (*) of the inner group. You can include the square brackets also if you prefer...

Related

Multiple patterns within regex

I have a json and I need to match all "text" keys as well as the "html" keys.
For example, the json could be like below:
[{
"layout":12,
"text":"Lorem",
"html":"<div>Ipsum</div>"
}]
Or it could be like below:
[{
"layout":12,
"settings":{
"text":"Lorem",
"atts":{
"html":"<div>Ipsum</div>"
}
}
}]
The json is not always using the same structure so I have to match the keys and get their values using preg_match_all. I have tried the following to get the value of the "text" key:
preg_match_all('|"text":"([^"]*)"|',$json,$match_txt,PREG_SET_ORDER);
The above works fine for matching a single key. When it comes to matching a second key ("html" in this case) it just doesn't work. I have tried the following:
preg_match_all('|"text|html":"([^"]*)"|',$json,$match_txt,PREG_SET_ORDER);
Can you please give me some hints why the OR operator (text|html) doesn't work? Strangely, the above (multi-pattern) regex works fine when I test it in an online tester but it doesn't work in my php files.
Fixing text|html
You should put text|html in a group. Without one, the | splits the entire expression in two, so it matches either "text or html":"([^"]*)".
|"(text|html)":"([^"]*)"|
Delimiters
This won't currently work with your delimiters though as you use the pipe (|) inside of the expression. You should change your delimiters to something else, here I've used /.
/"(text|html)":"([^"]*)"/
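Keeping the pipe as your delimiter is not a good option here. Escaping the pipe inside the expression lets the pattern compile, but PCRE then treats \| as a literal pipe character rather than as alternation, so the following only matches a key that is literally named text|html:
|"(text\|html)":"([^"]*)"|
preg_quote() does not help either: it escapes every regex metacharacter (parentheses, brackets, the asterisk and so on), so the whole expression would be matched as literal text. It is meant for quoting literal strings that you embed in a pattern, not for quoting the pattern itself.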
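For illustration, a short sketch that runs the grouped, slash-delimited pattern over the nested sample and collects each key with its value. The variable names are placeholders, and the pattern still assumes that values contain no escaped quotes.

```php
<?php
$json = '[{"layout":12,"settings":{"text":"Lorem","atts":{"html":"<div>Ipsum</div>"}}}]';

// Group 1 captures the key name, group 2 the raw (still JSON-encoded) value.
preg_match_all('/"(text|html)":"([^"]*)"/', $json, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    $found[$m[1]] = $m[2];
}
print_r($found);
```

With PREG_SET_ORDER each entry of $matches is one complete match, so the key and its value stay together.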
Parsing JSON
Although that regex will work, it will need additional parsing and it makes more sense to use a recursive function for this.
json_decode() will decode a JSON string into the relative data types. In the example below I've passed an additional argument true which means I will get an associative array where you would normally get an object.
Once findKeyData() is called, it will recursively call itself and work through all of the data until it finds the specified key. If not, it returns null.
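function findKeyData($data, $key) {
    foreach ($data as $k => $v) {
        if (is_array($v)) {
            // Search nested arrays first and stop at the first hit.
            $found = findKeyData($v, $key);
            if (! is_null($found)) {
                return $found;
            }
        }
        // Strict comparison: before PHP 8, 0 == 'text' is true, so a list index could match by accident.
        if ($k === $key) {
            return $v;
        }
    }
    return null;
}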
$json1 = json_decode('[{
"layout":12,
"text":"Lorem",
"html":"<div>Ipsum</div>"
}]', true);
$json2 = json_decode('[{
"layout":12,
"settings":{
"text":"Lorem",
"atts":{
"html":"<div>Ipsum</div>"
}
}
}]', true);
var_dump(findKeyData($json1, 'text')); // Lorem
var_dump(findKeyData($json1, 'html')); // <div>Ipsum</div>
var_dump(findKeyData($json2, 'text')); // Lorem
var_dump(findKeyData($json2, 'html')); // <div>Ipsum</div>
preg_match_all('/"(?:text|html)":"([^"]*)"/',$json,$match_txt,PREG_SET_ORDER);
print $match_txt[0][0]." with group 1: ".$match_txt[0][1]."\n";
print $match_txt[1][0]." with group 1: ".$match_txt[1][1]."\n";
returns:
$ php -f test.php
"text":"Lorem" with group 1: Lorem
"html":"<div>Ipsum</div>" with group 1: <div>Ipsum</div>
The enclosing parentheses are needed : (?:text|html); I couldn't get it to work on https://regex101.com without. ?: means the content of the parentheses will not be captured (i.e., not available in the results).
I also replaced the pipe (|) delimiter with forward slashes since you also have a pipe inside the regex. Escaping the pipe inside the regex is not an alternative: with | as the delimiter, \| is passed to PCRE as a literal pipe, so the alternation is lost.
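I don't see any reason to use a regex to parse a valid json string:
// array_walk_recursive() takes its array by reference, so decode into a variable first
$data = json_decode($json, true);
array_walk_recursive($data, function ($v, $k) {
    if (in_array($k, ['text', 'html'], true)) {
        echo "$k -> $v\n";
    }
});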
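If the values are needed afterwards rather than echoed, the same walk can collect them. This is only a sketch; collectKeys is a hypothetical helper name, not an existing function.

```php
<?php
// Hypothetical helper: gather every scalar value stored under one of $keys,
// at any depth of the decoded JSON.
function collectKeys(string $json, array $keys): array
{
    $data = json_decode($json, true);
    $found = [];
    if (!is_array($data)) {
        return $found; // invalid JSON or a bare scalar
    }
    // $found is captured by reference so the closure can append to it.
    array_walk_recursive($data, function ($value, $key) use ($keys, &$found) {
        if (in_array($key, $keys, true)) {
            $found[$key][] = $value;
        }
    });
    return $found;
}

$json2 = '[{"layout":12,"settings":{"text":"Lorem","atts":{"html":"<div>Ipsum</div>"}}}]';
print_r(collectKeys($json2, ['text', 'html']));
```

Each key maps to a list because the same key may appear several times in the document.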
You use the pipe | character as the delimiter, and I think this breaks your regexp. Does it work using another delimiter (and a group around the alternation), like
preg_match_all('#"(?:text|html)":"([^"]*)"#',$json,$match_txt,PREG_SET_ORDER);
?

stristr Case-insensitive search PHP

Please excuse my noob-iness!
I have a $string, and would like to see if it contains any one or more of a group of words, words like ct, fu, sl** etc. So I was thinking I could do:
if(stristr("$input", "dirtyword1"))
{
$input = str_ireplace("$input", "thisWillReplaceDirtyWord");
}
elseif(stristr("$input", "dirtyWord1"))
{
$input = str_ireplace("$input", "thisWillReplaceDirtyWord2");
}
...ETC. BUT, I don't want to have to keep doing if/elseif/elseif/elseif/elseif...
Can't I just do a switch statement OR have an array, and then simply say something like?:
$dirtywords = { "f***", "c***", w****", "bit**" };
if(stristr("$input", "$dirtywords"))
{
$input = str_ireplace("$input", "thisWillReplaceDirtyWord");
}
I'd appreciate any help at all
Thank you
$dirty = array("fuc...", "pis..", "suc..");
$censored = array("f***", "p***", "s***");
$input= str_ireplace($dirty, $censored , $input);
Note, that you don't have to check stristr() to do a str_ireplace()
http://php.net/manual/en/function.str-ireplace.php
If search and replace are arrays, then str_ireplace() takes a value from each array and uses them to do search and replace on subject. If replace has fewer values than search, then an empty string is used for the rest of replacement values. If search is an array and replace is a string, then this replacement string is used for every value of search.
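A small sketch of the array form described above, using an associative map so each word sits next to its replacement. The word list and the variable names are placeholders.

```php
<?php
// Placeholder word list: keys are the terms to find, values their replacements.
$censorMap = [
    'darn' => 'd***',
    'heck' => 'h***',
];

$input = 'Well DARN, what the Heck is this?';

// $count receives the number of replacements that were performed.
$clean = str_ireplace(array_keys($censorMap), array_values($censorMap), $input, $count);

echo $clean, "\n";
echo $count, "\n";
```

Note that str_ireplace() matches substrings, so a listed word is also replaced when it occurs inside a longer word.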
Surely not the best solution since I don't know too much PHP, but what about a loop?
foreach (array("word1", "word2") as $word)
{
    if (stristr($input, $word) !== false)
    {
        $input = str_ireplace($word, "thisWillReplaceDirtyWord", $input);
    }
}
When you have several objects to test, think "loop" ;-)

Get more backreferences from regexp than parenthesis

Ok this is really difficult to explain in English, so I'll just give an example.
I am going to have strings in the following format:
key-value;key1-value;key2-...
and I need to extract the data to be an array
array('key'=>'value','key1'=>'value1', ... )
I was planning to use regexp to achieve (most of) this functionality, and wrote this regular expression:
/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/
to work with preg_match and this code:
for ($l = count($matches),$i = 1;$i<$l;$i+=2) {
$parameters[$matches[$i]] = $matches[$i+1];
}
However the regexp obviously returns only 4 backreferences - first and last key-value pairs of the input string. Is there a way around this? I know I can use regex just to test the correctness of the string and use PHP's explode in loops with perfect results, but I'm really curious whether it's possible with regular expressions.
In short, I need to capture an arbitrary number of these key-value; pairs in a string by means of regular expressions.
You can use a lookahead to validate the input while you extract the matches:
/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/
(?=(?:\w++-[^;-]++;?)++$) is the validation part. If the input is invalid, matching will fail immediately, but the lookahead still gets evaluated every time the regex is applied. In order to keep it (along with the rest of the regex) in sync with the key-value pairs, I used \G to anchor each match to the spot where the previous match ended.
This way, if the lookahead succeeds the first time, it's guaranteed to succeed every subsequent time. Obviously it's not as efficient as it could be, but that probably won't be a problem--only your testing can tell for sure.
If the lookahead fails, preg_match_all() will return 0 (no matches; false is only returned on an error). If it succeeds, the matches will be returned in an array of arrays: one for the full key-value pairs, one for the keys, one for the values.
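As a rough sketch, the pattern above could be wrapped like this so that validation and extraction happen in a single preg_match_all() call. The function name parsePairs is an assumption made for the example.

```php
<?php
// Hypothetical helper: returns key => value pairs, or null if the whole
// string is not a well-formed "key-value;key-value" list.
function parsePairs(string $input): ?array
{
    $pattern = '/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/';
    if (!preg_match_all($pattern, $input, $m)) {
        return null; // 0 matches (invalid input) or false (regex error)
    }
    // $m[1] holds all keys and $m[2] all values, in matching order.
    return array_combine($m[1], $m[2]);
}

var_dump(parsePairs('key-value;key1-value1;key2-value2;'));
var_dump(parsePairs('key-value-value;key1-value1'));
```

If a key occurs more than once, array_combine() keeps only its last value.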
Regex is a powerful tool, but sometimes it's not the best approach.
$string = "key-value;key1-value";
$array = array();
$s = explode(";", $string);
foreach ($s as $k) {
    $e = explode("-", $k);
    $array[$e[0]] = $e[1];
}
print_r($array);
Use preg_match_all() instead. Maybe something like:
$matches = $parameters = array();
$input = 'key-value;key1-value1;key2-value2;key123-value123;';
preg_match_all("/(\w+)-([^-;]+)/", $input, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$parameters[$match[1]] = $match[2];
}
print_r($parameters);
EDIT:
To first validate that the input string conforms to the pattern, just use:
if (preg_match("/^((\w+)-([^-;]+);)+$/", $input) > 0) {
    /* do the preg_match_all stuff */
}
EDIT2: if the final semicolon is optional:
if (preg_match("/^(\w+-[^-;]+;)*\w+-[^-;]+;?$/", $input) > 0) {
    /* do the preg_match_all stuff */
}
No. Newer matches overwrite older matches. Perhaps the limit argument of explode() would be helpful when exploding.
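To illustrate the limit argument, here is a lenient sketch that splits each segment on its first hyphen only. It does not validate the input, and explodePairs is a hypothetical name.

```php
<?php
// Hypothetical helper: split on ";" and then on the first "-" only, so a
// value may itself contain hyphens. Empty segments (e.g. after a trailing
// ";") and segments without a "-" are skipped.
function explodePairs(string $input): array
{
    $result = [];
    foreach (explode(';', $input) as $segment) {
        $parts = explode('-', $segment, 2); // limit 2: at most key and value
        if (count($parts) === 2 && $parts[0] !== '') {
            $result[$parts[0]] = $parts[1];
        }
    }
    return $result;
}

print_r(explodePairs('key-value;key1-a-b;'));
```

Whether a hyphen inside a value should be accepted at all depends on the input format; the regexes above reject it.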
what about this solution:
$samples = array(
"good" => "key-value;key1-value;key2-value;key5-value;key-value;",
"bad1" => "key-value-value;key1-value;key2-value;key5-value;key-value;",
"bad2" => "key;key1-value;key2-value;key5-value;key-value;",
"bad3" => "k%ey;key1-value;key2-value;key5-value;key-value;"
);
foreach ($samples as $name => $value) {
    if (preg_match("/^(\w+-\w+;)+$/", $value)) {
        printf("'%s' matches\n", $name);
    } else {
        printf("'%s' not matches\n", $name);
    }
}
I don't think you can do both validation and extraction of data with one single regexp, as you need anchors (^ and $) for validation and preg_match_all() for the data, but if you use anchors with preg_match_all() it will only return the last set matched.

PHP - Sanitise a comma separated string

What would be the most efficient way to clean a user input that is a comma-separated string made up entirely of numbers, e.g.
2,40,23,11,55
I use this function on a lot of my inputs
function clean($input){ $input=mysql_real_escape_string(htmlentities($input,ENT_QUOTES)); return $input; }
And on simple integers I do:
if (!filter_var($_POST['var'], FILTER_VALIDATE_INT)) {echo('error - bla bla'); exit;}
So should I explode it and then check every element of the array with the code above or maybe replace all occurrences of ',' with '' and then check the whole thing is a number? What do you guys think?
if (ctype_digit(str_replace(",", "", $input))) {
    // all ok. very strict. input can only contain numbers and commas. not even spaces
} else {
    // not ok
}
If it is CSV and there might be spaces around the digits or commas, and maybe even some quotation marks, better use a regex to check that it matches.
if (!preg_match('/\A\d+(,\d+)*\z/', $input)) die('bad input');
If you want to transform a comma-separated list instead of simply rejecting it if it's not formed correctly, you could do it with array_map() and avoid writing an explicit loop.
$sanitized_input = implode(",", array_map("intval", explode(",", $input)));
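If bad input should be rejected rather than silently converted, one possible sketch validates each element with filter_var(). The helper name parseIntList and the choice to allow only non-negative integers are assumptions.

```php
<?php
// Hypothetical helper: returns the list as integers, or null if any element
// is not a valid non-negative integer.
function parseIntList(string $input): ?array
{
    $numbers = [];
    foreach (explode(',', $input) as $item) {
        $n = filter_var($item, FILTER_VALIDATE_INT, ['options' => ['min_range' => 0]]);
        // Compare with false explicitly: a valid 0 is falsy but must be accepted.
        if ($n === false) {
            return null;
        }
        $numbers[] = $n;
    }
    return $numbers;
}

var_dump(parseIntList('2,40,23,11,55'));
var_dump(parseIntList('2,,3'));
```

The strict comparison matters because a plain !filter_var(...) check would reject a legitimate 0.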
I would filter instead of error checking on simple input, though only 'cause I'm lazy, I suppose, and usually in a web context there's way too many cases to handle on what could be coming in that I wouldn't expect: Simple filter below.
<?php
$input = '234kljsalkdfj234a,a, asldkfja 345345sd,f jasld,f234l2342323##$##';
function clean($dirty) { // Essentially allows numbers and commas, just strips everything else.
    return preg_replace('/[^0-9,]/', "", (string) $dirty);
}
$clean = clean($input);
echo $clean;
// Result: 234234,,345345,,2342342323
// Note how it doesn't deal with adjacent filtered-to-empty commas, though you could handle those in the explode. *shrugs*
?>
Here's the code and the output on codepad:
http://codepad.org/YfSenm9k

Parse multiple predictably formatted substrings of user data existing in a single string

I have a really long string in a certain pattern such as:
userAccountName: abc userCompany: xyz userEmail: a#xyz.com userAddress1: userAddress2: userAddress3: userTown: ...
and so on. This pattern repeats.
I need to find a way to process this string so that I have the values of userAccountName:, userCompany:, etc. (i.e. preferably in an associative array or some such convenient format).
Is there an easy way to do this or will I have to write my own logic to split this string up into different parts?
Simple regular expressions like this userAccountName:\s*(\w+)\s+ can be used to capture matches and then use the captured matches to create a data structure.
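For example, a sketch that applies that expression with preg_match_all() to pull every account name out of a string containing two records. The sample data is made up for the illustration.

```php
<?php
// Made-up sample containing two records.
$text = 'userAccountName: abc userCompany: xyz userAccountName: def userCompany: uvw ';

// Group 1 captures the word that follows each "userAccountName:" label.
preg_match_all('/userAccountName:\s*(\w+)\s+/', $text, $m);

print_r($m[1]);
```

The same call with a different label collects another field; the results can then be combined by index into one array per record.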
If you can arrange for the data to be formatted as it is in a URL (ie, var=data&var2=data2) then you could use parse_str, which does almost exactly what you want, I think. Some mangling of your input data would do this in a straightforward manner.
You might have to use regex or your own logic.
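Are you guaranteed that a word followed by a colon (such as "foo:") never appears within the values themselves? If so, you could use preg_split() with PREG_SPLIT_DELIM_CAPTURE to split the string into an array of alternating keys and values. (A plain explode() on ": " is not enough, because each piece would contain a value followed by the next key.) You'd then have to walk through this array and format it the way you want. Here's a rough (probably inefficient) example I threw together quickly:
<?php
// Split on "key:" tokens and keep the captured key names in the result.
$keysAndValuesArray = preg_split('/\s*(\w+):\s*/', $dataString, -1, PREG_SPLIT_DELIM_CAPTURE);
array_shift($keysAndValuesArray); // drop the text before the first key (normally empty)
$firstKeyName = 'userAccountName';
$associativeDataArray = array();
$currentIndex = -1;
$numItems = count($keysAndValuesArray);
for ($i = 0; $i + 1 < $numItems; $i += 2) {
    if ($keysAndValuesArray[$i] == $firstKeyName) {
        // A new record starts each time the first key reappears.
        $associativeDataArray[] = array();
        ++$currentIndex;
    }
    $associativeDataArray[$currentIndex][$keysAndValuesArray[$i]] = $keysAndValuesArray[$i + 1];
}
var_dump($associativeDataArray);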
If you can write a regexp (for my example I'm assuming there are no colons in the values), you can parse it with preg_split or preg_match_all like this:
<?php
$raw_data = "userAccountName: abc userCompany: xyz";
$raw_data .= " userEmail: a#xyz.com userAddress1: userAddress2: ";
$data = array();
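// The /([^:]*\s+)?/ part works because the regexp is "greedy" and then
// backtracks to the last whitespace before the next key
if (preg_match_all('/([a-z0-9_]+):\s+([^:]*\s+)?/i', $raw_data,
        $items, PREG_SET_ORDER)) {
    foreach ($items as $item) {
        // Group 2 is absent when a key has no value
        $data[$item[1]] = isset($item[2]) ? trim($item[2]) : '';
    }
    print_r($data);
}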
?>
If that's not the case, please describe the grammar of your string in a bit more detail.
PCRE is included in PHP and can meet your needs using a regexp with named groups, like:
if ($c = preg_match_all("/userAccountName: (?<userAccountName>\w+) userCompany: (?<userCompany>\w+) userEmail: /", $txt, $matches))
{
    // With preg_match_all(), each named group holds an array with one entry per match
    $userAccountName = $matches['userAccountName'];
    $userCompany = $matches['userCompany'];
    // and so on...
}
The most difficult part is getting the right regexp for your needs.
You can have a look at http://txt2re.com for some help.
I think the solution closest to what I was looking for, I found at http://www.justin-cook.com/wp/2006/03/31/php-parse-a-string-between-two-strings/. I hope this proves useful to someone else. Thanks everyone for all the suggested solutions.
If I were you, I'd try to convert the strings into JSON format with some regexp.
Then simply use JSON.
