PHP regexp for escaping characters

I have a string that the user may split manually using commas.
For example, the string value1,value2,value3 should result in the array:
["value1", "value2", "value3"]
Now what if the user wants a comma inside a value? I would like to solve that by letting the user escape a comma with a second comma or a backslash. For example, the string
"Hi, Stackoverflow" would be written as "Hi,, Stackoverflow" or "Hi\, Stackoverflow".
I find it difficult to parse such a string, however. I have tried preg_split, but a lookbehind or lookahead cannot tell whether a run of characters has an even or odd length. Furthermore, the backslashes and doubled commas used for escaping must be removed afterwards, which probably requires an additional replace step.

$text = 'Hello, World \,asdas, 123';
$data = preg_split('/(?<=[^\\\]),/',$text);
print_r($data);
Result
Array ( [0] => Hello [1] => World \,asdas [2] => 123 )

For this I would use preg_replace_callback, which lets you count the escape characters and decide what to do with them. If a comma turns out not to be escaped, replace it with a non-printable character that the user should never enter, and then explode on that character:
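<?php
$str = "One,Two\\, Two\\\\,Three"; // i.e. One,Two\, Two\\,Three
$delimiter = chr(0x0B); // vertical tab, hope you do not expect it in the input?
// Match each run of backslashes together with the comma that may follow it.
$escaped = preg_replace_callback('/(\\\\)*,?/', function ($m) use ($delimiter) {
    // Collapse every escaped backslash (\\) into a single backslash.
    $unescaped = preg_replace('/\\\\{2}/', '\\', $m[0]);
    if (!isset($m[1]) || strlen($m[0]) % 2) {
        // Odd match length: the comma (if any) is a real separator.
        return str_replace(',', $delimiter, $unescaped);
    }
    // Even match length: a trailing backslash escapes the comma (if present).
    return str_replace('\\,', ',', $unescaped);
}, $str);
$array = explode($delimiter, $escaped);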
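If a placeholder character feels too fragile, the same rules can be applied with a plain character scan and no regex at all. The following is only a sketch: the function name splitEscaped is hypothetical, and it assumes that \, and ,, both mean a literal comma and that \\ means a literal backslash.

```php
<?php
// Hypothetical helper: split on commas, honouring "\," and ",," as escapes.
function splitEscaped(string $input): array
{
    $fields = [];
    $current = '';
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        $next = $i + 1 < $length ? $input[$i + 1] : null;
        if ($char === '\\' && ($next === ',' || $next === '\\')) {
            // Backslash escape: keep the escaped character, skip the backslash.
            $current .= $next;
            $i++;
        } elseif ($char === ',' && $next === ',') {
            // Doubled comma: one literal comma.
            $current .= ',';
            $i++;
        } elseif ($char === ',') {
            // Unescaped comma: end of the current field.
            $fields[] = $current;
            $current = '';
        } else {
            $current .= $char;
        }
    }
    $fields[] = $current;
    return $fields;
}

$example = splitEscaped('Hi,, Stackoverflow,value2');
```

Note that with this convention a doubled comma can never stand for an empty field, which may or may not be acceptable for your input.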

Related

Replace whole words from blacklist array instead of partial matches

I have an array of words
$banned_names = array('about','access','account');
The actual array is very long and contains bad words, so to avoid breaking any rules I have only shown an example. The issue I'm having is the following:
$title = str_ireplace($filterWords, '****', $dn1['title']);
This works; however, one of my filtered words is 'rum', and if I post the word 'forum' it is displayed as 'fo****'.
So I need to replace a word with **** only if it matches a whole word from the array. For example, the phrase "Lets check the forum and see if anyone has rum" should become "Lets check the forum and see if anyone has ****".
Similar to the other answers but this uses \b in regex to match word boundaries (whole words). It also creates the regex-compatible banned list on the fly before passing to preg_replace_callback().
$dn1['title'] = 'access forum';
$banned_names = array('about','access','account','rum');
$banned_list = array_map(function($r) { return '/\b' . preg_quote($r, '/') . '\b/'; }, $banned_names);
$title = preg_replace_callback($banned_list, function($m) {
    // keep the first letter and mask the rest of the word
    return $m[0][0] . str_repeat('*', strlen($m[0]) - 1);
}, $dn1['title']);
echo $title; // a***** forum
You can use regex with \W to match a "non-word" character:
var_dump(preg_match('/\Wrum\W/i', 'the forum thing')); // returns 0 i.e. doesn't match
var_dump(preg_match('/\Wrum\W/i', 'the rum thing')); // returns 1 i.e. matches
The preg_replace() method takes an array of filters like str_replace() does, but you'll have to adjust the list to include the pattern delimiters and the \W on both sides. You could store the full patterns statically in your list:
$banlist = ['/\Wabout\W/i','/\Waccess\W/i', ... ];
preg_replace($banlist, '****', $text);
Or adjust the array on the fly to add those bits.
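One caveat with \W is that it has to consume a character on each side, so a banned word at the very start or end of the text is missed and the neighbouring characters are replaced along with it. A possible variation, sketched here with \b instead (a zero-width word boundary), builds a single pattern on the fly; the variable names are only illustrative.

```php
<?php
$bannedNames = ['about', 'access', 'account', 'rum'];

// Quote each word so any regex metacharacters in the list are taken literally.
$quoted = array_map(function ($word) {
    return preg_quote($word, '/');
}, $bannedNames);

// One case-insensitive alternation, anchored on word boundaries.
$pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

$title = preg_replace($pattern, '****', 'Lets check the forum and see if anyone has rum');
$atStart = preg_replace($pattern, '****', 'Rum is banned');
```

Because \b matches a position rather than a character, the surrounding spaces and punctuation stay in place.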
You can split the haystack into an array of words and use preg_replace() with patterns anchored to the beginning and end of the string, so that you only match whole words. Alternatively, you can add spaces around each word and keep using str_ireplace(), but that option fails when the word is the first or last word in the string being checked.
Adding spaces (will miss the first/last word, not recommended):
You'll have to modify your filter array first, of course. And yes, the foreach could be simpler, but I hope this makes clear what I'm doing and why.
foreach ($filterWords as $key => $value) {
    $filterWords[$key] = " " . $value . " ";
}
// put the spaces back around the mask so the neighbouring words stay separated
$title = str_ireplace($filterWords, " **** ", $dn1['title']);
OR
Breaking up the long string (recommended):
foreach ($filterWords as $key => $value) {
    // anchor each word to the beginning and end of the string
    $filterWords[$key] = "/^" . preg_quote($value, "/") . "$/i";
}
// preg_replace() returns an array here, so join the words back together
$title = implode(" ", preg_replace($filterWords, "****", explode(" ", $dn1['title'])));

Regular expression needed for PHP preg_split

I need help with a regular expression in PHP.
I have one string containing a lot of data and the format could be like this.
key=value,e4354ahj\,=awet3,asdfa\=asdfa=23f23
So I have two delimiters, , and =, where , separates the key-value pairs and = separates a key from its value. The catch is that keys and values can contain the same symbols , and =, but they will always be escaped. So I can't use explode. I need to use preg_split, but I am no good at regular expressions.
Could someone give me a hand with this one?
You need to use negative lookbehind:
// 4 backslashes because they are in a PHP string, so PHP translates them to \\
// and then the regex engine translates the \\ to a literal \
$keyValuePairs = preg_split('/(?<!\\\\),/', $input);
This will split on every , that is not escaped, so you get key-value pairs. You can do the same for each pair to separate the key and value:
list($key, $value) = preg_split('/(?<!\\\\)=/', $pair);
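Putting the two splits together, a rough sketch of the whole parse could look like the following. The helper name unescapeDelimiters is hypothetical, and removing the escaping backslashes from the result is an assumption about what you want; drop that step if the backslashes should stay.

```php
<?php
// Hypothetical helper: turn "\," and "\=" back into "," and "=".
function unescapeDelimiters(string $s): string
{
    return preg_replace('/\\\\([,=])/', '$1', $s);
}

$input = 'key=value,e4354ahj\,=awet3,asdfa\=asdfa=23f23';
$result = [];

// Split into pairs on every comma that is not preceded by a backslash.
foreach (preg_split('/(?<!\\\\),/', $input) as $pair) {
    // Split each pair on the first unescaped "="; pad in case there is none.
    [$key, $value] = array_pad(preg_split('/(?<!\\\\)=/', $pair, 2), 2, '');
    $result[unescapeDelimiters($key)] = unescapeDelimiters($value);
}
```

Keep in mind that a plain negative lookbehind treats any delimiter after a backslash as escaped, so an escaped backslash directly before a delimiter (\\,) would not be split correctly.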
@Jon's answer is awesome. I thought of providing a solution that matches the string instead:
preg_match_all('#(.*?)(?<!\\\\)=(.*?)(?:(?<!\\\\),|$)#', $string, $m);
// You'll find the keys in $m[1] and the values in $m[2]
$array = array_combine($m[1], $m[2]);
print_r($array);
Output:
Array
(
[key] => value
[e4354ahj\,] => awet3
[asdfa\=asdfa] => 23f23
)
Explanation:
(.*?)(?<!\\\\)= : match anything and group it until = not preceded by \
(.*?)(?:(?<!\\\\),|$) : match anything and group it until , not preceded by \ or end of line.

preg replace complete word using partial patterns in PHP

I am using preg_replace($oldWords, $newWords, $string); to replace an array of words.
I wish to replace all words starting with foo with hello, and all words starting with bar with world,
i.e. foo123 should change to hello, foobar should change to hello, barx5 should change to world, etc.
If my arrays are defined as:
$oldWords = array('/foo/', '/bar/');
$newWords = array('hello', 'world');
then foo123 changes to hello123 and not hello. Similarly, barx5 changes to worldx5 and not world.
How do I replace the complete matched word?
Thanks.
This is actually pretty simple if you understand regex, as well as how preg_replace works.
Firstly, your replacement arrays are incorrectly formed. What is:
$oldWords = array('\foo\', '\bar\');
Should instead be:
$oldWords = array('/foo/', '/bar/');
A backslash in a PHP string escapes the character after it, so in '\foo\' the final \' escapes the closing quote, the string never terminates, and the rest of your code breaks.
As to your actual question, however, you can achieve the desired effect with this:
$oldWords = array('/foo\w*/', '/bar\w*/');
\w matches any word character, while * is a quantifier meaning zero or more repetitions.
With those two additions the regex matches foo plus any number of word characters directly after it, and preg_replace then replaces the whole match.
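Since the question asks for words that start with foo or bar, it may also be worth anchoring the pattern at a word boundary; otherwise /foo\w*/ also matches in the middle of a longer word. A small sketch, where the \b prefix and the sample sentence are my additions:

```php
<?php
// \b makes sure foo/bar sits at the start of a word.
$oldWords = ['/\bfoo\w*/', '/\bbar\w*/'];
$newWords = ['hello', 'world'];

$string = 'foo123 foobar barx5 rebar';
$result = preg_replace($oldWords, $newWords, $string);
// "rebar" is left alone because its "bar" does not start the word.
```

The patterns are applied one after another, so foobar is already hello by the time the bar pattern runs.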
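One way to do it is to loop through the words and check each one. Since we are only checking the first three letters, I would use substr() instead of a regex, because regex functions tend to be slower for a check this simple.
$words = explode(' ', $string);
foreach ($words as &$word) { // by reference, so the assignment below sticks
    $prefix = substr($word, 0, 3); // first three characters
    if ($prefix === 'foo') {
        $word = 'hello';
    } elseif ($prefix === 'bar') {
        $word = 'world';
    }
}
unset($word); // break the reference left over from the loop
$string = implode(' ', $words);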

Regex escape specific characters

I'm using preg_split to make array with some values.
If I have a value such as 'This*Value', preg_split will split it into array('This', 'Value') because of the * in the value, but I want to split only where I specified, not at the * inside the value. How can I escape the value so that symbols in the string do not affect the expression?
Example:
// Cut into {$1:$2}
$str = "{Some:Value*Here}";
$result = preg_split("/[\{(.*)\:(.*)\}]+/", $str, -1, PREG_SPLIT_NO_EMPTY);
// Result:
Array(
'Some',
'Value',
'Here'
);
// Results wanted:
Array(
'Some',
'Value*Here'
);
The [ and ] delimit a character class, so any single character listed inside them matches. Try this one, but don't split on it; use preg_match and look at the match's captured groups.
"/(\{([^:]*)\:([^:]*)\})+/"
Original answer (which does not apply to the OP's problem):
If you want to escape * in your values with \ like this\*value, you can split on this regex:
(?<!\\)\*
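Both ideas from this answer can be tried out quickly. The sketch below uses a slightly simplified version of the capturing pattern (no outer group, and } excluded from the second part), plus a made-up sample string for the escaped-* split.

```php
<?php
// Capture the two parts instead of splitting on the braces and the colon.
$str = '{Some:Value*Here}';
$result = [];
if (preg_match('/\{([^:]*):([^:}]*)\}/', $str, $m)) {
    $result = [$m[1], $m[2]];
}

// Split on * only when it is not escaped with a backslash.
// Four backslashes in the PHP string become one literal backslash in the regex.
$parts = preg_split('/(?<!\\\\)\*/', 'this\*value*other');
```

The backslash stays in the split output, so it still has to be stripped afterwards if you don't want it in the values.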
Your current regular expression is a little... wild. Most special characters inside a character class are treated literally, so it can be greatly simplified:
$str = "{Some:Value*Here}";
$result = preg_split("/[{}:]+/", $str, -1, PREG_SPLIT_NO_EMPTY);
And now $result looks like this:
array(2) {
[0] => string(4) "Some"
[1] => string(10) "Value*Here"
}
The correct and safest solution to your problem is to use preg_quote. If the string contains chars that shall not be quoted, you need to str_replace them back after quoting.
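As a small illustration of preg_quote (the variable names are just for the example), quoting a delimiter that comes from data turns the * into a literal character rather than a quantifier:

```php
<?php
$delimiter = '*';

// Pass the pattern delimiter as the second argument so it gets escaped too.
$pattern = '/' . preg_quote($delimiter, '/') . '/';

$parts = preg_split($pattern, 'This*Value');
```

Here $pattern ends up as /\*/, so the split happens on the literal asterisk only.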

Split text using multiple delimiters into an array of trimmed values

I've got a group of strings which I need to chunk into an array.
The string needs to be split on either /, ,, with, or &.
Unfortunately, a string can contain more than one of these delimiters, so I can't use split() or explode().
For example, a string could say first past/ going beyond & then turn, so I am trying to get an array that would return:
array('first past', 'going beyond', 'then turn')
The code I am currently using is
$splittersArray=array('/', ',', ' with ','&');
foreach($splittersArray as $splitter){
if(strpos($string, $splitter)){
$splitString = split($splitter, $string);
foreach($splitString as $split){
I can't seem to find a function in PHP that allows me to do this.
Do I need to be passing the string back into the top of the funnel, and continue to go through the foreach() after the string has been split again and again?
This doesn't seem very efficient.
Use a regular expression and preg_split.
In the case you mention, you would get the split array with:
$splitString = preg_split('/(\/|,| with |&)/', $string);
To write the pattern concisely, use a character class for the single-character delimiters and add the with delimiter as an alternative after the pipe (the "or" character in regex). Allow an optional space on either side of the group of delimiters so that the values in the output don't need to be trimmed.
I am using the PREG_SPLIT_NO_EMPTY flag in case a delimiter occurs at the start or end of the string and you don't want any empty elements generated.
Code:
$string = 'first past/ going beyond & then turn with everyone';
var_export(
preg_split('~ ?([/,&]|with) ?~', $string, 0, PREG_SPLIT_NO_EMPTY)
);
Output:
array (
0 => 'first past',
1 => 'going beyond',
2 => 'then turn',
3 => 'everyone',
)
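One thing to watch for is that a bare with in the pattern also matches inside longer words such as within. If that matters for your data, a possible variation is to add word boundaries around it and allow any amount of whitespace around the delimiters; the extended sample string here is my own.

```php
<?php
$string = 'first past/ going beyond & then turn with everyone within reach';

// \bwith\b only matches the standalone word; \s* trims surrounding whitespace.
$parts = preg_split('~\s*(?:[/,&]|\bwith\b)\s*~', $string, -1, PREG_SPLIT_NO_EMPTY);
```

With this pattern the last element keeps "within" intact instead of being cut in two.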
