Un-escaping characters with regular expressions

I'm removing certain characters from the string by substituting them:
% -> %%
: -> %c
/ -> %s
The string "%c" is properly escaped into "%%c". However, when I try to reverse it with str_replace( array('%%','%c','%s'), array('%',':','/'), $s), it converts it into ":". That is the documented behaviour of str_replace: the pairs are applied one after another, so "%%c" first becomes "%c" and then ":". That's why I'm looking for a solution with regular expressions.
What should I use to properly decode the escaped string? Thank you.

You need to do the replacement for all escape sequences at once and not successively:
$unescaped = preg_replace_callback('/%([%cs])/', function($match) {
    // map the character that follows "%" back to the original character
    $trans = array('%' => '%', 'c' => ':', 's' => '/');
    return $trans[$match[1]];
}, $str);
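As a minimal sketch of the full round trip, the escaping step can be written with strtr() and the decoding step with the single-pass callback above. The function names escapeStr and unescapeStr are hypothetical and only serve the illustration, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: escape %, : and / in one pass.
function escapeStr(string $s): string
{
    return strtr($s, ['%' => '%%', ':' => '%c', '/' => '%s']);
}

// Hypothetical helper: decode every escape sequence in one pass,
// so the "%" produced by "%%" is never re-read as the start of "%c".
function unescapeStr(string $s): string
{
    $map = ['%' => '%', 'c' => ':', 's' => '/'];
    return preg_replace_callback('/%([%cs])/', fn($m) => $map[$m[1]], $s);
}

echo escapeStr('%c'), "\n";                        // %%c
echo unescapeStr('%%c'), "\n";                     // %c
echo unescapeStr(escapeStr('a:b/c%d')), "\n";      // a:b/c%d
```

Because the regex engine consumes each two-character sequence before moving on, the order of the entries in the map does not matter here.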

You could use a preg_replace pipeline (with a temporary marker):
<?php
$escaped = "Hello %% World%c You'll find your reservation under %s";
echo preg_replace("/%TMP/", "%",
preg_replace("/%s/", "/",
preg_replace("/%c/", ":",
preg_replace("/%%/", "%TMP", $escaped)));
echo "\n";
# Output should be
# Hello % World: You'll find your reservation under /
?>

Judging from your comment (that you want to go from "%%c" to "%c", and not from "%%c" straight to ":"), you can use Gumbo's method with a little modification, I think:
$unescaped = preg_replace_callback('/%(%[%cs])/', function($match) {
return $match[1];
}, $escaped);

Related

php preg_replace not putting dash back in?

So, I'm doing some manipulation on lat/long pairs, and I need to turn this:
39.1889375383777,-94.48019109594397
into:
39.1889375383777 -94.48019109594397
I can't use str_replace, unless I want to have an array of 10 search and 10 replace strings, so I was hoping to use preg_replace:
$query1 = preg_replace( "/([0-9-]),([0-9-])/", "\1 \2", $query );
The problem is that the "-" gets lost:
39.1889375383777 94.48019109594397
Note, that I have a string containing a list of these, trying to do all at once:
[[39.1889375383777,-94.48019109594397],[39.18425796890108,-94.28288005131176],[39.41972019529712,-94.19956344733345],[39.41412315915102,-94.41932608390658],[39.34785744845041,-94.4893603307242],[39.1889375383777,-94.48019109594397]]
I managed to make this work with preg_replace_callback:
$str = preg_replace_callback( "/([0-9-]),([0-9-])/",
function ($matches) {return $matches[1] . " " . $matches[2];},
$query
);
But I'm still not sure why the simpler preg_replace didn't work.

Your main issue is that the double-quoted "\1 \2" defines the string "\x01\x20\x02", where the first character is SOH and the third one is STX (see the ASCII table), because PHP reads \1 and \2 inside double quotes as octal escapes. To define backreferences, you need a literal backslash ("\\1 \\2") or, better, the $n notation, preferably inside a single-quoted string literal.
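A short sketch of the difference, using a single coordinate pair from the question as input (the variable names are illustrative only):

```php
<?php
$query = '39.1889375383777,-94.48019109594397';

// Inside double quotes PHP turns \1 and \2 into control characters
// before preg_replace() ever sees the replacement string.
var_dump("\1 \2" === "\x01 \x02"); // bool(true)

// Single quotes with $n notation pass real backreferences to PCRE.
$fixed = preg_replace('/([0-9-]),([0-9-])/', '$1 $2', $query);
echo $fixed, "\n"; // 39.1889375383777 -94.48019109594397
```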
You can also use a solution without backreferences:
preg_replace('~(?<=\d),(?=-?\d)~', ' ', $str)
Details:
(?<=\d) - a location that is immediately preceded with a digit
, - a comma
(?=-?\d) - a location that is immediately followed with an optional - and a digit.
See the PHP demo:
$str = '[[39.1889375383777,-94.48019109594397],[39.18425796890108,-94.28288005131176],[39.41972019529712,-94.19956344733345],[39.41412315915102,-94.41932608390658],[39.34785744845041,-94.4893603307242],[39.1889375383777,-94.48019109594397]]';
echo preg_replace('~(?<=\d),(?=-?\d)~', ' ', $str);
// => [[39.1889375383777 -94.48019109594397],[39.18425796890108 -94.28288005131176],[39.41972019529712 -94.19956344733345],[39.41412315915102 -94.41932608390658],[39.34785744845041 -94.4893603307242],[39.1889375383777 -94.48019109594397]]

Preg_replace with a part of string but in another format

I have a string with the format:
$string = 'First\Second\Third';
'First' and 'Second' are always the same, but 'Third' is just an example (it can be anything; the parts are always separated by the special character '\').
I want to create a string like that:
$string = 'Second_Third';
I tried to use function preg_replace and this is my code:
preg_replace(Array('/^First\\\/', '/Second\\\[^\s]+/'), Array('', '/Second\_[^\s]+/'), $string);
I have no idea how to do this.
10x

You can use:
$str = preg_replace('/.*?\\\\(Second)\\\\(.+)$/', '$1_$2', $string);

If you're just looking for \, then you don't need a regular expression; a plain string split is enough. Using list and explode, you can do it like so:
list(, $second, $third) = explode("\\", "First\\Second\\Third", 3); // skip 'First'; limit 3 keeps any further backslashes inside $third
echo $second . "_" . $third;
Note that you need to escape \ because it's the escape character :>
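For comparison, here is a self-contained sketch of both approaches from the answers above, with the subject string passed to preg_replace() explicitly and the second and third segments taken from explode(). The variable names are illustrative, and the short list syntax assumes PHP 7.1 or later.

```php
<?php
$string = 'First\Second\Third';

// Regex approach: capture "Second" and the remainder, join them with an underscore.
$viaRegex = preg_replace('/.*?\\\\(Second)\\\\(.+)$/', '$1_$2', $string);

// Non-regex approach: split on the backslash into at most three parts
// and skip the first one.
[, $second, $third] = explode('\\', $string, 3);
$viaExplode = $second . '_' . $third;

echo $viaRegex, "\n";   // Second_Third
echo $viaExplode, "\n"; // Second_Third
```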

Simplest way to use wildcard in string replacement?

I have the following strings:
Johnny arrived at BOB
Peter is at SUSAN
I want a function where I can do this:
$string = stripWithWildCard("Johnny arrived at BOB", "*at ")
$string must equal BOB. Also if I do this:
$string = stripWithWildCard("Peter is at SUSAN", "*at ");
$string must be equal to SUSAN.
What is the shortest way to do this?

A regular expression. You substitute .* for * and replace with the empty string:
echo preg_replace('/.*at /', '', 'Johnny arrived at BOB');
Keep in mind that if the string "*at " is not hardcoded then you also need to quote any characters which have special meaning in regular expressions. So you would have:
$find = '*at ';
$find = preg_quote($find, '/'); // "/" is the delimiter used below
$find = str_replace('\*', '.*', $find); // preg_quote escaped that, unescape and convert
echo preg_replace('/'.$find.'/', '', $input);
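Put together, the steps above could be wrapped into the function the question asks for. This is only a sketch: the name stripWithWildCard comes from the question, while the parameter names and the fallback to the unchanged input on a regex error are assumptions.

```php
<?php
// Hypothetical helper named after the function requested in the question.
function stripWithWildCard(string $input, string $wildcard): string
{
    $pattern = preg_quote($wildcard, '/');        // escape regex metacharacters
    $pattern = str_replace('\*', '.*', $pattern); // turn the escaped * back into "match anything"

    // preg_replace() returns null on error; fall back to the original input then.
    return preg_replace('/' . $pattern . '/', '', $input) ?? $input;
}

echo stripWithWildCard('Johnny arrived at BOB', '*at '), "\n"; // BOB
echo stripWithWildCard('Peter is at SUSAN', '*at '), "\n";     // SUSAN
```

Note that .* is greedy, so everything up to the last occurrence of "at " is removed.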

Convert string into slug with single-hyphen delimiters only

I would like to sanitize a string into a URL slug, so this is basically what I need:
Everything must be removed except alphanumeric characters, spaces and dashes.
Spaces should be converted into dashes.
Eg.
This, is the URL!
must return
this-is-the-url

function slug($z){
$z = strtolower($z);
$z = preg_replace('/[^a-z0-9 -]+/', '', $z);
$z = str_replace(' ', '-', $z);
return trim($z, '-');
}

First strip unwanted characters
$new_string = preg_replace("/[^a-zA-Z0-9\s]/", "", $string);
Then change spaces to hyphens
$url = preg_replace('/\s/', '-', $new_string);
Finally encode it ready for use
$new_url = urlencode($url);

The OP is not explicitly describing all of the attributes of a slug, but this is what I am gathering from the intent.
My interpretation of a perfect, valid, condensed slug aligns with this post: https://wordpress.stackexchange.com/questions/149191/slug-formatting-acceptable-characters#:~:text=However%2C%20we%20can%20summarise%20the,or%20end%20with%20a%20hyphen.
I find none of the earlier posted answers to achieve this consistently (and I'm not even stretching the scope of the question to include multi-byte characters).
convert all characters to lowercase
replace every sequence of one or more non-alphanumeric characters with a single hyphen.
trim the leading and trailing hyphens from the string.
I recommend the following one-liner which doesn't bother declaring single-use variables:
return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($string)), '-');
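As a sketch, the one-liner can be wrapped in a function and run against a few sample inputs. The function name slugify is hypothetical; the body is the one-liner as given.

```php
<?php
// Hypothetical wrapper around the one-liner:
// lowercase, collapse non-alphanumeric runs into "-", trim outer hyphens.
function slugify(string $string): string
{
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($string)), '-');
}

echo slugify('This, is - - the URL!'), "\n";   // this-is-the-url
echo slugify('Mork & Mindy'), "\n";            // mork-mindy
echo slugify('What the_underscore ?!?'), "\n"; // what-the-underscore
```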
I have also prepared a demonstration which highlights what I consider to be inaccuracies in the other answers. (Demo)
'This, is - - the URL!' input
'this-is-the-url' expected
'this-is-----the-url' SilentGhost
'this-is-the-url' mario
'This-is---the-URL' Rooneyl
'This-is-the-URL' AbhishekGoel
'This, is - - the URL!' HelloHack
'This, is - - the URL!' DenisMatafonov
'This,-is-----the-URL!' AdeelRazaAzeemi
'this-is-the-url' mickmackusa
---
'Mork & Mindy' input
'mork-mindy' expected
'mork--mindy' SilentGhost
'mork-mindy' mario
'Mork--Mindy' Rooneyl
'Mork-Mindy' AbhishekGoel
'Mork & Mindy' HelloHack
'Mork & Mindy' DenisMatafonov
'Mork-&-Mindy' AdeelRazaAzeemi
'mork-mindy' mickmackusa
---
'What the_underscore ?!?' input
'what-the-underscore' expected
'what-theunderscore' SilentGhost
'what-the_underscore' mario
'What-theunderscore-' Rooneyl
'What-theunderscore-' AbhishekGoel
'What the_underscore ?!?' HelloHack
'What the_underscore ?!?' DenisMatafonov
'What-the_underscore-?!?' AdeelRazaAzeemi
'what-the-underscore' mickmackusa

This will do it in a Unix shell (I just tried it on my MacOS):
$ tr -cs A-Za-z '-' < infile.txt > outfile.txt
I got the idea from a blog post on More Shell, Less Egg

Try This
function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with single one.
}
Usage:
echo clean('a|"bc!#£de^&$f g');
Will output: abcdef-g
source : https://stackoverflow.com/a/14114419/2439715

Using the intl Transliterator is a good option, because it lets you handle complicated cases with a single set of rules. I added custom rules to illustrate how flexible it can be and how you can keep as much meaningful information as possible. Feel free to remove them and to add your own rules.
$strings = [
'This, is - - the URL!',
'Holmes & Yoyo',
'L’Œil de démon',
'How to win 1000€?',
'€, $ ＆ other currency symbols',
'Und die Katze fraß alle mäuse.',
'Белите рози на София',
'പോണ്ടിച്ചേരി സൂര്യനു കീഴിൽ',
];
$rules = <<<'RULES'
# Transliteration
:: Any-Latin ; :: Latin-Ascii ;
# examples of custom replacements
'&' > ' and ' ;
[^0-9][01]? { € > ' euro' ; € > ' euros' ;
[^0-9][01]? { '$' > ' dollar' ; '$' > ' dollars' ;
:: Null ;
# slugify
[^[:alnum:]&[:ascii:]]+ > '-' ;
:: Lower ;
# trim
[$] { '-' > &Remove() ;
'-' } [$] > &Remove() ;
RULES;
$tsl = Transliterator::createFromRules($rules, Transliterator::FORWARD);
$results = array_map(fn($s) => $tsl->transliterate($s), $strings);
print_r($results);
Unfortunately, the PHP manual says almost nothing about ICU transformations, but they are described in the ICU documentation.

All previous answers deal with URLs, but in case someone needs to sanitize a string for a login (for example) and keep it as text, here you go:
function sanitizeText($str) {
$withSpecCharacters = htmlspecialchars($str);
$splitted_str = str_split($str);
$result = '';
foreach ($splitted_str as $letter){
if (strpos($withSpecCharacters, $letter) !== false) {
$result .= $letter;
}
}
return $result;
}
echo sanitizeText('ОРРииыфвсси ajvnsakjvnHB "&nvsp;\n" <script>alert()</script>');
//ОРРииыфвсси ajvnsakjvnHB &nvsp;\n scriptalert()/script
// No injection possible; as much of the input as possible is kept

function isolate($data) {
$data = trim($data);
$data = stripslashes($data);
$data = htmlspecialchars($data);
return $data;
}

You should use the slugify package and not reinvent the wheel ;)
https://github.com/cocur/slugify

The following will replace spaces with dashes.
$str = str_replace(' ', '-', $str);
The following statement then removes everything except alphanumeric characters and dashes (there are no spaces left, because the previous step replaced them with dashes).
// Char representation 0 - 9 A- Z a- z -
$str = preg_replace('/[^\x30-\x39\x41-\x5A\x61-\x7A\x2D]/', '', $str);
Which is equivalent to
$str = preg_replace('/[^0-9A-Za-z-]+/', '', $str);
FYI: To remove every character outside the printable ASCII range from a string, use
$str = preg_replace('/[^\x20-\x7E]/', '', $str);
\x20 is hexadecimal for the space, the first printable ASCII character, and \x7E is the tilde, the last one, according to Wikipedia: https://en.wikipedia.org/wiki/ASCII#Printable_characters
FYI: look into the Hex Column for the interval 20-7E
Printable characters
Codes 20hex to 7Ehex, known as the printable characters, represent letters, digits, punctuation marks, and a few miscellaneous symbols. There are 95 printable characters in total.
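A small sketch to check the claims above: the hexadecimal and the literal character classes give the same result, and the \x20-\x7E class drops control characters. The sample inputs and variable names are made up for illustration.

```php
<?php
$input = 'a b-C_1!';

// Same character class, written once with hex codes and once with literals.
$viaHex     = preg_replace('/[^\x30-\x39\x41-\x5A\x61-\x7A\x2D]/', '', $input);
$viaLiteral = preg_replace('/[^0-9A-Za-z-]+/', '', $input);

var_dump($viaHex === $viaLiteral); // bool(true)
echo $viaHex, "\n";                // ab-C1

// Keep only printable ASCII: the tab and the bell character are removed.
$printable = preg_replace('/[^\x20-\x7E]/', '', "tab\there\x07!");
echo $printable, "\n";             // tabhere!
```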

preg_replace to capitalize a letter after a quote

I have names like this:
$str = 'JAMES "JIMMY" SMITH'
I run strtolower, then ucwords, which returns this:
$proper_str = 'James "jimmy" Smith'
I'd like to capitalize the second letter of words in which the first letter is a double quote. Here's the regexp. It appears strtoupper is not working - the regexp simply returns the unchanged original expression.
$proper_str = preg_replace('/"([a-z])/',strtoupper('$1'),$proper_str);
Any clues? Thanks!!

Probably the best way to do this is using preg_replace_callback():
$str = 'JAMES "JIMMY" SMITH';
echo preg_replace_callback('!\b[a-z]!', 'upper', strtolower($str));
function upper($matches) {
return strtoupper($matches[0]);
}
You can use the e (eval) flag on preg_replace() but I generally advise against it. Particularly when dealing with external input, it's potentially extremely dangerous.

Use preg_replace_callback, but you don't need to add an extra named function; use an anonymous function instead.
$str = 'JAMES "JIMMY" SMITH';
echo preg_replace_callback('/\b[a-z]/', function ($matches) {
return strtoupper($matches[0]);
}, strtolower($str));
Use of the /e modifier is deprecated as of PHP 5.5 and it no longer works in PHP 7.
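To see why the attempt in the question cannot change the case, note that PHP evaluates strtoupper('$1') before preg_replace() is called, so the replacement argument is still the plain string '$1'. The sketch below contrasts that with a callback on the question's input; the variable names are illustrative.

```php
<?php
$proper_str = 'James "jimmy" Smith';

// strtoupper() runs first, on the literal two-character string '$1',
// which contains no letters, so preg_replace() still receives '$1'.
var_dump(strtoupper('$1') === '$1'); // bool(true)

// A callback receives the actual match, so it can transform it.
$fixed = preg_replace_callback('/"[a-z]/', function ($m) {
    return strtoupper($m[0]); // the quote itself is unaffected by strtoupper()
}, $proper_str);

echo $fixed, "\n"; // James "Jimmy" Smith
```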

Use the e modifier to have the substitution be evaluated:
preg_replace('/"[a-z]/e', 'strtoupper("$0")', $proper_str)
Here $0 contains the match of the whole pattern, that is, the " and the lowercase letter. That doesn't matter, since the " doesn't change when sent through strtoupper.

A complete solution doesn't get simpler / easier to read than this...
Code: https://3v4l.org/rrXP7
$str = 'JAMES "JIMMY" SMITH';
echo ucwords(strtolower($str), ' "');
Output:
James "Jimmy" Smith
It is merely a matter of declaring double quotes and spaces as delimiters in the ucwords() call.
Nope. My earlier self was not correct. It doesn't get simpler than this multibyte-safe title-casing technique!
Code: (Demo)
echo mb_convert_case($str, MB_CASE_TITLE);
Output:
James "Jimmy" Smith

I do this without regex, as part of my custom ucwords() function. Assuming no more than two quotes appear in the string:
$parts = explode('"', $string, 3);
if(isset($parts[2])) $string = $parts[0].'"'.ucfirst($parts[1]).'"'.ucfirst($parts[2]);
else if(isset($parts[1])) $string = $parts[0].'"'.ucfirst($parts[1]);

You should do this:
$proper_str =
preg_replace_callback(
'/"([a-z])/',
function($m){return '"' . strtoupper($m[1]);}, // put the matched quote back in front of the letter
$proper_str
);
You shouldn't use "eval()" for security reasons.
Anyway, the pattern modifier "e" is deprecated.
See: PHP documentation.

echo ucwords(mb_strtolower('JAMES "JIMMY" SMITH', 'UTF-8'), ' "'); // James "Jimmy" Smith
ucwords() has an optional second parameter, delimiters, which contains the word separator characters. Pass both the space ' ' and the double quote " as delimiters and "Jimmy" will be capitalized correctly.
