How do you decode strings like \u00d6? - php

This is encoded: \u00d6
This is decoded: Ö
What function do I have to use to decode that string into something readable?
\u00d6asdf -> Öasdf

To convert to UTF-8, do:
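// The /e modifier was removed in PHP 7, so a callback is used instead.
preg_replace_callback('/\\\\u([0-9a-f]{4})/i', function ($m) {
    return html_entity_decode('&#x' . $m[1] . ';', ENT_QUOTES, 'UTF-8');
}, $string);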
Since this is the escaping used in JSON, another option would be json_decode. This would, however, also require escaping double quotes and backslashes beforehand (except those of the \uXXXX escape sequences) and adding double quotes around the string. If, however, the string is indeed JSON-encoded and that's what originally motivated the question, the correct answer would naturally be to use json_decode instead of the method above.
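The json_decode route described above can be sketched as follows. This is a minimal illustration, not a hardened implementation: the function name decodeUnicodeEscapes is hypothetical, and it assumes the input contains no raw control characters (json_decode rejects those, in which case the input is returned unchanged).

```php
<?php
// Hypothetical helper: decode \uXXXX escapes in arbitrary text via json_decode.
function decodeUnicodeEscapes(string $s): string
{
    // Double every backslash that does NOT start a \uXXXX sequence.
    $escaped = preg_replace('/\\\\(?!u[0-9a-fA-F]{4})/', '\\\\\\\\', $s);
    // Escape double quotes so the text can sit inside a JSON string literal.
    $escaped = str_replace('"', '\\"', $escaped);
    // Wrap in double quotes and let the JSON parser resolve the escapes.
    $decoded = json_decode('"' . $escaped . '"');
    // json_decode returns null on invalid input; fall back to the original.
    return $decoded === null ? $s : $decoded;
}

echo decodeUnicodeEscapes('\u00d6asdf'), "\n";
```

If the string really came out of a JSON document, decoding the whole document with json_decode remains the simpler choice.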

Normally this would be the urldecode method, but it does not apply to unicode characters, like yours. Try this one instead:
function unicode_urldecode($url)
{
    preg_match_all('/%u([[:alnum:]]{4})/', $url, $a);
    foreach ($a[1] as $uniord)
    {
        $utf = '&#x' . $uniord . ';';
        $url = str_replace('%u'.$uniord, $utf, $url);
    }
    return urldecode($url);
}

Related

PHP json_encode data with double quotes

I'm using this simple code to transform database query results into JSON format:
$result = $mysqli->query("
SELECT
date as a
, sum(sales) as b
, product as c
FROM
default_dataset
GROUP BY
date
, product
ORDER BY
date
");
$data = $result->fetch_all(MYSQLI_ASSOC);
echo stripslashes(json_encode($data));
The problem arises when there are double quotes in the data (e.g. in the product column) returned by this query: the output is then not valid JSON.
Could someone show me how to escape the double quotes that are returned by the query? Thank you.
You will need htmlspecialchars instead of stripslashes, with the proper encoding (UTF-8, if your page uses the UTF-8 charset) and ENT_QUOTES, which escapes double quotes and prevents them from breaking the data. See the code below:
echo htmlspecialchars(json_encode($data), ENT_QUOTES, 'UTF-8');
json_encode already takes care of this, you are breaking the result by calling stripslashes:
echo json_encode($data); //properly formed json
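A small sketch of why stripslashes is the culprit. The row below is made-up sample data standing in for the query result; json_encode escapes the embedded double quote on its own, and stripslashes then removes exactly that escape.

```php
<?php
// Hypothetical row standing in for $result->fetch_all(MYSQLI_ASSOC).
$rows = [['c' => 'Monitor 24" wide']];

$json = json_encode($rows);          // [{"c":"Monitor 24\" wide"}]
$back = json_decode($json, true);    // round-trips to the original array

$broken = stripslashes($json);       // [{"c":"Monitor 24" wide"}]
var_dump(json_decode($broken));      // NULL, because this is no longer valid JSON
```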
Example with a simple array with double quotes string value.
$yourArr = array(
'title' => 'This is an example with "double quote" check it'
);
// add htmlspecialchars as UTF-8 after encoded
$encodeData = htmlspecialchars(json_encode($yourArr), ENT_QUOTES, 'UTF-8');
echo $encodeData;
Result (as rendered by the browser):
{"title":"This is an example with \"double quote\" check it"}
According to PHP Manual:
That said, quotes " will produce invalid JSON, but this is only an
issue if you're using json_encode() and just expect PHP to magically
escape your quotes. You need to do the escaping yourself.
Yes, PHP json_encode wouldn't "escape double" quotes, we need to escape those manually in this super simple way-
array_walk($assoc_array, function(&$v, $k) {
    if(is_string($v) && strpos($v, '"') !== false) {
        $v = str_replace('"', '\"', $v);
    }
});
// After escaping those '"', you can simply use json_encode then.
$json_data = json_encode($assoc_array);
To encode the quotes using htmlspecialchars:
$json_array = array(
'title' => 'Example string\'s with "special" characters'
);
$json_decode = htmlspecialchars(json_encode($json_array), ENT_QUOTES, 'UTF-8');
You can do that using base64_encode and base64_decode.
To store the data in the database, first encode it with base64_encode; when you need it back, decode it with base64_decode:
$var['cont'] = base64_encode($_POST['data']);
$dd = json_encode($var['cont']);
echo base64_decode ( json_decode($dd) );

Urlencode everything but slashes?

Is there any clean and easy way to urlencode() an arbitrary string but leave slashes (/) alone?
Split by /
urlencode() each part
Join with /
You can do like this:
$url = "http://www.google.com/myprofile/id/1001";
$encoded_url = urlencode($url);
$after_encoded_url = str_replace("%2F", "/", $encoded_url);
Basically what @clovecooks said, but split() is deprecated as of 5.3:
$path = '/path with some/illegal/characters.html';
$parsedPath = implode('/', array_map(function ($v) {
    return rawurlencode($v);
}, explode('/', $path)));
// $parsedPath == '/path%20with%20some/illegal/characters.html';
Also might want to decode before encoding, in case the string is already encoded.
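One way to sketch the "decode before encoding" idea is below. The function name encodePathSegments is hypothetical, and the approach assumes that any percent sequences already present in the input are meant as escapes rather than as literal text.

```php
<?php
// Hypothetical helper: percent-encode each path segment, leaving "/" alone.
function encodePathSegments(string $path): string
{
    $segments = explode('/', $path);
    $encoded = array_map(function ($segment) {
        // Decode first so an already-encoded segment is not encoded twice.
        return rawurlencode(rawurldecode($segment));
    }, $segments);
    return implode('/', $encoded);
}

$once  = encodePathSegments('/path with some/illegal/characters.html');
$twice = encodePathSegments($once);
var_dump($once === $twice); // bool(true): applying it again changes nothing
```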
I suppose you are trying to encode a whole HTTP url.
I think the best solution to encode a whole HTTP url is to follow the browser strictly.
If you just skip slashes, then you will get a double-encoding issue if the url has already been encoded.
And if there are some parameters in the url (?, &, =, # are in the url), the encoding will break the link.
The browsers only encode the space, ", <, >, ` and multi-byte characters. (Copy all symbols to the browser, you will get the list)
You only need to encode these characters.
echo preg_replace_callback("/[\ \"<>`\\x{0080}-\\x{FFFF}]+/u", function ($match) {
    return rawurlencode($match[0]);
}, $path);
Yes, by properly escaping the individual parts before assembling them with slashes:
$url = urlencode($foo) . '/' . urlencode($bar) . '/' . urlencode($baz);
$encoded = implode("/", array_map(function($v) { return urlencode($v); }, explode("/", $url)));
This will split the string, encode the parts and join the string together again.

PHP \uXXXX encoded string convert to utf-8

I've got such strings
\u041d\u0418\u041a\u041e\u041b\u0410\u0415\u0412
How can I convert this to utf-8 encoding?
And what is the encoding of given string?
Thank you for participating!
The simple approach would be to wrap your string into double quotes and let json_decode convert the \u0000 escapes. (Which happen to be Javascript string syntax.)
$str = json_decode("\"$str\"");
Seems to be Russian letters: НИКОЛАЕВ (It's already UTF-8 when json_decode returns it.)
To parse that string in PHP you can use json_decode because JSON supports that unicode literal format.
To preface, you generally should not be encountering \uXXXX unicode escape sequences outside of JSON documents, in which case you should be decoding those documents using json_decode() rather than trying to cherry-pick strings out of the middle by hand.
If you want to generate JSON documents without unicode escape sequences, then you should use the JSON_UNESCAPED_UNICODE flag in json_encode(). However, the escapes are default as they are most likely to be safely transmitted through various intermediate systems. I would strongly recommend leaving escapes enabled unless you have a solid reason not to.
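For illustration, here is a short sketch of the difference the flag makes, using the string from the question. The variable names are arbitrary; both outputs decode to the same value.

```php
<?php
// Obtain the UTF-8 text from the question's escaped form.
$name = json_decode('"\u041d\u0418\u041a\u041e\u041b\u0410\u0415\u0412"');

$escaped = json_encode($name);                         // default: \uXXXX escapes
$raw     = json_encode($name, JSON_UNESCAPED_UNICODE); // raw UTF-8 inside quotes

echo $escaped, "\n", $raw, "\n";
var_dump(json_decode($escaped) === json_decode($raw)); // bool(true)
```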
Lastly, if you're just looking for something to make unicode text "safe" in some fashion, please instead read over the following SO masterpost: UTF-8 all the way through
If, after three paragraphs of "don't do this", you still want to do this, then here are a couple functions for applying/removing \uXXXX escapes in arbitrary text:
<?php
function utf8_escape($input) {
    $output = '';
    for( $i=0,$l=mb_strlen($input); $i<$l; ++$i ) {
        $cur = mb_substr($input, $i, 1);
        if( strlen($cur) === 1 ) {
            $output .= $cur;
        } else {
            $output .= sprintf('\\u%04x', mb_ord($cur));
        }
    }
    return $output;
}
function utf8_unescape($input) {
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function($a) {
            return mb_chr(hexdec($a[1]));
        },
        $input
    );
}
$u_input = 'hello world, 私のホバークラフトはうなぎで満たされています';
$e_input = 'hello world, \u79c1\u306e\u30db\u30d0\u30fc\u30af\u30e9\u30d5\u30c8\u306f\u3046\u306a\u304e\u3067\u6e80\u305f\u3055\u308c\u3066\u3044\u307e\u3059';
var_dump(
    utf8_escape($u_input),
    utf8_unescape($e_input)
);
Output:
string(145) "hello world, \u79c1\u306e\u30db\u30d0\u30fc\u30af\u30e9\u30d5\u30c8\u306f\u3046\u306a\u304e\u3067\u6e80\u305f\u3055\u308c\u3066\u3044\u307e\u3059"
string(79) "hello world, 私のホバークラフトはうなぎで満たされています"

How do I escape only single quotes?

I am writing some JavaScript code that uses a string rendered with PHP. How can I escape single quotes (and only single quotes) in my PHP string?
<script type="text/javascript">
$('#myElement').html('say hello to <?php echo $mystringWithSingleQuotes ?>');
</script>
Quite simply: echo str_replace('\'', '\\\'', $myString);
However, I'd suggest using JSON and the json_encode() function, as it is more reliable (it also escapes newlines, for instance):
<?php $data = array('myString' => '...'); ?>
<script>
var phpData = <?php echo json_encode($data) ?>;
alert(phpData.myString);
</script>
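Applied to the snippet from the question, the json_encode approach might look like the sketch below. The sample value is made up, and the JSON_HEX_* flags are an optional extra that keeps quotes and angle brackets out of the emitted literal; this only addresses JavaScript string escaping, not what .html() later does with the value.

```php
<?php
// Hypothetical value standing in for $mystringWithSingleQuotes.
$mystringWithSingleQuotes = "it's </script> \"quoted\"";

// json_encode produces a complete JavaScript string literal, quotes included.
$flags = JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP;
$jsLiteral = json_encode($mystringWithSingleQuotes, $flags);

echo '<script type="text/javascript">', "\n";
echo '$("#myElement").html("say hello to " + ' . $jsLiteral . ');', "\n";
echo '</script>', "\n";
```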
If you want to escape characters with a \, you have addcslashes(). For example, if you want to escape only single quotes like the question, you can do:
echo addcslashes($value, "'");
And if you want to escape ', ", \, and NUL (the null byte), you can use addslashes():
echo addslashes($value);
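A quick side-by-side sketch of the two calls, using a made-up sample value, shows what each one touches:

```php
<?php
// Sample value: It's a "test" with a \ backslash
$value = 'It\'s a "test" with a \\ backslash';

$onlySingle = addcslashes($value, "'"); // It\'s a "test" with a \ backslash
$all        = addslashes($value);       // It\'s a \"test\" with a \\ backslash

echo $onlySingle, "\n", $all, "\n";
```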
str_replace("'", "\'", $mystringWithSingleQuotes);
In some cases, I just convert it into ENTITIES:
// i.e., $x= ABC\DEFGH'IJKL
$x = str_ireplace("'", "&apos;", $x);
$x = str_ireplace("\\", "&bsol;", $x);
$x = str_ireplace('"', "&quot;", $x);
On the HTML page, the visual output is the same:
ABC\DEFGH'IJKL
However, it is sanitized in source.
Use the native function htmlspecialchars. It escapes all HTML special characters. To control how quotes are escaped, use ENT_COMPAT or ENT_QUOTES. Here is an example:
$str = "Jane & 'Tarzan'";
echo htmlspecialchars($str, ENT_COMPAT); // Will only convert double quotes
echo "<br>";
echo htmlspecialchars($str, ENT_QUOTES); // Converts double and single quotes
echo "<br>";
echo htmlspecialchars($str, ENT_NOQUOTES); // Does not convert any quotes
The HTML output (view source) would be like this:
Jane &amp; 'Tarzan'<br>
Jane &amp; &#039;Tarzan&#039;<br>
Jane &amp; 'Tarzan'
Read more in PHP htmlspecialchars() Function
To replace only single quotes, use this simple statement:
$string = str_replace("'", "\\'", $string);
You can use the addcslashes function to get this done like so:
echo addcslashes($text, "'\\");
After a long time fighting with this problem, I think I have found a better solution.
The combination of two functions makes it possible to escape a string to use as HTML.
One, to escape double quote if you use the string inside a JavaScript function call; and a second one to escape the single quote, avoiding those simple quotes that go around the argument.
Solution:
mysql_real_escape_string(htmlspecialchars($string))
Solve:
a PHP line created to call a JavaScript function like
// Note: mysql_real_escape_string() was removed in PHP 7.0.
echo 'onclick="javascript_function(\''
    . mysql_real_escape_string(htmlspecialchars($string))
    . '\')"';
I wrote the following function. It replaces the following:
Single quote ['] with a slash and a single quote [\'].
Backslash [\] with two backslashes [\\]
function escapePhpString($target) {
    $replacements = array(
        "'" => '\\\'',
        "\\" => '\\\\'
    );
    return strtr($target, $replacements);
}
You can modify it to add or remove character replacements in the $replacements array. For example, to replace \r\n, add "\r\n" => "\\r\\n" and "\n" => "\\n".
/**
 * With new line replacements too
 */
function escapePhpString($target) {
    $replacements = array(
        "'" => '\\\'',
        "\\" => '\\\\',
        "\r\n" => "\\r\\n",
        "\n" => "\\n"
    );
    return strtr($target, $replacements);
}
The neat feature of strtr is that it prefers the longest matching replacement.
For example, "Cool\r\nFeature" will have \r\n escaped as a unit rather than \n escaped alone.
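The longest-match behaviour can be seen in a small sketch that contrasts strtr with a chained str_replace; the variable names here are arbitrary.

```php
<?php
$replacements = [
    "\r\n" => '\r\n', // single quotes: literal backslash-r backslash-n
    "\n"   => '\n',
];

// strtr tries the longest key first, so the CRLF pair is replaced as a unit.
$escaped = strtr("Cool\r\nFeature", $replacements);

// str_replace works through the search list in order, so "\n" is consumed
// first and the real carriage return is left behind in the result.
$chained = str_replace(["\n", "\r\n"], ['\n', '\r\n'], "Cool\r\nFeature");

var_dump($escaped === 'Cool\r\nFeature');   // bool(true)
var_dump(strpos($chained, "\r") !== false); // bool(true): raw CR still present
```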
Here is how I did it. Silly, but simple.
$singlequote = "'";
$picturefile = getProductPicture($id);
echo showPicture('.$singlequote.$picturefile.$singlequote.');
I was working on outputting HTML that called JavaScript code to show a picture...
I am not sure what exactly you are doing with your data, but you could always try:
$string = str_replace("'", "%27", $string);
I use this whenever strings are sent to a database for storage.
%27 is the encoding for the ' character, and it also helps to prevent disruption of GET requests if a single ' character is contained in a string sent to your server. I would replace ' with %27 in both JavaScript and PHP just in case someone tries to manually send some data to your PHP function.
To make it prettier to your end user, just run an inverse replace function for all data you get back from your server and replace all %27 substrings with '.
Happy injection avoiding!

Escaping escape Characters

I'm trying to mimic the json_encode bitmask flags implemented in PHP 5.3.0, here is the string I have:
$s = addslashes('O\'Rei"lly'); // O\'Rei\"lly
Doing json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT) outputs the following:
"O\\\u0027Rei\\\u0022lly"
And I'm currently doing this in PHP versions older than 5.3.0:
str_replace(array('\\"', "\\'"), array('\\u0022', '\\\u0027'), json_encode($s))
or
str_replace(array('\\"', '\\\''), array('\\u0022', '\\\u0027'), json_encode($s))
Which correctly outputs the same result:
"O\\\u0027Rei\\\u0022lly"
I'm having trouble understanding why do I need to replace single quotes ('\\\'' or even "\\'" [surrounding quotes excluded]) with '\\\u0027' and not just '\\u0027'.
Here is the code that I'm having trouble porting to PHP < 5.3:
if (get_magic_quotes_gpc() && version_compare(PHP_VERSION, '6.0.0', '<'))
{
    /* JSON_HEX_APOS and JSON_HEX_QUOT are available */
    if (version_compare(PHP_VERSION, '5.3.0', '>=') === true)
    {
        $_GET = json_encode($_GET, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_POST = json_encode($_POST, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_COOKIE = json_encode($_COOKIE, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_REQUEST = json_encode($_REQUEST, JSON_HEX_APOS | JSON_HEX_QUOT);
    }
    /* mimic the behaviour of JSON_HEX_APOS and JSON_HEX_QUOT */
    else if (extension_loaded('json') === true)
    {
        $_GET = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_GET));
        $_POST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_POST));
        $_COOKIE = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_COOKIE));
        $_REQUEST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_REQUEST));
    }
    $_GET = json_decode(stripslashes($_GET));
    $_POST = json_decode(stripslashes($_POST));
    $_COOKIE = json_decode(stripslashes($_COOKIE));
    $_REQUEST = json_decode(stripslashes($_REQUEST));
}
The PHP string
'O\'Rei"lly'
is just PHP's way of getting the literal value
O'Rei"lly
into a string which can be used. Calling addslashes on that string changes it to be literally the following 11 characters
O\'Rei\"lly
i.e. strlen(addslashes('O\'Rei"lly')) == 11
This is the value which is being sent to json_encode.
In JSON backslash is an escape character, so that needs to be escaped, i.e.
\ to be \\
Also single and double quotes can cause problems, so converting them to their Unicode escape equivalents is one way to avoid problems. So later versions of PHP's json_encode (with JSON_HEX_APOS and JSON_HEX_QUOT) change
' to be \u0027
and
" to be \u0022
So applying these three rules to
O\'Rei\"lly
gives us
O\\\u0027Rei\\\u0022lly
This string is then wrapped in double quotes to make it a JSON string. Your replace expressions include the leading backslashes. Either by accident or on purpose, this means that the leading and trailing double quotes returned by json_encode are not subject to the escaping, which they shouldn't be.
So in earlier versions of PHP
$s = addslashes('O\'Rei"lly');
print json_encode($s);
would print
"O\\'Rei\\\"lly"
and we want to change ' to be \u0027
and we want to change \" to be \u0022 because the \ in \" is just to get the " into the string because it begins and ends with double-quotes.
So that's why we get
"O\\\u0027Rei\\\u0022lly"
It's escaping the backslash as well as the quote. It's difficult dealing with escaped escapes, as you're doing here, as it quickly turns into backslash counting games. :-/
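The backslash counting can be checked with a small sketch. It assumes PHP 5.3 or later so that the flags exist, and it uses the question's own str_replace call as the pre-5.3 emulation; the variable names are illustrative.

```php
<?php
$s = addslashes('O\'Rei"lly'); // the 11 characters O\'Rei\"lly

// With the flags: \ becomes \\, ' becomes \u0027, " becomes \u0022.
$withFlags = json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT);

// Without the flags json_encode yields "O\\'Rei\\\"lly". The search strings
// \" and \' each swallow one backslash. For \" that backslash was only JSON's
// quote escape, so it is dropped (\u0022). For \' it was half of the escaped
// backslash pair, so the replacement has to put it back (\\u0027).
$emulated = str_replace(
    array('\\"', "\\'"),
    array('\\u0022', '\\\u0027'),
    json_encode($s)
);

$expected = <<<'JSON'
"O\\\u0027Rei\\\u0022lly"
JSON;

var_dump($withFlags === $expected, $emulated === $expected);
```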
If I understand correctly, you just want to know why you need to use
'\\\u0027' and not just '\\u0027'
You're escaping the slash and the character unicode value. With this you are telling json that it should put an apostrophe there, but it needs the backslash and the u to know that a unicode hexadecimal character code is next.
Since you are escaping this string:
$s = addslashes('O\'Rei"lly'); // O\'Rei\"lly
the first backslash is actually escaping the backslash before the apostrophe. Then next slash is used to escape the backslash used by json to identify the character as a unicode character.
If you were applying the algorithm to O'Reilly instead of O\'Rei\"lly then the latter would suffice.
I hope you find this useful. I only leave you this link so you can read more on how json is constructed, since it's obvious you already understand PHP:
http://www.json.org/fatfree.html
When you encode a string for json, some things have to be escaped regardless of the options. As others have pointed out, that includes '\' so any backslash run through json_encode will be doubled. Since you are first running your string through addslashes, which also adds backslashes to quotes, you are adding a lot of extra backslashes. The following function will emulate how json_encode would encode a string. If the string has already had backslashes added, they will be doubled.
function json_encode_string( $encode , $options ) {
    $escape = '\\\0..\37';
    $needle = array();
    $replace = array();
    if ( $options & JSON_HEX_APOS ) {
        $needle[] = "'";
        $replace[] = '\u0027';
    } else {
        $escape .= "'";
    }
    if ( $options & JSON_HEX_QUOT ) {
        $needle[] = '"';
        $replace[] = '\u0022';
    } else {
        $escape .= '"';
    }
    if ( $options & JSON_HEX_AMP ) {
        $needle[] = '&';
        $replace[] = '\u0026';
    }
    if ( $options & JSON_HEX_TAG ) {
        $needle[] = '<';
        $needle[] = '>';
        $replace[] = '\u003C';
        $replace[] = '\u003E';
    }
    $encode = addcslashes( $encode , $escape );
    $encode = str_replace( $needle , $replace , $encode );
    return $encode;
}
Since you are going to json_encode the string \' you will have to encode first the \ then the '. So you will have \\ and \u0027. Concatenating these results \\\u0027.
The \ generated by addslashes() gets re-escaped by json_encode(). You probably meant to write "Doing json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT) outputs the following", but you used $str instead of $s, which confused everyone.
If you evaluate the string "O\\\u0027Rei\\\u0022lly" in JavaScript, you will get "O\'Rei\"lly" and I am pretty sure that's not what you want. When you evaluate it, you probably need all the control codes removed. Go ahead, poke this in a file: alert("O\\\u0027Rei\\\u0022lly").
Conclusion: You are escaping the quotes twice, which is most likely not what you need. json_encode already escapes everything that is needed so that any JavaScript parser would return the original data structure. In your case, that is the string you have obtained after the call to addslashes.
Proof:
<?php $out = json_encode(array(10, "h'ello", addslashes("h'ello re-escaped"))); ?>
<script type="text/javascript">
var out = <?php echo $out; ?>;
alert(out[0]);
alert(out[1]);
alert(out[2]);
</script>
