Output PHP string to show escaped characters - php

In PHP, is it at all possible to output the contents of a string to show any escaped characters that may be contained within the string? I get that the whole point of escaping characters is so that they aren't treated in the usual way. But I would still like to be able to view the raw contents of a string so I can see for myself exactly how characters like \n and \r, etc. are represented. Does PHP have a method for doing this?

Use json_encode() to encode the string as JSON. JSON's string notation (which is, in fact, JavaScript's) uses the same backslash escapes as PHP's double-quoted strings for common characters such as \n, \r, \t, \" and \\. Both JavaScript and PHP were inspired by C, and both copied its notation for string literals.
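A minimal sketch of the json_encode() approach, using a made-up sample string. By default json_encode() also escapes / as \/ and non-ASCII characters as \uXXXX (the flags JSON_UNESCAPED_SLASHES and JSON_UNESCAPED_UNICODE turn that off), and it returns false for invalid UTF-8.

```php
<?php
// Sample string containing a newline, a carriage return, a tab, double quotes and a backslash.
$s = "line1\nline2\r\ttab \"quoted\" \\ end";

// json_encode() prints those characters in escaped (C-style) notation.
echo json_encode($s), PHP_EOL;
// "line1\nline2\r\ttab \"quoted\" \\ end"
```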

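If you use single quotation marks, it should do what you need, because escape sequences such as \n are not interpreted inside single-quoted strings.
E.g. echo 'this\n'; outputs this\n, whereas echo "this\n"; outputs this followed by a newline. Note that this only affects how a string literal is written in source code; it won't reveal the escapes in a string that already exists.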
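The difference between the two literals can be checked by comparing their lengths:

```php
<?php
$single = 'this\n'; // single quotes: the backslash and "n" stay as two literal characters
$double = "this\n"; // double quotes: \n becomes one newline character

var_dump(strlen($single)); // int(6)
var_dump(strlen($double)); // int(5)
```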

Related

PHP - Replace JSON with the correct Unicode symbol

Ok, so I have some JSON that I decode and then print out. Before decoding it, I use stripslashes() to remove extra slashes. The JSON contains website links, such as https://www.w3schools.com/php/default.asp, and descriptions like Hello World, I have u00249999999 dollars
When I print out the JSON, I would like it to print out
Hello World, I have $9999999 dollars, but it prints out Hello World, I have u00249999999 dollars.
I assume that the u0024 is not getting parsed because it has no backslash. The forward slashes in the website links aren't removed by stripslashes(), which is good, but I think the backslashes of the Unicode sequences are removed by stripslashes().
How do I get the PHP to automatically detect and parse the Unicode dollar sign? I would also like to apply this rule to every single Unicode symbol.
Thanks In Advance!
According to the PHP documentation, stripslashes()
un-quotes a quoted string.
In other words, it removes the backslashes that are used for escaping characters (or Unicode sequences). Once they have been removed, there is no way to be completely sure that a sequence such as "u0024" was meant to be a Unicode escape; your user could simply have typed that text.
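A small sketch of what happens to a Unicode escape, using a made-up JSON string literal: stripslashes() destroys the escape, while json_decode() on the untouched text decodes it.

```php
<?php
// Raw JSON text as received; the backslash of the \u0024 escape is still present.
$json = '"I have \u00249999999 dollars"';

echo stripslashes($json), PHP_EOL; // "I have u00249999999 dollars"  (escape destroyed)
echo json_decode($json), PHP_EOL;  // I have $9999999 dollars  (escape decoded)
```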
Besides that, you will run into trouble when using stripslashes() on a JSON value that contains escaped quotes. Consider this example:
{
"key": "\"value\""
}
This becomes invalid after stripslashes(), because it then looks like this:
{
"key": ""value""
}
That is not parseable, because it isn't valid JSON. If you don't use stripslashes(), json_decode() converts all escape sequences itself, including Unicode sequences such as \u0024, so the decoded strings already contain the actual characters when you output them.
Conclusion: I'd suggest not using stripslashes() when dealing with JSON, as it can break things (as in the previous example, and also in your problem).
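The escaped-quotes example from this answer can be reproduced as a short sketch: json_decode() handles the original text, but returns null once stripslashes() has made it invalid.

```php
<?php
// JSON whose value contains escaped double quotes.
$json = '{"key": "\"value\""}';

$good = json_decode($json, true);               // ['key' => '"value"']
$bad  = json_decode(stripslashes($json), true); // null: {"key": ""value""} is not valid JSON

var_dump($good['key']); // string(7) ""value""
var_dump($bad);         // NULL
```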
Your assumption is correct: u0024 is not getting parsed because it has no backslash. You can use a regular expression to add the backslash back after the conversion.
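A sketch of the "add the backslash back" idea, assuming PHP 7.4+ (for the arrow function). restoreUnicodeEscapes() is a hypothetical helper, not a built-in: it re-inserts the backslash in front of each "u" followed by four hex digits and lets json_decode() turn the restored \uXXXX escape into a UTF-8 character. As noted above, it cannot tell a real escape from ordinary text that happens to match the pattern.

```php
<?php
// Hypothetical helper: restore "\uXXXX" escapes that stripslashes() turned into "uXXXX".
function restoreUnicodeEscapes(string $s): string
{
    return preg_replace_callback(
        '/u([0-9a-fA-F]{4})/',
        // Decode one restored escape; keep the original text if it is not valid (e.g. a lone surrogate).
        fn (array $m): string => json_decode('"\\u' . $m[1] . '"') ?? $m[0],
        $s
    );
}

echo restoreUnicodeEscapes('Hello World, I have u00249999999 dollars'), PHP_EOL;
// Hello World, I have $9999999 dollars
```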
It looks like you have UTF-8 encoded strings internally and PHP outputs them properly, but your browser fails to auto-detect the encoding (it picks ISO 8859-1 or some other encoding).
The best way is to tell the browser that UTF-8 is being used by sending the corresponding HTTP header:
header("content-type: text/html; charset=UTF-8");
Then you can leave the rest of your code as-is and don't have to HTML-encode characters or resort to other workarounds.
If you want, you can additionally declare the encoding in the generated HTML by using the <meta> tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> for HTML 4.01 and earlier
<meta charset="UTF-8"> for HTML5
The HTTP header takes priority over the <meta> tag, but the latter is useful if the HTML is saved to disk and later opened locally.
The main question you have to answer is: why do you need to strip slashes at all?
And if it really is necessary, how will you handle the encoding? It is probably a good idea to convert the Unicode sequences before stripping slashes, not after.
Anyway, you can try to fix the problem with this workaround, which turns each "u" followed by four hex digits into an HTML numeric entity and decodes it with html_entity_decode():
$string = "Hello World, I have u00249999999 dollars";
$string = preg_replace('/u([0-9a-fA-F]{4})/', '&#x$1;', $string); // "u" + exactly 4 hex digits -> "&#x0024;"
$string = html_entity_decode($string, ENT_COMPAT, 'UTF-8'); // decode the numeric entities to UTF-8 characters ("$")

php is incorrectly converting &not in strings to ¬

I need to make up a simple string in PHP which is a string of data to be posted to another site.
The problem is that one of the fields is 'notify_url=..', and when I use it, PHP takes the & in front of it plus the "not" part to mean the logical operator AND NOT and converts it to a ¬ character:
$string = 'field1=1234&field2=this&notify_url=http';
prints as 'field1=1234&field2=this¬ify_url=http'
The encoding on my page is UTF-8.
I have tried creating the string with single quotes as well as double quotes. I have tried making the field names variables and concatenating them in, but it always produces the special character.
This is not being urlencoded because the string is meant to be hashed before the form is submitted to verify posted data.
PHP isn't doing that; it's your browser interpreting HTML entity notation. & has a special meaning in HTML as the start of an HTML entity, and &not happens to be a valid HTML entity (¬). You need to HTML-encode characters with special meanings:
echo htmlspecialchars($string);
// field1=1234&amp;field2=this&amp;notify_url=http (displayed by the browser as field1=1234&field2=this&notify_url=http)
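A sketch of keeping the raw string for hashing and escaping it only for display. The question does not name a hash algorithm; sha256 here is purely an assumption for illustration.

```php
<?php
$string = 'field1=1234&field2=this&notify_url=http';

// Hash the raw string (sha256 is an assumed choice of algorithm).
$hash = hash('sha256', $string);

// Escape only for display, so the browser shows the text literally.
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8'), PHP_EOL;
// field1=1234&amp;field2=this&amp;notify_url=http
```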

Using Regular Expressions with user input in PHP

I was wondering if anybody knew how to get around this problem.
I am gathering user input from an HTML form, which is then posted into PHP using htmlspecialchars to avoid issues with quotes, etc.
However, I also want to run server-side validation checks on the data being gathered through regular expressions - though I'm not sure how to go about this.
So far, I have thought of decoding the htmlspecialchars output, but because I am going to use the strings straight away, the code could break after I run this conversion. E.g. let's say the user entered a double quote (") into a field. This would be converted to &quot;, and if I decode it and use it in a variable, it could end up like $string = """; which is going to give me issues.
Any advice on this would be greatly appreciated!
You seem to misunderstand the difference between data and how this data is altered to be parseable in a certain context.
A php string can contain any data. What is stored in this string is the "raw" form: the form in which we want to manipulate the data if needed.
In certain contexts, not all characters are valid. For example, in an HTML textarea, the < and > characters may not be used as-is, because they are special characters. We still want to be able to use these characters. To use special characters in a context, we escape them; an escaped special character loses its special meaning. In the context of an HTML textarea, the < character is escaped as the sequence &lt;. Unlike the < character, this escaped sequence has no special meaning in HTML, so if we send <textarea>&lt;</textarea> to the browser, it knows how to parse it and displays the right thing. When we talk about the data this textarea contains, we do not say that it contains &lt;, but that it contains <.
As you said, in a PHP script, the " character has a special meaning inside a double-quoted string. This only has to do with parsing: PHP simply does not know how to parse the sequence $str = """;. If we want a double quote in such a string, we need to escape it, which in a PHP double-quoted string is done by prepending a \. To make a string containing a single double quote using the double-quoted notation, you would write $str = "\"";.
However, none of this matters here. You are taking input from an HTML form. When you click the submit button, the browser reads what is in the textarea, encodes it as dictated by the form tag, and sends it to the server. The server decodes that blob of text back into its raw data form. That data is passed to PHP, and it is this form you will encounter in $_POST['myTextarea'].
In conclusion: if data is encoded, work out which context it was encoded for and decode it based on that context. You do not need to escape for PHP quoted strings, because you are working on internal strings; there is nothing to parse. Whenever you use the data somewhere, make sure that all characters that are special in that particular context are escaped.
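A sketch of "validate the raw data, escape at output". The input value and the validation pattern are made up for illustration; a real form would read the value from $_POST and use its own rules.

```php
<?php
// Hypothetical raw value, as PHP would receive it in $_POST (no HTML escaping yet).
$raw = 'He said "hi" & <left>';

// Validate the raw data first; this pattern is only an example rule.
$isValid = preg_match('/^[\w\s"&<>]+$/u', $raw) === 1;

// Escape only when outputting into an HTML context.
$html = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

var_dump($isValid);  // bool(true)
echo $html, PHP_EOL; // He said &quot;hi&quot; &amp; &lt;left&gt;
```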
I assume the htmlspecialchars() function is called after the form is posted to PHP. The simplest solution, then, is to match against the regular expression first and only then call htmlspecialchars().
Also, if you have a string encoded with htmlspecialchars(), then after decoding it with htmlspecialchars_decode(), PHP's internal string simply contains the " character (which you would write in source code as "\""), so you break nothing. There is a big difference between how you write strings by hand in a PHP file and how PHP handles them internally. You really don't need to be bothered by this.
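A quick round trip shows that decoding gives back exactly one double-quote character, with nothing left to "break":

```php
<?php
$encoded = htmlspecialchars('"');              // "&quot;" (double quotes are escaped by default)
$decoded = htmlspecialchars_decode($encoded);

var_dump($decoded === '"'); // bool(true)
var_dump(strlen($decoded)); // int(1)
```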

Understanding what \u0000 is in PHP / JSON and getting rid of it

I haven't a clue what is going on, but I have a string inside an array. It must be a string, as I have run this on it first:
$array[0] = (string)$array[0];
If I output $array[0] to the browser in plain text it shows this:
hellothere
But if I JSON encode $array I get this:
hello\u0000there
Also, I need to separate the 'there' part (the bit after the \u0000), but this doesn't work:
explode('\u0000', $array[0]);
I don't even know what \u0000 is or how to control it in PHP.
I did see this link: Trying to find and get rid of this \u0000 from my json
...which suggests str_replacing the JSON that is generated. I can't do that (and need to separate it as mentioned above first) so I then checked Google for 'php check for backslash \0 byte' but I still can't work out what to do.
\uXXXX is the JSON Unicode escape notation (X is hexadecimal).
In this case it means ASCII character 0, aka the NUL byte. To split on it, you can either do:
explode('\u0000', json_encode($array[0]));
Or better yet:
explode("\0", $array[0]); // PHP doesn't use the same notation as JSON
The string you have is "hello\0there", or "hello\x00there", whichever you prefer. If you echo it, the null character \0 isn't displayed, which is why you see hellothere instead, but json_encode detects it and escapes it as it does any other special character, which is why it is replaced by the visible \u0000 sequence.
As I see it, json_encode is encoding the string perfectly: the \u0000 is there to do its job of reproducing the input string in JSON-encoded form. You don't have to touch its output. If you don't want that \u0000 there, you should fix the input instead.
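A minimal reproduction of the string from the question, showing both the JSON view and the split on the raw NUL byte:

```php
<?php
$s = "hello\0there"; // 11 bytes: "hello", a NUL byte, "there"

echo json_encode($s), PHP_EOL; // "hello\u0000there"
$parts = explode("\0", $s);    // ['hello', 'there']
```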
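If the NUL byte is at the start or end of the string, you can simply call trim($str) without giving it a character list, since the default list includes "\0". It does not remove a NUL byte in the middle of the string, though.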
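A short check of what trim() does and does not remove:

```php
<?php
// trim()'s default character list includes "\0", but it only strips the ends.
var_dump(trim("\0hello\0") === "hello");           // bool(true)
var_dump(trim("hello\0there") === "hello\0there"); // bool(true): the inner NUL is untouched
```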
\uXXXX is the Unicode character with code XXXX (hexadecimal).
For example, see: http://msdn.microsoft.com/en-us/library/aa664669(v=vs.71).aspx
If you really get 0000, then it's just the character with code 0.
I came across this issue today and sorted it out by removing \u0000 from the JSON-encoded array with str_replace before sending it back to the client.
echo str_replace('\\u0000', "", json_encode($send));
In my case, I found the symbol inside a serialized Laravel job's JSON payload, something like s:8:"\0*\0order"; (or s:8:"\u0000*\u0000order"; once JSON-encoded), which meant that the serialized object's property order was protected at the moment of serialization.
Just in case anyone needs to apply it to the whole array:
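The "\0*\0" prefix can be reproduced without Laravel. ExampleJob below is a hypothetical class standing in for a queued job with a protected property; PHP's serialize() marks protected property names with NUL, *, NUL.

```php
<?php
// Hypothetical class with a protected property.
class ExampleJob
{
    protected $order = 42;
}

$serialized = serialize(new ExampleJob());
// O:10:"ExampleJob":1:{s:8:"\0*\0order";i:42;}  where \0 is a NUL byte

echo json_encode($serialized), PHP_EOL; // the NUL bytes appear as \u0000
```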
$data = (array)json_decode(str_replace('\u0000*\u0000', '', json_encode($data)));
Try explode("\u{0000}", $array[0]); (PHP 7.0+), making sure you use double quotes and the braces. With single quotes, or without the braces, PHP keeps the literal 6-character sequence \u0000.
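A sketch of the PHP 7.0+ code point escape, on the same kind of string as in the question:

```php
<?php
$s = "hello\0there";

// PHP 7.0+ code point escape: the braces are required.
var_dump("\u{0000}" === "\0");     // bool(true)
$parts = explode("\u{0000}", $s);  // ['hello', 'there']

// Single quotes keep the six literal characters.
var_dump(strlen('\u0000'));        // int(6)
```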
As others have mentioned, \u0000 is the Unicode NUL character.

quoted_printable_decode() replaces wrong strings

I'm running quoted_printable_decode() on HTML content that is stored in a database and contains a lot of sequences like =C5=DD= etc.
However, I also have this string in the HTML which I did not mean to replace:
link
Since it has =b in it, it replaces it as well.
Is there any way to avoid this?
Encode the = as =3D, which is its equivalent in quoted-printable encoding.
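A sketch of the round trip. The HTML fragment is hypothetical, because the original snippet from the question was lost. quoted_printable_encode() turns every literal "=" into "=3D", so decoding restores the text exactly.

```php
<?php
// Hypothetical HTML fragment containing a literal "=b".
$html = '<a href="page.php?a=b">link</a>';

$encoded = quoted_printable_encode($html);
echo $encoded, PHP_EOL; // <a href=3D"page.php?a=3Db">link</a>

var_dump(quoted_printable_decode($encoded) === $html); // bool(true)
```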
