quoted_printable_decode() replaces wrong strings - php

I'm running quoted_printable_decode() on HTML content that is stored in a DB and contains a lot of sequences like =C5=DD= etc.
However, I also have this string in the HTML which I did not mean to replace:
link
Since it has =b in it, it replaces it as well.
Is there any way to avoid this?

Encode the = as =3D, which is the equivalent in Quoted Printable.
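A minimal sketch of that suggestion, assuming the part being mangled is a link with a query string. The URL and variable names are hypothetical; quoted_printable_encode() and quoted_printable_decode() are the standard PHP functions.

```php
<?php
// Hypothetical HTML fragment whose literal "=" signs must survive decoding.
$html = '<a href="page.php?id=5">link</a>';

// Encoding turns every literal "=" into "=3D".
$encoded = quoted_printable_encode($html);

// Decoding the properly encoded text restores the original markup.
$decoded = quoted_printable_decode($encoded);

var_dump($decoded === $html);
```

If the stored content was never encoded consistently in the first place, the literal = signs have to be turned into =3D before decoding, which is what the answer above describes.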

Related

Output PHP string to show escaped characters

In PHP, is it at all possible to output the contents of a string to show any escaped characters that may be contained within the string? I get that the whole point of escaping characters is so that they aren't treated in the usual way. But I would still like to be able to view the raw contents of a string so I can see for myself exactly how characters like \n and \r, etc. are represented. Does PHP have a method for doing this?
Use json_encode() to encode the string as JSON. JSON string literals use much the same backslash escapes (\n, \r, \t, \\, \") as PHP double-quoted strings, because both notations descend from C string literals, so the output shows control characters in their escaped form.
If you use single quotation marks, escape sequences are not interpreted, so the literal text is printed.
E.g. echo 'this\n'; will output this\n, whereas echo "this\n"; will output this followed by a newline.
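A short sketch of both ideas, assuming the goal is just to inspect a string while debugging. The sample string is made up; addcslashes() is shown as an alternative that is not mentioned in the answers above.

```php
<?php
// Sample string containing real control characters (CR, LF, TAB).
$text = "first line\r\nsecond\tline";

// json_encode() renders control characters as backslash escapes
// and wraps the result in double quotes.
$visible = json_encode($text);
echo $visible, PHP_EOL;

// addcslashes() with the range \0..\37 escapes all ASCII control
// characters without adding surrounding quotes.
$escaped = addcslashes($text, "\0..\37");
echo $escaped, PHP_EOL;
```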

Using Regular Expressions with user input in PHP

I was wondering if anybody knew how to get around this problem.
I am gathering user input from a HTML form which is then posted using htmlspecialchars into PHP to avoid issues when using quotes/etc...
However, I also want to run server-side validation checks on the data being gathered through regular expressions - though I'm not sure how to go about this.
So far, I have thought of decoding the htmlspecialchars output, but because I am going to be using the strings straight away, I worry that the code could break after I run this conversion. E.g.: let's say the user entered a double quote, ", into a field. This would be converted to &quot;, and if I then decode this and use it in a variable, it could end up like $string = """;, which is going to give me issues.
Any advice on this would be greatly appreciated!
You seem to misunderstand the difference between data and how this data is altered to be parseable in a certain context.
A php string can contain any data. What is stored in this string is the "raw" form: the form in which we want to manipulate the data if needed.
In certain contexts, not all characters are valid. For example, in an HTML textarea, the < and > characters may not be used directly, because they are special characters. We still want to be able to use these characters, so we escape them. By escaping a special character, it loses its special meaning. In the context of an HTML textarea, the < character is escaped as the sequence &lt;. Unlike the < character, this escaped sequence does not have a special meaning in HTML, and thus if we send the following sequence to the browser, it knows how to parse that sequence and display the right thing: <textarea>&lt;</textarea>. When we talk about what data this textarea contains, we do not say that it contains &lt;, but instead we say that it contains <.
As you said, in a PHP script, in a double-quoted string, the " character has a special meaning. This has only to do with parsing: PHP simply does not know how to parse the sequence $str = """;. If we want a double quote inside such a double-quoted string, we need to escape it by prepending a \. To make a string containing a single double quote, using the double-quoted notation, you would write $str = "\"";.
However, none of this matters here. You are taking input from an HTML form. When you click the submit button, the browser reads what is in the textarea (and, presumably, decodes it from HTML). The browser then encodes it as dictated by the form tag and sends it to the server. The server then decodes the blob of text back into its raw form. That data is passed to PHP, and it is this form you will encounter in $_POST['myTextarea'].
In conclusion: if data is encoded, work out for which context it was encoded and decode it based on that context. You do not need to escape for PHP quoted strings, because you are working on internal strings; there is nothing to parse. Remember that when you are going to use the data somewhere, you should make sure that all characters that are special in that particular context are escaped.
I suppose that htmlspecialchars() is called after the form is posted to PHP. The simplest solution is then to match against the regular expression first and only afterwards call htmlspecialchars().
Also, if you have a string encoded with htmlspecialchars(), then after decoding it with htmlspecialchars_decode() the value PHP holds internally is the same as what the literal "\"" would produce, so you break nothing. There is a big difference between how you write strings by hand in a PHP file and how PHP handles them internally. You really don't need to worry about this.
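The order described in these answers (validate the raw value, escape only when producing HTML) can be sketched as follows. The sample value stands in for a posted field, and the validation rule is a made-up example, not one taken from the question.

```php
<?php
// Stand-in for a posted value such as $_POST['comment'] (hypothetical field name).
$raw = 'She said "hi" <now>';

// 1. Validate the raw value. Hypothetical rule: 1 to 200 characters,
//    none of them ASCII control characters.
$isValid = preg_match('/^[^\x00-\x1F]{1,200}$/', $raw) === 1;

// 2. Escape only at the point where the value is written into HTML.
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Decoding returns exactly the raw value. The quotes inside it are plain
// data at this point, so no PHP string-literal escaping is involved.
$roundTrip = htmlspecialchars_decode($forHtml, ENT_QUOTES);

var_dump($isValid, $forHtml, $roundTrip === $raw);
```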

converting special characters in HTML into the appropriate coding for PHP

I am making a website where one fills out a form and it creates a PDF. The user will be able to put in diacritic and special characters. The way I am sending the characters to the PHP, those characters will come into the PHP as HTML coded characters, e.g. &agrave;. I need to change this to whatever it is PHP will read so that when I put it through the PDF maker we have, it shows the diacritic character and not the HTML code for it.
I wrote a test to try this out but I haven't been able to figure it out. If I have to I will end up writing an array for every possible character they can use and translate the incoming string but I am trying to find an easier solution.
Here is the code of my test:
$title = "Test of Title for use With This Project and it should also wrap because it is s&ograve; long! Acutally it is even longer than previously expected!";
$ti = htmlspecialchars_decode($title);
I have been attempting to use htmlspecialchars_decode() to convert it, but it still comes out as &ograve; and not ò. Is there an easy way to do this?
See the documentation which tells you it won't touch most of the characters you care about and to use html_entity_decode instead.
Use the html_entity_decode function instead of htmlspecialchars_decode (which only decodes the special HTML characters, i.e. entities such as &amp;, &quot;, &lt; and &gt;, not all entities).
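A small sketch of the difference between the two functions, assuming the incoming text is meant to end up as UTF-8. The sample string is shortened from the question; the flags and charset argument are choices made for this illustration.

```php
<?php
// Shortened sample input containing a named entity and an escaped ampersand.
$title = 's&ograve; long &amp; more';

// htmlspecialchars_decode() only handles the few special characters,
// so &ograve; is left untouched.
$partial = htmlspecialchars_decode($title);

// html_entity_decode() handles named entities as well.
$full = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($partial); // still contains &ograve;
var_dump($full);    // contains the UTF-8 character ò
```

Whether the PDF library expects UTF-8 or another encoding is not stated in the question, so the third argument may need to change.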

getting json_encode to not escape html entities

I send json_encoded data from my PHP server to an iPhone app. Strings containing HTML-special characters, like '&', are escaped by json_encode and sent as &amp;.
I am looking to do one of two things:
make json_encode not escape html entities. Doc says 'normal' mode shouldn't escape it but it doesn't work for me. Any ideas?
make the iPhone app un-escape html entities cheaply. The only way I can think of doing it now involves spinning up a XML/HTML parser which is very expensive. Any cheaper suggestions?
Thanks!
Neither PHP 5.3 nor PHP 5.2 touch the HTML entities.
You can test this with the following code:
<?php
header("Content-type: text/plain"); //makes sure entities are not interpreted
$s = 'A string with &amp; &#x6F8 entities';
echo json_encode($s);
You'll see the only thing PHP does is to add double quotes around the string.
json_encode does not do that. You have another component that is doing the HTML encoding.
If you use the JSON_HEX_ options, you can prevent any < or & characters from appearing in the output (they get converted to \u003C or similar JS string literal escapes), thus possibly avoiding the problem:
json_encode($s, JSON_HEX_TAG|JSON_HEX_AMP|JSON_HEX_QUOT)
though this would depend on knowing exactly which characters are being HTML-encoded further downstream. Maybe non-ASCII characters too?
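To illustrate both points, here is a sketch with a made-up sample string: default json_encode() leaves & and existing entities alone, while the JSON_HEX_ flags replace the characters with \u escapes.

```php
<?php
// Made-up sample: a raw ampersand, an existing entity, and angle brackets.
$s = 'Tom & Jerry &amp; <friends>';

// Default behaviour: only surrounding double quotes are added here.
$plain = json_encode($s);

// With the hex flags, & and < > become \u escapes, so a later
// HTML-encoding step has nothing left to touch.
$hex = json_encode($s, JSON_HEX_TAG | JSON_HEX_AMP);

echo $plain, PHP_EOL;
echo $hex, PHP_EOL;
```

A JSON parser on the receiving side turns the \u escapes back into the original characters.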
Based on the manual it appears that json_encode shouldn't be escaping your entities, unless you explicitly tell it to, in PHP 5.3. Are you perhaps running an older version of PHP?
Going off of Artefacto's answer, I would recommend using this header instead of plain text, since it is specifically designed for JSON data.
<?php
header('Content-Type: application/json'); //Also makes sure entities are not interpreted
$s = 'A string with &amp; &#x6F8 entities';
echo json_encode($s);
For more specific reasons to use this content type, check out the post "What is the correct JSON content type?"

Encoding $_GET[] values with PHP to make them browser safe

With PHP, which function is best to be used with $_GET[] values to make them browser safe?
I have read up on a few, such as htmlspecialchars() and htmlentities(). Should one of those be used, or is there another function that would work better?
Using htmlspecialchars suffices to encode the HTML special characters. htmlentities is only necessary if you want to use characters that cannot be encoded with the character encoding you are using.
But make sure to specify the quote_style parameter when you want to use the output in an attribute value quoted with single quotes like:
echo "<input type='text' value='".htmlspecialchars($_GET['foobar'], ENT_QUOTES)."'>";
And to specify the charset parameter when you’re using a character encoding other than ISO 8859-1:
echo htmlspecialchars($_GET['foobar'], ENT_QUOTES, 'UTF-8');
Use htmlspecialchars() when displaying $_GET variables in HTML, and use urlencode() when placing them in a URL.
htmlspecialchars() should be applied to every $_GET variable you output into your page.
If you're doing this just for safety (removing <script>'s etc) rather than because you need to make sure characters are encoded correctly (although that could definitely be a concern) it could be worth looking at strip_tags, which will remove tags entirely, rather than just encoding the < and > symbols. This is a bit nicer in some cases - <b>hello</b> will become just "hello", rather than having the tags converted to become visible.
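The three functions mentioned in these answers can be compared side by side. In this sketch the sample value stands in for something like $_GET['foobar'], and search.php is a hypothetical target URL.

```php
<?php
// Stand-in for a value read from $_GET (hypothetical input).
$value = '<b>hello</b> & "bye"';

// For output into HTML text or attribute values.
$html = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// For use as a query-string parameter in a URL.
$url = 'search.php?q=' . urlencode($value);

// Removes the tags entirely instead of making them visible.
$stripped = strip_tags($value);

echo $html, PHP_EOL, $url, PHP_EOL, $stripped, PHP_EOL;
```

Note that the strip_tags() result still contains & and ", so it would still need htmlspecialchars() before being written into a page.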
