using htmlentities with superglobal variables - php

I'm learning PHP from a book. The book says I should be careful when using superglobal variables, and that it's better to pass them through htmlentities like this:
$came_from = htmlentities($_SERVER['HTTP_REFERER']);
So I wrote this code:
<?php
$came_from=htmlentities($_SERVER['HTTP_REFERER']);
echo $came_from;
?>
However, the output of the code above was the same as without htmlentities(). It didn't change anything at all. I thought that it would change \ into something else. Did I use it wrong?

So, by default, htmlentities() encodes characters using ENT_COMPAT (converts double quotes and leaves single quotes alone) and ENT_HTML401. (As of PHP 8.1 the default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, which also converts single quotes.) Seeing as the backslash isn't part of the HTML 4.01 entity spec (as far as I can see anyway), it won't be converted.
If you specify the ENT_HTML5 flag, you get a different result:
php > echo htmlentities('abc\123');
abc\123
php > echo htmlentities('abc\123', ENT_HTML5);
abc&bsol;123
This is because backslash is part of the HTML5 spec. See http://dev.w3.org/html5/html-author/charref
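To see htmlentities() actually change something, the input has to contain characters that have entities. The following is a minimal sketch, not code from the book: the helper name encode_for_html and the sample referer string are made up for illustration, and the flags and charset are spelled out rather than left to the version-dependent defaults.

```php
<?php
// Hypothetical helper (not from the book): encode a value for HTML output
// with the flags and charset spelled out instead of relying on defaults.
function encode_for_html(string $value, int $flags = ENT_QUOTES | ENT_HTML401): string
{
    return htmlentities($value, $flags, 'UTF-8');
}

// A made-up referer containing markup, as a hostile client might send it.
$referer = 'http://example.com/?q=<script>alert("x")</script>';

echo encode_for_html($referer), "\n";
// http://example.com/?q=&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;

// A backslash has no HTML 4.01 entity, so it passes through unchanged.
echo encode_for_html('abc\123'), "\n";
// abc\123
```

In a real page the value would come from something like $_SERVER['HTTP_REFERER'] ?? '', since the header is optional and may be missing.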

Sorry. My previous answer was absolutely wrong. I was confusing it with something else. My apologies. Let me rephrase my answer:
htmlentities will convert special characters into their HTML entities. "<" for example will be converted to "&lt;". Your browser will automatically recognise this HTML entity and display it as "<" again. So you won't notice any difference on the rendered page; the change is only visible in the page source.
The reason for this is to prevent problems when your document is saved in an encoding other than UTF-8. Any characters left unencoded might get garbled in that case.

Related

Passing code through the post variable?

I am coding a small template editor and the problem I am having is that code keeps getting converted into other characters, such as:
&lt;?php
$hello = &quot;hello&quot;;
?&gt;
and it writes exactly that to the file, I want to write the actual code, php and html.
How can I accomplish this?
In this specific case you should run the contents of the file through the html_entity_decode function.
Description from the documentation -
Convert HTML entities to their corresponding characters
$str = '&lt;?php';
echo html_entity_decode($str);
Outputs - <?php
Your issue is that your PHP is calling htmlspecialchars(). This converts characters that could be an issue (such as <>) into their HTML-safe version. You can resolve this by removing the htmlspecialchars() function (not recommended, as it's probably there for a reason) or calling html_entity_decode() on the code you want to save to a file.
I wouldn't necessarily recommend using html_entity_decode. It's usually a bad idea to fix incorrectly-encoded text by just reversing the encoding. Instead, figure out why it's encoded incorrectly in the first place.
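One way to avoid decoding after the fact is to keep the submitted source raw and encode it only when it is printed back into the editor page. This is a rough sketch under that assumption; the function names template_for_textarea and save_template are hypothetical and not part of the asker's code.

```php
<?php
// Hypothetical helper names; a sketch of "store raw, encode only on display".

// Encode template source only at the moment it is placed inside HTML,
// for example between <textarea> tags in the editor form.
function template_for_textarea(string $source): string
{
    return htmlspecialchars($source, ENT_QUOTES, 'UTF-8');
}

// Write the submitted source to disk exactly as received, with no encoding.
function save_template(string $path, string $source): bool
{
    return file_put_contents($path, $source) !== false;
}

$source = '<?php $hello = "hello";';

// What goes into the HTML of the editor page:
echo template_for_textarea($source), "\n";
// &lt;?php $hello = &quot;hello&quot;;
```

The browser decodes the entities when the form is submitted, so the POST data already contains the real characters and can be handed to save_template() unchanged.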

Illegal non-standard quotes in XML

I'm allowing some user input on my website that is later read as XML. Every once in a while I get these weird single or double quotes like this ”’. These are directly copied from the source that broke my XML. I'm wondering if there is an easy way to correct these types of characters in my XML. htmlentities did not seem to touch them.
Where do these characters come from? I'm not even sure how I'd go about typing them out unintentionally.
EDIT- I forgot to clarify these quotes are not being used in attributes, but in the following way:
<SomeTag>User’s Input</SomeTag>
Don't disallow and/or modify foreign characters; that's just annoying for your users! This is just an encoding issue. I don't know what parser you're using to read the XML, but if it's reasonably sophisticated, you can solve your problem by including the following encoding pragma at the top of your XML files:
<?xml version="1.0" encoding="UTF-8"?>
There may also be a UTF-8 option in the parser's API.
Edit: I just read that you're reading the XML directly in a browser. Most browsers listen to the encoding pragma!
Edit 2: Apparently, those quotes aren't even legal in UTF-8, so ignore what I said above. Instead, you might find what you're looking for here, where a similar problem is being discussed.
Are these quotes being used in text content, or to delimit attributes? For attribute delimiters, XML requires typewriter quotes (single or double). Microsoft and other word-processing applications often try to be smart and replace typewriter quotes with typographical quotes, which is almost certainly the answer to the question "where are they coming from?".
If you need to get rid of them, a simple global replace using a text editor will do the job fine.
But you might try to work out first why they are causing a problem. Perhaps your data flow can't handle ANY non-ASCII characters, in which case that's a deeper problem that you really ought to fix (it would typically imply some unwanted transcoding is happening somewhere along the line).
If the input string is UTF-8 encoded, maybe you need to specify that to htmlentities(), for example:
$html = htmlentities( '”’', ENT_COMPAT, "utf-8" );
echo $html;
For me gives:
&rdquo;&rsquo;
whereas
$html = htmlentities( '”’' );
echo $html;
gets confused:
&acirc;??&acirc;??
If the input string is non-UTF-8, then you'd need to adjust the encoding arg for htmlentities() accordingly.
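If the goal is only to make the text safe inside an XML element, the curly quotes can usually stay as they are. Below is a minimal sketch, assuming the input is already valid UTF-8 and the XML file declares UTF-8; the helper name xml_text is made up for illustration.

```php
<?php
// Hypothetical helper: escape user text for use as XML element content.
// Assumes the input is already valid UTF-8.
function xml_text(string $text): string
{
    // ENT_XML1 limits the output to the entities XML itself predefines;
    // curly quotes are left as literal UTF-8 characters.
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<SomeTag>' . xml_text('User’s <b>Input</b> & more') . '</SomeTag>';

echo $xml, "\n";
// <?xml version="1.0" encoding="UTF-8"?>
// <SomeTag>User’s &lt;b&gt;Input&lt;/b&gt; &amp; more</SomeTag>
```

Only the characters that would break the markup are escaped, so nothing the user typed is lost or replaced.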
Stay away from Microsoft Office apps. Word, Excel etc. have a nasty habit of replacing matching pairs of single quotes and double quotes with non-standard "smart quotes".
These quote characters are truly non-standard and never made it into the official latin-1 character set. All the MS Office apps "helpfully" replace standard quote characters with these abominations.
Just google for "undoing smartquotes" or "convert smartquotes back" for hints, tips and regexes to get rid of these.
Use
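$s = 'User’s Input';
// The /u modifier makes preg_replace treat the curly quotes as whole UTF-8 characters.
$descriptfix = preg_replace('/[“”]/u', '"', $s);
$descriptfix = preg_replace('/[‘’]/u', "'", $descriptfix);
// Function calls are not interpolated inside strings, so concatenate the escaped value.
echo '<SomeTag>' . htmlentities($descriptfix, ENT_QUOTES, 'UTF-8') . '</SomeTag>';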

converting special characters in HTML into the appropriate coding for PHP

I am making a website where the user fills out a form and it creates a PDF. The user will be able to enter diacritics and special characters. The way I am sending the characters to PHP, they arrive as HTML-encoded characters, e.g. &agrave;. I need to convert these into something PHP can read, so that when I pass the text through our PDF maker it shows the diacritic character and not the HTML code for it.
I wrote a test to try this out but I haven't been able to figure it out. If I have to I will end up writing an array for every possible character they can use and translate the incoming string but I am trying to find an easier solution.
Here is the code of my test:
$title = "Test of Title for use With This Project and it should also wrap because it is s&ograve; long! Acutally it is even longer than previously expected!";
$ti = htmlspecialchars_decode($title);
I have been attempting to use htmlspecialchars_decode() to convert it, but it still comes out as &ograve; and not ò. Is there an easy way to do this?
See the documentation which tells you it won't touch most of the characters you care about and to use html_entity_decode instead.
Use the html_entity_decode function instead of htmlspecialchars_decode (which only decodes the special HTML characters such as &amp;, &quot;, &lt; and &gt;, not all entities).
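The difference between the two functions can be seen side by side. This is a small sketch with a shortened, made-up title string; the flags and the UTF-8 charset are passed explicitly, which is an assumption about how the PDF library expects its input.

```php
<?php
$title = 'it is s&ograve; long &amp; wide';

// htmlspecialchars_decode() only handles the few special characters
// such as &amp;, &quot;, &lt; and &gt;.
echo htmlspecialchars_decode($title), "\n";
// it is s&ograve; long & wide

// html_entity_decode() handles the full named-entity table.
echo html_entity_decode($title, ENT_QUOTES | ENT_HTML401, 'UTF-8'), "\n";
// it is sò long & wide
```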

weird characters such as &#8234; &#8236; &#8207;

My friend has been playing around with some language stuff on our site and our file names are being output with these characters now. Usually I'd wait for him to wake up, but this is a pretty big issue as we are getting e-mails about the weird characters in the file names.
You don't see the characters when echoed in HTML, but we have the names being output to a header, which does show the characters, like so:
header('Content-Disposition: attachment; filename="'.$title.'.'.strtolower($type).'";');
How can we stop these characters from displaying? They are also being stored in our database, in file names such as &#8234;asdfmovie&#8236;&#8207;. I have googled the codes but I can't find any results for them.
Does anyone know what they are? and how to avoid them?
Thank you
html_entity_decode()
http://php.net/manual/en/function.html-entity-decode.php
These are html entities that are valid in HTML. Your email client is actually encoding them into HTML entities (a double effect), which means that the actual entities are what you're seeing. Just make sure that anything passed into the email runs through the html_entity_decode() function.
These are HTML entities which can be decoded using html_entity_decode, like echo html_entity_decode($str, ENT_COMPAT, 'UTF-8').
It's wrong to store such values in the database though, as you are seeing. The values should be stored in their original form and only HTML entity encoded when necessary for outputting to HTML. Figure out where they're being HTML encoded and fix that. If you already have a database full of this nonsense... um, have fun reversing it. :o)
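For the download header specifically, decoding alone would still leave invisible characters in the file name. The following is a rough sketch, assuming the stray characters are Unicode bidirectional control marks that were stored as numeric entities; the helper name clean_filename is hypothetical.

```php
<?php
// Hypothetical helper: produce a clean file name for a Content-Disposition header.
// Assumes the stray characters are Unicode bidirectional control marks
// that were stored as numeric HTML entities.
function clean_filename(string $title): string
{
    // Turn entities such as &#8234; back into real UTF-8 characters.
    $decoded = html_entity_decode($title, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Remove directional marks and embedding/override controls:
    // U+200E, U+200F and U+202A through U+202E.
    $cleaned = preg_replace('/[\x{200E}\x{200F}\x{202A}-\x{202E}]/u', '', $decoded);

    // preg_replace() returns null on failure (e.g. invalid UTF-8); fall back.
    return $cleaned ?? $decoded;
}

echo clean_filename('&#8234;asdfmovie&#8236;&#8207;'), "\n";
// asdfmovie
```

Running the same cleanup once over the existing database rows, and on new input before it is stored, would keep the marks from reappearing.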

getting json_encode to not escape html entities

I send json_encoded data from my PHP server to an iPhone app. Strings containing HTML entities, like '&', are escaped by json_encode and sent as &amp;.
I am looking to do one of two things:
1. Make json_encode not escape HTML entities. The docs say 'normal' mode shouldn't escape them, but it doesn't work for me. Any ideas?
2. Make the iPhone app un-escape HTML entities cheaply. The only way I can think of doing it now involves spinning up an XML/HTML parser, which is very expensive. Any cheaper suggestions?
Thanks!
Neither PHP 5.3 nor PHP 5.2 touch the HTML entities.
You can test this with the following code:
<?php
header("Content-type: text/plain"); //makes sure entities are not interpreted
$s = 'A string with &amp; &#x6F8 entities';
echo json_encode($s);
You'll see the only thing PHP does is to add double quotes around the string.
json_encode does not do that. You have another component that is doing the HTML encoding.
If you use the JSON_HEX_ options you can prevent any < or & characters from appearing in the output (they'd get converted to \u003C or similar JS string literal escapes), thus possibly avoiding the problem:
json_encode($s, JSON_HEX_TAG|JSON_HEX_AMP|JSON_HEX_QUOT)
though this would depend on knowing exactly which characters are being HTML-encoded further downstream. Maybe non-ASCII characters too?
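The effect of those flags can be checked quickly. This is a small sketch with a made-up sample string, showing that json_encode() leaves & and < alone by default and that the hex escapes decode back to the original text.

```php
<?php
$s = 'Tom & Jerry <b>';

// Default flags: & and < are emitted as-is inside the JSON string.
echo json_encode($s), "\n";
// "Tom & Jerry <b>"

// With the JSON_HEX_* flags they become \uXXXX escapes, which any JSON
// parser turns back into the original characters.
$hex = json_encode($s, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_QUOT);
echo $hex, "\n";
// "Tom \u0026 Jerry \u003Cb\u003E"

var_dump(json_decode($hex) === $s);
// bool(true)
```

Because the escaped form contains no literal & or <, a downstream HTML-encoding step has nothing to rewrite, and the client needs no HTML parser.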
Based on the manual it appears that json_encode shouldn't be escaping your entities, unless you explicitly tell it to, in PHP 5.3. Are you perhaps running an older version of PHP?
Going off of Artefacto's answer, I would recommend using this header, it's specifically designed for JSON data instead of just using plain text.
<?php
header('Content-Type: application/json'); //Also makes sure entities are not interpreted
$s = 'A string with &amp; &#x6F8 entities';
echo json_encode($s);
Make sure you check out this post for more specific reasons to use this content type: What is the correct JSON content type?
