Do fputs() or fwrite() sometimes encode HTML special characters? - php

I am outputting a string that consists of HTML content to an HTML file, but in the HTML file the special characters are escaped (for example " becomes \" ). I've even used htmlspecialchars_decode before calling the write functions. The weird part is that on my computer the characters are not escaped, while on some servers they are. How can I deal with this problem?
Anticipated thanks!

You are probably suffering from Magic Quotes.
Check your phpinfo();
To clear Magic Quotes look into the discussion at php.net:
http://www.php.net/manual/en/function.stripslashes.php
Example (c) jeremysawesome:
// closure form; create_function() was removed in PHP 8.0
array_walk_recursive($_POST, function (&$val) { $val = stripslashes($val); });
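As a rough sketch of how the advice above could be applied safely, the snippet below strips slashes only when Magic Quotes are actually switched on. The helper name strip_slashes_recursive and the sample data are assumptions made for illustration; it needs PHP 5.3 or later for the closure.

```php
<?php
// Hypothetical helper: removes one level of backslash escaping from every
// string in a (possibly nested) array, in place.
function strip_slashes_recursive(array &$data)
{
    array_walk_recursive($data, function (&$value) {
        if (is_string($value)) {
            $value = stripslashes($value);
        }
    });
}

// Magic Quotes were removed in PHP 5.4, so only look for them on older versions.
$magicQuotesOn = version_compare(PHP_VERSION, '5.4.0', '<')
    && function_exists('get_magic_quotes_gpc')
    && get_magic_quotes_gpc();

if ($magicQuotesOn) {
    strip_slashes_recursive($_POST);
    strip_slashes_recursive($_GET);
    strip_slashes_recursive($_COOKIE);
}

// Demonstration on sample data shaped like a magic-quoted form field.
$sample = array('html' => '<a href=\\"page.html\\">link</a>');
strip_slashes_recursive($sample);
echo $sample['html'], "\n"; // <a href="page.html">link</a>
```

Checking the setting first matters because running stripslashes() on input that was never escaped would remove backslashes the user really typed.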

Related

PHP http_build_query generating a ® sign when I write reg, how to escape?

I have the following code:
$data1 = [
"user_number" => "423423", // unique_id
"reg_date" => "2013-01-20", // date of registration yyyy-mm-dd
];
echo http_build_query($data1);
This is generating the following string:
user_number=423423®_date=2013-01-20
As you can see, it converts "reg" to ®, breaking the API query. How to prevent it from doing that?
&reg is how you write a registered trademark symbol in HTML.
Your URL is fine, the problem is that you are interpreting it as a URL in HTML and not a URL in plain text.
Use htmlspecialchars to convert the string to HTML source code.
The file (physical file .php) you are coding may be in ISO-8859-1 or similar.
You need to convert your file to UTF-8 to avoid this problem.
You may also want to apply a function to convert special chars:
echo htmlspecialchars(http_build_query($data1));
echo htmlentities(http_build_query($data1));
Both will work.
The issue here is that &reg is encoded as ®.
By using functions to replace these special chars, it will render the way you want.
The online php interpreters probably use those, that's why we can't reproduce the issue.
echo http_build_query($data1, '', '&amp;');
&reg is how you write a registered trademark symbol in HTML.
That means your original code returns
user_number=423423&reg_date=2013-01-20
and when you output it to a browser, the browser renders the &reg part as ®.
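The answers above agree that the query string itself is fine and that only its display as HTML goes wrong. A minimal sketch of that distinction, with the variable names $query and $forHtml chosen here for illustration:

```php
<?php
$data1 = [
    'user_number' => '423423',     // unique_id
    'reg_date'    => '2013-01-20', // date of registration yyyy-mm-dd
];

// Raw query string: this is what an API call or HTTP client should receive.
// The separator is passed explicitly so the result does not depend on the
// arg_separator.output ini setting.
$query = http_build_query($data1, '', '&');

// HTML-escaped copy: only for printing the query string inside a web page.
$forHtml = htmlspecialchars($query, ENT_QUOTES, 'UTF-8');

echo $query, "\n";   // user_number=423423&reg_date=2013-01-20
echo $forHtml, "\n"; // user_number=423423&amp;reg_date=2013-01-20
```

Viewing the page source, or sending a text/plain content type, is another way to confirm that the raw string never contained a ® character.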

PHP - Replace JSON with the correct Unicode symbol

Ok, so I have some JSON, that when decoded, I print out the result. Before the JSON is decoded, I use stripslashes() to remove extra slashes. The JSON contains website links, such as https://www.w3schools.com/php/default.asp and descriptions like Hello World, I have u00249999999 dollars
When I print out the JSON, I would like it to print out
Hello World, I have $9999999 dollars, but it prints out Hello World, I have u00249999999 dollars.
I assume that the u0024 is not getting parsed because it has no backslash, though the thing is that the website links' forward slashes aren't removed through strip slashes, which is good - I think that the backslashes for the Unicode symbols are removed with stripslashes();
How do I get the PHP to automatically detect and parse the Unicode dollar sign? I would also like to apply this rule to every single Unicode symbol.
Thanks In Advance!
According to the PHP documentation on stripslashes(), it
un-quotes a quoted string.
This means that it removes the backslashes used for escaping characters (or Unicode sequences). Once they are gone, you can no longer tell whether a sequence such as "u0024" was meant to be a Unicode escape or was simply typed that way by your user.
Besides that, you will run into trouble when using stripslashes() on a JSON value that contains escaped quotes. Consider this example:
{
"key": "\"value\""
}
This will become invalid when using stripslashes() because it will then look like this:
{
"key": ""value""
}
This cannot be parsed because it is not a valid JSON object. When you don't use stripslashes(), all escape sequences, including Unicode ones such as \u0024, are converted by the JSON parser, so the decoded values already contain the real characters when you output them to the client.
Conclusion: I'd suggest not using stripslashes() when dealing with JSON, as it may break things (as seen in the previous example, but also in your problem).
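The point can be checked with a short sketch. The JSON string below is made-up sample data modelled on the question, not the asker's real payload:

```php
<?php
// Raw JSON as it might arrive: \u0024 is an escaped dollar sign,
// \" an escaped quote and \/ an escaped forward slash.
$json = '{"link":"https:\/\/www.w3schools.com\/php\/default.asp",'
      . '"text":"I have \u00249999999 \"dollars\""}';

// Decoding the untouched string resolves every escape sequence.
$decoded = json_decode($json, true);
echo $decoded['link'], "\n"; // https://www.w3schools.com/php/default.asp
echo $decoded['text'], "\n"; // I have $9999999 "dollars"

// Running stripslashes() first destroys the escapes and the JSON no longer parses.
$broken = json_decode(stripslashes($json), true);
var_dump($broken); // NULL
```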
Your assumption is correct: u0024 is not getting parsed because it has no backslash. You can use regex to add backslash back after the conversion.
It looks like you have UTF-8 encoded strings internally, PHP outputs them properly, but your browser fails to auto-detect the encoding (it decides for ISO 8859-1 or some other encoding).
The best way is to tell the browser that UTF-8 is being used by sending the corresponding HTTP header:
header("content-type: text/html; charset=UTF-8");
Then, you can leave the rest of your code as-is and don't have to html-encode entities or create other mess.
If you want, you can additionally declare the encoding in the generated HTML by using the <meta> tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
for HTML 4.01 and earlier, or
<meta charset="UTF-8">
for HTML5.
The HTTP header has priority over the <meta> tag, but the latter may be useful if the HTML is saved to disk and then read locally.
The main thing you have to understand is why you need to strip slashes at all.
And, if it really is necessary to strip slashes, how do you manage the encoding? It is probably a good idea to convert the Unicode symbols before stripping slashes, not after, using html_entity_decode.
Anyway, you can try to fix the problem with this workaround:
$string = "Hello World, I have u00249999999 dollars";
$string = preg_replace( "/u([0-9A-Fa-f]{4})/", "&#x$1;", $string ); // turn "u" + exactly 4 hex digits into a numeric entity
$string = html_entity_decode( $string, ENT_COMPAT, 'UTF-8' ); // convert to utf-8

using htmlentities with superglobal variables

I'm working on php with a book now. The book said I should be careful using superglobal variables, so it's better to use htmlentities like this.
$came_from = htmlentities($_SERVER['HTTP_REFERER']);
So, I wrote a code like this;
<?php
$came_from=htmlentities($_SERVER['HTTP_REFERER']);
echo $came_from;
?>
However, the display of the code above was the same without htmlentities(); It didn't change anything at all. I thought that it would change \ into something else. Did I use it wrong?
By default, htmlentities() encodes characters using ENT_COMPAT (converts double quotes and leaves single quotes alone) and ENT_HTML401. Since the backslash isn't part of the HTML 4.01 entity spec (as far as I can see, anyway), it won't be converted.
If you specify the ENT_HTML5 flag, you get a different result
php > echo htmlentities('abc\123');
abc\123
php > echo htmlentities('abc\123', ENT_HTML5);
abc&bsol;123
This is because backslash is part of the HTML5 spec. See http://dev.w3.org/html5/html-author/charref
Sorry. My previous answer was absolutely wrong. I was confusing this with something else. My apologies. Let me rephrase my answer:
htmlentities will convert special characters into their HTML entities. "<", for example, will be converted to "&lt;". Your browser will automatically recognise this HTML entity and display it as "<" again, so you won't notice any difference in the rendered page.
The reason for doing this is to prevent problems when saving your document in something other than UTF-8 encoding. Any characters not encoded might get mangled in that case.
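To make the effect visible without a browser decoding it again, a small sketch (the sample string is invented for illustration) can print the encoded form directly, for example on the command line:

```php
<?php
// Sample input with characters that are special in HTML, plus a backslash.
$input = '<a href="page.html">Tom & Jerry</a> C:\\temp';

// Default HTML 4.01 entity table: <, >, & and " are encoded, \ is left alone.
$encoded = htmlentities($input, ENT_COMPAT, 'UTF-8');
echo $encoded, "\n";
// &lt;a href=&quot;page.html&quot;&gt;Tom &amp; Jerry&lt;/a&gt; C:\temp

// Decoding mirrors what the browser shows, which is why the page looks unchanged.
var_dump(html_entity_decode($encoded, ENT_COMPAT, 'UTF-8') === $input); // bool(true)
```

Viewing the page source rather than the rendered page shows the same encoded text.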

Illegal non-standard quotes in XML

I'm allowing some user input on my website, that later is read in XML. Every once in a while I get these weird single or double quotes like this ”’. These are directly copied from the source that broke my XML. I'm wondering if there is an easy way to correct these types of characters in my xml. htmlentities did not seem to touch them.
Where do these characters come from? I'm not even sure how I'd go about typing them out unintentionally.
EDIT- I forgot to clarify these quotes are not being used in attributes, but in the following way:
<SomeTag>User’s Input</SomeTag>
Don't disallow and/or modify foreign characters; that's just annoying for your users! This is just an encoding issue. I don't know what parser you're using to read the XML, but if it's reasonably sophisticated, you can solve your problem by including the following encoding pragma at the top of your XML files:
<?xml version="1.0" encoding="UTF-8"?>
There may also be a UTF-8 option in the parser's API.
Edit: I just read that you're reading the XML directly in a browser. Most browsers listen to the encoding pragma!
Edit 2: Apparently, those quotes aren't even legal in UTF-8, so ignore what I said above. Instead, you might find what you're looking for here, where a similar problem is being discussed.
Are these quotes being used in text content, or to delimit attributes? For attribute delimiters, XML requires typewriter quotes (single or double). Microsoft and other word-processing applications often try to be smart and replace typewriter quotes with typographical quotes, which is almost certainly the answer to the question "where are they coming from?".
If you need to get rid of them, a simple global replace using a text editor will do the job fine.
But you might first try to work out why they are causing a problem. Perhaps your data flow can't handle ANY non-ASCII characters, in which case that's a deeper problem that you really ought to fix (it would typically imply that some unwanted transcoding is happening somewhere along the line).
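One way to investigate is to check whether the parser itself copes with typographic quotes once the encoding is declared. The sketch below assumes the SimpleXML extension is available and uses explicit byte sequences so that it does not depend on how the PHP file is saved; the Windows-1252 case is a guess at where such input might come from.

```php
<?php
libxml_use_internal_errors(true); // collect parse errors instead of printing warnings

// "\xE2\x80\x99" is the UTF-8 encoding of the right single quote (U+2019).
$utf8 = '<?xml version="1.0" encoding="UTF-8"?>'
      . "<SomeTag>User\xE2\x80\x99s Input</SomeTag>";

$doc = simplexml_load_string($utf8);
echo (string) $doc, "\n"; // User’s Input

// "\x92" is the same quote in Windows-1252. Inside a document that claims to be
// UTF-8 it is an invalid byte, which a conforming parser is expected to reject.
$cp1252 = '<?xml version="1.0" encoding="UTF-8"?>'
        . "<SomeTag>User\x92s Input</SomeTag>";

$bad = simplexml_load_string($cp1252);
echo $bad === false ? "parse failed\n" : "parsed\n";
```

If the second case is what breaks, converting the input to UTF-8 before writing the XML (for instance with mb_convert_encoding) addresses the cause rather than the symptom.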
If the input string is UTF-8 encoded, maybe you need to specify that to htmlentities(), for example:
$html = htmlentities( '”’', ENT_COMPAT, "utf-8" );
echo $html;
For me this gives:
&rdquo;&rsquo;
whereas
$html = htmlentities( '”’' );
echo $html;
gets confused:
â??â??
If the input string is non-UTF-8, then you'd need to adjust the encoding arg for htmlentities() accordingly.
Stay away from Microsoft Office apps. Word, Excel etc. have a nasty habit of replacing matching pairs of single quotes and double quotes with non-standard "smart quotes".
These quote characters are truly non-standard and never made it into the official Latin-1 character set. All the MS Office apps "helpfully" replace standard quote characters with these abominations.
Just google for "undoing smartquotes" or "convert smartquotes back" for hints, tips and regexes to get rid of these.
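Use
$s = 'User’s Input';
// the /u modifier makes the character classes match whole UTF-8 characters
$descriptfix = preg_replace('/[“”]/u', '"', $s);
$descriptfix = preg_replace('/[‘’]/u', "'", $descriptfix);
// a function call is not evaluated inside a string, so concatenate the result
echo '<SomeTag>' . htmlentities($descriptfix, ENT_COMPAT, 'UTF-8') . '</SomeTag>';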

getting json_encode to not escape html entities

I send json_encoded data from my PHP server to an iPhone app. Strings containing HTML entities, like '&', are escaped by json_encode and sent as &amp;.
I am looking to do one of two things:
make json_encode not escape html entities. Doc says 'normal' mode shouldn't escape it but it doesn't work for me. Any ideas?
make the iPhone app un-escape html entities cheaply. The only way I can think of doing it now involves spinning up a XML/HTML parser which is very expensive. Any cheaper suggestions?
Thanks!
Neither PHP 5.3 nor PHP 5.2 touch the HTML entities.
You can test this with the following code:
<?php
header("Content-type: text/plain"); //makes sure entities are not interpreted
$s = 'A string with &amp; &#x6F8 entities';
echo json_encode($s);
You'll see the only thing PHP does is to add double quotes around the string.
json_encode does not do that. You have another component that is doing the HTML encoding.
If you use the JSON_HEX_ options you can avoid that any < or & characters appear in the output (they'd get converted to \u003C or similar JS string literal escapes), thus possibly avoiding the problem:
json_encode($s, JSON_HEX_TAG|JSON_HEX_AMP|JSON_HEX_QUOT)
though this would depend on knowing exactly which characters are being HTML-encoded further downstream. Maybe non-ASCII characters too?
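A short sketch of what the JSON_HEX_ options change, using an invented sample string; it needs PHP 5.3 or later, where these constants were introduced:

```php
<?php
$s = '<b>Tom & "Jerry"</b>';

// Default encoding: only the quotes and the forward slash are escaped.
$plain = json_encode($s);
echo $plain, "\n"; // "<b>Tom & \"Jerry\"<\/b>"

// With the JSON_HEX_ flags, <, >, & and " become \uXXXX escapes, so nothing
// is left for a later HTML-encoding step to rewrite.
$hex = json_encode($s, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_QUOT);
echo $hex, "\n"; // "\u003Cb\u003ETom \u0026 \u0022Jerry\u0022\u003C\/b\u003E"

// Any JSON parser turns the escapes back into the original characters.
var_dump(json_decode($hex) === $s); // bool(true)
```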
Based on the manual it appears that json_encode shouldn't be escaping your entities, unless you explicitly tell it to, in PHP 5.3. Are you perhaps running an older version of PHP?
Going off of Artefacto's answer, I would recommend using this header, it's specifically designed for JSON data instead of just using plain text.
<?php
header('Content-Type: application/json'); //Also makes sure entities are not interpreted
$s = 'A string with &amp; &#x6F8 entities';
echo json_encode($s);
See the post "What is the correct JSON content type?" for more specific reasons to use this content type.
