Why is htmlspecialchars not converting quotes? - php

I am trying to convert a single quote into its HTML entity with htmlspecialchars, but it's not working. I am not sure what I am doing wrong here. Below is how I am using the function:
echo htmlspecialchars("Housemade Mac N' Cheese",ENT_QUOTES);
Any help would be really appreciated!

The following comes from my own experiment.
Try this: it replaces every single quote with ߴ, the Unicode character NKO HIGH TONE APOSTROPHE (U+07F4). PHP and JS treat it as a regular character, so no headaches here.
Of course the content of $string is going to be altered, so this is not a perfect solution, only a workaround.
echo htmlspecialchars(preg_replace("/'/","ߴ",$string),ENT_QUOTES);
To understand it better: in your example, escaping the quote by hand as below should be OK (note the single-quoted string; inside double quotes the backslash would be kept literally). But what if the string changes to something you don't know in advance? Will it contain single quotes? One, two, three? Where? It looks simple, but it is actually a very complex case!
echo htmlspecialchars('Housemade Mac N\' Cheese',ENT_QUOTES);
Philosophical annexe, this part can be skipped...
The most complex things are hiding in the simplest.
Personal reflexion 💡
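For what it's worth, a quick command-line check suggests the call from the question already converts the quote. A browser renders &#039; back as ', so the entity only shows up in the page source or on the command line. This is a minimal sketch, assuming a reasonably recent PHP; the variable name $encoded is only for illustration.

```php
<?php
// The exact call from the question: ENT_QUOTES converts both " and '.
$encoded = htmlspecialchars("Housemade Mac N' Cheese", ENT_QUOTES);
echo $encoded, "\n"; // Housemade Mac N&#039; Cheese

// With the HTML5 doctype flag the named entity is used instead.
$html5 = htmlspecialchars("Housemade Mac N' Cheese", ENT_QUOTES | ENT_HTML5);
echo $html5, "\n"; // Housemade Mac N&apos; Cheese
```

If the entity is missing even in the page source, the string is probably being decoded again somewhere after this call.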


Proper Storing of DataURIs in a Variable

There is probably a more elegant way to phrase the title, but that's the best I could come up with. Frankly, I feel silly for not knowing the answer and having to ask, but that's how you learn, so no shame necessary.
I have stored data URIs such as:
data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADAAAAAwCAYAAABXAvmHAAAE5klEQVR4nO1ZS0hjVxg+9xEJVFAnCzcuurVC3XfTWpqMMTGJbrSFQumAxhERFdE4YK4Bo5kY8zAvo0434nKghW46pXRhH5sKBXHXdikyDjGK0ETj7f/f5KY3N68bySSh9YPP/5x7Xv9/znfOubkS8oAH/L9B1XtAnufVYNTZbBqYpCgqVaKuCgySxiwwBXVvpHXYt+hrKajn5+fjNzc3hGEYolarT1ZXV0NLS0t74FwSK4DjLQ6H48nCwsJkKpXqSafTQl2v19sKxQ0PIAcM4vr6+r2rq6uQzWZ7ur+//8nd3R1ZXFz8Hp73YDnLsoSmaQwKm/DyPuj6u03+drvd7Z2dnf1dXV2fazSaX2DmSSKR6Dk+Pn55cnLy8vLysgeftbW1/Qz1Puvo6ND5fL53iGz2EXXfA3Kgzp1O59jZ2VkQZxmJzkNgT+12+w6kb8u1b3gACAwCJPRjPB7/APM48y6X68NKziMaIaEC4MnS2toaQr0jIR1U4jyioZtYCtiw57hZEbe3t+dK2zXFCiC6u7uP0eIeWFlZ+Ulpu6ZZgZGRkcTh4WHV7ZpmBUiRM14JGGnm4OCAHx4e5pBDaIeGuCGgBa3FwlmAZrRmM2c2WzgTWpOJM5nM3CDawUHBGsEajUbBGsAaDECjgRswAAfQDnD6AaAerZ7r1+u5346Onp2/fi348ceffz3T6rSc7nE/8DGn1ek4nS5jtVod9+rVdyuiz3kSQv2Njo4qnyYxQ4lpXpop3qbog8zDmZkZwQerdbzk+F6vLy9fEAACr/O8cSQD8lKvs/VzSfxDSQLgC5pncnx5vcAplOtaCpWKlYyfQd4euJcICSk62D16yU1guftVPhZbrFSs07//phae5fDtpx3KKpabEVlZUQmJuKvJzFaPcsPKJZQfgCyRfhsBlNK/uF8q6FGRhETUfgVq0GGzSogvMn6pelIUSCjD7HFarwCqOMbkQdZXQhXOf2V9lJWQmMqcw/VaAekwFTexLC+TEC+KUcjXOoDip091g1QlodrcsMqhZBMrkpBY5dcnj/IqU8J1T7LvO7lWxUx+eUnty6+lyqgsIQkoaQk6gY5nH2a+HuT3WvWC3WOF5atEy0qFXlmGFsjkmPmxzTJMljRRscy/eUgzbMYWMNtWVUAog7dLlYQiVLLn+eUV3kb3Xrwgu3t7And2d0lsBxkj27FtEt1GRkkkGiXhCDJCQuEwCYbCJBQKkWAwRLa2gsAtEghsEX8gQHwBP/H5/cTr9xGvz0c2vV7i8W6SjU2wHrAeD3FveIS0OMPP3Rtk3fWcrK27yNraOnECV51rQGfBvmyK70IIcJydmJgQvrxFIhGWoqi0knbN9Jv4XmiaAGZnZ98X09PT071K2zVNACCh9qxF0660XdPsgcnJyQD8Fp7CNMMwgXA4PN1onxRjeXlZY7Va42NjYzwS03Nzc4p+fzZ8BWw226NEIvFDOp3upbI3I34VoWn695aWlr5AIBAv177uewA0TjscDs3U1FQf8MuLi4sjdF5whqZnkRgIBNGbTCaPYDW+GB8f/6hUf434Nkqdnp6ei//MEP+hAbp3wGUofLWC+6ANjB3K3gV+hc9AWiQWixUopqGnEDoOrxnfAD+GTWuHPI+MRqMc2D7g1xX7qIejUqCEJOMKL1/odL39eMAD/iv4B7EvTT8yUT9VAAAAAElFTkSuQmCC
in a variable by simply doing $some_var = "INSERT_ABOVE"; and that has usually worked fine. In a piece of code I am working on right now, however, the code breaks whenever the data URI shown above is assigned this way, and deleting that line fixes the code.
So I am not sure why this particular data URI breaks the code. Shouldn't anything inside double quotes be treated as a literal string? Is there a best practice or recommended way of storing data URIs in variables for use throughout the code, so it doesn't break?
Any thoughts and/or suggestions would be greatly appreciated, TIA.
PS. This wasn't just due to the backslash at the end; the string by itself also wasn't being respected. Now it is being stored in a variable like this, thanks. Since the title most accurately describes what was being achieved, I think it's more appropriate.
Use heredoc string quoting. It's another way to represent strings in PHP, and it eliminates the problem of the closing quote being escaped by the string data.
$str = <<<EOD
data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADAAAAAwCAYAAABXAvmHAAAE5klEQVR4nO1ZS0hjVxg+9xEJVFAnCzcuurVC3XfTWpqMMTGJbrSFQumAxhERFdE4YK4Bo5kY8zAvo0434nKghW46pXRhH5sKBXHXdikyDjGK0ETj7f/f5KY3N68bySSh9YPP/5x7Xv9/znfOubkS8oAH/L9B1XtAnufVYNTZbBqYpCgqVaKuCgySxiwwBXVvpHXYt+hrKajn5+fjNzc3hGEYolarT1ZXV0NLS0t74FwSK4DjLQ6H48nCwsJkKpXqSafTQl2v19sKxQ0PIAcM4vr6+r2rq6uQzWZ7ur+//8nd3R1ZXFz8Hp73YDnLsoSmaQwKm/DyPuj6u03+drvd7Z2dnf1dXV2fazSaX2DmSSKR6Dk+Pn55cnLy8vLysgeftbW1/Qz1Puvo6ND5fL53iGz2EXXfA3Kgzp1O59jZ2VkQZxmJzkNgT+12+w6kb8u1b3gACAwCJPRjPB7/APM48y6X68NKziMaIaEC4MnS2toaQr0jIR1U4jyioZtYCtiw57hZEbe3t+dK2zXFCiC6u7uP0eIeWFlZ+Ulpu6ZZgZGRkcTh4WHV7ZpmBUiRM14JGGnm4OCAHx4e5pBDaIeGuCGgBa3FwlmAZrRmM2c2WzgTWpOJM5nM3CDawUHBGsEajUbBGsAaDECjgRswAAfQDnD6AaAerZ7r1+u5346Onp2/fi348ceffz3T6rSc7nE/8DGn1ek4nS5jtVod9+rVdyuiz3kSQv2Njo4qnyYxQ4lpXpop3qbog8zDmZkZwQerdbzk+F6vLy9fEAACr/O8cSQD8lKvs/VzSfxDSQLgC5pncnx5vcAplOtaCpWKlYyfQd4euJcICSk62D16yU1guftVPhZbrFSs07//phae5fDtpx3KKpabEVlZUQmJuKvJzFaPcsPKJZQfgCyRfhsBlNK/uF8q6FGRhETUfgVq0GGzSogvMn6pelIUSCjD7HFarwCqOMbkQdZXQhXOf2V9lJWQmMqcw/VaAekwFTexLC+TEC+KUcjXOoDip091g1QlodrcsMqhZBMrkpBY5dcnj/IqU8J1T7LvO7lWxUx+eUnty6+lyqgsIQkoaQk6gY5nH2a+HuT3WvWC3WOF5atEy0qFXlmGFsjkmPmxzTJMljRRscy/eUgzbMYWMNtWVUAog7dLlYQiVLLn+eUV3kb3Xrwgu3t7And2d0lsBxkj27FtEt1GRkkkGiXhCDJCQuEwCYbCJBQKkWAwRLa2gsAtEghsEX8gQHwBP/H5/cTr9xGvz0c2vV7i8W6SjU2wHrAeD3FveIS0OMPP3Rtk3fWcrK27yNraOnECV51rQGfBvmyK70IIcJydmJgQvrxFIhGWoqi0knbN9Jv4XmiaAGZnZ98X09PT071K2zVNACCh9qxF0660XdPsgcnJyQD8Fp7CNMMwgXA4PN1onxRjeXlZY7Va42NjYzwS03Nzc4p+fzZ8BWw226NEIvFDOp3upbI3I34VoWn695aWlr5AIBAv177uewA0TjscDs3U1FQf8MuLi4sjdF5whqZnkRgIBNGbTCaPYDW+GB8f/6hUf434Nkqdnp6ei//MEP+hAbp3wGUofLWC+6ANjB3K3gV+hc9AWiQWixUopqGnEDoOrxnfAD+GTWuHPI+MRqMc2D7g1xX7qIejUqCEJOMKL1/odL39eMAD/iv4B7EvTT8yUT9VAAAAAElFTkSuQmCC\
EOD;
echo $str;
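A heredoc still interprets escape sequences and $variables, so if the data should be taken completely literally, a nowdoc (the same syntax with the identifier in single quotes, available since PHP 5.3) may be the safer choice. The sketch below uses a short made-up payload in place of the long data URI, keeping only the trailing backslash that caused the trouble.

```php
<?php
// Hypothetical short payload standing in for the real data URI.
// Nowdoc: nothing inside is parsed, so the trailing backslash is just data.
$uri = <<<'EOD'
data:image/png;base64,AAAA\
EOD;

var_dump(substr($uri, -1) === '\\'); // bool(true): the backslash survived
echo strlen($uri), "\n";             // 27
```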

How to get &curren to display literally, not as an HTML entity

I'm using php to look at an XML file that has a URL in it. The URLs look something like this:
https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
When I echo out the URLs, the "&curren" shows up as "¤" (AKA #164, A4 or currency symbol) and the links don't work. This happens even though there isn't a closing semicolon for it. What is the cleanest way to make "&curren" display literally?
Funny enough I ran into the same problem just now and I found this answer. However, I found another solution which might even be better!
Simply put the variable at the beginning of your query string, and you will avoid the &curren completely.
Do:
https://site.com/bacon_report?currentDimension=2&Id=1&report=1&param=1
instead of:
https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
Use the php function urlencode:
urlencode("https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1");
will output
https%3A%2F%2Fsite.com%2Fbacon_report%3FId%3D1%26report%3D1%26currentDimension%3D2%26param%3D1
The problem here is escaping: you need to escape the "&" characters. In XML, special characters such as <, >, ', " and & should be escaped.
Escape it properly as
https://example.com/bacon_report?Id=1&amp;report=1&amp;currentDimension=2&amp;param=1
...just like in HTML:
WRONG - no escaping: &currentDimension=2
CORRECT - correct escape sequence: &amp;currentDimension=2
So the cleanest way to show "&curren" in HTML/XML is to properly escape the ampersand and write it as "&amp;curren".
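In PHP, one way to apply that escaping at output time is htmlspecialchars. This is only a sketch using the URL from the question; the variable names are illustrative.

```php
<?php
$url = 'https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1';

// Escape for HTML output: every & becomes &amp;, so the browser
// no longer sees "&curren" as the start of an entity.
$escaped = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";
// https://site.com/bacon_report?Id=1&amp;report=1&amp;currentDimension=2&amp;param=1
```

The browser decodes &amp; back to & when it follows the link, so the URL that is actually requested is unchanged.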
I think that in this case it is best to use htmlentities, because with urlencode you get
https%3A%2F%2Fexample.com%2Fbacon_report%3FId%3D1%26report%3D1%26currentDimension%3D2%26param%3D1
and after applying urldecode you are back to the raw &curren, which still shows up as the ¤ symbol,
whereas with htmlentities the URL comes out clean:
https://example.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
I came across this issue while working on technical documentation (in Markdown which gets converted to HTML).
To solve the issue I used a zero-width space character which I copied and pasted from between these brackets (​). That way it appears that there is no space and can include the below without any issues:
/search?query=1&currentLonLat=-74.600291,40.360869

What could the purpose of replacing %20 with spaces before doing PHP rawurlencode() be?

It's a pretty silly question, sorry. There is a big and rather complex system that has a bug and I managed to track it down to this piece
return str_replace('%2F', '/', rawurlencode(str_replace('%20', ' ', $key)));
There is a comment explaining why slashes are replaced - to preserve path structure, e.g. encoded1/encoded2/etc. However there is no explanation whatsoever why %20 is replaced with space and that part is the direct cause of a bug. I am tempted to just remove str_replace() but it looks like it was placed there for some reason and I have a feeling that I'll break something else by doing this. Has anyone encountered anything similar? Perhaps it's a dirty fix for some PHP bug? Any guesses and insights are highly appreciated!
Doing so would prevent %20 (an encoded space) from being encoded again to %2520. However, it only prevents double-escaped spaces; other special characters would still get double encoded.
This is a sign of bad code; strings that are passed into this function shouldn't be allowed to have encoded characters in the first place.
I would recommend creating unit tests that cover all referencing code and then refactor this function to remove the str_replace() to make sure it doesn't break the tests.
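To make the double-encoding point concrete, here is a small sketch that wraps the line from the question in a function. The name encodeKey and the sample keys are made up for illustration.

```php
<?php
// Hypothetical wrapper; the body is the expression from the question.
function encodeKey(string $key): string
{
    return str_replace('%2F', '/', rawurlencode(str_replace('%20', ' ', $key)));
}

echo encodeKey('dir/my file'), "\n";   // dir/my%20file
echo encodeKey('dir/my%20file'), "\n"; // dir/my%20file (pre-encoded space is not encoded twice)
echo encodeKey('dir/50%25'), "\n";     // dir/50%2525   (any other escape still is)
```

Cases like these could serve as a starting point for the unit tests before removing the inner str_replace().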
First thing that jumps to mind is as a mitigation technique against double encoding.
Not that I would recommend doing such a thing this way, as it would get real messy real quickly (and one would already wonder why only that entity, perhaps 'they' never experienced issues with any others... yet).
It could be the result of a misunderstanding of rawurlencode() vs urlencode().
urlencode() replaces spaces with + signs.
If the original author thought that rawurlencode() did the same thing, they would be attempting to pre-encode the spaces so they don't get turned into +s
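The difference between the two functions is easy to check; the sample string here is arbitrary.

```php
<?php
$raw  = rawurlencode('my file+name'); // RFC 3986 style: space becomes %20
$form = urlencode('my file+name');    // form style: space becomes +

echo $raw, "\n";  // my%20file%2Bname
echo $form, "\n"; // my+file%2Bname
```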

Regex for a Function Call with Multiple Optional Parameters

I'm looking for a regex that will scan a document to match a function call, and return the value of the first parameter (a string literal) only.
The function call could look like any of the following:
MyFunction("MyStringArg");
MyFunction("MyStringArg", true);
MyFunction("MyStringArg", true, true);
I'm currently using:
$pattern = '/Use\s*\(\s*"(.*?)\"\s*\)\s*;/';
This pattern will only match the first form, however.
Thanks in advance for your help!
Update
I was able to solve my problem with:
$pattern = '/Use\s*\(\s*"(.*?)\"/';
Thanks Justin!
~Scott
If you only care about the value of the first parameter, you can just chop off the end of the regex:
$pattern = '/Use\s*\(\s*"(.*?)\"/';
However, you should understand that this (or any pure-regex solution for this problem) will not be perfect, and there will be some possible cases it handles incorrectly. In this case, you'll get false positives, and escaped quotes (\") will break it.
You can ignore escaped quotes by complicating it a bit:
$pattern = '/Use\s*\(\s*"((?:[^"\\\\]|\\\\.)*)"/';
This ignores " characters inside the quoted string if they have an odd number of backslashes in front of them.
However, the false-positives issue can't be helped without introducing false negatives, and vice versa. This is because PHP is not a regular language, so it can't be parsed with "pure" regex, and even modern regex engines that allow recursion are going to need some pretty complex code to do a really thorough job at this.
All I'm saying is, if you're planning a one-off job to quickly scrape through some PHP you wrote yourself, regex is probably fine. If you're looking for something robust and open-ended that will do this on arbitrary PHP code, you need some kind of reflection or PHP parser.
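As a rough illustration of the one-off case, the sketch below runs a pattern that treats a backslash plus the following character as a single unit, so an escaped quote does not end the match. The sample source text is invented, and the function is called Use only to match the patterns above.

```php
<?php
// Invented sample input covering the three call forms, one with an escaped quote.
$source = <<<'PHP'
Use("First");
Use("Second", true);
Use( "Th\"ird" , true, true);
PHP;

// Capture group 1: any run of non-quote, non-backslash characters or backslash escapes.
$pattern = '/Use\s*\(\s*"((?:[^"\\\\]|\\\\.)*)"/';
preg_match_all($pattern, $source, $matches);

print_r($matches[1]); // First, Second, Th\"ird (escape left as written)
```

This still matches calls inside comments or other strings, so the false-positive caveat applies unchanged.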
This might be slightly simpler, though will only work if you have double quotes and not single quotes:
$pattern = '/Use\s*[^\"]*\"([^\"]*)\"/';

Getting rid of \r\n strings

I have a form into which I entered a newline character which looked correct when I entered it, but when the data is now pulled from the database, instead of the white space, I get the \n\r string showing up.
I try to do this:
$hike_description = nl2br($hike_description);
But it doesn't work. Does anyone know how this can be fixed? I am using PHP.
And here is the page where this is happening. See the description section of the page:
http://www.comehike.com/hikes/scheduled_hike.php?hike_id=130
Thanks,
Alex
Does anyone know how this can be fixed?
Sure.
Your code is doing unnecessary escaping, most likely before adding the text to the database.
So, instead of replacing it back, you have to find that harmful code and get rid of it.
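This means you probably have plain-text '\n\r' strings in the db.
Try to sanitize the db output before display (note the four backslashes: inside a single-quoted PHP string they become the regex \\, which matches one literal backslash):
$sanitized_text = preg_replace('/\\\\[rn]/','', $text_from_db);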
(just a guess).
Addendum:
Of course, as Col. Shrapnel pointed out, there's something fundamentally wrong
with the contents of the database (or, it is used this way by convention and you don't know that).
For now, you have fixed a symptom partially
but it would be much better to look for the reason for these escaped characters
being in the database at all.
Regards
rbo
You can use str_replace to clean up the input.
$hike_description = nl2br(str_replace("\r\n", "\n", $hike_description));
$hike_description = str_replace(array('\n','\r'),'',$hike_description);
You may want to read up on the differences between the single quote and double quote in PHP as well: http://php.net/manual/en/language.types.string.php
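If the line breaks should be kept rather than stripped, one option is to turn the literal sequences into real newlines first and only then call nl2br. This is a sketch assuming the stored text contains backslash-r/backslash-n as plain characters; the sample string is made up.

```php
<?php
// Made-up text as it might come back from the database:
// single quotes, so \r\n here is four literal characters, not a line break.
$hike_description = 'First line\r\nSecond line';

// Replace the literal sequences (longest first) with one real newline.
$clean = str_replace(['\r\n', '\n\r', '\n', '\r'], "\n", $hike_description);

$html = nl2br($clean);
echo $html, "\n"; // First line<br /> followed by a newline and Second line
```

This treats the symptom only; finding the code that escapes the text before it is stored remains the real fix.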
