Convert &apos; to an apostrophe in PHP

My data has many HTML entities in it (&bull; ...etc) including &apos;. I just want to convert it to its character equivalent.
I assumed htmlspecialchars_decode() would work, but - no luck. Thoughts?
I tried this:
echo htmlspecialchars_decode('They&apos;re here.');
But it returns: They&apos;re here.
Edit:
I've also tried html_entity_decode(), but it doesn't seem to work:
echo html_entity_decode('They&apos;re here.');
also returns: They&apos;re here.

Since &apos; is not part of HTML 4.01, it's not converted to ' by default.
In PHP 5.4.0, extra flags were introduced to handle different languages, each of which includes &apos; as an entity.
This means you can do something like this:
echo html_entity_decode('They&apos;re here.', ENT_QUOTES | ENT_HTML5);
You will need both ENT_QUOTES (convert single and double quotes) and ENT_HTML5 (or any language flag other than ENT_HTML401, so choose the most appropriate to your situation).
Prior to PHP 5.4.0, you'll need to use str_replace:
echo str_replace('&apos;', "'", 'They&apos;re here.');
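To make the effect of the document-type flag visible, here is a minimal sketch that decodes the same input under three flag sets. The variable names are only illustrative, and the explicit 'UTF-8' charset argument is an assumption about your data.

```php
<?php
// Same input decoded with different document-type flags.
$input = 'They&apos;re here.';

// HTML 4.01 has no &apos; entity, so the string comes back unchanged.
$asHtml401 = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// HTML5 and XML 1 both define &apos;, so it is decoded to a single quote.
$asHtml5 = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$asXml1  = html_entity_decode($input, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo $asHtml401, "\n";
echo $asHtml5, "\n";
echo $asXml1, "\n";
```

If the data comes from an XML feed rather than an HTML page, ENT_XML1 is probably the closer match.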

There is a "right" way, without using str_replace. @cbuckley was right: the default for html_entity_decode is HTML 4.01, but you can pass the ENT_HTML5 flag so that &apos; gets decoded.
Use it like this:
html_entity_decode($str,ENT_QUOTES | ENT_HTML5)

The &apos; entity and a lot of others are not in the default PHP translation table used by the html_entity_decode and htmlspecialchars_decode functions, unfortunately.
Check this comment from the PHP manual:
http://php.net/manual/en/function.get-html-translation-table.php#73410

This should work:
$value = "They&apos;re here.";
echo html_entity_decode(str_replace("&apos;", "'", $value));

What you are actually looking for is html_entity_decode().
html_entity_decode() translates all entities to characters, while htmlspecialchars_decode() only reverses what htmlspecialchars() will encode.
EDIT: Looking at the examples on the page I linked to, I did a bit more investigation and the following seems to not work:
[matt@scharley ~]$ php
<?php
$tmp = array_flip(get_html_translation_table(HTML_ENTITIES));
var_dump($tmp['&apos;']);
PHP Notice: Undefined index: &apos; in - on line 3
NULL
This is why it's not working. Why it's not in the lookup table is another question entirely, something I can't answer unfortunately.
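The lookup above uses the default table. As a rough check, and assuming PHP 5.4 or later, the same function can be asked for the HTML 4.01 and HTML5 tables separately to see which one contains &apos;. The variable names here are illustrative.

```php
<?php
// The tables map character => entity, so search the values for &apos;.
$html401Table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$html5Table   = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5, 'UTF-8');

$inHtml401 = in_array('&apos;', $html401Table, true);
$inHtml5   = in_array('&apos;', $html5Table, true);

var_dump($inHtml401); // is &apos; known to the HTML 4.01 table?
var_dump($inHtml5);   // is &apos; known to the HTML5 table?
```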

Have you tried using echo htmlspecialchars('They&apos;re here.')?
I think that is what you are looking for.

Related

PHP urlencode issue with a parameter in my string: "&notify_url" incorrectly returns "¬ify_url"

So when I use PHP's urlencode on the following string, there seems to be a technicality coming up which I think is on a reserved PHP word "&not".
The original string:
cancel_url=https://example.com/payment_cancelled&notify_url=https://example.com/order_notify
I get the following result using urlencode:
cancel_url=https%3A%2F%2Fexample.com%2Fpayment_cancelled¬ify_url=https%3A%2F%2Fexample.com%2Forder_notify
As you can see above, it creates the '¬' special character (just after the word 'cancelled'). So to me it seems the "&not" portion of "&notify_url" is being treated as a reserved operator word ("&not" in PHP?).
I have tried PHP's str_replace function after url encoding as follows:
$paramUrlString = str_replace('¬', '&not', $paramUrlString);
$paramUrlString = str_replace('&#170', '&not', $paramUrlString);
(trying the ASCII code for that special character too)
I've run out of ideas now. Please assist, thank you.
urlencode does not usually replace &not at all, but does replace & with %26. See example here: http://sandbox.onlinephpfunctions.com/code/e9d62797d01f8162170e5ad5181e14fc339faa52
You could try replacing & with %26 before urlencode.
$urlString = str_replace('&', '%26', $urlString);
It's not that anything in PHP is replacing the string &not with ¬, it's that whatever you're using to view/display the data is doing that.
Given that the closing ; on the entity is not required, I would wager that you're putting the URL into XML without properly escaping the entities. While & is the entity that conflicts between URLs and XML, there are more than that.
The simplest solution is if you're embedding a raw string in an XML document you need to call:
$string = htmlspecialchars($string, ENT_XML1 | ENT_COMPAT);
The best solution, on the other hand, is to not create XML documents by hand at all. Use a library like DOMDocument or XMLWriter. This handles not only the escaping/encoding of your data, but all of the other subtle complexities of creating proper XML documents.
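As a minimal sketch of the library approach, assuming the xmlwriter extension is available: the element names payment and callback_params are made up for illustration, and only the query string comes from the question.

```php
<?php
// Let XMLWriter do the escaping instead of concatenating strings by hand.
$params = 'cancel_url=https://example.com/payment_cancelled'
        . '&notify_url=https://example.com/order_notify';

$writer = new XMLWriter();
$writer->openMemory();
$writer->startElement('payment');                 // hypothetical root element
$writer->writeElement('callback_params', $params); // text content is escaped for us
$writer->endElement();
$xml = $writer->outputMemory();

echo $xml, "\n";
```

The ampersand before notify_url is written as &amp; in the generated XML, so a consumer has no "&not" sequence left to misread.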

PHP Escape a string if it hasn't already been escaped with entities

I'm using a 3rd party API that seems to return its data with the entity codes already in there. Such as The Lion&#8217;s Pride.
If I print the string as-is from the API it renders just fine in the browser (in the example above it would put in an apostrophe). However, I can't trust that the API will always use the entities in the future so I want to use something like htmlentities or htmlspecialchars myself before I print it. The problem with this is that it will encode the ampersand in the entity code again and the end result will be The Lion&amp;#8217;s Pride in the HTML source, which doesn't render anything user friendly.
How can I use htmlentities or htmlspecialchars only if it hasn't already been used on the string? Is there a built-in way to detect if entities are already present in the string?
No one seems to be answering your actual question, so I will.
How can I use htmlentities or htmlspecialchars only if it hasn't already been used on the string? Is there a built-in way to detect if entities are already present in the string?
It's impossible. What if I'm making an educational post about HTML entities and I want to actually print this on the screen:
The Lion&#8217;s Pride
... it would need to be encoded as...
The Lion&amp;#8217;s Pride
But what if that was the actual string we wanted to print on the screen? ... and so on.
Bottom line is, you have to know what you've been given and work from there – which is where the advice from the other answers comes in – which is still just a workaround.
What if they give you double-encoded strings? What if they start wrapping the html-encoded strings in XML? And then wrap that in JSON? ... And then the JSON is converted to binary strings? the possibilities are endless.
It's not impossible for the API you depend on to suddenly switch the output type, but it's also a pretty big violation of the original contract with your users. To some extent, you have to put some trust in the API to do what it says it's going to do. Unit/Integration tests make up the rest of the trust.
And because you could never write a program that works for any possible change they could make, it's senseless to try to anticipate any change at all.
Decode the string, then re-encode the entities. (Using html_entity_decode())
$string = htmlspecialchars(html_entity_decode($string));
https://eval.in/662095
There is NO WAY to do what you ask for!
You must know what kind of data is the service giving back.
Anything else would be guessing.
Example:
What if the service is giving back &amp; but is not escaping?
You would guess it IS escaping, so you would wrongly interpret it as & while the correct value is &amp;
I think the best solution, is first to decode all html entities/special chars from the original string, and then html encode the string again.
That way you will end up with a correctly encoded string, no matter if the original string was encoded or not.
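A rough sketch of the decode-then-encode idea follows. The helper name normalize_html_text is hypothetical, and the flag and charset choices are assumptions. As the earlier answers point out, this is a workaround: a string that was meant to show a literal entity such as &amp; on screen loses one level of escaping.

```php
<?php
// Hypothetical helper: decode whatever entities are present, then encode once.
function normalize_html_text(string $text): string
{
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return htmlspecialchars($decoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$raw     = normalize_html_text('Tom & Jerry');            // not yet escaped
$escaped = normalize_html_text('Tom &amp; Jerry');        // already escaped
$numeric = normalize_html_text('The Lion&#8217;s Pride'); // numeric entity

echo $raw, "\n", $escaped, "\n", $numeric, "\n";
```

Both the raw and the pre-escaped input end up with a single &amp;, while the numeric entity becomes the actual UTF-8 character, which is safe to print on a UTF-8 page.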
You also have the option of using htmlspecialchars_decode();
$string = htmlspecialchars_decode($string);
It's already in htmlentities:
php > echo htmlentities('Hi&amp;mom', ENT_HTML5, ini_get('default_charset'), false);
Hi&amp;mom
php > echo htmlentities('Hi&amp;mom', ENT_HTML5, ini_get('default_charset'), true);
Hi&amp;amp&semi;mom
Just set the optional 4th argument (double_encode) to false to NOT double-encode.
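Applied to the string from the question, a small sketch looks like this. It uses htmlspecialchars, which takes the same fourth argument. The sample text with an extra bare ampersand is made up for illustration.

```php
<?php
// One existing entity plus one bare ampersand in the same string.
$fromApi = 'The Lion&#8217;s Pride & Friends';

// double_encode = false: existing entities are kept, the bare & is escaped.
$single = htmlspecialchars($fromApi, ENT_QUOTES, 'UTF-8', false);

// double_encode = true (the default): every & is escaped again.
$double = htmlspecialchars($fromApi, ENT_QUOTES, 'UTF-8', true);

echo $single, "\n";
echo $double, "\n";
```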

variable name with 'current' word in php adds special character [duplicate]

I'm using php to look at an XML file that has a URL in it. The URLs look something like this:
https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
When I echo out the URLs, the "&curren" shows up as "¤" (AKA #164, A4 or currency symbol) and the links don't work. This happens even though there isn't a closing semicolon for it. What is the cleanest way to make "&curren" display literally?
Funny enough I ran into the same problem just now and I found this answer. However, I found another solution which might even be better!
Simply put the variable at the beginning of your query string, and you will avoid the &curren completely.
Do:
https://site.com/bacon_report?currentDimension=2&Id=1&report=1&param=1
instead of:
https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
Use the php function urlencode:
urlencode("https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1");
will output
https%3A%2F%2Fsite.com%2Fbacon_report%3FId%3D1%26report%3D1%26currentDimension%3D2%26param%3D1
The problem here is escaping - you need to escape the "&" characters. In XML all special characters like <, >, ', " and & should be escaped.
Escape it properly as
https://example.com/bacon_report?Id=1&amp;report=1&amp;currentDimension=2&amp;param=1
..just like in HTML:
WRONG - no escaping
CORRECT - correct escape sequence
So - the cleanest way to show "&curren" in HTML/XML is to properly escape the ampersand, and write it as "&amp;curren".
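In PHP, that escaping can be sketched with htmlspecialchars when the URL is written into markup. The link text and variable names are illustrative.

```php
<?php
// Escape the URL once, at the point where it is placed into HTML.
$url  = 'https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1';
$href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

$link = '<a href="' . $href . '">' . $href . '</a>';
echo $link, "\n";
```

The browser turns each &amp; back into & when it parses the attribute, so the link target is the original URL and "&curren" is never seen as an entity.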
I think that in this case it is best to use htmlentities, because with urlencode you get
https%3A%2F%2Fexample.com%2Fbacon_report%3FId%3D1%26report%3D1%26currentDimension%3D2%26param%3D1
and when applying urldecode, you will still have the &curren symbol,
whereas with htmlentities the URL comes out clean:
https://example.com/bacon_report?Id=1&amp;report=1&amp;currentDimension=2&amp;param=1
I came across this issue while working on technical documentation (in Markdown which gets converted to HTML).
To solve the issue I used a zero-width space character which I copied and pasted from between these brackets (​). That way it appears that there is no space and can include the below without any issues:
/search?query=1&currentLonLat=-74.600291,40.360869

Replace characters in a string with their HTML coding

I need to replace characters in a string with their HTML coding.
Ex. The "quick" brown fox, jumps over the lazy (dog).
I need to replace the quotation marks with &quot; and the brackets with &#40; and &#41;.
I have tried str_replace, but I can only get 1 character to be replaced. Is there a way to replace multiple characters using str_replace? Or is there a better way to do this?
Thanks!
I suggest using the function htmlentities().
Have a look at the Manual.
PHP has a number of functions to deal with this sort of thing:
Firstly, htmlentities() and htmlspecialchars().
But as you already found out, they won't deal with ( and ) characters, because these are not characters that ever need to be rendered as entities in HTML. I guess the question is why you want to convert these specific characters to entities? I can't really see a good reason for doing it.
If you really do need to do it, str_replace() will do multiple string replacements, using arrays in both the search and replace parameters:
$output = str_replace(array('(', ')'), array('&#40;', '&#41;'), $input);
You can also use the strtr() function in a similar way:
$conversions = array('(' => '&#40;', ')' => '&#41;');
$output = strtr($input, $conversions);
Either of these would do the trick for you. Again, I don't know why you'd want to though, because there's nothing special about ( and ) brackets in this context.
While you're looking into the above, you might also want to look up get_html_translation_table(), which returns an array of entity conversions as used in htmlentities() or htmlspecialchars(), in a format suitable for use with strtr(). You could load that array and add the extra characters to it before running the conversion; this would allow you to convert all the normal entity characters at the same time.
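A minimal sketch of that combination, using the sentence from the question as input. Choosing the HTML_SPECIALCHARS table with ENT_QUOTES is an assumption; swap in HTML_ENTITIES if you want the full set.

```php
<?php
// Start from PHP's own character => entity table, then add the brackets.
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$table['('] = '&#40;';
$table[')'] = '&#41;';

$input  = 'The "quick" brown fox, jumps over the lazy (dog).';
$output = strtr($input, $table); // one pass, so inserted entities are not re-escaped

echo $output, "\n";
```

Because strtr() does all replacements in a single pass, the & inside the newly inserted entities is not escaped a second time.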
I would point out that if you serve your page with the UTF8 character set, you won't need to convert any characters to entities (except for the HTML reserved characters <, > and &). This may be an alternative solution for you.
You also asked in a separate comment about converting line feeds. These can be converted with PHP's nl2br() function, but could also be done using str_replace() or strtr(), so could be added to a conversion array with everything else.
