XML - proper way to escape string - php

I just want to put an arbitrary string inside a tag.
I am using this code to escape the string:
$name = $this->_dom->createElement('name', htmlspecialchars($userName, ENT_COMPAT,'utf-8'));
$item->appendChild($name);
I ran into a problem: one of my users put some special symbols in the name field, and the whole XML feed broke. How should I escape the string?
Thanks for the help, and sorry for my poor English...

I think adding the XML node's value with createTextNode, instead of passing it as a parameter to createElement, may solve your problem.
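A minimal sketch of the createTextNode approach, assuming a plain DOMDocument. The sample $userName and the item wrapper element are made up for illustration. The text node is escaped by DOM when the document is serialized, so no htmlspecialchars call is needed:

```php
<?php
// Hypothetical sample input standing in for whatever the user typed.
$userName = 'Tom & "Jerry" <admin>';

$dom  = new DOMDocument('1.0', 'UTF-8');
$item = $dom->createElement('item'); // hypothetical parent element
$dom->appendChild($item);

// Create the element empty, then attach the raw string as a text node.
// DOM escapes &, < and > itself when the document is serialized.
$name = $dom->createElement('name');
$name->appendChild($dom->createTextNode($userName));
$item->appendChild($name);

$xml = $dom->saveXML();
echo $xml;

// Reading the document back returns the original string unchanged.
$check = new DOMDocument();
$check->loadXML($xml);
$roundTrip = $check->getElementsByTagName('name')->item(0)->textContent;
```

Note that this only covers escaping. It does not remove characters that are not allowed in XML 1.0 at all, such as most control characters.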

My solution:
$title = $this->_dom->createElement('title',
htmlspecialchars($playlist->getTitle(),
ENT_DISALLOWED, 'utf-8')
);
When you output the XML, you must also strip the � (REPLACEMENT CHARACTER) that ENT_DISALLOWED substitutes for disallowed code points, like this:
echo str_replace('�', '', $this->_dom->saveXML());
This approach lets us output any string containing HTML entities and/or special characters.

Related

PHP urlencode issue with a parameter in my string: "&notify_url" incorrectly returns "¬ify_url"

When I use PHP's urlencode on the following string, I hit a problem that I think is caused by a reserved PHP word, "&not".
The original string:
cancel_url=https://example.com/payment_cancelled&notify_url=https://example.com/order_notify
I get the following result using urlencode:
cancel_url=https%3A%2F%2Fexample.com%2Fpayment_cancelled¬ify_url=https%3A%2F%2Fexample.com%2Forder_notify
As you can see above, a '¬' special character appears just after the word 'cancelled'. So it seems to me that the "&not" portion of "&notify_url" is being treated as a reserved operator word ("&not" in PHP?).
I have tried PHP's str_replace function after url encoding as follows:
$paramUrlString = str_replace('¬', '&not', $paramUrlString);
$paramUrlString = str_replace('&#170', '&not', $paramUrlString);
(trying the ASCII code for that special character too)
I've run out of ideas now. Please assist, thank you.
urlencode does not usually replace &not at all, but does replace & with %26. See example here: http://sandbox.onlinephpfunctions.com/code/e9d62797d01f8162170e5ad5181e14fc339faa52
You could try replacing & with %26 before urlencode.
$urlString = str_replace('&', '%26', $urlString);
It's not that anything in PHP is replacing the string &not with ¬; whatever you're using to view or display the data is doing that.
Given that the closing ; on the entity is not required, I would wager that you're putting the URL into XML without properly escaping the entities. While & is the character that conflicts between URLs and XML, there are others as well.
The simplest solution: if you're embedding a raw string in an XML document, you need to call:
$string = htmlspecialchars($string, ENT_XML1 | ENT_COMPAT);
The best solution, on the other hand, is to not create XML documents by hand at all. Use a library like DOMDocument or XMLWriter. This handles not only the escaping/encoding of your data, but all of the other subtle complexities of creating proper XML documents.
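To make both options concrete, here is a rough sketch using the query string from the question. The params element name is made up for illustration, and XMLWriter is assumed to be available (it is a standard extension, but it can be disabled):

```php
<?php
$paramUrlString = 'cancel_url=https://example.com/payment_cancelled'
    . '&notify_url=https://example.com/order_notify';

// Option 1: escape by hand before pasting the string into XML text.
$escaped = htmlspecialchars($paramUrlString, ENT_XML1 | ENT_COMPAT, 'UTF-8');
echo $escaped, "\n";

// Option 2: let XMLWriter do the escaping.
$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');
$writer->writeElement('params', $paramUrlString); // hypothetical element name
$writer->endDocument();
$xml = $writer->outputMemory();
echo $xml;
```

In both outputs the separator appears as &amp;notify_url, so an XML or HTML parser no longer sees a bare &not.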

variable name with 'current' word in php adds special character [duplicate]

I'm using php to look at an XML file that has a URL in it. The URLs look something like this:
https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
When I echo out the URLs, the "&curren" shows up as "¤" (AKA #164, A4 or currency symbol) and the links don't work. This happens even though there isn't a closing semicolon for it. What is the cleanest way to make "&curren" display literally?
Funny enough I ran into the same problem just now and I found this answer. However, I found another solution which might even be better!
Simply put the variable at the beginning of your query string, and you will avoid the &curren completely.
Do:
https://site.com/bacon_report?currentDimension=2&Id=1&report=1&param=1
instead of:
https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
Use the php function urlencode:
urlencode("https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1");
will output
https%3A%2F%2Fsite.com%2Fbacon_report%3FId%3D1%26report%3D1%26currentDimension%3D2%26param%3D1
The problem here is escaping - you need to escape the "&" characters. In XML all special characters like <, >, ', " and & should be escaped.
Escape it properly as
https://example.com/bacon_report?Id=1&amp;report=1&amp;currentDimension=2&amp;param=1
..just like in HTML:
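WRONG - no escaping: href="https://example.com/bacon_report?Id=1&report=1&currentDimension=2&param=1"
CORRECT - correct escape sequence: href="https://example.com/bacon_report?Id=1&amp;report=1&amp;currentDimension=2&amp;param=1"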
So the cleanest way to show "&curren" in HTML/XML is to properly escape the ampersand and write it as "&amp;curren".
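In PHP, the escaping described above could look roughly like this. It is a sketch that assumes the URL ends up in an HTML attribute; the link text is made up:

```php
<?php
$url = 'https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1';

// Escape the URL at the point where it is written into HTML or XML.
$safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

// The markup now contains &amp;currentDimension, which a browser
// turns back into &currentDimension when it follows the link.
$link = '<a href="' . $safe . '">report</a>';
echo $link, "\n";

// Decoding the escaped form gives back the original URL.
$decoded = html_entity_decode($safe, ENT_QUOTES, 'UTF-8');
```

The URL itself is not changed; only its representation inside the markup is.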
I think that in this case it is best to use htmlentities because with urlencode you get
https%3A%2F%2Fexample.com%2Fbacon_report%3FId%3D1%26report%3D1%26currentDimension%3D2%26param%3D1
and after applying urldecode you will still have the &curren symbol,
whereas with htmlentities the URL comes out clean:
https://example.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
I came across this issue while working on technical documentation (in Markdown which gets converted to HTML).
To solve the issue I used a zero-width space character which I copied and pasted from between these brackets (​). That way it appears that there is no space and can include the below without any issues:
/search?query=1&currentLonLat=-74.600291,40.360869

How do I remove HTML tags from a string?

I have a PHP script where the user enters his name.
Users can enter anything they want, even things like <img src="....
I would like to save their input in a way that won't show any image (or any HTML).
I know such a function exists, but I don't know what keywords to search for to find it.
Use strip_tags($str).
http://php.net/strip_tags
htmlspecialchars() will encode the text so that the tags are not interpreted as HTML.
The easiest solution is the PHP function strip_tags(), which does exactly what the name suggests, and strips HTML tags from a string.
The other alternative is to 'escape' the input, so that HTML characters such as < and > are converted into displayable text. This would result in the HTML code being displayed.
You would do this with the function htmlentities().
It's worth pointing out that the input may contain HTML characters without actually intending to be HTML. The & character is an HTML reserved character, but can also be found in normal text. > and < are less commonly used in normal text, but still possible. All of them may cause problems when displayed on your page, without necessarily being actual HTML code.
The solution to this is, as above, to escape the string using htmlentities(). You may want to run strip_tags() first, but you should also run htmlentities() to ensure that the string is displayed correctly.
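As a quick illustration of the difference between the two approaches, here is a small sketch with a made-up input string. It uses htmlspecialchars(), which escapes the HTML reserved characters; htmlentities() would additionally convert other characters that have named entities:

```php
<?php
// Hypothetical user input containing markup and a plain ampersand.
$input = 'Tom & <img src="x.png">Jerry <b>rocks</b>';

// Option 1: remove the tags entirely.
$stripped = strip_tags($input);

// Option 2: keep everything, but make it display as text.
$escapedOnly = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Combined: strip tags first, then escape what is left for output.
$safe = htmlspecialchars($stripped, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n";    // Tom & Jerry rocks
echo $escapedOnly, "\n"; // Tom &amp; &lt;img src=&quot;x.png&quot;&gt;Jerry &lt;b&gt;rocks&lt;/b&gt;
echo $safe, "\n";        // Tom &amp; Jerry rocks
```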
Hope that helps.

Replace characters in a string with their HTML coding

I need to replace characters in a string with their HTML coding.
Ex. The "quick" brown fox, jumps over the lazy (dog).
I need to replace the quotation marks with &quot; and the brackets with &#40; and &#41;.
I have tried str_replace, but I can only get 1 character to be replaced. Is there a way to replace multiple characters using str_replace? Or is there a better way to do this?
Thanks!
I suggest using the function htmlentities().
Have a look at the Manual.
PHP has a number of functions to deal with this sort of thing:
Firstly, htmlentities() and htmlspecialchars().
But as you already found out, they won't deal with ( and ) characters, because these are not characters that ever need to be rendered as entities in HTML. I guess the question is why you want to convert these specific characters to entities? I can't really see a good reason for doing it.
If you really do need to do it, str_replace() will do multiple string replacements, using arrays in both the search and replace parameters:
$output = str_replace(array('(', ')'), array('&#40;', '&#41;'), $input);
You can also use the strtr() function in a similar way:
$conversions = array('(' => '&#40;', ')' => '&#41;');
$output = strtr($input, $conversions);
Either of these would do the trick for you. Again, I don't know why you'd want to though, because there's nothing special about ( and ) brackets in this context.
While you're looking into the above, you might also want to look up get_html_translation_table(), which returns an array of entity conversions as used in htmlentities() or htmlspecialchars(), in a format suitable for use with strtr(). You could load that array and add the extra characters to it before running the conversion; this would allow you to convert all normal entity characters at the same time.
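A short sketch of that idea, using the sentence from the question as input. The choice of HTML_SPECIALCHARS with ENT_QUOTES is an assumption; pass HTML_ENTITIES instead if you want the full htmlentities() table:

```php
<?php
// Start from the table htmlspecialchars() uses (single quotes included).
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401);

// Add the extra characters from the question as numeric entities.
$table['('] = '&#40;';
$table[')'] = '&#41;';

$input  = 'The "quick" brown fox, jumps over the lazy (dog).';
$output = strtr($input, $table);

echo $output, "\n";
// The &quot;quick&quot; brown fox, jumps over the lazy &#40;dog&#41;.
```

strtr() with an array does not rescan text it has already replaced, so the & inside the new entities is not escaped a second time.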
I would point out that if you serve your page with the UTF8 character set, you won't need to convert any characters to entities (except for the HTML reserved characters <, > and &). This may be an alternative solution for you.
You also asked in a separate comment about converting line feeds. These can be converted with PHP's nl2br() function, but could also be done using str_replace() or strtr(), so could be added to a conversion array with everything else.

Searching through a string trying to find &#39; in PHP

I am using TinyMCE and, rather annoyingly, it replaces all of my apostrophes with their HTML numeric equivalent. Most of the time this isn't a problem, but for some reason I am having a problem storing the apostrophe replacement, so I have to search through the string and replace them all. Any help would be much appreciated.
Did you try:
$string = str_replace("&#39;", "<replacement>", $string);
Is it just apostrophes that you want decoded from HTML entities, or everything?
print html_entity_decode("Hello, that&#39;s an apostrophe.", ENT_QUOTES);
will print
Hello, that's an apostrophe.
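Both options side by side, as a small sketch. It assumes the editor stores the apostrophe as &#39; (or &#039;); the sample text is made up:

```php
<?php
// Sample text as the editor might store it.
$stored = 'Hello, that&#39;s an apostrophe &amp; more.';

// Decode every entity, including single-quote ones (this needs ENT_QUOTES).
$all = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

// Decode only the apostrophe entity and leave everything else encoded.
$onlyApostrophes = str_replace(array('&#39;', '&#039;'), "'", $stored);

echo $all, "\n";             // Hello, that's an apostrophe & more.
echo $onlyApostrophes, "\n"; // Hello, that's an apostrophe &amp; more.
```

Which one fits depends on whether the rest of the stored text should stay entity-encoded.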
Why work around the problem when you can fix the cause? You can just turn off the TinyMCE entity encoding*. More info: here
*Unless you want all the other characters encoded, that is.
