Encoding and decoding with PHP and JavaScript, and data attribute validation

I'm using data attributes to send data from HTML to JavaScript.
The data attributes come from MySQL data, so a value might contain a space, and
I get validation errors saying that no spaces are allowed inside attributes.
So the solution I thought of is to encode the value in PHP so it has no spaces,
and then decode it in JavaScript once it has been passed to JavaScript. Is there a premade function for this?
A failsafe way would be awesome :)
Also, is there a way to store values containing spaces in data attributes and still get the page to validate?

Spaces in HTML attributes shouldn't be a problem (just think of the style attribute: style="background-color: #F63;" has a space in it, but still works). If the data is sent using AJAX, however, chances are it gets URL-encoded (@Brad: urlencode is what you meant, I suspect).
Since you say the data is coming from an SQL table, chances are the data itself is stored in either an object or an array in PHP. If so, why not just json_encode the data, and in JavaScript: JSON.parse(document.getElementById('theId').value);. This gives you an object in JS containing all the data you have. If only one string is required, you can still use json_encode by placing your data in a wrapper array and encoding that...
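Do make sure to use SINGLE quotes if you write this out as HTML:
<?php
$dbArray = array('this', 'array', 'contains','your','data', 'with spaces');
// JSON_HEX_APOS and JSON_HEX_AMP escape ' and & so data containing them can't break the single-quoted attribute
$html= '<input type="hidden" id="hiddenArray" value=\''.json_encode($dbArray, JSON_HEX_APOS | JSON_HEX_AMP).'\'/>';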
If you use double quotes, the element will be malformed, since JSON-encoded arrays have double quotes in them. With single quotes, the output looks like this:
<input type="hidden" id="hiddenArray" value='["this","array","contains","your","data","with spaces"]' />
It might not look sexy, but as you can see, double quotes would have ended the element's value right after the [.
This doesn't strip spaces, but they won't cause you any problems either.
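A minimal sketch of the same json_encode idea for a data-* attribute, as an alternative to relying on single quotes: escaping the JSON with htmlspecialchars() makes it safe inside an ordinary double-quoted attribute, even when the data contains quotes. The helper name jsonDataAttribute() and the payload id are hypothetical; in the browser the value would be read back with JSON.parse(element.dataset.items).

```php
<?php
// Hypothetical helper: serialise $data as JSON and place it in a data-* attribute.
// htmlspecialchars() with ENT_QUOTES escapes ", ', <, > and &, so the JSON is
// safe inside a double-quoted attribute whatever the database values contain.
function jsonDataAttribute(string $name, $data): string
{
    // $name is assumed to be a trusted, hard-coded identifier such as "items".
    $json = json_encode($data, JSON_THROW_ON_ERROR);
    return sprintf('data-%s="%s"', $name, htmlspecialchars($json, ENT_QUOTES, 'UTF-8'));
}

$dbArray = ['this', 'array', 'contains', 'your', 'data', 'with spaces', 'it\'s "quoted"'];
echo '<div id="payload" ' . jsonDataAttribute('items', $dbArray) . '></div>', "\n";

// Browser side (JavaScript), shown here as a comment:
//   const items = JSON.parse(document.getElementById('payload').dataset.items);
```

The browser decodes the entities before JavaScript sees the attribute, so JSON.parse() receives the original JSON text.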

Also, is there a way to store values containing spaces in data attributes and still get the page to validate?
Just make sure you quote your attribute values.
This is a valid document that passes the W3C validator.
<!DOCTYPE HTML>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Data Attributes</title>
</head>
<body>
<h1 data-space="This value has a space in it">Data Attributes</h1>
</body>
</html>
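When the value comes from a MySQL row, the same rule applies on the PHP side: wrap the value in double quotes and escape it. A small sketch with a hypothetical dataAttribute() helper:

```php
<?php
// Hypothetical helper: build a quoted data-* attribute from a plain string.
// The double quotes around the value are what make spaces legal; escaping with
// htmlspecialchars() stops quotes or & in the data from breaking the markup.
function dataAttribute(string $name, string $value): string
{
    // $name is assumed to be a trusted, hard-coded identifier.
    return 'data-' . $name . '="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '"';
}

// e.g. a value fetched from a MySQL row
$title = 'This value has a space in it';
echo '<h1 ' . dataAttribute('space', $title) . '>Data Attributes</h1>', "\n";
// prints: <h1 data-space="This value has a space in it">Data Attributes</h1>
```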

If you URL-encode them in PHP using rawurlencode, you can then decode them in JavaScript using decodeURIComponent (unescape also decodes %20, but it is deprecated):
<?php
$value = rawurlencode('hey you'); // hey%20you
and then in JavaScript:
var value = decodeURIComponent('hey%20you'); // hey you
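PHP has two URL-encoding functions, and the choice matters when the decoding happens in JavaScript. A short sketch of the difference, assuming the encoded value is written into a data-* attribute and decoded with decodeURIComponent() in the browser:

```php
<?php
// rawurlencode() follows RFC 3986 and encodes a space as %20, which
// decodeURIComponent() turns back into a space. urlencode() uses form
// encoding (space => "+"), and decodeURIComponent() leaves "+" as a plus sign.
$value = 'hey you';

$raw  = rawurlencode($value); // "hey%20you"
$form = urlencode($value);    // "hey+you"

// rawurlencode() output only contains letters, digits, "-_.~" and "%XX",
// so it can go into a quoted attribute without further escaping.
echo '<span data-value="' . $raw . '"></span>', "\n";
// JavaScript: decodeURIComponent(span.dataset.value) gives "hey you"

// The PHP inverse, for comparison:
echo rawurldecode($raw), "\n"; // hey you
```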

Related

safely load HTML from user into textarea

I'm using TinyMCE 4 on a project where I need to pre-populate the textarea with HTML that was submitted through POST (for server-side error handling, so users don't lose all their work). I know that a textarea works mostly like a tag, in that HTML inside it is not parsed into the DOM, so most sites show this demo:
<textarea name="demo"><?=$_POST['demo']?></textarea>
but what happens when a user submits HTML that includes an unmatched <textarea> or </textarea> tag?
Is there a standard way to manage this risk?
Use htmlspecialchars($_POST['demo']) in PHP when outputting it.
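A minimal sketch of the htmlspecialchars() answer, with a hypothetical textareaWithValue() helper. The browser decodes the entities inside the textarea, so TinyMCE sees the user's original HTML, but a submitted </textarea> can no longer close the element early:

```php
<?php
// Hypothetical helper: re-populate a textarea with submitted HTML.
// htmlspecialchars() turns "<" into "&lt;", so "</textarea>" in the user's
// input is plain text and cannot end the element.
function textareaWithValue(string $name, string $submitted): string
{
    return sprintf(
        '<textarea name="%s">%s</textarea>',
        htmlspecialchars($name, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($submitted, ENT_QUOTES, 'UTF-8')
    );
}

// In the real page this would be $_POST['demo'] ?? ''.
$submitted = '<p>Hi</p></textarea><script>alert(1)</script>';
echo textareaWithValue('demo', $submitted), "\n";
```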
Remove only the <textarea> tags from the user input. Please see this post about using regular expressions. It tells you how to remove only certain tags (unlike htmlentities, which escapes all tags).
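A minimal sketch of the regex approach, using a hypothetical stripTextareaTags() helper. It removes only textarea tags and leaves all other markup alone, so it is narrower than escaping and should not be treated as a general sanitizer:

```php
<?php
// Hypothetical helper: remove <textarea ...> and </textarea> tags (any case,
// any attributes) and keep all other markup unchanged. It is meant to stop the
// user's HTML from closing the surrounding textarea; it does not sanitise it.
function stripTextareaTags(string $html): string
{
    return preg_replace('~</?textarea\b[^>]*>~i', '', $html);
}

$submitted = '<p>Draft</p></textarea><p>more</p>';
echo '<textarea name="demo">' . stripTextareaTags($submitted) . '</textarea>', "\n";
```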
Use the xmp tag instead of textarea. It will display HTML as-is (note that <xmp> is obsolete in HTML5 and is not an editable form field).
E.g.: http://dadinck.x10.mx/xmp.html
The htmlentities function replaces every HTML special character (such as <) with an entity that displays correctly but won't break your HTML.
http://www.php.net/manual/en/function.htmlentities.php

Replace a string with HTML

I'm using str_replace as follows:
<html>
<head>
</head>
<body>
<script type="text/php">
$new_str = str_replace( '[[str_to_replace]]' , $GLOBALS['html'] , $original_str );
</script>
<div class="wrapper">[[str_to_replace]]</div>
<?php
// multiple includes
// lots and lots of code
//PHP code to calculate HTML code
//value of $html depends on data calculated after div.wrapper is drawn
$GLOBALS['html'] = '<input type="text" />';
?>
</body>
</html>
I'm forced to wrap the PHP code in a script tag because the document is passed to a library as an HTML document. The library can execute PHP code inside script tags, but in this case it behaves oddly.
What I'm expecting:
[[str_to_replace]] should become an HTML input field.
What I'm getting:
[[str_to_replace]] becomes the literal string <input type="text" />.
How do I get the expected result?
You're likely misinterpreting what you can do with inline script. In dompdf, HTML is parsed separately from inline script. Any HTML you insert into the document using inline script will be treated as plain text. What you should be doing is parsing your document first, then passing the results to dompdf.
FYI, it's hard to see from your sample exactly what you're doing in the code. Plus, we can't see what's going on with dompdf, so I'm having a hard time seeing how everything ties together.
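A rough sketch of "parse your document first": capture the template with output buffering, run str_replace() in plain PHP, and only then hand the finished string to dompdf. renderTemplate() is a hypothetical name standing in for the question's page, and the dompdf calls are shown as comments because they assume dompdf 0.7 or later installed via Composer:

```php
<?php
// Hypothetical sketch: build the complete HTML in PHP before dompdf sees it,
// so the replacement is real markup instead of text inserted by inline script.
function renderTemplate(): string
{
    ob_start(); // capture everything the template prints
    ?>
<div class="wrapper">[[str_to_replace]]</div>
<?php
    // ... includes and the code that calculates the replacement run here ...
    $html = '<input type="text" />';

    $body = ob_get_clean(); // the template output as a string
    return '<html><head></head><body>'
        . str_replace('[[str_to_replace]]', $html, $body)
        . '</body></html>';
}

$finalHtml = renderTemplate();
echo $finalHtml, "\n";

// Then hand the finished string to dompdf:
//   $dompdf = new \Dompdf\Dompdf();
//   $dompdf->loadHtml($finalHtml);
//   $dompdf->render();
```

Because the replacement value is computed before rendering, it no longer matters that it is calculated after div.wrapper appears in the template.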
It sounds like what you're trying to do is to replace the string with decoded HTML entities. You'll probably want to do:
$htmlEntityString = '&lt;input type=&quot;text&quot; /&gt;'; // Example string containing HTML entities that you want to decode.
$new_str = str_replace( '[[str_to_replace]]' , html_entity_decode($htmlEntityString) , $original_str );
In this case, whatever HTML you have in HTML-entity form will be decoded and will replace the substring.
Read more about it for all the options:
http://php.net/manual/en/function.html-entity-decode.php

Preserve utf8 when loading HTML from file

Well, apparently PHP and its standard libraries have some problems, and DOMDocument isn't an exception.
There are workarounds for UTF-8 characters when loading an HTML string with $dom->loadHTML().
However, I haven't found a way to do this when loading HTML from a file with $dom->loadHTMLFile(). While it reads and sets the encoding from <meta /> tags, the problem strikes back if I haven't defined those, for instance when loading a fragment of HTML (a template part like footer.html) rather than a fully built HTML document.
So, how do I preserve UTF-8 characters when loading HTML from a file that doesn't have its <meta /> tags present, and defining them is not an option?
Update
footer.html (the file is encoded in UTF-8 without BOM):
<div id="footer">
<p>My sūpēr ōzōm ūtf8 štrīņģ</p>
</div>
index.php:
$dom = new DOMDocument;
$dom->loadHTMLFile('footer.html');
echo $dom->saveHTML(); // results in all the familiar effed-up characters
Thanks in advance!
Try a hack like this one:
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
// dirty fix
foreach ($doc->childNodes as $item) {
    if ($item->nodeType == XML_PI_NODE) {
        $doc->removeChild($item); // remove hack
    }
}
$doc->encoding = 'UTF-8'; // insert proper
Several others are listed in the user comments here: http://php.net/manual/en/domdocument.loadhtml.php. It is also important that your document head includes a meta tag specifying the encoding FIRST, directly after the <head> tag.
I would suggest using my answer here: https://stackoverflow.com/a/12846243/816753 and instead of adding another <head>, wrap your entire fragment in
<html>
<head><meta http-equiv='Content-type' content='text/html; charset=UTF-8' /></head>
<body><!-- your content here --></body>
</html>
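A sketch of that wrapping approach as reusable functions: the wrapper only exists to declare the charset to libxml, and a second helper prints just the fragment back out. Both function names are hypothetical:

```php
<?php
// Hypothetical helper: load an HTML fragment saved as UTF-8 (with no <meta> of
// its own) by wrapping it in a minimal document that declares the charset.
function loadUtf8Fragment(string $path): DOMDocument
{
    $fragment = file_get_contents($path);
    if ($fragment === false) {
        throw new RuntimeException("Cannot read $path");
    }

    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head>'
        . '<body>' . $fragment . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    return $doc;
}

// Hypothetical helper: serialise only the fragment (the children of <body>).
function fragmentHtml(DOMDocument $doc): string
{
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage with the footer from the question:
//   $dom = loadUtf8Fragment('footer.html');
//   echo fragmentHtml($dom);
```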
While I'm not sure how to go about solving the problem with ->loadHTMLFile(), have you considered using file_get_contents() to get the HTML, running mb_convert_encoding() on that string, and then passing that value into ->loadHTML()?
Edit: Also, when you initialize DOMDocument, are you giving it the $encoding argument?
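A sketch of the file_get_contents() route. It uses mb_encode_numericentity() rather than mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8'), because that 'HTML-ENTITIES' target is deprecated as of PHP 8.2; the helper name is hypothetical and the mbstring extension is assumed to be installed:

```php
<?php
// Hypothetical helper. Every character from U+0080 upwards becomes a numeric
// character reference (e.g. "ū" => "&#363;"), which libxml decodes the same
// way whatever charset it assumes for the input. Needs the mbstring extension.
function loadHtmlFileAsUtf8(string $path): DOMDocument
{
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Conversion map: [start, end, offset, mask] covering U+0080..U+10FFFF.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    return $doc;
}
```

Unlike the wrapper approach, this needs no extra markup around the fragment.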
The meta key is for your browser only. Once the page is fully built, your browser should display it correctly as long as the final page includes the meta tag.
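You can always try to use the utf8_decode (or encode, I'm never sure lol) function before echoing the data, like so (note that utf8_decode converts to ISO-8859-1, so characters outside Latin-1, such as ū, become ?, and it is deprecated as of PHP 8.2):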
echo utf8_decode($dom->saveHTML());

str_replace inline script code from html in php not working

I have an HTML page stored in a MySQL database. I get the HTML from the database and try to replace some of the inline JavaScript code in it. I tried using str_replace(), but it does not replace the inline JavaScript code. I can replace other HTML content like divs, but not inline JavaScript code.
How can I find and replace the inline JavaScript code?
PHP should see the entire HTML page as one big string, so in theory it should be able to alter JS and HTML alike. Is it possible the string still has slashes, and your str_replace can't find the search string because of them?
Try printing the entire string to the screen to make sure, and if it does still have slashes, use a stripslashes($string) call to get rid of them.
You probably want to use a DOM parser to handle your webpage as a DOM structure, not a serialised string of HTML (where things like string replacement and regular expressions can be troublesome).
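A minimal sketch of the DOM-parser approach using PHP's built-in DOMDocument. replaceInlineScript() is a hypothetical helper; it edits the text of inline <script> elements only and leaves matching text elsewhere in the page alone:

```php
<?php
// Hypothetical helper: parse the stored page and run str_replace() only on
// the code inside inline <script> elements, then serialise the page again.
function replaceInlineScript(string $html, string $search, string $replace): string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // tolerate HTML5 tags quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($doc->getElementsByTagName('script') as $script) {
        // Skip external scripts (<script src="...">); only inline code is edited.
        if ($script->hasAttribute('src')) {
            continue;
        }
        $newCode = str_replace($search, $replace, $script->textContent);
        // Replace the script's children with a single text node holding the new code.
        while ($script->firstChild) {
            $script->removeChild($script->firstChild);
        }
        $script->appendChild($doc->createTextNode($newCode));
    }
    return $doc->saveHTML();
}
```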

How can I store UTF8 in MySQL with PHP, sanitize it, echo it with XML and transform it with XSLT?

I am developing an MVC application with PHP that uses XML and XSLT to print the views. It needs full UTF-8 support. MySQL is also correctly configured for UTF-8. My problem is the following.
I have an <input type="text"/> with a value like àáèéìíòóùú"><'##~!¡¿?. This is processed before being added to the database: I use mysql_real_escape_string($_POST["name"]) and then run a MySQL INSERT. This adds a backslash \ before " and '.
The MySQL database has DEFAULT CHARACTER SET utf8 and COLLATE utf8_spanish_ci. The table field is a normal VARCHAR.
Then I have to print this in an XML document that will be transformed with XSLT. I can use PHP in the XML, so I echo it with <?php echo TexUtils::obtainSqlText($value_obtained_from_sql); ?>. For now, obtainSqlText() just returns the value unchanged; it is a placeholder until the final structure is decided.
One of the first things I need for the input is to convert > and < to &gt; and &lt;, because they would cause problems with start/end tags. This will be done with <?php htmlspecialchars($string, ENT_QUOTES, "UTF-8"); ?>. This also converts & to &amp;, " to &quot; and ' to &#039;. This is a big problem: XSLT starts to fail because it doesn't recognize all HTML special characters.
There is another problem. I've talked about the àáèéìíòóùú"><'##~!¡¿? input, but I will also have some text from a CKEditor <textarea /> whose value will look like:
<p>
àáèéìíòóùú"><'##~!¡¿?
</p>
How do I handle this? First, if I want to print this second value correctly, I will need to use <xsl:value-of select="value" disable-output-escaping="yes" />. Will "><' print right?
So what I am really looking for is how I should handle these values and how I should print them. Do I need one approach for a VARCHAR that doesn't allow HTML and another for a TEXT field (for example) that does? Will I need to use disable-output-escaping="yes" every time?
I also want to know whether doing this really protects against XSS attacks.
Thank you in advance!
This will be done with <?php htmlspecialchars($string, ENT_QUOTES, "UTF-8"); ?>.
Fine.
This is a big problem: XSLT starts to fail because it doesn't recognize all HTML special characters.
It shouldn't fail on htmlspecialchars() output, ever. &amp; is a predefined entity in XML, and &#039; is a character reference, which is always allowed. htmlspecialchars() should produce XML-compatible output, unlike the usually-a-mistake htmlentities(). What is the error you are seeing?
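To see the difference between the two functions, a small sketch that checks whether each one's output parses as XML with DOMDocument::loadXML(); the helper isWellFormedXml() is hypothetical:

```php
<?php
// Compare the two escaping functions on the input from the question: only the
// htmlspecialchars() output can be embedded in an XML document (and so fed to
// XSLT), because htmlentities() emits names like &agrave; that XML does not define.
$input = 'àáèéìíòóùú"><\'##~!¡¿?';

$special  = htmlspecialchars($input, ENT_QUOTES, 'UTF-8'); // àáèéìíòóùú&quot;&gt;&lt;&#039;##~!¡¿?
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');     // &agrave;&aacute;... as named entities

// Hypothetical helper: true if $xml parses as a well-formed XML document.
function isWellFormedXml(string $xml): bool
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    return $ok;
}

var_dump(isWellFormedXml('<value>' . $special . '</value>'));
var_dump(isWellFormedXml('<value>' . $entities . '</value>'));
```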
àáèéìíòóùú"><'##~!¡¿?
Urgh, an HTML rich text editor produced that invalid markup? What a dodgy editor.
If you have to allow users to input arbitrary HTML, it's going to need some processing. Unless you really trust those users, you'll need a purifier (to stop them using dangerous scripting elements and XSS-ing each other), and a tidier (to remove malformed markup either due to crap rich-text-editor output or deliberate sabotage). If you intend to put the content directly into XML you will also need it to convert to XHTML output and replace HTML entity references.
A simple way to do this in PHP would be DOMDocument->loadHTML followed by a walk of the DOM tree removing all but known-good elements/attributes/URL-schemes, followed by DOMDocument->saveXML.
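A minimal sketch of that DOMDocument approach. The whitelist (a few inline and list elements, href only on <a>, and http/https/mailto or relative links) is an illustrative assumption, not a complete policy; elements outside it are removed together with their content. For untrusted input, a maintained purifier library remains the safer choice. The function names are hypothetical:

```php
<?php
// Hypothetical whitelist sketch: parse with loadHTML(), keep only known-good
// elements, attributes and URL schemes, then serialise with saveXML().
const ALLOWED_TAGS  = ['p', 'br', 'strong', 'em', 'ul', 'ol', 'li', 'a'];
const ALLOWED_ATTRS = ['a' => ['href']];
// href must start with http:, https:, mailto:, "/" or "#"; anything else is dropped.
const SAFE_HREF = '~^(?:https?:|mailto:|/|#)~i';

function sanitizeNode(DOMNode $node): void
{
    // Copy the children first: removing nodes from a live list while iterating skips some.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // text is kept; saveXML() escapes it
        }
        if (!$child instanceof DOMElement
            || !in_array(strtolower($child->tagName), ALLOWED_TAGS, true)) {
            // Comments, <script>, <style>, unknown tags: removed with their content.
            $node->removeChild($child);
            continue;
        }
        $allowed = ALLOWED_ATTRS[strtolower($child->tagName)] ?? [];
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            $name = strtolower($attr->nodeName);
            $unsafeHref = $name === 'href' && !preg_match(SAFE_HREF, trim($attr->value));
            if (!in_array($name, $allowed, true) || $unsafeHref) {
                $child->removeAttributeNode($attr);
            }
        }
        sanitizeNode($child);
    }
}

function sanitizeHtml(string $html): string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    // The meta tag tells libxml the input is UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    sanitizeNode($body);

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveXML($child); // XML serialisation, ready to embed in the view XML
    }
    return $out;
}
```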
Will "><' print right?
Well, it'll print as in your example, yes. But that's equally invalid as both HTML and XML.
