HTML Entities Converting Very Strangely


I am working on a project that involves modifying some existing code and there is a behavior going on that makes absolutely no sense to me. I am hoping somebody has seen something similar and thus can provide some insight as to where the problem is originating from.
The best short example I can give is the following:
A user enters "This & that" into a textarea on an input form and when saved
once it becomes: "This &amp; that", when it is saved again it becomes:
"This &amp;amp;amp; that", save it again and you get:
"This &amp;amp;amp;amp;amp; that".
Obviously the problem continues to get worse with each save. The data actually stored in the DB (MySQL) is the text displayed above, there are no filters on the front-end to convert characters/entities. Obviously if they were being stored properly it would be very easy to slap a call to htmlspecialchars_decode() but that isn't an option yet...
Are there some front-end checks I can do to see where the symbols are being mangled? I am looking at the controller that processes the data, and it's using a REST event to do so, but nowhere do I see anything that would even try to convert the HTML entities, let alone something that would convert them incorrectly.
As I said in the intro, I hope somebody may have seen this before and can help pinpoint where it might be happening. This is built using PHP (Protean, MVC framework), Propel, patforms/smarty are in play, MySQL (via PDO) on the backend, jQuery for most JS-related stuff.

Your data is being passed through htmlentities() too many times. This is a common beginner's mistake that usually involves HTML-encoding your data before sending it to the database and encoding it again upon retrieval. Once (on output) is enough. You should never encode it going in.
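A minimal sketch of the effect described above, assuming the encoding is done with htmlspecialchars() (the actual call buried in the framework is not shown in the question). The helper name e() is made up for illustration.

```php
<?php
// What the user typed; this is also what should be stored, unchanged.
$raw = 'This & that';

// Encoding an already-encoded string escapes the "&" of "&amp;" again.
$once  = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');   // This &amp; that
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');  // This &amp;amp; that

// Hypothetical output helper: escape exactly once, at the moment of printing.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

echo e($raw), "\n";
```

If the encoding call cannot be removed right away, the fourth argument of htmlspecialchars() (double_encode, set to false) leaves existing entities alone, but that only hides the symptom; storing the raw text is the cleaner fix.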

I hate to answer my own question here, but it was in fact a by-product of a set method buried in the framework that was causing the double encoding. I changed the data flow a bit and now everything is being stored properly, so I can just wrap the output in htmlspecialchars_decode() and life is good.
Thanks for the suggestions everyone!
-- N

Related

is it normal to send special characters encoded in post

I had a discussion with someone and we could not come to a proper solution, so I wanted to know what you guys think about this:
I have an HTML form, and the other guy (call him 'Aron') has a .NET system.
My HTML form has a text input field called description.
Aron's .NET system catches my description POST and then converts the data into XML.
BUT if a special character like & is posted, he gets a parse error.
Now Aron is telling me that I need to post the & as &amp; and not as a raw &.
What do you guys think about this?

To me it sounds more like a server-side problem. If I were you, I would create an object, serialize it to JSON, send it to this .NET application, and let the .NET developer do whatever he needs with it. You have then sent proper data in accordance with your own architecture and language.
It would be more proper and reasonable. Imagine the case - you don't work with Aron anymore, you work with Mark who takes your POST data and saves as plain text. He will ask "why are you sending HTML-encoded data to me? I do not need this". You definitely won't tell him "just decode it back" and you definitely don't want to change your code every time you change a partner. What if you work with both of them at the same time, or with 10 services at the same time?
As a consumer of a service, you should not have to care how the service is implemented. Moreover, you may not know how it is implemented, and it shouldn't affect your code.

I think the honourable gentleman Aron should use an appropriate library that deals with issues such as this and more. Used correctly, it should be almost automatic.
Failing this, he can use System.Web.HttpUtility.HtmlEncode() before converting to XML.
It can be done client side, using JavaScript to fiddle with the data before it gets sent... but in my opinion this is bad practice, and it should be handled by the server.
An alternative is encoding the HTML entities yourself using htmlspecialchars() in PHP, then using cURL to post the data to his ".net system".
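The common thread in the answers above is that the escaping belongs at the point where the XML is built. A rough PHP sketch of that step, assuming a single description element (the function name descriptionToXml() is hypothetical, and Aron's real code is .NET, not PHP):

```php
<?php
// Hypothetical helper: wraps a posted description in an XML element.
function descriptionToXml(string $description): string
{
    // ENT_XML1 selects XML escaping rules; "&" and "<" become entities
    // so the resulting document still parses.
    $escaped = htmlspecialchars($description, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return '<description>' . $escaped . '</description>';
}

echo descriptionToXml('This & that'), "\n";
```

With this in place the form can keep posting the raw & character, and the side that produces XML stays responsible for producing valid XML.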

PHP - html_entity_decode is not working when I change language

This is an odd problem for me. When I use the web site in English, html_entity_decode works fine, but when I change the language, the function stops working properly and the HTML tags become visible.
I am using trim(htmlentities($this->input->post('page_srb'))) to insert into the DB, and <?php echo html_entity_decode($page->page) ?> to show the page. What seems to be the problem?
This is sample of the page when I am using English language (at the moment I am using same text)
This is the same sample of the page when I change to Serbian language:

Never mix view and storage functions. It's bad practice, and something I spent years forcing out of my peers in dev agencies.
Store your HTML in your database as-is, with the correct character set (UTF-8?), then use htmlentities() whenever you don't want the output rendered as HTML by the browser (by default, whatever comes out of the database is unescaped, so the browser will render it).
Doing this gives you clear separation and guidelines. The classic example is "what if someone edits your text directly in the DB?" You might say that never happens, but it MIGHT at some point, or someone might be able to insert data into that table via another form that doesn't encode data.
Define some programming rules and follow them. If you're inserting data, focus on protecting the store; if you're outputting, focus on protecting the client. Consistency will pay off in the long run.
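A small sketch of that rule, assuming UTF-8 throughout and a made-up sample string. The stored value is never encoded; the choice between "render as HTML" and "show as text" is made only at output time.

```php
<?php
// Stored exactly as submitted (UTF-8), with no htmlentities() on the way in.
$stored = '<p>Dobrodošli & hvala</p>';

// Case 1: trusted markup that should render as HTML is echoed unchanged.
$asHtml = $stored;

// Case 2: the same value shown as literal text, escaped once on output.
// Passing the charset explicitly keeps non-ASCII letters such as "š" intact.
$asText = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo $asText, "\n";
```

Because nothing is encoded before storage, no html_entity_decode() call is needed when the page is displayed.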

TinyMCE, PHP and MySQL: security and escaping questions

I'm implementing TinyMCE for a client so they can edit front-end content via a simple, familiar interface in their site's admin panel.
I have never used TinyMCE before but notice that you are able to insert whatever markup you want and it will be happily saved off to the MySQL database, assuming you don't escape the contents of the TinyMCE before running it through your query.
You can even insert single quotes and have it break your SQL query entirely.
But of course, when I do escape the contents, benign presentational stuff like paragraph tags get converted to HTML entities and so the whole point of the WYSIWYG editor is defeated, because the entities are spat back out when it comes to displaying the stored content on the front-end.
So is there a way I can "selectively escape" content from TinyMCE, to keep the innocent tags like P and BR but get rid of dangerous ones like SCRIPT, IFRAME, etc.? I really don't want to have to manually encode and decode them using str_replace() or whatever, but I'd rather not give my client a gaping security hole either.
Thanks.

Have you tried HTML Purifier? It works wonders. Its caveats: it is big and slow, but it is the best you can have.
http://htmlpurifier.org

Sorry Dude, I'd say this is a question for the authors of TinyMCE, so I suggest you ask at: http://tinymce.moxiecode.com/enterprise/support.php ... I'm sure they'll be only too happy to answer (for a small fee), and I suspect this may even be one of their FAQs.
It's just that I'd guess you'd be very lucky to hit another TinyMCE user (let alone an authoritative one) on Stack Overflow, a "general programming forum"... although I notice there are currently 837 questions tagged "tinymce" on this forum; have you tried searching through them? Maybe there's a pointer in one of those?
Cheers. Keith.
EDIT: Yep, Making user-made HTML templates safe is more or less the same question posed in different words, and it has (what looks to ignorant me) a couple of answers which posit practical solutions. I just searched stack overflow for "Tiny MCE html security".

That's like complaining that you can write naughty words in Microsoft Word, and that Word should filter them for you. Or complain to GM that they build cars that then get used as escape vehicles in bank robberies. TinyMCE's job is to be an online editor, not to be the content police.
If you need to ban certain tags, then remove them when the document's submitted by using strip_tags(). Or better yet, use HTML Purifier for a more bullet-proof sanitization. If embedded quotes are breaking your SQL, then why weren't you passing the submitted document through mysql_real_escape_string() or using PDO prepared queries first? (Note that the old mysql_* functions were removed in PHP 7, so prepared queries are the option that still works today.) MCE has no idea what the server-side handling is going to be, nor should it care at all. It's up to you to decide how to handle the data, because only you know what its ultimate purpose is going to be.
In any case, remember that all those editors work on the client side. You can make TinyMCE as bulletproof and as strict an editor as you want, but it's still running on the client. Nothing says a malicious user can't bypass it entirely and submit all the embedded quotes and bad tags they want. The ultimate responsibility for cleaning the data HAS to fall on your code running on the server, as it's the last line of defense, and the only one that can ensure the database remains pristine. Anything else is lipstick on a pig.
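As a rough server-side sketch of the two steps mentioned above: strip_tags() with an allow-list, followed by a PDO prepared statement. The function names, the pages table, and its columns are hypothetical.

```php
<?php
// Keep a small set of presentational tags; every other tag is removed.
// Caveats: strip_tags() keeps the text that was inside removed tags and
// does not touch attributes (e.g. onclick) on the tags it allows, which
// is why HTML Purifier is the more thorough option.
function keepBasicTags(string $html): string
{
    return strip_tags($html, '<p><br><strong><em>');
}

// Hypothetical table and column names. The prepared statement sends the
// content separately from the SQL text, so quotes cannot break the query.
function saveContent(PDO $pdo, int $pageId, string $html): void
{
    $stmt = $pdo->prepare('UPDATE pages SET body = :body WHERE id = :id');
    $stmt->execute([':body' => keepBasicTags($html), ':id' => $pageId]);
}

echo keepBasicTags("<p>It's fine</p><script>alert(1)</script>"), "\n";
```

Note that no htmlspecialchars() call appears here: the allowed markup is stored as markup, so the editor's output still renders on the front-end.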

JQuery AJAX Autocomplete issue with PHP and MySQL

I'm having a problem with my autocomplete. It works on another one of my pages, but on this one, it doesn't work. It's returning the correct number of entries, but they are all "blank" (or at least black so I can't see it), and selecting one does not put it into the text field either.
I'm using this: http://papermashup.com/jquery-php-ajax-autosuggest/
My page right now looks like
Any suggestions?
Thanks!
I'd post my code, but it's pretty much exactly what's on the site linked above, with some variables changed, and embedded into a PHP. Let me know if you want to see it (I don't want to paste it here and make the page huge and fugly).
Oh and this is taking it from a column in a MySQL database.

I think this link is quite useful for understanding the technique. Once you have the AJAX part working, you can simply change your PHP files to run SQL queries and so on. You can show the results in a simple div; a very trivial CSS implementation would be enough. I think the hardest part is solved here:
http://www.w3schools.com/ajax/ajax_aspphp.asp

Maybe your problem lies in the encoding. jQuery expects UTF-8 with its default settings, but without any code I can only speculate...
Try returning utf8_encode($output) instead of the plain output...
It's also possible that your AJAX request expects a specific data structure (JSON, XML, etc.).
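One way to rule out both guesses at once is to return the suggestions as JSON and let json_encode() complain when the data is not valid UTF-8. A minimal sketch, with a hypothetical helper name and sample values (the asker's actual code was not posted):

```php
<?php
// Hypothetical helper: turns the matching column values into a JSON array.
function suggestionsToJson(array $names): string
{
    // json_encode() only accepts valid UTF-8. With JSON_THROW_ON_ERROR
    // (PHP 7.3+) a wrong encoding raises a JsonException instead of
    // silently producing empty output.
    return json_encode(
        array_values($names),
        JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE
    );
}

// In the real endpoint one would also send, before any output:
//   header('Content-Type: application/json; charset=utf-8');
echo suggestionsToJson(['Beograd', 'Niš']), "\n";
```

If this throws, the rows coming out of MySQL are probably not UTF-8, and the connection charset is the place to look.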

Can a simple web form like this get hacked?

Hi I have a web form that sends a string to one php file which redirects them to a corresponding URL. I've searched about web form hacking and I've only received information about PHP and SQL... my site only uses a single PHP file, very basic etc. Would it be open to any exploits? I'm obviously not going to post the URL, but here is some code I was working on for the php file:
Newbie PHP coding problem: header function (maybe, I need someone to check my code)
Thanks

From that little snippet, I don't see anything dangerous. "Hackers" can enter pretty much anything they want into $_REQUEST['sport'] and thereby $searchsport, but the only place you use it is to access your array. If it's not found in your array.... nothing much will happen. I think you're safe in this limited scenario ;) Just be careful not to use $searchsport for...... just about anything else. Echoing it, or inserting it into a DB is dangerous.
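The pattern described above (user input used only as an array key) might look roughly like this. The map contents and the function name are invented for illustration; only $_REQUEST['sport'] and $searchsport come from the question.

```php
<?php
// Hypothetical map of accepted values to hardcoded destinations.
const SPORT_URLS = [
    'football' => '/sports/football.php',
    'tennis'   => '/sports/tennis.php',
];

// Returns a known URL, or null when the input is not one of the keys.
// The user's string never ends up in the output itself.
function resolveSportUrl(string $searchsport): ?string
{
    return SPORT_URLS[$searchsport] ?? null;
}

// In the handler one might write:
//   $url = resolveSportUrl((string) ($_REQUEST['sport'] ?? ''));
//   if ($url !== null) { header('Location: ' . $url); exit; }
var_dump(resolveSportUrl('tennis'));
var_dump(resolveSportUrl('<script>'));
```

Anything outside the map simply resolves to null, so there is nothing for an attacker to inject as long as the raw value is not echoed or stored.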

Uh, it really depends. If you are inserting data into a MySQL DB without sanitizing, the answer is a huge yes. This is something you need to decide for yourself if you aren't going to show code.

The solution you've got in the linked question is pretty safe.
Every possible action is hardcoded in your script.
Nothing to worry about.
Though when asking about a "web form like this", you'd better provide the web form itself, not a link to a question containing code that can only be presumed to be this form's handler.
