PHP - html_entity_decode is not working when I change language - php

This is an odd problem for me. When I am using web site on English, html_entity_decode is working fine, but when I change language, the functions is kinda not working - HTML tags can be seen.
I am using trim(htmlentities($this->input->post('page_srb'))) to insert into DB, and <?php echo html_entity_decode($page->page) ?> to show page. What seems to be a problem?
This is sample of the page when I am using English language (at the moment I am using same text)
This is the same sample of the page when I change to Serbian language:

Never mix view and storage functions, it's bad practice, something i spent years forcing out of my peers in dev agencies.
Store your HTML in your database with the correct collation (utf-8?) then use html_entities_encode whenever you don't wish to output valid HTML for the browser to render (i.e. by default, whatever comes out of the database will be un-escaped/un-encoded that the browser will render).
By doing this,it allows clear separation and guidelines. Classic example is "what if someone edits your text directly in the DB?", you might say that never happens, but it MIGHT at some point, or someone might be able to insert data into that table via another form that doesn't encode data.
Define some programming rules and follow them. If your inserting data, then focus on protecting the store, if outputting, focus on protecting the client. Consistency will pay off in the long run.

Related

How to store user content while avoiding XSS vulnerabilities

I know similar questions have been asked but I am struggling to work out how to do it.
I am building a CMS, rather primitive right now, but it's as a learning exercise; in a production site, I would use an existing solution for sure.
I would like to take user input, which can be styled in a WYSIWYG editor. I would also like them to be able to insert images inline.
I understand I can store HTML in the database but how can I safely re-render this. I know there is no problem with the HTML being stored but it is my understanding that XSS become an issue if I were to just simply dump the user-generated code onto a layout template.
So the question put simply, is how can I store and safely rerender user content in cms? I am using Laravel and PHP. I also have a little knowledge of javascript if its required.
For a CMS where you want to allow some tags but not others, then you want something like HTML Purifier. This will take HTML and run it against a whitelist and regenerate HTML that is safe to display back to the user.
A good and cheap way to avoid cross-site scripting is to get your php program to entitize everything from your users' input before storing it in the database. That is, you want to take this entry from a user
Hi there sucker! I just hacked your site.
<script>alert('You have been pwned!')</script>
and convert it to this before putting it into your database.
Hi there sucker! I just hacked your site.
<script>alert('You have been pwned!')</script>
When you pass < to a browser, it renders it as <, but it doesn't do anything else with it.
The htmlentities() function can do this for you. And, php's htmlspecialchars_decode() can reverse it if you need to. But you shouldn't reverse the operation unless you absolutely must do so, for example to load the document into an embedded editor for changes.
You can also choose to entitize user-furnished text after you retrieve it from your database and before you display it. If you get to the point where several people work on your code, you may want to do both for safety.
You can also render user-provided input inside <pre>content</pre> tags, which tells the brower to just render the text and do nothing else with it.
(Use right-click Inspect on this very page to see how Stack Overflow handles my malicious example.)

Storing HTML template in an SQL database and executing it

I wish to store certain pieces of code in database tables as templates but I am unsure as to whether they are going to create problems or not. I keep reading mixed messages from various different people in different posts and I am just not happy that I am clear on this subject.
I have already worked out that you cannot really echo/ print PHP into a webpage. Obviously you can echo strings of HTML but it becomes awkward when you try to do it with PHP code. The only way I have managed to do this is through eval which is apparently bad in most cases... so I am using another method to implement the templates (i.e. writing a php file to be used as an include file)
The main question I am asking is: is there really a problem with storing the PHP code strings (which include SQL statements) inside text type fields (mediumtext, longtext etc) in tables? Could those SQL statements ever do anything like execute actual actions or would they just remain as text strings?
Just to clarify, the reason I am storing strings of code is because they are templates to be used should the web administrator wish to allocate them to a specific area (div) of the pages.
Use SMARTY or Twig template engine. This will neatly solve your problem and you will not need to store anything in the database. It will also keep your PHP code completely separate from your HTML.
Another option is to use
I can see the need for code in the database for instance if you have multiple sites and want to do a source control between them, and not use any 3rd party software.. I would store in a database and then write the code on to a actual physical page, then run the php from that page...
Do not do this. If your database is ever compromised and someone injects malicious PHP, it may be executed. You should store the templates as files and call them when needed.
And you actually can echo/print PHP. You would do it using eval.
The eval() language construct is very dangerous because it allows execution of arbitrary PHP code. Its use thus is discouraged. If you have carefully verified that there is no other option than to use this construct, pay special attention not to pass any user provided data into it without properly validating it beforehand.

HTML Entities Converting Very Strangely

I am working on a project that involves modifying some existing code and there is a behavior going on that makes absolutely no sense to me. I am hoping somebody has seen something similar and thus can provide some insight as to where the problem is originating from.
The best short example I can give is the following:
A user enters "This & that" into a textarea on an input form and when saved
once it becomes: "This &amp; that", when it is saved again it becomes:
"This &amp;amp;amp; that", save it again and you get:
"This &amp;amp;amp;amp;amp; that".
Obviously the problem continues to get worse with each save. The data actually stored in the DB (MySQL) is the text displayed above, there are no filters on the front-end to convert characters/entities. Obviously if they were being stored properly it would be very easy to slap a call to htmlspecialchars_decode() but that isn't an option yet...
Are there some front-end checks I can be doing to see where the symbols are being mangled? I am looking at the controller that processes the data and it's using a rest event to do so but no where do I see anything that would even try to convert the HTML entities, let alone something that would incorrectly convert them.
As I said in the intro, I hope somebody may have seen this before and can help pinpoint where it might be happening. This is built using PHP (Protean, MVC framework), Propel, patforms/smarty are in play, MySQL (via PDO) on the backend, jQuery for most JS-related stuff.
Your data is being htmlentities() too many times. This is a common, noobish mistake that usually involves urlencoding your data before sending to the database, and encoding it again upon retrieval. Once (on output) is enough. You should never encode it going in.
I hate to answer my own question here but it was in fact a bi-product of a set method buried in the framework that was causing the double encoding. I changed the data flow a bit and now everything is being stored properly and I can now just throw a htmlspecialchars_decode() around the output and life is good.
Thanks for the suggestions everyone!
-- N

Argument for PHP vs. DWT

I was having a "discussion" with my manager today about the merits of using PHP includes and functions as a template to build websites more quickly and efficiently. He has been using Dreamweaver templates for years and sees it as really the best way to go. I would like to start using some different and more efficient methods for web creation, because we need to get through our projects faster. I would like to know in detail what would make Dreamweaver dwts better than using code to accomplish the same task, or vice versa.
His reasoning is:
When you change links on the dwt file, it changes links for every page made from that dwt.
Even if you move pages around in directories, it maintains links to images
Everyone in the company should do it one way, and this is the way he chose (there are two of us, with someone who's just started who needs to learn web design from the beginning, and he plans to teacher her the dwt method)
If you edit a site made with a dwt, you can't change anything in the template (it's grayed out), making it safer
If he's building sites with dwt, and I'm doing it with PHP includes, we can't edit each others' sites. It gets all over the place. When we have new employees in the future, it will get all crazy and people can't make changes to others' sites if they're out of the office.
I've been studying PHP these days, and am thrilled with how powerful it is for creating dynamic pages. The site in question which sparked this "discussion" is more or less static, so a dwt would work fine. However, I wanted to stretch my wings a bit, and the code was getting REALLY jumbled as the pages grew. So I chopped off the header, footer, and sidebar, and brought them in to all the pages with a php include, as well as dynamically assigned the title, meta data, and description for each page using variables echoed in the header.The reasons I like this better are:
It's cleaner. If every page contains all the data for the header and footer, as well as the extra tags Dreamweaver throws in there, then I have to sift through everything to find where I need to be.
It's safer. It's sort of like the above reason dwts are safe, except I do all my code editing in a text editor like Coda. So on occasion I have accidentally deleted a dwt-protected line of code because those rules only apply within dreamweaver. I can't chop off part of the header if I can't see it. Incidentally, this makes it easier to identify bugs.
It's modern. I look through source when I see great pages made by designers and design firms I admire. I've never seen dwt tags. I believe by using PHP to dynamically grab files and perform other tasks that keeps me from having to go through and change something on every page, life becomes easier, and keeps things streamlined and up-to-date with current web trends and standards.
It's simple. This should be at the top of the list. Like I said we have to train a new person in how to create for the web. Isn't it much better for her to learn a simple line of PHP and get an understanding for how the language works, rather than learn an entire piece of (not exactly user-friendly) software just for the purpose of keeping her work the exact same as everyone else's? On that note, I believe PHP is a powerful tool in a web designer's arsenal, and it would be a sin to prevent her from learning it for the sake of uniformity.
It's fast. Am I mistaken in my thought that a page build with header and footer includes loads faster than one big page with everything in it? Or does that just apply when the body is loaded dynamically with AJAX?
I did extensive searching on Google and Stack Overflow on this topic and this is the most relevant article I could find:
Why would one use Dreamweaver Templates over PHP or Javascript for templating?
The answer is helpful, but I would really like to understand in more detail why exactly we shouldn't switch to a new method if it's simpler and has more potential. My manager insists that "the result is the same, so if there isn't something that makes me say, 'oh wow that's amazing and much better!', then we should just stay how we are now."
(I do apologize for the length of this question, but the guidelines asked that I be as specific as possible.)
Like I said in comments, without knowing what exactly sites you are working with it's hard to tell which PHP features are most important to showcase. However, I can try and describe the most simple kind of sites I was dealing with, and where and how PHP came in handy. If you work with something more complicated, the need of programming language may only increase.
The simple website may have a few different pages with text and images. I'm assuming nothing interactive (i.e. no inquiry form), no large amount of structured data (i.e. no product catalog), only one design template which is used by every page with no differences whatsoever. Here's the typical structure:
One PHP file (index.php) for handling all sorts of php-ish stuff
One design file (template.php for example) for storing everything html-ish (including header, footer and more. Basically all html with placeholders for text and menu)
One CSS file for, well, the site CSS
Most of the texts are stored in database or (worst case) just txt files. Menu (navigation) is stored in database as well
Images folder with all the needed images
The key features here are:
Simplicity. You only have as many files and code as you really need to keep things organized and clear
Reusability. You can basically copy/paste your php code with little to no changes for a new similar website
No duplicates whatsoever.
Data and design separation. Wanna change texts, fix typos? You do it without as much as touching design (html) files. Wanna make a completely brand new design for your website? You can do it without even knowing what those texts are or where they are kept.
like deceze said, no lock-ins. Use whatever software you like. (Including Dreamweaver)
PHP is responsible for taking texts, menus, design and rendering them all into a web page. If website is in more than 1 language, PHP code choose the right texts for the language of visitors choice.
If texts are stored in database, you don't even need notepad and ftp. You just need, i.e., phpMyAdmin (stored in server) so you can connect directly to database and edit any text you like using only web browser; from anywhere in the world. (I am assuming no real CMS). If you need to add one more page, you connect to database using myAdmin and browser, enter the page name (for menu) in 1 or more languages, enter the text for new page (in 1 or more languages), done! new page created, name placed in the menu, all hyperlinks generated for you. If you need to remove a page, you connect to database and click delete. If you need to hide a page for a while (i.e. for proof reading before publishing), you connect to database and uncheck "published" box.
All this doesn't come with just using database ofcourse, you need to program these features with PHP. It may take about 1 - 3 hours depending on experience and the code is fully reusable for every similar website in the future. Basically you just copy/paste php file, copy/paste database tables, enter new text and menu into database, put placeholders into your html file and done! brand new site created.
Which immediately makes most of the reasoning for DWT irrelevant. You don't move files around because you have only one html file and no directories, you don't need grayed out template because texts/images (content) and template are not even in the same file, there's no such thing as changing links in dwt file because it's PHP that generates them on the fly (these are not real links to real html files but rather links with parameters to tell PHP which exactly page must be rendered.. because remember we have just 1 file). The bottom line is, comparing features of the two side by side is like comparing features of a sword vs machinegun. Sharpness and length of the blade concepts are meaningless in a case of machinegun; while lifetime sword user won't really get the meaning of velocity and caliber before he tries and uses machinegun. And yet, while you can't compare their features one by one, no one brings sword to a gunfight for a reason :)
As for #3, currently there are many more people working with PHP than DWT (in a case you will need more employees in the future, or if other people will need to work with your websites later, etc.) As for #5, you can edit PHP websites with Dreamweaver as fine as DWT websites.
That's just off the top of my head. I wrote this in my lunch break so I likely forgot or missed quite a few things. I hope you will get a proper answer with detailed DWT vs PHP comparison too.
You simply can't compare PHP vs. DWT.
PHP is a programming language, where templating is just one of it's numerous features, and DWT is just a silly proprietary system to build simple web pages.
there is actually nothing to compare.
I would say that using DWT templates over PHP do have some advantages.
It does not need any extra server-side process, like PHP to process the files at the server.
You can serve all files to the user as .html files rather than .php files, though I suspect that it is possible to hide the .php extension. Why should any user see anything other than .html?
You don't have to learn PHP syntax/programming. It is true that you can do more with PHP that plain .dwt files but for plain templating the .dwt files can be just as clean.
It is not true that .dwt files are a lock-in technology. The feature is also implemented by other web editors, e.g. Microsoft Expression Web.

How should I store textual data that won't change very often?

As an exercise in web design and development, I am building my website from the ground up, using PHP, MySQL, JavaScript and no frameworks. So far, I've been following a model-view-controller design. However, there is one hurdle that I am quickly approaching that I'm not sure how I'm going to solve, but I'm sure it's been addressed before with varying degrees of success.
On my website, I'm going to have a resume and an "about me" bio section. These probably won't be changing very often.
For my resume, I think that XML that can be rendered into HTML (or any other format) is the best option, and in that case, I could even build a "resume manager" using PHP that can edit the underlying XML. A resume also seems like it could be built on top of MySQL, as well, and generated into XML or HTML or whatever output format I choose.
However, I'm not sure how to store my about me/bio. My initial idea was a plain text document that can be read it, parsed, and the line breaks converted to paragraphs. However, I'm not sold on that being the best idea. My other idea was using MySQL, but I think that might be overkill for a single page. What I do know, however
What techniques have you used when storing text for a page that will not change very often? How did they work out for you - what problems or successes did you have?
Like McWafflestix said, use HTML, if you want to output HTML. Simplest case within PHP:
<?php
create_header_stuff();
include('static_about.html');
create_footer_stuff();
?>
and in static_about.html something like
<div id="about">
...
</div>
Cheers,
Just use a static page, if the information won't change very often. Just using static HTML gives you more control over the display format.
Generally treating infrequently changing information the same as frequently changing information works well if you add one other component: caching.
Whatever solution you decide on for the back end, store the output in a cache and then check to see if the data has changed. Version numbers or modified dates work well here. If it hasn't changed, just give the cached data. If it has changed then you rebuild the content, cache it and display.
As far as structure goes, I tend to use text blobs in a database if there is any risk that there will be more dynamic databases. XML is a great protocol for communicating between services and as an intermediate step, but I tend to use a database under all my projects because eventually I end up using it for other things anyway.

Categories