Saving raw html of a dynamically created page - php

I'm writing an application that would allow users to edit a calendar, its description and a few other things. I'm using jquery, php and mysql. Each time the user makes a change it asynchronously updates the database.
I'd like to give them the option of turning what they make into a pdf. Is there a way that I can post to my server the raw html of the page after the user makes changes?
I could regenerate the page using only php on the server, but this way would be easier if possible.

You can use this to get most of the HTML for the page:
var htmlSource = document.getElementsByTagName('html')[0].innerHTML;
However it'll lack the opening and closing HTML tags and doctype, which probably won't matter to you as you could recreate that very easily back on the server.
I'll assume you can just use the same AJAX you're already using to send htmlSource to the server once you've grabbed it.
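A minimal sketch of that idea, assuming a modern browser where document.documentElement.outerHTML includes the opening and closing html tags and document.doctype exposes the doctype name. The function name serializePage and the /save-page.php endpoint in the usage comment are hypothetical, not something from the question.

```javascript
// Serialize a whole page, including the doctype and the <html> tags.
// `doc` is a Document-like object; in the browser, pass `document`.
function serializePage(doc) {
  // Rebuild a simple doctype; fall back to the HTML5 one if none is present.
  const doctype = doc.doctype
    ? '<!DOCTYPE ' + doc.doctype.name + '>'
    : '<!DOCTYPE html>';
  return doctype + '\n' + doc.documentElement.outerHTML;
}

// Browser usage with jQuery (hypothetical endpoint):
// $.post('/save-page.php', { html: serializePage(document) });
```

The server should still treat whatever arrives as untrusted input, since the user can edit the DOM before it is posted.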

You can certainly return the innerHTML of any object that you can select with jQuery, although it doesn't seem like the best way to go (see other answers for alternatives).
Watch out for XSS attacks. If you just run the HTML back and forth without checking it first you are leaving yourself open to major risks.

Regenerating the page from the server is going to be your best bet. To have a good downloading experience, you'll want to be able to send headers for Content-Type and size.
To answer your question, I would use output buffering to capture the output of your scripts, and then use one of the many tools available for turning HTML to PDF.

Related

How to store user content while avoiding XSS vulnerabilities

I know similar questions have been asked but I am struggling to work out how to do it.
I am building a CMS, rather primitive right now, but it's as a learning exercise; in a production site, I would use an existing solution for sure.
I would like to take user input, which can be styled in a WYSIWYG editor. I would also like them to be able to insert images inline.
I understand I can store HTML in the database, but how can I safely re-render it? I know there is no problem with the HTML being stored, but it is my understanding that XSS becomes an issue if I were to simply dump the user-generated code onto a layout template.
So the question, put simply, is: how can I store and safely re-render user content in a CMS? I am using Laravel and PHP. I also have a little knowledge of JavaScript if it's required.
For a CMS where you want to allow some tags but not others, then you want something like HTML Purifier. This will take HTML and run it against a whitelist and regenerate HTML that is safe to display back to the user.
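To illustrate the whitelist idea only, here is a deliberately strict JavaScript sketch: it escapes everything, then restores a handful of bare tags that carry no attributes. It is not how HTML Purifier works internally and is no substitute for it; the tag list and the function names are assumptions made for the example.

```javascript
// Tags allowed through, with no attributes at all (illustrative list).
const ALLOWED_TAGS = ['b', 'i', 'em', 'strong', 'p', 'br'];

// Escape every character that has special meaning in HTML.
function escapeAll(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Escape everything, then restore only bare allowed tags such as <b> or </p>.
// A tag with any attribute (for example <b onclick="...">) stays escaped.
function allowlistHtml(html) {
  const restore = new RegExp('&lt;(/?)(' + ALLOWED_TAGS.join('|') + ')&gt;', 'gi');
  return escapeAll(html).replace(restore, '<$1$2>');
}
```

Because nothing with attributes survives, this cannot support inline images or links; a real CMS that needs those should use a proper whitelist-based sanitizer on the server.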
A good and cheap way to avoid cross-site scripting is to get your php program to entitize everything from your users' input before storing it in the database. That is, you want to take this entry from a user
Hi there sucker! I just hacked your site.
<script>alert('You have been pwned!')</script>
and convert it to this before putting it into your database.
Hi there sucker! I just hacked your site.
&lt;script&gt;alert('You have been pwned!')&lt;/script&gt;
When you pass &lt; to a browser, it renders it as <, but it doesn't do anything else with it.
The htmlentities() function can do this for you, and PHP's html_entity_decode() can reverse it if you need to. But you shouldn't reverse the operation unless you absolutely must do so, for example to load the document into an embedded editor for changes.
You can also choose to entitize user-furnished text after you retrieve it from your database and before you display it. If you get to the point where several people work on your code, you may want to do both for safety.
Do not rely on rendering user-provided input inside <pre>content</pre> tags for this: <pre> only preserves whitespace and line breaks, and the browser still parses any markup inside it, so the content must be entitized first.
(Use right-click Inspect on this very page to see how Stack Overflow handles my malicious example.)
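As a JavaScript analogue of the entitize-and-reverse idea above (the answer itself uses PHP), here is a small sketch. The function names entitize and deEntitize are made up for the example, and it covers only the five characters that matter for HTML, including both quote characters.

```javascript
const TO_ENTITY = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
const FROM_ENTITY = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'" };

// Replace HTML-significant characters with entities so they render as text.
function entitize(text) {
  return String(text).replace(/[&<>"']/g, (ch) => TO_ENTITY[ch]);
}

// Reverse entitize() in a single pass, so "&amp;lt;" becomes "&lt;", not "<".
function deEntitize(text) {
  return String(text).replace(/&(?:amp|lt|gt|quot|#39);/g, (ent) => FROM_ENTITY[ent]);
}

const userInput = "<script>alert('You have been pwned!')</script>";
const stored = entitize(userInput);
// stored is now inert text; deEntitize(stored) gives back the original string.
```

As the answer says, only reverse the operation when you really have to, such as when loading the content back into an editor.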

Is XSS possible with handsontable and no PHP?

My webpage has the php extension, but there is no PHP code in it. It uses handsontable: the user inserts some numbers and gets some cool JS effects on the same page. When I was writing the code for comments, I used strip_tags as a protection, but that was in PHP. Now I am curious whether there is any danger in leaving handsontable as it is.
Well, the question is: can a visitor alter the content in a way that makes another visitor load something into their browser that was not intended by the developers? If it's purely client side and you do not store or share any user input, I think it's pretty safe. If you have any unused PHP scripts, remove them.

Get contents of DOM via PHP

I need to get the contents of a website through PHP, however, the content is only available when JavaScript is enabled. The workaround that I am using now is making an applescript to open the website in Safari, and selecting all of the page content, copying it to the clipboard, and pasting it.
That will be really hard to achieve, I guess. If you observe the JS on that page that is responsible for getting the content ready, you may discover it's just another AJAX call that you may be able to make directly from your PHP script.
best possible solution: ask the website owner for api/export access ;)
If that is not possible, you can only pray that you can analyze the requests that are initialized via JavaScript and imitate them.
(possible tools: firefox with firebug or tamper data plugin).
Warning: the owner of the website might not like this approach; in fact, scraping the data automatically may be disallowed.
What do you mean by:
the content is only available when JavaScript is enabled
Does the page pull data from somewhere via JS? Would it be easier to analyse where the data is coming from and access that place directly from PHP?

Editing and Saving user HTML with Javascript - how safe is it?

For example I have a Javascript-powered form creation tool. You use links to add html blocks of elements (like input fields) and TinyMCE to edit the text. These are saved via an autosave function that does an AJAX call in the background on specific events.
The save function being called does the database protection, but I'm wondering if a user can manipulate the DOM to add anything he wants (like custom HTML, or an unwanted script).
How safe is this, if at all?
First thing that comes to mind is that I should probably search for, and remove any inline javascript from the received html code.
Using PHP, JQuery, Ajax.
Not safe at all. You can never trust the client. It's easy even for a novice to modify DOM on the client side (just install Firebug for Firefox, for example).
While it's fine to accept HTML from the client, make sure you validate and sanitize it properly with PHP on the server side.
Are you saving the full inline-html in your database?
If so, try to rework things so that you save only the necessary data to your backend. ALL fields should also be checked to confirm they are received in the expected form.
All inline-js is easily removed.
You can never trust the user!
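A sketch of what "save only the necessary data and check every field" could look like, written in JavaScript for illustration even though the question's backend is PHP. The field shape ({ type, label }), the allowed types, and the length limit are all assumptions made for the example.

```javascript
// Field types the form builder is allowed to store (hypothetical list).
const ALLOWED_TYPES = new Set(['text', 'email', 'checkbox', 'textarea']);

// Return a cleaned copy of the submitted fields, or throw on anything unexpected.
function validateFields(fields) {
  if (!Array.isArray(fields)) throw new Error('fields must be an array');
  return fields.map((field, index) => {
    if (field === null || typeof field !== 'object') {
      throw new Error('field ' + index + ': not an object');
    }
    if (!ALLOWED_TYPES.has(field.type)) {
      throw new Error('field ' + index + ': unknown type');
    }
    if (typeof field.label !== 'string' || field.label.length === 0 || field.label.length > 100) {
      throw new Error('field ' + index + ': bad label');
    }
    // Copy only the known properties; anything else the client sent is dropped.
    return { type: field.type, label: field.label };
  });
}
```

The label is still free text, so it has to be escaped when it is rendered; the check here only guarantees the structure.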
Absolutely unsafe, unless you take the steps to make it safe, of course. StackOverflow allows certain tags, filtered so that users can't do malicious things. You'll definitely need to do something similar.
I'd opt to sanitize input server side so that everyone gets their input sanitized, whether they've blocked scripts or not. Using something like this: http://www.phpclasses.org/package/3746-PHP-Remove-unsafe-tags-and-attributes-from-HTML-code.html or http://grom.zeminvaders.net/html-sanitizer implemented with AJAX would be a pretty good solution.

How to disable or encrypt "View Source" for my site

Is there any way to disable or encrypt "View Source" for my site so that I can secure my code?
Fero,
Your question doesn't make much sense. The "View Source" is showing the HTML source—if you encrypt that, the user (and the browser) won't be able to read your content anymore.
If you want to protect your PHP source, then there are tools like Zend Guard. It would encrypt your source code and make it hard to reverse engineer.
If you want to protect your JavaScript, you can minify it with, for example, YUI Compressor. It won't prevent the user from using your code since, like the user, the browser needs to be able to read the code somehow, but at least it would make the task more difficult.
If you are more worried about user privacy, you should use SSL to make sure the sensitive information is encrypted when on the wire.
Finally, it is technically possible to encrypt the content of a page and use JavaScript to decrypt it, but since this relies on JavaScript, an experienced user could defeat this in a couple of minutes. Plus all these problems would appear:
Search engines won't be able to index your pages...
Users with JavaScript disabled would see the encrypted page
It could perform really poorly depending on the amount of content you have
So I don't advise you to use this solution.
You can't really disable that because eventually the browser will still need to read and parse the source in order to output.
If there is something SO important in your source code, I recommend you hide it on server side.
Even if you encrypt or obfuscate your HTML source, eventually we still can eval and view it. Using Firebug for instance, we can see source code no matter what.
If you are selling PHP software, you can consider Software as a Service (SaaS).
So you want to encrypt your HTML source. You can encrypt it using some javascript tool, but beware that if the user is smart enough, he will always be able to decrypt it doing the same thing that the browser should do: run the javascript and see the generated HTML.
EDIT: See this HTML scrambler as an example on how to encrypt it:
http://www.voormedia.com/en/tools/html-obfuscate-scrambler.php
EDIT2: And .. see this one for how to decrypt it :)
http://www.gooby.ca/decrypt/
Short answer is no. HTML is an open text format; whatever you do, if the page renders, people will be able to see your source code. You can use JavaScript to disable the right click, which will work on some browsers, but anyone wanting to use your code will know how to avoid this. You can also store the HTML encoded and have JavaScript emit it, but this will have a bad impact on development, accessibility, and load speed. After all that, anyone with Firebug installed will still be able to see your HTML code.
There is also very rarely a lot of value in your HTML; your real IP (intellectual property) is in your server code, which stays safe and sound on your server.
This is fundamentally impossible. As (almost) everybody has said, the web browser of your user needs to be able to read your html and Javascript, and browsers exist to serve their users -- not you.
What this means is that no matter what you do there is eventually going to be something on a user's machine that looks like:
<html>
<body>
<div id="my secret page layout trick"> ...
</div>
</body>
</html>
because otherwise there is nothing to show the user. If that exists on the client-side, then you have lost control of it. Even if you managed to convince every browser-maker on the planet to not make that available through a "view source" option -- which is, you know, unlikely -- the text will still exist on that user's machine, and somebody will figure out how to get to it. And that will never happen, browsers will always exist to serve their users before all others. (Hopefully)
The same thing is true for all of your Javascript. Let me say it again: nothing that you send to a user is secure or secret from that user. The encryption via Javascript hack is stupid and cannot work in any meaningful sense.
(Well, actually, Flash and Silverlight ship binaries, but I don't think that they're encrypted. So they are at the least irritating to get data out of.)
As others have said, the only way to keep something secret from your users is to not give it to them: put the logic in your server and make sure that it is never sent. For example, all of the code that you write in PHP (or Python/Ruby/Perl/Java/C...) should never be seen by your users. This is e.g. why Google still has a business. What they give you is fundamentally uninteresting compared to what they never send to you. And, because they realize this, they try to make most things that they send you as open and as useful as possible. Because it's the infrastructure -- the terabyte-huge maps database and pathfinding software, as opposed to the snazzy map that you can click and drag -- that you are trading your privacy for.
Another example: I'm not sure if you remember how many tricks people employed in the early days of the web to try and keep people from saving images to disk. When was the last time you ran across one of those? Know why? Because once data is on your user's machine, she controls it. Not you.
So, in short: if you want to keep something secret from your user, don't give it to her.
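To make the point concrete, here is a small sketch in which Base64 stands in for any client-side "encryption" scheme. The function names and the sample markup are invented for the example; btoa and atob are assumed to be available, as they are in browsers and recent Node.js versions.

```javascript
// "Protect" markup by encoding it; the page would ship this string plus a decoder.
function obfuscate(html) {
  return btoa(html); // Base64, standing in for any client-side scheme
}

// The decoder has to be shipped to the browser too, so any visitor can call it.
function reveal(encoded) {
  return atob(encoded);
}

const secret = '<div id="my-secret-layout">...</div>';
const shipped = obfuscate(secret);
// Anyone who opens the console can run reveal(shipped) and read the markup.
```

A stronger cipher changes nothing here: the key and the decryption routine still have to be delivered to the same browser.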
You can't. The browser needs the source to render the page. If the user wishes, the user may have the browser show the source. Firefox can also show you the DOM of the page. You can obfuscate the source but not encrypt it or lock the user out.
Also, why would you want this? It seems like a lame-ass thing to do :P
I don't think there is a way to do this. If you encrypt the HTML, how will the browser understand it?
No. The browsers offer no ability for the HTML/javascript to disable that feature (thankfully). Plus even if you could the HTML is still transmitted in plain text ready for a HTTP sniffer to read.
Best you could do would be to somehow obscure the HTML/javascript to make it hard to read. But then debuggers like Firebug and IE 8's debugger will reconstruct it from the DOM, making it easy to read.
You can, in fact, disable the right click function. It is useless to do so, however, as most browsers now have built-in inspector tools which show the source anyway. Not to mention that other workarounds (such as saving the page, then opening the source, or simply using hotkeys) exist for viewing the HTML source. Tutorials for disabling the right click function abound across the web, so a quick Google search will point you in the right direction if you feel an overwhelming urge to waste your time.
There is no foolproof way.
But you can fool many people with a simple hack using the methods below:
"window.history.pushState()" and
adding oncontextmenu="return false" in body tag as attribute
Detail here - http://freelancer.usercv.com/blog/28/hide-website-source-code-in-view-source-using-stupid-one-line-chinese-hack-code
You can also use “javascript obfuscation” to further complicate things, but it won’t hide it completely.
“Inspect Element” can reveal everything beyond view-source.
Yes, you can have your whole website being rendered dynamically via javascript which would be encrypted/packed/obfuscated like there is no tomorrow.
