I have read a lot of threads about scripts that take HTML and output PDF; I have tried lots of them, and I am always disappointed with the results. Many of them ignore external CSS, many can't be run from shared hosting (they need to be installed somewhere inaccessible, like DOMPDF), etc. Also, lots of the threads on the question are pretty old (most were asked in 2010).
Question: Is there a simple way to cURL (from a PHP script) a remote web page and simply save a PDF of the "print" (as in CSS media print) version of the page, or even a JPEG, a DOCX, or anything that "contains" the images and the styling for offline viewing? More importantly, can it be free/open source?
All web browsers do this with no effort: once on the page, just press Ctrl+P and there it goes (almost). Why is it so hard to find a good script that can do this? Is there a way to emulate a browser, or what...?
Isn't it possible to cURL the page, force the CSS print media type, and then take a snapshot of that?
The difficulty of finding this seems very strange to me... it feels like a fairly simple task.
Try calling wkhtmltopdf (from the wkhtmltox package) from PHP:
wkhtmltox/bin/wkhtmltopdf www.stackoverflow.com stackoverflow.pdf
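For example, you could shell out to the binary from PHP. A minimal sketch, assuming the binary path below is correct for your install and that shell_exec() is allowed on your host:

<?php
// Render a remote page to PDF with the wkhtmltopdf binary.
// The binary path and output location are assumptions; adjust for your setup.
$bin = '/path/to/wkhtmltox/bin/wkhtmltopdf';
$url = 'http://www.stackoverflow.com';
$out = '/tmp/stackoverflow.pdf';
shell_exec(escapeshellcmd($bin) . ' ' . escapeshellarg($url) . ' ' . escapeshellarg($out));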
This PHP library seems to work with wkhtmltox:
http://thejoyofcoding.org/php-wkhtmltox/
This might help:
http://davidbomba.com/php-wkhtmltox/
I'm looking for a tool to facilitate multiple file uploads on a web page from a single file-browse dialogue. I know this has been asked previously, but I can't find anything current.
I'd like to check file size prior to upload, and I gather Flash is still the only way to do that cross-browser?
Ideally, I'd like an upload progress meter. I'll be using Linux and Apache servers, but don't have access to install add-ons such as PHP APC. Again, I assume something Flash-based is the only option there?
I've looked at SWFUpload, but that appears to be another of these projects where the developers have become quite zealous and turned a simple concept into a full suite of tools for the masses. It seems quite cumbersome and I don't think I want to use it for my purpose.
I'd prefer not to have to write something from scratch for this. Could someone recommend something, or perhaps suggest a non-Flash alternative if there is one? I do need full cross-browser compatibility without too many layers of degradation, so anything HTML5-only probably isn't what I want.
Thanks
As I mentioned earlier today ( Multiple file upload (client side) )
I am a big fan of Plupload, which can check file size, show a progress bar, use a single dialog for multiple files, and supports runtimes other than Flash if needed.
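Whatever client-side uploader you choose, the PHP receiving end can stay very simple. A generic sketch (not Plupload's bundled handler; the 'file' field name and uploads/ directory are assumptions, and Plupload's own example handler also supports chunking):

<?php
// Generic upload receiver sketch. Field name and target directory are assumptions.
if (!empty($_FILES['file']) && $_FILES['file']['error'] === UPLOAD_ERR_OK) {
    $name = basename($_FILES['file']['name']);   // never trust the client-supplied path
    move_uploaded_file($_FILES['file']['tmp_name'], __DIR__ . '/uploads/' . $name);
}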
Is there any library/module available with which I can generate images from a SWF file?
The problem I am trying to solve is: I want to create a PDF from a web page, and I am having problems doing that when the web page contains SWF (question on Stack Overflow).
I am thinking that if I am able to read a SWF and write out an image, I will be able to solve the above problem.
Will appreciate your help. Thanks
In fact, that is quite a hard task. I've spent hours looking for a program that could fulfill such a task. However, I eventually only found one. It isn't open-source but would really help you I guess.
Flash Animation Source can output all frames of a SWF file. It uses a DirectShow filter to do so, so a Windows operating system is required unless you want to use Wine.
In short, you'll just need to install Flash Animation Source on your computer and then find a way to grab DirectShow frames from your desired programming language. Everything else is actually quite easy: you tell Windows the directory and the name of your SWF file and it'll do the rest, delivering you an image. And since DirectShow filters can deliver every frame of a video, you can choose which one you'd like to have.
By the way, please don't try to find another way to get an image of your SWF file. Believe me, you won't find one. I have looked for an open-source program that fits my needs, but all of them fail. You need to use the proprietary Adobe Flash player for your output; there is no other option, as the open-source alternatives still need a lot of development before they can genuinely render a vector-based frame as it should appear.
I was wondering if there was any way of turning an entire HTML page into a PNG (or another kind of image)? I'm trying to create PDFs on the fly, but they pull my styles across as plain text, and I want the result to look the same as the page (Cufón and all). Any help would be appreciated! :)
This doesn't look straightforward. The backend (PHP etc.) doesn't do rendering or layout; it merely generates content.
The layout and visual aspects of the website are produced by the client (the browser), and the backend has no way of accessing this.
However, given an HTML file, there are libraries such as Prince XML that can render it to a PDF and seem to be capable of this.
The only way to generate an image identical, or even close, to what a visitor sees in their browser when viewing your site is to launch a browser and take a screenshot. You need the browser's rendering engine to render the page. All the libraries that do it without a browser produce something quite different from what the visitor sees, and won't render Cufón or other fancy things at all.
Companies that offer screenshot previews of a webpage now run many servers, each running many virtual PCs, each running a full operating system and real web browser. They have all those systems pulling jobs, opening the webpages in real browsers, taking screenshots and saving images. You won't replicate that with a little PHP script.
http://ipinfo.info/html/rendering_services.php
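If a full browser farm is overkill, a middle ground is the wkhtmltoimage binary from the same wkhtmltox package mentioned earlier, which drives the WebKit rendering engine headlessly. A minimal sketch, assuming the binary is installed and shell_exec() is available on your host:

<?php
// Render a page to PNG with wkhtmltoimage (ships with wkhtmltox).
// URL and output path are placeholders; adjust for your setup.
$url = 'http://example.com/';
$png = '/tmp/page.png';
shell_exec('wkhtmltoimage ' . escapeshellarg($url) . ' ' . escapeshellarg($png));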
Turning web pages into images and PDFs is a royal pain using PHP. Solutions often require OS-level scripting, fake printer drivers, or screen capturing, which can make for a rather fragile setup. I ran into the same issue a few years ago and started working on a native PHP extension that leveraged the Gecko engine to render HTML to PDF, but never finished it.
The best answer I've seen doesn't quite turn a full web page into a PDF, but instead does XML to PDF. XEP by RenderX is the commercial tool Apple uses to produce developer documentation in many formats, including HTML and beautifully rendered PDFs, from an XML source. The great thing about using the XEP tool in conjunction with PHP is that PHP deals with XML very well, so you can pass generated XML to the XEP binary, let it do the conversion to PDF, then deal with the resulting PDF file in PHP.
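A rough sketch of that PHP-to-XEP hand-off (the xep command-line flags shown here are assumptions; check the XEP documentation for your version):

<?php
// Write the generated XML/XSL-FO to a temp file, hand it to the XEP binary,
// then pick up the resulting PDF in PHP. Flags are assumptions, not a fixed recipe.
$fo  = tempnam(sys_get_temp_dir(), 'xep_') . '.fo';
$pdf = tempnam(sys_get_temp_dir(), 'xep_') . '.pdf';
file_put_contents($fo, $generatedXml);   // $generatedXml built elsewhere in your PHP code
shell_exec('xep -fo ' . escapeshellarg($fo) . ' -pdf ' . escapeshellarg($pdf));
$pdfBytes = file_get_contents($pdf);     // now deal with the PDF file in PHP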
Consider building a regular PDF file that resembles your web page (see the sketch after this list):
PHP::PDF - constructing the PDF in PHP.
PDF Reference - file structure.
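As a concrete illustration of that approach, here is a tiny sketch using FPDF as a stand-in for whichever PDF-construction library you pick (FPDF is not named above; it is just one common free option):

<?php
// Build a simple PDF directly in PHP using FPDF (stand-in example library).
require 'fpdf.php';

$pdf = new FPDF();
$pdf->AddPage();
$pdf->SetFont('Arial', 'B', 16);
$pdf->Cell(0, 10, 'My page as a PDF', 0, 1);   // heading line
$pdf->SetFont('Arial', '', 12);
$pdf->MultiCell(0, 8, 'Body text laid out to resemble the original web page...');
$pdf->Output();                                 // send the PDF to the browser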
I've bumped into a problem while working on a project. I want to "crawl" certain websites of interest and save them as "full web pages", including styles and images, in order to build a mirror of them. It has happened to me several times that I bookmarked a website to read later, and a few days later the website was down because it got hacked and the owner didn't have a backup of the database.
Of course, I can read the files with PHP very easily with fopen("http://website.com", "r") or fsockopen(), but the main goal is to save the full web pages, so that if a site goes down it can still be available to others, like a "programming time machine" :)
Is there a way to do this without reading and saving each and every link on the page?
Objective-C solutions are also welcome, since I'm trying to learn more of that language as well.
Thanks!
You actually need to parse the HTML and all the CSS files that are referenced, which is NOT easy. However, a fast way to do it is to use an external tool like wget. After installing wget you could run the following from the command line:
wget --no-parent --timestamping --convert-links --page-requisites --no-directories --no-host-directories -erobots=off http://example.com/mypage.html
This will download mypage.html and all linked CSS files, images, and the images referenced inside the CSS.
After installing wget on your system, you can use PHP's system() function to control wget programmatically (see the sketch below).
NOTE: You need at least wget 1.12 to properly save images that are referenced through CSS files.
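A minimal sketch of driving wget from PHP, assuming the wget binary is on the PATH and system() is not disabled on your host (URL and output directory are placeholders):

<?php
// Mirror a single page plus its requisites with wget, driven from PHP.
$url = 'http://example.com/mypage.html';
$dir = '/var/www/archive/mypage';   // placeholder output directory
@mkdir($dir, 0755, true);
$cmd = 'wget --no-parent --timestamping --convert-links --page-requisites'
     . ' --no-directories --no-host-directories -erobots=off'
     . ' -P ' . escapeshellarg($dir)
     . ' ' . escapeshellarg($url);
system($cmd, $exitCode);            // $exitCode is 0 on success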
Is there a way to do this without reading and saving each and every link on the page?
Short answer: No.
Longer answer: if you want to save every page in a website, you're going to have to read every page in a website with something on some level.
It's probably worth looking into the Linux app wget, which may do something like what you want.
One word of warning: sites often have links out to other sites, which have links to other sites, and so on. Make sure you put some kind of "stop if this is a different domain" condition in your spider (see the sketch below)!
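A minimal sketch of that same-domain check, assuming your spider resolves each link to an absolute URL first ($startUrl and $link are placeholders):

<?php
// Skip links that leave the starting host. Relative links must be resolved
// to absolute URLs before this check, otherwise parse_url() returns no host.
function sameHost($startUrl, $link) {
    return strcasecmp(parse_url($startUrl, PHP_URL_HOST), parse_url($link, PHP_URL_HOST)) === 0;
}

// inside the crawl loop:
// if (!sameHost($startUrl, $link)) { continue; }   // don't follow off-site links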
If you prefer an Objective-C solution, you could use the WebArchive class from WebKit.
It provides a public API that allows you to store whole web pages as a .webarchive file (like Safari does when you save a web page).
Some nice features of the webarchive format:
completely self-contained (incl. CSS, scripts, images)
QuickLook support
Easy to decompose
Whatever app is going to do the work (your code, or code that you find) is going to have to do exactly that: download a page, parse it for references to external resources and links to other pages, and then download all of that stuff. That's how the web works.
But rather than doing the heavy lifting yourself, why not check out curl and wget? They're standard on most Unix-like OSes, and do pretty much exactly what you want. For that matter, your browser probably does, too, at least on a single page basis (though it'd also be harder to schedule that).
I'm not sure whether you need a programming solution to 'crawl websites' or just need to save websites for offline viewing personally, but if it's the latter, there's a great app for each platform: Teleport Pro for Windows and SiteCrawler for Mac.
You can also use IDM (Internet Download Manager) for downloading full web pages; HTTrack is another option.
I have a few sites I built for my work, nothing major, mainly just little tools which people can access and use when they're out of the office. I'm not very experienced as a developer but I like to tinker quite a lot, and I was wondering if anyone had any clever little tweaks I could apply to my sites to make them download faster? We have an office in South America with a poor internet connection, and the people there constantly complain that my sites take too long to use. So far I have found the following site, which was quite useful, and the guys in the other office said they'd seen a difference in the service: www.dev-explorer.com/articles/apache-optimisation
Anyone know of any more little bits and pieces I could do?
Any help is much appreciated.
Thanks in advance
John
Look into YSlow and read the Yahoo Developer blog. You can do a lot by optimizing the front-end:
Limit the number of HTTP requests (CSS, JS, images)
Use mod_deflate in Apache to gzip your content (see the sketch after this list)
Use a far-future Expires header whenever possible
Make your HTML markup as lean as possible
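A rough .htaccess sketch for the mod_deflate and Expires items above (assumes mod_deflate and mod_expires are enabled; the MIME types and lifetimes are just examples):

# Sketch only: requires mod_deflate and mod_expires to be enabled.
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/css application/javascript
</IfModule>
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/png "access plus 1 year"
    ExpiresByType text/css "access plus 1 month"
</IfModule>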
Two things (from YSlow) that will help are a CDN (Content Delivery Network) and cookie-less servers for static content.
Even if you can just push your images to load off another server, you'll be able to load your HTML content faster while image downloading happens in the background from the other server(s).
Try to have these other servers (for images, CSS, and scripts) be cookie-less if possible. It's a minor saving, but it sounds like you're trying to squeeze every last drop. ;-)
And of course, cache everything except your HTML.
I'd go for YSlow, as already said, and better still (because it's what YSlow is based on), the Yahoo Exceptional Performance Team best practices.
A few simple tricks:
Firstly, limit yourself to exactly one CSS and one JavaScript file. No more. If you have multiple, combine them into one (each). Ideally, your JavaScript should also be minified. I've been using JSMin for this lately.
There are some more advanced techniques to optimize this further. You set the Expires header far in the future so the browser doesn't re-download the file as often. To push changes you then need to change the link to the CSS/JS file, though. You can do this with Apache mod_rewrite and a small PHP script.
More on this in What is an elegant way to force browsers to reload cached CSS/JS files?
You can also use the expires trick on images.
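A rough sketch of that cache-busting idea (the rewrite rule and the helper name are examples, not a fixed recipe):

# .htaccess sketch: map style.1234567890.css back to style.css on disk
RewriteEngine On
RewriteRule ^(.+)\.\d+\.(css|js)$ $1.$2 [L]

<?php
// PHP helper sketch: emit a filename that changes whenever the file changes,
// so a far-future Expires header never serves a stale copy. Helper name is hypothetical.
function versioned($file) {
    $mtime = filemtime($_SERVER['DOCUMENT_ROOT'] . '/' . ltrim($file, '/'));
    return preg_replace('/\.(css|js)$/', '.' . $mtime . '.$1', $file);
}
// usage in a template:
// echo '<link rel="stylesheet" href="' . versioned('/css/style.css') . '">';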
Secondly, gzip your content. Typically all you have to do for this in PHP is start all your scripts with:
ob_start('ob_gzhandler');
This turns on output buffering (a good idea anyway), and if the browser says that it supports gzip encoding, your output will be gzipped before being sent to the client.