I need to make snapshots of web pages programmatically using PHP and get them into an HTML E-Mail.
I tried wget --page-requisites. It downloads everything all right, but it doesn't change the HTML page's source code to point to the downloaded files rather than the on-line originals. Also, that HTML is of course a long way from being displayed properly in an HTML E-Mail.
I'd be interested to know whether there are ready-made solutions for this. I would already be happy with a solution that takes an HTML snapshot and changes the HTML accordingly. Being able to E-Mail it would be the icing on the cake.
I control the web pages being snapshotted, so I can adjust the content to optimize the results.
My server-side platform is PHP, but with very liberal settings: I can execute things like wget and Perl scripts from within PHP. However, I do not have root access and cannot install additional packages or programs.
The task is to make a snapshot of a product page each time somebody places an order, so there is documentation about what the page looked like at the time.
wget has a -k (--convert-links) option, which will convert both links and references to embedded content (like images). See e.g. the wget documentation on advanced use.
For the email part of your question, I'm sure you can use one of the existing libraries. For example, PHP has a PEAR package (I don't remember the exact name) to handle HTML emails; I'm pretty sure both Perl and Python have something similar.
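To give a rough idea of how the two halves could fit together, here's a minimal sketch; it assumes wget is on the PATH and PHP's mail() is configured, and the URL, folder, and filename are made up for illustration:

<?php
$url = 'http://example.com/product/123';
$dir = '/tmp/snapshot-' . time();
mkdir($dir);

// -k rewrites links and image references to the downloaded copies,
// -p fetches the page requisites (images, CSS), -nd flattens directories,
// -E gives the saved page a .html extension.
shell_exec('cd ' . escapeshellarg($dir) . ' && wget -k -p -nd -E ' . escapeshellarg($url));

$html = file_get_contents($dir . '/123.html'); // the actual name depends on the URL

// For a fully self-contained e-mail the local images would still need to be
// inlined (e.g. as cid: attachments); that's where a library such as PEAR's
// Mail_Mime helps.
$headers = "MIME-Version: 1.0\r\nContent-Type: text/html; charset=UTF-8\r\n";
mail('orders@example.com', 'Order snapshot', $html, $headers);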
In this case you're trying to do website mirroring with wget. A simpler solution is to use httrack, a command-line tool. It's very powerful and configurable; try it!
The httrack website presents a GUI, but you don't need it; everything is possible from the command line (or from PHP).
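For example, from PHP (a sketch only; it assumes the httrack binary is on the PATH and uses a made-up URL and folder):

<?php
// -O sets the output folder; httrack rewrites the links so the mirror is
// browsable offline. Add error handling for real use.
shell_exec('httrack ' . escapeshellarg('http://example.com/product/123') .
           ' -O ' . escapeshellarg('/tmp/mirror'));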
If I want to distribute a PHP application with an installer (the OS's package system), how should I proceed? I don't want the PHP files to be there, just a working application, so that when I type 'app' into the console it launches the application, without PHP having to be installed on the system (no PHP installation on the host required). I would also like the application to have patchable byte-code, so it's in parts, loaded when needed, and only a part needs to be replaced on update.
What I would do now is the following:
->Compile PHP with extensions for the specific platform.
->Make a binary application which launches '/full/php app' when the app is launched.
->Pack it in an installer in such a way that the binary is added to the PATH on installation; it launches the specific PHP installation that ships alongside the app, with the entry point as an argument, and the app is running.
The problem is:
Maybe I don't want my PHP files to be exposed (inside the application, the source would be available anyway). Is there some ready-made tool to do this? Is there a better way than what I proposed?
Alternative: modifying OPcache to work with a 'packing' application that delivers byte-code to a modified OPcache which just reads the cache.
My suggestion would be a tiny tool I just finished, for almost exactly the same problem. (Oh yes, I tried all the others, but they're old and rusty: sometimes they're stuck with 4.x syntax, have no support, have no proper documentation, etc.)
So here's RapidEXE:
http://deneskellner.com/sw/rapidexe
In the classical sense it's not a really-real compiler, just a glorified packer, but it does exactly what you need: the output exe will be standalone, carrying everything with it and transparently building an ad-hoc runtime environment. Don't worry, it all happens very fast.
It uses PHP 7.2 / Win64 by default but has 5.x too, for XP compatibility.
It's freeware, obviously. (MIT License.)
(I'm only mentioning this because I don't want anyone to think I'm advertising or something. I just took a few minutes to read the guidelines about own-product answers and I'm trying to stay within the Code of the Jedi here.)
However...
I would also like the application to have patchable byte-code, so it's in parts, loaded when needed, and only a part needs to be replaced on update.
It's easier to recompile the exe. You can extract the payload pieces, of course, but the source pack is one big zip; there seems to be no real advantage in handling it separately. Recompiling a project is just one command.
Maybe I don't want my PHP files to be exposed (inside the application, the source would be available anyway)
In this case, the exe contains your sources compressed, but eventually they get extracted into a temp folder. They're deleted immediately after the run but, well, that is no protection whatsoever. Obfuscation seems to be the only viable option.
If something goes wrong, feel free to comment or drop me a line on developer-at-deneskellner-dot-com. (I mean, I just finished it, it's brand new, it may misbehave so consider it something like a beta for now.)
Happy compiling!
PHP doesn't do that natively, but here are a few ideas:
Self-extracting archive
Many archival programs allow you to create a self-extracting archive, and some even allow you to run a program after extraction. Configure it so that it extracts php.exe and all your code to a temp folder and then runs it from there, deleting everything after the script has completed.
Transpilers/compilers
There's the old HPHPc, which translates PHP code to C++, and its Wikipedia page also contains links to other, similar projects. Perhaps you can take advantage of those.
Modified PHP
PHP itself is open source. You should be able to modify it without too much difficulty to take the source code from another location, like some resource compiled directly inside the php.exe.
Use the Zend Guard tool, which compiles and converts plain-text PHP scripts into a platform-independent binary format known as a 'Zend Intermediate Code' file. These encoded binary files can then be distributed instead of the plain-text PHP. Zend Guard loaders are available for the Windows and Linux platforms and enable PHP to run the scripts encoded by Zend Guard.
Refer to http://www.zend.com/en/products/zend-guard
I would like to add another answer for anyone who might be Googling for answers.
Peach Pie compiler/runtime
There is an alternative method to run (and build apps from) .php source code, without using the standard php.exe runtime. The solution is based on C#/.NET and is actually able to compile PHP source files to .NET bytecode.
This allows you to distribute your program without exposing its source code.
You can learn more about the project at:
https://www.peachpie.io/
You've got 3 overlapping questions.
1. Can I create a stand-alone executable from a PHP application?
Answered in this question. TL;DR: yes, but it's tricky, and many of the tools you might use are semi-abandoned.
2. Can I package my executable for distribution on client machines?
Yes, though it depends on how you answer question 1. If you use the .NET compiler, your options are different from those of the C++ option.
3. Can I protect my source code once I've created the application?
Again, depends on how you answer question 1. Many compilers include an "obfuscator" option which makes it hard to make sense of any information you get from decompiling the app. However, a determined attacker can probably get through that (this is why software piracy is possible).
I have a PHP/MySQL site that is no longer going to get any new content added, but I'd like to keep what I do have as an archive and keep it online. Ideally I'd like to convert it to a static site so that it no longer requires a database.
If anyone else has gone through this process, are there any tools, scripts, or methodologies that can automate this or at least make this easier? I'd want to be able to do things like make sure that all the links still work (so they'd have to somehow be converted to correctly point to the new static versions), things like that.
I have ssh access to the server in question. I'm relatively comfortable with both PHP and Python so tools using those languages would be ideal.
Note: there are two basic reasons I'm doing this:
cost, as it's much cheaper to host just a collection of static files than a dynamic website (I'm using NearlyFreeSpeech and with the bandwidth I'm using I estimate my costs would go down to well under $1/month).
spammers have somehow found my site and keep signing up for accounts (at which point, they're blocked from making comments anyway, but it's still annoying).
If you have shell access to any Linux machine (perhaps even your own webserver would suffice), I'd recommend that you just spider and download a mirror of your own site using wget. Wget is a utility designed to mirror sites as flat files, and it has been in use for quite some time. I believe it should serve you well:
http://www.gnu.org/software/wget/manual/wget.html
I hope that's helpful.
Chris
I have recently used the following to good effect:
wget --mirror -w 2 -p --html-extension --convert-links -P folder_to_save_to http://mysite.com
You might need to use the full path to the wget binary. This will change all the links so that your site is fully static and self-contained.
Using PHP you could write a simple script that would do this (a rough sketch follows the list):
1. Save the current page.
2. Follow the links from that page and save those pages (and for each page, repeat from step 1).
3. Replace the URLs on the current page with those pointing to the saved pages.
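Here is a minimal sketch of those three steps, assuming allow_url_fopen is enabled; the example.com domain check keeps the crawl on your own site, and all names are illustrative:

<?php
function snapshot($url, $dir, &$saved = array()) {
    if (isset($saved[$url])) {
        return $saved[$url];                  // already saved: reuse the local file
    }
    $file = md5($url) . '.html';
    $saved[$url] = $file;                     // mark before recursing to avoid loops

    $doc = new DOMDocument();
    @$doc->loadHTML(file_get_contents($url)); // step 1: fetch the current page

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (strpos($href, 'http://example.com/') === 0) {   // stay on our own site
            $a->setAttribute('href', snapshot($href, $dir, $saved)); // steps 2 and 3
        }
    }
    file_put_contents($dir . '/' . $file, $doc->saveHTML());
    return $file;
}

snapshot('http://example.com/', '.');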
Don't think that I'm mad, I understand how PHP works!
That being said: I develop personal websites and I usually take advantage of PHP to avoid repetition during the development phase. Nothing truly dynamic, only includes for the menus, a couple of foreach loops and the like.
When the development phase ends, I need to give the website to the client as HTML files. Is there a tool (a crawler?) that can do this for me instead of visiting each page and saving the interpreted HTML?
You can use wget to recursively download all the linked pages.
You can read more about this here: http://en.wikipedia.org/wiki/Wget#Recursive_download
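For example (-r recurses, -k converts the links to point at the local copies, -p grabs page requisites such as images, -E adds .html extensions):
wget -r -k -p -E http://mysite.com/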
If you need something more powerful than recursive wget, httrack works pretty well. http://www.httrack.com/
Pavuk offers much finer control than wget, and it will rewrite the URLs in the grabbed pages if required.
If you want to use a crawler, I would go for the mighty wget.
Otherwise you could also use some build tool like make.
You need to create a file named Makefile in the same folder as your php files.
It should contain this:
all: 1st_page.html 2nd_page.html 3rd_page.html
1st_page.html: 1st_page.php
	php 1st_page.php > 1st_page.html
2nd_page.html: 2nd_page.php
	php 2nd_page.php > 2nd_page.html
3rd_page.html: 3rd_page.php
	php 3rd_page.php > 3rd_page.html
Note that each php command is preceded not by spaces but by a tab.
(See this page for the php command line syntax.)
After that, whenever you want to update your html files just type
make
in your terminal to automatically generate them.
It could seem like a lot of work for just a simple job, but make is a very handy tool that you will find useful for automating other tasks as well.
Maybe the command line will help?
If you're on Windows, you can use Free Download Manager to crawl a website.
I am writing a small web server, nothing fancy; I basically just want to be able to show some files. I would like to use PHP though, and I'm wondering if just putting the PHP code inside of the HTML will be fine, or if I need to actually use some type of PHP library?
http://www.adp-gmbh.ch/win/misc/webserver.html
I just downloaded that and I am going to use it to work off of. Basically I am writing a server-side game plugin that will allow game server owners to access a web control panel for their server. Some features would be possible with PHP, so this is my goal. Any help would be appreciated, thanks!
The PHP won't serve itself. What happens in a web server like Apache is that, before the PHP is served to the user, it is passed through a PHP parser. That PHP parser reads, understands and executes anything between <?php ... ?> (or even <? ... ?>) tags, depending on configuration. The resultant output, usually still HTML, is served by the web server.
There are a number of ways to achieve this. Modules to process PHP have been written for Apache, but you do not have to use these. PHP.exe on Windows, installed from windows.php.net, will do this for you. Given a PHP file as an argument, it will parse the PHP and spit the result back out on the standard output.
So, one option for you is to start PHP.exe from within your web server with redirected standard output to your program, and serve the result.
How to create a child process with redirected IO: http://msdn.microsoft.com/en-us/library/ms682499%28VS.85%29.aspx. However, you won't be writing the child process; that'll be PHP.exe.
Caveat: I am not sure, from a security / production-use perspective, whether this is the most secure approach, but it would work.
PHP needs to be processed by the PHP runtime. I'm assuming the case you're talking about is that you have a C++ server answering HTTP queries, and you want to write PHP code out with the HTML when you respond to clients.
I'm not aware of any general-purpose PHP library. The most straightforward solution is probably to use PHP as a CGI program.
Here's a link that might be useful for that: http://osdir.com/ml/php-general/2009-06/msg00473.html
This method is nice because you don't need to write the HTML+PHP out to a file first; you can stream it to PHP.
You need to execute the PHP page to serve the page it generates.
The easiest thing for you to do would be to add CGI support to your webserver in some basic form. This is non-trivial, but not too difficult. Basically you need to pass PHP an environment and input, and retrieve the output.
Once you have CGI support you can just use any executable, including PHP, to generate webpages.
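To illustrate that plumbing, here's a sketch in PHP for brevity (a C++ server would do the same with pipes and CreateProcess); it assumes a php-cgi binary on the PATH, and the script path and query string are made up:

<?php
$env = array(
    'GATEWAY_INTERFACE' => 'CGI/1.1',
    'REQUEST_METHOD'    => 'GET',
    'QUERY_STRING'      => 'user=demo',
    'SCRIPT_FILENAME'   => '/var/www/page.php',
    'REDIRECT_STATUS'   => '1',   // php-cgi refuses to run without this by default
);
$proc = proc_open('php-cgi', array(
    0 => array('pipe', 'r'),      // stdin: request body, for POSTs
    1 => array('pipe', 'w'),      // stdout: CGI headers + rendered HTML
), $pipes, null, $env);
fclose($pipes[0]);
$response = stream_get_contents($pipes[1]);  // hand this back to the client
fclose($pipes[1]);
proc_close($proc);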
I've bumped into a problem while working on a project. I want to "crawl" certain websites of interest and save them as a "full web page", including styles and images, in order to build a mirror for them. It has happened to me several times that I bookmarked a website in order to read it later, and after a few days the website was down because it got hacked and the owner didn't have a backup of the database.
Of course, I can read the files with PHP very easily with fopen("http://website.com", "r") or fsockopen(), but the main target is to save the full web pages so that, in case a site goes down, it is still available to others, like a "programming time machine" :)
Is there a way to do this without reading and saving each and every link on the page?
Objective-C solutions are also welcome, since I'm trying to learn more about it as well.
Thanks!
You actually need to parse the HTML and all the CSS files that are referenced, which is NOT easy. However, a fast way to do it is to use an external tool like wget. After installing wget, you could run the following from the command line:
wget --no-parent --timestamping --convert-links --page-requisites --no-directories --no-host-directories -erobots=off http://example.com/mypage.html
This will download mypage.html and all the linked CSS files, images, and the images referenced inside the CSS.
After installing wget on your system, you can use PHP's system() function to control wget programmatically; a sketch follows the note below.
NOTE: You need at least wget 1.12 to properly save images that are referenced through CSS files.
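Here is a minimal sketch of that, assuming wget is on the PATH:

<?php
$url = 'http://example.com/mypage.html';
system('wget --no-parent --timestamping --convert-links --page-requisites' .
       ' --no-directories --no-host-directories -erobots=off ' .
       escapeshellarg($url), $exit);
// $exit receives wget's exit status: 0 means the page and its requisites were saved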
Is there a way to do this without reading and saving each and every link on the page?
Short answer: No.
Longer answer: if you want to save every page on a website, you're going to have to read every page on the website with something, at some level.
It's probably worth looking into the Linux app wget, which may do something like what you want.
One word of warning: sites often have links out to other sites, which have links to other sites, and so on. Make sure you put some kind of "stop if different domain" condition in your spider!
If you prefer an Objective-C solution, you could use the WebArchive class from Webkit.
It provides a public API that allows you to store whole web pages as .webarchive files (like Safari does when you save a webpage).
Some nice features of the webarchive format:
completely self-contained (incl. CSS, scripts, images)
QuickLook support
Easy to decompose
Whatever app is going to do the work (your code, or code that you find) is going to have to do exactly that: download a page, parse it for references to external resources and links to other pages, and then download all of that stuff. That's how the web works.
But rather than doing the heavy lifting yourself, why not check out curl and wget? They're standard on most Unix-like OSes, and do pretty much exactly what you want. For that matter, your browser probably does, too, at least on a single page basis (though it'd also be harder to schedule that).
I'm not sure if you need a programming solution to 'crawl websites' or personally need to save websites for offline viewing, but if it's the latter, there's a great app for Windows (Teleport Pro) and SiteCrawler for Mac.
You can use IDM (Internet Download Manager) for downloading full webpages; there's also HTTrack.