Don't think that I'm mad, I understand how php works!
That being said, I develop personal websites and I usually take advantage of PHP to avoid repetition during the development phase: nothing truly dynamic, only includes for the menus, a couple of foreach loops and the like.
When the development phase ends, I need to deliver the website to the client as plain HTML files. Is there a tool (a crawler?) that can do this for me instead of visiting each page and saving the interpreted HTML by hand?
You can use wget to recursively download all the linked pages.
You can read more about this here: http://en.wikipedia.org/wiki/Wget#Recursive_download
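For example, assuming the site is served locally at http://localhost/mysite/ (the URL is just a placeholder), a command along these lines will crawl it and save the rendered pages as .html files with links rewritten to point at the local copies; note that --adjust-extension needs a reasonably recent wget (older versions call it --html-extension):
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://localhost/mysite/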
If you need something more powerful than recursive wget, httrack works pretty well. http://www.httrack.com/
Pavuk offers much finer control than wget, and it will rewrite the URLs in the grabbed pages if required.
If you want to use a crawler, I would go for the mighty wget.
Otherwise you could also use some build tool like make.
You need to create a file named Makefile in the same folder as your php files.
It should contain this:
all: 1st_page.html 2nd_page.html 3rd_page.html
1st_page.html: 1st_page.php
	php 1st_page.php > 1st_page.html
2nd_page.html: 2nd_page.php
	php 2nd_page.php > 2nd_page.html
3rd_page.html: 3rd_page.php
	php 3rd_page.php > 3rd_page.html
Note that each php command must be preceded by a tab, not spaces.
(See this page for the php command line syntax.)
After that, whenever you want to update your html files just type
make
in your terminal to automatically generate them.
It may seem like a lot of work for a simple job, but make is a very handy tool that you will find useful for automating other tasks as well.
Maybe the command line will help?
If you're on Windows, you can use Free Download Manager to crawl a website.
Related
I'm trying to work out the best way to automate a series of steps in order to deploy a web app, and I haven't yet come up with a suitable solution. I would like to:
use Google's compiler.jar to minify my JS
use Yahoo's yui-compressor.jar to minify my CSS
access a file and change a string so that header files like "global.css?v=21" get served the correct version
deploy the app (sftp, mercurial or rsync?) omitting certain directories like "/userfiles"
Can you guys put me on the right track to solve this?
Thank you!
You may want to check out Phing (http://phing.info/); they are in the process of moving servers so it may be down this weekend, but it can do all of what you want and is written in PHP.
A quick Google search should bring up plenty of tutorials to get you started.
You can run php from the command line to do all sorts of fun things.
$ php script_name.php arg1 arg2
See: command line, argv, argc, exec
Running PHP from the command line is very fast. I've been doing this a lot lately for various automation tasks.
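As a rough sketch of that kind of automation, a single PHP build script could chain your deploy steps together; the jar locations, file names and rsync target below are only placeholder assumptions:
<?php
// build.php -- hypothetical deploy script; adjust every path to your project
// 1. Minify JS with Closure Compiler and CSS with YUI Compressor
exec('java -jar compiler.jar --js js/app.js --js_output_file js/app.min.js');
exec('java -jar yuicompressor.jar css/global.css -o css/global.min.css');
// 2. Bump the ?v= query string in the header so browsers fetch the new CSS
$version = time();
$header  = file_get_contents('header.php');
$header  = preg_replace('/global\.css\?v=\d+/', "global.css?v=$version", $header);
file_put_contents('header.php', $header);
// 3. Push everything except user uploads to the server
exec('rsync -az --exclude=/userfiles ./ user@example.com:/var/www/app/');
Run it with php build.php whenever you want to deploy.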
I generally run Python projects, so this may or may not be an option for you, but apart from writing your own scripts you could look into the following:
Fabric
Buildout
Maven
I need to make snapshots of web pages programmatically using PHP and get them into an HTML e-mail.
I tried wget --page-requisites. It downloads everything all right, but it doesn't change the HTML page's source code to point to the downloaded files rather than the online originals. Also, that HTML is of course a long way from being displayed properly in an HTML e-mail.
I am interested to know whether there are ready-made solutions for this. I would already be happy with a solution that takes an HTML snapshot and changes the HTML accordingly. Being able to e-mail it would be the icing on the cake.
I control the web pages being snapshotted, so I can adjust the content to optimize the results.
My server-side platform is PHP, but the settings are very liberal; I can execute things like wget and Perl scripts from within PHP. However, I do not have root access and cannot install additional packages or programs.
The task is to make a snapshot of a product page each time somebody places an order, so there is documentation about what the page looked like at the time.
wget has a -k (--convert-links) option, which will convert both links and references to embedded content (like images). See e.g. wget advanced use (also here).
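For example, a command along these lines should grab a single page with its images and stylesheets and rewrite the references to the local copies (the URL is just a placeholder):
wget --page-requisites --convert-links --no-directories http://example.com/product-page.html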
For the e-mail part of your question, I'm sure you can use one of the existing libraries. For example, PHP has a PEAR package (I don't remember the exact name) to handle HTML e-mails; I'm pretty sure both Perl and Python have something similar.
What you are trying to do here is website mirroring. Rather than wget, the simple solution is to use httrack, a command-line tool. It's very powerful and configurable; try it!
The httrack website presents a GUI, but you don't need it; everything is possible from the command line (or from PHP).
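For example, a command along these lines mirrors a site into a local folder (the URL and output path are placeholders; check the httrack documentation for the options you actually need):
httrack "http://example.com/" -O ./mirror "+example.com/*"
Here -O sets the output directory and the "+example.com/*" scan rule keeps the crawl on the original domain.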
I've bumped into a problem while working on a project. I want to "crawl" certain websites of interest and save them as "full web pages", including styles and images, in order to build a mirror of them. Several times I have bookmarked a website in order to read it later, and a few days afterwards the website was down because it got hacked and the owner didn't have a backup of the database.
Of course, I can read the files with PHP very easily with fopen("http://website.com", "r") or fsockopen(), but the main goal is to save the full web pages so that, in case a site goes down, it is still available to others, like a "programming time machine" :)
Is there a way to do this without reading and saving each and every link on the page?
Objective-C solutions are also welcome since I'm trying to figure out more of it also.
Thanks!
You actually need to parse the HTML and all the CSS files that are referenced, which is NOT easy. However, a fast way to do it is to use an external tool like wget. After installing wget, you could run the following from the command line:
wget --no-parent --timestamping --convert-links --page-requisites --no-directories --no-host-directories -erobots=off http://example.com/mypage.html
This will download mypage.html and all the linked CSS files, images, and the images referenced inside the CSS.
Once wget is installed on your system, you can use PHP's system() function to control wget programmatically.
NOTE: You need at least wget 1.12 to properly save images that are referenced through CSS files.
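Assuming wget is available on the server, a minimal sketch of driving it from PHP could look like this (the URL and snapshot directory are placeholders):
<?php
// snapshot.php -- hypothetical wrapper around the wget command above
$url = 'http://example.com/mypage.html';
$dir = '/path/to/snapshots/' . date('Ymd-His');
mkdir($dir, 0777, true);
chdir($dir);
system('wget --no-parent --timestamping --convert-links --page-requisites '
     . '--no-directories --no-host-directories -erobots=off ' . escapeshellarg($url));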
Is there a way to do this without reading and saving each and every link on the page?
Short answer: No.
Longer answer: if you want to save every page in a website, you're going to have to read every page in a website with something on some level.
It's probably worth looking into the Linux app wget, which may do something like what you want.
One word of warning: sites often have links out to other sites, which have links to other sites, and so on. Make sure you put some kind of "stop if different domain" condition in your spider!
If you prefer an Objective-C solution, you could use the WebArchive class from Webkit.
It provides a public API that allows you to store whole web pages as a .webarchive file (like Safari does when you save a web page).
Some nice features of the webarchive format:
completely self-contained (incl. CSS, scripts, images)
QuickLook support
Easy to decompose
Whatever app is going to do the work (your code, or code that you find) is going to have to do exactly that: download a page, parse it for references to external resources and links to other pages, and then download all of that stuff. That's how the web works.
But rather than doing the heavy lifting yourself, why not check out curl and wget? They're standard on most Unix-like OSes, and do pretty much exactly what you want. For that matter, your browser probably does, too, at least on a single page basis (though it'd also be harder to schedule that).
I'm not sure whether you need a programming solution to 'crawl websites' or just need to save websites for offline viewing yourself, but if it's the latter, there are great apps for that: Teleport Pro for Windows and SiteCrawler for Mac.
You can use IDM (Internet Download Manager) for downloading full web pages; there's also HTTrack.
I'm not much of a programmer; PHP is where I'm comfortable. Sometimes I find that I need to do things such as arranging or renaming files on a mass scale on my computer, and I think I could do this with PHP, but of course I can't just run the files as they are.
So I was curious: is there a way I could run PHP files as a kind of exe file?
EDIT: Fairly important point: I'm using Windows.
Just use php.exe (put it in your PATH) followed by the name of the php file you want to execute.
You should have a look at PHP-GTK.
It's not as bad as you put it. PHP can be a very good tool for string-related stuff like parsing, renaming, etc., especially if you already know PHP.
To use PHP as a script language on unixoid systems, add #!/path/to/php as the first line and set execute permissions. On Windows, you can simply associate the .php file extension with your PHP CLI exe so you can double-click the files, or launch a script with the "start" command in the Windows shell. But make sure you write your scripts with the current working directory in mind; it may be different from what you expect sometimes.
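For the mass-renaming use case, a short CLI script is often all you need. A minimal sketch, assuming you want to prefix every .jpg in the current directory with a date stamp (the pattern and naming scheme are just examples):
<?php
// rename_files.php -- run with: php rename_files.php
foreach (glob('*.jpg') as $file) {
    $new = date('Y-m-d') . '_' . $file;
    if (!rename($file, $new)) {
        echo "Could not rename $file\n";
    }
}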
To be able to execute php files with a double click, just like normal programs, go to the command line and type:
ftype php_script "C:\path\to\php.exe" "%1"
assoc .php=php_script
Check out WinBinder
Sure, just add #!/path/to/php to the top of the file, put the code inside PHP tags, and run it as a shell script.
Works fine; the php binary you use is either the CGI one or the purpose-built CLI version.
http://www.php-cli.com/
http://php.net/manual/en/features.commandline.php
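For example, on a unixoid system a file like the following can be made executable with chmod +x and run directly; the interpreter path is an assumption, so adjust it to wherever your php binary lives:
#!/usr/bin/php
<?php
// hello.php -- runs straight from the shell once it is executable
echo "Hello from the PHP command line\n";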
It would appear so, yes.
Download WampServer and install it. Once that's done, add the path to php.exe to your PATH settings. You can do this by going to Control Panel -> System -> Change settings -> Advanced -> Environment Variables. Edit PATH, add a ';' to the end of the line and then paste in the path to php.exe. This is on Vista; it might be different on XP or Windows 7.
My path looks like this afterwards: C:\Sun\SDK\jdk\bin;C:\wamp\bin\php\php5.3.0
Once that's done, you'll be able to execute a php file from the command line. You could create shortcuts too.
C:\Users\Garth Michel>php test.php
This is a test
C:\Users\Garth Michel>
I used php for years as a scripting language before I even bothered to use it as a web programming language.
Maybe PHP is not the right tool for doing this and it's about time to learn another language... use this chance to expand your programming horizons.
See the .reg file in this gist; it makes it possible to use .php files exactly like .bat files, e.g. my_script.php foo bar baz
Don't forget to edit paths to suit your setup.
http://www.appcelerator.com/products/download/ lets you still use HTML & CSS as a desktop app, and it now has support for PHP.
PHP-based web apps like WordPress and MediaWiki use PHP to set up and configure themselves, I think. Just give IIS proper read/write rights and you can make a simple web app that does mass renaming, etc. PHP doesn't always have to be used for writing out HTML.
See ExeOutput for PHP. It has excellent customization and magnificent features, yet it is not an IDE or anything like that; it compiles your PHP + HTML into fully fledged EXEs.