In my project I have to do some screen scraping. The source pages return data after executing the JavaScript embedded within them. In my PHP script I fetch the page using file_get_contents() and, as usual, it returns the page simply as text. My question is: is there a way to get the final output from the web page (the output after the JavaScript has executed)?
I know some of you might suggest embedding a web browser and using that to execute the page. But how do I do that? Is there an embeddable browser available? Or are there non-GUI executables of open-source browsers such as Chromium, so that I could run one as a CGI script or something similar?
You will need some real browser-like client for this; PHP alone won't cut it. For automation purposes you will most likely want a "headless" (without a GUI) browser like PhantomJS (the new hotness). Check out this answer.
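As a rough sketch of the PhantomJS route (assuming phantomjs is on your PATH; example.com is a placeholder URL), you can save a small PhantomJS script and invoke it from the shell:

```shell
# Write a minimal PhantomJS script that loads a page, lets its
# embedded JavaScript run, and prints the final rendered HTML.
cat > render.js <<'EOF'
var page = require('webpage').create();
page.open('http://example.com/', function (status) {
    if (status === 'success') {
        console.log(page.content);  // the DOM after JS execution
    }
    phantom.exit();
});
EOF

# Run it only if phantomjs is actually installed.
if command -v phantomjs >/dev/null 2>&1; then
    phantomjs render.js > rendered.html
else
    echo "phantomjs not installed"
fi
```

From PHP you could then do something like `$html = shell_exec('phantomjs render.js');` and parse the result instead of the raw file_get_contents() output.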
Related
Whenever I have to test a design change on an HTML page that's written within a PHP file, I have to go to the effort of first uploading it to the server and then viewing it through the browser. Normally, if it were a plain HTML file, you'd just refresh the browser and see the changes locally. Of course, it's not possible to execute PHP in a browser unless you use XAMPP or similar, but I consider that too much for simple debugging. Isn't there a way to open a PHP file as just HTML, ignoring all the PHP code, for debugging purposes? If not, I'll just go with XAMPP. Thanks in advance.
PHP has a built-in web server; you can utilize that, I guess.
php -S localhost:8080
If the script does not require the HTTP stack (which I doubt), you could also do:
php index.php > index.html
Update:
After you have executed the command above (php -S), simply type the same URL into your browser's address bar.
I want to write a program that contacts a PHP script online and shows the HTML output. Basically, it will be a browser that only accesses the programmed URL.
Similar to Seva's answer, but a little better integrated with the Windows GUI: I would put together a simple HTA that uses JavaScript to load the required page into an IFRAME. HTAs are executable and provide a simple GUI that is basically IE without all of the controls.
A Google search for "Microsoft HTA" should get you going. Since you clearly already know HTML, putting an HTA together will be easy, as it is just HTML plus some scripting in either JavaScript or VBScript.
http://technet.microsoft.com/en-us/scriptcenter/dd742317.aspx
You don't need a program. Just execute the URL as a file. On Windows, that'll open the browser. If you need to ship at least something, ship a BAT file with the following contents:
@start http://www.mysite.com/
The built-in start command means "find the right executable for the specified parameter and call it, passing the parameter to it". The default browser is the one registered to handle the http scheme.
Or you can ship a URL shortcut. To make one, open the site in IE and drag from the address bar to the desktop. That'll give you a file with a .URL extension that you can "execute", either from the shell or from the command line.
It doesn't matter whether the server runs PHP, Python, or anything else, because the scripts deliver their output as HTML.
What you need is some kind of HTTP client library, like libcurl.
There are lots of web pages that simply run a script without having any other content on them.
Is there any way of seeing the page source without actually visiting the page, since it just redirects you?
Will using an HTML parser work for this? I'm using simpleHTMLdom to parse the page.
In Firefox you can use the view-source protocol to view only the source code of a site without actually rendering it or executing any JavaScript on it.
Example: view-source:http://stackoverflow.com/q/5781021/298479 (copy it to your address bar)
Yes, simply parsing the HTML will get you the client-side (JavaScript) code.
When these pages are accessed through a browser, the browser runs the code and redirects you, but when you access the page with a scraper or your own program, the code is not run, and the static source can be obtained.
Of course you can't access the server-side code (PHP). That's impossible.
If you need a quick & dirty fix, you could disable JavaScript and meta redirects (Internet Explorer can disable these in the Internet Options dialog; Firefox can use the NoScript add-on for the same effect).
This won't stop any server-side redirects, but it will prevent client-side redirects and allow you to see the document's HTML source.
The only way to get the page's HTML source is to send an HTTP request to the web server and receive the answer, which is equivalent to visiting the page.
If you're on a *nix based operating system, try using curl from the terminal.
curl http://www.google.com
wget or lynx will also work well if you have access to a Linux command-line shell:
wget http://myurl
lynx -dump http://myurl
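Note that a command-line client like curl never executes JavaScript, and it does not follow server-side HTTP redirects unless you explicitly pass -L, so this is exactly the "see the source without being redirected" behavior asked about. A minimal sketch (fetch_source is a hypothetical helper name):

```shell
# Fetch the raw HTML of a page as the server sends it:
# no JavaScript is executed, and without -L curl does not
# follow HTTP 3xx redirects either.
fetch_source() {
    curl -sS "$1"    # -s: silent progress, -S: still report errors
}

# Usage (placeholder URL):
# fetch_source "http://example.com/redirecting-page.html" > source.html
```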
If you are trying to HTML-scrape the contents of a page that builds 90%+ of its content/view by executing JavaScript, you are going to run into issues unless you render it to a (hidden) screen and then scrape that. Otherwise you'll end up scraping a few script tags, which does you little good.
E.g., if I try to scrape my Gmail inbox page, it is an empty HTML page with just a few scattered script tags (likely typical of almost all GWT-based apps).
Does the page/site you are scraping have an API? If not, is it worth asking them if they have one in the works?
Typically these kinds of tools walk a fine line between "stealing" information and "sharing" information, so you may need to tread lightly.
I'm developing a Web Application and I got stuck at this:
I want to create a simple tag that triggers the execution of a local program, such as gedit, Mozilla Firefox, etc.
My project is based on HTML, JavaScript, and PHP.
I'm aware that JavaScript doesn't allow this kind of execution, but perhaps PHP does?
Thank you!
PHP code runs on the server; it has no influence over what happens in the web browser. All it can do is generate HTML and JavaScript for the web browser to process; it can't take any action on the client machine directly, so no, there's no way to do this.
You can do that with the exec() function, but the program will be executed on the server, not on the client side.
I have a server with a PHP script that pulls data from a source and populates a database. I need to call this PHP script repeatedly, each time with a different parameter.
I need to create a shell script on a Mac (which reads in a text file with the list of parameters - that part is not a problem) and for each parameter, runs the PHP script/URL in a web browser.
The PHP is on a remote server, so I need to load a web browser instance (Safari or Firefox) and instruct it to load the URL (i.e. something like http://myserver.com/scriptname.php?param1). Then I need to wait for it to complete and trigger the same URL with the next parameter.
I don't know the incantation to launch a web browser with a URL (I'm a former Windows dev, not a Mac OS X pro, yet). I also don't think there is a way to detect when the script completes, and I don't want to end up with 100 instances of the browser running simultaneously.
Any help would be greatly appreciated!
If you just need to hit a PHP page on a remote server, don't use a browser. Use curl, or some equivalent.
curl "http://myserver.com/scriptname.php?param1"
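A sketch of the full loop the asker describes: read parameters from a text file and hit the URL once per parameter. Because curl blocks until each request completes, the requests run strictly one after another, with no pile-up of browser windows. The file name params.txt and the URL are placeholders, and FETCH defaults to echo here as a dry run; swap in `curl -sS` for real requests:

```shell
# Build a sample parameter file (one parameter per line).
printf 'alpha\nbeta\ngamma\n' > params.txt

# The fetch command; replace echo with: curl -sS  to perform real requests.
FETCH="echo"

: > results.txt                       # start with an empty results file
while IFS= read -r param; do
    [ -n "$param" ] || continue       # skip blank lines
    # Quoting the URL keeps the shell from treating ? as a glob character.
    $FETCH "http://myserver.com/scriptname.php?param=$param" >> results.txt
done < params.txt
```

Each iteration waits for the previous request to finish, which is exactly the "wait for it to complete" behavior that a browser window wouldn't give you.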
open -a Safari http://stackoverflow.com