PHP and server-side JavaScript

I am dealing with a problem where I need to do a few things on the server side using JavaScript (I am using a PHP + Apache combination):
read the source of a URL using cURL;
run it through some server-side JavaScript and get a DOM out of it;
traverse and parse the DOM using pre-existing JavaScript code. This code works fine in a browser.
I googled and found http://pecl.php.net/package/spidermonkey, which allows us to run JavaScript on the server. Is there any better way to achieve this? Can we use the Mozilla engine to get a DOM out of HTML source code and process it using JavaScript?
Thanks in advance

You can check Jaxer.org, where you tell your JavaScript where to run (on the server or in the browser).
Hope it helps, Sinan.

PHP contains a DOM parser (the DOM extension's DOMDocument class, which can read HTML with loadHTML()). I would recommend using it to achieve the same results, rather than using server-side JavaScript.

You might want to use something other than JavaScript, but if you really need this, you can run Firefox under Xvfb and connect to it remotely from PHP. It's not exactly trivial to set up, but it's possible.
You might want to try with something like SimpleBrowser instead.

You might want to try installing GromJS, but success depends on the complexity of your JS code. As far as I can see, GromJS does not have a DOM :(
Narwhal, a much more complex project, does have a DOM and a lot more.
For more information, refer to the Mozilla hub about ServerJS.

Related

cURL PHP - load a full page

I am currently trying to load an HTML page via cURL. I can retrieve the HTML content, but part of it is loaded later by a script (an AJAX POST), and I cannot retrieve that part (it is a table).
Is it possible to load the page entirely?
Thank you for your answers.
No, you cannot do this.
CURL does nothing more than download a file from a URL. It doesn't care whether it's HTML, JavaScript, an image, a spreadsheet, or any other arbitrary data; it just downloads. It doesn't run anything or parse anything or display anything, it just downloads.
You are asking for something more than that. You need to download, parse the result as HTML, then run some Javascript that downloads something else, then run more Javascript that parses that result into more HTML and inserts it into the original HTML.
What you're basically looking for is a full-blown web browser, not CURL.
Since your goal involves "running some JavaScript code", it should be fairly clear that it is not achievable without having a JavaScript interpreter available. This means that it is obviously not going to work inside of a PHP program (*). You're going to need to move beyond PHP. You're going to need a browser.
The solution I'd suggest is to use a very specialised browser called PhantomJS. This is actually a full Webkit browser, but without a user interface. It's specifically designed for automated testing of websites and other similar tasks. Your requirement fits it pretty well: write a script to get PhantomJS to open your URL, wait for the table to finish rendering, and grab the finished HTML code.
You'll need to install PhantomJS on your server, and then use a library like this one to control it from your PHP code.
I hope that helps.
(*) yes, I'm aware of the PHP extension that provides a JS interpreter inside of PHP, and it would provide a way to solve the problem, but it's experimental, unfinished, would be still difficult to implement as a solution, and I don't think it's a particularly good idea anyway, so let's not consider it for the purposes of this answer.
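The PhantomJS approach hinges on "wait for the table to finish rendering". A minimal sketch of that waiting step, assuming the headless browser lets you evaluate a predicate against the page, is a generic polling helper. The name `waitFor` and its options are illustrative, not part of PhantomJS's API.

```javascript
// Hypothetical helper: call `predicate` every `intervalMs` until it returns
// a truthy value, or reject once `timeoutMs` has elapsed.
function waitFor(predicate, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const start = Date.now();
  return new Promise((resolve, reject) => {
    const tick = () => {
      let ok;
      try {
        ok = predicate();
      } catch (err) {
        reject(err); // a broken predicate should fail fast
        return;
      }
      if (ok) {
        resolve(Date.now() - start); // elapsed milliseconds
      } else if (Date.now() - start >= timeoutMs) {
        reject(new Error("waitFor: timed out after " + timeoutMs + " ms"));
      } else {
        setTimeout(tick, intervalMs); // try again later
      }
    };
    tick();
  });
}
```

Inside the page, the predicate might be `() => document.querySelectorAll("#results tr").length > 0` (hypothetical selector); once it resolves, grab the finished HTML. Older PhantomJS releases only understand ES5, so there the same loop would be written with `function` expressions and a callback instead of a Promise.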
No, the only way you can do that is to make a separate cURL request to the AJAX endpoint and put the two results together afterwards.
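Putting the two results together can be as simple as splicing the AJAX response into the placeholder element of the first page. A minimal string-based sketch, assuming the page contains an empty container such as `<div id="results"></div>` that the script normally fills; `mergeFragment` and the id are hypothetical. For anything less regular, parse the HTML with a real DOM parser instead of a regular expression.

```javascript
// Hypothetical helper: put `fragmentHtml` (the AJAX response) inside the first
// element with id="containerId" in `pageHtml` (the first cURL response).
// Assumes that element is empty (or whitespace only) in the initial HTML.
function mergeFragment(pageHtml, containerId, fragmentHtml) {
  // Escape regex metacharacters in the id.
  const id = containerId.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Groups: 1 = opening tag, 2 = tag name, 3 = whitespace, 4 = closing tag.
  const pattern = new RegExp(
    `(<([a-zA-Z][\\w-]*)[^>]*\\sid=["']${id}["'][^>]*>)(\\s*)(</\\2>)`
  );
  if (!pattern.test(pageHtml)) {
    throw new Error("mergeFragment: no empty element with id " + containerId);
  }
  // A replacer function keeps any "$" sequences in the fragment literal.
  return pageHtml.replace(pattern, (match, open, tag, ws, close) => open + fragmentHtml + close);
}
```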

How to output DOM after all javascript loaded?

As the title says, my question is how to output all DOM content on a page (let's say, save it as a text file on the server or pass the result to some other PHP function using AJAX).
I did some homework and found that curl can output the page's HTML source using "curl http://google.ca > dom.txt".
However, this approach will not save content that JavaScript generates; in other words, the JavaScript code will not run.
Another approach is to embed some JavaScript code into a page, let the page load the website we want to output, then use the JavaScript code to save the whole DOM after everything is loaded.
I am not sure whether PhantomJS can do such a job; if it can, how?
Can anybody give a detailed answer on how to achieve this?
I am open to any solution; this program will run on my server to provide a service.
Thank you in advance.
Why not:
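jQuery(document).ready(function($) {
    // Runs once the DOM is parsed; content the page loads later
    // via AJAX may not be present yet.
    $.post(
        '/your_filename.php',
        // Passing an object makes jQuery URL-encode the markup, so characters
        // such as & and + survive. PHP reads it from $_POST['html'].
        { html: $("html").html() },
        function(response) {
            alert(response);
        }
    );
});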
You can get the contents of the HTML element (including both head and body) using document.documentElement.innerHTML. If you need everything, you can concatenate the doctype with document.documentElement.outerHTML; note that document.doctype is a node, not a string, so it has to be serialized first.
Note that outerHTML wasn't fully cross-browser when this was written (it worked in IE and Chrome, but not Firefox; Firefox has supported it since version 11). For a way to simulate outerHTML in older Firefox versions, see this question: How do I do OuterHTML in firefox?
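Because `document.doctype` is a `DocumentType` node, concatenating it directly yields `[object DocumentType]`. A small sketch that rebuilds the declaration from the node's `name`, `publicId` and `systemId` properties; the helper names are illustrative. In current browsers, `new XMLSerializer().serializeToString(document.doctype)` should give the same result.

```javascript
// Rebuild "<!DOCTYPE ...>" from a DocumentType-like object.
function doctypeToString(doctype) {
  if (!doctype) return ""; // the page may have no doctype at all
  let s = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) s += ' PUBLIC "' + doctype.publicId + '"';
  else if (doctype.systemId) s += " SYSTEM";
  if (doctype.systemId) s += ' "' + doctype.systemId + '"';
  return s + ">";
}

// In a browser, call fullPageHtml(document) to get the page as currently rendered.
function fullPageHtml(doc) {
  return doctypeToString(doc.doctype) + "\n" + doc.documentElement.outerHTML;
}
```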
JavaScript is a client-side language, so running it on a server requires specialized technology. PHP does have the ability to work with the DOM: it can build and modify DOM elements before transmitting the page to the client; read more about that here.
I'm not really sure what you are trying to accomplish by doing this, but it sounds like you are trying too hard: you are sending code to the client so that the client can turn around and send code back to the server so that the server can save it as a file? Although if that is what you need to do, follow Brilliand's and iambriansreed's advice to scoop up dom elements with Javascript/jQuery.

Language for web scraping JavaScript-generated content

I think the title asks the question. I usually use PHP for parsing and web scraping, but I have a really bad time scraping JavaScript content; in most cases I can't do it.
Example: parse a div that appears when some JavaScript is executed.
I read that Ruby has a parser library for JavaScript, so the question is: which language should I use to write a web scraper that will effectively scrape JavaScript-generated content? Is there a library for PHP like the Ruby one for parsing JavaScript content?
There are a handful of strategies for this. Depending on your needs, consider programmatically instantiating a browser instance that you can hook into and read the page from.
The idea is, let the browser do the work, as the page is made for a browser and not your bot. You can then tap in and scrape away using a browser plugin that feeds data to your primary application running things.
This may be way overkill for what you need though. I'll leave it up to you to decide.
You should look at some GUI-less (headless) browsers. There are some written in Java; I didn't find one for PHP.
Look at :
HTMLUnit
Golf
You can try using something like Selenium, which allows you to automate browser tasks.
On the other hand, you can go into details on what happens when the js code is executed. For example, if the js code is requesting something from the server by POSTing some data, you could emulate that in the regular fashion.
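Emulating the script's request usually means replaying the same POST with the same form fields. A minimal Node.js sketch (Node 18+ for the global `fetch`), assuming the URL and field names are copied from the request the page makes, as seen in the browser's developer tools; `buildAjaxRequest` and `replayAjax` are hypothetical names.

```javascript
// Build fetch() options that mimic a typical jQuery form-encoded POST.
function buildAjaxRequest(fields) {
  return {
    method: "POST",
    headers: {
      // jQuery adds this header to same-origin AJAX calls; some servers check it.
      "X-Requested-With": "XMLHttpRequest",
      "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    },
    body: new URLSearchParams(fields).toString(), // e.g. "page=2&q=a+b"
  };
}

// Send the request and return the response body (often an HTML fragment or JSON).
async function replayAjax(url, fields) {
  const res = await fetch(url, buildAjaxRequest(fields));
  if (!res.ok) throw new Error("HTTP " + res.status);
  return res.text();
}
```

The same request can be sent from PHP with cURL by setting the POST fields and headers; the point is only that no JavaScript needs to run.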
You should look at PhantomJS and CasperJS (headless browsers).
In the Ruby world, the gem for driving PhantomJS would be Poltergeist.
There is another article about some of the options you have in Ruby here too (however, not all of them are JS-capable).

Can I execute js files via php?

The situation is as follows:
I have a PHP file which parses a web page. On that web page is a phone number whose digits are shuffled. The only way to put each digit in the correct place is to use some JS functions (on the client side). So when I execute that PHP file in the Linux console, it gives me everything I need except the JS functions' result (no wonder: JavaScript is not a server-side language). All I see of the JS is its source code.
The question: can I execute JS files via PHP, and if so, how?
Results of a quick Google search (terms: javascript engine php):
J4P5: not developed since 2005, according to its news page (a bad sign)
PECL package spidermonkey
A 2008 post by jeresig points to PHPJS, but I can't see when it was last updated.
I'm sure you'll find many more links on that google search.
Alternatively:
You say that the digits are scrambled and you need to "unscramble" them using JS. Can you code that unscrambling logic into a PHP function and just use it? That will surely save you a lot of trouble, but if learning to use JS in PHP is what you're after, then it's a whole different story...
Zend tutorial: "Using javascript in PHP with PECL and spidermonkey"?
http://en.wikipedia.org/wiki/Comparison_of_Server-side_JavaScript_solutions
Alternatively, PHP has simple functions for executing other programs and retrieving their output, such as exec(), shell_exec() and proc_open() (I used this along with GPG once to create a PHP file manager which could live-encrypt files as you were uploading them and live-decrypt them as you were downloading them).
Using those functions along with http://code.google.com/p/v8/ you should be able to interpret any standalone JavaScript, that is, code that does not depend on a browser DOM.
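The "run an external interpreter and read its output" pattern looks like this when sketched in Node.js; PHP's exec() or proc_open() play the same role. Here the node binary itself (`process.execPath`) stands in for a command-line V8 shell, which is an assumption; any standalone JS engine with a similar `-e` option would do.

```javascript
const { execFileSync } = require("child_process");

// Run a snippet of standalone JavaScript in a separate interpreter process
// and return what it prints. Arguments are passed as an array, so no shell
// quoting is involved (in PHP, escape them with escapeshellarg()).
function runExternalJs(source, { timeoutMs = 5000 } = {}) {
  return execFileSync(process.execPath, ["-e", source], {
    encoding: "utf8",
    timeout: timeoutMs, // kill runaway scripts
    stdio: ["ignore", "pipe", "pipe"], // capture stdout and stderr
  }).trim();
}
```

Code that touches `document` or `window` fails in such a process (non-zero exit, so `execFileSync` throws), which is why this only works for DOM-free logic.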
Not unless you know someone who's implemented a Javascript engine in PHP.
In other words, probably not.
Without some sort of browser emulation or passing the unparsed js off to a server side implementation of javascript (maybe node.js?), you won't be able to execute it.
However, does the page use the same JS function to unscramble the phone number every time? If so, you should be able to read the scrambled digits and reorder them with PHP.
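Porting such logic is usually a few lines. As an illustration only, assume a hypothetical scheme where the page ships the digits in scrambled order plus an index array saying where each digit belongs; the real rule is whatever the page's JS does. It is shown in JavaScript here, but the loop translates line for line into a PHP function.

```javascript
// Hypothetical scheme: digit i of `scrambled` belongs at position order[i].
function unscramble(scrambled, order) {
  if (scrambled.length !== order.length) throw new Error("length mismatch");
  const out = new Array(scrambled.length);
  for (let i = 0; i < scrambled.length; i++) {
    out[order[i]] = scrambled[i]; // move each digit to its real position
  }
  return out.join("");
}
```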
If you're prepared to do a bit of work building your own JS runtime to work with it, Tim Whitlock has written a JavaScript tokenizer and parser in pure PHP.
Node.js is server-side, but it is pure JS :) There is no PHP in it, so I don't think it answers your needs...
Anyway, here is an example: a chat in JS on both the client and the server side: http://chat.nodejs.org/
Plus, not every host allows you to use the v8 engine...
If you have Javascript data objects, and you need to convert them to/from PHP arrays, that's quite easy using PHP's json_encode() and json_decode() functions.
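Only data survives that conversion, not behaviour. A small illustration on the JavaScript side: `JSON.stringify` drops `undefined` values and functions, and PHP's `json_decode($json, true)` would turn the resulting string into an associative array. The object contents are made up for the example.

```javascript
// A plain data object as it might appear in page JavaScript.
const payload = {
  phone: "555-0100",
  tags: ["a", "b"],
  active: true,
  note: undefined, // dropped by JSON.stringify
  format() { return this.phone; }, // functions are dropped too
};

const json = JSON.stringify(payload);
// json === '{"phone":"555-0100","tags":["a","b"],"active":true}'
const back = JSON.parse(json); // plain data again, no format() method
```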
But actually running JavaScript code? No. You can't. You might be able to find a JS interpreter written in PHP (a few other answers have pointed to links that may or may not help you here), or more likely execute the JS using a stand-alone JS interpreter on your server which you call out to from PHP. However, if the JS code includes references to the browser's DOM (which is highly likely), that's a whole other set of issues which will almost certainly make it impossible.
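The DOM problem is easy to see with a stand-alone engine. A minimal Node.js sketch using the built-in `vm` module: plain logic evaluates fine in a fresh context, while anything touching `document` throws a ReferenceError. `evalWithoutDom` is a hypothetical name, and `vm` is not a security sandbox, so it is unsuitable for untrusted code on a production server.

```javascript
const vm = require("vm");

// Evaluate scraped JavaScript in a fresh context with no browser globals.
// `globals` lets you inject plain values the script expects.
function evalWithoutDom(code, globals = {}) {
  const context = vm.createContext({ ...globals });
  return vm.runInContext(code, context, { timeout: 1000 }); // stop infinite loops
}
```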
Given the way you describe the question, I'd say the easiest solution for you would just be to re-implement the JS code as PHP code; it's unlikely that all the workarounds being suggested would be appropriate for what sounds like a fairly simple bit of utility code.

Convert HTML page to an image

I want to convert my HTML page to an image. Is there a way in PHP to convert or save an HTML page as an image?
This is not easy; as NullUserException says in his comment, you would need to render the HTML page on the server side, which is not something PHP (or any other server-side language) has built in.
The approach that comes to mind would be to write a program (probably not in PHP, but rather something like C# or C++) that runs on your server, fires up a web browser, and does a series of screen captures (possibly combined with page scrolls). As this is a very nontrivial and bug-prone process, I would suggest looking into third-party components that are capable of doing this.
You would then execute this program from PHP, and when it's done running, display the results from the file it output.
I would advise you to use an external service with an api. This list might be a good start: http://blogs.sitepoint.com/2008/07/10/9-ways-to-put-site-screenshots-in-your-web-app/
Thumbalizr seems great; they also provide a PHP script so you can cache the images locally:
http://www.thumbalizr.com/apitools.php
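Caching locally mostly means mapping each page URL to a stable file name and re-fetching only when the file is missing or stale. A minimal Node.js sketch under that assumption; the helper names, the cache directory and the `.png` extension are hypothetical, and the actual download from the screenshot service is left out.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

// Stable file name per URL: SHA-1 of the URL plus an extension.
function cachePathFor(pageUrl, cacheDir, ext = ".png") {
  const key = crypto.createHash("sha1").update(pageUrl).digest("hex");
  return path.join(cacheDir, key + ext);
}

// True if the cached file exists and is younger than maxAgeMs.
function isFresh(file, maxAgeMs) {
  try {
    return Date.now() - fs.statSync(file).mtimeMs < maxAgeMs;
  } catch (err) {
    if (err.code === "ENOENT") return false; // not cached yet
    throw err;
  }
}
```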
Try taking a look at browsershots.org - source code is available for it if you want to install it locally. Essentially it uses a browser to take screenshots, and can be controlled via an XML-RPC interface, which you can call from PHP.
As others have said this is not a simple job, and not something you can do directly in PHP, so use an external service.
(I'm not affiliated with browsershots.org in any way)
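An XML-RPC call is just an HTTP POST (Content-Type `text/xml`) whose body is a `<methodCall>` document, so it is easy to issue from PHP or anything else. A minimal sketch of building such a body with string parameters; the method name in the usage below is hypothetical, so check browsershots' own documentation for the real method names and parameters.

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build an XML-RPC <methodCall> body with string parameters only.
function xmlRpcCall(methodName, stringParams) {
  const params = stringParams
    .map((p) => "<param><value><string>" + escapeXml(p) + "</string></value></param>")
    .join("");
  return '<?xml version="1.0"?>' +
    "<methodCall><methodName>" + escapeXml(methodName) + "</methodName>" +
    "<params>" + params + "</params></methodCall>";
}

// Usage (hypothetical method name):
// xmlRpcCall("hypothetical.requestScreenshot", ["http://example.com/"])
```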
