I'm generating some content through an API, accessed by JavaScript, and I cannot grab the source code of what is plainly displayed, post-load, in the browser. I can highlight the text and view the source of the selected text (a Firefox feature), but I will be using cURL from PHP to capture the data automatically... How can I capture the data? Is there a way to update the source (maybe through a DOM update) so it shows up somehow? Any help is appreciated.
You can't just request some HTML source and expect the changes that JS makes to it to be in place without actually running the JS. So if you want to get the content in PHP, you will have to either:
Push the HTML through something that will execute the JavaScript (if I were using Perl, I'd probably look to WWW::Mechanize::Firefox, which uses MozRepl; I don't know whether PHP has a similarly nice API for it), or
Reverse engineer the JavaScript and do whatever it does to get the data yourself.
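For the second option, the JavaScript often turns out to be a plain AJAX POST. Below is a minimal Python sketch of rebuilding that request (Python is used only for illustration; in PHP the same thing is a cURL POST with CURLOPT_POSTFIELDS). The endpoint URL and form fields are assumptions: copy the real ones from the request the page makes. The X-Requested-With header is included only because many endpoints of that era check it.

```python
import json
import urllib.parse
import urllib.request


def build_ajax_request(url: str, fields: dict) -> urllib.request.Request:
    """Build a POST request that mimics the page's own AJAX call."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Assumption: many jQuery-style endpoints check this header.
            "X-Requested-With": "XMLHttpRequest",
        },
    )


def fetch_json(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON body the endpoint returns."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `fetch_json(build_ajax_request("https://example.com/ajax/table", {"page": 1}))`, where the URL and the `page` field are hypothetical.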
You can pull up the page source using Google Chrome from within developer tools (wrench in the top right -> Tools -> Developer tools, or Control+Shift+I (that's an uppercase i)). The source code shown in the developer tools represents the up-to-date source code of the page, including things that were generated dynamically by JavaScript after the page initially loads.
I'm sure other browsers have similar capabilities, I just happen to know Chrome's method off the top of my head.
If your development environment is Linux/Unix, you could incorporate PhantomJS, a very nifty tool that executes the JavaScript and returns the output. The way I would recommend doing this is with a shell_exec() call in which you run the PhantomJS CLI.
Hope this helps.
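A hedged Python sketch of the shell_exec() idea: run the renderer as a subprocess and capture what it prints. It assumes a PhantomJS script of your own (called render.js here, a hypothetical name) that loads the URL and prints the final DOM. Passing the arguments as a list sidesteps the shell-quoting problems an unescaped URL would cause with shell_exec() (in PHP, escapeshellarg() covers that).

```python
import subprocess
import sys


def render_with_headless(command: list, url: str, timeout: float = 30.0) -> str:
    """Run a headless-browser CLI on `url` and return what it prints to stdout.

    `command` is e.g. ["phantomjs", "render.js"], where render.js is a
    hypothetical script that opens the URL and prints the rendered DOM.
    """
    try:
        result = subprocess.run(
            command + [url],  # list form: the URL is never parsed by a shell
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # raise if the renderer exits with a non-zero status
        )
    except subprocess.CalledProcessError as exc:
        # Surface the renderer's own error output before re-raising.
        sys.stderr.write(exc.stderr or "")
        raise
    return result.stdout
```

The timeout matters: a page that never finishes loading would otherwise hang the calling script indefinitely.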
Related
I am currently trying to load an HTML page via cURL. I can retrieve the HTML content, but part of it is loaded later via scripting (an AJAX POST), and I cannot retrieve that part (it is a table).
Is it possible to load a page entirely?
Thank you for your answers
No, you cannot do this.
cURL does nothing more than download a file from a URL: it doesn't care whether it's HTML, JavaScript, an image, a spreadsheet, or any other arbitrary data; it just downloads. It doesn't run anything, parse anything, or display anything; it just downloads.
You are asking for something more than that. You need to download, parse the result as HTML, then run some Javascript that downloads something else, then run more Javascript that parses that result into more HTML and inserts it into the original HTML.
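One quick way to confirm this is to count the table rows in what the download actually contains. The sketch below uses Python's standard html.parser (an illustrative choice; PHP's DOMDocument could do the same), and the sample page is made up to resemble a table that a script fills in later.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> start tags in an HTML document; no scripts are executed."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1


def count_rows(html: str) -> int:
    parser = RowCounter()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up example of a raw download: the table is empty until a browser runs
# the script, so a plain HTTP client sees zero rows.
RAW_DOWNLOAD = """
<table id="results"><tbody></tbody></table>
<script>
  fetch('/ajax/rows').then(r => r.text())
    .then(html => { document.querySelector('#results tbody').innerHTML = html; });
</script>
"""
```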
What you're basically looking for is a full-blown web browser, not cURL.
Since your goal involves running some JavaScript code, it should be fairly clear that it is not achievable without a JavaScript interpreter. This means that it is obviously not going to work inside a PHP program (*). You're going to need to move beyond PHP. You're going to need a browser.
The solution I'd suggest is to use a very specialised browser called PhantomJS. This is actually a full WebKit browser, but without a user interface. It's specifically designed for automated testing of websites and other similar tasks. Your requirement fits it pretty well: write a script to get PhantomJS to open your URL, wait for the table to finish rendering, and grab the finished HTML code.
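The "wait for the table to finish rendering" step is usually a polling loop. The Python helper below is a generic sketch of that idea, not PhantomJS's API: `check` is an assumed callback that reads the current DOM and returns the table HTML once it has rows.

```python
import time
from typing import Callable, Optional


def wait_for(check: Callable[[], Optional[str]], timeout: float = 10.0,
             interval: float = 0.25) -> str:
    """Poll `check` until it returns a non-empty result or `timeout` expires.

    In a headless-browser script, `check` would return the table's HTML once
    rows exist, and None or "" while the AJAX request is still in flight.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

Polling with a deadline is a deliberate choice: a fixed sleep either wastes time on fast pages or grabs the HTML too early on slow ones.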
You'll need to install PhantomJS on your server, and then use a library like this one to control it from your PHP code.
I hope that helps.
(*) yes, I'm aware of the PHP extension that provides a JS interpreter inside of PHP, and it would provide a way to solve the problem, but it's experimental, unfinished, would be still difficult to implement as a solution, and I don't think it's a particularly good idea anyway, so let's not consider it for the purposes of this answer.
No, the only way you can do that is to make a separate cURL request to the AJAX endpoint and put the two results together afterwards.
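A Python sketch of the "put the two results together" step, assuming the page ships an empty container (here a hypothetical `<tbody id="rows">`) that the AJAX response would normally fill. Both HTML strings would come from your two separate requests.

```python
import re


def inject_fragment(page_html: str, container_id: str, fragment: str) -> str:
    """Put `fragment` inside the first empty element whose id is container_id.

    A regex is acceptable for this narrow, predictable case (an empty
    container); use a real HTML parser for anything less regular.
    """
    pattern = re.compile(
        r'(<(\w+)[^>]*\sid=["\']' + re.escape(container_id)
        + r'["\'][^>]*>)(\s*)(</\2>)'
    )
    # A function replacement avoids backslash escapes in `fragment`.
    new_html, count = pattern.subn(
        lambda m: m.group(1) + fragment + m.group(4), page_html, count=1
    )
    if count == 0:
        raise ValueError(f"no empty element with id={container_id!r} found")
    return new_html
```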
I'm developing a WordPress site, and I'm fairly new to WordPress. I'm not sure if this is a stupid question, but I haven't been able to find any decent Google results about it. Is there a way to find out which PHP function is generating a piece of HTML code using a browser code inspector like Chrome's? Thanks!
No.
Once the data arrives at the browser, all the PHP code has been processed, and you can't tell which part of the PHP generated which part of the HTML.
No, not without modifying the PHP code to enable some kind of debugging. Chrome can only give you information about the HTML document received on the client side (you), but PHP code is executed on the server side.
You kind of can:
Download a copy of the theme and plugins folder
Open the page on your site that you want to find the function for.
Find a div/class that is specific to that section, e.g. <article>
Open a text editor like Notepad++ (one that will let you search through multiple files at once)
Use the chosen text editor's find feature to search for the div/class
The results will show you a list of files where that term appears.
Look through those files for the function you are looking for (it might take a few goes)
The above is a bit of a roundabout way of doing it, but short of looking through each file separately, I think it is your next best option.
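The same find-in-files search can be scripted. This Python sketch walks a downloaded copy of the theme and plugin folders and reports every line containing a marker string; the folder layout, file extensions, and `post-body` class are assumptions for illustration. (On Linux/macOS, `grep -rn` does the same job.)

```python
import os
import tempfile
from typing import Iterator, Tuple


def find_in_files(root: str, needle: str,
                  extensions=(".php", ".html", ".js")) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each line under `root` containing `needle`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line:
                        yield path, lineno, line.rstrip("\n")


def demo_search() -> list:
    """Build a tiny fake theme folder and search it (illustration only)."""
    with tempfile.TemporaryDirectory() as root:
        theme = os.path.join(root, "themes", "mytheme")
        os.makedirs(theme)
        with open(os.path.join(theme, "single.php"), "w", encoding="utf-8") as fh:
            fh.write('<?php\n<article class="post-body">\n')
        return [(os.path.relpath(p, root), n, line)
                for p, n, line in find_in_files(root, 'class="post-body"')]
```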
I need to get the contents of a website through PHP; however, the content is only available when JavaScript is enabled. The workaround I am using now is an AppleScript that opens the website in Safari, selects all of the page content, copies it to the clipboard, and pastes it.
That will be really hard to achieve, I guess. If you look at the JS on that page that is responsible for getting the content ready, you may discover it's just another AJAX call that you can make directly from your PHP script.
Best possible solution: ask the website owner for API/export access ;)
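If that AJAX call turns out to return JSON, the PHP side only needs to decode it and render the markup itself. Here is a Python sketch of that rendering step, assuming the response is an array of flat objects and that you know which keys to show (both assumptions; inspect the real response first). Values are HTML-escaped because they come from a third party.

```python
import html
import json


def rows_to_table(json_text: str, columns: list) -> str:
    """Turn a JSON array of objects into an HTML table, escaping every value."""
    rows = json.loads(json_text)
    head = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(row.get(c, '')))}</td>" for c in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
```

Missing keys render as empty cells rather than raising, which keeps one malformed record from breaking the whole table.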
If that is not possible, you can only pray that you can analyze the requests that are initialized via JavaScript and imitate them.
(Possible tools: Firefox with the Firebug or Tamper Data add-ons.)
Warning: the owner of the website might not like this approach; in fact, scraping the data automatically may be disallowed.
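Before automating requests, it is worth checking the site's robots.txt, which is where owners usually state what automated clients may fetch. Below is a Python sketch using the standard urllib.robotparser module on rules you have already downloaded; the rules and user-agent string are made up, and passing this check does not mean the site's terms of use allow scraping.

```python
import urllib.robotparser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check `url` against robots.txt rules that were fetched separately."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```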
What do you mean by:
the content is only available when JavaScript is enabled
Does the page pull data from somewhere via JS? Would it be easier to analyse where the data is coming from and access that place directly from PHP?
I have a small script that pulls HTML from another site using Javascript.
I want to include that static HTML that gets pulled in a PHP page without any of the Javascript code appearing in the final PHP page that gets displayed.
I tried doing an include of the file with the Javascript code in the PHP page, but it just included the actual Javascript and not the results of the Javascript.
So how would I go about doing this?
You would need to fetch the page, execute the JavaScript in it, then extract the data you wanted from the generated DOM.
The usual approach to this is to use a web automation tool such as Selenium.
You simply can't.
You need to understand that PHP and JavaScript operate in different places: PHP on the server and JavaScript on the client.
Your only solution is to change the way all this is done and use file_get_contents(url) in PHP to get the same content your JavaScript used to get. This way there is no JavaScript anymore, and you can still pre-process your page with the remote content.
You wouldn't be able to do this directly from within PHP, since you'd need to run Javascript code.
I'd suggest passing the URL (and any required actions, such as click events) to a headless browser such as PhantomJS or Zombie.js, and capturing the DOM from it once the JS engine has done its work.
You could also use a real browser, but of course you don't need a UI in your case, and it might actually get in the way of what you're trying to do, so a headless browser might be better.
This sort of thing would normally be used for automated testing of a site (i.e. functional testing).
There is a PHP tool named Mink which can run these sorts of scripts from within a PHP program. It is aimed at writing test scripts, but I would imagine you could use it for your purposes.
Hope that helps.
I want to convert my HTML page into an image. Is there a way in PHP to convert or save an HTML page as an image?
This is not easy; as NullUserException says in his comment, you would need to render the HTML page on the server side, which is not something PHP (or any other server-side language) has built in.
The approach that comes to mind would be to write a program (probably not in PHP, but rather something like C# or C++) that runs on your server, fires up a web browser, and does a series of screen captures (possibly combined with page scrolls). As this is a very nontrivial and bug-prone process, I would suggest looking into third-party components that are capable of doing this.
You would then execute this program from PHP, and when it's done running, display the results from the file it output.
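A Python sketch of the "run the program, then read the file it produced" step. The renderer and its command line are hypothetical; the `{out}` placeholder marks where the output path goes. In PHP the equivalent would be exec() followed by reading the file. The demo uses the Python interpreter as a stand-in renderer so the flow can be checked without a browser.

```python
import os
import subprocess
import sys
import tempfile


def render_to_file(command_template: list, suffix: str = ".png",
                   timeout: float = 60.0) -> bytes:
    """Run an external renderer that writes to a file; return the file's bytes.

    `command_template` is a hypothetical argv in which "{out}" marks the
    output path, e.g. ["my-renderer", "https://example.com", "{out}"].
    """
    fd, out_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    try:
        argv = [arg.replace("{out}", out_path) for arg in command_template]
        subprocess.run(argv, check=True, timeout=timeout)
        with open(out_path, "rb") as fh:
            data = fh.read()
        if not data:
            raise RuntimeError("renderer exited successfully but wrote no output")
        return data
    finally:
        os.remove(out_path)  # never leave temporary screenshots behind


def demo_render() -> bytes:
    """Use the Python interpreter itself as a stand-in 'renderer'."""
    fake = [sys.executable, "-c",
            "import sys; open(sys.argv[1], 'wb').write(b'PNG-ish bytes')", "{out}"]
    return render_to_file(fake)
```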
I would advise you to use an external service with an API. This list might be a good start: http://blogs.sitepoint.com/2008/07/10/9-ways-to-put-site-screenshots-in-your-web-app/
Thumbalizr seems great; they also provide a PHP script so you can cache the images locally:
http://www.thumbalizr.com/apitools.php
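Whatever service is used, caching the returned image locally avoids hitting the API on every page view. Here is a Python sketch of a simple time-based file cache; `fetch` stands in for the real API call (its URL and parameters depend on the service and are not shown here), and the one-day `max_age` is an arbitrary default.

```python
import hashlib
import os
import tempfile
import time
from typing import Callable


def cached_fetch(url: str, cache_dir: str, fetch: Callable[[str], bytes],
                 max_age: float = 86400.0) -> bytes:
    """Return bytes for `url`, reusing a local copy younger than `max_age` seconds.

    `fetch` performs the real (hypothetical) API call and is only invoked on
    a cache miss or when the cached copy is stale.
    """
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()  # filesystem-safe name
    path = os.path.join(cache_dir, key + ".img")
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
        with open(path, "rb") as fh:
            return fh.read()
    data = fetch(url)
    with open(path, "wb") as fh:
        fh.write(data)
    return data


def demo_cache() -> tuple:
    """Fetch one URL twice through a fake service; report (same bytes?, call count)."""
    calls = []

    def fake_service(url: str) -> bytes:
        calls.append(url)
        return b"image for " + url.encode("utf-8")

    with tempfile.TemporaryDirectory() as cache_dir:
        first = cached_fetch("https://example.com", cache_dir, fake_service)
        second = cached_fetch("https://example.com", cache_dir, fake_service)
    return first == second, len(calls)
```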
Try taking a look at browsershots.org - source code is available for it if you want to install it locally. Essentially it uses a browser to take screenshots, and can be controlled via an XML-RPC interface, which you can call from PHP.
As others have said this is not a simple job, and not something you can do directly in PHP, so use an external service.
(I'm not affiliated with browsershots.org in any way)