I want to get the HTML source of a webpage generated by JavaScript, using cURL (PHP).
I tried cURL, but I just get the JavaScript code :(
Can I use Ruby to solve my problem?
The JavaScript is executed by the browser to generate the HTML. If you make a request with cURL, it will just show you the raw HTML the server sent. You would need a JavaScript engine to process the JavaScript after receiving the response body.
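For illustration, here is what a bare cURL fetch looks like in PHP; it returns only the markup the server sends, before any JavaScript has run (the URL is a placeholder):
<?php
// Fetches the raw, unrendered source -- any markup built later by
// JavaScript in the browser will not be in $html.
$ch = curl_init('http://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);
echo $html;
?>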
Just look at any web inspector tool (in Chrome, press Ctrl+Shift+I). There you can see the changes the JavaScript makes reflected in the page. I don't think cURL or any cURL-like tool can do this.
This is a tough problem, because the JavaScript has to run to produce the right markup. What I would suggest is to download all the code locally, then add an AJAX call to it so the page can send its rendered source back to you after all the JS has run. Then run the code in a browser.
If you need to do this many times, you could queue the pages that need to be loaded in a database and load them one by one with PHP. Once the JS has sent the rendered code back to the server, the page can refresh and pull the next one off the queue.
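As a rough sketch of the receiving end (the script name and posted field names below are mine, not anything standard):
<?php
// receive.php -- hypothetical endpoint the browser posts the rendered
// markup back to once all the JS has run, e.g. via
// $.post('receive.php', {url: location.href, html: document.documentElement.outerHTML});
if (isset($_POST['url'], $_POST['html'])) {
    $file = 'rendered_' . md5($_POST['url']) . '.html';
    file_put_contents($file, $_POST['html']);
    echo 'saved';
}
?>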
Let me know if you need me to clarify anything.
This can be done with a headless browser such as PhantomJS. It's a great way to implement whatever logic you want and then read the result back into PHP. You can try https://github.com/jonnnnyw/php-phantomjs and also https://github.com/ariya/phantomjs
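Assuming you go with php-phantomjs, usage looks roughly like this (a sketch based on that library's documentation; the exact API may differ between versions):
<?php
require 'vendor/autoload.php';

use JonnyW\PhantomJs\Client;

$client   = Client::getInstance();
$request  = $client->getMessageFactory()->createRequest('http://www.example.com/', 'GET');
$response = $client->getMessageFactory()->createResponse();

$client->send($request, $response);

if ($response->getStatus() === 200) {
    // HTML after PhantomJS has executed the page's JavaScript
    echo $response->getContent();
}
?>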
Please let me know: is it possible to scrape some info after AJAX has loaded it, using PHP? I have only used SIMPLE_HTML_DOM for static pages.
Thanks for any advice.
Scraping the entire site
Scraping dynamic content requires you to actually render the page. A PHP server-side scraper will just do a simple file_get_contents or similar. Most server-based scrapers won't render the entire site and therefore don't load the dynamic content generated by the AJAX calls.
Something like Selenium should do the trick. A quick Google search finds numerous examples of how to set it up. Here is one.
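For instance, with the php-webdriver bindings (one possible setup; this assumes a Selenium server running on localhost:4444 and the URL is a placeholder):
<?php
require 'vendor/autoload.php';

use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Remote\DesiredCapabilities;

$driver = RemoteWebDriver::create('http://localhost:4444/wd/hub', DesiredCapabilities::chrome());
$driver->get('http://www.example.com/');

// getPageSource() returns the DOM after the browser has run the JS
$html = $driver->getPageSource();
$driver->quit();
?>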
Scraping JUST the AJAX calls
Though I wouldn't consider this scraping, you can always examine an AJAX call using your browser's dev tools. In Chrome, while on the site, hit F12 to open the dev tools console.
Hit the Network tab and then hit Chrome's refresh button. This will show every request made between you and the site. You can then filter out specific requests.
For example, if you are interested in AJAX calls, you can select XHR.
You can then click on any of the listed items to get more information about that request.
file_get_contents on an AJAX call
Depending on how robust the APIs behind these AJAX calls are, you could do something like the following.
<?php
// URL of the AJAX endpoint, copied from the Network tab
$url = "http://www.example.com/test.php?ajax=call";
$content = file_get_contents($url);
?>
If the response is JSON, then add:
$data = json_decode($content); // or json_decode($content, true) for associative arrays
However, you are going to have to do this for each AJAX request the site makes. Beyond that, you are going to have to use a solution similar to the ones presented [here].
Finally, you can also use PhantomJS to render an entire site.
Summary
If all you want is the data returned by specific AJAX calls, you might be able to get it using file_get_contents. However, if you are trying to scrape an entire site that also uses AJAX to manipulate the document, then you will NOT be able to use SIMPLE_HTML_DOM alone.
Finally, I worked around my problem. I just grabbed the POST URL with all its parameters from the AJAX call and made the same request using the SIMPLE_HTML_DOM class.
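In case it helps anyone, the pattern is roughly this (the endpoint URL and POST fields below are placeholders; copy the real ones from the Network tab):
<?php
include 'simple_html_dom.php';

// Replay the AJAX POST directly with cURL
$ch = curl_init('http://www.example.com/ajax/list.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array('page' => 2, 'sort' => 'date')));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$fragment = curl_exec($ch);
curl_close($ch);

// Parse the returned HTML fragment with SIMPLE_HTML_DOM as usual
$html = str_get_html($fragment);
foreach ($html->find('div.item') as $item) {
    echo $item->plaintext, "\n";
}
?>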
I'm using the Simple HTML DOM class and have gotten it to work on basic pages, where I can view the information I want. However, when I attempt to use it on a page that reloads a div with AJAX, I can't seem to get it to "wait" before reading the page.
I basically want it to load the page, then wait 2 seconds before reading the page content (so the new div has time to load). Is this possible, or am I trying to use the class incorrectly? I'm manually inputting the URL, so it's not a link issue.
Example Page:
- You can see the load issue when you navigate through the pages.
Someone suggested cURL and I tried that, with the same results.
Thanks in advance.
PHP runs on the server. JavaScript (e.g. AJAX) runs in the browser, after the PHP code on the server has finished producing the page. You can't make a PHP program, running on the server, wait for an event that happens later in the browser.
You'll need to either load the content for that div using PHP code, or replace the PHP DOM-parsing code with JavaScript code that does the work on the client.
You can use the sleep() function (http://php.net/manual/en/function.sleep.php) if you simply want to delay program execution for some set amount of time.
I'm currently using cURL to log in to a site and grab the HTML for one of the pages. My problem is that the page has some AJAX links on it (clicking a link results in HTML changes). How would I be able to perform those clicks and get the HTML of the final state using PHP? From researching this, it seems I need some sort of headless browser? Is there something like that in PHP I can use?
I'm not aware of any headless browser supporting JavaScript/AJAX that you can drive with PHP. If you want to drive a real browser from PHP, see http://seleniumhq.org/
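With Selenium driven through the php-webdriver bindings, clicking an AJAX link and waiting for the result might look something like this (the selectors and URLs are placeholders, and this assumes the login is scripted or already done):
<?php
require 'vendor/autoload.php';

use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\WebDriverBy;
use Facebook\WebDriver\WebDriverExpectedCondition;

$driver = RemoteWebDriver::create('http://localhost:4444/wd/hub', DesiredCapabilities::firefox());
$driver->get('http://www.example.com/members/page');

// Click the AJAX link, then wait until the updated content appears
$driver->findElement(WebDriverBy::cssSelector('a.ajax-link'))->click();
$driver->wait(10)->until(
    WebDriverExpectedCondition::presenceOfElementLocated(WebDriverBy::id('result'))
);

// The final-state HTML, including the AJAX changes
$html = $driver->getPageSource();
$driver->quit();
?>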
Had this exact problem a few minutes ago. This works like a charm: use .live() as the top answer here explains:
Reload javascript file after an AJAX request
Tested, and it works.
I have to load an HTML file using jQuery. When I searched Google, I got this snippet:
$("#feeds").load("feeds.html");
But I don't want to load the content of feeds.html into the #feeds element. Instead, I need to load that page entirely. How do I load that page? Please help.
If you're not wanting to load() some HTML into an element on the existing page, maybe you mean that you want to redirect to another page?
url = "feeds.html";
window.location = url;
Or maybe you just want to fill an entire body? You could load() into the body tag if you wanted.
$("body").load("feeds.html");
$.get()
Here's a summary:
Load a remote page using an HTTP GET request. This is an easy way to send a simple GET request to a server without having to use the more complex $.ajax function. It allows a single callback function to be specified that will be executed when the request is complete (and only if the response has a successful response code). If you need to have both error and success callbacks, you may want to use $.ajax. $.get() returns the XMLHttpRequest that it creates. In most cases you won't need that object to manipulate directly, but it is available if you need to abort the request manually.
Are you sure feeds.html is located under the same domain as your script?
If so, that line is OK, and you'll have to look for the problem elsewhere. Try to debug it with Firebug under the Net panel.
If not, then you can only send JSONP requests to URLs under other domains. But you can easily write a proxy script in PHP (I noticed the php tag under your question :) ) to load your content with cURL. Take a look at the CatsWhoCode tutorial, point 9; I hope it helps.
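A minimal proxy sketch, assuming a whitelist of allowed hosts (the script name and parameter are made up; never proxy arbitrary user-supplied URLs):
<?php
// proxy.php?url=http://feeds.example.com/feeds.html
$allowed = array('feeds.example.com');   // hosts you trust
$url = isset($_GET['url']) ? $_GET['url'] : '';

if (!in_array(parse_url($url, PHP_URL_HOST), $allowed, true)) {
    header('HTTP/1.1 403 Forbidden');
    exit('forbidden');
}

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
echo curl_exec($ch);   // now same-origin as far as your page is concerned
curl_close($ch);
?>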
How do I create a loading overlay (with a loading GIF) while a PHP script runs and returns data?
PHP is a server-side language, and what you're looking for is something that interacts with the browser on the client side.
You're probably best off using a solution involving AJAX, for example with jQuery:
When the user loads the page, make an AJAX call that runs your script and show a div containing your 'loading' GIF. When the AJAX call finishes, hide that div.
You need to provide more information about what to show and when to show the image, but to start with, here is a post about how you can show a "loading" message (GIF) using PHP and jQuery.
Use AJAX-like techniques and two PHP scripts. The first should output only the "loading..." div and kick off an AJAX request to the second, which returns the full content of your page.
PS: Actually, I'm not sure you realize that the "loading" div exists in the client-side browser as PHP's output, while the PHP script runs on the server before your browser ever gets that output... Read more about how HTTP and web servers work; this knowledge is quite necessary for any good web developer.
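A bare-bones sketch of that two-script approach (the script names page.php and content.php are illustrative):
<?php
// page.php -- first script: send only the overlay, then let the browser
// fetch the slow content from content.php (the second script).
?>
<div id="loading">loading... <img src="loading.gif" alt=""></div>
<div id="content"></div>
<script>
var xhr = new XMLHttpRequest();
xhr.open('GET', 'content.php');          // second script does the slow work
xhr.onload = function () {
    document.getElementById('content').innerHTML = xhr.responseText;
    document.getElementById('loading').style.display = 'none';
};
xhr.send();
</script>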