I have to load an HTML file using jQuery. When I searched Google, I found this snippet:
$("#feeds").load("feeds.html");
But I don't want to load the content of feeds.html into the #feeds element. Instead I need to load that page entirely. How can I load it?
If you don't want to load() some HTML into an element on the existing page, maybe you mean that you want to redirect to another page?
var url = "feeds.html";
window.location.href = url;
Or maybe you just want to fill the entire body? You could load() into the body tag if you wanted:
$("body").load("feeds.html");
$.get()
Here's a summary:
Load a remote page using an HTTP GET request. This is an easy way to send a simple GET request to a server without having to use the more complex $.ajax function. It allows a single callback function to be specified that will be executed when the request is complete (and only if the response has a successful response code). If you need to have both error and success callbacks, you may want to use $.ajax.
$.get() returns the XMLHttpRequest that it creates. In most cases you won't need that object to manipulate directly, but it is available if you need to abort the request manually.
Are you sure feeds.html is located under the same domain as your script?
If so, that line is OK and the problem lies elsewhere. Try to debug it with Firebug's Net panel.
If not, you can only make JSONP-style requests to URLs on other domains. But you can easily write a proxy script in PHP (I noticed the php tag on your question :)) that loads the content with cURL. Take a look at the CatsWhoCode tutorial, point 9; I hope it helps.
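A minimal sketch of such a proxy, assuming the remote page lives at a URL you hard-code on the server (the file name proxy.php and the target URL are illustrative):
<?php
// proxy.php - fetch a remote page server-side so the browser can
// load it with $("#feeds").load("proxy.php") despite the same-origin policy.
// The target URL is hard-coded; never proxy arbitrary user-supplied URLs.
$url = "http://www.example.com/feeds.html"; // illustrative target

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
$html = curl_exec($ch);

if ($html === false) {
    http_response_code(502);
    echo "Upstream request failed: " . curl_error($ch);
} else {
    echo $html;
}
curl_close($ch);
?>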
Related
Please let me know: is it possible to scrape some info after AJAX has loaded it, using PHP? I have only used SIMPLE_HTML_DOM for static pages.
Thanks for any advice.
Scraping the entire site
Scraping dynamic content requires you to actually render the page. A PHP server-side scraper will just do a simple file_get_contents or similar. Most server-based scrapers won't render the page and therefore don't load the dynamic content generated by the AJAX calls.
Something like Selenium should do the trick. A quick Google search turns up numerous examples of how to set it up. Here is one.
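For instance, here is a minimal sketch using the php-webdriver bindings against a running Selenium server (it assumes the facebook/php-webdriver package is installed via Composer and a Selenium server is listening on localhost:4444; the target URL is illustrative):
<?php
require_once 'vendor/autoload.php'; // facebook/php-webdriver via Composer

use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Remote\DesiredCapabilities;

// Connect to a locally running Selenium server.
$driver = RemoteWebDriver::create('http://localhost:4444/wd/hub',
                                  DesiredCapabilities::chrome());

// Load the page in a real browser so its JavaScript and AJAX calls run.
$driver->get('http://www.example.com/'); // illustrative URL
sleep(2); // crude wait for AJAX content; an explicit wait is better

// getPageSource() returns the DOM after JavaScript has modified it.
echo $driver->getPageSource();

$driver->quit();
?>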
Scraping JUST the AJAX calls
Though I wouldn't consider this scraping, you can always examine an AJAX call by using your browser's dev tools. In Chrome, while on the site, hit F12 to open up the dev tools console.
Hit the Network tab and then hit Chrome's refresh button. This will show every request made between you and the site. You can then filter out specific requests.
For example, if you are interested in AJAX calls you can select the XHR filter.
You can then click on any of the listed items in the table to get more information.
file_get_contents on AJAX calls
Depending on how robust the APIs behind these AJAX calls are, you could do something like the following.
<?php
// Hit the AJAX endpoint directly and fetch its raw response.
$url = "http://www.example.com/test.php?ajax=call";
$content = file_get_contents($url);
?>
If the response is JSON, then add:
$data = json_decode($content);
However, you are going to have to do this for each AJAX request on the site. Beyond that, you are going to have to use a solution similar to the ones presented [here].
Finally, you can also use PhantomJS to render an entire site.
Summary
If all you want is the data returned by specific AJAX calls, you might be able to get it using file_get_contents. However, if you are trying to scrape an entire site that also uses AJAX to manipulate the document, then you will NOT be able to use SIMPLE_HTML_DOM alone.
Finally I worked around my problem: I grabbed the POST URL with all its parameters from the AJAX call and made the same request using the SIMPLE_HTML_DOM class.
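A minimal sketch of that workaround, assuming the endpoint URL and POST fields were copied out of the browser's network panel (the values and the div.item selector below are illustrative):
<?php
require_once 'simple_html_dom.php'; // PHP Simple HTML DOM Parser

// URL and POST fields as captured from the AJAX request in the dev tools;
// these particular values are illustrative.
$url    = "http://www.example.com/ajax/endpoint.php";
$fields = array('page' => 2, 'category' => 'news');

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

// Parse the returned fragment exactly like a static page.
$html = str_get_html($response);
foreach ($html->find('div.item') as $item) { // selector is illustrative
    echo $item->plaintext, "\n";
}
?>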
I have a website which dynamically loads its content into a div container using the jQuery load() function.
The browser address bar always shows: http://www.meinedomain.de/#
Now I would like to have semantic URLs in the browser address bar.
For example: meinedomain.de/impressum
If I open http://www.meinedomain.de/impressum, I see only the content of impressum.php; the header, navigation, footer etc. are missing.
$(document).ready(function() {
    $(".freizeitparks").click(function() {
        hideMainContent();
        $('#pagecontainer').load('freizeitparks.html');
        $('#contentTitle').html("<strong>Freizeitparks</strong>");
    });
});
I hope you can help me.
Best regards
Patrick
You have to use the HTML5 History API if you want to change the URL without reloading the full page.
With the History API you can only catch history changes while the base URL stays on the same site. That means when someone uses the address bar to open a URL like http://www.meinedomain.de/impressum, the browser is going to send a request to the server no matter what you do. So you have to include your footer, header, etc. in impressum.php.
But because you can catch changes to the browser history (back, forward, go, etc.), you can use AJAX calls to load data dynamically. Use the API to change the URL and send an AJAX call instead of reloading everything. On the backend you have to build a decision tree to detect whether the page was requested via AJAX or not. (My trick is to pass a GET parameter when I use AJAX, so I know whether to return the whole page or just part of it; see the sketch below.)
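A minimal sketch of that backend decision tree, assuming the AJAX calls append a GET parameter named ajax (the parameter name and the include file names are illustrative):
<?php
// impressum.php - return just the content fragment for AJAX requests,
// and the full page (header, navigation, footer) for direct visits.
// The "ajax" parameter name is illustrative; use whatever you append
// to your AJAX URLs on the client side.
$isAjax = isset($_GET['ajax']) && $_GET['ajax'] === '1';

if (!$isAjax) {
    include 'header.php';
    include 'navigation.php';
}

// The actual page content, shared by both cases.
include 'impressum_content.php';

if (!$isAjax) {
    include 'footer.php';
}
?>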
You can also continue to use the hash method, but then you have to use a workaround to let search engines map your site. And don't forget to check the hash on load, not only on hashchange.
Maybe that's all.
Basically, a page generates some dynamic content, and I want to get that dynamic content, not just the static HTML. I am not able to do this with cURL. Help, please.
You can't with just cURL.
cURL will grab the specific raw (static) files from the site, but to get JavaScript-generated content, you would have to put that content into a browser-like environment that supports JavaScript and all the other host objects the JavaScript uses, so the script can run.
Then, once the script runs, you would have to access the DOM to grab whatever content you wanted from it.
This is why most search engines don't index JavaScript-generated content. It's not easy.
If this is one specific site that you're trying to gather info on, you may want to look into exactly how the site gets the data itself and see if you can't get the data directly from that source. For example, is the data embedded in JS in the page (in which case you can just parse that JS out), or is the data obtained via an AJAX call (in which case you can maybe just make that AJAX call directly), or some other method?
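For the embedded-JS case, here is a minimal sketch of parsing data out of an inline script, assuming the page assigns a JSON literal to a variable such as var items = [...]; (the URL and the variable name items are illustrative):
<?php
// Fetch the raw page; the inline <script> source comes along with it.
$html = file_get_contents("http://www.example.com/page.html"); // illustrative URL

// Pull out the JSON literal assigned to a JS variable, e.g.
//   var items = [{"id":1,"name":"foo"}];
// The variable name "items" is illustrative.
if (preg_match('/var\s+items\s*=\s*(\[.*?\]);/s', $html, $m)) {
    $data = json_decode($m[1], true);
    print_r($data);
} else {
    echo "Pattern not found - the page differs from this sketch.\n";
}
?>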
You could try Selenium at http://seleniumhq.org, which supports JS.
I want to get the HTML source of a webpage generated by JavaScript, using cURL (PHP).
I tried cURL, but I just get the JavaScript code :(
Can I use Ruby to solve my problem?
The JavaScript is executed by the browser to generate the HTML. If you make a request with cURL, it will just show you the HTML as it is served. You would need a JavaScript engine to process the JavaScript after receiving the response body.
Just look at any web inspector tool (in Chrome, just Ctrl+Shift+I). There you can see the changes that the JavaScript makes reflected in the page. I don't think cURL or any cURL-like tool can do this.
This is a tough problem because the JavaScript has to run to get the right code. What I would say is: download all the code locally and then add an AJAX call to it, so it can send the rendered source back to you after all the JS has run. Then run the code in a browser.
If you need to do this a bunch of times, you could queue the pages that need to be loaded in a DB and load them all with PHP. Then, once the JS has sent the code back to the server, the page can refresh and pull the next page off the queue.
Let me know if you need me to clarify anything.
This can be done with a headless browser such as PhantomJS; it is a great way to run whatever logic you want and then read the result back into PHP from the console output. Have a look at https://github.com/jonnnnyw/php-phantomjs and also https://github.com/ariya/phantomjs.
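A minimal sketch of driving the phantomjs binary directly from PHP, assuming phantomjs is installed and on the PATH (the script file name render.js and the target URL are illustrative):
<?php
// A tiny PhantomJS script that loads a page, lets its JavaScript run,
// and prints the rendered DOM to stdout.
$renderScript = <<<'JS'
var page = require('webpage').create();
var url  = require('system').args[1];
page.open(url, function (status) {
    if (status === 'success') {
        // Give async scripts a moment to finish before dumping the DOM.
        window.setTimeout(function () {
            console.log(page.content);
            phantom.exit(0);
        }, 2000);
    } else {
        phantom.exit(1);
    }
});
JS;
file_put_contents('render.js', $renderScript);

// Run PhantomJS and capture the fully rendered HTML.
$url  = 'http://www.example.com/'; // illustrative target
$html = shell_exec('phantomjs render.js ' . escapeshellarg($url));
echo $html;
?>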
I have to scrape this page using PHP cURL. On this page, when the user scrolls down, more items are loaded via AJAX. Can I call the URL that the AJAX script is calling? If so, how do I figure that URL out? I know a bit of AJAX, but the code there is kind of complex for me.
Here is the relevant JS code: pastebin
Alternatively, can someone suggest another method of scraping that page? PS: I'm doing this for a good cause.
Edit: I figured it out using Live HTTP Headers. Question can be closed; downvoted to oblivion anyway.
You can use Firebug for that. Switch to the Console tab and then make the page trigger the AJAX request.
This is what you should see after scrolling to the bottom of the page: http://www.flipkart.com/computers/components/ram-20214?_l=m56QC%20tQahyMi46nTirnSA--&_r=11FxOYiYfpMxmANj4kGJzg--&_pop=flyout&response-type=json&inf-start=20
and if you scroll further: http://www.flipkart.com/computers/components/ram-20214?_l=m56QC%20tQahyMi46nTirnSA--&_r=11FxOYiYfpMxmANj4kGJzg--&_pop=flyout&response-type=json&inf-start=40
The tokens seem to always remain the same: _l=m56QC%20tQahyMi46nTirnSA-- and _r=11FxOYiYfpMxmANj4kGJzg--, and so does the _pop parameter: _pop=flyout. So let's have a look at the other parameters:
This one was for the main page:
//no additional parameters...
this one for the first 'reload':
&response-type=json&inf-start=20
and this one for the second 'reload':
&response-type=json&inf-start=40
So, apparently you just have to append &response-type=json&inf-start=$offset to your initial URI to get the results in JSON format. You can also see the contents in Firebug, which should make it very easy to work with them.
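A minimal sketch of paging through those results from PHP, assuming the endpoint keeps accepting inf-start in steps of 20 and keeps returning JSON (the upper bound in the loop is illustrative):
<?php
// Base URI as captured in Firebug; the tokens stay constant across pages.
$base = "http://www.flipkart.com/computers/components/ram-20214"
      . "?_l=m56QC%20tQahyMi46nTirnSA--&_r=11FxOYiYfpMxmANj4kGJzg--&_pop=flyout";

// Page through the AJAX endpoint in steps of 20, as observed above.
for ($offset = 20; $offset <= 100; $offset += 20) { // upper bound is illustrative
    $url  = $base . "&response-type=json&inf-start=" . $offset;
    $json = file_get_contents($url);
    if ($json === false) {
        break; // request failed; stop paging
    }
    $data = json_decode($json, true);
    if ($data === null) {
        break; // no more JSON pages
    }
    // Inspect the decoded structure to find each page's item list.
    print_r($data);
}
?>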