I am scraping (parsing) data from a web page in PHP using the html_dom class. The page uses AJAX to load more data as you scroll down, but the page source only contains the data from the initial load, i.e. what is there when the page is first opened.
So my question is: how do I get the full page source, including everything loaded through AJAX?
Thanks
If you are using Chrome, open the developer tools (F12). Go to the Network tab, tick the "Preserve log" checkbox and select the XHR filter to see which requests the page makes as it loads more data. Those are the URLs you need to request yourself.
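Once the XHR URL is known, the "scroll to load more" behaviour can usually be reproduced by requesting that URL page by page. Below is a rough sketch in JavaScript (Node 18+), not a definitive implementation. The endpoint `https://example.com/ajax/items`, the `page` query parameter and the "empty body means no more data" rule are all assumptions; use whatever the Network tab actually shows for your page.

```javascript
// Hypothetical endpoint: replace with the XHR URL seen in the Network tab.
const ENDPOINT = "https://example.com/ajax/items";

// Build the URL for one "page" of results; the parameter name `page` is an assumption.
function buildPageUrl(endpoint, page) {
  const url = new URL(endpoint);
  url.searchParams.set("page", String(page));
  return url.toString();
}

// Request successive pages until one comes back empty or maxPages is reached.
// `fetchText` is injected so the loop can be tried without touching the network.
async function collectPages(fetchText, endpoint, maxPages) {
  const chunks = [];
  for (let page = 1; page <= maxPages; page++) {
    const body = await fetchText(buildPageUrl(endpoint, page));
    if (!body || body.trim() === "") break; // assumption: empty body means "no more data"
    chunks.push(body);
  }
  return chunks;
}

// Usage with the built-in fetch of Node 18+ (left commented out, it needs a real URL):
// collectPages((u) => fetch(u).then((r) => r.text()), ENDPOINT, 10).then(console.log);
```

Passing the fetch function in as an argument keeps the paging loop separate from the HTTP client, so the same loop works with any client you prefer.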
I'm looking to get information from an external website by taking it from a div in their markup. But file_get_contents() doesn't work, because the information isn't in the source code of the page. It only shows up after the page has loaded (it's visible if you use Inspect Element in the browser).
Is there a way to do this, or am I just out of luck?
This might be a bit difficult to explain but I will try my best.
I have a page that displays some results from JavaScript (a time in HH/MM/SS, to be exact).
I need those JavaScript results to also appear in the source code of the page when it is viewed from a browser, e.g. in Firefox via right click -> View Page Source.
I was thinking about echoing the results, but that seemed like the wrong idea, as echoing will only show the results on the page and not in the source code of the page.
EDIT:
Okay, if it is impossible to show the results of the JavaScript in the page source, then how does this site display the result of the current time, etc, in the page source? I.e. Wednesday, July 31, 2013, etc, etc can be viewed on the page and on the page source.
http://www.timeanddate.com/worldclock/city.html?n=136
I am using Google Chrome to view the page source.
You can't alter the source code that comes from the server by means of JavaScript. While JavaScript can manipulate DOM objects, the text you see when you click "View Source" is exactly what came from the server, and there is no way to change that from the client.
To view the changes made by your scripts, use Firebug or a similar tool.
Unfortunately, the source code you see when you hit View Source is the response to the HTTP request that was sent. JavaScript cannot alter that source. Any change you make to the DOM is only visible in a DOM inspector (e.g. Ctrl+Shift+I in Chrome or F12 in IE).
What you want is simply not possible from the javascript side.
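As for the EDIT: a date can only appear in View Source if the server wrote it into the HTML before sending the response. That is presumably what the linked site does, although nothing in this thread confirms it. A minimal sketch of the idea in JavaScript (Node), with the element id `clock` and the function names made up for illustration:

```javascript
// Format a date on the server so the text is part of the HTML response itself.
function formatLongDate(date) {
  return date.toLocaleDateString("en-US", {
    weekday: "long",
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC", // fixed zone so the output does not depend on the server's time zone
  });
}

// Build the page as a string; the id "clock" is a hypothetical name.
function renderClockPage(date) {
  return [
    "<!DOCTYPE html>",
    "<html><body>",
    '<p id="clock">' + formatLongDate(date) + "</p>",
    "</body></html>",
  ].join("\n");
}

// Usage: the date from the question, rendered into the response body.
console.log(renderClockPage(new Date(Date.UTC(2013, 6, 31))));
```

Because the text is already in the response, it shows up in View Source. A script on the page can still update the same element afterwards, and those later updates would only be visible in a DOM inspector.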
How is it possible to grab the page source from an AJAX-driven web page?
curl doesn't seem to be able to get the AJAX-generated source.
Sorry if this is a duplicate, but I looked through the existing questions and didn't find an answer.
If the page you want to grab uses AJAX to compose different parts of it, then the content does not exist until all the loading is done.
You can't do this with curl alone: curl acts as a client that requests only the URL you give it, and it has no JavaScript engine to interpret the scripts and load the other parts of the page.
If the content you are looking for is in one of the parts loaded through AJAX, open the Chrome inspector -> Network tab, find the exact URL that is requested, and then load that URL using curl.
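The response from such a URL is often JSON or an HTML fragment rather than a full page. If it turns out to be JSON, extracting the data can be as simple as the sketch below (JavaScript). The shape `{ items: [{ title: ... }] }` and the name `extractTitles` are purely hypothetical; check the real response in the Network tab first.

```javascript
// Parse a JSON body returned by an AJAX endpoint and pull out one field per item.
// The shape { items: [{ title: ... }] } is hypothetical; inspect the real response first.
function extractTitles(jsonText) {
  let data;
  try {
    data = JSON.parse(jsonText);
  } catch (err) {
    return []; // not JSON: the endpoint probably returned an HTML fragment instead
  }
  const items = data && Array.isArray(data.items) ? data.items : [];
  return items
    .map((item) => (item ? item.title : undefined))
    .filter((title) => typeof title === "string");
}

// Usage with a made-up response body:
console.log(extractTitles('{"items":[{"title":"First"},{"title":"Second"}]}'));
```

If the endpoint returns an HTML fragment instead, it can be fed to the same HTML parser you already use for the main page.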
When I view the page source of a page (like this for example: http://my.sa.ucsb.edu/public/curriculum/coursesearch.aspx) there's not very much code/info in it. On that linked page, for instance, none of the class info is shown in the page source.
BUT: when I view it in firebug, I can see a lot more of the html information. For instance, I can see all of the class info, in tables.
Why is this? How can I access the full HTML (as shown in Firebug)? Can I do it in PHP/JavaScript?
This is the order in which stuff happens:
1. PHP generates HTML
2. Browser loads HTML
3. JavaScript manipulates the loaded HTML
Why is this?
The View Source browser feature normally shows the plain HTML as received by the browser. More advanced tools like Firebug are able to display the current HTML after it has been changed by JavaScript. (Firefox itself has this feature as well: just select some generated HTML, right click and choose "View Selection Source".)
How can I access the full (firebug html)?
I'm not sure about the HTML tab but the Network tab always displays documents as received from the server.
Can I do it in php/javascript?
PHP is no longer running when the original HTML reaches the browser.
JavaScript can display HTML with the .innerHTML property of any DOM node.
View Source shows what the browser got from the server. Firebug shows the browser DOM, i.e. the representation of the page that exists in browser memory. The DOM can be changed by JavaScript. JavaScript can access the DOM through the document object and then walk its children. For example, to see all tables, you might do document.getElementsByTagName('table')
If you want the whole DOM contents as HTML, you can do something like document.getElementsByTagName('html')[0].innerHTML
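The DOM access described above can be wrapped in two small helpers, sketched here in JavaScript. The function names are made up for illustration. The document is passed in as a parameter, so in a browser console you would pass the global `document`. Note that `document.documentElement.outerHTML` includes the `<html>` tag itself, whereas `innerHTML` of that element leaves it out.

```javascript
// Return the live DOM serialised as HTML, i.e. roughly what a DOM inspector shows.
// `doc` is a parameter so the helper is not tied to the browser's global `document`.
function currentHtml(doc) {
  // outerHTML includes the <html> tag itself; innerHTML of that element would omit it.
  return doc.documentElement.outerHTML;
}

// Count the elements with a given tag name in the live DOM.
function countTags(doc, tagName) {
  return doc.getElementsByTagName(tagName).length;
}

// In a browser console:
//   currentHtml(document);
//   countTags(document, "table");
```

Both helpers read the DOM at the moment they are called, so anything a script adds later requires calling them again.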
View Source simply shows you the HTML as loaded from the server, which means that any changes made to the DOM after the page has loaded will not be shown. It only ever reflects the initial response.
On the other hand, Firebug is dynamic and shows you the DOM and how it is being manipulated. When the DOM is being changed, Firebug's source will change as well. This is important for debugging as you can see what is really going on, unlike the View Source.
When viewing the source with "View Source", what you see is the HTML of the URL you are on, without any modifications made by JavaScript and the like.
Also, if the page has frames or iframes in its code, their content will not be shown either.
In Firebug, by contrast, dynamic changes to the HTML and the content of frames/iframes are visible.
Also, viewing the source of a page before it has fully loaded can be a reason for not seeing the whole HTML code (or any HTML code at all).
Traversing the HTML with JavaScript will always return the full, updated HTML (i.e. what you would see in Firebug).
I'm not sure how you want to access the HTML with PHP, but PHP does not have access to the page after it reaches the browser. If you are passing a URL to PHP and loading the HTML from there, the HTML you get is the original HTML before any dynamic changes (i.e. the one you would see in "View Source").
Firebug will also show you the CSS file, which is only referenced from the main HTML via
<link rel="stylesheet" type="text/css" href="css">
so it shows some more information than the page source does.
Page source shows you the HTML as it was when the page was first loaded. It does not show modifications made by JavaScript etc. after the page was loaded or after you clicked a button on the page. To view the current DOM, you can use the following:
For IE/Firefox, the following bookmarklet works:
https://www.squarefree.com/bookmarklets/webdevel.html#generated_source
For google chrome, right click on any element and choose 'Inspect Element' option. It will show the position of element in DOM. Now right click on '
For Opera, right click on any element and choose 'Inspect element'. This will start Opera Dragonfly. In the Dragonfly window, click the 'Expand the DOM tree' button (first button, with a dot and two arrows) and then the 'Export the current DOM panel' button (second button).
In IE, open the webpage and press F12 to open developer tools. Click View->Source->DOM(page) or shortcut Ctrl+Shift+G in developer tools window. This will show the complete currently visible DOM.
For Firefox, an alternative is the Web Developer toolbar extension: choose View Source -> View Generated Source in it.
View Source gives you the source of the page as it was when it was loaded. To get the current HTML, there is an option in the Web Developer tool (Firefox add-on) called "View Generated Source".
In the menu: View Source -> View Generated Source