I want to pull information from an external website by reading it out of a div in their markup. file_get_contents() doesn't work, because the information isn't in the page's source code. It only appears after the page has loaded (it is visible if you use Inspect Element in the browser).
Is there a way to do this, or am I out of luck?
I have coded a math quiz mostly in PHP which pulls 10 questions and answers randomly from a database of about 20 questions.
This works fine, however when I click on "view source code", the source code displays DIFFERENT questions than the ones displayed on the actual webpage. It seems to show other random questions from the database. Does anyone know why this happens?
Here is the link to the quiz: http://socialsoftware.purchase.edu/nicholas.roberts/mathquiz/mathselect.php?category=Calculus
Notice how the source code shows different data than the actual webpage...
If you 'View page source', the browser issues a new request, so you get a new random set of questions in the source.
It's different when you choose 'inspect element'. If you do that, you are inspecting details of the current document, not as it was loaded, but as it currently is in the DOM in the browser.
This is because you fetch 10 random questions on every request. In some browsers, view-source fetches a fresh copy of the page, so your script runs again and returns a different random set.
Use inspect element (Developer tools) instead of view source.
When you view the page source, your browser will issue another request to the server. The page source you are viewing then contains a new random set of questions.
If you need to inspect the page currently loaded, use inspect element instead.
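The effect described in these answers can be reproduced without a server. The sketch below is only an illustration, not the quiz's actual PHP: questionBank, pickRandom and handleRequest are hypothetical names, and each call to handleRequest stands in for one HTTP request (the first for the page load, the second for the request that "view source" makes).

```javascript
// Hypothetical stand-in for the quiz database of about 20 questions.
const questionBank = Array.from({ length: 20 }, (_, i) => `Question ${i + 1}`);

// Return `count` distinct random items using a partial Fisher-Yates shuffle.
// Assumes count <= items.length.
function pickRandom(items, count) {
  const pool = items.slice();
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(Math.random() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, count);
}

// Each call models one request to the quiz page: a fresh random selection.
function handleRequest() {
  return pickRandom(questionBank, 10);
}

const shownInPage = handleRequest();   // what the visitor sees
const shownInSource = handleRequest(); // what a second request returns
console.log(shownInPage);
console.log(shownInSource);
```

The two logged lists will usually differ, which is why the source view and the rendered page disagree. If the selection has to stay stable across requests, it would need to be stored somewhere (a session, for example) rather than redrawn each time.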
How do you dynamically refresh a single div tag using PHP/AJAX so that the content actually changes the local HTML on the page (so that the change shows up under 'view source' in a browser) instead of only living in a JavaScript object?
I am trying to design a webpage that loads search results without refreshing the entire page. I use a simple hash followed by a GET/query string to determine what content to load. This is passed to a JavaScript XMLHttpRequest, then to some PHP that picks up the GET parameters and passes them to a SOAP service, and finally the PHP echoes the SOAP results back to the XMLHttpRequest, which displays them by changing a div through document.getElementById. This works fine for display in conventional browsers. However, I am concerned that search bots and screen readers will not recognize most of the content that browsers show, because it all lives in a client-side JavaScript object.
So, I guess my first question is: is this a valid concern? If it is, is there a work around?
Thanks!
AJAX content is very hard to get indexed. Google has webmaster guidelines for AJAX. This should get you started in the right direction on getting your content indexed.
I'm inexperienced with search engine behavior, but as far as I know the best option is to generate the full content of your div in a PHP page. When the main page loads, include that page inside the div, and then use JS/jQuery to refresh it every so many seconds.
This way a search bot that visits the site sees the current content, and users see it update.
Updating the div can be done quite easily with jQuery's AJAX functions.
I've tried a bunch of techniques to crawl this url (see below), and for some reason the title comes back incorrect. If I look at the source of the page with firebug I can see the correct title tag, however, if I view the page source it's different.
Using several php techniques I get the same result. Digg is able to crawl the page and parse the correct title.
Here's the link: http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android
The correct title is "How to Make Your iPhone (or Other iOS Device) More Like Android"
The parsed title is "Lifehacker, tips and downloads for getting things done"
Is this normal? How are they doing this? Is there a way to get the correct title?
That's because when you request it using PHP (without any JS support) you're getting the main page of lifehacker - which is lifehacker.com.
Lifehacker switched their CMS recently so that all requests go to an initial page, and everything after the hashbang (#!) is read by a JS script in that page to figure out which content needs to be served. You need to modify your program to take this into account.
EDIT
Have a gander at these links
http://code.google.com/web/ajaxcrawling/docs/getting-started.html
http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch
Found the answer:
http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android
becomes:
http://lifehacker.com/?_escaped_fragment_=5772420/how-to-make-ios-more-like-android
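The rewrite above can be done mechanically. The following is a minimal sketch, assuming the scheme described at the ajaxcrawling link: everything after #! moves into an _escaped_fragment_ query parameter. The function name toEscapedFragmentUrl is hypothetical, and the set of characters it percent-escapes (%, #, &, + and whitespace) is a simplifying assumption rather than the full rule set.

```javascript
// Convert a hashbang URL into its _escaped_fragment_ form.
// URLs without "#!" are returned unchanged.
function toEscapedFragmentUrl(url) {
  const idx = url.indexOf('#!');
  if (idx === -1) {
    return url;
  }
  const base = url.slice(0, idx);
  const fragment = url.slice(idx + 2);
  // Assumption: escape only characters that would break the query string.
  const escaped = fragment.replace(/[%#&+\s]/g, (ch) =>
    '%' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0')
  );
  // Append to an existing query string if the base already has one.
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + '_escaped_fragment_=' + escaped;
}

console.log(
  toEscapedFragmentUrl('http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android')
);
```

For the Lifehacker link this produces the same URL as the one shown in the answer. Whether a given site responds to that URL with the full page is up to the site, so the result should still be checked before relying on it.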
When I view the page source of a page (like this for example: http://my.sa.ucsb.edu/public/curriculum/coursesearch.aspx) there's not very much code/info in it. On that linked page, for instance, none of the class info is shown in the page source.
BUT: when I view it in firebug, I can see a lot more of the html information. For instance, I can see all of the class info, in tables.
Why is this? How can I access the full (firebug html)? Can I do it in php/javascript?
This is the order in which stuff happens:
PHP generates HTML
Browser loads HTML
JavaScript manipulates the loaded HTML
Why is this?
The browser's view source feature normally shows the plain HTML as received by the browser. More advanced tools like Firebug can display the current HTML after it has been changed by JavaScript. (Firefox itself has this feature as well: just right click on some generated HTML and choose "View selected source".)
How can I access the full (firebug html)?
I'm not sure about the HTML tab but the Network tab always displays documents as received from the server.
Can I do it in php/javascript?
PHP is no longer running when the original HTML reaches the browser.
JavaScript can display HTML with the .innerHTML property of any DOM node.
View Source shows what the browser got from the server. Firebug shows the browser DOM - i.e. representation of the page view that exists in browser memory. DOM can be changed by Javascript. Javascript can access DOM by using document value and then going to its children, etc. - for example, to see all tables, you might do document.getElementsByTagName('table')
If you want whole DOM contents as HTML, you can do something like document.getElementsByTagName('html')[0].innerHTML
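Both calls can be wrapped in small helpers. This is a sketch for illustration only: serializeCurrentDom and countTables are hypothetical names, and the helpers take the document as a parameter so they can also be tried outside a browser with a stand-in object. documentElement.outerHTML is used here instead of innerHTML so that the html element itself is included in the output.

```javascript
// Serialize the live DOM (as currently modified by scripts) to an HTML string.
function serializeCurrentDom(doc) {
  return doc.documentElement.outerHTML;
}

// Count the table elements currently present in the DOM.
function countTables(doc) {
  return doc.getElementsByTagName('table').length;
}

// In a browser console this logs the current state of the page,
// which may differ from what "View Source" shows.
if (typeof document !== 'undefined') {
  console.log(countTables(document) + ' table(s) in the current DOM');
  console.log(serializeCurrentDom(document));
}
```

Run from the console on a page that builds its tables with JavaScript, this should show the tables that are missing from the plain source view.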
View Source simply shows you the HTML loaded from the server, which means that any changes made to the DOM after the page has loaded will not be shown. The page source is only the original markup, as it was delivered.
On the other hand, Firebug is dynamic and shows you the DOM and how it is being manipulated. When the DOM is being changed, Firebug's source will change as well. This is important for debugging as you can see what is really going on, unlike the View Source.
When you use "View Source", the HTML you see is the HTML of the URL you are on, without any modifications made by JavaScript or the like.
Also, if the page had frames or iframes in its code, the content of them will not show either.
Instead, in firebug, changes to HTML dynamically and content of frames/iframes will be visible.
Also, viewing the source of a page before it has fully loaded can be a reason for not seeing the whole HTML code (or any HTML code at all).
Traversing the HTML code with JavaScript will always return the full updated HTML code. (i.e. what you would see in firebug)
I'm not sure how you want to access the HTML with PHP, but PHP does not have access to the code after it reaches the browser. But if you are sending a URL to PHP to load the HTML, the HTML you will have is the original HTML before any dynamic changes (i.e. the one you would see in "View Source")
Firebug will also show you the CSS file, which the main HTML only references via
<link rel="stylesheet" type="text/css" href="css">
so it shows some more information there as well.
Page source shows you the HTML as it was when the page first loaded. It does not show modifications made by JavaScript after the page loaded or after you clicked a button on the page. To view the currently visible DOM, you can use the following:
For IE/Firefox, the following bookmarklet works:
https://www.squarefree.com/bookmarklets/webdevel.html#generated_source
For google chrome, right click on any element and choose 'Inspect Element' option. It will show the position of element in DOM. Now right click on '
For Opera, right click on any element and choose 'Inspect element'. This will start Opera Dragonfly. In the Dragonfly window, click the 'Expand the DOM tree' button (first button with a dot and two arrows) and then the 'export the current DOM panel' button (second button).
In IE, open the webpage and press F12 to open developer tools. Click View->Source->DOM(page) or shortcut Ctrl+Shift+G in developer tools window. This will show the complete currently visible DOM.
For Firefox, an alternative is the Web Developer toolbar extension: choose View Source -> View Generated Source in it.
View source gives you the source of the page as it was loaded. To get the current HTML, there is an option in the Web Developer tool (Firefox add-on) called "View generated source".
In the menu:
View Source -> View Generated Source