How can I get a page's HTML on the client side, through JavaScript, in an ASP.NET application? For example, I want to get the HTML of http://www.yahoo.com on the client side through JavaScript or by any other means.
You can't get the HTML source of a page on a different hostname from JavaScript, for security reasons (the Same Origin Policy).
So unless you're Yahoo, you would have to run a proxy on the server-side that will fetch http://www.yahoo.com/ and then return its content to the client side via a string in a <script> block, or in the response to an XMLHttpRequest (also best JSON-encoded). This is known as a cross-domain proxy.
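A minimal sketch of the client half of such a proxy setup. The "/proxy" route and the {"html": "..."} response shape are assumptions made for illustration; you would have to implement that endpoint yourself on your own server.

```javascript
// Build the same-origin URL that asks our own server to fetch a remote page.
// "/proxy" is a hypothetical route, not part of any framework.
function buildProxyUrl(targetUrl, proxyPath = "/proxy") {
  return proxyPath + "?url=" + encodeURIComponent(targetUrl);
}

// Request the remote page through the proxy and resolve with its HTML.
// Assumes the proxy replies with JSON shaped like {"html": "..."}.
async function fetchRemoteHtml(targetUrl, fetchImpl = fetch) {
  const response = await fetchImpl(buildProxyUrl(targetUrl));
  if (!response.ok) {
    throw new Error("Proxy request failed with status " + response.status);
  }
  const payload = await response.json();
  return payload.html;
}
```

Because the request goes to your own host, the Same Origin Policy does not get in the way; the server does the cross-domain fetch.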
If you mean get the page html as a string in javascript, you can use:
var s = document.body.innerHTML;
Though you need to note that this does not give you the html exactly as sent to the browser, it gives you the html constructed from the DOM - essentially meaning any errors will have been fixed, as well as that it will include any dynamically created elements.
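If you want the whole page rather than just the body, something like the following may be closer to what you are after. It is only a sketch; getPageHtml is a name made up here, and the result is still the DOM serialized back to HTML (without the doctype), not the bytes the server sent.

```javascript
// Serialize the current DOM of a document to an HTML string.
// `doc` is expected to be the browser's `document` object.
function getPageHtml(doc) {
  // outerHTML of the root element includes <html>, <head> and <body>,
  // whereas document.body.innerHTML only covers what is inside <body>.
  return doc.documentElement.outerHTML;
}

// In a browser: var s = getPageHtml(document);
```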
Link: http://www.boutell.com/newfaq/creating/include.html
There are two ways to create client-side includes: JavaScript and iframe. Let's look at the advantages and disadvantages of both before we tackle how to do it.
The JavaScript method is the more seamless of the two. JavaScript code can fetch a fragment of a page from any URL and insert it into another page at any point. The end result looks as good as a server-side include— but only if JavaScript is turned on. And search engines don't see the included text at all, which is a serious problem.
The iframe method is simpler. The iframe element can be used to force a second page to "embed" inside the first page, in much the same way that Flash movies, videos and MP3 players are embedded with the object element. And JavaScript doesn't have to be turned on. But there are disadvantages here too. The iframe element has a fixed width and height, no matter how big the content is. That can mean scrollbars inside your page. And, as of this writing, Google doesn't appear to index the separate page referenced by the iframe so that searchers can find your page.
You use Ajax.
I recommend using the jQuery Ajax javascript library for this.
Do you mean a PHP function similar to file_get_contents($url)?
Related
I'm trying to come up with a way to get all the HTML/text that a user sees at any given URL, even though much of what they see may be produced dynamically (on page load, for example) and is not in the DOM. The idea is to load the JavaScript manually and put the resulting data back into the page.
My thinking is this:
(Naively) return an array of all JavaScript files by scraping the src attribute of every <script> tag.
Return an array of all on-page, hard-coded JavaScript, like: <script> var example = true; </script>
Create a function to determine the real URLs encountered in the internal and external page JavaScript. For example, when encountering $.ajax({ url: '/relative-js-file.js', it would figure out the absolute URL so that PHP can access that page.
Using PHP, load all of the JavaScript that was found on the page in a way that resembles it being loaded on the actual page itself (the page it came from).
Take whatever data the JavaScript returns (plain text, HTML, etc.), and inject this new plain text and/or HTML back into the original page <body>.
I do realize this will not work a lot of the time, but my hope is that it would at least be a good starting point until I can find a better solution or create a more advanced function to handle unrecognizable or inaccessible JavaScript. For example, the JavaScript itself might prevent it from being loaded on any page other than its own.
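The first and third steps (collecting script URLs and making them absolute) could look roughly like this. It is written in JavaScript purely for illustration; the function names are made up, the regex is as naive as the plan says, and the same logic would need porting to PHP.

```javascript
// Resolve a URL found in a page (or in its scripts) against the page address.
function toAbsoluteUrl(foundUrl, pageUrl) {
  return new URL(foundUrl, pageUrl).href;
}

// Naively collect the src attribute of every external <script> tag.
// Inline scripts have no src and are skipped by this pattern.
function scriptSources(html, pageUrl) {
  const pattern = /<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;
  const sources = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    sources.push(toAbsoluteUrl(match[1], pageUrl));
  }
  return sources;
}
```

A regex will miss unquoted attributes and anything written into the page at runtime, which is part of why this approach only works some of the time.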
My Question
Do you think this is a good approach to getting dynamic content that is not in the DOM, and forcing it in the DOM? Or can you think of a better approach? I appreciate your feedback and thoughts.
I have been working on parsing some of the data from the WoW Armory and have run into a bit of a snag. When the site serves up the achievements that players have received, it uses JavaScript to interpret a string such as #73:1283 to display the requested information. (I made this number up, but the data for the requests are formatted like this.)
Is it possible to pull data from a page that requires javascript to display its data with php?
How do you parse data from a site that has been loaded after the dom is ready or complete using php?
By using Firebug, I was able to look at the HTTP headers to see what AJAX calls were being made to generate the content on these pages: http://us.battle.net/wow/en/character/black-dragonflight/glitchshot/achievement#96:14861 and http://us.battle.net/wow/en/character/black-dragonflight/glitchshot/achievement#96
It looks like the page is making an asynchronous call to load this page: http://us.battle.net/wow/en/character/black-dragonflight/glitchshot/achievement/14861 when the part after the hash is 96:14861, and a call to http://us.battle.net/wow/en/character/black-dragonflight/glitchshot/achievement/96 when the part after the hash is just 96. Both of those pages return XML that can be parsed to render HTML.
So generally speaking, if there's just one number after the hash, just put http://.../achievement/<number here> as the URL. If there are two numbers, put the second number at the end of the URL instead.
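That rule can be sketched as a small function. achievementUrl is a hypothetical helper name, and the mapping is only what was observed on the two URLs above, so treat it as a guess about the site's behaviour rather than a documented API.

```javascript
// Map a page URL with a hash (#96 or #96:14861) to the URL the page
// appears to request in the background.
function achievementUrl(pageUrl) {
  const hashIndex = pageUrl.indexOf("#");
  if (hashIndex === -1) {
    return pageUrl; // no hash, nothing to rewrite
  }
  const base = pageUrl.slice(0, hashIndex);
  const numbers = pageUrl.slice(hashIndex + 1).split(":");
  // One number: use it. Two numbers: use the second one.
  return base + "/" + numbers[numbers.length - 1];
}
```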
What you'll need to do, rather than pulling the Javascript and interpreting it, is make HTTP requests to those URLs by yourself in PHP (using cURL, for example) and parse the data on your own.
I would really recommend learning JavaScript and jQuery, since it will be very hard for you to really build a good site that pulls information from the WoW Armory without understanding all the AJAX loads that are going on in the background.
I would recommend seeing if you can replicate the query sent by JavaScript in PHP. I don't believe there is a way to process JavaScript in PHP, and there definitely isn't a simple or scalable one.
I would attempt to scan the first page's source that you downloaded with PHP for strings of that format you mention. Then if the JS on their site is querying something like http://www.wow.com/armory.php?id=#72:1284 you can just download the source of that next. You can find out how the JS is querying the server with something like FireBug or the Inspector in Chrome or Safari.
So in summary:
Check to find the JS URL format and if you can replicate it.
Create PHP to get main page and extract all strings.
Create PHP to loop through these strings and get these pages (with URL that JS requests).
Do whatever you wanted to with that information.
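The "extract all strings" step might look like this. It is shown in JavaScript only as an illustration (PHP's preg_match_all would do the same job), and the pattern assumes every reference has the #number:number shape mentioned in the question.

```javascript
// Pull every "#73:1283"-style reference out of a page's source,
// dropping duplicates while keeping first-seen order.
function extractAchievementRefs(html) {
  const matches = html.match(/#\d+:\d+/g);
  return matches ? Array.from(new Set(matches)) : [];
}
```

Each extracted string can then be turned into whatever URL the site's JavaScript requests and fetched in a loop.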
You can try jQuery's $(document).ready function, which runs JavaScript code once the page has loaded.
Example:
<div id="wowoData">#4325325</div>
<script>
$(document).ready(function () {
  $("#wowoData").css("border", "1px solid red");
});
</script>
I have a site that will scrape for new data on the first page visit. I would like to use AJAX to do this, so that I can present the user with at least some loading.gifs during the scrape, but that is only if Javascript is enabled.
My site utilizes a PHP template engine, so I thought to put the scrape function in <noscript> tags in the html template. Since this would occur after all the PHP code, I would have to reload the page so I can render/parse the scraped data with PHP.
This method seems a little sloppy, so I was wondering if there is a more efficient way to do this.
Is there any reason why you need to do this with Javascript? Why not just do it with PHP, using cURL to get the content and other PHP extensions such as SimpleXML to get hold of the content? You could even cache the scraped content in a database table for a set amount of time so that each page load doesn't force PHP to do the scraping all over again.
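The caching idea is independent of language. Here is a minimal in-memory sketch of it in JavaScript; createTimedCache is a made-up name, and the Map stands in for the database table suggested above.

```javascript
// Cache values for a fixed time so the expensive loader (the scrape)
// only runs when there is no fresh copy. `now` is injectable for testing.
function createTimedCache(ttlMs, now = Date.now) {
  const entries = new Map();
  return {
    get(key, loader) {
      const hit = entries.get(key);
      if (hit && now() - hit.storedAt < ttlMs) {
        return hit.value; // still fresh, skip the scrape
      }
      const value = loader(key);
      entries.set(key, { value: value, storedAt: now() });
      return value;
    },
  };
}
```

With a database instead of a Map, the stored timestamp column plays the role of storedAt.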
The <noscript> tag will only be interpreted by the client (web browser) after the page has been sent, which means it's too late to trigger a PHP function. I would probably give up on the dual approach if I were you, but there might be some other (hacky) ways to accomplish this...
Maybe you could put an iframe in the <noscript> tag, whose src is a separate no-JS scraping script.
Or you could test js capability on a landing page, then send people to the page made for their setup.
Possible helpful links:
Check if JavaScript is enabled with PHP
Can I let PHP know that a user does not have javascript enabled?
Specifically. I am making an ajax app and trying to preserve the back button. My javascript is working properly and registering a new url in the address bar with an anchor-like hash in the url:
http://t2b.localhost/#/clients/
I can catch the url when the page loads with javascript and load the "clients" page, but I want to know if there is a way to read the entire url with php or with htaccess? Looking at normal variables, I seem to only be able to get the url up to the occurrence of the "#" (http://t2b.localhost/).
The browser doesn't send the fragment (the text after the #) of the URL to the server.
It is intended to be used locally by the client.
In Firefox (and in Internet Explorer too) there is document.location.hash, which contains the fragment part of the URL. With JavaScript you can read it and send its value to the server as an ordinary variable.
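For instance, the fragment could be copied into a query parameter of an Ajax request. This is just a sketch; the "/load.php" endpoint and the "route" parameter name are invented for the example.

```javascript
// Turn a fragment such as "#/clients/" into a request URL that carries it
// as a normal query parameter, which the server can read.
// "/load.php" and "route" are hypothetical names.
function fragmentToRequestUrl(hash, endpoint = "/load.php") {
  const route = hash.replace(/^#/, "");
  return endpoint + "?route=" + encodeURIComponent(route);
}

// In a browser: fetch(fragmentToRequestUrl(window.location.hash));
```

On the PHP side the value would then arrive like any other GET parameter.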
Please use any of the available javascript libraries to track the history state or browse by ajax requests. There are so many problems involved, such as certain browsers not notifying scripts when the hash part changes, or not adding a pseudo-'navigation' event to the browser's history list etc., that you'll end up recreating an expensive wheel that wouldn't work very well. I recommend YUI's History library, although it has problems on Google Chrome.
I'm pretty sure that you can't parse it strictly with PHP, because the hash part is handled only on the client side (JavaScript).
For history I'd recommend Ben Alman's BBQ plugin.
See: Can I read the hash portion of the URL on my server-side application (PHP, Ruby, Python, etc.)?
You could use JavaScript to set a cookie containing the current URL, then read it with PHP.
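A rough sketch of the cookie side. The cookie name "current_url" is an arbitrary choice; the value is URL-encoded because characters such as # and / are safer escaped in a cookie.

```javascript
// Build a cookie string holding the full current URL, fragment included.
// "current_url" is a hypothetical cookie name.
function currentUrlCookie(href, name = "current_url") {
  return name + "=" + encodeURIComponent(href) + "; path=/";
}

// In a browser: document.cookie = currentUrlCookie(window.location.href);
```

Note that PHP only sees the cookie on the next request, so the very first page load still won't know the fragment.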
How can I transfer an array from an IFrame running a php script, to the parent PHP of the IFrame?
I have an Array of strings in PHP that need to be transferred to the main window so that when the iframe changes pages, the array is still stored.
I could use a cookie for this, though I'm not sure that would be the best idea.
Thanks.
You can't do that in PHP. An iframe is like a new browser window, so the two pages are separate requests, and separate requests cannot transfer data between each other in a server-side language.
If you give some detail as to what you're trying to accomplish, there may be another way around the issue that someone can suggest.
Like Tim Hoolihan said, PHP is a server-side language that is only responsible for generating the HTML before it is sent to the browser. Meaning once the page shows up in your browser window, PHP has done its part.
However, with that said, if you control both the "parent" page and the page being iframed, you can json_encode the array in the page being iframed and assign it to a JavaScript variable, then on load pass it to a JavaScript function on the parent page (assuming you are not violating any browser/domain sandbox constraints). At that point you can do whatever you want with it.
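On the JavaScript side, that hand-off could be sketched like this. receiveItems is a hypothetical function you would define on the parent page, and items stands for the array that PHP printed into the iframe with json_encode.

```javascript
// Runs inside the iframe: pass the array up to the parent page, if there is
// one and it exposes the (hypothetical) receiveItems function.
function sendToParent(frameWindow, items) {
  const parentWindow = frameWindow.parent;
  // A top-level window is its own parent, so check for that case too.
  if (parentWindow && parentWindow !== frameWindow &&
      typeof parentWindow.receiveItems === "function") {
    parentWindow.receiveItems(items);
    return true;
  }
  return false;
}

// In the iframe page: window.onload = function () { sendToParent(window, items); };
```

The parent keeps its copy even after the iframe navigates to another page, which is what the question asks for. This only works when both pages are on the same origin.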
Take a look at jQuery for your core Javascript/Ajax needs.
If you control the iframe, you can save the array in a session variable and have the parent make an asynchronous call to retrieve the array from the session.
However, Jordan S. Jones's JavaScript-only solution works as well.