How to parse content loaded by javascript after the dom is complete - php

I have been working on parsing some of the data from the wow armory and have come into a bit of a snag. When it comes to the site serving up the achievements that players have received, it uses javascript to intemperate a string such as #73:1283 to display the requested information. (I made this number up but the data for the requests are formated like this).
Is it possible to pull data from a page that requires javascript to display its data with php?
How do you parse data from a site that has been loaded after the dom is ready or complete using php?

By using Firebug, I was able to look at the HTTP headers to see what AJAX calls were being made to generate the content on these pages: http://us.battle.net/wow/en/character/black-dragonflight/glitchshot/achievement#96:14861 and http://us.battle.net/wow/en/character/black-dragonflight/glitchshot/achievement#96
It looks the page is making an asynchronous call to load this page: http://us.battle.net/wow/en/character/black-dragonflight/glitchshot/achievement/14861 when the part after the hash is 96:14861, and a call to http://us.battle.net/wow/en/character/black-dragonflight/glitchshot/achievement/96 when the part after the hash is just 96. Both of those pages return XML that can be parsed to render HTML.
So generally speaking, if there's just one number after the hash, just put http://.../achievement/<number here> as the URL. If there are two numbers, put the second number at the end of the URL instead.
What you'll need to do, rather than pulling the Javascript and interpreting it, is make HTTP requests to those URLs by yourself in PHP (using cURL, for example) and parse the data on your own.
I would really recommend learning JavaScript and jQuery, since it will be very hard for you to really build a good site that pulls information from the WoW Armory without understanding all the AJAX loads that are going on in the background.

I would recommend seeing if you can replicate the query sent by JavaScript in PHP. While I don't believe there is a way to process JavaScript in PHP, there definitely isn't a simple or scalable way.
I would attempt to scan the first page's source that you downloaded with PHP for strings of that format you mention. Then if the JS on their site is querying something like http://www.wow.com/armory.php?id=#72:1284 you can just download the source of that next. You can find out how the JS is querying the server with something like FireBug or the Inspector in Chrome or Safari.
So in summary:
Check to find the JS URL format and if you can replicate it.
Create PHP to get main page and extract all strings.
Create PHP to loop through these strings and get these pages (with URL that JS requests).
Do whatever you wanted to with that information.

You can try jquery's $(document).onready function which helps
to run java script code when the web page loads up.
ex
<div id="wowoData">#4325325</div>
<script>
$(document).ready(
function(){
$("#wowoData").css("border","1px solid red");
}
)
</script>

Related

Get response from ajax and parse the response with html on webpage

I need to run a search using Ajax, such that the response I get from Ajax (should not contain HTML, it should only contain data) is fetched on webpage and then parse that response with HTML on the page and display.
I want to know can it be done, if yes then how to do it. Also is it going to make process run faster or consume less resources on server?
Since you have made no attempt to try to code this, I will give you a couple pointers.
1.) It is very possible, I do it on login forms.
2.) Post data to a external page, then encode the response on that page to an array in JSON. echo out the JSON on the external page.
3.) After your ajax post is finished, you can carry out a function similar to this:
function(data){alert(data.given_name_on_external_page)}
or something similar. Once you google around for Ajax form examples you should be able to grasp a little better.
4.) Now for displaying this on a web page, it is fairly easy.
HTML
<div id='response'></div>
Javascript
function(data){document.GetElementById('response').html=data.data};
That should be enough for you to understand what needs to be done, I will leave the rest to you and your ability to use google :).
use escape and load() functionality in JQUERY

How to include the static HTML results of a dynamic Javascript page in PHP?

I have a small script that pulls HTML from another site using Javascript.
I want to include that static HTML that gets pulled in a PHP page without any of the Javascript code appearing in the final PHP page that gets displayed.
I tried doing an include of the file with the Javascript code in the PHP page, but it just included the actual Javascript and not the results of the Javascript.
So how would I go about doing this?
You would need to fetch the page, execute the JavaScript in it, then extract the data you wanted from the generated DOM.
The usual approach to this is to use a web automation tool such as Selenium.
You simply can't.
You need to understand that PHP and Javascript operate on different places, PHP on the server and Javascript on the client.
Your only solution is to change the way all this is done and use "file_get_contents(url)" from PHP to get the same content your javascript used to get. This way, there is no javascript anymore and you can still pre-process your page with distant content.
You wouldn't be able to do this directly from within PHP, since you'd need to run Javascript code.
I'd suggest passing the URL (and any required actions such as click event, etc) to a headless browser such as Phantom or Zombie, and capturing the DOM from it once the JS engine has done it's work.
You could also use a real browser, but of course you don't need a UI in your case, and it might actually get in the way of what you're trying to do, so a headless browser might be better.
This sort of thing would normally be used for automated testing of a site (ie Functional Testing).
There is a PHP tool named Mink which can run these sorts of scripts from within a PHP program. It is aimed at writing test scripts, but I would imagine you could use it for your purposes.
Hope that helps.

php crawler for website with ajax content and https

i'm trying to grab the content of a website based on ajax and https but with no luck.
Is this possible.
The website i'm trying to crawl is this:
https://www.bet3000.com/en/html/home.html#!https://www.bet3000.com/html/en/eventssportsbook.html?category_id=2117
Thanks
If you take a look at the HTTP requests that this page is doing (using, for example, Firebug for Firefox), you'll notice it makes several Ajax requests.
Instead of trying to execute the Javascript code, a possible solution could be for you to request one of those URLs, and get the data -- you'd also not have to parse the HTML, this way.
In this specific case, one of those requests is made to the following URL :
https://www.bet3000.com/ajax/en/sportsbook.json.html?category_id=2117&offset=&live=&sportsbook_id=0
This URL seems to return some JSON data, that should interest you quite a bit ;-)
(There is a few characters before and after the JSON, that will need to be removed, but, asides from that, I don't see anything that doesn't look good.)

Use php function unless Javascript is enabled

I have a site that will scrape for new data on the first page visit. I would like to use AJAX to do this, so that I can present the user with at least some loading.gifs during the scrape, but that is only if Javascript is enabled.
My site utilizes a PHP template engine, so I thought to put the scrape function in <noscript> tags in the html template. Since this would occur after all the PHP code, I would have to reload the page so I can render/parse the scraped data with PHP.
This method seems a little sloppy, I was wondering if there is an efficient way to do this.
Is there any reason why you need to do this with Javascript? Why not just do it with PHP, using cURL to get the content and other PHP extensions such as SimpleXML to get hold of the content? You could even cache the scraped content in a database table for a set amount of time so that each page load doesn't force PHP to do the scraping all over again.
The <noscript> tag will only interpreted by the client (web browser) after the page has been sent, which means it's too late to trigger a php function. I would probably give up on the dual approach if I were you, but there might be some other (hacky) ways to accomplish this...
Maybe you could put an iframe in the <noscript> tag, whose src is no-js separate scraping script.
Or you could test js capability on a landing page, then send people to the page made for their setup.
Possible helpful links:
Check if JavaScript is enabled with PHP
Can I let PHP know that a user does not have javascript enabled?

PHP live updating

I was wondering if there is a way to use php to return the values from a search without having to reload the whole webpage or using iframes or anything like that. I've tried searching for it but I always end up with AJAX and I was wondering if there is a PHP way for it...
I suggest you read up on AJAX and what it is, as it is exactly what you are describing.
What AJAX is generating a request on the browser with javascript, sending the request to a server, generating content with whatever technology you want (be it PHP, .NET, etc.) and returning it to the browser, without the page ever 'reloading'. That's all it is, and that's what you want.
I recommend you check out something like jQuery as it is far away the most popular javascript library. It makes doing AJAX requests a piece of cake.
AJAX is what you're looking for. It means using JavaScript (on the browser) to initiate a request to the server (which may be running PHP, or any other language).
PHP is a server-side technology, and what you describe is mostly a client-side issue.
Every technology that does what you want is going to be very close to Ajax, so I suggest to just take a little time and get yourself going with Ajax. There are plenty of javascript frameworks around that make life easier for you as an Ajax programmer.
PHP is server-side. It can't do anything unless a web request is made (i.e. the user clicks on a link, requesting a page). This is why AJAX exists. The javascript on the client side is able to initiate a web request in the background and decide what to do with the response.
Check out jQuery. It makes AJAX a snap:
http://docs.jquery.com/Ajax
Yes I did the same using PHp and Mysql. What you can do is first create a PHP search page1 with a text box and write down some jQuery function for the onkeyup event of text box. Pass the value of the Text Box to the PHP search page2 and display its data in another blank DIV tag on your search page1. Let me know if you were able to get the concept, else I will forward you some link for that. infact I found a youtube video for this. Its not a difficult task.

Categories