Using PHP Simple HTML DOM Parser for Google App Status

I want to use PHP Simple HTML DOM Parser to grab the Google Apps Status table so I can create my own dashboard that only includes the Google Mail and Google Talk service status, and so I can change the presentation (HTML, CSS).
As a test, I tried to find and output the table element, but it doesn't display any results.
$html = file_get_html('http://www.google.com/appsstatus');
$e = $html->find("table", 0);
echo $e->outertext;
However, if I find and output the first div element instead, it does display results.
$html = file_get_html('http://www.google.com/appsstatus');
$e = $html->find("div", 0);
echo $e->outertext;
Any help would be much appreciated.

It's way easier than that. All of this data is tied up in a JSON feed.
http://www.google.com/appsstatus/json/en
On a simple level, you could do a file_get_contents() of that, knock off the bit at the front that says dashboard.jsonp, and then do a json_decode(), and you will have yourself a nice array with all the information you'd ever want to know about Google's service status. Dump it with print_r() to see where everything is.
Finding these types of things is super easy with Fiddler. I highly recommend it.
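As a rough sketch of the steps above: strip the JSONP wrapper, then decode what is left. The helper name jsonp_to_array() and the sample payload are made up for illustration; the real feed's structure may differ, so dump it with print_r() before relying on any key.

```php
<?php
// Minimal sketch: turn a JSONP response such as dashboard.jsonp({...});
// into a PHP array. jsonp_to_array() is a hypothetical helper name.
function jsonp_to_array(string $body): ?array
{
    // Keep only what sits between the first "(" and the last ")".
    $start = strpos($body, '(');
    $end   = strrpos($body, ')');
    if ($start === false || $end === false || $end <= $start) {
        return null; // no JSONP wrapper found
    }
    $json = substr($body, $start + 1, $end - $start - 1);
    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
}

// Made-up payload for illustration only; the live feed's keys may differ.
// For the live feed you would pass in the result of
// file_get_contents('http://www.google.com/appsstatus/json/en') instead.
$sample = 'dashboard.jsonp({"services":[{"id":1,"name":"Google Mail"}],"messages":[]});';
$status = jsonp_to_array($sample);
print_r($status);
```

Returning null for anything that doesn't look like JSONP keeps the caller from silently working with a half-parsed string.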

Related

Get data from table with PHP Simple HTML DOM Parser

I want to extract data that are in table #buyOrdersTable from here
https://bittrex.com/Market/Index?MarketName=BTC-XRP
To do this I am using PHP Simple HTML DOM Parser library and following code:
<?php
include('simple_html_dom.php');
$html = file_get_html('https://bittrex.com/Market/Index?MarketName=BTC-XRP');
echo 'BTC/XRP<br>';
foreach ($html->find('div.buy-table-container tr.dyn-tr-add td') as $td) {
    echo $td->plaintext . '<br>';
}
?>
I want to extract every row from the BID section: SUM, TOTAL, SIZE (XRP), BID (BTC). But the code doesn't find any rows.
You can't do that. It's impossible, as explained by msg in the comments.
To do it properly, sign up for an API key, and call the API!
https://support.bittrex.com/hc/en-us/articles/115003723911-Developer-s-Guide-API
You'll probably want to use Guzzle or cURL to make your requests. You can find lots of tutorials showing how to connect to any API using either.
This may or may not help you. A while back I started writing a library that hooked up to the BTC-e exchange (now Wex.nz). You can make adapters for any exchange, so you could tweak this code if you like.
https://github.com/delboy1978uk/BTCExchange/blob/master/src/Exchange/BtcE.php
Which extends this class https://github.com/delboy1978uk/BTCExchange/blob/master/src/Exchange/ExchangeAbstract.php
Credit to msg for bothering to check Packagist. There are many ready-to-rock Bittrex API packages waiting to be installed! https://packagist.org/?query=bitrex-api
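For the cURL route, a minimal sketch might look like the following. The base URL, path and parameter names are placeholders, not Bittrex's actual endpoints; check the developer guide for the real ones. build_api_url() and fetch_json() are hypothetical helper names.

```php
<?php
// Build a request URL from a base, a path and query parameters.
function build_api_url(string $base, string $path, array $params = []): string
{
    $url = rtrim($base, '/') . '/' . ltrim($path, '/');
    if ($params) {
        $url .= '?' . http_build_query($params);
    }
    return $url;
}

// Fetch a URL with cURL and decode the JSON body; null on any failure.
function fetch_json(string $url): ?array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT        => 10,   // seconds
        CURLOPT_FAILONERROR    => true, // treat HTTP status >= 400 as a failure
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        return null;
    }
    $data = json_decode($body, true);
    return is_array($data) ? $data : null;
}

// Placeholder endpoint and parameters, for illustration only.
$url = build_api_url('https://api.example.com/v1', '/orderbook', [
    'market' => 'BTC-XRP',
    'type'   => 'buy',
]);
echo $url, PHP_EOL;
// $orders = fetch_json($url); // uncomment once $url points at a real endpoint
```

If the API needs a key or signed requests, those would go into extra headers via CURLOPT_HTTPHEADER; how exactly depends on the exchange's documentation.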

PHP - file_get_html not returning anything

I am trying to scrape data from this site, using "inspect" I am checking the class of the div, but when I try to get it, it doesn't display anything:
Trying to get the "Diamond" below "Supremacy".
What I am using:
<?php
include('simple_html_dom.php');
$memberName = $_GET['memberName'];
$html = file_get_html('https://destinytracker.com/d2/profile/pc/'.$memberName.'');
preg_match("/<div id=\"dtr-rating\".*span>/", $html, $data);
var_dump($data);
?>
FYI, simple_html_dom is a package available on SourceForge at http://simplehtmldom.sourceforge.net/. See the documentation.
file_get_html(), from simple_html_dom, does not return a string; it returns an object that has methods you can call to traverse the HTML document. To get a string from the object, do:
$url = 'https://destinytracker.com/d2/profile/pc/'.$memberName;
$html_str = file_get_html($url)->save(); // save() returns the HTML as a string; ->plaintext would strip the tags
But if you are going to do that, you might as well just do:
$html_str = file_get_contents($url);
and then run your regex on $html_str.
BUT ... if you want to use the power of simple_html_dom ...
$html_obj = file_get_html($url);
$the_div = $html_obj->find('div[id=dtr-rating]', 0);
$inner_str = $the_div->innertext;
I'm not sure how to do exactly what you want, because when I look at the source of the web link you provided, I cannot find a <div> with id="dtr-rating".
My other answer is about using simple_html_dom. After looking at the HTML doc in more detail, I see the problem is different than I first thought (I'll leave it there for pointers on better use of simple_html_dom).
I see that the web page you are scraping is a VueJS application. That means the HTML sent by the web server causes JavaScript to run and build the dynamic contents of the web page that you see displayed. That means the <div> you are looking for with your regex DOES NOT EXIST in the HTML sent by the server. Your regex cannot find anything because it's not there.
In Chrome, press Ctrl+U to see what the web server sent (no "Supremacy"). Press Ctrl+Shift+I and look under the "Elements" tab to see the HTML after the JavaScript has done its magic (this does have "Supremacy").
This means you won't be able to get the initial HTML of the web page and scrape it to get the data you want.

Simple HTML Dom Scraping Google Result

I need to scrape the very little piece of text which Google returns to any enquiry as part of the "Knowledge Graph" result (the one generally on the right-hand side) which it gets from Wikipedia. This way I can then convert the plain-text to Voice Answer. Using Simple HTML Dom I have no problems scraping such info from Bing or Ask, but the very DIV (and SPAN) within which this result is nested on Google, I just can't get it. Simple function below:
$question = str_replace(' ','+',$_GET['question']);
$address = 'http://www.google.co.uk/search?q='.$question;
$ret = scraping_Google($address);
function scraping_Google($url) {
// create HTML DOM
$html = file_get_html($url);
// get title
$ret = $html->find('div.kno-rdesc', 0)->plaintext;
// clean up memory
$html->clear();
unset($html);
return $ret;
}
echo $ret;
The very div.kno-rdesc is where the content is nested (this I easily retrieve using Code Inspector on Chrome). Yet, no success to parse this tiny piece of information. Anybody able to help out? Cheers!
You don't need to scrape it. Google has an API for that: "Tap into the power of Google's Knowledge Graph with Freebase data".

HTML content extraction using Diffbot

Can someone help me? I want to extract HTML data from http://www.quranexplorer.com/Hadith/English/Index.html. I have found a service that does exactly that, http://diffbot.com/dev/docs/, which supports data extraction via a simple API. The problem is that I have a large number of URLs that need to be processed. The URLs are in this file: http://test.deen-ul-islam.org/html/h.js
I need to create a script that follows each URL and then uses the API to generate the JSON format of the HTML data (the API allows batch requests; check the website docs).
Please note Diffbot only allows 10000 free requests per month, so I need a way to save the progress and be able to pick up where I left off.
Here is an example I created using php.
$token = "dfoidjhku";// example token
$url = "http://www.quranexplorer.com/Hadith/English/Hadith/bukhari/001.001.006.html";
$geturl="http://www.diffbot.com/api/article?tags=1&token=".$token."&url=".urlencode($url); // encode the target URL so it survives as a query parameter
$json = file_get_contents($geturl);
$data = json_decode($json, TRUE);
echo $article_title=$data['title'];
echo $article_author=$data['author'];
echo $article_date=$data['date'];
echo nl2br($article_text=$data['text']);
$article_tags=$data['tags'];
foreach($article_tags as $result) {
echo $result, '<br>';
}
I don't mind if the tool is in javascript or php I just need a way to get the html data in json format.
John from Diffbot here. Note: not a developer, but know enough to write hacky code to do simple things.
You have a list of links -- it should be straightforward to iterate through those, making a call to us for each.
Here's a Python script that does such: https://gist.github.com/johndavi/5545375
I used a quick search regex in Sublime Text to pull out the links from the JS file.
To truncate this, just cut out some of the links, then run it. It will take a while as I'm not using the Batch API.
If you need to improve or change this, best seek out a stronger developer directly. Diffbot is a dev-friendly tool.
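Since the question also asks about saving progress and resuming within the monthly quota, here is a small PHP sketch of one way to do that. It is not the linked Python script; process_batch() and the stub extractor are made-up names, and in real use the callback would call the Diffbot API (for example with file_get_contents() and json_decode(), as in the question).

```php
<?php
// Process URLs one by one, saving results after every call so that a
// later run skips what is already done. $maxRequests caps calls per run.
function process_batch(array $urls, string $progressFile, callable $extract, int $maxRequests): array
{
    // Load earlier results, if any; an empty or missing file means a fresh start.
    $done = is_file($progressFile)
        ? (json_decode(file_get_contents($progressFile), true) ?: [])
        : [];
    $used = 0;
    foreach ($urls as $url) {
        if (array_key_exists($url, $done)) {
            continue; // handled in a previous run
        }
        if ($used >= $maxRequests) {
            break; // quota for this run is spent
        }
        $done[$url] = $extract($url);
        $used++;
        // Persist after each request so an interruption loses nothing.
        file_put_contents($progressFile, json_encode($done), LOCK_EX);
    }
    return $done;
}

// Stub standing in for the real API call, so the sketch runs offline.
$fakeExtract = function (string $url): array {
    return ['title' => 'Article for ' . basename($url)];
};

$urls = [
    'http://example.com/a.html',
    'http://example.com/b.html',
    'http://example.com/c.html',
];
$progressFile = tempnam(sys_get_temp_dir(), 'progress');

$firstRun  = process_batch($urls, $progressFile, $fakeExtract, 2); // stops after 2 calls
$secondRun = process_batch($urls, $progressFile, $fakeExtract, 2); // picks up the third
echo count($firstRun), ' then ', count($secondRun), PHP_EOL;
unlink($progressFile);
```

Writing the progress file after every request is slower than writing once at the end, but it means a crash or timeout never costs you a paid request.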

How to parse this NOAA Weather Alert CAP in PHP?

Greetings,
I am having some difficulty understanding how to parse NOAA's Weather Alert CAP in PHP. I need to do the following:
Locate the proper county in the feed
Verify that there is an active alert
Display the alert's description
The feed I am working with is at this address - http://www.weather.gov/alerts/va.cap
I have used simplexml_load_string() in the past for this sort of thing but it does not seem to work for this feed.
Thanks!
After some more time on Google I came across a script that does exactly what I am trying to do. Rather than try to reinvent the wheel, I am going to go with it. http://saratoga-weather.org/scripts-atom.php#atomadvisory
You are probably having an issue due to the namespace
<cap:alert xmlns:cap='http://www.incident.com/cap/1.0'>
This should give you an idea of how to extract information
$sxe = simplexml_load_file('http://www.weather.gov/alerts/va.cap');
foreach ($sxe->getDocNamespaces() as $ns => $uri) {
$sxe->registerXPathNamespace($ns, $uri);
}
foreach($sxe->xpath('//cap:areaDesc') as $areaDesc) {
echo $areaDesc;
}
On a side note, SimpleXml is for simple XML only. Consider using DOM instead.
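To illustrate the DOM route for the three steps in the question (locate the county, check for an alert, show its description), here is a sketch using DOMDocument and DOMXPath. The sample XML is invented and only loosely shaped like a CAP 1.0 document, and the element layout (cap:info containing cap:event, cap:description and cap:area/cap:areaDesc) is an assumption; compare it with the real feed before using it. find_alerts_for_county() is a hypothetical helper name.

```php
<?php
const CAP_NS = 'http://www.incident.com/cap/1.0';

// Return event/description pairs for every info block whose area
// description mentions the given county (case-insensitive).
function find_alerts_for_county(string $xml, string $county): array
{
    $doc = new DOMDocument();
    if (!@$doc->loadXML($xml)) {
        return []; // not well-formed XML
    }
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('cap', CAP_NS); // bind the prefix used in the queries

    $alerts = [];
    foreach ($xpath->query('//cap:info') as $info) {
        $area = $xpath->evaluate('string(cap:area/cap:areaDesc)', $info);
        if (stripos($area, $county) !== false) {
            $alerts[] = [
                'event'       => $xpath->evaluate('string(cap:event)', $info),
                'description' => $xpath->evaluate('string(cap:description)', $info),
            ];
        }
    }
    return $alerts;
}

// Invented sample data; the real feed would come from
// file_get_contents('http://www.weather.gov/alerts/va.cap').
$sampleXml = <<<XML
<?xml version="1.0"?>
<cap:alert xmlns:cap="http://www.incident.com/cap/1.0">
  <cap:info>
    <cap:event>Flood Warning</cap:event>
    <cap:description>River levels are rising.</cap:description>
    <cap:area><cap:areaDesc>Fairfax (Virginia)</cap:areaDesc></cap:area>
  </cap:info>
  <cap:info>
    <cap:event>Frost Advisory</cap:event>
    <cap:description>Frost expected overnight.</cap:description>
    <cap:area><cap:areaDesc>Augusta (Virginia)</cap:areaDesc></cap:area>
  </cap:info>
</cap:alert>
XML;

$alerts = find_alerts_for_county($sampleXml, 'Fairfax');
if ($alerts) {
    foreach ($alerts as $alert) {
        echo $alert['event'], ': ', $alert['description'], PHP_EOL;
    }
} else {
    echo 'No active alerts.', PHP_EOL;
}
```

An empty result array doubles as the "no active alert" check, so the caller needs only one function call per county.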
