I am learning PHP and have covered some of the basics. Now I am eager to learn web page parsing.
I want to parse this page: http://www.icc-cricket.com/rankings/team-rankings/test
I want to extract only this part:
Rank Team Matches Points Rating
1 South Africa 24 3240 135
I would recommend the Symfony2 DomCrawler component: http://symfony.com/doc/current/components/dom_crawler.html
If you know basic PHP, I would recommend using this library: http://simplehtmldom.sourceforge.net/
It's simple to use.
You could have a look at http://simplehtmldom.sourceforge.net/, which lets you parse HTML pages rather easily.
That said, you should always check first whether the service offers a feed, because parsing a feed is less error-prone, more efficient, and the format (usually) doesn't change much. HTML markup can change over time, causing your DOM query to become invalid.
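If a feed is available, reading it can be sketched with PHP's built-in SimpleXML. This is only an illustration: the RSS content below is invented sample data, and nothing here confirms that the site in the question actually publishes a feed.

```php
<?php
// Sketch: read item titles and links from an RSS 2.0 feed with SimpleXML.
// The feed content is invented sample data; a real feed would be fetched
// from the service's feed URL first.

$feed_xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First item</title><link>http://www.example.com/1</link></item>
    <item><title>Second item</title><link>http://www.example.com/2</link></item>
  </channel>
</rss>
XML;

$feed  = simplexml_load_string($feed_xml);
$items = [];

if ($feed !== false) {
    foreach ($feed->channel->item as $item) {
        // Cast SimpleXMLElement nodes to plain strings
        $items[] = [
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        ];
    }
}

foreach ($items as $item) {
    echo $item['title'], ' -> ', $item['link'], PHP_EOL;
}
```

The element names (channel, item, title, link) are fixed by the feed format rather than by the page's layout, which is why this kind of code tends to break less often than a DOM query against HTML.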
It seems that those scores are added to the page via AJAX, so you cannot parse this link directly to get your rankings. The request appears to be sent to
http://cma.icc-cricket.com/api/getRankings?callback=onRankings&_1375776810417=
So you need to make a similar request and then process the data. The response is JSONP: a JSON array wrapped in a call to the onRankings callback.
Result from the URL:
onRankings([{"matchType":"TEST","rankings":[{"position":"1","team":{"fullName":"South Africa","abbreviation":"SA"},"qfyMatches":"0","played":"24","points":"3240","rating":"135"},{"position":"2","team":{"fullName":"India","abbreviation":"IND"},"qfyMatches":"0","played":"30","points":"3473","rating":"116"},{"position":"3","team":{"fullName":"England","abbreviation":"ENG"},"qfyMatches":"0","played":"32","points":"3577","rating":"112"},{"position":"4","team":{"fullName":"Australia","abbreviation":"AUS"},"qfyMatches":"0","played":"27","points":"2846","rating":"105"},{"position":"5","team":{"fullName":"Pakistan","abbreviation":"PAK"},"qfyMatches":"0","played":"19","points":"1947","rating":"102"},{"position":"6","team":{"fullName":"West Indies","abbreviation":"WI"},"qfyMatches":"0","played":"22","points":"2168","rating":"99"},{"position":"7","team":{"fullName":"Sri Lanka","abbreviation":"SL"},"qfyMatches":"0","played":"26","points":"2295","rating":"88"},{"position":"8","team":{"fullName":"New Zealand","abbreviation":"NZ"},"qfyMatches":"0","played":"27","points":"2126","rating":"79"},{"position":"9","team":{"fullName":"Bangladesh","abbreviation":"BAN"},"qfyMatches":"0","played":"13","points":"135","rating":"10"}]},{"matchType":"ODI","rankings":[{"position":"1","team":{"fullName":"India","abbreviation":"IND"},"qfyMatches":"0","played":"48","points":"5906","rating":"123"},{"position":"2","team":{"fullName":"Australia","abbreviation":"AUS"},"qfyMatches":"0","played":"34","points":"3861","rating":"114"},{"position":"3","team":{"fullName":"England","abbreviation":"ENG"},"qfyMatches":"0","played":"38","points":"4257","rating":"112"},{"position":"4","team":{"fullName":"Sri Lanka","abbreviation":"SL"},"qfyMatches":"0","played":"49","points":"5435","rating":"111"},{"position":"5","team":{"fullName":"South 
Africa","abbreviation":"SA"},"qfyMatches":"0","played":"34","points":"3584","rating":"105"},{"position":"6","team":{"fullName":"Pakistan","abbreviation":"PAK"},"qfyMatches":"0","played":"42","points":"4294","rating":"102"},{"position":"7","team":{"fullName":"New Zealand","abbreviation":"NZ"},"qfyMatches":"0","played":"29","points":"2593","rating":"89"},{"position":"8","team":{"fullName":"West Indies","abbreviation":"WI"},"qfyMatches":"0","played":"41","points":"3639","rating":"89"},{"position":"9","team":{"fullName":"Bangladesh","abbreviation":"BAN"},"qfyMatches":"0","played":"23","points":"1754","rating":"76"},{"position":"10","team":{"fullName":"Zimbabwe","abbreviation":"ZIM"},"qfyMatches":"0","played":"23","points":"1205","rating":"52"},{"position":"11","team":{"fullName":"Ireland","abbreviation":"IRE"},"qfyMatches":"0","played":"10","points":"394","rating":"39"},{"position":"12","team":{"fullName":"Netherlands","abbreviation":"NL"},"qfyMatches":"0","played":"7","points":"88","rating":"13"},{"position":"13","team":{"fullName":"Kenya","abbreviation":"KEN"},"qfyMatches":"0","played":"4","points":"40","rating":"10"}]},{"matchType":"T20I","rankings":[{"position":"1","team":{"fullName":"Sri Lanka","abbreviation":"SL"},"qfyMatches":"20","played":"16","points":"2003","rating":"125"},{"position":"2","team":{"fullName":"Pakistan","abbreviation":"PAK"},"qfyMatches":"31","played":"21","points":"2599","rating":"124"},{"position":"3","team":{"fullName":"India","abbreviation":"IND"},"qfyMatches":"18","played":"14","points":"1689","rating":"121"},{"position":"5","team":{"fullName":"South Africa","abbreviation":"SA"},"qfyMatches":"24","played":"18","points":"2158","rating":"120"},{"position":"4","team":{"fullName":"West 
Indies","abbreviation":"WI"},"qfyMatches":"22","played":"17","points":"2041","rating":"120"},{"position":"6","team":{"fullName":"England","abbreviation":"ENG"},"qfyMatches":"26","played":"19","points":"2148","rating":"113"},{"position":"7","team":{"fullName":"Australia","abbreviation":"AUS"},"qfyMatches":"23","played":"17","points":"1753","rating":"103"},{"position":"8","team":{"fullName":"New Zealand","abbreviation":"NZ"},"qfyMatches":"25","played":"19","points":"1937","rating":"102"},{"position":"unranked","team":{"fullName":"Afghanistan","abbreviation":"AFG"},"qfyMatches":"7","played":"6","points":"525","rating":"88"},{"position":"9","team":{"fullName":"Ireland","abbreviation":"IRE"},"qfyMatches":"12","played":"7","points":"568","rating":"81"},{"position":"10","team":{"fullName":"Bangladesh","abbreviation":"BAN"},"qfyMatches":"14","played":"10","points":"739","rating":"74"},{"position":"11","team":{"fullName":"Scotland","abbreviation":"Sco"},"qfyMatches":"9","played":"7","points":"435","rating":"62"},{"position":"12","team":{"fullName":"Zimbabwe","abbreviation":"ZIM"},"qfyMatches":"14","played":"10","points":"478","rating":"48"},{"position":"13","team":{"fullName":"Netherlands","abbreviation":"NL"},"qfyMatches":"8","played":"5","points":"181","rating":"36"},{"position":"14","team":{"fullName":"Kenya","abbreviation":"KEN"},"qfyMatches":"11","played":"9","points":"309","rating":"34"},{"position":"unranked","team":{"fullName":"Canada","abbreviation":"CAN"},"qfyMatches":"6","played":"4","points":"24","rating":"6"}]}]);
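A response like the one above can be processed by stripping the onRankings(...) wrapper and passing the rest to json_decode. The following is a minimal sketch, not a definitive implementation: jsonp_decode() and test_rankings() are hypothetical helper names, and the sample string is a shortened copy of the response shown above. In real use the string would come from a request to the API URL.

```php
<?php
// Sketch only: jsonp_decode() and test_rankings() are hypothetical helpers,
// not part of PHP or of any library.

// Strip a JSONP wrapper such as onRankings([...]); and decode the JSON inside.
function jsonp_decode(string $jsonp): array
{
    $start = strpos($jsonp, '(');
    $end   = strrpos($jsonp, ')');
    if ($start === false || $end === false || $end < $start) {
        throw new InvalidArgumentException('Not a JSONP response');
    }
    $json = substr($jsonp, $start + 1, $end - $start - 1);
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new UnexpectedValueException('Could not decode JSON payload');
    }
    return $data;
}

// Pick the TEST rankings and flatten them into table rows.
function test_rankings(array $data): array
{
    $rows = [];
    foreach ($data as $group) {
        if ($group['matchType'] !== 'TEST') {
            continue;
        }
        foreach ($group['rankings'] as $r) {
            $rows[] = [
                'rank'    => $r['position'], // kept as a string: may be "unranked"
                'team'    => $r['team']['fullName'],
                'matches' => (int) $r['played'],
                'points'  => (int) $r['points'],
                'rating'  => (int) $r['rating'],
            ];
        }
    }
    return $rows;
}

// Shortened sample taken from the response above
$sample = 'onRankings([{"matchType":"TEST","rankings":[{"position":"1",'
        . '"team":{"fullName":"South Africa","abbreviation":"SA"},'
        . '"qfyMatches":"0","played":"24","points":"3240","rating":"135"}]}]);';

$rows = test_rankings(jsonp_decode($sample));
foreach ($rows as $row) {
    echo implode(' ', $row), PHP_EOL; // prints: 1 South Africa 24 3240 135
}
```

The rank is left as a string because the response uses "unranked" for some teams, so casting it to an integer would lose information.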
But if you just want to learn HTML parsing, then you can also use Ganon.
In my view, it is not possible to parse this page directly, because the table is added through AJAX calls.
We can see an empty tag like this:
<section class="standings"></section>
If I have this all wrong, please correct me.
Thanks
I was wondering if there's a way to use PHP (or any other server-side or even client-side [if possible] language) to obtain certain pieces of information from a different website (NOT a local file, as with include 'nav.php').
What I mean is this: say I have a blog at www.blog.com and another website at www.mysite.com.
Is there a way to gather ALL of the h2 links from www.blog.com and put them in a div in www.mysite.com?
Also, is there a way I could grab all the content inside a DIV (with an ID, of course) from blog.com and insert it in mysite.com?
Thanks,
Amit
First of all, if you want to retrieve content from a blog, check whether the blog platform (e.g., Blogger, WordPress) has an API, so that you don't have to reinvent the wheel. Good APIs usually come with good documentation (although probably only about 5% of all APIs are good APIs), and that documentation should include code examples for popular languages such as PHP, JavaScript, and Java. Again, if the goal is to retrieve content from a blog, there should be plenty of libraries available to help you.
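Check out the PHP Simple HTML DOM library.
It can be as easy as:
// Load the library (adjust the path to wherever simple_html_dom.php lives)
include 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('http://www.otherwebsite.com/');
// Find all h2 elements and print their text
foreach ($html->find('h2') as $element) {
    echo $element->plaintext . "\n";
}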
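This can be done by fetching the remote page as a string, then loading the HTML into the DOM parser to work with it.
// Fetch the remote page as a string
$site_html = file_get_contents('http://www.example.com/');
// Keep warnings about malformed real-world HTML from being printed
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($site_html);
// DOMNodeList of every <h2> element in the page
$all_of_the_h2_tags = $document->getElementsByTagName('h2');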
Read more about PHP's DOM functions for what to do from here, such as grabbing other tags, creating new HTML out of bits and pieces of the DOM, and displaying that on your own site.
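As one possible next step, the h2 links and the contents of a single div can be pulled out with DOMXPath. This is a sketch under stated assumptions: the markup in $site_html and the id "content" are made up for illustration, and a real page would be fetched first.

```php
<?php
// Sketch: collect the links inside <h2> elements and the inner HTML of one
// <div> selected by id. The markup and the id "content" are invented.

$site_html = '<html><body>'
           . '<h2><a href="/post-1">First post</a></h2>'
           . '<div id="content"><p>Hello</p></div>'
           . '<h2><a href="/post-2">Second post</a></h2>'
           . '</body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$document = new DOMDocument();
$document->loadHTML($site_html);
libxml_clear_errors();

$xpath = new DOMXPath($document);

// Every <a> that is a direct child of an <h2>
$links = [];
foreach ($xpath->query('//h2/a') as $a) {
    $links[] = [
        'href' => $a->getAttribute('href'),
        'text' => trim($a->textContent),
    ];
}

// Inner HTML of the div with the given id (empty string if it is missing)
$div_html = '';
$div = $xpath->query('//div[@id="content"]')->item(0);
if ($div !== null) {
    foreach ($div->childNodes as $child) {
        $div_html .= $document->saveHTML($child);
    }
}

foreach ($links as $link) {
    echo $link['text'], ' (', $link['href'], ')', PHP_EOL;
}
echo $div_html, PHP_EOL;
```

The collected values can then be echoed into a div on your own page; relative href values such as /post-1 would need the other site's domain prepended before they work there.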
Your first step would be to use cURL to request the page you want from the other site and bring down its HTML. Then comes the part of parsing the HTML to find the content you're looking for. You could use a bunch of regular expressions and probably get the job done, but the Stack Overflow crew might frown at you. You could also take the resulting HTML, load it with the DOMDocument object's loadHTML method, and pull out the content you want.
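The cURL step might look roughly like this. It is a sketch that assumes the PHP cURL extension is installed; fetch_html() is a hypothetical helper name, and the timeout and user agent values are arbitrary choices.

```php
<?php
// Sketch: fetch a page with cURL and return its HTML, or null on failure.
// fetch_html() is a hypothetical helper; it requires the PHP cURL extension.

function fetch_html(string $url, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_USERAGENT      => 'example-fetcher/1.0', // arbitrary value
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and HTTP error statuses as failure
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

A call such as fetch_html('http://www.blog.com/') would return the page's HTML as a string, which can then be passed to DOMDocument's loadHTML.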
Also, if you control both sites, you can set up a special page on the first site (www.blog.com) with exactly the information you need, properly formatted either in HTML you can output directly, or XML that you can manipulate more easily from www.mysite.com.