Scraping some data using PHP - php

I would like to track the scores of some selected users from https://location.services.mozilla.com/leaders. To do that, I want to scrape their data using the id, as you can see for the first user: <a id="g#v" href="#g#v">g#v</a>
I want to get his data using the id on that anchor. I have written the code below, but it does not work. Ideally I could create an array like $names=array("g#v","elly","elkos","grack"); and get the scores of all of them, so that if I add more names I get everyone's score. For now I was trying with a single user:
<?php
$html = file_get_contents('https://location.services.mozilla.com/leaders');
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
    $pokemon_doc->loadHTML($html);
    libxml_clear_errors();
    $pokemon_xpath = new DOMXPath($pokemon_doc);
    $pokemon_row = $pokemon_xpath->query('//a[@id="g#v"]');
    if($pokemon_row->length > 0){
        foreach($pokemon_row as $row){
            $name = $row->nodeValue;
            $scores = $pokemon_xpath->query('//.td.tr/td[@class="text-right"]', $row);
            foreach($scores as $score){
                $lead = $score->nodeValue;
            }
            echo $name . ": ". $lead;
        }
    }
}
?>

I tweaked your scraping code to display all users and their score:
<?php
$html = file_get_contents('https://location.services.mozilla.com/leaders');
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
    $pokemon_doc->loadHTML($html);
    libxml_clear_errors();
    $pokemon_xpath = new DOMXPath($pokemon_doc);
    $pokemon_row = $pokemon_xpath->query('//tr');
    if($pokemon_row->length > 0){
        // Get each tr
        foreach($pokemon_row as $row){
            // Get each td in the tr group
            $tds = $row->childNodes;
            foreach($tds as $td){
                // Filter out empty nodes
                if(trim($td->nodeValue, " \n\r\t\0\xC2\xA0") !== "") {
                    $name = $td->nodeValue;
                    echo $name . " ";
                }
            }
            echo "<br>" . PHP_EOL;
        }
    }
}
?>
This will print out:
Rank User Points
1. g#v 18444343
2. elly 4458330
3. elkos 4452607
4. grack 2769789
5. sonickydon 2707133
6. SDBoyd 2636721
...
If you want to search for a single user, then just replace the query line with this:
$pokemon_row = $pokemon_xpath->query('//tr[td/a[@id="g#v"]]');
which will print:
1. g#v 18444343
Hope that helps.
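To cover the array-of-names part of the question, here is a minimal sketch. The function name scores_for_users is made up for illustration, and it assumes each leaderboard row has the shape shown in the question (an anchor whose id is the user name, and a td with class "text-right" holding the points). The sample markup below stands in for the real page, so check it against the live HTML before relying on it.

```php
<?php
// Hypothetical helper (not part of any library): returns [name => score]
// for the requested names. Assumes each row looks like
// <tr><td>rank</td><td><a id="NAME">NAME</a></td><td class="text-right">POINTS</td></tr>
function scores_for_users(string $html, array $names): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Flip the names into keys for quick lookups; trim stray spaces
    $wanted = array_flip(array_map('trim', $names));
    $scores = [];

    foreach ($xpath->query('//tr[td/a[@id]]') as $row) {
        // Relative paths (no leading //) are evaluated against $row
        $anchor = $xpath->query('td/a[@id]', $row)->item(0);
        $name = $anchor->getAttribute('id');
        if (!isset($wanted[$name])) {
            continue;
        }
        $cell = $xpath->query('td[@class="text-right"]', $row)->item(0);
        if ($cell !== null) {
            $scores[$name] = trim($cell->nodeValue);
        }
    }
    return $scores;
}

// Sample markup standing in for the real leaderboard page
$sample = '<table>
<tr><th>Rank</th><th>User</th><th>Points</th></tr>
<tr><td>1.</td><td><a id="g#v" href="#g#v">g#v</a></td><td class="text-right">18444343</td></tr>
<tr><td>2.</td><td><a id="elly" href="#elly">elly</a></td><td class="text-right">4458330</td></tr>
<tr><td>3.</td><td><a id="elkos" href="#elkos">elkos</a></td><td class="text-right">4452607</td></tr>
<tr><td>4.</td><td><a id="grack" href="#grack">grack</a></td><td class="text-right">2769789</td></tr>
</table>';

foreach (scores_for_users($sample, ["g#v", "elkos", "grack"]) as $name => $score) {
    echo $name . ": " . $score . PHP_EOL;
}
```

For the live page you would pass the result of file_get_contents() instead of $sample. Adding a name to the array is then the only change needed to track another user.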

Related

php echo "%" displaying more than once

I'm testing some data scraping. The value I'm scraping is a percentage, so I basically slapped on a
echo "%<br>";
At the end of the actual number output which is
echo $ret_[66];
However there's an issue where the percent is actually appearing before the number as well, which is not desirable. This is the output:
%
-0.02%
Whereas what I'm trying to get is just -0.02%
Clearly I'm doing something wrong with the PHP. I'd really appreciate any feedback/solutions. Thank you!
Full code:
<?php
error_reporting(E_ALL^E_NOTICE^E_WARNING);
include_once "global.php";
$doc = new DOMDocument;
// We don't want to bother with white spaces
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile('http://www.moneycontrol.com/markets/global-indices/');
$xpath = new DOMXPath($doc);
$query = "//div[@class='MT10']";
$entries = $xpath->query($query);
foreach ($entries as $entry) {
    $result = trim($entry->textContent);
    $ret_ = explode(' ', $result);
    //make sure every element in the array don't start or end with blank
    foreach ($ret_ as $key => $val){
        $ret_[$key] = trim($val);
    }
    //delete the empty element and the element is blank "\n" "\r" "\t"
    //I modify this line
    $ret_ = array_values(array_filter($ret_,deleteBlankInArray));
    //echo the last element
    echo $ret_[66];
    echo "%<br>";
}
?>
<?php
echo "%<br>";
?>
I also tried putting the echo in a separate PHP block right after the main code, as above. It does the same thing.
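One plausible cause, stated as an assumption because the page markup is not shown: the XPath matches more than one div, and for a div with fewer than 67 tokens $ret_[66] is unset, so the loop prints only the "%" for it (the notice is hidden by the error_reporting line). A minimal sketch of guarding the echo follows. The helper name percent_at and the sample strings are made up for illustration and are not taken from the real page.

```php
<?php
// Hypothetical helper: returns the token at $index with "%" appended,
// or null when the text has no token at that position.
function percent_at(string $text, int $index): ?string
{
    // Split on any run of whitespace and drop empty tokens
    $tokens = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return isset($tokens[$index]) ? $tokens[$index] . '%' : null;
}

// Placeholder texts standing in for the textContent of two matched divs
$texts = ["Global Indices", "Nasdaq 4958.62 -0.02"];

foreach ($texts as $text) {
    $value = percent_at($text, 2);
    // Print only when the value exists, so no lone "%" appears
    if ($value !== null) {
        echo $value . "<br>" . PHP_EOL;
    }
}
```

In the original loop the same idea would be an isset($ret_[66]) check around both echo statements.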

How to parse xml and filter content in php via DOMXPath::query

The XML file is here:
https://drive.google.com/file/d/0B9NYyX8V32BmM0JsbkNYZW9tLWM/view?usp=sharing
I want to extract only the crucial details from the data.
My code does not produce any output on screen.
//demo.php
<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->load('settings.xml');
$xpath = new DOMXPath($doc);
$lineitems = $doc->getElementsByTagName('lineitems')->item(0);
$query = 'lineitem/itemcode';
$entries = $xpath->query($query, $lineitems);
foreach ($entries as $entry) {
    echo "Found {$entry->nextSiblings->nodeValue}," .
         " by {$entry->previousSibling->nodeValue}\n";
}
?>
I get a blank screen.
Output needed:
ITEMCODE QUANTITY
ITEMCODE QUANTITY
ITEMCODE QUANTITY
ITEMCODE QUANTITY
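Two things stand out in the code above. The DOM property is nextSibling (singular), so nextSiblings is undefined, and relying on sibling order is fragile anyway. Here is a minimal sketch that reads both values relative to each lineitem instead. The XML layout (lineitems > lineitem > itemcode and quantity, no namespaces) is an assumption, since the linked file is not reproduced here, and item_quantities is a made-up name.

```php
<?php
// Hypothetical helper: returns a list of [itemcode, quantity] pairs.
// Assumes <lineitems><lineitem><itemcode/><quantity/></lineitem>...</lineitems>
// with no XML namespaces.
function item_quantities(string $xml): array
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);

    $rows = [];
    foreach ($xpath->query('//lineitems/lineitem') as $item) {
        // string(...) yields the text of the first matching child, or "" if absent
        $code = $xpath->evaluate('string(itemcode)', $item);
        $qty  = $xpath->evaluate('string(quantity)', $item);
        $rows[] = [$code, $qty];
    }
    return $rows;
}

// Placeholder document standing in for settings.xml
$sampleXml = '<order><lineitems>
  <lineitem><itemcode>A100</itemcode><quantity>2</quantity></lineitem>
  <lineitem><itemcode>B200</itemcode><quantity>5</quantity></lineitem>
</lineitems></order>';

foreach (item_quantities($sampleXml) as [$code, $qty]) {
    echo $code . " " . $qty . PHP_EOL;
}
```

If the real file declares a default namespace, the element names would need a prefix registered with DOMXPath::registerNamespace() before these queries match anything.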

How do I prevent a timeout

I use this method to extract data from a website's HTML, but sometimes it gets stuck. How do I keep it from timing out? Rather than showing some weird errors, it should simply say: can't fetch result right now.
$html = file_get_contents('https://homeshopping.pk/search.php?category%5B%5D=&search_query='.$homeshoppingSearch);
$pk_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
    $pk_doc->loadHTML($html);
    libxml_clear_errors();
    $pk_xpath = new DOMXPath($pk_doc);
    //Homeshopping
    $pk_row = $pk_xpath->query('//a[@class=""]');
    $pk_row2 = $pk_xpath->query('//a[@class="price"]');
    $pk_row3 = $pk_xpath->query('(//a[@class="price"])//@href');
    //HomeShopping
    if($pk_row->length > 0){
        $rowarray = array();
        foreach($pk_row as $row){
            $rowarray[]= $row->nodeValue;
            // echo $row->nodeValue . "<br/>";
        }
    }
    if($pk_row2->length > 0){
        $row2array = array();
        foreach($pk_row2 as $row2){
            $row2array[]=$row2->nodeValue;
            // echo $row2->nodeValue . "<br/>";
        }
    }
    if($pk_row3->length > 0){
        $row3array = array();
        foreach($pk_row3 as $row3){
            $row3array[]=$row3->nodeValue;
            //echo $row3->nodeValue . "<br/>";
        }
    }
}
You can try using the set_time_limit function. It lets you set the maximum execution time of the script, in seconds.
Another way is to set the limit (max_execution_time) directly in php.ini, if you have the rights to edit it.
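Note that on non-Windows systems set_time_limit only counts the script's own execution time, not time spent waiting on stream operations, so it may not help with a slow remote server. A per-request timeout on the HTTP stream is another option. The sketch below is only an illustration: fetch_with_timeout is a made-up name and the 5 second default is an arbitrary choice.

```php
<?php
// Hypothetical helper: returns the response body, or null when the
// fetch fails or the read times out.
function fetch_with_timeout(string $url, float $seconds = 5.0): ?string
{
    // The 'http' context options also apply to https:// URLs
    $context = stream_context_create([
        'http' => ['timeout' => $seconds],
    ]);
    // @ hides the warning; failure is signalled by the return value instead
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Demonstration with a path that does not exist, so the failure branch runs
// without touching the network
$html = fetch_with_timeout(__DIR__ . '/does-not-exist.html', 2.0);
if ($html === null) {
    echo "Can't fetch result right now" . PHP_EOL;
} else {
    echo "Fetched " . strlen($html) . " bytes" . PHP_EOL;
}
```

In the question's code, the first file_get_contents call would be replaced by fetch_with_timeout with the search URL, and the DOM parsing would only run when the result is not null.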

How can I get a child element by class using PHP DOMXPath?

I want to get the child element with a specific class from HTML. I have managed to find elements by tag name, but I can't figure out how to get the child element with a specific class.
Here is my CODE:
<?php
$html = file_get_contents('myfileurl'); //get the html returned from the following url
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if (!empty($html)) { //if any html is actually returned
    $pokemon_doc->loadHTML($html);
    libxml_clear_errors(); //remove errors for yucky html
    $pokemon_xpath = new DOMXPath($pokemon_doc);
    //get all the li elements with class "content"
    $pokemon_row = $pokemon_xpath->query("//li[@class='content']");
    if ($pokemon_row->length > 0) {
        foreach ($pokemon_row as $row) {
            $title = $row->getElementsByTagName('h3');
            foreach ($title as $a) {
                echo "Title: ";
                echo strip_tags($a->nodeValue). '<br>';
            }
            $links = $row->getElementsByTagName('a');
            foreach ($links as $l) {
                echo "Link: ";
                echo strip_tags($l->nodeValue). '<br>';
            }
            $desc = $row->getElementsByTagName('span');
            //I tried that but didnt work..... iwant to get the span with class desc
            //$desc = $row->query("//span[@class='desc']");
            foreach ($desc as $d) {
                echo "DESC: ";
                echo strip_tags($d->nodeValue) . '<br><br>';
            }
            // echo $row->nodeValue . "<br/>";
        }
    }
}
?>
Please let me know in the comments if this is a duplicate (I couldn't find one), or if you think the question is poor or not explained well.
Thanks.
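The commented-out attempt fails because DOMElement has no query() method. The query has to go through the DOMXPath object, with the row passed as the context node and a leading "." in the expression. A minimal sketch, using made-up sample markup and a made-up function name (descriptions_in):

```php
<?php
// Hypothetical helper: collects the text of every span with class "desc"
// found inside an li with class "content".
function descriptions_in(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $out = [];
    foreach ($xpath->query("//li[@class='content']") as $row) {
        // The leading "." limits the search to descendants of $row;
        // "//span" without it would search the whole document every time
        foreach ($xpath->query(".//span[@class='desc']", $row) as $span) {
            $out[] = trim($span->nodeValue);
        }
    }
    return $out;
}

// Placeholder markup, not taken from the asker's page
$sample = '<ul>
<li class="content"><h3>First</h3><a href="/a">Link A</a>
  <span class="meta">ignored</span><span class="desc">Desc A</span></li>
<li class="content"><h3>Second</h3><span class="desc">Desc B</span></li>
</ul>';

foreach (descriptions_in($sample) as $desc) {
    echo "DESC: " . $desc . PHP_EOL;
}
```

The test @class='desc' only matches when the attribute is exactly "desc". For elements carrying several classes, contains(concat(' ', normalize-space(@class), ' '), ' desc ') is the usual XPath 1.0 workaround.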

How to parse the attribute value of a <a> tag in PHP

I am trying to parse an HTML page listing universities and colleges in the US for a database. The code I wrote fetches the names of the universities, but I am unable to fetch their respective URLs.
public function fetch_universities()
{
    $url = "http://www.utexas.edu/world/univ/alpha/";
    $dom = new DOMDocument();
    $html = $dom->loadHTMLFile($url);
    $dom->preserveWhiteSpace = false;
    $tables = $dom->getElementsByTagName('table');
    $tr = $tables->item(1)->getElementsByTagName('tr');
    $td = $tr->item(7)->getElementsByTagName('td');
    $rows = $td->item(0)->getElementsByTagName('li');
    $count = 0;
    foreach ($rows as $row)
    {
        $count++;
        $cols = $row->getElementsByTagName('a');
        echo "$count:".$cols->item(0)->nodeValue. "\n";
    }
}
This is my code that I have currently.
Please tell me how to fetch the attribute values as well.
Thank you
If you have a reference to an element, you just have to use getAttribute(), so probably:
echo "$count:".$cols->item(0)->getAttribute('href') . "\n";
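To get the name and the URL together, a minimal sketch along the same lines is shown below. The sample markup and the example.edu addresses are placeholders, and list_links is a made-up name. It also skips any li that has no anchor, which the original loop would choke on.

```php
<?php
// Hypothetical helper: returns one ['name' => ..., 'url' => ...] entry
// for the first <a> inside each <li>.
function list_links(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('li') as $li) {
        $a = $li->getElementsByTagName('a')->item(0);
        if ($a === null) {
            continue; // li without a link
        }
        $links[] = [
            'name' => trim($a->nodeValue),
            // getAttribute returns "" when the attribute is missing
            'url'  => $a->getAttribute('href'),
        ];
    }
    return $links;
}

// Placeholder list, not the real university page
$sample = '<ul>
<li><a href="http://www.example.edu/a">University A</a></li>
<li>No link here</li>
<li><a href="http://www.example.edu/b">University B</a></li>
</ul>';

foreach (list_links($sample) as $i => $link) {
    echo ($i + 1) . ":" . $link['name'] . " " . $link['url'] . "\n";
}
```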
