I'm trying to scrape the table containing "Institution Name:, Institution Type: etc" from the following website
http://cricos.deewr.gov.au/Institution/InstitutionDetails.aspx?ProviderID=00001
Using PHP and simple_html_dom i've generated the following code:
<?php
require 'simple_html_dom.php';
$html = file_get_html('http://cricos.deewr.gov.au/Institution/InstitutionDetails.aspx?ProviderID=00001');
$element = $html->find("table");
echo $element[1];
?>
I've iterated through tables (e.g $element[0], $element[1], $element[2] etc)
But do not find the correct table.
I then tried the same code on a local copy of the website:
<?php
require 'simple_html_dom.php';
$html = file_get_html('Institution Details.htm');
$element = $html->find("table");
echo $element[1];
?>
And it works I can see the table I am looking for under $element[1].
Why does it work on a local copy but no when using the url? How do I make the first set of code work?
Thankyou
Related
I have been playing around with this code for a while but cant get it to work properly.
My goal is to display or maybe even create a table with ID's of grabbed data from the steam store for my own website and game library. the class is 'game_area_description'
This is a study project of mine.
So i tried to get the table using the following code.
#section('selectedGame');
<?php
$url = 'https://store.steampowered.com/app/'.$game->appID."/";
header("Access-Control-Allow-Origin: ${url}");
$dom = new DOMDocument();
#$dom->loadHTMLFile($url);
$xpath = new DOMXpath($dom);
$elements = $xpath->query('//div[#class="game_area_description"]/a');
$link = $dom->saveHTML($elements->item(0));
echo $link;
?>
#endsection;
I am using Laravel by the way.
In some other cases i can get another piece of the website.
$url = 'https://store.steampowered.com/app/'.$game->appID."/";
$content = file_get_contents($url);
$first_step = explode( '<div class="game_description_snippet">' , $content );
$second_step = explode("</div>" , $first_step[1] );
echo "<p>${second_step[0]}</p>";
Here it just takes the excerpt of the webpage which works in some cases.
Here is the biggest issue, other than not beeing able to get all the information where i get an error $first_step[1]is not valid.
Is some CORE issue.
See the webpage loads an age check in some cases like "Batman Arkham knight". the user needs to either log in or verify their age first.
Keeping me from using the second block of code.
But the first gives me all kinds of errors as the screenshot shows.
Anyone know of a way to grab this part of the page?
Where the description of the game is?
The answer to my question was in the comments.
apparently steam has some undocumented API's .
here is the code ( with bootstrap CSS).
That i used and going ti implement in my migration tables and seeder
#section('selectedGame');
<div class="container border">
<!-- Content here -->
<?php
$url = "http://store.steampowered.com/api/appdetails?appids=".$game->appID;
$jsondata = file_get_contents($url);
$parsed = json_decode($jsondata,true);
$gameID = $game->appID;
$gameDescr = $parsed[$gameID]['data']['about_the_game'];
echo $gameDescr;
?>
</div>
#endsection;
im studying simple html dom.
as mentioned in their documentation, if we want to retrieve headers from website like , we would proceed as following:
<?php
include('simple_html_dom.php');
$html = file_get_html('https://www.w3schools.com/');
//to find h1 headers from a webpage
$headlines = array();
foreach($html->find('h2') as $header) {
$headlines[] = $header->plaintext;
}
print_r($headlines);
?>
when i test this sample on my local server, it prints only:
array ()
if i had well understood it should print:
Python Php Java etc....everything that is inside <h2> tag.
am i missing something?
I was recently asked to take part in a certain project that, for now, aims to parse a chunk of HTML code to PHP . Using a certain website that I have been assigned, I went through inspect element to complete my code's missing parts. The actual aim is to spit out (using echo) some certain data on localhost, without them being stored into a database or anything relevant. Attached is the html and PHP code in a few printscreens (couldnt upload the raw codes, dunno why). Thanks in advance!
Php code:
<?php
include_once('simple_html_dom.php');
$html = new simple_html_dom();
// Website link to scrap
$website = 'https://www1.gsis.gr/webtax3/etak/faces/main.jspx?_adf.ctrl-
state=16kjeyshcz_4&_afrLoop=70130840737831';
// Create DOM from URL or file
$html = file_get_html($website, false, null, 0);
//$html = str_get_html('<html><body><div id="pt1:r1:0:t3::db">Hello</div>
<div class="xx8">Goodbye</div></body></html>');
//$ret = $html->find('.xx8', 0)->plaintext;
if (is_array($html)) {
foreach($html->find('div[class=xx8]')->outertext as $data) {
echo $data->outertext;
}
}
?>
HTML code (via inspect element, where Δεν βρεθηκαν γηπεδα is the custom text of the page i told you about):
<div id="pt1:r1:1:t3::db" class="xx8"
style="position:relative;width:100%;overflow:hidden" _afrcolcount="30"><table
class="xxb xy3" style="table-layout:fixed;position:relative;width:2097px;"
cellspacing="0" _totalwidth="2097" _selstate="{}" _rowcount="0" _startrow="0">
<colgroup span="30"><col style="width:80px;"><col style="width:110px;"><col
style="width:105px;"><col style="width:105px;"><col style="width:105px;"><col
style="width:75px;"><col style="width:35px;"><col style="width:50px;"><col
style="width:55px;"><col style="width:80px;"><col style="width:65px;"><col
style="width:65px;"><col style="width:55px;"><col style="width:95px;"><col
style="width:65px;"><col style="width:55px;"><col style="width:75px;"><col
style="width:75px;"><col style="width:60px;"><col style="width:60px;"><col
style="width:60px;"><col style="width:60px;"><col style="width:50px;"><col
style="width:50px;"><col style="width:60px;"><col style="width:50px;"><col
style="width:62px;"><col style="width:125px;"><col style="width:55px;"><col
style="width:55px;"></colgroup></table>Δε βρέθηκαν γήπεδα.</div>
For a college project, I am creating a website with some back end algorithms and to test these in a demo environment I require a lot of fake data. To get this data I intend to scrape some sites. One of these sites is freelance.com.To extract the data I am using the Simple HTML DOM Parser but so far I have been unsuccessful in my efforts to actually get the data I need.
Here is an example of the HTML layout of the page I intend to scrape. The red boxes mark the required data.
Here is the code I have written so far after following some tutorials.
<?php
include "simple_html_dom.php";
// Create DOM from URL
$html = file_get_html('http://www.freelancer.com/jobs/Website-Design/1/');
//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table[id=project_table] tr') as $tr) {
foreach($tr->find('td[class=title-col]') as $t) {
//get the inner HTML
$data = $t->outertext;
echo $data;
}
}
?>
Hopefully someone can point me in the right direction as to how I can get this working.
Thanks.
The raw source code is different, that's why you're not getting the expected results...
You can check the raw source code using ctrl+u, the data are in table[id=project_table_static], and the cells td have no attributes, so, here's a working code to get all the URLs from the table:
$url = 'http://www.freelancer.com/jobs/Website-Design/1/';
// Create DOM from URL
$html = file_get_html($url);
//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table#project_table_static tbody tr') as $i=>$tr) {
// Skip the first empty element
if ($i==0) {
continue;
}
echo "<br/>\$i=".$i;
// get the first anchor
$anchor = $tr->find('a', 0);
echo " => ".$anchor->href;
}
// Clear dom object
$html->clear();
unset($html);
Demo
Thank you for answering my question so quickly. I did some more digging and ultimately found a solution for grabbing data from external file and specific div and posting it into another document using PHP DOMDocument. Now I'm looking to improve the code by adding an if condition that will grab data from a different div if the one called for initially by getElementById has now data. Here is the code for what I got so far.
External html as source.
<div id="tab1_header" class="cushycms"><h2>Meeting - 12:00pm to 3:00pm</h2></div>
My PHP file calling from source looks like this.
<?php
$source = "user_data.htm";
$dom = new DOMDocument();
$dom->loadHTMLFile($source);
$dom->preserveWhiteSpace = false;
$tab1_header = $dom->getElementById('tab1_header');
?>
<html>
<head>
<title></title>
</head>
<body>
<div><h2><?php echo $tab1_header->nodeValue; ?></h2></div>
</body>
</html>
The following function will output a message if a div id can't be found but...
if(!tab1_header)
{
die("Element not found");
}
I would like to call for a different div if the one called for initially has no data. Meaning if <div id="tab1_header"></div> then grab <div id="alternate"><img src="filler.png" /></div>. Can someone help me modify the function above to achieve this result.
Thanks.
either split up master.php so div1\2 are in a file each or set them each to a var, them include master.php, and use the appropriate variable
master.php
$d1='<div id="description1">Some Text</div>';
$d2='<div id="description2">Some Text</div>';
description1.php
include 'master.php';
echo $d1;
You can't do this solely with PHP includes unless you put the divs into separate files. Look into PHP templating; it's probably the best solution for this. Or, since you're new to the language, try using variables:
master.php
$description1 = '<div id="description1">Some Text</div>';
$description2 = '<div id="description2">Some Text</div>';
board1.php
include 'master.php';
echo $description1;
board2.php
include 'master.php';
echo $description2;
Alternatively, you could use JavaScript, but that might get a little messy.
Short answer is: although it's possible it's probably very bad idea taking this approach.
Longer answer: the solution may turn out to be too complicated. If in your master.php file is only HTML markup, you could read content of that file with file_get_contents() function and then parse it (i.e. with DOMDocument library functions). You would have to look for a div with given id.
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div)
{
if( $div->getAttribute('id') == 'description1' )
{
echo $div->nodeValue."\n";
}
}
?>
If your master.php file has also some dynamic content you could do following trick:
<?php
ob_start();
include('master.php');
$sMasterPhpContent = ob_get_clean();
// same as above - parse HTML
?>
Edit:
$tab_header = $dom->getElementById('tab1_header') ? $dom->getElementById('tab1_header') : $dom->getElementById('tab2_header');