php echo to html parsing issues - php

I was recently asked to take part in a certain project that, for now, aims to parse a chunk of HTML code to PHP . Using a certain website that I have been assigned, I went through inspect element to complete my code's missing parts. The actual aim is to spit out (using echo) some certain data on localhost, without them being stored into a database or anything relevant. Attached is the html and PHP code in a few printscreens (couldnt upload the raw codes, dunno why). Thanks in advance!
Php code:
<?php
include_once('simple_html_dom.php');
$html = new simple_html_dom();
// Website link to scrap
$website = 'https://www1.gsis.gr/webtax3/etak/faces/main.jspx?_adf.ctrl-
state=16kjeyshcz_4&_afrLoop=70130840737831';
// Create DOM from URL or file
$html = file_get_html($website, false, null, 0);
//$html = str_get_html('<html><body><div id="pt1:r1:0:t3::db">Hello</div>
<div class="xx8">Goodbye</div></body></html>');
//$ret = $html->find('.xx8', 0)->plaintext;
if (is_array($html)) {
foreach($html->find('div[class=xx8]')->outertext as $data) {
echo $data->outertext;
}
}
?>
HTML code (via inspect element, where Δεν βρεθηκαν γηπεδα is the custom text of the page i told you about):
<div id="pt1:r1:1:t3::db" class="xx8"
style="position:relative;width:100%;overflow:hidden" _afrcolcount="30"><table
class="xxb xy3" style="table-layout:fixed;position:relative;width:2097px;"
cellspacing="0" _totalwidth="2097" _selstate="{}" _rowcount="0" _startrow="0">
<colgroup span="30"><col style="width:80px;"><col style="width:110px;"><col
style="width:105px;"><col style="width:105px;"><col style="width:105px;"><col
style="width:75px;"><col style="width:35px;"><col style="width:50px;"><col
style="width:55px;"><col style="width:80px;"><col style="width:65px;"><col
style="width:65px;"><col style="width:55px;"><col style="width:95px;"><col
style="width:65px;"><col style="width:55px;"><col style="width:75px;"><col
style="width:75px;"><col style="width:60px;"><col style="width:60px;"><col
style="width:60px;"><col style="width:60px;"><col style="width:50px;"><col
style="width:50px;"><col style="width:60px;"><col style="width:50px;"><col
style="width:62px;"><col style="width:125px;"><col style="width:55px;"><col
style="width:55px;"></colgroup></table>Δε βρέθηκαν γήπεδα.</div>

Related

extracting h2 header from website using simplehtmldom

im studying simple html dom.
as mentioned in their documentation, if we want to retrieve headers from website like , we would proceed as following:
<?php
include('simple_html_dom.php');
$html = file_get_html('https://www.w3schools.com/');
//to find h1 headers from a webpage
$headlines = array();
foreach($html->find('h2') as $header) {
$headlines[] = $header->plaintext;
}
print_r($headlines);
?>
when i test this sample on my local server, it prints only:
array ()
if i had well understood it should print:
Python Php Java etc....everything that is inside <h2> tag.
am i missing something?

PHP Web Scrape with PHP Simple HTML DOM Parser

I am trying to get a data field using PHP Simple HTML DOM Parser. I can pull the links, images etc but cannot get a certain data attribute.
Example HTML -
<div id="used">
<div id="srpVehicle-1C3CCCEG2FN601809" class="vehicle" data-vin="1C3CCCEG2FN601809">
<div id="srpVehicle-1C3CCCEG2FN601810" class="vehicle" data-vin="1f2CfCEG2FN266778">
</div>
I would like to get all the "data-vin" fields on a site.
Here is my go at it -
$html = file_get_html($url);
foreach($html->find("div[data-vin]", 0) as $vin){
echo $vin."<br>";
}
But it returns the whole page when I echo $vin. How can I access that data-vin field?
$html->find("data-vin", 0)
is looking for tags named data-vin, when you really want tags with the attribute data-vin.
foreach($html->find("[data-vin]") as $tag){
echo $tag->getAttribute('data-vin')."<br>";
}

Edit iframe content using PHP, and preg_replace()

I need to load some 3rd party widget onto my website. The only way they distribute it is by means of clumsy old <iframe>.
I don't have much choice so what I do is get an iframe html code, using a proxy page on my website like so:
$iframe = file_get_contents('http://example.com/page_with_iframe_html.php');
Then I have to remove some specific parts in iframe like this:
$iframe = preg_replace('~<div class="someclass">[\s\S]*<\/div>~ix', '', $iframe);
In this way I intend to remove the unwanted section. And in the end i simply output the iframe like so:
echo ($iframe);
The iframe gets output alright, however the unwanted section is still there. The regex itself was tested using regex101, but it doesn't work.
You should try this way, Hope this will help you out. Here i am using sample HTML remove the div with given class name, First i load the document, query and remove that node from the child.
Try this code snippet here
<?php
ini_set('display_errors', 1);
//sample HTML content
$string1='<html>'
. '<body>'
. '<div>This is div 1</div>'
. '<div class="someclass"> <span class="hot-line-text"> hotline: </span> <a id="hot-line-tel" class="hot-line-link" href="tel:0000" target="_parent"> <button class="hot-line-button"></button> <span class="hot-line-number">0000</span> </a> </div>'
. '</body>'
. '</html>';
$object= new DOMDocument();
$object->loadHTML($string1);
$xpathObj= new DOMXPath($object);
$result=$xpathObj->query('//div[#class="someclass"]');
foreach($result as $node)
{
$node->parentNode->removeChild($node);
}
echo $object->saveHTML();

Extracting data from HTML using Simple HTML DOM Parser

For a college project, I am creating a website with some back end algorithms and to test these in a demo environment I require a lot of fake data. To get this data I intend to scrape some sites. One of these sites is freelance.com.To extract the data I am using the Simple HTML DOM Parser but so far I have been unsuccessful in my efforts to actually get the data I need.
Here is an example of the HTML layout of the page I intend to scrape. The red boxes mark the required data.
Here is the code I have written so far after following some tutorials.
<?php
include "simple_html_dom.php";
// Create DOM from URL
$html = file_get_html('http://www.freelancer.com/jobs/Website-Design/1/');
//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table[id=project_table] tr') as $tr) {
foreach($tr->find('td[class=title-col]') as $t) {
//get the inner HTML
$data = $t->outertext;
echo $data;
}
}
?>
Hopefully someone can point me in the right direction as to how I can get this working.
Thanks.
The raw source code is different, that's why you're not getting the expected results...
You can check the raw source code using ctrl+u, the data are in table[id=project_table_static], and the cells td have no attributes, so, here's a working code to get all the URLs from the table:
$url = 'http://www.freelancer.com/jobs/Website-Design/1/';
// Create DOM from URL
$html = file_get_html($url);
//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table#project_table_static tbody tr') as $i=>$tr) {
// Skip the first empty element
if ($i==0) {
continue;
}
echo "<br/>\$i=".$i;
// get the first anchor
$anchor = $tr->find('a', 0);
echo " => ".$anchor->href;
}
// Clear dom object
$html->clear();
unset($html);
Demo

Select Content of div using php

I have a div named "main" in my page. I put the code to convert a html into pdf using php at the end of page. I want to select the content (div named main contains paragraphs, charts, tables etc.).
How ?
Below code will show you how to get DIV tag's content using PHP code.
PHP Code:
<?php
$content="test.html";
$source=new DOMdocument();
$source->loadHTMLFile($content);
$path=new DOMXpath($source);
$dom=$path->query("*/div[#id='test']");
if (!$dom==0) {
foreach ($dom as $dom) {
print "
The Type of the element is: ". $dom->nodeName. "
<b><pre><code>";
$getContent = $dom->childNodes;
foreach ($getContent as $attr) {
print $attr->nodeValue. "</code></pre></b>";
}
}
}
?>
We are getting DIV tag with ID "test", You can replace it with your desired one.
test.html
<div id="test">This is my content</div>
Output:
The Type of the element is: div
This is my content
You should put the php code into a separate file from the html and use something like DOMDocument to get the content from the div.
$dom = new DOMDocument();
$dom->loadHTMLFile('yourfile.html');
...
You cannot directly interact with the HTML DOM via PHP.
What you could do, is using a with an input containing your content. When submitting the form you can access the data via PHP.
But maybe you want to use Javascript for that task?
Nevertheless, a quick'n'dirty PHP example:
<form action="" method="post">
<textarea name="content">hello world</textarea>
</form>
<?php
if (isset($_POST['content'])) {
echo $_POST['content'];
}
?>

Categories