extracting h2 header from website using simplehtmldom

I'm studying Simple HTML DOM.
As mentioned in its documentation, to retrieve headers from a website we proceed as follows:
<?php
include('simple_html_dom.php');
$html = file_get_html('https://www.w3schools.com/');
// find the h2 headers on the webpage
$headlines = array();
foreach($html->find('h2') as $header) {
$headlines[] = $header->plaintext;
}
print_r($headlines);
?>
When I test this sample on my local server, it prints only:
array ()
If I understood correctly, it should print:
Python, PHP, Java, etc.: everything that is inside an <h2> tag.
Am I missing something?
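One way to narrow this down is to run the same extraction against a known HTML string, so that the download step is out of the picture. Below is a minimal sketch using PHP's built-in DOMDocument rather than Simple HTML DOM; extract_h2_texts and the sample markup are made up for illustration. If this prints the headings while the original script prints an empty array, the fetch rather than the h2 selector is the more likely culprit, though that is an assumption and not something confirmed for this page.

```php
<?php
// Hypothetical helper: collect the text of every <h2> in an HTML string.
function extract_h2_texts(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally; real-world markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $headlines = [];
    foreach ($doc->getElementsByTagName('h2') as $header) {
        $headlines[] = trim($header->textContent);
    }
    return $headlines;
}

// Sample markup standing in for the downloaded page (an assumption).
$sample = '<html><body><h1>Tutorials</h1><h2>Python</h2><h2>PHP</h2></body></html>';
print_r(extract_h2_texts($sample));
```

To test against the real page, pass the downloaded HTML string to the same function and check first that the string is not empty.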

Related

Change src attribute of img, using Simple HTML Dom php library

I'm totally new to php, and I'm having a hard time changing the src attribute of img tags.
I have a website that pulls a part of a page using Simple Html Dom php, here is the code:
<?php
include_once('simple_html_dom.php');
$html = file_get_html('http://www.tabuademares.com/br/bahia/morro-de-sao-paulo');
foreach($html ->find('img') as $item) {
$item->outertext = '';
}
$html->save();
$elem = $html->find('table[id=tabla_mareas]', 0);
echo $elem;
?>
This code correctly returns the part of the page I want. But the img tags come with the src of the original page: /assets/svg/icon_name.svg
What I want is to change the original src so that it looks like this: http://www.mywebsite.com/wp-content/themes/mytheme/assets/svg/icon_name.svg
In other words, I want to put the URL of my site in front of /assets/svg/icon_name.svg
I have already tried some tutorials, but I could not get any of them to work.
Could someone please help a PHP beginner?

I got it working. In case someone has the same question, here is how I did it.
<?php
// Note: you must download simple_html_dom.php from
// https://sourceforge.net/projects/simplehtmldom/files/
// and then include it.
include_once('simple_html_dom.php');
// Target the website.
$html = file_get_html('http://the_target_website.com');
// Loop through all images in the HTML DOM.
foreach ($html->find('img') as $item) {
    // Get an attribute (for a non-value attribute such as checked or selected, this returns true or false).
    $value = $item->src;
    // Set an attribute; ltrim avoids a double slash when the original src starts with "/".
    $item->src = 'http://yourwebsite.com/' . ltrim($value, '/');
}
// Save the modified DOM.
$html->save();
// Find the div whose content you want.
$elem = $html->find('div[id=container]', 0);
// Output it using echo.
echo $elem;
?>
That's it!
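For comparison, here is a rough sketch of the same idea with PHP's built-in DOMDocument instead of Simple HTML DOM. The function name prefix_img_src and the sample URLs are assumptions for illustration. It only rewrites relative src values and leaves absolute URLs alone, which the loop above does not distinguish.

```php
<?php
// Hypothetical helper: prepend $base to every relative <img> src in $html.
function prefix_img_src(string $html, string $base): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Skip empty values and URLs that already have a scheme or start with "//".
        if ($src === '' || preg_match('#^([a-z][a-z0-9+.-]*:|//)#i', $src)) {
            continue;
        }
        // Join base and path with exactly one slash between them.
        $img->setAttribute('src', rtrim($base, '/') . '/' . ltrim($src, '/'));
    }
    // Returns the whole document, including the html/body wrapper libxml adds.
    return $doc->saveHTML();
}

$page = '<div id="container"><img src="/assets/svg/icon_name.svg"></div>';
echo prefix_img_src($page, 'http://yourwebsite.com/');
```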

Did you read the documentation on reading and modifying attributes?
According to it:
// Get an attribute (for a non-value attribute such as checked or selected, this returns true or false)
$value = $e->href;
// Set an attribute
$e->href = 'ursitename'.$value;

Issue with php simple html DOM parser in Joomla

I am trying to insert a stock chart module from another site into my own website.
When I use:
jimport('simplehtmldom.simple_html_dom');
// get DOM from URL or file
$html = file_get_html('http://www.raiffeisen.com/');
foreach($html->find('div#agrarfenster') as $element)
echo $element->innertext;
The output works. But I need this code for the required output:
jimport('simplehtmldom.simple_html_dom');
// get DOM from URL or file
$html = file_get_html('http://www.raiffeisen.com/');
foreach($html->find('div#boersenfenster_bf_4562') as $element)
echo $element->innertext;
This code doesn't work. But why?
My guess is that the underscores in "boersenfenster_bf_4562" are the cause.
Can somebody help me?
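Underscores are valid characters in an HTML id, so it may be worth checking first whether the element is present at all in the markup that gets downloaded. The sketch below uses PHP's built-in DOMDocument and DOMXPath, not Simple HTML DOM, so it says nothing about that library's selector parser; inner_html_by_id and the sample markup are hypothetical. If it returns null for the real page, one possible explanation (an assumption) is that the div is added later by JavaScript and never appears in the raw source.

```php
<?php
// Hypothetical helper: return the inner HTML of the div with the given id,
// or null when no such div exists in the markup.
function inner_html_by_id(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // This sketch assumes $id contains no double quotes.
    $nodes = $xpath->query('//div[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialize each child node to build the inner HTML.
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

$sample = '<div id="boersenfenster_bf_4562"><span>DAX</span></div>';
var_dump(inner_html_by_id($sample, 'boersenfenster_bf_4562'));
```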

Extracting data from HTML using Simple HTML DOM Parser

For a college project, I am creating a website with some back-end algorithms, and to test these in a demo environment I need a lot of fake data. To get this data I intend to scrape some sites. One of these sites is freelance.com. To extract the data I am using the Simple HTML DOM Parser, but so far I have been unable to get the data I need.
Here is an example of the HTML layout of the page I intend to scrape. The red boxes mark the required data.
Here is the code I have written so far after following some tutorials.
<?php
include "simple_html_dom.php";
// Create DOM from URL
$html = file_get_html('http://www.freelancer.com/jobs/Website-Design/1/');
//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table[id=project_table] tr') as $tr) {
foreach($tr->find('td[class=title-col]') as $t) {
//get the outer HTML of the cell
$data = $t->outertext;
echo $data;
}
}
?>
Hopefully someone can point me in the right direction as to how I can get this working.
Thanks.

The raw source code is different from what you see in the browser's inspector, which is why you're not getting the expected results.
You can check the raw source code using Ctrl+U. The data is in table[id=project_table_static], and the td cells have no attributes, so here is working code to get all the URLs from the table:
$url = 'http://www.freelancer.com/jobs/Website-Design/1/';
// Create DOM from URL
$html = file_get_html($url);
//Get all <tr> rows inside <table id="project_table_static">
foreach($html->find('table#project_table_static tbody tr') as $i=>$tr) {
// Skip the first empty element
if ($i==0) {
continue;
}
echo "<br/>\$i=".$i;
// get the first anchor
$anchor = $tr->find('a', 0);
echo " => ".$anchor->href;
}
// Clear dom object
$html->clear();
unset($html);
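The same row-by-row extraction can also be sketched with PHP's built-in DOMDocument and DOMXPath. The helper name first_links_per_row and the sample table are assumptions, and the XPath expects an explicit tbody in the markup. Unlike the loop above, it skips rows that have no anchor instead of reading href from a missing element.

```php
<?php
// Hypothetical helper: return the href of the first <a> in each body row
// of the table with the given id, skipping the first row.
function first_links_per_row(string $html, string $tableId): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // position() > 1 mirrors the "skip the first empty element" step.
    $rows = $xpath->query('//table[@id="' . $tableId . '"]/tbody/tr[position() > 1]');

    $links = [];
    foreach ($rows as $row) {
        $anchor = $xpath->query('.//a[@href]', $row)->item(0);
        if ($anchor !== null) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

// Sample markup standing in for the raw page source (an assumption).
$sample = '<table id="project_table_static"><tbody>'
    . '<tr><td></td></tr>'
    . '<tr><td><a href="/projects/1">First</a></td></tr>'
    . '<tr><td><a href="/projects/2">Second</a></td></tr>'
    . '</tbody></table>';
print_r(first_links_per_row($sample, 'project_table_static'));
```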

Save the contents of manipulated div to a variable and pass to php file

I have tried to use AJAX, but nothing I come up with seems to work correctly. I am creating a menu editor. I echo part of a file using php and manipulate it using javascript/jquery/ajax (I found the code for that here: http://www.prodevtips.com/2010/03/07/jquery-drag-and-drop-to-sort-tree/). Now I need to get the edited contents of the div (which has an unordered list in it) I am echoing and save it to a variable so I can write it to the file again. I couldn't get that resource's code to work so I'm trying to come up with another solution.
If there is a code I can put into the $("#save").click(function(){ }); part of the javascript file, that would work, but the .post doesn't seem to want to work for me. If there is a way to initiate a php preg_match in an onclick, that would be the easiest.
Any help would be greatly appreciated.
The code to get the file contents.
<button id="save">Save</button>
<div id="printOut"></div>
<?php
$header = file_get_contents('../../../yardworks/content_pages/header.html');
preg_match('/<div id="nav">(.*?)<\/div>/si', $header, $list);
$tree = $list[0];
echo $tree;
?>
The code to process the new div and send to php file.
$("#save").click(function(){
$.post('edit-menu-process.php',
{tree: $('#nav').html()},
function(data){$("#printOut").html(data);}
);
});
Everything is working EXCEPT something about my encoding of the passed data is making it not read as html and just plaintext. How do I turn this back into html?
EDIT: I was able to get this to work correctly. I'll make an attempt to switch this over to DOMDocument.
<?php
$path = '../../../yardworks/content_pages/header.html';
$menu = htmlentities(stripslashes(utf8_encode($_POST['tree'])), ENT_QUOTES);
$menu = str_replace("&lt;", "<", $menu);
$menu = str_replace("&gt;", ">", $menu);
$divmenu = '<div id="nav">'.$menu.'</div>';
/* Search for div contents in $menu and save to variable */
preg_match('/<div id="nav">(.*?)<\/div>/si', $divmenu, $newmenu);
$savemenu = $newmenu[0];
/* Get file contents */
$header = file_get_contents($path);
/* Find placeholder div in user content and insert slider contents */
$final = preg_replace('/<div id="nav">(.*?)<\/div>/si', $savemenu, $header);
/* Save content to original file */
file_put_contents($path, $final);
?>
Menu has been saved.

To post the contents of a div with ajax:
$.post('/path/to/php', {
my_html: $('#my_div').html()
}, function(data) {
console.log(data);
});
If that's not what you need, then please post some code with your question. It is very vague.
Also, you mention preg_match and html in the same question. I see where this is going and I don't like it. You can't parse [X]HTML with regex. Use a parser instead. Like this: http://php.net/manual/en/class.domdocument.php
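To make the parser suggestion concrete, here is a minimal sketch of replacing the contents of the nav div with DOMDocument. The function name replace_nav_contents and the sample page are hypothetical, and the sketch assumes ASCII content (loadHTML needs extra care with other encodings). A pattern like (.*?)<\/div> stops at the first closing div, so it breaks as soon as the menu contains a nested div; walking the DOM avoids that.

```php
<?php
// Hypothetical helper: replace the children of <div id="nav"> in $pageHtml
// with the markup in $newInnerHtml and return the resulting document.
function replace_nav_contents(string $pageHtml, string $newInnerHtml): string
{
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($pageHtml);

    // Parse the new markup in a separate document, wrapped in a div.
    $fragmentDoc = new DOMDocument();
    $fragmentDoc->loadHTML('<div>' . $newInnerHtml . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nav = $xpath->query('//div[@id="nav"]')->item(0);
    $wrapper = $fragmentDoc->getElementsByTagName('div')->item(0);
    if ($nav === null || $wrapper === null) {
        return $pageHtml; // nothing to replace
    }

    // Drop the old children, then copy the new ones across documents.
    while ($nav->firstChild) {
        $nav->removeChild($nav->firstChild);
    }
    foreach ($wrapper->childNodes as $child) {
        $nav->appendChild($doc->importNode($child, true));
    }
    return $doc->saveHTML();
}

$page = '<html><body><div id="nav"><ul><li>Old</li></ul></div><p>Body</p></body></html>';
echo replace_nav_contents($page, '<ul><li><div>New</div></li></ul>');
```

In the save handler, the posted markup would be passed as the second argument and the result written back with file_put_contents.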

How to code PHP function that displays a specific div from external file if div called by getElementById has no value?

Thank you for answering my question so quickly. I did some more digging and ultimately found a solution for grabbing data from a specific div in an external file and inserting it into another document using PHP DOMDocument. Now I'm looking to improve the code by adding an if condition that grabs data from a different div if the one initially requested by getElementById has no data. Here is the code I have so far.
External html as source.
<div id="tab1_header" class="cushycms"><h2>Meeting - 12:00pm to 3:00pm</h2></div>
My PHP file calling from source looks like this.
<?php
$source = "user_data.htm";
$dom = new DOMDocument();
$dom->loadHTMLFile($source);
$dom->preserveWhiteSpace = false;
$tab1_header = $dom->getElementById('tab1_header');
?>
<html>
<head>
<title></title>
</head>
<body>
<div><h2><?php echo $tab1_header->nodeValue; ?></h2></div>
</body>
</html>
The following function will output a message if a div id can't be found but...
if (!$tab1_header)
{
die("Element not found");
}
I would like to call for a different div if the one called for initially has no data. That is, if the source contains <div id="tab1_header"></div>, then grab <div id="alternate"><img src="filler.png" /></div> instead. Can someone help me modify the function above to achieve this result?
Thanks.

Either split up master.php so that div 1 and div 2 are each in their own file, or assign each of them to a variable, then include master.php and use the appropriate variable.
master.php
$d1='<div id="description1">Some Text</div>';
$d2='<div id="description2">Some Text</div>';
description1.php
include 'master.php';
echo $d1;

You can't do this solely with PHP includes unless you put the divs into separate files. Look into PHP templating; it's probably the best solution for this. Or, since you're new to the language, try using variables:
master.php
$description1 = '<div id="description1">Some Text</div>';
$description2 = '<div id="description2">Some Text</div>';
board1.php
include 'master.php';
echo $description1;
board2.php
include 'master.php';
echo $description2;
Alternatively, you could use JavaScript, but that might get a little messy.

Short answer: although it's possible, taking this approach is probably a very bad idea.
Longer answer: the solution may turn out to be too complicated. If your master.php file contains only HTML markup, you could read the content of that file and then parse it (e.g. with the DOMDocument library functions). You would have to look for a div with the given id.
<?php
// Path to the file to parse (assumed here to be master.php).
$file = 'master.php';
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div)
{
if( $div->getAttribute('id') == 'description1' )
{
echo $div->nodeValue."\n";
}
}
?>
If your master.php file also has some dynamic content, you could use the following trick:
<?php
ob_start();
include('master.php');
$sMasterPhpContent = ob_get_clean();
// same as above - parse HTML
?>
Edit:
$tab_header = $dom->getElementById('tab1_header') ? $dom->getElementById('tab1_header') : $dom->getElementById('tab2_header');
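Note that the ternary above only falls back when the first element is missing, not when it exists but is empty. A possible sketch that also handles the empty case follows; first_non_empty_div is a hypothetical helper, and "has data" is assumed to mean that the div contains visible text or at least one child element such as an img.

```php
<?php
// Hypothetical helper: return the first element among $ids that exists
// and contains either non-whitespace text or at least one child element.
function first_non_empty_div(DOMDocument $dom, array $ids): ?DOMElement
{
    $xpath = new DOMXPath($dom);
    foreach ($ids as $id) {
        // This sketch assumes the ids contain no double quotes.
        $node = $xpath->query('//*[@id="' . $id . '"]')->item(0);
        if (!$node instanceof DOMElement) {
            continue; // id not found, try the next one
        }
        $hasText = trim($node->textContent) !== '';
        $hasElement = $xpath->query('./*', $node)->length > 0;
        if ($hasText || $hasElement) {
            return $node;
        }
    }
    return null;
}

// Sample source standing in for user_data.htm, with an empty first div.
$dom = new DOMDocument();
$dom->loadHTML('<div id="tab1_header" class="cushycms"></div>'
    . '<div id="alternate"><img src="filler.png"></div>');

$chosen = first_non_empty_div($dom, ['tab1_header', 'alternate']);
if ($chosen === null) {
    die('Element not found');
}
// saveHTML on the node keeps child markup such as the <img>.
echo $dom->saveHTML($chosen);
```

Echoing nodeValue would print nothing for the alternate div, because an img has no text content, which is why the sketch serializes the node instead.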
