Get contents of an element from an external page in PHP

I'd like to get the content of an element (CSS, children, etc.) and display it on an HTML page, but the element is on an external page. When I use:
$page = new DOMDocument();
$page->loadHTMLFile('about.php');
$text = $page->getElementById('text');
echo $text->nodeValue;
I only get the text, but #text also has an image as a child and some CSS. Can I get (and echo) those too, similar to an iframe but for a single element? If so, how?
Thanks a lot.

Maybe what you're looking for is DOMDocument::saveHTML().
If you pass the optional node argument, it outputs only that node (including its attributes and children) instead of the whole document.
$elm = $page->getElementById('text');
echo $elm->ownerDocument->saveHTML($elm);
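A self-contained sketch of the difference, using a hypothetical inline HTML string in place of about.php: nodeValue returns only the text content, while saveHTML($elm) returns the element's full markup, including inline style attributes and the <img> child. Stylesheet rules defined elsewhere on the page are not part of the node, so they are not included.

```php
<?php
// Hypothetical markup standing in for the #text element of the external page.
$html = '<div id="text" style="color: red"><img src="img/dummy.png" alt="x"><span>text</span></div>';

$page = new DOMDocument();
$page->loadHTML($html);

$elm = $page->getElementById('text');

// nodeValue only concatenates the text of the element and its descendants.
$textOnly = $elm->nodeValue;

// saveHTML($node) serializes the node itself: the tag, its attributes
// (including the inline style) and all child elements such as the <img>.
$markup = $page->saveHTML($elm);

echo $textOnly, "\n";
echo $markup, "\n";
```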

I have found a solution. It doesn't retrieve the CSS, but if you only need the element and its children, this is my best bet.
Use simple_html_dom.php to do all the hard stuff.
My external page:
<div id='text'>
<img src='img/dummy.png' align='left' alt='Image not available. Our apologies.'/>
<span>text</span><br/>
<p>
text
</p>
<p>
text
</p>
<p>
text
</p>
</div>
Now, the page on which I'd like to show the contents of my external page:
<?php include('../includes/simple_html_dom.php'); ?>
....
<?php
$html = file_get_html('about.php');
$ret = $html->find('div#text', 0);
echo $ret;
?>
This echoes the element with its children, but unfortunately without the CSS.
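If the missing CSS lives in the external page's own <style> blocks or <link rel="stylesheet"> tags, one possible approach is to copy those nodes along with the element. The sketch below uses PHP's built-in DOM extension rather than simple_html_dom, and the sample markup and the css/about.css path are hypothetical.

```php
<?php
// Hypothetical external page: a stylesheet link and a <style> block in the head.
$html = <<<'HTML'
<html><head>
<link rel="stylesheet" href="css/about.css">
<style>#text img { float: left; }</style>
</head><body><div id="text"><img src="img/dummy.png" alt=""><p>text</p></div></body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Collect the page-level CSS: <style> blocks and stylesheet <link> elements.
$css = '';
foreach ($xpath->query('//style | //link[@rel="stylesheet"]') as $node) {
    $css .= $doc->saveHTML($node) . "\n";
}

// The element itself, including its children.
$element = $doc->saveHTML($doc->getElementById('text'));

echo $css . $element;
```

Relative href values are resolved against the page that echoes them, not the external page, so they may need rewriting. The copied rules also apply to the rest of the including page, which may or may not be what you want.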

Related

Edit iframe content using PHP and preg_replace()

I need to load a third-party widget onto my website. The only way they distribute it is through a clumsy old <iframe>.
I don't have much choice, so I fetch the iframe's HTML through a proxy page on my website, like so:
$iframe = file_get_contents('http://example.com/page_with_iframe_html.php');
Then I need to remove some specific parts of the iframe content, like this:
$iframe = preg_replace('~<div class="someclass">[\s\S]*<\/div>~ix', '', $iframe);
This is meant to remove the unwanted section. Finally, I simply output the iframe:
echo ($iframe);
The iframe is output correctly, but the unwanted section is still there. I tested the regex on regex101, yet it doesn't work here.
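One likely cause: the x (extended) modifier makes PCRE ignore unescaped whitespace in the pattern, so the space in `<div class="someclass">` is dropped and the pattern can never match the real markup. A minimal sketch on a hypothetical sample string, dropping x and using a lazy quantifier:

```php
<?php
// Stand-in for the fetched iframe HTML (hypothetical sample).
$iframe = '<p>keep me</p><div class="someclass"><span>hotline</span></div><p>keep me too</p>';

// With the x modifier, literal whitespace in the pattern is ignored, so this
// effectively searches for '<divclass="someclass">' and matches nothing.
$broken = preg_replace('~<div class="someclass">[\s\S]*<\/div>~ix', '', $iframe);

// Without x, the space is matched literally. The lazy *? stops at the first
// closing </div> after the match start instead of the last one in the string.
$fixed = preg_replace('~<div class="someclass">[\s\S]*?</div>~i', '', $iframe);

echo $fixed, "\n";
```

The lazy match still ends at the first `</div>`, so it breaks if the unwanted div contains nested divs; a DOM parser does not have that limitation.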
Try this approach instead; I hope it helps. Here I use sample HTML and remove the div with the given class name: first I load the document, then I query for the node, and finally I remove it from its parent.
<?php
ini_set('display_errors', 1);
//sample HTML content
$string1='<html>'
. '<body>'
. '<div>This is div 1</div>'
. '<div class="someclass"> <span class="hot-line-text"> hotline: </span> <a id="hot-line-tel" class="hot-line-link" href="tel:0000" target="_parent"> <button class="hot-line-button"></button> <span class="hot-line-number">0000</span> </a> </div>'
. '</body>'
. '</html>';
$object= new DOMDocument();
$object->loadHTML($string1);
$xpathObj= new DOMXPath($object);
$result=$xpathObj->query('//div[@class="someclass"]');
foreach($result as $node)
{
$node->parentNode->removeChild($node);
}
echo $object->saveHTML();

How to parse multiple elements in portions of HTML via Simple HTML DOM

I am trying to get various elements inside an li, as shown below. I am fairly new to this, so I may not be using the most efficient methods, but this is where I have started.
Example code (simplified):
<li id='entry_0' title='09879879'>
<div ....>
<h2> The title text would go here </h2>
<span class='entrySize' ....> 20oz </span>
<span class='entryPrice' ....> $32.09 </span>
<span class='anotherEntry' ....> More Data I need To Grab </span>
.......
</div>
</li>
<li> .... With same structure as above .... 100's of entries like this </li>
I know how to pull out individual parts separately, but I'm having trouble grasping how to do it grouped within a portion of the HTML.
$filename = "directory/file.html";
$html = file_get_html($filename);
for($i=0; $i<=count(entryNumber);$i++)
{
$li_id = "entry_".$i;
foreach($html->find('li[id='.$li_id.']') as $li) {
echo $li->innertext;
}
}
This gets me the content of each list item, using the id number as the unique attribute. I would like to grab the h2 text, entrySize, entryPrice, etc. as I iterate through the list items. What I don't understand is, once I have a list item's content, how I can parse that item's inner tags and attributes. Other parts of the full HTML document may have tags with the same id or class as these, so I am breaking the document down into portions and parsing one section at a time.
I would also like to pull the title attribute out of the li tag.
I hope my explanation makes sense.
You should probably use a DOM parser. PHP comes bundled with one, and there are many others you could use.
http://php.net/dom
PHP Simple HTML DOM Parser
<?php
$html = file_get_contents($page); // $page: path or URL of the document to parse
$doc = new DOMDocument();
$doc->loadHTML($html);
// now find what you need
$items = $doc->getElementsByTagName('li');
foreach ($items as $item) {
    $id = $item->getAttribute('id');
    // ids in the question look like entry_0, entry_1, ...
    if (strpos($id, 'entry_') === 0) {
        // found a matching li, grab its children
    }
}
Use this as a baseline; we can't write all the code for you. Check out the PHP docs to finish it :) Starting from this, you need to follow the docs to grab the child values and handle them.
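A possible way to finish this, assuming the markup looks like the question's sample (the values below are made up): query the matching li elements with DOMXPath, then run further queries with each li as the context node so that every lookup stays inside that entry.

```php
<?php
// Sample markup modelled on the question; the values are made up.
$html = <<<'HTML'
<ul>
<li id="entry_0" title="09879879"><div>
  <h2> The title text would go here </h2>
  <span class="entrySize"> 20oz </span>
  <span class="entryPrice"> $32.09 </span>
</div></li>
<li id="entry_1" title="12345678"><div>
  <h2> Second title </h2>
  <span class="entrySize"> 12oz </span>
  <span class="entryPrice"> $5.00 </span>
</div></li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$entries = [];
// Only <li> elements whose id starts with "entry_".
foreach ($xpath->query('//li[starts-with(@id, "entry_")]') as $li) {
    // $li is passed as the context node, so ".//" searches only inside this item.
    $entries[] = [
        'id'    => $li->getAttribute('id'),
        'title' => $li->getAttribute('title'),
        'name'  => trim($xpath->evaluate('string(.//h2)', $li)),
        'size'  => trim($xpath->evaluate('string(.//span[@class="entrySize"])', $li)),
        'price' => trim($xpath->evaluate('string(.//span[@class="entryPrice"])', $li)),
    ];
}

print_r($entries);
```

Because every lookup is relative to `$li`, elements with the same class elsewhere in the document are not picked up.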

Select the content of a div using PHP

I have a div named "main" on my page. At the end of the page I put PHP code that converts HTML into a PDF. I want to select the content of that div (main contains paragraphs, charts, tables, etc.).
How?
The code below shows how to get a DIV tag's content using PHP.
PHP Code:
<?php
$content = "test.html";
$source = new DOMDocument();
$source->loadHTMLFile($content);
$path = new DOMXPath($source);
// Find every <div> whose id attribute is "test", anywhere in the document
$divs = $path->query("//div[@id='test']");
if ($divs->length > 0) {
    foreach ($divs as $div) {
        print "The Type of the element is: " . $div->nodeName . "\n<b><pre><code>";
        // Print the text of each child node of the div
        foreach ($div->childNodes as $child) {
            print $child->nodeValue;
        }
        print "</code></pre></b>";
    }
}
?>
This gets the DIV tag with the ID "test"; you can replace it with whichever one you need.
test.html
<div id="test">This is my content</div>
Output:
The Type of the element is: div
This is my content
You should put the PHP code into a separate file from the HTML and use something like DOMDocument to get the content from the div.
$dom = new DOMDocument();
$dom->loadHTMLFile('yourfile.html');
...
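For the div named "main", one way to continue is to serialize its children to get the inner HTML, which could then be passed to whatever HTML-to-PDF converter you use. A sketch with a hypothetical HTML string instead of yourfile.html, assuming "main" is the div's id:

```php
<?php
// Hypothetical page content; in practice this could come from loadHTMLFile('yourfile.html').
$html = '<body><div id="main"><p>Intro</p><table><tr><td>1</td></tr></table></div><p>footer</p></body>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$main = $dom->getElementById('main');

// Build the "inner HTML": serialize each child of #main, but not the #main tag itself.
$inner = '';
foreach ($main->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $inner;
```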
You cannot directly interact with the HTML DOM via PHP.
What you could do is use a <form> with an input containing your content. When the form is submitted, you can access the data via PHP.
But maybe you want to use JavaScript for that task?
Nevertheless, a quick'n'dirty PHP example:
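<form action="" method="post">
<textarea name="content">hello world</textarea>
<button type="submit">Send</button>
</form>
<?php
if (isset($_POST['content'])) {
    // Escape the posted text before echoing it back into the page
    echo htmlspecialchars($_POST['content']);
}
?>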

Using PHP X-Path to extract specific parts of a webpage

I am after a specific value from a webpage: the product name in the h1 tag:
<div id="extendinfo_container">
<h1><strong>Product Name</strong></h1>
<div style="font-size:0;height:4px;"></div>
<p class="text_breadcrumbs">
<img src="arrow_091.gif" align="absmiddle"/>
Product Name<img src="arrow_091.gif" align="absmiddle"/>
<strong>Product Name</strong>
<div class="dotted_line_blue">
<img src="theme_shim.gif" height="1" width="100%" alt=" " />
</div>
</div>
This is a poorly structured website with more than one h1, so I cannot simply use getElementsByTagName('h1').
I want to be as specific as possible in which element I get and this is the code I have:
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents('http://url/to/website'));
// locate <div id="extendinfo_container"><a><h1><strong>(.*)</strong></h1></a> as product name
$x = new DOMXPath($doc);
$pName = $x->query('//div[@id="extendinfo_container"]/a/h1/strong');
var_dump($pName->nodeValue);
This returns null. What query do I need to use to get the content I want?
query() returns a DOMNodeList, which doesn't have a nodeValue property. You have to select one element (e.g. the first):
$pName = $x->query('//div[@id="extendinfo_container"]/a/h1/strong')->item(0);
Or iterate over it:
foreach( $pName as $el) {
var_dump( $el->nodeValue);
}
Either one of these will give you access to a DOMNode, which is what you're looking for.
PHP's DOM can be picky about the HTML you load into it: it emits warnings on malformed markup, and the tree it builds may not be the one you expect.
Turn off error suppression (in @$doc->loadHTML, remove the @) and make sure it isn't choking on the page you're trying to analyze. Otherwise your XPath query looks fine, and if the document is loaded and parsed properly, it should work.
The query works fine. I was accessing the value wrong. Here is the correct way to access the value:
var_dump($pName->item(0)->nodeValue);
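Putting both answers together, a sketch on hypothetical markup: libxml_use_internal_errors(true) collects the parser warnings instead of hiding them with @, and item(0) picks the first match from the DOMNodeList. The query uses //h1 below the container div; that is an assumption made here because the snippet in the question shows no <a> wrapper, and // matches with or without one.

```php
<?php
// Sample markup modelled on the question (hypothetical; the <p> is left unclosed on purpose).
$html = '<div id="extendinfo_container"><a href="#"><h1><strong>Product Name</strong></h1></a>'
      . '<p class="text_breadcrumbs">Product Name</div>';

// Collect parser warnings instead of suppressing them with @.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$errors = libxml_get_errors();
libxml_clear_errors();

$x = new DOMXPath($doc);
$nodes = $x->query('//div[@id="extendinfo_container"]//h1/strong');

// query() returns a DOMNodeList; item(0) is the first match, or null if there is none.
$first = $nodes->item(0);
$productName = $first !== null ? $first->nodeValue : null;

foreach ($errors as $error) {
    echo 'libxml: ', trim($error->message), "\n";
}
var_dump($productName);
```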

PHP or Javascript: Simply Remove and Replace HTML Code

I have this code on my page, but the link has different names and ids:
<div class="myclass">
<a href="http://www.example.com/?vstid=00575000&veranstaltung=http://www.example.com/page.html">
Example Text</a>
</div>
How can I remove it and replace it with this:
<div class="myclass">Sorry no link</div>
with PHP or JavaScript? I tried it with str.replace.
Thank you!
I assume you mean dynamically? You won't be able to do this with PHP, because it runs server-side and has nothing to do with the HTML once it has been sent to the browser.
See: http://www.tizag.com/javascriptT/javascript-innerHTML.php for the JavaScript.
Or you could use jQuery, which is nicer than writing cross-browser-compatible JavaScript by hand.
$('.myclass').html('Sorry...');
If the replacement can be decided on the server before the page is sent, do this:
<?php if (allowed_to_see_link()) { ?>
<div class="myclass">
<a href="http://www.example.com/?vstid=00575000&veranstaltung=http://www.example.com/page.html">
Example Text</a>
</div>
<?php } else { ?>
non-link-text
<?php } ?>
and also write the allowed_to_see_link() function yourself.
You might want to clarify what you are trying to do. If that is your own file, you can simply open it in an editor and remove those portions. If you want to modify HTML with PHP, you can use the native DOM extension:
$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$xPath = new DOMXPath($dom);
foreach( $xPath->query('//div[@class="myclass"]/a') as $link) {
$link->parentNode->replaceChild(new DOMText('Sorry no link'), $link);
}
echo $dom->saveHTML();
The above code replaces every <a> element that is a direct child of a <div> with a class attribute of myclass with the text node "Sorry no link".
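If the div can carry more than one class (for example class="myclass highlighted"), @class="myclass" no longer matches, because it compares the whole attribute value. A common XPath 1.0 idiom checks for the class token instead; sketched here with hypothetical input:

```php
<?php
// Hypothetical input: the second div carries an extra class besides "myclass".
$htmlString = '<div class="myclass"><a href="http://www.example.com/">Example Text</a></div>'
            . '<div class="myclass highlighted"><a href="http://www.example.com/">Other Text</a></div>';

$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$xPath = new DOMXPath($dom);

// @class="myclass" would only match the first div; this predicate matches any
// div whose space-separated class list contains the token "myclass".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " myclass ")]/a';

// The XPath result is a snapshot, so replacing nodes while iterating is safe.
foreach ($xPath->query($query) as $link) {
    $link->parentNode->replaceChild(new DOMText('Sorry no link'), $link);
}

$result = $dom->saveHTML();
echo $result;
```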
