How to extract raw HTML code using simplehtmldom - PHP

I am trying to extract raw html from a web-page using simplehtmldom. I was wondering if it is possible using that library.
For example, let's say I have this web page I am trying to extract data from.
<div class="class1">
<div class="class2">
<div class="class3">
<p>p1</p>
<h1>header here!</h1>
<p>p2</p>
<img src="someimage"></img>
</div>
</div>
</div>
My goal is to extract everything within the class3 div, including the raw HTML, so that I can paste the result into a text box that accepts source code and have it formatted the same way as on the web page.
I have looked at the simplehtmldom manual and searched around, but have yet to find a solution.
Thank you.

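Using your example HTML string, outertext returns the matched element together with its markup (innertext returns only its contents, without the wrapping div):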
$html = str_get_html('<div class="class1">
<div class="class2">
<div class="class3">
<p>p1</p>
<h1>header here!</h1>
<p>p2</p>
<img src="someimage"></img>
</div>
</div>
</div>');
// Find all divs with class3
foreach($html->find('div[class=class3]') as $element) {
echo $element->outertext;
}
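If simplehtmldom is not available, roughly the same result can be sketched with PHP's bundled DOM extension. This is a swapped-in technique, not the library's API: innerHtmlByClass() is a hypothetical helper name, and the sample string is a compacted copy of the question's markup.

```php
<?php
// Sketch using DOMDocument/DOMXPath instead of simplehtmldom.
// innerHtmlByClass() is a hypothetical helper, not a library function.
function innerHtmlByClass(string $source, string $class): ?string
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($source);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match $class as a whole word inside the class attribute.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $node = $xpath->query($query)->item(0);
    if ($node === null) {
        return null; // no such div
    }

    // Serialize each child node to get the raw inner HTML.
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return trim($html);
}

$sample = '<div class="class1"><div class="class2"><div class="class3">'
        . '<p>p1</p><h1>header here!</h1><p>p2</p><img src="someimage">'
        . '</div></div></div>';
echo innerHtmlByClass($sample, 'class3');
```

The returned string can be placed in a textarea after passing it through htmlspecialchars(), so the browser shows the markup instead of rendering it.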

Related

Extract Nested Tag Using PHP

Tag hierarchy in a webpage :
<body>
<div id='header'>
<h2>.....</h2>
</div>
<div id='main'>
<h2>...</h2>
//Some other content
<h2>...</h2>
</div>
<div id='footer'>
<h2>.....</h2>
</div>
</body>
Problem: from the above hierarchy of a web page, I want to extract only the <h2> tags that are inside <div id='main'>. Can someone please help me out?
What I have tried is PHP's HTML DOM: $h2Tags = $htmlDom->getElementsByTagName('h2');, but this also gives me the <h2> tags that are outside of the main div. Please guide me to a solution.
I have updated this answer for PHP.
$h2_tags below will hold the list of h2 elements inside the main div:
$div_m = $htmlDom->getElementById('main');
$h2_tags = $div_m->getElementsByTagName('h2');
The JavaScript equivalent:
var div_m = document.getElementById("main");
var h2_tag = div_m.getElementsByTagName('h2');
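As a self-contained variant of the PHP answer above, here is a minimal sketch that uses an XPath query to collect only the headings nested in the main div. The function name headingsInMain() and the heading texts in $page are made up for illustration; the question's own headings are elided.

```php
<?php
// headingsInMain() is a hypothetical helper name.
function headingsInMain(string $source): array
{
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($source);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $texts = [];
    // Only h2 elements that are descendants of the div with id="main".
    foreach ($xpath->query('//div[@id="main"]//h2') as $h2) {
        $texts[] = trim($h2->textContent);
    }
    return $texts;
}

// Placeholder page with the same structure as the question.
$page = "<body><div id='header'><h2>Site title</h2></div>"
      . "<div id='main'><h2>First</h2><p>text</p><h2>Second</h2></div>"
      . "<div id='footer'><h2>Footer</h2></div></body>";
print_r(headingsInMain($page));
```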

Getting element in PHP - PHP Simple HTML DOM Parser

Can you help me with the problem below?
I have the following HTML:
<div class="return-form">
<div class="two_cols">
<div class="first_col">
<label for="namesinger">Name:</label> </div>
<div class="second_col">
<p id="name">Axl Rose</p>
</div>
</div>
I am using the PHP Simple HTML DOM Parser library and I would like to display only the name "Axl Rose" on the screen, for example with:
echo $name;
Expected output:
Axl Rose
This is how you can extract the data:
<?php
// Assumes simple_html_dom.php is available on the include path
include 'simple_html_dom.php';
// Load the HTML
$html = str_get_html('<div class="return-form">
<div class="two_cols">
<div class="first_col">
<label for="namesinger">Name:</label> </div>
<div class="second_col">
<p id="name">Axl Rose</p>
</div>
</div>
</div>');
// Locate the paragraph by its ID and display its text
echo $html->find('p[id=name]', 0)->plaintext;
?>
For more details Read this

Store html content into php variable?

I have some html in my php page that I fetched from database and will be used many times in my webpage, so I want to put it into a php variable for later use.
My sample code is
<div class="info">
Some content from database here...
<div class="more">
Some more text...
</div>
</div>
How do I store this HTML in a PHP variable?
Please also tell me how to echo the content of that variable.
I do agree with @Jon Stirling, but you can do it like this:
$htmlData = '<div class="info">
Some content from database here...
<div class="more">
Some more text...
</div>
</div>';
and you can print it with:
echo $htmlData;
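When parts of the markup come from variables, a heredoc keeps the template readable. The sketch below assumes the database values are plain text (so they are escaped before being inserted); renderInfo() and its parameters are hypothetical names.

```php
<?php
// renderInfo() is a hypothetical helper; $content and $more stand in
// for values fetched from the database.
function renderInfo(string $content, string $more): string
{
    // Escape the values in case they contain characters such as < or &.
    $content = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');
    $more = htmlspecialchars($more, ENT_QUOTES, 'UTF-8');

    // Heredoc: variables inside {$...} are interpolated.
    return <<<HTML
<div class="info">
{$content}
<div class="more">
{$more}
</div>
</div>
HTML;
}

// Build the string once, then reuse it as often as needed.
$htmlData = renderInfo('Some content from database here...', 'Some more text...');
echo $htmlData;
echo $htmlData;
```

If the stored values already contain trusted HTML, the escaping step would have to be dropped, otherwise the tags would be shown as text.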

How to get content from Div which have other HTML tags using Regexp

I have a div which contains other HTML tags along with text.
I want to extract only the text from this div, including the text inside all nested HTML tags.
<div class="rpr-help m-chm">
<div class="header">
<h2 class="h6">Repair Help</h2>
</div><!-- /end .header -->
<div class="inner m-bsc">
<ul>
<li>Repair Video</li>
<li>Repair Q&A</li>
</ul>
</div>
<div>
<br>
<span class="h4">Cross Reference Information</span><br>
<p>Part Number 285753A (AP3963893) replaces 1195967, 280152, 285140, 285743, 285753, 3352470, 3363664, 3364002, 3364003, 62672, 62693, 661560, 80008, 8559748, AH1485646, EA1485646, PS1485646.
<br>
</p>
</div>
</div>
Here is my Regexp
preg_match_all("/<div class=\"rpr-help m-chm\">(.*)<\/.*>/s", $urlcontent, $description);
It works fine when I assign this complete div to the $urlcontent variable.
But when I fetch data from a real URL, like $urlcontent = "www.test.com/test.html";
it returns the complete web page source.
How can I get the content inside <div class="rpr-help m-chm">?
Does my regexp need any correction?
Any help would be appreciated. Thanks.
It's not possible to reliably parse HTML/XHTML with regex. Source:
You can't parse [X]HTML with regex. Because HTML can't be parsed by
regex. Regex is not a tool that can be used to correctly parse HTML.
Depending on the language you use, please consider using a third-party library for HTML parsing.
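In PHP, a parser-based approach does not strictly need a third-party library, since the bundled DOM extension can do it. The following is a minimal sketch under that assumption; textByClass() is a hypothetical helper name and $sample is a shortened copy of the question's markup.

```php
<?php
// textByClass() is a hypothetical helper, not part of any library.
function textByClass(string $source, string $classAttr): ?string
{
    libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc = new DOMDocument();
    $doc->loadHTML($source);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Exact match on the full class attribute value, as in the question.
    $node = $xpath->query('//div[@class="' . $classAttr . '"]')->item(0);
    if ($node === null) {
        return null; // div not found
    }
    // textContent joins the text of all descendants; collapse whitespace runs.
    return trim(preg_replace('/\s+/', ' ', $node->textContent));
}

// For a live page, $sample would come from file_get_contents() with a full
// URL including the scheme, not from the bare URL string itself.
$sample = '<div class="rpr-help m-chm"><div class="header">'
        . '<h2 class="h6">Repair Help</h2></div> '
        . '<div class="inner m-bsc"><ul><li>Repair Video</li> '
        . '<li>Repair Q&amp;A</li></ul></div></div><p>Outside the div</p>';
echo textByClass($sample, 'rpr-help m-chm');
```

Because the parser tracks nesting, the inner divs do not end the match early, which is the case a regex or a plain string split tends to get wrong.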
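Use this function:
function GetclassContent($tagStart, $tagEnd, $content)
{
    // Split at the opening tag; index 1 holds everything after it
    $first_step = explode($tagStart, $content);
    if (!isset($first_step[1])) {
        return null; // opening tag not found
    }
    // Cut at the first closing tag that follows.
    // Note: with nested divs this stops at the first inner </div>.
    $second_step = explode($tagEnd, $first_step[1]);
    return $second_step[0];
}
Steps to use the above function:
// file_get_contents() needs the scheme to fetch a URL
$website = "http://www.test.com/test.html";
$content = file_get_contents($website);
$tagStart = '<div class="rpr-help m-chm">';
// No space inside the closing tag, otherwise it never matches
$tagEnd = "</div>";
$RequiredContent = GetclassContent($tagStart, $tagEnd, $content);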

PhpQuery and replaceWith, How to?

I'm using phpQuery and I need to replace an iframe with another tag.
The HTML file contains an iframe:
<div id="content">
<div class="pad3"></div>
<iframe src="http://www.yahoo.com" id="iFrame"></iframe>
<div class="pad2"></div>
</div>
With this piece of code:
$doc = phpQuery::newDocumentFileHTML('file.htm');
$doc->find('iframe')->replaceWith('<p>test</p>');
I expected this:
<div id="content">
<div class="pad3"></div>
<p>test</p>
<div class="pad2"></div>
</div>
But nothing happens. Can someone give me some clues?
Best Regards
Try using the id of your iframe element:
$doc->find('#iFrame')->replaceWith('<p>test</p>');
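For comparison, the same replacement can be sketched without phpQuery, using PHP's bundled DOM extension. This is a different technique from the answer above; replaceIframes() is a hypothetical helper name. Note that the change only becomes visible once the modified document is serialized again.

```php
<?php
// replaceIframes() is a hypothetical helper built on DOMDocument.
function replaceIframes(string $source, string $text): string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($source);
    libxml_clear_errors();

    // Copy the live node list first: replacing nodes while iterating
    // over it directly could skip elements.
    $iframes = iterator_to_array($doc->getElementsByTagName('iframe'));
    foreach ($iframes as $iframe) {
        $p = $doc->createElement('p');
        $p->appendChild($doc->createTextNode($text));
        $iframe->parentNode->replaceChild($p, $iframe);
    }
    // Serialize the modified document back to a string.
    return $doc->saveHTML();
}

$page = '<div id="content"><div class="pad3"></div>'
      . '<iframe src="http://www.yahoo.com" id="iFrame"></iframe>'
      . '<div class="pad2"></div></div>';
echo replaceIframes($page, 'test');
```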
