How to extract raw HTML code using simplehtmldom

I am trying to extract raw HTML from a web page using simplehtmldom, and I was wondering whether that library can do it.
For example, say this is the page I am trying to extract data from:
<div class="class1">
<div class="class2">
<div class="class3">
<p>p1</p>
<h1>header here!</h1>
<p>p2</p>
<img src="someimage"></img>
</div>
</div>
</div>
My goal is to extract everything inside the div with class class3, including the raw HTML markup, so that I can paste the result into a text box that accepts source code and have it formatted exactly as it is on the web page.
I have looked at the simplehtmldom manual and searched around, but have not found a solution yet.
Thank you.

Using your example HTML string:
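<?php
// Load the PHP Simple HTML DOM Parser library
include_once 'simple_html_dom.php';

$html = str_get_html('<div class="class1">
<div class="class2">
<div class="class3">
<p>p1</p>
<h1>header here!</h1>
<p>p2</p>
<img src="someimage"></img>
</div>
</div>
</div>');
// Find all divs with class3
foreach ($html->find('div[class=class3]') as $element) {
    // outertext returns the div's own tag plus all raw HTML inside it;
    // use innertext instead to get only the markup inside the div
    echo $element->outertext;
}
?>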
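If installing simplehtmldom is not an option, PHP's built-in DOM extension can return the same raw markup. The sketch below is a swapped-in technique, not simplehtmldom's API: it finds the div with DOMXPath and joins the serialised HTML of each child node to get the inner HTML. The helper name innerHtml is hypothetical.

```php
<?php
// Sketch: return the raw HTML inside <div class="class3"> using PHP's
// built-in DOM extension instead of simplehtmldom.

// Hypothetical helper: serialises every child node of $node and joins them
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) returns the markup of that single node
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$source = '<div class="class1"><div class="class2"><div class="class3">'
    . '<p>p1</p><h1>header here!</h1><p>p2</p><img src="someimage">'
    . '</div></div></div>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Exact attribute match, like the simplehtmldom selector div[class=class3]
$nodes = $xpath->query('//div[@class="class3"]');

$raw = $nodes->length > 0 ? innerHtml($nodes->item(0)) : '';
echo $raw;
```

Serialising only the children gives the equivalent of innertext; passing the div itself to saveHTML() would give the equivalent of outertext.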

Related

Extract Nested Tag Using PHP

Tag hierarchy in a web page:
<body>
<div id='header'>
<h2>.....</h2>
</div>
<div id='main'>
<h2>...</h2>
<!-- Some other content -->
<h2>...</h2>
</div>
<div id='footer'>
<h2>.....</h2>
</div>
</body>
Problem: given the page structure above, I want to extract only the <h2> tags that are inside <div id='main'>. Can someone please help me out?
What I have tried is PHP's DOM extension, $h2Tags = $htmlDom->getElementsByTagName('h2');, but this also returns the <h2> tags outside the main div. Please point me to a solution.

Here it is in PHP. $h2_tags below holds the list of <h2> elements inside the main div:
$div_m = $htmlDom->getElementById('main');
$h2_tags = $div_m->getElementsByTagName('h2');
And the equivalent in JavaScript:
var div_m = document.getElementById("main");
var h2_tag = div_m.getElementsByTagName('h2');
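A complete version of the PHP approach, as a sketch: it loads the page with DOMDocument::loadHTML(), narrows the search to the main div with getElementById(), and then calls getElementsByTagName() on that div only, so the header and footer <h2> tags are never visited. The heading texts are hypothetical placeholders.

```php
<?php
// Sketch: collect only the <h2> headings inside <div id="main"> with PHP's
// built-in DOM extension. The heading texts are hypothetical placeholders.
$source = <<<'HTML'
<body>
<div id="header"><h2>Header title</h2></div>
<div id="main">
<h2>First section</h2>
<p>Some other content</p>
<h2>Second section</h2>
</div>
<div id="footer"><h2>Footer title</h2></div>
</body>
HTML;

$htmlDom = new DOMDocument();
// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);
$htmlDom->loadHTML($source);
libxml_clear_errors();

$headings = [];
// Narrow the search to the main div first, then find the h2s beneath it
$div_m = $htmlDom->getElementById('main');
if ($div_m !== null) {
    foreach ($div_m->getElementsByTagName('h2') as $h2) {
        $headings[] = trim($h2->textContent);
    }
}
print_r($headings);
```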

Getting element in PHP - PHP Simple HTML DOM Parser

Can you help me solve the following?
I have the following HTML:
<div class="return-form">
<div class="two_cols">
<div class="first_col">
<label for="namesinger">Name:</label> </div>
<div class="second_col">
<p id="name">Axl Rose</p>
</div>
</div>
I am using the PHP Simple HTML DOM Parser library and I would like to display only the name "Axl Rose" on the screen.
echo $name;
Expected output:
Axl Rose

This is how you can extract the data:
<?php
// Load the PHP Simple HTML DOM Parser library
include_once 'simple_html_dom.php';

// Load the HTML
$html = str_get_html('<div class="return-form">
<div class="two_cols">
<div class="first_col">
<label for="namesinger">Name:</label> </div>
<div class="second_col">
<p id="name">Axl Rose</p>
</div>
</div>');
// Locate the <p> element by its id and print only its text
echo $html->find('p[id=name]', 0)->plaintext;
?>
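The same lookup also works without an extra library. This sketch swaps in PHP's built-in DOM extension: the selector p[id=name] becomes the XPath query //p[@id="name"], and the node's textContent plays roughly the role of simplehtmldom's plaintext.

```php
<?php
// Sketch: the same lookup with PHP's built-in DOM extension instead of the
// Simple HTML DOM Parser (no extra library to install).
$source = '<div class="return-form"><div class="two_cols">'
    . '<div class="first_col"><label for="namesinger">Name:</label></div>'
    . '<div class="second_col"><p id="name">Axl Rose</p></div>'
    . '</div></div>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Same query as p[id=name] in simplehtmldom, written as XPath
$nodes = $xpath->query('//p[@id="name"]');
// textContent gives the element's text with all tags stripped
$name = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';
echo $name;
```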

Store html content into php variable?

I have some HTML in my PHP page, fetched from a database, that will be used many times on the page, so I want to store it in a PHP variable for later use.
My sample code is:
<div class="info">
Some content from database here...
<div class="more">
Some more text...
</div>
</div>
How do I store this HTML in a PHP variable?
Please also tell me how to echo the contents of that variable.

I agree with @Jon Stirling, but you can do it like this:
$htmlData = '<div class="info">
Some content from database here...
<div class="more">
Some more text...
</div>
</div>';
and print it with:
echo $htmlData;
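Two other standard PHP options may help when the markup is long or full of quotes. A heredoc string needs no escaping of double quotes and interpolates variables; output buffering (ob_start() / ob_get_clean()) captures HTML written in template mode. In this sketch $dbText is a hypothetical stand-in for the value read from the database, echoed as-is because the question's database content is HTML.

```php
<?php
// Sketch: two more ways to keep a block of HTML in a variable.
// $dbText is a hypothetical stand-in for the value read from the database.
$dbText = 'Some content from database here...';

// 1) Heredoc: double quotes need no escaping and {$dbText} is interpolated.
$htmlData = <<<HTML
<div class="info">
{$dbText}
<div class="more">
Some more text...
</div>
</div>
HTML;

// 2) Output buffering: write the HTML in template mode and capture it.
ob_start();
?>
<div class="info">
<?= $dbText ?>
</div>
<?php
$captured = ob_get_clean();

echo $htmlData;
echo $captured;
```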

How to get content from Div which have other HTML tags using Regexp

I have a div that contains other HTML tags along with text.
I want to extract only the text from this div, that is, the text inside all of its HTML tags.
<div class="rpr-help m-chm">
<div class="header">
<h2 class="h6">Repair Help</h2>
</div><!-- /end .header -->
<div class="inner m-bsc">
<ul>
<li>Repair Video</li>
<li>Repair Q&A</li>
</ul>
</div>
<div>
<br>
<span class="h4">Cross Reference Information</span><br>
<p>Part Number 285753A (AP3963893) replaces 1195967, 280152, 285140, 285743, 285753, 3352470, 3363664, 3364002, 3364003, 62672, 62693, 661560, 80008, 8559748, AH1485646, EA1485646, PS1485646.
<br>
</p>
</div>
</div>
Here is my regex:
preg_match_all("/<div class=\"rpr-help m-chm\">(.*)<\/.*>/s", $urlcontent, $description);
It works fine when I assign this complete div to the $urlcontent variable.
But when I fetch data from a real URL, like $urlcontent = "www.test.com/test.html";
it returns the source of the whole page.
How can I get the inner content of <div class="rpr-help m-chm">?
Does my regex need any correction?
Any help would be appreciated. Thanks

It's not possible to reliably parse HTML/XHTML with regex. As an often-quoted answer puts it:
"You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML."
Depending on the language you use, please consider a dedicated HTML parsing library instead.
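A sketch of that advice in PHP, using the built-in DOM extension (DOMDocument and DOMXPath). It also sidesteps the reason the regex returned the whole page: with the s flag, the greedy (.*) followed by <\/.*> runs on to the last closing tag in the document. In practice $urlcontent would come from file_get_contents() with a full URL including http://; a shortened copy of the markup is inlined here so the sketch runs on its own.

```php
<?php
// Sketch: get the text inside <div class="rpr-help m-chm"> with an HTML
// parser (PHP's DOM extension) instead of a regex. Shortened sample markup.
$urlcontent = <<<'HTML'
<div class="rpr-help m-chm">
<div class="header">
<h2 class="h6">Repair Help</h2>
</div>
<div class="inner m-bsc">
<ul>
<li>Repair Video</li>
<li>Repair Q&amp;A</li>
</ul>
</div>
</div>
HTML;

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($urlcontent);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match the div by its exact class attribute value
$nodes = $xpath->query('//div[@class="rpr-help m-chm"]');

$text = '';
if ($nodes->length > 0) {
    // textContent concatenates the text of every descendant, dropping tags
    $text = $nodes->item(0)->textContent;
    // Collapse the runs of whitespace left over from the markup
    $text = trim(preg_replace('/\s+/', ' ', $text));
}
echo $text;
```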

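Use this function:
function GetclassContent($tagStart, $tagEnd, $content)
{
    // Split on the opening tag; element 1 is everything after its first occurrence
    $first_step = explode($tagStart, $content);
    if (count($first_step) < 2) {
        return ''; // opening tag not found
    }
    // Keep everything before the first closing tag that follows it.
    // Note: with nested divs this stops at the first inner </div>.
    $second_step = explode($tagEnd, $first_step[1]);
    return $second_step[0];
}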
Steps to use the above function:
// file_get_contents() needs the scheme to fetch a URL rather than a local file
$website = "http://www.test.com/test.html";
$content = file_get_contents($website);
$tagStart = '<div class="rpr-help m-chm">';
$tagEnd = "</div>";
$RequiredContent = GetclassContent($tagStart, $tagEnd, $content);
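Splitting on "</div>" stops at the first closing div after the opening tag, which on this page closes the nested <div class="header">, not the outer div. A hypothetical variant (extractDivByDepth is not from the original answer) keeps the same string-scanning idea but counts nested <div tags until the matching close. It is still plain text matching, so a "<div" inside a comment, script, or attribute value will throw the count off; a real parser remains the safer choice.

```php
<?php
// Hypothetical helper (not from the original answer): returns the markup
// between $tagStart and the </div> that actually closes it, counting nested
// <div ...> tags along the way. Plain string scanning, so "<div" inside
// comments, scripts or attribute values will throw the count off.
function extractDivByDepth(string $content, string $tagStart): string
{
    $start = strpos($content, $tagStart);
    if ($start === false) {
        return ''; // opening tag not found
    }
    $innerStart = $start + strlen($tagStart);
    $pos = $innerStart;
    $depth = 1;
    while ($depth > 0) {
        $nextOpen = strpos($content, '<div', $pos);
        $nextClose = strpos($content, '</div>', $pos);
        if ($nextClose === false) {
            return ''; // unbalanced markup: no matching </div>
        }
        if ($nextOpen !== false && $nextOpen < $nextClose) {
            $depth++;                          // a nested div opens first
            $pos = $nextOpen + strlen('<div');
        } else {
            $depth--;                          // a div closes first
            $pos = $nextClose + strlen('</div>');
        }
    }
    // $pos now sits just past the matching </div>
    return substr($content, $innerStart, $pos - strlen('</div>') - $innerStart);
}

$content = '<div class="rpr-help m-chm"><div class="header">'
    . '<h2 class="h6">Repair Help</h2></div><p>Cross Reference</p></div>'
    . '<div class="next">unrelated</div>';
echo extractDivByDepth($content, '<div class="rpr-help m-chm">');
```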

PhpQuery and replaceWith, How to?

I'm using phpQuery and I need to replace an iframe with another tag.
The HTML file has an iframe:
<div id="content">
<div class="pad3"></div>
<iframe src="http://www.yahoo.com" id="iFrame"></iframe>
<div class="pad2"></div>
</div>
With this piece of code:
$doc = phpQuery::newDocumentFileHTML('file.htm');
$doc->find('iframe')->replaceWith('<p>test</p>');
I expected this:
<div id="content">
<div class="pad3"></div>
<p>test</p>
<div class="pad2"></div>
</div>
But nothing happens. Can someone give me some clues?
Best Regards

Try using the id of your iframe element:
$doc->find('#iFrame')->replaceWith('<p>test</p>');
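For comparison, the same replacement can be done with PHP's built-in DOM extension rather than phpQuery. This sketch locates the iframe by its id, builds a <p>test</p> element, and swaps it in with replaceChild(); it then serialises only the #content div so the expected output from the question can be checked directly.

```php
<?php
// Sketch: the same replacement done with PHP's built-in DOM extension
// (not phpQuery), using the markup from the question.
$source = '<div id="content"><div class="pad3"></div>'
    . '<iframe src="http://www.yahoo.com" id="iFrame"></iframe>'
    . '<div class="pad2"></div></div>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($source);
libxml_clear_errors();

$iframe = $doc->getElementById('iFrame');
if ($iframe !== null) {
    // Build <p>test</p> and put it where the iframe was
    $p = $doc->createElement('p', 'test');
    $iframe->parentNode->replaceChild($p, $iframe);
}

// Serialise only the #content div so the wrapper html/body tags are omitted
$result = $doc->saveHTML($doc->getElementById('content'));
echo $result;
```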
