PhpQuery and replaceWith, How to? - php

I'm using phpQuery and I need to replace an iframe with another tag.
The HTML file contains an iframe:
<div id="content">
<div class="pad3"></div>
<iframe src="http://www.yahoo.com" id="iFrame"></iframe>
<div class="pad2"></div>
</div>
With this piece of code:
$doc = phpQuery::newDocumentFileHTML('file.htm');
$doc->find('iframe')->replaceWith('<p>test</p>');
I expected this:
<div id="content">
<div class="pad3"></div>
<p>test</p>
<div class="pad2"></div>
</div>
But nothing happens. Can someone give me some clues?
Best Regards

Try using the id of your iframe element:
$doc->find('#iFrame')->replaceWith('<p>test</p>');
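If phpQuery still leaves the document unchanged, the same replacement can be sketched with PHP's built-in DOMDocument instead. This is an alternative technique, not phpQuery's API; the markup is inlined from the question rather than read from file.htm, and the variable names are only illustrative.

```php
<?php
// Markup from the question, inlined here instead of loading file.htm.
$html = '<div id="content">'
      . '<div class="pad3"></div>'
      . '<iframe src="http://www.yahoo.com" id="iFrame"></iframe>'
      . '<div class="pad2"></div>'
      . '</div>';

$dom = new DOMDocument();
// These flags stop libxml from adding a doctype and <html>/<body> wrappers.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Copy the live node list to an array first: replacing nodes mutates the list.
$iframes = iterator_to_array($dom->getElementsByTagName('iframe'));
foreach ($iframes as $iframe) {
    $p = $dom->createElement('p', 'test');
    $iframe->parentNode->replaceChild($p, $iframe);
}

$result = $dom->saveHTML();
echo $result;
```

The iframe is swapped for a new p element in place, so the surrounding pad3 and pad2 divs keep their order.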

Related

Extract Nested Tag Using PHP

Tag hierarchy in a webpage :
<body>
<div id='header'>
<h2>.....</h2>
</div>
<div id='main'>
<h2>...</h2>
//Some other content
<h2>...</h2>
</div>
<div id='footer'>
<h2>.....</h2>
</div>
</body>
Problem: from the above hierarchy of a webpage, I want to extract only the <h2> tags which are inside the <div id='main'>. Can someone please help me out?
What I have tried is using PHP's HTML DOM: $h2Tags = $htmlDom->getElementsByTagName('h2');, but this also gives me the <h2> tags outside of the main div. Please guide me to a solution.
I have updated this answer for PHP.
$h2_tags below will hold the list of <h2> elements inside the main div:
$div_m = $htmlDom->getElementById('main');
$h2_tags = $div_m->getElementsByTagName('h2');
This is JS:
var div_m = document.getElementById("main");
var h2_tag = div_m.getElementsByTagName('h2');

how to extract raw html code using simplehtmldom

I am trying to extract raw html from a web-page using simplehtmldom. I was wondering if it is possible using that library.
For example, let's say I have this web page I am trying to extract data from.
<div class="class1">
<div class="class2">
<div class="class3">
<p>p1</p>
<h1>header here!</h1>
<p>p2</p>
<img src="someimage"></img>
</div>
</div>
</div>
My goal is to extract everything within the div with class3, including the raw HTML, so that I can paste it into a text box that accepts source code and have it formatted the same way as on the web page.
I have looked at simplehtmldom manuals and did some searching but have yet to find a solution.
Thank you.
Using your example html string
$html = str_get_html('<div class="class1">
<div class="class2">
<div class="class3">
<p>p1</p>
<h1>header here!</h1>
<p>p2</p>
<img src="someimage"></img>
</div>
</div>
</div>');
// Find all divs with class3
foreach($html->find('div[class=class3]') as $element) {
echo $element->outertext;
}

Xpath Exclude p.class of a div

This is my HTML example:
<div id="Texte">
<div class="pagination">
...
</div>
<p>...</p>
<p>....</p>
<p class="Foot">...</p>
</div>
I want to use XPath to get all the content of my <div id="Texte"> without the <p class="Foot">.
I use this, but it doesn't work; the class='Foot' paragraph is still in my result:
$crawler->filterXPath("//*[@id='Texte' and not(@class='Foot')]")->html();
Almost. Your predicate tests the class of the div itself; move the not() test to its children.
// correct
$crawler->filterXPath("//*[@id='Texte']/*[not(@class='Foot')]")->html();
// yours, for comparison
$crawler->filterXPath("//*[@id='Texte' and not(@class='Foot')]")->html();
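To check which nodes the child-level predicate keeps, here is a small sketch using PHP's built-in DOMXPath rather than the crawler. The text inside the elements is placeholder content invented for the example. It loops over every match, because reading the HTML of only one matched node would drop the others.

```php
<?php
// Structure from the question; the inner texts are made-up placeholders.
$html = <<<'HTML'
<div id="Texte">
<div class="pagination">pages</div>
<p>first</p>
<p>second</p>
<p class="Foot">footer</p>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Direct children of #Texte whose class attribute is not exactly "Foot".
$kept = '';
foreach ($xpath->query("//*[@id='Texte']/*[not(@class='Foot')]") as $node) {
    $kept .= $dom->saveHTML($node);
}
echo $kept;
```

Note that not(@class='Foot') compares the whole attribute value, so an element with several classes, one of them Foot, would still be kept.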

Wordpress modify the_content markup

With jQuery or PHP, I would like to modify the dom structure of the the_content function in WordPress. In some posts I use the h3 element, and I would like to add a wrapper that contains the content until the next h3.
So I would like to convert this:
<h3>Title</h3>
<p>This is just regular text</p>
<h3>Next title</h3>
Into this:
<div class='wrapper'>
<h3>Title</h3>
<p>This is just regular text</p>
</div>
<div class='wrapper'>
<h3>Next title</h3>
</div>
Thanks!
Assuming that the content consists only of <h3>s and <p>s, and that they are well formatted like this:
<div id="content">
<h3>title</h3>
<p>..........</p>
<p>..........</p>
<h3>another title</h3>
<p>.........</p>
<h3>yet another title</h3>
<p>..........</p>
</div>
then you may try this in jQuery.
// get the main post content
var $content = $("#content");
// before() with an unclosed tag inserts a complete empty element,
// so wrap the existing nodes instead of inserting tag fragments
$content.find("h3").each(function () {
  // each h3 plus all following siblings up to the next h3
  $(this).nextUntil("h3").addBack().wrapAll('<div class="wrapper"></div>');
});

php use preg_replace remove div

I tried to use preg_replace to remove the div with class="image content". I used the code below, but two closing </div> tags still remain after the preg_replace. I need some help, thanks.
<style>
#contract{width:100%;height:100%;}
#content{width:1002px;overflow:hidden;margin:0 auto;}
</style>
<div id="contract">
<div id="content">
<?php
$html = <<<EOT
<div style="float:left;" class="image content"><div style="float:left;"><div style="float:left;">
<img alt="Se hai problemi nella visualizzazione dei caratteri, clicca qui." src="../../image/1001.jpg" >
</div></div></div>
<div class="content" style="float:left;"><i>some content here.</i></div>
EOT;
echo preg_replace('/<div(.*?)class="image content">([\s\S]*?)<\/div>/','',$html);
?>
</div>
</div>
I need it to return: <div class="content" style="float:left;"><i>some content here.</i></div>
I know it's not an answer to this question, but don't use regular expressions to manipulate the DOM. There are specialized classes for that (DOMDocument), which are faster and will cause you fewer headaches.
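As a rough sketch of that suggestion, the div can be removed with DOMDocument and DOMXPath. The wrapper element with id "wrap" is a hypothetical container added only so the fragment has a single root; it is not part of the original markup.

```php
<?php
// Fragment from the question.
$html = <<<'HTML'
<div style="float:left;" class="image content"><div style="float:left;"><div style="float:left;">
<img alt="Se hai problemi nella visualizzazione dei caratteri, clicca qui." src="../../image/1001.jpg" >
</div></div></div>
<div class="content" style="float:left;"><i>some content here.</i></div>
HTML;

$dom = new DOMDocument();
// "wrap" is a hypothetical container used only to give the fragment one root.
$dom->loadHTML('<div id="wrap">' . $html . '</div>');
$xpath = new DOMXPath($dom);

// Remove every div whose class attribute is exactly "image content",
// including all of its nested divs.
foreach (iterator_to_array($xpath->query("//div[@class='image content']")) as $div) {
    $div->parentNode->removeChild($div);
}

// Serialize what is left inside the wrapper.
$wrap = $xpath->query("//div[@id='wrap']")->item(0);
$result = '';
foreach ($wrap->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}
$result = trim($result);
echo $result;
```

Because the parser tracks nesting, the inner divs are removed together with their parent, which is the part the non-greedy regular expression cannot do.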
