regular expression to remove a div

I have a file like:
<div clas='dsfdsf'> this is first div </div>
<div clas='dsfdsf'> this is second div </div>
<div class="remove">
<table>
<thead>
<tr>
<th colspan="2">Mehr zum Thema</th>
</tr>
</thead>
<tbody>
<tr> this is tr</tr>
<tr> this row no 2 </tr>
</tbody>
</table>
</div>
<div clas='sasas'> this is last div </div>
I read the file content into a variable like this:
$Cont = file_get_contents('myfile');
Now I want to remove the div with class name 'remove' using preg_replace. I have tried this:
$patterns = "%<div class='remove'>(.+?)</div>%";
$strPageSource = preg_replace($patterns, '', $Cont);
It did not work. What should be the correct regular expression for this replace?

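Try this code. The pattern has to accept the double quotes used in the file, and it needs the s modifier so that . also matches line breaks:
$strPageSource = preg_replace('%<div class=(["\'])remove\1>.*?</div>%is', '<div class="newClass">Newthings</div>', $Cont);
Pass '' as the replacement if you only want to remove the div.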

As has been stated in the comments, you should not use regex to parse HTML, because there is no sane way to extract that <div> if there are other <div>s nested inside it. For example:
<div clas='dsfdsf'> this is second div </div>
<div class="remove">
some text <div>nested div</div> more text and some elements<br />
</div>
What you want to do is find the location of your <div class="remove"> and then advance through the HTML (parse it) from the end of that opening tag in the following manner:
1) set $nesting_counter = 0
2) proceed through the HTML until you encounter either <div> or </div>
   a) if you found <div>:
      $nesting_counter++ and go to point 2)
   b) if you found </div>:
      if $nesting_counter > 0
         $nesting_counter-- and go to point 2)
      else
         you've found the closing tag for your <div class="remove">. Remember the current position and remove the substring from the opening tag up to it.
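The steps above could be sketched in PHP roughly as follows. This is only a minimal sketch: the function name removeDivByClass() is made up for illustration, it removes only the first matching div, and it assumes the class attribute comes first in the opening tag and that the markup is balanced.

```php
<?php
// Hypothetical helper (not part of PHP): removes the first <div class="...">
// block with the given class, counting nested <div> tags on the way.
function removeDivByClass(string $html, string $class): string
{
    // Find the opening tag; accepts single or double quotes around the class.
    $open = '%<div\s+class=(["\'])' . preg_quote($class, '%') . '\1[^>]*>%i';
    if (!preg_match($open, $html, $m, PREG_OFFSET_CAPTURE)) {
        return $html; // nothing to remove
    }
    $start = $m[0][1];                  // where the opening tag begins
    $pos = $start + strlen($m[0][0]);   // first byte after the opening tag
    $nesting = 0;

    // Step 2: walk over every following <div ...> or </div>.
    while (preg_match('%<(/?)div\b[^>]*>%i', $html, $t, PREG_OFFSET_CAPTURE, $pos)) {
        $pos = $t[0][1] + strlen($t[0][0]);
        if ($t[1][0] === '') {
            $nesting++;                 // a) nested opening <div>
        } elseif ($nesting > 0) {
            $nesting--;                 // b) closes a nested <div>
        } else {
            // Closing tag of the target div: cut everything from $start to $pos.
            return substr($html, 0, $start) . substr($html, $pos);
        }
    }
    return $html; // unbalanced markup: leave the input unchanged
}
```

With the question's variable it would be called as $strPageSource = removeDivByClass($Cont, 'remove');. For anything beyond a quick fix, a real parser such as DOMDocument is still the safer choice.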

Related

Using DOMXpath to find data in not so nice html

I am trying to get some data from a plant list site. This proves to be a bit problematic because their html isn't really well-formed. These are two lines from the search result (disclaimer: I am not responsible for this code):
<tr>
<td>
<i class="glyphicons-icon leaf"></i>
</td>
<td>
<a title="Cimicifuga simplex" href="/taxon/wfo-0000604773" class="result">
<h4 class="h4Results"><em>Cimicifuga simplex</em>(DC.) Wormsk. ex Turcz.</h4>
</a>
Bull. Soc. Imp. Naturalistes Moscou<br/>
<div>
<em>Status:</em><span id="entryStatus">Synonym of </span>
<em>Actaea simplex</em>(DC.) Wormsk. ex Prantl
</div>
<div>
<em>Rank:</em><span id="entryRank">Species</span>
</div>
<div>
<em>Family:</em> Ranunculaceae
</div>
</td>
<td>
<img title="No Image Available" src="/css/images/no_image.jpg" class="thumbnail pull-right"/>
</td>
</tr>
<tr>
<td>
<i class="glyphicons-icon leaf"></i>
</td>
<td>
<a title="Actaea simplex" href="/taxon/wfo-0000519124" class="result">
<h4 class="h4Results"><strong><em>Actaea simplex</em>(DC.) Wormsk. ex Prantl</strong></h4>
</a>
Bot. Jahrb. Syst.<br/>
<div>
<em>Status:</em><span id="entryStatus">Accepted Name</span>
</div>
<div>
<em>Rank:</em><span id="entryRank">Species</span>
</div>
<div>
<em>Family:</em> Ranunculaceae</div>
<div>
<em>Order:</em> Ranunculales
</div>
</td>
<td>
<img title="No Image Available" src="/css/images/no_image.jpg" class="thumbnail pull-right"/>
</td>
</tr>
I added some layout myself, otherwise it wasn't readable.
Anyway, I loaded the page in php and DOMXpath and now I want to get two things:
Select the row that has Accepted Name in it
Get the species name and the corresponding link from it
In this case the result would be "Actaea simplex" and "/taxon/wfo-0000519124". Mind that there will be more results resembling the first row, and that the position of the row that I am looking for doesn't have to be the second one.
Normally I just try, use Google and try some more, and in the end I get there, but in this case IDs are used as if they were classes and are not unique. This makes it impossible to use an XPath tester, and perhaps makes DOMXpath useless too.
So, is it possible to get my data with DOMXpath, and if yes - what query do I use?

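Try something like this (loadHTML() is more tolerant of malformed markup than loadXML()):
libxml_use_internal_errors(true); // suppress parser warnings such as the duplicated IDs
$dom = new DOMDocument();
$dom->loadHTML($html); // $html holds the page source
$xpath = new DOMXPath($dom);
// the <a> inside the cell whose status <span> reads "Accepted Name"
$target = $xpath->query("//td[.//span[.='Accepted Name']]/a");
$link = $target->item(0)->getAttribute('href');
$title = $target->item(0)->getAttribute('title');
echo $title, " ", $link;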
Output
Actaea simplex /taxon/wfo-0000519124
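A self-contained variant of the same idea might look like the sketch below. The markup is a trimmed-down copy of the two rows from the question wrapped in a <table> (the wrapper is an assumption), and normalize-space() is added here only to tolerate stray whitespace around the status text. Looping over the node list also avoids an error when no row matches.

```php
<?php
// Reduced sample of the two result rows; the <table> wrapper is assumed.
$html = <<<HTML
<table>
<tr><td><a title="Cimicifuga simplex" href="/taxon/wfo-0000604773" class="result">Cimicifuga simplex</a>
<div><em>Status:</em><span id="entryStatus">Synonym of </span></div></td></tr>
<tr><td><a title="Actaea simplex" href="/taxon/wfo-0000519124" class="result">Actaea simplex</a>
<div><em>Status:</em><span id="entryStatus">Accepted Name</span></div></td></tr>
</table>
HTML;

libxml_use_internal_errors(true); // keep parser warnings (e.g. repeated IDs) quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Collect title and link of every row whose status is "Accepted Name".
$found = [];
$query = "//td[.//span[normalize-space(.)='Accepted Name']]/a";
foreach ($xpath->query($query) as $a) {
    $found[] = [$a->getAttribute('title'), $a->getAttribute('href')];
}
print_r($found);
```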

Extract links from specific table

I have HTML code with many tables. I want to extract the links from one specific table, the one that has a specific div above it.
Here's my sample code:
<div class="boxuniwersal_header">Table 1</div>
<img src="img/boxuniwersal_top.gif" width="210" height="18" alt="" style="margin-top: 5px" />
<div class="boxuniwersal_content">
<div class="boxuniwersal_subcontent">
<div class='menu_m1'><table cellpadding="3"><tr><td><img src="some.jpg" width="45" /></td><td>Some text</td></tr></table></div>
<br />
</div>
</div>
<!-- /box -->
<!-- box -->
<div class="boxuniwersal_header">Table 2</div>
<img src="img/boxuniwersal_top.gif" width="210" height="18" alt="" style="margin-top: 5px" />
<div class="boxuniwersal_content">
<div class="boxuniwersal_subcontent">
<div class='menu_m1'><table cellpadding="3"><tr><td><img src="some2.jpg" width="45" /></td><td>Some text2</td></tr></table></div>
<br />
</div>
</div>
$domXPath = new DOMXPath($domDocument);
$results = $domXPath->query("//div/div/table/tr/td/a|//table//tr/td//a"); //querying domdocument
foreach($results as $result)
{
$links[]=$result->getAttribute("href");
}
This code returns all links. I want to grab only the links from Table 1. Is it possible?

Your main problem is just tuning the XPath expression to select the right elements.
Change your XPath to
//div[text()="Table 1"]/following-sibling::div[1]//table//a
This first finds the <div> element whose text is the one you're after.
The following-sibling::div[1] part then selects the first <div> that follows it at the same level (this is the one that contains the <table>).
The last part just looks for all <a> elements within that <table>.
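Put together, this could look like the following sketch. The sample markup is an assumption: it is the question's structure reduced to the essentials, with links added to the cells because the original snippet contains none.

```php
<?php
// Assumed sample: the question's layout with <a> elements added to the tables.
$html = <<<HTML
<div class="boxuniwersal_header">Table 1</div>
<img src="img/boxuniwersal_top.gif" alt="" />
<div class="boxuniwersal_content">
  <table><tr><td><a href="/one">One</a></td><td><a href="/two">Two</a></td></tr></table>
</div>
<div class="boxuniwersal_header">Table 2</div>
<div class="boxuniwersal_content">
  <table><tr><td><a href="/three">Three</a></td></tr></table>
</div>
HTML;

libxml_use_internal_errors(true);
$domDocument = new DOMDocument();
$domDocument->loadHTML($html);
$domXPath = new DOMXPath($domDocument);

// Only links below the first <div> that follows the "Table 1" header.
$links = [];
$query = '//div[text()="Table 1"]/following-sibling::div[1]//table//a';
foreach ($domXPath->query($query) as $a) {
    $links[] = $a->getAttribute('href');
}
print_r($links);
```

The <img> between the header and the content div does not get in the way, because following-sibling::div only counts <div> siblings.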

HTML DOM remove/replace between <tr> and </tr> tags

I've searched for a solution but I'm lost. I have to remove, or replace with blank, everything between <tr> tags. I'm loading an HTML file which contains many <tr> tags; my goal is to remove the <tr> with a specific id. My <tr> looks like this:
<tr id="ctl00_cphMain_DisplayRecords1_RepeaterResults_ctl03_trZSD">
<td id="ctl00_cphMain_DisplayRecords1_RepeaterResults_ctl03_tdZSD" class="td-zsd footable-visible footable-last-column footable-first-column" colspan="9">
<div id="divZSDBanners" class="table-banners-zsd clearfix">
<div>
<div class="medium-4 columns zsd-ext-ad">
<div>
<script type="text/javascript">
</script>
<script>
</script>
<div id="ctl00_cphMain_DisplayRecords1_RepeaterResults_ctl03_ctl00_divSpace1" class="adSpacer">
</div>
</div>
</div>
<script type="text/javascript">
</script>
</div>
</div>
</td>
</tr>
I'm using Simple HTML DOM. I've already tried $html->find('tr[id=tr_id]'), but I don't know how to replace everything in between, including the divs and script tags.
Any ideas?

Use the ->innertext property:
$tr = $html->find( 'tr[id=tr_id]', 0 ); // Select first node (0)
$tr->innertext = '';
echo $html->save();
Output:
<tr id="tr_id"></tr>
Or:
$tr->innertext = '<td>New Content</td>';
echo $html->save();
Output:
<tr id="tr_id"><td>New Content</td></tr>

To remove the TR element itself via DOM, use the removeChild method of its parent node:
$tr->parentNode->removeChild($tr);
To remove the element’s contents, either set its textContent property to the empty string '' (PHP 5.6.1+) or remove all child nodes one by one using the element’s removeChild() method in a loop, e.g.:
while ($tr->lastChild) {
$tr->removeChild($tr->lastChild);
}
A SimpleXMLElement object can be converted to a DOMElement object using the dom_import_simplexml() function.
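For completeness, here is a minimal sketch of the DOM approach from start to finish. It uses PHP's built-in DOMDocument rather than Simple HTML DOM; the sample markup and the short id "tr_id" are stand-ins for the real page and the long generated id.

```php
<?php
// Assumed sample markup; "tr_id" stands in for the long generated id.
$html = '<table><tr id="keep"><td>kept</td></tr>'
      . '<tr id="tr_id"><td><div><script></script></div></td></tr></table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Look the row up by its id attribute and detach it from its parent.
$tr = $xpath->query('//tr[@id="tr_id"]')->item(0);
if ($tr !== null) {
    $tr->parentNode->removeChild($tr);
}

$result = $dom->saveHTML();
echo $result;
```

To keep the empty <tr> and drop only its contents, replace the removeChild() call with the while loop shown above.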

How create a popup on mouseover depending on the item?

I want to create a popup depending on the item. The text to show is taken from the database for each item. Specifically, I have this code:
{foreach $images as $item}
<div class="icoana" id="container">
<img class="icoane" src="{base_url()}assets/image/{$item->code}/{$item->name}">
<div class="detalii">
<table style="font-size: 25px">
<tr><td>Nume:</td><td>{$item->title}</td> </tr>
<tr><td>Lungime:</td><td>{$item->width}&nbsp cm</td> </tr>
<tr><td>Latime:</td><td>{$item->height}&nbsp cm</td></tr>
</table>
</div>
<div class="despre" id="popup"><img src="{base_url()}assets/image/go.jpg" style="weight: 20px; height:20px;" >Mai multe...</div>
</div>
{/foreach}
When the mouse is over the div with class="despre", I want a pop-up to appear with the description text stored in {$item->description}. I want the pop-up to look like this: http://creativeindividual.co.uk/2011/02/create-a-pop-up-div-in-jquery/. I would like a link to an example or source code.

Broadly, you need two steps.
1) Print the description in a hidden div immediately below the div mentioned.
So your code becomes:
{foreach $images as $item}
<div class="icoana" id="container">
<img class="icoane" src="{base_url()}assets/image/{$item->code}/{$item->name}">
<div class="detalii">
<table style="font-size: 25px">
<tr><td>Nume:</td><td>{$item->title}</td> </tr>
<tr><td>Lungime:</td><td>{$item->width}&nbsp cm</td> </tr>
<tr><td>Latime:</td><td>{$item->height}&nbsp cm</td></tr>
</table>
</div>
<div class="despre" id="popup"><img src="{base_url()}assets/image/go.jpg" style="weight: 20px; height:20px;" >Mai multe...</div>
<div class="desc" style="display:none">
{$item->description}
</div>
</div>
{/foreach}
2) After that, use the same code as in the link you provided above and modify the display part:
$(function() {
$('.despre').hover(function(e) {
// Code to show the popup.
// Modify this to use any standard popup library.
// For now, the code below only shows the .desc div.
$(this).next().show();
}, function() {
$(this).next().hide();
});
});
For now this will show and hide the div. You can use any tooltip library to actually display the popup.
Examples here: http://www.1stwebdesigner.com/css/stylish-jquery-tooltip-plugins-webdesign/

Find and separate the HTML blocks to an array

First of all I want to describe the idea. Everyone knows that any CMS or simple website has some kind of repeated blocks, for example the list of articles on the main page of a WordPress site, where each article is shown in a block of information: title, author, content, date, etc.
So the main idea is how to find and separate such blocks of HTML and append each of them to an array.
I think I first need to strip them of classes, ids and styles.
Step 1:
<div id="box1">
<h3 class="title_style">Title1</h3>
<p>content for box1</p>
<div class="author">Author Name1<span class="style_date">date1</span>any text</div>
</div>
<div id="box2">
<h3 class="title_style">Title2</h3>
<p>content for box2</p>
<div class="author">Author Name2<span class="style_date">date2</span>any text2</div>
</div>
to
<div>
<h3>Title1</h3>
<p>content for box1</p>
<div>Author Name1<span>date1</span>any text</div>
</div>
<div>
<h3>Title2</h3>
<p>content for box2</p>
<div>Author Name2<span>date2</span>any text2</div>
</div>
Step 2:
I need to find each block and write them to an array so that I can put each block into a row of a table like this (note that such blocks are present on almost any site, so it doesn't matter which tags they use; they just repeat with different content and attributes, and only the structure is the same):
<table>
<tr id="block1">
<td>Title1</td>
<td>content for box1</td>
<td>Author Name1</td>
<td>date1</td>
<td>any text</td>
</tr>
<tr id="block2">
<td>Title2</td>
<td>content for box2</td>
<td>Author Name2</td>
<td>date2</td>
<td>any text2</td>
</tr>
</table>
Any ideas? I need the logic for how to do this, not the code itself.

You can walk the DOM of the document using PHP's DOMDocument class.
So you can do something like this:
$str = <<<STR
<div id="box1">
<h3 class="title_style">Title1</h3>
<p>content for box1</p>
<div class="author">Author Name1<span class="style_date">date1</span>any text</div>
</div>
<div id="box2">
<h3 class="title_style">Title2</h3>
<p>content for box2</p>
<div class="author">Author Name2<span class="style_date">date2</span>any text2</div>
</div>
STR;
$dom = new DOMDocument();
$dom->loadHTML($str);
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
//read child elements
}
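One way the "read child elements" step could be filled in is sketched below. It is only an illustration under two assumptions: the blocks are the <div> elements directly under <body>, and every non-empty text node inside a block becomes one table cell. Using DOMXPath instead of getElementsByTagName() keeps the nested author <div> from being treated as a block of its own.

```php
<?php
$str = <<<STR
<div id="box1">
<h3 class="title_style">Title1</h3>
<p>content for box1</p>
<div class="author">Author Name1<span class="style_date">date1</span>any text</div>
</div>
<div id="box2">
<h3 class="title_style">Title2</h3>
<p>content for box2</p>
<div class="author">Author Name2<span class="style_date">date2</span>any text2</div>
</div>
STR;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

$blocks = [];
// Only the outer blocks: <div> elements that are direct children of <body>.
foreach ($xpath->query('/html/body/div') as $div) {
    $row = [];
    // Each non-empty text node inside the block becomes one cell.
    foreach ($xpath->query('.//text()[normalize-space()]', $div) as $text) {
        $row[] = trim($text->nodeValue);
    }
    $blocks[] = $row;
}
print_r($blocks);
```

Each entry of $blocks can then be printed as one <tr>, with one <td> per value. Because only text nodes are read, classes, ids and styles never need to be stripped first.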

Try the Simple HTML DOM Parser library.
