Delete tag content using preg_replace


I have a long string in PHP that contains a few tables with text between them (HTML in a string). I need to delete all tables with a specific class (fQuote) from it, but when I use my regular expression, it matches everything from the first table's opening tag to the last closing table tag.
My Regular Expression:
$content = preg_replace('/<table\s[^>]*class=\"[^\"]*fQuote[^\"]*\"[^>]*>.*<\/table>\s?/ix','',$content);
My test string:
<table class="fQuote simple_table"><tbody><tr><td class="fQuoteUser"><img class="fQuoteAvatar" src="http://localhost/images/avatars/na.png" /><br /><a class="clear" href="http://localhost/user/1">User1</a></td><td class="fQuoteCorner"> </td><td class="fQuoteTable"><table class="default_table"><tbody><tr><td class="fQuoteBody">Some text</td></tr><tr><td class="fQuoteDate">Quoting a message <a class="clear fQuotedID" href="http://localhost/p/92">#9</a> 12 december 2011 23:43</td></tr></tbody></table></td></tr></tbody></table><p>text needed to stay</p><table class="fQuote simple_table" style="width:932px"><tbody><tr><td class="fQuoteUser"><img class="fQuoteAvatar" src="http://localhost/images/avatars/na.png" /><br /><a class="clear" href="http://localhost/user/2">User2</a></td><td class="fQuoteCorner"> </td><td class="fQuoteTable"><table class="default_table" style="height:187px; width:830px"><tbody><tr><td class="fQuoteBody"><br />Quoted<br /><table align="center" cellpadding="0" cellspacing="1" style="height:34px; width:738px"><tbody><tr><td><table cellpadding="0" cellspacing="0" style="height:32px; width:736px"><tbody><tr><td><em><a class="main_normal" href="http://localhost/p/444" target="_blank">#444</a> <strong>User3</strong>15.12.2011 16:56<br />Some text2</em></td></tr></tbody></table></td></tr></tbody></table><br /><br /><br />some text3</td></tr><tr><td class="fQuoteDate">Quoting a message <a class="clear fQuotedID" href="http://localhost/p/447">#22</a> 15 december 2011 16:33<br /></td></tr></tbody></table></td></tr></tbody></table>
I need to delete everything from this concrete test case except
<p>text needed to stay</p>
When I use my regular expression, it takes the first table's opening tag and matches everything up to the last closing table tag, which also removes the text that should stay and leaves an empty string as the result.
Please help with a correct regular expression. Thanks.
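
The greedy `.*` in the pattern above runs to the last `</table>`, and a lazy `.*?` would stop at the first nested `</table>` instead, so nested tables are hard to handle with a plain regex. One possible alternative is to let PHP's DOM extension find and remove the tables. The following is only a sketch under assumptions: the function name `remove_tables_by_class` and the wrapper id `gr-wrap` are made up for illustration, and the input is assumed to be a UTF-8 HTML fragment.

```php
<?php
// Sketch: remove every <table> whose class attribute contains the given
// class name as a whole token, using DOMDocument/DOMXPath instead of regex.
// remove_tables_by_class() and the "gr-wrap" id are hypothetical names.
function remove_tables_by_class(string $html, string $class): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    // The <?xml ...> prefix hints UTF-8; the wrapper div lets us get the fragment back.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="gr-wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match "fQuote" as a whole class token, not as a substring of e.g. "fQuoteTable".
    $query = sprintf(
        '//table[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    foreach (iterator_to_array($xpath->query($query)) as $table) {
        if ($table->parentNode !== null) {
            $table->parentNode->removeChild($table); // removes nested tables too
        }
    }

    // Serialize only the children of the wrapper, not the html/body added by the parser.
    $wrap = $xpath->query('//div[@id="gr-wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling `remove_tables_by_class($content, 'fQuote')` on the test string should leave only the paragraph between the two quote tables. Note that the parser may normalize the remaining markup slightly (for example attribute quoting).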

Related

How to convert Wordpress caption tag to html div tag

I converted a website from WordPress, and some of the posts have a caption tag like the following:
[caption id="attachment_666" align="alignleft" width="316"]
<img class="wp-image-92692" src="img" width="316" alt="fitbit-yoga-lady.png" height="210">
text
[/caption]
I would like to catch all of these captions and convert them to the following:
<div id="attachment_666" style="width: 326px" class="wp-caption alignleft">
<img class="wp-image-92692" src="img" alt="fitbit-yoga-lady.png" width="316" height="210">
<p class="caption">text</p>
</div>

Well, given the exact text that you provided, the following should work.
Search Pattern:
\[caption([^\]]+)align="([^"]+)"\s+width="(\d+)"\](\s*\<img[^>]+>)\s*(.*?)\s*\[\/caption\]
Replacement:
<div\1style="width: \3px" class="wp-caption \2">\4
<p class="caption">\5</p>
</div>
Depending on how tolerant of variations in the input it needs to be, you may need to adjust it from there, but that should at least get you started.
Here's an example of how this could be done with preg_replace:
function convert_caption($content)
{
    return preg_replace(
        '/\[caption([^\]]+)align="([^"]+)"\s+width="(\d+)"\](\s*\<img[^>]+>)\s*(.*?)\s*\[\/caption\]/i',
        '<div\1style="width: \3px" class="wp-caption \2">\4<p class="caption">\5</p></div>',
        $content
    );
}

I'm doing this blindly on my phone, but I think you can use the following two regular expressions, one for the opening tag and another for the closing:
Find:
\[caption([^\]]*)\]
Replace:
<div$1>
Find:
\[\/caption\]
Replace:
</div>
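
As a rough sketch, the two-step idea could be wired into PHP like this. The function name `convert_caption_simple` is hypothetical, the group is written as `([^\]]*)` so that it captures the whole attribute list, and the closing pattern escapes the slash because `/` is used as the delimiter. Unlike the single-pattern answer, this keeps `align` and `width` as plain attributes on the div rather than turning them into a class and a style.

```php
<?php
// Sketch: convert [caption ...]...[/caption] to <div ...>...</div> in two passes.
// convert_caption_simple() is a hypothetical name used for illustration.
function convert_caption_simple(string $content): string
{
    // Opening tag: copy the shortcode attributes verbatim onto the <div>.
    $content = preg_replace('/\[caption([^\]]*)\]/i', '<div$1>', $content);
    // Closing tag.
    return preg_replace('/\[\/caption\]/i', '</div>', $content);
}
```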

Selection and Deletion of a tag with specific keyword in WordPress

I use an auto-blogging plugin that aggregates RSS feeds from certain sites to fill the gaps in my blog when I'm not posting. The problem is that the posts come injected with ads, such as AdSense URLs with random dynamic links that usually start with certain URLs. Here is a code snippet:
src="http://rss.feedsportal.com/c/669/f/9809/s/3b7b71e8/sc/5/mf.gif" border="0" /><br clear='all'/><br /><br /><img src="http://da.feedsportal.com/r/199108411625/u/49/f/9809/c/669/s/3b7b71e8/sc/5/rc/1/rc.img" border="0" /><br /><img src="http://da.feedsportal.com/r/199108411625/u/49/f/9809/c/669/s/3b7b71e8/sc/5/rc/2/rc.img" border="0" /><br /><img src="http://da.feedsportal.com/r/199108411625/u/49/f/9809/c/669/s/3b7b71e8/sc/5/rc/3/rc.img" border="0" /><br /><br /><img src="http://da.feedsportal.com/r/199108411625/u/49/f/9809/c/669/s/3b7b71e8/sc/5/a2.img" border="0" /><img width="1" height="1" src="http://pi.feedsportal.com/r/199108411625/u/49/f/9809/c/669/s/3b7b71e8/sc/5/a2t.img" border="0" /><img src="http://feeds.feedburner.com/~r/techradar/allnews/~4/tWczqbBA1yg" height="1" width="1" /><br />
All the injected tags include "*feedsportal.com". How can I select each whole tag that includes this term and replace or delete it in WordPress?
Thanks!

If you want to remove all link tags containing the keyword feedsportal.com, you can try something like this:
// Replace all matches ([^>]* keeps the keyword search inside a single opening tag)
$str = preg_replace("/<a\b[^>]*feedsportal\.com[^>]*>.*?<\/a>/is","",$str);
// Collect matches (run this on the original string, before the replacement, to see what would be removed)
preg_match_all("/<a\b[^>]*feedsportal\.com[^>]*>.*?<\/a>/is",$str,$matches);
// Show matches
print "<pre>";
print_r($matches);
print "</pre>";
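
The snippet in the question shows `<img>` tags rather than links, so a variant aimed at those might look like the sketch below. The function name `strip_feedsportal_images` is hypothetical, and the pattern assumes that no attribute value contains a literal `>`.

```php
<?php
// Sketch: delete every <img> tag that mentions feedsportal.com anywhere
// in its attributes. strip_feedsportal_images() is a hypothetical name.
function strip_feedsportal_images(string $html): string
{
    // [^>]* cannot cross the end of the tag, so only the offending tag is removed.
    return preg_replace('/<img\b[^>]*feedsportal\.com[^>]*>/i', '', $html);
}
```

In WordPress this could, for instance, be hooked into post content filtering, but where to call it depends on how the plugin stores the imported posts.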

XPath & PHP: Matching text after a <br>

I'm trying to sort through a large list of video links. I want to sort them by year (2014, 2013, 2012, and so on). As an example, I'm after an XPath query that matches '2014' and retrieves all the movies for 2014.
My effort: I tried matching text and checking for text after <br>, but that retrieves everything after every <br> in the document!
Maybe something with a text match as well? I.e. after <br> and text() = '2014 - '?
<td>
<table>
<tbody>
<tr>
<td>
<span>
<br>
2014 -
<a id="3447" class="tippable" href="www.examplemovie.com" style="color:#fff">The MovieName1</a>
<br>
2014 -
<a id="3595" class="tippable" href="www.examplemovie.com" style="color:#fff">MovieName2</a>
<br>
Thanks!!

I would suggest trying
//a[preceding-sibling::node()[1][contains(self::text(), '2014')]]
This will actually select the a elements for 2014, which I think is what you're actually after.

You can use the following xpath expression:
//a[contains(preceding-sibling::text()[1], "2014")]
This basically means: give me all a elements whose nearest preceding sibling text node contains 2014.
Demo:
Imagine you have the following index.html file:
<table>
<tbody>
<tr>
<td>
<span>
<br/>
2014 -
<a id="3447" class="tippable" href="www.examplemovie.com" style="color:#fff">The MovieName1</a>
<br/>
2014 -
<a id="3595" class="tippable" href="www.examplemovie.com" style="color:#fff">MovieName2</a>
<br/>
</span>
</td>
</tr>
</tbody>
</table>
Then, here is the output of xmllint xpath test:
$ xmllint index.html --xpath '//a[contains(preceding-sibling::text()[1], "2014")]'
<a id="3447" class="tippable" href="www.examplemovie.com" style="color:#fff">The MovieName1</a>
<a id="3595" class="tippable" href="www.examplemovie.com" style="color:#fff">MovieName2</a>
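
Since the question is about PHP, the same expression can be run through DOMXPath. This is a minimal sketch: the function name `movies_for_year` is made up, and the year is interpolated straight into the expression, so it is assumed to contain no quote characters.

```php
<?php
// Sketch: return the link texts whose nearest preceding text node contains $year.
// movies_for_year() is a hypothetical helper name.
function movies_for_year(string $html, string $year): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // the sample markup is not well-formed XML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Assumes $year contains no double quotes (it is not escaped here).
    $query = '//a[contains(preceding-sibling::text()[1], "' . $year . '")]';

    $titles = [];
    foreach ($xpath->query($query) as $a) {
        $titles[] = trim($a->textContent);
    }
    return $titles;
}
```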

How to get content from a div using regex

I have a string like:
<div class="fck_detail">
<table align="center" border="0" cellpadding="3" cellspacing="0" class="tplCaption" width="1">
<tbody>
<tr><td>
<img alt="nole-1375196668_500x0.jpg" src="http://l.f1.img.vnexpress.net/2013/07/30/nole-1375196668_500x0.jpg" width="500">
</td></tr>
<tr><td class="Image">
Djokovic hậm hực với các đàn anh. Ảnh: <em>Livetennisguide.</em>
</td></tr>
</tbody>
</table>
<p>Riêng với Andy Murray, ...</p>
<p style="text-align:right;"><strong>Anh Hào</strong></p>
</div>
I want to get the content. How do I write this pattern using preg_match? Please help me.

If there are no other HTML tags inside the div, then this regex should work:
$v = '<div class="fck_detail">Some content here</div>';
$regex = '#<div class="fck_detail">([^<]*)</div>#';
preg_match($regex, $v, $matches);
echo $matches[1];
The actual regex here is <div class="fck_detail">([^<]*)</div>. Regexes used in PHP also need to be surrounded by some other character that doesn't occur in the regex (I used #).
However, if what you're parsing is arbitrary HTML provided by the user, then preg_match simply can't do this. Full-fledged HTML parsing is beyond the ability of any regex, and that's what you'll need if you're parsing the output of a full-fledged HTML editor.
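
For markup with nested tags, like the sample in the question, one option is PHP's DOM extension. The sketch below returns the inner HTML of the first div carrying a given class; the function name `div_inner_html` is hypothetical, and the input is assumed to be UTF-8.

```php
<?php
// Sketch: inner HTML of the first <div> that has $class among its classes.
// div_inner_html() is a hypothetical name; returns null when nothing matches.
function div_inner_html(string $html, string $class): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // ignore warnings about imperfect HTML
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html); // prefix hints UTF-8 input
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $div = $xpath->query($query)->item(0);
    if ($div === null) {
        return null;
    }

    // Concatenate the serialized children to get the div's content.
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}
```

For plain text without tags, `$div->textContent` could be used instead of serializing the children.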

Creating pattern to php preg_match_all

I'm using preg_match_all() and my problem is that I can't create the pattern that I want. Example of source text:
<td align='left'>
<span style='font-size: 13px; font-family: Verdana;'><span>
</td>
<td>
<a style='color: #ffff00' rel='gb_page_fs[]' title='Parodyk kitiems 8 seriją' href='/pasidalink-19577x10/'>
<img src="/templates/filmai_black/images/ico_tool_share.gif" />
</a>
</td>
<td>
<small>LT titrai</small>
</td>
<td>
<a rel='gb_page_center[528, 290]' title='Žiūrėti 8 seriją' href='http://www.filmai.in/watch.php?em=BuwgzpqtssiAGGcjeekz9PTI1NjQ0N2E~'>
<img src="/templates/filmai_black/images/play_icon.png" width="20" onclick='set_watched_cookie_serial("19577x10", "done-tick-full-series")' />
</a>
</td>
I am using the pattern:
<td><small>(.*)</small></td>
<td><a rel='gb_page_center[528, 290]' title='Žiūrėti (.*) seriją' href='(.*)'><img src=
I want to get the content in the (.*) location into an array.
Can someone please correct my pattern and explain it?
I want to learn to use regular expressions.

"Don't use Regex to parse HTML" aside,
here are a few uber simple steps to learning Regexp.
Download and install RegexBuddy
Run RegexBuddy
Start with something easy and then FLY! :)
the expression you are looking for is:
<small>(.*)</small>
It matches all characters in between the small tags and stores them in a capture group.
Think of the result as an array: with PREG_PATTERN_ORDER, $result[0] holds the full matches, $result[1] holds the text captured by the first group, and so on.
// command:
preg_match_all('%<small>(.*)</small>%i', $subject, $result, PREG_PATTERN_ORDER);
// $result[0]
Array
(
[0] => <small>LT titrai</small>
)
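
To also pick up the episode number and the link from the question's sample, a sketch along these lines may help. It uses lazy `(.*?)` groups, escapes the literal square brackets in the rel attribute, and adds the `u` modifier because the pattern contains UTF-8 text. The function name `extract_episode` and the array keys are made up for illustration, and the attribute order is assumed to be exactly as in the sample.

```php
<?php
// Sketch: pull the subtitle label, episode number and watch URL out of the
// sample markup. extract_episode() and its array keys are hypothetical.
function extract_episode(string $html): array
{
    $out = [];

    // Lazy (.*?) stops at the first </small> instead of the last one.
    if (preg_match('%<small>(.*?)</small>%is', $html, $m)) {
        $out['label'] = trim($m[1]);
    }

    // \[ and \] match the literal brackets; \s+ allows any whitespace between attributes.
    $link = "%<a\s+rel='gb_page_center\[[^\]]*\]'\s+title='Žiūrėti (.*?) seriją'\s+href='([^']*)'%u";
    if (preg_match($link, $html, $m)) {
        $out['episode'] = $m[1];
        $out['href'] = $m[2];
    }
    return $out;
}
```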
