How to get content from a div using regex - php

I have a string like this:
<div class="fck_detail">
<table align="center" border="0" cellpadding="3" cellspacing="0" class="tplCaption" width="1">
<tbody>
<tr><td>
<img alt="nole-1375196668_500x0.jpg" src="http://l.f1.img.vnexpress.net/2013/07/30/nole-1375196668_500x0.jpg" width="500">
</td></tr>
<tr><td class="Image">
Djokovic hậm hực với các đàn anh. Ảnh: <em>Livetennisguide.</em>
</td></tr>
</tbody>
</table>
<p>Riêng với Andy Murray, ...</p>
<p style="text-align:right;"><strong>Anh Hào</strong></p>
</div>
I want to get the content of this div. How can I write a pattern for this using preg_match? Please help.

If there are no other HTML tags inside the div, then this regex should work:
$v = '<div class="fck_detail">Some content here</div>';
$regex = '#<div class="fck_detail">([^<]*)</div>#';
preg_match($regex, $v, $matches);
echo $matches[1];
The actual regex here is <div class="fck_detail">([^<]*)</div>. Regexes used in PHP also need to be wrapped in delimiters; picking a character that doesn't occur in the regex (I used #) means nothing inside it has to be escaped.
However, if what you're parsing is arbitrary HTML provided by the user, then preg_match simply can't do this. Full-fledged HTML parsing is beyond the ability of any regex, and that's what you'll need if you're parsing the output of a full-fledged HTML editor.
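As a rough sketch of the parser-based route for markup like the question's, PHP's DOMDocument and DOMXPath can pull out everything inside the div, nested tags included. The sample string below is a shortened stand-in for the real input, and the exact match on class="fck_detail" is an assumption (it would miss a div that carries additional classes).

```php
<?php
// Shortened stand-in for the question's markup (assumed sample data).
$html = '<div class="fck_detail"><table><tbody><tr><td class="Image">Caption</td></tr></tbody></table>'
      . '<p>First paragraph</p><p style="text-align:right;"><strong>Author</strong></p></div>';

$doc = new DOMDocument();
// The XML encoding hint asks libxml to read the input as UTF-8;
// @ silences warnings about imperfect markup.
@$doc->loadHTML('<?xml encoding="UTF-8">' . $html);

$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@class="fck_detail"]')->item(0);

// Serialize each child of the div to rebuild its inner HTML.
$inner = '';
if ($div !== null) {
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}
echo $inner, "\n";
```

Unlike the [^<]* pattern, this does not care how many tags are nested inside the div.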

Related

regex find specific tables in html

I have HTML like the example below, and I am using PHP.
<table style="...">
<tbody>
<tr> <img id="foo" src="foo"/></tr>
</tbody>
</table>
<p> ....</p>
<table style="...">
<tbody>
<tr> <img id="bar" src="bar"/></tr>
</tbody>
</table>
I'm just starting out with PHP.
I want to find a specific table, for example the one whose img has a src or id equal to foo or bar, but my patterns select both tables.
Here are my attempts:
1. Find tables that contain an img tag:
/<table.*?>.*?<img *.*?<\/table>/
-> selects both tables
2. Add the img src:
<table.*?<img.+(src=.*?foo).*?<\/table>
-> selects everything, from the first tag to the last tag
3. So I tried not to allow </table> between the ... tags:
<table.*?(?!<\/table>).*?<img.+(src=.*?foo).*?<\/table>
-> same result
I don't know what is wrong.
I solved it using preg_match_all(), but I would still like to know how to do it with preg_match().
Any ideas?
Thanks!
This job is much better suited to PHP's DOMDocument and DOMXPath classes. In this case we use an XPath expression to search for a table that has a descendant img whose src attribute equals 'foo' or 'bar':
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$footable = $xpath->query("//table[descendant::img[@src='foo']]");
echo $footable->item(0)->C14N() . "\n";
$bartable = $xpath->query("//table[descendant::img[@src='bar']]");
echo $bartable->item(0)->C14N() . "\n";
Output:
<table style="..."><tbody><tr><img id="foo" src="foo"></img></tr></tbody></table>
<table style="..."><tbody><tr><img id="bar" src="bar"></img></tr></tbody></table>
Demo on 3v4l.org
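As for why the third attempt in the question changes nothing: a lookahead such as (?!<\/table>) is checked only at the single position where it appears, so the .*? that follows is still free to run across a closing </table>. If a regex is wanted anyway, one common workaround is to repeat the lookahead before every character. The sketch below assumes the tables are not nested and that src is written with double quotes; the sample string is the question's markup on one line.

```php
<?php
$html = '<table style="..."><tbody><tr> <img id="foo" src="foo"/></tr></tbody></table>'
      . '<p> ....</p>'
      . '<table style="..."><tbody><tr> <img id="bar" src="bar"/></tr></tbody></table>';

// (?:(?!<\/table>).)*? consumes one character at a time, but only while
// the text at that position does not start a closing </table> tag,
// so a match can never spill over from one table into the next.
$pattern = '/<table\b(?:(?!<\/table>).)*?<img\b[^>]*\bsrc="bar"[^>]*>(?:(?!<\/table>).)*?<\/table>/s';

if (preg_match($pattern, $html, $m)) {
    echo $m[0], "\n";
}
```

With this input, preg_match() returns only the second table. The XPath version above remains the more robust choice once the markup gets less regular.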

How to convert Wordpress caption tag to html div tag

I converted a website from WordPress, and some of the posts have a caption tag like the following:
[caption id="attachment_666" align="alignleft" width="316"]
<img class="wp-image-92692" src="img" width="316" alt="fitbit-yoga-lady.png" height="210">
text
[/caption]
I would like to catch all of these captions and convert them to the following:
<div id="attachment_666" style="width: 326px" class="wp-caption alignleft">
<img class="wp-image-92692" src="img" alt="fitbit-yoga-lady.png" width="316" height="210">
<p class="caption">text</p>
</div>
Well, given the exact text that you provided, the following should work.
Search Pattern:
\[caption([^\]]+)align="([^"]+)"\s+width="(\d+)"\](\s*\<img[^>]+>)\s*(.*?)\s*\[\/caption\]
Replacement:
<div\1style="width: \3px" class="wp-caption \2">\4
<p class="caption">\5</p>
</div>
Depending on how tolerant of variations in the input it needs to be, you may need to adjust it from there, but that should at least get you started.
Here's an example of how this could be done with preg_replace:
function convert_caption($content)
{
    return preg_replace(
        '/\[caption([^\]]+)align="([^"]+)"\s+width="(\d+)"\](\s*\<img[^>]+>)\s*(.*?)\s*\[\/caption\]/i',
        '<div\1style="width: \3px" class="wp-caption \2">\4<p class="caption">\5</p></div>',
        $content);
}
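Note that the desired output in the question uses width: 326px for a caption with width="316", while a plain replacement string can only copy the captured number. If that extra 10px is intended, a callback can do the arithmetic. This is a minimal sketch: the function name convert_caption_padded and the $padding parameter are made up for illustration, and the +10 is only inferred from the question's example.

```php
<?php
// Hypothetical helper: the name and the default padding of 10 are assumptions
// inferred from the 316 -> 326 difference in the question's desired output.
function convert_caption_padded(string $content, int $padding = 10): string
{
    $pattern = '/\[caption([^\]]+)align="([^"]+)"\s+width="(\d+)"\](\s*<img[^>]+>)\s*(.*?)\s*\[\/caption\]/is';

    $converted = preg_replace_callback($pattern, function (array $m) use ($padding): string {
        // Captured width plus padding gives the div width.
        $width = (int) $m[3] + $padding;
        return '<div' . $m[1] . 'style="width: ' . $width . 'px" class="wp-caption ' . $m[2] . '">'
            . $m[4] . '<p class="caption">' . $m[5] . '</p></div>';
    }, $content);

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $converted ?? $content;
}

$input = '[caption id="attachment_666" align="alignleft" width="316"]'
       . '<img class="wp-image-92692" src="img" width="316" alt="fitbit-yoga-lady.png" height="210">'
       . ' text [/caption]';
$output = convert_caption_padded($input);
echo $output, "\n";
```

The s modifier is added so that caption text spanning several lines is still captured by (.*?).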
I'm doing this blindly on my phone, but I think you can use the following two regular expressions, one for the opening tag and another for the closing:
Find:
\[caption([^\]]*)\]
Replace:
<div$1>
Find:
\[\/caption\]
Replace:
</div>

Delete tag content using preg_replace

I have a long string in PHP that contains a few tables with text between them (HTML in a string). I need to delete all tables with a specific class (fQuote) from it, but my regular expression selects everything from the first table's opening tag to the last closing table tag.
My Regular Expression:
$content = preg_replace('/<table\s[^>]*class=\"[^\"]*fQuote[^\"]*\"[^>]*>.*<\/table>\s?/ix','',$content);
My test string:
<table class="fQuote simple_table"><tbody><tr><td class="fQuoteUser"><img class="fQuoteAvatar" src="http://localhost/images/avatars/na.png" /><br /><a class="clear" href="http://localhost/user/1">User1</a></td><td class="fQuoteCorner"> </td><td class="fQuoteTable"><table class="default_table"><tbody><tr><td class="fQuoteBody">Some text</td></tr><tr><td class="fQuoteDate">Quoting a message <a class="clear fQuotedID" href="http://localhost/p/92">#9</a> 12 december 2011 23:43</td></tr></tbody></table></td></tr></tbody></table><p>text needed to stay</p><table class="fQuote simple_table" style="width:932px"><tbody><tr><td class="fQuoteUser"><img class="fQuoteAvatar" src="http://localhost/images/avatars/na.png" /><br /><a class="clear" href="http://localhost/user/2">User2</a></td><td class="fQuoteCorner"> </td><td class="fQuoteTable"><table class="default_table" style="height:187px; width:830px"><tbody><tr><td class="fQuoteBody"><br />Quoted<br /><table align="center" cellpadding="0" cellspacing="1" style="height:34px; width:738px"><tbody><tr><td><table cellpadding="0" cellspacing="0" style="height:32px; width:736px"><tbody><tr><td><em><a class="main_normal" href="http://localhost/p/444" target="_blank">#444</a> <strong>User3</strong>15.12.2011 16:56<br />Some text2</em></td></tr></tbody></table></td></tr></tbody></table><br /><br /><br />some text3</td></tr><tr><td class="fQuoteDate">Quoting a message <a class="clear fQuotedID" href="http://localhost/p/447">#22</a> 15 december 2011 16:33<br /></td></tr></tbody></table></td></tr></tbody></table>
I need to delete everything from this concrete test case except
<p>text needed to stay</p>
With my regular expression, the match starts at the first opening table tag and runs to the last closing table tag, which also removes the phrase that should stay and leaves an empty string.
Please help me find the correct regular expression. Thanks.
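Because the fQuote tables contain further nested tables, a regex has no reliable way to tell which </table> closes which table; making .* lazy would stop at the first inner </table> instead. One possible approach, sketched here with DOMDocument and DOMXPath rather than preg_replace, removes each matching table node together with everything inside it. The input is a shortened stand-in for the test string, and the wrapper div with id "wrap" is an assumption added only so the fragment can be read back.

```php
<?php
// Shortened stand-in for the test string: two fQuote tables, each with a nested table.
$content = '<table class="fQuote simple_table"><tbody><tr><td class="fQuoteTable">'
         . '<table class="default_table"><tbody><tr><td>Some text</td></tr></tbody></table>'
         . '</td></tr></tbody></table>'
         . '<p>text needed to stay</p>'
         . '<table class="fQuote simple_table" style="width:932px"><tbody><tr><td>'
         . '<table><tbody><tr><td>Quoted</td></tr></tbody></table>'
         . '</td></tr></tbody></table>';

$doc = new DOMDocument();
// Wrap the fragment in a known container; @ silences warnings about imperfect markup.
@$doc->loadHTML('<div id="wrap">' . $content . '</div>');

$xpath = new DOMXPath($doc);
// Match "fQuote" as a whole class token, not as a substring of another class name.
$query = '//table[contains(concat(" ", normalize-space(@class), " "), " fQuote ")]';

foreach ($xpath->query($query) as $table) {
    // Removing the node also removes every nested table inside it.
    $table->parentNode->removeChild($table);
}

// Serialize what is left inside the wrapper.
$result = '';
foreach ($xpath->query('//div[@id="wrap"]')->item(0)->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result, "\n";
```

For the stand-in input, only the paragraph between the two tables remains.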

get image src from HTML with regex

I have HTML like
<td class="td_scheda_modello_dati">
<img src="/webapp/safilo/gen_img/p_verde.gif" width="15" height="15" alt="" border="0">
</td>
I want to extract the img src from this HTML using preg_match_all().
I have done this
preg_match_all('#<td class=td_scheda_modello_dati>(.*)<td>#',$detail,$detailsav);
It should give me the whole img tag, but it doesn't. What changes should I make to get that specific value?
You should not use Regex, but instead an HTML parser. Here's how.
<?php
$html = '<img src="/webapp/safilo/gen_img/p_verde.gif" width="15" height="15" alt="" border="0">';
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings about imperfect markup
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)");
echo $src;
?>
Try this code.
$html_text = '<td class="td_scheda_modello_dati">
<img src="/webapp/safilo/gen_img/p_verde.gif" width="15" height="15" alt="" border="0"></td>';
preg_match( '/src="([^"]*)"/i', $html_text , $res_array ) ;
print_r($res_array);
Try using the s modifier after your regex. The default behavior for the dot character is not to match newlines (which your example has).
Something like:
preg_match_all('#<td class="td_scheda_modello_dati">(.*)</td>#s',$detail,$detailsav);
Should do the trick.
It's worth reading up a bit on modifiers, the more you do with regex the more useful they become.
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
Edit: also, just realized that the code posted was missing a closing td tag (it was <td> instead of </td>). Fixed my example to reflect that.
Try this: <img[^>]*src="([^"]*/gen_img/p_verde\.gif)"

Creating pattern to php preg_match_all

I'm using preg_match_all() and my problem is that I can't create the pattern that I want. Example of source text:
<td align='left'>
<span style='font-size: 13px; font-family: Verdana;'><span>
</td>
<td>
<a style='color: #ffff00' rel='gb_page_fs[]' title='Parodyk kitiems 8 seriją' href='/pasidalink-19577x10/'>
<img src="/templates/filmai_black/images/ico_tool_share.gif" />
</a>
</td>
<td>
<small>LT titrai</small>
</td>
<td>
<a rel='gb_page_center[528, 290]' title='Žiūrėti 8 seriją' href='http://www.filmai.in/watch.php?em=BuwgzpqtssiAGGcjeekz9PTI1NjQ0N2E~'>
<img src="/templates/filmai_black/images/play_icon.png" width="20" onclick='set_watched_cookie_serial("19577x10", "done-tick-full-series")' />
</a>
</td>
I am using the pattern:
<td><small>(.*)</small></td>
<td><a rel='gb_page_center[528, 290]' title='Žiūrėti (.*) seriją' href='(.*)'><img src=
I want to get the content in the (.*) location into an array.
Can someone please correct my pattern and explain it?
I want to learn to use regular expressions.
"Don't use regex to parse HTML" aside, here are a few very simple steps for learning regular expressions:
1. Download and install RegexBuddy
2. Run RegexBuddy
3. Start with something easy and then FLY! :)
The expression you are looking for is:
<small>(.*)</small>
It finds all the characters between the small tags and captures them in a group.
preg_match_all() stores the results in an array. With PREG_PATTERN_ORDER, $result[0] holds the full matches and $result[1] holds the text captured by the first group.
// command:
preg_match_all('%<small>(.*)</small>%i', $subject, $result, PREG_PATTERN_ORDER);
// $result[0]
Array
(
[0] => <small>LT titrai</small>
)
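To also capture the episode number and the link, as the question asks, the two patterns can be joined with \s* between the tags so that the line breaks in the source no longer get in the way. This is only a sketch that assumes the markup looks exactly like the sample (single-quoted attributes in this order, the file saved as UTF-8); the subject below is cut down from the question's text.

```php
<?php
// Cut-down copy of the question's sample text (assumed to be representative).
$subject = <<<'HTML'
<td>
<small>LT titrai</small>
</td>
<td>
<a rel='gb_page_center[528, 290]' title='Žiūrėti 8 seriją' href='http://www.filmai.in/watch.php?em=BuwgzpqtssiAGGcjeekz9PTI1NjQ0N2E~'>
<img src="/templates/filmai_black/images/play_icon.png" width="20" />
</a>
</td>
HTML;

// \s* absorbs whitespace and line breaks between tags, [^']* stops at the
// closing quote, \[ and \] match literal brackets, and the u modifier makes
// PCRE treat pattern and subject as UTF-8.
$pattern = <<<'REGEX'
%<small>(.*?)</small>\s*</td>\s*<td>\s*<a rel='gb_page_center\[[^\]]*\]' title='Žiūrėti ([^']*) seriją' href='([^']*)'%su
REGEX;

// PREG_SET_ORDER groups the captures per match: $row[1], $row[2], $row[3].
preg_match_all($pattern, $subject, $result, PREG_SET_ORDER);

foreach ($result as $row) {
    echo $row[1], ' | ', $row[2], ' | ', $row[3], "\n";
}
```

Each $row holds the subtitle label, the episode number and the href of one match, in that order.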
