XPath & PHP: Matching text after a <br>

I'm trying to sort through a large list of video links. I want to sort them by year (2014, 2013, 2012, and so on). As an example, I'm after an XPath query that matches the '2014' text and retrieves all the movies from 2014.
What I've tried: matching the text that comes after a <br>, but that retrieves everything after every <br> in the document!
Maybe I need a text match as well, i.e. the text after a <br> where text() = '2014 - '?
<td>
<table>
<tbody>
<tr>
<td>
<span>
<br>
2014 -
<a id="3447" class="tippable" href="www.examplemovie.com" style="color:#fff">The MovieName1</a>
<br>
2014 -
<a id="3595" class="tippable" href="www.examplemovie.com" style="color:#fff">MovieName2</a>
<br>
Thanks!!

I would suggest trying
//a[preceding-sibling::node()[1][contains(self::text(), '2014')]]
This selects the a elements for 2014 directly (each a whose immediately preceding sibling node is a text node containing '2014'), which I think is what you're actually after.
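For use from PHP, a minimal sketch of the same expression with the built-in DOMDocument and DOMXPath classes might look like this. The helper name `movies_for_year()` is hypothetical, and the sample markup is a trimmed version of the question's HTML with an extra, made-up 2013 entry so the filter has something to exclude. Because the year is interpolated into the query string, the sketch checks that it is four digits first.

```php
<?php
// Sketch: select <a> elements whose immediately preceding sibling node is a
// text node containing the given year (the XPath from the answer above).
// movies_for_year() is a hypothetical helper name.
function movies_for_year(string $html, string $year): array
{
    // Only allow a plain four-digit year into the query string.
    if (!preg_match('/^\d{4}$/', $year)) {
        throw new InvalidArgumentException('Year must be four digits');
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $query = "//a[preceding-sibling::node()[1][contains(self::text(), '$year')]]";
    $titles = [];
    foreach ($xpath->query($query) as $a) {
        $titles[] = trim($a->textContent);
    }
    return $titles;
}

// Trimmed sample; the 2013 entry is hypothetical.
$html = '<span><br>2014 - <a href="#">The MovieName1</a><br>2013 - <a href="#">OlderMovie</a>'
      . '<br>2014 - <a href="#">MovieName2</a><br></span>';
echo implode(', ', movies_for_year($html, '2014')), PHP_EOL;
```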

You can use the following xpath expression:
//a[contains(preceding-sibling::text()[1], "2014")]
This basically means: give me all the a elements whose nearest preceding text sibling contains 2014.
Demo:
Imagine you have the following index.html file:
<table>
<tbody>
<tr>
<td>
<span>
<br/>
2014 -
<a id="3447" class="tippable" href="www.examplemovie.com" style="color:#fff">The MovieName1</a>
<br/>
2014 -
<a id="3595" class="tippable" href="www.examplemovie.com" style="color:#fff">MovieName2</a>
<br/>
</span>
</td>
</tr>
</tbody>
</table>
Here is the output of testing the expression with xmllint:
$ xmllint index.html --xpath '//a[contains(preceding-sibling::text()[1], "2014")]'
<a id="3447" class="tippable" href="www.examplemovie.com" style="color:#fff">The MovieName1</a>
<a id="3595" class="tippable" href="www.examplemovie.com" style="color:#fff">MovieName2</a>
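Since the end goal is to sort the whole list by year, the same axis can also be evaluated once per link and the results grouped, instead of writing one query per year. A sketch under stated assumptions: `group_movies_by_year()` is a hypothetical helper, the 2013 entry in the sample is made up, and the year is taken to be the first four-digit number in the text just before each link.

```php
<?php
// Sketch: group every link by the year in its nearest preceding text sibling
// (the preceding-sibling::text()[1] step from the answer above), newest first.
// group_movies_by_year() is a hypothetical helper name.
function group_movies_by_year(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $byYear = [];
    foreach ($xpath->query('//a') as $a) {
        // Evaluate relative to this <a>: string value of the nearest text before it.
        $label = $xpath->evaluate('string(preceding-sibling::text()[1])', $a);
        if (preg_match('/\b(\d{4})\b/', $label, $m)) {
            $byYear[(int) $m[1]][] = trim($a->textContent);
        }
    }
    krsort($byYear); // descending year order
    return $byYear;
}

// Trimmed sample; the 2013 entry is hypothetical.
$html = '<span><br/>2013 - <a href="#">OlderMovie</a>'
      . '<br/>2014 - <a href="#">The MovieName1</a>'
      . '<br/>2014 - <a href="#">MovieName2</a><br/></span>';
foreach (group_movies_by_year($html) as $year => $titles) {
    echo $year, ': ', implode(', ', $titles), PHP_EOL;
}
```

Links with no year in the preceding text are simply skipped, which keeps stray anchors elsewhere on the page out of the result.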

Related

Delete tag content using preg_replace

I have a long string in PHP which contains a few tables and some text between them (HTML in a string). I need to delete all tables with a specific class (fQuote) from it, but when I use my regular expression, it selects all the content from the first table occurrence to the last closing table tag.
My Regular Expression:
$content = preg_replace('/<table\s[^>]*class=\"[^\"]*fQuote[^\"]*\"[^>]*>.*<\/table>\s?/ix','',$content);
My test string:
<table class="fQuote simple_table"><tbody><tr><td class="fQuoteUser"><img class="fQuoteAvatar" src="http://localhost/images/avatars/na.png" /><br /><a class="clear" href="http://localhost/user/1">User1</a></td><td class="fQuoteCorner"> </td><td class="fQuoteTable"><table class="default_table"><tbody><tr><td class="fQuoteBody">Some text</td></tr><tr><td class="fQuoteDate">Quoting a message <a class="clear fQuotedID" href="http://localhost/p/92">#9</a> 12 december 2011 23:43</td></tr></tbody></table></td></tr></tbody></table><p>text needed to stay</p><table class="fQuote simple_table" style="width:932px"><tbody><tr><td class="fQuoteUser"><img class="fQuoteAvatar" src="http://localhost/images/avatars/na.png" /><br /><a class="clear" href="http://localhost/user/2">User2</a></td><td class="fQuoteCorner"> </td><td class="fQuoteTable"><table class="default_table" style="height:187px; width:830px"><tbody><tr><td class="fQuoteBody"><br />Quoted<br /><table align="center" cellpadding="0" cellspacing="1" style="height:34px; width:738px"><tbody><tr><td><table cellpadding="0" cellspacing="0" style="height:32px; width:736px"><tbody><tr><td><em><a class="main_normal" href="http://localhost/p/444" target="_blank">#444</a> <strong>User3</strong>15.12.2011 16:56<br />Some text2</em></td></tr></tbody></table></td></tr></tbody></table><br /><br /><br />some text3</td></tr><tr><td class="fQuoteDate">Quoting a message <a class="clear fQuotedID" href="http://localhost/p/447">#22</a> 15 december 2011 16:33<br /></td></tr></tbody></table></td></tr></tbody></table>
I need to delete everything from this concrete test case except
<p>text needed to stay</p>
When I use my regular expression, it matches from the first opening table tag all the way to the last closing table tag, which also removes the phrase that should stay and leaves me with an empty string.
Please help me with the correct regular expression. Thanks.
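Each fQuote table in the test string contains nested tables, so no single regex pairs the outer `<table>` with its own `</table>` reliably: the greedy `.*` runs to the last closing tag, and a lazy `.*?` would stop at the first nested one. A sketch that swaps the regex for PHP's DOMDocument/DOMXPath, assuming the string is a UTF-8 HTML fragment; `remove_fquote_tables()` and the wrapper id are hypothetical names, and the sample is a shortened version of the test string.

```php
<?php
// Sketch: remove every <table> whose class list contains the token "fQuote"
// (any tables nested inside it go with it). remove_fquote_tables() is hypothetical.
function remove_fquote_tables(string $content): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    // The XML prolog hint makes libxml read the input as UTF-8;
    // the wrapper div lets us serialise only the fragment afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"?><div id="fragment-wrap">' . $content . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Whole-token class match, so classes like "fQuoteBody" don't count.
    $query = '//table[contains(concat(" ", normalize-space(@class), " "), " fQuote ")]';
    foreach ($xpath->query($query) as $table) {
        // Detach the matching table from wherever it sits.
        if ($table->parentNode !== null) {
            $table->parentNode->removeChild($table);
        }
    }

    $out = '';
    $wrap = $xpath->query('//div[@id="fragment-wrap"]')->item(0);
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$content = '<table class="fQuote simple_table"><tbody><tr><td>'
         . '<table class="default_table"><tbody><tr><td>Some text</td></tr></tbody></table>'
         . '</td></tr></tbody></table><p>text needed to stay</p>'
         . '<table class="fQuote simple_table"><tbody><tr><td>Quoted</td></tr></tbody></table>';
echo remove_fquote_tables($content), PHP_EOL;
```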

Get text before or after html tags using Xpath

I have some HTML (simplified here) and need to write an XPath expression to get the phone number.
<td>
<font>
<b>
<font size="2">
Some link
</font>
</b>
<br>
Abc Address
<br>
Country name
<br>
(123) 456-7890
<hr>
A sentence here..
<img src="/images/abc.gif">
</font>
</td>
I can extract the text inside the anchor tag like this:
->filterXPath('//font//b//a')->extract('_text'); //returns some link
How do I extract the text (123) 456-7890 after the last <br> tag or before the first <hr> tag? I have visited this link, but I couldn't understand it properly.
I have also tried this:
->filterXPath('//font//br[last()]')->extract('_text'); // returns an empty result
A br element has no text content of its own, so select the last br, then the first text node that follows it:
//font/br[last()]/following-sibling::text()[1]
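The question's `filterXPath()` wrapper isn't shown in full, so this sketch runs the answer's expression with PHP's built-in DOMXPath instead, on the HTML from the question. The matched text node includes the surrounding newlines, hence the `trim()`.

```php
<?php
// Sketch: evaluate //font/br[last()]/following-sibling::text()[1] with DOMXPath
// and strip the whitespace around the resulting text node.
$html = <<<'HTML'
<td>
<font>
<b>
<font size="2">
Some link
</font>
</b>
<br>
Abc Address
<br>
Country name
<br>
(123) 456-7890
<hr>
A sentence here..
<img src="/images/abc.gif">
</font>
</td>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // the fragment is not a complete document
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//font/br[last()]/following-sibling::text()[1]');
$phone = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
echo $phone, PHP_EOL;
```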

How to get content from a div using regex

I have a string like this:
<div class="fck_detail">
<table align="center" border="0" cellpadding="3" cellspacing="0" class="tplCaption" width="1">
<tbody>
<tr><td>
<img alt="nole-1375196668_500x0.jpg" src="http://l.f1.img.vnexpress.net/2013/07/30/nole-1375196668_500x0.jpg" width="500">
</td></tr>
<tr><td class="Image">
Djokovic is frustrated with his seniors. Photo: <em>Livetennisguide.</em>
</td></tr>
</tbody>
</table>
<p>As for Andy Murray, ...</p>
<p style="text-align:right;"><strong>Anh Hào</strong></p>
</div>
I want to get the content inside the div. How do I write this pattern using preg_match? Please help me.
If there are no other HTML tags inside the div, then this regex should work:
$v = '<div class="fck_detail">Some content here</div>';
$regex = '#<div class="fck_detail">([^<]*)</div>#';
preg_match($regex, $v, $matches);
echo $matches[1];
The actual regex here is <div class="fck_detail">([^<]*)</div>. In PHP, the pattern must also be wrapped in delimiters: a character that doesn't otherwise occur in the regex (I used #).
However, if what you're parsing is arbitrary HTML provided by the user, then preg_match simply can't do this. Full-fledged HTML parsing is beyond the ability of any regex, and that's what you'll need if you're parsing the output of a full-fledged HTML editor.
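The div in the question does contain nested tags (a table, paragraphs, `<strong>`), so the `[^<]*` pattern above won't match it as-is. A sketch of the parser-based route with PHP's DOMDocument, assuming what you want is the div's inner HTML; `fck_detail_inner_html()` is a hypothetical name and the sample input is shortened.

```php
<?php
// Sketch: return the inner HTML of the first <div class="fck_detail">, letting the
// DOM handle nested tags. fck_detail_inner_html() is a hypothetical helper name.
function fck_detail_inner_html(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    // The XML prolog hint makes libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"?>' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Whole-token match on the class attribute.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " fck_detail ")]';
    $div = $xpath->query($query)->item(0);
    if ($div === null) {
        return null; // no matching div
    }
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

$html = '<div class="fck_detail"><p>First paragraph</p><p><em>Second</em></p></div>';
echo fck_detail_inner_html($html), PHP_EOL;
```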

Creating pattern to php preg_match_all

I'm using preg_match_all() and my problem is that I can't create the pattern that I want. Example of source text:
<td align='left'>
<span style='font-size: 13px; font-family: Verdana;'><span>
</td>
<td>
<a style='color: #ffff00' rel='gb_page_fs[]' title='Parodyk kitiems 8 seriją' href='/pasidalink-19577x10/'>
<img src="/templates/filmai_black/images/ico_tool_share.gif" />
</a>
</td>
<td>
<small>LT titrai</small>
</td>
<td>
<a rel='gb_page_center[528, 290]' title='Žiūrėti 8 seriją' href='http://www.filmai.in/watch.php?em=BuwgzpqtssiAGGcjeekz9PTI1NjQ0N2E~'>
<img src="/templates/filmai_black/images/play_icon.png" width="20" onclick='set_watched_cookie_serial("19577x10", "done-tick-full-series")' />
</a>
</td>
I am using the pattern:
<td><small>(.*)</small></td>
<td><a rel='gb_page_center[528, 290]' title='Žiūrėti (.*) seriją' href='(.*)'><img src=
I want to get the content at each (.*) position into an array.
Can someone please correct my pattern and explain it?
I want to learn to use regular expressions.
"Don't use Regex to parse HTML" aside,
here are a few uber simple steps to learning regular expressions:
Download and install RegexBuddy
Run RegexBuddy
Start with something easy and then FLY! :)
The expression you are looking for is:
<small>(.*)</small>
It matches all the characters between <small> tags and captures them in a group (a backreference).
Think of the result as an array: with PREG_PATTERN_ORDER, $result[0] holds every full match, $result[1] every match of the first capture group, and so on.
// command:
preg_match_all('%<small>(.*)</small>%i', $subject, $result, PREG_PATTERN_ORDER);
// $result[0] (full matches):
Array
(
[0] => <small>LT titrai</small>
)
// $result[1] (first capture group):
Array
(
[0] => LT titrai
)
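The answer's pattern only fills the first capture. For all three values the question asks about (subtitle label, episode number and link), a sketch of one `preg_match_all()` pattern follows. It assumes the markup looks exactly like the sample (same `rel` value, tags on separate lines); the sample is trimmed, and the usual caveat about parsing HTML with regex still applies.

```php
<?php
// Sketch: capture the subtitle label, episode number and watch URL in one pass.
// \s* absorbs the newlines between tags; (.*?) and [^']* stop at the first
// closing tag or quote instead of running on to the end of the line.
$subject = <<<'HTML'
<td>
<small>LT titrai</small>
</td>
<td>
<a rel='gb_page_center[528, 290]' title='Žiūrėti 8 seriją' href='http://www.filmai.in/watch.php?em=BuwgzpqtssiAGGcjeekz9PTI1NjQ0N2E~'>
<img src="/templates/filmai_black/images/play_icon.png" width="20" />
</a>
</td>
HTML;

// ~ as the delimiter, because the pattern contains both / and '.
// Literal [ and ] in the rel value are escaped; /u because the title is UTF-8.
$pattern = "~<small>(.*?)</small>\s*</td>\s*<td>\s*"
         . "<a rel='gb_page_center\[528, 290\]' title='Žiūrėti (\d+) seriją' href='([^']*)'>~u";

preg_match_all($pattern, $subject, $result, PREG_SET_ORDER);
foreach ($result as $row) {
    // $row[0] is the whole match; $row[1], $row[2], $row[3] are the groups.
    echo $row[1], ' | ', $row[2], ' | ', $row[3], PHP_EOL;
}
```

PREG_SET_ORDER keeps each row's captures together, which is usually handier than PREG_PATTERN_ORDER when every match describes one episode.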

What's going on with this space between my images? PHP/HTML

So, I'm currently programming my own website. The design is far from finished, so don't judge it xD. I have a very small problem: there's a gap where there shouldn't be one (the red square in the picture below). Do you know what causes it?
Thanks in advance!
My Code:
(the while loop is where the pictures are displayed)
JustBasti's website
Home
<a href='index.php?mod=news'>News</a>
<a href='index.php?mod=allnews'>All News</a>
<a href='index.php?mod=gallery'>Gallery</a>
<a href='index.php?mod=guestbook'>Guestbook</a>
<a href='index.php?mod=admin'>Administrator</a><table>
<tr>
<td>Nam liber tempor</td>
</tr>
<tr>
<td>Saturday, 11th Jun 2011, 7:00 pm</td>
</tr>
<tr>
<td><a href='albums/110611190045 - Nam liber tempor/aus.png' rel='lightbox[testalbum]'
title='aus.png'><img src='albums/110611190045 - Nam liber tempor/thumbs/aus.png' /> </a>
</td>
<td><a href='albums/110611190045 - Nam liber tempor/airport.png' rel='lightbox[testalbum]'
title='airport.png'><img src='albums/110611190045 - Nam liber tempor/thumbs/airport.png' /> </a>
</td>
<td><a href='albums/110611190045 - Nam liber tempor/fam.png' rel='lightbox[testalbum]'
title='fam.png'><img src='albums/110611190045 - Nam liber tempor/thumbs/fam.png' /> </a>
</td>
<td><a href='albums/110611190045 - Nam liber tempor/way.png' rel='lightbox[testalbum]'
title='way.png'><img src='albums/110611190045 - Nam liber tempor/thumbs/way.png' /> </a>
</td>
</tr>
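<?php
// Note: the mysql_* functions were removed in PHP 7; mysqli or PDO replace them.
while ($photos = mysql_fetch_array($photo)) {
    $url_thumb = $photos['url_thumb'];
    $url = $photos['url'];
    $title = $photos['title'];
?>
<td><a href='<?php echo htmlspecialchars($url, ENT_QUOTES); ?>' rel='lightbox[testalbum]'
title='<?php echo htmlspecialchars($title, ENT_QUOTES); ?>'><img src='<?php echo htmlspecialchars($url_thumb, ENT_QUOTES); ?>' /> </a>
</td>
<?php
}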
The website is not online yet. I didn't use any CSS; I just coded it in PHP and HTML.
You need to add colspan=3 or so (as many columns as the table has) to your first 3 rows:
the navigation bar
the latin quote
and the date
Personally I wouldn't even put them in the table, but that's me. Anyway, your problem is that those 3 rows each sit in a single cell, which forces the first column to be wider than the image.
Problems like this are often caused by whitespace in the HTML. Try deleting the spaces in your code, for example:
<a> </a> --> <a></a>
and you'll see the gaps disappear.
If you are using Firefox, try the Firebug plugin: right-click on the empty space and choose "edit"; its left panel will show you all the margins, paddings, etc.
If you are using Chrome, you don't need to install anything, as similar tools are built in by default.
Another great Firefox plugin is WebDeveloper: in its menu, go to "select" > "select element" and click on the empty space.
Also you should use:
<tr>
<td colspan="4">Nam liber tempor</td>
</tr>
<tr>
<td colspan="4">Saturday, 11th Jun 2011, 7:00 pm</td>
</tr>
if you haven't got it in your CSS already
