If I want to inspect a table row with xpath

As part of a cURL operation I have some parsing to do. The data I want resides at ../table/tr/td, where the td cells contain many strings, one of which is <b>34 PT</b>. The number is random, however, and I cannot figure out how to simply match it with a 'wildcard' or similar.
The suggestions I've found:
/tr[contains(@td, 'PT')]
does not return any results, nor does:
/tr/td[contains( @b, 'PT' ) ]
When I remove the predicate at the end, the query returns all of the cells as expected, so I know the data is there. The cells that contain PT also have an <a href> whose value I need.
Here is an example of the entire html:
<table>
<tr>
<td>
<tr>
<td width="120" valign="top" align="center">
<a href="submit.phtml?PT_id=86343434&xcn=b22c57866bfc2bac89b09527b05b7760&location_id=0">
<img height="80" width="80" border="1" alt="" src=".gif">
</a>
<b>3423 PT</b>
<td>
<td>
<tr>
<td> ...and so on
The XPath query is used like this:
$dom = new DOMDocument();
@$dom->loadHTML( $rawPage ); // @ suppresses libxml warnings about malformed HTML
$xpath = new DOMXPath( $dom );
$queryResult = $xpath->query( "//html/body/div[3]/div[3]/table/tr/td[2]/table[2]/tr/td/div/div/table/tr[2]/td/table/tr/td[contains( b, 'PT' ) ]" );

Remove your @ symbol so the expression tests the text of the b child element instead of an attribute named b,
i.e. /tr/td[contains( b, 'PT' ) ]
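To also get the link the question asks about, the same predicate can be followed by a path to the href attribute. A minimal sketch, assuming (as in the question's HTML) that the <a href> and the <b> sit inside the same td; the sample markup and the shortened href are made up for illustration.

```php
<?php
// Hypothetical sample markup modelled on the question's HTML (href shortened).
$rawPage = '<table><tr>'
    . '<td width="120" valign="top" align="center">'
    . '<a href="submit.phtml?PT_id=86343434&amp;location_id=0"><img src=".gif" alt=""></a>'
    . '<b>3423 PT</b></td>'
    . '<td><a href="other.phtml">no match</a><b>12 XY</b></td>'
    . '</tr></table>';

$dom = new DOMDocument();
// @ silences libxml warnings about imperfect real-world HTML.
@$dom->loadHTML($rawPage);
$xpath = new DOMXPath($dom);

// "b" (no @) means the string value of the cell's first <b> child element;
// "/a/@href" then selects the href attribute of the link in that same cell.
$hrefs = [];
foreach ($xpath->query("//td[contains(b, 'PT')]/a/@href") as $attr) {
    $hrefs[] = $attr->value; // DOMAttr::$value is already entity-decoded
}
print_r($hrefs);
```

Note that contains(b, 'PT') only looks at the first b child of each cell; if a cell may hold several b elements, //td[b[contains(., 'PT')]]/a/@href tests each of them.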

Related

regex find specific tables in html

I have HTML like the following, and I'm using PHP:
<table style="...">
<tbody>
<tr> <img id="foo" src="foo"/></tr>
</tbody>
</table>
<p> ....</p>
<table style="...">
<tbody>
<tr> <img id="bar" src="bar"/></tr>
</tbody>
</table>
I'm new to PHP.
I want to find a specific table, e.g. the one whose img src or id equals foo (or bar),
but my regex selects both tables.
Here are my attempts:
1. Find tables that contain an img tag:
/<table.*?>.*?<img *.*?<\/table>/
-> selects both tables
2. Add the img src:
<table.*?<img.+(src=.*?foo).*?<\/table>
-> selects everything, from the first <table> tag to the last </table>
3. So I tried to exclude </table> between the tags:
<table.*?(?!<\/table>).*?<img.+(src=.*?foo).*?<\/table>
-> same result
I don't know what is wrong!
I solved it using preg_match_all(), but I still want to know how to do it with preg_match().
Does anyone have an idea?
Thanks!

This job is much better suited to PHP's DOMDocument and DOMXPath classes. Here we use an XPath expression to search for a table that has a descendant img whose src attribute equals 'foo' (or 'bar'):
// $html holds the markup shown in the question
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$footable = $xpath->query("//table[descendant::img[@src='foo']]");
echo $footable->item(0)->C14N() . "\n";
$bartable = $xpath->query("//table[descendant::img[@src='bar']]");
echo $bartable->item(0)->C14N() . "\n";
Output:
<table style="..."><tbody><tr><img id="foo" src="foo"></img></tr></tbody></table>
<table style="..."><tbody><tr><img id="bar" src="bar"></img></tr></tbody></table>
Demo on 3v4l.org
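If both tables are wanted in one pass, XPath's `or` operator can combine the two conditions. A minimal sketch; the img is wrapped in a td here so the sample is valid table markup, a small deviation from the question's HTML.

```php
<?php
$html = '<table style="..."><tbody><tr><td><img id="foo" src="foo"></td></tr></tbody></table>'
    . '<p>...</p>'
    . '<table style="..."><tbody><tr><td><img id="bar" src="bar"></td></tr></tbody></table>';

$doc = new DOMDocument();
@$doc->loadHTML($html); // @ silences warnings about imperfect HTML
$xpath = new DOMXPath($doc);

// One query: tables containing an img whose src is 'foo' OR 'bar'.
$tables = $xpath->query("//table[descendant::img[@src='foo' or @src='bar']]");

$ids = [];
foreach ($tables as $table) {
    // Relative query with $table as the context node: the matching img inside it.
    $img = $xpath->query(".//img[@src='foo' or @src='bar']", $table)->item(0);
    $ids[] = $img->getAttribute('id');
}
echo $tables->length, " tables: ", implode(', ', $ids), "\n";
```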

remove all text from node using domxpath

I am trying to remove all text from a node, but my code only removes the direct text; the text inside the table and the inner divs is left behind.
Here is my code:
$dom = new DOMDocument();
$result = $dom->loadHTML($html);
$finder = new DomXPath($dom);
//$nodes = $finder->query('//div[starts-with(@id, "post_message_")]');
$nodes = $finder->query('//div[contains(text(), "") and .//img and .//a and starts-with(@id, "post_message_")]');
But the matched node contains this HTML:
<div id="post_message_31962189">.<br><div align="center"><img src="http://s3.postimage.odf.jpg" border="0" alt=""></div><br><b><div align="center"><font size="5"><font color="Blue"><br><br>
WATERMARKED <br><br>
ADDED 4 IN LAST PAGE<br><br></font></font></div></b><br>
=============================================================================<br>
IN HOTEL <br><br><b><font size="4"><font color="Red"> i promise </font></font></b><br><br><b><div align="center"><font size="5"><font color="Blue">ADDED 4 NEW </font></font></div></b><br><br><br>Ashoka hotel<br><br><br><br><img src="http:/img.jpg" border="0" alt=""></div>
I want to remove everything except the img, a and br elements.
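No answer is included for this question; one possible approach is sketched below under stated assumptions: every text node is removed except text inside links (so links keep their label), and every element other than img, a and br is unwrapped, meaning its children are kept and only the tag itself is dropped. The markup is a shortened, hypothetical version of the post.

```php
<?php
// Hypothetical, shortened version of the question's post markup.
$html = '<div id="post_message_31962189">.<br><div align="center">'
    . '<img src="a.jpg" alt=""></div><br><b><font color="Blue">WATERMARKED</font></b>'
    . '<br><a href="http://example.com/">link</a> Ashoka hotel<br><img src="b.jpg" alt=""></div>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$keep = ['img', 'a', 'br'];
$out = '';

foreach ($xpath->query('//div[starts-with(@id, "post_message_")]') as $post) {
    // 1. Drop every text node below the post, except text inside a link (assumption).
    foreach (iterator_to_array($xpath->query('.//text()[not(ancestor::a)]', $post)) as $text) {
        $text->parentNode->removeChild($text);
    }
    // 2. Unwrap every other element: move its children up, then remove the element.
    //    Reversed document order processes descendants before their ancestors.
    $elements = iterator_to_array($xpath->query('.//*', $post));
    foreach (array_reverse($elements) as $el) {
        if (in_array($el->nodeName, $keep, true)) {
            continue;
        }
        while ($el->firstChild) {
            $el->parentNode->insertBefore($el->firstChild, $el);
        }
        $el->parentNode->removeChild($el);
    }
    $out = $dom->saveHTML($post);
    echo $out, "\n";
}
```

The node lists are copied into arrays before anything is removed, so the removals cannot disturb the loops.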

get image src from HTML with regex

I have HTML like
<td class="td_scheda_modello_dati">
<img src="/webapp/safilo/gen_img/p_verde.gif" width="15" height="15" alt="" border="0">
</td>
I want to extract the img src from this HTML using preg_match_all().
I have done this
preg_match_all('#<td class=td_scheda_modello_dati>(.*)<td>#',$detail,$detailsav);
It should give me the whole img tag, but it doesn't. What changes are needed to get the specific value?

Long story short (demo on ideone):
You should not use a regex here but an HTML parser. Here's how:
<?php
$html = '<img src="/webapp/safilo/gen_img/p_verde.gif" width="15" height="15" alt="" border="0">';
$doc = new DOMDocument();
@$doc->loadHTML($html); // loadHTML() must be called on an instance; @ hides parser warnings
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)");
echo $src;
?>
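The same parser approach can be narrowed to the cell class from the question and return every image in it. A sketch, assuming the cell sits inside a <table> as it would in the full page (the wrapper is added here so the fragment parses as an ordinary table).

```php
<?php
$html = '<table><tr><td class="td_scheda_modello_dati">
<img src="/webapp/safilo/gen_img/p_verde.gif" width="15" height="15" alt="" border="0">
</td></tr></table>';

$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// src attributes of all images inside cells with that exact class value.
$srcs = [];
foreach ($xpath->query('//td[@class="td_scheda_modello_dati"]//img/@src') as $attr) {
    $srcs[] = $attr->value;
}
print_r($srcs);
```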

Try this code.
$html_text = '<td class="td_scheda_modello_dati">
<img src="/webapp/safilo/gen_img/p_verde.gif" width="15" height="15" alt="" border="0"></td>';
preg_match( '/src="([^"]*)"/i', $html_text , $res_array ) ;
print_r($res_array);

Try using the s modifier after your regex. The default behavior for the dot character is not to match newlines (which your example has).
Something like:
preg_match_all('#<td class="td_scheda_modello_dati">(.*?)</td>#s', $detail, $detailsav);
Should do the trick.
It's worth reading up a bit on modifiers, the more you do with regex the more useful they become.
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
Edit: also, just realized that the code posted was missing a closing td tag (it was <td> instead of </td>). Fixed my example to reflect that.

Try this: <img[^>]*src="([^"]*/gen_img/p_verde.gif)"
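A sketch of this pattern in use with preg_match_all(), wrapped in # delimiters. The dot in ".gif" is escaped here so it only matches a literal dot; $detail is the question's variable, filled with the sample markup.

```php
<?php
$detail = '<td class="td_scheda_modello_dati">
<img src="/webapp/safilo/gen_img/p_verde.gif" width="15" height="15" alt="" border="0">
</td>';

// Group 1 captures the src value of each matching img tag.
preg_match_all('#<img[^>]*src="([^"]*/gen_img/p_verde\.gif)"#', $detail, $matches);
print_r($matches[1]);
```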

How to get content from a div using regex

I have a string like this:
<div class="fck_detail">
<table align="center" border="0" cellpadding="3" cellspacing="0" class="tplCaption" width="1">
<tbody>
<tr><td>
<img alt="nole-1375196668_500x0.jpg" src="http://l.f1.img.vnexpress.net/2013/07/30/nole-1375196668_500x0.jpg" width="500">
</td></tr>
<tr><td class="Image">
Djokovic is resentful toward the senior players. Photo: <em>Livetennisguide.</em>
</td></tr>
</tbody>
</table>
<p>As for Andy Murray, ...</p>
<p style="text-align:right;"><strong>Anh Hào</strong></p>
</div>
I want to get the content of the div. How do I write this pattern using preg_match? Please help.

If there are no other HTML tags inside the div, then this regex should work:
$v = '<div class="fck_detail">Some content here</div>';
$regex = '#<div class="fck_detail">([^<]*)</div>#';
preg_match($regex, $v, $matches);
echo $matches[1];
The actual regex here is <div class="fck_detail">([^<]*)</div>. In PHP the pattern must also be wrapped in delimiters, i.e. a character that doesn't otherwise occur in the pattern (I used #).
However, if what you're parsing is arbitrary HTML provided by the user, then preg_match simply can't do this reliably. Parsing nested HTML is beyond what a regex can do, and a real parser is what you'll need if you're processing the output of a full-fledged HTML editor.
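Since the div in the question does contain nested tags, a parser-based sketch may help. It assumes the target is the first div whose class is exactly "fck_detail"; the shortened markup and example.com URL are placeholders. The `<?xml encoding="UTF-8">` prefix is a common workaround so libxml reads the string as UTF-8 rather than ISO-8859-1.

```php
<?php
$html = '<div class="fck_detail"><table class="tplCaption"><tbody><tr><td>'
    . '<img alt="x" src="http://example.com/x.jpg" width="500"></td></tr></tbody></table>'
    . '<p>First paragraph</p><p style="text-align:right;"><strong>Anh Hào</strong></p></div>';

$doc = new DOMDocument();
@$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@class="fck_detail"]')->item(0);

// "Inner HTML": serialise each child node of the div and concatenate the pieces.
$inner = '';
foreach ($div->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}
echo $inner, "\n";

// Text only, with all tags stripped.
$text = trim($div->textContent);
echo $text, "\n";
```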

Extract specific data from SimplePie get_content object

I have an RSS feed from which I'm trying to extract data through SimplePie (in WordPress).
I have to extract the content tag. It works with <?php echo $item->get_content(); ?>, which outputs all of this (of course this is just one entry; the others have the same structure):
<table><tr valign="top">
<td width="67">
<a href="http://www.anobii.com/books/Lapproccio_sistemico_al_governo_dellimpresa/9788813230944/014c5c45a7ddaab1ec/" style="border: 1px solid #333333">
<img src="http://image.anobii.com/anobi/image_book.php?type=3&item_id=014c5c45a7ddaab1ec&time=0">
</a>
</td><td style="margin-left: 10px;padding-left: 10px">[person name] put "[title]" onto shelf<br/></td></tr></table>
But what I need is just the value of the src attribute (the image URL). How can I extract only that?

You can do it using DOMDocument (the best way):
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings about malformed HTML
$imgs = $doc->getElementsByTagName('img');
$res = $imgs->item(0)->getAttribute('src');
print_r($res);
With a regex (the bad way):
if (preg_match('~\bsrc\s*=\s*["\']\K[^"\']*+~i', $html, $match))
    print_r($match[0]); // \K drops "src=" and the quote, so [0] is just the URL
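When looping over many feed items, some entries may contain no image; then `item(0)` returns null and calling `getAttribute()` on it fails. A sketch with a hypothetical helper `first_img_src()` that guards against that; `$content` stands in for what `$item->get_content()` returns.

```php
<?php
// Hypothetical helper: src of the first <img> in an HTML fragment, or null if there is none.
function first_img_src(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ hides warnings, e.g. about the bare "&" in the URL
    $img = $doc->getElementsByTagName('img')->item(0);
    return $img === null ? null : $img->getAttribute('src');
}

// Stand-in for $item->get_content() from the feed.
$content = '<table><tr valign="top"><td width="67"><a href="http://www.anobii.com/books/x/">'
    . '<img src="http://image.anobii.com/anobi/image_book.php?type=3&item_id=014c5c45a7ddaab1ec&time=0">'
    . '</a></td><td>[person name] put "[title]" onto shelf<br/></td></tr></table>';

echo first_img_src($content), "\n";
```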
