My xpath:
(//tr[td[contains(., 'Refine by Vehicle Types')]])[1] /following-sibling::tr /td/div/table /tr/td/font /ul/li/a
My source:
<tr><td><font color="White">Refine by Vehicle Types</font></td> </tr>
<tr><td><div>
<table> <tr> <td><font<ul><li><a> Automobile/Light Trucks</a></li></ul></font></td> </tr> </table>
</div></td> </tr>
<tr> <td></td> </tr>
<tr> <td><font>Refine by Category</font></td> </tr>
<tr> <td><div>
<table> <tr> <td><font><ul><li><a>Agricultural</a></li></ul></font></td></tr>
I'm trying to scrape this source and collect the <li> nodes after "Refine by Vehicle Types" but not after "Refine by Category".
Any help is appreciated.
You are almost there.
Change:
(//tr
[td[contains(., 'Refine by Vehicle Types')]]
)
[1]
/following-sibling::tr
/td/div/table
/tr/td/font
/ul/li/a
to:
(//tr
[td[contains(., 'Refine by Vehicle Types')]]
)
[1]
/following-sibling::tr[1]
/td/div/table
/tr/td/font
/ul/li/a
When the second XPath expression is evaluated against the following XML document (your severely malformed text corrected to become a well-formed XML document):
<table>
<tr>
<td>
<font color="White">Refine by Vehicle Types</font>
</td>
</tr>
<tr>
<td>
<div>
<table>
<tr>
<td>
<font>
<ul>
<li>
<a> Automobile/Light Trucks</a>
</li>
</ul>
</font>
</td>
</tr>
</table>
</div>
</td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td>
<font>Refine by Category</font>
</td>
</tr>
<tr>
<td>
<div>
<table>
<tr>
<td>
<font>
<ul>
<li><a>Agricultural</a></li>
</ul>
</font>
</td>
</tr>
</table>
</div>
</td>
</tr>
</table>
Only one -- the wanted -- a element is selected:
<a> Automobile/Light Trucks</a>
Note: Did I mention that an XPath Visualizer will help you a lot?
For a robust XPath, which will work no matter how many tr/li elements are between the two text labels, try:
(//tr
[td[contains(., 'Refine by Vehicle Types')]]
)[1]
/following-sibling::tr[not(preceding-sibling::tr
[contains(., 'Refine by Category')])]
/td/div/table
/tr/td/font
/ul/li/a
(Borrowing from @Dimitre's formatting.)
The above is inefficient (could be O(n^2)), so if you have a long page, it could get slow.
But for moderate pages it should be fine.
I'm using the Symfony Crawler component, which uses XPath itself.
I have the HTML of a nutritional table:
<table>
<tr>
<td> Carbohydrate </td>
<td> 10g </td>
</tr>
<tr>
<td> Fat </td>
<td> < 0,1 </td>
</tr>
</table>
This is what I tried
$fatCell = $browser->filterXPath('//td[contains(text(), "Fat")]');
$fatCell->outerHtml() will return
<td>\n
Fat\n
</td>
$fatCell->nextAll()->outerHtml() will return
<td>\n
\n
</td>
I'm trying to get the information with an XPath query, but when I try to access the fat value, it comes back empty. It seems that the < character is misunderstood by XPath.
Is there anything I can do about this?
Try
<table>
<tbody>
<tr>
<td>Carbohydrate</td>
<td>10g</td>
</tr>
<tr>
<td>Fat</td>
<td>&lt; 0,1</td>
</tr>
</tbody>
</table>
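A bare < in cell text can be read by the HTML parser as the start of a tag, so the content may be lost before XPath ever sees it. If the markup is under your control, escaping the text before writing it into the cell avoids that. Below is a minimal sketch in JavaScript; the helper name escapeHtml is made up for this example and is not part of Symfony or any other library.

```javascript
// Hypothetical helper: escape the characters that are special in HTML text.
// "&" must be handled first so the entities added later are not re-escaped.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const cell = "<td>" + escapeHtml("< 0,1") + "</td>";
console.log(cell); // <td>&lt; 0,1</td>
```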
How can I make each row clickable without repeating
This one is an example that shows the problem, parameter could be the code:
<table>
<thead>
<tr>
<th>Code</th>
<th>User</th>
...
</tr>
</thead>
<tbody>
<tr>
<td> 123 </td>
<td> User A </td>
...
</tr>
<tr>
<td> 456 </td>
<td> User B </td>
...
</tr>
</tbody>
Thanks
Excuse me for my English, I hope that you understand the problem.
There are a few different ways to achieve this. Here are a couple using plain javascript and one using jQuery.
Plain JS
With plain JavaScript, just use the onclick attribute on each row. Pretty straightforward.
<table>
<thead>
<tr>
<th>Code</th>
<th>User</th>
...
</tr>
</thead>
<tbody>
<tr onclick="window.location='page/parameter1';">
<td> 123 </td>
<td> User A </td>
...
</tr>
<tr onclick="window.location='page/parameter2';">
<td> 456 </td>
<td> User B </td>
...
</tr>
</tbody>
</table>
jQuery
With jQuery, you add a class so you can use it as the selector. There is also a data-href attribute that holds the URL you want the user to go to when they click the row.
<table>
<thead>
<tr>
<th>Code</th>
<th>User</th>
...
</tr>
</thead>
<tbody>
<tr class="clickable" data-href="page/parameter1">
<td> 123 </td>
<td> User A </td>
...
</tr>
<tr class="clickable" data-href="page/parameter2">
<td> 456 </td>
<td> User B </td>
...
</tr>
</tbody>
</table>
<script>
jQuery(document).ready(function($) {
$("tr.clickable").click(function() {
window.location = $(this).data("href");
});
});
</script>
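If you would rather avoid jQuery and still not repeat a handler on every row, a single delegated listener can do the same job. The sketch below assumes the rows carry a data-href attribute as in the jQuery markup above; the function name hrefForClick is invented for this example.

```javascript
// Return the URL stored on the clicked row, or null if the click
// did not land inside a row that carries a data-href attribute.
function hrefForClick(target) {
  const row = target.closest("tr[data-href]");
  return row ? row.dataset.href : null;
}

// Only wire up the listener when running in a browser.
if (typeof document !== "undefined") {
  // One listener for the whole page instead of one per row.
  document.addEventListener("click", function (event) {
    const href = hrefForClick(event.target);
    if (href) {
      window.location = href;
    }
  });
}
```

Because the listener sits on the document, rows added to the table later are covered as well.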
Your code should look like :
<table>
<thead>
<tr>
<th>Code</th>
<th>User</th>
...
</tr>
</thead>
<tbody>
<tr>
<td> 123 </td>
<td> User A </td>
...
</tr>
<tr>
<td> 456 </td>
<td> User B </td>
...
</tr>
</tbody>
Added end tag </a>
Let's say I retrieve all of the values whose position is top8. I print them in a table, but instead of one table containing the different values, I get 3 tables, each with a different value. Why is this? How can I get all the values that belong to this position displayed together? I only need one table with the 3 different values.
<?
$facebookID = "top8";
mysql_connect("localhost","root","password") or die(mysql_error());
mysql_select_db("schoutweet") or die(mysql_error());
$data= mysql_query("SELECT schInitial FROM matchTable WHERE position='".$facebookID."'")
or die(mysql_error());
while($row = mysql_fetch_array($data))
{
?>
<center>
<table border="0" cellspacing="0" cellpadding="0" class="tbl_bracket">
<tr>
<td class="brack_under cell_1"><a href="www.facebook.com"/>team 1.1><?= $row['schInitial']?><a/></td>
<td class="cell_2"> </td>
<td class="cell_3"> </td>
<td class="cell_4"> </td>
<td class="cell_5"> </td>
<td class="cell_6"> </td>
</tr>
<tr>
<td class="brack_under_right_up">team 1.2><?= $row['schInitial']?></</td>
<td class="brack_right"><!--1.2.1--></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td class="brack_right"><!--2.1--></td>
<td class="brack_under"><!--3.1--></td>
<td><!--here?--></td>
<td><!--there?--></td>
<td><!--everywhere?--></td>
</tr>
</table>
</center>
<?
}
?>
</body>
That's because your <table> tag is within the loop! Place the <table> tag outside the while loop.
place your table tags outside the while loop
Because you're writing the table tag inside the while loop. Everything inside the loop is executed on every iteration. If you only want one table in the output, you have to open and close the table outside of the loop, like this:
$data= mysql_query("SELECT schInitial FROM matchTable WHERE position='".$facebookID."'")
or die(mysql_error());
?>
<center>
<table border="0" cellspacing="0" cellpadding="0" class="tbl_bracket">
<?
while($row = mysql_fetch_array($data))
{
?>
<tr>
<td class="brack_under cell_1"><a href="www.facebook.com"/>team 1.1><?= $row['schInitial']?><a/></td>
<td class="cell_2"> </td>
<td class="cell_3"> </td>
<td class="cell_4"> </td>
<td class="cell_5"> </td>
<td class="cell_6"> </td>
</tr>
<tr>
<td class="brack_under_right_up">team 1.2><?= $row['schInitial']?></</td>
<td class="brack_right"><!--1.2.1--></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td class="brack_right"><!--2.1--></td>
<td class="brack_under"><!--3.1--></td>
<td><!--here?--></td>
<td><!--there?--></td>
<td><!--everywhere?--></td>
</tr>
<?
}
?>
</table>
</center>
That will, however, print three rows per loop and therefore per record (but you have references to the table contents in two of them, so I suppose that's what you want?).
Also watch out for the HTML that is not well-formed (e.g. the > character in the expressions team 1.1> and team 1.2>). If you want to print the > character to the browser, encode it as an HTML entity (&gt; in this case). You also have a probably superfluous </ in the first column of the second row (</</td>).
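The loop structure itself is independent of PHP. As a rough illustration in JavaScript, the sketch below opens the table once, emits one row per record inside the loop, and closes the table once afterwards. The teams array and the renderTable function are placeholders standing in for the query result and the template, and the values are assumed to be safe to print without escaping.

```javascript
// Hypothetical data standing in for the rows returned by the query.
const teams = ["ABC", "DEF", "GHI"];

function renderTable(values) {
  const parts = ['<table class="tbl_bracket">']; // opened once, before the loop
  for (const value of values) {
    parts.push("<tr><td>" + value + "</td></tr>"); // one row per record
  }
  parts.push("</table>"); // closed once, after the loop
  return parts.join("\n");
}

const tableHtml = renderTable(teams);
console.log(tableHtml);
```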
you need to echo the HTML part as well in the while loop like
echo '<table>';
I've an XML file like this:
<tr class="station">
<td class="realtime">
<span>
15:11
</span>
</td>
</tr>
<tr class="station">
<td class="clock">
15:20
</td>
</tr>
<tr class="station">
<td class="clock">
15:30
</td>
</tr>
<tr class="station">
<td class="realtime">
<span>
15:41
</span>
</td>
</tr>
and I want to parse it with XPath in PHP. The XML is updated and parsed quite often.
I always want to get the first time (in this case 15:11).
The problem is that I can't be sure whether the surrounding tag is a td with class "clock" or "realtime".
If the surrounding td has class "realtime", there is a span tag inside it; otherwise there is not.
In fact, it is always the first tag with class "station" that contains the information that matters.
So is it possible to tell XPath to evaluate only within this tag?
Is there a good method for doing this in XPath?
(Sorry for my bad English.)
In fact, its always the first
"station"-class tag in which the
information is, that matters. So is it
possible to tell xpath to just
evaluate within this tag?
With this well-formed input source:
<table>
<tr class="station">
<td class="realtime">
<span>
15:41
</span>
</td>
</tr>
<tr class="station">
<td class="clock">
15:20
</td>
</tr>
<tr class="station">
<td class="clock">
15:30
</td>
</tr>
<tr class="station">
<td class="realtime">
<span>
15:41
</span>
</td>
</tr>
</table>
This XPath expression:
/table/tr[@class='station'][1]/td
Note: Just select the element you want and use the proper DOM API method to get the string value. It doesn't matter whether there is a span element or not.
If you want to...
/table/tr[@class='station'][1]/td//text()
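The point about letting the API produce the string value can be sketched with the browser's document.evaluate, which is used here only for illustration since the question is about PHP. The function name firstStationTime is invented, and wrapping the path in normalize-space() is an addition of this sketch to trim the surrounding whitespace.

```javascript
// String value of the td in the first "station" row, with whitespace trimmed.
// It works whether or not the td contains a span.
const FIRST_STATION_TIME = "normalize-space((//tr[@class='station'])[1]/td)";

// 2 is the numeric value of XPathResult.STRING_TYPE.
const STRING_TYPE = 2;

function firstStationTime(doc) {
  const result = doc.evaluate(FIRST_STATION_TIME, doc, null, STRING_TYPE, null);
  return result.stringValue;
}

// In a browser: console.log(firstStationTime(document));
```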
I'm learning regex but can't figure this out. I want to get the entire HTML from a DIV. How should I proceed?
I already tried this:
/\< td class=\"desc1\"\>(.+)/i
It returns:
Array
(
[0] => < td class="desc1">
[1] =>
)
The code that I'm matching against is this:
<table id="profile" cellpadding="1" cellspacing="1">
<thead>
<tr>
<th colspan="2">Jogador TheInFEcT </th>
</tr>
<tr>
<td>Detalhes</td>
<td>Descrição:</td>
</tr>
</thead><tbody>
<tr>
<td class="empty"></td><td class="empty"></td>
</tr>
<tr>
<td class="details">
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<th>Classificação</th>
<td>11056</td>
</tr>
<tr>
<th>Tribo:</th>
<td>Teutões</td>
</tr>
<tr>
<th>Aliança:</th>
<td>-</td>
</tr>
<tr>
<th>Aldeias:</th>
<td>1</td>
</tr>
<tr>
<th>População:</th>
<td>2</td>
</tr><tr>
<td colspan="2" class="empty"></td>
</tr>
<tr>
<td colspan="2"> » Alterar perfil</td>
</tr>
</tbody></table>
</td>
<td class="desc1">
<div>STATUS: OFNAaaaAA</div>
</td>
</tr>
</tbody>
</table>
I need to get the entire code inside the < td class="desc1">, like this:
<div >STATUS: OFNAaaaAA< /div>
</td>
</tr>
</tbody>
</table>
Could someone help me out?
Thanks in advance.
I usually use
$dom = new DOMDocument();
$dom->loadHTML($htmldata);
to convert the HTML code into a DOM. You can then use
$node = $dom->getElementById($id);
/* or */
$nodes = $dom->getElementsByTagName($tag);
to get your HTML/XML node.
Now, use
$node->textContent
to get data inside node.
Try this; it does not cover all possible cases, but it should work:
/<td\s+class=['"]\s*desc1\s*['"]\s*>((.|\n)*)<\/td>/i
tested with: http://www.pagecolumn.com/tool/pregtest.htm
edit: improved solution suggested by Alan Moore
/<td\s+class=['"]\s*desc1\s*['"]\s*>(.*?)<\/td>/s
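To see what the improved pattern captures, here is a small sketch in JavaScript, used only for illustration since the question is about PHP. The sample string is shortened from the table in the question, and the variable names are arbitrary. The s flag lets the dot match newlines, and the lazy .*? stops at the first closing td tag.

```javascript
// Sample input, shortened from the table in the question.
const html = [
  '<td class="details">11056</td>',
  '<td class="desc1">',
  "<div>STATUS: OFNAaaaAA</div>",
  "</td>",
].join("\n");

// "s" (dotAll) makes "." match line breaks; ".*?" is lazy.
const pattern = /<td\s+class=['"]\s*desc1\s*['"]\s*>(.*?)<\/td>/s;

const match = html.match(pattern);
const inner = match ? match[1].trim() : null;
console.log(inner);
```

Note that the lazy match ends at the first closing td tag, so a table nested inside the desc1 cell would cut the capture short. That is one of the cases a DOM parser handles better than a regex.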