XPath doesn't return < symbol - php

I'm using the Symfony Crawler component, which itself uses XPath.
I have the HTML of a nutrition table:
<table>
<tr>
<td> Carbohydrate </td>
<td> 10g </td>
</tr>
<tr>
<td> Fat </td>
<td> < 0,1 </td>
</tr>
</table>
This is what I tried:
$fatCell = $browser->filterXPath('//td[contains(text(), "Fat")]');
$fatCell->outerHtml() will return
<td>\n
Fat\n
</td>
$fatCell->nextAll()->outerHtml() will return
<td>\n
\n
</td>
I'm trying to get this information with an XPath query, but when I access the fat value, it comes back empty. It seems that the < character is misinterpreted by XPath.
Is there anything I can do about this?

Try
<table>
<tbody>
<tr>
<td>Carbohydrate</td>
<td>10g</td>
</tr>
<tr>
<td>Fat</td>
<td>< 0,1</td>
</tr>
</tbody>
</table>
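One thing worth checking, if the markup is under your control: a bare < inside text is not valid HTML and the parser may drop or mangle it, whereas the entity &lt; is read back as a literal <. A minimal sketch, assuming plain DOMDocument and DOMXPath as a stand-in for the Crawler (the $html string below is made up for illustration):

```php
<?php
// Hypothetical markup: the "<" in the fat cell is written as the entity &lt;.
$html = <<<HTML
<table>
<tr><td> Carbohydrate </td><td> 10g </td></tr>
<tr><td> Fat </td><td> &lt; 0,1 </td></tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Find the cell that follows the "Fat" label cell.
$cell = $xpath->query('//td[contains(text(), "Fat")]/following-sibling::td[1]')->item(0);

// textContent holds the decoded text, so the entity comes back as "<".
$fat = trim($cell->textContent);
echo $fat, PHP_EOL;
```

If the HTML comes from a third party and cannot be changed, the same idea would have to be applied as a pre-processing step before parsing, which is more fragile.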

Related

How would I remove some portion of <tr><td> from a table in PHP?

This is my table:
<table>
<tr>
<td>ABC</td>
</tr>
<tr>
<td> </td>
</tr>
</table>
and I want to remove this one table row:
<tr>
<td> </td>
</tr>
My expected output is:
<table>
<tr>
<td>ABC</td>
</tr>
</table>
Is it possible? Please help me.
Since you tagged your question with php, I would recommend using a regular expression.
The pattern \s*<tr>\s*<td> <\/td>\s*<\/tr> will find the tr with an empty ( ) td.
To test and look into the regex you can have a look here: https://regex101.com/r/ax6Xdg/1
Put together this will look something like this:
$table = "<table>
<tr>
<td>ABC</td>
</tr>
<tr>
<td> </td>
</tr>
</table>";
$pattern = "/\s*<tr>\s*<td> <\/td>\s*<\/tr>/";
var_dump( preg_replace( $pattern , "" , $table ) );
This will output something very similar to this:
string '<table>
<tr>
<td>ABC</td>
</tr>
</table>' (length=60)
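If the regex turns out to be brittle (extra attributes, different whitespace), a DOM-based variant is possible. This is a different technique from the regex above, sketched with DOMDocument and DOMXPath. The query treats a non-breaking space like an ordinary space, which is an assumption about what the "empty" cell actually contains:

```php
<?php
$table = "<table>
<tr>
<td>ABC</td>
</tr>
<tr>
<td> </td>
</tr>
</table>";

$doc = new DOMDocument();
$doc->loadHTML($table);
$xpath = new DOMXPath($doc);

// Rows that have cells, none of which contains visible text.
// translate() maps U+00A0 (nbsp) to a plain space so normalize-space() strips it.
$query = "//tr[td and not(td[normalize-space(translate(., '\u{00A0}', ' ')) != ''])]";

// Copy the matches into an array first, then remove them from the tree.
foreach (iterator_to_array($xpath->query($query)) as $row) {
    $row->parentNode->removeChild($row);
}

$remaining = $doc->getElementsByTagName('tr')->length;

// Serialize only the table, not the html/body wrapper added by loadHTML().
echo $doc->saveHTML($doc->getElementsByTagName('table')->item(0)), PHP_EOL;
```

The serialized whitespace may differ slightly from the input, so compare the result structurally rather than character by character.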
You can do this with the jQuery function .remove().
Edit: If you want to locate that specific tag, you can use .next(), .find(), .parent() or .children().
Just add an id to your table:
<table id="tableid">
<tr>
<td>ABC</td>
</tr>
<tr>
<td> </td>
</tr>
</table>
This script looks for a cell that contains only a space and, if it finds one, removes the row:
$('#tableid tr').each(function() {
if ($(this).find('td').html()==' ') $(this).remove();
});
If you want to find some text and then remove its row, replace html() with text():
$('#tableid tr').each(function() {
if ($(this).find('td').text()=='ABC') $(this).remove();
});
You should try this:
<table>
<tr id="abc">
<td>ABC</td>
</tr>
<tr id="remove">
<td> </td>
</tr>
</table>
<script>
$('#remove').remove();
</script>
When rendering the table, add a dedicated class to the rows you wish to delete. Let's say the class is _rowToDelete. Then, using jQuery, remove all the rows that have this class.
In the example below, the rows are removed when you click the button, so you can see the change. You can do the same on page load if you prefer.
$(function() {
$("#removeBtn").click(function() {
$("._rowToDelete").remove() ;
});
}) ;
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table>
<tr>
<td>ABC 1</td>
</tr>
<tr class="_rowToDelete">
<td> </td>
</tr>
<tr>
<td>ABC 2</td>
</tr>
<tr class="_rowToDelete">
<td> </td>
</tr>
<tr>
<td>ABC 3</td>
</tr>
<tr class="_rowToDelete">
<td> </td>
</tr>
</table>
<button id="removeBtn">Remove</button>

PHP with DOMXPath - How to select and count from this html tree

I need to count how many of these items are open. There are four types of them: Easy, Medium, Difficult and Not-wanted. Each type is the value inside a div. I need to exclude the 'Not-wanted' type from the count. Notice that the 'Open' and 'Closed' values have different amounts of whitespace around them. This is the HTML structure:
<table>
<tbody>
<tr>
<td>
<div>Difficult</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td> Closed </td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Medium</div>
</td>
<td>Name</td>
<td>Open </td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Medium</div>
</td>
<td>Name</td>
<td> Closed</td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td>Closed </td>
</tr>
<tr>
<td>
<div>Not-wanted</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Difficult</div>
</td>
<td>Name</td>
<td>Open</td>
</tr>
............
This is one of my attempts to solve the problem. It is obviously wrong, but I don't know how to get it right.
$doc = new DOMDocument();
$doc->loadHtmlFile('http://www.nameofsite.com');
$doc->preserveWhiteSpace = false;
$xpath = new DOMXPath($doc);
$elements = $xpath->query("/html/body/div[1]/div/section/div/section/article/div/div[1]/div/div/div[2]/div[1]/div[2]/div/section/div/div/table/tbody/tr");
$count = 0;
foreach ($elements as $element) {
if ($element->childNodes->nodeValue != 'Not-wanted') {
if ($element->childNodes->nodeValue === 'open') {
$count++;
}
}
}
echo $count;
I have only a rudimentary knowledge of DOMXPath, so this is too complex for me; I'm only able to write simple queries.
Can anybody help?
Thanks in advance.
Based on the data in your example, I think you can adjust the xpath expression to this to get all the <tr>'s that match your conditions:
//table/tbody/tr[normalize-space(td[3]/text()) = 'Open' and
td[1]/div/text() != 'Not-wanted']
The query returns a DOMNodeList in $elements, and its length property gives the number of matching nodes.
For example:
$source = <<<SOURCE
<table>
<tbody>
<tr>
<td>
<div>Difficult</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td> Closed </td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Medium</div>
</td>
<td>Name</td>
<td>Open </td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Medium</div>
</td>
<td>Name</td>
<td> Closed</td>
</tr>
<tr>
<td>
<div>Easy</div>
</td>
<td>Name</td>
<td>Closed </td>
</tr>
<tr>
<td>
<div>Not-wanted</div>
</td>
<td>Name</td>
<td> Open </td>
</tr>
<tr>
<td>
<div>Difficult</div>
</td>
<td>Name</td>
<td>Open</td>
</tr>
</tbody>
</table>
SOURCE;
$doc = new DOMDocument();
$doc->loadHTML($source);
$doc->preserveWhiteSpace = false;
$xpath = new DOMXPath($doc);
$elements = $xpath->query("//table/tbody/tr[normalize-space(td[3]/text()) = 'Open' and td[1]/div/text() != 'Not-wanted']");
echo $elements->length;
Which will result in:
5
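As a variation, DOMXPath::evaluate() can return the number directly through the XPath count() function, so the node list does not have to be kept around. A small sketch using the same predicate on a shortened, made-up table:

```php
<?php
// Shortened sample: two rows should match (Easy/Open and Difficult/Open).
$source = '<table><tbody>
<tr><td><div>Easy</div></td><td>Name</td><td> Open </td></tr>
<tr><td><div>Not-wanted</div></td><td>Name</td><td>Open</td></tr>
<tr><td><div>Medium</div></td><td>Name</td><td> Closed</td></tr>
<tr><td><div>Difficult</div></td><td>Name</td><td>Open </td></tr>
</tbody></table>';

$doc = new DOMDocument();
$doc->loadHTML($source);
$xpath = new DOMXPath($doc);

// count() yields a float in PHP, so cast it to int for display.
$count = (int) $xpath->evaluate(
    "count(//table/tbody/tr[normalize-space(td[3]/text()) = 'Open'"
    . " and td[1]/div/text() != 'Not-wanted'])"
);

echo $count, PHP_EOL; // 2
```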

Clickable row with link

How can I make each row clickable without repeating
This one is an example that shows the problem, parameter could be the code:
<table>
<thead>
<tr>
<th>Code</th>
<th>User</th>
...
</tr>
</thead>
<tbody>
<tr>
<td> 123 </td>
<td> User A </td>
...
</tr>
<tr>
<td> 456 </td>
<td> User B </td>
...
</tr>
</tbody>
Thanks
Excuse me for my English, I hope that you understand the problem.
There are a few different ways to achieve this. Here is one using plain JavaScript and one using jQuery.
Plain JS
With plain JavaScript, just use the onclick attribute. Pretty straightforward.
<table>
<thead>
<tr>
<th>Code</th>
<th>User</th>
...
</tr>
</thead>
<tbody>
<tr onclick="window.location='page/parameter1';">
<td> 123 </td>
<td> User A </td>
...
</tr>
<tr onclick="window.location='page/parameter2';">
<td> 456 </td>
<td> User B </td>
...
</tr>
</tbody>
</table>
jQuery
With jQuery, you add a class to use as the selector. A data-href attribute holds the URL the user should go to when they click the row.
<table>
<thead>
<tr>
<th>Code</th>
<th>User</th>
...
</tr>
</thead>
<tbody>
<tr class="clickable" data-href="page/parameter1">
<td> 123 </td>
<td> User A </td>
...
</tr>
<tr class="clickable" data-href="page/parameter2">
<td> 456 </td>
<td> User B </td>
...
</tr>
</tbody>
</table>
<script>
jQuery(document).ready(function($) {
$("tr.clickable").click(function() {
window.location = $(this).data("href");
});
});
</script>
Your code should look like :
<table>
<thead>
<tr>
<th>Code</th>
<th>User</th>
...
</tr>
</thead>
<tbody>
<tr>
<td> 123 </td>
<td> User A </td>
...
</tr>
<tr>
<td> 456 </td>
<td> User B </td>
...
</tr>
</tbody>
Added end tag </a>

XPath select descendant of parent's sibling within limits

My xpath:
(//tr[td[contains(., 'Refine by Vehicle Types')]])[1] /following-sibling::tr /td/div/table /tr/td/font /ul/li/a
My source:
<tr><td><font color="White">Refine by Vehicle Types</font></td> </tr>
<tr><td><div>
<table> <tr> <td><font<ul><li><a> Automobile/Light Trucks</a></li></ul></font></td> </tr> </table>
</div></td> </tr>
<tr> <td></td> </tr>
<tr> <td><font>Refine by Category</font></td> </tr>
<tr> <td><div>
<table> <tr> <td><font><ul><li><a>Agricultural</a></li></ul></font></td></tr>
I'm trying to scrape this source and collect the <li> nodes after "Refine by Vehicle Types" but not after "Refine by Category".
Any help is appreciated.
You are almost there.
Change:
(//tr
[td[contains(., 'Refine by Vehicle Types')]]
)
[1]
/following-sibling::tr
/td/div/table
/tr/td/font
/ul/li/a
to:
(//tr
[td[contains(., 'Refine by Vehicle Types')]]
)
[1]
/following-sibling::tr[1]
/td/div/table
/tr/td/font
/ul/li/a
When the second XPath expression is evaluated against the following XML document (your severely malformed text corrected to become a well-formed XML document):
<table>
<tr>
<td>
<font color="White">Refine by Vehicle Types</font>
</td>
</tr>
<tr>
<td>
<div>
<table>
<tr>
<td>
<font>
<ul>
<li>
<a> Automobile/Light Trucks</a>
</li>
</ul>
</font>
</td>
</tr>
</table>
</div>
</td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td>
<font>Refine by Category</font>
</td>
</tr>
<tr>
<td>
<div>
<table>
<tr>
<td>
<font>
<ul>
<li><a>Agricultural</a></li>
</ul>
</font>
</td>
</tr>
</table>
</div>
</td>
</tr>
</table>
Only one -- the wanted -- a element is selected:
<a> Automobile/Light Trucks</a>
Note: Did I mention that an XPath Visualizer will help you a lot?
For a robust XPath, which will work no matter how many tr/li elements are between the two text labels, try:
(//tr
[td[contains(., 'Refine by Vehicle Types')]]
)[1]
/following-sibling::tr[not(preceding-sibling::tr
[contains(., 'Refine by Category')])]
/td/div/table
/tr/td/font
/ul/li/a
(Borrowing from @Dimitre's formatting.)
The above is inefficient (could be O(n^2)), so if you have a long page, it could get slow.
But for moderate pages it should be fine.
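To compare the expressions side by side, here is a rough sketch in PHP with DOMXPath, run against a compact copy of the well-formed document above. The helper linkTexts() and the variable names are made up for illustration:

```php
<?php
$xml = <<<XML
<table>
  <tr><td><font color="White">Refine by Vehicle Types</font></td></tr>
  <tr><td><div><table><tr><td><font><ul><li><a> Automobile/Light Trucks</a></li></ul></font></td></tr></table></div></td></tr>
  <tr><td></td></tr>
  <tr><td><font>Refine by Category</font></td></tr>
  <tr><td><div><table><tr><td><font><ul><li><a>Agricultural</a></li></ul></font></td></tr></table></div></td></tr>
</table>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Hypothetical helper: run a query and return the trimmed text of each match.
function linkTexts(DOMXPath $xpath, string $query): array
{
    $texts = [];
    foreach ($xpath->query($query) as $a) {
        $texts[] = trim($a->textContent);
    }
    return $texts;
}

$start = "(//tr[td[contains(., 'Refine by Vehicle Types')]])[1]";
$tail  = "/td/div/table/tr/td/font/ul/li/a";

// Original: every following sibling row, so both sections are picked up.
$all = linkTexts($xpath, $start . "/following-sibling::tr" . $tail);

// First fix: only the row immediately after the label.
$first = linkTexts($xpath, $start . "/following-sibling::tr[1]" . $tail);

// Robust variant: stop at the row that contains the next label.
$robust = linkTexts(
    $xpath,
    $start
    . "/following-sibling::tr[not(preceding-sibling::tr[contains(., 'Refine by Category')])]"
    . $tail
);

print_r([$all, $first, $robust]);
```

With this sample, the original expression returns both links, while the two corrected expressions return only the vehicle-type link.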

How can I get the entire HTML of an element using regex?

I'm learning regex but can't figure this out. I want to get the entire HTML from a DIV. How should I proceed?
I already tried this:
/\< td class=\"desc1\"\>(.+)/i
It returns:
Array
(
[0] => < td class="desc1">
[1] =>
)
The code that I'm matching is this:
<table id="profile" cellpadding="1" cellspacing="1">
<thead>
<tr>
<th colspan="2">Jogador TheInFEcT </th>
</tr>
<tr>
<td>Detalhes</td>
<td>Descrição:</td>
</tr>
</thead><tbody>
<tr>
<td class="empty"></td><td class="empty"></td>
</tr>
<tr>
<td class="details">
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<th>Classificação</th>
<td>11056</td>
</tr>
<tr>
<th>Tribo:</th>
<td>Teutões</td>
</tr>
<tr>
<th>Aliança:</th>
<td>-</td>
</tr>
<tr>
<th>Aldeias:</th>
<td>1</td>
</tr>
<tr>
<th>População:</th>
<td>2</td>
</tr><tr>
<td colspan="2" class="empty"></td>
</tr>
<tr>
<td colspan="2"> » Alterar perfil</td>
</tr>
</tbody></table>
</td>
<td class="desc1">
<div>STATUS: OFNAaaaAA</div>
</td>
</tr>
</tbody>
</table>
I need to get the entire code inside the < td class="desc1">, like this:
<div >STATUS: OFNAaaaAA< /div>
</td>
</tr>
</tbody>
</table>
Could someone help me out?
Thanks in advance.
I usually use
$dom = new DOMDocument();
$dom->loadHTML($htmldata);
to convert HTML code into a DOM. Then you can use
$node = $dom->getElementById($id);
/* or */
$nodes = $dom->getElementsByTagName($tag);
to get your HTML/XML node.
Now use
$node->textContent
to get the text inside the node.
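Note that textContent returns only the text, with the tags stripped. If the markup inside the cell is wanted, as in the question, one option is to serialize the cell's child nodes with saveHTML(). A minimal sketch, assuming a shortened version of the table and selecting the cell by its class with DOMXPath, since the td has a class rather than an id:

```php
<?php
// Shortened, made-up version of the table from the question.
$htmldata = '<table><tr>'
    . '<td class="details">details</td>'
    . '<td class="desc1"><div>STATUS: OFNAaaaAA</div></td>'
    . '</tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($htmldata);
$xpath = new DOMXPath($dom);

// Exact match on the class attribute; enough for this sample.
$cell = $xpath->query('//td[@class="desc1"]')->item(0);

// Concatenate the serialized children to get the cell's inner HTML.
$inner = '';
foreach ($cell->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo trim($inner), PHP_EOL;
```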
Try this. It does not cover all possible cases, but it should work:
/<td\s+class=['"]\s*desc1\s*['"]\s*>((.|\n)*)<\/td>/i
Tested with: http://www.pagecolumn.com/tool/pregtest.htm
Edit: improved solution suggested by Alan Moore:
/<td\s+class=['"]\s*desc1\s*['"]\s*>(.*?)<\/td>/s
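For reference, this is roughly how the improved pattern could be used with preg_match(). The s modifier lets the dot match newlines, and the lazy .*? stops at the first closing </td>. The $html sample is a shortened, made-up excerpt of the table:

```php
<?php
$html = <<<HTML
<tr>
<td class="desc1">
<div>STATUS: OFNAaaaAA</div>
</td>
</tr>
HTML;

// Single-quoted PHP string, so only the quote characters need escaping.
$pattern = '/<td\s+class=[\'"]\s*desc1\s*[\'"]\s*>(.*?)<\/td>/s';

$inner = null;
if (preg_match($pattern, $html, $matches)) {
    // Group 1 holds everything between the opening and closing td tags.
    $inner = trim($matches[1]);
}

echo $inner, PHP_EOL;
```

As the answer says, this does not cover every case: a td nested inside the matched cell, for instance, would end the match early.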
