Xpath match text in html table but ignore script

I am trying to match text with XPath and output the entire row, including the matching cell itself.
The problem is that the matching cell also contains JavaScript in the HTML table, and the script is output as well.
I have tried the following.
This works, but the output contains the JavaScript from the matching node:
$bo_row = $bo_xpath->query( "//td[contains(text(),'1234')]/following-sibling::* | //td[contains(text(),'1234')] " );
Failed attempts all look similar to:
$bo_row = $bo_xpath->query( "//td[contains(text(),'1234')]/following-sibling::* | //td[contains(text(),'1234')]//*[not(self::script)] " );
Here is an example of one table row:
<tr>
<!-- <td><a class=info href="**Missing Data**">
<img src="../images/button_go.gif" border=0>
<span>**Missing Data**</span>
</a>
</td> -->
<script>
if (document.getElementById("Function").value != 'Customer')
document.write('<td><a class=info href="OrdDetLine.pgm?Order=CV780&Page=02&Line=05&Seq=00&ShowPrice=&OpenOnly=&Function=Customer"><img src="../images/button_go.gif" border=0><span>Order Line Detail</span></a></td>');</script>
<td align="left">2-05-00</td>
<td align="left"> 1234
<script>if (document.getElementById("Function").value != 'Customer')
document.write("<a class=info href=#><img src=/operations/images/eye.png border=none onClick=window.open(\'StyleHdr.pgm?CompDiv=CO&Style=1234\'><span>Show style master information panel.</span></a>") ; </script>
</td>
<td align="left">MEN'S LAB/SHOP COATS</td>
<td align="left">REG</td>
<td align="left">NAY</td>
<td align="right">1</td>
<td align="right">April 12, 2019</td>
</tr>
I have also tried using getAttribute to select the inner text, like so:
$bo_row = $bo_xpath->query( "//tr/td[contains(text(),'1234')]/following-sibling::* | //td[contains(text(),'1234')] " );
echo '<br/>';
if ( $bo_row->length > 0 ) {
foreach ( $bo_row as $row ) {
echo $row->getAttribute( 'innerText' );
}
}
However, either I am using getAttribute incorrectly or it is not supported by PHP, as PhpStorm indicates.

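getAttribute('innerText') will not work here: innerText is a browser DOM property, not an HTML attribute, so PHP's DOMElement::getAttribute returns an empty string for it. Read the textContent (or nodeValue) property instead, and skip the script elements when collecting the text.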
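An alternative that avoids innerText entirely is to collect only the text nodes that are not inside a script element. The following is a minimal sketch using PHP's built-in DOM extension. The helper name rowTextWithoutScripts and the shortened sample row are assumptions made for illustration, and the search text is assumed to contain no single quote because it is placed into the XPath expression as-is.

```php
<?php
// Hypothetical helper (not from the original post): returns the text of the
// cell containing $needle and of the cells after it, ignoring <script> content.
// Assumes $needle contains no single quote, since it is put into the XPath as-is.
function rowTextWithoutScripts(string $html, string $needle): array
{
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Same union as in the question: the matching cell plus its later siblings.
    $cells = $xpath->query(
        "//td[contains(text(), '$needle')] | " .
        "//td[contains(text(), '$needle')]/following-sibling::td"
    );

    $out = [];
    foreach ($cells as $cell) {
        $text = '';
        // Collect only text nodes that are not inside a <script> element.
        foreach ($xpath->query('.//text()[not(ancestor::script)]', $cell) as $node) {
            $text .= $node->nodeValue;
        }
        $out[] = trim($text);
    }
    return $out;
}

// Shortened sample row based on the one in the question (assumed data).
$html = <<<'HTML'
<table><tr>
<td align="left">2-05-00</td>
<td align="left"> 1234
<script>document.write("extra");</script>
</td>
<td align="left">MEN'S LAB/SHOP COATS</td>
<td align="right">1</td>
</tr></table>
HTML;

print_r(rowTextWithoutScripts($html, '1234'));
```

The script stays in the document; it is only skipped while the text is gathered, so the same DOMXPath object can still be used for other queries afterwards.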

Related

How to add the html tag into codeigniter variable?

I am using the following code to display the table row:
$tableRow="<tr>
<td style='text-align:center'>".$services[$i]['serviceName']."</td>
<td style='text-align:center'>".$services[$i]['serviceDesc']."</td>
<td style='text-align:right'>".$services[$i]['taxAmt']."</td>
</tr>";
But only the values are displayed. I get the following output:
test tset 43500
I want it to look like this:
<tr>
<td style='text-align:center'>test</td>
<td style='text-align:center'>tset</td>
<td style='text-align:right'>43500</td>
</tr>

You can use PHP's htmlentities function to display the markup itself:
$tableRow="<tr>
<td style='text-align:center'>".$services[$i]['serviceName']."</td>
<td style='text-align:center'>".$services[$i]['serviceDesc']."</td>
<td style='text-align:right'>".$services[$i]['taxAmt']."</td>
</tr>";
echo htmlentities($tableRow);

Replace < with &lt; and > with &gt;

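You can put the tag inside any variable, like:
$data['a'] = '<b>hell</b>';
then pass the array to the view (CodeIgniter expects an array or object as the second argument, and its keys become variables in the view):
$this->load->view('view_name', $data);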

You want to display the literal HTML source, so you can use htmlspecialchars. It performs the following character translations (listed here for reference):
Character Replacement
& &amp;
" &quot;
' &#039; (only when ENT_QUOTES is set)
< &lt;
> &gt;
You should call it like this:
echo htmlspecialchars($tableRow, ENT_QUOTES, 'UTF-8');
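As a small worked example, the sketch below escapes a shortened version of the row and prints the result. The $service array is assumed sample data standing in for $services[$i] from the question.

```php
<?php
// Assumed sample data standing in for $services[$i] from the question.
$service = ['serviceName' => 'test', 'serviceDesc' => 'tset', 'taxAmt' => 43500];

$tableRow = "<tr><td style='text-align:center'>" . $service['serviceName'] . "</td></tr>";

// Escape the markup so the browser shows the tags instead of rendering them.
$escaped = htmlspecialchars($tableRow, ENT_QUOTES, 'UTF-8');

echo $escaped, PHP_EOL;
// prints: &lt;tr&gt;&lt;td style=&#039;text-align:center&#039;&gt;test&lt;/td&gt;&lt;/tr&gt;
```

When that escaped string is sent to a browser, the visitor sees the literal tags again.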

How Can I nest two foreach to get right values output?

I have two foreach blocks to get some values from a web page (I'm scraping values from an HTML table).
<?php
include('../simple_html_dom.php');
$html = file_get_html('http://www.betexplorer.com/soccer/belgium/jupiler-league/results/');
foreach($html->find('td') as $e) {
echo $e->innertext . '<br>';
}
foreach( $html->find('td[data-odd]') as $td ) {
echo $td->attr['data-odd'].PHP_EOL;
}
?>
and this is my HTML code:
<tr class="strong">
<td class="first-cell tl">
Waasland-Beveren - Anderlecht
</td>
<td class="result">
1:0
</td>
<td class="odds best-betrate" data-odd="5.97"></td>
<td class="odds" data-odd="4.21"></td>
<td class="odds" data-odd="1.51"></td>
<td class="last-cell nobr date">21.02.2016</td>
</tr>
<tr class="">
<td class="first-cell tl">
Waregem - KV Mechelen
</td>
<td class="result">
2:3
</td>
<td class="odds" data-odd="1.83"></td><td class="odds" data-odd="3.71"></td>
<td class="odds best-betrate" data-odd="3.99"></td>
<td class="last-cell nobr date">21.02.2016</td>
</tr>
This way, my output contains all the values from the first foreach first, followed by all the values from the second. I'd like to get the values together, row by row, in the right order. For example:
21.02.2016 Waasland-Beveren - Anderlecht 1:0 5.97 4.21 1.51
21.02.2016 Waregem - KV Mechelen 2:3 1.83 3.71 3.99

If both result sets have exactly the same length (count), assign them to two variables and use a plain for loop:
$td = $html->find('td');
$attr = $html->find('td[data-odd]');
for($i=0; $i < count($td); $i++)
echo $td[$i]->innertext."<br/>".$attr[$i]->attr['data-odd'].PHP_EOL;
Update:
You want to reorder the tds you retrieved from the HTML file, which means you need different logic for retrieving them. This updated code is very specific to your case:
$match_dates = $html->find("td[class=last-cell nobr date]"); // we have 1 per match
$titles = $html->find("td[class=first-cell tl]"); // 1 per match
$results = $html->find("td[class=result]"); // 1
$best_bets = $html->find("td[class=odds best-betrate]"); // 1
$odds = $html->find("td[class=odds]"); // 2
// Now to output everything in whatever order you want:
$c=0; $b=0; // two counters
foreach($titles as $match)
echo $match_dates[$c]->innertext." - ".$match->innertext." [".$results[$c]->innertext."] - Best bet rate: ".$best_bets[$c++]->attr['data-odd']." - odds: ".$odds[$b++]->attr['data-odd'].", ".$odds[$b++]->attr['data-odd']."<br/>";
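Another way to keep the values together is to walk the table one row at a time and read every cell relative to that row, so no counters have to stay in sync. The sketch below swaps in PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom. The helper name matchLines is hypothetical, and the class names are taken from the HTML in the question.

```php
<?php
// Hypothetical helper: builds one output line per <tr> that has odds cells.
function matchLines(string $html): array
{
    libxml_use_internal_errors(true);   // tolerate imperfect markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $lines = [];
    foreach ($xpath->query('//tr[td[@data-odd]]') as $tr) {
        // Each expression is evaluated relative to the current row.
        $date   = trim($xpath->evaluate('string(td[contains(@class, "date")])', $tr));
        $teams  = trim($xpath->evaluate('string(td[contains(@class, "first-cell")])', $tr));
        $result = trim($xpath->evaluate('string(td[contains(@class, "result")])', $tr));

        $odds = [];
        foreach ($xpath->query('td[@data-odd]', $tr) as $td) {
            $odds[] = $td->getAttribute('data-odd');
        }
        $lines[] = implode(' ', array_merge([$date, $teams, $result], $odds));
    }
    return $lines;
}

// First row from the question, wrapped in a <table> (assumed wrapper).
$html = <<<'HTML'
<table>
<tr class="strong">
<td class="first-cell tl">
Waasland-Beveren - Anderlecht
</td>
<td class="result">
1:0
</td>
<td class="odds best-betrate" data-odd="5.97"></td>
<td class="odds" data-odd="4.21"></td>
<td class="odds" data-odd="1.51"></td>
<td class="last-cell nobr date">21.02.2016</td>
</tr>
</table>
HTML;

echo implode(PHP_EOL, matchLines($html)), PHP_EOL;
```

Because the odds are read in document order within each row, the best bet rate stays in its original column position.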

How to find data and get text from html table td element next to it, without attribute using PHP Simple DOM?

I am using simple_html_dom.php for this task.
I wonder how I can get the value (plaintext) "Data C" or "Data F" from this kind of table?
The TD elements don't have any attributes.
The "Data C" or "Data F" values can be at different positions (different indexes).
Is there a way to first find the td with the value "Data A" and then use next_sibling() or previous_sibling() to get and output the value "Data C"?
How can I find a value inside an HTML table and then get the data in the cell next to (or before) it?
<table class="xyz">
<tbody>
<tr>
<td>Data A<p>data b</p>
</td>
<td>Data C</td>
</tr>
<tr>
<td>Data D<p>data e</p>
</td>
<td>Data F</td>
</tr>
</tbody>
</table>
Or should I use some other technique?
Could you please help me with that?
Thank you!

You can quite easily retrieve the content of the various table cells with JavaScript; not sure if this is what you mean?
<script>
var col=document.querySelectorAll('table.xyz td');
for( var n in col ) if( n && col[n] && col[n].nodeType==1 ) console.log( 'Cell:%d Type:%s Value:%s',n,col[n].tagName, col[n].innerHTML );
</script>
will output
-----------
Cell:0 Type:TD Value:Data A<p>data b</p>
Cell:1 Type:TD Value:Data C
Cell:2 Type:TD Value:Data D<p>data e</p>
Cell:3 Type:TD Value:Data F
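Since the question asks about PHP, the same lookup can also be sketched on the server side. The example below swaps in PHP's built-in DOMXPath rather than simple_html_dom. The helper name siblingValue is hypothetical, and the label is assumed to contain no double quote because it is placed into the XPath expression as-is.

```php
<?php
// Hypothetical helper: finds the cell whose own leading text equals $label
// and returns the text of the cell right after it, or null if not found.
function siblingValue(string $html, string $label): ?string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // normalize-space(text()) uses only the cell's first text node,
    // so the nested <p> does not take part in the comparison.
    $query = sprintf(
        '//table[@class="xyz"]//td[normalize-space(text()) = "%s"]/following-sibling::td[1]',
        $label
    );
    $node = $xpath->query($query)->item(0);

    return $node ? trim($node->textContent) : null;
}

// Table from the question.
$html = <<<'HTML'
<table class="xyz">
<tbody>
<tr>
<td>Data A<p>data b</p>
</td>
<td>Data C</td>
</tr>
<tr>
<td>Data D<p>data e</p>
</td>
<td>Data F</td>
</tr>
</tbody>
</table>
HTML;

var_dump(siblingValue($html, 'Data A'));
```

Replacing following-sibling with preceding-sibling in the expression would return the cell before the match instead.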

Targeting specific "nth" HTML tags with PHP Simple HTML DOM Parser

I am using the PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/) to read through a website and output particular information.
I'm trying to output the contents of specific <tr> tags in every table, and the contents of specific <p> tags, rather than all tables and all paragraphs.
Therefore, ideally I would like to set up some PHP code with numeric parameters that target specific "nth" <td> or <p> tags.
As a PHP novice, I greatly appreciate the expertise that is found on StackOverflow.
Thank you for your time and assistance in figuring out my questions.
The first question set is here, above the code. The second question set can be found at the bottom of this post, with the PHP code.
1st question set:
A. How does one output the 2nd and 3rd <tr> of every table?
AND
B. How does one output the 4th paragraph after every table and exclude the <a> tag it contains?
IN
The following HTML code
USING
The PHP Simple HTML DOM Parser as shown in the following PHP code
UNLESS
You have a different suggestion that you believe is better
Below is sample HTML code followed by PHP code and another relevant question set.
This is the main HTML I am interested in.
<a name="arbitrary_a_tag_Begin_Item_01"></a>
<h2>Item No. 1 </h2>
<table>
<tbody>
<tr>
<td>Item Description:</td>
<td>Big blue ball</td>
</tr>
<tr>
<td>Property Location:</td>
<td>Storage Closet</td>
</tr>
<tr>
<td>Owner:</td>
<td>Gym</td>
</tr>
<tr>
<td>Cost</td>
<td>20.00</td>
</tr>
<tr>
<td>Vendor:</td>
<td>Jim’s Gym Toys</td>
</tr>
</tbody>
</table>
<p>
Approximate minimum acceptable grage sale price: $10
<br>
6 month redemption period
</p>
<p>
<img src="../dec/Item01.jpg">
</p>
<p>
<a target="new" href="http://pictures/Item01.jpg">Picture of Item 01</a>
</p>
<p>
Current status: In Stock
<a name="arbitrary_a_tag_Begin_Item_02"></a>
</p>
<h2>Item No. 2 </h2>
<table>
<tbody>
<tr>
<td>Item Description:</td>
<td>Green tennis racket</td>
</tr>
<tr>
<td>Property Location:</td>
<td>Gear Lockers</td>
</tr>
<tr>
<td>Owner:</td>
<td>Tennis Team</td>
</tr>
<tr>
<td>Cost</td>
<td>50.00</td>
</tr>
<tr>
<td>Vendor:</td>
<td>Jim’s Gym Toys</td>
</tr>
</tbody>
</table>
<p>
Approximate minimum acceptable grage sale price: $25
<br>
6 month redemption period
</p>
<p>
<img src="../dec/Item02.jpg">
</p>
<p>
<a target="new" href="http://pictures/Item02.jpg">Picture of Item 02</a>
</p>
<p>
Current status: In Stock
<a name="arbitrary_a_tag_Begin_Item_03"></a>
</p>
<h2>Item No. 3 </h2>
<table>
<tbody>
<tr>
<td>Item Description:</td>
<td>Red Soccer Ball</td>
</tr>
Etc. etc. etc.
The PHP code USING "PHP Simple HTML DOM Parser":
<?php
// Include the library
include('simple_html_dom.php');
$url = 'http://www.URL.com';
// Create DOM from URL or file
$html = file_get_html($url);
foreach($html->find('table') as $table)
{
echo '<table><tbody>';
foreach($table->find('tr') as $tr)
{
echo '<tr>';
foreach($tr->find('td') as $td)
{
echo '<td>';
echo $td->innertext;
echo '</td>';
}
echo '</tr>';
}
echo '</tbody></table><br />';
}
Some things I have come across and unsuccessfully attempted to implement to access specific tags:
The First Concept
$e = $html->find('table', 0)->find('tr', 1)->find('td');
foreach($e as $d){
echo $d;
}
Second concept:
$file = file_get_contents($url);
preg_match_all('#<p>([^<]*)</p>#Usi', $file, $matches);
foreach ($matches as $match)
{
echo $match;
}
Second Question Set:
Regarding this first concept above,
How do I set up a while loop to iterate through, let's say, 12 tables?
For example, this: $e = $html->find('table', 0)
reads only the first table.
Yet, I am not sure how to replace the 0 with a variable, such as $i, which can be autoincremented.
$i = 1;
while ($i <= 12) {
// What goes here??
$i++;
}
Regarding the second concept,
How can I use this (or the first concept) to:
Return an array of all p tags after each table
Read through the string contents (the "contents") within each p tag, and check it against string (the "key")
Only return the string "contents" when the key string is found within the contents
Before outputting the returned "contents" featuring the matched string, exclude/remove a 2nd matched string from the information to be output (for example, in the 1st Question Set, I want to grab everything within a specific <p> tag, but exclude everything within the <a> tag).
Thanks very much for your time and assistance!
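One possible way to approach both question sets is sketched below. It swaps in PHP's built-in DOMDocument and DOMXPath instead of the Simple HTML DOM Parser, because XPath positions make "nth" selections direct. The helper name summarizeItems and the shortened sample markup are assumptions made for illustration.

```php
<?php
// Hypothetical helper: for every table, returns rows 2 and 3 plus the text
// of the 4th <p> after the table with anything inside <a> left out.
function summarizeItems(string $html): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $items = [];
    foreach ($xpath->query('//table') as $table) {
        $rows = [];
        // position() is 1-based and counts rows within their parent element.
        foreach ($xpath->query('.//tr[position() = 2 or position() = 3]', $table) as $tr) {
            $cells = [];
            foreach ($xpath->query('td', $tr) as $td) {
                $cells[] = trim($td->textContent);
            }
            $rows[] = $cells;
        }

        // 4th <p> sibling after this table; skip text that sits inside <a>.
        $status = '';
        foreach ($xpath->query('following-sibling::p[4]//text()[not(ancestor::a)]', $table) as $text) {
            $status .= $text->nodeValue;
        }
        $items[] = ['rows' => $rows, 'status' => trim($status)];
    }
    return $items;
}

// Shortened, well-formed sample modelled on Item No. 1 (assumed data).
$html = <<<'HTML'
<h2>Item No. 1</h2>
<table><tbody>
<tr><td>Item Description:</td><td>Big blue ball</td></tr>
<tr><td>Property Location:</td><td>Storage Closet</td></tr>
<tr><td>Owner:</td><td>Gym</td></tr>
<tr><td>Cost</td><td>20.00</td></tr>
</tbody></table>
<p>Approximate minimum price: $10</p>
<p><img src="../dec/Item01.jpg"></p>
<p><a target="new" href="http://pictures/Item01.jpg">Picture of Item 01</a></p>
<p>Current status: In Stock <a name="next_item">skip me</a></p>
HTML;

print_r(summarizeItems($html));
```

The foreach over all tables replaces the counter-based while loop, so the number of tables does not need to be known in advance.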

DOMXPath Query for a dynamic HTML

Suppose I have this HTML from a source (I am scraping it):
<tr class="calendar_row" data-eventid="41675">
<td class="alt2 eventDate smallfont" align="center"/>
<td class="alt2 smallfont" align="center">9:00pm</td>
<td class="alt2 smallfont" align="center">AUD</td>
<td class="alt2 icon smallfont" align="center">
<div class="cal_imp_medium" title="Medium Impact Expected"/>
</td>
<td class="alt2 eventHigh smallfont" align="center">
<div class="calendar_detail level_1" data-level="1" title="Open Detail"/>
</td>
//I want to get this part below correctly
<td class="alt2 pad_left eventHigh smallfont" align="center">0.2%</td>
<td class="alt2 pad_left eventHigh smallfont" align="center"/>
<td class="alt2 pad_left eventHigh smallfont" align="center">
<span class="revised worse" title="Revised From -0.3%">-0.4%</span>
</td>
</tr>
And I want to get the values (nodeValue) of the td's through XPath:
$query = $xpath->query('//tr[@data-eventid="41675"]/td[@class="alt2 pad_left eventHigh smallfont"]');
I can't figure out why I'm only getting the value -0.4%.
Although the HTML seems complicated, and regardless of how it is formatted, is there a query that retrieves the values between the tags, including the empty one in the second td?
Full Code
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query_results = $xpath->query('//tr[@data-eventid="'.$data_eventid.'"]/td[@class="alt2 pad_left eventHigh smallfont"]');
foreach($query_results as $values){
if($values->nodeValue!=' ' and $values->nodeValue!='' and $values->nodeName!='#text') { //Discards Empty Arrays
$table_values[$data_eventid][5] = $values->nodeValue;
}
}

Try this: //tr[@data-eventid="41675"]/td[@class="alt2 pad_left eventHigh smallfont"]/descendant-or-self::*/text()
Well, you probably just want the nodes, so take the /text() off:
//tr[@data-eventid="41675"]/td[@class="alt2 pad_left eventHigh smallfont"]/descendant-or-self::*

Your XPath matches three td elements: the first contains 0.2%, the second is empty, and the last contains <span class="revised worse" title="Revised From -0.3%">-0.4%</span>.
You assign the values of these nodes in sequence (skipping the empty ones) to the same variable, $table_values[$data_eventid][5], which therefore ends up holding the value of the last non-empty node, i.e. -0.4%.
If you want the values of all the nodes, you should append them to a list, or place them in different elements of an array.
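A minimal sketch of the append approach, assuming the row from the question (shortened and wrapped in a table here). The helper name eventValues is hypothetical. Empty cells are kept as empty strings so each value stays at its column position.

```php
<?php
// Hypothetical helper: returns the text of every matching cell, in order,
// keeping empty cells as '' instead of overwriting a single array slot.
function eventValues(string $html, string $eventId): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $query = sprintf(
        '//tr[@data-eventid="%s"]/td[@class="alt2 pad_left eventHigh smallfont"]',
        $eventId
    );

    $values = [];
    foreach ($xpath->query($query) as $td) {
        $values[] = trim($td->textContent);   // append, one entry per cell
    }
    return $values;
}

// Shortened version of the row in the question (assumed wrapper table).
$html = <<<'HTML'
<table>
<tr class="calendar_row" data-eventid="41675">
<td class="alt2 smallfont" align="center">9:00pm</td>
<td class="alt2 pad_left eventHigh smallfont" align="center">0.2%</td>
<td class="alt2 pad_left eventHigh smallfont" align="center"></td>
<td class="alt2 pad_left eventHigh smallfont" align="center">
<span class="revised worse" title="Revised From -0.3%">-0.4%</span>
</td>
</tr>
</table>
HTML;

print_r(eventValues($html, '41675'));
```

The returned array can then be stored under $table_values[$data_eventid] as a whole rather than under a single index.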
