Can't access parent node of an element via DOM - php

I've got a table with 6 columns, and within the first column there's a div with some info I don't need. I want to delete all the divs, but I keep getting a "Trying to get property of non-object" error. Here's the code:
$dom = new DomDocument();
@$dom->loadHTML($html); //I've acquired this page via curl
$tbl = $dom->getElementsByTagName('table')->item(4); //The fourth table in the page
$div = $tbl->getElementsByTagName('div');
for ($i = 0; $i < $td->length-1; $i++){
    $chld = $div->item($i);
    $prnt = $chld->parentNode; // <-- here I get the error
    $prnt->removeChild($chld);
}
Can you help me? Either by pointing out the mistake I've made or by giving me a hint at how to do it.
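Two things in the snippet above look suspicious: the loop condition reads $td->length although only $div is defined, and the list returned by getElementsByTagName() is live, so removing nodes while counting upwards shifts the remaining items and item($i) can return null. A minimal sketch of one way around both, assuming a small hypothetical table in place of the curl-fetched page (the sample markup and item(0) are assumptions, not the asker's data):

```php
<?php
// Hypothetical sample markup standing in for the page fetched via curl.
$html = '<table><tr><td><div>a</div>keep</td><td><div>b</div></td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of using @
$dom->loadHTML($html);
libxml_clear_errors();

$tbl  = $dom->getElementsByTagName('table')->item(0);
$divs = $tbl->getElementsByTagName('div');

// The node list is live, so walk it from the end: removing the last item
// never changes the index of the items that are still to be visited.
for ($i = $divs->length - 1; $i >= 0; $i--) {
    $chld = $divs->item($i);
    $chld->parentNode->removeChild($chld);
}

$remaining = $tbl->getElementsByTagName('div')->length;
echo "divs left: ", $remaining, "\n";
echo $dom->saveHTML($tbl), "\n";
```

Counting down is only one option; copying the list into a plain array with iterator_to_array() before removing anything should work as well.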

Related

Retrieve a text with certain class name from PHP url

How can I get the text of an element with a certain class name from another page with PHP?
I have an array of URLs like this:
$url_array = array(
'https://www.example.com/item/32',
'https://www.example.com/item/33',
'https://www.example.com/item/34'
);
This is really difficult to explain, so I made a not-so-beautiful sketch of the process:
The first row of bubbles represents the items of $url_array, each of which contains a different URL.
Now I need a method to read each URL and get its content.
The PHP will return a div element that contains an <a> element with an href URL, but the URL is different each time.
Next, I want to get the content of the page behind the <a> element's URL. It should return the text content of a <span> or <p> tag that has text-class as its class.
How could I implement this approach in PHP?
I have tried this, but it isn't working:
$htmlAsString = "index.php";
$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query('//a[@class="class-name"]/@href');
for ($i = 0; $i < $nodeList->length; $i++) {
$url_price = $nodeList->item($i)->value . "<br/>\n";
$retrieve_text_begin = explode('<div class="text-property">',
$url_price);
$retrieve_text_end = explode('</div>', $retrieve_text_begin[1]);
echo $retrieve_text_end[0];
}
I know that the $htmlAsString = "index.php"; might be the problem.
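The suspicion about $htmlAsString seems reasonable: DOMDocument::loadHTML() parses the string it is given as markup, so passing "index.php" parses that literal text rather than the file. A rough sketch of the two-step lookup follows, assuming hypothetical inline markup in place of the fetched pages; with real pages, each string would come from something like file_get_contents($url). The helper name xpathFor and both sample strings are made up for illustration.

```php
<?php
// Hypothetical markup standing in for the fetched pages; real pages will differ.
$listingHtml = '<div><a class="class-name" href="https://www.example.com/item/32/details">Details</a></div>';
$detailHtml  = '<div><span class="text-class">Some text</span></div>';

// Hypothetical helper: parse an HTML string (not a file name) and
// return an XPath object bound to the resulting document.
function xpathFor(string $html): DOMXPath
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    return new DOMXPath($doc);
}

// Step 1: collect the href of every <a class="class-name">.
$hrefs = [];
foreach (xpathFor($listingHtml)->query('//a[@class="class-name"]/@href') as $attr) {
    $hrefs[] = $attr->value;
}

// Step 2: on the linked page, read the text of any <span> or <p>
// whose class attribute is exactly "text-class".
$texts = [];
$query = '//span[@class="text-class"] | //p[@class="text-class"]';
foreach (xpathFor($detailHtml)->query($query) as $node) {
    $texts[] = trim($node->textContent);
}

print_r($hrefs);
print_r($texts);
```

In a real run, step 2 would sit inside a loop over $hrefs, fetching each linked page before querying it.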

DomXpath and foreach. How to get a preview of the captured elements?

I am learning to work with DOMXPath in PHP. I was using regex, but I was advised against it here on Stack Overflow for capturing HTML. I confess that it is not so simple for me, and the DOM has its limits (when there are spaces in tag names, and also in error handling). If someone can help me with the PHP command to get a preview of the captured elements so I can check that everything is right, I would appreciate it. If you have suggestions for improving the code, they are welcome. The code below is based on a question on Stack Overflow itself.
<?php
$doc = new DOMDocument;
libxml_use_internal_errors(true);
// Deleting whitespace (if any)
$doc->preserveWhiteSpace = false;
@$doc->loadHTML(file_get_contents ('http://www.imdb.com/search/title?certificates=us:pg_13&genres=comedy&groups=top_250'));
$xpath = new DOMXPath($doc);
// Starting from the root element
$grupos = $xpath->query(".//*[@class='lister-item mode-advanced']");
// Creating an array and then looping with the elements to be captured (image, title, and link)
$resultados = array();
foreach($grupos as $grupo) {
    $i = $xpath->query(".//*[@class='loadlate']//@src", $grupo);
    $t = $xpath->query(".//*[@class='lister-item-header']//a/text()", $grupo);
    $l = $xpath->query(".//*[@class='lister-item-header']//a/@href", $grupo);
    $resultados[] = $resultado;
}
// What command should I use to have a preview of the results and check if everything is ok?
print_r($resultados);
OK, so here is your code with two corrections. First, I'm adding a subarray with the elements to $resultados, and second, I'm using a foreach instead of print_r/var_dump.
BTW, doesn't imdb offer an API?
<?php
ini_set('display_errors', 1);
error_reporting(-1);
$doc = new DOMDocument;
libxml_use_internal_errors(true);
// Deleting whitespace (if any)
$doc->preserveWhiteSpace = false;
$doc->loadHTML(file_get_contents ('http://www.imdb.com/search/title?certificates=us:pg_13&genres=comedy&groups=top_250'));
//$doc->loadHTML($HTML);
$xpath = new DOMXPath($doc);
// Starting from the root element
$grupos = $xpath->query(".//*[@class='lister-item mode-advanced']");
// Creating an array and then looping with the elements to be captured (image, title, and link)
$resultados = array();
foreach($grupos as $grupo) {
    $i = $xpath->query(".//*[@class='loadlate']//@src", $grupo);
    $t = $xpath->query(".//*[@class='lister-item-header']//a/text()", $grupo);
    $l = $xpath->query(".//*[@class='lister-item-header']//a/@href", $grupo);
    $resultados[] = ['i' => $i[0], 't' => $t[0], 'l' => $l[0]];
}
// What command should I use to have a preview of the results and check if everything is ok?
//var_dump($resultados);
foreach($resultados as $r){
echo "\n-----------\n";
echo $r['i']->value."\n";
echo $r['t']->textContent."\n";
echo $r['l']->value."\n";
}
You can play with it here:
https://3v4l.org/hal0G
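Another way to preview what a query captured is to serialize each matched node back to markup with DOMDocument::saveHTML(), which accepts a node argument. This is only a sketch: the markup below is a hypothetical, simplified stand-in for the real page, whose structure may differ.

```php
<?php
// Hypothetical, simplified markup; the real page structure may differ.
$html = '<div class="lister-item mode-advanced">'
      . '<h3 class="lister-item-header"><a href="/title/example/">Example title</a></h3>'
      . '</div>';

$doc = new DOMDocument;
libxml_use_internal_errors(true);   // HTML5 tags can trigger parser warnings
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$previews = [];
foreach ($xpath->query("//*[@class='lister-item mode-advanced']") as $grupo) {
    // With a node argument, saveHTML() serializes only that node and its children.
    $previews[] = $doc->saveHTML($grupo);
}

foreach ($previews as $markup) {
    echo $markup, "\n-----------\n";
}
```

Seeing the raw markup of each group makes it easier to tell whether the outer query matched the intended elements before the inner queries are debugged.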

Display first 4 columns of external table

I am using Windows software to organize a tourpool. This program creates (among other things) HTML pages with rankings of participants. But these HTML pages are quite hideous, so I am building a site around it.
To show the top 10 ranking I need to select the first 10 out of about 1000 participants of the generated HTML file and put it on my own site.
To do this, I used:
// get top 10 ranks of p_rank.html
$file_contents = file_get_contents('p_rnk.htm');
$start = strpos($file_contents, '<tr class="header">');
// get end
$i = 11;
while (strpos($file_contents, '<tr><td class="position">'. $i .'</td>', $start) === false){
$i++;
}
$end = strpos($file_contents, '<td class="position">'. $i .'</td>', $start);
$code = substr($file_contents, $start, $end);
echo $code;
This works, but the last 3 columns (previous position, up or down, and details) are useless information. So I want to delete these columns, or find a way to select and display only the first 4.
How do I manage this?
EDIT
I adjusted my code and at the end I only echo the adjusted table.
<?php
$DOM = new DOMDocument;
$DOM->loadHTMLFile("p_rnk.htm");
$table = $DOM->getElementsByTagName('table')->item(0);
$rows = $table->getElementsByTagName('tr');
$cut_rows_after = 10;
$cut_colomns_after = 3;
$row_index = $rows->length-1;
while($row = $rows->item($row_index)) {
    if($row_index+1 > $cut_rows_after)
        $table->removeChild($row);
    else {
        $tds = $row->getElementsByTagName('td');
        $colomn_index = $tds->length-1;
        while($td = $tds->item($colomn_index)) {
            if($colomn_index+1 > $cut_colomns_after)
                $row->removeChild($td);
            $colomn_index--;
        }
    }
    $row_index--;
}
echo $DOM->saveHTML($table);
?>
I'd say that the best way to deal with this is to parse the HTML document (see, for instance, the first answer here) and then manipulate the object that describes the DOM. This way, you can easily extract the table itself using various selectors, get your first 10 records in a simpler manner, and also remove unnecessary child (td) nodes from each row (using removeChild). When you're done modifying, dump the resulting HTML using saveHTML.
Update:
OK, here's tested code. I removed the need to hardcode the numbers of columns and rows and moved the desired numbers into a couple of variables (so that you can adjust them if needed). Give the code a closer look: you'll notice some details that were missing in your code (the index is 0..999, not 1..1000, which is why all those -1s and +1s appear; it's better to decrease the index instead of increasing it, because then you don't have to care about numbering shifts on removal; I've also used while instead of for so that I don't have to handle the case of $rows->item($row_index) == null separately):
<?php
$DOM = new DOMDocument;
$DOM->loadHTMLFile("./table.html");
$table = $DOM->getElementsByTagName('tbody')->item(0);
$rows = $table->getElementsByTagName('tr');
$cut_rows_after = 10;
$cut_colomns_after = 4;
$row_index = $rows->length-1;
while($row = $rows->item($row_index)) {
    if($row_index+1 > $cut_rows_after)
        $table->removeChild($row);
    else {
        $tds = $row->getElementsByTagName('td');
        $colomn_index = $tds->length-1;
        while($td = $tds->item($colomn_index)) {
            if($colomn_index+1 > $cut_colomns_after)
                $row->removeChild($td);
            $colomn_index--;
        }
    }
    $row_index--;
}
echo $DOM->saveHTML();
?>
Update 2:
If the page doesn't contain tbody, use the container which is present. For instance, if tr elements are inside a table element, use $DOM->getElementsByTagName('table') instead of $DOM->getElementsByTagName('tbody').
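A variation that sidesteps the tbody/table question is to remove each node through its own parentNode, so the code does not need to know which container holds the tr elements. The sketch below assumes a small hypothetical table built inline (3 rows of 5 cells, cut to 2 rows and 2 columns) in place of the generated ranking file:

```php
<?php
// Hypothetical table: 3 rows of 5 cells each, with no <tbody>.
$html = '<table>';
for ($r = 1; $r <= 3; $r++) {
    $html .= '<tr>';
    for ($c = 1; $c <= 5; $c++) {
        $html .= "<td>r{$r}c{$c}</td>";
    }
    $html .= '</tr>';
}
$html .= '</table>';

$DOM = new DOMDocument;
$DOM->loadHTML($html);

$cut_rows_after = 2;      // keep the first 2 rows
$cut_columns_after = 2;   // keep the first 2 cells of each kept row

$rows = $DOM->getElementsByTagName('tr');
// Walk backwards so removals never shift the indexes still to be visited.
for ($row_index = $rows->length - 1; $row_index >= 0; $row_index--) {
    $row = $rows->item($row_index);
    if ($row_index + 1 > $cut_rows_after) {
        // Works whether the direct parent is <table> or <tbody>.
        $row->parentNode->removeChild($row);
        continue;
    }
    $tds = $row->getElementsByTagName('td');
    for ($i = $tds->length - 1; $i >= 0; $i--) {
        if ($i + 1 > $cut_columns_after) {
            $td = $tds->item($i);
            $td->parentNode->removeChild($td);
        }
    }
}

$rows_left  = $DOM->getElementsByTagName('tr')->length;
$cells_left = $DOM->getElementsByTagName('td')->length;
echo $DOM->saveHTML($DOM->getElementsByTagName('table')->item(0)), "\n";
```

Note that this version counts every tr in the document, so a page with several tables would need the row list narrowed to the table of interest first.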

xpath->query() returns 0 even though the element id is present in the source

I am trying to find an element using the @$xpath->query() function, but it returns a zero-length result for some of the URLs. Here's the code I'm using:
$strLink = "http://www.pennenergy.com/3bl-energy-news.html";
$arrElements = array('//div[@id="threeblmediawidget"]', '//div[@id="threeblmediadetaillist"]');
$dom = new DOMDocument();
$fileContents = file_get_contents($strLink);
@$dom->loadHTML($fileContents);
$xpath = new DOMXPath($dom);
foreach ($arrElements as $strKey => $strVal) {
    $arrParams[] = @$xpath->query($strVal);
}
That code works fine for most of the URLs, but not for some of them. Those URLs have one of the id elements in the source, but the query returns zero. I am not able to find the issue.
I have tried the XPath Checker plugin in Mozilla, and it shows the result for those URLs, but the code I'm using does not. If someone has any suggestion, please help.
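One way to narrow this down is to drop the @ suppression and look at what the parser and the query actually report. The sketch below does not identify the cause; it only shows checks that may help: whether the id appears in the raw string returned by file_get_contents() at all (a browser plugin sees the page after scripts have run, which can differ from the raw source), which libxml messages were raised, and whether query() returned false or an empty list. The inline markup is a hypothetical stand-in for the fetched page.

```php
<?php
// Hypothetical page source standing in for file_get_contents($strLink).
$fileContents = '<html><body><div id="threeblmediawidget">news</div></body></html>';
$arrElements  = ['//div[@id="threeblmediawidget"]', '//div[@id="threeblmediadetaillist"]'];

// Check 1: is the id present in the raw markup at all?
$inRawSource = strpos($fileContents, 'threeblmediawidget') !== false;

// Check 2: collect libxml messages instead of hiding them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($fileContents);
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
libxml_clear_errors();

// Check 3: distinguish "invalid expression" (false) from "no match" (length 0).
$xpath  = new DOMXPath($dom);
$counts = [];
foreach ($arrElements as $strVal) {
    $nodes = $xpath->query($strVal);
    $counts[$strVal] = ($nodes === false) ? null : $nodes->length;
}

var_dump($inRawSource);
print_r($counts);
```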

Getting table cell TD value using XPath and DOM in PHP

I need to access table cell values via DOM / PHP. The web page is loaded into $myHTML. I have identified the XPath as :
//*[@id="main-content-inner"]/div[2]/div[1]/div/div/table/tbody/tr/td[1]
I want to get the text of the value in the cell as follows:
$dom = new DOMDocument();
$dom->loadHTML($myHTML);
$xpath = new DOMXPath($dom);
$myValue = $xpath->query('//*[@id="main-content-inner"]/div[2]/div[1]/div/div/table/tbody/tr/td[1]');
echo $myValue->nodeValue;
But I am getting an "Undefined Property: DOMNodeList::$nodeValue" error. How do I retrieve the value of this table cell? I have tried various techniques from Stack Overflow with no luck.
DOMXPath::query() returns a DOMNodeList, even if there's only one match.
If you know for sure you have a match there, you can use
echo $myValue->item(0)->nodeValue;
But if you want to be bulletproof, you'd better check the length in advance, e.g.
if ($myValue->length > 0) {
    echo $myValue->item(0)->nodeValue;
} else {
    //No such cell. What now?
}
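The length check can be wrapped in a small helper so that a missing cell yields null instead of an error. This is a sketch under assumptions: firstNodeValue is a made-up name, and the markup is a hypothetical stand-in for $myHTML. It also illustrates a related pitfall: browsers insert a tbody element into tables when they build the page, so an XPath copied from browser developer tools may contain /tbody/ even though the raw markup, and therefore the document built by loadHTML(), has none.

```php
<?php
// Hypothetical markup standing in for $myHTML; note that it has no <tbody>.
$myHTML = '<div id="main-content-inner"><table><tr><td>42</td><td>other</td></tr></table></div>';

// Hypothetical helper: value of the first match, or null if nothing matched
// (or if the expression itself was invalid and query() returned false).
function firstNodeValue(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($myHTML);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Without tbody in the path, the cell is found.
$found = firstNodeValue($xpath, '//*[@id="main-content-inner"]//table//tr/td[1]');
// With tbody in the path, nothing matches in this sample, so the helper returns null.
$missing = firstNodeValue($xpath, '//*[@id="main-content-inner"]//table/tbody/tr/td[1]');

var_dump($found, $missing);
```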
