Using DOMDocument to get the href and nodeValue - PHP

I need to fetch the nodeValue and the href from the following snippet:
<a class="head_title" href="/automotive/pr?sid=0hx">Automotive</a>
To achieve this I have done the following:
foreach($dom->getElementsByTagName('a') as $p) {
    if($p->getAttribute('class') == 'head_title') {
        foreach($p->childNodes as $child) {
            $name = $child->nodeValue;
            echo $name ."<br />";
            echo $child->hasAttribute('href');
        }
    }
}
It returns me an error:
PHP Fatal error: Call to undefined method DOMText::hasAttribute()
Can anyone please help me with this?

hasAttribute() is a valid method on DOMElement, but you cannot call it on text nodes. Check the type of each node first and only call it when the node is not a text node. The following code might help you:
foreach($p->childNodes as $child) {
    $name = $child->nodeValue;
    echo $name ."<br />";
    // XML_ELEMENT_NODE is the built-in constant for node type 1
    if ($child->nodeType == XML_ELEMENT_NODE) {
        echo $child->hasAttribute('href');
    }
}
It checks if the node is of type 'DOMElement' and invokes hasAttribute method only if it is a DOMElement.
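To see the check in action, here is a minimal sketch using PHP's built-in DOM extension. The helper name elementChildHrefs and the sample paragraph are made up for illustration; they are not from the question.

```php
<?php
// Hypothetical helper: collect href values from element children only,
// skipping text nodes, which have no attribute methods.
function elementChildHrefs(DOMNode $parent): array
{
    $hrefs = [];
    foreach ($parent->childNodes as $child) {
        // DOMText nodes fail the instanceof test, so hasAttribute() is never called on them.
        if ($child instanceof DOMElement && $child->hasAttribute('href')) {
            $hrefs[] = $child->getAttribute('href');
        }
    }
    return $hrefs;
}

$dom = new DOMDocument();
// Made-up markup: a paragraph holding a text node followed by a link.
$dom->loadHTML('<p>See <a href="/automotive/pr?sid=0hx">Automotive</a></p>');
$p = $dom->getElementsByTagName('p')->item(0);
print_r(elementChildHrefs($p));
```

Using instanceof DOMElement is equivalent to comparing nodeType against XML_ELEMENT_NODE here; either form keeps attribute calls away from text nodes.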

Yes, I changed my code as follows:
foreach($dom->getElementsByTagName('a') as $link) {
    if($link->getAttribute('class') == 'head_title') {
        $link2 = $link->nodeValue;            // link text
        $link1 = $link->getAttribute('href'); // href attribute, read from the <a> element itself
        echo $link2 . "<br/>";
    }
}
And it works for me!
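For reference, the same idea as a self-contained sketch that can be run from the command line. The function name linksByClass is hypothetical; the markup is the snippet from the question.

```php
<?php
// Hypothetical helper: return text/href pairs for <a> elements with the given class.
function linksByClass(DOMDocument $dom, string $class): array
{
    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        if ($a->getAttribute('class') === $class) {
            // Both values are read from the <a> element itself, not from its children.
            $links[] = ['text' => $a->nodeValue, 'href' => $a->getAttribute('href')];
        }
    }
    return $links;
}

$dom = new DOMDocument();
$dom->loadHTML('<a class="head_title" href="/automotive/pr?sid=0hx">Automotive</a>');
foreach (linksByClass($dom, 'head_title') as $link) {
    echo $link['text'], ' => ', $link['href'], "\n";
}
```

The exact comparison on the class attribute only matches elements whose class is precisely head_title; an element with several classes would need a different test.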

Related

PHP Simple HTML DOM - find method returns an empty array

I am trying to write a PHP parser and I don't know why my code returns an empty array. I am using PHP Simple HTML DOM. I know my code isn't perfect, but it's only for testing.
I would appreciate any help.
public function getData() {
    // iterate over the URLs read from urls.txt
    foreach ($this->list_url as $i => $url) {
        // create a DOM object from an HTML file
        $this->html = file_get_html($url);
        // find all elements with class="name", because every product has a name
        $products = $this->html->find(".name");
        foreach ($products as $number => $product) {
            // get the href attribute of the product link
            $href = $this->html->find("div.name a", $number)->attr['href'];
            // create a DOM object from the product page
            $html = file_get_html($href);
            if ($html && is_object($html) && isset($html->nodes)) {
                echo "TRUE - all goodly";
            } else {
                echo "FALSE - all badly";
            }
            // get all elements with class="description"
            // $nodes is empty. WHY? The web page has content and a div.description.
            $nodes = $html->find('.description');
            if (count($nodes) > 0) {
                $needle = "Производитель:"; // Russian for "Manufacturer:"
                foreach ($nodes as $short_description) {
                    if (stripos($short_description->plaintext, $needle) !== FALSE) {
                        echo "TRUE";
                        $this->data[] = $short_description->plaintext;
                    } else {
                        echo "FALSE";
                    }
                }
            } else {
                $this->data[] = '';
            }
            $html->clear();
            unset($html);
        }
        $this->html->clear();
        // release the listing page; the inner $html was already unset above
        unset($this->html);
    }
    return $this->data;
}
Hi, you should inspect the element in your browser's developer tools, choose Copy > Copy selector, and pass that selector to the find method to get the object.

Loop through a table with Simple HTML DOM

I'm trying to scrape data out of a table on a website. I wrote the following PHP, but it isn't working.
I receive the following error: Notice: Trying to get property of non-object in DataScraping.php on line 27
// Load the Simple HTML DOM library
require_once 'C:/xampp/php/lib/SimpleHTMLDOM/simple_html_dom.php';
$html = new simple_html_dom();
$html = file_get_html('https://www.flightradar24.com/data/flights/british-airways-ba-baw');
foreach($html->find('table[id=tbl-datatable]') as $datatable) {
    foreach($datatable->find('tr') as $tr) {
        foreach($tr->find('td') as $td) {
            if(strpos($td->find('a', 0)->href, 'https://www.flightradar24.com/data/flights/') !== false) {
                echo $td->find('a', 0)->innertext .", " .$td->find('a', 0)->href;
            }
        }
    }
}
Also worth mentioning: this data is publicly available and it is only for personal use. Please don't comment about copyright infringement - there is nothing wrong with what I want to do.
I'm simply trying to scrape the flight number only, both the inner text and the URL that sits behind it. Any help on where I'm going wrong?
An additional test provides the data I need, but with the same error in between rows:
foreach($html->find('table[id=tbl-datatable]') as $datatable) {
    foreach($datatable->find('tr') as $tr) {
        foreach($tr->find('td') as $td) {
            if (strpos($td->find('a', 0)->href, '/data/flights/') !== false) {
                $test = $td->find('a', 0)->href;
                $test2 = $td->find('a', 0)->innertext;
                echo $test .", " .$test2;
            }
        }
    }
}
You're trying to access elements of a null reference in your if statement itself, because not all of the <TD> tags have <A> tags in them. When there's no <A> tag in $td, $td->find('a', 0) is null, so
$td->find('a', 0)->href
is just what your error message said: "trying to get [a] property of [a] non-object".
You can fix this by checking the result of find() for null with an if:
$atag = $td->find('a', 0);
if ($atag) {
    // ...
}
You can fold this into your single if statement with the && operator. I found a couple of other problems when running your code:
- In the source of that site, the hrefs in the table are all relative, not absolute, so a check for 'https://www.flightradar24.com' matches none of them.
- You're not adding a newline at the end of your echo.
So, to summarize my suggestions, something like this seems to work:
foreach($tr->find('td') as $td) {
    $atag = $td->find('a', 0);
    if($atag && strpos($atag->href, '/data/flights/') !== false) {
        echo $atag->innertext . ", " . $atag->href . "\n";
    }
}
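As an alternative sketch, the same filtering can be done with PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. This is a swapped-in technique, not the approach from the answer above. The function name flightLinks and the sample table are made up, and the real page may be structured differently.

```php
<?php
// Sketch using the built-in DOM extension instead of Simple HTML DOM.
function flightLinks(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Select only anchors that actually exist inside table cells,
    // so cells without a link never produce a null to dereference.
    $query = "//table[@id='tbl-datatable']//td//a[contains(@href, '/data/flights/')]";
    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [trim($a->textContent), $a->getAttribute('href')];
    }
    return $links;
}

// Made-up sample table: one cell without a link, one with a relative link.
$sample = '<table id="tbl-datatable"><tr><td>London</td>'
        . '<td><a href="/data/flights/ba1">BA1</a></td></tr></table>';
foreach (flightLinks($sample) as [$text, $href]) {
    echo $text, ', ', $href, "\n";
}
```

Because the XPath expression matches the anchors directly, the loop body only ever sees real elements, which removes the need for the explicit null check.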

Parsing XML with PHP DOMDocument

I've been looking around for a solution for this but can't find one anywhere.
I am trying to parse an XML file, but certain tag names are missing from the XML. Some posts suggest using the object length, but this doesn't work either.
if ($xmlObject->item($i)->getElementsByTagName('image1')->item(0)->childNodes->item(0)->length > 0) {
    $product_image1 = $xmlObject->item($i)->getElementsByTagName('image1')->item(0)->childNodes->item(0)->nodeValue;
} else {
    $product_image1 = "";
}
Notice: Trying to get property of non-object in
/home/s/public_html/import_xml.php on line 72
Fatal error: Call to a member function item() on a non-object in
/home/s/public_html/import_xml.php on line 72
The error is because <image1> is missing from the XML.
Any ideas on a fix?
This is how I've done it. Not sure if it's "the best" way, but it works...
$nodes = array(); // names of the element children found in this product
foreach ($xmlDoc->getElementsByTagName('product')->item($i)->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        $nodes[] = $node->nodeName;
        echo "Processing node " . $node->nodeName . "<br />";
    }
}
if (in_array("name", $nodes)) {
    $product_name = $xmlObject->item($i)->getElementsByTagName('name')->item(0)->childNodes->item(0)->nodeValue;
} else {
    $product_name = "";
}
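Another way to handle a missing tag, sketched here under the assumption that each product is a DOMElement, is to test the result of item(0) for null before reading it. The helper name childText and the sample XML are hypothetical.

```php
<?php
// Hypothetical helper: text of the first <$tag> descendant, or $default if the tag is absent.
function childText(DOMElement $parent, string $tag, string $default = ''): string
{
    // item(0) returns null when no matching element exists.
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? $default : trim($node->textContent);
}

// Made-up XML: the second product has no <image1> element.
$xml = '<products>'
     . '<product><name>Lamp</name><image1>lamp.jpg</image1></product>'
     . '<product><name>Desk</name></product>'
     . '</products>';
$doc = new DOMDocument();
$doc->loadXML($xml);

foreach ($doc->getElementsByTagName('product') as $product) {
    echo childText($product, 'name'), ': ', childText($product, 'image1', '(no image)'), "\n";
}
```

Checking for null at the point where the lookup can fail avoids both the notice and the fatal error, because nothing is ever called on a missing node.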

Error using PHP Simple HTML DOM parser

if (!is_null($elements)) {
    $embeds = array();
    foreach ($elements as $element) {
        if (trim(strip_tags($element->innertext)) == $episode_term) {
            $html2 = file_get_html($element->href);
            $elements2 = $html2->find('#streamlinks .sideleft a');
            if (!is_null($elements2)) {
                foreach ($elements2 as $element) {
                    $html3 = file_get_html($element->href);
                    $iframe_element = $html3->find('.frame', 0);
                    if (!is_null($iframe_element)) {
                        $embed = $misc->buildEmbed($iframe_element->src);
                        if ($embed) {
                            $embeds[] = array(
                                "embed" => $embed,
                                "link" => $iframe_element->src,
                                "language" => "ENG",
                            );
                        }
                    }
                }
            }
        }
    }
    return $embeds;
}
PHP Fatal error: Call to a member function find() on a non-object in
$elements2 = $html2->find('#streamlinks .sideleft a');
So it's unclear to me what is causing this error to appear in my error log file.
I'd try to output $element->href before you call file_get_html.
If file_get_html can't fetch the page, $html2 does not hold a parser object and you can't call find() on it.
Besides that, you could check whether $html2 is set after the file_get_html call and output an error if it is not. I usually use something like this:
if($html2 == false || $html2 == NULL){
    // no html found
}else{
    // html found
}
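The same guard can be sketched with only built-in functions, in case it helps to test the idea without Simple HTML DOM. This swaps in file_get_contents and DOMDocument; the helper name loadDocument and the path are made up for illustration.

```php
<?php
// Hypothetical helper: load a page into a DOMDocument, or return null on failure.
function loadDocument(string $source): ?DOMDocument
{
    // @ suppresses the warning; failure is reported through the false return value.
    $html = @file_get_contents($source);
    if ($html === false || trim($html) === '') {
        return null;
    }
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    return $dom;
}

$doc = loadDocument('/no/such/page.html');   // made-up path, assumed not to exist
if ($doc === null) {
    echo "Could not load page, skipping\n";
} else {
    echo $doc->getElementsByTagName('a')->length, " links found\n";
}
```

Returning null for every failure case gives the caller a single condition to test before calling any method on the result.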

How can I get a child element by class using PHP DOMXPath?

I want to get a child element with a specific class from HTML. I have managed to find the element by tag name, but I can't figure out how to get the child element with a specific class.
Here is my code:
<?php
$html = file_get_contents('myfileurl'); // get the HTML returned from the URL
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE); // disable libxml errors
if (!empty($html)) { // if any HTML is actually returned
    $pokemon_doc->loadHTML($html);
    libxml_clear_errors(); // remove errors for yucky HTML
    $pokemon_xpath = new DOMXPath($pokemon_doc);
    // get all the li elements with class="content"
    $pokemon_row = $pokemon_xpath->query("//li[@class='content']");
    if ($pokemon_row->length > 0) {
        foreach ($pokemon_row as $row) {
            $title = $row->getElementsByTagName('h3');
            foreach ($title as $a) {
                echo "Title: ";
                echo strip_tags($a->nodeValue). '<br>';
            }
            $links = $row->getElementsByTagName('a');
            foreach ($links as $l) {
                echo "Link: ";
                echo strip_tags($l->nodeValue). '<br>';
            }
            $desc = $row->getElementsByTagName('span');
            // I tried this but it didn't work. I want to get the span with class desc.
            //$desc = $row->query("//span[@class='desc']");
            foreach ($desc as $d) {
                echo "DESC: ";
                echo strip_tags($d->nodeValue) . '<br><br>';
            }
            // echo $row->nodeValue . "<br/>";
        }
    }
}
?>
Please let me know in the comments if this is a duplicate that I couldn't find, or if you think the question is not good or not explained well.
Thanks.
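One possible approach, offered as a sketch and not a confirmed fix: query() is a method of DOMXPath, not of the row element, and it accepts a context node as its second argument. A path starting with "." is then evaluated relative to that node. The sample markup below is made up to stand in for the real page.

```php
<?php
// Made-up markup standing in for the real page.
$html = '<ul><li class="content"><h3>First item</h3>'
      . '<span class="type">Example</span><span class="desc">Short description</span></li></ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$descriptions = [];
foreach ($xpath->query("//li[@class='content']") as $row) {
    // The leading "." makes the path relative to $row, which is passed as the context node.
    foreach ($xpath->query(".//span[@class='desc']", $row) as $span) {
        $descriptions[] = trim($span->nodeValue);
    }
}
print_r($descriptions);
```

Note that @class='desc' only matches when the attribute is exactly desc; a span carrying several classes would need a contains() based test.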
