Parse html from a .dat file - php

I'm trying to parse some HTML that is inside an external .dat file.
I would normally use the following code:
$html = new DOMDocument();
$html->loadHTMLFile('http://www.bvl.com.pe/includes/cotizaciones_todas.dat');
$xpath = new DOMXPath($html);
$path = '/somepath';
$nodelist = $xpath->query($path);
echo $nodelist->item(0)->nodeValue;
But I'm getting this error:
DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://www.bvl.com.pe/includes/cotizaciones_todas.dat, line: 15
I know that the problem is in loadHTMLFile. I tried using load and loadXML instead, but neither works.
Any help would be appreciated.
UPDATE
To solve the problem I had to handle the errors using libxml_use_internal_errors(TRUE).
Now I have a new problem: I want to count how many <tr> tags are inside the table. I'm using the following code:
$html = new DOMDocument();
libxml_use_internal_errors(TRUE);
$html->loadHTMLFile('http://www.bvl.com.pe/includes/cotizaciones_todas.dat');
libxml_clear_errors();
$xpath = new DOMXPath($html);
$tbody = $html->getElementsByTagName('tbody')->item(0);
$path = 'count(tr)';
$trCount = $xpath->evaluate($path,$tbody);
But I'm getting this error message:
PHP Catchable fatal error: Argument 2 passed to DOMXPath::evaluate() must be an instance of DOMNode, null given
I have already used the same code with other files and everything worked fine, but in this case it's not working. Maybe because the HTML is broken?
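The "null given" part suggests that getElementsByTagName('tbody')->item(0) found no <tbody> at all. As far as I know, libxml's HTML parser does not insert a missing <tbody> the way browsers do, so one possible fix is to fall back to the <table> element. A minimal sketch, assuming the .dat file has its rows directly inside <table> (the sample markup below is made up for illustration):

```php
<?php
// Sample markup standing in for the .dat file (an assumption): rows sit
// directly inside <table>, with no explicit <tbody>.
$markup = '<table><tr><td>A &B</td></tr><tr><td>C</td></tr></table>';

libxml_use_internal_errors(true);   // collect parse warnings instead of emitting them
$html = new DOMDocument();
$html->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($html);

// item(0) returns null when the document contains no <tbody> element.
$tbody = $html->getElementsByTagName('tbody')->item(0);

// Fall back to the <table> element when there is no <tbody>.
$context = $tbody ?? $html->getElementsByTagName('table')->item(0);

// Count every <tr> below the context node; evaluate() returns a float here.
$trCount = ($context !== null) ? (int) $xpath->evaluate('count(.//tr)', $context) : 0;

echo $trCount, PHP_EOL;
```

Checking the context node for null before passing it to evaluate() avoids the fatal error whether or not the markup is broken.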

Related

How to get the href attribute

I have a url:
http://www.indeed.com/viewjob?jk=daddefef363643d7&qd=E8dXiB4h7yBMgEwoEDfyDF2ACaqK5NNcKe-lg0a0QeWlgGT7hwsgagao8YFkybxtaLZJqFprtIWhTxIjvWFBLUePVQb0Chqftd-uc7_Pfa4LB2pHYt-YP2NYagtBg9Lp&atk=1a4sk4spi1c0o5la&utm_source=publisher&utm_medium=organic_listings&utm_campaign=affiliate
I want to extract the href value of the anchor for "view and apply".
My code is:
$dom = new DOMDocument();
@$dom->loadHtml($html);
$xpath = new DOMXpath($dom);
$applylink = $xpath->query("//*[@class='job-footer-button-row']/a");
if(!is_null($applylink)){
$this->view->applylink = $applylink->item(0)->getAttribute('href');
}
But it always shows the error below:
Fatal error: Call to a member function getAttribute() on a non-object
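This happens because DOMXPath::query does not return null when it finds no matches. It returns a DOMNodeList, which is simply empty in that case (and false if the expression is malformed), so the is_null check always passes and item(0) then returns null. Check the length of the list before calling getAttribute.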
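A minimal sketch of that check, assuming markup shaped like the class name in the question (the sample HTML and the no-such-class selector are made up for illustration):

```php
<?php
// Hypothetical markup standing in for the fetched job page.
$html = '<div class="job-footer-button-row"><a href="/apply?id=1">Apply</a></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$applylink = $xpath->query("//*[@class='job-footer-button-row']/a");

// query() returns a DOMNodeList (possibly empty), or false for an invalid expression.
$href = null;
if ($applylink !== false && $applylink->length > 0) {
    $href = $applylink->item(0)->getAttribute('href');
}

// A query with no matches still yields a list object, just with length 0.
$missing = $xpath->query("//*[@class='no-such-class']/a");

echo $href ?? 'no link found', PHP_EOL;
```

Testing length rather than is_null means the code never calls getAttribute on a missing node.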

Fatal error: Call to undefined method DOMDocument::getElementsById()

I'm parsing an external HTML page (http://www.amazon.com/Toshiba-Satellite-C55-A5245-15-6-Inch-Horizon/dp/B00D78PZE8/ref=lp_9277875011_1_1?s=pc&ie=UTF8&qid=1400886357&sr=1-1) where I have an element like this:
<span id="priceblock_ourprice" class="a-size-medium a-color-price">$429.99</span>
and a php with the following code:
$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile($url);
libxml_clear_errors();
$links = $dom->getElementsById('priceblock_ourprice');
foreach ($links as $link ) {
echo "- ".$link->nodeValue."<br>";
}
But I'm getting the following error:
Fatal error: Call to undefined method DOMDocument::getElementsById()
Could anyone tell me what I'm doing wrong?
Thanks!
getElementsById() is not a method of DOMDocument; use getElementById() instead. IDs must be unique within a document, so the method returns a single element (or null if nothing matches) rather than a collection, and there is nothing to loop over with foreach.
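A minimal sketch of the single-element version, using the <span> from the question inside a made-up wrapper (the inline markup stands in for the downloaded page):

```php
<?php
// The <span> is copied from the question; the surrounding <div> is an assumption.
$markup = '<div><span id="priceblock_ourprice" class="a-size-medium a-color-price">$429.99</span></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($markup);
libxml_clear_errors();

// getElementById() returns one DOMElement, or null when no element has that id.
$price = $dom->getElementById('priceblock_ourprice');
$text = ($price !== null) ? $price->nodeValue : null;

echo $text ?? 'not found', PHP_EOL;
```

The null check matters here because a page served to a script can differ from what the browser shows, in which case the id may not exist at all.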
OK, so I don't quite understand this. It seems that Firebug in Firefox was showing me the wrong ID. I used the following code to print the IDs of the different spans and find the right one:
$dom = new DOMDocument();
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile($url);
libxml_clear_errors();
$nodes = $dom->getElementsByTagName('span');
foreach($nodes as $node) {
echo $node->getAttribute('id'). '->'.$node->textContent.'<br>';
}
and it returned a different ID for the field that I was looking for. I guess I made an error at some point. Really sorry for wasting your time.

output only a specific node PHP XML

I'm trying to save a specific node instead of the full XML file, but I get an error:
Catchable fatal error: Argument 1 passed to DOMDocument::saveXML() must be an instance of DOMNode, instance of DOMNodeList given in php\corrdination.php on line 31
I'm following the DOM documentation, but its examples create new elements, whereas I only read from an existing XML file, so they don't apply to my case.
My line 31 is
$resultX = $xpath->query('/stickers/sticker[id="200"]/position/x');
And when I try to save only the changed node, I write:
echo $xml->saveXML($resultX);
Any suggestions on how to do it?
This is my whole php file.
$xml = new DOMDocument();
$xml->formatOutput = TRUE;
$xml->preserveWhiteSpace = FALSE;
$xml->load('../stickers.xml');
$xpath = new DOMXPath($xml);
$resultX = $xpath->query('/stickers/sticker[id="200"]/position/x');
$resultX->item(0)->nodeValue = "150";
echo $xml->saveXML($resultX);
If I only call echo $xml->saveXML();
the query works but, as I said, it outputs the whole document.
XML file:
<stickers>
<sticker>
<position>
<x>0</x>
</position>
<text>Hello world </text>
<id>200</id>
</sticker>
</stickers>
Thanks
The error says you have to pass a DOMNode to DOMDocument::saveXML(), but DOMXPath::query() returns a DOMNodeList. So you need to change this line:
echo $xml->saveXML($resultX);
to this:
echo $xml->saveXML($resultX->item(0));
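Put together, a minimal self-contained sketch might look like this. It loads the XML from the question as an inline string instead of ../stickers.xml, and the length check is an addition for safety:

```php
<?php
$xml = new DOMDocument();
// Inline copy of the XML from the question (stands in for ../stickers.xml).
$xml->loadXML('<stickers><sticker><position><x>0</x></position><text>Hello world </text><id>200</id></sticker></stickers>');

$xpath = new DOMXPath($xml);
$resultX = $xpath->query('/stickers/sticker[id="200"]/position/x');

$fragment = '';
if ($resultX !== false && $resultX->length > 0) {
    $node = $resultX->item(0);          // a single DOMNode taken from the list
    $node->nodeValue = '150';
    $fragment = $xml->saveXML($node);   // serializes only this node
}

echo $fragment, PHP_EOL;
```

Passing a node to saveXML() serializes just that subtree, without the XML declaration.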

Using a var as argument 2 in addChild method for writing XML

Here I parse some data from a webpage.
I want to write it to a file. It all works OK when I use a test string, as in
$xml->addChild('alink', 'test');
But when I try to write the data I actually need, using
$xml->addChild('alink', $value);
it doesn't work.
The message is:
Warning: SimpleXMLElement::addChild() [simplexmlelement.addchild]: unterminated entity reference .wvx= in C:\Documents and Settings\Owner\My Documents\Downloads\XAMPP_1.7.1\xampp\htdocs\PhpTest2\index.php on line 96
The complete code is below. Why does addChild not let me use a variable as argument 2 of that method? And what is the workaround to get this working? I can find no explanation on php.net.
$dom = new DOMDocument();
@$dom->loadHtml($html);
$xpath = new DOMXPath($dom);
$articleList = $xpath->query("//body/div/div/div/table/tbody/tr/td/a");
$xml = new SimpleXmlElement('<links></links>');
$xml->addChild('dvd');
foreach ($articleList as $art)
{
$value = $art->getAttribute('href');
$xml->addChild('alink', $value);
}
$xml->asXML('/simplexml_create.xml');
Many Thanks,
-Code
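The warning points at the content of $value rather than at the use of a variable: the href evidently contains a bare "&", and addChild() does not escape ampersands in its second argument. One commonly used workaround is to escape the value first. A minimal sketch, with a made-up URL standing in for the scraped href:

```php
<?php
// Hypothetical href containing a bare ampersand, like the ".wvx" link in the warning.
$value = 'http://example.com/stream.wvx?id=1&format=asx';

$xml = new SimpleXMLElement('<links></links>');

// Escape XML special characters first; addChild() does not escape "&" itself.
$xml->addChild('alink', htmlspecialchars($value));

$output = $xml->asXML();   // the ampersand is serialized as &amp;
echo $output;
```

Reading the element back with (string) $xml->alink[0] should give the original URL, since the escaping only affects the serialized form.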

php domdocument exception loadHTMLFile

I am seeing strange behavior in my script that has me confused.
Script 1.
$dom = new DOMDocument();
$dom->loadHTMLFile("html/signinform.html");//loads file here
$form = $dom->getElementsByTagName("form")->item(0);
$div = $dom->createElement("div");
$dom->appendChild($div)->appendChild($form);
echo $dom->saveHTML();
Script 2.
$dom = new DOMDocument();
$div = $dom->createElement("div");
$dom->loadHTMLFile("html/signinform.html");//loads file here
$form = $dom->getElementsByTagName("form")->item(0);
$dom->appendChild($div)->appendChild($form);
echo $dom->saveHTML();
Script 1 works without problem. It shows the form. However Script 2 throws the following error: Fatal error: Uncaught exception 'DOMException' with message 'Wrong Document Error' in C:\Users
Could someone explain to me why merely changing the position of the loadHTMLFile call results in such an error? Thanks
You created an element (div) and then loaded a file. Loading replaces the document's contents, so the div no longer belongs to the document that was parsed from the file.
Load the file first if you intend to use one.
For DOM manipulation you do not need to re-insert an element that already exists in the document: $dom->appendChild($form) only reinserts the same form element. When you fetch an element with $dom->getElementsByTagName("form")->item(0), you get a reference to the node inside the document, which you can use directly and append to. A proper example would be:
$dom = new DOMDocument();
$dom->loadHTMLFile("assets/dom_document-form.html");
$div = $dom->createElement("div");
$form = $dom->getElementsByTagName("form")->item(0);
$form->appendChild($div);
echo $dom->saveHTML();
So load the document first, then append directly to the element you fetched from the DOM.
To address your initial question as well:
Append directly to the element that you fetched, since it references the node in the document.
new DOMDocument can be used to create multiple documents.
Using DOMDocument::createElement before loadHTMLFile effectively leaves you with two documents.
Using DOMDocument::createDocumentFragment behaves the same way and creates its own DOM.
If you would like to keep your code the same and work with two documents, then you should use DOMDocument::importNode. An example of this would be:
$dom = new DOMDocument();
$div = $dom->createElement("div");
$dom->loadHTMLFile("assets/dom_document-form.html");
$node = $dom->importNode($div);
$form = $dom->getElementsByTagName("form")->item(0);
$form->appendChild($node);
echo $dom->saveHTML();
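To see the error and the importNode fix side by side, here is a minimal sketch that uses two explicitly separate documents and an inline form instead of the HTML file. The names $source and $target and the sample markup are assumptions for illustration, not part of the original scripts:

```php
<?php
libxml_use_internal_errors(true);

// $source owns the <div>; $target is a separate document holding the form.
$source = new DOMDocument();
$div = $source->createElement('div');

$target = new DOMDocument();
$target->loadHTML('<form action="/signin"></form>');   // stand-in for the HTML file
libxml_clear_errors();

$form = $target->getElementsByTagName('form')->item(0);

// Appending a node owned by another document is rejected.
$caught = false;
try {
    $form->appendChild($div);
} catch (DOMException $e) {
    $caught = ($e->code === DOM_WRONG_DOCUMENT_ERR);
}

// importNode() copies the node into $target, so the copy can be appended.
$copy = $target->importNode($div, true);
$form->appendChild($copy);

echo $target->saveHTML($form), PHP_EOL;
```

The second argument to importNode() requests a deep copy, which only matters once the imported element has children of its own.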
