PHP Simple HTML DOM Parser not updating multiple classes

I'm trying to apply 2 classes to an element like this:
$div->setAttribute('class', 'txt found');
Unfortunately, it doesn't work; I get the following markup:
<div found="" class="txt">
I've also tried $div->class = "txt found"; which had the same result.
Any ideas how to fix this?

Could you please try the following:
$div->className = "txt found";
Updated:
<?php
$divHtml = "<div></div>";
$dom = new DOMDocument();
$dom->loadHTML($divHtml);
$allElements = $dom->getElementsByTagName('div');
$divElement = $allElements->item(0);
$divElement->setAttribute("class", "txt found");
echo $dom->saveHTML();
?>
I tried to reproduce your case and it finally worked. You can test it. If you send more code, we can modify it to make it work.
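Whichever parser is used, "txt found" has to end up as one attribute value. If the element may already carry classes, a minimal sketch with PHP's built-in DOMDocument (as in the answer above, not simple_html_dom) can merge the lists before the single setAttribute() call. `addClass()` is a hypothetical helper name, not a DOM extension method:

```php
<?php
// Hypothetical helper: add one or more classes to a DOMElement
// without clobbering the classes it already has.
function addClass(DOMElement $el, string ...$classes): void
{
    // Split the existing class attribute on any run of whitespace.
    $current = preg_split('/\s+/', trim($el->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
    // Merge, drop duplicates, keep first-seen order.
    $merged = array_values(array_unique(array_merge($current, $classes)));
    // Store the whole space-separated list as ONE attribute value.
    $el->setAttribute('class', implode(' ', $merged));
}

$dom = new DOMDocument();
$dom->loadHTML('<div class="txt"></div>');
$div = $dom->getElementsByTagName('div')->item(0);
addClass($div, 'found', 'txt');
echo $div->getAttribute('class'), PHP_EOL; // txt found
```

Splitting and re-joining means calling the helper twice with the same class is harmless.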

Related

Get text from script output

Hi everyone, I've been using this code for quite a long time:
<?php
$url = 'http://www.smn.gov.ar/mensajes/index.php?observacion=metar&operacion=consultar&87582=on&87641=on&87750=on&87765=on&87222=on&87761=on&87860=on&87395=on&87344=on&87166=on&87904=on&87571=on&87347=on&87803=on&87576=on&87162=on&87532=on&87497=on&87097=on&87046=on&87548=on&87217=on&87506=on&87692=on&87418=on&87574=on&87715=on&87374=on&87289=on&87852=on&87178=on&87896=on&87823=on&87270=on&87155=on&87453=on&87925=on&87934=on&87480=on&87047=on&87553=on&87311=on&87909=on&87436=on&87509=on&87912=on&87623=on&87444=on&87129=on&87371=on&87645=on&87022=on&87127=on&87828=on&87121=on&87938=on&87791=on&87448=on';
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile($url);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
// search for td's containing METAR
$metars = $xpath->query('//td[contains(text(), "METAR SA")]');
if($metars->length <= 0) {
echo 'no metars found';
exit;
}
$data = array();
foreach($metars as $metar) {
$data[] = $metar->nodeValue;
}
echo '<pre>';
print_r($data);
This worked fine until the program that reads the output was updated; now it needs clean output.
At the moment I'm getting this:
http://ar.ivao.aero/weather/metar.php
But the program needs it like this:
SABE 161600Z 02006KT 9999 FEW030 24/18 Q1009 =
SAZA 161600Z 18011KT CAVOK 24/08 Q1010 =
SAZB 161700Z 27012KT CAVOK 21/09 Q1011 =
I thought about using another approach such as file_get_contents(), but again it would show the information I don't want.
I also tried replacing print_r() with var_dump(), but the result is the same.
Any ideas?
Is there any way to get this information into a simple .txt file?
Regards,
You need to filter out some data. Try to find out what the info you need to output has in common. For instance, all the required info in your raw print_r data seems to begin with METAR, so:
echo '<pre>';
foreach($metars as $metar) {
if(substr($metar->nodeValue, 0, 5) === "METAR") {
echo str_replace("METAR ", "", $metar->nodeValue) . PHP_EOL;
}
}
That removes any lines like Aeropuerto FORMOSA from the output.
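To get clean plain-text output (or a .txt file, as asked), the same filtering can be wrapped in a function that returns only the stripped lines. This is a sketch under assumptions: `extractMetars()` is a hypothetical name, the sample markup only imitates the real page, and the file name `metars.txt` is arbitrary:

```php
<?php
// Hypothetical helper: return the METAR lines without the "METAR " prefix.
function extractMetars(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // the source page is not valid HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);
    $lines = [];
    // Same XPath query as in the question.
    foreach ($xpath->query('//td[contains(text(), "METAR SA")]') as $td) {
        $text = trim($td->nodeValue);
        if (strpos($text, 'METAR ') === 0) {
            $text = substr($text, strlen('METAR '));
        }
        $lines[] = $text;
    }
    return $lines;
}

// Sample markup standing in for the real page (an assumption).
$sample = '<table><tr><td>Aeropuerto FORMOSA</td></tr>'
        . '<tr><td>METAR SABE 161600Z 02006KT 9999 FEW030 24/18 Q1009 =</td></tr></table>';
$metars = extractMetars($sample);

// With the real page, something like:
// $metars = extractMetars(file_get_contents($url));
// header('Content-Type: text/plain');   // plain text instead of <pre> when served over HTTP
// file_put_contents('metars.txt', implode(PHP_EOL, $metars) . PHP_EOL);
echo implode(PHP_EOL, $metars), PHP_EOL;
```

Sending a `text/plain` header (or writing the file) avoids the `<pre>` wrapper and the print_r() array syntax that the reading program cannot handle.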

Scraping Google Custom Search results with PHP

I'm using simple_html_dom.php
<?php
include('simple_html_dom.php');
$songName = '再见青春';
$dom = file_get_html('http://www.google.com/cse?q='. $songName .'&cx=partner-pub-4291153493758949%3A9692445719&cof=FORID%3A10&ie=UTF-8&ad=w9&num=1');
$firstRow = $dom->find('#gs-visibleUrl-long')->plaintext;
echo $dom;
var_dump($firstRow);
?>
$dom is fine, but when I try to dig into the DOM, it doesn't work: $firstRow is NULL. Am I doing this scraping wrong?
The DOM and the error are here: http://daysof.me/chrome_lyric/lyric.php
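A likely cause: in simple_html_dom, find() returns an array of matches unless an index is passed, so `->plaintext` on that array gives NULL; `$dom->find('#gs-visibleUrl-long', 0)` returns the first match (or null). Also, Custom Search results are usually rendered client-side by JavaScript, so the element may not be in the HTML the server returns at all, and a non-ASCII query such as the song name should go through urlencode(). The same "collection vs. single element" pattern, sketched with PHP's built-in DOM extension (swapped in so it runs without simple_html_dom.php; `firstTextById()` is a hypothetical helper):

```php
<?php
// Hypothetical helper: text of the first element with the given id, or null.
function firstTextById(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);
    // query() returns a DOMNodeList (a collection), never a single node...
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    // ...so take the first item explicitly and handle "not found".
    $node = $nodes->item(0);
    return $node === null ? null : trim($node->textContent);
}

// Sample markup (an assumption, not Google's real output).
$html = '<div><span id="gs-visibleUrl-long">example.com/lyrics</span></div>';
var_dump(firstTextById($html, 'gs-visibleUrl-long')); // string(18) "example.com/lyrics"
var_dump(firstTextById($html, 'missing'));            // NULL
```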

Missing html content when using dom->saveHTML in PHP

I am getting data from a website using DOM. I've tested my code on my local server and it works perfectly; however, when I uploaded it to a server and ran it, the script returned HTML tags without any content. My code looks something like this:
$divs = $dom->getElementsByTagName('div');
foreach($divs as $div){
if($div->getAttribute('class') == "content1"){
$dom = new DOMDocument();
$dom->appendChild($dom->importNode($div, true));
$content1 = $dom->saveHTML();
echo "content:".$content1;
}
}
On my localhost, it returns something like this:
<div class="content1">This is my content</div>
However, on the server, I strangely get empty HTML tags like this:
<div class="content1"></div>
What are possible causes of this problem? Is there any way I can fix it? Please advise.
For PHP versions below 5.3.6 (DOMDocument::saveHTML() only accepts a node argument from 5.3.6 on):
create a variable that contains a clone of the current node with all its child nodes,
append it as a child,
echo the returned value.
foreach($divs as $div) {
if($div->getAttribute('class') == "content1"){
$dom = new DOMDocument();
$cloned = $div->cloneNode(TRUE);
$dom->appendChild($dom->importNode($cloned,TRUE));
$content1 = $dom->saveHTML();
echo "content:".$content1;
}
}
EDIT: I made a mistake; it was not
$cloned = $element->cloneNode(TRUE);
but
$cloned = $div->cloneNode(TRUE);
Sorry! (I hope it works.)
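On PHP 5.3.6 or later, a sketch that avoids the second DOMDocument entirely: pass the node to saveHTML(), which then serializes only that node and its children. The sample markup is an assumption standing in for the scraped page:

```php
<?php
// Requires PHP >= 5.3.6 (saveHTML() with a node argument).
$html = '<div class="content1">This is my content</div><div class="other">skip me</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$found = [];
foreach ($dom->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') === 'content1') {
        // Serialize just this element (and everything inside it).
        $found[] = $dom->saveHTML($div);
    }
}
echo implode(PHP_EOL, $found), PHP_EOL;
```

It also keeps `$dom` pointing at the original document for the whole loop, instead of reassigning it inside the loop as both snippets above do.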

Echoing only a div with PHP

I'm attempting to write a script that echoes only the div that encloses the image on Google.
$url = "http://www.google.com/";
$page = file($url);
foreach($page as $theArray) {
echo $theArray;
}
The problem is this echos the whole page.
I want to echo only the part between the <div id="lga"> and the next closest </div>
Note: I tried using if statements, but they weren't working, so I deleted them.
Thanks
Use the built-in DOM methods:
<?php
$page = file_get_contents("http://www.google.com");
$domd = new DOMDocument();
libxml_use_internal_errors(true);
$domd->loadHTML($page);
libxml_use_internal_errors(false);
$domx = new DOMXPath($domd);
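// XPath selects attributes with "@"; "#id" is CSS syntax, not XPath
$lga = $domx->query("//*[@id='lga']")->item(0);
if ($lga === null) { exit('No element with id="lga" found'); }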
$domd2 = new DOMDocument();
$domd2->appendChild($domd2->importNode($lga, true));
echo $domd2->saveHTML();
To do this, you need to parse the DOM and then get the element with the ID you are looking for. Check out a parsing library such as http://simplehtmldom.sourceforge.net/manual.htm
After feeding your html document into the parser you could call something like:
$html = str_get_html($page);
// find() returns an array of matches unless you pass an index; 0 = first match
$element = $html->find('div[id=lga]', 0);
echo $element->plaintext;
That, I think, would be your quickest and easiest solution.
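The question asks for exactly "the part between <div id="lga"> and the next closest </div>", i.e. the div's inner HTML. With PHP's built-in DOM, a sketch could serialize each child node of the matched element. `innerHtmlById()` is a hypothetical helper, and the sample markup is an assumption rather than Google's real page:

```php
<?php
// Hypothetical helper: inner HTML of the element with the given id, or null.
function innerHtmlById(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);      // real-world pages rarely validate
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);
    $node = $xpath->query('//*[@id="' . $id . '"]')->item(0);
    if ($node === null) {
        return null;                        // no element with that id
    }
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $dom->saveHTML($child);   // node argument needs PHP >= 5.3.6
    }
    return $inner;
}

$page = '<div id="other">x</div><div id="lga"><span>logo</span></div>';
echo innerHtmlById($page, 'lga'), PHP_EOL;
```

Because the parser tracks nesting, this stops at the div's matching `</div>`, which a line-by-line `if` over `file()` output cannot easily do.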

PHP returning page error on simplexml print_r

The problem only happens with one file when I try to use DOMDocument/SimpleXML methods, so it seems the issue is with that file. I have no clue what it could be.
If I do the following:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
$xml = simplexml_import_dom($dom);
print_r($xml);
in Chrome, I get a "Page Unavailable" error. In Firefox, I get nothing.
If I do the same thing but to a "test2.html", I get a print out as expected.
If I try the same thing but doing it this way:
$file = "test1.html";
$data = file_get_contents($file);
$dom = DOMDocument::loadHTML($data);
$xml = simplexml_import_dom($dom);
print_r($xml);
I get the same issue.
If I comment out the print_r line, Chrome goes from the "Page Unavailable" to blank.
I changed the file permissions to 777 in case that was the issue; no change.
I tried simply echoing out the contents of the html, no problem at all.
Any clues as to why a) Chrome would do that, and b) why I'm not getting any usable results?
Update:
If I put in:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
if(!$dom) {
echo "No Load!";
}
else {
$xml = simplexml_import_dom($dom);
print_r($xml);
}
I get the same issue. If I put in:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
if(!$dom) {
echo "No Load!";
}
else {
echo "Load!";
}
I get the "Load!" output, meaning that the dom method shouldn't be the problem (?)
I'll try the exact same test with SimpleXML.
Update 2:
If I do this:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
$xml = simplexml_import_dom($dom);
if(!$xml) {
echo "No Load!";
}
else {
echo "Load!";
}
I get "Load!" but if I do:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
$xml = simplexml_import_dom($dom);
if(!$xml) {
echo "No Load!";
}
else {
echo "Load!";
print_r($xml);
}
I get the error. I finally noticed that Chrome has an option to view the error:
Error 324 (net::ERR_EMPTY_RESPONSE): Unknown error.
The troublesome HTML file is 288 KB. Could that be the issue? If so, how would I adjust for it?
Last Update:
Very odd. I can use methods and functions on the object (as SimpleXML or DOMDocument), so I can do things like use XPath to delete or parse parts of the HTML. In some cases (small results) it can echo out results, but for big output (e.g. showing all spans) it fails in the same way.
So, since I think the end result will fit within these limits, I should be okay (I guess).
But any real solution is very welcome.
Turn on error reporting: put error_reporting(E_ALL); on the first line of your PHP code.
Check the memory limit in your PHP configuration: memory_limit in the relevant php.ini.
What's the difference between test1.html and test2.html? Perhaps test1.html is not well-formed.
DOMDocument and/or SimpleXML may bail out if the document is malformed. Try something like:
// Instance call: calling loadHTMLFile() statically fails on current PHP versions
$dom = new DOMDocument();
if (!$dom->loadHTMLFile($file)) {
echo 'Loading file failed';
exit;
}
$xml = simplexml_import_dom($dom);
if (!$xml) {
...
}
If loading into $dom worked, conversion to $xml should work as well, but check anyway.
Edit: As Gehrig said, make sure error reporting is on; that should make it obvious where the process fails.
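Both answers come down to making failures visible. A sketch of those diagnostics, under assumptions: `loadWithDiagnostics()` is a hypothetical helper, the sample document is made up, and it uses an instance call because calling loadHTML()/loadHTMLFile() statically, as the question does, is not supported on current PHP versions. It collects libxml's parse errors instead of hiding them and prints peak memory next to memory_limit, which is worth checking before print_r()-ing the whole tree of a 288 KB document:

```php
<?php
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: load HTML and return the parser's complaints.
function loadWithDiagnostics(string $html): array
{
    libxml_use_internal_errors(true);      // collect parser errors ourselves
    $dom = new DOMDocument();              // instance call, not DOMDocument::loadHTML()
    $ok = $dom->loadHTML($html);
    $errors = [];
    foreach (libxml_get_errors() as $error) {
        // LibXMLError exposes ->line and ->message, among other fields.
        $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    return ['ok' => $ok, 'doc' => $dom, 'errors' => $errors];
}

$result = loadWithDiagnostics('<!DOCTYPE html><html><body><p>ok</p></body></html>');
foreach ($result['errors'] as $line) {
    echo $line, PHP_EOL;
}
// Compare peak usage with the configured limit before dumping a big tree.
printf("peak memory: %d bytes (memory_limit=%s)\n", memory_get_peak_usage(true), ini_get('memory_limit'));
```

For a file, `file_get_contents('test1.html')` could be passed in place of the sample string.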
