I want to get all the td elements of the ninth table in an HTML page.
I started with this, but I don't know how to finish it:
define('GLPI_ROOT', '..');
$content = GLPI_ROOT . "/front/yourpage.html";
$dom = new DOMDocument();
@$dom->loadHTML($content);
$xpath = new DOMXPath($dom);
$attbs = $xpath->query("//table td");
foreach($attbs as $a) {
print $a->nodeValue;
}
And I have tried this one too, but it didn't work:
$dom = new DOMDocument();
$dom->loadHTMLFile("yourpage.html");
$tables = $dom->getElementsByTagName('table');
$table = $tables->item(8);
foreach ($table->childNodes as $td) {
if ($td->nodeName == 'td') {
echo $td->nodeValue, "\n";
echo "<script type=\"text/javascript\"> alert('".$td->nodeValue."');</script>";
}
}
I'm getting this error:
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in yourpage.html, line: 33
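Your first attempt doesn't work because you don't seem to understand how XPath works: //table td is CSS selector syntax, not XPath (the XPath equivalent would be //table//td, or (//table)[9]//td for the ninth table only). Your second one is probably the better option, though.
That said, when has <td> ever been a first-level child of <table>? It sits inside a <tr> (and often a <tbody>), so looping over $table->childNodes never reaches it. Call getElementsByTagName('td') again on $table instead; that works quite well.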
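A minimal sketch of the getElementsByTagName approach, assuming the page is already in a string (loadHTMLFile() works the same way for a file). The helper name cellsOfTable() and the sample page are made up for illustration. It also collects libxml parse warnings, such as the htmlParseEntityRef one (usually caused by an unescaped & in the markup, for example in a URL query string), via libxml_use_internal_errors() instead of printing them.

```php
<?php
// Sketch: collect the text of every <td> inside the N-th <table> (0-based index).
// cellsOfTable() is a hypothetical helper; the HTML is assumed to be in a string.
function cellsOfTable(string $html, int $index): array
{
    // Collect libxml warnings (e.g. "htmlParseEntityRef: ...") instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $dom->getElementsByTagName('table')->item($index);
    if ($table === null) {
        return [];  // the document has fewer than $index + 1 tables
    }

    $cells = [];
    // <td> lives inside <tr> (and possibly <tbody>), not directly under <table>,
    // so search all descendants of the table rather than its childNodes.
    foreach ($table->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->nodeValue);
    }
    return $cells;
}

// Made-up page with nine tables; the ninth contains an unescaped "&".
$page = '<html><body>';
for ($i = 1; $i <= 8; $i++) {
    $page .= "<table><tr><td>t$i</td></tr></table>";
}
$page .= '<table><tr><td>A & B</td><td>C</td></tr></table></body></html>';

print_r(cellsOfTable($page, 8));  // the ninth table has index 8
```

The item() index is zero-based, which is why the ninth table is item(8), matching the question's second attempt.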
Related
I've spent a couple of hours reading Stack Overflow and trying to get data from a web page, with no success.
Could you help me? I've run out of ideas.
Here is the HTML:
html
I've tried a lot of examples. The last one was the simplest, I think, and I still have no idea how to do this.
include 'simple_html_dom.php';
//new dom object
$dom = new DOMDocument();
$html = $dom->loadHTMLFile($url);
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row)
{
/*** get each column by tag name ***/
$cols = $row->getElementsByTagName('td');
/*** echo the values ***/
echo $cols->item(0)->nodeValue.'<br />';
echo $cols->item(1)->nodeValue.'<br />';
echo $cols->item(2)->nodeValue;
echo '<hr />';
}
and I'm getting this:
Call to a member function getElementsByTagName() on a non-object
because $tables is empty.
I only need to get two positions, like in this screenshot:
Maybe this will help:
$dom = new DOMDocument();
$html = $dom->loadHTMLFile('http://www.m.rozkladzik.pl/poznan/wyszukiwarka_polaczen.html?from=Szymanowskiego%7Cb%7C105&to=Rybaki%7Cb%7C90&profile=opt&maxWalkChange=400&minChangeTime=2&currTime=1');
$dom->preserveWhiteSpace = false;
$finder = new DomXPath($dom);
$routes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' route_row ')]");
$times = array();
for ($i = 0; $i < min(2, $routes->length); $i++) {
$times[] = $routes->item($i);
}
$times is now an array with two DOMElement objects, which are the first two rows of the results.
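To check the class-matching XPath without hitting the live page, here is a self-contained sketch against made-up markup (the div structure is an assumption, not the real site's HTML). It also shows why the expression pads @class with spaces: a class such as route_row_extra is not matched.

```php
<?php
// Sketch: pick the first two elements whose class list contains "route_row".
// The markup below is a made-up stand-in for the real page.
$html = '<div>
  <div class="route_row first">08:00 - 08:25</div>
  <div class="route_row">08:10 - 08:40</div>
  <div class="route_row_extra">ignored</div>
  <div class="route_row">08:20 - 08:55</div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

// Padding both sides with spaces makes "route_row" match only as a whole class token.
$routes = $finder->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' route_row ')]"
);

$times = [];
// Guard against pages that return fewer than two matches.
for ($i = 0; $i < min(2, $routes->length); $i++) {
    $times[] = trim($routes->item($i)->nodeValue);
}
print_r($times);
```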
I have seen similar solutions elsewhere, but I haven't been able to adapt them to my own code.
I have a function that splits an HTML string at the paragraph tags and returns the pieces in an array. The code is as follows:
$dom = new DOMDocument();
$dom->loadHTML($string);
$domx = new DOMXPath($dom);
$entries = $domx->evaluate("//p");
$result = array();
foreach ($entries as $entry) {
$result[] = '<' . $entry->tagName . '>' . $entry->nodeValue . '</' . $entry->tagName . '>';
}
return $result;
Can someone help me change this so that, instead of using nodeValue, it returns the paragraph content with its HTML tags intact?
The html I am testing against is this: http://adam-makes-websites.com/tests/htmltest/test.html
A full test of what I'm doing with the code (with the suggestion to use ownerDocument->saveHTML applied) is here: http://adam-makes-websites.com/tests/htmltest/runtest.txt
The output from the test can be seen here: http://adam-makes-websites.com/tests/htmltest/runtest.php
You need to call saveHTML on the ownerDocument property:
$result[] = $entry->ownerDocument->saveHTML($entry);
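Put back into the asker's function shape, a sketch might look like this; the function name paragraphsAsHtml() and the sample string are hypothetical. saveHTML() with a node argument needs PHP 5.3.6 or later.

```php
<?php
// Sketch: return each <p> (including its inner markup) as a separate HTML string.
// paragraphsAsHtml() is a hypothetical name for the asker's function.
function paragraphsAsHtml(string $string): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($string);
    $domx = new DOMXPath($dom);

    $result = [];
    foreach ($domx->evaluate('//p') as $entry) {
        // saveHTML($node) serializes the node itself: its tags plus all children.
        // trim() drops any formatting newline libxml may append.
        $result[] = trim($entry->ownerDocument->saveHTML($entry));
    }
    return $result;
}

print_r(paragraphsAsHtml('<p>One <b>bold</b></p><p>Two <i>x</i></p>'));
```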
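Alternatively, import the paragraphs into a new document and serialize that (note that this returns one HTML string rather than an array):
$dom = new DOMDocument();
$dom->loadHTML($string);
$entries = $dom->getElementsByTagName('p');
$new_dom = new DOMDocument();
foreach ($entries as $entry) {
$new_dom->appendChild($new_dom->importNode($entry, TRUE));
}
$result = $new_dom->saveHTML();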
I have an HTML string that contains exactly one <a> element. Example:
test
In PHP I need to test whether rel contains external and, if so, modify href and save the string.
I have looked into DOM nodes and objects, but they seem like overkill for a single <a> element: I would have to iterate to get the HTML nodes, and I am not sure how to test whether rel exists and contains external.
$html = new DOMDocument();
$html->loadHtml($txt);
$a = $html->getElementsByTagName('a');
$attr = $a->item(0)->attributes;
...
At this point I get a DOMNamedNodeMap, which seems like overhead. Is there a simpler way to do this, or should I do it with DOM?
Do it with DOM.
Here's an example:
<?php
$html = 'test';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]");
foreach($nodes as $node) {
$node->setAttribute('href', 'http://example.org');
}
echo $dom->saveHTML();
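Since the original example string lost its markup, here is the same approach run on two made-up anchors, showing that the space-padded expression matches rel="nofollow external" but not rel="externally". The URLs are placeholders.

```php
<?php
// Sketch: rewrite href only on links whose rel list contains the token "external".
// Both anchors and all URLs are made-up sample data.
$html = '<a href="http://old.example/1" rel="nofollow external">one</a>
<a href="http://old.example/2" rel="externally">two</a>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Padding rel with spaces turns the test into a whole-token match.
$nodes = $xpath->query(
    "//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]"
);
foreach ($nodes as $node) {
    $node->setAttribute('href', 'http://example.org');
}
echo $dom->saveHTML();
```

Note that saveHTML() on the whole document also emits the doctype and the html/body wrappers that loadHTML() added.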
I went ahead and modified it with DOM. This is what I ended up with:
$html = new DOMDocument();
$html->loadHtml('<?xml encoding="utf-8" ?>' . $txt);
$nodes = $html->getElementsByTagName('a');
foreach ($nodes as $node) {
foreach ($node->attributes as $att) {
if ($att->name == 'rel') {
if (strpos($att->value, 'external') !== false) {
$node->setAttribute('href','modified_url_goes_here');
}
}
}
}
$txt = $html->saveHTML();
I did not want to load any other library for just this one string.
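Looping over every attribute isn't strictly needed, because DOMElement::getAttribute() returns an empty string when the attribute is absent. A sketch along those lines (rewriteExternalHref() is a hypothetical helper name) that splits rel into tokens, which also avoids the pitfall of strpos() returning 0 when external is the first word:

```php
<?php
// Sketch: set href on <a> elements whose rel token list contains "external".
// rewriteExternalHref() is hypothetical; the XML-encoding prefix mirrors
// the UTF-8 trick used in the code above.
function rewriteExternalHref(string $txt, string $newHref): string
{
    $html = new DOMDocument();
    $html->loadHTML('<?xml encoding="utf-8" ?>' . $txt);
    foreach ($html->getElementsByTagName('a') as $node) {
        // getAttribute() returns '' when rel is missing, so no existence check is needed.
        $tokens = preg_split('/\s+/', trim($node->getAttribute('rel')));
        if (in_array('external', $tokens, true)) {
            $node->setAttribute('href', $newHref);
        }
    }
    return $html->saveHTML();
}

echo rewriteExternalHref('<a href="old.html" rel="nofollow external">test</a>', 'modified_url_goes_here');
```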
The best way is to use a HTML parser/DOM, but here's a regex solution:
$html = 'test<br>
<p> Some text</p>
test2<br>
<a rel="external">test3</a> <-- This won\'t work since there is no href in it.
';
$new = preg_replace_callback('/<a.+?rel\s*=\s*"([^"]*)"[^>]*>/i', function($m){
if(strpos($m[1], 'external') !== false){
$m[0] = preg_replace('/href\s*=\s*(("[^"]*")|(\'[^\']*\'))/i', 'href="http://example.com"', $m[0]);
}
return $m[0];
}, $html);
echo $new;
You could use a regular expression: if the string matches
/\s+rel\s*=\s*".*external.*"/
then do a regex replace like
/(<a.*href\s*=\s*")([^"]*)("[^>]*>)/\1[your new href here]\3/
That said, using a library that can do this kind of thing for you is much easier (like jQuery in JavaScript).
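The two-step regex recipe can be written in PHP roughly as follows. This is a sketch under the same assumptions as the patterns above: the string holds a single <a> tag, attribute values use double quotes, and "external" is matched as a plain substring. replaceExternalHref() is a hypothetical name, and the arrow function needs PHP 7.4+.

```php
<?php
// Sketch of the two-step regex approach for a string holding exactly one <a> tag.
// Assumes double-quoted attributes; replaceExternalHref() is a hypothetical helper.
function replaceExternalHref(string $tag, string $newHref): string
{
    // Step 1: does rel="..." contain "external" anywhere in its value?
    if (!preg_match('/\srel\s*=\s*"[^"]*external[^"]*"/i', $tag)) {
        return $tag;
    }
    // Step 2: swap only the href value. A callback is used so that "$" or "\"
    // in $newHref is not treated as a backreference. (Arrow fn: PHP 7.4+.)
    return preg_replace_callback(
        '/(<a\b[^>]*\bhref\s*=\s*")[^"]*(")/i',
        fn(array $m) => $m[1] . $newHref . $m[2],
        $tag,
        1
    );
}

echo replaceExternalHref('<a href="old.html" rel="external">x</a>', 'new.html');
```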
I'm scraping a page that consists of a table with several rows (tr). Each tr contains four td elements, and I want the data from the first td of each row. Below is the code I've tried so far, but it grabs all the td elements. How can I accomplish what I want?
...
$html = new simple_html_dom();
$html = file_get_html($url);
foreach($html->find('table tr') as $row) {
foreach($row->find('td', 0) as $cell) {
echo $cell;
}
}
Think about why you're using the second foreach when you actually only mean to act on one element within each row.
$html = new simple_html_dom();
$html = file_get_html($url);
foreach($html->find('table tr') as $row) {
$cell = $row->find('td', 0);
echo $cell;
}
Simple HTML DOM is a turd. It's simpler to use the built-in DOM functions and XPath:
$dom = new DOMDocument();
@$dom->loadHTMLFile($url);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//td[1]') as $td){
echo $td->nodeValue;
}
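Here is the same XPath run against a small inline table, so the result can be checked without fetching $url; the table contents are made up. The comment in the loop notes the difference between //td[1] (the first td in each row) and (//td)[1] (only the first td in the whole document).

```php
<?php
// Sketch: first <td> of every row via XPath, on a made-up inline table.
$html = '<table>
  <tr><td>a1</td><td>a2</td><td>a3</td><td>a4</td></tr>
  <tr><td>b1</td><td>b2</td><td>b3</td><td>b4</td></tr>
</table>';

$dom = new DOMDocument();
@$dom->loadHTML($html);  // "@" hides parser warnings from messy real-world HTML
$xpath = new DOMXPath($dom);

$first = [];
// //td[1] selects every td that is the first td child of its parent row.
// (//td)[1] would instead return only the first td in the whole document.
foreach ($xpath->query('//td[1]') as $td) {
    $first[] = $td->nodeValue;
}
print_r($first);
```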
That said, I would probably still prefer to use phpQuery.
I'm trying to replace video links inside a string. Here's my code:
$doc = new DOMDocument();
$doc->loadHTML($content);
foreach ($doc->getElementsByTagName("a") as $link)
{
$url = $link->getAttribute("href");
if(strpos($url, ".flv"))
{
echo $link->outerHTML();
}
}
Unfortunately, outerHTML doesn't work when I try to get the HTML code for the full hyperlink, like <a href='http://www.myurl.com/video.flv'></a>.
Any ideas how to achieve this?
As of PHP 5.3.6 you can pass a node to saveHtml, e.g.
$domDocument->saveHtml($nodeToGetTheOuterHtmlFrom);
Previous versions of PHP did not support this. You'd have to use saveXml(), but that would produce XML-compliant markup. For an <a> element, that shouldn't be an issue, though.
See http://blog.gordon-oheim.biz/2011-03-17-The-DOM-Goodie-in-PHP-5.3.6/
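Applied to the question's loop, a sketch using saveHTML($node) (PHP 5.3.6+) might look like this. The sample $content string is made up, and the matches are collected into an array instead of echoed so they can be inspected.

```php
<?php
// Sketch: outer HTML of every link whose href contains ".flv".
// The $content string is made-up sample input.
$content = '<p>See <a href="http://www.myurl.com/video.flv">clip</a> or
  <a href="http://www.myurl.com/page.html">page</a>.</p>';

$doc = new DOMDocument();
$doc->loadHTML($content);

$matches = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $url = $link->getAttribute('href');
    // strpos() can return 0, so compare against false explicitly.
    if (strpos($url, '.flv') !== false) {
        // saveHTML() with a node argument serializes just that node (PHP 5.3.6+).
        $matches[] = trim($doc->saveHTML($link));
    }
}
print_r($matches);
```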
You can find a couple of propositions in the users notes of the DOM section of the PHP Manual.
For example, here's one posted by xwisdom:
<?php
// code taken from the Raxan PDI framework
// returns the html content of an element
protected function nodeContent($n, $outer=false) {
$d = new DOMDocument('1.0');
$b = $d->importNode($n->cloneNode(true),true);
$d->appendChild($b); $h = $d->saveHTML();
// remove outer tags
if (!$outer) $h = substr($h,strpos($h,'>')+1,-(strlen($n->nodeName)+4));
return $h;
}
?>
A simple solution is to define your own function that returns the outer HTML:
function outerHTML($e) {
$doc = new DOMDocument();
$doc->appendChild($doc->importNode($e, true));
return $doc->saveHTML();
}
Then you can use it in your code:
echo outerHTML($link);
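Save the HTML that contains the links as links.html (or point file_get_contents() at a page such as google.com/fly.html that has .flv links in it). To collect other kinds of links, change .flv to .wmv or whichever extension you want the href values for; any other matching hrefs on the page will be picked up as well.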
<?php
$contents = file_get_contents("links.html");
$domdoc = new DOMDocument();
$domdoc->preserveWhiteSpace = false;
$domdoc->loadHTML($contents);
$xpath = new DOMXpath($domdoc);
$query = '//@href';
$nodeList = $xpath->query($query);
foreach ($nodeList as $node){
if (strpos($node->nodeValue, ".flv") !== false) {
$linksList = $node->nodeValue;
$htmlAnchor = new DOMElement("a", $linksList);
$htmlURL = new DOMAttr("href", $linksList);
$domdoc->appendChild($htmlAnchor);
$htmlAnchor->appendChild($htmlURL);
$domdoc->saveHTML();
echo ("<a href='". $node->nodeValue. "'>". $node->nodeValue. "</a><br />");
}
}
echo("done");
?>
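A trimmed-down sketch of the same //@href idea, without the element-building steps: it only collects the matching URLs. The function name hrefsWithExtension() and the inline sample are hypothetical; use file_get_contents("links.html") for a real file. If you echo the URLs back into a page, passing them through htmlspecialchars() would be prudent.

```php
<?php
// Sketch: list every href attribute value that contains a given extension.
// hrefsWithExtension() is a hypothetical helper; the sample HTML is made up.
function hrefsWithExtension(string $html, string $ext): array
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html);  // "@" hides warnings from malformed markup
    $xpath = new DOMXPath($dom);

    $found = [];
    // //@href selects the href attribute nodes themselves, so nodeValue is the URL.
    foreach ($xpath->query('//@href') as $attr) {
        if (strpos($attr->nodeValue, $ext) !== false) {
            $found[] = $attr->nodeValue;
        }
    }
    return $found;
}

$sample = '<a href="a.flv">A</a> <a href="b.wmv">B</a> <a href="c.flv">C</a>';
print_r(hrefsWithExtension($sample, '.flv'));
```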