I'm working with a row of HTML table cells that look like this:
<td align="left" class="info">my message goes here</td>
<td align="left" class="info">my message goes here</td>
I would like to modify these cells by inserting a clickable anchor tag into each.
I have written the following function:
public function modifyAttribute2($domDoc) {
    // We use XPath to search for the child elements:
    $domXPath = new DOMXPath($domDoc);
    $items = $domXPath->query("//td[@class='moreinfo']");
    foreach ($items as $item) {
        echo $item->nodeValue . "\n";
        $item->nodeValue = "hi";
        $doc = new DOMDocument();
        $valid_elem = $doc->createElement('a');
        $valid_attr = $doc->createAttribute('href');
        $valid_attr->value = base_url();
        $valid_elem->appendChild($valid_attr);
        // We insert the new element as root (child of the document)
        $xmlcontent = $domDoc->importNode($valid_elem, true);
        $item->appendChild($xmlcontent);
        $domDoc->saveXML($item);
    }
    echo $domDoc->saveXML();
    exit;
}
Addendum:
I'm trying to create a new DOMDocument node and import it into my original document $domDoc as suggested, but I do not see any sign of the imported node after saving and inspecting the HTML. What am I doing wrong?
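One way to sidestep the import step is to create the anchor on the same document that owns the cells. The following is a minimal sketch under assumptions, not a confirmed fix for the code above: the function name addAnchorToCells, the sample markup, and the URL are made up for illustration, and the XPath matches the class info used in the sample cells.

```php
<?php
// Hypothetical helper: wraps the text of every <td class="info"> in a link.
function addAnchorToCells(DOMDocument $domDoc, string $url): void
{
    $xpath = new DOMXPath($domDoc);
    // Copy the matches into an array so the loop is not affected by the edits.
    $cells = iterator_to_array($xpath->query("//td[@class='info']"));

    foreach ($cells as $cell) {
        $text = $cell->textContent;

        // Empty the cell before adding the link.
        while ($cell->firstChild !== null) {
            $cell->removeChild($cell->firstChild);
        }

        // Created on $domDoc itself, so no importNode() call is required.
        $anchor = $domDoc->createElement('a');
        $anchor->setAttribute('href', $url);
        $anchor->appendChild($domDoc->createTextNode($text));
        $cell->appendChild($anchor);
    }
}

// Invented sample input and URL.
$domDoc = new DOMDocument();
$domDoc->loadHTML('<table><tr><td align="left" class="info">my message goes here</td></tr></table>');
addAnchorToCells($domDoc, 'http://example.com/');
$result = $domDoc->saveHTML();
echo $result;
```

Giving the anchor a text node matters here: an anchor with only an href and no content is present in the markup but has nothing visible to click.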
Related
I am using the following function to replace HTML5 elements with div elements that carry an id.
<?php function nonHTML5 ($content){
    $dom = new DOMDocument;
    // Hide HTML5 element errors
    libxml_use_internal_errors(true);
    $dom->loadHTML($content);
    libxml_clear_errors();
    $xp = new DOMXPath($dom);
    // Collect the matching elements
    $elements = $xp->query('//*[self::header| self::footer ]
        [not(ancestor::pre) and not(ancestor::code)]');
    // Loop through
    foreach($elements as $element){
        // Replace with 'div' tag
        $newElement = $dom->createElement('div');
        while($element->childNodes->length){
            // Keep up with the child nodes
            $childElement = $element->childNodes->item(0);
            $newElement->appendChild($dom->importNode($childElement, true));
        }
        while($element->attributes->length){
            // Maintain the length
            $attributeNode = $element->attributes->item(0);
            $newElement->setAttributeNode($dom->importNode($attributeNode));
        }
        $element->parentNode->replaceChild($newElement, $element);
    }
    $content = $dom->saveXML($dom->documentElement);
    return $content;
} ?>
I know we can use HTMLShiv, but I want to do this primarily for old browsers with JavaScript disabled.
My Challenge:
I am not able to add an id attribute to the new div. For example:
<header>
<h1>I am the header</h1>
</header>
Should become
<div id="header">
<h1>I am the header</h1>
</div>
I tried:
$newElement = $dom->createElement('div id ="' . $element . '"');
but it did not work.
My question
What should be the correct code?
Please note: I am not a PHP expert, so please be a little descriptive in your answers and comments.
Here is how you can do it:
NOTE: I have added comments to clarify what each statement of the code does.
CREATING AN HTML ELEMENT WITH ATTRIBUTE USING DOM :
<?php
// Initiate a new DOMDocument
$dom = new DOMDocument();
// Create an element
$div = $dom->createElement("div","HERE DIV CONTENTS");
// Create an attribute i.e id
$divAttr = $dom->createAttribute('id');
// Assign value to your attribute i.e id="value"
$divAttr->value = 'This is an id';
// Add your attribute (id) to your element (div)
$div->appendChild($divAttr);
// Add your element (div) to DOM
$dom->appendChild($div);
// Print your DOM HERE
echo $dom->saveHTML();
?>
CODE OUTPUT :
<div id="This is an id">HERE DIV CONTENTS</div>
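Applied to the loop in the question, the same idea might look like the sketch below. It uses setAttribute(), a shorthand for the createAttribute()/appendChild() pair, and takes the id from the old element's tag name. The function name replaceWithDiv and the sample markup are assumptions made for illustration, and copying of the old element's other attributes is left out for brevity.

```php
<?php
// Hypothetical variant of the question's function: <header> becomes <div id="header">.
function replaceWithDiv(string $content): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect warnings about HTML5 tags silently
    $dom->loadHTML($content);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    $query = '//*[self::header or self::footer]'
           . '[not(ancestor::pre) and not(ancestor::code)]';
    // Copy the matches into an array so replacing nodes does not disturb the loop.
    $elements = iterator_to_array($xp->query($query));

    foreach ($elements as $element) {
        // createElement() only accepts a tag name; the id is set separately.
        $newElement = $dom->createElement('div');
        $newElement->setAttribute('id', $element->nodeName);

        // Move the children across; appendChild() detaches them from the old parent.
        while ($element->firstChild !== null) {
            $newElement->appendChild($element->firstChild);
        }
        $element->parentNode->replaceChild($newElement, $element);
    }

    return $dom->saveHTML($dom->documentElement);
}

$output = replaceWithDiv('<header><h1>I am the header</h1></header>');
echo $output;
```

The attempt in the question failed because createElement() treats its first argument as the tag name only, so attribute text cannot be passed along with it.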
I'm not sure why the code below is not working. It takes the else branch of the if statement, which says that no IMG tags were found on the page, but I'm sure they are there. Any advice or guidance would be appreciated.
// This variable will contain all the HTML source code of the sample page
$htmlContent = file_get_contents('https://www.instagram.com/ken_flavius/');
var_dump($htmlContent);
// We'll add all the images in this array
$images = [];
// Instantiate a new object of class DOMDocument
$doc = new DOMDocument();
// Load the HTML doc into the object
$doc->loadHTML($htmlContent);
// Get all the IMG tags in the document
$elements = $doc->getElementsByTagName('img');
// If we get at least one result
if($elements->length > 0)
{
    // Loop on all of the IMG tags
    foreach($elements as $element)
    {
        // Get the attribute SRC of the IMG tag (this is the link of the image)
        $src = $element->getAttribute('src');
        if (strlen($src) > 0) {
            // Add the link to the array containing all the links
            array_push($images, $src);
        }
    }
    //show all links
    echo '<pre>'."\r\n";
    print_r($images);
    echo '</pre>'."\r\n";
} else {
    // No result, it means that there were no IMG tags
    echo 'no img tag found in the HTML source provided!';
}
I edited it to show the exact example that I'm using.
$url="http://example.com";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
    echo $tag->getAttribute('src');
}
I know there are similar questions, but while studying PHP I ran into this problem and I want to understand why it occurs.
<?php
$url = 'http://aice.anie.it/quotazione-lme-rame/';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTML($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tbody/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}
?>
This prints just "hello!". I want to print the value extracted with the XPath query, but the last echo doesn't output anything.
You have some errors in your code:
You try to get the table from the URL http://aice.anie.it/quotazione-lme-rame/, but it's actually in an iframe located at http://www.aiceweb.it/it/frame_rame.asp, so get the iframe URL directly.
You use the function loadHTML(), which loads an HTML string. What you need is the loadHTMLFile() function, which takes the path or URL of an HTML document as a parameter (see http://www.php.net/manual/fr/domdocument.loadhtmlfile.php).
You assume there is a tbody element on the page, but there is none. So remove it from your query.
Working code:
$url = 'http://www.aiceweb.it/it/frame_rame.asp';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTMLFile($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}
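The second and third points can be checked offline with an inline snippet. In the sketch below the markup and its values are invented for illustration and are not taken from the real page. loadHTML() receives a string of markup, and the query omits tbody because the markup has none and PHP's parser, unlike a browser, does not normally insert one.

```php
<?php
// Invented sample markup: a table without <tbody>, like the page described above.
$markup = '<table id="table33">'
        . '<tr><td>Date</td><td>Unit</td><td><b>Price</b></td></tr>'
        . '<tr><td>Mon</td><td>EUR</td><td><b>5123.40</b></td></tr>'
        . '</table>';

$html = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of using @
$html->loadHTML($markup);           // loadHTML() expects markup, not a URL
libxml_clear_errors();

$xpath = new DOMXPath($html);
$values = [];
foreach ($xpath->query("//*[@id='table33']/tr[2]/td[3]/b") as $n) {
    $values[] = $n->nodeValue;
}
print_r($values);

// Count the matches for the same query with /tbody/ in the path.
$withTbody = $xpath->query("//*[@id='table33']/tbody/tr[2]/td[3]/b")->length;
echo $withTbody, "\n";
```

Using libxml_use_internal_errors(true) in place of the @ operator keeps the parser warnings available through libxml_get_errors() if they are needed for debugging.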
When a link in A.html is clicked:
<tr>
<td class="book"><a class="booklink" href="../collection?file=book1.pdf">
Good Read
</a>
-Blah blah blah
</td>
</tr>
?file=book1.pdf is passed to B.html:
<?php
$src = $_GET['file'];
?>
<iframe src="<?php echo $src; ?>" >
</iframe>
QUESTION: How can I retrieve the text "Good Read-Blah blah blah" from A.html and put it into the meta description in B.html using Simple HTML DOM? (Please note that there are thousands of rows in the table in A.html.)
Thank you.
Use DOM to load your HTML document and XPath to search it.
// note: if the HTML to parse has bad syntax, use: libxml_use_internal_errors(true);
$doc = new DOMDocument;
// loadHTML() returns false on failure; the DOMDocument object itself is never false
if ($doc->loadHTML(file_get_contents('A.html')) === false) {
    throw new RuntimeException('Could not load HTML');
}
$xpath = new DOMXPath($doc);
$xpathResult = $xpath->query("//a[@href = '../collection?file={$_GET['file']}']/..");
if ($xpathResult === false) {
    throw new LogicException('Something went wrong querying the document!');
}
foreach ($xpathResult as $domNode) {
    echo 'Link text: ' . htmlentities($domNode->textContent) . PHP_EOL;
}
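To get that text into the meta description of B.html, the lookup could be wrapped in a function and its result escaped into the tag. This is a sketch under assumptions: the helper name findBookDescription, the file-name pattern, and the inline sample standing in for A.html are all hypothetical. The pattern check is there because the file name ends up inside the XPath string.

```php
<?php
// Hypothetical helper: returns the text of the cell that holds the link to $file,
// or null when no such link exists.
function findBookDescription(string $html, string $file): ?string
{
    // Allow only simple file names so the value is safe inside the XPath string.
    if (!preg_match('/^[A-Za-z0-9._-]+$/', $file)) {
        return null;
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    if ($loaded === false) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    $cells = $xpath->query("//a[@href='../collection?file={$file}']/..");
    if ($cells === false || $cells->length === 0) {
        return null;
    }

    // Collapse the line breaks and indentation inside the cell into single spaces.
    return trim(preg_replace('/\s+/', ' ', $cells->item(0)->textContent));
}

// Invented sample standing in for A.html.
$sample = '<table><tr><td class="book">'
        . '<a class="booklink" href="../collection?file=book1.pdf"> Good Read </a>'
        . ' -Blah blah blah </td></tr></table>';

$description = findBookDescription($sample, 'book1.pdf');
$rejected = findBookDescription($sample, "x' or '1'='1");
echo '<meta name="description" content="'
   . htmlspecialchars((string) $description, ENT_QUOTES)
   . '">', "\n";
```

In B.html the two arguments would presumably be file_get_contents('A.html') and $_GET['file'], with the echo placed inside the head element.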
I'm struggling with PHP and DOMDocument. Basically, I have some HTML saved in a database, with anchor tags pointing to different URLs. I want every anchor href that is not in the list of allowed URLs to be replaced with #.
E.g.:
$allowed_url_basenames = array('viewprofile.php','viewalbum.php');
Sample content from the database:
<table cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top">
Edrine Kasasa has
</td>
<td valign="top">
invited 10 friend(s) to veepiz using the Invite Tool
</td>
</tr>
</tbody>
</table>
I want a PHP function that leaves the first anchor tag's href intact and changes the second to href='#'.
This should be pretty straight-forward.
First, let's grab all of the anchor tags. $doc is the document you've created with your HTML as the source.
$anchors = $doc->getElementsByTagName('a');
Now we'll go through them one-by-one and inspect the href attribute. Let's pretend that the function contains_bad_url returns true when the passed string is on your blacklist. You'll need to write that yourself.
foreach($anchors as $anchor) {
    if($anchor->hasAttribute('href') && contains_bad_url($anchor->getAttribute('href'))) {
        $anchor->setAttribute('href', '#');
    }
}
Tada. That should be all there is to it. You should be able to get the results back as an XML string and do whatever you need to do with the rest.
Thanks, Charles. I came up with this:
function contains_bad_urls($href,$allowed_urls)
{
    $x=pathinfo($href);
    $bn=$x['filename'];
    // in_array() avoids comparing array_search()'s false-or-index result against -1
    if (in_array($bn,$allowed_urls,true))
    {
        return false;
    }
    return true;
}
function CleanHtmlUrls($str)
{
    $allow_urls = array('viewprofile','viewwall'); // change these to whatever filenames you allow
    $doc = new DOMDocument();
    $doc->loadHTML($str);
    $doc->formatOutput = true;
    $anchors = $doc->getElementsByTagName('a');
    foreach($anchors as $anchor)
    {
        // Overwrite any inline onclick handler
        $anchor->setAttribute('onclick','#');
        if(contains_bad_urls($anchor->getAttribute('href'),$allow_urls))
        {
            $anchor->setAttribute('href', '#');
        }
    }
    $ret=$doc->saveHTML();
    return $ret;
}
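A variation on the same approach matches the full script name from the question's $allowed_url_basenames list, using parse_url() to drop the query string before comparing. It is only a sketch: the helper names isAllowedHref and cleanAnchors and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: true when the href points at one of the allowed script names.
function isAllowedHref(string $href, array $allowedBasenames): bool
{
    $path = parse_url($href, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;   // no usable path, e.g. a fragment-only href
    }
    return in_array(basename($path), $allowedBasenames, true);
}

// Hypothetical helper: rewrites every disallowed href to "#" and returns the HTML.
function cleanAnchors(string $html, array $allowedBasenames): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if (!isAllowedHref($anchor->getAttribute('href'), $allowedBasenames)) {
            $anchor->setAttribute('href', '#');
        }
    }
    return $doc->saveHTML();
}

// Invented sample input.
$allowed = ['viewprofile.php', 'viewalbum.php'];
$sample  = '<p><a href="viewprofile.php?uid=1">profile</a> '
         . '<a href="invite.php?x=2">invite</a></p>';
$clean = cleanAnchors($sample, $allowed);
echo $clean;
```

Comparing against the full base name means a different file that merely shares the stem, such as a hypothetical viewprofile.txt, would not pass the check.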