CodeIgniter: loading a view template string into DOMDocument - php

I have a PHP file at application\views\article.php.
Contents of article.php:
<!DOCTYPE html>
<html prefix='og: http://ogp.me/ns#'>
<head>
<title>test</title>
</head>
<body>
<div> test div1 </div>
<div> test div2 </div>
</body>
</html>
When I load the article.php template with $this->load->view() and parse the result with DOMDocument:
$html=$this->load->view('article','',TRUE);
$doc = new DomDocument;
$doc->loadHTML($html);
echo $doc->saveXML($doc->getElementsByTagName('div')->item(0));
// or echo $doc->saveXML();
I get this error message:
Message: DOMDocument::loadHTML(): Unexpected end tag : meta in Entity, line: 4
But when I use this:
$html='<!DOCTYPE html>
<html prefix=\'og: http://ogp.me/ns#\'>
<head>
<title>test</title>
</head>
<body>
<div> test div1 </div>
<div> test div2 </div>
<p>Directory </p>
</body>
</html>';
$doc->loadHTML($html);
echo $doc->saveXML($doc->getElementsByTagName('div')->item(0));
// or echo $doc->saveXML();
it succeeds.
gettype($html) returns string in both cases.

Try hiding the warnings with
libxml_use_internal_errors(true);
Or suppress them with the @ operator:
@$doc->loadHTML($html);
The warning appears because the HTML returned by $this->load->view('article','',TRUE); is invalid. loadHTML() recovers from the problem but still emits the warning.
See the PHP manual entry for libxml_use_internal_errors().
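As a rough sketch of the first suggestion, the libxml errors can be collected and inspected instead of being discarded. The $html string below is a hypothetical stand-in for the view output (it contains a stray </meta> so there is something to recover from), and the exact messages reported depend on the installed libxml2 version.

```php
<?php
// Hypothetical stand-in for $this->load->view('article', '', TRUE).
$html = '<!DOCTYPE html><html><head><title>test</title></meta></head>'
      . '<body><div> test div1 </div><div> test div2 </div></body></html>';

// Buffer libxml errors internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Read whatever the parser complained about, then empty the buffer.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

$firstDiv = $doc->saveXML($doc->getElementsByTagName('div')->item(0));
echo $firstDiv, PHP_EOL;
print_r($messages);
```

Unlike the @ operator, this keeps the parser messages available for logging, which may help track down what the view actually emits.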

Related

How to edit head tag in PHP with DOM with only DOMDocument

Just trying to edit/modify the head tag in order to add something inside with DOM and PHP.
$dom = new DOMDocument();
$dom->loadHtml(utf8_decode($html), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
for($i=0; $i<count($r);$i++)
{
// Prepare the HTML to insert
// Here I want to add $var inside the head tag (at the end if possible)
}
return $dom->saveHTML();
Every time I tried, var_dump reported a length of 0.
Edit: I don't want to edit an existing tag. I want to add a new one. To be more specific, I need to add OG meta tag for Facebook sharing.
Edit2 as requested :
Before
<head>
<meta blabla>
<title></title>
</head>
<body>
<h1></h1>
</body>
After
<head>
<meta blabla>
<title></title>
<meta new1>
</head>
<body>
<h1></h1>
</body>
But it needs to be edited via DOMDocument in PHP.
Add this to the top of your file:
<?php
$var = "Hello world.";
?>
Then start the HTML and add it there.
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title><?= $var ?></title>
</head>
<body>
</body>
</html>
If you want to do it in PHP, you can try to use:
$titles = $domDocument->getElementsByTagName('title');
foreach($titles as $key => $title){
$title->setAttribute('attribute', 'value');
}
Source for the edit: https://stackoverflow.com/a/3195048/12077975
Try something along these lines:
$before=
'<html>
<head>
<meta name="old"/>
<title></title>
</head>
<body>
<h1></h1>
</body>
</html>
';
$HTMLDoc = new DOMDocument();
$HTMLDoc->loadHTML($before, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
$xpath = new DOMXPath($HTMLDoc);
$destination = $xpath->query('//head/title');
$template = $HTMLDoc->createDocumentFragment();
$template->appendXML('<meta name="new"/>');
$destination[0]->parentNode->insertBefore($template, $destination[0]->nextSibling);
echo $HTMLDoc->saveHTML();
Output:
<html>
<head>
<meta name="old">
<title></title><meta name="new">
</head>
<body>
<h1></h1>
</body>
</html>
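Since the question asks for the new tag at the end of head, a variation using createElement() and appendChild() may be closer to the goal. This is a minimal sketch, not the answerer's code: the Open Graph property names and values are hypothetical examples.

```php
<?php
$before = '<html><head><meta name="old"><title></title></head>'
        . '<body><h1></h1></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($before, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$head = $dom->getElementsByTagName('head')->item(0);

// Hypothetical Open Graph values, for illustration only.
$ogTags = [
    'og:title' => 'Example title',
    'og:type'  => 'article',
];

foreach ($ogTags as $property => $content) {
    $meta = $dom->createElement('meta');
    $meta->setAttribute('property', $property);
    $meta->setAttribute('content', $content);
    // appendChild() puts the node after the existing children of <head>.
    $head->appendChild($meta);
}

$result = $dom->saveHTML();
echo $result;
```

Building the element with setAttribute() also avoids hand-escaping attribute values, which appendXML() would require.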

Way to create an HTTP proxy to convert relative paths to absolute paths

So, let's say that I am trying to proxy somesite.com, and I want to change this:
<!doctype html>
<html>
<body>
<img src="computerIcon.png">
</body>
</html>
to:
<!doctype html>
<html>
<body>
<img src="http://someproxy.net/?url=http://somesite.com/computerIcon.png">
</body>
</html>
And by the way, I prefer PHP.
You can use an XML parser such as DOMDocument to update the URLs in a document:
// Initial string
$html = '<!doctype html>
<html>
<body>
<img src="computerIcon.png">
</body>
</html>
';
$proxy = 'https://proxy.example.com/?url=https://domain.example.com/';
// Load HTML
$xml = new DOMDocument("1.0", "utf-8");
$xml->loadHTML($html);
// for each <img> tag,
foreach($xml->getElementsByTagName('img') as $item) {
// update attribute 'src'
$item->setAttribute('src', $proxy . $item->getAttribute('src'));
}
$xml->formatOutput = true;
echo $xml->saveHTML();
Output:
<!DOCTYPE html>
<html><body>
<img src="https://proxy.example.com/?url=https://domain.example.com/computerIcon.png">
</body></html>
Demo: https://3v4l.org/bW68Z
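The snippet above prepends the proxy prefix to every src as-is. A slightly more defensive sketch is shown below, assuming a hypothetical helper named proxify(): it leaves already-absolute URLs unresolved, joins relative ones to a base URL naively (no ../ handling), and URL-encodes the target before passing it in the query string.

```php
<?php
// Hypothetical helper; the name and parameters are illustrative only.
function proxify(string $html, string $proxyPrefix, string $baseUrl): string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src === '') {
            continue;
        }
        // Resolve only URLs that do not already start with a scheme.
        if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $src)) {
            $src = rtrim($baseUrl, '/') . '/' . ltrim($src, '/');
        }
        // Encode the target so it survives as a single query parameter.
        $img->setAttribute('src', $proxyPrefix . rawurlencode($src));
    }

    return $doc->saveHTML();
}

$out = proxify(
    '<!doctype html><html><body><img src="computerIcon.png"></body></html>',
    'https://proxy.example.com/?url=',
    'https://domain.example.com/'
);
echo $out;
```

The same loop could be repeated for other tag and attribute pairs such as a/href or script/src if the proxy needs to cover them.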

PHP DOMElement::replaceChild produces fatal error

Source HTML (test.html) is:
<html lang="ru">
<head>
<meta charset="UTF-8">
<title>PHP Test</title>
</head>
<body>
<h1>Test page</h1>
<div>
<div id="to-replace-1">Test content 1</div>
</div>
</body>
</html>
PHP to modify this HTML is:
<?php
$str = file_get_contents('test.html');
$doc = new DOMDocument();
@$doc->loadHTML($str);
$div1 = $doc->getElementById('to-replace-1');
echo $div1->nodeValue; // Success - 'Test content 1'
$div1_1 = $doc->createElement('div');
$div1_1->nodeValue = 'Content replaced 1';
$doc->appendChild($div1_1);
$doc->replaceChild($div1_1, $div1);
It doesn't matter whether I append the newly created $div1_1 to $doc or not. The result is the same: the last line produces 'PHP Fatal error: Uncaught DOMException: Not Found Error in ...'.
What's wrong?
Your issue is that $doc does not have a child which is $div1. Instead, you need to replace the child of $div1's parent, which you can access via its parentNode property:
$doc = new DOMDocument();
$doc->loadHTML($str, LIBXML_HTML_NODEFDTD);
$div1_1 = $doc->createElement('div');
$div1_1->nodeValue = 'Content replaced 1';
$div1 = $doc->getElementById('to-replace-1');
$div1->parentNode->replaceChild($div1_1, $div1);
echo $doc->saveHTML();
Output:
<html lang="ru">
<head>
<meta charset="UTF-8">
<title>PHP Test</title>
</head>
<body>
<h1>Test page</h1>
<div>
<div>Content replaced 1</div>
</div>
</body>
</html>
Demo on 3v4l.org
Note that you don't need to append $div1_1 to the HTML, replaceChild will do that for you.
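To see the difference side by side, the sketch below first attempts the document-level call inside a try/catch and then performs the parent-level call. The shortened HTML string is an assumption made for brevity, and the exception message may vary between PHP versions.

```php
<?php
$str = '<html><body><div><div id="to-replace-1">Test content 1</div></div></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($str, LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$old = $doc->getElementById('to-replace-1');
$new = $doc->createElement('div', 'Content replaced 1');

// $old sits inside <body><div>, so it is not a direct child of $doc.
$failed = false;
try {
    $doc->replaceChild($new, $old);
} catch (DOMException $e) {
    $failed = true;
    echo 'Document-level call failed: ', $e->getMessage(), PHP_EOL;
}

// Calling replaceChild() on the actual parent succeeds.
$old->parentNode->replaceChild($new, $old);

$result = $doc->saveHTML();
echo $result;
```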

Append HTML elements by PHP

I want to append a script tag (with some contents) to the head tag of an external HTML file using PHP.
But my HTML file is not updated, and no errors are shown.
PHP Code:
<?php
$doc = new DOMDocument();
$doc->loadHtmlFile( 'myfolder/myIndex.html');
$headNode = $doc->getElementsByTagName('head')->item(0);
$scriptNode = $doc->createElement("script");
$headNode->appendChild($scriptNode);
echo $doc->saveXML();
?>
Html File :
(A simple html pattern)
<html>
<head></head>
<body></body>
</html>
I have referred to the documentation, but I still couldn't figure out the problem.
Given a very simple HTML file ( simple.html )
<!DOCTYPE html>
<html lang='en'>
<head>
<meta charset='utf-8' />
<title>A simple HTML Page</title>
</head>
<body>
<h1>Simple HTML</h1>
<p>Well this is nice!</p>
</body>
</html>
Then using the following
$file='simple.html';
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->recover=true;
$dom->strictErrorChecking=false;
$dom->loadHTMLFile( $file );
$errors = libxml_get_errors();
libxml_clear_errors();
$script=$dom->createElement('script');
$script->textContent='/* Hello World */';
/* use [] notation rather than ->item(0) */
$dom->getElementsByTagName('head')[0]->appendChild( $script );
printf('<pre>%s</pre>',htmlentities( $dom->saveHTML() ));
/* write changes back to the html file - ie: save */
$dom->saveHTMLFile( $file );
will yield ( for display )
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>A simple HTML Page</title>
<script></script></head>
<body>
<h1>Simple HTML</h1>
<p>Well this is nice!</p>
</body>
</html>

Symfony Dom Crawler missing closing tag in template

I use the Symfony DOM Crawler to read and save an HTML document containing a template. But the closing HTML tags are missing in the template. Here is an example:
<?php
use Symfony\Component\DomCrawler\Crawler;
$htmlString = <<<'HTML'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
</head>
<body>
<h1>Title</h1>
<script id="my-template" type="text/template">
<div>{{ Name }}</div>
</script>
</body>
HTML;
$crawler = new Crawler($htmlString);
$output = join(
$crawler->filterXPath('//body')->each(
function (Crawler $node, $i) use ($htmlString) {
return $node->html();
}
)
);
I would expect something like:
<h1>Title</h1>
<script id="my-template" type="text/template">
<p>Hello</p>
<div>{{ Name }}</div>
</script>
But I get:
<h1>Title</h1>
<script id="my-template" type="text/template">
<p>Hello
<div>{{ Name }}
</script>
Do you have any idea why the DOM Crawler is omitting the closing tags?
I've done some debugging and isolated this issue with following code (as Crawler utilizes DOMElement objects):
$htmlString = <<<'HTML'
<script id="my-template" type="text/template">
<div> Name </div>;
</script>
HTML;
$el = new \DOMDocument();
libxml_use_internal_errors(true);
$el->loadHTML($htmlString);
echo $el->saveHTML($el);
Outputs (doctype, html and head are added automatically, but that's not important here):
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script id="my-template" type="text/template">
<div> Name ;
</script></head></html>
As you can see, it shows a similar problem with the closing tag inside the script element.
If you comment out libxml_use_internal_errors(true); then you'll get an error:
DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 2
I also did some research on this error and found that it is a fairly old bug in the libxml2 library, not strictly a PHP issue:
https://bugs.php.net/bug.php?id=52012
I'm seeing this issue on PHP 7.0.6, so I guess it still hasn't been fixed.
In general, the problem seems to lie in how the libxml library parses the contents of script tags, so you will have to either stop using the Crawler or avoid placing HTML templates in script tags. The right solution depends on what you're trying to achieve.
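One possible workaround, not taken from the answer above, is to hide the template bodies from the parser: swap each one for a plain-text token before loading, then restore it after serializing. The sketch below assumes templates are marked with type="text/template" and uses a regular expression for the swap, which is fragile for unusual markup.

```php
<?php
$html = '<body><h1>Title</h1><script id="my-template" type="text/template">'
      . '<div>{{ Name }}</div></script></body>';

// 1. Replace each template body with a token the parser treats as plain text.
$templates = [];
$safe = preg_replace_callback(
    '#(<script\b[^>]*type="text/template"[^>]*>)(.*?)(</script>)#is',
    function (array $m) use (&$templates) {
        $token = '__TEMPLATE_' . count($templates) . '__';
        $templates[$token] = $m[2];
        return $m[1] . $token . $m[3];
    },
    $html
);

// 2. Parse and serialize as usual.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($safe);
libxml_clear_errors();

$body = $doc->getElementsByTagName('body')->item(0);
$out = '';
foreach ($body->childNodes as $child) {
    $out .= $doc->saveHTML($child);
}

// 3. Put the original template bodies back in place of the tokens.
$out = strtr($out, $templates);
echo $out;
```

The same idea should carry over to the Crawler by passing the tokenized string to its constructor and restoring the tokens in the string it returns.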
