Interpret string as HTML in PHP - php

I am working with PHP and have only just begun, so I would like some advice on something I can't quite seem to find online.
I have a PHP file that receives two strings: a name and a lot of HTML text.
$name = $_POST['name'];
$content = $_POST['content'];
Now I want to use those two to create a new HTML file and save it. So far I've managed to save a new HTML file and use the name as the <title> tag. What I want to do now is replace the (currently empty) body with my HTML string, interpreted as HTML.
This builds my body tag:
$body = $doc->createElement('body');
$body = $root->appendChild($body);
I found two ways to do this, and neither works:
Solution 1:
$content = $doc->createTextNode($content);
$content = $body->appendChild($content);
This inserts my HTML into the body, but it is output literally as '<div id="lala">content</div>' (escaped text rather than markup), so this isn't what I want.
Solution 2:
$content = $doc->loadHTML($content);
This actually loads the string as HTML, but it replaces my entire document, and things like the CSS and JS that I add to the head now end up UNDER the new body instead of in the head. For example:
$link_1 = $doc->createElement('link');
$link_1->setAttribute('rel','stylesheet');
$link_1->setAttribute('href','css/mystylesheet.min.css');
$link_1 = $head->appendChild($link_1);
So basically I want to both interpret my string as HTML AND make it load in the correct place. I've tried changing it to $body->loadHTML($content), but this gives me an internal error.
Does anyone know the PHP method I'm looking for? Thanks!

This will work. Let me know if you need explanation :)
<?php
ini_set( 'display_errors', 1 );
ini_set( 'error_reporting', E_ALL );
$dom = new DOMDocument( '1.0' );
$dom->formatOutput = true;
$dom->preserveWhiteSpace = true;
// test values
$name = 'Test';
$content = '<div onclick="alert(\'Hello!\');">Test div</div>';
// create <html>, <head>, <title> and <body> tags
$html = $dom->createElement( 'html' );
$head = $dom->createElement( 'head' );
$title = $dom->createElement( 'title' );
$body = $dom->createElement( 'body' );
// title text
$titleText = $dom->createTextNode( $name );
// import the text in a new dom
$dom1 = new DOMDocument( '1.0' );
$dom1->formatOutput = true;
$dom1->preserveWhiteSpace = true;
$dom1->loadHTML( $content ); // returns a bool, so the result is not kept
$bodyText = $dom1->getElementsByTagName('body')->item(0);
// add them to the dom
$html = $dom->appendChild( $html );
$html->appendChild( $head );
$head->appendChild( $title );
$title->appendChild( $titleText );
$html->appendChild( $body );
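// import each child of the parsed <body>, so we don't nest a <body> inside <body>
foreach ( $bodyText->childNodes as $child ) {
$body->appendChild( $dom->importNode( $child, true ) );
}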
echo $dom->saveHTML();
?>
Hope this helps.

Of course, the text in createTextNode stands for plain text (vs. HTML).
Fatal error: Call to undefined method DOMElement::loadHTML()
Correct, loadHTML() belongs to DOMDocument, not DOMElement. In your case, the document is apparently $doc (not $body). So you need to call:
$doc->loadHTML()
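Putting the two answers together, here is a minimal sketch of one way to append an HTML string to an existing element without replacing the whole document. The helper name appendHtml is hypothetical (it is not a DOM method), and the sketch assumes the fragment is reasonably well-formed HTML:

```php
<?php
// Hypothetical helper (not part of the DOM extension): parse an HTML
// fragment in a temporary document and append copies of its nodes to $parent.
function appendHtml(DOMElement $parent, $html)
{
    $tmp = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $tmp->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tmpBody = $tmp->getElementsByTagName('body')->item(0);
    if ($tmpBody === null) {
        return; // nothing was parsed
    }

    // importNode() copies each node into the target document;
    // the temporary document is left untouched.
    foreach ($tmpBody->childNodes as $child) {
        $parent->appendChild($parent->ownerDocument->importNode($child, true));
    }
}

// Build the skeleton the same way the question does.
$doc  = new DOMDocument();
$root = $doc->appendChild($doc->createElement('html'));
$head = $root->appendChild($doc->createElement('head'));
$body = $root->appendChild($doc->createElement('body'));

appendHtml($body, '<div id="lala">content</div>');

$result = $doc->saveHTML();
echo $result;
```

Because $head is still the head of the same document, appending <link> or <script> elements to it afterwards should land them in the head rather than under the body.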

Related

Replace all links in the body of html page using PHP

I have used the following code to replace all the links on HTML page.
$output = file_get_contents($turl);
$newOutput = str_replace('href="http', 'target="_parent" href="http://localhost/e/site.php?turl=http', $output);
$newOutput = str_replace('href="www.', 'target="_parent" href="http://localhost/e/site.php?turl=www.', $newOutput);
$newOutput = str_replace('href="/', 'target="_parent" href="http://localhost/e/site.php?turl='.$turl.'/', $newOutput);
echo $newOutput;
I want to modify this code to replace only links inside the body and not in the head.
You can use DOMDocument to parse and manipulate the source. It's always a better idea to use a dedicated parser for a task like this instead of using string operations.
// Parse the HTML into a document ($html holds the page source, e.g. from file_get_contents)
$dom = new \DOMDocument();
$dom->loadHTML($html);
// Loop over all links within the `<body>` element
foreach($dom->getElementsByTagName('body')[0]->getElementsByTagName('a') as $link) {
// Save the existing link
$oldLink = $link->getAttribute('href');
// Set the new target attribute
$link->setAttribute('target', "_parent");
// Prefix the link with the new URL
$link->setAttribute('href', "http://localhost/e/site.php?turl=" . urlencode($oldLink));
}
// Output the result
echo $dom->saveHTML();
See https://eval.in/843484
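As a self-contained check of the DOM approach, the sketch below runs it on a small hard-coded page and leaves the link in the head alone. The sample markup and the $proxy variable are assumptions made for illustration; the proxy URL is the one from the question.

```php
<?php
// Sample page (an assumption for illustration): one link in <head>, one in <body>.
$html = '<html><head><link rel="stylesheet" href="http://example.com/a.css"></head>'
      . '<body><a href="http://example.com/page">Page</a></body></html>';

// Proxy prefix taken from the question.
$proxy = 'http://localhost/e/site.php?turl=';

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true); // keep parser warnings quiet
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$body = $dom->getElementsByTagName('body')->item(0);
if ($body !== null) {
    // Only <a> elements below <body> are visited, so the <head> is untouched.
    foreach ($body->getElementsByTagName('a') as $link) {
        if (!$link->hasAttribute('href')) {
            continue; // skip named anchors without a target
        }
        $link->setAttribute('target', '_parent');
        $link->setAttribute('href', $proxy . urlencode($link->getAttribute('href')));
    }
}

$result = $dom->saveHTML();
echo $result;
```

Note that urlencode() passes relative links through as they are; resolving them against the page URL first, as the third str_replace in the question attempts, would be a separate step.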
You can decapitate the code.
Find the body and separate the head from the body into two variables.
//$output = file_get_contents($turl);
$output = "<head> blablabla
Bla bla
</head>
<body>
Foobar
</body>";
//Decapitation
$head = substr($output, 0, strpos($output, "<body>"));
$body = substr($output, strpos($output, "<body>"));
// Find body tag and parse body and head to each variable
$newOutput = str_replace('href="http', 'target="_parent" href="http://localhost/e/site.php?turl=http', $body);
$newOutput = str_replace('href="www.', 'target="_parent" href="http://localhost/e/site.php?turl=www.', $newOutput);
$newOutput = str_replace('href="/', 'target="_parent" href="http://localhost/e/site.php?turl='.$turl.'/', $newOutput);
echo $head . $newOutput;
https://3v4l.org/WYcYP

php, how to clone DOMElement?

The official documentation says that DOMElement inherits the method cloneNode: http://php.net/manual/en/class.domelement.php. If I try to clone, it does not work. How do I copy an element from one DOMDocument to another? Specifically, my head is misplaced, so I have to copy the head and body somehow and then echo them in the right order.
ob_start();
$viewData = $this->data;
include_once( $this->viewTemplPath.$this->file );
$buffer = ob_get_clean();
$doc = new \DOMDocument();
$doc->loadHTML($buffer);
$head = $doc->getElementsByTagName('head')->item(0);
print_r('<br><br> 184 view.php head='); var_dump($head);
$body = $doc->getElementsByTagName('body')->item(0);
print_r('<br><br> 188 view.php body='); var_dump($body);
$docNew = new \DOMDocument();
$headNew = $head->cloneNone(true); // Fatal error: Call to undefined method DOMElement::cloneNone()
$docNew->appendChild($headNew);
$bodyNew = $body->cloneNone(true); // Fatal error: Call to undefined method DOMElement::cloneNone()
$docNew->appendChild($bodyNew);
echo $docNew->saveHTML();
In order to clone the element, the solution is to import the element I want to clone into the new document and then add it as a child: http://php.net/manual/en/domdocument.importnode.php
This does not throw errors, and it echoes the new document.
But it does not resolve the problem with the misplaced head.
ob_start();
$viewData = $this->data;
include_once( $this->viewTemplPath.$this->file );
$buffer = ob_get_clean();
$doc = new \DOMDocument();
$doc->loadHTML($buffer);
$head = $doc->getElementsByTagName('head')->item(0);
$body = $doc->getElementsByTagName('body')->item(0);
$docNew = new \DOMDocument();
$headNew = $docNew->importNode($head, true);
$docNew->appendChild($headNew);
$bodyNew = $docNew->importNode($body, true);
$docNew->appendChild($bodyNew);
echo $docNew->saveHTML();
You have a typo: you wrote cloneNone(true); it should be cloneNode(true).
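Fixing the typo alone is not enough when the copy has to end up in a different document: cloneNode() produces a copy that still belongs to the original document, and appending it elsewhere fails with a "Wrong Document" error. A minimal sketch of the difference, using a small hard-coded page as an assumed input:

```php
<?php
// Source document (sample markup assumed for illustration).
$src = new DOMDocument();
$src->loadHTML('<html><head><title>T</title></head><body><p>Hi</p></body></html>');
$head = $src->getElementsByTagName('head')->item(0);

// cloneNode() makes a deep copy, but the copy still belongs to $src.
$copy = $head->cloneNode(true);

// Appending that copy to another document throws a DOMException.
$dst = new DOMDocument();
$root = $dst->appendChild($dst->createElement('html'));
$wrongDocument = false;
try {
    $root->appendChild($copy);
} catch (DOMException $e) {
    $wrongDocument = ($e->getCode() === DOM_WRONG_DOCUMENT_ERR);
}

// importNode() creates a copy owned by the target document, which can be appended.
$imported = $dst->importNode($head, true);
$root->appendChild($imported);

$result = $dst->saveHTML();
echo $result;
```

Appending head and body under a single html root element, as done here, also keeps them in a predictable order in the output.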

How do I get the value of a <pre> tag with no ID?

I have the following code set up from an example:
<?php
$url = 'http://somedomain/something';
$content = file_get_contents($url);
$first_step = explode( '<div id="somediv">' , $content );
$second_step = explode("</div>" , $first_step[1] );
echo $second_step[0];
?>
The problem here is that the website from which I'm trying to fetch the value of the pre tag has no ID:
<pre>some content</pre>
I've also tried this but no success so far:
<?php
$url = 'http://somedomain/something';
$content = file_get_contents($url);
$first_step = explode( '<script>document.getElementsByTagName("pre")' , $content );
$second_step = explode("</script>" , $first_step[1] );
echo $second_step[0];
?>
Basically, I'm trying to fetch a value from a domain which is wrapped by a pre tag with no additional identifiers. Any help appreciated!
PHP ships with a pretty decent document parser:
$dom = new DOMDocument;
$dom->loadHTMLFile('http://somedomain/something');
foreach ($dom->getElementsByTagName('pre') as $node) {
// do stuff with $node
echo $node->nodeValue, "\n";
}
See also: DOMDocument
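If only the first <pre> is needed, a minimal sketch along the same lines could look like this. It parses a hard-coded string (an assumption, standing in for the fetched page) so that it runs without network access:

```php
<?php
// Stand-in for the fetched page (sample markup assumed for illustration).
$html = '<html><body><h1>Title</h1><pre>some content</pre></body></html>';

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true); // real-world pages often trigger warnings
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// item(0) returns null when the page contains no <pre> at all.
$pre = $dom->getElementsByTagName('pre')->item(0);
$value = ($pre !== null) ? $pre->textContent : null;

echo $value;
```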
There are many ways to parse HTML DOM elements.
For PHP Simple HTML DOM Parser, see http://simplehtmldom.sourceforge.net/
For Yahoo YQL, see https://developer.yahoo.com/yql/
JavaScript and jQuery also offer many methods for parsing HTML.
Use whichever is most convenient for you.

fetch Youtube subscribers count by JSON API in PHP

I want to get the subscriber count value from this JSON file: http://gdata.youtube.com/feeds/api/users/googlechrome?v=2&alt=json
This is what I did, but it's not working.
$youtube_url = json_decode( file_get_contents( 'http://gdata.youtube.com/feeds/api/users/googlechrome?v=2&alt=json' ), true );
$youtube_data = $youtube_url['entry']['yt$statistics']['subscriberCount'];
PHP Code:
function get_yt_subs($username) {
$xmlData = file_get_contents('http://gdata.youtube.com/feeds/api/users/' . strtolower($username));
$xmlData = str_replace('yt:', 'yt', $xmlData);
$xml = new SimpleXMLElement($xmlData);
$subs = $xml->ytstatistics['subscriberCount'];
return($subs);
}
Example of usage:
PHP Code:
get_yt_subs('3moeslam')
I just changed from the JSON method to XML and everything works fine for me. The code in my question works for @Matt Koskela but not for me. Anyway, I'll go ahead with this method, but I would still like to know what the problem with the JSON method is.
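$youtube_url = file_get_contents( 'http://gdata.youtube.com/feeds/api/users/googlechrome' );
$youtube_url = str_replace( 'yt:', 'yt', $youtube_url );
// parse the string before reading elements from it
$youtube_xml = new SimpleXMLElement( $youtube_url );
$youtube_data = $youtube_xml->ytstatistics['subscriberCount'];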
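To narrow down where the JSON route fails, it may help to check each step separately. The sketch below is only an assumption-laden illustration: the function name extractSubscriberCount is hypothetical, and the payload is a hand-written stand-in whose shape is inferred from the access path in the question, not a real API response.

```php
<?php
// Hypothetical helper: returns the subscriber count, or null if any step fails.
function extractSubscriberCount($raw)
{
    // file_get_contents() returns false when the request fails.
    if ($raw === false) {
        return null;
    }

    $data = json_decode($raw, true);
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        return null; // the response was not valid JSON
    }

    // Single quotes matter here: in a double-quoted key, "yt$statistics"
    // would be interpolated as "yt" followed by the variable $statistics.
    if (!isset($data['entry']['yt$statistics']['subscriberCount'])) {
        return null;
    }
    return $data['entry']['yt$statistics']['subscriberCount'];
}

// Hand-written sample payload (an assumption, not real data).
$sample = '{"entry":{"yt$statistics":{"subscriberCount":"12345"}}}';

$subs = extractSubscriberCount($sample);
echo $subs;
```

If the function returns null for the live URL, dumping the result of file_get_contents() first shows whether the request itself or the decoding is the step that fails.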

Prevent save() from overwriting dtd in an xml file

I'm writing a script that adds nodes to an XML file. In addition, I have an external DTD that I made to define the structure of the file. However, the script I wrote keeps overwriting the DTD declaration in the empty XML file when it's done appending nodes. How can I stop this from happening?
Code:
<?php
/*Dom vars*/
$dom = new DOMDocument("1.0", "UTF-8");
$previous_value = libxml_use_internal_errors(TRUE);
$dom->load('post.xml');
libxml_clear_errors();
libxml_use_internal_errors($previous_value);
$dom->formatOutput = true;
$entry = $dom->getElementsByTagName('entry');
$date = $dom->getElementsByTagName('date');
$para = $dom->getElementsByTagName('para');
$link = $dom->getElementsByTagName('link');
/* POST vars sent by the Ajax request */
if (isset($_POST['Text'])){
$text = trim($_POST['Text']);
}
/*
function post(){
global $dom, $entry, $date, $para, $link,
$home, $about, $contact, $text;
*/
$entryC = $dom->createElement('entry');
$dateC = $dom->createElement('date', date("m d, y H:i:s")) ;
$entryC->appendChild($dateC);
$tab = "\n";
$frags = explode($tab, $text);
$i = count($frags);
$b = 0;
while($b < $i){
$paraC = $dom->createElement('para', $frags[$b]);
$entryC->appendChild($paraC);
$b++;
}
$linkC = $dom->createElement('link', rand(100000, 999999));
$entryC->appendChild($linkC);
$dom->appendChild($entryC);
$dom->save('post.xml');
/*}
post();
*/echo 1;
?>
It looks like in order to do this, you'd have to create a DOMDocumentType using
DOMImplementation::createDocumentType
then create an empty document using the DOMImplementation, and pass in the DOMDocumentType you just created, then import the document you loaded. This post: http://pointbeing.net/weblog/2009/03/adding-a-doctype-declaration-to-a-domdocument-in-php.html and the comments looked useful.
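A minimal sketch of those steps, under stated assumptions: the root element name posts and the DTD file name post.dtd are hypothetical (the question does not give them), and the loaded document is built from a string here so the example runs on its own.

```php
<?php
// Stand-in for the loaded file; root name "posts" is an assumption.
$loaded = new DOMDocument();
$loaded->loadXML('<posts><entry><para>Hello</para></entry></posts>');

// 1. Create the doctype node; "post.dtd" is a hypothetical system identifier.
$impl = new DOMImplementation();
$dtd  = $impl->createDocumentType('posts', '', 'post.dtd');

// 2. Create an empty document that carries the doctype.
$doc = $impl->createDocument('', '', $dtd);
$doc->encoding = 'UTF-8';
$doc->formatOutput = true;

// 3. Import the root element of the loaded document, then append it.
$doc->appendChild($doc->importNode($loaded->documentElement, true));

// New nodes can now be added to $doc as before; the doctype is serialized with it.
$xml = $doc->saveXML();
echo $xml;
```

Calling $doc->save() with a file name instead of saveXML() would write the same serialization, including the DOCTYPE line, to disk.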
I'm guessing this is happening because after parsing/validation, the DTD isn't part of the DOM anymore, and PHP therefore isn't able to include it when the document is serialized.
Do you have to use a DTD? XML Schemas can be linked via attributes (and the link is therefore part of the DOM). Or there's RelaxNG, which can be linked via a processing instruction. DTDs have all this baggage that comes with them as a holdover from SGML. There are better alternatives.
