Insert " < " into html file using php - php

Is there a way I can insert < or > into an HTML file using PHP? Here is a part of the code:
<?php
$custom_tag = $searchNode->nodeValue = "<";
$custom_tag = $searchNode->setAttribute("data-mark", $theme_name);
$custom_tag = $dom->saveHTML($searchNode);
$root = $searchNode->parentNode;
$root->removeChild($searchNode);
$test = $dom->createTextNode("<?php echo '$custom_tag'; ?>");
$root->appendChild($test);
$saved_file_in_HTML = $dom->saveHTML();
$saved_file = htmlspecialchars_decode($saved_file_in_HTML);
file_put_contents("test.html", $saved_file);
The problem is that I get &lt; using the above method, and I would like to have <.
EDIT:
Full code:
if($searchNode->hasAttribute('data-put_page_content')) {
$custom_tag = $searchNode->nodeValue = "'; \$up_level = null; \$path_to_file = 'DATABASE/PAGES/' . basename(\$_SERVER['PHP_SELF'], '.php') . '.txt'; for(\$x = 0; \$x < 1; ) { if(is_file(\$up_level.\$path_to_file)) { \$x = 1; include(\$up_level.\$path_to_file); } else if(!is_file(\$up_level.\$path_to_file)) { \$up_level .= '../'; } }; echo '";
$custom_tag = $searchNode->setAttribute("data-mark", $theme_name);
$custom_tag = $dom->saveHTML($searchNode);
$root = $searchNode->parentNode;
$root->removeChild($searchNode);
$test = $dom->createTextNode("<?php echo '$custom_tag'; ?>");
$root->appendChild($test);
$saved_file_in_HTML = $dom->saveHTML();
$saved_file = htmlspecialchars_decode($saved_file_in_HTML);
file_put_contents("../THEMES/ACTIVE/". $theme_name ."/" . $trimed_file, $saved_file);
copy("../THEMES/ACTIVE/" . $theme_name . "/structure/index.php", "../THEMES/ACTIVE/" . $theme_name ."/index.php");
header("Location: editor.php");
}
FINAL EDIT:
If you want to have > or < using PHP's DOM methods, it works with the createTextNode() method.

The original question appears to concern how to use PHP to manufacture an HTML tag, so I'll address that question. While it is a good idea to use htmlentities(), you also need to be aware that its primary purpose is to protect your code so that a malicious user can't inject JavaScript or other tags that could create security problems. In this case, the only way to generate HTML is to create HTML, and PHP provides at least three ways to accomplish this.
You may write code such as the following:
<?php
define("LF_AB","<");
define("RT_AB",">");
echo LF_AB,"div",RT_AB,"\n";
echo "content\n";
echo LF_AB,"/div",RT_AB,"\n";
The code produces start and end div tags with some minimal content. Note, the defines are optional, i.e. one could code in a more straightforward fashion <?php echo "<"; ?> instead of resorting to using a define() to generate the left-angle tag character.
However, if one is wary of generating HTML in this manner, then you may also use html_entity_decode:
<?php
define("LF_AB",html_entity_decode("&lt;"));
define("RT_AB",html_entity_decode("&gt;"));
echo LF_AB,"div",RT_AB,"\n";
echo "content\n";
echo LF_AB,"/div",RT_AB,"\n";
This example treats the arguments to html_entity_decode as harmless HTML entities and then converts them into HTML left and right angle characters respectively.
Alternately, you could also take advantage of the DOM with PHP, even with an existing HTML page, as follows:
<?php
function displayContent()
{
$dom = new DOMDocument();
$element = $dom->createElement('div', 'My great content');
$dom->appendChild($element);
echo $dom->saveHTML();
}
?>
<!DOCTYPE html>
<html>
<head>
<title>Untitled</title>
<style>
div {
background:#ccbbcc;
color:#009;
font-weight:bold;
}
</style>
</head>
<body>
<?php displayContent(); ?>
</body>
</html>
Note: the DOM method createTextNode, true to its name, creates text and not HTML, i.e. the browser will interpret the left and right angle characters only as text, without any additional meaning.
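To make the note above concrete, here is a minimal sketch of what happens to an angle bracket placed in a text node. The sample string '1 < 2' and the variable names are made up for illustration; only DOMDocument, createTextNode(), saveHTML() and htmlspecialchars_decode() come from the discussion above.

```php
<?php
// Build a tiny document whose only content is a text node containing "<".
$dom = new DOMDocument();
$p = $dom->createElement('p');
$p->appendChild($dom->createTextNode('1 < 2'));
$dom->appendChild($p);

// saveHTML() serializes the text node, so "<" comes out as the entity "&lt;".
$serialized = $dom->saveHTML();

// Decoding the serialized string turns the entity back into a literal "<".
// Note that this decodes every entity in the string, not just the one we added.
$decoded = htmlspecialchars_decode($serialized);

echo $serialized;
echo $decoded;
```

This is why the question's code runs htmlspecialchars_decode() over the saved output: the DOM itself never stores markup in a text node, so the literal character only reappears once the serialized string is decoded.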

If I understand you correctly, I think you could use &lt; and &gt; in the first place. They should be rendered as HTML. Might save you some coding. Let me know if it works.

Related

Simple HTML Dom Crawler returns more than contained in attributes

I would like to extract the contents contained within certain parts of a website using selectors. I am using Simple HTML DOM to do this. However for some reason more data is returned than present in the selectors that I specify. I have checked the FAQ of Simple HTML DOM, but did not see anything that could help me out. I wasn't able to find anything on Stackoverflow either.
I am trying to get the contents/hrefs of all h2 class="hed" tags contained within the ul class="river" on this webpage: http://www.theatlantic.com/most-popular/
In my output I am receiving a lot of data from other tags like p class="dek has-dek" that are not contained within the h2 tag and should not be included. This is really strange as I thought the code would only allow for content within those tags to be scraped.
How can I limit the output to only include the data contained within the h2 tag?
Here is the code I am using:
<div class='rcorners1'>
<?php
include_once('simple_html_dom.php');
$target_url = "http://www.theatlantic.com/most-popular/";
$html = new simple_html_dom();
$html->load_file($target_url);
$posts = $html->find('ul[class=river]');
$limit = 10;
$limit = count($posts) < $limit ? count($posts) : $limit;
for($i=0; $i < $limit; $i++){
$post = $posts[$i];
$post->find('h2[class=hed]',0)->outertext = "";
echo strip_tags($post, '<p><a>');
}
?>
</div>
Output can be seen here. Instead of only a couple of article links, I get information of the author, information on the article, among others.
You are not outputting the h2 contents, but the ul contents in the echo:
echo strip_tags($post, '<p><a>');
Note that the statement before the echo does not modify $post:
$post->find('h2[class=hed]',0)->outertext = "";
Change code to this:
$hed = $post->find('h2[class=hed]',0);
echo strip_tags($hed, '<p><a>');
However, that will only do something with the first found h2. So you need another loop. Here is a rewrite of the code after load_file:
$posts = $html->find('ul[class=river]');
foreach($posts as $postNum => $post) {
if ($postNum >= 10) break; // limit reached
$heds = $post->find('h2[class=hed]');
foreach($heds as $hed) {
echo strip_tags($hed, '<p><a>');
}
}
If you still need to clear outertext, you can do it with $hed:
$hed->outertext = "";
You really only need one loop. Consider this:
foreach($html->find('ul.river > h2.hed') as $postNum => $h2) {
if ($postNum >= 10) break;
echo strip_tags($h2, '<p><a>') . "\n"; // the text
echo $h2->parent->href . "\n"; // the href
}
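For comparison, the same "only the headlines inside the list" idea can be sketched with PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. This is a swapped-in technique, not the library used above, and the inline sample markup is an assumption about what the page roughly looks like.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = <<<HTML
<ul class="river">
  <li><h2 class="hed"><a href="/first">First story</a></h2><p class="dek has-dek">Summary one</p></li>
  <li><h2 class="hed"><a href="/second">Second story</a></h2><p class="dek has-dek">Summary two</p></li>
</ul>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

// Match h2 elements with class "hed" anywhere below a ul with class "river".
$xpath = new DOMXPath($dom);
$query = '//ul[contains(concat(" ", normalize-space(@class), " "), " river ")]'
       . '//h2[contains(concat(" ", normalize-space(@class), " "), " hed ")]';

$headlines = [];
foreach ($xpath->query($query) as $i => $h2) {
    if ($i >= 10) break;            // same limit of 10 as in the code above
    $link = $h2->getElementsByTagName('a')->item(0);
    $headlines[] = [
        'text' => trim($h2->textContent),
        'href' => $link ? $link->getAttribute('href') : null,
    ];
}

print_r($headlines);
```

Because the query selects the h2 nodes themselves, the p.dek summaries never enter the result, which is the behaviour the question was after.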

Simple HTML DOM How to add tag and choose 2 classes?

I have a code:
$data = PhpSimple\HtmlDomParser::str_get_html($result);
foreach($data->find($this->owner->selector) as $img) {
$dataSrc = 'data-src';
$img->$dataSrc = $img->src;
$img->src = $loading;
}
This adds attributes to all img tags. After each img tag I need to insert a noscript tag. How can I do that?
<noscript>
<img src='mySource' />
</noscript>
My second question is how to specify a selector with two classes, as in CSS.
So far I have only managed to specify one class:
find('div[class=l-column_3] img')
Have a look at this question, where he finds the following workaround (edited it for your requirement):
$var = '<noscript><img src="mySource" /></noscript>';
$img->outertext = $img->makeup() . $img->innertext . $var;
Or you can use,
$img->outertext = $img->outertext . $var;
In that case, have you tried this?
$data->find('div.l-column.l-column_3 img')
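If Simple HTML DOM is not a hard requirement, both parts of the question can be sketched with the built-in DOMDocument and DOMXPath instead. This is an alternative technique rather than the library's own API; the sample markup, the file names real.png and loading.gif, and the variable names are assumptions for illustration.

```php
<?php
// Hypothetical input: an image inside a div carrying two classes.
$html = '<div class="l-column l-column_3"><img src="real.png"></div>';
$loading = 'loading.gif';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

// XPath equivalent of the CSS selector "div.l-column.l-column_3 img".
$xpath = new DOMXPath($dom);
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " l-column ")'
       . ' and contains(concat(" ", normalize-space(@class), " "), " l-column_3 ")]//img';

// Copy the matches into a plain array before modifying the tree.
foreach (iterator_to_array($xpath->query($query)) as $img) {
    // Build <noscript><img src="original"></noscript> as a fallback.
    $fallback = $dom->createElement('img');
    $fallback->setAttribute('src', $img->getAttribute('src'));
    $noscript = $dom->createElement('noscript');
    $noscript->appendChild($fallback);

    // Move the real source to data-src and show the placeholder instead.
    $img->setAttribute('data-src', $img->getAttribute('src'));
    $img->setAttribute('src', $loading);

    // Insert the noscript element right after the image.
    $img->parentNode->insertBefore($noscript, $img->nextSibling);
}

$result = $dom->saveHTML();
echo $result;
```

Note that loadHTML() wraps a fragment in html and body elements, so for a partial page you would serialize only the nodes you need.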

How to code PHP function that displays a specific div from external file if div called by getElementById has no value?

Thank you for answering my question so quickly. I did some more digging and ultimately found a solution for grabbing data from a specific div in an external file and posting it into another document using PHP DOMDocument. Now I'm looking to improve the code by adding an if condition that will grab data from a different div if the one called for initially by getElementById has no data. Here is the code I have so far.
External html as source.
<div id="tab1_header" class="cushycms"><h2>Meeting - 12:00pm to 3:00pm</h2></div>
My PHP file calling from source looks like this.
<?php
$source = "user_data.htm";
$dom = new DOMDocument();
$dom->loadHTMLFile($source);
$dom->preserveWhiteSpace = false;
$tab1_header = $dom->getElementById('tab1_header');
?>
<html>
<head>
<title></title>
</head>
<body>
<div><h2><?php echo $tab1_header->nodeValue; ?></h2></div>
</body>
</html>
The following function will output a message if a div id can't be found but...
if(!$tab1_header)
{
die("Element not found");
}
I would like to call for a different div if the one called for initially has no data. Meaning if <div id="tab1_header"></div> then grab <div id="alternate"><img src="filler.png" /></div>. Can someone help me modify the function above to achieve this result?
Thanks.
Either split up master.php so that div 1 and div 2 are each in their own file, or assign each to a variable, then include master.php and use the appropriate variable.
master.php
$d1='<div id="description1">Some Text</div>';
$d2='<div id="description2">Some Text</div>';
description1.php
include 'master.php';
echo $d1;
You can't do this solely with PHP includes unless you put the divs into separate files. Look into PHP templating; it's probably the best solution for this. Or, since you're new to the language, try using variables:
master.php
$description1 = '<div id="description1">Some Text</div>';
$description2 = '<div id="description2">Some Text</div>';
board1.php
include 'master.php';
echo $description1;
board2.php
include 'master.php';
echo $description2;
Alternatively, you could use JavaScript, but that might get a little messy.
Short answer: although it's possible, taking this approach is probably a very bad idea.
Longer answer: the solution may turn out to be too complicated. If your master.php file contains only HTML markup, you could read the content of that file with the file_get_contents() function and then parse it (e.g. with the DOMDocument library functions). You would have to look for a div with the given id.
<?php
$file = 'master.php'; // assumed path to the file holding the divs
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div)
{
if( $div->getAttribute('id') == 'description1' )
{
echo $div->nodeValue."\n";
}
}
?>
If your master.php file also has some dynamic content, you could use the following trick:
<?php
ob_start();
include('master.php');
$sMasterPhpContent = ob_get_clean();
// same as above - parse HTML
?>
Edit:
$tab_header = $dom->getElementById('tab1_header') ? $dom->getElementById('tab1_header') : $dom->getElementById('tab2_header');
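The one-liner above only falls back when the first element is missing altogether. Since the question asks about a div that exists but is empty, here is a rough sketch of a helper that also checks the content. The function name firstNonEmptyElement and the rule "non-empty means visible text or at least one child element" are assumptions, not something from the original code.

```php
<?php
// Hypothetical helper: return the first element from $ids that exists and
// contains either non-whitespace text or at least one child element.
function firstNonEmptyElement(DOMDocument $dom, array $ids): ?DOMElement
{
    foreach ($ids as $id) {
        $el = $dom->getElementById($id);
        if ($el === null) {
            continue; // no element with this id
        }
        $hasText = trim($el->textContent) !== '';
        $hasChildElement = $el->getElementsByTagName('*')->length > 0;
        if ($hasText || $hasChildElement) {
            return $el;
        }
    }
    return null;
}

// Sample source mirroring the question: an empty header and a filler image.
$source = '<div id="tab1_header" class="cushycms"></div>'
        . '<div id="alternate"><img src="filler.png"></div>';

$dom = new DOMDocument();
$dom->loadHTML($source);

$chosen = firstNonEmptyElement($dom, ['tab1_header', 'alternate']);
$chosenId = $chosen ? $chosen->getAttribute('id') : null;

if ($chosen === null) {
    echo "Element not found\n";
} else {
    echo $dom->saveHTML($chosen), "\n";
}
```

Passing the ids as an ordered list keeps the fallback chain easy to extend. Be aware that an element holding only an empty child, such as an h2 with no text, would still count as non-empty under this rule.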

Fixing unclosed HTML tags

I am working on a blog layout and I need to create an abstract of each post (say, the 15 latest) to show on the homepage. The content I use is already formatted in HTML tags by the Textile library. If I use substr to get the first 500 chars of the post, the main problem I face is how to close the unclosed tags.
e.g
<div>.......................</div>
<div>...........
<p>............</p>
<p>...........| 500 chars
</p>
<div>
What I get is two unclosed tags, <p> and <div>. The p won't create much trouble, but the div just messes with the whole page layout. Any suggestions on how to track the opening tags and close them manually, or something similar?
There are lots of methods that can be used:
Use a proper HTML parser, like DOMDocument
Use PHP Tidy to repair the un-closed tag
Some would suggest HTML Purifier
As ajreal said, DOMDocument is a solution.
Example :
$str = "
<html>
<head>
<title>test</title>
</head>
<body>
<p>error</i>
</body>
</html>
";
$doc = new DOMDocument();
@$doc->loadHTML($str);
echo $doc->saveHTML();
Advantage : natively included in PHP, contrary to PHP Tidy.
You can use DOMDocument to do it, but be careful of string encoding issues. Also, you'll have to use a complete HTML document, then extract the components you want. Here's an example:
function make_excerpt ($rawHtml, $length = 500) {
// append an ellipsis and "More" link
$content = substr($rawHtml, 0, $length)
. '… More >';
// Detect the string encoding
$encoding = mb_detect_encoding($content);
// pass it to the DOMDocument constructor
$doc = new DOMDocument('', $encoding);
// Must include the content-type/charset meta tag with $encoding
// Bad HTML will trigger warnings, suppress those
@$doc->loadHTML('<html><head>'
. '<meta http-equiv="content-type" content="text/html; charset='
. $encoding . '"></head><body>' . trim($content) . '</body></html>');
// extract the components we want
$nodes = $doc->getElementsByTagName('body')->item(0)->childNodes;
$html = '';
$len = $nodes->length;
for ($i = 0; $i < $len; $i++) {
$html .= $doc->saveHTML($nodes->item($i));
}
return $html;
}
$html = "<p>.......................</p>
<p>...........
<p>............</p>
<p>...........| 500 chars";
// output fixed html
echo make_excerpt($html, 500);
Outputs:
<p>.......................</p>
<p>...........
</p>
<p>............</p>
<p>...........| 500 chars… More ></p>
If you are using WordPress you should wrap the substr() invocation in a call to wpautop - wpautop(substr(...)). You may also wish to test the length of the $rawHtml passed to the function, and skip appending the "More" link if it isn't long enough.

How do I programmatically add rel="external" to external links in a string of HTML?

How can I check if links from a string variable are external? This string is the site content (like comments, articles etc).
And if they are, how do I append a external value to their rel attribute? And if they don't have this attribute, append rel="external" ?
A HTML parser is appropriate for input filtering, but for modifying output you'll need the performance of a simpleminded regex solution. In this case a callback regex would do:
$html = preg_replace_callback('#<a\s[^>]*href="(http://[^"]+)"[^>]*>#',
"cb_ext_url", $html);
function cb_ext_url($match) {
list ($orig, $url) = $match;
if (strstr($url, "http://localhost/")) {
return $orig;
}
elseif (strstr($orig, "rel=")) {
return $orig;
}
else {
return rtrim($orig, ">") . ' rel="external">';
}
}
You'll probably need more fine-grained checks. But that's the general approach.
Use an XML parser, like SimpleXML. Regex isn't made to do XML/HTML parsing, and here's a perfect explanation of what happens when you do: RegEx match open tags except XHTML self-contained tags.
Parse the input as XML, use the parser to select the required elements, edit their properties using the parser, and spit them back out.
It'll save you a headache, as regex makes me cry...
Here's my way of doing this (didn't test it):
<?php
$xmlString = "This is where the HTML of your site should go. Make sure it's valid!";
$xml = new SimpleXMLElement($xmlString);
foreach($xml->xpath('//a') as $a) // SimpleXMLElement has no getElementsByTagName(); xpath() selects the links
{
$attributes = $a->attributes();
if (isThisExternal($attributes['href']))
{
$a['rel'] = 'external';
}
}
echo $xml->asXml();
?>
It might be easier to do something like this on the client side, using jQuery:
<script type="text/javascript">
$(document).ready(function()
{
$.each($('a'), function(idx, tag)
{
// you might make this smarter and throw out URLS like
// http://www.otherdomain.com/yourdomain.com
if ($(tag).attr('href').indexOf('yourdomain.com') < 0)
{
$(tag).attr('rel', 'external');
}
});
});
</script>
As Craig White points out though, this doesn't do anything SEO-wise and won't help users who have JavaScript disabled.
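A server-side variant of the parser approach can also be sketched with DOMDocument, which tolerates ordinary HTML better than a strict XML parser. The function name markExternalLinks, the wrapper div id and the host names below are made up for illustration, and the sketch assumes plain ASCII or ISO-8859-1 input, since loadHTML() does not assume UTF-8 by default.

```php
<?php
// Hypothetical helper: add "external" to the rel attribute of every link
// whose host differs from $ownHost. Relative links are left untouched.
function markExternalLinks(string $html, string $ownHost): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);           // tolerate sloppy markup
    // Wrap the fragment so it can be found again after parsing.
    $dom->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('a') as $a) {
        $host = parse_url($a->getAttribute('href'), PHP_URL_HOST);
        if (!$host || strcasecmp($host, $ownHost) === 0) {
            continue;                           // relative or own-domain link
        }
        // Keep existing rel tokens and append "external" only once.
        $rels = preg_split('/\s+/', $a->getAttribute('rel'), -1, PREG_SPLIT_NO_EMPTY);
        if (!in_array('external', $rels, true)) {
            $rels[] = 'external';
        }
        $a->setAttribute('rel', implode(' ', $rels));
    }

    // Serialize only the children of the wrapper, not the whole document.
    $out = '';
    foreach ($dom->getElementById('wrap')->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$input = '<p><a href="/about">About</a> '
       . '<a href="http://example.org/x" rel="nofollow">Ext</a> '
       . '<a href="http://mysite.test/y">Own</a></p>';
$marked = markExternalLinks($input, 'mysite.test');
echo $marked, "\n";
```

Comparing parsed host names avoids the substring pitfall mentioned in the jQuery comment above, where another domain's URL merely contains your domain name somewhere in its path.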
