PHP DOM saveHTML changes formatting - php

I load an external HTML page with loadHTML(), then replace two child nodes and remove one.
The saveHTML() method changes parts of the markup that I want left untouched.
It moves the closing </head> tag to an earlier position; on the original page the closing head tag comes further down, after a few more tags.
It also changes the body tag from <body class="something"> to just <body>.
How can I save the document using PHP DOM so that it preserves all the positioning and attributes?
Here is the code:
$document = new DOMDocument();
@$document->loadHTML($contents);
$login_signup = $document->getElementById('loginBar')->getElementsByTagName('div')->item(1);
$login_signup->removeChild($login_signup->getElementsByTagName('h3')->item(0));
$todays_a = $document->createElement('a', 'Todays Digest');
$todays_a->setAttribute('href', $domain . $digest_newsletter . date('mdy') . '.html');
$previous_a = $document->createElement('a', 'Previous Digest');
$previous_a->setAttribute('href', $domain . $digest_newsletter . date('mdy', strtotime('-1 day')) . '.html');
$todays_div = $document->getElementById('myDiv');
$todays_div->replaceChild($todays_a, $todays_div->getElementsByTagName('script')->item(0));
$previous_div = $document->getElementById('myDiv2');
$previous_div->replaceChild($previous_a, $previous_div->getElementsByTagName('script')->item(0));
$contents = $document->saveHTML($document);
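saveHTML() serializes the tree that the parser built, not the original text, so any repair made while loading shows up in the output. The following is a minimal diagnostic sketch, assuming the page contains markup the parser objects to. The sample string and its stray <div> inside <head> are hypothetical and not taken from the real page. Collecting libxml's messages shows where the parser had to intervene.

```php
<?php
// Hypothetical sample page: a <div> inside <head>, which the HTML parser may relocate.
$contents = '<html><head><title>Demo</title><div id="x">stray</div>'
          . '<meta name="a" content="b"></head>'
          . '<body class="something"><p>Hello</p></body></html>';

libxml_use_internal_errors(true);   // collect parser messages instead of emitting warnings
$document = new DOMDocument();
$document->loadHTML($contents);
$errors = libxml_get_errors();      // array of LibXMLError objects
libxml_clear_errors();

// Report every place where the parser complained about the input.
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Show what survived parsing, before any manual changes are made.
$body = $document->getElementsByTagName('body')->item(0);
$class = $body !== null ? $body->getAttribute('class') : '';
echo 'body class after parsing: ', var_export($class, true), PHP_EOL;

$saved = $document->saveHTML();
echo $saved;
```

If the messages point at elements inside <head>, that is a plausible reason for the closing tag to move. The exact messages and repairs depend on the libxml version PHP was built against.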

Related

Get td contain table from library simplehtmldom

simple_html_dom does not work on the page "https://eldni.com/buscar-por-dni?dni=44626399"
<?php
include_once './simple_html_dom./HtmlWeb.php';
use simplehtmldom\HtmlWeb;
// get DOM from URL or file
$doc = new HtmlWeb();
$html = $doc->load('https://eldni.com/buscar-por-dni?dni=44626399');
foreach($html->find('td') as $e)
echo $e->plaintext . '<br>' . PHP_EOL;
?>
I want the plain text of each "td" cell in the table.

Append to start of href tag

I'm looking to turn
Some page to
Some page
using PHP. I'll have the HTML code of a random website so it's not as simple as using str_replace()
I've tried the approach from "Replacing anchor href value with regex", but it seems to erase my entire page and I get a blank, white screen. Can anyone offer any help?
My code:
$html = file_get_contents(htmlentities($_GET['q'])); // Takes contents of website entered by user
$arr = array(); // Defines array
$html2 = ""; // Defines variable to write to later
$dom = new DOMDocument();
$dom->loadHTML($html); // Loads the HTML code displayed earlier
$domcss = $dom->getElementsByTagName('link');
foreach($domcss as $links) {
if( strtolower($links->getAttribute('rel')) == "stylesheet" ) {
$x = $links->getAttribute('href');
$html2 .= '<link rel="stylesheet" type="text/css" href="'.htmlentities($_GET['q']) . "/" . $x.'">';
}
} // This replaces all stylesheets from "./style.css", to "http://example.com/style.css"
echo $html2 . $html; // Echoes the entire webpage, with stylesheet links edited
To manipulate this with DOM, find the <a> tags and, where there is an href attribute, add the prefix. The end of this code just echoes the resulting HTML:
$dom = new DOMDocument();
$dom->loadHTML($html); // Loads the HTML code displayed earlier
$aTags = $dom->getElementsByTagName('a');
$prefix = "http://example.com?q=";
foreach($aTags as $links) {
$href = $links->getAttribute('href');
if( !empty($href)) {
$links->setAttribute("href", $prefix.$href);
}
}
echo $dom->saveHTML();
$prefix contains the part you want to add to the start of the URL.

Retrieve the DOM from a variable with Simple HTML DOM Parser?

I'm using Simple HTML DOM Parser to retrieve information from a website with this code:
$html = file_get_html("http://www.example.com/");
$table = $html->find("div[class=table]");
foreach ( $table as $tabella ) {
$title = $tabella->find (".elementTitle");
echo "<h2>" . $title[0] -> plaintext . "</h2>";
$minisito = $tabella->find ("h1[class=elementTitle] a");
echo "<p>" . $minisito[0] -> href . "</p>";
}
Now I need to extract other content from the URL contained in $minisito[0]->href.
How can I create another variable with file_get_html to extract data from this new URL?

Add canonical tag in <head> tag if not existing using PHP

I want to add a canonical tag inside the <head> tag, using PHP, if one does not already exist.
I have tried this, but it didn't work:
function addtag(){
$doc = new DOMDocument();
$doc->loadHTMLFile('http://'. $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI']);
$body = $doc->getElementsByTagName('head')->item(0);
$link = $doc->createElement('link');
$newLink = $body->appendChild($link);
$newLink->setAttribute("canonical", 'http://');
//$linkText = $doc->createTextNode("Display Text For Link");
//$newLink->appendChild($linkText);
echo $doc->saveHTML();
}
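The attempt above sets an attribute named "canonical" instead of rel="canonical" with an href, and it never checks for an existing tag. A minimal sketch of one way to do both, assuming the page markup is already available as a string. The helper name addCanonical, the sample page and the example URL are hypothetical.

```php
<?php
// Hypothetical helper: returns $html with a canonical link appended to <head>
// when no <link rel="canonical"> is present; otherwise returns $html unchanged.
function addCanonical(string $html, string $url): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect markup without warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Exact, case-sensitive match on rel; adjust if the pages use other spellings.
    if ($xpath->query('//head/link[@rel="canonical"]')->length > 0) {
        return $html;
    }

    $head = $doc->getElementsByTagName('head')->item(0);
    if ($head === null) {
        return $html;                   // no <head> to extend; leave the input untouched
    }

    $link = $doc->createElement('link');
    $link->setAttribute('rel', 'canonical');
    $link->setAttribute('href', $url);
    $head->appendChild($link);

    return $doc->saveHTML();
}

// Sample input for illustration only.
$page = '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>';
$out = addCanonical($page, 'http://www.example.com/page');
echo $out;
```

Passing the result through addCanonical() a second time returns it unchanged, because the canonical link is now found by the XPath query.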

php add class to Link in UL LI of current page

I'm trying to use a regex in PHP to add a "curPage" class to the current page's link in my ul/li menu before sending it to the browser.
code:
$loadedMenu=preg_replace("/a href=\"" . $file . ".htm\"/", "a href=\"" . $file . ".htm\" class=\"curPage\"", $loadedMenu);
content of $loadedMenu:
<nav><ul><li rel="file" id="516054b57fbba">דף הבית</li><li rel="file" id="51681f81a440b">משתמש חדש</li><li rel="file" id="516054b57fb40">אודות</li><li rel="file" id="5160f37b822a3">דף חדש</li><li rel="folder" id="516054d162176">תיקייה חדשה<ul><li rel="file" id="516054b57fc62">מיטל הנסיכה שלי</li><li rel="file" id="516054b57fc9a">נסיון</li><li rel="file" id="516054b57fb82">עזרה</li></ul></li><li rel="folfil" id="5160552162177">תיקיית תוכן<ul><li rel="file" id="516054b57fbf2">test</li></ul></li><li rel="file" id="516054b57fc2a">גלידה</li><li rel="file" id="516054b57fcd2">נסיון0</li></ul></nav>
Parsing HTML text with a regex like this is so error-prone that it is best avoided.
It is better to use a DOM parser to parse and modify the HTML, as in this code:
$file = 'foo.htm'; // set your value here
# fetch your HTML content here
$html = <<< EOF
<html>
Click link1 morestuff
Click www.example.com morestuff
notexample.com morestuff
Click link1
</html>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
// find all hrefs with $file in it
$nodelist = $xpath->query("//a[contains(@href, '" . $file . "')]");
// iterate thru found links
for($i=0; $i < $nodelist->length; $i++) {
$node = $nodelist->item($i);
# add class attribute to them
$node->setAttribute('class', 'curPage');
}
echo $doc->saveHTML();
OUTPUT:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>Click link1 morestuff
Click www.example.com morestuff
notexample.com morestuff
Click link1
</body></html>
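The sample markup above contains no <a> elements, so the query has nothing to match. A minimal sketch of the same DOMXPath technique, using two hypothetical links so that the effect is visible:

```php
<?php
$file = 'foo.htm';   // hypothetical current page
// Hypothetical menu markup with two links, only one of which points at $file.
$html = '<html><body><a href="foo.htm">Click link1</a> '
      . '<a href="bar.htm">Other</a></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // suppress parser warnings
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// contains() is a substring test, so any href that includes $file matches.
foreach ($xpath->query("//a[contains(@href, '" . $file . "')]") as $node) {
    $node->setAttribute('class', 'curPage');
}

$result = $doc->saveHTML();
echo $result;
```

Because contains() matches substrings, an href such as "xfoo.htm" would also be selected. Comparing with @href = 'foo.htm' is stricter when the menu uses exact file names.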
Executing this code on the command line works fine. The part that does not work is hidden from us.
<?php
$file = "index";
$loadedMenu = 'whatever';
$loadedMenu=preg_replace("/a href=\"" . $file . ".htm\"/", "a href=\"" . $file . ".htm\" class=\"curPage\"", $loadedMenu);
echo $loadedMenu;
// whatever
?>
Perhaps the initial value of loadedMenu has single quotes instead of double, or a slightly different href value.
You could use a slightly more generic regex, capturing any filename instead of the specific file in your code...
$loadedMenu=preg_replace('/a href="(.+?)"/', 'a href="$1" class="curPage"', $loadedMenu);
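If the regex route is kept and only the current page's link should be marked, the file name can be escaped with preg_quote() so that the dot in ".htm" matches literally. A minimal sketch, assuming double-quoted href values. The menu string is hypothetical.

```php
<?php
$file = 'index';
// Hypothetical menu: the second href only looks similar to "index.htm".
$loadedMenu = '<ul><li><a href="index.htm">Home</a></li>'
            . '<li><a href="indexXhtm">Other</a></li></ul>';

// preg_quote() escapes regex metacharacters, including the "/" delimiter given here.
$pattern = '/a href="' . preg_quote($file . '.htm', '/') . '"/';

// $0 is the whole match, so the original text is kept and the class is appended.
$loadedMenu = preg_replace($pattern, '$0 class="curPage"', $loadedMenu);
echo $loadedMenu;
```

Without the escaping, the unescaped dot matches any character, so "indexXhtm" would also receive the class.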
