I load an external HTML page with loadHTML().
Then I replace two children and remove one.
The saveHTML() method changes things that I do not want changed.
It moves the closing </head> tag: it ends up earlier than on the original page, where the closing head tag comes further down, after a few more tags.
It also changes the body tag from <body class="something"> to just <body>.
How can I save the document using PHP DOM so that it keeps all the positioning and attributes?
Here is the code:
$document = new DOMDocument();
@$document->loadHTML($contents); // @ suppresses libxml warnings about invalid markup
$login_signup = $document->getElementById('loginBar')->getElementsByTagName('div')->item(1);
$login_signup->removeChild($login_signup->getElementsByTagName('h3')->item(0));
$todays_a = $document->createElement('a', 'Todays Digest');
$todays_a->setAttribute('href', $domain . $digest_newsletter . date('mdy') . '.html');
$previous_a = $document->createElement('a', 'Previous Digest');
$previous_a->setAttribute('href', $domain . $digest_newsletter . date('mdy', strtotime('-1 day')) . '.html');
$todays_div = $document->getElementById('myDiv');
$todays_div->replaceChild($todays_a, $todays_div->getElementsByTagName('script')->item(0));
$previous_div = $document->getElementById('myDiv2');
$previous_div->replaceChild($previous_a, $previous_div->getElementsByTagName('script')->item(0));
$contents = $document->saveHTML($document);
Related
simple_html_dom does not work in page "https://eldni.com/buscar-por-dni?dni=44626399"
<?php
include_once './simple_html_dom./HtmlWeb.php';
use simplehtmldom\HtmlWeb;
// get DOM from URL or file
$doc = new HtmlWeb();
$html = $doc->load('https://eldni.com/buscar-por-dni?dni=44626399');
foreach($html->find('td') as $e)
echo $e->plaintext . '<br>' . PHP_EOL;
?>
I want the plain text of the td cells in the table.
I'm looking to add a prefix to the href of every link on a page (for example, the link around "Some page") using PHP. I'll have the HTML code of a random website, so it's not as simple as using str_replace().
I've tried the approach from "Replacing anchor href value with regex", but that seems to just erase my entire page, and I get a blank white screen. Can anyone offer any help?
My code:
$html = file_get_contents(htmlentities($_GET['q'])); // Takes contents of website entered by user
$arr = array(); // Defines array
$html2 = ""; // Defines variable to write to later
$dom = new DOMDocument();
$dom->loadHTML($html); // Loads the HTML code displayed earlier
$domcss = $dom->getElementsByTagName('link');
foreach($domcss as $links) {
if( strtolower($links->getAttribute('rel')) == "stylesheet" ) {
$x = $links->getAttribute('href');
$html2 .= '<link rel="stylesheet" type="text/css" href="'.htmlentities($_GET['q']) . "/" . $x.'">';
}
} // This replaces all stylesheets from "./style.css", to "http://example.com/style.css"
echo $html2 . $html // Echos the entire webpage, with stylesheet links edited
To manipulate this with DOM, find the <a> tags and, if a tag has an href attribute, add the prefix to it. The end of this code just echoes the resulting HTML:
$dom = new DOMDocument();
$dom->loadHTML($html); // Loads the HTML code displayed earlier
$aTags = $dom->getElementsByTagName('a');
$prefix = "http://example.com?q=";
foreach($aTags as $links) {
$href = $links->getAttribute('href');
if( !empty($href)) {
$links->setAttribute("href", $prefix.$href);
}
}
echo $dom->saveHTML();
$prefix contains the bit you want to add to the URL.
I'm using Simple HTML DOM Parser to retrieve information from a website with this code:
$html = file_get_html("http://www.example.com/");
$table = $html->find("div[class=table]");
foreach ( $table as $tabella ) {
$title = $tabella->find (".elementTitle");
echo "<h2>" . $title[0] -> plaintext . "</h2>";
$minisito = $tabella->find ("h1[class=elementTitle] a");
echo "<p>" . $minisito[0] -> href . "</p>";
}
Now I need to extract other pieces of content from the URL contained in $minisito[0]->href.
How can I create another variable with file_get_html() to extract data from this new URL?
What I want is to add a canonical tag inside the <head> tag, if one does not already exist, using PHP.
I have tried this, but it didn't work:
function addtag(){
$doc = new DOMDocument();
$doc->loadHTMLFile('http://'. $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI']);
$body = $doc->getElementsByTagName('head')->item(0);
$link = $doc->createElement('link');
$newLink = $body->appendChild($link);
$newLink->setAttribute("canonical", 'http://');
//$linkText = $doc->createTextNode("Display Text For Link");
//$newLink->appendChild($linkText);
echo $doc->saveHTML();
}
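One thing that stands out in the attempt above is that setAttribute("canonical", 'http://') creates an attribute named canonical, whereas a canonical tag is normally written as a link element with rel="canonical" and an href. Below is a minimal sketch of that idea, not a definitive fix. It assumes the page HTML is already available as a string; the helper name addCanonical, the sample page and the example.com URL are made up for illustration.

```php
<?php
// Hypothetical helper: add <link rel="canonical"> to <head> only if none exists.
function addCanonical(string $html, string $canonicalUrl): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $head = $doc->getElementsByTagName('head')->item(0);
    if ($head === null) {
        return $html;                   // no <head> to add the tag to
    }

    // Return the input untouched if a canonical link is already present.
    foreach ($head->getElementsByTagName('link') as $link) {
        if (strtolower($link->getAttribute('rel')) === 'canonical') {
            return $html;
        }
    }

    $link = $doc->createElement('link');
    $link->setAttribute('rel', 'canonical');
    $link->setAttribute('href', $canonicalUrl);
    $head->appendChild($link);

    return $doc->saveHTML();
}

// Assumed sample input, for illustration only.
$page = '<!DOCTYPE html><html><head><title>Demo</title></head><body><p>Hi</p></body></html>';
$result = addCanonical($page, 'http://example.com/page');
echo $result;
```

Calling the helper a second time on its own output should leave the markup unchanged, because the loop finds the existing rel="canonical" link first.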
I'm trying to use a regex in PHP to add a "curPage" class to my ul/li menu before sending it to the browser.
Code:
$loadedMenu=preg_replace("/a href=\"" . $file . ".htm\"/", "a href=\"" . $file . ".htm\" class=\"curPage\"", $loadedMenu);
Content of $loadedMenu:
<nav><ul><li rel="file" id="516054b57fbba">דף הבית</li><li rel="file" id="51681f81a440b">משתמש חדש</li><li rel="file" id="516054b57fb40">אודות</li><li rel="file" id="5160f37b822a3">דף חדש</li><li rel="folder" id="516054d162176">תיקייה חדשה<ul><li rel="file" id="516054b57fc62">מיטל הנסיכה שלי</li><li rel="file" id="516054b57fc9a">נסיון</li><li rel="file" id="516054b57fb82">עזרה</li></ul></li><li rel="folfil" id="5160552162177">תיקיית תוכן<ul><li rel="file" id="516054b57fbf2">test</li></ul></li><li rel="file" id="516054b57fc2a">גלידה</li><li rel="file" id="516054b57fcd2">נסיון0</li></ul></nav>
Parsing HTML text with a regex the way you're doing it is so error-prone that it is almost a no-no.
It is better to use a DOM parser to parse and modify the HTML, as in this code:
$file = 'foo.htm'; // set your value here
# fetch your HTML content here
$html = <<< EOF
<html>
Click link1 morestuff
Click www.example.com morestuff
notexample.com morestuff
Click link1
</html>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
// find all hrefs with $file in it
$nodelist = $xpath->query("//a[contains(@href, '" . $file . "')]");
// iterate thru found links
for($i=0; $i < $nodelist->length; $i++) {
$node = $nodelist->item($i);
# add class attribute to them
$node->setAttribute('class', 'curPage');
}
echo $doc->saveHTML();
OUTPUT:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>Click link1 morestuff
Click www.example.com morestuff
notexample.com morestuff
Click link1
</body></html>
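For a compact illustration of the same DOM approach, here is a sketch run against a small menu. The menu markup and the file name about.htm are hypothetical, and the XPath test uses an exact match on href rather than contains(), which is a variation on the code above. Passing a node to saveHTML() prints just that element instead of the whole wrapped document.

```php
<?php
// Hypothetical menu markup and file name, for illustration only.
$file = 'about.htm';
$menu = '<nav><ul>'
      . '<li><a href="index.htm">Home</a></li>'
      . '<li><a href="about.htm">About</a></li>'
      . '</ul></nav>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // silence libxml warnings (e.g. about <nav>)
$doc->loadHTML($menu);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select anchors whose href equals $file exactly and mark them as current.
foreach ($xpath->query('//a[@href="' . $file . '"]') as $a) {
    $a->setAttribute('class', 'curPage');
}

// Serialize only the <nav> element, not the doctype/html/body wrapper.
$nav = $doc->getElementsByTagName('nav')->item(0);
$out = $doc->saveHTML($nav);
echo $out;
```

If $file can contain quote characters, it would need escaping before being placed inside the XPath expression; the sketch leaves that out.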
Executing this code on the command line works fine. The part that does not work is hidden from us.
<?php
$file = "index";
$loadedMenu = 'whatever';
$loadedMenu=preg_replace("/a href=\"" . $file . ".htm\"/", "a href=\"" . $file . ".htm\" class=\"curPage\"", $loadedMenu);
echo $loadedMenu;
// whatever
?>
Perhaps the initial value of $loadedMenu has single quotes instead of double quotes, or a slightly different href value.
You could use a slightly more generic regex that captures any filename instead of the specific file in your code:
$loadedMenu=preg_replace('/a href="(.+?)"/', 'a href="$1" class="curPage"', $loadedMenu);
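Note that the generic pattern adds the class to every link it matches, not only to the current page. If the goal is still to mark a single file, one option is to keep the specific pattern but pass the file name through preg_quote(), so that the dot in ".htm" is matched literally. The sketch below uses made-up menu markup and assumes $file holds the name without its extension, as in the question.

```php
<?php
// Hypothetical values, for illustration only.
$file = 'index';
$loadedMenu = '<li><a href="index.htm">Home</a></li>'
            . '<li><a href="indexXhtm">Other</a></li>';

// preg_quote() escapes regex metacharacters, so "." only matches a literal dot.
$pattern = '/a href="' . preg_quote($file . '.htm', '/') . '"/';

// $0 is the whole match, so the original href is kept and the class is appended.
$result = preg_replace($pattern, '$0 class="curPage"', $loadedMenu);
echo $result;
```

With the unescaped pattern from the question, the second link (indexXhtm) would also match, because an unescaped dot matches any character.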