PHP DomDocument editing all links - php

I am using the following code to grab html from another page and place it into my php page:
$doc = new DomDocument;
// We need to validate our document before referring to the id
$doc->validateOnParse = true;
$doc->loadHtml(file_get_contents('{URL IS HERE}'));
$content = $doc->getElementById('form2');
echo $doc->SaveHTML($content);
I want to change every link such as <a href="/somepath/file.htm"> so that the actual domain is prepended to it. How can I do this?
In other words, the links would need to become <a href="http://mydomain.com/somepath/file.htm">.

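Try something like this. The ltrim() call keeps a root-relative href such as /somepath/file.htm from ending up with a double slash after the domain:
$xml = new DOMDocument();
$xml->loadHTMLFile($url);
foreach ($xml->getElementsByTagName('a') as $link) {
    $oldLink = $link->getAttribute('href');
    // Join domain and path with exactly one slash
    $link->setAttribute('href', 'http://mydomain.com/' . ltrim($oldLink, '/'));
}
echo $xml->saveHTML();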
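
The loop above rewrites every href, including links that are already absolute. A minimal sketch of a more selective variant follows, assuming only root-relative paths (those starting with a single slash) should change. The function name prefixRelativeLinks and the sample markup are made up for illustration:

```php
<?php
// Hypothetical helper: prefix root-relative hrefs with a base URL and
// leave absolute or protocol-relative links untouched.
function prefixRelativeLinks(string $html, string $base): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        // Match "/somepath/file.htm" but not "//cdn.example/x" or "http://..."
        if (strncmp($href, '/', 1) === 0 && strncmp($href, '//', 2) !== 0) {
            $link->setAttribute('href', rtrim($base, '/') . $href);
        }
    }
    return $doc->saveHTML();
}

echo prefixRelativeLinks(
    '<p><a href="/somepath/file.htm">one</a> <a href="http://other.example/x">two</a></p>',
    'http://mydomain.com'
);
```

Relative paths without a leading slash (for example file.htm) are deliberately left alone here, because resolving them correctly depends on the URL of the page they came from.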

Related

Bulk internal links redirect method in static site

I have a static site consisting of 1500+ pages. Each of those HTML pages has an internal link (I mean one coded on the page itself) that I want to replace with a new one.
However, editing all those pages by hand is not practical because there are so many of them.
Here is an example:
Say I have a page www.example.com/page.html
There is a hyperlink on that page which directs to www.abc.com
I want this link replaced with (or redirected to) www.xyz.com instead of www.abc.com.
I already have a MySQL table that holds all the current links and the new links in two columns.
Is there a way to redirect all these links using the .htaccess file, PHP and MySQL? If so, what would the code look like?
I found a way to change all the links using this script.
<?php
require_once("connection.inc.php");
function get_value($file){
//$html = file_get_contents($file,true);
$dom = new DOMDocument('1.0');
$dom->loadHTMLFile($file);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a/@href');
foreach($nodes as $href) {
return $href->nodeValue; // echo current attribute value
//$href->nodeValue = 'new value'; // set new attribute value
//$href->parentNode->removeAttribute('href'); // remove attribute
}
}
function put_value($file,$pattern){
//$html = file_get_contents($file,true);
$dom = new DOMDocument('1.0');
$dom->loadHTMLFile($file);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a');
$nodes->item(0)->setAttribute("href",$pattern);
$dom->saveHTMLFile($file); // save() would write the document out as XML
}
$dir = new RecursiveDirectoryIterator(__DIR__);
$flat = new RecursiveIteratorIterator($dir);
$files = new RegexIterator($flat, '/\.html$/i');
foreach($files as $file) {
$value=get_value($file);
$query=mysql_query("SELECT new_links FROM site WHERE current_links='$value'");
if(mysql_num_rows($query)==1){
$value_orginal = mysql_result($query,0);
put_value($file,$value_orginal);
echo "The link in ".$file." has changed from ".$value." to ".$value_orginal."<br>";
}
else{
die("die baby die :)");
}
}
?>
You can use .htaccess redirection. See:
Redirect to a Different URL using .htaccess
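
For the rewrite-in-place approach the question describes, here is a minimal sketch. It assumes the old-to-new pairs have already been read from the database into a PHP array. The function name replaceLinks, the $map array and the sample markup are hypothetical, and the database lookup itself is left out:

```php
<?php
// Hypothetical helper: rewrite every <a href> that appears as a key in $map
// (old URL => new URL) and return the updated markup.
function replaceLinks(string $html, array $map): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@href]') as $link) {
        $old = $link->getAttribute('href');
        if (isset($map[$old])) {
            $link->setAttribute('href', $map[$old]);
        }
    }
    return $doc->saveHTML();
}

// Assumed mapping, standing in for the two-column MySQL table.
$map = ['http://www.abc.com' => 'http://www.xyz.com'];
echo replaceLinks('<p><a href="http://www.abc.com">partner</a></p>', $map);
```

Unlike the script in the question, this touches every matching link on a page rather than only the first one, and it leaves pages without a match untouched instead of stopping.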

Retrieve data from html page using xpath and php

I know there are similar questions, but while studying PHP I ran into this problem and I want to understand why it occurs.
<?php
$url = 'http://aice.anie.it/quotazione-lme-rame/';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTML($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tbody/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
echo $n->nodeValue . "\n";
}
?>
This prints only "hello!". I want to print the value extracted with the XPath query, but the last echo doesn't output anything.
Your code has a few errors:
You try to get the table from the URL http://aice.anie.it/quotazione-lme-rame/, but it is actually in an iframe located at http://www.aiceweb.it/it/frame_rame.asp, so fetch the iframe URL directly.
You use loadHTML(), which loads an HTML string. What you need is loadHTMLFile(), which takes the path or URL of an HTML document as its parameter (see http://www.php.net/manual/fr/domdocument.loadhtmlfile.php).
You assume there is a tbody element on the page, but there isn't one, so remove it from your query.
Working code:
$url = 'http://www.aiceweb.it/it/frame_rame.asp';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTMLFile($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
echo $n->nodeValue . "\n";
}
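
The tbody point can be checked offline. The sketch below uses made-up inline markup in place of the remote page (the cell values are invented) and runs the query both with and without the tbody step. DOMDocument keeps the tree as written, whereas browser developer tools show a tbody that the browser inserted itself:

```php
<?php
// Inline markup standing in for the remote page; the values are invented.
$markup = '<table id="table33">'
        . '<tr><td>a</td><td>b</td><td><b>header</b></td></tr>'
        . '<tr><td>c</td><td>d</td><td><b>123.45</b></td></tr>'
        . '</table>';

$html = new DOMDocument();
$html->loadHTML($markup); // loadHTML() takes markup; loadHTMLFile() takes a path or URL
$xpath = new DOMXPath($html);

// No tbody exists in the parsed tree, so the first query matches nothing.
$withTbody    = $xpath->query("//*[@id='table33']/tbody/tr[2]/td[3]/b");
$withoutTbody = $xpath->query("//*[@id='table33']/tr[2]/td[3]/b");

echo "with tbody: " . $withTbody->length . "\n";
echo "without tbody: " . $withoutTbody->length . "\n";
foreach ($withoutTbody as $n) {
    echo $n->nodeValue . "\n";
}
```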

DOMDocument grab html between two p tags [duplicate]

I'm trying to replace video links inside a string - here's my code:
$doc = new DOMDocument();
$doc->loadHTML($content);
foreach ($doc->getElementsByTagName("a") as $link)
{
$url = $link->getAttribute("href");
if(strpos($url, ".flv"))
{
echo $link->outerHTML();
}
}
Unfortunately, outerHTML doesn't work when I'm trying to get the html code for the full hyperlink like <a href='http://www.myurl.com/video.flv'></a>
Any ideas how to achieve this?
As of PHP 5.3.6 you can pass a node to saveHtml, e.g.
$domDocument->saveHtml($nodeToGetTheOuterHtmlFrom);
Previous versions of PHP did not implement that possibility. You'd have to use saveXml(), but that would create XML compliant markup. In the case of an <a> element, that shouldn't be an issue though.
See http://blog.gordon-oheim.biz/2011-03-17-The-DOM-Goodie-in-PHP-5.3.6/
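
Applied to the loop from the question, that might look like the following sketch. The sample markup in $content is made up, and PHP 5.3.6 or later is assumed:

```php
<?php
// Made-up sample markup with one video link and one ordinary link.
$content = '<p><a href="http://www.myurl.com/video.flv">clip</a> '
         . '<a href="http://www.myurl.com/page.html">page</a></p>';

$doc = new DOMDocument();
$doc->loadHTML($content);

$matches = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $url = $link->getAttribute('href');
    // Compare against false explicitly: strpos() returns 0 for a match at the start.
    if (strpos($url, '.flv') !== false) {
        $matches[] = $doc->saveHTML($link); // serializes only this node, tags included
    }
}
echo implode("\n", $matches), "\n";
```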
You can find a couple of suggestions in the user notes of the DOM section of the PHP Manual.
For example, here's one posted by xwisdom:
<?php
// code taken from the Raxan PDI framework
// returns the html content of an element
protected function nodeContent($n, $outer=false) {
$d = new DOMDocument('1.0');
$b = $d->importNode($n->cloneNode(true),true);
$d->appendChild($b); $h = $d->saveHTML();
// remove outer tags
if (!$outer) $h = substr($h,strpos($h,'>')+1,-(strlen($n->nodeName)+4));
return $h;
}
?>
One option is to define your own function that returns the outer HTML:
function outerHTML($e) {
$doc = new DOMDocument();
$doc->appendChild($doc->importNode($e, true));
return $doc->saveHTML();
}
Then you can use it in your code:
echo outerHTML($link);
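Save a file containing the links as links.html, or change links.html in the code to a URL such as google.com/fly.html that has .flv links in it. Change .flv to .wmv or whichever extension you want.
The query matches every href attribute, so if other elements carry an href it will pick them up as well.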
<?php
$contents = file_get_contents("links.html");
$domdoc = new DOMDocument();
$domdoc->preserveWhiteSpace = false; // boolean false, not the string "false"
$domdoc->loadHTML($contents);
$xpath = new DOMXPath($domdoc);
// Select every href attribute in the document, whatever element it is on
$nodeList = $xpath->query('//@href');
foreach ($nodeList as $node) {
    // Compare with false: strpos() returns 0 when the match is at the start
    if (strpos($node->nodeValue, ".flv") !== false) {
        echo "<a href='" . $node->nodeValue . "'>" . $node->nodeValue . "</a><br />";
    }
}
echo "done";
?>

Get link from html by php

How do I get this link <li><a rel="prev" href="/1149/" accesskey="p">< Prev</a></li> from an html document using PHP? How do I get the link by the "rel"?
I'm trying to get /1149/
Trying to understand what you want… If you want to take an HTML/XML input and grab the href value of a link with the attribute rel="prev" I'd suggest using DOMXpath, something like:
$html = '<li><a rel="prev" href="/1149/" accesskey="p">&lt; Prev</a></li>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select only the anchors whose rel attribute is "prev"
foreach ($xpath->query("//a[@rel='prev']") as $node) {
    if ($node->hasAttribute('href')) {
        echo $node->getAttribute('href') . '<br>';
    }
}

Echoing only a div with php

I'm attempting to make a script that echoes only the div that encloses the image on Google.
$url = "http://www.google.com/";
$page = file($url);
foreach($page as $theArray) {
echo $theArray;
}
The problem is that this echoes the whole page.
I want to echo only the part between <div id="lga"> and the next closest </div>.
Note: I tried using if statements, but they weren't working, so I deleted them.
Thanks
Use the built-in DOM methods:
<?php
$page = file_get_contents("http://www.google.com");
$domd = new DOMDocument();
libxml_use_internal_errors(true);
$domd->loadHTML($page);
libxml_use_internal_errors(false);
$domx = new DOMXPath($domd);
$lga = $domx->query("//*[@id='lga']")->item(0);
$domd2 = new DOMDocument();
$domd2->appendChild($domd2->importNode($lga, true));
echo $domd2->saveHTML();
In order to do this you need to parse the DOM and then get the ID you are looking for. Check out a parsing library like this http://simplehtmldom.sourceforge.net/manual.htm
After feeding your html document into the parser you could call something like:
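$html = str_get_html($page);
// Passing an index makes find() return a single element instead of an array
$element = $html->find('div[id=lga]', 0);
// outertext is the div with its markup; plaintext would give only the text
echo $element->outertext;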
That, I think, would be your quickest and easiest solution.
