Change 'href' value of a link using PHP and DOM - php

I would like to change all links in an HTML variable to random ones. Here is my code but something prevents links from being changed:
<?php
$jobTemplateDetails = 'Click!
Click!';
////////////////////// CHANGE ALL LINKS
$linkDom = new DOMDocument;
#$linkDom->loadHTML($jobTemplateDetails);
$allLinks = $linkDom->getElementsByTagName('a');
foreach ($allLinks as $rawLink) {
$longLink = $rawLink->getAttribute('href');
$str = 'abcdefghijklmnopqrstuvwxyz';
$randomChar1 = $str[mt_rand(0, strlen($str)-1)];
$randomChar2 = $str[mt_rand(0, strlen($str)-1)];
$randomChar3 = $str[mt_rand(0, strlen($str)-1)];
$randomChar4 = $str[mt_rand(0, strlen($str)-1)];
$shortURL = mt_rand(1, 9).$randomChar1.mt_rand(1, 9).$randomChar2.$randomChar3.$randomChar4;
$rawLink->setAttribute('href', $shortURL);
}
echo $jobTemplateDetails;

When you echo $jobTemplateDetails you print the original input string, not the DOMDocument you modified.
Change that line to:
echo $linkDom->saveHTML();
///OUTPUT:
Click!
Click!
A fiddle: https://3v4l.org/KuCic
See also the docs for DOMDocument::saveHTML().
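To make the point above concrete, here is a minimal sketch. The input HTML and its example.com URLs are placeholders I made up, since the original hrefs are not shown in the question, and it assumes the loadHTML() call is active rather than commented out. It shows that the input string never changes and that the edited markup only exists in the DOMDocument until you serialize it:

```php
<?php
// Hypothetical input: the real hrefs from the question are unknown.
$jobTemplateDetails = '<p><a href="http://example.com/one">Click!</a> '
    . '<a href="http://example.com/two">Click!</a></p>';

$linkDom = new DOMDocument;
// loadHTML() has to run, otherwise the document is empty and no <a> is found.
$linkDom->loadHTML($jobTemplateDetails);

$alphabet = 'abcdefghijklmnopqrstuvwxyz';
foreach ($linkDom->getElementsByTagName('a') as $rawLink) {
    // Build a random six-letter replacement href.
    $shortURL = '';
    for ($i = 0; $i < 6; $i++) {
        $shortURL .= $alphabet[mt_rand(0, strlen($alphabet) - 1)];
    }
    $rawLink->setAttribute('href', $shortURL);
}

// The input string is untouched; the changes live in the DOMDocument.
$changedHtml = $linkDom->saveHTML();
echo $changedHtml;
```

Note that saveHTML() without arguments also emits the doctype and the html/body wrappers that loadHTML() added.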

Related

str_replace not replacing my h2 tag when using dom outerhtml

I have the code below, which parses the HTML using DOM.
The parsing works, but I can't get the str_replace to work.
When I var_dump both $old_h2 and $new_h2 the output is correct, but the string length is incorrect. Could this be why the replacement is not happening?
When I echo the content, nothing has changed.
I'm not sure how to debug this.
Is there a way to know whether the variables are being passed to str_replace?
<?php
$editorStr = strtr(get_sub_field('editor') , $arr);
if (get_row_layout() == 'condensed title_&_description')
{
$dom = new DOMDocument();
$dom->loadHTML(mb_convert_encoding($editorStr, 'HTML-ENTITIES', 'UTF-8'));
$dom->encoding = 'utf-8';
$h2tags = $dom->getElementsByTagName('h2');
if ($h2tags->length > 0)
{
$title = $h2tags->item(0)->textContent;
}
for($i=0;$i<=$dom->getElementsByTagName("h2")->length;$i++)
{
$h2 = $dom->getElementsByTagName("h2")->item($i);
$old_h2 = outerHTML($h2);
$h2->setAttribute('class', 'content-title');
$h2->setAttribute('data-title', 'title-' . ($counter + 1));
$new_h2 = outerHTML($h2);
$newacfcontent = str_replace( $old_h2,$new_h2,$editorStr);
echo $newacfcontent;
}
}
?>

PHP: Remove a hyperlink from element but retain the text and class

I need to process a DOM and remove all hyperlinks to a particular site while retaining the underlying text. Thus, something like linked text changes into plain text. Taking a cue from this thread, I wrote this:
$as = $dom->getElementsByTagName('a');
for ($i = 0; $i < $as->length; $i++) {
$node = $as->item($i);
$link_href = $node->getAttribute('href');
if (strpos($link_href,'offendinglink.com') !== false) {
$cl = $node->getAttribute('class');
$text = new DomText($node->nodeValue);
$node->parentNode->insertBefore($text, $node);
$node->parentNode->removeChild($node);
$i--;
}
}
This works fine except that I also need to retain the class attributed to the offending <a> tag and maybe turn it into a <div> or a <span>. Thus, I need this:
text
to turn into this:
<div class="nice">text</div>
How do I access the new element after it's been added (like in my code snippet)?
Regarding "How do I access the new element after it's been added (like in my code snippet)?": your element is in $text, I think. Anyway, this should work if you need to keep the class and the text content but nothing else:
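// copy the live node list to an array first, so removing nodes inside the loop does not skip links
foreach(iterator_to_array($dom->getElementsByTagName('a')) as $url){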
if(parse_url($url->getAttribute("href"),PHP_URL_HOST)!=='badsite.com') {
continue;
}
$ele = $dom->createElement("div");
$ele->textContent = $url->textContent;
$ele->setAttribute("class",$url->getAttribute("class"));
$url->parentNode->insertBefore($ele,$url);
$url->parentNode->removeChild($url);
}
Tested solution:
<?php
$str = "<b>Dummy</b> <a href='http://google.com' target='_blank' class='nice' id='nicer'>Google.com</a> <a href='http://yandex.ru' target='_blank' class='nice' id='nicer'>Yandex.ru</a>";
$doc = new DOMDocument();
$doc->loadHTML($str);
$anchors = $doc->getElementsByTagName('a');
$l = $anchors->length;
for ($i = 0; $i < $l; $i++) {
$anchor = $anchors->item(0);
$link = $doc->createElement('div', $anchor->nodeValue);
$link->setAttribute('class', $anchor->getAttribute('class'));
$anchor->parentNode->replaceChild($link, $anchor);
}
echo preg_replace(['/^\<\!DOCTYPE.*?<html><body>/si', '!</body></html>$!si'], '', $doc->saveHTML());
Or see runnable.
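Combining the two answers above, a minimal sketch of the filtered variant could look like this. The sample markup and the choice of a span are my assumptions; only offendinglink.com and the class "nice" come from the question. It replaces only the links to the unwanted host and keeps the class:

```php
<?php
// Hypothetical sample markup for illustration.
$html = '<p><a href="http://offendinglink.com/page" class="nice">text</a> and '
    . '<a href="http://example.org/" class="ok">kept</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Snapshot the live node list so replacing nodes does not disturb the iteration.
foreach (iterator_to_array($dom->getElementsByTagName('a')) as $anchor) {
    $host = parse_url($anchor->getAttribute('href'), PHP_URL_HOST);
    if ($host !== 'offendinglink.com') {
        continue; // leave every other link alone
    }
    $span = $dom->createElement('span');
    // createTextNode() escapes special characters such as "&" safely.
    $span->appendChild($dom->createTextNode($anchor->textContent));
    if ($anchor->hasAttribute('class')) {
        $span->setAttribute('class', $anchor->getAttribute('class'));
    }
    $anchor->parentNode->replaceChild($span, $anchor);
}

// Serialize only the <body> subtree to skip the doctype wrapper.
$result = $dom->saveHTML($dom->getElementsByTagName('body')->item(0));
echo $result;
```

The new element is simply the variable you created ($span here), so you can keep modifying it after it has been inserted.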

Extracting specific text from HTML texts

I am not so familiar with regex. I am trying to obtain the results described at the bottom. Here is what I have done so far (note that $page contains tabulators):
$page = "<div class=\"title-container\">
<h1>Text here<span> /Sub-text/</span> </h1>
</div>";
// TITLE
preg_match_all ('/<h1>(.*)<\/h1>/U', $page, $out);
$hutitle = preg_replace("#<span>(.*)<\/span>\s#", "", $out[1][0]);
$entitle = preg_replace("'(.*)<span> /'", "", $out[1][0]);
I would like to get this:
$hutitle = "Text here";
$entitle = "Sub-text"; (Without html and "/")
I'd suggest using DOM with trim, no need for regex. Here is working code for your concrete case:
$page = "<div class=\"title-container\">\n <h1>Text here<span> /Sub-text/</span> </h1>\n </div>";
$dom = new DOMDocument;
$dom->loadHTML($page);
$hs = $dom->getElementsByTagName('h1');
foreach ($hs as $h) {
$enttitlenodes = $h->getElementsByTagName('span');
if ($enttitlenodes->length > 0 && $enttitlenodes->item(0)->tagName == 'span')
{
$entitle = trim($enttitlenodes->item(0)->nodeValue, " /");
echo $entitle . "\n";
$h->removeChild($enttitlenodes->item(0));
}
$hutitle = $h->nodeValue;
echo $hutitle;
}
See IDEONE demo
try this
<h1>(.*?)<span> /(.*?)/</span>
$1 and $2 are the results as you expected.
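For completeness, a minimal sketch of how that pattern could be used from PHP. The `#` delimiters, the `s` modifier and the trim() calls are my additions; the pattern itself is the one suggested above:

```php
<?php
// Same markup as in the question, with a tab before the <h1>.
$page = "<div class=\"title-container\">\n\t<h1>Text here<span> /Sub-text/</span> </h1>\n</div>";

// "#" is used as the delimiter so the slashes in the pattern need no escaping.
if (preg_match('#<h1>(.*?)<span> /(.*?)/</span>#s', $page, $m)) {
    $hutitle = trim($m[1]); // text before the <span>
    $entitle = trim($m[2]); // text between the two slashes
    echo $hutitle, "\n", $entitle, "\n";
}
```

This only holds as long as the markup keeps exactly this shape; the DOM approach is more tolerant of changes.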

Extracting multiple strong tags using PHP Simple HTML DOM Parser

I have over 500 pages (static) containing content structures this way,
<section>
Some text
<strong>Dynamic Title (Different on each page)</strong>
<strong>Author name (Different on each page)</strong>
<strong>Category</strong>
(<b>Content</b> <b>MORE TEXT HERE)</b>
</section>
And I need to extract the data as formatted below, using PHP Simple HTML DOM Parser
$title = <strong>Dynamic Title (Different on each page)</strong>
$author = <strong>Author name (Different on each page)</strong>
$category = <strong>Category</strong>
$content = (<b>Content</b> <b>MORE TEXT HERE</b>)
I have failed so far and can't get my head around it, appreciate any advice or code snippet to help me going on.
EDIT 1,
I have now solved the part with strong tags using,
$html = file_get_html($url);
$links = array();
foreach($html->find('strong') as $a) {
$content[] = $a->innertext;
}
$title= $content[0];
$author= $content[1];
The only remaining issue: how do I extract the content within the parentheses? Can a similar method be used?
OK, first you want to get all of the <section> tags.
Then you want to search through those again for the <strong> tags and <b> tags.
Something like this:
// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');
// Find all <section> elements
foreach($html->find('section') as $section) {
$strong = array();
// collect the inner HTML of each <strong> tag inside this <section>
foreach($section->find('strong') as $tag) {
$strong[] = $tag->innertext;
}
$title = $strong[0];
$author = $strong[1];
$category = $strong[2];
}
To get the parts in parentheses - just get the b tag text and then add the () brackets.
Or if you're asking how to get parts in between the brackets - use explode then remove the closing bracket:
$pieces = explode("(", $title);
$different_on_each_page = str_replace(")","",$pieces[1]);
$html_code = 'html';
$dom = new \DOMDocument();
$dom->LoadHTML($html_code);
$xpath = new \DOMXPath($dom);
$nodelist = $xpath->query("//strong");
for($i = 0; $i < $nodelist->length; $i++){
$nodelist->item($i)->nodeValue; //gives you the text inside
}
My final code that works now looks like this.
$html = file_get_html($url);
$links = array();
foreach($html->find('strong') as $a) {
$content[] = $a->innertext;
}
$title= $content[0];
$author= $content[1];
$category = $content[2];
$details = file_get_html($url)->plaintext;
$input = $details;
preg_match_all("/\(.*?\)/", $input, $matches);
print_r($matches[0]);
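If you would rather avoid the third-party parser, the same extraction can be sketched with PHP's built-in DOMDocument and DOMXPath. The sample markup is a shortened stand-in for one of the pages, and the variable names are mine:

```php
<?php
// Shortened, hypothetical stand-in for one of the static pages.
$htmlCode = '<section>Some text <strong>Dynamic Title</strong> '
    . '<strong>Author name</strong> <strong>Category</strong> '
    . '(<b>Content</b> <b>MORE TEXT HERE</b>)</section>';

// Older libxml2 builds warn about HTML5 tags such as <section>; collect the warnings quietly.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($htmlCode);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Text of every <strong> inside a <section>, in document order.
$strong = [];
foreach ($xpath->query('//section//strong') as $node) {
    $strong[] = trim($node->textContent);
}
list($title, $author, $category) = $strong;

// Plain text of the section, then everything between parentheses.
$sectionText = $xpath->query('//section')->item(0)->textContent;
preg_match_all('/\((.*?)\)/s', $sectionText, $matches);
$content = $matches[1]; // captured text without the surrounding brackets

print_r([$title, $author, $category, $content]);
```

Note that textContent drops the inner tags; if you need the <b> markup itself, serialize the nodes with saveHTML($node) instead.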

Find and replace all links in a web page using php/javascript

I need to find the links in part of some HTML code and replace each of them with one of two absolute (base) domains followed by the link as it appears on the page.
I have found a lot of ideas and tried a lot of different solutions, but luck isn't on my side on this one. Please help me out!
Thank you!!
This is my code:
<?php
$url = "http://www.oxfordreference.com/views/SEARCH_RESULTS.html?&q=android";
$raw = file_get_contents($url);
$newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
$content = str_replace($newlines, "", html_entity_decode($raw));
$start = strpos($content,'<table class="short_results_summary_table">');
$end = strpos($content,'</table>',$start) + 8;
$table = substr($content,$start,$end-$start);
echo "{$table}";
$dom = new DOMDocument();
$dom->loadHTML($table);
$dom->strictErrorChecking = FALSE;
// Get all the links
$links = $dom->getElementsByTagName("a");
foreach($links as $link) {
$href = $link->getAttribute("href");
echo "{$href}";
if (strpos("http://oxfordreference.com", $href) == -1) {
if (strpos("/views/", $href) == -1) {
$ref = "http://oxfordreference.com/views/"+$href;
}
else
$ref = "http://oxfordreference.com"+$href;
$link->setAttribute("href", $ref);
echo "{$link->getAttribute("href")}";
}
}
$table12 = $dom->saveHTML;
preg_match_all("|<tr(.*)</tr>|U",$table12,$rows);
echo "{$rows[0]}";
foreach ($rows[0] as $row){
if ((strpos($row,'<th')===false)){
preg_match_all("|<td(.*)</td>|U",$row,$cells);
echo "{$cells}";
}
}
?>
When I run this code I get an "htmlParseEntityRef: expecting ';'" warning for the line where I load the HTML.
var links = document.getElementsByTagName("a"); will get you all the links.
And this will loop through them:
for(var i = 0; i < links.length; i++)
{
links[i].href = "newURLHERE";
}
You could use jQuery, which is excellent for link replacement. Rather than explaining it here, please look at this answer:
How to change the href for a hyperlink using jQuery
I recommend scrappedcola's answer, but if you don't want to do it on the client side you can use a regex to replace:
ob_start();
//your HTML
//end of the page
$body=ob_get_clean();
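// keep the opening part of the tag ($1), swap only the quoted href value, and assign the result back
$body = preg_replace('/(<a\b[^>]*\bhref=)"[^"]*"/', '$1"NewURL"', $body);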
echo $body;
You can use referencing (\$1) or callback version to modify output as you like.
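If you want to stay on the server side with DOM, as in the question, here is a minimal sketch. The sample table is made up; the base URL and the /views/ prefix are taken from the question's code. It addresses the issues I can see there: strpos() takes the haystack first and returns false (never -1) when nothing is found, strings are concatenated with `.` rather than `+`, saveHTML is a method and needs parentheses, and the "htmlParseEntityRef" warning comes from bare `&` characters, which libxml can be told to collect instead of print:

```php
<?php
// Hypothetical fragment standing in for the fetched results table.
$table = '<table><tr>'
    . '<td><a href="/views/ENTRY.html?a=1&b=2">one</a></td>'
    . '<td><a href="ENTRY2.html">two</a></td>'
    . '<td><a href="http://oxfordreference.com/views/x.html">three</a></td>'
    . '</tr></table>';

// Collect parse warnings (such as the one for the bare "&") instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($table);
libxml_clear_errors();

$base = 'http://oxfordreference.com';
foreach ($dom->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    if (strpos($href, $base) === 0) {
        continue; // already absolute
    }
    if (strpos($href, '/views/') === 0) {
        $ref = $base . $href;                          // path already contains /views/
    } else {
        $ref = $base . '/views/' . ltrim($href, '/');  // bare file name
    }
    $link->setAttribute('href', $ref);
}

$table12 = $dom->saveHTML(); // method call, with parentheses
echo $table12;
```

The row and cell extraction that follows in the question could then use getElementsByTagName('tr') and getElementsByTagName('td') on the same document instead of a second round of regexes.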
