How to turn certain words in HTML into links (PHP)

I have this HTML as an example:
this is some html code, and this is html
this is image <img src="any url with html word" alt="html" />
<iframe src="html"></iframe>
<script type="text/javascript">
var html = "any thing here";
var x = "this is html"
</script>
I want a way to wrap every occurrence of the word html in a link.
As we can see, the word may also appear inside an HTML tag attribute or inside a script. Those occurrences must be excluded; the word should be replaced only when it is plain text inside a span, p, or div.
I tried several DOM approaches without success:
$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
$query_entries = $xpath->evaluate("(//div | //span | //p)[not(ancestor::a)]/text()");
foreach ($query_entries as $element) {
    if ($element instanceof DOMText) {
        $element->nodeValue = str_replace('html','html',$element->nodeValue);
    }
}
When I replace the nodeValue with HTML, it gets escaped, and if I try to decode it afterwards, it breaks the JavaScript code.
Any regex solution?
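One possible DOM-based approach, rather than a regex, is to split each matching text node and build real `<a>` elements, so nothing has to be escaped or decoded. The sketch below is only an illustration: the function name `linkWord`, the target URL `https://example.com/`, and the wrapper `<div>` are assumptions, not part of the question. It skips text inside `a`, `script`, and `style`, and it never touches attributes because only text nodes are selected.

```php
<?php
// Hypothetical helper: wrap each whole-word occurrence of $word found in
// plain text nodes with <a href="$url">. Attributes and scripts are untouched.
function linkWord(string $html, string $word, string $url): string
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $dom = new DOMDocument();
    // The wrapper <div> gives the fragment a single root element.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(
        '//text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::style)]'
    );
    $pattern = '/\b(' . preg_quote($word, '/') . ')\b/i';

    foreach (iterator_to_array($nodes) as $text) {
        // With DELIM_CAPTURE, odd indexes hold the matched word.
        $parts = preg_split($pattern, $text->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // the word does not occur in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $dom->createElement('a');
                $a->setAttribute('href', $url);
                $a->appendChild($dom->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $text->parentNode->replaceChild($fragment, $text);
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$sample = '<p>this is some html code, and this is html</p>'
        . '<img src="any-url-with-html-word" alt="html" />'
        . '<script type="text/javascript">var x = "this is html";</script>';
echo linkWord($sample, 'html', 'https://example.com/');
```

Note that `loadHTML` assumes ISO-8859-1 input unless told otherwise, so non-ASCII content may need an encoding hint before this is used on real pages.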

Related

How to add text node in new created element with PHP DOMDocument

How can I add content to a newly created tag? For example, I need to create a tag like the following:
<script src="https://stackoverflow.com/">
alert("ok");
</script>
I have implemented the following code:
$finalDom = new DOMDocument;
$finalDom->loadHTML("", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$newElement = $finalDom->createElement("script");
$newElement->setAttribute("src", "https://stackoverflow.com/");
$finalDom->appendChild($newElement);
The result of this code is only an empty script tag:
<script src="https://stackoverflow.com/"></script>
You can set the content using the textContent property of DOMElement.
$newElement = $finalDom->createElement("script");
$newElement->setAttribute("src", "https://stackoverflow.com/");
$newElement->textContent = 'alert("ok");';
You can also use createTextNode to add a text node to the element:
....
$newElement->setAttribute("src", "https://stackoverflow.com/");
$finalDom->appendChild($newElement);
$newElement->appendChild($finalDom->createTextNode('your text here'));
....
In browser JavaScript (as opposed to PHP's DOMDocument), you can use any of these three options for adding text:
elem.append(document.createTextNode(text))
elem.innerHTML = text
elem.textContent = text
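For PHP specifically, the createTextNode approach from the answers above can be put together into one small runnable sketch. It builds the document from scratch (no `loadHTML` call) and serializes only the new element; the variable names are illustrative.

```php
<?php
// Build a <script> element with both a src attribute and inline text.
$finalDom = new DOMDocument();

$newElement = $finalDom->createElement('script');
$newElement->setAttribute('src', 'https://stackoverflow.com/');
// createTextNode keeps the content as plain text; assigning
// $newElement->textContent = 'alert("ok");' should have the same effect.
$newElement->appendChild($finalDom->createTextNode('alert("ok");'));
$finalDom->appendChild($newElement);

// Serialize just the new element.
$result = $finalDom->saveHTML($newElement);
echo $result, "\n";
```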

appendXML stripping out img element

I need to insert an image with a div element in the middle of an article. The page is generated using PHP from a CRM. I have a routine to count the characters for all the paragraph tags, and insert the HTML after the paragraph that has the 120th character. I am using appendXML and it works, until I try to insert an image element.
When I put the <img> element in, it is stripped out. I understand it is looking for XML, however, I am closing the <img> tag which I understood would help.
Is there a way to use appendXML and not strip out the img elements?
$mcustomHTML = '<div style="position:relative; overflow:hidden;"><img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" /></img></div>';
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $content);
// read all <p> tags and count the text until character 120 is reached,
// then add the custom HTML to the current node
$characterCounter = 0;
$pTags = $doc->getElementsByTagName('p');
foreach ($pTags as $tag) {
    $characterCounter += strlen($tag->nodeValue);
    if ($characterCounter > 120) {
        // this is the desired node, so put the HTML code here
        $template = $doc->createDocumentFragment();
        $template->appendXML($mcustomHTML);
        $tag->appendChild($template);
        break;
    }
}
return $doc->saveHTML();
This should work for you. It uses a temporary DOM document to convert the HTML string that you have into something workable. Then we import the contents of the temporary document into the main one. Once it's imported we can simply append it like any other node.
<?php
$mcustomHTML = '<div style="position:relative; overflow:hidden;"><img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" /></div>';
// Parse the custom HTML in a temporary document
$customDoc = new DOMDocument();
$customDoc->loadHTML($mcustomHTML, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$doc = new DOMDocument();
$doc->loadHTML($content);
// Deep-copy the parsed <div> into the main document
$customImport = $doc->importNode($customDoc->documentElement, true);
// read all <p> tags and count the text until character 120 is reached,
// then add the custom HTML to the current node
$characterCounter = 0;
$pTags = $doc->getElementsByTagName('p');
foreach ($pTags as $tag) {
    $characterCounter += strlen($tag->nodeValue);
    if ($characterCounter > 120) {
        // this is the desired node, so put the HTML code here
        $tag->appendChild($customImport);
        break;
    }
}
return $doc->saveHTML();
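As a side note to the answer above, appendXML itself may still be usable if the fragment is well-formed XML. In the question, `<img ... />` is already self-closed, so the extra `</img>` is an unmatched end tag, which is probably why the fragment ended up empty. The following is a minimal sketch under that assumption; the one-paragraph `$content` is made up for illustration.

```php
<?php
// Assumed sample content; the real $content comes from the CRM.
$content = '<p>First paragraph of the article.</p>';

$doc = new DOMDocument();
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Well-formed XML: the <img /> is self-closed and has no extra </img>.
$customXml = '<div style="position:relative; overflow:hidden;">'
           . '<img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" />'
           . '</div>';

$fragment = $doc->createDocumentFragment();
$ok = $fragment->appendXML($customXml); // false if the XML cannot be parsed

if ($ok) {
    $doc->getElementsByTagName('p')->item(0)->appendChild($fragment);
}

$result = $doc->saveHTML();
echo $result;
```

The import-based approach in the answer is more forgiving, since it accepts ordinary HTML rather than requiring XML syntax.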

how to remove specific url from text in regex

I am trying to remove links to a specific URL from some text while keeping the text or HTML tags between the anchor tags. I can remove the link, but then I also lose the text or HTML inside it. Here is my code to remove the specific URL from the text:
preg_replace('|<a [^>]*href="http://www.microsoft.com[^"]*"[^>]*>.*</a>|iU', '', $a)
Here is the sample:
<img src="http://c.s-microsoft.com/en-in/CMSImages/MMD_TCFamily_1006_540x304.jpg?version=ac2c5995-fde2-b40b-3f2a-b6a0baa88250" class="mscom-image feature-image" alt="Learn about Lumia 950 and Lumia 950 XL." width="540" height="304">
I want to get the img tag or any text between the anchor tags that have the specific URL.
Did I make a mistake in my code? I would like a regex solution in PHP. Please help.
Here we go again...
Don't use regexes to parse html, use an html parser, DOMDocument for example:
$html = <<< EOF
<img src="http://c.s-microsoft.com/en-in/CMSImages/MMD_TCFamily_1006_540x304.jpg?version=ac2c5995-fde2-b40b-3f2a-b6a0baa88250" class="mscom-image feature-image" alt="Learn about Lumia 950 and Lumia 950 XL." width="540" height="304"> SOME TEXT
EOF;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//a[contains(@href,'microsoft.com')]") as $element) {
    $img = $xpath->query('./img', $element)->item(0);
    echo $img->getAttribute('src'); // img source
    echo $img->getAttribute('alt'); // img alt text
    echo $element->textContent; // text inside the a tag
}
//http://c.s-microsoft.com/en-in/CMSImages/MMD_TCFamily_1006_540x304.jpg?version=ac2c5995-fde2-b40b-3f2a-b6a0baa88250
//Learn about Lumia 950 and Lumia 950 XL.
//SOME TEXT
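If the goal is to drop the link itself but keep whatever is inside it, one way is to move the anchor's children in front of the anchor and then remove the empty anchor. This is a sketch, not the answerer's code: the sample markup, including the `href` value and the `lumia.jpg` file name, is invented for illustration.

```php
<?php
// Assumed sample: an anchor pointing at microsoft.com wrapping an image and text.
$html = '<p><a href="http://www.microsoft.com/en-in/"><img src="lumia.jpg" alt="Lumia"> SOME TEXT</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Copy the matches to an array before modifying the tree.
$anchors = iterator_to_array($xpath->query("//a[contains(@href,'microsoft.com')]"));
foreach ($anchors as $a) {
    // Move every child of the anchor to just before the anchor itself.
    while ($a->firstChild) {
        $a->parentNode->insertBefore($a->firstChild, $a);
    }
    // The anchor is now empty and can be removed.
    $a->parentNode->removeChild($a);
}

$result = $dom->saveHTML();
echo $result;
```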

PHP DomDocument XPATH does not match to the HTML real structure

I'm trying to validate the following HTML code. (Please note the text content inside the IMG tag, which is structurally correct as markup, but invalid as HTML.)
<html>
<head>
</head>
<body>
<img src="./">
Some Text
</img>
</body>
</html>
Using PHP and DomDocument, I try to read entire tree with XPATH:
$dom = new DOMDocument();
$dom->validateOnParse = 0;
$dom->loadHTML($htmlSource);
$xpath = new DOMXPath($dom);
$allNodes = $xpath->query("//node()");
The result I get:
/html
/html/head
/html/body
/html/body/#text[1]
/html/body/img
/html/body/#text[2]
which obviously does not match the exact HTML structure.
What I expected to see is
....
/html/body/img/#text
....
Why does XPATH interpret the tree this way?
How can I get it to work as I expected?
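A likely explanation is that `img` is a void element in HTML, so an HTML parser does not give it children; the text then ends up as a sibling under `body`. If the markup really has to be read as written, parsing it as XML with loadXML may give the expected tree. The sketch below compares both parsers on a one-line version of the sample; it is an illustration under that assumption, not a confirmed answer from the thread.

```php
<?php
$htmlSource = '<html><head></head><body><img src="./">Some Text</img></body></html>';

// Collect parser warnings (for example about the stray </img>) instead of printing them.
libxml_use_internal_errors(true);

// HTML parser: img is treated as a void element.
$htmlDom = new DOMDocument();
$htmlDom->loadHTML($htmlSource);
$htmlText = (new DOMXPath($htmlDom))->query('//text()[normalize-space()]')->item(0);

// XML parser: the markup is taken literally, so img can have children.
$xmlDom = new DOMDocument();
$xmlDom->loadXML($htmlSource);
$xmlText = (new DOMXPath($xmlDom))->query('//text()[normalize-space()]')->item(0);

libxml_clear_errors();

echo 'loadHTML parent: ', $htmlText->parentNode->nodeName, "\n";
echo 'loadXML parent:  ', $xmlText->parentNode->nodeName, "\n";
```

The XML route only works when the input is well-formed XML, which real-world HTML often is not.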

How to read the <strong> text and the link url using DOMdocument?

I have this html:
<a href=" URL TO KEEP" class="class_to_check">
<strong> TEXT TO KEEP</strong>
</a>
I have a long HTML document with many links like the one above. I need to keep only the links that have a <strong> inside, and for each of them keep the HREF of the link and the text inside the <strong>. How can I do this using DOMDocument?
Thank you!
$html = "...";
$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
$a = $xp->query('//a')->item(0); // first <a> in the document
$href = $a->getAttribute('href');
$strong = $a->nodeValue; // text content of the link, including the <strong> text
Of course, this XPath stuff works for just this particular html snippet. You'll have to adjust it to work with a more fully populated HTML tree.
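For a page with many links, one way to adjust it is to let XPath select only the anchors that have a `<strong>` child and then loop over them. This is a sketch with made-up sample markup; the `https://example.com/...` URLs and the `$links` array are assumptions for illustration.

```php
<?php
// Assumed sample: one link with <strong>, one without.
$html = '<a href="https://example.com/keep" class="class_to_check"><strong> TEXT TO KEEP</strong></a>'
      . '<a href="https://example.com/skip" class="class_to_check">no strong here</a>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

$links = [];
// [strong] keeps only anchors that have a <strong> child element.
foreach ($xp->query('//a[@class="class_to_check"][strong]') as $a) {
    $strong = $xp->query('./strong', $a)->item(0);
    $links[] = [
        'href' => trim($a->getAttribute('href')),
        'text' => trim($strong->textContent),
    ];
}

print_r($links);
```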
