appendXML stripping out img element - php

I need to insert an image with a div element in the middle of an article. The page is generated using PHP from a CRM. I have a routine to count the characters for all the paragraph tags, and insert the HTML after the paragraph that has the 120th character. I am using appendXML and it works, until I try to insert an image element.
When I put the <img> element in, it is stripped out. I understand it is looking for XML, however, I am closing the <img> tag which I understood would help.
Is there a way to use appendXML and not strip out the img elements?
$mcustomHTML = '<div style="position:relative; overflow:hidden;"><img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" /></img></div>';
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $content);
// read all <p> tags and count the text until reach character 120
// then add the custom html into current node
$pTags = $doc->getElementsByTagName('p');
foreach($pTags as $tag) {
    $characterCounter += strlen($tag->nodeValue);
    if($characterCounter > 120) {
        // this is the desired node, so put html code here
        $template = $doc->createDocumentFragment();
        $template->appendXML($mcustomHTML);
        $tag->appendChild($template);
        break;
    }
}
return $doc->saveHTML();

This should work for you. It uses a temporary DOM document to convert the HTML string that you have into something workable. Then we import the contents of the temporary document into the main one. Once it's imported we can simply append it like any other node.
<?php
$mcustomHTML = '<div style="position:relative; overflow:hidden;"><img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" /></div>';
// parse the snippet in a temporary document, without adding <html>/<body> or a doctype
$customDoc = new DOMDocument();
$customDoc->loadHTML($mcustomHTML, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$doc = new DOMDocument();
$doc->loadHTML($content);
// deep-copy the parsed <div> into the main document
$customImport = $doc->importNode($customDoc->documentElement, true);
// read all <p> tags and count the text until reach character 120
// then add the custom html into current node
$characterCounter = 0;
$pTags = $doc->getElementsByTagName('p');
foreach($pTags as $tag) {
    $characterCounter += strlen($tag->nodeValue);
    if($characterCounter > 120) {
        // this is the desired node, so put html code here
        $tag->appendChild($customImport);
        break;
    }
}
return $doc->saveHTML();
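
As background on why the original attempt lost the markup: appendXML() expects well-formed XML, and `<img ... /></img>` closes the element twice, which is most likely why the fragment was rejected. The sketch below is only an illustration under assumptions: `$content` is a made-up one-paragraph article and `image.png` is a placeholder URL. It suggests that a fragment with a single self-closing `<img />` can be appended directly.

```php
<?php
// Hypothetical article content; in the question this comes from the CRM.
$content = '<p>First paragraph of the article.</p>';

$doc = new DOMDocument();
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Malformed XML: the img is self-closed and then closed again with </img>.
$bad = '<div><img src="image.png" alt="No image" /></img></div>';
// Well-formed XML: the img is closed exactly once.
$good = '<div><img src="image.png" alt="No image" /></div>';

// Collect parser errors internally instead of emitting warnings.
libxml_use_internal_errors(true);

$badFragment = $doc->createDocumentFragment();
$badOk = $badFragment->appendXML($bad);   // expected to fail on the stray </img>

$goodFragment = $doc->createDocumentFragment();
$goodOk = $goodFragment->appendXML($good); // well-formed, so it should be accepted

libxml_clear_errors();

// Append the well-formed fragment to the first paragraph and serialize.
$doc->getElementsByTagName('p')->item(0)->appendChild($goodFragment);
$html = $doc->saveHTML();
echo $html;
```

The temporary-document approach in the answer is still the safer choice when the snippet is real-world HTML rather than strict XML, for example when it contains `<br>` or unquoted attributes.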

Related

Not able to get image src using regex

I am using the regex below to wrap an <a> element around an image tag, but it's not working. I took this code from "Add link around img tags with regexp".
preg_replace('#(<img[^>]+ src="([^"]*)" alt="[^"]*" />)#', '<a href="$2" ...>$1</a>', $str)
However, if I use the code below, without src, it works.
preg_replace('#(<img[^>]+ alt="[^"]*" />)#', '<a href="" ...>$1</a>', $str)
Is there any reason why I am not able to get the src from the image tag?
My image tag is <img src="" alt="">
A better way to do something like this is to use PHP's DOMDocument class as it is independent of how people write their HTML (e.g. putting the alt attribute before the src attribute). Something like this would work for your case:
$html = '<div id="x"><img src="/images/xyz" alt="xyz" /><p>hello world!</p></div>';
$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);
$images = $xpath->query('//img');
foreach ($images as $img) {
    // create a new anchor element
    $a = $doc->createElement('a');
    // copy the img src attribute to the a href attribute
    $a->setAttribute('href', $img->getAttribute('src'));
    // put the <a> where the image was in its parent
    $img->parentNode->replaceChild($a, $img);
    // make the image a child of the <a> element
    $a->appendChild($img);
}
echo $doc->saveHTML();
Output:
<div id="x"><a href="/images/xyz"><img src="/images/xyz" alt="xyz"></a><p>hello world!</p></div>

Regex extract image links

I am reading some HTML content. It contains image tags such as
<img onclick="document.location='http://abc.com'" src="http://a.com/e.jpg" onload="javascript:if(this.width>250) this.width=250">
or
<img src="http://a.com/e.jpg" onclick="document.location='http://abc.com'" onload="javascript:if(this.width>250) this.width=250" />
I tried to reformat these tags to become
<img src="http://a.com/e.jpg" />
However, I have not been successful. The code I have tried so far is
$image=preg_replace('/<img(.*?)(\/)?>/','',$image);
Can anyone help?
Here's a version using DOMDocument that removes all attributes from <img> tags except for the src attribute. Note that doing a loadHTML and saveHTML with DOMDocument can alter other html as well, especially if that html is malformed. So be careful - test and see if the results are acceptable.
<?php
$html = <<<ENDHTML
<!doctype html>
<html><body>
<img onclick="..." src="http://a.com/e.jpg" onload="...">
<div><p>
<img src="http://a.com/e.jpg" onclick="..." onload="..." />
</p></div>
</body></html>
ENDHTML;
$dom = new DOMDocument;
if (!$dom->loadHTML($html)) {
    throw new Exception('could not load html');
}
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//img') as $img) {
    // unfortunately, cannot removeAttribute() directly inside
    // the loop, as this breaks the attributes iterator.
    $remove = array();
    foreach ($img->attributes as $attr) {
        if (strcasecmp($attr->name, 'src') != 0) {
            $remove[] = $attr->name;
        }
    }
    foreach ($remove as $attr) {
        $img->removeAttribute($attr);
    }
}
echo $dom->saveHTML();
Match one piece at a time, then concatenate the strings. I am unsure which language you are using, so I'll explain in pseudocode:
1. Find <img with a regex and store the match in a string variable.
2. Find src="..." with src=".*?" and store the match in a string variable.
3. Find the closing /> with \/> and store the match in a string variable.
4. Concatenate the variables.
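
The steps above can be sketched in PHP roughly as follows. This is an illustration, not a robust HTML parser: `keepOnlySrc` is a hypothetical helper name, and the tag pattern is an assumption that attribute values are quoted. It skips over quoted values on purpose, because the onload attribute in the question contains a `>` character that a plain `[^>]*` would stop at.

```php
<?php
// Hypothetical helper: rebuilds every <img> tag so that only src remains.
function keepOnlySrc(string $html): string
{
    // Step 1: match a whole <img ...> tag, treating quoted values as units
    // so that a '>' inside an attribute value does not end the match early.
    $tagPattern = '/<img\b(?:[^>"\']|"[^"]*"|\'[^\']*\')*>/i';

    return preg_replace_callback(
        $tagPattern,
        function (array $m): string {
            // Step 2: pull the double-quoted src attribute out of the tag.
            if (preg_match('/\ssrc\s*=\s*"([^"]*)"/i', $m[0], $src)) {
                // Steps 3 and 4: concatenate opening, src and closing parts.
                return '<img src="' . $src[1] . '" />';
            }
            return $m[0]; // no src found: leave the tag untouched
        },
        $html
    );
}

$input = <<<'HTML'
<img onclick="document.location='http://abc.com'" src="http://a.com/e.jpg" onload="javascript:if(this.width>250) this.width=250">
<img src="http://a.com/e.jpg" onclick="document.location='http://abc.com'" onload="javascript:if(this.width>250) this.width=250" />
HTML;

$result = keepOnlySrc($input);
echo $result, "\n";
```

A single-quoted or unquoted src value would not be picked up by this sketch, which is one reason the DOMDocument answer above is usually the safer route.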

Obtain text between div tags within only certain span IDs with DOMdocument or SimpleDOM

This is my code:
<?php
$url = file_get_contents("http://www.youtube.com/watch?v=QR8A3T6sPzU");
$doc = new DOMDocument();
@$doc->loadHTML($url);
$xpath = new DOMXPath($doc);
$myNews = $xpath->query('//@id="watch7-views-info"')->item(0);
echo $myNews;
?>
How can I get all the text between div tags, but only within certain span IDs?
Thanks.
I'd use Simple HTML DOM:
include 'simpledom.php';
$html = file_get_html('http://www.youtube.com/watch?v=QR8A3T6sPzU');
// Find all divs with id watch7-views-info and echo their contents
foreach($html->find('div[id=watch7-views-info]') as $element)
    echo $element->plaintext . '<br>';
This finds all divs with that specific id. You mentioned something about spans, but you'll have to elaborate, because I don't know what you mean.
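
For comparison, the same lookup can be sketched with the built-in DOMDocument and DOMXPath classes that the question started from. The markup in `$page` is made up for illustration and stands in for the fetched page, whose real structure is not known here.

```php
<?php
// Made-up markup standing in for the downloaded page.
$page = '<html><body>'
      . '<div id="watch7-views-info"><span class="count">1,234 views</span></div>'
      . '<div id="other">ignored</div>'
      . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world pages often trigger parser warnings
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select the div by its id attribute; item(0) is null when nothing matches.
$node = $xpath->query('//div[@id="watch7-views-info"]')->item(0);

// textContent returns the text of the node and all of its descendants.
$text = $node !== null ? trim($node->textContent) : '';
echo $text, "\n";
```

Note that a DOMNode cannot be echoed directly, which is one of the problems in the question's code; reading `textContent` (or `nodeValue`) gives the string.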

PHP DOM get nodevalue html? (without stripping tags)

I am trying to get the inner HTML of div tags in a file using nodeValue, but this code outputs only plain text and seems to strip all HTML tags from inside the div. How can I change this code to output the div's HTML content rather than plain text, and also output the main div wrapping its child elements?
Example:
contents of file.txt:
<div class="1"><span class="test">text text text</span></div>
<div class="2"><span class="test">text text text</span></div>
<div class="3"><span class="test">text text text</span></div>
script.php:
$file = file_get_contents('file.txt');
$doc = new DOMDocument();
@$doc->loadHTML('<?xml encoding="UTF-8">'.$file);
$entries = $doc->getElementsByTagName('div');
for ($i=0;$i<$entries->length;$i++) {
    $entry = $entries->item($i);
    echo $entry->nodeValue;
}
outputs: text text texttext text texttext text text
what I need it to output:
<div class="1"><span class="test">text text text</span></div>
<div class="2"><span class="test">text text text</span></div>
<div class="3"><span class="test">text text text</span></div>
Note that the parent div elements need to be output as well, wrapping the span tags.
Help!
I have never done what you're attempting to do, but as a stab in the dark, using the API docs, does echo $entry->textContent; work?
Adding an update. This is from the comments on the docs page for DOMNode:
Hi!
Combining all the comments, the easiest way to get the inner HTML of the node is to use this function:
<?php
function get_inner_html( $node ) {
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}
?>
Or, maybe a simpler method is just to do:
echo $domDocument->saveXML($entry);
Instead of:
echo $entry->nodeValue;
You have to use:
echo $doc->saveXML($entry);
Here is a more complete example that might help others too. $doccontent is the HTML block as a string:
$doccontent = '<html> …'; // your html string
$dom = new DOMDocument;
$internalErrors = libxml_use_internal_errors(true); // prevent error messages
// correct parsing of utf-8 chars (the 'HTML-ENTITIES' target is deprecated as of PHP 8.2)
$content_utf = mb_convert_encoding($doccontent, 'HTML-ENTITIES', 'UTF-8');
$dom->loadHTML($content_utf);
libxml_use_internal_errors($internalErrors); // restore the previous error handling setting
$specialdiv = $dom->getElementById('xdiv');
if ($specialdiv !== null)
{
    echo $dom->saveXML($specialdiv);
}
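
As a variation on the answers above, saveHTML() also accepts a node argument (since PHP 5.3.6), which serializes with HTML rules instead of XML rules. The sketch below is a minimal illustration using two of the divs from the question as an inline string rather than reading file.txt; the variable names `$outer` and `$inner` are chosen here for illustration.

```php
<?php
// Inline stand-in for the contents of file.txt from the question.
$file = '<div class="1"><span class="test">text text text</span></div>'
      . '<div class="2"><span class="test">text text text</span></div>';

$doc = new DOMDocument();
$doc->loadHTML($file);

// Outer HTML: serialize each div itself, wrapper tag included.
$outer = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    // trim() guards against a trailing newline the serializer may add.
    $outer[] = trim($doc->saveHTML($div));
}

// Inner HTML of the first div: serialize its children and join them.
$inner = '';
foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

echo implode("\n", $outer), "\n";
echo $inner, "\n";
```

Compared with saveXML(), this keeps void elements such as `<br>` and `<img>` in HTML form instead of emitting self-closing XML syntax.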

PHP dom to get tag class with multiple css class name

I am having difficulty getting the second link's href and text. How do I select class="secondLink SecondClass" using PHP DOM? Thank you.
<td class="pos" >
<a class="firstLink" href="Search/?List=200003000112097&sr=1" >
Firs link value
</a>
<br />
<a class="secondLink SecondClass" href="/Search/?KeyOpt=ALL" >
Second Link Value
</a>
</td>
My code is
// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);
/*** discard white space ***/
$dom->preserveWhiteSpace = false;
// grab all the links on the page
$xpath = new DOMXPath($dom);
//$hrefs = $xpath->evaluate("/html/body//a[@class='firstLink']"); // this works
$hrefs = $xpath->evaluate("/html/body//a[@class='secondLink SecondClass']"); // not working
Thank you
$hrefs = $xpath->evaluate("/html/body//a[contains(concat(' ',@class,' '),' SecondClass ')
    and (contains(concat(' ',@class,' '),' secondLink '))]");
This is taken from another answer. Note that the class names are matched case-sensitively.
You can also select the td that has class pos and then its anchor tags. From the returned list you can then pick the specific anchor tag you need.
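
The class-token test from the first answer can be tried on a cut-down version of the question's markup. This is a sketch under assumptions: `hasClass` is a hypothetical helper introduced here to build the XPath predicate, the href values are shortened, and normalize-space() is added so that tabs or repeated spaces in the class attribute do not break the match.

```php
<?php
// Hypothetical helper: builds an XPath predicate that matches one class token.
function hasClass(string $token): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $token ')";
}

// Cut-down version of the markup from the question.
$html = '<table><tr><td class="pos">'
      . '<a class="firstLink" href="Search/?List=1">First link value</a><br />'
      . '<a class="secondLink SecondClass" href="/Search/?KeyOpt=ALL">Second Link Value</a>'
      . '</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Require both tokens; their order inside the class attribute does not matter.
$query = '//a[' . hasClass('secondLink') . ' and ' . hasClass('SecondClass') . ']';

$links = [];
foreach ($xpath->query($query) as $a) {
    $links[] = [$a->getAttribute('href'), trim($a->textContent)];
}
print_r($links);
```

Matching whole tokens this way avoids the false positives that a bare contains(@class, 'secondLink') would give for a class such as secondLinkExtra.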
