I have the following code, with which I am trying to pull just the text out of my WordPress post and echo only that text content in a div. (I am removing the blockquotes, images, etc. from the post so they can be used elsewhere.)
<?php
$content = get_the_content();
$content = wpautop($content);
$doc = new DOMDocument();
$doc->loadHTML(get_the_content(), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//blockquote') as $node) {
$node->parentNode->removeChild($node);
}
foreach ($xpath->query('//img') as $node) {
$node->parentNode->removeChild($node);
}
foreach( $xpath->query('//p[not(node())]') as $node ) {
$node->parentNode->removeChild($node);
}
$content = $doc->saveHTML($doc);
?>
<div>
<?php echo $content ?>
</div>
However, the content doesn't appear.
I think you are doing too much work just to retrieve a post. Why not use the_content(); inside the loop?
get_the_content() does not auto-embed videos or expand shortcodes, among other things. I can also see that you call it a second time when loading the HTML, so the result of wpautop() is never used.
Try this instead:
use the_content();
Related
I have this in my php file.
<?php
$str = '<div>
<p>Text</p>
I need this text...
<p>next p</p>
... and this
</div>
';
$dom=new DomDocument();
$dom->loadHTML($str);
$p = $dom->getElementsByTagName('p');
foreach ($p as $item) {
echo $item->nodeValue;
}
This gives me the correct text for the p tags, but I also need the text between the p tags ("I need this text...", "... and this").
Does anyone know how to get the text after the p tag?
Best
Use DOMXPath:
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div/text()') as $textNode) {
echo $textNode->nodeValue;
}
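A minimal, self-contained sketch of the same idea, using the $str from the question. The trim() call and the empty-string filter are my own addition, on the assumption that the whitespace-only text nodes between the tags are not wanted:

```php
<?php
$str = '<div>
<p>Text</p>
I need this text...
<p>next p</p>
... and this
</div>';

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

// Collect only the text nodes that are direct children of the div,
// skipping nodes that contain nothing but whitespace.
$texts = [];
foreach ($xpath->query('//div/text()') as $textNode) {
    $value = trim($textNode->nodeValue);
    if ($value !== '') {
        $texts[] = $value;
    }
}

echo implode("\n", $texts), "\n";
```

The text inside the p tags is not included, because //div/text() selects only text nodes whose parent is the div itself.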
I have the following 2 sets of code (WordPress) using regex, but I was told that it's bad practice.
I am using it in 2 ways:
To take out the blockquotes and images from the post and display just the text.
To essentially do the opposite and display just the images.
I am looking to rewrite it in a proper, more acceptable/cross-browser form.
html (display text):
<?php
$content = preg_replace('/<blockquote>(.*?)<\/blockquote>/', '', get_the_content());
$content = preg_replace('/(<img [^>]*>)/', '', $content);
$content = wpautop($content); // Add paragraph-tags
$content = str_replace('<p></p>', '', $content); // remove empty paragraphs
echo $content;
?>
html (display images):
<?php
preg_match_all('/(<img [^>]*>)/', get_the_content(), $images);
for( $i=0; isset($images[1]) && $i < count($images[1]); $i++ ) {
if ($i == end(array_keys($images[1]))) {
echo sprintf('<div id="last-img">%s</div>', $images[1][$i]);
continue;
}
echo $images[1][$i];
}
?>
You can use the answer from here: Strip Tags and everything in between
The point is to use a parser, rather than roll-your-own regex that might be buggy.
$content = get_the_content();
$content = wpautop($content);
$doc = new DOMDocument();
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD); // load the wpautop() output, not the raw content
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//blockquote') as $node) {
$node->parentNode->removeChild($node);
}
foreach ($xpath->query('//img') as $node) {
$node->parentNode->removeChild($node);
}
foreach( $xpath->query('//p[not(node())]') as $node ) {
$node->parentNode->removeChild($node);
}
$content = $doc->saveHTML($doc);
You may find that PHP's DOMDocument has wrapped your HTML fragment in <html> tags; in that case, look at How to saveHTML of DOMDocument without HTML wrapper?
The part that removes empty p tags is from Remove empty tags from a XML with PHP
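One way to sidestep the wrapper question is to wrap the fragment in a single container element yourself and serialize only that container's children. This is a sketch under assumptions, not a drop-in replacement: the $content string is a hypothetical stand-in for wpautop(get_the_content()), and the "wrap" id is a name I made up for illustration.

```php
<?php
// Hypothetical sample standing in for wpautop(get_the_content()).
$content = '<p>First paragraph.</p><blockquote><p>Quoted.</p></blockquote>'
         . '<p><img src="a.jpg" alt=""></p><p>Second paragraph.</p>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
// A single root element keeps the fragment's top-level nodes as siblings.
$doc->loadHTML('<div id="wrap">' . $content . '</div>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Remove blockquotes and images.
foreach ($xpath->query('//blockquote | //img') as $node) {
    $node->parentNode->removeChild($node);
}
// Then remove paragraphs that are now empty.
foreach ($xpath->query('//p[not(node())]') as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize only the children of the wrapper, so no <html>, <body>
// or wrapper markup ends up in the output.
$wrapper = $xpath->query('//div[@id="wrap"]')->item(0);
$text = '';
foreach ($wrapper->childNodes as $child) {
    $text .= $doc->saveHTML($child);
}

echo $text;
```

The empty-paragraph query runs after the images are removed, so a paragraph that held nothing but an image is dropped as well.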
I want to parse HTML using PHP.
My HTML file looks like this:
<div class="main">
<div class="text">
Welcom to Stackoverflow
</div>
</div>
Now I want to extract only this part:
<div class="text">
Welcom to Stackoverflow
</div>
For this I wrote the following code:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="main"]');
foreach ($tags as $tag) {
var_dump(trim($tag->nodeValue));
}
This code gives only the text:
Welcom to Stackoverflow
but I want the tag as well. How do I do this?
If you only want to have the div with class "text" try this:
Change your query to: $xpath->query('//div[@class="text"]');
For the output you need: echo $dom->saveHTML( $tag );
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="text"]');
foreach ($tags as $tag) {
echo $dom->saveHTML( $tag );
}
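One caveat worth sketching: @class="text" compares the whole attribute value, so it will not match a div that carries additional classes. A common XPath 1.0 idiom matches a single class token instead. The extra "intro" class below is hypothetical, added only to show the difference:

```php
<?php
$html = '<div class="main"><div class="text intro">Welcom to Stackoverflow</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact match on the whole attribute value: finds nothing here.
$exact = $xpath->query('//div[@class="text"]');

// Token match: pad the class list with spaces, then look for " text ".
$token = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " text ")]'
);

$found = [];
foreach ($token as $tag) {
    // Passing the node to saveHTML() returns the tag together with its content.
    $found[] = $dom->saveHTML($tag);
}

echo implode("\n", $found), "\n";
```

If the class attribute always contains exactly one class, the simpler @class="text" query is enough.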
The QueryPath library for HTML/XML parsing makes such things much easier.
I have a variable $link holding an HTTP (craigslist) link, and I put the page contents into $linkhtml, so $linkhtml holds the HTML code of the craigslist page at $link.
I need to extract the text between <h2> and </h2>. I could use a regexp, but how do I do this with PHP DOM? I have this so far:
$linkhtml= file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
What do I do next to put the contents of the element <h2> into a var $title?
If DOMDocument looks complicated to you, you may try PHP Simple HTML DOM Parser, which provides a very simple way to parse HTML.
require 'simple_html_dom.php';
$html = '<h1>Header 1</h1><h2>Header 2</h2>';
$dom = new simple_html_dom();
$dom->load( $html );
$title = $dom->find('h2',0)->plaintext;
echo $title; // outputs: Header 2
You can use this code:
$linkhtml= file_get_contents($link);
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($linkhtml); // loads your html
$xpath = new DOMXPath($doc);
$h2text = $xpath->evaluate("string(//h2/text())");
// $h2text is your text between <h2> and </h2>
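Note that string(//h2/text()) returns only the first text node of the first <h2>. If the heading may contain nested markup, string(//h2) or normalize-space(//h2) returns the element's full text instead. A small sketch with a made-up HTML string in place of the downloaded page:

```php
<?php
// Hypothetical sample standing in for file_get_contents($link).
$linkhtml = '<html><body><h1>Header 1</h1><h2>Header <b>2</b> title</h2></body></html>';

libxml_use_internal_errors(true); // real-world pages often trigger parser warnings
$doc = new DOMDocument();
$doc->loadHTML($linkhtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Only the first text node of the first h2, i.e. the text before <b>.
$firstTextNode = $xpath->evaluate('string(//h2/text())');

// All text inside the first h2, with surrounding whitespace collapsed.
$title = $xpath->evaluate('normalize-space(//h2)');

echo $title, "\n";
```

For a heading that contains plain text only, both expressions give the same result.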
You can do this with XPath (untested, may contain errors):
$linkhtml= file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
$xpath = new DOMXpath($dom);
$elements = $xpath->query("/html/body/h2");
if (!is_null($elements)) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
My PHP code
$dom = new DOMDocument();
@$dom->loadHTML($file);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="text"]');
foreach ($tags as $tag) {
echo $tag->textContent;
}
What I'm trying to do here is get the content of the div that has the class 'text'. The problem is that when I loop and echo the results, I only get the text; I can't get the HTML code with the images and all the HTML tags like p, br, img, etc. I also tried $tag->nodeValue, but that did not work either.
Personally, I like Simple HTML Dom Parser.
include "lib.simple_html_dom.php";
$html = str_get_html($file);
foreach($html->find('div.text') as $e){
echo $e->innertext;
}
Pretty simple, huh? It accommodates selectors like jQuery :)
What you need to do is create a temporary document, add the element to that and then use saveHTML():
foreach ($tags as $tag) {
$doc = new DOMDocument;
$doc->appendChild($doc->importNode($tag, true));
$html = $doc->saveHTML();
}
I found this snippet at http://www.php.net/manual/en/class.domelement.php:
<?php
function getInnerHTML($Node)
{
$Body = $Node->ownerDocument->documentElement->firstChild->firstChild;
$Document = new DOMDocument();
$Document->appendChild($Document->importNode($Body,true));
return $Document->saveHTML();
}
?>
Not sure if it works though.
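On PHP 5.3.6 and later, saveHTML() accepts a node argument, so the inner HTML of an element can be built by serializing each of its child nodes, without a temporary document. A minimal sketch; the innerHtml() helper name and the sample markup are my own, not taken from the manual:

```php
<?php
// Hypothetical helper: concatenates the serialized children of a node.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Made-up sample standing in for $file.
$file = '<div class="text"><p>Hello <b>world</b></p><img src="a.png" alt="x"></div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($file);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$tag = $xpath->query('//div[@class="text"]')->item(0);

$inner = innerHtml($tag);
echo $inner, "\n";
```

Unlike textContent or nodeValue, this keeps the p and img tags; the enclosing div itself is left out because only its children are serialized.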