Get post_content as a string - php

I have an array of posts from a WP_Query, obtained like this:
$query = new WP_Query(.....);
$array_of_posts = $query->posts;
However, each post's post_content comes back as HTML, along with some other markup:
<!-- wp:paragraph -->
<p>the text</p>
<!-- /wp:paragraph -->
I then want to display this as the content for each post in the array:
foreach ($array_of_posts as $post) {
    echo '<h1>' . $post->post_content . '</h1>';
}
But of course this just gives me an h1 containing the HTML and the other markup. How can I get just the string?
Also, this code is only a rough sketch of what I'm doing.

You're looking for wp_strip_all_tags()
Source: https://developer.wordpress.org/reference/functions/wp_strip_all_tags/
Properly strip all HTML tags including script and style. This differs from strip_tags() because it removes the contents of the <script> and <style> tags. E.g. strip_tags( '<script>something</script>' ) will return ‘something’. wp_strip_all_tags will return ‘’
// get_the_content() returns the content; the_content() echoes it and returns nothing.
echo wp_strip_all_tags( get_the_content() );
Alternatively, you could use remove_filter( 'the_content', 'wpautop' ); in your functions.php to stop WordPress from adding <p> tags automatically.
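To illustrate what the stripping does to the block markup from the question, here is a minimal sketch in plain PHP that approximates wp_strip_all_tags() so it can be run outside WordPress. The function name strip_all_tags_sketch is hypothetical and exists only for this example; inside WordPress you would call wp_strip_all_tags( $post->post_content ) directly.

```php
<?php
// Plain-PHP approximation of wp_strip_all_tags().
// The name strip_all_tags_sketch is hypothetical (not a WordPress function).
function strip_all_tags_sketch(string $text): string
{
    // Drop <script> and <style> elements together with their contents.
    $text = preg_replace('@<(script|style)[^>]*?>.*?</\1>@si', '', $text);

    // strip_tags() removes the remaining tags and HTML comments,
    // which covers block markers such as <!-- wp:paragraph -->.
    return trim(strip_tags($text));
}

// Sample content taken from the question.
$raw = "<!-- wp:paragraph -->\n<p>the text</p>\n<!-- /wp:paragraph -->";

$plain = strip_all_tags_sketch($raw);

// Escape the plain text before placing it inside the heading.
echo '<h1>' . htmlspecialchars($plain) . '</h1>', PHP_EOL;
```

In a real loop over $query->posts, the same call would be applied to each $post->post_content before it is echoed.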

Related

PHP Strip all content around text

I have text that looks like this, or like one of a billion variants of it, for example:
<div>content goes here... </div><div style="some style..."><span style="some styles..."><strong>[END_CONTACT]</strong></span></div><div>content goes here... </div>
<div>content goes here... </div><div style="other style..."><span style="other styles..."><strong>[END_CONTACT]</strong></span></div><div>content goes here... </div>
<div>content goes here... </div><div style="random stuff..."><span style="random stuff..."><strong>[END_CONTACT]</strong></span></div><div>content goes here... </div>
and a billion variations of this...
I want to be able to remove any variation of the text surrounding [END_CONTACT] so that all I am left with is this:
<div>content goes here... </div><div>[END_CONTACT]</div><div>content goes here... </div>
How do I strip the content between the opening div tag and [END_CONTACT] and the content between [END_CONTACT] and the ending div tag?
Thanks
Use regular expressions! The following example using preg_replace will work as long as your content doesn't contain angle brackets, which you should not put in HTML.
$result = preg_replace('#<div\b[^>]*><span\b[^>]*><strong\b[^>]*>([^<]*)</strong></span></div>#i', '<div>$1</div>', $html);
How do I strip the content between the opening div tag and [END_CONTACT] and the content between [END_CONTACT] and ending div tag?
If the terms [END_CONTACT] and the <div> tag are always present, you can use PCRE REGEX in preg_replace():
$string = preg_replace('/<div[^>]*>.*\[END_CONTACT\].*<\/div>/i','<div>[END_CONTACT]</div>',$string);
Example:
$data = [];
$data[] = 'some text <div style="some style..."><span style="some styles..."><strong>[END_CONTACT]</strong></span></div>';
$data[] = 'somrthing else etc.<div style="other style..."><span style="other styles..."><strong>[END_CONTACT]</strong></span></div>';
$data[] = '<div style="random stuff..."><span style="random stuff..."><strong>[END_CONTACT]</strong></span></div>';
$data[] = 'and a billion variations of this...';
foreach ($data as $row){
$string = preg_replace('/<div[^>]*>.*\[END_CONTACT\].*<\/div>/i','<div>[END_CONTACT]</div>',$row);
print $string."<BR>";
}
Output:
<div>[END_CONTACT]</div>
<div>[END_CONTACT]</div>
<div>[END_CONTACT]</div>
and a billion variations of this...
UPDATE:
Sorry, wasn't clear about that in my original post. Is there any way to keep text or code outside of the string in question but still do the operation as you've suggested?
Try this Regex in the above PHP code:
(?!<div).(<div[^>]*>.*\[END_CONTACT\][^\div]*<\/div>)
Example:
content content content... <div style="random stuff..."><span style="random stuff..."><strong>[END_CONTACT]</strong></span></div> content content content
Output:
content content content... <div>[END_CONTACT]</div> content content content
NOTE:
It must be stated that you should use a DOM parser to work with HTML elements in complex compositions rather than Regex.
I have tested my answer and it does what is desired. And as stated above, what you should be using to deal with multilayered complex HTML is a proper PHP DOM Parser.
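For flat markup like the samples in the question, the greedy .* in the patterns above can also be constrained so that a match never crosses another <div> boundary, which keeps the surrounding content intact. The following is only a sketch under that assumption (no nested <div> inside the targeted <div>); the helper name simplify_end_contact is hypothetical.

```php
<?php
// Hypothetical helper: collapses the <div> that wraps [END_CONTACT]
// into <div>[END_CONTACT]</div> and leaves neighbouring <div>s alone.
function simplify_end_contact(string $html): string
{
    // (?:(?!</?div\b).)*? consumes characters one at a time but refuses to
    // step over an opening or closing div tag, so a match stays inside one <div>.
    $pattern = '#<div\b[^>]*>(?:(?!</?div\b).)*?\[END_CONTACT\](?:(?!</?div\b).)*?</div>#is';

    return preg_replace($pattern, '<div>[END_CONTACT]</div>', $html);
}

// First sample line from the question.
$input = '<div>content goes here... </div>'
    . '<div style="some style..."><span style="some styles..."><strong>[END_CONTACT]</strong></span></div>'
    . '<div>content goes here... </div>';

$output = simplify_end_contact($input);
echo $output, PHP_EOL;
```

As the note above says, a DOM parser remains the safer choice once the markup can contain nested <div> elements.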

WordPress: preg_replace inside a loop only works occasionally

I'm trying to make a custom RSS feed with some alteration to the HTML content of each post.
Inside the template file rss-custom.php I have this:
<?php while (have_posts()) : the_post(); ?>
<?php echo processPostContent(); ?>
<?php endwhile; ?>
In functions.php, there are three replacements, as follows:
function processPostContent() {
$post = get_post(get_the_ID());
$post_content = strval($post->post_content);
// replace h3 and h4 tags with h2
$post_content = preg_replace('/<(\/?)h((?![12])\d)/im', "<$1h2", $post_content);
// strip every attribute of <img> other than src
$post_content = preg_replace('/<img[^>]*(src="[^"]*")[^>]*>/im', "<img $1 />", $post_content);
// insert text after some closing tags
$post_content = preg_replace('/<\/(h2|p|figure)>/im', "</$1><p>Inserted</p>", $post_content);
return $post_content;
}
Then I get a strange result: out of 20 posts, only 7 or 8 have all three replacements applied. The rest get the first two replacements but not the third. Does anyone know why that is?
The solution, it turns out, has nothing to do with the loop or with preg_replace. Some posts' content does not include any HTML tags, only plain text, so preg_replace had no effect on them. When that content is rendered in the RSS feed, however, <p> tags are inserted automatically, which led me to believe the third replacement had been skipped. For example:
First paragraph.
Second paragraph.
is turned into
<p>First paragraph.</p>
<p>Second paragraph.</p>
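The effect can be reproduced outside WordPress. The sketch below applies the third replacement from the question to a plain-text body and to the same body with <p> tags already present; the function name insert_after_blocks is hypothetical.

```php
<?php
// Hypothetical wrapper around the third replacement from the question.
function insert_after_blocks(string $content): string
{
    return preg_replace('/<\/(h2|p|figure)>/i', '</$1><p>Inserted</p>', $content);
}

// Stored content without any HTML tags: there is nothing for the pattern to match.
$plain = "First paragraph.\n\nSecond paragraph.";

// The same content once paragraph tags are present.
$html = "<p>First paragraph.</p>\n<p>Second paragraph.</p>";

$plain_result = insert_after_blocks($plain);
$html_result  = insert_after_blocks($html);

var_dump($plain_result === $plain);                       // plain text is returned unchanged
var_dump(substr_count($html_result, '<p>Inserted</p>'));  // one insertion per closing </p>
```

If the replacements should also apply to plain-text posts, one option inside WordPress is to run the content through wpautop() before calling preg_replace, so that the <p> tags exist at the time the pattern is applied.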

How can I match and replace every character of (or between) different nodes that have a similar tagName? [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
I am trying to replace every character (including newlines, tabs, whitespace, etc.) between nodes that have the same tag name. The problem is that the regex treats the separate nodes as a single match, running from the opening tag of the first node to the closing tag of the last one, and so produces a single replacement.
For Example:
$html_string = "
<div> Below are object Node with the html code </div>
<script> alert('i want this to be replaced. it has no newline'); </script>
<div> I don't want this to be replaced </div>
<script>
console.log('i also want this to be replaced. It has newline');
</script>
<div> This is a div tag and not a script, so it should not be replaced </div>
<script> console.warn('Finally, this should be replaced, it also has newline');
</script>
<div> The above is the final result of the replacements </div> ";
$regex = '/(?:\<script\>)(.*)?(?:\<\/script\>)/ims';
$result = preg_replace($regex, '<!-- THIS SCRIPT CONTENT HERE HAS BEEN ALTERED -->', $html_string);
echo $result;
Expected Result:
<div> Below are object Node with the html code </div>
<!-- THIS SCRIPT CONTENT HERE HAS BEEN ALTERED -->
<div> I don't want this to be replaced </div>
<!-- THIS SCRIPT CONTENT HERE HAS BEEN ALTERED -->
<div> This is a div tag and not a script, so it should not be replaced </div>
<!-- THIS SCRIPT CONTENT HERE HAS BEEN ALTERED -->
<div> The above is the final result of the replacements </div>
Actual Output:
<div> Below are object Node with the html code </div>
<!-- THIS SCRIPT CONTENT HERE HAS BEEN ALTERED -->
<div> The above is the final result of the replacements </div>
How can I sort this out? Thanks in advance.
Using DOMDocument is generally preferable to trying to parse HTML with regex. Based on your question, this will give you the results you want. It finds each script node in the HTML and replaces it with the comment you specified:
$doc = new DOMDocument();
$doc->loadHTML("<html>$html_string</html>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//script') as $script) {
$comment = $doc->createComment('THIS SCRIPT CONTENT HERE HAS BEEN ALTERED');
$script->parentNode->replaceChild($comment, $script);
}
echo substr($doc->saveHTML(), 6, -8);
Note that because you don't have a top-level element in the HTML, one (<html>) has to be added on read and then removed on output (using substr).
Output:
<div> Below are object Node with the html code </div>
<!--THIS SCRIPT CONTENT HERE HAS BEEN ALTERED-->
<div> I don't want this to be replaced </div>
<!--THIS SCRIPT CONTENT HERE HAS BEEN ALTERED-->
<div> This is a div tag and not a script, so it should not be replaced </div>
<!--THIS SCRIPT CONTENT HERE HAS BEEN ALTERED-->
<div> The above is the final result of the replacements </div>
Demo on 3v4l.org
If you insist on using regex (but you should read this before you do), the problem with your regex lies in this part:
(.*)?
This looks for an optional string of as many characters as possible, leading up to </script>. So it basically absorbs all the characters between the first <script> and the last </script> (because all the characters in </script> match .). What you actually wanted was (.*?) which is non-greedy and so matches only up to the first </script> i.e.
$regex = '/(?:\<script\>)(.*?)(?:\<\/script\>)/ims';
$result = preg_replace($regex, '<!-- THIS SCRIPT CONTENT HERE HAS BEEN ALTERED -->', $html_string);
echo $result;
The output from this is as you require.
Demo on 3v4l.org
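The difference between the greedy and the non-greedy quantifier can be seen on a shortened input. This is a small sketch; the variable names and the placeholder comment text are chosen only for this example.

```php
<?php
// Two separate <script> elements with a <div> between them.
$html = "<script>a();</script>\n<div>keep</div>\n<script>\nb();\n</script>";

// Greedy: (.*) runs to the LAST </script>, swallowing the <div> in between.
$greedy = preg_replace('/<script>(.*)<\/script>/is', '<!-- x -->', $html);

// Non-greedy: (.*?) stops at the FIRST </script>, so each element is matched on its own.
$lazy = preg_replace('/<script>(.*?)<\/script>/is', '<!-- x -->', $html);

echo $greedy, PHP_EOL;
echo $lazy, PHP_EOL;
```

With the greedy pattern the <div> disappears along with both scripts, while the non-greedy pattern replaces each script separately and leaves the <div> in place.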

Remove the opening and closing div inside WordPress content

I want to remove a div added by a plugin to the content of WordPress posts. So the post has this structure:
<div class="post">
<div class="some-class">
<p>content</p>
</div>
</div>
I want to remove <div class="some-class"> and its closing </div> but leave the content. So it would be:
<div class="post">
<p>content</p>
</div>
Using this filter:
add_filter( 'the_content', 'remove_class' , 100 );
function remove_class( $content ) {
$content = preg_replace('#<div[^>]*class="some-class"[^>]*>.*?</div>#is', '', $content);
return $content;
}
the content is also deleted. I just want the opening div and its closing tag to be removed. Any idea how?
This question is not a duplicate of the other question, because I want to remove one specific div, not all divs.
You could just try to remove the class attribute, so that only a plain <div> is left, using code like this:
add_filter( 'the_content', 'remove_class' , 100 );
function remove_class( $content ) {
$content = preg_replace('/class=".*?"/', '', $content);
return $content;
}
@user7592255 you can try with jQuery like this:
$('p').unwrap();
If you can set an id or class on the p element, you can target it more accurately.
The content is removed because you replace the entire matched string with an empty string. Use a subpattern to capture the content of the <div> element and use it as replacement:
$content = preg_replace(
'#<div[^>]*class="some-class"[^>]*>(.*?)</div>#is',
'$1',
$content
);
However, be aware that it won't work properly if the content of <div class="some-class"> contains a <div> element.
Regex cannot reliably parse HTML. The correct solution is to use an HTML parser (DOMDocument, for example) to parse the HTML fragment and build its DOM, then make the changes on the DOM and render it back to HTML.
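A minimal sketch of that DOM-based approach follows. It assumes the fragment is ASCII or already entity-encoded (DOMDocument::loadHTML needs extra care with other encodings) and that the class name contains no quote characters. The function name unwrap_divs_by_class and the temporary id unwrap-root are hypothetical, introduced only for this example.

```php
<?php
// Hypothetical helper: removes every <div> carrying the given class
// but keeps its children, including nested <div> elements.
function unwrap_divs_by_class(string $html, string $class): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about the fragment instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a known root so it can be found again afterwards.
    $doc->loadHTML('<div id="unwrap-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the class attribute.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    foreach (iterator_to_array($xpath->query($query)) as $div) {
        // Move each child in front of the wrapper, then drop the empty wrapper.
        while ($div->firstChild !== null) {
            $div->parentNode->insertBefore($div->firstChild, $div);
        }
        $div->parentNode->removeChild($div);
    }

    // Serialize only the children of the temporary root.
    $root = $xpath->query('//div[@id="unwrap-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

$content = '<div class="some-class"><p>content</p><div>nested</div></div>';
$unwrapped = unwrap_divs_by_class($content, 'some-class');
echo $unwrapped, PHP_EOL;
```

Unlike the regex version, this keeps working when the wrapped content itself contains a <div>. In WordPress, a function like this could be called from the same the_content filter shown in the question.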

WordPress: retain formatting when calling extended content?

I am pulling in content in WordPress via the code below. Essentially, I am dividing the content of the post into three sections: 1. before the <!--more--> tag, 2. after the tag, and 3. the post gallery. The code I have so far gets the content correctly, but all formatting tags (p in particular) are being stripped. Is there a way to retain these?
Thanks
<?php
// Fetch post content
$content = get_post_field( 'post_content', get_the_ID() );
// Get content parts
$content_parts = get_extended( $content );
?>
<p>
<?php echo $content_parts['main']; // Output content before <!--more--> ?>
</p>
<p class="read-more">
<?php echo strip_shortcodes($content_parts['extended']); // Output content after <!--more--> ?>
</p>
<button>Read More</button>
<?php $gallery = get_post_gallery_images( $post ); ?>
When you pull the post content using get_post_field, the wpautop filter is not applied:
http://codex.wordpress.org/Function_Reference/wpautop
You can apply all of the content filters yourself by adding this line after you set $content:
$content = apply_filters('the_content', $content);
