Truncate WordPress post content without losing HTML formatting


I'm working on a WordPress theme where I need to truncate the post at a certain number of words. I know how to use the_excerpt(); however, it strips out all the paragraph breaks, links, etc., which is NOT the desired effect. I tried using jQuery Succinct and applying that to the_content(). That maintained the formatting, but it cut off in the middle of a paragraph, so I had an open <p> that then broke the rest of the layout. The client does not want to use the option to manually insert a "more" tag into the post.
Is there a way I can do this either via PHP or jQuery?

You have to create your own excerpt function. I have written one that keeps all HTML tags intact and also cuts the excerpt at the end of a sentence, just after the chosen number of words.
You need to remove the original excerpt filter first and add your new one. Add this to your functions.php:
remove_filter('get_the_excerpt', 'wp_trim_excerpt');
add_filter('get_the_excerpt', 'pietergoosen_custom_wp_trim_excerpt');
Now add this below it:
function pietergoosen_custom_wp_trim_excerpt($pietergoosen_excerpt) {
    global $post;
    $raw_excerpt = $pietergoosen_excerpt;
    if ( '' == $pietergoosen_excerpt ) {
        $pietergoosen_excerpt = get_the_content('');
        $pietergoosen_excerpt = strip_shortcodes( $pietergoosen_excerpt );
        $pietergoosen_excerpt = apply_filters('the_content', $pietergoosen_excerpt);
        $pietergoosen_excerpt = str_replace(']]>', ']]&gt;', $pietergoosen_excerpt);
        //Set the excerpt word count and only break after sentence is complete.
        $excerpt_word_count = 75;
        // Let themes/plugins change the length through the excerpt_length filter.
        $excerpt_length = apply_filters('excerpt_length', $excerpt_word_count);
        $tokens = array();
        $excerptOutput = '';
        $count = 0;
        // Divide the string into tokens; HTML tags, or words, followed by any whitespace
        preg_match_all('/(<[^>]+>|[^<>\s]+)\s*/u', $pietergoosen_excerpt, $tokens);
        foreach ($tokens[0] as $token) {
            if ($count >= $excerpt_length && preg_match('/[\?\.\!]\s*$/uS', $token)) {
                // Limit reached, continue until ? . or ! occur at the end
                $excerptOutput .= trim($token);
                break;
            }
            // Add words to complete sentence
            $count++;
            // Append what's left of the token
            $excerptOutput .= $token;
        }
        $pietergoosen_excerpt = trim(force_balance_tags($excerptOutput));
        $excerpt_end = ' ' . ' » ' . sprintf(__( 'Read more about: %s »', 'pietergoosen' ), get_the_title()) . '';
        $excerpt_more = apply_filters('excerpt_more', ' ' . $excerpt_end);
        $pos = strrpos($pietergoosen_excerpt, '</');
        if ($pos !== false) {
            // Just before the last closing HTML tag
            $pietergoosen_excerpt = substr_replace($pietergoosen_excerpt, $excerpt_more, $pos, 0);
        } else {
            // After the content
            $pietergoosen_excerpt .= $excerpt_more;
        }
        return $pietergoosen_excerpt;
    }
    return apply_filters('pietergoosen_custom_wp_trim_excerpt', $pietergoosen_excerpt, $raw_excerpt);
}
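To see the token loop on its own, outside WordPress, here is a minimal stand-alone sketch. It assumes force_balance_tags() is not available, so it closes open tags with a simple stack instead. The name truncate_html_words() is hypothetical, tags are not counted as words, it stops at exactly $limit words rather than finishing the sentence, and it does not handle comments, <script> blocks or malformed markup.

```php
<?php
// Hypothetical stand-alone sketch (no WordPress): keep the first $limit words
// of an HTML fragment and close any tags left open, innermost first.
function truncate_html_words(string $html, int $limit): string
{
    // Void elements never get a closing tag.
    $void  = ['area', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'source', 'wbr'];
    $open  = [];   // stack of tag names that are still open
    $out   = '';
    $count = 0;
    // Same tokenizer as above: an HTML tag or a word, plus trailing whitespace.
    preg_match_all('/(<[^>]+>|[^<>\s]+)\s*/u', $html, $tokens);
    foreach ($tokens[0] as $token) {
        if ($count >= $limit) {
            break;                     // word budget used up
        }
        if ($token[0] === '<') {
            if (preg_match('#^<(/?)([a-zA-Z][a-zA-Z0-9]*)#', $token, $m)) {
                $name = strtolower($m[2]);
                if ($m[1] === '/') {
                    if (end($open) === $name) {
                        array_pop($open);          // matching close tag
                    }
                } elseif (!in_array($name, $void, true) && substr(rtrim($token), -2) !== '/>') {
                    $open[] = $name;               // newly opened tag
                }
            }
            $out .= $token;            // tags are copied but not counted
            continue;
        }
        $out .= $token;
        $count++;
    }
    // Close whatever is still open, innermost first.
    $out = rtrim($out);
    foreach (array_reverse($open) as $name) {
        $out .= '</' . $name . '>';
    }
    return $out;
}

echo truncate_html_words('<p>One <a href="#">two three</a> four.</p><p>Five</p>', 3);
// <p>One <a href="#">two three</a></p>
```

Checking the budget before each token (not only before words) means closing tags that follow the last word are dropped and then re-added by the stack, so no empty <p></p> is left behind.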

Related

End excerpts with a full sentence for specific post types

I am trying to make excerpts end with a sentence for a specific post type on my website, but for some reason it is also affecting page excerpts and I cannot understand why.
function vhr_variable_length_excerpt($text, $w_length, $finish_sentence){
    global $post;
    if ( $post->post_type == 'poi' ) {
        //Word length of the excerpt. This is exact or NOT depending on your '$finish_sentence' variable.
        $w_length = 20; /* Change the Length of the excerpt. The Length is in words. */
        //1 if you want to finish the sentence of the excerpt (No weird cuts).
        $finish_sentence = 1; // Put 0 if you do NOT want to finish the sentence.
        $tokens = array();
        $out = '';
        $word = 0;
        //Divide the string into tokens; HTML tags, or words, followed by any whitespace.
        $regex = '/(<[^>]+>|[^<>\s]+)\s*/u';
        preg_match_all($regex, $text, $tokens);
        foreach ($tokens[0] as $t){
            //Parse each token
            if ($word >= $w_length && !$finish_sentence){
                //Limit reached
                break;
            }
            if ($t[0] != '<'){
                //Token is not a tag.
                //Regular expression that checks for the end of the sentence: '.', '?' or '!'
                $regex1 = '/[\?\.\!]\s*$/uS';
                if ($word >= $w_length && $finish_sentence && preg_match($regex1, $t) == 1){
                    //Limit reached, continue until ? . or ! occur to reach the end of the sentence.
                    $out .= trim($t);
                    break;
                }
                $word++;
            }
            //Append what's left of the token.
            $out .= $t;
        }
        return trim(force_balance_tags($out));
    }
}
function vhr_excerpt_filter($text){
    global $post;
    if ( $post->post_type == 'poi' ) {
        //Get the full content and filter it.
        $text = get_the_content('');
        $text = strip_shortcodes( $text );
        $text = apply_filters('the_content', $text);
        $text = str_replace(']]>', ']]&gt;', $text);
        //If you want to Allow SOME tags:
        $allowed_tags = '<p>,<a>,<strong>,<b>'; /* Here I am allowing p, a, strong tags. Separate tags by comma. */
        $text = strip_tags($text, $allowed_tags);
        //Create the excerpt.
        $text = vhr_variable_length_excerpt($text, $w_length, $finish_sentence);
        return $text;
    }
}
//Hooks the 'vhr_excerpt_filter' function to a specific (get_the_excerpt) filter action.
add_filter('get_the_excerpt', 'vhr_excerpt_filter', 5);
It doesn't affect any of my other custom post types, just the one I specify in the function, and then all pages on the website. It still affects my pages, even if I change the logic to something like = poi && != page. Any ideas why this would also be affecting pages? Is there an easier way to make this happen?
Thanks!

There are many ways to approach this. Taking the time to write a custom excerpt function instead of relying on WordPress is one of them...
We can count sentences by splitting on the end-of-sentence period.
function get_sentence_tally_excerpt( $content = '', $tally = 2, $stitches = '' ) {
    $buffer = array_slice( explode( '.', sanitize_text_field( $content ) ), 0, $tally );
    $filter = array_filter( array_map( 'trim', $buffer ), 'strlen' );
    $excerpt = join( '. ', $filter ) . '.';
    return esc_attr( $excerpt . $stitches );
}
You can specify which content should be 'truncated' and by how many sentences. On the front end, call get_sentence_tally_excerpt() like so (the comments sit inside the PHP tags so they are not printed):
<?= get_sentence_tally_excerpt( get_the_content() ); // 2 sentences by default ?>
<?= get_sentence_tally_excerpt( get_the_content(), 1 ); // 1 sentence only ?>
<?= get_sentence_tally_excerpt( get_the_content(), 5, '[...]' ); // 5 sentences only, with stitches at the end ?>
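explode('.') only recognises periods, and it also splits decimals such as 3.5. As a hedged alternative, assuming plain-text output is acceptable (it already is here, since sanitize_text_field() strips tags), preg_split() can split after ".", "!" or "?" when followed by whitespace. The helper name first_sentences() is hypothetical and uses only core PHP:

```php
<?php
// Hypothetical helper: return the first $tally sentences of $text as plain text.
// preg_split() treats ".", "!" and "?" followed by whitespace as sentence ends
// and keeps the punctuation, so "3.5" is not split (abbreviations like "e.g." still are).
function first_sentences(string $text, int $tally = 2, string $stitches = ''): string
{
    // Drop tags and collapse whitespace, roughly what sanitize_text_field() does.
    $plain = trim(preg_replace('/\s+/u', ' ', strip_tags($text)));
    $sentences = preg_split('/(?<=[.!?])\s+/u', $plain, -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_slice($sentences, 0, $tally)) . $stitches;
}

echo first_sentences('<p>One. Two? Three!</p>', 2, ' [...]'); // One. Two? [...]
```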

The function processes only the last pattern sought. What is wrong?

/**
* Quick Links for ACF
*/
function replace_text($content) {
    $quick_links = get_field('quick_links', 'option');
    if ($quick_links && is_singular('post')) {
        foreach ($quick_links as $item) {
            $word = $item['word_quick_links'];
            $link = $item['link_quick_links'];
            $preg_replace = preg_replace('/\b'.preg_quote($word, '/').'\b/', '<a href="' . $link . '">' . $word . '</a>', $content, 1);
        }
        return $preg_replace;
    } else {
        return $content;
    }
}
add_filter('the_content', 'replace_text', 20 );
In the preg_replace() function, the last argument is limit: the maximum possible number of replacements for each pattern in each subject string. By default it is -1 (no limit).
What is my mistake? Why does the function process only the last pattern it looks for?

In your loop, which replaces the text in the content, you always start from the original text ($content) and store the result in a new string ($preg_replace)...
$preg_replace = preg_replace('/\b'.preg_quote($word, '/').'\b/', '<a href="' . $link . '">' . $word . '</a>', $content, 1);
You should instead put the result back into the original content, so that each iteration adds to the previous replacements rather than starting from a fresh copy (so put the new value back into $content)...
$content = preg_replace('/\b'.preg_quote($word, '/').'\b/',
    '<a href="' . $link . '">' . $word . '</a>',
    $content, 1);
and then return that value. Since $content is unchanged when there are no quick links, you can simply return $content in both branches:
return $content;
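The same fix in a stand-alone form, without ACF or WordPress: a sketch where the function name link_first_occurrences() and the word-to-URL array are hypothetical stand-ins for the quick_links field. The key line is the assignment back to $content inside the loop.

```php
<?php
// Hypothetical stand-alone version of the corrected loop: each preg_replace()
// call works on the output of the previous one, so all replacements survive.
function link_first_occurrences(string $content, array $links): string
{
    foreach ($links as $word => $url) {
        $pattern = '/\b' . preg_quote($word, '/') . '\b/';
        $anchor  = '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">' . $word . '</a>';
        // Assign back to $content (not to a new variable) so the next iteration builds on it.
        $content = preg_replace($pattern, $anchor, $content, 1);
    }
    return $content;
}

echo link_first_occurrences('PHP and WordPress, PHP again.', [
    'PHP'       => 'https://www.php.net/',
    'WordPress' => 'https://wordpress.org/',
]);
// <a href="https://www.php.net/">PHP</a> and <a href="https://wordpress.org/">WordPress</a>, PHP again.
```

A later word can still match text inside an anchor added by an earlier iteration (for example part of a URL), so the order of the words matters.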

Insert text in content after 300 words but after closing tag of a Paragraph

I am looking for a way to insert an ad or text after X amount of words and after the closing tag of the paragraph the last word appears in.
So far, I have only been able to do this after X number of characters. The problem with this approach is that the characters of the HTML tags are counted too, which gives inaccurate results.
function chars1($content) {
    // only inject google ads if post is longer than 2500 characters
    $enable_length1 = 2500;
    // insert after the 2100th character
    $after_character1 = 2100;
    if (is_single() && strlen($content) > $enable_length1) {
        $before_content1 = substr($content, 0, $after_character1);
        $after_content1 = substr($content, $after_character1);
        $after_content1 = explode('</p>', $after_content1);
        ob_start();
        dynamic_sidebar('single-image-ads-1');
        $text1 = ob_get_contents();
        ob_end_clean();
        array_splice($after_content1, 1, 0, $text1);
        $after_content1 = implode('', $after_content1);
        return $before_content1 . $after_content1;
    } else {
        return $content;
    }
}
//add filter to WordPress with priority 49
add_filter('the_content', 'chars1',49);
Another approach I have tried is using:
strip_tags($content)
and counting the words using:
str_word_count()
The problem with this is that I then have no way of returning the $content with its HTML tags.
Depending on the size of the post, I will insert up to 5 ad units. With the functions above, I would need to create a function for each ad. If there is a way to insert all 5 ads using one function, that would be great.
Any help is appreciated.

Deciding what is or isn't a word can often be very hard. But if you're fine with an approximate solution, such as defining a word as text between two whitespace characters, I suggest you implement a simple function yourself.
This can be done by iterating over the characters of the string until the desired number of words (300 in your case) has been counted, and then jumping to the end of the current paragraph. Insert an ad there, and repeat until you've added enough ads.
Implementing this in your function might look like this:
function chars1($content) {
    // only inject google ads if post is longer than 2500 characters
    $enable_length1 = 2500;
    // Insert at the end of the paragraph every 300 words
    $after_word1 = 300;
    // Maximum of 5 ads
    $max_ads = 5;
    if (strlen($content) > $enable_length1) {
        $len = strlen($content);
        $i = 0;
        // Keep adding until the end of the content or until $max_ads ads have been inserted
        while ($i < $len && $max_ads-- > 0) {
            // Work our way up to the appropriate length
            $word_count = 0;
            $in_tag = false;
            while (++$i < $len && $word_count < $after_word1) {
                if (!$in_tag && ctype_space($content[$i])) {
                    // Whitespace
                    $word_count++;
                }
                else if (!$in_tag && $content[$i] == '<') {
                    // Begin tag
                    $in_tag = true;
                    $word_count++;
                }
                else if ($in_tag && $content[$i] == '>') {
                    // End tag
                    $in_tag = false;
                }
            }
            // Find the next '</p>'
            $i = strpos($content, "</p>", $i);
            if ($i === false) {
                // No more paragraph endings
                break;
            }
            else {
                // Add the length of </p>
                $i += 4;
                // Get ad as string
                ob_start();
                dynamic_sidebar('single-image-ads-1');
                $ad = ob_get_contents();
                ob_end_clean();
                $content = substr($content, 0, $i) . $ad . substr($content, $i);
                // Skip past the inserted ad and keep $len in sync with the longer content
                $i += strlen($ad);
                $len += strlen($ad);
            }
        }
    }
    return $content;
}
With this approach, it's easy to add new rules.
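Because the loop above inserts while it scans, it must keep $i and the string length in sync. A hedged alternative is to collect all insertion points first and then insert from the last one backwards, so earlier offsets never move. In this sketch the helper names ad_insert_offsets() and insert_ads() are hypothetical, the ad is passed in as a string (in WordPress you would capture the dynamic_sidebar() output with output buffering, as above), and words are counted only outside tags:

```php
<?php
// Hypothetical helper: byte offsets just after the "</p>" that closes the
// paragraph in which every $every-th word falls, at most $max of them.
// A "word" is a run of non-whitespace characters outside of tags.
function ad_insert_offsets(string $html, int $every, int $max): array
{
    $offsets = [];
    $words   = 0;
    $inTag   = false;
    $inWord  = false;
    $len     = strlen($html);
    for ($i = 0; $i < $len && count($offsets) < $max; $i++) {
        $c = $html[$i];
        if ($inTag) {
            $inTag = ($c !== '>');     // stay in the tag until its closing '>'
            continue;
        }
        if ($c === '<') {
            $inTag  = true;            // tags never count as words
            $inWord = false;
            continue;
        }
        if (ctype_space($c)) {
            $inWord = false;           // whitespace ends the current word
            continue;
        }
        if (!$inWord) {
            $inWord = true;            // first character of a new word
            $words++;
        }
        if ($words >= $every) {
            // Jump to the end of the paragraph this word belongs to.
            $end = strpos($html, '</p>', $i);
            if ($end === false) {
                break;                 // no closing </p> left
            }
            $offsets[] = $end + 4;     // position right after "</p>"
            $i      = $end + 3;        // the loop's $i++ then moves past "</p>"
            $words  = 0;
            $inWord = false;
        }
    }
    return $offsets;
}

// Insert $ad at every offset, working backwards so earlier offsets stay valid.
function insert_ads(string $html, string $ad, int $every, int $max): string
{
    foreach (array_reverse(ad_insert_offsets($html, $every, $max)) as $pos) {
        $html = substr($html, 0, $pos) . $ad . substr($html, $pos);
    }
    return $html;
}

echo insert_ads('<p>one two three</p><p>four five</p><p>six</p>', '[AD]', 2, 5);
// <p>one two three</p>[AD]<p>four five</p>[AD]<p>six</p>
```

Separating "where" from "insert" also makes the rule for 5 ads a single $max argument instead of one function per ad.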

I've just had to do this myself, and this is how I did it. First, explode the content on </p> tags. Loop over the resulting array, put the closing </p> back onto each paragraph, count the words in the paragraph with the tags stripped, and add that to a running total. Compare the running total against each word position: if it's greater than or equal, append the content to insert and unset that word position. Finally, stringify and return.
function insert_after_words( $content, $words_positions = array(), $content_to_insert = 'Insert Me' ) {
    $total_words_count = 0;
    // Explode content on paragraphs.
    $content_exploded = explode( '</p>', $content );
    $last_key = count( $content_exploded ) - 1;
    foreach ( $content_exploded as $key => $paragraph ) {
        // Put the closing paragraph tag back (the last piece never had one).
        if ( $key !== $last_key ) {
            $content_exploded[ $key ] .= '</p>';
        }
        $total_words_count += str_word_count( strip_tags( $content_exploded[ $key ] ) );
        // Check the total word count against the word positioning.
        foreach ( $words_positions as $words_key => $words_count ) {
            if ( $total_words_count >= $words_count ) {
                $content_exploded[ $key ] .= PHP_EOL . $content_to_insert;
                unset( $words_positions[ $words_key ] );
            }
        }
    }
    // Stringify content.
    return implode( '', $content_exploded );
}

Place content in between paragraphs without images

I am using the following code to place some ad code inside my content.
<?php
$content = apply_filters('the_content', $post->post_content);
$content = explode (' ', $content);
$halfway_mark = ceil(count($content) / 2);
$first_half_content = implode(' ', array_slice($content, 0, $halfway_mark));
$second_half_content = implode(' ', array_slice($content, $halfway_mark));
echo $first_half_content.'...';
echo ' YOUR ADS CODE';
echo $second_half_content;
?>
How can I modify this so that neither of the two paragraphs enclosing the ad code (the one above and the one below) contains an image? If either of them has an image, it should try the next two paragraphs.
Example: Correct Implementation on the right.

preg_replace version
This code steps through every paragraph and checks each one for image tags. The $pcount variable is incremented for every paragraph found without an image; if an image is encountered, $pcount is reset to zero. When $pcount would reach two, the advert markup is inserted just before the current paragraph, which leaves it between two image-free paragraphs. The advert markup variable is then emptied so that only one advert is inserted.
The following code is just for setup and could be modified to split the content differently. You could also modify the regular expression it uses, in case you delimit your paragraphs with double BRs or something else.
/// set our advert content
$advert = '<marquee>BUY THIS STUFF!!</marquee>' . "\n\n";
/// calculate mid point
$mpoint = floor(strlen($content) / 2);
/// modify back to the start of a paragraph
$mpoint = strripos($content, '<p', -$mpoint);
/// split html so we only work on second half
$first = substr($content, 0, $mpoint);
$second = substr($content, $mpoint);
$pcount = 0;
$regexp = '/<p>.+?<\/p>/si';
The rest is the bulk of the code that runs the replacement. This could be modified to insert more than one advert, or to support more involved image checking.
$content = $first . preg_replace_callback($regexp, function($matches) use (&$pcount, &$advert) {
    // $pcount and $advert are captured by reference; `global` would only see
    // them if this code ran in the global scope.
    if ( !$advert ) {
        $return = $matches[0];
    }
    else if ( stripos($matches[0], '<img ') !== FALSE ) {
        $return = $matches[0];
        $pcount = 0;
    }
    else if ( $pcount === 1 ) {
        $return = $advert . $matches[0];
        $advert = '';
    }
    else {
        $return = $matches[0];
        $pcount++;
    }
    return $return;
}, $second);
After this code has been executed the $content variable will contain the enhanced HTML.
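A caveat about state: a `global` statement inside the callback only reaches true global variables. If this code runs inside a function (WordPress loads theme templates from inside functions such as load_template()), $pcount and $advert are local there and a `global` callback will not see them. Since PHP 5.3 a closure can capture them by reference with `use`. A stand-alone sketch, where the function name insert_advert_between_plain_paragraphs() is hypothetical:

```php
<?php
// Hypothetical stand-alone version of the callback above, using a closure
// that captures its state by reference instead of relying on `global`.
function insert_advert_between_plain_paragraphs(string $html, string $advert): string
{
    $pcount = 0;
    return preg_replace_callback(
        '/<p>.+?<\/p>/si',
        function (array $m) use (&$pcount, &$advert): string {
            if ($advert === '') {
                return $m[0];                 // advert already placed
            }
            if (stripos($m[0], '<img ') !== false) {
                $pcount = 0;                  // image paragraph: start counting again
                return $m[0];
            }
            if ($pcount === 1) {
                $out = $advert . $m[0];       // previous and current paragraphs are image-free
                $advert = '';
                return $out;
            }
            $pcount++;
            return $m[0];
        },
        $html
    );
}

echo insert_advert_between_plain_paragraphs('<p>A</p><p><img src="x.png"></p><p>B</p><p>C</p>', '[AD]');
// <p>A</p><p><img src="x.png"></p><p>B</p>[AD]<p>C</p>
```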
PHP versions prior to 5.3
As your chosen testing area does not support PHP 5.3, and so does not support anonymous functions, you need to use a slightly modified and less succinct version that uses a named function instead.
Also, to support content that may not leave space for the advert in its second half, I have modified $mpoint so that it is calculated 80% from the end. This puts more of the content into the $second part, but it also means your adverts will generally be placed higher up in the markup. This code has no fallback, because your question does not say what should happen in the event of failure.
$advert = '<marquee>BUY THIS STUFF!!</marquee>' . "\n\n";
$mpoint = floor(strlen($content) * 0.8);
$mpoint = strripos($content, '<p', -$mpoint);
$first = substr($content, 0, $mpoint);
$second = substr($content, $mpoint);
$pcount = 0;
$regexp = '/<p>.+?<\/p>/si';
function replacement_callback($matches){
    // $pcount and $advert must be true global variables for `global` to reach them.
    global $pcount, $advert;
    if ( !$advert ) {
        $return = $matches[0];
    }
    else if ( stripos($matches[0], '<img ') !== FALSE ) {
        $return = $matches[0];
        $pcount = 0;
    }
    else if ( $pcount === 1 ) {
        $return = $advert . $matches[0];
        $advert = '';
    }
    else {
        $return = $matches[0];
        $pcount++;
    }
    return $return;
}
echo $first . preg_replace_callback($regexp, 'replacement_callback', $second);

You could try this:
<?php
$ad_code = 'SOME SCRIPT HERE';
// Your code.
$content = apply_filters('the_content', $post->post_content);
// Split the content at the <p> tags.
$content = explode('<p>', $content);
// Find the mid of the article.
$content_length = count($content);
$content_mid = (int) floor($content_length / 2);
// Save no image p's index.
$last_no_image_p_index = NULL;
// Loop beginning from the mid of the article to search for images.
for ($i = $content_mid; $i < $content_length; $i++) {
    // If we do not find an image, let it go down.
    if (stripos($content[$i], '<img') === FALSE) {
        // In case we already have a last no image p, we check
        // if it was the one right before this one, so we have
        // two p tags with no images in there.
        if ($last_no_image_p_index === ($i - 1)) {
            // We break here.
            break;
        }
        else {
            $last_no_image_p_index = $i;
        }
    }
}
// If no image-free p tag was found, we use the last one.
if (is_null($last_no_image_p_index)) {
    $last_no_image_p_index = ($content_length - 1);
}
// Add ad code right after that p, with a trailing </p>, so the implode later will work correctly.
// array_splice() inserts in place; array_slice() would only return a copy.
array_splice($content, $last_no_image_p_index + 1, 0, $ad_code . '</p>');
$content = implode('<p>', $content);
?>
It will try to find a place for the ad from the mid of your article and if none is found the ad is put to the end.
Regards
func0der

I think this will work:
First explode the paragraphs, then loop over them and check whether each one contains an img.
If it does, try the next one.
Think of this as pseudo-code, since it's not tested. You will have to write a loop too (see the comments in the code) :) Sorry if it contains bugs, it's written in Notepad.
<?php
$arrBoolImg = array(); // array for the paragraph booleans
$content = apply_filters('the_content', $post->post_content);
$contents = str_replace('<p>', '<explode><p>', $content); // here we add a custom tag, so we can explode
$contents = explode('<explode>', $contents); // then explode it, so we can iterate the paragraphs
// fill array with boolean array returned
$arrBoolImg = hasImages($contents);
$halfway_mark = ceil(count($contents) / 2);
/*
TODO (by you):
---
When you have $arrBoolImg filled, you can iterate through it.
You then simply loop from the middle of the array $contents (plural), that is exploded from above.
The starting point for your loop is the middle, the upper bound is +2 or whatever :-)
Then you simply insert your magic... And then glue it back together, as you did before.
I think this will work, even though the code may have some bugs, since I wrote it in Notepad.
*/
function hasImages($contents) {
    /*
    This function loops through the $contents array and checks whether each paragraph has images in it.
    The return value is an array with boolean values, so one can iterate through it.
    */
    $arrRet = array(); // array for the paragraph booleans
    foreach ($contents as $i => $v) { // iterate the content
        // true if the paragraph contains an <img> tag, false otherwise
        $arrRet[$i] = (strpos($v, '<img') !== false);
    } // end for each loop
    return $arrRet;
} // end hasImages func
?>

[This is just an idea, I don't have enough reputation to comment...]
After calling Olavxxx's method and filling your boolean array, you could loop through that array in an alternating manner, starting in the middle. Let's assume your array is 8 entries long. Calculating the middle using your method gives 4. So you check the pair 4 + 3; if that doesn't work, you check 4 + 5, then 3 + 2, ...
So your loop looks somewhat like this:
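$middle = (int) ceil(count($content) / 2);
$i = 1;
while ($i <= 2 * $middle) {
    // Pairs (j, j + 1) start at middle-1, middle, middle-2, middle+1, ...
    // (PHP's ^ is bitwise XOR, not a power, so it cannot be used to alternate signs.)
    $j = ($i % 2 === 1) ? $middle - intdiv($i + 1, 2) : $middle + intdiv($i, 2) - 1;
    $k = $j + 1;
    if (isset($hasImagesArray[$j], $hasImagesArray[$k]) && !$hasImagesArray[$j] && !$hasImagesArray[$k])
        break; // position found
    $i++;
}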
It's up to you to implement further constraints to make sure the ad is not shown too far up or down in the article...
Please note that you also need to handle special cases such as arrays that are too short. PHP has no IndexOutOfBounds exceptions, but reading a missing index raises an undefined-index warning, which is what the isset() check above guards against.
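The alternating search can be wrapped in a function that returns a result instead of leaving it in loop variables. This sketch assumes the input is a zero-indexed list of booleans, one per paragraph, as returned by hasImages(); the name find_ad_slot() is hypothetical. It returns the index of the first paragraph of an image-free pair (so the ad goes right after that paragraph), or null if no such pair exists.

```php
<?php
// Hypothetical helper: search pairs of neighbouring paragraphs outwards from
// the middle and return the index of the first image-free pair, or null.
function find_ad_slot(array $has_images): ?int
{
    $n = count($has_images);
    if ($n < 2) {
        return null;                      // too short to contain a pair
    }
    $middle = (int) ceil($n / 2);
    // Visit pairs (j, j + 1) in the order middle-1, middle, middle-2, middle+1, ...
    for ($i = 1; $i <= 2 * $middle; $i++) {
        $j = ($i % 2 === 1) ? $middle - intdiv($i + 1, 2) : $middle + intdiv($i, 2) - 1;
        $k = $j + 1;
        if ($j < 0 || $k >= $n) {
            continue;                     // pair falls outside the array
        }
        if (!$has_images[$j] && !$has_images[$k]) {
            return $j;
        }
    }
    return null;
}

var_dump(find_ad_slot([false, true, false, true, false, false, true, false])); // int(4)
```

For 8 entries the pairs are visited as (3,4), (4,5), (2,3), (5,6), (1,2), (6,7), (0,1), which matches the 4 + 3, 4 + 5, 3 + 2 order described above when counting from zero.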

Wordpress Archives Widget - Customize html output

I'm still pinned against WordPress, it seems. I added the 'Archives' widget to my sidebar and, once more, the HTML output is crap. It basically has this structure:
<li>text - (# of posts)</li>
I want to transform it into:
<li>text <small># of posts</small></li>
Unlike with plugins, however, I wasn't able to find the line that creates the HTML output in the PHP files suggested or mentioned by the WordPress community, namely functions.php, widgets.php and default-widgets.php.
I've googled every possible keyword combination on the matter and I was unable to find something relevant.
All help is appreciated
Regards
G.Campos

Check out general-template.php, specifically the two functions wp_get_archives and get_archives_link. You'd have to hack wp_get_archives to change what gets loaded into $text. The post count gets loaded into the $after variable, which is placed outside the link in get_archives_link. Instead of this:
$text = sprintf(__('%1$s %2$d'), $wp_locale->get_month($arcresult->month), $arcresult->year);
if ( $show_post_count )
    $after = ' ('.$arcresult->posts.')' . $afterafter;
something like this:
$text = sprintf(__('%1$s %2$d'), $wp_locale->get_month($arcresult->month), $arcresult->year);
if ( $show_post_count )
    $text= $text.' <small>'.$arcresult->posts.'</small>';
That's just for the Monthly archive. You'd have to make modifications on the Yearly, Weekly and Daily blocks.
Edit: The easiest way to keep the <small> element out of the link's title attribute is to load the plain text into a separate variable in each block and then pass it into a modified get_archives_link. In the example above, right after $text is loaded, copy that value into $title:
$text = sprintf(__('%1$s %2$d'), $wp_locale->get_month($arcresult->month), $arcresult->year);
$title = $text;
if ( $show_post_count )
    $text= $text.' <small>'.$arcresult->posts.'</small>';
$output .= get_archives_link($url, $text, $format, $before, $after, $title);
Then modify get_archives_link:
function get_archives_link($url, $text, $format = 'html', $before = '', $after = '', $title = '') {
    $text = wptexturize($text);
    if ($title == '')
        $title = $text;
    $title_text = esc_attr($title);
    $url = esc_url($url);
    if ('link' == $format)
        $link_html = "\t<link rel='archives' title='$title_text' href='$url' />\n";
    elseif ('option' == $format)
        $link_html = "\t<option value='$url'>$before $text $after</option>\n";
    elseif ('html' == $format)
        $link_html = "\t<li>$before<a href='$url' title='$title_text'>$text</a>$after</li>\n";
    else // custom
        $link_html = "\t$before<a href='$url' title='$title_text'>$text</a>$after\n";
    $link_html = apply_filters( "get_archives_link", $link_html );
    return $link_html;
}

Add this code to your theme's functions.php file. It replaces the parentheses around each archive's post count with a span tag (placed inside the link); you can change the markup to suit your requirements.
function wrap_archive_count($links) {
    $links = str_replace('</a> (', '<span class="archive-count">', $links);
    $links = str_replace(')', '</span></a>', $links);
    return $links;
}
add_filter('get_archives_link', 'wrap_archive_count');
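The second str_replace() above rewrites every ")" in the link HTML, including any inside an archive title. A stricter sketch only touches the "(N)" that directly follows the closing </a>. It assumes the count is separated from the link by either a plain space or &nbsp; (the exact separator depends on the WordPress version), keeps the count outside the link text, and the name wrap_archive_count_strict() is hypothetical; it would be registered with add_filter('get_archives_link', ...) in the same way.

```php
<?php
// Hypothetical stricter variant of wrap_archive_count(): only the "(N)"
// right after the closing </a> is rewritten, so parentheses inside archive
// titles are left alone. Assumes a plain space or &nbsp; before the count.
function wrap_archive_count_strict(string $link_html): string
{
    return preg_replace(
        '#</a>(?:&nbsp;|\s)\(([^()<]+)\)#',
        '</a> <span class="archive-count">$1</span>',
        $link_html
    );
}

echo wrap_archive_count_strict("<li><a href='/2020/01/'>January 2020 (draft)</a>&nbsp;(5)</li>");
// <li><a href='/2020/01/'>January 2020 (draft)</a> <span class="archive-count">5</span></li>
```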
