I am modifying a piece of code whose purpose is to pick the first 90 characters from the body of a post. I have managed to get the text, including some punctuation characters.
My problem is that I do not know how to make the 90-character excerpt respect newlines. I want it to stop as soon as it encounters a line break. As it is now, it ignores line breaks and ends up including content from the next line/paragraph.
This is the code I am using:
$title_data = substr($postdata,0,90);
$title_data = preg_replace("/[^\w#&,\":; ]/",'', strip_tags($title_data));
$data['post_title'] = "F. Y. I - " . $title_data . " ...";
The right order is to run preg_replace() first, then pass the cleaned value to substr().
$title_data = preg_replace("/[^\w#&,\":; ]/",'', strip_tags($postdata));
$title_data = substr($title_data,0,90);
$data['post_title'] = "F. Y. I - " . $title_data . " ...";
Here's my two cents... It also makes sure words aren't truncated.
// Break the string after the first paragraph (if any)
$parts = explode('</p>', $postdata);
// Remove all HTML from the first element (which contains the full text if no paragraph exists)
$excerpt = strip_tags($parts[0]);
$ending = '...';
if (strlen($excerpt) > 90) {
// Check where the last space is, so we don't truncate any words.
$excerpt = substr($excerpt, 0, 90 - strlen($ending));
$excerpt = substr($excerpt, 0, strrpos($excerpt, ' '));
}
// Return the new string
$data['post_title'] = "F. Y. I - " . $excerpt . $ending;
A bit more complicated, but might help to get the result you're after:
// Use `wpautop` to work WP's paragraph-adding magic.
$rawText = wpautop($postdata);
// Remove all the opening `<p>` tags...
$preSplitContent = str_replace('<p>', '', $rawText);
// ...and then break into an array using the closing `</p>` tags.
// (hacky, but this gives you an array where each
// item is a paragraph/line from your content)
$splitContent = explode('</p>', $preSplitContent);
// Then run your preg_replace
// (because `$splitContent[0]` is only the first
// line of your text, you won't include any content
// from the other lines)
$firstLine = preg_replace("/[^\w#&,\":; ]/",'', strip_tags($splitContent[0]));
// Then trim the result down to the first 90 characters
$finalText = substr($firstLine,0,90);
$data['post_title'] = "F. Y. I - " . $finalText . " ...";
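For a site without `wpautop`, the same idea can be sketched in plain PHP. This is only a sketch under assumptions: the helper name `first_line_title` is hypothetical, and it assumes line breaks in `$postdata` are either literal newlines, `<br>` tags, or closing `</p>` tags.

```php
<?php
// Hypothetical helper: build a title from the first line of a post body.
function first_line_title(string $postdata, int $limit = 90): string
{
    // Turn block-level breaks into newlines before stripping the tags,
    // so that "<p>one</p><p>two</p>" does not collapse into "onetwo".
    $text = preg_replace('~<br\s*/?>|</p>~i', "\n", $postdata);
    $text = trim(strip_tags($text));

    // \R matches any line-break sequence (\n, \r\n, \r); keep the first line only.
    $firstLine = preg_split('/\R/', $text, 2)[0];

    // Same character whitelist as in the question.
    $firstLine = preg_replace('/[^\w#&,":; ]/', '', $firstLine);

    return substr($firstLine, 0, $limit);
}

$postdata = "Hello, world: this is line one\nThis is line two";
echo first_line_title($postdata); // prints the cleaned first line only
```

Splitting on the line break before truncating means the 90-character limit can never reach into the second line, whichever kind of break the post uses.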
I am trying to determine the absolute position of certain words within a block of html, but only if they are outside of an actual html tag. For instance, if I wanted to determine the position of the word "join" using preg_match in this text:
<p>There are 14 more days until our <a href="/somepage.html" target="_blank" rel="noreferrer noopener" aria-label="join us">holiday special</a> so come join us!</p>
I could use:
preg_match('/join/', $post_content, $matches, PREG_OFFSET_CAPTURE, $offset);
The problem is that this is matching the word within the aria-label attribute, when what I need is the one just after the link. It would be fine to match between the <a> and </a>, just not inside the brackets themselves.
My actual end goal, most of which (I think) I have working apart from this last piece: I am trimming a block of HTML (not a full document) to cut it off at a specific word count. I am trying to determine the character at which the last word ends, and then joining the left side of the HTML block with only the HTML tags from the right side, so that all tags close gracefully. I thought I had it working until I ran into an example like the one above, where the last word also appeared inside an HTML attribute, causing me to split the string at the wrong location. This is my code so far:
$post_content = strip_tags ( $p->post_content, "<a><br><p><ul><li>" );
$post_content_stripped = strip_tags ( $p->post_content );
$post_content_stripped = preg_replace("/[^A-Za-z0-9 ]/", ' ', $post_content_stripped);
$post_content_stripped = preg_replace("/\s+/", ' ', $post_content_stripped);
$post_content_stripped_array = explode ( " " , trim($post_content_stripped) );
$excerpt_wordcount = count( $post_content_stripped_array );
$cutpos = 0;
while($excerpt_wordcount>48){
$thiswordrev = "/" . strrev($post_content_stripped_array[$excerpt_wordcount - 1]) . "/";
preg_match($thiswordrev, strrev($post_content), $matches, PREG_OFFSET_CAPTURE, $cutpos);
$cutpos = $matches[0][1] + (strlen($thiswordrev) - 2);
array_pop($post_content_stripped_array);
$excerpt_wordcount = count( $post_content_stripped_array );
}
if($pwordcount>$excerpt_wordcount){
preg_match_all('/<\/?[^>]*>/', substr( $post_content, strlen($post_content) - $cutpos ), $closetags_result);
$excerpt_closetags = "" . $closetags_result[0][0];
$post_excerpt = substr( $post_content, 0, strlen($post_content) - $cutpos ) . $excerpt_closetags;
}else{
$post_excerpt = $post_content;
}
I am actually searching the string in reverse in this case, since I am walking word by word backwards from the end of the string, so I know that my html brackets are backwards, eg:
>p/<!su nioj emoc os >a/<laiceps yadiloh>"su nioj"=lebal-aira "renepoon rerreferon"=ler "knalb_"=tegrat "lmth.egapemos/"=ferh a< ruo litnu syad erom 41 era erehT>p<
But it's easy enough to flip all of the brackets before doing the preg_match, or I am assuming should be easy enough to have the preg_match account for that.
Do not use regex to parse HTML.
You have a simple objective: limit the text content to a given number of words, ensuring that the HTML remains valid.
To this end, I would suggest looping through text nodes until you count a certain number of words, and then removing everything after that.
$dom = new DOMDocument();
$dom->loadHTML($post_content);
$xpath = new DOMXPath($dom);
$all_text_nodes = $xpath->query("//text()");
$words_left = 48;
foreach( $all_text_nodes as $text_node) {
$text = $text_node->textContent;
$words = explode(" ", $text); // TODO: maybe preg_split on /\s/ to support more whitespace types
$word_count = count($words);
if( $word_count < $words_left) {
$words_left -= $word_count;
continue;
}
// reached the threshold
$words_that_fit = implode(" ", array_slice($words, 0, $words_left));
// If the above TODO is implemented, this will need to be adjusted to keep the specific whitespace characters
$text_node->textContent = $words_that_fit;
$remove_after = $text_node;
while( $remove_after->parentNode) {
while( $remove_after->nextSibling) {
$remove_after->parentNode->removeChild($remove_after->nextSibling);
}
$remove_after = $remove_after->parentNode;
}
break;
}
$output = substr($dom->saveHTML($dom->getElementsByTagName("body")->item(0)), strlen("<body>"), -strlen("</body>"));
Ok, I figured out a workaround. I don't know if this is the most elegant solution, so if someone sees a better one I would still love to hear it, but for now I realized that I don't have to actually have the html in the string I am searching to determine the position to cut, I just need it to be the same length. I grabbed all of the html elements and just created a dummy string replacing all of them with the same number of asterisks:
// create faux string with placeholders instead of html for search purposes
preg_match_all('/<\/?[^>]*>/', $post_content, $alltags_result);
$tagcount = count( $alltags_result );
$post_content_dummy = $post_content;
foreach($alltags_result[0] as $thistag){
$post_content_dummy = str_replace($thistag, str_repeat("*",strlen($thistag)), $post_content_dummy);
}
Then I just use $post_content_dummy in the while loop instead of $post_content, in order to find the cut position, and then $post_content for the actual cut. So far seems to be working fine.
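The masking step can also be done in a single pass with `preg_replace_callback`. The sketch below is an illustration, not the code actually used above: the helper name `mask_tags` and the sample HTML are assumptions, and the tag pattern (the same one as in the workaround) will misfire if an attribute value contains a `>` character.

```php
<?php
// Hypothetical helper: replace every tag with '*' characters of the same
// length, so character offsets in the masked copy match the original string.
function mask_tags(string $html): string
{
    return preg_replace_callback(
        '/<\/?[^>]*>/',
        function (array $m): string {
            return str_repeat('*', strlen($m[0]));
        },
        $html
    );
}

$html = '<p>Visit <a href="#" aria-label="join us">our special</a> and join us!</p>';
$masked = mask_tags($html);

// Search the masked copy; the offset is valid for the original as well.
if (preg_match('/\bjoin\b/', $masked, $m, PREG_OFFSET_CAPTURE)) {
    $offset = $m[0][1];
    echo substr($html, $offset, 4), "\n"; // the "join" outside the tag
}
```

Because each tag is replaced by a run of the same byte length, any offset found in `$masked` can be used directly with `substr()` on the original string.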
I have a bunch of files. I need to print out only the ITEM_DESCRIPTION: part of each file. Let's say the contents of each file look like this:
// ITEM_DESCRIPTION: Phone 1
// - Android 9.0 (Pie)
// - 64/128 GB
// - Li-Po 3500 mAh
I want the code to display like below
Phone 1
- Android 9.0 (Pie)
- 64/128 GB
- Li-Po 3500 mAh
so far, what I can produce is
// Phone 1 // - Android 9.0 (Pie) // - 64/128 GB // - Li-Po 3500 mAh
How can I replace each double slash with a new line?
Here is my code
// Get file path
// This code inside for loop which i don't write here
$filedir=$filelist[$i];
//Display Item Description
$search = "ITEM_DESCRIPTION";
$endsearch = "BRAND";
$contents = stristr(file_get_contents($filedir), $search);
$description = substr($contents, 0, stripos($contents, $endsearch));
$rmv_char = str_replace(str_split('\:'), ' ', $description);
$newline = str_replace(str_split('\//'), PHP_EOL , $rmv_char);
$phone_dscrpn = substr($newline, strlen($search));
Here is what I tried, but it doesn't work:
$newline = str_replace(str_split('\//'), PHP_EOL , $rmv_char);
$newline = str_replace(str_split('\//'), "\r\n" , $rmv_char);
It looks like you already have newlines in the original data so you don't need to add them again. You just need to clear the slashes and the spaces (or tabs if that's what they are, hard to tell on SO).
Remember if you're testing output in browser it won't show the newlines without <pre></pre>.
// Get file path
// This code inside for loop which i don't write here
$filedir=$filelist[$i];
//Display Item Description
$search = "ITEM_DESCRIPTION";
$endsearch = "BRAND";
$contents = stristr(file_get_contents($filedir), $search);
$description = substr($contents, 0, stripos($contents, $endsearch));
//Clear out the slashes
$phone_dscrpn = str_replace("//", "", $description);
//Clear out the spaces
while(strpos($phone_dscrpn,"  ")!==false) {
$phone_dscrpn = str_replace("  ", " ", $phone_dscrpn);
}
Note this will replace any double slashes or double spaces within the description. If this could be an issue then you will need to consider a more advanced approach (e.g. line by line).
Assuming that all of your lines begin with // and this pattern isn't used in the actual product description then you can use a simple regular expression:
$description = preg_replace('~//\s(ITEM_DESCRIPTION:)?\s+~', '', $description);
Match //\s where \s is any white-space
Optionally match ITEM_DESCRIPTION:
Match \s+: one or more white-space characters
This will give you:
Phone 1
- Android 9.0 (Pie)
- 64/128 GB
- Li-Po 3500 mAh
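A line-anchored variation is sketched below for the case where only a single space follows each `//`, as in the sample shown in the question. This pattern is an assumption of mine, not the one from the answer above. It uses `^` with the `m` flag so that only a leading `//` is removed.

```php
<?php
// Sample block as shown in the question (single space after each "//").
$description = "// ITEM_DESCRIPTION: Phone 1\n"
             . "// - Android 9.0 (Pie)\n"
             . "// - 64/128 GB\n"
             . "// - Li-Po 3500 mAh\n";

// ^ with the m flag anchors at the start of every line, so only a leading
// "//" is removed; the slash in "64/128" is left alone.
$clean = preg_replace('~^//[ \t]*(?:ITEM_DESCRIPTION:[ \t]*)?~m', '', $description);

echo $clean;
```

Using `[ \t]*` instead of `\s+` keeps the pattern from consuming the newline itself, so the line structure of the original data is preserved.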
I'm showing a text excerpt that starts with a searched word and includes 35 more characters after it.
Do you know a way to show this excerpt (searched word + 35 chars) without cutting the last word? substr() alone doesn't handle that.
$search = $url[1];
$read = $pdo->prepare("SELECT * FROM pages WHERE title LIKE ? OR content LIKE ? LIMIT ?,?");
$read->bindValue(1, "%$search%", PDO::PARAM_STR);
$read->bindValue(2, "%$search%", PDO::PARAM_STR);
$read->bindParam(3, $begin,PDO::PARAM_INT);
$read->bindParam(4, $max,PDO::PARAM_INT);
$read->execute();
$searchPos = stripos($result['content'],$search);
$searchLen = strlen($search);
$result_text = '"'.substr($result['content'], $searchPos, $searchLen + 35).'..."';
echo '<p>'.strip_tags($result_text).'</p>';
I'm guessing you are looking for something like the following:
<?php
#Do you know some way to show this text excerpt (searched word + 35chars) without cut the last word?
$search = 'Do';
$str = explode(' ', 'you know some way to show this text excerpt (searched word + 35chars) without cut the last word?');
$len = strlen($search) + 1; # might need to be 0 if you don't want the search word counted in the 35 characters
$ret = $search . ' ';
foreach($str as $st){
if(strlen($st) + $len < 35){
$len += strlen($st) + 1;
$ret .= $st . ' ';
}else{
break;
}
}
$ret = trim($ret);
var_dump($ret);
which gives string(33) "Do you know some way to show this"
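The same "stay within the limit" behaviour can be sketched on a raw string, without exploding it into words first. The function name `excerpt_within` and its parameters are hypothetical. It works on bytes, so multibyte text would need the `mb_*` equivalents.

```php
<?php
// Hypothetical helper: excerpt starting at $word, at most strlen($word) + $extra
// bytes long, cut back to the last full word. Returns null if $word is absent.
function excerpt_within(string $text, string $word, int $extra = 35): ?string
{
    $pos = stripos($text, $word);             // case-insensitive, as in the question
    if ($pos === false) {
        return null;
    }
    $sub = substr($text, $pos);
    $limit = strlen($word) + $extra;
    if (strlen($sub) <= $limit) {
        return $sub;                           // short enough: keep everything
    }
    // Look one byte past the limit so a word ending exactly at the limit is kept.
    $cut = strrpos(substr($sub, 0, $limit + 1), ' ');
    // No space at all: fall back to a hard cut at the limit.
    return $cut === false ? substr($sub, 0, $limit) : substr($sub, 0, $cut);
}

echo excerpt_within('Do you know some way to show this text excerpt without cutting?', 'Do');
```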
Please use something like this:
//$orgText = "This text is exactly the same length...";
//$orgText = "This text is shorter...";
//$orgText = "This_text_has_no_spaces_and_is_shorter";
//$orgText = "This_text_has_no_spaces_and_is_the_same";
//$orgText = "This_text_has_no_spaces_and_is_really_longer";
$orgText = "This text is longer and will be definitely cut but last word survives...";
$searchedWord = "This";
$charsToShow = 35;
$desiredExcerptLength = strlen($searchedWord) + $charsToShow;
$searchedWordPos = strpos($orgText, $searchedWord);
if ($searchedWordPos !== false) { // Word found
$subText = substr($orgText, $searchedWordPos); // Subtext: begins with searched word and ends at org text end
$subTextLength = strlen($subText);
if ($subTextLength > $desiredExcerptLength) { // Subtext longer than desired excerpt => cut it but keep last word
$spaceAfterLastWordPos = strpos($subText, ' ', $desiredExcerptLength);
$excerpt = substr($subText, 0, $spaceAfterLastWordPos ? $spaceAfterLastWordPos : $subTextLength);
}
else { // Subtext equal or shorter than desired excerpt => keep all text
$excerpt = $subText;
}
}
var_dump($excerpt);
It's a clear way to do it.
I hope that's the behavior you meant.
You can check it at: http://writecodeonline.com/php
There are several "kinds" of text you can pass into that:
Text where the searched word isn't present => return NULL:
input: NULL, "", "Something without searched word" => result: NULL
Text with spaces, longer than the desired excerpt length (searched word length + e.g. 35) => return the original text cut, keeping the last word whole:
"This text is longer and will be definitely cut but last word survives..." => "This text is longer and will be definitely"
Text with spaces, equal to the desired excerpt length => return the original text:
"This text is exactly the same length..." => "This text is exactly the same length..."
Text with spaces, shorter than the desired excerpt length => return the original text:
"This text is shorter..." => "This text is shorter..."
Text without spaces, longer than the desired excerpt length => return the original text:
"This_text_has_no_spaces_and_is_really_longer" => "This_text_has_no_spaces_and_is_really_longer"
Text without spaces, equal to the desired excerpt length => return the original text:
"This_text_has_no_spaces_and_is_the_same" => "This_text_has_no_spaces_and_is_the_same"
Text without spaces, shorter than the desired excerpt length => return the original text:
"This_text_has_no_spaces_and_is_shorter" => "This_text_has_no_spaces_and_is_shorter"
Probably a simple problem here, but I cannot find it.
I am exploding a string that was input and stored from a textarea. I use nl2br() so that I can explode the string by the <br /> tag.
The string explodes properly, but when I try to get the first character of the string in a while loop, it only returns on the first line.
Note: The concept here is greentexting, so if you are familiar with that then you will see what I am trying to do. If you are not, I put a brief description below the code sample.
Code:
while($row = mysqli_fetch_array($r, MYSQLI_ASSOC)) {
$comment = nl2br($row['comment']);
$sepcomment = explode("<br />", $comment);
$countcomment = count($sepcomment);
$i = 0;
//BEGIN GREENTEXT COLORING LOOP
while($i < $countcomment) {
$fb = $sepcomment[$i];
$z = $fb[0]; // Check to see if first character is >
if ($z == ">") {
$tcolor = "#789922";
}
else {
$tcolor = "#000000";
}
echo '<font color="' . $tcolor . '">' . $sepcomment[$i] . '</font><br>';
$i++;
}
//END GREENTEXT COLORING LOOP
}
Greentext: If the first character of the line is '>' then the color of that entire line becomes green. If not, then the color is black.
What I have tried:
strip_tags() - Thinking that possibly the tags were acting as the first characters.
$fb = preg_replace("/(<br\s*\/?>\s*)+/", "", $sepcomment[$i]);
str_replace()
echo $z //Shows the correct character on first line, blank on following lines.
$z = substr($fb, 0, 1);
Here is a test I just did where I returned the first 5 characters of the string.
Any ideas for getting rid of those empty characters?
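Try the trim() function:
$fb = trim($sepcomment[$i]);
http://php.net/manual/en/function.trim.php
(The line breaks are probably the problem: nl2br() inserts <br /> before each newline but keeps the newline itself, so after the explode every piece except the first still starts with \r\n or \n.)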
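An alternative that avoids `nl2br()` and `explode()` altogether is to split on the line breaks directly. The sketch below is an assumption-laden illustration: the helper name `greentext` is hypothetical, it uses a `<span>` with inline style in place of `<font>`, and it adds `htmlspecialchars()` escaping that the original code does not have.

```php
<?php
// Hypothetical helper: wrap each line of a comment in a colored span.
function greentext(string $comment): string
{
    $out = '';
    // \R matches \n, \r\n and \r, so no line-break characters remain in $line.
    foreach (preg_split('/\R/', $comment) as $line) {
        $line = trim($line);
        // strncmp avoids reading $line[0] on an empty line.
        $color = strncmp($line, '>', 1) === 0 ? '#789922' : '#000000';
        $out .= '<span style="color:' . $color . '">'
              . htmlspecialchars($line) . '</span><br>';
    }
    return $out;
}

echo greentext("hello\r\n>be me\r\n>write PHP");
```

Since the split consumes the whole line-break sequence, the first character of every piece is the first visible character of that line.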
I am working on my PHP website (not a WordPress site). On the main index I display the two newest posts. The problem is that the description shows the entire article, so I need to display post excerpts instead, maybe with a 35-word limit.
<?=$line["m_description"]?>
<?
$qresult3 = mysql_query("SELECT * FROM t_users WHERE u_id=".$line["m_userid"]." LIMIT 1");
if (mysql_num_rows($qresult3)<1) { ?>
<?php
// just the excerpt
function first_n_words($text, $number_of_words) {
// Where excerpts are concerned, HTML tends to behave
// like the proverbial ogre in the china shop, so best to strip that
$text = strip_tags($text);
// \w[\w'-]* allows for any word character (a-zA-Z0-9_) and also contractions
// and hyphenated words like 'range-finder' or "it's"
// the /s flag means that . matches \n, so this can match multiple lines
$text = preg_replace("/^\W*((\w[\w'-]*\b\W*){1,$number_of_words}).*/ms", '\\1', $text);
// strip out newline characters from our excerpt
return str_replace("\n", "", $text);
}
// excerpt plus link if shortened
function truncate_to_n_words($text, $number_of_words, $url, $readmore = 'read more') {
$text = strip_tags($text);
$excerpt = first_n_words($text, $number_of_words);
// we can't just look at the length or try == because we strip carriage returns
if( str_word_count($text) !== str_word_count($excerpt) ) {
$excerpt .= '... <a href="'.$url.'">'.$readmore.'</a>';
}
return $excerpt;
}
$src = <<<EOF
<b>My cool story</b>
<p>Here it is. It's really cool. I like it. I like lots of stuff.</p>
<p>I also like to read and write and carry on forever</p>
EOF;
echo first_n_words($src, 10);
echo "\n\n-----------------------------\n\n";
echo truncate_to_n_words($src, 10, 'http://www.google.com');
EDIT: Added functional example and accounted for punctuation and numbers in text
I have a function, though other people may say it's not good because I'm still learning PHP too (tips welcome, people). It will give you what you are looking for, although it may need better coding if anyone has suggestions.
function Short($text, $length, $url, $more){
$short = mb_substr($text, 0, $length);
if($short != $text) {
$lastspace = mb_strrpos($short, ' ');
if($lastspace !== false) {
$short = mb_substr($short, 0, $lastspace);
} // only cut back to the last space if there is one
if(!$more){
$more = "Read Full Post";
} // end if more is blank
$short .= "...[<a href='$url'>$more</a>]";
} // end if content != short
$short = str_replace("’","'", $short);
$short = stripslashes($short);
$short = nl2br($short);
return $short; // hand the excerpt back to the caller
} // end short function
To use it:
Say your article content is in the variable $content. Note that the length is counted in characters, not words:
$short = Short($content, 35, "http://domain.com/article_post", "Read Full Story");
echo $short;
Similarly, you can adjust the function to remove $url and $more from it and just have the excerpt with ... at the end.
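Since the question asks for a limit in words, not characters, a word-based variant might look like the sketch below. The function name `excerpt_words` and its defaults are hypothetical. It strips HTML first and treats any run of whitespace as a word separator.

```php
<?php
// Hypothetical helper: plain-text excerpt limited to a number of words.
function excerpt_words(string $text, int $limit = 35, string $ending = '...'): string
{
    // Split on any run of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', strip_tags($text), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) <= $limit) {
        return implode(' ', $words);           // nothing was cut, so no ending
    }
    return implode(' ', array_slice($words, 0, $limit)) . $ending;
}

echo excerpt_words('<p>One two three four five</p>', 3); // One two three...
```

In the question's template this would be called as `excerpt_words($line["m_description"])` in place of the raw description.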