Autodetect punctuation in a HTML string, and split the string there

I have a set of punctuation characters:
$punctuation = array('.', '!', ';', '?');
A character limit variable:
$max_char = 55;
And a string with HTML:
$string = 'This is a test string. With HTML.';
How can I split this string into pieces of at most $max_char characters, using the characters in the $punctuation array as split points?
Basically, the string should be split at the nearest punctuation character, but not inside an HTML tag definition or attribute. (It doesn't matter if the split occurs inside a tag's contents and the tag remains unclosed, because I check for unclosed tags later.)

If you want to know whether or not you're inside a tag, you need some kind of state machine that you update while looping over the string. You can index a string much like an array, so you can do something like:
$punctuation = array('.', '!', ';', '?');
$in_tag = false;
$max_char = 55;
$string = 'This is a test string. With HTML.';
// Defaults in case no punctuation is found within the limit
$string1 = $string;
$string2 = '';
$str_length = min(strlen($string), $max_char);
for ($i = 0; $i < $str_length; $i++)
{
    $tempChar = $string[$i]; // Get the character at position $i
    if ((!$in_tag) && (in_array($tempChar, $punctuation)))
    {
        // Later matches overwrite earlier ones, so the last punctuation within the limit wins
        $string1 = substr($string, 0, $i);
        $string2 = substr($string, $i);
    }
    elseif ((!$in_tag) && ($tempChar == "<"))
    {
        $in_tag = true;
    }
    elseif (($in_tag) && ($tempChar == ">"))
    {
        $in_tag = false;
    }
}
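The loop above can also be wrapped in a reusable function. The following is only a sketch under a few assumptions: the name splitAtPunctuation and its return shape (a two-element array) are hypothetical, the punctuation character is kept at the end of the first part, lengths are counted in bytes, and a ';' that closes an HTML entity such as &amp; is still treated as punctuation.

```php
<?php
// Hypothetical helper: split $html after the last punctuation character that
// lies within the first $maxChar bytes and outside any <...> tag.
// Returns [head, tail]; if no split point exists, returns [$html, ''].
function splitAtPunctuation(string $html, int $maxChar, array $punctuation = ['.', '!', ';', '?']): array
{
    $inTag = false;   // state: are we between '<' and '>'?
    $splitAt = -1;    // index of the last usable punctuation character
    $limit = min(strlen($html), $maxChar);

    for ($i = 0; $i < $limit; $i++) {
        $char = $html[$i];
        if ($inTag) {
            if ($char === '>') {
                $inTag = false;
            }
        } elseif ($char === '<') {
            $inTag = true;
        } elseif (in_array($char, $punctuation, true)) {
            $splitAt = $i; // remember it; a later match overrides it
        }
    }

    if ($splitAt < 0) {
        return [$html, ''];
    }
    // Keep the punctuation character with the first part.
    return [substr($html, 0, $splitAt + 1), substr($html, $splitAt + 1)];
}

// Usage: the '.' inside the title attribute is ignored.
[$head, $tail] = splitAtPunctuation('This is <b title="a.b">bold</b>. And more text here!', 40);
echo $head, "\n", $tail, "\n";
```

Remembering the last match and cutting once after the loop avoids calling substr on every punctuation character.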

Related

PHP - Replacing characters with stars, except when there is a minus

How can I replace a string with stars except the first and the last letter but not a minus in case there is one.
Here for better illustration what I try to get:
From:
url-name
To
u**-***e
This is what I have so far:
function get_starred($str) {
$len = strlen($str);
return substr($str, 0, 1).str_repeat('_', $len - 2).substr($str, $len - 1, 1);
}

You could use the PCRE verbs (*SKIP)(*FAIL) to skip the first character of the string, the last character of the string, and any hyphens, and then match everything else. Like this:
(^.|-|.$)(*SKIP)(*FAIL)|.
https://regex101.com/r/YfrZ8r/1/
PHP example using preg_replace
preg_replace('/(^.|-|.$)(*SKIP)(*FAIL)|./', '*', 'url-name');
https://3v4l.org/0dSPQ

Hey, try implementing the following:
function get_starred($str) {
    $str_array = str_split($str);
    foreach($str_array as $key => $char) {
        // leave the first and last characters untouched
        if($key == 0 || $key == count($str_array)-1) continue;
        if($char != '-') $str[$key] = '*';
    }
    return $str;
}

user3783242 has a great solution. However, if for some reason you do not want to use preg_replace(), you could do the following:
function get_starred($str) {
    //make the string an array of letters
    $str = str_split($str);
    //grab the first letter (this also removes the first letter from the array)
    $first = array_shift($str);
    //grab the last letter (this also removes the last letter from the array)
    $last = array_pop($str);
    //loop through the leftover letters, replace anything that is not a dash
    //note the `&` sign: this is a reference, so changing the variable in the loop changes the original array as well
    foreach($str as &$letter) {
        //if the letter is not a dash, set it to an asterisk
        if($letter != "-") $letter = "*";
    }
    //return the first letter, followed by an implode of the characters, followed by the last letter
    return $first . implode('', $str) . $last;
}

Here is mine:
$string = 'url-name foobar';
function star_replace($string){
    return preg_replace_callback('/[-\w]+/i', function($match){
        $arr = str_split($match[0]);
        $len = count($arr)-1;
        for($i=1;$i<$len;$i++) $arr[$i] = $arr[$i] == '-' ? '-' : '*';
        return implode($arr);
    }, $string);
}
echo star_replace($string);
This works on multiple words.
Output
u**-***e f****r
It also takes punctuation into account:
$string = 'url-name foobar.';
Output
u**-***e f****r.

How can I display only 2 phrases from SQL? [duplicate]

Is there a way to trim a text string in PHP so it has a certain number of characters? For instance, if I had the string:
$string = "this is a string";
How could I trim it to say:
$newstring = "this is";
This is what I have so far, using chunk_split(), but it isn't working. Can anyone improve on my method?
function trimtext($text)
{
$newtext = chunk_split($text,15);
return $newtext;
}
I also looked at this question, but I don't really understand it.

if (strlen($yourString) > 15) // if you want...
{
$maxLength = 14;
$yourString = substr($yourString, 0, $maxLength);
}
will do the job.
Take a look here.

substr can cut a word in half. It also counts bytes, so it can split a multibyte UTF-8 character in the middle and return a broken string. mb_substr counts characters and avoids the second problem:
$string = mb_substr('word word word word', 0, 10, 'UTF-8').'...';

You didn't say why you need this, but think about what you want to achieve. Here is a function that shortens a string word by word, with or without adding an ellipsis at the end:
function limitStrlen($input, $length, $ellipses = true, $strip_html = true) {
    //strip tags, if desired
    if ($strip_html) {
        $input = strip_tags($input);
    }
    //no need to trim, already shorter than trim length
    if (strlen($input) <= $length) {
        return $input;
    }
    //find last space within length
    $last_space = strrpos(substr($input, 0, $length), ' ');
    if($last_space !== false) {
        $trimmed_text = substr($input, 0, $last_space);
    } else {
        $trimmed_text = substr($input, 0, $length);
    }
    //add ellipses (...)
    if ($ellipses) {
        $trimmed_text .= '...';
    }
    return $trimmed_text;
}

function trimtext($text, $start, $len)
{
    return substr($text, $start, $len);
}
You can call the function like this:
$string = trimtext("this is a string", 0, 10);
This would return the first 10 characters, including the trailing space:
"this is a "

substr lets you take a portion of a string consisting of exactly as many characters as you need.

You can use the substr() function to get a substring.

If you want to get a string with a certain number of characters you can use substr, i.e.
$newtext = substr($string,0,$length);
where $length is the given length of the new string.

If you want an abstract made of the first 10 words, use the code below. $text may contain HTML, because strip_tags removes the tags before matching:
preg_match('/^([^.!?\s]*[\.!?\s]+){0,10}/', strip_tags($text), $abstract);
echo $abstract[0];

My function has some length to it, but I like to use it. I convert the string into an array.
function truncate($text, $limit){
    //Set Up
    $array = [];
    $count = -1;
    //Turning String into an Array
    $split_text = explode(" ", $text);
    //Never read past the last word
    $limit = min($limit, count($split_text));
    //Loop for the number of words you want
    while($count < $limit - 1){
        $count++;
        $array[] = $split_text[$count];
    }
    //Converting Array back into a String
    $text = implode(" ", $array);
    return $text." ...";
}
Or, if the text is coming from an editor and you want to strip out the HTML tags:
function truncate($text, $limit){
    //Set Up
    $array = [];
    $count = -1;
    //strip_tags replaces FILTER_SANITIZE_STRING, which is deprecated as of PHP 8.1
    $text = strip_tags($text);
    //Turning String into an Array
    $split_text = preg_split('/\s+/', $text);
    //Never read past the last word
    $limit = min($limit, count($split_text));
    //Loop for the number of words you want
    while($count < $limit - 1){
        $count++;
        $array[] = $split_text[$count];
    }
    //Converting Array back into a String
    $text = implode(" ", $array);
    return $text." ...";
}

With an ellipsis (…) only if the text is longer, and taking care of language-specific multibyte characters:
mb_strlen($text,'UTF-8') > 60 ? mb_substr($text, 0, 60,'UTF-8') . "…" : $text;
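The word-boundary approach and the multibyte approach from the answers above can be combined. The following is a minimal sketch, assuming the mbstring extension is available and the input is valid UTF-8; the function name limitWords and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: shorten $input to at most $length characters,
// cutting at the last space inside the limit so that no word is split.
function limitWords(string $input, int $length, string $ellipsis = '…'): string
{
    // Already short enough: return unchanged, without an ellipsis.
    if (mb_strlen($input, 'UTF-8') <= $length) {
        return $input;
    }

    // Cut by characters, not bytes, so multibyte characters stay intact.
    $cut = mb_substr($input, 0, $length, 'UTF-8');

    // Back up to the last space, if there is one after the first character.
    $lastSpace = mb_strrpos($cut, ' ', 0, 'UTF-8');
    if ($lastSpace !== false && $lastSpace > 0) {
        $cut = mb_substr($cut, 0, $lastSpace, 'UTF-8');
    }

    return $cut . $ellipsis;
}

echo limitWords('this is a string', 10), "\n";   // this is a…
echo limitWords('blåbærgrød er godt', 12), "\n"; // blåbærgrød…
```

When the text contains no space within the limit, the sketch falls back to a hard cut at $length characters.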

How can I get the correct position of a word in a UTF-8 text? [duplicate]

I have some simple PHP code that extracts a sentence from a text and makes a specific word bold.
First of all, I get an array with the words that I want and their positions in the text.
$all_words = str_word_count($text, 2, 'åæéø');
// $words is an array with the words that I want to find.
$words_found = array();
foreach ($all_words as $pos => $word_found) {
    foreach ($words as $word) {
        if ($word == strtolower($word_found)) {
            $words_found[$pos] = $word_found;
            break;
        }
    }
}
Then, for every word in $words_found, I get a portion of the text with the word in the middle.
$length = 90;
foreach ($words_found as $offset => $word) {
    $word_length = strlen($word);
    $start = $offset - $length;
    $last_start = $start + $length + $word_length;
    $first_part = substr($text, $start, $length);
    $last_part = substr($text, $last_start, $length);
    $sentence = $first_part . '<b>' . $word . '</b>' . $last_part;
}
It works fine, except that the text is UTF-8 with Danish characters (åæéø). When $first_part or $last_part starts with a multibyte character, the substr result is empty.
I know about the mb_substr function, so I replaced my code with it.
$word_length = mb_strlen($word, 'UTF-8');
$first_part = mb_substr($text, $start, $length, 'UTF-8');
$last_part = mb_substr($text, $last_start, $length, 'UTF-8');
But with mb_substr the position of the word ($offset) is wrong, and the new substrings ($sentence) don't line up as they should.
Is there something like mb_str_word_count? How can I get the correct position of the words?

Try using a regex with word boundaries:
$string = 'That this notpink a or pink blue red dark.';
$regex = '/\bpink\b/';
preg_match($regex, $string, $match, PREG_OFFSET_CAPTURE);
$pos = $match[0][1];
echo $pos;
Edit:
If you don't like regex, you can match the word with stripos by including the surrounding space:
if(stripos($string, 'pink ') === 0)
    $pos = 0;
else if(stripos($string, ' pink') !== false)
    $pos = stripos($string, ' pink') + 1;
else
    $pos = false; // the word was not found

I tried the solution by @Mario Johnathan, but it didn't work properly for me.
In the end I found my own solution: I keep using the non-multibyte functions such as substr with the position given by str_word_count, and adjust the first substring if its first character is a Danish character.
$first_part_aux = str_split(trim($first_part));
if (!ctype_alpha($first_part_aux[0])) {
    for ($i = 1; $i < count($first_part_aux); $i++) {
        if (ctype_alpha($first_part_aux[$i])) {
            $start = $start + $i;
            $length = $length - $i;
            $first_part = substr($text, $start, $length);
            break;
        }
    }
}
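One likely reason the positions disagree is that str_word_count and preg_match with PREG_OFFSET_CAPTURE report byte offsets, while mb_substr expects character offsets. A byte offset can be converted by measuring, in characters, the part of the text that precedes it. The sketch below assumes the mbstring extension, valid UTF-8 input, and a hypothetical function name findWordCharOffsets.

```php
<?php
// Hypothetical helper: return the character offsets (not byte offsets)
// of every whole-word occurrence of $word in the UTF-8 string $text.
function findWordCharOffsets(string $text, string $word): array
{
    // Lookarounds stand in for \b so that letters such as å, æ, ø count
    // as word characters; the u flag makes the pattern UTF-8 aware.
    $pattern = '/(?<![\p{L}\p{N}_])' . preg_quote($word, '/') . '(?![\p{L}\p{N}_])/iu';
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

    $offsets = [];
    foreach ($matches[0] as [$match, $byteOffset]) {
        // The prefix up to a match always ends on a character boundary,
        // so its character length is the character offset of the match.
        $offsets[] = mb_strlen(substr($text, 0, $byteOffset), 'UTF-8');
    }
    return $offsets;
}

$text = 'Rød grød med fløde';
$offsets = findWordCharOffsets($text, 'med');
echo $offsets[0], "\n";                              // 9
echo mb_substr($text, $offsets[0], 3, 'UTF-8'), "\n"; // med
```

The alternative is to stay with byte offsets throughout and use substr, as the answer above does, but then every cut point has to be checked so that it does not land inside a multibyte character.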

Truncating a string after x occurrences of a character

I have a string of unknown length and content.
I'd like to be able to truncate the string after x occurrences of a given character.
For example, from:
$string = "Hello# m#y name # is Ala#n Colem#n";
$character = "#";
$x = 4;
I'd like to return:
"Hello# m#y name # is Ala#"
Hope I'm not overcomplicating things here!
Many thanks

I'd suggest exploding the string on #, then taking the first 4 elements of the resulting array.
$string = "Hello# m#y name # is Ala#n Colem#n";
$character = "#";
$x = 4;
$split = explode($character, $string);
$split = array_slice($split, 0, $x);
// re-append the delimiter that follows the last kept piece
$newString = implode($character, $split).$character;

function posncut( $input, $delim, $x ) {
    $p = 0;
    for( $i = 0; $i < $x; ++ $i ) {
        $p = strpos( $input, $delim, $p );
        if( $p === false ) {
            return "";
        }
        ++ $p;
    }
    return substr( $input, 0, $p );
}
echo posncut( $string, $character, $x );
It finds each delimiter in turn (strpos) and stops after the one you're looking for. If it runs out of text first (strpos returns false), it gives an empty string.
Update: here's a benchmark I made which compares this method against explode: http://codepad.org/rxTt79PC. Seems that explode (when used with array_pop instead of array_slice) is faster.
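The benchmark code itself is not reproduced here, so the following is only a guess at what an explode-plus-array_pop variant could look like. It uses the limit argument of explode so that the string is split into at most $n + 1 pieces; the function name cutAfterNth is hypothetical.

```php
<?php
// Hypothetical helper: keep everything up to and including the $n-th
// occurrence of $delim; return '' if $delim occurs fewer than $n times.
function cutAfterNth(string $input, string $delim, int $n): string
{
    if ($n < 1) {
        return '';
    }
    // With a limit of $n + 1, the last element holds the untouched remainder.
    $parts = explode($delim, $input, $n + 1);
    if (count($parts) <= $n) {
        return ''; // fewer than $n delimiters in the input
    }
    array_pop($parts); // drop the remainder after the $n-th delimiter
    return implode($delim, $parts) . $delim;
}

echo cutAfterNth('Hello# m#y name # is Ala#n Colem#n', '#', 4), "\n";
// Hello# m#y name # is Ala#
```

Like posncut, this sketch returns an empty string when the input runs out of delimiters first.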

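Something along these lines:
$string = "Hello# m#y name # is Ala#n Colem#n";
$str_length = strlen($string);
$character = "#";
$target_count = 4;
$count = 0;
for ($i = 0; $i < $str_length; $i++) {
    if ($string[$i] == $character) {
        $count++;
        if ($count == $target_count) break;
    }
}
// $i is the index of the last wanted delimiter, so take $i + 1 characters to include it
$result = substr($string, 0, $i + 1);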
