PHP split string to next period (.) - php

My aim is to split a string after every 7 words. If the 7th word ends with a comma (,), the chunk should continue to the next word ending with a period (.) or exclamation mark (!).
So far, I've been able to split the string into chunks of 7 words, but I cannot check whether the 7th word contains a comma or move on to the next word ending with . or !
$string = "I have a feeling about the rain, let us ride. He figured I was only joking.";
$explode = explode(" ", $string);
$chunk_str = array_chunk($explode, 7);
for ($x=0; $x<count($chunk_str); $x++)
{
$strip = implode(" ",$chunk_str[$x]);
echo $strip.'<br><br>';
}
I expect
I have a feeling about the rain, let us ride.
He figured I was only joking.
But the actual output is
I have a feeling about the rain,
let us ride. He figured I was
only joking.

Here's one way to do what you want. Iterate through the list of words, 7 at a time. If the 7th word ends with a comma, increase the list pointer until you reach a word ending with a period or exclamation mark (or the end of the string). Output the current chunk. When you reach the end of the string, output any remaining words.
$string = "I have a feeling about the rain, let us ride. He figured I was only joking.";
$explode = explode(' ', $string);
$num_words = count($explode);
$k = 0; // index of the first word of the current chunk
for ($i = 6; $i < $num_words; $i += 7) {
    // If the 7th word ends with a comma, extend the chunk to the next word ending in . or !
    if (substr($explode[$i], -1) == ',') {
        while ($i < $num_words - 1 && substr($explode[$i], -1) != '.' && substr($explode[$i], -1) != '!') $i++;
    }
    echo implode(' ', array_slice($explode, $k, $i - $k + 1)) . PHP_EOL;
    $k = $i + 1;
}
// Output any remaining words (this also covers strings shorter than 7 words)
if ($k < $num_words) {
    echo implode(' ', array_slice($explode, $k)) . PHP_EOL;
}
Output:
I have a feeling about the rain, let us ride.
He figured I was only joking.
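The same idea can be wrapped in a reusable function that returns the chunks instead of echoing them. This is only a sketch: the name splitIntoWordChunks and its $size parameter are made up for illustration, and it assumes words are separated by single spaces.

```php
<?php
// Hypothetical helper: split $text into chunks of $size words, extending a
// chunk to the next word ending in '.' or '!' when its last word ends in ','.
function splitIntoWordChunks(string $text, int $size = 7): array
{
    $words = explode(' ', $text);
    $count = count($words);
    $chunks = [];
    $start = 0;
    while ($start < $count) {
        // Tentative last word of this chunk (clamped to the end of the list)
        $end = min($start + $size - 1, $count - 1);
        if (substr($words[$end], -1) === ',') {
            // Walk forward until a sentence-ending word or the last word
            while ($end < $count - 1 && !in_array(substr($words[$end], -1), ['.', '!'], true)) {
                $end++;
            }
        }
        $chunks[] = implode(' ', array_slice($words, $start, $end - $start + 1));
        $start = $end + 1;
    }
    return $chunks;
}
```

Calling splitIntoWordChunks() on the string from the question should give the two expected lines as a two-element array.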

Related

Better / Cleaner method for splitting a string based on number of words?

I was in need of a method to count the number of words (not characters) within PHP, and start a <SPAN> tag within HTML to wrap around the remaining words after the specified number.
I looked into functions such as wordwrap and str_word_count, but those didn't seem to help. I went ahead and modified the code found here: http://php.timesoft.cc/manual/en/function.str-word-count.php#55818
Everything seems to work great, however I wanted to post here as this code is from 2005 and maybe there is a more modern / efficient way of handling what I'm trying to achieve?
<?php
$string = "One two three four five six seven eight nine ten.";
// number of leading words to extract
$n = 3;
// extract the words
$words = explode(" ", $string);
// chop the words array down to the first n elements
$first = array_slice($words, 0, $n);
// chop the words array down to the remaining elements
$last = array_slice($words, $n);
// glue the 3 elements back into a spaced sentence
$firstString = implode(" ", $first);
// glue the remaining elements back into a spaced sentence
$lastString = implode(" ", $last);
// display it
echo $firstString;
echo '<span style="font-weight:bold;"> '.$lastString.'</span>';
?>
You could use preg_split() with a regex instead. This is the modified version of this answer with an improved regex that uses a positive lookbehind:
function get_snippet($str, $wordCount) {
$arr = preg_split(
'/(?<=\w)\b/',
$str,
$wordCount*2+1,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
$first = implode('', array_slice($arr, 0, $wordCount));
$last = implode('', array_slice($arr, $wordCount));
return $first.'<span style="font-weight:bold;">'.$last.'</span>';
}
Usage:
$string = "One two three four five six seven eight nine ten.";
echo get_snippet($string, 3);
Output (everything after the third word is wrapped in the bold span):
One two three four five six seven eight nine ten.
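For comparison, here is a sketch that avoids the regex by using the limit argument of explode(), which leaves the remainder of the string in the last element. The function name wrapRemainder is hypothetical, and the sketch assumes $n >= 1 and words separated by single spaces.

```php
<?php
// Hypothetical helper: keep the first $n words as-is and wrap the rest in a bold span.
function wrapRemainder(string $text, int $n): string
{
    // With a limit of $n + 1, the last element holds everything after the first $n words
    $parts = explode(' ', $text, $n + 1);
    $rest = count($parts) > $n ? array_pop($parts) : '';
    $first = implode(' ', $parts);
    if ($rest === '') {
        return $first; // fewer than $n + 1 words: nothing to wrap
    }
    return $first . ' <span style="font-weight:bold;">' . $rest . '</span>';
}
```

Unlike get_snippet(), this version puts the separating space outside the span.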
Let's make it even simpler. Try this:
<?php
$string = "One two three four five six seven eight nine ten.";
// number of leading words to extract
$n = 2;
// extract the words
$words = explode(" ", $string);
$firstString = [];
$lastString = [];
for($i=0; $i<=($n-1); $i++) {
    $firstString[] = $words[$i]; // This will collect One, two
}
for($i =$n; $i<count($words); $i++) {
    $lastString[] = $words[$i]; // This will collect three four five six seven eight nine ten.
}
print_r($firstString);
print_r($lastString);
?>
I borrowed the code from here:
https://stackoverflow.com/a/18589825/1578471
/**
* Find the position of the Xth occurrence of a substring in a string
* @param $haystack
* @param $needle
* @param $number integer > 0
* @return int
*/
function strposX($haystack, $needle, $number){
if($number == '1'){
return strpos($haystack, $needle);
}elseif($number > '1'){
return strpos($haystack, $needle, strposX($haystack, $needle, $number - 1) + strlen($needle));
}else{
return error_log('Error: Value for parameter $number is out of range');
}
}
$string = "One two three four five six seven eight nine ten.";
$afterThreeWords = strposX($string, " ", 3);
echo substr($string, 0, $afterThreeWords); // first three words
This looks good to me. Here's another way that you could benchmark against it for efficiency.
I have no idea which is quicker; my guess is that yours is quicker for longer strings.
$string = "This is some reasonably lengthed string";
$n = 3;
$pos = 0;
for( $i = 0; $i< $n; $i++ ){
$pos = strpos($string, ' ', $pos + 1);
if( !$pos ){
break;
}
}
if( $pos ){
$firstString = substr($string, 0, $pos);
$lastString = substr($string, $pos + 1);
}else{
$firstString = $string;
$lastString = null;
}

PHP: trim word OR part of it from beginning/end of string

I need to trim words from the beginning and end of a string. The problem is that the words are sometimes abbreviated, i.e. only the first three letters are present (followed by a dot).
I tried hard to find a suitable regular expression. Basically, I need to catch three or more initial characters, up to the length of the word being removed, but I cannot find a regular expression that matches a variable length and keeps the order of the characters.
For example, if I need to trim 'insurance' from the sentence 'insur. companies are rich', then the pattern \^[insurance]{3,9}\ comes to mind, but this pattern will also catch words like 'sensace', because the order of characters (and their occurrence) inside [] is not important to the regexp.
Also, at the end of the string, I need to remove serial numbers that are abbreviated from the beginning: say, 'XK-25F14' is sometimes presented as '25F14'. So I decided to go purely with character-by-character comparison.
I therefore ended up with the following PHP function:
function trimWords($s, $dirt, $case_insensitive = false, $reverse = true)
{
$pos = 0;
$func = $case_insensitive ? 'strncasecmp' : 'strncmp';
// Get number of initial characters, that match in both strings
while ($func($s, $dirt, $pos + 1) === 0)
$pos++;
// If more than 2 initial characters match, then remove the match
if ($pos > 2)
$s = substr($s, $pos);
// Reverse $s and $dirt so it will trim from the end of string
$s = strrev($s);
if ($reverse)
return trimWords($s, strrev($dirt), $case_insensitive, false);
// After second run return back-reversed string
return trim($s, ' .-');
}
I'm happy with this function, but it has one drawback: it trims only one occurrence of the word. How can I make it trim more occurrences, i.e. remove both forms of 'insurance' from 'Insurance insur. companies'?
I'm also curious: is there really no regular expression that matches a variable length and respects the order of characters in the pattern?
Final solution
Thanks to mrhobo, I ended up with a function based on a regular expression. This function can easily be improved and should also be the most efficient for this task.
I also modified my previous function, and it is twice as fast as the regexp. However, it can remove only one word per run, so to remove a word from both the beginning and the end it has to run twice, at which point its performance is the same as the regexp. To remove more than one occurrence of a word it has to run multiple times, which makes it progressively slower.
The final function goes like this:
function trimWords($string, $word, $case_insensitive = false, $min_abbrv = 3)
{
$exc = substr($word, $min_abbrv);
$pat = null;
$i = strlen($exc);
while ($i--)
$pat = '(?>'.preg_quote($exc[$i], '#').$pat.')?';
$pat = substr($word, 0, $min_abbrv).$pat;
$pat = '#(?<begin>^)?(?:\W*\b'.$pat.'\b\W*)+(?(begin)|$)#';
if ($case_insensitive)
$pat .= 'i';
return preg_replace($pat, '', $string);
}
NOTE: with this function it does not matter whether the abbreviation ends with a dot or not; it wipes out any shorter form of the word and also removes all non-word characters around the word.
EDIT: I just tried building a replacement pattern like insu(r|ra|ran|ranc|rance); the function with atomic groups is faster by ~30%, and with longer words it could possibly be even more efficient.
Matching a word and all possible abbreviations from the nth letter isn't quite an easy task in regex.
Here is how I would do it for the word insurance from the 4th letter:
insu(?>r(?>a(?>n(?>c(?>(?<last>e))?)?)?)?)?(?(last)|\.)
http://regex101.com/r/aL2gV4
It works by using atomic groups to force the regex engine as far forward as possible past the last 'rance' letters, using the nested pattern (?>a(?>b)?)?. If the last letter is matched, we're not dealing with an abbreviation, so no dot is required; otherwise the dot is required. This is coded by (?(last)|\.).
To trim, I would create a function to build the above regex for an abbreviation. Then you can write a while loop that replaces each of the abbreviation regexes with empty space until there are no more matches.
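A builder for that regex might look roughly like the sketch below. The function name buildAbbrevPattern and its parameters are assumptions for illustration; it requires the word to be longer than the minimum abbreviation length and uses # as the pattern delimiter.

```php
<?php
// Hypothetical builder: returns a pattern matching $word in full, or any
// abbreviation of at least $minAbbrv letters followed by a dot.
function buildAbbrevPattern(string $word, int $minAbbrv = 3): string
{
    if (strlen($word) <= $minAbbrv) {
        throw new InvalidArgumentException('Word must be longer than the minimum abbreviation.');
    }
    $prefix = preg_quote(substr($word, 0, $minAbbrv), '#');
    $rest = str_split(substr($word, $minAbbrv));
    // The final letter sits in the named group "last", so the conditional
    // at the end can tell a full word from an abbreviation.
    $pattern = '(?>(?<last>' . preg_quote(array_pop($rest), '#') . '))?';
    // Wrap the remaining letters from right to left in optional atomic groups
    for ($i = count($rest) - 1; $i >= 0; $i--) {
        $pattern = '(?>' . preg_quote($rest[$i], '#') . $pattern . ')?';
    }
    return '#' . $prefix . $pattern . '(?(last)|\.)#';
}

// preg_replace() already replaces every match, so a single call is enough here
echo preg_replace(buildAbbrevPattern('insurance', 4), '', 'insur. companies are rich');
```

For 'insurance' with a minimum of 4 letters this produces the same pattern as the one written out by hand above, wrapped in # delimiters.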
Non regex version
Here is my non regex version that removes multiple words and abbreviated words from a string:
function trimWords($str, $word, $min_abbrv, $case_insensitive = false) {
$len = 0;
$word_len = strlen($word);
$strlen = strlen($str);
$cmp = $case_insensitive ? 'strncasecmp' : 'strncmp'; // function names must be quoted strings
for ($i = 0; $i < $strlen; $i++) {
if ($cmp($str[$i], $word[$len], 1) == 0) { // compare exactly one character
$len++;
} else if ($len > 0) {
if ($len == $word_len || ($len >= $min_abbrv && ($dot = $str[$i] == '.'))) {
$i -= $len;
$len += $dot;
$str = substr($str, 0, $i) . substr($str, $i+$len);
$strlen = strlen($str);
$dot = 0;
}
$len = 0;
}
}
return $str;
}
Example:
$string = 'ins. <- "ins." / insu. insuranc. insurance / insurance. <- "."';
echo trimWords($string, 'insurance', 4);
Output is:
ins. <- "ins." / / . <- "."
I wrote a function that constructs the regular expression pattern as suggested by mrhobo, plus a simple test, and benchmarked it against my function, which uses pure PHP string comparison.
Here is the code:
$string = 'Insur. companies are nasty rich';
$dirt = 'insurance';
$cycles = 500000;
$start = microtime(true);
$i = $cycles;
while ($i) {
$i--;
regexpStyle($string, $dirt, true);
}
$stop = microtime(true);
$i = $cycles;
while ($i) {
$i--;
trimWords($string, $dirt, true);
}
$end = microtime(true);
$res1 = $stop - $start;
$res2 = $end - $stop;
$winner = $res1 < $res2 ? '<<<' : '>>>';
echo 'regexp: '.$res1.' '.$winner.' string operations: '.$res2;
function trimWords($s, $dirt, $case_insensitive = false, $reverse = true)
{
$pos = 0;
$func = $case_insensitive ? 'strncasecmp' : 'strncmp';
// Get number of initial characters, that match in both strings
while ($func($s, $dirt, $pos + 1) === 0)
$pos++;
// If more than 2 initial characters match, then remove the match
if ($pos > 2)
$s = substr($s, $pos);
// Trim leftover separators (this benchmark version makes a single pass, without reversing)
return trim($s, ' .-');
}
function regexpStyle($s, $dirt, $case_insensitive, $min_abbrev = 3)
{
$ss = substr($dirt, $min_abbrev);
$arr = str_split($ss);
$patt = '(?>(?<last>'.array_pop($arr).'))?';
$i = count($arr);
while ($i)
$patt = '(?>'.$arr[--$i].$patt.')?';
$patt = '#^'.substr($dirt, 0, $min_abbrev).$patt.'(?(last)|\.)#';
$patt .= $case_insensitive ? 'i' : null;
return trim(preg_replace($patt, '', $s));
}
and the winner is... moment of silence... it is...
a draw
regexp: 8.5169589519501 >>> string operations: 8.0951890945435
but I have a strong feeling that the regexp approach could be put to better use.

reorder / rewrap bbcodes

I'm trying to reorder BBCode tags, but I failed. For example:
[b][i][u]foo[/b][/u][/i] - wrong order
I want it to be:
[b][i][u]foo[/u][/i][/b] - right order
I tried with
<?php
$string = '[b][i][u]foo[/b][/u][/i]';
$search = array('/\[b](.+?)\[\/b]/is', '/\[i](.+?)\[\/i]/is', '/\[u](.+?)\[\/u]/is');
$replace = array('[b]$1[/b]', '[i]$1[/i]', '[u]$1[/u]');
echo preg_replace($search, $replace, $string);
?>
OUTPUT: [b][i][u]foo[/b][/u][/i]
any suggestions ? thanks!
Phew, I spent a while thinking through the logic to do this (feel free to put it in a function).
This only works for the scenario given. As other users have commented, it's impossible in the general case, and you shouldn't be doing this, not even on the server side. I'd use a client-side parser just to throw a syntax error.
It supports [b]a[i]b[u]foo[/b]baa[/u]too[/i]
and BBCode with custom values: [url=test][i][u]foo[/url][/u][/i]
It will break with
[b] bold [/b][u] underline[/u]
and [b] bold [u][/b] underline[/u]
//input string to be reorganized
$string = '[url=test][i][u]foo[/url][/u][/i]';
echo $string . "<br />";
//search for all opentags (including ones with values
$tagsearch = "/\[([A-Za-z]+)[A-Za-z=._%?&:\/-]*\]/";
preg_match_all($tagsearch, $string, $tags);
//search for all close tags to store them for later
$closetagsearch = "/(\[\/([A-Za-z]+)\])/is";
preg_match_all($closetagsearch, $string, $closetags);
//flip the open tags for reverse parsing (index one is just letters)
$tags[1] = array_reverse($tags[1]);
//create temp var to store new ordered string
$temp = "";
//this is the last known position in the original string after a match
$last = 0;
//iterate through each char of the input string
for ($i = 0, $len = strlen($string); $i < $len; $i++) {
//if we run out of tags to replace/find stop looping
if (empty($tags[1]) || empty($closetags[1]))
break;
//this is the part of the string that has no matches
$good = substr($string, $last, $i - $last);
//next closing tag to search for
$next = $closetags[1][0];
//how many chars ahead to compare against
$scope = substr($string, $i, strlen($next));
//if we have a match
if ($scope === "$next") {
//add to the temp variable with a modified
//version of an open tag letter to become a close tag
$temp .= $good . substr_replace("[" . $tags[1][0] . "]", "/", 1, 0);
//remove the first key/value in both arrays
array_shift($tags[1]);
array_shift($closetags[1]);
//update the last known unmatched char
$last += strlen($good . $scope);
}
}
echo $temp;
Please also note: it might be the user's intention to nest the tags out of order :X
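A different way to approach the same problem is to keep a stack of open tags and rewrite each closing tag to close whichever tag was opened last. This is only a sketch of that swapped-in technique, not the method above: the function name reorderClosingTags is made up, it assumes tag names consist of letters only, and it will silently change the meaning of input that was deliberately nested out of order.

```php
<?php
// Hypothetical helper: rewrite closing tags so they close in reverse order of opening.
function reorderClosingTags(string $bbcode): string
{
    $stack = []; // names of the tags that are currently open
    $result = preg_replace_callback(
        '/\[(\/?)([a-z]+)(?:=[^\]]*)?\]/i',
        function (array $m) use (&$stack): string {
            if ($m[1] === '') {
                // Opening tag: remember its name and keep it unchanged
                $stack[] = $m[2];
                return $m[0];
            }
            // Closing tag: close the most recently opened tag instead
            $open = array_pop($stack);
            return $open === null ? $m[0] : '[/' . $open . ']';
        },
        $bbcode
    );
    return $result ?? $bbcode; // fall back to the input if the regex fails
}
```

With this approach, '[b][i][u]foo[/b][/u][/i]' should come out as '[b][i][u]foo[/u][/i][/b]'.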

Add space after every 4th character

I want to add a space to some output after every 4th character until the end of the string.
I tried:
$str = $rows['value'];
<? echo substr($str, 0, 4) . ' ' . substr($str, 4); ?>
Which just got me the space after the first 4 characters.
How can I make it show after every 4th ?
You can use chunk_split:
$str = chunk_split($rows['value'], 4, ' ');
Note that chunk_split appends the separator after every chunk, including the last one, so the result ends with a trailing space. If you don't want it, pass the result to trim (or rtrim).
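A minimal sketch of the combination, using a made-up sample value in place of $rows['value']:

```php
<?php
// Sample input standing in for $rows['value'] (an assumption for this example)
$value = '1234567890123456';

// chunk_split() adds a space after every 4 characters, including the last
// chunk, so rtrim() removes the trailing one.
$spaced = rtrim(chunk_split($value, 4, ' '));

echo $spaced, PHP_EOL; // 1234 5678 9012 3456
```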
Wordwrap does exactly what you want:
echo wordwrap('12345678' , 4 , ' ' , true );
will output:
1234 5678
If you want, say, a hyphen after every second digit instead, swap the "4" for a "2", and the space for a hyphen:
echo wordwrap('1234567890' , 2 , '-' , true );
will output:
12-34-56-78-90
Reference - wordwrap
Have you already seen this function called wordwrap?
http://us2.php.net/manual/en/function.wordwrap.php
Here is a solution. Works right out of the box like this.
<?php
$text = "Thiswordissoverylong.";
$newtext = wordwrap($text, 4, "\n", true);
echo "$newtext\n";
?>
Here is an example for a string whose length is not a multiple of 4 (or 5 in my case).
function space($str, $step, $reverse = false) {
if ($reverse)
return strrev(chunk_split(strrev($str), $step, ' '));
return chunk_split($str, $step, ' ');
}
Use :
echo space("0000000152748541695882", 5);
result: 00000 00152 74854 16958 82
Reverse mode use ("BVR code" for swiss billing) :
echo space("1400360152748541695882", 5, true);
result: 14 00360 15274 85416 95882
EDIT 2021-02-09
Also useful for EAN13 barcode formatting :
space("7640187670868", 6, true);
result : 7 640187 670868
short syntax version :
function space($s=false,$t=0,$r=false){return(!$s)?false:(($r)?trim(strrev(chunk_split(strrev($s),$t,' '))):trim(chunk_split($s,$t,' ')));}
Hope it could help some of you.
One way would be to split the string into 4-character chunks and then join them together again with a space between each part.
As this would technically fail to insert a space at the very end when the last chunk has exactly 4 characters, we need to add that one manually:
$chunk_length = 4;
$chunks = str_split($str, $chunk_length);
$last = end($chunks);
if (strlen($last) === $chunk_length) {
$chunks[] = '';
}
$str_with_spaces = implode(' ', $chunks);
one-liner:
$yourstring = "1234567890";
echo implode(" ", str_split($yourstring, 4))." ";
This should give you as output:
1234 5678 90
That's all :D
The function wordwrap() basically does the same, however this should work as well.
$newstr = '';
$len = strlen($str);
for($i = 0; $i < $len; $i++) {
$newstr.= $str[$i];
if (($i+1) % 4 == 0) {
$newstr.= ' ';
}
}
PHP3 Compatible:
Try this:
$strLen = strlen( $str );
for($i = 0; $i < $strLen; $i += 4){
echo substr($str, $i, 4) . ' ';
}
unset( $strLen );
StringBuilder str = new StringBuilder("ABCDEFGHIJKLMNOP");
int idx = str.length() - 4;
while (idx > 0){
str.insert(idx, " ");
idx = idx - 4;
}
return str.toString();
Explanation, this code will add space from right to left:
str = "ABCDEFGH"
int idx = total length - 4; //8-4=4
while (4>0){
str.insert(idx, " "); //this will insert space at 4th position
idx = idx - 4; // then decrement 4-4=0 and run loop again
}
The final output will be:
ABCD EFGH

Parse PHP String Based On Number of Characters

I'm starting to work on a small script that takes a string, counts the number of characters, then, based on the number of characters, splits/breaks the string apart and sends/emails 110 characters at a time.
What would be the proper logic/PHP to use to:
1) Count the number of characters in the string
2) Preface each message with (1/3) (2/3) (3/3), etc...
3) And only send 110 characters at a time.
I know I'd probably have to use strlen to count the characters, and some type of loop to loop through, but I'm not quite sure how to go about it.
Thanks!
You could use str_split, if you're not concerned with where you break the strings.
Else, if you are concerned with this (and want to, say, split only on a whitespace), you could do something like:
// $str is the string you want to chop up.
$split = preg_split('/(.{0,110})\s/',
$str,
0,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
With this array you could then simply do:
$count = count($split);
foreach ($split as $key => $message) {
$part = sprintf("(%d/%d) %s", $key+1, $count, $message);
// $part is now one of your messages;
// do what you wish with it here.
}
use str_split() and iterate over the resulting array.
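As a rough sketch of that approach (the function name numberedParts is hypothetical, and the 110-character limit here applies to the message text only, not to the "(1/3) " prefix):

```php
<?php
// Hypothetical helper: cut $text into pieces of at most $size characters
// and prefix each piece with its position, e.g. "(1/3) ".
function numberedParts(string $text, int $size = 110): array
{
    $chunks = str_split($text, $size);
    $total = count($chunks);
    $parts = [];
    foreach ($chunks as $i => $chunk) {
        $parts[] = sprintf('(%d/%d) %s', $i + 1, $total, $chunk);
    }
    return $parts;
}
```

Each element of the returned array can then be passed to whatever function sends the message. Note that str_split() counts bytes and may cut in the middle of a word.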
From the top of my head, should work as is, but doesn't have to. Logic is ok though.
foreach ($messages as $msg) {
$len = strlen($msg);
if ($len > 110) {
$parts = ceil($len / 110);
for ($i = 1; $i <= $parts; $i++) {
$part = $i . '/' . $parts . ' ' . substr($msg, 0, 110);
$msg = substr($msg, 110);
your_sending_func($part);
}
} else {
your_sending_func($msg);
}
}
