Sentence case without regular expressions - php

Is there a built-in php function, or a simple (efficient!) way to combine built-in functions, to give a string sentence case ("Sentence one. Sentence two.")?
PHP has similar built-in functions, but none that I can find for my it to my purposes:
ucfirst(strtolower("SENTENCE ONE. AND HERE'S TWO.")) returns "Sentence one. and here's two."; ucwords(strtolower("SENTENCE ONE. AND HERE'S TWO.")) "Sentence One. And Here's Two."

function sentence_case($str) {
$cap = true;
$ret='';
for($x = 0; $x < strlen($str); $x++){
$letter = substr($str, $x, 1);
if($letter == "." || $letter == "!" || $letter == "?"){
$cap = true;
}elseif($letter != " " && $cap == true){
$letter = strtoupper($letter);
$cap = false;
}
$ret .= $letter;
}
return $ret;
}
This will preserve existing proper noun capitals, acronyms and abbreviations.

You could split the string on ".", then ucfirst each sentence. Not the most elegant solution, but it works.
$sentences = explode(".",$paragraph);
$text = "";
foreach($sentences as $sentence) {
$text .= ucfirst(strtolower($sentence)).".";
}

Try this:
function sentenceCase($s){
$str = strtolower($s);
$cap = true;
for($x = 0; $x < strlen($str); $x++){
$letter = substr($str, $x, 1);
if($letter == "." || $letter == "!" || $letter == "?"){
$cap = true;
}elseif($letter != " " && $cap == true){
$letter = strtoupper($letter);
$cap = false;
}
$ret .= $letter;
}
return $ret;
}
Taken from php.net Works with more than just periods as line endings.

I came up with this solution using preg_split. It will try to split sentences on . boundaries where there is one or more spaces after the period.
It is still pretty efficient, but arguably less so that it's explode counterpart.
<?php
$str = "SENTENCE ONE. AND HERE'S TWO.";
$sentences = preg_split('/(\.\s+)/', $str, null, PREG_SPLIT_DELIM_CAPTURE);
array_walk(&$sentences, create_function('&$val', '$val = ucfirst(strtolower($val));'));
$str = implode('', $sentences);
echo $str; // Sentence one. And here's two.

Will work with new line breaks not only spaces.
function sentenceCase($text){
$cap = true; $newText = '';
for($x = 0; $x < strlen($text); $x++){
$letter = substr($text, $x, 1);
if($letter == '.' || $letter == '!' || $letter == '?' || $letter == "\n"){
$cap = true;
} elseif($letter != ' ' && $cap == true){
$letter = strtoupper($letter);
$cap = false;
}
$newText .= $letter;
}
return $newText;
}

Related

Php function UTF-8 characters issue

Here is my function that makes the first character of the first word of a sentence uppercase:
function sentenceCase($str)
{
$cap = true;
$ret = '';
for ($x = 0; $x < strlen($str); $x++) {
$letter = substr($str, $x, 1);
if ($letter == "." || $letter == "!" || $letter == "?") {
$cap = true;
} elseif ($letter != " " && $cap == true) {
$letter = strtoupper($letter);
$cap = false;
}
$ret .= $letter;
}
return $ret;
}
It converts "sample sentence" into "Sample sentence". The problem is, it doesn't capitalize UTF-8 characters. See this example.
What am I doing wrong?
The most straightforward way to make your code UTF-8 aware is to use mbstring functions instead of the plain dumb ones in the three cases where the latter appear:
function sentenceCase($str)
{
$cap = true;
$ret = '';
for ($x = 0; $x < mb_strlen($str); $x++) { // mb_strlen instead
$letter = mb_substr($str, $x, 1); // mb_substr instead
if ($letter == "." || $letter == "!" || $letter == "?") {
$cap = true;
} elseif ($letter != " " && $cap == true) {
$letter = mb_strtoupper($letter); // mb_strtoupper instead
$cap = false;
}
$ret .= $letter;
}
return $ret;
}
You can then configure mbstring to work with UTF-8 strings and you are ready to go:
mb_internal_encoding('UTF-8');
echo sentenceCase ("üias skdfnsknka");
Bonus solution
Specifically for UTF-8 you can also use a regular expression, which will result in less code:
$str = "üias skdfnsknka";
echo preg_replace_callback(
'/((?:^|[!.?])\s*)(\p{Ll})/u',
function($match) { return $match[1].mb_strtoupper($match[2], 'UTF-8'); },
$str);

match the first & last whole word in a variable

I use php preg_match to match the first & last word in a variable with a given first & last specific words,
example:
$first_word = 't'; // I want to force 'this'
$last_word = 'ne'; // I want to force 'done'
$str = 'this function can be done';
if(preg_match('/^' . $first_word . '(.*)' . $last_word .'$/' , $str))
{
echo 'true';
}
But the problem is i want to force match the whole word at (starting & ending) not the first or last characters.
Using \b as boudary word limit in search:
$first_word = 't'; // I want to force 'this'
$last_word = 'ne'; // I want to force 'done'
$str = 'this function can be done';
if(preg_match('/^' . $first_word . '\b(.*)\b' . $last_word .'$/' , $str))
{
echo 'true';
}
I would go about this in a slightly different way:
$firstword = 't';
$lastword = 'ne';
$string = 'this function can be done';
$words = explode(' ', $string);
if (preg_match("/^{$firstword}/i", reset($words)) && preg_match("/{$lastword}$/i", end($words)))
{
echo 'true';
}
==========================================
Here's another way to achieve the same thing
$firstword = 'this';
$lastword = 'done';
$string = 'this can be done';
$words = explode(' ', $string);
if (reset($words) === $firstword && end($words) === $lastword)
{
echo 'true';
}
This is always going to echo true, because we know the firstword and lastword are correct, try changing them to something else and it will not echo true.
I wrote a function to get Start of sentence but it is not any regex in it.
You can write for end like this. I don't add function for the end because of its long...
<?php
function StartSearch($start, $sentence)
{
$data = explode(" ", $sentence);
$flag = false;
$ret = array();
foreach ($data as $val)
{
for($i = 0, $j = 0;$i < strlen($val), $j < strlen($start);$i++)
{
if ($i == 0 && $val{$i} != $start{$j})
break;
if ($flag && $val{$i} != $start{$j})
break;
if ($val{$i} == $start{$j})
{
$flag = true;
$j++;
}
}
if ($j == strlen($start))
{
$ret[] = $val;
}
}
return $ret;
}
print_r(StartSearch("th", $str));
?>

How To Replace Some Characters With Asterisks

I have a simple task to do with PHP, but since I'm not familiar with Regular Expression or something... I have no clue what I'm going to do.
what I want is very simple actually...
let's say I have these variables :
$Email = 'john#example.com'; // output : ****#example.com
$Email2 = 'janedoe#example.com'; // output : *******#example.com
$Email3 = 'johndoe2012#example.com'; // output : ***********#example.com
$Phone = '0821212121'; // output : 082121**** << REPLACE LAST FOUR DIGIT WITH *
how to do this with PHP? thanks.
You'll need a specific function for each. For mails:
function hide_mail($email) {
$mail_segments = explode("#", $email);
$mail_segments[0] = str_repeat("*", strlen($mail_segments[0]));
return implode("#", $mail_segments);
}
echo hide_mail("example#gmail.com");
For phone numbers
function hide_phone($phone) {
return substr($phone, 0, -4) . "****";
}
echo hide_phone("1234567890");
And see? Not a single regular expression used. These functions don't check for validity though. You'll need to determine what kind of string is what, and call the appropriate function.
For e-mails, this function preserves first letter:
function hideEmail($email)
{
$parts = explode('#', $email);
return substr($parts[0], 0, min(1, strlen($parts[0])-1)) . str_repeat('*', max(1, strlen($parts[0]) - 1)) . '#' . $parts[1];
}
hideEmail('hello#domain.com'); // h****#domain.com
hideEmail('hi#domain.com'); // h*#domain.com
hideEmail('h#domain.com'); // *#domain.com
I tried for a single-regex solution but don't think it's possible due to the variable-length asterisks. Perhaps something like this:
function anonymiseString($str)
{
if(is_numeric($str))
{
$str = preg_replace('/^(\d*?)\d{4}$/', '$1****');
}
elseif(($until = strpos($str, '#')) !== false)
{
$str = str_repeat('*', $until) . substr($str, $until + 1);
}
return $str;
}
I create one function to do this, works fine for me. i hope help.
function ofuscaEmail($email, $domain_ = false){
$seg = explode('#', $email);
$user = '';
$domain = '';
if (strlen($seg[0]) > 3) {
$sub_seg = str_split($seg[0]);
$user .= $sub_seg[0].$sub_seg[1];
for ($i=2; $i < count($sub_seg)-1; $i++) {
if ($sub_seg[$i] == '.') {
$user .= '.';
}else if($sub_seg[$i] == '_'){
$user .= '_';
}else{
$user .= '*';
}
}
$user .= $sub_seg[count($sub_seg)-1];
}else{
$sub_seg = str_split($seg[0]);
$user .= $sub_seg[0];
for ($i=1; $i < count($sub_seg); $i++) {
$user .= ($sub_seg[$i] == '.') ? '.' : '*';
}
}
$sub_seg2 = str_split($seg[1]);
$domain .= $sub_seg2[0];
for ($i=1; $i < count($sub_seg2)-2; $i++) {
$domain .= ($sub_seg2[$i] == '.') ? '.' : '*';
}
$domain .= $sub_seg2[count($sub_seg2)-2].$sub_seg2[count($sub_seg2)-1];
return ($domain_ == false) ? $user.'#'.$seg[1] : $user.'#'.$domain ;
}
Output: a******#gmail.com
$email = str_replace(substr($old_email, 1, strlen(explode("#", $old_email)[0])-1), "**********", $old_email);
This is a quick fix to the question above;
It ensures just the first character of the email address as the extension shows up.
You can increase or reduce the number of asterisks depending

Cutting text without destroying html tags

Is there a way to do this without writing my own function?
For example:
$text = 'Test <span><a>something</a> something else</span>.';
$text = cutText($text, 2, null, 20, true);
//result: Test <span><a>something</a></span>
I need to make this function indestructible
My problem is similar to
This thread
but I need a better solution. I would like to keep nested tags untouched.
So far my algorithm is:
function cutText($content, $max_words, $max_chars, $max_word_len, $html = false) {
$len = strlen($content);
$res = '';
$word_count = 0;
$word_started = false;
$current_word = '';
$current_word_len = 0;
if ($max_chars == null) {
$max_chars = $len;
}
$inHtml = false;
$openedTags = array();
for ($i = 0; $i<$max_chars;$i++) {
if ($content[$i] == '<' && $html) {
$inHtml = true;
}
if ($inHtml) {
$max_chars++;
}
if ($html && !$inHtml) {
if ($content[$i] != ' ' && !$word_started) {
$word_started = true;
$word_count++;
}
$current_word .= $content[$i];
$current_word_len++;
if ($current_word_len == $max_word_len) {
$current_word .= '- ';
}
if (($content[$i] == ' ') && $word_started) {
$word_started = false;
$res .= $current_word;
$current_word = '';
$current_word_len = 0;
if ($word_count == $max_words) {
return $res;
}
}
}
if ($content[$i] == '<' && $html) {
$inHtml = true;
}
}
return $res;
}
But of course it won't work. I thought about remembering opened tags and closing them if they were not closed but maybe there is a better way?
This works perfectly for me:
function trimContent ($str, $trimAtIndex) {
$beginTags = array();
$endTags = array();
for($i = 0; $i < strlen($str); $i++) {
if( $str[$i] == '<' )
$beginTags[] = $i;
else if($str[$i] == '>')
$endTags[] = $i;
}
foreach($beginTags as $k=>$index) {
// Trying to trim in between tags. Trim after the last tag
if( ( $trimAtIndex >= $index ) && ($trimAtIndex <= $endTags[$k]) ) {
$trimAtIndex = $endTags[$k];
}
}
return substr($str, 0, $trimAtIndex);
}
Try something like this
function cutText($inputText, $start, $length) {
$temp = $inputText;
$res = array();
while (strpos($temp, '>')) {
$ts = strpos($temp, '<');
$te = strpos($temp, '>');
if ($ts > 0) $res[] = substr($temp, 0, $ts);
$res[] = substr($temp, $ts, $te - $ts + 1);
$temp = substr($temp, $te + 1, strlen($temp) - $te);
}
if ($temp != '') $res[] = $temp;
$pointer = 0;
$end = $start + $length - 1;
foreach ($res as &$part) {
if (substr($part, 0, 1) != '<') {
$l = strlen($part);
$p1 = $pointer;
$p2 = $pointer + $l - 1;
$partx = "";
if ($start <= $p1 && $end >= $p2) $partx = "";
else {
if ($start > $p1 && $start <= $p2) $partx .= substr($part, 0, $start-$pointer);
if ($end >= $p1 && $end < $p2) $partx .= substr($part, $end-$pointer+1, $l-$end+$pointer);
if ($partx == "") $partx = $part;
}
$part = $partx;
$pointer += $l;
}
}
return join('', $res);
}
Parameters:
$inputText - input text
$start - position of first character
$length - how menu characters we want to remove
Example #1 - Removing first 3 characters
$text = 'Test <span><a>something</a> something else</span>.';
$text = cutText($text, 0, 3);
var_dump($text);
Output (removed "Tes")
string(47) "t <span><a>something</a> something else</span>."
Removing first 10 characters
$text = cutText($text, 0, 10);
Output (removed "Test somet")
string(40) "<span><a>hing</a> something else</span>."
Example 2 - Removing inner characters - "es" from "Test "
$text = cutText($text, 1, 2);
Output
string(48) "Tt <span><a>something</a> something else</span>."
Removing "thing something el"
$text = cutText($text, 9, 18);
Output
string(32) "Test <span><a>some</a>se</span>."
Hope this helps.
Well, maybe this is not the best solution but it's everything I can do at the moment.
Ok I solved this thing.
I divided this in 2 parts.
First cutting text without destroying html:
function cutHtml($content, $max_words, $max_chars, $max_word_len) {
$len = strlen($content);
$res = '';
$word_count = 0;
$word_started = false;
$current_word = '';
$current_word_len = 0;
if ($max_chars == null) {
$max_chars = $len;
}
$inHtml = false;
$openedTags = array();
$i = 0;
while ($i < $max_chars) {
//skip any html tags
if ($content[$i] == '<') {
$inHtml = true;
while (true) {
$res .= $content[$i];
$i++;
while($content[$i] == ' ') { $res .= $content[$i]; $i++; }
//skip any values
if ($content[$i] == "'") {
$res .= $content[$i];
$i++;
while(!($content[$i] == "'" && $content[$i-1] != "\\")) {
$res .= $content[$i];
$i++;
}
}
//skip any values
if ($content[$i] == '"') {
$res .= $content[$i];
$i++;
while(!($content[$i] == '"' && $content[$i-1] != "\\")) {
$res .= $content[$i];
$i++;
}
}
if ($content[$i] == '>') { $res .= $content[$i]; $i++; break;}
}
$inHtml = false;
}
if (!$inHtml) {
while($content[$i] == ' ') { $res .= $content[$i]; $letter_count++; $i++; } //skip spaces
$word_started = false;
$current_word = '';
$current_word_len = 0;
while (!in_array($content[$i], array(' ', '<', '.', ','))) {
if (!$word_started) {
$word_started = true;
$word_count++;
}
$current_word .= $content[$i];
$current_word_len++;
if ($current_word_len == $max_word_len) {
$current_word .= '-';
$current_word_len = 0;
}
$i++;
}
if ($letter_count > $max_chars) {
return $res;
}
if ($word_count < $max_words) {
$res .= $current_word;
$letter_count += strlen($current_word);
}
if ($word_count == $max_words) {
$res .= $current_word;
$letter_count += strlen($current_word);
return $res;
}
}
}
return $res;
}
And next thing is closing unclosed tags:
function cleanTags(&$html) {
$count = strlen($html);
$i = -1;
$openedTags = array();
while(true) {
$i++;
if ($i >= $count) break;
if ($html[$i] == '<') {
$tag = '';
$closeTag = '';
$reading = false;
//reading whole tag
while($html[$i] != '>') {
$i++;
while($html[$i] == ' ') $i++; //skip any spaces (need to be idiot proof)
if (!$reading && $html[$i] == '/') { //closing tag
$i++;
while($html[$i] == ' ') $i++; //skip any spaces
$closeTag = '';
while($html[$i] != ' ' && $html[$i] != '>') { //start reading first actuall string
$reading = true;
$html[$i] = strtolower($html[$i]); //tags to lowercase
$closeTag .= $html[$i];
$i++;
}
$c = count($openedTags);
if ($c > 0 && $openedTags[$c-1] == $closeTag) array_pop($openedTags);
}
if (!$reading) //read only tag
while($html[$i] != ' ' && $html[$i] != '>') { //start reading first actuall string
$reading = true;
$html[$i] = strtolower($html[$i]); //tags to lowercase
$tag .= $html[$i];
$i++;
}
//skip any values
if ($html[$i] == "'") {
$i++;
while(!($html[$i] == "'" && $html[$i-1] != "\\")) {
$i++;
}
}
//skip any values
if ($html[$i] == '"') {
$i++;
while(!($html[$i] == '"' && $html[$i-1] != "\\")) {
$i++;
}
}
if ($reading && $html[$i] == '/') { //self closed tag
$tag = '';
break;
}
}
if (!empty($tag)) $openedTags[] = $tag;
}
}
while (count($openedTags) > 0) {
$tag = array_pop($openedTags);
$html .= "</$tag>";
}
}
It's not idiot proof but tinymce will clear this thing out so further cleaning is not necessary.
It may be a little long but i don't think it will eat a lot of resources and it should be faster than regex.

PHP - isolate first letter from a text

I have a set of articles, in which I want to style the first letter from each article (with CSS).
the articles usually start with a paragrah, like:
<p> bla bla </p>
So how could I wrap the first letter from this text within a <span> tag ?
Unless you need to do something extremely fancy, there's also the :first-letter CSS selector.
<?php
$str = '<p> bla bla </p>';
$search = '_^<p> *([\w])(.+) *</p>$_i';
$replacement = '<p><span>$1</span>$2</p>';
$new = preg_replace( $search, $replacement, $str );
echo $new."\n";
You can do this in all CSS.
CSS supports "Pseudo-Elements" where you can choose the first letter / first word and format it differently from the rest of the document.
http://www.w3schools.com/CSS/CSS_pseudo_elements.asp
There's a compatibility chart; some of these may not work in IE 6
http://kimblim.dk/css-tests/selectors/
you could add a Php span but might not be as clean
$s = " la la ";
$strip = trim(strip_tags($s));
$t = explode(' ', $strip);
$first = $t[0];
// then replace first character with span around it
$replace = preg_replace('/^?/', '$1', $first);
// then replace the first time of that word in the string
$s = preg_replace('/'.$first.'/', $replace, $s, 1);
echo $s;
//not tested
I've not found a versatile method yet, but a traditional code implementation (that may be slower) works:
function pos_first_letter($haystack) {
$ret = false;
if (!empty($haystack)) {
$l = strlen($haystack);
$t = false;
for ($i=0; $i < $l; $i++) {
if (!$t && ($haystack[$i] == '<') ) $t = true;
elseif ($t && ($haystack[$i] == '>')) $t = false;
elseif (!$t && !ctype_space($haystack[$i])) {
$ret = $i;
break;
}
}
}
return $ret;
}
Then call:
$i = pos_first_letter( $your_string );
if ($i !== false) {
$output = substr($s, 0, $i);
$output .= '<span>' . substr($s, $i, 1) . '</span>';
$output .= substr($s, $i+1);
}

Categories