PHP sentence case function - php

I have got a function from Google to clean my paragraph in sentence case.
I want to modify this function such that it converts more than 2 new line character to newline. Else it converts all new line characters in space. So I would have it in paragraph format. Here it's converting all new line character to space.
It's converting in proper sentence case. But, In case If I found single word having first word as capital then I need function to ignore that. As Sometimes, if there would be any noun It needs to be capital. We can't change it small case. Else if noun and it is having more than 2 capital characters other than first character than convert it to lower case.
Like -> Noun => Noun but NoUn => noun. Means I want if other than first character is capital than it convert it two lower case Else it keep it in same format.
function sentence_case($string) {
$sentences = preg_split('/([.?!]+)/', $string, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$new_string = '';
foreach ($sentences as $key => $sentence) {
$new_string .= ($key & 1) == 0?
ucfirst(strtolower(trim($sentence))) :
$sentence.' ';
}
$new_string = preg_replace("/\bi\b/", "I", $new_string);
//$new_string = preg_replace("/\bi\'\b/", "I'", $new_string);
$new_string = clean_spaces($new_string);
$new_string = m_r_e_s($new_string);
return trim($new_string);
}

function sentence_case($str) {
$cap = true;
$ret='';
for($x = 0; $x < strlen($str); $x++){
$letter = substr($str, $x, 1);
if($letter == "." || $letter == "!" || $letter == "?"){
$cap = true;
}elseif($letter != " " && $cap == true){
$letter = strtoupper($letter);
$cap = false;
}
$ret .= $letter;
}
return $ret;
}
This will preserve existing proper noun capitals, acronyms and abbreviations.

This should do it:
function sentence_case($string) {
// Protect the paragraphs and the caps
$parDelim = "/(\n+\r?)/";
$string = preg_replace($parDelim, "#PAR#", $string);
$string = preg_replace("/\b([A-Z])/", "#$1#", $string);
$sentences = preg_split('/([.?!]+)/', $string, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$new_string = '';
foreach ($sentences as $key => $sentence) {
$new_string .= ($key & 1) == 0?
ucfirst(strtolower(trim($sentence))) :
$sentence.' ';
}
$new_string = preg_replace("/\bi\b/", "I", $new_string);
//$new_string = preg_replace("/\bi\'\b/", "I'", $new_string);
$new_string = clean_spaces($new_string);
$new_string = m_r_e_s($new_string);
// Restore the paragraphs and the caps
$new_string = preg_replace("#PAR#", PHP_EOL, $new_string);
$new_string = preg_replace("/#([A-Z])#/", "$1", $new_string);
return trim($new_string);
}
It works by identifying the items you want to protect (paragraph & first cap) and marking it with a string that you can then replace back when you are done. The assumption is that you don't already have #PAR# and #A-Z# in the string. You can use whatever you want to use for the paragraph delimiter afterwards if you want to force a certain type of line ending or several lines in between.

Slightly modifying Sylverdrag's function, I got something that is working well for me. This version works for every messed-up sentence I've thrown at it so far :)
function sentence_case($string) {
$string = strtolower($string); // ADDED THIS LINE
// Protect the paragraphs and the caps
$parDelim = "/(\n+\r?)/";
$string = preg_replace($parDelim, "#PAR#", $string);
$string = preg_replace("/\b([A-Z])/", "#$1#", $string);
$sentences = preg_split('/([.?!]+)/', $string, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$new_string = '';
foreach ($sentences as $key => $sentence) {
$new_string .= ($key & 1) == 0?
ucfirst(strtolower(trim($sentence))) :
$sentence.' ';
}
$new_string = preg_replace("/\bi\b/", "I", $new_string);
// GOT RID OF THESE LINES
// $new_string = preg_replace("/\bi\'\b/", "I'", $new_string);
// $new_string = clean_spaces($new_string);
// $new_string = m_r_e_s($new_string);
// Restore the paragraphs and the caps
$new_string = preg_replace("#PAR#", PHP_EOL, $new_string);
$new_string = preg_replace("/#([A-Z])#/", "$1", $new_string);
$new_string = preg_replace("/#([A-Z])#/", "$1", $new_string);
preg_match('/#(.*?)#/', $new_string, $match); // MODIFIED THIS LINE TO USE (.*?)
return trim($new_string);
}
$string = "that IS it!! YOU and i are Totally DONE hanging out toGether... like FOREveR DUDE! i AM serious. a good FRIEND would NOT treAT me LIKE you dO. it Seems, howEVER, THAT you ARE not ONE of tHe GOOD ONEs.";
echo "<b>String 1:</b> " . $string . "<br><b>String 2:</b> " . sentence_case($string);
PHP Sandbox
:) :)

Related

How to print longest and shortest word from text file story.txt and find the frequency of each of them in php?

$string = file_get_contents('./story.txt', true);
$words = explode('\n', $string);
$w = array();
foreach ($words as $word) {
$temp = explode(' ', $word);
$w = array_merge($w, $temp);
}
$longestWordLength = 0;
$longestWord = '';
foreach ($w as $word) {
if (strlen($word) > $longestWordLength) {
$longestWordLength = strlen($word);
$longestWord = $word;
}
}
echo $longestWord;
echo strlen($longestWord);
I have written this code but it scans the ending word from one para and first word from next paragraph as same. In the following para:
arts along the stream looks almost like a flash of sunlight.
Desert animals are generally the color of the desert.
In this,
sunlight. Desert
is treated as one word.
You can try this way to get the longest string from the file text, I've removed extra spaces and dot characters from the array using array_map()
function longest_string_in_array($array) {
$array = array_map(function($item) { return trim($item, '. ');},$array);
$mapping = array_combine($array, array_map('strlen', $array));
return $mapping;
}
$string = file_get_contents('./story.txt', true);
$array = preg_split('/[\s]+/', $string );
print_r(longest_string_in_array($array));
So, I guess the answer to this handsome boy's problem is that those words are not splitting apart that are separated by anything other than single whitespace. In this case the solution is to use,
$string = file_get_contents('./story.txt', true);
$words = preg_split('/\s+/', $string);
and you're good to go.

Edit all odd words in string to upper case

I need to edit all odd words to upper case.
Here is sample of imput string:
very long string with many words
Expected output:
VERY long STRING with MANY words
I have this code, but it seams to me, that I can do it in better way.
<?php
$lines = file($_FILES["fname"]["tmp_name"]);
$pattern = "/(\S[\w]*)/";
foreach($lines as $value)
{
$words = NULL;
$fin_str = NULL;
preg_match_all($pattern, $value, $matches);
for($i = 0; $i < count($matches[0]); $i = $i + 2){
$matches[0][$i] = strtoupper($matches[0][$i]);
$fin_str = implode(" ", $matches[0]);
}
echo $fin_str ."<br>";
P.S. I need to use only preg_match function.
Here's a preg_replace_callback example:
<?php
$str = 'very long string with many words';
$newStr = preg_replace_callback('/([^ ]+) +([^ ]+)/',
function($matches) {
return strtoupper($matches[1]) . ' ' . $matches[2];
}, $str);
print $newStr;
// VERY long STRING with MANY words
?>
You only need to match the repeating pattern: /([^ ]+) +([^ ]+)/, a pair of words, then preg_replace_callback recurses over the string until all possible matches are matched and replaced. preg_replace_callback is necessary to call the strtoupper function and pass the captured backreference to it.
Demo
If you have to use regular expressions, this should get you started:
$input = 'very long string with many words';
if (preg_match_all('/\s*(\S+)\s*(\S+)/', $input, $matches)) {
$words = array();
foreach ($matches[1] as $key => $odd) {
$even = isset($matches[2][$key]) ? $matches[2][$key] : null;
$words[] = strtoupper($odd);
if ($even) {
$words[] = $even;
}
}
echo implode(' ', $words);
}
This will output:
VERY long STRING with MANY words
You may don't need regex simply use explode and concatenate the string again:
<?php
function upperizeEvenWords($str){
$out = "";
$arr = explode(' ', $str);
for ($i = 0; $i < count($arr); $i++){
if (!($i%2)){
$out .= strtoupper($arr[$i])." ";
}
else{
$out .= $arr[$i]." ";
}
}
return trim($out);
}
$str = "very long string with many words";
echo upperizeEvenWords($str);
Checkout this DEMO

Propper Regex Pattern

I need to check the incoming string and leave only characters, matching:
small case a-z letters
_ character
any numbers
only one dot (first one)
$string = 'contDADdas7.6.asdASDj_##e1!Ddd__aa#S.txt';
$pattern = "/[a-z_0-9]+/";
preg_match_all("/[a-z_0-9]+/", $name, $result);
echo implode('', $result[0]);
has to be
contdas7.6asdj_e1dd__aatxt
It matches first three points, how can I take only one first dot ?
You can try this:
$string = strrev($string);
$string = preg_replace('~[^a-z0-9_.]++|\.(?![^.]*$)~', '', $string);
$string = strrev($string);
An other way:
$strs = explode('.', $string);
if (count($strs)>1) {
$strs[0] .= '.' . $strs[1];
unset($strs[1]);
}
$string = preg_replace('~[^a-z0-9_.]++~', '', implode('', $strs));
<?php
$str = "contDADdas7.6.asdASDj_##e1!Ddd__aa#S.txt";
preg_match_all("/[a-z_0-9\.]+/", $str, $match);
$newstr = implode("", $match[0]);
echo substr_replace(str_replace(".", "", $newstr), ".", strpos($newstr, "."), 0);
Output:
contdas7.6asdj_e1dd__aatxt

preg_replace - replace certain character

i want : He had XXX to have had it. Or : He had had to have XXX it.
$string = "He had had to have had it.";
echo preg_replace('/had/', 'XXX', $string, 1);
output :
He XXX had to have had it.
in the case of, 'had' is replaced is the first.
I want to use the second and third. not reading from the right or left, what "preg_replace" can do it ?
$string = "He had had to have had it.";
$replace = 'XXX';
$counter = 0; // Initialise counter
$entry = 2; // The "found" occurrence to replace (starting from 1)
echo preg_replace_callback(
'/had/',
function ($matches) use ($replace, &$counter, $entry) {
return (++$counter == $entry) ? $replace : $matches[0];
},
$string
);
Try this:
<?php
function my_replace($srch, $replace, $subject, $skip=1){
$subject = explode($srch, $subject.' ', $skip+1);
$subject[$skip] = str_replace($srch, $replace, $subject[$skip]);
while (($tmp = array_pop($subject)) == '');
$subject[]=$tmp;
return implode($srch, $subject);
}
$test ="He had had to have had it.";;
echo my_replace('had', 'xxx', $test);
echo "<br />\n";
echo my_replace('had', 'xxx', $test, 2);
?>
Look at CodeFiddle
Probably not going to win any concours d'elegance with this, but very short:
$string = "He had had to have had it.";
echo strrev(preg_replace('/dah/', 'XXX', strrev($string), 1));
Try this
Solution
function generate_patterns($string, $find, $replace) {
// Make single statement
// Replace whitespace characters with a single space
$string = preg_replace('/\s+/', ' ', $string);
// Count no of patterns
$count = substr_count($string, $find);
// Array of result patterns
$solutionArray = array();
// Require for substr_replace
$findLength = strlen($find);
// Hold index for next replacement
$lastIndex = -1;
// Generate all patterns
for ( $i = 0; $i < $count ; $i++ ) {
// Find next word index
$lastIndex = strpos($string, $find, $lastIndex+1);
array_push( $solutionArray , substr_replace($string, $replace, $lastIndex, $findLength));
}
return $solutionArray;
}
$string = "He had had to have had it.";
$find = "had";
$replace = "yz";
$solutionArray = generate_patterns($string, $find, $replace);
print_r ($solutionArray);
Output :
Array
(
[0] => He yz had to have had it.
[1] => He had yz to have had it.
[2] => He had had to have yz it.
)
I manage this code try to optimize it.

Remove all the line breaks from the html source

Well I know obfuscation is a bad idea. But I want all of my html code to come in one long single line. All the html tags are generated through PHP, so I think its possible. I knew replacing \n\r from regular expression, but have no idea how to do this one. In case I am unclear here is an example
$output = '<p>
<div class="title">Hello</div>
</p>';
echo $output;
To be view in the source viewer as <p><div class="title">Hello</div></p>
Maybe this?
$output = str_replace(array("\r\n", "\r"), "\n", $output);
$lines = explode("\n", $output);
$new_lines = array();
foreach ($lines as $i => $line) {
if(!empty($line))
$new_lines[] = trim($line);
}
echo implode($new_lines);
You can try this perhaps.
// Before any output
ob_start();
// End of file
$output = ob_get_clean();
echo preg_replace('/^\s+|\n|\r|\s+$/m', '', $output);
This should, unless I messed up the regex, catch all output, and then replace all new line characters as well as all whitespace at the end and beginning of lines.
If you already have all output collected in a variable, you can of course just use the last line directly and skip the output buffering stuff :)
Worked for me:
$output = str_replace(array("\r\n", "\r", "\n"), "", $output);
You can do :
$output = '<p>'.
'<div class="title">Hello</div>'.
'</p>';
This way, $output won't contain any line jump.
This should also work :
$output = preg_replace(array('/\r/', '/\n/'), '', $output);
$output = preg_replace('!\s+!m', ' ', $output);
This is already well answered, but you may be able to do more than just trim spaces at both ends of each line:
First extract all text within quotes (you don't want to touch those), replace with a marker with a sequence number, store the sequence number with the text
Extract all text within <script></script> tags and do the same as step #1
Replace all white-space (including \n, \r) with spaces
Replace all >1 space sequences with 1 space
Replace all >_< with >< (_ = space)
Replace all _>, <_ and </_ with >, < and </ (_ = space)
Replace markers with the actual texts
This procedure can potentially compact the entire HTML file. This takes advantage of the fact that multiple white-space text inside HTML tags are intepreted as one single space.
This is a (as far as I have tested) working implementation of Stephen Chung's instructions. I'm not entirely convinced by number five, but have included it anyway.
Put the things you want to protect in the protected_parts array. Do it in order that you want them protected. If the starting and ending bits are different (as they would be in HTML tags), separate them by using a comma.
Also, I've no idea if this is the most optimised way of doing this, but it works for me and seems reasonably fast. Feel free to improve, etc. (Let me know if you do too!)
function MinifyHTML($str) {
$protected_parts = array("<pre>,</pre>", "\"", "'");
$extracted_values = array();
$i = 0;
foreach ($protected_parts as $part) {
$finished = false;
$search_offset = 0;
$first_offset = 0;
$startend = explode(",", $part);
if (count($startend) == 1) { $startend[1] = $startend[0]; }
while (!$finished) {
$first_offset = strpos($str, $startend[0], $search_offset);
if ($first_offset === false) { $finished = true; }
else {
$search_offset = strpos($str, $startend[1], $first_offset + strlen($startend[0]));
$extracted_values[$i] = substr($str, $first_offset + strlen($startend[0]), $search_offset - $first_offset - strlen($startend[0]));
$str = substr($str, 0, $first_offset + strlen($startend[0]))."$#".$i."$".substr($str, $search_offset);
$search_offset += strlen($startend[1]) + strlen((string)$i) + 3 - strlen($extracted_values[$i]);
$i++;
}
}
}
$str = preg_replace("/\s/", " ", $str);
$str = preg_replace("/\s{2,}/", " ", $str);
$str = str_replace("> <", "><", $str);
$str = str_replace(" >", ">", $str);
$str = str_replace("< ", "<", $str);
$str = str_replace("</ ", "</", $str);
for ($i = count($extracted_values); $i >= 0; $i--) {
$str = str_replace("$#".$i."$", $extracted_values[$i], $str);
}
return $str;
}
This is an improved function of the above. It adds text area protection and also anything that is a tag remains untouched.
I also removed strlen in the loop (its static).
This might run faster as a one pass filter to check for any of the protected parts. For such a small protected_parts array it's going to be more efficient than looping through the $str four times.
Also this doesn't fix: class = " " (the extra spaces between = and ") as its stuff inside the tags.
function MinifyHTML($str) {
$protected_parts = array('<pre>,</pre>','<textarea>,</textarea>', '<,>');
$extracted_values = array();
$i = 0;
foreach ($protected_parts as $part) {
$finished = false;
$search_offset = $first_offset = 0;
$end_offset = 1;
$startend = explode(',', $part);
if (count($startend) === 1) $startend[1] = $startend[0];
$len0 = strlen($startend[0]); $len1 = strlen($startend[1]);
while ($finished === false) {
$first_offset = strpos($str, $startend[0], $search_offset);
if ($first_offset === false) $finished = true;
else {
$search_offset = strpos($str, $startend[1], $first_offset + $len0);
$extracted_values[$i] = substr($str, $first_offset + $len0, $search_offset - $first_offset - $len0);
$str = substr($str, 0, $first_offset + $len0).'$$#'.$i.'$$'.substr($str, $search_offset);
$search_offset += $len1 + strlen((string)$i) + 5 - strlen($extracted_values[$i]);
++$i;
}
}
}
$str = preg_replace("/\s/", " ", $str);
$str = preg_replace("/\s{2,}/", " ", $str);
$replace = array('> <'=>'><', ' >'=>'>','< '=>'<','</ '=>'</');
$str = str_replace(array_keys($replace), array_values($replace), $str);
for ($d = 0; $d < $i; ++$d)
$str = str_replace('$$#'.$d.'$$', $extracted_values[$d], $str);
return $str;
}
You can't have <div> inside <p> - it is not spec-valid.
If you don't need to store it in a variable you can use this:
?><div><?php
?><div class="title">Hello</div><?php
?></div><?php

Categories