Uppercasing first letters of words using preg_replace - php

I need to capitalize names that always arrive in lower case.
e.g. john johnsson -> John Johnsson
but also:
jonny-bart johnsson -> Jonny-Bart Johnsson
How do I accomplish this using PHP?

You could also use a regular expression:
preg_replace_callback('/\b\p{Ll}/u', 'callback', $str)
\b represents a word boundary and \p{Ll} describes any lowercase letter in Unicode; the u modifier makes the pattern treat $str as UTF-8. preg_replace_callback will call a function named callback for each match and replace the match with its return value:
function callback($match) {
    return mb_strtoupper($match[0]);
}
Here mb_strtoupper is used to turn the matched lowercase letter into uppercase.

If you're expecting Unicode characters, or even if you're not, I recommend using mb_convert_case. You shouldn't need preg_replace when there's a built-in PHP function for this.
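A minimal sketch of that suggestion, assuming the mbstring extension is loaded and the input is UTF-8. The helper name titleCaseName is hypothetical. Apostrophes are deliberately left out of the example because their handling in title-case mode differs between PHP versions.

```php
<?php
// Title-case a name with mbstring instead of a regex.
// titleCaseName() is a hypothetical helper; UTF-8 input is an assumption.
function titleCaseName(string $name): string
{
    return mb_convert_case($name, MB_CASE_TITLE, 'UTF-8');
}

echo titleCaseName('john johnsson'), "\n";        // John Johnsson
echo titleCaseName('jonny-bart johnsson'), "\n";  // Jonny-Bart Johnsson
```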

<?php
//FUNCTION
function ucname($string) {
    $string = ucwords(strtolower($string));
    foreach (array('-', '\'') as $delimiter) {
        if (strpos($string, $delimiter) !== false) {
            $string = implode($delimiter, array_map('ucfirst', explode($delimiter, $string)));
        }
    }
    return $string;
}
?>
<?php
//TEST
$names = array(
    'JEAN-LUC PICARD',
    'MILES O\'BRIEN',
    'WILLIAM RIKER',
    'geordi la forge',
    'bEvErly CRuSHeR'
);
foreach ($names as $name) { print ucname("{$name}\n"); }
//PRINTS:
/*
Jean-Luc Picard
Miles O'Brien
William Riker
Geordi La Forge
Beverly Crusher
*/
?>
From comments on the PHP manual entry for ucwords.

With regexps:
$out = preg_replace_callback("/[a-z]+/i", 'ucfirst_match', $in);
function ucfirst_match($match)
{
    return ucfirst(strtolower($match[0]));
}

Here's what I came up with (tested)...
// characters other than space and dash after which letters should be capitalized
$chars = "'";
function callback($matches) {
    return $matches[1] . strtoupper($matches[2]);
}
$name = "john doe";
$name = preg_replace_callback('/(^|[ \-'.$chars.'])([a-z])/', "callback", $name);
Or if you have PHP 5.3+ this is probably better (untested):
function capitalizeName($name, $chars = "'") {
    return preg_replace_callback('/(^|[ \-'.$chars.'])([a-z])/',
        function ($matches) {
            return $matches[1] . strtoupper($matches[2]);
        }, $name);
}
My solution is a bit more verbose than some of the others posted, but I believe it offers the best flexibility (you can modify the $chars string to change which characters can separate names).

Related

substr() to preg_replace() matches php

I have two functions in PHP, trimmer($string,$number) and toUrl($string). I want to trim the urls extracted with toUrl(), to 20 characters for example. from https://www.youtube.com/watch?v=HU3GZTNIZ6M to https://www.youtube.com/wa...
function trimmer($string,$number) {
    $string = substr($string, 0, $number);
    return $string."...";
}
function toUrl($string) {
    $regex="/[^\W ]+[^\s]+[.]+[^\" ]+[^\W ]+/i";
    $string= preg_replace($regex, "<a href='\\0'>".trimmer("\\0",20)."</a>",$string);
    return $string;
}
The problem is that the match is available only as \\0, not as a variable like $url that could easily be trimmed with trimmer().
The question is: how do I apply substr() to \\0, something like substr("\\0",0,20)?
What you want is preg_replace_callback:
function _toUrl_callback($m) {
    return "<a href='" . $m[0] . "'>" . trimmer($m[0], 20) . "</a>";
}
function toUrl($string) {
    $regex = "/[^\W ]+[^\s]+[.]+[^\" ]+[^\W ]+/i";
    $string = preg_replace_callback($regex, "_toUrl_callback", $string);
    return $string;
}
Also note (side notes on your question):
- You have an error: '$regex' is not going to work, because variable names are not replaced in single-quoted strings.
- You may want to look for better regexps to match URLs; you'll find plenty of them with a quick search.
- You may want to run your matches through htmlspecialchars() (the main problem is "&", but that depends on how you escape the rest of the string).
EDIT: Made it more PHP 4 friendly, requested by the asker.
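The htmlspecialchars() side note could look roughly like the sketch below. The names shortenForDisplay and linkifyCallback are hypothetical, and the UTF-8 charset is an assumption. The match is truncated before it is escaped, so an entity such as &amp;amp; is never cut in half.

```php
<?php
// Hypothetical helper: shorten a string and add an ellipsis only if it was cut.
function shortenForDisplay(string $text, int $max): string
{
    return strlen($text) > $max ? substr($text, 0, $max) . '...' : $text;
}

// Hypothetical callback: escape both the href value and the visible label.
function linkifyCallback(array $m): string
{
    $href  = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
    $label = htmlspecialchars(shortenForDisplay($m[0], 20), ENT_QUOTES, 'UTF-8');
    return "<a href='" . $href . "'>" . $label . "</a>";
}

$regex = "/[^\W ]+[^\s]+[.]+[^\" ]+[^\W ]+/i"; // pattern taken from the question
$html  = preg_replace_callback($regex, 'linkifyCallback', 'See https://example.com/watch?v=1&t=2 now');
echo $html, "\n";
```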

Is it right to use preg_match to search for words that contain specific letters?

I have a group of letters, for example :
$word='estroaroint';
that can be arranged to be words like :
- store
- train
- restoration
- ...etc
They can be found in my file list 'dictionary.txt'.
Each letter in the group can be used only once.
How can I write a PHP script that does that?
I would try to manage it with this function: strpbrk() http://php.net/manual/en/function.strpbrk.php
It isn't really possible to do that in one step with a regex. However, it is possible to do it in two steps:
- the first step finds all the words in the dictionary that contain only the given letters.
- the second step filters out the words that use a letter more often than it occurs in $word.
Example (only for the ASCII range):
$pattern = '~\b[' . $word . ']{1,' . strlen($word) . '}+\b~';
if (preg_match_all($pattern, $dictionary, $m)) {
    $chars = count_chars($word, 1);
    $result = array_filter($m[0], function ($i) use ($chars) {
        foreach (count_chars($i, 1) as $k => $v) {
            if ($v > $chars[$k]) return false;
        }
        return true;
    });
    print_r($result);
}
PHP links: array_filter - count_chars
Note: to extend this script to multibyte characters, you need to write your own function mb_count_chars (since this function doesn't exist) that splits a multibyte string (you can use, for example, mb_substr, mb_strlen and a loop, or preg_split with ~(?=.)~u and the PREG_SPLIT_NO_EMPTY option). You also need to add the u modifier to the regex pattern and to change strlen to its multibyte equivalent.
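The helper described in the note might be sketched as follows. The function names mbCountChars and canBuild are hypothetical, the input is assumed to be valid UTF-8, and the split uses the empty pattern //u, a variant of the preg_split approach mentioned above.

```php
<?php
// Hypothetical stand-in for the missing mb_count_chars(): maps each
// character of a UTF-8 string to the number of times it occurs.
function mbCountChars(string $s): array
{
    // With the u modifier the empty pattern splits between code points;
    // preg_split() returns false on invalid UTF-8, hence the fallback.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) ?: array();
    return array_count_values($chars);
}

// Second step of the approach: true if $candidate uses no character
// more often than it occurs in $letters.
function canBuild(string $candidate, string $letters): bool
{
    $available = mbCountChars($letters);
    foreach (mbCountChars($candidate) as $char => $count) {
        if ($count > ($available[$char] ?? 0)) {
            return false;
        }
    }
    return true;
}

var_dump(canBuild('store', 'estroaroint'));  // bool(true)
var_dump(canBuild('tree', 'estroaroint'));   // bool(false), only one "e" available
```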

Optimize ucallwords function [duplicate]

The ucwords function in PHP doesn't consider non-whitespace to be word boundaries. So, if I ucwords this-that, I get This-that. What I want is all words capitalized, such as This-That.
This is a straightforward function to do so. Anyone have suggestions to improve the runtime?
function ucallwords($s)
{
    $s = strtolower($s); // Just in case it isn't lowercased yet.
    $t = '';
    // Set t = only letters in s (spaces for all other characters)
    for($i=0; $i<strlen($s); $i++)
        if($s[$i]<'a' || $s[$i]>'z') $t.= ' ';
        else $t.= $s[$i];
    $t = ucwords($t);
    // Put the non-letter characters back in t
    for($i=0; $i<strlen($s); $i++)
        if($s[$i]<'a' || $s[$i]>'z') $t[$i] = $s[$i];
    return $t;
}
My gut feeling is that this could be done in a regular expression, but every time I start working on it, it gets complicated and I end up having to work on other things. I forget what I was doing and I have to start over. What I'd really like to hear is that PHP already has a good ucallwords function that I can use instead.
Taken directly from ucwords manual:
By jmarois at ca dot ibm dot com
<?php
//FUNCTION
function ucname($string) {
    $string = ucwords(strtolower($string));
    foreach (array('-', '\'') as $delimiter) {
        if (strpos($string, $delimiter) !== false) {
            $string = implode($delimiter, array_map('ucfirst', explode($delimiter, $string)));
        }
    }
    return $string;
}
?>
<?php
//TEST
$names = array(
    'JEAN-LUC PICARD',
    'MILES O\'BRIEN',
    'WILLIAM RIKER',
    'geordi la forge',
    'bEvErly CRuSHeR'
);
foreach ($names as $name) { print ucname("{$name}\n"); }
//PRINTS:
/*
Jean-Luc Picard
Miles O'Brien
William Riker
Geordi La Forge
Beverly Crusher
*/
?>
You can add more delimiters in the for-each loop array if you want to handle more characters.
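On PHP 5.4.32 / 5.5.16 and later, ucwords() itself accepts an optional second argument listing the word delimiters, so a wrapper function may not be needed. A small sketch, where the delimiter set is an assumption to adjust as required:

```php
<?php
// Space, hyphen and apostrophe as word delimiters (assumed set).
// Passing this argument replaces the default whitespace delimiters.
$delimiters = " -'";

$picard   = ucwords(strtolower('JEAN-LUC PICARD'), $delimiters);
$obrien   = ucwords(strtolower("MILES O'BRIEN"), $delimiters);
$thisThat = ucwords('this-that', $delimiters);

echo $picard, "\n";    // Jean-Luc Picard
echo $obrien, "\n";    // Miles O'Brien
echo $thisThat, "\n";  // This-That
```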
A regular expression is easy for this:
$s = 'this-that'; //Original string to uppercase.
$r = preg_replace('/(^|[^a-z])[a-z]/e', 'strtoupper("$0")', $s);
This assumes that $s is lower case. You can use a-zA-Z in the second line to match upper and lower case letters. Alternatively, you can wrap $s in the second line with strtolower($s). Note that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0.
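The same replacement can also be written without the e modifier. A minimal sketch using preg_replace_callback with an anonymous function (PHP 5.3+), keeping the pattern from the answer:

```php
<?php
$s = 'this-that'; // original string, assumed to be lower case

// Uppercase every letter that starts the string or follows a non-letter.
$r = preg_replace_callback(
    '/(^|[^a-z])[a-z]/',
    function ($m) {
        // $m[0] is the whole match, e.g. "t" or "-t"
        return strtoupper($m[0]);
    },
    $s
);

echo $r, "\n"; // This-That
```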

preg_replace using pattern as index of replacement data array

I would like to know if there is a simple way to use the matched pattern in a preg_replace as an index for the replacement value array.
e.g.
preg_replace("/\{[a-z_]*\}/i", "{$data_array[\1]}", $string);
Search for {xxx} and replace it with the value in $data_array['xxx'], where xxx is a pattern.
But this expression does not work, as it's invalid PHP.
I have written the following function, but I'd like to know if it is possible to do it simply. I could use a callback, but how would I pass the $data_array to it too?
function mailmerge($string, $data_array, $tags='{}')
{
    $tag_start = $tags[0];
    $tag_end   = $tags[1];
    if( (!stristr($string, $tag_start)) && (!stristr($string, $tag_end)) ) return $string;
    $patterns = array();
    // foreach instead of each(), which was removed in PHP 8
    foreach ($data_array as $key => $value)
    {
        $patterns[$key] = "/".preg_quote($tag_start.$key.$tag_end, "/")."/";
    }
    ksort($patterns);
    ksort($data_array);
    return preg_replace($patterns, $data_array, $string);
}
Off the top of my head:
preg_replace_callback("/\{([a-z_]*)\}/i", function($m) use($data_array){
    return $data_array[$m[1]];
}, $string);
Note: The above function requires PHP 5.3+.
Associative array replacement that keeps matched fragments when no replacement is found:
$words=array("_saudation_"=>"Hello", "_animal_"=>"cat", "_animal_sound_"=>"MEooow");
$source=" _saudation_! My Animal is a _animal_ and it says _animal_sound_ , _no_match_";
echo preg_replace_callback("/\b_(\w*)_\b/", function($match) use ($words) {
    if (isset($words[$match[0]])) {
        return $words[$match[0]];
    } else {
        return $match[0];
    }
}, $source);
//returns: Hello! My Animal is a cat and it says MEooow , _no_match_
Note that although "_no_match_" has no translation, the regex still matches it, but the callback returns it unchanged.
You can use preg_replace_callback and write a function that uses the match as the array index, or you can use the e modifier to evaluate the replacement string (though note that the e modifier is deprecated, so the callback function is the better solution).

Finding string and replacing with same case string

I need help while trying to spin articles. I want to find text and replace synonymous text while keeping the case the same.
For example, I have a dictionary like:
hello|hi|howdy|howd'y
I need to find all hello and replace with any one of hi, howdy, or howd'y.
Assume I have a sentence:
Hello, guys! Shouldn't you say hello to me when I say HELLO?
After my operation it will be something like:
hi, guys! Shouldn't you say howd'y to me when I say howdy?
Here, I lost the case. I want to maintain it! It should actually be:
Hi, guys! Shouldn't you say howd'y to me when I say HOWDY?
My dictionary size is about 5000 lines
hello|hi|howdy|howd'y
go|come
salaries|earnings|wages
shouldn't|should not
...
I'd suggest using preg_replace_callback with a callback function that examines the matched word to see if (a) the first letter is not capitalized, or (b) the first letter is the only capitalized letter, or (c) the first letter is not the only capitalized letter, and then replace with the properly modified replacement word as desired.
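A rough sketch of that callback idea, for ASCII text only. The $synonyms map and the matchCase helper are hypothetical. A real run would build the map from the dictionary file and pick one of several synonyms per word.

```php
<?php
// Hypothetical helper: give $replacement the same case style as $found.
function matchCase(string $found, string $replacement): string
{
    $letters = preg_replace('/[^A-Za-z]/', '', $found);
    if (strlen($letters) > 1 && ctype_upper($letters)) {
        return strtoupper($replacement);  // HELLO -> HOWDY
    }
    if (ctype_upper($found[0])) {
        return ucfirst($replacement);     // Hello -> Howdy
    }
    return $replacement;                  // hello -> howdy
}

// Assumed shape: lowercase word => lowercase synonym.
$synonyms = array('hello' => 'howdy');

// Build one case-insensitive alternation from the dictionary keys.
$quoted  = array_map(function ($w) { return preg_quote($w, '/'); }, array_keys($synonyms));
$pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

$text   = "Hello, guys! Shouldn't you say hello to me when I say HELLO?";
$result = preg_replace_callback($pattern, function ($m) use ($synonyms) {
    return matchCase($m[0], $synonyms[strtolower($m[0])]);
}, $text);

echo $result, "\n"; // Howdy, guys! Shouldn't you say howdy to me when I say HOWDY?
```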
You can find your string and do two tests:
$outputString = 'hi';
// Test all-uppercase first: an all-uppercase word also equals its own ucfirst()
if ( $foundString == strtoupper($foundString) ) {
    $outputString = strtoupper($outputString);
} else if ( $foundString == ucfirst($foundString) ) {
    $outputString = ucfirst($outputString);
} else {
    // do not modify string's case
}
Here's a solution for retaining the case (upper, lower or capitalized):
// Assumes $replace is already lowercase
function convertCase($find, $replace) {
    if (ctype_upper($find) === true)
        return strtoupper($replace);
    else if (ctype_upper($find[0]) === true)
        return ucfirst($replace);
    else
        return $replace;
}
$find = 'hello';
$replace = 'hi';
// Find the word in all cases that it occurs in
while (($pos = stripos($input, $find)) !== false) {
    // Extract the word in its current case
    $found = substr($input, $pos, strlen($find));
    // Replace all occurrences of this case
    $input = str_replace($found, convertCase($found, $replace), $input);
}
You could try the following function. Be aware that it will only work with ASCII strings, as it uses some of the useful properties of ASCII upper and lower case letters. However, it should be extremely fast:
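function preserve_case($old, $new) {
    $mask = strtoupper($old) ^ $old;
    // Clamp to zero so a replacement shorter than the original never passes
    // a negative count to str_repeat(), and cut the mask to the new length
    $pad = max(0, strlen($new) - strlen($old));
    return strtoupper($new) | substr($mask, 0, strlen($new)) . str_repeat(substr($mask, -1), $pad);
}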
echo preserve_case('Upper', 'lowercase');
// Lowercase
echo preserve_case('HELLO', 'howdy');
// HOWDY
echo preserve_case('lower case', 'UPPER CASE');
// upper case
echo preserve_case('HELLO', "howd'y");
// HOWD'Y
This is my PHP version of the clever little Perl function:
How do I substitute case insensitively on the LHS while preserving case on the RHS?
