Optimize ucallwords function [duplicate] - php

This question already has answers here:
Make all words lowercase and the first letter of each word uppercase
(3 answers)
Closed 1 year ago.
The ucwords function in PHP doesn't consider non-whitespace to be word boundaries. So, if I ucwords this-that, I get This-that. What I want is all words capitalized, such as This-That.
This is a straightforward function to do so. Anyone have suggestions to improve the runtime?
function ucallwords($s)
{
    $s = strtolower($s); // Just in case it isn't lowercased yet.
    $t = '';
    // Set t = only letters in s (spaces for all other characters)
    for ($i = 0; $i < strlen($s); $i++)
        if ($s[$i] < 'a' || $s[$i] > 'z') $t .= ' ';
        else $t .= $s[$i];
    $t = ucwords($t);
    // Put the non-letter characters back in t
    for ($i = 0; $i < strlen($s); $i++)
        if ($s[$i] < 'a' || $s[$i] > 'z') $t[$i] = $s[$i];
    return $t;
}
My gut feeling is that this could be done in a regular expression, but every time I start working on it, it gets complicated and I end up having to work on other things. I forget what I was doing and I have to start over. What I'd really like to hear is that PHP already has a good ucallwords function that I can use instead.

Taken directly from ucwords manual:
By jmarois at ca dot ibm dot com
<?php
//FUNCTION
function ucname($string) {
$string =ucwords(strtolower($string));
foreach (array('-', '\'') as $delimiter) {
if (strpos($string, $delimiter)!==false) {
$string =implode($delimiter, array_map('ucfirst', explode($delimiter, $string)));
}
}
return $string;
}
?>
<?php
//TEST
$names =array(
'JEAN-LUC PICARD',
'MILES O\'BRIEN',
'WILLIAM RIKER',
'geordi la forge',
'bEvErly CRuSHeR'
);
foreach ($names as $name) { print ucname("{$name}\n"); }
//PRINTS:
/*
Jean-Luc Picard
Miles O'Brien
William Riker
Geordi La Forge
Beverly Crusher
*/
?>
You can add more delimiters in the for-each loop array if you want to handle more characters.
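The delimiter idea can also be handed to ucwords itself: since PHP 5.4.32 / 5.5.16 it accepts an optional second argument listing the characters that start a new word. The following is a minimal sketch; the helper name ucallwords_delims and its default delimiter list are assumptions made for illustration, not part of the manual comment above.

```php
<?php
// Hypothetical helper: lowercases the input, then capitalizes the first
// letter after every character listed in $delimiters.
function ucallwords_delims(string $s, string $delimiters = " -'"): string
{
    return ucwords(strtolower($s), $delimiters);
}

echo ucallwords_delims('this-that'), "\n";       // This-That
echo ucallwords_delims('JEAN-LUC PICARD'), "\n"; // Jean-Luc Picard
echo ucallwords_delims("miles o'brien"), "\n";   // Miles O'Brien
```

Passing the delimiters as a parameter keeps the list in one place, so handling another separator only means adding one character to the string.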

A regular expression is easy for this:
$s = 'this-that'; //Original string to uppercase.
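// The /e modifier was removed in PHP 7.0, so the uppercasing is done in a callback.
$r = preg_replace_callback('/(^|[^a-z])[a-z]/', function ($m) { return strtoupper($m[0]); }, $s);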
This assumes that $s is lower case. You can use a-zA-Z in the second line to match upper and lower case letters. Alternately, you can wrap $s in the second line with strtolower($s).

Related

How to check characters alternatively and replace it with Y if it is X?

I have a string, something like this:
$str ="it is a test string.";
// for more clarification
//  i  t     i  s     a     t  e  s  t     s  t  r  i  n  g  .
//  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Now I need to check every character whose position is a multiple of 4 (plus the first character), like these:
1 => i
4 => i
8 => [space]
12 => t
16 => r
20 => .
Next, I need to compare each of them with Y (a variable holding a character; here Y = 'r') and, wherever one matches, replace it with X (another variable; here X = 'm').
So, I want this output:
it is a test stming.
Here is my solution. I can do that using some PHP functions:
strlen($str): to count the number of characters (named $sum)
$sum / 4: To find characters that are multiples of 4
substr($str, 4,1): to select specific character (named $char) {the problem is here}
if ($char == 'r') {}: to compare
str_replace('r','m',$char): to replace
And then combining all $char to each other.
But my solution has two problems:
substr() does not count the [space] character (as I mentioned above)
Combining the characters is a bit complicated. (It needs some wasted processing.)
Well, is there any solution? I'd like to do this with a regex. Is that possible?
Could just use a simple regex with callback (add u flag if utf-8, s for . to match newline).
$str = preg_replace_callback(['/^./', '/.{3}\K./'], function ($m) {
return $m[0] == "r" ? "m" : $m[0];
}, $str); echo $str;
See this demo at tio.run > it is a test stming.
1st pattern: ^. any first character
2nd pattern: \K resets after .{3} any three characters, only want to check the fourth .
For use with anonymous function PHP >= 5.3 is required. Here is the workaround (demo).
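As a sketch of how the two-pattern callback could be generalized to the question's Y and X variables, the function below passes them into the closure with use. The function name replace_every_fourth is hypothetical, and the su flags are added on the assumption that the input is UTF-8 and may contain newlines.

```php
<?php
// Hypothetical helper: replaces $y with $x at position 1 and at every
// position that is a multiple of 4 (positions are 1-based).
function replace_every_fourth(string $str, string $y, string $x): string
{
    return preg_replace_callback(
        ['/^./su', '/.{3}\K./su'],
        function (array $m) use ($y, $x): string {
            // $m[0] is the single character being checked.
            return $m[0] === $y ? $x : $m[0];
        },
        $str
    );
}

echo replace_every_fourth('it is a test string.', 'r', 'm'), "\n"; // it is a test stming.
```

Because the comparison happens inside the callback, neither $y nor $x has to be escaped for use in a pattern.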
Update: @Mariano demonstrated in his very nice answer that this is even possible with a single regex replacement. Thank you for the benchmark, which reveals rather bad performance for the preg_replace_callback solution. Here is a more efficient variant without a callback (but still with two patterns).
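// \G keeps each attempt aligned to the 4-character grid; without it the engine may retry from an unaligned offset.
$str = preg_replace(['/^r/', '/\G(?:...[^r])*...\Kr/'], 'm', $str);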
I also included @revo's answer from 2017 in Mariano's benchmark and ran it on tio.run (100k loops). With newer PHP and PCRE2 the numbers seem to have changed slightly: "no regex" leads at tio.run.
In .NET or modern browser JS regex it also could be done like this by a variable length lookbehind.
If all characters in your string are single-byte, you can use the string functions from PHP's official language reference:
$str ="it is a test string.";
$y="r";
$x="m";
$len=strlen($str);
if($str[0]==$y)
{
$str=substr_replace($str,$x,0,1);
}
if($len>=3)
{
for($i=3;$i<$len;$i+=4)
{
if($str[$i]==$y)
{
$str=substr_replace($str,$x,$i,1);
}
}
}
var_dump($str);
3v4l demo
Outputs it is a test stming.
Edit:
As @Don'tPanic points out, a string is mutable through the [] operator, so instead of using
$str=substr_replace($str,$x,$i,1);
you can just use
$str[$i]=$x;
This is an alternative using preg_replace()
$y = 'r';
$y = preg_quote($y, '/');
$x = 'M'; // replacement text, not a pattern, so it is not passed through preg_quote()
$subject = 'rrrrrr rrrrr rrrrrr rrrr rrrr.';
$regex = "/\\G(?:^|(?(?<!^.).)..(?:.{4})*?)\\K$y/s";
$result = preg_replace($regex, $x, $subject);
echo $result;
// => MrrMrr MrrrM rrMrrr rrrM rrMr.
ideone demo
Regex:
\G(?:^|(?(?<!^.).)..(?:.{4})*?)\Kr
\G is an assertion to the end of last match (or start of string)
(?:^|(?(?<!^.).)..(?:.{4})*?) matches:
^ start of string, to check at position 1
(?(?<!^.).) is an if clause that yields:
..(?:.{4})*? 2 chars + a multiple of 4 if it has just replaced at position 1
...(?:.{4})*? 3 chars + a multiple of 4 for successive matches
\K resets the text matched to avoid using backreferences
I must say, though, that regex is overkill for this task. This code is counterintuitive, a typical regex that proves difficult to understand, debug and maintain.
EDIT. There was a later discussion about performance vs. code readability, so I did a benchmark to compare:
RegEx with a callback (@bobblebubble's answer).
RegEx with 2 replacements in an array (@bobblebubble's suggestion in comment).
No RegEx with substr_replace (@Passerby's answer).
Pure RegEx (this answer).
Result:
Code #1(with_callback): 0.548 secs/50k loops
Code #2(regex_array): 0.158 secs/50k loops
Code #3(no_regex): 0.120 secs/50k loops
Code #4(pure_regex): 0.118 secs/50k loops
Benchmark in ideone.com
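For readers who want to repeat such a comparison, here is a rough sketch of a timing loop. The helper time_loops and the two candidates shown are assumptions for illustration; this is not the benchmark code behind the numbers above, and timings will differ by machine and PHP version.

```php
<?php
// Hypothetical timing helper: runs $fn $loops times and returns elapsed seconds.
function time_loops(callable $fn, int $loops = 50000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $loops; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$str = 'it is a test string.';

$candidates = [
    // Two patterns with a callback.
    'with_callback' => function () use ($str) {
        return preg_replace_callback(['/^./', '/.{3}\K./'], function ($m) {
            return $m[0] == 'r' ? 'm' : $m[0];
        }, $str);
    },
    // Plain string offsets, no regex (single-byte strings only).
    'no_regex' => function () use ($str) {
        $s = $str;
        if ($s[0] == 'r') $s[0] = 'm';
        for ($i = 3; isset($s[$i]); $i += 4) {
            if ($s[$i] == 'r') $s[$i] = 'm';
        }
        return $s;
    },
];

foreach ($candidates as $name => $fn) {
    printf("%s: %.3f secs/50k loops\n", $name, time_loops($fn));
}
```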
Try this
$str ="it is a test string.";
$y="r";
$x="m";
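$split_array = str_split($str);
foreach ($split_array as $key => $val)
{
    // Keys are zero-based: position 1 is key 0; positions 4, 8, 12, ... are keys 3, 7, 11, ...
    if (($key == 0 || ($key + 1) % 4 == 0) && $val == $y)
    {
        $split_array[$key] = $x;
    }
}
$your_new_string = implode($split_array);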
This piece of code could help you on your way:
// Define variables
$string = "it is a test string.";
$y = 'r';
$x = 'm';
// Convert string to array (explode() does not accept an empty separator)
$chars = str_split($string);
// Loop through all characters
foreach ($chars as $key => $char) {
    // Array keys start at 0, so we add 1
    $keyCount = $key + 1;
    // Check the first character and every position that divides by 4 without a remainder
    if (($keyCount == 1 || $keyCount % 4 == 0) && $char == $y) {
        $chars[$key] = $x;
    }
}
// Convert back to string
$string = implode($chars);
Here is one other way to do this using string access and modification by character. (Consequently, it is only useful for single-byte encoded strings.)
// First character handled outside the loop because its index doesn't match the pattern
if ($str[0] == $y) $str[0] = $x;
// access every fourth character
for ($i=3; isset($str[$i]) ; $i+=4) {
// change it if it needs to be changed
if ($str[$i] == $y) $str[$i] = $x;
}
This modifies the original string rather than creating a new string, so if that shouldn't happen, it should be used on a copy.
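Since byte offsets only line up with characters in single-byte encodings, a multi-byte variant has to split the string into characters first. The sketch below assumes UTF-8 input, the mbstring extension and PHP 7.4+ (for mb_str_split); the function name is hypothetical.

```php
<?php
// Hypothetical helper: same rule as above (position 1 and every 4th position),
// but counted in UTF-8 characters instead of bytes.
function replace_every_fourth_mb(string $str, string $y, string $x): string
{
    $chars = mb_str_split($str, 1, 'UTF-8');
    foreach ($chars as $i => $char) {
        $position = $i + 1; // positions are 1-based
        if (($position === 1 || $position % 4 === 0) && $char === $y) {
            $chars[$i] = $x;
        }
    }
    return implode('', $chars);
}

echo replace_every_fourth_mb('äbcädefä', 'ä', 'a'), "\n"; // abcadefa
```

Unlike the in-place version, this returns a new string and leaves the original untouched.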
Late to the party. Putting aside the \G anchor, I'd go with the (*SKIP)(*F) method:
$str = "it is a test string.";
echo preg_replace(['~\Ar~', '~.{3}\K(?>r|.(*SKIP)(?!))~'], 'm', $str);
Short and clean.
PHP live demo

Replace string between two slashes [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 2 years ago.
I have to modify a URL like this:
$string = "/st:1/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30";
Namely, I want to delete st:1 with a regex. I used:
preg_replace("/\/st:(.*)\//",'',$string)
but I got
end:2015-07-30
while I would like to get:
/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30
Same if I would like to delete fp:1.
You can use:
$string = preg_replace('~/st:[^/]*~','',$string);
[^/]* will only match till next /
You are using greedy matching with . that matches any character.
Use a more restricted pattern:
preg_replace("/\/st:[^\/]*/",'',$string)
The [^\/]* negated character class only matches 0 or more characters other than /.
Another solution would be to use lazy matching with the *? quantifier, but it is not as efficient as the negated character class.
FULL REGEX EXPLANATION:
\/st: - literal /st:
[^\/]* - 0 or more characters other than /.
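To make the negated character class reusable for any segment (st, fp and so on), it can be wrapped in a small function. This is a sketch only; the name removeSegment is made up for illustration, and it assumes keys and values never contain a slash.

```php
<?php
// Hypothetical helper: removes the "/key:value" segment for $key from $path.
function removeSegment(string $path, string $key): string
{
    // [^/]* stops at the next slash, so only one segment is removed.
    $pattern = '~/' . preg_quote($key, '~') . ':[^/]*~';
    return preg_replace($pattern, '', $path);
}

$string = "/st:1/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30";

echo removeSegment($string, 'st'), "\n";
// /sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30
echo removeSegment($string, 'fp'), "\n";
// /st:1/sc:RsrlYQhSQvs=/g:3/start:2015-07-01/end:2015-07-30
```

Because the pattern includes the colon, removing st does not touch the start segment.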
You need to add ? to your regex to make the quantifier lazy, and put back the slash that the match consumes:
<?php
$string = "/st:1/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30";
echo preg_replace("/\/st:(.*?)\//", '/', $string);
?>
Output: https://eval.in/397658
You can remove other segments, such as fp:1, in the same way.
Instead of using regex here, you should write parsing utility functions for your special format string. They are simple, they don't take too long to write, and they will make your life a lot easier:
function readPath($path) {
$parameters = array();
foreach(explode('/', $path) as $piece) {
// Here we make sure we have something
if ($piece == "") {
continue;
}
// This here is just a fancy way of splitting the array returned
// into two variables.
list($key, $value) = explode(':', $piece);
$parameters[$key] = $value;
}
return $parameters;
}
function writePath($parameters) {
$path = "";
foreach($parameters as $key => $value) {
$path .= "/" . implode(":", array($key, $value));
}
return $path;
}
Now you can just work on it as a php array, in this case you would go:
$parameters = readPath($string);
unset($parameters['st']);
$string = writePath($parameters);
This makes for much more readable and reusable code, additionally since most of the time you are dealing with only slight variations of this format you can just change the delimiters each time or even abstract these functions to using different delimiters.
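One way the delimiter abstraction mentioned above might look is sketched here. The names readPathWith and writePathWith and their default separators are assumptions for illustration; they mirror readPath and writePath but take the separators as parameters.

```php
<?php
// Hypothetical variant of readPath() with configurable separators.
function readPathWith(string $path, string $pairSep = '/', string $kvSep = ':'): array
{
    $parameters = [];
    foreach (explode($pairSep, $path) as $piece) {
        if ($piece === '') {
            continue; // skip the empty piece before the leading separator
        }
        // A limit of 2 keeps any later $kvSep characters inside the value.
        $parts = explode($kvSep, $piece, 2);
        $parameters[$parts[0]] = $parts[1] ?? '';
    }
    return $parameters;
}

// Hypothetical variant of writePath() with configurable separators.
function writePathWith(array $parameters, string $pairSep = '/', string $kvSep = ':'): string
{
    $path = '';
    foreach ($parameters as $key => $value) {
        $path .= $pairSep . $key . $kvSep . $value;
    }
    return $path;
}

$string = "/st:1/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30";
$parameters = readPathWith($string);
unset($parameters['st']);
echo writePathWith($parameters), "\n";
// /sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30
```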
Another way to deal with this is to convert the string into a normal query string, using something like:
function readPath($path) {
    // parse_str() returns nothing; it writes the parsed pairs into its second argument.
    parse_str(strtr($path, "/:", "&="), $parameters);
    return $parameters;
}
In your case, though, since you are using the "=" character in a URL, you would also need to URL-encode each value so that it does not conflict with the format. This would involve code structured similarly to the above.

Uppercasing first letters of words using preg_replace

I need to turn names that are always in lower case into uppercase.
e.g. john johnsson -> John Johnsson
but also:
jonny-bart johnsson -> Jonny-Bart Johnsson
How do I accomplish this using PHP?
You could also use a regular expression:
preg_replace_callback('/\b\p{Ll}/u', 'callback', $str)
\b represents a word boundary and \p{Ll} describes any lowercase letter in Unicode; the u flag makes the pattern treat $str as UTF-8. preg_replace_callback will call a function called callback for each match and replace the match with its return value:
function callback($match) {
return mb_strtoupper($match[0]);
}
Here mb_strtoupper is used to turn the matched lowercase letter to uppercase.
If you're expecting unicode characters...or even if you're not, I recommend using mb_convert_case nonetheless. You shouldn't need to use preg_replace when there's a php function for this.
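A minimal sketch of the mb_convert_case approach follows, assuming UTF-8 input and the mbstring extension. How title-casing treats punctuation such as hyphens and apostrophes depends on the PHP version, so it is worth checking the output against your own names before relying on it.

```php
<?php
// Sample names are illustrative only.
$names = ['john johnsson', 'jonny-bart johnsson', 'émile zola'];

foreach ($names as $name) {
    // MB_CASE_TITLE uppercases the first letter of each word and lowercases the rest.
    echo mb_convert_case($name, MB_CASE_TITLE, 'UTF-8'), "\n";
}
```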
<?php
//FUNCTION
function ucname($string) {
$string =ucwords(strtolower($string));
foreach (array('-', '\'') as $delimiter) {
if (strpos($string, $delimiter)!==false) {
$string =implode($delimiter, array_map('ucfirst', explode($delimiter, $string)));
}
}
return $string;
}
?>
<?php
//TEST
$names =array(
'JEAN-LUC PICARD',
'MILES O\'BRIEN',
'WILLIAM RIKER',
'geordi la forge',
'bEvErly CRuSHeR'
);
foreach ($names as $name) { print ucname("{$name}\n"); }
//PRINTS:
/*
Jean-Luc Picard
Miles O'Brien
William Riker
Geordi La Forge
Beverly Crusher
*/
?>
From comments on the PHP manual entry for ucwords.
with regexps:
$out = preg_replace_callback("/[a-z]+/i",'ucfirst_match',$in);
function ucfirst_match($match)
{
return ucfirst(strtolower($match[0]));
}
Here's what I came up with (tested)...
$chars="'";//characters other than space and dash
//after which letters should be capitalized
function callback($matches){
return $matches[1].strtoupper($matches[2]);
}
$name="john doe";
$name=preg_replace_callback('/(^|[ \-'.$chars.'])([a-z])/',"callback",$name);
Or if you have php 5.3+ this is probably better (untested):
function capitalizeName($name,$chars="'"){
return preg_replace_callback('/(^|[ \-'.$chars.'])([a-z])/',
function($matches){
return $matches[1].strtoupper($matches[2]);
},$name);
}
My solution is a bit more verbose than some of the others posted, but I believe it offers the best flexibility (you can modify the $chars string to change which characters can separate names).

Finding string and replacing with same case string

I need help while trying to spin articles. I want to find text and replace synonymous text while keeping the case the same.
For example, I have a dictionary like:
hello|hi|howdy|howd'y
I need to find all hello and replace with any one of hi, howdy, or howd'y.
Assume I have a sentence:
Hello, guys! Shouldn't you say hello to me when I say HELLO?
After my operation it will be something like:
hi, guys! Shouldn't you say howd'y to me when I say howdy?
Here, I lost the case. I want to maintain it! It should actually be:
Hi, guys! Shouldn't you say howd'y to me when I say HOWDY?
My dictionary size is about 5000 lines
hello|hi|howdy|howd'y
go|come
salaries|earnings|wages
shouldn't|should not
...
I'd suggest using preg_replace_callback with a callback function that examines the matched word to see if (a) the first letter is not capitalized, or (b) the first letter is the only capitalized letter, or (c) the first letter is not the only capitalized letter, and then replace with the properly modified replacement word as desired.
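The suggestion above could be sketched roughly as follows. The function names match_case and replace_preserving_case are hypothetical, the replacement word is assumed to be stored in lowercase, and only ASCII letters are considered.

```php
<?php
// Hypothetical helper: gives $replacement the same case style as $found.
function match_case(string $found, string $replacement): string
{
    // All letters uppercase (and at least one letter present): HELLO -> HOWDY
    if ($found === strtoupper($found) && $found !== strtolower($found)) {
        return strtoupper($replacement);
    }
    // Only the first letter uppercase: Hello -> Howdy
    if ($found === ucfirst(strtolower($found))) {
        return ucfirst($replacement);
    }
    // Anything else is left in lowercase: hello -> howdy
    return $replacement;
}

// Hypothetical helper: replaces whole-word, case-insensitive matches of $word.
function replace_preserving_case(string $text, string $word, string $replacement): string
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_replace_callback($pattern, function (array $m) use ($replacement): string {
        return match_case($m[0], $replacement);
    }, $text);
}

echo replace_preserving_case('Hello, guys! Say hello when I say HELLO.', 'hello', 'howdy'), "\n";
// Howdy, guys! Say howdy when I say HOWDY.
```

Picking a synonym at random from a dictionary line would only change how $replacement is chosen before the call; the case handling stays the same.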
You can find your string and do two tests:
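$outputString = 'hi';
// Test for all-uppercase first: ucfirst() leaves an all-uppercase word unchanged,
// so the ucfirst() comparison would also be true for "HELLO".
if ( $foundString == strtoupper($foundString) ) {
    $outputString = strtoupper($outputString);
} else if ( $foundString == ucfirst($foundString) ) {
    $outputString = ucfirst($outputString);
} else {
    // do not modify string's case
}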
Here's a solution for retaining the case (upper, lower or capitalized):
// Assumes $replace is already lowercase
function convertCase($find, $replace) {
if (ctype_upper($find) === true)
return strtoupper($replace);
else if (ctype_upper($find[0]) === true)
return ucfirst($replace);
else
return $replace;
}
$find = 'hello';
$replace = 'hi';
// Find the word in all cases that it occurs in
while (($pos = stripos($input, $find)) !== false) {
// Extract the word in its current case
$found = substr($input, $pos, strlen($find));
// Replace all occurrences of this case
$input = str_replace($found, convertCase($found, $replace), $input);
}
You could try the following function. Be aware that it will only work with ASCII strings, as it uses some of the useful properties of ASCII upper and lower case letters. However, it should be extremely fast:
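function preserve_case($old, $new) {
    // A set 0x20 bit in the mask marks a lowercase letter in $old.
    $mask = strtoupper($old) ^ $old;
    // Pad the mask with its last byte when $new is longer than $old ...
    $mask .= str_repeat(substr($mask, -1), max(0, strlen($new) - strlen($old)));
    // ... and trim it when $new is shorter, so the result has the length of $new.
    return strtoupper($new) | substr($mask, 0, strlen($new));
}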
echo preserve_case('Upper', 'lowercase');
// Lowercase
echo preserve_case('HELLO', 'howdy');
// HOWDY
echo preserve_case('lower case', 'UPPER CASE');
// upper case
echo preserve_case('HELLO', "howd'y");
// HOWD'Y
This is my PHP version of the clever little perl function:
How do I substitute case insensitively on the LHS while preserving case on the RHS?

preg_replace: wildcards do not match umlaut-characters

I want to filter a string using the \w wildcard, but unfortunately it does not cover umlauts.
$i = "Die Höhe";
$x = preg_replace("/[^\w\s]/","",$i);
echo $x; // "Die Hhe";
I can add all the characters to preg_replace, but this is not very elegant, since the list will become very long. At the moment I am preparing this only for German, but there are more languages to come.
$i = "Die Höhe";
$x = preg_replace("/[^\w\säöüÄÖÜß]/","",$i);
echo $x; // "Die Höhe";
Is there a way to match all of them at once?
Your strings are obviously UTF-8, so you want the 'u' flag and Unicode properties instead of \w:
$x = preg_replace('/[^\p{L}\p{N} ]/u',"",$i);
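As a sketch of how this might be wrapped for reuse, the function below keeps letters and digits from any script plus whitespace. The function name is an assumption, the source file and input are assumed to be UTF-8, and \s is used instead of a literal space to stay close to the pattern in the question.

```php
<?php
// Hypothetical helper: keeps letters (\p{L}), digits (\p{N}) and whitespace.
function keep_letters_digits_spaces(string $text): string
{
    $result = preg_replace('/[^\p{L}\p{N}\s]/u', '', $text);
    // preg_replace() returns null on failure, e.g. when $text is not valid UTF-8.
    return $result ?? $text;
}

echo keep_letters_digits_spaces("Die Höhe: 3m!"), "\n"; // Die Höhe 3m
```

Without the u flag the pattern works on bytes, which is why the two bytes of "ö" were stripped in the original attempt.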
This should remove all characters that are, in my opinion, not meaningful:
$val = "Die Höhe";
$val = preg_replace('/[^\x20-\x7e\xa1-\xff]+/u', '', $val);
echo $val; // "Die Höhe"
