How to make regex for name abbreviation - php

I have original data like this
Original Data:
SRI ISTYANINGSIH
DIANA WREDHININGSIH
ENDANG WAHYU PURWANINGSIH
THERESIA PUDJI ASTUTIE SARI
And I need to show it like this:
View Data:
SRI I.
DIANA W.
ENDANG W. P.
THERESIA P. A. S.
How can I accomplish this using PHP and regular expressions?

This is a simple solution which could easily be turned into a function.
$name = 'THERESIA PUDJI ASTUTIE SARI';
//split the name to a maximum of 2 array values.
list ($first_name, $second_names) = explode(' ', $name, 2);
$second_names = explode(' ', $second_names);
foreach ($second_names as $key => $value) {
$second_names[$key] = $value[0] . '.';
}
echo $first_name . ' ' . implode(' ', $second_names);
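As a rough sketch of what that function could look like (the name abbreviate_name is made up for illustration; it assumes single-byte, space-separated names and returns a single-word name unchanged):

```php
<?php
// Hypothetical helper: keeps the first word and abbreviates every later word.
function abbreviate_name(string $name): string
{
    // Split on runs of spaces; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $parts = preg_split('/ +/', trim($name), -1, PREG_SPLIT_NO_EMPTY);
    if (count($parts) < 2) {
        return trim($name); // nothing to abbreviate
    }
    $first = array_shift($parts);
    // $word[0] is the first byte, so this assumes ASCII names.
    $initials = array_map(function ($word) {
        return $word[0] . '.';
    }, $parts);
    return $first . ' ' . implode(' ', $initials);
}

echo abbreviate_name('THERESIA PUDJI ASTUTIE SARI'), "\n"; // THERESIA P. A. S.
```

The early return avoids the undefined-offset notice that list() raises when explode() finds no space.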

Suppose $var holds the string that you want to convert.
$var = "THERESIA PUDJI ASTUTIE SARI";
$parts = explode(" ", $var);
$str = $parts[0]." ";
for($i=1; $i<count($parts); $i++){
$str .= $parts[$i][0].". ";
}
echo $str; then prints the desired output (with a trailing space, which rtrim() can remove).

I don't know what you mean with "make delimiter" but here is a regular expression based solution:
function shorten_name($name) {
return preg_replace('/ (\w)\w*/', ' $1.', $name);
}
The pattern matches a space followed by one or more "word characters" (\w), i.e. letters (locale aware), digits and underscores, then replaces this sequence with only the space, the first of those characters and a dot.
Possible modifications:
If you only want to match uppercase letters from A-Z like in your example, replace \w with [A-Z].
If you want to match anything that is not a space (i.e. "MÜLLER-RIEBENSEE" => "M."), replace \w with \S (non-whitespace).
If you want separators other than the space, use a character class and capture it in a subpattern as well, for example: preg_replace('/([\s-])(\w)\w*/', '$1$2.', $name) treats any whitespace character \s or the dash - as a separator (i.e. "MÜLLER-RIEBENSEE" => "M.-R.").
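For names with non-ASCII letters such as "MÜLLER", a Unicode-aware variant may be safer, because without the u modifier \w is usually limited to ASCII letters when the input is UTF-8. The following is only a sketch under that assumption; the function name shorten_name_unicode is hypothetical:

```php
<?php
// Hypothetical Unicode-aware variant of shorten_name(); assumes UTF-8 input.
function shorten_name_unicode(string $name): string
{
    // ([\s-]) keeps the separator, (\pL) keeps the first letter,
    // \pL* consumes the remaining letters of the word.
    return preg_replace('/([\s-])(\pL)\pL*/u', '$1$2.', $name);
}

echo shorten_name_unicode('ENDANG WAHYU PURWANINGSIH'), "\n"; // ENDANG W. P.
```

The u modifier makes PCRE treat the pattern and subject as UTF-8, so \pL matches a whole letter rather than a single byte.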

Related

PHP - filter UTF-8 string to allow only basic charset and some punctuation [duplicate]

I want to disallow all symbols in a string, and instead of going and disallowing each one I thought it'd be easier to just allow alphanumeric characters (a-z A-Z 0-9).
How would I go about parsing a string and converting it to one which only has allowed characters? I also want to convert any spaces into _.
At the moment I have:
function parseFilename($name) {
$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
$name = str_replace(' ', '_', $name);
return $name;
}
Thanks
Try
$name = preg_replace("/[^a-zA-Z0-9]/", "", $name);
You could do both replacements at once by using arrays as the find / replace params in preg_replace():
$str = 'abc def+ghi&jkl ...z';
$find = array( '#[\s]+#','#[^\w]+#' );
$replace = array( '_','' );
$newstr = preg_replace( $find,$replace,$str );
print $newstr;
// outputs:
// abc_defghijkl_z
\s matches whitespace (each run is replaced with a single underscore), and as @F.J described, [^\w] is anything "not a word character" (replaced with an empty string).
preg_replace() is the way to go here, the following should do what you want:
function parseFilename($name) {
$name = str_replace(' ', '_', $name);
$name = preg_replace('/[^\w]+/', '', $name);
return $name;
}
[^\w] is equivalent to [^a-zA-Z0-9_], which will match any character that is not alphanumeric or an underscore. The + after it means match one or more, this should be slightly more efficient than replacing each character individually.
The replacement of spaces with underscores does not require the might of the regex engine; str_replace() can handle it before the regex pass.
The purging of all non-alphanumeric characters and underscores is concisely handled by \W -- it means any character not in a-z, A-Z, 0-9, or _.
Code: (Demo)
function sanitizeFilename(string $name): string {
return preg_replace(
'/\W+/',
'',
str_replace(' ', '_', $name)
);
}
echo sanitizeFilename('This/is My     1! FilenAm3');
Output:
Thisis_My_____1_FilenAm3
...but if you want to condense consecutive spaces and replace them with a single underscore, then use regex. (Demo)
function sanitizeFilename(string $name): string {
return preg_replace(
['/ +/', '/\W+/'],
['_', ''],
$name
);
}
echo sanitizeFilename('This/has a Gap !n 1t');
Output:
Thishas_a_Gap_n_1t
Try working with the HTML part
pattern="[A-Za-z]{8}" title="Eight letter country code">

Explode string on second last and last space to create exactly 3 elements

I want to split a space-delimited string by its spaces, but I need the total elements in the result array to be exactly 3 AND if the string has more than two spaces, only the last two spaces should be used as delimiters.
My input strings follow a predictable format. The strings are one or more words, then a word, then a parenthetically wrapped word (word in this context is a substring with no whitespaces in it).
Sample strings:
Stack Over Flow Abcpqr (UR) becomes: ["Stack Over Flow", "Abcpqr", "(UR)"]
Fluency in English Conversation Defklmno (1WIR) becomes: ["Fluency in English Conversation", "Defklmno", "(1WIR)"]
English Proficiency GHI (2WIR) becomes: ["English Proficiency", "GHI", "(2WIR)"]
Testing ADG (3WIR) becomes: ["Testing", "ADG", "(3WIR)"]
I used the following code, but it is only good for Testing (3WIR).
$Original = $row['fld_example'];
$OriginalExplode = explode(' ', $Original);
<input name="example0" id="example0" value="<?php echo $OriginalExplode[0]; ?>" type="text" autocomplete="off" required>
<input name="example1" id="example1" value="<?php echo $OriginalExplode[1]; ?>" type="text" autocomplete="off" required>
Basically, I just need to explode the string on spaces, starting from the end of the string, and limiting the total explosions to 2 (to make 3 elements).
You can approach this using explode and str_replace
$string = "Testing (3WIR)";
$stringToArray = explode(":",str_replace("(",":(",$string));
echo '<pre>';
print_r($stringToArray);
Answer to the edited question:
$subject = "Fluency in English Conversation Defklmno (1WIR)";
$toArray = explode(' ',$subject);
if(count($toArray) > 2){
$first = implode(" ",array_slice($toArray, 0,count($toArray)-2));
$second = $toArray[count($toArray)-2];
$third = $toArray[count($toArray)-1];
$result = array_values(array_filter([$first, $second, $third]));
}else{
$result = array_values(array_filter(explode(":",str_replace("(",":(",$subject))));
}
I am not a fan of regular expressions, but this one seems to work very fine:
Regex to split a string only by the last whitespace character
So the PHP code would be:
function splitAtLastWord($sentence)
{
return preg_split("/\s+(?=\S*+$)/", $sentence);
}
$sentence = "Fluency in English Conversation Defklmno (1WIR)";
list($begin, $end) = splitAtLastWord($sentence);
list($first, $middle) = splitAtLastWord($begin);
$result = [$first, $middle, $end];
echo "<pre>" . print_r($result, TRUE) . "</pre>";
The output is:
Array
(
[0] => Fluency in English Conversation
[1] => Defklmno
[2] => (1WIR)
)
You can also write the same function without a regular expression:
function splitAtLastWord($sentence)
{
$words = explode(" ", $sentence);
$last = array_pop($words);
return [implode(" ", $words), $last];
}
Which is, to be honest, a better way of doing this.
This is a computationally more efficient way to do it:
function splitAtLastWord($sentence)
{
$lastSpacePos = strrpos($sentence, " ");
return [substr($sentence, 0, $lastSpacePos), substr($sentence, $lastSpacePos + 1)];
}
It looks a bit less nice but it is faster.
Anyway, defining a separate function like this is useful, you can reuse it in other places.
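Building on the strrpos() version, the three-element result from the question could be produced in one helper. This is only a sketch: the name split_last_two_words is made up, and it assumes single spaces as delimiters. Unlike the version above, it checks for strrpos() returning false, so a string with fewer than two spaces simply yields fewer elements.

```php
<?php
// Hypothetical helper: peel off the last two words using strrpos().
function split_last_two_words(string $sentence): array
{
    $parts = [];
    for ($i = 0; $i < 2; $i++) {
        $pos = strrpos($sentence, ' ');
        if ($pos === false) {
            break; // no space left to split on
        }
        // Prepend the word after the last space, then shorten the sentence.
        array_unshift($parts, substr($sentence, $pos + 1));
        $sentence = substr($sentence, 0, $pos);
    }
    array_unshift($parts, $sentence);
    return $parts;
}

print_r(split_last_two_words('Fluency in English Conversation Defklmno (1WIR)'));
```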
To isolate the two delimiting spaces, use / (?=(?:\S+ )?\()/ which leverages a lookahead containing an optional group.
Code: (Demo)
$strings = [
'Stack Over Flow Abcpqr (UR)',
'Fluency in English Conversation Defklmno (1WIR)',
'English Proficiency GHI (2WIR)',
'Testing ADG (3WIR)',
];
foreach ($strings as $string) {
echo json_encode(
preg_split('/ (?=(?:\S+ )?\()/', $string)
) . "\n";
}
Output:
["Stack Over Flow","Abcpqr","(UR)"]
["Fluency in English Conversation","Defklmno","(1WIR)"]
["English Proficiency","GHI","(2WIR)"]
["Testing","ADG","(3WIR)"]
Pattern Breakdown:
#match a literal space
(?= #start lookahead
(?:\S+ )? #optionally match one or more non-whitespaces followed by a space
\( #match a literal opening parenthesis
) #end lookahead
When matching the first delimiting space, the optional subpattern will match characters. When matching the second delimiting space (before the parenthesis), the optional subpattern will not match any characters.
As a more generic solution, if the goal was to split on the space before either of the last two non-whitespace substrings, this pattern looks ahead in the same fashion but matches all the way to the end of the string.
/ (?=(?:\S+ )?\S+$)/
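A minimal sketch of that generic pattern in use (the wrapper name split_before_last_two is hypothetical, and single-space delimiters are assumed):

```php
<?php
// Hypothetical wrapper around the generic pattern quoted above.
function split_before_last_two(string $string): array
{
    // Split on a space only when at most two non-whitespace
    // substrings remain between that space and the end of the string.
    return preg_split('/ (?=(?:\S+ )?\S+$)/', $string);
}

echo json_encode(split_before_last_two('Stack Over Flow Abcpqr (UR)')), "\n";
// ["Stack Over Flow","Abcpqr","(UR)"]
```

Because it anchors on the end of the string rather than on the opening parenthesis, it also works when the last word is not wrapped in parentheses.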
While I don't find non-regex solutions to be anywhere near as elegant or concise, here is one way to explode on all spaces then implode all elements except the last two: (Demo)
function implodeNotLastTwoElements($string) {
$array = explode(' ', $string);
array_splice($array, 0, -2, implode(' ', array_slice($array, 0, -2)));
return $array;
}
foreach ($strings as $string) {
echo json_encode(implodeNotLastTwoElements($string)) . "\n";
}
Or (Demo)
function implodeNotLastTwoElements($string) {
$array = explode(' ', $string);
return [implode(' ', array_slice($array, 0, -2))] + array_slice($array, -3);
}
These non-regex approaches are iterating/scanning over the data 4 times versus regex only scanning the input string once and directly creating the desired result. The decision between regex or non-regex is a no-brainer for me in this case.

PHP convert uppercase words to lowercase, but keep ucfirst on lowercase words

An example:
THIS IS A Sentence that should be TAKEN Care of
The output should be:
This is a Sentence that should be taken Care of
Rules
Convert UPPERCASE words to lowercase
Keep the lowercase words with an uppercase first character intact
Set the first character in the sentence to uppercase.
Code
$string = ucfirst(strtolower($string));
Fails
It fails because the ucfirst words are not being kept.
This is a sentence that should be taken care of
You can test each word for those rules:
$str = 'THIS IS A Sentence that should be TAKEN Care of';
$words = explode(' ', $str);
foreach($words as $k => $word){
if(strtoupper($word) === $word || // first rule
ucfirst($word) !== $word){ // second rule
$words[$k] = strtolower($word);
}
}
$sentence = ucfirst(implode(' ', $words)); // third rule
Output:
This is a Sentence that should be taken Care of
A little bit of explanation:
Since you have overlapping rules, you need to individually compare them, so...
Break down the sentence into separate words and check each of them based on the rules;
If the word is UPPERCASE, turn it into lowercase; (THIS, IS, A, TAKEN)
If the word is ucfirst, leave it alone; (Sentence, Care)
If the word is NOT ucfirst, turn it into lowercase. (that, should, be, of)
You can break the sentence down into individual words, then apply a formatting function to each of them:
$sentence = 'THIS IS A Sentence that should be TAKEN Care of';
$words = array_map(function ($word) {
// If the word only has its first letter capitalised, leave it alone
if ($word === ucfirst(strtolower($word)) && $word != strtoupper($word)) {
return $word;
}
// Otherwise set to all lower case
return strtolower($word);
}, explode(' ', $sentence));
// Re-combine the sentence, and capitalise the first character
echo ucfirst(implode(' ', $words));
See https://eval.in/936462
$str = "THIS IS A Sentence that should be TAKEN Care of";
$str_array = explode(" ", $str);
foreach ($str_array as $testcase =>$str1) {
//Check the first word
if ($testcase ==0 && ctype_upper($str1)) {
echo ucfirst(strtolower($str1))." ";
}
//Convert every other uppercase word to lowercase
elseif( ctype_upper($str1)) {
echo strtolower($str1)." ";
}
//Do nothing with lowercase
else {
echo $str1." ";
}
}
Output:
This is a Sentence that should be taken Care of
I find preg_replace_callback() to be a direct tool for this task. Create a pattern that will capture the two required strings:
The leading word
Any non-leading, ALL-CAPS word
Code: (Demo)
echo preg_replace_callback(
'~(^\pL+\b)|(\b\p{Lu}+\b)~u',
function($m) {
return $m[1]
? mb_convert_case($m[1], MB_CASE_TITLE, 'UTF-8')
: mb_strtolower($m[2], 'UTF-8');
},
'THIS IS A Sentence that should be TAKEN Care of'
);
// This is a Sentence that should be taken Care of
I did not test this with multibyte input strings, but I have tried to build it with multibyte characters in mind.
The custom function works like this:
There will always be either two or three elements in $m. If the first capture group matches the first word of the string, then there will be no $m[2]. When a non-first word is matched, then $m[2] will be populated and $m[1] will be an empty string. There is a modern flag that can be used to force that empty string to be null, but it is not advantageous in this case.
\pL+ means one or more of any letter (single or multi-byte)
\p{Lu}+ means one or more uppercase letters
\b is a word boundary. It is a zero-width assertion -- it doesn't match a character, it checks that the two consecutive characters change from a word to a non-word or vice versa.
My answer makes just 4 matches/replacements on the sample input string (THIS, IS, A and TAKEN).
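The flag mentioned above is presumably PREG_UNMATCHED_AS_NULL, which preg_replace_callback() accepts through its flags parameter as of PHP 7.4. As a sketch only (the function name normalize_case is made up, and the mbstring extension is assumed to be available), the callback can then test for null explicitly:

```php
<?php
// Hypothetical wrapper; assumes PHP 7.4+ and the mbstring extension.
function normalize_case(string $text): string
{
    return preg_replace_callback(
        '~(^\pL+\b)|(\b\p{Lu}+\b)~u',
        function (array $m): string {
            // With PREG_UNMATCHED_AS_NULL, the group that did not
            // participate in the match is reported as null.
            return $m[1] !== null
                ? mb_convert_case($m[1], MB_CASE_TITLE, 'UTF-8')
                : mb_strtolower($m[2], 'UTF-8');
        },
        $text,
        -1,      // no limit on the number of replacements
        $count,  // receives the number of replacements made
        PREG_UNMATCHED_AS_NULL
    );
}

echo normalize_case('THIS IS A Sentence that should be TAKEN Care of'), "\n";
// This is a Sentence that should be taken Care of
```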
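$string='THIS IS A Sentence that should be TAKEN Care of';
$arr=explode(" ", $string);
$stry = ''; // initialise the accumulator before appending to it
foreach($arr as $v)
{
$v = ucfirst(strtolower($v));
$stry = $stry . ' ' . $v;
}
echo ltrim($stry); // drop the leading space added by the loop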

Optional Character in PHP Regular Expression Replace

I have data in this format coming from a database...
BUS 101S Business and Society
or
BUS 101 Business and Society
Notice the optional "S" character (which can be any uppercase character)
I need to replace the "BUS 101S" part with null and here is what I have come up with...
$value = "BUS 101S Business and Society";
$sub = substr($value, 0, 3); // Gives me "BUS"
$num = substr($value, 4, 3); // Gives me "101"
$new_value = preg_replace("/$sub $num"."[A-Z]?/", null, $value);
The value of $new_value now contains S Business and Society. So I'm close, Just need it to replace the optional single uppercase character as well. Any ideas?
Assuming the pattern is 3 uppercase letters, 3 numbers and then an optional uppercase letter, just use a single preg_replace:
$new = preg_replace('/^[A-Z]{3} \d{3}[A-Z]?/', '', $old);
The ^ will only match at the beginning of a line/string. The {3} means "match the preceding token 3 times exactly". The ? means "match the preceding token zero or one times"
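A small sketch of the same idea wrapped in a function (strip_course_code is a hypothetical name). It additionally consumes the whitespace after the code, which is an assumption about the desired result, so that the title does not start with a space:

```php
<?php
// Hypothetical helper: removes a leading course code such as "BUS 101S".
function strip_course_code(string $value): string
{
    // ^          start of the string
    // [A-Z]{3}   exactly three uppercase letters
    // \d{3}      exactly three digits
    // [A-Z]?     an optional uppercase letter
    // \s*        any whitespace that follows the code
    return preg_replace('/^[A-Z]{3} \d{3}[A-Z]?\s*/', '', $value);
}

echo strip_course_code('BUS 101S Business and Society'), "\n"; // Business and Society
echo strip_course_code('BUS 101 Business and Society'), "\n";  // Business and Society
```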
You can also do something like this, so you don't bother with substr:
preg_replace('#^[A-Z]{3} [0-9]{3}[A-Z]? (.*)$#', '$1', $value);
Or using preg_match, to get all the components of the string
if (preg_match('#^([A-Z]{3}) ([0-9]{3})([A-Z]?) (.*)$#', $value, $matches)) {
$firstMatch=$matches[1];//BUS ($matches[0] is the whole string)
$secondMatch=$matches[2];//101
$thirdMatch=$matches[3];//S or ''
$fourthMatch=$matches[4];//the rest of the text
}
Wouldn't it just be easier to do something like:
$str = 'BUS 101S Business and Society';
$words = explode(' ', $str);
array_shift($words); // removes "BUS"
array_shift($words); // removes "101S"
$str = implode(' ', $words);

PHP Regex: How to get capital words then add string if a ucwords matches?

I have this dynamic string
"ZAN ROAD HOG HEADWRAPS The most
popular ZAN headwrap style-features
custom and original artwork"
EDIT
How can I check all the capital words and, when I encounter a ucwords() or title case word, automatically add a '--' after the last capital word?
Note: The capital words are the product name and the first ucwords() or title case word is the start of the product description.
I have this code right now but it's not working at the moment:
<?php
$str = preg_replace( '/\s+/', ' ', $sentence );
$words = array_reverse( explode( ' ', $str ) );
foreach ( $words as $k => $s ) {
if ( preg_match( '/\b[A-Z]{5,}\b/', $s ) ) {
$words[$k] = $s . " --";
break;
}
}
$short_desc = addslashes( trim( join( ' ', array_reverse( $words ) ) ));
?>
Thanks in advance.
You can do this:
$str = preg_replace('/^(?:\p{Lu}+\s+)+(?=\p{Lu}*\p{Ll})/u', '$0-- ', $str);
Here ^(?:\p{Lu}+\s+)+ describes a sequence of words at the beginning of the string that are separated by whitespace where each word is a sequence of uppercase letters (\p{Lu}, see Unicode character properties). The look-ahead assertion (?=\p{Lu}*\p{Ll}) is just to ensure that there actually is something following that contains a lowercase letter.
You can just look for capital letters in the start of the string:
$regexp = "/^([A-Z][A-Z\s]+)([A-Z].+)/";
preg_match($regexp, $string, $matches); // $matches is filled by reference
$out = $matches[1] . "-- " . $matches[2];
The first [A-Z] looks for a capital letter in the beginning of the line
The next [A-Z\s]+ looks for 1 or more capital letters or spaces
Then, [A-Z].+ looks for the first capital letter of the remaining text and any character subsequently.
The remaining lines are, I hope, self explanatory
-Pranav
By performing a non-global replacement (informing preg_replace() that you only wish to make one replacement), you can avoid using ^ to anchor your pattern to the front of the input string.
The targeted position of your insert string immediately follows that final occurrence of "one or more uppercase letters followed by a space".
No capture groups or references are needed. \K in the pattern says "restart the fullstring match", in other words "release/forget any previously matched characters and start matching from this point". ...then we just don't match any more characters -- this delivers the zero-length position to insert the --. Effectively, no characters are lost in the action.
Code: (PHP Demo) (Regex Demo)
$string = "ZAN ROAD HOG HEADWRAPS The most popular ZAN headwrap style-features custom and original artwork";
echo preg_replace('~(?:[A-Z]+ )+\K~', '-- ', $string, 1);
echo "\n---\n";
echo preg_replace('~^(?:[A-Z]+ )+\K~', '-- ', $string); // without telling function to perform a single replacement
Output:
ZAN ROAD HOG HEADWRAPS -- The most popular ZAN headwrap style-features custom and original artwork
---
ZAN ROAD HOG HEADWRAPS -- The most popular ZAN headwrap style-features custom and original artwork
As a fringe case acknowledgement, if you have a product description that starts with A or I, then the pattern will need to be fortified slightly to accommodate. This could be achieved a number of ways; this seems simple/logical/direct to me: (Regex Demo)
~(?:[A-Z]+ )+\K(?=[A-Z])~
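To illustrate the fringe case, here is a sketch using the fortified pattern. The function name mark_description_start and the sample string are made up for the example:

```php
<?php
// Hypothetical helper using the fortified pattern from above.
function mark_description_start(string $text): string
{
    // The lookahead (?=[A-Z]) forces the insert position to be followed by
    // an uppercase letter, so a one-letter word such as "A" that starts the
    // description is not swallowed into the product name.
    return preg_replace('~(?:[A-Z]+ )+\K(?=[A-Z])~', '-- ', $text, 1);
}

echo mark_description_start('ZAN ROAD HOG A great headwrap'), "\n";
// ZAN ROAD HOG -- A great headwrap
```

Without the lookahead, the same input would come out as "ZAN ROAD HOG A -- great headwrap".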
