PHP: Show Only the First Letter of Each Word + Include Punctuation

I'm using the code below to display only the first letter of each word in my string. For example, "Hello World!" would be displayed as "H W". However, I also want to include punctuation, like this: "H W!"
How can I modify my code so punctuation is preserved?
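$editversetext = "Hello World!"; // sample input
// Split on runs of whitespace, commas, underscores and hyphens
$editversetext = preg_split("/[\s,_-]+/", $editversetext);
$initials = "";
foreach ($editversetext as $w) {
    $initials .= $w[0]; // first character of each word
}
$initials = implode(' ', str_split($initials));
echo $initials . ".";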

You may use the following regex to match what you need:
'~\b(\p{L})\p{L}*(\p{P}?)~u'
Details
\b - a word boundary
(\p{L}) - Capturing group #1: a letter
\p{L}* - 0+ letters
(\p{P}?) - Capturing group #2: an optional punctuation (NOTE: if you also want to match symbols, replace \p{P} with [\p{P}\p{S}])
u - the Unicode modifier. In PHP it enables the PCRE UTF and UCP options, so the pattern and subject are treated as UTF-8 and Unicode support is fully enabled.
Depending on your input, you may either use a replacing approach, or collect the matches and then combine them into the result you need, similar to what you are doing now.
PHP example:
$str = 'Hello World!';
// Replacing approach (if all words are matches):
echo preg_replace('~\b(\p{L})\p{L}*(\p{P}?)~u', '$1$2', $str) . "\n"; // => H W!
// Collecting/post-processing (if there are non-matching sequences)
$res = [];
preg_replace_callback('~\b(\p{L})\p{L}*(\p{P}?)~u', function($m) use (&$res) {
$res[] = $m[1].$m[2];
return '';
}, $str);
print_r(implode(" ", $res)); // => H W!
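Since preg_replace_callback is used above only for its side effect, the same collecting approach can also be sketched with preg_match_all. This is an alternative, not the answer's original code; the $parts and $initials names are just for illustration:

```php
<?php
$str = 'Hello World!';

// Collect every "first letter + optional punctuation" match in one call.
// PREG_SET_ORDER keeps the capture groups of each match together.
preg_match_all('~\b(\p{L})\p{L}*(\p{P}?)~u', $str, $matches, PREG_SET_ORDER);

$parts = [];
foreach ($matches as $m) {
    // Group 1: first letter; group 2: trailing punctuation (may be an empty string)
    $parts[] = $m[1] . $m[2];
}

$initials = implode(' ', $parts);
echo $initials; // => H W!
```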

To match and remove all word characters except the first one in each word, use \B, a non-word boundary:
$str = preg_replace('/\B\w+/', "", $str);
Be aware that digits (and the underscore) also belong to \w. If you only want letters, use [A-Za-z], or the Unicode \pL with the u flag, instead.
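To illustrate the difference, here is a small sketch on a made-up input containing a number: with \w the digits after the first one are stripped too, while the letter-only variant leaves them alone.

```php
<?php
$str = 'Hello World 2024!'; // hypothetical sample input

// \w also matches digits, so "2024" is reduced to "2".
$withWordChars = preg_replace('/\B\w+/', '', $str);

// Letters only (Unicode-aware thanks to the u flag): "2024" is kept intact.
$lettersOnly = preg_replace('/\B\pL+/u', '', $str);

echo $withWordChars, "\n"; // => H W 2!
echo $lettersOnly, "\n";   // => H W 2024!
```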

Related

PHP Extract Specific Character from string

I have the string below:
$LINE = "TCNU1573105 HDPE HTA108 155 155 000893520918 PAL990 25.2750 MT 28.9750 MT";
and I want to extract PAL990 from it. More precisely, I want to extract PAL990 or any string that consists of PAL followed by some digits, like PAL222 or PAL123.
I tried many ways and could not get the result. I used
substr ( $LINE, 77, 3)
but when the value is in a different position, I get the wrong value.
You may use
$LINE = "TCNU1573105 HDPE HTA108 155 155 000893520918 PAL990 25.2750 MT 28.9750 MT";
if (preg_match('~\bPAL\d+\b~', $LINE, $res)) {
echo $res[0]; // => PAL990
}
Details
\b - a word boundary
PAL - a PAL substring
\d+ - 1+ digits
\b - a word boundary.
preg_match stops at the first match and stores it in $res.
Note that if your string may contain similar strings wrapped in hyphens (or other non-word characters) that should not be matched, you can no longer rely on word boundaries. Use custom whitespace boundaries instead:
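If a line may contain several such codes and you need all of them, preg_match_all can collect every match. A minimal sketch, using a made-up line with two codes:

```php
<?php
// Hypothetical input containing two PAL codes
$LINE = "TCNU1573105 HDPE PAL990 25.2750 MT PAL123 28.9750 MT";

// preg_match_all returns the number of matches; $res[0] holds all full matches.
$count = preg_match_all('~\bPAL\d+\b~', $LINE, $res);
echo implode(', ', $res[0]); // => PAL990, PAL123
```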
'~(?<!\S)PAL\d+(?!\S)~'
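The difference between the two boundary styles shows up on a made-up input where one code is glued to hyphens. This is only an illustration, assuming the hyphenated code is one you do not want:

```php
<?php
$sample = "HDPE XX-PAL990-YY PAL123 MT"; // hypothetical input

// Word boundaries: "-" is a non-word character, so the PAL990 inside XX-PAL990-YY matches too.
preg_match_all('~\bPAL\d+\b~', $sample, $wordBound);

// Whitespace boundaries: the code must be preceded and followed by whitespace
// (or the start/end of the string), so only the standalone PAL123 matches.
preg_match_all('~(?<!\S)PAL\d+(?!\S)~', $sample, $spaceBound);

echo implode(', ', $wordBound[0]), "\n";  // => PAL990, PAL123
echo implode(', ', $spaceBound[0]), "\n"; // => PAL123
```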
EDIT
If you may have an optional whitespace between PAL and digits, you may use
preg_replace('~.*\b(PAL)\s?(\d+)\b.*~s', '$1$2', $LINE)
Or, match the string you need with spaces, and then remove them:
if (preg_match('~\bPAL ?\d+\b~', $LINE, $res)) {
echo str_replace(" ", "", $res[0]);
}
Note that ? makes the preceding pattern optional (1 or 0 occurrences are matched).
$string = '123ABC1234 *$%^&abc.';
// Remove every character that is not an ASCII letter or an apostrophe
$newstr = preg_replace('/[^a-zA-Z\']/', '', $string);
echo $newstr;
Output: ABCabc

Search string for first word that has an exclamation-mark

I have a string like this:
$string = 'Hello k-on! Lorem Ipsum! Lorem.';
I want to get the first word that is followed by an exclamation-mark. So in the example above, it should be:
$word = 'k-on';
I'm lost as to what's the appropriate approach to take. Maybe a regex solution?
If you only need to support ASCII letter words, you can use
/\b[a-z]+(?:-[a-z]+)*!/i
If you plan to support Unicode, use \p{L}:
/\b\p{L}+(?:-\p{L}+)*!/u
Here is the pattern explanation:
\b - a word boundary (the previous character must be a non-word one or the beginning of the string)
\p{L}+ - one or more Unicode letters (or ASCII letters if [a-zA-Z] is used)
(?:-\p{L}+)* - zero or more sequences of:
- - a literal hyphen
\p{L}+ - one or more Unicode letters (or ASCII letters if [a-zA-Z] is used)
! - a literal ! symbol
PHP example:
$re = '/\b\p{L}+(?:-\p{L}+)*!/u';
$str = "Hello k-ąn! Lorem Ipsum! Lorem.";
preg_match($re, $str, $match);
print_r($match);
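Since the question asks for the word without the exclamation mark, one option (a variation on the pattern above, not part of the original answer) is to replace the final ! with a lookahead, so that the ! is required but not included in the match:

```php
<?php
$str = "Hello k-on! Lorem Ipsum! Lorem.";

// (?=!) requires a "!" right after the word but does not consume it,
// so $match[0] contains only the word itself.
if (preg_match('/\b\p{L}+(?:-\p{L}+)*(?=!)/u', $str, $match)) {
    $word = $match[0];
    echo $word; // => k-on
}
```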
I think this might do what you're looking for. Split the string into words, look for the first word that ends in '!', process it, and then break out of the loop:
$string = 'Hello k-on! Lorem Ipsum! Lorem.';
$arry = explode(" ", $string);
foreach ($arry as $word) {
    if (substr($word, -1) == "!") {
        $word = substr($word, 0, -1); // strip the trailing "!" => "k-on"
        echo $word;
        break;
    }
}
$string = 'Hello k-on! Lorem Ipsum! Lorem.';
preg_match('/[A-Za-z0-9-]+!/', $string, $match);
$yourWord = str_replace("!", "", $match[0]); // $yourWord is now "k-on"
A regular expression is the natural solution here. I used a simple expression that allows alphanumeric strings and, as an exception, the hyphen (-). preg_match finds the first match of the pattern in the string, which in your case is k-on!, and str_replace then removes the exclamation mark from the returned string.
More about preg_match: http://php.net/manual/en/function.preg-match.php

Replace all the first character of words in a string using preg_replace()

I have a string as
This is a sample text. This text will be used as a dummy for "various" RegEx "operations" using PHP.
I want to select and replace the first character of each word (in the example: T, i, a, s, t, T, t, w, b, u, a, a, d, f, ", R, ", u, P). How do I do it?
I tried /\b.{1}\w+\b/. I read the expression as "select any single character followed by a word of any length", but it didn't work.
You may try this regex as well:
(?<=\s|^)([a-zA-Z"])
Your regex, /\b.{1}\w+\b/, matches a string that starts with any character located right after a word boundary (so it can even be a whitespace character if a letter, digit or underscore precedes it), followed by one or more word characters (\w) up to the next word boundary.
That \b. is the culprit here.
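A tiny made-up input shows both problems at once: the one-letter word "a" is never matched, because \w+ needs at least one more word character after the first one, and the match that is found starts with a space, because the word boundary right after "a" lets "." consume that space.

```php
<?php
// Hypothetical minimal input: a one-letter word followed by a two-letter word.
preg_match_all('/\b.{1}\w+\b/', 'a bc', $m);

// The only match is " bc" (note the leading space); "a" is skipped entirely.
echo '[' . implode('|', $m[0]) . ']'; // => [ bc]
```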
If you want to match any non-whitespace character that is preceded by whitespace (or that starts the string), you can just use
/(?<!\S)\S/
Or
/(?<=^|\s)\S/
Then, replace with any symbol you need.
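For instance, a sketch that replaces each first character with an underscore (the "_" replacement is only an example; use whatever you need) could look like this:

```php
<?php
$string = 'This is a sample text. This text will be used as a dummy for "various" RegEx "operations" using PHP.';

// (?<!\S): not preceded by a non-whitespace character, i.e. preceded by
// whitespace or at the start of the string; \S: the first character of a token.
$result = preg_replace('/(?<!\S)\S/', '_', $string);
echo $result;
// => _his _s _ _ample _ext. _his _ext _ill _e _sed _s _ _ummy _or _various" _egEx _operations" _sing _HP.
```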
You may try to use the following regex:
(.)[^\s]*\s?
Use preg_match_all, then implode the contents of capturing group 1:
<?php
$string = 'This is a sample text. This text will be used as a dummy for '
    . '"various" RegEx "operations" using PHP.';
$pattern = '/(.)[^\s]*\s?/';
preg_match_all($pattern, $string, $matches);
// Group 1 holds the first character of every whitespace-separated token
$output = implode('', $matches[1]);
echo $output; //Output is TiastTtwbuaadf"R"uP
To replace those characters instead, use preg_replace_callback, for example:
$pattern = '/(.)([^\s]*\s?)/';
$output2 = preg_replace_callback($pattern,
function($match) { return '_' . $match[2]; }, $string);
//result: _his _s _ _ample _ext. _his _ext _ill _e _sed _s _ _ummy _or _various" _egEx _operations" _sing _HP.

PHP Regex: Remove words less than 3 characters

I'm trying to remove all words of less than 3 characters from a string, specifically with RegEx.
The following doesn't work because it is looking for double spaces. I suppose I could convert all spaces to double spaces beforehand and then convert them back after, but that doesn't seem very efficient. Any ideas?
$text='an of and then some an ee halved or or whenever';
$text=preg_replace('# [a-z]{1,2} #',' ',' '.$text.' ');
echo trim($text);
Removing the Short Words
You can use this:
$replaced = preg_replace('~\b[a-z]{1,2}\b~', '', $yourstring);
Explanation
\b is a word boundary: it matches a position where one side is a word character (a letter, digit or underscore) and the other side is not (for instance a space character, or the beginning of the string)
[a-z]{1,2} matches one or two letters
\b another word boundary
Replace with the empty string.
Option 2: Also Remove Trailing Spaces
If you also want to remove the spaces after the words, we can add \s* at the end of the regex:
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $yourstring);
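Applied to the string from the question, Option 2 gives the following (a quick check, assuming lowercase input as in the question):

```php
<?php
$text = 'an of and then some an ee halved or or whenever';

// Remove 1-2 letter words together with any whitespace that follows them.
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $text);
echo $replaced; // => and then some halved whenever
```

If the last word of a string can be a short one, the space before it is left behind, so wrapping the result in trim() is a reasonable safeguard.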
You can use the word boundary anchor \b:
Replace \b[a-z]{1,2}\b with ''
Use this
preg_replace('/(\b.{1,2}\s)/','',$your_string);
While some of the solutions here work, they had problems with my language's multi-character letters, such as "ch". A simple explode and implode worked for me.
$minWordLength = 3; // words shorter than this are removed
$string = "my super string";
$exploded = explode(" ", $string);
foreach($exploded as $key => $word) {
    // mb_strlen counts characters rather than bytes, so multibyte letters are measured correctly
    if(mb_strlen($word) < $minWordLength) unset($exploded[$key]);
}
$string = implode(" ", $exploded);
echo $string;
// outputs "super string"
To me, it seems that this works fine with most PHP versions:
$string2 = preg_replace('/\b[a-zA-Z0-9]{1,2}\b/i', '', trim($string1));
Here, [a-zA-Z0-9] is the accepted range of letters and digits.

Regex Preg_match_all match all pattern

Here is my concern,
I have a string and I need to extract characters two by two.
$str = "abcdef" should return array('ab', 'bc', 'cd', 'de', 'ef'). I want to use preg_match_all instead of loops. Here is the pattern I am using.
$str = "abcdef";
preg_match_all('/[\w]{2}/', $str);
The thing is, it returns Array('ab', 'cd', 'ef'). It misses 'bc' and 'de'.
I have the same problem if I want to extract a certain number of words
$str = "ab cd ef gh ij";
preg_match_all('/([\w]+ ){2}/', $str); // returns array('ab cd', 'ef gh'), I'm also missing the last part
What am I missing? Or is it simply not possible to do so with preg_match_all?
For the first problem, you want to match overlapping substrings, which requires a zero-width (non-consuming) lookaround to grab the characters:
/(?=(\w{2}))/
The regex above will capture the match in the first capturing group.
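A minimal sketch of how this looks with preg_match_all: the pairs end up in group 1 ($matches[1]), while $matches[0] contains only empty strings because the lookahead itself consumes nothing.

```php
<?php
$str = "abcdef";

// The lookahead matches an empty string at each position; the capturing group
// inside it grabs the next two characters without consuming them.
preg_match_all('/(?=(\w{2}))/', $str, $matches);
echo implode(', ', $matches[1]); // => ab, bc, cd, de, ef
```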
For the second problem, it seems that you also want overlapping string. Using the same trick:
/(?=(\b\w+ \w+\b))/
Note that \b is added to check the boundary of the word. Since the match does not consume text, the next match will be attempted at the next index (which is in the middle of the first word), instead of at the end of the 2nd word. We don't want to capture from the middle of a word, so we need the boundary check.
Note that \b's definition is based on \w, so if you ever change the definition of a word, you need to emulate the word boundary with look-ahead and look-behind with the corresponding character set.
In case you need a non-regex solution, try this:
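To see why the \b checks matter, this sketch runs the pattern with and without them on the string from the question:

```php
<?php
$str = "ab cd ef gh ij";

// With \b, a match can only start at the beginning of a word.
preg_match_all('/(?=(\b\w+ \w+\b))/', $str, $matches);
echo implode(' | ', $matches[1]), "\n"; // => ab cd | cd ef | ef gh | gh ij

// Without \b, matches also start in the middle of words ("b cd", "d ef", ...).
preg_match_all('/(?=(\w+ \w+))/', $str, $noBoundary);
echo implode(' | ', $noBoundary[1]), "\n"; // => ab cd | b cd | cd ef | d ef | ef gh | f gh | gh ij | h ij
```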
<?php
$str = "abcdef";
$len = strlen($str);
$arr = array();
for($count = 0; $count < ($len - 1); $count++)
{
$arr[] = $str[$count].$str[$count+1];
}
print_r($arr);
?>
