Create acronym from a string containing only words - php

I'm looking for a way that I can extract the first letter of each word from an input field and place it into a variable.
Example: if the input field is "Stack-Overflow Questions Tags Users" then the output for the variable should be something like "SOQTU"

$s = 'Stack-Overflow Questions Tags Users';
echo preg_replace('/\b(\w)|./', '$1', $s);
This is the same idea as codaddict's answer, but shorter.
For Unicode support, add the u modifier to the regex: preg_replace('...../u',
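A minimal sketch of the u-modifier variant, wrapped in a helper. The name acronymFromWords() and the extra s flag are my own additions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: keeps only the first word character of every word.
function acronymFromWords(string $text): string
{
    // \b(\w) captures a character that starts a word; the alternative "."
    // consumes every other character, which is then replaced by nothing.
    // u = treat pattern and subject as UTF-8, s = let "." match newlines too.
    // preg_replace() returns null on failure (e.g. invalid UTF-8), hence "?? ''".
    return preg_replace('/\b(\w)|./us', '$1', $text) ?? '';
}

echo acronymFromWords('Stack-Overflow Questions Tags Users'), "\n"; // SOQTU
```

Wrap the result in strtoupper() (or mb_strtoupper() for non-ASCII input) if the words may start with lowercase letters.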

Something like:
$s = 'Stack-Overflow Questions Tags Users';
if (preg_match_all('/\b(\w)/', strtoupper($s), $m)) {
    $v = implode('', $m[1]); // $v is now SOQTU
}
I'm using the regex \b(\w) to match the word-char immediately following the word boundary.
EDIT:
To ensure all the characters in your acronym are uppercase, you can use strtoupper as shown.

Just to be completely different:
$input = 'Stack-Overflow Questions Tags Users';
$acronym = implode('',array_diff_assoc(str_split(ucwords($input)),str_split(strtolower($input))));
echo $acronym;

$initialism = preg_replace('/\b(\w)\w*\W*/', '\1', $string);

If the words are separated only by spaces and nothing else, this is how you can do it:
function acronym($longname)
{
    $letters = array();
    $words = explode(' ', $longname);
    foreach ($words as $word) {
        // Take the first character of each word.
        $letters[] = substr($word, 0, 1);
    }
    return strtoupper(implode('', $letters));
}

Regular expression matching as codaddict says above, or str_word_count() with 1 as the second parameter, which returns an array of found words. See the examples in the manual. Then you can get the first letter of each word any way you like, including substr($word, 0, 1)

The str_word_count() function might do what you are looking for:
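$words = str_word_count('Stack-Overflow Questions Tags Users', 1);
$result = "";
// Note: str_word_count() treats a hyphen as part of a word, so
// "Stack-Overflow" counts as one word and $result ends up as "SQTU".
for ($i = 0; $i < count($words); ++$i) {
    $result .= $words[$i][0];
}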
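One thing to watch with str_word_count(): it treats a hyphen as part of a word, so "Stack-Overflow" comes back as a single word. A minimal sketch that turns hyphens into spaces first, assuming that is how hyphenated words should be handled (initialsViaWordCount() is a hypothetical name):

```php
<?php
// Hypothetical helper; the hyphen-to-space step is an assumption about
// how hyphenated words should be treated.
function initialsViaWordCount(string $text): string
{
    // Format 1 makes str_word_count() return the list of words it found.
    $words = str_word_count(str_replace('-', ' ', $text), 1);
    $initials = '';
    foreach ($words as $word) {
        $initials .= substr($word, 0, 1);
    }
    return strtoupper($initials);
}

echo initialsViaWordCount('Stack-Overflow Questions Tags Users'), "\n"; // SOQTU
```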

function initialism($str, $as_space = array('-'))
{
    // Treat the characters in $as_space as word separators.
    $str = str_replace($as_space, ' ', trim($str));
    $ret = '';
    foreach (explode(' ', $str) as $word) {
        $ret .= strtoupper($word[0]);
    }
    return $ret;
}
$phrase = 'Stack-Overflow Questions IT Tags Users Meta Example';
echo initialism($phrase);
// SOQITUME

$s = "Stack-Overflow Questions IT Tags Users Meta Example";
$sArr = explode(' ', ucwords(strtolower($s)));
$sAcr = "";
foreach ($sArr as $key) {
    $firstAlphabet = substr($key, 0, 1);
    $sAcr = $sAcr . $firstAlphabet;
}

This builds on codaddict's answer.
I also considered the case where the input is already an abbreviation, e.g. DPR rather than Development Petroleum Resources. Such a word would be reduced to just D, which doesn't make much sense.
// Truncate $str to at most $amt characters.
function AbbrWords($str, $amt) {
    return substr($str, 0, $amt);
}
function AbbrSent($str, $amt) {
    if (preg_match_all('/\b(\w)/', strtoupper($str), $m)) {
        $v = implode('', $m[1]); // first letter of each word, uppercased
        if (strlen($v) < 2) {
            // Single-word input: keep short words whole, truncate long ones.
            if (strlen($str) < 5) {
                return $str;
            } else {
                return AbbrWords($str, $amt);
            }
        } else {
            return AbbrWords($v, $amt);
        }
    }
}

As an alternative to user187291's preg_replace() pattern, here is the same functionality without needing a reference in the replacement string.
It works by matching the first character of each word, then forgetting it with \K, then matching zero or more word characters, then zero or more non-word characters. This consumes all of the unwanted characters and leaves only the first character of each word. This is ideal because there is no need to implode an array of matches. The u modifier ensures that accented/multibyte characters are treated as whole characters by the regex engine.
Code:
$tests = [
'Stack-Overflow Questions Tags Users',
'Stack Overflow Close Vote Reviewers',
'Jean-Claude Vandàmme'
];
var_export(
preg_replace('/\w\K\w*\W*/u', '', $tests)
);
Output:
array (
0 => 'SOQTU',
1 => 'SOCVR',
2 => 'JCV',
)

Related

Formatting string according to pattern without regex in php

How can I format an arbitrary string according to a flexible pattern? The only solution I came up with is using regular expressions, but then I need 2 "patterns" (one for the search and one for the output).
Example:
$str = '123ABC5678';
Desired output: 12.3AB-C5-67.8
I would like to use a pattern in a variable (one that a user can easily define without knowledge of regular expressions) It could look like this:
$pattern = '%%.%%%-%%-%%.%';
So the user would just have to use 2 different characters (% and .)
A solution with regex would look like this:
$str = '123ABC5678';
$pattern_src = '#(.{2})(.{3})(.{2})(.{2})(.{1})#';
$pattern_rpl = "$1.$2-$3-$4.$5";
$res = preg_replace($pattern_src, $pattern_rpl, $str);
//$res eq 12.3AB-C5-67.8
Way too complicated since the user would need to define $pattern_src and $pattern_rpl. If the string could vary in length, it would be even more complex to explain.
Yes, I could write a function/parser that builds the required regular expressions based on a simple user pattern like %%.%%%-%%-%%.%. But I wonder if there is any "built in" way to achieve this with php? I was thinking about sprintf etc., but that doesn't seem to do the trick. Any ideas?
"I was thinking about sprintf etc., but that doesn't seem to do the trick."
You're on the right track. You can accomplish this with vsprintf as follows:
$str = '123ABC5678';
$pattern = '%%.%%%-%%-%%.%';
echo vsprintf(str_replace('%', '%s', $pattern), str_split($str));
Output:
12.3AB-C5-67.8
This assumes that the number of % characters in $pattern matches the length of $str.
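If that assumption might not hold, it can be checked up front. A minimal sketch, assuming a mismatch should be reported as an exception; formatWithPattern() and the choice of InvalidArgumentException are illustrative, not part of the answer:

```php
<?php
// Hypothetical wrapper around the vsprintf() approach.
function formatWithPattern(string $input, string $pattern): string
{
    // Every "%" consumes exactly one input character, so the counts must agree.
    if (substr_count($pattern, '%') !== strlen($input)) {
        throw new InvalidArgumentException('Placeholder count must match input length.');
    }
    // Turn every "%" into "%s" so vsprintf() inserts one character per placeholder.
    return vsprintf(str_replace('%', '%s', $pattern), str_split($input));
}

echo formatWithPattern('123ABC5678', '%%.%%%-%%-%%.%'), "\n"; // 12.3AB-C5-67.8
```

Because every % is treated as a placeholder, this sketch cannot output a literal percent sign.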
Why not write a simple parser that works as follows:
For each character of pattern:
if you match percent character, output next character from input
if you match any other character, output it
$str = '123ABC5678';
$pattern = '%%.%%%-%%-%%.%';
if (strlen($str) < substr_count($pattern, '%')) {
    die('The input string is shorter than the number of placeholders');
}
$len = strlen($pattern);
$stringIndex = 0;
$output = '';
for ($i = 0; $i < $len; $i++) {
    if ($pattern[$i] === '%') {
        // Placeholder: emit the next input character.
        $output .= $str[$stringIndex];
        $stringIndex++;
    } else {
        // Literal: copy it through unchanged.
        $output .= $pattern[$i];
    }
}
echo $output;
I have a similar solution that looks like this.
<?php
$format = '%%.%%%-%%-%%.%';
$string = '123ABC5678';
$new_string = '';
$c = 0;
for( $i = 0; $i < strlen( $format ); $i++ )
{
if( $format[ $i ] == '%' )
{
$new_string .= $string[ $c ];
$c++;
}
else
{
$new_string .= $format[ $i ];
}
}
echo $new_string;
Output:
12.3AB-C5-67.8
How about a pattern like this from the user?
2.3-2-2.1
Here a number means "take n characters", and a dot or dash means "insert a dot or dash".
Now use a regex to parse the user input:
preg_match_all("/(.)/", $User_input, $pattern);
Now you have an array whose elements are single digits, dots, or dashes.
So loop through the array and build the string:
$string = '123ABC5678';
$User_input = "2.3-2-2.1";
preg_match_all("/(.)/", $User_input, $pattern);
$i=0;
$str="";
foreach($pattern[1] as $val){
if(is_numeric($val)){
$str .= substr($string,$i,$val);
$i=$i+$val;
}else{
$str .= $val;
}
}
echo $str;
https://3v4l.org/5eg5G
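The /(.)/ pattern reads one character at a time, so a count such as 12 would be seen as 1 followed by 2. A sketch of a variant that accepts multi-digit counts; formatByCounts() is a hypothetical name and the \d+|\D tokenizer is my own substitution, not the answer's code:

```php
<?php
// Hypothetical variant: counts may have more than one digit (e.g. "12").
function formatByCounts(string $input, string $spec): string
{
    // Each token is either a run of digits (a count) or one non-digit (a literal).
    preg_match_all('/\d+|\D/', $spec, $matches);
    $offset = 0;
    $out = '';
    foreach ($matches[0] as $token) {
        if (ctype_digit($token)) {
            // Copy the next (int) $token characters from the input.
            $out .= substr($input, $offset, (int) $token);
            $offset += (int) $token;
        } else {
            // Copy the literal separator through unchanged.
            $out .= $token;
        }
    }
    return $out;
}

echo formatByCounts('123ABC5678', '2.3-2-2.1'), "\n"; // 12.3AB-C5-67.8
```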

Using php to extract keyword pairs for SEO

I'm currently investigating some new ideas for long tail SEO. I have a site where people can create their own blogs, which brings pretty good long tail traffic already. I'm already displaying the article title inside the article's title tags.
However, often the title does not match well for keywords in the content, and I'm interested in maybe adding some keywords into the title that php has actually determined would be best.
I've tried using a script which I made to work out what the most common words are on a page. This works ok but the problem with this is it comes up with pretty useless words.
It's occurred to me that what would be useful is to make a php script that would extract frequently occurring pairs (or sets of 3) words and then put them in an array ordered by how often they occur.
My problem: how to parse text in a more dynamic way to look for recurring pairs or triplets of words. How would I go about this?
function extractCommonWords($string, $keywords){
$stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
$string = preg_replace('/\s\s+/', ' ', $string); // collapse runs of whitespace into a single space
$string = trim($string); // trim the string
$string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
$string = strtolower($string); // make it lowercase
preg_match_all('/\b.*?\b/i', $string, $matchWords);
$matchWords = $matchWords[0];
foreach ( $matchWords as $key=>$item ) {
if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
unset($matchWords[$key]);
}
}
$wordCountArr = array();
if ( is_array($matchWords) ) {
foreach ( $matchWords as $key => $val ) {
$val = strtolower($val);
if ( isset($wordCountArr[$val]) ) {
$wordCountArr[$val]++;
} else {
$wordCountArr[$val] = 1;
}
}
}
arsort($wordCountArr);
$wordCountArr = array_slice($wordCountArr, 0, $keywords);
return $wordCountArr;
}
For the sake of including some code - here's another primitive adaptation that returns multi-word keywords of a given length and occurrences - rather than strip all common words it only filters those that are at the start and end of a keyword. It still returns some nonsense but that is unavoidable really.
function getLongTailKeywords($str, $len = 3, $min = 2){
    $keywords = array();
    $common = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
    $str = preg_replace('/[^a-z0-9\s-]+/', '', strtolower(strip_tags($str)));
    $str = preg_split('/\s+-\s+|\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
    // Count phrases of $len words first, then shorter ones, down to single words.
    while(0<$len--) {
        for($i=0;$i<count($str)-$len;$i++){
            $word = array_slice($str, $i, $len+1);
            // Skip phrases that start or end with a common word.
            if(in_array($word[0], $common)||in_array(end($word), $common)) continue;
            $word = implode(' ', $word);
            if(!isset($keywords[$len][$word])) $keywords[$len][$word] = 0;
            $keywords[$len][$word]++;
        }
    }
    $return = array();
    foreach($keywords as &$keyword){
        // Keep only phrases that occur more than $min times.
        $keyword = array_filter($keyword, function($v) use($min){ return !!($v>$min); });
        arsort($keyword);
        $return = array_merge($return, $keyword);
    }
    return $return;
}
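For comparison, a minimal sketch of the same idea for one fixed phrase length, with no stop-word filtering at all. countNgrams() and its parameters are hypothetical names, and the tokenizer (lowercase, split on anything that is not a letter or digit) is a simplifying assumption:

```php
<?php
// Hypothetical helper: counts how often each run of $n consecutive words occurs.
function countNgrams(string $text, int $n = 2): array
{
    // Lowercase the text and split on any run of non-alphanumeric characters.
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $grams = [];
    // Slide a window of $n words across the word list.
    for ($i = 0; $i + $n <= count($words); $i++) {
        $grams[] = implode(' ', array_slice($words, $i, $n));
    }
    // Tally identical phrases and sort by frequency, highest first.
    $counts = array_count_values($grams);
    arsort($counts);
    return $counts;
}

print_r(countNgrams('the cat sat. The cat ran; the cat sat', 2));
```

Calling it with $n = 3 gives triplets; filtering the result by a minimum count would reproduce the $min idea from the function above.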
The problem with just ignoring common words, grammar and punctuation though is that they still carry meaning within a sentence. If you remove them you are at best changing the meaning or at worst generating unintelligible phrases. Even the idea of extracting "keywords" itself is flawed because words can have different meanings - when you remove them from a sentence you take them out of context.
It's not my area but there are complex studies into natural languages and there is no easy solution - though the general theory goes like this: A computer cannot decipher the meaning of a single piece of text, it has to rely on cross referencing a semantically tagged corpus of related material (which is a huge overhead).

How can I wrap characters in stringA with a tag if they are found in stringB?

For example:
$stringA = 'Whatcha talkin bout Willis?';
$stringB = 'aeiou';
I need to wrap every character in stringA that matches any character in stringB with a <span>.
How can I do this?
echo tagVowels($stringA);
function tagVowels($string) {
// ????
// So far I've been using a manual loop through each character.
// I'm hoping for a simpler/cleaner way.
for ($i = 0; $i <= strlen($string) -1; $i++) {
if (strpos()) {
$string = str_replace();
}
}
}
Result:
Wh<span>a</span>tch<span>a</span> t<span>a</span>lk<span>i</span>n b<span>o</span><span>u</span>t W<span>i</span>ll<span>i</span>s?
Generate a character class from $stringB:
$stringA = preg_replace('/['.$stringB.']/', '<span>$0</span>', $stringA);
This wraps each of those characters in its own span tag. If you want consecutive matching characters to end up in the same span tag, use this:
$stringA = preg_replace('/['.$stringB.']+/', '<span>$0</span>', $stringA);
Note that this approach gets ugly if $stringB contains characters that are meta-characters within regex character classes (^]-\). However, as Brad Christie mentioned, you can get around this problem by inserting preg_quote($stringB, '/') instead of $stringB directly.
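A minimal sketch of the preg_quote() variant as a reusable function. The name tagChars() and the empty-string guard are my own additions:

```php
<?php
// Hypothetical helper: wraps every character of $text that appears in $chars.
function tagChars(string $text, string $chars): string
{
    if ($chars === '') {
        return $text; // an empty character class "[]" would be an invalid pattern
    }
    // preg_quote() escapes regex meta-characters; the second argument also
    // escapes the "/" delimiter used below.
    $class = '[' . preg_quote($chars, '/') . ']';
    return preg_replace('/' . $class . '/', '<span>$0</span>', $text);
}

echo tagChars('Whatcha talkin bout Willis?', 'aeiou'), "\n";
```

This works on single-byte characters; multibyte input would additionally need the u modifier.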
Using a non-regex variation, take advantage of str_replace's ability to accept arrays:
$find = str_split($stringB);
$replace = array();
foreach ($find as $ltr)
{
$replace[] = sprintf('<span>%s</span>', $ltr);
}
$stringA = str_replace($find, $replace, $stringA);
Simple example: http://ideone.com/3CKctq
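One caveat with the str_replace() version: it applies the search/replace pairs one after another, so a later pair can hit text inserted by an earlier one (for example the "a" inside an already inserted "<span>"). It works for 'aeiou' because "a" happens to be processed first. A sketch that avoids the ordering issue with strtr(), which does not rescan text it has already replaced; tagCharsSinglePass() is a hypothetical name:

```php
<?php
// Hypothetical helper: non-regex wrapping that is independent of character order.
function tagCharsSinglePass(string $text, string $chars): string
{
    if ($chars === '') {
        return $text;
    }
    // Build a map such as 'a' => '<span>a</span>' for every listed character.
    $map = [];
    foreach (str_split($chars) as $ch) {
        $map[$ch] = '<span>' . $ch . '</span>';
    }
    // strtr() with an array never re-processes its own replacements.
    return strtr($text, $map);
}

echo tagCharsSinglePass('sea', 'ea'), "\n"; // s<span>e</span><span>a</span>
```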

Strlen to strip every [x] characters

I'm trying to strip every third character (in the example, a period). Below is my best guess, and it is as close as I've gotten, but I'm missing something, probably minor. Also, would this method (if I could get it working) be better than a regex match and remove?
$arr = 'Ha.pp.yB.ir.th.da.y';
$strip = '';
for ($i = 1; $i < strlen($arr); $i += 2) {
$arr[$i] = $strip;
}
One way you can do it is:
<?php
$oldString = 'Ha.pp.yB.ir.th.da.y';
$newString = "";
for ($i = 0; $i < strlen($oldString ); $i++) // loop the length of the string
{
if (($i+1) % 3 != 0) // skip every third letter
{
$newString .= $oldString[$i]; // build up the new string
}
}
// $newString is HappyBirthday
echo $newString;
?>
Alternatively the explode() function might work, if the letter you're trying to remove is always the same one.
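A minimal sketch of the explode() idea. Note that it removes every occurrence of the separator wherever it appears, not just every third character, so it only fits when the character to drop never occurs elsewhere. stripSeparator() is a hypothetical name and the separator is assumed to be non-empty:

```php
<?php
// Hypothetical helper: removes every occurrence of one separator string.
function stripSeparator(string $text, string $separator): string
{
    // explode() splits on the separator; implode() glues the pieces back together.
    return implode('', explode($separator, $text));
}

echo stripSeparator('Ha.pp.yB.ir.th.da.y', '.'), "\n"; // HappyBirthday
```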
This might work:
echo preg_replace('/(..)./', '$1', 'Ha.pp.yB.ir.th.da.y');
To make it general purpose:
echo preg_replace('/(.{2})./', '$1', $str);
where 2 in this context means you are keeping two characters, then discarding the next.
A way of doing it:
$old = 'Ha.pp.yB.ir.th.da.y';
$arr = str_split($old); #break string into an array
#remove every third character; each removal shifts the remaining items
#left by one, so the next item to remove is always 2 positions further on
for ($i = 2; $i < count($arr); $i += 2) {
    #remove current array item
    array_splice($arr, $i, 1);
}
$new = implode($arr); #join it back
Or, with a regular expression:
$old = 'Ha.pp.yB.ir.th.da.y';
$new = preg_replace('/(..)\./', '$1', $old);
#selects any two characters followed by a dot character
#alternatively, if you know that the two characters are letters,
#change the regular expression to:
#/(\w{2})\./
I'd just use array_map and a callback function. It'd look roughly like this:
function remove_third_char( $text ) {
return substr( $text, 0, 2 );
}
$text = 'Ha.pp.yB.ir.th.da.y';
$new_text = str_split( $text, 3 );
$new_text = array_map( "remove_third_char", $new_text );
// do whatever you want with new array
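To get a string back, the chunks can be joined again with implode(). A sketch that generalizes the idea to "drop every n-th character"; removeEveryNth() is a hypothetical name and $n is assumed to be at least 2:

```php
<?php
// Hypothetical helper: drops every $n-th character of $text.
function removeEveryNth(string $text, int $n): string
{
    // Split into chunks of $n characters; the last chunk may be shorter.
    $chunks = str_split($text, $n);
    // Keep the first $n - 1 characters of each chunk, dropping the $n-th.
    $kept = array_map(function (string $chunk) use ($n): string {
        return substr($chunk, 0, $n - 1);
    }, $chunks);
    return implode('', $kept);
}

echo removeEveryNth('Ha.pp.yB.ir.th.da.y', 3), "\n"; // HappyBirthday
```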

Unable to find tokens in string

I am trying to write a small PHP application and I am facing a problem.
It is supposed to take text like:
this is *noun but it is *name.
It should take the words that start with a star and add them to the string $tokens. However, this is not working.
// get list of fields (each should have words delimited by underscores)
$storyArray = split(' ', $story);
$tokens = ""; // space-delimited list of fields
for ($i = 0; $i < count($storyArray); $i++) {
if ($storyArray[$i][0] == '*')
$tokens .= $storyArray[$i] + " ";
}
$tokensArray = split(' ', $tokens);
Wow, I can't believe I've been debugging this and missing the obvious fault!
This line here:
$tokens .= $storyArray[$i] + " ";
You must concatenate with a period (.), not a plus sign! What you have right now is basically the same as $tokens .= 0;
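A tiny illustration of the difference between the two operators, using numeric strings so that it runs the same way on old and new PHP versions:

```php
<?php
// "+" is arithmetic: numeric strings are converted to numbers first.
$sum = "3" + "4";
// "." is concatenation: both operands are joined as strings.
$joined = "3" . "4";

var_dump($sum);    // int(7)
var_dump($joined); // string(2) "34"
```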
This worked for me:
$story = "this is *noun but it is *name";
$storyArray = explode(' ', $story); // split() was removed in PHP 7; explode() does the same job here
$tokens = array();
for ($i = 0; $i < count($storyArray); $i++) {
    if ($storyArray[$i][0] == '*') {
        array_push($tokens, substr($storyArray[$i], 1));
    }
}
var_dump($tokens);
$tokenString = implode(" ", $tokens);
Note that I'm pushing the tokens directly into an array, then imploding it.
"+" is for addition, not string concatenation. It casts its arguments to numbers, which will always be 0 for the strings in your source (on PHP 8 and later, non-numeric strings in arithmetic throw a TypeError instead).
On another note, splitting $tokens is unnecessary. Instead, append tokens to $tokensArray:
$story = "this is *noun but it is *name";
// get list of fields (each should have words delimited by underscores)
$storyArray = explode(' ', $story); // explode() replaces split(), which was removed in PHP 7
$tokens = ""; // space-delimited list of fields
$tokensArray=array();
for ($i = 0; $i < count($storyArray); $i++) {
if ($storyArray[$i][0] == '*') {
$tokens .= $storyArray[$i] . " ";
$tokensArray[] = $storyArray[$i];
}
}
If you only needed $tokens for generating $tokensArray, you can get rid of it. Also, depending on whether you need $storyArray, preg_match_all(...) might be able to replace your code:
preg_match_all('/\*\w+/', $story, $tokensArray);
$tokensArray = $tokensArray[0];
You can also use a regular expression to achieve the same effect, without all the string manipulation you are doing right now. This would be the most elegant solution:
$string = "this is *noun but it is *name";
// Let's set up an empty array
$tokens = array();
preg_match_all('/\*\w+/m', $string, $tokens);
$tokens = $tokens[0]; // Only one sub-pattern, dropping unnecessary dimension.
var_dump($tokens);
Regular expressions exist for exactly the kind of task you are trying to achieve here. They are usually faster than doing the string manipulation manually (the regular expression engine in PHP is compiled code).
To explain my regex:
/: start boundary
\*: an asterisk (*)
\w: any alpha-numeric character or underscore
+: previous marker, 1 or more times. (match \w one or more times)
/: end boundary
m: multiline modifier (not strictly needed here, since the pattern has no ^ or $ anchors)
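If the token names are wanted without the leading asterisk, a capture group can do that in the same call. A minimal sketch; extractTokens() is a hypothetical name:

```php
<?php
// Hypothetical helper: returns token names without the leading asterisk.
function extractTokens(string $story): array
{
    // \*(\w+) matches an asterisk, then captures the word that follows it.
    preg_match_all('/\*(\w+)/', $story, $matches);
    return $matches[1]; // group 1 holds the words without the "*"
}

print_r(extractTokens('this is *noun but it is *name.'));
```

Use $matches[0] instead of $matches[1] to keep the asterisks, as in the answer above.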
Replace
$tokens .= $storyArray[$i] + " ";
with
$tokens .= $storyArray[$i]." ";
And
$tokensArray = split(' ', $tokens);
with
$tokensArray = explode(' ', rtrim($tokens)); // explode() instead of split(), which was removed in PHP 7
$tokens .= $storyArray[$i] + " ";
In this line, you should be using the . operator to concatenate strings.
