PHP Regex: Remove words less than 3 characters

I'm trying to remove all words of less than 3 characters from a string, specifically with RegEx.
The following doesn't work because it is looking for double spaces. I suppose I could convert all spaces to double spaces beforehand and then convert them back after, but that doesn't seem very efficient. Any ideas?
$text='an of and then some an ee halved or or whenever';
$text=preg_replace('# [a-z]{1,2} #',' ',' '.$text.' ');
echo trim($text);

Removing the Short Words
You can use this:
$replaced = preg_replace('~\b[a-z]{1,2}\b~', '', $yourstring);
Explanation
\b is a word boundary: it matches a position where one side is a word character (a letter, digit, or underscore) and the other side is not (for instance a space character, or the beginning of the string)
[a-z]{1,2} matches one or two letters
\b another word boundary
Replace with the empty string.
Option 2: Also Remove Trailing Spaces
If you also want to remove the spaces after the words, we can add \s* at the end of the regex:
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $yourstring);
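As a quick check, here is a minimal sketch that runs both patterns against the sample string from the question. The variable names are only illustrative, and the extra whitespace clean-up step is an addition, not part of the answer.

```php
<?php
// Sample string from the question.
$text = 'an of and then some an ee halved or or whenever';

// Pattern 1: removes the short words but leaves the spaces around them.
$withGaps = preg_replace('~\b[a-z]{1,2}\b~', '', $text);

// Pattern 2: also consumes the whitespace that follows each short word.
$compact = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $text);

// Collapsing runs of whitespace afterwards tidies the output of pattern 1.
$tidied = trim(preg_replace('~\s+~', ' ', $withGaps));

echo $compact, "\n"; // and then some halved whenever
echo $tidied, "\n";  // and then some halved whenever
```

Pattern 2 can still leave one trailing space when the last word of the string is a short one, so a final trim() is a cheap safeguard.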

You can use the word boundary anchor \b:
Replace \b[a-z]{1,2}\b with ''

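Use this. Note that it only removes a short word that is followed by whitespace, so a short word at the very end of the string is kept:
preg_replace('/(\b.{1,2}\s)/','',$your_string);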

While some of the solutions here work, they had a problem with my language's "multichar characters", such as "ch". A simple explode and implode worked for me.
$minWordLength = 3;
$string = "my super string";
$exploded = explode(" ", $string);
foreach ($exploded as $key => $word) {
    // Drop every word shorter than the minimum length.
    if (mb_strlen($word) < $minWordLength) {
        unset($exploded[$key]);
    }
}
$string = implode(" ", $exploded);
echo $string;
// outputs "super string"
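The same idea can be wrapped in a reusable function. This is only a sketch: removeShortWords and its $minLength parameter are hypothetical names, and it splits on any run of whitespace rather than on a single space.

```php
<?php
// Hypothetical helper: keeps only words with at least $minLength characters.
function removeShortWords(string $text, int $minLength = 3): string
{
    // Split on any run of whitespace and discard empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // mb_strlen() counts characters rather than bytes (needs the mbstring extension).
    $kept = array_filter($words, function (string $word) use ($minLength): bool {
        return mb_strlen($word) >= $minLength;
    });

    return implode(' ', $kept);
}

echo removeShortWords('my super string'), "\n"; // super string
```

Because the words are rejoined with single spaces, no separate clean-up of doubled spaces is needed.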

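To me, it seems that this works fine with most PHP versions:
$string2 = preg_replace('/\b[a-z0-9]{1,2}\b/i', "", trim($string1));
Here [a-z0-9], combined with the i modifier, is the accepted character/number range. Use only one delimiter: a pattern written as /~...\~/ would match only words wrapped in literal ~ characters.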
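The character class decides what counts as a word. As a sketch for text with accented letters, assuming the input is valid UTF-8, the class can be widened to any Unicode letter with \p{L} and the u modifier. It uses explicit lookarounds instead of \b so that the boundary rule is spelled out in the pattern itself; this is a swapped-in variant, not one of the answers above.

```php
<?php
// Sketch: treat any Unicode letter as part of a word (assumes UTF-8 input).
// (?<!\p{L}) and (?!\p{L}) require that no letter sits directly before or after.
$pattern = '/(?<!\p{L})\p{L}{1,2}(?!\p{L})\s*/u';

$text = 'né à Paris en été';
$result = preg_replace($pattern, '', $text);

echo $result, "\n"; // Paris été
```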

Related

How to replace all occurrences of a character except the first one in PHP using a regular expression?

Given an address stored as a single string with newlines delimiting its components like:
1 Street\nCity\nST\n12345
The goal would be to replace all newline characters except the first one with spaces in order to present it like:
1 Street
City ST 12345
I have tried methods like:
[$street, $rest] = explode("\n", $input, 2);
$output = "$street\n" . preg_replace('/\n+/', ' ', $rest);
I have been trying to achieve the same result using a one liner with a regular expression, but could not figure out how.

I would suggest not solving this with a complicated regex but keeping it simple, as below. You can split the string on \n, shift off the first element, and implode the rest with a space.
<?php
$input = explode("\n","1 Street\nCity\nST\n12345");
$input = array_shift($input) . PHP_EOL . implode(" ", $input);
echo $input;

You could use a regex trick here by reversing the string, and then replacing every occurrence of \n provided that we can lookahead and find at least one other \n:
$input = "1 Street\nCity\nST\n12345";
$output = strrev(preg_replace("/\n(?=.*\n)/", " ", strrev($input)));
echo $output;
This prints:
1 Street
City ST 12345

You can use a lookbehind pattern to ensure that the matching line is preceded with a newline character. Capture the line but not the trailing newline character and replace it with the same line but with a trailing space:
preg_replace('/(?<=\n)(.*)\n/', '$1 ', $input)
Demo: https://onlinephp.io/c/5bd6d

You can use an alternation pattern that matches either the first two lines or a newline character, capture the first two lines without the trailing newline character, and replace the match with what's captured and a space:
preg_replace('/(^.*\n.*)\n|\n/', '$1 ', $input)
Demo: https://onlinephp.io/c/2fb2f
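If a single call matters more than a single pattern, a callback with a counter is another option. This is a swapped-in alternative rather than one of the answers' methods, and joinAfterFirstLine is a hypothetical name. One reason to consider it: with an input of only two lines, the alternation pattern above falls through to its second branch and replaces the only newline with a space, while the counter leaves it alone.

```php
<?php
// Hypothetical helper: keep the first newline, turn every later one into a space.
function joinAfterFirstLine(string $input): string
{
    $seen = 0;
    return preg_replace_callback('/\n/', function (array $match) use (&$seen): string {
        // The first match is returned unchanged; later ones become a space.
        return $seen++ === 0 ? "\n" : ' ';
    }, $input);
}

echo joinAfterFirstLine("1 Street\nCity\nST\n12345"), "\n";
// 1 Street
// City ST 12345
```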

Here is another method that needs no regex. It works as long as the address always has exactly four lines:
$string = explode("\n", "1 Street\nCity\nST\n12345");
echo $string[0] . "<br>";
echo $string[1] . " " . $string[2] . " " . $string[3];

Adding space to string based on condition

I have a load of labels which are camel case. Some examples are
whatData
whoData
deliveryDate
importantQuestions
What I am trying to do is this. Any label which has the word Data needs to have this word removed. At the point of the capital letter, I need to provide a space. Finally, everything should be uppercase. I have done the removal of Data and the uppercase by doing this ($data->key is the label)
strtoupper(str_replace('Data', '', $data->key))
The part I am struggling with is adding the spaces between words. So basically the above words should end up like this
WHAT
WHO
DELIVERY DATE
IMPORTANT QUESTIONS
How can I factor in the last part of this?
Thanks

This adds a space before every capital letter that is not already preceded by one. Try this:
$String = 'whatData';
$Words = preg_replace('/(?<!\ )[A-Z]/', ' $0', $String);

Problem
Your regex '~^[A-Z]~' will match only the first capital letter. Check out Meta Characters in the Pattern Syntax for more information.
Your replacement is a newline character '\n' and not a space.
Solution
Use preg_replace(). Try the code below.
$string = "whatData";
echo preg_replace('/(?<!\ )[A-Z]/', ' $0', $string);
Output
what Data

Try the following:
$string = 'importantQuestions';
$string = strtoupper(ltrim(preg_replace('/[A-Z]/', ' $0', $string)));
echo $string;
This will give you output as:
IMPORTANT QUESTIONS

Try this:
preg_split: split on camel case
array_map: uppercase all the elements
implode: join the array with spaces
str_replace: replace `DATA` with an empty string
trim: trim the white space
Put together:
echo trim(str_replace("DATA", "", implode(" ", array_map("strtoupper", preg_split('/(?=[A-Z])/', 'whatData', -1, PREG_SPLIT_NO_EMPTY))))); // WHAT
This gives exactly the result you want.
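The three requirements from the question (drop the word Data, put a space before each inner capital letter, uppercase everything) can also be combined in one small function. This is a sketch only; labelFromKey is a hypothetical name, and it assumes the labels are plain ASCII camel case as in the examples.

```php
<?php
// Hypothetical helper combining the three steps from the question.
function labelFromKey(string $key): string
{
    // Step 1: drop the word "Data".
    $withoutData = str_replace('Data', '', $key);
    // Step 2: insert a space before every capital letter that is inside a word.
    $spaced = preg_replace('/\B(?=[A-Z])/', ' ', $withoutData);
    // Step 3: uppercase the result.
    return strtoupper(trim($spaced));
}

foreach (['whatData', 'whoData', 'deliveryDate', 'importantQuestions'] as $key) {
    echo labelFromKey($key), "\n";
}
// WHAT
// WHO
// DELIVERY DATE
// IMPORTANT QUESTIONS
```

Removing Data before uppercasing keeps the match case-sensitive, so a label such as deliveryDate is not touched by the removal step.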

PHP rtrim all trailing special characters

I'm making a function that detects and removes all trailing special characters from a string. It should convert strings like:
"hello-world"
"hello-world/"
"hello-world--"
"hello-world/%--+..."
into "hello-world".
Does anyone know a trick for this without writing a lot of code?

Just for fun
[^a-z\s]+
Explanation:
[^x]: one character that is not x
\s: a whitespace character (space, tab, newline, carriage return, vertical tab)
+: one or more
PHP:
$re = "/[^a-z\\s]+/i";
$str = "Hello world\nhello world/\nhello world--\nhellow world/%--+...";
$subst = "";
$result = preg_replace($re, $subst, $str);

Try this:
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
or, to keep apostrophes in the string as well:
preg_replace('/[^A-Za-z0-9\-\']/', '', $string); // also keeps apostrophes

You could use a regex like this, depending on your definition of "special characters":
function clean_string($input) {
    return preg_replace('/\W+$/', '', $input);
}
It replaces any characters that are not a word character (\W) at the end of the string $ with nothing. \W will match [^a-zA-Z0-9_], so anything that is not a letter, digit, or underscore will get replaced. To specify which characters are special chars, use a regex like this, where you put all your special chars within the [] brackets:
function clean_string($input) {
    return preg_replace('/[\/%.+-]+$/', '', $input);
}
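To check a pattern like this against the four strings from the question, a small loop is enough. The sketch below assumes that "special" means anything other than an ASCII letter or digit, which is stricter than \W because it also strips trailing underscores; trimTrailingSpecials is a hypothetical name.

```php
<?php
// Hypothetical helper: strip everything after the last ASCII letter or digit.
function trimTrailingSpecials(string $input): string
{
    return preg_replace('/[^A-Za-z0-9]+$/', '', $input);
}

$samples = ['hello-world', 'hello-world/', 'hello-world--', 'hello-world/%--+...'];
foreach ($samples as $sample) {
    echo trimTrailingSpecials($sample), "\n"; // hello-world each time
}
```

The hyphen inside hello-world survives because the pattern is anchored with $ and only a run that reaches the end of the string is removed.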

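This one is what you are looking for:
([^\n\w\d \"]*)$
It removes any trailing run of characters that are not word characters (letters, digits, or underscores), spaces, double quotes, or newlines.
A slightly simpler variant keeps all trailing whitespace instead. Call it like this: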
preg_replace('/([^\n\w\s]*)$/', '', $string);

Finding and replacing all words that ends with 'ing'

I'm trying to find and replace all words that ends with 'ing'. How would I do that?
$text = "dreaming";
if (strlen($text) >= 6) {
if (0 === strpos($text, "ing"))
//replace the last 3 characters of $text <---not sure how to do this either
echo $text;
echo "true";
}
Result:
null
Want Result:
dream
true

You could also use substr
$text = "dreaming";
if (substr($text, (strlen($text) - 3), 3) === 'ing') {
$text = substr($text, 0, (strlen($text) - 3));
}
echo $text;

This should work for replacing ing at the end of words whilst ignoring stuff starting with Ing as well as words with ing in the middle of them.
$output = preg_replace('/(\w)ing([\W]+|$)/i', '$1$2', $input);
Updated to reflect change specified in comments.

You could use one of two regexes depending on what you are trying to accomplish; your question is a bit ambiguous.
echo preg_replace('/([a-zA-Z]+)ing((?:[\s.,;!?]|$))/', '$1$2', $text);
or
echo preg_replace('/.{3}$/', '', $text);
The first regex looks for word characters before an ing and then punctuation marks, white spaces, or the end of the string. The second just takes off the last three characters of the string.

You can use regex and word boundaries.
$str = preg_replace('/\Bing\b/', "", $str);
\B (non word boundary) matches where word characters are sticking together.
Be aware that it also turns king into k.
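One way to soften the king problem is to require a minimum stem length before the ing. The sketch below is an assumption layered on top of the word-boundary idea, not part of the answer: stripIng is a hypothetical name and the threshold of 3 is arbitrary, so words such as going or doing are left unchanged as well.

```php
<?php
// Hypothetical helper: strip a trailing "ing" only when the stem has
// at least $minStem word characters (3 is an arbitrary choice).
function stripIng(string $text, int $minStem = 3): string
{
    $pattern = '/\b(\w{' . $minStem . ',})ing\b/';
    return preg_replace($pattern, '$1', $text);
}

echo stripIng('dreaming king singing'), "\n"; // dream king sing
```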

Regex to insert dot (.) after characters, before new line

I'm reformatting some text, and sometimes I have a string, where there is a sentence which is not ended by a dot.
I'm running various checks for this purpose, and one more I'd like is to "Add dot after last character before new line".
I'm not sure how to form the regular expression for this:
$string = preg_replace("/???/", ".\n", $string);

Try this one:
$string = preg_replace("/(?<![.])(?=[\n\r]|$)/", ".", $string);
The negative lookbehind (?<![.]) checks that the previous character is not a dot.
The positive lookahead (?=[\n\r]|$) checks that the next character is a newline, or that this is the end of the string.

Like this, I suppose:
<?php
$string = "Add dot after last character before new line\n";
$string = preg_replace("/(.)$/", "$1.", $string);
print $string;
?>
This way the dot is added after the word line in the sentence and before the \n. The existing newline is not part of the match, so the replacement should not add another one.
demo : http://ideone.com/J4g7tH

I'd do:
$string = "Add dot after last character before new line\n";
$string = preg_replace("/([^.\r\n])$/s", "$1.", $string);

Thanks for all the answers, but none of them really caught all scenarios right.
I fumbled my way to a good solution using the word boundary assertion:
// Add dot after every word boundary that is followed by a new line.
$string = preg_replace("/\b\n/", ".\n", $string);
Note that \b must be written without square brackets: inside a character class, [\b] stands for a backspace character, not a word boundary.

This works for me:
$content = preg_replace("/(\w+)(\n)/", "$1.$2", $content);
It will match a word immediately followed by a new line, and add a dot in between.
Will match:
Hello\n
Will not match:
Hello \n
or
Hello.\n
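For text with several lines, including a last line that has no trailing newline, a multiline variant may be worth trying. This is a sketch under assumptions, not one of the answers: addMissingDots is a hypothetical name, and it assumes plain "\n" line endings. With "\r\n" endings the character before the line end is "\r", which is not a word character, so those lines would be left unchanged.

```php
<?php
// Hypothetical helper: add a dot at the end of every line that ends in a word character.
function addMissingDots(string $text): string
{
    // With the m modifier, $ matches before each "\n" and at the end of the string.
    // The lookbehind (?<=\w) skips lines that end in a dot, a space, or nothing at all.
    return preg_replace('/(?<=\w)$/m', '.', $text);
}

echo addMissingDots("Hello\nWorld.\nEnd");
// Hello.
// World.
// End.
```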
