So I am trying to make a morse code encoder/decoder; I got the encoder done, but the decoder is giving me some problems.
So, if I use the function test and input "ab" it will return "ab". If however, I input "a b" it returns "c d" (as it should, 100% working)
function test($code){
$search = array('/\ba\b/', '/\bb\b/');
$replace = array('c', 'd');
return preg_replace($search, $replace, $code);
}
BUT when I use the function morsedecode and input ".- -..." it doesn't do anything and returns ".- -...".
function morsedecode($code){
$search = array('/\b.-\b/', '/\b-...\b/');
$replace = array('a', 'b');
return preg_replace($search, $replace, $code);
}
I am stuck because it doesn't seem to work for symbols the way it does for letters and words. Does anyone know the reason for this, and is there any way to work around it in PHP?
Update
If all your characters are surrounded by spaces (or beginning/end of line), you will probably find it easier to use strtr rather than a regex based approach. Since strtr replaces longest matches first, you don't have to worry about (for example) -.- (k) being partially replaced as -a.
function morsedecode($code){
$search = array('.-', '-...');
$replace = array('a', 'b');
return strtr($code, array_combine($search, $replace));
}
echo morsedecode(".- -...");
Output:
a b
Demo on 3v4l.org
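As a rough sketch of the same idea taken one step further: assuming letters are separated by spaces and words by "/" (as the asker's own workaround further down suggests), the input can be split into tokens and each token looked up in a table. The function name morse_decode_tokens and the deliberately partial table are hypothetical, for illustration only.

```php
<?php
// Hypothetical helper: decodes space-separated Morse symbols, with "/"
// assumed to separate words. The table is partial on purpose.
function morse_decode_tokens(string $code): string
{
    $table = ['.-' => 'a', '-...' => 'b', '-.-.' => 'c', '-.-' => 'k'];
    $words = [];
    foreach (explode('/', $code) as $word) {
        $letters = '';
        // Split on runs of whitespace and skip empty pieces.
        foreach (preg_split('/\s+/', $word, -1, PREG_SPLIT_NO_EMPTY) as $symbol) {
            // Unknown symbols become "?" instead of being dropped silently.
            $letters .= $table[$symbol] ?? '?';
        }
        $words[] = $letters;
    }
    return implode(' ', $words);
}

echo morse_decode_tokens('.- -... / -.-'), "\n"; // ab k
```

Because every symbol is matched as a whole token, a longer code such as -.- can never be partially decoded as a shorter one.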
Original Answer
Your problem is that \b matches a word boundary i.e. the place where the character to the left is a word character (a-zA-Z0-9_) and the character to the right a non-word character (or vice versa). Since you have no word characters in your input string, you can never match a word boundary. Instead, you could use lookarounds for a character which is not a dot or a dash:
function morsedecode($code){
$search = array('/(?<![.-])\.-(?![.-])/', '/(?<![.-])-\.\.\.(?![.-])/');
$replace = array('a', 'b');
return preg_replace($search, $replace, $code);
}
echo morsedecode(".- -...");
Output
a b
Demo on 3v4l.org
Note that . is a special character in regex (matching any character) and needs to be escaped, otherwise it will match a - as well as a ..
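A quick check of the escaping point, as a minimal sketch (the ^ and $ anchors are added here only to make the comparison unambiguous):

```php
<?php
// Unescaped: "." matches any character, so the pattern also accepts "--".
var_dump(preg_match('/^.-$/', '--'));  // int(1)

// Escaped: "\." matches only a literal dot.
var_dump(preg_match('/^\.-$/', '--')); // int(0)
var_dump(preg_match('/^\.-$/', '.-')); // int(1)
```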
\b is a word boundary, which is any of the following:
1. Before the first character in the string, if the first character is a word character.
2. After the last character in the string, if the last character is a word character.
3. Between two characters in the string, where one is a word character and the other is not a word character.
'/\b.-\b/'
The first \b in this pattern does not match in .- -... because of #1: a boundary exists before the first character only if that character is a word character.
A word character is an ASCII letter, digit, or underscore, so . is not a word character.
Also, you need to escape . characters like \..
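To see the difference between the two inputs from the question, the number of positions where \b matches can be counted. This is only an illustrative sketch; count_word_boundaries is a made-up helper name.

```php
<?php
// Hypothetical helper: counts the positions in $s where \b matches.
function count_word_boundaries(string $s): int
{
    return (int) preg_match_all('/\b/', $s);
}

echo count_word_boundaries('a b'), "\n";     // 4 (before and after each letter)
echo count_word_boundaries('.- -...'), "\n"; // 0 (no word characters at all)
```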
Try looking for \s* (any number of white spaces) instead of a word boundary.
function morsedecode($code){
$search = array('/\s*\.-\s*/', '/\s*-\.\.\.\s*/');
$replace = array('a', 'b');
return preg_replace($search, $replace, $code);
}
Example
https://regex101.com/r/LCZXCn/1
I ended up coming up with my own little fix for the problem:
function morsedecode($code){
$bd_code = str_replace(array('.', '-', '/'), array('dot', 'dash', '~slash~'), $code);
$search = array('/\bdotdash\b/', '/\bdashdotdotdot\b/', '/\bdashdotdashdot\b/', 'etc..');
$replace = array('a', 'b', 'c', 'etc..');
$string = preg_replace($search, $replace, $bd_code);
return str_replace(array(' ', '~slash~'), array('', ' '), $string);
}
Definitely not the most efficient, but it gets the job done. @Nick's answer is definitely the more efficient way to go.
Related
I'm making a function that detects and removes all trailing special characters from a string. It should convert strings like:
"hello-world"
"hello-world/"
"hello-world--"
"hello-world/%--+..."
into "hello-world".
Does anyone know a trick for this that doesn't require a lot of code?
Just for fun
[^a-z\s]+
Regex demo
Explanation:
[^x]: One character that is not x
\s: "whitespace character": space, tab, newline, carriage return, vertical tab
+: One or more
PHP:
$re = "/[^a-z\\s]+/i";
$str = "Hello world\nhello world/\nhello world--\nhellow world/%--+...";
$subst = "";
$result = preg_replace($re, $subst, $str);
Try this:
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
Or, to keep apostrophes in the string as well:
$string = preg_replace('/[^A-Za-z0-9\-\']/', '', $string); // Also keeps apostrophes.
You could use a regex like this, depending on your definition of "special characters":
function clean_string($input) {
return preg_replace('/\W+$/', '', $input);
}
It replaces any characters that are not a word character (\W) at the end of the string $ with nothing. \W will match [^a-zA-Z0-9_], so anything that is not a letter, digit, or underscore will get replaced. To specify which characters are special chars, use a regex like this, where you put all your special chars within the [] brackets:
function clean_string($input) {
return preg_replace('/[\/%.+-]+$/', '', $input);
}
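Applied to the four inputs from the question, the trailing-anchor approach can be sketched as follows. The assumption here is that "special" means anything other than an ASCII letter or digit; strip_trailing_specials is a hypothetical name.

```php
<?php
// Hypothetical helper: removes a trailing run of non-alphanumeric characters.
function strip_trailing_specials(string $input): string
{
    // The $ anchor restricts the match to the end of the string, so the
    // hyphen inside "hello-world" is left alone.
    return preg_replace('/[^A-Za-z0-9]+$/', '', $input);
}

$inputs = ['hello-world', 'hello-world/', 'hello-world--', 'hello-world/%--+...'];
foreach ($inputs as $s) {
    echo strip_trailing_specials($s), "\n"; // hello-world (each time)
}
```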
This is the one you are looking for:
([^\n\w\d \"]*)$
It removes any trailing characters that are not word characters (letters, digits, underscore), spaces, or newlines.
Just call it like this :
preg_replace('/([^\n\w\s]*)$/', '', $string);
I'm trying to remove all words of less than 3 characters from a string, specifically with RegEx.
The following doesn't work because it is looking for double spaces. I suppose I could convert all spaces to double spaces beforehand and then convert them back after, but that doesn't seem very efficient. Any ideas?
$text='an of and then some an ee halved or or whenever';
$text=preg_replace('# [a-z]{1,2} #',' ',' '.$text.' ');
echo trim($text);
Removing the Short Words
You can use this:
$replaced = preg_replace('~\b[a-z]{1,2}\b~', '', $yourstring);
Explanation
\b is a word boundary that matches a position where one side is a word character (a letter, digit, or underscore) and the other side is not (for instance a space character, or the beginning of the string)
[a-z]{1,2} matches one or two letters
\b another word boundary
Replace with the empty string.
Option 2: Also Remove Trailing Spaces
If you also want to remove the spaces after the words, we can add \s* at the end of the regex:
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $yourstring);
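As a small sketch, here is the second option applied to the string from the question. The wrapper name remove_short_words, the i modifier, and the final trim() are additions for illustration.

```php
<?php
// Hypothetical wrapper: removes words of one or two ASCII letters together
// with the whitespace that follows them.
function remove_short_words(string $text): string
{
    $result = preg_replace('~\b[a-z]{1,2}\b\s*~i', '', $text);
    // A kept word at the end may leave a trailing space behind.
    return trim($result);
}

echo remove_short_words('an of and then some an ee halved or or whenever'), "\n";
// and then some halved whenever
```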
Reference
Word Boundaries
You can use the word boundary tag: \b:
Replace: \b[a-z]{1,2}\b with ''
Use this
preg_replace('/(\b.{1,2}\s)/','',$your_string);
Although some of the solutions here work, they had a problem with my language's "multichar characters", such as "ch". A simple explode and implode worked for me.
$maxWordLength = 3;
$string = "my super string";
$exploded = explode(" ", $string);
foreach($exploded as $key => $word) {
if(mb_strlen($word) < $maxWordLength) unset($exploded[$key]);
}
$string = implode(" ", $exploded);
echo $string;
// outputs "super string"
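The same split-filter-join idea can be written a little more compactly. This is a sketch that assumes UTF-8 input and the mbstring extension (which the snippet above already relies on); drop_short_words and $minLength are hypothetical names.

```php
<?php
// Hypothetical helper: keeps only words with at least $minLength characters.
function drop_short_words(string $text, int $minLength = 3): string
{
    // Split on any run of whitespace and ignore empty pieces.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $kept = array_filter($words, function ($word) use ($minLength) {
        // mb_strlen counts characters rather than bytes.
        return mb_strlen($word, 'UTF-8') >= $minLength;
    });
    return implode(' ', $kept);
}

echo drop_short_words('my super string'), "\n"; // super string
```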
To me, it seems that this works fine with most PHP versions:
$string2 = preg_replace("/\b[a-zA-Z0-9]{1,2}\b/i", "", trim($string1));
where [a-zA-Z0-9] is the accepted range of letters and digits.
I have this code which does what I want, but in four steps. I want only alphanumeric lower case letters, and space should be replaced by an underscore. I have written this function but want to learn if it's possible with one preg_replace() function call:
$str = 'qwerty!##$##$^##$Hello %#$sdsdsss';
$cityu= strtolower($str);
$id = str_replace(' ', '_', $cityu);
$outcome = preg_replace("/[^a-zA-Z0-9_]/", "", $id);
var_dump($outcome);
I want one preg_replace() to do this.
You can't use a single replacement string since you're doing two distinct operations, but preg_replace() can take arrays as arguments. This allows you to make multiple sets of replacements in a single preg_replace() call.
$str = strtolower('qwerty!##$##$^##$Hello %#$sdsdsss');
echo preg_replace(array('/ /', '/\W/'), array('_', ''), $str);
// output: qwertyhello_sdsdsss
\W is a negated shorthand character class which is equivalent to [^A-Za-z0-9_].
Note that replacement order matters. Replacements will occur in the order they're listed, so you would get a different result in reverse order: first non-word characters would be replaced, then spaces, but the spaces will already have been removed in the first step.
echo preg_replace(array('/\W/', '/ /'), array('', '_'), $str);
// output: qwertyhellosdsdsss
I know this question has been asked several times for sure, but I have my problems with regular expressions... So here is the (simple) thing I want to do in PHP:
I want to make a function which replaces unwanted characters of strings. Accepted characters should be:
a-z A-Z 0-9 _ - + ( ) { } # äöü ÄÖÜ space
I want all other characters to change to a "_". Here is some sample code, but I don't know what to fill in for the ?????:
<?php
// sample strings
$string1 = 'abd92 s_öse';
$string2 = 'ab! sd$ls_o';
// Replace unwanted chars in string by _
$string1 = preg_replace(?????, '_', $string1);
$string2 = preg_replace(?????, '_', $string2);
?>
Output should be:
$string1: abd92 s_öse (the same)
$string2: ab_ sd_ls_o
I was able to make it work for a-z, 0-9 but it would be nice to allow those additional characters, especially äöü. Thanks for your input!
To allow only the exact characters you described:
$str = preg_replace("/[^a-zA-Z0-9_+(){}#äöüÄÖÜ -]/", "_", $str);
To allow all whitespace, not just the (space) character:
$str = preg_replace("/[^a-zA-Z0-9_+(){}#äöüÄÖÜ\s-]/", "_", $str);
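To allow letters from different alphabets -- not just the specific ones you mentioned, but also things like Russian and Greek, or other types of accent marks (note the u modifier, which treats the pattern and subject as UTF-8 so that \w can match non-ASCII letters):
$str = preg_replace("/[^\w+(){}#\s-]/u", "_", $str);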
If I were you, I'd go with the last one. Not only is it shorter and easier to read, but it's less restrictive, and there's no particular advantage to blocking stuff like и if äöüÄÖÜ are all fine.
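For a quick check against the sample strings from the question, here is a minimal sketch of the strict whitelist. It assumes the source file and the input are UTF-8 and adds the u modifier so that each multibyte character is handled as a single character; replace_unwanted is a made-up name.

```php
<?php
// Hypothetical helper: replaces every character outside the whitelist with "_".
// With the u modifier, preg_replace returns null for invalid UTF-8 input,
// hence the nullable return type.
function replace_unwanted(string $s): ?string
{
    return preg_replace('/[^a-zA-Z0-9_+(){}#äöüÄÖÜ -]/u', '_', $s);
}

echo replace_unwanted('abd92 s_öse'), "\n"; // abd92 s_öse
echo replace_unwanted('ab! sd$ls_o'), "\n"; // ab_ sd_ls_o
echo replace_unwanted('café'), "\n";        // caf_
```

The last line shows why UTF-8 mode matters: the é is replaced by exactly one underscore instead of being split into bytes.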
Replace [^a-zA-Z0-9_\-+(){}#äöüÄÖÜ ] with _.
$string1 = preg_replace('/[^a-zA-Z0-9_\-+(){}#äöüÄÖÜ ]/', '_', $string1);
This replaces any characters except the ones after ^ in the [character set]
Edit: escaped the - dash.
The following function strips some words into an array, adjusts whitespaces and does something else I need. I also need to remove dashes, as I write them as words too. But this function doesn't remove dashes. What's wrong?
function stripwords($string)
{
// build pattern once
static $pattern = null;
if ($pattern === null) {
// pull words to remove from somewhere
$words = array('alpha', 'beta', '-');
// escape special characters
foreach ($words as &$word) {
$word = preg_quote($word, '#');
}
// combine to regex
$pattern = '#\b(' . join('|', $words) . ')\b\s*#iS';
}
$print = preg_replace($pattern, '', $string);
list($firstpart)=explode('+', $print);
return $firstpart;
}
To answer your question, the problem is the \b, which designates a word boundary. A hyphen is not a word character, so when it has a space before or after it, as in " - ", there is no word boundary next to it and the pattern cannot match.
From http://www.regular-expressions.info/wordboundaries.html:
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
A "word character" is a character that can be used to form words.
A simple solution:
By adding \s along with \b to your pattern and using a positive look-behind and a positive look-ahead, you should be able to solve your problem.
$pattern = '#(?<=\b|\s|\A)(' . join('|', $words) . ')(?=\b|\s|\Z)\s*#iS';
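A variation on this idea, offered only as a sketch, replaces \b entirely with whitespace-based lookarounds: (?<!\S) means "not preceded by a non-space character" and (?!\S) means "not followed by one". Note that this is a different technique from the pattern above, and it is stricter: a listed word directly followed by punctuation (such as "alpha,") is no longer removed. The function name strip_listed_words and the final trim() are assumptions for illustration.

```php
<?php
// Hypothetical helper: removes listed words only when they stand alone
// between whitespace (or at the start or end of the string).
function strip_listed_words(string $string, array $words): string
{
    // Escape regex metacharacters in each word, including the "#" delimiter.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '#');
    }, $words);
    $pattern = '#(?<!\S)(?:' . implode('|', $quoted) . ')(?!\S)\s*#i';
    return trim(preg_replace($pattern, '', $string));
}

$words = ['alpha', 'beta', '-'];
echo strip_listed_words('alpha - gamma beta', $words), "\n"; // gamma
echo strip_listed_words('well-known alpha', $words), "\n";   // well-known
```

The second call shows that a hyphen inside a word is left alone, which a plain str_replace('-', '', ...) would not do.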
Nowhere in your regex pattern are you looking for dashes. Why not just do
$string = str_replace('-', '', $string);
after you do your regex stuff?