PHP: replace characters and make exceptions (preg_replace) - php

How do I:
1. replace characters in a word using preg_replace(), but make an exception if they are part of a certain word?
2. replace an uppercase character with an uppercase replacement even if the replacement is lowercase, and vice versa?
Example:
$string = 'Newton, Einstein and Edison. end';
echo preg_replace('/n/i', '<b>n</b>', $string);
Current output: <b>n</b>ewto<b>n</b>, Ei<b>n</b>stei<b>n</b> a<b>n</b>d Ediso<b>n</b>. e<b>n</b>d
Desired output: <b>N</b>ewto<b>n</b>, Ei<b>n</b>stei<b>n</b> a<b>n</b>d Ediso<b>n</b>. end
In this case I want every letter n to be replaced unless it is part of the word "end", and "Newton" should not change to "newton".

echo preg_replace('/((?<!\be)n|n(?!d\b))/i', '<b>\1</b>', $string);
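It matches any letter 'n' that is either not preceded by [word boundary + e] or not followed by [d + word boundary], so only the n in the standalone word "end" is skipped. Because the replacement reuses the captured text (\1), the original case of each letter is preserved.
The general case: '/((?<!\b$PREFIX)$LETTER|$LETTER(?!$SUFFIX\b))/i'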
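As a rough sketch of the general case, the pattern can be assembled from its three parts with preg_quote(). The function name boldLetterExcept() and its parameters are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: wrap every occurrence of $letter in <b> tags,
// except where it sits inside the whole word $prefix . $letter . $suffix.
function boldLetterExcept(string $text, string $letter, string $prefix, string $suffix): string
{
    $l = preg_quote($letter, '/');
    $p = preg_quote($prefix, '/');
    $s = preg_quote($suffix, '/');

    // Same shape as the general pattern above; \1 keeps the matched case.
    $pattern = '/((?<!\b' . $p . ')' . $l . '|' . $l . '(?!' . $s . '\b))/i';

    return preg_replace($pattern, '<b>\1</b>', $text);
}

echo boldLetterExcept('Newton, Einstein and Edison. end', 'n', 'e', 'd');
// <b>N</b>ewto<b>n</b>, Ei<b>n</b>stei<b>n</b> a<b>n</b>d Ediso<b>n</b>. end
```

The lookbehind requires a fixed-length prefix, which a literal string passed through preg_quote() always is.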

Related

How to remove a character from a string only if it follows a number?

I have several rows of data that are in address format, I want to remove the house number from each address.
So far I have been able to remove the number using:
<?php
$string = '25a Test Lane';
if (preg_match("/[0-9]/", $string)) {
    $string = preg_replace("/[0-9]/", "", $string);
}
?>
$string then becomes 'a Test Lane' - but how would I go about removing 'a' as well? Bearing in mind the 'a' could be any letter following a number. I'd want to remove any character that immediately follows the number (no space in between).
You can use
trim(preg_replace("/\b\d+[a-zA-Z]*\b/", "", $string))
NOTE: if you only want to allow a single letter after the number, replace * with ? in [a-zA-Z]*:
trim(preg_replace("/\b\d+[a-zA-Z]?\b/", "", $string))
Details:
\b - a word boundary
\d+ - one or more digits
[a-zA-Z]* - zero or more ASCII letters (or [a-zA-Z]? - zero or one ASCII letter)
\b - a word boundary.
The trim() call removes the space left behind at the start of the string. In PHP:
$string = '25a Test Lane';
$string = trim(preg_replace("/\b\d+[a-zA-Z]*\b/", "", $string));
echo $string;
// => Test Lane
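To see how the * and ? variants differ, here is a small sketch. The helper stripHouseNumber() and its $singleLetterOnly flag are hypothetical names introduced only for this comparison.

```php
<?php
// Hypothetical helper: remove number tokens (digits plus optional trailing
// letters) anywhere in the string, then trim the leftover whitespace.
function stripHouseNumber(string $address, bool $singleLetterOnly = false): string
{
    $letters = $singleLetterOnly ? '[a-zA-Z]?' : '[a-zA-Z]*';

    return trim(preg_replace('/\b\d+' . $letters . '\b/', '', $address));
}

echo stripHouseNumber('25a Test Lane'), "\n";        // Test Lane
echo stripHouseNumber('12 High Street'), "\n";       // High Street
echo stripHouseNumber('7bc Mill Road'), "\n";        // Mill Road
echo stripHouseNumber('7bc Mill Road', true), "\n";  // 7bc Mill Road
```

In the last call the trailing \b cannot match between "b" and "c", so the pattern finds nothing and the string is returned unchanged.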

Match exact word with any character regex

How do I match an exact word that may contain special characters?
$string = 'Fall in love with #PepsiMoji! Celebrate #WorldEmojiDay by downloading our keyboard # http://bit.ly/pepsiKB & take your text game up a notch. - teacher';
preg_match("/\b#worldemojiday\b/i",$string); //false
I want to match an exact word containing any character. For example, searching for the word 'download' in this string should return false:
preg_match("/\bdownload\b/i",$string); //false
But searching for 'downloading' should return true.
The problem is the \b word boundary before the non-word character #. \b cannot match the position between two non-word (or between two word) characters, and here # is preceded by a space, so you do not get a match.
A solution is either to remove the first \b, or to replace it with \B (a non-word boundary, which matches between two word or two non-word characters).
\B#worldemojiday\b
Or
#worldemojiday\b
Note that \B also matches at the beginning of a string when the first character is a non-word character, as # is here.
Here is a way to build a regex dynamically, adding word boundaries only where necessary:
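$srch = "žvolen";
// Escape regex metacharacters, including the "/" delimiter used below
$srch = preg_quote($srch, '/');
// Add a trailing \b only if the term ends with a word character
if (preg_match('/\w$/u', $srch)) {
    $srch .= '\\b';
}
// Add a leading \b only if the term starts with a word character
if (preg_match('/^\w/u', $srch)) {
    $srch = '\\b' . $srch;
}
echo preg_match("/" . $srch . "/ui", "žvolen is used.");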
Another option is to use lookarounds:
(?<!\w)#WorldEmojiDay(?!\w)
This ensures that there is no word character directly before or after the string.
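The lookaround version can be tried against the string from the question as follows. This is only a sketch; containsExactToken() is a hypothetical helper name, and it assumes the input is valid UTF-8 because of the u modifier.

```php
<?php
// Hypothetical helper: true if $needle occurs with no word character
// directly before or after it (case-insensitive).
function containsExactToken(string $haystack, string $needle): bool
{
    $pattern = '/(?<!\w)' . preg_quote($needle, '/') . '(?!\w)/iu';

    return preg_match($pattern, $haystack) === 1;
}

$string = 'Fall in love with #PepsiMoji! Celebrate #WorldEmojiDay by downloading our keyboard # http://bit.ly/pepsiKB & take your text game up a notch. - teacher';

var_dump(containsExactToken($string, '#worldemojiday')); // bool(true)
var_dump(containsExactToken($string, 'download'));       // bool(false)
var_dump(containsExactToken($string, 'downloading'));    // bool(true)
```

Unlike \b, the lookarounds behave the same way whether the search term starts or ends with a word character or a symbol, so no special-casing is needed.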

How to change all words to upper-case but exclude Roman numerals?

I'm trying to fix some manually typed addresses. I need to apply ucwords on the whole address but I want to keep all the roman numerals in uppercase and the letters after the house number.
VIA PIPPO III 74A
should become:
Via Pippo III 74A
How can I achieve this?
Use a negative lookahead to find words that are not Roman numerals:
/\b(?![LXIVCDM]+\b)([A-Z]+)\b/
Explanation:
\b - assert position at a word boundary
(?! - negative lookahead
[LXIVCDM]+ - match any character from the list one or more times
\b - assert position at a word boundary
) - end of negative lookahead
([A-Z]+) - capture one or more uppercase ASCII letters
\b - assert position at a word boundary
Effectively, this matches any word that isn't composed entirely of the characters in the list [LXIVCDM]. That approximates "any word that is not a Roman numeral", although an ordinary word spelled only with those letters (such as CIVIL) is also left untouched.
Now, use preg_replace_callback() to capture these words, convert them into lower case, and then capitalize the first letter:
$input = 'VIA PIPPO III 74A';
$pattern = '/\b(?![LXIVCDM]+\b)([A-Z]+)\b/';
$output = preg_replace_callback($pattern, function($matches) {
    return ucfirst(strtolower($matches[0]));
}, $input);
var_dump($output);
Output:
string(17) "Via Pippo III 74A"
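If words such as CIVIL should still be title-cased, one possible refinement is to test each word against a stricter Roman numeral grammar inside the callback. This is a sketch under that assumption; isRomanNumeral() and titleCaseExceptRoman() are hypothetical names, not part of the answer above.

```php
<?php
// Hypothetical helper: true only for well-formed Roman numerals.
// (?=.) rejects the empty string, which the rest of the pattern would allow.
function isRomanNumeral(string $word): bool
{
    $pattern = '/^(?=.)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/';

    return preg_match($pattern, $word) === 1;
}

// Hypothetical helper: title-case every all-caps word unless it is a numeral.
function titleCaseExceptRoman(string $input): string
{
    return preg_replace_callback('/\b[A-Z]+\b/', function ($m) {
        return isRomanNumeral($m[0]) ? $m[0] : ucfirst(strtolower($m[0]));
    }, $input);
}

echo titleCaseExceptRoman('VIA CIVIL III 74A');
// Via Civil III 74A
```

Words that happen to be valid numerals, such as MIX or DI, are still kept in uppercase; that ambiguity cannot be resolved by a pattern alone.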
Older code selectively uppercased parts of a string by passing the 'e' (eval) option to mb_eregi_replace(), for example:
$str = mb_eregi_replace('\b([0-9]{1,4}[a-z]{1,2})\b', "strtoupper('\\1')", $str, 'e');
That option was deprecated in PHP 7.1 and removed in PHP 8.0, so the full example below uses preg_replace_callback() in its place. It fixes a manually typed address by capitalizing the first letter of each word while keeping Roman numerals and the letters after the house number (A, B, C) in uppercase:
function ucAddress($str) {
    // first lowercase everything and apply the default ucwords
    $str = ucwords(strtolower($str));
    // now fix what the default ucwords got wrong...
    $upper = function ($m) { return strtoupper($m[0]); };
    // uppercase the letters after the house number (lowercased by strtolower above)
    $str = preg_replace_callback('/\b[0-9]{1,4}[a-z]{1,2}\b/i', $upper, $str);
    // the same for Roman numerals (empty matches are harmless here)
    $str = preg_replace_callback('/\bM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b/i', $upper, $str);
    return $str;
}

PHP preg_match - only allow alphanumeric strings and - _ characters

I need a regex to check whether a string contains only numbers, letters, hyphens or underscores.
$string1 = "This is a string*";
$string2 = "this_is-a-string";
if (preg_match('******', $string1)) {
    echo "String 1 not acceptable";
    // String2 acceptable
}
Code:
if(preg_match('/[^a-z_\-0-9]/i', $string))
{
echo "not valid string";
}
Explanation:
[] => character class definition
^ => negate the class
a-z => chars from 'a' to 'z'
_ => underscore
\- => hyphen '-' (escaped because it sits in the middle of the class)
0-9 => digits (from zero to nine)
The 'i' modifier at the end of the regex makes it case-insensitive. Without it, you would need to add the uppercase characters to the class yourself with A-Z.
Alternatively, since \w already covers letters, digits and the underscore:
if (!preg_match('/^[\w-]+$/', $string1)) {
    echo "String 1 not acceptable";
    // String2 acceptable
}
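One detail worth knowing about the anchored form: without the D modifier, $ also matches just before a single trailing newline. A minimal sketch using \z, which only matches at the very end of the string, follows. The helper name isSlugLike() is hypothetical.

```php
<?php
// Hypothetical helper: letters, digits, underscore and hyphen only,
// with at least one character.
function isSlugLike(string $value): bool
{
    // \z anchors at the very end; $ would also accept one trailing newline.
    return preg_match('/^[\w-]+\z/', $value) === 1;
}

var_dump(isSlugLike('this_is-a-string'));     // bool(true)
var_dump(isSlugLike('This is a string*'));    // bool(false)
var_dump(isSlugLike("abc\n"));                // bool(false)
var_dump(preg_match('/^[\w-]+$/', "abc\n"));  // int(1)
```

Whether the trailing newline matters depends on where the input comes from; for values read from files or form fields it is usually safer to reject it.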
Here is one equivalent of the accepted answer for the UTF-8 world.
if (!preg_match('/^[\p{L}\p{N}_-]+$/u', $string)){
//Disallowed Character In $string
}
Explanation:
[] => character class definition
\p{L} => matches any kind of letter from any language
\p{N} => matches any kind of numeric character
_- => matches underscore and hyphen
+ => quantifier: matches one or more times (greedy)
/u => Unicode modifier. Pattern and subject strings are treated as UTF-8, and escape sequences such as \p{L} match Unicode characters
Note that if the hyphen is the last character in the class definition, it does not need to be escaped. If it appears in the middle of the class definition, it needs to be escaped, as it will otherwise be seen as a range operator rather than a hyphen.
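A short sketch of the Unicode-aware check in use is shown below. isUnicodeSlug() is a hypothetical helper name, and the \u{...} string escapes assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper: Unicode letters and digits plus underscore and hyphen.
function isUnicodeSlug(string $value): bool
{
    // With /u, preg_match() returns false for invalid UTF-8 input,
    // so the strict comparison with 1 rejects that case as well.
    return preg_match('/^[\p{L}\p{N}_-]+$/u', $value) === 1;
}

var_dump(isUnicodeSlug("\u{017E}volen_42"));  // bool(true)
var_dump(isUnicodeSlug("caf\u{00E9}-bar"));   // bool(true)
var_dump(isUnicodeSlug('a b'));               // bool(false)
var_dump(isUnicodeSlug("\xFF"));              // bool(false)
```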
\w\- is probably the best, but here is another alternative:
Use [:alnum:]
if(!preg_match("/[^[:alnum:]\-_]/",$str)) echo "valid";
Why use a regex at all? PHP has built-in functions for most of this:
<?php
$valid_symbols = array('-', '_');
$string1 = "This is a string*";
$string2 = "this_is-a-string";
if (preg_match('/\s/', $string1) || !ctype_alnum(str_replace($valid_symbols, '', $string1))) {
    echo "String 1 not acceptable";
}
?>
preg_match('/\s/', $string1) checks for whitespace.
!ctype_alnum(str_replace($valid_symbols, '', $string1)) strips the allowed symbols and then checks that everything left is alphanumeric.
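The ctype approach can be reduced to a single call, since ctype_alnum() already rejects whitespace. The sketch below also shows one edge case where it disagrees with the regex answers. The helper name isAlnumDashUnderscore() is hypothetical, and the behavior described assumes the default C locale.

```php
<?php
// Hypothetical helper: strip the allowed symbols, then require that
// everything left is alphanumeric.
function isAlnumDashUnderscore(string $value): bool
{
    return ctype_alnum(str_replace(['-', '_'], '', $value));
}

var_dump(isAlnumDashUnderscore('this_is-a-string'));   // bool(true)
var_dump(isAlnumDashUnderscore('This is a string*'));  // bool(false)

// A string made only of allowed symbols becomes empty after stripping,
// and ctype_alnum('') is false, whereas the regex accepts it.
var_dump(isAlnumDashUnderscore('-_-'));                // bool(false)
var_dump(preg_match('/^[\w-]+$/', '-_-'));             // int(1)
```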

Replace symbol if it is preceded and followed by a word character

I want to replace a specific character only if both the previous and the following characters are English letters. In other words, the target character is inside a word, not at its start or end.
For example:
$string = "I am learn*ing *PHP today*";
I want this string to be converted as follows:
$newString = "I am learn'ing *PHP today*";
$string = "I am learn*ing *PHP today*";
$newString = preg_replace('/(\w)\*(\w)/', '$1\'$2', $string);
// $newString = "I am learn'ing *PHP today*"
This will match an asterisk surrounded by word characters (letters, digits, underscores). If you only want to match alphabetic characters, you can do:
preg_replace('/([a-zA-Z])\*([a-zA-Z])/', '$1\'$2', 'I am learn*ing *PHP today*');
The most concise way is to use word boundaries (\b) in your pattern. A word boundary is a zero-width position between a "word" character and a "non-word" character. Since * is a non-word character, the word boundaries on either side require both neighboring characters to be word characters.
No capture groups, no references.
Code:
$string = "I am learn*ing *PHP today*";
echo preg_replace('~\b\*\b~', "'", $string);
Output:
I am learn'ing *PHP today*
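The zero-width approach also behaves differently from the capture-group approach when two asterisks share a neighbor. The input string in this sketch is made up for illustration.

```php
<?php
$string = 'a*b*c and *PHP today*';

// Capture groups consume the neighboring characters, so "b" cannot be both
// the right neighbor of the first * and the left neighbor of the second.
echo preg_replace('/(\w)\*(\w)/', '$1\'$2', $string), "\n";
// a'b*c and *PHP today*

// Word boundaries are zero-width, so every qualifying * is replaced.
echo preg_replace('~\b\*\b~', "'", $string), "\n";
// a'b'c and *PHP today*

// Letters-only variant using lookarounds, which are also zero-width.
echo preg_replace('~(?<=[a-z])\*(?=[a-z])~i', "'", $string), "\n";
// a'b'c and *PHP today*
```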
To replace the asterisk only between alphabetical characters, use [a-z] as a character range, with the i flag to make the regex case-insensitive. Since the character you want to replace is an asterisk, you also need to escape it with a backslash, because an unescaped asterisk means "match zero or more times" in a regular expression.
$newstring = preg_replace('/([a-z])\*([a-z])/i', "$1'$2", $string);
To replace all occurrences of an asterisk surrounded by word characters:
$string = preg_replace('/(\w)\*(\w)/', '$1\'$2', $string);
And to replace the asterisks where they are the start and end characters of a word:
$string = preg_replace('/\*(\w+)\*/', '\'$1\'', $string);
