Replace placeholders which start with # then whole word - php

I need to replace words that start with a hash mark (#) inside a text.
I know how to replace whole words:
preg_replace("/\b".$variable."\b/", $value, $text);
But \b only matches next to a word character, so a word that starts with a hash mark won't be replaced.
I have HTML that contains placeholders such as #companyName, which I want to replace with a value.

\b matches between an alphanumeric character (shorthand \w) and a non-alphanumeric character (\W), counting underscores as alphanumeric. This means, as you have seen, that it won't match before a # (unless that's preceded by an alnum character).
I suggest that you only put \b on a side of your query word where it starts or ends with an alnum character.
So, perhaps something like this (although I don't know any PHP, so this may be syntactically completely wrong):
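// Escape regex metacharacters in the placeholder name
$pattern = preg_quote($variable, '/');
// Add \b only on a side where the placeholder has a word character
if (preg_match('/^\w/', $variable))
    $pattern = '\b'.$pattern;
if (preg_match('/\w$/', $variable))
    $pattern = $pattern.'\b';
// preg_replace returns the new string, so keep the result
$text = preg_replace('/'.$pattern.'/', $value, $text);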

All \b does is match a change between non-word and word characters. Since you know $variable starts with non-word characters, you just need to precede the match by a non-word character (\W).
However, since you are replacing, you either need to make the non-word match zero-width, i.e. a look-behind:
preg_replace("/(?<=\\W)".$variable."\\b/", $value, $text);
or incorporate the matched character into the replacement text:
preg_replace("/(\\W)".$variable."\\b/", '${1}'.$value, $text);
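Note that both variants above require some non-word character before the placeholder, so a placeholder at the very start of the text is not matched. A minimal sketch of one way around that, assuming a negative look-behind `(?<!\w)` is acceptable instead of `(?<=\W)`; the function name `replacePlaceholder` is hypothetical and not from the answers:

```php
<?php
// Hypothetical helper: replace one placeholder such as "#companyName" in $text.
function replacePlaceholder(string $text, string $variable, string $value): string
{
    // (?<!\w) and (?!\w) are zero-width: no word character may sit directly
    // before or after the placeholder, and nothing extra is consumed.
    $pattern = '/(?<!\w)' . preg_quote($variable, '/') . '(?!\w)/';

    // A callback keeps "$1" or "\1" inside $value from being read as a backreference.
    return preg_replace_callback($pattern, function () use ($value) {
        return $value;
    }, $text);
}

echo replacePlaceholder(
    '#companyName hires. Contact #companyName, not #companyNameOld.',
    '#companyName',
    'Acme'
);
```

With this sketch, "#companyNameOld" is left alone because a word character follows the placeholder name.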

Why not just
preg_replace("/#\b".$variable."\b/", $value, $text);

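The following expression can also be used to mark boundaries around words that contain non-word characters. The boundary characters are captured and put back, because otherwise they would be removed together with the placeholder:
preg_replace("/(^|\s|\W)".$variable."($|\s|\W)/", '${1}'.$value.'${2}', $text);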

Related

How do I escape the brackets in a mysql REGEXP [duplicate]

I have a regular expression to escape all special characters in a search string. This works great, however I can't seem to get it to work with word boundaries. For example, with the haystack
add +
or
add (+)
and the needle
+
the regular expression /\+/gi matches the "+". However the regular expression /\b\+/gi doesn't. Any ideas on how to make this work?
Using
add (plus)
as the haystack and /\bplus/gi as the regex, it matches fine. I just can't figure out why the escaped characters are having problems.
\b is a zero-width assertion: it doesn't consume any characters, it just asserts that a certain condition holds at a given position. A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. (A "word character" is a letter, a digit, or an underscore.) In your string:
add +
...there's a word boundary at the beginning because the a is not preceded by a word character, and there's one after the second d because it's not followed by a word character. The \b in your regex (/\b\+/) is trying to match between the space and the +, which doesn't work because neither of those is a word character.
Try changing it to:
/\b\s?\+/gi
Edit:
Extend this concept as far as you want. If you want the first + after any word boundary:
/\b[^+]*\+/gi
Boundaries are very conditional assertions; what they anchor depends on what they touch. See this answer for a detailed explanation, along with what else you can do to deal with it.
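The behaviour described above can be checked with a short sketch. It uses PHP's preg_match as a stand-in, which is an assumption, since the question does not say which engine is in use; the g flag does not exist in PHP and the i flag is dropped because it makes no difference here.

```php
<?php
// Check each pattern against both haystacks from the question.
$haystacks = ['add +', 'add (+)'];
$patterns  = ['/\b\+/', '/\b\s?\+/', '/\b[^+]*\+/'];

$results = [];
foreach ($patterns as $pattern) {
    foreach ($haystacks as $haystack) {
        // preg_match returns 1 on a match and 0 otherwise.
        $results[$pattern][$haystack] = preg_match($pattern, $haystack);
    }
}

print_r($results);
```

In this sketch `/\b\+/` matches neither haystack, `/\b\s?\+/` matches only "add +" (the parenthesis in "add (+)" gets in the way), and `/\b[^+]*\+/` matches both.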

How to remove special characters except for characters "ñ / Ñ" and dash "-" in PHP Laravel

I want to remove special characters in a string but have "ñ / Ñ" and "-" remain.
So if I have:
»¿Antonio Ramon-Peñaą
Result would be:
Antonio Ramon-Peña
You could use PCRE verbs with a set list of allowed characters: if a character matches the list, skip it; every other character (one you don't care about) is removed.
preg_replace('/[\w\h-](*SKIP)(*FAIL)|./u', '', '»¿Antonio Ramon-Peñaą')
For a longer regex explanation see https://regex101.com/r/pCwGuy/1/. Basically [\w\h-] are any horizontal spaces, word characters, or hyphens. The u modifier after the closing delimiter expands the \w to include word characters outside ascii set. The . is any single character.
Alternatively you could match all valid characters then rejoin them.
preg_match_all('/[\w\h-]+/u', '»¿Antonio Ramon-Peñaą', $match);
echo implode('', $match[0]);
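One caveat: with the u modifier, \w may also match other non-ASCII letters such as ą, so those would be kept. If only ASCII letters, digits, ñ/Ñ, horizontal spaces and hyphens should survive, an explicit whitelist is one option. This is a minimal sketch, assuming the source file and the input are UTF-8 and that digits should be kept; the allowed set is a guess at the asker's intent:

```php
<?php
$input = '»¿Antonio Ramon-Peñaą';

// Remove every character that is NOT in the explicit whitelist:
// ASCII letters and digits, ñ, Ñ, horizontal whitespace (\h), and the hyphen.
$clean = preg_replace('/[^A-Za-z0-9ñÑ\h-]/u', '', $input);

echo $clean; // Antonio Ramon-Peña
```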

regex capture certain characters only

I'm currently dealing with a bit of a problem. This is my string: "all-days"
I need some help creating a regex that captures the first character, the dash, and the first character after the dash. I'm a bit of a newbie to regex, so forgive me.
Here is what I've got so far: (^.)
"capture the first character, the dash and also the first character after the dash"
With preg_match function:
$s = "all-days";
preg_match('/^(.)[^-]*(-)(.)/', $s, $m);
unset($m[0]);
print_r($m);
The output:
Array
(
[1] => a
[2] => -
[3] => d
)
It's not regex, but if you just want a solution by other means, it can be achieved with explode, array_walk and implode:
$string = 'all-days-with-my-style';
$arr = explode("-",$string);
array_walk($arr,function(&$a){
$a = $a[0];
});
echo implode("-",$arr);
Live demo : https://eval.in/882846
Output is : a-d-w-m-s
I assume your string only contains word characters and hyphens, and doesn't have consecutive hyphens:
To remove everything that isn't the first character, a hyphen, or the first character after a hyphen, remove every run of word characters that doesn't start at a word boundary:
$result = preg_replace('~\B\w+~', '', 'all-days');
If you only want to match these characters, just catch each character after a word boundary:
if ( preg_match_all('~\b.~', 'all-days', $matches) )
print_r($matches[0]);
Code
See code in use here
\b(\w|-\b)
For more precision, the following can be used (note that it uses Unicode groups, so it doesn't work in every language, but it does in PHP). This will only match letters, not numbers and underscores. It uses a negative lookbehind and positive lookahead, but you can understand it if you keep reading this article and break it apart one piece at a time.
(\b\p{L}|(?<=\p{L})-(?=\p{L}))
Explanation
\b Assert position at a word boundary
(\w|-\b) Capture the following into capture group 1
    \w Match any word character
    | Or
    - Match the - character literally
    \b Assert position at a word boundary
\b:
Asserts the position in the string matches 1 of the following:
^\w Assert position at the start of the string and match a word character
\w$ Match a word character and assert its position as the last position in the string
\W\w Match any non-word character, followed by a word character
\w\W Match any word character, followed by a non-word character
\w:
Means a word character (usually defined by any character in the set a-zA-Z0-9_, however, some languages also accept Unicode characters that represent any letter, number, or underscore \p{L}\p{N}_).
For more precision (depending on the use-case), you can specify [a-zA-Z] (for ASCII letters), \p{L} for Unicode letters, or [a-z] with the i flag for ASCII characters with the case-insensitive flag enabled in regex.
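The pattern \b(\w|-\b) from this answer is given without PHP code. A minimal sketch of how it might be applied with preg_match_all; the helper name `firstCharsAndHyphens` is hypothetical:

```php
<?php
// Hypothetical helper: return the first character of each word plus the hyphens.
function firstCharsAndHyphens(string $s): array
{
    // $m[0] holds every full match in order of appearance.
    preg_match_all('/\b(\w|-\b)/', $s, $m);
    return $m[0];
}

print_r(firstCharsAndHyphens('all-days'));                       // a, -, d
echo implode('', firstCharsAndHyphens('all-days-with-my-style')); // a-d-w-m-s
```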

Regex to remove single characters from string

Consider the following strings
breaking out a of a simple prison
this is b moving up
following me is x times better
All strings are lowercased already. I would like to remove any "loose" a-z characters, resulting in:
breaking out of simple prison
this is moving up
following me is times better
Is this possible with a single regex in php?
$str = "breaking out a of a simple prison
this is b moving up
following me is x times better";
$res = preg_replace("#\\b[a-z]\\b ?#i", "", $str);
echo $res;
How about:
preg_replace('/(^|\s)[a-z](\s|$)/', '$1', $string);
Note this also catches single characters that are at the beginning or end of the string, but not single characters that are adjacent to punctuation (they must be surrounded by whitespace).
If you also want to remove characters immediately before punctuation (e.g. 'the x.'), then this should work properly in most (English) cases:
preg_replace('/(^|\s)[a-z]\b/', '$1', $string);
As a one-liner:
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);
This matches a single lowercase letter (\p{Ll}) which is preceded or followed by whitespace (\s), removing both. The word boundaries (\b) ensure that only single letters are indeed matched. The /u modifier makes the regex Unicode-aware.
The result: A single letter surrounded by spaces on both sides is reduced to a single space. A single letter preceded by whitespace but not followed by whitespace is removed completely, as is a single letter only followed but not preceded by whitespace.
So
This a is my test sentence a. o How funny (what a coincidence a) this is!
is changed to
This is my test sentence. How funny (what coincidence) this is!
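The example above can be reproduced with a short script that applies the one-liner to the test sentence:

```php
<?php
$subject = 'This a is my test sentence a. o How funny (what a coincidence a) this is!';

// Remove a single lowercase letter together with the whitespace before it,
// or, failing that, together with the whitespace after it.
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);

echo $result; // This is my test sentence. How funny (what coincidence) this is!
```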
You could try something like this:
preg_replace('/\b\S\s\b/', "", $subject);
This is what it means:
\b # Assert position at a word boundary
\S # Match a single character that is a “non-whitespace character”
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
\b # Assert position at a word boundary
Update
As raised by Radu, because I've used the \S this will match more than just a-zA-Z. It will also match 0-9_. Normally, it would match a lot more than that, but because it's preceded by \b, it can only match word characters.
As mentioned in the comments by Tim Pietzcker, be aware that this won't work if your subject string needs to remove single characters that are followed by non word characters like test a (hello). It will also fall over if there are extra spaces after the single character like this
test a    hello
but you could fix that by changing the expression to \b\S\s*\b
Try this one:
$sString = preg_replace("#\b[a-z]{1}\b#m", ' ', $sString);

Regex Entire Input Matches Pattern

How can I make a regex pattern for use with the PHP preg_replace function that removes all characters which do not fit in a certain pattern. For example:
[a-zA-Z0-9]
You can negate the character set by using ^:
[^a-zA-Z0-9]
The ^ only negates the existing character set [...] it is in, and it only applies when it is the first character inside the set. You can read more about negated character sets here
So, finally:
preg_replace('/[^a-zA-Z0-9]/', '', $input);
Edit:
As noted in the comments below, you can also add the + quantifier so consecutive invalid characters will be replaced in 1 match of preg_replace's iteration:
preg_replace('/[^a-zA-Z0-9]+/', '', $input);
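As a quick check of the final pattern, here is a minimal sketch; the wrapper name `keepAlphanumeric` and the sample input are made up for illustration:

```php
<?php
// Hypothetical wrapper: strip everything except ASCII letters and digits.
function keepAlphanumeric(string $input): string
{
    // The + groups consecutive invalid characters into a single match.
    return preg_replace('/[^a-zA-Z0-9]+/', '', $input);
}

echo keepAlphanumeric('Hello, World! 123'); // HelloWorld123
```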
