preg_replace: add a space before and after punctuation characters - php

I have a word containing some punctuation.
$word = "'Ankara'da!?'";
I want to put a space before and after each punctuation character, except for an apostrophe in the middle of a word.
In the result there must be exactly one space between letters and punctuation characters.
Required result: ' Ankara'da ! ? '
I tried the code below, listing the accented Turkish characters explicitly (because \w didn't work):
preg_replace('/(?![a-zA-Z0-9ğüışöçİĞÜŞÖÇ])/ig', " ", $word);
Result: 'Ankara 'da ! ? '

If you need to only add single spaces between punctuation symbols and avoid adding them at the start/end of the string, you may use the following solution:
$word = "'Ankara'da!?'";
echo trim(preg_replace_callback('~\b\'\b(*SKIP)(*F)|\s*(\p{P}+)\s*~u', function($m) {
    return ' ' . preg_replace('~\X(?=\X)~u', '$0 ', $m[1]) . ' ';
}, $word)); // => ' Ankara'da ! ? '
See the PHP demo.
The \b\'\b(*SKIP)(*F) part matches and skips every ' that is enclosed by word characters (letters, digits, underscores, and some rarer word characters). The \s*(\p{P}+)\s* part matches zero or more whitespace characters, captures one or more punctuation symbols (including _) into Group 1, and then matches zero or more whitespace characters. Inside the callback, a single space is added after each grapheme cluster (\X) that is followed by another one ((?=\X)), which separates adjacent punctuation marks. The outer leading and trailing spaces are then removed with trim().
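To try the approach on more than one input, the same pattern can be wrapped in a function. This is only a sketch: the name spacePunctuation and the extra sample strings are made up for illustration, and the source file is assumed to be UTF-8.

```php
<?php
// Hypothetical helper; the pattern and callback are the ones from the answer above.
function spacePunctuation(string $word): string
{
    $spaced = preg_replace_callback(
        '~\b\'\b(*SKIP)(*F)|\s*(\p{P}+)\s*~u',
        function (array $m): string {
            // Separate adjacent punctuation marks inside the captured run.
            return ' ' . preg_replace('~\X(?=\X)~u', '$0 ', $m[1]) . ' ';
        },
        $word
    );
    // Drop the spaces added at the very start and end.
    return trim($spaced);
}

echo spacePunctuation("'Ankara'da!?'"), "\n";   // ' Ankara'da ! ? '
echo spacePunctuation("Hello,world!"), "\n";    // Hello , world !
echo spacePunctuation("İstanbul'a gel."), "\n"; // İstanbul'a gel .
```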
There is also a way to do that with two nested preg_replace calls:
$word = "'Ankara'da!?'";
echo preg_replace('~^\s+|\s+$|(\s){2,}~u', '$1',
    preg_replace('~(?!\b\'\b)\p{P}~u', ' $0 ', $word)
);
See another PHP demo
The '~(?!\b\'\b)\p{P}~u' pattern matches any punctuation character that is not a ' enclosed by word characters, and the replacement surrounds it with spaces. The '~^\s+|\s+$|(\s){2,}~u' pattern then removes all whitespace at the start and end of the string and shrinks every run of two or more whitespace characters elsewhere to a single one.
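Splitting the two calls makes the intermediate string visible. The variable names $step1 and $step2 are only for illustration.

```php
<?php
$word = "'Ankara'da!?'";

// Step 1: surround every punctuation mark with spaces, except an
// apostrophe that sits between two word characters.
$step1 = preg_replace('~(?!\b\'\b)\p{P}~u', ' $0 ', $word);
echo '[', $step1, "]\n"; // [ ' Ankara'da !  ?  ' ]

// Step 2: strip leading/trailing whitespace and collapse longer runs.
// $1 is empty for the first two alternatives, so those matches vanish.
$step2 = preg_replace('~^\s+|\s+$|(\s){2,}~u', '$1', $step1);
echo '[', $step2, "]\n"; // [' Ankara'da ! ? ']
```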

Related

Regex to match first word until first space not containing numbers in UTF8 string

To find the first word up to the first space I use this regex:
([^\s]+)
But how do I find the first word, up to the next space, that does not contain numbers?
For example string is:
First12word50 Secųond-Word Thirdųword ' result must be Secųond-Word
First1-2word50 Secųond/Word Thirdųword ' result must be Secųond/Word
First1/2word50 Secųond+Word Thirdųword ' result must be Secųond+Word
First1/2word50 Secųond1+Word Thirdų-word ' result must be Thirdų-word
First1/2word50 Sec1ųond1+Word Thirdų-word ' result must be Thirdų-word
First1/2word50 Sec1ųond1+Word Thir11dų-word ' result must be EMPTY
The regex ([^\s(?<!\d)$]+)
returns only
First
You can use
^(?:(?=\S*\d)\S+(?:\s+(?=\S*\d)\S+)*\W*)?\K\S*
See demo
The regex matches:
^ - start of the string
(?:(?=\S*\d)\S+(?:\s+(?=\S*\d)\S+)*\W*)? - one or zero occurrences (i.e. it is optional) of:
    (?=\S*\d)\S+ - one or more non-whitespace characters that contain at least one digit
    (?:\s+(?=\S*\d)\S+)* - zero or more sequences of:
        \s+ - one or more whitespace characters
        (?=\S*\d)\S+ - same as above
    \W* - zero or more non-word characters
\K - discards the text matched so far from the reported match
\S* - zero or more non-whitespace characters
PHP code:
$re = '~^(?:(?=\S*\d)\S+(?:\s+(?=\S*\d)\S+)*\W*)?\K\S*~u';
$arr = array("First12word50 Secųond-Word Thirdųword", "First1-2word50 Secųond/Word Thirdųword",
    "First1/2word50 Secųond+Word Thirdųword", "First1/2word50 Secųond1+Word Thirdų-word",
    "First1/2word50 Sec1ųond1+Word Thir11dų-word");
foreach ($arr as $s) {
    preg_match($re, $s, $m);
    echo '"' . $m[0] . "\"\n";
}
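For reuse, the pattern can be wrapped in a small function and run against all six strings from the question, including the last one where every word contains a digit. The function name firstWordWithoutDigits is hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function firstWordWithoutDigits(string $s): string
{
    $re = '~^(?:(?=\S*\d)\S+(?:\s+(?=\S*\d)\S+)*\W*)?\K\S*~u';
    // \K makes $m[0] hold only the part matched by the final \S*.
    return preg_match($re, $s, $m) === 1 ? $m[0] : '';
}

$samples = array(
    "First12word50 Secųond-Word Thirdųword",
    "First1-2word50 Secųond/Word Thirdųword",
    "First1/2word50 Secųond+Word Thirdųword",
    "First1/2word50 Secųond1+Word Thirdų-word",
    "First1/2word50 Sec1ųond1+Word Thirdų-word",
    "First1/2word50 Sec1ųond1+Word Thir11dų-word",
);
foreach ($samples as $s) {
    echo '"' . firstWordWithoutDigits($s) . "\"\n";
}
```

The last sample prints an empty pair of quotes, which corresponds to the EMPTY result requested in the question.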

php regex remove all non-alphanumeric and space characters from a string

I need a regex to remove all non-alphanumeric and space characters. I have this:
$page_title = preg_replace("/[^A-Za-z0-9 ]/", "", $page_title);
but it doesn't remove space characters and replaces some non-alphanumeric characters with numbers.
I need the special characters like punctuation and spaces removed.
If all you want is to keep the alphanumeric bits, you would use this:
(\W)+
Here is some test code:
$original = "Match spaces and {!}#";
echo $original ."<br>";
$altered = preg_replace("/(\W)+/", "", $original);
echo $altered;
Here is the output:
Match spaces and {!}#
Matchspacesand
Here is the explanation:
1st Capturing group: (\W) matches any non-word character [^a-zA-Z0-9_]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
I need the special characters like punctuation and spaces removed.
Then use:
$page_title = preg_replace('/[\p{P}\p{Zs}]+/u', "", $page_title);
\p{P} matches any punctuation character
\p{Zs} matches any space character
/u - To support unicode
Try this
preg_replace('/[^[:alnum:]]/', '', $page_title);
[:alnum:] matches alphanumeric characters
This works well for me in Sublime and in PHP Regex Tester:
$page_title = preg_replace("/[^A-Za-z0-9]/", "", $page_title);
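The patterns suggested in these answers do not leave the same characters behind. A small comparison on a made-up ASCII sample string (the variable names are illustrative only):

```php
<?php
$s = 'Match_spaces and {!}# $4+2';

// \W keeps letters, digits and the underscore.
$nonWord = preg_replace('/\W+/', '', $s);
// An explicit class, or the POSIX class [:alnum:], also drops the underscore.
$explicit = preg_replace('/[^A-Za-z0-9]/', '', $s);
$posix = preg_replace('/[^[:alnum:]]/', '', $s);
// \p{P} and \p{Zs} remove punctuation and spaces only; symbols such as
// $ and + belong to \p{S} and are kept.
$punctSpace = preg_replace('/[\p{P}\p{Zs}]+/u', '', $s);

echo $nonWord, "\n";    // Match_spacesand42
echo $explicit, "\n";   // Matchspacesand42
echo $posix, "\n";      // Matchspacesand42
echo $punctSpace, "\n"; // Matchspacesand$4+2
```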

How to replace non-ASCII characters in a string in PHP?

I need to replace characters in a string which are not represented with a single byte.
My string is like this
$inputText="centralkøkkenet kliniske diætister";
In that string there are characters like ø and æ. These characters should be replaced. How do I mention these in a regular expression that I can use for replacement?
If you want to replace everything other than alphanumeric characters and spaces, try this:
[^a-zA-Z0-9 ]
Here is a demo.
Sample code:
$re = "/[^a-zA-Z0-9 ]/";
$str = "centralkøkkenet kliniske diætister";
$subst = '';
$result = preg_replace($re, $subst, $str);
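You can shorten this to [^\w\s], as suggested by hjpotter92 in the comments; note that this version also keeps the underscore. [\W\S] is not an equivalent: it matches every character, because every character is either a non-word character or a non-whitespace character.
Pattern explanation:
[^\w\s] any character except:
word characters (a-z, A-Z, 0-9, _),
whitespace (\n, \r, \t, \f, and " ")
[\W\S] any character of:
non-word characters (all but a-z, A-Z, 0-9, _),
non-whitespace (all but \n, \r, \t, \f, and " "),
which together cover all characters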
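A quick check shows how differently the two character classes behave. The sample string is made up for illustration.

```php
<?php
$str = 'abc def_123!';

// [^\w\s] removes only what is neither a word character nor whitespace.
$negated = preg_replace('/[^\w\s]/', '', $str);
// [\W\S] is a union: every character is non-word or non-whitespace,
// so everything is removed.
$union = preg_replace('/[\W\S]/', '', $str);

echo $negated, "\n"; // abc def_123
var_dump($union);    // string(0) ""
```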
If you also want to keep punctuation (e.g. -'"!...), use this one:
$text = 'central-køkkenet "kliniske" diætister!';
$new = preg_replace('/[\x7F-\xFF]/ui', '', $text);
echo $new,"\n";
output:
central-kkkenet "kliniske" ditister!
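With the u modifier, the range \x7F-\xFF refers to the code points U+007F to U+00FF, so characters beyond that range are left in place. A sketch that strips everything outside ASCII instead, assuming UTF-8 input and a UTF-8 source file (the sample word with ų is borrowed only to show the difference):

```php
<?php
$text = 'central-køkkenet "kliniske" diætister Secųond';

// Removes U+007F..U+00FF only; ų (U+0173) survives.
$latin1Range = preg_replace('/[\x7F-\xFF]/u', '', $text);
// Removes every code point outside the ASCII range.
$allNonAscii = preg_replace('/[^\x00-\x7F]/u', '', $text);

echo $latin1Range, "\n"; // central-kkkenet "kliniske" ditister Secųond
echo $allNonAscii, "\n"; // central-kkkenet "kliniske" ditister Second
```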

Find words starting and ending with dollar signs $ in PHP

I am looking to find and replace words in a long string. I want to find words that look like this: $test$ and replace them with nothing.
I have tried a lot of things and can't figure out the regular expression. This is the last one I tried:
preg_replace("/\b\\$(.*)\\$\b/im", '', $text);
No matter what I do, I can't get it to replace words that begin and end with a dollar sign.
Use single quotes instead of double quotes and remove the double escape.
$text = preg_replace('/\$(.*?)\$/', '', $text);
Also, a word boundary \b does not consume any characters; it asserts that there is a word character on one side and none on the other. You need to remove the word boundaries for this to work. The regular expression contains no letters, so the i modifier is useless here, and it has no anchors, so remove the m (multi-line) modifier as well.
In addition, * is a greedy quantifier: .* matches as much as it can while still allowing the remainder of the regular expression to match. To be clear on this, it will replace the entire string:
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*)\$/', '', $text));
# => string(0) ""
I recommend using the non-greedy quantifier *? here. The question mark means: don't be greedy; stop as soon as you find a closing $.
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*?)\$/', '', $text));
# => string(10) " bar  quz "
Edit
To fix your problem, you can use \S which matches any non-white space character.
$text = '$20.00 is the $total$';
var_dump(preg_replace('/\$\S+\$/', '', $text));
# string(14) "$20.00 is the "
There are three different positions that qualify as word boundaries \b:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
$ is not a word character, so don't use \b or it won't work. Also, there is no need for the double escaping and no need for the im modifiers:
preg_replace('/\$(.*)\$/', '', $text);
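The boundary rules listed above can be checked directly with preg_match. The sample strings and variable names are made up for illustration.

```php
<?php
// Space before $ and space after $: no word character on either side,
// so \b cannot match there.
$withBoundary = preg_match('/\b\$test\$\b/', 'a $test$ b');
// Without \b the same text matches.
$withoutBoundary = preg_match('/\$test\$/', 'a $test$ b');
// \b only succeeds when the $ signs touch word characters.
$gluedToWords = preg_match('/\b\$test\$\b/', 'a$test$b');

var_dump($withBoundary);    // int(0)
var_dump($withoutBoundary); // int(1)
var_dump($gluedToWords);    // int(1)
```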
I would use:
preg_replace('/\$[^$]+\$/', '', $text);
You can use preg_quote to help you out on 'quoting':
$t = preg_replace('/' . preg_quote('$', '/') . '.*?' . preg_quote('$', '/') . '/', '', $text);
echo $t;
From the docs:
This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
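A short sketch of what preg_quote produces and how it helps when the delimiter is only known at run time. The function stripDelimited and its parameters are hypothetical names introduced for this example.

```php
<?php
// preg_quote escapes regex metacharacters, plus the delimiter passed as
// the second argument.
echo preg_quote('$', '/'), "\n";              // \$
echo preg_quote('1.5/2 (approx)', '/'), "\n"; // 1\.5\/2 \(approx\)

// Hypothetical helper: remove everything between two identical markers.
function stripDelimited(string $text, string $marker): string
{
    $q = preg_quote($marker, '/');
    return preg_replace('/' . $q . '.*?' . $q . '/', '', $text);
}

var_dump(stripDelimited('keep $drop$ keep', '$'));    // string(10) "keep  keep"
var_dump(stripDelimited('keep **drop** keep', '**')); // string(10) "keep  keep"
```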
Contrary to your use of word boundary markers (\b), you actually want the inverse effect (\B)-- you want to make sure that there ISN'T a word character next to the non-word character $.
You also don't need to use capturing parentheses because you are not using a backreference in your replacement string.
\S+ means one or more non-whitespace characters, matched greedily (not possessively: it still backtracks to let the closing \$ match).
Code: (Demo)
$text = '$foo$ boo hi$$ mon$k$ey $how thi$ $baz$ bar $foobar$';
var_export(
    preg_replace(
        '/\B\$\S+\$\B/',
        '',
        $text
    )
);
Output:
' boo hi$$ mon$k$ey $how thi$  bar '

php replace if two or more non alphanumeric characters

I have been trying to replace a portion of a string if two or more non-alphanumeric characters are found.
I have it partly working, but it does not replace when an underscore is in there.
This is what I am trying:
$str = "-dxs_ s";
$str = preg_replace('/\W{2,}|\_{2,}/', ' ', $str);
This results in -dxs_ s, but it should be -dxs s.
So how do you replace two or more consecutive non-alphanumeric characters in a string?
Simply
$str = preg_replace('/(\W|_){2,}/', ' ', $str);
This groups the "non-word or underscore" part and applies the 2+ quantifier to the group as a whole.
See it in action.
\W does not match _ (because \w includes it), so you need your own character class:
/[^a-zA-Z0-9]{2,}/
or
$result = preg_replace('/[^a-z\d]{2,}/i', ' ', $subject);
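Running the question's pattern and the suggested ones side by side on the sample string shows the difference. The variable names are illustrative only.

```php
<?php
$str = "-dxs_ s";

// Original attempt: "_ " is one underscore plus one space, so neither
// alternative finds a run of two.
$original = preg_replace('/\W{2,}|\_{2,}/', ' ', $str);
// Treat underscore and non-word characters as one group.
$grouped = preg_replace('/(\W|_){2,}/', ' ', $str);
// Equivalent for ASCII input, using an explicit character class.
$classBased = preg_replace('/[^a-z\d]{2,}/i', ' ', $str);

echo $original, "\n";   // -dxs_ s
echo $grouped, "\n";    // -dxs s
echo $classBased, "\n"; // -dxs s
```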
