Regex of non breaking space in php - php

input:
$string = "a  b c  d   e";
I have a string in PHP and I need to replace some of its spaces with the non-breaking space code.
output:
"a \xc2\xa0b c \xc2\xa0d \xc2\xa0\xc2\xa0e"
A single space, and the first space of any run of spaces, must not be replaced with \xc2\xa0.
When two spaces appear ("  "), the output is " \xc2\xa0": the first space is kept and the second space is replaced.
When three spaces appear ("   "), the output is " \xc2\xa0\xc2\xa0": the first space is kept and the second and third spaces are replaced.
The input string is arbitrary.
Any ideas using a regular expression or another PHP function?
Thank you very much.

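$str = preg_replace('/(?<= ) /', "\xc2\xa0", $str);
The lookbehind (?<= ) checks that a space precedes the match, and the pattern itself matches exactly one space. Only the matched space is replaced, not the one seen by the lookbehind, so the first space of every run is kept and each following space becomes its own \xc2\xa0. Do not put a quantifier such as {1,2} or + on the matched space: each match is replaced by a single \xc2\xa0, so a run of several spaces would collapse into one non-breaking space.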

$s = preg_replace('~(?<= ) ~', "\xc2\xa0", $s);
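As a quick check of the lookbehind approach, here is a minimal sketch run against the sample input from the question. The variable names and the [NBSP] marker are only illustrative. Note that the replacement has to be a double-quoted PHP string, otherwise \xc2\xa0 is inserted as literal text instead of the two UTF-8 bytes of the non-breaking space.

```php
<?php
// Sample input from the question: runs of 2, 1, 2 and 3 spaces.
$input = "a  b c  d   e";

// Match one space that is directly preceded by another space.
// The lookbehind is zero-width, so the preceding space itself is kept.
$result = preg_replace('/(?<= ) /', "\xc2\xa0", $input);

// Show non-breaking spaces as a visible marker for inspection.
echo str_replace("\xc2\xa0", '[NBSP]', $result), "\n";
// a [NBSP]b c [NBSP]d [NBSP][NBSP]e
```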

Related

Why regex with lookaheads doesn't match?

I need (in PHP) to split a sentence by a word that cannot be the first or the last one in the sentence. Say the word is "pression" and here is my regex:
/^.+?(?=[\s\.\,\:\;])pression(?=[\s\.\,\:\;]).+$/i
Live here: https://regex101.com/r/CHAhKj/1/
First, it doesn't match.
Next, I wonder whether it is possible to split that way at all. I tried a simplified example:
print_r(preg_split('/^.+pizza.+$/', 'my pizza is cool'));
live here http://sandbox.onlinephpfunctions.com/code/10b674900fc1ef44ec79bfaf80e83fe1f4248d02
and it prints an array of 2 empty strings, whereas I expect
['my ', ' is cool']
I need (in PHP) to split a sentence by the word that cannot be the first or the last one in the sentence
You may use this regex:
(?<=[^\s.?]\h)pression(?=\h[^\s.?])
RegEx Details:
(?<=[^\s.?]\h): Lookbehind asserting that immediately before the current position there is a character that is not whitespace, not a dot and not a ?, followed by one horizontal whitespace character.
pression: Match the word pression.
(?=\h[^\s.?]): Lookahead asserting that immediately after the current position there is one horizontal whitespace character followed by a character that is not whitespace, not a dot and not a ?.
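A minimal sketch of how this lookaround pattern could be used with preg_split, applied to the simplified "pizza" sentence from the question. Swapping "pression" for "pizza" and adding the i flag are my own assumptions for the illustration.

```php
<?php
// Split on "pizza" only when it has a word on both sides.
$pattern = '/(?<=[^\s.?]\h)pizza(?=\h[^\s.?])/i';

// The lookarounds consume nothing, so the spaces stay in the parts.
$parts = preg_split($pattern, 'my pizza is cool');
print_r($parts); // two parts: "my " and " is cool"

// Here "pizza" is the first word: the lookbehind fails and nothing is split.
$unsplit = preg_split($pattern, 'pizza is cool');
print_r($unsplit); // one part: the whole string
```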
First, ^.+?(?=[\s\.\,\:\;])pression(?=[\s\.\,\:\;]).+$ can't match any string at all: a lookahead consumes no characters, so the (?=[\s\.\,\:\;])p part requires the same character to be both p and a whitespace char, ., ,, : or ;, which is impossible.
Second, the ^.+pizza.+$ pattern does not ensure that the pizza matched is not the first or last word in a sentence, as . matches whitespace, too. It also returns nothing meaningful from preg_split, because preg_split removes the matched text and returns what is left around it: the pattern matches the whole string, so the two empty values are the empty strings before and after the match.
That said, all you need is:
preg_match('~^(.*?\w\W+)pression(\W+\w.*)$~is', $text, $m)
See the regex demo. Details:
^ - start of string
(.*?\w\W+) - Capturing group 1: any zero or more chars, as few as possible, then a word char and then one or more non-word chars
pression - a word
(\W+\w.*) - Capturing group 2: one or more non-word chars, a word char, and then any zero or more chars as many as possible
$ - end of string.
s makes the . match across lines and i flag makes the pattern match in a case insensitive way.
See the PHP demo:
$text = "You can use any regular expression pression inside the lookahead ";
if (preg_match('~^(.*?\w\W+)pression(\W+\w.*)$~is', $text, $m)) {
echo $m[1] . " << | >> " . $m[2];
}
// => You can use any regular expression << | >> inside the lookahead

matching space only before character regex and replacing space with dash

What I'm after is to replace a space with a dash, BUT only if that space has a character after it. The reason is that I have an array of strings, some of which have spaces inserted after the final word or character (this is out of my control).
e.g
In the example below I have used %20 to show where a space is
string1 = farmer%20jones
string2 = farmer%20jim%20
I have the following regex: preg_replace('/\s./', '-', $string);
I think I'm halfway there, but the above searches for a space preceding a character and replaces that with a -
What I get with the above regex is
string1 = farmer-jones
string2 = farmer-jim-
What I want is:
string1 = farmer-jones
string2 = farmer-jim
I don't want that trailing - to be added.
Any help much appreciated
You can use negative lookahead here:
$repl = preg_replace('/\h+(?!$)/', '-', $string);
\h+ will match 1 or more horizontal whitespace characters.
(?!$) will assert that the next position is not the end of the string (or the end of a line, if the m flag is used).
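To see the lookahead in action, here is a small sketch using the two strings from the question plus one with two trailing spaces. The second pattern, \h+(?=\S), is an assumed variant of my own and not part of the answer above: it requires a non-whitespace character after the spaces, so it should also cope with several trailing spaces, where \h+(?!$) can backtrack and still replace all but the last one.

```php
<?php
$strings = ["farmer jones", "farmer jim ", "farmer jim  "];

$fromAnswer = [];
$variant = [];
foreach ($strings as $s) {
    // Pattern from the answer: spaces not directly followed by the end.
    $fromAnswer[$s] = preg_replace('/\h+(?!$)/', '-', $s);
    // Assumed variant: spaces that are followed by a non-whitespace char.
    $variant[$s] = preg_replace('/\h+(?=\S)/', '-', $s);
    // Brackets make any remaining trailing spaces visible.
    printf("[%s] => [%s] / [%s]\n", $s, $fromAnswer[$s], $variant[$s]);
}
```

Neither pattern removes the trailing spaces themselves; rtrim() can be applied afterwards if they should go as well.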

Preg_replace Tag Replace Dashes With HTML Tag

I am partially disabled. I write a LOT of WordPress posts in 'text' mode and to save typing I use a shorthand for emphasis and strong tags. E.g. I'll write -this- for <em>this</em>.
I want to add a function in WordPress to regex-replace word(s) surrounded by a pair of dashes with the appropriate HTML tag. For starters I'd like to replace -this- with <em>this</em>.
Eg:
-this- becomes <em>this</em>
-this-. becomes <em>this</em>.
What I can't figure out is how to replace the bounding chars. I want it to match the string, but then retain the chars immediately before and after.
$pattern = '/\s\-(.*?)\-(\s|\.)/';
$replacement = '<em>$1</em>';
return preg_replace($pattern, $replacement, $content);
...this does the 'search' OK, but it can't get me the space or period after.
Edit: The reason for wanting a space as the beginning boundary and then a space OR a period OR a comma OR a semi-colon as the ending boundary is to prevent problems with truly hyphenated words.
So pseudocode:
1. find the space + string + (space or punctuation)
2. replace with space + open_htmltag + string + close_htmltag + whatever the next char is.
Ideas?
a space as the beginning boundary and then a space OR a period OR a comma OR a semi-colon as the ending boundary
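You can try capturing groups with <em>$1</em>$2 as the substitution. The leading [ ] is matched but not captured, so start the substitution with a space ( <em>$1</em>$2) if the opening space has to be kept.
[ ]-([^-]*)-([ .,;])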
sample code:
$re = "/-([^-]*)-([ .,;])/i";
$str = " -this-;\n -this-.\n -this- ";
$subst = '<em>$1</em>$2';
$result = preg_replace($re, $subst, $str);
Note: Use a single space instead of \s, which matches any whitespace character [\r\n\t\f ].
Edited by o/p: Did not need opening space as delimiter. This is the winning answer.
You can also try a positive lookahead with only a single capturing group.
-([^-]*)-(?=[ .,;])
substitution string: <em>$1</em>
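A short sketch of the lookahead version, using a made-up sample sentence (the sentence is my own, not from the question). Because the lookahead only checks the character after the closing dash without consuming it, the space or punctuation stays in the text and no second group is needed.

```php
<?php
$str = "Use -this- and -that-. A well-known -idea-; done.";

// -([^-]*)- matches the dashed text; (?=[ .,;]) requires a space or
// punctuation right after the closing dash but leaves it in place.
$result = preg_replace('/-([^-]*)-(?=[ .,;])/', '<em>$1</em>', $str);

echo $result, "\n";
// Use <em>this</em> and <em>that</em>. A well-known <em>idea</em>; done.
```

In this sample the hyphen in "well-known" is left alone because the next dash is not followed by a boundary character; other hyphenated text may still be affected, so treat this as an illustration rather than a complete rule.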
You can use this regex:
(-)(.*?)(-)
with <em>$2</em> as the substitution, since the text between the dashes is in the second group.
Edit: as an improvement you can also use -(.*?)- and refer to the text with capturing group \1.
In the code below, the regex pattern will start at a hyphen and collect any non-hyphen characters until the next hyphen occurs. It then wraps the collected text in an em tag. The hyphens are discarded.
Note: If you use a hyphen for its intended purposes, this may cause problems. You may want to devise an escape character for that.
$str = "hello -world-. I am -radley-.";
$replace = preg_replace('/-([^-]+?)-/', '<em>$1</em>', $str);
echo $str; // no formatting
echo '<br>';
echo $replace; // formatting
Result:
hello -world-. I am -radley-.
hello <em>world</em>. I am <em>radley</em>.

How to remove more than one whitespace

Hello guys, I currently have a problem with my preg_replace:
preg_replace('#[^a-zA-Z\s]#', '', $string)
It keeps all alphabetic letters and whitespace, but I want more than one whitespace character to be reduced to only one. Any idea how this can be done?
$output = preg_replace('!\s+!', ' ', $input);
From Regular Expression Basic Syntax Reference
\d, \w and \s
Shorthand character classes matching digits, word characters (letters, digits, and underscores), and whitespace (spaces, tabs, and line breaks). Can be used inside and outside character classes.
The character type \s stands for a small set of characters: horizontal tab (9), line feed (10), form feed (12), carriage return (13) and ordinary space (32); since PCRE 8.34 it also includes vertical tab (11). The following code finds every run of two or more \s characters in $string and keeps only the first character of the run. For example, if line feed, horizontal tab and ordinary space occur immediately after one another, line feed alone will remain after the replacement is done.
$string = preg_replace('#(\s)\s+#', '\1', $string);
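preg_replace(array('#[^a-zA-Z\s]#', '#\s+#'), array('', ' '), $string);
Note that the unwanted characters are removed first and the whitespace is collapsed second; in the opposite order, removing a character that sits between two spaces would leave a double space behind. This will, however, replace every run of whitespace with a single space. If you want to collapse consecutive whitespace differently (for example, two newlines into only one newline), you have to figure out the logic for that, because \s+ will match "\n \n \n" (5 whitespace characters in a row).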
If the extra whitespace is only at the beginning or end of the string, try using trim instead:
<?php
$something = " Error";
echo $something."\n";
echo "------"."\n";
echo trim($something);
?>
output
Error
------
Error
The question is old and lacks some details. Let's assume the OP wanted to reduce each run of consecutive horizontal whitespace to a single space.
Example:
"\t\t \t \t" => " "
"\t\t\n\t\t" => " \n "
One possible solution would be simply to use the generic character type \h, which stands for horizontal whitespace:
preg_replace('/\h+/', ' ', $text)
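The difference between \h and \s is easiest to see side by side. This is a minimal sketch with an invented input string; the variable names are only for illustration.

```php
<?php
$text = "one\t\t two\n\n\tthree";

// \h+ only collapses horizontal whitespace (spaces, tabs); newlines survive.
$horizontal = preg_replace('/\h+/', ' ', $text);

// \s+ collapses every run of whitespace, newlines included.
$all = preg_replace('/\s+/', ' ', $text);

var_dump($horizontal); // "one two\n\n three"
var_dump($all);        // "one two three"
```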

Regex: remove non-alphanumeric chars, multiple whitespaces and trim() all together

I have a $text from which I want to strip all non-alphanumeric chars, replace multiple whitespace characters and newlines with a single space, and remove leading and trailing whitespace.
This is my solution so far.
$text = '
some- text!!
for testing?
'; // $text to format
//strip off all non-alphanumeric chars
$text = preg_replace("/[^a-zA-Z0-9\s]/", "", $text);
//Replace multiple white spaces by single space
$text = preg_replace('/\s+/', ' ', $text);
//eliminate beginning and ending space
$finalText = trim($text);
/* result: $finalText ="some text for testing";
without non-alphanumeric chars, newline, extra spaces and trim()med */
Is it possible to achieve all of this in one regular expression, so that I get the desired result in one line as below?
$finalText = preg_replace(some_reg_expression, $replaceby, $text);
thanks
Edit: clarified with a test string
Of course you can. That is very easy.
The re will look like:
((?<= )\s*)|[^a-zA-Z0-9\s]|(\s*$)|(^\s*)
I have no PHP at hand, I have used Perl (just to test the re and show that it works) (you can play with my code here):
$ cat test.txt
a b c d
a b c e f g fff f
$ cat 1.pl
while(<>) {
s/((?<= )\s*)|[^a-zA-Z0-9\s]|(\s*$)|(^\s*)//g;
print $_,"\n";
}
$ cat test.txt | perl 1.pl
a b c d
a b c e f g fff f
For PHP it will be the same.
What does the RE do?
((?<= )\s*) # all spaces that have at least one space before them
|
[^a-zA-Z0-9\s] # all non-alphanumeric characters
|
(\s*$) # all spaces at the end of string
|
(^\s*) # all spaces at the beginning of string
The only tricky part here is ((?<= )\s*), the lookbehind assertion. You remove spaces if and only if the run of spaces has a space before it.
If you want to know how lookahead/lookbehind assertions work, please take a look at http://www.regular-expressions.info/lookaround.html.
Update from the discussion:
What happens when $text ='some ? ! ? text';?
Then the resulting string contains multiple spaces between "some" and "text".
It is not so easy to solve this problem, because one needs positive lookbehind assertions of variable length, and that is not possible at the moment. One cannot simply check for a preceding space, because the preceding character may be a non-alphanumeric character that will itself be removed (for example, in " !" the "!" sign will be removed, but the RE knows nothing about that). One would need something like (?<=[^a-zA-Z0-9\s]* )\s*, but that unfortunately will not work because PCRE does not support variable-length lookbehind assertions.
I do not think that you can achieve that with one regex. You would basically need to stick in an if/else condition, which is not possible through regular expressions alone.
You would basically need one regex to remove non-alphanumeric characters and another one to collapse the spaces, which is basically what you are already doing.
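The problem case from the discussion can be reproduced with a short sketch. It runs the single-pattern attempt and a two-pattern version on 'some ? ! ? text'; the variable names are mine, and passing arrays to preg_replace applies the patterns one after another.

```php
<?php
$text = 'some ? ! ? text';

// Single-pattern attempt: the spaces around the removed punctuation are
// not adjacent in the input, so the lookbehind never sees them as a run.
$single = preg_replace('/((?<= )\s*)|[^a-zA-Z0-9\s]|(\s*$)|(^\s*)/', '', $text);

// Two patterns applied in order: strip first, then collapse, then trim.
$twoStep = trim(preg_replace(
    ['/[^a-zA-Z0-9\s]/', '/\s+/'],
    ['', ' '],
    $text
));

var_dump($single);  // still has a run of spaces between "some" and "text"
var_dump($twoStep); // "some text"
```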
Check whether this is what you are looking for:
$patterns = array ('/[^a-zA-Z0-9\s]/','/\s+/');
$replace = array ("", ' ');
$finalText = trim( preg_replace($patterns, $replace, $text) );
It may need some modification; just let me know if this is what you want to do.
For your own sanity, you will want to keep regular expressions that you can still understand and edit later on :)
$text = trim(preg_replace(array(
"/[^a-zA-Z0-9\s]/", // remove everything that is neither whitespace nor alphanumeric
'/\s+/', // collapse each run of whitespace (including newlines) into one space
), array(
'',
' ',
), $originalText)); // trim last, since stripping can expose leading or trailing spaces
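In Perl:
$text =~ s/([^a-zA-Z0-9\s].*?)//g;
Doesn't have to be any harder than this. (This only strips the non-alphanumeric characters; collapsing whitespace and trimming still need their own steps.)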
