I want to remove all \r, \n and \r\n, which is pretty easy to do, so I wrote:
str_replace(array("\r","\n"),"",$text);
but I saw this line:
str_replace(array("\r","\n","\\r","\\n"),"",$text);
and I was wondering what the double backslash means in \\r and \\n.
\ is an escape character, it's used to escape the following character.
In "\n", the backslash escapes n and the result will be a new line character.
In "\\n", the first backslash escapes the second backslash and the n is kept as is, so the result is a string containing \n (literally).
See the PHP official documentation > Strings.
In the context of your question, str_replace() will remove newlines ("\n" and "\r") and also remove the literal two-character sequences \n and \r from the string ("\\n" and "\\r" respectively). Ordinary text rarely contains the literal sequences \n and \r, so "\\n" and "\\r" seem to serve no purpose here.
The first backslash escapes the second one, so it matches a literal backslash in $text.
I'm not sure why you would want to match that if you just want to remove newlines and carriage returns from the string.
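A minimal sketch of the difference between the two calls from the question. The input string is a made-up example that contains both a real CRLF and the literal two-character text \n:

```php
<?php
// "\n" is one character (a line feed); "\\n" is two: a backslash and the letter n.
$realNewline   = "\n";
$literalSlashN = "\\n";

// Hypothetical input: a real CRLF, then the literal text \n further on.
$text = "line one\r\nline two\\nstill line two";

// Removes only real carriage returns and line feeds.
$onlyReal = str_replace(array("\r", "\n"), "", $text);

// Also removes the literal two-character sequences \r and \n.
$both = str_replace(array("\r", "\n", "\\r", "\\n"), "", $text);

var_dump(strlen($realNewline), strlen($literalSlashN));
var_dump($onlyReal, $both);
```

The second call only matters if the data was escaped somewhere upstream, so that it carries backslash-n as text rather than as a line feed.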
Related
I am trying to learn regex in PHP and I am stuck here. My question may appear silly, but please do explain.
I went through a link:
Extra backslash needed in PHP regexp pattern
But I just could not understand something:
In the answer he mentions two statements:
2 backslashes are used for unescaping in a string ("\\\\" -> \\)
1 backslash is used for unescaping in the regex engine (\\ -> \)
My questions:
What does the word "unescaping" actually mean? What is the purpose of unescaping?
Why do we need 4 backslashes to include it in the regex?
The backslash has a special meaning in both regexen and PHP. In both cases it is used as an escape character. For example, if you want to write a literal quote character inside a PHP string literal, this won't work:
$str = ''';
PHP would get "confused" which ' ends the string and which is part of the string. That's where \ comes in:
$str = '\'';
It escapes the special meaning of ', so instead of terminating the string literal, it is now just a normal character in the string. There are more escape sequences like \n as well.
This now means that \ is a special character with a special meaning. To escape this conundrum when you want to write a literal \, you'll have to escape literal backslashes as \\:
$str = '\\'; // string literal representing one backslash
This works the same in both PHP and regexen. If you want to write a literal backslash in a regex, you have to write /\\/. Now, since you're writing your regexen as PHP strings, you need to double escape them:
$regex = '/\\\\/';
Each pair of \\ is first reduced to one \ by the PHP string escaping mechanism, so the actual regex is /\\/, which means "one backslash".
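The two layers can be checked directly. This is a small sketch; the path used as the subject is an invented example:

```php
<?php
// PHP string layer: '\\' in source code is a single backslash at run time.
$oneBackslash = '\\';

// Regex layer: the four backslashes below reach PCRE as \\, meaning "one literal backslash".
$regex = '/\\\\/';

// Hypothetical subject: a Windows-style path containing two backslashes.
$subject = 'C:\\temp\\file.txt';

$count = preg_match_all($regex, $subject);

var_dump(strlen($oneBackslash)); // length of the one-backslash string
var_dump(strlen($regex));        // length of the pattern PCRE receives: /\\/
var_dump($count);                // number of backslashes found in the subject
```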
I think you can use "preg_quote()":
http://php.net/preg_quote
This function escapes special chars, so you can give an input as it is, without escaping by yourself:
<?php
$string = "online 24/7. Only for \o/";
$escaped_string = preg_quote($string, "/"); // 2nd param is optional and used if you want to escape also the delimiter of your regex
echo $escaped_string; // $escaped_string: "online 24\/7\. Only for \\o\/"
?>
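As a sketch of how the quoted text is typically used, the escaped string can be wrapped in delimiters and passed to preg_match(). The needle and the two subjects are invented for illustration:

```php
<?php
// Hypothetical text that should be matched literally, not as a pattern.
$needle = 'online 24/7.';

// Escape regex metacharacters and the "/" delimiter, then add the delimiters.
$pattern = '/' . preg_quote($needle, '/') . '/';

$hit  = preg_match($pattern, 'We are online 24/7.'); // exact text is present
$miss = preg_match($pattern, 'We are online 24/7!'); // "." no longer means "any character"

var_dump($pattern, $hit, $miss);
```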
I have tested \v (vertical white space) for matching \r\n and their combinations, but I found out that \v does not match \r and \n. Below is my code that I am using..
$string = "
Test
";
if (preg_match("#\v+#", $string )) {
echo "Matched";
} else {
echo "Not Matched";
}
To be more clear, my question is, is there any other alternative to match \r\n?
PCRE and newlines
PCRE has a superfluity of newline related escape sequences and alternatives.
Well, a nifty escape sequence that you can use here is \R. By default \R will match Unicode newline sequences, but it can be configured using different alternatives.
To match any Unicode newline sequence that is in the ASCII range.
preg_match('~\R~', $string);
This is equivalent to the following group:
(?>\r\n|\n|\r|\f|\x0b|\x85)
To match any Unicode newline sequence; including newline characters outside the ASCII range and both the line separator (U+2028) and paragraph separator (U+2029), you want to turn on the u (unicode) flag.
preg_match('~\R~u', $string);
The u (unicode) modifier turns on additional functionality of PCRE, and pattern and subject strings are treated as UTF-8.
This is equivalent to the following group:
(?>\r\n|\n|\r|\f|\x0b|\x85|\x{2028}|\x{2029})
It is possible to restrict \R to match CR, LF, or CRLF only:
preg_match('~(*BSR_ANYCRLF)\R~', $string);
This is equivalent to the following group:
(?>\r\n|\n|\r)
Additional
Five different conventions for indicating line breaks in strings are supported:
(*CR) carriage return
(*LF) linefeed
(*CRLF) carriage return, followed by linefeed
(*ANYCRLF) any of the three above
(*ANY) all Unicode newline sequences
Note: \R does not have special meaning inside of a character class. Like other unrecognized escape sequences, it is treated as the literal character "R" by default.
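A short sketch of \R in practice, splitting a string that mixes line-ending styles. The input text is an assumption made up for the example:

```php
<?php
// Hypothetical text mixing Windows (\r\n), Unix (\n) and old Mac (\r) line endings.
$text = "alpha\r\nbeta\ngamma\rdelta";

// \R treats \r\n as a single line break, so no empty element appears after "alpha".
$lines = preg_split('~\R~', $text);

// Restricting \R to CR, LF and CRLF gives the same result for this input.
$linesAnyCrLf = preg_split('~(*BSR_ANYCRLF)\R~', $text);

print_r($lines);
```

Splitting on [\r\n] instead would yield an empty string between "alpha" and "beta", because \r and \n would each count as a separate break.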
This doesn't answer the question for alternatives, because \v works perfectly well
\v matches any character considered vertical whitespace; this includes the platform's carriage return and line feed characters (newline) plus several other characters, all listed in the table below.
You only need to change "#\v+#" to either
"#\\v+#" escape the backslash
or
'#\v+#' use single quotes
In both cases, you will get a match for any combination of \r and \n.
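The quoting difference can be seen by comparing the two pattern strings before PCRE ever receives them. A minimal sketch, using a subject with explicit \r\n line endings instead of the multi-line literal from the question:

```php
<?php
$subject = "\r\nTest\r\n";

$doubleQuoted = "#\v+#"; // PHP turns \v into a vertical tab (0x0B) before PCRE sees it
$singleQuoted = '#\v+#'; // PCRE receives backslash-v, its vertical whitespace escape

// The double-quoted pattern is one character shorter: #, VT, +, #.
var_dump(strlen($doubleQuoted), strlen($singleQuoted));

$withDouble = preg_match($doubleQuoted, $subject);      // looks for vertical tabs only
$withSingle = preg_match($singleQuoted, $subject, $m);  // matches the leading \r\n

var_dump($withDouble, $withSingle);
```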
Update:
Just to make the scope of \v clear in comparison to \R, from perlrebackslash
\R
\R matches a generic newline; that is, anything considered a linebreak sequence by Unicode. This includes all characters matched by \v (vertical whitespace), ...
If there is some strange requirement that prevents you from using a literal [\r\n] in your pattern, you can always use hexadecimal escape sequences instead:
preg_match('#[\xD\xA]+#', $string)
This pattern is equivalent to [\r\n]+.
To match every line of a given string, simply use the ^ and $ anchors and tell your regex engine to operate in multi-line mode. Then ^ and $ will match the start and end of each line instead of the start and end of the whole string.
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
In PHP, that is the m modifier after the pattern. /^(.*?)$/m will simply match each line of the given string, with lines separated by newlines.
Btw: For line splitting, you could also use explode() and the PHP_EOL constant:
$lines = explode(PHP_EOL, $string);
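A small sketch of the multi-line approach next to explode(). It assumes the input uses plain \n line endings; the sample text is invented, and "\n" is used instead of PHP_EOL so the result does not depend on the platform:

```php
<?php
// Hypothetical input with Unix line endings.
$string = "first\nsecond\nthird";

// With the m modifier, ^ and $ anchor at every line, not just the whole string.
preg_match_all('/^(.*?)$/m', $string, $matches);
$lines = $matches[1];

// The same split without a regex.
$sameLines = explode("\n", $string);

print_r($lines);
```

With \r\n input, the captured lines would keep a trailing \r, since $ in multi-line mode stops before the \n by default.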
The problem is that you need the multiline option, or dotall option if using dot. It goes at the end of the delimiter.
http://www.php.net/manual/en/regexp.reference.internal-options.php
$string = "
Test
";
if(preg_match("#\v+#m", $string ))
echo "Matched";
else
echo "Not Matched";
To match a newline in PHP, use the php constant PHP_EOL. This is crossplatform.
if (preg_match('/\v+' . PHP_EOL ."/", $text, $matches ))
print_R($matches );
This regex also matches newline \n and carriage return \r characters.
(?![ \t\f])\s
To match one or more newline or carriage return characters, you could use the below regex.
(?:(?![ \t\f])\s)+
I have double backslashes '\\' in my string that need to be converted into single backslashes '\'. I've tried several combinations and end up with the whole string disappearing when I use echo, or with more backslashes added to the string by accident. This regex thing is making me go bonkers...lol...
I tried this amongst other failed attempts:
$pattern = '[\\]';
$replacement = '/\/';
?>
<td width="100%"> <?php echo preg_replace($pattern, $replacement,$q[$i]);?></td>
I do apologise if this is a foolish issue and I appreciate any pointers.
Use stripslashes() - it does exactly what you're looking for.
<td width="100%"> <?php echo stripslashes($q[$i]);?></td>
Use stripslashes instead. Also, in your regex, you are searching for single backslashes and your replacement is incorrect. \\{2} should search for double backslashes and \ should replace them with singles, although I haven't tested this.
Just to explain further, the pattern [\\] matches any character in a set comprised of a single backslash. In php, you should also delimit your regex with forward slashes: /[\\]/
Your replacement, which is (without delimiters) \, is not a regular expression for matching a single backslash. The regex for matching a single backslash is \\. Note the escaping. This said, the replacement term needs to be a string, not a regex (with the exception of backreferences).
EDIT: Sven claims below that stripslashes removes all backslashes. This is simply not true, and I will explain why below.
If a string contains 2 backslashes, the first one will be considered an escaping backslash and will be removed. This can be seen at http://www.phpfiddle.org/main/code/3yn-2ut. The fact that any backslashes remain at all by itself contradicts the claim that stripslashes removes all backslashes.
Just to clarify, this string declaration is invalid: $x = "\";, since the backslash escapes the second quote. This string "\\" contains one backslash. In the process of unquoting this string, this backslash will be removed. This "\\\\" string contains two backslashes. When unquoting, the first will be considered an escaping backslash, and will be removed.
Use preg_replace to turn double backslash into single backslash:
preg_replace('/\\\\{2}/', '\\', $str)
The \ in the first parameter needs to be escaped twice, once for string and once more for regex, just like CodeAngry says.
In the second parameter it only gets escaped once, for the string.
Make sense?
Never use a regular expression if the string you are looking for is constant, as is the case with "Every instance of double backslash".
Use str_replace() for this task. It is a very easy function that replaces every occurrence of a string with another.
In your case: str_replace('\\\\', '\\', $var).
The double backslash actually translates into four backslashes, because inside any quotes (single or double), a single backslash is the start of an escape sequence for the following character. If you want one literal backslash, you have to write two of them. If you want two backslashes, you have to write four of them.
I do not like the suggestion of stripslashes(). This will of course "decode" your double backslash into one single backslash. But it will also remove all single backslashes in the whole string. If there were none - fine, otherwise things will fail now.
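To make the difference concrete, here is a sketch that runs both functions on the same input. The path is a made-up example containing one doubled backslash and one single backslash:

```php
<?php
// Hypothetical input, at run time: C:\\dir\file
$var = 'C:\\\\dir\\file';

// Only the doubled backslash is collapsed; the single one is left alone.
$replaced = str_replace('\\\\', '\\', $var);

// The doubled backslash is collapsed AND the single backslash before "f" is dropped.
$stripped = stripslashes($var);

var_dump($replaced, $stripped);
```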
$pattern = '[\\]'; // wrong
$pattern = '[\\\\]'; // right
Write \ as \\ and \\ as \\\\ in the PHP string. With '[\\]', the regex engine receives [\], where \] is an escaped ], so the character class is never closed.
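A quick check of both character classes, with the delimiters added. The subject is an invented three-character string, and the @ operator is only there to keep the compile warning of the broken pattern out of the output:

```php
<?php
$subject = 'a\\b'; // three characters: a, backslash, b

// PCRE receives /[\\]/: a class containing one backslash.
$right = preg_match('/[\\\\]/', $subject);

// PCRE receives /[\]/: the ] is escaped, so the class is never closed
// and compilation fails.
$wrong = @preg_match('/[\\]/', $subject);

var_dump($right, $wrong);
```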
Use the htmlentities function to convert your slashes to HTML entities, then use str_replace or preg_replace to swap them for the new entity.
How can characters " \n \t \r " be replaced with '-' ?
echo preg_replace('/\s/','-','\n\t\n\r\n');//output '\n\t\n\r\n' instead should be'-----'
Edit: I have dynamic content in real app like:
preg_replace('/\s/','-',$_Request['content']);
can I fix it by adding "" around variable?
preg_replace('/\s/','-',"$_Request['content']");
Edit2:
How can a string be converted from the format 'str' to the format "str"?
Thanks
Well, two things. First, the problem is the single quotes around your subject string. Escape sequences (\n, \t, \r, etc.) are not processed inside single quotes.
However, don't use a regex for this. There's no need for the complexity of a regex.
Either use str_replace:
echo str_replace(array("\r", "\n", "\t", "\v"), '-', "\r\n\t\r\v\n\t");
Or strtr:
echo strtr("\r\n\t\r\v\n\t", "\r\n\t\v", '----');
Edit: Ahh, now I see what you're getting at. You have a string with a literal \r\n\t\r\v\n\t in it, and want to replace them out. Well, you can do that via regex:
$regex = '/(\s|\\\\[rntv]{1})/';
$string = preg_replace($regex, '-', $_GET['content']);
Basically, it matches any space character, and any literal \ followed by either r, n, t or v...
If you are looking to replace the actual whitespace characters, you need to enclose the input string in double quotes (") so PHP converts the escape sequences for you:
echo preg_replace('/\s/', '-', "\n\t\n\r\n");
Else if the escape sequences occur literally (i.e. you see \n\t\n\r\n instead of line feed, tab, line feed, carriage return, line feed), you need to replace by the following character class (and keep single quotes (') on the input string):
echo preg_replace('/\\\\[rnt]/', '-', '\n\t\n\r\n');
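The two cases side by side, as a sketch using the same sample sequence as the question:

```php
<?php
$real    = "\n\t\n\r\n"; // five actual whitespace characters
$literal = '\n\t\n\r\n'; // ten characters: five backslash-letter pairs

// \s matches real whitespace only.
$a = preg_replace('/\s/', '-', $real);
$b = preg_replace('/\s/', '-', $literal); // nothing to replace

// A literal backslash followed by r, n or t.
$c = preg_replace('/\\\\[rnt]/', '-', $literal);

var_dump(strlen($real), strlen($literal));
var_dump($a, $b, $c);
```

Wrapping a variable in double quotes does not help: escape sequences are only interpreted in string literals in the source code, not in data that arrives at run time.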
You ought to be passing content through $_POST instead of $_GET; I don't know how PHP handles tabs, newlines and returns in GET variables.
You are using 's instead of "s. You should change your code to:
echo preg_replace('/\s/','-',"\n\t\n\r\n");
See here: single-quoted and double-quoted.
http://www.php.net/manual/en/language.types.string.php
There's also a string method for that:
echo strtr($str, "\r\n\t\v ", "-----");
If you want to remove line breaks but retain spaces, remove the trailing space from the second argument and the fifth - from the third.
Since you seemingly want literal \r and \n converted, you need to use a map (or even a regex) like:
echo strtr($str, array('\\r'=>"\r", '\\n'=>"\n", '\t'=>"\t", ' '=>"␣"));
// single quoted strings escaped twice for illustration
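Both forms of strtr() in one sketch. The input strings are invented for the example:

```php
<?php
// Hypothetical input with a CRLF, a tab and a space.
$str = "a\r\nb\tc d";

// Character-for-character translation: each listed character becomes "-".
$dashed = strtr($str, "\r\n\t\v ", "-----");

// Same, but spaces are kept.
$keepSpaces = strtr($str, "\r\n\t\v", "----");

// Array form: turn literal backslash sequences into the real control characters.
$decoded = strtr('x\ny\tz', array('\n' => "\n", '\t' => "\t"));

var_dump($dashed, $keepSpaces, $decoded);
```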
Try:
echo preg_replace('/\s/','-',"\n\t\n\r\n");
Note the double quotes on the string.
If you enclose a string with single quotes, special characters lose their special meaning:
echo preg_replace('/\s/','-',"\n\t\n\r\n");
I have to replace newlines (\n) with & in a string so that the received data can be parsed with parse_str() into an array. The thing is that when I put \n in single quotes, it somehow turns out to be replaced with a space:
str_ireplace(array('&', '+', '\n'), array('', '', '&'), $response)
"id=1 name=name gender=gender age=age friends=friends"
But when I put \n in double quotes then it works just fine:
str_ireplace(array('&', '+', "\n"), array('', '', '&'), $response)
"id=1&name=name&gender=gender&age=age&friends=friends"
Why is that so?
Because only the escape sequences \' and \\ have a meaning in single-quoted strings.
See the documentation:
To specify a literal single quote, escape it with a backslash (\). To specify a literal backslash, double it (\\). All other instances of backslash will be treated as a literal backslash: this means that the other escape sequences you might be used to, such as \r or \n, will be output literally as specified rather than having any special meaning.
Update:
Another difference is that PHP only substitutes variables inside double-quoted strings (and heredoc). Therefore you can consider processing of single-quoted strings to be faster in general (but maybe not measurably faster).
Btw you don't necessarily need to use str_ireplace as &, + and \n have no upper or lower case version. There is just one version, so str_replace would be enough.
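A minimal sketch of the whole round trip, assuming a newline-separated response like the one in the question (shortened here to three fields):

```php
<?php
// Hypothetical response body: key=value pairs separated by real line feeds.
$response = "id=1\nname=name\ngender=gender";

// '\n' in single quotes is backslash + n, which does not occur in the response.
$unchanged = str_replace('\n', '&', $response);

// "\n" in double quotes is a real line feed, so the replacement happens.
$query = str_replace("\n", '&', $response);

parse_str($query, $fields);

var_dump($unchanged === $response, $query);
print_r($fields);
```

In the single-quoted case the string is left untouched, so the line feeds are still there and only look like spaces when the result is displayed.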