Convert special characters like ' \t ' to " \t " (a real tab) - php

I want to make a CSV file importer on my website. I want the user to choose the delimiter.
The problem is that when the form submits, the delimiter field is stored as '\t', for example, so when I'm parsing the file, I search for the string '\t' instead of a real TAB. It does the same thing with every special character, like \r, \n, etc.
I want to know the way or the function to use to convert these characters to their true representation without using an array like:
't' => "\t"
'r' => "\r"
...

You should probably decide which special characters you will allow and create a function like this one:
function translate_quoted($string) {
$search = array("\\t", "\\n", "\\r");
$replace = array( "\t", "\n", "\r");
return str_replace($search, $replace, $string);
}

echo str_replace("\\t", "\t", $string);
View an example here: http://ideone.com/IVFZk

The PHP interpreter automatically interprets escape sequences in double-quoted strings found in PHP source files, so echo "\t" actually outputs a TAB character.
On the contrary, when you read a string from any external source, the backslash assumes its literal value: a backslash and a 't'. You would express it in a PHP source as "\\t" (double quotes) or '\t' (single quotes), which is not what you want.
Sebastián's solution works, but PHP provides a native function for that.
stripcslashes() recognises C-like sequences (\a, \b, \f, \n, \r, \t and \v), as well as octal and hexadecimal representation, converting them to their actual meaning.
// C-like escape sequence
stripcslashes('\t') === "\t"; // true;
// Hexadecimal escape sequence
stripcslashes('\x09') === "\t"; // true;
// Octal escape sequence
stripcslashes('\011') === "\t"; // true;
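Applied to the importer from the question, a minimal sketch might decode the submitted delimiter once and pass it to str_getcsv(). The helper name parse_delimited_line and the one-character check are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: decode the delimiter as typed in the form, then split one line.
function parse_delimited_line(string $line, string $rawDelimiter): array
{
    // '\t' (a backslash followed by "t") from the form becomes a real TAB character.
    $delimiter = stripcslashes($rawDelimiter);

    // str_getcsv() expects a single-byte separator, so reject anything else.
    if (strlen($delimiter) !== 1) {
        throw new InvalidArgumentException('Delimiter must decode to exactly one character.');
    }

    // Enclosure and escape characters are passed explicitly (the usual defaults).
    return str_getcsv($line, $delimiter, '"', '\\');
}

// The single-quoted '\t' simulates what arrives from the form.
$fields = parse_delimited_line("id\tname\tcity", '\t');
```

A delimiter that needs no decoding, such as ';', passes through stripcslashes() unchanged, so the same helper covers both cases.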

It doesn't look like SO is preserving the tab inside the quotes, but pressing Tab once in any text editor and then copying it between the quotes should work.
$data = str_replace("\t", " ", $data);

Related

Robustly detect dash in PHP string [duplicate]

preg_replace does not return desired result when I use it on string fetched from database.
$result = DB::connection("connection")->select("my query");
foreach($result as $row){
//prints run-d.m.c.
print($row->artist . "\n");
//should print run.d.m.c
//prints run-d.m.c
print(preg_replace("/-/", ".", $row->artist) . "\n");
}
This occurs only when I try to replace - (dash). I can replace any other character.
However, if I try this regex on a simple string, it works as expected:
$str = "run-d.m.c";
//prints run.d.m.c
print(preg_replace("/-/", ".", $str) . "\n");
What am I missing here?
It turns out you have Unicode dashes in your strings. To match all Unicode dashes, use
/[\p{Pd}\xAD]/u
See the regex demo
\p{Pd} matches any character in the Unicode category 'Punctuation, Dash'. That category does not include the soft hyphen, \xAD, so the two are combined in a character class.
The /u modifier makes the pattern Unicode aware and makes the regex engine treat the input string as Unicode code point sequence, not a byte sequence.
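As a rough sketch of how that pattern could be used, the helper below (replace_dashes is a hypothetical name) replaces every dash it matches. The sample string with U+2010 is an assumption about what the database might contain, and the "\u{...}" escape requires PHP 7.0 or later.

```php
<?php
// Hypothetical helper: replace every Unicode dash (and the soft hyphen) with $replacement.
function replace_dashes(string $text, string $replacement = '.'): string
{
    // Single quotes keep \p{Pd} and \xAD intact so that PCRE, not PHP, interprets them.
    $result = preg_replace('/[\p{Pd}\xAD]/u', $replacement, $text);

    // With the /u modifier, preg_replace() returns null for invalid UTF-8 input.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed (is the input valid UTF-8?)');
    }

    return $result;
}

$ascii   = replace_dashes('run-d.m.c');         // ASCII hyphen-minus (U+002D)
$unicode = replace_dashes("run\u{2010}d.m.c");  // Unicode HYPHEN (U+2010)
```

Both calls produce the same string, which is the behaviour the question expected from the plain "/-/" pattern.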

How to parse characters in a single-quoted string?

To get a double quoted string (which I cannot change) correctly parsed I have to do the following:
$string = '15 Rose Avenue\n Irlam\n Manchester';
$string = str_replace('\n', "\n", $string);
print nl2br($string); // demonstrates that the \n's are now linebreak characters
So far, so good.
But in my given string there are characters like \xC3\xA4. There are many characters like this (beginning with \x..)
How can I get them correctly parsed as shown above with the linebreak?
You can use
$str = stripcslashes($str);
You can escape a \ in single quotes:
$string = str_replace('\\n', "\n", $string);
But you're going to have a lot of potential replacements if you need to handle \\xC3, etc. It is best to use preg_replace_callback() with a callback function that translates them to bytes.
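One possible shape for that callback approach is sketched below. The function name decode_hex_escapes is hypothetical, and the sketch only handles two-digit \xHH sequences; stripcslashes() remains the shorter option when every C-style escape should be decoded.

```php
<?php
// Hypothetical helper: turn literal "\xHH" sequences into the bytes they name.
function decode_hex_escapes(string $input): string
{
    $result = preg_replace_callback(
        '/\\\\x([0-9A-Fa-f]{2})/',                  // a backslash, "x", then two hex digits
        static function (array $match): string {
            return chr((int) hexdec($match[1]));    // e.g. "C3" becomes the byte 0xC3
        },
        $input
    );

    // Fall back to the untouched input if the regex engine reports an error.
    return $result ?? $input;
}

// Single quotes: the string really contains backslash, "x", "C", "3", and so on.
$decoded = decode_hex_escapes('M\xC3\xA4rz');
```

After decoding, the two bytes 0xC3 0xA4 form the UTF-8 encoding of "ä".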

Php display as a html text the new line \n

I'm using echo to display a result, but the result contains line breaks and other control characters (\n, \t, \r).
I want to know whether the result contains \n, \t, or \r, and how many of each, so that I can replace them with an HTML tag like <p> or <div>.
The result comes from another website.
In pattern CreditTransaction/CustomerData:
Email does not contain any text
In pattern RecurUpdate/CustomerData:
Email does not contain any text
In pattern AccountInfo:
I want like this.
In pattern CreditTransaction/CustomerData:
\n
\n
\n
\n\tEmail does not contain any text
\n
In pattern RecurUpdate/CustomerData:
\n
\n
\n
\n\tEmail does not contain any text
\n\tIn pattern AccountInfo:
Your question is quite unclear but I'll do my best to provide an answer.
If you want to make \n, \r, and \t visible in the output you could just manually escape them:
str_replace("\n", '\n', str_replace("\r", '\r', str_replace("\t", '\t', $string)));
Or if you want to escape all control characters at once:
addcslashes($string, "\0..\37");
To count how many times a specific character/substring occurs:
substr_count($string, $character_or_substring);
To check if the string contains a specific character/substring:
if (substr_count($string, $character_or_substring) > 0) {
// your code
}
Or:
if (strpos($string, $character_or_substring) !== false) { // notice the !==
// your code
}
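Since the question asks how many of each character there are, the counting calls above could be combined as in this sketch. The function name and the array keys are made up for illustration.

```php
<?php
// Hypothetical helper: count the control characters the question asks about.
function count_whitespace_controls(string $text): array
{
    // Double quotes are required so that "\n", "\t" and "\r" are the real characters.
    return [
        'newlines'         => substr_count($text, "\n"),
        'tabs'             => substr_count($text, "\t"),
        'carriage_returns' => substr_count($text, "\r"),
    ];
}

$counts = count_whitespace_controls("In pattern AccountInfo:\n\n\tEmail\r\n");
```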
As mentioned earlier by someone else in a comment, if you want to convert the newlines to br tags:
nl2br($string);
If you want tabs to show up as indentation you could replace all tabs with &nbsp; entities:
str_replace("\t", '&nbsp;', $string);
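Putting these pieces together, a small sketch for showing the remote text in an HTML page could look like this. The helper name text_to_html and the choice of four non-breaking spaces per tab are assumptions, and escaping with htmlspecialchars() is added here because the text comes from another website.

```php
<?php
// Hypothetical helper: make plain text with newlines and tabs display sensibly in HTML.
function text_to_html(string $text): string
{
    // Escape markup first so the remote text cannot inject HTML.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Assumption: render each tab as four non-breaking spaces.
    $indented = str_replace("\t", '&nbsp;&nbsp;&nbsp;&nbsp;', $escaped);

    // nl2br() inserts "<br />" before every newline and keeps the newline itself.
    return nl2br($indented);
}

$html = text_to_html("In pattern AccountInfo:\n\tEmail does not contain any text");
```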
Use double quotes to find newline and tab characters.
$s = "In pattern CreditTransaction/CustomerData:
Email does not contain any text
In pattern RecurUpdate/CustomerData: ";
echo str_replace("\t", "*", $s); // Replace all tabs with '*'
echo str_replace("\n", "*", $s); // Replace all newlines with '*'

Special (escaped) characters in replacements array in preg_replace get escaped

I’m trying to modify a string of the following form where each field is delimited by a tab except for the first which is followed by two or more tabs.
"$str1 $str2 $str3 $str4 $str5 $str6"
The modified string will have each field wrapped in HTML table tags, and be on its own, indented line as so.
"<tr>
<td class="title">$str1</td>
<td sorttable_customkey="$str2"></td>
<td sorttable_customkey="$str3"></td>
<td sorttable_customkey="$str4"></td>
<td sorttable_customkey="$str5"></td>
<td sorttable_customkey="$str6"></td>
</tr>
"
I tried using code like the following to do it.
$patterns = array();
$patterns[0]='/^/';
$patterns[1]='/\t\t+/';
$patterns[2]='/\t/';
$patterns[3]='/$/';
$replacements = array();
$replacements[0]='\t\t<tr>\r\n\t\t\t<td class="title">';
$replacements[1]='</td>\r\n\t\t\t<td sorttable_customkey="';
$replacements[2]='"></td>\r\n\t\t\t<td sorttable_customkey="';
$replacements[3]='"></td>\r\n\t\t</tr>\r\n';
for ($i=0; $i<count($lines); $i++) {
$lines[$i] = preg_replace($patterns, $replacements, $lines[$i]);
}
The problem is that the escaped characters (tabs and newlines) in the replacement array remain escaped in the destination string and I get the following string.
"\t\t<tr>\r\n\t\t\t<td class="title">$str</td>\r\n\t\t\t<td sorttable_customkey="$str2"></td>\r\n\t\t\t<td sorttable_customkey="$str3"></td>\r\n\t\t\t<td sorttable_customkey="$str4"></td>\r\n\t\t\t<td sorttable_customkey="$str5"></td>\r\n\t\t\t<td sorttable_customkey="$str6"></td>\r\n\t\t</tr>\r\n"
Strangely, this line I tried earlier on does work:
$data=preg_replace("/\t+/", "\t", $data);
Am I missing something? Any idea how to fix it?
You need double quotes or heredocs for the replacement string - PCRE only parses those escape characters in the search string.
In your working example preg_replace("/\t+/", "\t", $data) those are both literal tab characters because they're in double quotes.
If you changed it to preg_replace('/\t+/', '\t', $data) you can observe your main problem - PCRE understands that the \t in the search string represents a tab, but doesn't for the one in the replacement string.
So by using double quotes for the replacement, e.g. preg_replace('/\t+/', "\t", $data), you let PHP parse the \t and you get the expected result.
It is slightly incongruous, just something to remember.
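The difference is easy to check with a short experiment; the variable names below are only for illustration.

```php
<?php
$data = "a\t\t\tb";  // "a", three real tabs, "b"

// Single-quoted replacement: PHP passes a backslash and a "t" through unchanged,
// and PCRE does not interpret \t in the replacement string.
$single = preg_replace('/\t+/', '\t', $data);

// Double-quoted replacement: PHP turns \t into a real tab before preg_replace() runs.
$double = preg_replace('/\t+/', "\t", $data);

// strlen() shows the difference: 4 bytes (a, \, t, b) versus 3 bytes (a, TAB, b).
$lengths = [strlen($single), strlen($double)];
```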
Your $replacements array has all its strings declared as single-quoted strings.
That means that escape sequences such as \t and \n are not interpreted (only \' and \\ are).
It is not related directly to PCRE regular expressions, but to how PHP handles strings.
Basically you can type strings like these:
<?php # String test
$value = "substitution";
$str1 = 'this is a $value that does not get substituted';
$str2 = "this is a $value that does not remember the variable"; # this is a substitution that does not remember the variable
$str3 = "you can also type \$value = $value"; # you can also type $value = substitution
$bigstr =<<< MARKER
you can type
very long stuff here
provided you end it with the single
value MARKER you had put earlier in the beginning of a line
just like this:
MARKER;
tl;dr version: the problem is the single quotes in $replacements, which should be double quotes ($patterns can stay single-quoted, because PCRE itself understands \t)
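One caveat when applying this to the code in the question: preg_replace() with arrays runs each pattern on the result of the previous one, so once the replacements contain real tabs, the later '/\t\t+/' and '/\t/' patterns would also match the indentation that the first replacement inserted. A sketch that sidesteps this by splitting the line first is shown below. The function name line_to_table_row and the htmlspecialchars() escaping are assumptions added for illustration.

```php
<?php
// Hypothetical helper: build the indented <tr> block from one tab-delimited line.
function line_to_table_row(string $line): string
{
    // Split on runs of tabs, so the double tab after the first field is handled too.
    $fields = preg_split('/\t+/', rtrim($line, "\r\n"));
    $title  = array_shift($fields);

    // Double quotes make \t, \r and \n real control characters.
    $row = "\t\t<tr>\r\n\t\t\t<td class=\"title\">" . htmlspecialchars($title) . "</td>\r\n";
    foreach ($fields as $key) {
        $row .= "\t\t\t<td sorttable_customkey=\"" . htmlspecialchars($key) . "\"></td>\r\n";
    }

    return $row . "\t\t</tr>\r\n";
}

$row = line_to_table_row("Title\t\t10\t20");
```

Each element of $lines from the question could then be passed through this helper, for example with array_map().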

Remove excess whitespace from within a string

I receive a string from a database query, then I remove all HTML tags, carriage returns and newlines before I put it in a CSV file. Only thing is, I can't find a way to remove the excess white space from between the strings.
What would be the best way to remove the inner whitespace characters?
Not sure exactly what you want but here are two situations:
If you are just dealing with excess whitespace on the beginning or end of the string you can use trim(), ltrim() or rtrim() to remove it.
If you are dealing with extra spaces within a string, consider a preg_replace that replaces each run of whitespace (\s+) with a single space " ".
Example:
$foo = preg_replace('/\s+/', ' ', $foo);
$str = str_replace(' ','',$str);
Or, replace with underscore, &nbsp;, etc.
None of the other examples worked for me, so I used this one:
trim(preg_replace('/[\t\n\r\s]+/', ' ', $text_to_clean_up))
This replaces all tabs, newlines, double spaces, etc. with a single space.
$str = trim(preg_replace('/\s+/',' ', $str));
The above line of code will remove extra spaces, as well as leading and trailing spaces.
If you want to replace only multiple spaces in a string, for Example: "this string have lots of space . "
And you expect the answer to be
"this string have lots of space", you can use the following solution:
$strng = "this string have lots of space . ";
$strng = trim(preg_replace('/\s+/',' ', $strng));
echo $strng;
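Note that \s matches tabs and newlines as well as spaces. If only runs of ordinary spaces should be collapsed, the pattern can be narrowed, as in this sketch (both function names are hypothetical):

```php
<?php
// Collapse every run of whitespace (spaces, tabs, newlines) into one space.
function collapse_whitespace(string $text): string
{
    return trim((string) preg_replace('/\s+/', ' ', $text));
}

// Collapse only runs of two or more ordinary spaces; tabs and newlines are kept.
function collapse_spaces_only(string $text): string
{
    return trim((string) preg_replace('/ {2,}/', ' ', $text));
}

$all    = collapse_whitespace("  this \t\n string   has lots  of space  ");
$spaces = collapse_spaces_only("first   field\tsecond    field");
```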
There were security flaws in preg_replace() when it was used with the /e modifier and a payload from user input [or other untrusted sources]: that modifier evaluated the replacement string as PHP code, which risked code injection. The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0; without it, preg_replace() does not eval() anything.
In my own application, as I only deal with short strings, I made a slightly more processor-intensive function that avoids regular expressions entirely.
function secureRip(string $str): string { /* Rips all whitespace securely. */
$arr = str_split($str, 1);
$retStr = '';
foreach ($arr as $char) {
$retStr .= trim($char);
}
return $retStr;
}
$str = preg_replace('/[\s]+/', ' ', $str);
You can use:
$str = trim(str_replace("  ", " ", $str));
This removes extra whitespaces from both sides of string and converts two spaces to one within the string. Note that this won't convert three or more spaces in a row to one!
Another way I can suggest is using implode and explode that is safer but totally not optimum!
$str = implode(" ", array_filter(explode(" ", $str)));
My suggestion is using a native for loop or using regex to do this kind of job.
To expand on Sandip’s answer, I had a bunch of strings showing up in the logs that were mis-coded in bit.ly. They meant to code just the URL but put a twitter handle and some other stuff after a space. It looked like this
?productID=26%20via%20#LFS
Normally, that wouldn’t be a problem, but I’m getting a lot of SQL injection attempts, so I redirect anything that isn’t a valid ID to a 404. I used the preg_replace method to make the invalid productID string into a valid productID.
$productID=preg_replace('/[\s]+.*/','',$productID);
I look for a space in the URL and then remove everything after it.
I recently wrote a simple function which removes excess white space from a string without a regular expression: implode(' ', array_filter(explode(' ', $str))).
Laravel 9.7 introduced the new Str::squish() method to remove extraneous whitespace, including extraneous whitespace between words: https://laravel.com/docs/9.x/helpers#method-str-squish
$str = "I  am   a PHP    Developer";
$str_length = strlen($str);
$str_arr = str_split($str);
for ($i = 0; $i < $str_length; $i++) {
if (isset($str_arr[$i + 1]) && $str_arr[$i] == ' ' && $str_arr[$i] == $str_arr[$i + 1]) {
unset($str_arr[$i]);
}
else {
continue;
}
}
echo implode("", $str_arr);
