How to parse characters in a single-quoted string? - php

To get a single-quoted string (which I cannot change) correctly parsed, I have to do the following:
$string = '15 Rose Avenue\n Irlam\n Manchester';
$string = str_replace('\n', "\n", $string);
print nl2br($string); // demonstrates that the \n's are now linebreak characters
So far, so good.
But in my given string there are characters like \xC3\xA4. There are many characters like this (beginning with \x..)
How can I get them correctly parsed as shown above with the linebreak?

You can use
$str = stripcslashes($str);
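As a quick check, here is a minimal sketch with a made-up sample string (the \xC3\xA4 pair is added here purely for illustration). stripcslashes() turns both the \n sequences and the \xHH sequences into real bytes:

```php
<?php
// Single quotes keep every backslash sequence as literal text.
$string = '15 Rose Avenue\n Irlam\n M\xC3\xA4nchester';

// stripcslashes() interprets C-style escapes: \n, \r, \t, octal and \xHH.
$parsed = stripcslashes($string);

echo nl2br($parsed);
```

Keep in mind that it consumes every other backslash escape in the string as well, so it is only a good fit when all backslashes in the input are meant as escapes.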

You can escape a \ in single quotes:
$string = str_replace('\\n', "\n", $string);
But you would need a lot of separate replacements to cover \\xC3 and so on. It is better to use preg_replace_callback() with a callback that translates each escape sequence into the corresponding byte.
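A rough sketch of that callback approach, assuming only \n and two-digit \xHH sequences need handling (the sample string is invented for the example):

```php
<?php
// Sample input for illustration; single quotes keep the backslashes literal.
$string = '15 Rose Avenue\n Irlam\n M\xC3\xA4nchester';

// Match either \xHH (two hex digits, captured) or \n, both as literal text.
$decoded = preg_replace_callback(
    '/\\\\x([0-9A-Fa-f]{2})|\\\\n/',
    function (array $m): string {
        // Group 1 is only filled when the \xHH alternative matched.
        return ($m[1] ?? '') !== '' ? chr(hexdec($m[1])) : "\n";
    },
    $string
);

echo nl2br($decoded);
```

Unlike a blanket unescape, this leaves any backslash that is not part of one of the two listed sequences untouched.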

Related

PHP Regex replace all instances of a character in similar strings

I have a file that contains a collection of strings. All of the strings begin with the same set of characters and end with the same character. I need to find all of the strings that match a certain pattern, and then remove particular characters from them before saving the file. Each string looks like this:
Data_*: " ... "
where Data_ is the same for each string, the asterisk is an incrementing integer that is either two or three digits, and the colon and the double quotation marks are the same for each string. The ... is completely different in every string and it's the part of each I need to work with. I need to remove all double quotation marks from the ... , preserving the enclosing double quotation marks. I don't need to replace them, just remove them.
So for example, I need this...
Data_83: "He said, "Yes!" to the question"
to become this...
Data_83: "He said, Yes! to the question"
I am familiar with PHP and would like to use this. I know how to do something like...
<?php
$filename = 'path/to/file';
$content = file_get_contents($filename);
$new_content = str_replace('"', '', $content);
file_put_contents($filename, $new_content);
And I'm pretty sure a regular expression will be what I'm wanting to use to find the strings and remove the extra double quotation marks. But I'm very new to regular expressions and need some help here.
EDIT:
I should have mentioned, the file is a PHP file containing an object. It looks a bit like this:
<?php
$thing = {
Data_83: "He said, "Yes!" to the question",
Data_84: "Another string with "unwanted" quotes"
}
You may use preg_replace_callback with a regex like
'~^(\h*Data_\d{2,}:\h*")(.*)"~m'
Note that you may make it safer if you specify an optional , at the end of the line: '~^(\h*Data_\d{2,}:\h*")(.*)",?\h*$~m' but you might need to introduce another capturing group then (around ,?\h*, and then append $m[3] in the preg_replace_callback callback function).
Details
^ - start of the line (m is a multiline modifier)
(\h*Data_\d{2,}:\h*") - Group 1 ($m[1]):
\h* - 0+ horizontal whitespaces
Data_ - Data_ substring
\d{2,} - 2 or more digits
: - a colon
\h* - 0+ horizontal whitespaces
" - double quote
(.*) - Group 2 ($m[2]): any 0+ chars other than line break chars, as many as possible, up to the last...
" - double quote (on a line).
The $m represents the whole match object, and you only need to remove the " inside $m[2], the second capture.
See the PHP demo:
$new_content = preg_replace_callback('~^(\h*Data_\d{2,}:\h*")(.*)"~m', function($m) {
return $m[1] . str_replace('"', '', $m[2]) . '"';
}, $content);
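For the stricter variant with the optional trailing comma, a sketch could look like the following. The third capturing group and the sample $content are assumptions made for this example, built from the lines shown in the question:

```php
<?php
// Sample content assembled from the lines in the question.
$content = implode("\n", [
    '$thing = {',
    '    Data_83: "He said, "Yes!" to the question",',
    '    Data_84: "Another string with "unwanted" quotes"',
    '}',
]);

// Group 3 keeps the optional comma and trailing whitespace so they can be re-appended.
$new_content = preg_replace_callback(
    '~^(\h*Data_\d{2,}:\h*")(.*)"(,?\h*)$~m',
    function (array $m): string {
        return $m[1] . str_replace('"', '', $m[2]) . '"' . $m[3];
    },
    $content
);

echo $new_content;
```

Anchoring on the end of the line means a stray quote followed by other text is never mistaken for the closing quote.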
Not as elegant but you could create a UDF:
function RemoveNestedQuotes($string)
{
$firstPart = explode(":", $string)[0];
preg_match('/"(.*)"/', $string, $matches, PREG_OFFSET_CAPTURE);
$tmpString = $matches[1][0];
return $firstPart . ': "' . preg_replace('/"/', '', $tmpString) . '"';
}
example:
$string = 'Data_83: "He said, "Yes!" to the question"';
echo RemoveNestedQuotes($string);
// Data_83: "He said, Yes! to the question"
One more step after str_replace with implode and explode. You can just do it like this.
<?php
$string = 'Data_83: "He said, "Yes!" to the question"';
$string = str_replace('"', '', $string);
echo $string =implode(': "',explode(': ',$string)).'"';
?>
Demo : https://eval.in/912466
Program Output
Data_83: "He said, Yes! to the question"
Just to replace " quotes
<?php
$string = 'Data_83: "He said, "Yes!" to the question"';
echo preg_replace('/"/', '', $string);
?>
Demo : https://eval.in/912457
The way I see it, you don't need to make any preg_replace_callback() calls or a convoluted run of explosions and replacements. You merely need to disqualify the 2 double quotes that you wish to retain and match the rest for removal.
Code: (Demo)
$string = 'Data_83: "He said, "Yes!" to the question",
Data_184: "He said, "WTF!" to the question"';
echo preg_replace('/^[^"]+"(*SKIP)(*FAIL)|"(?!,\R|$)/m','',$string);
Output:
Data_83: "He said, Yes! to the question",
Data_184: "He said, WTF! to the question"
Pattern Demo
/^[^"]+"(*SKIP)(*FAIL)|"(?!,?$)/m
This pattern says:
match from the start of each line until you reach the first double quote, then DISQUALIFY it.
then after the |, match all double quotes that are not optionally followed by a comma then the end of line.
While this pattern worked on regex101 with my sample input, when I transferred it to the php sandbox to whack together a demo, I needed to add \R to maintain accuracy. You can test to see which is appropriate for your server/environment.

Remove backwards slashes in string

When running a function which returns a string, I end up with backslashes before the quotation marks, like this:
$string = get_string();
// returns: Example
I suspect some kind of escaping is happening somewhere. I know I can string-replace the backslashes, but I suppose there is some kind of unescape function for cases like this?
You only need to escape quotes when it matches your starting/ending delimiter. This code should work properly:
$string = 'Example';
If your string is enclosed in single quotes ', then " doesn't need to be escaped. Likewise, the opposite is true.
Avoid using stripslashes(), as it could cause issues if single quotes need to contain slashes. A simple find/replace should work for you:
$string = 'Example';
$string = str_replace('\"', '"', $string);
echo $string; //echos Example
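To make the difference concrete, here is a small sketch with a hypothetical input (not the string from the question) that contains both escaped quotes and a backslash that should survive:

```php
<?php
// Hypothetical input: escaped double quotes plus a backslash that must be kept.
$string = 'He said \"hi\" in C:\temp';

// Targeted replace: only the backslash in front of a double quote is removed.
$viaReplace = str_replace('\"', '"', $string);

// stripslashes() removes every single backslash, including the one in the path.
$viaStrip = stripslashes($string);

echo $viaReplace, "\n", $viaStrip, "\n";
```

The targeted replace keeps C:\temp intact, while stripslashes() turns it into C:temp.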
<?php
$string = 'Example';
echo stripslashes($string);
?>

Using preg_replace not working properly

I need to replace everything in a string that is not a word, space, comma, period, question mark, exclamation mark, asterisk or &#039;. I'm trying to do it using preg_replace, but not getting the correct results:
$string = "i don&#039;t know if i can do this,.?!*!##$%^&()_+123|";
preg_replace("~(?![\w\s]+|[\,\.\?\!\*]+|&#039;|)~", "", $string);
echo $string;
Result:
i don&#039;t know if i can do this,.?!!*##$%^&()_+123|
Need Result:
i don&#039;t know if i can do this,.?!*
I don't know if you're happy to call html_entity_decode first to convert that &#039; into an apostrophe. If you are, then probably the simplest way to achieve this is
// Convert HTML entities to characters
$string = html_entity_decode($string, ENT_QUOTES);
// Remove characters other than the specified list.
$string = preg_replace("~[^\w\s,.?!*']+~", "", $string);
// Convert characters back to HTML entities. This will convert the ' back to &#039;
$string = htmlspecialchars($string, ENT_QUOTES);
If not, then you'll need to use some negative assertions to remove & when not followed by #, ; when not preceded by &#039, and so on.
$string = preg_replace("~[^\w\s,.?!*'&#;]+|&(?!#)|&#(?!039;)|(?<!&)#|(?<!&#039);~", "", $string);
The results are subtly different. The first block of code, when provided &quot;, will convert it to " and then remove it from the string. The second block will remove & and ; and leave quot behind in the result.
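For reference, a sketch of the first approach run end to end on an input that contains the &#039; entity (the input is adapted from the question; treat it as an assumption about what the real data looks like):

```php
<?php
// Input with the apostrophe stored as the &#039; entity.
$string = 'i don&#039;t know if i can do this,.?!*!##$%^&()_+123|';

// Decode entities, strip everything outside the allowed set, then re-encode.
$string = html_entity_decode($string, ENT_QUOTES);
$string = preg_replace("~[^\w\s,.?!*']+~", "", $string);
$string = htmlspecialchars($string, ENT_QUOTES);

echo $string; // i don&#039;t know if i can do this,.?!*!_123
```

Because \w also matches digits and the underscore, _123 survives here. If only letters should stay, replacing \w with a-zA-Z in the character class is one option.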

Special (escaped) characters in replacements array in preg_replace get escaped

I’m trying to modify a string of the following form where each field is delimited by a tab except for the first which is followed by two or more tabs.
"$str1 $str2 $str3 $str4 $str5 $str6"
The modified string will have each field wrapped in HTML table tags, and be on its own, indented line as so.
"<tr>
<td class="title">$str1</td>
<td sorttable_customkey="$str2"></td>
<td sorttable_customkey="$str3"></td>
<td sorttable_customkey="$str4"></td>
<td sorttable_customkey="$str5"></td>
<td sorttable_customkey="$str6"></td>
</tr>
"
I tried using code like the following to do it.
$patterns = array();
$patterns[0]='/^/';
$patterns[1]='/\t\t+/';
$patterns[2]='/\t/';
$patterns[3]='/$/';
$replacements = array();
$replacements[0]='\t\t<tr>\r\n\t\t\t<td class="title">';
$replacements[1]='</td>\r\n\t\t\t<td sorttable_customkey="';
$replacements[2]='"></td>\r\n\t\t\t<td sorttable_customkey="';
$replacements[3]='"></td>\r\n\t\t</tr>\r\n';
for ($i=0; $i<count($lines); $i++) {
$lines[$i] = preg_replace($patterns, $replacements, $lines[$i]);
}
The problem is that the escaped characters (tabs and newlines) in the replacement array remain escaped in the destination string and I get the following string.
"\t\t<tr>\r\n\t\t\t<td class="title">$str</td>\r\n\t\t\t<td sorttable_customkey="$str2"></td>\r\n\t\t\t<td sorttable_customkey="$str3"></td>\r\n\t\t\t<td sorttable_customkey="$str4"></td>\r\n\t\t\t<td sorttable_customkey="$str5"></td>\r\n\t\t\t<td sorttable_customkey="$str6"></td>\r\n\t\t</tr>\r\n"
Strangely, this line I tried earlier on does work:
$data=preg_replace("/\t+/", "\t", $data);
Am I missing something? Any idea how to fix it?
You need double quotes or heredocs for the replacement string - PCRE only parses those escape characters in the search string.
In your working example preg_replace("/\t+/", "\t", $data) those are both literal tab characters because they're in double quotes.
If you changed it to preg_replace('/\t+/', '\t', $data) you can observe your main problem - PCRE understands that the \t in the search string represents a tab, but doesn't for the one in the replacement string.
So by using double quotes for the replacement, e.g. preg_replace('/\t+/', "\t", $data), you let PHP parse the \t and you get the expected result.
It is slightly incongruous, just something to remember.
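A minimal sketch that shows the two cases side by side (the sample $data is made up):

```php
<?php
// Three real tab characters between the letters.
$data = "a\t\t\tb";

// Single-quoted replacement: the output contains a literal backslash followed by t.
$single = preg_replace('/\t+/', '\t', $data);

// Double-quoted replacement: PHP turns \t into a real tab before PCRE sees it.
$double = preg_replace('/\t+/', "\t", $data);

var_dump($single, $double);
```

The pattern can stay single-quoted in both calls because PCRE interprets \t in the search string itself.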
Your $replacements array has all its strings declared as single-quoted strings.
That means that escape sequences won't be interpreted (except \' and \\).
It is not related directly to PCRE regular expressions, but to how PHP handles strings.
Basically you can type strings like these:
<?php # String test
$value = "substitution";
$str1 = 'this is a $value that does not get substituted';
$str2 = "this is a $value that does not remember the variable"; # this is a substitution that does not remember the variable
$str3 = "you can also type \$value = $value"; # you can also type $value = substitution
$bigstr =<<< MARKER
you can type
very long stuff here
provided you end it with the single
value MARKER you had put earlier in the beginning of a line
just like this:
MARKER;
tl;dr version: the problem is the single quotes in $replacements, which should be double quotes. The $patterns can stay single-quoted, because PCRE itself understands \t in the search string.
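One caveat when switching the replacements to double quotes: preg_replace with arrays applies the patterns one after another, so once the replacements contain real tabs, the later /\t\t+/ and /\t/ patterns will also match the tabs inserted by the earlier replacements. A possible way around that, sketched here with a hypothetical helper lineToRow() that splits the line instead of chaining replacements:

```php
<?php
// Hypothetical helper: builds one table row from a tab-delimited line.
function lineToRow(string $line): string
{
    // The first field is followed by two or more tabs, the rest by single tabs.
    $parts  = preg_split('/\t\t+/', rtrim($line, "\r\n"), 2);
    $title  = $parts[0];
    $fields = isset($parts[1]) ? explode("\t", $parts[1]) : [];

    // Double quotes so that \t and \r\n become real tabs and line breaks.
    $row = "\t\t<tr>\r\n\t\t\t<td class=\"title\">" . $title . "</td>\r\n";
    foreach ($fields as $field) {
        $row .= "\t\t\t<td sorttable_customkey=\"" . $field . "\"></td>\r\n";
    }
    return $row . "\t\t</tr>\r\n";
}

echo lineToRow("Title\t\t10\t20");
```

This is a different technique from the one in the question, chosen only because it avoids the patterns re-matching their own output.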

Remove newline character from a string using PHP regex

How can I remove a new line character from a string using PHP?
$string = str_replace(PHP_EOL, '', $string);
or
$string = str_replace(array("\n","\r"), '', $string);
$string = str_replace("\n", "", $string);
$string = str_replace("\r", "", $string);
To remove several new lines it's recommended to use a regular expression:
$my_string = trim(preg_replace('/\s\s+/', ' ', $my_string));
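A short sketch of what that pattern does and does not catch. The helper name collapseWhitespace() and the sample strings are made up for the example:

```php
<?php
// Hypothetical wrapper around the one-liner above.
function collapseWhitespace(string $s): string
{
    // Only runs of two or more whitespace characters are replaced.
    return trim(preg_replace('/\s\s+/', ' ', $s));
}

$single  = collapseWhitespace("one\ntwo");      // a lone \n is left alone
$windows = collapseWhitespace("one\r\ntwo");    // \r\n counts as two characters
$blank   = collapseWhitespace("one\n\n\ntwo");  // several newlines become one space

var_dump($single, $windows, $blank);
```

So a single Unix line break is not removed by this pattern; it only collapses runs.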
Better to use,
$string = str_replace(array("\n","\r\n","\r"), '', $string);
Because some line breaks from textarea input otherwise remain as they are.
Something a bit more functional (easy to use anywhere):
function strip_carriage_returns($string)
{
return str_replace(array("\n\r", "\n", "\r"), '', $string);
}
stripcslashes should suffice (removes \r\n etc.)
$str = stripcslashes($str);
Returns a string with backslashes stripped off. Recognizes C-like \n,
\r ..., octal and hexadecimal representation.
Try this out. It's working for me.
First remove \n from the string (use a double backslash before the n).
Then remove \r from the string in the same way.
Code:
$string = str_replace("\\n", "", $string);
$string = str_replace("\\r", "", $string);
Let's see a performance test!
Things have changed since I last answered this question, so here's a little test I created. I compared the four most promising methods, preg_replace vs. strtr vs. str_replace, and strtr goes twice because it has a single character and an array-to-array mode.
You can run the test here:
        https://deneskellner.com/stackoverflow-examples/1991198/
Results
251.84 ticks using preg_replace("/[\r\n]+/"," ",$text);
81.04 ticks using strtr($text,["\r"=>"","\n"=>""]);
11.65 ticks using str_replace(["\r","\n"],["",""],$text)
4.65 ticks using strtr($text,"\r\n","  ")
(Note that it's a realtime test and server loads may change, so you'll probably get different figures.)
The preg_replace solution is noticeably slower, but that's okay. They do a different job and PHP has no prepared regex, so it's parsing the expression every single time. It's simply not fair to expect them to win.
On the other hand, in line 2-3, str_replace and strtr are doing almost the same job and they perform quite differently. They deal with arrays, and they do exactly what we told them - remove the newlines, replacing them with nothing.
The last one is a dirty trick: it replaces characters with characters, that is, newlines with spaces. It's even faster, and it makes sense because when you get rid of line breaks, you probably don't want to concatenate the word at the end of one line with the first word of the next. So it's not exactly what the OP described, but it's clearly the fastest. With long strings and many replacements, the difference will grow because character substitutions are linear by nature.
Verdict: str_replace wins in general
And if you can afford to have spaces instead of [\r\n], use strtr with characters. It works twice as fast in the average case and probably a lot faster when there are many short lines.
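To see how the outputs of these variants differ (timing aside), here is a small sketch with a made-up $text. Note that in the character mode of strtr the from and to strings have to be the same length, otherwise the extra characters are ignored, hence the two spaces:

```php
<?php
// Sample text with one Windows and one Unix line break.
$text = "line one\r\nline two\nline three";

// Array mode: both characters are removed outright.
$removed = str_replace(["\r", "\n"], ["", ""], $text);
$removedStrtr = strtr($text, ["\r" => "", "\n" => ""]);

// Character mode: each character is swapped for a space, one for one.
$spaced = strtr($text, "\r\n", "  ");

var_dump($removed, $removedStrtr, $spaced);
```

The removal variants glue neighbouring words together, while the character mode leaves one space per replaced character, so \r\n ends up as two spaces.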
Use:
function removeP($text) {
$key = 0;
$newText = "";
while ($key < strlen($text)) {
if(ord($text[$key]) == 9 or
ord($text[$key]) == 10) {
//$newText .= '<br>'; // Uncomment this if you want <br> to replace those special characters
}
else {
$newText .= $text[$key];
}
// echo $k . "'" . $t[$k] . "'=" . ord($t[$k]) . "<br>";
$key++;
}
return $newText;
}
$myvar = removeP("your string");
Note: I am not using a PHP regex here, but the newline characters are still removed.
This removes newline characters that preg_replace, str_replace or trim did not remove. Be aware that it only strips tabs (ASCII 9) and line feeds (ASCII 10); carriage returns (ASCII 13) are left in place.
