I fetch a field from a database that contains an RTF document.
For example, it could look like this:
{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{*\generator Msftedit 5.41.21.2509;}\viewkind4\uc1\pard\sa200\sl276\slmult1\lang7\f0\fs22 asdfasdf\par
a\par
sf\par
asd\par
fasd\par
\b dfas\b0\par
dfas\par
}
Now PHP fetches this as double-quoted from the database. The result is that the string is not interpreted character by character; I assume special characters like '\r' and '\n' get recognized.
How can I convert this double-quoted string to a single-quoted one so that I get all the raw characters? Or how can I make sure the value is assigned as single-quoted when I fetch it from the database?
Thanks in advance
-ralf
"Now PHP fetches this as double quoted from the database"
What? The result of mysql_fetch_row or whatever is just a string. Nothing is reinterpreted in any way. \n just stays \n. Only string literals that you write in double quotes in the PHP file are "interpreted" and then stored as a string.
There is no such thing as a single- or double-quoted string. There are only single- or double-quoted string literals in the PHP source code, from which the actual PHP strings are made.
The only problem you have now is how to process/parse the RTF data. (Assuming the data was stored in blob column so there is no complication with character encodings.)
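To illustrate the point about literals versus strings, here is a minimal sketch. The variable $fromDb is a hypothetical stand-in for a value fetched from the database (no real query is made); it is built from a single-quoted literal only so that it holds the same raw bytes the column would.

```php
<?php
// Double-quoted literal: the PHP parser turns \n into one line-feed character.
$parsed = "a\nb";

// Single-quoted literal: the backslash and the n stay two separate characters.
$raw = 'a\nb';

// Hypothetical stand-in for a value read from the database. A fetched value
// never passes through the literal parser, so its backslashes stay as they are.
$fromDb = '{\rtf1\ansi asdfasdf\par' . ' \b dfas\b0\par}';

var_dump(strlen($parsed));               // 3 bytes: a, line feed, b
var_dump(strlen($raw));                  // 4 bytes: a, backslash, n, b
var_dump(strpos($fromDb, "\r"));         // false: no carriage return in the data
var_dump(strpos($fromDb, '\rtf1') > 0);  // true: the backslash sequence is intact
```

Once the value is in a variable there is nothing left to convert; the quoting style only matters while PHP is reading the source file.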
First of all, you should invest some time in finding out who (or what) is escaping your data.
But for a quick solution, try the stripslashes() function:
$unescaped = stripslashes( $database_data );
But I urge you to try to find what is escaping the data.
This can occur:
Before inserting the data into database. This is typically caused by the PHP directive magic_quotes_gpc.
When retrieving the data from database.
Updated
I didn't understand your problem at first...
You want to keep all those backslashes but prevent \r and \n from being interpreted as carriage return and line feed...
Try a str_replace that finds all those \r and \n and replaces them with \\r and \\n.
I don't know whether \r could be part of some longer sequence, so maybe you should replace only " \r " and " \n ". You will possibly need preg_replace() for this.
Related
I have a use case where a customer needs to load JSON-serialized objects via a CSV import. Some of these objects contain strings which contain double-quotes. Typically I would simply add a '\' before the nested double-quote in order to escape it, however this seems to conflict with the parsing of the CSV file. We're using PHP 7.0 and the function "fgetcsv" to read the lines of the file. Whenever I do this I notice odd behavior after an escaped double-quote is encountered. Here's a sample row from the CSV:
"{""test"": ""\""this\"" is a test""}"
And here is how PHP reads this column using fgetcsv:
{"test": "\"this\"" is a test""}"
I have confirmed any double-quotes after the initial escaped double-quote run into this problem. Thinking the backslash may be causing issues with escaping I tried using another backslash to escape the backslash:
"{""test"": ""\\""this\\"" is a test""}"
And here's the result:
{"test": "\\"this\\" is a test"}
So while this does resolve the issue with any double-quotes beyond the first, I am left with two backslashes instead of one.
Without changing the underlying code, is there a way to escape this data so that fgetcsv will interpret it appropriately? Like so:
{"test": "\"this\" is a test"}
You could try using \" to represent a double quote instead of "".
E.g., "{\"test\": \"\\\"this\\\" is a test\"}"
Whether this works may depend on the version of PHP you are using; I'm not sure.
Alternatively, if you're using PHP 5.3 or later, you could try changing the fgetcsv parameters to set an enclosure character or escape character that doesn't conflict with JSON. See the parameters in the fgetcsv docs.
enclosure
The optional enclosure parameter sets the field enclosure character (one character only).
escape
The optional escape parameter sets the escape character (one character only).
Note: Usually an enclosure character is escaped inside a field by doubling it; however, the escape character can be used as an alternative. So for the default parameter values "" and \" have the same meaning. Other than allowing to escape the enclosure character the escape character has no special meaning; it isn't even meant to escape itself.
(emphasis in original)
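As a sketch of the "change the escape character" idea: the snippet below parses the sample row from the question with str_getcsv(), which takes the same delimiter, enclosure and escape arguments as fgetcsv(). It assumes PHP 7.4 or later, where an empty string is accepted as the escape argument and switches the escape mechanism off; on PHP 7.0, as in the question, this call is not available in that form.

```php
<?php
// The sample row from the question, exactly as it appears in the CSV file.
// In a single-quoted literal, \" stays a backslash followed by a quote.
$line = '"{""test"": ""\""this\"" is a test""}"';

// Empty escape string (assumes PHP >= 7.4): only the doubled enclosure ""
// is treated as an escaped quote, backslashes are ordinary characters.
$fields = str_getcsv($line, ',', '"', '');

// The field is now valid JSON with one backslash before each inner quote.
echo $fields[0], PHP_EOL;

$data = json_decode($fields[0], true);
echo $data['test'], PHP_EOL;
```

With fgetcsv() the same values would go in the delimiter, enclosure and escape positions after the length argument.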
In PHP, is it at all possible to output the contents of a string to show any escaped characters that may be contained within the string? I get that the whole point of escaping characters is so that they aren't treated in the usual way. But I would still like to be able to view the raw contents of a string so I can see for myself exactly how characters like \n and \r, etc. are represented. Does PHP have a method for doing this?
Use json_encode() to encode the string as JSON. The JSON notation for strings (which is, in fact, JavaScript's) uses the same escape sequences for characters like \n and \r as PHP double-quoted literals. Both JavaScript and PHP were inspired by C and copied its notation for string literals.
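A small sketch of that suggestion, with bin2hex() added as a second way to look at the bytes. Keep in mind that json_encode() by default also escapes / as \/ and non-ASCII characters as \uXXXX, so the output is a visualisation rather than a PHP literal.

```php
<?php
// A string that really contains a carriage return, a line feed and a tab.
$value = "a\r\nb\tc";

// json_encode() writes the control characters back as escape sequences.
$visible = json_encode($value);
echo $visible, PHP_EOL;   // "a\r\nb\tc" (including the surrounding quotes)

// bin2hex() shows every byte, which leaves no doubt about what is stored.
$hex = bin2hex($value);
echo $hex, PHP_EOL;       // 610d0a620963
```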
If you use single quotation marks in the literal, it should do what you need.
E.g., echo 'this\n'; will output this\n, whereas echo "this\n"; will output this followed by a newline.
I was wondering if anybody knew how to get around this problem.
I am gathering user input from a HTML form which is then posted using htmlspecialchars into PHP to avoid issues when using quotes/etc...
However, I also want to run server-side validation checks on the data being gathered through regular expressions - though I'm not sure how to go about this.
So far, I have thought of decoding the htmlspecialchars output, but because I am going to be using the strings straight away, the code could break after I run this conversion. E.g., let's say the user entered a single double quote, ", into a field. This would be converted to &quot;; if I then decode this and use it in a variable, it could end up like $string = """;, which is going to give me issues.
Any advice on this would be greatly appreciated!
You seem to misunderstand the difference between data and how this data is altered to be parseable in a certain context.
A php string can contain any data. What is stored in this string is the "raw" form: the form in which we want to manipulate the data if needed.
In certain contexts, not all characters are valid. For example, in a html textarea, the < and > characters may not be used, because they are special characters. We still want to be able to use these characters. To use special characters in a context, we escape these characters. By escaping a special character it loses its special meaning. In the context of a html textarea, the < character is escaped as the sequence &lt;. Unlike the < character, this escaped sequence does not have a special meaning in html, and thus if we send the following sequence to the browser, it knows how to parse that sequence and display the right thing: <textarea>&lt;</textarea>. When we talk about what the data is that this textarea contains, we do not say that it contains &lt;, but instead we say that it contains <.
As you said, in a php script, in a double quoted string, the " character has a special meaning. This has only to do with parsing. PHP simply does not know how to parse a sequence $str = """;. If we want to have the double quote in such a double quoted string, we need to escape it. We escape a double quote in a php double quoted string by prepending it with a \. To make a string containing a single double quote, using the double quoted notation, you would write $str = "\"";.
However, none of this matters. You are taking input from a html form. When you click the submit button, the browser reads what is in the textarea (and decodes it as html?). The browser then encodes it in the way dictated by the form tag, and sends it to the server. The server then decodes the blob of text back into its raw form. That data is passed to PHP, and it is this form you will encounter in $_POST['myTextarea'].
In conclusion: If data is encoded, realize for which context it was encoded and decode it based on that context. You do not need to escape for php quoted strings, because you are working on internal strings. There is nothing to parse. Remind yourself that when you are going to use the data somewhere, that you should take care that all special characters in your data for that particular context are escaped.
I suppose that htmlspecialchars() function is called after posting the form to PHP. Simplest solution then will be to match against regular expression first and then do htmlspecialchars().
Also, if you have a string encoded with htmlspecialchars(), then after decoding it with htmlspecialchars_decode() the PHP string simply contains the " character (what you would write as "\"" in source code), so you break nothing. There is a big difference between how you write strings by hand in a PHP file and how PHP handles them internally. You really don't need to worry about this.
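The order suggested in these answers (validate the raw value first, escape only when writing HTML) could look roughly like this. The variable $input stands in for something like $_POST['myTextarea'], and the regular expression is only an example rule, not a recommendation.

```php
<?php
// Hypothetical raw value, as it would arrive in $_POST['myTextarea'].
$input = 'He said "hi" & left';

// 1. Validate the raw data. Example rule: 1 to 100 characters, no < or >.
$isValid = preg_match('/^[^<>]{1,100}$/', $input) === 1;

// 2. Escape only at output time, for the HTML context.
$html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $html, PHP_EOL;   // He said &quot;hi&quot; &amp; left

// Decoding returns the identical raw string; the quote inside it is plain
// data and never passes through the PHP literal parser.
$roundTrip = htmlspecialchars_decode($html, ENT_QUOTES);
var_dump($roundTrip === $input);
```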
I'd like to keep a certain string in a configuration file, that is to be parsed by PHP parse_ini_file() function. However, this string contains some special characters (with codes like 0x2C or 0x3D) that need to be encoded in some way. Is there any way to write a special character with a hex code in such a file?
The proper way to escape INI values is to enclose them in "double quotes". If your string doesn't contain double quotes, you can use it as-is as a value enclosed in double quotes.
Escaping single quotes with a backslash seems to work as long as there are not two consecutive double quotes in the value, as per http://php.net/manual/en/function.parse-ini-file.php#100046
If you want to do your own escaping, you certainly can:
htmlspecialchars / htmlspecialchars_decode escapes <, >, & and ".
htmlentities / html_entity_decode will escape very aggressively (but also very safely) to HTML entities.
urlencode / urldecode will escape all non-alphanumeric characters except -_. (rawurlencode also leaves ~ unescaped).
base64_encode / base64_decode will ensure the encoded string contains only alphanumeric characters and +=/. This might be optimal for encoding binary data but doesn't preserve readability.
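A minimal sketch of two of these options, using parse_ini_string() so that no file is needed; parse_ini_file() reads the same syntax. The key names plain and encoded and the sample value are made up for illustration.

```php
<?php
// Sample value containing 0x2C (comma) and 0x3D (equals sign).
$value = 'a,b=c';

// Option 1: the value as-is inside double quotes.
// Option 2: the value base64-encoded (also quoted, since it may contain =).
$ini = 'plain = "' . $value . '"' . "\n"
     . 'encoded = "' . base64_encode($value) . '"' . "\n";

$config = parse_ini_string($ini);

var_dump($config['plain']);                   // the original value
var_dump(base64_decode($config['encoded']));  // the original value again
```

The quoted form keeps the file readable; the base64 form is the safer choice when the value may itself contain double quotes or binary data.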
If you look at the two lines below, the double quotes are not the same. The first one is what I have a problem with: its quotes are shown as strange characters like �. The double quotes in the second line are just fine.
“this is line 1.”
and
"this is line 2."
What is the difference between the two double quotes, and how can the special characters be prevented?
You should make sure that your PHP script outputs UTF-8 and that the HTML meta tag says UTF-8 as well.
For the first thing, try in PHP (before any output occurs)
header('Content-Type: text/html; charset=utf-8');
In php, you can escape most HTML specialchars with "htmlentities". See http://php.net/manual/de/function.htmlentities.php
You probably copied the first line from MS Word or MS Excel. Their double quotes are different characters and will not display properly unless the page is served with a matching charset. You need to convert them to UTF-8 and then display them on your website.
In line 1, those quotes are sometimes called "smart quotes". They are not ASCII: they are codes 147 and 148 in Windows-1252 (U+201C and U+201D in Unicode).
In line 2, those are "normal" quotes, ASCII code 34.
Because the meaning of codes above 127 depends on the character encoding in use, I try to avoid the smart quote characters.
Microsoft Word will (infamously) convert normal quotes to "smart quotes". This "feature" can be turned off in settings.
This issue occurred to me when I copy-pasted text from a Word document into a label. If you look carefully, the Word document's double quotes are a little curvy compared to HTML double quotes. Just removing the pasted double quotes and typing them again helped. ” - This is from Word. " - This is from HTML. You can see the difference yourselves.
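If retyping is not practical, the curly quotes can also be normalised in code. This is only a sketch: it assumes the text is already UTF-8 and a PHP version that supports the "\u{...}" escape (PHP 7 or later); the variable names are made up.

```php
<?php
// Line 1 from the question, written with explicit Unicode escapes.
$text = "\u{201C}this is line 1.\u{201D}";

// In UTF-8 each curly quote is three bytes, which is why it shows up as
// garbage when the page is read with a single-byte charset.
echo bin2hex("\u{201C}"), PHP_EOL;   // e2809c

// Replace both curly quotes with the plain ASCII double quote (code 34).
$plain = str_replace(["\u{201C}", "\u{201D}"], '"', $text);
echo $plain, PHP_EOL;                // "this is line 1."
```

Sending the Content-Type header with charset=utf-8, as suggested above, keeps the original characters instead of replacing them.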