Where is the mysterious no-break space %a0 that breaks my JSON coming from?

I give up trying to solve this by myself, I need help! I have been working on a WordPress project that has a few features working with AJAX. After updating PHP to 5.6 (as the latest WordPress required), many of my AJAX functions broke because a mysterious no-break space character, %a0, appears in the response and breaks the JSON structure.
The response is from json_encode().
JSON response I am getting:
{
"term_id":75,
"name":"iPhone
3G",
"slug":"iphone-3g"
},
Investigation:
After many hours of reading about this, I tried several solutions that worked for others, but they didn't work for me.
Turning off magic_quotes_gpc in php.ini
Escaping the string with preg_replace on the server side for all the different no-break/new-line special characters
Escaping the string on the client side with str.replace
I checked the database, and there is no %a0 for that entry; there is a space, %20, which is correct. I also noticed that if I remove that space, the same thing happens to the next item that has a space %20.
I should also mention that the iPhone 3G example above is not unique. After this item, a few items come through clean (even those with a space %20), but then it happens again further down the list with other items, same situation.
So it appears that PHP is replacing %20 with %a0 every so often.
What should I do?
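One way to narrow this down is to look at the raw bytes of the value at each stage (straight from the database, just before json_encode(), and in the response). Below is a minimal sketch; findNonAsciiBytes() is a hypothetical helper name, not part of WordPress or PHP, and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: report every byte outside printable ASCII,
// keyed by its offset in the string, as two hex digits.
function findNonAsciiBytes($s)
{
    $hits = [];
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($s[$i]);
        if ($byte < 0x20 || $byte > 0x7E) {
            $hits[$i] = sprintf('%02x', $byte);
        }
    }
    return $hits;
}

$plain = "iPhone 3G";        // ordinary space, byte 20
$nbsp  = "iPhone\xC2\xA03G"; // UTF-8 no-break space, bytes C2 A0

print_r(findNonAsciiBytes($plain)); // empty array
print_r(findNonAsciiBytes($nbsp));  // offset 6 => c2, offset 7 => a0
echo bin2hex($nbsp), "\n";          // 6950686f6e65c2a03347
```

If the bytes are clean when read from the database but not just before encoding, the change is happening somewhere in between (a filter, a conversion, or the connection character set).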

Related

php remove unknown characters

I am building a web application which will run in electron with angular as a frontend framework and laravel as a backend framework. In the application it's possible to login with a smartcard (thanks to node-pcsclite), it reads the bytes on the smartcard and then I convert them.
The smartcard contains a code which is linked to the staff table in my MSSQL database. I can retrieve the code from the smartcard and I can log into the application when it uses mysql as database server.
Now when I'm trying to do the same but with mssql, I get an error which should be viewed in html mode instead of the error page itself.
(The code can be alphanumeric)
So it adds all these strange characters (probably non-existing characters), not that much of a problem right? At least, that's what I thought. So I tried to fix it by using this code inside my laravel controller:
preg_replace('/[^A-Za-z0-9\-]/', '', $string);
This didn't solve anything. Then I thought I might have a problem with the query, so I ran SQL Profiler, the problem is that (probably because of the special characters) the query is broken.
select top 1 * from [Staff] where [CodeInit] = '
go
So does anyone know how to really remove the strange characters?
If you need more information feel free to ask.
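For reference, a minimal sketch of how such a whitelist filter behaves. It assumes the card data arrives as a byte string with padding or control bytes around the code; the sample value is invented. Note that preg_replace() returns the cleaned string rather than modifying its argument, so the result has to be assigned (the snippet in the question does not show whether that was done).

```php
<?php
// Invented sample: an alphanumeric code followed by NUL padding and a stray high byte.
$raw = "AB12-X\x00\x00\xFF";

// Without the /u modifier the pattern works byte by byte, so every byte
// outside A-Z, a-z, 0-9 and "-" is dropped, including NUL.
$clean = preg_replace('/[^A-Za-z0-9\-]/', '', $raw);

echo bin2hex($raw), "\n"; // shows the hidden bytes in the input
echo $clean, "\n";        // AB12-X
```

The cleaned variable, not the original one, is what would then be passed to the query.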

I had this problem and landed on this question when searching for a solution. I was unable to find any fix.
The string with non-printable characters was retrieved from mdecrypt_generic(), so I wanted a way to remove those characters. When I copy and paste the retrieved value from the browser into the Brackets text editor, it shows these red dots.
I just pasted it into Google and it was encoded to %10. Nothing has helped so far, so as a temporary solution I just used rtrim() to remove those dots.
Copy the dot in Brackets and paste it in place of "DOT_HERE".
rtrim(rtrim($pvp, "DOT_HERE"), "\0\4");
"\0\4" removes only NULs and EOT, not that dot character (%10).
Further, here is a screenshot with that red dot. You can use the Brackets text editor to see it.
Note that $pvp is the decrypted text.
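rtrim() also accepts escape sequences and ranges in its character list, so the dot does not have to be pasted literally. A sketch, assuming the unwanted bytes are trailing control characters (0x00 to 0x1F) and that the real text never ends with one; the sample value of $pvp is made up.

```php
<?php
// Made-up decrypted value: text followed by DLE (0x10), EOT (0x04) and NUL bytes.
$pvp = "secret text\x10\x10\x04\x00\x00";

// Name the bytes explicitly: "\x10" is the %10 character.
$explicit = rtrim($pvp, "\x10\x04\x00");

// Or strip any trailing control character using a range.
$ranged = rtrim($pvp, "\x00..\x1F");

var_dump($explicit); // string(11) "secret text"
var_dump($ranged);   // string(11) "secret text"
```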

PHP session string contains strange blank/null characters

Back story: I've been trying to implement a DynamoDB session handle in my Symfony2 application.
I hit a stumbling block when the session is saved to DynamoDB. It appears the string coming from PHP is in some sort of strange encoding that contains blank characters that aren't whitespace, which then prevents the string from being saved in DynamoDB correctly. The string also doesn't play nice when I paste it into PhpStorm.
Here is a sample of it:
$illegalString = 's:8:"userData";O:27:"\SomeClass":49:{s:8:"�*�email";s:27:"me#domain.com";s:13:"�*�first_name";s:4:"Greg";';
And for reference, here is a screen shot from PhpStorm showing that it isn't whitespace.
Also, if I try to move my cursor around on those characters, other characters start to appear. In the image below, my cursor is a couple of spaces to the left of the last semicolon on line 1; the quotation mark does not exist in the string, but for some reason it appears when my cursor is on it.
If you copy/paste the string above into the site below, it breaks the page: http://www.asciivalue.com/index.php
Three questions:
What is wrong with this string? What sort of funky encoding is it?
Why is PHP handling session strings this way?
How can I tell PHP to only use UTF-8 when creating session strings?
Note: This only appears to happen on AWS ec2 using the latest Linux AMI.

Those characters indicate an encoding problem somewhere: either a conversion from one encoding to another (possibly a silent one) or a wrongly specified encoding.
The sequence you have there seems to be EF BF BD (as I see it after copy-pasting it into a UTF-8 document), which stands for the REPLACEMENT CHARACTER (U+FFFD). It is used as a replacement for illegal characters when converting from one encoding to another (or when validating/cleaning up using the wrong encoding).
For example, the A0 character is valid in ISO 8859-1, but if you wrongly treat such a string as UTF-8 encoded, that byte is invalid there and will be replaced by the aforementioned sequence.
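To see this effect in isolation, here is a small sketch. It uses preg_match() with the /u modifier as a UTF-8 validity check and htmlspecialchars() with ENT_SUBSTITUTE as one example of a function that substitutes U+FFFD for invalid input. The sample string is made up, and this is not necessarily the code path involved in the question.

```php
<?php
$latin1 = "a\xA0b"; // A0 is NO-BREAK SPACE in ISO 8859-1

// preg_match() with /u returns false on a subject that is not valid UTF-8.
$isValidUtf8 = preg_match('//u', $latin1) === 1;
var_dump($isValidUtf8); // bool(false)

// Treating the bytes as UTF-8 anyway: the invalid byte becomes EF BF BD.
$substituted = htmlspecialchars($latin1, ENT_SUBSTITUTE, 'UTF-8');
echo bin2hex($substituted), "\n"; // 61efbfbd62

// Declaring the real source encoding converts it properly: A0 becomes C2 A0.
$converted = iconv('ISO-8859-1', 'UTF-8', $latin1);
echo bin2hex($converted), "\n"; // 61c2a062
```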
I suggest checking your session data before it gets saved by the session handler (especially if you use a custom one); maybe it already looks like that before it is written to the session.
Also check which session.serialize_handler you are using, especially if it is a custom one.
You can also try writing your own session handler (the part that writes the encoded data to a file or wherever; it's easy) and see what kind of data reaches the handler: is it good or already "corrupted"?
I have not used any of the AWS services myself, so cannot advise on this part.
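Separately from the encoding explanation, one thing that may be worth ruling out: PHP's serialize() marks protected properties by wrapping a `*` in NUL bytes, which would also fit the length prefixes in the sample (s:8 for "email", s:13 for "first_name"). Below is a sketch with a hypothetical User class. The base64 step is only one possible workaround for a store that is not binary-safe; it is an assumption here, not something the question or the DynamoDB documentation prescribes.

```php
<?php
// Hypothetical class standing in for the real session object.
class User
{
    protected $email = 'me@domain.com';
}

$serialized = serialize(new User());

// The property name is stored as NUL, "*", NUL, "email": 8 bytes in total.
var_dump(strpos($serialized, "\0*\0email") !== false); // bool(true)

// Encode to plain ASCII before writing, decode after reading.
$stored   = base64_encode($serialized);
$restored = unserialize(base64_decode($stored));
var_dump($restored instanceof User); // bool(true)
```

bin2hex() on the real session string would show whether the odd characters are 00 bytes (this case) or EF BF BD (a genuine replacement character).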

PHP - Can't remove strange character

I'd really appreciate some help with this. I've wasted days on this problem and none of the suggestions I have found online seem to give me a fix.
I have a CSV file from a supplier. It appears to have been exported from a Microsoft system.
I'm using PHP to import the data into MySQL (both latest versions).
I have one particular record which contains a strange character that I can't get rid of. Manual editing to remove the character is possible, but I would prefer an automated solution as this will happen multiple times a day.
The character appears to be an interpretation of a “smart quote”. A hex editor tells me that the character codes are C2 and 92. In the hex editor it looks like a weird A followed by a smart quote. In other editors and Calc, Writer etc it just appears as a box. ﾒ
I'm using mb_detect_encoding to determine the encoding. All records in the CSV file are returned as ASCII, except the one with the strange character, which is returned as UTF-8.
I can insert the offending record into MySQL and it just appears in Workbench as a square.
MySQL tables are configured to utf-8 – utf8_unicode_ci and other unusual UTF characters (eg fractions) are ok.
I've tried lots of solutions to this...
How to detect malformed utf-8 string in PHP?
Remove non-utf8 characters from string
Removing invalid/incomplete multibyte characters
How to replace Microsoft-encoded quotes in PHP
etc etc but none of them have worked for me.
All I really want to do is remove or replace the offending character, ideally with a search and replace for the hex values but none of the examples I have tried have worked.
Can anyone help me move forward with this one please?
EDIT:
Can't post answer as not enough reputation:
Thanks for your input. Much appreciated.
I'm just going to go with the hex search and replace:
$DodgyText = preg_replace("/\xEF\xBE\x92/", "" ,$DodgyText);
I know it's not the elegant solution, but I need a quick fix and this works for me.

Another solution is:
$contents = iconv('UTF-8', 'Windows-1251//IGNORE',$contents);
$contents = iconv('Windows-1251', 'UTF-8//IGNORE',$contents);
Here you can replace Windows-1251 with your local encoding.

At a quick glance, this looks like a UTF-8 file. (UTF-8 is identical with the first 128 characters in the ASCII table, hence everything is detected as ASCII except for the special character.)
It should work if your database connection is also UTF-8 encoded (which it may not be by default).
How to do that depends on your database library, let us know which one you're using if you need help setting the connection encoding.
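As an illustration only, assuming PDO with the MySQL driver (the question does not say which library is used): the connection character set can be given in the DSN. The host, database name and credentials below are placeholders, and buildMysqlDsn() is a hypothetical helper.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that fixes the connection charset.
// 'utf8mb4' is MySQL's full UTF-8; 'utf8' can be passed instead to match older tables.
function buildMysqlDsn($host, $dbName, $charset = 'utf8mb4')
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

$dsn = buildMysqlDsn('localhost', 'shop'); // placeholder values
echo $dsn, "\n"; // mysql:host=localhost;dbname=shop;charset=utf8mb4

// Connecting would then look like this (commented out: it needs a real server).
// $pdo = new PDO($dsn, 'user', 'password', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
```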

Updated code based on established findings:
You can do search & replace on strings using hexadecimal notation:
str_replace("\xEF\xBE\x92", '', $value);
This returns the value with the special character removed.
That said, if your database table is UTF-8, you shouldn't need that conversion; instead you could look at the connection (or session) character set (i.e. SET NAMES utf8;). Configuring this depends on what library you use to connect to your database.
To debug the value you could use bin2hex(); this usually helps in doing searches online.
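A short sketch of both steps together, using a made-up sample value that contains the bytes EF BE 92:

```php
<?php
$value = "It\xEF\xBE\x92s"; // invented sample with the unwanted character

// bin2hex() shows exactly which bytes are present.
echo bin2hex($value), "\n"; // 4974efbe9273

// str_replace() returns a new string, so keep the result.
$cleaned = str_replace("\xEF\xBE\x92", '', $value);
echo $cleaned, "\n"; // Its
```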

What would cause an &nbsp; to turn into a unicode character?

I've got some documents on my website which users can edit via a rich text editor and then save them (to the DB) and print them. Some users are experiencing an issue (only happening on the live site) where some of the characters are getting screwed up. I've checked the DB, and the funny characters are in the DB, so it's not a display issue. It either happens when they save the document (submit the form on the site) or they've put something weird in there or their browser changed some of the characters.
The character that keeps appearing everywhere is Â . It's an accented A followed by a space. Looking at the source HTML, it appears that the affected documents had all their &nbsp;'s converted. But whenever I try it, they come out fine.
What would cause an &nbsp; to turn into a unicode character, but only in limited cases?

Misinterpreting the UTF-8 encoding as Latin-1 will cause this.
>>> u'\xa0'.encode('utf-8').decode('latin-1')
u'\xc2\xa0'
>>> print u'\xa0*'.encode('utf-8').decode('latin-1')
Â *
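The same misinterpretation can be reproduced in PHP. A sketch using iconv(): the UTF-8 bytes of a no-break space are (wrongly) declared to be ISO-8859-1 and converted to UTF-8 again.

```php
<?php
$nbspUtf8 = "\xC2\xA0"; // U+00A0 NO-BREAK SPACE encoded as UTF-8

// Wrongly declare the input as ISO-8859-1: each byte is treated as a character.
$mangled = iconv('ISO-8859-1', 'UTF-8', $nbspUtf8);

echo bin2hex($mangled), "\n"; // c382c2a0, i.e. "Â" followed by a no-break space
```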

Special characters in Flex

I am working on a Flex app that has a MySQL database. Data is retrieved from the DB using PHP, then I am using AMFPHP to pass the data on to Flex.
The problem that I am having is that the data is being copied from Word documents, which sometimes results in some of the more unusual characters not displaying properly. For example, Word uses different characters for starting and ending double quotes instead of just " (the standard double quotes). Another example is the long dash instead of -.
All of these characters result in one or more accented capital A characters appearing instead. Not only that, each time the document is saved, the characters are replaced again, resulting in an ever-increasing number of these accented A's appearing.
Doing a search and replace for each troublesome character to swap it for one of the normal characters seems to work, but obviously this requires compiling a list of all the characters that may appear and means there is scope for this continuing as new characters are used for the first time. It also seems like a bit of a brute force way of getting round the problem rather than a proper solution.
Does anyone know what causes this and have any good workarounds / fixes? I have had similar problems when using utf-8 characters in html documents that aren't set to use utf-8. Is this the same thing and if so, how do I get flex to use utf-8?
Many thanks
Adam

It is the same thing, and smart quotes aren't special as such: you will in fact be failing for every non-ASCII character. As such, a trivial ad-hoc replace for the smart quote characters will be pointless.
At some point, someone is mis-decoding a sequence of bytes as ISO-8859-1 or Windows code page 1252 when it should have been UTF-8. Difficult to say where without detail/code.
What is “the document”? What format is it? Does that format support UTF-8 content? If it does not, you will need to encode output you put into it at the document-creation phase to the encoding the consumer of that document expects, eg. using iconv.
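The ever-growing run of accented characters described in the question is what repeated mis-decoding looks like. A sketch, assuming each save re-reads UTF-8 bytes as ISO-8859-1 and writes them back as UTF-8 (where exactly that happens in the Flex/AMFPHP/MySQL chain is not known from the question):

```php
<?php
$text = "\xE2\x80\x9C"; // U+201C LEFT DOUBLE QUOTATION MARK in UTF-8 (3 bytes)

$lengths = [];
for ($save = 1; $save <= 3; $save++) {
    // Each "save" wrongly treats the current bytes as ISO-8859-1.
    $text = iconv('ISO-8859-1', 'UTF-8', $text);
    $lengths[] = strlen($text);
}

print_r($lengths); // 6, 12, 24: the byte count doubles on every save
```

Fixing the one place where the wrong encoding is assumed stops the growth; replacing individual characters afterwards does not.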
