I am using a shared server with a PHP script that sends a number to my application. The server likes to mess with me by adding an invisible ' to that number. I've been trying for around 6 hours to fix this, but there is no way for me to access the '. As soon as I convert the number (in string form) to a real number, it gives an error saying the number looks like this: '200. You never see the ' in logcat unless the error occurs. If I log the string's length, it counts one more character than there should be. It works when I test it locally on my computer, but when I upload it to the shared server it adds the '. Does anyone know why this is happening? Also, is there any method to convert a string with a ' in it to a number without using any string manipulation methods, since there is no way to access the '?
Try trimming the string before converting:
String number = dataFromServer.trim();
Integer.parseInt(number);
Are you sure the invisible thing is the quote character? It may be that the code that generates the error message puts quotes around the string, and that something after the digits eats the closing quote when displayed (for example, a leading UTF-16 surrogate code unit, or a U+0000). Try logging the numeric code units of the string you receive, one by one.
Is your PHP script file encoded as "UTF-8 with BOM"?
If so, try "UTF-8 without BOM": the BOM is a nasty, sometimes invisible character that gets added at the beginning of the file and is sometimes passed through by the server as part of the output.
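One way to test the BOM theory on the PHP side is to look at the first three bytes of the script file or of the response body. The sketch below is only an illustration: the helper names has_utf8_bom and strip_utf8_bom are made up for this example and are not part of PHP.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true when $bytes starts with a UTF-8 BOM.
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: returns $bytes without a leading BOM, if any.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

$response = UTF8_BOM . "200";          // what the client might receive
echo strlen($response), "\n";          // 6 bytes instead of the expected 3
echo strip_utf8_bom($response), "\n";  // 200
```

To check the script itself, the same helper could be fed file_get_contents() of the script's path.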
I am trying to extract n characters from a string using
substr($originalText,0,250);
The nth character is an en-dash, so I get the last character as â€ when I view it in Notepad. In my editor, Brackets, I can't even open the log file, since it only supports UTF-8 encoding.
I also cannot run json_encode on this string.
However, when I use substr($originalText,0,251), it works just fine. I can open the log file and it shows an en-dash instead of â€. json_encode also works fine.
I can use mb_convert_encoding($mystring, "UTF-8", "Windows-1252") to circumvent the problem, but could anyone tell me why having these characters at the end specifically causes an error?
Moreover, on doing this, my log file shows â€ in brackets, which is confusing too.
My question is: why is having the en-dash at the end of the string different from having it anywhere else (followed by other characters)?
Hopefully my question is clear, if not I can try to explain further.
Thanks.
Pid's answer explains why this is happening; this answer just looks at what you can do about it...
Use mb_substr()
The multibyte string module was designed for exactly this situation, and provides a number of string functions that handle multibyte characters correctly. I suggest having a look through there as there are likely other ones that you will need in other places of your application.
You may need to install or enable this module if you get a function not found error. Instructions for this are platform dependent and out-of-scope for this question.
The function you want for the case in your question is mb_substr(). You call it the same way as substr(), but it takes an additional optional argument for the character encoding.
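As a rough illustration of the difference, the sketch below cuts a short string both ways. It assumes the mbstring extension is enabled, that the text is UTF-8, and it uses a made-up sample string in place of the real $originalText.

```php
<?php
// Assumes the mbstring extension is enabled and the text is UTF-8.
$originalText = "abc\u{2013}def";   // "\u{2013}" is an en dash (3 bytes in UTF-8); needs PHP 7+

$byByte = substr($originalText, 0, 4);              // cuts after 4 bytes
$byChar = mb_substr($originalText, 0, 4, 'UTF-8');  // cuts after 4 characters

var_dump(mb_check_encoding($byByte, 'UTF-8'));  // bool(false): the dash was split
var_dump(mb_check_encoding($byChar, 'UTF-8'));  // bool(true)
echo bin2hex($byChar), "\n";                    // 616263e28093
```

Passing the encoding explicitly avoids depending on the configured internal encoding.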
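UTF-8 is a variable-length encoding: it goes beyond ASCII by using multi-byte sequences (a lead byte followed by continuation bytes) to accommodate many more characters. (Surrogates are a UTF-16 concept and are not involved here.)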
A single UTF-8 character may be coded into one, two, three or four bytes, depending on the character.
You cut the string right in the middle of a multi-byte character:
[<-character->]
[byte-0|byte-1]
       ^
You cut the string right here in the middle!

[<-----character---->]
[byte-0|byte-1|byte-2]
       ^      ^
Or anywhere here if it's 3 bytes long.
So the decoder has the first byte(s) but can't read the entire character because the string ends prematurely.
This causes all the effects you are witnessing.
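The cut can be reproduced in a few lines. This is only a sketch with a made-up sample string; it shows that the en-dash occupies three bytes and that dropping the last one leaves a string json_encode() rejects. The two leftover bytes, E2 80, are what a Windows-1252 viewer such as Notepad renders as â€.

```php
<?php
$dash = "\xE2\x80\x93";              // en dash, U+2013, as UTF-8 bytes
echo strlen($dash), "\n";            // 3

$whole = "range 1" . $dash;          // dash fully included
$cut   = substr($whole, 0, -1);      // last byte of the dash removed

var_dump(json_encode($whole) !== false);          // bool(true)
var_dump(json_encode($cut));                      // bool(false) on PHP 7 and later
var_dump(json_last_error() === JSON_ERROR_UTF8);  // bool(true)
```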
The solution to this problem is here in Dezza's answer.
I am trying to send a URL like this to search for data:
http://localhost/project/search/text:75%
I am getting a 400 - Bad Request error here.
I even tried replacing the percent sign with %25, but it didn't work. How should I send search data containing a percent sign?
In URLs, the % character is reserved for percent-encoding.
Usually you can represent a literal % as %25, but since you have already tried that and it doesn't work for you, use PHP's urlencode function instead, like so:
$url = urlencode("text:75%");
The same problem applies to :, so encoding the value takes care of that character as well (for reference, it becomes %3A).
Partially from this question.
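A minimal sketch of building the full URL follows. It assumes the search term is a single path segment and reuses the base URL from the question; only the segment is encoded, not the whole URL. rawurlencode() is used here because it is the variant usually recommended for path segments. For this particular input it produces the same result as urlencode(), since the two differ only in how they encode spaces. Whether the server then accepts %25 in the path depends on its configuration, which is not shown in the question.

```php
<?php
// Base URL taken from the question; the search term is the part to encode.
$base   = 'http://localhost/project/search/';
$search = 'text:75%';

// rawurlencode() percent-encodes everything except letters, digits and -_.~
$url = $base . rawurlencode($search);

echo $url, "\n";   // http://localhost/project/search/text%3A75%25
```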
I'm using PHP and mysqli, and I've run into a problem with an insert query, which looks like:
SET NAMES 'utf8'
$text = mysqli_real_escape_string($connection, $text);
insert into table values('', '".$text."');
Pages are encoded as UTF-8 without BOM, and MySQL uses utf8_general_ci.
The problem is that when I use phpMyAdmin the query works fine, but when I use the website interface and type text containing the character "+", it is replaced with a space " " in MySQL, while all other characters like ', ", accents, \, /, % are inserted correctly...
It worked before, so I probably made a mistake.
Thank you in advance, and sorry for my poor English.
It is neither MySQL, nor mysqli, nor PHP.
None of them attaches any special meaning to this character.
If you verify your inserts by simply echoing $text before the insert, you will see that it has already been stripped of the + sign. So you have to find the code that strips that symbol out.
A program is not a "black box" that you feed with data and that returns some unexpected output, but rather a set of operators, each performing some data manipulation.
So you have to debug your code, which means echoing your $text variable at various points to see where it gets changed. Most likely it is getting some unnecessary treatment. After finding that code, you can either remove it or ask here whether it is OK or not.
The only possible case of automatic replacement of the + character would be if you typed your text right into the browser's address bar. In that case + can be replaced with a space automatically, because PHP decodes urlencoded text and + is used to represent the space character in a URL.
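The decoding rule can be seen with parse_str(), which applies the same urlencoded decoding that PHP uses when it fills $_GET. This is a sketch with made-up values. It would also apply to any request whose query string or body is assembled by hand without encoding, although nothing in the question confirms that this is what the site does.

```php
<?php
// In a urlencoded query string or form body, a bare "+" stands for a space.
parse_str('text=a+b', $raw);
echo $raw['text'], "\n";       // a b

// Encoding the value first turns "+" into %2B, which survives decoding.
parse_str('text=' . urlencode('a+b'), $encoded);
echo $encoded['text'], "\n";   // a+b
```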
I've run into an interesting behavior in the native PHP 5 implementation of json_encode(). Apparently, when serializing an object to a JSON string, the encoder will null out any properties that are strings containing "curly" quotes, the kind that might be copy-pasted out of MS Word documents with auto-conversion enabled.
Is this expected behavior for the function? What can I do to force these kinds of characters to convert to their basic equivalents? I've checked for character encoding mismatches between the database returning the data and the administration page that inserts it, and everything is set up correctly. It definitely seems like the encoder just refuses these values because of these characters. Has anyone else encountered this behavior?
EDIT:
To clarify;
MS Word will take standard quotation marks and apostrophes and convert them to more aesthetic "fancy" or "curly" quotes. These characters can cause problems when placed in content managers that have charset mismatches between their editing interface (in the HTML) and the database encoding.
That's not the problem here, though. For example, I have a json_object representing a person's profile and the string:
Jim O’Shea
(the Unicode escape for that apostrophe being \u2019)
will come out null in the JSON object when fetched from the database and directly json_encoded.
{"model_name":"Bio","logged":true,"BioID":"17","Name":null,"Body":"Profile stuff!","Image":"","Timestamp":"2011-09-23 11:15:24","CategoryID":"1"}
Never had this specific problem (i.e. with json_encode()) but a simple - albeit a bit ugly - solution I have used in other places is to loop through your data and pass it through this function I got from somewhere (will credit it when I find out where I got it):
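function convert_fancy_quotes($str) {
    // chr(145)-chr(148) and chr(151) are the Windows-1252 curly quotes and em dash (single bytes).
    $fancy = array(chr(145), chr(146), chr(147), chr(148), chr(151));
    return str_replace($fancy, array("'", "'", '"', '"', '-'), $str);
}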
json_encode has the nasty habit of silently dropping strings that it finds invalid (i.e. non-UTF8) characters in. (See here for background: How to keep json_encode() from dropping strings with invalid characters)
My guess is the curly quotes are in the wrong character set, or get converted along the way. For example, it could be that your database connection is ISO-8859-1 encoded.
Can you clarify where the data comes from in what format?
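To illustrate that guess, here is a sketch that assumes the name arrives as Windows-1252 bytes (the question does not confirm this) and that the mbstring extension is available. On PHP 7 and later, json_encode() returns false for the raw bytes. As far as I know, older PHP 5 releases produced null instead, which would match the output in the question.

```php
<?php
// Assumption: the database connection delivers Windows-1252 bytes.
$name = "Jim O\x92Shea";   // 0x92 is the curly apostrophe in Windows-1252

var_dump(json_encode($name));   // bool(false) on PHP 7 and later: not valid UTF-8

// Requires the mbstring extension.
$utf8 = mb_convert_encoding($name, 'UTF-8', 'Windows-1252');
echo json_encode($utf8), "\n";  // "Jim O\u2019Shea"
```

If the connection charset turns out to be the cause, setting it once (for example with mysqli_set_charset()) is probably cleaner than converting every string.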
If I ever need to do that, I first copy the text into Notepad and then copy it from there. Notepad forces it to be normal quotes. Never had to do it through code though...
I sometimes import data from CSV files that were provided to me, into a mysql table.
In the last one I did, some of the entries have a weird bad character in front of the actual data, and it got imported into my database. Now I'm looking for a way to clean it up.
The bad data is in the mysql column 'email', it seems to be always right in front of the actual data. When trying to print it on my screen using PHP, it shows up as �. When exporting it to a CSV file, it looks like  , and if I SET CHARACTER SET utf8 before printing it on the screen using PHP, it looks like a normal space ' '.
I was thinking of writing a PHP script that goes over all my rows one at a time, fixes the email address field, and updates the row. However, I'm not quite sure about the "fix the email" part!
I was thinking of maybe doing an "explode" using the bad character as the delimiter, but I don't know how to type that character into my code.
Is there maybe a way to find the underlying value/utf8/hex or whatever of that character, then find it in the string?
I hope it's clear enough.
Thanks
EDIT:
In Hex, it looks like it's A0. What can I do to search and delete a character by its hex value? Either in PHP or directly in MySQL I guess ...
SELECT HEX(field) FROM table; should help determine the character.
As an alternative solution, it might actually be easier to fix the issue at the source. I've encountered similar problems with CSV files exported from Excel and have generally found that using something along the lines of...
mb_convert_variables('UTF-8', 'Windows-1252', $sourceLine); // converts $sourceLine in place
...tends to rectify the issue. (That said, you'll need to ensure that you have the multi byte string extension compiled in/enabled.)
You can trim any leading unprintable ASCII character with something like:
update t set email = substr(email, 2) where ascii(email) not between 32 and 126;
You can get the ASCII value of the offending character with this:
select ascii(email) as first_char from t;
I think I found a PHP answer that seems to work more reliably:
$newemail = preg_replace('/\xA0/', '', $row['oldemail']);
And then I'm going to update the row with the new email.
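One caveat with matching the single byte A0: if the column is actually stored as UTF-8, the non-breaking space is the two bytes C2 A0, and removing only A0 would leave a stray C2 behind. The sketch below handles both cases; strip_nbsp is a made-up helper name, and the addresses are placeholders.

```php
<?php
// Hypothetical helper: removes non-breaking spaces from $value.
function strip_nbsp(string $value): string
{
    // preg_match() with the /u flag returns 1 only when the subject is valid UTF-8.
    if (preg_match('//u', $value) === 1) {
        // UTF-8 data: U+00A0 is stored as the two bytes C2 A0.
        return preg_replace('/\x{00A0}/u', '', $value);
    }
    // Single-byte data (e.g. Latin-1 or Windows-1252): NBSP is the lone byte A0.
    return str_replace("\xA0", '', $value);
}

echo strip_nbsp("\xA0user@example.com"), "\n";      // user@example.com
echo strip_nbsp("\xC2\xA0user@example.com"), "\n";  // user@example.com
```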