Issues processing the curly apostrophe - php

One of the systems I look after receives daily CSV files from a third party. Recently the integration stopped working. I managed to pinpoint the root cause: the curly apostrophe. Once it was replaced with a regular one, the import files were processed successfully.
The third-party system that generates these files is one of the Microsoft products, MS Access I think. The system that receives and processes these files is written in PHP on a MySQL database.
Here are the questions I would like to ask:
- Is it PHP or MySQL that does not 'like' this character?
- Are there any more characters of this kind that PHP/MySQL would have issues processing?

I am not sure what a "curly apostrophe" is, but if it's the usual apostrophe (like in "it's"), then yes, it matters, as it is used as a string delimiter in MySQL.
If it's some other character, then it doesn't have any special meaning in PHP.
Anyway, you always have to format SQL query parts according to their role, to avoid syntax errors of any kind.
Please refer to my earlier answer on the matter: In PHP when submitting strings to the database should I take care of illegal characters
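As a minimal sketch of formatting a value according to its role, a prepared statement sends the data separately from the SQL text, so neither a straight nor a curly apostrophe can break the query. This assumes the pdo_sqlite extension is available; the in-memory SQLite database, the customers table and the sample names are hypothetical stand-ins for the real MySQL setup.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real MySQL server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE customers (name TEXT)');

// Hypothetical values as they might arrive from the CSV file:
// one with a straight apostrophe, one with a curly apostrophe (U+2019).
$names = ["O'Brien", "O\u{2019}Brien"];

// The placeholder keeps the value out of the SQL text entirely.
$insert = $pdo->prepare('INSERT INTO customers (name) VALUES (:name)');
foreach ($names as $name) {
    $insert->execute([':name' => $name]);
}

// Read the rows back in insertion order and compare with the input.
$stored = $pdo->query('SELECT name FROM customers ORDER BY rowid')
              ->fetchAll(PDO::FETCH_COLUMN);
var_dump($stored === $names);
```

With MySQL the same pattern applies; only the DSN and credentials passed to the PDO constructor would differ.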

Related

Handling database table entries with backslashes?

We're running into a weird edge case where we are trying to store a JSON blob in a table in our database, and that blob needs to be able to contain the \ character. So if a user were to enter \test, it needs to come back as exactly that, but instead it's coming back as a tab followed by "est".
As far as I can tell, what's happening is that when a user enters and submits "\test" it gets evaluated into "\ \test" (remove the space; I can't put two backslashes in here and have it display right) by the client and then entered into the table. I can verify that in the SQL that gets called against the table there are two backslashes. When I look at it in the table after this step, it's back to "\test". When the client loads it up again, it gets evaluated into a tab followed by "est".
We are under the impression that the second backslash is necessary so that the first backslash gets escaped and not evaluated, but maybe that is what is causing issues? I sort of assume that when the query runs one of the backslashes gets escaped anyway, but I'm not really sure what to do about that. Is there something about how our database handles backslashes that we need to look out for? Is there a way to handle this that we haven't considered?
It's a Postgres database, if that's helpful. I'd say I'm beginner to intermediate on this sort of thing. I'm looking through the documentation, but if anyone can even point me in the right direction that would be very helpful.
The Postgres version, as far as I can tell through Amazon AWS, is 9.3.
EDIT
I think I've tracked this issue down to a line in our PHP backend that I don't really understand. I'm looking at the documentation for that now and will mark this as answered, since I've verified that it's not an issue with SQL.
A backslash has, by default, no special meaning in SQL. This might be caused by whatever code is processing those values (and sending them to the database). See here for an online example: rextester.com/QLLYG57275 – a_horse_with_no_name
I'm accepting this as the answer, as I've verified that the issue is with our backend code constructing the SQL, and not with how the SQL is being handled on the database end.
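The thread does not show the offending backend line, so the following is only a sketch of one way PHP code can turn \test into a tab followed by "est": a double-quoted string literal interprets \t as an escape sequence, while a single-quoted one does not. The variable names are hypothetical.

```php
<?php
// Double-quoted PHP strings interpret escape sequences: "\t" becomes a tab.
$double = "\test";
// Single-quoted strings keep the backslash (only \\ and \' are special there).
$single = '\test';

var_dump($double === "\t" . 'est'); // true: a tab followed by "est"
var_dump(strlen($single));          // 5: backslash, t, e, s, t

// json_encode() escapes the backslash and json_decode() restores it,
// so the value survives a JSON round trip without manual doubling.
$json = json_encode(['value' => $single]);
echo $json, PHP_EOL;
$roundTrip = json_decode($json, true)['value'];
var_dump($roundTrip === $single);
```

If the value is passed through json_encode() and a parameterized query, no extra backslash should need to be added by hand.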

php remove unknown characters

I am building a web application which will run in electron with angular as a frontend framework and laravel as a backend framework. In the application it's possible to login with a smartcard (thanks to node-pcsclite), it reads the bytes on the smartcard and then I convert them.
The smartcard contains a code which is linked to the staff table in my MSSQL database. I can retrieve the code from the smartcard and I can log into the application when it uses mysql as database server.
Now when I'm trying to do the same but with mssql, I get an error which should be viewed in html mode instead of the error page itself.
(The code can be alphanumeric)
So it adds all these strange characters (probably non-existing characters), not that much of a problem right? At least, that's what I thought. So I tried to fix it by using this code inside my laravel controller:
preg_replace('/[^A-Za-z0-9\-]/', '', $string);
This didn't solve anything. Then I thought I might have a problem with the query, so I ran SQL Profiler. The problem is that the query is broken (probably because of the special characters):
select top 1 * from [Staff] where [CodeInit] = '
go
So does anyone know how to really remove the strange characters?
If you need more information feel free to ask.
I had this problem and landed on this question when searching for a solution. I was unable to find any fix.
The string with non-printable characters was retrieved from mdecrypt_generic(), so I wanted a way to remove those characters. When I copy and paste the retrieved value from the browser into the Brackets text editor, it shows these red dots.
I pasted it into Google and it was encoded as %10. Nothing has helped so far, so as a temporary solution I just used rtrim() to remove those dots.
Copy the dot from Brackets and paste it in place of "DOT_HERE":
rtrim(rtrim($pvp, "DOT_HERE"), "\0\4");
"\0\4" removes only NUL and EOT characters, not that dot character (%10).
Here is a screenshot with that red dot. You can use the Brackets text editor to see it.
Note that $pvp is the decrypted text.
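An alternative to pasting the invisible character into the source is to match it by its byte value. The sketch below assumes the unwanted bytes are ASCII control characters (the %10 seen above is byte 0x10); the helper name stripControlChars and the sample value are hypothetical. Note that it also removes tabs and newlines, which may not be wanted for other kinds of input.

```php
<?php
// Hypothetical helper: removes ASCII control characters (0x00-0x1F and 0x7F).
function stripControlChars(string $value): string
{
    return preg_replace('/[\x00-\x1F\x7F]/', '', $value);
}

// Assumed sample: a code followed by two 0x10 bytes, a NUL and an EOT byte.
$pvp = "AB12" . "\x10\x10" . "\0\x04";
$clean = stripControlChars($pvp);

var_dump($clean);          // only the printable part remains
var_dump(bin2hex($clean)); // hex dump makes any leftover bytes visible
```

For trailing bytes only, rtrim($pvp, "\x00..\x1F") uses rtrim()'s character-range syntax and leaves control characters inside the string untouched.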

Database contains #39 instead of #039

-- System is MySQL, PHP, Apache and the code is built around the Codeigniter Framework
EDIT FOR CLARITY: I am not storing data, I am trying to retrieve data that was stored some years ago (badly, as escaped data). In the database the name Fred' is stored as Fred&#39, yet when I convert Fred' using htmlspecialchars it comes out as Fred&#039. My question is: what do I need to do to make Fred' convert to Fred&#39, and likewise for any other equivalents?
Original Question
I've inherited a database from another system (Invision Power Board to be exact). The site is now custom coded using Codeigniter but is using the same member database from the old Invision Power Board site.
I've now discovered a problem whereby if a user has an apostrophe in their name, e.g. "Fred'", Codeigniter's built-in html_escape function (which just uses htmlspecialchars) converts it to Fred&#039
Yet in the database the name is saved as Fred&#39, and thus the lookup fails.
I'm not sure what Invision Power Board was doing to the string before inserting it into the database, but does anyone have any idea how I could ensure that it is converted to &#39 instead of &#039?
Simply saying "do a str_replace" or "change the data in the db" is not useful, as there are hundreds of possibilities for what could be in a user's name. A quick search for users with a # in their name (presumably a special char) shows 440 users who are currently unable to log in due to this bug in our site.
EDIT: Fixed some formatting to remove ";" so it doesn't just display an apostrophe
You can use preg_replace() to remove the leading zeros from the PHP-generated string before the comparison:
$string = 'Fred&#039';
$string = preg_replace('/&#0+([1-9]+)/', '&#$1', $string);
var_dump(str_split($string));
// str_split to show real result
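The same idea can be wrapped around htmlspecialchars() so that the lookup value is built in one step. This is only a sketch: the helper name legacyEscape is hypothetical, and it assumes the legacy data differs from htmlspecialchars() output only by the leading zeros in numeric entities, and that the stored entity ends with the ";" that was stripped from the question for display.

```php
<?php
// Hypothetical helper: escape like htmlspecialchars(), then drop leading zeros
// from numeric entities so that "&#039;" becomes "&#39;".
function legacyEscape(string $value): string
{
    $escaped = htmlspecialchars($value, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    return preg_replace('/&#0+(\d)/', '&#$1', $escaped);
}

var_dump(legacyEscape("Fred'"));
```

Named entities such as &amp; pass through unchanged, so only the numeric form is affected.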

The holy grail of cleaning input and output in php?

I've been developing and running a classified site for the last 8 months, and all the bugs were due to only one reason: how the users input their text...
My question is: is there a PHP class, a plugin, something with which I can do
$str = UltimateClean($str) before sending $str to my SQL??
PS. I also noticed the problems doubled when I started using JSON, because I also have to be careful when outputting the result in JSON.
Some issues I faced: multi-language strings (different charsets), copy-paste from Excel sheets.
Note: I am not worried about SQL injections.
No, there isn't.
Different modes of escaping are for different purposes. You cannot universally escape something.
For Databases: Use PDO with prepared queries
For HTML: Use htmlspecialchars()
For JSON: json_encode() handles this for you
For character sets: You should be using UTF-8 on your page. Do this, and set your databases accordingly, and watch those issues disappear.
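The HTML and JSON items above can be sketched as follows: the same raw value is kept as-is and escaped separately for each output context. The sample input is hypothetical, and the flags shown are one reasonable choice rather than the only one.

```php
<?php
// Hypothetical user input with markup, quotes and non-ASCII characters.
$input = "Pi\u{00F9} <b>\"cheap\"</b> & O\u{2019}Brien";

// HTML context: escape on output, not before storing.
$html = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
echo $html, PHP_EOL;

// JSON context: json_encode() does its own escaping of quotes and slashes.
$json = json_encode(['title' => $input], JSON_UNESCAPED_UNICODE);
echo $json, PHP_EOL;

// Decoding the JSON gives back the original, unescaped value.
var_dump(json_decode($json, true)['title'] === $input);
```

Keeping the stored value raw and escaping only at the point of output is what makes a single universal cleaning function unnecessary.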

Best practices about parsing multi language feed

I'm having a problem parsing data from different feeds, some of them in English, others in Italian and others in Spanish. I'm parsing using a PHP script and saving the parsed data into my MySQL database.
The problem is that when I parse items that contain "uncommon" characters, like "Strage di Viareggio Più", and then look into my database, the phrase is stored this way: "Strage di Viareggio PiÃ¹".
My database can store that kind of character, because when I input it manually it works fine. In the original feed (RSS file) the phrase is also fine, so I think it is my PHP server that is changing the letter. How can I solve this? Thanks!
Make sure that the database uses UTF-8 (as you say it does) and that the PHP script has its internal encoding set to UTF-8, which you can achieve with iconv_set_encoding (note that its internal_encoding setting is deprecated as of PHP 5.6 in favor of the default_charset ini setting). If you're reading data from an HTTP request, that should be all you need, as long as the request tags its own encoding correctly.
It looks like the input data is in UTF-8, but the charset/collation of the DB table is ASCII. I would suggest using UTF-8 everywhere.
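To illustrate why a charset mismatch garbles accented letters, here is a minimal sketch using iconv(). It assumes the UTF-8 bytes are being treated as ISO-8859-1 somewhere between the feed and the table; the thread does not confirm where that happens, and the variable names are hypothetical.

```php
<?php
// "Più" written with an escape so the example does not depend on file encoding.
$original = "Pi\u{00F9}"; // UTF-8 bytes: 50 69 c3 b9

// Treating those UTF-8 bytes as ISO-8859-1 and converting them to UTF-8 again
// turns the two-byte "ù" into two separate characters.
$garbled = iconv('ISO-8859-1', 'UTF-8', $original);

echo bin2hex($original), PHP_EOL; // 5069c3b9
echo bin2hex($garbled), PHP_EOL;  // 5069c383c2b9

// Reversing the wrong conversion restores the original bytes.
$repaired = iconv('UTF-8', 'ISO-8859-1', $garbled);
var_dump($repaired === $original);
```

Declaring UTF-8 consistently for the feed, the PHP script, the database connection and the table avoids the extra conversion in the first place.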
What you need to implement, before saving to MySQL is:
http://php.net/manual/en/function.htmlentities.php
Check these different threads for more information
Best practices in PHP and MySQL with international strings
htmlentities() makes Chinese characters unusable
What I find incredible is that this question has received -2 in the past 24 hours without any comments.
From the question posted:
I'm parsing using a PHP script and saving the parsed data into my MySQL database.
and
I think is my PHP server who is changing the letter. How can I solve this? Thanks!
The answers posted so far are related to the encoding and settings of MySQL. The person asking the question has clearly stated that he can insert special characters manually and is having no problems:
My database can use that kind character because when I input that manualy it works fine
My answer was meant to help him convert the characters into HTML entities, which circumvents the problem he is having with the RSS feed and answers the question as posted.
