Send SMS containing 'æøå' with AT commands - php

I'm making an SMS sending function for a project I'm working on. The code works just fine, but
when I send the letters 'æ-ø-å-Æ-Ø-Å' they turn into 'f-x-e-F-X-E'.
How do I change the encoding so that I can send these letters?
This is my code:
<?php
include "php_serial.class.php";
$html = $_POST['msg'];
$serial = new phpSerial;
$serial->deviceSet("/dev/cu.HUAWEIMobile-Modem");
$serial->deviceOpen();
$serial->sendMessage("ATZ\n\r");
// Wait and read from the port
var_dump($serial -> readPort());
$serial->sendMessage("ATE0\n\r");
// Wait and read from the port
var_dump($serial -> readPort());
// To write into
$serial->sendMessage("AT+cmgf=1;+cnmi=2,1,0,1,0\n\r");//
$serial->sendMessage("AT+cmgs=\"+45{$_POST['number']}\"\n\r");
$serial->sendMessage("{$html}\n\r");
$serial->sendMessage(chr(26));
//wait for modem to send message
sleep(3);
$read=$serial->readPort();
$serial->deviceClose();
$read = preg_replace('/\s+/', '', $read);
$read = substr($read, -2);
if($read == "OK") {
header("location: index.php?send=1");
} else {
header("location: index.php?send=2");
}
?>

First of all, you must seriously redo your AT command handling: read and parse every single response line given back from the modem until you get a final result code, for every single command line invocation, no exceptions whatsoever. See this answer for more details.
For AT+CMGS specifically you also MUST wait for the "\r\n> " response before sending data, see this answer for more details.
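As a rough illustration of the "read until a final result code" rule, here is a minimal sketch. The function names and the $readLine callable are hypothetical (they are not part of php_serial.class.php), and the list of final result codes is the usual V.250 / 3GPP TS 27.007 set, so check it against your modem's documentation.

```php
<?php
// Returns true if $line is a final result code (assumed list, see lead-in).
function isFinalResultCode(string $line): bool
{
    $line = trim($line);
    $plain = ['OK', 'ERROR', 'NO CARRIER', 'NO DIALTONE', 'BUSY', 'NO ANSWER'];
    if (in_array($line, $plain, true)) {
        return true;
    }
    // Extended error codes carry a numeric or textual reason after the colon.
    return strncmp($line, '+CME ERROR:', 11) === 0
        || strncmp($line, '+CMS ERROR:', 11) === 0;
}

// Collects response lines until a final result code arrives.
// $readLine is a hypothetical callable returning the next line, or null when
// nothing more can be read (for example after a timeout).
function readUntilFinalResult(callable $readLine): array
{
    $lines = [];
    while (($line = $readLine()) !== null) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip the empty lines the modem puts around responses
        }
        $lines[] = $line;
        if (isFinalResultCode($line)) {
            break;
        }
    }
    return $lines;
}
```

The caller would issue one command, call readUntilFinalResult(), and only then send the next command, instead of sleeping for a fixed time.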
Now to answer your question about æøå turning into fxe: this is a classic case of stripping the most significant bit of ISO 8859-1 encoded characters (which I had almost forgotten about). This is probably caused by the default character encoding, but since you should always be explicit and set the character encoding you want to use in any case, investigating that further is of no value. The character encoding used for strings in AT commands is controlled by the AT+CSCS command (see this answer for more details); run AT+CSCS=? to get a list of options.
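The bit-stripping claim is easy to check by hand. The sketch below (the helper name stripMsb is made up for illustration) clears the top bit of each ISO 8859-1 byte and reproduces exactly the substitution from the question.

```php
<?php
// Clears the most significant bit of every byte, as a 7-bit channel would.
function stripMsb(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $out .= chr(ord($bytes[$i]) & 0x7F);
    }
    return $out;
}

// æ ø å Æ Ø Å in ISO 8859-1 are the bytes E6 F8 E5 C6 D8 C5.
$latin1 = "\xE6\xF8\xE5\xC6\xD8\xC5";
echo stripMsb($latin1), "\n"; // prints "fxeFXE"
```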
Based on your information you seem to be using ISO 8859-1, so running AT+CSCS="8859-1" should stop the MSB from being zeroed. You might be satisfied with just that, but I strongly recommend using UTF-8 instead; it is vastly superior to 8859-1.
If all of that fails, I am quite sure that at least one of the GSM or IRA encodings supports the æøå characters, but then you have to do some very custom translation, since those characters will have binary values very different from what is common in text elsewhere.

Related

GSM modem AT commands UCS2 error 500

Three days ago I started making a simple app to send SMS. I already tested it and it works in GSM CSCS mode, but when I switch it to UCS2 it doesn't show the Cyrillic letters
<?php
error_reporting(E_ALL);
$fp = fopen("/dev/ttyUSB0", 'w+');
$msg = strtoupper(bin2hex(mb_convert_encoding("Тест", "UCS-2", "UTF-8")));
$number_raw = "+359000000000";
$number = bin2hex(mb_convert_encoding($number_raw, "UCS-2", "UTF-8"));
echo $number."<br>";
echo $msg;
$debug = false;
if(!$fp || $debug){
//echo "Can not open!";
}else{
fwrite($fp, "AT+CMGF=1".chr(13)); // OK
sleep(5);
fwrite($fp, 'AT+CSCS="UCS2"'.chr(13)); // OK
sleep(5);
fwrite($fp, 'AT+CMGS="'.$number.'"'.chr(13)); // OK
sleep(5);
fwrite($fp, $msg.chr(26)); // ERROR 500
echo "<br>Sent";
}
?>
Number and message are encoded properly according to this source: http://www.columbia.edu/kermit/ucs2.html
When the message is sent, I receive it (so the encoding of the number is correct) but the content isn't shown properly.
https://imgur.com/a/cCaTU
What are possible causes for this behaviour and could it be my PHP file encoding? Also why is linux finding 3 GSM tty devices?
After you have removed all the horrible sleep calls and implemented proper final response parsing, you then need to fix the parsing to wait for the "\r\n> " response of AT+CMGS.
Without any of that fixed, character set encoding trouble is a very minor problem. When you run AT+CSCS="UCS2" every single string will be encoded by the modem that way and must be encoded that way by you until eternity (or another character set is selected), so for instance to switch from UCS2 to UTF-8 would be AT+CSCS="005500540046002D0038" and not AT+CSCS="UTF-8".
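To make the "every string must be UCS2 encoded" point concrete, here is a small sketch of a helper (the name atUcs2Hex is hypothetical) that turns a UTF-8 string into the upper-case hex form used after AT+CSCS="UCS2". It assumes the mbstring extension is loaded and only covers characters in the Basic Multilingual Plane.

```php
<?php
// Encodes a UTF-8 string as big-endian UCS-2 and returns it as upper-case hex.
function atUcs2Hex(string $utf8): string
{
    return strtoupper(bin2hex(mb_convert_encoding($utf8, 'UCS-2BE', 'UTF-8')));
}

// The character set name itself has to be encoded once UCS2 is active:
echo 'AT+CSCS="' . atUcs2Hex('UTF-8') . '"', "\n";
// prints AT+CSCS="005500540046002D0038"
```

The same helper would apply to the phone number and the message body while UCS2 is selected.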

Get all contents of stream socket with fgets() in blocking mode

In order to complete the handshaking for Websockets in ssl, the socket must be read in blocking mode. Using stream sockets, communication is done from the php backend with the (javascript) client using fwrite() and fgets(). In blocking mode, fgets() will wait until the next line comes in, and grab one line. Once the socket connection is made, the client sends the PHP some headers so that the handshake can be completed. The problem is, I can't think of a way to find where the end of the headers are, since the order depends on the browser being used.
I used this work around for chrome (since the sec-websocket-extensions line is the last header sent)
stream_set_blocking($lsSocketNew, true);
$lcHeader = "";
while($lcLine = fgets($lsSocketNew)){
$lcHeader .= $lcLine;
if(strstr($lcLine, "Sec-WebSocket-Extensions")){
break;
}
}
but this doesn't work in other browsers like firefox, where this header is the first one sent. :P
(I think fread() is supposed to do what I am looking for -- in blocking mode it is supposed to get "everything" on the socket when it comes in... but when I tried fread instead, it was returning a blank string. :P stream_get_contents() was the same )
Although I can't give you PHP-specific advice, there are a couple of things that you may want to consider:
I. What kind of "everything" are you looking for? There are no message borders in TCP so "everything in the stream" is equivalent to "random ordered amount of data". Unfortunately, you aren't going to magically read all HTTP headers and stop there.
II. Given point I, you have to find something that separates the HTTP headers from the HTTP body. This is actually rather simple, because the headers end with a blank line. So, just read the data until you receive CRLF CRLF*. With fgets() each line comes back with its terminator attached, so the blank line shows up as a line containing only "\r\n".
III. If you're implementing WebSockets, using fgets is questionable, because the rest of the protocol (after the HTTP handshake) is binary. You may want to use PHP's dedicated sockets extension and socket_recv instead of fread. I can't say exactly how these two functions differ, but the socket_* functions are just a wrapper around BSD sockets, which are available in a wide variety of languages. Since they're mostly language agnostic, you will find more support and tutorials on the internet.
* Per the HTTP standard:
CR = <US-ASCII CR, carriage return (13)>
LF = <US-ASCII LF, linefeed (10)>
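Point II can be sketched as follows. This is only an illustration under the assumption that the handshake is read line by line with fgets() on a blocking stream; the function name readHttpHeaders is made up, and a php://memory stream stands in for the socket in the usage example.

```php
<?php
// Reads header lines from a blocking stream until the blank line that ends
// the HTTP header section, and returns the headers without that blank line.
function readHttpHeaders($stream): string
{
    $headers = '';
    while (($line = fgets($stream)) !== false) {
        if (rtrim($line, "\r\n") === '') {
            break; // blank line: end of headers, regardless of header order
        }
        $headers .= $line;
    }
    return $headers;
}

// Usage with an in-memory stream standing in for the client socket.
$demo = fopen('php://memory', 'w+');
fwrite($demo, "GET /chat HTTP/1.1\r\nHost: example.com\r\nUpgrade: websocket\r\n\r\n");
rewind($demo);
echo readHttpHeaders($demo);
```

Because the loop stops at the blank line rather than at a particular header name, it does not depend on the order in which a browser sends its headers.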

PHP: string breaks at special character

I wrote a small PHP script which does a "branding" on an existing PDF file. This means on every page I put a string like "belongs to " at a specific position. For this I use Zend_Pdf from the Zend Framework.
Because the script is used in a German-speaking area, in one line I use the special character "ö" ("Gehört zu ").
On my local machine (Windows, XAMPP) the script worked fine, but when moving it to my hoster's webspace (some Linux), the string breaks at "ö". That means only "Geh" appears in my PDF.
The code is this:
if (substr($file, strlen($file) - 4) === '.pdf') {
$name = $user->GetName;
$fontSize = 12;
$xTextPos = 100;
$yTextPos = 10;
set_include_path(dirname(__FILE__)); // set include_path for external library Zend Framework
require_once('Zend' .DS . 'Pdf.php');
$pdf = Zend_Pdf::load($file);
$font = Zend_Pdf_Font::fontWithName(Zend_Pdf_Font::FONT_HELVETICA);
$branding = 'Gehört zu ' . $name; // German for: 'Belongs to ', problem with 'ö'
foreach ($pdf->pages as &$page) {
$page->setFont($font, $fontSize);
$page->drawText($branding, $xTextPos, $yTextPos);
}
}
I guess the problem is related to some kind of default charset or language setting of the PHP environment. So I searched here and tried out:
$branding = utf8_encode('Gehört zu ') . $name;
...and I made some experiments with functions like html_entity_decode, but nothing helped, so I decided to stop groping in the dark and open my own question.
Looking forward to any hints. Thank you in advance for your help!
EDIT: Meanwhile I found the same (?) problem, solved on a German forum. But if I do it like they say...
$branding = mb_convert_encoding('Gehört zu ', 'ISO-8859-1') . $name;
... the resulting branding in the PDF is "Gehrt zu ". The "ö" is skipped now.
For this I found another hint on the Zend issue tracker.
To sum up, I can drop all the UTF-8 things and concentrate on Latin-1, AKA ISO 8859-1.
I still don't understand why the code worked on my Windows + XAMPP and now crashes on my hoster's Linux.
Your guess is right, the problem is related to encoding. Where exactly the encoding is messed up is hard to say from afar. I'm assuming you work not only with Zend_Pdf, but also have the MVC in place (meaning a complete Zend_Application).
You should check if your application serves pages as UTF-8, by setting:
resources.view.encoding = "UTF-8"
and also placing the appropriate meta-tags in your layout/view.
Depending on what Editor you use, your files may be encoded in a different encoding. You can use Notepad++ on Windows to check your file-encoding and for converting it to UTF-8 (don't just set the encoding to UTF-8, this might mess up your file!) if necessary. I recommend using Eclipse with text file encoding set to "UTF-8" (Preferences > General > Workspace) to make sure your code files are encoded in UTF-8.
Now comes the crucial part:
Zend_Pdf_Page::drawText(string $text, float $x, float $y, string $charEncoding)
See that last argument... set it. If you're lucky, you can skip the previous stuff and just set the encoding there.
edit: I missed something. Database connections. You should check the encoding there too. I frequently work with MS SQL Server, which uses Latin-1 internally; not setting driver_options.CharacterSet can mess up stuff pretty badly too. This might be relevant if you have something like Gehört zu: Günther, where the name Günther is fetched from the db.
The result also depends on the file encoding.
If you encode your file in UTF-8, for example, and use utf8_encode("ö"), then you'll be encoding to UTF-8 something that is already in UTF-8.
So you may want to check what your file encoding is, and what your PDF lib is requiring. Then apply the right formula/transformation.
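The double-encoding effect can be seen at the byte level. In this sketch mb_convert_encoding() from ISO-8859-1 to UTF-8 stands in for utf8_encode(), which does the same conversion but is deprecated as of PHP 8.2; the mbstring extension is assumed to be available.

```php
<?php
// "ö" as it is stored in a UTF-8 encoded source file: two bytes, C3 B6.
$alreadyUtf8 = "\xC3\xB6";

// Treating those bytes as Latin-1 and encoding them again yields four bytes.
$doubleEncoded = mb_convert_encoding($alreadyUtf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($doubleEncoded), "\n"; // prints c383c2b6

// Going the other way gives the single Latin-1 byte F6.
$latin1 = mb_convert_encoding($alreadyUtf8, 'ISO-8859-1', 'UTF-8');
echo bin2hex($latin1), "\n"; // prints f6
```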

How to format incoming email text for HTML display

I've set up a script that processes incoming emails and creates blog entries on Blogger. I'm using PEAR's Mail_Mime libs (for now) to read the incoming message. The messages often have characters in them that cannot be read by browsers--this happens most often when people use Outlook or cut/paste from MS Word.
So the output at the other end is something like this:
Here is a test post with “quotes” and ‘apostrophes�for what it�s worth, it also has dashes�and other strange formatting cut and paste from MS Word.
You can also see the output in the wild.
It's not hard to fix any specific instance, but each client (hotmail, gmail, outlook, etc) seems to handle things just a bit differently. Mail_Mime only seems to munge the output and, if I turn off Mail_Mime's parsing and try to translate the encoded characters myself using mb_convert_encoding or some manual simulation of this, it's even worse.
Please note that this is not going to be solved by selecting the right encoding type and using decode/encode/convert functions. The incoming formats vary from Windows-1252 to UTF-8 to just about anything else mail clients can think of.
Has anyone scripted this before that could save me some time by offering up a sample or advice on the best approach? I've tried all the simple answers and done plenty of experimenting, so please don't bother responding unless you've dealt with a similar issue successfully or have a deep understanding of encoding issues.
The only way to do this is by the spec, which I'm afraid means pulling in the 'Content-Type' MIME header, picking up the charset (it'll look like Content-Type: text/plain; charset="us-ascii"), converting from that charset to UTF-8, and of course ensuring your output on the web is sent as UTF-8 with the right headers.
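A minimal sketch of that approach, assuming the Content-Type header value has already been pulled out of the message: the two function names are hypothetical, the regular expression only handles the simple charset=... form, and mbstring is assumed to be loaded (it raises an error for charset names it does not know).

```php
<?php
// Extracts the charset parameter from a Content-Type header value.
function charsetFromContentType(string $contentType, string $default = 'us-ascii'): string
{
    if (preg_match('/charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        return strtolower($m[1]);
    }
    return $default;
}

// Converts body text from the declared charset to UTF-8.
function toUtf8(string $text, string $charset): string
{
    if ($charset === 'utf-8' || $charset === 'us-ascii') {
        return $text; // US-ASCII is already valid UTF-8
    }
    return mb_convert_encoding($text, 'UTF-8', $charset);
}

$charset = charsetFromContentType('text/plain; charset="Windows-1252"');
// Bytes 93 and 94 are the curly quotes MS Word produces in Windows-1252.
echo toUtf8("\x93quotes\x94", $charset), "\n";
```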
To solve this problem, and get my message into valid UTF-8 that is readable from a browser, I found this PHP lib, ConvertCharset by Mikolaj Jedrzejak, which worked on almost everything. It still had issues with a specific symbol (=A0) when converting from Windows-1252 or iso-8859-1. So I converted this character manually before setting the code loose.
Here's what it looks like overall:
// decode using Mail_Mime
require 'Mail.php';
require 'Mail/mime.php';
require 'Mail/mimeDecode.php';
$params['include_bodies'] = true;
$params['decode_bodies'] = true; // this decodes it!
$params['decode_headers'] = true;
$decoder = new Mail_mimeDecode($input);
$mime = $decoder->decode($params);
// too much work to put in this example
$charset = ...; //do some magic with $mime->parts to get the character set
$text = ...; //do some magic with $mime->parts to get the text
// fix the =A0 control character; it's already been decoded
// by Mail_Mime, so we need the actual byte code now
// this has to be done before trying to convert to UTF-8
$char = chr(hexdec('A0')); // the single byte 0xA0
$text = str_replace($char, '', $text);
// convert to UTF-8 using ConvertCharset
require 'ConvertCharset.class.php';
if( strtolower($charset) != 'utf-8' ) {
$converter = new ConvertCharset($charset, 'utf-8', false);
// only convert when a converter was actually created
$text = $converter->Convert($text);
}
Then everything is spiffy. It even does the infamous Iñtërnâtiônàlizætiøn conversion, as well as accepting French, Spanish, and pastes directly from MS Word :)

Dealing with eacute and other special characters using Oracle, PHP and Oci8

Hi I am trying to store names into an Oracle database and fetch them back using PHP and oci8.
However, if I insert the é directly into the Oracle database and use oci8 to fetch it back, I just receive an e.
Do I have to encode all special characters (including é) into HTML entities (i.e. &eacute;) before inserting into the database, or am I missing something?
Thx
UPDATE: Mar 1 at 18:40
found this function:
http://www.php.net/manual/en/function.utf8-decode.php#85034
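function charset_decode_utf_8($string) {
// ereg() and the /e modifier are gone in PHP 7, so preg_match() and callbacks are used
if (!preg_match('/[\x80-\x9F]/', $string) && !preg_match('/[\xA1-\xFF]/', $string)) {
return $string; // no 8-bit characters, nothing to decode
}
// decode three-byte UTF-8 sequences into numeric HTML entities
$string = preg_replace_callback('/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/', function ($m) {
return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
}, $string);
// decode two-byte UTF-8 sequences into numeric HTML entities
$string = preg_replace_callback('/([\xC0-\xDF])([\x80-\xBF])/', function ($m) {
return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
}, $string);
return $string;
}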
seems to work, although not sure if its the optimal solution
UPDATE: Mar 8 at 15:45
Oracle's character set is ISO-8859-1.
in PHP I added:
putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1");
to force the oci8 connection to use that character set.
Retrieving the é using oci8 from PHP now worked! (For VARCHARs, that is; for CLOBs I had to use utf8_encode to extract it.)
So then I tried saving the data from PHP to Oracle... and it doesn't work. Somewhere along the way from PHP to Oracle the é becomes a ?
UPDATE: Mar 9 at 14:47
So getting closer.
After adding the NLS_LANG variable, doing direct oci8 inserts with é works.
The problem is actually on the PHP side.
By using ExtJs framework, when submitting a form it encodes it using encodeURIComponent.
So é is sent as %C3%A9 and then re-encoded into é.
However its length is now 2 (strlen($my_sent_value) = 2) and not 1.
And if in PHP I try: $my_sent_value == 'é', the result is FALSE.
I think if I am able to re-encode all these characters in PHP back into single-byte characters and then insert them into Oracle, it should work.
Still no luck though
UPDATE: Mar 10 at 11:05
I keep thinking I am so close (yet so far away).
putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P9"); works very sporadically.
I created a small php script to test:
header('Content-Type: text/plain; charset=ISO-8859-1');
putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P9");
$conn= oci_connect("user", "pass", "DB");
$stmt = oci_parse($conn, "UPDATE temp_tb SET string_field = '|é|'");
oci_execute($stmt, OCI_COMMIT_ON_SUCCESS);
After running this once and logging into the Oracle database directly I see that STRING_FIELD is set to |¿|. Obviously not what I had come to expect from my previous experience.
However, if I refresh that PHP page twice quickly.... it worked !!!
In Oracle I correctly saw |é|.
It seems like maybe the environment variable is not being correctly set or sent in time for the first execution of the script, but is available for the second execution.
My next experiment is to export the variable into PHP's environment, however, I need to reset Apache for that...so we'll see what happens, hopefully it works.
I presume you are aware of these facts:
There are many different character sets: you have to pick one and, of course, know which one you are using.
Oracle is perfectly capable of storing text without HTML entities (&eacute;). HTML entities are used in, well, HTML. Oracle is not a web browser ;-)
You must also know that HTML entities are not bound to a specific charset; on the contrary, they're used to represent characters in a charset-independent context.
You talk about ISO-8859-1 and UTF-8 interchangeably. What charset do you want to use? ISO-8859-1 is easy to use but it can only store text in some Latin-alphabet languages (such as Spanish) and it lacks some common chars like the € symbol. UTF-8 is trickier to use but it can store all characters defined by the Unicode consortium (which include everything you'll ever need).
Once you've taken the decision, you must configure Oracle to hold data in such charset and choose an appropriate column type. E.g., VARCHAR2 is fine for plain ASCII, NVARCHAR2 is good for UTF-8.
This is what I finally ended up doing to solve this problem:
Modified the profile of the daemon running PHP to have:
NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
So that the oci8 connection uses ISO-8859-1.
Then in my PHP configuration set the default content-type to ISO-8859-1:
default_charset = "iso-8859-1"
When I am inserting into an Oracle Table via oci8 from PHP, I do:
utf8_decode($my_sent_value)
And when receiving data from Oracle, printing the variable should just work as so:
echo $my_received_value
However when sending that data over ajax I have had to use:
utf8_encode($my_received_value)
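The length-2 observation and the utf8_decode()/utf8_encode() steps above can be reproduced in isolation. This sketch uses mb_convert_encoding() as a stand-in for those two functions (they are deprecated as of PHP 8.2) and rawurldecode() to simulate what PHP receives after the browser sends %C3%A9; mbstring is assumed to be loaded.

```php
<?php
// What arrives in PHP after the form value "é" was sent as %C3%A9 (UTF-8).
$sent = rawurldecode('%C3%A9');
echo strlen($sent), "\n"; // prints 2: two bytes, one character

// Before inserting into the ISO-8859-1 database: same effect as utf8_decode().
$forOracle = mb_convert_encoding($sent, 'ISO-8859-1', 'UTF-8');
echo strlen($forOracle), "\n"; // prints 1

// Before sending back over ajax (JSON is UTF-8): same effect as utf8_encode().
$forAjax = mb_convert_encoding($forOracle, 'UTF-8', 'ISO-8859-1');
```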
If you really cannot change the character set that Oracle will use, then how about Base64 encoding your data before storing it in the database? That way, you can accept characters from any character set and store them as ISO-8859-1 (because Base64 outputs a subset of the ASCII character set, which maps exactly to ISO-8859-1). Base64 encoding increases the length of the string by about 33% (four output characters for every three input bytes), or roughly 37% if MIME line breaks are added.
If your data is only ever going to be displayed as HTML then you might as well store HTML entities as you suggested, but be aware that a single entity can be up to 10 characters per unencoded character, e.g. ϑ is &thetasym;
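A quick sketch of the Base64 route, using only PHP's built-in base64_encode() and base64_decode(); the variable names are illustrative. Plain base64_encode() output (no line breaks) grows by a factor of 4/3.

```php
<?php
// UTF-8 "é" becomes plain ASCII that any ISO-8859-1 column can hold.
$utf8 = "\xC3\xA9";
$stored = base64_encode($utf8);
echo $stored, "\n"; // prints w6k=

// Decoding after the fetch restores the original bytes.
$restored = base64_decode($stored);

// Size overhead: 300 input bytes become 400 output characters.
$encodedLength = strlen(base64_encode(str_repeat('x', 300)));
echo $encodedLength, "\n"; // prints 400
```

The trade-off is that the stored values can no longer be searched or sorted as text inside the database.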
I had to face this problem: the Latin American special characters are stored as "?" or "¿" in my Oracle database... I can't change the NLS_CHARACTER_SET because we're not the database owners.
So, I found a workaround :
1) ASP.NET code
Create a function that converts string to hexadecimal characters:
public string ConvertirStringAHex(String input)
{
Encoding encoding = System.Text.Encoding.GetEncoding("ISO-8859-1");
Byte[] stringBytes = encoding.GetBytes(input);
StringBuilder sbBytes = new StringBuilder(stringBytes.Length);
foreach (byte b in stringBytes)
{
sbBytes.AppendFormat("{0:X2}", b);
}
return sbBytes.ToString();
}
2) Apply the function above to the variable you want to encode, like this:
myVariableHex = ConvertirStringAHex( myVariable );
3) In Oracle, use the following:
PROCEDURE STORE_IN_TABLE( iTEXTO IN VARCHAR2 )
IS
BEGIN
INSERT INTO myTable( SPECIAL_TEXT )
VALUES ( UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW( iTEXTO )) );
COMMIT;
END;
Of course, iTEXTO is the Oracle parameter which receives the value of "myVariableHex" from ASP.NET code.
Hope it helps ... if there's something to improve pls don't hesitate to post your comments.
Sources:
http://www.nullskull.com/faq/834/convert-string-to-hex-and-hex-to-string-in-net.aspx
https://forums.oracle.com/thread/44799
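For readers on PHP rather than ASP.NET, the same string-to-hex step might look like the sketch below. The function name stringToLatin1Hex is made up, the input is assumed to be UTF-8, and mbstring is assumed to be loaded.

```php
<?php
// Converts a UTF-8 string to ISO-8859-1 and returns the bytes as upper-case
// hex, ready to be passed to HEXTORAW() on the Oracle side.
function stringToLatin1Hex(string $utf8): string
{
    return strtoupper(bin2hex(mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8')));
}

// "Ñandú" written with explicit UTF-8 bytes for Ñ (C3 91) and ú (C3 BA).
echo stringToLatin1Hex("\xC3\x91and\xC3\xBA"), "\n"; // prints D1616E64FA
```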
If you have different charsets between the server side code (PHP in this case) and the Oracle database, you should set the server side code's charset in the Oracle connection; Oracle then performs the conversion.
Example: Let's assume:
php charset utf-8 (default).
Oracle charset AMERICAN_AMERICA.WE8ISO8859P1
In the connection to Oracle made by PHP you should set UTF8 (fourth parameter).
oci_pconnect("USER", "PASS", "URL", "UTF8");
Doing this, you write code in utf-8 (not doing any conversion at all) and get utf-8 from the database through this connection.
So you could write something like SELECT * FROM SOME_TABLE WHERE TEXT = 'SOME TEXT LIKE áéíóú Ñ' and also get utf-8 text as a result.
According to the PHP documentation, by default, the Oracle client (oci_pconnect) takes the NLS_LANG environment variable from the operating system. Some Debian-based systems have no NLS_LANG environment variable, so I think the Oracle client uses its own charset (AMERICAN_AMERICA.WE8ISO8859P1) if we don't specify the fourth parameter.
