PHP substr giving weird characters from xml feed

For some reason, when I use substr I get a weird character in the output, like this: "�".
All I am doing is this:
$price = substr($game->price, 1);
$dollars_original = substr($game->fullPrice, 1);
echo "Price: $" . $price . "<br />\n";
echo "Original Price: $" . $dollars_original . "<br />\n";
This comes from an XML feed at http://itch.io/browse/platform-linux/price-sale.xml, which I parse like so:
$url = 'http://itch.io/browse/platform-linux/price-sale.xml';
$xml = simplexml_load_string(file_get_contents($url));
So, for example, a price might be £0.89, but after removing the £ sign it comes out as �0.89.
What am I missing here?

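It looks like you have a mix-up in character encodings. Which parts of your data are Latin-1 or Windows-1252, and which are UTF-8? substr() works on bytes, not characters, and in UTF-8 the £ sign takes two bytes (0xC2 0xA3), so substr($s, 1) cuts the character in half and leaves an invalid byte behind. If you're working with UTF-8 data, you may have to use mb_substr(), which is multibyte aware.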
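A minimal sketch of the difference, assuming the price arrives as a UTF-8 string and the mbstring extension is loaded. The sample value is hypothetical and written with byte escapes so it does not depend on how the file is saved:

```php
<?php
// Hypothetical sample value: "£0.89" in UTF-8, where "£" is the bytes 0xC2 0xA3.
$price = "\xC2\xA30.89";

// substr() counts bytes, so this drops only 0xC2 and keeps the stray 0xA3.
$broken = substr($price, 1);

// mb_substr() counts characters once it is told the encoding.
$fixed = mb_substr($price, 1, null, 'UTF-8');

echo bin2hex($broken), "\n"; // a3302e3839
echo $fixed, "\n";           // 0.89
```

Passing the encoding explicitly avoids relying on whatever mb_internal_encoding() happens to be set to.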

Related

PHP UTF-8 mb_convert_encode and Internet-Explorer

I have been reading about character encoding for a few days, and I want to serve all my pages as UTF-8 for compatibility. But I am stuck trying to convert user input to UTF-8. It works in all browsers except Internet Explorer (as always).
I don't know what's wrong with my code; it seems fine to me.
I set the header with the character encoding.
I saved the file as UTF-8 (no BOM).
This only happens when the page is accessed with a $_GET parameter in Internet Explorer: myscript.php?c=äüöß
When I write special characters directly in my page, they are displayed correctly.
This is my code:
// User Input
$_GET['c'] = "äüöß"; // Access URL ?c=äüöß
//--------
header("Content-Type: text/html; charset=utf-8");
mb_internal_encoding('UTF-8');
$_GET = userToUtf8($_GET);
function userToUtf8($string) {
    if(is_array($string)) {
        $tmp = array();
        foreach($string as $key => $value) {
            $tmp[$key] = userToUtf8($value);
        }
        return $tmp;
    }
    return userDataUtf8($string);
}
function userDataUtf8($string) {
    print("1: " . mb_detect_encoding($string) . "<br>"); // Shows: 1: UTF-8
    $string = mb_convert_encoding($string, 'UTF-8', mb_detect_encoding($string)); // Convert non UTF-8 String to UTF-8
    print("2: " . mb_detect_encoding($string) . "<br>"); // Shows: 2: ASCII
    $string = preg_replace('/[\xF0-\xF7].../s', '', $string);
    print("3: " . mb_detect_encoding($string) . "<br>"); // Shows: 3: ASCII
    return $string;
}
echo $_GET['c']; // Shows nothing
echo mb_detect_encoding($_GET['c']); // ASCII
echo "äöü+#"; // Shows "äöü+#"
The most confusing part is that it reports the string as converted from UTF-8 to ASCII. Can someone tell me why the special characters are not displayed correctly and what is wrong here? Or is this a bug in Internet Explorer?
Edit:
If I disable the conversion, it reports everything as UTF-8, but the characters still are not displayed. They show up as "????".
Note: This happens ONLY in Internet Explorer!

I prefer using URL-encoded strings in the address bar, but in your case you can try encoding $_GET['c'] to UTF-8 (note that utf8_encode() assumes its input is ISO-8859-1), e.g.:
$_GET['c'] = utf8_encode($_GET['c']);
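Since utf8_encode() would double-encode input that is already UTF-8, one option is to convert only when the bytes are not valid UTF-8. This is a rough sketch, not a definitive fix: the helper name toUtf8 is hypothetical, and the ISO-8859-1 fallback is an assumption about what the browser sends. It relies on the mbstring extension:

```php
<?php
// Hypothetical helper: leave valid UTF-8 untouched, otherwise assume the
// fallback encoding (ISO-8859-1 here is an assumption, not a detected fact).
function toUtf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

$latin1 = "\xE4\xFC\xF6\xDF";                 // "äüöß" as ISO-8859-1 bytes
$utf8   = "\xC3\xA4\xC3\xBC\xC3\xB6\xC3\x9F"; // "äüöß" as UTF-8 bytes

var_dump(toUtf8($latin1) === $utf8); // bool(true)
var_dump(toUtf8($utf8) === $utf8);   // bool(true)
```

In the question's code this would take the place of the mb_detect_encoding()/mb_convert_encoding() pair inside userDataUtf8().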

An approach that worked for displaying the characters in IE 11.0.18:
Retrieve the Unicode code point of your character, for example 'ü' = 'U+00FC'.
According to this post, convert it to a UTF-8 entity.
Decode it using utf8_decode before dumping.
The line of code illustrating this for the 'ü' character is:
var_dump(utf8_decode(html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", 'U+00FC'), ENT_NOQUOTES, 'UTF-8')));
To summarize: for display purposes, go from the Unicode code point to UTF-8, then decode it before displaying it.
Other resources:
a post to retrieve characters' unicode

Base 64 encode - twice

I need to base 64 encode parts of a URL for an S3 URL.
I'm left with something like:
http://d111111abcdef8.cloudfront.net/image.jpg?color=red&size=medium
&Expires=1357034400
&Signature=nitfHRCrtziwO2HwPfWw~yYDhUF5EwRunQA-j19DzZrvDh6hQ73lDx~-ar3UocvvRQVw6EkC~GdpGQyyOSKQim-TxAnW7d8F5Kkai9HVx0FIu- 5jcQb0UEmatEXAMPLE3ReXySpLSMj0yCd3ZAB4UcBCAqEijkytL6f3fVYNGQI6
&Key-Pair-Id=APKA9ONS7QCOWEXAMPL
As you can see Signature and Key Pair ID are encoded.
I need to use the above URL as a param in another URL.
I have base64 encoded it (to mask the domain, which makes it a little prettier) and then URL encoded the result.
My question is: with certain params already base64 encoded, if I base64 encode the entire string again, will the original params such as Signature and Key-Pair-Id still be readable after decoding?

Simple question, simple answer: Yes.

If you are going to do this, you will need to use "&amp;" instead of "&" in the string you are encoding. Also, calling base64_decode on the entire encoded string will only undo the last encoding.
An example:
$string1 = "This is a string";
$string2 = "This is another String";
$string1 = base64_encode( $string1 );
$string2 = base64_encode( $string2 );
echo $string1 . "<br />";
echo $string2 . "<br />";
$entity = "HTTP://www.google.com/?param1=" . $string1 . "&amp;param2=" . $string2;
$encoded_entity = base64_encode( $entity );
echo $encoded_entity . "<br />";
$decoded_entity = base64_decode( $encoded_entity );
echo $decoded_entity . "<br />";
This will output:
VGhpcyBpcyBhIHN0cmluZw==
VGhpcyBpcyBhbm90aGVyIFN0cmluZw==
SFRUUDovL3d3dy5nb29nbGUuY29tLz9wYXJhbTE9VkdocGN5QnBjeUJoSUhOMGNtbHVadz09JmFtcDtwYXJhbTI9VkdocGN5QnBjeUJoYm05MGFHVnlJRk4wY21sdVp3PT0=
HTTP://www.google.com/?param1=VGhpcyBpcyBhIHN0cmluZw==&amp;param2=VGhpcyBpcyBhbm90aGVyIFN0cmluZw==
So as you can see, you can decode the entire string, but only the outermost encoding is undone, not every level of the encoded string. To get the original values, you first have to decode the whole string and then decode each parameter.
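The full round trip, including the URL-encoding step from the question, could look roughly like this. The example.com URL and the param1 name are placeholders, and http_build_query() is used here as one way to keep the inner base64 characters safe inside the query string:

```php
<?php
// Inner value, already base64 encoded (stands in for a pre-encoded param).
$inner = base64_encode("This is a string");

// http_build_query() URL-encodes the value, so "+", "/" and "=" survive.
$url = "http://www.example.com/?" . http_build_query(['param1' => $inner]);

// Outer layer: base64 the whole URL, then URL-encode it for use as a param.
$outer = rawurlencode(base64_encode($url));

// Receiving side: peel the layers off in reverse order.
$decodedUrl = base64_decode(rawurldecode($outer));
parse_str(parse_url($decodedUrl, PHP_URL_QUERY), $params);
$original = base64_decode($params['param1']);

echo $original, "\n"; // This is a string
```

Each layer is undone exactly once, in the opposite order to the one in which it was applied.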

iconv(): Detected an illegal character in input string

I am using the iconv function to convert a string to the requested character encoding. See the code below:
$sms_text = 'A:'f3*'F'; // Output received from SMPP
$result = iconv('UTF-16BE' ,'UTF-8//IGNORE' , $sms_text);
echo 'Ignore: ' .$result;
echo $sms_text = iconv('UTF-16BE' ,'UTF-8' , $sms_text);
$result1 = iconv('UTF-16BE' ,'UTF-8//TRANSLIT' , $sms_text); //line no (53)
echo 'Transilt: '.$result1;
And I received the output below:
If I have a string in Dari or Pashto, it shows only the first word and does not return the rest of the string after the space. Even //IGNORE gives the same output.
Should I replace these spaces with some other character so that I can get the complete string?
Note: I am passing the string received from SMPP (receiver).
SMS sent to SMPP: افغانستان کابل
Output received from SMPP: 'A:'f3*'F
String converted back by iconv: افغانستان
English strings work well.
Thanks in advance.

You are converting the string to UTF-8 on this line:
echo $sms_text = iconv('UTF-16BE' ,'UTF-8' , $sms_text);
The error appears because you are then trying to convert it a second time.
To resolve this, do not reassign the $sms_text variable on that line.
Here is code, with its output, that works for me:
$sms_text ="فغانستان کابل";
echo "UTF-8 : $sms_text \n";
$sms_text = iconv('UTF-8', 'UTF-16BE', $sms_text);
echo "UTF-16BE : $sms_text \n";
echo 'Ignore: ' . iconv('UTF-16BE' ,'UTF-8//IGNORE' , $sms_text) . "\n";
echo 'Simple: ' . iconv('UTF-16BE' ,'UTF-8' , $sms_text) . "\n";
echo 'Transilt: '. iconv('UTF-16BE' ,'UTF-8//TRANSLIT' , $sms_text) . "\n";
Output:
UTF-8 : فغانستان کابل
UTF-16BE : A:'F3*'F
Ignore: فغانستان کابل
Simple: فغانستان کابل
Transilt: فغانستان کابل
As for the blank spaces, could you please share the test string?
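To make the double-conversion problem concrete, here is a small sketch that keeps the converted text in a separate variable. It assumes the iconv extension is available and that this file is saved as UTF-8; the variable names are illustrative only:

```php
<?php
// Sample text from the question; this file is assumed to be saved as UTF-8.
$utf8 = "افغانستان کابل";

// Simulate what arrives from SMPP: the same text as UTF-16BE bytes.
$utf16 = iconv('UTF-8', 'UTF-16BE', $utf8);

// Convert once and keep the result in a separate variable.
$once = iconv('UTF-16BE', 'UTF-8', $utf16);

// Converting the UTF-8 result again treats UTF-8 bytes as UTF-16BE.
// @ hides the notice; depending on the iconv build this yields false or garbage.
$twice = @iconv('UTF-16BE', 'UTF-8', $once);

var_dump($once === $utf8);  // bool(true)
var_dump($twice === $utf8); // bool(false)
```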

Accents become question marks in PHP when parsing HTML

I'm getting PT-BR text automatically by downloading an HTML page, and the accented characters become question marks when I use utf8_decode. This is my function:
function pegaMsg($string)
{
    $bot_url = "http://website.com";
    //&rnd=&msg="
    $rand_msg = rand(0,100);
    $url = $bot_url . $rand_msg . "&msg=" . $string;
    $url = str_replace(" ", "%20", $url);
    //echo "\n" . $url;
    $download = http_get($url, $referer="");
    $download['FILE'] = utf8_decode($download['FILE']);
    $download['FILE'] = str_replace("var resp = ", "", $download['FILE']);
    $download['FILE'] = str_replace("\\r\\n", "", $download['FILE']);
    $download['FILE'] = str_replace(";", "", $download['FILE']);
    $download['FILE'] = str_replace("\'", "", $download['FILE']);
    $download['FILE'] = trim($download['FILE']);
    return $download['FILE'];
}
This is the expected output:
VOCÊ TINHA DUAS ESCOLHAS:
and this is what I get:
'VOC? TINHA DUAS ESCOLHAS:
What can I do? I want the ^ to be displayed! Thanks, and sorry for the bad English.

utf8_decode replaces invalid code unit sequences with ?. The reason you're getting a ? is likely that the text you're passing to utf8_decode was not in UTF-8 to begin with.
In fact, it's possible it was already in ISO-8859-1, which is the encoding of the string returned by utf8_decode. In that case, your solution would be to just omit the call to utf8_decode.
If the original text was in neither UTF-8 nor ISO-8859-1 (the latter being what I'm assuming you want, since you're calling utf8_decode), you have to use iconv or mb_convert_encoding.
A final possibility is that whatever is interpreting the script output assumes an encoding different from the actual one, and that it also converts invalid code unit sequences to ?.
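One way to guard against the first two cases is to decode only when the downloaded text really is valid UTF-8. This is a sketch under assumptions: the helper name toLatin1 is made up, the sample bytes stand in for $download['FILE'], and mb_convert_encoding() is swapped in for utf8_decode() (it needs the mbstring extension):

```php
<?php
// Hypothetical helper: convert to ISO-8859-1 only if the input is valid UTF-8;
// otherwise assume it is already ISO-8859-1 and return it unchanged.
function toLatin1(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
    }
    return $text;
}

$utf8   = "VOC\xC3\x8A TINHA"; // "VOCÊ TINHA" as UTF-8 bytes
$latin1 = "VOC\xCA TINHA";     // "VOCÊ TINHA" as ISO-8859-1 bytes

var_dump(toLatin1($utf8) === $latin1);   // bool(true)
var_dump(toLatin1($latin1) === $latin1); // bool(true)
```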

Try using utf8_encode instead:
$download['FILE'] = utf8_encode($download['FILE']);

How do I convert this one element of the array to utf-8?

I'm using Zend_Gdata. For some reason, the $when string recently stopped being UTF-8, and I need to convert it to UTF-8. All the other fields are working fine.
foreach ($feed as $event) { //iterating through all events
    $contentText = stripslashes($event->content->text); //striping any escape character
    $contentText = preg_replace('/\<br \/\>[\n\t\s]{1,}\<br \/\>/','<br />',stripslashes($event->content->text)); //replacing multiple breaks with a single break
    $contentText = explode('<br />',$contentText); //splitting data by break tag
    $eventData = filterEventDetails($contentText);
    $when = $eventData['when'];
    $where = $eventData['where'];
    $duration = $eventData['duration'];
    $title = stripslashes($event->title);
    echo '<li class="pastShows">' . $when . " - " . $title . ", " . $where . '</li>';
}
How do I make $when utf-8?
Thanks!

Depending on which encoding the string uses, you should be able to convert it to UTF-8 with one of the following functions:
utf8_encode()
iconv()
For example:
$when = utf8_encode($eventData['when']);
Or:
$when = iconv('ISO-8859-1', 'UTF-8', $eventData['when']);

If the string is in Latin-1, you can just do what Pascal suggests.
Otherwise, you need to find out which encoding it is.
To do that, check your php.ini settings, or try to detect it with mb_detect_encoding (be aware that it is not foolproof).
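A hedged sketch of the detection route, assuming the mbstring extension. The sample bytes are a hypothetical stand-in for $eventData['when'], and the candidate list is an assumption. Detection can only choose among the encodings you offer it, so keep the list short:

```php
<?php
// Hypothetical value standing in for $eventData['when']: "été" in ISO-8859-1.
$when = "\xE9t\xE9";

// Strict mode rejects any candidate in which the bytes are invalid.
$detected = mb_detect_encoding($when, ['UTF-8', 'ISO-8859-1'], true);

// Convert only if something other than UTF-8 was detected.
if ($detected !== false && $detected !== 'UTF-8') {
    $when = mb_convert_encoding($when, 'UTF-8', $detected);
}

echo $detected, "\n";      // ISO-8859-1
echo bin2hex($when), "\n"; // c3a974c3a9
```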
