SimpleXMLElement set value to special characters - php

I'm trying to write special characters (less than, greater than) to the text of a node using SimpleXMLElement, but it converts them to their escaped values.
$header[0] = "%SET(Amt_,<AMT>". $amt . "</AMT>) \n" .$header[0];
The above results in the following in the XML file:
%SET(Amt_,&lt;AMT&gt;100&lt;/AMT&gt;)
I tried using html_entity_decode and it still wrote to file the same way. Is there any way to write 'special' characters to the text value of a simplexmlelement object?
clarification: I want to write the actual characters '<' and '>' to the file when $header->asXML() is called. Currently the escaped versions are what is written to file.

Use htmlspecialchars while sending data and htmlspecialchars_decode whenever you want to get the original string.
<?php
$amt = 100;
$header[0] = "Some Value";
$header[0] = "%SET(Amt_,<AMT>". $amt . "</AMT>) \n" .$header[0];
$node = htmlspecialchars($header[0]);
$value = htmlspecialchars_decode($node);
file_put_contents("filename.txt", $value);
echo "written on file";
?>

A workaround that works for my case: I reopened the file, decoded its contents, and saved it again.
$xml->asXML('test.xml');
$coded = file_get_contents('test.xml');
file_put_contents('test.xml', htmlspecialchars_decode($coded), LOCK_EX);
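If the goal is for <AMT> to end up in the file as real markup, another option is to insert it as a node rather than as text, so nothing has to be decoded afterwards. A minimal sketch, assuming a hypothetical <root><header> document (the asker's real structure isn't shown) and using dom_import_simplexml with a DOM document fragment:

```php
<?php
// Hypothetical document; the element names are assumptions for illustration.
$xml = new SimpleXMLElement('<root><header>Some Value</header></root>');
$amt = 100;

// SimpleXML and DOM share the same underlying tree, so DOM edits show up in $xml.
$header = dom_import_simplexml($xml->header);

// A fragment parses its argument as XML, so <AMT> becomes a real element.
$fragment = $header->ownerDocument->createDocumentFragment();
$fragment->appendXML('%SET(Amt_,<AMT>' . $amt . '</AMT>) ' . "\n");

// Put the fragment in front of the existing text.
$header->insertBefore($fragment, $header->firstChild);

$result = $xml->asXML();
echo $result;
```

This only works when the inserted string is well-formed XML; a bare "<" with no matching tag still has to be escaped (or decoded after saving, as in the workaround above).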

Related

Wrapping String PHP

I have a problem with my code. It creates an image from an external image source and a string, and I use JSON to get the string.
My problem is that if I use the string from the JSON data, the text is not wrapped properly, like this:
http://prntscr.com/dbhg4n
$url = 'https://bible-api.com/Psalm100:4-5?translation=kjv';
$JSON = file_get_contents($url);
$data = json_decode($JSON);
$string = $data->text;
But if I declare and set the string directly, I get the output that I want, like this:
http://prntscr.com/dbhg7q
$string = "Enter into his gates with thanksgiving, and into his courts with praise: be thankful unto him, and bless his name. For the Lord is good; his mercy is everlasting; and his truth endureth to all generations.";
I don't think the problem is in the code that wraps the text on my image. I think it is in the JSON data. How can I fix this?
The text contains \n characters. Just replace them:
$string = preg_replace("/\n/", ' ', $data->text);
or without a regular expression:
$string = str_replace("\n", ' ', $data->text);
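To see the effect without calling the API, here is a small sketch that uses a hard-coded JSON string as a stand-in for the response (the sample text and the 30-character width are assumptions). It collapses every run of whitespace, including the embedded newlines, before wrapping:

```php
<?php
// Stand-in for the API response; the single-quoted \n is a JSON escape
// that json_decode turns into a real newline.
$json = '{"text": "Enter into his gates\nwith thanksgiving,\nand into his courts with praise.\n"}';
$data = json_decode($json);

// Collapse newlines and repeated spaces into single spaces, then trim the ends.
$string = trim(preg_replace('/\s+/', ' ', $data->text));

// Wrap at an assumed width of 30 characters and split into lines.
$lines = explode("\n", wordwrap($string, 30, "\n"));

print_r($lines);
```

Using \s+ instead of a plain "\n" also covers a trailing newline and any \r characters the source might contain.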

PHP UTF-8 mb_convert_encode and Internet-Explorer

I have been reading about character encoding for a few days, and I want to serve all my pages as UTF-8 for compatibility. But I get stuck when I try to convert user input to UTF-8: it works in all browsers except Internet Explorer (as always).
I don't know what's wrong with my code; it seems fine to me.
I set the header with the charset
I saved the file as UTF-8 (no BOM)
This happens only when the page is accessed with a $_GET parameter in Internet Explorer: myscript.php?c=äüöß
When I write special characters directly into the page, they are displayed correctly.
This is my code:
// User Input
$_GET['c'] = "äüöß"; // Access URL ?c=äüöß
//--------
header("Content-Type: text/html; charset=utf-8");
mb_internal_encoding('UTF-8');
$_GET = userToUtf8($_GET);
function userToUtf8($string) {
    if (is_array($string)) {
        $tmp = array();
        foreach ($string as $key => $value) {
            $tmp[$key] = userToUtf8($value);
        }
        return $tmp;
    }
    return userDataUtf8($string);
}
function userDataUtf8($string) {
    print("1: " . mb_detect_encoding($string) . "<br>"); // Shows: 1: UTF-8
    $string = mb_convert_encoding($string, 'UTF-8', mb_detect_encoding($string)); // Convert non UTF-8 String to UTF-8
    print("2: " . mb_detect_encoding($string) . "<br>"); // Shows: 2: ASCII
    $string = preg_replace('/[\xF0-\xF7].../s', '', $string);
    print("3: " . mb_detect_encoding($string) . "<br>"); // Shows: 3: ASCII
    return $string;
}
echo $_GET['c']; // Shows nothing
echo mb_detect_encoding($_GET['c']); // ASCII
echo "äöü+#"; // Shows "äöü+#"
The most confusing part is that it reports a conversion from UTF-8 to ASCII. Can someone tell me why the special characters aren't displayed correctly and what is wrong here? Or is this a bug in Internet Explorer?
Edit:
If I disable the conversion, it reports everything as UTF-8, but the characters still don't show; they are displayed like "????".
Note: This happens ONLY in Internet Explorer!
I prefer URL-encoded strings in the address bar, but in your case you can try converting $_GET['c'] to UTF-8 (note that utf8_encode assumes the input is ISO-8859-1), e.g.:
$_GET['c'] = utf8_encode($_GET['c']);
An approach that worked for displaying the characters in IE 11.0.18:
Retrieve the Unicode code point of your character, for example 'ü' is 'U+00FC'
According to this post, convert it to an HTML entity
Decode it using utf8_decode before dumping
The line of code illustrating the example with the 'ü' character is:
var_dump(utf8_decode(html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", 'U+00FC'), ENT_NOQUOTES, 'UTF-8')));
To summarize: for display purposes, go from the Unicode code point to UTF-8, then decode it before displaying it.
Other resources:
a post to retrieve characters' unicode
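Guessing the encoding with mb_detect_encoding is fragile, so a more defensive variant of the conversion function from the question is to check whether the input is already valid UTF-8 and convert only when it is not. A rough sketch, assuming the non-UTF-8 case is Windows-1252 (plausible if the browser sends the query string in the system code page, but that is an assumption, not something the question confirms); the function name is hypothetical:

```php
<?php
// Hypothetical helper: returns $value as UTF-8.
// $fallback is the encoding assumed for input that is not valid UTF-8.
function userInputToUtf8(string $value, string $fallback = 'Windows-1252'): string
{
    // Valid UTF-8 (including plain ASCII) is passed through untouched.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Anything else is treated as the fallback encoding and converted.
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// "äüöß" as single-byte Windows-1252 and as UTF-8, written with byte escapes
// so the example does not depend on how this file itself is saved.
$legacy = "\xE4\xFC\xF6\xDF";
$utf8   = "\xC3\xA4\xC3\xBC\xC3\xB6\xC3\x9F";

$fromLegacy = userInputToUtf8($legacy);
$fromUtf8   = userInputToUtf8($utf8);
```

Both calls should yield the same UTF-8 bytes, so the rest of the script can treat $_GET values uniformly.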

How to stop PHP Domdocument::SaveXML from inserting "CDATA"?

I'm using PHP to get all the "script" tags from web pages, and then appending text after the </script> that is not always valid HTML. Because it's not always valid markup, I can't just use appendChild/replaceChild to add that information, unless I'm misunderstanding how replaceChild works.
Anyway, when I do
$script_tags = $doc->getElementsByTagName('script');
$l = $script_tags->length;
for ($i = $l - 1; $i > -1; $i--) {
    $script_tags_string = $doc->saveXML($script_tags->item($i));
}
This puts "<![CDATA[" and "]]>" around the contents of the script tag. How can I disable this? Please don't tell me to just delete it afterwards, that's what I'm going to do if I can't find a solution for this.
I have a suspicion that the CDATA is inserted because it would otherwise be invalid XML.
Have you tried using saveHTML instead of saveXML?
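For reference, saveHTML accepts a node argument just like saveXML does, and the HTML serializer writes script contents as-is. A small sketch with a made-up page (the markup and variable names are assumptions for illustration):

```php
<?php
// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Made-up input containing a "<" inside the script body.
$doc->loadHTML('<html><body><script>if (a < b) { run(); }</script></body></html>');

$serialized = [];
foreach ($doc->getElementsByTagName('script') as $script) {
    // Serialize just this node using HTML rules rather than XML rules.
    $serialized[] = $doc->saveHTML($script);
}

print_r($serialized);
```

The trade-off is that the output follows HTML conventions, so it may not be well-formed XML if that is what the rest of the pipeline expects.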
One way I've found to fix this:
Before echoing the document, loop over all script tags and use str_replace to swap "<" and ">" for placeholder strings; make sure the placeholders do not occur anywhere else in the document.
Then store the result of saveXML() in a variable, and finally use str_replace again to turn the placeholders back into "<" and ">".
Here is the code:
<?php
// First pass: hide "<" and ">" inside script tags behind placeholders
foreach ($dom->getElementsByTagName('script') as $script) {
    $script->nodeValue = str_replace("<", "ESCAPE_CHAR_LT", $script->nodeValue);
    $script->nodeValue = str_replace(">", "ESCAPE_CHAR_GT", $script->nodeValue);
}
// Obtain the XHTML
$output = $dom->saveXML();
// Second pass: restore the original characters
$output = str_replace("ESCAPE_CHAR_LT", "<", $output);
$output = str_replace("ESCAPE_CHAR_GT", ">", $output);
// Print the document
echo $output;
?>
As you can see, now you are free to use "<" ">" in your scripts.
Hope this helps someone.

Accents become question marks in PHP when parsing HTML

I'm getting PT-BR text automatically by downloading an HTML page, and the accented characters become question marks when I use utf8_decode. This is my function:
function pegaMsg($string)
{
    $bot_url = "http://website.com";
    //&rnd=&msg="
    $rand_msg = rand(0,100);
    $url = $bot_url . $rand_msg . "&msg=" . $string;
    $url = str_replace(" ", "%20", $url);
    //echo "\n" . $url;
    $download = http_get($url, $referer="");
    $download['FILE'] = utf8_decode($download['FILE']);
    $download['FILE'] = str_replace("var resp = ", "", $download['FILE']);
    $download['FILE'] = str_replace("\\r\\n", "", $download['FILE']);
    $download['FILE'] = str_replace(";", "", $download['FILE']);
    $download['FILE'] = str_replace("\'", "", $download['FILE']);
    $download['FILE'] = trim($download['FILE']);
    return $download['FILE'];
}
This is the expected output:
VOCÊ TINHA DUAS ESCOLHAS:
and this is what I get:
'VOC? TINHA DUAS ESCOLHAS:
What can I do? I want the circumflex (^) displayed. Thanks, and sorry for the bad English.
utf8_decode replaces invalid code unit sequences with ?. The reason you're getting a ? is likely that the text you're passing to utf8_decode was not in UTF-8 to begin with.
In fact, it's possible it was already in ISO-8859-1, which is the encoding of the string returned by utf8_decode. In that case, your solution would be to just omit the call to utf8_decode.
If the original text was neither in UTF-8 nor in ISO-8859-1 (which is what I'm assuming you want, since you're calling utf8_decode), you have to use iconv or mb_convert_encoding.
A final possibility is that whatever is interpreting the script output assumes an encoding different from the one the output actually uses, and it also converts invalid code unit sequences to ?.
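To make the first two cases concrete, here is a small sketch that checks what the downloaded bytes look like before converting anything. The sample string stands in for the download and is an assumption; iconv is used here with ISO-8859-1 as the source encoding, which would need to be swapped for the page's real encoding:

```php
<?php
// Stand-in for the downloaded text: "VOCÊ ..." encoded as ISO-8859-1,
// where Ê is the single byte 0xCA.
$download = "VOC\xCA TINHA DUAS ESCOLHAS:";

// A lone 0xCA followed by a space is not a valid UTF-8 sequence,
// so this tells us the text is not UTF-8.
$isUtf8 = mb_check_encoding($download, 'UTF-8');

// Convert from the (assumed) source encoding to UTF-8 only when needed.
$asUtf8 = $isUtf8 ? $download : iconv('ISO-8859-1', 'UTF-8', $download);

echo $asUtf8;
```

Whether UTF-8 or ISO-8859-1 is the right target depends on what the consumer of the output expects, which is the last possibility mentioned above.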
Try using utf8_encode instead:
$download['FILE'] = utf8_encode($download['FILE']);

Replacing \r\n (newline characters) after running json_encode

So when I run json_encode, it picks up the \r\n from MySQL as well. I have tried rewriting the strings in the database, to no avail. I have tried changing the collation in MySQL from the default latin1_swedish_ci to ascii_bin and utf8_bin. I have tried plenty of str_replace calls with chr(10) and chr(13). I don't know what else to try, so I'm just going to leave this here:
$json = json_encode($new);
if (isset($_GET['pretty'])) {
    echo str_replace("\/", "/", jsonReadable(parse($json)));
} else {
    $json = str_replace("\/", "/", $json);
    echo parse($json);
}
The jsonReadable function is from here and the parse function is from here. The str_replace calls that are already in there are because I am getting oddly formatted HTML tags like <\/h1>. Finally, $new is an array which is built further up. Full code upon request.
Help me StackOverflow. You're my only hope
Does the string contain "\r\n" (as in 0x0D 0x0A) or the literal string '\r\n'? If it's the former, this should remove any newlines.
$json = preg_replace("!\r?\n!", "", $json);
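Optionally, replace the second parameter "" with "<br />" if you'd like to replace the newlines with a br tag. For the latter case (json_encode escapes real newlines, so its output contains this literal form), the backslashes themselves must be matched, so each one has to be escaped for both PHP and the regex:
$json = preg_replace('!(?:\\\\r)?\\\\n!', "", $json);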
Don't replace it in the JSON, replace it in the source before you encode it.
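A sketch of that approach, assuming $new is a nested array of strings (the sample data is made up). array_walk_recursive visits every leaf value, and JSON_UNESCAPED_SLASHES makes the str_replace("\/", "/") step unnecessary:

```php
<?php
// Made-up stand-in for the $new array built from the database rows.
$new = [
    'title' => "First line\r\nSecond line",
    'tags'  => ["a\n", "b"],
];

// Replace line breaks in every string value before encoding.
array_walk_recursive($new, function (&$value) {
    if (is_string($value)) {
        $value = str_replace(["\r\n", "\r", "\n"], ' ', $value);
    }
});

// JSON_UNESCAPED_SLASHES keeps "/" as-is, so tags like </h1> are not mangled.
$json = json_encode($new, JSON_UNESCAPED_SLASHES);

echo $json;
```

Replacing with a space rather than an empty string avoids gluing the last word of one line to the first word of the next; use '' if the line breaks should simply vanish.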
I had a similar issue. I used:
$p_num = trim($this->recp);
$p_num = str_replace("\n", "", $p_num);
$p_num = str_replace("\r", ",", $p_num);
$p_num = str_replace("\n",',', $p_num);
$p_num = rtrim($p_num, "\x00..\x1F");
Not sure if this will help with your requirements.
