Using PHP to decode a string - php

I'm trying to decode a string using PHP but it doesn't seem to be returning the correct result.
I've tried using html_entity_decode as well as utf8_decode(urldecode()).
Current code:
$str = "joh&#39;#test.com";
$decodeStr = html_entity_decode($str, ENT_COMPAT, "UTF-8");
The expected return is john#test.com.

I suspect your HTML entity code for the character 'n' is wrong.
Working example:
$str = "joh&#110;#test.com";
echo $decodeStr = html_entity_decode($str, ENT_COMPAT, "UTF-8");

The HTML entity code for n is &#110;, whereas the entity code in your string, &#39;, is for a single apostrophe ('). If you want single quotes converted, you must pass the ENT_QUOTES flag to html_entity_decode(): the default, ENT_COMPAT | ENT_HTML401 (from the PHP docs; PHP 8.1 changed it to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401), does not convert single quotes. If you need additional flags, combine them with the bitwise OR (pipe) operator |, like this: ENT_HTML401 | ENT_QUOTES.
If you're expecting john#test.com:
$str = "joh&#110;#test.com";
$decodeStr = html_entity_decode($str, ENT_COMPAT, "UTF-8");
echo $decodeStr; // john#test.com
Or if you're expecting joh'#test.com:
$str = "joh&#39;#test.com";
$decodeStr = html_entity_decode($str, ENT_QUOTES, "UTF-8");
echo $decodeStr; // joh'#test.com
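To see the flag's effect directly, here is a small sketch that decodes the same strings with both flags. The wrapper name decodeEntities() is hypothetical, not a PHP built-in; the flags are passed explicitly so the result does not depend on the PHP version's default.

```php
<?php
// html_entity_decode() only decodes the numeric reference for a single
// quote (&#39;) when ENT_QUOTES is set; with ENT_COMPAT it is left as-is.
// Flags are passed explicitly because the default changed in PHP 8.1.
function decodeEntities(string $s, int $flags): string
{
    return html_entity_decode($s, $flags | ENT_HTML401, 'UTF-8');
}

echo decodeEntities("joh&#39;#test.com", ENT_COMPAT), "\n";  // joh&#39;#test.com
echo decodeEntities("joh&#39;#test.com", ENT_QUOTES), "\n";  // joh'#test.com
echo decodeEntities("joh&#110;#test.com", ENT_COMPAT), "\n"; // john#test.com
```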

Shouldn't the entity for # be &#35; instead of &#39;, which is for an apostrophe?

Related

Convert French characters to HTML special characters

I'm using Zend Framework with MongoDB. I need to convert French characters to HTML entities.
For example: Prénom -> Pr&eacute;nom. What could I do?
htmlentities (http://php.net/htmlentities) can do this if you call:
htmlentities('Prénom', ENT_COMPAT, 'UTF-8');
I get:
Pr&eacute;nom
as the result.
Maybe you can take a look at the strtr function (read more at http://php.net/strtr)?
I think the right functions to look at are either mb_convert_encoding or htmlentities (note that passing 'HTML-ENTITIES' to mb_convert_encoding is deprecated as of PHP 8.2).
Here is an example:
$text = "Prénom";
echo mb_convert_encoding($text, 'HTML-ENTITIES', 'UTF-8');
echo "\n";
echo htmlentities($text, ENT_COMPAT | ENT_HTML401, 'UTF-8');
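If you also need to read the stored value back, html_entity_decode() with the same flags reverses htmlentities(). A minimal sketch, assuming the input is UTF-8; toEntities() and fromEntities() are hypothetical helper names:

```php
<?php
// Convert accented characters to named HTML entities and back again.
// Assumes UTF-8 input; ENT_HTML401 selects the HTML 4.01 entity names.
function toEntities(string $text): string
{
    return htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

function fromEntities(string $html): string
{
    return html_entity_decode($html, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

$encoded = toEntities("Prénom à l'école");
echo $encoded, "\n";               // Pr&eacute;nom &agrave; l&#039;&eacute;cole
echo fromEntities($encoded), "\n"; // Prénom à l'école
```

With ENT_QUOTES the apostrophe is encoded as well; drop that flag (use ENT_COMPAT) if you want single quotes left alone.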

PHP How to encode text to numeric entity?

I have xml like this:
<formula type="inline">
<default:math xmlns="http://www.w3.org/1998/Math/MathML">
<default:mi>
&Zopf;
</default:mi>
</default:math>
</formula>
My goal is to get rid of all special entities like &Zopf; by replacing them with their numeric entity representations.
I tried :
$test = <content of the xml>;
$convmap = array(0x80, 0xffff, 0, 0xffff);
$test = mb_encode_numericentity($test, $convmap, 'UTF-8');
But this does not replace &Zopf;. Any ideas?
My goal is to get:
&#8484;
as shown here: http://www.fileformat.info/info/unicode/char/2124/index.htm
Thank you.
Your converter is converting your LaTeX into MathML, not HTML entities. You need something that converts directly into HTML character references, or a MathML to HTML character reference converter.
You should be able to use htmlentities:
htmlentities($symbolsToEncode, ENT_XML1, 'UTF-8');
http://pt1.php.net/htmlentities
You can add ENT_SUBSTITUTE (e.g. ENT_XML1 | ENT_SUBSTITUTE) so that invalid code unit sequences are replaced with the Unicode Replacement Character (U+FFFD) instead of the function returning an empty string.
As an alternative, you could use strtr to convert the characters to something you specify:
$chars = array(
    "\u{2124}" => "&#8484;", // ℤ DOUBLE-STRUCK CAPITAL Z; "\u{...}" needs PHP 7+
    // ...
);
$convertedXML = strtr($xml, $chars);
http://php.net/strtr
Someone has done something similar on GitHub.
So you need to decode the named entities first:
function decodeNamedEntities($string) {
    static $entities = NULL;
    if (NULL === $entities) {
        $entities = array_flip(
            array_diff(
                get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_HTML5, 'UTF-8'),
                get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_XML1, 'UTF-8')
            )
        );
    }
    return str_replace(array_keys($entities), $entities, $string);
}
After that you can use htmlentities to encode them in a different format if it is really needed.
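The question's mb_encode_numericentity() call has nothing to convert because &Zopf; in the XML is plain ASCII text (an ampersand, letters and a semicolon), not the character ℤ. An alternative to the translation-table approach is to decode each named entity individually and re-encode the result numerically. This is a sketch under assumptions: namedToNumericEntities() is a hypothetical name, the regex assumes entity names are letters and digits only, and the mbstring extension is required.

```php
<?php
// Replace named entities such as &Zopf; with numeric references (&#8484;),
// keeping XML's five predefined entities as they are.
function namedToNumericEntities(string $xml): string
{
    $keep = ['amp', 'lt', 'gt', 'quot', 'apos'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($keep): string {
            if (in_array($m[1], $keep, true)) {
                return $m[0]; // already valid in XML
            }
            // Decode this one entity using PHP's HTML5 entity table.
            $chars = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($chars === $m[0]) {
                return $m[0]; // unknown name: leave it untouched
            }
            // Re-encode every resulting code point as &#NNNN;.
            return mb_encode_numericentity($chars, [0x0, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
        },
        $xml
    );
}

echo namedToNumericEntities('<default:mi>&Zopf;</default:mi> &amp; &eacute;'), "\n";
// <default:mi>&#8484;</default:mi> &amp; &#233;
```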

htmlentities() double encoding entities in string

I want only the unencoded characters to get converted to html entities, without affecting the entities which are already present. I have a string that has previously encoded entities, e.g.:
gaIUSHIUGhj&gt;&hyphen; hjb&times;jkn.jhuh&gt;hh&gt; &hellip;
When I use htmlentities(), the & at the beginning of entities gets encoded again. This means &hyphen; and other entities have their & encoded to &amp;:
&amp;times;
I tried decoding the complete string, then encoding it again, but it does not seem to work properly. This is the code I tried:
header('Content-Type: text/html; charset=iso-8859-1');
...
$b = 'gaIUSHIUGhj&gt;&hyphen; hjb&times;jkn.jhuh&gt;hh&gt; &hellip;';
$b = html_entity_decode($b, ENT_QUOTES, 'UTF-8');
$b = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $b);
$b = htmlentities($b, ENT_QUOTES, 'UTF-8');
But it does not seem to work the right way. Is there a way to prevent or stop this from happening?
Set the optional $double_encode parameter to false. See the documentation for more information.
Your resulting code should look like:
$b = htmlentities($b, ENT_QUOTES, 'UTF-8', false);
You did well to look at the documentation, but you missed the best part. It can be hard to decipher sometimes:
// > > > > > > Scroll >>> > > > > > Keep going. > > > >>>>>> See below. <<<<<<
string htmlentities ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = 'UTF-8' [, bool $double_encode = true ]]] )
Look at the very end.
I know. Confusing. I usually ignore the signature line and go straight down to the next block (Parameters) for the blurbs on each argument.
So you want to use the $double_encode argument at the end to tell htmlentities not to re-encode (and you probably want to stick with UTF-8 unless you have a specific reason not to):
$str = "gaIUSHIUGhj&gt;&hyphen; hjb&times;jkn.jhuh&gt;hh&gt; &hellip;";
// Double-encoded!
echo htmlentities($str, ENT_COMPAT, 'utf-8', true) . "\n";
// Not double-encoded!
echo htmlentities($str, ENT_COMPAT, 'utf-8', false);
https://ignite.io/code/513ab23bec221e4837000000
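A self-contained sketch of the difference, using only HTML 4.01 entities (&times;, &amp;) in the sample; encodeOnce() is a hypothetical helper name. Whether a name that is not an HTML 4.01 entity, such as &hyphen;, survives may depend on the document-type flag, so test that case with your own data.

```php
<?php
// htmlentities() with $double_encode = false encodes bare characters
// (<, >, &, ×, ...) but leaves existing entities such as &times; intact.
function encodeOnce(string $s): string
{
    return htmlentities($s, ENT_QUOTES | ENT_HTML401, 'UTF-8', false);
}

$input = '5 × 3 &times; 2 <b>&amp;</b>';
echo htmlentities($input, ENT_QUOTES | ENT_HTML401, 'UTF-8'), "\n";
// 5 &times; 3 &amp;times; 2 &lt;b&gt;&amp;amp;&lt;/b&gt;
echo encodeOnce($input), "\n";
// 5 &times; 3 &times; 2 &lt;b&gt;&amp;&lt;/b&gt;
```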

PHP htmlentities and saving the data in xml format

I'm trying to save some data into an XML file using the following PHP script:
<?php
$string = '<a href="google.com/maps">Go to google maps</a> and some special characters ë è & ä etc.';
$string = htmlentities($string, ENT_QUOTES, 'UTF-8');
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$root = $doc->createElement('top');
$root = $doc->appendChild($root);
$title = $doc->createElement('title');
$title = $root->appendChild($title);
$id = $doc->createAttribute('id');
$id->value = '1';
$text = $title->appendChild($id);
$text = $doc->createTextNode($string);
$text = $title->appendChild($text);
$doc->save('data.xml');
echo 'data saved!';
?>
I'm using htmlentities to translate the whole string into HTML format; if I leave this out, the special characters won't be translated. This is the output:
<?xml version="1.0" encoding="UTF-8"?>
<top>
<title id="1">&amp;lt;a href=&amp;quot;google.com/maps&amp;quot;&amp;gt;Go to google maps&amp;lt;/a&amp;gt; and some special characters &amp;euml; &amp;egrave; &amp;amp; &amp;auml; etc.</title>
</top>
The HTML tags get double-encoded (e.g. &amp;lt;) and an ampersand becomes &amp;amp;.
Is this normal behavior, or how can I prevent it? It looks like double encoding.
Try to remove the line:
$string = htmlentities($string, ENT_QUOTES, 'UTF-8');
The text passed to createTextNode() is escaped for you anyway.
Update:
If you want the UTF-8 characters to be escaped, you could keep that line and instead pass $string directly to createElement().
For example:
$title = $doc->createElement('title', $string);
$title = $root->appendChild($title);
The PHP documentation says the value passed to createElement() is used verbatim (only < and > are escaped). I haven't tried it, but it should work.
It is htmlentities that turns a & into &amp;.
When working with XML data you should not use htmlentities: give DOMDocument a plain & (not &amp;) and it will do the escaping itself.
As of PHP 5.4 the default encoding for htmlentities is UTF-8, so there is no need to pass 'UTF-8' explicitly.
This line:
$string = htmlentities($string, ENT_QUOTES, 'UTF-8');
… encodes a string as HTML.
This line:
$text = $doc->createTextNode($string);
… encodes your string of HTML as XML.
This gives you an XML representation of an HTML string. When the XML is parsed you get the HTML back.
how can I prevent this from happening?
If your goal is to store some text in an XML document, remove the line that encodes it as HTML.
Looks like a double encoding.
Pretty much. It is encoded twice, it just uses different (albeit very similar) encoding methods for each of the two passes.
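A minimal sketch of the script without the htmlentities() call, so the text is escaped exactly once. It assumes you can inspect the result as a string, so it uses saveXML() instead of save('data.xml'), and buildXml() is a hypothetical wrapper; the last lines parse the XML back to show the original string is recovered.

```php
<?php
// Store raw text in XML and let DOMDocument escape it exactly once.
// No htmlentities() call: createTextNode() escapes &, < and > itself.
function buildXml(string $text): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $root = $doc->appendChild($doc->createElement('top'));
    $title = $root->appendChild($doc->createElement('title'));
    $title->setAttribute('id', '1');
    $title->appendChild($doc->createTextNode($text));

    return $doc->saveXML();
}

$string = '<a href="google.com/maps">Go to google maps</a> and some special characters ë è & ä etc.';
$xml = buildXml($string);
echo $xml;

// Parsing the XML again gives back the original string, not HTML-encoded text.
$check = new DOMDocument();
$check->loadXML($xml);
$same = $check->getElementsByTagName('title')->item(0)->textContent === $string;
echo $same ? "round-trip ok\n" : "mismatch\n";
```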

PHP - convert a string with - or + signs to HTML

How do I convert a string that has a - or + sign to a html friendly string?
I mean to convert those characters to HTML notation, like a space is &nbsp; and so on...
ps: htmlentities doesn't work. I still see the -/+
Try this
$string = str_replace('+', '&#43;', $string); // Convert + sign
$string = str_replace('-', '&#45;', $string); // Convert - sign
I don't think there are entities for these symbols; see: http://www.w3schools.com/tags/ref_entities.asp
I tested with
$str = "- and +"; echo htmlentities($str);
and didn't get entities. According to: http://us.php.net/manual/en/function.htmlentities.php
I would expect them to be encoded if there was encoding available.
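No idea what you want to accomplish, but this escapes selected characters to HTML numeric entities (using preg_replace_callback, since the /e modifier was removed in PHP 7.0):
$html = preg_replace_callback('/[+-]/', function ($m) { return '&#' . ord($m[0]) . ';'; }, $html);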
As far as I am aware, - and + are fine in HTML and don't have an entity equivalent. See http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
Are you sure you're not thinking of URL encoding?
Specify that you want it to use unicode as follows:
htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
Have a look at the 2nd comment on this page:
http://www.php.net/manual/en/function.htmlentities.php#100388
This will enable more encoding characters.
If you just want to encode some, then this is a little lighter weight:
<?php
$ent = array(
    '+' => '&#43;',
    '-' => '&#45;'
);
echo strtr('+ and -', $ent);
?>
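Generalising the strtr() approach, a sketch that builds the map from any list of characters instead of hard-coding the codes. encodeChars() is a hypothetical name; mb_ord() assumes PHP 7.2+ with the mbstring extension.

```php
<?php
// htmlentities() leaves '+' and '-' alone (there is no HTML 4.01 named
// entity for them). To force numeric references, map each character to
// &#<code point>; and apply the map with strtr().
function encodeChars(string $s, array $chars = ['+', '-']): string
{
    $map = [];
    foreach ($chars as $c) {
        $map[$c] = '&#' . mb_ord($c, 'UTF-8') . ';'; // e.g. '+' => '&#43;'
    }
    return strtr($s, $map);
}

echo htmlentities('- and +'), "\n"; // - and +
echo encodeChars('- and +'), "\n";  // &#45; and &#43;
```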
