PHP function to convert special characters to Unicode (UTF-16)

Is there a PHP function that can take a string and convert any special characters to Unicode escapes, similar to htmlspecialchars() or utf8_encode()?
For example in the string: "I think Bob's going too".
I would need the apostrophe or single right quote unicode in place of the apostrophe in "Bob's". So then after conversion the string should read: "I think Bob\u2019s going too".
I need this for use in a PHP script that prints into a javascript function.
Using \ to escape, or ', does not work; it stops the script from running. I am trying to use Flowplayer's Playlist plugin. The only way it seems I can have a string with special characters is if they are written as Unicode escapes.
Here is a JSFIDDLE to play around with and see what I mean when I say it doesn't work. Just replace \u2019 with ' or something similar and click to have the song play. The media player just goes black and doesn't play anything, whereas if you leave it with \u2019 then it plays fine.
Any help is appreciated.

I think json_encode() is the function you are looking for here.
The following code:
$string = "I think Bob’s going too";
print_r(json_encode($string));
will output:
"I think Bob\u2019s going too"
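Since the string is being printed into a JavaScript function call, here is a minimal sketch of how the json_encode() result might be embedded. The function name playTrack is hypothetical and stands in for whatever the page's script actually calls; the JSON_HEX_* flags are an extra precaution for output that sits inside a script tag, not something the answer above requires.

```php
<?php
// "\u{2019}" is the right single quotation mark (PHP 7+ escape syntax).
$title = "I think Bob\u{2019}s going too";

// json_encode() returns a quoted, escaped string literal that is safe to
// drop into JavaScript source. The JSON_HEX_* flags additionally escape
// <, >, ', " and & so the value cannot break out of a <script> block.
$jsLiteral = json_encode(
    $title,
    JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP
);

// playTrack() is a hypothetical JavaScript function used for illustration.
echo "<script>playTrack(" . $jsLiteral . ");</script>\n";
```

Note that json_encode() already adds the surrounding double quotes, so the PHP template should not add its own.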

Related

Finding and replacing special characters in php

Encoding makes this a tough thing to explain. I'm getting a string from an XML file using PHP. When I echo it I see a small black circle: • or • . Oh, stackoverflow renders these, sorry. I meant to say it's the ascii character "bull" or "#8226"
echo $str;
gets me:
[CIRCLE] wordswords [CIRCLE] more words [CIRCLE] still more words
How can I find this character using PHP? I want to explode on it. I can't search for a circle, and searching for 8226 or circ doesn't work. Do I have to use urlencode?
$str = urlencode($str);
$str = str_replace('%E2%80%A2', '-CIRCLE-', $str);
$str = urldecode($str);
$parts = explode('-CIRCLE-', $str);
Or is there a more efficient way?
Check out this thread: Bullet "•" in XML. I think it will help you find an answer.
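As a rough sketch of a more direct route, assuming the string is UTF-8: decode any leftover entities first, then explode on the bullet character itself. The sample input below is made up for illustration and mixes the numeric entity, the named entity, and the literal character.

```php
<?php
// Hypothetical sample input; a real string would come from the XML file.
$raw = "&#8226; wordswords &bull; more words \u{2022} still more words";

// Turn &#8226; and &bull; into the real U+2022 character (UTF-8 assumed).
$str = html_entity_decode($raw, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Split on the bullet, trim whitespace, and drop empty pieces.
$bullet = "\u{2022}";
$parts = array_values(
    array_filter(array_map('trim', explode($bullet, $str)), 'strlen')
);

print_r($parts);
```

This avoids the urlencode() round trip entirely; the only assumption is that the encoding really is UTF-8.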

htmlspecialchars in php not decoding

I have an application where I store the string as it is, but when displaying it I want special characters to be converted to their HTML entity names, e.g. < becomes &lt;. To achieve this, I am using PHP's built-in function htmlspecialchars.
Output of text with this function is achieved with following code
$reviewTxt = htmlspecialchars($reviewTxt);
echo $reviewTxt;
Now, if reviewTxt is 'I loved you <3', the function should produce I loved you &lt;3 in the page source, while the browser should display the original text. In my case, the browser displays the encoded data, I loved you &lt;3. I also tried pasting I loved you &lt;3 into the page in place of the PHP code above, just to see if I would get the original text, and yes, it shows 'I loved you <3'.
I am not sure what I am missing.
It looks like you are encoding twice with htmlspecialchars() / htmlentities().
That causes the & symbol of the first result to be encoded again in the second pass, giving you a string like I loved you &amp;lt;3.
So the browser shows the decoded & followed by the literal string lt;.
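A short sketch of the double-encoding effect, and of the fourth parameter of htmlspecialchars() (double_encode), which can guard against it. The variable names are only illustrative.

```php
<?php
$reviewTxt = 'I loved you <3';

// Encoding once gives the markup a browser renders as the original text.
$once = htmlspecialchars($reviewTxt, ENT_QUOTES, 'UTF-8');

// Encoding the already-encoded string also encodes the ampersand.
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');

// Passing false for double_encode leaves existing entities untouched.
$guarded = htmlspecialchars($once, ENT_QUOTES, 'UTF-8', false);

echo $once, "\n", $twice, "\n", $guarded, "\n";
```

The cleaner fix is usually to find and remove the extra encoding step, since double_encode=false would also leave user-typed text such as a literal "&lt;" unencoded.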

How to get &curren to display literally, not as an HTML entity

I'm using php to look at an XML file that has a URL in it. The URLs look something like this:
https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
When I echo out the URLs, the "&curren" shows up as "¤" (AKA #164, A4 or currency symbol) and the links don't work. This happens even though there isn't a closing semicolon for it. What is the cleanest way to make "&curren" display literally?
Funny enough I ran into the same problem just now and I found this answer. However, I found another solution which might even be better!
Simply put the variable at the beginning of your query string, and you will avoid the &curren completely.
Do:
https://site.com/bacon_report?currentDimension=2&Id=1&report=1&param=1
instead of:
https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
Use the php function urlencode:
urlencode("https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1");
will output
https%3A%2F%2Fsite.com%2Fbacon_report%3FId%3D1%26report%3D1%26currentDimension%3D2%26param%3D1
The problem here is escaping - you need to escape the "&" characters. In XML all special characters like <, >, ', " and & should be escaped.
Escape it properly as
https://example.com/bacon_report?Id=1&amp;report=1&amp;currentDimension=2&amp;param=1
..just like in HTML:
WRONG - no escaping
CORRECT - correct escape sequence
So the cleanest way to show "&curren" in HTML/XML is to properly escape the ampersand and write it as "&amp;curren".
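In PHP, this escaping can be sketched with htmlspecialchars() applied at the point where the URL is echoed into HTML. The link markup here is only an illustration of where the escaped value would go.

```php
<?php
$url = 'https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1';

// Escape & as &amp; so the browser cannot read "&curren" as an entity.
$safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

// The browser decodes &amp; back to &, so the link target is unchanged.
echo '<a href="' . $safe . '">' . $safe . '</a>' . "\n";
```

The URL stored in the XML or database stays unescaped; only the copy written into the page is escaped.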
I think that in this case it is best to use htmlentities because with urlencode you get
https%3A%2F%2Fexample.com%2Fbacon_report%3FId%3D1%26report%3D1%26currentDimension%3D2%26param%3D1
and after applying urldecode you will still have the &curren symbol,
whereas with htmlentities the URL comes out clean.
https://example.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
I came across this issue while working on technical documentation (in Markdown which gets converted to HTML).
To solve the issue I used a zero-width space character which I copied and pasted from between these brackets (​). That way it appears that there is no space and can include the below without any issues:
/search?query=1&currentLonLat=-74.600291,40.360869

apostrophe in preg_match_all() is giving me problems

So I've got this piece of code that won't play nice.
preg_match_all("/(\{\[)([\w-\d\s\.\|']*)(\]\})/i",$replace_text, $match);
What it is supposed to do is allow an apostrophe in my replacement text. So in my text, where I have "{[SPIN--they are|they’re]}", it should return "they are" or "they're".
But instead, it simply does nothing and spits out the entire spintax code just as I typed above.
The only time this does not work, is when a replacement text has an apostrophe. It works perfectly everywhere else. Been trying to fix this for two days and I'm about to throw my keyboard through my monitor.
There are many things that my project does and it is imperative to have the {[SPIN-- before specifying the replacement text, and the ]} closing brackets.
Can someone help, please?
In your example string it's not a single quote character, but something that looks similar:
’ (the actual character) vs ' (that's what you think it is)
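One possible way to accept both characters, sketched under the assumption that the text is UTF-8: add U+2019 to the character class and use the /u modifier. The pattern below is a simplified variant of the one in the question (the hyphen is moved to the end of the class so it is unambiguously literal), not a drop-in replacement for it.

```php
<?php
// Sample text using the typographic apostrophe U+2019.
$text = "Maybe {[SPIN--they are|they\u{2019}re]} late.";

// The class allows word characters, whitespace, ".", "|", the ASCII
// apostrophe, the right single quote (\x{2019}), and a literal hyphen.
// The /u modifier makes PCRE treat pattern and subject as UTF-8.
$pattern = '/\{\[SPIN--([\w\s.|\'\x{2019}-]*)\]\}/u';

$count = preg_match_all($pattern, $text, $match);

// Split the captured alternatives on the pipe character.
$options = $count > 0 ? explode('|', $match[1][0]) : [];

print_r($options);
```

Another option would be to normalize the input first, replacing U+2019 with a plain apostrophe via str_replace() before matching.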

Replace characters in a string with their HTML coding

I need to replace characters in a string with their HTML coding.
Ex. The "quick" brown fox, jumps over the lazy (dog).
I need to replace the quotation marks with & quot; and replace the brackets with & #40; and & #41;
I have tried str_replace, but I can only get 1 character to be replaced. Is there a way to replace multiple characters using str_replace? Or is there a better way to do this?
Thanks!
I suggest using the function htmlentities().
Have a look at the Manual.
PHP has a number of functions to deal with this sort of thing:
Firstly, htmlentities() and htmlspecialchars().
But as you already found out, they won't deal with ( and ) characters, because these are not characters that ever need to be rendered as entities in HTML. I guess the question is why you want to convert these specific characters to entities? I can't really see a good reason for doing it.
If you really do need to do it, str_replace() will do multiple string replacements, using arrays in both the search and replace parameters:
$output = str_replace(array('(', ')'), array('&#40;', '&#41;'), $input);
You can also use the strtr() function in a similar way:
$conversions = array('(' => '&#40;', ')' => '&#41;');
$output = strtr($input, $conversions);
Either of these would do the trick for you. Again, I don't know why you'd want to though, because there's nothing special about ( and ) brackets in this context.
While you're looking into the above, you might also want to look up get_html_translation_table(), which returns an array of entity conversions as used in htmlentities() or htmlspecialchars(), in a format suitable for use with strtr(). You could load that array and add the extra characters to it before running the conversion; this would allow you to convert all the normal entity characters as well, in the same pass.
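A minimal sketch of that idea, assuming UTF-8 input and that only the htmlspecialchars() set plus the two brackets need converting:

```php
<?php
$input = 'The "quick" brown fox, jumps over the lazy (dog).';

// Start from the table used by htmlspecialchars() (&, <, >, " and ').
$table = get_html_translation_table(
    HTML_SPECIALCHARS,
    ENT_QUOTES | ENT_HTML401,
    'UTF-8'
);

// Add the extra characters that are not normally converted.
$table['('] = '&#40;';
$table[')'] = '&#41;';

// strtr() with an array replaces everything in one pass, so the "&" in
// an inserted entity is not encoded a second time.
$output = strtr($input, $table);

echo $output, "\n";
```

Swapping HTML_SPECIALCHARS for HTML_ENTITIES would pull in the larger htmlentities() table instead.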
I would point out that if you serve your page with the UTF8 character set, you won't need to convert any characters to entities (except for the HTML reserved characters <, > and &). This may be an alternative solution for you.
You also asked in a separate comment about converting line feeds. These can be converted with PHP's nl2br() function, but could also be done using str_replace() or strtr(), so could be added to a conversion array with everything else.
