How to convert special characters in HTML using PHP? - php

I'm looking to convert special characters such as smart quotes to HTML entities without converting the HTML markup itself, because I need the markup to keep working.
For example, convert <div>NVH “noise”</div> to <div>NVH &ldquo;noise&rdquo;</div>.
It's strange: if I log this in my local environment I get “noise” with smart quotes, but on the server I get ?noise?. Locally I run LAMP with PHP 5.6; the server ran PHP 5.4 and 5.5. I upgraded it to 5.6 and still had no luck. I suspect something in the PHP configuration or elsewhere in the environment. The code is exactly the same.

If you only need to translate the smart quotes, or want to convert some characters but not others, str_replace is probably the simplest way to go:
$string = str_replace(array('“', '”'), array('&ldquo;', '&rdquo;'), $string);
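The same idea can be extended to single curly quotes with a lookup table passed to strtr(). This is a minimal sketch, assuming both the PHP source file and the input are UTF-8; the function name smart_quotes_to_entities() is hypothetical. Because the table contains no <, > or &, the markup passes through untouched.

```php
<?php
// Hypothetical helper: converts curly quotes to named HTML entities and
// leaves everything else, including HTML tags, unchanged.
// Assumes this file and $html are UTF-8 encoded (the keys are UTF-8 characters).
function smart_quotes_to_entities($html)
{
    $map = array(
        '“' => '&ldquo;', // U+201C left double quotation mark
        '”' => '&rdquo;', // U+201D right double quotation mark
        '‘' => '&lsquo;', // U+2018 left single quotation mark
        '’' => '&rsquo;', // U+2019 right single quotation mark
    );
    // strtr() with an array replaces every key in a single pass.
    return strtr($html, $map);
}

echo smart_quotes_to_entities('<div>NVH “noise”</div>');
// prints <div>NVH &ldquo;noise&rdquo;</div>
```

If the server still produces ?noise?, the input is probably not reaching the function as UTF-8, and no lookup table will match it.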

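Try PHP's htmlspecialchars() function. Be aware that it escapes the markup as well (<, >, & and, with ENT_QUOTES, both kinds of quote), so the tags will no longer render as HTML: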
<?php
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo $new; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
?>

Related

Settings that could influence PHP str_replace behaviour

I am currently working on a replacement tool that dynamically replaces certain strings (including HTML) in a website using a Smarty output filter.
To do the replacement, I read the code to be replaced and its replacement from a database, apply PHP's str_ireplace method, and then pass the result to the Smarty output (using an output filter), roughly as below:
$tpl_source = str_ireplace($replacements['sourceHTML'], $replacements['replacementHTML'], $tpl_source);
The problem is that, although it works great on my dev server, replacements occasionally fail once the code is uploaded to the live server. The same replacements work just fine on my dev version. After some examination and googling I could not find much about this issue. So my question is: what could influence str_replace's behaviour?
Thanks
Edit with replacement example:
$htmlsource = file_get_contents('somefile.html');
$newstr = str_replace('Some text', 'sometext', $htmlsource); // the text to be replaced does exist in the html source
fails to replace. After some checking, it looks like the combination "> causes the problem, but only that combination: if I try to replace only (") it works, and if I try to replace only (>) it works.
It might be that special characters such as umlauts do not display correctly on the live server, in which case str_replace() would fail if the string you want to replace contains such characters.
Is the input string identical on both systems? Have you verified this? Are you sure?
Things to check:
Are the HTML attributes in the same order?
Are the attribute values using the same kind of quote marks? (e.g. <a href='#'> vs <a href="#">)
Is there any other stray HTML code getting in there?
Is the entity encoding the same? (e.g. &nbsp; vs &#160;: same character, different HTML)
Is the character set the same? (e.g. UTF-8 vs ISO 8859-1: accented characters will be encoded differently)
Any of these things will affect the result and produce the failures you're describing.
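Since str_replace() compares bytes, one way to answer "is the input identical?" is to find the first byte where the two strings diverge. A minimal diagnostic sketch; first_difference() and the two sample strings are hypothetical, chosen to show the single-quote vs double-quote case from the list above.

```php
<?php
// Hypothetical diagnostic helper: returns the byte offset of the first
// difference between two strings, or -1 if they are identical.
function first_difference($a, $b)
{
    $len = min(strlen($a), strlen($b));
    for ($i = 0; $i < $len; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // Equal up to $len: identical, or one is a prefix of the other.
    return strlen($a) === strlen($b) ? -1 : $len;
}

$expected = '<a href="#">Some text</a>';  // e.g. copied from the browser
$actual   = "<a href='#'>Some text</a>";  // e.g. what the filter receives
$pos = first_difference($expected, $actual);
if ($pos !== -1) {
    // bin2hex() makes invisible differences (CRLF, NBSP, encoding) visible.
    printf("Differ at byte %d: %s vs %s\n", $pos,
        bin2hex(substr($expected, $pos, 8)), bin2hex(substr($actual, $pos, 8)));
}
```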
This was a tricky problem, and it ended up having nothing to do with the str_replace method itself.
We are using Smarty as a templating system. The str_replace method was used by a Smarty output filter to change the HTML on some occasions, just before it was delivered to the user.
Here is the Smarty outputfilter Code:
function smarty_outputfilter_replace($tpl_source, &$smarty)
{
    $replacements = Content::getReplacementsForPage();
    if (is_array($replacements))
    {
        foreach ($replacements as $replacementData)
        {
            $tpl_source = str_replace($replacementData['sourcecode'], $replacementData['replacementcode'], $tpl_source);
        }
    }
    return ($tpl_source);
}
So this code failed now and then for no apparent reason... until I realized that the HTML generated from the Smarty template was being manipulated by an Apache filter.
As a result, the source code in the browser (which we were using as the code to be replaced) was not identical to the template code that Smarty was trying to modify. Result? str_replace failed :)
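Because these failures were silent, one way to surface them sooner is str_replace()'s optional fourth argument, which receives the number of replacements made. A minimal sketch outside Smarty, assuming the same 'sourcecode'/'replacementcode' keys as the filter above; apply_replacements() and the sample data are hypothetical.

```php
<?php
// Hypothetical helper: applies each replacement and collects the ones that
// matched nothing, so a template/browser mismatch is logged, not silent.
function apply_replacements($source, array $replacements, &$unmatched)
{
    $unmatched = array();
    foreach ($replacements as $data) {
        // The fourth argument receives the number of replacements performed.
        $source = str_replace($data['sourcecode'], $data['replacementcode'], $source, $count);
        if ($count === 0) {
            $unmatched[] = $data['sourcecode'];
        }
    }
    return $source;
}

$html = '<p class="intro">Hello</p>';
$out = apply_replacements($html, array(
    array('sourcecode' => '<p class="intro">', 'replacementcode' => '<p class="lead">'),
    // Uses single quotes, so it does not match the double-quoted source.
    array('sourcecode' => "<p class='note'>", 'replacementcode' => '<p class="aside">'),
), $missing);

foreach ($missing as $needle) {
    error_log('Replacement not found: ' . $needle);
}
```

In the filter itself, the same $count check could go inside the existing foreach loop.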

Removing newlines from ob_get_clean() output

OK, I have the following code:
<?php
ob_start();
?>
codepad is an
online compiler/interpreter,
and a simple collaboration tool.
Paste
your code below,
and codepad wi
ll run
it and give you a short
URL you can use to share
it in chat or email
<?php
$str = str_replace('\r\n','',trim(ob_get_clean()));
echo $str;
?>
and you can see how it works here
http://codepad.org/DrOmyoY9
Now, what I want is to remove the newlines from the output stored by ob_get_clean().
I've looked around the internet for how to remove newlines from strings, and this seems to be the common and fastest method, apart from the slower preg_replace().
Why aren't the newlines removed? Is this a bug, or did I just miss something?
\r\n is the Windows-style line ending, but text coming from Linux or Mac uses different line endings, so the best solution is:
$str = str_replace(array("\r","\n"),'',trim(ob_get_clean()));
I think you missed one thing. It should be:
$str = str_replace("\r\n",'',trim(ob_get_clean()));
using double quotes, not single quotes. Inside single quotes, '\r\n' is not a line break but the four literal characters \ r \ n.
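Both answers combined, as a minimal sketch: the quoting difference, plus a helper that handles all three line-ending styles. The name strip_newlines() and its optional $replacement parameter are hypothetical.

```php
<?php
// In single quotes, '\r\n' is four literal characters: \ r \ n.
// In double quotes, "\r\n" is the two control characters CR and LF.
var_dump(strlen('\r\n'), strlen("\r\n")); // prints int(4) and int(2)

// Hypothetical helper: removes Windows (\r\n), old Mac (\r) and Unix (\n)
// line endings. Listing "\r\n" first matters when $replacement is not
// empty, so that a CRLF pair becomes one replacement instead of two.
function strip_newlines($text, $replacement = '')
{
    return str_replace(array("\r\n", "\r", "\n"), $replacement, $text);
}

ob_start();
echo "codepad is an\r\nonline compiler/interpreter,\nand a simple collaboration tool.";
$str = strip_newlines(trim(ob_get_clean()));
echo $str; // codepad is anonline compiler/interpreter,and a simple collaboration tool.
```

As the output shows, removing line breaks outright can glue words together; passing ' ' as the second argument keeps them apart.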

Convert &apos; to an apostrophe in PHP

My data has many HTML entities in it (&bull; etc.), including &apos;. I just want to convert them to their character equivalents.
I assumed htmlspecialchars_decode() would work, but no luck. Thoughts?
I tried this:
echo htmlspecialchars_decode('They&apos;re here.');
But it returns: They&apos;re here.
Edit:
I've also tried html_entity_decode(), but it doesn't seem to work:
echo html_entity_decode('They&apos;re here.');
also returns: They&apos;re here.
Since &apos; is not part of HTML 4.01, it's not converted to ' by default.
In PHP 5.4.0, extra flags were introduced for different document types (ENT_XML1, ENT_XHTML and ENT_HTML5), each of which includes &apos; as an entity.
This means you can do something like this:
echo html_entity_decode('They&apos;re here.', ENT_QUOTES | ENT_HTML5);
You will need both ENT_QUOTES (convert single and double quotes) and ENT_HTML5 (or any document-type flag other than ENT_HTML401; choose the one most appropriate to your situation).
Prior to PHP 5.4.0, you'll need to use str_replace:
echo str_replace('&apos;', "'", 'They&apos;re here.');
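The two approaches can be combined into one helper that works on either side of PHP 5.4.0. A minimal sketch; decode_entities() is a hypothetical name, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: decodes all entities, including &apos;.
// ENT_HTML5 only exists from PHP 5.4.0, so check for it first; on older
// versions the constant is never evaluated because the branch is skipped.
function decode_entities($text)
{
    if (defined('ENT_HTML5')) {
        return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    // Pre-5.4 fallback: replace &apos; by hand, then decode the rest.
    return html_entity_decode(str_replace('&apos;', "'", $text), ENT_QUOTES, 'UTF-8');
}

echo decode_entities('They&apos;re here &bull; now.'); // They're here • now.
```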
There is a "right" way without using str_replace. @cbuckley was right: it's because the default for html_entity_decode is HTML 4.01, but you can pass the ENT_HTML5 flag, which will decode it.
Use it like this:
html_entity_decode($str, ENT_QUOTES | ENT_HTML5);
Unfortunately, the &apos; entity and many others are not in the default (HTML 4.01) translation table used by the html_entity_decode and htmlspecialchars_decode functions.
Check this comment from the PHP manual:
http://php.net/manual/en/function.get-html-translation-table.php#73410
This should work:
$value = "They&apos;re here.";
html_entity_decode(str_replace("&apos;","'",$value));
What you are actually looking for is html_entity_decode().
html_entity_decode() translates all entities to characters, while htmlspecialchars_decode() only reverses what htmlspecialchars() will encode.
EDIT: Looking at the examples on the page I linked to, I did a bit more investigation, and the following does not seem to work:
[matt@scharley ~]$ php
<?php
$tmp = array_flip(get_html_translation_table(HTML_ENTITIES));
var_dump($tmp['&apos;']);
PHP Notice: Undefined index: &apos; in - on line 3
NULL
This is why it's not working. Why it's not in the lookup table is another question entirely, something I can't answer unfortunately.
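The lookup fails because get_html_translation_table() was called with its default document type, ENT_HTML401. A minimal sketch, assuming PHP 5.4+, that builds the table with explicit flags for both document types shows the difference:

```php
<?php
// The table maps characters to entities, so flip it to look entities up.
// The flags select the document type; the default is ENT_HTML401.
$html401 = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8'));
$html5   = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5, 'UTF-8'));

var_dump(isset($html401['&apos;'])); // bool(false): HTML 4.01 encodes ' as &#039;
var_dump(isset($html5['&apos;']));   // bool(true)
```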
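Have you tried using echo htmlspecialchars('They&apos;re here.')?
I think that is what you are looking for. (Note that htmlspecialchars() encodes rather than decodes, so with its default settings this outputs They&amp;apos;re here.)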

PHP: How to prevent unwanted line breaks

I'm using PHP to create some basic HTML. The tags are always the same, but the actual links/titles correspond to PHP variables:
$string = '<p style="..."><strong><i>'.$title[$i].'</i></strong>
<br>';
echo $string;
fwrite($outfile, $string);
The resultant html, both as echoed (when I view the page source) and in the simple txt file I'm writing to, reads as follows:
<p style="..."><a href="http://www.example.com
"><strong><i>Example Title
</i></strong></a></p>
<br>
While this works, it's not exactly what I want. It looks like PHP is adding a line break every time I interrupt the string to insert a variable. Is there a way to prevent this behavior?
Whilst the line breaks won't affect your HTML page at all (unless you are using <pre> or white-space: pre), you should be able to call trim() on those variables to remove the newlines.
To find out if your variable has a newline at front or back, try this regex
var_dump(preg_match('/^\n|\n$/', $variable));
(Single quotes keep \n as an escape sequence for PCRE to interpret; with double quotes PHP turns it into a literal newline first, which the regex also matches, so either works.)
My guess is your variables are to blame. You might try cleaning them up with trim: http://us2.php.net/trim.
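A minimal sketch of the trim() fix. The code in the question lost its link variable, so $links and $titles here are hypothetical sample data; the trailing "\n" imitates values read with fgets() or with file() without FILE_IGNORE_NEW_LINES, both of which keep the line ending.

```php
<?php
// Hypothetical data with the trailing newlines that cause the stray breaks.
$links  = array("http://www.example.com\n");
$titles = array("Example Title\n");

$i = 0;
// trim() strips leading and trailing whitespace, including \n and \r.
$string = '<p style="..."><a href="' . trim($links[$i]) . '"><strong><i>'
        . trim($titles[$i]) . '</i></strong></a></p>';
echo $string;
```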
The line breaks show up because of multi-byte encoding, I believe. Try:
$newstring = mb_substr($string_w_line_break,[start],[length],'UTF-8');
That worked for me when strange line breaks showed up after parsing html.

Searching through a string trying to find &#39; in PHP

I am using TinyMCE and, rather annoyingly, it replaces all of my apostrophes with their HTML numeric equivalent. Most of the time this isn't a problem, but for some reason I am having trouble storing the apostrophe replacement, so I have to search through the string and replace them all. Any help would be much appreciated.
did you try:
$string = str_replace("&#39;", "<replacement>", $string);
Is it just apostrophes that you want decoded from HTML entities, or everything?
print html_entity_decode("Hello, that&#39;s an apostrophe.", ENT_QUOTES);
will print
Hello, that's an apostrophe.
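If only the apostrophes should be decoded and every other entity left alone, a plain str_replace over the apostrophe entity forms is enough. A minimal sketch; decode_apostrophes() is a hypothetical name.

```php
<?php
// Hypothetical helper: decodes only apostrophe entities and leaves every
// other entity (e.g. &amp;, &lt;) encoded.
function decode_apostrophes($text)
{
    // Decimal, zero-padded decimal, hex and named forms of the apostrophe.
    return str_replace(array('&#39;', '&#039;', '&#x27;', '&apos;'), "'", $text);
}

echo decode_apostrophes('That&#39;s &lt;b&gt;bold&lt;/b&gt;');
// prints That's &lt;b&gt;bold&lt;/b&gt;
```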
Why work around the problem when you can fix the cause? You can simply turn off TinyMCE's entity encoding.*
*Unless you want all the other characters encoded, that is.
