I need to remove all dodgy HTML characters from a website I'm parsing using cURL and Simple HTML DOM.
<?php
$html = "this is&nbsp;a text";
var_dump($html);
var_dump(html_entity_decode($html,ENT_COMPAT,"UTF-8"));
Which outputs
string(19) "this is&nbsp;a text"
string(15) "this is a text"
I don't want to use preg* as there are other characters in the text (e.g. °).
This is driving me insane now!
Thanks,
James
You need to specify your output encoding with a header:
<?php
header('Content-Type: text/html; charset=utf-8');
$html = "this is&nbsp;a text";
var_dump($html);
var_dump(html_entity_decode($html,ENT_COMPAT,"UTF-8"));
?>
The browser does not assume UTF-8 by default, which is why it displays the wrong character.
If that's the only character that needs replacing just use str_replace()
var_dump(str_replace('&nbsp;', ' ', "this is&nbsp;a text"));
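The two suggestions above can also be combined. The following is a minimal sketch, assuming the input is UTF-8 and that only non-breaking spaces need normalising: decode the entities first, then replace the decoded U+00A0 character with a plain space. The helper name normalize_nbsp() is hypothetical, not part of PHP or Simple HTML DOM.

```php
<?php
// Hypothetical helper: decode HTML entities, then turn the decoded
// non-breaking space (U+00A0, bytes 0xC2 0xA0 in UTF-8) into a plain space.
// Other decoded characters (for example the degree sign) are left alone.
function normalize_nbsp(string $html): string
{
    $decoded = html_entity_decode($html, ENT_COMPAT, 'UTF-8');
    return str_replace("\xC2\xA0", ' ', $decoded);
}

var_dump(normalize_nbsp("this is&nbsp;a text")); // string(14) "this is a text"
```

Because the replacement targets only the two bytes of U+00A0, no preg* call is needed and other non-ASCII characters survive untouched.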
I am padding my string with spaces to a fixed width, but it doesn't work.
The code is here:
<?php
$string1 = "Product 1 ";
$newString = str_pad($string1,100);
echo $newString."test";
echo "<br>";
$string2 = "Product 2222 ";
echo str_pad($string2,100," ")."test";
echo "<br>";
?>
the output is like this:
Product 1 test
Product 2222 test
You could try $str = str_pad($string2,(100*strlen("&nbsp;")),"&nbsp;")."test"; instead.
&nbsp; renders as a non-breaking space in HTML (and when writing to a document with FPDF).
Please note that this only works with FPDF when you tell it to write all lines as HTML, and the encoding should probably be UTF-8:
$fpdf->Write(iconv('UTF-8', 'windows-1252', html_entity_decode($str)));
When the PHP output is rendered as HTML, the browser collapses every run of consecutive white space into a single space. This is standard HTML behaviour, so the padding will not show up in the output.
You have to use "&nbsp;" instead of a plain space in the str_pad function. HTML does not collapse "&nbsp;"; each occurrence is rendered as one space.
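One caveat with passing "&nbsp;" to str_pad() is that the pad length is counted in bytes, so a six-byte entity can be cut off in the middle. Below is a minimal sketch of an alternative, assuming the text is plain single-byte text with no entities of its own; the helper name pad_html() is hypothetical.

```php
<?php
// Hypothetical helper: pad $text to $width visible columns by appending
// one whole &nbsp; entity per missing column, so no entity is truncated.
// Assumes $text is single-byte text (strlen() counts bytes, not characters).
function pad_html(string $text, int $width): string
{
    $missing = max(0, $width - strlen($text));
    return $text . str_repeat('&nbsp;', $missing);
}

echo pad_html("Product 1", 20) . "test<br>\n";
echo pad_html("Product 2222", 20) . "test<br>\n";
```

In a monospaced font both "test" strings should then start in the same column; in a proportional font the alignment is only approximate.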
I'm just running some example PHP code verbatim, but it's output as a single line in my browser. I'm expecting to see multiple lines.
<?php
$author = "Alfred E Newman";
echo <<<_END
This is a Headline
This is the first line.
This is the second.
- Written by $author.
_END;
?>
Your browser by default assumes that any output is HTML and when displaying HTML, newline characters are treated like spaces. You'd either need to output HTML with BR or P tags to force newlines or you can send a content-type header to tell the browser that the output you are sending is plain text.
<?php
$author = "Alfred E Newman";
// tell the browser that your output is plain text
header("Content-Type: text/plain");
echo <<<_END
This is a Headline
This is the first line.
This is the second.
- Written by $author.
_END;
?>
<?php
$author = "Alfred E Newman";
$str = "This is a Headline
This is the first line.
This is the second.
- Written by $author.
";
echo nl2br($str);
?>
This will give you what you need: nl2br() inserts an HTML line break before each newline in the string.
I have some text with <br> tags and I want to save it into a MySQL database with real new lines instead of the HTML tags.
For example:
$string = 'some text with<br>tags here.';
and I want to save it into MySQL like this:
some text with
tags here
What is the right str_replace call for this purpose? Thank you.
There is already a function in PHP that converts a new line to a br called nl2br(). However, the reverse is not true. Instead you can create your own function like this:
function br2nl($string)
{
$breaks = array("<br />","<br>","<br/>");
return str_ireplace($breaks, "\r\n", $string);
}
Then whenever you want to use it, just call it as follows:
$original_string = 'some text with<br>tags here.';
$good_string = br2nl($original_string);
There are three things worth mentioning:
1. It may be better to store the data in the database exactly as the user entered it and then do the conversion when you retrieve it. Of course, this depends on what you are doing.
2. Some systems such as Windows use \r\n. Some systems such as Linux and Mac use \n. Some systems such as older Mac systems use \r for new line characters. Given this, and especially if you choose to follow point 1 above, you might prefer to use the PHP constant PHP_EOL instead of \r\n. This gives the new line sequence of the platform PHP is running on.
3. The method I posted above will be more efficient than preg_replace. However, it does not take into account other variations of the tag, such as ones with extra white space. If you need to take these variations into account then you should use the preg_replace() function. With that said, one can overthink all the possible variations and yet still not account for them all. For example, consider <br id="mybreak"> and many other combinations of attributes and white space.
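As a rough illustration of the preg_replace() route mentioned in the last point, here is a minimal sketch. The function name br2nl_regex() and the default "\n" newline are assumptions made for this example, and the pattern will still fail on an attribute value that itself contains ">".

```php
<?php
// Hypothetical variant of br2nl(): matches <br>, <br/>, <br />, <BR> and
// <br ...attributes...>. The \b stops it from matching other tags whose
// names merely start with "br".
function br2nl_regex(string $html, string $newline = "\n"): string
{
    return preg_replace('/<br\b[^>]*>/i', $newline, $html);
}

echo br2nl_regex('some text with<br id="mybreak">tags<BR  />here.');
```

Pass "\r\n" or PHP_EOL as the second argument if a different line ending is wanted.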
You could use str_replace, as you suggest.
$string = 'some text with<br>tags here.';
$string = str_replace('<br>', "\r\n", $string);
Although, if your <br> tags may also be closed, <br /> or <br/>, it may be worth considering using preg_replace.
$string = 'some text with<br>tags here.';
$string = preg_replace('/<br\s*\/?>/i', "\r\n", $string); // matches <br>, <br/> and <br />
Try this. It replaces every <br> with \r\n:
$string = 'some text with<br>tags here.';
$string = str_replace("<br>","\r\n",$string);
echo $string;
Output:
some text with
tags here.
You can use htmlentities() to convert all applicable characters to HTML entities, and html_entity_decode() to convert HTML entities back to characters:
$string = 'some text with<br>tags here';
$a = htmlentities($string);
$b = html_entity_decode($a);
echo $a; // some text with&lt;br&gt;tags here
echo $b; // some text with<br>tags here
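Try mysql_real_escape_string() (note that the old mysql_* extension was removed in PHP 7; mysqli_real_escape_string() is its mysqli counterpart):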
function safe($value){
return mysql_real_escape_string($value);
}
I have the below string (in Turkish):
$string = "Otpor lideri Maroviç: Gezi eylemcileriyle temasımız olmadı";
However, when I attempt to echo the string, I get the below result:
Otpor lideri Maroviç: Gezi eylemcileriyle temas?m?z olmad?
How can I solve this problem?
First of all:
$string = "Otpor lideri Maroviç: Gezi eylemcileriyle temasımız olmadı";
echo htmlentities($string);
And make sure to add this to your <head>:
<meta charset="UTF-8">
header('Content-type: text/plain; charset=utf-8');
$string = "Otpor lideri Maroviç: Gezi eylemcileriyle temasımız olmadı";
echo $string;
Open your file in a code editor such as Notepad++ and use its "Convert to UTF-8" function.
This should help. Here in Poland we are also using special characters and this is a common problem.
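To show what the "Convert to UTF-8" step changes, here is a minimal sketch. It assumes, purely for illustration, that the text was originally stored as ISO-8859-9 (the Turkish Latin-5 code page) and that the mbstring extension is available; the real source encoding may differ.

```php
<?php
// Assumption for illustration: the text arrived as ISO-8859-9 (Latin-5),
// where the dotless "ı" is the single byte 0xFD.
$latin5 = "temas\xFDm\xFDz olmad\xFD";

// These bytes are not valid UTF-8, so a page declared as UTF-8
// cannot display them correctly.
var_dump(mb_check_encoding($latin5, 'UTF-8')); // bool(false)

// Convert to UTF-8 before sending the text with a UTF-8 Content-Type header.
$utf8 = mb_convert_encoding($latin5, 'UTF-8', 'ISO-8859-9');
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
echo $utf8, "\n";
```

The point is that the file's bytes, the Content-Type header and the meta charset all have to agree on the same encoding.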
I am having trouble working out how to do this. I have a string that looks something like this:
$text = "<p>This is some example text This is some example text This is some example text</p>
<p><em>This is some example text This is some example text This is some example text</em></p>
<p>This is some example text This is some example text This is some example text</p>";
I basically want to use something like preg_replace and a regex to remove:
<em>This is some example text This is some example text This is some example text</em>
So I need to write some PHP code that will search for the opening <em> and closing </em> and delete all text in-between
hope someone can help,
Thanks.
$text = preg_replace('/([\s\S]*)(<em>)([\s\S]*)(<\/em>)([\s\S]*)/', '$1$5', $text); // the / in </em> must be escaped
In case you are interested in a non-regex solution, the following would work as well:
<?php
$text = "<p>This is some example text This is some example text This is some example text</p>
<p><em>This is some example text This is some example text This is some example text</em></p>
<p>This is some example text This is some example text This is some example text</p>";
$emStartPos = strpos($text,"<em>");
$emEndPos = strpos($text,"</em>");
if ($emStartPos !== false && $emEndPos !== false) { // strpos() may legitimately return 0
$emEndPos += 5; // include the closing </em> tag (5 characters)
$len = $emEndPos - $emStartPos;
$text = substr_replace($text, '', $emStartPos, $len);
}
?>
This removes the <em> element together with everything between its tags.
$text = '<p>This is some example text This is some example text This is some example text</p>
<p><em>This is the em text</em></p>
<p>This is some example text This is some example text This is some example text</p>';
preg_match("#<em>(.+?)</em>#", $text, $output);
echo $output[0]; // This will output it with em style
echo '<br /><br />';
echo $output[1]; // This will output only the text between the em
For this example to work, I changed the <em></em> contents a little, otherwise all your text is the same and you cannot really understand if the script works.
However, if you want to remove the <em> element and its contents rather than extract them:
$text = '<p>This is some example text This is some example text This is some example text</p>
<p><em>This is the em text</em></p>
<p>This is some example text This is some example text This is some example text</p>';
echo preg_replace("/<em>(.+?)<\/em>/s", "", $text); // lazy match; the s modifier lets . span newlines
Use strpos to find the position of the opening tag and then of the closing tag.
Use substr to get that part of the string.
Then replace the substring with an empty string in the original string.
format: $text = str_replace('<em>','',$text);
$text = str_replace('</em>','',$text);
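The strpos-based steps described above can be sketched as a small loop that removes every block, tags included. The helper name remove_between() is hypothetical, and the sketch assumes the tags are not nested.

```php
<?php
// Hypothetical helper: remove every $open ... $close block, tags included,
// using only strpos() and substr_replace(). Nested tags are not handled.
function remove_between(string $text, string $open, string $close): string
{
    $offset = 0;
    while (($start = strpos($text, $open, $offset)) !== false) {
        $end = strpos($text, $close, $start + strlen($open));
        if ($end === false) {
            break; // unmatched opening tag: leave the rest untouched
        }
        $text = substr_replace($text, '', $start, $end + strlen($close) - $start);
        $offset = $start; // continue scanning from where the block was removed
    }
    return $text;
}

echo remove_between('<p>keep</p><p><em>drop this</em></p>', '<em>', '</em>');
// prints: <p>keep</p><p></p>
```

For real-world HTML with nesting or attributes, a DOM parser is usually a safer choice than string searching.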