PHP HTMLEntities for specific elements - php

I am working with htmlentities() to replace characters with safe HTML, namely the PHP tags, because I want to be able to show PHP examples. The problem I am running into is that all the tags are replaced (e.g. </p>, </br>, etc.). I know I can write a custom htmlentities() that only replaces the <? tags and other specific XML tags, but I was wondering: is there something in PHP that already does this?

I would use the highlight_string() function in PHP as it will escape the characters and syntax highlight the code.
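
As a minimal sketch of that suggestion (the $snippet value is a made-up example, not from the question), passing true as the second argument makes highlight_string() return the markup instead of printing it. The exact wrapper tags in the output vary between PHP versions, so treat the result as illustrative only:

```php
<?php
// Hypothetical snippet that should be displayed as code on a page.
$snippet = '<?php echo "<p>Hello</p>"; ?>';

// With the second argument set to true, highlight_string() returns the
// highlighted, escaped HTML instead of echoing it directly.
$html = highlight_string($snippet, true);

// The tags inside the snippet come back as entities, wrapped in the
// markup that carries the syntax colours.
echo $html, PHP_EOL;
```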

Related

PHP strings omit text because of PHP short tags wrap it

A simple code sample <?echo '<?this text is ignored?> this text is shown';?> writes just "this text is shown" and ignores the text inside the PHP tags, even though it is part of a string.
Unfortunately, I can't find any explanation in the specs, so how should I handle this? I know that we can escape the special symbols and then everything works, but what is the reason for this PHP behaviour?
PHP 5.3, local server.
This behaviour prevents me from reading lines from PHP files inside a zip archive via zip_entry_read() and then passing them to eval().
PHP is not ignoring the text inside your inner <? .. ?>, your browser is ignoring it.
Anything you put inside angle brackets is an HTML tag as far as your browser is concerned.
I'm not sure what you were expecting, but if you want the tags to show in the browser you have to replace the opening < with &lt;
If you actually wanted to execute the code inside the inner php tags you can just go ahead and remove the inner tags as they are redundant.
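
A small sketch of the escaping described in this answer, using the string from the question. Using htmlspecialchars() here is one way to do the replacement, not the only one; the variable names are illustrative:

```php
<?php
// The string from the question; to PHP the inner <? ... ?> is plain text.
$text = '<?this text is ignored?> this text is shown';

// htmlspecialchars() converts < and > to &lt; and &gt;, so the browser
// renders the characters instead of treating them as an unknown tag.
$safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $safe, PHP_EOL;
// prints: &lt;?this text is ignored?&gt; this text is shown
```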

PHP: htmlentities/strip_tags

I've been rewriting my website lately and added a syntax highlighter so that I can post code snippets. Before, all I did was htmlentities() the string so that it would be safe and not break anything, but now that I have to use a <pre> to highlight code, htmlentities() effectively removes the syntax highlighting from the page. I've been trying to come up with a function that will just perform an htmlentities() on anything between two tags (<entitiesparse> </entitiesparse>) but nothing seems to work. Does anyone know of a function that I can either:
a) Set it to htmlentities() everything but specific tags (like strip_tags())
OR
b) Only htmlentities() things in certain tags (As mentioned above)
You only need to apply htmlentities() to the raw content (the article text), and then invoke a function to add syntax highlighting after that. So long as you check that your syntax highlighting code cannot introduce unexpected nasties, you don't need to call htmlentities() again.
And if you're saying that you use the a element to highlight code, I strongly suggest you use the code element instead, which is designed to provide markup for lines or blocks of programming code. The a element should only be used as an anchor for a hyperlink.
For instance, you could use
<code class="highlighted-code">/* line of code here */</code>
Then you could use a cascading style sheet to provide background colour for any element of type code with class equal to "highlighted-code", for instance:
code.highlighted-code {background-color: yellow}
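
Option (b) from the question, escaping only what sits between the <entitiesparse> markers, could be sketched roughly as follows. The function name escapeMarkedSections is hypothetical, the sketch assumes PHP 7 or later and markers that are not nested, and the markers themselves are dropped from the output:

```php
<?php
// Hypothetical helper: escape only the text wrapped in the question's
// <entitiesparse> ... </entitiesparse> markers and leave the rest alone.
function escapeMarkedSections(string $html): string
{
    $result = preg_replace_callback(
        '~<entitiesparse>(.*?)</entitiesparse>~s', // non-greedy, spans newlines
        function (array $m): string {
            return htmlentities($m[1], ENT_QUOTES, 'UTF-8');
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back
    // to the untouched input in that case.
    return $result ?? $html;
}

$page = '<pre><entitiesparse><b>bold</b> & more</entitiesparse></pre>';
echo escapeMarkedSections($page), PHP_EOL;
// prints: <pre>&lt;b&gt;bold&lt;/b&gt; &amp; more</pre>
```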

Specify iframe Link using Regex

Problem:
I need to confirm that the iframe has a link in the following format:
http://www.example.com/embed/*****11 CHARACTERS MAX.****?rel=0
Starts with: http://www.example.com/embed/
Ends with: ?rel=0
11 CHARACTERS MAX. means that this spot can hold any characters, up to 11 of them and no more.
NOTE: none of the specified tags are ensured to be in every post. It depends on how user uses the editor.
I'm using PHP
I used the line below to make sure all tags are excluded except the ones specified:
$rtxt_offer = preg_replace('#<(?!/?(u|br|iframe)\b)[^>]+>#', '', $rtxt_offer);
You wrote you only want to validate the link value with a regular expression:
$doesMatch = preg_match('~^http://www\.example\.com/embed/[^?]{0,11}\?rel=0$~', $link);
This does specifically what you're asking for.
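
For illustration, the pattern can be wrapped in a small helper and tried against a few links. The helper name isAllowedEmbedUrl and the sample URLs are assumptions made for this sketch, and the dots in the host name are escaped so that they match literally:

```php
<?php
// Hypothetical helper around the check; returns true only for links of
// the form http://www.example.com/embed/<up to 11 chars>?rel=0
function isAllowedEmbedUrl(string $link): bool
{
    return preg_match('~^http://www\.example\.com/embed/[^?]{0,11}\?rel=0$~', $link) === 1;
}

var_dump(isAllowedEmbedUrl('http://www.example.com/embed/abcdefghijk?rel=0'));  // bool(true), 11 characters
var_dump(isAllowedEmbedUrl('http://www.example.com/embed/abcdefghijkl?rel=0')); // bool(false), 12 characters
var_dump(isAllowedEmbedUrl('http://www.example.com/embed/abc?rel=0&x=1'));      // bool(false), extra query data
```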
For removing tags, please see strip_tags or use an HTML parser to do it, which will also help you to get the link value more reliably.
In a similar question/answer I posted some example code showing how to use strip_tags and SimpleXMLElement together: Extract all the text and img tags from HTML in PHP.
First of all, there is a built-in function in PHP that strips tags for you: http://php.net/manual/en/function.strip-tags.php so there is no need to use a slow regex here.
Steps you'll need to solve your problem:
Parse this text as a DOMDocument
Get the iframe node from it
Get the src attribute from the iframe and parse it with parse_url
Now you can perform easy checks on all components returned by parse_url
Happy coding
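
The steps above could look roughly like this. It is a sketch under assumptions: the function name iframeSrcIsAllowed is made up, the DOM extension is assumed to be available, the input is assumed to be non-empty, and a post without any iframe is treated as acceptable because the question says no tag is guaranteed to be present:

```php
<?php
// Hypothetical helper: DOMDocument -> iframe nodes -> src -> parse_url().
function iframeSrcIsAllowed(string $html): bool
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // editor output is rarely well-formed HTML
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    if (!$loaded) {
        return false;
    }

    foreach ($doc->getElementsByTagName('iframe') as $iframe) {
        $parts = parse_url($iframe->getAttribute('src'));
        if ($parts === false) {
            return false; // seriously malformed URL
        }
        // Check each URL component separately.
        $ok = ($parts['scheme'] ?? '') === 'http'
            && ($parts['host'] ?? '') === 'www.example.com'
            && preg_match('~^/embed/[^/]{0,11}$~', $parts['path'] ?? '') === 1
            && ($parts['query'] ?? '') === 'rel=0';
        if (!$ok) {
            return false;
        }
    }

    return true; // every iframe (if any) passed the checks
}

var_dump(iframeSrcIsAllowed('<u>hi</u><iframe src="http://www.example.com/embed/abc123?rel=0"></iframe>'));
var_dump(iframeSrcIsAllowed('<iframe src="http://evil.example/embed/abc123?rel=0"></iframe>'));
```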

str_replace inline script code from html in php not working

I have an HTML page stored in a MySQL database. I get the HTML from the database and try to replace some of the inline JavaScript code in the HTML content. I tried using str_replace() but it does not replace the inline JavaScript code. I can replace other HTML content like divs, but not inline JavaScript code.
How can I find and replace the inline JavaScript code?
PHP should be seeing the entire HTML page as a big string, so in theory, it should be able to alter JS and HTML alike. Is it possible the string still has slashes, and your str_replace can't find the search criteria due to the slashes?
Try printing the entirety of the string to the screen to make sure, and if it does still have slashes, use a stripslashes($string) call to get rid of them.
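
To illustrate the slash theory, here is a small sketch in which addslashes() stands in for however the slashes may have ended up in the stored page (that is an assumption, the question does not say how the data was saved). The search string only matches once the slashes are removed:

```php
<?php
// Simulate a stored page whose quotes are still escaped with backslashes.
$stored = addslashes('<script>var msg = "hello";</script>');

// The search string has no backslashes, so nothing is replaced.
$attempt = str_replace('var msg = "hello";', 'var msg = "bye";', $stored, $countBefore);
var_dump($countBefore); // int(0)

// After stripslashes() the same search string is found and replaced.
$clean = stripslashes($stored);
$replaced = str_replace('var msg = "hello";', 'var msg = "bye";', $clean, $countAfter);
var_dump($countAfter);  // int(1)

echo $replaced, PHP_EOL; // prints: <script>var msg = "bye";</script>
```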
You probably want to use a DOM parser to handle your webpage as a DOM structure, not a serialised string of HTML (where things like string replacement and regular expressions can be troublesome).

htmlentities displaying html safely

I have data that is coming in from an RSS feed. I want to be safe and use htmlentities(), but if I use it and there is HTML code in the feed, the page ends up full of code mixed in with the content. I don't mind the formatting the RSS feed offers and would be glad to use it as long as I can display it safely. I'm after the content of the feed, but I also want it to be formatted decently (if there is a break tag, paragraph, or div). Does anyone know a way?
Do you want to protect from XSS in the feed? If so, you'll need an HTML sanitizer to run on the HTML prior to displaying it:
HTMLSanitizer
HTMLPurifier
If you just want to escape whatever is there, just call htmlspecialchars() on it. But any HTML will appear as escaped text...
You can use the strip_tags function and specify the allowed tags in there:
echo strip_tags($content, '<p><a>');
This way any tag not specified in allowed tags will be removed.
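
A quick sketch with made-up feed content shows what that call keeps and drops. Two things are worth noticing in the output: attributes on the allowed tags are left in place, and the text inside a removed tag stays behind, so this alone may not be enough against XSS:

```php
<?php
// Hypothetical feed content for illustration.
$content = '<p onclick="x()">Hi <b>there</b> <a href="/more">more</a></p><script>alert(1)</script>';

// Only <p> and <a> survive; other tags are removed but their text remains,
// and attributes on the allowed tags are not touched.
$filtered = strip_tags($content, '<p><a>');

echo $filtered, PHP_EOL;
// prints: <p onclick="x()">Hi there <a href="/more">more</a></p>alert(1)
```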
You can convert the HTML into Markdown and then back into HTML again using various libraries.
