str_replace on inline script code from HTML in PHP not working - php

I have an HTML page stored in a MySQL database. I fetch the HTML from the database and try to replace some of the inline JavaScript code in it. I tried using str_replace(), but it does not replace the inline JavaScript code. I can replace other HTML content, like divs, but not the inline JavaScript.
How can I find and replace the inline JavaScript code?

PHP sees the entire HTML page as one big string, so in theory it should be able to alter JS and HTML alike. Is it possible that the string still contains escaping slashes (for example \" instead of "), so your str_replace() can't find the search string?
Try printing the entire string to the screen to make sure, and if it does still have slashes, call stripslashes($string) to get rid of them.
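A minimal sketch of the slashes problem, assuming the stored HTML had its quotes escaped with backslashes. The sample HTML and the search/replace strings are hypothetical:

```php
<?php
// Hypothetical HTML as it might come back from the database, with quotes escaped as \"
$html = '<div id=\"box\"></div><script>var msg = \"old\";</script>';
$search = 'var msg = "old";';
$replace = 'var msg = "new";';

// No match: the stored string contains \" where the search text has "
$before = str_replace($search, $replace, $html);

// Remove the escaping backslashes first, then the replacement finds its target
$clean = stripslashes($html);
$after = str_replace($search, $replace, $clean);

echo $after, PHP_EOL; // <div id="box"></div><script>var msg = "new";</script>
```

Be aware that stripslashes() removes every backslash, including ones the script legitimately needs (such as the one in "\n" inside a JavaScript string), so only apply it when the data really was escaped.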

You probably want to use a DOM parser to handle your webpage as a DOM structure, not a serialised string of HTML (where things like string replacement and regular expressions can be troublesome).
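A minimal sketch of the DOM approach with PHP's built-in DOM extension (DOMDocument and DOMXPath). The sample HTML and the "old"/"new" values are hypothetical. The script body is swapped for a fresh text node created with createTextNode(), which stores the code literally:

```php
<?php
// Hypothetical page content as fetched from the database
$html = '<html><body><div>Hello</div><script>var msg = "old";</script></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings about imperfect HTML instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Inline scripts are <script> elements without a src attribute
foreach ($xpath->query('//script[not(@src)]') as $script) {
    $code = str_replace('"old"', '"new"', $script->textContent);
    // Replace the script's children with a single literal text node
    while ($script->firstChild !== null) {
        $script->removeChild($script->firstChild);
    }
    $script->appendChild($doc->createTextNode($code));
}

$result = $doc->saveHTML();
echo $result;
```

Working on the script element's text, rather than on the serialized page, means the search string only has to match the JavaScript itself, not whatever escaping or markup surrounds it.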

Related

safely load HTML from user into textarea

I'm using TinyMCE 4 on a project where I need to pre-populate the textarea with HTML that was submitted through POST (for server-side error handling, so the user doesn't lose all their work). I know that a textarea works mostly like a tag, in that the HTML inside it is not parsed into the DOM, so most sites show this demo:
<textarea name="demo"><?=$_POST['demo']?></textarea>
But what happens when a user submits HTML that includes an unmatched <textarea> or </textarea> tag?
Is there a standard way to manage this risk?
Use htmlspecialchars($_POST['demo']) in PHP when outputting the value.
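A short sketch of that answer, assuming the value arrives in $_POST['demo'] (simulated here with a hypothetical payload that tries to close the textarea early). Passing ENT_QUOTES and an explicit 'UTF-8' charset is a common choice, not something the answer specifies:

```php
<?php
// Simulated form submission containing a stray closing tag
$_POST['demo'] = '<p>Hi</p></textarea><script>alert(1)</script>';

// Escape <, >, &, " and ' so the browser treats the value as text, not markup
$escaped = htmlspecialchars($_POST['demo'], ENT_QUOTES, 'UTF-8');
$field = '<textarea name="demo">' . $escaped . '</textarea>';
echo $field;
```

The browser decodes the entities inside the textarea, so the user (and TinyMCE) still sees the original HTML, but the submitted </textarea> can no longer end the element.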
Remove only the <textarea> tags from the user input. Please see this post on using regular expressions; it explains how to remove only certain tags (unlike htmlentities(), which escapes every tag rather than removing it).
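Since the linked post isn't reproduced here, this is only an illustrative sketch of removing just the <textarea> tags with preg_replace(); the pattern and sample input are assumptions, not taken from that post. Filtering like this only stops the input from breaking out of the textarea; it is not general XSS protection:

```php
<?php
// Hypothetical user input with stray opening/closing textarea tags
$input = '<p>Text</p></TEXTAREA><textarea rows="2">x';

// Match <textarea ...> and </textarea> case-insensitively; leave all other tags alone
$cleaned = preg_replace('~</?textarea\b[^>]*>~i', '', $input);
echo $cleaned; // <p>Text</p>x
```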
Use the xmp tag instead of a textarea. It will display the HTML as-is.
E.g.: http://dadinck.x10.mx/xmp.html
The htmlentities() function converts every character that has an HTML entity equivalent (such as <) into that entity, so it displays correctly but won't break your HTML.
http://www.php.net/manual/en/function.htmlentities.php

Extract Table as Text Using PHP

I'm looking for a simple way to get the first table of a webpage and put the whole thing into a string; that's all.
So I need to know how to use preg_match() or something similar to get the first table from a DOM object and get the whole thing into a string.
I have a class that downloads webpages as DOM, but I can't convert the HTML to a string the way I need:
$nodes = $this->bot->QuerySelector($this->download['DOM'], "//table[1][@class='tyebfghjftsdf-ccfkk']");
Please help
I would use Tidy to convert the page to valid XHTML, then read it with XMLReader (without building a DOM), start echoing data when the opening tag is found, and stop at the closing tag. No regular expressions involved.
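A minimal sketch of the XMLReader part of that answer. It assumes the page has already been turned into well-formed XHTML (for example by Tidy, which is a separate extension and is not called here); the sample string is hypothetical:

```php
<?php
// Hypothetical page, assumed to be well-formed XHTML already (e.g. after Tidy)
$xhtml = '<html><body><p>Intro</p>'
    . '<table class="t"><tr><td>A</td><td>B</td></tr></table>'
    . '<table><tr><td>C</td></tr></table></body></html>';

$reader = new XMLReader();
$reader->XML($xhtml);

$firstTable = null;
// Stream through the document without building a DOM tree
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'table') {
        // Serialize the whole first <table> element, then stop reading
        $firstTable = $reader->readOuterXml();
        break;
    }
}
$reader->close();

echo $firstTable;
```

If you already hold a DOMDocument, $doc->saveHTML($node) returns the markup of a single node, which may be simpler. Note also that the XPath //table[1] selects every table that is the first table among its siblings, while (//table)[1] selects only the first table in the document.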

Ignore html tags when loading page

Is there a simple PHP script that ignores HTML content from a database instead of loading it?
For example: don't load images, anchors, or elements with class=""...
Best Regards
Use the strip_tags('your content here') function. It removes all HTML tags from your content and gives plain-text output.
http://php.net/manual/en/function.strip-tags.php
I think you're looking for strip_tags(). It removes HTML tags from text. You can also specify a list of tags to keep.
That said, extracting the text content from an HTML page generally requires a more complex approach.
You can use strip_tags():
This function tries to return a string with all NUL bytes, HTML and
PHP tags stripped from a given str. It uses the same tag stripping
state machine as the fgetss() function.
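A short illustration of strip_tags() with and without an allowed-tags list; the sample HTML is hypothetical. It removes the tags but keeps the text inside them (the link text stays), and it cannot filter by attribute such as class=""; that needs a DOM parser:

```php
<?php
$html = '<p>Hello <a href="/x">link</a> <img src="a.png" alt=""><b>bold</b></p>';

// Remove every tag; only the text content remains
$plain = strip_tags($html);
echo $plain, PHP_EOL;   // Hello link bold

// Keep <p> and <b>; <a> and <img> are removed
$limited = strip_tags($html, '<p><b>');
echo $limited, PHP_EOL; // <p>Hello link <b>bold</b></p>
```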

htmlentities displaying html safely

I have data coming in from an RSS feed. I want to be safe and use htmlentities(), but if I do and the feed contains HTML, the page fills up with raw markup mixed into the content. I don't mind the formatting the feed provides and would be glad to keep it, as long as I can display it safely. I'm after the content of the feed, but I also want it to be formatted decently (if there is a break tag, paragraph, or div). Does anyone know a way?
Do you want to protect against XSS in the feed? If so, you'll need to run an HTML sanitizer on the HTML before displaying it:
HTMLSanitizer
HTMLPurifier
If you just want to escape whatever is there, call htmlspecialchars() on it. But then any HTML will appear as escaped text.
You can use the strip_tags() function and specify the allowed tags:
echo strip_tags($content, '<p><a>');
This way any tag not specified in allowed tags will be removed.
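One caveat worth illustrating, since the question is about displaying feed content safely: strip_tags() keeps the attributes of allowed tags, so event handlers survive, and it removes only the tags of disallowed elements, not their text. The sample input is hypothetical. This is why the sanitizer suggestion above is the safer route when the feed is untrusted.

```php
<?php
// Hypothetical feed item containing an event-handler attribute and a script
$content = '<p onclick="steal()">News</p><script>bad()</script>';

$filtered = strip_tags($content, '<p><a>');
echo $filtered; // <p onclick="steal()">News</p>bad()
```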
You can convert the HTML to Markdown and then back to HTML using various libraries.

Is there a token for capturing line breaks in multiline regex?

I've run into this problem several times before when doing some HTML scraping with PHP and the preg_* functions.
Most of the time I have to capture structures like this:
<!-- comment -->
<tag1>lorem ipsum</tag1>
<p>just more text with several html tags in it, sometimes CDATA encapsulated…</p>
<!-- /comment -->
In particular I want something like this:
/<tag1>(.*?)<\/tag1>\n\n<p>(.*?)<\/p>/mi
but the \n\n doesn't look like it would work.
Is there a general line-break switch?
I think you could replace the \n\n with (\r?\n){2}; that way you match a CRLF pair as well as a lone LF.
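A minimal sketch comparing that suggestion with PCRE's \R escape, which matches any newline sequence (\r\n, \n, \r and a few others) and is the closest thing to a general line-break token. The sample uses CRLF line endings and is hypothetical. Two further details matter: the group is written as non-capturing (?:\r?\n) so it doesn't shift the capture numbers, and it is the s modifier (not m) that lets . match line breaks inside the <p>; m only changes how ^ and $ behave.

```php
<?php
// Hypothetical scraped fragment with Windows (CRLF) line endings
$html = "<!-- comment -->\r\n<tag1>lorem ipsum</tag1>\r\n\r\n<p>just more\r\ntext</p>\r\n<!-- /comment -->";

// Two line breaks, each either CRLF or LF
$crlfPattern = '/<tag1>(.*?)<\/tag1>(?:\r?\n){2}<p>(.*?)<\/p>/si';
// \R matches any newline sequence, so \R{2} means "two line breaks"
$anyNewlinePattern = '/<tag1>(.*?)<\/tag1>\R{2}<p>(.*?)<\/p>/si';

preg_match($crlfPattern, $html, $m1);
preg_match($anyNewlinePattern, $html, $m2);

var_dump($m1[1], $m2[2]);
```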
Are you sure you want to parse HTML using regexps? HTML isn't a regular language, and there are too many corner cases.
I would investigate some form of HTML parser (perhaps this one?), and then identify the pattern you're interested in via the returned HTML data structure.
Or you could look at PHP's DOM extension. It can load HTML from a string (DOMDocument::loadHTML()) or from a file (DOMDocument::loadHTMLFile()). You can then use the PHP DOM methods to traverse the document and find the data you're interested in.
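A minimal sketch of the DOM extension approach for the structure in the question. It assumes a real tag (h1 here, hypothetical) stands in for <tag1>; because the code walks sibling nodes instead of matching text, the line breaks between elements no longer matter:

```php
<?php
// Hypothetical fragment; <h1> stands in for the question's <tag1>
$html = "<!-- comment -->\r\n<h1>lorem ipsum</h1>\r\n\r\n<p>just more text</p>\r\n<!-- /comment -->";

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$pairs = [];
foreach ($doc->getElementsByTagName('h1') as $heading) {
    // Walk forward past whitespace text nodes and comments to the next element
    $next = $heading->nextSibling;
    while ($next !== null && $next->nodeType !== XML_ELEMENT_NODE) {
        $next = $next->nextSibling;
    }
    // Keep the heading only if the next element is a <p>
    if ($next !== null && $next->nodeName === 'p') {
        $pairs[] = [trim($heading->textContent), trim($next->textContent)];
    }
}

print_r($pairs);
```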
