One regex instead of two? - php

I'm trying to strip PHP code out of a file using regex. Some of the PHP is not well formatted, so there may be extra spaces and/or line breaks. For example:
<?php require_once('some_sort_of_file.php');
?>
I've come up with the following regex which seems to work:
$initial_text = preg_replace('/\s+/', ' ', $initial_text );
$initial_text = preg_replace('/' . preg_quote('<?php') . '.*?' . preg_quote('?>') . '/', '', $initial_text);
but I was wondering whether there is a way to do it with a single regex, to speed things up.
Thanks!

An even better way to do it: use the built-in tokenizer. Regexes have trouble parsing non-regular languages like PHP. The tokenizer, on the other hand, parses PHP code exactly the way PHP itself does.
Sample code:
// some dummy code to play with
$myhtml = '<html>
<body>foo bar
<?php echo "hello world"; ?>
baz
</body>
</html>';
// Our own little function to do the heavy lifting
function strip_php($text) {
    $html = '';
    // break the code into tokens, exactly as the PHP parser sees them
    foreach (token_get_all($text) as $token) {
        // Single-character tokens (e.g. ';') are plain strings; the rest are
        // arrays of [token id, text, line]. Keep only the inline HTML.
        if (is_array($token) && $token[0] === T_INLINE_HTML) {
            $html .= $token[1];
        }
    }
    return $html;
}
echo strip_php($myhtml);
Output:
<html>
<body>foo bar
baz
</body>
</html>
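To see why the tokenizer is more robust than a regex, here is a small sketch on a made-up template whose PHP contains ?> inside a string literal. A lazy regex such as <\?php.*?\?> stops at that first ?>, while token_get_all() knows the quotes belong to a string token and only ends the block at the real closing tag.

```php
<?php
// Made-up input: the PHP block contains "?>" inside a string literal.
$source = "<p>a</p><?php echo '?>'; ?><p>b</p>";
// Regex approach: the lazy .*? stops at the first "?>", the one inside the quotes.
$byRegex = preg_replace('~<\?php.*?\?>~si', '', $source);
// Tokenizer approach: keep only the T_INLINE_HTML tokens.
$byTokens = '';
foreach (token_get_all($source) as $token) {
    if (is_array($token) && $token[0] === T_INLINE_HTML) {
        $byTokens .= $token[1];
    }
}
echo $byRegex, "\n", $byTokens, "\n";
```

The regex version leaves the fragment '; ?> behind, whereas the tokenizer version returns just the two paragraphs.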

You can do it with a single regex by using the s modifier, which lets the dot match newline characters too. I also added the i modifier to make it case-insensitive, in case you care about that:
$initial_text = preg_replace('~<\?php.*?\?>~si', '', $initial_text );
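A quick sketch of what the s modifier changes, on a made-up input. The last pattern goes beyond the answer and assumes you also want to drop a trailing <?php block that is never closed (common at the end of pure-PHP files). It accepts either ?> or the end of the string (\z) as the end of the block.

```php
<?php
$text = "<p>before</p>\n<?php require_once('some_sort_of_file.php');\n?>\n<p>after</p>";
// Without /s the dot cannot cross "\n", so the multi-line block is left untouched.
$withoutS = preg_replace('~<\?php.*?\?>~i', '', $text);
// With /s the lazy .*? can span line breaks, so the whole block is removed.
$withS = preg_replace('~<\?php.*?\?>~si', '', $text);
// Assumption: an unclosed "<?php" block runs to the end of the input (\z).
$withEof = preg_replace('~<\?php.*?(?:\?>|\z)~si', '', "<p>x</p><?php echo 1;");
echo $withS, "\n", $withEof, "\n";
```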

Related

Keep all html whitespaces in php mysql

I want to know how to keep all the whitespace from a textarea in PHP (to send it to the database), and then echo it back later. I want to do it the way Stack Overflow does for code; which is the best approach?
For now I'm using this:
$text = str_replace(' ', '&nbsp;', $text);
It preserves the spaces, but I haven't tested it together with mysql_real_escape_string and other injection-prevention methods.
To clarify, I later want to echo something like this from the database:
function jack(){
    var x = "blablabla";
}
Thanks for your time.
Code Blocks
If you're just trying to recreate code blocks like:
function test($param){
    return TRUE;
}
then you should be using <pre></pre> tags in your HTML:
<pre>
function test($param){
    return TRUE;
}
</pre>
Plain HTML collapses any run of spaces, newlines and tabs into a single space; inside <pre> tags, whitespace is shown as-is.
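Along the same lines, here is a minimal sketch that assumes the text is stored unchanged and only escaped when it is printed. render_code_block is a hypothetical helper name. htmlspecialchars() keeps characters such as <, & and quotes from being interpreted as markup, and <pre> preserves the whitespace, so no &nbsp; replacement is needed.

```php
<?php
// Hypothetical helper: wrap stored code in <pre> so whitespace survives,
// and escape it so characters like < and & are displayed rather than parsed.
function render_code_block(string $code): string {
    return '<pre>' . htmlspecialchars($code, ENT_QUOTES, 'UTF-8') . '</pre>';
}
$code = "function jack(){\n    var x = \"blablabla\";\n}";
echo render_code_block($code);
```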
At the moment your html will look something like this:
function test($param){
return TRUE;
}
Which I would suggest isn't desirable...
Escaping
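When you use mysql_real_escape_string, newlines in the escaped string become the literal text \n or \r\n. The database itself stores real newlines, but if you echo the escaped string (or escape it twice before saving), your code would output something like: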
function test($param){\n return TRUE;\n}
OR
<pre>function test($param){\n return TRUE;\n}</pre>
To get around this, you have to replace the literal \n or \r\n sequences with real newline characters.
Assuming that you're going to use pre tags:
echo preg_replace('#(\\\r\\\n|\\\n)#', "\n", $escapedString);
If you want to use HTML line breaks instead, you'd replace "\n" with <br />. In that case you'd also want to replace space characters with &nbsp;. I suggest using the pre tags.
Try this; it works well:
$string = nl2br(str_replace(" ", "&nbsp;", $string));
echo $string;
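As a worked example of the preg_replace() call above, here is a sketch on a made-up string (unescape_newlines is a hypothetical name). The single-quoted PHP string below contains the literal characters backslash, r, backslash, n, and the pattern turns them back into real newlines.

```php
<?php
// Illustration only: literal "\r\n" sequences (backslash + r + backslash + n).
$escapedString = 'function test($param){\r\n    return TRUE;\r\n}';
// Same pattern as above: literal \r\n or \n becomes a real newline character.
function unescape_newlines(string $s): string {
    return preg_replace('#(\\\r\\\n|\\\n)#', "\n", $s);
}
echo '<pre>' . unescape_newlines($escapedString) . '</pre>';
```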

Finding and replacing attributes using preg_replace

I am trying to redo some forms that have uppercase field names with spaces; there are hundreds of fields and 50+ forms, so I decided to write a PHP script that parses the HTML of each form.
I have a textarea that I post the HTML into, and I want to change all the field names from
name="Here is a form field name"
to
name="here_is_a_form_field_name"
How can I, in one command, change every name attribute so it is lowercase with spaces replaced by underscores?
I assume preg_replace with a regular expression?
Thanks!
I would suggest not using regex to manipulate HTML. I would use DOMDocument instead, something like the following:
$dom = new DOMDocument();
$dom->loadHTMLFile('filename.html');
// loop over every kind of form field that can carry a name
foreach (array('input', 'select', 'textarea') as $tag) {
    foreach ($dom->getElementsByTagName($tag) as $item) {
        // skip fields without a name, or setAttribute() would add an empty one
        if (!$item->hasAttribute('name')) continue;
        // new value: spaces replaced with underscores, then lowercased
        $newval = str_replace(' ', '_', $item->getAttribute('name'));
        $newval = strtolower($newval);
        // change attribute
        $item->setAttribute('name', $newval);
    }
}
// saveHTML() returns the document as a string, so output (or store) it
echo $dom->saveHTML();
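If the markup arrives as a string from the textarea rather than as a file, a sketch along these lines may fit better. It uses DOMDocument::loadHTML() and a DOMXPath query that selects only input, select and textarea elements that actually have a name attribute. normalize_names_dom is a hypothetical name.

```php
<?php
// Sketch: normalise field names in markup pasted into a textarea.
// normalize_names_dom() is a hypothetical helper name.
function normalize_names_dom(string $html): string {
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // don't emit warnings for sloppy markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);
    // only form fields that actually carry a name attribute
    $fields = $xpath->query('//input[@name] | //select[@name] | //textarea[@name]');
    foreach ($fields as $field) {
        $name = strtolower(str_replace(' ', '_', $field->getAttribute('name')));
        $field->setAttribute('name', $name);
    }
    // saveHTML() returns the whole document, including the <html>/<body>
    // wrappers that loadHTML() adds around a fragment
    return $dom->saveHTML();
}
$out = normalize_names_dom('<form><input type="text" name="Here is a form field name"><textarea name="Your Comments"></textarea></form>');
echo $out;
```

Because loadHTML() wraps a fragment in a full document, you may need to cut the form back out of the result before pasting it into your templates.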
An alternative would be a library such as Simple HTML DOM Parser; its site has some good examples.
I agree that preg_replace(), or rather preg_replace_callback(), is the right tool for the job. Here's an example of how to use it for your task:
$file_contents = preg_replace_callback('/ name="([^"]*)"/', function ($matches) {
    // lowercase only the captured value and replace its spaces with underscores
    return ' name="' . str_replace(' ', '_', strtolower($matches[1])) . '"';
}, $file_contents);
You should, however, check the results afterwards using a diff tool and fine-tune the pattern if necessary.
I'd recommend against a DOM parser because they often choke on invalid HTML, or on files that contain, for example, tags for templating engines.
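A slightly more tolerant sketch of the same callback idea. It assumes some attributes may be single-quoted or written with spaces around the =, and that name is preceded by whitespace (so data-name is left alone). normalize_field_names is a hypothetical name. The backreference \2 makes the closing quote match the opening one.

```php
<?php
// Sketch: normalise name="..." and name='...' attributes in an HTML string.
// Assumption: attribute values never contain their own delimiting quote.
function normalize_field_names(string $html): string {
    return preg_replace_callback(
        '/(\sname\s*=\s*)(["\'])(.*?)\2/i',
        function (array $m): string {
            // $m[1] = ' name=', $m[2] = quote character, $m[3] = attribute value
            $value = str_replace(' ', '_', strtolower($m[3]));
            return $m[1] . $m[2] . $value . $m[2];
        },
        $html
    );
}
$form = '<input type="text" name="Here is a form field name">'
      . "<select name='Favourite Colour'></select>";
echo normalize_field_names($form);
```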
This is your solution:
<?php
$nameStr = "Here is a form field name";
// str_replace() already replaces every space in one pass, so no loop is needed
$nameStr = strtolower(str_replace(' ', '_', $nameStr));
echo $nameStr; // here_is_a_form_field_name
?>

PHP using prefix tags to linkify text

I'm writing a code library for my own personal use and want a way to linkify URLs and mail addresses. I was originally going to use a regex to turn URLs and mail addresses into links, but I was worried about covering all the bases. So my current thinking is to use some kind of tag system like this:
l:www.google.com becomes http://www.google.com, and m:john.doe#domain.com becomes john.doe#domain.com.
What do you think of this solution, and can you help with the expression? (Regex is not my strong point.) Any help would be appreciated.
Maybe a regex like this:
$content = "l:www.google.com some text m:john.doe#domain.com some text";
// one letter, then ':', then everything up to the next whitespace character
$pattern = '/([a-z]):(\S+)/';
if (preg_match_all($pattern, $content, $results)) {
    foreach ($results[0] as $key => $result) {
        // $result is the whole matched expression, e.g. 'l:www.google.com'
        $letter = $results[1][$key];
        $target = $results[2][$key];
        echo $letter . ' ' . $target . '<br/>';
        // You can build the link and str_replace() it into $content here
    }
}
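The code above only finds the tags. As a sketch of the replacement step, preg_replace_callback() can build the links directly. It assumes the l:/m: convention from the question, that a tag ends at the next whitespace, that http:// is an acceptable default scheme, and that m: is followed by a real @ address. linkify_tags is a hypothetical name.

```php
<?php
// Sketch: turn the question's l:/m: prefixes into links (hypothetical helper).
function linkify_tags(string $text): string {
    return preg_replace_callback('/\b([lm]):(\S+)/', function (array $m): string {
        $target = $m[2];
        if ($m[1] === 'l') {
            // assumption: add http:// when no scheme is given
            $href = preg_match('#^https?://#i', $target) ? $target : 'http://' . $target;
        } else {
            $href = 'mailto:' . $target;
        }
        // escape both the URL and the visible text before putting them in HTML
        return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
             . htmlspecialchars($target, ENT_QUOTES, 'UTF-8') . '</a>';
    }, $text);
}
echo linkify_tags('see l:www.google.com or write to m:john.doe@domain.com');
```

With this pattern, trailing punctuation such as a full stop right after a URL ends up inside the link, so you may want to trim it in the callback.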

my php function strip_tags is not working according to my expectations

I accept comments on my website, and I want to allow a few HTML tags, like
<h2>, <h3>, and so on,
and ban others.
But I am also using a function that finds certain parts of the string and replaces them with smileys,
say '<3' for heart and ':D' for lol.
I sanitize with the following sanitizeHTML() function:
public function sanitizeHTML($inputHTML, $allowed_tags = array('<h2>', '<h3>', '<p>', '<br>', '<b>', '<i>', '<a>', '<ul>', '<li>', '<blockquote>', '<span>', '<code>', '<img>')) {
$_allowed_tags = implode('', $allowed_tags);
$inputHTML = strip_tags($inputHTML, $_allowed_tags);
return preg_replace('#<(.*?)>#ise', "'<' . $this->removeBadAttributes('\${1}1') . '>'", $inputHTML);
}
function removeBadAttributes($inputHTML) {
$bad_attributes = 'onerror|onmousemove|onmouseout|onmouseover|' . 'onkeypress|onkeydown|onkeyup|javascript:';
return stripslashes(preg_replace("#($bad_attributes)(\s*)(?==)#is", 'SANITIZED ', $inputHTML));
}
It removes bad attributes and allows only valid tags, but when a string like <3 (for heart) comes in, this function removes everything after <3.
Note:
Smiley codes that don't contain the HTML special characters < or > work fine.
You're using PCRE to parse html, which is never a good idea. The expression <(.*?)> will match everything from < up to the next >. You need something more like <[^>]+>. However, that still has problems (and will capture <3). You could use a negative lookahead (<(?!3)[^>]+>) to handle that specific case, but there are a lot of other cases to consider. You may want to consider using a DOM parser instead.
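Here is a small sketch of that point on a made-up comment string. It shows the question's lazy pattern swallowing "<3 this", the suggested lookahead skipping it, and the same per-tag attribute replacement done with preg_replace_callback(), because the question's /e modifier was removed in PHP 7. neutralise_attributes is a hypothetical name. The attribute list is the question's, and this is not a complete sanitiser. It only covers the regex step: strip_tags() itself may still treat "<3" as the start of a tag, so the smiley may need converting (for example to &lt;3) before strip_tags() runs.

```php
<?php
$input = 'I <3 this <b onmouseover="x()">post</b>';
// The question's pattern: the lazy match runs from "<3" to the next ">".
preg_match('/<(.*?)>/s', $input, $lazy);
// The suggested lookahead: a "<" directly followed by "3" is not a tag start.
preg_match_all('/<(?!3)[^>]+>/', $input, $tags);
// Replacement for the removed /e modifier: run each tag through a callback.
function neutralise_attributes(string $html): string {
    $bad = 'onerror|onmousemove|onmouseout|onmouseover|onkeypress|onkeydown|onkeyup|javascript:';
    return preg_replace_callback('/<(?!3)[^>]+>/', function (array $m) use ($bad): string {
        // same attribute pattern as the question's removeBadAttributes()
        return preg_replace("#($bad)(\s*)(?==)#i", 'SANITIZED ', $m[0]);
    }, $html);
}
echo neutralise_attributes($input);
```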

Replacing Placeholder text containing a variable

Hi, I have placeholder text like this in my content from the CMS:
$content = "blah blah blah.. yadda yadda, listen to this:
{mediafile file=audiofile7.mp3}
and whilst your here , check this: {mediafile file=audiofile24.mp3}";
and I need to replace the placeholders with some HTML that displays the SWF object to play the MP3.
How do I do a replace that gets the filename out of each placeholder?
I think the regex pattern is {mediafile file=[A-Za-z0-9_]}, but how do I apply that to the whole variable containing the markers?
Thanks very much to anyone that can help,
Will
Here is a quick example, using preg_replace_callback, to show how it works.
If $content is declared this way:
$content = "blah blah blah.. {mediafile file=img.jpg}yadda yadda, listen to this:
{mediafile file=audiofile7.mp3}
and whilst your here , check this: {mediafile file=audiofile24.mp3}";
You can replace the placeholders with something like this:
$new_content = preg_replace_callback('/\{mediafile(.*?)\}/', 'my_callback', $content);
var_dump($new_content);
And the callback function might look like this:
function my_callback($matches) {
    $file_full = trim($matches[1]);
    var_dump($file_full); // string 'file=audiofile7.mp3' (length=19)
                          // or string 'file=audiofile24.mp3' (length=20)
    $file = str_replace('file=', '', $file_full);
    var_dump($file); // audiofile7.mp3 or audiofile24.mp3
    if (substr($file, -4) == '.mp3') {
        return '<SWF TAG FOR #' . htmlspecialchars($file) . '#>';
    } else if (substr($file, -4) == '.jpg') {
        return '<img src="' . htmlspecialchars($file) . '" />';
    }
    // unknown file type: leave the placeholder as it was
    return $matches[0];
}
Here, the last var_dump gives you:
string 'blah blah blah.. <img src="img.jpg" />yadda yadda, listen to this:
<SWF TAG FOR #audiofile7.mp3#>
and whilst your here , check this: <SWF TAG FOR #audiofile24.mp3#>' (length=164)
Hope this helps :-)
Don't forget to add checks and all that, of course! Your callback function will most certainly become a bit more complicated, but this should give you an idea of what is possible.
BTW: you might want to use create_function to create an anonymous function... but I don't like that: you've got to escape stuff, there is no syntax highlighting in the IDE, ... It's hell with a big/complex function. (Since PHP 5.3 a closure, function ($matches) { ... }, avoids those problems, and create_function was removed in PHP 8.0.)
I originally thought you could use a function involving json_decode, but json_decode only handles strings that are wrapped in quotes. So if your placeholders were written as valid JSON, for example:
{"mediafile" : "blahblah.mp3"}
you could change my sample code from explode("=", $song) to json_decode($song, true) and have a nice keyed array to work with.
Either way, I went with the strtok function to find the placeholders, and then a basic string replace to turn each found placeholder into HTML (the HTML here is just gibberish).
strtok does not use regular expressions, so this is not only simpler but also avoids a call into the preg library.
One last thing: if you do go with JSON syntax, you will have to re-wrap each placeholder in {} before decoding, because strtok removes the delimiter characters it splits on.
<?php
$content = "blah blah blah.. yadda yadda, listen to this:
{mediafile file=audiofile7.mp3}
and whilst your here , check this: {mediafile file=audiofile24.mp3}";
function song2html($song) {
    $song_info = explode("=", $song);
    $song_url = $song_info[1];
    $song_html = "<object src=\"$song_url\" blahblahblah>blah</object>";
    return $song_html;
}
// collect the placeholder bodies; start with an empty array in case none are found
$songs = array();
$tok = strtok($content, "{}");
while ($tok !== false) {
    if (strpos($tok, "mediafile") !== false) {
        $songs[] = $tok;
    }
    $tok = strtok("{}");
}
foreach ($songs as $asong) {
    // strtok dropped the braces, so include them when replacing
    $content = str_replace("{" . $asong . "}", song2html($asong), $content);
}
echo $content;
?>
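The JSON variant described above can be sketched like this. It assumes placeholders such as {"mediafile" : "audiofile7.mp3"}, which is a hypothetical format and not something the CMS produces. Each strtok token is re-wrapped in braces and passed to json_decode(). Ordinary text simply fails to decode, and json_decode() returns null for it.

```php
<?php
// Sketch of the JSON variant; the placeholder format is hypothetical.
$content = 'listen to this: {"mediafile" : "audiofile7.mp3"} and this: {"mediafile" : "audiofile24.mp3"}';
$found = [];
$tok = strtok($content, '{}');
while ($tok !== false) {
    // strtok() dropped the braces, so put them back before decoding
    $data = json_decode('{' . $tok . '}', true);
    if (is_array($data) && isset($data['mediafile'])) {
        $found['{' . $tok . '}'] = $data['mediafile'];
    }
    $tok = strtok('{}');
}
foreach ($found as $placeholder => $file) {
    $html = '<object data="' . htmlspecialchars($file) . '"></object>';
    $content = str_replace($placeholder, $html, $content);
}
echo $content;
```

Like the explode() version, this breaks if a placeholder value itself contains { or }, because strtok splits on those characters.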
Read the regex docs carefully.
Your pattern looks a little off: [A-Za-z0-9_] has no quantifier, so it matches only one character, and it doesn't allow for ".". {mediafile file=([^}]+)} might be more like what you're looking for.
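A quick check of both patterns on a shortened, made-up version of the question's content. The suggested pattern captures full filenames, while the question's pattern finds nothing.

```php
<?php
$content = "listen to this:\n{mediafile file=audiofile7.mp3}\nand check this: {mediafile file=audiofile24.mp3}";
// [^}]+ takes everything up to the closing brace, dots included
preg_match_all('/\{mediafile file=([^}]+)\}/', $content, $m);
print_r($m[1]);
// The question's pattern: [A-Za-z0-9_] has no quantifier, so it can only
// match a single character before "}", and it never matches "."
var_dump(preg_match('/\{mediafile file=[A-Za-z0-9_]\}/', $content));
```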
You could do something like this:
$content = preg_replace_callback(
    '|{mediafile file=([A-Za-z0-9_.]+)}|',
    // anonymous function (PHP 5.3+); with the older create_function() the code
    // string had to be single-quoted, or every $ escaped as \$, and
    // create_function() no longer exists in PHP 8
    function ($matches) {
        return "<embed etc ... " . $matches[1] . "more tags";
    },
    $content
);
You can check the preg_replace_callback page in the manual. A plain preg_replace would also work but might get messy.
