I have the following line of php code in some of my pages
<?php include("contactform.php"); ?>
I have a crude CMS where I swap lines of code for user-manageable tags. My hope is to convert this line of code into [contact] so that people can add or remove it at their leisure. This is how far I've got:
$file = preg_replace('#<?php include("contactform.php"); ?>#i', "[contact]", $file);
$file looks something like this:
<h1 class="title">Title</h1>
<p>Text</p>
<?php include("contactform.php"); ?>
So the PHP code has not been stripped out by the server, as we are editing the file and not viewing it.
I'm pretty new to PHP, so I guess I'm being really stupid. Is there a way to do this?
If you want to do 1:1 string replacements, then use the simpler str_replace:
$file = str_replace('<?php include("contactform.php"); ?>', "[contact]", $file);
With a preg_replace you need to escape meta characters like ? and ( with backslashes:
$file = preg_replace('#<\?php include\("contactform.php"\); \?>#i', "[contact]", $file);
A regex only provides an advantage if you want to make the match more tolerant, for example of varying whitespace. Use \s+ instead of literal spaces in that case.
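As a rough sketch of the whitespace-tolerant variant (the sample content in $file is made up for illustration, and the exact places where whitespace is allowed are an assumption):

```php
<?php
// Hypothetical sample content with irregular spacing in the include line.
$file = "<p>Text</p>\n<?php   include( \"contactform.php\" ) ;  ?>";

// \s+ requires at least one whitespace character, \s* allows none.
// ?, ( and ) are escaped because they are regex metacharacters.
$pattern = '#<\?php\s+include\s*\(\s*"contactform\.php"\s*\)\s*;\s*\?>#i';

$file = preg_replace($pattern, '[contact]', $file);

echo $file;
```

The same pattern still matches the original, normally spaced line, so it can stand in for the stricter version.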
I want to be able to remove all non php data from a string / file.
Now this preg_replace line works perfectly:
preg_replace('/\?>.*\<?/', '', $src); // Remove all non php data
BUT... problem is that it works only for the first match and not for all of the string/file...
Small tweak needed here ;)
It would be simpler the other way round:
preg_match_all('~<\?.+?\?>~s', $src, $m);
$php = implode('', $m[0]);
Matching non-php blocks is much trickier, because they can also occur before the first php block and after the last one: blah <? php ?> blah.
Also note that no regex solution can handle a ?> inside a PHP string, as in:
<? echo "hi ?>"; ?>
You have to use the tokenizer to parse this correctly.
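A minimal sketch of the tokenizer approach, assuming the tokenizer extension is available and that the input uses full <?php tags (short <? tags are only tokenized as PHP when short_open_tag is enabled). The function name keepOnlyPhp is made up for this example:

```php
<?php
// Keep only the PHP parts of a mixed PHP/HTML string.
// keepOnlyPhp is a hypothetical helper name, not a built-in.
function keepOnlyPhp(string $src): string
{
    $out = '';
    foreach (token_get_all($src) as $token) {
        if (is_array($token)) {
            // T_INLINE_HTML is everything outside <?php ... ?>
            if ($token[0] === T_INLINE_HTML) {
                continue;
            }
            $out .= $token[1];
        } else {
            // Single-character tokens such as ; or = arrive as plain strings.
            $out .= $token;
        }
    }
    return $out;
}

$src = 'blah <?php echo "hi ?>"; ?> blah';
echo keepOnlyPhp($src);
```

Because the tokenizer understands string literals, the ?> inside "hi ?>" does not end the PHP block.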
I have thousands of PHP pages which have a header and footer included using php, like
<?php include("header.php"); ?> //STATIC CONTENT HERE <?php include("footer.php"); ?>
I want to implement automatic keyword linking for certain keywords in the static text, but I can only add PHP code to my header or footer files.
This can be a complex operation. The steps:
Get all the files whose words you want to replace (glob)
Specify an array (or arrays) for the "find" and "replace" criteria
Iterate over the files returned by glob replacing the text as you go (preg_replace)
Write the new text (file_put_contents)
This sample code replaces all words in the $words array with a link to http://www.wordlink.com/<yourword>. If you need a different link for each word you'll need to specify $replace as an array using $1 where you want the searched word to appear in the replacement (and change $replace in the regex to $replace[$i]).
Also, the glob function below looks for all html files in the specified $filesDir directory. If you need something different you're going to have to edit the glob path yourself. Finally, the regular expression only replaces whole words: if you replace the word super, the word superman will not have super replaced in the middle.
Oh, and the replace is NOT case sensitive, as per the i modifier at the end of the pattern.
// specify where your static html files live
$filesDir = '/path/to/my/html/files/';
// specify where to save your updated files
$newDir = '/location/of/new/files/';
// get an array of all the static html files in $filesDir (glob returns full paths)
$fileList = glob($filesDir . '*.html');
$words = array('super', 'awesome');
// $1 is the matched word; the link target follows the description above
$replace = '<a href="http://www.wordlink.com/$1">$1</a>';
// iterate over the html files
foreach ($fileList as $filePath) {
    $newPath = $newDir . basename($filePath);
    $html = file_get_contents($filePath);
    // replace each whole word in turn; $i is the index into $words
    foreach ($words as $i => $word) {
        $pattern = '#\b(' . preg_quote($word, '#') . ')\b#i';
        $html = preg_replace($pattern, $replace, $html);
    }
    file_put_contents($newPath, $html);
    echo "$newPath file written\n";
}
Obviously, you need write-access to the new folder location. I would not recommend overwriting your original files. Translation:
Always back up before doing anything crazy.
P.S. the regexes are not UTF-8 safe, so if you're dealing with international characters you'll need to edit the regex pattern as well.
P.P.S. I'm really being kind here because SO is not a code-for-free site. Don't even think about commenting something like "it doesn't work" when I try it :) If it doesn't fit your specifications, feel free to peruse the php manual for the functions involved.
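For the per-word link case mentioned above, one possible variation is a single pass with preg_replace_callback instead of one preg_replace per word. This is only a sketch: the function name linkKeywords, the $links map and its URLs are hypothetical, the map keys are assumed to be lowercase ASCII, and like the original it will also match words inside tags or attributes.

```php
<?php
// Link each keyword to its own URL in one pass.
// linkKeywords and the $links map are illustrative assumptions.
function linkKeywords(string $html, array $links): string
{
    // Quote each keyword so it is safe inside a #-delimited pattern.
    $alternatives = array_map(function ($word) {
        return preg_quote($word, '#');
    }, array_keys($links));

    // i = case-insensitive, u = treat pattern and subject as UTF-8.
    $pattern = '#\b(' . implode('|', $alternatives) . ')\b#iu';

    return preg_replace_callback($pattern, function ($m) use ($links) {
        // Look the URL up by the lowercased match; keep the original casing in the text.
        $url = $links[strtolower($m[1])];
        return '<a href="' . $url . '">' . $m[1] . '</a>';
    }, $html);
}

$links = array(
    'super'   => 'http://www.wordlink.com/super',
    'awesome' => 'http://www.wordlink.com/awesome',
);

echo linkKeywords('A Super day, not superman.', $links);
```

A single pass also avoids a later keyword matching inside a link that an earlier replacement inserted.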
This is just an idea. I did a quick test and it seems to work:
<?php
include("header.php");
ob_start();
?>
//STATIC CONTENT HERE
<?php
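$contents = ob_get_contents();
ob_end_clean();
// now all of your STATIC CONTENT is in the $contents variable,
// so you can use preg_replace on it to add your links, for example:
$contents_with_my_links = preg_replace('#\b(super|awesome)\b#i', '<a href="http://www.wordlink.com/$1">$1</a>', $contents);
echo $contents_with_my_links;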
include("footer.php");
?>
In practice, you would put this code in your existing header and footer files: the ob_start() call at the end of header.php and the rest at the start of footer.php.
OK. It's just an idea that solves the problem. As rdlowrey said, this may be inefficient, but if you need to replace keywords dynamically (with database-based links, for instance) then this could be a good solution.
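A slightly more compact variation on the same idea is to hand ob_start() a callback, so the footer only has to flush the buffer. This is a sketch under assumptions: addKeywordLinks is a made-up function name, and the keywords and link target are placeholders.

```php
<?php
// Output callback: receives the buffered page content and returns the version to send.
// addKeywordLinks, the keywords and the URL are illustrative assumptions.
function addKeywordLinks(string $buffer): string
{
    return preg_replace(
        '#\b(super|awesome)\b#i',
        '<a href="http://www.wordlink.com/$1">$1</a>',
        $buffer
    );
}

// This call would go at the end of header.php.
ob_start('addKeywordLinks');

// STATIC CONTENT HERE
echo "<p>This page is awesome.</p>\n";

// This call would go at the start of footer.php; it runs the callback and sends the result.
ob_end_flush();
```

The static pages themselves stay untouched, which fits the constraint that only the header and footer can be edited.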
I need to remove the comment lines from my code.
preg_replace('!//(.*)!', '', $test);
It works fine, but it also strips website URLs, leaving just http:
To avoid this I changed it to preg_replace('![^:]//(.*)!', '', $test);
That works, but the problem comes when my code has a line like the one below:
$code = 'something';// comment here
It removes the semicolon along with the comment, so after the replacement the code above becomes
$code = 'something'
which generates an error.
I just need to delete the single-line comments, and the URLs should remain the same.
Please help. Thanks in advance.
Try this:
preg_replace('#(?<!http:)//.*#','',$test);
Also read more about PCRE assertions: http://cz.php.net/manual/en/regexp.reference.assertions.php
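To see the lookbehind at work, here is a small sketch. The sample input is invented, and the second lookbehind for https: is an addition that the answer does not include:

```php
<?php
// Hypothetical input: a URL inside a string plus two trailing // comments.
$test = "var url = 'http://example.com'; // the site\nvar n = 1; // counter\n";

// (?<!http:) is a negative lookbehind: // only matches when it is NOT
// immediately preceded by "http:". The second lookbehind does the same for "https:".
$clean = preg_replace('#(?<!http:)(?<!https:)//.*#', '', $test);

echo $clean;
```

The URL survives and the semicolons are kept, because a lookbehind checks the preceding text without consuming it. A // inside an ordinary string would still be stripped.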
If you want to parse a PHP file and manipulate the PHP code it contains, the best solution (even if a bit difficult) is to use the Tokenizer: it exists to allow manipulation of PHP code.
Working with regular expressions for such a thing is a bad idea.
For instance, you thought about http://, but what about strings that contain //?
Like this one, for example:
$str = "this is // a test";
This can get complicated fast. There are more uses for // in strings. If you are parsing PHP code, I highly suggest you take a look at the PHP tokenizer. It's specifically designed to parse PHP code.
Question: Why are you trying to strip comments in the first place?
Edit: I see now you are trying to parse JavaScript, not PHP. So, why not use a javascript minifier instead? It will strip comments, whitespace and do a lot more to make your file as small as possible.
I am trying to get the page or last directory name from a url
For example, if the URL is http://www.example.com/dir/ I want it to return dir, and if the passed URL is http://www.example.com/page.php I want it to return page. Notice I do not want the trailing slash or the file extension.
I tried this:
$regex = "/.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*/i";
$name = strtolower(preg_replace($regex,"$2",$url));
I ran this regex in PHP and it returned nothing. (however I tested the same regex in ActionScript and it worked!)
So what am I doing wrong here, how do I get what I want?
Thanks!!!
Don't use / as the regex delimiter if it also contains slashes. Try this:
$regex = "#^.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*$#i";
You could try to escape the "/" in the middle. As written, it simply closes your regex. So this may work:
$regex = "/.*\.(com|gov|org|net|mil|edu)\/([a-z_\-]+).*/i";
You may also make the regex somewhat more general, but that's another problem.
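As a quick check of the delimiter fix, the sketch below runs the original replacement with # as the delimiter on both example URLs. The wrapper function extractName is a made-up name for this illustration:

```php
<?php
// Same pattern as in the question, but delimited with # so the / inside needs no escaping.
// extractName is a hypothetical wrapper around the question's two lines.
function extractName(string $url): string
{
    $regex = '#^.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*$#i';
    // $2 is the second capture group: the first path segment after the TLD.
    return strtolower(preg_replace($regex, '$2', $url));
}

echo extractName('http://www.example.com/dir/'), "\n";
echo extractName('http://www.example.com/page.php'), "\n";
```

Note that this still only covers the listed top-level domains and returns the first path segment, not the last one.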
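You can use this:
$parts = explode('/', rtrim($url, '/')); // drop the trailing slash so the last element is not empty
$last = array_pop($parts); // array_pop() needs a variable, not a function result
Then apply a simple regex to remove any file extension.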
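Put together, the two steps might look like the sketch below. The function name lastSegment is hypothetical, and the code assumes the URL has a path: for a bare http://www.example.com it would return part of the host name.

```php
<?php
// Return the last path segment of a URL without trailing slash or file extension.
// lastSegment is an illustrative helper name.
function lastSegment(string $url): string
{
    // Strip the trailing slash first so "dir/" yields "dir" rather than "".
    $parts = explode('/', rtrim($url, '/'));
    $last = array_pop($parts);

    // Remove a trailing ".ext" if there is one.
    return preg_replace('/\.[^.]*$/', '', $last);
}

echo lastSegment('http://www.example.com/dir/'), "\n";
echo lastSegment('http://www.example.com/page.php'), "\n";
```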
Assuming you want to match the entire address after the domain portion:
$regex = "%://[^/]+/([^?#]+)%i";
The above assumes a URL of the format extension://domainpart/everythingelse.
Then again, it seems that the problem here isn't that your regex isn't powerful enough, just that it is mistyped (the closing delimiter appears in the middle of the string). I'll leave this up for posterity, but I strongly recommend you check out PHP's parse_url() function.
This should adequately deliver:
substr($s = basename($_SERVER['REQUEST_URI']), 0, strrpos($s,'.') ?: strlen($s))
But this is better:
preg_replace('/[#\.\?].*/','',basename($path));
Your example is short, though, so I cannot tell if you want to preserve the entire path or just the last element of it. The preceding example will only preserve the last piece, but this should save the whole path while being generic enough to work with just about anything that can be thrown at you:
preg_replace('~(?:/$|[#\.\?].*)~','',substr(parse_url($path, PHP_URL_PATH),1));
As much as I personally love using regular expressions, more 'crude' (for want of a better word) string functions might be a good alternative for you. The snippet below uses sscanf to parse the path part of the URL for the first bunch of letters.
$url = "http://www.example.com/page.php";
$path = parse_url($url, PHP_URL_PATH);
sscanf($path, '/%[a-z]', $part);
// $part = "page";
This expression:
(?<=^[^:]+://[^.]+(?:\.[^.]+)*/)[^/]*(?=\.[^.]+$|/$)
Gives the following results:
http://www.example.com/dir/ dir
http://www.example.com/foo/dir/ dir
http://www.example.com/page.php page
http://www.example.com/foo/page.php page
Apologies in advance if this is not valid PHP regex - I tested it using RegexBuddy.
Save yourself the regular expression and make PHP's other functions feel more loved.
$url = "http://www.example.com/page.php";
$filename = pathinfo(parse_url($url, PHP_URL_PATH), PATHINFO_FILENAME);
Warning: for PHP 5.2 and up.
I am writing a comment-stripper and trying to accommodate for all needs here. I have the below stack of code which removes pretty much all comments, but it actually goes too far. A lot of time was spent trying and testing and researching the regex patterns to match, but I don't claim that they are the best at each.
My problem is that I also have situations where I have 'PHP comments' (that aren't really comments) in standard code, or even in PHP strings, that I don't actually want removed.
Example:
<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>
What ends up happening is that it strips out religiously, which is fine, but it leaves certain problems:
<?php $Var = "Blah blah ?>
Also, a // comment on the same line as the closing tag will cause problems, as the comment removal strips the rest of the line, including the ending ?>
See the problem? So this is what I need...
Comment characters within '' or "" need to be ignored
PHP Comments on the same line, that use double-slashes, should remove perhaps only the comment itself, or should remove the entire php codeblock.
Here's the patterns I use at the moment, feel free to tell me if there's improvement I can make in my existing patterns? :)
$CompressedData = $OriginalData;
$CompressedData = preg_replace('!/\*.*?\*/!s', '', $CompressedData); // removes /* comments */
$CompressedData = preg_replace('!//.*?\n!', '', $CompressedData); // removes //comments
$CompressedData = preg_replace('!#.*?\n!', '', $CompressedData); // removes # comments
$CompressedData = preg_replace('/<!--(.*?)-->/', '', $CompressedData); // removes HTML comments
Any help that you can give me would be greatly appreciated! :)
If you want to parse PHP, you can use token_get_all to get the tokens of a given PHP code. Then you just need to iterate the tokens, remove the comment tokens and put the rest back together.
But you would need a separate procedure for the HTML comments, preferably a real parser too (like DOMDocument provides with DOMDocument::loadHTML).
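A minimal sketch of the token-based approach for the PHP side only, assuming the tokenizer extension is available. The function name stripPhpComments is made up, and HTML comments are deliberately left alone:

```php
<?php
// Remove PHP comments by walking the token stream.
// stripPhpComments is a hypothetical helper name.
function stripPhpComments(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as ; or = are plain strings.
            $out .= $token;
            continue;
        }
        if ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
            // Some PHP versions include the trailing newline of a one-line
            // comment in the token; keep it so lines do not run together.
            if (substr($token[1], -1) === "\n") {
                $out .= "\n";
            }
            continue;
        }
        $out .= $token[1];
    }
    return $out;
}

$source = '<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>';
echo stripPhpComments($source);
```

The // inside the string is part of a string token, so it is kept, and the closing ?> is its own token, so it survives the removal of the comment before it.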
You should first think carefully about whether you actually want to do this. Though what you're doing may seem simple, in the worst-case scenario it becomes an extremely complex problem (to solve with just a few regular expressions). Let me illustrate just a few of the problems you would be facing when trying to strip both HTML and PHP comments from a file.
You can't straight out strip HTML comments, because you may have PHP inside the HTML comments, like:
<!-- HTML comment <?php echo 'Actual PHP'; ?> -->
You can't simply deal separately with the stuff inside the <?php and ?> tags either, since the ending tag ?> can be inside strings or even comments, like:
<?php /* ?> This is still a PHP comment <?php */ ?>
Let's not forget that ?> actually ends the PHP block if it's preceded by a one-line comment. For example:
<?php // ?> This is not a PHP comment <?php ?>
Of course, like you already illustrated, there will be plenty of problems with comment indicators inside strings. Parsing out strings to ignore them isn't that simple either, since you have to remember that quotes can be escaped. Like:
<?php
$foo = ' /* // None of these start a comment ';
$bar = ' \' // Remember escaped quotes ';
$orz = " ' \" \' /* // Still not a comment ";
?>
Parsing order will also cause you headaches. You can't simply choose to parse either the one-line comments first or the multi-line comments first. They both have to be parsed at the same time (i.e. in the order they appear in the document). Otherwise you may end up with broken code. Let me illustrate:
<?php
/* // Multiline comment */
// /* Single Line comment
$omg = 'This is not in a comment */';
?>
If you parse multi line comments first, the second /* will eat up part of the string destroying the code. If you parse the single line comments first, you will end up eating the first */, which will also destroy the code.
As you can see, there are many complex scenarios you'd have to account for if you intend to solve your problem with regular expressions. The only correct solution is to use some sort of PHP parser, like token_get_all(), to tokenize the entire source code, strip the comment tokens and rebuild the file. Which, I'm afraid, isn't entirely simple either. It also won't help with HTML comments, since the HTML is left untouched. You can't use XML parsers to get the HTML comments either, because the HTML is rarely well formed with PHP.
To put it short, the idea of what you're doing is simple, but the actual implementation is much harder than it seems. Thus, I would recommend trying to avoid doing this, unless you have a very good reason to do it.
One way to do this with regex is to use one compound expression and preg_replace_callback.
I was going to post a poor example but the best place to look is at the source code to the PHP port of Dean Edwards' JS packer script - you should see the general idea.
http://joliclic.free.fr/php/javascript-packer/en/
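The general idea can be sketched as follows. This is not the packer's actual code, just an illustration: one pattern matches string literals and comments as alternatives, and the callback puts strings back unchanged while dropping comments. The function name is made up, and the sketch ignores heredocs, HTML comments and a ?> inside a line comment.

```php
<?php
// One compound pattern: strings and comments are matched in the order they
// appear, so a // inside a string is consumed as part of the string.
// stripCommentsCompound is a hypothetical name for this illustration.
function stripCommentsCompound(string $code): string
{
    $pattern = '~
        ( "(?:\\\\.|[^"\\\\])*"        # group 1: a double-quoted string
        | \'(?:\\\\.|[^\'\\\\])*\' )   #          or a single-quoted string
        | /\*.*?\*/                    # a block comment
        | //[^\n]*                     # a line comment
    ~xs';

    return preg_replace_callback($pattern, function ($m) {
        // Group 1 is only set when a string literal matched: keep it as is.
        // Otherwise a comment matched: replace it with nothing.
        return isset($m[1]) ? $m[1] : '';
    }, $code);
}

$code = "\$a = 'x // not a comment'; // real comment\n"
      . "\$b = \"http://example.com\"; /* block */ \$c = 1;";

echo stripCommentsCompound($code);
```

Because everything is handled in a single left-to-right pass, the ordering problem between one-line and multi-line comments does not arise.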
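Try this:
private function removeComments( $content ){
    // remove /* block comments */ (may span lines)
    $content = preg_replace( "!/\*.*?\*/!s" , '', $content );
    // collapse blank lines
    $content = preg_replace( "/\n\s*\n/" , "\n", $content );
    // remove lines that consist only of a // comment
    $content = preg_replace( '#^\s*//.+$#m' , "", $content );
    // remove trailing // comments that follow whitespace
    $content = preg_replace( '!\s//.*?\n!' , "\n", $content );
    // remove HTML comments (non-greedy, may span lines)
    $content = preg_replace( '/<!--.*?-->/s' , "\n", $content );
    return $content;
}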