Unwrap / amalgamate PHP code from several .php files

For debugging purposes, when working on PHP projects with many files / many includes (for example, WordPress code), I would sometimes like to see the "unwrapped" code, i.e. to amalgamate / flatten ("flatten" is the term used in Photoshop-like tools when you merge many layers into one) all files into one big PHP file.
How to do an amalgamation of multiple PHP files?
Example:
$ php index.php --amalgamation
would take these files as input:
vars.php
<?php
$color = 'green';
$fruit = 'apple';
?>
index.php
<?php
include 'vars.php';
echo "A $color $fruit";
?>
and produce this amalgamated output:
<?php
$color = 'green';
$fruit = 'apple';
echo "A $color $fruit";
?>
(it should work also with many files, e.g. if index.php includes vars.php which itself includes abc.php).

We can write an amalgamation/bundling script that fetches a given file's contents and matches any instances of include|require, and then fetches any referred files' contents, and substitutes the include/require calls with the actual code.
The following is a rudimentary implementation that will work (based on a very limited test on files with nested references) with any number of files that include/require other files.
<?php
// Main file that references further files:
$start = 'test/test.php';
function bundle_files(string $filepath)
{
// Fetch current code
$code = file_get_contents($filepath);
// Set directory for referred files
$dirname = pathinfo($filepath, PATHINFO_DIRNAME);
// Match and substitute include/require(_once) with code:
$rx = '~((include|require)(_once)?)\s+[\'"](?<path>[^\'"]+)[\'"];~';
$code = preg_replace_callback($rx, function($m) use ($dirname) {
// Ensure a valid filepath or abort:
if($path = realpath($dirname . '/' . $m['path'])) {
return bundle_files($path);
} else {
die("Filepath Read Fail: {$dirname}/{$m['path']}");
}
}, $code);
// Remove opening PHP tags, note source filepath
$code = preg_replace('~^\s*<\?php\s*~i', "\n// ==== Source: {$filepath} ====\n\n", $code);
// Remove closing PHP tags, if any
$code = preg_replace('~\?>\s*$~', '', $code);
return $code;
}
$bundle = '<?php ' . "\n" . bundle_files($start);
file_put_contents('bundle.php', $bundle);
echo $bundle;
Here we use preg_replace_callback() to match and substitute in order of appearance, with the callback calling the bundling function on each matched filepath and substituting include/require references with the actual code. The function also includes a comment line indicating the source of the included file, which may come in handy if/when you're debugging the compiled bundle file.
Notes/Homework:
You may need to refine the base directory reference routine. (Expect trouble with "incomplete" filepaths that rely on PHP include_path.)
_once is not honored: code will be re-included. (Easy to remedy by recording included filepaths and skipping recurrences.)
Matching is only made on "path/file.php", i.e. unbroken strings inside single/double quotes. Concatenated strings are not matched.
Paths including variables or constants are not understood. Files would have to be evaluated, without side-effects!, for that to be possible.
If you use declare(strict_types=1);, place it at the top of the bundle and remove the later instances.
There may be other side-effects from the bundling of files that are not addressed here.
The regex does no lookbehind/around to see if your include/require is commented out!
If your code jumps in and out of PHP mode and blurts out HTML, all bets are off.
Managing the inclusion of autoloaded classes is beyond this snippet.
Please report any glitches and edge cases. Feel free to develop and (freely) share.
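The second note above (honoring _once) can be addressed roughly as follows. This is a sketch, not part of the original answer: bundle_files_once() is a hypothetical name. It keeps the same regex and tag handling as bundle_files(), throws a RuntimeException instead of calling die(), and skips an include_once/require_once whose resolved target was already bundled, leaving a marker comment in its place. Like PHP's own include_once, it treats a file as already included if any earlier include pulled it in.

```php
<?php
// Hypothetical variant of bundle_files() that honors include_once/require_once.
// $seen maps each resolved (realpath) file that has been bundled to true.
function bundle_files_once(string $filepath, array &$seen = []): string
{
    $real = realpath($filepath);
    if ($real === false) {
        throw new RuntimeException("Filepath Read Fail: {$filepath}");
    }
    $seen[$real] = true;
    $code = file_get_contents($real);
    $dirname = dirname($real);
    $rx = '~((include|require)(_once)?)\s+[\'"](?<path>[^\'"]+)[\'"];~';
    $code = preg_replace_callback($rx, function (array $m) use ($dirname, &$seen): string {
        $path = realpath($dirname . '/' . $m['path']);
        if ($path === false) {
            throw new RuntimeException("Filepath Read Fail: {$dirname}/{$m['path']}");
        }
        // $m[3] is "_once" for include_once/require_once, "" otherwise
        if ($m[3] !== '' && isset($seen[$path])) {
            return "/* skipped, already bundled: {$m['path']} */";
        }
        return bundle_files_once($path, $seen);
    }, $code);
    // Same tag handling as bundle_files(): drop the opening tag, note the source
    $code = preg_replace('~^\s*<\?php\s*~i', "\n// ==== Source: {$real} ====\n\n", $code);
    // ...and drop a trailing closing tag, if any
    return preg_replace('~\?>\s*$~', '', $code);
}
```

Usage is the same as before, e.g. `file_put_contents('bundle.php', "<?php\n" . bundle_files_once('test/test.php'));`.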

Related

Importing a php.inc file into a Perl program

I'm trying to use one include file for both Perl and PHP.
Is there a nice way to import a myphp.inc file within Perl?
$ cat myphp.inc
<?php
$some_var="hello world";
?>
Using the above in my test.php works fine:
include "myphp.inc";
If I remove the <?php tag, test.php just prints out the contents of myphp.inc. If I leave it in, my Perl program complains with:
Unterminated <> operator
I've seen the Perl module PHP::Include, but I would like to stay away from external modules if possible.
Does anyone have ideas on how to do this?
Don't try to write code that is both PHP and Perl; they are different languages, even if they share some ancestry. If you want to share data between the two, use a structured data format. JSON is a popular choice: PHP has json_decode() and Perl has the JSON module.
I would like to stay away from external modules if possible
Code reuse is a virtue … although there is nothing stopping you reimplementing the modules from scratch.
Ideally you would store the shared/configuration data in a format easily readable in both PHP and Perl. XML, JSON, or a simple text file with key-value pairs (as in the .ini file that simbabque suggests) would work great.
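For the PHP side of that advice, a minimal sketch could look like this. The file name shared.json and the function load_shared_config() are illustrative assumptions; on the Perl side, the JSON module's decode_json() would read the same file.

```php
<?php
// Hypothetical helper: read shared settings from a JSON file instead of a PHP include.
function load_shared_config(string $path): array
{
    $json = file_get_contents($path);
    if ($json === false) {
        throw new RuntimeException("Could not read $path");
    }
    // true => decode JSON objects into associative arrays
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new RuntimeException("Invalid JSON in $path: " . json_last_error_msg());
    }
    return $data;
}

// Write the example file, then read it back
$path = sys_get_temp_dir() . '/shared.json';
file_put_contents($path, json_encode(['some_var' => 'hello world']));
$config = load_shared_config($path);
echo $config['some_var'], "\n"; // prints: hello world
```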
If you are determined to read the PHP file in Perl but you do not want to use a module such as PHP::Include then you are left with writing something like this:
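use strict;
use warnings;
# Copy the PHP file minus its <?php / ?> lines into a temporary Perl file,
# then run it; this only works if the remaining lines are also valid Perl.
sub require_php {
my ($source_filename) = @_;
my $dest_filename = 'temp.inc.pl';
open my $source, '<', $source_filename or die "Could not open $source_filename: $!";
# '>' (not '>>') so a leftover temp file from an earlier run is overwritten
open my $destination, '>', $dest_filename or die "Could not open $dest_filename for writing: $!";
while (my $line = <$source>) {
if (index($line, '<?php') == -1 && index($line, '?>') == -1) {
print $destination $line;
}
}
close $destination;
close $source;
# 'do' with an explicit ./ path: '.' is no longer in @INC since Perl 5.26,
# and unlike 'require', 'do' runs the file again on every call
do "./$dest_filename";
my $error = $@;
unlink $dest_filename;
die "Could not evaluate $source_filename: $error" if $error;
}
our $some_var = '';
require_php('myphp.inc');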
which will end with $some_var having the value of "hello world".

unlink files with a case-insensitive (glob-like) pattern

I have two folders: one contains the videos and the other the configuration files for each video (3 files per video). Right now, if I want to delete a video, I have to delete its files by hand.
I found this:
<?php
$filename = 'name.of.the.video.xml';
$term = str_replace(".xml","", $filename);
$dirPath = ("D:/test/");
foreach (glob($dirPath.$term.".*") as $removeFile)
{
unlink ($removeFile);
}
?>
An echo returns:
D:/test/name.of.the.video.jpg
D:/test/name.of.the.video.srt
D:/test/name.of.the.video.xml
This works and helps me a lot, but I have a problem.
Not all files have the same case, e.g.:
Name.of.The.video.jpg
Name.Of.The.Video.xml
If I search the folder for that string and the case does not match $filename exactly, the result is empty.
So my question is: how can I make that search case-insensitive?
Thank you.
You are using the glob function, which is case-sensitive, so it is the wrong function for getting this list of files.
You should therefore either first normalize the filenames in the directory so they all share the same case (e.g. all lowercase), or use another method to get the directory listing case-insensitively. I suggest the first; however, if that is not an option, why not glob for all files first and then filter the list with preg_grep, which lets you specify case-insensitive patterns?
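The glob-then-filter suggestion might be sketched as follows. find_files_ci() is a hypothetical helper; it assumes the related files are named "<term>.<extension>" as in the question, and relies on preg_grep() with the i modifier for the case-insensitive match and on preg_quote() to escape the dots in the name.

```php
<?php
// Hypothetical helper: list files in $dirPath named "<term>.<anything>", ignoring case.
function find_files_ci(string $dirPath, string $term): array
{
    // glob() may return false on error, so fall back to an empty list
    $all = glob(rtrim($dirPath, '/') . '/*') ?: [];
    // Match "/<term>.<extension>" at the end of each path; ~ is the delimiter
    $pattern = '~/' . preg_quote($term, '~') . '\.[^/]*$~i';
    // preg_grep() keeps the original keys, so reindex the result
    return array_values(preg_grep($pattern, $all));
}
```

For example, `foreach (find_files_ci('D:/test/', 'name.of.the.video') as $f) { unlink($f); }` would delete Name.Of.The.Video.xml and name.of.the.video.jpg alike.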
This leads me to the point that it's more practical to use a DirectoryIterator with a RegexIterator:
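$dirPath = 'D:/test/'; // directory from the question
$filename = 'name.of.the.video.xml';
$term = basename($filename, ".xml");
$files = new DirectoryIterator($dirPath);
// Case-insensitive match (trailing i modifier) on "<term>.<anything>"
$filesFiltered = new RegexIterator($files, sprintf('(^%s\\..*$)i', preg_quote($term)));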
foreach($filesFiltered as $file)
{
printf("delete: %s\n", $file);
unlink($file->getPathname());
}
Your changed requirement, doing this for two directories at once, is a good example of the flexibility of the iterator code. You just create two DirectoryIterators and combine them with an AppendIterator. Job done. The rest of the code stays the same:
...
$files = new AppendIterator();
$files->append(new DirectoryIterator($dirPath1));
$files->append(new DirectoryIterator($dirPath2));
...
Voilà. Sounds good? glob is okay for quick jobs that need just it. For everything else involving directory operations, consider the SPL; it has much more power.
Is strcasecmp() a valid function for this? It's a case-insensitive string comparison function.
Surely if you know the file name and you can echo it out, you can pass this to unlink()?

Find and replace in multiple files

OK, what's the best solution in PHP to search through a bunch of files' contents for a certain string and replace it with something else?
Exactly like Notepad++ does it, but obviously I don't need the interface.
foreach (glob("path/to/files/*.txt") as $filename)
{
$file = file_get_contents($filename);
file_put_contents($filename, preg_replace("/regexhere/","replacement",$file));
}
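A slightly more defensive variant of the loop above, as a sketch: replace_in_files() is a hypothetical name, and the *.txt filter, pattern and replacement are placeholders. It uses the fifth ($count) argument of preg_replace() so that only files with at least one match are rewritten, and returns how many replacements were made per file.

```php
<?php
// Hypothetical helper: regex find-and-replace across every *.txt file in $dir.
function replace_in_files(string $dir, string $pattern, string $replacement): array
{
    $changed = [];
    foreach (glob(rtrim($dir, '/') . '/*.txt') ?: [] as $filename) {
        $original = file_get_contents($filename);
        // -1 = no limit; $count receives the number of replacements made
        $updated = preg_replace($pattern, $replacement, $original, -1, $count);
        // preg_replace() returns null on error; skip files with no match
        if ($updated !== null && $count > 0) {
            file_put_contents($filename, $updated);
            $changed[$filename] = $count;
        }
    }
    return $changed;
}
```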
So I recently ran into an issue in which our web host converted from PHP 5.2 to 5.3, and in the process it broke our installation of Magento. I did some individual tweaks that were suggested, but found that there were still some broken areas. I realized that most of the problems were related to an issue with the "toString" function present in Magento and the now-deprecated PHP split function. Seeing this, I decided to try to create some code that would find and replace all the various instances of the broken functions. I managed to create the function, but unfortunately the shotgun approach didn't work: I still had errors afterwards. That said, I feel like the code has a lot of potential, and I wanted to post what I came up with.
Please use this with caution, though. I'd recommend zipping a copy of your files so that you can restore from a backup if you have any issues.
Also, you don't necessarily want to use this as is. I'm providing the code as an example. You'll probably want to change what is replaced.
The code finds and replaces text in the folder it is placed in and in its subfolders. I have it set up so that it only looks for files with the .php extension, but you can change that as needed. As it searches, it lists the files it changes. To use this code, save it as "ChangePHPText.php" and upload that file to wherever you need the changes to happen. You can then run it by loading the page associated with that name, for example, mywebsite.com/ChangePHPText.php.
<?php
## Replace "toString(" with "invoke(" and "split(" with "preg_split(" in every .php file.
## NOTE: split() took a POSIX pattern without delimiters, while preg_split() needs a
## delimited PCRE pattern (e.g. '/,/'), so converted calls may still need manual fixes.
function FixPHPText( $dir = "./" ){
$d = new RecursiveDirectoryIterator( $dir );
foreach( new RecursiveIteratorIterator( $d, RecursiveIteratorIterator::SELF_FIRST ) as $path ){
// Only .php files, and never this script itself
if( is_file( $path ) && substr($path, -4)=='.php' && substr($path, -17) != 'ChangePHPText.php'){
$orig_file = file_get_contents($path);
$new_file = str_replace("toString(", "invoke(",$orig_file);
$new_file = str_replace(" split(", " preg_split(",$new_file);
$new_file = str_replace("(split(", "(preg_split(",$new_file);
// Write the file back only if something was replaced
if($orig_file != $new_file){
file_put_contents($path, $new_file);
echo "$path updated<br/>";
}
}
}
}
echo "----------------------- PHP Text Fix START -------------------------<br/>";
$start = microtime(true);
echo "<br/>*************** Updating PHP Files ***************<br/>";
echo "Changing all PHP containing toString to invoke and split to preg_split<br/>";
FixPHPText( "." );
$end = microtime(true);
echo "<br/>------------------- PHP Text Fix COMPLETED in:". sprintf("%.4f", ($end-$start))." seconds ------------------<br/>";
?>

Is there any tool that will resolve and hardcode every included file of a PHP script?

I need a tool, if one exists, or one you can write in under 5 minutes (I don't want to waste anyone's time).
The tool would resolve the include, require, include_once and require_once statements in a PHP script and hardcode the included contents, recursively.
This is needed to ship, as one big file, PHP scripts that use code and resources from multiple included files.
I know that PHP is not the best tool for CLI scripts, but as it's what I'm most proficient in, I use it to write some personal or semi-personal tools. I don't want unhelpful answers or comments telling me to use something other than PHP or to learn something else.
The idea of this approach is to have a single file containing everything needed, which I can put in my personal ~/.bin/ directory and let live there as a completely functional, self-contained script. I know I could set include paths in the script to something that honors the XDG data directory standards or anything else, but I wanted to try this approach.
Anyway, I'm asking here because I don't want to reinvent the wheel and all my searches turned up nothing; if I don't get any insight here, I will continue as planned and write a tool that resolves the includes and requires myself.
Thanks for any help!
P.S.: I forgot to include examples and don't want to rephrase the message:
Those two files
mainfile.php
<?php
include('resource.php');
include_once('resource.php');
echo returnBeef();
?>
resource.php
<?php
function returnBeef() {
return "The beef!";
}
?>
Would be "compiled" as (comments added for clarity)
<?php
/* begin of include('resource.php'); */?><?php
function returnBeef() {
return "The beef!";
}
?><?php /* end of include('resource.php'); */
/*
NOT INCLUDED BECAUSE resource.php WAS PREVIOUSLY INCLUDED
include_once('resource.php');
*/
echo returnBeef();
?>
The script does not have to output explicit comments, but it could be nice if it did.
Thanks again for any help!
EDIT 1
I made a simple modification to the example. Now that I have begun writing the tool myself, I have noticed a mistake in the original example: to do the least amount of work, the included file's contents have to be placed outside the PHP tags, i.e. between a closing ?> and an opening <?php.
The resulting script example has been modified accordingly, but it has not been tested.
EDIT 2
The script does not actually need to do heavy-duty parsing of the PHP script as in run-time accurate parsing. Simple includes only have to be treated (like include('file.php');).
I started working on my script; it reads the files and parses them naively, so that includes are only resolved inside <?php ?> tags, not in comments or strings. A secondary goal is to also detect dirname(__FILE__)."" in an include directive and actually honor it.
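Parsing that skips comments, strings and HTML does not need to be written by hand: PHP's tokenizer extension (token_get_all()) already splits source code into tokens, and comments, strings and inline HTML come out as separate tokens. The following is a sketch, not a full tool: find_literal_includes() is a hypothetical helper that reports only include/require statements whose argument is a single quoted literal, optionally in parentheses, and ignores variable or concatenated paths such as the dirname(__FILE__) case.

```php
<?php
// Hypothetical helper: list include/require statements with a literal path.
function find_literal_includes(string $code): array
{
    $includeTokens = [T_INCLUDE, T_INCLUDE_ONCE, T_REQUIRE, T_REQUIRE_ONCE];
    $tokens = token_get_all($code);
    // Index of the next token that is not whitespace or a comment
    $next = function (int $i) use ($tokens): int {
        $i++;
        while (isset($tokens[$i]) && is_array($tokens[$i])
            && in_array($tokens[$i][0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)) {
            $i++;
        }
        return $i;
    };
    $found = [];
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || !in_array($token[0], $includeTokens, true)) {
            continue;
        }
        $j = $next($i);
        $paren = isset($tokens[$j]) && $tokens[$j] === '(';
        if ($paren) {
            $j = $next($j);
        }
        if (!isset($tokens[$j]) || !is_array($tokens[$j]) || $tokens[$j][0] !== T_CONSTANT_ENCAPSED_STRING) {
            continue; // variable or computed path: cannot be resolved statically
        }
        $end = $tokens[$next($j)] ?? null;
        // Accept 'x.php'; or ('x.php') only, not 'x' . $y
        if ($end === ($paren ? ')' : ';') || (!$paren && is_array($end) && $end[0] === T_CLOSE_TAG)) {
            // Strip the surrounding quotes from the string literal
            $found[] = [token_name($token[0]), substr($tokens[$j][1], 1, -1)];
        }
    }
    return $found;
}
```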
An interesting problem, but one that's not really solvable without detailed runtime knowledge. Conditional includes would be nearly impossible to determine, but if you make enough simple assumptions, perhaps something like this will suffice:
<?php
# import.php
#
# Usage:
# php import.php basefile.php
if (!isset($argv[1])) die("Invalid usage.\n");
$included_files = array();
echo import_file($argv[1])."\n";
function import_file($filename)
{
global $included_files;
# this could fail because the file doesn't exist, or
# if the include path contains a run time variable
# like include($foo);
$file = @file_get_contents($filename);
if ($file === false) die("Error: Unable to open $filename\n");
# trimming whitespace so that the str_replace() at the end of
# this routine works. however, this could cause minor problems if
# the whitespace is considered significant
$file = trim($file);
# look for require/include statements. Note that this looks
# everywhere, including non-PHP portions and comments!
if (!preg_match_all('!((require|include)(_once)?)\\s*\\(?\\s*(\'|")(.+)\\4\\s*\\)?\\s*;!U', $file, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE ))
{
# nothing found, so return file contents as-is
return $file;
}
$new_file = "";
$i = 0;
foreach ($matches as $match)
{
# append the plain PHP code up to the include statement
$new_file .= substr($file, $i, $match[0][1] - $i);
# make sure to honor "include once" files
if ($match[3][0] != "_once" || !isset($included_files[$match[5][0]]))
{
# include this file
$included_files[$match[5][0]] = true;
$new_file .= ' ?>'.import_file($match[5][0]).'<?php ';
}
# update the index pointer to where the next plain chunk starts
$i = $match[0][1] + strlen($match[0][0]);
}
# append the remainder of the source PHP code
$new_file .= substr($file, $i);
return str_replace('?><?php', '', $new_file);
}
?>
There are many caveats to the above code, some of which can be worked around. (I leave that as an exercise for somebody else.) To name a few:
It doesn't honor <?php ?> blocks, so it will match inside HTML
It doesn't know about any PHP rules, so it will match inside PHP comments
It cannot handle variable includes (e.g., include $foo;)
It may introduce scope errors (e.g., if (true) include('foo.php'); should become if (true) { include('foo.php'); })
It doesn't check for infinitely recursive includes
It doesn't know about include paths
etc...
But even in such a primitive state, it may still be useful.
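Two of those caveats (paths resolved against the current directory instead of the including file, and infinitely recursive includes) could be addressed along these lines. This is a hypothetical variant, not the answer's code: import_file_safe() resolves each path relative to the including file, throws a RuntimeException on a cycle, and inlines plain-PHP files by stripping their opening and closing tags. It assumes every file starts with <?php, treats every cycle as an error (even between *_once includes, which PHP itself would tolerate), and, like import.php, still matches inside comments and strings.

```php
<?php
// Hypothetical variant of import_file(): relative resolution plus cycle detection.
// $stack holds the resolved paths of the files currently being imported.
function import_file_safe(string $filename, array $stack = []): string
{
    $real = realpath($filename);
    if ($real === false) {
        throw new RuntimeException("Unable to open $filename");
    }
    // A file already on the stack means the include chain loops back on itself
    if (in_array($real, $stack, true)) {
        throw new RuntimeException('Recursive include: ' . implode(' -> ', $stack) . " -> $real");
    }
    $stack[] = $real;
    $code = file_get_contents($real);
    $rx = '~(?:require|include)(?:_once)?\s*\(?\s*([\'"])(.+?)\1\s*\)?\s*;~';
    return preg_replace_callback($rx, function (array $m) use ($real, $stack): string {
        // Resolve the path relative to the including file, not the CWD
        $child = import_file_safe(dirname($real) . '/' . $m[2], $stack);
        // Assumes pure-PHP files: drop the child's own opening and closing tags
        return preg_replace(['~^\s*<\?php~', '~\?>\s*$~'], '', $child);
    }, $code);
}
```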
You could use the built-in function get_included_files(), which returns an array of, you guessed it, all the included files.
Here's an example, you'd drop this code at the END of mainfile.php and then run mainfile.php.
$includes = get_included_files();
$all = "";
foreach($includes as $filename) {
$all .= file_get_contents($filename);
}
file_put_contents('all.php',$all);
A few things to note:
Any include that is never actually executed (e.g. an include inside a function that is never called) will not be dumped into the final file. Only includes that have actually run are captured.
Each file will also keep its own <?php ?> tags around its code, but you can have multiple blocks like that in a single file with no issues, as long as each file ends with a closing ?> tag.
This WILL include anything included within another include.
Yes, get_included_files will list the script actually running as well.
If this HAD to be a stand-alone tool instead of a drop-in, you could read the initial file in, add this code in as text, then eval the entire thing (possibly dangerous).

PHP: Equivalent of include using eval

If the code is the same, there appears to be a difference between:
include 'external.php';
and
eval('?>' . file_get_contents('external.php') . '<?php');
What is the difference? Does anybody know?
I know the two are different because the include works fine and the eval gives an error. When I originally asked the question, I wasn't sure whether it gave an error on all code or just on mine (and because the code was evaled, it was very hard to find out what the error meant). However, after having researched the answer, it turns out that whether or not you get the error does not depend on the code in the external.php, but does depend on your php settings (short_open_tag to be precise).
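After some more research I found out what was wrong myself. The problem is that a <?php at the very end of the code, with nothing after it, is not a full opening tag: the full tag must be followed by whitespace. When short_open_tag is set to 1 (in php.ini or something to the same effect), PHP instead reads it as the short opening tag <? followed by the bare word php, which produces the error. The correct full tag is "<?php " with a space after the second p.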
As such the proper equivalent of the include is:
eval('?>' . file_get_contents('external.php') . '<?php ');
Alternatively, you can leave the opening tag out all together (as noted in the comments below):
eval('?>' . file_get_contents('external.php'));
My original solution was to add a semicolon, which also works, but looks a lot less clean if you ask me:
eval('?>' . file_get_contents('external.php') . '<?php;');
AFAIK you can't take advantage of PHP accelerators (opcode caches) if you use eval().
If you are using a web server on which you have installed an opcode cache, like APC, eval will not be the "best solution": eval'd code is not stored in the opcode cache, if I remember correctly (and another answer said the same thing, btw).
A solution you could use, at least if the code does not change often, is to mix code stored in the database with included code:
when necessary, fetch the code from DB, and store it in a file on disk
include that file
as the code is now in a file, on disk, opcode cache will be able to cache it -- which is better for performances
and you will not need to make a request to the DB each time you have to execute the code.
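The steps above might be sketched like this. include_cached_code() is a hypothetical name, and $fetchCode stands in for whatever query loads the code from your database. Assuming $cacheDir is writable, the cached file is written under a temporary name and then renamed, so a concurrent request never includes a half-written file. To invalidate the cache, delete the cached file when the DB entry changes.

```php
<?php
// Hypothetical helper: include DB-stored code through an on-disk cache file.
// $fetchCode is only called when no cached file exists yet.
function include_cached_code(string $key, callable $fetchCode, string $cacheDir)
{
    $file = rtrim($cacheDir, '/') . '/code_' . md5($key) . '.php';
    if (!is_file($file)) {
        // Write to a temporary name first, then rename into place
        $tmp = tempnam($cacheDir, 'tmp_');
        file_put_contents($tmp, $fetchCode($key));
        rename($tmp, $file);
    }
    // A regular include, so an opcode cache can cache this file
    return include $file;
}
```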
I've worked with software that uses this solution (the on-disk file being no more than a cache of the code stored in the DB), and it worked reasonably well; way better than doing loads of DB requests on each page, anyway...
Some not-so-good consequences:
you have to fetch the code from the DB to put it in the file "when necessary"
this could mean re-generating the temporary file once every hour, or deleting it when the entry in DB is modified ? Do you have a way to identify when this happens ?
you also have to change your code, to use the temporary file, or re-generate it if necessary
if you have several places to modify, this could mean some work
BTW : would I dare saying something like "eval is evil" ?
This lets you include a string as a file, assuming include wrappers are enabled in PHP (allow_url_include must be on for data:// URIs):
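function stringToTempFileName($str)
{
// Small strings: use a data:// URI, so nothing is written to disk
if (version_compare(PHP_VERSION, '5.1.0', '>=') && strlen($str) < (1024 * 512)) {
$file = 'data://text/plain;base64,' . base64_encode($str);
} else {
// Larger strings: write them to a real temporary file
// (tempnam() stands in for the author's own Utils::tempFileName() helper)
$file = tempnam(sys_get_temp_dir(), 'php');
file_put_contents($file, $str);
}
return $file;
}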
... Then include that 'file.' Yes, this will also disable opcode caches, but it makes this 'eval' the same as an include with respect to behavior.
As noted by @bwoebi in this answer to my question, the eval substitution does not respect the file path context of the included file. As a test case:
Baz.php:
<?php return __FILE__;
Foo.php:
<?php
echo eval('?>' . file_get_contents('Baz.php', FILE_USE_INCLUDE_PATH)) . "\n";
echo (include 'Baz.php') . "\n";
Result of executing php Foo.php:
$ php Foo.php
/path/to/file/Foo.php(2) : eval()'d code
/path/to/file/Baz.php
I don't know of any way to change the __FILE__ constant and friends at runtime, so I do not think there is any general way to define include in terms of eval.
Only the eval('?>' . file_get_contents('external.php')); variant is a correct replacement for include.
See tests:
<?php
$includes = array(
'some text',
'<?php print "some text"; ?>',
'<?php print "some text";',
'some text<?php',
'some text<?php ',
'some text<?php;',
'some text<?php ?>',
'<?php ?>some text',
);
$tempFile = tempnam('/tmp', 'test_');
print "\r\n" . "Include:" . "\r\n";
foreach ($includes as $include)
{
file_put_contents($tempFile, $include);
var_dump(include $tempFile);
}
unlink($tempFile);
print "\r\n" . "Eval 1:" . "\r\n";
foreach ($includes as $include)
var_dump(eval('?>' . $include . '<?php '));
print "\r\n" . "Eval 2:" . "\r\n";
foreach ($includes as $include)
var_dump(eval('?>' . $include));
print "\r\n" . "Eval 3:" . "\r\n";
foreach ($includes as $include)
var_dump(eval('?>' . $include . '<?php;'));
Output:
Include:
some textint(1)
some textint(1)
some textint(1)
some text<?phpint(1)
some textint(1)
some text<?php;int(1)
some textint(1)
some textint(1)
Eval 1:
some textNULL
some textNULL
bool(false)
some text<?phpNULL
bool(false)
some text<?php;NULL
some textNULL
some textNULL
Eval 2:
some textNULL
some textNULL
some textNULL
some text<?phpNULL
some textNULL
some text<?php;NULL
some textNULL
some textNULL
Eval 3:
some text<?php;NULL
some text<?php;NULL
bool(false)
some text<?php<?php;NULL
bool(false)
some text<?php;<?php;NULL
some text<?php;NULL
some text<?php;NULL
Some thoughts about the solutions above:
Temporary file
Don't. It's very bad for performance, just don't do it. Not only does it drive your opcode cache totally crazy (cache hit never happens + it tries to cache it again every time) but also gives you the headache of filesystem locking under high (even moderate) loads, as you have to write the file and Apache/PHP has to read it.
Simple eval()
Acceptable in rare cases; don't do it too often. It's true that eval'd code is not cached (the poor opcode cache just doesn't know it's the same string as before); at the same time, if your code changes every time, eval is A LOT BETTER than include(), mostly because include() fills up the opcode cache on each call, just like the temp-file case. That is horrible (~4x slower).
In-memory eval()
Actually, eval is very fast when your script is already in a string; most of the time it's the disk operation that slows things down. Of course this depends on what your script does, but in my very-small-script case eval was ~400 times faster. (Do you have memcached? Just thinking out loud.) So what include() can't do is evaluate the same thing twice without a file operation, and this is very important. If you use it for ever-changing, small, memory-generated strings, eval is obviously the one to choose: it's many, many times faster to load once and eval again and again than to include() repeatedly.
TL;DR
Same code, once per request: include
Same code, several calls per request: eval
Varying code: eval
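The "load once, eval many times" case could look like this sketch. eval_cached() is a hypothetical helper that reads each file from disk once per request and evals the cached source on every call; keep in mind that __FILE__ and __DIR__ inside eval'd code do not point at the original file, as shown in an earlier answer.

```php
<?php
// Hypothetical helper: read a PHP file once per request, eval it on every call.
function eval_cached(string $path)
{
    static $sources = [];
    // Disk access only happens the first time a given path is requested
    if (!isset($sources[$path])) {
        $sources[$path] = file_get_contents($path);
    }
    // '?>' leaves PHP mode so the file's own opening tag is handled normally
    return eval('?>' . $sources[$path]);
}
```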
Here is my approach.
It creates a temporary PHP file and includes it.
But this way, if the code you want to run has errors, the program exits before removing the temporary file,
so I added an auto-clean procedure to the function: every time the function runs, it removes old temporary files after a timeout. You can set the timeout or disable it in the options at the start of the function.
I also added an ignore-errors option to deal with temporary files that were not removed: if errors are ignored, the program continues and removes the temporary file.
Also, some projects will have to disable auto-clean, because it scans the whole directory every time it runs, which could hurt disk performance.
function eval2($c) {
$auto_clean_old_temporary_files=false; // delete old eval2 temporary files whose names match the settings below
$ignore_all_errors=true; // suppress warnings/notices raised by the included code
$tempfiledirectory=''; // temporary file directory ('' = current directory, otherwise end it with '/')
$tempfileheader='eval2_'; // temporary file name prefix
$tempfiletimeseperator='__'; // separator between the random part and the timestamp
$tempfileremovetimeout=200; // age in seconds after which old temporary files are removed
if ($auto_clean_old_temporary_files===true) {
$scan_dir=($tempfiledirectory==='') ? '.' : $tempfiledirectory;
$sd=scandir($scan_dir); // scanning for old temporary files
foreach ($sd as $sf) {
if (strlen($sf)>(32+strlen($tempfileheader)+strlen($tempfiletimeseperator)+3)) { // if the file name is long enough
$t1=substr($sf,(32+strlen($tempfileheader)),strlen($tempfiletimeseperator)); // time separator after the 32-char md5
$t2=substr($sf,0,strlen($tempfileheader)); // file name prefix
if ($t1==$tempfiletimeseperator && $t2==$tempfileheader) { // checking for the time separator and the prefix
$ef=explode('.',$sf);
array_pop($ef); // removing the file extension
$nsf=implode('.',$ef); // file name without extension
$ef=explode($tempfiletimeseperator,$nsf);
$tm=(int)end($ef); // creation time taken from the file name
$tmf=time()-$tm;
if ($tmf>$tempfileremovetimeout && $tmf<123456 && $tmf>0) { // older than the timeout, and the age is plausible
unlink($tempfiledirectory.$sf); // finally removing the old temporary file
}
}
}
}
}
$n=$tempfiledirectory.$tempfileheader . md5(microtime().rand(0,5000)). $tempfiletimeseperator . time() .'.php'; // unique temporary file name
$c='<?php' . PHP_EOL . $c . PHP_EOL; // generating the PHP content
file_put_contents($n,$c); // creating the temporary file
try {
if ($ignore_all_errors===true) { // including the temporary file as chosen above
$s=@include($n);
}else{
$s=include($n);
}
} finally {
unlink($n); // removed even if the code throws (e.g. a ParseError in PHP 7+)
}
return $s;
}
