Checking content of many small files in PHP

I have an unusual problem. Using a PHP script, I have to traverse a folder with around 1 million small text files (sizes range from 1KB to 1MB) and pick only those with a ctime in a certain interval AND content containing a particular search string.
I managed the first part (picking files whose creation time falls in a certain range) using readdir, but checking the file content for the search string is proving to be a challenge. Using file_get_contents (and then stripos) simply won't do: it's slow and brings my PHP script to its knees.
I'm sure I'm not the first one with this kind of problem, but I'm not a PHP developer; this code was inherited from a previous dev. I'm not sure which alternative I should use and what code would spare my server's RAM and CPU.

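I would try shell_exec combined with find and grep:
$output = shell_exec('find . -type f -ctime ' . escapeshellarg($MyCtime) . ' -exec grep -H -m 1 -e ' . escapeshellarg($MySearchString) . ' {} +');
-H to show the filename
-m 1 to stop searching at the first occurrence in each file
escapeshellarg() keeps spaces or quotes in the search string from breaking the command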
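A minimal shell sketch of the same find-plus-grep idea, assuming GNU find and grep are available on the server. The function name find_matching and its arguments are hypothetical. -newerct compares each file's ctime against a date string, -F treats the search string as fixed text rather than a regex, -i mirrors the case-insensitive stripos from the question, and -l prints only the file name and stops reading a file at its first match.

```shell
# find_matching DIR START END NEEDLE   (hypothetical helper, not from the original code)
# Lists files under DIR whose ctime is after START and not after END
# and whose content contains NEEDLE (fixed string, case-insensitive).
find_matching() {
  find "$1" -type f -newerct "$2" ! -newerct "$3" \
    -exec grep -l -i -F -e "$4" {} +
}

# Example call (dates and search string are placeholders):
#   find_matching /path/to/folder '2013-03-01' '2013-03-15' 'search string'
```

From PHP, this could be called through shell_exec with every argument passed through escapeshellarg(), so PHP only has to read the list of matching file names.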

PHP won't handle this easily (it will take a lot of time and will overload the CPU). Consider using bash and regular expressions to solve the problem.
Simply put, PHP is not the right tool in this situation.

Related

diff images between 2 servers

I have 2 servers, serv1 and serv2, and need to compare the images on them to detect which files are missing or have been modified.
So far I have 3 options:
- Create an API using PHP
I created an API file that will return all the images in serv1/www/app/images/
get the modification time of each image
return the result as json
output is something like this: { 'path/to/file' : 123232433422 }
I fetch that in serv2, decode then merge the array to the images in serv2/www/app/images
get the array_diff, works fine
cons:
- takes a lot of time (fetching, decoding, merging, looping, comparison... )
- Use rsync
Dry run to get the list of images that exist on serv1 but are missing or modified on serv2 (very fast :))
cons:
apache can't run ssh because it's not authorized to access ~/.ssh/
would need to give apache permission but my client doesn't want it
so in short, I cannot use anything that would require permission
- maybe I could use some library or vendor package, but I doubt my client would allow it. If it can be a shell script or a PHP built-in function, I'll do it as long as it's possible.
So my question is whether there is another way to fetch the images and their modification dates without requiring authentication. My first solution is okay if it can be optimized, because if the array is too large, it takes a lot of time.
I hope the solution can be done in PHP, or Shell script.
Please help give me more options. Thanks
Install the md5deep (or sha1deep) utility on both servers.
Execute md5deep on the first server and save the result to a text file:
user#server1> md5deep -l -r mydir > server1.txt
Result file would look like this:
e7c3fcf5ad7583012379ec49e9a47b28 .\a\file1.php
2ef76c2ecaefba21b395c6b0c6af7314 .\b\file2.txt
45e19bb4b38d529d6310946966f4df12 .\c\file3.bin
...
Then copy server1.txt to the second server and run md5deep in negative matching mode:
md5deep -l -r -X server1.txt mydir
This will print the checksums and names of all files on the second server that differ from those on the first server.
Alternatively, you can compare the text files created by md5deep -l -r dir yourself using diff or a similar utility.
One last note: it may be easier to simply run md5deep -l -r mydir | gzip > md5deep.txt.gz from cron on each server, so that each server always has a ready-to-compare file list with checksums (gzipped so it is fast to fetch).
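If md5deep cannot be installed, the manifest-and-compare idea can be sketched with tools that are usually already present. This swaps md5deep for md5sum from GNU coreutils, which is an assumption about the servers; the function names make_manifest and missing_or_modified are hypothetical, and file names containing newlines are not handled.

```shell
# make_manifest DIR
# Print "checksum  ./relative/path" for every file under DIR, sorted by path.
make_manifest() {
  ( cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2 )
}

# missing_or_modified SERVER1_MANIFEST SERVER2_MANIFEST
# Print the paths whose manifest line from server 1 has no identical line
# in the manifest from server 2 (the file is missing there or its content differs).
missing_or_modified() {
  grep -F -x -v -f "$2" "$1" | sed 's/^[0-9a-f]*  //'
}
```

Each server would run make_manifest on its images directory and publish the result as a static text file, so the comparison needs no ssh access and PHP never has to loop over the full list.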

Validate the syntax of PHP files more efficiently

Validating the syntax of a bunch of PHP files is SLOW.
We use php -l file.php to validate the syntax of many PHP files as part of a continuous integration setup. We actually do something like `find . -name "*.php" | xargs --max-args=1 php -l` because the php executable only takes one argument.
This is horrendously slow, mostly because it involves firing up a brand new parser / interpreter (not to mention process) for each PHP file in order to verify its syntax, and we have thousands.
Is there a faster way?
What about adding a time to the search, e.g.
find . -mtime -7 -name "*.php" | xargs --max-args=1 php -l
so that the find command only validates the files that have been modified in the last week?
I am presuming most of your code base does not change every few days?
Updated
You might also want to try the -newer flag:
find . -newer /path/to/file -name "*.php" | xargs --max-args=1 php -l
It finds all files newer than the one given, which is very handy, especially if your version control changes a certain system file every time you check out. Alternatively, use:
touch -t 201303121000 /path/to/file
to create a dummy file for use with -newer.
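The same find pipeline can also be spread across CPU cores. This is a rough sketch, assuming an xargs that supports -0 and -P (GNU and BSD xargs do, but neither flag is in POSIX); the function name lint_parallel and the job count are hypothetical. It still starts one php process per file, so it hides the startup cost behind parallelism rather than removing it.

```shell
# lint_parallel DIR JOBS [LINT_CMD...]   (hypothetical helper)
# Runs the lint command once per *.php file under DIR, JOBS processes at a time.
lint_parallel() {
  dir=$1
  njobs=$2
  shift 2
  # Fall back to "php -l" when no lint command is given.
  [ "$#" -gt 0 ] || set -- php -l
  # -print0 / -0 keep file names with spaces intact.
  find "$dir" -type f -name '*.php' -print0 |
    xargs -0 -n 1 -P "$njobs" "$@"
}

# Example: lint_parallel . 4
```

xargs exits with a non-zero status if any lint invocation fails, so the CI step can simply check the function's exit status.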
I've given up on php -l entirely for the same reason, though in my case (and perhaps in yours) it doesn't matter.
Since I'm using PHPUnit for my unit tests I don't need to lint the files being tested. If the file wouldn't pass the linter it won't pass any tests either (even one which simply includes the file).
If you haven't covered 100% of your files with PHPUnit, you may be able to fake the effect of the linter with something like:
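class FakeLinterTest extends PHPUnit_Framework_TestCase {
    public function testLintAllTheFiles() {
        foreach ($this->listAllPHPFiles() as $file) {
            include_once($file);
        }
    }
    private function listAllPHPFiles() {
        // Traverse your entire source tree. The 'src' root below is an assumption; adjust it.
        $files = array();
        $tree = new RecursiveIteratorIterator(new RecursiveDirectoryIterator(__DIR__ . '/src'));
        foreach ($tree as $path => $info) {
            if ($info->isFile() && substr($path, -4) === '.php') {
                $files[] = $path;
            }
        }
        return $files;
    }
}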
That code is entirely untested. Also, if you have a big project you may need to play games with memory limits and/or break up the "lint" into chunks to stop it murdering your CI system.

Using grep to search for instances of a string in all files within a directory, using PHP bash script

I am writing a PHP command line script, and I'm wondering if any grep experts can help me to come up with a command to do the following:
For all files ending in .java in the current directory ($curDir), search within the file for any instance of $str, and return an indicator whether any instance has been found or not.
I've tried piecing together a grep command from various bits found on the internet but, having not really used grep before, it's a bit difficult. I would appreciate any help.
Thanks!
This should do the trick:
exec("grep '$str' *.java", $output_array);
Use -l if you want it to just output the file names. Other grep options can be found here:
http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
Depending on what $str contains, you may need to pass it through escapeshellarg():
http://www.php.net/manual/en/function.escapeshellarg.php
Grep is well suited to that task.
grep -rl '$str' .
That command reports the names of files that match your string. It reports each file name only once, but that seems to be what you want. The "-r" option tells grep to traverse the directory tree recursively, and the "-l" option tells it to report only the names of files that contain a match. You may also want to check out the "-H" and "-n" options, as I use them frequently.
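Since the question only asks for a found / not found indicator, grep's exit status may be enough. A small sketch, with has_match as a hypothetical function name: -q suppresses all output and stops at the first match, and -F treats the search string as literal text.

```shell
# has_match DIR STR   (hypothetical helper)
# Exit status 0 if any .java file directly inside DIR contains STR, non-zero otherwise.
has_match() {
  grep -q -F -e "$2" -- "$1"/*.java
}

# Example:
#   if has_match "$curDir" "$str"; then echo "found"; else echo "not found"; fi
```

From PHP, the third argument of exec() receives the command's exit status, so the same check works without parsing any output.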

Compile C++ file Using PHP

I am using PHP on a Windows machine. I also use Dev C++. I can compile a .cpp file perfectly well from CMD using this command:
g++ hello.cpp -O3 -o hello.exe
Now what I am trying to do is running the same command using php system() function, so it looks like this:
system("g++ c:\wamp\www\grader\hello.cpp -O3 -o C:\wamp\www\grader\hello.exe");
but it doesn't compile. I am lost; please tell me what I am missing.
I also looked at this question, which is exactly what I need, but I couldn't find a useful solution for my case there:
Php script to compile c++ file and run the executable file with input file
Use the PHP exec command.
echo exec('g++ hello.cpp -O3 -o hello.exe');
should work.
There's a whole family of different exec & system commands in PHP, see here:
http://www.php.net/manual/en/ref.exec.php
If you want the output in a variable (exec returns only the last line of output; pass an array as the second argument to collect every line), then use:
$variable = exec('g++ hello.cpp -O3 -o hello.exe');
If that doesn't work, make sure that g++ is available in your path and that you're logged in with sufficient privileges to execute it.
You may also find that it's failing because PHP is essentially being executed by your web server (unless you're also running PHP from the command prompt), and the web server's user ID may not have write access to the folder where g++ is trying to create the output file.
Temporarily granting write access to 'Everybody' on the output folder will verify whether that is the case.
Two things:
You are using double quotes and are not escaping the \ inside the path.
You are not using a full path to g++.
The first one is important as \ followed by something has a special meaning in such a string (you might know \n as new line), the second one is relevant since the PHP environment might have a different search path.
A solution might be
system("c:\\path\\to\\g++ c:\\wamp\\www\\grader\\hello.cpp -O3 -o C:\\wamp\\www\\grader\\hello.exe");
Alternatively, you can use single quotes instead of double quotes; they use different, less strict escaping rules:
system('c:\path\to\g++ c:\wamp\www\grader\hello.cpp -O3 -o C:\wamp\www\grader\hello.exe');
or use / instead of \, which is supported by windows, too.
system("c:/path/to/g++ c:/wamp/www/grader/hello.cpp -O3 -o C:/wamp/www/grader/hello.exe");
What you do is your choice, while many might consider the first one as ugly, and the last one as bad style on Windows ;-)
Thanks to everyone. I tried the code given in the posts above and it worked like a charm.
I ran the following code from my browser:
$var = exec("g++ C:/wamp/www/cpp/hello.cpp -O3 -o C:/wamp/www/cpp/hello.exe");
echo $var;
The exe file is created, and I can see the result when I run the exe file myself. The problem is that when I run the above code in the browser, the result is not displayed on the webpage. I gave full access permission to all users, but it still does not show the result on the webpage.
I really need help with this, since I am doing a project on simulated annealing where I want to get the result from the compiled C++ program and display it on the webpage with some jQuery Highcharts.
Thanks again to all; it has helped me a lot and I have learnt a lot as well.

Problem making system calls with PHP scripts

I have the following PHP script:
<?php
$fortune = `fortune`;
echo $fortune;
?>
but the output is simply blank (no visible errors thrown).
However, if I run php -a, it works:
php > echo `fortune`;
Be careful of reading health books, you might die of a misprint.
-- Mark Twain
php >
Am I missing a config directive or something that would cause this?
Edit: So, I tried running my script using $ php-cgi fortunetest.php and it worked as expected. Maybe the issue is with Apache2?
Anyways, I found the solution: fortune lived in /usr/games, so I thought it might be a $PATH issue, but when I did su www-data and ran $ fortune it worked as expected, and /usr/games was in $PATH. Apparently, Apache was using a different $PATH variable even though it was running under user www-data, so I rewrote the script to use /usr/games/fortune instead of plain fortune and it worked. Since fortune wasn't the point of the script, it was kind of a waste of time, but lesson learned.
The fortune command just outputs a random quotation. You can simply store those quotations in a flat text file or, if you have a database, in a table. Then use the rand() function or similar in PHP to generate a random number and use it to pick a row from the quotations file/table. This way, your PHP script does not depend on whether the system has the fortune command installed.
Use full paths like /usr/games/fortune instead of just fortune. Apache may run with a different PATH than your login shell, so the command may not be found by name alone.
<?php passthru('/usr/games/fortune'); ?>
To find the full path, use which in the shell, e.g.:
which fortune
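To see how the lookup depends on PATH, a command can be resolved under an explicit search path instead of the login shell's. A small sketch, with resolve_under_path as a hypothetical function name; the PATH values in the example are guesses at what a web server might use, not values taken from any Apache configuration.

```shell
# resolve_under_path NAME SEARCH_PATH   (hypothetical helper)
# Starts a clean environment containing only PATH=SEARCH_PATH and asks the shell
# where NAME would be found. Prints the full path, or fails if NAME is not found.
# (For shell builtins, command -v prints just the name.)
resolve_under_path() {
  env -i PATH="$2" /bin/sh -c 'command -v "$1"' _ "$1"
}

# Example (assumed PATH values):
#   resolve_under_path fortune /usr/bin:/bin              # likely prints nothing
#   resolve_under_path fortune /usr/bin:/bin:/usr/games   # /usr/games/fortune, if installed
```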
Honestly, I don't know. However, it might be that you have to say echo $fortune instead.
