Validate the syntax of PHP files more efficiently

Validating the syntax of a bunch of PHP files is SLOW
We use `php -l file.php` to validate the syntax of many PHP files as part of a continuous integration setup. In practice we run something like `find . -name "*.php" | xargs --max-args=1 php -l`, because `php -l` only lints one file per invocation.
This is horrendously slow, mostly because it fires up a brand-new parser/interpreter (not to mention a new process) for each PHP file just to verify its syntax, and we have thousands of files.
Is there a faster way?

What about adding a time constraint to the find command, e.g.
`find . -mtime -7 -name "*.php" | xargs --max-args=1 php -l`
so that it only validates the files that have been modified in the last week?
I am presuming most of your code base does not change every few days?
Updated
You might also want to try the -newer flag
`find . -newer /path/to/file -name "*.php" | xargs --max-args=1 php -l`
It finds all files newer than the given file, which is very handy, especially if your version control touches a certain file every time you check out. Alternatively, run:
touch -t 201303121000 /path/to/file
to create a dummy file for use with -newer
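A runnable sketch of the whole marker-file workflow on a throwaway directory (the paths and timestamps are illustrative, not from a real setup):

```shell
# Demonstrate -newer with a dummy marker file on a throwaway tree.
dir=$(mktemp -d)
touch -t 201303121000 "$dir/marker"   # dummy timestamp file for -newer
touch -t 201301010000 "$dir/old.php"  # older than the marker: skipped
touch "$dir/new.php"                  # newer than the marker: selected

# Only new.php survives the filter; in CI you would pipe this into
# "xargs --max-args=1 php -l" instead of echoing it.
changed=$(find "$dir" -newer "$dir/marker" -name "*.php")
echo "$changed"
```

In a real setup you would `touch` the marker again after each successful lint run, so the next run only sees files changed since then.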

I've given up on php -l entirely for the same reason, though in my case (and perhaps in yours) it doesn't matter.
Since I'm using PHPUnit for my unit tests I don't need to lint the files being tested. If the file wouldn't pass the linter it won't pass any tests either (even one which simply includes the file).
If you haven't covered 100% of your files with PHPUnit, you may be able to fake the effect of the linter with something like:
class FakeLinterTest extends PHPUnit_Framework_TestCase {
    public function testLintAllTheFiles() {
        foreach ($this->listAllPHPFiles() as $file) {
            include_once($file);
        }
    }

    private function listAllPHPFiles() {
        // Traverse your entire source tree.
    }
}
That code is entirely untested. Also, if you have a big project you may need to play games with memory limits and/or break up the "lint" into chunks to stop it murdering your CI system.
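One way to do the chunking mentioned above is from the shell rather than inside PHPUnit; this sketch (file names and chunk size are made up) splits the file list so each chunk can be fed to a separate lint or test process:

```shell
# Split a large file list into chunks of 2 paths each (demo tree of 5 files).
dir=$(mktemp -d)
for i in 1 2 3 4 5; do touch "$dir/f$i.php"; done

find "$dir" -name '*.php' | split -l 2 - "$dir/chunk."

# 5 files at 2 per chunk gives 3 chunk files: chunk.aa chunk.ab chunk.ac
chunks=$(ls "$dir"/chunk.* | wc -l)
echo "$chunks"
```

Each chunk file is then a ready-made argument list for a separate worker process, which keeps any single process's memory use bounded.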

Related

Checking content of many small files in PHP

I have an unusual problem. Using a PHP script, I have to traverse a folder with around 1 million small text files (sizes range from 1KB to 1MB) and pick only those with a ctime in a certain interval AND content containing a particular search string.
The first part (picking files whose creation time falls in a certain range) I managed using readdir, but checking the file contents for the search string is proving to be a challenge. Using file_get_contents (and then stripos) simply won't do: it's slow and it brings my PHP script to its knees.
I'm sure I'm not the first one with this kind of a problem, but I'm not a PHP developer. This code has been inherited from previous dev. I'm not sure which alternative I should be using and what code would spare my server RAM and CPU.
I would try shell_exec combined with find and grep:
$output = shell_exec("find . -type f -ctime $MyCtime -exec grep -H -m 1 $MySearchString {} +");
-H to show the filename
-m 1 to stop searching after the first occurrence in each file
PHP won't handle this easily (it will take a long time and overload the CPU); consider using bash and regular expressions to solve the problem.
Simply put, PHP is not the right tool for this situation.
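A runnable sketch of the same pipeline on a throwaway tree (the file names and search string are made up; a real run would add a -ctime filter, and the PHP variables should go through escapeshellarg() first):

```shell
dir=$(mktemp -d)
printf 'alpha needle beta\n' > "$dir/hit.txt"
printf 'nothing here\n'      > "$dir/miss.txt"

# -l prints only matching file names; -m 1 stops at the first match per file.
matches=$(find "$dir" -type f -exec grep -l -m 1 'needle' {} +)
echo "$matches"
```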

PHP - How to do a string replace in a very large number of files?

I have two million text files on a server accessible to internet users. I was asked to make a change (a string-replace operation) to these files as soon as possible. I was thinking of running str_replace on every text file on the server. However, I don't want to tie up the server and make it unreachable for internet users.
Do you think the following is a good idea?
<?php
ini_set('max_execution_time', 1000);
$path = realpath('/dir/');
$objects = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($path),
    RecursiveIteratorIterator::SELF_FIRST
);
foreach ($objects as $name => $object) {
    set_time_limit(100);
    // do str_replace stuff on the file
}
Use find, xargs and sed from shell, i.e.:
cd /dir
find . -type f -print0 | xargs -0 sed -i 's/OLD/NEW/g'
This will search all files recursively (hidden ones too) inside the current dir and replace OLD with NEW using sed.
Why -print0?
From man find:
If you are piping the output of find into another program and
there is the faintest possibility that the files which you are
searching for might contain a newline, then you should seriously
consider using the '-print0' option instead of '-print'.
Why xargs ?
From man find:
The specified command is run once for each matched file.
That is, if there are 2000 files in /dir, then find ... -exec ... will result in 2000 invocations of sed; whereas find ... | xargs ... will only invoke sed once or twice.
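The batching claim is easy to verify on a small demo tree by counting child invocations (echo stands in for sed here):

```shell
dir=$(mktemp -d)
for i in 1 2 3; do touch "$dir/f$i"; done

# -exec ... \; runs the command once per file: 3 output lines.
per_file=$(find "$dir" -type f -exec echo run \; | wc -l)

# xargs batches all the file names onto one command line: 1 output line.
batched=$(find "$dir" -type f -print0 | xargs -0 echo | wc -l)

echo "$per_file vs $batched"
```

Note that `find ... -exec ... {} +` (plus instead of semicolon) batches the same way xargs does.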
Don't do this with PHP; it's most likely to fail horribly, and it'll take up all your system resources.
find . -type f -exec sed -i 's/search/replace/g' {} +
The example above takes a search string and a replace string, and it runs recursively over regular files, including hidden ones.
You could also do this with a Python program, which will be limited to one core by default. If your machine has multiple cores and at least one is generally free, you should be set.
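One cheap safety net worth adding when rewriting two million files you cannot easily restore: GNU sed's -i accepts a backup suffix, so the originals survive as .bak files. A sketch on a throwaway tree (file names and strings are made up):

```shell
dir=$(mktemp -d)
printf 'OLD text with OLD\n' > "$dir/a.txt"
printf 'no match here\n'     > "$dir/b.txt"

# Same find/xargs shape as above, but -i.bak writes a backup of every file.
find "$dir" -type f -print0 | xargs -0 sed -i.bak 's/OLD/NEW/g'

cat "$dir/a.txt"   # NEW text with NEW
```

Once the replacement is verified, the backups can be removed with another find pass over `*.bak`.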

File not found, but files are present

I'm working on a server where users should be able to run protein sequences against a database, and it uses an executable called blastall. The server generates a script which it should then run using batch. However, it doesn't appear to be running. Here is an example of a script it generates (cmd.sh):
#!/usr/bin/env sh
cd /var/www/dbCAN
php -q /var/www/dbCAN/tools/blast.php -e -w /var/www/dbCAN/data/blast/20121019135548
Where the crazy number at the end is an auto-generated job ID based on when the job was submitted. There are two issues, and I'm trying to solve one at a time. The first is that when I execute it manually (simply running ./cmd.sh), I get the following errors:
sh: 1: /var/www/dbCAN/tools/blast/bin/blastall: not found
sh: 1: /var/www/dbCAN/tools/blast/bin/blastall: not found
sh: 1: -t: not found
But this doesn't really make sense to me, as the directory specified does in fact contain blastall. It has full rwx permissions and every directory along the way has appropriate permissions.
The blast.php file in tools looks like this:
try {
    do_blast($opts["w"]);
    $info['status'] = 'done';
    $fp = fopen("{$opts['w']}/info.yaml", "w");
    fwrite($fp, Sypc::YAMLDump($info));
    fclose($fp);
}
With of course variable declarations above it, and the do_blast function looks like this (again with variables declared above it and a cd so the directories work out):
function do_blast($workdir)
{
    system("/var/www/dbCAN/tools/blast/bin/blastall -d data/blast/all.seq.fa -m 9 -p blastp -i $workdir/input.faa -o $workdir/output.txt");
    system("/var/www/dbCAN/tools/blast/bin/blastall -d data/blast/all.seq.fa -p blastp -i $workdir/input.faa -o $workdir/output2.txt");
}
Any idea what may be causing this issue? I thought it may be because I'm running it and it was created by apache, but rwx is allowed for all users. I can include more information if needed, but I chose not to at this point because the original person who wrote the PHP split up everything into tons of little files, so it's difficult to pinpoint where the problem is exactly. Any ideas (if not complete solutions) are very appreciated.
EDIT: Solution found. As it turns out, the blastall executable had been compiled on a different linux system. Switched to a different executable and it ran flawlessly.
Could it be an issue with relative paths in your script? See my answer here, maybe it helps:
finding a file in php that is 4 directories up
The solution was to recompile the blastall executable. It had been compiled for Redhat and I am using Ubuntu. Unfortunately I assumed the executable I was given was for my system, not the previous one.
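For what it's worth, sh's confusing "not found" for a file that plainly exists usually means the kernel could not find the file's interpreter: the dynamic loader of a binary built against another distro, or a bad shebang line. A minimal reproduction with a bogus shebang (the interpreter path is deliberately nonexistent):

```shell
dir=$(mktemp -d)
printf '#!/no/such/interpreter\n' > "$dir/fake"
chmod +x "$dir/fake"

# The file exists and is executable, yet running it fails with "not found",
# because the interpreter named on the shebang line is missing.
"$dir/fake" 2>&1 || echo "exec failed as expected"
```

Running `file` on the binary would also have shown what platform it was built for.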

Using grep to search for instances of a string in all files within a directory, using PHP bash script

I am writing a PHP command line script, and I'm wondering if any grep experts can help me to come up with a command to do the following:
For all files ending in .java in the current directory ($curDir), search within the file for any instance of $str, and return an indicator whether any instance has been found or not.
I've tried piecing together a grep command from various bits found on the internet but, having not really used grep before, it's a bit difficult to piece together. I would appreciate any help.
Thanks!
This should do the trick:
exec("grep '$str' *.java", $output_array);
Use -l if you want it to just output the file names. Other grep options can be found here:
http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
Depending on what $str is, you may need to pass it through escapeshellarg():
http://www.php.net/manual/en/function.escapeshellarg.php
Grep is well suited to that task.
grep -rl '$str' .
That command will report file names of files that match your string. It will only report each file name once but that seems to be what you want. The option "-r" says to recursively traverse the file system and the option "-l" says to just report file that contain a match. You may also want to check out the options "-H" and "-n" as I use them frequently.
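A runnable sketch matching the question's constraints (GNU grep's --include restricts the recursive search to *.java; the file names and search string are made up):

```shell
dir=$(mktemp -d)
printf 'class Foo { /* needle */ }\n' > "$dir/Foo.java"
printf 'class Bar {}\n'               > "$dir/Bar.java"
printf 'needle in a text file\n'      > "$dir/notes.txt"

# Only Foo.java is reported: Bar.java lacks the string, notes.txt is excluded.
found=$(grep -rl --include='*.java' 'needle' "$dir")
echo "$found"
```

grep's exit status (0 for at least one match, 1 for none) doubles as the found/not-found indicator the question asks for, e.g. via exec()'s return code in PHP.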

syntax error checking in php on a project level rather than file

I have a large PHP project and different developers work on the same project. A change in a PHP file, e.g. a syntax error, can lead to a 500 Internal Server Error when another developer tries to run the project, leaving that developer clueless as to where the error came from. I need a batch process that checks the whole project and displays the line numbers and errors that occurred for each file in the project, not just for one file as with php -l filename; in effect, I want php -l project.
If you are using linux, this command will check your current folder recursively for all php files, syntax check them and filter out the ones that are OK:
find . -name \*.php -exec php -l {} \; | grep -v "No syntax errors"
You'll get a nice list of files and errors.
Edit: Turns out the OP is looking for a way to activate error reporting. I'll leave this in place anyway because I'm sure it's good universal advice for many in similar situations.
I don't know your situation, but to me, it sounds like what you might really need is a proper deployment process using at least a version control system. Multiple developers working on the same files simultaneously without any version control is a recipe for disaster, that much I can guarantee you.
Some starting points:
Setting up a deployment / build / CI cycle for PHP projects
An introduction into VCS mercurial that is very nicely done and helps understand how version control works
Here is a programmers.SE question that might suit your situation: https://softwareengineering.stackexchange.com/questions/74708/source-control-on-a-live-shared-hosting-site
$it = new RecursiveIteratorIterator(new RecursiveDirectoryIterator('.'));
// Only paths matching the pattern (*.php) will be returned.
$regx = new RegexIterator($it, '/^.*\.php$/i', RecursiveRegexIterator::GET_MATCH);
foreach ($regx as $file) {
    $file = $file[0];
    exec('php -l ' . escapeshellarg($file)); // check the syntax here
}
