I'd like to:
1. Check the word count for a folder full of text files.
2. Output a list of the files arranged by word count, in the format FILENAME is WORDCOUNT.
I know str_word_count is used to get the word count of an individual file, but I'm not sure how to rearrange the output.
Thanks in advance.
Adapted from here.
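<?php
$files = array();
$it = new DirectoryIterator("/tmp");
$it->rewind();
while ($it->valid()) {
    // Skip ".", ".." and subdirectories
    if ($it->isFile()) {
        // Read via the full path; the bare file name only resolves if /tmp is the working directory
        $count = str_word_count(file_get_contents($it->getPathname()));
        // The zero-padded count at the front of the key makes ksort() order by word count
        $files[sprintf("%010d", $count) . $it->getFilename()] =
            array($count, $it->getFilename());
    }
    $it->next();
}
ksort($files);
foreach ($files as $tup) {
    printf("%s is %d\n", $tup[1], $tup[0]);
}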
EDIT: It would be more elegant to use the file name as the key of $files and the word count as the value, and then sort by value.
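The idea in the EDIT note can be sketched roughly as follows. This is only an illustration: countWordsInDir() is a made-up helper name, and asort() is used here because it sorts by value while keeping the file-name keys.

```php
<?php
// Sketch only: countWordsInDir() is a hypothetical helper, not part of the answer above.
// It maps file name => word count and sorts the map by value.
function countWordsInDir(string $dir): array
{
    $counts = array();
    foreach (new DirectoryIterator($dir) as $file) {
        if (!$file->isFile()) {
            continue; // skip ".", ".." and subdirectories
        }
        $counts[$file->getFilename()] =
            str_word_count(file_get_contents($file->getPathname()));
    }
    asort($counts); // ascending by word count, file-name keys preserved
    return $counts;
}

// Possible usage (the directory is an assumption):
// foreach (countWordsInDir("/tmp") as $name => $count) {
//     printf("%s is %d\n", $name, $count);
// }
```

Swapping asort() for arsort() would list the longest files first.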
I don't use PHP, but I would:
1. create an array to hold the filename and word count
2. read through the folder full of text files and, for each one, save the filename and word count to the array
3. sort the array by word count
4. output the array
To store the information (#2) I would put it into a 2D array. There is more information about 2D arrays here at Free PHP Tutorial. Thus array[0][0] would equal the name of the first file and array[0][1] would be its word count. array[1][0] and array[1][1] would be the same for the next file.
To sort the array (#3) you can use the tutorial at firsttube.com.
To output it (#4) I would loop through the array and print the first and second location of each row.
for ($i = 0; $i < count($array); ++$i) {
    // print the filename ($array[$i][0]) and word count ($array[$i][1])
    echo $array[$i][0] . " is " . $array[$i][1] . "\n";
}
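A minimal sketch of the sort step for the 2D array described above, assuming PHP 7 or later for the <=> operator. sortByWordCount() is a hypothetical name and the three rows are made-up sample data, not real files.

```php
<?php
// Sketch only: each row is array(filename, wordcount), as in the 2D layout above.
// sortByWordCount() is a hypothetical helper name.
function sortByWordCount(array $rows): array
{
    usort($rows, function ($a, $b) {
        return $a[1] <=> $b[1]; // compare the word-count column
    });
    return $rows;
}

// Made-up sample data for illustration
$array = sortByWordCount(array(
    array('notes.txt', 120),
    array('todo.txt', 8),
    array('draft.txt', 45),
));

for ($i = 0; $i < count($array); ++$i) {
    echo $array[$i][0] . " is " . $array[$i][1] . "\n";
}
```

usort() renumbers the rows from 0, so the plain for loop still works after sorting.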
If you would like to keep the iterator-style approach (yet still do essentially the same as Artefacto's answer) then something like the following would suffice.
$dir_it = new FilesystemIterator("/tmp");
// Build array iterator with word counts
$arr_it = new ArrayIterator();
foreach ($dir_it as $fileinfo) {
    // Skip non-files
    if ( ! $fileinfo->isFile()) continue;
    $fileinfo->word_count = str_word_count(file_get_contents($fileinfo->getPathname()));
    $arr_it->append($fileinfo);
}
// Sort by word count descending
$arr_it->uasort(function($a, $b){
    return $b->word_count - $a->word_count;
});
// Display sorted files and their word counts
foreach ($arr_it as $fileinfo) {
    printf("%10d %s\n", $fileinfo->word_count, $fileinfo->getFilename());
}
Aside: If the files are particularly large (read: loading each one entirely into memory just to count the words is too much) then you could loop over the file line-by-line (or byte-by-byte if you really wanted to) with the SplFileObject.
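A rough sketch of the line-by-line idea from the aside. countWordsStreaming() is a hypothetical helper name; it relies on the assumption that a word never spans a line break, so the per-line counts add up to the count for the whole file.

```php
<?php
// Sketch only: countWordsStreaming() is a hypothetical helper.
// It reads one line at a time instead of loading the whole file into memory.
function countWordsStreaming(string $path): int
{
    $file = new SplFileObject($path, 'r');
    $total = 0;
    foreach ($file as $line) {
        // Each iteration yields one line as a string
        $total += str_word_count($line);
    }
    return $total;
}
```

In the iterator example above, the call would replace str_word_count(file_get_contents(...)) and take $fileinfo->getPathname() as its argument.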
Related
I have looked on Stack Overflow for a similar issue but can't find a solution.
I am trying to write a script that looks at the contents of two directories to find out whether a file name appears in both
directories. If a match is found, the matched file name is written to an array.
The first thing I do is use "scandir" to create an array of data from the first directory.
In the "foreach" loop I go through the array from the "scandir" result and perform a "file_exists" using the variable "$image1"
to find a match in the second directory, "allimages/boardsclean". If a match is found, the file name is written to the "$found_images" array.
Testing the result of the "$found_images" array, I am not seeing the result I was expecting.
Can anyone see where I am going wrong?
$c1 = 0;
$c2 = 0;
$scan = scandir('allimages/temp1');
$found_images = array();
foreach ($scan as $image1) {
if (file_exists('allimages/temp1/'.$image1) && ('allimages/temp2/'.$image1)) {
echo "file match in Scan $image1</br>";
$found_images[] = 'allimages/adminclean/'. $image1;
$c1++;
}
}
echo $c1."</br>";
foreach ($found_images as $image3) {
echo "file match $image3 </br>";
$c2++;
}
echo $c2."</br>";
First, you don't need to test for the file from the scandir because, well... it's already there and was returned. Second, you don't test for the one in the second directory. You need:
if(file_exists('allimages/temp2/'.$image1)) {
However, just scan both directories and compute the intersection of the returned arrays which will give you files common to both directories. It's as simple as:
$found = array_intersect(scandir('allimages/temp1'), scandir('allimages/temp2'));
Then you can filter out directories if you want and add allimages/adminclean/ in the array or when needed.
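The intersection plus the filtering step might look like the sketch below. commonFiles() is a hypothetical helper name; the is_file() check is one possible way to drop ".", ".." and any subdirectories that exist in both places.

```php
<?php
// Sketch only: commonFiles() is a hypothetical helper.
// It returns the names of regular files that exist in both directories.
function commonFiles(string $dirA, string $dirB): array
{
    $common = array_intersect(scandir($dirA), scandir($dirB));
    // Drop ".", ".." and subdirectories
    $common = array_filter($common, function ($name) use ($dirA, $dirB) {
        return is_file("$dirA/$name") && is_file("$dirB/$name");
    });
    return array_values($common); // renumber keys from 0
}

// Possible usage with the paths from the question:
// $found_images = array_map(function ($name) {
//     return 'allimages/adminclean/' . $name;
// }, commonFiles('allimages/temp1', 'allimages/temp2'));
```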
I am trying to make a PHP application which searches through the files of your current directory and looks for a file in every subdirectory called email.txt, then it gets the contents of the file and compares the contents from email.txt with the given query and echoes all the matching directories with the given query. But it does not work and it looks like the problem is in the if-else part of the script at the end because it doesn't give any output.
<?php
// pulling query from link
$query = $_GET["q"];
echo($query);
echo("<br>");
// listing all files in doc directory
$files = scandir(".");
// searching trough array for unwanted files
$downloader = array_search("downloader.php", $files);
$viewer = array_search("viewer.php", $files);
$search = array_search("search.php", $files);
$editor = array_search("editor.php", $files);
$index = array_search("index.php", $files);
$error_log = array_search("error_log", $files);
$images = array_search("images", $files);
$parsedown = array_search("Parsedown.php", $files);
// deleting unwanted files from array
unset($files[$downloader]);
unset($files[$viewer]);
unset($files[$search]);
unset($files[$editor]);
unset($files[$index]);
unset($files[$error_log]);
unset($files[$images]);
unset($files[$parsedown]);
// counting folders
$folderamount = count($files);
// defining loop variables
$loopnum = 0;
// loop
while ($loopnum <= $folderamount + 10) {
$loopnum = $loopnum + 1;
// gets the emails from every folder
$dirname = $files[$loopnum];
$email = file_get_contents("$dirname/email.txt");
//checks if the email matches
if ($stremail == $query) {
echo($dirname);
}
}
//print_r($files);
//echo("<br><br>");
?>
Can someone explain or fix this for me? I have no clue what is wrong and I have already spent a lot of time debugging. Any help would be greatly appreciated.
Kind regards,
Bluppie05
There are a few problems with this code that prevent you from getting the correct output.
The main reason you don't get any output from the if test is the condition is (presumably) using the wrong variable name.
// variable with the file data is called $email
$email = file_get_contents("$dirname/email.txt");
// test is checking $stremail which is never given a value
if ($stremail == $query) {
echo($dirname);
}
There is also an issue with your scandir() and unset() combination. As you've discovered scandir() basically gives you everything that a dir or ls would on the command line. Using unset() to remove specific files is problematic because you have to maintain a hardcoded list of files. However, unset() also leaves holes in your array, the count changes but the original indices do not. This may be why you are using $folderamount + 10 in your loop. Take a look at this Stack Overflow question for more discussion of the problem.
Rebase array keys after unsetting elements
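A small illustration of the holes left by unset(), using made-up values rather than a real directory listing:

```php
<?php
// Made-up sample values standing in for a scandir() result
$files = array('a', 'b', 'c', 'd');

unset($files[1]);
// count($files) is now 3, but the keys are still 0, 2, 3
$keysAfterUnset = array_keys($files);

// array_values() rebuilds the array with consecutive keys 0, 1, 2
$files = array_values($files);
$keysAfterRebase = array_keys($files);

print_r($keysAfterUnset);
print_r($keysAfterRebase);
```

A counter-based loop over the un-rebased array would hit the missing key 1, which is the kind of gap that padding the loop limit tries to paper over.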
I recommend you read the PHP manual page on the glob() function as it will greatly simplify getting the contents of a directory. In particular take a look at the GLOB_ONLYDIR flag.
https://www.php.net/manual/en/function.glob.php
Lastly, don't increment your loop counter at the beginning of the loop when using the counter to read elements from an array. Take a look at the PHP manual page for foreach loops for a neater way to iterate over an array.
https://www.php.net/manual/en/control-structures.foreach.php
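Putting the suggestions together, a sketch could look like this. findDirsByEmail() is a hypothetical helper name, and the trim() call assumes each email.txt holds just the address, possibly followed by a newline.

```php
<?php
// Sketch only: findDirsByEmail() is a hypothetical helper.
// It returns the names of subdirectories whose email.txt matches $query.
function findDirsByEmail(string $baseDir, string $query): array
{
    $matches = array();
    // GLOB_ONLYDIR returns directories only, so no hardcoded unset() list is needed
    $dirs = glob($baseDir . '/*', GLOB_ONLYDIR) ?: array();
    foreach ($dirs as $dir) {
        $emailFile = $dir . '/email.txt';
        if (!is_file($emailFile)) {
            continue; // e.g. an images directory without email.txt
        }
        // Assumption: the file contains only the address
        $email = trim(file_get_contents($emailFile));
        if ($email === $query) {
            $matches[] = basename($dir);
        }
    }
    return $matches;
}

// Possible usage:
// foreach (findDirsByEmail(".", $_GET["q"]) as $dirname) {
//     echo htmlspecialchars($dirname), "<br>";
// }
```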
I am successfully able to get random images from my 'uploads' directory with my code but the issue is that it has multiple images repeat. I will reload the page and the same image will show 2 - 15 times without changing. I thought about setting a cookie for the previous image but the execution of how to do this is frying my brain. I'll post what I have here, any help would be great.
$files = glob($dir . '/*.*');
$file = array_rand($files);
$filename = $files[$file];
$search = array_search($_COOKIE['prev'], $files);
if ($_COOKIE['prev'] == $filename) {
unset($files[$search]);
$filename = $files[$file];
setcookie('prev', $filename);
}
Similar to slick's answer, but a little simpler on the session front:
Instead of using array_rand to randomise the array, you can use a custom process that reorders based on just a rand:
$files = array_values(glob($dir . '/*.*'));
$randomFiles = array();
while(count($files) > 0) {
$randomIndex = rand(0, count($files) - 1);
$randomFiles[] = $files[$randomIndex];
unset($files[$randomIndex]);
$files = array_values($files);
}
This is useful because you can seed the rand function, meaning it will always generate the same random numbers. Just add (before you randomise the array):
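if (!empty($_COOKIE['key'])) {
    $microtime = (int) $_COOKIE['key'];
} else {
    // microtime() without arguments returns a "msec sec" string; srand() needs an integer seed
    $microtime = (int) (microtime(true) * 1000);
    setcookie('key', (string) $microtime);
}
srand($microtime);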
This does mean that someone can manipulate the order of the images by manipulating the cookie, but if you're okay with that, this should work.
So you want to have no repeats per request? Use the session. The best way to avoid repetition is to have two arrays (buckets). The first one contains all the available elements that you will pick from. The second array is empty to begin with.
Then start picking items from the first array and move each one to the second (remove it from the first and array_push it onto the second). Do this in a loop. On the next iteration the first array no longer holds the element you already picked, so you avoid duplicates.
In general: move items from one bucket to the other and you're done. Additionally, you can store your results in the session instead of cookies; server-side storage is better suited to this kind of thing.
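A minimal sketch of the two-bucket idea. pickWithoutRepeat() is a hypothetical helper name, and refilling the first bucket once it runs empty is an assumption about the desired behaviour, not something stated above.

```php
<?php
// Sketch only: pickWithoutRepeat() is a hypothetical helper.
// $available is the bucket to pick from, $used collects what was already shown.
function pickWithoutRepeat(array &$available, array &$used, array $all)
{
    if (count($all) === 0) {
        return null; // nothing to pick from
    }
    if (count($available) === 0) {
        // Assumption: once everything has been shown, start a new round
        $available = array_values($all);
        $used = array();
    }
    $index = array_rand($available);
    $item = $available[$index];
    unset($available[$index]);   // remove from the first bucket
    array_push($used, $item);    // move to the second bucket
    return $item;
}

// Possible usage with server-side storage (session keys are assumptions):
// session_start();
// if (!isset($_SESSION['available'])) { $_SESSION['available'] = array(); }
// if (!isset($_SESSION['used'])) { $_SESSION['used'] = array(); }
// $filename = pickWithoutRepeat($_SESSION['available'], $_SESSION['used'], glob($dir . '/*.*'));
```

With this sketch an image can still appear twice in a row at the boundary between two rounds, since the last pick of one round may be the first pick of the next.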
I have a directory containing a changeable number of files, each of which contains a single character on the first line, and CSV for the remaining file content, such as:
U
Status4,jwalker,Tech Manual 03264
Status3,jwalker,Status Report 3213
Status4,rmartino,Tech Manual 52002
...
Using this code, I can easily get a listing of all the report filenames in a directory:
<?php
// Open session
session_start();
// Get array of reports from directory
$files = scandir('reports');
$files <= array_pop($files);
$files <= array_shift($files);
$files <= array_shift($files);
// Extract summaries
for ($i=0; $i <= count($files)-1; $i++) {
$summaries[$i] = file($files[$i]);
}
?>
The reason I would like to use the file() function in line 13 is because it would conveniently break the file into an array so I could easily reference a particular line, such as $summaries[3][2] to get the third line from the fourth file in the directory (remembering that I'm counting from the PHP default 0 here). The PHP.net documentation doesn't indicate anything about NOT using an array as the string to be passed to file(), so I would assume there is a way to do it. But I've only found constants and strings, not arrays being passed.
Anyone have any insight here? Much thanks!
There's no issue with passing the value of an array element $arr[$k] to file() as long as it's a string that represents a file path.
The problem is that you're scanning the reports folder and getting a list of file names, not file paths. You need to prepend the reports directory to each file name. Otherwise file() can't find the file: it emits a warning and returns false instead of an array.
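<?php
// Open session
session_start();
// Get array of reports from directory
$files = scandir('reports');
// Extract summaries
$summaries = array();
for ($i=0; $i <= count($files)-1; $i++) {
    // Skip the "." and ".." entries returned by scandir()
    if ($files[$i] == '.' || $files[$i] == '..') continue;
    // Appending keeps $summaries numbered from 0, so $summaries[3] is the fourth file
    $summaries[] = file('reports' . DIRECTORY_SEPARATOR . $files[$i]);
}
?>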
Also, I'm not sure what the following is meant to do. What it actually does is remove the last file from the array (the result of the <= comparison is simply discarded), so I've omitted it from my answer.
$files <= array_pop($files);
The following two lines are presumably removing . and .. paths, I've replaced those with a check in the loop.
$files <= array_shift($files);
$files <= array_shift($files);
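As an alternative sketch, glob() returns paths that already include the directory, which sidesteps both the prefix problem and the "." and ".." entries. loadSummaries() is a hypothetical helper name, and FILE_IGNORE_NEW_LINES is an optional extra that strips the line endings.

```php
<?php
// Sketch only: loadSummaries() is a hypothetical helper.
// It returns one array of lines per file, numbered from 0.
function loadSummaries(string $dir): array
{
    $summaries = array();
    foreach (glob($dir . '/*') ?: array() as $path) {
        if (!is_file($path)) {
            continue; // ignore subdirectories
        }
        $summaries[] = file($path, FILE_IGNORE_NEW_LINES);
    }
    return $summaries;
}

// Possible usage:
// $summaries = loadSummaries('reports');
// echo $summaries[3][2]; // third line of the fourth file
```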
I want to "include" the last 2 files in my php file based on their names.
So the names are:
01-blahbla.php
02-sdfsdff.php
03-aaaaaaa.php
04-bbbbbbb.php
05-wwwwwwi.php
and I only want to include 05 and 04 since their name starts with the biggest number.
How do I go about doing this?
Assuming there are only the numbered files in the folder, you could use
$files = glob('/path/to/files/*.php'); // get all php files in path
natsort($files); // sort in natural ascending order
$highest = array_pop($files); // get last file, e.g. highest number
$second = array_pop($files); // again for second highest number
Put the values in an array, reverse sort it using rsort() and then take the first two:
$values = array('01-blahbla.php', '02-sdfsdff.php', '03-aaaaaaa.php', '04-bbbbbbb.php', '05-wwwwwwi.php');
rsort($values);
$file1 = $values[0];
$file2 = $values[1];
require_once $file1;
require_once $file2;
The PHP manual at php.net has some great info for the various sort methods.
Update:
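As Psytronic noted, rsort() will not work reliably for numbers: it compares the names as plain strings, so it only orders them correctly while the numbers are zero-padded (otherwise 9- ranks above 10-). You can create a custom function that sorts in natural order and then reverses the result:
function rnatsort(&$values) {
    natsort($values);
    // Without preserve_keys the result is renumbered from 0, so $files[0] is the highest
    return array_reverse($values);
}
$files = rnatsort($values);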
List the directory contents into an array.
Sort that array using built-in PHP sorting functions.
Do a require_once() on the first two elements of the array.
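The three steps above can be sketched as follows. latestFiles() is a hypothetical helper name, and rsort() with the SORT_NATURAL flag is one possible choice of built-in sort that also handles numbers without zero padding.

```php
<?php
// Sketch only: latestFiles() is a hypothetical helper.
// It returns the $howMany highest-numbered .php files in $dir.
function latestFiles(string $dir, int $howMany = 2): array
{
    // Step 1: list the directory contents into an array
    $files = glob($dir . '/*.php') ?: array();
    // Step 2: natural order, highest number first; keys are renumbered from 0
    rsort($files, SORT_NATURAL);
    // Step 3 input: the first elements of the sorted array
    return array_slice($files, 0, $howMany);
}

// Possible usage (the path is a placeholder):
// foreach (latestFiles('/path/to/files') as $file) {
//     require_once $file;
// }
```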