How to compute tf-idf from multiple text files in php? - php

I'm successfully computing tf-idf from an array. Now I want to compute tf-idf from multiple text files, since I have several text files in a directory. Can anyone please modify this code so that it first reads all the files in the directory and then computes tf-idf from their contents? My code is below, thanks.
$collection = array(
1 => 'this string is a short string but a good string',
2 => 'this one isn\'t quite like the rest but is here',
3 => 'this is a different short string that\'s not as short'
);
$dictionary = array();
$docCount = array();
foreach($collection as $docID => $doc) {
$terms = explode(' ', $doc);
$docCount[$docID] = count($terms);
foreach($terms as $term) {
if(!isset($dictionary[$term])) {
$dictionary[$term] = array('df' => 0, 'postings' => array());
}
if(!isset($dictionary[$term]['postings'][$docID])) {
$dictionary[$term]['df']++;
$dictionary[$term]['postings'][$docID] = array('tf' => 0);
}
$dictionary[$term]['postings'][$docID]['tf']++;
}
}
$temp = array('docCount' => $docCount, 'dictionary' => $dictionary);
Computing tf-idf
$index = $temp;
$docCount = count($index['docCount']);
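// $term is the term to score; after the indexing loop it still holds the last term seen, so set it explicitly, e.g. $term = 'short';
$entry = $index['dictionary'][$term];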
foreach($entry['postings'] as $docID => $postings) {
echo "Document $docID and term $term give TFIDF: " .
($postings['tf'] * log($docCount / $entry['df'], 2));
echo "\n";
}

Have a look at this answer: Reading all file contents from a directory - php
There you will find out how to read the contents of every file in a directory.
With that information you should be able to modify your code yourself so that it works as expected.
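As a minimal sketch of that idea, the helper below collects every .txt file in a directory into an array shaped like $collection, with the file name as the document ID. The function name loadCollection and the choice of the file name as key are assumptions made for illustration, not part of the linked answer.

```php
<?php
// Hypothetical helper: returns array(file name => file contents)
// for every .txt file directly inside $dir.
function loadCollection(string $dir): array
{
    $collection = array();
    $pattern = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . '*.txt';
    // glob() can return false on error, so fall back to an empty list
    $files = glob($pattern);
    foreach ($files ?: array() as $path) {
        $text = file_get_contents($path);
        if ($text !== false) {
            $collection[basename($path)] = $text;
        }
    }
    return $collection;
}
```

The returned array can be passed to the existing foreach($collection as $docID => $doc) loop unchanged.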

Related

How to get Contents of text files as value in foreach loop in glob function in php?

I am developing a search engine with the vector space model. I have successfully computed tf-idf with associative array data already defined in the code. Now I want the data to come from a directory that contains folders, each holding a number of text files with dummy data. I have tried a lot but am stuck at one point with the glob function: I want each .txt file as the key and its contents as the value in the foreach loop over glob. My code is below.
Tf-idf With Associative Array Data
$collection = array(
1 => 'this string is a short string but a good string',
2 => 'this one isn\'t quite like the rest but is here',
3 => 'this is a different short string that\'s not as short'
);
$dictionary = array();
$docCount = array();
foreach($collection as $docID => $doc) {
$terms = explode(' ', $doc);
$docCount[$docID] = count($terms);
foreach($terms as $term) {
if(!isset($dictionary[$term])) {
$dictionary[$term] = array('df' => 0, 'postings' => array());
}
if(!isset($dictionary[$term]['postings'][$docID])) {
$dictionary[$term]['df']++;
$dictionary[$term]['postings'][$docID] = array('tf' => 0);
}
$dictionary[$term]['postings'][$docID]['tf']++;
}
}
$temp = array('docCount' => $docCount, 'dictionary' => $dictionary);
As you can see, in the first foreach loop $docID is the key and $doc is the contents (value) of the collection array. But I don't know how to implement exactly the same thing when the files are read from a directory. See the code below.
Tf-idf With .txt Files and its contents read from directory
foreach (glob("C:\\wamp\\www\\Web-info\\documents\\awd_1990_00\\*.txt") as $file) {
$file_handle = fopen($file, "r");
//echo $file;
$dictionary = array();
$docCount = array();
foreach($file as $docID=> $value) {
echo $value;
$terms = explode(' ', $doc);
$docCount[$docID] = count($terms);
foreach($terms as $term) {
if(!isset($dictionary[$term])) {
$dictionary[$term] = array('df' => 0, 'postings' => array());
}
if(!isset($dictionary[$term]['postings'][$docID])) {
$dictionary[$term]['df']++;
$dictionary[$term]['postings'][$docID] = array('tf' => 0);
}
$dictionary[$term]['postings'][$docID]['tf']++;
}
}
}
$temp = array('docCount' => $docCount, 'dictionary' => $dictionary);
This gives me an error on the first foreach loop: invalid argument supplied for foreach. As I mentioned earlier, I want each .txt file as the key and its contents as the value in the first foreach loop. Can anybody please tell me how to do this? Thanks in advance.
If you want to treat the entire file as one value, you can use file_get_contents() to read the file into a string:
$dictionary = array();
$docCount = array();
foreach (glob("C:\\wamp\\www\\Web-info\\documents\\awd_1990_00\\*.txt") as $docID) {
$value = file_get_contents($docID);
...
}
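One way the loop body could look, as a sketch only: the indexing code from the question runs once per file, with the file name as $docID. The function name indexFiles, the whitespace tokenisation with preg_split, and the use of basename() as the key are assumptions, not something the answer above specifies.

```php
<?php
// Hypothetical helper: builds the question's index structure from a list
// of file paths, for example the array returned by glob().
function indexFiles(array $files): array
{
    $dictionary = array();
    $docCount = array();
    foreach ($files as $path) {
        $text = file_get_contents($path);
        if ($text === false) {
            continue; // unreadable file, skip it
        }
        $docID = basename($path);
        // Split on any run of whitespace so line breaks do not create empty terms
        $terms = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
        $docCount[$docID] = count($terms);
        foreach ($terms as $term) {
            if (!isset($dictionary[$term])) {
                $dictionary[$term] = array('df' => 0, 'postings' => array());
            }
            if (!isset($dictionary[$term]['postings'][$docID])) {
                $dictionary[$term]['df']++;
                $dictionary[$term]['postings'][$docID] = array('tf' => 0);
            }
            $dictionary[$term]['postings'][$docID]['tf']++;
        }
    }
    return array('docCount' => $docCount, 'dictionary' => $dictionary);
}
```

Note that $dictionary and $docCount are created once, before the loop over the files. Resetting them inside the loop, as the question's code does, would discard the index after every file.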

Convert array to an .ini file

I need to parse an .ini file into an array, and later change the values of the array and export it to the same .ini file.
I managed to read the file, but didn’t find any simple way to write it back.
Any suggestions?
Sample .ini file:
1 = 0;
2 = 1372240157; // timestamp.
To write the .ini file back you need to create your own function, because PHP offers no built-in functions for .ini files other than the ones for reading (see http://php.net/manual/pl/function.parse-ini-file.php).
An example of a function that converts a multidimensional array to an .ini-syntax compatible string might look like this:
function arr2ini(array $a, array $parent = array())
{
$out = '';
foreach ($a as $k => $v)
{
if (is_array($v))
{
//subsection case
//merge all the sections into one array...
$sec = array_merge((array) $parent, (array) $k);
//add section information to the output
$out .= '[' . join('.', $sec) . ']' . PHP_EOL;
//recursively traverse deeper
$out .= arr2ini($v, $sec);
}
else
{
//plain key->value case
$out .= "$k=$v" . PHP_EOL;
}
}
return $out;
}
You can test it like this:
$x = [
'section1' => [
'key1' => 'value1',
'key2' => 'value2',
'subsection' => [
'subkey' => 'subvalue',
'further' => ['a' => 5],
'further2' => ['b' => -5]]]];
echo arr2ini($x);
(Note that short array syntax is available only in PHP 5.4 and later.)
Also note that it doesn't preserve the comments that were present in your question. There is no easy way to keep them when it is software (as opposed to a human) that writes the file back.
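To close the loop described in the question (read, change, write back), a small sketch follows. The helper name toIni is hypothetical, it handles only flat arrays and one level of sections, and it wraps every value in double quotes, an assumption made here so that characters such as ; survive parsing. It also assumes values contain no double quotes themselves.

```php
<?php
// Hypothetical helper: serialises a flat array, or an array with one level
// of sections, as INI text with double-quoted values.
function toIni(array $data): string
{
    $plain = '';
    $sections = '';
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $sections .= "[$key]" . PHP_EOL;
            foreach ($value as $k => $v) {
                $sections .= "$k = \"$v\"" . PHP_EOL;
            }
        } else {
            // Section-less keys are collected separately so they can be
            // written before the first section header
            $plain .= "$key = \"$value\"" . PHP_EOL;
        }
    }
    return $plain . $sections;
}
```

A round trip would then be: $config = parse_ini_file($path, true); change the values in $config; file_put_contents($path, toIni($config)).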
I've made significant changes to the function provided by rr- (many thanks for the kick-start!)
I was unhappy with the way multidimensional properties are handled in that version. I took the example ini file from the php documentation page for parse_ini_file and got a result which included the keys third_section.phpversion and third_section.urls - not what I expected.
I tried using a RecursiveArrayIterator for unlimited nesting, but unfortunately, a header with key-value pairs under it is the maximum limit of recursion that parse_ini_string will process before choking on an error message.
So I started from scratch, added some curveballs as the fourth and last items, and ended up with this:
$test = array(
'first_section' => array(
'one' => 1,
'five' => 5,
'animal' => "Dodo bird",
),
'second_section' => array(
'path' => "/usr/local/bin",
'URL' => "http://www.example.com/username",
),
'third_section' => array(
'phpversion' => array(5.0, 5.1, 5.2, 5.3),
'urls' => array(
'svn' => "http://svn.php.net",
'git' => "http://git.php.net",
),
),
'fourth_section' => array(
7.0, 7.1, 7.2, 7.3,
),
'last_item' => 23,
);
echo '<pre>';
print_r($test);
echo '<hr>';
$ini = build_ini_string($test);
echo $ini;
echo '<hr>';
print_r( parse_ini_string($ini, true) );
function build_ini_string(array $a) {
$out = '';
$sectionless = '';
foreach($a as $rootkey => $rootvalue){
if(is_array($rootvalue)){
// find out if the root-level item is an indexed or associative array
$indexed_root = array_keys($rootvalue) == range(0, count($rootvalue) - 1);
// associative arrays at the root level have a section heading
if(!$indexed_root) $out .= PHP_EOL."[$rootkey]".PHP_EOL;
// loop through items under a section heading
foreach($rootvalue as $key => $value){
if(is_array($value)){
// indexed arrays under a section heading will have their key omitted
$indexed_item = array_keys($value) == range(0, count($value) - 1);
foreach($value as $subkey=>$subvalue){
// omit subkey for indexed arrays
if($indexed_item) $subkey = "";
// add this line under the section heading
$out .= "{$key}[$subkey] = $subvalue" . PHP_EOL;
}
}else{
if($indexed_root){
// root level indexed array becomes sectionless
$sectionless .= "{$rootkey}[] = $value" . PHP_EOL;
}else{
// plain values within root level sections
$out .= "$key = $value" . PHP_EOL;
}
}
}
}else{
// root level sectionless values
$sectionless .= "$rootkey = $rootvalue" . PHP_EOL;
}
}
return $sectionless.$out;
}
My input and output arrays match (functionally, anyway) and my ini file looks like this:
fourth_section[] = 7
fourth_section[] = 7.1
fourth_section[] = 7.2
fourth_section[] = 7.3
last_item = 23
[first_section]
one = 1
five = 5
animal = Dodo bird
[second_section]
path = /usr/local/bin
URL = http://www.example.com/username
[third_section]
phpversion[] = 5
phpversion[] = 5.1
phpversion[] = 5.2
phpversion[] = 5.3
urls[svn] = http://svn.php.net
urls[git] = http://git.php.net
I know it may be a little overkill, but I really needed this function in two of my own projects. Now I can read an ini file, make changes and save it.
The answer by rr- works, and I made one change in the else statement:
//plain key->value case
$out .= "$k=$v" . PHP_EOL;
change it to
//plain key->value case
$out .= "$k=\"$v\"" . PHP_EOL;
With " around the value you can store larger values in the INI file; otherwise the parse_ini_* functions will have issues with them.
http://missioncriticallabs.com/blog/2009/08/double-quotation-marks-in-php-ini-files/
This is my enhanced version of rr-'s answer (thanks to him). My function is part of a class in the Laravel ecosystem, so it uses Arr::isAssoc, which detects whether the given array is associative.
private function arrayToConfig(array $array, array $parent = []): string
{
$returnValue = '';
foreach ($array as $key => $value)
{
if (is_array($value)) // Subsection case
{
// Merge all the sections into one array
if (is_int($key)) $key++;
$subSection = array_merge($parent, (array)$key);
// Add section information to the output
if (Arr::isAssoc($value))
{
if (count($subSection) > 1) $returnValue .= PHP_EOL;
$returnValue .= '[' . implode(':', $subSection) . ']' . PHP_EOL;
}
// Recursively traverse deeper
$returnValue .= $this->arrayToConfig($value, $subSection);
$returnValue .= PHP_EOL;
}
elseif (isset($value)) $returnValue .= "$key=" . (is_bool($value) ? var_export($value, true) : $value) . PHP_EOL; // Plain key->value case
}
return count($parent) ? $returnValue : rtrim($returnValue) . PHP_EOL;
}
What about using PHP's internal functions? http://php.net/manual/en/function.parse-ini-file.php

php - converting switch to .csv file read method

Can someone please point me in the correct direction to convert my switch code from currently being listed like below to being drawn from a CSV file instead:
$video = (isset($_GET['video']) ? $_GET['video'] : null);
if($video) {
switch($video) {
case "apple":
$Heading ='Apple Heading';
$Videonum ='1';
$Content ='<h2>Apple Sub Heading</h2>
<p>Apple content</p>';
$SideContent ='Apple side content';
break;
I will end up with lots of cases and it'll be easier to manage from a .csv file - thank you
I think you need a two-dimensional array with the identifier ('apple', …) as the key for the inner arrays. Parsing a CSV file gives you an array with multiple rows, but you then have to search for the row that contains the required data. You could also save PHP files containing the necessary data arrays, or even use a database (probably the most common choice for such cases).
Target array as I would use it:
$data = array(
'apple' => array(
'heading' => 'Apple Heading',
'video_num' => 1,
'content' => '<h2>Apple Sub Heading</h2>
<p>Apple content</p>',
'side_content' => 'Apple side content',
),
/* more manufacturer sub-arrays */
);
In this first case you could access the whole data by just reading from the array:
if( !empty( $_GET['video'] ) && isset( $data[$_GET['video']] ) )
{
var_dump(
$data[$_GET['video']]['heading'],
$data[$_GET['video']]['content']
);
}
else
{
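// $_GET['video'] may be missing here, and user input must be escaped before output
echo '<p class="error">No video specified or "' . htmlspecialchars(isset($_GET['video']) ? $_GET['video'] : '') . '" is not available.</p>';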
}
FYI; Array as retrieved from a CSV-file:
$data = array(
1 => array(
'manufacturer' => 'Apple',
'heading' => 'Apple Heading',
'video_num' => 1,
'content' => '<h2>Apple Sub Heading</h2>
<p>Apple content</p>',
'side_content' => 'Apple side content',
),
/* more rows */
);
Read your csv file
$data = array();
$fp = fopen('manufacturer.csv', 'r');
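// fgets() returns false at end of file, so no phantom last line is processed
while (($row = fgets($fp)) !== false) {
$line = explode(';', rtrim($row, "\r\n"));
if (count($line) < 5) continue; // skip blank or incomplete lines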
$data[$line[0]]['heading'] = $line[1];
$data[$line[0]]['video'] = $line[2];
$data[$line[0]]['content'] = $line[3];
$data[$line[0]]['side'] = $line[4];
}
fclose($fp);
Your csv looks like
apple;Apple Heading;1;<h2>Apple Sub Heading</h2><p>Apple content</p>;Apple side content
microsoft;MS Heading;1;<h2>MS Sub Heading</h2><p>MS content</p>;MS side content
...
Then access your content with the manufacturer name
if(!empty($_GET['video']) && isset($data[$_GET['video']])){
$Heading = $data[$_GET['video']]['heading'];
$Videonum = $data[$_GET['video']]['video'];
$Content = $data[$_GET['video']]['content'];
$SideContent = $data[$_GET['video']]['side'];
}
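The parsing and the lookup can also be wrapped in two small functions, which keeps the request handling separate from the file format. This is only a sketch: the names parseVideoLines and findVideo are made up for illustration, and the column order is assumed to be the one shown in the sample CSV above.

```php
<?php
// Hypothetical helper: turns "key;heading;video;content;side" lines
// into a lookup array keyed by the first column.
function parseVideoLines(array $lines): array
{
    $data = array();
    foreach ($lines as $row) {
        $fields = explode(';', rtrim($row, "\r\n"));
        if (count($fields) < 5) {
            continue; // skip blank or incomplete lines
        }
        list($key, $heading, $video, $content, $side) = $fields;
        $data[$key] = array(
            'heading' => $heading,
            'video'   => $video,
            'content' => $content,
            'side'    => $side,
        );
    }
    return $data;
}

// Hypothetical helper: returns the row for $key, or null when the key
// is missing or unknown.
function findVideo(array $data, ?string $key): ?array
{
    return ($key !== null && isset($data[$key])) ? $data[$key] : null;
}
```

With these, the page code reduces to $data = parseVideoLines(file('manufacturer.csv')); followed by $row = findVideo($data, isset($_GET['video']) ? $_GET['video'] : null);. Splitting on ; with explode() assumes no field contains a semicolon.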

CSV file to flat array with materialized path

I have CSV file which contains a list of files and directories:
Depth;Directory;
0;bin
1;basename
1;bash
1;cat
1;cgclassify
1;cgcreate
0;etc
1;aliases
1;audit
2;auditd.conf
2;audit.rules
0;home
....
Each line depends on the above one (for the depth param)
I would like to create an array like this one in order to store it into my MongoDB collection with Materialized Paths
$directories = array(
array('_id' => null,
'name' => "auditd.conf",
'path' => "etc,audit,auditd.conf"),
array(....)
);
I don't know how to proceed...
Any ideas?
Edit 1:
I'm not really working with directories - it's an example, so I cannot use FileSystems functions or FileIterators.
Edit 2:
From this CSV file, I'm able to create a JSON nested array:
function nestedarray($row){
list($id, $depth, $cmd) = $row;
$arr = &$tree_map;
while($depth--) {
end($arr );
$arr = &$arr [key($arr )];
}
$arr [$cmd] = null;
}
But I'm not sure it's the best way to proceed...
This should do the trick, I think (it worked in my test, at least, with your data). Note that this code doesn't do much error checking and expects the input data to be in proper order (i.e. starting with level 0 and no holes).
<?php
$input = explode("\n",file_get_contents($argv[1]));
array_shift($input);
$data = array();
foreach($input as $dir)
{
if(count($parts = str_getcsv($dir, ';')) < 2)
{
continue;
}
if($parts[0] == 0)
{
$last = array('_id' => null,
'name' => $parts[1],
'path' => $parts[1]);
$levels = array($last);
$data[] = $last;
}
else
{
$last = array('_id' => null,
'name' => $parts[1],
'path' => $levels[$parts[0] - 1]['path'] . ',' . $parts[1]);
$levels[$parts[0]] = $last;
$data[] = $last;
}
}
print_r($data);
?>
The "best" way to go would be to not store your data in CSV format, as it's the Wrong Tool For The Job.
That said, here you go:
<?php
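// Drop line endings and blank lines so they don't end up in names or break list()
$lines = file('/path/to/your/csv_file.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);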
$directories = array();
$path = array();
$lastDepth = NULL;
foreach ($lines as $line) {
list($depth, $dir) = str_getcsv($line, ';');
// Skip headers and such
if (!ctype_digit($depth)) {
continue;
}
// Keep only the ancestors of the current entry: at depth N the path
// holds N parent names, whether we moved deeper, sideways or back up
$path = array_slice($path, 0, (int) $depth);
// Push the current directory onto the path stack
$path[] = $dir;
$directories[] = array(
'_id' => NULL,
'name' => $dir,
'path' => implode(',', $path)
);
$lastDepth = $depth;
}
var_dump($directories);
Edit:
For what it's worth, once you have the desired nested structure in PHP, it would probably be a good idea to use json_encode(), serialize(), or some other format to store it to disk again, and get rid of the CSV file. Then you can just use json_decode() or unserialize() to get it back in PHP array format whenever you need it again.
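A minimal sketch of that suggestion, using JSON. The function names saveDirectories and loadDirectories are hypothetical, and the file path is whatever you choose to pass in.

```php
<?php
// Hypothetical helper: writes the array to $file as JSON.
// Returns false if encoding or writing fails.
function saveDirectories(string $file, array $directories): bool
{
    $json = json_encode($directories, JSON_PRETTY_PRINT);
    return $json !== false && file_put_contents($file, $json) !== false;
}

// Hypothetical helper: reads the array back; returns an empty array
// when the file cannot be read or does not contain a JSON array/object.
function loadDirectories(string $file): array
{
    $json = file_get_contents($file);
    $data = ($json === false) ? null : json_decode($json, true);
    return is_array($data) ? $data : array();
}
```

Passing true as the second argument of json_decode() returns associative arrays rather than objects, so the loaded data has the same shape as the array that was saved.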

Extracting and grouping database data to an array

I have a database field called "servers"
This field has a link in each row, this field content:
> http://www.rapidshare.com/download1
> http://www.rapidshare.com/download2
> http://www.rapidshare.com/download3
> http://www.megaupload.com/download1
> http://www.megaupload.com/download2
> http://www.megaupload.com/download3
> http://www.fileserve.com/download1
> http://www.fileserve.com/download2
> http://www.fileserve.com/download3
I want to create an array keyed by server name, with a nested array of that server's links under each key.
This is how it should look:
$servers = array(
'rapidshare' => array(
'link1' => 'http://www.rapidshare.com/download1',
'link2' => 'http://www.rapidshare.com/download2',
'link3' => 'http://www.rapidshare.com/download3'),
'megaupload' => array(
'link1' => 'http://www.megaupload.com/download1',
'link2' => 'http://www.megaupload.com/download2',
'link3' => 'http://www.megaupload.com/download3'),
'fileserve' => array(
'link1' => 'http://www.fileserve.com/download1',
'link2' => 'http://www.fileserve.com/download2',
'link3' => 'http://www.fileserve.com/download3')
);
This will do the trick. The domain name ends up in $matches[1], the first capture group; $matches[0] holds the whole matched text.
$newStructure = array();
foreach($links as $link) {
// capture everything between "www." and ".com"
if(!preg_match("/www\.([^.]+)\.com/",$link,$matches)) {
continue; // skip links that don't match the pattern
}
$domain = $matches[1];
// count() on a missing key is an error in PHP 8, so check first
$currentLength = isset($newStructure[$domain]) ? count($newStructure[$domain]) : 0;
$newStructure[$domain]['link'.($currentLength+1)] = $link;
}
$server = array(
'http://www.rapidshare.com/download1',
'http://www.rapidshare.com/download2',
'http://www.rapidshare.com/download3',
'http://www.megaupload.com/download1',
'http://www.megaupload.com/download2',
'http://www.megaupload.com/download3',
'http://www.fileserve.com/download1',
'http://www.fileserve.com/download2',
'http://www.fileserve.com/download3'
);
$match = array();
$myarray = array();
foreach($server as $v) {
// grab server name
preg_match('/\.(.+)\./', $v, $match);
$serverName = $match[1];
// initialize new array if its the first link of that particular server
if (!isset($myarray[$serverName])) {
$myarray[$serverName] = array();
}
// count server array to check how many links are there, and make next link key
$linkKey = 'link' . (count($myarray[$serverName]) + 1);
// store value
$myarray[$serverName][$linkKey] = $v;
}
print_r($myarray);
Hey, maybe this will help you. But I don't see the purpose of those key names (link1, link2, etc.). This won't work with pagination though.
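As an alternative sketch that avoids a hand-written regular expression, parse_url() can extract the host and the server name can be taken from it. The function name groupLinksByServer is made up for illustration, and the code assumes hosts of the form www.name.com or name.com; hosts such as name.co.uk would need extra handling.

```php
<?php
// Hypothetical helper: groups URLs by server name, using link1, link2, ...
// as keys inside each group.
function groupLinksByServer(array $links): array
{
    $grouped = array();
    foreach ($links as $link) {
        $host = parse_url($link, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // not a URL with a host part
        }
        $labels = explode('.', $host);
        if (count($labels) < 2) {
            continue; // e.g. "localhost", no server name to extract
        }
        // Assumption: the label before the last one is the server name
        $name = $labels[count($labels) - 2];
        $next = isset($grouped[$name]) ? count($grouped[$name]) + 1 : 1;
        $grouped[$name]['link' . $next] = $link;
    }
    return $grouped;
}
```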
