Creating a simple text file based search engine - php

I need to create a simple text-file-based search engine as soon as possible, using PHP. It has to read the files in a directory, remove stop words and other useless words, and index each remaining word along with how many times it appears in each document.
I guess the pseudocode for this is:
for each file in directory:
read in contents,
compare to stop words,
add each remaining word to array,
count how many times that word appears in document,
add that number to the array,
add the id/name of the file to the array,
I also need to count the total number of words in each file (after removing the useless ones, I guess). I'm guessing that can be done afterwards, as long as I can get the file ID from that array and then count the words inside.
Can anyone help, maybe by providing a bare-bones structure? The main part I need help with is getting the number of times each word appears in the document and adding it to the index array.
Thanks

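// Stop words to drop from the index (example values; fill in your own list)
$useless=array('the','a','and');
$words=array();
foreach (glob('*') as $file) {
if (!is_file($file)) continue; // skip subdirectories
$contents=file_get_contents($file);
$words[$file]=array();
// With PREG_SET_ORDER, each $match[0] is one whitespace-delimited word
preg_match_all('/\S+/',$contents,$matches,PREG_SET_ORDER);
foreach ($matches as $match) {
if (!isset($words[$file][$match[0]]))
$words[$file][$match[0]]=0;
$words[$file][$match[0]]++;
}
// Remove the stop words from this file's counts
foreach ($useless as $value)
if (isset($words[$file][$value]))
unset($words[$file][$value]);
$count=count($words[$file]);
var_dump($words[$file]);
echo 'Number of distinct words: '.$count;
}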

Take a look at str_word_count. It counts words, but can also extract them to an array (each value in the array being a word). You can then post-process this array to remove stop words, count occurrences, etc.
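A minimal sketch of that approach, assuming a small hand-picked stop-word list. The sample text and the names $stopWords, $keptWords and $counts are made up for illustration:

```php
<?php
// Hypothetical sample text and stop-word list, for illustration only.
$text = 'The cat sat on the mat. The cat slept.';
$stopWords = array('the', 'on', 'a', 'an');

// Format 1 makes str_word_count() return the words instead of a number.
$allWords = str_word_count(strtolower($text), 1);

// Drop stop words, then count how often each remaining word occurs.
$keptWords = array_diff($allWords, $stopWords);
$counts = array_count_values($keptWords);

print_r($counts); // word => occurrences
echo 'Total kept words: ' . count($keptWords) . PHP_EOL;
```

Lowercasing first means "The" and "the" are treated as the same word; drop the strtolower() call if case matters for your search.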

Getting each file in the directory is simple with glob, and you can then read each file with file_get_contents.
/**
* This is how you will add extra rows
*
* $index[] = array(
* 'filename' => 'airlines.txt',
* 'word' => 'JFK',
* 'count' => 3,
* 'all_words_count' => 42
* );
*/
$index = array();
$words = array('jfk', 'car');
foreach( $words as $word ) {
// All files with a .txt extension
// Alternate way would be "/path/to/dir/*"
foreach (glob("test_files/*.txt") as $filename) {
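// The second argument (true) also searches the include_path for the file
$content = file_get_contents($filename, true);
$count = 0;
$totalCount = str_word_count($content);
// preg_quote() keeps special characters in $word from being read as regex syntax
if( preg_match_all('/' . preg_quote($word, '/') . '/i', $content, $matches) ) {
$count = count($matches[0]);
}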
// Add another item to the list
$index[] = array(
'filename' => $filename,
'word' => $word,
'count' => $count,
'all_words_count' => $totalCount
);
}
}
// Debug and look at the index array,
// make sure it looks the way you want it.
echo '<pre>';
print_r($index);
echo '</pre>';
When I tested the above code, this is what I got.
Array
(
[0] => Array
(
[filename] => test_files/airlines.txt
[word] => jfk
[count] => 2
[all_words_count] => 38
)
[1] => Array
(
[filename] => test_files/rentals.txt
[word] => jfk
[count] => 0
[all_words_count] => 47
)
[2] => Array
(
[filename] => test_files/airlines.txt
[word] => car
[count] => 0
[all_words_count] => 38
)
[3] => Array
(
[filename] => test_files/rentals.txt
[word] => car
[count] => 3
[all_words_count] => 47
)
)
I think I have solved your question :D Add this after the above script and you should be able to sort by count: $sorted is in ascending order (starting at zero) and $sorted_desc starts from the highest.
function sorter($a, $b) {
if( $a['count'] == $b['count'] )
return 0;
return ($a['count'] < $b['count']) ? -1 : 1;
}
// Clone the original list
$sorted = $index;
// Run a custom sort function
uasort($sorted, 'sorter');
// Reverse the array to find the highest first
$sorted_desc = array_reverse($sorted);
// Debug and look at the index array,
// make sure it looks the way you want it.
echo '<h1>Ascending</h1><pre>';
print_r($sorted);
echo '</pre>';
echo '<h1>Descending</h1><pre>';
print_r($sorted_desc);
echo '</pre>';

Here's a basic structure:
Create an $index array
Use scandir (or glob, if you need to only get files of a certain type) to get the files in the directory.
For each file:
Get contents with file_get_contents
Use str_word_count (with format 1) to get an array $word_stream of the words in the order they appear
Create an array $word_array to hold word counts
For each word in $word_stream:
If it is in a $ignored_words array, skip it
If it is not already in $word_array as a key, add $word_array[$word] = 1
If it is already in $word_array, increment $word_array[$word]++
Get the total number of words with array_sum($word_array), or the number of unique words with count($word_array); you can add them to $word_array under the keys "_count" and "_unique" (which cannot clash with real words), if you like
Add the filename as a key to the $index array, with the value being $word_array
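The outline above could be sketched roughly as follows. This is only one way to do it: buildIndex() is a hypothetical function name, it assumes the documents are .txt files, and it lowercases every word before counting.

```php
<?php
/**
 * Build a per-file word-count index for the .txt files in $dir.
 * The "_unique" and "_count" keys follow the suggestion in the outline.
 */
function buildIndex($dir, array $ignoredWords)
{
    $index = array();
    // Flip once so each stop-word check is a cheap isset() lookup.
    $ignored = array_flip(array_map('strtolower', $ignoredWords));
    $paths = glob(rtrim($dir, '/') . '/*.txt') ?: array();

    foreach ($paths as $path) {
        $contents = file_get_contents($path);
        if ($contents === false) {
            continue; // unreadable file: skip it
        }

        $wordArray = array();
        foreach (str_word_count(strtolower($contents), 1) as $word) {
            if (isset($ignored[$word])) {
                continue; // stop word
            }
            if (!isset($wordArray[$word])) {
                $wordArray[$word] = 0;
            }
            $wordArray[$word]++;
        }

        // Compute both totals before adding the two bookkeeping keys.
        $unique = count($wordArray);
        $total = array_sum($wordArray);
        $wordArray['_unique'] = $unique;
        $wordArray['_count'] = $total;

        $index[basename($path)] = $wordArray;
    }

    return $index;
}
```

A call such as buildIndex('test_files', array('the', 'a', 'and')) would then return one entry per file, keyed by file name.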

Related

php check if element exist through session array

How can I loop through a session array and check whether elements such as $_SESSION['items'][1]['p_alt-variation-1'] exist? The p_alt-variation-{n} elements are dynamic: they are present only if an item has add-on variations, and there can be more than one.
print_r($_SESSION['items'])
Array
(
[0] => Array
(
[p_name] => Hovid PetSep
[p_code] => 336910
[p_coverImg] => 14-1460428610-ulNvG.jpg
[p_id] => 14
[p_price] => 24.50
[p_qty] => 2
)
[1] => Array
(
[p_name] => X-Dot Motorbike Helmet G88 + Bogo Visor (Tinted)
[p_code] => 2102649
[p_coverImg] => 12-1460446199-wI5qx.png
[p_id] => 12
[p_price] => 68.00
[p_alt-variation-1] => Red
[p_alt-variation-2] => L
[p_qty] => 1
)
)
If an item in the cart has variations, I want to show them to the user. How can I look for array keys that match a pattern like p_alt-variation-{n}?
I use foreach($_SESSION['items'] as $cart_item){ ... } to loop over all cart items and show each item's info.
Thanks for any advice.
Not a regex guru, but you could get the keys and check them using preg_grep. If you need to know whether an item has more than one such key, just count the results.
Here's the idea:
foreach($_SESSION['items'] as $cart_item) { // loop the items
$keys = array_keys($cart_item); // get the keys of the current batch
// make the expression
$matches = preg_grep('~^p_alt\-variation\-\d+~i', $keys); // match key names that start with p_alt-variation- followed by a number; adjust to your liking
if(count($matches) > 1) { // if it has more than one; change this to however many you want
echo 'has more than one variation';
}
}
If you want to use those keys, loop over the results found in $matches:
if(count($matches) > 1) {
foreach($matches as $matched_key) {
echo $cart_item[$matched_key];
}
}
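Putting the two snippets together, a self-contained sketch might look like this. The sample cart is a trimmed copy of the question's data, and getVariations() is a hypothetical helper name:

```php
<?php
// Sample cart modelled on the question's print_r() output (trimmed).
$items = array(
    array('p_name' => 'Hovid PetSep', 'p_qty' => 2),
    array(
        'p_name' => 'X-Dot Motorbike Helmet G88 + Bogo Visor (Tinted)',
        'p_alt-variation-1' => 'Red',
        'p_alt-variation-2' => 'L',
        'p_qty' => 1,
    ),
);

/**
 * Return the variation entries of one cart item, or an empty array.
 */
function getVariations(array $cartItem)
{
    // Keys that look like p_alt-variation-<number>.
    $keys = preg_grep('/^p_alt-variation-\d+$/', array_keys($cartItem));
    // Keep only the entries whose key matched.
    return array_intersect_key($cartItem, array_flip($keys));
}

foreach ($items as $cartItem) {
    $variations = getVariations($cartItem);
    echo $cartItem['p_name'] . ': ';
    echo $variations ? implode(', ', $variations) : 'no variations';
    echo PHP_EOL;
}
```

In the real page you would loop over $_SESSION['items'] instead of the hard-coded $items array.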
Session variables are just like any other array values, so you can always use the isset function. See http://php.net/manual/en/function.isset.php

Search multidimensional array for value in certain key, return value of different key

I'm new to php and have been using the community and answers here to really help me with a little project I'm working on so thank you all in advance for the help so far!
I am pulling a load of information held in a poorly formatted text file/feed, trimming the contents of special characters and then using str_replace to find other specific strings and replace them with commas (,) or semicolons (;), in order to create a usable piece of text. I then want to search this text for certain keywords and return other parts of the text in its place.
So far, I've managed to explode the text into a multidimensional array, but I can't work out how to search this array now, in order to pull out a specific piece of information. I'm essentially trying to build a searchable array that I can pull information from as and when the original feed updates. Here's a sample of the array as it stands at the moment:
Array
(
[0] => Array
(
[0] => 240
[1] => 1
[2] => euro
[3] => 2016-02-19 15:30:00
[4] => EUR
)
[1] => Array
(
[0] => 240
[1] => 3
[2] => euro2
[3] => 2016-02-19 15:00:00
[4] => EUR
)
[2] => Array
(
[0] => 1890
[1] => 9
[2] => uspb
[3] => 2016-02-17 22:59:00
[4] => USD
)
)
Essentially, I want to be able to write something that will search this array for say uspb (array 2, key 2) and if it is found, return the value held under another key. So if I want key 0, it will return 1890. If I want key 1 when searching for euro2 it will return "3".
I've looked through a ton of examples and nothing really fits what I'm after at the moment. Perhaps I'm looking at this the wrong way and using an array isn't the correct approach. Any advice would be greatly appreciated.
For reference, here's a copy of my code (slightly redacted) so far.
<?php
$file=file_get_contents("http://www.example.com/feed/");
$trim=trim($file, "[]");
$find = array("{\"value\":\"", "\",\"date_utc\":\"", "\",\"currency\":\"");
$replace = array(",", ",", "");
$replaced = str_replace($find, $replace, $trim);
$ret = array_map (
function ($_) {return explode (',', $_);},
explode (';', $replaced)
);
print_r ($ret);
?>
As your array is multidimensional - you have to iterate over it to find the value you need:
foreach ($ret as $value) {
// the index you want to search is always `2`?
if ($value[2] == 'uspb') {
echo $value[0];
break;
}
}
And moving to function:
function findMyValue(
$array,
$search_key,
$search_str,
$key
) {
foreach ($array as $v) {
if ($v[$search_key] == $search_str) {
return $v[$key];
}
}
return 'NOT_FOUND';
}
echo findMyValue($ret, 2, 'euro2', 1); // outputs 3
echo findMyValue($ret, 2, 'uspb', 0); // outputs 1890
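As an alternative to the explicit loop, the same lookup can be sketched with the built-in array_column and array_search functions. findByColumn() is a hypothetical name, and the sketch assumes every row contains the column being searched:

```php
<?php
// Sample rows copied from the question's array dump.
$ret = array(
    array('240', '1', 'euro', '2016-02-19 15:30:00', 'EUR'),
    array('240', '3', 'euro2', '2016-02-19 15:00:00', 'EUR'),
    array('1890', '9', 'uspb', '2016-02-17 22:59:00', 'USD'),
);

/**
 * Look up $searchStr in column $searchKey and return column $key of the
 * first matching row, or null when nothing matches.
 */
function findByColumn(array $rows, $searchKey, $searchStr, $key)
{
    // array_column() reindexes from 0, so normalise the row keys as well.
    $rows = array_values($rows);
    // Extract the searched column, then find the position of the value in it.
    $position = array_search($searchStr, array_column($rows, $searchKey), true);
    if ($position === false) {
        return null;
    }
    return $rows[$position][$key];
}

echo findByColumn($ret, 2, 'euro2', 1), PHP_EOL; // 3
echo findByColumn($ret, 2, 'uspb', 0), PHP_EOL;  // 1890
```

Returning null rather than a string such as 'NOT_FOUND' avoids confusing a missing row with a row whose value happens to be that string.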
And as already noted in the comments, it's not poorly formatted text, it's JSON. You can get an array from a JSON string simply with the json_decode function:
$file=file_get_contents("http://www.example.com/feed/");
$ret = json_decode($file, true);
var_dump($ret);

PHP from string to multiple arrays at the hand of placeholders

Good day,
I have what I think is a rather odd question, and I do not really know how to ask it.
I want to create a string variable that looks like this:
[car]Ford[/car]
[car]Dodge[/car]
[car]Chevrolet[/car]
[car]Corvette[/car]
[motorcycle]Yamaha[/motorcycle]
[motorcycle]Ducati[/motorcycle]
[motorcycle]Gilera[/motorcycle]
[motorcycle]Kawasaki[/motorcycle]
This should be processed and look like:
$variable = array(
'car' => array(
'Ford',
'Dodge',
'Chevrolet',
'Corvette'
),
'motorcycle' => array(
'Yamaha',
'Ducati',
'Gilera',
'Kawasaki'
)
);
Does anyone know how to do this?
And what is the thing I am trying to do called?
I want to explode the string into two arrays. Whether the result is one array with two sub-arrays or two individual arrays does not matter to me; I can always combine the latter if I wish. Going from the string above to two arrays is what I want.
Solution by Dlporter98
<?php
///######## GET THE STRING FILE OR DIRECT INPUT
// $str = file_get_contents('file.txt');
$str = '[car]Ford[/car]
[car]Dodge[/car]
[car]Chevrolet[/car]
[car]Corvette[/car]
[motorcycle]Yamaha[/motorcycle]
[motorcycle]Ducati[/motorcycle]
[motorcycle]Gilera[/motorcycle]
[motorcycle]Kawasaki[/motorcycle]';
$str = explode(PHP_EOL, $str);
$finalArray = [];
foreach($str as $item){
//Use preg_match to capture the pieces of the string we want using a regular expression.
//The first capture will grab the text of the tag itself.
//The second capture will grab the text between the opening and closing tag.
//The resulting captures are placed into the matches array.
preg_match("/\[(.*?)\](.*?)\[/", $item, $matches);
//Build the final array structure.
$finalArray[$matches[1]][] = $matches[2];
}
print_r($finalArray);
?>
This gives me the following array:
Array
(
[car] => Array
(
[0] => Ford
[1] => Dodge
[2] => Chevrolet
[3] => Corvette
)
[motorcycle] => Array
(
[0] => Yamaha
[1] => Ducati
[2] => Gilera
[3] => Kawasaki
)
)
The small change I had to make was:
Change
$finalArray[$matches[1]] = $matches[2]
To:
$finalArray[$matches[1]][] = $matches[2];
Thanks a million!!
There are many ways to convert the information in this string to an associative array.
First, split the string on newlines into an array using the explode function:
$str = "[car]Ford[/car]
[car]Dodge[/car]
[car]Chevrolet[/car]
[car]Corvette[/car]
[motorcycle]Yamaha[/motorcycle]
[motorcycle]Ducati[/motorcycle]
[motorcycle]Gilera[/motorcycle]
[motorcycle]Kawasaki[/motorcycle]";
$items = explode(PHP_EOL, $str);
At this point each delimited item is now an array entry.
Array
(
[0] => [car]Ford[/car]
[1] => [car]Dodge[/car]
[2] => [car]Chevrolet[/car]
[3] => [car]Corvette[/car]
[4] => [motorcycle]Yamaha[/motorcycle]
[5] => [motorcycle]Ducati[/motorcycle]
[6] => [motorcycle]Gilera[/motorcycle]
[7] => [motorcycle]Kawasaki[/motorcycle]
)
Next, loop over the array and pull out the appropriate pieces needed to build the final associative array using the preg_match function with a regular expression:
$finalArray = [];
foreach($items as $item)
{
//Use preg_match to capture the pieces of the string we want using a regular expression.
//The first capture will grab the text of the tag itself.
//The second capture will grab the text between the opening and closing tag.
//The resulting captures are placed into the matches array.
preg_match("/\[(.*?)\](.*?)\[/", $item, $matches);
//Build the final array structure.
//Note: plain assignment keeps only the last value per tag; append with [] to keep them all.
$finalArray[$matches[1]] = $matches[2];
}
The following is an example of what will be found in the matches array for a given iteration of the foreach loop.
Array
(
[0] => [motorcycle]Gilera[
[1] => motorcycle
[2] => Gilera
)
Please note that I use the PHP_EOL constant to explode the initial string. This may not work if the string came from a different operating system than the one running this code. You may need to replace it with the actual end-of-line characters used by the string.
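One way to sidestep the line-ending issue entirely is to run preg_match_all over the whole string instead of splitting it first. This is a sketch rather than the method used above; it uses a shortened copy of the input with deliberately mixed line endings and assumes tag names consist of word characters only:

```php
<?php
// Shortened sample input with mixed line endings on purpose.
$str = "[car]Ford[/car]\r\n[car]Dodge[/car]\n[motorcycle]Yamaha[/motorcycle]";

$finalArray = array();
// \1 refers back to the tag name, so the closing tag must match the opening one.
preg_match_all('/\[(\w+)\](.*?)\[\/\1\]/', $str, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    // $match[1] is the tag name, $match[2] the text between the tags.
    $finalArray[$match[1]][] = $match[2];
}

print_r($finalArray);
```

Because the pattern never looks at the line breaks, it behaves the same whether the string uses \n or \r\n.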
Why don't you create two separate arrays?
$cars = array("Ford", "Dodge", "Chevrolet", "Corvette");
$motorcycle = array("Yamaha", "Ducati", "Gilera", "Kawasaki");
You could also use an Associative array to do this.
$variable = array("Ford"=>"car", "Yamaha"=>"motorbike");

Get all array data with a certain string

I want to get all the array data whose keys start with the characters 'ch'. How do I get it?
Array ( [editpostid] => 0 [editpostcat] => 1 [ch114] => on [ch115] => on )
The keys of the data may vary, as the numbers come from the record IDs in the database.
How do I place all the data whose keys start with 'ch' into a separate array?
Do it like this:
<?php
$arr = array('ch'=>10,'abch'=>20,'ch23'=>45);
$newarr=array();
foreach($arr as $k=>$v)
{
if(substr(strtolower($k),0,2)=='ch')
{
array_push($newarr,$v); // Make use of this if you just need the values
//$newarr[$k]=$v; // Uncomment this and comment above statement, if you need the keys too
}
}
print_r($newarr);
OUTPUT:
Array
(
[0] => 10
[1] => 45
)
$charray=array();
foreach($yourarray as $key=>$value){
if(preg_match("/^ch/",$key)){
$charray[$key]=$value;
}
}
//$charray is the new array you asked for
echo implode(',',$charray);
Refer to the official documentation for preg_match for more information.
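For comparison, the same filtering can be sketched without a regular expression by using array_filter with the ARRAY_FILTER_USE_KEY flag (available from PHP 5.6). The variable names $data and $chOnly are made up for this example:

```php
<?php
// Sample data taken from the question.
$data = array('editpostid' => 0, 'editpostcat' => 1, 'ch114' => 'on', 'ch115' => 'on');

// ARRAY_FILTER_USE_KEY passes each key, not the value, to the callback.
$chOnly = array_filter(
    $data,
    function ($key) {
        // strncmp() returns 0 when the first two characters equal "ch".
        return strncmp((string) $key, 'ch', 2) === 0;
    },
    ARRAY_FILTER_USE_KEY
);

print_r($chOnly);
```

The result keeps the original keys, so the record IDs can still be recovered from them afterwards.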

Selecting from an Array using a random key

My question is a little bit complicated, but I will try to explain it as well as I can.
I have an array, let's say:
$array = array(
1 => 1,
2 => 2,
3 => 3,
4 => 4,
5 => 5
);
And I have a randomly generated key, let's say $rand = 34526147; The length of the key is always the same.
Now the question is: I want to select keys from the array in a random order BUT BASED ON THE KEY WE HAVE. I mean, when I give the same key it should always return the same order, but if I change the key it should return a differently ordered array. Thank you.
My understanding is that you want to shuffle() the array, but make the result consistent for whatever $rand value is provided. I also believe shuffle uses PHP's built-in random number generator behind the scenes, which makes it possible to seed it with srand (giving a consistent randomized order for the provided key). So, with that said:
$rand = 34526247;
srand($rand);
shuffle($array);
Because you're always seeding random from that same "key" you should get a consistent (repeatable) shuffle outcome. (At least it did with a brief test)
Note: This means $rand must be a numeric value. And, if at any point it isn't, you'd need to convert it to one.
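Here is a small sketch of the seeded shuffle wrapped in a function. It assumes PHP 7.1 or later, where shuffle() draws from the generator seeded by mt_srand(); seededOrder() is a hypothetical helper name. Unlike a bare shuffle(), it keeps the original keys:

```php
<?php
/**
 * Return $array's entries in an order determined only by $seed.
 * The original keys are preserved.
 */
function seededOrder(array $array, $seed)
{
    $keys = array_keys($array);
    mt_srand((int) $seed); // same seed, same sequence of random numbers
    shuffle($keys);        // shuffle() draws from the seeded generator
    mt_srand();            // reseed randomly so later code is not predictable

    $ordered = array();
    foreach ($keys as $key) {
        $ordered[$key] = $array[$key];
    }
    return $ordered;
}

$array = array(1 => 1, 2 => 2, 3 => 3, 4 => 4, 5 => 5);
print_r(seededOrder($array, 34526147));
```

The order is repeatable for a given seed on the same PHP version, but it should not be assumed to stay identical across PHP versions.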
uksort allows defining a custom sort based on the keys:
uksort($array,function($a,$b){
global $rand;
return strpos(''.$rand, ''.$a) - strpos(''.$rand, ''.$b);
});
Note that this assumes all keys exist in $rand.
Example state of array:
Array (
[3] => 3
[4] => 4
[5] => 5
[2] => 2
[1] => 1 )
You should calculate a new value for each question / row in addition to its id. This value needs to change with each new key and look random enough to your players. You can simply multiply the id with the key, then order by the rightmost digits, like this:
$key = 1243;
$questions = array(
195741 => array('foo'),
168762 => array('bar'),
984133 => array('baz'),
);
$newquestions = array();
foreach ($questions as $id => $row) {
// calculate a random looking order depending on $key
$newquestions[$id * $key % 100] = $row;
}
ksort($newquestions);
Output:
Array
(
[19] => Array
(
[0] => baz
)
[63] => Array
(
[0] => foo
)
[66] => Array
(
[0] => bar
)
)
Edit: Include actual sorting
I could be wrong, but I read you to mean that you are asking if you can change the order of the Qs to match that given in the rand int.
$questions = array(
1=>'one',
2=>'two',
3=>'three',
4=>'four',
5=>'five',
);
$rand = 34526147;
$order = array_unique(str_split($rand));
foreach($order as $ord){
if(array_key_exists($ord, $questions)) {
echo 'Q: ' . $ord . ' is ' .$questions[$ord] . PHP_EOL;
}
}
Gives:
Q: 3 is three
Q: 4 is four
Q: 5 is five
Q: 2 is two
Q: 1 is one
