Search text files and display results with PHP - php

I have a folder (blogfiles/posts) with various text files, numbered (1.txt, 2.txt, 3.txt...) and they each hold a post for a blog (I haven't learned SQL yet). I'm trying to make a search engine for it that will take a query from a text box (done with this part), then search the files for each word in the query, and return the results (possibly in order of the number of times the word occurs).
Each text file looks like this:
Title on Line 1
Date Posted on Line 2 (in Month Date, Year form)
Post body to search on lines 3 and up
I currently have this code:
<?php
$q = $_GET["q"];
$qArray = explode(" ", $q);
//preparing files
$post_directory = "blogfiles/posts/";
$files = scandir($post_directory, 1);
$post_count = (count($files)) - 2;
$files = array_pop($files); // there are 2 server files I want to ignore (#1)
$files = array_pop($files); // there are 2 server files I want to ignore (#2)
foreach ($files as $file) {
//getting title
$post_path = $post_directory . $file;
$post_filecontents = file($post_path);
$post_title = $post_filecontents[0];
echo "<tr><td>" . $post_title . "</td></tr>";
}
if ($post_count > 2) {
$postPlural = "s";
}
echo "<tr><td>" . $post_count . " post" . $postPlural . ".";
?>
I'll apologize now for the formatting, I was trying to separate it to troubleshoot.
Any help to get this working would be greatly appreciated.

There are several ways to search files:
Use the preg_match_all function to match a pattern against the contents of each file.
Use the system() function to run an external command such as grep (only available on *nix).
Use the strpos function (limited, because it only finds fixed strings and has no pattern support).
If you expect heavy traffic, you are better off using a prebuilt index to speed up the search. For example, split the posts into tokens (words) and store each word together with its position information; when a user searches for some words, split the query into words first and then look them up in the index. This method is simpler to describe than to implement, so you may want an existing full-text search engine such as Apache Lucene.
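As a rough illustration of the preg_match_all approach, here is a minimal sketch. The function name searchPosts() and the returned array keys are my own choices, not anything from the question, and it assumes the post layout described there (title on line 1, date on line 2, body from line 3). It matches substrings, so "cat" also counts inside "category".

```php
<?php
// Hypothetical helper: count how often the query words occur in each post body.
function searchPosts(string $directory, string $query): array
{
    // Split the query on whitespace and drop empty strings.
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    $results = [];
    // glob() returns false on error, so fall back to an empty list.
    foreach (glob(rtrim($directory, '/') . '/*.txt') ?: [] as $path) {
        $lines = file($path, FILE_IGNORE_NEW_LINES);
        if ($lines === false || count($lines) < 3) {
            continue; // unreadable, or no body to search
        }
        // Line 1 is the title, line 2 the date, the rest is the body.
        $body = implode("\n", array_slice($lines, 2));
        $hits = 0;
        foreach ($words as $word) {
            // preg_quote() makes the word match literally; "i" ignores case.
            $hits += preg_match_all('/' . preg_quote($word, '/') . '/i', $body);
        }
        if ($hits > 0) {
            $results[] = ['file' => basename($path), 'title' => $lines[0], 'hits' => $hits];
        }
    }
    // Most hits first.
    usort($results, fn($a, $b) => $b['hits'] <=> $a['hits']);
    return $results;
}
```

Calling searchPosts('blogfiles/posts', $_GET['q']) returns one entry per matching post, each with file, title and hits keys, ordered by the number of hits.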

Related

Context index generation for meilisearch

I've been using all sorts of hacks to generate file indexes out of SMB shares. And it's all cool with basic filepath plus metadata indexing.
The next step I want to implement is an algorithm combining some unix-like utilities and php, to index specific context from within files.
Now the first step in this context generation is something like this
while read p; do egrep -rH '^;|\(|^\(|\)$' "$p"; done <textual.txt > text_context_search.txt
This regex is specific to my purpose of indexing program contents: it extracts lines that are whole comments or contain comments from CNC program files.
The resulting output looks something like
file_path:regex_hit
Now, obviously most programs have more than one comment, so there's a lot of redundancy from the repeated file paths, and an exhaustive context index is about a gigabyte in size.
I am now working towards a script that would compact the redundancy in a pattern like this:
file_path_1:regex_hit_1
file_path_1:regex_hit_2
file_path_1:regex_hit_3
...
would become:
file_path_1:regex_hit_1,regex_hit_2,regex_hit_3
If I manage to do this efficiently, all is well.
The question is whether I'm going about this the right way. Maybe I should be using different tools to generate such a context index in the first place?
EDIT
After more copying and pasting from Stack Overflow and thinking about it, I glued together a solution (not my own code) that almost entirely solves the issue described above.
<?php
// https://stackoverflow.com/questions/26238299/merging-csv-lines-where-column-value-is-the-same
$rows = array_map('str_getcsv', file('text_context_search2.1.txt'));
//echo '<pre>';
print_r($rows); // debug output of the parsed rows
//echo '</pre>';
// Array for output
$concatenated = array();
// Key to organize over
$sortKey = '0';
// Key to concatenate
$concatenateKey = '1';
// Separator string
$separator = ' ';
foreach($rows as $row) {
// Guard against invalid rows
if (!isset($row[$sortKey]) || !isset($row[$concatenateKey])) {
continue;
}
// Current identifier
$identifier = $row[$sortKey];
if (!isset($concatenated[$identifier])) {
// If no matching row has been found yet, create a new item in the
// concatenated output array
$concatenated[$identifier] = $row;
} else {
// An array has already been set, append the concatenate value
$concatenated[$identifier][$concatenateKey] .= $separator . $row[$concatenateKey];
}
}
// Do something useful with the output
//var_dump($concatenated);
//echo json_encode($concatenated)."\n";
$fp = fopen('exemplar.csv', 'w');
foreach ($concatenated as $fields) {
fputcsv($fp, $fields);
}
fclose($fp);
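For comparison, here is a minimal sketch that works directly on the file_path:regex_hit lines instead of going through CSV. The function name compactHits() is hypothetical, and it assumes the file path itself contains no colon, so the first colon on each line separates the path from the hit.

```php
<?php
// Hypothetical helper: merge "file_path:regex_hit" lines into one entry per file.
function compactHits(iterable $lines): array
{
    $grouped = [];
    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        // Split on the first colon only, so colons inside the hit are preserved.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // not a path:hit line
        }
        [$path, $hit] = $parts;
        $grouped[$path][] = $hit;
    }
    // Join the hits of each file with commas; the file path stays the array key.
    return array_map(fn(array $hits) => implode(',', $hits), $grouped);
}
```

Because the parameter is any iterable, an SplFileObject can be passed in to read the input line by line rather than loading it with file(). The grouped result is still held in memory, so for a gigabyte-sized index it may be worth writing each group out as soon as the path changes.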

Getting File Name Details from Image File

I am looking for a way to grab details from a file name to insert it into my database. My issue is that the file name is always a bit different, even if it has a pattern.
Examples:
arizona-911545_1920.jpg
bass-guitar-913092_1280.jpg
eiffel-tower-905039_1280.jpg
new-york-city-78181_1920.jpg
The first part is always what the image is about, for example arizona, bass guitar, eiffel tower, new york city followed by a unique id and the width of the image.
What I am after would be extracting:
name id and width
So if I run for example getInfo('arizona-911545_1920.jpg');
it would return something like
$extractedname
$extractedid
$extractedwidth
so I could easily save this in my mysql database like
INSERT into images VALUES ('$extractedname','$extractedid','$extractedwidth')
What bothers me most is that image names can be longer, for example new-york-city-bank or even new-york-city-bank-window so I need a safe method to get the name, no matter how long it would be.
I do know how to replace the - between the name, that's not an issue. I am really just searching for a way to extract the details I mentioned above.
I would appreciate it if someone could enlighten me on how to solve this.
Thanks :)
One of the simplest ways in this case is to use a regular expression, for example:
preg_match('/^(\D+)-(\d+)_(\d+)/', $filename, $matches);
// $matches[1] - name
// $matches[2] - id
// $matches[3] - width
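Wrapped in the getInfo() function the question asks for, this could look like the sketch below. The return shape (an array with name, id and width keys, or null when nothing matches) is my assumption. It also swaps \D+ for a greedy (.+), so a name that happens to contain digits would still be captured.

```php
<?php
// Hypothetical getInfo(): returns name, id and width, or null when the name does not match.
function getInfo(string $filename): ?array
{
    // (.+) is greedy, so it takes everything up to the last "-<digits>_<digits>.<ext>".
    if (!preg_match('/^(.+)-(\d+)_(\d+)\.\w+$/', $filename, $matches)) {
        return null;
    }
    return ['name' => $matches[1], 'id' => $matches[2], 'width' => $matches[3]];
}
```

The name is returned with its hyphens intact, so it can be passed through str_replace('-', ' ', ...) afterwards if needed.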
This is the main idea.
Let's pick a file first.
The filename will be "bass-guitar-913092_1280.jpg".
First, we split it on the dot ( . ) with explode and store the result in the variable $Temp.
This gives us an array of bass-guitar-913092_1280 and jpg.
We continue with the first item of the array, $Temp[0], since it is the part we are interested in.
Next we split it again, this time on the underscore ( _ ).
Now we have an array of bass-guitar-913092 and 1280.
The second value of the array is the width, so we pick it with $Temp[1].
The last part is as simple as the others: we split $Temp[0] on the hyphen ( - ), take the last value, $Temp[count($Temp)-1], which is the id, remove it from the array, and join everything else with implode and the delimiter we want.
We can also use the function ucwords to capitalize the first letter of each word in the name.
The following code shows two ways of building the name, one in lowercase and one with the first letter of each word capitalized; enable the one you want.
Edited Code as a Function
<?php
function ExtractFileInfo($fileName) {
// Drop the extension, then split off the width after the underscore
$Temp = explode(".",$fileName);
$Temp = explode("_",$Temp[0]);
$width = $Temp[1];
$Temp = explode("-",$Temp[0]);
// array_pop() removes the last element (the id) from the array and returns it
$id = array_pop($Temp);
// Name with lowercase letters (enabled by default):
$name = implode(" ",$Temp);
// If you want to capitalize the first letter of every word, use this line instead:
//$name = ucwords(implode(" ",$Temp));
return array($name,$id,$width);
}
?>
This will return an array of 3 elements: name, id and width.
Extracting the data you are looking for would be best done with a regex pattern like the following:
(.+)-(\d+)_(\d+)
Example here: https://regex101.com/r/oM5bS8/2
preg_match('/(.+)-(\d+)_(\d+)/', "<filename>", $matches);
$extractedname = $matches[1];
$extractedid = $matches[2];
$extractedwidth = $matches[3];
EDIT - I just reread the question: you are looking for extraction techniques, not how to post the image from a page to your backend. I will leave this here for reference.
When you post files via an HTML form to a PHP backend, a few things are needed.
1) You need to ensure that your form is sent with POST and that its encoding type is multipart, so that the files are passed along.
<form method="post" enctype="multipart/form-data">
2) Your PHP backend needs to iterate over the files and save them accordingly.
Here is a sample of how to iterate over the files that are being submitted.
foreach($_FILES as $file) {
$n = $file['name'];
$s = $file['size'];
if (!$n) continue;
echo "File: $n ($s bytes)";
}

Searching a text file and displaying a part of the String. (PHP)

So I want to search a text file which contains a list of suburbs with names and postcodes. Depending on the postcode given, I want to display the suburb. I know I'm supposed to loop through the text file, but I have no idea how to actually search for an exact value in a line and then display a different part of that same line. I know that I can use the explode function to get the part of the string I want, but what I don't know is how to loop through the file and find the exact line.
Any Help on this is most Appreciated !
Thanks !
Since you didn't provide any example of what you have tried so far, this may not completely match what you are doing.
Assuming a file suburbs.txt with contents like:
Somewhere,12345
Somewhere Else,12346
This Place,12347
There,12348
You could do the following to loop through the entries:
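$zipCode = '12346';
// FILE_IGNORE_NEW_LINES strips the trailing newline from every line
$lines = file('/path/to/suburbs.txt', FILE_IGNORE_NEW_LINES);
foreach ( $lines as $line )
{
$fields = explode( ',', $line );
// Compare as strings so the postcode has to match exactly
if ( isset( $fields[1] ) && trim( $fields[1] ) === $zipCode )
{
echo "Your suburb is: " . $fields[0];
break;
}
}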
file() loads a file into an array. This is what allows you to loop through using foreach(). There are other methods of doing this as well, but this should help you move in the right direction.
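One of those other methods is to read the file line by line with fgets(), which avoids loading the whole file into memory. The sketch below assumes the same "Name,Postcode" layout as the sample file; the function name findSuburb() is made up for illustration.

```php
<?php
// Hypothetical helper: return the suburb for $postcode, or null if none matches.
function findSuburb(string $path, string $postcode): ?string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return null;
    }
    $suburb = null;
    while (($line = fgets($handle)) !== false) {
        // Expected line format: "Suburb Name,12345"
        $fields = explode(',', rtrim($line, "\r\n"));
        if (count($fields) >= 2 && trim($fields[1]) === $postcode) {
            $suburb = trim($fields[0]);
            break; // stop at the first match
        }
    }
    fclose($handle);
    return $suburb;
}
```

Usage would be along the lines of echo findSuburb('/path/to/suburbs.txt', '12346') ?? 'Not found';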

PHP syntax equivalent to SQL %

I have a folder unit_images that is full of images with random names except they have a prefix that corresponds to a PRIMARY KEY in a db table.
1_4534534fw4.jpg
2_43534tw4t45.png
You get my point. I am using YoxView to view images and am doing something like this:
<?php
$query = "SELECT id FROM images WHERE unit = ".$_GET['id']."";
$result = mysqli_query($con, $query);
echo '<div class="yoxview">';
while ($row = mysqli_fetch_assoc($result))
{
echo '<img src="unit_images/01.jpg" alt="First" title="First image" />
<img src="unit_images/02.jpg" alt="Second" title="Second image" />'
}
echo '</div>';
?>
I want to list the images from my folder unit_images WHERE the unit is the one currently being shown. I just do not know how to tell PHP the filename. I want to do something similar to a 1_% in SQL. Does that make sense? Since I do not know what the rest of the filename looks like, I just need to tell PHP it doesn't matter.
The closest equivalent to SQL's % wildcard that works for files is the shell's * wildcard, which is available in php through the glob() function. You can iterate through its results as follows:
foreach (glob("unit_images/1_*.jpg") as $filename) {
echo '<img src="unit_images/' . htmlspecialchars(basename($filename))
. '" />';
}
If I understand you correctly, you want to get a list of items in a directory with a certain prefix.
Step 1 (use scandir() to determine the items in the directory):
$files = scandir('unit_images');
Step 2 (eliminate the unwanted image names):
foreach ($files as $i => $file)
{
// everything before the first underscore is the prefix
$prefix = explode("_", $file)[0];
if ($prefix != $_GET['id'])
{
unset($files[$i]);
}
}
$files is now an array of only the file names prefixed with $_GET['id'].
My advice would be to store the whole name of the file in a column so that you can reference it directly instead of having to search in the file system for a matching name.
Having said that, you can search for a file using the glob PHP function.
Maybe you can do something like this:
glob("unit_images/<YOUR_PRIMARY_KEY>_*.{png,jpg}", GLOB_BRACE);
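Putting the glob() idea into a reusable form might look like the following sketch. The function name unitImages() and the list of allowed extensions are assumptions of mine. It filters extensions in PHP rather than with GLOB_BRACE, since that flag is not available on every platform.

```php
<?php
// Hypothetical helper: file names in $directory that start with "<unitId>_".
function unitImages(string $directory, int $unitId): array
{
    // The int type keeps wildcard characters out of the glob pattern.
    $paths = glob(rtrim($directory, '/') . '/' . $unitId . '_*') ?: [];
    $allowed = ['jpg', 'jpeg', 'png', 'gif']; // assumed set of image extensions
    $names = [];
    foreach ($paths as $path) {
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($extension, $allowed, true)) {
            $names[] = basename($path);
        }
    }
    return $names;
}
```

With that in place, the loop in the page could iterate over unitImages('unit_images', (int) $_GET['id']) and echo one img tag per name, escaping each name with htmlspecialchars().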

Removing Lines in php ? Is this possible?

I have been struggling to create a simple (really simple) chat system for my website, as my knowledge of JavaScript/AJAX is limited. After gathering resources and help from many kind people, I was able to create my simple chat system, but I am left with one problem.
The messages are posted to a file called "msg.html" in this format :
<p><span id="name">$name</span><span id="Msg">$message</span></p>
Then, using PHP and AJAX, I retrieve the messages instantly from the file using the file() function and a foreach loop within PHP. Here is the code:
<?php
$file = 'msg.html';
$data = file($file);
$max_lines = 20;
if(count($data) > $max_lines){
// here i want the data to be deleted from oldest until i only have 20 messages left.
}
foreach($data as $line_num => $line){
echo $line_num . " . " . $line;
}
?>
My question is: how can I delete the oldest messages so that I am left with only the latest 20 messages?
How does something like this seem to you:
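<?php
$file = 'msg.html';
$data = file($file);
$max_lines = 20;
// Keep only the last $max_lines lines (this assumes new messages are appended at the end of the file)
$data = array_slice($data, -$max_lines);
foreach($data as $line_num => $line)
{
echo $line_num . " . " . $line;
}
// Write the trimmed list back, overwriting the old file
file_put_contents($file, $data);
?>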
http://www.php.net/manual/en/function.file-put-contents.php for more info :)
I suppose you can read the file into an array, chop off everything but the last 20 entries and write it back to the file, overwriting the old one... Perhaps not the best solution, but one that comes to mind if you really can't use a database as Delan suggested.
That's called round-robin if I recall correctly.
As far as I know, you can't remove arbitrary portions of a file. You need to overwrite the file with the new contents (or create a new file and remove the old one). You could also store messages in individual files but of course that implies up to $max_lines files to read.
You should also use flock() to avoid data corruption. Depending on the platform it's not 100% reliable but it's better than nothing.
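To illustrate how the overwrite and flock() could fit together, here is a minimal sketch. The function name trimToLastLines() is hypothetical, and it assumes new messages are appended at the end of the file and that $maxLines is at least 1.

```php
<?php
// Hypothetical helper: keep only the newest $maxLines lines of $path.
function trimToLastLines(string $path, int $maxLines): bool
{
    $handle = fopen($path, 'c+'); // read/write, create if missing, do not truncate
    if ($handle === false) {
        return false;
    }
    if (!flock($handle, LOCK_EX)) { // wait for an exclusive lock
        fclose($handle);
        return false;
    }
    $lines = [];
    while (($line = fgets($handle)) !== false) {
        $lines[] = $line;
    }
    if (count($lines) > $maxLines) {
        // Drop the oldest lines and rewrite the file in place.
        $lines = array_slice($lines, -$maxLines);
        ftruncate($handle, 0);
        rewind($handle);
        fwrite($handle, implode('', $lines));
        fflush($handle);
    }
    flock($handle, LOCK_UN);
    fclose($handle);
    return true;
}
```

The lock only helps if the code that appends new messages also takes the same lock, for example by passing the LOCK_EX flag to file_put_contents() together with FILE_APPEND.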
