search for something inside file without open it? - php

i have about 50 file in some folder
i want get all files contain 'Name : EMAD' inside!
example:
the php imap library!
they search in thousands messages text files for some word!
i have wrote an stupid function to open file and search inside
$s = scandir('dir');
foreach($s as $file){
$content = file_get_contents($file);
if(strpos($content,'Name : ENAD') !== false)
$matched_files[] = $file;
}
but what if there is thousands of files!
should i open all files???? !!!!
is that possible to search for something inside file without open it?
if NO
what is the best and fast way to do that ?php

is that possible to search for something inside file without open it?
of course - no. Where is your common sense? Can you search a refrigerator without opening it?
i have about 50 file in some folder
no problem, it will be fast enough. opening a file is not THAT heavy operation as you imagine.
but what if there is thousands of files!
first have a thousand then come to ask.
what is the best and fast way to do that ?
Store your data in database, not files

Is there a reason you need to use PHP functions? This is exactly what grep was designed to do...
You could always just use PHP's exec() to run an appropriate grep command, such as:
grep -lr 'Name : ENAD' dir
But you might also want to consider (if you're the person creating thousands of files in the first place) whether that is the best way of storing your data - if you usually need the ability to search quickly, you might want to either use a database instead of plain files (e.g. MySQL, PostgreSQL, SQLite, et cetera), or keep a search index (using e.g. Sphinx, Solr, or Lucene).

Related

File truncated when reading

I am writing some json results in files in PHP on shared hosting (fwrite).
Then I read those files to extract json results (file_get_contents).
It happens some times (maybe one out of more than one thousand) that when I read this file it appears truncated: I can only read a multiple of the first 32768 bytes of the file.
I added some code to copy/paste the file I am reading in case the json string is not valid, and I then get 2 different files: the original one was correctly written as it contains a valid json string and the copied one contains only the beginning of the original one and has a size of x*32768 bytes.
Would you have any idea of what could be the problem and how to solve this? (I don't know how to investigate further)
Thank you
Without example code it is impossible to give a 'fix my code' answer, but when doing file write/read sort of programming, you should follow a simple process (which, from the description, is missing one fairly critical step!)
First, write to a TEMP file (you are writing to a file, but it is important here to write to a TEMP file - otherwise, you could have race conditions....... ;);
an easy way to do that in php
$yourData = "whateverYourDataIs....";
$goodfilename = 'whateverYourGoodFileNameIsForYourData.json';
$tempfilename = 'tempfile' . time(); // MANY ways to do this (lots of SO posts on it - just get a unique name every time you write ('unique' may not be needed if you only occasionally do a write, but it is a good safety measure to avoid collisions and time() works for many programs.)
// Now, use $tempfilename in your fwrite.
$fwrite = fwrite($tempfilename,$yourData);
if ($fwrite === false) {
// the write failed, so do whatever 'error' function you may need
// since it failed, there should be no file, but not a bad idea to attempt to delete it
unlink ($tempfile);
}
else {
// the write succeeded, so let's do a 'sanity check' on the file to make sure it is good JSON (this is a 'paranoid' check, but "better safe than sorry", right?)
if(json_decode($tempfile)){
// we know the file is good JSON, so now RENAME (this is really fast, so collisions are almost impossible) NOTE: see http://php.net/manual/en/function.rename.php comments for some potential challenges and workarounds if you have trouble with rename.
rename($tempfilename,$goodfilename);
}
// Now, the GOOD file will contain your new data - and those read issues are gone! (though never say 'never' - it may be possible, but very unlikely!)
}
This may/not be your issue directly and you will have to suit this to fit your code, but as a safety factor - and a good way to avoid collisions, it should give you ~100% read success, which I believe is what you are after!)
If this doesn't help, then some direct code will be needed to provide a more complete answer.
As suggested by #UlrichEckhardt comment, it was due to read / write concurrency problem. I was trying to read a file that was being writen. I solved this by just waiting before trying to read the file again

How to read any of the file type using php7?

I have tried to extract the user email addresses from my server. But the problem is maximum files are .txt but some are CSV files with txt extension. When I am trying to read and extract, I could not able to read the CSV files which with TXT extension. Here is my code:
<?php
$handle = fopen('2.txt', "r");
while(!feof($handle)) {
$string = fgets($handle);
$pattern = '/[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/i';
preg_match_all($pattern, $string, $matches);
foreach($matches[0] as $match)
{
echo $match;
echo '<br><br>';
}
}
?>
I have tried to use this code for that. The program is reading the complete file which are CSV, and line by line which are Text file. There are thousands of file and hence it is difficult to identify.
Kindly, suggest me what I should do to resolve my problem? Is there any solution which can read any format, then it will be awesome.
Well your files are different. Because of that you will have to take a different approach for each of those. In more general terms this is usually calling adapting and is mostly provided using the Adapter design pattern.
Should you use the adapter design pattern you would have a code inspecting the extension of a file to be opened and a switch with either txt or csv. Based on the value you would retrieve aTxtParseror aCsvParser` respectively.
However, before diving deep into this territory you might want to have a look at the files first. I cannot say this for sure without seeing the structures but you can. If the contents of both the text and csv files are the same then a very simple approach is to change the extension to either txt or a csv for all files and then process them using same logic, knowing files with the same extension will now be processed in the same manner.
But from what I understood the file structures actually differ. So to keep your code concise the adapter pattern, having two separate classes/functions for parsing and another one on top of that for choosing the right parsing function (this top function would actually be a form of a strategy) and running it.
Either way, I very much doubt so there is a solution for the problem you are facing as a file structure is mostly your and your own.
Ok, so problem is when CSV file has too long string line. Based on this restriction I suggest you to use example from php.net Here is an example:
$handle = #fopen("/tmp/inputfile.txt", "r");
if ($handle) {
while (($buffer = fgets($handle, 4096)) !== false) {
echo $buffer;
// do your operation for searching here
}
if (!feof($handle)) {
echo "Error: unexpected fgets() fail\n";
}
fclose($handle);
}

completely deleting a file from server

I want to delete a file by using PHP. I have used the unlink() function, but I was wondering about the security of unlink. Is the file completely deleted from the server? I want to make sure that there is no way to get the file back and the file is completely removed from the server.
open the file in binary mode for writing, write 1's over the entire file, close the file, and then unlink it. overwrites any data within the file so it cannot be recovered.
Personally i would say use 1's instead of 0's as 1's are actual data and will always write, where as 0's may not write, depending on several factors.
Edit: After some thought, and reading of comments, i would go with a hybrid approach, depending on "how deleted" you want the file to be, if you simply wish to make it so the data cannot be recovered, overwrite the entire files length with 1's as this is fast, and destroys the data, the problem with this, is it leaves a set length of uniform data on the disk which infers a file USED to be there and gives away the files length, giving vital pieces of forensic information. Simply writing random data will not avoid this also, as if all the drive sectors around this file are untouched, this will also leave a forensic trace.
The best solution factoring in forensic deletion, obfuscation and plausible deniability (again, this is overkill, but im adding it for the sake of adding it), overwrite the entire length of the file with 1's and then, for HALF the length of the file in bytes, write from mt_rand in random length sizes, from random starting points, leaving the impression that many files of varying lengths used to be in this area, thus creating a false trail. (again, this is completely overkill and is generally only needed by serial killers and the CIA, but im adding it for the sake of doing so).
the US government used to recommend a seven step wipe, for disks.
1) all '1's
2) all '0's
3) the pattern '01'
4) the pattern '10'
5) a random pattern
6) all '1'
7) a random pattern,
re the code sample, using a language like PHP is wrong for this type of wipe as your relaying on the OS really wipeing the file and not doing something cleaver like only wipeing it the last time or just unlinking it, however...
(untested)
$filename = "/usr/local/something.txt";
$size = filesize($filename);
$pat1 = chr(0);
$pat2 = chr(255);
$pat3 = chr(170);
$pat4 = chr(85);
$mask = str_repeat($pat1, $size);
file_put_contents($filename, $mask);
$mask = str_repeat($pat2, $size);
file_put_contents($filename, $mask);
$mask = str_repeat($pat3, $size);
file_put_contents($filename, $mask);
$mask = str_repeat($pat4, $size);
file_put_contents($filename, $mask);
This might not answer HOW to perfectly delete a file "with PHP", but it answers your question: "Is the file completely deleted from the server ?"
In some cases, No! (on UNIX/POSIX OS).
According to the highest voted comment on the offical PHP unlink() manual page, the unlink function does not really delete the file, it's deleting the system link to the file's content ! As files can have several files names (!) [symlinks?] the file will only be deleted when ALL file names are unlinked. So, if your file has 2 names, then unlink() will not really delete the file unless you unlink() both file names. Dear linux guys, please correct me here if necessary.
This might be why the function is called unLINK() and not delete() !!!
Here a full quote of the excellent comment:
Deleted a large file but seeing no increase in free space or decrease of disk usage? Using UNIX or other POSIX OS? The unlink() is not about removing file, it's about removing a file name. The manpage says: `unlink - delete a name and possibly the file it refers to''. Most of the time a file has just one name -- removing it will also remove (free, deallocate) thebody' of file (with one caveat, see below). That's the simple, usual case.
However, it's perfectly fine for a file to have several names (see the link() function), in the same or different directories. All the names will refer to the file body and keep it alive', so to say. Only when all the names are removed, the body of file actually is freed. The caveat: A file's body may *also* bekept alive' (still using diskspace) by a process holding the file open. The body will not be deallocated (will not free disk space) as long as the process holds it open. In fact, there's a fancy way of resurrecting a file removed by a mistake but still held open by a process...
Have a look on unlink()'s sister function link() here.
The (imo) best way to delete a file via PHP:
The way to go to really delete a file with PHP (in linux) is to use the exec() function, which executes real bash commands (doing things with linux bash feel correct btw). In this case, the file test.jpg would be deleted by doing:
exec("rm test.jpg);
More info on how to use rm (remove) correctly can be found for example here. Please note: PHP needs the right to delete the file!
UPDATE: Unfortunatly, the linux rm command ALSO does not really delete the file if it has two names/links. Look here for more info.
I'll have a deeper research on that and give feedback...
It is possible that because of some fragmentation on the disk some parts of file will stay, even if the file is totally overwritten.
The other way is to run (by shell_exec()) external program, that is system specific. Here is an example (for Windows), however I have not tested it.
You should do multiple passes of overwriting to deminish traces. For instance using the US DoD 5220-22.M : "Overwrite all addressable locations with a character, its complement, then a random character and verify" (from killdisk site)
Here's what the EFF recommends to permanently remove a file http://ssd.eff.org/tech/deletion.
In my embedded Ubuntu device, I use: echo exec('rm /usr/share/subdirectory/subdirectory/filename'); This works for me.
if you use rm -f (--force) then linux will
ignore nonexistent files and arguments, never prompt
rm -d will
remove empty directories
If you enter rm --help at the prompt you get the help screen. The last lines read:
Note that if you use rm to remove a file, it might be possible to recover some of its contents, given sufficient expertise and/or time. For greater assurance that the contents are truly unrecoverable, consider using shred.
Since my system is a "closed" system then I'm not concerned about violating security issues. My logic being that one must have the system password to SSH into the OS and the only user interface is via web pages.
#Sliq's comments are still true to date. You need to decide for your case.

How to keep file to 1000 lines with Linux or PHP?

I have a file that I'm using to log IP addresses for a client. They want to keep the last 500 lines of the file. It is on a Linux system with PHP4 (oh no!).
I was going to add to the file one line at a time with new IP addresses. We don't have access to cron so I would probably need to make this function do the line-limit cleanup as well.
I was thinking either using like exec('tail [some params]') or maybe reading the file in with PHP, exploding it on newlines into an array, getting the last 1000 elements, and writing it back. Seems kind of memory intensive though.
What's a better way to do this?
Update:
Per #meagar's comment below, if I wanted to use the zip functionality, how would I do that within my PHP script? (no access to cron)
if(rand(0,10) == 10){
shell_exec("find . logfile.txt [where size > 1mb] -exec zip {} \;")
}
Will zip enumerate the files automatically if there is an existing file or do I need to do that manually?
The fastest way is probably, as you suggested, to use tail:
passthru("tail -n 500 $filename");
(passthru does the same as exec only it outputs the entire program output to stdout. You can capture the output using an output buffer)
[edit]
I agree with a previous comment that a log rotate would be infinitely better... but you did state that you don't have access to cron so I'm assuming you can't do logrotate either.
logrotate
This would be the "proper" answer, and it's not difficult to set this up either.
You may get the number of lines using count(explode("\n", file_get_contents("log.txt"))) and if it is equal to 1000, get the substring starting from the first \n to the end, add the new IP address and write the whole file again.
It's almost the same as writing the new IP by opening the file in a+ mode.

Search Text In Files Using PHP

How to search text in some files like PDF, doc, docs or txt using PHP?
I want to do similar function as Full Text Search in MySQL,
but this time, I'm directly search through files, not database.
The search will do searching in many files that located in a folder.
Any suggestion, tips or solutions for this problem?
I also noticed that, google also do searching through the files.
For searching PDF's you'll need a program like pdftotext, which converts content from a pdf to text. For Word documents a simular thingy could be available (because of all the styling and encryption in Word files).
An example to search through PDF's (copied from one of my scripts (it's a snippet, not the entire code, but it should give you some understanding) where I extract keywords and store matches in a PDF-results-array.):
foreach($keywords as $keyword)
{
$keyword = strtolower($keyword);
$file = ABSOLUTE_PATH_SITE."_uploaded/files/Transcripties/".$pdfFiles[$i];
$content = addslashes(shell_exec('/usr/bin/pdftotext \''.$file.'\' -'));
$result = substr_count(strtolower($content), $keyword);
if($result > 0)
{
if(!in_array($pdfFiles[$i], $matchesOnPDF))
{
array_push($matchesOnPDF, array(
"matches" => $result,
"type" => "PDF",
"pdfFile" => $pdfFiles[$i]));
}
}
}
Depending on the file type, you should convert the file to text and then search through it using i.e. file_get_contents() and str_pos(). To convert files to text, you have - beside others - the following tools available:
catdoc for word files
xlhtml for excel files
ppthtml for powerpoint files
unrtf for RTF files
pdftotext for pdf files
If you are under a linux server you may use
grep -R "text to be searched for" ./ // location is everything under the actual directory
called from php using exec resulting in
cmd = 'grep -R "text to be searched for" ./';
$result = exec(grep);
print_r(result);
2021 I came across this and found something so I figure I will link to it...
Note: docx, pdfs and others are not regular text files and require more scripting and/or different libraries to read and/or edit each different type unless you can find an all in one library. This means you would have to script out each different file type you want to search though including a normal text file. If you don't want to script it completely then you have to install each of the libraries you will need for each of the file types you want to read as well. But you still need to script each to handle them as the library functions.
I found the basic answer here on the stack.

Categories