I have a log file (log.txt) in the form:
=========================================
March 01 2050 13:05:00 log v.2.6
General Option: [default] log_options.xml
=========================================
Loaded options from xml file: '/the/path/of/log_options.xml'
printPDF started
PDF export
PDF file created:'/path/of/file.1.pdf'
postProcessingDocument started
INDD file removed:'/path/of/file.1.indd'
Error opening document: '/path/of/some/filesomething.indd':Error: file doesnt exist or no permissions
=========================================
March 01 2050 14:15:00 log v.2.6
General Option: [default] log_options.xml
=========================================
Loaded options from xml file: '/the/path/of/log_options.xml'
extendedprintPDF started
extendedprintPDF: Error: Unsaved documents have no full name: line xyz
Note: Each file name is of the format: 3lettersdatesomename_LO.pdf/indd. Example: MNM011112ThisFile_LO.pdf. Also, on a given day and time, the entry could either have just errors, just the message about the file created or both, like I have shown here.
The file continues this way. And, I have a db in the form:
id itemName status
1 file NULL
And so on...
Now, I am expected to go through the log file and, for each file that is created or for which there is an error, update the last column of the DB with the appropriate message: File created or Error. I thought of searching for the string "PDF file created"/"Error" and then grabbing the file name.
I have tried various things like pathinfo() and strpos. But, I can't seem to understand how I am going to get it done.
Can someone please provide me some inputs on how I can solve this? The txt file and db are pretty huge.
NOTE: I provided the 2nd entry of the log file to be clear that the format in which errors appear IS NOT consistent. I would like to know if I can still achieve what I am supposed to with an inconsistent format for errors.
Can somebody please help after reading the whole question again? There have been plenty of changes from the first time I posted this.
You can use PHP's explode function to break each line of your file into words.
If the fields in your text file are tab-separated, you can split with explode("\t", $line) (the separator comes first, and "\t" must be in double quotes to mean a tab); if they are space-separated, explode on a space.
Then a simple substr($word, $start_index, $length) on each word can give you the name of the file (here $start_index should be 0).
mysql_connect will connect you to a MySQL database, but the mysql_* functions are deprecated and were removed in PHP 7; a better choice is PDO (PHP Data Objects), which makes your code more reliable and flexible.
Another way would be to use preg_match with a regular expression that matches your error message, and parse the file name out of it.
You can refer to php.net manual for help any time.
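As a rough illustration of the preg_match route, here is a minimal sketch that classifies one log line at a time. The function name parseLogLine is made up for this example, and it assumes that file paths always appear in single quotes and that the DB's itemName is the file name without its extension; adjust both to your data. Error lines that contain no quoted path (like the "Unsaved documents" one) are skipped, because there is no file name to attach them to.

```php
<?php
// Hypothetical helper (not part of any library): returns array(name, status)
// for a log line that mentions a file, or null for every other line.
function parseLogLine(string $line): ?array
{
    $line = trim($line);

    // "PDF file created:'/path/of/file.1.pdf'"
    if (preg_match("~^PDF file created:\s*'([^']+)'~", $line, $m)) {
        return array(pathinfo($m[1], PATHINFO_FILENAME), 'File created');
    }

    // Any line containing "Error" plus a quoted .pdf/.indd path,
    // wherever the word "Error" happens to sit in the line.
    if (stripos($line, 'Error') !== false
        && preg_match("~'([^']+\.(?:pdf|indd))'~i", $line, $m)) {
        return array(pathinfo($m[1], PATHINFO_FILENAME), 'Error');
    }

    return null;
}
```

Because it works on a single line, it can be called inside an fgets() loop, so a huge log never has to be loaded into memory at once.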
Are all of the files PDFs? If so you can do a regex search on files with the .pdf extension. However, if the filename is also contained in the error string, you will need to exclude that somehow.
// Assume filenames contain only upper/lowercase letters, 0-9, underscores, periods, dashes, and forward slashes
preg_match_all('~[a-zA-Z0-9_./-]+\.pdf~', $log_file_contents, $matches);
// ~ is used as the delimiter because the pattern itself contains forward slashes.
// $matches[0] should be an array containing each filename.
// You can do array_unique() to exclude duplicates.
Edit: Keep in mind, $matches will be a multi-dimensional array as described http://php.net/manual/en/function.preg-match-all.php and http://php.net/manual/en/function.preg-match.php
To test a regex expression, you can use http://regexpal.com/
Okay, so the main issue here is that you either don't have a consistent delimiter for "entries"..or else you are not providing enough info. So based on what you have provided, here is my suggestion. The main caveat here is that without a solid delimiter for "entries," there's no way to know for sure if the error matches up with the file name. The only way to fix this is to format your file better. Also you have to fill in some blanks, like your db info and how you actually perform the query.
$files = array();
$errors = array();
$handle = fopen("log.txt", "rb");
// fgets() returns one line at a time (fread() would return arbitrary 8192-byte chunks)
while (($row = fgets($handle)) !== false) {
    // get file names, without the surrounding quotes
    if (preg_match("~^PDF file created:\s*'?(.*?)'?\s*$~", $row, $match)) {
        $files[] = $match[1];
    }
    // get errors; "Error" is not always at the start of the line
    if (preg_match('~Error\b.*$~', $row, $match)) {
        $errors[] = trim($match[0]);
    }
}
fclose($handle);
// connect to db
foreach ($files as $k => $file) {
    // assumes your table just has basename of file
    $file = basename($file);
    $error = ( isset($errors[$k]) ) ? $errors[$k] : null;
    // escape $file and $error (or use a prepared statement) before running this
    $sql = "update tablename set status='$error' where itemName='$file'";
    // execute query
}
EDIT: Actually going back to your post, it looks like you want to update a table not insert, so you will want to change the query to be an update. And you may need to further work with $file in that foreach for your where clause, depending on how you store your filenames in your db (for example, if you just store the basename, you will likely want to do $file = basename($file); in the foreach). Code updated to reflect this.
So hopefully this will point you in the right direction.
I have tried to extract the user email addresses from files on my server. The problem is that most of the files are .txt, but some are CSV files with a .txt extension. When I try to read and extract, I am not able to read the CSV files that have the .txt extension. Here is my code:
<?php
$handle = fopen('2.txt', "r");
while(!feof($handle)) {
$string = fgets($handle);
$pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/i';
preg_match_all($pattern, $string, $matches);
foreach($matches[0] as $match)
{
echo $match;
echo '<br><br>';
}
}
?>
I have tried to use this code for that. The program reads the CSV files as one complete block, and the text files line by line. There are thousands of files, so it is difficult to tell which is which.
Kindly suggest what I should do to resolve my problem. If there is a solution that can read any format, that would be awesome.
Well, your files are different, so you will have to take a different approach for each of them. In more general terms this is usually called adapting, and it is mostly done using the Adapter design pattern.
If you use the Adapter design pattern, you would have code inspecting the extension of the file to be opened and a switch on either txt or csv. Based on the value you would retrieve a TxtParser or a CsvParser respectively.
However, before diving deep into this territory you might want to have a look at the files first. I cannot say this for sure without seeing the structures, but you can. If the contents of both the text and CSV files are the same, then a very simple approach is to change the extension to either txt or csv for all files and then process them using the same logic, knowing that files with the same extension will now be processed in the same manner.
But from what I understood, the file structures actually differ. So, to keep your code concise, use the adapter pattern: have two separate classes/functions for parsing and another one on top of them for choosing the right parsing function (this top function would actually be a form of a strategy) and running it.
Either way, I very much doubt there is a ready-made solution for the problem you are facing, as the file structure is yours and yours alone.
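The extension-based dispatch described above could be sketched roughly like this. The names parseTxt, parseCsv and chooseParser are invented for the example, and the two parsers are deliberately trivial; your real ones would depend on what the files actually contain.

```php
<?php
// Hypothetical parsers: each takes raw file contents and returns rows of fields.
function parseTxt(string $contents): array
{
    // One record per line, no field splitting.
    return array_map(function ($line) {
        return array($line);
    }, preg_split('/\R/', trim($contents)));
}

function parseCsv(string $contents): array
{
    // One record per line, split into comma-separated fields.
    return array_map(function ($line) {
        return str_getcsv($line, ',', '"', '\\');
    }, preg_split('/\R/', trim($contents)));
}

// The "strategy" step: pick the parser from the file extension.
function chooseParser(string $filename): callable
{
    $parsers = array('txt' => 'parseTxt', 'csv' => 'parseCsv');
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    if (!isset($parsers[$ext])) {
        throw new InvalidArgumentException("No parser for .$ext files");
    }
    return $parsers[$ext];
}
```

A caller would then write something like $parse = chooseParser($path); $rows = $parse(file_get_contents($path));. This only helps once the extensions are trustworthy, which is why renaming the mislabelled files first matters.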
OK, so the problem occurs when a CSV file has a very long line. Given that restriction, I suggest you use the example from php.net. Here it is:
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
while (($buffer = fgets($handle, 4096)) !== false) {
echo $buffer;
// do your operation for searching here
}
if (!feof($handle)) {
echo "Error: unexpected fgets() fail\n";
}
fclose($handle);
}
I'm writing some PHP code to import a CSV file to a Postgre DB, and I'm getting the error below. Can you help me?
Warning: pg_end_copy(): Query failed: ERROR: literal newline found in data HINT: Use "\n" to represent newline. CONTEXT: COPY t_translation, line 2 in C:\xampp\htdocs\importing_csv\importcsv.php on line 21
<?php
$connString = 'host = localhost dbname= importdb user=postgres password=pgsql';
$db = pg_connect($connString);
$file = file('translation.csv');
//pg_exec($db, "CREATE TABLE t_translation (id numeric, identifier char(100), device char(10), page char(40), english char(100), date_created char(30), date_modified char(30), created_by char(30), modified_by char(30) )");
pg_exec($db, "COPY t_translation FROM stdin");
foreach ($file as $line) {
$tmp = explode(",", $line);
pg_put_line($db, sprintf("%d\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n", $tmp[0], $tmp[1], $tmp[2], $tmp[3], $tmp[4], $tmp[5], $tmp[6], $tmp[7], $tmp[8]));
}
pg_put_line($db, "\\.\n");
pg_end_copy($db);
?>
You need to pass the FILE_IGNORE_NEW_LINES flag as the 2nd parameter of file(); by default each array item includes the newline character at its end. This is likely what's causing the issue here.
So just add the FILE_IGNORE_NEW_LINES flag so that the lines read from the CSV file do not have a newline character at the end:
$file = file('translation.csv', FILE_IGNORE_NEW_LINES);
Also I would recommend using fgetcsv() instead to read csv file.
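To show why a real CSV parser helps here, this is a small sketch that turns one CSV record into a line suitable for COPY ... FROM stdin. The helper name csvLineToCopyRow is made up; it assumes comma-separated input with double-quote enclosures and PostgreSQL's default text COPY format, in which backslash, tab, newline and carriage return inside a value must be written as \\, \t, \n and \r.

```php
<?php
// Hypothetical helper: converts one CSV record into a tab-delimited COPY row.
function csvLineToCopyRow(string $csvLine): string
{
    // str_getcsv() understands quoted fields, so "a,b" stays one field.
    $fields = str_getcsv($csvLine, ',', '"', '\\');

    $escaped = array_map(function ($field) {
        // strtr() with an array replaces all of these in a single pass,
        // so an already-escaped backslash is never escaped twice.
        return strtr((string) $field, array(
            "\\" => "\\\\",
            "\t" => "\\t",
            "\n" => "\\n",
            "\r" => "\\r",
        ));
    }, $fields);

    // Fields are separated by tabs; each row ends with exactly one newline.
    return implode("\t", $escaped) . "\n";
}
```

In the loop from the question, pg_put_line($db, csvLineToCopyRow($line)); would take the place of the explode()/sprintf() pair, with the lines read using FILE_IGNORE_NEW_LINES as shown above.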
If you're willing to use PDO (necessitates a separate connection call), there's an elegant solution that does not require as much processing of the data by PHP, and that will also work with any combination of fields so long as their names in the CSV header match the names in the database. I'll assume you already have initialized PDO and have the object as $pdo, and the filename is $filename. Then:
$file=fopen($filename,'r');
$lines=explode("\n", fread ($file, filesize($filename)));
if (end($lines)=='') array_pop($lines); // Remove the last line if it is empty, as often happens, so it doesn't generate an error with postgres
$fields=array_shift($lines); // Retrieve & remove the field list
$null_as="\\\\N"; // Or whatever your notation for NULL is, if needed
$result=$pdo->pgsqlCopyFromArray('t_translation',$lines,',',$null_as,$fields);
This is pretty minimal, there is no error handling other than $result returning success or failure, but it can be a starting point.
I like this solution better than the approach you are taking though because you don't need to specify the fields at all, it's all handled automatically.
If you don't want to use PDO, there's a similar solution using your setup and syntax, just for the last line replace it with:
pg_copy_from($db,'t_translation',$lines,',',$null_as);
This solution, however, does not dynamically adjust the field names, the fields of the CSV need to exactly match those in the table. However, the names don't need to line up, as the first line of the CSV file is ignored. I haven't tested this last line though because I don't use this type of connection, so there could be an error in it.
I'm trying to group a bunch of files together based on RecipeID and StepID. Instead of storing all of the filenames in a table I've decided to just use glob to get the images for the requested recipe. I feel like this will be more efficient and less data handling. Keeping in mind the directory will eventually contain many thousands of images. If I'm wrong about this then the below question is not necessary lol
So let's say I have RecipeID #5 (nachos, mmmm) and it has 3 preparation steps. The naming convention I've decided on would be as such:
5_1_getchips.jpg
5_2_laycheese.jpg
5_2_laytomatos.jpg
5_2_laysalsa.jpg
5_3_bake.jpg
5_finishednachos.jpg
5_morefinishedproduct.jpg
The files may be generated by a camera, so DSC###.jpg...or the person may have actually named each picture as I have above. Multiple images can exist per step. I'm not sure how I'll handle dupe filenames, but I feel that's out of scope.
I want to get all of the "5_" images...but filter them by all the ones that DON'T have any step # (grouped in one DIV), and then get all the ones that DO have a step (grouped in their respective DIVs).
I'm thinking of something like
foreach ( glob( $IMAGES_RECIPE . $RecipeID . "_*.*") as $image)
and then using a substr to filter out the step# but I'm concerned about getting the logic right because what if the original filename already has _#_ in it for some reason. Maybe I need to have a strict naming convention that always includes _0_ if it doesn't belong to a step.
Thoughts?
Globbing through thousands of files will never be faster than having those files indexed in a database (of whatever type) and executing a database query for them. That's what databases are meant for.
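If you do keep the files on disk and glob for them, the grouping itself could be sketched as follows. The function name groupByStep is invented for this example, and it assumes the RecipeID_StepID_name convention from the question, with key 0 holding the images that have no step. It cannot tell a step number apart from an original file name that happens to start with digits and an underscore, which is the ambiguity a strict _0_ convention would remove.

```php
<?php
// Hypothetical helper: groups one recipe's image names by step number.
// Key 0 collects images that carry no step number.
function groupByStep(array $filenames, int $recipeId): array
{
    $groups = array();
    foreach ($filenames as $name) {
        $base = basename($name);
        if (preg_match('/^' . $recipeId . '_(\d+)_.+$/', $base, $m)) {
            // e.g. 5_2_laycheese.jpg goes to step 2
            $groups[(int) $m[1]][] = $base;
        } elseif (preg_match('/^' . $recipeId . '_.+$/', $base)) {
            // e.g. 5_finishednachos.jpg has no step
            $groups[0][] = $base;
        }
        // anything else belongs to another recipe and is ignored
    }
    ksort($groups);
    return $groups;
}
```

Each key of the returned array then maps to one DIV, for example by calling groupByStep(glob($IMAGES_RECIPE . $RecipeID . '_*.*'), $RecipeID) and looping over the result.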
I had a similar issue with 15,000 mp3 songs.
In the Win command line dir
dir *.mp3 /b /s > mp3.bat
I used a regex search and replace in Notepad++ that took the file names and prefixed and appended text to turn each one into a rename statement, and then ran mp3.bat.
Something like this might work for you in PHP:
Use regex to extract the digits using preg_replace
Create logic table(s) to supply the words for the new file names
Create the new filename with rename()
Here is some simplified and UNTESTED Example code to show what I am suggesting.
Example Logic Table:
$translation[x][y][z] = "phrase";
$translation[x][y][z] = "phrase";
$translation[x][y][z] = "phrase";
$translation[x][y][z] = "phrase";
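$folder = '/home/user/public_html/recipies/';
$files = array();
$dir = opendir($folder);
while (false !== ($found = readdir($dir))) {
    // PATHINFO_EXTENSION returns the extension without the leading dot
    if (pathinfo($found, PATHINFO_EXTENSION) == 'jpg') {
        $files[] = $found;
    }
}
closedir($dir);
foreach ($files as $key => $filename) {
    // only handle camera names such as DSC123.jpg
    if (!preg_match('/^DSC\d\d\d\.jpg$/', $filename)) continue;
    // pull each of the three digits out of the name
    $digit1 = preg_replace('/^DSC(\d)\d\d\.jpg$/', '$1', $filename);
    $digit2 = preg_replace('/^DSC\d(\d)\d\.jpg$/', '$1', $filename);
    $digit3 = preg_replace('/^DSC\d\d(\d)\.jpg$/', '$1', $filename);
    $newName = $translation[$digit1][$digit2][$digit3];
    rename($folder . $filename, $folder . $newName);
}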
I am trying to write a file in PHP. So far it works "kind of".
I have an array of names in the format {Rob, Kevin, Michael}. I use the line of code
foreach($Team as $user)
{
print_r($user);
//create a file for each user
$file = fopen("./employee_lists/".$user, 'w');
//I have also tried: fopen("employee_lists/$user", 'w');
// ... ... ...
//write some data to each file.
}
This works as expected: the print_r shows "Rob Kevin Michael". However, the files are saved with the following names: ROB~1, KEVIN~1, MICHAE~1
When I'm going on to use these files later in my code, and I want to relate the usernames of "Rob" to ROB~1, I'll have to take some extra step to do this. I feel like I'm using fopen incorrectly, but it does exactly what I want minus this little naming scheme issue.
It seems like your $user variable contains an invalid character for file system paths (my best guess would be a new line).
Try:
$file = fopen("./employee_lists/".trim($user), 'w');
You should sanitize $user before using it as a file name.
$pattern = '/[;|`><&^"\'\n\r{}\[\]()]/';
// no piping, passing possible environment variables ($),
// seperate commands, nested execution, file redirection,
// background processing, special commands (backspace, etc.), quotes
// newlines, or some other special characters
$user= preg_replace($pattern, '', $user);
$user= '"'.preg_replace('/\$/', '\\\$', $user).'"'; //make sure this is only interpreted as ONE argument
By the way, it's a bad idea to use a user name for a file name. It's better to use a numeric id.
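If you do stay with names, a different technique from the blacklist above is a whitelist: keep only the characters you know are safe in a file name and drop everything else. The sketch below is one way to do that; safeFileName is a made-up helper, and the allowed character set (letters, digits, underscore, dash, dot) is an assumption you may want to widen.

```php
<?php
// Hypothetical helper: strips everything except letters, digits, "_", "-" and ".".
function safeFileName(string $user): string
{
    // trim() removes the stray newline that can cause names like ROB~1
    $clean = preg_replace('/[^A-Za-z0-9_.-]/', '', trim($user));

    // Refuse names that are empty or that would point at a directory.
    if ($clean === '' || $clean === '.' || $clean === '..') {
        throw new InvalidArgumentException('No usable characters in user name');
    }
    return $clean;
}
```

It would be used as fopen("./employee_lists/" . safeFileName($user), 'w');. Two different user names can collapse to the same cleaned name, which is one more reason a numeric id is the safer key.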
Sorry for my bad English.
I have to compare 2 CSV files: if the rows with the same id differ, I must write that to a file.
If a row with an id from the 1st file does not exist in the second file, I must write that to the file too.
It works, but I have trouble with one element (id=47): it is in both files, but the script says it is in only one.
You can download the script from here
http://sil-design.ru/uploads/script.zip
If you do an echo $str1[0].' - '.$str2[0].'<br />'; you will see that the two 47's are never compared. Also I am not sure what the t is in: $f2 = fopen($fileurl, 'rt');.
If you open your backup.csv in Notepad, place your cursor after the 47;XL, hold Delete to remove anything after it, save, and then try your script again, it should work. It seems that backup.csv was created in a weird way; I am guessing PHP is getting an EOF before the file has actually ended!
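Since the script itself is not shown here, the following is only a guess at the comparison, written so that stray trailing bytes do not matter. It assumes semicolon-separated rows with the id in the first column (as in 47;XL); indexById and diffById are names invented for the sketch.

```php
<?php
// Hypothetical helper: indexes rows by their first column (the id).
function indexById(array $lines): array
{
    $rows = array();
    foreach ($lines as $line) {
        // trim() drops trailing \r, \n, spaces and NUL bytes left by odd exports
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        $fields = explode(';', $line);
        $rows[$fields[0]] = $line;
    }
    return $rows;
}

// Reports ids whose rows differ, and ids from the first file missing in the second.
function diffById(array $first, array $second): array
{
    $a = indexById($first);
    $b = indexById($second);
    $report = array('changed' => array(), 'missing' => array());
    foreach ($a as $id => $line) {
        if (!array_key_exists($id, $b)) {
            $report['missing'][] = (string) $id;
        } elseif ($b[$id] !== $line) {
            $report['changed'][] = (string) $id;
        }
    }
    return $report;
}
```

Looking rows up by id, instead of walking both files in step, means every id in the first file is checked against the second one regardless of row order.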