PHP Get Newest File Path

I'm using the following to convert CSV to JSON (https://gist.github.com/robflaherty/1185299). I need to modify it so that instead of using an exact file URL, it pulls the newest file URL in the directory as the source in $feed.
Any help would be great! I've tried using the code found in PHP: Get the Latest File Addition in a Directory, but can't figure out how to modify it so that it would work.
<?php
header('Content-type: application/json');

// Set your CSV feed
$feed = 'http://myurl.com/test.csv';

// Arrays we'll use later
$keys = array();
$newArray = array();

// Function to convert CSV into associative array
function csvToArray($file, $delimiter) {
    $arr = array(); // initialize so we never return an undefined variable
    if (($handle = fopen($file, 'r')) !== FALSE) {
        $i = 0;
        while (($lineArray = fgetcsv($handle, 4000, $delimiter, '"')) !== FALSE) {
            for ($j = 0; $j < count($lineArray); $j++) {
                $arr[$i][$j] = $lineArray[$j];
            }
            $i++;
        }
        fclose($handle);
    }
    return $arr;
}

// Do it
$data = csvToArray($feed, ',');

// Set number of elements (minus 1 because we shift off the first row)
$count = count($data) - 1;

// Use first row for names
$labels = array_shift($data);
foreach ($labels as $label) {
    $keys[] = $label;
}

// Add ids, just in case we want them later
$keys[] = 'id';
for ($i = 0; $i < $count; $i++) {
    $data[$i][] = $i;
}

// Bring it all together
for ($j = 0; $j < $count; $j++) {
    $d = array_combine($keys, $data[$j]);
    $newArray[$j] = $d;
}

// Print it out as JSON
echo json_encode($newArray);
?>

It's a difficult question to answer because there isn't enough detail. Here are some questions that need to be answered.
1). Are you creating the CSV files that are being read? If so, just make sure the file you want to read is always called "latest.csv", and when you go to create a new "latest.csv" you check for an existing one and rename/archive it first (see the sketch after this list). Your directory then contains archives, but the latest file always has the same name.
2). If you are not creating the CSV files, ask the provider whether there's a way to identify the latest one; surely, if they are providing them, they'd expect to be providing everyone the latest feed and to have a mechanism for doing that.
3). If you don't know the provider and want to take a guess, look at how the files are named and try to predict the latest one. E.g., if they appear to include a month and year, do a file_exists() (if you can) on the predicted next file. Again, just a possibility.
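A minimal sketch of option 1, assuming you control the writer (the directory and archive naming are hypothetical):

// Rotate the previous feed out of the way, then publish the new one
// under the fixed name "latest.csv" that the reader always uses.
$dir = '/path/to/feeds';
if (file_exists($dir . '/latest.csv')) {
    // archive with a timestamp, e.g. feed-20240101120000.csv
    rename($dir . '/latest.csv', $dir . '/feed-' . date('YmdHis') . '.csv');
}
rename($newCsvTempPath, $dir . '/latest.csv'); // $newCsvTempPath: your freshly built file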

Based on your comments, if the files reside on the same server or are accessible on a filesystem that supports the file functions, then:
array_multisort(array_map('filemtime', $files=glob('/path/to/*.csv')), SORT_DESC, $files);
$newest = $files[0];
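For example, plugged into the script from the question (a sketch; the local ./csv directory is an assumption, adjust the glob pattern to your setup):

$files = glob(__DIR__ . '/csv/*.csv');
array_multisort(array_map('filemtime', $files), SORT_DESC, $files);
$feed = $files[0]; // newest file becomes the source passed to csvToArray()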
For remote access you could look at something like this: How can I download the most recent file on FTP with PHP?
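A rough sketch of the FTP route, using PHP's built-in ftp_* functions (host, credentials, and directory are placeholders):

$conn = ftp_connect('ftp.example.com');
ftp_login($conn, 'user', 'pass');
ftp_pasv($conn, true);
$newest = null;
$newestTime = -1;
foreach (ftp_nlist($conn, '/feeds') as $f) {
    $t = ftp_mdtm($conn, $f); // modification time, -1 if the server doesn't support it
    if ($t > $newestTime) {
        $newestTime = $t;
        $newest = $f;
    }
}
ftp_get($conn, '/tmp/latest.csv', $newest, FTP_BINARY);
ftp_close($conn);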

Related

PHP Laravel read csv

I have a CSV file where the data is in landscape orientation, e.g.:
name, test name
age, 20
gender, Male
where the first column holds the headers and the second the data. I tried using Laravel's maatwebsite/Excel, but after reading the file the first row is taken as the headers (name and test name).
Is there any method to read this type of CSV file in Laravel using maatwebsite/Excel?
You can use this function:
public function readCSV($csvFile, $array)
{
    $line_of_text = array(); // initialize so an empty file still returns an array
    $file_handle = fopen($csvFile, 'r');
    while (!feof($file_handle)) {
        $line_of_text[] = fgetcsv($file_handle, 0, $array['delimiter']);
    }
    fclose($file_handle);
    return $line_of_text;
}
$csvFileName = "test.csv";
$csvFile = public_path('csv/' . $csvFileName);
$this->readCSV($csvFile, array('delimiter' => ','));
You don't need an entire library; PHP has a built-in function for this: http://php.net/manual/en/function.str-getcsv.php
It doesn't matter for such small CSVs, but try to get whoever gives you the CSVs to format them properly rather than transposed, or use an online tool to transpose them yourself. Here's a solution, but note that it stores the entire array in memory, even though the package in the first example is specifically designed to avoid that.
With composer
You may use Spatie's simple-excel like so:
use Spatie\SimpleExcel\SimpleExcelReader;

$csv = __DIR__ . '/data.csv';
$data = [];
SimpleExcelReader::create($csv)
    // ->useDelimiter(';') // Optional
    ->noHeaderRow() // Optional
    ->getRows()
    ->each(function (array $row) use (&$data) {
        $length = count($row);
        for ($i = 1; $i < $length; $i++) {
            $data[$i - 1] ??= [];
            $data[$i - 1][$row[0]] = $row[$i];
        }
    });
I also opened an issue for your use case. (The issue was closed because the solution is not memory efficient.)
Without composer
As said by "online Thomas", there's a native PHP function for that, and I find it easiest to use it, in general, like so:
$csv = __DIR__ . '/data.csv';
$data = array_map('str_getcsv', file($csv));
Caveat: this does not produce the desired results if fields contain line breaks.
Use a closure in your case, or if you need a delimiter other than ',', etc.:
$csv = __DIR__ . '/data.csv';
$data = [];
array_map(function ($line) use (&$data) { // use() belongs on the closure, not on the for loop
    $row = str_getcsv($line);
    $length = count($row);
    for ($i = 1; $i < $length; $i++) {
        $data[$i - 1] ??= [];
        $data[$i - 1][$row[0]] = $row[$i];
    }
}, file($csv));
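If your fields may contain line breaks, a sketch of an alternative using fgetcsv(), which parses quoted multi-line fields correctly (same transposed layout assumed):

$csv = __DIR__ . '/data.csv';
$data = [];
if (($handle = fopen($csv, 'r')) !== false) {
    // fgetcsv() understands quoted fields, so embedded newlines are safe
    while (($row = fgetcsv($handle)) !== false) {
        $length = count($row);
        for ($i = 1; $i < $length; $i++) {
            $data[$i - 1] ??= [];
            $data[$i - 1][$row[0]] = $row[$i];
        }
    }
    fclose($handle);
}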
For 2022 readers: what I am using right now is https://github.com/spatie/simple-excel. It works like a charm with minimal memory usage thanks to LazyCollections.

PHP read part of large CSV file

I have a large CSV file. Because of memory concerns (with MySQL), I would like to read only a part of it at a time, if possible.
That it's CSV might not be important. The important thing is that it needs to cut on a line break.
Example content:
Some CSV content
that will break
on a line break
This could be my path:
$path = 'path/to/my.csv';
A solution for it could in my mind look like this:
$csv_content1 = read_csv_file($path, 0, 100);
$csv_content2 = read_csv_file($path, 101, 200);
It reads the raw content on line 0-100.
It reads the raw content on line 101-200.
Information
No parsing is needed (just split the content on line breaks).
The file exists on my own server.
Don't read the whole file into memory.
I want to be able to do the second read at another time, not in the same run. I'm fine with saving temp values, like pointers, if needed.
I've been trying to read other topics but did not find an exact match to this problem.
Maybe some of these could somehow work?
SplFileObject
fgetcsv
Maybe I can't use $csv_content2 before I've used $csv_content1, because I need to save some kind of a pointer? In that case it's fine. I will read them in order anyway.
After much thinking and reading I finally think I found the solution to my problem. Correct me if this is a bad solution because of memory usage or from other perspectives.
First run
$buffer = part($path_to_file, 0, 100);
Next run
$buffer = part($path_to_file, $buffer['pointer'], 100);
Function
function part($path, $offset, $rows) {
    $buffer = array();
    $buffer['content'] = '';
    $buffer['pointer'] = $offset;
    $handle = fopen($path, "r");
    if ($handle) {
        fseek($handle, $offset);
        for ($i = 0; $i < $rows && !feof($handle); $i++) {
            $buffer['content'] .= fgets($handle);
        }
        // strlen(), not mb_strlen(): fseek() expects a byte offset, and the
        // new pointer must be absolute (old offset + bytes just read)
        $buffer['pointer'] = $offset + strlen($buffer['content']);
        fclose($handle);
    }
    return $buffer;
}
In my more object-oriented environment it looks more like this:
function part() {
    $handle = fopen($this->path, "r");
    if ($handle) {
        fseek($handle, $this->pointer);
        for ($i = 0; $i < 2; $i++) {
            if ($this->pointer != $this->filesize) {
                $line = fgets($handle);
                $this->content .= $line;
                // advance by the bytes just read; adding the length of the
                // whole accumulated content would overshoot on later calls
                $this->pointer += strlen($line);
            }
        }
        fclose($handle);
    }
}

How can I upload a CSV file into a MySQL database using multithreads? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers. Closed 8 years ago.
I have a CSV file containing millions of email addresses that I want to upload quickly into a MySQL database with PHP.
Right now I'm using a single-threaded program, which takes too much time.
// get the csv file
$file = $_FILES['csv']['tmp_name'];
$handle = fopen($file, "r");
$expression = "/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/";
mysql_query("SET NAMES utf8");
// loop through the csv file and insert into database
while (($data = fgetcsv($handle, 1000, ",", "'")) !== FALSE) {
    if ($data[0] && preg_match($expression, $data[0])) {
        $query = mysql_query("SELECT * FROM `postfix`.`recipient_access` WHERE recipient='" . $data[0] . "'");
        $fetch = mysql_fetch_array($query);
        if ($fetch['recipient'] != $data[0]) {
            mysql_query("INSERT INTO `postfix`.`recipient_access`(`recipient`, `note`) VALUES('" . addslashes($data[0]) . "','" . $_POST['note'] . "')");
        }
    }
}
First of all, I can't stress this enough: fix your indentation. It will make life easier for everyone.
Secondly, the answer depends a lot on the actual bottlenecks you are encountering:
Regular expressions are very slow, especially when they're in a loop.
Databases tend to work well either for WRITES or for READS, but not BOTH: try decreasing the number of queries beforehand (a sketch follows this list).
It stands to reason that the less PHP code in your loop, the faster it will work; consider reducing the number of conditions, for instance.
For the record, your code is not safe against MySQL injection: filter $_POST beforehand [*].
[*] Speaking of which, it's faster to access a plain variable than an array index like $_POST.
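For instance, here's a rough sketch of cutting the per-row SELECT and batching INSERTs; it assumes a UNIQUE index on recipient and swaps the long-deprecated mysql_* API for mysqli (connection parameters are placeholders):

// Batch rows into multi-row INSERT IGNORE statements: one round trip
// per 500 rows instead of two queries per row.
$mysqli = new mysqli('localhost', 'user', 'pass', 'postfix');
$mysqli->set_charset('utf8');
$note = $mysqli->real_escape_string($_POST['note']);
$handle = fopen($_FILES['csv']['tmp_name'], 'r');
$values = [];
while (($data = fgetcsv($handle, 1000, ',', "'")) !== false) {
    if (!filter_var($data[0], FILTER_VALIDATE_EMAIL)) {
        continue; // built-in validation instead of a regex in the loop
    }
    $values[] = "('" . $mysqli->real_escape_string($data[0]) . "','" . $note . "')";
    if (count($values) >= 500) { // flush every 500 rows
        $mysqli->query('INSERT IGNORE INTO `recipient_access` (`recipient`, `note`) VALUES ' . implode(',', $values));
        $values = [];
    }
}
if ($values) { // flush the final partial batch
    $mysqli->query('INSERT IGNORE INTO `recipient_access` (`recipient`, `note`) VALUES ' . implode(',', $values));
}
fclose($handle);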
You can simulate multithreading by having your main program split the huge CSV file into smaller chunks and run each chunk in a different process.
common.php
class FileLineFinder {
    protected $handle, $length, $curpos = 0;
    public function __construct($file){
        // assign to the object properties, not to local variables
        $this->handle = fopen($file, 'r');
        $this->length = strlen(PHP_EOL);
    }
    public function next_line(){
        while(!feof($this->handle)){
            $b = fread($this->handle, $this->length);
            $this->curpos += $this->length;
            if ($b == PHP_EOL) return $this->curpos;
        }
        return false;
    }
    public function skip_lines($count){
        for($i = 0; $i < $count; $i++)
            $this->next_line();
    }
    public function __destruct(){
        fclose($this->handle);
    }
}
function exec_async($cmd, $outfile, $pidfile){
    exec(sprintf("%s > %s 2>&1 & echo $! >> %s", $cmd, $outfile, $pidfile));
}
main.php
require('common.php');

$maxlines = 200; // maximum lines subtask will be processing at a time
$note = $_POST['note'];
$file = $_FILES['csv']['tmp_name'];
$outdir = dirname(__FILE__) . DIRECTORY_SEPARATOR . 'out' . DIRECTORY_SEPARATOR;

// make sure our output directory exists
if(!is_dir($outdir))
    if(!mkdir($outdir, 0755, true))
        die('Cannot create output directory: '.$outdir);

// run a task for each chunk of lines in the csv file
$i = 0; $pos = 0;
$l = new FileLineFinder($file);
do {
    $i++;
    exec_async(
        'php -f sub.php -- '.$pos.' '.$maxlines.' '.escapeshellarg($file).' '.escapeshellarg($note),
        $outdir.'proc'.$i.'.log',
        $outdir.'proc'.$i.'.pid'
    );
    $l->skip_lines($maxlines);
} while($pos = $l->next_line());
// wait for each task to finish
do {
    sleep(1); // avoid a hot busy-wait; assumes each sub.php deletes its own .pid file when done
    $tasks = count(glob($outdir.'proc*.pid'));
    echo 'Remaining Tasks: '.$tasks.PHP_EOL;
} while ($tasks > 0);
echo 'Finished!'.PHP_EOL;
sub.php
require('common.php');

$start = (int)$argv[1];
$count = (int)$argv[2];
$file = $argv[3];
$note = mysql_real_escape_string($argv[4]); // assumes a mysql_connect() happens in common.php
$lines = 0;
$handle = fopen($file, 'r');
fseek($handle, $start, SEEK_SET);
$expression = "/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/";
mysql_query('SET NAMES utf8');

// loop through the csv file and insert into database
while (($data = fgetcsv($handle, 1000, ',', '\'')) && ($lines < $count)) {
    $lines++;
    if ($data[0] && preg_match($expression, $data[0])) {
        $query = mysql_query('SELECT * FROM `postfix`.`recipient_access` WHERE recipient="'.$data[0].'"');
        $fetch = mysql_fetch_array($query);
        if ($fetch['recipient'] != $data[0]) {
            mysql_query('INSERT INTO `postfix`.`recipient_access`(`recipient`, `note`) VALUES("'.$data[0].'","'.$note.'")');
        }
    }
}
Credits
https://stackoverflow.com/a/2162528/314056
https://stackoverflow.com/a/45966/314056
The most pressing thing to do is to make sure your database is properly indexed so the lookup query you do for every row is as fast as possible.
Other than that, there simply isn't that much you can do. For a multithreaded solution, you'll have to go outside PHP.
You could also just import the CSV file into MySQL and then weed out the superfluous data using your PHP script; that is likely to be the fastest way.
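A sketch of that route, assuming LOCAL INFILE is enabled on both client and server and that the file path, credentials, and table are placeholders:

// Let MySQL do the bulk load, then clean up invalid rows with SQL.
$mysqli = mysqli_init();
$mysqli->options(MYSQLI_OPT_LOCAL_INFILE, true); // must be set before connecting
$mysqli->real_connect('localhost', 'user', 'pass', 'postfix');
$mysqli->query("LOAD DATA LOCAL INFILE '/tmp/recipients.csv'
    INTO TABLE recipient_access
    FIELDS TERMINATED BY ',' ENCLOSED BY '\''
    (recipient)");
// weed out rows that are not plausible email addresses
$mysqli->query("DELETE FROM recipient_access
    WHERE recipient NOT REGEXP '^[^@]+@[^@]+\\\\.[a-z]{2,}$'");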
Just a general suggestion: the key to speeding up any program is to know which part takes most of the time,
and then figure out how to reduce it. Sometimes you will be very surprised by the actual result.
By the way, I don't think multithreading would solve your problem.
Put the whole loop inside an SQL transaction. That will speed things up by an order of magnitude (a minimal sketch below).
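A minimal sketch of that, keeping the mysql_* API from the question:

mysql_query('START TRANSACTION');
// ... the fgetcsv()/INSERT loop from the question goes here ...
mysql_query('COMMIT'); // one flush to disk instead of one per INSERT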

Parse CSV file of links to php array, feed these links to simplehtmldom

I have PHP code that reads and parses CSV files into a multiline array. What I need to do next is take this array and let simplehtmldom fire off a crawler to return some company stock info.
The PHP code for the CSV parser is:
$arrCSV = array();
// Opening up the CSV file
if (($handle = fopen("NASDAQ.csv", "r")) !== FALSE) {
    // Set the parent array key to 0
    $key = 0;
    // While there is data available, loop through unlimited times (0) using separator (,)
    while (($data = fgetcsv($handle, 0, ",")) !== FALSE) {
        // Count the total keys in each row; $data is the variable for each line of the array
        $c = count($data);
        // Populate the array
        for ($x = 0; $x < $c; $x++) {
            $arrCSV[$key][$x] = $data[$x];
        }
        $key++;
    } // end while
    // Close the CSV file
    fclose($handle);
} // end if
echo "<pre>";
print_r($arrCSV); // print_r() already outputs; no need to echo its return value
echo "</pre>";
This works great and parses the array line by line, $data being the variable for each line. What I need to do now is get this read via simplehtmldom, which is where it breaks down. I'm looking at using this code or something very similar. I'm pretty inexperienced at this, but guess I would be needing a foreach statement somewhere along the line.
This is the simplehtmldom code
$html = file_get_html($data);
$es = $html->find('div[class="detailsDataContainerLt"]'); // assign the result so $es[0] is defined
$tickerdetails = $es[0];
$FileHandle2 = fopen($data, 'w') or die("can't open file");
fwrite($FileHandle2, $tickerdetails);
fclose($FileHandle2);
So my question is: how can I get them both working together? I have checked the simplehtmldom manual page several times and find it a little bit vague in this area. The simplehtmldom code above is what I use in another function, by directly linking, so I know that it works.
Regards,
Martin
Your loop could be reduced to (yes, it's the same):
while ($data = fgetcsv($handle, 0, ',')) {
    $arrCSV[] = $data;
}
Using SimpleXML instead of SimpleDom (since it's standard PHP; note that simplexml_load_file() only works if the page is well-formed XML/XHTML):
foreach ($arrCSV as $row) {
    $xml = simplexml_load_file($row[0]); // Change 0 to the index of the url
    $result = $xml->xpath('//div[contains(concat(" ", @class, " "), " detailsDataContainerLt")]');
    if (count($result) > 0) { // xpath() returns an array of SimpleXMLElement
        $file = fopen($row[1], 'w'); // Change 1 to the index of the filename you want to write to
        if ($file) {
            fwrite($file, (string) $result[0]);
            fclose($file);
        }
    }
}
that should do it if I understood correctly...
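If you'd rather stay with simplehtmldom as in the question, here's a sketch of the foreach glue; it assumes column 0 of each CSV row holds the URL, that the library is already included, and the output file naming is hypothetical:

// include_once 'simple_html_dom.php'; // assumed available
foreach ($arrCSV as $row) {
    $html = file_get_html($row[0]); // fetch the page behind each CSV url
    if (!$html) continue;
    $div = $html->find('div.detailsDataContainerLt', 0); // first matching div
    if ($div) {
        file_put_contents('ticker_' . md5($row[0]) . '.txt', $div->plaintext);
    }
    $html->clear(); // free simple_html_dom's internal memory
}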

Read a file from line X to line Y? [duplicate]

The closest I've seen in the PHP docs is fread() with a given length, but that doesn't specify which line to start from. Any other suggestions?
Yes, you can do that easily with SplFileObject::seek
$file = new SplFileObject('filename.txt');
$file->seek(1000); // zero-based, so this is line 1001
for ($i = 0; !$file->eof() && $i < 1000; $i++) {
    echo $file->current();
    $file->next();
}
This is a method from the SeekableIterator interface and not to be confused with fseek.
And because SplFileObject is iterable you can do it even easier with a LimitIterator:
$file = new SplFileObject('longFile.txt');
$fileIterator = new LimitIterator($file, 1000, 1001); // offset, count
foreach ($fileIterator as $line) {
    echo $line, PHP_EOL;
}
Again, this is zero-based: offset 1000 with a count of 1001 yields lines 1001 to 2001.
You're not going to be able to read starting from line X, because lines can be of arbitrary length. So you will have to read from the start, counting the number of lines read, to get to line X. For example:
<?php
$f = fopen('sample.txt', 'r');
$lineNo = 0;
$startLine = 3;
$endLine = 6;
while ($line = fgets($f)) {
    $lineNo++;
    if ($lineNo >= $startLine) {
        echo $line;
    }
    if ($lineNo == $endLine) {
        break;
    }
}
fclose($f);
Unfortunately, in order to be able to read from line x to line y, you'd need to be able to detect line breaks... and you'd have to scan through the whole file. However, assuming you're not asking about this for performance reasons, you can get lines x to y with the following:
$x = 10; // inclusive start line (1-based)
$y = 20; // inclusive end line
$lines = file('myfile.txt');
// file() is zero-based, so offset by $x - 1; the third argument is a length
$my_important_lines = array_slice($lines, $x - 1, $y - $x + 1);
See: array_slice
Well, you can't use fseek to find the appropriate position, because it works with a byte offset, not a line number.
I think it's not possible without some sort of cache, or going through the lines one after the other.
A possible solution is the same fgets() loop shown in the earlier answer: read line by line, counting lines, and echo only the range you want.
If you're looking for lines then you can't use fread because that relies on a byte offset, not the number of line breaks. You actually have to read the file to find the line breaks, so a different function is more appropriate. fgets will read the file line-by-line. Throw that in a loop and capture only the lines you want.
I was afraid of that... I guess it's plan B then :S
For each AJAX request I'm going to:
Read the lines I'm going to return to the client into a string.
Copy the rest of the file into a temp file.
Return the string to the client.
It's lame, and it will probably be pretty slow with 10,000+ line files, but I guess it's better than reading the same data over and over again; at least the temp file gets shorter with every request... No?
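A sketch of that plan B, with shift_lines() as a hypothetical helper operating on a chunk file that shrinks on every request:

function shift_lines($path, $n) {
    $in = fopen($path, 'r');
    $out = fopen($path . '.tmp', 'w');
    $chunk = '';
    // 1. read the lines to return to the client
    for ($i = 0; $i < $n && ($line = fgets($in)) !== false; $i++) {
        $chunk .= $line;
    }
    // 2. copy the rest of the file into a temp file
    stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);
    rename($path . '.tmp', $path); // the file gets shorter with every request
    // 3. return the string to the client
    return $chunk;
}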
