I have the following code to read from a file, and write back to it after some computation.
if (file_exists(CACHE_FILE_PATH)) {
    // read the cache and delete that line!
    $inp = array();
    $cache = fopen(CACHE_FILE_PATH, 'r');
    if ($cache) {
        while (!feof($cache)) {
            $tmp = fgets($cache);
            // some logic with $tmp
            $inp[] = $tmp;
        }
        fclose($cache);
    }
    var_dump($inp);
    $cache = fopen(CACHE_FILE_PATH, 'w');
    var_dump($inp);
    if ($cache) {
        var_dump($inp);
        foreach ($inp as $val) {
            echo "\nIN THE LOOP";
            fwrite($val."\n");
        }
        fclose($cache);
    }
}
The output of the var_dumps is:
array(3) {
[0]=>
string(13) "bedupako|714
"
[1]=>
string(16) "newBedupako|624
"
[2]=>
string(19) "radioExtension|128
"
}
array(3) {
[0]=>
string(13) "bedupako|714
"
[1]=>
string(16) "newBedupako|624
"
[2]=>
string(19) "radioExtension|128
"
}
array(3) {
[0]=>
string(13) "bedupako|714
"
[1]=>
string(16) "newBedupako|624
"
[2]=>
string(19) "radioExtension|128
"
}
Even though it's an array, the loop never runs and never prints IN THE LOOP! Why?
This part of your code:
fwrite($val."\n");
Should be:
fwrite($cache, $val); // the "\n" is only required if it was stripped off after fgets()
The first argument to fwrite() must be the file pointer returned by fopen().
Of course, if you had turned on error_reporting(-1) and ini_set('display_errors', 'On') during development you would have spotted this immediately :)
As suggested in the comments, you should try to simplify your code by using constructs like file() to read the whole file into an array of lines and then use join() and file_put_contents() to write the whole thing back.
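For example, the whole read-modify-write cycle can collapse to a few lines. A minimal sketch (the demo file contents are made up; CACHE_FILE_PATH is the constant from the question):

```php
<?php
// Demo setup: define the constant and seed the file (made up for the sketch)
define('CACHE_FILE_PATH', sys_get_temp_dir() . '/cache_demo.txt');
file_put_contents(CACHE_FILE_PATH, "bedupako|714\nnewBedupako|624\n");

// Read the whole file into an array of lines, trailing newlines stripped
$inp = file(CACHE_FILE_PATH, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// ...some logic with each line would go here...

// Write everything back in one call
file_put_contents(CACHE_FILE_PATH, join("\n", $inp) . "\n");
```

No fopen()/feof()/fclose() bookkeeping, so there is no file pointer to pass to fwrite() incorrectly in the first place.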
If you just want a cache of key/value pairs, you could look into something like this:
// to read, assuming the cache file exists
$cache = include CACHE_FILE_PATH;
// write back cache
file_put_contents(CACHE_FILE_PATH, '<?php return ' . var_export($cache, true) . ';');
It reads and writes files containing data structures that PHP itself can read (a lot faster than you can).
Related
Something quite strange is happening to me, and I can't seem to figure out where my problem is.
I have a CSV file I use to export data. It's filled with URLs and other stuff.
I have extracted the URLs into the array $urlsOfCsv.
I extract the CSV lines into an array this way:
$request->file('file')->move(public_path('uploads/temp/'), 'tempcsv.csv');
$file = fopen(public_path('uploads/temp/').'tempcsv.csv', "r");
$lines = [];
fgetcsv($file, 10000, ","); // discard the first (header) line
$o = 0;
while (($data = fgetcsv($file, 0, "\t")) !== FALSE) {
    $lines[$o] = $data;
    $o++;
}
fclose($file);
File::delete($file);
$urlsOfCsv = array_column($lines, 0);
but I can't extract the domain with parse_url(), because I'm getting this strange result:
foreach ($urlsOfCsv as $url) {
    var_dump($url);
    var_dump(parse_url($url));
}
will give me results like this:
string(41) "https://www.h4d.com/" array(1) { ["path"]=> string(41) "_h_t_t_p_s_:_/_/_w_w_w_._h_4_d_._c_o_m_/_" }
string(73) "https://www.campussuddesmetiers.com/" array(1) { ["path"]=> string(73) "_h_t_t_p_s_:_/_/_w_w_w_._c_a_m_p_u_s_s_u_d_d_e_s_m_e_t_i_e_r_s_._c_o_m_/_" }
string(69) "http://altitoy-ternua.com/?lang=es" array(2) { ["path"]=> string(53) "_h_t_t_p_:_/_/_a_l_t_i_t_o_y_-_t_e_r_n_u_a_._c_o_m_/_" ["query"]=> string(15) "_l_a_n_g_=_e_s_" }
string(81) "https://www.opquast.com/communaute/jobs/" array(1) { ["path"]=> string(81) "_h_t_t_p_s_:_/_/_w_w_w_._o_p_q_u_a_s_t_._c_o_m_/_c_o_m_m_u_n_a_u_t_e_/_j_o_b_s_/_" }
I don't even have the 'host' key inside the array.
Any idea why I get this result ?
I tried lots of things with regexes and other functions, but I get either empty results or garbage.
I suppose this has something to do with the CSV handling, but I can't find where.
Thanks to Cbroe I managed to find the solution, which was indeed pretty obvious.
I had bad encoding in my CSV. After a little research I found my file to be encoded in UTF-16.
I converted the encoding this way (which is probably not optimal, given the nested loop):
while (($data = fgetcsv($file, 0, "\t")) !== FALSE) {
    for ($i = 0; $i < count($data); $i++) {
        $data[$i] = mb_convert_encoding($data[$i], 'UTF-8', 'UTF-16');
    }
    $lines[$o] = $data;
    $o++;
}
And now it works just fine. parse_url() gives me the expected result (UrlParser::getDomain() also works for me).
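For what it's worth, the transcoding can also be pushed down to the stream level with a convert.iconv read filter, so fgetcsv() already sees UTF-8 and the inner loop disappears. A minimal sketch (the tiny demo file and its path are made up, and I'm assuming UTF-16LE for determinism; use whichever UTF-16 variant your export actually has):

```php
<?php
// Create a small UTF-16LE demo file (stands in for the real tempcsv.csv)
$path = sys_get_temp_dir() . '/demo_utf16.csv';
file_put_contents($path, mb_convert_encoding("https://www.h4d.com/\tfoo\n", 'UTF-16LE', 'UTF-8'));

// The convert.iconv filter transcodes every read from UTF-16LE to UTF-8,
// so fgetcsv() parses clean UTF-8 lines directly
$file = fopen('php://filter/read=convert.iconv.UTF-16LE.UTF-8/resource=' . $path, 'r');
$lines = [];
while (($data = fgetcsv($file, 0, "\t")) !== FALSE) {
    $lines[] = $data;
}
fclose($file);

var_dump(parse_url($lines[0][0])); // now has a 'host' key
```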
When I retrieve a file, it outputs:
...22nlarray(3) { [0]=> string(62) "/public_html/wp/wp-content/plugins/AbonneerProgrammas/Albums/." [1]=> string(63) "/public_html/wp/wp-content/plugins/AbonneerProgrammas/Albums/.." [2]=> string(69) "/public_html/wp/wp-content/plugins/AbonneerProgrammas/Albums/22nl.mp3" }
However, I only want 22nl displayed; I do not want the rest. How can I do that? Is there a function that strips everything from the output except 22nl (which is a filename)?
My PHP code:
// get contents of the current directory
$contents = ftp_nlist($conn_id, $destination_folder);
foreach ($contents as $mp3_url) {
    $filename = basename($mp3_url, ".mp3");
    echo "<a href='$mp3_url'>$filename</a>";
}
var_dump($contents);
There are similar questions to mine, however, they did not provide a good answer for me.
Greetings,
Rezoo Aftib
Let's say your value from the array is:
$value = "/public_html/wp/wp-content/plugins/AbonneerProgrammas/Albums/22nl.mp3";
Then you can extract your file name like this:
$exploded = explode('/', $value);
$full_mp3_name = end($exploded);
$just_name = explode(".", $full_mp3_name);
$just_name = $just_name[0];
When you print $full_mp3_name you should get 22nl.mp3.
When you print $just_name you should get 22nl.
It would be better with built-in functions, but this is an example if it is an option for you.
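A sketch of the same extraction with built-ins, basename() and pathinfo(), using the path from the question:

```php
<?php
$value = "/public_html/wp/wp-content/plugins/AbonneerProgrammas/Albums/22nl.mp3";

$full_mp3_name = basename($value);                 // "22nl.mp3"
$just_name = pathinfo($value, PATHINFO_FILENAME);  // "22nl"
```

Note the trailing var_dump($contents) in the question is what prints the full array; removing it leaves only the echoed links.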
Here's my script
test.php
function doc2text($filename) {
    $name = pathinfo($filename, PATHINFO_FILENAME);
    $count = exec('abiword --to=txt '.$filename.' && wc -w classes/'.$name.'.txt');
    $count = explode(" ", $count);
    var_dump($count);
    return $count[0];
}
echo doc2text('classes/demo.pdf');
When I run this script on the command line like so:
php test.php
the var_dump outputs normally:
array(2) {
[0]=>
string(4) "1663"
[1]=>
string(16) "classes/demo.txt"
}
But when I run the same page in my browser, the array is empty:
array(1) { [0]=> string(0) "" }
This is really weird... Any clues why it's doing this ?
abiword is probably not in the path of the webserver environment. Try: /path/to/abiword. Also, $filename will need to be in the directory of the running PHP script or you need to specify the path.
Also use error reporting:
error_reporting(E_ALL);
ini_set('display_errors', '1');
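A sketch of the command built with absolute binary paths and escaped arguments (/usr/bin/abiword and /usr/bin/wc are assumptions; check the real locations with `which abiword` on your server):

```php
<?php
// Build the shell command with absolute paths and escapeshellarg(),
// so it no longer depends on the webserver's PATH.
function doc2text_cmd($filename) {
    $name = pathinfo($filename, PATHINFO_FILENAME);
    $dir  = pathinfo($filename, PATHINFO_DIRNAME);
    return '/usr/bin/abiword --to=txt ' . escapeshellarg($filename)
         . ' && /usr/bin/wc -w ' . escapeshellarg($dir . '/' . $name . '.txt');
}

echo doc2text_cmd('classes/demo.pdf');
```

Pass the resulting string to exec() as before; escapeshellarg() also protects against filenames containing spaces or shell metacharacters.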
Why is my PHP script hanging?
$path = tempnam(sys_get_temp_dir(), '').'.txt';
$fileInfo = new \SplFileInfo($path);
$fileObject = $fileInfo->openFile('a');
$fileObject->fwrite("test line\n");
var_dump(file_exists($path)); // bool(true)
var_dump(file_get_contents($path)); // string(10) "test line
// "
var_dump(iterator_count($fileObject)); // Hangs on this
If I delete the last line (the iterator_count(...) call) and replace it with this:
$i = 0;
$fileObject->rewind();
while (!$fileObject->eof()) {
    var_dump($fileObject->eof());
    var_dump($i++);
    $fileObject->next();
}
// Output:
// bool(false)
// int(0)
// bool(false)
// int(1)
// bool(false)
// int(2)
// bool(false)
// int(3)
// bool(false)
// int(4)
// ...
$fileObject->eof() always returns false, so I get an infinite loop.
Why are these things happening? I need to get a line count.
From what I can see in your code, you are opening the file in mode a at this line:
$fileObject = $fileInfo->openFile('a');
When you do that, it's write-only. $fileObject->eof() needs to read the file, so you should open it with a+ to allow reading and writing:
$fileObject = $fileInfo->openFile('a+');
PS: with either a or a+, the pointer starts at the end of the file.
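A sketch of the line count with a+ (the flag combination is one common recipe: DROP_NEW_LINE plus SKIP_EMPTY, with READ_AHEAD, keeps the trailing newline from counting as an extra empty line):

```php
<?php
$path = tempnam(sys_get_temp_dir(), '') . '.txt';
$fileInfo = new SplFileInfo($path);

// a+ opens for reading and appending; the pointer starts at the end,
// but iterator_count() rewinds the object before counting
$fileObject = $fileInfo->openFile('a+');
$fileObject->fwrite("line one\nline two\n");

$fileObject->setFlags(
    SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY | SplFileObject::DROP_NEW_LINE
);
$count = iterator_count($fileObject);
```

Note that SKIP_EMPTY also drops genuinely blank lines in the middle of the file, so drop that flag if blank lines should count.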
I have been having problems opening and reading the contents of a 2 GB CSV file. Every time I run the script it exhausts the server's memory (a 10 GB VPS cloud server) and then gets killed. I have made a test script and was wondering if anyone could have a look and confirm that I am not doing anything silly (PHP-wise) here that would cause what seems an unusually high amount of memory usage. I have spoken to my hosting company, but they seem to be of the opinion that it is a code problem. So I am just wondering if anyone can look over this and confirm there is nothing in the code that would cause this kind of problem.
Also, if you deal with 2 GB CSVs, have you encountered anything like this before?
Thanks
Tim
<?php
ini_set("memory_limit", "10240M");

$start = time();
echo date("Y-m-d H:i:s", $start)."\n";

$file = 'myfile.csv';
$lines = $keys = array();
$line_count = 0;

$csv = fopen($file, "r");
if (!empty($csv)) {
    echo "file open \n";
    while (($csv_line = fgetcsv($csv, null, ',', '"')) !== false) {
        if ($line_count == 0) {
            foreach ($csv_line as $item) {
                $keys[] = preg_replace("/[^a-zA-Z0-9]/", "", $item);
            }
        } else {
            $array = array();
            for ($i = 0; $i < count($csv_line); $i++) {
                $array[$keys[$i]] = $csv_line[$i];
            }
            $lines[] = (object) $array;
            //print_r($array);
            //echo "<br/><br/>";
        }
        $line_count++;
    }
    if ($line_count == 0) {
        echo "invalid csv or wrong delimiter / enclosure ".$file;
    }
} else {
    echo "cannot open ".$file;
}
fclose($csv);

echo $line_count." rows \n";
$end = time();
echo date("Y-m-d H:i:s", $end)."\n";
$time = number_format((($end - $start)/60), 2);
echo $time."\n";
echo "peak memory usages ".memory_get_peak_usage(true)."\n";
It is not actually an "opening" problem but rather a processing problem.
I am sure you don't need to keep all the parsed lines in memory the way you currently do.
Why not just put each parsed line wherever it belongs: a database, another file, anything?
That way your code keeps as little as one line in memory at a time.
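A minimal sketch of that streaming shape (the small input file and the processed.csv destination are made up for the demo; in the question this would be the 2 GB myfile.csv, and the fputcsv() call is where a database insert would go instead):

```php
<?php
$inPath  = sys_get_temp_dir() . '/myfile.csv';
$outPath = sys_get_temp_dir() . '/processed.csv';
file_put_contents($inPath, "Make,Model\nChevy,1500\nChevy,2500\n"); // demo data

$in  = fopen($inPath, 'r');
$out = fopen($outPath, 'w');
$rows = 0;
while (($csv_line = fgetcsv($in, 0, ',', '"')) !== false) {
    // ...per-row processing (or a database insert) goes here...
    fputcsv($out, $csv_line); // only the current row is held in memory
    $rows++;
}
fclose($in);
fclose($out);
```

Peak memory stays roughly at the size of the longest row, regardless of how big the file grows.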
As others have already pointed out, you're loading the whole 2 GB file into memory. On top of that you create an array of strings out of each line, so the memory actually needed is more than the plain file size.
You might want to process each row of the CSV file separately, ideally with an iterator, for example one that returns each line as a keyed array:
$csv = new CSVFile('../data/test.csv');
foreach ($csv as $line) {
var_dump($line);
}
Exemplary output here:
array(3) {
["Make"]=> string(5) "Chevy"
["Model"]=> string(4) "1500"
["Note"]=> string(6) "loaded"
}
array(3) {
["Make"]=> string(5) "Chevy"
["Model"]=> string(4) "2500"
["Note"]=> string(0) ""
}
array(3) {
["Make"]=> string(5) "Chevy"
["Model"]=> string(0) ""
["Note"]=> string(6) "loaded"
}
This iterator is inspired by one that's built into PHP, called SplFileObject. As this is an iterator, you decide what you do with each line's/row's data. See the related question: Process CSV Into Array With Column Headings For Key
class CSVFile extends SplFileObject
{
    private $keys;

    public function __construct($file)
    {
        parent::__construct($file);
        $this->setFlags(SplFileObject::READ_CSV);
    }

    public function rewind()
    {
        parent::rewind();
        $this->keys = parent::current();
        parent::next();
    }

    public function current()
    {
        return array_combine($this->keys, parent::current());
    }

    public function getKeys()
    {
        return $this->keys;
    }
}
PHP is really the wrong language for this. String manipulation usually results in copies of strings being allocated in memory, and that memory is only freed when the last reference goes away, which can be long after it is really no longer needed. If you know how to do it, and it fits the execution environment, you'd be better off with Perl or sed/awk.
Having said this, there are two memory hogs in the script. The first is the foreach, which copies the array; do a foreach over array_keys() and refer back to the string entry in the array to get at the lines. The second is the one referred to by @YourCommonSense: you should design your algorithm so it works in streaming mode (i.e. not requiring the full dataset to be stored in memory). At a cursory glance, it seems feasible.