I am developing a PHP application where I need to fetch 5 random email addresses from a CSV file and send them to a user.
I have worked with CSV files many times, but I don't know how to fetch a limited number of rows at random.
NOTE: the CSV file has more than 200k emails.
If anyone has an idea or suggestion, please let me know.
If the CSV is too big and won't be saved in a DB
You'll have to loop through all of the rows in the CSV once to count them.
You'll have to call a random-number generator function (rand, mt_rand, others...), parametrized to output numbers from 0 to $count - 1, and call it 5 times (to get 5 numbers).
You'll have to loop through all of the rows in the CSV again and only copy the necessary information for the rows whose number matches one of the randomly generated values.
Nota bene: don't use file_get_contents with str_getcsv. Instead use fopen with fgetcsv. The first approach loads the entire file into memory, which we don't want to do; the second only reads the file line by line.
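A minimal sketch of that two-pass approach, assuming a hypothetical emails.csv with the address in the first column:
<?php
// Pass 1: count the rows without loading the whole file into memory.
$handle = fopen('emails.csv', 'r');
$count = 0;
while (fgetcsv($handle) !== false) {
    $count++;
}

// Pick 5 distinct row numbers between 0 and $count - 1.
$wanted = [];
while (count($wanted) < 5) {
    $wanted[mt_rand(0, $count - 1)] = true;
}

// Pass 2: collect only the rows whose number was picked.
rewind($handle);
$emails = [];
$rowNumber = 0;
while (($row = fgetcsv($handle)) !== false) {
    if (isset($wanted[$rowNumber])) {
        $emails[] = $row[0]; // assuming the email is the first column
    }
    $rowNumber++;
}
fclose($handle);

print_r($emails);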
If the CSV is too big and will be saved in a DB
Loop through the CSV rows and insert each record into the DB.
Use a select query with LIMIT 5 and ORDER BY RAND().
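A rough sketch of that import-then-query route using PDO; the emails table, its address column, and the connection details are assumptions:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// Import: insert each CSV row into an emails table (assumed to exist).
$handle = fopen('emails.csv', 'r');
$insert = $pdo->prepare('INSERT INTO emails (address) VALUES (?)');
while (($row = fgetcsv($handle)) !== false) {
    $insert->execute([$row[0]]);
}
fclose($handle);

// Query: let the database pick 5 random rows.
$stmt = $pdo->query('SELECT address FROM emails ORDER BY RAND() LIMIT 5');
print_r($stmt->fetchAll(PDO::FETCH_COLUMN));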
If the CSV is small enough to fit into memory
Loop through the CSV rows and create an array holding all of them.
You'll have to call a random-number generator function (rand, mt_rand, others...), parametrized to output numbers from 0 to the array count minus 1, and call it 5 times (to get 5 numbers).
Then retrieve the rows from the big array by their index number, using the randomly generated numbers as indexes.
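A minimal in-memory sketch along those lines, using array_rand to pick 5 distinct indexes in one call (file name and column position are assumptions):
<?php
$rows = [];
$handle = fopen('emails.csv', 'r');
while (($row = fgetcsv($handle)) !== false) {
    $rows[] = $row;
}
fclose($handle);

// array_rand returns 5 distinct random keys from $rows.
$keys = array_rand($rows, 5);
foreach ($keys as $key) {
    echo $rows[$key][0], PHP_EOL; // assuming the email is the first column
}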
If the CSV file is not too big you can load the whole file into an array to get something like
$e[0] = 'someone1@somewhere.com';
$e[1] = 'someone2@somewhere.com';
$e[2] = 'someone3@somewhere.com';
then you can pick a random email with $e[rand(0, count($e) - 1)];
and do this 5 times (with a check for duplicate items), as sketched below.
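A short sketch of that duplicate check; the $e array here is just placeholder data standing in for the loaded CSV:
<?php
// Example data; in practice $e would come from the CSV file.
$e = ['a@x.com', 'b@x.com', 'c@x.com', 'd@x.com', 'e@x.com', 'f@x.com'];

$picked = [];
while (count($picked) < 5) {
    $email = $e[rand(0, count($e) - 1)];
    if (!in_array($email, $picked, true)) {
        $picked[] = $email; // only keep emails we have not picked yet
    }
}
print_r($picked);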
Read all emails from the CSV, then select 5 random emails from the email array.
To select 5 random entries use the array_rand function.
$email = array('test@test.com','test2@test.com','test3@test.com','test4@test.com','test5@test.com');
$keys = array_rand($email, 5); // array_rand returns keys, not the emails themselves
print_r(array_intersect_key($email, array_flip($keys))); // the 5 random emails
for a large number of emails try something like
$max = count($email);
$email_rand = array();
for ($i = 0; $i < 5; $i++) {
    $a = mt_rand(0, $max - 1); // valid indexes run from 0 to count - 1
    $email_rand[] = $email[$a];
}
print_r($email_rand); // note: this can pick the same email more than once
<?php
// Read every row of the CSV into an array first; fgetcsv() only returns one row per call.
$handle = fopen('test.csv', 'r');
$csv = array();
while (($row = fgetcsv($handle)) !== false) {
    $csv[] = $row[0]; // assuming the email is in the first column
}
fclose($handle);

function randomMail($key)
{
    global $csv;
    return $csv[$key];
}

$randomKeys = array_rand($csv, 5);
print_r(array_map('randomMail', $randomKeys));
This is a small utility to achieve what you want; change the declaration of randomMail as you see fit.
for ($i = 0; $i < 5; $i++) {
    // $RANDOM is a bash feature; the "$" . "{RANDOM}" split keeps PHP from interpolating it.
    $cmd = "awk NR==$(($" . "{RANDOM} % `wc -l < ~/Downloads/email.csv` + 1)) ~/Downloads/email.csv >> listemail.txt";
    $rs = exec($cmd);
}
Afterwards, read the selected emails from listemail.txt. (Note that bash's $RANDOM only goes up to 32767, so with 200k+ rows the later lines will never be picked.)
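A minimal way to read those collected addresses back in PHP:
<?php
// Read the collected addresses into an array, one per line, dropping blank lines.
$emails = array_filter(array_map('trim', file('listemail.txt')));
print_r($emails);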
Background
I'm trying to complete a code challenge where I need to refactor a simple PHP application that accepts a JSON file of people, sorts them by registration date, and outputs them to a CSV file. The provided program is already functioning and works fine with a small input but intentionally fails with a large input. In order to complete the challenge, the program should be modified to be able to parse and sort a 100,000 record, 90MB file without running out of memory, like it does now.
In its current state, the program uses file_get_contents(), followed by json_decode(), and then usort() to sort the items. This works fine with the small sample data file, but not with the large one, where it runs out of memory.
The input file
The file is in JSON format and contains 100,000 objects. Each object has a registered attribute (example value 2017-12-25 04:55:33) and this is how the records in the CSV file should be sorted, in ascending order.
My attempted solution
Currently, I've used the halaxa/json-machine package, and I'm able to iterate over each object in the file. For example
$people = \JsonMachine\JsonMachine::fromFile($fileName);
foreach ($people as $person) {
// do something
}
Reading the whole file into memory as a PHP array is not an option, as it takes up too much memory, so the only solution I've been able to come up with so far has been iterating over each object in the file, finding the person with the earliest registration date and printing that. Then, iterating over the whole file again, finding the next person with the earliest registration date and printing that etc.
The big issue with that is the nested loops: a loop which runs 100,000 times containing a loop that runs 100,000 times. It's not a viable solution, and that's the furthest I've gotten.
How can I parse, sort, and print to CSV, a JSON file with 100,000 records? Usage of packages / services is allowed.
I ended up importing into MongoDB in chunks and then retrieving in the correct order to print
Example import:
$collection = (new Client($uri))->collection->people;
$collection->drop();

$people = JsonMachine::fromFile($fileName);

$chunk = [];
$chunkSize = 5000;
$personNumber = 0;

foreach ($people as $person) {
    $personNumber += 1;
    $chunk[] = $person;
    if ($personNumber % $chunkSize == 0) { // Chunk is full
        $collection->insertMany($chunk);
        $chunk = [];
    }
}

// The very last chunk was not filled to the max, but we still need to import it
if (count($chunk)) {
    $collection->insertMany($chunk);
}

// Create an index for quicker sorting
$collection->createIndex([ 'registered' => 1 ]);
Example retrieve:
$results = $collection->find(
    [],
    [
        'sort' => ['registered' => 1],
    ]
);

// For every person...
foreach ($results as $person) {
    // For every attribute...
    foreach ($person as $key => $value) {
        if ($key != '_id') { // No need to include the new MongoDB ID
            echo some_csv_encode_function($value) . ',';
        }
    }
    echo PHP_EOL;
}
I have a large number of one-dimensional arrays stored in a memory cache and I want to write them to CSV files; the arrays arrive one by one through a queue. I want to limit each CSV file to about 100 rows and then write the remaining arrays into new files, and so on.
I would highly appreciate any help with this.
I used the code below to write arrays into a CSV, but I don't know how to limit the number of rows to 100 and then open new files and write to them.
The messages passed in the queue are keys named SO1, SO2, SO3 and so on, with the last message being "LAST". Based on these keys, the arrays associated with them are read from memcache and have to be written into CSV files. The messages arrive one after another via a RabbitMQ queue from a preceding module.
// Assuming $SO1 is an array fetched from memcache based on the key, say SO1, received via a queue.
$SO1 = array('Name' => 'Ubaid', 'Age' => '24', 'Gender' => 'Male', 'Lunch' => 'Yes', 'Total' => '1000');

$row_count = 0;
$csv_file_count = 1;
while ($msg != "LAST") { // As long as the message received is not LAST
    $csv = fopen("file_" . $csv_file_count . ".csv", "w");
    fputcsv($csv, array_keys($SO1));
    while ($msg != "LAST" && $row_count < 100) {
        fputcsv($csv, $SO1); // Write to CSV
        $row_count++;
    }
    $row_count = 0;
    $csv_file_count++;
    fclose($csv);
}
You could make a counter like this.
$row_count = 0;
$csv_file_count = 1;
while (!$q->isEmpty()) { // as long as the queue is not empty
    $csv = fopen("file_" . $csv_file_count . ".csv", "w"); // open "file_n.csv", n = file number
    fputcsv($csv, explode(',', "col1,col2,col3,col4")); // Your custom headers
    while (!$q->isEmpty() && $row_count < 100) { // while the queue is not empty and the counter hasn't reached 100
        fputcsv($csv, explode(',', $q->pop())); // write to file; explode by comma, space, or whatever your data looks like
        $row_count++; // increment row counter
    }
    $row_count = 0; // when that is no longer true, reset the row counter
    $csv_file_count++; // increment file counter
    fclose($csv); // close the file
} // repeats until the queue is empty
Updated to use fputcsv()
If you want another separator in your CSV file you can do it like this:
fputcsv($csv, explode(',', $q->pop()), ';'); // semicolon, for example. Default is comma (,)
You can also specify a field enclosure
fputcsv($csv, explode(',', $q->pop()), ',', "'"); // single quote, for example. Default is double quote (")
fputcsv() takes 2 required parameters and 3 optional ones
From php.net fputcsv
int fputcsv ( resource $handle , array $fields [, string $delimiter = "," [, string $enclosure = '"' [, string $escape_char = "\\" ]]] )
The fields argument must be an array, hence explode(',', $q->pop()) as the 2nd parameter
in "file.txt" i have some numbers like 1234,123456,12345678 etc.
so i was wondering how i can get dynamically just one element, for example echo just 12345,then echo 123456, but one by one ???when user comes i want to show him one element,and then next time i want to show another element on the page, it would be good also if i could erase elements that i have echo...When i manually enter position it works... Please help...
I have a following code:
function test($n) {
    $file = file_get_contents("file.txt");
    $array = explode(",", $file);
    return print_r($array[$n]);
}
The solution will require deleting the number from your filesystem, or marking it as already shown either on the filesystem or in a database, after the number gets echoed.
Shuffling the numbers or generating a random index will never quite be foolproof, since the same number could be shown again.
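For example, a rough sketch of the mark-as-shown idea using a second file (shown.txt is just a name made up here) rather than deleting from file.txt:
<?php
// Positions already shown are remembered in shown.txt.
$numbers = explode(',', file_get_contents('file.txt'));
$shown = file_exists('shown.txt')
    ? array_filter(explode(',', file_get_contents('shown.txt')), 'strlen')
    : [];

$remaining = array_diff(array_keys($numbers), $shown);
if (empty($remaining)) {
    die('All numbers have been shown.');
}

// Pick one of the positions that has not been shown yet.
$index = $remaining[array_rand($remaining)];
echo $numbers[$index];

// Remember that this position has now been shown.
$shown[] = $index;
file_put_contents('shown.txt', implode(',', $shown));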
First of all, this is usually a terrible way of doing this.
It's much better to use a database, for example.
But you could do the following:
<?php
function getRandomNumber() {
    // retrieve and parse the numbers
    $numbers = file_get_contents('file.txt');
    $numbers = explode(',', $numbers);

    // select a random index
    $randomIndex = array_rand($numbers);

    // get the chosen number
    $randomNumber = $numbers[$randomIndex];

    // remove it from the array
    unset($numbers[$randomIndex]);

    // save the updated file
    file_put_contents('file.txt', implode(',', $numbers));

    return $randomNumber;
}
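A quick usage example:
<?php
// Each call returns one number and removes it from file.txt,
// so the next call (or page load) shows a different number.
echo getRandomNumber();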
I have a file with the size of around 10 GB or more. The file contains only numbers ranging from 1 to 10 on each line and nothing else. Now the task is to read the data[numbers] from the file and then sort the numbers in ascending or descending order and create a new file with the sorted numbers.
Can anyone help me with this?
I'm assuming this is some kind of homework and the goal is to sort more data than you can hold in your RAM?
Since you only have the numbers 1-10, this is not that complicated a task. Just open your input file and count how many occurrences of each specific number you have. After that you can construct a simple loop and write the values into another file. The following example is pretty self-explanatory.
$inFile = '/path/to/input/file';
$outFile = '/path/to/output/file';

$input = fopen($inFile, 'r');
if ($input === false) {
    throw new Exception('Unable to open: ' . $inFile);
}

// $map will be an array of size 10, keyed 1 to 10 and filled with 0-s
$map = array_fill(1, 10, 0);

// Read the file line by line and count how many of each specific number you have
while (($line = fgets($input)) !== false) {
    $line = trim($line);
    if ($line === '') {
        continue; // skip blank lines such as a trailing newline
    }
    $map[(int) $line]++;
}
fclose($input);

$output = fopen($outFile, 'w');
if ($output === false) {
    throw new Exception('Unable to open: ' . $outFile);
}

/*
 * Reverse the array if you need to change direction between
 * ascending and descending order (true preserves the number => count keys)
 */
//$map = array_reverse($map, true);

// Write the values into your output file
foreach ($map as $number => $count) {
    $string = ((string) $number) . PHP_EOL;
    for ($i = 0; $i < $count; $i++) {
        fwrite($output, $string);
    }
}
fclose($output);
Taking into account the fact that you are dealing with huge files, you should also check the script execution time limit for your PHP environment. The example above will take VERY long for 10GB+ sized files, but since I didn't see any limitations concerning execution time and performance in your question, I'm assuming that is OK.
I had a similar issue before. Trying to manipulate such a large file ended up being a huge drain on resources and it couldn't cope. The easiest solution I ended up with was to import it into a MySQL database using the fast bulk-loading statement LOAD DATA INFILE
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
Once it's in you should be able to manipulate the data.
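A rough sketch of that route from PHP; the numbers table, its value column, and the connection details are assumptions, and local infile loading must be enabled on both client and server:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,
]);

// Bulk-load the file; each line becomes one row in numbers(value).
$pdo->exec("LOAD DATA LOCAL INFILE '/path/to/numbers.txt'
            INTO TABLE numbers
            LINES TERMINATED BY '\\n' (value)");

// Let MySQL do the sorting, then stream the rows into the output file.
$out = fopen('/path/to/sorted.txt', 'w');
$stmt = $pdo->query('SELECT value FROM numbers ORDER BY value ASC');
foreach ($stmt as $row) {
    fwrite($out, $row['value'] . PHP_EOL);
}
fclose($out);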
Alternatively, you could just read the file line by line while outputting the result into another file line by line with the sorted numbers. Not too sure how well this would work though.
Have you had any previous attempts at it or are you just after a possible method of doing it?
If that's all, you don't need PHP (if you have a Linux machine at hand):
sort -n file > file_sorted-asc
sort -nr file > file_sorted-desc
Edit: OK, here's your solution in PHP (if you have a Linux machine at hand):
<?php
// Sort ascending
`sort -n file > file_sorted-asc`;
// Sort descending
`sort -nr file > file_sorted-desc`;
?>
:)
I am writing a PHP script that will parse through a file (synonyms.dat) and coordinate a list of synonyms with their parent word, for about 150k words.
Example from file:
1|2
(adj)|one|i|ane|cardinal
(noun)|one|I|ace|single|unity|digit|figure
1-dodecanol|1
(noun)|lauryl alcohol|alcohol
1-hitter|1
(noun)|one-hitter|baseball|baseball game|ball
10|2
(adj)|ten|x|cardinal
(noun)|ten|X|tenner|decade|large integer
100|2
(adj)|hundred|a hundred|one hundred|c|cardinal
(noun)|hundred|C|century|one C|centred|large integer
1000|2
(adj)|thousand|a thousand|one thousand|m|k|cardinal
(noun)|thousand|one thousand|M|K|chiliad|G|grand|thou|yard|large integer
**10000|1
(noun)|ten thousand|myriad|large**
In the example above I want to link ten thousand, myriad, large to the word 10000.
I have tried various methods of reading the .dat file into memory using file_get_contents and then exploding the file at \n, and using various array search techniques to find the 'parent' word and its synonyms. However, this is extremely slow, and more often than not crashes my web server.
I believe what I need to do is use preg_match_all to explode the string, and then just iterate over the string, inserting into my database where appropriate.
$contents = file_get_contents($page);
preg_match_all("/([^\s]+)\|[0-9].*/",$contents,$out, PREG_SET_ORDER);
This matches each
1|2
1-dodecanol|1
1-hitter|1
But I don't know how to link the fields in between each match, i.e. the synonyms themselves.
This script is intended to be run once, to get all the information into my database appropriately. For those interested, I have a table 'synonym_index' which holds a unique id for each word, as well as the word itself, and another table 'synonym_listing' which contains a 'word_id' column and a 'synonym_id' column, where each column is a foreign key to synonym_index. There can be multiple synonym_id's for each word_id.
Your help is greatly appreciated!
You can use explode() to split each line into fields. (Or, depending on the precise format of the input, fgetcsv() might be a better choice.)
Illustrative example, which will almost certainly need adjustment for your specific use case and data format:
$infile = fopen('synonyms.dat', 'r');

while (!feof($infile)) {
    $line = rtrim(fgets($infile), "\r\n");
    if ($line === '') {
        continue;
    }

    // Line follows the format HEAD_WORD|NUMBER_OF_SYNONYM_LINES
    list($headWord, $n) = explode('|', $line);

    $synonyms = array();

    // For each synonym line...
    while ($n--) {
        $line = rtrim(fgets($infile), "\r\n");
        $fields = explode('|', $line);
        $partOfSpeech = substr(array_shift($fields), 1, -1);
        $synonyms[$partOfSpeech] = $fields;
    }

    // Now here, when $headWord is '**10000', $synonyms should be array(
    //     'noun' => array('ten thousand', 'myriad', 'large**')
    // )
}
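To then get the parsed data into the tables described in the question, here is a rough sketch of the insert step that would go where the final comment sits; the wordId() helper, the column names, and the PDO connection details are all assumptions:
<?php
// Assumed connection; adjust credentials as needed.
$pdo = new PDO('mysql:host=localhost;dbname=synonyms', 'user', 'pass');

// Hypothetical helper: insert a word into synonym_index if needed and return its id.
function wordId(PDO $pdo, $word) {
    $stmt = $pdo->prepare('SELECT id FROM synonym_index WHERE word = ?');
    $stmt->execute([$word]);
    $id = $stmt->fetchColumn();
    if ($id !== false) {
        return (int) $id;
    }
    $pdo->prepare('INSERT INTO synonym_index (word) VALUES (?)')->execute([$word]);
    return (int) $pdo->lastInsertId();
}

// Inside the reading loop, once $headWord and $synonyms are known:
$wordId = wordId($pdo, $headWord);
foreach ($synonyms as $partOfSpeech => $words) {
    foreach ($words as $synonym) {
        $pdo->prepare('INSERT INTO synonym_listing (word_id, synonym_id) VALUES (?, ?)')
            ->execute([$wordId, wordId($pdo, $synonym)]);
    }
}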
Wow, for this type of functionality you have databases, with tables and indexes.
PHP is meant to serve a request/response, not to read a big file into memory. I advise you to put the data in a database. That will be much faster, and that is exactly what databases are made for.