CakePHP - import 10k lines of email addresses - PHP

I have a script that reads a txt file with about 10k lines, each containing an email address.
For each address I check whether it is already in use; if it is, I put the address into an error array, otherwise into a save array.
After the foreach, if no addresses ended up in the error array, I do a $this->Newsletter->saveMany($data).
For some reason I always get a timeout when I import more than about 500 lines.
Is there a better way to do this and avoid the timeout?
Please advise!
public function import() {
    $filename = './files/newsletterImport/newsletter.txt';
    if (file_exists($filename)) {
        $lines = file($filename);
        $error = array();
        $data = array();
        foreach ($lines as $line_num => $line) {
            // check whether the address is already in use
            if ($email = $this->Newsletter->find('first', array('conditions' => array('email' => trim($line))))) {
                $error[$line_num]['email'] = trim($line);
                $error[$line_num]['cancel'] = date('d.m.Y H:i:s', strtotime($email['Newsletter']['cancel']));
            } else {
                $data[$line_num]['Newsletter']['email'] = trim($line);
                $data[$line_num]['Newsletter']['active'] = 1;
            }
        }
        if (empty($error)) {
            $this->Newsletter->create();
            if ($this->Newsletter->saveMany($data)) {
                $this->set('msg', 'Success');
            } else {
                $this->set('msg', 'Error! Nothing imported!');
            }
        } else {
            $this->set('msg', 'Error! Nothing imported!');
            $this->set('error', $error);
        }
    } else {
        $this->set('msg', 'No file found!');
    }
}

Since you don't really know where you time out, a few tips.
1) Measure the time for every bit of code, so you know which part is eating most of the available time.
2) You run one query against the DB for each line; that's 10k queries... My heart hurts. Consider doing one big query to fetch all existing emails and comparing against them in PHP to check uniqueness (see the sketch after this list). Again, time it; maybe the per-line SQL turns out to be more efficient, depending on how you search the PHP array.
3) Try adjusting the options of the save. The atomic option is true by default, and that is probably a bit too much for a 10k-row save.
$this->Newsletter->saveMany($data, array('atomic' => false));
If that does not solve the problem, try dividing the array and saving it in stages. I know it's a pain, but maybe the transaction is just too much.
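To illustrate 2) and 3) together, here is a minimal sketch, assuming CakePHP 2.x model conventions; the chunk size of 500 is an arbitrary starting point you would need to tune:

// One query instead of 10k: fetch every existing address once.
$emails = $this->Newsletter->find('list', array(
    'fields' => array('Newsletter.id', 'Newsletter.email')
));
$existing = array_flip($emails); // email => id, so isset() lookups are O(1)

$data = array();
foreach ($lines as $line) {
    $email = trim($line);
    if (!isset($existing[$email])) {
        $data[] = array('Newsletter' => array('email' => $email, 'active' => 1));
    }
}

// Save in chunks instead of one giant atomic transaction.
foreach (array_chunk($data, 500) as $chunk) {
    $this->Newsletter->saveMany($chunk, array('atomic' => false));
}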

Related

How to use fopen, fgets, fclose properly? It works fine but later the count sometimes just goes down to a random number

It works fine, but later the count sometimes just goes down to a random number. My guess is that my code cannot process multiple visits at a time.
Below is where the increment happens and where the count is displayed:
<?php
$args_loveteam = array('child_of' => 474);
$loveteam_children = get_categories($args_loveteam);
if (in_category('loveteams', $post->ID)) {
    foreach ($loveteam_children as $loveteam_child) {
        $post_slug = $loveteam_child->slug;
        echo "<script>console.log('".$post_slug."');</script>";
        if (in_category($loveteam_child->name)) {
            /* counter */
            // opens file to read saved hit number
            if ($loveteam_child->slug == "loveteam-mayward") {
                $datei = fopen($_SERVER['DOCUMENT_ROOT']."/wp-content/themes/inside-showbiz-Vfeb13.ph-updated/countlog-".$post_slug."-2.txt", "r");
            } else {
                $datei = fopen($_SERVER['DOCUMENT_ROOT']."/wp-content/themes/inside-showbiz-Vfeb13.ph-updated/countlog-".$post_slug.".txt", "r");
            }
            $count = fgets($datei, 1000);
            fclose($datei);
            $count = $count + 1;
            // opens file to change new hit number
            if ($loveteam_child->slug == "loveteam-mayward") {
                $datei = fopen($_SERVER['DOCUMENT_ROOT']."/wp-content/themes/inside-showbiz-Vfeb13.ph-updated/countlog-".$post_slug."-2.txt", "w");
            } else {
                $datei = fopen($_SERVER['DOCUMENT_ROOT']."/wp-content/themes/inside-showbiz-Vfeb13.ph-updated/countlog-".$post_slug.".txt", "w");
            }
            fwrite($datei, $count);
            fclose($datei);
        }
    }
}
?>
I would at least change your code to this
foreach ($loveteam_children as $loveteam_child) {
    $post_slug = $loveteam_child->slug;
    echo "<script>console.log('".$post_slug."');</script>";
    // keep the "-2" file for "loveteam-mayward", as in your original code
    if ($loveteam_child->slug == "loveteam-mayward") {
        $filename = "{$_SERVER['DOCUMENT_ROOT']}/wp-content/themes/inside-showbiz-Vfeb13.ph-updated/countlog-{$post_slug}-2.txt";
    } else {
        $filename = "{$_SERVER['DOCUMENT_ROOT']}/wp-content/themes/inside-showbiz-Vfeb13.ph-updated/countlog-{$post_slug}.txt";
    }
    $count = (int) file_get_contents($filename);
    file_put_contents($filename, $count + 1, LOCK_EX);
}
You could also try flock on the file to get a lock before modifying it. That way, if another process comes along, it has to wait on the first one. But file_put_contents with LOCK_EX works great for things like logging where you may have many processes competing for the same file.
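For illustration, a minimal sketch of a locked read-modify-write on a counter file; $filename is just a placeholder:

$fp = fopen($filename, 'c+'); // create if missing, don't truncate
if (flock($fp, LOCK_EX)) {    // block until we hold an exclusive lock
    $count = (int) stream_get_contents($fp);
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, $count + 1);
    fflush($fp);
    flock($fp, LOCK_UN);      // release the lock
}
fclose($fp);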
A database should be OK, but even that may not be fast enough. It shouldn't mess up your data, though.
Anyway, hope it helps. This is kind of an odd question; concurrency can be a real pain if you have a high chance of process collisions, race conditions, etc.
However, as I mentioned in the comments, using the filesystem is probably not going to provide the consistency you need. The best fit for this may be some kind of in-memory storage such as Redis. But that is hard to say without fully knowing what you use it for, for example whether it should persist across server reboots.
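If Redis were an option, the whole counter collapses into one atomic operation; a minimal sketch, assuming the phpredis extension and a local server:

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
// INCR is atomic, so concurrent visitors cannot lose updates
$count = $redis->incr('countlog-' . $post_slug);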
Hope it helps, good luck.

PHP writing into text file while looping

I'm developing an app where users upload an Excel [.xlsx] file for dumping data into a MySQL database. I have programmed it in such a way that a LOG is created for each import, so that the user can see whether any errors occurred. My script was working perfectly before implementing the log system.
After implementing the log system I can see duplicate rows inserted into the database. Also, the die() command is not working.
It just keeps looping continuously!
I have written sample code below. Please tell me what's wrong with my logging method.
Note: if I remove the logging [writing into the file], the script works correctly.
$file = fopen("20131105.txt", "a");
fwrite($file, "LOG CREATED".PHP_EOL);
foreach ($hdr as $k => $v) {
    $username = $v['un'];
    $address  = $v['adr'];
    $message  = $v['msg'];
    if ($username == '') {
        fwrite($file, 'Error: Missing User Name'.PHP_EOL);
        continue;
    } else {
        // insert into database
    }
}
fwrite($file, PHP_EOL."LOG CLOSED");
fclose($file);
echo 1;
die();
First, your die statement is after your loop. It needs to be inside the loop if you want it to end the loop early.
Second, you're looping over $hdr, which is not defined in your snippet. It has to be an array. What does it contain?
var_dump($hdr);
The documentation for foreach in the PHP manual highlights:
"Reference of a $value and the last array element remain even after the foreach loop. It is recommended to destroy it by unset()."[1]
Try unsetting the values in the foreach using unset($value). This might be the reason for the duplicate values.
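To illustrate the pitfall the manual is warning about (it only applies when the loop takes $value by reference), a minimal sketch:

$arr = array('a', 'b', 'c');
foreach ($arr as &$value) {
    // ... process $value ...
}
unset($value); // without this, $value still references $arr[2]

// If the unset() is skipped, a later by-value loop corrupts the array:
// each iteration assigns to $value, i.e. writes into $arr[2],
// leaving $arr as ('a', 'b', 'b').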

php fgetcsv - read multiple lines, not only one or all

I want to read big CSV files and insert them into a database. That already works:
if (($handleF = fopen($path."\\".$file, 'r')) !== false) {
    $i = 1;
    // loop through the file line-by-line
    while (($dataRow = fgetcsv($handleF, 0, ";")) !== false) {
        // Only start at the startRow, otherwise skip the row.
        if ($i >= $startRow) {
            // Check if to use headers
            if ($lookAtHeaders == 1 && $i == $startRow) {
                $this->createUberschriften(array_map(array($this, "convert"), $dataRow));
            } else {
                $dataRow = array_map(array($this, "convert"), $dataRow);
                $data = $this->changeMapping($dataRow, $startCol);
                $this->executeInsert($data, $tableFields);
            }
            unset($dataRow);
        }
        $i++;
    }
    fclose($handleF);
}
My problem with this solution is that it's very slow, but the files are too big to load directly into memory. So I want to ask: is there a possibility to read, for example, 10 lines at a time into the $dataRow array, not only one or all?
I want to get a better balance between memory and performance.
Do you understand what I mean? Thanks for the help.
Greetz
V
EDIT:
OK, I still had to find a solution with the MSSQL database. My solution was to stack the data and then run a multi-row MSSQL INSERT:
$dataStack = array(); // initialize the stack before the loop
while (($dataRow = fgetcsv($handleF, 0, ";")) !== false) {
    // Only start at the startRow, otherwise skip the row.
    if ($i >= $startRow) {
        // Check if to use headers
        if ($lookAtHeaders == 1 && $i == $startRow) {
            $this->createUberschriften(array_map(array($this, "convert"), $dataRow));
        } else {
            $dataRow = array_map(array($this, "convert"), $dataRow);
            $data = $this->changeMapping($dataRow, $startCol);
            $this->setCurrentRow($i);
            if (count($dataStack) > 210) {
                array_push($dataStack, $data);
                #echo '<pre>', print_r($dataStack), '</pre>';
                $this->executeInsert($dataStack, $tableFields, true);
                // reset the stack
                unset($dataStack);
                $dataStack = array();
            } else {
                array_push($dataStack, $data);
            }
            unset($data);
        }
    }
    $i++; // increment outside the startRow check so the row counter always advances
    unset($dataRow);
}
Finally I had to loop over the stack in the method "executeInsert" and build a multiple INSERT, to create a query like this:
INSERT INTO [myTable] (field1, field2) VALUES ('data1', 'data2'), ('data2', 'data3')...
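For illustration, a rough sketch of what such an executeInsert might look like; this is hypothetical, assuming a PDO connection in $this->db and rows of equal length in $dataStack:

// Hypothetical sketch of the multi-row branch of executeInsert.
private function executeInsert($dataStack, $tableFields, $multiple = false) {
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($tableFields), '?')) . ')';
    $sql = 'INSERT INTO [myTable] (' . implode(', ', $tableFields) . ') VALUES '
         . implode(', ', array_fill(0, count($dataStack), $rowPlaceholder));
    $values = array();
    foreach ($dataStack as $row) {
        foreach ($row as $value) {
            $values[] = $value; // flatten the rows into one bound-parameter list
        }
    }
    $this->db->prepare($sql)->execute($values);
}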
That works much better. I still have to find the best balance, but for that I only need to change the value '210' in the code above. I hope this helps everybody with a similar problem.
Attention: don't forget to execute the method "executeInsert" once more after reading the complete file, because some data can still be left in the stack, and the method is only executed when the stack reaches a size of 210.
Greetz
V
I think your bottleneck is not reading the file, which is a text file; your bottleneck is the INSERT into the SQL table.
To verify, just comment out the line that actually does the insert and you will see the difference.
I had this same issue in the past, where I did exactly what you are doing: reading a CSV with 5+ million lines and inserting it into a MySQL table. The execution time was 60 hours, which is unrealistic.
My solution was to switch to another DB technology. I selected MongoDB, and the execution time was reduced to 5 minutes. MongoDB performs really fast in these scenarios and also has a tool called mongoimport that will allow you to import a CSV file directly from the command line.
Give it a try if the DB technology is not a limitation on your side.
Another solution would be splitting the huge CSV file into chunks and then running the same PHP script multiple times in parallel, with each one taking care of the chunks with a specific prefix or suffix in the filename.
I don't know which specific OS you are using, but in Unix/Linux there is a command-line tool called split that will do that for you and will also add any prefix or suffix you want to the filenames of the chunks.

Removing lines in PHP? Is this possible?

I have been struggling to create a simple (really simple) chat system for my website, as my knowledge of JavaScript/AJAX is limited. After gathering resources and help from many kind people I was able to create my simple chat system, but I am left with one problem.
The messages are posted to a file called "msg.html" in this format :
<p><span id="name">$name</span><span id="Msg">$message</span></p>
And then, using PHP and AJAX, I retrieve the messages instantly from the file using the file() function and a foreach() loop within PHP. Here is the code:
<?php
$file = 'msg.html';
$data = file($file);
$max_lines = 20;
if (count($data) > $max_lines) {
    // here I want the data to be deleted from oldest until I only have 20 messages left.
}
foreach ($data as $line_num => $line) {
    echo $line_num . " . " . $line;
}
?>
My question is: how can I delete the oldest messages so that I am only left with the latest 20 messages?
How does something like this seem to you:
$file = 'msg.html';
$data = file($file);
$max_lines = 20;
// new messages are appended, so the latest ones are at the end;
// drop everything before the last $max_lines entries
$skip = max(0, count($data) - $max_lines);
foreach ($data as $line_num => $line) {
    if ($line_num < $skip) {
        unset($data[$line_num]);
    } else {
        echo $line_num . " . " . $line;
    }
}
file_put_contents($file, $data);
http://www.php.net/manual/en/function.file-put-contents.php for more info :)
I suppose you can read the file into an array with file(), chop off everything but the last 20 entries, and write it back, overwriting the old one... Perhaps not the best solution, but one that comes to mind if you really can't use a database as Delan suggested.
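A minimal sketch of that idea, assuming messages are appended to the end of the file:

$lines = file('msg.html');
// array_slice with a negative offset keeps only the last 20 lines
$lines = array_slice($lines, -20);
file_put_contents('msg.html', implode('', $lines));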
That's called round-robin if I recall correctly.
As far as I know, you can't remove arbitrary portions of a file. You need to overwrite the file with the new contents (or create a new file and remove the old one). You could also store messages in individual files but of course that implies up to $max_lines files to read.
You should also use flock() to avoid data corruption. Depending on the platform it's not 100% reliable but it's better than nothing.

Small help saving to txt file

Hello there. I just set up this basic poll, inspired by something I found out there; it's just a basic AJAX poll that saves the results in a text file.
I was wondering, since I do not want users to simply mass-click to advantage/disadvantage the results: I thought about adding a new text file that could simply save the IPs, one on each line, and then check whether the current IP is already logged; if yes, display the results, if not, show the poll.
My lines of code to save the result are:
<?php
$vote = $_REQUEST['vote'];
$filename = "votes.txt";
$content = file($filename);
$array = explode("-", $content[0]);
$yes = $array[0];
$no = $array[1];
if ($vote == 0) {
    $yes = $yes + 1;
}
if ($vote == 1) {
    $no = $no + 1;
}
$insert = $yes."-".$no;
$fp = fopen($filename, "w");
fputs($fp, $insert);
fclose($fp);
?>
So I'd like to know how I could check the IPs, basically in the same way the votes are handled.
I'm not interested in a database; even with the security trade-offs, I'm all right with what I've got.
Thanks for any help!
To stop multiple votes, I'd set a cookie once a user has voted. If the user reloads the page with the voting form on it and has the cookie, you could show just the results, or a "You have already voted." message. Note that this will not stop craftier people from double-voting: all they have to do is remove the saved cookie, and they can re-vote.
Keep in mind though that IPs can be shared so your idea of storing IPs might backfire - people on a shared external-facing IP won't be able to vote, as your system will have registered a previous vote from someone at the same IP address.
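A minimal sketch of the cookie approach; the cookie name 'has_voted' and the 30-day lifetime are just placeholders:

// setcookie() must run before any output is sent
if (isset($_COOKIE['has_voted'])) {
    // show the results / "You have already voted." message
} else {
    // record the vote, then remember it for 30 days
    setcookie('has_voted', '1', time() + 60 * 60 * 24 * 30);
    // show the poll confirmation
}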
The easiest way to write data to a file is
file_put_contents($filename, $data)
and to read data from a file
file_get_contents($filename);
To get the IP address of the user:
$_SERVER['REMOTE_ADDR']
See the PHP manual entries for file_put_contents and file_get_contents for more information.
Here is sample code
<?php
// File path
$file = 'votedips.txt';
// Get user's IP address
$ip = $_SERVER['REMOTE_ADDR'];
// Get data from file (if it exists) or initialize to empty string
$votedIps = file_exists($file) ? file_get_contents($file) : '';
// Split into one IP per line
$ips = explode("\n", $votedIps);
// Strict comparison: the first IP sits at index 0, which is falsy
if (array_search($ip, $ips) !== false) {
    // USER VOTED
} else {
    $ips[] = $ip;
}
// Write data to file
$data = implode("\n", $ips);
file_put_contents($file, $data);
?>
You can use file_get_contents to read the file's content into a variable and then use the strpos function to check whether the IP exists in that variable.
For example:
$ipfile = file_get_contents('ip.txt');
if (strpos($ipfile, $_SERVER['REMOTE_ADDR']) !== false) {
    // show the results
} else {
    // show the poll
}
Be careful with storing IPs in a text file and then using file_get_contents() and similar functions for loading/parsing the data. As an absolute worst case, assuming every possible IP address used your system to vote, you'd end up with a text file many gigabytes in size, and you'd exceed PHP's memory_limit very quickly.
