Email alerts on saved searches: procedure and safety/performance tips & tricks? - php

I built an email alert feature for my users (currently only 2,000).
Every night a crontab entry executes a PHP script that queries MySQL to find matches with each user's saved search. It's a classifieds website in my case, but I would like to learn how to do this in case I have to build something for bigger clients.
My concerns are:
What happens if my user base grows 10x or 100x? Is the server going to crash? Are there any tips you can suggest for managing something like that?
Is there any way to protect my file cron/nightly_script.php from being executed from outside by calling its URL in a browser? Consider that I'm using a line in crontab like:
lynx [absolute url/script.php]
What about the email blast? For each query that has results, the script sends an email, so it means a blast of emails... is it going to be considered spam automatically, meaning I could get blacklisted?
Thanks!!!

What happens if my user base grows 10x or 100x? Is the server going to crash? Are there any tips you can suggest for managing something like that?
Your server could crash or slow to a crawl because of excessive memory/CPU usage. You should use a message queue like Redis/Beanstalkd/Gearman to throttle your email alerts. My preference goes to Redis: use blocking pop/push with the Predis library, which supports blocking operations.
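For illustration, a minimal sketch of that pattern, assuming a local Redis server and the Predis library; the queue name email_alerts and the payload fields are made up for the example:

<?php
// Producer/consumer sketch for throttled email alerts via Redis + Predis.
// Assumes `composer require predis/predis` and Redis on localhost.
require 'vendor/autoload.php';

$redis = new Predis\Client('tcp://127.0.0.1:6379');

// Producer (the nightly matcher): enqueue one job per user with matches.
$redis->rpush('email_alerts', json_encode(array(
    'email'   => 'user@example.com',      // hypothetical payload
    'matches' => array(101, 202),
)));

// Consumer (a long-running worker): block until a job arrives, then send.
while (true) {
    // blpop() waits up to 30s and returns array(queueName, payload) or null.
    $item = $redis->blpop(array('email_alerts'), 30);
    if ($item === null) {
        continue; // timed out; loop and wait again
    }
    $job = json_decode($item[1], true);
    mail($job['email'], 'New matches for your saved search', 'See the site.');
    usleep(200000); // ~5 mails/second: this is where you throttle yourself
}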
Is there any way to protect my file cron/nightly_script.php from being executed from outside by calling its URL in a browser? Consider that I'm using a line in crontab like:
Don't use cron if you want to scale. Instead, create a couple of daemons:
one to schedule messages onto the queue (this part could also remain cron),
one to process messages from the queue.
Daemons don't need to be spawned each time, and spawning processes is (relatively) expensive. Second, your crontab should not call any URL anymore; instead, invoke the PHP script directly (CLI).
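As a sketch of that last point, here is a guard you could drop at the top of nightly_script.php (the filename comes from the question) so it refuses to run outside the CLI, plus the kind of crontab line that replaces the lynx call:

<?php
// nightly_script.php: refuse to run unless invoked from the command line.
if (php_sapi_name() !== 'cli') {
    header('HTTP/1.0 403 Forbidden');
    exit('This script can only be run from the CLI.');
}
// ... the nightly matching job continues here

// The crontab entry then becomes (instead of lynx + URL):
// 0 2 * * * /usr/bin/php /path/to/cron/nightly_script.php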
What about the email blast? For each query that has results, the script sends an email, so it means a blast of emails... is it going to be considered spam automatically, meaning I could get blacklisted?
When using a message queue, you can throttle yourself!

Well, you should probably modify your script so that you can spread the load. For example, you could have the cron run 4+ times a day, each time processing a percentage of the user base instead of doing them all in one daily run.
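A rough sketch of that batching idea, assuming a users table with integer ids; the slot number (0-3) would come from four separate crontab entries, and the database name is made up:

<?php
// Process a quarter of the user base per run: `php alerts.php 0` ... `php alerts.php 3`.
$slot = isset($argv[1]) ? (int) $argv[1] : 0;

$pdo  = new PDO('mysql:host=localhost;dbname=classifieds', 'user', 'pass');
$stmt = $pdo->prepare('SELECT id, email FROM users WHERE MOD(id, 4) = :slot');
$stmt->execute(array('slot' => $slot));

foreach ($stmt as $user) {
    // ... run this user's saved searches and email any matches
}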
You can take it out of the web server's public path and put the cron script somewhere that is not accessible externally. It could be executed like this: php /location/of/script.php
I guess it will vary depending on who you send it to, but you should consider how often you send this notice.

Number 1: Monitor the server; watch the load and the time the job takes to run. It shouldn't crash, but you may find you reach a point where the load is too high and requests for web pages start to slow down.
One thing to watch is that PHP's garbage collection can be odd sometimes, so monitor the memory usage of the cron job. If it gets too high, PHP will crash.
If it starts to get to be too much, there are lots of solutions; there is no need to have the web server and the email sending on the same machine, for instance. As long as they can access the same DB, set up a second server just for email sending. This is what cloud computing is perfect for: hire a second server 4 hours a night (or whatever) and turn it off the rest of the time.
That's just one suggestion ... there are many solutions and it really depends on your situation.

As for number 2, the best solution would be to move the script outside the document root so it's not accessible from a browser, and call it directly:
php [location/script.php]
If you can't do that, I would do an IP check and only allow it to be called from the localhost IP.
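If the script has to stay under the web root, a minimal localhost-only check might look like this (note that under the CLI, REMOTE_ADDR isn't set at all, so the guard passes):

<?php
// Allow only requests originating from the server itself.
$ip = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
if ($ip !== '' && !in_array($ip, array('127.0.0.1', '::1'), true)) {
    header('HTTP/1.0 403 Forbidden');
    exit;
}
// ... cron work continues here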
You could also build in safety checks: store when you last sent an email to a particular user and check that before sending another. This would protect against crontab problems as well as hackers.
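A sketch of that safety check, assuming a last_alert_at column on the users table (the column, database and user id here are all made up):

<?php
// Skip any user already alerted in the last 20 hours, then record the send.
$pdo    = new PDO('mysql:host=localhost;dbname=classifieds', 'user', 'pass');
$userId = 42; // example

$check = $pdo->prepare(
    'SELECT 1 FROM users WHERE id = :id
      AND last_alert_at > DATE_SUB(NOW(), INTERVAL 20 HOUR)'
);
$check->execute(array('id' => $userId));
if ($check->fetchColumn()) {
    return; // already alerted recently: guards against double cron runs
}
// ... send the email, then record it:
$pdo->prepare('UPDATE users SET last_alert_at = NOW() WHERE id = :id')
    ->execute(array('id' => $userId));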

Security concerns running cron jobs with financial transactions involved

I'm working on an app written in CakePHP 2.3.8 on an Ubuntu 12.04 server running Apache 2. I'd like to create a cron job to handle a situation that occurs on the first day of every month. Each month users are given a set number of specific tasks they can use; if they go over this limit, they're charged based on the number of tasks they go over by. I'd like to create a cron job to accomplish this, but my concern is someone accessing the URL of the CakePHP action for this specific task, which could then initiate financial transactions.
I read through this writeup from Google about cron jobs, but I'm not quite sure I understand what they're saying about securing URLs.
A cron handler is just a normal handler defined in app.yaml. You can prevent users from accessing URLs used by scheduled tasks by restricting access to administrator accounts. Scheduled tasks can access admin-only URLs. You can restrict a URL by adding login: admin to the handler configuration in app.yaml.
If the URL being accessed is powered by my CakePHP app, how is cron able to determine whether or not an administrator is accessing it? Or am I supposed to write a stand-alone PHP (or whatever language) file to handle these cron jobs, and inside that file it can "talk" to cron to determine if an admin is accessing it?
Say I do use CakePHP to power it. Would it be safe (or rather necessary) to use a long string in the URL so that basically no one would guess it, and have it match that string in the code?
So something like www.mysite.com/url/to/task/jdbpojzm2929qJjfwX82j3zze9iwj919jsfjmmwmwi
And then my code for that job
function cron_called_function($code) {
    if ($code == "jdbpojzm2929qJjfwX82j3zze9iwj919jsfjmmwmwi") {
        // do task
    }
}
Non-public member functions cannot be accessed via the URL. Cake convention says to prefix the method with an underscore.
private function _cron_called_function() { // or protected
    // do task
}
Or perhaps look at creating a shell and setting up a cron job in Cake.
Never use URLs to do these kinds of tasks. It is simply plain wrong, insecure, and can cause your script to die or the server to stop responding.
Let's say you have 10,000 users and a script runtime of 30 sec; it is very likely that the script times out before it finishes, and you end up with only part of your users processed. The other scenario, with a high or infinite script runtime, can lock up your server. Depending on the script or DB actions, it might put the server under high load, and users visiting the site while the script runs will encounter a horribly slow to non-responding site.
Also, you can't really run a loop on a single URL; well, you could redirect from one to another that does the limit and offset thing to simulate a loop over the 100,000 users. If you don't loop over the records but fetch all 100,000 at the same time, it's likely your script dies from running out of memory.
You should create a shell that processes the users in a loop and always just processes batches of, for example, 10, 50 or 100 users.
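A sketch of such a shell for CakePHP 2.x; the class name, model and batch size are illustrative. It would live in app/Console/Command/ and be run as Console/cake transactions, which crontab can invoke:

<?php
// app/Console/Command/TransactionsShell.php
class TransactionsShell extends AppShell {
    public $uses = array('User');

    public function main() {
        $batchSize = 50;
        $offset    = 0;
        while (true) {
            $users = $this->User->find('all', array(
                'order'  => 'User.id',
                'limit'  => $batchSize,
                'offset' => $offset,
            ));
            if (empty($users)) {
                break; // every batch has been processed
            }
            foreach ($users as $user) {
                // ... charge this user's overage for the month
            }
            $offset += $batchSize;
        }
    }
}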
When executing your shell, I recommend running it together with the nice command to lower its CPU priority, so the shell can't hog 100% CPU and your site stays responsive.
You can't "talk" to a cron either; a cron is nothing more than a timed execution of something. You can't really specify a user either, unless you implemented the shell in a way that allows you to pass a specific user as an argument, for example "cake transactions --user admin". If you mean executing the shell as a specific system user, see How to specify in crontab by what user to run script?.
Look at creating a shell and setting up a cron job in Cake.
There are a bunch of ways to prevent anyone but your own server from accessing a URL. None are perfect, but some are better than others.
If possible, point the cron to a page that is simply not visible on the web. This could be a page that is located above the public_html hierarchy. From within the server this page will be accessible, but it will not be accessible via URL. This is the best option, IMO.
Another option is to restrict the page to the IP address of the server and to other values in the request, such as a POST or query-string variable.
And, of course, you have already figured out that you can include a long secret or token in the URL that would be long enough to make it difficult or unlikely to guess.
You could also ping a page that, in turn, uses cURL to log in as an administrator and runs the page; this is, in some ways, the option that most reflects how you interact with the site. You could create an admin called "cron", and then there would be a log of "cron"'s activities just like any other admin. http://php.net/manual/en/book.curl.php

Is there any way to make PHP on the server side perform some kind of actions on the data on its own?

I have this scenario:
A user submits a link to my PHP website and closes the browser. Now that the server has got the link, it will analyse the submitted link (page) for broken links, and after it has completely analysed the posted link, it will send an email to the user. I have a complete understanding of the second part, i.e. how to analyse the page for broken links and send the mail to the user. The only problem I have is the first part, i.e. how do I make the server keep running the actions on its own, even if there is no request made by the client end?
I have learned that "Crontab" or a "fork" may work for me. What do you say about these? Is it possible to achieve what I want, using these? What are the alternatives?
crontab would be the way to go for something like this.
Essentially you have two applications:
A web site where users submit data to a database.
An offline script, scheduled to run via cron, which checks for records in the database and performs the analysis, sending notifications of the results when complete.
Both of these applications share the same database, but are otherwise oblivious to each other.
A website itself isn't well suited for this sort of offline work; it's mainly a request/response system. But a scheduled task works for this. Unless the user is expecting an immediate response, a small delay of waiting for the next scheduled run of the offline task is fine.
The server should run the script independently of the browser. Once the request is submitted, the PHP server runs the script and returns the result to the browser (if it has a result to return).
An alternative would be to add the request to a database and then use crontab to run the PHP script at a given interval. The script would then check the database to see if there's anything that needs to be processed. You could limit the script to processing one database entry every minute (or whatever works). This will help prevent performance problems if you have a lot of requests at once, but will be slower to send the email.
A typical approach would be to enter the link into a database when the user submits it. You would then use a cron job to execute a script periodically, which will process any pending links.
Exactly how to setup a cron job (or equivalent scheduled task) depends on your server. If you have a host which provides a web-based admin tool (such as CPanel), there will often be a way to do it in there.
The PHP script will keep running after the client closes the browser (terminating the connection).
Just keep in mind that a PHP script's maximum execution time is limited by the max_execution_time directive.
Of course, here I suppose the link submission happens by calling your script page... I don't understand if this is your use case...
For the sake of simplicity, a cronjob could do wonders. The user submits a link; the web handler simply saves the link into a DB (let me pretend here that the table is named "queued_links"). Then a cronjob scheduled to run each minute (for example) selects every link from queued_links, does the application logic (finds broken page links) and sends the email. It then also deletes the link from queued_links (or updates a flag to mark the link as already processed).
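A sketch of both halves using that queued_links table; the column names, file names and DSN are assumptions:

<?php
// submit.php -- the web handler: record the link and respond immediately.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->prepare('INSERT INTO queued_links (url, email, processed)
               VALUES (:url, :email, 0)')
    ->execute(array('url' => $_POST['url'], 'email' => $_POST['email']));
echo 'Thanks! We will email you the results.';

<?php
// process_links.php -- the cron script, run every minute: handle pending links.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$rows = $pdo->query('SELECT id, url, email FROM queued_links
                      WHERE processed = 0 LIMIT 10');
foreach ($rows as $row) {
    // ... fetch $row['url'], scan it for broken links, email $row['email']
    $pdo->prepare('UPDATE queued_links SET processed = 1 WHERE id = :id')
        ->execute(array('id' => $row['id']));
}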
For the sake of scale and speed, a cronjob doesn't fit as well as a message queue (see RabbitMQ, ActiveMQ, Gearman and Beanstalkd; Gearman and Beanstalkd are my favourite two, simple and fitting well with PHP). In lieu of spawning a cronjob every minute, a queue processor listens for 'events' (think 'onLinkSubmission($link)') and processes the messages asynchronously, ASAP. The cronjob solution is just a simplified implementation of one of these MQ solutions; a real queue will give better / more predictable results, but at the cost of adding new services to maintain, etc.
Well, there are a couple of ways; the simplest would be:
When a user submits a request, save it somewhere (let's call it a jobs table) and inform the customer that their request has been received and they'll be updated when the site finishes processing it, or whatever suits you.
Now create a script (or multiple, depending on requirements) and run it from cron; this script will pick requests from the jobs table, process them, and do whatever is required.
Alternatively, you can evaluate the possibility of a message queue, or maybe use a job server for this.
So it all depends on your requirements.

PHP Memory Usage during sleep and loops

I have a few questions about PHP memory usage. I'm going to run some tests on my own, but getting various advice is quite helpful.
I recently learned about the PHP function ignore_user_abort(), which allows a script to continue running even if a user closes the page. I was thinking about using this for my e-mail newsletter tool instead of cron jobs, as configuring cron jobs has various pitfalls. The alternative approaches of making a user stay on the page, using AJAX requests, or running part of the script after the page content has been delivered all have issues as well.
My solution would be to call ignore_user_abort(true) at the beginning of the script, and at the end, after the content has been generated, call flush() for good measure and then run the newsletter script. Alternatively, do this with an AJAX request.
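For what it's worth, a sketch of the approach being described; output-buffering behaviour varies by server setup, so treat the flushing part as approximate:

<?php
ignore_user_abort(true); // keep running after the user closes the page
set_time_limit(0);       // the send may take longer than max_execution_time

echo 'Your newsletter is being sent.';

// Push the generated content out so the browser isn't left waiting.
while (ob_get_level() > 0) {
    ob_end_flush();
}
flush();

// ... the long-running newsletter loop starts here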
First of all, does anyone see issues with that approach?
Second of all, if I used the script with no time limit set, and a while loop going through each email, what would the memory usage be like if I did it in one go? Since I'd be overwriting variables, not using new ones, I'd think it would be low.
Third: if I am sending a large volume of emails, say 1000 per run, I don't want to overload my mail server. With my cron job, I run the script every 5 minutes, sending a batch of 50 emails out. If I was doing this in a single pass, could I send out 50 emails, call sleep for say 5 minutes, and then continue with another 50 emails? If so, what is the script's memory usage like during the sleep period? Would this be an efficient method?
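In code, that batch-and-sleep idea would look roughly like this; the recipient list and batch numbers are placeholders:

<?php
$recipients = array(/* ... loaded from the subscriber table ... */);
$sent = 0;
foreach ($recipients as $email) {
    mail($email, 'Newsletter', 'Hello!'); // same variables reused each pass,
                                          // so memory stays roughly flat
    if (++$sent % 50 === 0) {
        sleep(300); // pause 5 minutes between batches of 50
    }
}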
What I'm really trying to do here is come up with a way to create a newsletter tool that doesn't require the complex (for non-technical folks) task of setting up a cron job (which isn't even an option on shared hosts), and doesn't require the user to keep their browser open on a single page.
Any ideas suggestions or feedback is welcome. Thanks!
At a former job we wrote a daemon for a critical function in PHP; not exactly what you describe, but similar enough -- certainly with loops and sleeps. We were very doubtful about its long-term stability -- especially its memory management -- so we subjected it to pretty tough stress testing. The results were excellent, and the code was put into production and ran flawlessly for months, if not years.
Caveats:
IIRC, PHP has a counter-based garbage collector. This means that, unlike in Java, two objects referencing each other will stay in memory even if they are not accessible by your program. You need to be careful about this when you 'abandon' your objects.
Web servers often have mechanisms to kill long-running scripts. This may defeat your purpose here -- especially if the server's configuration can't be tuned.

PHP: Sending huge quantity of emails in batch

Putting aside the disdain for junk marketing, I need to send around 15,000 emails to customers. My coworker has tried to send them through a PHP mail() loop, but obviously it gets stuck quickly. Is there a conventional way (i.e. through a PHP script) to accomplish this swiftly? If not, how do you suggest I do this (maybe through exec) without too much overhead?
Thanks!
I've used PEAR's Mail_Queue to queue up 200,000+ mails at a time. Populating the database is easy and quick, even with customised content, and then a fairly simple script sends around 250 at a time, if the load average isn't too high. Then it loops around and sends the next batch.
You won't send them any faster than is usually possible, but it will do it without any problems.
The tutorial gives you almost everything you need; just loop around the 'send_messages.php' script (from the command line is better) until the database queue is empty.
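From memory, the Mail_Queue flow looks roughly like this; check the tutorial for the exact container and driver options, which I may be misremembering:

<?php
require_once 'Mail/Queue.php';

// Where queued mails are stored, and how they are actually sent.
$db_options   = array('type' => 'db', 'mail_table' => 'mail_queue',
                      'dsn'  => 'mysql://user:pass@localhost/db');
$mail_options = array('driver' => 'smtp', 'host' => 'localhost', 'port' => 25);

$queue = new Mail_Queue($db_options, $mail_options);

// Populating the queue is quick, even with customised content per recipient.
$hdrs = array('From' => 'news@example.com', 'To' => 'customer@example.com',
              'Subject' => 'Newsletter');
$queue->put('news@example.com', 'customer@example.com', $hdrs, 'Hi there!');

// Then each run of the sender script delivers the next batch of 250.
$queue->sendMailsInQueue(250);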
You could look at using something like Gearman to create a queue system as recommended here. Another option would be to look at a paid service like Amazon's Simple Email Service (SES)
No matter how you implement immediate delivery, it'll be a lengthy process that's always subject to interruptions, and you can't afford to restart the delivery and send the same message twice to 5,000 customers.
I believe that a reliable system must use queues. The main script simply adds recipients to a queue, and then you have a secondary process that picks items from the queue, gets them sent and finally marks them as sent. This secondary process can be launched manually (maybe from the command line) or via crontab.
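A sketch of that pick-send-mark loop, with an assumed outbox table; marking each message immediately after sending is what makes an interrupted run safe to restart:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$pending = $pdo->query(
    'SELECT id, email FROM outbox WHERE sent_at IS NULL LIMIT 250'
);
$mark = $pdo->prepare('UPDATE outbox SET sent_at = NOW() WHERE id = :id');

foreach ($pending as $row) {
    if (mail($row['email'], 'Newsletter', 'Hello!')) {
        $mark->execute(array('id' => $row['id'])); // never re-sent on restart
    }
}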
I've never used it, but I have this in my bookmarks: http://ledscripts.com/free/php/phpledmailer
Are you running it through CGI or as a script on the command line? It's best to run it as a script on the command line.
If you say that it gets stuck, try calling set_time_limit(0); to avoid PHP quitting because the execution takes too long.

Method to trigger a PHP script on the server on a particular schedule without using the cron job facility

As my server does not support cron jobs, I want a file on my server to trigger its action at a particular time every day.
Please let me know whether it is possible to run a script at a particular time from the server side itself, without any external act.
I agree with Kel's answer.
You could try out one of the free cronjob services available, if your server doesn't support it.
Online Cronjobs
Set Cronjob
Just the first two found on Google; there are likely to be more if you search a little.
You cannot start a script without ANY external act.
If your file server has an SSH or HTTP server or something like that, you can configure a cron job on another server to start your script via SSH / HTTP / something like that.
Also, you can create a PHP script which sleeps in a loop all the time and wakes up to do some job only when the current time is near some specific value. You will have to raise the maximum execution time for the PHP script (see here for details), and you will have to start the script on server startup. BTW, this does not look like a good solution.
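For completeness, here is the sleep-loop idea sketched out (the answer itself doesn't recommend it); runDailyJob() is a hypothetical stand-in for the real task, and 02:00 is an example target time:

<?php
set_time_limit(0); // lift max_execution_time for this long-running script

$lastRun = null;
while (true) {
    // Fire once per day when the clock passes the target time.
    if (date('H:i') >= '02:00' && $lastRun !== date('Y-m-d')) {
        $lastRun = date('Y-m-d');
        runDailyJob(); // hypothetical: whatever the daily task is
    }
    sleep(60); // wake once a minute to check the clock
}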
As mentioned before, this is not possible literally "without external act".
A nice solution I found in the ThinkUp software (I don't know where else this is used) is to use an RSS feed reader. From the point of simplicity, this is probably the best option.
The idea is that you use your feed reader to automatically call a script on your site every XX hours (or whatever interval you want). When called, this script executes the maintenance tasks or whatever it is that you want to do.
To make sure that not everybody can run that script and cause your server to break down (I suppose this is a somewhat heavy task), you can use a unique long identifier string appended as URL parameter to make sure that the script only gets called by you.
Other than that, you can use one of the "poor man's" web cron job services that have been suggested in other answers.
if (rand(0, 100) == 0) {
    if (!file_exists($tf = 'tmp/job.crontime') || (time() - filemtime($tf)) > (60 * 60 * 24)) {
        // ... your tasks
        touch($tf);
    }
}
This simple & stupid script uses a file to store the time of the last job execution. If more than 60*60*24 seconds have passed, it launches the job code. rand(0,100) should lower the overhead of checking for jobs on each request: there's roughly a 1-in-100 chance of running your jobs.
Put it at the end of your 'index.php'. Don't use it in projects with moderate to high load :))
The Great Disadvantage: it won't run if you don't have any visitors.
UPD: Write a script that runs indefinitely and every 30s does touch('tmp/job.crontime') to report it's still alive. It should also check the current time & perform actions.
In index.php, if more than 30s has passed, re-launch the daemon with an HTTP request. Ugly, but fully functional. You'll also deal with time limits; be careful!
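A sketch of that update, split across the two files involved; the file names, 30-second window and daemon URL are taken from or invented for the description above:

<?php
// daemon.php -- runs indefinitely; touches a heartbeat file and runs due jobs.
set_time_limit(0);
while (true) {
    touch('tmp/job.crontime'); // report "still alive" every 30 seconds
    // ... check the current time here and perform any due actions
    sleep(30);
}

<?php
// index.php -- if the heartbeat is stale, re-launch the daemon over HTTP.
if (!file_exists('tmp/job.crontime')
        || time() - filemtime('tmp/job.crontime') > 30) {
    $ctx = stream_context_create(array('http' => array('timeout' => 1)));
    @file_get_contents('http://example.com/daemon.php', false, $ctx);
}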
Well, if this is on a public web server and you have enough visits, you could always use those to run code that checks for a given value, say the hour of the day, or the number of times a file has been accessed (or store your number in a file). Just put your PHP code at the top of a web page.
