Dynamically create PDFs and email them - php

I've got a registration list, and I need to send a PDF to each person on it. Each email needs to contain a PDF that starts from a base version on the server but is personalized with the recipient's name, company etc. over the top. This needs to be emailed to each person, which at the moment adds up to 2,500 people, but that number can easily be much higher in the future.
I've only just started working on this project, but the problem I've run into continuously since last week is that the server doesn't seem to be able to handle doing this. Currently the script uses Zend, which lets it use Zend_Pdf and Zend_Mail to create and email the PDFs. Zend_Mail connects to an SMTP server from smtp.com to do the actual emailing.
Since we have quite a few sites running on the server, we can't afford for it to go down, and when I run the job in batches it starts to. The best solution I have so far is running curl from my local machine against the script, which processes one person per call; the curl script then calls it again, over and over, in batches. Even this runs into problems at times, and somehow seems to hog memory even after it should be complete (I'm really not sure how).
So what I'm looking for is information on doing this: libraries, code, advice on server setups, anything that can make this much less painful and much quicker to run. I've run out of ideas, and this isn't something I've really had to do before (especially in bulk).
Thank you.
Edit:
I also forgot to mention that it's using Zend_Barcode::factory to create a barcode on the PDF.

The first step I suggest is to work out where the problem lies, if you can. Is it the PDF generation? Is it the emailing? "The server doesn't seem to be able to handle this" doesn't say what is actually failing, and neither does "the server goes down" - you need to determine whether you are running out of memory, disk space, time, or something else. That will help you decide whether you need a tweak or a whole new approach to your generation. Since you said even single manual invocations can fail, you should be able to narrow the problem down to the exact cause of the failure.
If you are running near some resource limit (which might be the case with several sites running), you probably need to offload this capability onto another machine. Your options include:
run the same setup on a new host and adjust your applications to use the new system
run a new setup on a new host
use an external system (such as the mentioned PDFCrowd or Docmosis)
Start with the specifics of the problem. I hope that helps. Please note I work for the company that created Docmosis.

Here are some ideas:
Is there a particular reason this has to run on a web server? Why not run the framework from a different machine, but with the same settings? You might have to create a different controller to handle the command-line version of the request, but there's no fundamental reason it can't work.
If creating PDFs programmatically is giving you a headache, you can use a service instead. In the past, I've used PDFCrowd with good results, and they provide a useful PHP library. You can give them a blob of HTML, using full URLs for any stylesheets and images, and they'll create a PDF for you. The cost varies from 0.5 to 4.5 cents per document depending on your rate plan. There are other services which do the same thing.
If this kind of batch job is a big deal for your company, you might consider an asynchronous job queue like beanstalk. You could queue up thousands of these, and a worker script could handle the requests at whatever pace you deem reasonable.
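If the queue route appeals, a minimal sketch using the Pheanstalk client might look like the following. The tube name, payload fields and helper function are assumptions, and the exact API differs between Pheanstalk versions, so treat this as a sketch rather than working code:

```php
<?php
// producer.php - queue one job per recipient (Pheanstalk v3-style API;
// newer versions differ). Requires beanstalkd running locally.
require 'vendor/autoload.php';

$queue = new Pheanstalk\Pheanstalk('127.0.0.1');

foreach ($recipients as $person) {                 // $recipients: placeholder list of name/company/email
    $queue->useTube('pdf-emails')->put(json_encode($person));
}

// worker.php - run from the command line; works through the tube at its own pace.
$queue = new Pheanstalk\Pheanstalk('127.0.0.1');
$queue->watch('pdf-emails')->ignore('default');

while ($job = $queue->reserve()) {
    $person = json_decode($job->getData(), true);
    generatePdfAndEmail($person);                  // placeholder for the Zend_Pdf/Zend_Mail work
    $queue->delete($job);
}
```

The nice side effect is that the web server only ever enqueues small jobs; the slow PDF and SMTP work happens in a separate process you can throttle or move to another machine.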

From my experience - two options:
Dynamically generate PDFs using one or more PDF libraries (which can be awfully slow).
OR
Use something like wkhtmltopdf, which is a simple shell utility that converts HTML to PDF using the WebKit rendering engine and Qt.
Basically, you can loop over n HTML pages and generate PDFs without the overhead of purely dynamic PDF generation!
We've used this to distribute thousands of personalised PDFs on a daily basis, as it quickly converts HTML pages to PDF. There are dependencies, but it works and is less intensive (computationally) than 'creating' PDFs individually.
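For illustration, a rough sketch of shelling out to wkhtmltopdf from PHP; the binary is assumed to be installed and on the PATH, and the helper name is made up:

```php
<?php
// Render a personalised HTML file to PDF by shelling out to wkhtmltopdf.
function htmlToPdf($htmlPath, $pdfPath)
{
    $cmd = sprintf(
        'wkhtmltopdf %s %s 2>&1',
        escapeshellarg($htmlPath),
        escapeshellarg($pdfPath)
    );
    exec($cmd, $output, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("wkhtmltopdf failed:\n" . implode("\n", $output));
    }
    return $pdfPath;
}

// htmlToPdf('invite_john.html', 'invite_john.pdf');
```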
Hope this helps.

If you are trying to call the script over HTTP, the script will time out based on the max_execution_time specified in php.ini.
You need to write a PHP script that can be run from the command line and then schedule it via a cron job. On each pass the script can read one user, put together their PDF file, and email them. After that, you might have to run some performance checks to see if the server can handle the process.
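A rough sketch of that kind of script follows; the helper functions and the crontab line are placeholders for your own Zend_Pdf/Zend_Mail code and schedule:

```php
#!/usr/bin/env php
<?php
// send_pdfs.php - run from the command line, e.g. scheduled via cron:
//   */5 * * * * /usr/bin/php /path/to/send_pdfs.php >> /var/log/pdf_mailer.log 2>&1
set_time_limit(0);                                  // CLI runs have no HTTP timeout anyway

$batchSize = 25;                                    // keep each run small so the server stays responsive
$pending   = fetchPendingRecipients($batchSize);    // placeholder: read unsent people from your list/DB

foreach ($pending as $person) {
    $pdf = buildPersonalisedPdf($person);           // placeholder: Zend_Pdf + Zend_Barcode work
    emailPdf($person, $pdf);                        // placeholder: Zend_Mail via your SMTP relay
    markAsSent($person);                            // placeholder: so the next run skips this person
}
```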

Related

Scheduled task (cronjob) with PHP

I'm creating a website that requires a file to be generated and stored on the server periodically (an XML feed for iTunes). The page is generated using ExpressionEngine. I discovered that the website's current server has a very restricted cPanel and doesn't have access to cron.
So I'm considering two options: find an alternative way to access cron jobs (if they are available), or find an alternative way to create regularly scheduled tasks.
Regarding the first option, how would I go about determining if a server has cron available? I'm not sure how useful this would be anyway since I don't think the server allows shell access (it's a very basic setup for people who aren't tech savvy).
Regarding the second option, a friend mentioned to me that the functionality of cronjobs can just be done in PHP. How would I go about this?
Or am I perhaps overthinking this? The page in ExpressionEngine that outputs the XML file is domain.com/itunes/itunes_feed. It just has some EE tags that output the relevant XML, and the resulting page is in .xml format. Is it enough to just submit the above URL to iTunes, or does it have to be a URL to an actual pre-existing file on the server?
Option 1
Simply contact your host and ask whether they support cron jobs, and if so, how to set them up.
Option 2
I only just set up my own cron jobs yesterday.
Create a PHP file that runs the code you want
Set up an account on https://www.easycron.com/
Upload your PHP file to easycron
Set the times at which you would like your PHP code to run
Simple as that! Does that make sense?
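For what it's worth, the PHP file in step 1 can be as small as fetching the ExpressionEngine page and saving it as a static file. A sketch, with the URL taken from the question and the output path made up:

```php
<?php
// build_feed.php - the file the scheduler calls.
// Fetches the ExpressionEngine page and stores it as a static XML file.
$feedUrl  = 'http://domain.com/itunes/itunes_feed'; // the EE template from the question
$savePath = __DIR__ . '/itunes.xml';                // hypothetical location iTunes would fetch from

$xml = file_get_contents($feedUrl);                 // requires allow_url_fopen to be enabled
if ($xml !== false) {
    file_put_contents($savePath, $xml);
}
```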

Flash and PHP, live user environment: how do I use sockets?

I've been scouring Google to find out how I can make Flash bring at least two users together in an environment. What I've been trying to do is, for example: both users load http://example.com/myflashenvironment.html, and on that page is the same flashenv.swf file that they both see from two different computers. In the flashenv.swf there is a movieclip object that is draggable. I want to make it so if user 1 drags the movieclip then user 2 can see it being dragged, over the Internet in some kind of online type of deal. I've been trying to do it at runtime, like an online game.
I've been searching Google and I've found things about sockets, but it's very hard to just jump right in when you are me. What I've figured out so far is that I need a PHP file that connects to my server with fsockopen, and then I need to create a socket? But I don't know how to have user 1 write the (x, y) coordinates of the movieclip as he drags it and have user 2 automatically pick up those same coordinates.
And please believe me, I used this as a last resort to see if anyone knew what I am talking about. Google just isn't cutting it tonight.
It sounds to me like you need to read up on how to actually use sockets. Once you understand how they work, how you should structure your program should become very clear. You could serialize a small object with whatever you want the other user to see (like a coordinate change, for example).
But never mind that; PHP is not what you want. PHP is not made for this sort of thing. What you need is some kind of standalone server - you would have to roll your own using C++ or Java, for example. PHP is made for short requests - you can't run it as a server. Yes, it does have sockets, but they're also made for quick one-shot connections. You need something that is always running, I'm assuming.
You should check out some of the flash multiuser servers that are already made if you don't want to roll your own. Red5 is a free one, and SmartFoxServer is a more fully featured server, but it is not free (they do have a free version, but it only supports a few concurrent users).
It is questionable (but not without precedence) to write and run a server in PHP.
The suggested Java based solution fits better for your needs.
If you are totally new to multi-user Flash, I recommend using SmartFox Server. It is very easy to use and there are many tutorials.
It is possible to create the socket server you want in PHP, but I don't really recommend it.
The difference from traditional PHP scripts is that you wouldn't run it as something called from the browser, but as a long-running (think infinite loop) CLI server application (more like Java).
Simplified, it works like this:
1. PHP: the script starts and listens for incoming requests
2. Flash: the Flash app is started and connects to the server
3. PHP: the connection (from 2) is stored in an array
4. Flash: now if the user moves his movieclip, the coordinates are sent to the script
5. PHP: data arrives (the coordinates from 4); now you loop through all connections and ...
6. ... send the data to all the other movieclips
7. Flash: if data (from 6) arrives, update the mc position accordingly
8. PHP: if the Flash connection is terminated, remove it from the array
The problems:
- PHP is not really well suited for this
- you still have to learn about sockets; there are lots of tutorials on the topic, but most of them cover only single connections
- depending on where you host it, your provider might not support long-running PHP CLI apps
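If you still want to try it, here is a stripped-down sketch of such a long-running CLI relay using PHP's sockets extension. The port, the plain-text coordinate messages and the minimal error handling are all simplifying assumptions:

```php
<?php
// relay.php - run from the CLI: php relay.php
// Accepts Flash clients and rebroadcasts whatever one client sends to all the others.
$server = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
socket_set_option($server, SOL_SOCKET, SO_REUSEADDR, 1);
socket_bind($server, '0.0.0.0', 9000);         // port is arbitrary for this sketch
socket_listen($server);

$clients = array();

while (true) {
    $read = array_merge(array($server), $clients);
    $write = $except = null;
    socket_select($read, $write, $except, null);   // block until something happens

    if (in_array($server, $read, true)) {          // new Flash client connecting
        $clients[] = socket_accept($server);
    }

    foreach ($clients as $i => $client) {
        if (!in_array($client, $read, true)) {
            continue;
        }
        $data = @socket_read($client, 1024);       // e.g. "123,456\n" x,y coordinates
        if ($data === false || $data === '') {     // client disconnected
            socket_close($client);
            unset($clients[$i]);
            continue;
        }
        foreach ($clients as $other) {             // rebroadcast to everyone else
            if ($other !== $client) {
                socket_write($other, $data);
            }
        }
    }
}
```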
There is no need to write your own server, use sockets, or resort to other complicated and time-consuming techniques.
Adobe created the shared object class for exactly this purpose. You need a server running Flash Media Server (or an equivalent) and remote shared objects.

PHP/Apache blocking on each request?

OK, this may be a dumb question, but here goes. I noticed something the other day when I was playing around with different HTML to PDF converters in PHP. One I tried (dompdf) took forever to run on my HTML. Eventually it ran out of memory and ended, but while it was still running, none of my other PHP scripts were responsive at all. It was almost as if that one request was blocking the entire web server.
Now I'm assuming either that can't be right or I should be setting something somewhere to control that behaviour. Can someone please clue me in?
Did you have open sessions in each of the scripts? :) They might reuse the same session, and that blocks until the session is freed by the previous request... so they basically wait for each other to complete (in your case, for the long-running PDF generator). This only applies if you use the same browser.
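If that's the cause, releasing the session lock before the slow work starts avoids the pile-up. A minimal sketch (the session key is just an example):

```php
<?php
session_start();
$options = $_SESSION['pdf_options'];    // read whatever you need first ('pdf_options' is a made-up key)
session_write_close();                  // release the session lock so other requests aren't held up

// ... the long-running dompdf conversion can now run without blocking your other scripts
```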
Tip: not sure why you want HTML to PDF, but you may want to take a look at FOP (http://xmlgraphics.apache.org/fop/) to generate PDFs. I'm using it and it works great... and fast :) It does have its quirks, though.
It could be that all the scripts you tried are running in the same application pool. (At least, that's what it's called in IIS.)
However, another explanation is that some browsers will queue requests over a single connection. This has caused me some confusion in the past. If your web browser is waiting for a response from yourdomain.com/script1.php and you open another window or tab to yourdomain.com/script2.php, that request won't be sent until the first request receives a reply, making it seem like your entire web server is hanging. An easy way to test whether this is what's going on is to make the two requests from two separate browsers.
It sounds like the server is simply being overwhelmed and under too much load to complete the requests. Converting an HTML file to a PDF is a pretty complex process, as the PHP script has to effectively provide the same functionality as a web browser and render the HTML using PDF drawing functions.
I would suggest you either split the HTML into separate, smaller files or run the script as a scheduled task directly through PHP independent of the server.

Communication between PHP and application

I'm playing with an embedded Linux device and looking for a way to get my application code to communicate with a web interface. I need to show some status information from the application on the device's web interface, and I would also like a way to inform the application of any user actions like uploaded files etc. PHP seems to be a good way to build the interface, but the communication part is harder. I have found the following options, but I'm not sure which would be the easiest and most convenient to use.
Sockets. I'd have to enable sockets for PHP first to try this. I don't know if enabling them will take much more space.
Database. Seems like an overkill solution.
Shared file. Seems like a lot of work.
Named pipes. Tried this with some success, but I'm not sure if there will be problems with, for example, simultaneous page loads. Maybe sockets are easier?
What would be the best way to go? Is there something I'm totally missing? How is this done in those numerous commercial Linux based network switches?
I recently did something very similar using sockets, and it worked really well. I had a Java application that communicates with the device, which listened on a server socket, and the PHP application was the client.
So in your case, the PHP client would initialize the connection, and then the server can reply with the status of the device.
There's plenty of tutorials on how to do client/server socket communication with most languages, so it shouldn't take too long to figure out.
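A minimal sketch of the PHP client side, assuming the application listens on a local TCP port and speaks a simple line-based protocol (the port number and the STATUS command are made up):

```php
<?php
// Ask the embedded application for its status over a local TCP socket.
$fp = stream_socket_client('tcp://127.0.0.1:5500', $errno, $errstr, 2);  // port 5500 is hypothetical
if (!$fp) {
    die("Could not reach the application: $errstr ($errno)");
}
fwrite($fp, "STATUS\n");            // whatever request your own protocol defines
$status = fgets($fp, 1024);         // assume the application replies with one line of status text
fclose($fp);

echo htmlspecialchars($status);
```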
What kind of device is it?
If you work with something like a shared file, how will the device be updated?
How will named pipes run into concurrency problems that sockets will avoid?
In terms of communication from the device to PHP, a file seems perfect. PHP can use something basic like file_get_contents(), and the device can just write to the file. If you're worried about catching the file at the exact moment it is being updated, add a quick length check.
In terms of PHP informing the device of what to do, I'm also leaning towards files. Have the device watch a directory, and have the script create a file there with something like file_put_contents($path . uniqid(), $command); That way, should two scripts run at the exact same time, you simply have two files for the device to work with.
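Put together, a rough sketch of both directions (the paths, file names and command string are just placeholders):

```php
<?php
// Status: the device keeps /var/run/devstatus.txt up to date; PHP just reads it.
$status = file_get_contents('/var/run/devstatus.txt');   // path is hypothetical

// Commands: PHP drops one file per command into a spool directory the device watches.
$spoolDir = '/var/spool/webcmd/';                         // path is hypothetical
file_put_contents($spoolDir . uniqid('cmd_', true), 'REBOOT');
```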
Embedded Linux boxes for routing with a web interface don't use PHP. They use CGI and have shell scripts deliver the web pages.
For getting information from the application to the web interface, the Shared file option seems most reasonable to me. The application can just write information into the file which is read by PHP.
The other way round doesn't look so good at first. PHP supports locking of files, but it most probably doesn't work at a system level. Perhaps one solution is for every PHP script which has information for the application to create its own file (with a unique filename, e.g. based on a timestamp plus a random value). The application could watch a designated directory for these files to pop up. After processing them, it could just delete them. For that, the application only needs write permission on the directory (so file ownership is not an issue).
If possible, use shell scripts.
I did something similar: I wrote a video surveillance application. The video part is handled by motion (a great FOSS package). The application is a turn-key solution on standardized hardware, used to monitor slot-machine casinos. It serves as a kiosk system locally and is accessible via the internet. I wrote all the UI code in PHP; the local display is a tightly locked-down KDE desktop with a full-screen browser defaulting to localhost. I used shell scripts to interact with motion and the OS.
On second thought:
If you can use self-compiled applications on the device, write a simple program that returns the value you want and call it with PHP's exec(), passthru() or system().
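For example, if the device ships with a small self-compiled status helper, the PHP page could be as simple as this (the binary name and its output format are assumptions):

```php
<?php
// Call a self-compiled helper on the device and show its output in the web UI.
$output   = array();
$exitCode = 1;
exec('/usr/local/bin/devstatus 2>&1', $output, $exitCode);   // binary name is hypothetical

if ($exitCode === 0) {
    echo '<pre>' . htmlspecialchars(implode("\n", $output)) . '</pre>';
} else {
    echo 'Status helper failed.';
}
```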

Sending large files via HTTP

I have a PHP client that requests an XML file over HTTP (i.e. loads an XML file via URL). As of now, the XML file is only several KB in size. A problem I can foresee is the XML growing to several MBs or GBs in size. I know this is a huge question and there are probably a myriad of solutions, but what ideas do you have for transporting this data to the client?
Thanks!
Based on your use case, I'd definitely suggest zipping up the data first. In addition, you may want to MD5-hash the file and compare it before initiating the download (no need to update if the file has no changes); this will help with point #2.
Also, would it be possible to just send the segment of XML that has changed instead of the whole file?
Ignoring how well a browser may or may not handle a GB-sized XML file, the only real concern I can think of off the top of my head is whether the execution time to generate all the XML is greater than any of the execution time thresholds set in your environment:
PHP's max_execution_time setting
PHP's set_time_limit() function
Apache's TimeOut Directive
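On the PHP side, those limits can be relaxed for just this one script; a minimal sketch:

```php
<?php
// Relax the PHP-side limit for this one feed-generating script only.
// Apache's TimeOut directive is separate and has to be raised in the server config.
set_time_limit(0);   // 0 = no limit, equivalent to max_execution_time = 0 for this request
```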
Given that the XML is created dynamically by your PHP, the simplest thing I can think of is to ensure the file is gzipped automatically by the web server, as described here; that covers a general PHP approach and an Apache httpd-specific solution.
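If you can't change the web server configuration, a rough fallback (assuming the zlib extension is available) is to compress from the PHP script itself:

```php
<?php
// Compress the response if the client sent Accept-Encoding: gzip.
// ob_gzhandler checks the request header itself and falls back to plain output otherwise.
ob_start('ob_gzhandler');
header('Content-Type: application/xml');

echo $xml;   // $xml is a placeholder for the dynamically generated feed
```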
Besides that, having a browser (what else can be a PHP client?) do such a job every night for some data synchronizing sounds like there must be a far simpler solution somewhere else.
And, of course, at some point, transferring "a lot" of data is going to take "a lot" of time...
The problem is that he's syncing up two datasets. The problem is completely misstated.
You need to either a) keep a differential log of changes to dataset A so that you can send that log to dataset B, or b) keep two copies of the dataset (last night's and the current one), and then compare them so you can send the differential log from A to B.
Welcome to the world of replication.
The problem with (a) is that it's potentially invasive to all of your code, though if you're using an RDBMS you could perhaps do some logging via database triggers to keep track of inserts/updates/deletes, write the information into a table, and then export the relevant rows as your differential log. But that can be nasty too.
The problem with (b) is the whole "comparing the database" all at once. Fine for 100 rows. Bad for 10^9 rows. Nasty nasty.
In fact, it can all be nasty. Replication is nasty.
A better plan is to look into a "real" replication system designed for the particular databases that you're running (assuming you're running a database). Something that perhaps sends database log records over for synchronization rather than trying to roll your own.
Most of the modern DBMS systems have replication systems.
Gallery2, which allows you to upload photos over HTTP, makes you set a couple of PHP parameters, post_max_size and upload_max_filesize, to allow larger uploads. You might want to look into that.
It seems to me that posting large files has problems with browser time-outs and the like, but on the plus side it works with proxy servers and firewalls better than trying a different file upload protocol.
Thanks for the responses. I failed to mention that transferring the file should be relatively fast (a few minutes max, is that even possible?). The XML that is requested will be parsed and inserted into a database every night. The XML may be the same as the night before, or it may be different. So there are basically two requirements: 1. it has to be relatively fast, and 2. it should minimize the number of writes to the database.
One solution that was proposed is to zip the XML file and then transfer it, but that only satisfies (1).
Any other ideas?
Are there any algorithms that I could apply to compress the XML? How are large files such as MP3s being downloaded in a matter of seconds?
Having PHP receive GBs of data will take a long time, adds overhead, and is more susceptible to failures.
I would dispatch the job to a shell script (wget with simple error catching) that is not bothered by execution time and on failure could perhaps even retry on its own.
I'm not very experienced with this, but although one could use exec() or the like, these sadly block until the command finishes.
Calling a script with ./test.sh & makes it run in the background and solves that problem, I guess. The script could easily let your PHP pick the job back up via a wget to http://yoursite.com/continue-xml-stuff.php?id=1049381023&status=0. The id could be a filename, if you don't need to track lost requests. The status would indicate how the script ended up handling the request.
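A rough sketch of that hand-off from the PHP side (the script name, the id scheme and the callback behaviour described in the comment are assumptions):

```php
<?php
// Hand the heavy transfer off to a background process so this request returns at once.
// fetch.sh is hypothetical: it would wget the XML and then call continue-xml-stuff.php itself.
$id = uniqid('xmljob_', true);
exec(sprintf('nohup ./fetch.sh %s > /dev/null 2>&1 &', escapeshellarg($id)));

echo "Started background transfer $id";
```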
Have you thought about using some sort of version control system to handle this? You could leverage its ability to calculate and send just the differences in the files, plus you get the added benefits of maintaining a version history of your file.
Since I don't know the details of your situation, I'll throw a question out there. Just for the sake of argument, does it have to be HTTP? FTP is much better suited to large data transfers and can be automated easily via PHP or Perl.
If you are using Apache, you might also consider Apache mod_gzip. This should allow you to compress the file automatically and the decompression should also happen automatically, as long as both sides accept gzip compression.
