In our application we need to connect to users' email inboxes and fetch all of their messages.
The application will run on a dedicated server running Ubuntu.
I have run some tests on my PC using the imap extension (imap_open, imap_fetchheader, imap_fetchbody, ...).
There are multiple problems in using this approach:
1- The connection and fetching speed is very slow.
2- The PHP script times out when many messages exist in the inbox.
3- The application is multiuser and the number of requests is high.
What my research suggested to solve the above problems:
1- We have to rent a static IP address from Google or other email providers for IMAP connections!
2- We have to use PHP CLI to fetch orders from the database and fetch inbox messages.
3- Multithreading in PHP.
Summary:
I want to write a bot in PHP that connects to mail servers every day. What connection and fetching approach do you suggest (library, method, language, etc.)? Thanks.
UPDATE: The code I ran to fetch the first 30 messages:
$mbox = imap_open('{imap.gmail.com:993/imap/ssl/novalidate-cert}', 'myemailaddress', 'mypassword');
$count = imap_num_msg($mbox);
// Walk backwards from the newest message, fetching at most the 30 most recent ones
for ($i = $count; $i > $count - 30 && $i >= 1; $i--) {
    $header = imap_fetchheader($mbox, $i);
    /* process header & body */
}
imap_close($mbox);
This code works well, but when I change imap_fetchheader to imap_fetchbody it takes much more time.
Update:
Based on arkascha's answer, I reworked my design and architecture and found that the main cause of the slowness is the connections to the mail server. Because this is an I/O-bound task, I cached the IMAP connection to each mail server and the overall speed improved, but not by much.
Sorry, but your search results are simply wrong.
You do not need a static IP address to fetch messages from an IMAP server, and certainly not one "rented from google". Why should that be required?
I see no reason why the CLI variant of PHP should be better suited here. On the contrary, it is less efficient, since it has a much bigger startup cost because a process must be spawned for each and every request.
Why would you need multithreading? What benefit would it offer in this situation? Multithreading is only of interest for interactive workloads where responsiveness might be an issue, and even then it can be the wrong idea.
I myself have implemented an IMAP client a few times and ran into no such problems. You certainly can implement a robust and efficient solution based on PHP's imap extension. Speed of connection and transfer depends on many details, and I doubt that PHP's IMAP implementation is the cause of your problems. There might be additional issues that lead you to your statements, but you do not specify them in your question.
In general you should never fetch a huge list of data in one go and then process it. That is simply bad design which cannot scale. Instead you should always process lists sequentially: fetch a single data unit, process it, then move on to the next. That way your memory footprint stays small and you avoid hitting PHP's limits. It also enables you to process the whole list in smaller chunks (or even one message at a time), which prevents you from hitting PHP's execution time limit or a timeout on the client side. These are general implementation patterns that form the base of robust processing, and they apply to email messages as well. A sketch of this pattern is shown below.
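As a rough illustration of that pattern, here is a minimal sketch of chunked, one-message-at-a-time processing with PHP's imap extension. The chunk size, the processMessage() helper and the progress file are hypothetical; the point is only that each message is fetched and handled individually instead of loading the whole inbox at once.

<?php
// Hypothetical sketch: process an inbox in small chunks, one message per iteration.
$mbox  = imap_open('{imap.example.com:993/imap/ssl}', 'user@example.com', 'secret');
$count = imap_num_msg($mbox);
$chunk = 50; // illustrative chunk size per run

// Resume point could come from a database or a state file (hypothetical helper)
$start = (int) @file_get_contents('/tmp/last_processed.txt') + 1;
$end   = min($count, $start + $chunk - 1);

for ($i = $start; $i <= $end; $i++) {
    $header = imap_headerinfo($mbox, $i);      // lightweight metadata first
    $body   = imap_fetchbody($mbox, $i, '1');  // fetch only the part you actually need

    processMessage($header, $body);            // hypothetical application callback

    file_put_contents('/tmp/last_processed.txt', $i); // remember progress per message
}

imap_close($mbox);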
To answer your bottom line question:
nothing speaks against using PHP's imap extension
use a cron job for this and run it on a regular basis, not just once a day
invest in planning a clean architecture before starting the implementation
I might be biased, but I'd use Zend Framework with the Zend_Mail_Storage classes: http://framework.zend.com/manual/1.7/en/zend.mail.read.html You get a couple of options, such as switching to directly accessing mbox or maildir files where possible, or getting closer to the underlying protocol and hacking something into Zend_Mail_Protocol_Imap to improve speed.
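For reference, a minimal sketch of reading a mailbox via Zend_Mail_Storage_Imap from Zend Framework 1 might look like the following; the host, credentials and echoed fields are placeholders.

<?php
// Sketch using ZF1's Zend_Mail_Storage_Imap (assumes the Zend Framework 1 library is on the include path)
require_once 'Zend/Mail/Storage/Imap.php';

$mail = new Zend_Mail_Storage_Imap(array(
    'host'     => 'imap.example.com',   // placeholder
    'user'     => 'user@example.com',   // placeholder
    'password' => 'secret',             // placeholder
    'ssl'      => 'SSL',
));

echo $mail->countMessages() . " messages found\n";

// Iterate; each $message is fetched on demand rather than loading the whole inbox at once
foreach ($mail as $messageNumber => $message) {
    echo $messageNumber . ': ' . $message->subject . "\n";
}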
Related
I need to create a user-to-user real time chat system. I managed to create a simple AJAX/PHP/MySQL script for the chat, but one thing worries me in the PHP/MySQL part:
while (true) {
    // assuming $pdo is an existing PDO connection to the chat database
    $row = $pdo->query("SELECT * FROM messages ORDER BY date DESC LIMIT 1")->fetch(PDO::FETCH_ASSOC);
    if ($row && $row['date'] > $_POST['last']) {
        echo json_encode($row);
        break;
    }
    sleep(1);
}
Doesn't that mean it will run a SELECT against the table every second, and wouldn't that overload the server?
I tried to use PHP sockets, but it was a nightmare. I had to buy an SSL certificate; also, the server was crashing in tests with many users, so I decided to use the long polling approach for now. If I remember correctly, Facebook was using XHR long polling before switching to sockets (if they switched at all).
Well, your question is quite general. While it mostly depends on the load you will receive and your server specifics, long polling is generally discouraged.
There are certain assumptions made in your question that are not really accurate.
"I had to buy an SSL certificate"
Did you? As far as I know there are free certificate issuers, such as Let's Encrypt. The problem here may be that you are using shared hosting with FTP-only access. Consider getting a VPS from some cloud provider, e.g. DigitalOcean. You will have full access to the virtual environment there, and most of the cheapest VPSs cost about the same as shared hosting.
"Facebook was using XHR long polling before switching to sockets"
Yes, they were, and sometimes they still fall back to it. But first of all, they have a lot of computing power and can afford these things. And second, the Facebook web chat is not the fastest app I have ever used :)
With indexed columns and a few records on a normal single MySQL instance, you will not notice any slowdown. As the data and the number of users (simultaneous connections) grow, you will gradually find that you need to optimize things, and one day you will eventually move to WebSockets.
PHP at heart is not meant to be asynchronous. All the async pieces, along with the event loop itself, you will either need to build yourself or assemble from several frameworks.
I would generally suggest a dedicated WebSocket implementation with an asynchronous runtime. Either take a look at Ratchet for PHP or use ws for Node.js.
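If you go the Ratchet route, a minimal broadcast server could look roughly like the sketch below (assuming the cboden/ratchet package installed via Composer; the class name and the port 8080 are arbitrary choices):

<?php
// Minimal Ratchet sketch: broadcasts every incoming message to all other connected clients
require __DIR__ . '/vendor/autoload.php';

use Ratchet\MessageComponentInterface;
use Ratchet\ConnectionInterface;
use Ratchet\Http\HttpServer;
use Ratchet\Server\IoServer;
use Ratchet\WebSocket\WsServer;

class Chat implements MessageComponentInterface {
    protected $clients;

    public function __construct() {
        $this->clients = new \SplObjectStorage;
    }

    public function onOpen(ConnectionInterface $conn) {
        $this->clients->attach($conn);
    }

    public function onMessage(ConnectionInterface $from, $msg) {
        foreach ($this->clients as $client) {
            if ($client !== $from) {
                $client->send($msg);   // push to everyone else; no polling involved
            }
        }
    }

    public function onClose(ConnectionInterface $conn) {
        $this->clients->detach($conn);
    }

    public function onError(ConnectionInterface $conn, \Exception $e) {
        $conn->close();
    }
}

IoServer::factory(new HttpServer(new WsServer(new Chat())), 8080)->run();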
I have this "crazy" project starting, the idea behind it is quite clear:
There is some software that writes to MySQL database. The interval between queries are 1 second.
Now I need and web interface which loads those database records, and continues to show new records, when they happen.
The technologies I am supposed to use are PHP and HTML5 WebSockets. I'v found this nice library Ratchet which I think fits my needs, however there's one problem, I am not sure how to notify PHP script, or send a message to running PHP WebSockets server when the MySQL query occurs.
Now I could use Comet and send request for database record every second, but then it beats the WebSokets which I am supposed to use.
So what I really need is MySQL Pub/Sub system.
I'v read of MySQL triggers but I see that it possess some security risks, and thought the security in this case isn't a real concern since the system will be isolated in a VPN and only few specific people will be using it, I still would like to address every possible issue and do everything in a right way.
Then there is MySQL proxy, of which I have no knowledge, but if it could help me achieve my goal, I would very much consider using it.
So in short the question is how can I notify or run PHP script when MySQL query occurs?
I would separate the issues a bit.
You definitely need some sort of pub/sub system. It's my understanding that Redis can act as one, but I've never used it for this and can't make a specific recommendation as to which system to use. In any case, I wouldn't attach it directly to your database. There are database operations you need to perform for maintenance purposes, and you don't want the system flushing a million rows out to your clients because of a trigger. Pick your pub/sub system independently of what you're doing client-side and what you're doing with your database. The deciding factor should be how it interacts with your server-side language (PHP in this case).
Now that your pub/sub is out of the way, I would build an API server or ingest system of sorts that takes data in from wherever these transactions are coming from. It handles them, publishing messages and inserting them into the database at the same time.
Next, you need a way to get that to clients. WebSockets are a good choice, as you have discovered. You can do this in PHP or anything, really. I prefer Node.js with Socket.IO, which provides a nice fallback to long-polling JSON (among others) for clients that don't support WebSockets. Whatever you choose here needs to listen for messages on your pub/sub and send the right data to the client (likely stripping out some of the published information that isn't immediately needed client-side).
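As a rough sketch of that ingest step in PHP (assuming the phpredis extension and PDO; the table, column and channel names are made up):

<?php
// Hypothetical ingest sketch: insert the record and publish it on Redis pub/sub
$pdo   = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

function ingest(PDO $pdo, Redis $redis, array $record) {
    // Persist the record first ...
    $stmt = $pdo->prepare('INSERT INTO readings (sensor, value, created_at) VALUES (?, ?, NOW())');
    $stmt->execute([$record['sensor'], $record['value']]);

    // ... then notify any subscribers (e.g. the WebSocket server) about it
    $redis->publish('readings', json_encode($record));
}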
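On the consuming side, the pub/sub subscription itself is just a listener; a standalone PHP sketch of it could look like this (again assuming phpredis; the channel name matches the ingest sketch above). Note that this call blocks, so inside a real WebSocket server you would instead use a non-blocking Redis client that integrates with the event loop, e.g. one of the ReactPHP Redis clients.

<?php
// Standalone subscriber sketch (phpredis). Blocks waiting for published messages.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// The callback fires once per message published on the 'readings' channel (hypothetical name)
$redis->subscribe(['readings'], function ($redis, $channel, $message) {
    $record = json_decode($message, true);
    // Here a WebSocket server would push $record to its connected clients
    echo "New record on {$channel}: " . print_r($record, true);
});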
I've got a registration list, and I need to send a PDF to each person on it. Each email needs to contain a PDF which has a base version on the server, but each person's copy needs to be personalized with their name/company etc. over the top. This needs to be emailed to each person, which at the moment adds up to 2,500 people, but can easily be much higher in the future.
I've only just started working on this project, but the problem I've encountered continuously since last week is that the server doesn't seem to be able to handle it. Currently the script uses Zend, which allows it to use Zend_Pdf and Zend_Mail to create and email the PDFs. Zend_Mail connects to an SMTP server from smtp.com to do the actual emailing.
Since we have quite a few sites running on the server, we can't afford for it to go down, and when I run the job in batches it can start to go down. The best solution I have so far is running curl from my local machine against the script, which then handles one person. The curl script then calls it again, over and over, in batches. Even this runs into problems at times, and somehow seems to hog memory even after it should be complete (I'm really not sure how).
So what I'm looking for is information on doing this: libraries, code, server setups, anything that can make this much less painful and much quicker for us to run. I've run out of ideas, and this is something I've not really had to do before (especially at bulk level).
Thank you.
Edit:
I also forgot to mention that it's using Zend_Barcode::factory for creating a barcode on the PDF.
The first step I suggest is to work out where the problem lies, if you can. Is it the PDF generation? Is it the emailing? "The server doesn't seem to be able to handle this" doesn't say what is actually failing, nor does "the server goes down" - you need to determine whether you are running out of memory, disk space, time, or something else. That will help you determine whether you need a tweak or a new approach to your generation. Because you said that even single manual invocations can fail, you should be able to narrow the problem down to the exact cause of the failure.
If you are running near some resource limit (which might be the case with several sites running), you probably need to offload this work onto another machine. Your options include:
run the same setup on a new host and adjust your applications to use the new system
run a new setup on a new host
use an external system (such as the mentioned PDFCrowd or Docmosis)
Start with the specifics of the problem. I hope that helps. Please note I work for the company that created Docmosis.
Here are some ideas:
Is there a particular reason this has to run on a web server? Why not run the framework from a different machine, but with the same settings? You might have to create a different controller to handle the command-line version of the request, but there's no fundamental reason it can't work.
If creating PDFs programmatically is giving you a headache, you can instead use a service. In the past, I've used PDFCrowd with good results, and they provide a useful PHP library. You can give them a blob of HTML, using full URLs for any stylesheets and images, and they'll create a PDF for you. The cost varies from 0.5 to 4.5 cents per document depending on your rate plan. There are other services which do the same thing.
If this kind of batch job is a big deal for your company, you might consider an asynchronous job queue like beanstalkd. You could queue up thousands of these jobs, and a worker script could handle the requests at whatever pace you deem reasonable (see the sketch below).
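To illustrate the job-queue idea, here is a rough producer/worker sketch using beanstalkd via the Pheanstalk library (assuming roughly composer require pda/pheanstalk, Pheanstalk 3.x-style API; the tube name, payload fields and helper function are made up):

<?php
// Producer sketch: queue one job per recipient instead of generating everything in one request
require __DIR__ . '/vendor/autoload.php';

use Pheanstalk\Pheanstalk;

$queue = new Pheanstalk('127.0.0.1');

// $recipients would come from the registration list (hypothetical)
foreach ($recipients as $recipient) {
    $queue->useTube('pdf-emails')->put(json_encode([
        'email'   => $recipient['email'],
        'name'    => $recipient['name'],
        'company' => $recipient['company'],
    ]));
}

// Worker sketch (run from the CLI, e.g. under supervisor or cron, in a separate process)
$worker = new Pheanstalk('127.0.0.1');
while (true) {
    $job  = $worker->watch('pdf-emails')->ignore('default')->reserve();
    $data = json_decode($job->getData(), true);

    generatePdfAndEmail($data);   // hypothetical application function

    $worker->delete($job);
}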
From my experience, there are two options:
Dynamically generate PDFs using one or more PDF libraries (which can be awfully slow).
OR
Use something like wkhtmltopdf, which is a simple shell utility to convert HTML to PDF using the WebKit rendering engine and Qt.
Basically, you can loop over n HTML pages and generate PDFs without the overhead of purely dynamic PDF generation!
We've used this to distribute thousands of personalised PDFs on a daily basis, as it quickly converts HTML pages to PDF. There are dependencies, but it works and is less computationally intensive than 'creating' PDFs individually.
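A rough sketch of driving wkhtmltopdf from PHP could look like this (the template path, placeholder substitution and temp-file handling are assumptions; wkhtmltopdf must be installed on the server):

<?php
// Hypothetical sketch: render a personalised HTML template to PDF via wkhtmltopdf
$template = file_get_contents('/path/to/template.html');   // base HTML with {{name}} / {{company}} placeholders
$html     = str_replace(['{{name}}', '{{company}}'], [$person['name'], $person['company']], $template);

$htmlFile = tempnam(sys_get_temp_dir(), 'pdf_') . '.html';
$pdfFile  = tempnam(sys_get_temp_dir(), 'pdf_') . '.pdf';
file_put_contents($htmlFile, $html);

// Shell out to wkhtmltopdf; escapeshellarg() guards against odd characters in paths
exec('wkhtmltopdf ' . escapeshellarg($htmlFile) . ' ' . escapeshellarg($pdfFile), $output, $exitCode);

if ($exitCode !== 0) {
    // handle the failure (log, retry, etc.)
}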
Hope this helps.
If you are trying to call the script over HTTP, the script will time out based on the max_execution_time specified in php.ini.
You need to write a PHP script which can be run from the command line and then schedule it via a cron job. In one run the script can read one user, put together their PDF file, and email them. After that, you might have to run some performance checks to see whether the server can handle the process.
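A skeleton for that kind of CLI script might look like the following; the table layout, column names and the two helper functions wrapping Zend_Pdf / Zend_Mail are assumptions, not the asker's actual code.

<?php
// cli-send-one.php - hypothetical sketch: process a single pending recipient per invocation
set_time_limit(0); // the CLI SAPI has no execution limit by default, but make it explicit

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Pick one recipient that has not been emailed yet (column names are made up)
$row = $pdo->query("SELECT id, email, name, company FROM registrations WHERE sent = 0 LIMIT 1")
           ->fetch(PDO::FETCH_ASSOC);

if ($row === false) {
    exit(0); // nothing left to do
}

// buildPersonalisedPdf() and sendPdfMail() are hypothetical wrappers around Zend_Pdf / Zend_Mail
$pdf = buildPersonalisedPdf($row['name'], $row['company']);
sendPdfMail($row['email'], $pdf);

$pdo->prepare('UPDATE registrations SET sent = 1 WHERE id = ?')->execute([$row['id']]);

Scheduled with something like "* * * * * php /path/to/cli-send-one.php" in the crontab, it would work through the list one recipient at a time without hitting the web server's execution limit.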
This is a bit complicated, so please don't jump to conclusions; feel free to ask about anything that is not clear enough.
Basically, I have a WebSocket server written in PHP. Please note that WebSocket messages are asynchronous; that is, a response to a request might take a lot of time while the client keeps on working (if applicable).
Clients are supposed to ask the server for access to files on other servers. This can be an FTP service, or Dropbox, for that matter.
Here, please take note of two issues: connections should be shared and reused, and the server actually 'freezes' while it does its work, hence any requests are processed only after the server has 'unfrozen'.
Therefore, I thought, why not offload file access (which is what freezes the server) to PHP threads?
The problem here is twofold:
how do I make a connection resource in the main thread (the server) available to the sub-threads (not possible with the above threading model)?
what would happen if two threads end up needing the same resource? It's perfectly fine if one is locked until the other finishes, but we still need to figure out issue #1.
Perhaps my train of thought is all screwed up; if you can find a better solution, I'm eager to hear it. I've also had the idea of having a PHP thread host a connection resource, but that's pretty memory intensive.
PHP does not support threads. The purpose of PHP is to respond to web requests quickly; that is what the architecture was built for. Various libraries try to provide something like threads, but they usually cause more issues than they solve.
In general there are two ways to achieve what you want:
Off-load the long-running work to an external process. A common approach is using a system like Gearman: http://php.net/gearman (see the sketch after this list).
Use asynchronous operations. Some stream operations and the like provide an "async" flag or a non-blocking mode: http://php.net/stream-set-blocking
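A minimal illustration of the Gearman approach with the pecl gearman extension might look like this; the function name 'fetch_remote_file' and the payload are invented, and a gearmand server is assumed to be running locally on the default port.

<?php
// --- worker.php (separate long-lived CLI process): does the slow file access ---
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('fetch_remote_file', function (GearmanJob $job) {
    $params = json_decode($job->workload(), true);
    // ... open the FTP/Dropbox connection and fetch $params['path'] here ...
    return 'done';
});
while ($worker->work());

// --- client side (e.g. inside the WebSocket server): hand the job off without blocking ---
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('fetch_remote_file', json_encode(['path' => '/some/file.txt']));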
We have a bunch of CLI cron-style scripts that are written in PHP.
A few of these services use FTP to send data to remote locations.
The way things are set up, what happens quite frequently is:
a) Script start
b) Connect to ftp # remote location
c) Send data
d) Close ftp connection
e) Terminate script
f) Return to (a) and repeat shortly afterwards, sending different data to the same target.
The issue is that there is quite a bit of overhead (read: slowdown) due to step (b), where it first has to connect to the FTP server, log in, make sure the folder exists, create it if not, etc. I know, I know, the right way to do things would be to consolidate these transfers into single pushes... but it's far more complicated than that. I have simplified away about 30-40 steps here.
So what I was hoping on doing is setting up a system like this:
[ CRON CLI SCRIPT ] --->
[ LOCALLY HOSTED SOCKET BASED SERVER THAT KEEPS THE FTP CONNECTIONS OPEN ] --->
[ REMOTE FTP ]
With the above, we can keep the locally hosted socket-based server running and the FTP connections open, skipping what is the longest part of the process: the FTP authentication-related steps.
While setting this up as a 'one at a time' style system in PHP is fairly trivial, what I have never done before is making it as close to multithreaded as possible.
Whereby the socket is opened (for example, 127.0.0.1:10000) and multiple requests can come in; if needed, 'children' are spawned, new FTP connections made, and so on.
Can anyone shed some insight into making this multithreaded in PHP, OR is there another, better solution out there? Perl is an option. It's been years (YEARS...) since I have touched it, but I am sure a couple of days in front of some good docs would bring me up to speed enough to make it happen.
We have built a system that does more or less what you want, so it is definitely possible to build a multi-process application in PHP.
It is, however, not trivial. If you fork off a child process, you need to manage your remote connections very carefully in order to avoid problems (use the socket_* family of functions instead of fsockopen for better control).
Also, signals tend to interrupt your normal program flow. This is of course normal, but PHP was not built with this in mind, so be prepared for some unexpected results.
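For what it's worth, a bare-bones sketch of the forking part could look like the following; the request format and the FTP handling are placeholders, and it deliberately ignores signal handling and the connection-sharing caveats mentioned above.

<?php
// Hypothetical sketch: a local socket server that forks one child per incoming request,
// using the socket_* functions recommended above.
$server = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
socket_bind($server, '127.0.0.1', 10000);
socket_listen($server);

while ($conn = socket_accept($server)) {
    $pid = pcntl_fork();

    if ($pid === -1) {
        socket_close($conn);                  // fork failed; drop the request
    } elseif ($pid === 0) {
        // Child: handle this request, then exit. Each child opens its own FTP
        // connection - sharing the parent's connection across forks is unsafe.
        $request = socket_read($conn, 4096);
        // ... perform the FTP transfer described by $request here ...
        socket_close($conn);
        exit(0);
    } else {
        socket_close($conn);                  // Parent: close its copy and keep accepting
        pcntl_waitpid(-1, $status, WNOHANG);  // reap any finished children
    }
}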
Try to use Gearman; you can offload the most expensive work to it, and Gearman runs each job in a separate worker process.