What are the problems associated with retrieving email using CURL?

A new feature I wish to add to our local network is the ability to retrieve email from free email services such as Gmail, Yahoo and Hotmail using PHP. There are services we can pay for but I would rather hack it up myself!
I find that only Google has an API; the rest do not. What problems would I run into, then, if I just retrieve the email using CURL?
I have even implemented the Gmail part using CURL and PHP.

It almost certainly violates their terms of service to screen-scrape their websites for that purpose. If they redesign their site, the scripts you're using to parse out the e-mail contents etc. will probably break catastrophically, as well.
Yahoo, Gmail, and Hotmail all support POP3, a standard protocol for retrieving e-mails. Why not use that instead?

When someone gives you an API, they're promising you that "if you run code X, Y will happen." When you screen scrape, there's no such promise from the provider, and many providers have items in their terms of service that explicitly forbid screen scraping. From a technical standpoint, this means their page/application may undergo changes that will break your screen scraping, whether accidentally or purposefully by the provider. This is why CAPTCHAs exist.
Also, increasingly, these applications are using more and more "AJAX" style architectures, which means you're committing yourself to reverse engineering how their application works, as well as keeping up with the changes each application makes.
Finally, well, you're doing it wrong. Email is a set of protocols in and of itself. Most providers have a way to access email via POP3 and IMAP. I'd look into hacking PHP code to interact with the POP/IMAP servers which, like an API, are a promised set of behaviors. You also have the advantage that code written for one provider will likely work (with minor tweaks) for another.
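To illustrate the POP3 route suggested above, here is a minimal sketch in Python (used only for illustration; the same protocol steps apply from PHP). It assumes the provider exposes POP3 over SSL and that the account allows password login, which varies by provider. The function names `fetch_messages` and `fetch_from_server` are hypothetical.

```python
import poplib
from email import message_from_bytes


def fetch_messages(conn) -> list:
    """Download and parse every message in a POP3 mailbox.

    `conn` is expected to behave like poplib.POP3 / poplib.POP3_SSL
    after a successful login.
    """
    count, _size = conn.stat()          # (message count, mailbox size in bytes)
    messages = []
    for number in range(1, count + 1):  # POP3 numbers messages starting at 1
        _resp, lines, _octets = conn.retr(number)
        messages.append(message_from_bytes(b"\r\n".join(lines)))
    return messages


def fetch_from_server(host: str, user: str, password: str) -> list:
    """Connect over SSL (port 995 by default), log in, and fetch all mail."""
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        return fetch_messages(conn)
    finally:
        conn.quit()  # ends the session and closes the connection
```

Because the parsing only depends on the protocol, pointing `fetch_from_server` at a different provider should mostly be a matter of changing the host name and credentials.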

I assume you have a reason for not using the POP protocol, which is the supported, standard way to retrieve email. Doing it the way you want is not supported and may also not be permitted by the providers' terms of use.
But if no captcha solving gets in your way, it is technically possible. You will have to write a different application for each provider. If they change something, you will have to adapt your application.
To make it work with curl be sure to collect all the cookies they give you in all the pages and to return them in every request.
In case of any problems (and also for development) you could analyze the HTTP requests and responses with some tool (e.g. Proxomitron on Windows) and make the curl requests look more and more like the browser requests until you succeed. In the end there is nothing they can do to distinguish your curl requests from human requests through a browser, except captchas, as I said before.
Another thing is the interval between your requests: you could get blocked for requesting too often or when there is no pause between two requests (which a human cannot do). Try inserting randomly varied pauses between requests if you suspect this.
I can imagine they might block your accounts or IPs during development; in that case it would be necessary to change the IP and/or the account you work with.

Related

Programmatically inserting messages to an email account with CPanel API

I'm looking for a way to programmatically insert messages into a specific email account - created and maintained with CPanel.
I have a website that provides a webmail interface for a video game's internal messaging (using its API) and I would like to take this service one step further and make the messages available on a POP3 server.
I have looked for a solution multiple ways so far:
Insert mails using CPanel's API: I could not find an api call for this, only for creating new accounts (with https://documentation.cpanel.net/display/SDK/UAPI+Functions+-+Email%3A%3Aadd_pop) This is the preferred method for me as I'd rather leave alone the filesystem.
Manually insert the mails into the filesystem: I found out that CPanel stores email data in root/mail/domain/user/ but I do not know the exact file structure. I recognize the maildirsize file and the other directories containing the mails, but I do not know the maildirsize file's structure (it seems to contain 2 integers separated by a space per line), and the mail files' names are not obvious either. This is an example of a filename:
1422094110.H186037P182351.hosting-server-domain.com,S=15645
I'd rather use this method as a backup method, because there is very little (or I didn't find any useful) documentation available.
Simply e-mail all messages with custom headers: This would be by far the easiest method, however the hosting provider has a very strict anti-spam policy and the number of outgoing messages is limited.
Implement my own POP3 server: I played around with @cleong's PHP implementation (https://stackoverflow.com/a/11973533/1030464) and while it worked on localhost, I am not sure it'd be a joyride on the live page. I found a Perl implementation as well which might be worth a try, but I have never worked with Perl and I'd rather look at another solution before learning how to implement and integrate that module.
Thank you for reading all this,
Bálint

It sounds like the email is being stored using the Maildir format. This makes inserting mails fairly straightforward. For full details see the Maildir specification, but in summary:
The hardest part: create a guaranteed-unique name for the mail, using the techniques suggested on that page (e.g. local hostname plus high-resolution timestamp plus high-quality random number).
Create a file with this name under .../Maildir/tmp/
Rename the file to be under .../Maildir/new. This renaming step ensures that only fully-formed mails are seen by the mail software - i.e. no other process will attempt to look at the file when you're half way through writing it.
That's it!
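The steps above can be sketched as follows, in Python purely for illustration. This is a sketch under assumptions, not a tested cPanel recipe: the function names are hypothetical, the maildir path must point at a directory that already contains `tmp/` and `new/`, and the code neither appends the `,S=...` suffix seen in the question's example filename nor updates the `maildirsize` file. Whether the server needs either is something to verify on the actual host.

```python
import os
import secrets
import socket
import time


def unique_name() -> str:
    """Build a name from a timestamp, the PID, random bits, and the hostname."""
    now = time.time()
    seconds = int(now)
    micros = int((now - seconds) * 1_000_000)
    # The Maildir convention replaces "/" and ":" in the hostname.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    return f"{seconds}.M{micros}P{os.getpid()}R{secrets.token_hex(8)}.{host}"


def deliver(maildir: str, raw_message: bytes) -> str:
    """Write the message under tmp/, then rename it into new/."""
    name = unique_name()
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    # Mode "x" refuses to overwrite an existing file, guarding against a name clash.
    with open(tmp_path, "xb") as handle:
        handle.write(raw_message)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the bytes are on disk first
    # The rename makes the complete message visible to mail software in one step.
    os.rename(tmp_path, new_path)
    return new_path
```

The rename only behaves atomically when `tmp/` and `new/` are on the same filesystem, which is the normal Maildir layout. Python's standard library also ships `mailbox.Maildir`, which performs the same delivery steps if hand-rolling them is not required.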

What are the right uses for cURL?

I have heard about the cURL library and have become interested in it.
I have read that it has many uses; can you give me some examples?
Are there any security problems with it?

One of the many useful features of curl is interacting with web pages: you can send HTTP requests, receive the responses, and manipulate the data. This means you can log in to web sites and send commands as if you were interacting from your web browser.
I found a very good web page titled "10 awesome things to do with curl". It's at http://www.catswhocode.com/blog/10-awesome-things-to-do-with-curl

One of its big use cases is automating activities such as having an application fetch content from other websites. It can also be used to post data to another website and download files via FTP or HTTP. In other words, it allows your application or script to act as a user accessing a website, as they would when browsing manually.
There are no inherent security problems with it but it should be used appropriately, e.g. use https where required.
cURL Features

It's for spamming comment forms. ;)
cURL is great for working with APIs, especially when you need to POST data. I've heard that it's quicker to use file_get_contents() for basic GET requests (e.g. grabbing an RSS feed that doesn't require authentication), but I haven't tried myself.
If you're using it in a publicly distributed script, such as a WordPress plugin, be sure to check for it with function_exists('curl_init'), as some hosts don't install it...

In addition to the uses suggested in the other answers, I find it quite useful for testing web-service calls. Especially on *nix servers where I can't install other tools and want to test the connection to a 3rd party webservice (ensuring network connectivity / firewall rules etc.) in advance of installing the actual application that will be communicating with the web-services. That way if there are problems, the usual response of 'something must be wrong with your application' can be avoided and I can focus on diagnosing the network / other issues that are preventing the connection from being made.

It certainly can simplify simple programs you need to write that require higher level protocols for communication.
I do recall a contractor, however, attempting to use it with a high load Apache web server module and it was simply too heavy-weight for that particular application.

How to capture a website API traffic data with Google Analytics?

I have a website where most of the traffic comes from the API (http://untiny.com/api/). I use Google Analytics to collect traffic data, however, the statistics do not include the API traffic because I couldn't include the Google Analytics javascript code into the API pages, and including it will affect the API results. (example: http://untiny.com/api/1.0/extract/?url=tinyurl.com/123).
The solution might be executing the javascript using a javascript engine. I searched stackoverflow and found javascript engines/interpreters for Java and C, but I couldn't find one for PHP except an old one "J4P5" http://j4p5.sourceforge.net/index.php
The question: will using a JavaScript engine solve the problem? Or is there another way to include the API traffic in Google Analytics?

A simple problem with this in general is that any data you get could be very misleading.
A lot of the time it is probably other servers making calls to your server. When this is true the location of the server in no way represents to location of the people using it, the user agent will be fake, and you can't tell how many different individuals are actually using the service. There's no referrers and if there is they're probably fake... etc. Not many stats in this case are useful at all.
Perhaps make a PHP back end that logs the IP and other header information; that's really all you can do. You'll at least be able to track total calls to the API, and where they're made from (although again, probably from servers, but you can tell which servers).

I spent ages researching this and finally found an open source project that seems perfect, though totally under the radar.
http://code.google.com/p/serversidegoogleanalytics/
Will report back on results.

You would likely have to emulate all the HTTP calls on the server side with whatever programming language you are using. This will not give you information on who is using it though, unless untiny is providing client info through some kind of header.
If you want to include it purely for statistical purposes, you could try using curl (if using PHP) to access the gif file if you detect untiny on the server side.
http://code.google.com/apis/analytics/docs/tracking/gaTrackingTroubleshooting.html#gifParameters

You can't easily do this, as the JavaScript-based Google Analytics script will not be run by the end user (unless, of course, they are including your API output exactly as-is on their display to the end user, which would negate the need for a fully fledged API [you could just offer iframe-able code], pose possible security risks and possibly fall foul of browser cross-domain JavaScript checks).
Your best solution would be either to use server side analytics (such as Apache or IIS's server logs with Analog, Webalizer or Awstats) or - since the most information you would be getting from an API call would be useragent, request and IP address - just log that information in a database when the API is called.
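As a rough illustration of the "just log that information in a database" suggestion, here is a minimal sketch in Python using SQLite. The table name, column names, and function names are all hypothetical; in the PHP application the three values would come from the request (for example the client address, the User-Agent header, and the request URI).

```python
import sqlite3
import time


def open_log(path: str = "api_calls.db") -> sqlite3.Connection:
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS api_calls ("
        "called_at REAL, ip TEXT, user_agent TEXT, request TEXT)"
    )
    return conn


def log_call(conn: sqlite3.Connection, ip: str, user_agent: str, request: str) -> None:
    """Record one API call; parameters are bound, never string-concatenated."""
    conn.execute(
        "INSERT INTO api_calls VALUES (?, ?, ?, ?)",
        (time.time(), ip, user_agent, request),
    )
    conn.commit()


def calls_per_ip(conn: sqlite3.Connection) -> list:
    """Return (ip, call count) pairs, busiest caller first."""
    rows = conn.execute(
        "SELECT ip, COUNT(*) FROM api_calls "
        "GROUP BY ip ORDER BY COUNT(*) DESC, ip"
    )
    return rows.fetchall()
```

As the earlier answer points out, the logged IP and user agent often describe another server rather than an end user, so totals per caller are about as much as this can reliably report.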

How do I receive email and process it in a web application

I have set up an email address for my PHP web application. Users will send emails to this address.
I want to process these emails in the application. How do I go about doing this?
Thanks in advance.

I recently worked on a project that required parsing of email from gmail and updating database with certain values based on the contents of the email. I used the ezcMail (now) Zeta Components library to connect to the mail server and parse the emails.
The strategy I adopted was to filter all interesting incoming mail with a label "unprocessed" and run the PHP script via a crontab every 15 minutes. The script would connect to the mail server, open the IMAP "unprocessed" folder and parse each email. After inserting the interesting values into the database, the script moves the messages to another IMAP folder, "Processed".
I also found IMAP to be better than POP for this sort of processing.
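The polling strategy described above could look roughly like this. The sketch is in Python for illustration rather than the Zeta Components code the answer used; `process_folder` and `handle` are hypothetical names, and `conn` is assumed to be an already logged-in connection that behaves like `imaplib.IMAP4_SSL`. The folder names follow the answer.

```python
from email import message_from_bytes


def process_folder(conn, handle, source="unprocessed", done="Processed"):
    """Parse every message in `source`, then move it to `done`.

    `handle` is a callback that receives each parsed message, for
    example to insert the interesting values into a database.
    """
    conn.select(source)
    _typ, data = conn.search(None, "ALL")
    numbers = data[0].split()            # e.g. [b"1", b"2"]
    for num in numbers:
        _typ, parts = conn.fetch(num, "(RFC822)")
        handle(message_from_bytes(parts[0][1]))
        # Classic IMAP "move": copy to the target folder, then flag the original.
        conn.copy(num, done)
        conn.store(num, "+FLAGS", "\\Deleted")
    conn.expunge()                       # actually remove the flagged messages
    return len(numbers)
```

If `handle` raises, the loop stops before the copy, so the failing message stays in the source folder for the next cron run. Messages are only renumbered at the final expunge, so the sequence numbers stay valid during the loop.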

Recently I wanted to be able to receive emails immediately in something I was making, so I did some research (I came looking at this question too, actually) and I ended up finding Google App Engine to be pretty helpful. It has an API you can use to receive and process emails sent to ____@yourapp.appspotmail.com. I know that it doesn't really seem helpful since you probably don't want your app on App Engine and you want to receive emails at yourdomain.tld, but with a little setup you can get what you want.
My basic setup is like this:
User sends email to user_id@mydomain.tld (an email address that doesn't actually exist)
mydomain.tld has a catchall email address that forwards to inbox@GAEapp.appspotmail.com
GAEapp (a tiny app on App Engine) receives the email, processes it, and sends a POST request with the relevant data to mydomain.tld
So basically you can make a little GAE app that works as a go-between to grab the emails. Even with the redirect it'll work out OK; the email will be fine.
Also I decided to learn me some Django and I made a free app called Emailization that will basically do that for you. You create a recipient like ___@emailization.com and give a URL to POST to. Anything sent to that address gets POSTed to your URL. You can make a catchall on your domain that forwards to that Emailization recipient and you'll get email through the catchall too!
Or you can see a small GAE app I made that you can set up yourself that does the same thing.
Hope that helps somebody!

Use procmail if it is installed on your system. Put these lines in a .procmailrc file in the home directory of the user who receives the e-mail.
:0
| /path/to/your/script.php
Or you can also use a .forward file containing
"|/path/to/your/script.php"
Procmail has the advantage that it allows you to deal with more complicated filtering if your application ever requires it.
Your script.php file will read the headers and body of the e-mail from stdin.
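For illustration, this is roughly what the receiving script does with the message piped to it on stdin. The sketch is in Python rather than PHP, and `summarize` and `main` are hypothetical names; the PHP version would read from `php://stdin` and parse the headers and body in the same way.

```python
import sys
from email import message_from_bytes, policy


def summarize(raw: bytes) -> dict:
    """Parse a raw message and pull out sender, subject, and text body."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Prefer the plain-text part; fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "from": msg["From"],
        "subject": msg["Subject"],
        "body": body.get_content() if body is not None else "",
    }


def main() -> None:
    """Entry point when procmail or .forward pipes a message to the script."""
    info = summarize(sys.stdin.buffer.read())
    print(info["from"], info["subject"], sep="\t")

# In the real script, end the file with a call to main().
```

Reading stdin as bytes and letting the parser handle decoding avoids guessing the character set of the incoming mail.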

Check out fMailbox. It does not require any non-standard extensions (such as imap) and has been tested with various servers, attachments, multipart messages, SSL, and more.

I suggest using Zend_Mail component of Zend Framework.

There is a great library: Try this: http://code.google.com/p/php-imap

You need to implement an email client in PHP. This is probably going to be a POP client.
This code would query the POP server containing your email, download it, and then you could parse it as needed.
A quick Google search for "POP client php" reveals a vast array of different options. It's hard to tell if there's really "The One True PHP POP Library", otherwise I'd include it here. If you are using a preexisting framework, you may wish to check its level of POP support; otherwise check the Google results and take your pick. Or it may just be easiest (and most educational :) ) to roll your own.

There are a number of hosted solutions that will accept email for your domain and then post it to a script on your website. Most of these will handle the parsing of the messages for you (separating the attachments, "to", "from" and other addresses, etc).
You just create a script that receives a FORM POST and does whatever you need with it.
Mailgun
CloudMailin
You can also look at Mandrill (by MailChimp), SendGrid, and PostMarkApp.

Hosted solutions as Travis Austin suggested work well.
If you are looking for a self-hosted one, you can have a look at the Mailin module, which allows you to receive emails, parse them and post them to a webhook of your choice. It also checks the DKIM and SPF, computes a SpamAssassin score and determines the message language.
I don't know if it will suit your needs since it is written in node.js, but the more options you have, the better. (Disclaimer: I am the maintainer of Mailin)

There is a great tutorial for this here:
http://www.evolt.org/incoming_mail_and_php
which covers how to have the emails forwarded directly to your script, which your script reads via stdin (fopen, fread, etc.) The tutorial code even does basic parsing of the header/body for you.

If you want to avoid reaching out over POP or IMAP to another server to pull-down the email, you can add a 'hook' into the email receive process on some SMTP server you set up (possibly the same php server). Then just have the destination email handled by this server.
Here is an example with postfix, but similar things are possible with sendmail as well.
http://www.adkap.com/autoresponder.html

Create a link between an email and my web-app?

I've seen that on sites like Flickr or Brightkite, a personal email address is provided to each user.
If the user mails something to this address, the content is posted on their public profile.
How can I do that in a web application?

There are two ways to do this, as I see it:
First, you can use an existing SMTP server/email box system and, on an interval, pull the messages from that mail box using POP3 or IMAP to insert stuff into your database/system.
Alternatively, you can write an implementation of SMTP that will accept email messages coming in and perform your custom logic to put data into your database/system instead of into a mailbox. This is, ultimately, a cleaner design that will have much less overhead. In fact, there may be an SMTP server implementation out there somewhere already that will allow you to inject this kind of custom logic (I'll edit if I can find one).
Personally, I'd go with the second option. This will give you much more control over what's going on in your system and it will have an overall cleaner design.
Good luck!
Edit: It's not PHP, but JAMES from Apache is a Java mail server that allows you to inject custom mail processing units (called mailets) to handle mail processing. You could write such a mailet that will process email messages and put the updates in your database instead of a mailbox. There may be other projects that implement this kind of design, so it's worth a look.
Edit again: Ooo... here's an open source php SMTP server on SourceForge. I don't know that you can inject custom logic, but you can always edit the source and make it do what you want! (If you insist on PHP anyway)

You write a lot of code between your app and the PHP IMAP/POP3 functions.

There are several free mail servers available that support using MySQL or another database as a storage backend and require only configuration to do so. If you're not comfortable customizing an existing mail server or writing your own, I'd go with that solution. It's several orders of magnitude faster than using POP3 or IMAP to communicate with the mail server.

Flickr has published their methods for doing exactly this in the book Building Scalable Websites. The whole of chapter 6 is dedicated to the topic. You don't need a non-standard MTA, as mentioned above. The default MTAs will work fine (sendmail, qmail, postfix, exim, etc.). All you have to do is edit /etc/aliases. /etc/aliases can be used to set a mailbox to pass all email to a script.
I strongly recommend reading through this chapter, as it goes on to outline a lot of the common issues you'll run into doing exactly this -- parsing attachments, coping with email from mobile devices (which frequently includes bad/quirky headers), doing authorization correctly, etc.
