Email Tracking - GMail - php

I am creating my own email tracking system for email marketing. I have been able to determine each person's email client by using the HTTP referrer, but for some reason Gmail does not send an HTTP_REFERER at all!
So I am trying to find another way of identifying when Gmail requests a transparent image from my server. Here are the headers I get from print_r($_SERVER):
DOCUMENT_ROOT = /usr/local/apache/htdocs
GATEWAY_INTERFACE = CGI/1.1
HTTP_ACCEPT = */*
HTTP_ACCEPT_CHARSET = ISO-8859-1,utf-8;q=0.7,*;q=0.3
HTTP_ACCEPT_ENCODING = gzip,deflate,sdch
HTTP_ACCEPT_LANGUAGE = en-GB,en-US;q=0.8,en;q=0.6
HTTP_CONNECTION = keep-alive
HTTP_COOKIE = __utmz=156230011.1290976484.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utma=156230011.422791272.1290976484.1293034866.1293050468.7
HTTP_HOST = xx.xxx.xx.xxx
HTTP_USER_AGENT = Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.237 Safari/534.10
PATH = /bin:/usr/bin
QUERY_STRING = i=MTA=
REDIRECT_STATUS = 200
REMOTE_ADDR = xx.xxx.xx.xxx
REMOTE_PORT = 61296
REQUEST_METHOD = GET
Is there anything of use in that list? Or is there something else I can do to actually get the HTTP referrer? If not, how are other ESPs managing to find out whether Gmail was used to view an email?
By the way, I'd appreciate it if we could hold back on whether this is ethical or not, as many ESPs do this already; I just don't want to pay for their service and want to do it internally.
Thanks all for any implementation advice.
Update
Just thought I would update this question and make it clearer in light of the bounty.
I would like to find out when a user opens my email when it is sent to a Gmail inbox. Assume I have the usual transparent image tracking and the user does not block images.
I would like to do this with the single request and the header details I get when the transparent image is requested.

Are your images requested over HTTP or HTTPS?
If it's plain HTTP, that's the problem.
HTTPS-to-HTTP referrals do not leak a Referer header (HTTP_REFERER).
If you embed an HTTP-hosted image in an email that is viewed over HTTPS, the browser won't send a referrer. (HTTP pages requesting HTTPS resources, however, do send one.) The solution is to embed the image as HTTPS. I've tested it, and sure enough, images served over HTTPS do indeed send the referrer.
One way Gmail could block the referrer information on loaded images by default is if they used a referrer policy, which is supported by most modern browsers. (As of 2011, they did not implement such a policy.)
See the screenshot (not reproduced here) of an embedded image that is generated dynamically from the HTTP_REFERER of the request.
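Since the screenshot isn't reproduced here, this is roughly the idea behind it, as a minimal sketch (requires the GD extension; dimensions and font are arbitrary):

<?php
// referrer-test.php: render the request's HTTP_REFERER into a PNG,
// so you can see in the email client exactly what referrer (if any)
// was sent when the image was fetched.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '(no referrer sent)';
$img = imagecreatetruecolor(600, 20);
imagefill($img, 0, 0, imagecolorallocate($img, 255, 255, 255));
imagestring($img, 3, 5, 3, $referer, imagecolorallocate($img, 0, 0, 0));
header('Content-Type: image/png');
imagepng($img);
imagedestroy($img);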

Make the link something like http://www.example.com/image.jpg?h=8dh38dj
Here image.jpg is actually a PHP file and 8dh38dj is the hash of the email address you included the link for. When the user requests the file, your PHP script receives '8dh38dj', looks it up in your database and finds the matching email address. Parse out the domain (i.e. gmail.com from example@gmail.com) and you know the message was opened in Gmail. To make .jpg files execute as PHP, use an AddHandler directive in your Apache configuration.
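A minimal sketch of that tracking script; the database table, column names and connection details are hypothetical:

<?php
// image.jpg, executed as PHP via e.g. this .htaccess line:
//   AddHandler application/x-httpd-php .jpg
// Looks up the hash, logs the open, then serves a 1x1 transparent GIF.
$pdo = new PDO('mysql:host=localhost;dbname=tracking', 'user', 'pass');

$hash = isset($_GET['h']) ? $_GET['h'] : '';
$stmt = $pdo->prepare('SELECT email FROM recipients WHERE hash = ?');
$stmt->execute([$hash]);
if ($email = $stmt->fetchColumn()) {
    $domain = substr(strrchr($email, '@'), 1); // "gmail.com" from "example@gmail.com"
    $log = $pdo->prepare('INSERT INTO opens (hash, domain, opened_at) VALUES (?, ?, NOW())');
    $log->execute([$hash, $domain]);
}

// Always return the pixel, whether or not the hash matched.
header('Content-Type: image/gif');
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');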

Related

How can I send event data to Google Measurement Protocol via cURL without a browser generated user-agent?

I am generating leads via Facebook Lead Ads. My server accepts the RTU from Facebook and I am able to push the data around to my CRM as required for my needs.
I want to send an event to GA for when the form is filled out on Facebook.
Reading over the Google Measurement Protocol Reference, it states:
user_agent_string – Is a formatted user agent string that is used to compute the following dimensions: browser, platform, and mobile capabilities.
If this value is not set, the data above will not be computed.
I believe that because I am trying to send the event via a PHP webhook script where no browser is involved, the request is failing.
Here is the relevant part of the code that I'm running (I changed from POST to GET thinking that might have been the issue, will change this back to POST once it's working):
$eventData = [
    'v'   => '1',
    't'   => 'event',
    'tid' => 'UA-XXXXXXX-1',
    'cid' => '98a6a970-141c-4a26-b6j2-d42a253de37e',
    'ec'  => 'my-category-here',
    'ea'  => 'my-action-here',
    'ev'  => 'my-value-here'
];
// Base URL for API submission
$googleAnalyticsApiUrl = 'https://www.google-analytics.com/collect?';
// Add vars from the $eventData array, URL-encoded
foreach ($eventData as $key => $value) {
    $googleAnalyticsApiUrl .= "$key=" . rawurlencode($value) . "&";
}
// Remove the trailing ampersand for a clean URL
$googleAnalyticsApiUrl = substr($googleAnalyticsApiUrl, 0, -1);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $googleAnalyticsApiUrl);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
I believe it is the user-agent that is causing the issue, because if I manually put the same URL I'm trying to hit into the browser, the event appears instantly within the Realtime tracking in GA.
An example of said URL is:
https://www.google-analytics.com/collect?v=1&t=event&tid=UA-XXXXX-1&cid=98a6a970-141c-4a26-b6j2-d42a253de37e&ec=my-category-here&ea=my-action-here&el=my-value-here
I have used both the live endpoint and the /debug/ endpoint. My code will not submit without error to either, yet if I visit the relevant URLs via browser, the debug endpoint says all is ok and then on the live endpoint the event reaches GA as expected.
I'm aware that curl_setopt($ch,CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); is trying to send the user-agent of the browser, I have tried filling this option with things such as
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36"
but it never gets accepted by the Measurement Protocol.
My Questions
Is it possible for me to send these events to GA without a web browser being used in the process? I used to have Zapier push these events for me, so I assume it is possible.
How do I send a valid user_agent_string via PHP? I have tried spoofing it with CURLOPT_USERAGENT, but never managed to get it working.
I had the same problem: fetching the collect URL from my browser worked like a charm (I saw the hit in the Realtime view), but fetching it with curl or wget did not. On the terminal, using httpie also worked.
I sent a user agent header with curl, and that did solve the issue.
So I am a bit puzzled by @daveidivide's last comment and that his initial hypothesis was wrong (I mean, I understand that he might have had two problems, but sending the user-agent header seems mandatory).
In my experience, Google Analytics simply refrains from tracking requests from cURL or wget (possibly others)... perhaps in an attempt to filter out unwanted noise...? 🤷🏼‍♂️
Any request with a User-Agent including the string "curl" won't get tracked. Override the User-Agent header to pretty much anything else, and GA will track it.
If you neglect to override the User-Agent header when using cURL, it'll include a default header identifying itself... and GA will ignore the request.
This is also the case when using a package like Guzzle, which also includes its own default User-Agent string (e.g. "GuzzleHttp/6.5.5 curl/7.65.1 PHP/7.3.9").
As long as you provide your own custom User-Agent header, GA should pick it up.
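For reference, a minimal sketch of a hit sent the way this answer describes; the tid/cid values are placeholders, and the UA string is just one browser-like example:

<?php
// Send a Measurement Protocol event with a browser-like User-Agent
// so GA does not discard it as a cURL request.
$params = http_build_query([
    'v'   => '1',
    't'   => 'event',
    'tid' => 'UA-XXXXXXX-1', // placeholder property ID
    'cid' => '98a6a970-141c-4a26-b6j2-d42a253de37e',
    'ec'  => 'my-category-here',
    'ea'  => 'my-action-here',
]);

$ch = curl_init('https://www.google-analytics.com/collect?' . $params);
// Any UA that doesn't identify itself as curl appears to be accepted.
curl_setopt($ch, CURLOPT_USERAGENT,
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);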

How do I get the name of the device I'm using with s.server=""? (SiteCatalyst)

I am sending the page name and page information to SiteCatalyst.
I need to send the name of the device the visitor is using.
Could someone help me out?
s.pageName="<?php the_title();?> (<?php the_ID(); ?>)"
s.server=""
s.channel="Mobilwebben"
My question is: what value do I need to send with s.server="" so I can get the device name?
For this you can use $_SERVER['HTTP_USER_AGENT']. A quote from PHP.net:
Contents of the User-Agent: header from the current request, if there is one. This is a string denoting the user agent being which is accessing the page. A typical example is: Mozilla/4.5 [en] (X11; U; Linux 2.2.9 i586). Among other things, you can use this value with get_browser() to tailor your page's output to the capabilities of the user agent.
You may also use this free function to read the user agent using PHP:
Detect Mobile Browsers
I hope this helps.
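For instance, a minimal sketch assuming the s_code variables are emitted from a PHP template as in the question:

<?php
// Grab the visitor's raw User-Agent; in practice you may want to map
// it to a friendlier device name first (e.g. via get_browser() or the
// Detect Mobile Browsers function linked above).
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
?>
s.pageName="<?php the_title(); ?> (<?php the_ID(); ?>)"
s.server=<?php echo json_encode($ua); ?>
s.channel="Mobilwebben"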

Is there a test case available (simulator) for search engine bots

I have written myself a fairly strong protection class, "BlockIp", that can use a blacklist of IPs, detect strange IP configurations, and block proxies. When it finds one, I get a detailed email about the visitor, why it was blocked, and what it was trying to do (once a day, of course). It seems to be working very well, because some real attacks in the past have been blocked by this class. It does not block legitimate bots, but it is not easy to test that the detection method is correct.
Today I got an email from the class saying it had blocked "ycar10.mobile.bf1.yahoo.com": it identifies itself as a Yahoo robot but was behind a proxy. I searched the net but could not find it on any blacklist. So the question is: is it right to block bots behind a proxy (do legitimate bots use proxies anyway)? Here is some information about the bot:
HTTP_ACCEPT = */*
HTTP_X_FORWARDED_FOR = 107.38.3.137, 98.137.88.60
HTTP_USER_AGENT = YahooCacheSystem
PATH = /sbin:/bin:/usr/sbin:/usr/bin
SERVER_SIGNATURE =
SERVER_SOFTWARE = Apache/2.2.14
SERVER_PORT = 80
REMOTE_ADDR = 98.139.241.249
REMOTE_PORT = 53863
GATEWAY_INTERFACE = CGI/1.1
SERVER_PROTOCOL = HTTP/1.1
REQUEST_METHOD = GET
QUERY_STRING =
REQUEST_URI = /
SCRIPT_NAME = /index.php
PHP_SELF = /index.php
REQUEST_TIME = 1330923844
Otherwise, is there a test case (suite/simulator) to test the correct behaviour of a legitimate bot (only allowing the major ones such as Google, Yahoo and Bing), so I can be sure I used the right detection method? There are some simulators around, but most of them do not work properly, and the next question is: "can I trust it...?"
*Notice: As you can see in the details above, it is using a REMOTE_PORT value of 53863; what kind of port is 53863?*
I hope you can understand my question; if not, drop a line here.
Port number 53863 is a valid port, not reserved for anything. The computer that connects to your server can choose any port for that particular connection (although you'll probably see port numbers above 1024).
You can use sites like web-sniffer.net that can identify themselves as GoogleBot. The downside is that they only spoof the user-agent, not the behavior (I doubt they are checking for robots.txt first).
As personal advice, please do not try to block many IPs at once based on online blacklists. If you start blocking a lot of IPs you may end up realizing that you've blocked trusted bots, with no way to know which ones they were.
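On the question of trusting a claimed crawler identity: one widely documented technique (Google, Yahoo and Bing each describe it for their own bots) is a forward-confirmed reverse DNS check. A sketch; treat the domain suffixes as assumptions to verify against each engine's documentation:

<?php
// Forward-confirmed reverse DNS for a claimed crawler IP:
// 1. Reverse-resolve the IP to a hostname.
// 2. Check the hostname ends in the crawler's documented domain.
// 3. Forward-resolve the hostname and confirm it maps back to the IP.
function isGenuineCrawler($ip, array $allowedSuffixes)
{
    $host = gethostbyaddr($ip); // e.g. "crawl-66-249-66-1.googlebot.com"
    if ($host === false || $host === $ip) {
        return false;           // no reverse DNS record
    }
    $matches = false;
    foreach ($allowedSuffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $matches = true;
            break;
        }
    }
    // The forward lookup must resolve back to the original IP.
    return $matches && gethostbyname($host) === $ip;
}

// Suffixes assumed here: googlebot.com (Google), crawl.yahoo.net (Yahoo),
// search.msn.com (Bing); check current documentation before relying on them.
$trusted = isGenuineCrawler($_SERVER['REMOTE_ADDR'],
    ['.googlebot.com', '.crawl.yahoo.net', '.search.msn.com']);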

PHP Curl Login Cookies

This is what I'm trying to achieve:
I can't use the real URLs, so I will use sites A.com and B.com.
From page A.com I display a login form, then I use cURL to submit the values to domain B, log in and get the cookie name.
Then I set the headers with the correct info and cookie name.
Then I redirect the page to B.com/welcome.php using PHP's Location header.
My problem is that I always get the error: Your session has expired. Please login again.
If I use cURL to log in and get the cookie and then use cURL to display the page, it works.
The cURL part works fine; the following code causes the "session expired" error:
header ('Host: B.com\n');
header ('User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1\n');
header ('Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\n');
header ('Accept-Language: en-gb,en;q=0.5\n');
header ('Accept-Encoding: gzip, deflate\n');
header ('Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\n');
header ('Referer: http://B.com/\n');
header ('Content-type: application/x-www-form-urlencoded\n');
header ('location: B.com/welcome.php');
The reason I'm using those headers is that that's exactly what Firefox sends when I manually log in to the page.
Thanks for your help
Obviously, your browser does not have the same cookie store that curl uses and so when the page checks for cookie information from the browser it will not find what it is expecting. If you want this login to work in the browser, you will need to set the cookies in the browser BEFORE directing to the welcome page.
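A sketch of that approach: capture the Set-Cookie headers from the cURL response and replay them to the browser before redirecting. One big assumption: the browser will only accept these cookies if your script is served from a host the cookie's domain covers; you cannot set cookies for an unrelated domain.

<?php
// Log in to B.com with cURL and capture the response headers.
$ch = curl_init('https://B.com/login.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
    'user' => $_POST['user'],
    'pass' => $_POST['pass'],
]));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true); // include headers in the output
$response = curl_exec($ch);
$headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
curl_close($ch);

// Replay every Set-Cookie header to the visitor's browser.
$rawHeaders = substr($response, 0, $headerSize);
if (preg_match_all('/^Set-Cookie:\s*([^\r\n]+)/mi', $rawHeaders, $m)) {
    foreach ($m[1] as $cookie) {
        header('Set-Cookie: ' . $cookie, false); // false = append, don't replace
    }
}
// Only now redirect to the welcome page.
header('Location: https://B.com/welcome.php');
exit;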

Are there HTTP header fields I could use to spot spam bots?

It stands to reason that scrapers and spambots wouldn't be built as well as normal web browsers. With this in mind, it seems like there should be some way to spot blatant spambots by just looking at the way they make requests.
Are there any methods for analyzing HTTP headers, or is this just a pipe dream?
Array
(
[Host] => example.com
[Connection] => keep-alive
[Referer] => http://example.com/headers/
[Cache-Control] => max-age=0
[Accept] => application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
[User-Agent] => Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.44 Safari/534.7
[Accept-Encoding] => gzip,deflate,sdch
[Accept-Language] => en-US,en;q=0.8
[Accept-Charset] => ISO-8859-1,utf-8;q=0.7,*;q=0.3
)
If I were writing a spam bot, I would fake the headers of a normal browser, so I doubt this is a viable approach. Some suggestions that might help instead:
use a captcha
if that's too annoying, a simple but effective trick is to include a text input which is hidden by a CSS rule; users won't see it, but spam bots won't normally bother to parse and apply all the CSS rules, so they won't realise the field is not visible and will put something in it. On form submission, check that the field is empty and discard the submission if it isn't (see the sketch after this list).
use a nonce on your forms; check that the nonce that was used when you rendered the form is the same as when it's submitted. This won't catch everything, but will ensure that the post was at least made by something that received the form in the first place. Ideally change the nonce every time the form is rendered.
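A minimal sketch of the hidden-field trick; the field name, CSS class and file layout are arbitrary:

<style>.hp { display: none; }</style>
<form method="post" action="submit.php">
    <!-- Humans never see this field; naive bots tend to fill it in -->
    <input type="text" name="website" class="hp" autocomplete="off">
    <!-- ... the real form fields ... -->
</form>

<?php
// submit.php: a non-empty honeypot almost certainly means a bot;
// silently discard the submission.
if (!empty($_POST['website'])) {
    exit; // optionally log it, or show a generic "thanks" page
}
// ... process the legitimate submission ...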
You can't find all bots this way, but you could catch some, or at least get some probability of a UA being a bot and use that in conjunction with another method.
Some bots forget about Accept-Charset and Accept-Encoding headers. You may also find impossible combinations of Accept and User-Agent (e.g. IE6 won't ask for XHTML, Firefox doesn't advertise MS Office types).
When blocking, be careful about proxies, because they could modify the headers. I recommend backing off if you see Via or X-Forwarded-For headers.
Ideally, instead of writing rules manually, you could use a Bayesian classifier. It could be as simple as joining the relevant headers together and using them as a single "word" in the classifier.
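A rough sketch of the manual-rules variant; the specific heuristics and weights are illustrative only:

<?php
// Score a request against a few of the header heuristics above.
function botSuspicionScore(array $headers)
{
    // Normalise header names to lower case for the lookups.
    $h = array_change_key_case($headers, CASE_LOWER);
    $score = 0;
    // Many crude bots omit these headers entirely.
    if (!isset($h['accept-charset']))  $score += 1;
    if (!isset($h['accept-encoding'])) $score += 1;
    // An impossible combination: IE6 claiming to accept XHTML.
    $ua     = isset($h['user-agent']) ? $h['user-agent'] : '';
    $accept = isset($h['accept']) ? $h['accept'] : '';
    if (strpos($ua, 'MSIE 6') !== false &&
        strpos($accept, 'application/xhtml+xml') !== false) {
        $score += 2;
    }
    // Proxies may strip or alter headers, so back off in their presence.
    if (isset($h['via']) || isset($h['x-forwarded-for'])) {
        $score = max(0, $score - 1);
    }
    return $score;
}

$suspicion = botSuspicionScore(getallheaders());
// Combine $suspicion with other signals before deciding to block.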
