I have a pretty simple captcha, something like this:
<?php
session_start();

// Build a random string of digits and lowercase letters.
function randomText($length) {
    $pattern = "1234567890abcdefghijklmnopqrstuvwxyz";
    $key = '';
    for ($i = 0; $i < $length; $i++) {
        $key .= $pattern[rand(0, 35)];
    }
    return $key;
}

$textCaptcha = randomText(8);
$_SESSION['tmptxt'] = $textCaptcha;

$captcha = imagecreatefromgif("bgcaptcha.gif");
$colText = imagecolorallocate($captcha, 0, 0, 0);
imagestring($captcha, 5, 16, 7, $textCaptcha, $colText);

header("Content-type: image/gif");
imagegif($captcha);
?>
The problem is that if the user has YSlow installed, the image is requested twice, so the captcha is regenerated and never matches the one entered by the user.
I noticed that it is only requested a second time if I send the Content-Type header as image/gif; if I output it as a normal PHP page, this doesn't happen.
Does anyone have any clue about this? How can I prevent it, or identify that the second request is made by YSlow, so that I don't generate the captcha again?
Regards,
Shadow.
YSlow does request the page components when run, so it sounds like your problem is cases where the user has YSlow installed and it's set to run automatically at each page load.
The best solution may be to adjust your captcha code so it does not create a new value on every image request within the same session, or, if it does, to make sure the session variable always matches the image that was actually sent.
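For example, a minimal sketch of that first approach (reusing the randomText() helper and bgcaptcha.gif from the question; only generate a new code when the session doesn't already hold one):

<?php
// Hypothetical captcha.php: redraw whatever code is already in the session,
// so a duplicate request (e.g. from YSlow) doesn't overwrite it.
session_start();

if (empty($_SESSION['tmptxt'])) {
    $_SESSION['tmptxt'] = randomText(8); // randomText() as defined in the question
}
$textCaptcha = $_SESSION['tmptxt'];

$captcha = imagecreatefromgif("bgcaptcha.gif");
$colText = imagecolorallocate($captcha, 0, 0, 0);
imagestring($captcha, 5, 16, 7, $textCaptcha, $colText);

header("Content-type: image/gif");
imagegif($captcha);

The validation script should then unset($_SESSION['tmptxt']) once the user's answer has been checked, so the next form load gets a fresh code.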
But to your original question about detecting the second query made by YSlow, it's possible if you look at the HTTP headers received.
I just ran a test and found these headers sent with the YSlow request. The User-Agent is set to match the browser (Firefox in my case), but you could check for the presence of X-YQL-Depth as a signal. (YSlow uses YQL for all of its requests.)
Array
(
[Client-IP] => 1.2.3.4
[X-Forwarded-For] => 1.2.3.4, 5.6.7.8
[X-YQL-Depth] => 1
[User-Agent] => Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
[Accept-Encoding] => gzip
[Host] => www.example.com
[Connection] => keep-alive
[Via] => HTTP/1.1 htproxy1.ops.sp1.yahoo.net[D1832930] (YahooTrafficServer/1.19.5 [uScM])
)
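If you do want to detect the second request instead, here is a rough sketch based on the headers above (the X-YQL-Depth header shows up in PHP's $_SERVER array with an HTTP_ prefix; whether YQL always sends it is an assumption based on this one test):

<?php
// Don't generate a fresh captcha code when the request appears to come
// from YQL (which YSlow uses); just redraw the existing one.
session_start();

$isYqlRequest = isset($_SERVER['HTTP_X_YQL_DEPTH']);

if (!$isYqlRequest || empty($_SESSION['tmptxt'])) {
    $_SESSION['tmptxt'] = randomText(8); // randomText() from the question
}
$textCaptcha = $_SESSION['tmptxt'];
// ... draw and output the image as in the original script ...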
I am generating leads via Facebook Lead Ads. My server accepts the RTU from Facebook and I am able to push the data around to my CRM as required for my needs.
I want to send an event to GA for when the form is filled out on Facebook.
Reading over the Google Measurement Protocol Reference it states:
user_agent_string – Is a formatted user agent string that is used to compute the following dimensions: browser, platform, and mobile capabilities.
If this value is not set, the data above will not be computed.
I believe that because I am trying to send the event via a PHP webhook script where no browser is involved, the request is failing.
Here is the relevant part of the code that I'm running (I changed from POST to GET thinking that might have been the issue, will change this back to POST once it's working):
$eventData = [
    'v'   => '1',
    't'   => 'event',
    'tid' => 'UA-XXXXXXX-1',
    'cid' => '98a6a970-141c-4a26-b6j2-d42a253de37e',
    'ec'  => 'my-category-here',
    'ea'  => 'my-action-here',
    'ev'  => 'my-value-here'
];
//Base URL for API submission
$googleAnalyticsApiUrl = 'https://www.google-analytics.com/collect?';
//Add vars from the $eventData array
foreach ($eventData as $key => $value) {
    $googleAnalyticsApiUrl .= "$key=$value&";
}
//Remove the trailing ampersand for a clean URL
$googleAnalyticsApiUrl = substr($googleAnalyticsApiUrl, 0, -1);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $googleAnalyticsApiUrl);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
I believe it is the user-agent that is causing the issue, because if I manually put the same URL that I'm trying to hit into a browser, the event appears instantly within the Realtime tracking in GA.
An example of said URL is:
https://www.google-analytics.com/collect?v=1&t=event&tid=UA-XXXXX-1&cid=98a6a970-141c-4a26-b6j2-d42a253de37e&ec=my-category-here&ea=my-action-here&el=my-value-here
I have used both the live endpoint and the /debug/ endpoint. My code does not submit successfully to either, yet if I visit the relevant URLs in a browser, the debug endpoint says all is OK and then on the live endpoint the event reaches GA as expected.
I'm aware that curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); is trying to send the user-agent of the browser. I have tried filling this option with strings such as
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36"
but it never gets accepted by the Measurement Protocol.
My Questions
Is it possible for me to send these events to GA without a web browser being used in the process? I used to have Zapier push these events for me, so I assume it is possible.
How do I send a valid user_agent_string via PHP? I have tried spoofing it with 'CURLOPT_USERAGENT', but never managed to get it working.
I had the same problem: fetching the collect URL from my browser worked like a charm (I saw the hit in the Realtime view), but fetching it with curl or wget did not. On the terminal, using httpie also worked.
I sent a user agent header with curl, and that did solve the issue.
So I am a bit puzzled by #daveidivide's last comment and the suggestion that his initial hypothesis was wrong (I mean, I understand that he might have had two problems, but sending the user-agent header seems mandatory).
In my experience, Google Analytics simply refrains from tracking requests from cURL or wget (possibly others)... perhaps in an attempt to filter out unwanted noise...? 🤷🏼♂️
Any request with a User-Agent including the string "curl" won't get tracked. Override the User-Agent header to pretty much anything else and GA will track it.
If you neglect to override the User-Agent header when using cURL, it'll include a default header identifying itself... and GA will ignore the request.
This is also the case when using a package like Guzzle, which also includes its own default User-Agent string (e.g. "GuzzleHttp/6.5.5 curl/7.65.1 PHP/7.3.9").
As long as you provide your own custom User-Agent header, GA should pick it up.
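For example, a minimal sketch of the hit from the question sent with a browser-like User-Agent (Measurement Protocol v1; the tid and cid values are placeholders carried over from the question):

<?php
// Build the Measurement Protocol v1 payload.
$eventData = [
    'v'   => '1',
    't'   => 'event',
    'tid' => 'UA-XXXXXXX-1',                          // placeholder property ID
    'cid' => '98a6a970-141c-4a26-b6j2-d42a253de37e',  // placeholder client ID
    'ec'  => 'my-category-here',
    'ea'  => 'my-action-here',
];
$url = 'https://www.google-analytics.com/collect?' . http_build_query($eventData);

$ch = curl_init($url);
// Any non-default User-Agent; the stock "curl/x.y.z" one gets ignored by GA.
curl_setopt($ch, CURLOPT_USERAGENT,
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

Pointing the same request at https://www.google-analytics.com/debug/collect returns a JSON validation report, which is handy for checking the payload before switching back to the live endpoint.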
I am writing a Perl program to get the content of a website. When passing a cookie in the request, the response I am getting is "Disallowed Key Characters". The webpage I am trying to fetch is built with PHP. Is there any other way of passing cookies in a clean manner and getting the content of the page, the same as browsers do?
The perl snippet is as follows:
use LWP::UserAgent;
use HTTP::Request;

my $usragt = LWP::UserAgent->new;              # user agent object used below
my $req = HTTP::Request->new(GET => $link);    # $link holds the target URL
$req->header("Host" => "www.example.com/sms");
$req->header('User-Agent' => 'Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0');
$req->header("Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
$req->header("Accept-Language" => "en-us,en;q=0.5");
$req->header('Referer' => 'www.example.com/sms');
$req->header("Cookie" => 'ci_session=a:15:{s:10:"session_id";s:32:"6a023126d470b5c23231f38b00be945f";s:10:"ip_address";s:14:"122.165.230.17";s:10:"user_agent";s:76:"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0";s:13:"last_activity";i:1402922915;s:9:"user_data";s:0:"";s:1:"u";s:7:"username";s:2:"id";s:2:"47";s:7:"uidtype";s:1:"0";s:2:"to";s:0:"";s:4:"from";s:0:"";s:6:"userid";s:0:"";s:3:"fto";s:0:"";s:5:"ffrom";s:0:"";s:12:"sendcontacts";s:0:"";s:6:"checks";s:0:"";}4bdd1a196f5e2fff297cbc0333fde8be');
$req->header("Connection" => "keep-alive");
my $res = $usragt->request($req);
my $code = $res->code();
my $content = $res->content();
print "\n<p>$content</p>\n";
Output:
<p>Disallowed Key Characters.s:32:"6a023126d470b5c23231f38b00be945f"</p>
I was going to put this in a comment since it's not really an answer, but it's too long:
I'm looking at the structure of your cookie, and it's a bit suspicious looking. Here's the breakdown of that cookie:
ci_session=a:15:
{
s:10:"session_id";
s:32:"6a023126d470b5c23231f38b00be945f";
s:10:"ip_address";
s:14:"122.165.230.17";
s:10:"user_agent";
s:76:"Mozilla/5.0 (X11;Ubuntu;Linux x86_64;rv:30.0) Gecko/20100101 Firefox/30.0";
s:13:"last_activity";
i:1402922915;
s:9:"user_data";
s:0:"";
s:1:"u";
s:7:"username";
s:2:"id";
s:2:"47";
s:7:"uidtype";
s:1:"0";
s:2:"to";
s:0:"";
s:4:"from";
s:0:"";
s:6:"userid";
s:0:"";
s:3:"fto";
s:0:"";
s:5:"ffrom";
s:0:"";
s:12:"sendcontacts";
s:0:"";
s:6:"checks";
s:0:"";
}
4bdd1a196f5e2fff297cbc0333fde8be
That last line could be cookie data, but the rest looks like something else. Cookies are bits of data that point to a unique ID set by the server. They usually have a name and a value associated with them.
The purpose of a cookie is to identify the browser that had previously visited the site. HTTP has no state, and cookies could help establish a state. For example, if you visit a store, a cookie could be set to represent your personal shopping cart. The items you buy won't be stored in the cookie -- only the cart ID. This way, the server can recognize you as you move around the store.
A cookie could contain an id that recognizes you as a user. For example, I log into a site, and check the keep me logged in box. A cookie is set to identify my user ID. When I return to the site, the site sees the cookie that's associated with a particular ID and skips the login process.
The point is that cookies themselves are usually short and sweet. Maybe 64 characters at the most. They may have an expiration date associated with them. Your cookie doesn't look like this. It's long, it's complex, and it contains a lot of stuff that belongs in other parts of the header. I see an IP address, a session ID, a User Agent, and what looks like some sort of query in that string. Much of it would change from system to system, so it'd make a terrible cookie.
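For illustration only (a hypothetical PHP sketch, nothing to do with the site being scraped), a typical cookie just carries a short opaque ID while the server keeps the real data on its side:

<?php
// On the first visit, hand the browser a short opaque cart ID...
if (!isset($_COOKIE['cart_id'])) {
    $cartId = bin2hex(random_bytes(16));            // 32 hex characters
    setcookie('cart_id', $cartId, time() + 86400);  // name=value, expires in a day
} else {
    $cartId = $_COOKIE['cart_id'];
}

// ...and keep the actual cart contents on the server, keyed by that ID.
// $cart = load_cart_from_database($cartId);  // hypothetical helper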
Did you check the return status code from that webpage? I wouldn't be surprised if it was a 200 OK code. If that's the case, it means you've successfully contacted the server, and talked to that PHP page. It's the PHP page that's sending you back the error.
Since the error mentions the session_id, it could be that it's an invalid session ID. Or it could be almost anything else. You're talking to a PHP program, and it's hard to say what the error could possibly mean in that case.
You may need to find out how to chat with this webpage you want to talk to. Find out exactly which headers you need. Using curl could help. You can play around with the --header and --data options and see what's going on.
Sorry I can't give you a better answer than this.
I'm using jQuery-File-Upload with jQuery-Iframe-Transport to try to get support for older versions of IE.
I've set the forceIframeTransport option to true so that it behaves more or less the same way in all browsers, but I don't seem to get any data back on the server-side regardless of browser when it uses the iframe transport.
I've spat out the request headers server-side and I get back:
array(
Host => "*******"
Connection => "keep-alive"
Content-Length => "0"
Accept => "*/*"
Origin => "**************"
X-Requested-With => "XMLHttpRequest"
User-Agent => "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17"
DNT => "1"
Referer => "***********"
Accept-Encoding => "gzip,deflate,sdch"
Accept-Language => "en-GB,en-US;q=0.8,en;q=0.6"
Accept-Charset => "ISO-8859-1,utf-8;q=0.7,*;q=0.3"
Cookie => "*********"
)
[*****s indicated bleeped out info; you don't need that ;)]
Which look OK, but $_REQUEST is empty (i.e., array()), and the input buffer is empty too:
$handle = fopen('php://input', 'r');
$file_data = '';
while (($buffer = fgets($handle, 4096)) !== false) {
    $file_data .= $buffer;
}
fclose($handle); // $file_data = '';
This all worked fine when I wasn't using the iframe-transport but I need IE support... does anyone have any experience with transmitting files using iframes and might know why no data is coming through?
When I use jQuery-File-Upload / js / jquery.iframe-transport.js and force iframe transport it works in Chrome, but the requests don't even make it to the server in IE.
When I use jquery-iframe-transport / jquery.iframe-transport.js and force iframe transport it breaks in Chrome, but that's fine because Chrome supports proper XHR file transfers, and the requests at least hit the server in IE but no data comes through.
I've updated my script to support either transfer method:
if (empty($_FILES)) {
    $handle = fopen('php://input', 'r');
    $file_data = '';
    while (($buffer = fgets($handle, 4096)) !== false) {
        $file_data .= $buffer;
    }
    fclose($handle);
} else {
    $file_data = file_get_contents($_FILES['files']['tmp_name'][0]);
}
But again, I still can't seem to get any data in IE regardless of what I do.
When I say "IE", I'm specifically testing in IE 8 right now. I need support back to 7 though. This guy claims support all the way back to IE 6.
After many hours, I've finally tracked down the issue.
First, you need to use the transport plugin that comes bundled with jQuery-file-upload because it was made for it ;) I'm not quite sure why the other one got a step further, but I'll get to that in a minute.
I noticed in IE that I was getting an "access is denied" JavaScript error somewhere in the core jquery library. From what I read online this usually happens when you try to submit to a URL at a different domain, which I wasn't doing, so I dismissed it.
I was comparing what the 2 different transport scripts did differently, when I came to a line that said form.submit() in one version, and form[0].submit() in the other. So I tried adding the [0] and then noticed the "access is denied" error changed to point to that line. So clearly, it didn't like where I was submitting the files to.
I double checked the form.action and the URL still looked fine. Through some Google-fu I discovered that you can also get this error if the event does not originate from the original/native file input element.
I had replaced the native input with a fancy one and then triggered a "fake" 'click' event on the hidden native input. This it didn't like.
Took out my fake upload button and plopped the native one (<input type="file"/> fyi) back in, and now everything works like a charm in all browsers. Huzzah!
For what it's worth ...
I was working with jQuery v1.9.1, doing virus scanning on files synchronously before they are uploaded to the server. If the file had a virus, we were returning an HTTP 400, and an HTTP 200 if not.
The HTTP 400 response caused the IE8 "Access Denied" result.
When I changed the server response from 400 to 401, the UI worked perfectly.
Again, "For What It's Worth."
I'm trying to get this CrunchBase API page as a string in PHP. When I visit that page in a browser, I get the full response (some 230K characters); however, when I try to get the page in a script, the response is much shorter (24341 characters on a server and 36629 characters locally, with exactly the same number of characters for other long CrunchBase pages). To get the page, I am using a function almost identical to drupal_http_request() although I'm not using Drupal. (I have also tried using cURL and file_get_contents() and got the same result. And now that I'm thinking about it I have experienced the same from CrunchBase in Python in the past.)
What could be causing this and how can I fix it? PHP 5.3.2, Apache 2.2.14, Ubuntu 10.04. Here are additional details on the response:
[protocol] => HTTP/1.1
[headers] => Array
(
[content-type] => text/javascript; charset=utf-8
[connection] => close
[status] => 200 OK
[x-powered-by] =>
[etag] => "d809fc56a529054e613cd13e48d75931"
[x-runtime] => 0.00453
[content-length] => 230310
[cache-control] => private, max-age=0, must-revalidate
[server] => nginx/1.0.10 + Phusion Passenger 3.0.11 (mod_rails/mod_rack)
)
I don't think it's a user agent issue as I used User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6 in the request.
UPDATE
According to this thread I needed to add the Accept-Encoding: gzip, deflate header to the request. That does result in a longer response, but now I have to figure out how to inflate it. The gzinflate() function fails with a Warning: Data error. Any thoughts on how to inflate the response?
See the comments in the PHP docs about gzinflate(), specifically the remarks about stripping the initial bytes. The last comment did the trick for me:
<?php $dec = gzinflate(substr($enc,10)); ?>
Though it seems that the number of bytes to be stripped depends on the original encoder. Another comment has a more thorough solution, and a reference to RFC1952 for further reading.
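A slightly more defensive sketch along those lines (assuming the raw response body is in $enc as in the snippet above; note the 10-byte offset only holds for a gzip stream without optional header fields):

<?php
// Strip the RFC 1952 gzip header before calling gzinflate().
// The fixed part of the header is 10 bytes; optional fields (FEXTRA, FNAME, ...)
// make it longer, so check the magic bytes and fall back if inflation fails.
function gunzip_body($enc) {   // hypothetical helper name
    if (strncmp($enc, "\x1f\x8b", 2) === 0) {
        $dec = @gzinflate(substr($enc, 10));
        if ($dec !== false) {
            return $dec;
        }
    }
    // Not gzip (or a non-trivial header): try treating it as raw deflate.
    return gzinflate($enc);
}

$dec = gunzip_body($enc);

Alternatively, letting cURL negotiate and decode the encoding for you (curl_setopt($ch, CURLOPT_ENCODING, 'gzip')) sidesteps the manual inflation entirely.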
Evidently gzdecode() is meant to address this issue, but it hasn't been released yet.
ps -- I deleted my comment about the returned data being plain text. I was wrong.
It stands to reason that scrapers and spambots wouldn't be built as well as normal web browsers. With this in mind, it seems like there should be some way to spot blatant spambots by just looking at the way they make requests.
Are there any methods for analyzing HTTP headers or is this just a pipe-dream?
Array
(
[Host] => example.com
[Connection] => keep-alive
[Referer] => http://example.com/headers/
[Cache-Control] => max-age=0
[Accept] => application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
[User-Agent] => Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.44 Safari/534.7
[Accept-Encoding] => gzip,deflate,sdch
[Accept-Language] => en-US,en;q=0.8
[Accept-Charset] => ISO-8859-1,utf-8;q=0.7,*;q=0.3
)
If I were writing a spam bot, I would fake the headers of a normal browser, so I doubt this is a viable approach. Some other suggestions that might help instead:
use a captcha
if that's too annoying, a simple but effective trick is to include a text input which is hidden by a CSS rule; users won't see it, but spam bots won't normally bother to parse and apply all the CSS rules, so they won't realise the field is invisible and will put something in it. Check on form submission that the field is empty, and disregard the submission if it isn't.
use a nonce on your forms; check that the nonce used when you rendered the form is the same as when it's submitted. This won't catch everything, but it will ensure that the post was at least made by something that received the form in the first place. Ideally, change the nonce every time the form is rendered. (A rough sketch of both the hidden-field and the nonce check follows below.)
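A rough PHP sketch of the hidden-field and nonce checks (the field names, session key, and response code are made up for illustration):

<?php
session_start();

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Honeypot: the "website" field is hidden with CSS, so real users leave it empty.
    $isBot = !empty($_POST['website']);

    // Nonce: the form must echo back the token issued when it was rendered.
    $nonceOk = isset($_POST['nonce'], $_SESSION['form_nonce'])
        && hash_equals($_SESSION['form_nonce'], $_POST['nonce']);

    if ($isBot || !$nonceOk) {
        http_response_code(400);   // disregard the submission
        exit;
    }
    // ... handle the legitimate submission ...
}

// When rendering the form, issue a fresh nonce each time.
$_SESSION['form_nonce'] = bin2hex(random_bytes(16));
// The form HTML then includes a hidden input named "nonce" carrying this value,
// plus a text input named "website" that a CSS rule hides from real users.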
You can't find all bots this way, but you could catch some, or at least get some probability of a UA being a bot and use that in conjunction with another method.
Some bots forget about Accept-Charset and Accept-Encoding headers. You may also find impossible combinations of Accept and User-Agent (e.g. IE6 won't ask for XHTML, Firefox doesn't advertise MS Office types).
When blocking, be careful about proxies, because they could modify the headers. I recommend backing off if you see Via or X-Forwarded-For headers.
Ideally, instead of writing rules manually, you could use a Bayesian classifier. It could be as simple as joining the relevant headers together and using them as a single "word" in the classifier.
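A hedged sketch of those header heuristics (the choice of headers and weights is illustrative, not a drop-in rule set):

<?php
// Score a request on a few header heuristics; higher means more bot-like.
// Backs off entirely when a proxy seems involved, since proxies may strip
// or rewrite headers.
function bot_suspicion_score(array $server) {   // hypothetical helper name
    if (isset($server['HTTP_VIA']) || isset($server['HTTP_X_FORWARDED_FOR'])) {
        return 0; // a proxy may have altered the headers -- don't judge
    }

    $score = 0;
    if (empty($server['HTTP_ACCEPT_ENCODING'])) $score += 1;
    if (empty($server['HTTP_ACCEPT_CHARSET']))  $score += 1;
    if (empty($server['HTTP_ACCEPT_LANGUAGE'])) $score += 1;

    // Implausible combination: an agent claiming to be IE6 but asking for XHTML.
    $ua     = isset($server['HTTP_USER_AGENT']) ? $server['HTTP_USER_AGENT'] : '';
    $accept = isset($server['HTTP_ACCEPT']) ? $server['HTTP_ACCEPT'] : '';
    if (strpos($ua, 'MSIE 6') !== false && strpos($accept, 'application/xhtml+xml') !== false) {
        $score += 2;
    }
    return $score;
}

$suspicion = bot_suspicion_score($_SERVER);
// e.g. treat $suspicion >= 2 as "probably a bot" and combine it with other signals.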