I am writing a Perl program to fetch the content of a website. When I pass a cookie in the request, the response I get is "Disallowed Key Characters." The page I am trying to fetch is written in PHP. Is there a clean way to pass cookies and get the content of the page, the same way browsers do?
The Perl snippet is as follows:
my $req = HTTP::Request->new(GET => $link);
$req->header("Host" => "www.example.com/sms");
$req->header('User-Agent' => 'Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0');
$req->header("Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
$req->header("Accept-Language" => "en-us,en;q=0.5");
$req->header('Referer' => 'www.example.com/sms');
$req->header("Cookie" => 'ci_session=a:15:{s:10:"session_id";s:32:"6a023126d470b5c23231f38b00be945f";s:10:"ip_address";s:14:"122.165.230.17";s:10:"user_agent";s:76:"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0";s:13:"last_activity";i:1402922915;s:9:"user_data";s:0:"";s:1:"u";s:7:"username";s:2:"id";s:2:"47";s:7:"uidtype";s:1:"0";s:2:"to";s:0:"";s:4:"from";s:0:"";s:6:"userid";s:0:"";s:3:"fto";s:0:"";s:5:"ffrom";s:0:"";s:12:"sendcontacts";s:0:"";s:6:"checks";s:0:"";}4bdd1a196f5e2fff297cbc0333fde8be');
$req->header("Connection" => "keep-alive");
my $res = $usragt->request($req);
my $code = $res->code();
my $content = $res->content();
print "\n<p>$content</p>\n";
Output:
<p>Disallowed Key Characters.s:32:"6a023126d470b5c23231f38b00be945f"</p>
I was going to put this in a comment since it's not really an answer, but it's too long:
I'm looking at the structure of your cookie, and it's a bit suspicious looking. Here's the breakdown of that cookie:
ci_session=a:15:
{
s:10:"session_id";
s:32:"6a023126d470b5c23231f38b00be945f";
s:10:"ip_address";
s:14:"122.165.230.17";
s:10:"user_agent";
s:76:"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0";
s:13:"last_activity";
i:1402922915;
s:9:"user_data";
s:0:"";
s:1:"u";
s:7:"username";
s:2:"id";
s:2:"47";
s:7:"uidtype";
s:1:"0";
s:2:"to";
s:0:"";
s:4:"from";
s:0:"";
s:6:"userid";
s:0:"";
s:3:"fto";
s:0:"";
s:5:"ffrom";
s:0:"";
s:12:"sendcontacts";
s:0:"";
s:6:"checks";
s:0:"";
}
4bdd1a196f5e2fff297cbc0333fde8be
That last line could be cookie data, but the rest looks like something else. Cookies are bits of data that point to a unique ID set by the server. Each usually has a name and a value associated with it.
The purpose of a cookie is to identify the browser that had previously visited the site. HTTP has no state, and cookies could help establish a state. For example, if you visit a store, a cookie could be set to represent your personal shopping cart. The items you buy won't be stored in the cookie -- only the cart ID. This way, the server can recognize you as you move around the store.
A cookie could contain an id that recognizes you as a user. For example, I log into a site, and check the keep me logged in box. A cookie is set to identify my user ID. When I return to the site, the site sees the cookie that's associated with a particular ID and skips the login process.
The point is that cookies themselves are usually short and sweet, maybe 64 characters at most. They may have an expiration date associated with them. Your cookie doesn't look like this. It's long, it's complex, and it contains a lot of things that normally belong in other parts of the request. I see an IP address, a session ID, a user agent, and what looks like some sort of query in that string. Much of it would change from system to system, so it would make a terrible cookie.
Did you check the return status code from that webpage? I wouldn't be surprised if it was a 200 OK code. If that's the case, it means you've successfully contacted the server, and talked to that PHP page. It's the PHP page that's sending you back the error.
Since it's in the session_id, it could be that it's an invalid Session ID. Or, it could be almost anything else. You're talking to a PHP program, and it's hard to say what the error could possibly mean in that case.
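One possible reading, which nothing here confirms: a Cookie header separates its name=value pairs with semicolons, so a value that itself contains unencoded semicolons gets cut into pieces, and each piece without an "=" ends up being treated as a name. The sketch below only illustrates that splitting, using a shortened, hypothetical value modelled on the one in the question; splitCookieHeader() is a made-up helper, not how any particular server parses cookies. It is written in PHP, since that is what the target page runs on.

```php
<?php
// Hypothetical, shortened cookie value modelled on the one in the question.
$rawValue = 'a:1:{s:10:"session_id";s:32:"6a023126d470b5c23231f38b00be945f";}';

// Split a Cookie header on ";" and return what would be taken as cookie names.
function splitCookieHeader(string $header): array
{
    $names = [];
    foreach (explode(';', $header) as $pair) {
        $pair = trim($pair);
        if ($pair === '') {
            continue;
        }
        // Everything before the first "=" (or the whole piece) acts as the name.
        $names[] = explode('=', $pair, 2)[0];
    }
    return $names;
}

// Sent as-is, the value is cut at every semicolon.
$unencoded = splitCookieHeader('ci_session=' . $rawValue);

// Percent-encoded first, the value stays in one piece.
$encoded = splitCookieHeader('ci_session=' . rawurlencode($rawValue));

print_r($unencoded);
print_r($encoded);
```

If this is what is happening, the second "name" in the unencoded case is the same s:32:"..." fragment that appears in the error output, and percent-encoding the value before sending it would be worth a try.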
You may need to find out how to chat with this webpage you want to talk to. Find out exactly the headers you need. Using curl could help. You can play around with the --data headings and see what's going on.
Sorry I can't give you a better answer than this.
I am generating leads via Facebook Lead Ads. My server accepts the RTU from Facebook and I am able to push the data around to my CRM as required for my needs.
I want to send an event to GA for when the form is filled out on Facebook.
Reading over the Google Measurement Protocol Reference it states:
user_agent_string – Is a formatted user agent string that is used to compute the following dimensions: browser, platform, and mobile capabilities.
If this value is not set, the data above will not be computed.
I believe that because I am trying to send the event via a PHP webhook script where no browser is involved, the request is failing.
Here is the relevant part of the code that I'm running (I changed from POST to GET thinking that might have been the issue, will change this back to POST once it's working):
$eventData = [
'v' => '1',
't' => 'event',
'tid' => 'UA-XXXXXXX-1',
'cid' => '98a6a970-141c-4a26-b6j2-d42a253de37e',
'ec' => 'my-category-here',
'ea' => 'my-action-here',
'ev' => 'my-value-here'
];
//Base URL for API submission
$googleAnalyticsApiUrl = 'https://www.google-analytics.com/collect?';
//Add vars from $eventData object
foreach ($eventData as $key => $value) {
$googleAnalyticsApiUrl .= "$key=$value&";
}
//Remove the trailing ampersand for a clean URL
$googleAnalyticsApiUrl = substr($googleAnalyticsApiUrl, 0, -1);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $googleAnalyticsApiUrl);
curl_setopt($ch,CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
I believe it is the user-agent that is causing the issue: if I manually put the same URL that I'm trying to hit into the browser, the event appears instantly within the Realtime tracking in GA.
An example of said URL is:
https://www.google-analytics.com/collect?v=1&t=event&tid=UA-XXXXX-1&cid=98a6a970-141c-4a26-b6j2-d42a253de37e&ec=my-category-here&ea=my-action-here&el=my-value-here
I have used both the live endpoint and the /debug/ endpoint. My code will not submit without error to either, yet if I visit the relevant URLs via browser, the debug endpoint says all is ok and then on the live endpoint the event reaches GA as expected.
I'm aware that curl_setopt($ch,CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); is trying to send the user-agent of the browser. I have tried filling this option with strings such as
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36"
but it never gets accepted by the Measurement Protocol.
My Questions
Is it possible for me to send these events to GA without a web browser being used in the process? I used to have Zapier push these events for me, so I assume it is possible.
How do I send a valid user_agent_string via PHP? I have tried spoofing it with 'CURLOPT_USERAGENT', but never manage to get them working.
I had the same problem: fetching the collect URL from my browser worked like a charm (I saw the hit in the Realtime view), but fetching it with curl or wget did not. On the terminal, using httpie also worked.
I sent a user agent header with curl, and that did solve the issue.
So I am a bit puzzled by #daveidivide's last comment saying that his initial hypothesis was wrong (I mean, I understand that he might have had two problems, but sending the user-agent header seems mandatory).
In my experience, Google Analytics simply refrains from tracking requests from cURL or wget (possibly others), perhaps in an attempt to filter out unwanted noise.
Any request with a User-Agent including the string "curl" won't get tracked. Overriding the User-Agent header to pretty much anything else, GA will track it.
If you neglect to override the User-Agent header when using cURL, it'll include a default header identifying itself... and GA will ignore the request.
This is also the case when using a package like Guzzle, which also includes its own default User-Agent string (e.g. "GuzzleHttp/6.5.5 curl/7.65.1 PHP/7.3.9").
As long as you provide your own custom User-Agent header, GA should pick it up.
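As a rough sketch of the above, assuming the parameters from the example URL in the question (including el rather than ev): the URL can be built with http_build_query(), which also URL-encodes the values, and the hit sent with an explicit User-Agent. The agent string my-lead-webhook/1.0 and the helper names buildCollectUrl() and sendHit() are made up for illustration, and whether GA accepts any given string is not something this snippet can verify.

```php
<?php
// Build the /collect URL; http_build_query() URL-encodes each value.
function buildCollectUrl(array $eventData): string
{
    return 'https://www.google-analytics.com/collect?'
        . http_build_query($eventData, '', '&');
}

// Send the hit with an explicit User-Agent of your choosing.
// Defined but not called here, so the snippet runs without network access.
function sendHit(string $url, string $userAgent)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}

$url = buildCollectUrl([
    'v'   => '1',
    't'   => 'event',
    'tid' => 'UA-XXXXXXX-1',
    'cid' => '98a6a970-141c-4a26-b6j2-d42a253de37e',
    'ec'  => 'my-category-here',
    'ea'  => 'my-action-here',
    'el'  => 'my-value-here',
]);
echo $url, "\n";

// Hypothetical agent string; uncomment to actually send the hit.
// sendHit($url, 'my-lead-webhook/1.0');
```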
simplexml_load_file() does not load XML file when the URL includes an ampersand symbol. I have tried two examples with and without ampersand:
$source1 = simplexml_load_file("http://www.isws.illinois.edu/warm/data/outgoing/nbska/datastream.aspx?id=ncu");
print_r($source1); //works
$source2 = simplexml_load_file("http://forecast.weather.gov/MapClick.php?lat=38.8893&lon=-77.0494&unit=0&lg=english&FcstType=dwml");
print_r($source2); //no output
The first example works because its URL does not include an ampersand, but the second example does not work because its URL includes ampersands.
I have referenced
simplexml_load_file with & (ampersand) in url with Solr and simplexml_load_file ampersand in url but it did not work.
The issue is not the ampersand in the URL. The issue, instead, is that weather.gov appears to be blocking these types of requests. They will not allow requests that do not have a user agent set.
The fastest way to get around this is to set a user agent within PHP, which you can do by putting this code above your XML call:
ini_set('user_agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20100101 Firefox/9.0');
However, I would recommend using cURL instead of simplexml_load_file, as simplexml_load_file is often restricted by server configuration. If you were to do this with cURL, you'd want to do something like the first answer here:
SimpleXML user agent
I have tested this locally and got it working just by specifying a user agent.
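For the cURL route, a minimal sketch might look like the following. The helper names fetchWithUserAgent() and parseXml() are made up for illustration, and the XML at the bottom is an invented stand-in so the snippet runs offline; it is not real weather.gov output.

```php
<?php
// Fetch a URL with an explicit User-Agent; returns the body, or false on failure.
// Defined but not called here, so the snippet runs without network access.
function fetchWithUserAgent($url, $userAgent)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Parse an XML string; returns a SimpleXMLElement, or false if it is not well formed.
function parseXml($xml)
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    return simplexml_load_string($xml);
}

// Offline demonstration with a made-up document.
$doc = parseXml('<dwml><data><temperature>72</temperature></data></dwml>');
echo (string) $doc->data->temperature, "\n";
```

In real use you would pass the result of fetchWithUserAgent($url, $userAgent) to parseXml(), checking for false after each step.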
EDIT: Also, welcome to SO! Be sure to vote often ;D
I am sending page name and page information to SiteCatalyst.
I need to send the name of the device the visitor is using.
Could someone help me out
s.pageName="<?php the_title();?> (<?php the_ID(); ?>)"
s.server=""
s.channel="Mobilwebben"
My question is: which variable do I need to put in s.server="" so that I can get the device name?
For this you can use $_SERVER['HTTP_USER_AGENT']. Quote from PHP.net:
Contents of the User-Agent: header from the current request, if there is one. This is a string denoting the user agent being which is accessing the page. A typical example is: Mozilla/4.5 [en] (X11; U; Linux 2.2.9 i586). Among other things, you can use this value with get_browser() to tailor your page's output to the capabilities of the user agent.
You may also use this free function to read the user agent using PHP:
Detect Mobile Browsers
I hope this helps.
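As a very rough illustration of reading the user agent, the sketch below guesses a device class from the string. The function name guessDeviceType() and the keyword list are assumptions made up for this example; it is a crude heuristic, not a complete detector, and the user agent strings at the bottom are only samples.

```php
<?php
// Crude guess at the device class from a User-Agent string.
// The keyword list is an assumption and far from exhaustive.
function guessDeviceType(string $userAgent): string
{
    if ($userAgent === '') {
        return 'unknown';
    }
    if (preg_match('/Mobile|Android|iPhone|iPad/i', $userAgent) === 1) {
        return 'mobile';
    }
    return 'desktop';
}

// In a page you would pass $_SERVER['HTTP_USER_AGENT'] ?? '' instead of these samples.
$phone   = guessDeviceType('Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Mobile/11A465');
$desktop = guessDeviceType('Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0');

echo $phone, "\n", $desktop, "\n";
```

The returned string could then be echoed into s.server in the same way the_title() is echoed into s.pageName.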
I was wondering if there is any way to check in PHP whether a page has been run in a browser (by a human).
I've got a page that only needs to be accessed through a cURL request. So I don't want users snooping around on it.
any ideas?
thanks
EDIT:
Because this is one of those questions that are not easily found on the web, here's the solution I used:
I came up with an idea thanks to anthony-arnold. It's not very stable, but it should do for now.
I simply sent the user agent in my cURL request:
//made a new var with the user agent string.
$user_agent = "anything I want in here, which will be my user agent";
//added this line in the cURL request to send the useragent to the target page:
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
and then I simply wrote an if statement to handle it:
// Single quotes keep PHP from interpolating "$user_agent" inside the placeholder text.
if (($_SERVER['HTTP_USER_AGENT'] ?? '') == 'my expected useragent - the string I previously placed into the $user_agent var.') {
    echo "the useragent is as expected, do whatever";
} else {
    echo "useragent is not as expected, bye now.";
    exit();
}
And that did the trick.
Check the User-Agent or use the get_browser() function to check which browser is requesting your page. You could configure your web server to fail unless a specific user agent is specified. You can set the user agent in cURL using the --user-agent switch (see the man page).
Unfortunately, the user agent can be spoofed so you can never be absolutely sure that the one sent by the client is in fact correct.
There is a problem with this idea though. If it's on the public web, you have to expect that people might try to access it in any way! If the HTTP request is valid, your server will respond to it (under default configuration). If you really don't want it accessed by any method other than your prescribed cURL one, then you might need to invest in some further authentication/authorization methods (e.g. username/passphrase authentication via SSL).
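As one hedged sketch of the username/passphrase idea, the page could check HTTP Basic credentials instead of the user agent. The function name isAuthorized() and the credentials below are made up for illustration, and this only makes sense over HTTPS, since Basic credentials are otherwise sent in the clear.

```php
<?php
// Check HTTP Basic credentials taken from a $_SERVER-style array.
function isAuthorized(array $server, string $expectedUser, string $expectedPass): bool
{
    $user = $server['PHP_AUTH_USER'] ?? '';
    $pass = $server['PHP_AUTH_PW'] ?? '';
    // hash_equals() compares the strings in constant time.
    return hash_equals($expectedUser, $user) && hash_equals($expectedPass, $pass);
}

// Stand-ins for $_SERVER; the credentials are hypothetical.
$good = isAuthorized(['PHP_AUTH_USER' => 'fetcher', 'PHP_AUTH_PW' => 's3cret'], 'fetcher', 's3cret');
$bad  = isAuthorized(['HTTP_USER_AGENT' => 'Mozilla/5.0'], 'fetcher', 's3cret');

var_dump($good, $bad);
```

On the calling side, cURL can send such credentials with the CURLOPT_USERPWD option, and the page would answer with a 401 status and exit when isAuthorized($_SERVER, ...) returns false.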
I have a pretty simple captcha, something like this:
<?php
session_start();
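function randomText($length) {
    $pattern = "1234567890abcdefghijklmnopqrstuvwxyz";
    $key = ""; // initialise before appending to it
    for ($i = 0; $i < $length; $i++) {
        // square brackets: curly-brace string offsets were removed in PHP 8
        $key .= $pattern[rand(0, 35)];
    }
    return $key;
}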
$textCaptcha=randomText(8);
$_SESSION['tmptxt'] = $textCaptcha;
$captcha = imagecreatefromgif("bgcaptcha.gif");
$colText = imagecolorallocate($captcha, 0, 0, 0);
imagestring($captcha, 5, 16, 7, $textCaptcha, $colText);
header("Content-type: image/gif");
imagegif($captcha);
?>
The problem is that if the user has YSlow installed, the image is requested twice, so the captcha is regenerated and never matches the one entered by the user.
I saw that it is only requested a second time if I send the Content-type header as gif; if I output it as a normal PHP page, this doesn't happen.
Does anyone have a clue about this? How can I prevent it, or identify that the second request is made by YSlow, so that the captcha is not generated again?
Regards,
Shadow.
YSlow does request the page components when run, so it sounds like your problem is cases where the user has YSlow installed and it's set to run automatically at each page load.
The best solution may be to adjust your captcha code to not recreate new values within the same session, or if it does to make sure the session variable matches the image sent.
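A minimal sketch of that first option, assuming the same tmptxt session key as in the question: generate the text only when the session does not already hold one, so a repeated image request draws the same string. The helper captchaText() is hypothetical, and a plain array stands in for $_SESSION so the snippet runs on its own.

```php
<?php
// Same idea as the question's randomText(), using random_int() for the index.
function randomText(int $length): string
{
    $pattern = '1234567890abcdefghijklmnopqrstuvwxyz';
    $key = '';
    for ($i = 0; $i < $length; $i++) {
        $key .= $pattern[random_int(0, strlen($pattern) - 1)];
    }
    return $key;
}

// Return the captcha text stored in the session, creating it only if missing.
function captchaText(array &$session, int $length = 8): string
{
    if (empty($session['tmptxt'])) {
        $session['tmptxt'] = randomText($length);
    }
    return $session['tmptxt'];
}

$session = [];                    // stand-in for $_SESSION
$first   = captchaText($session); // first image request
$second  = captchaText($session); // repeated request reuses the stored text

var_dump($first === $second);
```

With this approach the code that validates the form would need to unset the session value after checking it, otherwise the same captcha would be shown for the rest of the session.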
But to your original question about detecting the second query made by YSlow, it's possible if you look at the HTTP headers received.
I just ran a test and found these headers sent with the YSlow request. The User-Agent is set to match the browser (Firefox in my case), but you could check for the presence of X-YQL-Depth as a signal. (YSlow uses YQL for all of its requests.)
Array
(
[Client-IP] => 1.2.3.4
[X-Forwarded-For] => 1.2.3.4, 5.6.7.8
[X-YQL-Depth] => 1
[User-Agent] => Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
[Accept-Encoding] => gzip
[Host] => www.example.com
[Connection] => keep-alive
[Via] => HTTP/1.1 htproxy1.ops.sp1.yahoo.net[D1832930] (YahooTrafficServer/1.19.5 [uScM])
)
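Checking for that header in PHP could look roughly like this. PHP exposes request headers in $_SERVER with an HTTP_ prefix, upper-cased and with dashes turned into underscores, so X-YQL-Depth becomes HTTP_X_YQL_DEPTH. The function name isYSlowRequest() is made up, and treating this one header as a reliable signal is an assumption based only on the test above.

```php
<?php
// True when the request carries the X-YQL-Depth header seen in the YSlow test.
function isYSlowRequest(array $server): bool
{
    return isset($server['HTTP_X_YQL_DEPTH']);
}

// Stand-ins for $_SERVER.
$fromYSlow   = isYSlowRequest(['HTTP_X_YQL_DEPTH' => '1', 'HTTP_HOST' => 'www.example.com']);
$fromBrowser = isYSlowRequest(['HTTP_HOST' => 'www.example.com']);

var_dump($fromYSlow, $fromBrowser);
```

In the captcha script you would call isYSlowRequest($_SERVER) before generating a new value and skip regeneration when it returns true.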