I was just curious to know whether the data presented in the Google Analytics report includes bots/spiders/crawlers. One of the websites we are building is still in stealth (zero marketing, though the site went live about 20-odd days ago). My boss was happy and proud that we already have visitors from all over the world, but I am a little skeptical :)
It would be great if someone could clarify this for me!
Thanks in advance!
Cheers,
--
Sniper
As explained above, Google Analytics uses a JavaScript-based mechanism, and since most crawlers won't execute JavaScript, you shouldn't see crawlers in your stats, only real visitors.
In some situations, however, you can get some "noise" in your stats:
someone has put your UA number into their own website's pages (by mistake?), so you get hits from another website mixed into yours
some services that you have subscribed to in order to monitor the availability of your site from all over the world (like IP Label) run embedded browsers that execute JS, so they will show up in your stats
lastly, if you run the GA mobile tracking code on your site, which is server-side and no longer JS-based, you can get crawlers in your stats, although GA should remove most of them.
To assess whether case 1 applies, go into Visitors / Network Properties / Hostnames and check that only your hostname is displayed. If other domain names show up, you can build an advanced filter to include only your hostname (a sample filter is shown after this list).
For case 2, look at visits per service provider from day to day to highlight providers with a stable number of visits per day over time. You can also look for pages with a high share of direct access + bounces + a similar volume of pageviews per day over time: this is typical of a monitoring system that always requests the same page.
For case 3, look at the web browsers identified by GA in Visitors / Browsers.
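For the advanced filter mentioned in case 1, the settings look roughly like this (menu names have shifted between GA versions, and www.example.com stands in for your own hostname):

Filter Type: Custom Filter > Include
Filter Field: Hostname
Filter Pattern: ^www\.example\.com$

Anything tracked under a different hostname is then excluded from that profile.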
Google Analytics uses JavaScript that isn't usually executed by bots, so your reports shouldn't include data from them. You should be able to look at the visitor overview report to see that your visits are coming from agents that identify themselves as browsers, not bots.
We have a few APIs on our site; one is a Places API.
When Google's spider crawls our site, it hits the quota for our Places API.
I have reset the API over and over, and it's getting very tiring!
I also set my site up to run 3 different API projects with the same APIs (Google Places) and used logic to use up one, then switch to the next, etc. However, even with 450,000 calls per day now available, by noon the Google search spider has killed all 3 API projects!
This now means that my users can no longer use any section that relies on the Places API, which is a HUGE problem! I am not being charged for Google hitting the Google API calls, but it is destroying the user experience on my site and will not be tolerated!
Please help right away!
I imagine it rests in Google's hands to fix this bug in their system; there is really nothing I can personally do, as you have read above that I have done everything I can for my users' experience when visiting my site.
It's not a bug in their system; it's a bug in your site if you have hundreds of thousands of unique URLs that all make API calls and you haven't prevented crawling them using robots.txt (see here).
I ended up solving this with a workaround; for anyone else having this issue, here is what I did.
1) I have 3 API projects set up, each of which can make 150,000 calls a day.
2) I have logic set up to check whether the page is being accessed by a spider like Googlebot.
3) If the session is coming from a spider, the 3rd API key is set to null.
4) The system tries to use each API one by one: if the first result set is empty it tries number 2, and if 2 is empty it tries number 3.
5) Because the 3rd API key is set to null for spiders, its 150,000 calls are set aside for users; but now we have to stop crawlers from crawling blank content.
6) In the logic block that switches from trying API 1, then 2, then 3, I made PHP rewrite my robots.txt file. If API 1 is usable I set this:
file_put_contents('robots.txt', "User-agent: *\nDisallow:\n");
The same goes for API 2. If API 3 is being used, then I rewrite robots.txt to become:
file_put_contents('robots.txt', "User-agent: *\nDisallow: /\n");
This sets aside 150,000 calls for users that the spiders cannot touch, and once the other 300,000 calls have been exhausted, the site can no longer be crawled for the rest of the day by any spider. A rough sketch of the combined logic follows.
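This isn't a drop-in implementation: fetchPlaces() and the key values below are placeholders for whatever your site actually uses, and the user-agent list is only a crude example. But the flow described in steps 2-6 might look roughly like this in PHP:

<?php
// Hypothetical stand-in for the real Places API call; in the live site this
// would hit the Google Places web service with $apiKey and return the
// decoded results, or an empty array once that project's quota is exhausted.
function fetchPlaces($apiKey)
{
    return [];
}

// Hypothetical keys for the three API projects.
$apiKeys = ['KEY_PROJECT_1', 'KEY_PROJECT_2', 'KEY_PROJECT_3'];

// Steps 2-3: crude user-agent sniffing; hide the third key from crawlers so
// its 150,000 calls stay reserved for real visitors.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
if (preg_match('/googlebot|bingbot|slurp|crawler|spider/', $ua)) {
    $apiKeys[2] = null;
}

// Steps 4-5: try each key in turn until one returns results.
$places  = [];
$keyUsed = null;
foreach ($apiKeys as $index => $key) {
    if ($key === null) {
        continue;
    }
    $places = fetchPlaces($key);
    if (!empty($places)) {
        $keyUsed = $index;
        break;
    }
}

// Step 6: once requests have fallen through to the third key (or nothing is
// left at all), the shared quotas are gone, so tell crawlers to stop for the day.
if ($keyUsed === 2 || $keyUsed === null) {
    file_put_contents('robots.txt', "User-agent: *\nDisallow: /\n");
} else {
    file_put_contents('robots.txt', "User-agent: *\nDisallow:\n");
}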
Problem solved!
Told you I'd fix it myself if they couldn't.
Oh, and another note: because it's Google using Google APIs, I'm not being charged for the 300,000 calls that Google kills; I only get charged for what real users use up... pure perfection!
I am trying to implement a visitor counter on a project, but I am confused about one thing: what to accurately count as one visit or view. If I go with an IP-based counter, then even if many people visit the website from the same computer with the same IP (like from a cyber cafe or a shared PC), it will count as one visit. If I simply increment visits every time the homepage is opened, then someone can keep refreshing the homepage to inflate the count, and it will not be an accurate page view count.
So neither option gives an accurate picture of visits.
So I am thinking of implementing IP-based page views where, if someone opens the homepage from the same IP within 5 minutes, it will not be counted as another view; only after five minutes will the count increase again for the same IP. Will this approach give the most accurate page view count, or is there a better solution?
Google Analytics cannot be used, as this website will run on an intranet network.
Google Analytics is still an option for internal websites. I created a workflow application which is only available through our internal network, but Google Analytics still works. The only requirement is that the user of the application has internet access, so that the Google Analytics snippet can communicate with Google's servers.
I would not recommend using your own methods to count visitors, unless you're planning to show this information to all users (as is the case with the view counts here on SO). You could still create some kind of internal mechanism fairly easily, given that people authenticate on your application or you can identify them some other way.
Google Analytics and other tracking applications use cookies, set through JavaScript, to track page visits and especially visitors. Because cookies can be unique per browser session, identifying different people behind the same IP becomes much easier.
However, as @Ahatius points out, it's better not to reinvent the wheel if possible.
Google Analytics also has a PHP API (which I've successfully implemented in the past). However, in that scenario you still have to decide for yourself how to identify visitors and pageviews.
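If you do end up counting things yourself, a minimal sketch of the IP + five-minute-window dedupe from the question might look like this in PHP (the SQLite file and the page_views table/columns are made up for illustration; a cookie- or login-based visitor ID would slot in where the IP is used):

<?php
// Count a page view only if the same IP has not been counted in the last 5 minutes.
$db = new PDO('sqlite:' . __DIR__ . '/views.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE IF NOT EXISTS page_views (ip TEXT, viewed_at INTEGER)');

$ip  = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : 'unknown';
$now = time();

// Was this IP already counted within the last 300 seconds?
$stmt = $db->prepare('SELECT COUNT(*) FROM page_views WHERE ip = ? AND viewed_at > ?');
$stmt->execute([$ip, $now - 300]);
if ((int) $stmt->fetchColumn() === 0) {
    $db->prepare('INSERT INTO page_views (ip, viewed_at) VALUES (?, ?)')->execute([$ip, $now]);
}

// Total deduplicated views so far.
echo 'Page views: ' . (int) $db->query('SELECT COUNT(*) FROM page_views')->fetchColumn();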
I am developing a website and Facebook application for a friend who runs a gardening service and does some handyman work, and I would like some help with the following.
I would like to do this not only on my Facebook application, which is fairly similar to the website itself, just with a little more Facebook integration (posting to the wall and whatnot), but on the main website too. The reason I mention this is that, as you may or may not know, one way of having an application on Facebook is to tell Facebook where the directory is, and whatever is in that directory onwards is shown in an iframe.
With Facebook going to my chosen directory and showing it on Facebook via an iframe, I don't exactly know what goes on (whether visitors count as being on my website as well as on Facebook), and being fairly new to the likes of PHP, I apologize if this question seems a little messy, but I hope I explain myself.
I've been told that the best option for finding out visitors' locations is to use HTML5 Geolocation as my primary method, with a fallback to a good IP geolocator. I would then like to check their current weather via the Met Office website ( http://www.metoffice.gov.uk/datapoint/product/uk-3hourly-site-specific-forecast/detailed-documentation ) so I know which stylesheet should be used.
Different stylesheets are being developed for most conditions, such as Showers, Heavy Rain, Overcast, Sunny, Snow and so on, to customize the experience for the user and also to show different divs depending on their weather and show them their forecast. For example, if it's snowing, a div could ask whether they would like any snow ploughed.
Lastly (something I should have asked and mentioned earlier), I would like this feature to work only for UK users, to avoid any confusion, and if possible to see how far the user is from where my friend is based. He has told me he'll travel up to a 100-mile radius, and further if it helps him pay his bills, so is there a way to check how far away visitors are, whether for a contact form or to show on screen?
Summary:
Show different graphics depending on the weather at the visitor's location, show how far they are from the base, show their forecast and, if possible, update the weather with each 3-hourly refresh.
Inform non-UK users that this is a UK-only business and it would cost too much to travel there.
Best Regards,
Tim
You can achieve this by using the IP address of your visitor and then tracing the visitor's location. Once the location is found, use any weather report service, like Google's, to get the weather for that location. Using the value returned by the provider, you can use a server-side script to change the layout or contents of the page.
It can be a hassle, and it might increase page load time because of the extra time spent contacting and waiting for a response from the weather report provider.
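As a rough sketch of that flow (the lookup URLs and JSON fields below are placeholders, not real services; swap in your actual IP-geolocation provider and weather provider, e.g. the Met Office DataPoint API, and their real response formats):

<?php
// 1. Geolocate the visitor by IP (hypothetical service and response format).
$ip      = $_SERVER['REMOTE_ADDR'] ?? '';
$geoJson = @file_get_contents('https://geo.example.com/lookup?ip=' . urlencode($ip));
$geo     = $geoJson ? json_decode($geoJson, true) : null;

if (!$geo || ($geo['country_code'] ?? '') !== 'GB') {
    // 2. Non-UK (or unknown) visitor: fall back to the default look and show a notice.
    $stylesheet = 'default.css';
    $notice     = 'Sorry, this is a UK-only service.';
} else {
    // 3. Fetch the forecast for that location (again, a placeholder endpoint).
    $weatherJson = @file_get_contents(sprintf(
        'https://weather.example.com/forecast?lat=%F&lon=%F',
        $geo['latitude'] ?? 0,
        $geo['longitude'] ?? 0
    ));
    $weather   = $weatherJson ? json_decode($weatherJson, true) : [];
    $condition = strtolower($weather['condition'] ?? 'overcast');

    // 4. Map the reported condition to one of the pre-built stylesheets.
    $styles     = ['sunny' => 'sunny.css', 'snow' => 'snow.css', 'heavy rain' => 'rain.css'];
    $stylesheet = $styles[$condition] ?? 'overcast.css';
    $notice     = '';
}

// 5. Emit the chosen stylesheet (and the notice, if any) into the page.
echo '<link rel="stylesheet" href="/css/' . htmlspecialchars($stylesheet) . '">';
if ($notice !== '') {
    echo '<p>' . htmlspecialchars($notice) . '</p>';
}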
PS: Suggestions and improvements expected.
I want to store visitor information like arrival time, duration of visit, and also the exact location of the visitor.
Can anyone give me some ideas about this? Please help me.
Just a few links to get you started:
JavaScript heat map generator used to track where users clicked on your site
Recording the time a user spent on any single page
Tracking the user's location via IP
Official Geolocation API specification
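If you want to capture the server-side basics yourself before reaching for a full tool, a minimal sketch might look like this (the table and column names are only illustrative; visit duration and precise location need the client-side techniques linked above):

<?php
// Log the basics for each request: arrival time, IP, requested page, user agent.
$db = new PDO('sqlite:' . __DIR__ . '/visits.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE IF NOT EXISTS visits (ip TEXT, page TEXT, user_agent TEXT, arrived_at INTEGER)');

$db->prepare('INSERT INTO visits (ip, page, user_agent, arrived_at) VALUES (?, ?, ?, ?)')
   ->execute([
       $_SERVER['REMOTE_ADDR'] ?? 'unknown',
       $_SERVER['REQUEST_URI'] ?? '/',
       $_SERVER['HTTP_USER_AGENT'] ?? '',
       time(),
   ]);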
In my opinion, the best thing would be to use an external website statistics tool (like Google Analytics or similar). You can also look for solutions you install locally (like AWStats) in case of privacy concerns. No need to reinvent the wheel there.
I need to execute a Google Analytics script on a site using curl. What reporting features will be available to me for that curl request?
To make my question clearer: suppose I have a website www.abc.com which has the Analytics code on it, and I connect to www.abc.com from www.xyz.com using curl. Will the request I made using curl show in the Analytics report of www.abc.com?
Which parameters won't show? Since there is no navigator, there wouldn't be screen resolution, color depth, or any other JavaScript-based features. Are there any other reporting features that wouldn't be available to me?
I need the following to show up:
user agent (which I will be sending by setting a header)
referrer (again, which I will be sending by setting a header)
source IP address and location (using proxies for different countries)
One issue I'm unsure of is Google's cookies and whether (since I might also be using proxies) these need to be cleared. The ultimate outcome is that I need to be able to emulate site traffic as if it is coming from a variety of visitors...
If the entire thing is not technically possible, is there any other way I can simulate diverse traffic into my Google Analytics account?
So, the other answers are right: curl doesn't execute JS, but there are some methods of automated requests that do.
Other methods to simulate diverse traffic to your account include:
Visit the site manually, grab the __utm.gif request that Google Analytics generates, and manipulate its pieces so that you can curl it in conjunction with curling the actual site, so that the GA pageviews are recorded (i.e., alter the hostname, pageview name, timestamp, etc.). You can find the meaning of the values of those parameters here.
Implement server-side GA on your target site.
Use a headless web engine to programmatically crawl sites. PhantomJS is a particularly user-friendly option.
Use a browser screenshot service like BrowserShots to get traffic from distributed locations to visit your site.
Use Amazon's Mechanical Turk to get people to visit the site. You could pay $0.01 per click, and get a large amount of diverse traffic from a large number of sources. (To verify, give them an arbitrary, simple task like asking them "What's the headline on this website?")
You can send hits (pageviews, events, etc.) directly to Google Analytics using the Measurement Protocol, by creating GET or POST requests with the tool of your choice.
See reference here:
https://developers.google.com/analytics/devguides/collection/protocol/v1/reference
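For example, a pageview hit could be sent with curl from PHP like this (a minimal sketch: the property ID is a placeholder, and in a real simulation cid should stay stable per "visitor" rather than being random every time):

<?php
// Build the Measurement Protocol payload for a single pageview hit.
$params = [
    'v'   => 1,                              // protocol version
    'tid' => 'UA-XXXXXX-Y',                  // your property ID (placeholder)
    'cid' => bin2hex(random_bytes(16)),      // anonymous client ID
    't'   => 'pageview',                     // hit type
    'dh'  => 'www.abc.com',                  // document hostname (from the question)
    'dp'  => '/some-page',                   // document path
    'dr'  => 'http://www.xyz.com/',          // referrer
    'uip' => '203.0.113.10',                 // IP override (example address)
    'ua'  => 'Mozilla/5.0 (compatible; ExampleBrowser/1.0)', // user agent override
];

// POST the hit to the collection endpoint.
$ch = curl_init('https://www.google-analytics.com/collect');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => http_build_query($params),
    CURLOPT_RETURNTRANSFER => true,
]);
curl_exec($ch);
curl_close($ch);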
No, because Google Analytics is based on JavaScript, and curl doesn't process HTML or JavaScript.
Instead of curl, use a command-line tool that executes JavaScript, like HTTPUnit (which includes Rhino). I have heard about WATIR too but have never tried it.
Those happen to be testing tools, but I guess you can use them to trigger Google Analytics too.