Google crawl kills Places API quota - PHP

We have a few Google APIs on our site; one of them is the Places API.
When Google's spider crawls our site, it hits the quota for our Places API.
I have reset the API over and over, and it is getting very tiring!
I also set up three different API projects with the same API (Google Places) enabled and added logic to use one up, then switch to the next, and so on. However, even with 450,000 calls per day available, by noon the Google search spider has exhausted all three projects!
This means my users can no longer use any section of the site that relies on the Places API, which is a huge problem! I am not being charged for the calls Googlebot makes against the Google API, but it is destroying the user experience on my site and cannot be tolerated.
Please help right away!
I imagine it rests in Google's hands to fix this in their system; as you can read above, there is really nothing more I can personally do for my users' experience when visiting my site.

It's not a bug in their system; it's a bug in your site if you have hundreds of thousands of unique URLs that all make API calls and you haven't prevented crawling them with robots.txt.
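For example, if the Places lookups all live under one path, a robots.txt entry along these lines keeps well-behaved crawlers away from them (the /places/ path here is only a placeholder for whatever URLs trigger the API calls on your site):

User-agent: *
Disallow: /places/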

I ended up solving this with a workaround; for anyone else having this issue, here is what I did.
1) I have 3 API projects set up, each of which can make 150,000 calls a day.
2) I have logic that checks whether the page is being accessed by a spider such as Googlebot.
3) If the session is coming from a spider, the 3rd API key is set to null.
4) The system tries each API key one by one: if the first result set is empty it tries number 2, and if number 2 is empty it tries number 3.
5) Because the 3rd API key is null for spiders, its 150,000 calls are set aside for real users; but now we have to stop crawlers from indexing blank content.
6) In the logic block that switches from API 1 to 2 to 3, I have PHP rewrite my robots.txt file. If API 1 is usable I set:
file_put_contents('robots.txt', "User-agent: *\nDisallow: ");
The same goes for API 2. If API 3 is being used, I rewrite robots.txt to become:
file_put_contents('robots.txt', "User-agent: *\nDisallow: /");
This sets aside 150,000 calls that spiders cannot touch, and once the other 300,000 calls have been exhausted, the site can no longer be crawled for the rest of the day by any well-behaved spider. A sketch of the whole flow follows below.
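For anyone who wants to see it in one place, here is a minimal sketch of that flow. The key values, the query parameter, and the places_search() helper are placeholders/assumptions, not the exact code from my site:

<?php
// Three API projects; the third is reserved for real users only.
$apiKeys = array('KEY_PROJECT_1', 'KEY_PROJECT_2', 'KEY_PROJECT_3'); // placeholder keys

// Crude bot detection from the User-Agent header.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isSpider  = (bool) preg_match('/googlebot|bingbot|slurp|crawler|spider/i', $userAgent);
if ($isSpider) {
    $apiKeys[2] = null; // spiders never get to use the user-only key
}

// places_search() is a hypothetical wrapper around the Places API call;
// assume it returns an empty array once a key's quota is exhausted.
$query   = isset($_GET['q']) ? $_GET['q'] : '';
$results = array();
$keyUsed = null;
foreach ($apiKeys as $index => $key) {
    if ($key === null) {
        continue;
    }
    $results = places_search($query, $key);
    if (!empty($results)) {
        $keyUsed = $index;
        break;
    }
}

// Allow crawling while keys 1 or 2 still answer; block it once we have
// fallen back to the user-only key (or everything is exhausted).
if ($keyUsed === 0 || $keyUsed === 1) {
    file_put_contents('robots.txt', "User-agent: *\nDisallow: ");
} else {
    file_put_contents('robots.txt', "User-agent: *\nDisallow: /");
}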
Problem solved!
Told you I'd fix it myself if they couldn't.
One more note: because it's Google consuming Google API calls, I'm not being charged for the 300,000 calls that Googlebot burns through; I only get charged for what real users actually use. Pure perfection!

Related

Google Analytics API returns Null for channel sources on SOME sites

I've built a small application (written in PHP and run locally) that pulls Google Analytics results each month and displays certain metrics/dimensions nicely using ChartJS; the output is saved as a PDF and emailed to the clients.
However, as of last month, I cannot get the API to return any data for acquisition channels for 2 sites in particular - all the other sites are totally fine.
It just so happens that these 2 sites see upwards of tens of thousands of unique visits each month - this is the only factor I can see that might make these 2 different from the rest.
I'm able to pull all the other metrics for these sites successfully, and I can view channel acquisitions if I manually visit the Google Analytics site, but the API just won't return anything.
No errors, no warnings, just an empty return value.
Nothing has changed in the code on my end in the last few months, and everything works on the other sites, just not these 2, so I know the code is working. And it's specifically only channel acquisition that is affected.
I've also tried setting the maximum number of results to return, without success.
Google reports no updates to the API which would cause this.
Any help or advice very much appreciated!
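For reference, this is roughly the kind of Core Reporting API (v3) query involved: a sketch using the google/apiclient library with a service account, where the key file and view (profile) ID are placeholders:

require_once 'vendor/autoload.php';

$client = new Google_Client();
$client->setAuthConfig('service-account.json');        // placeholder key file
$client->addScope(Google_Service_Analytics::ANALYTICS_READONLY);
$analytics = new Google_Service_Analytics($client);

// Sessions broken down by default channel grouping for the last 30 days.
$report = $analytics->data_ga->get(
    'ga:12345678',                                      // placeholder view (profile) ID
    '30daysAgo',
    'today',
    'ga:sessions',
    array(
        'dimensions'  => 'ga:channelGrouping',
        'max-results' => 1000,
    )
);

$rows = $report->getRows();                             // null/empty when the API returns no data
if (empty($rows)) {
    echo "No channel rows returned\n";
} else {
    foreach ($rows as $row) {
        echo $row[0] . ': ' . $row[1] . "\n";           // channel name, session count
    }
}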

Facebook Graph API Comment Count Suddenly Stopped Working

Our site uses our own comment system (simple PHP/MySQL) and also the FB comments plugin. I would like to add the comment counts from each together and display a single total count of comments from both. Seems simple enough.
Months ago, I got this working. Then it suddenly stopped working. This morning, I found a new way to do it, got it working on one page, and by the time I had added the code to all the pages on which we have comments, it was no longer working.
I am pulling my hair out trying to get this working, having virtually zero understanding of JSON. The FB API Explorer gives me an error about auth tokens, but doing what I've seen recommended (i.e. creating a new FB app and including the block of auth code they provide) has no effect.
This is what was working fine at first this morning:
$fbcounturl = 'http://www.catalystathletics.com/articles/article.php?articleID=1902';
$fbjsonurl = "https://graph.facebook.com/v2.1/?fields=share{comment_count}&id=" .$fbcounturl;
$fbdata = file_get_contents($fbjsonurl);
$fbarray = json_decode($fbdata, true);
$fbcomcount = $fbarray['share']['comment_count'];
print($fbcomcount);
Then I could simply add $fbcomcount to the $comCount from our db.
If I just browse to the URL, I get the JSON fine:
{
  "share": {
    "comment_count": 3
  },
  "id": "http://www.catalystathletics.com/articles/article.php?articleID=1902"
}
But the $fbcomcount is empty.
Here is an example of a page that would use this -
http://www.catalystathletics.com/article/1902/Jumping-Forward-in-the-Snatch-or-Clean-Error-Correction/#comments
Any help would be GREATLY appreciated.
Ran into the same issue recently: the Facebook comment count simply stopped working. I eventually tracked down the error in the returned JSON response, which told me Error #4, Application request limit reached:
{"error":{"message":"(#4) Application request limit reached","type":"OAuthException","is_transient":true,"code":4,"fbtrace_id":"EUNAVRNgnFu"}}
Here is a good, detailed response on Facebook Graph API limits that I found elsewhere:
The Facebook API limit isn't really documented, but apparently it's something like 600 calls per 600 seconds, per token and per IP. Since the original source is access-restricted, I'm quoting the relevant part:
After some testing and discussion with the Facebook platform team, there is no official limit I'm aware of or can find in the documentation. However, I've found 600 calls per 600 seconds, per token & per IP to be about where they stop you. I've also seen some application based rate limiting but don't have any numbers.
As a general rule, one call per second should not get rate limited. On the surface this seems very restrictive but remember you can batch certain calls and use the subscription API to get changes.
Since you can access the Graph API on the client side via the JavaScript SDK, I think that if you route your requests through the client you won't hit the application limit, as it's the user (each one with a unique ID) who's fetching the data, not your application server (a single ID).
This may mean a big refactor if everything currently goes through your server, but it seems like the best solution if you're making that many requests (and it'll give your server a breather).
Otherwise, you can try batch requests (see the sketch after this quote), but I guess you're already going that way if you have heavy traffic.
If nothing of this works, according to the Facebook Platform Policy you should contact them.
If you exceed, or plan to exceed, any of the following thresholds please contact us as you may be subject to additional terms: (>5M MAU) or (>100M API calls per day) or (>50M impressions per day).
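Regarding the batch-request suggestion above, here is a rough sketch of how several comment-count lookups can be bundled into one HTTP call. The second article URL and the app access token are placeholders, and the relative_url format simply mirrors the single-request URL from the question:

$urls = array(
    'http://www.catalystathletics.com/articles/article.php?articleID=1902',
    'http://www.catalystathletics.com/articles/article.php?articleID=1903', // placeholder second article
);

// One batch entry per URL whose comment count we want.
$batch = array();
foreach ($urls as $url) {
    $batch[] = array(
        'method'       => 'GET',
        'relative_url' => '?fields=share{comment_count}&id=' . urlencode($url),
    );
}

$ch = curl_init('https://graph.facebook.com/v2.1/');
curl_setopt_array($ch, array(
    CURLOPT_POST           => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POSTFIELDS     => http_build_query(array(
        'access_token' => 'APP_ID|APP_SECRET',   // placeholder app access token
        'batch'        => json_encode($batch),
    )),
));
$responses = json_decode(curl_exec($ch), true);
curl_close($ch);

// Each entry in the response carries its own HTTP code and JSON body.
foreach ($responses as $i => $response) {
    $body  = json_decode($response['body'], true);
    $count = isset($body['share']['comment_count']) ? $body['share']['comment_count'] : 0;
    echo $urls[$i] . ' => ' . $count . PHP_EOL;
}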

How to count website visitors accurately in PHP?

I am trying to implement a visitor counter on a project, but I am confused about what to accurately count as one visit or view. If I go with an IP-based counter, then even if many people visit the website from the same computer with the same IP (like a cyber cafe or a shared PC), it will count as one visit. If I simply increment visits every time the homepage is opened, then someone can keep refreshing the homepage to inflate the count, and it will not be an accurate page-view count.
So neither option gives an accurate picture of visits.
So I am thinking of implementing IP-based page views where, if someone opens the homepage from the same IP within 5 minutes, it is not counted as another view; only after five minutes does the count increase for that IP. Will this approach give the most accurate page-view count, or is there a better solution?
Google analytics cannot be used as this website will be used on an intranet network.
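For what it's worth, here is a minimal sketch of the "one view per IP per 5 minutes" idea described in the question, assuming a small MySQL table and placeholder PDO credentials:

$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass'); // placeholder credentials
// Assumed table: CREATE TABLE visits (ip VARCHAR(45) PRIMARY KEY, last_seen INT, views INT)

$ip  = $_SERVER['REMOTE_ADDR'];
$now = time();

$stmt = $pdo->prepare('SELECT last_seen FROM visits WHERE ip = ?');
$stmt->execute(array($ip));
$lastSeen = $stmt->fetchColumn();

if ($lastSeen === false) {
    // First recorded visit from this IP.
    $pdo->prepare('INSERT INTO visits (ip, last_seen, views) VALUES (?, ?, 1)')->execute(array($ip, $now));
} elseif ($now - (int) $lastSeen >= 300) {
    // More than 5 minutes since the last counted view from this IP.
    $pdo->prepare('UPDATE visits SET last_seen = ?, views = views + 1 WHERE ip = ?')->execute(array($now, $ip));
}
// Otherwise: same IP within 5 minutes, so it is not counted again.
// The total page-view count is SUM(views) over the table.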
Google Analytics is still an option for internal websites. I created a workflow application which is only available through our internal network, but Google Analytics still works. The only requirement is that the user of the application has internet access, so that the Google Analytics snippet can reach Google's servers.
I would not recommend rolling your own method to count visitors unless you plan to show this information to all users (as is the case with the view counts here on SO). You could still create some kind of internal mechanism fairly easily, given that people authenticate on your application or can be identified some other way.
Google Analytics and other tracking applications use cookies set through JavaScript to track page visits and, especially, visitors. Because cookies can be unique per browser session, it becomes easier to tell apart different people behind the same IP.
However, as @Ahatius points out, better not to reinvent the wheel if possible.
Google Analytics also has a PHP API (which I've successfully implemented in the past). However, in that scenario you still have to decide for yourself how to identify visitors and pageviews.

Twitter API Rate Limit for Multiple Users

I am writing a PHP-based application using the Twitter API. Up until now I've been using the REST API via a GET request on a PHP page. However, as my app scales, I can easily see it going over the 150 requests-per-hour limit. Here's why:
I have categories of topics, each which periodically poll the Twitter API for tweets around a topic. For example, I have: mysite.com/cars, mysite.com/trucks, etc. A user can go to either page. When he is on the page, live, refreshing updates are pulled from Twitter by making an AJAX call to a PHP page I've set up. The PHP page determines which category the user is coming from (cars, trucks), polls Twitter for search results, then returns the JSON to the category page. This sounds confusing, but there are a number of unrelated reasons I need to have the intermediate PHP page.
The problem is that since the PHP page is making the requests, it will eat up the rate limit very quickly (imagine if there were 20 categories instead of just cars and trucks). I can't make a single call with multiple parameters because it would combine multiple categories of tweets and I'd have no way to separate them. I can cache results, but if I did, the more categories I add, the longer each would have to go between API calls.
So how can I approach this problem? I looked at the Streaming API, but it requires OAuth and I don't want my users to have to log in to anything. Can I use the stream on the PHP page and then just make continuous requests each time the category page polls the PHP page? Thanks for any help!
a) You don't have to use your website users' OAuth credentials with the Streaming API - just your own:
get them at dev.twitter.com and hardcode them. Your users won't know there is any OAuth going on backstage.
b) Don't use anonymous requests (150 per IP per hour); use OAuth requests (350 per token per hour). You don't have to ask your users to sign in - just sign in with a few of your own private Twitter accounts (one is sufficient to start). If you don't want to build Twitter login functionality, you can attach credentials for your own Twitter account to your Twitter application at dev.twitter.com.
c) As @Cheeso mentioned - cache! Don't let every page load make a Twitter request. A sketch is below.
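To illustrate point c), here is a minimal sketch of per-category caching, so each category hits Twitter at most once per interval no matter how many visitors are polling; search_twitter() is a hypothetical stand-in for whatever wrapper you use around the search API:

function cached_search($category, $ttl = 60) {
    $cacheFile = sys_get_temp_dir() . '/tweets_' . md5($category) . '.json';

    // Serve the cached copy while it is still fresh.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return json_decode(file_get_contents($cacheFile), true);
    }

    // Otherwise hit the Twitter API once and refresh the cache for everyone else.
    $results = search_twitter($category);   // hypothetical wrapper around the search call
    file_put_contents($cacheFile, json_encode($results));
    return $results;
}

// The intermediate AJAX endpoint for mysite.com/cars, mysite.com/trucks, etc.:
header('Content-Type: application/json');
echo json_encode(cached_search(isset($_GET['category']) ? $_GET['category'] : 'cars'));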

Google Analytics and the precision of data/report

I was just curious whether the data presented in Google Analytics reports includes bots/spiders/crawlers. One of the websites we are building is still in stealth mode (zero marketing, though the site went live about 20-odd days ago). My boss was happy and proud that we already have visitors from all over the world, but I am a little skeptical :)
It would be great if someone could clarify this for me!
Thanks in advance!
As explained in the other answer, Google Analytics uses a JavaScript-based mechanism, and since most crawlers won't execute JavaScript, you shouldn't see crawlers in your stats, only real visitors.
In some situations, however, you could get some "noise" in your stats:
someone has put your UA number into their own site's pages (by mistake?), and you get hits from another website mixed into yours
some services you subscribe to that monitor the availability of your site from around the world (like IP Label) run embedded browsers that execute JS, and they will therefore show up in your stats
lastly, if you run GA's mobile tracking code on your site, which is server-side rather than JS-based, crawlers can end up in your stats, although GA should remove most of them.
To assess whether case 1 applies, go to Visitors / Network properties / Hostnames and check that only your hostname is displayed. If other domain names show up, you can build an advanced filter to include only your hostname.
For case 2, look at visits per service provider from day to day to spot providers with a stable number of visits per day over time. You can also look at pages with a high share of direct access, a high bounce rate, and a similar volume of pageviews per day over time: this is typical of a monitoring system that always hits the same page.
For case 3, look at the web browsers identified by GA under Visitors / Browsers.
Google Analytics uses JavaScript that isn't usually executed by bots so your reports shouldn't include data from them. You should be able to look at the visitor overview report to see that your visits are coming from agents that identify themselves as browsers, not bots.
