I'm working on my own PHP URL shortener. I have already built a system that takes a long URL, turns it into a shortened one (something like domain.com/go/URLID) and counts the total clicks for it.
I want to add features like:
A daily usage graph (like the monthly visitor graph Google Analytics shows).
A count of unique clicks.
As I said, the code I made stores the total counts, but I'm not sure how to count unique clicks.
My approach for unique click counts is to use the IP address or cookies, but I'm not sure which one is more reliable (cookies may expire, and an IP address will count a whole household as repeat clicks). How can I build this?
As for the other part, click statistics by day: how can I do that? I was thinking about a very, VERY long database table that stores every URL click, but I guess it would grow too large and the queries would get slow (and I have a 300MB table size limit from my hosting provider).
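To make this concrete, this is the kind of per-click table I was imagining, plus a per-day aggregate I'm wondering about as an alternative that should stay well under the size limit (table and column names are just placeholders):

-- what I was considering: one row per click (grows forever)
CREATE TABLE clicks (
    url_id     INT UNSIGNED NOT NULL,
    clicked_at DATETIME NOT NULL,
    visitor    CHAR(40) NOT NULL   -- e.g. sha1(IP + user agent), not the raw IP
);

-- the compact alternative: one row per short URL per day
CREATE TABLE click_stats (
    url_id     INT UNSIGNED NOT NULL,
    click_date DATE NOT NULL,
    total      INT UNSIGNED NOT NULL DEFAULT 0,
    uniques    INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (url_id, click_date)
);

-- on every hit:
INSERT INTO click_stats (url_id, click_date, total) VALUES (?, CURDATE(), 1)
    ON DUPLICATE KEY UPDATE total = total + 1;

The daily graph would then just be a SELECT over click_stats for the last 30 days, but I'm still not sure how to keep the uniques column honest without logging every visitor.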
I would appreciate some help with this.
I don't mind using external but free services (as long as I can use my own domain, of course).
Thanks!
I'm building a WordPress plugin whose job is to calculate how much each published post should be paid depending on how many visits it registers. It relies on Google Analytics.
Now, when a post is published, it takes some time before it can be paid. Specifically, the post is ready to be paid when its visit count exceeds a pre-set threshold (which we'll imagine is 100, for the sake of these examples). This means that, to know when a post is ready, the plugin needs to know whether it has scored enough visits between its publication date and the current time.
Now, suppose we have:
Post A: published 20/07
Post B: published 25/07
The start date for post A in the GA request would be '2013-07-20', but for post B it would be '2013-07-25'. This means that, basically, every post would need its own request, which is unbearable both because the plugin pages would take something like 30 seconds to load AND GA would probably ban it soon. The plugin runs on big blogs as well, with thousands of published posts: even if I did some caching, there would still be a lot of data that needs to be loaded fresh from GA.
Any help on how this could be sorted out? Thanks.
Update
After two months, and after posts A and B have already been paid once, we still want to pay the posts that have reached some visits threshold. It wouldn't make sense to ask for all the posts of the blog, as it would potentially take forever and return a huge amount of data, so we are only looking for posts that have, say, more than 1000 visits since the last payment. Now here comes the problem: the last payment date (which is the GA start-date) is not the same for each post. Actually, it is different for each post. How would you cope with such a request?
If you know the start and end dates, then why don't you just query for that time period and use the ga:pagePath dimension along with the metric you're after (visits, unique visits, or maybe pageviews)? Then you can parse the response to get the metric for each post. For example:
start-date=2013-07-20
end-date=2013-07-25
dimensions=ga:pagePath
metrics=ga:visits,ga:pageviews
(or do unique visits or pageviews if that's what you want)
This will list all page paths during that period with at least 1 visit/pageview.
Try the Query Explorer to get an idea of the data you want and the equivalent API query.
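If it helps, here is a rough, untested sketch of what that single request could look like server-side; the profile id 'ga:12345678' and $accessToken are placeholders for whatever your OAuth flow provides:

<?php
// One Core Reporting API request for the whole date range, then map each
// pagePath to its visit count so every post can be looked up locally.
$params = http_build_query(array(
    'ids'          => 'ga:12345678',     // placeholder profile id
    'start-date'   => '2013-07-20',
    'end-date'     => '2013-07-25',
    'metrics'      => 'ga:visits,ga:pageviews',
    'dimensions'   => 'ga:pagePath',
    'access_token' => $accessToken,      // placeholder OAuth token
));
$json = file_get_contents('https://www.googleapis.com/analytics/v3/data/ga?' . $params);
$data = json_decode($json, true);

$visitsByPath = array();
$rows = isset($data['rows']) ? $data['rows'] : array();
foreach ($rows as $row) {
    // each row is array(pagePath, visits, pageviews); values come back as strings
    $visitsByPath[$row[0]] = (int) $row[1];
}

One request covers the whole range, and the plugin can then match each post's permalink path against $visitsByPath instead of issuing one request per post.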
I know the title is complicated, but I was looking for some advice on this and found nothing.
I just want to ask if I'm thinking about this the right way.
I need to make a "top shared on Facebook" page with about 10 items or so from my website's items (images, articles etc.).
This part is simple: I will just get the share count from the Facebook Graph API and update it in the database. I don't want to base it on some AJAX call triggered by the FB share button, as that could be misused.
Every item has last-update datetime, creation date and likes fields in the database.
I will also need a top shared URLs list for the last 24 hours, 7 days and a month, so the idea is simple:
A user views an item; every 10 minutes at most, the share count for this URL is fetched from the FB Graph API and updated in the database, which also stores the last update time.
Every time a user views the item, the site checks the last update datetime; if it is more than 10 minutes old, it makes an FB API call and updates. The 10-minute interval is there to keep the number of FB API calls down.
This basically works, but there is a problem - concurrency.
When the item is selected, I check in PHP whether the last update was 10 minutes ago or more, and only then do I call the FB API and then update the share count (if it is bigger than the current one) and the rest of the data, because a remote call is costly and I want to keep FB API usage low.
So, as long as users keep viewing items, they are updated, but the update depends on the select, and I can't do it in one SQL statement because of the time check and the remote call. One user can come in and then another, both after the 10 minutes have passed, and then there is a chance the site will call the FB API many times and update many times; the more users, the more calls and updates, and THIS IS NOT GOOD.
Any advice on how to fix this? Am I doing it right? Maybe there is a better way?
You can either decouple the API check from user interaction completely and have a separate scheduled process collect the Facebook data every 10 minutes, regardless of users.
Or, if you'd rather pursue this event-driven model, then you need to look at using a 'mutex'. Basically, set a flag somewhere (in a file, or a database, etc.) which indicates that a checking process is currently running, so that another one is not started.
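As a sketch of that flag, the last-update column itself can serve as the lock: a single conditional UPDATE is atomic, so only one request wins the right to call the API. This assumes PDO and an items table with id, shares and last_checked columns (names made up); the Graph API call is only indicative:

<?php
// Claim the refresh: only the request that actually flips last_checked
// proceeds to the remote call, everyone else just reads the cached count.
$claim = $pdo->prepare(
    'UPDATE items
        SET last_checked = NOW()
      WHERE id = :id
        AND last_checked < NOW() - INTERVAL 10 MINUTE'
);
$claim->execute(array(':id' => $itemId));

if ($claim->rowCount() === 1) {
    // we won the race: fetch the share count and keep the larger value
    $fb = json_decode(file_get_contents(
        'https://graph.facebook.com/?id=' . urlencode($itemUrl)
    ), true);
    if (isset($fb['shares'])) {
        $pdo->prepare('UPDATE items SET shares = GREATEST(shares, ?) WHERE id = ?')
            ->execute(array((int) $fb['shares'], $itemId));
    }
}

A cron-based collector is still the cleaner option, since it keeps the remote call out of the user's request entirely.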
I run a local directory website (think yelp/yell.com etc) and need to provide analytical data to the businesses listed on the site.
I need to track the following:
1) Number of visitors to specific pages (e.g. Jim's widgets was viewed 65 times)
2) Number of times a user clicks a link (e.g. 25 users clicked to visit your website)
I am able to do this by simply adding one to the relevant number every time an action occurs.
What I would like to be able to do is split this into date ranges, for example, last 30 days, last 12 months, all time.
How do I store this data in the database? I only need the theory, not the code! If someone can explain the best way to store this information, I would be extremely grateful.
For example, do I use one table for dates, one for the pages/links and another for the user data (links clicked/pages visited)? The only solution I have so far is to add a new row to the DB every time one of these actions happens, which isn't going to scale very well.
Thanks to anyone that can help.
I would not reinvent the wheel; I would use an already available solution such as Piwik. It can actually read your normal web logs to provide all the information you asked for.
If for some reason you still need a custom solution, I would not save the tracking data pre-aggregated into ranges, but rather store the exact time and URL for each individual page call (which is what your normal web log provides). The aggregated data should be generated on the fly in your logic layer, e.g. through a SQL view:
SELECT url, COUNT(*) AS calls
FROM calllog
WHERE calldate > NOW() - INTERVAL 30 DAY
GROUP BY url
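For the date ranges you mentioned (last 30 days, last 12 months, all time), only the WHERE clause changes. A sketch of such a log table and a query against it, with illustrative names:

CREATE TABLE calllog (
    id       INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    url      VARCHAR(255) NOT NULL,
    action   ENUM('view', 'click') NOT NULL,
    calldate DATETIME NOT NULL,
    INDEX (url, calldate)
);

-- e.g. link clicks per URL over the last 12 months
SELECT url, COUNT(*) AS clicks
FROM calllog
WHERE action = 'click'
  AND calldate > NOW() - INTERVAL 12 MONTH
GROUP BY url;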
I would like to track all views of a page using PHP and MySQL. I will be tracking the number of times a person viewed the page and their IP address, along with the current date. However, is there a way to make sure you're tracking actual users rather than bots/spiders?
Two options that I see:
Create a "hidden" link on your home page to a honey pot. Any one who hits the honey pot page should be considered a bot and not included in your stats
2: Not a fool proof way, but you could compare the browser's User Agent string to a white list of known web browsers. This string can be spoofed so its not the most reliable.
Personally, I'd go with the first option.
For the honeypot:
On your home page I'd add something like this:
<a href="/honeypot.php" style="display:none">ReallyNotATrap</a>
and on the honeypot page itself something like this:
$botIp = $_SERVER['REMOTE_ADDR'];
// DB connection (for example, a PDO instance in $pdo)
$stmt = $pdo->prepare('INSERT INTO BlackList (ip, trapped_at) VALUES (?, NOW())');
$stmt->execute(array($botIp));   // plus any other data you care about logging
// close DB connection
Then, for your stats code, simply compare every user's IP to the BlackList table. If the user isn't on it, record the stats.
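That comparison could look something like this; it assumes PDO and a page_views table for the stats, and all table and column names are placeholders:

<?php
// Only record the hit if the visitor's IP is not in the BlackList table.
$ip = $_SERVER['REMOTE_ADDR'];

$check = $pdo->prepare('SELECT 1 FROM BlackList WHERE ip = ? LIMIT 1');
$check->execute(array($ip));

if ($check->fetchColumn() === false) {
    // not a known bot: count the page view
    $pdo->prepare('INSERT INTO page_views (page, ip, viewed_at) VALUES (?, ?, NOW())')
        ->execute(array($_SERVER['REQUEST_URI'], $ip));
}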
EDIT
As pointed out below, Googlebot can get tricked by this. If this is something that matters to you (if you're just filtering for your own stats and not filtering content, it shouldn't matter), disallow your honeypot page in your robots.txt. Google will read the file and avoid the trap; other nasty bots will fall into it. Since Google will avoid our trap, I would also use option 2 and filter out Google's User-Agent string from the stats.
The number of real users is basically the total number of visitors minus the bots. If you want to, you can check the User-Agent, which will tell you what is browsing the site.
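For what it's worth, a crude version of that User-Agent check could look like this; the keyword list is only an example, and as noted elsewhere the header can be spoofed:

<?php
// Treat common crawler keywords in the User-Agent as "not a real user".
function looks_like_bot($userAgent) {
    foreach (array('bot', 'spider', 'crawl', 'slurp', 'curl') as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (!looks_like_bot($ua)) {
    // record the visit in your stats
}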
You could try out my tracking script; it's pretty simple to implement, and bots and spiders will come up as a bunk browser, so it's easy to weed them out. I use this on all my company's sites for analytics. There's one caveat, though: if you use this for keyword tracking you may be disappointed real soon, because Google is starting to change the structure of their query strings for logged-in users.
https://github.com/k4t434sis/tracking.php
I want to track how much time each user spends in my Facebook application.
I really don't have any idea how to code this, so please help me out. If someone has any ideas or hints, that will be enough.
I am using the Graph API.
Either: Google Analytics
The easiest solution of course is using Google Analytics. FBML (Facebook Markup Language) has a tag specifically for that: http://developers.facebook.com/docs/reference/fbml/google-analytics
That of course doesn't give you data on what a specific user did, but it is pretty good at telling you the time spent on various pages in your Facebook app. And morally it's much nicer not to store exactly what a specific user did on your site.
Or: Self-coded solution
If you do want to track everything specifically, you'll first need to store when a page was accessed (using PHP when the site loads) and then record at 10-second intervals (or so) that the user is still present, using an AJAX call. To do that, I'd give the page view an ID and send a request to a page like i_am_still_here.php?p={page_view_id}, which takes the current timestamp and updates a database entry for that page view.
This solution has one problem: When a user opens a tab in the background and doesn't look at it for 30 minutes, you don't really want to store that 30 minutes as "the user being on the site".
Also, with whatever self-coded solution you choose, take into consideration that people might have your Facebook app open in more than one tab.
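A minimal sketch of that heartbeat endpoint, assuming a PDO connection and a page_views row that was created (with an opened_at timestamp) when the page first loaded; all names are made up:

<?php
// i_am_still_here.php - pinged by the page every ~10 seconds via AJAX.
// Time on page is then roughly last_seen - opened_at for that page view.
$pageViewId = isset($_GET['p']) ? (int) $_GET['p'] : 0;

if ($pageViewId > 0) {
    $stmt = $pdo->prepare('UPDATE page_views SET last_seen = NOW() WHERE id = ?');
    $stmt->execute(array($pageViewId));
}

On the client, a setInterval that requests this URL (and is paused when the tab loses focus) also addresses the background-tab problem, and because each tab gets its own page view ID, multiple tabs don't interfere with each other.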
Your problem is not Facebook related per se. There are many ways you could implement this; it also depends on your particular application.
You could track every click a user makes (the timestamp of the click) and then calculate the time spent from that (last click in session - first click in session). This approach is not very accurate of course, since you don't know how long a user kept using your application after the last click.
One other solution that comes to mind right now would be to fire an XHR (AJAX) request every X seconds that also logs the timestamp in some storage (DB, Redis, Memcached, ...), and you could then do the same calculation from that. It would be more accurate (depending on your interval X).
You can easily calculate the time spent on Facebook by downloading software called TimRabbit, a desktop application that automatically calculates your time spent on Facebook. It only counts your active time and ignores the time when you are idle.
For details visit: http://etechdiary.com/calculate-time-spent-on-facebook/