Scalable Webstat Tracking - php

I am looking for the "best practices", or just recommended methods, for tracking stats. I am developing a site that has YouTube-like page stat tracking (views, visits, etc.). It is pretty important that I have realtime statistics, but I want to avoid issues when scaling, and I was wondering whether there are other methods to solve this besides caching.
I plan to use Google Analytics for most of the statistics, but Google only updates once every 3-4 hours.
I am a little worried about scalability. Some stats need to be realtime - how does a site like youtube handle it? Do they count stats in memory and then defer a database query to once-every-30-mins or are they just caching read requests and updating those every few hours? What would you recommend doing?
Thanks again SO, I'm so glad that the rest of you can share the wealth of experience that I lack.
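The "count in memory, defer the database write" idea from the question can be sketched roughly as follows. This is only an illustration in Python, not how YouTube or any particular site does it: the `ViewCounter` class, the `page_views` table and the use of SQLite are all assumptions made for the example. In a multi-process PHP deployment the in-memory buffer would more likely live in a shared store such as memcached or Redis, and any increments not yet flushed are lost if the process dies.

```python
import sqlite3
from collections import Counter


class ViewCounter:
    """Buffers view counts in memory and writes them to the database in batches."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = Counter()  # page_id -> views not yet written to the database
        conn.execute(
            "CREATE TABLE IF NOT EXISTS page_views ("
            "page_id TEXT PRIMARY KEY, views INTEGER NOT NULL)"
        )

    def record(self, page_id):
        # Cheap in-memory increment; no database round trip per request.
        self.pending[page_id] += 1

    def current(self, page_id):
        # Realtime value = persisted total + increments not yet flushed.
        row = self.conn.execute(
            "SELECT views FROM page_views WHERE page_id = ?", (page_id,)
        ).fetchone()
        return (row[0] if row else 0) + self.pending[page_id]

    def flush(self):
        # One transaction for the whole batch; meant to be called from a timer or cron job.
        with self.conn:
            for page_id, count in self.pending.items():
                self.conn.execute(
                    "INSERT OR IGNORE INTO page_views (page_id, views) VALUES (?, 0)",
                    (page_id,),
                )
                self.conn.execute(
                    "UPDATE page_views SET views = views + ? WHERE page_id = ?",
                    (count, page_id),
                )
        self.pending.clear()
```

Reads stay realtime because `current()` adds the unflushed increments to the stored total, while the database only sees one batch of writes per flush interval.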

I can recommend one method of tracking web statistics: if you have an iPhone, you might want to look at TeddyTrack. (Full disclosure: I worked on the project.) It is as realtime as it gets. In fact, if you shake your iPhone, it updates your stats instantly. :-) You only get four graphs, but they include (complex) weekly cohort graphs. It's far less complex to set up and manage than Google Analytics. But why choose? Why not get Google Analytics, Piwik, AWStats and TeddyTrack? Use all of them and see which ones you like best. TeddyTrack might suit you because it is very lightweight and it keeps your data on your server. Links:
awstats.sourceforge.net, piwik.org, www.google.com/analytics, teddytrackapp.com.
You can also use non-free services such as chartbeat.com, but they cost serious money.

Not sure which version of Google Analytics you are using but the newer versions support real time stats. Check out http://analytics.blogspot.com/2011/09/whats-happening-on-your-site-right-now.html
Also, you could check what SiteCatalyst from Omniture (now acquired by Adobe) has to offer. It's been a while since I worked with it, but it is really enterprise-grade and scalable.
All the best!

Related

How can I guarantee that my site won't go down with too many players accessing it?

I'm creating a simple browser game with online transactions, but I keep wondering: how can I guarantee that my site won't go down with too many players accessing it?
I'm asking because I'll pay digital influencers to do the marketing, so I expect many people to access it.
Should I rent a VPN and run the backend with Node.js, or will pure PHP do a good enough job of keeping the site up?
Site stability has a lot of different factors. Two main points to consider:
If your site is static HTML and JS files, using a CDN like Cloudflare will provide very strong protection against the site ever going down.
Assuming there's a heavier lift than static files (like DB calls and server-side processing), this ultimately comes down to two factors:
The specs of your server (e.g. ram, CPUs)
The efficiency of your code
Books can be written about how hardware and code can be improved. Ultimately releasing it in the wild will show you how they handle the load. Great monitoring software (like AppOptics) can give you insights into when you're getting close to any limits and need to upgrade hardware or optimize code.
Practically speaking, if you're not expecting a giant load on day one (which, unless you have a fantastic marketing channel or a lot of followers, you likely won't have), you should be more concerned with building something of value than optimizing it. Optimizing comes later.

Facebook/Gmail-like web chatbox - what is a good way for a modern chat app to store text messages?

I'm currently building a Facebook-like chatbox, and I have encountered several considerations and problems along the way.
I have been googling for useful resources the whole time, such as simple chatbox examples or online tutorials.
My goal is to build one just like the Facebook/Gmail chatbox and CometChat. I know it's hard and there is a lot to scale behind the scenes, but all I want is to build it as simply as possible and figure out how the Facebook/Gmail chatbox implements its chat functionality.
Progress:
I have finished a Facebook-like chatbox structure, with a sidebar on the right displaying the online friends I can chat with and a popup chatbox at the bottom that can be expanded and minimized.
I also have finished simple chatting based on MySQL database.
There's a table with 4 columns 'sender', 'receiver', 'message', 'time' for storing conversation.
My chatbox works this way:
1. The user sends a message; my front-end JavaScript fetches the text the user typed and sends it to a PHP file on the server via Ajax.
2. The backend PHP file stores the message in MySQL.
3. The front end calls the update function every 3 seconds to refresh the chatbox content, so that any message the receiver has sent to the sender shows up in the chat.
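One way to keep the three-second polling in step 3 cheap is to have each poll ask only for messages newer than the last one the client has already seen. The sketch below is in Python with SQLite standing in for MySQL; the auto-increment `id` column and the function names are assumptions added for the example and are not part of the table described in the question.

```python
import sqlite3
import time


def create_schema(conn):
    # The question's four columns, plus a hypothetical auto-increment id used as a cursor.
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "sender TEXT, receiver TEXT, message TEXT, time REAL)"
    )


def send_message(conn, sender, receiver, text):
    # Steps 1 and 2: the server stores the message it received from the client.
    with conn:
        conn.execute(
            "INSERT INTO messages (sender, receiver, message, time) VALUES (?, ?, ?, ?)",
            (sender, receiver, text, time.time()),
        )


def fetch_new(conn, me, other, last_seen_id):
    """Step 3: return messages `other` sent to `me` with id > last_seen_id, oldest first."""
    rows = conn.execute(
        "SELECT id, message FROM messages "
        "WHERE receiver = ? AND sender = ? AND id > ? ORDER BY id",
        (me, other, last_seen_id),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_seen_id
    return rows, new_last
```

The client keeps `new_last` and sends it back on the next poll, so a poll with nothing new returns an empty result instead of re-sending the whole conversation.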
I'm not sure this is a good approach in the long run, and I'm really concerned about it.
If the number of users keeps growing, I will have to find ways to scale it well, or my database and server will explode and users might see high latency when conversations update.
Is BigTable the right way to do this if you have millions of users online?
How does Facebook store its users' text messages and chat history in the backend?
How do chat apps like WhatsApp store their text messages?
Is it possible to let users chat directly with one another without storing state on the server?
If I want to implement chat history in my chatbox, what is a good way to do it?
I am thinking the server could create a .txt file for each conversation in its file system, with a database table column storing the file path. Is this a good and correct way to handle chat history? I know it's possible to do it this way, but I'm not sure it's the right way.
I know this could be a huge, detailed application.
I'm not asking for a detailed implementation, just the big picture and the concepts behind building it!
Thank you!
That's a good question and here's an attempt at answering it.
I believe you are thinking about scalability a bit too early. Your IM app might not reach the projected number of users for it to stop performing well. Consider enhancing your small product and scale as you go as much as is needed.
Disk I/O is one of the issues you will face when scaling your web application. Storing conversations directly on disk as .txt files might not be a reliable solution.
Push your technology stack to its limits before considering changing it or switching to something else. I assume you are using a relational database for your storage (since you mentioned columns and rows, which is not a definitive indicator but still). There are other options out there that have good benchmark results at the expense of various other compromises; NoSQL (which is what you were referring to with BigTable) is one of them. Relational databases are great and have long been the industry standard, but there are now alternative solutions that are quite promising.
Look into NoSQL data stores such as MongoDB, CouchDB, or even Cassandra; there are many others. There is a considerable amount of information about the performance of each under specific circumstances and situations. Choose what is best for the problem at hand and not what is most fashionable or hyped.
Another option would be to outsource your scalability problems to a 3rd Party provider such as Firebase. In this situation all you have to worry about is your product and not what's happening under the hood.
Store only the data that you need and archive or dismiss what you don't.
With scalability there are generally 2 broad categories: Horizontal and Vertical scaling.
Horizontal: means adding more nodes to your system i.e. adding more server instances to handle the extra load. There are many cloud solution providers out there that make this genre of scaling very cheap and instantaneous.
Vertical: means adding more resources to the node you are currently running your app on, in addition to using specific technologies that allow you to take full advantage of those resources. This optimization happens at the level of the instance resources (i.e. CPU, RAM, disk space, etc.) as well as your data storage, programming language of choice, the algorithms you are using, and so on. You might conclude that PHP and MySQL aren't the tools for this job, but that's arguable.
Read More about it here
Distributed systems architects and programmers also take advantage of other (faster) programming languages at runtime (such as C, C++ or even Java) to speed up certain tasks. Look into how you can dissect your application into smaller decoupled modules or components that can run independently. (But I'm not sure you will ever reach this stage with an IM client unless it becomes as popular as WhatsApp or Facebook chat.)
I advise you to grab and read a couple of books about scaling web applications and leveraging cloud computing. Study scalable architectures and design your application depending on your business logic based on them.
This is a very broad and complex topic, I'm sure others might have additional interesting insight on the matter.

How to track my visitors? [best performance]

I've been asked to create a custom 'tracker' in PHP, to know where users are coming from and where they are going on the site.
I'm thinking of writing a simple script, which connects to a database, writes the ip, browser, and time of the visit, then closes the db link.
Is this the right way to do it ?
I've found a few similar questions on stackoverflow, but none mentioned performance.
Is there a reason you can't use a solution such as Google Analytics? It's free and has some nice features, such as heat maps which show traffic flow.
The main disadvantage is that it requires you to embed some JavaScript on all the pages, which means that it's client-side.
I suppose it's another question of the kind "I want superior performance, however I have no particular reason for that".
In fact, any solution will be fast enough, as writing logs is not too heavy an operation.
The only thing to keep in mind is not to put any indexes on the log table if an SQL database is used, since every index has to be updated on each insert.
That's all.
So, let's put aside that performance stuff.
The only complete solution would be analyzing web-server logs.
Any other method will not give you complete picture. Say, if there is some image hotlinked on other sites and makes heavy load because of that, you'd never notice that if you log only requests to php scripts.
So, you can run a cron-scheduled script every night that parses the access logs and gathers comprehensive information on all user and bot activity.
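A nightly log-parsing job of that kind could look roughly like the sketch below. It is written in Python and assumes the web server writes the common "combined" access log format; the regular expression, the `summarize` function and the choice of what to count are illustrative assumptions, not a replacement for a full analyzer.

```python
import re
from collections import Counter

# One line of the "combined" access log format (assumed here):
# ip ident user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count hits per path and per (path, referer) pair, skipping unparseable lines."""
    hits = Counter()
    referers = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue
        hits[m["path"]] += 1
        if m["referer"] not in ("", "-"):
            referers[(m["path"], m["referer"])] += 1
    return hits, referers
```

Because every request appears in the access log, a hotlinked image shows up in `hits` and `referers` even though no PHP script ever ran for it. In a cron job the function would be fed the open log file, for example `summarize(open(path))` with `path` pointing at wherever the server writes its log.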
Check Piwik or New Relic. If you need more customization, take a look at Webalizer and Visitors.
N.B.: You can customize Piwik by creating plugins: http://geekmonkey.org/articles/34-how-to-write-a-piwik-plugin
Perhaps you need some special software like Webalizer? (It's free and quite powerful.)
Performance is easy to say but much harder to define. It depends on a zillion circumstances, and while I might say "this is the best performance I can get", you might say "hey, what's this?"
Personally, I recommend Google Analytics. It does almost everything you need (and quite a few things you don't). Maybe you can get a small 'performance' boost by hosting its script locally, but there's a good chance it's already cached in the user's browser.
Or, if you prefer open source solutions, give Piwik a shot.
Piwik does just that, and it does it very well. There is also a Tracking API that you can use to track a lot of things about your visitors, using PHP or any other language (REST API). See more information on http://piwik.org/docs/tracking-api/
Also it is very modular & fast, don't reinvent the wheel :)

Real time activity feed - code / platform implementation?

I am defining the specs for a live activity feed on my website. I have the backend of the data model done, but the open area is the actual code development, where my development team is lost on the best way to make the feeds work. Is this purely done by writing custom code, or do we need to use existing frameworks to make the feeds work in real time? Some of the suggestions given to me were to use reverse AJAX for this. Someone mentioned having the client poll the server every x seconds, but I don't like this because it generates unwanted server traffic when there are no updates. A push engine like Lightstreamer, to push from server to browser, was also mentioned to me.
So in the end: what is the way to go? Is it code-related, purely pushing SQL queries, using frameworks, using platforms, etc.?
My platform is written in PHP (CodeIgniter) and the DB is MySQL.
The activity stream will have lots of activities. There are 42 components on the social network I am developing, and each component has approximately 30 unique activities that can be streamed.
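To make the difference between plain polling and "reverse AJAX" concrete, here is a minimal long-polling sketch in Python. Instead of answering immediately with "nothing new", the server holds the request until an event arrives or a timeout passes. The `ActivityFeed` class and its method names are assumptions for illustration only; the events live in one process's memory, and in a typical PHP setup each waiting request may tie up a worker, which is one reason dedicated push servers exist.

```python
import threading


class ActivityFeed:
    """In-memory event list that readers can block on until something new arrives."""

    def __init__(self):
        self._events = []
        self._cond = threading.Condition()

    def publish(self, event):
        # Called when a component records a new activity; wakes up all waiting readers.
        with self._cond:
            self._events.append(event)
            self._cond.notify_all()

    def wait_for_events(self, cursor, timeout):
        """Block until events beyond `cursor` exist or `timeout` seconds pass."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > cursor, timeout=timeout)
            new = self._events[cursor:]
            return new, cursor + len(new)
```

A client would call `wait_for_events` with the cursor it got from the previous response and reconnect as soon as it returns, so an idle feed costs one held request per timeout period rather than one round trip every few seconds.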
Check out http://www.stream-hub.com/
I have been using superfeedr.com with Rails and I can tell you it works really well. Here are a few facts about it:
Pros
Julien, the lead developer is very helpful when you encounter a problem.
Immediate push of new entries for feeds that support PubSubHubbub.
JSON response, which is perfect for parsing however you'd like.
Retrieve API in case the update callback fails and you need to retrieve the latest entries for a given feed.
Cons
Documentation is not up to the standards I would like, so you'll likely end up searching the web to find obscure implementation details.
You can't control how often superfeedr fetches each feed; they use a secret algorithm to determine that.
The web interface allows you to manage your feeds but becomes difficult to use when you subscribe to a lot of them.
The subscription verification mechanism works synchronously, so you need to make sure the object URL is ready for the superfeedr callback to hit it (they do provide an async option, which does not seem to work well).
Overall I would recommend superfeedr as a good solution for what you need.

API implementation for my social network

I know about code development using PHP but not much about modern-day web APIs. I want to implement a framework of APIs like Facebook Connect, Myspace Connect, Google Connect, etc. for 2 purposes:
1) Users can upload photos to their photo album
2) Other websites can login users using authentication from my site (similar to facebook/Google connect).
So firstly, what is the underlying technology, server requirements, etc. to implement this? Can I use PHP? Then what other schema changes are required? I see Facebook has public API keys that other developers use for this, but I am not sure about the implementation.
Probably the best place to start is to read up on http://oauth.net/. I know there are several OAuth implementations in several languages like Java and .NET. I'm confident there is something for PHP (since Facebook is primarily PHP). Just have to hit the google.
I disagree. People use Facebook because there is no other option out there. I use it too (purely to show off to my friends what a loser I am on weekends when I sit at home and keep posting updates of how awesome my day was). But if I find a better network (not in terms of features but pure trust, a social network that actually respects its users and the information people share), I will switch. Creating an account is no big deal; it takes 1 minute. But trust is a lifetime thing. Once broken, it rarely ever comes back. I see it like a relationship. When I created my Facebook account 4 years ago, I was in a relationship with Facebook. They betrayed me time after time, year after year, and I have no respect for it now. It is like that partner who has cheated on you so many times that today you wish it would just die and fade away. If I find something better, I am out, and all my friends whom I know well enough share my views too. So you will get users, no doubt.
I like your idea of trying to create something; this is how we grow. If everyone thought like these other people here, there would be no progress in the world. Everyone would be a follower only and not a leader. Google could have said "there is Yahoo and Microsoft, which are huge, let's just follow them". But they took their time, fine-tuned their model, and today they are bigger than these brands. Of course, it is a different story that they are a bigger offender at being a big brother than Facebook, but with power, 99% of the time, come these unethical minds who want to take over the world. If you can fall in the 1% who can have power and remain true to your users, people will follow you in a true sense.
