Server-side analytics and server-side logging in particular (PHP)

I need to log analytics and usage details in PHP.
I'm looking at various possibilities:
- Google Analytics server-side
- segment.io
- Just adding a record to a DB with PHP
My concern is how much additional processing this will add to my server. Google Analytics' JavaScript implementation, of course, uses nothing on my server, but any server-side method will.
I also notice that on https://segment.io/docs/integrations/google-analytics they mention that "Server-side Google Analytics is being deprecated due to difficulty of use" - what does this mean?
So basically, I want to implement some basic analytics storing (count number of hits to a URL + some other basic info) server-side - what's the best way to do this considering all things? I only use the PHP language.
It seems that adding a record to the DB every page view might be a little too much.

Segment.io can actually give you the flexibility of all three of these. Using the php library https://segment.io/libraries/php you can start sending events from your server. The library is designed to queue and batch to maximize server efficiency.
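For concreteness, a minimal sketch of sending an event with the analytics-php library might look like this (the require path and write key are placeholder assumptions):

```php
<?php
// Minimal sketch using the segmentio analytics-php library; the
// require path and write key are placeholder assumptions.
require_once("analytics-php/lib/Segment.php");

Segment::init("YOUR_WRITE_KEY");

// Queue a pageview-style event. The library batches queued events and
// sends them in the background, keeping per-request overhead small.
Segment::track(array(
    "userId"     => "user-123",
    "event"      => "Viewed Page",
    "properties" => array(
        "url" => $_SERVER["REQUEST_URI"]
    )
));

// Optionally force-send anything still queued before the script exits.
Segment::flush();
```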
Once the events leave your server, they go to Segment.io's servers; from there, we can route the data on to Google Analytics.
Additionally, you could use the "Webhooks" integration on Segment.io to set your own server as a receiving endpoint for the data in real-time, so that you could host your own analytics DB separately from the rest of your infrastructure quite easily/cleanly.
https://segment.io/docs/integrations/webhooks

Related

How can I handle thousands of requests per second using PHP and MySQL?

I would like to implement an API using PHP and MySQL that can handle several thousand requests per second.
I haven't built this kind of API before. If you have experience implementing a similar task, could you please tell me what the steps are?
How can I implement such an API that can handle thousands of requests per second?
I would be glad if you could explain with sample code.
Thanks in advance for your help.
Based on the details described in the post, you likely want an asynchronous, stateless architecture, so requests don't block resources and the system scales more easily (always sounds easier than it actually is ;)).
Without knowing which other services these servers would connect to (it certainly doesn't make things easier), I'd go for Elixir/Erlang as the programming language and use Phoenix as a framework.
You get a robust functional language that comes with a lot of great built-in features/modules (e.g. Mnesia, hot code upgrades while live) and scales well (it's good at utilizing all cores of your server).
If you need to queue up requests to the 2nd-tier servers, an AMQP client/server (e.g. RabbitMQ) might be a good option (it holds/stores the requests for the servers), as sketched below.
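A minimal sketch of that hand-off in PHP, assuming the php-amqplib client and a local RabbitMQ broker with default credentials (the queue name and payload are illustrative):

```php
<?php
// Enqueue an incoming request for 2nd-tier workers via RabbitMQ.
require_once __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Durable queue so pending requests survive a broker restart.
$channel->queue_declare('api_requests', false, true, false, false);

// Hand the request off to the workers and return immediately.
$msg = new AMQPMessage(
    json_encode(array('path' => '/resize', 'image_id' => 42)),
    array('delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT)
);
$channel->basic_publish($msg, '', 'api_requests');

$channel->close();
$connection->close();
```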
That works pretty well if it's stateless, in the form of the client asking one thing, the server responding once, and being done with the task. If you have many requests because clients ask for updates every single second, then you're better off switching to a stateful connection and using WebSockets, so the server can push updates out to a lot of clients and cut down a lot of the chatter.
All of this is from a high-level view. In the end, it depends on what kind of services you want to provide, as that narrows down what the suitable tool would be. My suggestion is one possibility which I think isn't far off (Node.js, mentioned before, is also more than valid).
Well, you need to consider several factors, such as:
Authenticating the API. Your API should only be callable by valid users that are authorized and authenticated.
Caching API results. Your API should cache the results of API calls. This allows it to answer repeated requests more quickly and handle more requests per second. Memcache can be used to cache the results of an API call (see the sketch after this list).
The API architecture. RESTful APIs have less overhead than SOAP-based APIs. SOAP-based APIs have better support for authentication and are also better structured than RESTful APIs.
API documentation. Your API should be well documented and easy for users to understand.
API scope. Your API should have a well-defined scope: for example, will it be used over the internet as a public API, or inside a corporate intranet as a private API?
Device support. When designing your API, keep in mind the devices that will consume it: for example smartphones, desktop applications, browser-based applications, server applications, etc.
API output format. When designing your API, keep in mind the format of the output: for example, will the output contain user-interface-related data or just plain data? One popular approach is known as separation of concerns (https://en.wikipedia.org/wiki/Separation_of_concerns): for example, separating the backend and frontend logic.
Rate limiting and throttling. Your API should implement rate limiting and throttling to prevent overuse and misuse of the API (also covered in the sketch below).
API versioning and backward compatibility. Your API should be carefully versioned. For example, if you update your API, the new version should still support older API clients, and it should continue to do so until all the clients have migrated to the new version.
API pricing and monitoring. The usage of your API should be monitored so you know who is using it and how. You may also charge users for using your API.
Metrics for success. You should also decide which metric to use for measuring the success of your API: for example, the number of API calls per second or the monetary earnings from your API. Development activities such as research, publishing articles, open-source code, participation in online forums, etc. may also be considered when determining its success.
Estimation of the costs involved. You should also calculate the cost of developing and deploying your API: for example, how much time it will take to produce a usable version, and how much of your development time the API takes.
Updating your API. You should also decide how often to update your API, for example how often new features should be added. Keep backward compatibility in mind here too, so that updating your API does not negatively affect your clients.
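To make the caching and rate-limiting points concrete, here is a rough PHP sketch assuming the Memcached extension; the key names, the 60-requests-per-minute limit, and computeResult() are hypothetical:

```php
<?php
// Fixed-window rate limiting plus result caching with Memcached.
$m = new Memcached();
$m->addServer('localhost', 11211);

// Rate limiting: a one-minute window per client.
$clientId = $_SERVER['REMOTE_ADDR'];
$window   = floor(time() / 60);
$rlKey    = "rl:$clientId:$window";

$m->add($rlKey, 0, 120);          // create the counter if it doesn't exist
$count = $m->increment($rlKey);
if ($count !== false && $count > 60) {
    header('HTTP/1.1 429 Too Many Requests');
    exit;
}

// Caching: serve repeated identical calls from Memcached.
$cacheKey = 'api:' . md5($_SERVER['REQUEST_URI']);
$result = $m->get($cacheKey);
if ($result === false) {
    $result = computeResult($_GET);   // hypothetical expensive work
    $m->set($cacheKey, $result, 300); // cache for 5 minutes
}

header('Content-Type: application/json');
echo json_encode($result);
```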
Good answer. I think one thing to keep in mind is where the bottleneck is. Many times, the bottleneck isn't the API server itself but the data access patterns with the persistence layer.
Think about how you access your data. When posting new items, a lot of the processing can often be delayed and handled asynchronously to the original request. For example, if you're resizing an image or sending an email, you can integrate RabbitMQ or SQS to queue up a job, which is processed later by workers. Queues are great at buffering work so that if a server goes down, the work is simply queued up to be processed once it's back online.
On the query side, it is important to understand how indexing works and how data is stored. There are different types of indices; for example, hash tables can give you constant access time, but you cannot perform range queries with them. The easiest case is simple, denormalized data objects queried by identifiers that can be stored in an index. If your data is more complex and you need heavy joins or aggregations, then you can look at precomputed values stored in something like Redis or Memcache (see the sketch below).
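A minimal sketch of that precomputed-values pattern, assuming the phpredis extension and a hypothetical hits table and schema:

```php
<?php
// Serve a precomputed aggregate from Redis, falling back to MySQL.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$day = date('Y-m-d');
$key = "agg:daily_hits:$day";

$total = $redis->get($key);
if ($total === false) {
    // Cache miss: compute the aggregate once from MySQL, then cache it.
    $pdo = new PDO('mysql:host=localhost;dbname=analytics', 'user', 'pass');
    $stmt = $pdo->prepare('SELECT COUNT(*) FROM hits WHERE day = ?');
    $stmt->execute(array($day));
    $total = (int) $stmt->fetchColumn();
    $redis->setex($key, 3600, $total); // recompute at most once an hour
}
```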

Storing high traffic events from web pages on Google Cloud

I am trying to build (to keep it as simple as possible) "something like Google Analytics". That means I want to store small objects (<2 KB) with a few pieces of information from a web page in some storage and be able to query them.
I have JS code that sends those event objects to a PHP endpoint on Google App Engine. This endpoint then inserts them into Google BigQuery. Here comes my problem: the insertion is done via the Google API PHP library as a REST request, so every event performs an HTTP request, which is very slow.
My question: is there a better way to store the events in the Google Cloud environment? Is it better (and more cost-effective?) to use Pub/Sub or Redis to store the events and have some background workers that load this queue into BigQuery?
Any idea how to do this as efficiently (in both performance and cost) as possible would be greatly appreciated!
If I had to do this, I would first make the endpoint handler save the raw data into a push queue, because enqueuing is relatively fast on App Engine. The processing of the data and the BigQuery API calls would be done later, in the task queue.
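A minimal sketch of what that enqueueing might look like, assuming the legacy App Engine PHP SDK's PushTask API (the /bq-worker URL and queue name are placeholder assumptions; the worker handler would do the actual BigQuery insert):

```php
<?php
// Endpoint handler: enqueue the raw event instead of calling BigQuery inline.
use google\appengine\api\taskqueue\PushTask;

$event = file_get_contents('php://input'); // raw JSON event from the JS client

$task = new PushTask('/bq-worker', array('event' => $event));
$task->add('bigquery-queue'); // named queue configured in queue.yaml
```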
I guess the performance and cost you can get also vary a bit depending on the App Engine language (PHP, Go, Java, ...).

Pipe Apache Logs to Google Analytics?

Does anyone know of a piece of code that can run on a server and pipe data from Apache logs into Google Analytics? I've got a bunch of websites that generate logs, but the users would likely object to injecting Google tracking codes into them. This might be a nice way to get the basics (what's being requested from where) and have it all sorted for me alongside my other Google Analytics pages.
You can use the new Measurement Protocol (available for Universal Analytics accounts only) to implement a server-side solution.
Piping logs would probably not work very well (at least as a batch job: I don't think you can send a timestamp via the Measurement Protocol, so it would look as if all hits occurred at the same time), but it shouldn't be necessary anyway. Just create a URL with the relevant parameters pointing to the Google endpoint and send it in the background via cURL (or similar), as sketched below.
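A minimal sketch of such a background hit; the tracking ID and client ID are placeholders, and cid should be a stable, anonymous UUID per visitor:

```php
<?php
// Send a pageview hit to the Measurement Protocol endpoint via cURL.
$params = http_build_query(array(
    'v'   => 1,                        // protocol version
    'tid' => 'UA-XXXXXXXX-1',          // your Universal Analytics property
    'cid' => '35009a79-1a05-49d7-b876-2b884d0f825b',
    't'   => 'pageview',
    'dh'  => 'example.com',            // document host
    'dp'  => '/requested/page',        // document path from the log line
));

$ch = curl_init('https://www.google-analytics.com/collect');
curl_setopt_array($ch, array(
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $params,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT        => 2, // don't let analytics block anything else
));
curl_exec($ch);
curl_close($ch);
```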
If you're in the European Union, remember that privacy guidelines still apply: you need to inform users and provide an opt-out link.
For non-Universal Analytics accounts, you can use php-ga - Server-Side Google Analytics Client -- it's essentially a server-side implementation of ga.js.
One caveat: if you want the location metrics to record something other than the location of your server, you'll need to log with a Google Analytics mobile tracking ID. Just replace the "UA" in the tracking ID with "MO", like "MO-12345678-1".
GA needs JavaScript, I think, so that various things like screen resolution can be grabbed. So even if this were possible, you'd be missing a good deal of info for some of your users, skewing your other percentages. Also, if your users are suspicious of Google, they probably would not want you to upload their IP addresses to GA.
With all that in mind, I wonder whether a self-hosted GA-like system would fit the bill? If so, try Piwik.

Web Chat in PHP + MySQL +JavaScript

I am looking to create a Web Chat system using PHP, MySQL and JavaScript.
Currently, I am storing messages in a MySQL database with an incremental ID (Yes, it is indexed), a timestamp, the sender, and the message itself. I am then using AJAX to query the database every 500ms, seeing if there are any more recent messages than the last one received. However, I have a feeling that this is probably horribly inefficient as it will result in a large load on the MySQL server when multiple users are online. Having looked around a bit on Google and on here, everything seems to point to this way of doing it.
My question: is there a better way to do this? Any tips on how to reduce the load on the server would also be welcome.
I'm using PHP 5.3, on an Apache webserver, so libraries or plugins compatible with those would be fine.
EDIT:
Forgot to mention in the original post, but I'm not worried about supporting IE or other outdated browsers.
Potentially viable basic approach:
Cache your 50 most recent messages in Memcache, and reset this cache whenever a new entry is added to the database. When new users connect, serve them these 50 messages to populate their chatroom (see the sketch at the end of this answer).
Use a third party service like http://www.pubnub.com/ to send messages to your clients. Whenever a new message is sent to your chatroom, send it out on pubnub. Your server code will do this after writing to your database successfully.
Notes: I'm not affiliated with PubNub. You don't need to use exactly 50 messages above, either; you don't even have to give users any messages when they connect, depending on how you want to set it up. The point is that you want to avoid your users reading from your database in this case; that model isn't likely to scale for this type of application.
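A rough sketch of the caching half, assuming the Memcached extension and a hypothetical messages table:

```php
<?php
// Refresh the "50 most recent messages" cache after each successful INSERT.
function refreshRecentMessages(Memcached $m, PDO $pdo) {
    $rows = $pdo->query(
        'SELECT id, ts, sender, message FROM messages ORDER BY id DESC LIMIT 50'
    )->fetchAll(PDO::FETCH_ASSOC);
    // Store oldest-first so clients can render in order.
    $m->set('recent_messages', array_reverse($rows));
}

// When a new user connects, serve the cached backlog instead of hitting MySQL:
$m = new Memcached();
$m->addServer('localhost', 11211);
$backlog = $m->get('recent_messages') ?: array();
echo json_encode($backlog);
```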
An evented environment would be ideal for this kind of app; the LAMP stack is not particularly well suited.
I would recommend using this library, PubNub. PubNub is an easy way to broadcast signals from JavaScript, or any TCP-capable language (such as PHP), and JavaScript instantly receives the sent messages.
In PHP, you could simply have the message save to your database, then use PubNub's PHP APIs to send the message to everyone else on the page.
If you're familiar with HTML, JavaScript, and PHP, it can be fairly easy to learn. I would recommend it.
You are asking about a web chat system specifically built in PHP, MySQL and HTML with JavaScript. There are many options, including pre-built solutions: http://www.cometchat.com/ and http://www.arrowchat.com/, both of which offer comet chat services powered by a cloud offering like http://www.pubnub.com/, with options to host it yourself. See http://www.cometchat.com/cometservice/third-party-alternatives for a comparison of the service providers. There are several more options, but I recommend starting there. If you need something simpler, an HTML-and-JavaScript-only solution, check out http://www.pubnub.com/blog/build-real-time-web-apps-easy, a blog post about building real-time web apps easily, with an example chat app in 10 lines of JavaScript. It cuts development time by working across all browsers and mobile devices.
You should look into AJAX long polling. In a nutshell, this is a simple AJAX call that does not return a result from the server if there is no new data. You just run a simple loop on the server side until new data becomes available, then return it. Of course, you have to stop this eventually if there's no result to send to the client after a while (e.g. 1 minute), and then the client restarts the call. A rough sketch of the server side:
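This sketch assumes PDO/MySQL and a hypothetical messages table; the timeout and sleep interval are tunable assumptions:

```php
<?php
// Long-polling endpoint: hold the connection open until a message newer
// than last_id arrives, or time out after ~30 seconds.
$pdo = new PDO('mysql:host=localhost;dbname=chat', 'user', 'pass');
$lastId = isset($_GET['last_id']) ? (int) $_GET['last_id'] : 0;

$deadline = time() + 30;
do {
    $stmt = $pdo->prepare(
        'SELECT id, ts, sender, message FROM messages WHERE id > ? ORDER BY id'
    );
    $stmt->execute(array($lastId));
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    if ($rows) {
        header('Content-Type: application/json');
        echo json_encode($rows);
        exit; // client renders the messages, then reconnects immediately
    }
    usleep(500000); // wait 0.5s before checking again
} while (time() < $deadline);

// Timed out with nothing new: return an empty list so the client reconnects.
header('Content-Type: application/json');
echo json_encode(array());
```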
I suspect that chat is too write-intensive for MySQL's storage engines. Maybe the MEMORY table type would be OK (I've never used it). I've spoken to several developers and everybody agrees that the best option for chat is Memcache, or even writing your own custom daemon (with memory-only storage as well).
For the client part, you may read about short-polling, long-polling, and WebSockets / sockets via Flash/Java objects.
"using AJAX to query the database every 500ms" is short-polling.
Sockets are a better solution than AJAX polling; however, there isn't much around about how you can integrate socket-based chat with MySQL.
I have done a few tests and have a basic example working here: https://github.com/andrefigueira/PHP-MySQL-Sockets-Chat
It makes use of Ratchet (http://socketo.me/) for the creation of the chat server in PHP.
And you can send chat messages to the DB by sending the server JSON with the information about who is chatting (if, of course, you have user sessions).
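For context, the basic shape of a Ratchet chat server (essentially the hello-world example from socketo.me) is sketched below; persisting messages to MySQL would go inside onMessage():

```php
<?php
// Minimal Ratchet chat server. Run with: php chat-server.php
require __DIR__ . '/vendor/autoload.php';

use Ratchet\MessageComponentInterface;
use Ratchet\ConnectionInterface;
use Ratchet\Http\HttpServer;
use Ratchet\Server\IoServer;
use Ratchet\WebSocket\WsServer;

class Chat implements MessageComponentInterface {
    protected $clients;

    public function __construct() {
        $this->clients = new \SplObjectStorage;
    }
    public function onOpen(ConnectionInterface $conn) {
        $this->clients->attach($conn);
    }
    public function onMessage(ConnectionInterface $from, $msg) {
        // Broadcast to everyone except the sender; this is also where
        // you could INSERT the JSON message into MySQL.
        foreach ($this->clients as $client) {
            if ($client !== $from) {
                $client->send($msg);
            }
        }
    }
    public function onClose(ConnectionInterface $conn) {
        $this->clients->detach($conn);
    }
    public function onError(ConnectionInterface $conn, \Exception $e) {
        $conn->close();
    }
}

IoServer::factory(new HttpServer(new WsServer(new Chat())), 8080)->run();
```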

Real time activity feed - code / platform implementation?

I am defining specs for a live activity feed on my website. The backend data model is done, but the open area is the actual code development, where my development team is lost on the best way to make the feeds work. Is this purely done by writing custom code, or do we need to use existing frameworks to make the feeds work in real time? One suggestion given to me was to use reverse AJAX. Someone mentioned having the client poll the server every x seconds, but I don't like this because it creates unwanted server traffic when there are no updates. A push engine like Lightstreamer, which pushes from server to browser, was also mentioned.
So in the end: what is the way to go? Is it custom code, purely pushing SQL queries, using frameworks, using platforms, etc.?
My platform is written in PHP (CodeIgniter) and the DB is MySQL.
The activity stream will have lots of activities. There are 42 components on the social network I am developing, and each component has roughly 30 unique activities that can be streamed.
Check out http://www.stream-hub.com/
I have been using superfeedr.com with Rails, and I can tell you it works really well. Here are a few facts about it:
Pros
Julien, the lead developer, is very helpful when you encounter a problem.
Immediate push of new feed entries for feeds which support PubSubHubbub.
JSON responses, which are easy to parse however you'd like.
A retrieve API, in case the update callback fails and you need to retrieve the latest entries for a given feed.
Cons
Documentation is not up to the standard I would like, so you'll likely end up searching the web for obscure implementation details.
You can't control how often Superfeedr fetches each feed; they use a secret algorithm to determine that.
The web interface allows you to manage your feeds but becomes difficult to use when you subscribe to a lot of them.
The subscription verification mechanism works synchronously, so you need to make sure the object URL is ready for the Superfeedr callback to hit it (they do provide an async option, which does not seem to work well); see the callback sketch below.
Overall, I would recommend Superfeedr as a good solution for what you need.
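For reference, the synchronous verification step mentioned in the cons is sketched below; it assumes a plain PHP callback endpoint (note that PHP exposes the hub.challenge query parameter as $_GET['hub_challenge']):

```php
<?php
// PubSubHubbub subscription-verification callback: the hub (Superfeedr)
// GETs this URL with a hub.challenge parameter, and the subscriber must
// echo it back with a 200 status to confirm the subscription.
if (isset($_GET['hub_challenge'])) {
    header('HTTP/1.1 200 OK');
    echo $_GET['hub_challenge'];
    exit;
}

// Otherwise this is a content ping: read the pushed entries.
$payload = json_decode(file_get_contents('php://input'), true);
```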
