Analytics, statistics or logging information for a PHP script - php

I have a WordPress plugin, which checks for an updated version of itself every hour with my website. On my website, I have a script running which listens for such update requests and responds with data.
What I want to implement is some basic analytics for this script, which can give me information like the number of requests per day, the number of unique requests per day/week/month, etc.
What is the best way to go about this?
Use some existing analytics script which can do the job for me
Log this information in a file on the server and process that file on my computer to get the information out
Log this information in a database on the server and use queries to fetch the information
Also there will be about 4000 to 5000 requests every hour, so whatever approach I take should not be too heavy on the server.
I know this is a very open ended question, but I couldn't find anything useful that can get me started in a particular direction.

Wow. I'm surprised this doesn't have any answers yet. Anyways, here goes:
1. Using an existing script / framework
Obviously, Google Analytics won't work for you since it is JavaScript based. I'm sure there exist PHP analytics frameworks out there. Whether you use them or not is really a matter of personal choice. Do these existing frameworks record everything you need? If not, do they lend themselves to being easily modified? You could use a good existing framework and choose not to reinvent the wheel. Personally, I would write my own just for the learning experience.
I don't know any such frameworks off the top of my head because I've never needed one. I could do a Google search and paste the first few results here, but then so could you.
2. Log to a file or MySQL
There is absolutely NO GOOD REASON to log to a file. You'd first log it to a file, then write a script to parse this file. Tomorrow you decide you want to capture some additional information, and now you need to modify your parsing script. This will get messy. What I'm getting at is: you do not need to use a file as an intermediate store before the database. 4-5K write requests an hour (I don't think there will be many read requests apart from when you query the DB) is a breeze for MySQL. Furthermore, since this DB won't be used to serve up data to users, you don't care if it is slightly un-optimized. As I see it, you're the only one who'll be querying the database.
EDIT:
When you talked about using a file, I assumed you meant to use it as a temporary store only until you process the file and transfer the contents to a DB. If you did not mean that, and instead meant to store the information permanently in files, that would be a nightmare. Imagine trying to query for certain information that is scattered across files. Not only would you have to write a script that can parse the files, you'd have to write a non-trivial script that can query them without loading all the contents into memory. That would get nasty very, very fast and tremendously impair your ability to spot trends in the data.
Once again: 4-5K might seem like a lot of requests, but a well-optimized DB can handle it. Querying a reasonably optimized DB will be orders of magnitude faster than parsing and querying numerous files.
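For illustration, here is a rough sketch of what the MySQL approach could look like. The table and column names are made up for this example, and the query at the end shows how you would pull out total and unique requests per day:

    <?php
    // Sketch only: assumes a table along these lines (names are illustrative):
    //   CREATE TABLE update_requests (
    //       id           INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    //       ip           VARCHAR(45) NOT NULL,
    //       plugin_ver   VARCHAR(20),
    //       requested_at DATETIME NOT NULL
    //   );
    $pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');

    // Log one incoming update check.
    $stmt = $pdo->prepare(
        'INSERT INTO update_requests (ip, plugin_ver, requested_at)
         VALUES (:ip, :ver, NOW())'
    );
    $stmt->execute([
        ':ip'  => $_SERVER['REMOTE_ADDR'],
        ':ver' => isset($_GET['version']) ? $_GET['version'] : 'unknown',
    ]);

    // Later (from any admin script), pull out the numbers you care about:
    $report = $pdo->query(
        'SELECT DATE(requested_at) AS day,
                COUNT(*)           AS total_requests,
                COUNT(DISTINCT ip) AS unique_ips
         FROM update_requests
         GROUP BY DATE(requested_at)'
    )->fetchAll(PDO::FETCH_ASSOC);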

I would recommend using an existing script or framework. It is always a good idea to use a specialized tool into which people have invested a lot of time and ideas. Since you are using PHP, Piwik seems to be one way to go. From the webpage:
Piwik is a downloadable, Free/Libre (GPLv3 licensed) real time web analytics software program. It provides you with detailed reports on your website visitors: the search engines and keywords they used, the language they speak, your popular pages…
Piwik provides a Tracking API and you can track custom variables. The DB schema seems highly optimized; have a look at their testimonials page.
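If you go the Piwik route, the Tracking API can be called from plain server-side PHP, so no JavaScript is needed. A rough sketch, assuming the PiwikTracker client class shipped with Piwik and a Piwik install at a URL of your own (check the current Tracking API docs for the exact class and method names):

    <?php
    // Sketch only: PiwikTracker.php comes with Piwik; the URL and site ID
    // below are placeholders for your own install.
    require_once 'PiwikTracker.php';

    PiwikTracker::$URL = 'http://analytics.example.com/';
    $tracker = new PiwikTracker($idSite = 1);

    // Record the plugin's hourly update check as a tracked action, with the
    // plugin version stored as a custom variable (names are illustrative).
    $version = isset($_GET['version']) ? $_GET['version'] : 'unknown';
    $tracker->setCustomVariable(1, 'plugin_version', $version);
    $tracker->doTrackPageView('Plugin update check');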

Related

Adopting my current system to include real time notifications

I have a PHP system that does everything a social media platform does, i.e. add comments, upload images, add objects, logins, sessions etc., storing all interactions in a MySQL database. So I've got a pretty good infrastructure to build on.
The next stage of my project is to develop the system so that notifications are sent to the "Networks" of "Contacts" that are associated with one another, much like Facebook's notification system, e.g. "Chris has just commented on object N."
I'm looking at implementing this system for a lot of users, 10,000+, so it has to be reliable. I've researched how Facebook does it and come across techniques such as memcache, sockets & hashing.
Are there any systems that can be easily adapted to this functionality? I could do with a quick, reliable implementation.
P.S. One thought I had was just querying the database every 5 seconds, e.g. "select everything that has happened in the last 5 seconds", using jQuery, Ajax & PHP, but that's stupid; it would exhaust the server & database, right?
I've seen this website & this article; can anyone reflect on this and tell me what the best approach is, as I am hesitant about which path to follow?
Thanks
This is not possible with just pure vanilla PHP/MySQL. What you can do is set MySQL triggers (http://dev.mysql.com/doc/refman/5.0/en/triggers.html) on your data, and then use the sys_exec UDF (which you will need to install on your MySQL server) to run the notification PHP script. See this post: Invoking a PHP script from a MySQL trigger
If you can get this set up it should be pretty reliable and fast.
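For the sake of illustration, here is a heavily simplified sketch of that setup. The comments table, column names, and script path are invented, and sys_exec comes from the lib_mysqludf_sys package mentioned above, so treat this as a starting point rather than a drop-in solution:

    <?php
    // Sketch only: creates a trigger that shells out to a PHP notification
    // script whenever a comment row is inserted. Requires the sys_exec() UDF
    // (lib_mysqludf_sys) to be installed on the MySQL server.
    $pdo = new PDO('mysql:host=localhost;dbname=social', 'user', 'pass');

    $pdo->exec(
        "CREATE TRIGGER notify_new_comment
         AFTER INSERT ON comments
         FOR EACH ROW
           SET @rc = sys_exec(CONCAT('php /var/www/notify.php ', NEW.id, ' > /dev/null 2>&1 &'))"
    );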

Collecting and Processing data with PHP (Twitter Streaming API)

After reading through all of the Twitter streaming API and Phirehose PHP documentation, I've come across something I have yet to do: collect and process data separately.
The logic behind it, if I understand correctly, is to prevent a log jam at the processing phase that would back up the collection process. I've seen examples before, but they basically write to a MySQL database right after collection, which seems to go against what Twitter recommends you do.
What I'd like some advice/help on is the best way to handle this. It seems that people recommend writing all the data directly to a text file and then parsing/processing it with a separate function. But with this method, I'd assume it could be a memory hog.
Here's the catch: it's all going to be running as a daemon/background process. So does anyone have any experience with solving a problem like this, or more specifically, with the Twitter Phirehose library? Thanks!
Some notes:
* The connection will be through a socket, so my guess is that the file will be constantly appended to? Not sure if anyone has any feedback on that.
The phirehose library comes with an example of how to do this. See:
Collect: https://github.com/fennb/phirehose/blob/master/example/ghetto-queue-collect.php
Consume: https://github.com/fennb/phirehose/blob/master/example/ghetto-queue-consume.php
This uses a flat file, which is very scalable and fast, ie: Your average hard disk can write sequentially at 40MB/s+ and scales linearly (ie: unlike a database, it doesn't slow down as it gets bigger).
You don't need any database functionality to consume a stream (ie: you just want the next tweet, there's no "querying" involved).
If you rotate the file fairly often, you will get near-realtime performance (if desired).
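A bare-bones sketch of that rotated flat-file idea (not the actual Phirehose example code; the directory and file naming are assumptions), in case it helps to see the collect/consume split in one place:

    <?php
    // Collector appends one raw JSON tweet per line to a file named after the
    // current minute; the consumer only reads files from past minutes, so the
    // two processes never touch the same file at the same time.
    define('QUEUE_DIR', '/tmp/tweet-queue');   // assumption: existing, writable dir

    // Collector side: called once per tweet received from the stream.
    function enqueue_tweet($rawJsonTweet)
    {
        $bucket = QUEUE_DIR . '/tweets-' . date('YmdHi') . '.queue';
        file_put_contents($bucket, $rawJsonTweet . "\n", FILE_APPEND | LOCK_EX);
    }

    // Consumer side: run in a separate process (cron or its own loop).
    function consume_queue()
    {
        $current = QUEUE_DIR . '/tweets-' . date('YmdHi') . '.queue';
        foreach (glob(QUEUE_DIR . '/tweets-*.queue') as $file) {
            if ($file === $current) {
                continue;                      // still being written to, skip
            }
            $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
            foreach ($lines as $line) {
                $tweet = json_decode($line, true);
                // ... slow processing / database insert happens here ...
            }
            unlink($file);                     // this chunk is done
        }
    }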

PHP - detecting changes in external database-driven site

For a homework project, I'm creating a PHP driven website which main function is aggregating news about various university courses.
The main problem is this: (almost) every course has its own website. These are usually just plain HTML or built using some simple free CMS system.
As a student, participating in 6-7 courses, almost every day you go through 6-7 websites checking if there are any news. The idea behind the project is that you don't have to do that, instead, you just check the aggregation site.
My idea is the following: each time a student logs in, go through his course list. For every course, fetch its website (recursively, like with wget) and create a hash value of it. If the hash is different from the one stored in the database, we know the site has changed, and we notify the student.
So, what do you think, is this reasonable way to achieve the functionality?
And if yes, what is (technically) the best way to go about this? I was checking php_curl, but I don't know if it can get a website recursively.
Furthermore, there's a slight problem: I have somewhat limited resources, only a few MB of quota on the public (university) server. However, if that's a big problem, I could use a separate hosting solution.
Thanks :)
Just use file_get_contents, or cURL if you absolutely have to (in case you need cookies).
You can use your hashing trick to check for modifications, but it's not very elegant. What you really want to know is when the page was last changed. I doubt this information is on the website itself, but maybe they offer an RSS feed or some web service or API you can use for this purpose.
Don't worry about doing recursive requests. Just make a new request each time.
"When all else fails, build a scraper"

How to track my visitors? [best performance]

I've been asked to create a custom 'tracker' in PHP, to know where users are coming from and where they are going on the site.
I'm thinking of writing a simple script, which connects to a database, writes the ip, browser, and time of the visit, then closes the db link.
Is this the right way to do it ?
I've found a few similar questions on stackoverflow, but none mentioned performance.
Is there a reason you can't use a solution such as Google Analytics? It's free and has some nice features such as heat maps, which show traffic flow.
The main disadvantage is that it requires you to embed some JavaScript on all the pages, which means that it's client side.
I suppose it's another question of the kind "I want superior performance, but I have no particular reason for it".
In fact, any solution will be fast enough, as writing logs is not a heavy operation.
The only thing to keep in mind is not to use any indexes if an SQL database is used.
That's all.
So, let's put aside that performance stuff.
The only complete solution would be analyzing web-server logs.
No other method will give you the complete picture. Say some image is hotlinked on other sites and causes heavy load because of that; you'd never notice it if you only log requests to PHP scripts.
So, you can run a crontab-based script every night that parses the access logs and gives you comprehensive information on all user and bot activity.
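As a rough idea of what that nightly job could do (the log path and the Apache "combined" format are assumptions; adjust the regex to whatever your server actually writes):

    <?php
    // Sketch only: tally yesterday's rotated Apache access log.
    $logFile = '/var/log/apache2/access.log.1';

    $hitsPerPath = [];
    $uniqueIps   = [];

    $fh = fopen($logFile, 'r');
    while (($line = fgets($fh)) !== false) {
        // e.g. 1.2.3.4 - - [10/Oct/2012:13:55:36 +0000] "GET /foo.png HTTP/1.1" 200 2326 ...
        if (!preg_match('/^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)/', $line, $m)) {
            continue;
        }
        $ip   = $m[1];
        $path = $m[2];
        if (!isset($hitsPerPath[$path])) {
            $hitsPerPath[$path] = 0;
        }
        $hitsPerPath[$path]++;
        $uniqueIps[$ip] = true;
    }
    fclose($fh);

    arsort($hitsPerPath);
    printf("%d unique IPs\n", count($uniqueIps));
    foreach (array_slice($hitsPerPath, 0, 20, true) as $path => $hits) {
        printf("%6d  %s\n", $hits, $path);      // top 20 requested paths, images included
    }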
Check Piwik or New Relic; if you need more customization, you should take a look at Webalyzer and Visitors.
N.B.: You can customize Piwik by creating plugins: http://geekmonkey.org/articles/34-how-to-write-a-piwik-plugin
Perhaps you need some special software like Webalyzer? (it's free and quite powerful)
Performance is easy to say but much harder to define. It depends on a zillion circumstances, and while I might say "this is the best performance I can get", you might say "hey, what's this?"
Personally I recommend Google Analytics. It does almost everything you need (and plenty of things you don't). Maybe you can get a small 'performance' boost by storing its source locally, but there's a good chance it's already cached in the user's browser anyway.
Or, if you prefer open source solutions, give Piwik a shot.
Piwik does just that, and it does it very well. There is also a Tracking API that you can use to track a lot of things about your visitors, using PHP or any other language (REST API). See more information on http://piwik.org/docs/tracking-api/
Also it is very modular & fast, don't reinvent the wheel :)

Light Blogging system sans database

This is a general programming question.
What is the best way to make a light blogging system that can handle images, BBCode-ish styling, and text without a database back end? By light I mean no more than 50 to 100 posts in extreme cases.
What language(s) should be used? Is there any preferred data format for the information? How does security play out?
EDIT: Client has no database, is on a shared server. Can't change that. Therefore, no DB.
EDIT2:
Someone mentioned SQL Compact: does that require anything more than copying files to the server? The key here, again, is that things shouldn't require any more permissions than FTP access.
If you're looking to do it yourself, store each post as a file in a directory. Then, to sort and limit the posts, you rely partly on the file names to order and limit them, and potentially (in the case of a search) on reading every last file. Don't go letting users make 10,000 posts, though. But yeah, the above is considered a flat-file data format. You can get fancy by using a standard format like JSON, YAML, or XML within each post file, and even fancier by requesting these with Ajax calls in mostly client-side code.
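A minimal sketch of that flat-file idea, assuming one JSON file per post in a posts/ directory, named so that an alphabetical sort is also a date sort (the field names are made up):

    <?php
    // e.g. posts/2012-08-30-hello-world.json => {"title": "...", "body_html": "..."}
    $files = glob(__DIR__ . '/posts/*.json');
    rsort($files);                                  // newest first, thanks to the naming

    foreach (array_slice($files, 0, 10) as $file) { // show the 10 most recent posts
        $post = json_decode(file_get_contents($file), true);
        echo '<h2>' . htmlspecialchars($post['title']) . '</h2>';
        echo '<div>' . $post['body_html'] . '</div>';
    }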
Now if the reason you want to work with flat files is that you just don't want to install a database server, there's nothing stopping you from reading a local (to the server) file as a Berkeley DB, a Lucene index, or an SQLite DB from within your webapp using the appropriate client library. You'll find any of these approaches a little more sane (a bit faster, a bit more readable in code) than the aforementioned, with all the same requirements for installing on the server (read-write file permissions). Many web frameworks or languages (like PHP) come with an API to these client libraries; SQLite and Lucy (C Lucene) particularly.
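For example, the SQLite route needs nothing beyond PHP's bundled pdo_sqlite extension and FTP access to upload one file. A quick sketch (file name and schema are just for illustration):

    <?php
    // Sketch of the SQLite route: a single file on disk, queried through PDO.
    $db = new PDO('sqlite:' . __DIR__ . '/blog.sqlite');
    $db->exec('CREATE TABLE IF NOT EXISTS posts (
                   id         INTEGER PRIMARY KEY AUTOINCREMENT,
                   title      TEXT,
                   body       TEXT,
                   created_at TEXT DEFAULT CURRENT_TIMESTAMP
               )');

    // Add a post.
    $stmt = $db->prepare('INSERT INTO posts (title, body) VALUES (:t, :b)');
    $stmt->execute([':t' => 'Hello world', ':b' => 'First post.']);

    // List the latest posts.
    foreach ($db->query('SELECT title, created_at FROM posts ORDER BY id DESC LIMIT 10') as $row) {
        echo $row['title'], ' (', $row['created_at'], ")\n";
    }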
If you're just looking for examples of it being done, I first came across blosxom (I think in 1999 or 2000), which is a Perl script that either runs as a CGI script per request or as a cron job. It builds a dated index of "posts" based on whatever you throw into the directory it's meant to scan. It also builds an RSS feed.
Jekyll or Blogofile are my favorite kind of solution for that, "compiling pages before upload".
I'm going to go out on a limb here and say that it's not always the destination, but the Journey.
If you're going to set out to do this, I recommend using a language you are comfortable with. Personally, this would be C#/.NET for me, but from your tagging, I'll assume PHP would be the server-side scripting language you would choose.
I would lay out how I wanted my application to behave. If there is going to be a lot of data, you should consider (as dlamblin mentioned) a DB of some sort for lookup and retrieval. (Light blog, not so much data... 1,000 users can edit? Maybe you should consider a DB.) Once you've decided how to store the data, decide how to present it.
Write some proof of concept code for each of the features you want to implement (blog templating, bbcode, user authentication, text searching...) and start to work them all together.
Search for flat-file CMSes on Google, for example:
http://www.flatcms.org/
This has already been done, so there is no need to create such a CMS again. There are plenty of them.
I concur with dusoft that this has already been done.
DotNetBlogEngine.net is an ASP.NET (C#) based blogging system that has a nice XML back-end as an option.
Doesn't answer your question directly but check Unify.
If you do not want to write a new one or want to get some inspiration:
Flatpress
Simple PHP Blog
Ninja Designs are working on a DB-free WordPress clone
You could either use XML, or use SQL Compact (which lets you handle things much like SQL Server, but instead of a database server you work with flat files).
