Creating an analytics chart with PHP

I'm developing a web application with PHP and MySQL. My site hosts videos, and users can upload their own.
Now I want to increase the interactivity by creating a simple analytics chart for video views.
The chart will show the views for every month. The problem is I don't know where to begin.
Can anybody tell me the logic for storing video views in the database for every month and then displaying them as a bar chart? (Like Google Analytics shows the number of visitors every month or every week.)

Google Charts will be perfect for what you're looking to do. I use it myself. Just output the stats with PHP's json_encode.
Then use a cron job to fetch the current number of views (or other stats) and store them in a row with a TEXT column.
https://developers.google.com/chart/
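For example, a minimal sketch of the PHP side, assuming a hypothetical monthly_views table with month and views columns; the JavaScript side would pass the decoded array straight to google.visualization.arrayToDataTable and draw a ColumnChart with it:

<?php
// get_views.php - output the monthly stats as JSON for the chart (all names are assumptions)
$pdo  = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');
$rows = $pdo->query('SELECT month, views FROM monthly_views ORDER BY month')
            ->fetchAll(PDO::FETCH_NUM);
array_unshift($rows, ['Month', 'Views']);   // header row that Google Charts expects
header('Content-Type: application/json');
echo json_encode($rows);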

When to count a view?
You'll have to define when you count a view. Probably best would be to analyze the time gaps in which a certain user visited a video, how long it played, and probably some identifiers such as the User-Agent, to be able to differentiate between several users behind one IP address (if the user isn't logged in).
Inserting the view
Once you've decided whether to count a view, you'll have to insert it into the database. This depends on what database and table structure you use. Have a look at a PHP MySQL tutorial, or if you think you can do more, have a look at PDO, which can be more secure and useful.
In order to analyze monthly statistics, you'll have to insert each view with a timestamp. At the end, you'll just have to select the values for the specified date range (in your case, the last month).
This requires that you at least know how to set up the database tables you insert your data into.
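A minimal sketch of such a table and the insert, with all names (video_views, viewed_at) assumed rather than prescribed:

CREATE TABLE video_views (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    video_id  INT UNSIGNED NOT NULL,
    viewed_at DATETIME     NOT NULL,
    KEY idx_video_date (video_id, viewed_at)
);

Then the PHP side, once you've decided the request really counts as a view:

$stmt = $pdo->prepare('INSERT INTO video_views (video_id, viewed_at) VALUES (?, NOW())');
$stmt->execute([$videoId]);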
Showing the views
Since you'll have an entry for every view in your database, your best shot is to create an automatic job which updates another table with the total views. This reduces the load, and also prevents your users from instantly seeing the view counts (which can be helpful if someone is trying to bump a view count: they can't immediately see whether they were successful).
Creating graphs
First you'll have to select the values. Let's say your user is able to specify the date range for which he wants to see the views; you'll have to take this range, query your database with it, and then process the result. How you select it depends on how you want to display it. If you want to show every single day of the month, you'll have to group the views by day.
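For example, a per-day grouping over the same assumed video_views table might look like:

SELECT DATE(viewed_at) AS day, COUNT(*) AS views
FROM video_views
WHERE video_id = ? AND viewed_at BETWEEN ? AND ?
GROUP BY DATE(viewed_at)
ORDER BY day;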
How you actually display the data is up to you. It could be something really simple with tables, or you could use something like JpGraph to create graphs as images. I personally use JpGraph for stuff like that. It takes some getting into, but it's worth the effort.
I hope this gives you some idea of how to solve your problem. If you have more specific problems, post some code and what you're trying to do, because there is no single ready-made solution.

Have a table with columns video_id, count, month and year.
The combination of video_id, month and year should be a unique key.
Whenever a video is viewed (in programming terms, requested), try to insert the row video_id, 1, month, year.
Check whether you get a unique-key violation. If you do, then instead of inserting you update the count with count + 1.
To fetch the month and year you can use MySQL date functions.
video_id would be sent from the front end and count can be easily incremented.
Once you have this table you can always get the count for a specific video for that month and year combined.
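In MySQL that insert-or-increment pattern can be done in a single statement with INSERT ... ON DUPLICATE KEY UPDATE; a sketch, with the table name assumed:

CREATE TABLE monthly_video_views (
    video_id INT UNSIGNED      NOT NULL,
    year     SMALLINT UNSIGNED NOT NULL,
    month    TINYINT UNSIGNED  NOT NULL,
    `count`  INT UNSIGNED      NOT NULL DEFAULT 0,
    UNIQUE KEY uk_video_month (video_id, year, month)
);

INSERT INTO monthly_video_views (video_id, year, month, `count`)
VALUES (?, YEAR(NOW()), MONTH(NOW()), 1)
ON DUPLICATE KEY UPDATE `count` = `count` + 1;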

Related

What's the best way to paginate unsorted and filtered database data via API?

I have been stuck on this problem for a while: I have to fetch data for a home page, doing multiple queries to filter data by popularity, by most viewed, etc., and merging them all into a single query using UNION. The query then orders the records automatically by importance; for example, featured records go first, then the most popular, then the most viewed, and so on. The data changes from time to time: if there is a new record, or some record becomes more popular than another, they might swap order. However, when I fetch more data via pagination or "load more", and at the same time some record has swapped places with another one in the background, that record would be shown again on the next page, which makes it redundant since it already appeared on the first.
I checked out some Twitter API approaches with since_id and max_id, but in my case they don't help, since I don't sort by ID or any other stable order, and this is where the complexity arises.
So how exactly am I supposed to deal with redundant data in this case? Has anyone ever had similar experiences?
Thanks in advance!
When you "load more", you can send the set of ID that is already displayed and consequently exclude them with an additionnal condition in your sql queries " and id not in ([excluded set here]) ".
However, with a pagination system it gets too complex, since you would have to pass the set of all pages visited and you don't control the order of visits; it would turn into a complete mess.
So with pagination I would recommend you simply let your ranking be, and possibly cache it for X minutes. That way all users see the same ranking for a few minutes, and pages never show duplicate content. To improve the user experience you can also add visual feedback when the ranking is updated, which provides a sense of interactivity.

Processing and matching large amounts of data

I have one large database table of request data, much like Apache request logs, of about 50 million rows:
request_url, user_agent, created
that contains data like this:
/profile/Billy    Mozilla.....    2012-06-17...
/profile/Jane     Mozilla.....    2012-06-17...
I then have my user database table, with all my user data including usernames.
At the moment, every night, I process the request data for the previous day, row by row, and see if it contains a URL that matches one of the usernames in the users table. If it does, I increment a total in another table that stores stats and lets users see how many pageviews they got for any particular day.
However as the datasets grow, this is becoming resource intensive and can also take a long time to complete, even when grouping the request data by URL and grabbing a count for that group.
Is there a better way of processing this information to get the end result I need? The request data is going to be logged anyway, so it would be preferable to generate the stats after the fact rather than incrementing the total on every page view.
I'm running this on one server, so distributed processing of the data on multiple servers isn't required.
Start with a fresh log-table every day. When the day is done, use it to increment the totals, then append it to that huge main log-table and delete it.
Incrementing the total on every page view is your best option. It saves the trouble of searching later for each user separately. It's just one extra UPDATE query on every page view, so the processing load is spread throughout the day instead of hitting all at once (plus your stats stay up to date all the time, instead of being updated daily).
If you are insistent on doing it in SQL, you might consider
SELECT COUNT(request_url) FROM your_table WHERE request_url LIKE '%/profile/username%'
(though I am not sure if that's what you're already doing?)
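If the profile URLs follow a fixed pattern, the row-by-row loop can be replaced by one set-based query; a sketch, assuming tables named requests and users and a daily_views totals table with a unique key on (user_id, day):

INSERT INTO daily_views (user_id, day, views)
SELECT u.id, DATE(r.created), COUNT(*)
FROM requests r
JOIN users u ON r.request_url = CONCAT('/profile/', u.username)
WHERE r.created >= CURDATE() - INTERVAL 1 DAY
  AND r.created <  CURDATE()
GROUP BY u.id, DATE(r.created)
ON DUPLICATE KEY UPDATE views = views + VALUES(views);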
Start looking into an analytic database like Infobright. Column-based storage engines are huge in big-data initiatives and are built for doing in-memory analytics on aggregates as well as ad hoc querying.
Disclaimer: the author is affiliated with Infobright.

Personalized Search Results based on History

What are some of the techniques for providing personalized search results to a logged in user? One way I can think of will be by analyzing the user's browsing history.
Tracking: A log of a user's activities like pages viewed and 'like' buttons clicked can be use to bias search results.
Question 1: How do you track a user's browsing history? A table with columns user_id, number_of_hits, page_id? If I have 1,000 daily visitors, each browsing 10 pages on average, won't there be a large number of records to select each time a personalized recommendation is required? The table will grow by 300K rows a month! It will take longer and longer to select the rows each time a search is made. I guess the table for recording 'likes' would use the same design.
Question 2: How do you bias the results of a search? For example, if a user has been searching for Apple products, how does the search engine realise that the user likes Apple products and subsequently bias the search towards them? Tag the pages and accumulate a record of tags on the pages visited?
You probably don't want to use a relational database for this type of thing; take a look at MongoDB or Cassandra. That's because you basically want to keep adding new columns to the user's history, so a column-oriented database makes more sense.
300k rows per month is not really that much; in fact, that's almost nothing. It doesn't matter whether you use a relational or non-relational database for this.
A straightforward approach is the following:
Put entries into the table/collection like this:
timestamp, user, action, misc information
(Make sure you record as much information as possible, so that you never need to join this data-warehousing table with any other table.)
Partition by timestamp (one partition per month).
Never query this table directly from the application; instead have, say, daily report jobs run over all the data, compute the necessary statistics, and write them to a summary table (a sketch follows below).
Reflect on your report queries and add appropriate partition-local indexes.
Only query the summary table from your web frontend.
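A minimal sketch of that layout in MySQL; every table, column, and partition name here is an assumption:

CREATE TABLE user_activity (
    ts      DATETIME     NOT NULL,
    user_id INT UNSIGNED NOT NULL,
    action  VARCHAR(32)  NOT NULL,
    misc    VARCHAR(255) NULL
)
PARTITION BY RANGE COLUMNS (ts) (
    PARTITION p2024_01 VALUES LESS THAN ('2024-02-01'),
    PARTITION p2024_02 VALUES LESS THAN ('2024-03-01')
);

-- Daily report job: roll yesterday's raw activity up into an assumed summary table
-- user_activity_daily (user_id, day, hits) with a unique key on (user_id, day).
INSERT INTO user_activity_daily (user_id, day, hits)
SELECT user_id, DATE(ts), COUNT(*)
FROM user_activity
WHERE ts >= CURDATE() - INTERVAL 1 DAY AND ts < CURDATE()
GROUP BY user_id, DATE(ts);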
If you stored only the last X results as opposed to everything, it would probably be doable. It might slow things down, but it'd work. Any time you're writing more data and reading more data, there's going to be an impact. Proper DBA methods such as indexing and query optimization can help, but no matter what you use there will be an effect.
I'd personally look at storing just a default view for the user in a DB and use the session to keep track of the rest. Sure, when you login there'd be no history. But you could take advantage of that to highlight a set of special pages that you think are important or relevant to steer the user to. A highlight system of sorts. Faster, easier, and more user-friendly.
As for bias, you could write a set of keywords for each record and array sort them accordingly. Wouldn't be terribly difficult using PHP.
I use MySQL and over 2M records (page views) a month and we run reports on that table daily and often.
The table is partitioned by month (like already suggested) and indexed where needed.
I also clear the table of data that is over 6 months old by moving it into a new table called "page_view_YYMM" (YY = year, MM = month) and using some UNIONs when necessary.
For the second question, the way I would approach it is by creating a table with the list of your products that is simply:
url, description
The description will be the tag-stripped content of your page or item (depending on how you want to influence the search). Then add a full-text index on description and search on that table, adding any extra terms you have been collecting while the user was surfing your site that you think are relevant (for example a category name or brand).
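A hedged sketch of that setup; product_search and its columns are assumed names:

CREATE TABLE product_search (
    url         VARCHAR(255) NOT NULL PRIMARY KEY,
    description TEXT         NOT NULL,
    FULLTEXT KEY ft_description (description)
);

-- Search the descriptions, appending extra terms collected from the user's browsing
-- (e.g. a category name or brand) to bias the results.
SELECT url
FROM product_search
WHERE MATCH(description) AGAINST('user query category brand' IN NATURAL LANGUAGE MODE);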

Display latest search results from MySQL with PHP

Google unfortunately didn't seem to have the answers I wanted. I currently run a small search-engine website for specific content, using PHP's GET parameters.
I want to add a latest searches page, meaning to have each search recorded, saved, and then displayed on another page, with the "most searched" at the top, or even the "latest search" at the top.
In short: Store my latest searches in a MySQL database (or anything that'll work), and display them on a page afterwards.
I'm guessing this would best be accomplished with MySQL, and then I'd like to output it in to PHP.
Any help is greatly appreciated.
Recent searches could be abused easily. All someone has to do is go onto your site and search for "your site sucks" or worse, and they've essentially defaced your site. I'd think carefully before adding that feature.
In terms of building the most popular searches and scaling it nicely I'd recommend:
Log queries somewhere. Could be a MySQL db table but a logfile would be more sensible as it's a log.
Run a script/job periodically to extract/group data from the log
Have that periodic script job populate some table with the most popular searches
I like this approach because:
A backend script does all of the hard work - there's no GROUP BY, etc. triggered by user requests
You can introduce filtering or any other logic to the backend script and it doesn't affect user requests
You don't ever need to put big volumes of data into the database
Create a database, create a table (for example recent_searches) with fields such as query (the search query) and timestamp (the Unix timestamp at which the query was made). Then for your script your MySQL query will be something like:
SELECT * FROM `recent_searches` ORDER BY `timestamp` DESC LIMIT 0, 5
This should return the 5 most recent searches, with the most recent one appearing first.
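A minimal sketch of both sides with PDO, assuming that recent_searches table exists:

// Record a search (e.g. right after running it).
$stmt = $pdo->prepare('INSERT INTO `recent_searches` (`query`, `timestamp`) VALUES (?, ?)');
$stmt->execute([$query, time()]);

// Later, display the five most recent searches.
$recent = $pdo->query('SELECT `query` FROM `recent_searches` ORDER BY `timestamp` DESC LIMIT 5')
              ->fetchAll(PDO::FETCH_COLUMN);
foreach ($recent as $q) {
    echo htmlspecialchars($q) . '<br>';
}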
Create a table (named something like latest_searches) with fields query, searched_count, results_count.
Then after each search (if results_count > 0), check whether this search query already exists in that table, and either update the existing row or insert a new one.
And on some page you can just use data from this table.
It's pretty simple.
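One way to do that check-and-update in a single statement (a sketch, assuming a unique key on the query column):

INSERT INTO latest_searches (`query`, searched_count, results_count)
VALUES (?, 1, ?)
ON DUPLICATE KEY UPDATE
    searched_count = searched_count + 1,
    results_count  = VALUES(results_count);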
Ok, your question is not entirely clear, but I'm guessing you mean you want to READ the latest results first.
To achieve this, follow these steps:
When storing the results, use an extra field to hold a DATETIME. So your insert query will look like this:
INSERT INTO Table (SearchItem, `When`) VALUES ('$strSearchItem', NOW())
(Note that When is a reserved word in MySQL, so it needs backticks; better yet, pass the value through a prepared statement.)
When retrieving, make sure you include an order by like this:
SELECT * FROM Table ORDER BY `When` DESC
I hope this is what you meant to do :)
You simply store the link and the name of the link/search in MySQL, and add a timestamp to record when somebody searched for them. Then you pull them out of the DB ordered by the timestamp and display them on the website with PHP.
Create a table with three columns: search, link, timestamp.
Then write a PHP script to insert rows when needed (this is done when the user actually searches)
Your main page where you want stuff to be displayed simply gets the data back out and puts them into a link container $nameOfWebsite
It's probably best to use a for/while loop to do step 3
You could additionally add something like a counter to know which searches are the most popular; this would be another field in MySQL that you keep updating (increasing it by one, but limited per IP).

Twitter competition ~ saving tweets (PHP & MySQL)

I am creating an application to help our team manage a twitter competition. So far I've managed to interact with the API fine, and return a set of tweets that I need.
I'm struggling to decide on the best way to handle the storage of the tweets in the database, how often to check for them and how to ensure there are no overlaps or gaps.
You can get a maximum of 100 tweets per page. At the moment, my idea is to run a cron script, say, once every 5 minutes, grab a full 100 tweets at a time, and loop through them, checking the db to see whether each one is already there before adding it.
This has the obvious drawback of running 100 queries against the db every 5 minutes, plus however many INSERTs there are on top. Which I really don't like. Plus I would much rather have something a little more real-time. As Twitter is a live service, it stands to reason that we should update our list of entrants as soon as they enter.
This again throws up a drawback of having to repeatedly poll Twitter, which, although might be necessary, I'm not sure I want to hammer their API like that.
Does anyone have any ideas for an elegant solution? I need to ensure that I capture all the tweets, don't leave anyone out, and keep each db user unique. I have considered just adding everything and then grouping the resulting table by username, but it's not tidy.
I'm happy to deal with the display side of things separately as that's just a pull from mysql and display. But the backend design is giving me a headache as I can't see an efficient way to keep it ticking over without hammering either the api or the db.
100 queries in 5 minutes is nothing. Especially since a tweet has essentially only a few pieces of data associated with it: user ID, timestamp, tweet text, tweet ID - say, about 170 characters' worth of data per tweet. Unless you're running your database on a 4.77MHz 8088, your database won't even blink at that kind of "load".
The Twitter API offers a streaming API that is probably what you want to do to ensure you capture everything:
http://dev.twitter.com/pages/streaming_api_methods
If I understand what you're looking for, you'll probably want a statuses/filter, using the track parameter with whatever distinguishing characteristics (hashtags, words, phrases, locations, users) you're looking for.
Many Twitter API libraries have this built in, but basically you keep an HTTP connection open and Twitter continuously sends you tweets as they happen. See the streaming API overview for details on this. If your library doesn't do it for you, you'll have to check for dropped connections and reconnect, check the error codes, etc - it's all in the overview. But adding them as they come in will allow you to completely eliminate duplicates in the first place (unless you only allow one entry per user - but that's client-side restrictions you'll deal with later).
As far as not hammering your DB, once you have Twitter just sending you stuff, you're in control on your end - you could easily have your client cache up the tweets as they come in, and then write them to the db at given time or count intervals - write whatever it has gathered every 5 minutes, or write once it has 100 tweets, or both (obviously these numbers are just placeholders). This is when you could check for existing usernames if you need to - writing a cached-up list would allow you the best chance to make things efficient however you want to.
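A rough sketch of that client-side buffering, purely illustrative: the tweet array shape, the tweets table, and the flush threshold are all assumptions, not part of any particular streaming library.

$buffer = [];

// Called by your streaming client for each incoming tweet.
function collectTweet(array $tweet, PDO $pdo, array &$buffer): void
{
    $buffer[] = [$tweet['id'], $tweet['screen_name'], $tweet['created_at'], $tweet['text']];
    if (count($buffer) >= 100) {   // placeholder threshold; flush on a timer as well if you like
        flushTweets($pdo, $buffer);
    }
}

// Write the cached tweets in one multi-row statement; INSERT IGNORE skips duplicate tweet IDs.
function flushTweets(PDO $pdo, array &$buffer): void
{
    if (!$buffer) {
        return;
    }
    $rows = implode(',', array_fill(0, count($buffer), '(?, ?, ?, ?)'));
    $stmt = $pdo->prepare("INSERT IGNORE INTO tweets (id, screen_name, created_at, tweet_text) VALUES $rows");
    $stmt->execute(array_merge(...$buffer));
    $buffer = [];
}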
Update:
My solution above is probably the best way to do it if you want to get live results (which it seems like you do). But as is mentioned in another answer, it may well be possible to just use the Search API to gather entries after the contest is over, and not worry about storing them at all - you can specify pages when you ask for results (as outlined in the Search API link), but there are limits as to how many results you can fetch overall, which may cause you to miss some entries. What solution works best for your application is up to you.
I read over your question and it seems to me that you want to duplicate data already stored by Twitter. Without more specifics on the competition you're running (how users enter, for example, or the estimated number of entries), it's impossible to know whether storing this information locally in a database is the best way to approach the problem.
Might a better solution be to skip storing duplicate data locally and pull the entrants directly from Twitter, i.e. when you're attempting to find a winner?
You could then eliminate duplicate entries on the fly while the code is running. You would just need to call "the next page" once it has finished processing the 100 entries it has already fetched. Although I'm not sure if this is possible directly through the Twitter API.
I think running a cron every X minutes and basing it off the tweets' creation date may work. You can query your database to find the date/time of the last recorded tweet, then only select tweets created after that time to prevent duplicates. Then, when you do your inserts into the database, use one or two INSERT statements containing all the entries you want to record, to keep performance up.
INSERT INTO `tweets` (id, date, ...) VALUES (..., ..., ...), (..., ..., ...), ...;
This doesn't seem too intensive; it also depends on the number of tweets you expect to record, though. Also make sure to index the table properly.
