I'm not sure where the best place to ask this question is, so if this is the wrong place, maybe someone can help move it to a better group?
It has elements of programming, user experience and databases, but doesn't really fit well into any one category!
I need to take data and display it in a graph on my site. This data is available from an API.
But I can't decide whether it is best to get this data from the API "live" when needed, or to save the data from the API to a local database on my own server.
Both methods have pros and cons.
Getting the data live means more URL requests and more latency, and if the site is used by many users I may run into the API's access limits. I also have to assume the API site will always be available if I use the data live. The API data is also restricted to the past 2,000 historical data points.
If I use a cron job to request the data, say once an hour, and save it to my own database, then I am only calling the API once every hour. Accessing my own database should be faster than making a GET request to the API when drawing my page. And if my site is up, then my database will be up, so I don't need to worry about the API site's uptime. And I can store as many historical data points as I want to, if I am storing the data myself.
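For illustration, the hourly job I have in mind would be something like this (the API URL, the response fields, and the table/column names here are just placeholders, not the real ones):

# Hypothetical hourly cron job: pull the latest points from the API and append
# them to a local SQLite table. The URL, field names and schema are made up.
import sqlite3
import requests

API_URL = "https://api.example.com/datapoints"  # placeholder endpoint

def fetch_and_store():
    points = requests.get(API_URL, timeout=30).json()
    conn = sqlite3.connect("datapoints.db")
    conn.execute("CREATE TABLE IF NOT EXISTS datapoints (ts TEXT PRIMARY KEY, value REAL)")
    # INSERT OR IGNORE means re-fetching overlapping points doesn't create duplicates
    conn.executemany(
        "INSERT OR IGNORE INTO datapoints (ts, value) VALUES (?, ?)",
        [(p["timestamp"], p["value"]) for p in points],
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    fetch_and_store()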
But it seems wasteful to simply duplicate data that is already existing elsewhere.
There could be millions of data points. Is it really sensible to store perhaps 50 million data points on my own server when the data already exists behind an API?
From the user's perspective, there shouldn't be any difference as to which method I choose - other than perhaps if my site is up and the API site is down, in which case there would be missing data on my site.
I am torn between these two options and don't know how best to proceed with this.
I am working on an Android project designed for doctors. Doctors are required to authenticate when they open the app for the first time.
This authentication process is done over an HTTPS connection, using PHP code on the server side that returns JSON to the app, letting it know whether the log-in has been successful and, if so, returning that doctor's list of patients. Let me show a piece of JSON that would be returned in case of a successful log-in:
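Something along these lines (the exact field names here are only illustrative; what matters is the success flag and the nested "listOfPatients"):

{
    "loginSuccess": true,
    "listOfPatients": [
        { "Name": "Patient A", "Age": 45, "Phone": "555-0100", "Smoker": "No" },
        { "Name": "Patient B", "Age": 62, "Phone": "555-0101", "Smoker": "Yes" },
        { "Name": "Patient C", "Age": 38, "Phone": "555-0102", "Smoker": "No" }
    ]
}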
Obviously, if the log-in were unsuccessful, the "listOfPatients" attribute would carry no data. After the server generates this JSON, the app simply reads through it using a JSON parser.
Now imagine the doctor doesn't have just 3 patients, but 100 patients. And each patient doesn't have just 3 attributes ("Age", "Phone", "Smoker") but dozens of them appropriately nested where required. We would then have a somewhat large (but maybe not too complex) JSON code to read through.
In this project I am designing the client code (i.e. the Android app), whereas the server code is written by another developer. He is asking me how I'd like the server code to be written in order to facilitate the "Android client - server" interaction and achieve the best, smoothest user experience possible.
I answered (this is the really short version of my answer; don't worry about server-side code security, since that is not the goal of this question):
Create a login.php that accepts POST requests. My app would send "user" and "password" and the server would compare them against the database.
The server would then generate appropriate JSON code depending on the success of the doctor's log-in request.
The Android app would simply parse this JSON and display it to the user in form of list-views, and so on (the way I display this data to the doctor does not matter here in this question).
I was wondering two things:
Knowing that the JSON will contain hundreds of attributes, how efficient is this code? Is there a better way to achieve this functionality? How would you do it?
The vast majority of these attributes' values will change on a daily basis (for example, "bodyTemperature" or "bloodPressure"). Furthermore, there will be "importantNotifications", where patients notify their doctors of an emergency situation. I don't think it would be efficient to go through the entire process ("server creates JSON ==> client reads JSON ==> client displays JSON") over and over again, minute after minute, hour after hour, day after day. There must be a better way to do it (maybe local storage? I would then have to decide which attributes to read only once a year ("age"), once a month ("phone"), once a day ("bodyTemperature") or every 30 minutes ("importantNotifications"). How could I then discriminate which values I need to re-read from the JSON in each session?)
You will likely be using GSON to parse the response from the server. You might also define default values and tell the server not to return anything that is equal to a default value, like smoker defaulting to NO (to minimize the amount of data transferred). You are highly likely to display the patients in a ListView or RecyclerView. Google how to implement lazy loading, meaning you tell the server to return just a few results rather than all of them, and when the user scrolls to the end you ask the server for more if there are any.
Also, using caches on Android is a great way to save a couple of unnecessary requests to the server. You define how long a cache is valid, say 5 minutes, and when you want to repopulate a list, you check whether it's still valid. But you should always leave a manual refresh option; SwipeToRefresh is a great and simple way to do just that.
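Stripped of the Android specifics, the cache idea is just a timestamp kept next to the data. A rough sketch in Python, purely to show the logic (the 5-minute window and the function names are arbitrary; on Android you would persist the same thing with SharedPreferences or a local database):

import time

CACHE_VALIDITY_SECONDS = 5 * 60   # "say 5 mins"; pick whatever suits the data

cached_patients = None
cached_at = 0.0

def get_patients(fetch_from_server, force_refresh=False):
    # Reuse the cached list while it is still valid; force_refresh covers the
    # manual SwipeToRefresh case where the user explicitly asks for fresh data.
    global cached_patients, cached_at
    expired = (time.time() - cached_at) > CACHE_VALIDITY_SECONDS
    if force_refresh or cached_patients is None or expired:
        cached_patients = fetch_from_server()
        cached_at = time.time()
    return cached_patients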
Hope somebody else can add something more, as I am interested in this too.
I think it is inefficient to make a request on each page visit to collect the information. I would like to know what the best practices are for working with data that is received from an API.
Should the data be stored in my own database or should I just make a call to the API every time?
As an example, say I use Strava to get my running results. Should I just use the results directly from the API, or should I store them in a database and then check after a certain period whether there are new or updated results, updating my database accordingly?
I have read some articles about RESTful APIs, but they only give information about how to get, delete, and update the data, etc.
It depends on what you are using it for...
If the information is unlikely to change and is likely to be accessed again on your site, then store it.
Otherwise just get it from the API; it will always be up to date, and it's less code to write if you don't have to store it.
If you are storing it and you know what information you are likely to require, you could retrieve it in the background at set intervals.
You can also look at other factors when making your judgement, such as the speed of the API or whether you have an API call limit.
You are probably not finding a definitive answer because there isn't one, but I hope this clarifies things for you =]
I don't know Strava or whatever API you will use.
In my experience, applications generally need to keep track of what has been exchanged with an external system, for example to give proof that the data has been read or written.
For this, something is usually kept in the local database, and it usually does not have to be a record of the complete data exchange.
I have an HTML-based application that allows users to store and search information in a MySQL database. They run this on their own servers, so it isn't centralized. I'd like to add a function that allows them to see whether their information corresponds to any known info in a central database, and if it doesn't, they would have the option to add it to the central DB. I'm not sure whether the triggering script would be best placed on the client or the server side, so I'm at a loss as to where to start with this. Any script or config suggestions would be welcome.
Edit to add:
The data is preformatted, not created by the user. It consists of 7-10 fields of data that will likely be consistent with that seen by other users. The purpose is to build a troubleshooting database for users to reference or add to. The central server will be based on Q2A to allow upvotes, comments, etc.
This seems like the opposite of what Freebase does. In Freebase, users can connect to the Freebase API and check whether something that does not already exist in their own database exists in Freebase. It is then up to them to cache the entry for faster retrieval in the future. Alternatively, at least in the past, the Freebase community enabled writing to the Freebase database using the MQL API.
What you are suggesting strikes me as being very involved. If you have content creators you really trust, you can maybe get away with not having any review process, but otherwise you will need some peer review and perhaps some programming. Unless you have no content standards and it is anything goes, your database could quickly become overloaded with nonsense or things you don't want your website to be associated with (whatever those things might be).
Without knowing more about your database, I can't really say what those things would be, but what I will say is that if you are looking to have people throw stuff into a centralized location, you may want to (a) use something like OAuth, and (b) set up some checks and balances, because while one of your clients may think it's very important to have "the 100 reasons why liberals/conservatives suck", another one of your clients may take offense. Guess who they will blame?
That being said, creating a RESTful API (don't know if you already have one or not) with a flag for insert_if_not_exists could work.
i.e., api.php?{json_string} would be picked up by a function (or functions) which determines what the user wants to do from the json_string.
On the backend, your PHP function could parse it into an array very easily, and if the insert_if_not_exists flag is set, you can create the post while you pull the data. Otherwise you just pull the data (or leave that part out if you only want to give them the option to post, and not to pull, in this fashion).
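The same idea sketched in Python rather than PHP, just to show the shape of it (the route, the table, and payload fields such as "fingerprint" and "insert_if_not_exists" are assumptions, not an existing API):

# Rough sketch of an endpoint with an insert_if_not_exists flag (a Python/Flask
# stand-in for the PHP described above; names and schema are illustrative).
import json
import sqlite3
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/api", methods=["POST"])
def api():
    payload = request.get_json()              # the parsed equivalent of json_string
    conn = sqlite3.connect("central.db")
    conn.execute("CREATE TABLE IF NOT EXISTS entries (fingerprint TEXT PRIMARY KEY, data TEXT)")
    row = conn.execute(
        "SELECT data FROM entries WHERE fingerprint = ?", (payload["fingerprint"],)
    ).fetchone()
    if row is None and payload.get("insert_if_not_exists"):
        # Not in the central DB yet, and the client asked us to add it
        conn.execute(
            "INSERT INTO entries (fingerprint, data) VALUES (?, ?)",
            (payload["fingerprint"], json.dumps(payload["data"])),
        )
        conn.commit()
        row = (json.dumps(payload["data"]),)
    conn.close()
    return jsonify({"exists": row is not None, "data": json.loads(row[0]) if row else None})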
I am writing an API for an application that will be hosted in the cloud, so that users can access it through their unique application IDs. For the time being it is working fine and giving users the desired results. The question I am now stuck on is how to handle multiple requests at a time. I need some suggestions on how I can handle multiple requests to the API. Is there a way I can optimize my code to return results to the user faster? Should I cache users' common requests so that I can serve the output directly from cached data? Or should I save the most recently requested data in the database and use indexing to give fast output to the user?
Please give suggestions so that I can write a good, fast application that will hold up in the long run.
Profile your code using xdebug or xhprof.
Identify bottlenecks using real-life evidence, then eliminate or minimize the bottlenecks.
Don't blindly begin caching data under the assumption that it is a performance problem.
Take my profile for example, or the number of views of any question on this site: what is the process of logging the number of visits per page or object on a website? I presume it includes:
Registered users: count each user once (the DB must record which pages/objects the user has visited); this will also not cover unregistered users.
IP: log the visit of each IP per page/object; this could be troublesome, as you might have two different people behind the same IP, or you might really want to track repeat visitors.
Cookie: this will probably mean that people with multiple computers are counted more than once.
other method goes here ....
The question is, what is the process and best practice to count user requests?
EDIT
I've added the computer languages to the list of tags as they are of interest to me. Feel free to include any libraries, modules, and/or extensions that achieve the task.
The question could be rephrased as:
How does someone go about measuring the number of impressions when a user visits a page? The question is not intended to be about what Google Analytics does; rather, it should be about something similar to when you click on a Stack Overflow question or profile and see the number of views.
The "correct" answer varies according to the situation; primarily the most desired statistic and the availability of resources to gather and process them:
eg:
Server Side
Raw web server logs
All web servers have some facility to log requests. The trouble with these logs is that they require a lot of processing to get meaningful data out and, for your example scenario, they won't record application-specific details, like whether or not the request was associated with a registered user.
This option won't work for what you're interested in.
File based application logs
The application programmer can add custom code to the application to record the stuff you're most interested in to a log file. This is similar to the web server log, except that it can be application-aware and record things like the member making the request.
The programmers may also need to build scripts which extract the stuff you're most interested in from these logs. This option might be suited to a high-traffic site with lots of disk space and sysadmins who know how to ensure the logs get rotated and pruned from the production servers before bad things happen.
Database based application logs
The application programmer can write custom code for the application which records every request in a database. This makes it relatively easy to run reports and makes the data instantly accessible. This solution incurs more system overhead at the time of each request, so it is better suited to lower-traffic sites, or scenarios where the data is highly valued.
Client Side
Javascript postback
This is a consideration on top of the above options. Google Analytics does this.
Each page includes some JavaScript code which tells the client to report back to the web server that the page was viewed. The data might be recorded in a database or written to a file.
It has the strong advantage of improving accuracy in scenarios where impressions get lost due to heavy caching/proxying between the client and server.
Cookies
Every time a request is received from someone who doesn't present a cookie, you assume they're new, record that hit as 'anonymous', and return a uniquely identifying cookie after they log in. How accurate this proves depends on your application. Some applications don't lend themselves to caching, so it will be quite accurate; others (high traffic) encourage caching, which will reduce the accuracy. Obviously it's not much use until they re-authenticate whenever they switch browsers or locations.
What's most interesting to you?
Then there's the question of what statistics are important to you. For example, in some situations you're keen to know:
how many times a page was viewed, period,
how many times a page was viewed, by a known user
how many of your known users have viewed a specific page
Thence you typically want to break it down into periods of time to see trending.
Respectively:
are we getting more views from random people?
or are we getting more views from registered users?
or has pretty much everyone who is going to see the page now seen it?
So back to your question: best practice for the "number of impressions when a user visits a page"?
It depends on your application.
My guess is that you're best off with a database-backed application which records what is most interesting to your application and uses cookies to trace members' sessions.
The best practice for a hit counter depends on how much traffic you expect your site to receive. As wybiral suggested, you can implement something that writes to a database after every request. This might include the IP address if you want to count unique visitors, or it could be as simple as just incrementing a running total for each page or for each (page, user) pair.
But that requires a database write for every request, even if you just want to serve a static page. Ideally speaking, a scalable web app should serve as much as possible from an in-memory cache. Database or disk I/O should be avoided as much as possible.
So the ideal setup would be to build up some representation of the server's activity in memory and then occasionally (say, every 15 minutes) write those events to the database. You could conceivably queue up thousands of requests and then store them with a single database write.
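A bare-bones sketch of that pattern (the table, column names, and flush interval are made up; a production setup would hand this to a proper queue, as in the tutorial below):

# Sketch: count hits in memory and flush them to the database on a schedule
# instead of writing a row per request.
import sqlite3
import threading
from collections import Counter

hit_counts = Counter()
lock = threading.Lock()

def record_hit(page, user_id="anonymous"):
    with lock:
        hit_counts[(page, user_id)] += 1

def flush_to_db():
    # Call this from a scheduler (cron, Celery beat, a repeating timer, ...)
    with lock:
        pending = dict(hit_counts)
        hit_counts.clear()
    conn = sqlite3.connect("hits.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_hits "
        "(page TEXT, user_id TEXT, hits INTEGER, PRIMARY KEY (page, user_id))"
    )
    for (page, user_id), count in pending.items():
        conn.execute(
            "INSERT INTO page_hits (page, user_id, hits) VALUES (?, ?, ?) "
            "ON CONFLICT(page, user_id) DO UPDATE SET hits = hits + excluded.hits",
            (page, user_id, count),
        )
    conn.commit()
    conn.close()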
There's a tutorial describing how to do exactly this in Python using Celery and Carrot: http://packages.python.org/celery/tutorials/clickcounter.html. It also includes some examples of how to set up your database tables using Django models and what code to call whenever someone accesses a page.
This tutorial will certainly be helpful to you regardless of what you choose to implement, although this level of architecture might be overkill if you don't expect thousands of hits each hour.
Use a database to keep a record of the unique IPs (if the IP doesn't exist in the DB, create it; otherwise continue as planned) and then query the database for the number of those entries. Index this by IP and URL to store views for individual pages. You won't have to worry about tracking registered users this way; they will be totalled into the unique IP count. As for multiple people behind one IP, there's not much you can do there short of requiring an account and counting user-to-page views similarly.
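A minimal sketch of that with SQLite (table and column names are placeholders); the composite primary key is what makes repeat visits from the same IP to the same page free:

# One row per (ip, url) pair: a repeat visit from the same IP to the same page
# is ignored, and COUNT(*) gives the unique-visitor figure per page.
import sqlite3

conn = sqlite3.connect("views.db")
conn.execute("CREATE TABLE IF NOT EXISTS page_views (ip TEXT, url TEXT, PRIMARY KEY (ip, url))")

def record_view(ip, url):
    conn.execute("INSERT OR IGNORE INTO page_views (ip, url) VALUES (?, ?)", (ip, url))
    conn.commit()

def unique_views(url):
    return conn.execute("SELECT COUNT(*) FROM page_views WHERE url = ?", (url,)).fetchone()[0]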
I would suggest using a persistent key/value store like Redis. If you use a list with the list key being the serialized identifier, you can store other serialized entries and use llen to find the list size.
Example (Python) after initializing your Redis store:
import redis

# Connect to a local Redis instance; decode_responses makes lrange return strings
redisStore = redis.Redis(host="localhost", port=6379, decode_responses=True)

def initializeAndPush(serializedKey, serializedValue):
    # Append the value only if it is not already stored under this key
    if not redisStore.exists(serializedKey):
        redisStore.rpush(serializedKey, serializedValue)
    elif serializedValue not in redisStore.lrange(serializedKey, 0, -1):
        redisStore.rpush(serializedKey, serializedValue)

def getSizeOf(serializedKey):
    # llen returns the number of entries stored under this key
    if redisStore.exists(serializedKey):
        return redisStore.llen(serializedKey)
    return 0
Using this technique, you can use anything as serializedKey or serializedValue. If you want to store IPs with today's date or serialized login information, both are just as simple. Also, only unique serializedValues are stored, since each push is guarded by the membership check.
I would try implementing pixel tracking to track views on your page/object. This method is used by Google (Google Analytics) and other high-profile media companies.
Pixel tracking will be fine, since you can point the tracking pixel at an HttpHandler specific to that purpose. That way you can separate the load and even use some kind of queue for high-load scenarios.
Also, you can incorporate user-specific information in the tracking pixel, such as WHO has visited the page.
eg:
<a href="fakeimages/imba.gif?uid=123&info2=a&info3=b" style="height:1px;width:1px;" />
Then you need to handle requests going to fakeimages/*.gif with a specific HttpHandler / PHP redirect/controller (whatever language you're using) and process the information.
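As an illustration of the "whatever language you're using" part, a tiny Python/Flask version of that handler might look like this (the route, parameter handling and logging are placeholders):

# Serve a 1x1 transparent GIF for fakeimages/*.gif and record the query-string
# info (uid etc.) as a page view. Storage is left as a print for brevity.
import base64
from flask import Flask, Response, request

app = Flask(__name__)

# Standard 1x1 transparent GIF, base64-encoded
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

@app.route("/fakeimages/<name>.gif")
def tracking_pixel(name):
    uid = request.args.get("uid")          # WHO visited the page
    # Hand this off to a queue or database instead of printing in a real setup
    print(f"pixel hit: image={name} uid={uid} extra={dict(request.args)}")
    return Response(PIXEL, mimetype="image/gif")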
regards