I am writing an API for an application that will be hosted in the cloud, so that users can access it through their unique application IDs. For the time being it is all working fine and giving users the desired results. The question I am now stuck on is how to handle multiple requests at a time. I need some suggestions on how I can handle multiple requests to the API. Is there a way I can optimize my code so users get their results faster? Should I cache the users' common requests so that I can serve the output directly from cached data? Or should I save the latest requested data in a database and use indexing to return output to the user quickly?
Please give suggestions so that I can write a good, fast application that will hold up in the long run.
Profile your code using xdebug or xhprof.
Identify bottlenecks using real-life evidence, then eliminate or minimize the bottlenecks.
Don't blindly begin caching data under the assumption that it is a performance problem.
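For example, a minimal xhprof run (assuming the xhprof PECL extension is installed; handle_api_request() below is just a stand-in for your own code path) looks roughly like this:

```php
<?php
// Placeholder for the code path you actually want to measure.
function handle_api_request(): void
{
    usleep(50000); // simulate some work
}

// Start profiling CPU and memory (requires the xhprof PECL extension).
xhprof_enable(XHPROF_FLAGS_CPU + XHPROF_FLAGS_MEMORY);

handle_api_request();

// Stop profiling; the result is an array keyed by "caller==>callee".
$profile = xhprof_disable();

// Quick look: entries sorted by inclusive wall time ('wt'), largest first.
uasort($profile, function (array $a, array $b) {
    return $b['wt'] <=> $a['wt'];
});
print_r(array_slice($profile, 0, 5, true));
```

The functions at the top of that list are where optimization or caching effort is actually worth spending.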
I'm not sure where the best place to ask this question is, so if this is the wrong place, maybe someone can help move it to a better group?
It has elements of programming, user experience and databases, but doesn't really fit well into any one category!
I need to take data and display it in a graph on my site. This data is available from an API.
But I can't decide if it is best to just get this data from the API "live" when needed, or to save data from the API to a local database (on my own server).
Both methods have pros and cons.
Getting the data live means more URL requests and more latency, and if the site is used by many users, it may run into the API's access limits. I also have to 'assume' the API site will always be available if I am using the data live. The API data is also restricted to the past 2,000 historical data points.
If I use a cron job to request the data, say once an hour, and save it to my own database, then I am only calling the API once every hour. Accessing my own database should be faster than calling an API from a URL GET request when drawing my page. And if my site is up, then my database will be up, so I don't need to worry about the API site uptime. And I can store as many historical data points as I want to, if I am storing the data myself.
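A rough sketch of what that hourly cron job could look like, assuming a JSON endpoint and a local MySQL table with a unique index on the timestamp column (the URL, credentials and column names are all placeholders):

```php
<?php
// fetch_datapoints.php -- run hourly via cron, e.g. "0 * * * * php fetch_datapoints.php"

// Pull the latest data points from the API (placeholder URL and format).
$json   = file_get_contents('https://api.example.com/datapoints?limit=2000');
$points = json_decode($json, true);

// Store them locally; INSERT IGNORE skips rows whose timestamp already exists,
// assuming a UNIQUE index on data_points.recorded_at.
$pdo  = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');
$stmt = $pdo->prepare(
    'INSERT IGNORE INTO data_points (recorded_at, value) VALUES (:recorded_at, :value)'
);

foreach ($points as $point) {
    $stmt->execute([
        ':recorded_at' => $point['timestamp'],
        ':value'       => $point['value'],
    ]);
}
```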
But it seems wasteful to simply duplicate data that is already existing elsewhere.
There could be millions of data points. Is it really sensible to store perhaps 50 million pieces of data on my own server when it already exists behind an API?
From the user's perspective, there shouldn't be any difference as to which method I choose - other than perhaps if my site is up and the API site is down, in which case there would be missing data on my site.
I am torn between these two options and don't know how best to proceed with this.
I am calling different APIs on one of my websites. I am able to get optimal results with PHP's multi cURL. However, I'm noticing that the site becomes very slow when traffic is a little high. I have read that caching is another way to speed up websites. My question is: can I use caching when the API calls I am making depend entirely on user-based inputs? Or is there an alternative solution to this?
It could be that one request is taking too long to load and, as a result, is delaying the other requests.
The answer to your question depends on what kind of tasks users perform with the data. Basically, a cache can be used for all tasks related to retrieving and querying data, and it is not suitable for inserting, mutating or deleting data. There are many ways to implement caching in your web application, but one of the easiest is to use GET requests for all user requests that only retrieve data, and then configure the web server or a CDN to cache them.
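As an illustration of that last point, a read-only PHP endpoint can mark its responses as cacheable so that the web server or CDN in front of it serves repeats without re-running the script; the five-minute lifetime and the fetchReportData() helper below are made up for the example:

```php
<?php
// Placeholder for whatever read-only lookup this endpoint performs.
function fetchReportData(array $params): array
{
    return ['params' => $params, 'generated_at' => time()];
}

// Only GET requests that merely read data are safe to cache.
if ($_SERVER['REQUEST_METHOD'] === 'GET') {
    header('Cache-Control: public, max-age=300'); // let the proxy/CDN reuse it for 5 minutes
}

header('Content-Type: application/json');
echo json_encode(fetchReportData($_GET));
```

Because the cache key is effectively the full URL, user-dependent inputs passed as query parameters still benefit: identical inputs hit the cached copy, different inputs fall through to the script.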
I think it is inefficient to make a request on every page visit to collect the information. I would like to know what the best practices are for working with data received from an API.
Should the data be stored in my own database or should I just make a call to the API every time?
As an example, suppose I use Strava to get my running results. Should I just use the results directly from the API, or should I store them in a database and then check after a certain period whether there are new or updated results, updating my database accordingly?
I have read some articles about RESTful APIs, but they only explain how to get, delete, update the data, etc.
It depends on what you are using it for...
If the information is unlikely to change and is likely to be accessed again on your site, then store it.
Otherwise just get it from the API; it will always be up to date, and it's less code to write if you don't have to store it.
If you are storing it and you know what information you are likely to require, you could retrieve it in the background at set intervals.
You can also look at other factors when making your judgement, such as the speed of the API or whether you have an API call limit.
You are probably not finding a definitive answer because there isn't one, but I hope this clarifies things for you =]
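To make the "store it if it will be re-accessed" advice concrete, a cache-or-fetch helper could look roughly like this; the api_cache table, the one-hour lifetime and the Strava endpoint are assumptions, and authentication is omitted:

```php
<?php
// Return locally stored activities if they are fresh enough, otherwise refresh from the API.
function getActivities(PDO $pdo, int $maxAgeSeconds = 3600): array
{
    $row = $pdo->query(
        "SELECT payload, fetched_at FROM api_cache WHERE cache_key = 'activities'"
    )->fetch(PDO::FETCH_ASSOC);

    if ($row && (time() - strtotime($row['fetched_at'])) < $maxAgeSeconds) {
        return json_decode($row['payload'], true);   // fresh enough, reuse the stored copy
    }

    // Stale or missing: call the API (OAuth access token omitted here) and store the result.
    $payload = file_get_contents('https://www.strava.com/api/v3/athlete/activities');
    $pdo->prepare(
        "REPLACE INTO api_cache (cache_key, payload, fetched_at) VALUES ('activities', :payload, NOW())"
    )->execute([':payload' => $payload]);

    return json_decode($payload, true);
}
```

The page then always reads from the local copy, and the API is only hit when the copy is older than the lifetime you choose.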
I don't know Strava or whatever API you will use.
In my experience, applications generally need to keep track of what has been exchanged with an external system, for example to give proof that the data has been read or written.
For this, something is usually kept in the local database, and it usually does not have to be a record of the complete data exchange.
I'm working on a web service in PHP that accesses an MSSQL database, and I have a few questions about handling large numbers of requests.
I don't actually know what constitutes 'high traffic', and I don't know if my service will ever experience it, but would optimisations in this area mostly come down to server processing speed and database access speed?
Currently when a request is sent to the server I do the following:
Open database connection
Process Request
Return data
Is there any way I can 'cache' this database connection across multiple requests? As long as requests keep being processed, the connection should remain valid.
Can I store a user session ID and limit the number of requests per hour from a particular session?
How can I create 'dummy' clients to send requests to the web server? I guess I could just spam requests in a for loop or something? Are there better methods?
Thanks for any advice
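For the 'dummy clients' question, a quick-and-dirty load generator can be written in PHP itself with curl_multi, which fires a batch of concurrent requests at the service (the URL and request count below are placeholders); a dedicated tool such as the Siege suggestion in the answer below will give you more realistic numbers:

```php
<?php
// Fire $count concurrent GET requests at $url and report the elapsed time.
function blastRequests(string $url, int $count = 50): void
{
    $multi   = curl_multi_init();
    $handles = [];

    for ($i = 0; $i < $count; $i++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($multi, $ch);
        $handles[] = $ch;
    }

    $start = microtime(true);
    do {
        curl_multi_exec($multi, $running);
        curl_multi_select($multi);   // wait for network activity instead of busy-looping
    } while ($running > 0);
    printf("%d requests finished in %.2f seconds\n", $count, microtime(true) - $start);

    foreach ($handles as $ch) {
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);
}

blastRequests('http://localhost/my-service/endpoint');   // placeholder URL
```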
You never know when high traffic will occur. It might result from your search engine ranking, a blog writing a post about your web service or any other unforeseen random event. You had better prepare yourself to scale up. By scaling up, I don't primarily mean adding more processing power, but first optimizing your code. Common performance problems are:
unoptimized SQL queries (do you really need all the data you actually fetch?)
too many SQL queries (try never to execute queries in a loop; see the sketch after this list)
unoptimized databases (check your indexing)
transaction safety (are your transactions fast? keep in mind that all incoming requests need to be synchronized when calling database transactions. If you have many requests, this can easily lead to a slow service.)
unnecessary database calls (if your access is read only, try to cache the information)
unnecessary data in your frontend (does the user really need all the data you provide? does your service provide more data than your frontend uses?)
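As a sketch of the "no queries in a loop" point above, fetching a batch of rows in one query usually replaces the per-item lookup; the table, columns and connection details here are invented for the example:

```php
<?php
$pdo     = new PDO('mysql:host=localhost;dbname=app', 'dbuser', 'dbpass'); // placeholder credentials
$userIds = [3, 17, 42];                                                    // example input

// Slow pattern: one "SELECT ... WHERE id = ?" per ID inside a foreach loop.
// Faster: a single query with an IN () clause and bound placeholders.
$placeholders = implode(',', array_fill(0, count($userIds), '?'));
$stmt = $pdo->prepare("SELECT id, name, email FROM users WHERE id IN ($placeholders)");
$stmt->execute($userIds);
$users = $stmt->fetchAll(PDO::FETCH_ASSOC);
```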
Of course you can cache. You should indeed cache read-only data that does not change on every request. There are useful blog posts on PHP caching techniques. You might also want to consider the caching package of the framework of your choice, or use a standalone PHP caching library.
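A minimal version of that, using the APCu extension as one possible standalone cache (the key name, the 60-second lifetime and the countries query are arbitrary examples):

```php
<?php
// Serve an expensive read-only lookup from APCu when a fresh copy is available.
function getCountryList(PDO $pdo): array
{
    $cached = apcu_fetch('country_list', $hit);
    if ($hit) {
        return $cached;                    // cache hit: skip the database entirely
    }

    $countries = $pdo->query('SELECT id, name FROM countries ORDER BY name')
                     ->fetchAll(PDO::FETCH_ASSOC);

    apcu_store('country_list', $countries, 60);   // keep it for 60 seconds
    return $countries;
}
```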
You can limit service usage, but I would not recommend doing this by session ID, IP address, etc. It is very easy to renew these, and then your protection fails. If you have authenticated users, you can limit requests on a per-account basis like Google does (using a per-user API key for all their publicly available services).
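One way that per-account limit could be sketched, assuming an api_requests log table and an hourly quota of 1000 (both made up for the example):

```php
<?php
// Returns true if the given API key is still under its hourly quota, and records the request.
function underHourlyQuota(PDO $pdo, string $apiKey, int $limit = 1000): bool
{
    $stmt = $pdo->prepare(
        'SELECT COUNT(*) FROM api_requests
          WHERE api_key = :key AND requested_at > DATE_SUB(NOW(), INTERVAL 1 HOUR)'
    );
    $stmt->execute([':key' => $apiKey]);

    if ((int) $stmt->fetchColumn() >= $limit) {
        return false;                      // over quota: reject the request
    }

    $pdo->prepare('INSERT INTO api_requests (api_key, requested_at) VALUES (:key, NOW())')
        ->execute([':key' => $apiKey]);
    return true;
}

// Usage sketch in the endpoint:
// if (!underHourlyQuota($pdo, $apiKeyFromRequest)) {
//     http_response_code(429);            // Too Many Requests
//     exit;
// }
```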
To do HTTP load and performance testing, you might want to consider a tool like Siege, which does exactly what you expect.
I hope to have answered all your questions.
I have a WordPress plugin that checks with my website for an updated version of itself every hour. On my website, I have a script running that listens for such update requests and responds with data.
What I want to implement is some basic analytics for this script, which can give me information like the number of requests per day, the number of unique requests per day/week/month, etc.
What is the best way to go about this?
Use some existing analytics script which can do the job for me
Log this information in a file on the server and process that file on my computer to get the information out
Log this information in a database on the server and use queries to fetch the information
Also there will be about 4000 to 5000 requests every hour, so whatever approach I take should not be too heavy on the server.
I know this is a very open ended question, but I couldn't find anything useful that can get me started in a particular direction.
Wow. I'm surprised this doesn't have any answers yet. Anyways, here goes:
1. Using an existing script / framework
Obviously, Google Analytics won't work for you since it is JavaScript based. I'm sure there exist PHP analytics frameworks out there. Whether you use them or not is really a matter of personal choice. Do these existing frameworks record everything you need? If not, do they lend themselves to being easily modified? You could use a good existing framework and choose not to reinvent the wheel. Personally, I would write my own just for the learning experience.
I don't know any such frameworks off the top of my head because I've never needed one. I could do a Google search and paste the first few results here, but then so could you.
2. Log in a file or MySQL
There is absolutely NO GOOD REASON to log to a file first. You'd log it to a file, then write a script to parse this file. Tomorrow you decide you want to capture some additional information, and now you need to modify your parsing script. This will get messy. What I'm getting at is: you do not need to use a file as an intermediate store before the database. 4-5K write requests an hour (I don't think there will be many read requests apart from when you query the DB) is a breeze for MySQL. Furthermore, since this DB won't be used to serve data to users, you don't care if it is slightly unoptimized. As I see it, you're the only one who'll be querying the database.
EDIT:
When you talked about using a file, I assumed you meant to use it only as a temporary store until you process the file and transfer the contents to a DB. If you did not mean that, and instead meant to store the information permanently in files, that would be a nightmare. Imagine trying to query for certain information that is scattered across files. Not only would you have to write a script that can parse the files, you'd have to write a non-trivial script that can query them without loading all the contents into memory. That would get nasty very, very fast and tremendously impair your ability to spot trends in the data.
Once again, 4-5K might seem like a lot of requests, but a well-optimized DB can handle it. Querying a reasonably optimized DB will be orders of magnitude faster than parsing and querying numerous files.
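For what it's worth, the logging itself can be a single INSERT per update check, and the analytics are then plain SQL over that table; the update_checks table, the site_url field and the POST parameter name below are just one possible layout, not something your plugin necessarily sends:

```php
<?php
// Log one row per update-check request received by the listener script.
$pdo = new PDO('mysql:host=localhost;dbname=plugin_stats', 'dbuser', 'dbpass');
$pdo->prepare('INSERT INTO update_checks (site_url, checked_at) VALUES (:site, NOW())')
    ->execute([':site' => $_POST['site_url'] ?? 'unknown']);

// Requests per day:
//   SELECT DATE(checked_at) AS day, COUNT(*) AS requests
//   FROM update_checks GROUP BY day;
//
// Unique requesting sites per day:
//   SELECT DATE(checked_at) AS day, COUNT(DISTINCT site_url) AS unique_sites
//   FROM update_checks GROUP BY day;
```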
I would recommend using an existing script or framework. It is always a good idea to use a specialized tool into which people have invested a lot of time and ideas. Since you are using PHP, Piwik seems to be one way to go. From their webpage:
Piwik is a downloadable, Free/Libre (GPLv3 licensed) real time web analytics software program. It provides you with detailed reports on your website visitors: the search engines and keywords they used, the language they speak, your popular pages…
Piwik provides a Tracking API and you can track custom variables. The DB schema seems highly optimized; have a look at their testimonials page.