I'm working (for the first time) on developing a PHP application driven by a PHP RESTful API (probably using peej/Tonic). Coming from an application with direct access which might make 20 different database calls during the course of a page load, I am trying to reconcile the fact that 20 API calls = 20x handshakes (which can be improved by Guzzle persistent connections) but also 20x connections to the database.
I believe that with better programming and planning, I can get my required API calls down to 4-5 per page. At this point:
a) Is it not worth considering the latency of 5x database connections + 5x handshakes per page load on account of all the other available optimisations?
b) Is there an existing method by which this can be mitigated that I've thus far failed to find?
c) I believe it violates the principles of RESTful programming, but if I had a single API method which itself gathered information from other API endpoints (for instance, GET suppliers WHERE x=y, then GET products for supplier), is there a documented method for internal API interaction (particularly within peej/Tonic or other frameworks)?
Thank you all for your wisdom in advance.
Remember that the client should be "making 'a request' of the server," which is obliged to fulfill "that request." The server might "execute 20 different database queries" to prepare its response, and the client need not know nor care.
The client's point-of-view becomes, "I care what you tell me, not how you did it."
If you did wish to send query-responses directly to the client, for the client to "do the dirty-work" with these data, then you could still design your server request so that the server did many queries, all at once, and sent all of the result-sets back ... in just one exchange.
Your first priority should be to effectively minimize the number of exchanges that take place. The amount of data returned (within reason) is secondary.
Also consider that, "when the server does it, the work is naturally synchronized." When the client issues multiple asynchronous requests, those requests are, well, "asynchronous." Consider which strategy will be easier for you to debug.
If the server is given "a request to do," it can validate the request (thus checking for client bugs), and perform any number of database operations, perhaps in a TRANSACTION. This strategy, which puts the server into a very active role, is often much less complex than an interaction that is driven by the client with the server taking a passive role.
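As a rough illustration of the "one request, many queries" idea above, here is a minimal sketch in Python (chosen only for brevity; the same shape applies to a PHP endpoint). The table names, the `region` filter and the function `suppliers_with_products` are assumptions made up for this example, not part of peej/Tonic or any other framework.

```python
import json
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical suppliers/products tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE products (id INTEGER PRIMARY KEY, supplier_id INTEGER, name TEXT);
        INSERT INTO suppliers VALUES (1, 'Acme', 'north'), (2, 'Globex', 'south');
        INSERT INTO products VALUES (1, 1, 'Anvil'), (2, 1, 'Rocket'), (3, 2, 'Widget');
        """
    )
    return conn


def suppliers_with_products(conn: sqlite3.Connection, region: str) -> str:
    """Serve ONE client request by running several queries on the server side."""
    # Validate the request first, so client bugs are caught early.
    if not isinstance(region, str) or not region:
        raise ValueError("region must be a non-empty string")

    suppliers = conn.execute(
        "SELECT id, name FROM suppliers WHERE region = ? ORDER BY id", (region,)
    ).fetchall()
    products = conn.execute(
        "SELECT p.supplier_id, p.name FROM products AS p "
        "JOIN suppliers AS s ON s.id = p.supplier_id "
        "WHERE s.region = ? ORDER BY p.id",
        (region,),
    ).fetchall()

    # Group the second result set under the first, then answer in one exchange.
    by_supplier = {}
    for supplier_id, product_name in products:
        by_supplier.setdefault(supplier_id, []).append(product_name)
    return json.dumps(
        {
            "suppliers": [
                {"id": sid, "name": name, "products": by_supplier.get(sid, [])}
                for sid, name in suppliers
            ]
        }
    )
```

The client sees a single response; how many queries the server ran to build it stays an implementation detail.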
What approach, mechanisms (& probably code) one should apply to fully implement Model-to-Views data update (transfer) on Model-State-Change event with pure PHP?
If I'm not mistaken, the MVC pattern states an implicit requirement for data to be sent from the Model layer to all active Views, specifying that "View is updated on Model change". (Otherwise it doesn't make any sense: users working with the same source would see stale data, completely disconnected from reality.)
But PHP is a scripting language, so it's limited to "connection threads" via processes, and its lifetime is limited to the request-response cycle (as tereško kindly noted).
Thus, one has to solve a couple of issues:
Client must have a live tunnel connection to server (Server Sent Events),
Server must be able to push data to client (flush(), ob_flush()),
Model-State-Change event must be raised & related data packed for transfer,
(?) Data must be sent to all active clients (connected to the same exact resource/URL) together, not just the one currently working with its own processes & instance of the ModelClass.php file...
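To make the last two points concrete, here is a minimal sketch, in Python for brevity, of packing a model-change event in the Server-Sent Events wire format and fanning it out to every subscriber. The `Broadcaster` class and the event name are hypothetical. Note the assumption it rests on: all clients are registered inside one long-running process, which is exactly what a per-request PHP process does not give you.

```python
import json
import queue


def format_sse(event: str, data: dict) -> str:
    """Encode one SSE message: an 'event:' line, a 'data:' line, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class Broadcaster:
    """Hypothetical in-process registry holding one outgoing queue per client."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self) -> queue.Queue:
        """Register a newly connected client and return its message queue."""
        client_queue = queue.Queue()
        self._subscribers.append(client_queue)
        return client_queue

    def publish(self, event: str, data: dict) -> int:
        """Push the same encoded message to every client; return how many got it."""
        message = format_sse(event, data)
        for client_queue in self._subscribers:
            client_queue.put(message)
        return len(self._subscribers)
```

Each connection handler would then read from its own queue and write the message to the open response stream.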
UPDATE 1: So, it seems that "simultaneous" interaction with multiple users with PHP involves implementation of WEB Server over sockets of some sort, independent of NGINX and others.... Making its core non-blocking I/O, storing connections & "simply" looping over connections, serving data....
Thus, if I'm not mistaken the easiest way is, still, to go and get some ready solution like Ratchet, be it a 'concurrency framework' or WEB server on sockets...
Too much overhead for a couple of messages a day, though...
AJAX short polling seems to be quite a solution for this dilemma....
Is simultaneous updating multiple clients easier with some different backend than PHP, I wonder?.. Look at C# - it's event-based, not limited to "connection threads" and to query-reply life cycle, if I remember correctly... But it's still WEB (over same HTTP?)...
I would like to implement an API using php and mysql technologies that can handle several thousands of requests per second.
I haven't built this kind of API before. If you have experience implementing a similar task, could you please tell me what the steps are?
How can I implement an API that can handle thousands of requests per second?
I would be glad if you could explain with sample codes.
Thanks in advance for your help.
Based on the details described in the post, you likely want to use an asynchronous, stateless architecture, so that requests don’t block resources and the system can scale more easily (always sounds easier than actually doing it ;)).
Without knowing to what other services these servers would connect (it certainly doesn’t make things easier), I’d go for Elixir/Erlang as programming language and use Phoenix as a framework.
You get a robust functional language which comes with a lot of great built-in features/modules (e.g. mnesia, roll/unroll versions while being live) and scales well (good in utilizing all cores of your server).
If you need to queue up requests to the 2nd tier servers AMQP client/server (e.g. RabbitMQ) might be a good option (holds/stores the requests for the servers).
That works pretty okay if it’s stateless, in form of client asks one thing and the server responds once and is done with the task. If you have many requests because the clients ask for updates every single second, then you’re better switching to a stateful connection and use WebSockets so the server can push updates back to a lot of clients and cuts a lot of chatter/screaming.
All of this is written from a ‘high up view’. In the end, it depends on what kind of services you want to provide, as that narrows down what the ‘suitable tool’ would be. My suggestion is one possibility which I think isn’t far off (Node.js, mentioned before, is also more than valid).
Well you need to consider several factors such as:
Authenticating the API. Your API should be called by valid users that are authorized and authenticated
Caching API results. Your API should cache the results of API call. This will allow your API to handle requests more quickly, and it will be able to handle more requests per second. Memcache can be used to cache results of API call
The API architecture. RESTful APIs have less overhead than SOAP-based APIs. SOAP-based APIs have more standardized support for authentication. They are also more rigidly structured than RESTful APIs.
API documentation. Your API should be well documented and easy for users to understand.
API scope. Your API should have a well defined scope. For example will it be used over the internet as a public API or will it be used as private API inside corporate intranet.
Device support. When designing your API you should keep in mind the devices that will consume your API. For example smart phones, desktop application, browser based application, server application etc
API output format. When designing your API you should keep in mind the format of the output. For example will the output contain user interface related data or just plain data. One popular approach is known as separation of concerns (https://en.wikipedia.org/wiki/Separation_of_concerns). For example separating the backend and frontend logic.
Rate limiting and throttling. Your API should implement rate limiting and throttling to prevent overuse and misuse of the API.
API versioning and backward compatibility. Your API should be carefully versioned. For example if you update your API, then the new version of your API should support older version of API clients. Your API should continue to support the old API clients until all the API clients have migrated to the new version of your API.
API pricing and monitoring. The usage of your API should be monitored, so you know who is using your API and how it is being used. You may also charge users for using your API.
Metric for success. You should also decide which metric to use for measuring the success of your API. For example, number of API calls per second or monetary earnings from your API. Development activities such as research, publication of articles, open source code, participation in online forums etc. may also be considered when determining the success of your API.
Estimation of cost involved. You should also calculate the cost of developing and deploying your API. For example how much time it will take you to produce a usable version of your API. How much of your development time the API takes etc.
Updating your API. You should also decide how often to update your API. For example how often should new features be added. You should also keep in mind the backward compatibility of your API, so updating your API should not negatively affect your clients.
Good answer, I think one thing to keep in mind is where the bottleneck is. Many times, the bottleneck isn't the API server itself but the data access patterns with the persistence layer.
Think about how you access your data. For posting new items, a lot of the time the processing can be delayed and handled asynchronously from the original request. For example, if resizing an image or sending an email, you can integrate RabbitMQ or SQS to queue up a job, which can be processed later by workers. Queues are great at buffering work, so that if a server goes down, jobs are simply queued up to be processed once it is back online.
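A minimal sketch of that deferral, in Python for brevity: the request handler only enqueues a job and answers immediately, and a worker drains the queue later. Everything here (`handle_request`, `worker`, the job fields) is hypothetical, and the in-process `queue.Queue` merely stands in for a broker such as RabbitMQ or SQS; unlike a real broker, it does not survive a crash.

```python
import queue
import threading


def handle_request(jobs: queue.Queue, email: str) -> dict:
    """Request handler: enqueue the slow work and answer immediately."""
    jobs.put({"type": "send_email", "to": email})
    return {"status": "accepted"}


def worker(jobs: queue.Queue, processed: list) -> None:
    """Background worker: process jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        # Stand-in for the slow task (resizing an image, sending an email).
        processed.append(f"{job['type']} -> {job['to']}")


def run_demo():
    jobs = queue.Queue()
    processed = []
    # Requests arrive while no worker is running; the jobs simply wait.
    responses = [
        handle_request(jobs, address)
        for address in ("a@example.com", "b@example.com")
    ]
    thread = threading.Thread(target=worker, args=(jobs, processed))
    thread.start()
    jobs.put(None)  # ask the worker to stop once the queue is drained
    thread.join()
    return responses, processed
```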
On the query side, it is important to understand how indexing works and how data is stored. There are different types of indices; for example, hash tables can give you constant access time, but you cannot perform range queries with hash tables. The easiest case is when you have simple decentralized data objects queried by identifiers, which can be stored in an index. If your data is more complex and you need to do heavy joins or aggregations, then you can look at precomputed values stored in something like Redis or memcache.
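The hash-versus-range point can be seen in a few lines. This is only a toy sketch in Python: a `dict` plays the role of a hash index, and a sorted list searched with `bisect` stands in for an ordered (B-tree style) index. The sample rows are made up.

```python
import bisect

# Hypothetical rows: (key, value)
rows = [(17, "q"), (3, "a"), (42, "z"), (8, "c"), (25, "m")]

# Hash index: constant-time point lookups, but no notion of key order.
hash_index = {key: value for key, value in rows}

# Ordered index: keys kept sorted so a range can be located by binary search.
sorted_keys = sorted(key for key, _ in rows)


def point_lookup(key):
    """Fetch a single row by exact key via the hash index."""
    return hash_index.get(key)


def range_lookup(low, high):
    """Fetch all rows with low <= key <= high via the ordered index."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return [(key, hash_index[key]) for key in sorted_keys[start:end]]
```

Answering `range_lookup(8, 25)` from the hash index alone would require scanning every key, which is why databases offer ordered indices for range predicates.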
First of all, I'm not sure about this question's title, so please correct it if it's wrong. Thanks.
About:
I have two projects based on PHP: the first project (CLIENT) connects to the second (API) via curl. The API project performs some calculations on data sent by CLIENT.
Problem:
If the API project has downtime for any reason, or just slows down, CLIENT must wait until API returns results, so it slows down too. Both projects are in intensive development, so the calculations will grow, and so will the delay.
Question:
How can I avoid this problem? Ideally, API must not impact the performance of CLIENT. Maybe there are design patterns for this?
I have read about async PHP and caching patterns but still haven't found a solution. If there are any solutions (patterns), it would be great to have practical examples!
P.S. The request itself isn't slow; the calculations are. And I agree that first of all they should be optimized.
P.P.S. Total requests are more than 60 per minute ( > ~60 / min ).
There are two approaches, both work but have different pros and cons...
asynchronous processing, meaning that the client does not wait for each single call to return its response, but moves on and relies on a mechanism like a callback to handle the response once it comes in. This is, for example, what is typically done in web clients using JavaScript and AJAX for remote calls. This makes the client considerably more fluent, but obviously involves a higher complexity of code and UI.
queue-based processing, meaning that the client does not make any such potentially blocking requests directly, but only creates jobs inside some queuing mechanism. Those jobs can then be handled one by one by some scheduler, which must also take care of handling the response. This is extremely powerful when it comes to scaling and robustness against load peaks and outages of the API, but the implementation is much more expensive. Also, the overall task must accept that response times are not guaranteed at all; typically the responses will take longer than in the first approach, so they cannot be shown interactively.
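For the first approach, a minimal sketch in Python (used here only for brevity): the call to the slow API is submitted to a thread pool, a callback collects the response when it arrives, and the client keeps working in the meantime. `slow_api_calculation` is a hypothetical stand-in for the curl call to the API project.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_api_calculation(value: int) -> int:
    """Stand-in for the remote API call; the sleep simulates its latency."""
    time.sleep(0.05)
    return value * value


def demo():
    results = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        future = pool.submit(slow_api_calculation, 12)
        # The callback runs once the response is available.
        future.add_done_callback(lambda done: results.append(done.result()))
        # The client is not blocked and carries on with its own work.
        client_work = "rendered page while waiting"
    # Leaving the 'with' block waits for outstanding calls to finish.
    return client_work, results
```

The second approach follows the same split, except that the job is written to a durable queue and picked up by a separate worker process.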
I'm working on a web service in PHP which accesses an MSSQL database and have a few questions about handling large amounts of requests.
I don't actually know what constitutes 'high traffic' and I don't know if my service will ever experience 'high traffic' but would optimisations in this area be largely attributed to the server processing speed and database access speed?
Currently when a request is sent to the server I do the following:
Open database connection
Process Request
Return data
Is there any way I can 'cache' this database connection across multiple requests? As long as the requests are processed simultaneously, the connection should remain valid.
Can I store user session id and limit the amount of requests per hour from a particular session?
How can I create 'dummy' clients to send requests to the web server? I guess I could just spam send requests in a for loop or something? Better methods?
Thanks for any advice
You never know when high traffic will occur. High traffic might result from your search engine ranking, a blog writing a post about your web service, or any other unforeseen random event. You had better prepare yourself to scale up. By scaling up, I don't primarily mean adding more processing power, but first of all optimizing your code. Common performance problems are:
unoptimized SQL queries (do you really need all the data you actually fetch?)
too many SQL queries (try to never execute queries in a loop)
unoptimized databases (check your indexing)
transaction safety (are your transactions fast? keep in mind that all incoming requests need to be synchronized when calling database transactions. If you have many requests, this can easily lead to a slow service.)
unnecessary database calls (if your access is read only, try to cache the information)
unnecessary data in your frontend (does the user really need all the data you provide? does your service provide more data than your frontend uses?)
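The "queries in a loop" item is worth a concrete look. Below is a sketch in Python with an in-memory SQLite database (the `orders` table and both function names are made up for illustration): the first function issues one query per customer, the second gets the same answer in a single round trip.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a hypothetical orders table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO orders VALUES (1, 10, 5.0), (2, 11, 7.5), (3, 10, 2.5), (4, 12, 1.0);
        """
    )
    return conn


def totals_query_per_customer(conn, customer_ids):
    """Anti-pattern: one database round trip for every customer."""
    return {
        cid: conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (cid,),
        ).fetchone()[0]
        for cid in customer_ids
    }


def totals_single_query(conn, customer_ids):
    """Same result with a single query using IN (...) and GROUP BY."""
    if not customer_ids:
        return {}
    placeholders = ", ".join("?" for _ in customer_ids)
    rows = conn.execute(
        "SELECT customer_id, SUM(total) FROM orders "
        f"WHERE customer_id IN ({placeholders}) GROUP BY customer_id",
        list(customer_ids),
    ).fetchall()
    totals = dict.fromkeys(customer_ids, 0)  # customers without orders stay at 0
    totals.update(dict(rows))
    return totals
```

With a networked database the difference grows with every extra customer, because each query in the loop pays the network latency again.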
Of course you can cache. You should indeed cache for read-only data that does not change upon every request. There is a useful blogpost on PHP caching techniques. You might also want to consider the caching package of the framework of your choice or use a standalone php caching library.
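As a sketch of what such caching boils down to (in Python for brevity; a real service would more likely use a framework's cache package or a standalone library, as noted above), here is a tiny time-to-live cache. The class name `TTLCache` and the `loader` callback are assumptions made for this example.

```python
import time


class TTLCache:
    """Keep loaded values for ttl_seconds, then reload them on the next access."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader() only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()  # e.g. the database query for read-only data
        self._entries[key] = (now + self._ttl, value)
        return value
```

Choosing the TTL is the actual design decision: it is the longest time a client may see stale data.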
You can limit the service usage, but I would not recommend doing this by session ID, IP address, etc. It is very easy to renew these, and then your protection fails. If you have authenticated users, then you can limit the requests on a per-account basis like Google does (using an API key per user for all their publicly available services).
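A per-account limit can be as simple as a counter per API key and time window. The following Python sketch assumes a fixed one-hour window and keeps its counters in memory; the class `HourlyRateLimiter` is hypothetical, and a real deployment with several web servers would need shared storage for the counters.

```python
import time


class HourlyRateLimiter:
    """Fixed-window limiter: at most max_requests per window for each API key."""

    def __init__(self, max_requests: int, window_seconds: float = 3600, clock=time.time):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock  # injectable so the window can be tested without waiting
        self._windows = {}  # api_key -> (window_start, request_count)

    def allow(self, api_key: str) -> bool:
        """Return True and count the request, or False if the key is over its limit."""
        now = self._clock()
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self._window:
            start, count = now, 0  # the previous window has ended; start a new one
        if count >= self._max:
            self._windows[api_key] = (start, count)
            return False
        self._windows[api_key] = (start, count + 1)
        return True
```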
To do HTTP load and performance testing, you might want to consider a tool like Siege, which does exactly what you expect.
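If you would rather script the 'dummy' clients yourself, the usual shape is a pool of concurrent workers rather than a plain for loop. Here is a rough Python sketch; `run_load` and its parameters are made up for this example, and `send_request` is whatever callable performs one request against your service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(send_request, total_requests: int, concurrency: int) -> dict:
    """Call send_request(i) total_requests times using `concurrency` threads."""

    def timed_call(index):
        started = time.perf_counter()
        try:
            send_request(index)
            succeeded = True
        except Exception:
            succeeded = False  # count any error as a failed request
        return succeeded, time.perf_counter() - started

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = list(pool.map(timed_call, range(total_requests)))

    latencies = [elapsed for succeeded, elapsed in outcomes if succeeded]
    return {
        "ok": len(latencies),
        "failed": total_requests - len(latencies),
        "max_latency": max(latencies, default=0.0),
    }
```

For a real test, `send_request` could be something like `lambda i: urllib.request.urlopen(url).read()` with `url` pointing at a test instance of your service, never at production without warning.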
I hope to have answered all your questions.
My setup: Currently running a dedicated server with an Apache, PHP, MYSQL.
My DB is all set up and stores everything correctly. I'm just trying to figure out how to best display things live in an efficient way.
This would be a live challenge system for a web-based game.
User A sends a challenge to User B
User B is alerted immediately and must take action on whether to Accept or Decline
Once User B accepts, he and User A are both taken to a specific page that is served up by the DB (nothing special happens on this page, and they don't need to be in sync or anything)
The response from User B is a simple yes or no, no other parameters are set by User B, the page they are going to has already been defined when User A sends the challenge.
Whichever config I implement for this challenge system, I am assuming it will also work for instant sitewide notifications. The only difference is that notifications do not require an instant response from User B.
I have read up on long polling techniques, Comet, etc., but I'm still looking for opinions on the best way to achieve this and make it scalable.
I am open to trying anything as long as it will work with (or in tandem) to my current PHP and MYSQL set up. Thanks!
You're asking about Notifications from a Server to a Client. This can be implemented either by having the Client poll frequently for changes, or having the Server hold open access to the Client, and pushing changes. Both have their advantages and disadvantages.
EDIT: More Information
Pull Method Advantages:
Easy to implement
Server can be pretty naïve about who's getting data
Pull Method Disadvantages:
Resource intensive on the client side, regardless of polling frequency
Time vs. Resource debacle: More frequent polls mean more resource utilization. Less resource utilization means less immediate data.
Push Method Advantages:
Server has more control overall
Data is immediately sent to the client
Push Method Disadvantages:
Potentially very resource intensive on the server side
You need to implement some way for the server to know how to reach each individual client (for example, Apple uses Device UUIDs for their APNS)
What Wikipedia has to say (some really good stuff, actually): Pull, Push. If you are leaning toward a Push model, you might want to consider setting up your app as a Pushlet
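To show how little the pull model needs on the server, here is a minimal Python sketch of polling with a "last seen ID" cursor, so each poll returns only what the client has not received yet. The in-memory `NotificationStore` is an assumption for illustration; with the setup described in the question it would be a MySQL table queried with `WHERE id > ?`.

```python
import itertools


class NotificationStore:
    """Hypothetical in-memory stand-in for a notifications table."""

    def __init__(self):
        self._ids = itertools.count(1)  # increasing IDs, like an auto-increment column
        self._by_user = {}

    def push(self, user: str, message: str) -> int:
        """Record a notification for a user and return its ID."""
        note_id = next(self._ids)
        self._by_user.setdefault(user, []).append((note_id, message))
        return note_id

    def poll(self, user: str, after_id: int = 0):
        """Return the user's notifications with an ID greater than after_id."""
        return [
            (note_id, message)
            for note_id, message in self._by_user.get(user, [])
            if note_id > after_id
        ]
```

User B's browser would call the polling endpoint every few seconds, passing the highest ID it has seen; a challenge from User A then shows up on the next poll.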