I am trying to build (to keep it as simple as possible) "something like Google Analytics". That means I want to store small objects (size < 2 KB) with a few pieces of information from a web page into some storage and be able to query them.
I have JS code that sends those event objects to a PHP endpoint on Google App Engine. This endpoint then inserts them into Google BigQuery. Here comes my problem: the insertion is done via the Google API PHP library, i.e. a REST request, so every insert performs an HTTP request, which is very slow.
My question here is: is there a better way to store the events in the Google Cloud environment? Is it better (and more cost-effective?) to use Pub/Sub or Redis for storing the events and have some workers in the background that load this queue into BigQuery?
Any idea how to do this as efficiently (in both performance and cost) as possible would be greatly appreciated!
If I had to do this, I would first make the endpoint handler save the raw data into a push queue, because enqueuing is relatively fast on App Engine. The processing of the data and the BigQuery API calls would then be done later, in the task queue handler.
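As a rough sketch (assuming the legacy App Engine PHP SDK is available; the '/bq-worker' URL and the 'payload' field name are made up for illustration), the endpoint handler could look something like this:

```php
<?php
// Endpoint handler: enqueue the raw event instead of calling BigQuery inline.
use google\appengine\api\taskqueue\PushTask;

$event = file_get_contents('php://input');      // raw JSON event from the JS tracker

// '/bq-worker' is a hypothetical worker URL mapped to a second handler.
$task = new PushTask('/bq-worker', ['payload' => $event]);
$task->add();                                    // fast: just enqueues and returns

// The /bq-worker handler runs later, off the request path, where it can batch
// rows and call BigQuery's tabledata.insertAll via the Google API PHP client.
```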
I guess the performance and cost you can get also vary a bit depending on the App Engine language (PHP, Go, Java, ...).
Related
I would like to implement an API using PHP and MySQL that can handle several thousand requests per second.
I haven't built this kind of API before. If you have experience implementing a similar task, could you please tell me what the steps are?
How can I implement this kind of API so that it can handle thousands of requests per second?
I would be glad if you could explain with sample codes.
Thanks in advance for your help.
Based on the details described in the post, you likely want to use an asynchronous, stateless architecture, so requests don't block resources and the system can scale more easily (it always sounds easier than it actually is ;)).
Without knowing what other services these servers would connect to (which certainly doesn't make things easier), I'd go for Elixir/Erlang as the programming language and use Phoenix as the framework.
You get a robust functional language that comes with a lot of great built-in features/modules (e.g. Mnesia, rolling versions forward and back while live) and that scales well (it is good at utilizing all the cores of your server).
If you need to queue up requests to the second-tier servers, an AMQP client/server setup (e.g. RabbitMQ) might be a good option (it holds/stores the requests for the servers).
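Since the question itself is about PHP, here is a minimal sketch of handing a request off to RabbitMQ from PHP with the php-amqplib library (the queue name 'requests', the payload, and the connection details are assumptions):

```php
<?php
require 'vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

// Hypothetical connection details; adjust host/credentials for your setup.
$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel    = $connection->channel();

// Durable queue so queued requests survive a broker restart.
$channel->queue_declare('requests', false, true, false, false);

$body = json_encode(['user_id' => 42, 'action' => 'resize_image']); // example payload
$channel->basic_publish(
    new AMQPMessage($body, ['delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT]),
    '',          // default exchange
    'requests'   // routing key = queue name
);

$channel->close();
$connection->close();
```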
That works pretty well if the interaction is stateless, in the sense that the client asks for one thing, the server responds once, and it is done with the task. If you have many requests because the clients ask for updates every single second, then you're better off switching to a stateful connection and using WebSockets, so the server can push updates back to a lot of clients and cut a lot of the chatter.
All of this is written from a 'high-level view'. In the end, it depends on what kind of services you want to provide, as that narrows down what the 'suitable tool' would be. My suggestion is one possibility which I think isn't far off (Node.js, mentioned before, is also more than valid).
Well, you need to consider several factors, such as:
Authenticating the API. Your API should only be callable by valid users who are authenticated and authorized.
Caching API results. Your API should cache the results of API calls. This allows it to answer requests more quickly and to handle more requests per second. Memcache can be used to cache the results of API calls (see the sketch after this list).
The API architecture. RESTful APIs have less overhead compared to SOAP-based APIs. SOAP-based APIs have better support for authentication, and they are also better structured than RESTful APIs.
API documentation. Your API should be well documented and easy for users to understand.
API scope. Your API should have a well-defined scope. For example, will it be used over the internet as a public API, or as a private API inside a corporate intranet?
Device support. When designing your API you should keep in mind the devices that will consume it, for example smartphones, desktop applications, browser-based applications, server applications, etc.
API output format. When designing your API you should keep in mind the format of the output. For example, will the output contain user-interface-related data or just plain data? One popular approach is known as separation of concerns (https://en.wikipedia.org/wiki/Separation_of_concerns), for example separating the backend and frontend logic.
Rate limiting and throttling. Your API should implement rate limiting and throttling to prevent overuse and misuse of the API.
API versioning and backward compatibility. Your API should be carefully versioned. For example, if you update your API, then the new version should still support clients of the older version. Your API should continue to support the old API clients until all of them have migrated to the new version.
API pricing and monitoring. The usage of your API should be monitored, so you know who is using your API and how it is being used. You may also charge users for using your API.
Metric for success. You should also decide which metric to use for measuring the success of your API, for example the number of API calls per second or the monetary earnings from your API. Development activities such as research, publication of articles, open-source code, participation in online forums, etc., may also be considered when determining the success of your API.
Estimation of cost involved. You should also calculate the cost of developing and deploying your API. For example, how much time will it take you to produce a usable version of your API? How much of your development time does the API take, etc.?
Updating your API. You should also decide how often to update your API, for example how often new features should be added. You should also keep backward compatibility in mind, so that updating your API does not negatively affect your clients.
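On the caching point above, a minimal read-through cache in PHP could look like the following (the Memcached server address, the key scheme, and the 60-second TTL are assumptions; fetch_user_profile_from_db is a hypothetical expensive query):

```php
<?php
// Read-through cache for an API call result using the PHP Memcached extension.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);    // assumed local memcached instance

$key    = 'api:user_profile:' . $userId;  // hypothetical cache key scheme
$result = $cache->get($key);

if ($result === false && $cache->getResultCode() === Memcached::RES_NOTFOUND) {
    $result = fetch_user_profile_from_db($userId); // hypothetical expensive query
    $cache->set($key, $result, 60);                // cache for 60 seconds
}

header('Content-Type: application/json');
echo json_encode($result);
```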
Good answer, I think one thing to keep in mind is where the bottleneck is. Many times, the bottleneck isn't the API server itself but the data access patterns with the persistence layer.
Think about how you access your data. For posting new items, a lot of the time the processing can be delayed and handled asynchronously from the original request. For example, if resizing an image or sending an email, you can integrate RabbitMQ or SQS to queue up a job, which can be processed later by workers. Queues are great at buffering work, so that if a server goes down, work simply queues up to be processed once it is back online.
On the query side, it is important to understand how indexing works and how data is stored. There are different types of indices; for example, hash tables can give you constant access time, but you cannot perform range queries with them. The easiest case is if you have simple, decentralized data objects queried by identifiers which can be stored in an index. If your data is more complex and you need to do heavy joins or aggregations, then you can look at precomputed values stored in something like Redis or memcache.
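For the precomputed-values idea, here is a small sketch using the phpredis extension: maintain an aggregate counter at write time so the read path never has to run the heavy aggregation (the key names and the "likes" example are made up):

```php
<?php
// Maintain a precomputed aggregate in Redis instead of running a heavy
// COUNT/JOIN on every read. Uses the phpredis extension.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);       // assumed local Redis instance

// Write path: when a new "like" is stored, bump the precomputed counter.
function addLike(Redis $redis, int $postId): void {
    // ... insert the like row into the primary database here ...
    $redis->incr("post:{$postId}:like_count");
}

// Read path: constant-time lookup instead of an aggregation query.
function getLikeCount(Redis $redis, int $postId): int {
    return (int) $redis->get("post:{$postId}:like_count");
}
```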
So I'm working on a simple Server/Client Android application where the Android app takes a picture and then sends it to a server.
After that the server has some backend code that will do some sort of processing to the image and generate a new image.
I currently have everything up to here working.
What I now need is a way for the server to let the client(android app) know when the image is ready.
Can somebody point me in the direction of the proper way of doing this? I was thinking of something simple, like a boolean value indicating whether or not the new image exists yet. My issue is that the processing may take a second or two, so if I were to simply do a GET right away, there is a good chance the new image is not ready yet / does not exist.
Ultimately I want to display the new processed image in the Android app. My current method is just to wait a second or two before doing the GET, but that seems like a bad way to do this.
Yes, the way you are doing it is completely useless.
You need to build a RESTful service for this. When the client makes a request to a certain API endpoint, you hold the request there, do your image processing, and respond back within the same HTTP request.
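A rough sketch of that idea on the PHP side: accept the upload, process it synchronously, and return the result image in the same response (the GD-based grayscale step is only a placeholder for whatever processing your backend actually does):

```php
<?php
// process.php: the client POSTs the photo, we process it and return the new
// image in the same HTTP response, so the client never has to poll.
if (!isset($_FILES['photo']) || $_FILES['photo']['error'] !== UPLOAD_ERR_OK) {
    http_response_code(400);
    exit('No image uploaded');
}

$src = imagecreatefromjpeg($_FILES['photo']['tmp_name']);

// Placeholder "processing": convert to grayscale. Replace with your real backend step.
imagefilter($src, IMG_FILTER_GRAYSCALE);

header('Content-Type: image/jpeg');
imagejpeg($src);          // stream the processed image straight back
imagedestroy($src);
```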
RESTful API
Regarding the loader effect: once you implement the RESTful API on your server side, it is easy to handle on the client. In Android you can use AsyncTask, which runs a background operation with onPreExecute and onPostExecute callbacks. So you can start your loader animation in onPreExecute, and stop it and display the new image in onPostExecute.
You can also check out this discussion: SOF Question
There are only a couple of ways, really ... simply have the client "call back" if the server process is anything that exceeds several seconds (a timer on the client). This works, but it is not the most elegant way of handling things.
My preferred way of tackling long latency server response issues is via a "push" notification to the handset. That way, the handset is free to continue forward with other tasks. But, once the server has something ready it sends the notification - handset receives notification, and "calls home" to pick up the waiting payload. I've been building mobile systems like this since BlackBerry came out with MDS back in 2002. It's an elegant approach, in my opinion. Amazon, Apple, and Google ... they all have their take on this infrastructure that models what RIM pioneered. Check them all out before making any decisions as they each have their own restrictions, pros and cons, costs, etc., etc.
We have an web application built in PHP Laravel, which exposes a bunch of schema objects via JSON API calls. We want to tie changes in our schema to AngularJS in such a way that when the database updates, the AngularJS model (and subsequently the view) also updates, in real-time.
In terms of the database, it can be anything, such as MySQL, SQL Server, etc. There are a couple of ways we're thinking about this:
mySQL commits fire some sort of event at Laravel, which then fires a call to all relevant/listening models/views in AngularJS.
Whenever any data is changed (edited/added), Laravel fires an event to AngularJS. In other words, after any successful DB commit, the application does another "thing" to notify listeners.
The second seems the obvious, clean way of doing this - since the database is not involved lower down the stack. Is there any better way of doing this?
This question is related:
How to implement automatic view update as soon as there is change in database in AngularJs?
but I don't quite understand the concept of a "room" in the answer.
What (if any) is the best way to efficiently tie database commits (pushing) to the AngularJS view (to render changes)? We want to avoid polling a JSON API for changes every second, of course.
I've also had similar requirements on one of my projects. We solved it using Node.js and SockJS. The flow is like this:
There is a Node.js + SockJS server to which all clients connect.
When the DB is updated, Laravel issues a command to Node.js via HTTP (Redis is also a possibility); see the sketch after this list.
Node.js broadcasts the event to all interested clients (this depends upon your business logic).
Either the client reloads the required data, or, if the message is small enough, it can be included in the Node.js broadcast.
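On the Laravel side, step 2 can be as simple as publishing to a Redis channel that the Node.js process subscribes to. A minimal sketch (the channel name 'model-updates' and the payload shape are assumptions):

```php
<?php
// Somewhere in your Laravel code, right after the DB write succeeds.
use Illuminate\Support\Facades\Redis;

$item->save();   // the normal Eloquent update

// Tell the Node.js + SockJS server what changed; it decides who to broadcast to.
Redis::publish('model-updates', json_encode([
    'model' => 'Item',        // hypothetical payload shape
    'id'    => $item->id,
    'event' => 'updated',
]));
```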
Hope this helps. There is no clean way to do this without using other technologies (Node.js / WebSockets / SSE, etc.). Much of it also depends on the configuration your clients will be using.
I need to log analytics and usage details in PHP.
I'm looking at various possibilities:
- Google Analytics server-side
- segment.io
- Just adding a record to a DB with PHP
My concern is how much additional processing this will take on my server. Of course Google Analytics' JavaScript implementation won't use anything on my server, but my server-side method of course will.
I also notice that on https://segment.io/docs/integrations/google-analytics they mention that "Server-side Google Analytics is being deprecated due to difficulty of use" - what does this mean?
So basically, I want to implement some basic analytics storing (count number of hits to a URL + some other basic info) server-side - what's the best way to do this considering all things? I only use the PHP language.
It seems that adding a record to the DB on every page view might be a little too much.
Segment.io can actually give you the flexibility of all three of these. Using the PHP library (https://segment.io/libraries/php) you can start sending events from your server. The library is designed to queue and batch requests to maximize server efficiency.
Once the events leave your server, they'll go to Segment.io's servers. Once there, we can route the data to Google Analytics.
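A minimal sketch of sending a server-side event with the analytics-php library (the write key, user id, and event name are placeholders; depending on the library version the class may be named Analytics rather than Segment):

```php
<?php
// Send a server-side event with Segment's analytics-php library.
require_once __DIR__ . '/vendor/autoload.php';

Segment::init('YOUR_WRITE_KEY');   // your project's write key

Segment::track([
    'userId' => '019mr8mf4r',      // hypothetical user id
    'event'  => 'Viewed Page',
    'properties' => [
        'url' => '/pricing',
    ],
]);

Segment::flush();                  // optionally force the queued/batched events out
```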
Additionally, you could use the "Webhooks" integration on Segment.io to set your own server as a receiving endpoint for the data in real-time, so that you could host your own analytics DB separately from the rest of your infrastructure quite easily/cleanly.
https://segment.io/docs/integrations/webhooks
In the context of a web server:
In order to avoid re-querying (using find), one could try to keep the cursor reference returned by find between requests. The Cursor object is a complex object that stores, for example, socket connections. How can such an object be stored to avoid re-querying on subsequent web requests? I am working in Node.js, but any advice is helpful (regardless of the language: Rails, C#, Java, PHP).
(I am using persistent sessions)
Facebook's and Twitter's stream features are more complex than a simple query to a DB. Systems like this tend to have two major backend components in their architecture serving you data: a slow one and a fast one.
1) The first backend system is your database, accessed via a query to get a page of results from the stream (be it someone's Twitter feed or their FB feed). When you page to the bottom or click 'more results', it just increments the page variable and queries the API for that page of your current stream.
2) The second is a completely separate system that sends real-time updates to your page via WebSockets or by polling an API call. This is the 'fast' part of your architecture. This data is probably not coming from a database, but from a queue somewhere. From this queue, handlers send your data to your page, which is a subscriber.
Systems are designed like this because, to scale enormously, you can't depend on your db being updated in real time. It's done in big batches. So, you run a very small subset of that data through the fast part of your architecture, understanding that the way the user gets it from the 'fast' backend may not look exactly how it will eventually look in the 'slow' backend, but it's close enough.
So... moral of the story:
You don't want to persist your DB cursor. You want to think about: 1) do I need updates to be real-time? 2) And if so, how can I architect my system so that a first call gets me most of my data and a second call/mechanism keeps it up to date?
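For the first part, re-querying per request (instead of persisting the cursor) is usually cheap enough: the only state that crosses requests is a page number or a "created before" timestamp. The question mentions Node.js, but the same idea in PHP with the mongodb/mongodb library might look like this (the database, collection, and field names are made up):

```php
<?php
// Re-query on every request instead of persisting the cursor: a page number is
// the only state that crosses requests.
require 'vendor/autoload.php';

$page     = max(0, (int) ($_GET['page'] ?? 0));
$pageSize = 20;

$collection = (new MongoDB\Client('mongodb://localhost:27017'))
    ->selectDatabase('app')              // hypothetical database name
    ->selectCollection('feed_items');    // hypothetical collection name

$cursor = $collection->find(
    ['userId' => $_GET['userId'] ?? ''],
    [
        'sort'  => ['createdAt' => -1],  // newest first
        'skip'  => $page * $pageSize,    // cheap for small page numbers
        'limit' => $pageSize,
    ]
);

header('Content-Type: application/json');
echo json_encode(iterator_to_array($cursor, false));
```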