I'm currently managing PHP "events" within a single instance. This works well and is implemented in my system using something similar to Laravel's events provider.
My question now concerns a system where I need to dispatch events across different instances/users.
For example, I have an account composed of multiple users. Each user is caching the account settings in session after the initial loading of the application.
Now, if a user modifies the account settings, I'd like to send an event to my other users so that they update their settings.
For the time being, I'm thinking about these solutions:
Storing the events in a database table, with each user regularly checking the values. This would add SQL load and would make the caching system obsolete.
Storing a flag in Redis. Each user can regularly check the value of the flag and reload the settings if required. It's similar to the SQL solution above but would be much more efficient with Redis. However, the implementation would be more complex, and it might end up custom-built for this specific event.
I also started to look at ways of sharing data between PHP instances and found this question which is suggesting the usage of shared memory. I'm not very familiar with this concept and I'm still looking at it, but I suppose that it may be possible to build a cross instance event system using it.
Using memcached server in PHP. I'm not familiar with that and still evaluating the possibility of building an event dispatcher system around it.
Using a message queue server. I'm still evaluating this and also checking whether existing event-based systems in PHP are built on one.
Are there any other solutions I could use to dispatch such events between instances?
Edit:
Proposition 3 has been rejected, as shared memory is limited to a single server and I'm working with server clustering on the application side.
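One way to picture the flag idea (the Redis option above) in a clustered setup is a version counter kept in a store that every application server can reach. The sketch below is only an illustration under assumptions: the SharedStore interface, the key name and the function names are hypothetical, and an in-memory array stands in for Redis or memcached so the code runs on its own.

```php
<?php
// Hypothetical interface: anything reachable by every app server
// (Redis, memcached, a database row) could sit behind it.
interface SharedStore
{
    public function get(string $key): int;
    public function increment(string $key): int;
}

// In-memory stand-in so the sketch runs without external services.
final class ArrayStore implements SharedStore
{
    private array $data = [];

    public function get(string $key): int
    {
        return $this->data[$key] ?? 0;
    }

    public function increment(string $key): int
    {
        return $this->data[$key] = $this->get($key) + 1;
    }
}

// Returns the settings from the session cache, reloading them only when
// the shared version differs from the one cached in the session.
function accountSettings(SharedStore $store, array &$session, int $accountId, callable $load): array
{
    $current = $store->get("account:$accountId:settings_version");
    if (!isset($session['settings']) || $session['settings_version'] !== $current) {
        $session['settings'] = $load($accountId);
        $session['settings_version'] = $current;
    }
    return $session['settings'];
}

// Called by whichever user saves new settings.
function settingsChanged(SharedStore $store, int $accountId): void
{
    $store->increment("account:$accountId:settings_version");
}
```

Each request then costs one cheap read of the counter instead of a full settings query, and the same counter pattern could be reused for other kinds of cached data rather than being tied to one event.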
I was checking a part of my application in which I connect to an Elasticsearch host server, and I realized that every time the front-end sends a report request to my back-end I'm creating a new instance of the Elasticsearch client class using the following code:
$elasticClient = ClientBuilder::create()->setHosts($this->setHostsParams())->build();
Since our application sends about 20 requests to the back-end when loading the first page, I was wondering whether PHP's Elasticsearch library is capable of optimizing the initialization phase, whether anyone has a better solution for this, or whether it's not that big of a deal after all and there's no real overhead.
PS: I did some research on it and didn't find any resources covering this subject.
Sharing an object instance is already discussed here and elsewhere so I'm not going to go into that.
What I'd point out, though, is that there's an Elasticsearch API called _msearch which enables you to send multiple search payloads at the same time; the system will respond after all the individual requests have resolved. Here's some sample PHP usage.
This might be useful if you need all your ~20 requests resolved at once -- though it may be useless if you defer some of those requests only after, say, a user scrolls down and what not.
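As a rough illustration of what an _msearch request carries: the body is newline-delimited JSON, with one header line naming the target index followed by one line holding the search itself, repeated per search. The helper name, index name and field names below are placeholders, not taken from the question.

```php
<?php
// Builds the newline-delimited JSON body expected by the _msearch endpoint:
// a header line (target index) followed by a query line for each search.
function buildMsearchBody(array $searches): string
{
    $lines = [];
    foreach ($searches as $search) {
        $lines[] = json_encode(['index' => $search['index']]);
        $lines[] = json_encode($search['body']);
    }
    // The body must end with a newline.
    return implode("\n", $lines) . "\n";
}

// Hypothetical report queries; index and field names are placeholders.
$payload = buildMsearchBody([
    ['index' => 'reports', 'body' => ['query' => ['term' => ['status' => 'open']], 'size' => 5]],
    ['index' => 'reports', 'body' => ['query' => ['match_all' => new stdClass()], 'size' => 0]],
]);
```

The resulting string would be POSTed to /_msearch with the Content-Type application/x-ndjson, and the response contains one entry per search in the same order.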
I've got a PHP app that stores arbitrary config info in a file. I would like to read that file once, when the app first starts, save it as some kind of application state variable, and leverage it across potentially thousands of user sessions. My Google-fu is usually pretty good, but in this case the only thing I'm able to come up with is the $_SESSION variable. Using it means reading the config file once per user session, which could mean reading it thousands of times a minute in high-volume installations, which seems inefficient.
When I worked with .NET web apps there was an idea of an application session that could be used to persist app configuration information across multiple user sessions. Does PHP have a similar concept?
Does php provide an API for cross-session data management? No
Does php provide a mechanism for reading and updating data? Yes there's lots of them
While this sounds like a session handler which is shared across multiple users, its implementation is very different. By default (and by necessity) PHP's sessions are blocking. If access to this shared dataset were blocking, you would severely limit concurrency.
Given that access to the data must be non-blocking, how do you mediate concurrent updates to the shared data? A lot depends on the frequency of the updates. But there are also questions about capacity and whether you need to support multiple nodes.
Any one-size-fits-all solution for this functionality is going to be severely hampered in capacity and/or performance. There are lots of products PHP will integrate with to provide a suitable storage substrate; however (leaving aside the logic of the interface for your super-session), it is not in the nature of open source software to package up third-party products and hide them behind APIs.
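For read-mostly data such as a config file, one of those mechanisms is the APCu extension, which keeps values in memory shared by the PHP processes of a single server. The sketch below is an assumption-laden illustration, not the answerer's method: it assumes the config file is JSON, the function name and cache key are made up, and it falls back to a per-request static cache when APCu is not loaded.

```php
<?php
// Loads a JSON config file, caching the parsed result in APCu when that
// extension is available so later requests on the same server skip the parse.
// Without APCu it still avoids re-reading the file within one request.
function appConfig(string $path): array
{
    static $local = [];

    // Including the mtime in the key makes an edited file produce a new entry.
    $key = 'app_config:' . $path . ':' . filemtime($path);

    if (isset($local[$key])) {
        return $local[$key];
    }

    $useApcu = function_exists('apcu_enabled') && apcu_enabled();
    if ($useApcu) {
        $cached = apcu_fetch($key, $hit);
        if ($hit) {
            return $local[$key] = $cached;
        }
    }

    $config = json_decode(file_get_contents($path), true, 512, JSON_THROW_ON_ERROR);
    if ($useApcu) {
        apcu_store($key, $config);
    }
    return $local[$key] = $config;
}
```

Because the data is only read, the concurrency concerns above largely disappear; the cache is still per server, so a multi-node setup would need each node to warm its own copy.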
I'm developing a web app using Laravel (a PHP framework). The app is going to be used by about 30 of my co-workers on their Windows laptops.
My co-workers interview people on a regular basis. They will use the web app to add a new profile to a database once they interview somebody for the first time and they will append notes to these profiles on subsequent visits. Profiles and notes are stored using MySQL, but since I'm using Laravel, I could easily switch to another database.
Sometimes, my co-workers have to interview people when they're offline. They might visit a group of interviewees, add a few profiles and add some notes to existing ones during a session without any internet access.
How should I approach this?
1. With a local web server on every laptop. I've seen applications ship with some kind of installer including a LAMP stack, but I can't find any documentation on this.
2. I could install the app and something like XAMPP on every laptop myself. That would be possible, but in the future more people might use the app and not all of them might be located nearby.
3. I could use Service Workers, maybe in connection with a library such as UpUp. This seems to be the most elegant approach.
I'd like to give option (3) a try, but my app is database driven and I'm not sure whether I could realize this approach:
Would it be possible to write all the (relevant) data from the DB to - let's say - a JSON file which could be accessed instead of the DB when in offline mode? We don't have to handle much data (less than 100 small data records should be available during an interview session).
When my co-workers add profiles or notes in offline mode, is there any "Web Service" way to insert the entered data into the DB once they are back online?
Thanks
Pida
I would think of it as building the app in "two parts".
First, the front end uses ajax calls to the back end (which isn't anything but a REST API). If there isn't any network connection, store the data in the browser using local storage.
When the user later has network connection, you could send the data that exists in the local storage to the back end and clear the local storage.
If you add web servers on the laptops, the databases and info will only be stored on their local laptops and would not be synced.
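On the back-end side, the replay step could look roughly like the following. This is a sketch under assumptions: the function name and the "client_id" field are hypothetical, and an array stands in for the database table. The point it illustrates is that each queued item carries an ID generated in the browser, so a batch that gets sent twice is stored only once.

```php
<?php
// Applies a batch of notes that were queued offline. Each note carries a
// client-generated ID (hypothetical field name "client_id") so that a batch
// sent twice, e.g. after a dropped connection, is not stored twice.
function applyQueuedNotes(array $batch, array &$storedNotes): array
{
    $accepted = [];
    foreach ($batch as $note) {
        $id = $note['client_id'];
        if (!isset($storedNotes[$id])) {
            // A real app would insert into the database here.
            $storedNotes[$id] = ['profile_id' => $note['profile_id'], 'text' => $note['text']];
        }
        // Acknowledge new and already-known IDs alike so the client can clear them.
        $accepted[] = $id;
    }
    return $accepted;
}
```

The front end would clear an item from local storage only after its ID comes back in the response.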
You can build what you describe using service workers to cache your site's static content to make it available offline, and a specific fetch handler in the service worker to detect a failed PUT or POST and queue the data in IndexedDB. You'd then periodically check IndexedDB for any queued data when your web app is loaded, and attempt to resend it.
I've described this approach in more detail at https://developers.google.com/web/showcase/case-study/service-workers-iowa#updates-to-users-schedules
That article assumes the use of the sw-precache library for caching your site's static assets, and the sw-toolbox library to provide runtime fetch handlers that check for failed business-logic requests. It also uses a promise-based IndexedDB wrapper called simpleDB although I'd probably go with the more recent idb library nowadays.
I have created a PHP+MYSQL web app and I am trying to implement now a logging system to store and track some actions of each user.
The purpose of this is the following: track the activity of each user's session by logging the IP+time+action, then see which pages he accessed later on by logging time+pagename; for each user there will be a file in the format: log{userid}_{month}.log
Each log will then be viewed only by the website owner, through a custom admin panel, and the data will be used only for security purposes (as in to show to the user if he logged in from a different IP or if someone else logged in from a different IP and to see which areas of the website the user accessed during his login session).
Currently, I have a MySQL MyISAM table where I store the userid, IP, time and action. The app is not launched yet, but we intend to have very many users (over 100k), and using a database for this solution feels like suicide.
So what do you suggest? How should the logging be done? Using files, using a table in the current database, using a separate database? Are there any file-logging frameworks available for PHP?
How should the reading of the file be done then? Read the results by row?
Thank you
You have many options, so I'll speak from my experience running a startup with about 500k users, 100k active every month, which seems to be in your range.
We logged user actions in a MySQL database.
Querying your data is very easy and fast (provided good indexes)
We ran on Azure, and had a dedicated MySQL (with slaves, etc) for storing all user data, including logs. Space was not an issue.
Logging to MySQL can be slow, depending on everything you are logging, so we just pushed each log entry to Redis and had a Python app read it from Redis and insert it into MySQL in the background. This meant that logging had basically no impact on loading times.
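The buffering pattern described above can be sketched as follows. This is an illustration only: the class and method names are invented, an in-process SplQueue stands in for the Redis list, and the worker is shown in PHP for consistency even though ours was a Python app.

```php
<?php
// Stand-in for the Redis list: the web request only appends to the queue.
final class LogBuffer
{
    private SplQueue $queue;

    public function __construct()
    {
        $this->queue = new SplQueue();
    }

    // Cheap operation done during the request.
    public function push(int $userId, string $ip, string $action): void
    {
        $this->queue->enqueue([
            'user_id' => $userId,
            'ip'      => $ip,
            'action'  => $action,
            'time'    => time(),
        ]);
    }

    // Run by a background worker: drains up to $max entries and hands them
    // to $insert, which would perform one multi-row INSERT into MySQL.
    public function flush(callable $insert, int $max = 500): int
    {
        $rows = [];
        while (!$this->queue->isEmpty() && count($rows) < $max) {
            $rows[] = $this->queue->dequeue();
        }
        if ($rows !== []) {
            $insert($rows);
        }
        return count($rows);
    }
}
```

Batching the inserts in the worker keeps the number of MySQL round trips low, and the request path never waits on the database.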
We decided to log in MySQL for user actions because:
We wanted to run queries on anything at any time without much effort. The structured format of the user action logs made that incredibly easy to do.
It also allows you to display certain logs to users, if you would require it.
When we introduced badges, we had no need to parse text logs to award badges to those who performed a specific action X number of times. We simply wrote a query against the user action logs, and the badges were awarded. So adding features based on actions was easy as well.
We did use file logging for a couple of application logs - or things we did not query on a daily basis - such as the Python app writing to the database, Webserver access and error logs, etc.
We used Logstash to process those logs. It can simply hook into a log file and stream it to your Logstash server. Logstash can also query your logs, which is pretty cool.
Advanced uses
We used Slack for team communications and integrated the Python database-writing app with it. This allowed us to send critical errors to a channel (via their API) where someone could action a fix immediately.
Closing
My suggestion would be to not over think it for now, log to MySQL, query and see the stats. Make updates, rinse and repeat. You want to keep the cycle between deploy and update quick, so making decisions from a quick SQL query makes it easy.
Basically, what you want to avoid is logging into a server, finding a log and grepping your way through it to find something. The setup above achieved that.
This is what we did, it is still running like that and we have no plans to change it soon. We haven't had any issues where we could not find anything that we needed. If there is a massive burst of users and we scale to 1mil monthly active users, then we might change it.
Please note: whichever way you decide to log, if you are saving the POST data, be sure to never do that for credit card info, unless you are compliant. Or rather use Stripe's JavaScript libraries.
If you are sure that reading the log will mainly target one user at a time, you should consider partitioning your log table:
http://dev.mysql.com/doc/refman/5.1/en/partitioning-range.html
using your user_id as partitioning key.
With a maximum of 1024 partitions (the limit in MySQL 5.1), each partition would hold about 1/1000 of your 100k users, roughly 100 users, which is reasonable.
Are there any file-logging frameworks available for PHP?
There is this which is available on packagist: https://packagist.org/packages/psr/log
Note that it's not a file logging framework but an API for a logger based on the PSR-3 standard from FIG. So, if you like, it's the "standard" logger interface for PHP. You can build a logger that implements this interface or search around on packagist for other loggers that implement that interface (either file or MySQL based). There are a few other loggers on packagist (teacup, forestry) but it would be preferable to use one that sticks to the PSR standard.
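To give an idea of what such a logger might look like, here is a minimal file logger that follows the PSR-3 method shape and its "{placeholder}" message interpolation. It is a sketch, not a packaged library: the class name is made up, it does not actually import psr/log (in a real project it would declare that it implements Psr\Log\LoggerInterface and provide all level methods), and the file naming simply mirrors the log{userid}_{month}.log format from the question.

```php
<?php
// Minimal file logger following the PSR-3 method shape. In a real project it
// would declare "implements \Psr\Log\LoggerInterface" (or extend AbstractLogger).
final class UserFileLogger
{
    private string $dir;
    private int $userId;

    public function __construct(string $dir, int $userId)
    {
        $this->dir = rtrim($dir, '/');
        $this->userId = $userId;
    }

    public function log(string $level, string $message, array $context = []): void
    {
        // PSR-3 placeholder interpolation: "{key}" is replaced by $context['key'].
        $replace = [];
        foreach ($context as $key => $value) {
            if (is_scalar($value) || (is_object($value) && method_exists($value, '__toString'))) {
                $replace['{' . $key . '}'] = (string) $value;
            }
        }
        $line = sprintf("%s [%s] %s\n", date('c'), $level, strtr($message, $replace));

        // Append with an exclusive lock so concurrent requests do not interleave lines.
        file_put_contents($this->path(), $line, FILE_APPEND | LOCK_EX);
    }

    public function info(string $message, array $context = []): void
    {
        $this->log('info', $message, $context);
    }

    // One file per user and month, following log{userid}_{month}.log.
    public function path(): string
    {
        return sprintf('%s/log%d_%s.log', $this->dir, $this->userId, date('Y-m'));
    }
}
```

Reading such a file back row by row for the admin panel is straightforward, but any cross-user query means opening many files, which is where a table-based logger behind the same interface has the advantage.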
We do logging with the great tool Graylog.
It scales as high as you want it to, has great data visualization tools, is incredibly fast even for complex queries and huge datasets, and the underlying search engine (Elasticsearch) is schemaless. The latter may be an advantage, as it gives you more room to extend your logs without the hassle MySQL schemas can cause.
Graylog, Elasticsearch and MongoDB (which is used to store the configuration of Graylog and its web interface) are easily deployable via tools like Puppet, Chef and the like.
Actually logging to Graylog is easy with the already mentioned PHP library Monolog.
Of course, the great disadvantage here is that you have to learn a bunch of new tools and software. But it is worth it in my opinion.
The crux of the matter is the data you are writing is not going to be changed. In my experience in this scenario I would use either:
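MySQL with a BLACKHOLE storage engine. Set it up right and it's blisteringly fast! (BLACKHOLE itself discards the rows, so the writes have to be captured through the binary log and replicated to a server that actually stores them.)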
Riak Cluster (NoSQL solution) - though this may be a learning curve for you it might be one you may need to eventually take anyway.
Use SysLog ;)
Set it up on another server and it can log all of your processes separately (such as networking, servers, SQL, Apache, and your PHP).
It can be useful for you and reduce the time spent debugging. :)
I am building a web-application and have a couple of quick questions. From what I learnt, one should not worry about scalability when initially building the app and should only start worrying when the traffic increases. However, this being my first web-application, I am not quite sure if I should take an approach where I design things in an ad-hoc manner and later "fix" them. I have been reading stories about how people start off with an app that gets millions of users in a week or two. Not that I will face the same situation but I can't help but wonder, how do these people do it?
Currently, I bought a shared hosting account on Lunarpages and that got me started in building and testing the application. However, I am interested in learning how to build the same application in a scalable-manner using the cloud, for instance, Amazon's EC2. From my understanding, I can see a couple of components:
There is a load balancer that first receives requests and then decides where to route each request
This request is then handled by a server replica that then processes the request and updates (if required) the database and sends back the response to the client
If a similar request comes in, then a caching mechanism like memcached kicks into picture and returns objects from the cache
A blackbox that handles database replication
Specifically, I am trying to do the following:
Setting up a load balancer (my homework revealed that HAProxy is one such load balancer)
Setting up replication so that databases can be synchronized
Using memcached
Configuring Apache to work with multiple web servers
Partitioning application to use Amazon EC2 and Amazon S3 (my application is something that will need great deal of storage)
Finally, how can I avoid burning myself when using Amazon services? Because this is just a learning phase, I can probably make do with 2-3 servers with a simple load balancer and replication, but I want to avoid paying loads of money accidentally.
I am able to find resources on individual topics but am unable to find something that starts off from the big picture. Can someone please help me get started?
Personally, I think you should be considering how your app will scale initially - as otherwise you'll run into problems down the line.
I'm not saying you need to build it initially as a multi-server system, but if you think you'll need to do it later, be mindful of the concerns now.
In my experience, this includes things like:
Sessions. Unless you use 'sticky' load balancing, you will have to have some way of sharing session state between servers. This probably means storing session data on either shared storage, or in a DB.
File uploads and replication. If you allow users to upload files, or you have a CMS that allows you to upload images/documents, it needs to cater for the fact that these files will also need to find their way onto other nodes in your cluster. However, if you've gone down the shared storage route mentioned above, this should cover it.
DB scalability. If you're using traditional DB servers, you might want to think about how you'll implement scalability at that level. This may mean coding your app so you use one connection string for reads, and another for writes. Then, you are free to implement replication with one master node handling the inserts/updates cascading the changes to read only nodes that handle the bulk of the work.
Middleware. You might even want to go down the route of implementing some kind of message-oriented middleware solution to completely hand off business logic functions. This will give you a great deal of flexibility in how you scale the business logic layer in the future, although initially it will be a lot of complication and work for not a great deal of payoff.
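The "one connection string for reads, another for writes" idea from the DB scalability point can be sketched like this. The class name and hostnames are placeholders, and the routing rule is deliberately naive.

```php
<?php
// Picks a connection string per statement: writes go to the primary,
// reads are spread over the replicas. Hostnames below are placeholders.
final class ConnectionRouter
{
    private string $primary;
    private array $replicas;

    public function __construct(string $primary, array $replicas)
    {
        $this->primary = $primary;
        $this->replicas = $replicas;
    }

    public function dsnFor(string $sql): string
    {
        // Naive classification: only plain SELECT/SHOW count as reads.
        $isRead = preg_match('/^\s*(SELECT|SHOW)\b/i', $sql) === 1;
        if (!$isRead || $this->replicas === []) {
            return $this->primary;
        }
        return $this->replicas[array_rand($this->replicas)];
    }
}

$router = new ConnectionRouter(
    'mysql:host=db-primary;dbname=app',
    ['mysql:host=db-replica-1;dbname=app', 'mysql:host=db-replica-2;dbname=app']
);
```

Starting with an empty replica list costs nothing and leaves the door open for adding read-only nodes later. A real router would also need to send locking reads (SELECT ... FOR UPDATE) and reads that must see a just-written row to the primary, since replicas can lag behind.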
Have you considered playing around with VMs first? You can run 2-3 VMs on your local machine and set them up like you would actual servers, they just won't be able to handle real traffic levels. If all you're looking for is the learning experience, it might be an ideal way to go about it.