I'm coming from a PHP background and I'm a bit confused about how to safely use resources with Golang. My main concern is that, in a web context, PHP scripts are usually short-lived (the lifetime of an HTTP request/response), whereas a Golang program is supposed to run forever (because it acts as the web server and the web application at the same time).
So, when it comes to dealing with database connections and log files, I often read that they should be opened once rather than for each request, which makes sense. However, how reliable is it to do this?
For example, if I open a database connection, how can I be sure it won't break at some point? (If the database decides to kill it for some reason, or if my machine loses internet access, would the connection become valid again when I regain internet access later on?) Same for log files: with PHP it's not a problem for a sysadmin to set up log rotation, but in Golang I think it would break the file handle (if the program doesn't know about it)?
I'd really like to work the Golang way and not open/close those resources for each request, but I'm not sure what the "safe" way to do this is. Are there any recommendations for this, or are there built-in features for those concerns?
In Go, there are built-in packages that will handle both log files and databases in a Go idiomatic way.
They are:
database/sql: https://golang.org/pkg/database/sql
log: https://golang.org/pkg/log
In Go, as with any language used to build long-running daemons, error checking and error handling are important. Note that database/sql gives you a *sql.DB, which is a managed pool of connections rather than a single connection, and it transparently discards and replaces connections that have gone bad, so a connection dropped by the database or by a network outage is generally recovered on a later query. The standard log package does not rotate files by itself; the usual approach is to have the program reopen its log file (for example on a signal from the rotation tool) or to log to stdout and let the environment handle rotation.
Related
I recently finished building a web server whose main responsibility is simply to take the contents of the body in each HTTP POST request and write it to a log file. The POST data is obfuscated when received, so I'm de-obfuscating it and writing it to a log file on the server. Once de-obfuscated, the content is a series of random key/value pairs that differ from request to request; it is not fixed data.
The server is running Linux with a 2.6+ kernel and is configured to handle heavy traffic (open files limit 32k, etc.). The application is written in Python using the web.py framework. The HTTP server is Gunicorn behind Nginx.
After using Apache Benchmark to do some load testing, I noticed that it can handle up to about 600-700 requests per second without any log-writing issues; Linux natively does a good job at buffering. Problems start to occur when more requests per second than that attempt to write to the same file at the same moment: data does not get written and information is lost. I know that the "write directly to a file" design might not have been the right solution from the get-go.
So I'm wondering if anyone can propose a solution that I can implement quickly, without altering too much infrastructure and code, that can overcome this problem?
I have read about in-memory storage like Redis, but I realize that if data is sitting in memory when the server fails, that data is lost. I have read in the docs that Redis can be configured as a persistent store; there just needs to be enough memory on the server for Redis to do it. This solution would mean that I would have to write a script that dumps the data from Redis (memory) to the log file at a certain interval.
I am wondering if there is an even quicker solution? Any help would be greatly appreciated!
One possible option I can think of is a separate logging process, so that your web.py application is shielded from the performance issue. This is the classical way of handling a logging module. You can use IPC or any other message-bus infrastructure (a rough sketch follows below). With this you will be able to address the following issues:
Logging will not be a huge bottleneck for high-capacity call flows.
A separate module can provide a facility to switch logging on and off.
As such, there should not be any significant process memory usage.
However, you should bear in mind the points below:
You need to be sure that logging is restricted to just logging; it must not become a data store for business processing, or you may end up with synchronization problems in your business logic.
The logging process (here I mean an actual Unix process) becomes critical and slightly complex (i.e. you may have to handle a form of IPC).
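To make the idea concrete, here is a rough sketch of the pattern, written in PHP only because most of the other threads on this page are PHP; the socket path, log path, and variable names are arbitrary, and the same structure translates directly to Python. One long-lived process owns the log file, and request handlers just hand lines to it over a local socket.

```php
<?php
// log-daemon.php: the only process that ever touches the log file.
// Run it separately (e.g. under a supervisor); paths are illustrative.
@unlink('/tmp/app-log.sock'); // remove a stale socket from a previous run

$server = stream_socket_server('unix:///tmp/app-log.sock', $errno, $errstr);
if ($server === false) {
    die("cannot bind socket: $errstr\n");
}

while ($conn = stream_socket_accept($server, -1)) {
    while (($line = fgets($conn)) !== false) {
        // Single writer, so lines are never interleaved.
        file_put_contents('/var/log/app/requests.log', $line, FILE_APPEND);
    }
    fclose($conn);
}
```

```php
<?php
// Inside the request handler: hand the line off and return immediately.
// $decodedPostBody stands for whatever you de-obfuscated from the request.
$sock = @stream_socket_client('unix:///tmp/app-log.sock', $errno, $errstr, 1);
if ($sock !== false) {
    fwrite($sock, $decodedPostBody . "\n");
    fclose($sock);
}
```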
HTH!
I'm developing the backend part of a social app. Clients are iOS/Android phones. The backend code is a PHP application that provides a REST API to clients.
I'm using a simple logging system, with several log levels and different log writers. The simplest writer is a FileWriter: all the log messages go to a log file that changes every day. The log files are not going to be used for analytical purposes, at least so far; they just record errors and users' important operations (mainly database access).
I'm worried because, if the user base grows quickly, writing to a file could become a bottleneck, for two reasons:
Disk write overhead
Concurrency?
About the second point I have a doubt (I'm sorry if it's a stupid one): I'm using Apache with the prefork MPM. Since different clients' requests are handled by different processes, there are no concurrency issues when two processes try to log messages to the same file; the OS (Ubuntu 11.10) handles this. Am I right?
Even if I don't have to worry about concurrent writes to the file, is it a good idea? Isn't it too slow?
Many thanks in advance
As long as you open the file in append mode you are fine. Note that as long as you want persistent log files, they have to go to a file on disk at some point anyway. It makes no sense to use a DBMS for this, since that is simply another layer on top of the filesystem. As long as you don't open the file with caching disabled, the OS will take care of the I/O scheduling and flush writes in batches.
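A minimal sketch of such an append-mode logger (the path and message are just examples); the LOCK_EX flag is extra insurance against interleaved lines when several Apache children write at once:

```php
<?php
// Append-only logger: each call opens the file in append mode, writes one
// line, and closes it. FILE_APPEND lets the OS handle positioning;
// LOCK_EX guards against interleaving from concurrent processes.
function log_line(string $message): void
{
    $line = date('c') . ' ' . $message . PHP_EOL;
    file_put_contents('/var/log/myapp/app.log', $line, FILE_APPEND | LOCK_EX);
}

log_line('user 42 updated profile');
```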
This is a bit complicated, so please don't jump to conclusions; feel free to ask about anything that is not clear enough.
Basically, I have a websocket server written in PHP. Please note that websocket messages are asynchronous, that is, a response to a request might take a lot of time, all the while the client keeps on working (if applicable).
Clients are supposed to ask the server for access to files on other servers. This could be an FTP service or Dropbox, for that matter.
Here, please take note of two issues: connections should be shared and reused, and the server actually 'freezes' while it does its work, so any requests are only processed after the server has 'unfrozen'.
Therefore, I thought, why not offload file access (which is what freezes the server) to PHP threads?
The problem here is twofold:
how do I make a connection resource in the main thread (the server) available to the sub threads (not possible with the above threading model)?
what would happen if two threads end up needing the same resource? It's perfectly fine if one is locked until the other one finishes, but we still need to figure out issue #1.
Perhaps my train of thought is all screwed up; if you can find a better solution, I'm eager to hear it. I've also had the idea of having a PHP thread host a connection resource, but that's pretty memory-intensive.
PHP does not support threads. The purpose of PHP is to respond to web requests quickly; that is what the architecture was built for. Various libraries try to provide something like threads, but they usually cause more issues than they solve.
In general there are two ways to achieve what you want:
Off-load the long-running work to an external process. A common approach is using a job system like Gearman: http://php.net/gearman
Use asynchronous operations. Some stream operations and the like provide a non-blocking mode: http://php.net/stream-set-blocking (a sketch follows below).
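Here is a minimal sketch of the non-blocking approach using stream_select(), so the websocket server can keep servicing clients while a slow backend request is in flight (the host and request are placeholders):

```php
<?php
// Non-blocking fetch from a slow backend. Instead of blocking on fread(),
// we poll with stream_select() and can do other work between polls.
$backend = stream_socket_client('tcp://example.com:80', $errno, $errstr, 5);
stream_set_blocking($backend, false);
fwrite($backend, "GET / HTTP/1.0\r\nHost: example.com\r\n\r\n");

$response = '';
while (!feof($backend)) {
    $read = [$backend];
    $write = $except = [];
    // Wait up to 200 ms for data to arrive.
    if (stream_select($read, $write, $except, 0, 200000) > 0) {
        $response .= fread($backend, 8192);
    }
    // ... service other websocket connections here instead of blocking ...
}
fclose($backend);
```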
I'm trying to figure out the best way to minimize resource utilization when I have PHP talking to various backend services (e.g. Amazon S3 or any other random web services -- I'd like a general solution). Ideally, I'd like to have a single persistent connection to the backend (or maybe a small pool of persistent connections) with some caching, and then have all of the PHP tasks share it. We can consider it all read-only for the purposes of this question. It's not obvious to me how to do this in PHP. There's the database-specific stuff like mysql_pconnect(), but that doesn't really do it for me.
One idea I've had, which seems somewhat suboptimal (but is still better than having every single request create and destroy a new connection), is to use a local caching proxy (in a separate process) that would effectively do the pooling and caching. PHP would still be opening and closing a connection for every request, but at least it would be to a local process, so it should be a little faster (and it would reduce load on the backends). But it doesn't seem like this kind of craziness should be necessary. There's gotta be a better way. This is easy in other languages. Please tell me what I'm missing!
There's a large ideological disconnect between the various web technologies. Some are essentially daemons that run full-time in the background, and handle requests passed in on their own. Because there's a process always running, you can have a pool of already open existing working connections.
PHP (and normal CGI scripts) does not have a daemon behind the scenes. Every time a request comes in, the PHP interpreter is started up with a clean slate, compiles the scripts, and runs the bytecode. There's no persistence. The PHP database functions that support persistent connections establish the connection at the web server child level (i.e. mod_php attached to an Apache process). This isn't exactly a connection pool, as you can only ever see the persistent connection attached to your own process.
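For the database case specifically, this per-worker persistence is what PDO's persistent-connection attribute gives you (roughly the modern equivalent of mysql_pconnect()): the connection stays open in the Apache/PHP worker between requests, so the handshake isn't repeated. A minimal sketch, with placeholder DSN and credentials:

```php
<?php
// Persistent connection: kept open by this PHP worker process across
// requests, so the TCP + auth handshake isn't repeated every time.
$db = new PDO(
    'mysql:host=127.0.0.1;dbname=app',
    'app_user',
    'secret',
    [PDO::ATTR_PERSISTENT => true]
);
$stmt = $db->query('SELECT 1');
var_dump($stmt->fetchColumn());
```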
Without having a daemon or similar process sitting behind the scenes to hand out resources, you won't get real connection pooling.
Keep in mind that most new connections to most services are not heavy-weight, and non-database connections that are heavy-weight might not be friendly to the concept of a connection pool.
Before you think about writing your own PHP-based daemon to handle stuff like this, keep in mind that it may already be a solved problem. Python came up with something called WSGI, with a similar implementation in Ruby called Rack. Perl also has something remarkably similar but I can't remember the name of it off the top of my head. A quick look at Google didn't show any PHP implementations of WSGI, but that doesn't mean they don't exist...
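Short of running such a daemon, the closest you can get inside a single PHP request is reusing one cURL handle for several backend calls, so HTTP keep-alive at least spares you repeated TCP/TLS handshakes within that request. A small sketch with placeholder URLs:

```php
<?php
// Reuse one cURL handle so subsequent requests to the same host reuse the
// already-open keep-alive connection. URLs are placeholders.
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$urls = [
    'https://backend.example.com/resource/1',
    'https://backend.example.com/resource/2',
];

foreach ($urls as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    $body = curl_exec($ch); // the second call reuses the open connection
    // ... use $body ...
}
curl_close($ch);
```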
Because S3 and other web services use HTTP as their transport, you won't get a significant benefit from caching the connection.
Although you may be using an API that appears to authenticate as a first step, looking at the S3 documentation, the authentication happens with every request, so there is no benefit in authenticating once and reusing a connection.
Web service requests over HTTP are lightweight and typically stateless. Once your request has been answered, no resources (connection or session state) are consumed on the server. This allows the web service implementer to use many machines to answer your request without tying up resources on a particular server.
Is it possible to implement P2P using just PHP? Without Flash or Java, and obviously without installing some sort of agent/client on one's computer.
So even though it might not be "true" P2P, it could use a server to establish a connection of some sort, but the rest of the communication must be done peer-to-peer.
I apologize for the slight miscommunication: by "PHP" I meant not the PHP binary, but a PHP script hosted on a web server remote from both peers, so each peer has nothing but a browser.
without installing some sort of agent/client on one's computer
Each computer would have to have the PHP binaries installed.
EDIT
I see in a different post that you mentioned browser-based. Security restrictions in JavaScript would prohibit this type of interaction.
No.
You could write a P2P client / server in PHP — but it would have to be installed on the participating computers.
You can't have PHP running on a webserver cause two other computers to communicate with each other without having P2P software installed.
You can't even use JavaScript to help — the same origin policy would prevent it.
JavaScript running in a browser could use a PHP-based server as a middleman so that two clients could communicate, but you aren't going to achieve P2P.
Since 2009 (when this answer was originally written), the WebRTC protocol has been developed and has achieved widespread support among browsers.
This allows you to perform peer-to-peer communication between web browsers, but you need to write the code in JavaScript (WebAssembly might also be an option, and one that would let you write PHP).
You also need a bunch of non-peer server code to support WebRTC (e.g. to allow peer discovery and to proxy data around firewalls), which you could write in PHP.
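The PHP part can be quite small; a signalling endpoint just has to store and replay the SDP/ICE messages that the browsers exchange. A rough, polling-based sketch (the room parameter and flat-file storage are purely illustrative; you would want a real store and authentication in practice):

```php
<?php
// signal.php: browsers POST their SDP offers/answers and ICE candidates
// under a room id, and poll with GET to read what the other peer posted.
$room = preg_replace('/[^a-zA-Z0-9_-]/', '', $_GET['room'] ?? 'default');
$file = sys_get_temp_dir() . "/signal-$room.json";

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $messages   = is_file($file) ? json_decode(file_get_contents($file), true) : [];
    $messages[] = json_decode(file_get_contents('php://input'), true);
    file_put_contents($file, json_encode($messages), LOCK_EX);
    http_response_code(204);
} else {
    header('Content-Type: application/json');
    echo is_file($file) ? file_get_contents($file) : '[]';
}
```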
It is not feasible, because a server-side application (PHP) does not have access to the peer's system, which is required to define ports, IP addresses, etc. in order to establish a socket connection.
ADDITION:
But if you were to run PHP on each peer's own web server, that might give you what you're looking for.
Doesn't peer-to-peer communication imply that communication goes directly from one client to another, without any servers in the middle? Since PHP is server-based software, I don't think any program you write in it can be considered true P2P.
However, if you want to enable client-to-client communication with a PHP server as the middleman, that's definitely possible.
It depends on whether you want the browser to be sending data to this PHP application.
I've made IRC bots entirely in PHP though, which showed their status and output in my web browser in a fashion much like mIRC. I just set the timeout limit to infinite and connected to the IRC server using sockets. You could connect to anything though. You can even make it listen for incoming connections and handle them.
What you can't do is get a browser to keep a two-way connection open without breaking off requests (not yet, anyway...).
Yes, but it's not what's generally called P2P, since there is a server in between. I have a feeling, though, that what you want is for your peers to communicate with each other, rather than to have a direct connection between them with no 'middleman' server (which is what is normally meant by P2P).
Depending on the scalability requirements, implementing this kind of communication can be trivial (simple polling script on clients), or demanding (asynchronous comet server).
In case someone comes here wondering whether you can write P2P software in PHP, the answer is yes. In this case, Quentin's answer to the original question is correct: PHP would have to be installed on the computer.
You can do whatever you want to do in PHP, including writing true P2P software. To create a true P2P program in PHP, you would use PHP as an interpreted language WITHOUT a web server, and you would use sockets, just like you would in C/C++. The original accepted answer is both right and wrong, unless the original poster was asking whether PHP running on a web server could be a P2P client, in which case the answer would of course be no.
Basically, to do this you'd write a PHP script (sketched below) that:
Opens a server socket connection (stream_socket_server/socket_create)
Finds a list of peer IPs
Opens a client connection to each peer
...
Prove everyone wrong.
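A minimal sketch of those steps, run with the CLI binary rather than under a web server; peer discovery is faked with a hard-coded list, the port and addresses are hypothetical, and a real client would multiplex all sockets with stream_select():

```php
<?php
// p2p.php: run with `php p2p.php`, not through a web server.

// 1. Listen for incoming peer connections.
$server = stream_socket_server('tcp://0.0.0.0:9000', $errno, $errstr);
if ($server === false) {
    die("listen failed: $errstr ($errno)\n");
}

// 2. Hypothetical list of already-discovered peers.
$peers = ['203.0.113.10:9000', '203.0.113.11:9000'];
$connections = [];
foreach ($peers as $peer) {
    $c = @stream_socket_client("tcp://$peer", $errno, $errstr, 5);
    if ($c !== false) {
        $connections[] = $c;
    }
}

// 3. Accept one incoming connection and echo a line back; a real client
//    would loop over all open sockets instead.
$incoming = stream_socket_accept($server, -1);
if ($incoming !== false) {
    $line = fgets($incoming);
    fwrite($incoming, 'got: ' . $line);
    fclose($incoming);
}
```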
No, not really. PHP scripts are meant to run only for a very small amount of time. Usually the default maximum runtime is two minutes, which will normally not be enough for P2P communication; after that, the script is cancelled, though the server administrator can deactivate the limit. But even then, the HTTP connection between the server and the client must be held open for the whole download time, and during that time the client's browser will show its page-loading indicator. If the connection breaks, most web servers will kill the PHP script, so the P2P download is cancelled.
So it may be possible to implement the P2P protocol, but in a client/server scenario you run into problems with the execution model of PHP scripts.
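For completeness, the runtime limit at least can be lifted from inside the script; whether the web server and host configuration actually let the process live that long is another matter. A tiny sketch:

```php
<?php
// Lift PHP's execution time limit and keep running even if the browser
// disconnects. The web server (or an admin) may still kill the process.
set_time_limit(0);        // 0 means no limit
ignore_user_abort(true);  // keep going after the client drops the connection

while (true) {
    // ... long-running transfer work would go here ...
    sleep(1);
}
```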
Both parties would need to be running a server such as Apache, although for demonstration purposes you could get away with just using the built-in PHP test server. Next, you are going to have to research firewall hole punching in PHP; I saw a script, I think on GitHub, but that was a long time ago. Yes, it can be done. If your client is not a savvy programmer type, you would probably need to ensure that they have PHP installed and running. On Windows, the PATH variable may not be picked up unless you add it to the system environment, so make sure you provide a .bat file that ensures the path is set so Windows can find PHP. Sorry, I am not a Linux user.
Next you have to develop the code. There are instructions for how hole punching works, and it does require a server on the public internet so that the two computers can find each other's IP addresses. Maybe you could rig something up on a free host such as www.000.webhost.com; alternatively you could use some kind of built-in mechanism, such as using the person's email address, to report the current IP.
The biggest problem is routers and firewalls: packets, even if they are directed at a public IP, still need to reach the right destination on the LAN, but the information on how to construct the packet should be straightforward. With any luck you might find a script that has done most of the work for you.