CGI is a Common Gateway Interface. As the name says, it is a "common" gateway interface for everything. It is so trivial and naive from the name. I feel that I understood this and I felt this every time I encountered this word. But frankly, I didn't. I'm still confused.
I am a PHP programmer with web development experience.
user (client) request for page ---> webserver(->embedded PHP
interpreter) ----> Server side(PHP) Script ---> MySQL Server.
Now say my PHP Script can fetch results from MySQL server & MATLAB server & some other server.
So, now PHP Script is the CGI? Because its interface for the between webserver & All other servers? I don't know. Sometimes they call CGI, a technology & other times they call CGI a program or some other server.
What exactly is CGI?
Whats the big deal with /cgi-bin/*.cgi? What's up with this? I don't know what is this cgi-bin directory on the server for. I don't know why they have *.cgi extensions.
Why does Perl always comes in the way. CGI & Perl (language). I also don't know what's up with these two. Almost all the time I keep hearing these two in combination "CGI & Perl". This book is another great example CGI Programming with Perl. Why not "CGI Programming with PHP/JSP/ASP"? I never saw such things.
CGI Programming in C, confuses me a lot. "in C"?? Seriously?? I don't know what to say. I'm just confused. "in C"?? This changes everything. Program needs to be compiled and executed. This entirely changes my view of web programming. When do I compile? How does the program gets executed (because it will be a machine code, so it must execute as a independent process). How does it communicate with the web server? IPC? and interfacing with all the servers (in my example MATLAB & MySQL) using socket programming? I'm lost!!
People say that CGI is deprecated and isn't in use anymore. Is that so? What is the latest update?
Once, I ran into a situation where I
had to give HTTP PUT request access to
web server (Apache HTTPD). Its a long
back. So, as far as I remember this is
what I did:
Edited the configuration file of Apache HTTPD to tell webserver to pass
all HTTP PUT requests to some
put.php ( I had to write this PHP
script)
Implement put.php to handle the request (save the file to the location
mentioned)
People said that I wrote a CGI Script.
Seriously, I didn't have a clue what
they were talking about.
Did I really write CGI Script?
I hope you understood what my confusion is. (Because I myself don't know where I'm confused). I request you guys to keep your answer as simple as possible. I really can't understand any fancy technical terminology. At least not in this case.
EDIT:
I found this amazing tutorial "CGI Programming Is Simple!" - CGI Tutorial, which explains the concepts in simplest possible way. After reading this article you may want to read Getting Started with CGI Programming in C to supplement your understanding with actual code samples. I've also added these links to this tutorial to Wikipedia's article: http://en.wikipedia.org/wiki/Common_Gateway_Interface
CGI is an interface which tells the webserver how to pass data to and from an application. More specifically, it describes how request information is passed in environment variables (such as request type, remote IP address), how the request body is passed in via standard input, and how the response is passed out via standard output. You can refer to the CGI specification for details.
To use your image:
user (client) request for page ---> webserver ---[CGI]----> Server side Program ---> MySQL Server.
Most if not all, webservers can be configured to execute a program as a 'CGI'. This means that the webserver, upon receiving a request, will forward the data to a specific program, setting some environment variables and marshalling the parameters via standard input and standard output so the program can know where and what to look for.
The main benefit is that you can run ANY executable code from the web, given that both the webserver and the program know how CGI works. That's why you could write web programs in C or Bash with a regular CGI-enabled webserver. That, and that most programming environments can easily use standard input, standard output and environment variables.
In your case you most likely used another, specific for PHP, means of communication between your scripts and the webserver, this, as you well mention in your question, is an embedded interpreter called mod_php.
So, answering your questions:
What exactly is CGI?
See above.
Whats the big deal with /cgi-bin/*.cgi? Whats up with this? I don't know what is this cgi-bin directory on the server for. I don't know why they have *.cgi extensions.
That's the traditional place for cgi programs, many webservers come with this directory pre configured to execute all binaries there as CGI programs. The .cgi extension denotes an executable that is expected to work through the CGI.
Why does Perl always comes in the way. CGI & Perl (language). I also don't know whats up with these two. Almost all the time I keep hearing these two in combination "CGI & Perl". This book is another great example CGI Programming with Perl Why not "CGI Programming with PHP/JSP/ASP". I never saw such things.
Because Perl is ancient (older than PHP, JSP and ASP which all came to being when CGI was already old, Perl existed when CGI was new) and became fairly famous for being a very good language to serve dynamic webpages via the CGI. Nowadays there are other alternatives to run Perl in a webserver, mainly mod_perl.
CGI Programming in C this confuses me a lot. in C?? Seriously?? I don't know what to say. I"m just confused. "in C"?? This changes everything. Program needs to be compiled and executed. This entirely changes my view of web programming. When do I compile? How does the program gets executed (because it will be a machine code, so it must execute as a independent process). How does it communicate with the web server? IPC? and interfacing with all the servers (in my example MATLAB & MySQL) using socket programming? I'm lost!!
You compile the executable once, the webserver executes the program and passes the data in the request to the program and outputs the received response. CGI specifies that one program instance will be launched per each request. This is why CGI is inefficient and kind of obsolete nowadays.
They say that CGI is deprecated. Its no more in use. Is it so? What is its latest update?
CGI is still used when performance is not paramount and a simple means of executing code is required. It is inefficient for the previously stated reasons and there are more modern means of executing any program in a web enviroment. Currently the most famous is FastCGI.
What exactly is CGI?
A means for a web server to get its data from a program (instead of, for instance, a file).
Whats the big deal with /cgi-bin/*.cgi?
No big deal. It is just a convention.
I don't know what is this cgi-bin directory on the server for.
I don't know why they have *.cgi extensions.
The server has to know what to do with the file (i.e. treat it as a program to execute instead of something to simply serve up). Having a .html extension tells it to use a text/html content type. Having a .cgi extension tells it to run it as a program.
Keeping executables in a separate directory gives some added protection against executing incorrect files and/or serving up CGI programs as raw data in case the server gets misconfigured.
Why does Perl always comes in the way.
It doesn't. Perl was just big and popular at the same time as CGI.
I haven't used Perl CGI for years. I was using mod_perl for a long time, and tend towards PSGI/Plack with FastCGI these days.
This book is another great example CGI Programming with Perl
Why not "CGI Programming with PHP/JSP/ASP".
CGI isn't very efficient. Better methods for talking to programs from webservers came along at around the same time as PHP. JSP and ASP are different methods for talking to programs.
CGI Programming in C this confuses me a lot. in C?? Seriously??
It is a programming language, why not?
When do I compile?
Write code
Compile
Access URL
Webserver runs program
How does the program gets executed (because it will be a machine code, so it must execute as a independent process).
It doesn't have to execute as an independent process (you can write Apache modules in C), but the whole concept of CGI is that it launches an external process.
How does it communicate with the web server? IPC?
STDIN/STDOUT and environment variables — as defined in the CGI specification.
and interfacing with all the servers (in my example MATLAB & MySQL) using socket
programming?
Using whatever methods you like and are supported.
They say that CGI is depreciated. Its no more in use. Is it so?
CGI is inefficient, slow and simple. It is rarely used, when it is used, it is because it is simple. If performance isn't a big deal, then simplicity is worth a lot.
What is its latest update?
1.1
CGI is an interface specification between a web server (HTTP server) and an executable program of some type that is to handle a particular request.
It describes how certain properties of that request should be communicated to the environment of that program and how the program should communicate the response back to the server and how the server should 'complete' the response to form a valid reply to the original HTTP request.
For a while CGI was an IETF Internet Draft and as such had an expiry date. It expired with no update so there was no CGI 'standard'. It is now an informational RFC, but as such documents common practice and isn't a standard itself. rfc3875.txt, rfc3875.html
Programs implementing a CGI interface can be written in any language runnable on the target machine. They must be able to access environment variables and usually standard input and they generate their output on standard output.
Compiled languages such as C were commonly used as were scripting languages such as perl, often using libraries to make accessing the CGI environment easier.
One of the big disadvantages of CGI is that a new program is spawned for each request so maintaining state between requests could be a major performance issue. The state might be handled in cookies or encoded in a URL, but if it gets to large it must be stored elsewhere and keyed from encoded url information or a cookie. Each CGI invocation would then have to reload the stored state from a store somewhere.
For this reason, and for a greatly simple interface to requests and sessions, better integrated environments between web servers and applications are much more popular. Environments like a modern php implementation with apache integrate the target language much better with web server and provide access to request and sessions objects that are needed to efficiently serve http requests. They offer a much easier and richer way to write 'programs' to handle HTTP requests.
Whether you wrote a CGI script rather depends on interpretation. It certainly did the job of one but it is much more usual to run php as a module where the interface between the script and the server isn't strictly a CGI interface.
The CGI is specified in RFC 3875, though that is a later "official" codification of the original NCSA document. Basically, CGI defines a protocol to pass data about a HTTP request from a webserver to a program to process - any program, in any language. At the time the spec was written (1993), most web servers contained only static pages, "web apps" were a rare and new thing, so it seemed natural to keep them apart from the "normal" static content, such as in a cgi-bin directory apart from the static content, and having them end in .cgi.
At this time, here also were no dedicated "web programming languages" like PHP, and C was the dominating portable programming language - so many people wrote their CGI scripts in C. But Perl quickly turned out to be a better fit for this kind of thing, and CGI became almost synonymous with Perl for a while. Then there came Java Servlets, PHP and a bunch of others and took over large parts of Perl's market share.
Have a look at CGI in Wikipedia. CGI is a protocol between the web server and a external program or a script that handles the input and generates output that is sent to the browser.
CGI is a simply a way for web server and a program to communicate, nothing more, nothing less. Here the server manages the network connection and HTTP protocol and the program handles input and generates output that is sent to the browser. CGI script can be basically any program that can be executed by the webserver and follows the CGI protocol. Thus a CGI program can be implemented, for example, in C. However that is extremely rare, since C is not very well suited for the task.
/cgi-bin/*.cgi is a simply a path where people commonly put their CGI script. Web server are commonly configured by default to fetch CGI scripts from that path.
a CGI script can be implemented also in PHP, but all PHP programs are not CGI scripts. If webserver has embedded PHP interpreter (e.g. mod_php in Apache), then the CGI phase is skipped by more efficient direct protocol between the web server and the interpreter.
Whether you have implemented a CGI script or not depends on how your script is being executed by the web server.
CGI essentially passes the request off to any interpreter that is configured with the web server - This could be Perl, Python, PHP, Ruby, C pretty much anything. Perl was the most common back in the day thats why you often see it in reference to CGI.
CGI is not dead. In fact most large hosting companies run PHP as CGI as opposed to mod_php because it offers user level config and some other things while it is slower than mod_php. Ruby and Python are also typically run as CGI. they key difference here is that a server module runs as part of the actual server software - where as with CGI its totally outside the server The server just uses the CGI module to determine how to pass and recieve data to the outside interpreter.
CGI is a mechanism whereby an external program is called by the web server in order to handle a request, with environment variables and standard input being used to feed the request data to the program. The exact language the external program is written in does not matter, although it is easier to write CGI programs in some languages versus others.
Since CGI scripts need execute permissions, httpd by default only allows CGI programs in the cgi-bin directory to be run for (possibly now misguided) security purposes.
Most PHP scripts run in the web server process via mod_php. This is not CGI.
CGI is slow since the program (and related interpreter) must be started up per request. Modern alternatives are embedded execution, used by mod_php, and long-running processes, used by FastCGI. A given language may have its own way of implementing those mechanisms, so be sure to ask around before resorting to CGI.
A real-life example: a complicated database that needs to be shown on a website. Since the database was designed somewhere around 1986 (!), lots of data was packed in different ways to save on disk space.
As the development went on, the developers could no longer solve complicated data requests in SQL alone, for example because the sorting algorythms were unusual.
There are three sensible solutions:
quick and dirty: send the unsored data to PHP, sort it there. Obviously a very expensive solution, because this would be repeated every time the page is called
write a plugin to the database engine -- but the admin wasn't ready to allow foreign code to run on their server, or
you can process the data in a program (C, Perl, etc.), and output HTML. The program itself goes into /cgi-bin, and is called by the web server (e.g. Apache) directly, not through PHP.
CGI runs your script in Solution #3 and outputs the effect to the browser. You have the speed of the compiled program, the flexibility of a language broader than SQL, and no need to write plugins to the SQL server. (Again, this is an example specific to SQL and C)
A CGI script is a console/shell program. In Windows, when you use a "Command Prompt" window, you execute console programs. When a web server executes a CGI script it provides input to the console/shell program using environment variables or "standard input". Standard input is like typing data into a console/shell program; in the case of a CGI script, the web server does the typing. The CGI script writes data out to "standard output" and that output is sent to the client (the web browser) as a HTML page. Standard output is like the output you see in a console/shell program except the web server reads it and sends it out.
A CGI script can be executed from a browser. The URI typically includes a query string that is provided to the CGI script. If the method is "get" then the query string is provided to the CGI Script in an environment variable called QUERY_STRING. If the method is "post" then the query string is provided to the CGI Script using standard input (the CGI Script reads the query string from standard input).
An early use of CGI scripts was to process forms. In the beginning of HTML, HTML forms typically had an "action" attribute and a button designated as the "submit" button. When the submit button is pushed the URI specified in the "action" attribute would be sent to the server with the data from the form sent as a query string. If the "action" specifies a CGI script then the CGI script would be executed and it then produces a HTML page.
RFC 3875 "The Common Gateway Interface (CGI)" partially defines CGI using C, as in saying that environment variables "are accessed by the C library routine getenv() or variable environ".
If you are developing a CGI script using C/C++ and use Microsoft Visual Studio to do that then you would develop a console program.
You maybe want to know what is not CGI, and the answer is a MODULE for your web server (if I suppose you are runnig Apache). AND THAT'S THE BIG DIFERENCE, because CGI needs and external program, thread, whatever to instantiate a PERL, PHP, C app server where when you run as a MODULE that program is the web server (apache) per-se.
Because of all this there is a lot of performance, security, portability issues that come into play. But it's good to know what is not CGI first, to understand what it is.
A CGI is a program (or a Web API) you write, and save it on the Web Server site. CGI is a file.
This file sits and waits on the Web Server. When the client browser sends a request to the Web Server to execute your CGI file, the Web Server runs your CGI file on the server site. The inputs for this CGI program, if any, are from the client browser. The outputs of this CGI program are sent to the browser.
What language you use to write a CGI program? Other posts already mention c,java, php, perl, etc.
The idea behind CGI is that a program/script (whether Perl or even C) receives input via STDIN (the request data) and outputs data via STDOUT (echo, printf statements).
The reason most PHP scripts don't qualify is that they are run under the PHP Apache module.
Related
I have been looking at the node.js application vs php, I found many comparisons comparing these two server technique. Mostly people suggest that the javascript V8 engine is much faster than php in terms of running speed of a single file calculation.
I worked on some javascript code for Node.js, now I have this idea I don't know if it is correct.
In my opinion, Node.js runs a javascript application and listen on a port. so this javascript application is an application that is running on server computer. Therefore, the application code is all copied in the memory of the computer. stuff like global variables are declared and saved at the beginning when node.js execute this program. So any new request come in, the server can use these variables very efficiently.
In php, however, the php program execute *.php file based on request. Therefore, if some request is for www.xx.com/index.php, then the program will execute index.php, and in which, there may be stuff like
require("globalVariables.php");
then, php.exe would go there and declare these variables again. same idea for functions and other objects...
So am I correct in thinking that php may not be a good idea when there are many other libraries that need to be included?
I have searched for the comparison, but nobody have talked about this.
Thanks
You are comparing different things. PHP depends on Apache or nginx (for example) to serve scripts, while Node.js is a complete server itself.
That's a big difference, cause when you load a php page, Apache will spawn a thread and run the script there. In Node all requests are served by the Node.js unique thread.
So, php and Node.js are different things, but regarding your concern: yes, you can mantain a global context in Node that will be loaded in memory all the time. On the other hand PHP loads, runs and exits all the time. But that's not the typical use case, Node.js web applications have templates, that have to be loaded and parsed, database calls, files... the real difference is the way Node.js handles heavy tasks: a single thread for javascript, an event queue, and external threads for filesystem, network and all that slow stuff. Traditional servers spawn threads for each connection, that's a very different approach.
I'm new to PHP and will like to develop a mobile app that interacts with the server (By putting and pulling datas from the server). Initially I was using Java, but finacial issues I decided to use PHP because getting domain that uses java is expensive.
My question is that does PHP controls multitasking ? reason been that since I will have thousands of users connected to my server probably the same. I llok forward for your answers Thanks
How should PHP have control over multitasking?
PHP interprets a PHP-Script to one point in time when a http-Request occurs on the Script.
PHP does not do multi-threading. It's a single-process-execution kind of scripting language.
However, when set up as a server-side language, it's usually paired up with a HTTP server like Apache, IIS or Nginx, who manage several child processes to handle multiple requests. - If you set it up like a normal server-side language, on top of one of those HTTP servers, you will have no problems handling a lot of parallel traffic.
so since my webpage makes very complex calculations its VERY important to have it generated with a compiled code, but since im doing it for the web I need a few commands like the one it comes in PHP like $_SERVER (to get for example the IP of the user), $_GET, $_POST .
if theres already is one web server like this that pass these things for parameter for example it would be easier.
Thanks in advance.
You have two basic options:
Use CGI, which is a well supported system for communicating between web servers and scripts/executables.
Write a module
CGI is simple and near universal, but requires a new process to be spawned for each request. There is also FastCGI which is a bit more complicated but lets processes be reused.
Writing a module is significantly more complicated, but provides better performance.
Perhaps look at http://www.boutell.com/cgic/?
You can either compile you program as a CGI, or bounce your requests through a PHP script and pass whichever values you need in as command line parameters:
<?php
passthru("/path/to/my/binary {$_SERVER['HTTP_HOST']} {$_GET['aparameter']} {$_POST['aparameter']}");
?>
If you want to go down the CGI route, start here... ;-)
below is a quotation from http://www.php.net/manual/en/intro.pcntl.php
Process Control should not be enabled within a web server environment and unexpected results
may happen if any Process Control functions are used within a web server environment.
what are the side-effects of enabling it on my web server? what are the threatens and security concerns in it?
Thanks a lot for your help
There's a big difference between just enabling the extension and using the functions. Just enabling the extension should have no side effects whatsoever.
On the other hand, the functions made available can allow for some mischief. Forks can be abused, signals can be sent to other processes, telling them to perform actions that you otherwise might not want, and priorities of processes with the same owner as the web server daemon can be modified.
In other words, it's not something you'd want to enable unless you control all of the PHP running on that machine, like in a shared hosting environment.
If you enable this, an untrusted PHP code author could fork-bomb your server, which is harder to protect against than you might think.
An untrusted PHP code author could kill or suspend the webserver, or any processes that run as the same user as the webserver. (If the webserver runs untrusted PHP code as root, then it can stop or suspend all processes on the server.) Or, if you're using FastCGI or similar tools, it could kill or suspend any other tasks run as the same user.
An untrusted PHP code author could call the wait(2) family of functions, which will desperately confuse the server or FastCGI interface. It might hang it, it might cause it to crash, depends on the server.
Of course, the PHP process controls flag is really just advisory -- bugs in the PHP interpreter will allow a malicious code author all these things and more. This setting is simply there to keep honest programmers honest.
Any code you run in mod_php (or similar technologies for other servers) will have complete access to everything the web server can do.
Any code you run in FastCGI (or similar technologies) will have complete access to everything that the FastCGI system can do, based on the operating system's access controls.
If you really want to confine what untrusted PHP code can do, I suggest looking into different mandatory access control mechanisms, such as AppArmor, TOMOYO, SELinux, or SMACK.
Python, Perl and PHP, all support TCP stream sockets. But exactly how do I use sockets in a script file that is run by a webserver (eg Apache), assuming I only have FTP access and not root access to the machine?
When a client connects to a specific port, how does the script file get invoked?
Does the script stay "running" for the duration of the connection? (could be hours)
So will multiple "instances" of the script be running simultaneously?
Then how can method calls be made from one instance of the script to another?
Scripting languages utilize sockets exactly the same way as compiled languages.
1) The script typically opens and uses the socket. It's not "run" or "invoked" by the socket, but directly controls it via libraries (typically calling into the native C API for the OS).
2) Yes.
3) Not necessarily. Most modern scripting langauges can handle multiple sockets in one "script" application.
4) N/A, see 3)
Edit in response to change in question and comments:
This is now obvious that you are trying to run this in the context of a hosted server. Typically, if you're using scripting within Apache or a similar server, things work a bit differently. A socket is opened up and maintained by Apache, and it executes your script, passing the relevant data (POST/GET results, etc.) to your script to process. Sockets usually don't come into play when you're dealing with scripting for CGI, etc.
However, this typically happens using the same concepts as mod_cgi. This pretty much means that the script running is nothing but an executable as far as the server is concerned, and the executable's output is what gets returned to the client. In this case, (provided you have permissions and the correct libraries on the server), your python script can actually launch a separate script that does its own socket work completely outside of Apache's context.
It's (usually) not a good idea to run a full socket implementation directly inside of the CGI script, however. CGI will expect the executable to run to completion before it returns results to the client. Apache will sit there and "hang" a bit waiting for this to complete. If you're launching a full server (especially if it's a long running process, which they tend to be), Apache will think the script is locked, and probably abort, potentially killing the process (configuration specific, but most hosting companies do this to prevent scripts from taking over CPU on a shared system).
However, if you execute a new script from within your script, and then return (shutting down the CGI executable), the other script can be left running, working as a server. This would be something like (python example, using the subprocess library):
newProccess = Popen("python MyScript", shell=True)
Note that all of the above really depends a bit on server configuration, though. Many hosting companies don't include some of the socket or shell libraries in their scripting implementations specifically to prevent this, so you often have to revert to making the executable in C. In addition, this is often against terms of service for most hosting companies - you'd have to check yours.
As a prior answer notes, scripting languages have operate in this regard in exactly the same way as compiled programs. Where they differ (potentially) is in the API that they use. The operating system (Windows or Unix-based) offers an API (e.g., BSD sockets) that compiled programs will call directly (typically). Interpreted languages like PHP or Python may offer a different API such as Python's socket API which may simplify some parts of the underlying API.
Given any of these APIs, there are many ways in which the actual handling of an incoming TCP connection can be structured. A great and detailed overview of such approaches is available on the c10k webpage: http://www.kegel.com/c10k.html -- in particular, the section on IO strategies. In short, the choice of answers to your question is up to the programmer and may affect how the resulting program performs under load.
To focus on your specific questions:
Many server programs are started before the connection and are running to listen for incoming connections. A special case is inetd which is a superserver: it listens for connections and then hands off those connections to programs that it starts (specified in a config file).
Typically, yes, the script remains running for the duration of the connection. However, depending on the larger system architecture, the script could conceivably pass the connection off to another program for handling and then exit.
This is a choice, again as enumerated on the c10k page.
This is another choice; operating systems offer a variety of Interprocess Communication (IPC) mechanisms to programs.
The only way I can make sense of what you're asking is if you use inetd or a similar meta-server, which is configured to invoke your "service a single client" program for a specific listening port, forwarding your "single client servicer" program's stdin/stdout to the remote client.
If that's the case:
1) inetd runs it
2) yes
3) yes
4) named pipes are one possibility
When a client connects to a specific
port, how does the script file get
invoked?
The script should be already invoked in order to receive any connects from any client. You will need script to be hanging on there forever (infinie loop) and setup Apache not to kill it on timeout. Basically, PHP is not a good choice for writting server applications. Why do you need this?