In this post, I will look at the Git smart HTTP protocol and will demonstrate how to create a web application that can host and serve Git repositories to users. Demonstrations will be shown using Node.js.
GitHub is a marvelous website and I use it almost every day. It’s great that we have GitHub to complement Git, which has quickly become one of my favorite development tools, and hands down my favorite version control system for project management. However, as good as both are, they both have issues that affect my use of them at work.
First, not everything that I write is open source and can be hosted on GitHub. For my professional work, I need to ensure that my clients have access to the source code that I am writing for them. I also need to ensure that their intellectual property is stored confidentially. Sure, I could always pay to host their source code in a private account on GitHub, but sometimes customers just do not want to go there, especially if they do not understand the technology.
Second, there are programs that I write for internal use within Neudesic, or projects that I would like to work on internally. I wish that I could have a GitHub-like website hosted someplace on the company intranet that works like GitHub, but there are not a lot of options out there. There is the GitHub option, but it’s rather pricey for the enterprise edition of it and getting the commitment to spend that much money annually has not happened yet.
Third, I need a true intranet application. This means that users need to be authenticated against my company’s domain server. It also means that I need Git to be hosted and running on a Windows server. Since we are a Microsoft partner, this is pretty critical. So far, Git does not have a great reputation running on other web servers besides Apache, which I would prefer not to run if I could help it.
What I really want is something similar to GitHub that I can use to facilitate collaboration between myself and my peers, and that I can place on my company’s intranet. Since I don’t see anything out there that I can readily use, I am going to look into my own programming skills to figure out if I can build it myself.
I started this project a couple of months ago when I first posted my GitHub Pages website and I did have limited success. I was able to write a simple Git server using Node.js and it worked great on my Mac and on an Ubuntu Linux VM, but the server failed miserably under Windows. Fortunately, persistence has paid off an I was finally able to get my Git server to work in a Windows environment inside of IIS. In this post, I will tell you what I did to make that work.
Understanding the Git HTTP protocol
My overall goal with this project is that I want to start “dogfooding” my hosted Git solution as quickly as possible. I want to publish this solution on my company’s intranet so that others can start helping me to build new features for the site and it can become better. I identified five operations that are critical initially towards facilitating collaboration:
- Push the initial change set from my local repository to the project repository.
- Clone the project repository.
- Push change sets from my local repository to the project repository.
- Fetch change sets from the project repository.
- Pull change sets from the project repository.
I listed #1 and #3 separately because I’m not sure if the protocol is different for a new repository versus an initialized repository. For now, I will be creating the hosted project repository manually. Also, I will not be enforcing authentication initially. I will add those features into the service later.
Now it is time to jump in and start understanding the communication that happens between a Git client and a Git HTTP server. Fortunately, Git is open source and I can utilize the source code for the git-http-backend command. I decided that I am going to use that code as a reference, but not necessarily port from it. I want to understand what it does inside of each request, but my solution may not work exactly the same.
As it turns out, Git supports a couple of options that can be specified via
environment variables to help us out with debugging. First, we can turn on
tracing for network calls that Git makes to an HTTP server. With this enabled,
Git will output the HTTP requests and headers for each call that it makes,
and will also output the HTTP status code and response headers for each
response. This will be useful as we look at each scenario. To turn on this
tracing, we just have to set the
GIT_CURL_VERBOSE environment variable:
The next thing that is worthwhile to look at is using Fiddler to observe and debug the HTTP calls. We can set a second environment variable to force Git to send its HTTP requests through Fiddler’s HTTP proxy:
With those set, we can start observing the HTTP calls that the Git client will send to the Git HTTP server, and we can start to implement the Git HTTP server.
Setting Up the Development Environment
As I mentioned earlier, this solution has to work on a Microsoft Windows server. Specifically, this solution needs to be hostable inside of IIS and not require running the Apache web server. Now, I could always run the Apache web server, but the point is that I do not want to.
To implement this project, I am going to use Node.js as the implementation language (in another post, I may take a look at using ASP.NET to implement the same behavior). My Node.js application will be hosted in IIS using iisnode.
The Node.js application will be based on Express. I am doing this because there will eventually be a web user interface for the site and I want the Git HTTP server to play into it. Also, Express has routing behaviors which will become useful when I implement the Git HTTP protocol.
My project directory structure looks like this:
C:\Projects\GitHttpServer\MAIN <-- root directory for project workspace .git\ <-- local Git repository build\ <-- generated by Rakefile; build outputs go here temp\ <-- intermediate files generated by Rakefile web\ <-- IIS website hosted here; built by Rakefile node_modules\ <-- Express and Jade go here src\ <-- source code to be compiled goes here web\ <-- Express/CoffeeScript files go here public\ <-- Static content will go here routes\ <-- Node.js code for HTTP request handling views\ <-- Jade views will go here iisnode.yml <-- YAML configuration for iisnode gitserver.coffee <-- Main program for the Git HTTP server web.config <-- IIS configuration .gitignore <-- Files/directories for Git to ignore package.json <-- Node.js package definition Rakefile <-- Ruby Rake script used to build site
The web.config file is important because it links my Node.js application to IIS and allows IIS to send requests to iisnode. The web.config file also contains URL rewriting rules that will map incoming requests either to the static content in the public directory, or to the dynamic content implemented by my server. The web.config file looks like this:
The iisnode.yml file is a new feature that was recently added to iisnode. Using iisnode.yml, we can set configuration options for iisnode without having to include those settings in the web.config file. I am initially going to set two settings for iisnode. The first setting, node_env will let the Express framework know that it is being used in a development environment. When I go into product, I will change the value. The second setting, promoteServerVars will allow my Node.js application to obtain the value of the AUTH_USER HTTP server variable so that I can get the user name of the authenticated user. When I later turn on Windows authentication for my web application, the AUTH_USER variable will contain the domain name of the user accessing the website.
With the basic web application set up, I can start looking at the Git HTTP protocol and start to implement the server side of the protocol.
Pushing the Initial Change Set to the Project Repository
Using GitHub as a model, starting a new hosted project involves the following operations:
- I go to GitHub and create the hosted repository for my project.
- I create the local repository for my project.
- I create the initial change set. This typically involves creating a READAME file and committing the file to my repository.
- I add my GitHub project repository as a remote repository to my local repository and assign the remote repository the alias origin.
- I push the initial change set to the remote repository and create the master branch.
I handled step 1 manually for now, as I am not focusing on the project creation story yet. I created a directory on my hard drive at C:\GitProjects\GitServer and initialized the repository as a bare repository:
C:\GitProjects\GitServer> git init --bare
I performed the second step when I created my development environment in the previous section. For the third step, I committed the basic application template that I created above after testing to make sure that the website’s home page was served from IIS using iisnode and appeared in my web browser.
For the fourth step, I need to create the link between my local repository and the remote repository. In order to do that, I need a URL structure that I can use and specify for my project. For now, I am going to use the following URL format to specify the Git repository for a hosted project:
In the case of this project, my Git repository URL will look like this:
I added the remote repository to my local repository:
git remote add origin http://localhost:4000/michael.collins/gitserver/git
My new remote repository is now being aliased as origin. Now it’s time to
try to push the initial change set from my local repository to my new remote
repository. I know in advance that this is going to fail, because I have not
implemented the server side of the operation, but because I set the
GIT_CURL_VERBOSE environment variable, Git will show me the HTTP requests
that it is sending to my server so that I can start to implement a handler for
Executing the push command
git push -u origin master shows me that Git is
first trying to send the following request to my server:
This request will return a 404 since that URL is not handled by the web
application, so now I can look to implement it. Looking at the source code for
git-http-backend command, I can see that the Git CGI handler outputs some
header bytes and then executes the the
git-receive-pack command. The output
git-receive-pack command is appended to the HTTP response and returned
to the Git client. The full command line that is executed looks like this:
git-receive-pack --stateless-rpc --advertise-refs (projectroot)
When I tried this project earlier, this was where my first failure occurred.
On Windows, the
git-receive-pack command is a shortcut that executes the
git receive-pack command. This is the behavior that we really want. Node.js
seems incapable of executing the
git-receive-pack command using the
API, so we need another way of executing the command. What I did was to create
a new file in my web server directory named git-receive-pack.cmd that I
could execute instead:
@git receive-pack %*
My implementation for this handler looks like this:
I can add this handler to my Git server application by assigning a route:
For now, you will notice that I am hard-coding the paths to the Git repository and to the processes that I am spawning. I am also ignoring the user name and project name parameters in the URL. That will get fixed later. Right now, I just want to see the solution work.
The getInfoRefs handler basically outputs a set of HTTP headers to disable response caching and set the content type of the response. Next, there are a set of header bytes that need to be output in a special format. The first bytes returned are the length of the record specified as a 4 character hexadecimal string. I create the header record, get it’s length, build the hexadecimal string, and then write the record to the response.
After the header record, the rest of the output comes from executing the
git receive-pack command. I spawn the batch script that I created earlier
and pass the parameters to the script. The script executes Git and passes
the parameters to the
receive-pack command. I then redirect the standard
output stream of the Git child process to the HTTP response stream. I also
listen for any errors being written to the standard error stream and log those
errors. Finally, when the Git process finishes executing, I close the HTTP
Obviously, this is a bare bones implementation that needs more work, but when I ran it, I found that this operation succeeded, so I am able to move to the next step in the process. The next step is Git executes an HTTP POST to send the change set to the remote repository:
The payload of the request is a blob containing the change sets to be pushed
to the hosted repository. Again, this will execute the
command, so I have to take the body of the HTTP request and send it via the
standard input stream to Git. The handler for this request looks like this:
This handler gets added to the Git server application:
I then find that executing the push command again now succeeds:
git push -u origin master
I should be able to go to my hosted repository and execute the
command, and my change set should now appear in the repository, which it
In terms of my objectives, my first objective was to push the initial change
set to the hosted repository, which I just accomplished. My third objective is
to push later change sets to the remote repository. Since I now have changes
because I implemented the
git push behavior for the initial commit, I can
commit my changes to my local repository and execute
git push origin master
again. What I saw was that the Git client successfully pushed my additional
changes to the hosted repository. Checking the hosted repository verifies that
my changes are now in the remote repository. So two objectives are completed.
Cloning the Hosted Git Repository
Fresh on the success of pushing changes from my local repository to my hosted repository, the next thing that I want to do is to be able to clone the hosted repository. It is not so much an issue for me, since I do have a local repository, but long-term I do not want to be the only person working on this project. I am a busy person. I want help and collaborators and they will need to be able to clone my repository.
Following the workflow from the previous section, I am going to start off and
git clone command to see what happens:
git clone http://localhost:8000/michael.collins/gitserver/git
Executing this command shows me that Git is first trying to send the following request to my web application:
While I did implement support for the
/info/refs path in the previous
section, the command was for
git-upload-pack is another
command that I have to implement.
Here’s the interesting thing about
git-upload-pack. While the
git-receive-pack command on Windows is a shortcut for the
git-upload-pack is actually an executable on Windows. It can also
be executing using the
git upload-pack command. To make life simple, I am
going to create a second batch script that aliases the
@git upload-pack %*
Next, I am going to change my implementation of
getInfoRefs to support
executing the service specified in the
service query parameter:
Updating my web application and running the
git clone command should show me
that this operation now works. The next HTTP request that is sent to the web
application by the
git clone command is an HTTP POST that looks like this:
Why this is a POST on the
clone is unknown to me. It is also not relevant
because I do not need to process the POST data. I just need to forward the body
to Git. The implementation here is very similar to the POST operation that I
implemented in the previous section, so I am just going to copy the code and
modify it to run the correct batch script:
I added this operation to my web application:
Again, updating the web application and running the
git clone command should
show me that this operation completes. In fact, what I found was that
implementing this operation resulted in the clone operation succeeding and
the cloned repository was now in my workspace. A third objective is now
Fetching and Pulling Changes from the Hosted Repository
Three of five objectives complete. I can clone a repository, make changes to
it, and push my changes back to the hosted repository. The last piece of the
puzzle is that other people that I am collaborating with need to be able to
pull my changes. Therefore, I need support for
git fetch and
In order to test a fetch, I committed the changes that I made in the last
section and pushed the changes to the host repository. I next went to the
cloned repository and executed a
git fetch origin command. What I found was
that the operation completed successfully. It pulled down the changes into
my new local repository. Executing
git merge origin/master merged the changes
from the origin/master branch into my local master branch. A fourth
objective complete and I did not have to do anything.
My next thought is that maybe
git pull will work the same way, since I’m
guessing that it uses the same logic as
git fetch and
git merge. To test
it out, I created edited the README file in my project repository and pushed
the change to the hosted repository. I next switched over to my other local
repository and executed the
git pull origin command to pull the changes
from the origin/master branch and merge them into my local master branch.
Again, the operation completed successfully and my local repository was updated
with the changes from the other repository.
Five out of five objectives completed successfully.
Conclusion and Next Steps
In this post, I created a Node.js application that allowed me to expose hosted Git repositories using IIS and running on a Windows server. The web application is not necessarily complete as there are still features like error handling and user authentication and authorization that have to be implemented, but the basic operations needed to clone, push, pull, and fetch are now supported.
In the next post on this subject, I will look at what I can do in order to make the solution stronger and usable in an intranet scenario.
I also know that some of the people that I work with aren’t so keen yet on using Node.js over ASP.NET MVC, so I may take a look at creating the same functionality inside of an ASP.NET MVC application.