Implementing a Git HTTP Server
Introduction
In this post, I will look at the Git smart HTTP protocol and will demonstrate how to create a web application that can host and serve Git repositories to users. Demonstrations will be shown using Node.js.
GitHub is a marvelous website and I use it almost every day. It’s great that we have GitHub to complement Git, which has quickly become one of my favorite development tools, and hands down my favorite version control system for project management. However, as good as both are, they both have issues that affect my use of them at work.
First, not everything that I write is open source and can be hosted on GitHub. For my professional work, I need to ensure that my clients have access to the source code that I am writing for them. I also need to ensure that their intellectual property is stored confidentially. Sure, I could always pay to host their source code in a private account on GitHub, but sometimes customers just do not want to go there, especially if they do not understand the technology.
Second, there are programs that I write for internal use within Neudesic, or projects that I would like to work on internally. I wish that I could have a GitHub-like website hosted someplace on the company intranet that works like GitHub, but there are not a lot of options out there. There is the GitHub option, but it’s rather pricey for the enterprise edition of it and getting the commitment to spend that much money annually has not happened yet.
Third, I need a true intranet application. This means that users need to be authenticated against my company’s domain server. It also means that I need Git to be hosted and running on a Windows server. Since we are a Microsoft partner, this is pretty critical. So far, Git does not have a great reputation running on other web servers besides Apache, which I would prefer not to run if I could help it.
What I really want is something similar to GitHub that I can use to facilitate collaboration between myself and my peers, and that I can place on my company’s intranet. Since I don’t see anything out there that I can readily use, I am going to look into my own programming skills to figure out if I can build it myself.
I started this project a couple of months ago when I first posted my GitHub Pages website and I did have limited success. I was able to write a simple Git server using Node.js and it worked great on my Mac and on an Ubuntu Linux VM, but the server failed miserably under Windows. Fortunately, persistence has paid off an I was finally able to get my Git server to work in a Windows environment inside of IIS. In this post, I will tell you what I did to make that work.
Understanding the Git HTTP protocol
My overall goal with this project is that I want to start “dogfooding” my hosted Git solution as quickly as possible. I want to publish this solution on my company’s intranet so that others can start helping me to build new features for the site and it can become better. I identified five operations that are critical initially towards facilitating collaboration:
- Push the initial change set from my local repository to the project repository.
- Clone the project repository.
- Push change sets from my local repository to the project repository.
- Fetch change sets from the project repository.
- Pull change sets from the project repository.
I listed #1 and #3 separately because I’m not sure if the protocol is different for a new repository versus an initialized repository. For now, I will be creating the hosted project repository manually. Also, I will not be enforcing authentication initially. I will add those features into the service later.
Now it is time to jump in and start understanding the communication that happens between a Git client and a Git HTTP server. Fortunately, Git is open source and I can utilize the source code for the git-http-backend command. I decided that I am going to use that code as a reference, but not necessarily port from it. I want to understand what it does inside of each request, but my solution may not work exactly the same.
As it turns out, Git supports a couple of options that can be specified via
environment variables to help us out with debugging. First, we can turn on
tracing for network calls that Git makes to an HTTP server. With this enabled,
Git will output the HTTP requests and headers for each call that it makes,
and will also output the HTTP status code and response headers for each
response. This will be useful as we look at each scenario. To turn on this
tracing, we just have to set the GIT_CURL_VERBOSE
environment variable:
SET GIT_CURL_VERBOSE=1
The next thing that is worthwhile to look at is using Fiddler to observe and debug the HTTP calls. We can set a second environment variable to force Git to send its HTTP requests through Fiddler’s HTTP proxy:
SET HTTP_PROXY=http://localhost:8888
With those set, we can start observing the HTTP calls that the Git client will send to the Git HTTP server, and we can start to implement the Git HTTP server.
Setting Up the Development Environment
As I mentioned earlier, this solution has to work on a Microsoft Windows server. Specifically, this solution needs to be hostable inside of IIS and not require running the Apache web server. Now, I could always run the Apache web server, but the point is that I do not want to.
To implement this project, I am going to use Node.js as the implementation language (in another post, I may take a look at using ASP.NET to implement the same behavior). My Node.js application will be hosted in IIS using iisnode.
For the implementation, I will be using CoffeeScript instead of JavaScript. It’s a personal choice, I prefer CoffeeScript. I will use Ruby Rake to automate the process of compiling the CoffeeScript into JavaScript for execution.
The Node.js application will be based on Express. I am doing this because there will eventually be a web user interface for the site and I want the Git HTTP server to play into it. Also, Express has routing behaviors which will become useful when I implement the Git HTTP protocol.
My project directory structure looks like this:
C:\Projects\GitHttpServer\MAIN <-- root directory for project workspace
.git\ <-- local Git repository
build\ <-- generated by Rakefile; build outputs go here
temp\ <-- intermediate files generated by Rakefile
web\ <-- IIS website hosted here; built by Rakefile
node_modules\ <-- Express and Jade go here
src\ <-- source code to be compiled goes here
web\ <-- Express/CoffeeScript files go here
public\ <-- Static content will go here
routes\ <-- Node.js code for HTTP request handling
views\ <-- Jade views will go here
iisnode.yml <-- YAML configuration for iisnode
gitserver.coffee <-- Main program for the Git HTTP server
web.config <-- IIS configuration
.gitignore <-- Files/directories for Git to ignore
package.json <-- Node.js package definition
Rakefile <-- Ruby Rake script used to build site
The web.config file is important because it links my Node.js application to IIS and allows IIS to send requests to iisnode. The web.config file also contains URL rewriting rules that will map incoming requests either to the static content in the public directory, or to the dynamic content implemented by my server. The web.config file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<system.webServer>
<handlers>
<add name="iisnode" path="gitserver.js" verb="*" modules="iisnode"/>
</handlers>
<rewrite>
<rules>
<rule name="LogFile" patternSyntax="ECMAScript" stopProcessing="true">
<match url="^[a-zA-Z0-9_\-]+\.js\.logs\/\d+.txt$"/>
</rule>
<rule name="NodeInspector" patternSyntax="ECMAScript" stopProcessing="true">
<match url="^gitserver.js\/debug[\/]?"/>
</rule>
<rule name="StaticContent">
<action type="Rewrite" url="public"/>
</rule>
<rule name="DynamicContent">
<conditions>
<add input="" matchType="IsFile" negate="True"/>
</conditions>
<action type="Rewrite" url="gitserver.js"/>
</rule>
</rules>
</rewrite>
</system.webServer>
</configuration>
The iisnode.yml file is a new feature that was recently added to iisnode. Using iisnode.yml, we can set configuration options for iisnode without having to include those settings in the web.config file. I am initially going to set two settings for iisnode. The first setting, node_env will let the Express framework know that it is being used in a development environment. When I go into product, I will change the value. The second setting, promoteServerVars will allow my Node.js application to obtain the value of the AUTH_USER HTTP server variable so that I can get the user name of the authenticated user. When I later turn on Windows authentication for my web application, the AUTH_USER variable will contain the domain name of the user accessing the website.
node_env: development
promoteServerVars: AUTH_USER
With my directory structure established, I next created a website in IIS and pointed the root of the site to the build\web subdirectory where my build script will write the compiled JavaScript files and web content. My development website is hosted at http://localhost:8000. Now, I just need to create something to listen for incoming requests. I created the gitserver.coffee file with a basic Express web application template:
###
Copyright 2012 Michael F. Collins, III
###
###############################################################################
#
# gitserver.coffee
#
# This program implements the Git HTTP Server application. This program will
# host an HTTP server that will process and dispatch incoming HTTP requests to
# the correct handler.
#
# Copyright 2012 Michael F. Collins, III
#
###############################################################################
express = require 'express'
routes = require './routes'
app = module.exports = express.createServer()
app.configure ->
app.set 'views', __dirname + '/views'
app.set 'view engine', 'jade'
app.use express.bodyParser()
app.use express.methodOverride()
app.use express.query()
app.use app.router
app.use express.static, __dirname + '/public'
app.configure 'development', ->
app.use express.errorHandler
dumpExceptions: true
showStack: true
app.configure 'production', ->
app.use express.errorHandler()
app.get '/', routes.index
app.listen process.env.PORT, ->
console.log "Git HTTP Server listening on port #{process.env.PORT} in #{app.settings.env} mode"
With the basic web application set up, I can start looking at the Git HTTP protocol and start to implement the server side of the protocol.
Pushing the Initial Change Set to the Project Repository
Using GitHub as a model, starting a new hosted project involves the following operations:
- I go to GitHub and create the hosted repository for my project.
- I create the local repository for my project.
- I create the initial change set. This typically involves creating a READAME file and committing the file to my repository.
- I add my GitHub project repository as a remote repository to my local repository and assign the remote repository the alias origin.
- I push the initial change set to the remote repository and create the master branch.
I handled step 1 manually for now, as I am not focusing on the project creation story yet. I created a directory on my hard drive at C:\GitProjects\GitServer and initialized the repository as a bare repository:
C:\GitProjects\GitServer> git init --bare
I performed the second step when I created my development environment in the previous section. For the third step, I committed the basic application template that I created above after testing to make sure that the website’s home page was served from IIS using iisnode and appeared in my web browser.
For the fourth step, I need to create the link between my local repository and the remote repository. In order to do that, I need a URL structure that I can use and specify for my project. For now, I am going to use the following URL format to specify the Git repository for a hosted project:
http://<webserver>/<username>/<projectname>/git
In the case of this project, my Git repository URL will look like this:
http://localhost:8000/michael.collins/gitserver/git
I added the remote repository to my local repository:
git remote add origin http://localhost:4000/michael.collins/gitserver/git
My new remote repository is now being aliased as origin. Now it’s time to
try to push the initial change set from my local repository to my new remote
repository. I know in advance that this is going to fail, because I have not
implemented the server side of the operation, but because I set the
GIT_CURL_VERBOSE
environment variable, Git will show me the HTTP requests
that it is sending to my server so that I can start to implement a handler for
each request.
Executing the push command git push -u origin master
shows me that Git is
first trying to send the following request to my server:
GET http://localhost:8000/michael.collins/gitserver/git/info/refs?service=git-receive-pack
This request will return a 404 since that URL is not handled by the web
application, so now I can look to implement it. Looking at the source code for
the git-http-backend
command, I can see that the Git CGI handler outputs some
header bytes and then executes the the git-receive-pack
command. The output
of the git-receive-pack
command is appended to the HTTP response and returned
to the Git client. The full command line that is executed looks like this:
git-receive-pack --stateless-rpc --advertise-refs (projectroot)
When I tried this project earlier, this was where my first failure occurred.
On Windows, the git-receive-pack
command is a shortcut that executes the
git receive-pack
command. This is the behavior that we really want. Node.js
seems incapable of executing the git-receive-pack
command using the spawn
API, so we need another way of executing the command. What I did was to create
a new file in my web server directory named git-receive-pack.cmd that I
could execute instead:
@git receive-pack %*
My implementation for this handler looks like this:
child_process = require 'child_process'
spawn = child_process.spawn
exports.getInfoRefs = (req, res) ->
res.setHeader 'Expires', 'Fri, 01 Jan 1980 00:00:00 GMT'
res.setHeader 'Pragma', 'no-cache'
res.setHeader 'Cache-Control', 'no-cache, max-age=0, must-revalidate'
res.setHeader 'Content-Type', 'application/x-git-receive-pack-advertisement'
packet = "# service=git-receive-pack\n"
length = packet.length + 4
hex = "0123456789abcdef"
prefix = hex.charAt (length >> 12) & 0xf
prefix = prefix + hex.charAt (length >> 8) & 0xf
prefix = prefix + hex.charAt (length >> 4) & 0xf
prefix = prefix + hex.charAt length & 0xf
res.write "#{prefix}#{packet}0000"
git = spawn 'C:/Projects/GitHttpServer/MAIN/build/web/git-receive-pack.cmd', ['--stateless-rpc', '--advertise-refs', 'C:/GitProjects/GitServer']
git.stdout.pipe res
git.stderr.on 'data', (data) ->
console.log "stderr: #{data}"
git.on 'exit', ->
res.end()
I can add this handler to my Git server application by assigning a route:
app.get '/:user/:project/git/info/refs', routes.getInfoRefs
For now, you will notice that I am hard-coding the paths to the Git repository and to the processes that I am spawning. I am also ignoring the user name and project name parameters in the URL. That will get fixed later. Right now, I just want to see the solution work.
The getInfoRefs handler basically outputs a set of HTTP headers to disable response caching and set the content type of the response. Next, there are a set of header bytes that need to be output in a special format. The first bytes returned are the length of the record specified as a 4 character hexadecimal string. I create the header record, get it’s length, build the hexadecimal string, and then write the record to the response.
After the header record, the rest of the output comes from executing the
git receive-pack
command. I spawn the batch script that I created earlier
and pass the parameters to the script. The script executes Git and passes
the parameters to the receive-pack
command. I then redirect the standard
output stream of the Git child process to the HTTP response stream. I also
listen for any errors being written to the standard error stream and log those
errors. Finally, when the Git process finishes executing, I close the HTTP
response stream.
Obviously, this is a bare bones implementation that needs more work, but when I ran it, I found that this operation succeeded, so I am able to move to the next step in the process. The next step is Git executes an HTTP POST to send the change set to the remote repository:
POST http://localhost:8000/michael.collins/gitserver/git/git-receive/pack
The payload of the request is a blob containing the change sets to be pushed
to the hosted repository. Again, this will execute the git receive-pack
command, so I have to take the body of the HTTP request and send it via the
standard input stream to Git. The handler for this request looks like this:
exports.postReceivePack = (req, res) ->
res.setHeader 'Expires', 'Fri, 01 Jan 1980 00:00:00 GMT'
res.setHeader 'Pragma', 'no-cache'
res.setHeader 'Cache-Control', 'no-cache, max-age=0, must-revalidate'
res.setHeader 'Content-Type', 'application/x-git-receive-pack-result'
git = spawn 'C:/Projects/GitHttpServer/MAIN/build/web/git-receive-pack.cmd', ['--stateless-rpc', 'C:/GitProjects/GitServer']
req.pipe git.stdin
git.stdout.pipe res
git.stderr.on 'data', (data) ->
console.log "stderr: #{data}"
git.on 'exit', ->
res.end()
This handler gets added to the Git server application:
app.post '/:user/:project/git/git-receive-pack', routes.postReceivePack
I then find that executing the push command again now succeeds:
git push -u origin master
I should be able to go to my hosted repository and execute the git log
command, and my change set should now appear in the repository, which it
does.
In terms of my objectives, my first objective was to push the initial change
set to the hosted repository, which I just accomplished. My third objective is
to push later change sets to the remote repository. Since I now have changes
because I implemented the git push
behavior for the initial commit, I can
commit my changes to my local repository and execute git push origin master
again. What I saw was that the Git client successfully pushed my additional
changes to the hosted repository. Checking the hosted repository verifies that
my changes are now in the remote repository. So two objectives are completed.
Cloning the Hosted Git Repository
Fresh on the success of pushing changes from my local repository to my hosted repository, the next thing that I want to do is to be able to clone the hosted repository. It is not so much an issue for me, since I do have a local repository, but long-term I do not want to be the only person working on this project. I am a busy person. I want help and collaborators and they will need to be able to clone my repository.
Following the workflow from the previous section, I am going to start off and
execute a git clone
command to see what happens:
git clone http://localhost:8000/michael.collins/gitserver/git
Executing this command shows me that Git is first trying to send the following request to my web application:
GET http://localhost:8000/michael.collins/gitserver/git/info/refs?service=git-upload-pack
While I did implement support for the /info/refs
path in the previous
section, the command was for git-receive-pack
. git-upload-pack
is another
command that I have to implement.
Here’s the interesting thing about git-upload-pack
. While the
git-receive-pack
command on Windows is a shortcut for the git receive-pack
command, git-upload-pack
is actually an executable on Windows. It can also
be executing using the git upload-pack
command. To make life simple, I am
going to create a second batch script that aliases the git-upload-pack
command:
@git upload-pack %*
Next, I am going to change my implementation of getInfoRefs
to support
executing the service specified in the service
query parameter:
exports.getInfoRefs = (req, res) ->
service = req.query.service
res.setHeader 'Expires', 'Fri, 01 Jan 1980 00:00:00 GMT'
res.setHeader 'Pragma', 'no-cache'
res.setHeader 'Cache-Control', 'no-cache, max-age=0, must-revalidate'
res.setHeader 'Content-Type', "application/x-#{service}-advertisement" # <-- change here
packet = "# service=#{service}\n" # <-- change here
length = packet.length + 4
hex = "0123456789abcdef"
prefix = hex.charAt (length >> 12) & 0xf
prefix = prefix + hex.charAt (length >> 8) & 0xf
prefix = prefix + hex.charAt (length >> 4) & 0xf
prefix = prefix + hex.charAt length & 0xf
res.write "#{prefix}#{packet}0000"
# change the batch script that is called
git = spawn "C:/Projects/GitHttpServer/MAIN/build/web/#{service}.cmd", ['--stateless-rpc', '--advertise-refs', 'C:/GitProjects/GitServer']
git.stdout.pipe res
git.stderr.on 'data', (data) ->
console.log "stderr: #{data}"
git.on 'exit', ->
res.end()
Updating my web application and running the git clone
command should show me
that this operation now works. The next HTTP request that is sent to the web
application by the git clone
command is an HTTP POST that looks like this:
POST http://localhost:8000/michael.collins/gitserver/git/git-upload-pack
Why this is a POST on the clone
is unknown to me. It is also not relevant
because I do not need to process the POST data. I just need to forward the body
to Git. The implementation here is very similar to the POST operation that I
implemented in the previous section, so I am just going to copy the code and
modify it to run the correct batch script:
exports.postUploadPack = (req, res) ->
res.setHeader 'Expires', 'Fri, 01 Jan 1980 00:00:00 GMT'
res.setHeader 'Pragma', 'no-cache'
res.setHeader 'Cache-Control', 'no-cache, max-age=0, must-revalidate'
res.setHeader 'Content-Type', 'application/x-git-upload-pack-result'
git spawn 'C:/Projects/GitHttpServer/MAIN/build/web/git-upload-pack.cmd', ['--stateless-rpc', 'C:/GitProjects/GitServer']
req.pipe git.stdin
git.stdout.pipe res
git.stderr.on 'data', (data) ->
console.log "stderr: #{data}"
git.on 'exit', ->
res.end()
I added this operation to my web application:
app.post '/:user/:project/git/git-upload-pack', routes.postUploadPack
Again, updating the web application and running the git clone
command should
show me that this operation completes. In fact, what I found was that
implementing this operation resulted in the clone operation succeeding and
the cloned repository was now in my workspace. A third objective is now
completed.
Fetching and Pulling Changes from the Hosted Repository
Three of five objectives complete. I can clone a repository, make changes to
it, and push my changes back to the hosted repository. The last piece of the
puzzle is that other people that I am collaborating with need to be able to
pull my changes. Therefore, I need support for git fetch
and git pull
.
In order to test a fetch, I committed the changes that I made in the last
section and pushed the changes to the host repository. I next went to the
cloned repository and executed a git fetch origin
command. What I found was
that the operation completed successfully. It pulled down the changes into
my new local repository. Executing git merge origin/master
merged the changes
from the origin/master branch into my local master branch. A fourth
objective complete and I did not have to do anything.
My next thought is that maybe git pull
will work the same way, since I’m
guessing that it uses the same logic as git fetch
and git merge
. To test
it out, I created edited the README file in my project repository and pushed
the change to the hosted repository. I next switched over to my other local
repository and executed the git pull origin
command to pull the changes
from the origin/master branch and merge them into my local master branch.
Again, the operation completed successfully and my local repository was updated
with the changes from the other repository.
Five out of five objectives completed successfully.
Conclusion and Next Steps
In this post, I created a Node.js application that allowed me to expose hosted Git repositories using IIS and running on a Windows server. The web application is not necessarily complete as there are still features like error handling and user authentication and authorization that have to be implemented, but the basic operations needed to clone, push, pull, and fetch are now supported.
In the next post on this subject, I will look at what I can do in order to make the solution stronger and usable in an intranet scenario.
I also know that some of the people that I work with aren’t so keen yet on using Node.js over ASP.NET MVC, so I may take a look at creating the same functionality inside of an ASP.NET MVC application.