janw.name Personal Blog and Portfolio of Jan Wolff

Hosting a static site from S3 via Caddy (6 August 2024)

I had set up MinIO as a way to self-host S3 buckets for an unrelated project. As a way to force myself to test that setup regularly (and to make my VPS setup more enterprise-y), I opted to host my site(s) from there. Previously I just had the files on an XFS filesystem like a caveman and had Caddy serve them.

Using Caddy is really nice with its automatic SSL, sane config files and whatnot, so I wanted to keep using it. Instead of simply serving static files via the file_server directive, it now reverse proxies public S3 buckets served by MinIO. This, at first, seemed pretty straightforward. A basic MinIO setup provides public buckets via a URL like minio.host/$BUCKET/$OBJECT. $OBJECT can, of course, be an identifier that resembles a directory structure. So the initial hunch was to simply configure something like this in the Caddyfile:

rewrite * /$BUCKETNAME{uri}
reverse_proxy minio:9000

This works. Somewhat. Obviously it doesn’t serve index.html when the request points to a directory. This is bad. All routes on this damn page rely on this to work… Actually, it’s even worse. By default MinIO serves a listing of all files in the bucket if you request “/”. So all subdirectories are just broken, because no object with the directory’s name actually exists, and the root of the page is an ugly XML listing of all files. Not good.

To prevent MinIO from providing a file listing for publicly readable buckets, simply remove the following Actions from the access policy: s3:ListBucket, s3:ListBucketMultipartUploads. Put the other way around, the only permissions you want to keep are s3:GetObject and s3:GetBucketLocation.

Now, MinIO will return 403 when trying to access “/” and 404 when trying to access a directory. We’ll let Caddy handle both errors by simply trying the same route again, with /index.html appended to it.

rewrite * /$BUCKETNAME{uri}
reverse_proxy minio:9000 {
    @error status 403 404
    handle_response @error {
        rewrite * {uri}/index.html
        reverse_proxy minio:9000 {
            @nestedError status 404
            handle_response @nestedError {
                respond "not found" 404
            }
        }
    }
}

This retries the request when the first one returns 403 or 404. Only if the second attempt also returns 404 do we present “not found” to the end user.

So. Done?

Not quite… In S3 one just pretends that object names are fully qualified paths. Right now, we always append /index.html to the request. This works fine for https://janw.name/blog but falls apart if the request URL is https://janw.name/blog/. That’s because the second one ends up as a request for the object blog//index.html, which does not exist. Only blog/index.html exists. We’ll need to trim the trailing slash if it is present in the request. This can be done by adding the following to the configuration:

@pathWithSlash path_regexp dir (.+)/$
handle @pathWithSlash {
    rewrite @pathWithSlash {re.dir.1}
}

We can then wrap the whole thing in a nice template like so:

(s3page) {
    @pathWithSlash path_regexp dir (.+)/$
    handle @pathWithSlash {
        rewrite @pathWithSlash {re.dir.1}
    }

    rewrite * /{args[0]}{uri}
    reverse_proxy minio:9000 {
        @error status 403 404
        handle_response @error {
            rewrite * {uri}/index.html
            reverse_proxy minio:9000 {
                @nestedError status 404
                handle_response @nestedError {
                    respond "not found" 404
                }
            }
        }
    }
}

And then use the template like so:

janw.name {
    import s3page "janw.name"
}

In my case I simply hardcoded the address of the MinIO instance on my internal network into the template (minio:9000). But this could be made configurable, just like the bucket name, if required.
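
If the upstream should be configurable as well, a second import argument could carry it. The following is a minimal, untested sketch, assuming the same {args[N]} syntax the template already uses for the bucket name. The second site and its bucket (example.org, "example-bucket") are made up purely for illustration.

```caddyfile
# Sketch: same template as above, but the MinIO address is passed
# as a second import argument instead of being hardcoded.
(s3page) {
    @pathWithSlash path_regexp dir (.+)/$
    handle @pathWithSlash {
        rewrite @pathWithSlash {re.dir.1}
    }

    # {args[0]} = bucket name, {args[1]} = MinIO address (assumption)
    rewrite * /{args[0]}{uri}
    reverse_proxy {args[1]} {
        @error status 403 404
        handle_response @error {
            rewrite * {uri}/index.html
            reverse_proxy {args[1]} {
                @nestedError status 404
                handle_response @nestedError {
                    respond "not found" 404
                }
            }
        }
    }
}

janw.name {
    import s3page "janw.name" "minio:9000"
}

# Hypothetical second site served from another bucket
example.org {
    import s3page "example-bucket" "minio:9000"
}
```

Every site block then names both its bucket and its upstream, which is more typing but would allow different sites to live on different MinIO instances.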