# Docgrouper

Given a set of files, each with an integer timestamp as its first line, identify the set of documents that they represent at various points in each document's life.

## Building

Building **docgrouper** requires [Go](https://go.dev); run `make build` to build it. In case Go is not installed, a `Dockerfile` is provided to test and build a container image. The Docker image can be built via the `docker-build` Makefile target.

## Running

If running via Docker, the directory containing the file pool must be mounted into the container via the `-v` or `--volume` switch, like so:

```
docker run --volume ./host-files:/files steelray-docgrouper
```

This invocation is available via the `docker-run` Makefile target, but it only invokes docgrouper with the default command line arguments, since arguments cannot be passed to a Makefile target.

## Options

```
  -output string
        output file (default is stdout)
  -path string
        path to the file pool (default "files")
  -prefix
        use '[doc ###]' prefix for output
  -threshold float
        similarity threshold (default 0.5)
  -verbose
        enable verbose logging
  -workers int
        number of workers to use (default 2*)
```
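
To make the input format and the `-threshold` option more concrete, here is a minimal Go sketch. It is not docgrouper's actual implementation: the helper names `splitTimestamp`, `wordSet`, and `jaccard` are hypothetical, and Jaccard similarity over word sets is only an assumed stand-in, since this README does not say which similarity measure docgrouper uses. The sketch splits the integer timestamp off the first line of each file and compares the remaining text against the default threshold of 0.5.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitTimestamp separates the integer timestamp on the first line
// from the rest of the file content. (Hypothetical helper.)
func splitTimestamp(content string) (int64, string, error) {
	first, body, _ := strings.Cut(content, "\n")
	ts, err := strconv.ParseInt(strings.TrimSpace(first), 10, 64)
	if err != nil {
		return 0, "", fmt.Errorf("first line is not an integer timestamp: %w", err)
	}
	return ts, body, nil
}

// wordSet returns the set of whitespace-separated words in text.
func wordSet(text string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range strings.Fields(text) {
		set[w] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| for the word sets of a and b.
// This metric is an assumption made for illustration only.
func jaccard(a, b string) float64 {
	sa, sb := wordSet(a), wordSet(b)
	if len(sa) == 0 && len(sb) == 0 {
		return 1 // two empty texts are treated as identical
	}
	shared := 0
	for w := range sa {
		if _, ok := sb[w]; ok {
			shared++
		}
	}
	union := len(sa) + len(sb) - shared
	return float64(shared) / float64(union)
}

func main() {
	const threshold = 0.5 // mirrors the documented default of -threshold

	older := "100\nthe quick brown fox"
	newer := "200\nthe quick brown fox jumps"

	t1, body1, err := splitTimestamp(older)
	if err != nil {
		panic(err)
	}
	t2, body2, err := splitTimestamp(newer)
	if err != nil {
		panic(err)
	}

	score := jaccard(body1, body2)
	fmt.Printf("t=%d vs t=%d: score=%.2f same document=%t\n",
		t1, t2, score, score >= threshold)
	// prints: t=100 vs t=200: score=0.80 same document=true
}
```

Under these assumptions, raising the threshold would demand more overlap before two files are treated as versions of the same document, while lowering it would group files more aggressively.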