Docgrouper
Given a set of files, each with an integer timestamp as its first line, identify the documents they represent at various points in each document's life.
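As an illustration of the input format described above, here is a minimal sketch of reading the leading timestamp from a file. The helper names parseTimestamp and readTimestamp are hypothetical and not taken from the docgrouper source. The sketch also assumes the timestamp is a base-10 integer that fits in 64 bits.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// parseTimestamp converts the first line of a file into an integer timestamp.
// Hypothetical helper: it tolerates surrounding whitespace and assumes base 10.
func parseTimestamp(firstLine string) (int64, error) {
	return strconv.ParseInt(strings.TrimSpace(firstLine), 10, 64)
}

// readTimestamp reads only the first line from r and parses it as a timestamp.
func readTimestamp(r io.Reader) (int64, error) {
	sc := bufio.NewScanner(r)
	if !sc.Scan() {
		if err := sc.Err(); err != nil {
			return 0, err
		}
		// The input was empty, so there is no first line to parse.
		return 0, io.ErrUnexpectedEOF
	}
	return parseTimestamp(sc.Text())
}

func main() {
	// In-memory stand-in for a file from the pool (assumed layout:
	// timestamp on the first line, document text afterwards).
	in := strings.NewReader("1700000000\nfirst line of the document body\n")
	ts, err := readTimestamp(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ts)
}
```

Passing an opened file from the pool instead of the strings.Reader would work the same way, since readTimestamp only needs an io.Reader.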
Building
Building docgrouper requires Go. To build it, run make build. Because Go
might not be installed, a Dockerfile is provided to test and build a container
image. The Docker image can be built via the docker-build Makefile target.
Running
If running via Docker, the directory containing the file pool must be mounted
into the container with the -v or --volume switch, like so:
docker run --volume ./host-files:/files steelray-docgrouper
This invocation is also available via the docker-run Makefile target, but that
target only runs docgrouper with the default command-line arguments, because
arguments cannot be passed to a Makefile target.
Options
  -output string
        output file (default is stdout)
  -path string
        path to the file pool (default "files")
  -prefix
        use '[doc ###]' prefix for output
  -threshold float
        similarity threshold (default 0.5)
  -verbose
        enable verbose logging
  -workers int
        number of workers to use (default 2*<number-of-cores>)
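
For readers curious how a flag set like this can be declared, the following is a rough sketch using Go's standard flag package. It is not docgrouper's actual source: the options struct, its field names, and parseOptions are hypothetical, and using runtime.NumCPU() for <number-of-cores> is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"runtime"
)

// options mirrors the documented command-line flags.
// The struct and its field names are hypothetical.
type options struct {
	output    string
	path      string
	prefix    bool
	threshold float64
	verbose   bool
	workers   int
}

// parseOptions parses args (without the program name) into an options value,
// applying the defaults listed in the Options section.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("docgrouper", flag.ContinueOnError)
	fs.StringVar(&o.output, "output", "", "output file (default is stdout)")
	fs.StringVar(&o.path, "path", "files", "path to the file pool")
	fs.BoolVar(&o.prefix, "prefix", false, "use '[doc ###]' prefix for output")
	fs.Float64Var(&o.threshold, "threshold", 0.5, "similarity threshold")
	fs.BoolVar(&o.verbose, "verbose", false, "enable verbose logging")
	// Assumption: "number of cores" is taken from runtime.NumCPU().
	fs.IntVar(&o.workers, "workers", 2*runtime.NumCPU(), "number of workers to use")
	err := fs.Parse(args)
	return o, err
}

func main() {
	// Example arguments only; unspecified flags keep their defaults.
	o, err := parseOptions([]string{"-path", "/files", "-threshold", "0.7", "-prefix"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```

Using a dedicated flag.FlagSet instead of the package-level flags keeps the parsing testable, since arbitrary argument slices can be passed in without touching os.Args.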