# Docgrouper

Given a set of files, each with an integer timestamp as its first line, identify
the set of documents that they represent at various points in each document's
life.

## Building

Building **docgrouper** requires [Go](https://go.dev). With Go installed, run
`make build`. Because Go might not be installed, a `Dockerfile` is provided to
test and build a container image. The Docker image can be built via the
`docker-build` Makefile target.

## Running

If running via Docker, the directory that contains the file pool must be mounted
into the container with the `-v` or `--volume` switch, like so:

```
docker run --volume ./host-files:/files steelray-docgrouper
```

This invocation is available via the `docker-run` Makefile target, but that
target only invokes docgrouper with the default command line arguments, since
arguments cannot be passed to a Makefile target.

## Options

```
  -path string
        path to the file pool (default "files")
  -prefix
        use '[doc ###]' prefix for output
  -threshold float
        similarity threshold (default 0.5)
  -verbose
        enable verbose logging
  -workers int
        number of workers to use (default 2*<number-of-cores>)
```