Add the `files` directories to the .gitignore to prevent them from being
committed again. Update the readme with the latest command line options,
and revise method documentation to match implementation.
Allow the important data to be explicitly written to a file via a
command line switch. The default is still stdout, and redirecting
output will still only redirect the important data to the file, ignoring
summary data on stderr.
Report status during runtime and print a summary upon completion, for a
better user experience.
Use a slightly more sophisticated method to determine similarity than
just trying to find duplicated lines, which falls apart fairly quickly.
Instead, add to the histogram while scanning the first file and subtract
while scanning the second. After this, any entry with a value of 0
indicates matching lines. The magnitudes of all nonzero entries are
summed and used to calculate a similarity fraction.
Break the worker function into one that ranges over the channel and one
that actually does the work of associating the file with a document if
it is determined to match.
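
A rough sketch of that split, assuming a hypothetical `document` type and a placeholder match rule (a name prefix); the real matching logic is not described here. `worker` only drains the channel, while `associate` holds the per-file decision and can be tested on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// document is a hypothetical stand-in for whatever the files are attached to.
type document struct {
	name  string
	files []string
}

// associate does the actual work for one file: it attaches the file to doc
// when the file matches. The prefix check is a placeholder match rule.
func associate(doc *document, file string) bool {
	if !strings.HasPrefix(file, doc.name) {
		return false
	}
	doc.files = append(doc.files, file)
	return true
}

// worker ranges over the channel and delegates each file to associate.
// It returns how many files were matched.
func worker(doc *document, files <-chan string) int {
	matched := 0
	for f := range files {
		if associate(doc, f) {
			matched++
		}
	}
	return matched
}

func main() {
	files := make(chan string, 3)
	for _, f := range []string{"report-1.txt", "notes.txt", "report-2.txt"} {
		files <- f
	}
	close(files)

	doc := &document{name: "report"}
	n := worker(doc, files)
	fmt.Println(n, doc.files) // prints "2 [report-1.txt report-2.txt]"
}
```
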
Use a bounded worker pool to prevent creation of hundreds of goroutines
contending for scheduling. Add some tests, a Dockerfile, a Makefile, and
a readme.
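
For illustration, a bounded pool can look like the sketch below. `runPool` and its parameters are hypothetical names, not the project's API; the point is that a fixed number of goroutines share one channel instead of one goroutine being started per job.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool (hypothetical helper) applies work to every job using at most
// `workers` goroutines and returns the results in job order.
func runPool(workers int, jobs []int, work func(int) int) []int {
	if workers < 1 {
		workers = 1 // avoid blocking forever with no consumers
	}
	indexes := make(chan int)
	results := make([]int, len(jobs))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range indexes {
				// Each index is received by exactly one worker, so the
				// writes to results never overlap.
				results[i] = work(jobs[i])
			}
		}()
	}

	for i := range jobs {
		indexes <- i
	}
	close(indexes) // lets every worker's range loop finish
	wg.Wait()
	return results
}

func main() {
	squares := runPool(2, []int{1, 2, 3, 4, 5}, func(n int) int { return n * n })
	fmt.Println(squares) // prints "[1 4 9 16 25]"
}
```

With hundreds of inputs, the number of live goroutines stays at `workers` regardless of the job count.
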