docs: update README, add doc strings to all exported functionality
1 parent
86d9c9b
commit 356b236
Showing 6 changed files with 222 additions and 113 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,44 @@ | ||
# go-hibp-sync | ||
|
||
`go-hibp-sync` provides functionality to keep a local copy of the HIBP leaked password database in sync with the upstream version at https://haveibeenpowned.com. | ||
In addition to syncing the database, the library allows exporting it into a single list — the former distribution format of the database. | ||
`go-hibp-sync` provides functionality to keep a local copy of the *HIBP leaked password database* in sync with the upstream version at [https://haveibeenpwned.com](https://haveibeenpwned.com). | ||
In addition to syncing the "database", the library allows exporting it into a single list — the former distribution format of the database — and querying it for a given *k-proximity range*. | ||
|
||
This local copy consists of one file per range/prefix, grouped into `256` directories (first `2` of `5` prefix characters). | ||
As an uncompressed copy of the database would currently require around `40 GiB` of disk space, a moderate level of `zstd` compression is applied, which cuts storage consumption by `50%`. | ||
Compression can be disabled if its small computational overhead outweighs the benefit of requiring only half the space. | ||
|
||
To avoid unnecessary network transfers and to speed things up, `go-hibp-sync` additionally keeps the `etag` returned by the upstream CDN. | ||
Subsequent requests send it along, which should allow more frequent syncs without necessarily triggering full re-downloads. | ||
Of course, this can be disabled too. | ||
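As a rough illustration of how a stored `etag` avoids re-downloads, the sketch below performs a standard HTTP conditional request with `If-None-Match`. It does not show the library's internals: `fetchIfChanged` and `newFakeUpstream` are hypothetical names, and a local `httptest` server stands in for the upstream CDN so the example runs without network access.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchIfChanged performs a conditional GET. A non-empty etag is sent as
// If-None-Match; a 304 response means the cached copy is still current.
func fetchIfChanged(client *http.Client, url, etag string) (body []byte, newETag string, changed bool, err error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, "", false, err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, "", false, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNotModified:
		// Nothing to download; keep the ETag we already have.
		return nil, etag, false, nil
	case http.StatusOK:
		body, err = io.ReadAll(resp.Body)
		if err != nil {
			return nil, "", false, err
		}
		return body, resp.Header.Get("ETag"), true, nil
	default:
		return nil, "", false, fmt.Errorf("unexpected status %s", resp.Status)
	}
}

// newFakeUpstream is a stand-in for the real CDN (assumption for this sketch):
// it always serves the same body under the same ETag.
func newFakeUpstream() *httptest.Server {
	const tag = `"v1"`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", tag)
		if r.Header.Get("If-None-Match") == tag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		fmt.Fprint(w, "SUFFIX:1\r\n")
	}))
}

func main() {
	srv := newFakeUpstream()
	defer srv.Close()

	_, etag, changed, err := fetchIfChanged(srv.Client(), srv.URL, "")
	fmt.Println("first fetch:", etag, changed, err)

	_, _, changed, err = fetchIfChanged(srv.Client(), srv.URL, etag)
	fmt.Println("second fetch:", changed, err)
}
```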
|
||
The library can resume from where it left off; the `sync` command mentioned below demonstrates this. | ||
|
||
The basic API is really simple; two functions are exported (and additionally, typed configuration options): | ||
|
||
```go | ||
Sync(options ...SyncOption) error // Syncs the local copy with the upstream database | ||
Export(w io.Writer, options ...ExportOption) error // Writes a continuous, decompressed and "free-of-etags" stream to the given io.Writer | ||
``` | ||
|
||
The library can also operate on its data using the `RangeAPI` type and its `Query` method. | ||
This operates on disk but, depending on the medium, should provide access times that are probably good enough for all scenarios. | ||
A memory-based `tmpfs` will speed things up when necessary. | ||
|
||
```go | ||
querier := NewRangeAPI(/* optional options go here */) | ||
kProximityResponse, err := querier.Query("ABCDE") | ||
// TODO: Handle error | ||
// TODO: Read the response (as before received from the upstream API) line-by-line and check whether it contains your hash. | ||
``` | ||
|
||
There are two basic CLI commands, `sync` and `export`, that can be used for manual tasks and serve as minimal examples of how to use the library. | ||
They are basic but should play well with other tooling. | ||
`sync` tracks its progress and is able to continue from where it left off. | ||
|
||
Run them with: | ||
|
||
```bash | ||
go run github.com/exaring/go-hibp-sync/cmd/sync | ||
# and | ||
go run github.com/exaring/go-hibp-sync/cmd/export | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,143 @@ | ||
package hibpsync | ||
|
||
import ( | ||
"context" | ||
"io" | ||
) | ||
|
||
type commonConfig struct { | ||
dataDir string | ||
noCompression bool | ||
} | ||
|
||
type syncConfig struct { | ||
commonConfig | ||
ctx context.Context | ||
endpoint string | ||
minWorkers int | ||
progressFn ProgressFunc | ||
stateFile io.ReadWriteSeeker | ||
lastRange int64 | ||
} | ||
|
||
// SyncOption represents a type of function that can be used to customize the behavior of the Sync function. | ||
type SyncOption func(config *syncConfig) | ||
|
||
// SyncWithContext sets the context for the sync operation. | ||
func SyncWithContext(ctx context.Context) SyncOption { | ||
return func(c *syncConfig) { | ||
c.ctx = ctx | ||
} | ||
} | ||
|
||
// SyncWithDataDir sets the data directory for the sync operation. | ||
// The directory will be created if it does not exist. | ||
// Default: "./.hibp-data" | ||
func SyncWithDataDir(dataDir string) SyncOption { | ||
return func(c *syncConfig) { | ||
c.dataDir = dataDir | ||
} | ||
} | ||
|
||
// SyncWithEndpoint sets a custom endpoint instead of the default HIBP API endpoint. | ||
// Default: "https://api.pwnedpasswords.com/range/" | ||
func SyncWithEndpoint(endpoint string) SyncOption { | ||
return func(c *syncConfig) { | ||
c.endpoint = endpoint | ||
} | ||
} | ||
|
||
// SyncWithMinWorkers sets the minimum number of worker goroutines that will be used to process the ranges. | ||
// Default: 50 | ||
func SyncWithMinWorkers(workers int) SyncOption { | ||
return func(c *syncConfig) { | ||
c.minWorkers = workers | ||
} | ||
} | ||
|
||
// SyncWithStateFile sets the state file to be used for tracking progress. | ||
// This can either be an os.File or any other implementation of io.ReadWriteSeeker. | ||
// Seeking is only used to jump back to the start of the "virtual file". | ||
// It should be easy enough to decorate a bytes.Buffer with the necessary methods to make it work. | ||
// Default: nil, i.e., no state will be tracked. | ||
func SyncWithStateFile(stateFile io.ReadWriteSeeker) SyncOption { | ||
return func(c *syncConfig) { | ||
c.stateFile = stateFile | ||
} | ||
} | ||
|
||
// SyncWithProgressFn sets a custom progress function that will be called regularly. | ||
// The function should return an error if the operation should be aborted. | ||
// Note, there is no guarantee that the function will be called for every prefix. | ||
// Default: no-op function | ||
func SyncWithProgressFn(progressFn ProgressFunc) SyncOption { | ||
return func(c *syncConfig) { | ||
c.progressFn = progressFn | ||
} | ||
} | ||
|
||
// SyncWithNoCompression disables compression for the sync operation. | ||
// This seriously increases the amount of storage required. | ||
// Default: false | ||
func SyncWithNoCompression() SyncOption { | ||
return func(c *syncConfig) { | ||
c.noCompression = true | ||
} | ||
} | ||
|
||
// SyncWithLastRange sets the last range to be processed. | ||
// Aside from tests, this is rarely useful. | ||
// Default: 0xFFFFF | ||
func SyncWithLastRange(to int64) SyncOption { | ||
return func(c *syncConfig) { | ||
c.lastRange = to | ||
} | ||
} | ||
|
||
type exportConfig struct { | ||
commonConfig | ||
} | ||
|
||
// ExportOption represents a type of function that can be used to customize the behavior of the Export function. | ||
type ExportOption func(*exportConfig) | ||
|
||
// ExportWithDataDir sets the data directory for the export operation. | ||
// Default: "./.hibp-data" | ||
func ExportWithDataDir(dataDir string) ExportOption { | ||
return func(c *exportConfig) { | ||
c.dataDir = dataDir | ||
} | ||
} | ||
|
||
// ExportWithNoCompression instructs the export operation to assume the local data is not compressed. | ||
// This should be in sync with the configuration of the call to Sync. | ||
// Default: false | ||
func ExportWithNoCompression() ExportOption { | ||
return func(c *exportConfig) { | ||
c.noCompression = true | ||
} | ||
} | ||
|
||
type queryConfig struct { | ||
commonConfig | ||
} | ||
|
||
// RangeAPIOption represents a type of function that can be used to customize the behavior of the RangeAPI constructor. | ||
type RangeAPIOption func(*queryConfig) | ||
|
||
// QueryWithDataDir sets the data directory for the RangeAPI. | ||
// Default: "./.hibp-data" | ||
func QueryWithDataDir(dataDir string) RangeAPIOption { | ||
return func(c *queryConfig) { | ||
c.dataDir = dataDir | ||
} | ||
} | ||
|
||
// QueryWithNoCompression instructs the RangeAPI to assume the local data is not compressed. | ||
// This should be in sync with the configuration of the call to Sync. | ||
// Default: false | ||
func QueryWithNoCompression() RangeAPIOption { | ||
return func(c *queryConfig) { | ||
c.noCompression = true | ||
} | ||
} |