Feature: input-checksum-quick option (#1265)
emmercm authored Aug 2, 2024
1 parent 89265ff commit 751e287
Showing 10 changed files with 136 additions and 27 deletions.
20 changes: 17 additions & 3 deletions docs/roms/matching.md
Expand Up @@ -16,7 +16,7 @@ And some DAT release groups do not include filesize information for every file,

!!! success

For situations like these, Igir will automatically detect what combination of checksums it needs to calculate for input files to be able to match them to DATs. This has the chance of greatly slowing down file scanning, especially with archives.
For situations like these, Igir will automatically detect what combination of checksums it needs to calculate for input files to be able to match them to DATs. This _does_ have the chance of greatly slowing down file scanning, especially with archives, so you can use the `--input-checksum-quick` option (below) to keep processing faster.

For example, if you provide all of these DATs at once with the [`--dat <path>` option](../dats/processing.md):

Expand All @@ -30,6 +30,20 @@ For example, if you provide all of these DATs at once with the [`--dat <path>` o

When generating a [dir2dat](../dats/dir2dat.md) with the `igir dir2dat` command, Igir will calculate CRC32, MD5, and SHA1 information for every file. This helps ensure that the generated DAT has the most complete information it can. You can additionally add SHA256 information with the option `igir [commands..] [options] --input-checksum-min SHA256` (below).

## Quick scanning files

A number of archive formats require the extraction of files to calculate their checksums, and this extraction can greatly increase scanning time and add hard drive wear and tear. Igir's default settings will give you the best chance of matching input files to DATs, but there may be situations where you want to make scanning faster.

The `--input-checksum-quick` option will prevent the extraction of archives (either in memory _or_ using temporary files) to calculate checksums of files contained inside. This means that Igir will rely solely on the information available in the archive's file directory. Non-archive files will still have their checksum calculated as normal. See the [archive formats](../input/reading-archives.md) page for more information about what file types contain what checksum information.

!!! warning

If an archive format doesn't contain any checksum information (e.g. `.cso`, `.tar.gz`), then there is no way to match those input files to DATs! Only use quick scanning when all input archives store checksums of their files!

!!! warning

Different DAT groups catalog CHDs of CDs (`.bin` & `.cue`) and GDIs (`.gdi` & `.bin`/`.raw`), which use a track sheet plus one or more track files, differently. Take the Sega Dreamcast, for example: Redump catalogs `.bin` & `.cue` files (which is [problematic with CHDs](https://github.com/mamedev/mame/issues/11903)), [MAME Redump](https://github.com/MetalSlug/MAMERedump) catalogs `.chd` CD files, and TOSEC catalogs `.gdi` & `.bin`/`.raw` files. Quick scanning of CHDs means only the SHA1 stored in each CHD's header will be used for matching, which may or may not work depending on the DATs you use.
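
To make the quick scanning behavior concrete, here is a minimal sketch of how one requested checksum bitmask could be split into a bitmask for plain files and a bitmask for archive entries. It mirrors the idea in this commit's `scanner.ts` change, but the enum bit values and the `bitmasksForScan` helper are assumptions made for illustration, not Igir's actual definitions.

```typescript
// Hypothetical bit values, for illustration only; Igir's real ChecksumBitmask may differ.
enum ChecksumBitmask {
  NONE = 0x0,
  CRC32 = 0x1,
  MD5 = 0x2,
  SHA1 = 0x4,
  SHA256 = 0x8,
}

interface ScanBitmasks {
  // Checksums to calculate for non-archive files
  fileChecksumBitmask: number;
  // Checksums to calculate for files inside archives
  archiveChecksumBitmask: number;
}

// Hypothetical helper: quick scanning only changes what is calculated for archive entries.
function bitmasksForScan(checksumBitmask: number, quick: boolean): ScanBitmasks {
  return {
    fileChecksumBitmask: checksumBitmask,
    // NONE means "calculate nothing", so only checksums already stored in the
    // archive's own metadata remain available for matching
    archiveChecksumBitmask: quick ? ChecksumBitmask.NONE : checksumBitmask,
  };
}

const wanted = ChecksumBitmask.CRC32 | ChecksumBitmask.SHA1;
const normal = bitmasksForScan(wanted, false);
const quick = bitmasksForScan(wanted, true);
console.log(normal, quick);
```

Under these assumptions, non-archive files are checksummed the same way in both modes, and only the archive bitmask drops to `NONE` when quick scanning is on.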

## Manually using other checksum algorithms

!!! danger
Expand All @@ -44,9 +58,9 @@ igir [commands..] [options] --input-checksum-min SHA1
igir [commands..] [options] --input-checksum-min SHA256
```

This option defines the _minimum_ checksum that will be used based on digest size (below). If not every ROM in every DAT provides the checksum you specify, Igir may automatically calculate and match files based on a higher checksum (see above).
This option defines the _minimum_ checksum that will be used based on digest size (below). If not every ROM in every DAT provides the checksum you specify, Igir may automatically calculate and match files based on a higher checksum (see above), but never lower.

The reason you might want to do this is to have a higher confidence that found files _exactly_ match ROMs in DATs. Just keep in mind that explicitly enabling non-CRC32 checksums will _greatly_ slow down scanning of files within archives.
The reason you might want to do this is to have a higher confidence that found files _exactly_ match ROMs in DATs. Just keep in mind that explicitly enabling non-CRC32 checksums will _greatly_ slow down scanning of files within archives (see "quick scanning" above).
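
The "higher, but never lower" rule can be illustrated with a simplified sketch. The `algorithmsToCalculate` function and its inputs are hypothetical and are not Igir's implementation; the sketch only assumes the algorithms are ordered by digest size and that each ROM in a DAT lists which checksums it provides.

```typescript
// Algorithms ordered from smallest to largest digest size
const ALGORITHMS = ['CRC32', 'MD5', 'SHA1', 'SHA256'] as const;
type Algorithm = typeof ALGORITHMS[number];

// Hypothetical helper: `roms` holds, per ROM, the set of checksums its DAT provides.
function algorithmsToCalculate(
  minimum: Algorithm,
  roms: ReadonlySet<Algorithm>[],
): Algorithm[] {
  const minIndex = ALGORITHMS.indexOf(minimum);
  const needed = new Set<Algorithm>([minimum]);
  for (const rom of roms) {
    if (rom.has(minimum)) {
      continue;
    }
    // The ROM lacks the minimum: pick the weakest checksum it provides that is
    // stronger than the minimum, and never fall back to a weaker one
    const higher = ALGORITHMS.slice(minIndex + 1).find((algorithm) => rom.has(algorithm));
    if (higher !== undefined) {
      needed.add(higher);
    }
  }
  // Return the needed algorithms in digest-size order
  return ALGORITHMS.filter((algorithm) => needed.has(algorithm));
}

const example = algorithmsToCalculate('MD5', [
  new Set<Algorithm>(['CRC32', 'MD5']),
  new Set<Algorithm>(['CRC32', 'SHA1']),
]);
console.log(example);
```

In this sketch, a ROM that only provides CRC32 adds nothing when the minimum is MD5, which is the "never lower" half of the rule.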

Here is a table that shows the keyspace for each checksum algorithm, where the higher number of bits reduces the chances of collisions:

Expand Down
2 changes: 1 addition & 1 deletion src/igir.ts
Expand Up @@ -285,7 +285,7 @@ export default class Igir {
}

private determineScanningBitmask(dats: DAT[]): number {
const minimumChecksum = this.options.getInputMinChecksum() ?? ChecksumBitmask.CRC32;
const minimumChecksum = this.options.getInputChecksumMin() ?? ChecksumBitmask.CRC32;
let matchChecksum = minimumChecksum;

if (this.options.getPatchFileCount() > 0) {
Expand Down
14 changes: 13 additions & 1 deletion src/modules/argumentsParser.ts
Expand Up @@ -197,7 +197,19 @@ export default class ArgumentsParser {
type: 'array',
requiresArg: true,
})
.option('input-min-checksum', {
.option('input-checksum-quick', {
group: groupRomInput,
description: 'Only read checksums from archive headers, don\'t decompress to calculate',
type: 'boolean',
})
.check((checkArgv) => {
// Re-implement `conflicts: 'input-checksum-min'`, which isn't possible with a default value
if (checkArgv['input-checksum-quick'] && checkArgv['input-checksum-min'] !== ChecksumBitmask[ChecksumBitmask.CRC32].toUpperCase()) {
throw new ExpectedError('Arguments input-checksum-quick and input-checksum-min are mutually exclusive');
}
return true;
})
.option('input-checksum-min', {
group: groupRomInput,
description: 'The minimum checksum level to calculate and use for matching',
choices: Object.keys(ChecksumBitmask)
Expand Down
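
The `.check()` callback in the hunk above exists because `input-checksum-min` has a default value, so the option is always present after parsing and a declarative conflict rule would fire on every use of `--input-checksum-quick`. The following standalone sketch shows the same comparison without a CLI parser. The `ParsedArgs` shape, the `DEFAULT_CHECKSUM_MIN` constant, and the local `ExpectedError` class are stand-ins assumed for illustration.

```typescript
// Stand-in for Igir's ExpectedError class (an assumption for this sketch)
class ExpectedError extends Error {}

// Hypothetical shape of the parsed arguments relevant to this check
interface ParsedArgs {
  'input-checksum-quick'?: boolean;
  // Always present after parsing, because the option has a default value
  'input-checksum-min': string;
}

// Assumed default, matching the CRC32 comparison in the diff
const DEFAULT_CHECKSUM_MIN = 'CRC32';

function checkQuickConflicts(argv: ParsedArgs): boolean {
  // Only a non-default minimum can be detected as a conflict
  if (argv['input-checksum-quick'] && argv['input-checksum-min'] !== DEFAULT_CHECKSUM_MIN) {
    throw new ExpectedError(
      'Arguments input-checksum-quick and input-checksum-min are mutually exclusive',
    );
  }
  return true;
}

console.log(checkQuickConflicts({ 'input-checksum-quick': true, 'input-checksum-min': 'CRC32' }));
```

One consequence of comparing against the default: an explicit `--input-checksum-min CRC32` cannot be told apart from the default, so it is accepted alongside `--input-checksum-quick`.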
7 changes: 6 additions & 1 deletion src/modules/scanner.ts
Expand Up @@ -4,6 +4,7 @@ import ArrayPoly from '../polyfill/arrayPoly.js';
import fsPoly from '../polyfill/fsPoly.js';
import ArchiveEntry from '../types/files/archives/archiveEntry.js';
import File from '../types/files/file.js';
import { ChecksumBitmask } from '../types/files/fileChecksums.js';
import FileFactory from '../types/files/fileFactory.js';
import Options from '../types/options.js';
import Module from './module.js';
Expand Down Expand Up @@ -65,7 +66,11 @@ export default abstract class Scanner extends Module {
}
}

const filesFromPath = await FileFactory.filesFrom(filePath, checksumBitmask);
const filesFromPath = await FileFactory.filesFrom(
filePath,
checksumBitmask,
this.options.getInputChecksumQuick() ? ChecksumBitmask.NONE : checksumBitmask,
);

const fileIsArchive = filesFromPath.some((file) => file instanceof ArchiveEntry);
if (checksumArchives && fileIsArchive) {
Expand Down
7 changes: 7 additions & 0 deletions src/types/files/archives/chd/chd.ts
Expand Up @@ -42,12 +42,19 @@ export default class Chd extends Archive {

async getArchiveEntries(checksumBitmask: number): Promise<ArchiveEntry<this>[]> {
const info = await this.getInfo();

if (checksumBitmask === ChecksumBitmask.NONE) {
// Doing a quick scan
return this.getArchiveEntriesSingleFile(info, checksumBitmask);
}

if (info.type === CHDType.CD_ROM) {
return ChdBinCueParser.getArchiveEntriesBinCue(this, checksumBitmask);
} if (info.type === CHDType.GD_ROM) {
// TODO(cemmer): allow parsing GD-ROM to bin/cue https://github.com/mamedev/mame/issues/11903
return ChdGdiParser.getArchiveEntriesGdRom(this, checksumBitmask);
}

return this.getArchiveEntriesSingleFile(info, checksumBitmask);
}

Expand Down
11 changes: 6 additions & 5 deletions src/types/files/fileFactory.ts
Expand Up @@ -23,22 +23,23 @@ import FileSignature from './fileSignature.js';
export default class FileFactory {
static async filesFrom(
filePath: string,
checksumBitmask: number = ChecksumBitmask.CRC32,
fileChecksumBitmask: number = ChecksumBitmask.CRC32,
archiveChecksumBitmask = fileChecksumBitmask,
): Promise<File[]> {
if (!this.isExtensionArchive(filePath)) {
const entries = await this.entriesFromArchiveSignature(filePath, checksumBitmask);
const entries = await this.entriesFromArchiveSignature(filePath, archiveChecksumBitmask);
if (entries !== undefined) {
return entries;
}
return [await this.fileFrom(filePath, checksumBitmask)];
return [await this.fileFrom(filePath, fileChecksumBitmask)];
}

try {
const entries = await this.entriesFromArchiveExtension(filePath, checksumBitmask);
const entries = await this.entriesFromArchiveExtension(filePath, archiveChecksumBitmask);
if (entries !== undefined) {
return entries;
}
return [await this.fileFrom(filePath, checksumBitmask)];
return [await this.fileFrom(filePath, fileChecksumBitmask)];
} catch (error) {
if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
throw new ExpectedError(`file doesn't exist: ${filePath}`);
Expand Down
18 changes: 13 additions & 5 deletions src/types/options.ts
Expand Up @@ -65,7 +65,8 @@ export interface OptionsProps {

readonly input?: string[],
readonly inputExclude?: string[],
readonly inputMinChecksum?: string,
readonly inputChecksumQuick?: boolean,
readonly inputChecksumMin?: string,
readonly inputChecksumArchives?: string,

readonly dat?: string[],
Expand Down Expand Up @@ -182,7 +183,9 @@ export default class Options implements OptionsProps {

readonly inputExclude: string[];

readonly inputMinChecksum?: string;
readonly inputChecksumQuick: boolean;

readonly inputChecksumMin?: string;

readonly inputChecksumArchives?: string;

Expand Down Expand Up @@ -375,7 +378,8 @@ export default class Options implements OptionsProps {

this.input = options?.input ?? [];
this.inputExclude = options?.inputExclude ?? [];
this.inputMinChecksum = options?.inputMinChecksum;
this.inputChecksumQuick = options?.inputChecksumQuick ?? false;
this.inputChecksumMin = options?.inputChecksumMin;
this.inputChecksumArchives = options?.inputChecksumArchives;

this.dat = options?.dat ?? [];
Expand Down Expand Up @@ -775,9 +779,13 @@ export default class Options implements OptionsProps {
return globPattern;
}

getInputMinChecksum(): ChecksumBitmask | undefined {
getInputChecksumQuick(): boolean {
return this.inputChecksumQuick;
}

getInputChecksumMin(): ChecksumBitmask | undefined {
const checksumBitmask = Object.keys(ChecksumBitmask)
.find((bitmask) => bitmask.toUpperCase() === this.inputMinChecksum?.toUpperCase());
.find((bitmask) => bitmask.toUpperCase() === this.inputChecksumMin?.toUpperCase());
if (!checksumBitmask) {
return undefined;
}
Expand Down
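
The `getInputChecksumMin()` hunk above is cut off after the key lookup. As a rough illustration of the general technique (a case-insensitive string-to-enum lookup), here is a self-contained sketch. The enum values and the `checksumFromString` helper are assumptions and do not reproduce the truncated remainder of the real method. The numeric-key filter is the same guard that the `romScanner.test.ts` change in this commit uses, because `Object.keys()` on a numeric TypeScript enum also returns the reverse-mapping keys such as `"1"`.

```typescript
// Hypothetical bit values, for illustration only
enum ChecksumBitmask {
  NONE = 0x0,
  CRC32 = 0x1,
  MD5 = 0x2,
  SHA1 = 0x4,
  SHA256 = 0x8,
}

// Hypothetical helper: resolve a user-provided string to an enum member
function checksumFromString(value?: string): ChecksumBitmask | undefined {
  const key = Object.keys(ChecksumBitmask)
    // Drop the numeric reverse-mapping keys ("0", "1", ...) of the enum
    .filter((name): name is keyof typeof ChecksumBitmask => Number.isNaN(Number(name)))
    .find((name) => name.toUpperCase() === value?.toUpperCase());
  if (!key) {
    return undefined;
  }
  return ChecksumBitmask[key];
}

console.log(checksumFromString('md5'), checksumFromString('foobar'));
```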
2 changes: 1 addition & 1 deletion test/igir.test.ts
Expand Up @@ -827,7 +827,7 @@ describe('with explicit DATs', () => {
commands: ['copy', 'extract', 'test'],
dat: [path.join(inputTemp, 'dats', '*')],
input: [path.join(inputTemp, 'roms')],
inputMinChecksum: ChecksumBitmask[ChecksumBitmask.MD5].toLowerCase(),
inputChecksumMin: ChecksumBitmask[ChecksumBitmask.MD5].toLowerCase(),
patch: [path.join(inputTemp, 'patches')],
output: outputTemp,
dirDatName: true,
Expand Down
31 changes: 22 additions & 9 deletions test/modules/argumentsParser.test.ts
Expand Up @@ -122,7 +122,8 @@ describe('options', () => {
expect(options.shouldReport()).toEqual(false);

expect(options.getInputPaths()).toEqual([os.devNull]);
expect(options.getInputMinChecksum()).toEqual(ChecksumBitmask.CRC32);
expect(options.getInputChecksumQuick()).toEqual(false);
expect(options.getInputChecksumMin()).toEqual(ChecksumBitmask.CRC32);
expect(options.getInputChecksumArchives()).toEqual(InputChecksumArchivesMode.AUTO);

expect(options.getDatNameRegex()).toBeUndefined();
Expand Down Expand Up @@ -244,15 +245,27 @@ describe('options', () => {
expect((await argumentsParser.parse(['copy', '--input', './src', '--output', os.devNull, '--input-exclude', './src']).scanInputFilesWithoutExclusions()).length).toEqual(0);
});

it('should parse "input-min-checksum', () => {
expect(argumentsParser.parse(dummyCommandAndRequiredArgs).getInputMinChecksum())
it('should parse "input-checksum-quick"', () => {
expect(() => argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-quick', '--input-checksum-min', 'MD5'])).toThrow(/mutually exclusive/i);
expect(() => argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-quick', '--input-checksum-min', 'SHA1'])).toThrow(/mutually exclusive/i);
expect(() => argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-quick', '--input-checksum-min', 'SHA256'])).toThrow(/mutually exclusive/i);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-quick']).getInputChecksumQuick()).toEqual(true);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-quick', 'true']).getInputChecksumQuick()).toEqual(true);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-quick', 'false']).getInputChecksumQuick()).toEqual(false);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-quick', '--input-checksum-quick']).getInputChecksumQuick()).toEqual(true);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-quick', 'false', '--input-checksum-quick', 'true']).getInputChecksumQuick()).toEqual(true);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-quick', 'true', '--input-checksum-quick', 'false']).getInputChecksumQuick()).toEqual(false);
});

it('should parse "input-checksum-min"', () => {
expect(argumentsParser.parse(dummyCommandAndRequiredArgs).getInputChecksumMin())
.toEqual(ChecksumBitmask.CRC32);
expect(() => argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-min-checksum', 'foobar']).getInputMinChecksum()).toThrow(/invalid values/i);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-min-checksum', 'CRC32']).getInputMinChecksum()).toEqual(ChecksumBitmask.CRC32);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-min-checksum', 'MD5']).getInputMinChecksum()).toEqual(ChecksumBitmask.MD5);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-min-checksum', 'SHA1']).getInputMinChecksum()).toEqual(ChecksumBitmask.SHA1);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-min-checksum', 'SHA256']).getInputMinChecksum()).toEqual(ChecksumBitmask.SHA256);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-min-checksum', 'SHA256', '--input-min-checksum', 'CRC32']).getInputMinChecksum()).toEqual(ChecksumBitmask.CRC32);
expect(() => argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-min', 'foobar']).getInputChecksumMin()).toThrow(/invalid values/i);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-min', 'CRC32']).getInputChecksumMin()).toEqual(ChecksumBitmask.CRC32);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-min', 'MD5']).getInputChecksumMin()).toEqual(ChecksumBitmask.MD5);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-min', 'SHA1']).getInputChecksumMin()).toEqual(ChecksumBitmask.SHA1);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-min', 'SHA256']).getInputChecksumMin()).toEqual(ChecksumBitmask.SHA256);
expect(argumentsParser.parse([...dummyCommandAndRequiredArgs, '--input-checksum-min', 'SHA256', '--input-checksum-min', 'CRC32']).getInputChecksumMin()).toEqual(ChecksumBitmask.CRC32);
});

it('should parse "input-checksum-archives"', () => {
Expand Down
51 changes: 50 additions & 1 deletion test/modules/romScanner.test.ts
Expand Up @@ -3,7 +3,9 @@ import path from 'node:path';

import Temp from '../../src/globals/temp.js';
import ROMScanner from '../../src/modules/romScanner.js';
import ArrayPoly from '../../src/polyfill/arrayPoly.js';
import fsPoly from '../../src/polyfill/fsPoly.js';
import ArchiveEntry from '../../src/types/files/archives/archiveEntry.js';
import { ChecksumBitmask } from '../../src/types/files/fileChecksums.js';
import Options, { OptionsProps } from '../../src/types/options.js';
import ProgressBarFake from '../console/progressBarFake.js';
Expand Down Expand Up @@ -48,11 +50,58 @@ describe('multiple files', () => {
[{ input: [path.join('test', 'fixtures', 'roms', 'tar')] }, 12],
[{ input: [path.join('test', 'fixtures', 'roms', 'zip')] }, 15],
] satisfies [OptionsProps, number][])('should calculate checksums of archives: %s', async (optionsProps, expectedRomFiles) => {
const checksumBitmask = Object.keys(ChecksumBitmask)
.filter((bitmask): bitmask is keyof typeof ChecksumBitmask => Number.isNaN(Number(bitmask)))
.reduce((allBitmasks, bitmask) => allBitmasks | ChecksumBitmask[bitmask], 0);
const scannedFiles = await new ROMScanner(new Options(optionsProps), new ProgressBarFake())
.scan(ChecksumBitmask.CRC32, true);
.scan(checksumBitmask, true);
expect(scannedFiles).toHaveLength(expectedRomFiles);
});

it('should scan quickly', async () => {
const options = new Options({
input: [path.join('test', 'fixtures', 'roms')],
inputChecksumQuick: true,
});

const scannedFiles = await new ROMScanner(options, new ProgressBarFake())
.scan(ChecksumBitmask.CRC32, false);

const extensionsWithoutCrc32 = scannedFiles
.filter((file) => file instanceof ArchiveEntry)
.filter((file) => !file.getCrc32())
.map((file) => {
const match = file.getFilePath().match(/[^.]+((\.[a-zA-Z0-9]+)+)$/);
return match ? match[1] : undefined;
})
.filter(ArrayPoly.filterNotNullish)
.reduce(ArrayPoly.reduceUnique(), [])
.sort();
expect(extensionsWithoutCrc32).toEqual(['.chd', '.tar.gz']);

const entriesWithMd5 = scannedFiles
.filter((file) => file instanceof ArchiveEntry)
.filter((file) => file.getMd5());
expect(entriesWithMd5).toHaveLength(0);

const extensionsWithSha1 = scannedFiles
.filter((file) => file instanceof ArchiveEntry)
.filter((file) => file.getSha1())
.map((file) => {
const match = file.getFilePath().match(/[^.]+((\.[a-zA-Z0-9]+)+)$/);
return match ? match[1] : undefined;
})
.filter(ArrayPoly.filterNotNullish)
.reduce(ArrayPoly.reduceUnique(), [])
.sort();
expect(extensionsWithSha1).toEqual(['.chd']);

const entriesWithSha256 = scannedFiles
.filter((file) => file instanceof ArchiveEntry)
.filter((file) => file.getSha256());
expect(entriesWithSha256).toHaveLength(0);
});

it('should scan multiple files with some file exclusions', async () => {
await expect(createRomScanner(['test/fixtures/roms/**/*'], ['test/fixtures/roms/**/*.rom']).scan()).resolves.toHaveLength(77);
await expect(createRomScanner(['test/fixtures/roms/**/*'], ['test/fixtures/roms/**/*.rom', 'test/fixtures/roms/**/*.rom']).scan()).resolves.toHaveLength(77);
Expand Down
