
Ci quality workflows #1423

Merged

Conversation

enzymezoo-code
Contributor

This lays the foundation for a testing suite. Pytest was chosen for its simplicity, extensibility, and scalability.

The file tests/inference/test_inference.py is an e2e test that generates images and asserts that they are not blank.

The plan for these automated tests is to orchestrate them through GitHub workflows, ensuring code changes are thoroughly tested prior to merging.

Currently, a few of these tests fail (e.g. sampler="uni_pc_bh2" and scheduler="exponential"). We will either need to make these tests pass or decide to limit which sampler+scheduler combinations are fully supported.
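For context, the "not blank" assertion can be as simple as checking that the decoded image is not a near-solid black frame. A minimal sketch (assuming Pillow/NumPy, with an illustrative tolerance rather than the one used in the actual test file):

```python
import numpy as np
from PIL import Image

def assert_image_not_blank(image_path: str) -> None:
    """Fail if the generated image is (near-)solid black, i.e. sampling produced nothing."""
    img = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float32)
    # A completely black output has (near-)zero mean and zero variance; allow a
    # small tolerance for compression noise.
    assert img.mean() > 1.0 or img.std() > 1.0, f"{image_path} appears to be blank"
```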

@enzymezoo-code
Contributor Author

Another issue:

The comfy/cli_args.py file parses args on import, so the test suite cannot use any of pytest's built-in command-line args as it stands.

Example error message:

```
> pytest --co
tests/inference/test_inference.py:20: in <module>
    from comfy.samplers import KSampler
comfy/samplers.py:5: in <module>
    from comfy import model_management
comfy/model_management.py:3: in <module>
    from comfy.cli_args import args
comfy/cli_args.py:97: in <module>
    args = parser.parse_args()
../../../anaconda3/lib/python3.9/argparse.py:1827: in parse_args
    self.error(msg % ' '.join(argv))
../../../anaconda3/lib/python3.9/argparse.py:2581: in error
    self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
../../../anaconda3/lib/python3.9/argparse.py:2568: in exit
    _sys.exit(status)
E   SystemExit: 2
collecting ... usage: pytest [-h] [--listen [IP]] [--port PORT] [--enable-cors-header [ORIGIN]] [--extra-model-paths-config PATH [PATH ...]] [--output-directory OUTPUT_DIRECTORY] [--temp-directory TEMP_DIRECTORY] [--auto-launch]
              [--disable-auto-launch] [--cuda-device DEVICE_ID] [--cuda-malloc | --disable-cuda-malloc] [--dont-upcast-attention] [--force-fp32 | --force-fp16] [--fp16-vae | --fp32-vae | --bf16-vae]
              [--directml [DIRECTML_DEVICE]] [--disable-ipex-optimize] [--preview-method [none,auto,latent2rgb,taesd]] [--use-split-cross-attention | --use-quad-cross-attention | --use-pytorch-cross-attention]
              [--disable-xformers] [--gpu-only | --highvram | --normalvram | --lowvram | --novram | --cpu] [--disable-smart-memory] [--dont-print-server] [--quick-test-for-ci] [--windows-standalone-build]
              [--disable-metadata]
pytest: error: unrecognized arguments: --co
```

The best solution would be to fix cli_args.py so it does not parse args on import.

Since test_inference.py only uses this import statement to get a list of samplers and schedulers, I will hard-code those lists for now.
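One possible shape for that fix, sketched here only for illustration (not necessarily what gets committed later in this thread), is to keep the parser definition at module level but skip real parsing when pytest owns sys.argv:

```python
# comfy/cli_args.py (sketch)
import argparse
import sys

parser = argparse.ArgumentParser()
# ... existing add_argument() calls ...

if "pytest" in sys.modules:
    # Under pytest, sys.argv belongs to pytest; parse an empty argv so that
    # importing this module never swallows flags like --co.
    args = parser.parse_args([])
else:
    args = parser.parse_args()
```

With something along these lines in place, `pytest --co` and the other built-in pytest flags would no longer be rejected by ComfyUI's parser at import time.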

@M1kep
Contributor

M1kep commented Sep 5, 2023

What sort of runtime speeds are you seeing in GHA?

I see some of the test workflows are using 40 steps; I feel these runs could take hours.

@guill
Contributor

guill commented Sep 6, 2023

Right now, it looks like these tests only check that the workflow completed (and returned a non-black result). There's definitely value in that, but I'm not sure it'll be able to serve as a basis for a full testing suite. In order to get full coverage, we'll want:

  1. Comparison of image results with 'expected' image results (possibly via a perceptual hash to avoid bloating the repository).
  2. Manual review of image results when they differ.
  3. The ability to determine whether caching functioned properly.

IMO, before merging this, we should have simple examples of those test cases. (In particular, I'm concerned about this code's ability to generalize to 2 without creating a second entirely parallel testing system.)
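A perceptual-hash comparison along the lines of item 1 above could be quite small. A sketch using the third-party imagehash package (the hash function and distance threshold are placeholders that would need tuning):

```python
import imagehash
from PIL import Image

def images_match(expected_path: str, actual_path: str, max_distance: int = 4) -> bool:
    """Compare two images by perceptual hash instead of storing full reference images."""
    expected_hash = imagehash.phash(Image.open(expected_path))
    actual_hash = imagehash.phash(Image.open(actual_path))
    # Subtracting two hashes gives the Hamming distance between them; small
    # distances mean perceptually similar images.
    return expected_hash - actual_hash <= max_distance
```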

@ltdrdata
Collaborator

ltdrdata commented Sep 6, 2023

> Right now, it looks like these tests only check that the workflow completed (and returned a non-black result). There's definitely value in that, but I'm not sure it'll be able to serve as a basis for a full testing suite. In order to get full coverage, we'll want:
>
> 1. Comparison of image results with 'expected' image results (possibly via a perceptual hash to avoid bloating the repository).
> 2. Manual review of image results when they differ.
> 3. The ability to determine whether caching functioned properly.
>
> IMO, before merging this, we should have simple examples of those test cases. (In particular, I'm concerned about this code's ability to generalize to 2 without creating a second entirely parallel testing system.)

We should be able to set up a structure to compare reference images and result images by pixel differences.
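A pixel-difference structure like that might look roughly as follows (mean absolute difference over RGB values; the tolerance is a placeholder that would need calibrating against real renders):

```python
import numpy as np
from PIL import Image

def pixel_difference(reference_path: str, result_path: str) -> float:
    """Mean absolute per-pixel difference between a reference image and a test result."""
    ref = np.asarray(Image.open(reference_path).convert("RGB"), dtype=np.float32)
    out = np.asarray(Image.open(result_path).convert("RGB"), dtype=np.float32)
    assert ref.shape == out.shape, "reference and result images must have the same size"
    return float(np.abs(ref - out).mean())

# Example threshold; small sampler nondeterminism should stay well below it.
# assert pixel_difference("ref.png", "result.png") < 2.0
```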

enzymezoo-code and others added 3 commits September 6, 2023 15:50
* Add image comparison tests

* Comparison tests do not pass with empty metadata

* Ensure tests are run in correct order

* Save image files  with test name

* Update tests readme
@enzymezoo-code
Contributor Author

> Right now, it looks like these tests only check that the workflow completed (and returned a non-black result). There's definitely value in that, but I'm not sure it'll be able to serve as a basis for a full testing suite. In order to get full coverage, we'll want:
>
> 1. Comparison of image results with 'expected' image results (possibly via a perceptual hash to avoid bloating the repository).
> 2. Manual review of image results when they differ.
> 3. The ability to determine whether caching functioned properly.
>
> IMO, before merging this, we should have simple examples of those test cases. (In particular, I'm concerned about this code's ability to generalize to 2 without creating a second entirely parallel testing system.)

You're right! A full set of tests needs all those things.

Update:

  1. I've added image comparisons with tests/compare/test_quality.py
  2. Manual review: test_quality.py creates a metrics.md file, which shows which tests passed and which did not. It also creates a folder of grid images to compare "ground truth" to the newly generated image.
  3. Determine whether caching functioned properly: @guill Do you have suggestions on where to start with this?

@enzymezoo-code
Contributor Author

@M1kep Good suggestion. I've reduced the default step count in the tests to 20.

I estimate they'll take about half an hour to run through all the sampler and scheduler combinations (on a 3090 with xformers only).

@guill
Contributor

guill commented Sep 7, 2023

> > Right now, it looks like these tests only check that the workflow completed (and returned a non-black result). There's definitely value in that, but I'm not sure it'll be able to serve as a basis for a full testing suite. In order to get full coverage, we'll want:
> >
> > 1. Comparison of image results with 'expected' image results (possibly via a perceptual hash to avoid bloating the repository).
> > 2. Manual review of image results when they differ.
> > 3. The ability to determine whether caching functioned properly.
> >
> > IMO, before merging this, we should have simple examples of those test cases. (In particular, I'm concerned about this code's ability to generalize to 2 without creating a second entirely parallel testing system.)
>
> You're right! A full set of tests needs all those things.
>
> Update:
>
> 1. I've added image comparisons with tests/compare/test_quality.py
> 2. Manual review: test_quality.py creates a metrics.md file, which shows which tests passed and which did not. It also creates a folder of grid images to compare "ground truth" to the newly generated image.
> 3. Determine whether caching functioned properly: @guill Do you have suggestions on where to start with this?

Awesome! This is coming along great.

Right now, detecting caching requires listening to the websocket events. You can do it one of two ways:

  1. Use the `execution_cached` message to detect which nodes are cached.
  2. Use the `executing` message to detect which nodes are not cached.

If I can be a little selfish, I would vote for the latter. (In my execution model refactor PR, we won't know whether nodes within components can be cached until those components are instantiated.)
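A rough sketch of the second approach, tracking `executing` messages to infer which nodes actually ran (the message shapes follow ComfyUI's /ws protocol as I understand it; the surrounding client/prompt plumbing is illustrative rather than the PR's actual fixture):

```python
import json
import websocket  # from the websocket-client package


def collect_executed_nodes(ws: websocket.WebSocket) -> set:
    """Drain websocket messages for one prompt and return the ids of nodes that executed.

    Any node in the submitted graph that never shows up here was served from cache.
    """
    executed = set()
    while True:
        out = ws.recv()
        if not isinstance(out, str):
            continue  # skip binary preview frames
        message = json.loads(out)
        if message.get("type") == "executing":
            node = message["data"].get("node")
            if node is None:  # node == None signals the prompt has finished
                break
            executed.add(node)
    return executed


# Usage sketch: open the socket with ws.connect(f"ws://{server}/ws?clientId={client_id}")
# *before* POSTing the prompt to /prompt with the same client_id, then call
# collect_executed_nodes(ws) and compare the result against the node ids in the graph.
```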

@guill
Contributor

guill commented Sep 7, 2023

Also, this doesn't necessarily need to be decided in this PR, but I'd like to propose implementing tests using a more code-oriented method of creating graphs rather than exporting the json from the front-end. I think it's going to be a lot easier to deal with in git history, a lot easier to parameterize (such as where you're currently hard-coding the sampler node IDs), and a lot easier for us to reuse common subgraphs to keep tests terse.

As an example, here's what your current test json would look like using the GraphBuilder from my branch (https://github.com/guill/ComfyUI/blob/node_expansion/comfy/graph_utils.py):

```python
from comfy import graph_utils  # GraphBuilder lives in comfy/graph_utils.py on that branch

# SAMPLERS[i] and comfy_client below come from the existing test module
# (the hard-coded sampler list and the server/client fixture).
graph = graph_utils.GraphBuilder(prefix="")

# Initial Sampling
loader = graph.node("CheckpointLoaderSimple", ckpt_name="sd_xl_base_1.0.safetensors")
latent = graph.node("EmptyLatentImage", width=1024, height=1024, batch_size=1)
prompt = graph.node("CLIPTextEncode", text="a photo of a cat", clip=loader.out(1))
# We can have comments too!
# For example, we're explicitly using this instead of another CLIPTextEncode node to test _____.
negative_prompt = graph.node("ConditioningZeroOut", conditioning=prompt.out(0))
sampler = graph.node("KSamplerAdvanced",
    add_noise="enable",
    noise_seed=42,
    steps=20,
    cfg=7.5,
    sampler_name=SAMPLERS[i],
    scheduler="normal",
    start_at_step=0,
    end_at_step=32,
    return_with_leftover_noise="enable",
    model=loader.out(0),
    positive=prompt.out(0),
    negative=negative_prompt.out(0),
    latent_image=latent.out(0))

# Refining
rloader = graph.node("CheckpointLoaderSimple", ckpt_name="sd_xl_refiner_1.0.safetensors")
rprompt = graph.node("CLIPTextEncode", text="a photo of a cat", clip=rloader.out(1))
rnegative_prompt = graph.node("CLIPTextEncode", text="", clip=rloader.out(1))
rsampler = graph.node("KSamplerAdvanced",
    add_noise="disable",
    noise_seed=42,
    steps=20,
    cfg=7.5,
    sampler_name=SAMPLERS[i],
    scheduler="normal",
    start_at_step=32,
    end_at_step=10000,
    return_with_leftover_noise="disable",
    model=rloader.out(0),
    positive=rprompt.out(0),
    negative=rnegative_prompt.out(0),
    latent_image=sampler.out(0))
decode = graph.node("VAEDecode", samples=rsampler.out(0), vae=loader.out(2))

comfy_client.get_images(graph=graph.finalize(), save=False)
```

(Dry-coded, so sorry if there are any missing commas or the like.)

The GraphBuilder itself is pretty small/simple, and could probably just be pulled into this PR if we want to go that route.
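For what it's worth, a builder-based graph also drops straight into pytest parametrization. A minimal sketch, assuming a hypothetical build_sdxl_graph() helper that wraps the snippet above, a stand-in SAMPLERS list, and the comfy_client fixture used in the existing tests:

```python
import pytest

SAMPLERS = ["euler", "dpmpp_2m", "uni_pc_bh2"]  # stand-in for the hard-coded list


@pytest.mark.parametrize("sampler_name", SAMPLERS)
def test_sdxl_sampler(comfy_client, sampler_name):
    # build_sdxl_graph is a hypothetical helper wrapping the GraphBuilder code above,
    # with the sampler name threaded through instead of SAMPLERS[i].
    graph = build_sdxl_graph(sampler_name)
    images = comfy_client.get_images(graph=graph.finalize(), save=False)
    assert len(images) > 0
```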

* Add build test github workflow
@comfyanonymous
Owner

3039b08

This should fix the issue with the args.

@guill
Contributor

guill commented Sep 16, 2023

@enzymezoo-code Do you plan on continuing to push this PR through? If not, I may try to do so myself -- I'm looking to get some unit tests running against my branch (with comparisons against mainline) and this PR looks like a great start.

@enzymezoo-code
Contributor Author

@comfyanonymous Can we merge this in?

@comfyanonymous comfyanonymous changed the base branch from master to temp September 19, 2023 03:17
@comfyanonymous comfyanonymous merged commit 26cd840 into comfyanonymous:temp Sep 19, 2023
HorusElohim pushed a commit to HorusElohim/PyfyUI that referenced this pull request Sep 22, 2023
* Add inference tests

* Clean up

* Rename test graph file

* Add readme for tests

* Separate server fixture

* test file name change

* Assert images are generated

* Clean up comments

* Add __init__.py so tests can run with command line `pytest`

* Fix command line args for pytest

* Loop all samplers/schedulers in test_inference.py

* Ci quality workflows compare (comfyanonymous#1)

* Add image comparison tests

* Comparison tests do not pass with empty metadata

* Ensure tests are run in correct order

* Save image files  with test name

* Update tests readme

* Reduce step counts in tests to ~halve runtime

* Ci quality workflows build (comfyanonymous#2)

* Add build test github workflow
rklaffehn pushed a commit to rklaffehn/ComfyUI that referenced this pull request Nov 14, 2023