Ci quality workflows #1423
Conversation
Another issue: the example error message:
The best solution would be to fix the underlying import, but since test_inference.py only uses this import statement to get a list of samplers and schedulers, I will hard-code those lists for now.
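(For illustration only: a minimal sketch of what that hard-coding plus pytest parametrization might look like. The list contents and test body below are placeholders, not the actual contents of test_inference.py.)

```python
import pytest

# Hard-coded subsets of sampler/scheduler names (illustrative placeholders,
# not necessarily the lists used in test_inference.py).
SAMPLERS = ["euler", "euler_ancestral", "dpmpp_2m", "uni_pc_bh2"]
SCHEDULERS = ["normal", "karras", "exponential"]

@pytest.mark.parametrize("sampler_name", SAMPLERS)
@pytest.mark.parametrize("scheduler", SCHEDULERS)
def test_sampler_scheduler_combination(sampler_name, scheduler):
    # Placeholder body: the real test would build a graph with these values
    # and submit it to a running ComfyUI server.
    assert sampler_name and scheduler
```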
What sort of runtimes are you seeing in GHA? I see some of the test workflows are using 40 steps; I worry these runs could take hours.
Right now, it looks like these tests only check that the workflow completed (and returned a non-black result). There's definitely value in that, but I'm not sure it'll be able to serve as a basis for a full testing suite. In order to get full coverage, we'll want:
IMO, before merging this, we should have simple examples of those test cases. (In particular, I'm concerned about this code's ability to generalize to 2 without creating a second entirely parallel testing system.)
We should be able to set up a structure to compare reference images and result images by pixel differences.
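(A rough sketch of such a pixel-difference comparison, assuming both images are saved to disk and using Pillow/NumPy; the tolerance value is an arbitrary placeholder.)

```python
import numpy as np
from PIL import Image

def images_match(reference_path, result_path, tolerance=0.01):
    """Return True if the mean absolute pixel difference is within tolerance."""
    ref = np.asarray(Image.open(reference_path).convert("RGB"), dtype=np.float32)
    res = np.asarray(Image.open(result_path).convert("RGB"), dtype=np.float32)
    if ref.shape != res.shape:
        return False
    # Mean absolute difference, normalized to [0, 1].
    diff = np.abs(ref - res).mean() / 255.0
    return diff <= tolerance
```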
…to ci-quality-workflows
* Add image comparison tests
* Comparison tests do not pass with empty metadata
* Ensure tests are run in correct order
* Save image files with test name
* Update tests readme
You're right! A full set of tests needs all those things. Update:
@M1kep Good suggestion. I've reduced the default step count in the tests to 20. I estimate they'll take about half an hour to run through all the sampler and scheduler combinations (on a 3090 with xformers only).
Awesome! This is coming along great. Right now, detecting caching requires listening to the websocket events. You can do it one of two ways:
If I can be a little selfish, I would vote for the latter. (In my execution model refactor PR, we won't actually be able to know whether nodes within components can be cached until those components are instantiated.)
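(For reference, listening for cache events over the websocket could look roughly like this. It assumes the server emits execution_cached and executing messages on its /ws endpoint and uses the websocket-client package; the exact message shapes are my reading of the protocol, not code from this PR.)

```python
import json
import uuid
import websocket  # websocket-client package

def collect_cached_nodes(server="127.0.0.1:8188"):
    """Listen on the websocket and record which node IDs the server reports as cached."""
    ws = websocket.WebSocket()
    ws.connect(f"ws://{server}/ws?clientId={uuid.uuid4()}")
    cached = set()
    while True:
        raw = ws.recv()
        if isinstance(raw, (bytes, bytearray)):
            continue  # binary frames (e.g. previews) are not JSON events
        message = json.loads(raw)
        if message["type"] == "execution_cached":
            cached.update(message["data"]["nodes"])
        elif message["type"] == "executing" and message["data"]["node"] is None:
            break  # a null "node" signals that execution has finished
    ws.close()
    return cached
```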
Also, this doesn't necessarily need to be decided in this PR, but I'd like to propose implementing tests using a more code-oriented method of creating graphs rather than exporting the json from the front-end. I think it's going to be a lot easier to deal with in git history, a lot easier to parameterize (such as where you're currently hard-coding the sampler node IDs), and a lot easier for us to reuse common subgraphs to keep tests terse. As an example, here's what your current test json would look like using the GraphBuilder from my branch (https://github.com/guill/ComfyUI/blob/node_expansion/comfy/graph_utils.py):

```python
graph = graph_utils.GraphBuilder(prefix="")

# Initial Sampling
loader = graph.node("CheckpointLoaderSimple", ckpt_name="sd_xl_base_1.0.safetensors")
latent = graph.node("EmptyLatentImage", width=1024, height=1024, batch_size=1)
prompt = graph.node("CLIPTextEncode", text="a photo of a cat", clip=loader.out(1))
# We can have comments too!
# For example, we're explicitly using this instead of another CLIPTextEncode node to test _____.
negative_prompt = graph.node("ConditioningZeroOut", conditioning=prompt.out(0))
sampler = graph.node("KSamplerAdvanced",
                     add_noise="enable",
                     noise_seed=42,
                     steps=20,
                     cfg=7.5,
                     sampler_name=SAMPLERS[i],
                     scheduler="normal",
                     start_at_step=0,
                     end_at_step=32,
                     return_with_leftover_noise="enable",
                     model=loader.out(0),
                     positive=prompt.out(0),
                     negative=negative_prompt.out(0),
                     latent_image=latent.out(0))

# Refining
rloader = graph.node("CheckpointLoaderSimple", ckpt_name="sd_xl_refiner_1.0.safetensors")
rprompt = graph.node("CLIPTextEncode", text="a photo of a cat", clip=rloader.out(1))
rnegative_prompt = graph.node("CLIPTextEncode", text="", clip=rloader.out(1))
rsampler = graph.node("KSamplerAdvanced",
                      add_noise="disable",
                      noise_seed=42,
                      steps=20,
                      cfg=7.5,
                      sampler_name=SAMPLERS[i],
                      scheduler="normal",
                      start_at_step=32,
                      end_at_step=10000,
                      return_with_leftover_noise="disable",
                      model=rloader.out(0),
                      positive=rprompt.out(0),
                      negative=rnegative_prompt.out(0),
                      latent_image=sampler.out(0))

decode = graph.node("VAEDecode", samples=rsampler.out(0), vae=loader.out(2))
comfy_client.get_images(graph=graph.finalize(), save=False)
```

(Dry-coded, so sorry if there are any missing commas or the like.) The GraphBuilder itself is pretty small/simple, and could probably just be pulled into this PR if we want to go that route.
* Add build test github workflow
This should fix the issue with the args.
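(Side note, in case "the args" refers to pytest's command-line options: custom options are usually registered in conftest.py via pytest_addoption. A sketch with hypothetical option names, not necessarily the ones this PR uses:)

```python
# conftest.py (sketch): register options so tests can target a running server.
import pytest

def pytest_addoption(parser):
    parser.addoption("--listen", default="127.0.0.1",
                     help="Address of the ComfyUI server under test (hypothetical option)")
    parser.addoption("--port", type=int, default=8188,
                     help="Port of the ComfyUI server under test (hypothetical option)")

@pytest.fixture
def server_address(request):
    return f"{request.config.getoption('--listen')}:{request.config.getoption('--port')}"
```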
@enzymezoo-code Do you plan on continuing to push this PR through? If not, I may try to do so myself -- I'm looking to get some unit tests running against my branch (with comparisons against mainline) and this PR looks like a great start.
@comfyanonymous Can we merge this in? |
* Add inference tests
* Clean up
* Rename test graph file
* Add readme for tests
* Separate server fixture
* test file name change
* Assert images are generated
* Clean up comments
* Add __init__.py so tests can run with command line `pytest`
* Fix command line args for pytest
* Loop all samplers/schedulers in test_inference.py
* Ci quality workflows compare (comfyanonymous#1)
* Add image comparison tests
* Comparison tests do not pass with empty metadata
* Ensure tests are run in correct order
* Save image files with test name
* Update tests readme
* Reduce step counts in tests to ~halve runtime
* Ci quality workflows build (comfyanonymous#2)
* Add build test github workflow
This lays the foundation for a testing suite. Pytest was chosen for its simplicity, extensibility, and scalability.
The file tests/inference/test_inference.py is an e2e test that generates images and asserts that they are not blank.
The plan for these automated tests is to orchestrate them through GitHub workflows, ensuring code changes are thoroughly tested prior to merging.
Currently, a few of these tests fail (e.g. sampler="uni_pc_bh2" and scheduler="exponential"). We will either need to make these tests pass or decide to limit which sampler+scheduler combinations are fully supported.
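(To illustrate the "not blank" assertion described above, a stripped-down version of such a check might look like the following; the helper and threshold are placeholders rather than the actual test_inference.py code.)

```python
import numpy as np
from PIL import Image

def assert_image_not_blank(image_path):
    """Fail if a generated image is (near-)uniform, e.g. an all-black output."""
    pixels = np.asarray(Image.open(image_path).convert("RGB"))
    # A blank/black image has essentially zero variance across all pixels.
    assert pixels.std() > 1.0, f"{image_path} appears to be blank"
```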