Commit Graph

15189 Commits

Author SHA1 Message Date
Ryan Dick
20acfc9a00 Raise in CustomEmbedding and CustomGroupNorm if a patch is applied. 2024-12-28 20:49:17 +00:00
Ryan Dick
918f541af8 Add unit test for a SetParameterLayer patch applied to a CustomFluxRMSNorm layer. 2024-12-28 20:44:48 +00:00
Ryan Dick
93e76b61d6 Add CustomFluxRMSNorm layer. 2024-12-28 20:33:38 +00:00
Ryan Dick
f692e217ea Add patch support to CustomConv1d and CustomConv2d (no unit tests yet). 2024-12-27 22:23:17 +00:00
Ryan Dick
f2981979f9 Get custom layer patches working with all quantized linear layer types. 2024-12-27 22:00:22 +00:00
Ryan Dick
ef970a1cdc Add support for FluxControlLoRALayer in CustomLinear layers and add a unit test for it. 2024-12-27 21:00:47 +00:00
Ryan Dick
5ee7405f97 Add more unit tests for custom module LoRA patching: multiple LoRAs and ConcatenatedLoRALayers. 2024-12-27 19:47:21 +00:00
Ryan Dick
e24e386a27 Add support for patches to CustomModuleMixin and add a single unit test (more to come). 2024-12-27 18:57:13 +00:00
Ryan Dick
b06d61e3c0 Improve custom layer wrap/unwrap logic. 2024-12-27 16:29:48 +00:00
Ryan Dick
7d6ab0ceb2 Add a CustomModuleMixin class with a flag for enabling/disabling autocasting (since it incurs some runtime speed overhead). 2024-12-26 20:08:30 +00:00
Ryan Dick
9692a36dd6 Use a fixture to parameterize tests in test_all_custom_modules.py so that a fresh instance of the layer under test is initialized for each test. 2024-12-26 19:41:25 +00:00
Ryan Dick
b0b699a01f Add unit test to test that isinstance(...) behaves as expected with custom module types. 2024-12-26 18:45:56 +00:00
Ryan Dick
a8b2c4c3d2 Add inference tests for all custom module types (i.e. to test autocasting from cpu to device). 2024-12-26 18:33:46 +00:00
Ryan Dick
03944191db Split test_autocast_modules.py into separate test files to mirror the source file structure. 2024-12-24 22:29:11 +00:00
Ryan Dick
987c9ae076 Move custom autocast modules to separate files in a custom_modules/ directory. 2024-12-24 22:21:31 +00:00
Ryan Dick
6d7314ac0a Consolidate the LayerPatching patching modes into a single implementation. 2024-12-24 15:57:54 +00:00
Ryan Dick
80db9537ff Rename model_patcher.py -> layer_patcher.py. 2024-12-24 15:57:54 +00:00
Ryan Dick
6f926f05b0 Update apply_smart_model_patches() so that layer restore matches the behavior of non-smart mode. 2024-12-24 15:57:54 +00:00
Ryan Dick
61253b91f1 Enable LoRAPatcher.apply_smart_lora_patches(...) throughout the stack. 2024-12-24 15:57:54 +00:00
Ryan Dick
0148512038 (minor) Rename num_layers -> num_loras in unit tests. 2024-12-24 15:57:54 +00:00
Ryan Dick
d0f35fceed Add test_apply_smart_lora_patches_to_partially_loaded_model(...). 2024-12-24 15:57:54 +00:00
Ryan Dick
cefcb340d9 Add LoRAPatcher.smart_apply_lora_patches() 2024-12-24 15:57:54 +00:00
Ryan Dick
0fc538734b Skip flaky test when running on GitHub Actions, and further reduce peak unit test memory. 2024-12-24 14:32:11 +00:00
Ryan Dick
7214d4969b Workaround a weird quirk of QuantState.to() and add a unit test to exercise it. 2024-12-24 14:32:11 +00:00
Ryan Dick
a83a999b79 Reduce peak memory used for unit tests. 2024-12-24 14:32:11 +00:00
Ryan Dick
f8a6accf8a Fix bitsandbytes imports to avoid ImportErrors on macOS. 2024-12-24 14:32:11 +00:00
Ryan Dick
f8ab414f99 Add CachedModelOnlyFullLoad to mirror the CachedModelWithPartialLoad for models that cannot or should not be partially loaded. 2024-12-24 14:32:11 +00:00
Ryan Dick
c6795a1b47 Make CachedModelWithPartialLoad work with models that have non-persistent buffers. 2024-12-24 14:32:11 +00:00
Ryan Dick
0a8fc74ae9 Add CachedModelWithPartialLoad to manage partially-loaded models using the new autocast modules. 2024-12-24 14:32:11 +00:00
Ryan Dick
dc54e8763b Add CustomInvokeLinearNF4 to enable CPU -> GPU streaming for InvokeLinearNF4 layers. 2024-12-24 14:32:11 +00:00
Ryan Dick
1b56020876 Add CustomInvokeLinear8bitLt layer for device streaming with InvokeLinear8bitLt layers. 2024-12-24 14:32:11 +00:00
Ryan Dick
3f990393a1 Simplify the state management in InvokeLinear8bitLt and add unit tests. This is in preparation for wrapping it to support streaming of weights from cpu to gpu. 2024-12-24 14:32:11 +00:00
Ryan Dick
97d56f7dc9 Add torch module autocast unit test for GGUF-quantized models. 2024-12-24 14:32:11 +00:00
Ryan Dick
fe0ef2c27c Add torch module autocast utilities. 2024-12-24 14:32:11 +00:00
Ryan Dick
65fcbf5f60 Bump bitsandbytes. The new version contains improvements to state_dict loading/saving for LLM.int8 and promises improved speed on some HW. 2024-12-24 14:32:11 +00:00
Ryan Dick
d3916dbdb6 Partial Loading PR1: Tidy ModelCache (#7492) 2024-12-24 09:30:44 -05:00
## Summary

This PR tidies up the model cache code in preparation for further
refactoring to support partial loading of models onto the GPU. **These
code changes should not change the functional behavior in any way.**

Changes:
- Remove the `ModelCacheBase` class. `ModelCache` is the only
implementation, so there is no benefit to the separate abstract class.
- Split `CacheRecord` and `CacheStats` out into their own files.
- Remove the `ModelLocker` class. This extra layer of indirection was not providing any benefit. Locking is now done directly with the `ModelCache` (see the sketch after this list).
- Tidy up relative imports that were contributing to circular import
issues.
- Pull the 'submodel' concern out of the `ModelCache`. The `ModelCache`
should not need to be aware of the model manager submodel system.
- Delete unused properties from the `ModelCache` (e.g.
`.lazy_offloading`, `.storage_device`, etc.)
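
To make the `ModelLocker` removal concrete, here is a minimal sketch of locking handled directly by the cache object. The class and method names below (`ModelCacheSketch`, `put`, `lock`, `unlock`) are assumptions made for illustration, not the actual InvokeAI `ModelCache` API.

```python
# Illustrative sketch only: ModelCacheSketch and its put/lock/unlock methods are
# assumptions made for this example, not the real InvokeAI ModelCache interface.
import threading
from typing import Any


class ModelCacheSketch:
    """Toy cache where lock/unlock live on the cache itself (no ModelLocker wrapper)."""

    def __init__(self) -> None:
        self._records: dict[str, Any] = {}
        self._mutex = threading.Lock()
        self._locked_keys: set[str] = set()

    def put(self, key: str, model: Any) -> None:
        with self._mutex:
            self._records[key] = model

    def lock(self, key: str) -> Any:
        # Callers previously went through a separate ModelLocker object; here the
        # cache itself returns the model and records that it is in use, so it is
        # not offloaded while locked.
        with self._mutex:
            self._locked_keys.add(key)
            return self._records[key]

    def unlock(self, key: str) -> None:
        # Mark the model as no longer in use so the cache may offload it again.
        with self._mutex:
            self._locked_keys.discard(key)


if __name__ == "__main__":
    cache = ModelCacheSketch()
    cache.put("sd1.5-unet", object())
    model = cache.lock("sd1.5-unet")
    # ... run inference with `model` here ...
    cache.unlock("sd1.5-unet")
```

Collapsing the wrapper this way keeps the lock bookkeeping in one place, in line with the PR's stated goal of removing indirection before the partial-loading refactor.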

## QA Instructions

I ran smoke tests with a variety of SD1, SDXL and FLUX models. No change
to behavior is expected.

## Merge Plan


## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
2024-12-24 09:30:44 -05:00
Ryan Dick
55b13c1da3 (minor) Add TODO comment regarding the location of get_model_cache_key(). 2024-12-24 14:23:19 +00:00
Ryan Dick
7dc3e0fdbe Get rid of ModelLocker. It was an unnecessary layer of indirection. 2024-12-24 14:23:18 +00:00
Ryan Dick
a39bcf7e85 Move lock(...) and unlock(...) logic from ModelLocker to the ModelCache and make a bunch of ModelCache properties/methods private. 2024-12-24 14:23:18 +00:00
Ryan Dick
a7c72992a6 Pull get_model_cache_key(...) out of ModelCache. The ModelCache should not be concerned with implementation details like the submodel_type. 2024-12-24 14:23:18 +00:00
Ryan Dick
d30a9ced38 Rename model_cache_default.py -> model_cache.py. 2024-12-24 14:23:18 +00:00
Ryan Dick
e0bfa6157b Remove ModelCacheBase. 2024-12-24 14:23:18 +00:00
Ryan Dick
83ea6420e2 Move CacheStats to its own file. 2024-12-24 14:23:18 +00:00
Ryan Dick
ce11a1952e Move CacheRecord out to its own file. 2024-12-24 14:23:18 +00:00
Ryan Dick
e48dee4c4a Rip out ModelLockerBase. 2024-12-24 14:23:18 +00:00
Simon Fuhrmann
712674b6dd Add Stereogram Nodes to communityNodes.md 2024-12-23 13:51:53 -05:00
psychedelicious
de0043f443 docs: update download links for launcher 2024-12-23 13:23:14 +11:00
Riku
d21506da6f feat(ci): add typegen check workflow 2024-12-22 06:05:17 +11:00
psychedelicious
a49894901a docs: fix installation docs home again 2024-12-20 17:35:50 +11:00
psychedelicious
e7e26c8a93 docs: fix installation docs home 2024-12-20 17:12:44 +11:00