feat(mm): default hashing algo to blake3_single

For SSDs, `blake3` is about 10x faster than `blake3_single` - roughly 30 files/second vs 3 files/second.

For spinning HDDs, `blake3` is about 100x slower than `blake3_single` - 300 seconds/file vs 3 seconds/file.

For external drives, `blake3` is always slower than `blake3_single`, but the gap is highly variable. For external spinning drives, it's probably far worse than for internal ones.

The least offensive algorithm is `blake3_single`, and it's still _much_ faster than any other algorithm.
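
For context, the two algorithms produce the same digest and differ only in how they read the file. Below is a minimal sketch using the `blake3` Python package; it is illustrative, not a copy of InvokeAI's internal hasher, and the function names are made up for this example.

```python
# Sketch of the two modes being compared: parallel memory-mapped vs single-threaded
# sequential reads. Same hash either way, very different I/O pattern.
from pathlib import Path

from blake3 import blake3


def hash_parallel(path: Path) -> str:
    """`blake3`: memory-mapped, multi-threaded. Fast on SSDs, thrashes spinning disks."""
    hasher = blake3(max_threads=blake3.AUTO)
    hasher.update_mmap(path)
    return hasher.hexdigest()


def hash_single_threaded(path: Path) -> str:
    """`blake3_single`: plain sequential reads. The safe default on any disk."""
    hasher = blake3()
    with open(path, "rb") as f:
        while chunk := f.read(2**20):  # 1 MiB chunks
            hasher.update(chunk)
    return hasher.hexdigest()
```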
psychedelicious 2024-03-21 17:38:46 +11:00
parent 61520dfb86
commit 7726d312e1
5 changed files with 12 additions and 12 deletions

@@ -119,19 +119,19 @@ The provided token will be added as a `Bearer` token to the network requests to
#### Model Hashing
-Models are hashed during installation, providing a stable identifier for models across all platforms. The default algorithm is `blake3`, with a multi-threaded implementation.
-If your models are stored on a spinning hard drive, we suggest using `blake3_single`, the single-threaded implementation. The hashes are the same, but it's much faster on spinning disks.
+Models are hashed during installation, providing a stable identifier for models across all platforms. Hashing is a one-time operation.
```yaml
hashing_algorithm: blake3_single
```
-Model hashing is a one-time operation, but it may take a couple minutes to hash a large model collection. You may opt out of model hashing entirely by setting the algorithm to `random`.
+You might want to change this setting, depending on your system:
-```yaml
-hashing_algorithm: random
-```
+- `blake3_single` (default): Single-threaded - best for spinning HDDs, still OK for SSDs
+- `blake3`: Parallelized, memory-mapped implementation - best for SSDs, terrible for spinning disks
+- `random`: Skip hashing entirely - fastest but of course no hash
+During the first startup after upgrading to v4, all of your models will be hashed. This can take a few minutes.
Most common algorithms are supported, like `md5`, `sha256`, and `sha512`. These are typically much, much slower than `blake3`.
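
For comparison, the `md5`/`sha256`/`sha512` options mentioned above go through Python's `hashlib`. A rough sketch of what that path looks like; the function name and chunk size are illustrative:

```python
import hashlib
from pathlib import Path


def hashlib_file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Chunked, single-threaded hash via hashlib. Works for md5/sha256/sha512,
    but is much slower than blake3 on multi-gigabyte model files."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while chunk := f.read(2**20):
            hasher.update(chunk)
    return hasher.hexdigest()
```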

@@ -191,7 +191,7 @@ class InvokeAIAppConfig(BaseSettings):
node_cache_size: int = Field(default=512, description="How many cached nodes to keep in memory.")
# MODEL INSTALL
-hashing_algorithm: HASHING_ALGORITHMS = Field(default="blake3", description="Model hashing algorthim for model installs. 'blake3' is best for SSDs. 'blake3_single' is best for spinning disk HDDs. 'random' disables hashing, instead assigning a UUID to models. Useful when using a memory db to reduce model installation time, or if you don't care about storing stable hashes for models. Alternatively, any other hashlib algorithm is accepted, though these are not nearly as performant as blake3.")
+hashing_algorithm: HASHING_ALGORITHMS = Field(default="blake3_single", description="Model hashing algorthim for model installs. 'blake3' is best for SSDs. 'blake3_single' is best for spinning disk HDDs. 'random' disables hashing, instead assigning a UUID to models. Useful when using a memory db to reduce model installation time, or if you don't care about storing stable hashes for models. Alternatively, any other hashlib algorithm is accepted, though these are not nearly as performant as blake3.")
remote_api_tokens: Optional[list[URLRegexTokenPair]] = Field(default=None, description="List of regular expression and token pairs used when downloading models from URLs. The download URL is tested against the regex, and if it matches, the token is provided in as a Bearer token.")
# fmt: on
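
From the user's side, the change only affects what an untouched config resolves to. A hedged sketch, assuming `InvokeAIAppConfig` can be instantiated directly like any other pydantic-settings class; the import path is an assumption:

```python
from invokeai.app.services.config import InvokeAIAppConfig  # import path assumed

config = InvokeAIAppConfig()
assert config.hashing_algorithm == "blake3_single"  # the new default from this commit

# Models on an SSD? Opt back in to the parallel hasher explicitly:
ssd_config = InvokeAIAppConfig(hashing_algorithm="blake3")
```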

@@ -61,7 +61,7 @@ class ModelHash:
"""
def __init__(
-self, algorithm: HASHING_ALGORITHMS = "blake3", file_filter: Optional[Callable[[str], bool]] = None
+self, algorithm: HASHING_ALGORITHMS = "blake3_single", file_filter: Optional[Callable[[str], bool]] = None
) -> None:
self.algorithm: HASHING_ALGORITHMS = algorithm
if algorithm == "blake3":

@@ -114,7 +114,7 @@ class ModelProbe(object):
@classmethod
def probe(
-cls, model_path: Path, fields: Optional[Dict[str, Any]] = None, hash_algo: HASHING_ALGORITHMS = "blake3"
+cls, model_path: Path, fields: Optional[Dict[str, Any]] = None, hash_algo: HASHING_ALGORITHMS = "blake3_single"
) -> AnyModelConfig:
"""
Probe the model at model_path and return its configuration record.
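
A call-site sketch following the classmethod signature shown above; the model path and import path are illustrative:

```python
from pathlib import Path

from invokeai.backend.model_manager.probe import ModelProbe  # import path assumed

# hash_algo now defaults to "blake3_single"; it can still be overridden per call.
model_config = ModelProbe.probe(Path("/path/to/model.safetensors"))
ssd_model_config = ModelProbe.probe(Path("/path/to/model.safetensors"), hash_algo="blake3")
```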

@@ -20,8 +20,8 @@ parser.add_argument(
parser.add_argument(
"--hash_algo",
type=str,
default="blake3",
help=f"Hashing algorithm to use (default: blake3), one of: {algos}",
default="blake3_single",
help=f"Hashing algorithm to use (default: blake3_single), one of: {algos}",
)
args = parser.parse_args()
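
For completeness, the `algos` string interpolated into the help text above could be built from the hashing-algorithm literal; the `get_args` approach and the exact set of names here are assumptions, not necessarily what the script does:

```python
from typing import Literal, get_args

# Illustrative stand-in for the real HASHING_ALGORITHMS literal.
HASHING_ALGORITHMS = Literal["blake3", "blake3_single", "random", "md5", "sha256", "sha512"]

algos = ", ".join(get_args(HASHING_ALGORITHMS))
print(algos)  # blake3, blake3_single, random, md5, sha256, sha512
```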