
⚡️ Speed up _to_str() by 38% in rich/filesize.py #26

Open · wants to merge 1 commit into base: master

Conversation


@codeflash-ai codeflash-ai bot commented Jul 3, 2024

📄 _to_str() in rich/filesize.py

📈 Performance improved by 38% (1.38× faster)

⏱️ Runtime went down from 198 microseconds to 144 microseconds

Explanation and details

Here is an optimized version of the _to_str function.

Here are the changes made to optimize the runtime (a sketch of how they could look follows the list).

  1. Reorganized the loop: instead of using enumerate, unit is multiplied by base on each iteration, which avoids recomputing base ** i.
  2. Integrated the formatting logic directly into the loop to avoid recalculation and save an iteration.
  3. Added an extra check at the end to handle cases where the size exceeds the largest unit.
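The diff itself isn't rendered on this page, so the following is only a rough sketch of what those three changes could look like when applied to rich.filesize._to_str (the signature is slightly simplified, and the fallback behaviour for sizes beyond the largest unit is an assumption, since the description doesn't spell it out):

from typing import Iterable

def _to_str(
    size: int,
    suffixes: Iterable[str],
    base: int,
    *,
    precision: int = 1,
    separator: str = " ",
) -> str:
    """Convert a size in bytes to a human-readable string (sketch only)."""
    if size == 1:
        return "1 byte"
    elif size < base:
        return f"{size:,} bytes"

    unit = base * base  # threshold for the first suffix
    last_suffix = ""
    for suffix in suffixes:
        if size < unit:
            # (2) Format directly inside the loop: no second pass is needed.
            return f"{base * size / unit:,.{precision}f}{separator}{suffix}"
        unit *= base  # (1) Multiply instead of recomputing base ** i
        last_suffix = suffix
    # (3) Extra check: the size exceeds the largest unit, so clamp to the last suffix.
    return f"{base * size / (unit // base):,.{precision}f}{separator}{last_suffix}"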

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 18 Passed − 🌀 Generated Regression Tests

# imports
import pytest  # used for our unit tests

# function to test
from rich.filesize import _to_str

# unit tests

def test_basic_functionality():
    # Single Byte
    assert _to_str(1, ["KB", "MB", "GB"], 1024) == "1 byte"
    # Less than Base
    assert _to_str(500, ["KB", "MB", "GB"], 1024) == "500 bytes"
    # Exact Base
    assert _to_str(1024, ["KB", "MB", "GB"], 1024) == "1.0 KB"

def test_precision_handling():
    # Default Precision
    assert _to_str(1536, ["KB", "MB", "GB"], 1024) == "1.5 KB"
    # Custom Precision
    assert _to_str(1536, ["KB", "MB", "GB"], 1024, precision=2) == "1.50 KB"
    # Zero Precision
    assert _to_str(1536, ["KB", "MB", "GB"], 1024, precision=0) == "2 KB"

def test_separator_handling():
    # Default Separator
    assert _to_str(1536, ["KB", "MB", "GB"], 1024) == "1.5 KB"
    # Custom Separator
    assert _to_str(1536, ["KB", "MB", "GB"], 1024, separator="") == "1.5KB"
    # Multiple Character Separator
    assert _to_str(1536, ["KB", "MB", "GB"], 1024, separator=" | ") == "1.5 | KB"

def test_large_sizes():
    # Terabytes
    assert _to_str(1099511627776, ["KB", "MB", "GB", "TB"], 1024) == "1.0 TB"
    # Petabytes
    assert _to_str(1125899906842624, ["KB", "MB", "GB", "TB", "PB"], 1024) == "1.0 PB"

def test_edge_cases():
    # Zero Size
    assert _to_str(0, ["KB", "MB", "GB"], 1024) == "0 bytes"
    # Negative Size
    assert _to_str(-1024, ["KB", "MB", "GB"], 1024) == "0 bytes"
    # Empty Suffixes
    assert _to_str(1024, [], 1024) == ""

def test_non_standard_bases():
    # Decimal Base
    assert _to_str(1000, ["KB", "MB", "GB"], 1000) == "1.0 KB"
    # Custom Base
    assert _to_str(625, ["KB", "MB", "GB"], 5) == "25.0 KB"

def test_large_scale():
    # Large Data Sample
    assert _to_str(10**18, ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"], 1024) == "909.5 PB"
    # Performance with Large Iterables
    assert _to_str(1024, ["KB"] * 1000, 1024) == "1.0 KB"

def test_invalid_inputs():
    # Invalid Size Type
    with pytest.raises(TypeError):
        _to_str("1024", ["KB", "MB", "GB"], 1024)
    # Invalid Suffixes Type
    with pytest.raises(TypeError):
        _to_str(1024, "KB, MB, GB", 1024)
    # Invalid Base Type
    with pytest.raises(TypeError):
        _to_str(1024, ["KB", "MB", "GB"], "1024")

# Run the tests
if __name__ == "__main__":
    pytest.main()

🔘 (none found) − ⏪ Replay Tests

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) on Jul 3, 2024
@iusedmyimagination

This could be legit. Certainly f-strings are faster than format. And that's a lot of passed tests. (see #28).
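For reference, that f-string vs str.format() gap is easy to check in isolation. A minimal timeit comparison (not part of this PR; the numbers will vary by machine and Python version) might look like:

import timeit

# Time the kind of formatting _to_str does, once with str.format and once
# with an equivalent f-string.
setup = 'value = 1.5; suffix = "KB"'
fmt_time = timeit.timeit('"{:,.1f} {}".format(value, suffix)', setup=setup, number=1_000_000)
fstr_time = timeit.timeit('f"{value:,.1f} {suffix}"', setup=setup, number=1_000_000)
print(f"str.format: {fmt_time:.3f}s  f-string: {fstr_time:.3f}s")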
