Converting dataclasses to and from fixed-length binary data using Python's stdlib module `struct`

harrymander/dataclasses-struct

dataclasses-struct

License: MIT

A simple Python package that combines dataclasses with struct for packing and unpacking Python dataclasses to fixed-length bytes representations.

from typing import Annotated  # use typing_extensions on Python <3.9
import dataclasses_struct as dcs

@dcs.dataclass()
class Test:
    x: int  # or dcs.I64, i.e., a signed 64-bit integer
    y: float  # or dcs.F64, i.e., a double-precision (64-bit) floating point
    z: dcs.U8  # unsigned 8-bit integer
    s: Annotated[bytes, 10]  # fixed-length byte array of length 10

@dcs.dataclass()
class Container:
    test1: Test
    test2: Test

>>> dcs.is_dataclass_struct(Test)
True
>>> t1 = Test(100, -0.25, 0xff, b'12345')
>>> dcs.is_dataclass_struct(t1)
True
>>> t1
Test(x=100, y=-0.25, z=255, s=b'12345')
>>> packed = t1.pack()
>>> packed
b'd\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xd0\xbf\xff12345\x00\x00\x00\x00\x00'
>>> Test.from_packed(packed)
Test(x=100, y=-0.25, z=255, s=b'12345\x00\x00\x00\x00\x00')
>>> t2 = Test(1, 100, 12, b'hello, world')
>>> c = Container(t1, t2)
>>> Container.from_packed(c.pack())
Container(test1=Test(x=100, y=-0.25, z=255, s=b'12345\x00\x00\x00\x00\x00'), test2=Test(x=1, y=100.0, z=12, s=b'hello, wor'))
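
For comparison, the session above can be approximated with the standard library alone. This is only a sketch: the format string `@qdB10s` is inferred from the field types of `Test` and the 27-byte output shown above, not taken from the package itself.

```python
import struct

# Assumed layout for Test under native alignment:
# signed 64-bit int, double, unsigned 8-bit int, 10-byte array.
FORMAT = "@qdB10s"

packed = struct.pack(FORMAT, 100, -0.25, 0xFF, b"12345")
x, y, z, s = struct.unpack(FORMAT, packed)

# struct pads short byte strings with null bytes and truncates long ones,
# which is why b'hello, world' comes back as b'hello, wor' above.
truncated = struct.unpack("10s", struct.pack("10s", b"hello, world"))[0]

print(len(packed), (x, y, z, s), truncated)
```

The dataclass wrapper removes the need to keep the format string and the field order in sync by hand.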

Installation

This package is available on PyPI:

pip install dataclasses-struct

To work correctly with mypy, a plugin is required; add the following to your mypy.ini:

[mypy]
plugins = dataclasses_struct.ext.mypy_plugin

Usage

from typing import Annotated  # use typing_extensions on Python <3.9
import dataclasses_struct as dcs

endians = (
    dcs.NATIVE_ENDIAN_ALIGNED,  # uses system endianness and alignment
    dcs.NATIVE_ENDIAN,  # system endianness, packed representation
    dcs.LITTLE_ENDIAN,
    dcs.BIG_ENDIAN,
    dcs.NETWORK_ENDIAN,
)

@dcs.dataclass(endians[0])  # if no endian provided, defaults to NATIVE_ENDIAN_ALIGNED
class Test:

    # Single char type (must be bytes)
    single_char: dcs.Char
    single_char_alias: bytes  # alias for Char

    # Boolean
    bool_1: dcs.Bool
    bool_2: bool  # alias for Bool

    # Integers
    int8: dcs.I8
    uint8: dcs.U8
    int16: dcs.I16
    uint16: dcs.U16
    int32: dcs.I32
    uint32: dcs.U32
    uint64: dcs.U64
    int64: dcs.I64
    int64_alias: int  # alias for I64

    # Only supported with NATIVE_ENDIAN_ALIGNED
    unsigned_size: dcs.Size
    signed_size: dcs.SSize
    pointer: dcs.Pointer

    # Floating point types
    single_precision: dcs.F32  # equivalent to float in C
    double_precision: dcs.F64  # equivalent to double in C
    double_precision_alias: float  # alias for F64

    # Byte arrays: values shorter than size will be padded with b'\x00'
    array: Annotated[bytes, 100]  # an array of length 100

    # Pad bytes can be added before and after fields: a b'\x00' will be
    # inserted for each pad byte.
    pad_before: Annotated[int, dcs.PadBefore(4)]
    pad_after: Annotated[int, dcs.PadAfter(2)]
    pad_before_and_after: Annotated[int, dcs.PadBefore(3), dcs.PadAfter(2)]

# Also supports nesting dataclass-structs
@dcs.dataclass(endians[0])  # endianness of contained classes must match
class Container:
    contained1: Test

    # supports PadBefore and PadAfter as well:
    contained2: Annotated[Test, dcs.PadBefore(10)]
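
The restriction on `Size`, `SSize` and `Pointer` mirrors the standard `struct` module, where the `N`, `n` and `P` codes exist only in native mode. A small stdlib-only sketch of that behaviour (the `supported` helper is hypothetical and used only for illustration):

```python
import struct

# Native mode ("@") knows the platform's size_t, ssize_t and void* sizes.
native_sizes = {code: struct.calcsize("@" + code) for code in "NnP"}


def supported(fmt: str) -> bool:
    """Return True if struct accepts the given format string."""
    try:
        struct.calcsize(fmt)
    except struct.error:
        return False
    return True


print(native_sizes)
print(supported("@N"), supported("=N"), supported("<P"))
```

The native-only codes are rejected as soon as an explicit byte order such as `=` or `<` is selected.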

Decorated classes are transformed to a standard Python dataclass with boilerplate __init__, __repr__, __eq__ etc. auto-generated. Additionally, two methods are added to the class: pack, a method for packing an instance of the class to bytes, and from_packed, a class method that returns a new instance of the class from its packed bytes representation.
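
To illustrate what pack and from_packed save you from writing, here is a hand-rolled equivalent using only `dataclasses` and `struct`. It is a minimal sketch, not the package's implementation; the `Point` class and its `_STRUCT` attribute are hypothetical names chosen for the example.

```python
import struct
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class Point:  # hypothetical example class, not part of the package
    x: int
    y: float

    # Little-endian, unaligned: signed 64-bit int followed by a double.
    # ClassVar keeps this out of the generated dataclass fields.
    _STRUCT: ClassVar[struct.Struct] = struct.Struct("<qd")

    def pack(self) -> bytes:
        """Pack this instance into its fixed-length bytes representation."""
        return self._STRUCT.pack(self.x, self.y)

    @classmethod
    def from_packed(cls, data: bytes) -> "Point":
        """Build a new instance from bytes produced by pack()."""
        return cls(*cls._STRUCT.unpack(data))


p = Point(3, 1.5)
data = p.pack()
restored = Point.from_packed(data)
print(len(data), restored)
```

Every new field here needs a matching change to the format string, which is the bookkeeping the decorator derives from the type annotations.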

A class or object can be checked to see if it is a dataclass-struct using the is_dataclass_struct function. The get_struct_size function returns the size in bytes of the packed representation of a dataclass-struct class or an instance of one.

An additional class attribute, __dataclass_struct__, is also added to the class. Through it, the struct format string, packed size, and endianness can be accessed like so:

>>> Test.__dataclass_struct__.format
'@cc??bBhHiIQqqNnPfdd100s4xqq2x3xq2x'
>>> Test.__dataclass_struct__.size
234
>>> Test.__dataclass_struct__.endianness
'@'
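
The format string follows the conventions of the standard `struct` module, so its pieces can be explored directly. The sketch below uses only the stdlib to show how the mode character affects alignment and how pad bytes (`x`) appear in packed data; the exact aligned size depends on the platform.

```python
import struct

# "@" uses native alignment, so padding may be inserted before the int64;
# "=" uses native byte order with standard sizes and no alignment.
aligned_size = struct.calcsize("@bq")
packed_size = struct.calcsize("=bq")

# "x" is an explicit pad byte. PadBefore and PadAfter show up in the
# format string above as runs such as "4xq" and "q2x".
padded = struct.pack("=4xb2x", 7)

print(aligned_size, packed_size, padded)
```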

Default attribute values will be validated against their expected type and allowable value range. For example,

import dataclasses_struct as dcs

@dcs.dataclass()
class Test:
    x: dcs.U8 = -1

will raise a ValueError. This can be disabled by passing validate=False to the dataclasses_struct.dataclass decorator.
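
The value ranges involved are those of the underlying `struct` codes. As a rough illustration, plain `struct` only rejects an out-of-range value when it is packed, raising `struct.error`. The `fits_u8` helper below is hypothetical and does not reflect how the package performs its own validation.

```python
import struct


def fits_u8(value: int) -> bool:
    """Return True if value can be packed as an unsigned 8-bit integer."""
    try:
        struct.pack("B", value)
    except struct.error:
        return False
    return True


print(fits_u8(0), fits_u8(255), fits_u8(-1), fits_u8(256))
```

Validating defaults when the class is defined surfaces this kind of mistake earlier than the first call to pack.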

Development and contributing

Pull requests are welcomed!

This project uses Poetry as its build system. To install all dependencies (including development dependencies) into a virtualenv for local development:

poetry install --with dev

Tests are run with pytest:

poetry run pytest

(Omit the poetry run if the Poetry virtualenv is activated.)

Linting uses ruff and flake8 and is enforced on pull requests:

poetry run ruff check .
poetry run flake8

See pyproject.toml for the list of enabled checks. I recommend installing the provided pre-commit hooks to ensure new commits pass linting:

pre-commit install

This will help speed up pull requests by reducing the chance of failing CI checks.

PRs must also pass mypy checks (poetry run mypy).
