Make memory mapped behavior match read_samples #60

Open
Teque5 opened this issue May 30, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@Teque5
Collaborator

Teque5 commented May 30, 2024

The current implementation for reading samples is a bit quirky: indexing the memory-mapped samples of a file gives different results than read_samples whenever those samples need to be scaled.

Consider the case where we read the sigmf logo from the main repository. This is a 2-channel real-valued audio file with samples stored as 16-bit integers.

>>> import sigmf
>>> logo = sigmf.sigmffile.fromfile('sigmf_logo')

>>> logo.read_samples(count=3)
array([[-3.0517578e-05,  0.0000000e+00],
       [ 6.1035156e-05,  0.0000000e+00],
       [-6.1035156e-05,  0.0000000e+00]], dtype=float32)

>>> logo[0:3]
memmap([[-1,  0],
        [ 2,  0],
        [-2,  0]], dtype=int16)

This happens because read_samples applies the scale factor and converts to float32, while indexing the file object returns the raw int16 memory map with no scaling applied.
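To make the mismatch concrete, here is a minimal NumPy-only sketch that reproduces the two outputs above. The scale factor of 1/2**15 is an assumption inferred from the printed values (-1 maps to -3.0517578e-05); it is not copied from the library's code.

```python
import numpy as np

# Raw int16 samples as they sit on disk (values from the logo example above).
raw = np.array([[-1, 0], [2, 0], [-2, 0]], dtype=np.int16)

# Assumed scale for 16-bit fixed-point data: full scale maps to +/-1.0.
scale = np.float32(1.0 / 2**15)

# What the memory map hands back: unscaled integers.
unscaled_view = raw

# What read_samples appears to hand back: a scaled float32 copy.
scaled = raw.astype(np.float32) * scale

print(unscaled_view.dtype, scaled.dtype)
print(scaled)
```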

I'm not sure what the best solution is, but I think we should fix #15 at the same time, since it will require tinkering with the same code.

Solutions I propose:

  1. Leave as-is
  2. When accessing the memory map of a file that requires scaling, return a copy of the data instead (probably by using read_samples)
  3. When accessing a memory map, return a scale parameter along with the data? Or maybe emit a warning?
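
As a rough illustration of option 2, the sketch below wraps a memory map so that indexing returns a scaled float32 copy. `ScaledMemmap`, its constructor arguments, and the demo file name are all hypothetical and are not part of sigmf; the 1/2**15 scale is the same assumption as in the logo example.

```python
import os
import tempfile

import numpy as np


class ScaledMemmap:
    """Hypothetical wrapper: indexing returns a scaled float32 copy."""

    def __init__(self, path, dtype, num_channels, scale):
        # Read-only memory map, reshaped to (samples, channels).
        self._mm = np.memmap(path, dtype=dtype, mode="r").reshape(-1, num_channels)
        self._scale = np.float32(scale)

    def __getitem__(self, key):
        # The dtype conversion forces a copy, so the result is a plain
        # ndarray that no longer refers to the file.
        return np.asarray(self._mm[key], dtype=np.float32) * self._scale


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.sigmf-data")
    np.array([[-1, 0], [2, 0], [-2, 0]], dtype=np.int16).tofile(path)

    view = ScaledMemmap(path, np.int16, num_channels=2, scale=1 / 2**15)
    first = view[0:3]
    del view  # release the mapping before the directory is removed

print(first)
```

The trade-off is that slicing no longer gives a zero-copy view, which is presumably why option 3 (return the raw data plus a scale, or warn) is also on the table.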

I believe fixing #15 requires using the offset kwarg of np.memmap.
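
For reference, this is how the offset kwarg of np.memmap behaves in isolation. The file layout here (a 16-byte made-up header followed by int16 samples) is purely illustrative and is not meant to describe what #15 actually involves.

```python
import os
import tempfile

import numpy as np

header = b"HDR!" * 4  # 16 bytes of made-up non-sample data
samples = np.arange(6, dtype=np.int16)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "with_header.bin")
    with open(path, "wb") as handle:
        handle.write(header)
        handle.write(samples.tobytes())

    # offset is given in bytes; the mapping starts right after the header.
    mm = np.memmap(path, dtype=np.int16, mode="r", offset=len(header))
    recovered = np.array(mm)  # copy out so the data outlives the mapping
    del mm

print(recovered)
```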

@Teque5 Teque5 added the bug Something isn't working label May 30, 2024
@liambeguin
Contributor

Hi @Teque5, I've run into the same kind of problem with sigmf archives. I was hoping #42 was going to fix this, but it didn't.

On my end, the problem is that functions like read_samples_in_capture() assume there is a data_file on disk so they can call things like os.path.getsize(). IMO it would be really nice to rework/consolidate SigMFFile.__init__, set_data_file(), and _read_datafile() so that they turn the user's input (a file, a buffer, or any other type) into a single internal representation of the data (maybe _memmap?). Each accessor could then use that single representation and return whatever is needed.
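
A very rough sketch of what that normalization step could look like. `to_internal_array` is a hypothetical helper, not an existing sigmf function, and a real version would take the dtype and channel count from the metadata rather than as arguments.

```python
import os

import numpy as np


def to_internal_array(source, dtype, num_channels):
    """Hypothetical helper: turn a path, a byte buffer, or an
    array-like into one (samples, channels) array."""
    if isinstance(source, (str, os.PathLike)):
        # File on disk: memory-map it read-only.
        data = np.memmap(source, dtype=dtype, mode="r")
    elif isinstance(source, (bytes, bytearray, memoryview)):
        # In-memory buffer (e.g. pulled out of an archive): no file needed.
        data = np.frombuffer(source, dtype=dtype)
    else:
        # Anything else that NumPy can interpret as an array.
        data = np.asarray(source, dtype=dtype)
    return data.reshape(-1, num_channels)


buffer = np.array([-1, 0, 2, 0, -2, 0], dtype=np.int16).tobytes()
from_buffer = to_internal_array(buffer, np.int16, num_channels=2)
from_array = to_internal_array([[-1, 0], [2, 0], [-2, 0]], np.int16, num_channels=2)

# Accessors can use the array's own size instead of os.path.getsize().
print(from_buffer.shape, from_buffer.nbytes)
```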

This might also help support loading a non-conforming dataset?
Let me know what you think. I don't have a lot of time to spare on this, but I could try to help out.

2 participants