
float type matrices? #4

Closed
cdeterman opened this issue Apr 13, 2015 · 14 comments

@cdeterman
Contributor

Given that the focus of this package is using memory as efficiently as possible, I believe it should include big.matrix objects of type float. There are situations in which single precision is sufficient, and a float matrix needs only about half the space of a double matrix. I know R has no single precision data type, but since all the heavy lifting is done in C++, this seems approachable. However, I would like additional opinions on the following points before I begin writing a lot of code.

  1. Naturally, do you agree that float type matrices should be part of this package?
  2. The current structure uses matrix_type to represent the byte size of each type. Extending this to float would create a conflict in various case statements, since float and int are both 4 bytes. The only solution I could come up with off the top of my head is to add another field to the big.matrix object that identifies the data type rather than the byte size, e.g. pMat->matrix_data_type() returning a string ('int', 'float', 'double', etc.). This would require updating other code, such as the Rcpp Gallery posts, unless a more elegant solution can be conceived (a rough sketch follows after this list).
  3. Approaches would likely involve typeid from <typeinfo>, unless we also want to begin moving toward the C++11 standard, where we could use the newer decltype keyword. That may be a moot point here, but it is worth starting to think about C++11.

Any thoughts are appreciated :)
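
To make point 2 concrete, here is a rough sketch (not anything in the package today) of dispatching on a string-valued type tag; matrix_data_type() and the other names here are hypothetical:

    // Sketch only: dispatch on a string-valued type tag instead of the
    // byte size. None of these names exist in the package yet.
    #include <string>
    #include <stdexcept>

    template <typename T>
    double first_element(void* data) {
        return static_cast<double>(static_cast<T*>(data)[0]);
    }

    double dispatch_on_type(const std::string& type, void* data) {
        if (type == "int")    return first_element<int>(data);
        if (type == "float")  return first_element<float>(data);
        if (type == "double") return first_element<double>(data);
        throw std::invalid_argument("unknown element type: " + type);
    }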

@cdeterman
Contributor Author

As I experiment with this (I think I can avoid the additional field mentioned above), do you have any existing code to check the size of the actual matrix? I have tried a few simple queries I thought would work, but neither returns the correct size. I want to get this working correctly with double type matrices first.

With a 1000x1000 matrix:

Calling it directly on the matrix the object points to returns 8, but that appears to be just the data type size:

    Rcpp::XPtr<BigMatrix> pMat(bigMatAddr);
    // sizeof here measures the value matrix() returns, not the mapped data
    return Rcpp::wrap(sizeof(pMat->matrix()));

Trying to use the matrix accessor (returns 40?):

    Rcpp::XPtr<BigMatrix> pMat(bigMatAddr);
    MatrixAccessor<double> accMat(*pMat);
    // sizeof here measures the accessor object itself, not the data it wraps
    return Rcpp::wrap(sizeof(accMat));
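
For comparison, a back-of-the-envelope estimate would just multiply the dimensions by the element width. A minimal sketch, assuming the usual BigMatrix accessors nrow(), ncol(), and matrix_type() (which currently holds the element width in bytes):

    // Rough estimate only: dimensions times element width in bytes.
    // Assumes the nrow(), ncol(), and matrix_type() (byte width) accessors.
    #include <Rcpp.h>
    #include <bigmemory/BigMatrix.h>
    // [[Rcpp::depends(BH, bigmemory)]]

    // [[Rcpp::export]]
    double EstimatedSize(SEXP bigMatAddr) {
        Rcpp::XPtr<BigMatrix> pMat(bigMatAddr);
        double bytes = static_cast<double>(pMat->nrow()) *
                       pMat->ncol() * pMat->matrix_type();
        return bytes;  // ~8,000,000 for a 1000x1000 double matrix
    }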

@kaneplusplus
Owner

Supporting floats would be nice and I support any movement toward C++11. It feels like there is still a lot of code we could eliminate by adding modern C++ features.

I think biganalytics uses stable versions of calculations for things like variance so it seems like almost all of the work would be managing the element types.

Would the linear algebra operations then be handled by armadillo?

WRT sizeof: your sizeof(accMat) returns 40, which is the size of the 4 index_types plus one pointer (all 8 bytes each). We could keep track of the size of the memory mapping in a big.matrix when it is allocated. That way a user would not need to resort to finding the backing file and checking its size manually.
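
For intuition, a minimal stand-in for where the 40 comes from on a 64-bit build (illustrative field names, not the actual MatrixAccessor layout):

    // Illustrative only: four 8-byte index fields plus one 8-byte pointer.
    #include <cstddef>

    struct AccessorLike {
        std::ptrdiff_t totalRows, totalCols, rowOffset, colOffset;  // 4 x 8 bytes
        double* data;                                               // 1 x 8 bytes
    };
    // sizeof(AccessorLike) == 40 on a typical 64-bit build, matching the number above.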

@cdeterman
Contributor Author

Armadillo has float types (e.g. fmat, fvec), so that would work very easily. It would be nice to have the size tracked. Do you have a method in mind to accomplish that? Otherwise, do you know how to check the size manually from these objects? I want to confirm that the float type is actually being applied.
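
As a sketch of the armadillo side, assuming RcppArmadillo plus a float-typed, in-memory (contiguous) big.matrix wrapped by MatrixAccessor<float> (arma::fmat is armadillo's single-precision matrix):

    // Sketch only: hand a float-backed big.matrix to armadillo without copying.
    #include <RcppArmadillo.h>
    #include <bigmemory/BigMatrix.h>
    #include <bigmemory/MatrixAccessor.hpp>
    // [[Rcpp::depends(RcppArmadillo, BH, bigmemory)]]

    // [[Rcpp::export]]
    double FloatSumExample(SEXP bigMatAddr) {
        Rcpp::XPtr<BigMatrix> pMat(bigMatAddr);
        MatrixAccessor<float> acc(*pMat);
        // Wrap the mapped data in an fmat; the final 'false' avoids a copy.
        arma::fmat A(acc[0], pMat->nrow(), pMat->ncol(), false);
        return arma::accu(A);  // sum of all elements, computed in single precision
    }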

@kaneplusplus
Owner

It'll need to be added to the Create* functions in BigMatrix.cpp. The calls to truncate and ftruncate calculate the size. One thing to note is that the size may not be the same as the amount of physical space being used: on Linux and Windows, "sparse files" are created by default (the Mac file system doesn't have this capability).
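
A minimal sketch of the sizing step being described, not the actual Create* code; on Linux the ftruncate call yields a sparse file, so the reported size can exceed the physical blocks in use:

    // Sketch only (not the real BigMatrix.cpp): size the backing file up front.
    #include <fcntl.h>
    #include <unistd.h>

    int main() {
        const off_t nrow = 1000, ncol = 1000, elementSize = sizeof(float);
        const off_t nbytes = nrow * ncol * elementSize;       // 4,000,000 bytes for float
        int fd = open("example.bin", O_RDWR | O_CREAT, 0644);
        ftruncate(fd, nbytes);  // file reports nbytes; blocks are allocated lazily
        close(fd);
        return 0;
    }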

I can probably get to this on the weekend unless you want to take a look.

@cdeterman
Contributor Author

It would likely be better if you modified the Create* functions; I am less familiar with the Boost methods for shared memory objects. I think I have some working code implementing the float type matrices but no way to validate it. That note about the allocated size not matching the physical space used is a good point. In the simplest scenario I want to at least confirm that the size is smaller for float type matrices. In theory they should be approximately half the size of their double counterparts, and if they aren't, that note at least provides an explanation.

@kaneplusplus
Owner

That's fine. I'll send you a note when it's done. It seems like a nice feature and it provides the sanity check you're looking for to validate float types.

@kaneplusplus
Owner

A new member of BigMatrix has been added, called _allocationSize, which keeps track of the total number of bytes allocated to a BigMatrix object. The value of this member can be retrieved with the allocation_size method.
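
For anyone following along, a sketch of how the new accessor might be queried from the Rcpp side (assuming the allocation_size method described above):

    // Sketch: expose the new accessor so float/double allocations can be compared.
    #include <Rcpp.h>
    #include <bigmemory/BigMatrix.h>
    // [[Rcpp::depends(BH, bigmemory)]]

    // [[Rcpp::export]]
    double GetAllocationSize(SEXP bigMatAddr) {
        Rcpp::XPtr<BigMatrix> pMat(bigMatAddr);
        return static_cast<double>(pMat->allocation_size());
    }

Called on matrices of the same dimensions, this should report roughly half as many bytes for a float matrix as for its double counterpart, which is the sanity check discussed above.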

@kaneplusplus
Owner

OK, I need to be passing a pointer-pointer to the Create* functions. I'll fix it now.

@kaneplusplus
Owner

Yeah, that was it. Sorry and thanks for pointing it out. The fixed version is checked in.

@sritchie73

Nice! Any news on when this version will be available on CRAN?

@cdeterman
Contributor Author

@sritchie73 We are currently working on some additional small fixes (see issues #15 and #16). Once those issues are resolved and R CMD check passes without error, I imagine we will update it on CRAN. For now you can use the dev version from this repo. Did you have any additional thoughts, @kaneplusplus?

@kaneplusplus
Owner

Looking at #15, it doesn't seem like there was a consensus. I pointed out that it's nice when the behavior is similar across all platforms, but I don't like Windows, I think development on Windows is miserable, and I think if you want to do any serious computing you should be on Linux. As a result, we've traditionally done the minimum needed to get Windows building, and I don't mind continuing with that policy.

For #16 I vote for \dontrun. The check environment is mysterious, and beyond checking packages it isn't used.

Are there objections or concerns for either?

@cdeterman
Contributor Author

If CRAN doesn't have a problem with the Windows limitations, I have no issue with moving forward. Windows definitely is a pain for things like this. Within bigmemory we will need \dontrun at the start of most of the examples, but ultimately that isn't really an issue.

@phaverty
Contributor

phaverty commented Jun 2, 2015

I'm OK with doing \dontrun on the troublesome examples. I'm also not super concerned about losing Windows support. People have asked me about it from time to time, but not so much that it is critical to have support.

Pete



kaneplusplus pushed a commit that referenced this issue Mar 26, 2017