Pre-submission Inquiry for QuadratiK #168

Closed
2 of 18 tasks
rmj3197 opened this issue Mar 13, 2024 · 7 comments

Comments

@rmj3197

rmj3197 commented Mar 13, 2024

Submitting Author: Raktim Mukhopadhyay (@rmj3197)
Package Name: QuadratiK
One-Line Description of Package: QuadratiK includes a test for multivariate normality, a test for uniformity on the sphere, non-parametric two- and k-sample tests, random generation of points from the Poisson kernel-based density, and a clustering algorithm for spherical data.
Repository Link (if existing): https://github.com/rmj3197/QuadratiK


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:

Documentation link : https://quadratik.readthedocs.io/en/latest/

We introduce the QuadratiK package, which incorporates innovative data analysis methodologies. The software, implemented in both R and Python, offers a comprehensive set of novel goodness-of-fit tests and clustering techniques based on kernel-based quadratic distances. It implements one-, two-, and k-sample tests for goodness of fit, providing an efficient and mathematically sound way to assess the fit of probability distributions. It also supports tests for uniformity on the $d$-dimensional sphere based on Poisson kernel densities, and algorithms for generating random samples from Poisson kernel densities. Particularly noteworthy is a clustering algorithm tailored for spherical data that leverages a mixture of Poisson kernel-based densities on the sphere. In addition, the software includes graphical functions that help users validate, visualize, and represent clustering results, which enhances the interpretability and usability of the analysis. In summary, our R and Python packages serve as a suite of tools that offer researchers and practitioners the means to delve deeper into their data, draw robust inferences, and conduct potentially impactful analyses across a wide array of disciplines.

Community Partnerships

We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:

Scope


  • Please indicate which category or categories.
    Check out our package scope page to learn more about our
    scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific

  • Geospatial
  • Education

Community Partnerships

If your package is associated with an
existing community please check below:

We are unsure of the categorization of the package. The contents of the package are described in detail below.

  • Who is the target audience and what are the scientific applications of this package?

    • The QuadratiK package offers robust tools for goodness-of-fit testing, a fundamental aspect in statistical analysis, where accurately assessing the fit of probability distributions is essential. This is especially critical in research domains where model accuracy has direct implications on conclusions and further research directions.
    • Spherical data structures are common in fields such as biology, geosciences and astronomy, where data points are naturally mapped to a sphere. QuadratiK provides a tailored approach to effectively handle and interpret these data.
    • This package is also of particular interest to professionals in health and biological sciences, where understanding and interpreting spherical data can be crucial in studies ranging from molecular biology to epidemiology and public health.
  • Are there other Python packages that accomplish similar things? If so, how does yours differ?

    • SciPy and hyppo also provide collections of goodness-of-fit tests. Our package focuses on tests based on the family of kernel-based quadratic distances. The kernels we use are diffusion kernels, that is, probability distributions that depend on a tuning parameter and satisfy the convolution property. We also implement the Poisson kernel-based tests for uniformity on the d-dimensional sphere.

    • We are aware of only a limited number of Python libraries that offer spherical clustering capabilities, such as spherecluster (last updated in November 2018) and soyclustering (last updated in May 2020). spherecluster implements spherical k-means and clustering using von Mises-Fisher distributions, as proposed in Banerjee, Arindam, et al. "Clustering on the Unit Hypersphere using von Mises-Fisher Distributions." Journal of Machine Learning Research 6.9 (2005). soyclustering implements spherical k-means for document clustering, as proposed in Kim, Hyunjoong, Han Kyul Kim, and Sungzoon Cho. "Improving spherical k-means for document clustering: Fast initialization, sparse centroid projection, and efficient cluster labeling." Expert Systems with Applications 150 (2020): 113288.

    • In summary, the fundamental differences between QuadratiK and existing packages are as follows:

      • The GOF tests are U-statistics based on centered kernels. The concept and methodology of centering is unique to our methods and is not part of the methods appearing in existing packages.
      • We provide an algorithm that connects the tuning parameter with the statistical properties of the test, namely power and degrees of freedom (DOF). This feature differentiates our novel methods from methods in other packages.
      • A new clustering algorithm for data that reside on the sphere using the Poisson kernel-based densities is offered. This aspect is not a feature of the existing packages.
      • We also offer algorithms for generating random samples from Poisson kernel-based densities. This capability is also unique to our package.
    • We also implement a GUI, built with the streamlit library, that enables interaction with the software in a non-programmatic manner. We have not found any Python package that implements a GUI for the tasks described above.
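
For readers unfamiliar with the Poisson kernel-based density mentioned above, here is a minimal sketch of how it can be evaluated. It assumes the standard form from the literature, f(x | rho, mu) = (1 - rho^2) / (omega_d * ||x - rho * mu||^d), where omega_d is the surface area of the unit sphere in R^d. The function names `sphere_surface_area` and `poisson_kernel_density` are hypothetical and are not QuadratiK's API; please refer to the package documentation for the actual interface.

```python
import math

import numpy as np


def sphere_surface_area(d: int) -> float:
    """Surface area of the unit sphere S^(d-1) embedded in R^d."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)


def poisson_kernel_density(x, mu, rho: float) -> np.ndarray:
    """Evaluate the Poisson kernel-based density at unit vectors x.

    Hypothetical helper for illustration only (not the QuadratiK API).
    x   : array of shape (n, d), rows assumed to be unit vectors
    mu  : array of shape (d,), unit vector giving the mean direction
    rho : concentration parameter in [0, 1); rho = 0 gives the uniform density
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mu = np.asarray(mu, dtype=float)
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    d = x.shape[1]
    # For unit vectors, ||x - rho * mu||^2 = 1 + rho^2 - 2 * rho * <x, mu>.
    sq_dist = 1.0 + rho**2 - 2.0 * rho * (x @ mu)
    return (1.0 - rho**2) / (sphere_surface_area(d) * sq_dist ** (d / 2.0))


# Usage: density at three points on the circle (d = 2), centered at (1, 0).
points = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
values = poisson_kernel_density(points, mu=[1.0, 0.0], rho=0.5)
```

The density is largest at the mean direction and decreases as points move away from it; larger values of rho concentrate more mass around mu.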

  • Any other questions or issues we should be aware of:

Please see our comment above regarding the category of the software. Does the package fit into one of the technical, specialized domains? Please advise.

P.S. Have feedback/comments about our review process? Leave a comment here

@rmj3197
Author

rmj3197 commented Mar 22, 2024

Hello, I just wanted to ask if anyone had a chance to review the inquiry for this package. Thank you very much for your time and efforts.

@isabelizimm
Contributor

Hello there 👋 Thank you so much for your submission! I'm taking a look now and discussing the scope of this package internally. We hope to get back to you shortly!

@rmj3197
Author

rmj3197 commented Apr 10, 2024

Hello @isabelizimm, thank you very much for looking into our package. I was wondering if you could provide any updates, as it has been about three weeks since your last communication. We are eagerly looking forward to hearing from you. Thank you very much for your time.

@isabelizimm
Contributor

Hello there! To give some clarity on what is happening behind the scenes, pyOpenSci is currently updating its scope around analytic/modeling packages, which will affect the decision around QuadratiK. This is something that we want to do thoughtfully, to make sure we have clear guidelines and support on what that scope should be.

Thank you so much for your patience and your submission! We are wrapping up this process and hope to have our updated scope and decision on QuadratiK to share with you soon.

@rmj3197
Author

rmj3197 commented May 1, 2024

Hello @isabelizimm, could you please inform us of your decision?

It's been a while, and having your decision would be greatly appreciated, as it will assist us in planning our next steps.

Thank you very much for looking into our package.

@Batalex
Contributor

Batalex commented May 2, 2024

Hello @rmj3197! I am Alex, and I have taken up the Editor-in-Chief mantle for now!
We have decided that QuadratiK is in scope for us. Since the documentation highlights the novelty of the clustering algorithm, I just want to make one thing clear: our expertise is in good development and packaging practices, so our review should not be read as a technical endorsement of the approach.
If that works for you, you are welcome to open a new issue referencing this pre-submission inquiry. Thank you for your patience.

@rmj3197
Author

rmj3197 commented May 13, 2024

Thank you very much for looking into our package and for the clarification. We will proceed with the submission now.

@rmj3197 rmj3197 closed this as completed May 13, 2024
@rmj3197 rmj3197 mentioned this issue May 13, 2024