What should be part of stdlib? #1

Open
milancurcic opened this issue Dec 14, 2019 · 36 comments
Labels
meta Related to this repository

Comments

@milancurcic
Member

milancurcic commented Dec 14, 2019

Existing libraries, for inspiration or adoption


First issue in this repo which evolved from this thread. This is a broad, open ended, high-level issue, so feel free to go wide and crazy here.

To propose a specific module, procedure, or derived type, please open a new issue. You can follow the same format as in Fortran Proposals.

Wishlist from upthread

From @apthorpe:


From @FortranFan:

  • Containers
    • string type
    • bitsets
    • Enhanced 'array' types such as vectors, singly-linked and doubly-linked lists, etc.
    • Adapters such as stacks, queues, etc.
    • Associative ones such as dictionaries (maps), hash_sets, etc.
  • Algorithms
    • Generic methods for sort, findloc, etc. that can work with any type, intrinsic and derived,
    • Operations and permutations on a range of elements such as merge/union, difference,
      etc.
  • Utilities
    • Iterator-like facilities which make it easy to work with Containers,
    • Operator (<, >, ==, etc.) and assignment(=) overload abstractions that perhaps make
      the use of standard algorithms more efficient?
    • Miscellaneous other functions, subroutines (like generic swap), datetime, named
      constants, etc.
  • Special
    • Any basic facilities (extensions perhaps to the ABSTRACT INTERFACE block?) needed toward
      "special" functions, such as variadic ones like MAX and MIN in the language
    • Ability to "overload" the array subsection notation for Containers, as standard Fortran
      provides for its 2 built-in containers: arrays and the CHARACTER intrinsic type.
    • Any special mechanisms that can help with improved constructors of arrays/containers
      and derived types ('classes'). I envision certain fundamental 'computer engineering' aspects
      being pursued here that can enable, say, efficient operation on the diagonal of a matrix or
      initialization to an identity matrix; or efficient 'dynamic' construction of 'classes' in Fortran, similar
      to what is achieved near universally using the new keyword in other languages.
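The "generic swap" item above maps directly onto Fortran's generic interfaces. A minimal sketch, assuming a hypothetical module name `swap_sketch` (not an existing stdlib API): one generic name `swap` dispatches to a specific elemental subroutine per type, so it also works element by element on conformable arrays. Only two kinds are covered; a real implementation would generate the remaining kinds, e.g. with a preprocessor.

```fortran
! Hypothetical module; a sketch of a generic swap, not stdlib's API.
module swap_sketch
  use, intrinsic :: iso_fortran_env, only: int32, real64
  implicit none
  private
  public :: swap

  ! One generic name resolves to a specific procedure per argument type.
  interface swap
    module procedure swap_int32, swap_real64
  end interface swap

contains

  ! Elemental, so it also swaps conformable arrays element by element.
  elemental subroutine swap_int32(a, b)
    integer(int32), intent(inout) :: a, b
    integer(int32) :: tmp
    tmp = a
    a = b
    b = tmp
  end subroutine swap_int32

  elemental subroutine swap_real64(a, b)
    real(real64), intent(inout) :: a, b
    real(real64) :: tmp
    tmp = a
    a = b
    b = tmp
  end subroutine swap_real64
end module swap_sketch
```

For example, `call swap(x, y)` with two `real(real64)` arrays of the same shape swaps them element by element.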

From @zbeekman:

  • Strings
    • Conversion to/from integer/real/logical (all kinds of each)
    • Conversion on string concatenation
    • raw string processing functions inspired by Ruby & Python
    • string class to make using all the machinery easier via TBPs
  • Files
    • For now just name manipulations like dirname, basename, etc.
  • OS/Environment integration
    • is_a_tty(), OS%env("HOME"), .envExists. "USER", etc.
  • Unit testing & assertions stuff
    • Subtest summaries w/ color
    • File and line number triggering failures
  • Error Stack class/object
    • Maintain a call-stack
    • Raise errors, but optionally trap them later with good call stack including line number and file
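The `OS%env("HOME")` idea can already be approximated with the Fortran 2003 intrinsic `get_environment_variable`; the awkward part is sizing the result. The sketch below (hypothetical module `env_sketch` and subroutine `get_env`, not an existing API) queries the length first, then allocates a deferred-length string of exactly that size.

```fortran
! Hypothetical module; a sketch of an environment-variable helper.
module env_sketch
  implicit none
  private
  public :: get_env

contains

  ! Returns the value of environment variable `name` in an allocatable
  ! string sized to fit; `found` is .false. if the variable is not set.
  subroutine get_env(name, val, found)
    character(len=*), intent(in) :: name
    character(len=:), allocatable, intent(out) :: val
    logical, intent(out) :: found
    integer :: length, stat

    ! First call: query only the length and the status (0 = exists).
    call get_environment_variable(name, length=length, status=stat)
    found = (stat == 0)
    if (.not. found) then
      val = ''
      return
    end if

    ! Second call: fetch the value into a buffer of exactly that length.
    allocate(character(len=length) :: val)
    if (length > 0) call get_environment_variable(name, value=val)
  end subroutine get_env
end module env_sketch
```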
@FortranFan

@milancurcic wrote:

..

  • Many others -- what did I miss?

Great start, perhaps @tclune et al at NASA with https://github.com/nasa/gFTL can contribute or be an inspiration?

@ivan-pi
Member

ivan-pi commented Dec 15, 2019

The D standard library can also serve as reference/inspiration: https://dlang.org/library/. For many of the D modules there are already some existing open-source modules in Fortran (like dealing with CSV and JSON files, datetime objects, low-level string operations).

Inspired by the D library, I prepared a bunch of functions for checking ASCII characters: https://github.com/ivan-pi/fortran-ascii

@ivan-pi
Member

ivan-pi commented Dec 16, 2019

Several interesting modules are available in the General-Purpose Fortran package (command line arguments, strings, expression parsers, messages, I/O, hot keys, Fortran/C calls, graphics, sorting, unit conversions).

George Benthien has also made some string utilities and expression parsers.

Also, Alan Miller's Fortran software contains many routines that are suitable for a stdlib.

The Rosetta Code Fortran pages contain simple implementations of several algorithms (greatest common divisors, sorting, searching, etc.) and data types (priority queues, decks, linked lists, etc.).

@marshallward

I would like to see greater support for bit-reproducible numerical operations. This is a very high priority for us since our models are used in weather and climate forecasting, and much of our time is devoted to bit reproducibility of our numerical calculations.

A common problem is intrinsic Fortran reduction operations like sum(), where the order is ambiguous (deliberately, one might say), and therefore not reproducible. A more serious problem for us is transcendental functions, like exp() or cos(), which will give different results for different optimizations, and we typically cannot say where it was invoked (libm? Vendor library? etc.).

A standard library may be a place to provide bit-reproducible implementations.
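For the summation half of this problem, one option is to fix the order of additions explicitly instead of relying on the intrinsic `sum()`. The sketch below (hypothetical names `repro_sum_sketch`, `pairwise_sum`, and the cutoff `base_len`) uses pairwise summation, whose grouping depends only on `size(x)`. It assumes the compiler is not allowed to reassociate floating-point additions (for example, no `-ffast-math` with gfortran; with Intel compilers, something like `-fp-model precise` rather than the default), and it does nothing for the transcendental-function problem.

```fortran
! Hypothetical module; a sketch of order-deterministic summation.
module repro_sum_sketch
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: pairwise_sum

  ! Below this length, add sequentially. Hypothetical tuning parameter.
  integer, parameter :: base_len = 8

contains

  ! Sum x in an order fixed by size(x) alone: split in half, recurse,
  ! and add the two partial sums. The grouping is the same on every run
  ! and every compiler, as long as additions are not reassociated.
  recursive function pairwise_sum(x) result(s)
    real(dp), intent(in) :: x(:)
    real(dp) :: s
    integer :: n, m, i

    n = size(x)
    if (n <= base_len) then
      s = 0.0_dp
      do i = 1, n
        s = s + x(i)
      end do
    else
      m = n / 2
      s = pairwise_sum(x(1:m)) + pairwise_sum(x(m+1:n))
    end if
  end function pairwise_sum
end module repro_sum_sketch
```

Pairwise summation also tends to accumulate less rounding error than a plain left-to-right loop, but the point here is only that the order of operations is pinned down in the source.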

@rweed

rweed commented Dec 17, 2019

Fortunately, there is a wealth of libraries etc. we can draw from. Some of the older ones, like SLATEC, are still in F77 but can be converted to F90 free format for consistency. One issue that needs to be resolved, though, is possible license conflicts. Here are a few more suggestions (there are probably a hundred more if we do a deep dive into what's available):

For general mathematical functions etc.

SLATEC/nistCML
https://www.nist.gov/itl/math/software
https://www.netlib.org/slatec
https://people.sc.fsu.edu/~jburkardt/f_src/slatec/slatec.html (F90 translation)

John Burkardt's collection of software at
https://people.sc.fsu.edu/~jburkardt/f_src/f_src.html

For containers/ADTs, I would suggest Robert Rueger's Fortran Template Library at
https://github.com/SCM-NV/ftl
It is similar to @tclune's gFTL, but Rueger's implementations of the various containers, and the way he does the preprocessing step, were easier for me to follow.

Two books with available code that I would suggest are Robin Vowels, "Algorithms and Data Structures in F and Fortran", and Dick Hanson and Tim Hopkins, "Numerical Computing in Modern Fortran". I've implemented some of the sorting routines from both books. In particular, I have a "semi"-generic implementation of Hanson and Hopkins' quickSort routines that supports all the integer types, 32- and 64-bit reals, character strings, and a user type/class. I have my own implementations of several commonly used ADTs based on unlimited polymorphic variables that I can contribute for review, but I need to go back and look at licensing issues since I borrowed ideas from Arjen Markus's FLIBS and Rueger's FTL. Also, I would personally avoid anything related to Numerical Recipes like the plague due to its restrictive license (and poor implementations of some of the algorithms).

@certik
Member

certik commented Dec 17, 2019

@marshallward I created #12 for bit-reproducibility, let's discuss the details there.

@jacobwilliams
Member

One big question is, do we want this library to contain numerical/scientific type codes? For example, ODE solvers, optimizers, interpolation, etc... The sorts of things that were in SLATEC and are in SciPy. A library like that is desperately needed for modern Fortran. Is that this library, or does that belong in another library built upon this one?

@certik
Member

certik commented Dec 18, 2019

@jacobwilliams excellent question. I don't know the answer; we need to discuss it. I am a bit worried that the scope will become too large if we include everything that could potentially be in SciPy.

@milancurcic
Member Author

I am not opposed to numerical and scientific codes being part of stdlib. The scope of Fortran's stdlib doesn't necessarily need to be similar to that of Python, C, or any other language. Fortran is more ubiquitous in science and engineering, and to me it makes sense that the stdlib would have modules similar to numpy and scipy.

@milancurcic
Member Author

One personal challenge I have with stock Fortran are its somewhat awkward and low-level I/O facilities -- open, read, write, inquire, rewind, and close. I often wished for a higher-level interface, like what you get with Python's open() -- you open a file with a function, get a file-like instance with methods that let you do stuff with it.

This would do away with unit numbers, which I don't think application developers should have to deal with. It could also be a solution to the problem that allocatable character strings must be pre-allocated before use in a read statement.

Is there anything similar out there for Fortran? Would this be of interest to people here? I'd use it.
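The allocatable-string problem mentioned above has a well-known workaround: read the record in fixed-size chunks with non-advancing input and the `SIZE=` specifier, appending each chunk until the end-of-record condition is hit. A minimal sketch; the module name `read_line_sketch`, the routine `read_line`, and the 80-character chunk size are assumptions, not an existing API:

```fortran
! Hypothetical module; a sketch of reading a whole line into an
! allocatable character variable without knowing its length in advance.
module read_line_sketch
  use, intrinsic :: iso_fortran_env, only: iostat_eor
  implicit none
  private
  public :: read_line

contains

  ! Read one record from `unit` into `line`, sized exactly to the record.
  ! On return, iostat is 0 on success, or the end-of-file/error code.
  subroutine read_line(unit, line, iostat)
    integer, intent(in) :: unit
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out) :: iostat
    character(len=80) :: buffer   ! chunk size is arbitrary
    integer :: nread

    line = ''
    do
      read(unit, '(a)', advance='no', iostat=iostat, size=nread) buffer
      if (iostat == 0) then
        line = line // buffer          ! buffer filled; record continues
      else if (iostat == iostat_eor) then
        line = line // buffer(1:nread) ! last, possibly partial, chunk
        iostat = 0
        exit
      else
        exit                           ! end of file or a genuine error
      end if
    end do
  end subroutine read_line
end module read_line_sketch
```

A higher-level file object could wrap exactly this kind of routine as a type-bound procedure, so callers never see the chunking.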

@certik
Member

certik commented Dec 18, 2019

If we want to go with this broader scope, then one reasonable proposal can be to limit the scope roughly to what is in here:

https://www.mathworks.com/help/matlab/mathematics.html

Which seems to cover roughly what is in NumPy and SciPy.

If we use the Python analogy, the bare bone Python language does not have much for numerical computing. And if you do any kind of numerical computing in Python (I do), NumPy and SciPy are pretty much the "standard library". Not surprisingly, the default "Matlab standard library" roughly covers the same range.

The Julia standard library (https://github.com/JuliaLang/julia/tree/5da74be66993fb19edce52e4877d8ae2edbe27b0/stdlib, documented at https://docs.julialang.org, in the left column scroll down to "Standard Library") does not cover as wide range, but still includes linear algebra (Lapack), sparse arrays, statistics. It used to contain fft, but they moved it out apparently (https://discourse.julialang.org/t/where-is-the-fft/16512) -- it would be interesting to know the reasoning, as Matlab as well as NumPy has fft by default.

Ok, it's not a bad idea.

@jvdp1
Member

jvdp1 commented Dec 18, 2019

Is there anything similar out there for Fortran? Would this be of interest to people here? I'd use it.

I would use it too.

Something that might be interesting to include in a standard library is sparse arrays (creation, management).

@certik
Member

certik commented Dec 18, 2019

I think we can learn from Julia a lot. Here is the discussion related to moving FFT out of Julia's standard library and into a separate package:

JuliaLang/julia#18389

and apparently they want to also move much of the linear algebra out. See also:

https://groups.google.com/forum/#!topic/julia-users/ug5Jh6y5biA.

JuliaLang/julia#5155

If I understand their arguments, if it's part of the julia compiler itself, it's hard for them to make a release, test things properly on Travis, etc. Applied to Fortran, that would be like moving things from Fortran compilers (gfortran, ifort, ...) into a separate library like this stdlib.

@certik
Member

certik commented Dec 18, 2019

So here are other things that could be part of stdlib:

  • sparse matrices
  • fft
  • special functions (like in SciPy) such as spherical harmonics, hypergeometric functions, ...
  • random numbers
  • statistics
  • ODE solvers and numerical integration (Gauss-Legendre points and weights and other algorithms)
  • optimization (root finding, etc.)

@cmacmackin

One personal challenge I have with stock Fortran are its somewhat awkward and low-level I/O facilities -- open, read, write, inquire, rewind, and close. I often wished for a higher-level interface, like what you get with Python's open() -- you open a file with a function, get a file-like instance with methods that let you do stuff with it.

This would do away with unit numbers, which I don't think application developers should have to deal with. It could also be a solution to the problem that allocatable character strings must be pre-allocated before use in a read statement.

Is there anything similar out there for Fortran? Would this be of interest to people here? I'd use it.

I'd personally like something along these lines. However, the problem is in defining methods on the file object: these would need to know the number and type of arguments at compile time. It would be impractical to write methods for every conceivable permutation of argument types, and it would also require variadic functions, which are not available. As such, this cannot be implemented well in Fortran, although something might be possible if we were to wrap some C routines and pass in deferred-type objects.

@cmacmackin

So here are other things that could be part of stdlib:

* sparse matrices

* fft

* special functions (like in SciPy) such as spherical harmonics, hypergeometric functions, ...

* random numbers

* statistics

* ODE solvers and numerical integration (Gauss-Legendre points and weights and other algorithms)

* optimization (root finding, etc.)

Some sort of interface for working with solvers for dense matrices would also be useful. LAPACK is horribly tedious to use, so an object-oriented wrapper could be handy. This could hold the factored version of the matrix, handle allocation of work arrays, etc. I've written code along these lines in the past.

@certik
Member

certik commented Dec 18, 2019

Some sort of interface for working with solvers for dense matrices would also be useful. LAPACK is horribly tedious to use, so an object-oriented wrapper could be handy. This could hold the factored version of the matrix, handle allocation of work arrays, etc. I've written code along these lines in the past.

Yes, that's already planned, see #10.

@certik
Member

certik commented Dec 18, 2019

@milancurcic why don't you start a separate issue for the IO stuff, so that we can discuss it there.

@zbeekman
Member

One personal challenge I have with stock Fortran are its somewhat awkward and low-level I/O facilities -- open, read, write, inquire, rewind, and close. I often wished for a higher-level interface, like what you get with Python's open() -- you open a file with a function, get a file-like instance with methods that let you do stuff with it.

This would do away with unit numbers, which I don't think application developers should have to deal with. It could also be a solution to the problem that allocatable character strings must be pre-allocated before use in a read statement.

Is there anything similar out there for Fortran? Would this be of interest to people here? I'd use it.

This is one of my primary motivations too. As @cmacmackin pointed out, we may not be able to get a one-to-one mapping of our favorite implementation X of file I/O, but we can certainly make something better than what we have and idiomatically Fortran-like. And where there are very obvious solutions that need to be implemented in the language standard, we can lobby for those.

@milancurcic
Member Author

@zbeekman can you post this message to #14?

@certik
Member

certik commented Dec 23, 2019

Do we all agree that the scope is broader (e.g., Python standard libraries + NumPy/SciPy), rather than narrower (e.g., C++ standard library)?

If so, let's write down in general terms, what the scope is and put it into README. I started at #43. Can you help me polish it up?

@zbeekman
Member

PR looks good to me. I think this is an area that will evolve over time. As such I don't think we need to hash out every detail so long as we ensure things only grow in a good way organically... balance immediate needs with the threat of incurring technical debt and bad design choices.

If we hash things out in too much detail, the documents won't reflect reality. The PR was looking good last I checked, and I'm generally happy with the vast majority of ideas and desires that others have expressed so far.

@zbeekman
Member

A more useful step might be to provide more clarity on governance and workflow since right now the process of deciding when PRs are merged is murky, much less how to decide and agree upon what the grand objectives of the project are.

@milancurcic
Member Author

@ivan-pi In my opinion, yes.

I think every Fortran compiler comes with a companion C compiler, no? Which ships with its libc. So is there really any burden to having C-interfaces in stdlib? If there is, then the interfaces should be optional.

Of course, if we want to support some of libc, we don't have to do so for all of it. Only parts that we decide are needed, as a community.

@jvdp1
Member

jvdp1 commented Apr 15, 2020

Isn't libc linked into Fortran executables anyway? I just compiled a simple Fortran program and printed its shared-object dependencies:

$ more test.f90 
program tmp
 implicit none
 integer::i
 
 i=1
 print*,i
end program
$ 
$ gfortran -O0 test.f90 
$ ldd a.out 
	linux-vdso.so.1 (0x00007ffc2a987000)
	libgfortran.so.5 => /lib64/libgfortran.so.5 (0x00007f7131db2000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f7131c6c000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f7131c52000)
	libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007f7131c08000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f7131a3d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f713208b000)
$ ifort -O0 test.f90 
$ ldd a.out 
	linux-vdso.so.1 (0x00007ffd12d20000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fbe10b2d000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fbe10b0b000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fbe10942000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fbe10928000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fbe1091f000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fbe10caa000)

@milancurcic
Member Author

@jvdp1 GFortran does this because it uses libc to implement parts of Fortran, but I don't know if that's true for most Fortran compilers. If it is, I think that would be a good argument for not disallowing Fortran interfaces to libc in stdlib.

@certik
Member

certik commented Apr 15, 2020

I think we can have C wrappers for any C library, such as in #45. Including libc.

That being said, I think the Fortran language should stand on its own if needed, including compiling without libc.

Having "standard" C interfaces for most common tasks would be very helpful so that people don't have to reimplement those over and over. Perhaps later those could go into its own package, perhaps even called libc, as part of fpm.

@dev-zero

While working on a large code base (CP2K), a repeating and annoying topic is strings in various forms:

  • string lists: still needs a string type AFAIK because you can't put a variable length string inside a variable length list and is therefore very cumbersome without a wrapper for the string, the list or both
  • reading into strings: reading into an allocatable string is not supported, hence we need to buffer manually
  • reading large files as strings: mmap would be nice to avoid copying data unnecessarily, but this results again in an array of characters instead
  • being able to interface fast regex engines like re2c, hyperscan (or simply pcre) would be nice
  • encoding/decoding strings, multibyte/unicode support (although for most high performance codes probably not relevant), I guess @jacobwilliams has some experience here ;-) ... also, the discussion UCS4 vs UTF-8/utf8everywhere might be relevant in this context
  • ANSI support

@jvdp1
Member

jvdp1 commented Apr 24, 2020

While working on a large code base (CP2K), a repeating and annoying topic is strings in various forms:

  • string lists: still needs a string type AFAIK because you can't put a variable length string inside a variable length list and is therefore very cumbersome without a wrapper for the string, the list or both
  • reading into strings: reading into an allocatable string is not supported, hence we need to buffer manually
  • reading large files as strings: mmap would be nice to avoid copying data unnecessarily, but this results again in an array of characters instead

Thank you.
Re: strings I would point to the discussions in #31, #32, and #69.
I think that the not yet covered topics you mention could be discussed there too (or in another issue if too specific?).

@certik
Member

certik commented Apr 24, 2020

@dev-zero thanks for getting in touch. Besides what @jvdp1 posted, see also j3-fortran/fortran_proposals#24, j3-fortran/fortran_proposals#96 and j3-fortran/fortran_proposals#9.

Regarding Unicode, I think we should support Unicode in stdlib and use UTF-8. I also posted at #11 (comment) with links to UTF-8 handling code that is simple enough to port to Fortran / stdlib.

If you want to help us implement any of these things, we would really appreciate it!

@dev-zero

dev-zero commented May 4, 2020

If you want to help us implement any of these things, we would really appreciate it!

Will try, but I'm not sure whether I can spare the time (yet).

Another thing that came to mind and that should be part of a stdlib: compatibility functions for compilers that do not fully implement the standards.

Two examples we encountered in CP2K or DBCSR:

  • newunit argument for open from F2008. If memory serves it was the Cray compiler which was rather late in implementing this one. Providing higher-level OO wrappers will probably make this a moot point, though.
  • findloc from F2008. Implemented in GCC 9.0 & Intel 2018

While missing intrinsics could be provided in a compat module, redefining an intrinsic is probably not doable transparently without using the C preprocessor (CPP).
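As an illustration of such a compat module, here is a fallback for rank-1 integer `findloc` on compilers that lack it. It uses a distinct, hypothetical name (`find_first` in module `compat_sketch`) rather than trying to redefine the intrinsic, and it covers only the `findloc(array, value, dim=1)` form:

```fortran
! Hypothetical compat module; a sketch, not a drop-in findloc replacement.
module compat_sketch
  implicit none
  private
  public :: find_first

contains

  ! Index of the first element of `array` equal to `value`, or 0 if there
  ! is none; mirrors findloc(array, value, dim=1) for rank-1 integers.
  pure function find_first(array, value) result(idx)
    integer, intent(in) :: array(:)
    integer, intent(in) :: value
    integer :: idx
    integer :: i

    idx = 0
    do i = 1, size(array)
      if (array(i) == value) then
        idx = i
        return
      end if
    end do
  end function find_first
end module compat_sketch
```

Keeping a separate name means call sites must change once the intrinsic is available everywhere, which is the trade-off the comment above points at.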

@ivan-pi
Member

ivan-pi commented Jul 17, 2020

I found another FORTRAN 90 Numerical Library (https://sourceforge.net/projects/afnl/) developed by Alberto Ramos. I will edit the first post to include it.

The contents are the following:

  • MODULE NumTypes
  • MODULE Constants
  • MODULE Error
  • MODULE Integration
  • MODULE Optimization
  • MODULE Linear
  • MODULE NonNum
  • MODULE SpecialFunc
  • MODULE Statistics
  • MODULE Polynomial
  • MODULE Root
  • MODULE Fourier
  • MODULE Time

@David-Duffy

I think a good implementation of a data frame would be valuable. That is, a rectangular array where each column has a single homogeneous type (integer, character, etc.), but different columns can be of different types. These exist in R, Pandas, Julia, etc., and are the workhorse for statistical analysis. One will encounter arguments about whether this should all be in a "real database", so that you just need to provide appropriate Fortran interfaces, but the continued success of R, Pandas, etc. is a potent counter. For speed, there do have to be indices and hashes under the bonnet, and optimized sorts, joins, Fortran array type slices, and so on.

awvwgk pushed a commit that referenced this issue Jun 3, 2021
Proposition to generate code for integer, real and string from one single peace of code
jvdp1 pushed a commit that referenced this issue Jun 4, 2021
milancurcic pushed a commit that referenced this issue Aug 22, 2021
Expand unit testing for more cases
jvdp1 pushed a commit that referenced this issue Sep 16, 2021
Minor adjustments to deployment script
@awvwgk awvwgk added the meta Related to this repository label Sep 18, 2021
jvdp1 pushed a commit that referenced this issue Dec 31, 2021
jvdp1 pushed a commit that referenced this issue Jan 17, 2024
Addition of some fypp directives