Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split documentation for different data types into new sections #203

Merged
merged 4 commits into from
Nov 18, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,9 @@ makedocs(
warnonly = is_ci_env,
pages = [
"Home" => "index.md",
"Types" => "types.md",
"Lattices" => "lattices.md",
"Atoms and crystals" => "atoms.md",
"Data grids" => "grids.md",
"File formats" => "filetypes.md",
"API" => Any[
"Lattices" => "api/lattices.md"
Expand Down
109 changes: 109 additions & 0 deletions docs/src/atoms.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
# Atoms

Once a lattice is defined, we can include other data, and atomic coordinates are one of the most
common datasets encountered.

## `NamedAtom`

A `NamedAtom` contains two pieces of information: an atom name and the associated atomic number.
The atom name is a string no longer than 15 codepoints (internally, we use
`InlineStrings.InlineString15` to store this data), and the atomic number is an `Int`.

### Constructors

The default constructor for `NamedAtom` is `NamedAtom(::AbstractString, ::Integer)`. The atom name
can be inferred and specified automatically as the associated symbol if only a number is given.

In the case of only a string being provided, the atomic number may be inferred. This is done by
matching the first letters up to a non-letter character of the string with a known symbol. If no
match is found, the atomic number will be 0, corresponding to a dummy atom (see below).

!!! note
Support for inferring from symbols representing unnamed elements or atomic numbers of those
elements (for instance, element 119, Uue) is not yet implemented.

### Sorting

`NamedAtom` types are sorted by atomic number first, then by name.

### Dummy atoms

Sometimes, it is useful to include _dummy atoms_: sites which do not correspond to a real atomic
position, but are useful to track in certain circumstances. Dummy atoms have atomic number zero.

If no string is provided, `NamedAtom(0)` returns `NamedAtom("dummy", 0)`, as does any argument not
between 1 and 118.

## Atomic positions

### Atom position data

Atomic positions are given by a `CartesianAtomPosition{D}` for Cartesian coordinates, or a
`FractionalAtomPosition{D}` for fractional coordinates with respect to a lattice. This information
consists of a `NamedAtom`, a coordinate in real space as an `SVector{D,Float64}`, and optional
occupancy data for partially occupied sites - if not specified, it defaults to 1 (fully occupied).

### Lists of atomic positions

The `AbstractAtomList{D}` type represents a list of atoms.

`AtomList{D}` is a wrapper for `Vector{CartesianAtomPosition{D}}`, and represents atomic positions
in an arbitrary `D`-dimensional space.

`PeriodicAtomList{D}` wraps `Vector{FractionalAtomPosition{D}}`, but also includes the basis vectors
of the lattice associated with fractional coordinates.

The two types can be interconverted with the appropriate constructors. An `AtomList` can be
constructed from a `PeriodicAtomList` directly, but for the reverse conversion, the basis vectors of
the new `PeriodicAtomList` must be specified.

### Sorting

Structures often have many identical `NamedAtom` instances, so sorting of `FractionalAtomPosition`
and `CartesianAtomPosition` is accomplished by whichever has the lowest initial coordinate, moving
to the next coordinate in the case of ties.

### Constructing supercells

In many cases, it is useful to construct a supercell from a unit cell. This can be done to convert
a primitive cell to a conventional representation, or to a cell of a larger size or different shape.

The `supercell(::PeriodicAtomList, ::AbstractMatrix{<:Integer})` function takes an
`AbstractMatrix{<:Integer}`, which transforms the lattice basis vectors by right multiplication, and
fills the new lattice with copies of atoms from the original cell. Scalar arguments (calling
`supercell(::PeriodicAtomList, ::Integer)`) scale each direction by the identity matrix multiplied
by the scalar, and vector arguments (calling `supercell(::PeriodicAtomList, ::Integer)`) convert the
vector to a diagonal matrix.

Internally, this is done using the [Smith normal form][1] of the transformation matrix, using the
diagonal factors to extend the lattice the correct number of times along specific dimensions, and
moving them into the cell with the help of the unimodular factors associated with the Smith normal
form.

```@docs; canonical=false
Electrum.supercell
```
## Crystals

The `Crystal{D}` represents a generating set of atomic positions for a periodic structure. Along
with a `PeriodicAtomList`, a `Crystal{D}` contains the space group and origin setting associated
with the data, as well as an integer matrix transformation describing the unit cell generated by
the given cell. The `set_transform!` function allows for the transformation matrix to be altered
after the creation of a `Crystal{D}` object.

This data can be converted to a `PeriodicAtomList{D}` with a constructor or `Base.convert` call, and
this uses the `supercell` function to construct the full list of atoms from the generating set.

!!! warning
At the moment, the use of space group data to generate atomic sites is not yet implemented.
A space group number and setting may be provided, but this will not be used to generate any
atomic sites.

Additionally, we provide the `CrystalWithDatasets{D,K,V}`, which associates a `Crystal{D}` with a
`Dict{K,V}` containing datasets associated with the structure. This data structure may be indexed
like the underlying dictionary.

!!! note
`CrystalWithDatasets{D,K,V}` will likely be changed significantly or removed in Electrum 0.2.

[1]: https://en.wikipedia.org/wiki/Smith_normal_form
68 changes: 68 additions & 0 deletions docs/src/grids.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
# Data grids

The output of many quantum chemistry packages are datasets defined on a spatial grid in real or
reciprocal space. Theoretical tools may want to perform mathematical operations on this data, but
operating on bare arrays containing the dta requires tracking properties associated with each array,
such as the basis vectors of the lattice

## `DataGrid`, `RealDataGrid`, and `ReciprocalDataGrid`

An `DataGrid{D,B<:Electrum.LatticeBasis,S<:AbstractVector{<:Real},T}` contains data defined in a
crystal lattice of `D` with basis vectors of type `B`, a shift parameter of type `S`, and elements
of type `T`, either in real space (`RealDataGrid`) or in reciprocal space (`ReciprocalDataGrid`).
These aliases are defined as follows:

```julia
const RealDataGrid{D,T} = DataGrid{D,RealBasis{D,Float64},SVector{D,Float64},T}
const ReciprocalDataGrid{D,T} = DataGrid{D,ReciprocalBasis{D,Float64},KPoint{D},T}
```
!!! note
Look closely at the above definition: the shift data type for `RealDataGrid{D}` is
`SVector{D,Float64}`, but the shift data type for `ReciprocalDataGrid{D}` is `KPoint{D}`. When
the `DataGrid` constructor without the shift type parameter is invoked, the basis type is used
to infer the appropriate shift type so that a `RealDataGrid` or `ReciprocalDataGrid` is
constructed.

`DataGrid` uses zero-based, periodic indexing: the first index of an `AbstractDataGrid{D}` is
`zero(NTuple{D,Int})`, and indices whose moduli with respect to size along that dimension are
identical will reference the same element: for instance, for `g::AbstractDataGrid{3}` with size
`(10, 10, 10)`, `g[69, 420, 1337] === g[9, 0, 7]`. Encountering a `BoundsError` is not possible
when indexing an `DataGrid`.

The basis of an `DataGrid` can be recovered with `basis(::DataGrid)`, which will be of the type
specified by the type parameter.

## Broadcasting and mathematical operations

Broadcasting is defined for `DataGrid` with a custom `Base.Broadcast.BroadcastStyle` subtype:
```julia
Electrum.DataGridStyle{D,B,S} <: Broadcast.AbstractArrayStyle{D}
```
This allows `DataGrid` instances to operated on with dot syntax. However, they must share lattice
basis vectors and shift values. If they do not match, an `Electrum.LatticeMismatch` exception will
be thrown.

!!! info
Although `Base.Broadcast.ArrayStyle` is usually overridden by other subtypes of
`Base.Broadcast.AbstractArrayStyle`, it does not override `Electrum.DataGridStyle`. Adding a
`DataGrid` to an `Array` returns an `Array`, and adding a `DataGrid` to other `AbstractArray`
subtypes returns the `AbstractArray` subtype defined by the `Broadcast.BroadcastStyle`. In the
case of a dimension mismatch, the broadcast style wll be `Broadcast.ArrayConflict` - the
operation will throw a `DimensionMismatch`.

The `+` and `-` operators are defined for `DataGrid` instances, and they are faster than the
broadcasted `.+` and `.-` equivalents. As with the broadcasted versions, checks are implemented to
ensure that the lattice basis vectors and shifts match.

Similarly, the `*`, `/`, and `\` operators are defined for pairs of `DataGrid` and `Number`
instances, and again, are faster than their broadcasted equivalents.

### Fourier transforms

The Fourier transform and its inverse are available through an overload of `FFTW.fft()` and
`FFTW.ifft()`. The transforms are normalized with respect to the basis vectors of the space, so
for `g::DataGrid`, `ifft(fft(g)) ≈ g` (to within floating point error).

`fft` and `ifft` defined for `DataGrid` generally, but it is critical to note that in the vast
majority of cases, you will want to call `fft` on `RealDataGrid` instances and `ifft` on
`ReciprocalDataGrid` instances.
142 changes: 142 additions & 0 deletions docs/src/lattices.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,142 @@
# Lattices

The starting point for solid state structures is a periodic lattice. Electrum provides convenient
types for working with lattices of arbitrary dimension

## Real and reciprocal space traits

The `Electrum.BySpace{D}` supertype contains two types, `Electrum.ByRealSpace{D}` and
`Electrum.ByReciprocalSpace{D}`, where `D` is the number of dimensions associated with the dataset.
These types are used to denote whether data is associated with real space (e.g. electron density) or
reciprocal space (e.g. the Fourier transform of the electron density). When working with lattices,
it is important to distinguish the two types of lattice: this is the primary reason why bare
`SMatrix{D,D,T}` instances are not used in this package.

## `Electrum.LatticeBasis` and methods

The `Electrum.LatticeBasis{S<:Electrum.BySpace,D,T}` data type is a wrapper for an
`SMatrix{D,D,T,D^2}` which represents the real or reciprocal space basis vectors of a lattice.

Electrum does not export `Electrum.LatticeBasis`, but instead provide the following aliases. This
allows developers to alter the implementation of `Electrum.LatticeBasis` without breaking the API:
```julia
const RealBasis = Electrum.LatticeBasis{ByRealSpace}
const ReciprocalBasis = Electrum.LatticeBasis{ByReciprocalSpace}
const AbstractBasis = Electrum.LatticeBasis{<:BySpace}
```
!!! warning
Note that `RealBasis{D} === Electrum.LatticeBasis{ByRealSpace,D}`, and not
`Electrum.LatticeBasis{ByRealSpace{D},D}`. While the latter is a valid type and can be used to
store data, the result of `Electrum.DataSpace(::Electrum.LatticeBasis{ByRealSpace{D},D})` is
an error!

The units of `RealBasis` are bohr, and those of `ReciprocalBasis` are radians over bohr,
corresponding with the convention that the dot product of a real basis vector with a corresponding
reciprocal basis vector is 2π.

### Construction

A `RealBasis` or `ReciprocalBasis` can be constructed from an `AbstractMatrix`, `Tuple`, or iterator
of the correct size.

In the case of a `StaticMatrix`, the size and element type are already known, so the constructors
can simply be called as `RealBasis` or `ReciprocalBasis`:
```
julia> RealBasis(SMatrix{3,3}(1, 0, 0, 0, 2, 0, 0, 0, 3))
Electrum.LatticeBasis{Electrum.ByRealSpace, 3, Int64}:
a: [ 1.000000 0.000000 0.000000 ] (1.000000 bohr)
b: [ 0.000000 2.000000 0.000000 ] (2.000000 bohr)
c: [ 0.000000 0.000000 3.000000 ] (3.000000 bohr)
```
However, in the case of a `Matrix` or other dynamically sized data, the dimension is not known and
should be supplied to avoid an exception being thrown. This is to avoid type instability arising
from determining the size at runtime:
```
julia> RealBasis{3}([1 0 0; 0 2 0; 0 0 3])
Electrum.LatticeBasis{Electrum.ByRealSpace, 3, Int64}:
a: [ 1.000000 0.000000 0.000000 ] (1.000000 bohr)
b: [ 0.000000 2.000000 0.000000 ] (2.000000 bohr)
c: [ 0.000000 0.000000 3.000000 ] (3.000000 bohr)
```
In either case, you can supply the element type (though it must be proceeded by the dimension in all
cases) to convert the input elements to the desired type (here, it's `Float32`):
```
julia> RealBasis{3,Float32}(SMatrix{3,3}(1, 0, 0, 0, 2, 0, 0, 0, 3))
Electrum.LatticeBasis{Electrum.ByRealSpace, 3, Float32}:
a: [ 1.000000 0.000000 0.000000 ] (1.000000 bohr)
b: [ 0.000000 2.000000 0.000000 ] (2.000000 bohr)
c: [ 0.000000 0.000000 3.000000 ] (3.000000 bohr)
```
### Conversion

A `RealBasis` can be converted to a `ReciprocalBasis` via `Base.convert` or their constructors,
and vice versa:
```julia
b = ReciprocalBasis(a)
c = convert(RealBasis, b)
```
!!! tip
Avoid needless back-and-forth conversions between `RealBasis` and `ReciprocalBasis` to avoid
numerical instabilities.

### Mathematical operations

The `RealBasis` and `ReciprocalBasis` types support the majority of common operations used in
solid-state chemistry, including addition, subtraction, multiplication, left division, and right
division.

Importantly, it supports the QR decomposition provided by `LinearAlgebra.qr`. This decomposition is
useful in that it generates a `Q`` factor, which is an orthogonal matrix (representing a Euclidean
point isometry - compositions of rotations and reflections) and an ``R`` factor, which is an upper
triangular matrix. This operation is useful in converting lattices to a standard orientation: in the
case of a QR decomposition, the $R$ factor places the first basis vector of the lattice along the
first basis vector of space. In 3D, this means that ``\vec{a}`` is collinear with ``\vec{x}``.

Calling `LinearAlgebra.qr(::AbstractBasis{D,T})` returns a
`StaticArrays.QR{SMatrix{D,D,T,D^2}, SMatrix{D,D,T,D^2}, SVector{D,Int}}`, so the ``Q`` and ``R``
factors are bare ``SMatrix`` instances. While this is fine for the ``Q`` matrix, the ``R`` matrix
represents a basis, and we expect an `AbstractBasis` return value. The `triangularize` function
returns the ``R`` factor of a QR decomposition as an `AbstractBasis`, discarding the ``Q`` factor.

```@docs; canonical=false
Electrum.triangularize
```
!!! note
Support for the corresponding LQ decomposition (where ``L`` is a lower triangular matrix) is not
yet implemented.

### Implementation details

The `SMatrix{D1,D2,T,L}` type requires four type parameters - `D1` and `D2` are each of the matrix
dimensions, and `T` is the element type of the matrix. However, one last parameter is needed: `L`,
the length of the `NTuple` that backs the `SMatrix`.

Julia currently does not allow for the calculation of type parameters from other type parameters,
which poses a problem in the declaration of structs. Technically, a type like `SMatrix{3,3,Float64}`
is an abstract type, as the `L` parameter is undeclared, even though the value of `L` is always
`D1 * D2`. By declaring a struct to have this type, there seems to be a performance drop.

The `LatticeBasis` types wrap an `SVector{D,SVector{D,Float64}}`. This only requires a single
type parameter `D`, and allows for fully concrete struct declarations. The `:matrix` property
converts this data to an `SMatrix{D,D,T,D^2}` instance, and methods needing to access something that
looks like a matrix reference this property.

## Basis vectors in composite types

Some types (such as `Electrum.DataGrid`) store a set of basis vectors as part of the dataset. The
`basis` function allows a user to access that set of basis vectors. By default, `basis(x)` returns
`x.basis`, so defining a field `basis::RealBasis{...}` or `basis::ReciprocalBasis{...}` implements
these functions automatically.

!!! warning
`basis(x)` returns an `AbstractBasis`, but there is no guarantee whether the return type is
`RealBasis` or `ReciprocalBasis`. Use `convert(::Type{<:AbstractBasis}, basis(x))` to ensure
that the return type is what you expect.

The type of basis vectors stored also allows for inference of the data space trait, as the default
definition of `DataSpace(::Type{T})` is `fieldtype(T, :basis)`. We encourage you to choose your
basis vector type to match the data you wish to represent.

!!! note
For performance reasons, we encourage you to define struct fields with concrete types, and use
type parameters in your struct definitions to ensure that the type is concrete.
Loading
Loading