Add a Database abstraction and helpers for various data files.
blambeau committed Jun 26, 2024
1 parent d6033c7 commit 0b6de4e
Showing 22 changed files with 684 additions and 95 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -5,3 +5,5 @@ Gemfile.lock
/spec/regression/**/*.db
/spec/unit/writer/result.xlsx
.world
.ruby-version
.DS_Store
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,13 @@
## 0.23.0

* Add `Bmg.json` and `Bmg.yaml` factory methods, to get relations on top of
usual data files.

* Add a `Database` abstraction, with `Database.data_folder`, `Database.sequel`
and `Database.xlsx` factory methods and implementations, as well as
`Database#to_data_folder` and `Database#to_xlsx` dump methods.
See README for details.

## 0.22.0 - 2024-05-17

* Add the `minus` operation (also known as set difference, or EXCEPT in SQL).
99 changes: 85 additions & 14 deletions README.md
@@ -15,9 +15,10 @@ further down this README.
* [Where are base relations coming from?](#where-are-base-relations-coming-from)
* [Memory relations](#memory-relations)
* [Connecting to SQL databases](#connecting-to-sql-databases)
* [Reading files (csv, Excel, text)](#reading-files-csv-excel-text)
* [Reading data files](#reading-data-files-json-csv-yaml-text-xls--xlsx)
* [Connecting to Redis databases](#connecting-to-redis-databases)
* [Your own relations](#your-own-relations)
* [The Database abstraction](#the-database-abstraction)
* [List of supported operators](#supported-operators)
* [How is this different?](#how-is-this-different)
* [... from similar libraries](#-from-similar-libraries)
@@ -117,33 +118,38 @@ Bmg.sequel(:suppliers, sequel_db)
# {:array=>false})
```

### Reading files (csv, Excel, text)
### Reading data files (json, csv, yaml, text, xls & xlsx)

Bmg provides simple adapters to read files and reach Relationland as soon as
possible.

#### CSV files
#### JSON files

```ruby
csv_options = { col_sep: ",", quote_char: '"' }
r = Bmg.csv("path/to/a/file.csv", csv_options)
r = Bmg.json("path/to/a/file.json")
```

Options are directly transmitted to `::CSV.new`, check Ruby's standard
library.
The JSON file is expected to contain an array of tuples that all share the same heading.
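
To make "tuples of same heading" concrete, here is a minimal sketch that uses only Ruby's standard `json` library rather than Bmg itself. The `load_tuples` and `same_heading?` helpers are hypothetical names introduced for illustration; they are not part of Bmg's API.

```ruby
require 'json'

# Hypothetical helper, not part of Bmg: parses a JSON array of objects
# and returns tuples (hashes) with symbol keys.
def load_tuples(json_text)
  JSON.parse(json_text).map { |tuple| tuple.transform_keys(&:to_sym) }
end

# True when every tuple has exactly the same set of attribute names.
def same_heading?(tuples)
  tuples.map { |t| t.keys.sort }.uniq.size <= 1
end

json_text = '[{"sid": "S1", "city": "London"}, {"sid": "S2", "city": "Paris"}]'
tuples = load_tuples(json_text)
puts tuples.inspect
puts same_heading?(tuples)
```

A file whose objects carry different attribute sets would fail the `same_heading?` check, which is the situation the README asks you to avoid.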

#### Excel files
#### YAML files

You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
read `.xls` and `.xlsx` files with Bmg.
```ruby
r = Bmg.yaml("path/to/a/file.yaml")
```

The YAML file is expected to contain an array of tuples that all share the same heading.

#### CSV files

```ruby
roo_options = { skip: 1 }
r = Bmg.excel("path/to/a/file.xls", roo_options)
csv_options = { col_sep: ",", quote_char: '"' }
r = Bmg.csv("path/to/a/file.csv", csv_options)
```

Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's
documentation.
Options are passed directly to `::CSV.new`; see Ruby's standard library
documentation. If you don't provide any, `Bmg` uses `headers: true` (hence
assuming that attribute names are provided on the first line) and makes a
best effort to infer the column separator.
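
The defaults described above can be approximated with Ruby's standard `csv` library. The sketch below is only an illustration: `guess_col_sep` is a hypothetical heuristic (it picks the candidate that occurs most often on the header line) and is not claimed to be how Bmg infers the separator.

```ruby
require 'csv'

# Hypothetical heuristic, not Bmg's actual code: pick the candidate
# separator that occurs most often on the header line.
def guess_col_sep(text, candidates = [',', ';', "\t"])
  first_line = text.lines.first.to_s
  candidates.max_by { |sep| first_line.count(sep) }
end

# Parses CSV text with `headers: true`, so the first line provides the
# attribute names, and returns tuples with symbol keys.
def parse_with_headers(text)
  CSV.parse(text, headers: true, col_sep: guess_col_sep(text)).map do |row|
    row.to_h.transform_keys(&:to_sym)
  end
end

text = "sid;name\nS1;Smith\nS2;Jones\n"
rows = parse_with_headers(text)
puts rows.inspect
```

Passing explicit `csv_options` remains the safer choice when the separator is known, since any inference can be fooled by unusual header lines.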

#### Text files

@@ -173,6 +179,19 @@ r.type.attrlist
In this scenario, non matching lines are skipped. The `:line` attribute keeps
being used to have at least one candidate key (so to speak).

#### Excel files

You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
read `.xls` and `.xlsx` files with Bmg.

```ruby
roo_options = { skip: 1 }
r = Bmg.excel("path/to/a/file.xls", roo_options)
```

Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's
documentation.

### Connecting to Redis databases

Bmg currently requires `bmg-redis` and `redis >= 4.6` to connect
@@ -240,6 +259,58 @@ restrictions down the tree) by overriding the underscored version of operators
Have a look at `Bmg::Algebra` for the protocol and `Bmg::Sql::Relation` for an
example. Keep in touch with the team if you need some help.

## The Database abstraction

The previous section focused on obtaining *relations*. In practice you frequently
have a collection of relations, hence a *database*:

* A SQL database with multiple tables
* A list of data files, all in the same folder
* An Excel file with various sheets

Bmg supports a simple Database abstraction that serves those relations "by name".
A database can also easily be dumped back to a data folder of json or csv files,
or to an xlsx file with multiple sheets.
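
Serving relations "by name" relies on `method_missing`. The toy class below sketches that idea with plain arrays of hashes. `ToyDatabase` and `NoSuchRelation` are hypothetical names used only for this illustration and are not Bmg classes.

```ruby
# Toy stand-in for a database, not Bmg's classes: relations are plain
# arrays of hashes stored by name.
class ToyDatabase
  class NoSuchRelation < StandardError; end

  def initialize(relations)
    @relations = relations
  end

  # Serve `db.suppliers` style calls; anything with arguments or a block
  # is not a relation lookup and falls through to the default behaviour.
  def method_missing(name, *args, &block)
    return super unless args.empty? && block.nil?
    @relations.fetch(name) { raise NoSuchRelation, name.to_s }
  end

  # Keep `respond_to?` consistent with what `method_missing` serves.
  def respond_to_missing?(name, include_private = false)
    @relations.key?(name) || super
  end

  # Yields each (name, relation) pair, or returns an Enumerator.
  def each_relation_pair(&block)
    return to_enum(:each_relation_pair) unless block
    @relations.each_pair(&block)
  end
end

db = ToyDatabase.new(suppliers: [{ sid: 'S1' }], parts: [{ pid: 'P1' }])
puts db.suppliers.inspect
puts db.each_relation_pair.map { |name, _| name }.inspect
```

Note the `raise SomeError, message` form: writing `SomeError(message)` would be parsed as a method call and fail with a `NoMethodError` instead.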

### Connecting to a SQL Database

For a SQL database, connected with Sequel:

```ruby
db = Bmg::Database.sequel(Sequel.connect('...'))
db.suppliers # yields a Bmg::Relation over the `suppliers` table
```

### Connecting to data files in the same folder

Data files all in the same folder can be seen as a very basic form of database,
and served as such. Bmg supports `json`, `csv` and `yaml` files:

```ruby
db = Bmg::Database.data_folder('./my-database')
db.suppliers # yields a Bmg::Relation over the `suppliers.(json,csv,yml)` file
```

Bmg supports files in different formats in the same folder. When files with the
same basename exist, json is preferred over yaml, which is preferred over csv.
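
The preference order can be sketched with the standard library alone. `find_data_file` and `EXTENSIONS` below are hypothetical names for this illustration; the sketch only assumes that candidates are tried in the order json, yml, yaml, csv.

```ruby
require 'tmpdir'
require 'fileutils'

# Extensions in priority order, mirroring the preference described above.
EXTENSIONS = %w[json yml yaml csv].freeze

# Hypothetical helper: returns the first existing file for a relation
# name, or nil when no candidate exists.
def find_data_file(folder, name, extensions = EXTENSIONS)
  extensions
    .map { |ext| File.join(folder, "#{name}.#{ext}") }
    .find { |candidate| File.file?(candidate) }
end

Dir.mktmpdir do |folder|
  FileUtils.touch(File.join(folder, 'suppliers.csv'))
  FileUtils.touch(File.join(folder, 'suppliers.json'))
  FileUtils.touch(File.join(folder, 'parts.csv'))

  puts File.basename(find_data_file(folder, 'suppliers'))
  puts File.basename(find_data_file(folder, 'parts'))
  puts find_data_file(folder, 'unknown').inspect
end
```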

### Dumping a Database instance

As a data folder:

```ruby
db = Bmg::Database.sequel(Sequel.connect('...'))
db.to_data_folder('path/to/folder', :json)
```
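
Conceptually, dumping to a data folder means writing one file per relation. The sketch below shows that idea with plain hashes and the standard `json` library; `dump_to_folder` is a hypothetical helper, not Bmg's `to_data_folder`.

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for a database dump, using plain hashes and
# arrays instead of Bmg relations: one `<name>.json` file per relation.
def dump_to_folder(relations, folder)
  FileUtils.mkdir_p(folder)
  relations.each_pair do |name, tuples|
    File.write(File.join(folder, "#{name}.json"), JSON.pretty_generate(tuples))
  end
  folder
end

relations = {
  suppliers: [{ sid: 'S1', city: 'London' }],
  parts:     [{ pid: 'P1', color: 'Red' }]
}

Dir.mktmpdir do |tmp|
  folder = dump_to_folder(relations, File.join(tmp, 'my-database'))
  puts Dir.children(folder).sort.inspect
end
```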

As an .xlsx file (any existing file is overwritten; modifying existing files is
not supported):

```ruby
require 'bmg/xlsx'
db.to_xlsx('path/to/file.xlsx')
```

## Supported operators

```ruby
12 changes: 12 additions & 0 deletions lib/bmg.rb
@@ -24,6 +24,16 @@ def csv(path, options = {}, type = Type::ANY)
end
module_function :csv

def json(path, options = {}, type = Type::ANY)
in_memory(Path(path).load.map{|tuple| TupleAlgebra.symbolize_keys(tuple) })
end
module_function :json

def yaml(path, options = {}, type = Type::ANY)
in_memory(Path(path).load.map{|tuple| TupleAlgebra.symbolize_keys(tuple) })
end
module_function :yaml

def excel(path, options = {}, type = Type::ANY)
Reader::Excel.new(type, path, options).spied(main_spy)
end
@@ -57,6 +67,8 @@ def main_spy=(spy)
require_relative 'bmg/relation/materialized'
require_relative 'bmg/relation/proxy'

require_relative 'bmg/database'

# Deprecated
Leaf = Relation::InMemory
end
35 changes: 35 additions & 0 deletions lib/bmg/database.rb
@@ -0,0 +1,35 @@
module Bmg
class Database

def self.data_folder(*args)
require_relative 'database/data_folder'
DataFolder.new(*args)
end

def self.sequel(*args)
require 'bmg/sequel'
require_relative 'database/sequel'
Sequel.new(*args)
end

def self.xlsx(*args)
require 'bmg/xlsx'
require_relative 'database/xlsx'
Xlsx.new(*args)
end

def to_xlsx(*args)
require 'bmg/xlsx'
Writer::Xlsx.to_xlsx(self, *args)
end

def to_data_folder(*args)
DataFolder.dump(self, *args)
end

def each_relation_pair
raise NotImplementedError
end

end # class Database
end # module Bmg
67 changes: 67 additions & 0 deletions lib/bmg/database/data_folder.rb
@@ -0,0 +1,67 @@
module Bmg
class Database
class DataFolder < Database

DEFAULT_OPTIONS = {
data_extensions: ['json', 'yml', 'yaml', 'csv']
}

def initialize(folder, options = {})
@folder = Path(folder)
@options = DEFAULT_OPTIONS.merge(options)
end

def method_missing(name, *args, &bl)
return super(name, *args, &bl) unless args.empty? && bl.nil?
raise NotSuchRelationError, name.to_s unless (file = find_file(name))
read_file(file)
end

def each_relation_pair
return to_enum(:each_relation_pair) unless block_given?

@folder.glob('*') do |path|
next unless path.file?
next unless @options[:data_extensions].find {|ext|
path.ext == ".#{ext}" || path.ext == ext
}
yield(path.basename.rm_ext.to_sym, read_file(path))
end
end

def self.dump(database, path, ext = :json)
path = Path(path)
path.mkdir_p
database.each_relation_pair do |name, rel|
(path/"#{name}.#{ext}").write(rel.public_send(:"to_#{ext}"))
end
path
end

private

def read_file(file)
case file.ext.to_s.delete_prefix('.') # accept the extension with or without a leading dot
when 'json'
Bmg.json(file)
when 'yaml', 'yml'
Bmg.yaml(file)
when 'csv'
Bmg.csv(file)
else
raise NotSupportedError, "Unable to use #{file} as a relation"
end
end

def find_file(name)
exts = @options[:data_extensions]
exts.each do |ext|
target = @folder/"#{name}.#{ext}"
return target if target.file?
end
raise NotSuchRelationError, "#{@folder}/#{name}.#{exts.join(',')}"
end

end # class DataFolder
end # class Database
end # module Bmg
35 changes: 35 additions & 0 deletions lib/bmg/database/sequel.rb
@@ -0,0 +1,35 @@
module Bmg
class Database
class Sequel < Database

DEFAULT_OPTIONS = {
}

def initialize(sequel_db, options = {})
@sequel_db = sequel_db
end

def method_missing(name, *args, &bl)
return super(name, *args, &bl) unless args.empty? && bl.nil?
raise NotSuchRelationError, name.to_s unless @sequel_db.table_exists?(name)
table = @sequel_db[name]
rel_for(table)
end

def each_relation_pair
return to_enum(:each_relation_pair) unless block_given?

@sequel_db.tables.each do |table|
yield(table, rel_for(table))
end
end

protected

def rel_for(table_name)
Bmg.sequel(table_name, @sequel_db)
end

end # class Sequel
end # class Database
end # module Bmg
41 changes: 41 additions & 0 deletions lib/bmg/database/xlsx.rb
@@ -0,0 +1,41 @@
module Bmg
class Database
class Xlsx < Database

DEFAULT_OPTIONS = {
}

def initialize(path, options = {})
path = Path(path) if path.is_a?(String)
@path = path
@options = DEFAULT_OPTIONS.merge(options)
end

def method_missing(name, *args, &bl)
return super(name, *args, &bl) unless args.empty? && bl.nil?
rel = rel_for(name)
raise NotSuchRelationError, name.to_s unless rel
rel
end

def each_relation_pair
return to_enum(:each_relation_pair) unless block_given?

spreadsheet.sheets.each do |sheet_name|
yield(sheet_name.to_sym, rel_for(sheet_name))
end
end

protected

def spreadsheet
@spreadsheet ||= Roo::Spreadsheet.open(@path, @options)
end

def rel_for(sheet_name)
Bmg.excel(@path, { sheet: sheet_name.to_s })
end

end # class Xlsx
end # class Database
end # module Bmg
3 changes: 3 additions & 0 deletions lib/bmg/error.rb
@@ -23,4 +23,7 @@ class MisuseError < Error; end
# to backtrack to something more ruby-native.
class NotSupportedError < Error; end

# Raised when a relation (variable) is not found
class NotSuchRelationError < Error; end

end