Local datasets scaling graph v2 #77

Merged · 14 commits · Jan 12, 2017
4 changes: 3 additions & 1 deletion lib/atlas.rb
@@ -67,7 +67,9 @@

require_relative 'atlas/graph_deserializer'
require_relative 'atlas/scaler'
require_relative 'atlas/scaled_attributes'
require_relative 'atlas/scaler/area_attributes_scaler'
require_relative 'atlas/scaler/graph_scaler'
require_relative 'atlas/scaler/time_curve_scaler'

require_relative 'atlas/merit_order_details'
require_relative 'atlas/storage_details'
113 changes: 86 additions & 27 deletions lib/atlas/csv_document.rb
@@ -37,22 +37,54 @@ def self.curve(path)
end

# Public: Creates a new CSV document instance which will read data from a
# CSV file on disk. Document are read-only.
# CSV file on disk. Documents are read-write.
#
# path - Path to the CSV file.
#
# Returns a CSVDocument.
def initialize(path, normalizer = KEY_NORMALIZER)
def initialize(path, headers = nil)
@path = Pathname.new(path)

@table = CSV.table(@path.to_s, {
converters: [YEAR_NORMALIZER, :all],
header_converters: [normalizer],
})
if headers
raise(ExistingCSVHeaderError, path) if @path.file?
@headers = headers.map(&KEY_NORMALIZER)
@table = CSV::Table.new([CSV::Row.new(@headers, @headers, true)])
else
@table = CSV.table(@path.to_s, {
converters: [YEAR_NORMALIZER, :all],
header_converters: [KEY_NORMALIZER],
# Needed to retrieve the headers in case
# of an otherwise empty csv file
return_headers: true
})

@headers = table.headers

# Delete the header row for the internal representation -
# will be dynamically (re-)created when outputting
table.delete(0)

Contributor:

The current use of this method (in TimeCurveScaler) looks like: CSVDocument.create → add data → save.

That involves two writes to disk (in create and again in save), and if something goes wrong in "add data" or the second save, you end up with a half-complete CSV file on disk. Perhaps create could yield itself prior to writing, so that the user can add their initial data? (like File.open(path, 'w') { |f| ... })

CSVDocument.create(path, headers) do |doc|
  doc.set(:a, :b, 1)
  doc.set(:c, :d, 2)
end

Author:

Yes, you are so right 😔
I did it this way because Ruby's CSV class is not really good for read-write access and because I did not want to mess up the current CSVDocument.new signature.
But I found an okish way to build only the document and not save it until later.
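
A minimal sketch of such a block-yielding create (illustrative only; the shape that was actually committed may differ):

def self.create(path, headers)
  doc = new(path, headers)   # build the table in memory only
  yield doc if block_given?  # caller adds the initial data
  doc.save!                  # single write to disk
end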

Author:

This will be commit 310ab25

raise(BlankCSVHeaderError, path) if @headers.any?(&:nil?)
end
end

@headers = @table.headers
# Public: Saves the CSV document to disk
def save!
FileUtils.mkdir_p(path.dirname)
File.write(path, table.to_csv)
self
end

raise(BlankCSVHeaderError, path) if @headers.any?(&:nil?)
# Public: Sets the value of a cell identified by its row and column.
# Non-existing rows are created automatically.
#
# row - The unique row name.
# column - The name of the column in which the data shall be put.
# value - The value that shall be set.
#
# Returns the set cell contents.
def set(row, column, value)
set_cell(normalize_key(row), normalize_key(column), value)
end

# Public: Retrieves the value of a cell identified by its row and column.
@@ -65,6 +97,14 @@ def get(row, column)
cell(normalize_key(row), normalize_key(column))
end

# Public: The normalized keys of all rows (the values in the first column).
def row_keys
table.map { |row| normalize_key(row[0]) }
end

# Public: The normalized keys of all columns (the headers).
def column_keys
@headers.map(&method(:normalize_key))
end

#######
private
#######
@@ -74,20 +114,46 @@ def get(row, column)
#
# Returns the cell content.
def cell(row_key, column_key)
unless header?(column_key)
fail UnknownCSVCellError.new(self, column_key)
end
assert_header(column_key)

(data = row(row_key)) && data[column_key]
(table_row = row(row_key)) && table_row[column_key]
end

# Internal: Sets the value of a cell, raising an UnknownCSVCellError if no
# such column exists. Non-existing rows are created automatically.
#
# Returns the cell content.
def set_cell(row_key, column_key, value)
assert_header(column_key)

get_or_create_row(row_key)[column_key] = value
end

# Internal: Finds the row by the given +key+.
#
# Returns a CSV::Row or raises an UnknownCSVRowError if no such row exists
# in the file.
def row(key)
@table.find { |row| normalize_key(row[0]) == key } ||
fail(UnknownCSVRowError.new(self, key))
safe_row(key) || fail(UnknownCSVRowError.new(self, key))
end

# Internal: Finds the row by the given +key+.
#
# Returns a CSV::Row or nil.
def safe_row(key)
table.find { |row| normalize_key(row[0]) == key }
end

# Internal: Finds the row by the given +key+ or creates it if no such
# row exists in the file.
#
# Returns a CSV::Row.
def get_or_create_row(key)
safe_row(key) || begin
row = CSV::Row.new(@headers, [key])
table << row
safe_row(key)
end
end

# Internal: Converts the given key to a format which removes all special
@@ -98,12 +164,14 @@ def normalize_key(key)
KEY_NORMALIZER.call(key)
end

# Internal: Determines if the named column exists in the file. This will
# always be true if the column is named by index (a number)
# Internal: Raises unless the named column exists in the file.
# Never raises if the column is named by index (a number)
#
# Returns true or false.
def header?(key)
key.is_a?(Numeric) || @headers.nil? || @headers.include?(key)
def assert_header(key)
unless key.is_a?(Numeric) || @headers.nil? || @headers.include?(key)
fail UnknownCSVCellError.new(self, key)
end
end
end # CSVDocument

@@ -120,13 +188,4 @@ def get(row)
cell(normalize_key(row), 1)
end
end # CSVDocument::OneDimensional

# A CSVDocument which reads CSV files which are output by the Exporter. Each
# left-hand column is a node, edge, or slot key whose value needs to be
# preserved without removing special characters.
class CSVDocument::Production < CSVDocument
def initialize(path)
super(path, ->(value) { value.to_sym })
end
end # CSVDocument::Production
end # Atlas
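
Taken together, the new read-write API allows a create → set → save! flow along these lines (file name and keys are illustrative, and KEY_NORMALIZER is assumed to symbolize them; passing headers to new raises ExistingCSVHeaderError when the file already exists):

doc = Atlas::CSVDocument.new('heat_demand.csv', %w[technology demand])
doc.set(:heat_pumps, :demand, 120.5) # creates the :heat_pumps row on first write
doc.get(:heat_pumps, :demand)        # => 120.5
doc.save!                            # writes headers and rows in one go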
13 changes: 10 additions & 3 deletions lib/atlas/dataset.rb
@@ -149,6 +149,14 @@ def efficiencies(key)
dataset_dir.join("efficiencies/#{ key }_efficiency.csv"))
end

# Public: Directory which holds the dataset's time curve CSVs.
def time_curves_dir
dataset_dir.join('time_curves')
end

# Public: Path of the time curve CSV for +key+,
# e.g. time_curves/coal_time_curve.csv for key "coal".
def time_curve_path(key)
time_curves_dir.join("#{ key }_time_curve.csv")
end

# Public: Retrieves the time curve data for the file whose name matches
# +key+.
#
@@ -160,15 +168,14 @@ def efficiencies(key)
# Returns a CSVDocument.
def time_curve(key)
(@time_curves ||= {})[key.to_sym] ||=
CSVDocument.new(
dataset_dir.join("time_curves/#{ key }_time_curve.csv"))
CSVDocument.new(time_curve_path(key))
end

# Public: Retrieves all the time curves for the dataset's region.
#
# Returns a hash of document keys, and the CSVDocuments.
def time_curves
Pathname.glob(dataset_dir.join('time_curves/*.csv')).each do |csv_path|
Pathname.glob(time_curves_dir.join('*.csv')).each do |csv_path|
time_curve(csv_path.basename('_time_curve.csv').to_s)
end

5 changes: 5 additions & 0 deletions lib/atlas/dataset/derived_dataset.rb
@@ -23,6 +23,11 @@ def dataset_dir
@dataset_dir ||= Atlas.data_dir.join(DIRECTORY, base_dataset)
end

# Override: derived datasets keep their time curves in their own directory.
def time_curves_dir
Atlas.data_dir.join(DIRECTORY, key.to_s, 'time_curves')
end

private

def base_dataset_exists
5 changes: 5 additions & 0 deletions lib/atlas/errors.rb
@@ -210,6 +210,11 @@ def self.error_class(superclass = AtlasError, &block)
"of the cells in the first row must contain a non-blank value."
end

ExistingCSVHeaderError = error_class do |path|
"Column headers provided although CSV file #{path} already exists"
end


# Parser Errors ------------------------------------------------------------

CannotIdentifyError = error_class(ParserError) do |string|
33 changes: 12 additions & 21 deletions lib/atlas/graph_persistor.rb
@@ -1,24 +1,15 @@
module Atlas
class GraphPersistor
def initialize(dataset, path)
@dataset = dataset
@path = path
end

def self.call(dataset, path)
new(dataset, path).persist!
end

def persist!
File.open(@path, 'w') do |f|
f.write EssentialExporter.dump(refinery_graph).to_yaml
end
end

private

def refinery_graph
Runner.new(@dataset).refinery_graph(:export)
end
# Public: Builds the graph and exports it to a YAML file.
#
# dataset - This dataset's graph will be built and persisted
# path - File to which the graph will be exported
# export_modifier - Will be called on the graph's exported hash prior to saving it
#
# Returns a Hash
GraphPersistor = lambda do |dataset, path, export_modifier: nil|
data = EssentialExporter.dump(Runner.new(dataset).refinery_graph(:export))
export_modifier.call(data) if export_modifier
File.write(path, data.to_yaml)
data
end
end
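
A hypothetical call site (the dataset lookup and output path are illustrative): the lambda dumps the graph, lets the optional modifier mutate the exported hash, and only then writes to disk.

dataset = Atlas::Dataset.find(:nl)
Atlas::GraphPersistor.call(
  dataset,
  'tmp/graph.yml',
  export_modifier: Atlas::Scaler::GraphScaler.new(Rational(1, 7500))
)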
37 changes: 0 additions & 37 deletions lib/atlas/scaled_attributes.rb

This file was deleted.

45 changes: 26 additions & 19 deletions lib/atlas/scaler.rb
@@ -8,17 +8,31 @@ def initialize(base_dataset_key, derived_dataset_name, number_of_residences)

def create_scaled_dataset
derived_dataset = Dataset::DerivedDataset.new(
@base_dataset.attributes
.merge(scaled_attributes)
.merge(new_attributes))
@base_dataset.attributes.
merge(AreaAttributesScaler.call(@base_dataset, scaling_factor)).
Contributor:

Nitpicking: can this be indented 2 lines further for the sake of readability? I don't care too strongly about it though, so I'm fine either way.

Author:

Done in commit 8f85716

merge(new_attributes))

derived_dataset.save!

GraphPersistor.call(@base_dataset, derived_dataset.graph_path)
GraphPersistor.call(@base_dataset, derived_dataset.graph_path, export_modifier: Scaler::GraphScaler.new(scaling_factor))
Contributor:

Can't we call export_modifier scaler for now? I understand that it's practical to keep it abstract; might there be a reason to one day support (multiple?) "export modifiers". But I'm not so sure we should do this already, considering that only 'scaling' is relevant for local datasets.

Author:

The thing is that I would like to avoid the word "scaling" completely inside graph_persistor.rb since the latter has nothing to do with the former.


TimeCurveScaler.call(@base_dataset, scaling_factor, derived_dataset)
end

private

def value
@number_of_residences
end

def base_value
@base_dataset.number_of_residences
end

# Internal: The scaling factor as an exact Rational; e.g. 1_000 residences
# against a base of 7_500_000 gives Rational(1, 7500).
def scaling_factor
value.to_r / base_value.to_r
end

def new_attributes
id = Dataset.all.map(&:id).max + 1
{
@@ -28,20 +42,13 @@ def new_attributes
key: @derived_dataset_name,
area: @derived_dataset_name,
base_dataset: @base_dataset.area,
scaling: scaling,
scaling:
{
value: value,
base_value: base_value,
area_attribute: 'number_of_residences',
},
}
end

def scaling
{
value: @number_of_residences,
base_value: @base_dataset.number_of_residences,
area_attribute: 'number_of_residences',
}
end

def scaled_attributes
ScaledAttributes.new(@base_dataset, @number_of_residences).scale
end
end
end
end # Scaler
end # Atlas
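
End to end, the entry point would be invoked roughly like this (dataset key, derived name and residence count are made up):

Atlas::Scaler.new('nl', 'ameland', 1_000).create_scaled_dataset
# => saves the derived dataset, a scaled graph export and scaled time curves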
27 changes: 27 additions & 0 deletions lib/atlas/scaler/area_attributes_scaler.rb
@@ -0,0 +1,27 @@
module Atlas
class Scaler::AreaAttributesScaler
# Only attributes common to FullDataset and DerivedDataset
# may be scaled
SCALEABLE_AREA_ATTRS = Atlas::Dataset.attribute_set
.select { |attr| attr.options[:proportional] }.map(&:name).freeze

def self.call(*args)
new(*args).scale
end

def initialize(base_dataset, scaling_factor)
@base_dataset = base_dataset
@scaling_factor = scaling_factor
end

def scale
result = {}
SCALEABLE_AREA_ATTRS.map do |attr|
if value = @base_dataset[attr]
result[attr] = Util::round_computation_errors(value * @scaling_factor)
end
end
result
end
end # Scaler::AreaAttributesScaler
end # Atlas
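
In isolation, the scaler can be pictured like this (numbers are hypothetical and assume number_of_residences is declared with the :proportional option):

factor = Rational(1_000, 7_500_000) # 1_000 out of 7.5 million residences
Atlas::Scaler::AreaAttributesScaler.call(base_dataset, factor)
# => { number_of_residences: 1000, ... } with one entry per proportional
#    attribute present on base_dataset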