- Add extra `xgboost` on PyPI. sensAI supports a wide range of XGBoost versions (dating back to 2020), but with the extra, we opted to use 1.7 as a lower bound, as compatibility with this version is well-tested.
- `util`:
  - `util.version`: Add methods `Version.is_at_most` and `Version.is_equal`
- `evaluation`:
  - `EvaluationResultCollector`: Add method `is_plot_creation_enabled`
- `data`:
  - `InputOutputData`: Add method `to_df`
  - Add module `data.dataset` containing sample datasets (mainly for demonstration purposes)
- `evaluation`:
  - `ModelEvaluation` (and subclasses): Fix plots being shown if no `ResultWriter` is used even though `show_plots=False`
- `vector_model`:
  - `VectorModel`: Fix data frame transformers not appearing in string representations
- `data_transformation`:
  - `DFTOneHotEncoder`: Fix fitting failure in the presence of missing values
- `util`:
  - Minimise required dependencies for all modules in this package in preparation for the release of sensAI-utils
  - `util.logging`:
    - Fix type annotations of `run_main` and `run_cli`
  - `util.cache`:
    - Add new base class `KeyValueCache` alongside `PersistentKeyValueCache`
    - Add `InMemoryKeyValueCache`
    - `PickleCached`:
      - Rename to `pickle_cached`, keeping old name as alias
      - Change implementation to use nested functions instead of a class to improve IDE support
      - Auto-create the storage directory if it does not yet exist
    - Support `cloudpickle` as a backend
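The `pickle_cached` changes listed above (nested functions instead of a class, automatic creation of the storage directory) can be pictured with a minimal sketch. This is not sensAI's implementation: the name `pickle_cached_sketch`, its single `cache_dir` parameter and the one-file-per-function layout are assumptions made purely for illustration.

```python
import os
import pickle
import tempfile
from functools import wraps
from typing import Any, Callable


def pickle_cached_sketch(cache_dir: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical stand-in for a pickle-based caching decorator (not sensAI's API)."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(fn)  # keeps the name and docstring of fn visible to IDEs and help()
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Auto-create the storage directory if it does not yet exist
            os.makedirs(cache_dir, exist_ok=True)
            # Simplification: one cache file per function; arguments are ignored
            path = os.path.join(cache_dir, f"{fn.__name__}.pickle")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = fn(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator


calls = []  # records how often the function body actually runs
cache_dir = os.path.join(tempfile.mkdtemp(), "cache", "nested")


@pickle_cached_sketch(cache_dir)
def expensive_computation() -> list:
    """Pretend this takes a long time."""
    calls.append(1)
    return [1, 2, 3]


first = expensive_computation()   # computes the result and writes the pickle
second = expensive_computation()  # loads the pickle instead of recomputing
```

Because the decorator is built from plain nested functions and uses `functools.wraps`, the decorated function keeps its own name and docstring, which is the kind of IDE friendliness the entry refers to.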
- `columngen`:
  - `ColumnGenerator`: Add method `to_feature_generator`
- `evaluation`:
  - `MultiDataEvaluation`: Add option to supply test data (without using splitting)
  - `VectorRegressionModelEvaluator`: Handle output column mismatch between model output and ground truth for the case where there is only a single column, avoiding the exception and issuing a warning instead
- `dft`:
  - `DFTNormalisation.RuleTemplate`: Add attributes `fit` and `array_valued`
- `util.deprecation`: Apply `functools.wraps` to retain meta-data of wrapped function
- `util.logging`:
  - Support multiple configuration callbacks in `set_configure_callback`
  - Add line number to default format (`LOG_DEFAULT_FORMAT`)
  - Add function `is_enabled` to check whether a log handler is registered
  - Add context manager `LoggingDisabledContext` to temporarily disable logging
  - Add `FallbackHandler` to support logging to a fallback destination (if no other handlers are defined)
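The `util.deprecation` entry above is about keeping the metadata of a wrapped function. A minimal sketch of that idea, using the standard library's `functools.wraps`, might look as follows; the decorator name `deprecated_sketch` and its message parameter are hypothetical and do not reflect sensAI's actual decorator.

```python
import functools
import warnings


def deprecated_sketch(message: str):
    """Hypothetical deprecation decorator; sensAI's own decorator may differ."""

    def decorator(fn):
        @functools.wraps(fn)  # copies __name__, __doc__, __module__ etc. onto the wrapper
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{fn.__name__} is deprecated: {message}",
                DeprecationWarning,
                stacklevel=2,
            )
            return fn(*args, **kwargs)

        return wrapper

    return decorator


@deprecated_sketch("use add_v2 instead")
def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
    return a + b


# Capture the warning so that it can be inspected instead of printed
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = add(1, 2)
```

Without `functools.wraps`, `add.__name__` would be `"wrapper"` and the docstring would be lost, which degrades generated documentation and IDE tooltips.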
- `util.io`:
  - `ResultWriter`:
    - Allow disabling an instance such that no results are written (constructor parameter `enabled`)
    - Add default configuration for closing figures after writing them (constructor parameter `close_figures`)
    - `write_image`: Improve layout in written images by setting `bbox_inches='tight'`
- `vectoriser`:
  - `SequenceVectoriser`:
    - Allow injecting a sequence item identifier provider (instance of new class `ItemIdentifierProvider`) in order to determine the set of relevant unique items when using fitting mode UNIQUE
    - Allow sharing of vectorisers between instances such that a previously fitted vectoriser can be reused in its fitted state, which can be particularly useful for encoder-decoder settings where the decoding stage uses some of the same features (vectorisers) as the encoding stage.
  - Make Vectorisers aware of their 'fitted' status.
- `torch`:
  - `TorchVectorRegressionModel`: Add support for auto-regressive predictions by adding class `TorchAutoregressiveResultHandler` and method `with_autogressive_result_handler`
  - `LSTNetwork`:
    - Add new mode 'encoder', where the output of the complex path prior to the dense layer is returned
    - Changed constructor interface to comply with PEP-8
  - Add package `seq` for encoder-decoder-style sequence models, adding the highly flexible vector model implementation `EncoderDecoderVectorRegressionModel` and a multitude of low-level encoder and decoder modules
- `data`:
  - Add `DataFrameSplitterColumnEquivalenceClass`, which splits a data frame based on equivalence classes of a given column
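Splitting by equivalence classes of a column means that all rows sharing the same value in that column end up in the same split. The sketch below illustrates the idea on plain dictionaries rather than data frames; the function `split_by_equivalence_class` and its parameters are hypothetical and are not the interface of `DataFrameSplitterColumnEquivalenceClass`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def split_by_equivalence_class(
    rows: Sequence[Row],
    column: str,
    fraction_first: float,
    seed: int = 42,
) -> Tuple[List[Row], List[Row]]:
    """Hypothetical helper: split rows so that rows with equal `column` values stay together."""
    # Group row indices by their value in the given column (one group per equivalence class)
    groups: Dict[object, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[column]].append(i)

    # Shuffle the classes (not the individual rows) and assign whole classes to the first split
    keys = sorted(groups, key=repr)
    random.Random(seed).shuffle(keys)
    num_first = round(fraction_first * len(keys))
    first_keys = set(keys[:num_first])

    first = [row for row in rows if row[column] in first_keys]
    second = [row for row in rows if row[column] not in first_keys]
    return first, second


# Five patients with three measurements each
rows = [{"patient": p, "measurement": m} for p in "ABCDE" for m in range(3)]
train, test = split_by_equivalence_class(rows, column="patient", fraction_first=0.8)
```

Note that in this sketch the fraction applies to the number of classes, not to the number of rows, so the resulting row proportions can deviate from the requested fraction when class sizes differ.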
- `evaluation`:
  - `ModelEvaluation` (and derived classes): Support direct specification of the test data (previously only indirect specification via a splitter was supported)
  - `GridSearch`: Change return value to a result object for convenient retrieval
- `TagBuilder`: Fix return value of `with_component`
- `ModelEvaluation`: `create_plots` did not track plots with given tracking context if `show_plots`=False and `result_writer`=None.
- `ParametersMetricsCollection`: `csv_path` could not be None
- `LSTNetworkVectorClassificationModel` is now functional in v1, improving the representation (no more dictionaries). This breaks compatibility with sensAI v0.x representations of this class.
- `tracking`:
  - Improve (under-the-hood) tracking interfaces, introducing the concept of a tracking context (class `TrackingContext`, which is typically model-specific) in addition to the more high-level 'experiment' concept
  - Full support for cross-validation
  - Adapt & improve MLflow tracking implementation
- `util.datastruct`:
  - `SortedKeysAndValues`, `SortedKeyValuePairs`: Add `__len__`
- `featuregen`:
  - `FeatureCollector`: Add factory methods for the generation of DFTNormalisation and DFTOneHotEncoder instances (for convenience)
  - `FeatureGeneratorRegistry`:
    - Improve type annotation of singleton dictionary
    - Add convenience method `collect_features`, which creates a FeatureCollector
- `util.io`:
  - `write_data_frame_csv_file`: Add options `index` and `header`
- `util.pickle`: `dump_pickle`, `load_pickle`: `PickleLoadSaveMixin`: Support passing `Path` objects
- `vector_model`:
  - Pre-processors are now included in models' string representations by default
- `torch`:
  - `TorchVector*Model`: Improve type hints for with* methods
- `evaluation`:
  - `MultiDataModelEvaluation` (previously `MultiDataEvaluationUtil`):
    - Add model description/string representation to result object
    - Add class `CrossValidationSplitterNested` (for nested cross-validation)
  - `ModelComparisonData.Result`: Add method `iter_evaluation_data`
- `feature_selection`:
  - Add `RecursiveFeatureElimination` (to complement existing CV-based implementation)
- `util.string`:
  - Add class `TagBuilder` (for generation of dataset/experiment tags/identifiers)
- `util.logging`:
  - Add in-memory logging (`add_memory_logger`, `get_memory_log`)
  - Reuse configured log format (if any) for both file & in-memory loggers
  - Add functions `run_main` and `run_cli` for convenient setup
  - Add `set_configure_callback` for third-party usage of `configure`, allowing users to add additional configuration via a callback
  - Add `remove_log_handler`
  - Add `FileLoggerContext` for file-based logging within a `with`-block
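In-memory logging, as mentioned in the `util.logging` entries above, can be approximated with the standard `logging` module by attaching a stream handler that writes to a string buffer. The sketch below is only an illustration of the concept: the functions `add_memory_logger_sketch` and `get_memory_log_sketch` and the format string are assumptions and do not mirror the signatures of sensAI's `add_memory_logger` and `get_memory_log`.

```python
import io
import logging

# Assumed format for illustration only; not necessarily sensAI's LOG_DEFAULT_FORMAT
LOG_FORMAT = "%(levelname)s %(name)s:%(lineno)d - %(message)s"

_memory_stream = io.StringIO()


def add_memory_logger_sketch(logger: logging.Logger) -> logging.Handler:
    """Attach a handler that collects formatted log records in memory."""
    handler = logging.StreamHandler(_memory_stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return handler


def get_memory_log_sketch() -> str:
    """Return everything logged to the in-memory handler so far."""
    return _memory_stream.getvalue()


log = logging.getLogger("changelog_demo")
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo output out of the root logger
handler = add_memory_logger_sketch(log)

log.info("model fitted")
log.debug("not recorded at INFO level")

memory_log = get_memory_log_sketch()
log.removeHandler(handler)  # analogous in spirit to removing a log handler after use
```

Such a buffer is convenient when the log of a run should be stored alongside its results, for example by writing `memory_log` to a file at the end of an experiment.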
- Refactoring:
  - Module `featuregen` is now a package with modules
    - `feature_generator` (all feature generators)
    - `feature_generator_registry` (registry and feature collector)
- Testing:
  - Add test for typical usage of `FeatureCollector` in conjunction with `FeatureGeneratorRegistry`
- Changed all camel case interfaces (methods and parameters) as well as local variables to use snake case in order to align more closely with PEP 8.
  This breaks source-level compatibility with earlier v0 releases. However, persisted objects from earlier versions should still be loadable, as attribute names in classes that may have been persisted remain in camel case. Strictly speaking, PEP 8 makes no statement about the format of attribute names, so there is not really a violation anyway.
- Removed some deprecated interfaces (particularly support for the kwargs/dict interface in parallel to parameter objects in evaluators)
- `TorchVector*Model`: Changed construction of contained `TorchModel` to a no-args factory (i.e. support for `modelArgs` and `modelKwArgs` dropped). The new mechanism is both simpler and does not encourage usage patterns where correct construction cannot be statically checked (in contrast to the old mechanism). The new mechanism encourages the implementation of dedicated factory methods (but could be abused with `functools.partial`, of course).
- `FeatureGeneratorRegistry`: Removed support for discouraged mechanism of setting/getting feature generator factories via `__setattr__`/`__getattr__`
- `NNOptimiserParams`: Do not use kwargs for parameters to be passed on to the underlying optimiser, use dict `optimiser_args` instead
- `MultiDataModelEvaluation` (previously `MultiDataEvaluationUtil`):
  - Moved evaluator and cross-validator params to constructor
  - Removed deprecated method `compare_models_cross_validation`
- `RegressionEvalStats`: Rename methods using inappropriate prefix `get` (now `compute`)
- Renamed high-level evaluation classes:
  - `RegressionEvalUtil` renamed to `RegressionModelEvaluation`
  - `ClassificationEvalUtil` renamed to `ClassificationModelEvaluation`
  - `MultiDataEvaluationUtil` renamed to `MultiDataModelEvaluation`
  - `Vector*ModelEvaluatorParams` -> `*EvaluatorParams`
- Changed default parameters of `SkLearnDecisionTreeVectorClassificationModel` and `SkLearnRandomForestVectorClassificationModel` to align with sklearn defaults
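The switch to a no-args factory for the model contained in `TorchVector*Model`, listed among the changes above, can be pictured with a small sketch. All classes here (`TinyModel`, `VectorModelSketch`) and the function `create_tiny_model` are hypothetical stand-ins and not sensAI classes; the point is only to contrast a dedicated factory function with `functools.partial`.

```python
import functools
from typing import Callable, Optional


class TinyModel:
    """Hypothetical stand-in for a torch model class."""

    def __init__(self, hidden_dim: int, dropout: float):
        self.hidden_dim = hidden_dim
        self.dropout = dropout


class VectorModelSketch:
    """Hypothetical wrapper that builds its inner model lazily via a no-args factory."""

    def __init__(self, model_factory: Callable[[], TinyModel]):
        self._model_factory = model_factory
        self.model: Optional[TinyModel] = None

    def fit(self) -> "VectorModelSketch":
        # The inner model is only constructed when it is actually needed
        self.model = self._model_factory()
        return self


def create_tiny_model() -> TinyModel:
    # Dedicated factory: the constructor call is visible to static checkers
    return TinyModel(hidden_dim=32, dropout=0.1)


model_a = VectorModelSketch(create_tiny_model).fit()

# functools.partial also yields a no-args callable, but mistakes in the
# arguments only surface once the factory is called
model_b = VectorModelSketch(functools.partial(TinyModel, hidden_dim=64, dropout=0.0)).fit()
```

With a dedicated factory, a typo in a parameter name is flagged by the IDE or type checker at the call site, whereas forwarding positional and keyword arguments defers such errors to run time.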
- `ToStringMixin`: Prevent infinite recursion for case where ToStringMixin references a bound method of itself
- `TorchVectorModels`: Dropped support for model kwargs in constructor
- `MultiDataModelEvaluation` (previously `MultiDataEvaluationUtil`):
  - dataset key column was not removed prior to mean computation (would fail if value is non-numeric)
  - Combined eval stats were not logged
- `EvalStatsClassification`: Do not attempt to create precision/recall plots if class probabilities are unavailable
Final pre-release (primarily for internal use at jambit GmbH and appliedAI Initiative GmbH)
- v0.1.9 (2022-07-20)
- v0.1.8 (2022-07-01)
- v0.1.7 (2022-02-22)
- v0.1.6 (2021-07-16)
- v0.1.5 (2021-06-22)
- v0.1.4 (2021-06-21)
- v0.1.1 (2021-06-01)
- v0.1.0 (2021-05-25)
- v0.0.8 (2021-02-18)
- v0.0.4 (2020-10-16)
- v0.0.1 (2020-02-20)