To give some grounding to an otherwise abstract problem, we refer to an example scenario. This scenario involves a number of different users requesting access to some sensitive Human Resources (HR) data for different reasons. These different requests each demonstrate a different aspect of Palisade that we require.
ℹ️ This scenario is implemented and demonstrated in the Example Library package of the examples repository.
The Employee represents an element of the target dataset: the public and private details of an employee at a company.
Some highlights of the schema might include:
class Employee:
    String uid
    String name
    Address address
    BankDetails bankDetails
    PayGrade payGrade
    List<Manager> managers

class Manager:
    String uid
    List<Manager> managers
Employees represent a target dataset and can be generated by the Synthetic Data Generator.
The ExampleUser represents a user of Palisade: an employee who needs to analyse the sensitive data of fellow employees.
Some highlights of the schema might include:
class User:
    String uid
    List<String> auths

class ExampleUser extends User:
    List<TrainingCourse> trainingCompleted

enum TrainingCourse:
    PAYROLL_TRAINING_COURSE
In particular, TrainingCourse is specific to this scenario and not to Palisade; it has been added as part of this enriched ExampleUser.
The Resource represents a coarse-grained collection of records with a hierarchical structure, although it does not actually contain the data therein.
A Resource in this scenario is effectively a filename, from the following directory structure:
/data
    employee_file0.avro
    employee_file1.avro
Ideally this metadata comes from a data catalogue, but it can equally be obtained by listing the directories of the underlying data store and/or by using shadow files that hold metadata about each resource.
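The directory-listing fallback mentioned above can be sketched as follows. This is a minimal illustration only, not Palisade's actual resource handling: the `ResourceListing` class and the `FileResource` record are hypothetical names, and the only metadata captured is the file extension. It uses the standard `java.nio.file.Files.list` call and a temporary directory standing in for `/data`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ResourceListing {
    // Hypothetical stand-in for a resource: an identifier plus minimal
    // metadata. It never holds the records themselves.
    record FileResource(String id, String format) {}

    // Build one resource per regular file found directly inside the directory.
    static List<FileResource> listResources(Path directory) throws IOException {
        try (Stream<Path> entries = Files.list(directory)) {
            return entries
                    .filter(Files::isRegularFile)
                    .sorted()
                    .map(p -> new FileResource(p.toString(), extensionOf(p)))
                    .collect(Collectors.toList());
        }
    }

    // Use the file extension as a crude format hint (assumption: a real
    // deployment would take this from a catalogue or shadow file instead).
    static String extensionOf(Path path) {
        String name = path.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for /data in this sketch.
        Path dir = Files.createTempDirectory("data");
        Files.createFile(dir.resolve("employee_file0.avro"));
        Files.createFile(dir.resolve("employee_file1.avro"));
        listResources(dir).forEach(System.out::println);
    }
}
```

Keeping the resource as an identifier plus metadata mirrors the point made above: policy decisions at the resource level can be taken without reading any records.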
The Purpose of the data request is declared by the User along with the rest of the query.
This is audited, and could have further rules deciding whether the declared Purpose was legitimate.
A sample of the set of possible values that could be declared in our scenario might include:
enum Purpose:
    SALARY
    DUTY_OF_CARE
    STAFF_REPORT
In general, we bundle all additional contextual information into the Context of the request.
These are the purposes we have defined as an example; they are not an exhaustive list of what might be included.
With these, we hope to demonstrate Palisade is capable of applying complex record-level rules as part of the defined data-access policy, such as:
- resource-level filtering - hide a resource File /data/employee_file0.avro
- record-level filtering - hide the whole Employee Alice record
- record-level masking - show the first half of Alice's PostCode and hide the rest of the address
- contextual rule application - show the full BankDetails if the purpose of the request was SALARY and the user has completed the PAYROLL_TRAINING_COURSE, otherwise hide it
- complex (recursive) rules - show Alice the PayGrade of employees for whom she is in their management chain (managers of managers etc.)
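
To make three of the rules above concrete (masking, contextual, and recursive), here is a minimal Java sketch. It is not Palisade's rule API: the `ExampleRules` class, its method names, the flattened `Employee` record (with `postCode` and `bankDetails` as plain strings), and all sample values are assumptions made for illustration. "First half of the postcode" is assumed to mean the part before the space.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class ExampleRules {
    // Hypothetical, simplified stand-ins for the scenario's schema.
    enum Purpose { SALARY, DUTY_OF_CARE, STAFF_REPORT }
    enum TrainingCourse { PAYROLL_TRAINING_COURSE }

    record Manager(String uid, List<Manager> managers) {}
    record Employee(String uid, String postCode, String bankDetails,
                    String payGrade, List<Manager> managers) {}
    record ExampleUser(String uid, Set<TrainingCourse> trainingCompleted) {}

    // Record-level masking: keep the part of the postcode before the space,
    // or the first half of the characters if there is no space.
    static String maskPostCode(String postCode) {
        String trimmed = postCode.trim();
        int space = trimmed.indexOf(' ');
        return space < 0
                ? trimmed.substring(0, trimmed.length() / 2)
                : trimmed.substring(0, space);
    }

    // Contextual rule: bank details are visible only for a SALARY request
    // made by a user who has completed the payroll training course.
    static Optional<String> bankDetailsFor(ExampleUser user, Purpose purpose, Employee employee) {
        boolean allowed = purpose == Purpose.SALARY
                && user.trainingCompleted().contains(TrainingCourse.PAYROLL_TRAINING_COURSE);
        return allowed ? Optional.of(employee.bankDetails()) : Optional.empty();
    }

    // Recursive rule: walk managers, then managers of managers, and so on.
    static boolean inManagementChain(String userUid, List<Manager> managers) {
        if (managers == null) {
            return false;
        }
        for (Manager manager : managers) {
            if (manager.uid().equals(userUid) || inManagementChain(userUid, manager.managers())) {
                return true;
            }
        }
        return false;
    }

    // Pay grade is visible only to users in the employee's management chain.
    static Optional<String> payGradeFor(ExampleUser user, Employee employee) {
        return inManagementChain(user.uid(), employee.managers())
                ? Optional.of(employee.payGrade())
                : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample data: alice manages bob, who manages carol.
        Manager alice = new Manager("alice", List.of());
        Manager bob = new Manager("bob", List.of(alice));
        Employee carol = new Employee("carol", "SW1A 2AA", "example-bank-details",
                "GRADE_3", List.of(bob));
        ExampleUser aliceUser = new ExampleUser("alice", Set.of());
        ExampleUser payrollUser = new ExampleUser("dave",
                Set.of(TrainingCourse.PAYROLL_TRAINING_COURSE));

        System.out.println(maskPostCode(carol.postCode()));                                   // SW1A
        System.out.println(bankDetailsFor(payrollUser, Purpose.SALARY, carol).isPresent());       // true
        System.out.println(bankDetailsFor(payrollUser, Purpose.STAFF_REPORT, carol).isPresent()); // false
        System.out.println(payGradeFor(aliceUser, carol).orElse("hidden"));                   // GRADE_3
    }
}
```

Returning `Optional` rather than `null` makes "hidden" an explicit outcome of each rule, which is one possible design choice for keeping redaction visible to the caller.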