Replies: 4 comments 2 replies
-
Meeting August 18, 2022: Sarguel, Ian, Jose

Regarding 2:
Regarding 3:
Regarding 4:
-
Meeting September 15, 2022: Cancelled

I met with Sarguel on Saturday, September 17, 2022 to review some updates he has been working on:

Regarding 2:
Regarding 3:
Regarding 4: STANDBY -> I am currently adding the Channel field to OSSEM-DM
Azure data request:
-
I am posting here at Ian's request. As part of the Feb hackathon month, I was considering writing a code generator for the OSSEM-CDM that would generate both Pydantic models and Pandera schema models. For example, the Device entity in the OSSEM-CDM would generate Python classes like this:

# Relatively close pseudo-code
from typing import Annotated
import pandera as pa
from pandera.typing import Series
from pydantic import BaseModel, Field
class Device(BaseModel):
    """
    Events used to normalize events for the device or endpoint that generated the event (source or destination).
    """

    action: Annotated[str, "If reported by an intermediary device such as a firewall, the action taken by device."] = Field(alias="Action")
    inbound_interface: Annotated[str, "If reported by an intermediary device such as a firewall, the network interface used by it for the connection to the source device"] = Field(alias="InboundInterface")
    outbound_interface: Annotated[str, "If reported by an intermediary device such as a firewall, the network interface used by it for the connection to the destination device."] = Field(alias="OutboundInterface")
    hostname: Annotated[str, "The host name from which the event/log came from. There may be multiple host names in an event (i.e. syslog could have forwarder host name), this field is to be the most true log host name (i.e. NOT the forwarders name)."] = Field(alias="Hostname")
    domain: Annotated[str, "Name of the domain the device is part of."] = Field(alias="Domain")
    fqdn: Annotated[str, "The fully qualified domain name of the host"] = Field(alias="Fqdn")
    interface_guid: Annotated[str, "GUID of the network interface which was used for authentication request"] = Field(alias="InterfaceGuid")
    interface_name: Annotated[str, "the name (description) of the network interface that was used for authentication request. You can get the list of all available network adapters using ipconfig /all command"] = Field(alias="InterfaceName")
    os: Annotated[str, "The OS of the device"] = Field(alias="Os")
    model_name: Annotated[str, "The model name of the device"] = Field(alias="ModelName")
    model_number: Annotated[str, "The model number of the device"] = Field(alias="ModelNumber")
    type_: Annotated[str, "The type of the device"] = Field(alias="Type")  # a leading underscore (e.g. _type) would be treated by pydantic as a private attribute, not a field
class DeviceSchema(pa.SchemaModel):
    """
    Events used to normalize events for the device or endpoint that generated the event (source or destination).
    """

    action: Series[str]
    inbound_interface: Series[str]
    outbound_interface: Series[str]
    hostname: Series[str]
    domain: Series[str]
    fqdn: Series[str]
    interface_guid: Series[str]
    interface_name: Series[str]
    os: Series[str]
    model_name: Series[str]
    model_number: Series[str]
    type_: Series[str]

The benefit is that these classes would give you better auto-complete in your IDE (through pylance/jedi) and provide runtime validation of objects to ensure they conform to the specification in OSSEM-CDM. It would still be up to the developer to use the data dictionaries from OSSEM-DD to populate the generated class models for the OSSEM entities. The only coercion that happens is what pydantic itself performs when parsing values into the types declared on the class model.

Here is an example of how we could define an analytical function that takes a dataframe of OSSEM-CDM devices and returns a list of malicious devices based on some detection logic:

import pandas as pd
from typing import Any, Dict, List
from pandera.typing import DataFrame
# Dictionary representation of the devices.
device_dump: List[Dict[str, Any]] = dump_devices_from_somewhere()
# The pydantic model will try to parse the dicts into
# the class model. If it is wildly different, then it
# will throw an exception. If there are subtle type
# mis-matches, it will attempt to coerce the values
# into whatever types are defined on the OSSEM-CDM
# entity class model.
# These are now tidy models, so you can access attributes
# using dot notation instead of itemgetter. You can also
# leverage pydantic to compose multiple entities into
# the OSSEM-CDM tables.
corporate_devices = [Device(**device) for device in device_dump]
# But we love our dataframes for analytics. So how can
# we get similar sanity checking/type annotations on
# of OSSEM-CDM entities in dataframes? We can use
# pandera.
# .dict() yields the snake_case field names, which match the DeviceSchema columns.
device_df = pd.DataFrame([device.dict() for device in corporate_devices])
@pa.check_types(with_pydantic=True)
def detect_compromised_devices(df: DataFrame[DeviceSchema]) -> List[Device]:
    """
    This function knows nothing about where these devices
    came from or how they were populated. But if you pass
    it a dataframe with the appropriate columns as defined
    in the pandera schema model `DeviceSchema`, then the
    function should happily consume it.

    Think of this as a type annotation on a dataframe.
    The type annotation on the return value can also
    be one of these DataFrames, so if you are doing
    a bunch of munging/ETL, you can define what the output
    should be and pandera will complain if your function
    doesn't actually implement the contract.
    """
    raise NotImplementedError
# If `device_df` doesn't conform to the pandera schema model,
# then it will throw an exception. This makes it easier to
# reason about the internal logic of the function and MUCH
# easier to test.
compromised_devices = detect_compromised_devices(device_df)

Does this seem generally useful?
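To make that failure behaviour concrete, here is a minimal sketch (not part of the proposed generator; it reuses the `Device`, `DeviceSchema`, and `detect_compromised_devices` definitions above, the field values are made up, and the exact exception type can vary with the pandera/pydantic configuration):

import pandas as pd
import pandera as pa
from pydantic import ValidationError

# pydantic rejects records that are missing required OSSEM-CDM fields or
# whose values cannot be coerced into the declared types.
try:
    Device(Action="allow", Hostname="WKSTN-01")  # most required fields missing
except ValidationError as err:
    print(err)

# pandera rejects dataframes that do not match DeviceSchema (e.g. missing
# columns), so bad input fails at the function boundary rather than deep
# inside the detection logic.
bad_df = pd.DataFrame({"hostname": ["WKSTN-01"]})
try:
    detect_compromised_devices(bad_df)
except (pa.errors.SchemaError, ValidationError) as err:
    print(err)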
-
Here is a brief update on my progress RE: OSSEM entity and table code generation into Pydantic models.

I used the pypackage cookiecutter to bootstrap the project and wrote parsers for both the OSSEM entity and table definitions. I then wrote jinja-based templates to convert the parsed entity/table definitions into Pydantic models (see the sketch below). The generated Pydantic models are syntactically valid and (as far as I can tell) semantically equivalent to the definitions in the YAML files. Here is an example of the generated network session table model: it properly adds the prefix and sub-selects the relevant fields from the specified entity.

My current challenges are with the tables. There are some tables that reference attributes that do not exist on their source entity. Additional work also needs to be done to ensure that all fields in the generated table models are actually unique; I have found a few name collisions.
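For anyone curious what the generation step roughly looks like, here is a minimal sketch. The YAML layout, attribute names, and template text are simplified assumptions for illustration, not the actual OSSEM-CDM files or the templates in the repository:

# Minimal sketch of the entity -> Pydantic generation step (illustrative only).
import yaml
from jinja2 import Template

ENTITY_YAML = """
name: Device
description: Events used to normalize events for the device or endpoint that generated the event.
attributes:
  - name: Hostname
    field: hostname
    description: The host name from which the event/log came.
  - name: Domain
    field: domain
    description: Name of the domain the device is part of.
"""

ENTITY_TEMPLATE = Template(
    '''\
from pydantic import BaseModel, Field


class {{ entity.name }}(BaseModel):
    """{{ entity.description }}"""
{% for attr in entity.attributes %}
    {{ attr.field }}: str = Field(alias="{{ attr.name }}", description="{{ attr.description }}")
{% endfor %}
''',
    trim_blocks=True,
)

# Parse the (simplified) entity definition and render it into a Pydantic model.
parsed_entity = yaml.safe_load(ENTITY_YAML)
print(ENTITY_TEMPLATE.render(entity=parsed_entity))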
Hope some of this made sense :D
-
Some initial thoughts on how this might work (or be modified from the current implementation).