Audit Support #4

Open
andyjefferson opened this issue Apr 11, 2016 · 1 comment

This was taken from a Wiki post, so put here as a placeholder in case someone can implement it.

I would like to implement an auditing feature for JDO objects. Auditing means different things to different people, so I'll explain my requirements.

Req 1:
History: When an object is created or modified, the old and new values are stored in a separate location. This feature is along the lines of Hibernate Envers. For now this is of low priority.

Req 2:
AuditMaker: When an object is created or modified, the author, role and timestamp should additionally be captured and stored.

Let's study Req 2:
AuditMaker can be designed to be part of the domain model, i.e. create an interface with author, role and timestamp and implement this interface across the domain. This pollutes my domain model. Furthermore, auditing is a cross-cutting concern that is not needed in all projects or for all customers. It should be enabled only when I need it.

Another option is to handle this via metadata. Introduce an annotation @Auditable. All @Auditable classes will automatically get three additional fields. This is along the same lines as JDO versioning: with @Version the data object automatically gets a version field.
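
A minimal sketch of the marker-annotation idea, using only the JDK. The annotation name @Auditable comes from the proposal above; the retention and target settings, the sample classes Invoice and Note, and the helper AuditableCheck are assumptions made for illustration and are not part of JDO or DataNucleus.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation; retention and target are assumptions.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Auditable {
}

// Audited domain class: no audit fields are declared in the model itself.
@Auditable
class Invoice {
    String number;
}

// Not annotated, so it would be persisted without any audit data.
class Note {
    String text;
}

class AuditableCheck {
    // True when the class opted in to auditing through the annotation.
    static boolean isAuditable(Class<?> type) {
        return type.isAnnotationPresent(Auditable.class);
    }

    public static void main(String[] args) {
        System.out.println("Invoice audited: " + isAuditable(Invoice.class));
        System.out.println("Note audited: " + isAuditable(Note.class));
    }
}
```

The point of the design is that the domain classes stay free of audit fields, and auditing can be switched on per class.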

Another issue is capturing the user ID and role. This needs some kind of extension point.

Question
Where do I start? One option seems to be lifecycle callbacks. Since I need to enhance the data object, lifecycle callbacks may not be a good option.
If I choose to write a DataNucleus plugin, which plugin point is appropriate for this requirement?
Has anyone already implemented something similar?
Any thoughts on this are welcome.

Andy Jefferson says:
The annotation sounds fine, and you can make use of http://www.datanucleus.org/extensions/class_annotation_handler.html to get the annotation information read in to DataNucleus; in the "processClassAnnotation" method just call something like
cmd.addExtension("auditable", "true");

so then when you access AbstractClassMetaData in your code you can find if the class is defined as auditable or not.

Userid and role : a user of JDO/JPA doesn't have these. How useful is that info?

Is it going to work with all types of datastores ? Obviously the RDBMS plugin knows how to persist to RDBMS, the HBase plugin knows how to persist to HBase etc, and you don't want to have to update all of those.

One way you could do this, used by a not-yet-released bit of software (encryption of data), is that when the user wants to use auditing they define their connectionURL as something like
audit:jdbc:mysql:...
audit:hbase:...
And "AuditStoreManager" will create the "backing" StoreManager of the right type (RDBMS, HBase, etc.) that will do the actual persistence. So your AuditStoreManager receives all persistence calls, which allows you to intercept any changes. You then relay the persistence call to the backing StoreManager for the user's objects, and then also create "in-memory" the persistable classes for the audit data, and persist instances of these to the backing StoreManager.
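
A rough sketch of the wrapping idea in plain Java: strip the "audit:" prefix to find the backing store, relay each call, then persist an audit record through the same backing store. SimpleStore, RecordingStore and AuditingStore are hypothetical stand-ins invented for this example; they are not the DataNucleus StoreManager API, and the audit record is just a string here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a store manager; NOT the DataNucleus API.
interface SimpleStore {
    void insert(Object obj);
}

// Pretend backing store that just remembers what it was asked to persist.
class RecordingStore implements SimpleStore {
    final String url;
    final List<Object> inserted = new ArrayList<>();

    RecordingStore(String url) {
        this.url = url;
    }

    @Override
    public void insert(Object obj) {
        inserted.add(obj);
    }
}

// Wrapper that receives every call, relays it, then persists an audit entry
// through the same backing store.
class AuditingStore implements SimpleStore {
    static final String PREFIX = "audit:";
    final SimpleStore backing;

    AuditingStore(SimpleStore backing) {
        this.backing = backing;
    }

    // Strips the "audit:" prefix to obtain the URL of the backing store.
    static String backingUrl(String connectionUrl) {
        if (!connectionUrl.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not an audit URL: " + connectionUrl);
        }
        return connectionUrl.substring(PREFIX.length());
    }

    @Override
    public void insert(Object obj) {
        backing.insert(obj);                    // the user's object
        backing.insert("AUDIT insert " + obj);  // the audit record (a plain string here)
    }
}
```

Because the wrapper only talks to the backing store through its interface, none of the datastore-specific plugins would need to know about auditing.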

    Kiran Kumar says:
    The annotation handler sounds like a good start. Later it can be applied to XML-based metadata.
    I have a few questions here:
    1) Does the enhanced class have access to events like prePersist and preUpdate?
    2) Does the enhanced class have access to the store manager? The idea is that when it finds an RDBMS datastore I can run additional checks to verify whether the audit columns already exist. If it finds an object datastore, additional validation is not needed.


        Is it going to work with all types of datastores ? Obviously the RDBMS plugin knows how to persist to RDBMS, the HBase plugin knows how to persist to HBase etc, and you don't want to have to update all of those.

    Yes it will work in all data-stores since the datastore is expected to store/work with the enhanced class

        Userid and role : a user of JDO/JPA doesn't have these. How useful is that info?

    The user ID and role don't belong to the datastore. They come from the application and have to be the details of the currently logged-in user. In a typical J2EE application these details are stored in the HTTP session. In a CDI-based application they could be injected as a session-scoped bean. Some may prefer to store them in a thread local.
    To be able to support multiple techniques we need some kind of extension point. In one banking application I developed, it was required to store the userid, approverid, timestamp and domainId. So the content of the audit info depends on the actual application being developed.
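
One possible shape for such an extension point, sketched with the JDK only. The interface AuditInfoProvider and the class ThreadLocalAuditInfoProvider are hypothetical names for this example; a session-based or CDI-based implementation could sit behind the same interface. The map is free-form because, as noted above, the content of the audit info depends on the application.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical extension point: the application decides what goes into the audit info.
interface AuditInfoProvider {
    Map<String, String> currentAuditInfo();
}

// One possible implementation that keeps the details in a thread local.
class ThreadLocalAuditInfoProvider implements AuditInfoProvider {
    private static final ThreadLocal<Map<String, String>> CURRENT =
            ThreadLocal.withInitial(LinkedHashMap::new);

    // Called by the application, for example after login or at the start of a request.
    static void set(String key, String value) {
        CURRENT.get().put(key, value);
    }

    // Called when the request ends so values do not leak to the next user of the thread.
    static void clear() {
        CURRENT.remove();
    }

    @Override
    public Map<String, String> currentAuditInfo() {
        // Return a copy so callers cannot modify the thread's state.
        return new LinkedHashMap<>(CURRENT.get());
    }
}
```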

        audit:jdbc:mysql:...

    This seems to be at a global level. I need control at class level. Not all objects require auditing. Furthermore, the nature of the audit differs slightly depending on the situation. For example, at my organization we classify data into two types: a) master data and b) transactional data. The nature of the audit for master data is different from that for transactional data.

        Andy Jefferson says:
        The enhanced class is a class, nothing more. It doesn't do anything; my suggested route doesn't need "callbacks" since you provide an AuditStoreManager and that receives all persistence calls.

        The persistence manager allows "properties", so the user could set a property of "user" (or something, if they want it registering in audits) and then you use that.

        My suggested route is to intercept the StoreManager, and relay calls on to the underlying StoreManager and optionally persist some audit information too. If a class is not audited then you simply relay the calls to the underlying StoreManager (you have the metadataManager so can easily check what is audited).

        The only other route would be to update every store plugin for auditing, and that makes no sense.


            Kiran Kumar says:
            Great, now I understand your suggestion. Effectively there will be two new plugins: 1) Annotation Handler 2) Store Manager. I will get back with some code.
            Thanks a lot

            Kiran Kumar says:
            Andy, I have created a new StoreManager based on FederatedStoreManager, but I get a ClassCastException while inserting an object. The idea is to create a transparent store manager which will relay all the calls to backingStoreManager. The exception is as follows:

            Exception in thread "main" java.lang.ClassCastException: org.datanucleus.datastore.audit.AuditStoreManager cannot be cast to org.datanucleus.store.rdbms.RDBMSStoreManager
            at org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:205)
            at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.insertTable(RDBMSPersistenceHandler.java:163)
            at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.insertObject(RDBMSPersistenceHandler.java:139)
            at org.datanucleus.datastore.audit.AuditPersistenceHandler.insertObject(AuditPersistenceHandler.java:108)
            at org.datanucleus.state.JDOStateManagerImpl.internalMakePersistent(JDOStateManagerImpl.java:2411)
            at org.datanucleus.state.JDOStateManagerImpl.makePersistent(JDOStateManagerImpl.java:2387)
            at org.datanucleus.ObjectManagerImpl.persistObjectInternal(ObjectManagerImpl.java:1779)
            at org.datanucleus.ObjectManagerImpl.persistObjectWork(ObjectManagerImpl.java:1627)
            at org.datanucleus.ObjectManagerImpl.persistObject(ObjectManagerImpl.java:1474)
            at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:734)
            at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:759)
            at org.datanucleus.samples.jdo.tutorial.Main.main(Main.java:58)

            Line 205 in InsertRequest.java
            RDBMSStoreManager storeMgr = (RDBMSStoreManager)ec.getStoreManager();

            I guess the above error should also occur with FederatedStoreManager. Please give a hint as to how it is handled in FederatedStoreManager.

            Please see the attachment

                Andy Jefferson says:
                Hi, Nowhere does it claim that FederatedStoreManager (and its related classes) are complete; they're not mentioned in the docs for a reason. Two JIRAs exist still, to be worked before 3.1. SVN trunk has further changes towards this end, so you'd be best advised to use that, and work out further issues.


                    Kiran Kumar says:
                    Andy, I have used the code from trunk. Now the hollow AuditStoreManager works with one small change: the method
                    RDBMSQueryUtils.prepareStatementForExecution() still refers to the store manager in the nucleus context. After the following change it works well.

                    - MappedStoreManager storeMgr = (MappedStoreManager)nucleusCtx.getStoreManager();
                    + MappedStoreManager storeMgr = (MappedStoreManager)query.getStoreManager();


Jasper Siepkes says:
Have you thought about how you are going to store the audit data? I know that Hibernate Envers stores a complete copy of a persisted object for every change. So for example if you have a table with 10 columns and you change 1 column, Envers will also store the other 9 columns (it's more like snapshotting).

An alternative would be to store all changes in a change table (one change table for every versioned table, for example).

The advantage of "snapshotting" is that you can keep your type data (number, varchar, etc.). It is also quite easy to get a point-in-time view of an object.

The obvious disadvantage of "snapshotting" is that it will probably take up more space unless your underlying storage mechanism performs some sort of compression (in the case of an RDBMS, full table compression; I know that PostgreSQL, for example, does not support this, only compression at record level).

    Kiran Kumar says:
    Jasper, as described in the post, right now I am trying to achieve Req 2 (AuditMaker).
    With reference to versions, here are my thoughts.
    There are two major types of needs: 1) regulatory requirements and 2) business requirements.
    1) Regulatory requirements: every change must be recorded along with the relevant metadata. Querying is not a concern. The audit data is accessed only when there is a dispute or a similar situation. Here traceability is important. The storage and format are of little concern.
    2) Business requirements: businesses like insurance need to keep track of all changes to, say, customer information, policies etc. The versioned records need to be available for day-to-day queries. Business activities depend on when the change happened and why it happened. Here storage and accessibility of the storage are critical.

    The second requirement should be addressed by designing the version strategy as part of the object model. Any generic solution will fall short of the actual requirement.

    The first requirement can be achieved as a generic solution which can be independent of the object model.
    For example, I have a petStore application. When I deploy it in regulated regions I will enable auditing. In other regions I will disable auditing. I can also fine-tune the depth and the objects which will be audited.

    To answer your question, considering that my primary focus is on the first requirement, the shortest path seems to be storing snapshots. Furthermore, the snapshot need not be similar to the original object/table. In one of the products for banking (using JDBC) I stored the audit data in JSON format. The JSON format has a few advantages. When storage is a concern we could compress before storing.

    The design problem I am facing is the version strategy across all the dependent objects. What will the version be when the parent changes? What will the revision be when a child changes? Will all dependent objects have a single revision number, similar to how SVN manages objects? Or what about having a single revision for the whole database? How will this translate when used on RDBMS vs object databases?

    Any suggestions or comments?
    Thanks Kiran Kumar


        Jasper Siepkes says:
        Hi Kiran,

        I agree with you that the second requirement cannot be fulfilled with the same generic audit framework we are talking about here and is therefore out of scope.

        About the revision numbers: I think the state of the auditable objects can only be tracked in a single-revision-number kind of way. Let's say you have a simple object like "Order" which contains a "Customer" object. Next you change the name of the customer in the customer object and increment the version of the customer object. However, this will also affect the order object, which doesn't have its version incremented. Tracking where the customer object is used and incrementing the version there is obviously a messy solution. Since the system would work in a snapshot kind of way, a global revision number makes sense (like SVN, Git, etc.). I know Hibernate Envers does not do this; it tracks version numbers per object. Regarding RDBMS vs object DB, I don't think the underlying storage mechanism matters.

        Since the primary goal is tracking changes for regulatory purposes (seeing who changed what and tracking down a person who messed up), the speed of retrieving this data is not the most important objective IMHO. Personally I think persisting the changes via DataNucleus is the best option. For example, with RDBMS this would mean creating an audit table for every table of an audited entity. This audit table would have all the columns the normal table has but contain extra columns to store different versions of the entity. Now I think we could get away with storing only the column data that has changed since the previous version and putting NULL in all the unchanged columns. This way we would have minimal storage usage and type safety. Hibernate Envers always stores all columns for every version (IIRC). I am somewhat curious why they do this and not simply store only changed columns and NULL everything that hasn't changed. Am I missing something obvious? (I guess the best way to find out is asking them.)
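
A small sketch of the "store only what changed, NULL for the rest" idea, with a row modelled as a map from column name to value. The class ChangedColumns and its method are hypothetical names for this example. Note that in this form a column explicitly set to null cannot be told apart from an unchanged column.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

class ChangedColumns {
    // Builds an audit row holding the new value for changed columns and null for the rest.
    static Map<String, Object> auditRow(Map<String, Object> before, Map<String, Object> after) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : after.entrySet()) {
            boolean changed = !Objects.equals(before.get(e.getKey()), e.getValue());
            row.put(e.getKey(), changed ? e.getValue() : null);
        }
        return row;
    }
}
```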

        What do you think Kiran ? Andy, do you have any thoughts on this ?

            Aug 21, 2011
            Andy Jefferson says:
            Basis for discussion : Use-case - Order and Customer (1-1).
            We only store audit information for a class when a user has marked it as @Auditable, and the class has a version strategy.

            Primary datastore : tables ORDER, CUSTOMER.
            Audit datastore (can be different) : tables ORDER_AUDIT, CUSTOMER_AUDIT, AUDIT_VERSION

            The AUDIT_VERSION table would be updated for each block of changes. In CUSTOMER_AUDIT, ORDER_AUDIT you have the same structure as CUSTOMER, ORDER but with an additional column "CHANGE_VERSION", and this is the key into AUDIT_VERSION. AUDIT_VERSION could also include user name, datetime etc.

            1. User persists first version of each.
            -> Create entries in tables ORDER, CUSTOMER, with version=1 (of each).
            -> Create entry in AUDIT_VERSION, version=1
            -> Create entry in ORDER_AUDIT, CUSTOMER_AUDIT for each, and CHANGE_VERSION=1.

            2. User updates Customer "name".
            -> Update CUSTOMER table "name", and version=2.
            -> Create entry in AUDIT_VERSION, version=2
            -> Create entry in CUSTOMER_AUDIT for this record, CHANGE_VERSION=2

            3. User updates Order "location"
            -> Update ORDER table "location", and version=2.
            -> Create entry in AUDIT_VERSION, version=3
            -> Create entry in ORDER_AUDIT for this record, CHANGE_VERSION=3

            Jasper's idea of only putting values in the changed columns of CUSTOMER_AUDIT, ORDER_AUDIT makes sense since you then know what was changed rather than having to analyse ... except if the user set something to null! So we would need to cater for that.
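
The three steps above can be replayed with a tiny in-memory model, where one global counter plays the role of AUDIT_VERSION and each audit row carries the CHANGE_VERSION of its block of changes. The class AuditVersionLog and the string format of its rows are assumptions made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for the AUDIT_VERSION, ORDER_AUDIT and CUSTOMER_AUDIT tables.
class AuditVersionLog {
    // One entry per audited row, formatted as "table:description@CHANGE_VERSION".
    final List<String> auditRows = new ArrayList<>();
    private int auditVersion = 0;

    // Starts a new block of changes and returns its global version.
    int nextVersion() {
        return ++auditVersion;
    }

    // Records a changed row under the given global version.
    void record(String auditTable, String description, int changeVersion) {
        auditRows.add(auditTable + ":" + description + "@" + changeVersion);
    }

    public static void main(String[] args) {
        AuditVersionLog log = new AuditVersionLog();
        int v1 = log.nextVersion();   // step 1: first persist of both objects
        log.record("ORDER_AUDIT", "insert", v1);
        log.record("CUSTOMER_AUDIT", "insert", v1);
        int v2 = log.nextVersion();   // step 2: Customer "name" updated
        log.record("CUSTOMER_AUDIT", "update name", v2);
        int v3 = log.nextVersion();   // step 3: the record audited in ORDER_AUDIT is updated
        log.record("ORDER_AUDIT", "update location", v3);
        log.auditRows.forEach(System.out::println);
    }
}
```

The per-object versions in the primary tables stay independent of this counter, which is why the third step has version=2 on the object but CHANGE_VERSION=3 in the audit table.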

                Aug 21, 2011
                Jasper Siepkes says:
                Sounds good Andy!

                I hadn't thought of the scenario where a user sets a value to NULL. Possible solutions I see are:
                    1. A special value for every type to indicate it was set to null rather than not changed.
                        This would become cumbersome to maintain, especially when using more exotic types like geospatial.
                    2. Add a boolean column for every column to flag it as changed.
                        This is better than the idea above because it's more abstract and doesn't require manual maintenance. We could also make a configuration property that would enable the plugin to always store all columns, even the ones that haven't changed (personally I wouldn't use it, but perhaps there are users who want that kind of Envers-compatible behaviour). Users would still be able to tell which columns have changed.
                    3. Store the names of changed columns/fields somewhere in a table for every revision.
                        The advantage is that we don't need a flag column for every column and only have an entry when a value is set to NULL by the user. The disadvantage is that it is probably slower because we need a join to get the info from the table. Also, storing column names in fields is something I try to avoid (if possible) because of long-term maintenance issues.

                I can't really say I care much for any of these 3 possible solutions. Number 2 seems like the least bad of them IMHO. I hope someone can come up with a better solution than the 3 I thought of.
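
A sketch of the flag-per-column option, where every audited column is paired with a boolean that says whether it changed. The classes AuditedValue and FlaggedAuditRow are hypothetical names for this example; a real implementation would map the pair to two datastore columns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// A value plus a flag, mirroring "a boolean column for every column".
class AuditedValue {
    final Object value;
    final boolean changed;

    AuditedValue(Object value, boolean changed) {
        this.value = value;
        this.changed = changed;
    }
}

class FlaggedAuditRow {
    // Stores the new value only for changed columns; the flag tells
    // "set to null" (changed, value null) apart from "not changed".
    static Map<String, AuditedValue> of(Map<String, Object> before, Map<String, Object> after) {
        Map<String, AuditedValue> row = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : after.entrySet()) {
            boolean changed = !Objects.equals(before.get(e.getKey()), e.getValue());
            row.put(e.getKey(), new AuditedValue(changed ? e.getValue() : null, changed));
        }
        return row;
    }
}
```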

                Aug 27, 2011
                Kiran Kumar says:
                Two aspects: storage and retrieval.
                1) Storage: if it were just RDBMS, the above could be achieved without class enhancement. The drawback is that it doesn't work with object databases like db4o, and queries will be complex. So the best option seems to be to use class enhancement.
                Use-case - Order and Customer (1-1).
                Three classes are needed: a) OrderAudit, b) CustomerAudit, c) AuditVersion. To resolve class name conflicts, i.e. when OrderAudit already exists, @Auditable will have a property to define a prefix and suffix for the generated class.

                Question: What is the best place to generate this class: a) as part of the JDO class enhancer, b) at the time of annotation processing, similar to the classes generated for typesafe queries, or c) on the fly at run-time?

                2) Retrieval: Do we need a new API? Can it be accommodated within standard JDO?
                For example: SELECT FROM mydomain.OrderAudit
                Question: Is JDOQL sufficient to represent a) querying for entities of a class at a given revision and b) querying for revisions at which entities of a given class changed?

                Jasper, what do you think? Andy, any suggestions?
                Thanks
                -Kiran

                    Aug 28, 2011
                    Andy Jefferson says:
                    1).
                        Enhancer : this is a separate process, and nothing to do with class creation, so shouldn't put anything to do with Audit in there.
                        AnnotationProcessor : advantage here is that you have a simple way of getting the audit classes created (and enhanced as post-compile), but only works when the model classes are annotated i.e won't work with classes that have XML metadata.
                        Runtime : Would allow it to work with annotations or XML, but involves more work. Implementing this way ought to follow the process in this guide
                        http://www.datanucleus.org/servlet/wiki/pages/viewpage.action?pageId=6619188

                        Sep 02, 2011
                        Kiran Kumar says:
                        Andy, the new annotation can be used as below.

                        @Auditable(auditableclass="org.datanucleus.samples.jdo.tutorial.AuditableBook")
                        or
                        @Auditable(auditableclassprefix="Auditable")
                        The auditableclass property takes priority. When auditableclass is not found, AuditStoreManager will create the auditable class on the fly.

                        I am initially trying with an existing class. I have Book.java and AuditableBook.java, which are not related to each other. AuditableBook has a few extra auditable fields.

                        I hit a roadblock here. I am able to insert the Book and the AuditableBook, but the AuditableBook gets deleted, apparently due to persistence-by-reachability. The JDOStateManagerImpl finds that the AuditableBook is not reachable and hence decides to delete it. Please suggest the proper way to handle the lifecycle.

                        Here is the pseudo code for AuditPersistenceHandler.persist():
                        {
                            StateManager smaudit = (StateManager) ObjectProviderFactory.newForPersistentNew(sm.getExecutionContext(), auditDataObject, null);
                            smaudit.setFlushedNew(true);
                            smaudit.makePersistent();
                            storeMgr.insertObject(smaudit);
                        }

                        datanucleus.log
                        DEBUG main (DataNucleus.Persistence) - Object with id "1OIDorg.datanucleus.samples.jdo.tutorial.AuditableBook" was reachable when a makePersistent() was called on another object but is no longer reachable (at commit). The object will be removed from the datastore.
                        DEBUG main (DataNucleus.Cache) - Object "org.datanucleus.samples.jdo.tutorial.AuditableBook@1154b2f" (id="1OIDorg.datanucleus.samples.jdo.tutorial.AuditableBook") taken from Level 1 cache (loadedFlags="YYYYYYY") cache size = 3
                        DEBUG main (DataNucleus.Persistence) - ObjectManager.internalFlush() process started using optimised flush - 0 to delete, 0 to insert and 0 to update
                        DEBUG main (DataNucleus.Persistence) - ObjectManager.internalFlush() process finished
                        DEBUG main (DataNucleus.Cache) - Object "org.datanucleus.samples.jdo.tutorial.AuditableBook@1154b2f" (id="1OIDorg.datanucleus.samples.jdo.tutorial.AuditableBook") taken from Level 1 cache (loadedFlags="YYYYYYY") cache size = 3
                        DEBUG main (DataNucleus.Lifecycle) - Object "org.datanucleus.samples.jdo.tutorial.AuditableBook@1154b2f" (id="1OIDorg.datanucleus.samples.jdo.tutorial.AuditableBook") has a lifecycle change : "P_NEW"->"P_NEW_DELETED"
                        DEBUG main (DataNucleus.Persistence) - Object "org.datanucleus.samples.jdo.tutorial.AuditableBook@1154b2f" being deleted from table "JDO_AUDITABLEBOOKS"
                        DEBUG main (DataNucleus.Connection) - Connection found in the pool : [org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl@dcfce0, jdbc:hsqldb:mem:nucleus1, UserName=SA, HSQL Database Engine Driver] for key=org.datanucleus.ObjectManagerImpl@7733eb in factory=ConnectionFactory:tx[org.datanucleus.store.rdbms.ConnectionFactoryImpl@5e1210]
                        DEBUG main (DataNucleus.Datastore.Persist) - Retrieving PreparedStatement for connection "jdbc:hsqldb:mem:nucleus1, UserName=SA, HSQL Database Engine Driver"
                        DEBUG main (DataNucleus.Datastore.Native) - DELETE FROM JDO_AUDITABLEBOOKS WHERE JDO_AUDITABLEBOOKS_ID=<1>
                        DEBUG main (DataNucleus.Datastore.Persist) - Execution Time = 1 ms (number of rows = 1)

                        Thanks Kiran

                            Sep 02, 2011
                            Andy Jefferson says:
                            Any persistence of AuditableBook should be through the correct API calls. Needs to go through ObjectManagerImpl.persistObject(...) otherwise it won't be registered as an object being persisted (and hence PBR at commit removes it). i.e you have to simulate what would happen if the user did
                            pm.makePersistent(book);
                            pm.makePersistent(auditBook);
                            (although the user obviously doesn't have the auditBook object). That way you let DataNucleus create StateManagers etc.

                                Sep 08, 2011
                                Kiran Kumar says:
                                Thanks andy. Now I am able to insert and update. I used

                                ObjectManagerImpl omgr=(ObjectManagerImpl) sm.getExecutionContext();

                                Question: How can I access the currentValue? I am trying to store the currentValue as well as the newValue.
                                DirtyLifecycleListener is not suitable since it fires only for the first event.

                                One way I can think of is to clone the existing object and query the original state from the datastore, push the values from the dirty fields to AuditableBook, and finally merge the dirty fields with the original object and persist.

                                This seems like too much work. Is there a better solution?

                                Thanks
                                Kiran

                                    Sep 08, 2011
                                    Andy Jefferson says:
                                    What currentValue? Of a field of the user's persistable object? Use a FieldManager.

                                        Sep 09, 2011
                                        Kiran Kumar says:
                                        I am looking for CurrentValue of the field. In an audit trail I want to store the objectid,objecttype,fieldName,oldValue,newValue

                                        --Kiran

                                            Sep 09, 2011
                                            Andy Jefferson says:
                                            Look at the use of FieldManager in the store plugins (e.g store.odf, store.hbase). "insert" of an object will use a StoreFieldManager to receive the values (and put them in the datastore). You could use the same idea to put the values into the audit object.

                                            Why store oldValue and newValue? The first time an object is inserted, just store the value of all fields. Then on each change you store the new value.

                                                Sep 09, 2011
                                                Kiran Kumar says:
                                                Andy, here is my requirement for the audit trail.
                                                Requirement: Maintain a log of changes in a single table for all objects (a single class in the case of an object database). It is a series of changes as they happen.
                                                Purpose: We use the audit trail for a) tracking user activity and b) support. During support we often get calls like "My charges are too high today. Until yesterday they were normal." We request the audit log from the user and find that the user has changed the charging preference from transaction-based to volume-based.
                                                Having old values along with new values is critical for readability. It is part of the root cause analysis.

                                                Say I have
                                                Person
                                                id  name    surname    version
                                                1   andy    jefferson  1
                                                2   Jasper  Siepkes    1

                                                After first change ie, converting name into upper case
                                                id  name    surname    version
                                                1   ANDY    jefferson  2
                                                2   Jasper  Siepkes    1

                                                AuditLog
                                                id  objectId  fieldName  oldValue   newValue   action  version  time  userid
                                                1   1         name       andy       andy       insert  1        t1
                                                2   1         surname    jefferson  jefferson  insert  1        t1
                                                3   2         name       Jasper     Jasper     insert  1        t2
                                                4   2         surname    Siepkes    Siepkes    insert  1        t2
                                                5   1         name       andy       ANDY       update  2        t3

                                                Here userid can be any other supplementary data such as user role, screen id, terminal id, etc.

                                                The StoreFieldManager receives the ObjectProvider sm as input. The problem is that by the time StoreManager.update() is executed, the fields have already been replaced by the StateManager. It seems to me that there are two ways to get hold of the old value: 1) query the database, 2) retrieve it from the cache.

                                                I am not sure how to handle either of them. Please advise.
                                                Thanks and Regards
                                                Kiran

                                                    Sep 09, 2011
                                                    Andy Jefferson says:

                                                    My point is you don't need to store the "oldValue" since it is in the previous change (i.e. duplication). Sure, it's nice to be able to compare when looking at the datastore, but you can get the same from a simple SELECT. AuditLog row 1 has the value "andy", and row 5 has the value "ANDY", hence you can interpret that it was changed from "andy" to "ANDY". I don't see how "AuditLog" knows the object is a Person, unless you mean AuditLog is a table just for Person objects.

                                                    If you really want to store the oldValue, just do a SELECT on the AuditLog for the previous change of that field. That is way simpler (i.e. a Query on the audit StoreManager for the AuditLog, WHERE fieldName = :value).

                                                        Sep 09, 2011
                                                        Kiran Kumar says:

                                                        The AuditLog does not know anything about the Person other than its id. It has no relation with Person.

                                                        Typically the AuditLog grows quickly, so we provide purge logic: the user can choose to remove older records. Both of the suggested options add complexity to the purge logic.
                                                        Please note: the AuditLog here is not expected to support queries like "what is the Person with version 6?". A single table is used to store changes to all objects.
                                                        Its basic purpose is to answer questions like: which objects changed, who changed them, when, and what was the change?

                                                        --Kiran

                                                            Sep 09, 2011
                                                            Andy Jefferson says:

                                                            Saying "id" is 1 means nothing. So somebody changed field X in an object with id "1" ... great, but that does nothing for auditability (there may be a Person with id "1" and an Account with id "1", so how do you know which?). It has to tell you what type of object it is, otherwise I see no point to this. The thing Jasper and I discussed in this thread around Aug 21 is what made sense to me, since there you have traceability of object types.

                                                                Sep 09, 2011
                                                                Kiran Kumar says:

                                                                Andy, you are right, I forgot to add objectType. You may have noticed that AuditLog stores both oldValue and newValue as Strings in the interest of readability. I still think we need a way to retrieve the old value.

                                                                Regarding the nature of auditing, I request your attention to http://www.datanucleus.org/servlet/wiki/display/ENG/DNAuditor+Requirements+%28Draft%29.

                                                                I know of these requirements because, except for items 6, 7 & 8, I have worked on all of them using JDBC and database triggers in an RDBMS.

                                                                I have listed the suggestions by Jasper as item 3. Initially I thought of working on items 1 and 2.

                                                                Thanks and Regards
                                                                --Kiran

                                                                    Sep 14, 2011
                                                                    Jeremy Higbee says:

                                                                    Kiran, Jasper, and Andy,

                                                                    First off, sorry to butt into the conversation, but I'm intrigued by the discussion, because I am currently trying to implement this type of auditing in the system I'm trying to migrate to DataNucleus. I had used Hibernate Envers previously, which worked for pure auditing but did not fit the business process model I needed to work in (changes are automatic in Envers, whereas my application requires more of a "proposed changes" setup where users can propose changes to a Java Object, using auditing to track exactly what fields they want to change (from what - OldValue, and to what - NewValue)).

                                                                    Secondly, I don't follow the need to capture the Object Type. My current attempt uses two data objects: ProposedChange, which contains a ref to the object being modified (for a bi-directional relationship), metadata regarding who made the change and when, and a List of Change objects. Change objects have a String fieldName, Object oldValue, and Object newValue (since JDO can handle persisting Objects). I use Java reflection to grab the java.lang.Object values of each object's fields (since I want to be able to use/modify my POJOs without having to rewrite the Change object). From my (very bare) understanding, JDO can do this without persisting the Object Type. Am I missing something? Or is the Object Type necessary for implementing this in DataNucleus so that it is agnostic to whether the user is using JDO or JPA?

                                                                    Thanks!
                                                                    Jeremy

                                                                        Sep 15, 2011
                                                                        Kiran Kumar says:

                                                                        Jeremy,
                                                                        Welcome to the friendly neighborhood datanucleus
                                                                        The requirements you describe look familiar. In the banking domain we call it maker-checker.
                                                                        http://stackoverflow.com/questions/5100366/maker-checker-support-envers.

                                                                            ProposedChange which contains a ref to the object being modified (for bi-directional relationship)

                                                                        However, AuditLog (AuditTrail) does not have any relation with the domain object (say Person).
                                                                        We need the Object Type for traceability. It cannot be used to reconstruct the object.

                                                                        A key constraint for dnauditor is to induce zero changes to the domain model.
                                                                        In the solution you suggested, the domain model (Person) needs to be amended, i.e. a property must be added to hold the ProposedChange.

                                                                        AuditLog is trying to address the first requirement in the proposed DNAuditor Requirements (Draft).
                                                                        Currently JPA is not in scope.

                                                                        Sep 15, 2011
                                                                        Andy Jefferson says:

                                                                        Conversations are for all to participate in.

                                                                        "object type" is needed to identify what object the change relates to. An object can have a single field as PK, or composite PK, or compound identity; perhaps the easiest way would be to just make "id" be a String column and to store (something like)
                                                                        "object-type:string-form-of-id"

                                                                        @Kiran, JPA may not be specifically in your scope but, by providing the functionality in a StoreManager, it is accessible for both JDO and JPA

                                                                            Sep 16, 2011
                                                                            Jeremy Higbee says:

                                                                            Kiran, Andy,

                                                                            Thanks for the explanation. I originally failed to grasp that this project aims to incorporate the auditing without requiring changes to the objects (which is fantastic), so my fault. (My case has the flexibility since I'm writing it from scratch, but as I mentioned, I had tried Envers which was the same style).

                                                                            Thanks guys!
                                                                            Jeremy

Sep 12, 2011
Jasper Siepkes says:

Hi Kiran,

Do you have the code hosted somewhere online (github perhaps) so I/we can take a look at it and shine my/our (somewhat) bright light on it?

    Sep 14, 2011
    Kiran Kumar says:

    https://sourceforge.net/p/dnauditor/code/2/tree/trunk/AuditDataStore/src/java/org/datanucleus/datastore/audit/
    Expect to see some amateurish code
    There is also an empty test project.
    --Kiran

        Sep 14, 2011
        Jasper Siepkes says:

        Cool! Thanks! I'm going to take a look at it the coming days.

            Sep 16, 2011
            Kiran Kumar says:

            I have created a few tickets for tracking/discussing line items.
            https://sourceforge.net/apps/trac/dnauditor/report

        Sep 28, 2011
        Andrey says:

        Hi, All! Thank you for interesting topic, Kiran.

        I'm quite new to DataNucleus, but it seems to me that it would be easier/more convenient to hook audit logging in through lifecycle callbacks: http://www.datanucleus.org/products/accessplatform/jdo/lifecycle_callbacks.html#listeners

        prove me wrong?

            Sep 29, 2011
            Kiran Kumar says:

            I'll try

            1) I would prefer to use lifecycle listeners (entity listeners in JPA) when developing application-specific behaviour. For example, the listeners in a payments processing application are different from the listeners in a loan processing application.
            DnAuditor would be generic, i.e. it should work with any application using DataNucleus.
            2) We don't have access to the old value in lifecycle listeners (mandatory for an audit trail).
            3) DnAuditor could be configured using annotations or XML. DataNucleus has a few convenient plugin points for accessing annotations and configuration.
            4) Point 3 induces a dependency on DataNucleus. DnAuditor is targeted at DataNucleus users.
            5) We would require dynamic class generation and enhancement. DataNucleus has good infrastructure to handle this. Having enhanced classes at load time will help to identify and debug issues, rather than waiting until the event occurs and the listener executes.
            6) Infrastructure like ObjectManagerImpl, AbstractClassMetaData, VersionMetaData, ObjectProvider and the state manager cannot be accessed in a lifecycle listener (not sure of this, please correct me).
            Please see http://www.datanucleus.org/servlet/wiki/display/ENG/DNAuditor+Requirements+%28Draft%29

                Sep 30, 2011
                Andrey says:

                okay, thanks

                I'll try both approaches when I have more spare time.
                Another auditing approach is to use aspects. At first glance it offers a storage-independent auditing framework. I'll look into it too.

                Btw, Kiran, have you contacted the JFire devs? It seems they want an auditing framework too, but don't have enough resources to implement it.
@andyjefferson
Member Author

Source code and status can be accessed at DnAuditor Sourceforge.
This is based on the discussions at "HOWTO implement Auditable ?".
The following use cases are proposed for this extension.

  1. Basic AuditLog (Audit trail)
    Description : An audit log in its simplest form can be achieved using tools like System.out.println(). In this use case it is critical that the log is available; the format and location are less important. Queries on this data happen infrequently. Since the format is not important, the developer may choose to store the snapshot in serialized format, JSON format or RDBMS format (see Alternative Storage). The default implementation will be for RDBMS. In RDBMS format a single audit_log table will be used for all types. Each change is recorded as a row.
    Purpose : We use the audit trail for a) tracking user activity, b) statutory requirements and c) support. During support we often get calls like "My charges are too high today. Until yesterday they were normal." We request the audit log from the user and find that the user has changed the charging preference from transaction-based to volume-based.
Having old values alongside new values is critical for readability. It is part of the root cause analysis.

For example
Person Table
id  name    surname    version
1   andy    jefferson  1
2   Jasper  Siepkes    1

After first change ie, converting name into upper case
id  name    surname    version
1   ANDY    jefferson  2
2   Jasper  Siepkes    1

AuditLog Table
id  objectId  fieldName  oldValue   newValue   action  version  time  ObjectType
1   1         name       andy       andy       insert  1        t1    Person
2   1         surname    jefferson  jefferson  insert  1        t1    Person
3   2         name       Jasper     Jasper     insert  1        t2    Person
4   2         surname    Siepkes    Siepkes    insert  1        t2    Person
5   1         name       andy       ANDY       update  2        t3    Person

Only the dirty fields are stored.
The AuditLog table may be split into two tables, header and detail. This can be defined by the developer using JDO mapping.
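
As a rough illustration of the AuditLog table in this use case, the sketch below derives audit rows from a before/after view of an object's fields. It is not DataNucleus API: the class name `DirtyFieldAuditSketch`, the method `auditRows` and the use of plain maps to stand in for field values are all assumptions made for the example, and the time column is left out. An insert writes every field with oldValue equal to newValue (as in rows 1 to 4), while an update writes only the dirty fields (as in row 5).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not DataNucleus API: maps stand in for an object's field values.
final class DirtyFieldAuditSketch {

    // Builds one "objectId fieldName oldValue newValue action version objectType" row
    // per audited field. Pass before == null for a newly inserted object.
    static List<String> auditRows(String objectType, String objectId, int version,
                                  Map<String, Object> before, Map<String, Object> after) {
        List<String> rows = new ArrayList<>();
        boolean insert = (before == null);
        for (Map.Entry<String, Object> field : after.entrySet()) {
            Object newValue = field.getValue();
            // On insert the example table repeats the new value in the oldValue column.
            Object oldValue = insert ? newValue : before.get(field.getKey());
            if (!insert && Objects.equals(oldValue, newValue)) {
                continue; // field is not dirty, so it is not logged
            }
            rows.add(String.join(" ", objectId, field.getKey(), String.valueOf(oldValue),
                    String.valueOf(newValue), insert ? "insert" : "update",
                    String.valueOf(version), objectType));
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object> v1 = new LinkedHashMap<>();
        v1.put("name", "andy");
        v1.put("surname", "jefferson");
        Map<String, Object> v2 = new LinkedHashMap<>(v1);
        v2.put("name", "ANDY");

        auditRows("Person", "1", 1, null, v1).forEach(System.out::println);
        auditRows("Person", "1", 2, v1, v2).forEach(System.out::println);
    }
}
```

In a real plugin the old values would have to come from wherever the store layer can still see them, which is the open question raised in the discussion; the sketch simply assumes they are available.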

  2. Basic AuditLog (Audit trail) With supplementary data
    Description : In addition to changes to the object, an application may need to store information like userid, role, activity, terminal, etc.
    There has to be an extension point or a facility for the developer to provide a custom auditlog or a supplementary class.
    AuditLog Table
    id  objectId  fieldName  oldValue   newValue   action  version  time  ObjectType  userid  activity   role
    1   1         name       andy       andy       insert  1        t1    Person      u1      create     maker
    2   1         surname    jefferson  jefferson  insert  1        t1    Person      u1      create     maker
    3   2         name       Jasper     Jasper     insert  1        t2    Person      u1      create     maker
    4   2         surname    Siepkes    Siepkes    insert  1        t2    Person      u1      create     maker
    5   1         name       andy       ANDY       update  2        t3    Person      a1      authorize  checker

The AuditLog table may be split into two tables, header and detail. This can be defined by the developer using JDO mapping.
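
One possible shape for the extension point mentioned in this use case is a small callback that the application implements to report the current user, role, activity and so on. The sketch below is only an assumption about how that could look: `SupplementaryDataProvider`, `currentValues` and `enrich` are hypothetical names, not part of DataNucleus or DnAuditor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an extension point for supplementary audit columns.
final class SupplementaryAuditSketch {

    // Implemented by the application, e.g. backed by its security context.
    interface SupplementaryDataProvider {
        Map<String, String> currentValues();
    }

    // Merges the core audit columns with whatever the provider reports for this change.
    static Map<String, String> enrich(Map<String, String> coreColumns,
                                      SupplementaryDataProvider provider) {
        Map<String, String> row = new LinkedHashMap<>(coreColumns);
        // Core columns win on a name clash, so a provider cannot overwrite audited data.
        provider.currentValues().forEach(row::putIfAbsent);
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> core = new LinkedHashMap<>();
        core.put("objectId", "1");
        core.put("fieldName", "name");
        core.put("action", "update");

        Map<String, String> extra = new LinkedHashMap<>();
        extra.put("userid", "a1");
        extra.put("activity", "authorize");
        extra.put("role", "checker");

        System.out.println(enrich(core, () -> extra));
    }
}
```

Keeping the provider separate from the audited classes fits the stated goal of not touching the domain model: only the application's wiring code needs to know where the user and role come from.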

One-to-one relationship - Audit trail
TODO : Describe how to capture changes to child objects and relations

  3. Temporal Audit
    Description : The audit needs to preserve history, similar to svn/cvs systems. Every change is versioned and all versioned changes are stored. The historical data needs to be accessible to regular application users. It has to answer questions about the state of an object at a point in time. The historical data is frequently queried.
    Purpose : We use the temporal design to resolve queries like "What was andy's Address on 15th of August?",
    "What is the sixth revision of Person.name?" and "What was the value of surname on 15th Aug 2011?". Also known as Actual Temporal.
    DNAuditor should provide this without needing any change in the application model. For example, when a Book has to be audited, the AuditableBook has to be generated by DNAuditor. The end result is that the application developer need not pollute the domain model with temporal aspects of the system.

TODO : Give examples. Describe how to capture changes to child objects and relations
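
As a minimal example of the two kinds of question in this use case ("value on a date" and "n-th revision"), the sketch below keeps the history of a single field in a sorted map. The class `FieldHistorySketch`, its methods and the sample dates and values are hypothetical; a real implementation would query the audit store rather than an in-memory map.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: the history of one field, keyed by the date each value took effect.
final class FieldHistorySketch {
    private final TreeMap<LocalDate, String> valuesByDate = new TreeMap<>();

    void record(LocalDate effectiveFrom, String value) {
        valuesByDate.put(effectiveFrom, value);
    }

    // Returns the value in force on the given date, or null if nothing was recorded by then.
    String valueOn(LocalDate date) {
        Map.Entry<LocalDate, String> entry = valuesByDate.floorEntry(date);
        return entry == null ? null : entry.getValue();
    }

    // Returns the n-th recorded revision (1-based), or null if it does not exist.
    String revision(int n) {
        if (n < 1) {
            return null;
        }
        return valuesByDate.values().stream().skip(n - 1L).findFirst().orElse(null);
    }

    public static void main(String[] args) {
        FieldHistorySketch surname = new FieldHistorySketch();
        surname.record(LocalDate.of(2011, 1, 10), "jefferson");
        surname.record(LocalDate.of(2011, 9, 1), "JEFFERSON");

        System.out.println(surname.valueOn(LocalDate.of(2011, 8, 15)));
        System.out.println(surname.revision(2));
    }
}
```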

  4. Temporal Audit With supplementary data
    Description : In addition to changes to the object, an application may need to store information like userid, role, activity, etc.
    There has to be an extension point or a facility for the developer to provide a custom auditlog or a supplementary class.

TODO : Give examples , Describe how to capture changes to child objects and relations

  5. Temporal Audit With Undo
    Description : It is often necessary to restore an object, or the whole system, to a known state at an earlier point in time, for example when a change is rejected.
    DNAuditor should provide this without needing any change in the application model.

TODO : Give examples. Describe how to capture changes to child objects and relations. Describe how undo buffers are stored.
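
One simple way to picture an undo buffer is a stack of snapshots taken before each change, so that a rejected change can be reverted by popping the latest snapshot. The sketch below assumes exactly that and nothing more: `UndoBufferSketch` and its methods are hypothetical, and it keeps everything in memory rather than in an audit store.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: field values of one object plus a stack of earlier snapshots.
final class UndoBufferSketch {
    private final Map<String, Object> current = new LinkedHashMap<>();
    private final Deque<Map<String, Object>> undoBuffer = new ArrayDeque<>();

    // Applies a change after saving a copy of the state it replaces.
    void apply(Map<String, ?> changes) {
        undoBuffer.push(new LinkedHashMap<>(current));
        current.putAll(changes);
    }

    // Reverts the most recent change; returns false when there is nothing left to undo.
    boolean undo() {
        if (undoBuffer.isEmpty()) {
            return false;
        }
        current.clear();
        current.putAll(undoBuffer.pop());
        return true;
    }

    // Returns a copy so callers cannot bypass the undo buffer.
    Map<String, Object> state() {
        return new LinkedHashMap<>(current);
    }

    public static void main(String[] args) {
        UndoBufferSketch person = new UndoBufferSketch();
        person.apply(Collections.singletonMap("name", "andy"));
        person.apply(Collections.singletonMap("name", "ANDY")); // proposed change
        person.undo();                                            // change rejected
        System.out.println(person.state());
    }
}
```

Full snapshots are the easiest thing to reason about but grow quickly; storing only the dirty fields per change, as in the basic audit log, would be the more compact alternative.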

  6. Bi-Temporal
    Description : Bi-temporal and multi-temporal systems support multiple dimensions of time. Commercial RDBMSs have minimal support for storing and retrieving temporal data like validity, period, actual time, record time, etc. Where there is support, DNAuditor should delegate temporal queries to the underlying database. The domain model of the application requires significant refactoring. DNAuditor can simplify handling temporal datatypes and optimize the queries.

TODO : Give examples.

  7. Alternative Storage
    Description : All the above solutions assume slowly changing data. High-throughput systems require faster and more efficient systems to store audit data. For example, OLTP runs on an RDBMS and the audit data is stored in MongoDB. This data is needed by systems like OFAC, AML, etc. FederatedDataStore can be extended for such use.
    The desired implementations are CSV, Excel, DB4o and MongoDB.

TODO : Give examples.

  8. Audit with Integrity
    Description : Audit data needs to be immutable. However, when it lives on a shared server, e.g. in a cloud, it is vulnerable to tampering by internal/external users. Cryptography has to be used to provide this kind of integrity. One effective solution is to support HMAC. The developer should be able to enable or disable this feature.

TODO : Give examples.
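
To make the HMAC idea concrete, the sketch below computes an HMAC-SHA256 tag over the text of an audit row and checks it later, using the standard `javax.crypto.Mac` API. The class `AuditHmacSketch`, the row format and the hard-coded key are assumptions for illustration only; how the key is stored and where the tag is kept are left open.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: tag each audit row with an HMAC so later tampering can be detected.
final class AuditHmacSketch {
    private static final String ALGORITHM = "HmacSHA256";

    // Computes the tag to store alongside the audit row.
    static byte[] sign(String auditRow, byte[] key) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(key, ALGORITHM));
            return mac.doFinal(auditRow.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key rejected", e);
        }
    }

    // Recomputes the tag and compares it with the stored one in constant time.
    static boolean verify(String auditRow, byte[] key, byte[] storedTag) {
        return MessageDigest.isEqual(sign(auditRow, key), storedTag);
    }

    public static void main(String[] args) {
        // Demo key only; a real key must not live in source code.
        byte[] key = "demo-key-not-for-production".getBytes(StandardCharsets.UTF_8);
        String row = "5 1 name andy ANDY update 2 t3 Person";
        byte[] tag = sign(row, key);

        System.out.println(verify(row, key, tag));
        System.out.println(verify(row.replace("ANDY", "Andy"), key, tag));
    }
}
```

An HMAC per row detects modification of that row but not deletion of whole rows; chaining each tag to the previous one would be a possible extension if that matters.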
