[MRG]: Add rdf serialization #49

Merged 35 commits on Oct 9, 2016

satra
Contributor

@satra satra commented Jun 23, 2014

This is ready for merge!

satra added 12 commits May 17, 2014 21:00
* upstream/master:
  Fixed Build status link
  Added build status from Travis-CI
  doc: fixed docstring for assertLess
  fix: remove redundant ls
  fix: remove checking for py3k
  fix: added assertLess
  fix: added support for asserts to unittest
  fix: more set fixes
  fix: support for 2.6 set
  fix: added python versions to tests
  fix: updated dateutil name
  enh: add travis testing file
@trungdong
Owner

Hi @satra,

FYI, prov now has many more tests (https://github.com/trungdong/prov/tree/master/prov/tests), which you can use to test the RDF export. Since we don't have RDF import, you won't be able to do the round-trip tests as in the test cases, but even a one-way export test could be useful.

* upstream/master:
  Fixed: Cloning the records when creating a new document from them
  Bugfix regarding a software agent record.
@satra
Contributor Author

satra commented Oct 11, 2014

@trungdong - i'm slowly making my way through rdf deserialization. in terms of comparing documents, how do you ensure that the order of attributes doesn't matter?

ACTUAL: u'document
  prefix ex <http://example.org/>

  activity(ex:a2, -, -, [prov:label="bonjour"@fr, prov:label="hello", prov:label="activity2", prov:label="bye"@en])
endDocument'
 DESIRED: u'document
  prefix ex <http://example.org/>

  activity(ex:a2, -, -, [prov:label="activity2", prov:label="hello", prov:label="bye"@en, prov:label="bonjour"@fr])
endDocument'

to me these are the same graphs, but the round trip fails because the order of the attributes is different.

@trungdong
Owner

Hi @satra,
Glad you have some time to get on with this. Thanks.

I suggest you do prov.model --> RDF --> prov.model. Comparing two ProvDocument instances is not sensitive to ordering (of attributes or records) as it uses set instead of list.
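
To illustrate the point about sets, here is a minimal sketch in plain Python. It does not use the prov API: the tuples are hypothetical stand-ins for the (name, value) attribute pairs of the two activity records quoted above, and `same_attributes` is an illustrative helper, not a function of the library.

```python
# Hypothetical stand-ins for the attributes of the two activity records
actual = [
    ("prov:label", ("bonjour", "fr")),
    ("prov:label", "hello"),
    ("prov:label", "activity2"),
    ("prov:label", ("bye", "en")),
]
desired = [
    ("prov:label", "activity2"),
    ("prov:label", "hello"),
    ("prov:label", ("bye", "en")),
    ("prov:label", ("bonjour", "fr")),
]


def same_attributes(a, b):
    """Compare two attribute collections while ignoring their order."""
    return set(a) == set(b)


print(actual == desired)                 # False: list comparison is order-sensitive
print(same_attributes(actual, desired))  # True: set comparison is not
```

Note that a set also discards duplicates, so if the multiplicity of an attribute mattered, `collections.Counter` would be the closer fit.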

@satra
Contributor Author

satra commented Oct 11, 2014

thanks @trungdong - i was doing assert_equal(g.get_provn(), g1.get_provn()), but i changed to assert_equal(g, g1)

@satra
Contributor Author

satra commented Oct 11, 2014

even sets are not quite doing their job - will have to look into this further.

document
  prefix ex <http://example.org/>

  activity(ex:a2, -, -, [prov:type="a", prov:type=1, prov:type=2014-06-23T12:28:53.843000+01:00,
 prov:type="ex:abc" %% xsd:QName, prov:type="http://example.org/hello" %% xsd:anyURI,
 prov:type="1.0" %% xsd:float, prov:type="true", prov:label="activity2"])

endDocument

vs

document
  prefix ex <http://example.org/>

  activity(ex:a2, -, -, [prov:type="a", prov:type=1, prov:type=2014-06-23T12:28:53.843000+01:00, 
prov:type="http://example.org/hello" %% xsd:anyURI, prov:type="1.0" %% xsd:float, prov:type="ex:abc"
 %% xsd:QName, prov:type="true", prov:label="activity2"])

endDocument

sets

[(<QualifiedName: prov:type>, u'a'), (<QualifiedName: prov:type>, 1), (<QualifiedName: prov:type>, 
datetime.datetime(2014, 6, 23, 12, 28, 53, 843000, tzinfo=tzoffset(None, 3600))), (<QualifiedName:
 prov:type>, <Identifier: http://example.org/hello>), (<QualifiedName: prov:type>, <Literal: "1.0" %%
 xsd:float>), (<QualifiedName: prov:type>, <Literal: "ex:abc" %% xsd:QName>), (<QualifiedName:
 prov:type>, u'true'), (<QualifiedName: prov:label>, u'activity2')]

vs

[(<QualifiedName: prov:type>, u'a'), (<QualifiedName: prov:type>, 1), (<QualifiedName: prov:type>,
 datetime.datetime(2014, 6, 23, 12, 28, 53, 843000, tzinfo=tzoffset(None, 3600))), (<QualifiedName:
 prov:type>, <XSDQName: ex:abc>), (<QualifiedName: prov:type>, <Identifier: http://example.org/hello>),
 (<QualifiedName: prov:type>, <Literal: "1.0" %% xsd:float>), (<QualifiedName: prov:type>, u'true'),
 (<QualifiedName: prov:label>, u'activity2')]

@satra
Contributor Author

satra commented Oct 11, 2014

forgot to say that the graphs in the previous comment are failing the assert.

@satra
Contributor Author

satra commented Oct 11, 2014

nevermind - found it - it's the QName
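
One way to spot this kind of mismatch quickly is a symmetric difference of the two attribute sets. The sketch below uses plain tuples as hypothetical stand-ins for the `Literal` and `XSDQName` values printed above; `mismatched` is an illustrative helper and not part of prov.

```python
def mismatched(a, b):
    """Return the items that appear in only one of the two collections."""
    return set(a) ^ set(b)


# Hypothetical stand-ins: the same QName held as two different value types
first = [("prov:type", "a"), ("prov:type", ("Literal", "ex:abc", "xsd:QName"))]
second = [("prov:type", "a"), ("prov:type", ("XSDQName", "ex:abc"))]

# Only the two differing representations of ex:abc are reported
for item in sorted(mismatched(first, second), key=repr):
    print(item)
```

Because the shared attributes cancel out, only the values that fail to compare equal remain, which narrows the search to the QName handling.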

* upstream/master:
  Fixed: PROV-N representation for  xsd:dateTime (closed trungdong#58)
  Fixed: Unintended merging of Identifier and QualifiedName values
* upstream/master:
  fix: formal attributes were not being included in all attributes
* upstream/master:
  Fixed trungdong#60 but no need to touch ProvRecord.formal_attributes (as per trungdong#61)
@cmaumet
Contributor

cmaumet commented Dec 4, 2014

Hi @satra. You will find below an example in which the serialization to rdf adds extra (unwanted) qualified relations:

from prov.model import ProvDocument
# NIIRI is expected to come from this star import
from exporter.objects.constants import *

if __name__ == '__main__':
    doc = ProvDocument()

    activity_id = NIIRI["activity"]
    doc.activity(activity_id)

    entity_1 = NIIRI["entity_1"]
    doc.entity(entity_1)

    entity_2 = NIIRI["entity_2"]
    doc.entity(entity_2)

    doc.used(activity_id, entity_1)
    doc.wasGeneratedBy(entity_1, activity_id)
    doc.wasDerivedFrom(entity_1, entity_1)

    ttl_file = "example.ttl"
    # Use a context manager so the file is closed after writing
    with open(ttl_file, 'w') as ttl_fid:
        ttl_fid.write(doc.serialize(format='rdf'))

Obtained turtle export:

@prefix niiri:  .
@prefix prov:  .
@prefix rdf:  .
@prefix rdfs:  .
@prefix xml:  .
@prefix xsd:  .
niiri:entity_2 a prov:Entity .
niiri:activity a prov:Activity ;
    prov:qualifiedUsage [ a prov:Usage ;
            prov:entity niiri:entity_1 ] ;
    prov:used niiri:entity_1 .
niiri:entity_1 a prov:Entity ;
    prov:qualifiedDerivation [ a prov:Derivation ;
            prov:usedEntity niiri:entity_1 ] ;
    prov:qualifiedGeneration [ a prov:Generation ;
            prov:activity niiri:activity ] ;
    prov:wasDerivedFrom niiri:entity_1 ;
    prov:wasGeneratedBy niiri:activity .

Unfortunately, I did not find the fix...

I hope this example is useful. Let me know if I can help you to track this down!

@satra
Contributor Author

satra commented Dec 4, 2014

@cmaumet - should entity_1 be derived from entity_1?

@satra
Contributor Author

satra commented Dec 4, 2014

also the qualified relations aren't unwanted - that's how the representation for derivation is intended to be.

a wasDerivedFrom is a relationship, i.e. an edge between two nodes. the qualified derivation allows describing properties of that edge.

this is partly what makes the deserialization difficult.
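
The edge-plus-qualified-node pattern described here can be sketched with plain tuples. This is an illustration only, not the serializer from this pull request: `derivation_triples` and the blank-node labels are hypothetical, and the property names simply mirror the Turtle export quoted earlier in the thread.

```python
from itertools import count

_blank_ids = count(1)  # simple source of blank-node labels for this sketch


def derivation_triples(generated, used, attributes=None):
    """Return the triples for one derivation: the edge and its qualified node."""
    node = "_:b%d" % next(_blank_ids)
    triples = [
        (generated, "prov:wasDerivedFrom", used),       # the direct edge
        (generated, "prov:qualifiedDerivation", node),  # node that stands for the edge
        (node, "rdf:type", "prov:Derivation"),
        (node, "prov:usedEntity", used),                # name as shown in the export above
    ]
    # Properties of the edge hang off the qualified node
    for name, value in sorted((attributes or {}).items()):
        triples.append((node, name, value))
    return triples


for triple in derivation_triples("ex:e2", "ex:e1", {"rdfs:label": "edited by hand"}):
    print(triple)
```

When reading such a graph back, the direct edge and the qualified node describe the same relation, so a reader has to recognise the pair and merge them into one record instead of creating two.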

@cmaumet
Contributor

cmaumet commented Dec 4, 2014

thank you

@satra
Contributor Author

satra commented Mar 19, 2016

a few more things to finalize:

  • support bundles (via trig)
  • make the tests run on travis
  • check py3 support
  • settle an issue with Decimal representation (see Literal comparison for Decimal values #77)
  • skip scruffy round trip tests for now - just ensure they can be read without error for the moment.
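
The thread does not spell out what goes wrong in #77. As a general illustration of why Decimal values can be awkward to compare after a round trip, the sketch below assumes the value travels as a lexical string; `same_value` and `same_lexical_form` are hypothetical helpers, not part of prov or rdflib.

```python
from decimal import Decimal


def same_value(a, b):
    """Compare two lexical forms by their numeric value."""
    return Decimal(a) == Decimal(b)


def same_lexical_form(a, b):
    """Compare two lexical forms by how Decimal writes them back out."""
    return str(Decimal(a)) == str(Decimal(b))


print(same_value("1.0", "1.00"))         # True: numerically equal
print(same_lexical_form("1.0", "1.00"))  # False: trailing zeros are preserved
print(Decimal("0.1") == Decimal(0.1))    # False: the float 0.1 is not exactly 0.1
```

Whether a round-trip test should treat "1.0" and "1.00" as equal is a design choice that the comparison code has to make explicitly.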

@satra
Contributor Author

satra commented Mar 20, 2016

these tests fail round trip - need to figure out a way to skip.

FAIL: test_scruffy_end_2 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_end_3 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_end_4 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_generation_2 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_invalidation_2 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_start_2 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_start_3 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_start_4 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_usage_2 (prov.tests.test_rdf.RoundTripRDFTests)
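
One possible way to keep known-failing tests from breaking the run is the standard `unittest.skip` or `unittest.expectedFailure` decorators. The sketch below is an assumption about how that could look, not what this pull request ended up doing; the class and test bodies are placeholders.

```python
import io
import unittest


class RoundTripSketchTests(unittest.TestCase):
    """Placeholder tests standing in for the real round-trip suite."""

    @unittest.skip("scruffy round trip not supported yet")
    def test_scruffy_end_2(self):
        self.fail("never runs while the skip decorator is present")

    @unittest.expectedFailure
    def test_scruffy_start_2(self):
        # Runs, but a failure is recorded as expected instead of as an error
        self.assertEqual(1, 2)

    def test_plain_round_trip(self):
        self.assertEqual({1, 2}, {2, 1})


def run_sketch():
    """Run the placeholder tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RoundTripSketchTests)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


result = run_sketch()
print(len(result.skipped), len(result.expectedFailures), result.wasSuccessful())
```

`skip` hides the test entirely, while `expectedFailure` still runs it and reports an unexpected success once the underlying problem is fixed, which makes it a useful reminder to remove the marker.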

@satra
Contributor Author

satra commented Mar 20, 2016

@trungdong - some of the issues here are unfortunately due to rdflib interactions. but this is good for review. i still need to figure out the python 3 errors (again an interaction with rdflib!) and how to suppress the failing scruffy tests.

* fix/literal:
  fix: extra curly bracket
  fix: test setup
  fix: only escape triple quotes in the triple quote case
  fix: string representation containing double quotes or triple quotes - closes trungdong#79
@coveralls

coveralls commented Jun 27, 2016

Coverage Status

Coverage decreased (-1.3%) to 89.442% when pulling 1debd4b on satra:enh/rdf-1.x into a556fa7 on trungdong:master.

* upstream/master:
  Remove networkx versioning also in setup.py
  Relaxed networkx requirement. Closed trungdong#84.
  Fix deprecated usage of cgi.escape since Python 3.3
@coveralls

coveralls commented Oct 8, 2016

Coverage Status

Coverage decreased (-1.5%) to 89.207% when pulling 1f8955e on satra:enh/rdf-1.x into 5ddeade on trungdong:master.

@coveralls

Coverage Status

Coverage decreased (-0.4%) to 90.311% when pulling 5a29a1b on satra:enh/rdf-1.x into 5ddeade on trungdong:master.

@satra satra changed the title from "WIP: Add rdf serialization" to "[MRG]: Add rdf serialization" Oct 8, 2016
@satra
Contributor Author

satra commented Oct 8, 2016

@trungdong - finally got some time to fix and this is ready for merge :)

@satra
Contributor Author

satra commented Oct 8, 2016

closes #1

@trungdong trungdong merged commit 4d2c236 into trungdong:master Oct 9, 2016
@trungdong
Owner

Excellent! Thank you very much @satra!!!
I'll try to get some time this week to clean up the current master branch and will make a new release with RDF support soon.

cmaumet pushed a commit to cmaumet/nidmresults that referenced this pull request Oct 13, 2016