Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add raw json output to JsonLdExtractor #103

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

Granitosaurus
Copy link

Sometimes jsonld schema is prefered in raw json format rather than python dict - this PR implementes as_json kwarg bool to determine whether to return python dictionary or json string.

Some use cases:

1. Use alternative json library
2. Use json.loads kwargs
3. Use object hook functions

My personal use case is to strip away @ keys with object hook:

def strip_jsonld(data):
    return {k: v for k, v in data.items() if not k.startswith('@')}

data = json.loads(script, object_hook=strip_jsonld)

@codecov
Copy link

codecov bot commented Jan 25, 2019

Codecov Report

Merging #103 into master will decrease coverage by 0.32%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #103      +/-   ##
==========================================
- Coverage    87.3%   86.98%   -0.33%     
==========================================
  Files          11       11              
  Lines         457      461       +4     
  Branches       97       98       +1     
==========================================
+ Hits          399      401       +2     
- Misses         52       53       +1     
- Partials        6        7       +1
Impacted Files Coverage Δ
extruct/jsonld.py 100% <100%> (ø) ⬆️
extruct/rdfa.py 91.3% <60%> (-8.7%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ab5592...ff22c72. Read the comment docs.

Copy link
Member

@lopuhin lopuhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Granitosaurus I think this is a useful feature to have (although #69 still has it's place), left some comments below.

extruct/jsonld.py Outdated Show resolved Hide resolved
extruct/jsonld.py Outdated Show resolved Hide resolved
Copy link
Member

@lopuhin lopuhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Granitosaurus thanks 👍 Left a few minor comments. Also would be nice to mention this feature in the README.

extruct/jsonld.py Show resolved Hide resolved
extruct/jsonld.py Show resolved Hide resolved
granitosaurus added 2 commits January 29, 2019 01:39
@Granitosaurus
Copy link
Author

Also added this feature to RDFA extractor and updated readme with short examples.

@Gallaecio
Copy link
Member

This looks mostly ready. What about adding a test covering the usage of parse_json=False?

@Granitosaurus
Copy link
Author

Sorry that it took me so long to attend to this but the tests gave me a bit of an headache :D

Should be good to go now!

@Gallaecio
Copy link
Member

TestRDFa.test_wikipedia_xhtml_rdfa_no_prefix seems to be failing after your changes.

@Granitosaurus
Copy link
Author

TestRDFa.test_wikipedia_xhtml_rdfa_no_prefix seems to be failing after your changes.

It has been failing master for me too; I actually update the fixtures in this PR to prevent it failing but the test just seems to be flawed in some sense - it keeps either breakin on travis or on my machine locally. 🤷‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants