Merging two functionalities in grobid #935

Closed
Tanmay98 opened this issue Jul 20, 2022 · 4 comments
Labels
question There's no such thing as a stupid question

Comments

@Tanmay98

Right now, Grobid follows a cascade like this:
Segmentation -> Header, Fulltext, Reference segmenter

I would also like to use:
Custom segmentation -> Custom feature

Is it possible to combine both in one build?
What I mean is that when I run processFullText, Grobid follows its fixed hierarchy, but if I run processMyFeature, I want Grobid to follow a different, custom hierarchy like the one above.

In short, is it possible to have both of these separate cascades in a single build?

Thanks in advance!

@kermitt2
Owner

Hi @Tanmay98

Is "Custom feature" an existing sub-model of Grobid, or would you like to write your own?

The existing sub-models are constrained in terms of input/output, and not all of their outputs have a final serialization, that is, something to return. At the very least, a new result serialization (in XML or JSON) would be necessary.

If you want to add your own process at any stage of the processing hierarchy, some Java development for this new process is currently required. This is how the grobid modules listed here work: they introduce additional models applied after segmentation or fulltext, on certain relevant substructures.
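
To make the idea of two cascades in one build concrete, here is a minimal, self-contained sketch. It is not Grobid's API: `CascadeRegistry`, `stage`, and `demo` are hypothetical names, each stage only appends its name to a string, and the default cascade is flattened into a simple sequence for illustration. It only shows that one process can hold several independent stage orderings, each selected by its entry point.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these names belong to Grobid.
public class CascadeRegistry {

    // Entry point name -> ordered list of stages to apply.
    private final Map<String, List<UnaryOperator<String>>> cascades = new LinkedHashMap<>();

    // A placeholder stage that records its name; a real stage would run a model.
    public static UnaryOperator<String> stage(String name) {
        return input -> input + ">" + name;
    }

    public void register(String entryPoint, List<UnaryOperator<String>> stages) {
        cascades.put(entryPoint, List.copyOf(stages));
    }

    // Runs the cascade registered for the given entry point, stage by stage.
    public String run(String entryPoint, String input) {
        List<UnaryOperator<String>> stages = cascades.get(entryPoint);
        if (stages == null) {
            throw new IllegalArgumentException("Unknown entry point: " + entryPoint);
        }
        String current = input;
        for (UnaryOperator<String> s : stages) {
            current = s.apply(current);
        }
        return current;
    }

    // Registers the default-like cascade and a custom one side by side.
    public static CascadeRegistry demo() {
        CascadeRegistry registry = new CascadeRegistry();
        registry.register("processFullText", List.of(
                stage("segmentation"), stage("header"),
                stage("fulltext"), stage("referenceSegmenter")));
        registry.register("processMyFeature", List.of(
                stage("customSegmentation"), stage("customFeature")));
        return registry;
    }

    public static void main(String[] args) {
        CascadeRegistry registry = demo();
        System.out.println(registry.run("processFullText", "doc"));
        System.out.println(registry.run("processMyFeature", "doc"));
    }
}
```

In this sketch the two cascades share nothing except the registry, so adding a custom entry point does not change what the default one does. In a real module, each stage would wrap a trained model and the last stage would produce the serialized result.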

@kermitt2 kermitt2 added the question There's no such thing as a stupid question label Jul 20, 2022
@Tanmay98
Author

Tanmay98 commented Jul 20, 2022

Thank you for your quick response, @kermitt2!

Actually, no, the custom feature is not an existing sub-module of Grobid.

My concern is that I want two separate hierarchies to run. For example, I want to use the hierarchy that Grobid follows by default as well as my own custom hierarchy. Is that possible?

Also, I went through the grobid-dictionary sub-module. From that, I assumed that with Maven I would only be able to run the dictionary part, and not the default Grobid features, from a single server.
I am sorry, but I am new to Java, Maven, etc. (I know machine learning very well). Is it possible to run both the grobid-dictionary modules and the default Grobid modules from a single server? That is, if I run Maven or ./gradlew run, could I call both processFullText and processDictionary?

@Tanmay98
Author

Hi @kermitt2, my goal is to train models such as segmentation (for Grobid) and segmentation (for grobid-dictionary) from a single server run (./gradlew run).
So I tried to combine the grobid-dictionary modules and the Grobid modules into one pipeline.
I created the necessary files in grobid-core and grobid-trainer, and attached two different TEI formatters (one for grobid-dictionary and one for Grobid). Finally, I also changed the Gradle build file.
I was able to build the library successfully, but when I run ./gradlew train_dictionary_body_segmentation, I get the following errors:
[Screenshot 2022-07-26 at 11 10 13 AM: image not available]

Can you help me?

@kermitt2
Owner

Hello @Tanmay98 !

Apparently you need to load a property file specific to grobid-dictionaries and instantiate a GrobidDictionaryProperties object.

But I was not one of the developers of grobid-dictionaries. You will certainly get better help by asking in the grobid-dictionaries repository.
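
As a rough illustration of loading a module-specific property file next to the core one in the same process, here is a small sketch built only on `java.util.Properties`. `ModuleConfig` is a hypothetical stand-in, not the real `GrobidDictionaryProperties` class, and the property keys and values are invented for the example; the actual class and keys should be checked in the grobid-dictionaries sources.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: a stand-in for a module-specific settings holder.
public class ModuleConfig {

    private final Properties props = new Properties();

    // Loads key=value pairs from any Reader (a file reader in practice).
    public ModuleConfig(Reader source) throws IOException {
        props.load(source);
    }

    // Convenience factory for in-memory content, used here to keep the demo self-contained.
    public static ModuleConfig fromString(String content) {
        try {
            return new ModuleConfig(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fails fast when a required setting was never loaded.
    public String require(String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalStateException("Missing property: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        // Assumed keys, for illustration only.
        ModuleConfig core = fromString("core.home=grobid-home\n");
        ModuleConfig dictionary = fromString("dictionary.models=dictionary-models\n");

        // Both configurations are initialized before any trainer or parser runs.
        System.out.println(core.require("core.home"));
        System.out.println(dictionary.require("dictionary.models"));
    }
}
```

The point of the sketch is the ordering: each module's settings are loaded explicitly at startup, so a trainer that depends on the second configuration does not run against an uninitialized one.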
