
Lesson: Add attached files

Bess Sadler edited this page Jan 25, 2017 · 11 revisions

Goals

  • Attach file sub-resources to models
  • See where files are stored in Fedora objects and how to retrieve them

Explanation

So far, we've only added metadata to our objects. Now let's attach a file that has some content. For our BibliographicFileSet model, this could be an image of the bibliographic resource's cover or a PDF of the resource's content; for the PageFileSet model, it could be an image or PDF of a single page.

In this lesson, we'll attach a PDF of a single page.

Steps

Step 1: In the console, add a content file resource to the PageFileSet model

When we originally built the PageFileSet model, we added a property named text to hold the text of the page. If you have electronic versions of the pages, you will want to upload a file instead. This step shows how to attach a content file; later steps show how to create derivatives of that content file.

Because our PageFileSet model includes the behaviors of a file set, it is ready to have page content uploaded. Each file you want to upload goes into a separate file set. A file set is defined to hold one uploaded content file and any number of derivatives of that content, for example a thumbnail image file and a full text file. The following shows an example of uploading a content file.

require 'open-uri'
pf1 = PageFileSet.find('page-1')
=> #<PageFileSet id: "page-1", page_number: 1, text: "Once upon a midnight dreary...", head_id: nil, tail_id: nil>

file1 = open("https://github.com/projecthydra-labs/hydra-works/wiki/raven_files/TheRaven_page1.pdf","r")
=> #<Tempfile:/var/folders/cm/zq5vgsj946n5hws81m85h5fr0000gn/T/open-uri20150922-869-2uceq0>

Hydra::Works::UploadFileToFileSet.call(pf1, file1)
=> #<PageFileSet id: "page-1", page_number: 1, text: "Once upon a midnight dreary...", head_id: nil, tail_id: nil>

pf1.save
=> true

pf1.files
=> [#<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/a64557f8-1c74-4cf0-9d55-3acaebf98bc7" >]

NOTE: There are several ways to create a file that is acceptable to the UploadFileToFileSet service. See the documentation in the header of the service definition file for an exhaustive list. At the time of writing, the service documents the accepted file parameter as follows:

    # @param [IO,File,Rack::Multipart::UploadedFile, #read] object that will be the contents. If file responds to :mime_type or :original_name, those will be called to provide technical metadata.
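As a rough illustration of the parameter description above, the following sketch builds an in-memory object that responds to #read and also answers :mime_type and :original_name. The NamedContent class is hypothetical and not part of hydra-works, and the placeholder bytes are not a real PDF. Supplying the MIME type this way is one possible way to avoid having it default to a generic value, though that behavior should be checked against the service itself.

```ruby
require 'stringio'

# Hypothetical wrapper (not part of hydra-works): an IO-like object that
# responds to #read and also to :mime_type and :original_name, the two
# optional methods named in the service documentation.
class NamedContent < StringIO
  attr_reader :mime_type, :original_name

  def initialize(content, mime_type:, original_name:)
    super(content)
    @mime_type = mime_type
    @original_name = original_name
  end
end

# Placeholder bytes stand in for real file content.
content = NamedContent.new("%PDF-1.4 placeholder bytes",
                           mime_type: 'application/pdf',
                           original_name: 'TheRaven_page1.pdf')

# In the rails console, this object could then be passed to the service
# in the same way as file1 in the example above:
#   Hydra::Works::UploadFileToFileSet.call(pf1, content)
#   pf1.save
```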

If you want to upload a local file rather than one from a URL, you can issue the following commands:

pf1 = PageFileSet.find('page-1')
file1 = open("/path/to/a/local/file.pdf")
Hydra::Works::UploadFileToFileSet.call(pf1, file1)
pf1.save
pf1.files

Step 2: View the contents from Fedora

Copy the URL you get when you run pf1.files and paste it into your browser. You may need to enter your Fedora username and password (the defaults are fedoraAdmin/fedoraAdmin).
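If you prefer to retrieve the file from Ruby rather than a browser, a minimal sketch using the standard library's Net::HTTP with HTTP basic authentication might look like the following. The helper names build_fedora_request and fetch_fedora_file are hypothetical, the credentials are the tutorial's defaults, and the example URL ends in a made-up identifier; substitute the URL reported by pf1.files.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a GET request for a Fedora file URL with
# HTTP basic authentication. The default credentials are the ones this
# tutorial mentions.
def build_fedora_request(file_url, user: 'fedoraAdmin', password: 'fedoraAdmin')
  uri = URI.parse(file_url)
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password)
  [uri, request]
end

# Hypothetical helper: sends the request and returns the HTTP response.
# The response body holds the stored file content.
def fetch_fedora_file(file_url, **credentials)
  uri, request = build_fedora_request(file_url, **credentials)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

# Building the request does not contact the server.
# "example-file-id" is a placeholder, not a real identifier.
uri, request = build_fedora_request(
  "http://127.0.0.1:8983/fedora/rest/dev/page-1/files/example-file-id"
)
```

With Fedora running, fetch_fedora_file could be called with the real URL and the returned body written to a local .pdf file.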

NOTE: Some browsers will recognize that this is a PDF file and open it appropriately. Others may try to open it as text, in which case you will need to choose to open it with a PDF viewer such as Adobe Reader.

Step 3: Fix the MIME type set by GitHub

If you used open-uri to open the file directly from GitHub, then the MIME type on the file is incorrectly set to "application/octet-stream". We are going to change it to "application/pdf" before continuing with derivatives.

f1 = pf1.files.first
=> #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-31/files/2520d9d5-4631-4f20-9fce-f40eb1bd1095" >

f1.mime_type
=> "application/octet-stream"

f1.mime_type = 'application/pdf'
=> "application/pdf"

pf1.files.first.mime_type
=> "application/pdf"

Step 4: Generate standard derivatives

NOTE: This step could not be run successfully during the 1-25-2017 (Hydra 11) update to this tutorial.

Several dependencies must be installed before you can generate a thumbnail. See hydra-derivatives for the dependency list and other useful information on working with the hydra-derivatives gem.

Once the dependencies have been installed, type the following in the rails console to generate a thumbnail.

pf1.create_derivatives
=> [{:label=>:thumbnail, :format=>"jpg", :size=>"338x493", :object=>#<PageFileSet id: "page-1", head: [], tail: [], page_number: 1, text: "Once upon a midnight dreary...">}]

pf1.files
=> [#<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/50a56242-6ab0-4234-bfcb-b6321dfeec6f" >, 
    #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/d0fad131-50d3-4136-b670-9880c3a0e0f2" >]

pf1.thumbnail
=> #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/d0fad131-50d3-4136-b670-9880c3a0e0f2" >

NOTE: At the time of this writing, create_derivatives only creates a thumbnail for PDF files. To see what create_derivatives generates for various file types, see the #create_derivatives method in lib/hydra/works/models/concerns/file_set/derivatives.rb.

Step 5: View the thumbnail from Fedora

Copy the URL from pf1.thumbnail and paste it into your browser. You may need to enter your Fedora username and password (the defaults are fedoraAdmin/fedoraAdmin).

Step 6: Generate full text derivative

WARNING: It appears that the full text derivative service has been removed. TODO: Look into whether there is an alternate way to generate full text.

To generate the full text derivative, type the following in the rails console.

extracted_text = Hydra::Works::FullTextExtractionService.run(pf1)
=> # all the text for page 1

pf1.build_extracted_text
=> #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/7d761246-e65b-48d1-b6c3-0e9537cdf5f2" >

pf1.extracted_text.content = extracted_text
=> # all the text for page 1

pf1.save
=> true

pf1.extracted_text
=> #<Hydra::PCDM::File uri="http://127.0.0.1:8983/fedora/rest/dev/page-1/files/7d761246-e65b-48d1-b6c3-0e9537cdf5f2" >

NOTE: The process for generating derivatives is under review and will likely change such that all derivatives are generated through the hydra-derivatives gem.

Step 7: View the extracted text from Fedora

Copy the URL from pf1.extracted_text and paste it into your browser. You may need to enter your Fedora username and password (the defaults are fedoraAdmin/fedoraAdmin).

Step 8: Add files for other pages

If you like, this is a good time to use the same process to add the other pages of The Raven to the remaining page file sets.
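Repeating Step 1 for several pages could be sketched as a loop. The page_uploads helper below is hypothetical, and it assumes that the other page file sets use IDs of the form page-N and that the PDFs for the other pages follow the same naming pattern as TheRaven_page1.pdf; neither is confirmed by this tutorial, so adjust both to match your data. The page range is purely illustrative.

```ruby
# Base URL taken from the Step 1 example.
BASE_URL = "https://github.com/projecthydra-labs/hydra-works/wiki/raven_files"

# Hypothetical helper: pairs each page file set ID with the URL of its PDF.
# Assumes IDs like "page-2" and file names like "TheRaven_page2.pdf".
def page_uploads(page_numbers)
  page_numbers.map do |n|
    { id: "page-#{n}", url: "#{BASE_URL}/TheRaven_page#{n}.pdf" }
  end
end

# Illustrative range only; use the page numbers you actually created.
uploads = page_uploads(2..3)

# In the rails console, each pair could then be uploaded as in Step 1:
#   require 'open-uri'
#   uploads.each do |upload|
#     pfs = PageFileSet.find(upload[:id])
#     file = open(upload[:url], "r")
#     Hydra::Works::UploadFileToFileSet.call(pfs, file)
#     pfs.save
#   end
```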

Next Step

Proceed to BONUS Lesson: Generate Rails Scaffolding for Creating and Editing or explore other Dive into Hydra-Works tutorial bonus lessons.
