Issue 81: touch up XML some #82

Merged
merged 10 commits into main from issue_81
Feb 16, 2022

Contributor

@al-niessner al-niessner commented Jan 26, 2022

🗒️ Summary

Cleaned up more code. Requests are now handled more or less uniformly from the endpoint through to the returned result.

Update also appears to fix a bug identified here: NASA-PDS/pds-api#155

⚙️ Test Data and/or Report

$ curl --location --request GET 'http://localhost:8080/products/urn:nasa:pds:izenberg_pdart14_meap:document:ns_inst::1.0' --header 'Accept: application/pds4+xml'

♻️ Related Issues

Resolves NASA-PDS/pds-api#81
Resolves NASA-PDS/pds-api#73 (updating to the newest dataset exposed this, and fixing the pds4 blob fixed it as well)
Resolves NASA-PDS/pds-api#155
Resolves #456

@al-niessner al-niessner self-assigned this Jan 26, 2022
@al-niessner
Contributor Author

This now decodes/inflates the blob to a string of XML, but the serializer makes a mess of that string by substituting entities for all of the XML special characters, such as '<'.
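
For readers following along, the decode/inflate step could be sketched roughly as below. This is only an illustration: the class name `BlobCodec` is hypothetical, and the assumption that the blob is Base64 text wrapping zlib-deflated UTF-8 XML is mine, not something this PR confirms. The `encode` method exists only so the round trip can be exercised.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper. ASSUMPTION: blob = Base64(zlib-deflate(UTF-8 XML)).
public class BlobCodec {

    // Deflate and Base64-encode a label; used here only to build test input.
    static String encode(String xml) {
        Deflater deflater = new Deflater();
        deflater.setInput(xml.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Base64-decode, then inflate back to the original XML string.
    static String decode(String blob) {
        Inflater inflater = new Inflater();
        inflater.setInput(Base64.getDecoder().decode(blob));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read means the input is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported blob");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("blob is not zlib data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The decoded string still contains raw `<` and `>` characters, which is why handing it to a serializer as plain text gets it escaped.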

Worked on parsing out the <?xml> bits and inserting the pds4 blob as XML. Had it most of the way there when I noticed that I was always getting a plural result when asking for a singular one. Fixed that too. Now the XML blob is broken again, but only for pds4+xml.
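
To illustrate the difference between inserting the blob as text and inserting it as XML, here is a minimal sketch using only the JDK's DOM and transformer APIs. It is not the code in this PR (which goes through Jackson); the class `EmbedBlob`, the method `wrap`, and the bare `pds4` wrapper element are names made up for the example. Parsing the label first also takes care of the `<?xml ...?>` declaration, since the parser consumes it and it never reaches the wrapper document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration; not the PR's implementation.
public class EmbedBlob {

    // Wrap a decoded label in a <pds4> element, either as real nodes or as text.
    static String wrap(String labelXml, boolean asNodes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            Document doc = builder.newDocument();
            Element pds4 = doc.createElementNS(null, "pds4");
            doc.appendChild(pds4);

            if (asNodes) {
                // Parse the label and import its root element: markup stays markup.
                Document label = builder.parse(new InputSource(new StringReader(labelXml)));
                pds4.appendChild(doc.importNode(label.getDocumentElement(), true));
            } else {
                // Treat the label as character data: the serializer must escape it.
                pds4.setTextContent(labelXml);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not wrap label", e);
        }
    }
}
```

With `asNodes` set to `false` the result contains `&lt;Product_Document`, which is the mess described above; with `true` the label comes through as nested elements.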
@al-niessner
Contributor Author

@tloubrieu-jpl @jordanpadams

Can you look at the output for the XML types please? I am sure it is not correct, but I need to know which tags need to be fixed and how. The biggest problem is that the blob needs to be inserted as XML, and that is just causing the Jackson converter to go nuts. I have something working now, but it is less than perfect. I am not trying to reach perfect nor walk the perfect path, just to deliver XML that is good enough.

@tloubrieu-jpl
Member

tloubrieu-jpl commented Feb 3, 2022

Thanks @al-niessner

I looked at the new output and that looks good overall.

For the details, I tried to see how the xmlns prefix should be used and provided some notes on that. I also don't understand why the Pds4Metadata tag became the new name of meta.

Here are the details of my notes:
Using request:

curl --location --request GET 'http://localhost:8080/products' \
--header 'Accept: application/pds4+xml'
  1. I don't think we need the first `xmlns="http://pds.nasa.gov/pds4/pds/v1"`, since it relates to the pds4 schema, which is already in the pds4 tag below.

  1. pds4metadata
  <id>urn:nasa:pds:insight_rad:data_raw:hp3_rad_raw_00390_20200101_120222::1.0</id>
            <Pds4Metadata xmlns="">

The id should have a pds_api tag prefix since it relates to the pds_api.

I would keep the pds_api:meta tag as it is for pds4+json.

We should not have an empty xmlns attribute.

  1. Could we have the embedded data_files tag singular (without 's') ? I know it is not always easy with the existing marshallers or unmarshallers.
<data_files>
    <data_files>
  1. The pds4 decoded label looks good. The pds4 tag should be prefixed with pds_api.

Using request:

curl --location --request GET 'http://localhost:8080/products/urn:nasa:pds:insight_rad:data_raw:hp3_rad_raw_00390_20200101_120222::1.0' \
--header 'Accept: application/pds4+xml'
  1. I believe all this section
  <id>urn:nasa:pds:insight_rad:data_raw:hp3_rad_raw_00390_20200101_120222::1.0</id>
    <Pds4Metadata xmlns="">
        <node_name>PDS_ENG</node_name>
        <label_file>
            <file_name>hp3_rad_raw_00390_20200101_120222.xml</file_name>
            <file_ref>/data/data_raw/hp3_rad_raw_00390_20200101_120222.xml</file_ref>
            <creation_date>2020-06-02T17:58:34Z</creation_date>
            <file_size>23809</file_size>
            <md5_checksum>5317a6bbadca248da550780f37d531df</md5_checksum>
        </label_file>
        <data_files>
            <data_files>
                <file_name>hp3_rad_raw_00390_20200101_120222.tab</file_name>
                <file_ref>/data/data_raw/hp3_rad_raw_00390_20200101_120222.tab</file_ref>
                <creation_date>2020-06-01T11:36:37Z</creation_date>
                <file_size>21216107</file_size>
                <md5_checksum>c1f92f8eb97f437599545263ecf113dd</md5_checksum>
                <mime_type>text/plain</mime_type>
            </data_files>
        </data_files>
    </Pds4Metadata>

Should have a pds_api: prefix

Pds4Metadata tag name should be meta

  1. pds4 tag name should also have the pds_api: prefix

@jordanpadams
Member

jordanpadams commented Feb 3, 2022

@tloubrieu-jpl @al-niessner for this one, let's table it:

Could we have the embedded data_files tag singular (without 's') ? I know it is not always easy with the existing marshallers or unmarshallers.

I would like to create a new ticket to rename this section altogether to use the ops: namespace

@al-niessner
Contributor Author

@tloubrieu-jpl @jordanpadams

I left the xmlns in the <product> because I do not understand the reason to remove them. If the pds_api namespace is defined by xmlns:pds_api="http://pds.nasa.gov/api" xmlns="http://pds.nasa.gov/pds4/pds/v1" then it should remain with <product> and the ones in <Product_Document> are redundant but harmless.

The double usage is probably because the namespace is required when the blob is taken standalone. Therefore, if you want aesthetic results, then I would remove the ones from the decoded blob since they are technically the redundant definitions. Note: this will be a huge pain to actually do unless you can guarantee that they are all identical in all blobs.

@al-niessner
Contributor Author

al-niessner commented Feb 7, 2022

@tloubrieu-jpl

I added and modified tags as requested. Here is the result; I also pushed the changes, so feel free to run your own tests:

$ curl --location --request GET 'http://localhost:8080/products/urn:nasa:pds:izenberg_pdart14_meap:document:ns_inst::1.0' --header 'Accept: application/pds4+xml' | xmllint --format -
<?xml version="1.0"?>
<pds_api:product xmlns:pds_api="http://pds.nasa.gov/api" xmlns="http://pds.nasa.gov/pds4/pds/v1">
  <pds_api:id>urn:nasa:pds:izenberg_pdart14_meap:document:ns_inst::1.0</pds_api:id>
  <pds_api:meta>
    <node_name>PSA</node_name>
    <label_file>
      <file_name>ns_inst.xml</file_name>
      <file_ref>/var/local/harvest/archive/document/ns_inst.xml</file_ref>
      <creation_date>2022-01-24T20:08:23Z</creation_date>
      <file_size>3589</file_size>
      <md5_checksum>a8d09cca0a01728db50c15052c2736cf</md5_checksum>
    </label_file>
    <data_files>
      <data_files>
        <file_name>ns_inst.pdf</file_name>
        <file_ref>/var/local/harvest/archive/document/ns_inst.pdf</file_ref>
        <creation_date>2022-01-24T20:08:23Z</creation_date>
        <file_size>138172</file_size>
        <md5_checksum>8103f20c13a3c321dac4a193aba19d16</md5_checksum>
        <mime_type>application/pdf</mime_type>
      </data_files>
    </data_files>
  </pds_api:meta>
  <pds_api:pds4>
    <Product_Document xmlns="http://pds.nasa.gov/pds4/pds/v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pds.nasa.gov/pds4/pds/v1 http://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1E00.xsd">
      <Identification_Area>
        <logical_identifier>urn:nasa:pds:izenberg_pdart14_meap:document:ns_inst</logical_identifier>
        <version_id>1.0</version_id>
        <title>MESSENGER Neutron Spectrometer (NS) Description</title>
        <information_model_version>1.14.0.0</information_model_version>
        <product_class>Product_Document</product_class>
        <Citation_Information>
          <publication_year>2016</publication_year>
          <description>Description of the MESSENGER Neutron Spectrometer (NS).</description>
        </Citation_Information>
        <Modification_History>
          <Modification_Detail>
            <modification_date>2016-03-16</modification_date>
            <version_id>1.0</version_id>
            <description>Initial PDS4 version.</description>
          </Modification_Detail>
        </Modification_History>
      </Identification_Area>
      <Context_Area>
        <Investigation_Area>
          <name>MESSENGER</name>
          <type>Mission</type>
          <Internal_Reference>
            <lid_reference>urn:nasa:pds:context:investigation:mission.messenger</lid_reference>
            <reference_type>document_to_investigation</reference_type>
          </Internal_Reference>
        </Investigation_Area>
        <Observing_System>
          <name>MESSENGER</name>
          <Observing_System_Component>
            <name>MESSENGER</name>
            <type>Host</type>
            <Internal_Reference>
              <lid_reference>urn:nasa:pds:context:instrument_host:spacecraft.mess</lid_reference>
              <reference_type>is_instrument_host</reference_type>
            </Internal_Reference>
          </Observing_System_Component>
          <Observing_System_Component>
            <name>NS</name>
            <type>Instrument</type>
            <Internal_Reference>
              <lid_reference>urn:nasa:pds:context:instrument:ns.mess</lid_reference>
              <reference_type>is_instrument</reference_type>
            </Internal_Reference>
          </Observing_System_Component>
        </Observing_System>
        <Target_Identification>
          <name>Mercury</name>
          <type>Planet</type>
          <Internal_Reference>
            <lid_reference>urn:nasa:pds:context:target:planet.mercury</lid_reference>
            <reference_type>document_to_target</reference_type>
          </Internal_Reference>
        </Target_Identification>
      </Context_Area>
      <Document>
        <publication_date>2016-03-16</publication_date>
        <description>Description of the MESSENGER Neutron Spectrometer (NS).</description>
        <Document_Edition>
          <edition_name>MESSENGER NS Instrument Description</edition_name>
          <language>English</language>
          <files>1</files>
          <Document_File>
            <file_name>ns_inst.pdf</file_name>
            <document_standard_id>PDF/A</document_standard_id>
          </Document_File>
        </Document_Edition>
      </Document>
    </Product_Document>
  </pds_api:pds4>
</pds_api:product>

@jordanpadams
Member

@al-niessner this should be revised:

<pds_api:product xmlns:pds_api="http://pds.nasa.gov/api" xmlns="http://pds.nasa.gov/pds4/pds/v1">

should be:

<pds_api:product xmlns:pds_api="http://pds.nasa.gov/api">

This namespace http://pds.nasa.gov/pds4/pds/v1 should not apply at the top-level API response. As you noted, having this at each individual product as it comes back from the blob is fine.
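
The effect of that top-level default namespace can be checked with a namespace-aware parser. The sketch below is only an illustration (the class `DefaultNamespaceScope` and its method are hypothetical, and the XML snippets are trimmed-down versions of the responses in this thread): it reports which namespace an element ends up in. With `xmlns="http://pds.nasa.gov/pds4/pds/v1"` on `pds_api:product`, every unprefixed child such as `node_name` silently lands in the PDS4 namespace; without it, those children are in no namespace at all. An `xmlns=""` attribute, like the one seen earlier on `Pds4Metadata`, is how XML undeclares an inherited default namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration of default-namespace scoping.
public class DefaultNamespaceScope {

    // Namespace URI of the first element with the given local name (null = no namespace).
    static String namespaceOf(String xml, String localName) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        Node hit = find(doc.getDocumentElement(), localName);
        if (hit == null) {
            throw new IllegalArgumentException("no element named " + localName);
        }
        return hit.getNamespaceURI();
    }

    // Depth-first search by local name, ignoring prefixes.
    private static Node find(Node node, String localName) {
        if (node.getNodeType() == Node.ELEMENT_NODE && localName.equals(node.getLocalName())) {
            return node;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            Node hit = find(child, localName);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }
}
```

Declaring the PDS4 namespace again on `Product_Document` inside the blob gives the same result for the label's own elements either way, which matches the "redundant but harmless" observation above.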

@al-niessner
Contributor Author

al-niessner commented Feb 7, 2022

@tloubrieu-jpl @jordanpadams

Removed default namespace.

$ curl --location --request GET 'http://localhost:8080/products/urn:nasa:pds:izenberg_pdart14_meap:document:ns_inst::1.0' --header 'Accept: application/pds4+xml' | xmllint --format -
<?xml version="1.0"?>
<pds_api:product xmlns:pds_api="http://pds.nasa.gov/api">
  <pds_api:id>urn:nasa:pds:izenberg_pdart14_meap:document:ns_inst::1.0</pds_api:id>
  <pds_api:meta>
    <node_name>PSA</node_name>
    <label_file>
      <file_name>ns_inst.xml</file_name>
      <file_ref>/var/local/harvest/archive/document/ns_inst.xml</file_ref>
      <creation_date>2022-01-24T20:08:23Z</creation_date>
      <file_size>3589</file_size>
      <md5_checksum>a8d09cca0a01728db50c15052c2736cf</md5_checksum>
    </label_file>
    <data_files>
      <data_files>
        <file_name>ns_inst.pdf</file_name>
        <file_ref>/var/local/harvest/archive/document/ns_inst.pdf</file_ref>
        <creation_date>2022-01-24T20:08:23Z</creation_date>
        <file_size>138172</file_size>
        <md5_checksum>8103f20c13a3c321dac4a193aba19d16</md5_checksum>
        <mime_type>application/pdf</mime_type>
      </data_files>
    </data_files>
  </pds_api:meta>
  <pds_api:pds4>
    <Product_Document xmlns="http://pds.nasa.gov/pds4/pds/v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pds.nasa.gov/pds4/pds/v1 http://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1E00.xsd">
      <Identification_Area>
        <logical_identifier>urn:nasa:pds:izenberg_pdart14_meap:document:ns_inst</logical_identifier>
        <version_id>1.0</version_id>
        <title>MESSENGER Neutron Spectrometer (NS) Description</title>
        <information_model_version>1.14.0.0</information_model_version>
        <product_class>Product_Document</product_class>
        <Citation_Information>
          <publication_year>2016</publication_year>
          <description>Description of the MESSENGER Neutron Spectrometer (NS).</description>
        </Citation_Information>
        <Modification_History>
          <Modification_Detail>
            <modification_date>2016-03-16</modification_date>
            <version_id>1.0</version_id>
            <description>Initial PDS4 version.</description>
          </Modification_Detail>
        </Modification_History>
      </Identification_Area>
      <Context_Area>
        <Investigation_Area>
          <name>MESSENGER</name>
          <type>Mission</type>
          <Internal_Reference>
            <lid_reference>urn:nasa:pds:context:investigation:mission.messenger</lid_reference>
            <reference_type>document_to_investigation</reference_type>
          </Internal_Reference>
        </Investigation_Area>
        <Observing_System>
          <name>MESSENGER</name>
          <Observing_System_Component>
            <name>MESSENGER</name>
            <type>Host</type>
            <Internal_Reference>
              <lid_reference>urn:nasa:pds:context:instrument_host:spacecraft.mess</lid_reference>
              <reference_type>is_instrument_host</reference_type>
            </Internal_Reference>
          </Observing_System_Component>
          <Observing_System_Component>
            <name>NS</name>
            <type>Instrument</type>
            <Internal_Reference>
              <lid_reference>urn:nasa:pds:context:instrument:ns.mess</lid_reference>
              <reference_type>is_instrument</reference_type>
            </Internal_Reference>
          </Observing_System_Component>
        </Observing_System>
        <Target_Identification>
          <name>Mercury</name>
          <type>Planet</type>
          <Internal_Reference>
            <lid_reference>urn:nasa:pds:context:target:planet.mercury</lid_reference>
            <reference_type>document_to_target</reference_type>
          </Internal_Reference>
        </Target_Identification>
      </Context_Area>
      <Document>
        <publication_date>2016-03-16</publication_date>
        <description>Description of the MESSENGER Neutron Spectrometer (NS).</description>
        <Document_Edition>
          <edition_name>MESSENGER NS Instrument Description</edition_name>
          <language>English</language>
          <files>1</files>
          <Document_File>
            <file_name>ns_inst.pdf</file_name>
            <document_standard_id>PDF/A</document_standard_id>
          </Document_File>
        </Document_Edition>
      </Document>
    </Product_Document>
  </pds_api:pds4>
</pds_api:product>

@tloubrieu-jpl
Member

Thanks @al-niessner,

I am seeing that the meta content is still without the pds_api prefix. Is it complicated to update?
Anyway, I will merge it as-is and add a note in @jordanpadams's ticket for this section (NASA-PDS/pds-api#154)

@al-niessner
Contributor Author

al-niessner commented Feb 16, 2022

Thanks @al-niessner ,

I am seeing that the meta content is still without the pds_api prefix. Is it complicated to update ? Anyway I will merge it as-is and add a note in @jordanpadams ticket for this section (NASA-PDS/pds-api#154)

@tloubrieu-jpl

I thought the answer would be yes but it is no. I have nothing else going today so I will just keep working at it but it will not be done before lunch.

The Jackson XmlMapper is being used to convert that part, and it does not obviously accept a namespace qualifier. It must allow it somewhere, so I will keep looking.

--- more

Just googled about, and Jackson wants '@' annotations to add the namespace. However, the bean being translated is part of the code generated by Swagger. I am not sure I can add them via Swagger, so I am looking for a more direct method.
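
One possible "more direct" route, sketched here purely as an assumption and not as what this PR ends up doing, is to leave the Swagger-generated bean alone and post-process the serialized XML with the JDK's DOM API: walk the tree and rename every element that is in no namespace into the `pds_api` namespace via `Document.renameNode`. The class `QualifyMeta` and its methods are hypothetical names. Elements that already carry a namespace, such as the PDS4 label under `pds_api:pds4`, are left untouched.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical post-processing step; an alternative to annotating the generated bean.
public class QualifyMeta {
    static final String API_NS = "http://pds.nasa.gov/api";
    static final String PREFIX = "pds_api";

    // Move every element that is in no namespace into API_NS with the pds_api prefix.
    static void qualify(Document doc, Node node) {
        // Snapshot the children first, since renameNode may replace nodes.
        List<Node> children = new ArrayList<>();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            children.add(c);
        }
        for (Node c : children) {
            qualify(doc, c);
        }
        if (node.getNodeType() == Node.ELEMENT_NODE && node.getNamespaceURI() == null) {
            doc.renameNode(node, API_NS, PREFIX + ":" + node.getLocalName());
        }
    }

    // Parse, qualify, and list the element names in document order.
    static List<String> namesAfterQualify(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        qualify(doc, doc.getDocumentElement());
        List<String> names = new ArrayList<>();
        collect(doc.getDocumentElement(), names);
        return names;
    }

    private static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, names);
        }
    }
}
```

This only works once the top-level default namespace has been removed, because otherwise the meta children would already be in the PDS4 namespace and would be skipped. If staying inside Jackson is preferred, its mix-in annotations may be another way to attach namespace annotations without editing generated classes, though that is untested here.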

@tloubrieu-jpl tloubrieu-jpl merged commit 45317d6 into main Feb 16, 2022
@tloubrieu-jpl tloubrieu-jpl deleted the issue_81 branch February 16, 2022 20:08