Submissions

Team MiBi used LLM-based retrieval-augmented generation to answer questions in task 12b. For document retrieval, the team either queried the PubMed search API or retrieved from an Elasticsearch index of the PubMed 2024 baseline, filtering by publication type and using BM25 scoring over the article's title, abstract, and MeSH terms. Snippets were extracted either by GPT-3.5 chain-of-thought few-shot prompting or heuristically, by taking the title and chunks of up to three sentences from the abstract and re-ranking them. For snippet re-ranking, team MiBi used pre-trained bi-encoders, cross-encoders, and lexical BM25 scoring. Answers were generated by zero-shot prompting GPT-3.5, GPT-4, or Mixtral-8x7B. The team also used the DSPy LLM programming framework for some runs but did not use DSPy to tune the prompts. Besides the common retrieve-then-generate paradigm, team MiBi also submitted runs following variants of the generate-then-retrieve paradigm, expanding the query with a generated answer before retrieving and re-ranking either documents or snippets. Further, in one approach, DSPy was used to let the LLM (Mixtral-8x7B) decide the order in which the answer was formed (document retrieval, snippet extraction and ranking, exact answer generation, and ideal answer generation).
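
The Batch 1 Phase A runs (detailed in the list below) retrieve candidate documents by querying the PubMed search API with the question text and progressively shortening the query until results are returned. The following is a minimal sketch of that retrieval step, assuming the NCBI E-utilities esearch endpoint and an alternating last-then-first term-dropping order; the exact request parameters and client code used by the team may differ.

```python
import requests

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def search_pubmed(query: str, retmax: int = 200) -> list[str]:
    """Return up to `retmax` PMIDs for a free-text PubMed query."""
    response = requests.get(ESEARCH_URL, params={
        "db": "pubmed",
        "term": query,
        "retmax": retmax,
        "retmode": "json",
    })
    response.raise_for_status()
    return response.json()["esearchresult"]["idlist"]


def retrieve_documents(question: str, retmax: int = 200) -> list[str]:
    """Query PubMed with the question text; if the result set is empty,
    drop the last query term, then the first, until results are found."""
    terms = question.split()
    drop_last = True  # alternating order is an assumption from the run description
    while terms:
        pmids = search_pubmed(" ".join(terms), retmax=retmax)
        if pmids:
            return pmids
        terms = terms[:-1] if drop_last else terms[1:]
        drop_last = not drop_last
    return []


# Example usage:
# pmids = retrieve_documents("Is dupilumab effective for treating atopic dermatitis?")
```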

Individual Runs

The following list briefly describes which approaches were submitted to BioASQ in each batch and phase:

  • Batch 1:
    • Phase A:
      • mibi_rag_abstract:
        • Retrieve 200 documents with the PubMed API (using the question text as the query; if the result set is empty, drop the last query term, then the first, until results are returned).
        • Re-rank with BM25 (k1=1.5, b=0.75, epsilon=0.25, concatenated title and abstract, NLTK tokenizer, excluding NLTK stopwords and punctuation, case-insensitive); a sketch of the full re-ranking cascade follows the run list.
        • Cut-off at top-50.
        • Re-rank with cross-encoder (cross-encoder/msmarco-MiniLM-L6-en-de-v1, original question, only abstract).
        • Cut-off at top-25.
        • Re-rank with bi-encoder (sentence-transformers/all-mpnet-base-v2, original question, only abstract).
        • Cut-off at top-10.
      • mibi_rag_snippet:
        • Retrieve documents the same as mibi_rag_abstract.
        • Generate snippets with GPT-3.5 (turbo; temperature and max tokens not specified) chain-of-thought few-shot prompting (prompt).
    • Phase A+:
      • mibi_rag_abstract:
        • Retrieve documents the same as mibi_rag_abstract (Phase A).
        • Use top-3 abstracts concatenated as context for answer generation.
        • Generate an exact answer to the question with GPT-3.5 (turbo; temperature and max tokens not specified) zero-shot prompting (prompt), with 4 response models depending on the question type; a sketch of this answer-generation step follows the run list.
        • Generate an "ideal" (long-form) answer to the question with GPT-3.5 (turbo; temperature and max tokens not specified) zero-shot prompting (prompt), with 2 response models depending on the question type.
      • mibi_rag_snippet:
        • Retrieve documents and snippets the same as mibi_rag_snippet (Phase A).
        • Use all (top-10) snippets concatenated as context for answer generation.
        • Generate an exact answer to the question the same as mibi_rag_abstract.
        • Generate an "ideal" answer to the question the same as mibi_rag_abstract.
    • Phase B:
      • mibi_rag_abstract:
        • Re-rank the provided abstracts the same as mibi_rag_abstract (Phase A; cross-encoder and bi-encoder).
        • Use top-3 abstracts concatenated as context for answer generation.
        • Generate an exact answer to the question the same as mibi_rag_abstract (Phase A+).
        • Generate an "ideal" answer to the question the same as mibi_rag_abstract (Phase A+).
      • mibi_rag_snippet:
        • Use all (top-10) provided snippets concatenated as context for answer generation.
        • Generate an exact answer to the question the same as mibi_rag_snippet (Phase A+).
        • Generate an "ideal" answer to the question the same as mibi_rag_snippet (Phase A+).
  • Batch 2:
    • Phase A:
      • mibi_rag_abstract:
        • Index PubMed 2024 baseline in Elasticsearch (including metadata).
        • Retrieve 10 documents from Elasticsearch (consider only articles with an abstract, disallow 27 non-peer-reviewed publication types, match the question to the article's title and abstract, and match the medical entities (SciSpaCy) from the question to MeSH terms of the PubMed article); a sketch of this query follows the run list.
      • mibi_rag_snippet:
        • Retrieve documents the same as mibi_rag_abstract.
        • Extract snippets from the retrieved articles (the full title and up to 3 sentences from the abstract); a sketch of this snippet extraction and re-ranking follows the run list.
        • Re-rank the top-100 with the TAS-B bi-encoder (sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco).
        • Re-rank the top-5 with the duoT5 cross-encoder (castorini/duot5-base-msmarco).
    • Phase A+:
      • mibi_rag_abstract:
        • Retrieve documents the same as mibi_rag_abstract (Phase A).
        • Use top-3 abstracts concatenated as context for answer generation.
        • Generate an exact answer to the question the same as mibi_rag_abstract (Batch 1).
        • Generate an "ideal" answer to the question the same as mibi_rag_abstract (Batch 1).
      • mibi_rag_snippet:
        • Retrieve documents and snippets the same as mibi_rag_snippet (Phase A).
        • Use all (top-10) snippets concatenated as context for answer generation.
        • Generate an exact answer to the question the same as mibi_rag_snippet (Batch 1).
        • Generate an "ideal" answer to the question the same as mibi_rag_snippet (Batch 1).
    • Phase B:
      • mibi_rag_abstract:
        • Re-rank and generate answers the same as mibi_rag_abstract (Batch 1).
      • mibi_rag_snippet:
        • Generate answers the same as mibi_rag_snippet (Batch 1).
  • Batch 3:
    • Modules:
      • Documents:
        • Index PubMed 2024 baseline in Elasticsearch (including metadata).
        • If an exact answer is given, append it to the query (comma-separated).
        • If an ideal answer is given, append it to the query.
        • If snippets are given, de-passage them, i.e. map them back to their source documents and keep each document's best passage score (max-passage strategy).
        • If documents (directly or from snippets) are given, re-rank documents based on Elasticsearch retrieval score (consider only articles with an abstract, disallow 27 non-peer-reviewed publication types, match the question to the article's title and abstract, and match the medical entities (SciSpaCy) from the question to MeSH terms of the PubMed article).
        • Otherwise, retrieve 10 documents from Elasticsearch (same query strategy).
      • Snippets:
        • Index PubMed 2024 baseline in Elasticsearch (including metadata).
        • If an exact answer is given, append it to the query (comma-separated).
        • If an ideal answer is given, append it to the query.
        • If documents are given, extract snippets from the retrieved articles (the full title and up to 3 sentences from the abstract).
        • If snippets are given, append them to the extracted snippets, if any.
        • Re-rank snippets based on Elasticsearch document retrieval score (same query strategy as for documents).
        • Re-rank the top-100 with the TAS-B bi-encoder (sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco).
        • Re-rank the top-5 with the duoT5 cross-encoder (castorini/duot5-base-msmarco).
      • Exact answer:
        • Add the question text to the prompt.
        • If snippets are given, add each snippet's text to the prompt context.
        • If a (previous) exact answer is given, add the answer to the prompt context.
        • If an ideal answer is given, add the answer to the prompt context.
        • Generate an exact answer with DSPy's typed predictions (Mixtral-8x7B-Instruct-v0.1 from Blablador API, custom signature per question type, no prompt optimization); a sketch of this DSPy module follows the run list.
      • Ideal answer:
        • Add the question text to the prompt.
        • If snippets are given, add each snippet's text to the prompt context.
        • If an exact answer is given, add the answer to the prompt context.
        • If a (previous) ideal answer is given, add the answer to the prompt context.
        • Generate an ideal answer with DSPy's typed predictions (Mixtral-8x7B-Instruct-v0.1 from Blablador API, custom signature, no prompt optimization).
    • Phase A:
      • mibi_rag_snippet:
        • Use the modules defined above in this order: documents, snippets, exact answer, ideal answer (retrieve-then-generate).
      • mibi_rag_abstract:
        • Use the modules defined above in this order: exact answer, ideal answer, documents, snippets (generate-then-retrieve).
      • mibi_rag_3:
        • Use the modules defined above in this order: documents, snippets, exact answer, ideal answer, documents, snippets (retrieve-then-generate-then-retrieve).
      • mibi_rag_4:
        • Use the modules defined above in this order: exact answer, ideal answer, documents, snippets, exact answer, ideal answer (generate-then-retrieve-then-generate).
      • mibi_rag_5:
        • Use the modules defined above incrementally (each module must be run at least once, and the same module cannot run twice in a row).
        • Add the question text and question type to the prompt.
        • Add the history of previously run modules to the prompt (module and whether it was successful).
        • Add a readiness flag to the prompt (whether the question is fully answered yet).
        • Determine the next module to run with DSPy's typed predictions (Mixtral-8x7B-Instruct-v0.1 from Blablador API, custom signature, custom suggestions, no prompt optimization); a sketch of this module-selection loop follows the run list.
    • Phase A+:
      • mibi_rag_abstract:
        • Use the same modules in the same way as Phase A.
      • mibi_rag_snippet:
        • Use the same modules in the same way as Phase A.
      • mibi_rag_3:
        • Use the same modules in the same way as Phase A.
      • mibi_rag_4:
        • Use the same modules in the same way as Phase A.
      • mibi_rag_5:
        • Use the same modules in the same way as Phase A.
    • Phase B:
      • mibi_rag_abstract:
        • The same as for Batch 2, but the model is GPT-4.
      • mibi_rag_snippet:
        • The same as for Batch 2, but the model is GPT-4.
      • mibi_rag_3:
        • Use the same modules in the same way as mibi_rag_snippet of Phase A.
      • mibi_rag_4:
        • Use the same modules in the same way as mibi_rag_4 of Phase A.
      • mibi_rag_5:
        • Use the same modules in the same way as mibi_rag_5 of Phase A.
  • Batch 4:
    • Phase A:
      • mibi_rag_abstract:
        • Use the same modules in the same way as Batch 3.
      • mibi_rag_snippet:
        • Use the same modules in the same way as Batch 3.
      • mibi_rag_3:
        • Use the same modules in the same way as Batch 3.
      • mibi_rag_4:
        • Use the same modules in the same way as Batch 3.
      • mibi_rag_5:
        • Use the same modules in the same way as Batch 3.
    • Phase A+:
      • mibi_rag_abstract:
        • Use the same modules in the same way as Phase A.
      • mibi_rag_snippet:
        • Use the same modules in the same way as Phase A.
      • mibi_rag_3:
        • Use the same modules in the same way as Phase A.
      • mibi_rag_4:
        • Use the same modules in the same way as Phase A.
      • mibi_rag_5:
        • Use the same modules in the same way as Phase A.
    • Phase B:
      • mibi_rag_abstract:
        • Use the same modules in the same way as Phase A.
      • mibi_rag_snippet:
        • Use the same modules in the same way as Phase A.
      • mibi_rag_3:
        • Use the same modules in the same way as Phase A.
      • mibi_rag_4:
        • Use the same modules in the same way as Phase A.
      • mibi_rag_5:
        • Use the same modules in the same way as Phase A.
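
Implementation Sketches

The code sketches below illustrate individual steps of the runs listed above. They are minimal reconstructions based only on the descriptions in this README, not the team's actual code: model names, parameters, and cut-offs follow the run descriptions, while prompts, field names, and glue code are assumptions.

The Batch 1 Phase A document re-ranking cascade scores candidates with BM25 over the concatenated title and abstract, keeps the top-50, re-ranks the abstracts with a cross-encoder, keeps the top-25, re-ranks with a bi-encoder, and keeps the top-10. A sketch using rank_bm25, NLTK, and sentence-transformers:

```python
import string

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer, util

nltk.download("punkt")
nltk.download("punkt_tab")  # needed by word_tokenize on newer NLTK versions
nltk.download("stopwords")
_STOPWORDS = set(stopwords.words("english"))


def tokenize(text: str) -> list[str]:
    """Lowercase, tokenize with NLTK, and drop stopwords and punctuation."""
    return [
        token
        for token in word_tokenize(text.lower())
        if token not in _STOPWORDS and token not in string.punctuation
    ]


def rerank(question: str, articles: list[dict]) -> list[dict]:
    """Re-rank articles (dicts with 'title' and 'abstract') as in Batch 1 Phase A:
    BM25 -> top-50 -> cross-encoder -> top-25 -> bi-encoder -> top-10."""
    # 1) BM25 over the concatenated title and abstract.
    corpus = [tokenize(a["title"] + " " + a["abstract"]) for a in articles]
    bm25 = BM25Okapi(corpus, k1=1.5, b=0.75, epsilon=0.25)
    scores = bm25.get_scores(tokenize(question))
    ranking = sorted(zip(articles, scores), key=lambda x: x[1], reverse=True)
    top50 = [article for article, _ in ranking[:50]]

    # 2) Cross-encoder over the abstract only, keep the top-25.
    cross_encoder = CrossEncoder("cross-encoder/msmarco-MiniLM-L6-en-de-v1")
    ce_scores = cross_encoder.predict([(question, a["abstract"]) for a in top50])
    ranking = sorted(zip(top50, ce_scores), key=lambda x: x[1], reverse=True)
    top25 = [article for article, _ in ranking[:25]]

    # 3) Bi-encoder over the abstract only, keep the top-10.
    bi_encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
    query_embedding = bi_encoder.encode(question, convert_to_tensor=True)
    doc_embeddings = bi_encoder.encode(
        [a["abstract"] for a in top25], convert_to_tensor=True
    )
    be_scores = util.cos_sim(query_embedding, doc_embeddings)[0].tolist()
    ranking = sorted(zip(top25, be_scores), key=lambda x: x[1], reverse=True)
    return [article for article, _ in ranking[:10]]
```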
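
Answer generation in the Batch 1 and 2 runs concatenates the top-3 abstracts (or the top-10 snippets) as context and zero-shot prompts GPT-3.5, with one response model per question type. A sketch using the OpenAI Python client; the actual prompts and response models live in the repository, so the prompt text and the type-specific instructions below are assumptions:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def generate_exact_answer(question: str, question_type: str, abstracts: list[str]) -> str:
    """Zero-shot prompt GPT-3.5 with the top-3 abstracts as context.

    The real runs use one response model per question type (yesno, factoid,
    list, summary); this sketch only hints at that with a type-specific
    instruction."""
    context = "\n\n".join(abstracts[:3])
    instruction = {
        "yesno": "Answer with 'yes' or 'no' only.",
        "factoid": "Answer with a short phrase.",
        "list": "Answer with a comma-separated list of short phrases.",
        "summary": "Answer with one or two sentences.",
    }[question_type]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": (
                    f"Context:\n{context}\n\n"
                    f"Question: {question}\n\n"
                    f"{instruction}"
                ),
            }
        ],
    )
    return response.choices[0].message.content
```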
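
The Elasticsearch retrieval used from Batch 2 onward matches the question against the title and abstract, requires an abstract, excludes non-peer-reviewed publication types, and favours articles whose MeSH terms match medical entities extracted from the question with SciSpaCy. A rough sketch of such a query; the index name, field names, SciSpaCy model, and the shortened list of excluded publication types are assumptions (the runs use 27 types defined in the repository):

```python
import spacy
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed endpoint
nlp = spacy.load("en_core_sci_sm")           # assumed SciSpaCy model

# Placeholder; the runs exclude 27 non-peer-reviewed publication types.
EXCLUDED_PUBLICATION_TYPES = ["Comment", "Editorial", "Letter", "News"]


def retrieve_documents(question: str, size: int = 10) -> list[dict]:
    """Retrieve PubMed articles for a question from the Elasticsearch index."""
    entities = [ent.text for ent in nlp(question).ents]
    query = {
        "bool": {
            # Match the question to the article's title and abstract.
            "must": [
                {"multi_match": {"query": question, "fields": ["title", "abstract"]}}
            ],
            # Consider only articles that have an abstract.
            "filter": [{"exists": {"field": "abstract"}}],
            # Disallow non-peer-reviewed publication types.
            "must_not": [
                {"terms": {"publication_types": EXCLUDED_PUBLICATION_TYPES}}
            ],
            # Match medical entities from the question to MeSH terms.
            "should": [
                {"match": {"mesh_terms": entity}} for entity in entities
            ],
        }
    }
    response = es.search(index="pubmed-2024-baseline", query=query, size=size)
    return [hit["_source"] for hit in response["hits"]["hits"]]
```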
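
Heuristic snippet extraction (Batch 2 onward) builds candidates from the full title and chunks of up to three consecutive sentences from the abstract, then re-ranks the top-100 with the TAS-B bi-encoder and the top-5 with the duoT5 cross-encoder. The sketch below covers the chunking and the TAS-B stage with dot-product scoring; the pairwise duoT5 step is omitted. NLTK sentence splitting and loading the TAS-B checkpoint via sentence-transformers are assumptions:

```python
import nltk
from nltk.tokenize import sent_tokenize
from sentence_transformers import SentenceTransformer, util

nltk.download("punkt")
tas_b = SentenceTransformer("sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco")


def extract_snippets(article: dict, max_sentences: int = 3) -> list[str]:
    """Candidate snippets: the full title plus every window of up to
    three consecutive sentences from the abstract."""
    sentences = sent_tokenize(article["abstract"])
    snippets = [article["title"]]
    for start in range(len(sentences)):
        for length in range(1, max_sentences + 1):
            if start + length <= len(sentences):
                snippets.append(" ".join(sentences[start:start + length]))
    return snippets


def rerank_snippets(question: str, snippets: list[str], top_k: int = 100) -> list[str]:
    """Score snippets against the question with TAS-B (dot product)."""
    query_embedding = tas_b.encode(question, convert_to_tensor=True)
    snippet_embeddings = tas_b.encode(snippets, convert_to_tensor=True)
    scores = util.dot_score(query_embedding, snippet_embeddings)[0].tolist()
    ranking = sorted(zip(snippets, scores), key=lambda x: x[1], reverse=True)
    # A pairwise duoT5 re-ranking of the top-5 would follow here.
    return [snippet for snippet, _ in ranking[:top_k]]
```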
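
The Batch 3 and 4 exact- and ideal-answer modules use DSPy's typed predictions with Mixtral-8x7B-Instruct-v0.1 served through the Blablador API and no prompt optimization. A rough sketch of one such module for yes/no questions; the signature fields, the Blablador base URL, the model identifier, and the exact DSPy API (which differs between versions) are assumptions, and the real runs define a custom signature per question type:

```python
import dspy

# Mixtral-8x7B-Instruct-v0.1 via the (OpenAI-compatible) Blablador API.
# Model identifier and base URL are assumptions; older DSPy versions
# configure the model with dspy.OpenAI(...) instead of dspy.LM(...).
lm = dspy.LM(
    "openai/Mixtral-8x7B-Instruct-v0.1",
    api_base="https://api.helmholtz-blablador.fz-juelich.de/v1",  # assumed URL
    api_key="...",
)
dspy.settings.configure(lm=lm)


class YesNoExactAnswer(dspy.Signature):
    """Answer a biomedical yes/no question from the given context."""

    question: str = dspy.InputField()
    context: list[str] = dspy.InputField(desc="relevant snippets and prior answers")
    answer: bool = dspy.OutputField(desc="the exact yes/no answer")


# One typed predictor per question type; the real runs define a custom
# signature for yes/no, factoid, list, and summary questions.
exact_answer_yesno = dspy.TypedPredictor(YesNoExactAnswer)

# Example usage:
# prediction = exact_answer_yesno(
#     question="Is dupilumab effective for treating atopic dermatitis?",
#     context=["Dupilumab significantly improved signs and symptoms of ..."],
# )
# print(prediction.answer)
```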
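
The mibi_rag_5 runs let the LLM choose the next module at each step, given the question, the history of modules already run, and a readiness flag, under the constraints that every module runs at least once and no module runs twice in a row. A minimal sketch of that control loop, reusing the DSPy LM configuration from the previous sketch; the module names, signature fields, stopping condition, and the omission of DSPy's custom suggestions are assumptions:

```python
import dspy  # assumes dspy.settings.configure(lm=...) as in the previous sketch

MODULES = ["documents", "snippets", "exact_answer", "ideal_answer"]


class NextModule(dspy.Signature):
    """Choose which answering module to run next for a BioASQ question."""

    question: str = dspy.InputField()
    question_type: str = dspy.InputField(desc="yesno, factoid, list, or summary")
    history: list[str] = dspy.InputField(desc="modules already run, in order")
    ready: bool = dspy.InputField(desc="whether the question is fully answered yet")
    next_module: str = dspy.OutputField(desc="one of: " + ", ".join(MODULES))


choose_next_module = dspy.TypedPredictor(NextModule)


def run_incremental_pipeline(question: str, question_type: str, run_module, is_ready,
                             max_steps: int = 10) -> list[str]:
    """Run modules until every module ran at least once and the answer is ready.

    `run_module(name)` executes one module and `is_ready()` reports the
    readiness flag; both are placeholders for the actual pipeline state."""
    history: list[str] = []
    for _ in range(max_steps):
        if set(history) >= set(MODULES) and is_ready():
            break
        prediction = choose_next_module(
            question=question,
            question_type=question_type,
            history=history,
            ready=is_ready(),
        )
        next_module = prediction.next_module
        # Enforce the constraints: valid module name, no immediate repetition.
        if next_module not in MODULES or (history and next_module == history[-1]):
            next_module = next(m for m in MODULES if not history or m != history[-1])
        run_module(next_module)
        history.append(next_module)
    return history
```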