Description
The current Markdown chunking strategy in Verba includes the header text in each chunk (i.e., the header text is repeated across multiple chunks), but it only includes the top-level header; subsection headers are dropped. (An example is provided below in the repro steps.)
There are probably many ways to ensure that retrieval attends to information provided by headers, but assuming the current approach in the project is just to stick them into the chunk content, I take it the subheaders should be included too.
I should be able to cut a PR for this.
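One way to get subheaders into every chunk is to keep a stack of the headings currently in scope: when a heading of level n appears, pop every heading of level n or deeper, then push the new one. The following is a minimal, stdlib-only sketch of that idea, not Verba's actual `MarkdownChunker`. The function name `chunk_markdown_with_header_path` is hypothetical, and the sketch assumes ATX-style headings (`#` to `######`), one chunk per section, no size-based splitting, and no fenced code blocks containing lines that start with `#`.

```python
import re
from typing import List, Tuple

# ATX headings only: 1-6 '#' characters, whitespace, then the heading text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def chunk_markdown_with_header_path(markdown: str) -> List[str]:
    """Hypothetical helper: one chunk per section, each prefixed with the
    full path of enclosing headings (outermost first), one per line."""
    header_path: List[Tuple[int, str]] = []  # stack of (level, heading text)
    body: List[str] = []
    chunks: List[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:  # headings with no body of their own produce no chunk
            headings = [title for _, title in header_path]
            chunks.append("\n".join(headings + [text]))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match is None:
            body.append(line)
            continue
        flush()  # close the section that precedes this heading
        level = len(match.group(1))
        # Pop siblings (same level) and anything nested deeper than them.
        while header_path and header_path[-1][0] >= level:
            header_path.pop()
        header_path.append((level, match.group(2)))
    flush()  # close the final section
    return chunks


doc = "# Title\n## A\nalpha\n## B\n### B1\nbeta\n### B2\ngamma\n"
for chunk in chunk_markdown_with_header_path(doc):
    print(chunk, end="\n---\n")
```

Applied to the election document in the repro steps, a heading with no body of its own (such as `## Congressional Races`) yields no chunk but still appears in its children's paths, so the Senate and House sections would each be prefixed with `2024 U.S. Presidential Election Outcome`, `Congressional Races`, and their own subsection heading.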
Installation
- pip install goldenverba
- pip install from source
- Docker installation

If you installed via pip, please specify the version:
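Steps to Reproduce
I can reproduce this just through the Verba UI by importing a Markdown doc; I viewed the content of the resulting chunks using the `/v1/objects` API. I also cloned the repo and ran the chunker manually. Given the following Markdown document:
Markdown document with subsections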
# 2024 U.S. Presidential Election Outcome
## Overview
The 2024 U.S. presidential election concluded with former President Donald Trump defeating Vice President Kamala Harris, marking a historic return to the White House. This victory makes Trump the second president in U.S. history to serve two non-consecutive terms, following Grover Cleveland.
## Electoral Vote Breakdown
| Candidate | Electoral Votes |
|---------------------|-----------------|
| Donald Trump (R) | 295 |
| Kamala Harris (D) | 226 |

*Note: Some states were still uncalled at the time of reporting.*
## Key Battleground States
Trump secured victories in several pivotal swing states:
- **Georgia (16 electoral votes)**
- **North Carolina (16 electoral votes)**
- **Pennsylvania (19 electoral votes)**
- **Wisconsin (10 electoral votes)**
These wins were instrumental in surpassing the 270 electoral votes required for victory.
## Popular Vote
While the exact national popular vote totals were still being finalized, early reports indicated a closely contested race, with both candidates receiving substantial support across the country.
## Congressional Races
### Senate
Republicans regained control, achieving at least 52 seats.
### House of Representatives
Control remained undecided, with several competitive races still pending at the time of reporting.
## Notable Outcomes
### Historic Return
Donald Trump's victory marks a significant political comeback, making him the first former president since 1892 to be re-elected after a defeat.
### Democratic Response
Vice President Harris delivered a concession speech, urging her supporters to "not despair" and emphasizing the importance of continued political engagement.
### International Reactions
World leaders, including Egypt's president and Israel's prime minister, extended congratulations to President-elect Trump, expressing willingness to cooperate with the incoming administration.
## Conclusion
The 2024 election results signify a notable shift in the U.S. political landscape, with Donald Trump's return to the presidency and the Republican Party regaining control of the Senate. The final outcomes of the House races will further define the balance of power in the federal government.
The MarkdownChunker produces the following chunks:
Result
# Chunk 1
2024 U.S. Presidential Election Outcome
The 2024 U.S. presidential election concluded with former President Donald Trump defeating Vice President Kamala Harris, marking a historic return to the White House. This victory makes Trump the second president in U.S. history to serve two non-consecutive terms, following Grover Cleveland.
# Chunk 2
2024 U.S. Presidential Election Outcome
| Candidate | Electoral Votes |
|---------------------|-----------------|
| Donald Trump (R) | 295 |
| Kamala Harris (D) | 226 |
*Note: Some states were still uncalled at the time of reporting.*
# Chunk 3
2024 U.S. Presidential Election Outcome
Trump secured victories in several pivotal swing states:
- **Georgia (16 electoral votes)**
- **North Carolina (16 electoral votes)**
- **Pennsylvania (19 electoral votes)**
- **Wisconsin (10 electoral votes)**
These wins were instrumental in surpassing the 270 electoral votes required for victory.
# Chunk 4
2024 U.S. Presidential Election Outcome
While the exact national popular vote totals were still being finalized, early reports indicated a closely contested race, with both candidates receiving substantial support across the country.
# Chunk 5
2024 U.S. Presidential Election Outcome
Republicans regained control, achieving at least 52 seats.
# Chunk 6
2024 U.S. Presidential Election Outcome
Control remained undecided, with several competitive races still pending at the time of reporting.
# And so on...
I would expect:
Expected
# Chunk 1
2024 U.S. Presidential Election Outcome
Overview
The 2024 U.S. presidential election concluded with former President Donald Trump defeating Vice President Kamala Harris, marking a historic return to the White House. This victory makes Trump the second president in U.S. history to serve two non-consecutive terms, following Grover Cleveland.
# Chunk 2
2024 U.S. Presidential Election Outcome
Electoral Vote Breakdown
| Candidate | Electoral Votes |
|---------------------|-----------------|
| Donald Trump (R) | 295 |
| Kamala Harris (D) | 226 |
*Note: Some states were still uncalled at the time of reporting.*
# Chunk 3
2024 U.S. Presidential Election Outcome
Key Battleground States
Trump secured victories in several pivotal swing states:
- **Georgia (16 electoral votes)**
- **North Carolina (16 electoral votes)**
- **Pennsylvania (19 electoral votes)**
- **Wisconsin (10 electoral votes)**
These wins were instrumental in surpassing the 270 electoral votes required for victory.
# Chunk 4
2024 U.S. Presidential Election Outcome
Popular Vote
While the exact national popular vote totals were still being finalized, early reports indicated a closely contested race, with both candidates receiving substantial support across the country.
# Chunk 5
2024 U.S. Presidential Election Outcome
Congressional Races
Senate
Republicans regained control, achieving at least 52 seats.
# Chunk 6
2024 U.S. Presidential Election Outcome
Congressional Races
House of Representatives
Control remained undecided, with several competitive races still pending at the time of reporting.
# And so on...
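Once a fix lands, a check along these lines could guard against regressions. It is a hypothetical helper (`has_header_path` is not part of Verba) that assumes the layout shown in the Expected output: one heading per line, outermost first, followed by the chunk body. Applied to Chunk 5, the current output fails and the expected output passes.

```python
from typing import List


def has_header_path(chunk_text: str, header_path: List[str]) -> bool:
    """Return True if the chunk starts with every heading in header_path,
    one heading per line, in order."""
    lines = chunk_text.splitlines()
    return lines[: len(header_path)] == header_path


# Chunk 5 as currently produced (top-level header only) vs. as expected.
current_chunk_5 = (
    "2024 U.S. Presidential Election Outcome\n"
    "Republicans regained control, achieving at least 52 seats."
)
expected_chunk_5 = (
    "2024 U.S. Presidential Election Outcome\n"
    "Congressional Races\n"
    "Senate\n"
    "Republicans regained control, achieving at least 52 seats."
)
path = ["2024 U.S. Presidential Election Outcome", "Congressional Races", "Senate"]

print(has_header_path(current_chunk_5, path))   # False
print(has_header_path(expected_chunk_5, path))  # True
```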
Additional context
alexchao added a commit to alexchao/Verba that referenced this issue on Nov 8, 2024.
Weaviate Deployment
Configuration
Reader:
Chunker: MarkdownChunker
Embedder:
Retriever:
Generator: