Adds a LlamaPack that implements LongRAG #14916

Merged 36 commits on Jul 28, 2024
3c3bcea
add no_ssl option to Trafilatura web loader
jonathanhliu21 Jun 21, 2024
0e0bdd5
vbump
nerdai Jun 22, 2024
ebb2e97
Adds GoogleChatReader
jonathanhliu21 Jun 25, 2024
9dad03e
Merge branch 'main' of github.com:jonathanhliu21/llama_index
jonathanhliu21 Jun 25, 2024
07256a3
sync upstream
jonathanhliu21 Jun 25, 2024
ff252cf
add readme to google chat
jonathanhliu21 Jun 25, 2024
dc41cbc
Fixes KeyError when getting quoted message without a reference, remov…
jonathanhliu21 Jun 26, 2024
82bc222
Adds Google Chat README
jonathanhliu21 Jun 26, 2024
65d62ed
Adds Google Chat demo notebook
jonathanhliu21 Jun 26, 2024
7e35ab2
Merge branch 'run-llama:main' into main
jonathanhliu21 Jun 26, 2024
e268f65
vbump
jonathanhliu21 Jun 26, 2024
b321eb6
adds comments to code
jonathanhliu21 Jun 26, 2024
a667689
Merge branch 'run-llama:main' into main
jonathanhliu21 Jul 12, 2024
48e092d
adds quantization configuration to QdrantVectorStore
jonathanhliu21 Jul 12, 2024
48b782c
Merge branch 'run-llama:main' into main
jonathanhliu21 Jul 13, 2024
f7e8788
vbump
jonathanhliu21 Jul 13, 2024
6d51acc
Merge branch 'run-llama:main' into main
jonathanhliu21 Jul 15, 2024
07daed3
implements delete_nodes() and clear() for pinecone
jonathanhliu21 Jul 15, 2024
9de4bf6
Merge branch 'main' of github.com:jonathanhliu21/llama_index
jonathanhliu21 Jul 15, 2024
42ff701
adds postgres delete_nodes() and clear() implementations, adds tests
jonathanhliu21 Jul 16, 2024
99a77be
Merge branch 'run-llama:main' into main
jonathanhliu21 Jul 16, 2024
6d6151b
adds delete_nodes() and clear() methods to MilvusVectorStore
jonathanhliu21 Jul 16, 2024
dc1d2d6
Implements delete_nodes() and clear() for OpenSearchVectorStore
jonathanhliu21 Jul 17, 2024
6d29849
implements delete_nodes() and clear() for WeaviateVectorStore
jonathanhliu21 Jul 17, 2024
13832ad
Merge branch 'run-llama:main' into main
jonathanhliu21 Jul 17, 2024
0571317
vbump
jonathanhliu21 Jul 17, 2024
a36be62
remove print statement
jonathanhliu21 Jul 17, 2024
908ff5d
adds adelete_nodes() and aclear() to PGVectorStore
jonathanhliu21 Jul 19, 2024
f4a6cbe
Merge branch 'run-llama:main' into main
jonathanhliu21 Jul 19, 2024
f7f2452
implements long rag pack
jonathanhliu21 Jul 23, 2024
6721a69
adds notebook example and README
jonathanhliu21 Jul 23, 2024
6569946
Merge branch 'run-llama:main' into main
jonathanhliu21 Jul 23, 2024
9c0ac19
change BUILD files
jonathanhliu21 Jul 23, 2024
9f27dda
add requirements file
jonathanhliu21 Jul 23, 2024
6169791
adds BUILD to tests folder
jonathanhliu21 Jul 23, 2024
6b13ce0
store small nodes in VectorStoreIndex, simplify code
jonathanhliu21 Jul 24, 2024
153 changes: 153 additions & 0 deletions llama-index-packs/llama-index-packs-longrag/.gitignore
@@ -0,0 +1,153 @@
llama_index/_static
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
bin/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
etc/
include/
lib/
lib64/
parts/
sdist/
share/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
.ruff_cache

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints
notebooks/

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
pyvenv.cfg

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# Jetbrains
.idea
modules/
*.swp

# VsCode
.vscode

# pipenv
Pipfile
Pipfile.lock

# pyright
pyrightconfig.json
7 changes: 7 additions & 0 deletions llama-index-packs/llama-index-packs-longrag/BUILD
@@ -0,0 +1,7 @@
poetry_requirements(
    name="poetry",
)

python_requirements(
    name="reqs",
)
17 changes: 17 additions & 0 deletions llama-index-packs/llama-index-packs-longrag/Makefile
@@ -0,0 +1,17 @@
GIT_ROOT ?= $(shell git rev-parse --show-toplevel)

help: ## Show all Makefile targets.
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[33m%-30s\033[0m %s\n", $$1, $$2}'

format: ## Run code autoformatters (black).
	pre-commit install
	git ls-files | xargs pre-commit run black --files

lint: ## Run linters: pre-commit (black, ruff, codespell) and mypy
	pre-commit install && git ls-files | xargs pre-commit run --show-diff-on-failure --files

test: ## Run tests via pytest.
	pytest tests

watch-docs: ## Build and watch documentation.
	sphinx-autobuild docs/ docs/_build/html --open-browser --watch $(GIT_ROOT)/llama_index/
31 changes: 31 additions & 0 deletions llama-index-packs/llama-index-packs-longrag/README.md
@@ -0,0 +1,31 @@
# LlamaIndex Packs Integration: LongRAG

This LlamaPack implements LongRAG based on [this paper](https://arxiv.org/pdf/2406.15319).

LongRAG retrieves large chunks of text at a time, with each retrieval unit being ~6k tokens long and consisting of entire documents or groups of documents. This contrasts with the short retrieval units (100-word passages) of traditional RAG. LongRAG is advantageous because results can be achieved using only the top 4-8 retrieval units, and long-context LLMs can better understand the context of the documents because long retrieval units preserve their semantic integrity.
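
To make the idea of a long retrieval unit concrete, here is a minimal sketch of greedily packing documents into groups of roughly 6k tokens. It is an illustration only, not the implementation in this pack: the helper names `approx_token_count` and `group_into_long_units` are hypothetical, and token counts are approximated by whitespace-separated words rather than a real tokenizer.

```python
from typing import List


def approx_token_count(text: str) -> int:
    """Rough token count: whitespace-separated words (an assumption, not a real tokenizer)."""
    return len(text.split())


def group_into_long_units(docs: List[str], max_tokens: int = 6000) -> List[List[str]]:
    """Greedily pack consecutive documents into units of at most max_tokens.

    A single document larger than the budget is kept whole as its own unit.
    """
    units: List[List[str]] = []
    current: List[str] = []
    current_tokens = 0
    for doc in docs:
        n = approx_token_count(doc)
        # Close the current unit if adding this document would exceed the budget.
        if current and current_tokens + n > max_tokens:
            units.append(current)
            current, current_tokens = [], 0
        current.append(doc)
        current_tokens += n
    if current:
        units.append(current)
    return units


# Three 2,500-word documents and one 7,000-word document.
docs = ["a " * 2500, "b " * 2500, "c " * 2500, "d " * 7000]
units = group_into_long_units(docs)
print([len(u) for u in units])  # [2, 1, 1]
```

The first two documents fit in one unit, the third starts a new one, and the oversized fourth document stays intact instead of being split, which is the property the description above attributes to long retrieval units.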

## Installation

```
# installation
pip install llama-index-packs-longrag

# source code
llamaindex-cli download-llamapack LongRAGPack --download-dir ./longrag_pack
```

## Code Usage

```py
from llama_index.packs.longrag import LongRAGPack
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

Settings.llm = OpenAI("gpt-4o")

pack = LongRAGPack(data_dir="./data")

query_str = "How can Pittsburgh become a startup hub, and what are the two types of moderates?"
res = pack.run(query_str)
print(str(res))
```
@@ -0,0 +1,19 @@
Being a Noob

January 2020

When I was young, I thought old people had everything figured out. Now that I'm old, I know this isn't true.

I constantly feel like a noob. It seems like I'm always talking to some startup working in a new field I know nothing about, or reading a book about a topic I don't understand well enough, or visiting some new country where I don't know how things work.

It's not pleasant to feel like a noob. And the word "noob" is certainly not a compliment. And yet today I realized something encouraging about being a noob: the more of a noob you are locally, the less of a noob you are globally.

For example, if you stay in your home country, you'll feel less of a noob than if you move to Farawavia, where everything works differently. And yet you'll know more if you move. So the feeling of being a noob is inversely correlated with actual ignorance.

But if the feeling of being a noob is good for us, why do we dislike it? What evolutionary purpose could such an aversion serve?

I think the answer is that there are two sources of feeling like a noob: being stupid, and doing something novel. Our dislike of feeling like a noob is our brain telling us "Come on, come on, figure this out." Which was the right thing to be thinking for most of human history. The life of hunter-gatherers was complex, but it didn't change as much as life does now. They didn't suddenly have to figure out what to do about cryptocurrency. So it made sense to be biased toward competence at existing problems over the discovery of new ones. It made sense for humans to dislike the feeling of being a noob, just as, in a world where food was scarce, it made sense for them to dislike the feeling of being hungry.

Now that too much food is more of a problem than too little, our dislike of feeling hungry leads us astray. And I think our dislike of feeling like a noob does too.

Though it feels unpleasant, and people will sometimes ridicule you for it, the more you feel like a noob, the better.
@@ -0,0 +1,68 @@
The Four Quadrants of Conformism

July 2020

One of the most revealing ways to classify people is by the degree and aggressiveness of their conformism. Imagine a Cartesian coordinate system whose horizontal axis runs from conventional-minded on the left to independent-minded on the right, and whose vertical axis runs from passive at the bottom to aggressive at the top. The resulting four quadrants define four types of people. Starting in the upper left and going counter-clockwise: aggressively conventional-minded, passively conventional-minded, passively independent-minded, and aggressively independent-minded.

I think that you'll find all four types in most societies, and that which quadrant people fall into depends more on their own personality than the beliefs prevalent in their society. [1]

Young children offer some of the best evidence for both points. Anyone who's been to primary school has seen the four types, and the fact that school rules are so arbitrary is strong evidence that which quadrant people fall into depends more on them than the rules.

The kids in the upper left quadrant, the aggressively conventional-minded ones, are the tattletales. They believe not only that rules must be obeyed, but that those who disobey them must be punished.

The kids in the lower left quadrant, the passively conventional-minded, are the sheep. They're careful to obey the rules, but when other kids break them, their impulse is to worry that those kids will be punished, not to ensure that they will.

The kids in the lower right quadrant, the passively independent-minded, are the dreamy ones. They don't care much about rules and probably aren't 100% sure what the rules even are.

And the kids in the upper right quadrant, the aggressively independent-minded, are the naughty ones. When they see a rule, their first impulse is to question it. Merely being told what to do makes them inclined to do the opposite.

When measuring conformism, of course, you have to say with respect to what, and this changes as kids get older. For younger kids it's the rules set by adults. But as kids get older, the source of rules becomes their peers. So a pack of teenagers who all flout school rules in the same way are not independent-minded; rather the opposite.

In adulthood we can recognize the four types by their distinctive calls, much as you could recognize four species of birds. The call of the aggressively conventional-minded is "Crush <outgroup>!" (It's rather alarming to see an exclamation point after a variable, but that's the whole problem with the aggressively conventional-minded.) The call of the passively conventional-minded is "What will the neighbors think?" The call of the passively independent-minded is "To each his own." And the call of the aggressively independent-minded is "Eppur si muove."

The four types are not equally common. There are more passive people than aggressive ones, and far more conventional-minded people than independent-minded ones. So the passively conventional-minded are the largest group, and the aggressively independent-minded the smallest.

Since one's quadrant depends more on one's personality than the nature of the rules, most people would occupy the same quadrant even if they'd grown up in a quite different society.

Princeton professor Robert George recently wrote:
I sometimes ask students what their position on slavery would have been had they been white and living in the South before abolition. Guess what? They all would have been abolitionists! They all would have bravely spoken out against slavery, and worked tirelessly against it.
He's too polite to say so, but of course they wouldn't. And indeed, our default assumption should not merely be that his students would, on average, have behaved the same way people did at the time, but that the ones who are aggressively conventional-minded today would have been aggressively conventional-minded then too. In other words, that they'd not only not have fought against slavery, but that they'd have been among its staunchest defenders.

I'm biased, I admit, but it seems to me that aggressively conventional-minded people are responsible for a disproportionate amount of the trouble in the world, and that a lot of the customs we've evolved since the Enlightenment have been designed to protect the rest of us from them. In particular, the retirement of the concept of heresy and its replacement by the principle of freely debating all sorts of different ideas, even ones that are currently considered unacceptable, without any punishment for those who try them out to see if they work. [2]

Why do the independent-minded need to be protected, though? Because they have all the new ideas. To be a successful scientist, for example, it's not enough just to be right. You have to be right when everyone else is wrong. Conventional-minded people can't do that. For similar reasons, all successful startup CEOs are not merely independent-minded, but aggressively so. So it's no coincidence that societies prosper only to the extent that they have customs for keeping the conventional-minded at bay. [3]

In the last few years, many of us have noticed that the customs protecting free inquiry have been weakened. Some say we're overreacting — that they haven't been weakened very much, or that they've been weakened in the service of a greater good. The latter I'll dispose of immediately. When the conventional-minded get the upper hand, they always say it's in the service of a greater good. It just happens to be a different, incompatible greater good each time.

As for the former worry, that the independent-minded are being oversensitive, and that free inquiry hasn't been shut down that much, you can't judge that unless you are yourself independent-minded. You can't know how much of the space of ideas is being lopped off unless you have them, and only the independent-minded have the ones at the edges. Precisely because of this, they tend to be very sensitive to changes in how freely one can explore ideas. They're the canaries in this coalmine.

The conventional-minded say, as they always do, that they don't want to shut down the discussion of all ideas, just the bad ones.

You'd think it would be obvious just from that sentence what a dangerous game they're playing. But I'll spell it out. There are two reasons why we need to be able to discuss even "bad" ideas.

The first is that any process for deciding which ideas to ban is bound to make mistakes. All the more so because no one intelligent wants to undertake that kind of work, so it ends up being done by the stupid. And when a process makes a lot of mistakes, you need to leave a margin for error. Which in this case means you need to ban fewer ideas than you'd like to. But that's hard for the aggressively conventional-minded to do, partly because they enjoy seeing people punished, as they have since they were children, and partly because they compete with one another. Enforcers of orthodoxy can't allow a borderline idea to exist, because that gives other enforcers an opportunity to one-up them in the moral purity department, and perhaps even to turn enforcer upon them. So instead of getting the margin for error we need, we get the opposite: a race to the bottom in which any idea that seems at all bannable ends up being banned. [4]

The second reason it's dangerous to ban the discussion of ideas is that ideas are more closely related than they look. Which means if you restrict the discussion of some topics, it doesn't only affect those topics. The restrictions propagate back into any topic that yields implications in the forbidden ones. And that is not an edge case. The best ideas do exactly that: they have consequences in fields far removed from their origins. Having ideas in a world where some ideas are banned is like playing soccer on a pitch that has a minefield in one corner. You don't just play the same game you would have, but on a different shaped pitch. You play a much more subdued game even on the ground that's safe.

In the past, the way the independent-minded protected themselves was to congregate in a handful of places — first in courts, and later in universities — where they could to some extent make their own rules. Places where people work with ideas tend to have customs protecting free inquiry, for the same reason wafer fabs have powerful air filters, or recording studios good sound insulation. For the last couple centuries at least, when the aggressively conventional-minded were on the rampage for whatever reason, universities were the safest places to be.

That may not work this time though, due to the unfortunate fact that the latest wave of intolerance began in universities. It began in the mid 1980s, and by 2000 seemed to have died down, but it has recently flared up again with the arrival of social media. This seems, unfortunately, to have been an own goal by Silicon Valley. Though the people who run Silicon Valley are almost all independent-minded, they've handed the aggressively conventional-minded a tool such as they could only have dreamed of.

On the other hand, perhaps the decline in the spirit of free inquiry within universities is as much the symptom of the departure of the independent-minded as the cause. People who would have become professors 50 years ago have other options now. Now they can become quants or start startups. You have to be independent-minded to succeed at either of those. If these people had been professors, they'd have put up a stiffer resistance on behalf of academic freedom. So perhaps the picture of the independent-minded fleeing declining universities is too gloomy. Perhaps the universities are declining because so many have already left. [5]

Though I've spent a lot of time thinking about this situation, I can't predict how it plays out. Could some universities reverse the current trend and remain places where the independent-minded want to congregate? Or will the independent-minded gradually abandon them? I worry a lot about what we might lose if that happened.

But I'm hopeful long term. The independent-minded are good at protecting themselves. If existing institutions are compromised, they'll create new ones. That may require some imagination. But imagination is, after all, their specialty.


Notes

[1] I realize of course that if people's personalities vary in any two ways, you can use them as axes and call the resulting four quadrants personality types. So what I'm really claiming is that the axes are orthogonal and that there's significant variation in both.

[2] The aggressively conventional-minded aren't responsible for all the trouble in the world. Another big source of trouble is the sort of charismatic leader who gains power by appealing to them. They become much more dangerous when such leaders emerge.

[3] I never worried about writing things that offended the conventional-minded when I was running Y Combinator. If YC were a cookie company, I'd have faced a difficult moral choice. Conventional-minded people eat cookies too. But they don't start successful startups. So if I deterred them from applying to YC, the only effect was to save us work reading applications.

[4] There has been progress in one area: the punishments for talking about banned ideas are less severe than in the past. There's little danger of being killed, at least in richer countries. The aggressively conventional-minded are mostly satisfied with getting people fired.

[5] Many professors are independent-minded — especially in math, the hard sciences, and engineering, where you have to be to succeed. But students are more representative of the general population, and thus mostly conventional-minded. So when professors and students are in conflict, it's not just a conflict between generations but also between different types of people.