Community: Updated Firecrawl Document Loader to v1 #26548

Merged
11 commits merged into langchain-ai:master on Oct 15, 2024


rafaelsideguide
Contributor

This PR updates the Firecrawl Document Loader to use the recently released V1 API of Firecrawl.

Key Updates:

Firecrawl V1 Integration: Updated the document loader to leverage the new Firecrawl V1 API for improved performance, reliability, and developer experience.

Map Functionality Added: Introduced the map mode for more flexible document loading options.

These updates enhance the integration and provide access to the latest features of Firecrawl.
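
As a rough illustration of what a three-mode loader has to do, here is a minimal sketch that dispatches on the mode and normalizes the result. `FakeClient`, its canned response shapes, and the `fetch` helper are assumptions made up for this example; they are not the code in this PR or the Firecrawl SDK itself.

```python
from typing import Any, Dict, List, Literal

Mode = Literal["crawl", "scrape", "map"]


class FakeClient:
    """Hypothetical stand-in for a Firecrawl client; returns canned data."""

    def scrape_url(self, url: str) -> Dict[str, Any]:
        return {"markdown": f"# page at {url}", "metadata": {"sourceURL": url}}

    def crawl_url(self, url: str) -> Dict[str, Any]:
        # Assumed shape: the crawled pages sit under a "data" key.
        return {"data": [self.scrape_url(url), self.scrape_url(url + "/about")]}

    def map_url(self, url: str) -> List[str]:
        # Assumed shape: map returns discovered links only.
        return [url, url + "/about"]


def fetch(client: FakeClient, url: str, mode: Mode = "crawl") -> List[Dict[str, Any]]:
    """Normalize every mode to a list of {"content", "metadata"} records."""
    if mode == "scrape":
        pages = [client.scrape_url(url)]
    elif mode == "crawl":
        pages = client.crawl_url(url)["data"]
    elif mode == "map":
        # Map mode has no page bodies, so each record carries a link as content.
        return [{"content": link, "metadata": {}} for link in client.map_url(url)]
    else:
        raise ValueError(f"Invalid mode {mode!r}; expected 'crawl', 'scrape', or 'map'.")
    return [
        {"content": page.get("markdown", ""), "metadata": page.get("metadata", {})}
        for page in pages
    ]
```

Calling `fetch(FakeClient(), "https://example.com", "map")` yields one record per discovered link, while `"scrape"` yields a single record for the page itself.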

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. community Related to langchain-community Ɑ: doc loader Related to document loader module (not documentation) 🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder labels Sep 16, 2024
Member

@efriis efriis left a comment

hey there! This is a breaking change. Could you keep around (but deprecate use of) the old input parameters?
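
One common way to keep an old input around while steering callers to its replacement is to accept both and emit a `DeprecationWarning` when the old one is used. The sketch below assumes a hypothetical deprecated argument named `legacy_options` that is superseded by `params`; neither the helper nor the argument name is taken from this PR.

```python
import warnings
from typing import Any, Dict, Optional


def resolve_params(
    params: Optional[Dict[str, Any]] = None,
    legacy_options: Optional[Dict[str, Any]] = None,  # hypothetical old argument
) -> Dict[str, Any]:
    """Merge a deprecated argument into its replacement, warning on use."""
    merged = dict(params or {})
    if legacy_options is not None:
        warnings.warn(
            "'legacy_options' is deprecated; pass 'params' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Values given through the new argument win over legacy ones.
        for key, value in legacy_options.items():
            merged.setdefault(key, value)
    return merged
```

Existing callers keep working and see a warning, and the old argument can be removed in a later release.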

@roopeshuniyal

roopeshuniyal commented Oct 1, 2024

I installed Firecrawl using "pipenv install firecrawl-py", and Pipfile.lock shows "version": "==1.2.4", which seems to be the latest version, but I'm getting the following error:

AttributeError: 'str' object has no attribute 'get'
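
One plausible way this error can arise (an assumption, not something confirmed in this thread) is code written for a list-of-dicts response receiving a dict that wraps the list instead. Iterating a dict yields its string keys, so calling `.get` on each item fails. The payloads and the `extract_metadata` helper below are made up for illustration.

```python
from typing import Any, Dict, List


def extract_metadata(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Written for a list-of-dicts response: calls .get on each item."""
    return [doc.get("metadata", {}) for doc in docs]


# Hypothetical payloads, not real API responses.
list_shaped = [{"markdown": "hello", "metadata": {"title": "Home"}}]
dict_shaped = {"success": True, "data": list_shaped}

print(extract_metadata(list_shaped))  # works: each item is a dict

error_message = ""
try:
    # Iterating a dict yields its string keys, so .get is called on a str.
    extract_metadata(dict_shaped)  # type: ignore[arg-type]
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Unwrapping the list first restores the expected shape.
print(extract_metadata(dict_shaped["data"]))
```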

@calebpeffer

This is probably because this hasn't been merged yet!

@rafaelsideguide is there anything else we need to do to get this through?

@rafaelsideguide
Contributor Author

Hey @calebpeffer, I'm currently updating the PR to ensure it's not a breaking change, as requested by @efriis. I'll be pushing the updates in a few hours.

@rafaelsideguide
Contributor Author

Hey @efriis! I updated the PR with the requested changes. Could you take a look? Thank you!

 def __init__(
     self,
     url: str,
     *,
     api_key: Optional[str] = None,
     api_url: Optional[str] = None,
-    mode: Literal["crawl", "scrape"] = "crawl",
+    mode: Literal["crawl", "scrape", "map"] = "crawl",
Member

just confirming this will be a breaking change to anyone passing api_url. It's ok with me if it's ok on the firecrawl side!

Contributor Author

Oh that's not good. I just reverted that. @efriis Thank you!
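
To illustrate why dropping `api_url` would break callers, here is a minimal sketch of a constructor that keeps the keyword and forwards it to the underlying client. `StubClient` and `SketchLoader` are hypothetical stand-ins written for this example, not the loader or SDK classes from this PR.

```python
from typing import Optional


class StubClient:
    """Hypothetical stand-in for the real client; just records its arguments."""

    def __init__(self, api_key: Optional[str] = None, api_url: Optional[str] = None) -> None:
        self.api_key = api_key
        self.api_url = api_url


class SketchLoader:
    """Keeps api_url as a keyword-only argument so existing callers still work."""

    def __init__(
        self,
        url: str,
        *,
        api_key: Optional[str] = None,
        api_url: Optional[str] = None,
    ) -> None:
        if not url:
            raise ValueError("Url must be provided")
        self.url = url
        # Forward api_url so callers pointing at a custom endpoint are unaffected.
        self.client = StubClient(api_key=api_key, api_url=api_url)
```

If the keyword were removed from the signature, a call such as `SketchLoader("https://example.com", api_url="http://localhost:3002")` would raise a `TypeError` for an unexpected keyword argument.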

if not url:
    raise ValueError("Url must be provided")

api_key = api_key or get_from_env("api_key", "FIREWALL_API_KEY")
Contributor

Fix FIREWALL_API_KEY to FIRECRAWL_API_KEY

Contributor Author

done!
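
The variable name matters because the fallback lookup only finds a key stored under exactly that name. Below is a minimal standard-library sketch of the "explicit argument, else environment" pattern; `get_api_key` is a hypothetical helper standing in for the `get_from_env` call in the hunk above, not a reimplementation of it.

```python
import os
from typing import Optional


def get_api_key(api_key: Optional[str] = None, env_var: str = "FIRECRAWL_API_KEY") -> str:
    """Return the explicit key if given, else read it from the environment."""
    key = api_key or os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key: pass api_key or set {env_var}.")
    return key
```

With a misspelled `env_var`, a user who exported `FIRECRAWL_API_KEY` would still hit the `ValueError` unless they also passed `api_key` explicitly.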

    raise ValueError("Url must be provided")

api_key = api_key or get_from_env("api_key", "FIREWALL_API_KEY")
self.firecrawl = FirecrawlApp(api_key=api_key)
Contributor

Re-add api_url

Contributor Author

done!

@nickscamara
Contributor

@rafaelsideguide can you fix those please? Thank you!

@rafaelsideguide
Contributor Author

@nickscamara done

@nickscamara
Contributor

nickscamara commented Oct 9, 2024

Sweet thanks! I think this is all good to merge!

@emarco177
Contributor

Hi! Any news on this? :)

@rafaelsideguide
Contributor Author

Hey @efriis, could you run the GitHub workflows when you have a chance? I’d like to check if there are any issues. Thank you!

@efriis efriis enabled auto-merge (squash) October 15, 2024 13:10
@efriis efriis merged commit fc14f67 into langchain-ai:master Oct 15, 2024
20 checks passed