Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pinecone-client to 3.0+ #37158

Closed
1 task done
Taragolis opened this issue Feb 3, 2024 · 5 comments · Fixed by #37307
Closed
1 task done

Support pinecone-client to 3.0+ #37158

Taragolis opened this issue Feb 3, 2024 · 5 comments · Fixed by #37307
Assignees
Labels
area:providers good first issue kind:task A task that needs to be completed as part of a larger issue provider:pinecone

Comments

@Taragolis
Copy link
Contributor

Body

Originally pinecone-client limited to <3.0 in #36818 because it not compatible with current version of provider.

We need to have eventually support of SDK v3, the Guidance which might help to migrate Pinecone Provider to this version of client: https://canyon-quilt-082.notion.site/Pinecone-Python-SDK-v3-0-0-Migration-Guide-056d3897d7634bf7be399676a4757c7b

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.
@Taragolis Taragolis added area:providers good first issue kind:meta High-level information important to the community provider:pinecone labels Feb 3, 2024
@rawwar
Copy link
Collaborator

rawwar commented Feb 7, 2024

I will be working on this. Thanks

@rawwar
Copy link
Collaborator

rawwar commented Feb 15, 2024

UPDATE: I will be going with approach 2 and give examples in the documentation. If you do have feedback, please comment. Thanks!

In the 3.x client, creating index signature has changed. Now, they use podSpec or serverlessSpec classes to pass configurations w.r.t infra. I've been thinking how to rewrite the hook and am split between two ways and need help in finalizing which one to go with.

Approach 1:

will have two methods in PineconeHook namely create_pod_index and create_serverless_index which will take all possible parameters w.r.t index and the spec and internally will create the Spec object and pass it to the create_index method

method will be as follows for the pod based index creation

def create_pod_index(
        self,
        index_name: str,
        dimension: int,
        environment: str,
        metric: str | None = "cosine",
        replicas: int | None = 1,
        shards: int | None = 1,
        pods: int | None = 1,
        pod_type: str | None = "p1.x1",
        metadata_config: dict[str, str] | None = None,
        source_collection: str | None = "",
        timeout: int | None = None,
    ) -> None:
        pod_spec = PodSpec(
            environment=environment,
            replicas=replicas,
            shards=shards,
            pods=pods,
            pod_type=pod_type,
            metadata_config=metadata_config,
            source_collection=source_collection,
        )
        
        self.conn.create_index(
            name=index_name,
            dimension=dimension,
            spec=pod_spec
            metric=metric,
            timeout=timeout,
        )

Approach 2:

Will have helper methods to create PodSpec and ServerlessSpec objects and then have the create_index method follow the pinecone signature and accept only one of these two types.

def get_pod_spec_obj(self,environment, replicas, shards, pods, pod_type, metadata_config, source_collection):
        return PodSpec(
            environment=environment,
            replicas=replicas,
            shards=shards,
            pods=pods,
            pod_type=pod_type,
            metadata_config=metadata_config,
            source_collection=source_collection,
        )

def create_index(self, name ,dimension,spec,metric,timeout):
    return self.conn.create_index(name=name,dimension=dimension,spec=spec,metric=metric,timeout=timeout)

@Taragolis
Copy link
Contributor Author

@rawwar difficult to say which approach are better. I can't find any usage into the operators, so assume this might only use into the custom operators or taskflow operators, so maybe create method create_index_from_spec?

@rawwar

This comment was marked as outdated.

@rawwar rawwar removed their assignment Feb 22, 2024
@rawwar
Copy link
Collaborator

rawwar commented Apr 12, 2024

I am picking this back up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers good first issue kind:task A task that needs to be completed as part of a larger issue provider:pinecone
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants