Need to upsert vectors in Pinecone DB in Next.js

  • Thread starter: InfernalCoder (Guest)
I need help pushing vectors into the DB. Something is wrong with the await call here, and I am getting an error.

Here is my code for the pinecone.ts file that uploads to the db:

Code:
import {
  Index,
  Pinecone,
  PineconeRecord,
  RecordMetadata,
} from "@pinecone-database/pinecone";
import { downloadFromS3 } from "./s3.server";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import {
  Document,
  RecursiveCharacterTextSplitter,
} from "@pinecone-database/doc-splitter";
import { getEmbeddings } from "./embeddings";
import { Vector } from "@pinecone-database/pinecone/dist/pinecone-generated-ts-fetch";
import md5 from "md5";
import { convertToAscii } from "./utils";

let pinecone: Pinecone | null = null;
const api = process.env.PINECONE_API_KEY || "";

export const getPineconeClient = () => {
  if (!pinecone) {
    pinecone = new Pinecone({
      apiKey: api,
    });
  }
  return pinecone;
};

type PDFPage = {
  pageContent: string;
  metadata: {
    loc: { pageNumber: number };
  };
};

export async function loadS3IntoPinecone(fileKey: string) {
  try {
    // 1. Obtain the PDF
    console.log("Downloading PDF from S3...");
    const file_name = await downloadFromS3(fileKey);

    if (!file_name) {
      throw new Error("File not found");
    }

    // 2. Download and read PDF
    console.log("Reading PDF...");
    const loader = new PDFLoader(file_name as string);
    const pages = (await loader.load()) as PDFPage[];

    // 3. Split and segment the PDF
    console.log("Splitting PDF...");
    const documents = await Promise.all(pages.map(prepareDocument));

    // 4. Vectorize and embed individual docs
    console.log("Embedding documents...");
    const vectors = await Promise.all(documents.flat().map(embedDocument));

    // 5. Upload to Pinecone
    const client = await getPineconeClient();
    const pineconeIndex = client.Index("teachtalk");

    console.log("inserting vectors into pinecone");
    
    
    const namespace = convertToAscii(fileKey);

    // Push vectors to Pinecone index
    await pineconeIndex.upsert({
      vectors: vectors as PineconeRecord<RecordMetadata>[],
      namespace: namespace,
    });

    console.log("Upload complete");
  } catch (error) {
    console.error("Error in loadS3IntoPinecone", error);
  }
}

async function embedDocument(doc: Document): Promise<Vector> {
  try {
    const embeddings = await getEmbeddings(doc.pageContent);
    const hash = md5(doc.pageContent);

    return {
      id: hash,
      values: embeddings,
      metadata: {
        text: doc.metadata.text,
        pageNumber: doc.metadata.pageNumber,
      },
    } as PineconeRecord;
  } catch (error) {
    console.error("Error in embedding document", error);
    throw error;  // Ensure errors are propagated
  }
}

// Converts to bytes then to a string
export const truncateStringByBytes = (str: string, bytes: number) => {
  const enc = new TextEncoder();
  return new TextDecoder("utf-8").decode(enc.encode(str).slice(0, bytes));
};

async function prepareDocument(page: PDFPage): Promise<Document[]> {
  let { pageContent, metadata } = page;

  // Remove newline characters
  pageContent = pageContent.replace(/\n/g, "");

  // Split the docs
  const splitter = new RecursiveCharacterTextSplitter();
  const docs = await splitter.splitDocuments([
    new Document({
      pageContent,
      metadata: {
        pageNumber: metadata.loc.pageNumber,
        text: truncateStringByBytes(pageContent, 36000),
      },
    }),
  ]);

  return docs;
}

This is the error I'm getting right now:

Code:
Object literal may only specify known properties, and 'vectors' does not exist in type 'PineconeRecord<RecordMetadata>[]'.

I have tried basically everything from GPT to the docs, but I can't find how to properly upsert the vectors into the DB in batches. I would greatly appreciate the help.
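
A possible direction, sketched under assumptions rather than as a confirmed fix: the error message says the parameter of upsert is typed as 'PineconeRecord<RecordMetadata>[]', which suggests the installed client expects the array of records itself and not an object with `vectors` and `namespace` keys. In recent versions of the Pinecone TypeScript client the namespace is, as far as I know, selected separately via `index.namespace(...)`. The sketch below shows the batching part only. `SketchRecord`, `UpsertTarget`, `chunk`, `upsertInBatches`, and the batch size of 100 are hypothetical names and values chosen for illustration; they stand in for the real Pinecone types so the example runs on its own.

```typescript
// Stand-in for PineconeRecord so the sketch has no external dependencies (hypothetical type).
type SketchRecord = {
  id: string;
  values: number[];
  metadata?: Record<string, string | number | boolean>;
};

// Anything exposing upsert(records[]) fits here; the assumption is that the
// object returned by pineconeIndex.namespace(namespace) has this shape.
interface UpsertTarget {
  upsert(records: SketchRecord[]): Promise<unknown>;
}

// Split an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) {
    throw new Error("size must be a positive number");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Upsert the records one batch at a time and return how many batches were sent.
// Each call passes the array directly, not an object wrapping it.
async function upsertInBatches(
  target: UpsertTarget,
  records: SketchRecord[],
  batchSize = 100 // illustrative default, not a documented limit
): Promise<number> {
  const batches = chunk(records, batchSize);
  for (const batch of batches) {
    await target.upsert(batch);
  }
  return batches.length;
}
```

In loadS3IntoPinecone, the call on step 5 would then look roughly like `await upsertInBatches(pineconeIndex.namespace(namespace), vectors)`, assuming the installed client version provides `namespace(...)` on the index object. Batches are sent sequentially to keep the request rate predictable; sending them with Promise.all would be faster but puts more load on the API.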
