How to create and query vector stores

Head to Integrations for documentation on built-in integrations with vectorstore providers.

Prerequisites

This guide assumes familiarity with the following concepts:

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors. At query time, you embed the unstructured query the same way and retrieve the stored vectors that are most similar to the embedded query. A vector store takes care of both storing the embedded data and performing the vector search for you.
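
To make "most similar" concrete, here is a minimal sketch of cosine similarity, one common way to compare embedding vectors. The function name `cosineSimilarity` and the three-dimensional vectors are hypothetical illustrations, not output from a real embedding model, and a given vector store may use a different metric.

```typescript
// Cosine similarity between two equal-length vectors:
// close to 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen for this sketch: a zero vector is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hand-made "embeddings" (assumed values, for illustration only).
const queryVector = [1, 0, 1];
const closeVector = [2, 0, 2]; // same direction as the query
const farVector = [0, 1, 0]; // orthogonal to the query

const closeScore = cosineSimilarity(queryVector, closeVector);
const farScore = cosineSimilarity(queryVector, farVector);
console.log(closeScore > farScore); // true
```

Because the score depends only on direction, `closeVector` matches the query even though its magnitude differs.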

This walkthrough uses MemoryVectorStore, a basic, unoptimized implementation that stores embeddings in memory and performs an exact, linear search for the most similar embeddings. LangChain contains many built-in integrations; see this section for more, or the full list of integrations.
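
An exact, linear search simply scores every stored vector against the query and keeps the best matches. The sketch below illustrates that idea under simplifying assumptions: `StoredVector`, `dotProduct`, and `linearSearch` are hypothetical names, the vectors are hand-made, and this is not the MemoryVectorStore source code.

```typescript
type StoredVector = { text: string; vector: number[] };

// Dot product; for unit-length vectors this equals cosine similarity.
function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Exact search: score every entry, sort by descending score, keep the top k.
function linearSearch(
  entries: StoredVector[],
  query: number[],
  k: number
): StoredVector[] {
  return entries
    .map((entry) => ({ entry, score: dotProduct(entry.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.entry);
}

// Unit-length toy vectors (assumed values, for illustration only).
const entries: StoredVector[] = [
  { text: "a", vector: [1, 0] },
  { text: "b", vector: [0, 1] },
  { text: "c", vector: [0.6, 0.8] },
];

const nearest = linearSearch(entries, [0, 1], 2);
console.log(nearest.map((entry) => entry.text).join(", ")); // b, c
```

The cost grows linearly with the number of stored vectors, which is why this approach suits small datasets and walkthroughs rather than large indexes.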

Creating a new index

Most of the time, you'll need to load and prepare the data you want to search over. Here's an example that loads a recent speech from a file:

import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { TextLoader } from "langchain/document_loaders/fs/text";

// Create docs with a loader
const loader = new TextLoader("src/document_loaders/example_data/example.txt");
const docs = await loader.load();

// Load the docs into the vector store
const vectorStore = await MemoryVectorStore.fromDocuments(
  docs,
  new OpenAIEmbeddings()
);

// Search for the most similar document
const resultOne = await vectorStore.similaritySearch("hello world", 1);

console.log(resultOne);

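/*
  Logs an array containing the single most similar Document.
  Its pageContent is text loaded from example.txt and its metadata
  is set by the loader, so the exact output depends on the file.
*/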

API Reference:

MemoryVectorStore from langchain/vectorstores/memory
OpenAIEmbeddings from @langchain/openai
TextLoader from langchain/document_loaders/fs/text

Most of the time, you'll need to split the loaded text as a preparation step. See this section to learn more about text splitters.
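
As a rough illustration of what splitting means, the sketch below cuts a string into fixed-size chunks that overlap slightly, so that content near a boundary appears in both neighbors. The `splitText` helper and its parameters are hypothetical and hand-rolled for this example; LangChain's own text splitters are more sophisticated and are the better choice in practice.

```typescript
// Split text into chunks of at most `chunkSize` characters, where
// consecutive chunks share `overlap` characters.
function splitText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("Require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const result: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    result.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return result;
}

const chunks = splitText("abcdefghij", 4, 1);
console.log(chunks.join(" | ")); // abcd | defg | ghij
```

Each chunk would then be embedded and stored as its own document, so a query can match a relevant passage rather than the whole file.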

Creating a new index from texts

If you have already prepared the data you want to search over, you can initialize a vector store directly from text chunks:

Tip: See this section for general instructions on installing integration packages. The example below needs the @langchain/openai package, which you can install with npm, Yarn, or pnpm:

npm install @langchain/openai

yarn add @langchain/openai

pnpm add @langchain/openai

import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

const vectorStore = await MemoryVectorStore.fromTexts(
  ["Hello world", "Bye bye", "hello nice world"],
  [{ id: 2 }, { id: 1 }, { id: 3 }],
  new OpenAIEmbeddings()
);

const resultOne = await vectorStore.similaritySearch("hello world", 1);
console.log(resultOne);

/*
  [
    Document {
      pageContent: "Hello world",
      metadata: { id: 2 }
    }
  ]
*/

API Reference:

MemoryVectorStore from langchain/vectorstores/memory
OpenAIEmbeddings from @langchain/openai
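
In the example above, the texts and the metadata objects are passed as two parallel arrays, so the first text goes with `{ id: 2 }`, the second with `{ id: 1 }`, and so on. A minimal sketch of that index-based pairing follows; the `Doc` type and `toDocuments` helper are hypothetical and only mirror the shape of the logged output, they are not LangChain's implementation.

```typescript
type Doc = { pageContent: string; metadata: Record<string, unknown> };

// Pair each text with the metadata object at the same index.
function toDocuments(
  texts: string[],
  metadatas: Record<string, unknown>[]
): Doc[] {
  if (texts.length !== metadatas.length) {
    throw new Error("texts and metadatas must have the same length");
  }
  return texts.map((pageContent, i) => ({ pageContent, metadata: metadatas[i] }));
}

const paired = toDocuments(
  ["Hello world", "Bye bye", "hello nice world"],
  [{ id: 2 }, { id: 1 }, { id: 3 }]
);
console.log(paired[0].pageContent, paired[0].metadata.id); // Hello world 2
```

Keeping the two arrays in the same order matters, because a mismatch would silently attach the wrong metadata to a text.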

Which one to pick?

Here's a quick guide to help you pick the right vector store for your use case:

If you're after something that can just run inside your Node.js application, in-memory, without any other servers to stand up, then go for HNSWLib, Faiss, LanceDB or CloseVector
If you're looking for something that can run in-memory in browser-like environments, then go for MemoryVectorStore or CloseVector
If you come from Python and are looking for something similar to FAISS, try HNSWLib or Faiss
If you're looking for an open-source full-featured vector database that you can run locally in a docker container, then go for Chroma
If you're looking for an open-source vector database that offers low-latency, local embedding of documents and supports apps on the edge, then go for Zep
If you're looking for an open-source production-ready vector database that you can run locally (in a docker container) or hosted in the cloud, then go for Weaviate.
If you're using Supabase already then look at the Supabase vector store to use the same Postgres database for your embeddings too
If you're looking for a production-ready vector store you don't have to worry about hosting yourself, then go for Pinecone
If you are already utilizing SingleStore, or if you find yourself in need of a distributed, high-performance database, you might want to consider the SingleStore vector store.
If you are looking for an online MPP (Massively Parallel Processing) data warehousing service, you might want to consider the AnalyticDB vector store.
If you're in search of a cost-effective vector database that lets you run vector search with SQL, look no further than MyScale.
If you're in search of a vector database that you can load from both the browser and server side, check out CloseVector. It's a vector database that aims to be cross-platform.
If you're looking for a scalable, open-source columnar database with excellent performance for analytical queries, then consider ClickHouse.
