Developing a RAG chatbot on Next.js: Firebase Genkit and Firestore Tutorial

Many of us are adding generative AI workflows to our apps. A common challenge is getting LLMs to answer based on our own dataset.

We may want to provide information like:

Our internal documents like PDFs or reports.
Data provided by the user like notes, forms, appointment history etc.
Or the latest information like news articles and Wikipedia pages.

Building a RAG pipeline:

To solve this, we will build a RAG (retrieval-augmented generation) pipeline that performs a semantic search over a large corpus of thousands of documents and extracts the data most relevant to the user's query.

For this article, we will pick a website and create a chatbot that answers questions about its content.

Getting Help: Dash Agent for Firebase Genkit

Building with a new package is a difficult task that requires learning its concepts first. To save time reading the Firebase Genkit documentation, we will use the @Firebase Genkit agent in CommandDash.

💡 Dash agents stay up to date with the latest docs, examples and GitHub issues, and assist us from the IDE.

We can install it from the CommandDash marketplace in VS Code.

Now, with the Firebase Genkit agent in place, we can ask it directly to help us build our use case. Any interactions with the agent are quoted like below:

Hi Firebase Genkit! Let's start building 👋🏼

Getting Started

How to setup a Next.js project to use Firebase Genkit and Firestore

Here's a comprehensive guide on setting up a Next.js project to use Firebase Genkit with Firestore.

Prerequisites

Firebase Project: You need a Firebase project. If you don't have one, create one at https://console.firebase.google.com.
Node.js and npm: Ensure you have Node.js and npm installed on your machine.
Firebase CLI: Install the Firebase CLI using npm install -g firebase-tools.

Steps

Create a Next.js Project

 npx create-next-app@latest my-genkit-nextjs-app
 cd my-genkit-nextjs-app

Choose TypeScript as the language during setup.

Initialize Genkit
```
 npm i -g genkit
 genkit init
```
- Select Next.js as your deployment platform.
- Select your preferred model provider (e.g., Gemini, Vertex AI).
- Accept the defaults for other prompts.

This will set up your project with sample Genkit files.

Install the Firebase Plugin

 npm install --save @genkit-ai/firebase

Configure the Firebase Plugin

In your genkit.ts file:

  import { firebase } from '@genkit-ai/firebase';
  import { configureGenkit } from '@genkit-ai/core';

  configureGenkit({
      plugins: [firebase({ projectId: 'your-firebase-project-id' })], // Replace 'your-firebase-project-id'
      logLevel: 'debug',
      enableTracingAndMetrics: true
  });

Set up Firestore

Initialize the Firebase Admin SDK in your genkit.ts file:

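  import * as admin from 'firebase-admin';
  import { getFirestore } from 'firebase-admin/firestore';

  // ... other imports

  // Authenticate with the service account key downloaded from the Firebase console
  const app = admin.initializeApp({
    credential: admin.credential.cert('path/to/your/firebase-service-account-key.json'),
  });
  const firestore = getFirestore(app);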

  // ... rest of your Genkit code

Replace path/to/your/firebase-service-account-key.json with the path to your Firebase service account key file. You can download this key file from the Firebase console (Project Settings > Service Accounts).

Populating Data in Firestore

For this article, we will pick the Firebase Genkit documentation and index it to our Firestore DB.

Indexing means converting text into embeddings, which can then be used to rank pieces of text by their semantic similarity. To make it easy, I've extracted the Genkit documentation into this JSON file.
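
To make the ranking idea concrete, here is a minimal sketch that ranks a few toy documents by cosine similarity, one common way to compare embeddings. The three-dimensional vectors and the names cosineSimilarity and topK are made up for illustration; real embeddings come from an embedding model and have hundreds of dimensions, and in this tutorial Firestore performs the search for us.

```typescript
// Toy illustration of embedding-based ranking. The vectors are invented;
// they are not output from any real embedding model.
type Embedded = { text: string; embedding: number[] };

// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k documents whose embeddings are most similar to the query.
function topK(query: number[], docs: Embedded[], k: number): Embedded[] {
  return [...docs]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding)
    )
    .slice(0, k);
}

const docs: Embedded[] = [
  { text: "Genkit flows", embedding: [0.9, 0.1, 0.0] },
  { text: "Firestore pricing", embedding: [0.0, 0.2, 0.9] },
  { text: "Genkit plugins", embedding: [0.8, 0.3, 0.1] },
];

const queryEmbedding = [1, 0, 0];
console.log(topK(queryEmbedding, docs, 2).map((d) => d.text));
// prints [ 'Genkit flows', 'Genkit plugins' ]
```

Sorting every document is fine for a toy list; a vector index exists precisely so that this comparison does not have to touch the whole collection.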

Now let's ask the Dash Agent:

Write me a method that extracts data from the json file with schema [{title, description, content}] and saves the indexed data to the "documents" collection in firestore.

Chunk the data appropriately within the embedding model limits.

// app/genkit.ts
import { configureGenkit } from "@genkit-ai/core";
import { defineFirestoreRetriever, firebase } from "@genkit-ai/firebase";
import { googleAI } from "@genkit-ai/googleai";
import * as admin from 'firebase-admin';
import { getFirestore, FieldValue } from "firebase-admin/firestore";
import { textEmbeddingGecko001 } from "@genkit-ai/googleai";
import { embed } from "@genkit-ai/ai/embedder";
import * as fs from 'fs';

// your existing code
const firestore = getFirestore(app);

const indexConfig = {
  collection: "documents",
  contentField: "content",
  vectorField: "embedding",
  embedder: textEmbeddingGecko001,
};

// Exported so that the API route can import it
export async function indexWebsiteContent() {
  const filePath = 'your downloaded file path';
  const data = JSON.parse(fs.readFileSync(filePath, 'utf-8'));

  for (const document of data) {
    // Handle embedding length limits
    const chunks = chunkText(document.content, 4500);
    for (const chunk of chunks) {
      const embedding = await embed({
        embedder: indexConfig.embedder,
        content: chunk,
      });

      await firestore.collection(indexConfig.collection).add({
        [indexConfig.vectorField]: FieldValue.vector(embedding),
        [indexConfig.contentField]: chunk,
        title: document.title,
        description: document.description,
      });

      console.log(`Indexed chunk of document: ${document.title}`);
    }
  }
}

function chunkText(text: string, chunkSize: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}
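
The chunkText function above cuts at fixed character offsets, so a chunk can end in the middle of a word or sentence. As one possible alternative, the sketch below breaks on whitespace and repeats a few words between neighbouring chunks so that context is not lost at the boundary. The name chunkTextWithOverlap and its parameters are hypothetical, and since embedding limits are usually expressed in tokens, the character budget is only an approximation.

```typescript
// Hypothetical alternative to chunkText: splits on whitespace and carries
// the last `overlapWords` words of each chunk into the next one.
// Note: a chunk can still exceed maxChars if a single word, or the carried
// overlap plus the next word, is longer than the budget.
function chunkTextWithOverlap(
  text: string,
  maxChars: number,
  overlapWords: number
): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  let current: string[] = [];
  let currentLength = 0; // length of current.join(" ")

  for (const word of words) {
    // A separating space is needed unless this is the first word of the chunk
    const added = current.length === 0 ? word.length : word.length + 1;
    if (current.length > 0 && currentLength + added > maxChars) {
      chunks.push(current.join(" "));
      current = overlapWords > 0 ? current.slice(-overlapWords) : [];
      currentLength = current.join(" ").length;
    }
    current.push(word);
    currentLength += current.length === 1 ? word.length : word.length + 1;
  }

  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}

console.log(chunkTextWithOverlap("one two three four five six", 13, 1));
// prints [ 'one two three', 'three four', 'four five six' ]
```

It could be swapped in for chunkText inside indexWebsiteContent, for example as chunkTextWithOverlap(document.content, 4500, 20); the overlap of 20 words is an arbitrary starting point, not a recommended value.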

Now we will expose indexWebsiteContent through an API route and call it once, saving the embeddings to Firestore for querying later.

// pages/api/index-website-content.ts
import { NextApiHandler, NextApiRequest, NextApiResponse } from 'next';
import { indexWebsiteContent } from '../../app/genkit';

const handler: NextApiHandler = async (req: NextApiRequest, res: NextApiResponse) => {
    try {
        await indexWebsiteContent();
        res.status(200).json({ message: 'Website content indexed successfully' });
    } catch (error) {
        console.error(error);
        res.status(500).json({ message: 'Error indexing website content' });
    }
};

export default handler;

Then we start the dev server and, from a second terminal, call the endpoint:

npm run dev
curl http://localhost:3000/api/index-website-content

The Firestore DB is now populated 🙌🏼 with embeddings in vector format!

Asking questions on this data

Moving back to Firebase Genkit Agent:

Create a chatRequest method that retrieves the data from the collection and creates a model request to answer user's query.

Use top 3 documents with Gemini Pro model.

import { configureGenkit } from "@genkit-ai/core";
import { defineFirestoreRetriever, firebase } from "@genkit-ai/firebase";
import { geminiPro, googleAI } from "@genkit-ai/googleai";
import * as admin from 'firebase-admin';
import { getFirestore, FieldValue } from "firebase-admin/firestore";
import { textEmbeddingGecko001 } from "@genkit-ai/googleai";
import { embed } from "@genkit-ai/ai/embedder";
import { generate } from "@genkit-ai/ai";
import * as fs from 'fs';

// your existing code

const firestoreRetriever = defineFirestoreRetriever(
  {
    name: 'firestoreRetriever',
    firestore: firestore,
    collection: indexConfig.collection,
    vectorField: indexConfig.vectorField,
    contentField: indexConfig.contentField,
    embedder: indexConfig.embedder,
  }
);

export async function indexWebsiteContent() {
    // no change here
}

export async function chatRequest(query: string) {
  // Embed the query and fetch the closest chunks from Firestore
  const docs = await firestoreRetriever({
    query: {
      content: [{ text: query }],
    },
    options: { limit: 3 }, // Retrieve top 3 documents
  });

  // Join the retrieved chunks into a single context string
  const context = docs.documents.map(doc => doc.content[0].text).join('\n\n');

  const response = await generate({
    model: geminiPro,
    prompt: `
      You are a helpful and informative AI assistant.
      Answer the user's question.

      Question: ${query}
      Context: ${context}
  `,
  });

  return response.text();
}

// your existing code

Done! We now have the entire retrieval and answering code ready for us.
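
The prompt in chatRequest passes the retrieved context to the model but does not tell it to rely on that context. One possible refinement is sketched below as a hypothetical buildGroundedPrompt helper (it is not part of Genkit): it numbers the retrieved chunks and asks the model to answer only from them. The exact wording of the instructions is an assumption and worth tuning for your own data.

```typescript
// Hypothetical helper: formats retrieved chunks into a prompt that asks the
// model to stay within the provided context.
function buildGroundedPrompt(question: string, chunks: string[]): string {
  // Number each chunk so the context boundaries are visible to the model
  const context = chunks
    .map((chunk, i) => `[${i + 1}] ${chunk.trim()}`)
    .join("\n\n");

  return [
    "You are a helpful and informative AI assistant.",
    "Answer the user's question using only the context below.",
    "If the context does not contain the answer, say that you don't know.",
    "",
    `Context:\n${context}`,
    "",
    `Question: ${question}`,
  ].join("\n");
}

console.log(
  buildGroundedPrompt("What is a flow?", [
    "Flows are functions.",
    "Plugins add models.",
  ])
);
```

The returned string could be passed as the prompt value in the generate call, with the chunk texts taken from the retriever result.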

Now we can first test the chatRequest method through an API route, as we did before.

import { NextApiHandler, NextApiRequest, NextApiResponse } from 'next';
import { chatRequest } from '../../app/genkit';

const handler: NextApiHandler = async (req: NextApiRequest, res: NextApiResponse) => {
    try {
        // Assuming the query is passed in the request body
        const { query } = req.body;

        // If no query is provided, send an error response
        if (!query) {
            return res.status(400).json({ message: 'Please provide a query' });
        }

        let result = await chatRequest(query);
        res.status(200).json({ message: result });
    } catch (error) {
        console.error(error);
        res.status(500).json({ message: 'Error processing chat request' });
    }
};

export default handler;
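
To exercise this route from a script, a small client helper along the following lines could be used. It is a sketch under a few assumptions: the route is served at /api/chat-request (the path the UI later calls), the helper name askChatbot and the FetchLike type are made up here, and the fetch function is passed in so the helper can be tried without a running server.

```typescript
// Minimal structural type so the helper works with the global fetch
// (Node 18+ or the browser) as well as with a test double.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical client for the chat route: posts the query as JSON and
// returns the "message" field of the response.
async function askChatbot(
  query: string,
  baseUrl: string,
  fetchImpl: FetchLike
): Promise<string> {
  const res = await fetchImpl(`${baseUrl}/api/chat-request`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  });
  const data = (await res.json()) as { message?: string };
  if (!res.ok) {
    throw new Error(`Request failed (${res.status}): ${data.message ?? "no message"}`);
  }
  return data.message ?? "";
}

// Stand-in for the real server: echoes the query back.
const fakeFetch: FetchLike = async (_url, init) => ({
  ok: true,
  status: 200,
  json: async () => ({ message: `echo: ${JSON.parse(init.body).query}` }),
});

askChatbot("What is Genkit?", "http://localhost:3000", fakeFetch).then(console.log);
// prints "echo: What is Genkit?"
```

Against the running dev server, the last argument would be the real fetch instead of fakeFetch.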

Calling this endpoint returns an error about a missing indexing prerequisite.

Error: 9 FAILED_PRECONDITION: Missing vector index configuration. Please create the required index with the following gcloud command: gcloud alpha firestore indexes composite create --project={your project name} --collection-group=documents --query-scope=COLLECTION --field-config=vector-config='{"dimension":"768","flat": "{}"}',field-path=embedding

This indicates that we first need to create a vector index for the Firestore DB using the gcloud CLI. Once the CLI is installed, run this command:

gcloud alpha firestore indexes composite create \
  --project={project} \
  --collection-group=documents \
  --query-scope=COLLECTION \
  --field-config=vector-config='{"dimension":"768","flat": "{}"}',field-path=embedding

You can copy the values specific to your project from the error message itself. Once the index is created, you get its ID.
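
The "dimension" value in the index command (768 here) should line up with the length of the vectors stored in the embedding field. As a small safeguard, a check like the hypothetical assertEmbeddingDimension below could run before each write, so a mismatch (for example after switching to a different embedder) fails early with a readable message. The constant and function names are assumptions for illustration.

```typescript
// Assumption: 768 mirrors the "dimension" value used when creating the index.
const INDEX_DIMENSION = 768;

// Hypothetical guard: returns the embedding unchanged if its length matches
// the index dimension, otherwise throws a descriptive error.
function assertEmbeddingDimension(
  embedding: number[],
  expected: number = INDEX_DIMENSION
): number[] {
  if (embedding.length !== expected) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, but the index expects ${expected}`
    );
  }
  return embedding;
}

try {
  assertEmbeddingDimension([0.1, 0.2, 0.3]);
} catch (error) {
  console.error((error as Error).message);
}
```

In indexWebsiteContent, the result of embed could be wrapped in this check before it is handed to FieldValue.vector.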

Now we re-run the same API request, and this time it succeeds 🙌🏼

Building the UI

We will ask CommandDash to create a basic UI that calls the chatRequest method.

Select each of the snippets below and use Attach Snippets to Dash in the menu bar to provide them as context:

[page.tsx] [genkit.ts] [pages/api/chat-request.ts] update my next JS page to have a textfield and button that calls that chat request method and shows the response in the text box.

// app/page.tsx
"use client";

import { FormEvent, useState } from "react";

export default function Home() {
    const [query, setQuery] = useState("");
    const [response, setResponse] = useState("");

    // Send the query to the chat-request API route and display the answer
    const handleSubmit = async (event: FormEvent<HTMLFormElement>) => {
        event.preventDefault();

        try {
            const result = await fetch("/api/chat-request", {
                method: "POST",
                headers: { "Content-Type": "application/json" },
                body: JSON.stringify({ query }),
            });

            const data = await result.json();
            setResponse(data.message);
        } catch (error) {
            console.error("Error:", error);
            setResponse("Error processing chat request.");
        }
    };

    return (
    <main className="flex min-h-screen flex-col items-center justify-between p-24">
        {/* ... Existing HTML content ... */}

        <div className="mt-16 w-full max-w-5xl">
        <form onSubmit={handleSubmit}>
            <div className="mb-4">
            <label htmlFor="query" className="block text-gray-700 text-sm font-bold mb-2">
                Your question:
            </label>
            <input
                type="text"
                id="query"
                className="shadow appearance-none border rounded w-full py-2 px-3 text-gray-700 leading-tight focus:outline-none focus:shadow-outline"
                value={query}
                onChange={(e) => setQuery(e.target.value)}
            />
            </div>
            <button
            type="submit"
            className="bg-blue-500 hover:bg-blue-700 text-white font-bold py-2 px-4 rounded focus:outline-none focus:shadow-outline"
            >
            Ask
            </button>
        </form>

        {response && (
            <div className="mt-4 p-4 border border-gray-300 rounded">
            <p>{response}</p>
            </div>
        )}
        </div>
    </main>
    );
}

Great, let's run our Next.js app and see the results.

Congratulations! We've built our first RAG app with Genkit!

Now adapt this to your use case and ship useful AI-powered apps to your users.

Resources:

https://marketplace.visualstudio.com/items?itemName=WelltestedAI.fluttergpt