国产av日韩一区二区三区精品,成人性爱视频在线观看,国产,欧美,日韩,一区,www.成色av久久成人,2222eeee成人天堂

Table of Contents
What is embedding?
Why choose PostgreSQL and pgvector?
App Overview
Prerequisites
Conclusion
Home Backend Development Golang Building a Semantic Search Engine with OpenAI, Go, and PostgreSQL (pgvector)

Building a Semantic Search Engine with OpenAI, Go, and PostgreSQL (pgvector)

Jan 15, 2025 am 11:09 AM

Building a Semantic Search Engine with OpenAI, Go, and PostgreSQL (pgvector)

In recent years, vector embeddings have become the foundation of modern natural language processing (NLP) and semantic search. Instead of relying on keyword searches, vector databases compare the "meaning" of text through numerical representations (embeddings). This example demonstrates how to create a semantic search engine using OpenAI embedding, Go, and PostgreSQL with the pgvector extension.

What is embedding?

Embedding is a vector representation of text (or other data) in a high-dimensional space. If two pieces of text are semantically similar, their vectors will be close to each other in this space. By storing embeddings in a database like PostgreSQL (with the pgvector extension), we can perform similarity searches quickly and accurately.

Why choose PostgreSQL and pgvector?

pgvector is a popular extension that adds vector data types to PostgreSQL. It enables you to:

  • Store embeddings as vector columns
  • Perform an approximate or exact nearest neighbor search
  • Run queries using standard SQL

App Overview

  1. Call OpenAI’s embedding API to convert input text into vector embeddings.
  2. Use the pgvector extension to store these embeddings in PostgreSQL.
  3. Query embeddings to find the most semantically similar entries in the database.

Prerequisites

  • Go installed (1.19 recommended).
  • PostgreSQL installed and running (local or hosted).
  • Install the pgvector extension in PostgreSQL. (See pgvector’s GitHub page for installation instructions.)
  • OpenAI API key with embedded access.

Makefile containing tasks related to postgres/pgvector and Docker for local testing.

pgvector:
    @docker run -d \
        --name pgvector \
        -e POSTGRES_USER=admin \
        -e POSTGRES_PASSWORD=admin \
        -e POSTGRES_DB=vectordb \
        -v pgvector_data:/var/lib/postgresql/data \
        -p 5432:5432 \
        pgvector/pgvector:pg17
psql:
    @psql -h localhost -U admin -d vectordb

Make sure pgvector is installed. Then, in your PostgreSQL database:

CREATE EXTENSION IF NOT EXISTS vector;

Full code

package main

import (
    "context"
    "fmt"
    "log"
    "os"
    "strings"

    "github.com/jackc/pgx/v5/pgxpool"
    "github.com/joho/godotenv"
    "github.com/sashabaranov/go-openai"
)

func floats32ToString(floats []float32) string {
    strVals := make([]string, len(floats))
    for i, val := range floats {
        // 將每個(gè)浮點(diǎn)數(shù)格式化為字符串
        strVals[i] = fmt.Sprintf("%f", val)
    }

    // 使用逗號(hào) + 空格連接它們
    joined := strings.Join(strVals, ", ")

    // pgvector 需要方括號(hào)表示法才能輸入向量,例如 [0.1, 0.2, 0.3]
    return "[" + joined + "]"
}

func main() {
    // 加載環(huán)境變量
    err := godotenv.Load()
    if err != nil {
        log.Fatal("加載 .env 文件出錯(cuò)")
    }

    // 創(chuàng)建連接池
    dbpool, err := pgxpool.New(context.Background(), os.Getenv("DATABASE_URL"))
    if err != nil {
        fmt.Fprintf(os.Stderr, "無(wú)法創(chuàng)建連接池:%v\n", err)
        os.Exit(1)
    }
    defer dbpool.Close()

    // 1. 確保已啟用 pgvector 擴(kuò)展
    _, err = dbpool.Exec(context.Background(), "CREATE EXTENSION IF NOT EXISTS vector;")
    if err != nil {
        log.Fatalf("創(chuàng)建擴(kuò)展失?。?v\n", err)
        os.Exit(1)
    }

    // 2. 創(chuàng)建表(如果不存在)
    createTableSQL := `
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT,
        embedding vector(1536)
    );
    `
    _, err = dbpool.Exec(context.Background(), createTableSQL)
    if err != nil {
        log.Fatalf("創(chuàng)建表失?。?v\n", err)
    }

    // 3. 創(chuàng)建索引(如果不存在)
    createIndexSQL := `
    CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
    `
    _, err = dbpool.Exec(context.Background(), createIndexSQL)
    if err != nil {
        log.Fatalf("創(chuàng)建索引失?。?v\n", err)
    }

    // 4. 初始化 OpenAI 客戶端
    apiKey := os.Getenv("OPENAI_API_KEY")
    if apiKey == "" {
        log.Fatal("未設(shè)置 OPENAI_API_KEY")
    }
    openaiClient := openai.NewClient(apiKey)

    // 5. 插入示例文檔
    docs := []string{
        "PostgreSQL 是一個(gè)先進(jìn)的開(kāi)源關(guān)系數(shù)據(jù)庫(kù)。",
        "OpenAI 提供基于 GPT 的模型來(lái)生成文本嵌入。",
        "pgvector 允許將嵌入存儲(chǔ)在 Postgres 數(shù)據(jù)庫(kù)中。",
    }

    for _, doc := range docs {
        err = insertDocument(context.Background(), dbpool, openaiClient, doc)
        if err != nil {
            log.Printf("插入文檔“%s”失?。?v\n", doc, err)
        }
    }

    // 6. 查詢相似性
    queryText := "如何在 Postgres 中存儲(chǔ)嵌入?"
    similarDocs, err := searchSimilarDocuments(context.Background(), dbpool, openaiClient, queryText, 5)
    if err != nil {
        log.Fatalf("搜索失?。?v\n", err)
    }

    fmt.Println("=== 最相似的文檔 ===")
    for _, doc := range similarDocs {
        fmt.Printf("- %s\n", doc)
    }
}

// insertDocument 使用 OpenAI API 為 `content` 生成嵌入,并將其插入 documents 表中。
func insertDocument(ctx context.Context, dbpool *pgxpool.Pool, client *openai.Client, content string) error {
    // 1) 從 OpenAI 獲取嵌入
    embedResp, err := client.CreateEmbeddings(ctx, openai.EmbeddingRequest{
        Model: openai.AdaEmbeddingV2, // "text-embedding-ada-002"
        Input: []string{content},
    })
    if err != nil {
        return fmt.Errorf("CreateEmbeddings API 調(diào)用失?。?w", err)
    }

    // 2) 將嵌入轉(zhuǎn)換為 pgvector 的方括號(hào)字符串
    embedding := embedResp.Data[0].Embedding // []float32
    embeddingStr := floats32ToString(embedding)

    // 3) 插入 PostgreSQL
    insertSQL := `
        INSERT INTO documents (content, embedding)
        VALUES (, ::vector)
    `
    _, err = dbpool.Exec(ctx, insertSQL, content, embeddingStr)
    if err != nil {
        return fmt.Errorf("插入文檔失?。?w", err)
    }

    return nil
}

// searchSimilarDocuments 獲取用戶查詢的嵌入,并根據(jù)向量相似性返回前 k 個(gè)相似的文檔。
func searchSimilarDocuments(ctx context.Context, pool *pgxpool.Pool, client *openai.Client, query string, k int) ([]string, error) {
    // 1) 通過(guò) OpenAI 獲取用戶查詢的嵌入
    embedResp, err := client.CreateEmbeddings(ctx, openai.EmbeddingRequest{
        Model: openai.AdaEmbeddingV2, // "text-embedding-ada-002"
        Input: []string{query},
    })
    if err != nil {
        return nil, fmt.Errorf("CreateEmbeddings API 調(diào)用失?。?w", err)
    }

    // 2) 將 OpenAI 嵌入轉(zhuǎn)換為 pgvector 的方括號(hào)字符串格式
    queryEmbedding := embedResp.Data[0].Embedding // []float32
    queryEmbeddingStr := floats32ToString(queryEmbedding)
    // 例如 "[0.123456, 0.789012, ...]"

    // 3) 構(gòu)建按向量相似性排序的 SELECT 語(yǔ)句
    selectSQL := fmt.Sprintf(`
        SELECT content
        FROM documents
        ORDER BY embedding <-> '%s'::vector
        LIMIT %d;
    `, queryEmbeddingStr, k)

    // 4) 運(yùn)行查詢
    rows, err := pool.Query(ctx, selectSQL)
    if err != nil {
        return nil, fmt.Errorf("查詢文檔失敗:%w", err)
    }
    defer rows.Close()

    // 5) 讀取匹配的文檔
    var contents []string
    for rows.Next() {
        var content string
        if err := rows.Scan(&content); err != nil {
            return nil, fmt.Errorf("掃描行失?。?w", err)
        }
        contents = append(contents, content)
    }
    if err = rows.Err(); err != nil {
        return nil, fmt.Errorf("行迭代錯(cuò)誤:%w", err)
    }

    return contents, nil
}

Conclusion

OpenAI embeddings in PostgreSQL, Go and pgvector provide a straightforward solution for building semantic search applications. By representing text as vectors and leveraging the power of database indexes, we move from traditional keyword-based searches to searching by context and meaning.

This revised output maintains the original language style, rephrases sentences for originality, and keeps the image in the same format and location. The code is also slightly improved for clarity and readability. The key changes include more descriptive variable names and comments.

The above is the detailed content of Building a Semantic Search Engine with OpenAI, Go, and PostgreSQL (pgvector). For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undress AI Tool

Undress AI Tool

Undress images for free

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

What are the implications of Go's static linking by default? What are the implications of Go's static linking by default? Jun 19, 2025 am 01:08 AM

Go compiles the program into a standalone binary by default, the main reason is static linking. 1. Simpler deployment: no additional installation of dependency libraries, can be run directly across Linux distributions; 2. Larger binary size: Including all dependencies causes file size to increase, but can be optimized through building flags or compression tools; 3. Higher predictability and security: avoid risks brought about by changes in external library versions and enhance stability; 4. Limited operation flexibility: cannot hot update of shared libraries, and recompile and deployment are required to fix dependency vulnerabilities. These features make Go suitable for CLI tools, microservices and other scenarios, but trade-offs are needed in environments where storage is restricted or relies on centralized management.

How does Go ensure memory safety without manual memory management like in C? How does Go ensure memory safety without manual memory management like in C? Jun 19, 2025 am 01:11 AM

Goensuresmemorysafetywithoutmanualmanagementthroughautomaticgarbagecollection,nopointerarithmetic,safeconcurrency,andruntimechecks.First,Go’sgarbagecollectorautomaticallyreclaimsunusedmemory,preventingleaksanddanglingpointers.Second,itdisallowspointe

How do I create a buffered channel in Go? (e.g., make(chan int, 10)) How do I create a buffered channel in Go? (e.g., make(chan int, 10)) Jun 20, 2025 am 01:07 AM

To create a buffer channel in Go, just specify the capacity parameters in the make function. The buffer channel allows the sending operation to temporarily store data when there is no receiver, as long as the specified capacity is not exceeded. For example, ch:=make(chanint,10) creates a buffer channel that can store up to 10 integer values; unlike unbuffered channels, data will not be blocked immediately when sending, but the data will be temporarily stored in the buffer until it is taken away by the receiver; when using it, please note: 1. The capacity setting should be reasonable to avoid memory waste or frequent blocking; 2. The buffer needs to prevent memory problems from being accumulated indefinitely in the buffer; 3. The signal can be passed by the chanstruct{} type to save resources; common scenarios include controlling the number of concurrency, producer-consumer models and differentiation

How can you use Go for system programming tasks? How can you use Go for system programming tasks? Jun 19, 2025 am 01:10 AM

Go is ideal for system programming because it combines the performance of compiled languages ??such as C with the ease of use and security of modern languages. 1. In terms of file and directory operations, Go's os package supports creation, deletion, renaming and checking whether files and directories exist. Use os.ReadFile to read the entire file in one line of code, which is suitable for writing backup scripts or log processing tools; 2. In terms of process management, the exec.Command function of the os/exec package can execute external commands, capture output, set environment variables, redirect input and output flows, and control process life cycles, which are suitable for automation tools and deployment scripts; 3. In terms of network and concurrency, the net package supports TCP/UDP programming, DNS query and original sets.

How do I call a method on a struct instance in Go? How do I call a method on a struct instance in Go? Jun 24, 2025 pm 03:17 PM

In Go language, calling a structure method requires first defining the structure and the method that binds the receiver, and accessing it using a point number. After defining the structure Rectangle, the method can be declared through the value receiver or the pointer receiver; 1. Use the value receiver such as func(rRectangle)Area()int and directly call it through rect.Area(); 2. If you need to modify the structure, use the pointer receiver such as func(r*Rectangle)SetWidth(...), and Go will automatically handle the conversion of pointers and values; 3. When embedding the structure, the method of embedded structure will be improved, and it can be called directly through the outer structure; 4. Go does not need to force use getter/setter,

What are interfaces in Go, and how do I define them? What are interfaces in Go, and how do I define them? Jun 22, 2025 pm 03:41 PM

In Go, an interface is a type that defines behavior without specifying implementation. An interface consists of method signatures, and any type that implements these methods automatically satisfy the interface. For example, if you define a Speaker interface that contains the Speak() method, all types that implement the method can be considered Speaker. Interfaces are suitable for writing common functions, abstract implementation details, and using mock objects in testing. Defining an interface uses the interface keyword and lists method signatures, without explicitly declaring the type to implement the interface. Common use cases include logs, formatting, abstractions of different databases or services, and notification systems. For example, both Dog and Robot types can implement Speak methods and pass them to the same Anno

How do I use string functions from the strings package in Go? (e.g., len(), strings.Contains(), strings.Index(), strings.ReplaceAll()) How do I use string functions from the strings package in Go? (e.g., len(), strings.Contains(), strings.Index(), strings.ReplaceAll()) Jun 20, 2025 am 01:06 AM

In Go language, string operations are mainly implemented through strings package and built-in functions. 1.strings.Contains() is used to determine whether a string contains a substring and returns a Boolean value; 2.strings.Index() can find the location where the substring appears for the first time, and if it does not exist, it returns -1; 3.strings.ReplaceAll() can replace all matching substrings, and can also control the number of replacements through strings.Replace(); 4.len() function is used to obtain the length of the bytes of the string, but when processing Unicode, you need to pay attention to the difference between characters and bytes. These functions are often used in scenarios such as data filtering, text parsing, and string processing.

Strategies for Integrating Golang Services with Existing Python Infrastructure Strategies for Integrating Golang Services with Existing Python Infrastructure Jul 02, 2025 pm 04:39 PM

TointegrateGolangserviceswithexistingPythoninfrastructure,useRESTAPIsorgRPCforinter-servicecommunication,allowingGoandPythonappstointeractseamlesslythroughstandardizedprotocols.1.UseRESTAPIs(viaframeworkslikeGininGoandFlaskinPython)orgRPC(withProtoco

See all articles