Most e-commerce “Related Products” sections rely on simple collaborative filtering, category matching, overlapping tags, or strict rule-based logic. This strategy falls short because product relationships are frequently semantic rather than categorical.
A canvas messenger bag and a leather laptop bag fulfil the same user need despite carrying different tags and categories. A rule-based system will never connect them; vector search will, because it understands meaning instead of just matching labels.
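The intuition can be shown with plain arithmetic. Here is a toy sketch in Python, using made-up 3-dimensional vectors (real embedding models emit hundreds or thousands of dimensions), of the cosine-distance comparison that vector search performs:

```python
import math

def cosine_distance(a, b):
    """1 - cos(angle between a and b); 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models emit e.g. 1536 dimensions).
messenger_bag = [0.9, 0.8, 0.1]
laptop_bag = [0.85, 0.75, 0.2]
coffee_mug = [0.1, 0.2, 0.9]

# The two bags sit far closer together in vector space than either
# sits to the mug, even though no tags or categories are compared.
print(cosine_distance(messenger_bag, laptop_bag))   # small (≈ 0.004)
print(cosine_distance(messenger_bag, coffee_mug))   # large (≈ 0.70)
```

The vectors here are invented for illustration; the point is only that "related" falls out of geometry, not labels.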
Using PostgreSQL, pgvector, and OpenAI embeddings, this tutorial demonstrates how to create a semantic “Related Products” feature in Laravel 12 with HNSW indexing to maintain speedy similarity queries at scale.
Why should you use PostgreSQL with pgvector?
You don’t need a separate vector database anymore because pgvector lets you run vector similarity searches right in your existing PostgreSQL database. Your relational data and embeddings stay in one place, which makes your architecture simple and queries quick.
With HNSW indexing built into pgvector, approximate nearest-neighbor searches stay quick even when your product catalogue grows to hundreds of thousands of rows. For most teams, this is the simplest architecture that still performs well at scale.
Requirements
- PHP 8.2+ and Laravel 12: Check that your environment meets the minimum version requirements.
- PostgreSQL 15+: Recommended. Note that HNSW support (the USING hnsw syntax) is provided by the pgvector extension itself, version 0.5.0 or newer, not by the PostgreSQL version alone.
- pgvector Extension: Install it on your PostgreSQL instance and enable it with CREATE EXTENSION IF NOT EXISTS vector.
- OpenAI API Key: This is used with the Embeddings API to turn product text into vector embeddings.
- Composer Packages: pgvector/pgvector for Eloquent vector support and openai-php/laravel for making embeddings.
Setting Up the Environment
Install Dependencies
Install the required Composer packages:
bash
# Vector support for Laravel
composer require pgvector/pgvector
# OpenAI PHP client for generating embeddings
composer require openai-php/laravel
Configure Environment Variables
Add the following to your .env file:
env
DB_CONNECTION=pgsql
DB_HOST=127.0.0.1
DB_PORT=5432
DB_DATABASE=your_database
DB_USERNAME=your_username
DB_PASSWORD=your_password
OPENAI_API_KEY=sk-your-openai-api-key
Enable the pgvector Extension
Create a dedicated migration to enable the extension. This must run before any migration that adds a vector column:
bash
php artisan make:migration enable_pgvector_extension
php
<?php
use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;
return new class extends Migration {
public function up(): void
{
DB::statement('CREATE EXTENSION IF NOT EXISTS vector');
}
public function down(): void
{
DB::statement('DROP EXTENSION IF EXISTS vector');
}
};
⚠ Verify the extension is available before running migrations: SELECT * FROM pg_available_extensions WHERE name = 'vector';
Step 1: Design the Database Schema
Add a vector column to the products table to store embeddings. The pgvector/pgvector package extends Laravel’s Blueprint with a vector() macro, so you can define it directly in a schema migration.
⚠ The dimension value (1536) must exactly match the output dimensions of your embedding model. text-embedding-ada-002 produces 1536 dimensions. Mismatched dimensions will cause storage and query failures.
php
<?php
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;
return new class extends Migration {
public function up(): void
{
Schema::table('products', function (Blueprint $table) {
// 1536 dimensions matches OpenAI text-embedding-ada-002
$table->vector('embedding', 1536)->nullable();
});
}
public function down(): void
{
Schema::table('products', function (Blueprint $table) {
$table->dropColumn('embedding');
});
}
};
Step 2: Add HNSW Indexing for Performance
Without an index, similarity search requires a full table scan — O(n) — which does not scale past a few thousand rows. Create an approximate nearest-neighbor (ANN) index using HNSW:
sql
CREATE INDEX products_embedding_idx
ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Index Parameters Explained
- m — Maximum connections per layer in the HNSW graph. Higher values improve recall but increase memory consumption. Default: 16. Recommended range: 8–32.
- ef_construction — Quality of the index build. Higher values produce a more accurate graph but take longer to construct. Default: 64. Recommended range: 64–200.
Tuning Recall at Query Time
You can change the recall/speed trade-off without rebuilding the index by setting hnsw.ef_search at query time:
sql
-- Higher value = better recall, slightly slower queries
SET hnsw.ef_search = 100;
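If your app shares pooled database connections, a plain SET persists for the whole session. SET LOCAL scopes the change to a single transaction instead. A sketch (the vector literal is a placeholder):

```sql
BEGIN;
-- Only this transaction sees the higher ef_search value;
-- other queries on the pooled connection keep the default.
SET LOCAL hnsw.ef_search = 100;
SELECT id FROM products
ORDER BY embedding <=> '[0.01, 0.02, 0.03]'  -- full 1536-dimension literal in practice
LIMIT 8;
COMMIT;
```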
After creating the index, update the planner statistics so PostgreSQL can generate optimal query plans:
sql
ANALYZE products;
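To confirm the planner actually uses the new index, run EXPLAIN on a representative query. This is a sketch: replace the truncated vector literal with a full 1536-dimension value.

```sql
EXPLAIN ANALYZE
SELECT id
FROM products
ORDER BY embedding <=> '[0.01, 0.02, 0.03]'  -- use a full 1536-dimension literal
LIMIT 8;
-- A healthy plan reports "Index Scan using products_embedding_idx";
-- a "Seq Scan" means the index is not being used for this query.
```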
Step 3: Create the EmbeddingService
Before generating any embeddings, create a dedicated EmbeddingService class. This encapsulates the OpenAI API call in one place and is the class referenced throughout the rest of this guide.
php
<?php
namespace App\Services;
use OpenAI\Laravel\Facades\OpenAI;
class EmbeddingService
{
/**
* Generate an embedding vector for the given text.
* Returns a float array (e.g., 1536 floats for ada-002).
*
* @return float[]
*/
public function generate(string $text): array
{
$response = OpenAI::embeddings()->create([
'model' => 'text-embedding-ada-002',
'input' => $text,
]);
return $response->embeddings[0]->embedding;
}
}
The method returns a plain float[] array. The Vector cast on the Product model (configured in Step 4) accepts a raw float array when assigning to $product->embedding, so no additional conversion is required.
Step 4: Generate and Queue Product Embeddings
Vector similarity only works when embeddings are generated consistently. Always combine the same product fields in the same order across the entire catalog.
Build the Input Text
php
// Combine the fields that best describe what the product IS and DOES.
// Be consistent — every product in the catalog must use the same fields.
$content = $product->name . ' ' . $product->description . ' ' . $product->category;
You may also include brand, features, or specifications. Consistency across all products is what matters most.
Always Queue Embedding Generation
Calling the OpenAI API is a network request. Generating embeddings inline during an HTTP request adds latency and ties up your server. Always dispatch embedding generation to a background queue job.
php
<?php
namespace App\Jobs;
use App\Models\Product;
use App\Services\EmbeddingService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
class GenerateProductEmbedding implements ShouldQueue
{
use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
// Retry up to 3 times on failure, waiting 10 seconds between attempts.
// This handles transient OpenAI API errors without manual intervention.
public int $tries = 3;
public int $backoff = 10;
public function __construct(
private readonly Product $product
) {}
public function handle(EmbeddingService $service): void
{
$content = strip_tags(
$this->product->name . ' ' .
$this->product->description . ' ' .
$this->product->category
);
// Trim whitespace to prevent inconsistent embeddings.
$content = trim($content);
// Never send an empty string to the API — it wastes credits
// and produces a semantically meaningless vector.
if (blank($content)) {
return;
}
// generate() returns float[] — the Vector cast on the model
// handles serialization to pgvector's binary format automatically.
$this->product->embedding = $service->generate($content);
$this->product->save();
}
}
Dispatch the job whenever a product is created:
php
dispatch(new GenerateProductEmbedding($product));
Step 5: Configure the Eloquent Model
Set up the Product model to handle vector columns correctly using pgvector/pgvector’s traits and casts. Without this, Laravel will mishandle the binary vector data.
php
<?php
namespace App\Models;
use Illuminate\Database\Eloquent\Model;
use Pgvector\Laravel\HasNeighbors;
use Pgvector\Laravel\Vector;
use InvalidArgumentException;
class Product extends Model
{
use HasNeighbors;
protected $casts = [
'embedding' => Vector::class,
];
/**
* Order results by vector similarity to the given embedding.
*
* Supported operators:
* <=> Cosine distance (best for text embeddings)
* <-> Euclidean distance (best for image/numeric vectors)
* <#> Negative inner product (for unit-normalized vectors)
*
* IMPORTANT: Cast $vector to string for raw PDO bindings.
* Passing the Vector object directly to orderByRaw may fail
* silently depending on the PDO driver configuration.
*/
public function scopeNearestTo($query, Vector $vector, string $operator = '<=>')
{
$allowedOperators = ['<=>', '<->', '<#>'];
if (!in_array($operator, $allowedOperators)) {
throw new InvalidArgumentException(
"Invalid similarity operator '{$operator}'. " .
"Allowed: " . implode(', ', $allowedOperators)
);
}
// Cast to string to produce the literal format pgvector expects:
// e.g. [0.123,0.456,...,0.789]
return $query->orderByRaw(
"embedding {$operator} ?",
[(string) $vector]
);
}
}
Step 6: Query for Related Products
With the model in place, the similarity query is clean and readable:
php
$related = Product::query()
->whereNot('id', $product->id)
->whereNotNull('embedding')
->nearestTo($product->embedding, '<=>')
->limit(8)
->get();
Choosing the Right Similarity Operator
- <=> — Cosine distance. Measures the angle between vectors, not their magnitude. Best choice for text embeddings.
- <-> — Euclidean (L2) distance. Measures absolute spatial distance. Best for dense numeric or image feature vectors.
- <#> — Negative inner product. Maximizes dot-product similarity. Best for unit-normalized vectors.
For text embeddings from OpenAI, always use cosine distance (<=>). OpenAI normalizes its vectors to unit length, making angular similarity the most meaningful measure.
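Because the vectors are unit-length, cosine distance and negative inner product order neighbors identically. A quick Python sketch with made-up vectors illustrates this:

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_distance(a, b):
    # For unit vectors this reduces to 1 - dot(a, b).
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot

def neg_inner_product(a, b):
    return -sum(x * y for x, y in zip(a, b))

query = normalize([0.2, 0.9, 0.4])
catalog = [normalize(v) for v in ([0.1, 0.8, 0.5], [0.9, 0.1, 0.1], [0.3, 0.7, 0.2])]

# Both metrics are monotone in the dot product, so they rank
# unit-length neighbors in exactly the same order.
rank_cos = sorted(range(len(catalog)), key=lambda i: cosine_distance(query, catalog[i]))
rank_ip = sorted(range(len(catalog)), key=lambda i: neg_inner_product(query, catalog[i]))
print(rank_cos == rank_ip)  # True
```

In practice, pick one operator and use it consistently in both queries and the index's operator class (vector_cosine_ops pairs with <=>).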
⚠ Never place orderByRaw calls directly in controllers. Route all similarity queries through the nearestTo() scope defined on the model, and wrap the full similarity logic in a ProductSimilarityService class to keep business logic out of the HTTP layer.
Step 7: Hybrid Filtering (Recommended for Production)
Pure vector similarity is semantically aware but business-unaware. A $2,000 laptop might be the closest vector match to a $20 phone case. Combine vector search with SQL filters for results that are both semantically relevant and commercially meaningful.
php
Product::query()
->where('category_id', $product->category_id)
->whereBetween('price', [$product->price * 0.5, $product->price * 2])
->where('in_stock', true)
->whereNotNull('embedding')
->nearestTo($product->embedding, '<=>')
->limit(8)
->get();
Think of It This Way
- Vectors answer: “What is semantically similar?”
- SQL filters answer: “What is actually relevant and purchasable for this customer?”
Filters Worth Layering
- category_id: Keeps recommendations contextually relevant.
- price range (whereBetween): Prevents recommending products far outside the customer’s budget.
- in_stock: Ensures only purchasable products are shown.
- brand: Useful for brand-affinity recommendations.
Architectural & Edge Case Considerations
1. Cold Start Problem
A null embedding means a new product cannot appear in any similarity query. The queued job architecture from Step 4 solves this automatically: GenerateProductEmbedding is dispatched immediately on product creation, the HTTP response returns without blocking, and the embedding is populated in the background.
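Products created before this feature shipped start with null embeddings too. A one-off console command can backfill them in chunks. This is a sketch, and the command class name BackfillProductEmbeddings is hypothetical:

```php
<?php

namespace App\Console\Commands;

use App\Jobs\GenerateProductEmbedding;
use App\Models\Product;
use Illuminate\Console\Command;

class BackfillProductEmbeddings extends Command
{
    protected $signature = 'products:backfill-embeddings';
    protected $description = 'Queue embedding generation for products without one';

    public function handle(): int
    {
        // Chunk to keep memory flat on large catalogues; each product
        // gets its own queued job, so OpenAI calls run in the background.
        Product::whereNull('embedding')
            ->chunkById(100, function ($products) {
                foreach ($products as $product) {
                    dispatch(new GenerateProductEmbedding($product));
                }
            });

        return self::SUCCESS;
    }
}
```

Run it once with php artisan products:backfill-embeddings after deploying the queue worker.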
2. Re-embedding on Content Changes
When a product’s name or description changes, its stored embedding becomes stale. Use a Laravel Observer to handle re-embedding automatically and keep this logic off the model:
bash
php artisan make:observer ProductObserver --model=Product
php
<?php
namespace App\Observers;
use App\Jobs\GenerateProductEmbedding;
use App\Models\Product;
class ProductObserver
{
public function updated(Product $product): void
{
// Only re-embed when semantically relevant fields change.
// Price or stock changes do NOT trigger a new embedding.
if ($product->wasChanged(['name', 'description'])) {
dispatch(new GenerateProductEmbedding($product));
}
}
}
Register the observer in the boot() method of AppServiceProvider:
php
<?php
namespace App\Providers;
use App\Models\Product;
use App\Observers\ProductObserver;
use Illuminate\Support\ServiceProvider;
class AppServiceProvider extends ServiceProvider
{
public function boot(): void
{
Product::observe(ProductObserver::class);
}
}
3. Approximate vs. Exact Search
pgvector behaves differently depending on whether an HNSW index exists on the column:
| Search Mode | Technique | Query Speed | Recall Accuracy | Scalable |
|-------------|-----------|-------------|-----------------|----------|
| Approximate | HNSW Index | Under 10 ms | 95–99% | Yes |
| Exact | Sequential Scan | O(n) – Slower | 100% | No |
The HNSW index is not optional for production catalogs with more than a few thousand products. Real-world recall exceeds 95% with the index settings in this guide. Tune at query time with the following:
sql
-- Increase ef_search for higher recall (at a small speed cost)
SET hnsw.ef_search = 100;
4. Handling No Results (with Fallback)
Vector search doesn’t always return useful results. Short or poorly written product descriptions may yield neighbors that are semantically too distant to be meaningful. Always check the distance score and define a fallback:
php
<?php
namespace App\Services;
use App\Models\Product;
use Illuminate\Support\Collection;
use Pgvector\Laravel\Vector;
class ProductSimilarityService
{
public function related(Product $product, int $limit = 8): Collection
{
// Select distance as an alias so we can filter and sort by it.
// We cast to string for the raw PDO binding.
$embeddingStr = (string) $product->embedding;
$results = Product::query()
->whereNot('id', $product->id)
->whereNotNull('embedding')
->selectRaw('*, (embedding <=> ?) AS distance', [$embeddingStr])
// Discard neighbors with cosine distance > 0.5.
// Distance of 0.5 means the vectors are ~60° apart — too dissimilar.
// For dense, well-written catalogs you can tighten this to 0.3.
->whereRaw('(embedding <=> ?) <= 0.5', [$embeddingStr])
->orderByRaw('distance') // PostgreSQL allows ordering by SELECT aliases
->limit($limit)
->get();
if ($results->isEmpty()) {
return $this->popularFallback($limit);
}
return $results;
}
private function popularFallback(int $limit): Collection
{
return Product::query()
->whereNotNull('embedding')
->orderByDesc('sales_count')
->limit($limit)
->get();
}
}
When Vector Search Shouldn’t Be Used
Vector similarity is powerful, but the operational overhead of managing embeddings, HNSW indexes, and embedding lifecycle jobs isn't always worth it.
Catalogues with Fewer Than 500 Items
PostgreSQL can scan a table of a few hundred rows almost instantly. At this scale, simple category- or tag-based queries are faster to build, easier to maintain, and work perfectly well.
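For a small catalogue, a plain Eloquent query covers the use case. A sketch, assuming the category_id, in_stock, and sales_count columns used elsewhere in this guide:

```php
// Simple, deterministic "related products" for a small catalogue:
// same category, in stock, most popular first. No embeddings required.
$related = Product::query()
    ->where('category_id', $product->category_id)
    ->whereNot('id', $product->id)
    ->where('in_stock', true)
    ->orderByDesc('sales_count')
    ->limit(8)
    ->get();
```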
Rule-Based Systems
If your recommendations must always follow strict business rules (for example, always the same brand, or always the same subcategory), a deterministic rule-based system is more predictable, easier to audit, and easier to debug.
Compliance and Auditable Outputs
HNSW searches are approximate and probabilistic: results can vary with index configuration. If your system must produce fully reproducible, explainable outputs for legal or compliance reasons, a rule-based approach is the better choice.
Next Steps
Your vector infrastructure is now complete. The same embedding column and pgvector setup can power additional intelligent features with no further database changes.
1. Semantic Search
Let users search by meaning, not just keywords. Embed the search query and run a cosine similarity search against the products table:
php
<?php
use App\Services\EmbeddingService;
use App\Models\Product;
use Pgvector\Laravel\Vector;
// Convert the raw search string into a vector
$queryEmbedding = new Vector(
app(EmbeddingService::class)->generate($request->input('q'))
);
$results = Product::query()
->whereNotNull('embedding')
->nearestTo($queryEmbedding, '<=>')
->limit(10)
->get();
A search for “something to carry my laptop to meetings” will now surface relevant products even if none of them contain those exact words.
2. Retrieval-Augmented Generation (RAG)
Your vector table can serve as the knowledge base for an AI product assistant. Retrieve the most semantically relevant products first, then pass them as context to a language model:
php
<?php
use App\Services\EmbeddingService;
use App\Models\Product;
use OpenAI\Laravel\Facades\OpenAI;
use Pgvector\Laravel\Vector;
// Embed the user's question
$queryEmbedding = new Vector(
app(EmbeddingService::class)->generate($request->input('q'))
);
// Retrieve the 5 most semantically relevant products
$relevantProducts = Product::query()
->whereNotNull('embedding')
->nearestTo($queryEmbedding, '<=>')
->limit(5)
->get();
// Build a context string from the retrieved products
$context = $relevantProducts->pluck('description')->implode("\n");
// Send the context + question to the language model
$answer = OpenAI::chat()->create([
'model' => 'gpt-4',
'messages' => [
[
'role' => 'system',
'content' => 'You are a helpful product assistant.',
],
[
'role' => 'user',
'content' => "Using this product context:\n{$context}\n\nQuestion: {$request->input('q')}",
],
],
]);
return $answer->choices[0]->message->content;
Both features reuse everything built in this guide: embeddings, indexes, and similarity operators. The vector infrastructure you built here is not just a recommendation engine; it is the foundation for a fully intelligent product experience.