Most e-commerce “Related Products” sections rely on simple collaborative filtering, category matching, overlapping tags, or strict rule-based logic. This strategy falls short because product relationships are frequently semantic rather than categorical.
A canvas messenger bag and a leather laptop bag fulfil the same user need despite carrying different tags and categories. A rule-based system will never connect them; vector search will, because it understands meaning instead of just matching labels.
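The intuition can be shown with plain arithmetic. Here is a toy sketch in Python, using made-up 3-dimensional vectors (real embedding models emit hundreds or thousands of dimensions), of the cosine-distance comparison that vector search performs:

```python
import math

def cosine_distance(a, b):
    """1 - cos(angle between a and b); 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models emit e.g. 1536 dimensions).
messenger_bag = [0.9, 0.8, 0.1]
laptop_bag = [0.85, 0.75, 0.2]
coffee_mug = [0.1, 0.2, 0.9]

# The two bags sit far closer together in vector space than either
# sits to the mug, even though no tags or categories are compared.
print(cosine_distance(messenger_bag, laptop_bag))   # small (≈ 0.004)
print(cosine_distance(messenger_bag, coffee_mug))   # large (≈ 0.70)
```

The vectors here are invented for illustration; the point is only that "related" falls out of geometry, not labels.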
Using PostgreSQL, pgvector, and OpenAI embeddings, this tutorial demonstrates how to create a semantic “Related Products” feature in Laravel 12 with HNSW indexing to maintain speedy similarity queries at scale.
Why should you use PostgreSQL with pgvector?
You don’t need a separate vector database anymore because pgvector lets you run vector similarity searches right in your existing PostgreSQL database. Your relational data and embeddings stay in one place, which makes your architecture simple and queries quick.
With HNSW indexing built into pgvector, approximate nearest-neighbor searches stay quick even when your product catalogue grows to hundreds of thousands of rows. For most teams, this is the simplest architecture that still performs well at scale.
Requirements
- PHP 8.2+ and Laravel 12: Check that your environment meets the minimum version requirements.
- PostgreSQL 15+: Recommended. Note that HNSW support (the USING hnsw syntax) is provided by the pgvector extension itself, version 0.5.0 or newer, not by the PostgreSQL version alone.
- pgvector Extension: Install it on your PostgreSQL instance and enable it with CREATE EXTENSION IF NOT EXISTS vector.
- OpenAI API Key: This is used with the Embeddings API to turn product text into vector embeddings.
- Composer Packages: pgvector/pgvector for Eloquent vector support and openai-php/laravel for making embeddings.
Setting Up the Environment
Install Dependencies
Install the required Composer packages:
bash
# Vector support for Laravel
composer require pgvector/pgvector
# OpenAI PHP client for generating embeddings
composer require openai-php/laravel
Configure Environment Variables
Add the following to your .env file:
env
DB_CONNECTION=pgsql
DB_HOST=127.0.0.1
DB_PORT=5432
DB_DATABASE=your_database
DB_USERNAME=your_username
DB_PASSWORD=your_password
OPENAI_API_KEY=sk-your-openai-api-key
Enable the pgvector Extension
Create a dedicated migration to enable the extension. This must run before any migration that adds a vector column:
bash
php artisan make:migration enable_pgvector_extension
php
<?php
use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;
return new class extends Migration {
public function up(): void
{
DB::statement('CREATE EXTENSION IF NOT EXISTS vector');
}
public function down(): void
{
DB::statement('DROP EXTENSION IF EXISTS vector');
}
};
⚠ Verify the extension is available before running migrations: SELECT * FROM pg_available_extensions WHERE name = 'vector';
Step 1: Design the Database Schema
Add a vector column to the products table to store embeddings. The pgvector/pgvector package extends Laravel’s Blueprint with a vector() macro, so you can define it directly in a schema migration.
⚠ The dimension value (1536) must exactly match the output dimensions of your embedding model. text-embedding-ada-002 produces 1536 dimensions. Mismatched dimensions will cause storage and query failures.
php
<?php
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;
return new class extends Migration {
public function up(): void
{
Schema::table('products', function (Blueprint $table) {
// 1536 dimensions matches OpenAI text-embedding-ada-002
$table->vector('embedding', 1536)->nullable();
});
}
public function down(): void
{
Schema::table('products', function (Blueprint $table) {
$table->dropColumn('embedding');
});
}
};
Step 2: Add HNSW Indexing for Performance
Without an index, similarity search requires a full table scan — O(n) — which does not scale past a few thousand rows. Create an approximate nearest-neighbor (ANN) index using HNSW:
sql
CREATE INDEX products_embedding_idx
ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Index Parameters Explained
- m — Maximum connections per layer in the HNSW graph. Higher values improve recall but increase memory consumption. Default: 16. Recommended range: 8–32.
- ef_construction — Quality of the index build. Higher values produce a more accurate graph but take longer to construct. Default: 64. Recommended range: 64–200.
Tuning Recall at Query Time
You can change the recall/speed trade-off without rebuilding the index by setting hnsw.ef_search at query time:
sql
-- Higher value = better recall, slightly slower queries
SET hnsw.ef_search = 100;
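If your app shares pooled database connections, a plain SET persists for the whole session. SET LOCAL scopes the change to a single transaction instead. A sketch (the vector literal is a placeholder):

```sql
BEGIN;
-- Only this transaction sees the higher ef_search value;
-- other queries on the pooled connection keep the default.
SET LOCAL hnsw.ef_search = 100;
SELECT id FROM products
ORDER BY embedding <=> '[0.01, 0.02, 0.03]'  -- full 1536-dimension literal in practice
LIMIT 8;
COMMIT;
```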
After creating the index, update the planner statistics so PostgreSQL can generate optimal query plans:
sql
ANALYZE products;
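To confirm the planner actually uses the new index, run EXPLAIN on a representative query. This is a sketch: replace the truncated vector literal with a full 1536-dimension value.

```sql
EXPLAIN ANALYZE
SELECT id
FROM products
ORDER BY embedding <=> '[0.01, 0.02, 0.03]'  -- use a full 1536-dimension literal
LIMIT 8;
-- A healthy plan reports "Index Scan using products_embedding_idx";
-- a "Seq Scan" means the index is not being used for this query.
```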
Step 3: Create the EmbeddingService
Before generating any embeddings, create a dedicated EmbeddingService class. This encapsulates the OpenAI API call in one place and is the class referenced throughout the rest of this guide.
php
<?php
namespace App\Services;
use OpenAI\Laravel\Facades\OpenAI;
class EmbeddingService
{
/**
* Generate an embedding vector for the given text.
* Returns a float array (e.g., 1536 floats for ada-002).
*
* @return float[]
*/
public function generate(string $text): array
{
$response = OpenAI::embeddings()->create([
'model' => 'text-embedding-ada-002',
'input' => $text,
]);
return $response->embeddings[0]->embedding;
}
}
The method returns a plain float[] array. The Vector cast on the Product model (configured in Step 4) accepts a raw float array when assigning to $product->embedding, so no additional conversion is required.
Step 4: Generate and Queue Product Embeddings
Vector similarity only works when embeddings are generated consistently. Always combine the same product fields in the same order across the entire catalog.
Build the Input Text
php
// Combine the fields that best describe what the product IS and DOES.
// Be consistent — every product in the catalog must use the same fields.
$content = $product->name . ' ' . $product->description . ' ' . $product->category;
You may also include brand, features, or specifications. Consistency across all products is what matters most.
Always Queue Embedding Generation
Calling the OpenAI API is a network request. Generating embeddings inline during an HTTP request adds latency and ties up your server. Always dispatch embedding generation to a background queue job.
php
<?php
namespace App\Jobs;
use App\Models\Product;
use App\Services\EmbeddingService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
class GenerateProductEmbedding implements ShouldQueue
{
use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
// Retry up to 3 times on failure, waiting 10 seconds between attempts.
// This handles transient OpenAI API errors without manual intervention.
public int $tries = 3;
public int $backoff = 10;
public function __construct(
private readonly Product $product
) {}
public function handle(EmbeddingService $service): void
{
$content = strip_tags(
$this->product->name . ' ' .
$this->product->description . ' ' .
$this->product->category
);
// Trim whitespace to prevent inconsistent embeddings.
$content = trim($content);
// Never send an empty string to the API — it wastes credits
// and produces a semantically meaningless vector.
if (blank($content)) {
return;
}
// generate() returns float[] — the Vector cast on the model
// handles serialization to pgvector's binary format automatically.
$this->product->embedding = $service->generate($content);
$this->product->save();
}
}
Dispatch the job whenever a product is created:
php
dispatch(new GenerateProductEmbedding($product));
Step 5: Configure the Eloquent Model
Set up the Product model to handle vector columns correctly using pgvector/pgvector’s traits and casts. Without this, Laravel will mishandle the binary vector data.
php
<?php
namespace App\Models;
use Illuminate\Database\Eloquent\Model;
use Pgvector\Laravel\HasNeighbors;
use Pgvector\Laravel\Vector;
use InvalidArgumentException;
class Product extends Model
{
use HasNeighbors;
protected $casts = [
'embedding' => Vector::class,
];
/**
* Order results by vector similarity to the given embedding.
*
* Supported operators:
* <=> Cosine distance (best for text embeddings)
* <-> Euclidean distance (best for image/numeric vectors)
* <#> Negative inner product (for unit-normalized vectors)
*
* IMPORTANT: Cast $vector to string for raw PDO bindings.
* Passing the Vector object directly to orderByRaw may fail
* silently depending on the PDO driver configuration.
*/
public function scopeNearestTo($query, Vector $vector, string $operator = '<=>')
{
$allowedOperators = ['<=>', '<->', '<#>'];
if (!in_array($operator, $allowedOperators)) {
throw new InvalidArgumentException(
"Invalid similarity operator '{$operator}'. " .
"Allowed: " . implode(', ', $allowedOperators)
);
}
// Cast to string to produce the literal format pgvector expects:
// e.g. [0.123,0.456,...,0.789]
return $query->orderByRaw(
"embedding {$operator} ?",
[(string) $vector]
);
}
}
Step 6: Query for Related Products
With the model in place, the similarity query is clean and readable:
php
$related = Product::query()
->whereNot('id', $product->id)
->whereNotNull('embedding')
->nearestTo($product->embedding, '<=>')
->limit(8)
->get();
Choosing the Right Similarity Operator
- <=> — Cosine distance. Measures the angle between vectors, not their magnitude. Best choice for text embeddings.
- <-> — Euclidean (L2) distance. Measures absolute spatial distance. Best for dense numeric or image feature vectors.
- <#> — Negative inner product. Maximizes dot-product similarity. Best for unit-normalized vectors.
For text embeddings from OpenAI, always use cosine distance (<=>). OpenAI normalizes its vectors to unit length, making angular similarity the most meaningful measure.
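Because the vectors are unit-length, cosine distance and negative inner product order neighbors identically. A quick Python sketch with made-up vectors illustrates this:

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_distance(a, b):
    # For unit vectors this reduces to 1 - dot(a, b).
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot

def neg_inner_product(a, b):
    return -sum(x * y for x, y in zip(a, b))

query = normalize([0.2, 0.9, 0.4])
catalog = [normalize(v) for v in ([0.1, 0.8, 0.5], [0.9, 0.1, 0.1], [0.3, 0.7, 0.2])]

# Both metrics are monotone in the dot product, so they rank
# unit-length neighbors in exactly the same order.
rank_cos = sorted(range(len(catalog)), key=lambda i: cosine_distance(query, catalog[i]))
rank_ip = sorted(range(len(catalog)), key=lambda i: neg_inner_product(query, catalog[i]))
print(rank_cos == rank_ip)  # True
```

In practice, pick one operator and use it consistently in both queries and the index's operator class (vector_cosine_ops pairs with <=>).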
⚠ Never place orderByRaw calls directly in controllers. Route all similarity queries through the nearestTo() scope defined on the model, and wrap the full similarity logic in a ProductSimilarityService class to keep business logic out of the HTTP layer.
Step 7: Hybrid Filtering (Recommended for Production)
Pure vector similarity is semantically aware but business-unaware. A $2,000 laptop might be the closest vector match to a $20 phone case. Combine vector search with SQL filters for results that are both semantically relevant and commercially meaningful.
php
Product::query()
->where('category_id', $product->category_id)
->whereBetween('price', [$product->price * 0.5, $product->price * 2])
->where('in_stock', true)
->whereNotNull('embedding')
->nearestTo($product->embedding, '<=>')
->limit(8)
->get();
Think of It This Way
- Vectors answer: “What is semantically similar?”
- SQL filters answer: “What is actually relevant and purchasable for this customer?”
Filters Worth Layering
- category_id: Keeps recommendations contextually relevant.
- price range (whereBetween): Prevents recommending products far outside the customer’s budget.
- in_stock: Ensures only purchasable products are shown.
- brand: Useful for brand-affinity recommendations.
Architectural & Edge Case Considerations
1. Cold Start Problem
A null embedding means a new product cannot appear in any similarity query. The queued job architecture from Step 4 solves this automatically: GenerateProductEmbedding is dispatched immediately on product creation, the HTTP response returns without blocking, and the embedding is populated in the background.
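Products created before this feature shipped start with null embeddings too. A one-off console command can backfill them in chunks. This is a sketch, and the command class name BackfillProductEmbeddings is hypothetical:

```php
<?php

namespace App\Console\Commands;

use App\Jobs\GenerateProductEmbedding;
use App\Models\Product;
use Illuminate\Console\Command;

class BackfillProductEmbeddings extends Command
{
    protected $signature = 'products:backfill-embeddings';
    protected $description = 'Queue embedding generation for products without one';

    public function handle(): int
    {
        // Chunk to keep memory flat on large catalogues; each product
        // gets its own queued job, so OpenAI calls run in the background.
        Product::whereNull('embedding')
            ->chunkById(100, function ($products) {
                foreach ($products as $product) {
                    dispatch(new GenerateProductEmbedding($product));
                }
            });

        return self::SUCCESS;
    }
}
```

Run it once with php artisan products:backfill-embeddings after deploying the queue worker.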
2. Re-embedding on Content Changes
When a product’s name or description changes, its stored embedding becomes stale. Use a Laravel Observer to handle re-embedding automatically and keep this logic off the model:
bash
php artisan make:observer ProductObserver --model=Product
php
<?php
namespace App\Observers;
use App\Jobs\GenerateProductEmbedding;
use App\Models\Product;
class ProductObserver
{
public function updated(Product $product): void
{
// Only re-embed when semantically relevant fields change.
// Price or stock changes do NOT trigger a new embedding.
if ($product->wasChanged(['name', 'description'])) {
dispatch(new GenerateProductEmbedding($product));
}
}
}
Register the observer in the boot() method of AppServiceProvider:
php
<?php
namespace App\Providers;
use App\Models\Product;
use App\Observers\ProductObserver;
use Illuminate\Support\ServiceProvider;
class AppServiceProvider extends ServiceProvider
{
public function boot(): void
{
Product::observe(ProductObserver::class);
}
}
3. Approximate vs. Exact Search
pgvector behaves differently depending on whether an HNSW index exists on the column:
| Search Mode | Technique | Query Speed | Recall Accuracy | Scalable |
|-------------|-----------|-------------|-----------------|----------|
| Approximate | HNSW Index | Under 10 ms | 95–99% | Yes |
| Exact | Sequential Scan | O(n) – Slower | 100% | No |
The HNSW index is not optional for production catalogs with more than a few thousand products. Real-world recall exceeds 95% with the index settings in this guide. Tune at query time with the following:
sql
-- Increase ef_search for higher recall (at a small speed cost)
SET hnsw.ef_search = 100;
4. Handling No Results (with Fallback)
Vector search doesn’t always return useful results. Short or poorly written product descriptions may yield neighbors that are semantically too distant to be meaningful. Always check the distance score and define a fallback:
php
<?php
namespace App\Services;
use App\Models\Product;
use Illuminate\Support\Collection;
use Pgvector\Laravel\Vector;
class ProductSimilarityService
{
public function related(Product $product, int $limit = 8): Collection
{
// Select distance as an alias so we can filter and sort by it.
// We cast to string for the raw PDO binding.
$embeddingStr = (string) $product->embedding;
$results = Product::query()
->whereNot('id', $product->id)
->whereNotNull('embedding')
->selectRaw('*, (embedding <=> ?) AS distance', [$embeddingStr])
// Discard neighbors with cosine distance > 0.5.
// Distance of 0.5 means the vectors are ~60° apart — too dissimilar.
// For dense, well-written catalogs you can tighten this to 0.3.
->whereRaw('(embedding <=> ?) <= 0.5', [$embeddingStr])
->orderByRaw('distance') // PostgreSQL allows ordering by SELECT aliases
->limit($limit)
->get();
if ($results->isEmpty()) {
return $this->popularFallback($limit);
}
return $results;
}
private function popularFallback(int $limit): Collection
{
return Product::query()
->whereNotNull('embedding')
->orderByDesc('sales_count')
->limit($limit)
->get();
}
}
When Vector Search Shouldn’t Be Used
Vector similarity is powerful, but the operational overhead of managing embeddings, HNSW indexes, and embedding lifecycle jobs isn't always worth it.
Catalogues with Fewer Than 500 Items
PostgreSQL can scan a table of a few hundred rows almost instantly. At this scale, simple category- or tag-based queries are faster to build, easier to maintain, and work perfectly well.
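For a small catalogue, a plain Eloquent query covers the use case. A sketch, assuming the category_id, in_stock, and sales_count columns used elsewhere in this guide:

```php
// Simple, deterministic "related products" for a small catalogue:
// same category, in stock, most popular first. No embeddings required.
$related = Product::query()
    ->where('category_id', $product->category_id)
    ->whereNot('id', $product->id)
    ->where('in_stock', true)
    ->orderByDesc('sales_count')
    ->limit(8)
    ->get();
```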
Rule-Based Systems
If your recommendations must always follow strict business rules (for example, always the same brand, or always the same subcategory), a deterministic rule-based system is more predictable, easier to audit, and easier to debug.
Compliance and Auditable Outputs
HNSW searches are approximate and probabilistic: results can vary with index configuration. If your system must produce fully reproducible, explainable outputs for legal or compliance reasons, a rule-based approach is the better choice.
Next Steps
Your vector infrastructure is now complete. The same embedding column and pgvector setup can power additional intelligent features with no further database changes.
1. Semantic Search
Let users search by meaning, not just keywords. Embed the search query and run a cosine similarity search against the products table:
php
<?php
use App\Services\EmbeddingService;
use App\Models\Product;
use Pgvector\Laravel\Vector;
// Convert the raw search string into a vector
$queryEmbedding = new Vector(
app(EmbeddingService::class)->generate($request->input('q'))
);
$results = Product::query()
->whereNotNull('embedding')
->nearestTo($queryEmbedding, '<=>')
->limit(10)
->get();
A search for “something to carry my laptop to meetings” will now surface relevant products even if none of them contain those exact words.
2. Retrieval-Augmented Generation (RAG)
Your vector table can serve as the knowledge base for an AI product assistant. Retrieve the most semantically relevant products first, then pass them as context to a language model:
php
<?php
use App\Services\EmbeddingService;
use App\Models\Product;
use OpenAI\Laravel\Facades\OpenAI;
use Pgvector\Laravel\Vector;
// Embed the user's question
$queryEmbedding = new Vector(
app(EmbeddingService::class)->generate($request->input('q'))
);
// Retrieve the 5 most semantically relevant products
$relevantProducts = Product::query()
->whereNotNull('embedding')
->nearestTo($queryEmbedding, '<=>')
->limit(5)
->get();
// Build a context string from the retrieved products
$context = $relevantProducts->pluck('description')->implode("\n");
// Send the context + question to the language model
$answer = OpenAI::chat()->create([
'model' => 'gpt-4',
'messages' => [
[
'role' => 'system',
'content' => 'You are a helpful product assistant.',
],
[
'role' => 'user',
'content' => "Using this product context:\n{$context}\n\nQuestion: {$request->input('q')}",
],
],
]);
return $answer->choices[0]->message->content;
Both features reuse everything built in this guide: embeddings, indexes, and similarity operators. The vector infrastructure you built here is not just a recommendation engine; it is the foundation for a fully intelligent product experience.