A Practical AI Pipeline for Turning Media Into Searchable Catalogs
This post describes a production-ready pipeline for converting images, audio, video, and text into searchable, faceted catalogs using AI — without vector databases, embeddings, or opaque ranking systems.
The core idea is simple:
Use AI to project content into explicit, bounded attributes and keywords, then index those using normal relational + graph structures.
1. Ingest Any Media
The pipeline accepts:
- Images (photos, posters, ads, menus)
- Audio (podcasts, interviews, calls)
- Video (shows, talks, lectures)
- Text (documents, transcripts, summaries)
Each media item is stored once, unchanged.
2. Extract Structured Observations (One AI Call)
For each item, a single AI call extracts a structured set of observations.
Examples of observations:
- Title / subtitle
- Language and spelling quality
- Countries and cultural relevance
- Safety signals (obscene, controversial)
- Keywords (English + native language)
- Holiday or event references
- Date ranges and importance
- Shareability and confidence
Rules enforced:
- Strict JSON output
- Fixed fields only
- Hard length limits
- Use null if uncertain
- No guessing
This keeps output predictable and auditable. A simple policy ensures user submissions are vetted
for importance and relevance, and screened for obscenity or controversy before becoming
discoverable and searchable by others.
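To make this concrete, here is a sketch of the kind of flat, bounded payload a single call might return once decoded. The field names mirror the list above; the values are hypothetical:
// Illustrative only: one decoded observation payload for a single item.
// Field names follow the list above; values here are made up.
$observation = array(
    'title' => 'Happy Valentine\'s Day',
    'language' => 'en',
    'spelling' => 9,                    // 1-10 spelling quality
    'countries' => array('US', 'CA'),
    'obscene' => 1,                     // safety signals, low = clean
    'controversial' => 1,
    'keywords' => array('valentine', 'love', 'chocolate'),
    'keywordsNative' => array(),        // empty when the primary language is English
    'holidayName' => 'Valentine\'s Day',
    'startDate' => '2026-02-14',
    'endDate' => '2026-02-14',
    'shareability' => 8,
    'confidence' => 0.9                 // null if uncertain
);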
3. Normalize + Compact Attributes
After AI output:
- Numbers are bucketed (e.g. confidence → 0.95)
- Text fields are truncated
- Keywords are normalized and deduplicated
- Arrays are hard-limited in size
- Total attribute size is capped (e.g. 1 KB)
Result:
- A small, flat attribute map per item
- Safe to store directly on the stream
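A minimal sketch of this normalization pass is below. The function name, the 100-character truncation, and the 1 KB cap are illustrative policy choices, not the actual implementation used later in this post:
// Sketch only: compact a raw observation into a small, flat attribute map.
// Bucket sizes, field lengths, and the 1 KB cap are illustrative.
function compactAttributes(array $obs)
{
    $attr = array();
    // Bucket numbers (e.g. confidence rounded to two decimals)
    if (isset($obs['confidence'])) {
        $attr['confidence'] = round($obs['confidence'], 2);
    }
    // Truncate free-text fields to a fixed length
    foreach (array('title', 'subtitle') as $f) {
        if (!empty($obs[$f])) {
            $attr[$f] = mb_substr($obs[$f], 0, 100);
        }
    }
    // Normalize, deduplicate, and hard-limit keywords
    $keywords = array();
    foreach ((array)($obs['keywords'] ?? array()) as $k) {
        $k = preg_replace('/[^a-z0-9]/', '', mb_strtolower($k));
        if ($k !== '') {
            $keywords[$k] = true;
        }
    }
    $attr['keywords'] = array_slice(array_keys($keywords), 0, 10);
    // Cap the total serialized size (e.g. 1 KB)
    while (strlen(json_encode($attr)) > 1024 && !empty($attr['keywords'])) {
        array_pop($attr['keywords']);
    }
    return $attr;
}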
4. Index Using Relations (No Vector Search)
Each attribute becomes a relation, for example:
attribute/keyword=valentine
attribute/country=IL
attribute/holiday=Valentine’s Day
attribute/confidence=0.95
Key properties:
- Each category is its own index
- Prefix search works naturally
- Facets combine efficiently
- Storage stays relational
- No embeddings or vector DBs required
Search becomes:
“Find items with keyword romance, country IL, confidence ≥ 0.8”
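As a sketch of what that query looks like relationally, assuming a simple category/value relation table (the table name, columns, and connection details are illustrative, not the actual schema):
// Sketch only: one row per attribute relation, then plain SQL for faceted search.
$pdo = new PDO('mysql:host=localhost;dbname=catalog', 'user', 'pass');
$itemId = 'GroupsAPI/image/abc123'; // hypothetical item identifier

// Index: write each attribute as a (category, value, item_id) relation
$insert = $pdo->prepare(
    "INSERT INTO item_relations (category, value, item_id) VALUES (?, ?, ?)"
);
$relations = array(
    array('keyword', 'valentine'),
    array('country', 'IL'),
    array('holiday', "Valentine's Day"),
    array('confidence', '0.95')
);
foreach ($relations as $r) {
    $insert->execute(array($r[0], $r[1], $itemId));
}

// Search: facets combine with joins; prefix search maps to LIKE on an indexed column
$query = $pdo->prepare(
    "SELECT k.item_id
       FROM item_relations k
       JOIN item_relations c ON c.item_id = k.item_id AND c.category = 'country' AND c.value = ?
       JOIN item_relations f ON f.item_id = k.item_id AND f.category = 'confidence'
                            AND CAST(f.value AS DECIMAL(3,2)) >= ?
      WHERE k.category = 'keyword' AND k.value LIKE ?"
);
$query->execute(array('IL', 0.8, 'roman%'));
$itemIds = $query->fetchAll(PDO::FETCH_COLUMN);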
5. Audio & Video: Time-Based Semantic Indexing
For long media (audio/video):
- Transcribe with timestamps (speaker-aware if available)
- Split evenly across the timeline (not by meaning)
- For each chunk:
  - Generate summary
  - Generate search keywords
  - Preserve time range
- Store clips as their own streams
- Index clips and episodes separately
This enables:
- Jump-to-topic search
- Clip-level discovery
- Episode-level aggregation
- Explainable results (“this matched at 00:42:10”)
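A hedged sketch of the even, time-based split, assuming the transcript arrives as an array of timestamped segments and using a hypothetical summarizeChunk() to stand in for one AI call per chunk:
// Sketch only: split a transcript into fixed-length time windows and summarize each.
// summarizeChunk() is hypothetical; it returns a summary and keywords for one chunk.
function indexByTime(array $segments, $chunkSeconds = 300)
{
    $chunks = array();
    foreach ($segments as $seg) { // each $seg: array('start' => sec, 'end' => sec, 'text' => ...)
        $bucket = (int)floor($seg['start'] / $chunkSeconds);
        if (!isset($chunks[$bucket])) {
            $chunks[$bucket] = array(
                'start' => $bucket * $chunkSeconds,
                'end' => ($bucket + 1) * $chunkSeconds,
                'text' => ''
            );
        }
        $chunks[$bucket]['text'] .= ' ' . $seg['text'];
    }
    $clips = array();
    foreach ($chunks as $chunk) {
        $obs = summarizeChunk($chunk['text']); // hypothetical AI call
        $clips[] = array(
            'start' => $chunk['start'],
            'end' => $chunk['end'],
            'summary' => $obs['summary'],
            'keywords' => $obs['keywords']
        );
    }
    return $clips; // each clip becomes its own stream, indexed separately from the episode
}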
6. AI Keywords, Not User Keywords
Keywords are:
- Generated by AI
- Normalized and bounded
- Expanded (carefully) for search recall
This avoids:
- User-supplied tag spam
- Missing synonyms
- Language mismatch
The system can later re-index without re-processing media.
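A minimal sketch of how bounded keyword handling might look, with a small curated synonym map standing in for careful expansion (the function and map are hypothetical):
// Sketch only: merge AI-generated keywords with a curated synonym map,
// keeping everything lowercase, deduplicated, and bounded.
function expandKeywords(array $keywords, array $synonyms, $max = 20)
{
    $out = array();
    foreach ($keywords as $k) {
        $k = mb_strtolower(trim($k));
        $out[$k] = true;
        foreach ((array)($synonyms[$k] ?? array()) as $syn) {
            $out[mb_strtolower($syn)] = true; // expansion is explicit and auditable
        }
    }
    return array_slice(array_keys($out), 0, $max);
}

// Usage: re-indexing later only needs the stored keywords, not the original media.
$synonyms = array('valentine' => array('romance', 'love')); // curated, not guessed
$keywords = expandKeywords(array('Valentine', 'chocolate'), $synonyms);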
7. What This Enables
With the same pipeline you can build:
- Image catalogs (e-commerce, menus, ads)
- Dating profile discovery
- Restaurant menus with semantic search
- Podcast / interview archives
- Lecture and education libraries
- Community photo archives
- Media monitoring tools
- Historical video libraries
- Multilingual search experiences
- Faceted browsing without embeddings
8. Why This Works
- AI is used as a semantic compiler, not a chatbot
- Output is bounded, structured, and auditable
- Indexing is explicit and explainable
- Search is fast, cheap, and debuggable
- No vendor lock-in to vector databases
- Works today with standard SQL + relations
Summary
This pipeline turns unstructured media into search-native data by:
- Extracting bounded semantics with AI
- Normalizing into small attributes
- Indexing via relations
- Enabling keyword + facet search at scale
It’s not a recommendation engine.
It’s a catalog intelligence system.
And it works for images, audio, video, and text using the same architecture.
Videos
For real-time videoconferencing, we can use the speech recognition built into browsers and get speaker diarization without having to guess voices, because we already know who is speaking in the teleconference or live show being recorded.
Otherwise, we hook into external APIs from providers like AssemblyAI to run transcription jobs with speaker diarization.
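For the external-API path, here is a hedged sketch of submitting a transcription job with speaker labels. It assumes AssemblyAI's v2 transcript endpoint and an already-accessible audio URL; verify the details against their current documentation:
// Sketch only: request a transcript with speaker diarization from AssemblyAI.
// The audio URL is hypothetical; the API key comes from the environment.
$payload = json_encode(array(
    'audio_url' => 'https://example.com/episodes/42.mp3',
    'speaker_labels' => true // enables speaker diarization
));
$ch = curl_init('https://api.assemblyai.com/v2/transcript');
curl_setopt_array($ch, array(
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $payload,
    CURLOPT_HTTPHEADER => array(
        'authorization: ' . getenv('ASSEMBLYAI_API_KEY'),
        'content-type: application/json'
    ),
    CURLOPT_RETURNTRANSFER => true
));
$job = json_decode(curl_exec($ch), true);
curl_close($ch);
// Poll GET https://api.assemblyai.com/v2/transcript/{id} until the status is "completed",
// then read the speaker-attributed, timestamped segments from the response.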
From these transcripts, we extract keywords, summaries, metadata, and so on, and make it all easily searchable and shareable. For a multi-year podcast, the end result is a browsable archive of episodes and clips, each with its own summary and keywords.
Images
For images, a typical ingestion script built on our technology works like this. You can take a peek at the code:
<?php
function GroupsAPI_upload_post($params)
{
    // Users::loggedInUser(true);
    if (empty($_FILES['image']['tmp_name'])) {
        throw new Q_Exception("No image uploaded");
    }
    $binary = file_get_contents($_FILES['image']['tmp_name']);

    // 1. Save image (folder-only icon semantics)
    $tempKey = 'tmp_' . uniqid('', true);
    $paths = Q_Image::save(array(
        'data' => $binary,
        'path' => 'Q/uploads',
        'subpath' => "Streams/images/$tempKey",
        'save' => 'Streams/image',
        'skipAccess' => true
    ));
    if (empty($paths[''])) {
        throw new Exception("Image save failed");
    }
    $tempIconPath = $paths['']; // folder-only path

    // 2. Run LLM observations
    $inputs = array('images' => array($binary));
    $observations = array(
        'semanticExtraction' => array(
            'promptClause' =>
                'First, extract explicit semantic facts visible in the image. ' .
                'Do not infer or invent text. ' .
                'If something is not clearly present, return null. ' .
                'Assume the current year is ' . date('Y') . '. ' .
                'If a known holiday is clearly referenced, identify it and give its date range ' .
                'for this year only. Use YYYY-MM-DD format.',
            'fieldNames' => array(
                'title',
                'subtitle',
                'holidayName',
                'startDate',
                'endDate'
            ),
            'example' => array(
                'title' => 'Happy Valentine\'s Day',
                'subtitle' => 'Celebrate love with something sweet',
                'holidayName' => 'Valentine\'s Day',
                'startDate' => '2026-02-14',
                'endDate' => '2026-02-14'
            )
        ),
        /* === NEW, ADDITIVE === */
        'holidayAnalysis' => array(
            'promptClause' =>
                'If holidayName is present, evaluate the global importance of this holiday. ' .
                'Return an integer from 1 to 10 representing importance to at least 1 million people worldwide. ' .
                '10 = globally significant holidays (e.g. Christmas, Ramadan). ' .
                '7-9 = widely observed national or religious holidays (e.g. Valentine\'s Day, Hanukkah). ' .
                '4-6 = regional or multi-country holidays. ' .
                '1-3 = minor or niche holidays. ' .
                'If no real holiday is present, return null. ' .
                'Base this on widely accepted real-world observance, not personal opinion.',
            'fieldNames' => array(
                'holidayImportance'
            ),
            'example' => array(
                'holidayImportance' => 9
            )
        ),
        /* === END NEW === */
        'languageQuality' => array(
            'promptClause' =>
                'Analyze the primaryImage and determine the primary language used, spelling quality, and expression naturalness. ' .
                'Base this judgment only on visible text and widely-known linguistic conventions. ' .
                'Do not infer intent or audience.',
            'fieldNames' => array(
                'language',
                'spelling',
                'expressions'
            ),
            'example' => array(
                'language' => 'ru',
                'spelling' => 9,
                'expressions' => 8
            )
        ),
        'culturalRelevance' => array(
            'promptClause' =>
                'Determine which countries and cultures this content is most relevant to based on visible imagery, symbols, and text. ' .
                'Do not guess. Only include countries that are strongly implied.',
            'fieldNames' => array(
                'countries',
                'culturalSpecificity'
            ),
            'example' => array(
                'countries' => array('RU','UA'),
                'culturalSpecificity' => 7
            )
        ),
        'timing' => array(
            'promptClause' =>
                'Infer when this content would be relevant in the real world. ' .
                'Provide exact date ranges for the years 2025 and 2026 only. ' .
                'Use real holidays or events if clearly implied. ' .
                'Do not invent holidays. If no timing is evident, return an empty array.',
            'fieldNames' => array(
                'dates',
                'evergreen'
            ),
            'example' => array(
                'dates' => array(
                    array('2025-01-07','2025-01-07'),
                    array('2026-01-07','2026-01-07')
                ),
                'evergreen' => 5
            )
        ),
        'contentClassification' => array(
            'promptClause' =>
                'Classify what this content is and how it is presented. ' .
                'Describe type, occasion, tone, and sentiment without judging quality.',
            'fieldNames' => array(
                'contentType',
                'occasion',
                'tone',
                'sentiment'
            ),
            'example' => array(
                'contentType' => 'greeting',
                'occasion' => array('orthodoxChristmas'),
                'tone' => array('festive'),
                'sentiment' => 'positive'
            )
        ),
        'safety' => array(
            'promptClause' =>
                'Assess whether the content contains obscenity or material likely to cause controversy. ' .
                'Rate visibility of such elements, not intent.',
            'fieldNames' => array(
                'obscene',
                'controversial'
            ),
            'example' => array(
                'obscene' => 1,
                'controversial' => 1
            )
        ),
        'discoveryQuality' => array(
            'promptClause' =>
                'Evaluate how suitable this content is for discovery and sharing by others. ' .
                'Besides this, derive most common 10 keywords as single English words ' .
                'people might use to search for this. Normalize to lowercase alphanumeric only. ' .
                'If the primary language is English, set keywordsNative to empty array. ' .
                'Otherwise, fill keywordsNative with top 10 keywords as single words in that language.',
            'fieldNames' => array(
                'shareability',
                'confidence',
                'keywords',
                'keywordsNative'
            ),
            'example' => array(
                'keywords' => array("Orthodox", "Christmas", "Religious", "Holiday"),
                'keywordsNative' => array("Православный", "Рождество", "Религиозный", "Праздник"),
                'shareability' => 8,
                'confidence' => 0.9
            )
        )
    );
    $llm = AI_LLM::create('openai');
    $results = $llm->process($inputs, $observations);

    // 3. Policy gate
    if (!GroupsAPI_Images::accept($results)) {
        return Q_Response::json(array(
            'accepted' => false,
            'results' => $results
        ));
    }

    // 4. Compact for attributes
    $attr = GroupsAPI_Images::forAttributes($results);
    if (!GroupsAPI_Images::fitsAttributes($attr)) {
        throw new Exception("Observation attributes exceed limit");
    }

    // 5. Create Streams/image
    $title = Q::ifset($attr, 'title', Q::ifset($attr, 'subtitle', 'Untitled Image'));
    unset($attr['title']);
    $stream = Streams::create(
        'GroupsAPI',
        'GroupsAPI',
        'Streams/image',
        array(
            'title' => $title,
            'icon' => $paths[''], // folder only
            'attributes' => $attr
        ),
        array(
            'skipAccess' => true
        )
    );

    /*
     * 6. Rename temp folder → final stream name
     */
    $uploadsRoot = APP_WEB_DIR;
    $timestamp = time();
    $finalDir = implode(DS, array(
        $uploadsRoot, 'Q', 'uploads', 'Streams', $stream->publisherId, $stream->name, 'icon', $timestamp
    ));
    if (!is_dir($parentDir = dirname($finalDir))) {
        @mkdir($parentDir, 0777, true);
    }
    if (!rename($uploadsRoot . DS . $tempIconPath, $finalDir)) {
        throw new Exception("Failed to finalize image storage");
    }

    /*
     * 7. Update stream icon to final folder
     */
    $publisherId = $stream->publisherId;
    $streamName = $stream->name;
    $stream->update(array(
        'icon' => "{{baseUrl}}/Q/uploads/Streams/$publisherId/$streamName/icon/$timestamp"
    ), array(
        'skipAccess' => true
    ));

    return Q_Response::json(array(
        'accepted' => true,
        'streamName' => $stream->name,
        'icon' => Streams::iconUrl($stream)
    ));
}

