Executive summary
Most CCTV systems are built for recording, not retrieval. When something goes wrong, teams fall into the same trap: open a video player, scrub timelines, jump between cameras, and hope they don't miss the key moment.
On-prem smart video search flips the workflow from “watch footage” to “query evidence.” Instead of spending hours, operators ask direct questions:
- “Show me every time this vehicle plate appeared this week.”
- “Find this person across these cameras between 7 pm and 11 pm.”
- “Show all sightings above a confidence threshold, grouped by camera and time.”
This is not magic. It is a pipeline that converts video into searchable metadata, indexes it properly, and gives the operator a UI built for investigations and audits.
A quick, important clarification:
- Smart Search is the practical umbrella: plates, faces, person re-identification, and event filters.
- Semantic Search is the advanced layer: search by meaning using natural-language queries like “person carrying a carton,” powered by vision embeddings.
Most deployments should start with smart search (plates + face/person + filters) and add semantic search only after the foundation is stable.
1) Smart search vs semantic search (not the same)
Smart search
Smart search is built on structured signals:
- detections (person, face, vehicle, plate)
- OCR text (plate numbers)
- tracking (same object across frames)
- event rules (intrusion, loitering, line-crossing)
- filters (time, camera group, zone, confidence)
This is why plate search can be instant: plate text is indexed like a database field.
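To make that concrete, here is a minimal sketch using SQLite and a hypothetical plate_sightings table (names and schema are illustrative, not a product spec), mirroring the structured metadata described later in Layer 5:

```python
import sqlite3

con = sqlite3.connect("metadata.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS plate_sightings (
        plate_text TEXT,
        camera_id  TEXT,
        ts         TEXT,    -- UTC 'YYYY-MM-DD HH:MM:SS'
        confidence REAL
    )
""")
con.execute("CREATE INDEX IF NOT EXISTS idx_plate ON plate_sightings (plate_text, ts)")

# "Every sighting of this plate, this week" is an ordinary indexed lookup:
rows = con.execute(
    "SELECT camera_id, ts, confidence FROM plate_sightings "
    "WHERE plate_text = ? AND ts >= datetime('now', '-7 days') "
    "ORDER BY ts",
    ("UP16AB1234",),
).fetchall()
```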
Semantic search
Semantic search is “search by meaning.” Typical queries:
- “man with helmet in hand”
- “white hatchback stopping near gate”
- “person running in corridor”
- “same scene as this clip”
This is usually done through embeddings and vector search (similarity retrieval), often with a vision-language model.
Practical reality: semantic search is powerful, but it is also easier to misinterpret, harder to validate, and more sensitive to camera quality and scene variation. It should be treated as an add-on, not the base.
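For intuition, here is a stripped-down sketch of similarity retrieval. The encoder below is a stand-in that returns random vectors; a real deployment would use a vision-language model (CLIP-style) to embed both the text query and stored keyframes. Everything here is illustrative:

```python
import numpy as np

def embed_text(query: str) -> np.ndarray:
    # Placeholder encoder; a real system would call a vision-language model.
    rng = np.random.default_rng(abs(hash(query)) % (2**32))
    v = rng.standard_normal(512)
    return v / np.linalg.norm(v)

# Pretend index: one embedding per stored keyframe, L2-normalised.
frame_embeddings = np.random.standard_normal((10_000, 512))
frame_embeddings /= np.linalg.norm(frame_embeddings, axis=1, keepdims=True)

query_vec = embed_text("person carrying a carton")
scores = frame_embeddings @ query_vec            # cosine similarity
top_k = np.argsort(scores)[::-1][:20]            # 20 best-matching keyframes
```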
2) What “on-prem smart search” really means
On-prem smart search means:
- video stays inside your network (camera VLAN, NVR network, or on-prem server rack)
- AI processing happens on a local box or server
- the system stores searchable metadata locally
- search works even if the internet is down
This matters in India because many sites have limited uplink, strict privacy expectations, and operational needs where alerts and retrieval must work offline.
3) How it works end-to-end (the pipeline)
A reliable on-prem system has six layers.
Layer 1: Ingest
- Pull streams from cameras using RTSP.
- If cameras are locked behind an NVR, pull RTSP from NVR channels.
Goal: stable stream access, health monitoring, and timestamps you can trust.
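A minimal single-channel ingest loop, assuming OpenCV, might look like the sketch below. The RTSP URL is a placeholder; channel and path conventions differ by camera and NVR vendor:

```python
import time
import cv2  # pip install opencv-python

# Placeholder URL: adjust to your camera/NVR vendor's RTSP scheme.
RTSP_URL = "rtsp://user:pass@192.168.1.64:554/Streaming/Channels/102"

cap = cv2.VideoCapture(RTSP_URL)
last_frame_at = time.time()

while True:
    ok, frame = cap.read()
    if not ok:
        time.sleep(0.5)
        if time.time() - last_frame_at > 10:   # simple health rule: 10 s without frames
            cap.release()
            cap = cv2.VideoCapture(RTSP_URL)   # reconnect
            last_frame_at = time.time()
        continue
    last_frame_at = time.time()
    # Hand (frame, trusted wall-clock timestamp) to the decode/detection layers here.
```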
Layer 2: Decode
This is frequently the hidden bottleneck.
- Multi-camera analytics consumes decode capacity long before the AI itself becomes the limit.
- Efficient deployments use sub-streams for indexing and main-stream only for evidence review.
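A quick back-of-envelope comparison shows why. The resolutions and frame rates below are assumptions, not fixed requirements:

```python
# Decode budget in pixels per second for 16 cameras.
cameras = 16

main = 1920 * 1080 * 25   # main-stream: 1080p @ 25 fps
sub  = 704  * 576  * 6    # typical sub-stream: D1 @ 6 fps (indexing rate)

print(f"main-stream load: {cameras * main / 1e6:,.0f} Mpixels/s")  # ~829
print(f"sub-stream load:  {cameras * sub  / 1e6:,.0f} Mpixels/s")  # ~39
```

Roughly a 20x reduction in decode load, which is why "index on sub-stream, verify on main-stream" keeps coming up.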
Layer 3: Detection + tracking
- Detect person, face, vehicle, plate region
- Track objects across frames to avoid duplicate detections and to create “tracklets” (a short continuous track)
Tracking improves search quality because the system stores one strong sighting per track, not 400 near-duplicates.
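A minimal "best shot per tracklet" selection might look like this sketch. Production systems also score sharpness and pose, but confidence alone illustrates the idea:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    track_id: int
    frame_ts: float
    confidence: float
    crop: bytes  # JPEG bytes of the object crop

def best_shot_per_track(detections: list[Detection]) -> dict[int, Detection]:
    # Keep one strong sighting per tracklet instead of every frame's detection.
    best: dict[int, Detection] = {}
    for det in detections:
        cur = best.get(det.track_id)
        if cur is None or det.confidence > cur.confidence:
            best[det.track_id] = det
    return best
```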
Layer 4: Recognition (the two search engines)
People search has two engines:
- Face search
  - Face detection
  - Face alignment
  - Face embedding creation
  - Similarity match against a query face or watchlist
- Person re-identification (Re-ID)
  - Person body detection
  - Appearance embedding (clothing, silhouette, gait cues)
  - Similarity match across cameras/time even when face is not visible
Plate search:
- Plate detection
- OCR
- Store text, confidence, plate crop, camera ID, timestamp
Layer 5: Metadata storage + indexing
You store two types of data:
- Structured tables: plate text, timestamps, camera, confidence
- Vector indexes: face embeddings, person embeddings
If indexing is weak, search feels slow and unreliable even when AI is good.
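As an illustration, here is what the vector side can look like using FAISS, one common on-prem option; the dimensions and scale below are assumptions:

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 512                                  # typical face-embedding size
index = faiss.IndexFlatIP(dim)             # inner product == cosine on normalised vectors

stored = np.random.rand(100_000, dim).astype("float32")
faiss.normalize_L2(stored)
index.add(stored)                          # embeddings produced in Layer 4

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 50)      # top-50 candidate sightings
# ids map back to rows in the structured tables (camera, timestamp, crop).
```

Keeping vectors separate from the relational rows is what lets metadata outlive video without bloating either store.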
Layer 6: Search UI + evidence workflow
Search is only valuable if it produces:
- fast results
- reviewable evidence (clip + frame + crop)
- exportable proof with time and camera stamp
- audit trail of who searched what and when
4) What you need (minimum viable vs production-grade)
A. Cameras: quality requirements that decide your success
For number plates
Plate recognition is mostly a capture problem; no OCR accuracy claim can compensate for a bad image. You need:
- enough pixels across the plate width
- low motion blur (shutter matters)
- controlled glare at night (IR bloom and reflective plates can kill reads)
- sensible camera angle (avoid extreme tilt)
A widely used practical target discussed in industry guides is around 100 pixels across the plate width, with more margin for hard scenes.
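You can sanity-check a camera position with simple geometry. The numbers below (1080p sensor, 60-degree horizontal field of view, roughly 0.5 m plate width) are assumptions used to illustrate the method:

```python
import math

sensor_px_width = 1920   # 1080p sensor, assumed
hfov_deg = 60.0          # horizontal field of view, assumed
plate_width_m = 0.5      # Indian plates are roughly 0.5 m wide
distance_m = 8.0

# Width of the scene the sensor sees at that distance:
scene_width_m = 2 * distance_m * math.tan(math.radians(hfov_deg / 2))
px_on_plate = sensor_px_width * plate_width_m / scene_width_m
print(f"{px_on_plate:.0f} px across the plate")   # ~104 px at 8 m
```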
For faces
Face search works best at choke points:
- entry gates, reception, elevator lobbies, billing counters
- controlled lighting beats “random corridor” every time
- avoid backlit glass entrances unless you solve exposure
For person Re-ID
Re-ID is useful in warehouses, corridors, shopfloors, and parking lanes, but it becomes less reliable when:
- uniforms look identical
- lighting varies wildly across cameras
- the person changes clothing
B. Compute: the sizing logic buyers should understand
Do not size only by “number of cameras.” Size by:
- which streams you process (main vs sub)
- FPS you analyze (you often do not need full FPS for indexing)
- how much tracking you do
- whether you store embeddings for long retention windows
- how many concurrent users will search at peak times
In many real deployments, the best cost-performance comes from:
- index on sub-stream
- verify on main-stream
- store metadata longer than video
C. Storage: treat video and metadata separately
Video is heavy. Metadata is light.
Good design:
- keep video retention as per policy (7, 15, 30 days depending on risk and storage)
- keep metadata longer if policy allows, because it enables search even after video ages out (or at least accelerates retrieval during the retention window)
- keep face/plate crops only if required for evidence workflows and policy
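A rough per-camera, per-day estimate shows why the split makes sense; the bitrate and sighting counts below are assumed figures:

```python
# Video: a 4 Mbps main-stream, assumed H.264/H.265.
video_bitrate_mbps = 4.0
video_gb_per_day = video_bitrate_mbps * 86_400 / 8 / 1000   # ~43.2 GB/day

# Metadata: assume a busy camera logs 5,000 sightings/day,
# each with one 512-dim float32 embedding plus ~200 bytes of row fields.
sightings_per_day = 5_000
bytes_per_record = 512 * 4 + 200
meta_mb_per_day = sightings_per_day * bytes_per_record / 1e6  # ~11.2 MB/day
```

Three orders of magnitude apart, which is why metadata can comfortably outlive video under most retention policies.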
5) The operator’s “prompt” library (what people actually search)
Plate search prompts
- “UP16AB1234, last 30 days, all gates”
- “UP16AB partial match, Gate-2, yesterday 6 pm to 10 pm”
- “Show similar plates to UP16AB1234 (O/0, I/1), confidence above 0.6”
- “All blocked plates that entered in the last 7 days”
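The O/0 and I/1 expansion in the third prompt can be implemented as a simple query-side variant generator. The confusion map below is illustrative and should be tuned to your OCR engine:

```python
# Common OCR character confusions; extend per your engine's error profile.
CONFUSED = {"O": "0", "0": "O", "I": "1", "1": "I", "B": "8", "8": "B"}

def variants(plate: str) -> set[str]:
    results = {plate}
    for i, ch in enumerate(plate):
        if ch in CONFUSED:
            results |= {p[:i] + CONFUSED[ch] + p[i+1:] for p in list(results)}
    return results

print(variants("UP16AB1234"))
# Sightings are then fetched for each variant and ranked by OCR confidence.
```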
People search prompts (face-based)
- “Find this person across all entrances, last 14 days”
- “Show only matches above 0.75, group by camera”
- “Watchlist alert if seen again at any gate”
People search prompts (Re-ID)
- “Find this person from this snapshot across Warehouse Zone A, last 6 hours”
- “Top 200 results, group by camera, show tracklets”
- “Filter: only between 8 pm and 6 am”
Semantic search prompts (advanced)
- “person carrying a carton near Dock-3”
- “white hatchback stopping at Gate-1”
- “person running in corridor”
Use these only when your base indexing and camera consistency are already strong.
6) Ten real-life examples (how smart search gets used)
- Society gate dispute: retrieve entry and exit for a plate in under 2 minutes, settle resident complaint same day.
- Blacklist vehicle alert: plate watchlist triggers a local alert even without internet.
- Visitor management audit: match visitor log to actual vehicle entries and dwell time.
- Warehouse short-ship dispute: find the truck entry, dock assignment, and departure window fast.
- Vendor fraud: repeat vendor vehicle appears outside approved slots, flagged by time filters.
- Parking revenue leakage: identify “lost ticket” patterns by comparing entry reads versus exit reads.
- Retail incident: person appears at customer service desk across multiple days, found via face search at choke point camera.
- Factory night-shift review: Re-ID helps trace movement between restricted zones where faces are not visible.
- Hospital emergency lane: identify repeated offenders blocking ambulance route by plate and export evidence.
- Multi-site operations: search across camera groups by location tags (Gate, Dock, Lobby) to reduce investigator effort.
7) What to measure in a pilot (so it does not become “AI demo theatre”)
Define success as time-to-evidence and operator workload reduction.
Recommended pilot KPIs:
- Plate search: “retrieve all sightings of a plate across selected cameras in under 60 seconds”
- People search: “find top candidates across 20 cameras in under 2 minutes, with reviewable evidence clips”
- Reduction in manual scrubbing time per incident
- False alert rate for watchlists (measured and tuned)
- Evidence export quality: time stamp accuracy and repeatability
8) Governance and compliance (practical controls)
Even on-prem, you still process personal data. Buyers increasingly ask for:
- role-based access control (who can search plates, who can search faces)
- audit logs (who searched what, when, and why)
- retention policy for video, crops, and embeddings
- separation of duties (admin vs operator)
- purpose-bound usage, documented in SOPs
This turns a “tool” into an enterprise-ready system.
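A minimal audit-log sketch, with a hypothetical SQLite schema, shows how little is needed to make searches accountable:

```python
import sqlite3, time

con = sqlite3.connect("audit.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS search_audit (
        ts         REAL,    -- epoch seconds
        user_id    TEXT,
        query_type TEXT,    -- 'plate' | 'face' | 'reid' | 'semantic'
        query_ref  TEXT,    -- plate text, or a hash/reference to the probe image
        purpose    TEXT     -- operator-entered reason, per SOP
    )
""")

def log_search(user_id: str, query_type: str, query_ref: str, purpose: str) -> None:
    con.execute(
        "INSERT INTO search_audit VALUES (?, ?, ?, ?, ?)",
        (time.time(), user_id, query_type, query_ref, purpose),
    )
    con.commit()

log_search("operator_7", "plate", "UP16AB1234", "resident gate dispute #4412")
```

For face searches, logging a hash or reference to the probe image, rather than the image itself, keeps the audit trail useful without duplicating sensitive data.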
9) Buyer checklist (what to demand in procurement)
Ask vendors to show:
- camera audit method (pixel width and shutter guidance for plates, face capture guidance)
- stream plan (main vs sub-stream strategy)
- indexing design (text index for plates, vector search for faces and persons)
- evidence workflow (review, export, chain-of-custody style logs)
- accuracy tuning plan (thresholds, false positives, periodic recalibration)
- governance features (RBAC, audit, retention, deletion)
If they only demo a UI with no indexing and no SOPs, it will fail after week two.
FAQs
Can this work with an existing NVR?
Yes. Pull RTSP from NVR channels and run analytics on those streams.
Should we start with smart search or semantic search?
Start with smart search. Plates and face-based choke point search deliver faster, more defensible outcomes.
What causes most plate-read failures?
Capture: insufficient pixels on the plate, blur, glare at night, and bad angle.
Can the system find someone when the face is not visible?
Yes, via person Re-ID, but reliability varies with uniforms, lighting shifts, and clothing changes.
Does search work if the internet is down?
Yes. That is the point of on-prem.
Do we need to store embeddings and run a vector index?
If you want face search, yes, some form of embeddings and vector index is required. Control access and retention tightly.
How do we keep false matches under control?
Use confidence thresholds, operator verification, and watchlist governance. Measure false match and false non-match behavior over real site data, not lab samples.

