What am I trying to do?
I want to read several IP cameras, run a face recognition model on each live feed, and display the annotated feed in a web browser frontend for ease of access. This is the main concern. As a secondary goal, I also want frame-level access so that I can save recognized faces as JPEG snapshots.
Environment: Windows 11, Python 3.11, NVIDIA 5060 Laptop GPU, YOLOv8n-face, face_recognition (dlib), OpenCV, Flask, flask-sock + eventlet, MS SQL for loading encodings (pyodbc), and a React frontend that receives the camera streams over WebSocket.
Goal: maintain smooth, low-latency streams when scaling from 4 cameras to 6–8 cameras, with face detection + recognition per stream and annotated JPEGs (I just need frame-level access) sent over WebSocket to a React frontend.
Problems: with ≤4 cameras everything is OK. With 6–8 cameras there are periodic freezes and stutter, and a lot of lag during preview. The GPU has headroom, but the CPU spikes. Delay spikes occur during YOLO inference + JPEG encode.
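For anyone who wants to reproduce the delay spikes, here is a minimal sketch of how per-stage wall-clock time could be collected. `StageTimer` and the stage names are hypothetical helpers, not part of my code below, and the `time.sleep` calls are stand-ins for the real inference and encode steps. CUDA work is asynchronous, so a GPU stage only gives a meaningful number if the timed region ends at a call that synchronizes, such as `.cpu()`.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Collects wall-clock durations per named stage (hypothetical helper)."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised.
            self.samples[name].append(time.perf_counter() - start)

    def summary(self):
        """Return {stage: (count, mean_ms, max_ms)}."""
        return {
            name: (len(v), 1000 * sum(v) / len(v), 1000 * max(v))
            for name, v in self.samples.items()
        }


timer = StageTimer()
with timer.stage("inference"):
    time.sleep(0.01)   # stand-in for the YOLO call
with timer.stage("jpeg_encode"):
    time.sleep(0.005)  # stand-in for cv2.imencode
print(timer.summary())
```

Comparing the mean and the maximum per stage should show whether the stutter comes from a stage that is slow on average or from occasional long outliers.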
Minimal code (condensed). This is the minimal code I can share:
# wsCamStream.py
import threading, time, cv2, numpy as np
from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor
from recoznizeyoloN import recognizeyolo, is_face_exists

executor = ThreadPoolExecutor(max_workers=6)  # shared pool

class CameraStream:
    def __init__(self, url, name, known_encoding, id_info):
        self.url, self.name = url, name
        self.capture = cv2.VideoCapture(self.url, cv2.CAP_FFMPEG)
        self.running = True
        self.known_encoding, self.id_info = known_encoding, id_info
        self.zoom = 1.0
        self.frame_queue = Queue(maxsize=1)
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self):
        while self.running:
            if self.capture.isOpened():
                ok, frame = self.capture.read()
                if ok:
                    try: self.frame_queue.get_nowait()
                    except Empty: pass
                    try: self.frame_queue.put_nowait(frame)
                    except: pass
                else:
                    self._reconnect()
                    time.sleep(5)
            else:
                self._reconnect()
                time.sleep(5)
            time.sleep(0.01)

    def _reconnect(self):
        try:
            self.capture.release(); time.sleep(1)
            self.capture = cv2.VideoCapture(self.url, cv2.CAP_FFMPEG)
        except Exception as e:
            print("reconnect err", e)

    def get_frame(self):
        try: return self.frame_queue.get(timeout=1)
        except Empty: return None

    def process_frame_async(self, frame, cnt, skip_every=3):
        return executor.submit(self._process, frame)

    def _process(self, frame):
        if frame is None or frame.size == 0: return frame, None, None
        if frame.dtype != np.uint8: frame = frame.astype(np.uint8)
        if frame.shape[-1] == 4: frame = frame[:, :, :3]
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        if is_face_exists(frame_rgb):
            result, _, idx, photo_b64 = recognizeyolo(
                frame_rgb.copy(), self.known_encoding, self.id_info, "entry", self.name
            )
            return result, idx, photo_b64
        return frame, None, None

# app.py (extract)
import time, json, struct, cv2
from flask import Flask, request
from flask_sock import Sock
from wsCamStream import CameraStream

FRAME_SKIP = 3
SEND_DELAY = 0.06  # ~12 FPS target

@sock.route('/ws_stream/<cam_id>')
def ws_stream(ws, cam_id):
    cam = camera_streams.get(cam_id)
    frame_count = 0
    while True:
        t0 = time.time()
        frame = cam.get_frame()
        if frame is None: time.sleep(0.01); continue
        frame = cv2.resize(frame, (1280, 720))
        frame_count += 1
        fut = cam.process_frame_async(frame, frame_count, skip_every=FRAME_SKIP)
        try:
            annotated, match_info, face_b64 = fut.result(timeout=2)
        except Exception as e:
            continue
        ok, buf = cv2.imencode('.jpg', annotated)
        if not ok: continue
        header = {"match": id_info[match_info] if match_info is not None else None,
                  "photo": face_b64 or None}
        hj = json.dumps(header).encode('utf-8')
        ws.send(struct.pack('>I', len(hj)) + hj + buf.tobytes())
        time.sleep(SEND_DELAY)

# recoznizeyoloN.py
import torch, cv2, numpy as np, face_recognition as frg
from ultralytics import YOLO

model = YOLO("yolov8n-face.pt").to("cuda").half()
YOLO_FACE_CONFIDENCE_THRESHOLD = 0.30
TOLERANCE = 0.5

def is_face_exists(img):
    boxes = model(img, verbose=False)[0].boxes.xyxy.cpu().numpy().astype(int)
    return len(boxes) > 0

def recognizeyolo(img_rgb, known_encoding, id_info, movetype, cam_name):
    res = model(img_rgb, verbose=False)
    # ... crop, frg.face_encodings, frg.compare_faces, draw, return annotated RGB ...
    return annotated_rgb, face_located, match_idx, face_photo_b64

What I tried:
- Threading: Shared ThreadPoolExecutor(max_workers=6) for processing; reader thread per camera; Queue(maxsize=1) to drop stale frames.
- Model: YOLOv8 nano face model + half precision (CUDA). Early-exit with is_face_exists before recognition.
- Frame rate and size: Downscale to 1280×720; SEND_DELAY = 0.06 s (≈12 FPS target) to clients. The preview route is 640×360.
- I/O: JPEG encoding via cv2.imencode('.jpg', ...); binary WebSocket payload with small JSON header.
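To make the payload format in the last item concrete, here is a small round-trip sketch of the framing used in `ws_stream`: a 4-byte big-endian header length, then the UTF-8 JSON header, then the JPEG bytes. The function names `pack_frame` and `unpack_frame` are hypothetical (they do not exist in my code), and the JPEG bytes in the example are a placeholder rather than a real image.

```python
import json
import struct


def pack_frame(header: dict, jpeg_bytes: bytes) -> bytes:
    """Build one message: 4-byte big-endian header length, JSON header, JPEG bytes."""
    hj = json.dumps(header).encode("utf-8")
    return struct.pack(">I", len(hj)) + hj + jpeg_bytes


def unpack_frame(message: bytes):
    """Split a message produced by pack_frame back into (header, jpeg_bytes)."""
    (header_len,) = struct.unpack(">I", message[:4])
    header = json.loads(message[4:4 + header_len].decode("utf-8"))
    return header, message[4 + header_len:]


# Placeholder payload; a real message would carry cv2.imencode output.
msg = pack_frame({"match": None, "photo": None}, b"\xff\xd8placeholder")
header, jpeg = unpack_frame(msg)
```

The React client does the equivalent of `unpack_frame` on each binary WebSocket message before handing the JPEG bytes to an image element.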
The main problem is that with up to 4 cameras the lag is minimal and the system runs fine, but when I add more cameras it starts lagging and dropping frames, and the camera streams are delayed.
What are possible solutions and ways to achieve this? Can you help me, and can you also give me references?
I also tried GStreamer to grab the frames, but I am new to it and was not able to get the desired result, as I am still learning how GStreamer pipelines work and what the elements do. If it can help increase performance, I will be happy to use it, but from the command line I cannot get GStreamer to use the GPU; it always falls back to the CPU.
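For reference, this is a rough sketch of the kind of GStreamer capture pipeline string that OpenCV accepts. It is not a working configuration from my setup. The helper name `build_rtsp_pipeline`, the placeholder URL, and the assumption that the cameras send H.264 are all mine. `avdec_h264` is the software decoder; a hardware decoder element such as `nvh264dec` could be passed instead, but whether it is available depends on which GStreamer plugins are installed.

```python
def build_rtsp_pipeline(url: str, decoder: str = "avdec_h264", latency_ms: int = 100) -> str:
    """Return a GStreamer pipeline string for an H.264 RTSP camera (assumed codec)."""
    return (
        f'rtspsrc location="{url}" latency={latency_ms} ! '
        "rtph264depay ! h264parse ! "
        f"{decoder} ! videoconvert ! video/x-raw,format=BGR ! "
        # drop=true with max-buffers=1 keeps only the newest frame in the sink
        "appsink drop=true max-buffers=1 sync=false"
    )


pipeline = build_rtsp_pipeline("rtsp://camera.example/stream")  # placeholder URL
# Opening it needs an OpenCV build with GStreamer support enabled:
# cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
```

As far as I understand, `cv2.getBuildInformation()` shows whether the installed OpenCV was built with GStreamer, which has to be the case before a pipeline like this can be opened at all.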