How can I optimize a Python RTSP → YOLOv8 face-recognition → WebSocket pipeline to handle 6–8 cameras without lag/stutter?

Asked 3 months ago

Modified 3 months ago

Viewed 88 times

What am I trying to do?
I want to read several IP cameras, run a face-recognition model on each live feed, and display the annotated streams in a web browser frontend; this is the main concern. As a secondary goal, I also need frame-level access so that I can save recognized faces as JPEG snapshots.

Environment: Windows 11, Python 3.11, NVIDIA 5060 Laptop GPU, YOLOv8n-face, face_recognition (dlib), OpenCV, Flask, flask-sock + eventlet, MS SQL (via pyodbc) for loading the face encodings, and a React frontend that receives the camera streams over WebSocket.

Goal: maintain smooth, low-latency streams when scaling from 4 cameras to 6–8 cameras, with face detection + recognition per stream and annotated JPEGs (I just need frame-level access) sent over WebSocket to a React frontend.

Problems: with ≤4 cameras everything is OK. With 6–8 cameras there are periodic freezes/stutter and a lot of lag in the preview. The GPU has headroom, but the CPU spikes. Delay spikes occur during YOLO inference + JPEG encode.

Minimal code (condensed). This is the minimal code that I can share:

# wsCamStream.py
import threading, time, cv2, numpy as np
from queue import Queue, Empty, Full
from concurrent.futures import ThreadPoolExecutor
from recoznizeyoloN import recognizeyolo, is_face_exists

executor = ThreadPoolExecutor(max_workers=6)  # shared pool for all cameras

class CameraStream:
    def __init__(self, url, name, known_encoding, id_info):
        self.url, self.name = url, name
        self.capture = cv2.VideoCapture(self.url, cv2.CAP_FFMPEG)
        self.running = True
        self.known_encoding, self.id_info = known_encoding, id_info
        self.zoom = 1.0
        self.frame_queue = Queue(maxsize=1)  # holds only the newest frame
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self):
        # One reader thread per camera: keep only the latest frame, reconnect on failure.
        while self.running:
            if self.capture.isOpened():
                ok, frame = self.capture.read()
                if ok:
                    try: self.frame_queue.get_nowait()  # drop the stale frame, if any
                    except Empty: pass
                    try: self.frame_queue.put_nowait(frame)
                    except Full: pass
                else:
                    self._reconnect()
                    time.sleep(5)
            else:
                self._reconnect()
                time.sleep(5)
            time.sleep(0.01)

    def _reconnect(self):
        try:
            self.capture.release(); time.sleep(1)
            self.capture = cv2.VideoCapture(self.url, cv2.CAP_FFMPEG)
        except Exception as e:
            print("reconnect err", e)

    def get_frame(self):
        try: return self.frame_queue.get(timeout=1)
        except Empty: return None

    def process_frame_async(self, frame, cnt, skip_every=3):
        # NOTE: cnt and skip_every are accepted but unused here, so every frame is submitted.
        return executor.submit(self._process, frame)

    def _process(self, frame):
        # Normalize to 3-channel uint8, then detect and (if a face is present) recognize.
        if frame is None or frame.size == 0: return frame, None, None
        if frame.dtype != np.uint8: frame = frame.astype(np.uint8)
        if frame.shape[-1] == 4: frame = frame[:, :, :3]
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        if is_face_exists(frame_rgb):
            result, _, idx, photo_b64 = recognizeyolo(
                frame_rgb.copy(), self.known_encoding, self.id_info, "entry", self.name
            )
            # recognizeyolo returns an RGB image; cv2.imencode expects BGR.
            return cv2.cvtColor(result, cv2.COLOR_RGB2BGR), idx, photo_b64
        return frame, None, None

# app.py (extract)
import time, json, struct, cv2
from flask import Flask, request
from flask_sock import Sock
from wsCamStream import CameraStream

app = Flask(__name__)
sock = Sock(app)
# camera_streams (cam_id -> CameraStream) and id_info are built elsewhere (not shown).

FRAME_SKIP = 3
SEND_DELAY = 0.06   # ~12 FPS target

@sock.route('/ws_stream/<cam_id>')
def ws_stream(ws, cam_id):
    cam = camera_streams.get(cam_id)
    if cam is None: return  # unknown camera id
    frame_count = 0
    while True:
        t0 = time.time()
        frame = cam.get_frame()
        if frame is None: time.sleep(0.01); continue
        frame = cv2.resize(frame, (1280, 720))
        frame_count += 1
        fut = cam.process_frame_async(frame, frame_count, skip_every=FRAME_SKIP)
        try:
            # Blocks this handler until detection/recognition finishes.
            annotated, match_info, face_b64 = fut.result(timeout=2)
        except Exception:
            continue
        ok, buf = cv2.imencode('.jpg', annotated)
        if not ok: continue
        # Payload: 4-byte big-endian header length, JSON header, JPEG bytes.
        header = {"match": id_info[match_info] if match_info is not None else None,
                  "photo": face_b64 or None}
        hj = json.dumps(header).encode('utf-8')
        ws.send(struct.pack('>I', len(hj)) + hj + buf.tobytes())
        time.sleep(SEND_DELAY)

# recoznizeyoloN.py
import torch, cv2, numpy as np, face_recognition as frg
from ultralytics import YOLO
model = YOLO("yolov8n-face.pt").to("cuda").half()
YOLO_FACE_CONFIDENCE_THRESHOLD = 0.30
TOLERANCE = 0.5

def is_face_exists(img):
    # Detection pass 1: only checks whether any face box exists.
    boxes = model(img, verbose=False)[0].boxes.xyxy.cpu().numpy().astype(int)
    return len(boxes) > 0

def recognizeyolo(img_rgb, known_encoding, id_info, movetype, cam_name):
    # Detection pass 2 on the same frame, followed by dlib encoding and matching.
    res = model(img_rgb, verbose=False)
    # ... crop, frg.face_encodings, frg.compare_faces, draw, return annotated RGB ...
    return annotated_rgb, face_located, match_idx, face_photo_b64
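The condensed code passes `cnt` and `skip_every` to `process_frame_async`, but nothing in the shown code uses them, so every frame appears to go through detection. Below is a minimal sketch of one way frame skipping could be applied, assuming it is acceptable to reuse the previous result for skipped frames. `FrameSkipper` and its parameters are hypothetical names for illustration, not part of the original code.

```python
from typing import Any, Callable, Optional


class FrameSkipper:
    """Run an expensive function on every Nth frame; reuse its last result otherwise."""

    def __init__(self, process: Callable[[Any], Any], skip_every: int = 3):
        if skip_every < 1:
            raise ValueError("skip_every must be >= 1")
        self.process = process
        self.skip_every = skip_every
        self.count = 0
        self.last_result: Optional[Any] = None

    def __call__(self, frame: Any) -> Any:
        # Process frames 0, N, 2N, ...; return the cached result in between.
        if self.count % self.skip_every == 0:
            self.last_result = self.process(frame)
        self.count += 1
        return self.last_result


# Stand-in for the detector: record which frames were actually processed.
calls = []
skipper = FrameSkipper(lambda f: calls.append(f) or f * 10, skip_every=3)
outputs = [skipper(i) for i in range(7)]
```

In a real pipeline the cached value would more likely be the last set of boxes and labels, redrawn on the fresh frame, rather than a stale annotated image.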

What I tried:

Threading: shared ThreadPoolExecutor(max_workers=6) for processing; one reader thread per camera; Queue(maxsize=1) to drop stale frames.
Model: YOLOv8 nano face model + half precision (CUDA). Early exit with is_face_exists before recognition.
Frame rate and size: downscale to 1280×720; SEND_DELAY targets ≈12 FPS to clients. The preview route is 640×360.
I/O: JPEG encoding via cv2.imencode('.jpg', ...); binary WebSocket payload with a small JSON header.
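For reference, the binary payload described in the I/O item is a 4-byte big-endian header length, then the UTF-8 JSON header, then the JPEG bytes. Here is a small round-trip sketch of that framing; `pack_message` and `unpack_message` are hypothetical helper names, and the decoder is only an assumption about how a client would parse it.

```python
import json
import struct


def pack_message(header: dict, jpeg_bytes: bytes) -> bytes:
    """Frame as: 4-byte big-endian header length, UTF-8 JSON header, JPEG bytes."""
    hj = json.dumps(header).encode("utf-8")
    return struct.pack(">I", len(hj)) + hj + jpeg_bytes


def unpack_message(payload: bytes) -> tuple[dict, bytes]:
    """Inverse of pack_message; raises ValueError on a truncated payload."""
    if len(payload) < 4:
        raise ValueError("payload shorter than the length prefix")
    (header_len,) = struct.unpack(">I", payload[:4])
    if len(payload) < 4 + header_len:
        raise ValueError("payload shorter than the declared header")
    header = json.loads(payload[4:4 + header_len].decode("utf-8"))
    return header, payload[4 + header_len:]


# Placeholder bytes stand in for a real JPEG buffer from cv2.imencode.
fake_jpeg = b"\xff\xd8 not a real image \xff\xd9"
message = pack_message({"match": None, "photo": None}, fake_jpeg)
header, body = unpack_message(message)
```

The React client would do the same split on the received ArrayBuffer: read the first four bytes as an unsigned big-endian integer, decode that many bytes as JSON, and treat the rest as the image.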

The main problem is that with up to 4 cameras the lag is minimal and the system runs fine, but when I add more cameras the streams start lagging and dropping frames, and the camera streams become delayed.

What are possible solutions, and how can I achieve this? Can you also point me to references?

I also tried GStreamer to get the frames, but I am new to it and was not able to achieve the desired result, as I am still learning how GStreamer pipelines work and what its functions are. If it can help increase performance, I will be happy to use it, but from the command line I am not able to get GStreamer to use the GPU; it always falls back to the CPU.

asked Aug 13 at 15:22

playkashyap

19 reputation, 1 silver badge, 3 bronze badges

The scope of your task is vast. I think that is too much to ask on Stack Overflow.

– Christoph Rackwitz

2025-08-13 20:18:22 +00:00
Commented Aug 13 at 20:18
Try TensorRT: github.com/triple-Mu/YOLOv8-TensorRT

– Lamp

2025-08-14 03:30:15 +00:00
Commented Aug 14 at 3:30
@ChristophRackwitz Actually, after asking this question I made some changes to my code: instead of OpenCV capture I used FFmpeg to read the RTSP streams, and WebRTC to stream to the frontend. That improved the FPS of the streams, but it is still CPU-bound: after connecting 5–6 cameras, all native 1080p downscaled to 720p for the frontend, my CPU usage is around 90%. Is this a good approach? I know I asked a very big question, but I am stuck and cannot find a way forward. In forums people suggest GStreamer + DeepStream, but the pipelines are too complicated to figure out.

– playkashyap

2025-08-14 12:37:37 +00:00
Commented Aug 14 at 12:37
@Lamp I am already running the Ultralytics model with CUDA.

– playkashyap

2025-08-14 12:39:39 +00:00
Commented Aug 14 at 12:39
CUDA is different from TensorRT. With TensorRT you can probably reduce GPU load and CPU load slightly, because it is better optimized than CUDA-only inference. However, you should try to find the actual bottleneck; it is not necessarily the GPU or GPU bandwidth. It could also be the stream decoding or something else. Are you able to receive and process >4 cameras without the face recognition part?

– Micka

2025-08-14 19:28:53 +00:00
Commented Aug 14 at 19:28
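
One low-effort way to follow the suggestion about finding the actual bottleneck is to time each pipeline stage separately (decode, resize, inference, recognition, JPEG encode, send) and compare the averages as cameras are added. This is a minimal sketch under that assumption; `StageTimer` is a hypothetical helper, the `time.sleep` calls are stand-ins for the real stages, and the class is not thread-safe, so it assumes one timer per thread.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage (use one instance per thread)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # Average milliseconds per call for each stage seen so far.
        return {n: 1000.0 * self.totals[n] / self.counts[n] for n in self.totals}


timer = StageTimer()
for _ in range(3):
    with timer.stage("decode"):
        time.sleep(0.001)   # stand-in for capture.read()
    with timer.stage("inference"):
        time.sleep(0.002)   # stand-in for the YOLO + recognition call
    with timer.stage("encode"):
        time.sleep(0.001)   # stand-in for cv2.imencode
report = timer.report()
```

Running the same measurement with recognition disabled, as asked above, would show whether decoding alone already saturates the CPU at 6–8 cameras.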
