[BUG] OAK-D-POE intermittent failure - INTERNAL_ERROR_CORE #1103

Open · laurence-diack-pk opened this issue Aug 19, 2024 · 5 comments
Labels: bug (Something isn't working)

laurence-diack-pk commented Aug 19, 2024
Problem Description
OAK-D-POE cameras intermittently disappear from the network and become unreachable via ping while running. The issue occurs unpredictably:

  • Sometimes happens after about 5 minutes of operation
  • Sometimes doesn't occur at all during a session
  • Requires a power cycle to bring the cameras back online

System Details

  • Camera Model: OAK-D-POE
  • Main Application: Written in C++
  • Network Configuration: 2x OAK-D-POEs sharing a subnet with a RPLIDAR-2E and Intel NUC
  • Power Supply: POE
  • System info attached here log_system_information.json

Observed Behavior

  • The pipeline starts normally from a cold boot.
  • At an unpredictable point, the cameras become unreachable on the network.
  • Ping attempts to the camera IP addresses fail.
  • Power cycling the cameras brings them back online.
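A dropout like the one described above can at least be timestamped from the host side. Below is a minimal watchdog sketch; `isReachable` is a hypothetical stand-in for a real probe (e.g. an ICMP ping or a TCP connect to the camera's XLink port) and is injected as a callback so the detection logic itself is self-contained:

```cpp
#include <functional>
#include <iostream>
#include <string>

// Host-side dropout watchdog (sketch, not part of the original report).
// Returns the number of consecutive failed probes once `maxFailures` is
// reached, or 0 if the device stayed reachable for all `probes` checks.
int watchDevice(const std::string& ip,
                int probes,
                int maxFailures,
                const std::function<bool(const std::string&)>& isReachable) {
    int consecutiveFailures = 0;
    for (int i = 0; i < probes; ++i) {
        if (isReachable(ip)) {
            consecutiveFailures = 0;  // device answered; reset the counter
        } else if (++consecutiveFailures >= maxFailures) {
            std::cerr << ip << " unreachable after " << consecutiveFailures
                      << " consecutive probes - likely needs a power cycle\n";
            return consecutiveFailures;
        }
        // In a real monitor, sleep between probes, e.g.:
        // std::this_thread::sleep_for(std::chrono::seconds(5));
    }
    return 0;
}
```

Logging the exact wall-clock time of the first failed probe makes it easier to line dropouts up against crash-dump timestamps and switch/PoE logs.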

Crash Dumps
Two crash dumps have been collected, showing the following errors:

INTERNAL_ERROR_CORE
RTEMS_FATAL_SOURCE_EXCEPTION

Crash dump files:
crashDump_1_1844301031A3DC0E00_c7127782f2da45aac89d5b5b816d04cc45ae40be.json
crashDump_0_18443010C15E9F0F00_c7127782f2da45aac89d5b5b816d04cc45ae40be.json

Pipeline
The main application is written in C++, so I can't easily capture a serialized pipeline graph, but here's the basic setup:

    // Create Nodes
    auto yolospatialdetectionnetwork = pipeline.create<dai::node::YoloSpatialDetectionNetwork>();
    auto camrgb = pipeline.create<dai::node::ColorCamera>();
    auto monoleft = pipeline.create<dai::node::MonoCamera>();
    auto monoright = pipeline.create<dai::node::MonoCamera>();
    auto stereo = pipeline.create<dai::node::StereoDepth>();
    auto objecttracker = pipeline.create<dai::node::ObjectTracker>();
    auto imu = pipeline.create<dai::node::IMU>();
    auto videoenc = pipeline.create<dai::node::VideoEncoder>();
    auto videoenc_webui = pipeline.create<dai::node::VideoEncoder>();
    auto videoenc_left = pipeline.create<dai::node::VideoEncoder>();
    auto videoenc_right = pipeline.create<dai::node::VideoEncoder>();
    auto manip = pipeline.create<dai::node::ImageManip>();
    auto xoutdepth = pipeline.create<dai::node::XLinkOut>();
    auto xouttracks = pipeline.create<dai::node::XLinkOut>();
    auto xoutdetections = pipeline.create<dai::node::XLinkOut>();
    auto xoutIMU = pipeline.create<dai::node::XLinkOut>();
    auto xoutvidenc = pipeline.create<dai::node::XLinkOut>();
    auto xoutmonoenc_left = pipeline.create<dai::node::XLinkOut>();
    auto xoutmonoenc_right = pipeline.create<dai::node::XLinkOut>();
    auto xoutvideostream = pipeline.create<dai::node::XLinkOut>();

    // Set stream names for outputs
    xouttracks->setStreamName("tracklets");
    xoutdepth->setStreamName("depth");
    xoutdetections->setStreamName("detections");
    xoutvidenc->setStreamName("vid_enc");
    xoutmonoenc_left->setStreamName("mono_enc_left");
    xoutmonoenc_right->setStreamName("mono_enc_right");
    xoutvideostream->setStreamName("videostream");
    xoutIMU->setStreamName("imu");

    // Set properties for nodes
    camrgb->setPreviewSize(416, 416);
    camrgb->setInterleaved(false);
    camrgb->setColorOrder(dai::ColorCameraProperties::ColorOrder::RGB);
    camrgb->setPreviewKeepAspectRatio(false);

    // Set RGB resolution
    if (camera_.isIMX378())
    {
        camrgb->setResolution(dai::ColorCameraProperties::SensorResolution::THE_1080_P);
        camrgb->setIspScale(2, 3); // Scale the RGB ISP output to 2/3 resolution for better depth alignment
    }
    else
    {
        camrgb->setResolution(dai::ColorCameraProperties::SensorResolution::THE_720_P);
    }

    monoleft->setResolution(dai::MonoCameraProperties::SensorResolution::THE_400_P);
    monoleft->setBoardSocket(dai::CameraBoardSocket::CAM_B);
    monoright->setResolution(dai::MonoCameraProperties::SensorResolution::THE_400_P);
    monoright->setBoardSocket(dai::CameraBoardSocket::CAM_C);

    camrgb->setFps(fps);
    monoleft->setFps(fps);
    monoright->setFps(fps);
    videoenc->setQuality(93);
    videoenc_left->setQuality(93);
    videoenc_right->setQuality(93);
    videoenc->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::MJPEG);
    videoenc_left->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::MJPEG);
    videoenc_right->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::MJPEG);
    videoenc_webui->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::H264_BASELINE);
    videoenc_webui->setQuality(50);
    videoenc_webui->setFrameRate(fps);
    videoenc_webui->setRateControlMode(dai::VideoEncoderProperties::RateControlMode::CBR);
    auto videoenc_webui_bitrate = 500000;
    auto videoenc_webui_width = 1280;
    auto videoenc_webui_height = 720;
    videoenc_webui->setBitrate(videoenc_webui_bitrate);

    // imu settings
    imu->enableIMUSensor({dai::IMUSensor::ACCELEROMETER_RAW, dai::IMUSensor::GYROSCOPE_RAW}, 200);
    imu->setBatchReportThreshold(1);
    imu->setMaxBatchReports(10);

    // setting node configs
    stereo->setDefaultProfilePreset(dai::node::StereoDepth::PresetMode::HIGH_ACCURACY);
    stereo->setSubpixel(true);
    stereo->setLeftRightCheck(true);
    stereo->left.setQueueSize(1);
    stereo->right.setQueueSize(1);
    stereo->left.setBlocking(false);
    stereo->right.setBlocking(false);
    stereo->setDepthAlign(dai::CameraBoardSocket::CAM_A);
    stereo->setOutputSize(monoleft->getResolutionWidth(), monoleft->getResolutionHeight());
    stereo->useHomographyRectification(false);
    stereo->setConfidenceThreshold(confidence_threshold);
    auto config = stereo->initialConfig.get();
    config.postProcessing.median = dai::MedianFilter::KERNEL_5x5;
    config.postProcessing.temporalFilter.enable = true;
    config.postProcessing.spatialFilter.enable = true;
    config.postProcessing.spatialFilter.holeFillingRadius = 2;
    config.postProcessing.spatialFilter.numIterations = 1;
    config.postProcessing.thresholdFilter.minRange = 300;
    config.postProcessing.thresholdFilter.maxRange = 10000;
    config.postProcessing.decimationFilter.decimationFactor = 3;
    config.postProcessing.decimationFilter.decimationMode = dai::RawStereoDepthConfig::PostProcessing::DecimationFilter::DecimationMode::NON_ZERO_MEDIAN;

    // Set spatial detection network settings
    yolospatialdetectionnetwork->setBlobPath(nn_path);
    // Pub names of classes
    auto nn_classes = getNNClasses(nn_config_path);
    std::unordered_map<std::string, float> detection_confidences;
    // grab default confidence vals
    try {
        detection_confidences = getConfigValue<std::unordered_map<std::string, float>>(config_, {"nn", "default_confidence"});
    } catch (const std::exception& e) {
        ROS_ERROR_STREAM("Error parsing detection confidence: " << e.what());
    }

    grover_msgs::StringArray nn_classes_msg;
    for (const auto& nn_class : nn_classes) {
        nn_classes_msg.data.push_back(nn_class);

        auto it = detection_confidences.find(nn_class);
        if (it != detection_confidences.end()) {
            m_detection_class_conf.push_back(std::make_pair(nn_class, it->second));
        } else {
            m_detection_class_conf.push_back(std::make_pair(nn_class, confidence_threshold));
        }
    }

    m_nn_classes_pub = nh_.advertise<grover_msgs::StringArray>(cam_name_ + "/nn_classes", 1, true);
    m_nn_classes_pub.publish(nn_classes_msg);

    

    fillNNSettings<dai::node::YoloSpatialDetectionNetwork>(nn_config_path, yolospatialdetectionnetwork);
    yolospatialdetectionnetwork->input.setBlocking(true);
    yolospatialdetectionnetwork->setBoundingBoxScaleFactor(0.5);
    yolospatialdetectionnetwork->setDepthLowerThreshold(150);
    yolospatialdetectionnetwork->setDepthUpperThreshold(15000);
    yolospatialdetectionnetwork->setIouThreshold(0.5f);

    // possible tracking types: ZERO_TERM_COLOR_HISTOGRAM, ZERO_TERM_IMAGELESS, SHORT_TERM_IMAGELESS, SHORT_TERM_KCF
    objecttracker->setTrackerType(dai::TrackerType::ZERO_TERM_IMAGELESS);
    // take the smallest ID when new object is tracked, possible options: SMALLEST_ID, UNIQUE_ID
    objecttracker->setTrackerIdAssignmentPolicy(dai::TrackerIdAssignmentPolicy::SMALLEST_ID);

    manip->setMaxOutputFrameSize(1382400);
    manip->initialConfig.setResize(1280, 720);
    manip->initialConfig.setFrameType(dai::ImgFrame::Type::NV12);

    monoleft->out.link(stereo->left);
    monoright->out.link(stereo->right);

    camrgb->video.link(manip->inputImage);
    manip->out.link(videoenc_webui->input);
    videoenc_webui->bitstream.link(xoutvideostream->input);
    camrgb->video.link(videoenc->input);

    monoright->out.link(videoenc_right->input);
    monoleft->out.link(videoenc_left->input);
    videoenc->bitstream.link(xoutvidenc->input);
    videoenc_right->bitstream.link(xoutmonoenc_right->input);
    videoenc_left->bitstream.link(xoutmonoenc_left->input);

    stereo->depth.link(xoutdepth->input);

    imu->out.link(xoutIMU->input);

    camrgb->preview.link(yolospatialdetectionnetwork->input);
    stereo->depth.link(yolospatialdetectionnetwork->inputDepth);
    yolospatialdetectionnetwork->passthrough.link(objecttracker->inputTrackerFrame);
    yolospatialdetectionnetwork->passthrough.link(objecttracker->inputDetectionFrame);
    yolospatialdetectionnetwork->out.link(objecttracker->inputDetections);
    yolospatialdetectionnetwork->out.link(xoutdetections->input);

    objecttracker->out.link(xouttracks->input);

Any insights would be greatly appreciated, thanks

@laurence-diack-pk laurence-diack-pk added the bug Something isn't working label Aug 19, 2024
moratom (Collaborator) commented Aug 22, 2024

Thanks for the bug report @laurence-diack-pk !

Just to clarify, the disconnects happen whilst you're running the app right?

moratom (Collaborator) commented Aug 22, 2024

@SzabolcsGergely could you take a look at the crash dumps when you have a moment?

SzabolcsGergely (Collaborator) commented:

> @SzabolcsGergely could you take a look at the crash dumps when you have a moment?

The crash occurred during an XLink read, in XLinkPlatformRead; the reason is unknown.

laurence-diack-pk (Author) commented Aug 25, 2024

> Thanks for the bug report @laurence-diack-pk !
>
> Just to clarify, the disconnects happen whilst you're running the app right?

Yeah, it seems it can happen on pipeline load or mid-run.

It doesn't seem to be a very predictable failure, and I'm having a hard time reproducing it consistently. For example, I'm looking at an instance right now where one of the two cameras has disappeared, but I had to restart the host several times to get it into this state.

It may also be that the crash dumps don't correlate 1:1 with this failure, as I have observed cases where the cameras disappear and no crash dump is retrieved.

Sorry for the vagueness; it's a bit of a black box from my end. If I can't communicate with the camera over the network, it's hard to tell exactly what's going on.

I was wondering if there's any additional logging I can pull off the device itself, or perhaps some way I could use the M8 connector to debug over UART or USB, so I can get some insight into the state of the camera when it disappears like that.
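For reference, DepthAI can mirror device-side logs to the host at higher verbosity. A sketch, assuming the `setLogLevel`/`setLogOutputLevel` names from depthai-core (worth verifying against the SDK version in use):

```cpp
// Sketch: raise device-side log verbosity and mirror it to the host.
// API names are taken from depthai-core and may differ across versions.
#include <depthai/depthai.hpp>

dai::Device device(pipeline);
device.setLogLevel(dai::LogLevel::DEBUG);        // verbosity recorded on the device
device.setLogOutputLevel(dai::LogLevel::DEBUG);  // verbosity forwarded to the host console
```

Setting the `DEPTHAI_LEVEL=debug` environment variable before launching the application is another way to raise host-library verbosity, though none of this helps once the device is fully off the network.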

laurence-diack-pk (Author) commented:

I trimmed down the pipeline slightly by conditionally removing nodes that weren't necessary (the ImageManip node and the mono encoders/outputs), and that seemed to help stability greatly, though I have still had a few intermittent failures.

See: crashDump_0_18443010C15E9F0F00_9ed7c9ae4c232ff93a3500a585a6b1c00650e22c.json
