Examples: RTPEngine speech

Lorenzo Mangani edited this page Oct 14, 2017 · 10 revisions

Speech Recognition

HOMER + OpenSIPS + RTPEngine

What if we could "read" transcribed user conversations directly in HOMER sessions? Well...

We Can!

Speech recognition for VoIP has been around for ages, and the recent storm of hosted ML transcription services keeps this an evergreen subject to hack around with. In this guide, we'll assemble one of the many possible setups to create an "intercepting" SIP+RTP proxy able to post-process recorded calls and correlate them for fun and profit using OpenSIPS, RTPEngine and the Bing Speech API (more in the future).

HOW?

OpenSIPS will act as a transparent proxy, hooking media streams through the latest RTPEngine, which features recording capabilities driven by a new set of dedicated controls available directly at dialplan level, plus dedicated init options:

  --recording-dir=FILE             Spool directory where PCAP call recording data goes
  --recording-method=pcap|proc     Strategy for call recording
  --recording-format=raw|eth       PCAP file format for recorded calls.
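
As a rough illustration of how these three options fit together, the sketch below assembles them into a single flag string and rejects values outside the choices listed in the help text. The helper name `recording_flags` is hypothetical, and `/recording` is only assumed here because it is the volume used later in this guide.

```shell
# Hypothetical helper: print the three recording flags after checking the
# method and format against the choices shown in the help text above.
recording_flags() {
  dir="$1"; method="$2"; format="$3"
  case "$method" in
    pcap|proc) ;;
    *) echo "unknown recording method: $method" >&2; return 1 ;;
  esac
  case "$format" in
    raw|eth) ;;
    *) echo "unknown recording format: $format" >&2; return 1 ;;
  esac
  printf '%s\n' "--recording-dir=$dir --recording-method=$method --recording-format=$format"
}

# Example values only; /recording is an assumed spool directory.
recording_flags /recording pcap eth
# prints: --recording-dir=/recording --recording-method=pcap --recording-format=eth
```

The printed string would be appended to whatever other options your RTPEngine instance is already started with.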

In this experiment, we'll leverage this mix to create an intercepting proxy where we can emulate and extend a Sip:Wise dangerous demo on the same subject and do the following:

  • Record Complete Calls through our intercepting-proxy
  • Post-Process Recordings for Speech Recognition
  • Send any Transcription results in HEP format to HOMER

Setup

The ingredients of our intercepting recipe:

Lazy Mode 100

  • Good news! We prepared a fully working Docker container with all of the elements required for this demo!
  • Bad news! You'll need to build it locally to match your kernel version (unless you're on 3.16.0).

Docker Everything!

For the sake of simplicity, we'll use a container running OpenSIPS + RTPEngine recorder with open relay settings, thus able to proxy SIP towards any target system of choice - in other words, we will use a real SIP account on the other end.

1: Building from Source

In order for RTPEngine to insert and use its kernel recording modules on a given Docker system, the container must be built for the specific underlying OS kernel version. Since we're at it, we'll build everything from source! Mind that this might fail when using "excessively virtualized" Docker flavors.

Make sure a recent Docker is installed, and proceed to build your container:

git clone -b dev https://github.com/lmangani/docker-rtpagent-speech
cd docker-rtpagent-speech
docker build -t qxip/docker-rtpengine-speech .

NOTE: If you're running on Debian 8 with kernel 3.16.0-4, you can use the master repository and the prebuilt packages.
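
A quick way to tell which path applies to your host is to compare the running kernel release against the prebuilt one. This is a minimal sketch: `kernel_matches_prebuilt` is a hypothetical helper, and it assumes the release string reported by `uname -r` starts with `3.16.0-4` on the Debian 8 systems mentioned above.

```shell
# Hypothetical helper: succeed when the given kernel release string
# starts with 3.16.0-4 (the prebuilt case), fail otherwise.
kernel_matches_prebuilt() {
  case "$1" in
    3.16.0-4*) return 0 ;;
    *) return 1 ;;
  esac
}

# Containers share the host kernel, so check the host itself.
if kernel_matches_prebuilt "$(uname -r)"; then
  echo "Kernel matches: the master branch and prebuilt packages should do."
else
  echo "Kernel differs: build from the dev branch as described above."
fi
```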

2: Configuring the Stack

The repository ships with a sample docker-compose file ready to be customized with our parameters. Adjust the port range parameters to your liking, and enter the details of your HOMER installation:

version: '2.2'
services:
  opensips-rec:
    image: qxip/docker-rtpengine-speech
    privileged: true
    restart: always
    environment:
      ADVERTISED_RANGE_FIRST: 20000
      ADVERTISED_RANGE_LAST: 20100
      HOMER_SERVER: 'YOUR_HOMER_IP'
      HOMER_PORT: 9060
      BING_KEY: 'YOUR_KEY_HERE'
    volumes:
      - /var/lib/mysql
      - /recording
    ports:
      - "5060:5060/udp"
      - "5061:5061/tcp"
      - "20000-20100:20000-20100/udp"
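  captagent:
    container_name: captagent
    image: qxip/captagent-docker
    # Share the network namespace of the opensips-rec service
    network_mode: "service:opensips-rec"
    environment:
      - ETHERNET_DEV=any
      # No quotes here: in list form they would become part of the value
      - CAPTURE_HOST=YOUR_HOMER_IP
      - CAPTURE_PORT=9060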

Once you're happy with the settings, launch the container set:

docker-compose up -d
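
Before placing a call, it can help to confirm that the containers actually stayed up. The sketch below polls the container state a few times. `wait_for_running` is a hypothetical helper, and it assumes a `docker` CLI on the host and the `captagent` container name from the compose file above.

```shell
# Hypothetical helper: return 0 once the named container reports a
# running state, or 1 after the given number of one-second attempts.
wait_for_running() {
  name="$1"; tries="${2:-10}"
  while [ "$tries" -gt 0 ]; do
    state="$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null)"
    if [ "$state" = "true" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Usage (not executed here):
#   wait_for_running captagent 15 && echo "captagent is up"
```

`docker-compose ps` gives a similar overview for all services at once.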

3: Make a Call!

This is the easy part. Configure your existing SIP account to proxy through your shiny new proxy. With a pinch of luck you'll quickly get REGISTERED. Next, make a call to your voicemail, yourself or a friend (no screaming monkeys for this one, though).

Keep things short for simplicity, maybe add a few breaks between sentences, and hang up!

4: Check for Results

The call recordings will be picked up and processed within 30 seconds of hangup by the built-in Node.js app.

Allow some time for this process to take place (you can watch syslog inside the container for action) and then proceed to locate your call session in HOMER or HEPIC. If things went right, a few log entries should magically appear, revealing your conversation (or at least, what Bing Speech made of it!)

Did it work? Any profanity detected? Share your results with us or shout out @sipcapture on Twitter!

Troubleshooting

I get no recordings / transcriptions
  • Make sure you entered your Bing key and HOMER details correctly.
  • Shell into the container and tail /var/log/syslog.
  • Check for errors related to kernel modules or recording targets.
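
The last check can be narrowed down with two small helpers run inside the container. Treat them as a sketch under assumptions: both function names are hypothetical, `/recording` is the volume from the compose file, and `xt_RTPENGINE` is assumed to be the name under which the RTPEngine kernel module shows up in the module list.

```shell
# Hypothetical helper: count PCAP files anywhere below a spool directory.
count_recordings() {
  find "$1" -type f -name '*.pcap' 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical helper: succeed if a module name appears in a module list.
# The list file defaults to /proc/modules, the same data lsmod reads.
module_loaded() {
  grep -q "^$1 " "${2:-/proc/modules}"
}

# Usage inside the container (not executed here):
#   count_recordings /recording
#   module_loaded xt_RTPENGINE || echo "kernel module not loaded"
```

A count of zero after a completed call points at the recording side, while recordings without transcriptions point at the Bing key or the HOMER details.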