OSD-GDB server: Provide interface to connect OSD with GDB [WIP] #20

shivmgg · 2018-06-27T20:09:10Z

This PR adds OSD-GDB server which enables GDB client to connect to an OSD-enabled system. It implements GDB Remote Serial Protocol (RSP) which provides a high-level protocol allowing GDB to connect to any target remotely. The current implementation supports OpenRISC 1000 architecture and is experimental in nature.

The current iteration of the program provides the API to support:

connection to GDB (client) via TCP/IP
send and receive RSP packets to/from the GDB client.
encode and validate the obtained RSP packet
an interface to interact with MAM and CDM module using osd_hostmod. These modules enable the server to manipulate two key areas, i.e. main memory (using MAM module) and Special Purpose Registers (using CDM module) in the CPU core.
following RSP packets (yet, to be tested):
- `g’
  Read general purpose registers
- `G’XX…
  Write general purpose registers
- `p n'
  Read the value of register n
- `P n...=r...'
  Write register n... with value r
- m’ADDR`,’LENGTH
  Read length addressable memory units starting at address addr
- M’ADDR,LENGTH`:’XX
  Write length addressable memory units starting at address addr (see addressable memory unit). The data is given by XX…; each byte is transmitted as a two-digit hexadecimal number.

RSP commands from the client to the server are textual strings. Each packet is acknowledged with a single character '+' to indicate valid receipt and ‘-’ to indicate the failure.

The PR also contains a mock GDB client to check and validate TCP connection with the server and test various requests/response commands and packets.

Closes #19

imphil · 2018-06-28T08:28:49Z

Thanks for this PR! Before we go into the details, let's start with some fundamental things.

Please separate between public and private API. The public API goes into the .h file and has corresponding functions without "static" in the .c file, but marked with the API_EXPORT macro. On the other hand private functions are only in the .c file and marked with "static".
This is especially important to quickly see the entry points to your class.
Even though the class is named "osd_gdbserver" the files are typically named "gdbserver". Please look at the existing code for these naming conventions.
Please take e.g. the osd_systracelogger class as an example how to make a class out of your collection of functions. This includes the _new() and _free() functions and the context object (i.e. the "this" pointer). This should allow you to get rid of the "connection" struct. All functions the public API should be prefixed with osd_gdbserver_, the private ones are typically not.
The naming of the functions could be a bit more intuitive. "get" and "put" are typically operations on a list or similar data structures, not on a network socket. A "packet" is ambiguous in OSD since we also have the debug packets around.
We need to work out how to handle blocking, i.e. where blocking is expected, and where the API should/must be asynchronous. I guess this will become clearer once we have a first application actually using the class. Which brings us to the next point.
Testing: Unit tests are required. You can start by testing the functions which don't require a network socket, but at some point in time you'll need to create a "mock gdb client", i.e. code which actually connects to your TCP port. You can then instruct this code to send specific requests and responses. Examples of such functionality are in the source code (e.g. mock_hostmod).
Development is much easier if you start right away creating an application using your new class. If you look at osd-systrace-log for example you see that it's not hard (most of the code is boilerplate and can be copied).

So that's a couple points that I'd like to see worked on first. Once this is done we can focus on the smaller things, and talk about how to actually implement the handling of commands coming in from GDB.

shivmgg · 2018-06-28T19:10:20Z

@imphil Thanks a lot for the review.
I have tried my best to cover everything as stated in the first few points.

Testing: Unit tests are required. You can start by testing the functions which don't require a network socket, but at some point in time you'll need to create a "mock gdb client", i.e. code which actually connects to your TCP port. You can then instruct this code to send specific requests and responses. Examples of such functionality are in the source code (e.g. mock_hostmod).

I will try adding unit tests for send/receive_rsp_packet() functions. The idea of writing a "mock gdb client" is a little frightening but still exciting right now. I will look into the code of mock_hostmod() for more details.

Development is much easier if you start right away creating an application using your new class. If you look at osd-systrace-log for example you see that it's not hard (most of the code is boilerplate and can be copied).

I think you meant "osd-target-run". Please correct me, if I am wrong. I will start with it right away.

Edit: I tried adding unit tests for validate_rsp_packet() function but cannot run them since they need access to the TCP port.

(1) Connect to GDB over TCP (2) Send and Recieve data to/from the client (3) Receive RSP packet from the client (4) Validate the obtained packet using checksum (5) Send RSP packet to the client

imphil · 2018-06-30T16:32:06Z

The challenges you have in testing are indeed tricky and require a bit of thought. Designing software in a way that it is testable isn't always easy, but will lead to a much clearer software design in the end.

In general, unit tests test the public API of a class. For us that's all functions which appear in the include/osd/*.h files, which are not marked "static" and have the API_EXPORT macro set (which makes the functions visible outside the library).

For our osd_gdbserver class the public API is not exposing the functions which parse and assemble the stream of GDB commands. But in fact this code is the most tricky to write and benefits most from unit tests. So we should find a way to make these functions testable.

So what options do we have? The easiest way is to make the functions we want to test "public", at least when running the unit tests. I'd hence go ahead and do the following:

Don't mark the parsing and assembling functions static.
Create a header file (e.g. src/gdbserver-private.h) in which you declare the functions. (You don't need to add this header file to a Makefile.)
Include this private header file in the unit test: #include "../../src/libosd/gdbserver-private.h"

You can now call such "private" functions in your test code.

Prerequisite, however, is you don't have a Matryoshka doll-like design in your functions, i.e. functions which call other subfunctions in a nesting fashion. An example is the function send_rsp_packet(). Right now it has two functions: assemble the data to send according to the RSP protocol, and send it over the TCP connection. To test this function you need to either have the full TCP connection tested with it, or somehow intercept the sending functions, both not good options. Things get easier if you create two functions instead. One assembles the output string according to the RSP protocol, and returns the buffer. And the other one then sends this buffer over TCP. This makes it much easier to test the first function, which is the more tricky one.

shivmgg · 2018-06-30T18:05:22Z

Thanks a lot for the guidance.

Prerequisite, however, is you don't have a Matryoshka doll-like design in your functions, i.e. functions which call other subfunctions in a nesting fashion. An example is the function send_rsp_packet(). Right now it has two functions: assemble the data to send according to the RSP protocol, and send it over the TCP connection. To test this function you need to either have the full TCP connection tested with it, or somehow intercept the sending functions, both not good options. Things get easier if you create two functions instead. One assembles the output string according to the RSP protocol, and returns the buffer. And the other one then sends this buffer over TCP. This makes it much easier to test the first function, which is the more tricky one.

Some more issues:

Almost every function needs access to "osd_gdbserver_ctx" struct object. The object's member elements like "buffer", etc. are initialised using osd_gdbserver_read_data() function. This function cannot be invoked right now.
One alternative solution could be to have a set_osd_gdbserver_ctx() function and set these values manually during testing. Similarly, having a get_osd_gdbserver_ctx() function might help.

Edit: Implementation like the one below can be used to test send_rsp_packet_inner.

void send_rsp_packet_inner(char *buffer, int len, char *packet_buffer,
                                 int packet_checksum)
{
    packet_buffer[0] = '$';
    memcpy(packet_buffer + 1, buffer, len);
    int j = len + 1;
    packet_buffer[j++] = '#';
    for (int i = 0; i < len; i++) {
         packet_checksum += buffer[i];
    }
    packet_buffer[j++] = dectohex((packet_checksum >> 4) & 0xf);
    packet_buffer[j] = dectohex(packet_checksum & 0xf);
}

osd_result send_rsp_packet(struct osd_gdbserver_ctx *ctx, char *buffer,
                           int len)
{
    char packet_buffer[len + 3];
    int packet_checksum = 0;
    osd_result rv;

    while(1) {
        send_rsp_packet_inner(buffer, len, packet_buffer, &packet_checksum);
        rv = osd_gdbserver_write_data(ctx, packet_buffer, len + 4);
        if (OSD_FAILED(rv)) {
            return OSD_ERROR_FAILURE;
        }

        rv = osd_gdbserver_read_data(ctx);
        if (OSD_FAILED(rv)) {
            return OSD_ERROR_FAILURE;
        }

        char reply = ctx->buffer[0];
        if (reply == '+') {
            break;
        } else {
            return OSD_ERROR_FAILURE;
        }
    }
}

imphil · 2018-06-30T19:10:23Z

Your version in the edit got it right. Functions like send_rsp_packet_inner() (better name needed) don't need the context struct and hence can easily be tested on its own. Please go ahead and try to update the PR in this style (and add the tests).

imphil · 2018-06-30T19:33:56Z

I forgot to mention: you don't need to make all functions which are currently static functions, i.e. helpers, available for testing. But you absolutely should test the functions which implement the RSP protocol and deal with string buffers, since those functions are very easy to get wrong (functional errors, buffer overflows, other memory errors, etc.)

shivmgg · 2018-07-06T14:40:28Z

@imphil Apologies for the delay.
I have added unit tests for the private functions. Please take a look whenever you get some time.
I think I should start working on the implementation of different RSP commands.

These functions implement the RSP protocol and deal with the string buffer.

imphil

Thanks for the updated PR. As you have noticed, working on memory buffers in C requires some practice. That also means we're unfortunately not there yet. I've for now focused on the encoding/decoding functions for the RSP packets, and added comments into the source code.

In addition I have three general remarks:

In almost all cases, buffer is a bad name for a function argument as it is to unspecific. Choose a name according to what is actually inside the buffer -- that's what people care about.
In your tests, don't just test a single combination of arguments. Test multiple ones, and include tricky edge cases, or cases that are expected to return an error (and make sure this error actually happens).
Programming in C requires great care being taken to memory errors, such as the wrong length of a string, writing beyond memory boundaries, etc. Most of these problems can be avoided by making it very clear (a) who's reponsible of allocating and freeing memory (e.g. the caller or callee of a function), and (b) never using buffers of static size (e.g. char[1024]) unless you are very sure and document that, that the data you're writing into the buffer will not overflow.

imphil · 2018-07-09T09:20:15Z

src/libosd/gdbserver.c

+
+//API_EXPORT
+osd_result validate_rsp_packet(char *buf_p, bool *ver_checksum, int *len,
+                               char *buffer)


A couple problems here:

validate_rsp_packet does not accurately describe what's going on in this function. From my reading, decodes a RSP packet according to the rules in https://sourceware.org/gdb/onlinedocs/gdb/Overview.html#Overview. Mention this URL in the doxygen comment for people to look up what you're actually implementing. (Correspondingly, there's also an encoding function, which you named configure_rsp_packet). How about something like rsp_packet_encode() for a function name?

The input data you pass is given as string buffer buf_p with no length. Your implementation works on the assumption that the string contains a # somewhere, otherwise it will just continue in the while loop and at some point in time it will be killed when it accesses memory it may not access. You have two options to solve that: either pass a input string and a length of this string, or assume the string is null-terminated (and assert on that assumption in the calling function!) For safety reasons I'd go with passing the length of the buffer.

Your output data is written to buffer, the length of the valid data in this buffer is in len. That does not work. The caller must allocate memory used for buffer. How much memory should it allocate? It cannot know how much data you'll write into the buffer inside your function (even you cannot give an upper bound, since it depends on the size of your input data). Again you have multiple options: (1) allocate a new buffer within this function and pass it back to the caller (and document that the caller is responsible for freeing it), (2) Operate in-place, since the output data will always be shorter than the input data (if I understood the protocol right), or (3) let the function take three arguments: a pointer to an allocated buffer (like buffer right now), the size of this buffer (i.e. how large it was allocated), and as a third argument have an output argument (similar to len right now) that indicates how many valid bytes the buffer now contains.

The status "is the packet valid" is passed back to the caller through the output variable var_checksum. Why not just use the return value of the function for that?

imphil · 2018-07-09T09:21:16Z

src/libosd/gdbserver.c

+
+    // packet-format: $packet-data#checksum
+    // traversing through the obtained packet till we obtained '#'
+    while (1) {


As said before: make sure that this loop ends even if the input data is corrupted/invalid. The easiest way is iterating only until the known end of the string (i.e. a length that you passed in, or a null-byte if the string is null-terminated)

imphil · 2018-07-09T09:22:18Z

src/libosd/gdbserver.c

+        * character followed by the original character XORed with 0x20.
+        */
+        if (packet_char == '}') {
+            val_checksum += packet_char & 0xff;


packet_char is a 8 bit value. With `& 0xff`` you mask out the first 8 bit, i.e. do nothing => simply remove that operation.

imphil · 2018-07-09T09:22:38Z

src/libosd/gdbserver.c

+    packet_char = *buf;
+    packet_checksum[1] = packet_char;
+    packet_checksum[2] = 0;
+    *ver_checksum = (val_checksum == strtoul(packet_checksum, NULL, 16));


as said before: make that a return value.

imphil · 2018-07-09T09:23:09Z

src/libosd/gdbserver.c

+
+    if (OSD_FAILED(rv)) {
+        return rv;
+    } else {


no need to have an else after a return.

imphil · 2018-07-09T09:23:49Z

src/libosd/gdbserver.c

+}
+
+API_EXPORT
+osd_result configure_rsp_packet(char *buffer, int len, char *packet_buffer)


See the long comment in validate_rsp_packet and apply the suggestions on this function as well.

Adds more test cases to check validate/encode functions.

shivmgg · 2018-07-09T17:35:29Z

Thanks for the review. I have incorporated almost every change requested. I have also added two more unit tests to check some corner cases. Here's a brief description of the two functions:

validate_rsp_packet(): This function traverses through RSP packet "$packet-data#checksum" starting from '$' till '#'. It then calculates the checksum of the obtained packet-data and compares it with the received checksum. This function ensures that the packet-data obtained is valid and uncorrupted.
encode_rsp_packet() (name changed): This function is responsible for packaging of the packet-data into RSP packet format. It also calculates the checksum of the packet data to be sent to the client (GDB).

Regarding the size of buffer, we can keep it constant and equal to the MAX_GDB_PACKET_SIZE, i.e., 1024 (currently). This is the maximum size of the packet received/send from/to the GDB via TCP. Also, the commands we are supporting right now will suffice with this size.

The largest packet that will be needed is to hold a command with the hexadecimal values of all the general registers (for the G packets).

Source: https://www.embecosm.com/appnotes/ean4/embecosm-howto-rsp-server-ean4-issue-2.html#sec_exchange_target_remote

In case of OR1K, there are a total of 32 32-bit registers, each requiring 8 hex characters + 1 character for the 'G', a total of 257 (hexadecimal 0x101) characters.

(1) 'g' : read general registers (2) 'G XX' : write general registers (3) 'p n ' : read the value of a specific register n (4) 'P n..=r..': write a specific register n with value r

shivmgg · 2018-07-12T16:35:43Z

I have added code implementation for the following GDB commands:

'g' : read general registers
'G XX' : write general registers
'p n ' : read the value of a specific register n
'P n..=r..' : write a specific register n with value r

Next, I will add support for memory read/write command and then, connect GDB using target command. I need to complete the following RSP packet exchanges for the GDB target remote command to support connectivity.

shivmgg · 2018-07-13T16:33:03Z

@imphil

Can you please suggest some way to connect to the actual GDB (client)? I have added support for almost every command required to complete the RSP packet exchanges for the GDB target command. Is it possible using unit tests?
I think the major roadblock is testing. I am not sure if we can connect to GDB using unit tests. Almost every other function needs access to the context struct and hence GDB.
If somehow, we can connect to GDB in real-time, testing and the development part of the other blocks can be increased many-fold.
Also, do the program needs two hostmod_ctx, one for CDM client class and another for MAM client class?

1. m addr,length : Read length bytes of memory starting at addr. 2. M addr,length:XX... : Write length bytes of memory starting at addr. 3. Unit tests to check intermediate utility functions

imphil · 2018-07-16T07:52:04Z

Unit testing can and needs to be done through a mock gdb client. I already outlined that in one of the previous discussions, please implement that. Before such testing is in place I will not have a further look at the code.

shivmgg · 2018-07-16T07:56:07Z

Thanks a lot for the response.
Regarding the implementation of mock GDB client, it should be similar to mock_hostmod.c and should have features of the actual GDB, right?

imphil · 2018-07-16T07:58:35Z

It should open a tcp connection, and act like a gdb client, by sending commands that you told the mock client to do, and verifying the response.

shivmgg · 2018-07-22T13:18:26Z

I have added a rough implementation of mock GDBclient program and unit tests to check and validate the TCP connection between GDB server and mock GDBclient program. The test cases are working fine. Currently, I used PThread API to establish a TCP connection.

GDBserver has the following two programs:

a hostmod program which provides access to CDM and MAM client classes. This program uses worker class which in turn utilizes PThread API and ZeroMQ sockets for communication.
a GDB client which provides access to all debugging features. Currently, I have used PThread API and standard TCP protocol to implement it.

I think we need to design the code such that there is no deadlock between the two threads/programs.
Both of these programs need access to the same buffer data.

Also, as you pointed out earlier, we need to figure out the non-blocking aspect of the code. For this part, I tried using:

setsockopt(cli_socket, SOL_SOCKET, SO_RCVTIMEO, (struct timeval *)&tv, sizeof(struct timeval));

but it didn't work.
I will explore the non-blocking aspect of the program using the Event Poll interface.

Send/receive and read/write functions are working successfully using the existing implementation (with slight changes). Maybe, for now, I should focus on completing the GDBserver with blocking I/O only, i.e. adding add_breakpoint/remove_breakpoint functions. I can switch to non-blocking I/O and other concepts later.

imphil

Thanks Shivam for the update on this PR. Getting parallelism right in any programming language is tricky and requires thinking of multiple components and interfaces at the same time. I know this is not an easy task, but I'm sure the experience will be worth it.

I focus in this review on the public API and the threading questions. I leave all further questions to future iterations.
I did refrain for now from discussing the GDB buffer sizing and related discussions. There is more to be said later.

I noted a couple things regarding the public API in the code (and some other drive-by comments).
The most important part now is to get the parallelism right.

To get a better understanding of the parallelism I'm proposing you draw a message sequence chart (e.g. using draw.io) which includes the participants (classes and other components like the GDB client) and the communication between them.

Two communication paths need to be supported.

Path 1: gdb-client initiated commands:
GDB client -> osd_gdbserver -> (some function handling the command, typically using osd_hostmod) -> ... (response)

Questions to be answered:

Which things can happen in parallel (i.e. are asynchronus)
Is the gdbserver protocol a synchronous (blocking) or asynchronous protocol?
Example: if the gdb client sends a command to read memory from the target, is the answer sent when the memory has been read, or does it return immediately (e.g. with an acknowledgement saying "command receive"), and at some later point a new message is sent with the data?
Answering the previous question is essential to the design of the osd_gdbserver module: it determines if you need to (potentially) handle multiple commands in parallel, or just one.

Path 2: event notifications (e.g. breakpoint notifications)
osd_hostmod (events) -> osd_gdbserver -> (TCP send) -> GDB client

osd_hostmod provide a polling interface (osd_hostmod_event_receive()) and a asynchronous interface (register a event rx callback with osd_hostmod_new()). Depending on the answer of the remaining design both could be used.

Questions to be answered:

How are these notifications being combined with the gdb client-initiated data? I.e. breakpoint notifications be sent to a gdb client at any time, or only when no other command (see above) is being processed?

Things already known:

You need at least one thread within osd_gdbserver which handles the communication with the gdb client. This thread is started when the port is bound (e.g. in osd_gdbserver_start() or _connect(), whatever name you choose in the end), and torn down when the osd_gdbserver is stopped/disconnected.

Things not yet known:

You possibly need more threads, depending on the answers to the previous questions.
Depending on the communicaton needs between the threads, you may get away with simple pthreads (if you need almost no communication), or you need to use the worker helper class (if you have more communication between threads).

imphil · 2018-07-30T08:30:04Z