Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XDP & AF_XDP for IAVF based on libie and page-pool #19

Open
wants to merge 40 commits into
base: net-next
Choose a base branch
from

Commits on Mar 24, 2023

  1. net: intel: introduce Intel Ethernet common library

    Not a secret there's a ton of code duplication between two and more Intel
    ethernet modules.
    Before introducing new changes, which would need to be copied over again,
    start decoupling the already existing duplicate functionality into a new
    module, which will be shared between several Intel Ethernet drivers.
    Add the lookup table which converts 8/10-bit hardware packet type into
    a parsed bitfield structure for easy checking packet format parameters,
    such as payload level, IP version, etc. This is currently used by i40e,
    ice and iavf and it's all the same in all three drivers.
    The only difference introduced in this implementation is that instead of
    defining a 256 (or 1024 in case of ice) element array, add unlikely()
    condition to limit the input to 154 (current maximum non-reserved packet
    type). There's no reason to waste 600 (or even 3600) bytes only to not
    hurt very unlikely exception packets.
    The hash computation function now takes payload level directly as a
    pkt_hash_type. There's a couple cases when non-IP ptypes are marked as
    L3 payload and in the previous versions their hash level would be 2, not
    3. But skb_set_hash() only sees difference between L4 and non-L4, thus
    this won't change anything at all.
    The module is behind the hidden Kconfig symbol, which the drivers will
    select when needed. The exports are behind 'LIBIE' namespace to limit
    the scope of the functions.
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    67bcf37 View commit details
    Browse the repository at this point in the history
  2. iavf: kill "legacy-rx" for good

    Ever since build_skb() became stable, the old way with allocating an skb
    for storing the headers separately, which will be then copied manually,
    was slower, less flexible and thus obsolete.
    
    * it had higher pressure on MM since it actually allocates new pages,
      which then get split and refcount-biased (NAPI page cache);
    * it implies memcpy() of packet headers (40+ bytes per each frame);
    * the actual header length was calculated via eth_get_headlen(), which
      invokes Flow Dissector and thus wastes a bunch of CPU cycles;
    * XDP makes it even more weird since it requires headroom for long and
      also tailroom for some time (since mbuf landed). Take a look at the
      ice driver, which is built around work-arounds to make XDP work with
      it.
    
    Even on some quite low-end hardware (not a common case for 100G NICs) it
    was performing worse.
    The only advantage "legacy-rx" had is that it didn't require any
    reserved headroom and tailroom. But iavf didn't use this, as it always
    splits pages into two halves of 2k, while that save would only be useful
    when striding. And again, XDP effectively removes that sole pro.
    
    There's a train of features to land in IAVF soon: Page Pool, XDP, XSk,
    multi-buffer etc. Each new would require adding more and more Danse
    Macabre for absolutely no reason, besides making hotpath less and less
    effective.
    Remove the "feature" with all the related code. This includes at least
    one very hot branch (typically hit on each new frame), which was either
    always-true or always-false at least for a complete NAPI bulk of 64
    frames, the whole private flags cruft and so on. Some stats:
    
    Function: add/remove: 0/2 grow/shrink: 0/7 up/down: 0/-774 (-774)
    RO Data: add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-40 (-40)
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    0e490c5 View commit details
    Browse the repository at this point in the history
  3. iavf: optimize Rx buffer allocation a bunch

    The Rx hotpath code of IAVF is not well-optimized TBH. Before doing any
    further buffer model changes, shake it up a bit. Notably:
    
    1. Cache more variables on the stack.
       DMA device, Rx page size, NTC -- these are the most common things
       used all throughout the hotpath, often in loops on each iteration.
       Instead of fetching (or even calculating, as with the page size) them
       from the ring all the time, cache them on the stack at the beginning
       of the NAPI polling callback. NTC will be written back at the end,
       the rest are used read-only, so no sync needed.
    2. Don't move the recycled buffers around the ring.
       The idea of passing the page of the right-now-recycled-buffer to a
       different buffer, in this case, the first one that needs to be
       allocated, moreover, on each new frame, is fundamentally wrong. It
       involves a few o' fetches, branches and then writes (and one Rx
       buffer struct is at least 32 bytes) where they're completely unneeded,
       but gives no good -- the result is the same as if we'd recycle it
       inplace, at the same position where it was used. So drop this and let
       the main refilling function take care of all the buffers, which were
       processed and now need to be recycled/refilled.
    3. Don't allocate with %GPF_ATOMIC on ifup.
       This involved introducing the @gfp parameter to a couple functions.
       Doesn't change anything for Rx -> softirq.
    4. 1 budget unit == 1 descriptor, not skb.
       There could be underflow when receiving a lot of fragmented frames.
       If each of them would consist of 2 frags, it means that we'd process
       64 descriptors at the point where we pass the 32th skb to the stack.
       But the driver would count that only as a half, which could make NAPI
       re-enable interrupts prematurely and create unnecessary CPU load.
    5. Shortcut !size case.
       It's super rare, but possible -- for example, if the last buffer of
       the fragmented frame contained only FCS, which was then stripped by
       the HW. Instead of checking for size several times when processing,
       quickly reuse the buffer and jump to the skb fields part.
    6. Refill the ring after finishing the polling loop.
       Previously, the loop wasn't starting a new iteration after the 64th
       desc, meaning that we were always leaving 16 buffers non-refilled
       until the next NAPI poll. It's better to refill them while they're
       still hot, so do that right after exiting the loop as well.
       For a full cycle of 64 descs, there will be 4 refills of 16 descs
       from now on.
    
    Function: add/remove: 4/2 grow/shrink: 0/5 up/down: 473/-647 (-174)
    
    + up to 2% performance.
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    88798b9 View commit details
    Browse the repository at this point in the history
  4. iavf: remove page splitting/recycling

    As an intermediate step, remove all page splitting/recyclig code. Just
    always allocate a new page and don't touch its refcount, so that it gets
    freed by the core stack later.
    The change allows to greatly simplify certain parts of the code:
    
    Function: add/remove: 2/3 grow/shrink: 0/5 up/down: 543/-963 (-420)
    
    &iavf_rx_buf can even now retire in favor of just storing an array of
    pages used for Rx. Their DMA addresses can be stored in page::dma_addr
    -- use Page Pool's function for that.
    No surprise perf loses up to 30% here, but that regression will go away
    once PP lands.
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    46eb61c View commit details
    Browse the repository at this point in the history
  5. iavf: always use a full order-0 page

    The current scheme with trying to pick the smallest buffer possible for
    the current MTU in order to flip/split pages is not very optimal.
    For example, on default MTU of 1500 it gives only 192 bytes of headroom,
    while XDP may require up to 258. But this also involves unnecessary code
    complication, which sometimes is even hard to follow.
    As page split is no more, always allocate order-0 pages. This optimizes
    performance a bit and drops some bytes off the object code. Next, always
    pick the maximum buffer length available for this %PAGE_SIZE to set it
    up in the hardware. This means it now becomes a constant value, which
    also has its positive impact.
    On x64 this means (without XDP):
    
    4096 page
    64 head, 320 tail
    3712 HW buffer size
    3686 max MTU w/o frags
    
    Previously, the maximum MTU w/o splitting a frame into several buffers
    was 3046.
    Increased buffer size allows us to reach the maximum frame size w/ frags
    supported by HW: 16382 bytes (MTU 16356). Reflect it in the netdev
    config as well. Relying on max single buffer size when calculating MTU
    was not correct.
    Move around a couple of fields in &iavf_ring after ::rx_buf_len removal
    to reduce holes and improve cache locality.
    Instead of providing the Rx definitions, which can and will be reused in
    rest of the drivers, exclusively for IAVF, do that in the libie header.
    Non-PP drivers could still use at least some of them and lose a couple
    copied lines.
    
    Function: add/remove: 0/0 grow/shrink: 3/9 up/down: 18/-265 (-247)
    
    + even reclaims a half percent of performance, nice.
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    95a993b View commit details
    Browse the repository at this point in the history
  6. net: page_pool: allow DMA mapping with %DMA_ATTR_WEAK_ORDERING

    Add a new flag, %PP_FLAG_DMA_MAP_WEAK, whill will tell PP to map pages
    with %DMA_ATTR_WEAK_ORDERING.
    To keep the code simple and optimized, map the following PP flags to DMA
    map attr flags:
    
    %PP_FLAG_DMA_MAP	=> %DMA_ATTR_SKIP_CPU_SYNC
    %PP_FLAG_DMA_MAP_WEAK	=> %DMA_ATTR_WEAK_ORDERING
    
    The first pair is done to be able to just pass it directly to
    dma_map_page_attrs(). When a driver wants Page Pool to maintain DMA
    mappings, it always sets this flag. Page Pool always skips CPU syncs
    when mapping to do that separately later, so having those two 1:1 avoids
    introducing ifs and/or bit-ors and keeps the code more compact.
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    49b4e5c View commit details
    Browse the repository at this point in the history
  7. net: page_pool: add DMA-sync-for-CPU inline helpers

    Each driver is responsible for syncing buffers written by HW for CPU
    before accessing them. Almost each PP-enabled driver uses the same
    pattern, which could be shorthanded into a static inline to make driver
    code a little bit more compact.
    Introduce a pair of such functions. The first one takes the actual size
    of the data written by HW and is the main one to be used on Rx. The
    second picks max_len from the PP params and is designed for more extreme
    cases when the size is unknown, but the buffer still needs to be synced.
    Also constify pointer arguments of page_pool_get_dma_dir() and
    page_pool_get_dma_addr() to give a bit more room for optimization, as
    both of them are read-only.
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    b71da32 View commit details
    Browse the repository at this point in the history
  8. iavf: switch to Page Pool

    Now that the IAVF driver simply uses dev_alloc_page() + free_page() with
    no custom recycling logics and one whole page per frame, it can easily
    be switched to using Page Pool API instead.
    Introduce libie_rx_page_pool_create(), a wrapper for creating a PP with
    the default libie settings applicable to all Intel hardware, and replace
    the alloc/free calls with the corresponding PP functions, including the
    newly added sync-for-CPU helpers. Use skb_mark_for_recycle() to bring
    back the recycling and restore the initial performance.
    
    From the important object code changes, worth mentioning that
    __iavf_alloc_rx_pages() is now inlined due to the greatly reduced size.
    The resulting driver is on par with the pre-series code and 1-2% slower
    than the "optimized" version right before the recycling removal.
    But the number of locs and object code bytes slaughtered is much more
    important here after all, not speaking of that there's still a vast
    space for optimization and improvements.
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    b71ce3c View commit details
    Browse the repository at this point in the history
  9. libie: add common queue stats

    Next stop, per-queue private stats. They have only subtle differences
    from driver to driver and can easily be resolved.
    Define common structures, inline helpers and Ethtool helpers to collect,
    update and export the statistics. Use u64_stats_t right from the start,
    as well as the corresponding helpers to ensure tear-free operations.
    For the NAPI parts of both Rx and Tx, also define small onstack
    containers to update them in polling loops and then sync the actual
    containers once a loop ends.
    The drivers will be switched to use this API later on a per-driver
    basis, along with conversion to PP.
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    3d884e3 View commit details
    Browse the repository at this point in the history
  10. libie: add per-queue Page Pool stats

    Expand the libie generic per-queue stats with the generic Page Pool
    stats provided by the API itself, when CONFIG_PAGE_POOL is enable.
    When it's not, there'll be no such fields in the stats structure, so
    no space wasted.
    They are also a bit special in terms of how they are obtained. One
    &page_pool accumulates statistics until it's destroyed obviously,
    which happens on ifdown. So, in order to not lose any statistics,
    get the stats and store in the queue container before destroying
    a pool. This container survives ifups/downs, so it basically stores
    the statistics accumulated since the very first pool was allocated
    on this queue. When it's needed to export the stats, first get the
    numbers from this container and then add the "live" numbers -- the
    ones that the current active pool returns. The result values will
    always represent the actual device-lifetime* stats.
    There's a cast from &page_pool_stats to `u64 *` in a couple functions,
    but they are guarded with stats asserts to make sure it's safe to do.
    FWIW it saves a lot of object code.
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    16d126c View commit details
    Browse the repository at this point in the history
  11. iavf: switch queue stats to libie

    iavf is pretty much ready for using the generic libie stats, so drop all
    the custom code and just use generic definitions. The only thing is that
    it previously lacked the counter of Tx queue stops. It's present in the
    other drivers, so add it here as well.
    The rest is straightforward. There were two fields in the Tx stats
    struct, which didn't belong there. The first one has never been used,
    wipe it; and move the other to the queue structure. Plus move around
    a couple fields in &iavf_ring to account stats structs' alignment.
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    1f934b6 View commit details
    Browse the repository at this point in the history
  12. selftests/bpf: robustify test_xdp_do_redirect with more payload magics

    Currently, the test relies on that only dropped ("xmitted") frames will
    be recycled and if a frame became an skb, it will be freed later by the
    stack and never come back to its page_pool.
    So, it easily gets broken by trying to recycle skbs:
    
      test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
      test_xdp_do_redirect:FAIL:pkt_count_zero unexpected pkt_count_zero:
    actual 9936 != expected 2
      test_xdp_do_redirect:PASS:pkt_count_tc 0 nsec
    
    That huge mismatch happened because after the TC ingress hook zeroes the
    magic, the page gets recycled when skb is freed, not returned to the MM
    layer. "Live frames" mode initializes only new pages and keeps the
    recycled ones as is by design, so they appear with zeroed magic on the
    Rx path again.
    Expand the possible magic values from two: 0 (was "xmitted"/dropped or
    did hit the TC hook) and 0x42 (hit the input XDP prog) to three: the new
    one will mark frames hit the TC hook, so that they will elide both
    @pkt_count_zero and @pkt_count_xdp. They can then be recycled to their
    page_pool or returned to the page allocator, this won't affect the
    counters anyhow. Just make sure to mark them as "input" (0x42) when they
    appear on the Rx path again.
    Also make an enum from those magics, so that they will be always visible
    and can be changed in just one place anytime. This also eases adding any
    new marks later on.
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    4e60236 View commit details
    Browse the repository at this point in the history
  13. net: page_pool, skbuff: make skb_mark_for_recycle() always available

    skb_mark_for_recycle() is guarded with CONFIG_PAGE_POOL, this creates
    unneeded complication when using it in the generic code. For now, it's
    only used in the drivers always selecting Page Pool, so this works.
    Move the guards so that preprocessor will cut out only the operation
    itself and the function will still be a noop on !PAGE_POOL systems,
    but available there as well.
    No functional changes.
    
    Reported-by: kernel test robot <[email protected]>
    Link: https://lore.kernel.org/oe-kbuild-all/[email protected]
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    2e28dff View commit details
    Browse the repository at this point in the history
  14. xdp: recycle Page Pool backed skbs built from XDP frames

    __xdp_build_skb_from_frame() state(d):
    
    /* Until page_pool get SKB return path, release DMA here */
    
    Page Pool got skb pages recycling in April 2021, but missed this
    function.
    
    xdp_release_frame() is relevant only for Page Pool backed frames and it
    detaches the page from the corresponding page_pool in order to make it
    freeable via page_frag_free(). It can instead just mark the output skb
    as eligible for recycling if the frame is backed by a pp. No change for
    other memory model types (the same condition check as before).
    cpumap redirect and veth on Page Pool drivers now become zero-alloc (or
    almost).
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    df09c63 View commit details
    Browse the repository at this point in the history
  15. xdp: remove unused {__,}xdp_release_frame()

    __xdp_build_skb_from_frame() was the last user of
    {__,}xdp_release_frame(), which detaches pages from the page_pool.
    All the consumers now recycle Page Pool skbs and page, except mlx5,
    stmmac and tsnep drivers, which use page_pool_release_page() directly
    (might change one day). It's safe to assume this functionality is not
    needed anymore and can be removed (in favor of recycling).
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin committed Mar 24, 2023
    Configuration menu
    Copy the full SHA
    4fcbfcd View commit details
    Browse the repository at this point in the history

Commits on Apr 14, 2023

  1. iavf: optimize Rx hotpath a bunch -- vol. 2

    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    897dcfd View commit details
    Browse the repository at this point in the history
  2. iavf: fixup for optimize vol. 2

    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    d6ea05c View commit details
    Browse the repository at this point in the history
  3. i40e: Unify handling of zero ring length in 'configure queue'

    The current version of Intel 'ice' driver allows for using zero
    for the ring lenghth in 'configure queue' VIRTCHNL message.
    Such a value indicates the ring should not be configured.
    
    Implement the same handling in i40e driver. Instead of returning
    an 'invalid parameter' error for zero-sized rings, just skip
    that ring during queue pair configuration.
    
    That unified handling is needed for AF_XDP implementation for
    'iavf' driver. In that use case we sometimes need to configure
    Tx ring only for a given queue pair.
    
    Signed-off-by: Michal Kubiak <[email protected]>
    michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    8aca6e1 View commit details
    Browse the repository at this point in the history
  4. iavf: Remove IAVF_TX_FLAGS_FD_SB flag

    This flag was never set, so remove it and simplify buffer
    cleaning process.
    
    Signed-off-by: Larysa Zaremba <[email protected]>
    walking-machine authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    557b392 View commit details
    Browse the repository at this point in the history
  5. iavf: Use separate ring masks for TX and RX in q_vector

    Replace the existing ring mask (common for RX and TX rings)
    in iavf_q_vector with two masks dedicated to handling RX and TX
    rings separately.
    
    The virtchnl interface allows separate masks to be used for different
    ring types, so there is no need to merge them into a single mask.
    Also, after adding XDP support to iavf, the number of RX and TX
    rings can be asymmetric. Therefore, this patch is a necessary
    preparation for XDP support.
    
    Signed-off-by: Michal Kubiak <[email protected]>
    michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    80f873c View commit details
    Browse the repository at this point in the history
  6. iavf: Prepare VIRTCHNL functions to support XDP

    The XDP and AF_XDP feature is initialized using .ndo functions. Those
    functions are always synchronous and may require some serious queues
    reconfiguration including changing the number of queues.
    
    Performing such a reconfiguration implies sending a bunch of VIRTCHNL
    messages to the PF in order to disable queues, re-enable and re-configure
    them, or update the RSS LUT.
    By definition, those VIRTCHNL messages are sent asynchronously, so the
    result of each VIRTCHNL operation can be received from the PF via admin
    queue after some time.
    Moreover, the previous implementation of some VIRTCHNL functions (e.g.
    'iavf_disable_queues()' or 'iavf_enable_queues()' does not allow to call
    them selectively for specific queues only.
    
    In order to addres those problems and cover all scenarios of XDP and
    AF_XDP initialization, implement a polling mechanism with a timeout for
    blocking the execution of XDP .ndo functions until the result of
    VIRTCHNL operation on PF is known to the driver.
    Also, refactor the existing VIRTCHNL API by adding functions for
    selective queue enabling, disabling and configuration.
    
    Signed-off-by: Michal Kubiak <[email protected]>
    michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    455dee1 View commit details
    Browse the repository at this point in the history
  7. iavf: Refactor ring initialization functions to handle XDP

    Introduce modular functions to allocate and initialize Rx and Tx rings
    in order to prepare the initialization procedure to easily fit the XDP
    setup.
    
    Signed-off-by: Michal Kubiak <[email protected]>
    michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    d1aea70 View commit details
    Browse the repository at this point in the history
  8. iavf: Prepare rings to support XDP

    Extend basic structures of the driver (e.g. 'iavf_adapter', 'iavf_ring')
    by adding members necessary to support XDP. Register those members using
    required functions from BPF API.
    Implement a support for XDP_TX and XDP_REDIRECT actions by adding
    additional XDP Tx queues to transmit packets without interferring a
    regular Tx traffic.
    Finally, add required XDP setup and release calls to queue allocation
    and deallocation functions respectively.
    
    Signed-off-by: Michal Kubiak <[email protected]>
    Signed-off-by: Larysa Zaremba <[email protected]>
    michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    76bf498 View commit details
    Browse the repository at this point in the history
  9. iavf: don't hardcode DMA direction, headroom and buffer len on Rx

    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    629e722 View commit details
    Browse the repository at this point in the history
  10. iavf: Handle XDP_SETUP_PROG command in .ndo_bpf

    Add .ndo_bpf function to handle XDP_SETUP_PROG command.
    
    In order to avoid synchronization issues, implement functions
    dedicated to re-initialize only those parts of the interface which
    are really necessary to setup the XDP program.
    Such an approach is much lighter than performing a full reset of the
    driver and thanks to it we can immediately know the result of traffic
    initialization comparing to the reset task which triggers some
    asynchronous events (e.g. link speed negotiation).
    
    Signed-off-by: Michal Kubiak <[email protected]>
    Signed-off-by: Larysa Zaremba <[email protected]>
    michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    9f85b9d View commit details
    Browse the repository at this point in the history
  11. iavf: Add XDP_PASS and XDP_DROP support

    Implement basic XDP program setup, refactor data path
    to use xdp_buff, implement XDP_PASS and XDP_DROP actions.
    
    Signed-off-by: Larysa Zaremba <[email protected]>
    walking-machine authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    c59cad5 View commit details
    Browse the repository at this point in the history
  12. iavf: Implement XDP_TX action

    Implement sending the packet from an XDP ring.
    XDP path functions are separate from the general TX routines,
    because this allows to simplify and therefore speedup the process.
    It also makes code more friendly to future XDP-specific optimizations.
    
    Signed-off-by: Larysa Zaremba <[email protected]>
    walking-machine authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    19cfad9 View commit details
    Browse the repository at this point in the history
  13. iavf: Implement XDP redirect path

    Implement XDP_REDIRECT action and ndo_xdp_xmit() callback.
    
    For now, packets redirected from CPU with index greater than
    XDP queues number are just dropped with an error.
    This is a rather common situation, especially when VF is configured
    to run on host and will be addressed in later patches.
    
    Patch also refactors RX XDP handling to use switch statement due to
    increased number of actions.
    
    Signed-off-by: Larysa Zaremba <[email protected]>
    walking-machine authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    ed6e942 View commit details
    Browse the repository at this point in the history
  14. iavf: Allow XDP TxQ sharing

    Port of commit 22bf877 ("ice: introduce XDP_TX fallback path").
    The patch handles the case, when queue number is not sufficient for
    the current number of CPUs. To avoid dropping some packets
    redirected from other interfaces, XDP TxQs are allowed to be shared
    between CPUs, which imposes the locking requirement.
    Static key approach has little to none performance penalties
    when sharing is not needed.
    
    This mechanism is much more applicable when dealing with VFs.
    In fact, maximum number of queue pairs that ice PF can give to
    an iavf VF is 16, which allows up to 8 XDP TxQs, so without
    XDP TxQ sharing, some redirected packets can be dropped even
    on a 10 CPU system.
    
    Signed-off-by: Larysa Zaremba <[email protected]>
    walking-machine authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    55adf95 View commit details
    Browse the repository at this point in the history
  15. iavf: Enable XDP netdev features

    Enable NETDEV_XDP_ACT_BASIC and NETDEV_XDP_ACT_REDIRECT
    XDP features in netdev.
    
    Signed-off-by: Michal Kubiak <[email protected]>
    michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    8f9b548 View commit details
    Browse the repository at this point in the history
  16. iavf: Add AF_XDP initialization

    Add necessary functions and data structures to support
    AF_XDP feature.
    Implement handling of 'XDP_SETUP_XSK_POOL' in .ndo_bpf().
    Also, implement functions for selectively stopping only
    those queues which take part in XDP socket creation.
    
    Signed-off-by: Michal Kubiak <[email protected]>
    michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    7910142 View commit details
    Browse the repository at this point in the history
  17. iavf: Implement Tx path for AF_XDP

    Implement Tx handling for AF_XDP feature in zero-copy mode.
    Add '.ndo_xdp_xmit()' and '.ndo_xsk_wakeup()' implementations
    to support AF_XDP Tx path.
    Also, add Tx interrupt handling function for zero-copy mode.
    
    Signed-off-by: Michal Kubiak <[email protected]>
    michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    af588dd View commit details
    Browse the repository at this point in the history
  18. iavf: Implement AF_XDP RX processing

    Implement RX packet processing specific to AF_XDP ZC.
    All actions except XDP_PASS are supported, the skb path will
    be implemented in later patches.
    
    Signed-off-by: Larysa Zaremba <[email protected]>
    walking-machine authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    af38da2 View commit details
    Browse the repository at this point in the history
  19. iavf: consolidate skb fields processing

    For now, filling the skb fields on Rx is a bit scattered across RQ
    polling function. This makes it harder to reuse the code on XSk Rx
    path and also sometimes costs some CPU (e.g. doing a lookup for the
    decoded packet type two times).
    Make it consistent and do everything in iavf_process_skb_fields(). First
    of all, get the packet type and decode it. Then, move to hash, csum and
    VLAN, which is moved here too. iavf_receive_skb() becomes then the
    classic eth_type_trans() + napi_gro_receive() pair.
    Finally, make the fields processing function global and the skb receive
    function static inline in order to call them from a different file later
    on.
    
    Signed-off-by: Alexander Lobakin <[email protected]>
    alobakin authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    f057680 View commit details
    Browse the repository at this point in the history
  20. iavf: Implement XDP_PASS path in AF_XDP processing

    Construct skb and fill in its fields, when AF_XDP
    is enabled on the ring, if XDP program returns XDP_PASS.
    (will be fixed up).
    
    Signed-off-by: Larysa Zaremba <[email protected]>
    walking-machine authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    7f53f95 View commit details
    Browse the repository at this point in the history
  21. iavf: Make request and free traffic irqs symmetric

    The existing implementation of 'iavf_request_traffic_irqs()'
    function does not request any interrupt for q_vectors that
    have no Tx nor Rx queues assigned to it. However, the function
    'iavf_free_traffic_irqs()' releases interrupts for all q_vectors
    unconditionally.
    Such an approach may result in showing kernel warning about
    an attempt of releasing the interrupt that was not requested.
    
    In order to solve that potential issue make both functions
    fully symmetric. Therefore, add the logic to 'iavf_free_traffic_irqs()'
    for skipping not used q_vectors.
    
    Signed-off-by: Michal Kubiak <[email protected]>
    michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    3e2fafc View commit details
    Browse the repository at this point in the history
  22. iavf: Do not reset the number of requested queues

    When the number of queues is being changed by the user, the information
    about a new queue number is kept in the adapter structure member
    (num_req_queues). Such an information was always reset to zero just
    after setting queue number request is processed.
    
    However, that structure member should always provide an information
    about user's preference regarding the requested queue number, so
    it should be preserved for future driver reinitializations or
    setting up the adapter for XDP program.
    
    Remove setting the number of requested queues to zero and use that
    value as a priority one during next reinitializations of the adapter,
    in order to avoid the scenario when the queue count can be changed
    automatically out of user's control.
    
    Signed-off-by: Michal Kubiak <[email protected]>
    michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    a3e00d6 View commit details
    Browse the repository at this point in the history
  23. iavf: Limit number of channels in ethtool when XDP is enabled

    When XDP is enabled, our true maximum number of queue pairs
    can be reduced up to being cut in half, ex. for system with 10 CPUs
    and 16 queue pairs allowed by PF, normally maximum would be 10 queues,
    but XDP requires 2 queue pairs per channel, so the maximum of 8 queues
    can be used if program is attached.
    
    The above fact has to be reflected in ethtool.
    
    Signed-off-by: Larysa Zaremba <[email protected]>
    walking-machine authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    c784562 View commit details
    Browse the repository at this point in the history
  24. iavf: lock XDP queue while using in ZC mode

    To avoid race between .ndo_xdp_xmit(), normal ZC TX processing
    and XDP_TX in ZC mode when also sharing queues,
    add locking to the later two.
    
    Locking in .ndo_xdp_xmit() is already present.
    
    Signed-off-by: Larysa Zaremba <[email protected]>
    walking-machine authored and michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    b7fbed3 View commit details
    Browse the repository at this point in the history
  25. iavf: Enable AF_XDP zero-copy feature in netdev

    Enable NETDEV_XDP_ACT_XSK_ZEROCOPY feature in
    netdev structure.
    
    Signed-off-by: Michal Kubiak <[email protected]>
    michalQb committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    f71cafe View commit details
    Browse the repository at this point in the history