Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tbb] Please support tbb version 2021.1.1 #1471

Open
JackBoosY opened this issue Mar 3, 2021 · 54 comments
Open

[tbb] Please support tbb version 2021.1.1 #1471

JackBoosY opened this issue Mar 3, 2021 · 54 comments

Comments

@JackBoosY
Copy link

Hi guys, I'm vcpkg maintainer.
I tried to update tbb to the latest version 2021.1.1 in vcpkg (PR microsoft/vcpkg#16321), but seems that usd doesn't support this version. And some other libraries has been adapted for that.
Can you consider updating the code to adapt to this version?

Thanks.

@jilliene
Copy link

jilliene commented Mar 9, 2021

Filed as internal issue #USD-6600

@manuelk
Copy link

manuelk commented Mar 12, 2021

Confirmed - after much trial & error, was able to get a working build using the 'tbb_2019' branch (which unfortunately did not have a cmake build or exported config yet).

The biggest hurdle seems to be the obsoleting of tbb::atomic (replaced with std::atomic sometime in late 2020).

IMHO, there is little reason to not submodule tbb & just let tbb's cmake config take care of everything, but just in case, I am attaching the snippet i had to modify in the FindTBB.cmake module (because the version file moved too)

  find_file(TBB_VERSION_FILE 
      NAMES oneapi/tbb/version.h tbb/version.h tbb/tbb_stddef.h       
      HINTS ${TBB_INCLUDE_DIRS} ${TBB_SEARCH_DIR}
      PATHS ${TBB_DEFAULT_SEARCH_DIR})

  if(TBB_VERSION_FILE)
    file(READ "${TBB_VERSION_FILE}" _tbb_version_file)

    string(REGEX REPLACE ".*#define TBB_VERSION_MAJOR ([0-9]+).*" "\\1"
        TBB_VERSION_MAJOR "${_tbb_version_file}")

    string(REGEX REPLACE ".*#define TBB_VERSION_MINOR ([0-9]+).*" "\\1"
        TBB_VERSION_MINOR "${_tbb_version_file}")

    string(REGEX REPLACE ".*#define TBB_INTERFACE_VERSION ([0-9]+).*" "\\1"
        TBB_INTERFACE_VERSION "${_tbb_version_file}")

    set(TBB_VERSION "${TBB_VERSION_MAJOR}.${TBB_VERSION_MINOR}")
  endif()

@meshula
Copy link
Member

meshula commented Mar 13, 2021

Maybe the biggest hurdle can be cleared by moving USD to std::atomic :) You know what gets my vote! @manuelk , is that the approach you took?

@manuelk
Copy link

manuelk commented Mar 13, 2021

the compiler was throwing lots of errors, so i didn't really try to fix it (with the idea that other things may have been obsoleted and apparently USD was not being kept up to date obviously)

took the easy out & just reverted TBB to a 2019 version

i do agree with you that a lot of the 'arch' code in USD should be moved to modern C++ (mutex, "ref" ptrs, ...) with the hope of maybe killing the now extremely unwieldy TBB dependency (full win10 binary installer is 22GB on disk - for real Intel !)

@manuelk
Copy link

manuelk commented Mar 15, 2021

Something else that makes no sense w/ TBB handling in USD :

in the TBB module you correctly define a debug & release interface (assuming the expression works - it's hard to read...)

      set_target_properties(TBB::tbb PROPERTIES
          INTERFACE_COMPILE_DEFINITIONS "$<$<OR:$<CONFIG:Debug>,$<CONFIG:RelWithDebInfo>>:TBB_USE_DEBUG=1>"
          IMPORTED_LOCATION_DEBUG          ${TBB_LIBRARIES_DEBUG}
          IMPORTED_LOCATION_RELWITHDEBINFO ${TBB_LIBRARIES_DEBUG}
          IMPORTED_LOCATION_RELEASE        ${TBB_LIBRARIES_RELEASE}
          IMPORTED_LOCATION_MINSIZEREL     ${TBB_LIBRARIES_RELEASE}
          )

but then all the USD components bypass the interface to link directly with the component variable:

pxr_library(tf
    LIBRARIES
        arch
        ${WINLIBS}
        ${PYTHON_LIBRARIES}
        ${Boost_PYTHON_LIBRARY}
        ${TBB_tbb_LIBRARY}

Obviously this does not work well for debug builds, as neither TBB_USE_DEBUG is set, nor is the linker pointed to the right library.

@e4lam
Copy link

e4lam commented Apr 20, 2021

@jilliene @meshula We're moving up to TBB 2020 update 3 for the next major release of Houdini right now (since CY2021 vfxplatform.com is TBB 2020). I'm noticing the same issues except they're flagged as deprecation warnings instead of out right removals. I think it'll be worthwhile to first move USD up to TBB 2020 and replace uses as we go to remove the deprecated functionality.

@meshula
Copy link
Member

meshula commented Apr 20, 2021

@manuelk Yeah, it would be more correct/bulletproof to link via TBB::tbb, not ${TBB_tbb_LIBRARY} (as elaborated in the plethora of "modern cmake" articles.)

@HnimNart
Copy link

HnimNart commented Aug 6, 2021

Do you have a timeline for updating to TBB 2021?

@c64kernal
Copy link
Contributor

We don't unfortunately have a timeline for moving to newer versions of tbb... though we typically try to at least follow http://vfxplatform.com/

@manuelk
Copy link

manuelk commented Aug 10, 2021

I've just run a quick scan for TBB over the source code - the majority of the usage seems to be for things like mutex, atomic and thread local storage, which should be transitioned to modern C++ (as recommended by Intel in TBB).

However, there also seems to be enough use of thread arenas, concurrent queues, etc that deprecating TBB would not be a small task (it's really unfortunate that OneTBB has become so unwieldy in recent years). But it's still nothing compared to the headaches of boost + python though.

@spiffmon
Copy link
Member

@manuelk , even if somehow there were enough resources in the universe (so to speak) to wean USD off of TBB, that in itself would create an even less tractable problem: for better or worse, TBB has becomes the CPU resource management system of the VFX industry, so if USD were to move away from it, it would become really difficult to share resources with host applications in a good-citizen and performant way.

I think the only realistic way to remove the TBB dependency from USD is if the VFX platform itself mandates it, so that USD and all its consumers could move together.

@manuelk
Copy link

manuelk commented Aug 10, 2021

There be a boat anchor... guess we can commiserate around beers at the next non-virtual Siggraph :) (if ever)

@meshula
Copy link
Member

meshula commented Aug 11, 2021

Spiff points out that using TBB helps alleviate CPU over-subscription, a might important consideration. That does leave some room for some targeted weaning, around some std provided constructs like atomics, and particularly around the parts of TBB slated for removal, at least on vfxplatform gated time table.

Sidebar: To get the same benefit on the over-subscription issue on Apple platforms, the Work system would need to cooperate with Grand Central Dispatch. Perhaps longer term, there's more to discuss around that, as an independent topic.

@ggris
Copy link

ggris commented Aug 24, 2021

Hi, guys
I ended up here because I wanted to to update TBB through VCPKG in our codebase and followed the threads to see what was the issue and how I could maybe help solving it.

I'd like to share why we have to update it or our codebase because after looking at this repo, it might have implications on your side. It all boils down to spin lock and why unless you have a really guru expert, you should not play with fire and try to save some performance by using them instead of regular mutexes.

Long story short: in their internal task_queue, prior to the 2021 release, they were using a spin lock to protect the queue which, as is often the case with spin lock, resulted on occasional lock access freeze when trying to access the queue.

This is probably relevant to this repo because I see tbb::spin_mutex used all over the place. So either

  • Your lib is never used on high CPU loads or on single core CPU
  • You have a real threading expert internally who understands the issue I'm talking about and reviewed all spin locks usage to guarantee none of them are problematic (But note that even the guys at tbb did mistakes when using spin locks)
  • This is all legacy code and no one really knows why it's spin mutex and not regular mutex

@spiffmon
Copy link
Member

Thanks @ggris ! We monitor our performance suite for lock contention, and have deployed several strategies for reducing it when it crops up. We have not observed correctness issues or deadlocks attributable to our use of tbb's spin locks, but if you have evidence of such a problem, or of scalability issues (we do not regularly run on more than 32 cores yet), please let us know.

@e4lam
Copy link

e4lam commented Aug 26, 2021

@ggris On a side note, can you point to where the fix is for TBB? Thanks!

@ggris
Copy link

ggris commented Aug 26, 2021

@e4lam Actually, I was mistaken in thinking it was already merged but it is a fix for this ticket: oneapi-src/oneTBB#546
Note that in this case the issue comes from an implicit spin lock pattern. As I learned while researching this, it seems TBB spin_mutex is protected against the typical spin lock pattern issue with a backoff strategy and the solution for this ticket is to apply similar backoff strategy in this implicit spin lock pattern. I've had so many issues in the past with libraries using bad implementation of the spin lock pattern that I now assume by default that all spin lock patterns are buggy.

I ended up backporting the backoff fix to our internal build of TBB as it is unlikely we'll get an up to date TBB

@mbechard
Copy link

mbechard commented Oct 21, 2021

@c64kernal at the very least could the codebase remove usage of deprecated (now removed) TBB features and move to the std:: replacements? That way newer TBB binaries can be used at runtime. Even if USD is built against TBB 2020 Update 3, if it's not using features that have been removed in 2021.4.0, we should be able to drop in 2021.4.0 binaries (which other tools we use may rely on) and allow things to still function.
An upgrade guide can be found here:
https://www.intel.com/content/www/us/en/developer/articles/technical/tbb-revamp.html

@c64kernal
Copy link
Contributor

@mbechard -- yes for sure! If there is a minimally invasive change that you'd like to PR, we'd love to have a look and would definitely consider. We're currently constrained to C++14 by the way. Thanks so much for pushing through this!

@mbechard
Copy link

@c64kernal unfortunately I don't think I'm in a position to do a PR for this. Certainly I could upgrade the tbb::atomics, but things such as tbb::empty_task and related threading work would likely be tricky for me to do as I've spent no time working in the USD codebase (and really, barely any in TBB as well).
Since these changes are compatible with both the current codebase and in the future (and will be required once vfxplatforms reaches TBB 2021), I was suggesting they get added sooner rather than later to allow people to use the newer TBB when required.
Thanks for your consideration!

@c64kernal
Copy link
Contributor

Okay got it, thanks again @mbechard -- I filed an internal issue to do this initial clean up, and we'll prioritize as best we can. Thanks again for the suggestion.

@sybrenstuvel
Copy link

Hello,

A little introduction: I'm Sybren, the developer who added USD support to Blender.

We don't unfortunately have a timeline for moving to newer versions of tbb... though we typically try to at least follow http://vfxplatform.com/

This puts anyone who wants to use USD in an application in a tough spot. The VFX Reference Platform has a strange mix of super new versions of libraries and deprecated/outdated versions. For example, for a while 'CY2021' targeted a not-yet-released version (3.0) of OpenEXR while targeting a version of Python (3.7) that was so old it would be unsupported for the 2nd half of the year.

If USD developers follow VFX Reference Platform in the sense of only supporting deprecated versions of these libraries, this means that any application that wants to use USD also has to lag behind.

USD is using parts of the TBB API that were deprecated two years ago. Various Linux distributions will ship with a newer version soon; they typically only allow installing a single version of such libraries, which means that building Blender with USD support on contemporary Linuxes is going to be problematic.

I filed an internal issue to do this initial clean up, and we'll prioritize as best we can.

This was posted about 4 months ago. What is the current status?

PS: OpenSubDiv also still only supports the deprecated TBB API, but there at least patching the library is fairly trivial. For USD the work seems to be more considerable, though, which means that Blender can't just include a patch to make it work.

@spiffmon
Copy link
Member

spiffmon commented Mar 15, 2022

Hi @sybrenstuvel ,
We understand your vexation, and honestly would very much like to be on TBB's OneAPI. We hope to be able to do this upgrade by end of year.
To understand why it is taking us so long, there are two factors at play, which are related: resources, and sensitivity to the VFX industry.

You may think the VFX reference platform is criminally impeding forward progress, but in fact it protects animation and vfx studios from vendor thrashing, given our (unfortunate, but difficult to change) slow pace of updating OS's and system components. I am attaching a fascinating survey of 88 anim/Vfx studios conducted last year by the VES, for your information. It shows where the industry is in terms of OS's, but the thing I really want to highlight to you is that until the (Predicted) end of this year, not even a majority of VFX houses will have migrated to Python 3.x yet! Pixar is, in fact, in the middle of wrestling our large internal codebase into 3.x compatibility, and even though (thanks to NVidia!) USD can be built with python 3, internally all production and development builds still use python 2.7x.

Referring back to the reference platform, then, we and other studios are still trying to adopt CY2020, and the move to OneAPI does not happen until CY2023. We may jump right to CY2022 after finishing 2020 adoption, but in either case, even CY2022 is only TBB 2020, reflective of the widespread expectation that moving forward to OneApi is going to be a Big Deal. In fact, we do not yet even know how we are going to manage it without serious performance degradation to some of our internal software, because OneApi removes some lower-level API's that we leverage to good effect.

If you or others have the resources and will to want to see USD be OneAPI compatible ahead of our ability to make it so, we would be very appreciative. The constraints, as we see them, are:

  1. Anything that would want to land soon would, in deference to vfx studios and the software they are currently using, need to support building against either TBB 2019 or 202x .
  2. As you mentioned, USD still uses some TBB containers/locks that have been deprecated in favor of std options. But many of those are in performance-sensitive codesites, so adopting alternatives may require extra consideration. We are working, for other collaborations, to make available some of the performance tests and test assets we rely on internally, but that may not bear fruit soon enough to be useful.

That said, I do think such a contribution would help speed this process up.

Thanks, Sybren!

@boberfly
Copy link
Contributor

boberfly commented Jun 7, 2022

Hi all,

Just a heads-up I started to have a crack at it today and have made some progress:
https://gist.github.com/boberfly/07f7f8a4c7433d8ebd5cdfa640ef7758

As you can see in the error log, it still has an issue and my head hurts from all the template+namespace+function bindings... :)

https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Migration_Guide/Task_API.html

Cheers

@e4lam
Copy link

e4lam commented Jun 7, 2022

@boberfly I'd suggest that you split it out under separate PR's by porting in different branches if possible, one for each category of API changes. In particular, I think at least the atomic changes should be separate.

@boberfly
Copy link
Contributor

boberfly commented Jun 7, 2022

@e4lam good idea, I am just trying to get this working on regular TBB 2020U3 to verify my TBB_INTERFACE_VERSION defines and std::atomic replacements are working, unfortunately this one in particular: https://github.com/PixarAnimationStudios/USD/blob/release/pxr/usd/usdGeom/bboxCache.cpp#L107 causes template deduce issues for some reason on this line for the insert which seems bonkers to me at the moment, if anyone else has any ideas... :)

@furby-tm
Copy link
Contributor

furby-tm commented Jun 13, 2022

@boberfly The bboxCache has been the bane of my existence in terms of being able to upgrade TBB. Every time I've went through all the work you've done - I have gotten stuck at this exact section. If you know how to fix it that would be huge.

We may need to reach out to the Intel TBB team to get this resolved, I've been staring at this issue for years 😆.

This issue also exists on all operating systems - macOS, Windows, and Linux.

@brechtvl
Copy link
Contributor

I have a branch here that supports optionally building with oneTBB, and passes the tests on my Linux machine:
https://github.com/brechtvl/USD/commits/onetbb

I can submit this as multiple PRs, but not sure how granular is most convenient. Perhaps one PR with the build system, dispatcher and thread limit changes (since you can't test them individually), and another PR with the various small changes?

@spiffmon
Copy link
Member

Thank you, @brechtvl - this sounds great! As to factoring, the first PR you suggested sounds right. For the rest of the “small” changes, switching to std from tbb objects/APIs, etc, it would be great to get one PR per affected module. This helps our correctness and performance automation be efficient about tracking regressions, and allows us to not need to revert a big change in the case where it would take time to address an unacceptable regression. So we would land all of the smaller PR’s one at a time, first, before the build/Work change, since they should all be backwards compatible.

Just to set expectations, we are currently a bit over-booked, but are allotting time in the Fall to address OneTBB compatibility, so that’s when we’ll likely get to reviewing and incorporating these PR’s, and your work should make the process go much faster, so thank you again!

@mwestphal
Copy link

Please note we are waiting on this at F3D project in order to add USD support in our binary package, thanks a lot for working on this.

f3d-app/f3d#819

@Meakk
Copy link

Meakk commented Jun 12, 2023

Thank you @brechtvl! I tried your fork and it's working fine.
For those interested, I rebased @brechtvl's fork on v23.05 and kept their commits and merged commit 610c6c9.
Here is the formatted patch that one can use to build USD 23.05 with oneTBB.

@e4lam
Copy link

e4lam commented Jun 12, 2023

I'm not familiar with the USD coding style but is it expected that all methods defined inside a class be explicitly marked as inline? It's rather superfluous since any method defined inside a class is already inline.

@spiffmon
Copy link
Member

Ugh. Our internal coding conventions (latest revision is still in a draft state, which is part of the reason we have not published them) is silent on that, and it should not be. I think that is the pattern we want, @e4lam , however, we are not entirely consistent, e.g. gf/range.temlplate.h:SetEmpty(), though that is the only inconsistency I noted (but there could be others).

@brechtvl
Copy link
Contributor

brechtvl commented Jun 13, 2023

The addition of inline in pxr/usd/sdf/path.h is not in my pull requests, it seems to be part of another commit that was included in that formatted patch.

@alexmyczko
Copy link

is 2021.8.0 also fine enough to be supported? not sure if it's related to #1650

@e4lam
Copy link

e4lam commented Jun 22, 2023

@brechtvl Yeah, I only realized that later. I think it might have got caught into @Meakk 's formatted patch because it seems like that code in path.h was subsequently reverted.

@spiffmon Ok sure. Just curious, what's the motivation for using inline on methods defined inside classes?

@spiffmon
Copy link
Member

There is no justifiable reason, @e4lam ... the offending code is I think more than twenty years old and just has not been scrutinized before now.

@e4lam
Copy link

e4lam commented Jun 23, 2023

@spiffmon Sorry, I thought you meant that what you want is to add the inline keyword? Or did I misunderstood you on the other direction?

@Meakk
Copy link

Meakk commented Nov 22, 2023

Is there any update regarding this issue?

@spiffmon
Copy link
Member

It is queued up for the next few months, thanks.

@mwestphal
Copy link

Any updates on this ? It doesnt seem to be part of usd 24 sadly.

@spiffmon
Copy link
Member

It's IP and we hope to land it soon... it threw some wrenches into our larger codebase that took some time to iron out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests