Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TBB deprecation fix (partial) #209

Merged
merged 11 commits into from
May 6, 2020
18 changes: 13 additions & 5 deletions examples/SolverComparer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -58,9 +58,8 @@
#include <iostream>

#ifdef GTSAM_USE_TBB
#include <tbb/tbb.h>
#undef max // TBB seems to include windows.h and we don't want these macros
#undef min
#include <tbb/task_arena.h> // tbb::task_arena
#include <tbb/task_group.h> // tbb::task_group
#endif

using namespace std;
Expand Down Expand Up @@ -206,10 +205,11 @@ int main(int argc, char *argv[]) {
}

#ifdef GTSAM_USE_TBB
std::unique_ptr<tbb::task_scheduler_init> init;
tbb::task_arena arena;
tbb::task_group tg;
if(nThreads > 0) {
cout << "Using " << nThreads << " threads" << endl;
init.reset(new tbb::task_scheduler_init(nThreads));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. This line was important to the test.

Copy link
Contributor Author

@acxz acxz Jan 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because we do not explicitly create a task_scheduler_init does not mean, TBB or C++11 is not using its internal one. n threads are still created and the test is run in parallel if TBB is being used, although not needed now, since C++11 has concurrency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we always wanted max number of threads (i.e. threads = number of cores) that might be ok, but nThreads is an optional parameter to this program, so that the user can test timing for a specific number of threads. As I see it, removing this line effectively turns nThreads into an on/off switch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, thats right. Thx for the thorough review! I'll add a commit that adds this deprecated feature for now until we figure out how to handle it in the future.

arena.initialize(nThreads);
} else
cout << "Using threads for all processors" << endl;
#else
Expand All @@ -219,6 +219,10 @@ int main(int argc, char *argv[]) {
}
#endif

#ifdef GTSAM_USE_TBB
arena.execute([&]{
tg.run_and_wait([&]{
#endif
// Run mode
if(incremental)
runIncremental();
Expand All @@ -230,6 +234,10 @@ int main(int argc, char *argv[]) {
runPerturb();
else if(stats)
runStats();
#ifdef GTSAM_USE_TBB
});
});
#endif

return 0;
}
Expand Down
44 changes: 34 additions & 10 deletions examples/TimeTBB.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -28,9 +28,12 @@ using boost::assign::list_of;

#ifdef GTSAM_USE_TBB

#include <tbb/tbb.h>
#undef max // TBB seems to include windows.h and we don't want these macros
#undef min
#include <tbb/blocked_range.h> // tbb::blocked_range
#include <tbb/tick_count.h> // tbb::tick_count
#include <tbb/parallel_for.h> // tbb::parallel_for
#include <tbb/cache_aligned_allocator.h> // tbb::cache_aligned_allocator
#include <tbb/task_arena.h> // tbb::task_arena
#include <tbb/task_group.h> // tbb::task_group

static const DenseIndex numberOfProblems = 1000000;
static const DenseIndex problemSize = 4;
Expand Down Expand Up @@ -67,10 +70,14 @@ struct WorkerWithoutAllocation
};

/* ************************************************************************* */
map<int, double> testWithoutMemoryAllocation()
map<int, double> testWithoutMemoryAllocation(int num_threads)
{
// A function to do some matrix operations without allocating any memory

// Create task_arena and task_group
tbb::task_arena arena(num_threads);
tbb::task_group tg;

// Now call it
vector<double> results(numberOfProblems);

Expand All @@ -79,7 +86,14 @@ map<int, double> testWithoutMemoryAllocation()
for(size_t grainSize: grainSizes)
{
tbb::tick_count t0 = tbb::tick_count::now();
tbb::parallel_for(tbb::blocked_range<size_t>(0, numberOfProblems), WorkerWithoutAllocation(results));

// Run parallel code (as a task group) inside of task arena
arena.execute([&]{
tg.run_and_wait([&]{
tbb::parallel_for(tbb::blocked_range<size_t>(0, numberOfProblems), WorkerWithoutAllocation(results));
});
});

tbb::tick_count t1 = tbb::tick_count::now();
cout << "Without memory allocation, grain size = " << grainSize << ", time = " << (t1 - t0).seconds() << endl;
timingResults[(int)grainSize] = (t1 - t0).seconds();
Expand Down Expand Up @@ -120,10 +134,14 @@ struct WorkerWithAllocation
};

/* ************************************************************************* */
map<int, double> testWithMemoryAllocation()
map<int, double> testWithMemoryAllocation(int num_threads)
{
// A function to do some matrix operations with allocating memory

// Create task_arena and task_group
tbb::task_arena arena(num_threads);
tbb::task_group tg;

// Now call it
vector<double> results(numberOfProblems);

Expand All @@ -132,7 +150,14 @@ map<int, double> testWithMemoryAllocation()
for(size_t grainSize: grainSizes)
{
tbb::tick_count t0 = tbb::tick_count::now();
tbb::parallel_for(tbb::blocked_range<size_t>(0, numberOfProblems), WorkerWithAllocation(results));

// Run parallel code (as a task group) inside of task arena
arena.execute([&]{
tg.run_and_wait([&]{
tbb::parallel_for(tbb::blocked_range<size_t>(0, numberOfProblems), WorkerWithAllocation(results));
});
});

tbb::tick_count t1 = tbb::tick_count::now();
cout << "With memory allocation, grain size = " << grainSize << ", time = " << (t1 - t0).seconds() << endl;
timingResults[(int)grainSize] = (t1 - t0).seconds();
Expand All @@ -153,9 +178,8 @@ int main(int argc, char* argv[])
for(size_t n: numThreads)
{
cout << "With " << n << " threads:" << endl;
tbb::task_scheduler_init init((int)n);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this line this entire timing script is pointless. Is there not a replacement for this deprecated function? Otherwise, could we just suppress the deprecation warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you say that? The task_scheduler_init was optional in the first place. If we don't provide one, TBB uses one internally. We are still timing how fast the parallelization occurs. From what I can tell the functionality of the script has not changed one bit.
Suppressing the deprecation warning is just hiding the problem. In future releases this will become an error, instead of a warning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do understand that it may be hard to support both TBB 4.4 and TBB 2020 and if that is the case, we can just add a requirement during the installation procedure that warns the use of TBB > 2020. Although I think we can get GTSAM to support both versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Read the for loop more carefully. The purpose of this script is to time the performance of TBB with different grain sizes, and this default grain size varies with number of threads. In order to read this default grain size, this loop initializes tbb with 1, 4, 8 threads, and reads the grain size.
Removing tbb::task_scheduler_init init((int)n); causes number of threads to be equal to number of cores for all cases, rendering the for loops in this test redundant and pointless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I see now, yes we are specifying the number of threads each iteration of the for loop. Hmmm, well this is a predicament.

Copy link
Contributor Author

@acxz acxz Jan 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me link the TBB documentation: https://software.intel.com/en-us/node/506296 regarding task_scheduler_init

Some quotes from it:

An optional parameter to the constructor and method initialize allows you to specify the number of threads to be used for task execution. This parameter is useful for scaling studies during development, but should not be set for production use.

Tip
The reason for not specifying the number of threads in production code is that in a large software project, there is no way for various components to know how many threads would be optimal for other threads. Hardware threads are a shared global resource. It is best to leave the decision of how many threads to use to the task scheduler.

To minimize time overhead, it is best to rely upon automatic creation of the task scheduler, or create a single task_scheduler_init object whose activation spans all uses of the library's task scheduler. A task_scheduler_init is not assignable or copy-constructible.

But I mean this is just saying don't use it in production. Basically we want a feature to specify the number of threads we create.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can use task_arena instead. (https://software.intel.com/en-us/node/506359)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure on how to replicate this script without task_scheduler_init so I am going to leave this script be, hopefully it will get tackled before the feature gets removed.

results[(int)n].grainSizesWithoutAllocation = testWithoutMemoryAllocation();
results[(int)n].grainSizesWithAllocation = testWithMemoryAllocation();
results[(int)n].grainSizesWithoutAllocation = testWithoutMemoryAllocation((int)n);
results[(int)n].grainSizesWithAllocation = testWithMemoryAllocation((int)n);
cout << endl;
}

Expand Down
46 changes: 5 additions & 41 deletions gtsam/base/ThreadsafeException.h
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@

/**
* @file ThreadSafeException.h
* @brief Base exception type that uses tbb_exception if GTSAM is compiled with TBB
* @brief Base exception type that uses tbb_allocator if GTSAM is compiled with TBB
* @author Richard Roberts
* @date Aug 21, 2010
* @addtogroup base
Expand All @@ -25,34 +25,28 @@
#include <gtsam/dllexport.h>
#include <string>
#include <typeinfo>
#include <exception>

#ifdef GTSAM_USE_TBB
#include <tbb/tbb_allocator.h>
#include <tbb/tbb_exception.h>
#include <tbb/scalable_allocator.h>
#include <iostream>
#endif

namespace gtsam {

/// Base exception type that uses tbb_exception if GTSAM is compiled with TBB.
/// Base exception type that uses tbb_allocator if GTSAM is compiled with TBB.
template<class DERIVED>
class ThreadsafeException:
#ifdef GTSAM_USE_TBB
public tbb::tbb_exception
#else
public std::exception
#endif
{
#ifdef GTSAM_USE_TBB
private:
typedef tbb::tbb_exception Base;
typedef std::exception Base;
#ifdef GTSAM_USE_TBB
protected:
typedef std::basic_string<char, std::char_traits<char>,
tbb::tbb_allocator<char> > String;
#else
private:
typedef std::exception Base;
protected:
typedef std::string String;
#endif
Expand Down Expand Up @@ -82,36 +76,6 @@ public std::exception
}

public:
// Implement functions for tbb_exception
#ifdef GTSAM_USE_TBB
virtual tbb::tbb_exception* move() throw () {
void* cloneMemory = scalable_malloc(sizeof(DERIVED));
if (!cloneMemory) {
std::cerr << "Failed allocating memory to copy thrown exception, exiting now." << std::endl;
exit(-1);
}
DERIVED* clone = ::new(cloneMemory) DERIVED(static_cast<DERIVED&>(*this));
clone->dynamic_ = true;
return clone;
}

virtual void destroy() throw () {
if (dynamic_) {
DERIVED* derivedPtr = static_cast<DERIVED*>(this);
derivedPtr->~DERIVED();
scalable_free(derivedPtr);
}
}

virtual void throw_self() {
throw *static_cast<DERIVED*>(this);
}

virtual const char* name() const throw () {
return typeid(DERIVED).name();
}
#endif

virtual const char* what() const throw () {
return description_ ? description_->c_str() : "";
}
Expand Down
8 changes: 4 additions & 4 deletions gtsam/base/debug.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,29 +20,29 @@
#include <gtsam/config.h> // for GTSAM_USE_TBB

#ifdef GTSAM_USE_TBB
#include <tbb/mutex.h>
#include <mutex> // std::mutex, std::unique_lock
#endif

namespace gtsam {

GTSAM_EXPORT FastMap<std::string, ValueWithDefault<bool, false> > debugFlags;

#ifdef GTSAM_USE_TBB
tbb::mutex debugFlagsMutex;
std::mutex debugFlagsMutex;
#endif

/* ************************************************************************* */
bool guardedIsDebug(const std::string& s) {
#ifdef GTSAM_USE_TBB
tbb::mutex::scoped_lock lock(debugFlagsMutex);
std::unique_lock<std::mutex> lock(debugFlagsMutex);
#endif
return gtsam::debugFlags[s];
}

/* ************************************************************************* */
void guardedSetDebug(const std::string& s, const bool v) {
#ifdef GTSAM_USE_TBB
tbb::mutex::scoped_lock lock(debugFlagsMutex);
std::unique_lock<std::mutex> lock(debugFlagsMutex);
#endif
gtsam::debugFlags[s] = v;
}
Expand Down
7 changes: 2 additions & 5 deletions gtsam/base/treeTraversal/parallelTraversalTasks.h
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,8 @@
#include <boost/make_shared.hpp>

#ifdef GTSAM_USE_TBB
# include <tbb/tbb.h>
# include <tbb/scalable_allocator.h>
# undef max // TBB seems to include windows.h and we don't want these macros
# undef min
# undef ERROR
#include <tbb/task.h> // tbb::task, tbb::task_list
#include <tbb/scalable_allocator.h> // tbb::scalable_allocator

namespace gtsam {

Expand Down
4 changes: 2 additions & 2 deletions gtsam/base/types.h
Original file line number Diff line number Diff line change
Expand Up @@ -27,9 +27,9 @@
#include <cstddef>
#include <cstdint>

#include <exception>

#ifdef GTSAM_USE_TBB
#include <tbb/task_scheduler_init.h>
#include <tbb/tbb_exception.h>
#include <tbb/scalable_allocator.h>
#endif

Expand Down
2 changes: 1 addition & 1 deletion gtsam/geometry/Unit3.cpp
100755 → 100644
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ const Matrix32& Unit3::basis(OptionalJacobian<6, 2> H) const {
// NOTE(hayk): At some point it seemed like this reproducably resulted in
// deadlock. However, I don't know why and I can no longer reproduce it.
// It either was a red herring or there is still a latent bug left to debug.
tbb::mutex::scoped_lock lock(B_mutex_);
std::unique_lock<std::mutex> lock(B_mutex_);
#endif

const bool cachedBasis = static_cast<bool>(B_);
Expand Down
4 changes: 2 additions & 2 deletions gtsam/geometry/Unit3.h
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@
#include <string>

#ifdef GTSAM_USE_TBB
#include <tbb/mutex.h>
#include <mutex> // std::mutex
#endif

namespace gtsam {
Expand All @@ -48,7 +48,7 @@ class Unit3 {
mutable boost::optional<Matrix62> H_B_; ///< Cached basis derivative

#ifdef GTSAM_USE_TBB
mutable tbb::mutex B_mutex_; ///< Mutex to protect the cached basis.
mutable std::mutex B_mutex_; ///< Mutex to protect the cached basis.
#endif

public:
Expand Down