
codespell spelling fixes #80

Merged · 2 commits · Oct 25, 2020
4 changes: 2 additions & 2 deletions README.md
@@ -313,7 +313,7 @@ This by default configures without any of the extra build tasks (such as buildin
| -DKOMPUTE_VK_API_MAJOR_VERSION=1 | Major version to use for the Vulkan API |
| -DKOMPUTE_VK_API_MINOR_VERSION=1 | Minor version to use for the Vulkan API |
| -DKOMPUTE_ENABLE_SPDLOG=1 | Enables the build with SPDLOG and FMT dependencies (must be installed) |
| -DKOMPUTE_LOG_VERRIDE=1 | Does not define the SPDLOG_<LEVEL> macros if these are to be overriden |
| -DKOMPUTE_LOG_VERRIDE=1 | Does not define the SPDLOG_<LEVEL> macros if these are to be overridden |
| -DSPDLOG_ACTIVE_LEVEL | The level for the log level on compile level (whether spdlog is enabled) |
| -DVVK_USE_PLATFORM_ANDROID_KHR | Flag to enable android imports in kompute (enabled with -DKOMPUTE_OPT_ANDROID_BUILD) |
| -DRELEASE=1 | Enable release build (enabled by cmake release build) |
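As a rough illustration of the log override flag listed above (a sketch under assumptions, not taken verbatim from the Kompute docs): when the build is configured so that Kompute does not define the SPDLOG_<LEVEL> macros itself, an application could supply its own definitions before including the single header, for example to silence selected levels.

// Hypothetical application-side override; assumes the project was configured
// with the log override flag above, so Kompute leaves these macros undefined.
#define SPDLOG_DEBUG(...) ((void)0)   // drop debug-level messages
#define SPDLOG_INFO(...) ((void)0)    // drop info-level messages

#include "kompute/Kompute.hpp"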
@@ -368,7 +368,7 @@ We appreciate PRs and Issues. If you want to contribute try checking the "Good f
* Uses cmake as build system, and provides a top level makefile with recommended command
* Uses xxd (or xxd.exe windows 64bit port) to convert shader spirv to header files
* Uses doxygen and sphinx for documentation and autodocs
* Uses vcpkg for finding the dependencies, it's the recommanded set up to retrieve the libraries
* Uses vcpkg for finding the dependencies, it's the recommended set up to retrieve the libraries

##### Updating documentation

2 changes: 1 addition & 1 deletion docs/index.rst
@@ -16,7 +16,7 @@ Index
Asynchronous & Parallel Operations <overview/async-parallel>
Memory Management Principles <overview/memory-management>
Converting GLSL/HLSL Shaders to C++ Headers <overview/shaders-to-headers>
Mobile App Intergration (Android) <overview/mobile-android>
Mobile App Integration (Android) <overview/mobile-android>
Game Engine Integration (Godot Engine) <overview/game-engine-godot>
Code Index <genindex>

14 changes: 7 additions & 7 deletions docs/overview/advanced-examples.rst
@@ -218,10 +218,10 @@ Back to `examples list <#simple-examples>`_.
// In this case we select device 0, and for queues, one queue from familyIndex 0
// and one queue from familyIndex 2
uint32_t deviceIndex(0);
std::vector<uint32_t> familyIndeces = {0, 2};
std::vector<uint32_t> familyIndices = {0, 2};

// We create a manager with device index, and queues by queue family index
kp::Manager mgr(deviceIndex, familyIndeces);
kp::Manager mgr(deviceIndex, familyIndices);

// We need to create explicit sequences with their respective queues
// The second parameter is the index in the familyIndex array which is relative
@@ -276,7 +276,7 @@ Back to `examples list <#simple-examples>`_.

// Here we can do other work

// We can now wait for thw two parallel tasks to finish
// We can now wait for the two parallel tasks to finish
mgr.evalOpAwait("queueOne")
mgr.evalOpAwait("queueTwo")

@@ -415,7 +415,7 @@ Converting to Kompute Terminology

1. Create a Sequence to record and submit GPU commands
2. Submit OpCreateTensor to create all the tensors
3. Record the OpAlgo with the Logistic Regresion shader
3. Record the OpAlgo with the Logistic Regression shader
4. Loop across number of iterations:
4-a. Submit algo operation on LR shader
4-b. Re-calculate weights from loss
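A minimal C++ sketch of those steps, assembled only from calls that appear elsewhere on this page (the Manager constructor, createManagedSequence and eval); the tensor-creation and shader-recording steps are left as comments because their exact signatures are not shown here, and the iteration count is an assumed placeholder.

// 1. Create a sequence to record and submit GPU commands
kp::Manager mgr(0, { 0 });                      // device 0, one queue from family 0
auto sq = mgr.createManagedSequence("lr", 0);

// 2. Submit OpCreateTensor to create all the tensors (weights, bias, inputs, outputs)
// 3. Record the OpAlgo with the Logistic Regression shader

// 4. Loop across number of iterations
const size_t iterations = 100;                  // assumed value for illustration
for (size_t i = 0; i < iterations; i++) {
    sq->eval();                                 // 4-a. run the recorded LR shader
    // 4-b. re-calculate weights from the loss on the host (see the loop further down)
}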
@@ -454,10 +454,10 @@ Converting to Kompute Terminology



#. Record the OpAlgo with the Logistic Regresion shader
#. Record the OpAlgo with the Logistic Regression shader
:raw-html-m2r:`<del>~</del>`\ :raw-html-m2r:`<del>~</del>`\ :raw-html-m2r:`<del>~</del>`\ :raw-html-m2r:`<del>~</del>`\ ~~

Once we re-record, all the instructions that were recorded previosuly are cleared.
Once we re-record, all the instructions that were recorded previously are cleared.

Because of this we can record now the new commands which will consist of the following:

@@ -526,7 +526,7 @@ Because of this we can record now the new commands which will consist of the fol
// Run evaluation which passes data through shader once
sq->eval();

// Substract the resulting weights and biases
// Subtract the resulting weights and biases
for(size_t j = 0; j < bOut->size(); j++) {
wInVec[0] -= wOutI->data()[j];
wInVec[1] -= wOutJ->data()[j];
10 changes: 5 additions & 5 deletions docs/overview/async-parallel.rst
@@ -69,7 +69,7 @@ Sequences can be executed in synchronously or asynchronously without having to c

While this is running we can actually do other things like in this case create the shader we'll be using.

In this case we create a shader that shoudl take a couple of milliseconds to run.
In this case we create a shader that should take a couple of milliseconds to run.

.. code-block:: cpp
:linenos:
@@ -164,7 +164,7 @@ Let's take a tangible example. The [NVIDIA 1650](http://vulkan.gpuinfo.org/displ

With this in mind, the NVIDIA 1650 as of today does not support intra-family parallelization, which means that if you were to submit commands in multiple queues of the same family, these would still be executed synchronously.

However the NVIDIA 1650 does support inter-family parallelization, which menas that if we were to submit commands across multiple queues from different families, these would execute in parallel.
However the NVIDIA 1650 does support inter-family parallelization, which means that if we were to submit commands across multiple queues from different families, these would execute in parallel.

This means that we would be able to execute parallel workloads as long as we're running them across multiple queue families. This is one of the reasons why Vulkan Kompute enables users to explicitly select the underlying queues and queue families to run particular workloads on.

@@ -189,10 +189,10 @@ You will want to keep track of the indices you initialize your manager, as you w
// In this case we select device 0, and for queues, one queue from familyIndex 0
// and one queue from familyIndex 2
uint32_t deviceIndex(0);
std::vector<uint32_t> familyIndeces = {0, 2};
std::vector<uint32_t> familyIndices = {0, 2};

// We create a manager with device index, and queues by queue family index
kp::Manager mgr(deviceIndex, familyIndeces);
kp::Manager mgr(deviceIndex, familyIndices);

We are now able to create sequences with a particular queue.

@@ -281,7 +281,7 @@ We are able to wait for the tasks to complete by triggering the `evalOpAwait` on

// Here we can do other work

// We can now wait for thw two parallel tasks to finish
// We can now wait for the two parallel tasks to finish
mgr.evalOpAwait("queueOne")
mgr.evalOpAwait("queueTwo")

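Putting the snippets above together, a minimal end-to-end sketch of the parallel-queue pattern might look as follows; it is assembled from the calls shown on this page rather than copied verbatim, and the asynchronous submission step is left as a comment because its exact call is not shown here.

// Device 0, with one queue from family 0 and one from family 2
uint32_t deviceIndex(0);
std::vector<uint32_t> familyIndices = { 0, 2 };
kp::Manager mgr(deviceIndex, familyIndices);

// Named sequences bound to each queue; the second argument is the index into
// the familyIndices list passed to the manager.
auto sqOne = mgr.createManagedSequence("queueOne", 0);
auto sqTwo = mgr.createManagedSequence("queueTwo", 1);

// ... submit the compute work asynchronously on each named sequence here ...

// Wait for the two parallel tasks to finish
mgr.evalOpAwait("queueOne");
mgr.evalOpAwait("queueTwo");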
24 changes: 12 additions & 12 deletions single_include/kompute/Kompute.hpp
@@ -690,7 +690,7 @@ namespace kp {
*
* Tensors are the base building block in Kompute to perform operations across
* GPUs. Each tensor would have a respective Vulkan memory and buffer, which
* woudl be used to store their respective data. The tensors can be used for GPU
* would be used to store their respective data. The tensors can be used for GPU
* data storage or transfer.
*/
class Tensor
@@ -733,7 +733,7 @@ class Tensor
/**
* Initialiser which calls the initialisation for all the respective tensors
* as well as creates the respective staging tensors. The staging tensors
* woudl only be created for the tensors of type TensorType::eDevice as
* would only be created for the tensors of type TensorType::eDevice as
* otherwise there is no need to copy from host memory.
*/
void init(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
@@ -1267,12 +1267,12 @@ class Manager
* they would like to create the resources on.
*
* @param physicalDeviceIndex The index of the physical device to use
* @param familyQueueIndeces (Optional) List of queue indeces to add for
* @param familyQueueIndices (Optional) List of queue indices to add for
* explicit allocation
* @param totalQueues The total number of compute queues to create.
*/
Manager(uint32_t physicalDeviceIndex,
const std::vector<uint32_t>& familyQueueIndeces = {});
const std::vector<uint32_t>& familyQueueIndices = {});

/**
* Manager constructor which allows your own vulkan application to integrate
@@ -1509,7 +1509,7 @@ class Manager
std::unordered_map<std::string, std::shared_ptr<Sequence>>
mManagedSequences;

std::vector<uint32_t> mComputeQueueFamilyIndeces;
std::vector<uint32_t> mComputeQueueFamilyIndices;
std::vector<std::shared_ptr<vk::Queue>> mComputeQueues;

uint32_t mCurrentSequenceIndex = -1;
@@ -1523,7 +1523,7 @@

// Create functions
void createInstance();
void createDevice(const std::vector<uint32_t>& familyQueueIndeces = {});
void createDevice(const std::vector<uint32_t>& familyQueueIndices = {});
};

} // End namespace kp
@@ -1556,7 +1556,7 @@ class Algorithm
std::shared_ptr<vk::CommandBuffer> commandBuffer);

/**
* Initialiser for the shader data provided to the algoithm as well as
* Initialiser for the shader data provided to the algorithm as well as
* tensor parameters that will be used in shader.
*
* @param shaderFileData The bytes in spir-v format of the shader
@@ -1707,7 +1707,7 @@ class OpAlgoBase : public OpBase
* the barriers that ensure the memory has been copied before going in and
* out of the shader, as well as the dispatch operation that sends the
* shader processing to the gpu. This function also records the GPU memory
* copy of the output data for the staging bufffer so it can be read by the
* copy of the output data for the staging buffer so it can be read by the
* host.
*/
virtual void record() override;
@@ -1745,7 +1745,7 @@

} // End namespace kp

// Including implemenation for template class
// Including implementation for template class
#ifndef OPALGOBASE_IMPL
#define OPALGOBASE_IMPL

@@ -1972,7 +1972,7 @@ class OpAlgoLhsRhsOut : public OpAlgoBase<tX, tY, tZ>
* the barriers that ensure the memory has been copied before going in and
* out of the shader, as well as the dispatch operation that sends the
* shader processing to the gpu. This function also records the GPU memory
* copy of the output data for the staging bufffer so it can be read by the
* copy of the output data for the staging buffer so it can be read by the
* host.
*/
virtual void record() override;
@@ -1996,7 +1996,7 @@

} // End namespace kp

// Including implemenation for template class
// Including implementation for template class
#ifndef OPALGOLHSRHSOUT_CPP
#define OPALGOLHSRHSOUT_CPP

@@ -2247,7 +2247,7 @@ class OpTensorCopy : public OpBase
void init() override;

/**
* Records the copy commands from teh first tensor into all the other tensors provided. Also optionally records a barrier.
* Records the copy commands from the first tensor into all the other tensors provided. Also optionally records a barrier.
*/
void record() override;

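A hypothetical usage sketch for the copy operation documented above; the templated record call and the tensor variables are assumptions made for illustration and are not shown in this diff.

// Assumes tensorA, tensorB and tensorC are std::shared_ptr<kp::Tensor> created earlier.
auto sq = mgr.createManagedSequence("copy", 0);
sq->record<kp::OpTensorCopy>({ tensorA, tensorB, tensorC });  // assumed record<T> signature
sq->eval();  // the first tensor is the source; tensorB and tensorC receive its data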
20 changes: 10 additions & 10 deletions src/Manager.cpp
@@ -29,12 +29,12 @@ Manager::Manager()
{}

Manager::Manager(uint32_t physicalDeviceIndex,
const std::vector<uint32_t>& familyQueueIndeces)
const std::vector<uint32_t>& familyQueueIndices)
{
this->mPhysicalDeviceIndex = physicalDeviceIndex;

this->createInstance();
this->createDevice(familyQueueIndeces);
this->createDevice(familyQueueIndices);
}

Manager::Manager(std::shared_ptr<vk::Instance> instance,
@@ -119,7 +119,7 @@ Manager::createManagedSequence(std::string sequenceName, uint32_t queueIndex)
std::make_shared<Sequence>(this->mPhysicalDevice,
this->mDevice,
this->mComputeQueues[queueIndex],
this->mComputeQueueFamilyIndeces[queueIndex]);
this->mComputeQueueFamilyIndices[queueIndex]);
sq->init();

if (sequenceName.empty()) {
@@ -128,7 +128,7 @@ Manager::createManagedSequence(std::string sequenceName, uint32_t queueIndex)
{ KP_DEFAULT_SESSION + std::to_string(this->mCurrentSequenceIndex),
sq });
} else {
// TODO: Check if sequence doens't already exist
// TODO: Check if sequence doesn't already exist
this->mManagedSequences.insert({ sequenceName, sq });
}
return sq;
@@ -220,7 +220,7 @@ Manager::createInstance()
}

void
Manager::createDevice(const std::vector<uint32_t>& familyQueueIndeces)
Manager::createDevice(const std::vector<uint32_t>& familyQueueIndices)
{

SPDLOG_DEBUG("Kompute Manager creating Device");
@@ -251,7 +251,7 @@ Manager::createDevice(const std::vector<uint32_t>& familyQueueIndeces)
this->mPhysicalDeviceIndex,
physicalDeviceProperties.deviceName);

if (!familyQueueIndeces.size()) {
if (!familyQueueIndices.size()) {
// Find compute queue
std::vector<vk::QueueFamilyProperties> allQueueFamilyProperties =
physicalDevice.getQueueFamilyProperties();
@@ -272,14 +272,14 @@ Manager::createDevice(const std::vector<uint32_t>& familyQueueIndeces)
throw std::runtime_error("Compute queue is not supported");
}

this->mComputeQueueFamilyIndeces.push_back(computeQueueFamilyIndex);
this->mComputeQueueFamilyIndices.push_back(computeQueueFamilyIndex);
} else {
this->mComputeQueueFamilyIndeces = familyQueueIndeces;
this->mComputeQueueFamilyIndices = familyQueueIndices;
}

std::unordered_map<uint32_t, uint32_t> familyQueueCounts;
std::unordered_map<uint32_t, std::vector<float>> familyQueuePriorities;
for (const auto& value : this->mComputeQueueFamilyIndeces) {
for (const auto& value : this->mComputeQueueFamilyIndices) {
familyQueueCounts[value]++;
familyQueuePriorities[value].push_back(1.0f);
}
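To make the loop above concrete, a worked illustration with a made-up input (not taken from the repository):

// familyQueueIndices = { 0, 2, 2 }
//   familyQueueCounts     -> { {0, 1}, {2, 2} }              // queues requested per family
//   familyQueuePriorities -> { {0, {1.0f}}, {2, {1.0f, 1.0f}} }
// i.e. one entry per distinct family, presumably used further down to request
// that many queues from each family when the device is created.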
@@ -308,7 +308,7 @@ Manager::createDevice(const std::vector<uint32_t>& familyQueueIndeces)
&deviceCreateInfo, nullptr, this->mDevice.get());
SPDLOG_DEBUG("Kompute Manager device created");

for (const uint32_t& familyQueueIndex : this->mComputeQueueFamilyIndeces) {
for (const uint32_t& familyQueueIndex : this->mComputeQueueFamilyIndices) {
std::shared_ptr<vk::Queue> currQueue = std::make_shared<vk::Queue>();

this->mDevice->getQueue(familyQueueIndex,
2 changes: 1 addition & 1 deletion src/include/kompute/Algorithm.hpp
@@ -30,7 +30,7 @@ class Algorithm
std::shared_ptr<vk::CommandBuffer> commandBuffer);

/**
* Initialiser for the shader data provided to the algoithm as well as
* Initialiser for the shader data provided to the algorithm as well as
* tensor parameters that will be used in shader.
*
* @param shaderFileData The bytes in spir-v format of the shader
8 changes: 4 additions & 4 deletions src/include/kompute/Manager.hpp
@@ -29,12 +29,12 @@ class Manager
* they would like to create the resources on.
*
* @param physicalDeviceIndex The index of the physical device to use
* @param familyQueueIndeces (Optional) List of queue indeces to add for
* @param familyQueueIndices (Optional) List of queue indices to add for
* explicit allocation
* @param totalQueues The total number of compute queues to create.
*/
Manager(uint32_t physicalDeviceIndex,
const std::vector<uint32_t>& familyQueueIndeces = {});
const std::vector<uint32_t>& familyQueueIndices = {});

/**
* Manager constructor which allows your own vulkan application to integrate
@@ -271,7 +271,7 @@ class Manager
std::unordered_map<std::string, std::shared_ptr<Sequence>>
mManagedSequences;

std::vector<uint32_t> mComputeQueueFamilyIndeces;
std::vector<uint32_t> mComputeQueueFamilyIndices;
std::vector<std::shared_ptr<vk::Queue>> mComputeQueues;

uint32_t mCurrentSequenceIndex = -1;
@@ -285,7 +285,7 @@

// Create functions
void createInstance();
void createDevice(const std::vector<uint32_t>& familyQueueIndeces = {});
void createDevice(const std::vector<uint32_t>& familyQueueIndices = {});
};

} // End namespace kp
4 changes: 2 additions & 2 deletions src/include/kompute/Tensor.hpp
@@ -11,7 +11,7 @@ namespace kp {
*
* Tensors are the base building block in Kompute to perform operations across
* GPUs. Each tensor would have a respective Vulkan memory and buffer, which
* woudl be used to store their respective data. The tensors can be used for GPU
* would be used to store their respective data. The tensors can be used for GPU
* data storage or transfer.
*/
class Tensor
@@ -54,7 +54,7 @@ class Tensor
/**
* Initialiser which calls the initialisation for all the respective tensors
* as well as creates the respective staging tensors. The staging tensors
* woudl only be created for the tensors of type TensorType::eDevice as
* would only be created for the tensors of type TensorType::eDevice as
* otherwise there is no need to copy from host memory.
*/
void init(std::shared_ptr<vk::PhysicalDevice> physicalDevice,
4 changes: 2 additions & 2 deletions src/include/kompute/operations/OpAlgoBase.hpp
@@ -104,7 +104,7 @@ class OpAlgoBase : public OpBase
* the barriers that ensure the memory has been copied before going in and
* out of the shader, as well as the dispatch operation that sends the
* shader processing to the gpu. This function also records the GPU memory
* copy of the output data for the staging bufffer so it can be read by the
* copy of the output data for the staging buffer so it can be read by the
* host.
*/
virtual void record() override;
@@ -143,7 +143,7 @@

} // End namespace kp

// Including implemenation for template class
// Including implementation for template class
#ifndef OPALGOBASE_IMPL
#define OPALGOBASE_IMPL

4 changes: 2 additions & 2 deletions src/include/kompute/operations/OpAlgoLhsRhsOut.hpp
@@ -63,7 +63,7 @@ class OpAlgoLhsRhsOut : public OpAlgoBase<tX, tY, tZ>
* the barriers that ensure the memory has been copied before going in and
* out of the shader, as well as the dispatch operation that sends the
* shader processing to the gpu. This function also records the GPU memory
* copy of the output data for the staging bufffer so it can be read by the
* copy of the output data for the staging buffer so it can be read by the
* host.
*/
virtual void record() override;
@@ -87,7 +87,7 @@ class OpAlgoLhsRhsOut : public OpAlgoBase<tX, tY, tZ>

} // End namespace kp

// Including implemenation for template class
// Including implementation for template class
#ifndef OPALGOLHSRHSOUT_CPP
#define OPALGOLHSRHSOUT_CPP
