Merge branch 'main' into vibhansa/entry_cache_new
vibhansa-msft authored Sep 18, 2024
2 parents a392356 + de2f9fd commit ea40ce7
Showing 7 changed files with 96 additions and 14 deletions.
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
@@ -1 +1 @@
* @vibhansa-msft @souravgupta-msft @ashruti-msft @syeleti-msft
* @vibhansa-msft @souravgupta-msft @ashruti-msft @syeleti-msft @jainakanksha-msft
27 changes: 18 additions & 9 deletions README.md
@@ -1,23 +1,31 @@
# Blobfuse2 - A Microsoft supported Azure Storage FUSE driver
## About
Blobfuse2 is an open source project developed to provide a virtual filesystem backed by Azure Storage. It uses the libfuse open source library (fuse3) to communicate with the Linux FUSE kernel module, and implements the filesystem operations using the Azure Storage REST APIs.
This is the next generation [blobfuse](https://github.com/Azure/azure-storage-fuse)
This is the next generation [blobfuse](https://github.com/Azure/azure-storage-fuse).

Blobfuse2 is stable, and is ***supported by Microsoft*** provided that it is used within its limits documented here. Blobfuse2 supports both reads and writes however, it does not guarantee continuous sync of data written to storage using other APIs or other mounts of Blobfuse2. For data integrity it is recommended that multiple sources do not modify the same blob/file. Please submit an issue [here](https://github.com/azure/azure-storage-fuse/issues) for any issues/feature requests/questions.
## About Data Consistency and Concurrency
Blobfuse2 is stable and ***supported by Microsoft*** when used within its [documented limits](#un-supported-file-system-operations). Blobfuse2 supports high-performance reads and writes with strong consistency; however, it is recommended that multiple clients do not modify the same blob/file simultaneously to ensure data integrity. Blobfuse2 does not guarantee continuous synchronization of data written to the same blob/file using multiple clients or across multiple mounts of Blobfuse2 concurrently. If you modify an existing blob/file with another client while also reading that object, Blobfuse2 will not return the most up-to-date data. To ensure your reads see the newest blob/file data, disable all forms of caching at kernel (using `direct-io`) as well as at Blobfuse2 level, and then re-open the blob/file.
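To illustrate the paragraph above, here is a minimal config sketch with all caching disabled so reads see the latest blob data. The key names are assumptions based on the Blobfuse2 config format and are not taken from this commit; verify them against the config reference for your version:

```yaml
# Hypothetical sketch: no file_cache / block_cache component in the
# pipeline, and direct-io to bypass the kernel page cache.
# Key names assumed; verify for your Blobfuse2 version.
components:
  - libfuse
  - azstorage

libfuse:
  direct-io: true
```

With a config like this, re-opening a blob/file after another client modifies it should return the newest data, at the cost of losing all caching benefits.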

[This](https://github.com/Azure/azure-storage-fuse/tree/main?tab=readme-ov-file#config-guide) section will help you choose the correct config for Blobfuse2.
Please submit an issue [here](https://github.com/azure/azure-storage-fuse/issues) for any issues/feature requests/questions.

[This](#config-guide) section will help you choose the correct config for Blobfuse2.

## NOTICE
- If you are using version 2.2.0, 2.2.1, or 2.3.0, refrain from using block-cache mode and switch to `file-cache` mode. [Known issues](https://github.com/Azure/azure-storage-fuse/wiki/Blobfuse2-Known-issues) in these versions are fixed in version **`2.3.2`**.
- Due to known data consistency issues when using Blobfuse2 in `block-cache` mode, it is strongly recommended that all Blobfuse2 installations be upgraded to version 2.3.2. For more information, see [this](https://github.com/Azure/azure-storage-fuse/wiki/Blobfuse2-Known-issues).
- As of version 2.3.0, Blobfuse2 has updated its authentication methods. For Managed Identity, Object ID-based OAuth is accessible only via CLI-based login, which requires Azure CLI on the system. For a dependency-free option, users may utilize Application/Client ID or Resource ID-based authentication.
- `streaming` mode is being deprecated.

## Limitations in Block Cache
- Parallel write operations using multiple handles on a same file is not supported and might lead to data inconsistency.
- Read operation on a file which is being written via another handle will not return updated data.
- When using `cp` utility on mounted path, always use `--sparse=never` parameter. For example, `cp --sparse=never src dest`
- In write operations data is persisted in storage only on close, sync or flush calls.
- User applications must check the returned code for write, close and flush operations.
- Concurrent write operations on the same file using multiple handles is not checked for data consistency and may lead to incorrect data being written.
- A read operation on a file that is being written to simultaneously by another process or handle will not return the most up-to-date data.
- When copying files with trailing null bytes using `cp` utility to a Blobfuse2 mounted path, use `--sparse=never` parameter to avoid data being trimmed. For example, `cp --sparse=never src dest`.
- In write operations, data written is persisted (or committed) to the Azure Storage container only when close, sync or flush operations are called by the user application.
- Files cannot be modified if they were originally created with a block size different from the one configured.

## Recommendations in Block Cache
- User applications must check the return code (success/failure) of filesystem calls like read, write, close, flush, etc. If an error is returned, the application must abort the respective operation.
- User applications must ensure that there is only one writer at a time for a given file.
- When dealing with very large files (in TiB), the block-size must be configured accordingly. Azure Storage supports only [50,000 blocks](https://learn.microsoft.com/en-us/rest/api/storageservices/put-block-list?tabs=microsoft-entra-id#remarks) per blob.
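The block-size recommendation above can be made concrete with a quick calculation: since Azure Storage allows at most 50,000 blocks per blob, the configured block size bounds the maximum file size. The helper below is a hypothetical illustration, not part of Blobfuse2:

```python
# Illustrative only: relate Blobfuse2 block size to maximum file size,
# given the Azure Storage limit of 50,000 blocks per blob.
MAX_BLOCKS_PER_BLOB = 50_000

def min_block_size_mib(file_size_tib: float) -> float:
    """Smallest block size (in MiB) that can hold a file of the given size (in TiB)."""
    file_size_mib = file_size_tib * 1024 * 1024  # TiB -> MiB
    return file_size_mib / MAX_BLOCKS_PER_BLOB

# A 4 TiB file needs blocks of at least ~84 MiB:
print(min_block_size_mib(4))  # 83.88608
```

So for multi-TiB files the default block size is not enough; the block size must be raised before the file is first created, since (per the limitation above) it cannot be changed afterwards.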

## Blobfuse2 Benchmarks
[This](https://azure.github.io/azure-storage-fuse/) page lists various benchmarking results for HNS and FNS Storage accounts.
@@ -261,6 +269,7 @@ If your use-case involves updating/uploading file(s) through other means and you
`docker run -it --rm --cap-add=SYS_ADMIN --device=/dev/fuse --security-opt apparmor:unconfined <environment variables> <docker image>`
- In case of `mount all`, the system may limit the number of containers you can mount in parallel (when you go above 100 containers). To increase this system limit, use the command below:
`echo 256 | sudo tee /proc/sys/fs/inotify/max_user_instances`
- Refer to [this section](#limitations-in-block-cache) for block-cache limitations.

### Syslog security warning
By default, Blobfuse2 will log to syslog. The default settings will, in some cases, log relevant file paths to syslog.
73 changes: 73 additions & 0 deletions common/util_test.go
@@ -34,9 +34,11 @@
package common

import (
"bytes"
"fmt"
"math/rand"
"os"
"os/exec"
"path/filepath"
"testing"
"time"
@@ -67,6 +69,74 @@ func TestUtil(t *testing.T) {
suite.Run(t, new(utilTestSuite))
}

func (suite *typesTestSuite) TestIsMountActiveNoMount() {
var out bytes.Buffer
cmd := exec.Command("pidof", "blobfuse2")
cmd.Stdout = &out
err := cmd.Run()
suite.assert.Equal("exit status 1", err.Error())
res, err := IsMountActive("/mnt/blobfuse")
suite.assert.Nil(err)
suite.assert.False(res)
}

func (suite *typesTestSuite) TestIsMountActiveTwoMounts() {
var out bytes.Buffer

// Define the file name and the content you want to write
fileName := "config.yaml"

lbpath := filepath.Join(home_dir, "lbpath")
os.MkdirAll(lbpath, 0777)
defer os.RemoveAll(lbpath)

content := "components:\n" +
" - libfuse\n" +
" - loopbackfs\n\n" +
"loopbackfs:\n" +
" path: " + lbpath + "\n\n"

mntdir := filepath.Join(home_dir, "mountdir")
os.MkdirAll(mntdir, 0777)
defer os.RemoveAll(mntdir)

dir, err := os.Getwd()
suite.assert.Nil(err)
configFile := filepath.Join(dir, "config.yaml")
// Create or open the file. If it doesn't exist, it will be created.
file, err := os.Create(fileName)
suite.assert.Nil(err)
defer file.Close() // Ensure the file is closed after we're done

// Write the content to the file
_, err = file.WriteString(content)
suite.assert.Nil(err)

err = os.Chdir("..")
suite.assert.Nil(err)

dir, err = os.Getwd()
suite.assert.Nil(err)
binary := filepath.Join(dir, "blobfuse2")
cmd := exec.Command(binary, mntdir, "--config-file", configFile)
cmd.Stdout = &out
err = cmd.Run()
suite.assert.Nil(err)

res, err := IsMountActive(mntdir)
suite.assert.Nil(err)
suite.assert.True(res)

res, err = IsMountActive("/mnt/blobfuse")
suite.assert.Nil(err)
suite.assert.False(res)

cmd = exec.Command(binary, "unmount", mntdir)
cmd.Stdout = &out
err = cmd.Run()
suite.assert.Nil(err)
}

func (suite *typesTestSuite) TestDirectoryExists() {
rand := randomString(8)
dir := filepath.Join(home_dir, "dir"+rand)
@@ -261,6 +331,9 @@ func (suite *utilTestSuite) TestDirectoryCleanup() {

err = TempCacheCleanup(dirName)
suite.assert.Nil(err)

_ = os.RemoveAll(dirName)

}

func (suite *utilTestSuite) TestGetFuseMinorVersion() {
2 changes: 1 addition & 1 deletion perf_testing/scripts/highspeed_create.py
@@ -29,7 +29,7 @@ def main(folder, num_files):
total_data_written = num_files * 20 # in GB
speed_gbps = (total_data_written * 8) / total_time # converting GB to Gb and then calculating Gbps

print(json.dumps({"name": "create_10_20GB_file", "total_time": total_time, "speed": speed_gbps, "unit": "GiB/s"}))
print(json.dumps({"name": "create_10_20GB_file", "total_time": total_time, "speed": speed_gbps / 8, "unit": "GiB/s"}))

if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Create multiple 20GB files in parallel.')
2 changes: 1 addition & 1 deletion perf_testing/scripts/highspeed_read.py
@@ -39,7 +39,7 @@ def main(file_paths):
total_size_gb = total_size / (1024 ** 3) # Convert bytes to GB
speed_gbps = (total_size * 8) / (time_taken * 10**9) # Convert bytes to bits and calculate speed in Gbps

print(json.dumps({"name": "read_10_20GB_file", "total_time": time_taken, "speed": speed_gbps, "unit": "GiB/s"}))
print(json.dumps({"name": "read_10_20GB_file", "total_time": time_taken, "speed": speed_gbps / 8, "unit": "GiB/s"}))

if __name__ == "__main__":
if len(sys.argv) < 2:
2 changes: 1 addition & 1 deletion perf_testing/scripts/read.py
@@ -30,4 +30,4 @@
read_mbps = ((bytes_read/read_time) * 8)/(1024 * 1024)
total_mbps = ((bytes_read/total_time) * 8)/(1024 * 1024)

print(json.dumps({"name": "read_" + size + "GB", "open_time": open_time, "read_time": read_time, "close_time": close_time, "total_time": total_time, "read_mbps": read_mbps, "speed": total_mbps, "unit": "MiB/s"}))
print(json.dumps({"name": "read_" + size + "GB", "open_time": open_time, "read_time": read_time, "close_time": close_time, "total_time": total_time, "read_mbps": read_mbps / 8, "speed": total_mbps / 8, "unit": "MiB/s"}))
2 changes: 1 addition & 1 deletion perf_testing/scripts/write.py
@@ -32,4 +32,4 @@
write_mbps = ((bytes_written/write_time) * 8)/(1024 * 1024)
total_mbps = ((bytes_written/total_time) * 8)/(1024 * 1024)

print(json.dumps({"name": "write_" + size + "GB", "open_time": open_time, "write_time": write_time, "close_time": close_time, "total_time": total_time, "write_mbps": write_mbps, "speed": total_mbps, "unit": "MiB/s"}))
print(json.dumps({"name": "write_" + size + "GB", "open_time": open_time, "write_time": write_time, "close_time": close_time, "total_time": total_time, "write_mbps": write_mbps / 8, "speed": total_mbps / 8, "unit": "MiB/s"}))
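The `/ 8` edits across the perf scripts all fix the same unit bug: the scripts compute a bit-rate (Gbps/Mbps) but label the result `GiB/s`/`MiB/s`, so the reported numbers were 8x too high. A minimal sketch of the conversion (illustrative helper, not part of the scripts; strictly speaking Gbps/8 gives decimal GB/s rather than binary GiB/s, matching how the scripts already treat the two as interchangeable):

```python
# Sketch of the unit conversion behind the "/ 8" fixes above.
def gbps_to_gbytes_per_s(speed_gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return speed_gbps / 8

# Example mirroring highspeed_create.py: 10 files x 20 GB in 100 s.
total_gb = 10 * 20
total_time = 100.0
speed_gbps = (total_gb * 8) / total_time   # 16.0 Gbps
print(gbps_to_gbytes_per_s(speed_gbps))    # 2.0
```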
