Merge branch 'main' into limit-writeback

juicedata · Mar 23, 2022 · 3696c91
2 parents 08fa7b1 + adbabaf
commit 3696c91
Showing 29 changed files with 543 additions and 102 deletions.
diff --git a/cmd/sync.go b/cmd/sync.go
@@ -137,6 +137,11 @@ Supported storage systems: https://juicefs.com/docs/community/how_to_setup_objec
 				Name:  "include",
 				Usage: "don't exclude Key matching PATTERN",
 			},
+			&cli.BoolFlag{
+				Name:    "links",
+				Aliases: []string{"l"},
+				Usage:   "copy symlinks as symlinks",
+			},
 			&cli.StringFlag{
 				Name:  "manager",
 				Usage: "manager address",
@@ -246,6 +251,10 @@ func createSyncStorage(uri string, conf *sync.Config) (object.ObjectStorage, err
 	}
 	name := strings.ToLower(u.Scheme)
 	endpoint := u.Host
+	if conf.Links && name != "file" {
+		logger.Warnf("storage %s does not support symlink, ignore it", uri)
+		conf.Links = false
+	}
 
 	isS3PathTypeUrl := isS3PathType(endpoint)
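
The guard added to `createSyncStorage` in this hunk can be exercised in isolation. The following is a minimal sketch, not the project's code: `Config` is a hypothetical stand-in for `sync.Config` that models only the `Links` field, `resolveLinks` is a hypothetical helper name, and `fmt.Printf` replaces the real `logger.Warnf`. It also assumes the URI already carries a scheme when it reaches this point.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Config is a hypothetical stand-in for sync.Config; only Links is modeled.
type Config struct {
	Links bool
}

// resolveLinks mirrors the check in the hunk above: symlink copying is kept
// only when the storage scheme is "file"; otherwise it warns and disables it.
func resolveLinks(uri string, conf *Config) error {
	u, err := url.Parse(uri)
	if err != nil {
		return fmt.Errorf("parse %q: %w", uri, err)
	}
	name := strings.ToLower(u.Scheme)
	if conf.Links && name != "file" {
		// The real code calls logger.Warnf; printing keeps the sketch self-contained.
		fmt.Printf("storage %s does not support symlink, ignore it\n", uri)
		conf.Links = false
	}
	return nil
}

func main() {
	for _, uri := range []string{"file:///data/src", "s3://bucket/prefix"} {
		conf := &Config{Links: true}
		if err := resolveLinks(uri, conf); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> links=%v\n", uri, conf.Links)
	}
}
```

Because the function mutates the passed-in config, requesting `--links` against an unsupported storage degrades to a regular copy instead of failing the whole sync.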
 

diff --git a/docs/en/administration/fault_diagnosis_and_analysis.md b/docs/en/administration/fault_diagnosis_and_analysis.md
@@ -6,9 +6,15 @@ slug: /fault_diagnosis_and_analysis
 
 # Fault Diagnosis and Analysis
 
-## Error Log
+## Client log
 
-When JuiceFS run in background (through [`-d` option](../reference/command_reference.md#juicefs-mount) when mount volume), logs will output to syslog and `/var/log/juicefs.log` (v0.15+, refer to [`--log` option](../reference/command_reference.md#juicefs-mount)). Depending on your operating system, you can get the logs through different commands:
+JuiceFS clients output logs for troubleshooting during operation. The log levels, from low to high, are DEBUG, INFO, WARNING, ERROR, and FATAL. By default, only logs at INFO level and above are output. To output DEBUG-level logs, enable them explicitly when running the JuiceFS client, for example by adding the `--debug` option.
+
+How you obtain the logs depends on the type of JuiceFS client, as described below.
+
+### Mount point
+
+When the JuiceFS file system is mounted with the [`-d` option](../reference/command_reference.md#juicefs-mount) (which runs it in the background), logs are output to syslog and `/var/log/juicefs.log` (requires client v0.15 or above, see the [`--log` option](../reference/command_reference.md#juicefs-mount)). Depending on the operating system you are using, you can get the logs with different commands:
 
 ```bash
 # macOS
@@ -20,11 +26,11 @@ $ cat /var/log/syslog | grep 'juicefs'
 # CentOS based system
 $ cat /var/log/messages | grep 'juicefs'
 
-# v0.15+
+# All systems (requires JuiceFS v0.15+)
 $ tail -n 100 /var/log/juicefs.log
 ```
 
-There are 4 log levels. You can use the `grep` command to filter different levels of logs for performance analysis or troubleshooting:
+You can use the `grep` command to filter different levels of logs for performance analysis or troubleshooting:
 
 ```
 $ cat /var/log/syslog | grep 'juicefs' | grep '<INFO>'
@@ -33,9 +39,47 @@ $ cat /var/log/syslog | grep 'juicefs' | grep '<ERROR>'
 $ cat /var/log/syslog | grep 'juicefs' | grep '<FATAL>'
 ```
 
-## Access Log
+### Kubernetes CSI Driver
+
+Depending on the version of the JuiceFS CSI Driver you are using, there will be different ways to obtain logs. For details, please refer to [CSI Driver documentation](https://juicefs.com/docs/csi/troubleshooting).
+
+### S3 Gateway
+
+The S3 gateway only supports running in the foreground, so client logs are output directly to the terminal. If you are deploying the S3 gateway in Kubernetes, you need to view the logs for the corresponding pod.
+
+### Hadoop Java SDK
+
+The logs of an application process (such as a Spark executor) that uses the JuiceFS Hadoop Java SDK include the JuiceFS client logs. Because they are mixed with the logs generated by the application itself, you need to filter them by specific keywords (such as `juicefs`; note that the match should ignore case).
+
+
+## Access log
+
+Each JuiceFS client has an access log that details all operations on the file system, such as the operation type, user ID, group ID, file inodes, and how long each operation took. Access logs can be used for various purposes such as performance analysis, auditing, and troubleshooting.
+
+### Access log format
+
+An example format of an access log is as follows:
+
+```
+2021.01.15 08:26:11.003330 [uid:0,gid:0,pid:4403] write (17669,8666,4993160): OK <0.000010>
+```
+
+The meaning of each column is:
+
+- `2021.01.15 08:26:11.003330`: The time of the current operation
+- `[uid:0,gid:0,pid:4403]`: User ID, group ID, process ID of the current operation
+- `write`: Operation type
+- `(17669,8666,4993160)`: The input parameters of the current operation type. For example, the input parameters of the `write` operation in the example are the inode of the written file, the size of the written data, and the offset of the written file. Different operation types have different parameters. For details, please refer to the [`vfs.go`](https://github.com/juicedata/juicefs/blob/main/pkg/vfs/vfs.go) file.
+- `OK`: Whether the current operation succeeded. If it failed, specific failure information is output instead.
+- `<0.000010>`: The time (in seconds) that the current operation took
 
-There is a virtual file called `.accesslog` in the root of JuiceFS to show all the operations and the time they takes, for example:
+You can debug and analyze performance issues with the access log, or try `juicefs profile <mount-point>` to see real-time statistics. Run `juicefs profile -h` or refer to [Operations Profiling](../benchmark/operations_profiling.md) to learn more about this subcommand.
+
+How you obtain the access log depends on the type of JuiceFS client, as described below.
+
+### Mount point
+
+There is a virtual file named `.accesslog` in the root directory of the JuiceFS file system mount point. Its contents can be viewed with the `cat` command (the command will not exit). For example, assuming the root directory of the mount point is `/jfs`:
 
 ```bash
 $ cat /jfs/.accesslog
@@ -44,9 +88,25 @@ $ cat /jfs/.accesslog
 2021.01.15 08:26:11.003616 [uid:0,gid:0,pid:4403] write (17666,390,951582): OK <0.000006>
 ```
 
-The last number on each line is the time (in seconds) current operation takes. You can use this to know information of every operation, or try `juicefs profile /jfs` to monitor aggregated statistics. Please run `juicefs profile -h` or refer to [here](../benchmark/operations_profiling.md) to learn more about this subcommand.
+### Kubernetes CSI driver
+
+First [get the mount pod](https://juicefs.com/docs/csi/troubleshooting/#get-mount-pod), then view the `.accesslog` file in the root directory of the JuiceFS file system mount point inside the mount pod.
+The mount point in the mount pod is `/jfs/<pv_volumeHandle>`. For example, assuming the PV volumeHandle is `pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373`:
+
+```bash
+kubectl -n kube-system exec juicefs-chaos-k8s-002-pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373 -- cat /jfs/pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373/.accesslog
+```
+
+### S3 Gateway
+
+You need to add the [`--access-log` option](../reference/command_reference.md#juicefs-gateway) when starting the S3 gateway to specify the path to output the access log. By default, the S3 gateway does not output the access log.
+
+### Hadoop Java SDK
+
+You need to add the `juicefs.access-log` configuration item to the [client configurations](../deployment/hadoop_java_sdk.md#other-configurations) of the JuiceFS Hadoop Java SDK to specify the path of the access log output. By default, the access log is not output.
+
 
-## Runtime Information
+## Runtime information
 
 By default, JuiceFS clients will listen to a TCP port locally via [pprof](https://pkg.go.dev/net/http/pprof) to get runtime information such as Goroutine stack information, CPU performance statistics, memory allocation statistics. You can see the specific port number that the current JuiceFS client is listening on by using the system command (e.g. `lsof`):
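
The access log format documented in this file's diff (timestamp, `[uid,gid,pid]`, operation, parameters, result, elapsed seconds) lends itself to simple machine parsing. The following is a minimal sketch under stated assumptions: the `AccessEntry` type, the `parseAccessLog` function, and the regular expression are hypothetical and derived only from the single example line shown in the docs, so lines with other shapes may not match.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// AccessEntry is a hypothetical container for one parsed access log line.
type AccessEntry struct {
	Time          time.Time
	UID, GID, PID int
	Op            string  // operation type, e.g. "write"
	Args          string  // raw parameter list between the parentheses
	Result        string  // "OK" or failure information
	Seconds       float64 // time the operation took, in seconds
}

// accessLine is an assumed pattern based on the documented example line.
var accessLine = regexp.MustCompile(
	`^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) ` +
		`\[uid:(\d+),gid:(\d+),pid:(\d+)\] ` +
		`(\w+) \((.*)\): (.*) <(\d+\.\d+)>$`)

// parseAccessLog splits one line into its documented columns.
func parseAccessLog(line string) (AccessEntry, error) {
	m := accessLine.FindStringSubmatch(line)
	if m == nil {
		return AccessEntry{}, fmt.Errorf("unrecognized access log line: %q", line)
	}
	ts, err := time.Parse("2006.01.02 15:04:05.000000", m[1])
	if err != nil {
		return AccessEntry{}, err
	}
	ids := make([]int, 3) // uid, gid, pid in that order
	for i := range ids {
		if ids[i], err = strconv.Atoi(m[2+i]); err != nil {
			return AccessEntry{}, err
		}
	}
	secs, err := strconv.ParseFloat(m[8], 64)
	if err != nil {
		return AccessEntry{}, err
	}
	return AccessEntry{
		Time: ts, UID: ids[0], GID: ids[1], PID: ids[2],
		Op: m[5], Args: m[6], Result: m[7], Seconds: secs,
	}, nil
}

func main() {
	line := "2021.01.15 08:26:11.003330 [uid:0,gid:0,pid:4403] write (17669,8666,4993160): OK <0.000010>"
	e, err := parseAccessLog(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s by pid %d took %.6fs (result: %s)\n", e.Op, e.PID, e.Seconds, e.Result)
}
```

The parameter list is kept as a raw string because, as the docs note, its meaning differs per operation type.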
 

diff --git a/docs/en/deployment/juicefs_on_docker.md b/docs/en/deployment/juicefs_on_docker.md
@@ -5,11 +5,11 @@ slug: /juicefs_on_docker
 ---
 # Use JuiceFS on Docker
 
-There are  three ways to use JuiceFS on Docker:
+There are three ways to use JuiceFS with Docker:
 
-## 1. Volume Mapping
+## 1. Volume Mapping {#volume-mapping}
 
-This method is to map the directories in the JuiceFS mount point to the Docker container. For example, the JuiceFS storage is mounted in the `/mnt/jfs` directory. When creating a container, you can map JuiceFS storage to the Docker container as follows:
+Volume mapping maps directories in the JuiceFS mount point into the Docker container. For example, assuming a JuiceFS file system is mounted at the `/mnt/jfs` directory, you can map this file system when creating a Docker container as follows:
 
 ```shell
 $ sudo docker run -d --name nginx \
@@ -18,13 +18,13 @@ $ sudo docker run -d --name nginx \
   nginx
 ```
 
-By default, only the user who mounts the JuiceFS storage has the access permissions for the storage. When you need to map the JuiceFS storage to a Docker container, if you are not using the root identity to mount the JuiceFS storage, you need to turn on the FUSE `user_allow_other` first, and then re-mount the JuiceFS with `-o allow_other` option.
+By default, only the user who mounts the JuiceFS file system has access permissions. To make a file system mappable for Docker containers created by other users, you need to enable the FUSE option `user_allow_other` first, and then re-mount the file system with the `-o allow_other` option.
 
-> **Note**: JuiceFS storage mounted with root user identity or `sudo` will automatically add the `allow_other` option, no manual setting is required.
+> **Note**: A JuiceFS file system mounted with root privileges already has the `allow_other` option enabled, so you don't need to set it manually.
 
-### FUSE Setting
+### FUSE Settings
 
-By default, the `allow_other` option is only allowed to be used by the root user. In order to allow other users to use this mount option, the FUSE configuration file needs to be modified.
+By default, the `allow_other` option is only available to users with root privileges. To allow other users to use this mount option, the FUSE configuration file needs to be modified.
 
 ### Change the configuration file
 
@@ -34,7 +34,7 @@ Edit the configuration file of FUSE, usually `/etc/fuse.conf`:
 $ sudo nano /etc/fuse.conf
 ```
 
-Delete the `# ` symbol in front of `user_allow_other` in the configuration file, and modify it as follows:
+First, uncomment the line `# user_allow_other` by deleting the `#` symbol. Your configuration file should look like the following after the modification:
 
 ```conf
 # /etc/fuse.conf - Configuration file for Filesystem in Userspace (FUSE)
@@ -49,15 +49,15 @@ user_allow_other
 
 #### Re-mount JuiceFS
 
-After the `allow_other` of FUSE is enabled, you need to re-mount the JuiceFS file systemd with the `allow_other` option, for example:
+Run the following command to re-mount the JuiceFS file system with the `allow_other` option:
 
 ```sh
 $ juicefs mount -d -o allow_other redis://<your-redis-url>:6379/1 /mnt/jfs
 ```
 
 ## 2. Docker Volume Plugin
 
-We can also use [volume plugin](https://docs.docker.com/engine/extend/) to access JuiceFS.
+[Volume plugin](https://docs.docker.com/engine/extend/) is another option to access JuiceFS.
 
 ```sh
 $ docker plugin install juicedata/juicefs
@@ -71,13 +71,13 @@ $ docker volume create -d juicedata/juicefs:latest -o name={{VOLUME_NAME}} -o me
 $ docker run -it -v jfsvolume:/opt busybox ls /opt
 ```
 
-Replace above `{{VOLUME_NAME}}`, `{{META_URL}}`, `{{ACCESS_KEY}}`, `{{SECRET_KEY}}` to your own volume setting. For more details about JuiceFS volume plugin, refer [juicedata/docker-volume-juicefs](https://github.com/juicedata/docker-volume-juicefs) repository.
+Replace `{{VOLUME_NAME}}`, `{{META_URL}}`, `{{ACCESS_KEY}}` and `{{SECRET_KEY}}` with your own values. For more details about the JuiceFS volume plugin, please refer to the [juicedata/docker-volume-juicefs](https://github.com/juicedata/docker-volume-juicefs) repository.
 
 ## 3. Mount JuiceFS in a Container
 
-This method is to mount and use the JuiceFS storage directly in the Docker container. Compared with the first method, directly mounting JuiceFS in the container can reduce the chance of file misoperation. It also makes container management clearer and more intuitive.
+This section introduces a way to mount and use a JuiceFS file system directly in a Docker container. Compared with [volume mapping](#volume-mapping), mounting directly reduces the chance of operating on files by mistake. It also makes container management clearer and more intuitive.
 
-Since the file system mounting in the container needs to copy the JuiceFS client to the container, the process of downloading or copying the JuiceFS client and mounting the file system needs to be written into the Dockerfile, and then rebuilt the image. For example, you can refer to the following Dockerfile to package the JuiceFS client into the Alpine image.
+To mount a JuiceFS file system in a Docker container, the JuiceFS client executable needs to be copied into the image. Usually, this is done by writing the commands that download or copy the executable and mount the file system into your Dockerfile, and then rebuilding the image. The following Dockerfile is an example that packs the JuiceFS client into the Alpine image.
 
 ```dockerfile
 FROM alpine:latest
@@ -96,7 +96,7 @@ RUN apk add --no-cache curl && \
 ENTRYPOINT ["/usr/bin/juicefs", "mount"]
 ```
 
-In addition, since the use of FUSE in the container requires corresponding permissions, when creating the container, you need to specify the `--privileged=true` option, for example:
+In addition, using FUSE in a container requires specific permissions. You need to specify the `--privileged=true` option when creating the container. For example:
 
 ```shell
 $ sudo docker run -d --name nginx \

diff --git a/docs/en/reference/command_reference.md b/docs/en/reference/command_reference.md
@@ -561,6 +561,9 @@ don't exclude Key matching PATTERN. Need to be used with `--exclude PATTERN`.
 The order in which `--exclude` and `--include` are set will affect the result. Each object will be matched in the order in which the two parameters appear. Once the PATTERN of a parameter is matched, the behavior of the object is the type of the parameter, and the matching of the parameters that appear later will not be attempted. If the object is not matched by any of the parameters, the default behavior of the object is include . `--include` and `--exclude` parameters are designed with reference to `rsync`, but currently we do not support the two matching rules of `**` and `***` in `rsync`.
 :::
 
+`--links, -l`<br />
+copy symlinks as symlinks (default: false)
+
 `--manager value`<br />
 manager address
 

diff --git a/docs/en/reference/how_to_setup_metadata_engine.md b/docs/en/reference/how_to_setup_metadata_engine.md
@@ -113,7 +113,7 @@ Other PostgreSQL-compatible databases (such as CockroachDB) can also be used as
 
 ### Create a file system
 
-When using PostgreSQL as the metadata storage engine, the following format is usually used to access the database:
+When using PostgreSQL as the metadata storage engine, you need to create a database manually before creating the file system. The following format is usually used to access the database:
 
 ```shell
 postgres://<username>[:<password>]@<host>[:5432]/<database-name>[?parameters]
@@ -133,7 +133,7 @@ $ juicefs format --storage s3 \
 A more secure approach would be to pass the database password through the environment variable `META_PASSWORD`:
 
 ```shell
-$ export META_PASSWORD=password
+$ export META_PASSWORD="mypassword"
 $ juicefs format --storage s3 \
     ...
     "postgres://[email protected]:5432/juicefs" \
@@ -149,7 +149,7 @@ sudo juicefs mount -d "postgres://user:[email protected]:5432/juicefs" /mnt
 Passing password with the `META_PASSWORD` environment variable is also supported when mounting a file system.
 
 ```shell
-$ export META_PASSWORD=mypassword
+$ export META_PASSWORD="mypassword"
 $ sudo juicefs mount -d "postgres://[email protected]:5432/juicefs" /mnt/jfs
 ```
 
@@ -172,7 +172,7 @@ Additional parameters can be appended to the metadata URL, [click here to view](
 
 ### Create a file system
 
-When using MySQL as the metadata storage engine, the following format is usually used to access the database:
+When using MySQL as the metadata storage engine, you need to create a database manually before creating the file system. The following format is usually used to access the database:
 
 ```shell
 mysql://<username>[:<password>]@(<host>:3306)/<database-name>
@@ -194,7 +194,7 @@ $ juicefs format --storage s3 \
 A more secure approach would be to pass the database password through the environment variable `META_PASSWORD`:
 
 ```shell
-$ export META_PASSWORD=mypassword
+$ export META_PASSWORD="mypassword"
 $ juicefs format --storage s3 \
     ...
     "mysql://user@(192.168.1.6:3306)/juicefs" \
@@ -210,7 +210,7 @@ sudo juicefs mount -d "mysql://user:mypassword@(192.168.1.6:3306)/juicefs" /mnt/
 Passing password with the `META_PASSWORD` environment variable is also supported when mounting a file system.
 
 ```shell
-$ export META_PASSWORD=mypassword
+$ export META_PASSWORD="mypassword"
 $ sudo juicefs mount -d "mysql://user@(192.168.1.6:3306)/juicefs" /mnt/jfs
 ```
 
@@ -236,7 +236,7 @@ $ sudo juicefs mount -d "mysql://user:mypassword@(192.168.1.6:3306)/juicefs" /mn
 Passing passwords through environment variables is also exactly the same:
 
 ```shell
-$ export META_PASSWORD=mypassword
+$ export META_PASSWORD="mypassword"
 $ juicefs format --storage s3 \
     ...
     "mysql://user@(192.168.1.6:3306)/juicefs" \