-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor #6
base: main
Are you sure you want to change the base?
Refactor #6
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
新添加的函数请放到以后函数的下面,否则不方便 diff review
repo.go
Outdated
if file.Size != newSize || file.SecUpdated() != newUpdated { | ||
logging.LogErrorf("file changed [%s], size [%d -> %d], updated [%d -> %d]", absPath, file.Size, newSize, file.Updated, newUpdated) | ||
|
||
update = file.Size != newSize || file.SecUpdated() != newUpdated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里没有必要抽取变量了,直接条件判断。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个函数的本意是判断文件是否更新,虽然为了跟之前的代码保持一致,在错误里也携带了相关信息,但连 bool 也不带感觉语义怎么看都不太合适……
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个条件并不复杂,请勿抽取变量,如果觉得不好阅读,加个注释即可。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
主要是update是返回值啊……不抽取变量的话,这个函数不就变成一个只返回error的函数了?
要不我把ErrIndexFileChanged交个上一层的putFileChunks进行生成?checkFileIfUpdate 只产生 os.Stat 的报错。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
你外面也没有用到这个返回值啊:
_, checkErr := repo.checkFileIfUpdate(file)
ErrIndexFileChanged 原本就是在 putFileChunks 里的……
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
嗯,那我就改成ErrIndexFileChanged在putFileChunks 里,不在check 函数里了
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
已修改
不能动态分块的
…---Original---
From: ***@***.***>
Date: Sun, Sep 15, 2024 09:14 AM
To: ***@***.***>;
Cc: ***@***.******@***.***>;
Subject: Re: [siyuan-note/dejavu] Refactor (PR #6)
@zxhd863943427 commented on this pull request.
In repo.go:
> @@ -57,6 +57,18 @@ type Repo struct { cloud cloud.Cloud // 云端存储服务 } +type ChunkSize struct {
这个位置我之后是打算看看能否根据文件大小切分不同的分片等级,因为当前以我自己的笔记数据为例,几乎所有笔记的数据量都在500k以内,大于500kb的基本都是资源文件,而笔记内容是经常变动的,资源文件基本是不变的,这导致分片功能实际上只能起到切分大资源文件、提高上传速度的能力,而对于笔记内容无法做到增量备份。为了达到这个目的,使用封装的数据结构,方便后续操作是有必要的
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.Message ID: ***@***.***>
|
这个怎么说?我观察来说,只要对于一批文件使用同一个固定的分块体积,那么就能有效的分块。 |
文件大小变化后分块大小不固定会有问题
…---Original---
From: ***@***.***>
Date: Sun, Sep 15, 2024 12:07 PM
To: ***@***.***>;
Cc: ***@***.******@***.***>;
Subject: Re: [siyuan-note/dejavu] Refactor (PR #6)
不能动态分块的
这个怎么说?我观察来说,只要对于一批文件使用同一个固定的分块体积,那么就能有效的分块。
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.Message ID: ***@***.***>
|
举个例子说明下?假设目前使用两个分块等级,0-500kb和500kb以上,一个文件之前为500kb的,那么会按照第一个等级分块,体积增加到500kb时就按照第二个等级分块,这会导致什么问题? |
已有分块会有变动
…---Original---
From: ***@***.***>
Date: Sun, Sep 15, 2024 12:36 PM
To: ***@***.***>;
Cc: ***@***.******@***.***>;
Subject: Re: [siyuan-note/dejavu] Refactor (PR #6)
文件大小变化后分块大小不固定会有问题
举个例子说明下?假设目前使用两个分块等级,0-500kb和500kb以上,一个文件之前为500kb的,那么会按照第一个等级分块,体积增加到500kb时就按照第二个等级分块,这会导致什么问题?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.Message ID: ***@***.***>
|
分块不是按照hash命名的吗?分块大小改变后,hash肯定会变吧,所以不太可能修改已有的块。 |
我指的是按照动态算法的话已有的分块就废了。
另外,之前的分块大小是按照比较符合笔记数据来配置的,没有必要动态配置,分块太多也不能提高性能,甚至可能降低性能,所以不打算改变了。
…---Original---
From: ***@***.***>
Date: Sun, Sep 15, 2024 13:22 PM
To: ***@***.***>;
Cc: ***@***.******@***.***>;
Subject: Re: [siyuan-note/dejavu] Refactor (PR #6)
分块不是按照hash命名的吗?分块大小改变后,hash肯定会变吧,所以不太可能修改已有的块。
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.Message ID: ***@***.***>
|
已有的分块不会受影响,因为旧有的文件大于500kb的分块方式与计划更改后的效果一致。 而小于500kb的文件原本是整个作为一个块,无论如何也不可能影响到这些分块的数据,因为读取方式没变,写入也不会修改到它们。
之前是直接使用的 restic 的默认分块大小,但是restic的分块方式与思源不一样,它是把整个备份数据当成一个连续的文件进行分块,而且它的备份频率也不高,这样的分块大小对于它是合理的。 而思源是每个文件单独进行分块,因此根据文件大小进行一定的等级分块更加合理。
目的也不是提高快照建立性能,而是减少磁盘占用和提高同步速度。不过后者确实会在小文件太多时反而速度降低。 |
等等,D 你的意思是重新建立快照会导致之前的小文件会被重新分块产生新的块? |
你测试一下看看吧,我没有测过。
最好是不要改,我记得之前也不是用的默认分块大小,而 8M 吧。
…---Original---
From: ***@***.***>
Date: Sun, Sep 15, 2024 14:32 PM
To: ***@***.***>;
Cc: ***@***.******@***.***>;
Subject: Re: [siyuan-note/dejavu] Refactor (PR #6)
等等,D 你的意思是重新建立快照会导致之前的小文件会被重新分块产生新的块?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.Message ID: ***@***.***>
|
抱歉,我记错了,用的是这个库的默认分块,但这个值之前我是以一些测试笔记数据作为参考后才没有手动设置其他值的的,所以这个值是比较合理的,不考虑变动了。 |
repo.go
Outdated
if file.Size != newSize || file.SecUpdated() != newUpdated { | ||
logging.LogErrorf("file changed [%s], size [%d -> %d], updated [%d -> %d]", absPath, file.Size, newSize, file.Updated, newUpdated) | ||
|
||
update = file.Size != newSize || file.SecUpdated() != newUpdated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个条件并不复杂,请勿抽取变量,如果觉得不好阅读,加个注释即可。
repo.go
Outdated
if file.Size != newSize || file.SecUpdated() != newUpdated { | ||
logging.LogErrorf("file changed [%s], size [%d -> %d], updated [%d -> %d]", absPath, file.Size, newSize, file.Updated, newUpdated) | ||
|
||
update = file.Size != newSize || file.SecUpdated() != newUpdated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
你外面也没有用到这个返回值啊:
_, checkErr := repo.checkFileIfUpdate(file)
ErrIndexFileChanged 原本就是在 putFileChunks 里的……
repo.go
Outdated
return | ||
} | ||
|
||
func (repo *Repo) createChunks(file *entity.File, minSize, maxSize uint) (chunks []*entity.Chunk, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
请勿再拆分函数了,原因:
- 没有函数复用
- 抽取函数后可读性并没有显著提升
- 在 getFileNewInfo 中重复调用 repo.absPath 函数了
所以结构上就不要调整了,没有太大必要性。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是指getFileNewInfo函数?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
还是说createChunks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
没有函数复用
这里的主要目的是提高代码的可读性。
对于可读性的提升我觉得是存在的,现在阅读putFileAndChunks主函数能很清晰的看出步骤的进行。假如不拆分,原函数真的太长了,加上注释也很难看懂意图。
在 getFileNewInfo 中重复调用 repo.absPath 函数了
这只是一个简单的字符串运算,相比可读性的提升我觉得是值得的。
不抽取函数可读性更好一些
…---Original---
From: ***@***.***>
Date: Fri, Sep 20, 2024 12:54 PM
To: ***@***.***>;
Cc: ***@***.******@***.***>;
Subject: Re: [siyuan-note/dejavu] Refactor (PR #6)
@zxhd863943427 commented on this pull request.
In repo.go:
> + return + } + + putFileErr := repo.store.PutFile(file) + + if nil != putFileErr { + err = putFileErr + return + } + + eventbus.Publish(eventbus.EvtIndexUpsertFile, context, count, total) + + return +} + +func (repo *Repo) createChunks(file *entity.File, minSize, maxSize uint) (chunks []*entity.Chunk, err error) {
没有函数复用
这里的主要目的是提高代码的可读性。
对于可读性的提升我觉得是存在的,现在阅读putFileAndChunks主函数能很清晰的看出步骤的进行。假如不拆分,原函数真的太长了,加上注释也很难看懂意图。
在 getFileNewInfo 中重复调用 repo.absPath 函数了
这只是一个简单的字符串运算,相比可读性的提升我觉得是值得的。
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.Message ID: ***@***.***>
|
absPath := repo.absPath(file.Path) | ||
chunks := make([]*entity.Chunk, 0) | ||
reader, err := filelock.OpenFile(absPath, os.O_RDONLY, 0644) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里请勿调整,按原来的实现。
对于 repo.go 中 putFileChunks 函数的代码进行了少量重构