
Data recovery #97

Open
qishenonly opened this issue Jun 13, 2023 · 12 comments

Labels: difficult:⭐⭐⭐ Up to five stars
@qishenonly
Member

If a data operation, such as data insertion or deletion, fails, you need to perform the operation again according to the operation log.

@qishenonly qishenonly added the difficult:⭐⭐⭐ Up to five stars label Jun 13, 2023
@halalala222
Contributor

Can you tell me more about this issue? Thanks!

@qishenonly
Member Author

When performing data operations such as insertion or deletion, a record should be written to the operation log. If an operation fails, the failed operation is read back from the log and re-executed. However, writing these operation logs has not yet been implemented.

@halalala222
Contributor

Below are some of my questions; I hope they can be answered. Thank you very much!

1. Does "resume the operation" mean performing the operation again? Is this about adding a retry mechanism for data operations? If so, should the number of retries be configurable?
2. Is the operation in engine/db.go the Delete operation, i.e. re-executing the db.Delete() method? Or is it the db.index.Delete(key) index operation that gets re-executed?
3. The operations are written to an operation log file, right?

@qishenonly
Member Author

Sure, here's my reply.

Q: Does "resume the operation" mean performing the operation again? Is it about adding a retry mechanism for data operations? If so, should the number of retries be configurable?

A: Yes, but we do not expose a retry mechanism for data operations in the configuration. I think the retry count should be a built-in DB setting rather than an exposed option.
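As a minimal sketch of that idea, a built-in retry policy could live inside the engine package; the file name, constants, values, and helper below are hypothetical, not existing FlyDB code:

// retry.go (hypothetical file in the engine package)
package engine

import "time"

const (
	// maxRetries is deliberately unexported: the retry count is a
	// DB-internal policy, not a user-facing configuration option.
	maxRetries   = 3
	retryBackoff = 50 * time.Millisecond
)

// withRetry re-runs op up to maxRetries times and returns the last error.
func withRetry(op func() error) error {
	var err error
	for i := 0; i < maxRetries; i++ {
		if err = op(); err == nil {
			return nil
		}
		time.Sleep(retryBackoff)
	}
	return err
}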

Q: Is the operation in engine/db.go the Delete operation, i.e. re-executing the db.Delete() method? Or is it the db.index.Delete(key) index operation that gets re-executed?

A: I'm not sure what you mean. Maybe you could be a little more specific.

Q: The operations are written to an operation log file, right?

A: Due to a period of lapsed maintenance, the documentation has not been kept up to date, and no functions related to operation logs exist yet. For now, you only need to add the import statement _ "github.com/ByteStorage/FlyDB/lib/logger" to engine/db.go to enable operation logging. Successful operations are recorded in db/engine/logs/runtime.log and failures in db/engine/logs/runtime_err.log. Then you just need to analyze the operations in the runtime_err.log file and re-execute them.
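For reference, that blank import would sit at the top of engine/db.go roughly like this; the comment paraphrases the behaviour described above rather than documenting a confirmed API:

package engine

import (
	// Imported only for its side effect: the package's init() configures the
	// global zap logger which, per the discussion above, records successful
	// operations in runtime.log and failures in runtime_err.log.
	_ "github.com/ByteStorage/FlyDB/lib/logger"
)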

I hope that solves your confusion.

@halalala222
Contributor

halalala222 commented Aug 7, 2024

Thank you for your answer! I had misunderstood some things earlier, but I have some new questions I hope you can help with. Thanks!

Q: In engine/db.go, the import statement _ "github.com/ByteStorage/FlyDB/lib/logger" only initializes the zap logger, and engine/db.go only uses zap.L().Info() to log at the info level.
For example, in the engine/db.go Put function:

	// append log record
	pos, err := db.appendLogRecordWithLock(logRecord)
	if err != nil {
		return err
	}

This returns an error without logging it; the error is simply passed back to the caller.

	// update index
	if ok := db.index.Put(key, pos); !ok {
		return _const.ErrIndexUpdateFailed
	}

db.index.Put does not return an err either; on failure it simply returns a constant error.
Do you mean that for both of the above failure scenarios, an error log should be recorded?

Regarding "analyze the operation in the runtime_err.log file and re-execute it": in such an asynchronous process, where the function has already returned its error to the caller, what is the purpose of the retry?

@qishenonly
Member Author

OK, here's my reply.

// update index
if ok := db.index.Put(key, pos); !ok {
	return _const.ErrIndexUpdateFailed
}

Take btree as an example.

func (bt *BTree) Put(key []byte, pst *data.LogRecordPst) bool {
	bt.lock.Lock()
	defer bt.lock.Unlock()
	it := &Item{key: key, pst: pst}

	bt.tree.ReplaceOrInsert(it)
	return true
}

A bool is returned for the CRUD operations of the index, while the actual persistence happens in the write to disk. Therefore, error handling for this part is unnecessary: the index is maintained in memory, and any errors there would most likely be due to insufficient memory.

For the Put function in engine/db.go, maybe you can do something like this:

// append log record
pos, err := db.appendLogRecordWithLock(logRecord)
if err != nil {
	zap.L().Error("put error", zap.Error(err), zap.ByteString("key", key), zap.ByteString("value", value))
	return err
}

Following this error-handling logic, let's analyze how to make sure data ends up properly inserted. When calling the appendLogRecordWithLock function, errors may arise for one of three reasons.

  1. setActiveDataFile function error: If there is no active data file, the function tries to create a new data file. If the file fails to be created, the function returns an error.
  2. Data file write error: The function will attempt to write log records to the active data file. If the write fails, for example due to insufficient disk space or file system permissions, the function returns an error.
  3. Data file synchronization error: If the user has enabled synchronous write, the function synchronizes the data file immediately after the data is written. If synchronization fails, for example due to a disk error or other I/O error, the function returns an error.

Therefore, you can record the location and cause of the error in runtime_err.log, analyze it, and then re-execute the operation.
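A rough sketch of that analyze-and-re-execute step, assuming the error entries are JSON lines produced by zap with the msg, key, and value fields from the Error call above; the errReplay name, the log format it parses, and the db.Put call are assumptions for illustration, not confirmed FlyDB APIs:

package engine

import (
	"bufio"
	"encoding/json"
	"os"
)

// errReplay scans an error log such as db/engine/logs/runtime_err.log,
// extracts failed put operations, and re-executes them through the normal
// Put path. It assumes each entry is a JSON line written by zap with the
// "msg", "key" and "value" fields of the zap.L().Error("put error", ...)
// call shown above.
func (db *DB) errReplay(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		var entry struct {
			Msg   string `json:"msg"`
			Key   string `json:"key"`
			Value string `json:"value"`
		}
		if err := json.Unmarshal(scanner.Bytes(), &entry); err != nil {
			continue // not a structured error entry; skip it
		}
		if entry.Msg == "put error" {
			if err := db.Put([]byte(entry.Key), []byte(entry.Value)); err != nil {
				return err
			}
		}
	}
	return scanner.Err()
}

Whether such a replay runs as an explicit recovery call or automatically at startup is a design choice this thread leaves open.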

This is merely to offer you a line of thought. It might not be the best direction, as the implementation could be somewhat troublesome.

@halalala222
Contributor

Thank you for your answer!
Do you mean that instead of our program automatically parsing the error log, it is the user who analyzes the error through this error log and then re-executes the operation? I had mistakenly assumed that the program automatically parses the error log to retry, which is why I did not understand the point of the program automatically reading the error log to retry.

@qishenonly
Member Author

In general, there are two approaches to log recovery. One involves analyzing the error log and then re-executing the failed operations, while the other involves reading the success log to recover all of the data.
Operation logs are fixed inside the program and are not exposed to the user; you only need to provide an API interface.

@halalala222
Contributor

Thank you for your very patient reply. Is it correct to understand that this feature mainly involves logging data operations within the program and providing a log-based API for operation retries and recovery for users to call? However, the current zap log configuration does not include log rotation or cleanup, so the runtime log holds the full history. For the API mentioned above, I think it is necessary to accept a time range so that it retries or recovers operations within that specified timeframe.

@qishenonly
Member Author

Your understanding is accurate. It is essential to provide an API with a specified time range to avoid significantly increasing system overhead. However, in order to maintain data consistency, data recovery must not leave the system in an inconsistent state.

Additionally, full recovery functionality should also be available. Full recovery is preferable to recovery within a specified time range when dealing with small log files or infrequent recovery operations.

For example:

Restore() error
RestoreByRange(start, end *time.Time) error
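In Go, those two entry points could be grouped into an interface along these lines; the interface name and doc comments are mine, only the two method signatures come from the example above:

package engine

import "time"

// Recoverer is a hypothetical name for the recovery API sketched above.
// Restore replays every operation recorded in the operation log, while
// RestoreByRange limits the replay to entries whose timestamps fall
// between start and end, keeping recovery overhead bounded.
type Recoverer interface {
	Restore() error
	RestoreByRange(start, end *time.Time) error
}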

@halalala222
Contributor

Thank you for your patient response!!!!!! I will try to work on this issue!

@qishenonly
Member Author

OK, give it a try.
