
Data recovery #97

Open
qishenonly opened this issue Jun 13, 2023 · 12 comments

Labels: difficult:⭐⭐⭐ Up to five stars
@qishenonly
Member

If a data operation, such as data insertion or deletion, fails, you need to perform the operation again according to the operation log.

@qishenonly qishenonly added the difficult:⭐⭐⭐ Up to five stars label Jun 13, 2023
@halalala222
Contributor

Can you tell me more about this issue? Thanks!

@qishenonly
Member Author

When performing data operations such as insertion or deletion, a record should be written to the operation log. If an operation fails, the failed operation is read back from the log and re-executed. However, writing these operation logs has not yet been implemented.

@halalala222
Contributor

Below are some of my questions; I hope they can be answered. Thank you very much!

1. Does "resume the operation" mean performing the operation again? Is this about adding a retry mechanism for data operations? If so, should the number of retries be configurable?
2. Is the operation in engine/db.go the Delete operation, i.e. re-executing the db.Delete() method? Or is it the db.index.Delete(key) index operation that gets re-executed?
3. The operations are written to an operation log file, right?

@qishenonly
Member Author

Sure, here's my reply.

Q: Does "resume the operation" mean performing the operation again? Is it about adding a retry mechanism for data operations? If so, should the number of retries be configurable?

A: Yes, but we do not expose a retry mechanism for data operations in the configuration. I think the retry count should be a built-in DB setting rather than an exposed option.
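As a minimal sketch of that idea, a built-in retry policy could live inside the engine package; the file name, constants, values, and helper below are hypothetical, not existing FlyDB code:

// retry.go (hypothetical file in the engine package)
package engine

import "time"

const (
	// maxRetries is deliberately unexported: the retry count is a
	// DB-internal policy, not a user-facing configuration option.
	maxRetries   = 3
	retryBackoff = 50 * time.Millisecond
)

// withRetry re-runs op up to maxRetries times and returns the last error.
func withRetry(op func() error) error {
	var err error
	for i := 0; i < maxRetries; i++ {
		if err = op(); err == nil {
			return nil
		}
		time.Sleep(retryBackoff)
	}
	return err
}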

Q: Is the operation in engine/db.go the Delete operation, i.e. re-executing the db.Delete() method? Or is it the db.index.Delete(key) index operation that gets re-executed?

A: I'm not sure what you mean. Maybe you could be a little more specific.

Q: The operations are written to an operation log file, right?

A: Due to a period of lapsed maintenance, the documentation has not been kept up to date, and no functions related to operation logs exist yet. For now, you only need to add the import statement _ "github.com/ByteStorage/FlyDB/lib/logger" to engine/db.go to enable operation logging. Successful operations are recorded in db/engine/logs/runtime.log and failures in db/engine/logs/runtime_err.log. Then you just need to analyze the operations in the runtime_err.log file and re-execute them.
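For reference, that blank import would sit at the top of engine/db.go roughly like this; the comment paraphrases the behaviour described above rather than documenting a confirmed API:

package engine

import (
	// Imported only for its side effect: the package's init() configures the
	// global zap logger which, per the discussion above, records successful
	// operations in runtime.log and failures in runtime_err.log.
	_ "github.com/ByteStorage/FlyDB/lib/logger"
)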

I hope that solves your confusion.

@halalala222
Contributor

halalala222 commented Aug 7, 2024

Thank you for your answer! I had misunderstood some things earlier, but I have some new questions I hope you can help with. Thanks!

Q: In engine/db.go, the import statement _ "github.com/ByteStorage/FlyDB/lib/logger" only initializes the zap logger, and engine/db.go only uses zap.L().Info() to log at the info level.
For example, in the engine/db.go Put function:

	// append log record
	pos, err := db.appendLogRecordWithLock(logRecord)
	if err != nil {
		return err
	}

This returns an error without logging it; the error is simply passed back to the caller.

	// update index
	if ok := db.index.Put(key, pos); !ok {
		return _const.ErrIndexUpdateFailed
	}

db.index.Put does not return an err either; on failure it simply returns a constant error.
Do you mean that for both of the above failure scenarios, an error log should be recorded?

Regarding "analyze the operation in the runtime_err.log file and re-execute it": in such an asynchronous process, where the function has already returned its error to the caller, what is the purpose of the retry?

@qishenonly
Member Author

OK, here's my reply.

// update index
if ok := db.index.Put(key, pos); !ok {
	return _const.ErrIndexUpdateFailed
}

Take btree as an example.

func (bt *BTree) Put(key []byte, pst *data.LogRecordPst) bool {
	bt.lock.Lock()
	defer bt.lock.Unlock()
	it := &Item{key: key, pst: pst}

	bt.tree.ReplaceOrInsert(it)
	return true
}

A bool is returned for the CRUD operations of the index, while the actual persistence happens in the write to disk. Therefore, error handling for this part is unnecessary: the index is maintained in memory, and any errors there would most likely be due to insufficient memory.

For the Put function in engine/db.go, maybe you can do something like this:

// append log record
pos, err := db.appendLogRecordWithLock(logRecord)
if err != nil {
	zap.L().Error("put error", zap.Error(err), zap.ByteString("key", key), zap.ByteString("value", value))
	return err
}

Following this error-handling logic, let's analyze how to make sure data ends up properly inserted. When calling the appendLogRecordWithLock function, errors may arise for one of three reasons.

  1. setActiveDataFile function error: If there is no active data file, the function tries to create a new data file. If the file fails to be created, the function returns an error.
  2. Data file write error: The function will attempt to write log records to the active data file. If the write fails, for example due to insufficient disk space or file system permissions, the function returns an error.
  3. Data file synchronization error: If the user has enabled synchronous write, the function synchronizes the data file immediately after the data is written. If synchronization fails, for example due to a disk error or other I/O error, the function returns an error.

Therefore, you can record the location and cause of the error in runtime_err.log, analyze it, and then re-execute the operation.
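A rough sketch of that analyze-and-re-execute step, assuming the error entries are JSON lines produced by zap with the msg, key, and value fields from the Error call above; the errReplay name, the log format it parses, and the db.Put call are assumptions for illustration, not confirmed FlyDB APIs:

package engine

import (
	"bufio"
	"encoding/json"
	"os"
)

// errReplay scans an error log such as db/engine/logs/runtime_err.log,
// extracts failed put operations, and re-executes them through the normal
// Put path. It assumes each entry is a JSON line written by zap with the
// "msg", "key" and "value" fields of the zap.L().Error("put error", ...)
// call shown above.
func (db *DB) errReplay(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		var entry struct {
			Msg   string `json:"msg"`
			Key   string `json:"key"`
			Value string `json:"value"`
		}
		if err := json.Unmarshal(scanner.Bytes(), &entry); err != nil {
			continue // not a structured error entry; skip it
		}
		if entry.Msg == "put error" {
			if err := db.Put([]byte(entry.Key), []byte(entry.Value)); err != nil {
				return err
			}
		}
	}
	return scanner.Err()
}

Whether such a replay runs as an explicit recovery call or automatically at startup is a design choice this thread leaves open.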

This is merely to offer you a line of thought. It might not be the best direction, as the implementation could be somewhat troublesome.

@halalala222
Contributor

Thank you for your answer!
Do you mean that instead of our program automatically parsing the error log, it is the user who analyzes the error through this error log and then re-executes the operation? I had mistakenly assumed that the program automatically parses the error log to retry, which is why I did not understand the point of the program automatically reading the error log to retry.

@qishenonly
Member Author

In general, there are two approaches to log recovery. One involves analyzing the error log and then re-executing the failed operations, while the other involves reading the success log to recover all of the data.
Operation logs are fixed inside the program and are not exposed to the user; you only need to provide an API interface.

@halalala222
Contributor

Thank you for your very patient reply. Is it correct to understand that this feature mainly involves logging data operations within the program and providing a log-based API for operation retries and recovery for users to call? However, the current zap log configuration does not include log rotation or cleanup, so the runtime log holds the full history. For the API mentioned above, I think it is necessary to accept a time range so that it retries or recovers operations within that specified timeframe.

@qishenonly
Member Author

Your understanding is accurate. It is essential to provide an API with a specified time range to avoid significantly increasing system overhead. However, in order to maintain data consistency, data recovery must not leave the system in an inconsistent state.

Additionally, full recovery functionality should also be available. Full recovery is preferable to recovery within a specified time range when dealing with small log files or infrequent recovery operations.

For example:

Restore() error
RestoreByRange(start, end *time.Time) error
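In Go, those two entry points could be grouped into an interface along these lines; the interface name and doc comments are mine, only the two method signatures come from the example above:

package engine

import "time"

// Recoverer is a hypothetical name for the recovery API sketched above.
// Restore replays every operation recorded in the operation log, while
// RestoreByRange limits the replay to entries whose timestamps fall
// between start and end, keeping recovery overhead bounded.
type Recoverer interface {
	Restore() error
	RestoreByRange(start, end *time.Time) error
}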

@halalala222
Contributor

Thank you for your patient response!!!!!! I will try to work on this issue!

@qishenonly
Member Author

OK, give it a try.
