Truncate record failing when attempting to recover broken record #10286
On the same database, we're observing a different persistent error (same cluster, different records):
@tglman - any chance of getting some pointers on this one?
So, going straight to how to recover: OrientDB stores the cluster data in two files. One is the ID file, which keeps the position where the data is stored plus some minimal metadata. The other is where the data itself is stored. The one that is failing is the one that keeps the data, so it should be possible to mark the record as deleted and create a "leak" in the data file.

If you are happy to hack a bit in the code, this: is the part that removes the data from the data file, so it is the part that is failing on you should be able to run a

Obviously, if you do this hack, use that code only to remove the failed records. Do not run it in production, because it will cause records to leak data. Even if you change the code to handle the previous exception without deleting the data, it may cause other issues, so it is better not to deploy the hack in production.

Going to the origin of the bug: this also feels like something I have already seen. Checking the history of the file in 3.2.x (https://github.com/orientechnologies/orientdb/commits/3.2.x/core/src/main/java/com/orientechnologies/orient/core/storage/cluster/v2/OPaginatedClusterV2.java), I do have a commit with a fix that is not present in 3.1.x, specifically 21c49f9.

By the way, is there anything holding you on 3.1.x? It may be a good idea to try switching to 3.2.x.

Bye
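The two-file layout and the "leak the data" idea can be pictured with a toy model. This is a minimal sketch under stated assumptions: the class, its methods and the in-memory "files" are hypothetical and do not mirror OrientDB's OPaginatedClusterV2. It only shows why a normal delete, which has to read the data file to reclaim space, fails on a broken record, while a leaking delete touches only the position map and leaves the data allocated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical names, not OrientDB classes) of a cluster split into
// a position map ("ID file") and a data file.
class TwoFileCluster {

    // One slot in the position map: where the record lives in the data file.
    static final class Slot {
        final int dataIndex;
        boolean deleted;
        Slot(int dataIndex) { this.dataIndex = dataIndex; }
    }

    private final Map<Long, Slot> positionMap = new HashMap<>();   // "ID file"
    private final List<byte[]> dataFile = new ArrayList<>();       // "data file"
    private final List<Boolean> dataReadable = new ArrayList<>();  // false = corrupted

    void create(long position, byte[] content) {
        dataFile.add(content);
        dataReadable.add(true);
        positionMap.put(position, new Slot(dataFile.size() - 1));
    }

    // Simulates an unclean shutdown leaving the record's data unreadable.
    void corrupt(long position) {
        dataReadable.set(positionMap.get(position).dataIndex, false);
    }

    // Normal delete: must read the data file to reclaim the record's space,
    // so it fails when that data is corrupted.
    void delete(long position) {
        Slot slot = positionMap.get(position);
        if (!dataReadable.get(slot.dataIndex)) {
            throw new IllegalStateException("cannot read data for position " + position);
        }
        dataFile.set(slot.dataIndex, null); // space reclaimed
        slot.deleted = true;
    }

    // "Leaking" delete: only marks the slot deleted in the position map and
    // never reads the data file, so the record's bytes stay allocated.
    void deleteLeakingData(long position) {
        positionMap.get(position).deleted = true;
    }

    boolean exists(long position) {
        Slot slot = positionMap.get(position);
        return slot != null && !slot.deleted;
    }

    // Data-file entries still allocated but no longer referenced by a live slot.
    int leakedEntries() {
        int leaked = 0;
        for (Slot slot : positionMap.values()) {
            if (slot.deleted && dataFile.get(slot.dataIndex) != null) {
                leaked++;
            }
        }
        return leaked;
    }

    public static void main(String[] args) {
        TwoFileCluster cluster = new TwoFileCluster();
        cluster.create(5773251L, new byte[] {1, 2, 3});
        cluster.create(5773252L, new byte[] {4, 5, 6});
        cluster.corrupt(5773252L);

        cluster.delete(5773251L); // healthy record: normal delete works
        try {
            cluster.delete(5773252L); // corrupted record: normal delete fails
        } catch (IllegalStateException e) {
            System.out.println("normal delete failed: " + e.getMessage());
        }
        cluster.deleteLeakingData(5773252L); // succeeds, but leaks the data
        System.out.println("exists=" + cluster.exists(5773252L)
                + " leaked=" + cluster.leakedEntries());
    }
}
```

Running main prints the failed normal delete and then `exists=false leaked=1`. The record is gone from the position map, but its bytes are never reclaimed. That is why such a change only makes sense as a one-off offline fix.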
Thanks @tglman

When you say do not run in production, I'm assuming you mean we could use this offline with a hacked OrientDB to fix these records, but not let that hacked delete/truncate code run in production, as it would cause other issues?

Would it make sense to make this a

Similarly, would this be something the check/repair database should be able to detect and rectify, or is the DB structure too broken at that point? The problem is that you can't do a complete database export with these broken records in place (thankfully the affected records in our case are generated, so we can do a partial export/import + regenerate in this case), so having a repair tool that could patch them up enough to get back to a workable state (e.g. that we could manually fix the graph on or run a

We're planning to move to 3.2.x, but we need to find time to stress test the distributed code in 3.2 (which has changed a fair amount since 3.1). Given we're running into a lot of things fixed in 3.2 now, we might look to bring that forward.
Hi,

Yes

Yes, but this requires some proper development time. Today the low-level code is shared between the normal delete and the truncate, so some work is needed to split the logic and make sure there is an option to "leak the data". And yes, as soon as a truncate is done it could be used in a

Anyway, as of today there is something that can already be used in the database export, that is

Bye
OrientDB Version: 3.1.21-SNAPSHOT (3.1.20 + our HA/replicated stability patches)
Java Version: 1.8
OS: Linux/arm64
We have a database in production with 9 broken records: #24:5773252 through #24:5773260.
We suspect they were created by an unclean shutdown of the database server.
When these records are accessed (e.g. a SELECT or a DELETE), they fail with a BufferUnderflowException.

Reading some other similar issues, the solution recommended in those is to use TRUNCATE RECORD; however, this is also failing.

Results for the first broken record:
Subsequent broken records all fail similarly, but with distinct indexes:
Short of getting a fix for this, are there any other steps (other than exporting/importing the data) that can be taken to recover this database?
(The records in question are derived, so simply removing them is acceptable in this situation).
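Because removing the records is acceptable, it can help to script the removal record by record. The range #24:5773252 through #24:5773260 is inclusive, so it covers 5773260 - 5773252 + 1 = 9 cluster positions, matching the count reported above. The sketch below enumerates those RIDs and builds one single-RID `TRUNCATE RECORD` statement per record. The RecordRange helper is hypothetical. It only builds strings and does not connect to OrientDB, and as reported here, the statement itself currently fails on these records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: enumerates RIDs in an inclusive range of cluster
// positions and builds one TRUNCATE RECORD statement per RID (strings only).
final class RecordRange {

    static List<String> rids(int clusterId, long firstPosition, long lastPosition) {
        if (lastPosition < firstPosition) {
            throw new IllegalArgumentException("empty range");
        }
        List<String> result = new ArrayList<>();
        for (long pos = firstPosition; pos <= lastPosition; pos++) {
            result.add("#" + clusterId + ":" + pos);
        }
        return result;
    }

    static List<String> truncateStatements(List<String> rids) {
        List<String> statements = new ArrayList<>();
        for (String rid : rids) {
            statements.add("TRUNCATE RECORD " + rid);
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> broken = rids(24, 5773252L, 5773260L);
        System.out.println(broken.size() + " records"); // 9 records
        truncateStatements(broken).forEach(System.out::println);
    }
}
```

Issuing one statement per RID, instead of one statement for the whole range, makes it easy to see exactly which records still fail.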
Unfortunately the data in the databases is confidential information, so we can't provide a copy for examination, but the engineering team supporting it is quite experienced with btree structures, so we can probably manage detailed file investigation/patching if we can get guidance on what is going on.
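For the kind of file-level investigation mentioned above, this is one way a BufferUnderflowException can arise. A record's stored length header promises more bytes than the page actually holds, for example after a partially written page. The 4-byte length plus payload layout below is a hypothetical toy format, not OrientDB's on-disk format. The sketch contrasts a reader that throws with a non-throwing check that could flag damaged chunks during a scan.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Toy format (hypothetical): [int length][length bytes of payload].
final class ChunkCheck {

    // Reads a length-prefixed chunk; throws BufferUnderflowException if the
    // page holds fewer bytes than the header (or the header itself) requires.
    static byte[] readChunk(ByteBuffer page) {
        int length = page.getInt();   // throws if fewer than 4 bytes remain
        byte[] payload = new byte[length];
        page.get(payload);            // throws if fewer than length bytes remain
        return payload;
    }

    // Same check without throwing, for scanning pages for damaged chunks.
    static boolean isIntact(ByteBuffer page) {
        ByteBuffer view = page.duplicate(); // leave the caller's position untouched
        if (view.remaining() < Integer.BYTES) {
            return false;
        }
        int length = view.getInt();
        return length >= 0 && view.remaining() >= length;
    }

    public static void main(String[] args) {
        ByteBuffer good = ByteBuffer.allocate(8).putInt(4).put(new byte[] {1, 2, 3, 4});
        good.flip();
        ByteBuffer truncated = ByteBuffer.allocate(6).putInt(4).put(new byte[] {1, 2});
        truncated.flip();

        System.out.println("good intact: " + isIntact(good));           // true
        System.out.println("truncated intact: " + isIntact(truncated)); // false
        try {
            readChunk(truncated);
        } catch (BufferUnderflowException e) {
            System.out.println("readChunk failed with BufferUnderflowException");
        }
    }
}
```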