Replication can miss documents #56
Comments
I'm also getting this in the log during intermittent connection loss
|
No other errors have occurred, but still the design document in question is not synced over. (Other design documents succeeded though). |
Looking at _all_docs reveals this difference
among all the details. I'm a bit surprised by this; I thought that replication should guarantee me the same document count, as it counts only the documents that are current and accessible (i.e. excludes deleted items)? Running the replication again after that seems to have worked. Still, the replicator finished thinking that it had succeeded, as far as I can tell. |
And again - this time on the document I would have expected it on:
"Unable to save checkpoint" messages were coming soon after. |
And indeed this leads to a database that is not complete (missing about 500 documents) where further replication attempts will not get any additional documents. :-( As far as I understand, this is triggered by the out of memory error during the replication - the design document in question here was 2.9 megabytes. As this sometimes succeeds, this could indicate that the rest of the replication is slowly leaking memory. |
Looking at a memory profiler, it looks like Jackson may be used incorrectly, so that it tries to accumulate every replicated object in memory somewhere. This is what the Eclipse memory tool spits out as the top suspect: The thread java.lang.Thread @ 0x405567b0 ChangeTracker-http://redacted keeps local variables with total size 17.969.736 (87,19%) bytes. The memory is accumulated in one instance of "org.codehaus.jackson.map.util.ObjectBuffer" loaded by "dalvik.system.PathClassLoader @ 0x40514290". Keywords |
Two things I think need fixing here: a) The database absolutely has to protect itself against becoming inconsistent due to memory errors - any errors really. |
Does that design document contain any attachments? There are known issues replicating with attachments right now. Also are you using the latest code? I've successfully pulled a database with 170k revisions without running out of memory (it ran for 2 hours). |
Also, to clarify, the tombstones of deleted documents must also be replicated, which is why you saw the deleted documents being replicated. |
Yes, there is a couch app on that design document, though none of those files are bigger than several kilobytes. Yes, I am using the latest code. I did see these issues on Android 2.2 (the oldest supported version) - maybe that might be the reason? On Thursday I will be trying emulators with newer versions of Android to see if that narrows down the problem - I would be more than glad though if you were able to pinpoint the problem already. I did try to debug my way down to Jackson and have to admit that my Android debugging foo is not up to that task yet. :-( Oh, and thanks for the heads up on the deleted docs. |
OK, the attachments are at least part of the problem. The patch I'm working on now should help a bit. The problem is that we are transferring attachments in-line with the document (encoded as base64). This means that the total size of the JSON we parse is the size of the design document plus all of the attachments after base64 encoding (roughly 33% larger). With the patch I'm working on now, a document and its attachments are transferred as a MIME multi-part message, and each of the attachments can be streamed directly to disk, never going through the JSON parser. |
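To put a number on the "roughly 33% larger" figure, here is a minimal sketch that computes the base64 size of an inline attachment. The class name and the 1 MiB sample size are made up for illustration, and it uses `java.util.Base64` from a desktop JVM (Java 8+), not whatever encoder TouchDB-Android uses on the device.

```java
import java.util.Base64;

final class InlineAttachmentOverhead {
    // Length of standard, padded base64 output for rawBytes input bytes:
    // every group of up to 3 input bytes becomes 4 output characters.
    static long base64Length(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        // Hypothetical attachment size, chosen only for illustration.
        byte[] raw = new byte[1024 * 1024];
        String encoded = Base64.getEncoder().encodeToString(raw);

        System.out.println("raw bytes:     " + raw.length);
        System.out.println("base64 chars:  " + encoded.length());
        System.out.println("predicted:     " + base64Length(raw.length));
        System.out.println("growth factor: " + (double) encoded.length() / raw.length);
    }
}
```

With inline attachments the parser has to hold all of that encoded text (and then the decoded bytes) for the whole document at once, which is what streaming each MIME part to disk avoids.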
Out of curiosity, is my observation correct that the JSON parser keeps a reference to each parsed object? (At least on Android 2.2.) Also, Android 2.2 seems to have a lot stricter memory limits than later versions (4.0 for example). |
I've retested replication on a virtual 4.0 device and the memory problem seems not to exist there / seems to be a lot less severe there. In any case, while the 2.2 emulated device was never able to finish a replication, the virtual 4.0 device is. Not sure if this indicates that this might be more of an Android versioning issue. |
I also seem to be experiencing this problem - a number of documents that have attachments do not show up in the replicated database (mobile => couchdb server), even though the replication thinks it's complete. Is there anything I should do to verify that this is the same issue, and not a new one? I'm experiencing this on Ice Cream Sandwich. |
There are several problems with the replicator right now. It will deadlock and/or run out of memory in lots of cases, and for some reason doesn't resume correctly upon restart. If you can identify whether there are any particular documents that never replicate, that would be useful to know. Or if it's different documents each time, that might also be useful. Finally, if you have errors in the log, that would help. |
I created a new remote database and turned off the Pull replication, thus trying to push all documents to the remote. The replicator started up correctly, I see a long list of memory allocation Grow_heap (frag case) calls in the logs, and then after maybe a minute the app crashes with an out of memory exception. At the time of the crash, no documents have been propagated to this new remote db. Here are the salient log entries: https://gist.github.com/3305565 The docs that don't replicate are identical in structure to the docs that do - basically some descriptive info fields (text) and then 2 attachments - a thumbnail, and a medium quality resize of a photograph. Basically this: medium.jpg 376.2 KB, image/jpeg From 2 different devices I'm seeing what appears to be a random subset of documents that don't replicate. |
If the documents have multiple attachments larger than 16k bytes, this could be the same as this issue I just worked around for iOS: |
Currently we don't send attachments with MIME multipart (this patch is in progress). Once we implement that feature, we may run into the issue Jens is referring to. I think we can work around it by using a feature of the Jackson JSON parser to force ordering of the attachments map. Back to this issue, the log shows an out of memory error occurring when it is trying to base64 encode one of the attachments. It was trying to allocate around 1 megabyte at the time it failed. It seems as though you already had an absurdly high amount of memory allocated though (over 100MB). By any chance did you have the debugger attached when you were running this replication? Previously we used a new thread for each request. This was much worse for pull replications, but in long-running push replications it can be a problem as well. If not, there may be some other leak... |
I am facing issues in both PUSH and PULL replication. I am testing with low connectivity and on a limited phone config. I think the issue is stemming from the list of doc changes we keep in memory. (I get quite a few OutOfMemoryExceptions.) Additionally, I have noticed that in PULL replication, the thread removes the entry from the inbox and kicks off an AsyncTask. In the event that the thread/AsyncTask fails, there is no update to the Inbox. Is my understanding correct? Do you think it will help if we write the list of documents into the DB and fire threads based on the DB? |
As for the question about how attachments are pulled/pushed: The good news is, that this has been fixed in CouchDB master branch. Since the CouchDB Version 1.2.1 was tagged 09.01.2013 this should be |
Well, it didn't prevent using multipart, but it did make it more difficult. When receiving MIME responses, TouchDB/iOS computes a digest of each MIME body and uses those to figure out which attachment goes with which metadata object. On the sending side, it uses a custom JSON encoder that writes out the _attachments dictionary keys in known (sorted) order so that it can write the MIME bodies in the same order. |
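As a rough illustration of the two techniques described above (sorted attachment keys on the sending side, digest matching on the receiving side), here is a minimal Java sketch. It is not TouchDB code: the class and method names are hypothetical, and the `sha1-<base64>` digest string is an assumed format used only to make the example concrete.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AttachmentMatching {
    // Digest of a MIME body. The "sha1-" + base64 form is an assumption for this sketch.
    static String digest(byte[] body) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            return "sha1-" + Base64.getEncoder().encodeToString(sha1.digest(body));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sending side: emit attachment names in sorted order, so the JSON keys
    // and the MIME bodies can both be written in the same known order.
    static List<String> sendOrder(Map<String, byte[]> attachments) {
        return new ArrayList<>(new TreeMap<>(attachments).keySet());
    }

    // Receiving side: pair each attachment name with its body by comparing the
    // digest from the metadata against the digest computed over each body,
    // regardless of the order in which the bodies arrived.
    static Map<String, byte[]> matchByDigest(Map<String, String> digestByName,
                                             List<byte[]> bodies) {
        Map<String, byte[]> bodyByDigest = new HashMap<>();
        for (byte[] body : bodies) {
            bodyByDigest.put(digest(body), body);
        }
        Map<String, byte[]> matched = new TreeMap<>();
        for (Map.Entry<String, String> entry : digestByName.entrySet()) {
            byte[] body = bodyByDigest.get(entry.getValue());
            if (body != null) {
                matched.put(entry.getKey(), body);
            }
        }
        return matched;
    }
}
```

Digest matching does not depend on the order of the bodies, which is why it copes with a server that reorders the `_attachments` keys; the sorted-key writer only guarantees order for what the client itself sends.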
I see. Though a bit cumbersome this approach also works with earlier CouchDB versions. |
I think it's the same as whatever TouchDB-iOS was doing last summer, or whenever Marty stopped having time to port my commits :) |
I am not talking about attachments. I have a more basic question on the Pull/Push mechanism in TouchDB-Android: we remove the revision from the ArrayList after launching a request, without checking whether the request completed successfully or not.
And then later:
Thus if there are 10 changes in the inbox, suppose the 9th fails but the 10th succeeds (maybe due to a dip in internet, or a timeout, or whatever), then seq gets updated to the latest and there is no record of the loss of the 9th change. Is my understanding correct? |
The current iOS TDPuller is good about keeping track of this. It uses a helper class called a TDSequenceMap to keep track of which remote sequences are pending vs. which have been successfully copied. The sequence map can tell it what remote sequence to checkpoint. I looked through the iOS TDPusher code and it's less robust. I think it hasn't had as much attention because it's not used as heavily (a lot of apps rely more on pull) and because most of the time it has an easier job since it can send lots of revisions in one Update: Filed https://github.com/couchbaselabs/TouchDB-iOS/issues/246 on this. |
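To make the idea concrete, here is a minimal Java sketch of a tracker in the spirit of the sequence map described above. It is not a port of TDSequenceMap; the class name `SequenceTracker` and its methods are hypothetical. The point is that the checkpoint only advances up to the last sequence before the earliest one still pending, so a failed 9th change keeps the checkpoint at the 8th even if the 10th has succeeded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class SequenceTracker {
    // Remote sequence IDs in arrival order; index i holds local number i + 1.
    private final List<String> remoteSeqs = new ArrayList<>();
    // Local numbers of changes that have not yet been stored successfully.
    private final TreeSet<Integer> pending = new TreeSet<>();

    // Register a change as soon as it is read from the changes feed.
    synchronized int add(String remoteSeq) {
        remoteSeqs.add(remoteSeq);
        int local = remoteSeqs.size();
        pending.add(local);
        return local;
    }

    // Call only after the revision has actually been saved locally.
    synchronized void markDone(int local) {
        pending.remove(local);
    }

    // Remote sequence that is safe to checkpoint: the one just before the
    // earliest pending change, or null if nothing is safe yet.
    synchronized String checkpoint() {
        int lastSafe = pending.isEmpty() ? remoteSeqs.size() : pending.first() - 1;
        return lastSafe == 0 ? null : remoteSeqs.get(lastSafe - 1);
    }
}
```

A failed request would simply never call `markDone`, so the next replication resumes from a sequence that still includes the missing change. This sketch keeps every sequence in memory for simplicity; a real implementation would need to prune completed entries.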
We should link to the test that proves this is fixed. I like what @dwt said: Two things I think need fixing here: a) The database absolutely has to protect itself against becoming inconsistent due to memory errors - any errors really. |
Hi there,
I'm currently investigating an issue where the replication succeeds (and TouchDB tells me that it's done), but has missed replicating some documents in the process (most impressively, it missed one of several design documents).
The app did crash repeatedly during replication with, for example, out of memory errors, and showed several 409s while trying to save the checkpoint.
I'm still trying to reproduce this in more detail, so I'll be adding log fragments that might help here:
The returned document is
Which doesn't make much sense, as it was my understanding that deleted documents are not replicated?