This repository has been archived by the owner on Mar 9, 2022. It is now read-only.

Replication can miss documents #56

Open
dwt opened this issue Jul 20, 2012 · 26 comments

@dwt
Contributor

dwt commented Jul 20, 2012

Hi there,

I'm currently investigating an issue where replication succeeds (TouchDB tells me that it's done) but has missed some documents in the process (most impressively, it missed one of several design documents).

The app crashed repeatedly during replication, for example with out-of-memory errors, and showed several 409s while trying to save the checkpoint.

I'm still trying to reproduce this in more detail, so I'll be adding log fragments that might help here:


I/dalvikvm( 4757): "RemoteRequest-http://redacted/47dfd65767aa49e479980f50feeb9713?rev=2-2ffeb6d48ce2618edd98dfba6cfcf679&revs=true&attachments=true" prio=5 tid=25 RUNNABLE
I/dalvikvm( 4757):   | group="main" sCount=0 dsCount=0 obj=0x4112f038 self=0x3787a8
I/dalvikvm( 4757):   | sysTid=5175 nice=0 sched=0/0 cgrp=default handle=3583632
I/dalvikvm( 4757):   at org.apache.http.impl.io.AbstractSessionInputBuffer.init(AbstractSessionInputBuffer.java:~79)
I/dalvikvm( 4757):   at org.apache.http.impl.io.SocketInputBuffer.<init>(SocketInputBuffer.java:93)
I/dalvikvm( 4757):   at org.apache.http.impl.SocketHttpClientConnection.createSessionInputBuffer(SocketHttpClientConnection.java:83)
I/dalvikvm( 4757):   at org.apache.http.impl.conn.DefaultClientConnection.createSessionInputBuffer(DefaultClientConnection.java:170)
I/dalvikvm( 4757):   at org.apache.http.impl.SocketHttpClientConnection.bind(SocketHttpClientConnection.java:106)
I/dalvikvm( 4757):   at org.apache.http.impl.conn.DefaultClientConnection.openCompleted(DefaultClientConnection.java:129)
I/dalvikvm( 4757):   at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:173)
I/dalvikvm( 4757):   at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:164)
I/dalvikvm( 4757):   at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:119)
I/dalvikvm( 4757):   at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:359)
I/dalvikvm( 4757):   at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:555)
I/dalvikvm( 4757):   at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:487)
I/dalvikvm( 4757):   at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:465)
I/dalvikvm( 4757):   at com.couchbase.touchdb.support.TDRemoteRequest.run(TDRemoteRequest.java:133)
I/dalvikvm( 4757):   at java.lang.Thread.run(Thread.java:1019)
I/dalvikvm( 4757): 

The returned document is

{
  "_id":"47dfd65767aa49e479980f50feeb9713",
  "_rev":"2-2ffeb6d48ce2618edd98dfba6cfcf679",
  "_deleted":true,
  "_revisions":{"start":2,"ids":["2ffeb6d48ce2618edd98dfba6cfcf679","87b4287484f4794f5e0124a4979d7438"]}
}

Which doesn't make much sense, as it was my understanding that deleted documents are not replicated?

@dwt
Contributor Author

dwt commented Jul 20, 2012

I'm also getting this in the log during intermittent connection loss


07-20 13:51:37.210: V/TDDatabase(5886): com.couchbase.touchdb.replicator.TDReplicator$5@40cc0898: Unable to save remote checkpoint
07-20 13:51:37.210: V/TDDatabase(5886): org.apache.http.conn.HttpHostConnectException: Connection to http://db.insideguidance.com refused
07-20 13:51:37.210: V/TDDatabase(5886):     at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:178)
07-20 13:51:37.210: V/TDDatabase(5886):     at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:164)
07-20 13:51:37.210: V/TDDatabase(5886):     at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:119)
07-20 13:51:37.210: V/TDDatabase(5886):     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:359)
07-20 13:51:37.210: V/TDDatabase(5886):     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:555)
07-20 13:51:37.210: V/TDDatabase(5886):     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:487)
07-20 13:51:37.210: V/TDDatabase(5886):     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:465)
07-20 13:51:37.210: V/TDDatabase(5886):     at com.couchbase.touchdb.support.TDRemoteRequest.run(TDRemoteRequest.java:133)
07-20 13:51:37.210: V/TDDatabase(5886):     at java.lang.Thread.run(Thread.java:1019)
07-20 13:51:37.210: V/TDDatabase(5886): Caused by: java.net.ConnectException: /213.73.99.18:80 - Network is unreachable
07-20 13:51:37.210: V/TDDatabase(5886):     at org.apache.harmony.luni.net.PlainSocketImpl.connect(PlainSocketImpl.java:207)
07-20 13:51:37.210: V/TDDatabase(5886):     at org.apache.harmony.luni.net.PlainSocketImpl.connect(PlainSocketImpl.java:437)
07-20 13:51:37.210: V/TDDatabase(5886):     at java.net.Socket.connect(Socket.java:983)
07-20 13:51:37.210: V/TDDatabase(5886):     at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:119)
07-20 13:51:37.210: V/TDDatabase(5886):     at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:143)
07-20 13:51:37.210: V/TDDatabase(5886):     ... 8 more

@dwt
Contributor Author

dwt commented Jul 20, 2012

No other errors have occurred, but still the design document in question is not synced over. (Other design documents succeeded though).

@dwt
Contributor Author

dwt commented Jul 20, 2012

Looking at _all_docs reveals this difference


 {
+    "total_rows": 8549,
     "offset": 0,
-    "total_rows": 7672,
     "rows": [
         {

among all the details. I'm a bit surprised by this: I thought that replication should guarantee me the same document count, as it counts only the documents that are current and accessible (i.e. excludes deleted items)?

Running the replication again after that seems to have worked - still the replicator finished thinking that it had succeeded as far as I can tell.

@dwt
Contributor Author

dwt commented Jul 20, 2012

And again - this time on the document I would have expected it on:

E/dalvikvm-heap(24307): Out of memory on a 524304-byte allocation.
I/dalvikvm(24307): "RemoteRequest-http://redacted/_design%2Fcatalog?rev=6-d8b24a62b63d2c5320aa0647aad6d6ed&revs=true&attachments=true" prio=5 tid=17 RUNNABLE
I/dalvikvm(24307):   | group="main" sCount=0 dsCount=0 obj=0x40f10be0 self=0x3b2420
I/dalvikvm(24307):   | sysTid=24472 nice=0 sched=0/0 cgrp=default handle=3606688
I/dalvikvm(24307):   at org.codehaus.jackson.util.TextBuffer._charArray(TextBuffer.java:~705)
I/dalvikvm(24307):   at org.codehaus.jackson.util.TextBuffer.finishCurrentSegment(TextBuffer.java:573)
I/dalvikvm(24307):   at org.codehaus.jackson.impl.Utf8StreamParser._finishString2(Utf8StreamParser.java:1922)
I/dalvikvm(24307):   at org.codehaus.jackson.impl.Utf8StreamParser._finishString(Utf8StreamParser.java:1899)
I/dalvikvm(24307):   at org.codehaus.jackson.impl.Utf8StreamParser.getText(Utf8StreamParser.java:276)
I/dalvikvm(24307):   at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:59)
I/dalvikvm(24307):   at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:218)
I/dalvikvm(24307):   at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
I/dalvikvm(24307):   at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:218)
I/dalvikvm(24307):   at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
I/dalvikvm(24307):   at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:218)
I/dalvikvm(24307):   at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
I/dalvikvm(24307):   at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2723)
I/dalvikvm(24307):   at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1900)
I/dalvikvm(24307):   at com.couchbase.touchdb.support.TDRemoteRequest.run(TDRemoteRequest.java:145)
I/dalvikvm(24307):   at java.lang.Thread.run(Thread.java:1019)
I/dalvikvm(24307): 
W/dalvikvm(24307): threadid=17: thread exiting with uncaught exception (group=0x40015560)
E/AndroidRuntime(24307): FATAL EXCEPTION: RemoteRequest-http://redacted/_design%2Fcatalog?rev=6-d8b24a62b63d2c5320aa0647aad6d6ed&revs=true&attachments=true
E/AndroidRuntime(24307): java.lang.OutOfMemoryError
E/AndroidRuntime(24307):    at org.codehaus.jackson.util.TextBuffer._charArray(TextBuffer.java:705)
E/AndroidRuntime(24307):    at org.codehaus.jackson.util.TextBuffer.finishCurrentSegment(TextBuffer.java:573)
E/AndroidRuntime(24307):    at org.codehaus.jackson.impl.Utf8StreamParser._finishString2(Utf8StreamParser.java:1922)
E/AndroidRuntime(24307):    at org.codehaus.jackson.impl.Utf8StreamParser._finishString(Utf8StreamParser.java:1899)
E/AndroidRuntime(24307):    at org.codehaus.jackson.impl.Utf8StreamParser.getText(Utf8StreamParser.java:276)
E/AndroidRuntime(24307):    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:59)
E/AndroidRuntime(24307):    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:218)
E/AndroidRuntime(24307):    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
E/AndroidRuntime(24307):    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:218)
E/AndroidRuntime(24307):    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
E/AndroidRuntime(24307):    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:218)
E/AndroidRuntime(24307):    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
E/AndroidRuntime(24307):    at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2723)
E/AndroidRuntime(24307):    at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1900)
E/AndroidRuntime(24307):    at com.couchbase.touchdb.support.TDRemoteRequest.run(TDRemoteRequest.java:145)
E/AndroidRuntime(24307):    at java.lang.Thread.run(Thread.java:1019)
W/ActivityManager(  109):   Force finishing activity com.insideguidance/.android.InsideMobileActivity
D/dalvikvm(24307): GC_FOR_MALLOC freed 12815K, 51% free 15046K/30663K, external 2031K/2137K, paused 254ms
I/TDDatabase(24307): TDPuller[http://redacted] inserting 33 revisions...
I/InsideMobile(24307): Waiting for replicator to finish
D/dalvikvm(24307): GC_FOR_MALLOC freed 2146K, 52% free 14913K/30663K, external 2031K/2137K, paused 127ms
E/WindowManager(24307): Activity com.insideguidance.android.InsideMobileActivity has leaked window com.android.internal.policy.impl.PhoneWindow$DecorView@40525c68 that was originally added here
E/WindowManager(24307): android.view.WindowLeaked: Activity com.insideguidance.android.InsideMobileActivity has leaked window com.android.internal.policy.impl.PhoneWindow$DecorView@40525c68 that was originally added here
E/WindowManager(24307):     at android.view.ViewRoot.<init>(ViewRoot.java:258)
E/WindowManager(24307):     at android.view.WindowManagerImpl.addView(WindowManagerImpl.java:148)
E/WindowManager(24307):     at android.view.WindowManagerImpl.addView(WindowManagerImpl.java:91)
E/WindowManager(24307):     at android.view.Window$LocalWindowManager.addView(Window.java:424)
E/WindowManager(24307):     at android.app.Dialog.show(Dialog.java:241)
E/WindowManager(24307):     at android.app.ProgressDialog.show(ProgressDialog.java:107)
E/WindowManager(24307):     at android.app.ProgressDialog.show(ProgressDialog.java:90)
E/WindowManager(24307):     at android.app.ProgressDialog.show(ProgressDialog.java:85)
E/WindowManager(24307):     at com.insideguidance.android.SplashScreenPresenter.show(SplashScreenPresenter.java:16)
E/WindowManager(24307):     at com.insideguidance.android.InsideMobileActivity.showSplashScreen(InsideMobileActivity.java:93)
E/WindowManager(24307):     at com.insideguidance.android.InsideMobileActivity.launchApplication(InsideMobileActivity.java:43)
E/WindowManager(24307):     at com.insideguidance.android.InsideMobileActivity.onCreate(InsideMobileActivity.java:35)
E/WindowManager(24307):     at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1047)
E/WindowManager(24307):     at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:1611)
E/WindowManager(24307):     at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:1663)
E/WindowManager(24307):     at android.app.ActivityThread.access$1500(ActivityThread.java:117)
E/WindowManager(24307):     at android.app.ActivityThread$H.handleMessage(ActivityThread.java:931)
E/WindowManager(24307):     at android.os.Handler.dispatchMessage(Handler.java:99)
E/WindowManager(24307):     at android.os.Looper.loop(Looper.java:130)
E/WindowManager(24307):     at android.app.ActivityThread.main(ActivityThread.java:3683)
E/WindowManager(24307):     at java.lang.reflect.Method.invokeNative(Native Method)
E/WindowManager(24307):     at java.lang.reflect.Method.invoke(Method.java:507)
E/WindowManager(24307):     at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839)
E/WindowManager(24307):     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597)
E/WindowManager(24307):     at dalvik.system.NativeStart.main(Native Method)
V/TDDatabase(24307): TDPuller[redacted]: Setting lastSequence to 6490 from( 6456)

"Unable to save checkpoint" messages were coming soon after.

@dwt
Contributor Author

dwt commented Jul 20, 2012

And indeed this leads to a database that is not complete (missing about 500 documents) where further replication attempts will not get any additional documents.

:-(

As far as I understand, this is triggered by the out-of-memory error during the replication - the design document in question here was 2.9 megabytes.

As this sometimes succeeds, this could indicate that the rest of the replication is slowly leaking memory.

@dwt
Contributor Author

dwt commented Jul 20, 2012

Looking at a memory profiler, it looks like Jackson may be used incorrectly, so that it tries to accumulate every replicated object in memory somewhere.

This is what the eclipse memory tool spits out as the top suspect:

The thread java.lang.Thread @ 0x405567b0 ChangeTracker-http://redacted keeps local variables with total size 17.969.736 (87,19%) bytes.

The memory is accumulated in one instance of "org.codehaus.jackson.map.util.ObjectBuffer" loaded by "dalvik.system.PathClassLoader @ 0x40514290".

Keywords
org.codehaus.jackson.map.util.ObjectBuffer
dalvik.system.PathClassLoader @ 0x40514290

@dwt
Contributor Author

dwt commented Jul 20, 2012

Two things I think need fixing here:

a) The database absolutely has to protect itself against becoming inconsistent due to memory errors - any errors really.
b) It should be possible to replicate even very large databases with a memory usage not bigger than the biggest document to be received.

@mschoch
Contributor

mschoch commented Jul 20, 2012

Does that design document contain any attachments?

There are known issues replicating with attachments right now.

Also are you using the latest code?

I've successfully pulled a database with 170k revisions without running out of memory (it ran for 2 hours).

@mschoch
Contributor

mschoch commented Jul 20, 2012

Also, to clarify, the tombstones of deleted documents must also be replicated, which is why you saw the deleted documents being replicated.

@dwt
Contributor Author

dwt commented Jul 23, 2012

Yes, there is a couch app on that design document. Though none of those files are bigger than several kilobytes.

Yes I am using the latest code.

I did see these issues on Android 2.2 (the oldest supported version) - maybe that might be the reason?

On Thursday I will be trying emulators with newer versions of Android to see if that narrows down the problem - I would be more than glad though if you were able to pinpoint the problem already. I did try to debug my way down to Jackson and have to admit that my Android debugging-fu is not up to that task yet. :-(

Oh, and thanks for the heads up on the deleted docs.

@mschoch
Contributor

mschoch commented Jul 23, 2012

OK, the attachments are at least part of the problem. The patch I'm working on now should help a bit. The problem is that we are transferring attachments in-line with the document (encoded as base64). This means that the total size of the JSON we parse is the size of the design document plus all of the attachments after base64 encoding (roughly 33% larger).
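To put a number on the base64 overhead, here is a small stand-alone sketch. It is not TouchDB code; the class name `Base64Overhead` and the 300,000-byte payload are made up for illustration. Base64 emits 4 output characters for every 3 input bytes, which is where the roughly 33% comes from.

```java
import java.util.Base64;

// Illustrative only: class name and payload size are assumptions, not TouchDB code.
final class Base64Overhead {
    /** Number of base64 characters (with padding) needed for rawBytes bytes. */
    static long encodedLength(long rawBytes) {
        // Every group of up to 3 input bytes becomes exactly 4 output characters.
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] attachment = new byte[300_000]; // stand-in for an inline attachment
        String encoded = Base64.getEncoder().encodeToString(attachment);
        System.out.println(attachment.length + " raw bytes -> "
                + encoded.length() + " base64 chars");
        // prints "300000 raw bytes -> 400000 base64 chars"
    }
}
```

The parser then has to hold that whole encoded string in memory before it can be decoded, on top of the rest of the document.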

With the patch I'm working on now, a document and its attachments are transferred as a MIME multipart message, and each of the attachments can be streamed directly to disk, never going through the JSON parser.
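As a rough sketch of what "streamed directly to disk" means in terms of memory: the copy loop below only ever holds one fixed-size buffer, regardless of how large the attachment is. This is not the patch itself; `AttachmentStreamer`, `copyToFile` and the 8 KB buffer size are hypothetical names and values chosen for the example.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of TouchDB: copies one MIME body to a file.
final class AttachmentStreamer {
    /** Copies the stream to target using a fixed 8 KB buffer; returns bytes written. */
    static long copyToFile(InputStream in, File target) throws IOException {
        byte[] buffer = new byte[8192]; // the only attachment data held in memory
        long total = 0;
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

The try-with-resources form needs Java 7 or later; on older Android versions the same loop would need an explicit `finally` block to close the stream.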

@dwt
Contributor Author

dwt commented Jul 24, 2012

Out of curiosity, is my observation correct that the JSON parser keeps a reference to each parsed object? (At least on Android 2.2)

Also Android 2.2 seems to have a lot stricter memory limits than later versions (4.0 for example).

@dwt
Contributor Author

dwt commented Jul 25, 2012

I've retested replication on a virtual 4.0 device and the memory problem seems not to exist there / seems to be a lot less severe there. In any case while the 2.2 emulated device was never able to finish a replication, the virtual 4.0 device is.

Not sure if this indicates that this might be more of an android versioning issue.

@ZavenArra

I also seem to be experiencing this problem - a number of documents that have attachments do not show up in the replicated database (mobile => couchdb server), even though the replication thinks it's complete. Is there anything I should do to verify that this is the same issue, and not a new one? I'm experiencing this on Ice Cream Sandwich.

@mschoch
Contributor

mschoch commented Aug 9, 2012

There are several problems with the replicator right now. It will deadlock and/or run out of memory in lots of cases, and for some reason doesn't resume correctly upon restart.

If you can identify whether there are any particular documents that never replicate, that would be useful to know. If it's different documents each time, that would also be useful to know.

Finally if you have errors in the log that would help.

@ZavenArra

I created a new remote database and turned off the pull replication, thus trying to push all documents to the remote. The replicator started up correctly, I see a long list of memory allocation Grow_heap (frag case) calls in the logs, and then after maybe a minute the app crashes with an out of memory exception. At the time of the crash, no documents have been propagated to this new remote db. Here are the salient log entries:

https://gist.github.com/3305565

The docs that don't replicate are identical in structure to the docs that do - basically some descriptive info fields (text) and then 2 attachments - a thumbnail, and a medium quality resize of a photograph. Basically this:

medium.jpg 376.2 KB, image/jpeg
thumb.jpg 27.0 KB, image/jpeg

From 2 different devices I'm seeing what appears to be a random subset of documents that don't replicate.

@snej

snej commented Aug 9, 2012

a number of documents that have attachments do not show up in the replicated database (mobile => couchdb server)

If the documents have multiple attachments larger than 16k bytes, this could be the same as this issue I just worked around for iOS:
https://github.com/couchbaselabs/TouchDB-iOS/issues/133
Basically, CouchDB gets confused about the order the attachments are sent in, and bad stuff ensues. In that specific instance it was throwing an Erlang exception and returning a 500 error, but I can imagine other problems where you might just not get a response at all.

@mschoch
Contributor

mschoch commented Aug 13, 2012

Currently we don't send attachments with MIME multipart (this patch is in progress). Once we implement that feature, we may run into the issue Jens is referring to. I think we can work around it by using a feature of the Jackson JSON parser to force ordering of the attachments map.

Back to this issue, the log shows that an out-of-memory error occurs when it is trying to base64 encode one of the attachments. It was trying to allocate around 1 megabyte at the time it failed. It seems as though you already had an absurdly high amount of memory allocated though (over 100MB).

By any chance did you have the debugger attached when you were running this replication? Previously we used a new thread for each request. This was much worse for pull replications, but in long-running push replications it can be a problem as well. If not, there may be some other leak...

@sameersegal

I am facing issues in both PUSH and PULL replication. I am testing with low connectivity and on a phone with a limited configuration. I think the issue stems from the list of doc changes we keep in memory (I get quite a few OutOfMemoryExceptions). Additionally, I have noticed that in PULL replication, the thread removes the entry from the inbox and kicks off an AsyncTask. In the event that the thread/AsyncTask fails, there is no update to the inbox. Is my understanding correct?

Do you think it will help if we write the list of documents into the DB and fire threads based on the DB?

@fscz

fscz commented Mar 27, 2013

As for the question about how attachments are pulled/pushed:
The real solution to this problem is to use multipart messages.
However there was a problem with CouchDB that prevented this solution.
See https://issues.apache.org/jira/browse/COUCHDB-1521

The good news is that this has been fixed in the CouchDB master branch.
See https://git-wip-us.apache.org/repos/asf?p=couchdb.git;a=shortlog;h=refs/heads/master;pg=1
The patch name is "Send attachment headers in multipart responses"
The date is: 08.01.2013

Since the CouchDB Version 1.2.1 was tagged 09.01.2013 this should be
in the latest version of CouchDB and therefore the multipart solution to our
problem is now available and I will implement that soon in my fork.

@snej

snej commented Mar 27, 2013

The real solution to this problem is to use multipart messages.
However there was a problem with CouchDB that prevented this solution.

Well, it didn't prevent using multipart, but it did make it more difficult. When receiving MIME responses, TouchDB/iOS computes a digest of each MIME body and uses those to figure out which attachment goes with which metadata object. On the sending side, it uses a custom JSON encoder that writes out the _attachments dictionary keys in known (sorted) order so that it can write the MIME bodies in the same order.
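A rough Java sketch of both halves of that approach, for illustration only. This is not the TouchDB/iOS code: the class and method names are hypothetical, and the hex-encoded SHA-1 digest is an assumption, since the digest format actually used is not stated here.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; names and digest format are assumptions.
final class AttachmentOrdering {
    /** Sending side: sorted attachment names, used for both JSON keys and MIME bodies. */
    static List<String> writeOrder(Map<String, byte[]> attachments) {
        // TreeMap iterates its keys in natural (sorted) order.
        return new ArrayList<>(new TreeMap<>(attachments).keySet());
    }

    /** Hex-encoded SHA-1 of a MIME body. */
    static String sha1Hex(byte[] body) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(body)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is a required JDK algorithm
        }
    }

    /** Receiving side: name of the attachment whose declared digest matches the body, or null. */
    static String matchBody(Map<String, String> digestByName, byte[] body) {
        String digest = sha1Hex(body);
        for (Map.Entry<String, String> entry : digestByName.entrySet()) {
            if (entry.getValue().equals(digest)) {
                return entry.getKey();
            }
        }
        return null;
    }
}
```

Matching by digest means the receiver does not depend on the order in which the bodies arrive, while the sorted write order keeps the sender predictable.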

@fscz

fscz commented Mar 27, 2013

I see. Though a bit cumbersome this approach also works with earlier CouchDB versions.
Moreover it makes me think again about the push mechanism in TouchDB-Android.
I have not looked at it in detail. In a nutshell, what approach is taken there? Same as TouchDB-iOS?

@snej

snej commented Mar 27, 2013

In a nutshell, what approach is taken there? Same as TouchDB-iOS?

I think it's the same as whatever TouchDB-iOS was doing last summer, or whenever Marty stopped having time to port my commits :)

@sameersegal

I am not talking about attachments. I have a more basic question about the Pull/Push mechanism.

In TouchDB-Android - we remove the revision from the ArrayList after launching a request without checking if the request completed successfully or not.

            pullRemoteRevision(revsToPull.get(0));
            revsToPull.remove(0);

And then later:

        if (inboxCount == 0) {
            // Nothing to do. Just bump the lastSequence.
            Log.w(TDDatabase.TAG,
                    String.format("%s no new remote revisions to fetch", this));
            setLastSequence(lastInboxSequence);
            return;
        }

Thus if there are 10 changes in the inbox and the 9th fails but the 10th succeeds (maybe due to a dip in connectivity, a timeout, or whatever), then the seq gets updated to the latest and there is no record of the loss of the 9th change.

Is my understanding correct?

@snej

snej commented Mar 29, 2013

The current iOS TDPuller is good about keeping track of this. It uses a helper class called a TDSequenceMap to keep track of which remote sequences are pending vs. which have been successfully copied. The sequence map can tell it what remote sequence to checkpoint.
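To make the idea concrete, here is a minimal Java sketch of such a tracker. It is not the real TDSequenceMap: the class `PendingSequences` and its methods are hypothetical, and it assumes remote sequences are increasing integers that are registered before their fetch starts.

```java
import java.util.TreeSet;

// Hypothetical sketch of a pending-sequence tracker; not the actual TDSequenceMap.
final class PendingSequences {
    private final TreeSet<Long> pending = new TreeSet<>();
    private long highestAdded = 0;

    /** Register a remote sequence before its revision is fetched. */
    synchronized void add(long seq) {
        pending.add(seq);
        highestAdded = Math.max(highestAdded, seq);
    }

    /** Call only after the revision for seq has been stored locally. */
    synchronized void markCompleted(long seq) {
        pending.remove(seq);
    }

    /** Highest sequence that is safe to checkpoint: nothing at or below it is unfinished. */
    synchronized long checkpoint() {
        if (pending.isEmpty()) {
            return highestAdded;
        }
        // Stop just below the oldest unfinished sequence so it is fetched again on resume.
        return pending.first() - 1;
    }
}
```

With sequences 1 through 10 registered and only the 9th still unfinished, `checkpoint()` returns 8, so a restarted replication asks for changes after 8 and picks up the 9th again instead of skipping it.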

I looked through the iOS TDPusher code and it's less robust. I think it hasn't had as much attention because it's not used as heavily (a lot of apps rely more on pull) and because most of the time it has an easier job since it can send lots of revisions in one _bulk_docs call. But it's not being careful about advancing the lastSequence property, so in some edge cases it could lose revisions — I need to fix that.

Update: Filed https://github.com/couchbaselabs/TouchDB-iOS/issues/246 on this.

@ghost ghost assigned tahmmee Jul 24, 2013
@jchris
Contributor

jchris commented Jul 24, 2013

We should link to the test that proves this is fixed.

I like what @dwt said:

Two things I think need fixing here:

a) The database absolutely has to protect itself against becoming inconsistent due to memory errors - any errors really.
b) It should be possible to replicate even very large databases with a memory usage not bigger than the biggest document to be received.
