-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Returning a copy collection to avoid corrupted iterator #3913
Conversation
|
Please verify the committer name, email, and GitHub username association are all correct and match CLA records. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ntisseyre ! Happy to see you here 😄
I added some comments regarding this PR, because I don't know the purpose of the changes. I assume they might be needed for latter contributions.
Besides that, you will need to sign either ICLA or CCLA, so that we could merge that work into JanusGraph.
You can check the contribution guide here: https://github.com/JanusGraph/janusgraph/blob/master/CONTRIBUTING.md#sign-the-cla
Also, each commit has to be signed by you. You can update your commit signature via git commit --amend -s
. See instructions about signing commits here: https://github.com/JanusGraph/janusgraph/blob/master/CONTRIBUTING.md#commit-changes-and-sign-the-developer-certificate-of-origin
Please, let me know if you need any help there and I can jump in.
@Override | ||
public synchronized Collection<InternalRelation> getAll() { | ||
return super.getAll(); | ||
return new ArrayList<>(super.getAll()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see getAll()
method is used in a single place (to pass added relations into the commit logic).
I'm thinking concurrent access to this collection could happen only in the situation when we are in the process of committing the transaction but also another thread is adding new relations into this transaction.
I'm can't think about cases when adding new relations while committing the transaction makes sense. I could miss some picture here and could be that this part might be needed for future optimizations / features, but I just can't think about the case when wrapping this collection into mutable collection is needed. Right now super.getAll()
returns unmodifiable collection which can't be mutated by the caller.
Perhaps you could give some example why this change is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only usage I found when getAll()
is called is when we commit the transaction. I that case, I believe, only a single thread will be committing a transaction. I'm not sure committing the same transaction OR mutating elements when another thread is committing a transaction is a safe operation.
I do understand that the class is called ConcurrentAddedRelations
and it should be assumed that all method under this class will return thread-safe collections, but it seems it's an unnecessary operation in this case.
What I was thinking is that we can change the naming of this interface method from getAll()
to getAllUnsafe()
which would clarify to the caller that the product of this method is not thread-safe.
However, I'm OK if you want to leave it as is because this method is called only during the transaction commit operation. Thus, shouldn't be expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have renamed the method to getAllUnsafe
and removed the copy
@Override | ||
public synchronized Iterable<InternalRelation> getView(final Predicate<InternalRelation> filter) { | ||
return super.getView(filter); | ||
return copyView(super.getView(filter)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this method is called from several places, but I can't figure out if there is a bug or not.
Would you be able to explain why this change is needed? Ideally it would be great to have a test to reproduce the bug, but in some cases it's complicated to write a meaningful test for a bug. In such case a simple description of the problem could be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the description of the problem to the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this part with @ntisseyre previously.
Current concern is that this method might be called too often which may increase pressure on GC due to creation of many copied arrays in memory.
Below are my observations.
Observation 1.
As I'm checking right now, this method is called for each added Vertex Property. I.e. the below snipped will call this method 3 times:
Vertex vertex = tx.addVertex();
vertex.property("foo1", "bar"); // first `getView` call
vertex.property("foo2", "bar2"); // second `getView` call
vertex.property("foo3", "bar3"); // third `getView` call
The addedRelations
(ConcurrentAddedRelations
) is used in two separate places. One addedRelations
for the whole transaction and the second addedRelations
collection for a each vertex. At the beginning I was afraid that it might cause getView
call to the addedRelation
of the Transaction which would cause a lot of objects to be copied in-memory (i.e. all relations for the whole transaction). However, upon checking I see that it's not the case. When we are adding property
that relation is added into both Transaction addedRelation
collection and StandardVertex addedRelation
collection, but the getView
call happens for the StandardVertex addedRelation
only and not for the whole transaction.
Essentially, it's not a big concern, because properties of a single vertex are copied, and not all properties of all vertices.
Observation 2.
It also seems that when we are executing queries we also call getNew
method of StandardJanusGraphTx
which essentially calls getView
of the Transaction addedRelation
(i.e. whole transaction). I assume, if you execute many read transaction and also adding many relations in that transaction it might be a problem because with this implementation it will result in many auxiliary arrays to be allocated in-memory to hold all the references.
That said, looking at getNew
implementation it seems we already copy those elements into a Set (in one case, but passing a simple reference in another case). And looking at getUnfoldedIterator()
which is used to process query it seems that the elements returned from getNew
are copied again into other collections.
As such, the time & space complexity of this flow doesn't actually change. The operations amount change, but amortized complexity stays the same. I.e. if we have N elements in that collection the amount of operations increases from N2 to N3 which I believe isn't a big problem. If we want to optimize that then it might be better to optimize N instead of constant.
Overall, I don't think this will cause many problems. The only usage pattern affected I see is when you are adding a huge amount of properties in a single transaction as well as querying some data in parallel for the same transaction. I will assume this isn't the most popular usage pattern.
Conclusion: I think it's OK to use copyView
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
Thank you! I'm working on my CLA |
bfa75db
to
ede08c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ntisseyre for the contribution!
LGTM, however, see the comments I left above #3913 (comment)
I would prefer to simply changing the name of getAll
method to the one below instead of copying that collection into a new ArrayList. However, I'm OK if you want to leave it as is.
@Override
public synchronized Collection<InternalRelation> getAllUnsafe() {
return super.getAllUnsafe();
}
9b1fa1f
to
6224134
Compare
@JanusGraph/committers I will be merging this PR in a week following lazy consensus unless anyone else jumps in for the review. |
… threads Signed-off-by: ntisseyre <[email protected]>
Merging by following lazy consensus. Backporting to 1.0. |
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation and see the Github Action logs for details |
In a highly concurrent environment, within a single JanusGraph transaction, where multiple thousands of vertices are being created, I have been observing an exception:
After an investigation, I have found that the container of vertex relationships might be modified and iterated in parallel, causing a corrupted iterator.
To solve an issue, I have made the function to return a copy of the current iterator state.