NullReferenceException in FeedIterator.ReadNextAsync #1004

Closed
joshlang opened this issue Nov 15, 2019 · 11 comments · Fixed by #1105
Labels
bug Something isn't working QUERY

Comments

@joshlang
Contributor

joshlang commented Nov 15, 2019

Using 3.4.1

I'm aware of this: #871 ... but this doesn't seem to be related to a serializer (although we do also use a custom serializer).

Our code is essentially:

IQueryable<T> query = ...;
var feedIterator = query.ToStreamIterator();
await feedIterator.ReadNextAsync(cancellationToken); // NullReferenceException

Adding some debugging statements let us discover: query.ToString() yields {"query":"SELECT VALUE root FROM root WHERE ((root[\"Type\"] = \"Blockchain\") AND (NOT root[\"Deleted\"])) "}

Interestingly (sigh), the code works on dev machines (windows) but not when deployed (linux, kubernetes). Same database though. Other database calls do work, so it's not some weird access issue.

Stack trace:

System.NullReferenceException:
   at Microsoft.Azure.Cosmos.Query.CosmosQueryExecutionContextFactory+<CreateFromPartitionedQuerExecutionInfoAsync>d__14.MoveNext (Microsoft.Azure.Cosmos.Client, Version=3.4.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Cosmos.Query.CosmosQueryExecutionContextFactory+<CreateItemQueryExecutionContextAsync>d__13.MoveNext (Microsoft.Azure.Cosmos.Client, Version=3.4.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Cosmos.Query.CosmosQueryExecutionContextFactory+<ExecuteNextAsync>d__11.MoveNext (Microsoft.Azure.Cosmos.Client, Version=3.4.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.Cosmos.Query.QueryIterator+<ReadNextAsync>d__6.MoveNext (Microsoft.Azure.Cosmos.Client, Version=3.4.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at XXX.Database.Core.DbBase+<ExecuteQueryAsync>d__18`1.MoveNext (XXX.Database.Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null: /src/XXX.Database.Core/DbBase.cs: 393)

Still investigating, but at this point I'm reasonably certain it's an SDK bug.

@joshlang joshlang changed the title NullReferenceException in FeedIterator.ReadANextAsync NullReferenceException in FeedIterator.ReadNextAsync Nov 15, 2019
@joshlang
Contributor Author

By the way, CreateFromPartitionedQuerExecutionInfoAsync is spelled wrong in your internal class :D

@joshlang
Contributor Author

This touched the code and renamed it to TryCreateXXX instead of CreateXXX. No idea if it fixed it, but just saying the 3.4.1 I'm using isn't the latest code: #943

@j82w
Contributor

j82w commented Nov 18, 2019

@joshlang any chance you can provide a simple console app with a repro?

@bchong95 any suggestions?

@j82w j82w added bug Something isn't working QUERY labels Nov 18, 2019
@joshlang
Contributor Author

Well this was annoying to debug. Ok, so here's what I've found so far.

It is related to the use of a serializer, but it still seems to be an SDK bug.

If I construct my client like this, everything works fine:

var client = new CosmosClient(endpoint, auth);

If I construct my client with a serializer like this, everything works fine FOR WINDOWS but crashes with a NotImplementedException in LINUX:

class NothingSerializer : CosmosSerializer
{
    public override T FromStream<T>(Stream stream) => throw new NotImplementedException();
    public override Stream ToStream<T>(T input) => throw new NotImplementedException();
}
...
var client = new CosmosClient(endpoint, auth, new CosmosClientOptions
{
    Serializer = new NothingSerializer()
});

The test query I'm doing has no results.

In windows, the serializer is never called. In linux, it calls FromStream<Microsoft.Azure.Cosmos.Query.PartitionedQueryExecutionInfo> with this payload:

{"partitionedQueryExecutionInfoVersion":2,"queryInfo":{"distinctType":"None","top":null,"offset":null,"limit":null,"orderBy":[],"orderByExpressions":[],"groupByExpressions":[],"groupByAliases":[],"aggregates":[],"groupByAliasToAggregateType":{},"rewrittenQuery":"","hasSelectValue":true},"queryRanges":[{"min":"","max":"FF","isMinInclusive":true,"isMaxInclusive":false}]}

However, Microsoft.Azure.Cosmos.Query.PartitionedQueryExecutionInfo is an internal class and I cannot deserialize it.

I tried this, to see if JsonSerializer had some magic, but alas, no luck:

class NothingSerializer2 : CosmosSerializer
{
    public override T FromStream<T>(Stream stream)
    {
        using var ms = (MemoryStream)stream;
        return JsonSerializer.Deserialize<T>(ms.ToArray().AsSpan());
    }
    public override Stream ToStream<T>(T input) => throw new NotImplementedException();
}

The value that's deserialized is not null. I do get a PartitionedQueryExecutionInfo. But calling ToString() on it yields: {"partitionedQueryExecutionInfoVersion":2,"queryInfo":null,"queryRanges":null}

I suspect that the consumer of the object is hitting a NRE because of the nulls in the PartitionedQueryExecutionInfo.
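
One plausible explanation for the null fields, offered only as a sketch: System.Text.Json matches property names case-sensitively by default and ignores Newtonsoft attributes, so a camelCase payload will not bind to PascalCase properties. The type below (`ExecutionInfoSketch`) is a hypothetical stand-in, not the SDK's internal `PartitionedQueryExecutionInfo`, and its shape is an assumption made for illustration.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in: PascalCase properties, camelCase wire format.
// This is NOT the real internal SDK class.
public class ExecutionInfoSketch
{
    public int Version { get; set; } = 2;   // default assigned by the type itself
    public object QueryInfo { get; set; }
    public object QueryRanges { get; set; }
}

public static class CaseSensitivityDemo
{
    // Trimmed-down payload in the same camelCase style as the one in this issue.
    const string Payload =
        "{\"queryInfo\":{\"hasSelectValue\":true},\"queryRanges\":[]}";

    // Default options: "queryInfo" does not match "QueryInfo", so nothing binds.
    public static ExecutionInfoSketch ReadDefault() =>
        JsonSerializer.Deserialize<ExecutionInfoSketch>(Payload);

    // Case-insensitive matching lets the same payload bind.
    public static ExecutionInfoSketch ReadCaseInsensitive() =>
        JsonSerializer.Deserialize<ExecutionInfoSketch>(
            Payload,
            new JsonSerializerOptions { PropertyNameCaseInsensitive = true });

    public static void Main()
    {
        Console.WriteLine($"default: QueryInfo is null = {ReadDefault().QueryInfo == null}");
        Console.WriteLine($"case-insensitive: QueryInfo is null = {ReadCaseInsensitive().QueryInfo == null}");
    }
}
```

With default options the object is created and keeps its own defaults, but the unmatched properties stay null, which is consistent with the ToString() output above. Whether this is what actually happens inside the SDK is not confirmed here.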

It seems this is probably related to #871 after all.

Hopefully not a System.Text.Json bug :D

@joshlang
Contributor Author

Here's a minimal repro:

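using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;   // ToStreamIterator extension method

class Program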
{
    class NothingSerializer : CosmosSerializer
    {
        public override T FromStream<T>(Stream stream)
        {
            using var ms = (MemoryStream)stream;
            Console.WriteLine($"FromStream<{typeof(T).Name}> called with payload: {Encoding.UTF8.GetString(ms.ToArray())}");
            var t = JsonSerializer.Deserialize<T>(ms.ToArray().AsSpan());
            Console.WriteLine($"Returning deserialized <T>.ToString(): {t?.ToString()}");
            return t;
        }
        public override Stream ToStream<T>(T input) => throw new NotImplementedException();
    }
    static async Task Main(string[] args)
    {
        var client = new CosmosClient("...", "...", clientOptions: new CosmosClientOptions
        {
            Serializer = new NothingSerializer()
        });
        var db = await client.CreateDatabaseIfNotExistsAsync("test1");
        var cc = await db.Database.CreateContainerIfNotExistsAsync("test1container", "/id");
        var c = cc.Container;
        var query = c.GetItemLinqQueryable<IDictionary<string, object>>(allowSynchronousQueryExecution: false, requestOptions: new QueryRequestOptions { PartitionKey = new PartitionKey("blerg") })
            .Where(x => ((string)x["a"]) == "b");    // No exception if removing this line
        var iterator = query.ToStreamIterator();
        using var results = await iterator.ReadNextAsync();
        Console.WriteLine("I'm alive!");
    }
}

This is a .NET Core 3.1 console app, with <LangVersion>8.0</LangVersion> in the csproj file. Add the default Dockerfile (Linux).

F5-debugging works, and outputs I'm alive!.

Build & run in a Linux Docker container and it'll explode.

@joshlang
Contributor Author

Deepening the mystery... It does crash in windows when I'm running code in an XUnit test project, but not in an identical console app project. Both netcoreapp3.1.

But I'm not sure that helps with this fix.

@joshlang
Contributor Author

joshlang commented Nov 19, 2019

Shallowifying the mystery...

The problem occurs in x86 mode only. NOT x64.

This is why it runs in a console app (x64) but not in XUnit (x86).

This is why it runs in our local dev environment on windows (x64) but not when deployed on linux in kubernetes (x86).

While the mystery factor is now reduced, the WTF factor has increased.
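
For anyone trying to confirm which mode their own process runs in, a minimal sketch using standard .NET APIs follows. The class name `ArchCheck` is made up for illustration; only `RuntimeInformation` and `Environment.Is64BitProcess` come from the framework.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ArchCheck
{
    // Reports the architecture of the running process and of the host OS.
    // A 32-bit process on a 64-bit OS shows up as X86 / X64 respectively.
    public static string Describe() =>
        $"Process: {RuntimeInformation.ProcessArchitecture}, " +
        $"OS: {RuntimeInformation.OSArchitecture}, " +
        $"64-bit process: {Environment.Is64BitProcess}";

    public static void Main() => Console.WriteLine(Describe());
}
```

Logging this once at startup in both the console app and the test project (or the deployed container) should make the x86/x64 difference visible without guessing from project settings.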

#974 ?

@j82w
Contributor

j82w commented Nov 19, 2019

It's likely something similar to #974. It seems like the gateway is giving some response the client is not correctly handling. @bchong95 will be investigating the issue.

@j82w j82w assigned bchong95 and j82w and unassigned bchong95 Nov 19, 2019
@j82w
Contributor

j82w commented Nov 20, 2019

After further investigation I'll be working on a fix for this.

@joshlang
Contributor Author

Is this gonna be in the next release by any chance? We still can't do any queries or use the change feed until it's fixed.

@j82w
Contributor

j82w commented Dec 10, 2019

Sorry for the delay. I got pulled away to fix a few other bugs. I'm currently working on a PR to fix this issue. I don't know if it will make the next release. Once the PR is out and merged we can see about doing a hotfix for it.
