.listen causes spurious periodic snapshot to be dumped #275

Closed
rawmean opened this issue Mar 28, 2019 · 12 comments

@rawmean

rawmean commented Mar 28, 2019

  • Operating System version: Ubuntu 14.04
  • Firebase SDK version: 2.16.0
  • Library version: 2.16.0
  • Firebase Product: db module (

[REQUIRED] Step 3: Describe the problem

Please see this: https://stackoverflow.com/questions/54504339/python-firebase-admin-listener-automatically-fires-every-hour

Steps to reproduce:

Please see the example code provided above.
Problem: .listen causes the entire database to be returned once per hour, even when nothing has changed. Use the sample code provided to reproduce the issue.

Relevant Code:

Please see the code in: https://stackoverflow.com/questions/54504339/python-firebase-admin-listener-automatically-fires-every-hour

@google-oss-bot

I found a few problems with this issue:

  • I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.
  • This issue does not seem to follow the issue template. Make sure you provide all the required information.

@hiranya911
Contributor

Hey @rawmean. Unfortunately this is just the way the listen() API works. The underlying HTTP/OAuth session expires every hour, and the SDK must reconnect with the server when this happens. At this point the server sends back the current contents of the reference you're listening on. There's simply no way around this when using OAuth2 and the REST protocol to communicate with the RTDB server. You need the WebSocket protocol to get around it, which we have no plans of implementing in the Python SDK. Therefore in #268, we took steps to document this behavior as follows:

The specified callback function will get invoked with db.Event objects for each realtime update received from the database. It will also get called whenever the SDK reconnects to the server due to network issues and credential expiration. In general, the OAuth2 credentials used to authorize connections to the server expire every hour. Therefore clients should expect the callback to fire at least once every hour, even if there are no updates in the database.
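A minimal sketch of how a callback might cope with the reconnect behavior documented above: it keeps a local copy of the data and drops a root-level snapshot that matches what it already holds. `SnapshotFilter` and `_apply` are hypothetical names, not part of the Admin SDK; the sketch only assumes that events expose `event_type`, `path` and `data`, as `db.Event` does. This suppresses redundant processing only. The snapshot is still downloaded on every reconnect.

```python
import copy
from types import SimpleNamespace


def _apply(cache, event):
    """Return a copy of `cache` with one put/patch event applied (simplified)."""
    root = {"": copy.deepcopy(cache)}  # wrapper so the root is handled like any child
    keys = [""] + [k for k in event.path.split("/") if k]
    node = root
    for key in keys[:-1]:
        if not isinstance(node.get(key), dict):
            node[key] = {}
        node = node[key]
    leaf = keys[-1]
    if event.event_type == "patch" and isinstance(event.data, dict):
        target = node.get(leaf)
        if not isinstance(target, dict):
            target = {}
        # Simplification: null values and nested path keys in a patch are not
        # interpreted the way the server would interpret them.
        target.update(copy.deepcopy(event.data))
        node[leaf] = target
    elif event.data is None:
        node.pop(leaf, None)
    else:
        node[leaf] = copy.deepcopy(event.data)
    return root.get("")


class SnapshotFilter:
    """Hypothetical wrapper: forwards events unless they repeat the known state."""

    def __init__(self, on_change):
        self._on_change = on_change
        self._cache = None
        self._primed = False

    def __call__(self, event):
        is_root_put = event.event_type == "put" and event.path == "/"
        if is_root_put and self._primed and event.data == self._cache:
            return  # same data again: most likely a reconnect snapshot
        self._cache = _apply(self._cache, event)
        self._primed = True
        self._on_change(event)


if __name__ == "__main__":
    # Stand-in events for a local trial; no Firebase connection involved.
    handler = SnapshotFilter(lambda e: print("changed:", e.path))
    snapshot = SimpleNamespace(event_type="put", path="/", data={"u1": {"on": True}})
    handler(snapshot)  # forwarded (first snapshot)
    handler(snapshot)  # dropped (identical snapshot)
```

With the Admin SDK this would presumably be wired up as `db.reference('schedules').listen(SnapshotFilter(handle))`, where `schedules` and `handle` are placeholders. If the local copy ever drifts from the server state, the next snapshot simply compares unequal and is forwarded, so the failure mode is an extra callback rather than a missed one.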

Generally speaking, we are currently not working on supporting new APIs for the Realtime Database. Even the listen() API you're using today was implemented and contributed by members of the open source community. This API is good enough for applications listening on references with small amounts of data. But it's definitely not optimized for large updates.

As far as I can see, you only have three options here:

  1. Switch to our Node.js or Java SDKs. They both implement the WebSocket protocol.
  2. Try to find a third-party library for accessing Firebase over WebSocket. I only know of one other Python library that supports Firebase, but they are using the REST protocol as well. However, they do support authorizing requests without OAuth2 (i.e. as a client using an API key). That can overcome this issue, but you will be accessing the database as a client, not as a backend component.
  3. Try to implement the WebSocket protocol for Firebase yourself. This is not a well documented interface, so I'd expect this to be fairly difficult.

Duplicate of #267.

@rawmean
Author

rawmean commented Mar 28, 2019

Thanks @hiranya911. Is it possible to increase the token lifetime to more than one hour? It feels like I am using an orphaned/deprecated API. I am subscribed to a paid service (Firebase). I am not a Node.js or Java developer.

Maybe you can suggest a better way if I explain my use-case:
I have only a single "user". All instances of my iOS app act as a single user and upload data to the database. I have a single server that accesses the database as that single user.

Would it help if I create an anonymous user and have the server (which uses .listen) access the database as that user, thereby avoiding this token expiration problem?

I appreciate the help.

@hiranya911
Contributor

hiranya911 commented Mar 28, 2019

Hi @rawmean. Token expiry time is a Google-wide default. That is for security reasons. A leaked/exposed token with a long expiry time could have disastrous consequences.

Given your use case, I think you can get around the issue by not using OAuth2. The Admin SDK doesn't support this mode of authentication, but the Pyrebase library does (see bullet point 2 in my previous comment), and you should be able to implement your program without issues using that.

@rawmean
Author

rawmean commented Mar 28, 2019

@hiranya911 I was using Pyrebase up until a month ago, but it has its own critical problems with stream (aka listen). When the number of entries in the database is ~2000 or more, the updates to the database do not call the callback function.

Would it help if I create an anonymous user and have the server (which uses .listen) access the database as that user, thereby avoiding this token expiration problem?

@hiranya911
Contributor

Admin SDK does not support accessing the database as a user. We support OAuth2 only.

I should also point out that our listen() API is very similar to Pyrebase:

"""SSEClient module to stream realtime updates from the Firebase Database.
Based on a similar implementation from Pyrebase.
"""

So you're likely to encounter similar issues with both libraries. But at least Pyrebase supports accessing the database as a client (which is something we don't allow).

@hiranya911
Contributor

hiranya911 commented Mar 28, 2019

It seems you're in a tough position.

When the number of entries in the database is ~2000 or more, the updates to the database do not call the callback function.

I think I know why this is. We encountered a similar problem and fixed it in our code: #221. But unfortunately we make OAuth2 mandatory. Pyrebase doesn't mandate OAuth2, but it's probably suffering from the above performance issue.

@rawmean
Author

rawmean commented Mar 28, 2019

I can't possibly be the only firebase user who is interested in using Python with the real-time database and does not want to pay 10x more than what the actual download should be?

What am I missing? Real-time database is at the core of Firebase and Python is not some obscure language. You seem like an experienced developer and surely you can see how broken this API is.

Can I/you escalate this? Again, this is a paid service provided by a large company. This is not a science project provided by some individual.

@hiranya911
Contributor

hiranya911 commented Mar 28, 2019

You can try reporting at https://firebase.google.com/support/. Please note that Real-time Database is feature frozen, meaning we're not working on new APIs. The existing listen() API was added only about 8 months ago, and even that was a contribution from the community. This is why it's marked as an experimental feature, and I don't expect this status to change anytime soon:

This API is based on the event streaming support available in the Firebase REST API. Each
call to ``listen()`` starts a new HTTP connection and a background thread. This is an
experimental feature. It currently does not honor the auth overrides and timeout settings.
Cannot be used in thread-constrained environments like Google App Engine.

On top of that, we don't see a huge demand for server-side Python support for RTDB. The cases where you'd want to use an API like listen() on a server are becoming ever scarcer, since Cloud Functions support such use cases far more elegantly.

You need a couple of things to fall into place for your use case to be viable in Python:

  1. Support user/client authentication, which is not available in this library.
  2. The performance improvements that are available in this library.

Given the circumstances, my advice to you is to try and implement your own listen() method without OAuth2. You should be able to use our _sseclient.py module for the most part, which should give you 2. If it's ok for your database to be publicly readable, just make it so in security rules, and initiate a connection via our SSE module. The resulting connection should remain open until you close it or the server shuts down.
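A rough sketch of what such a no-auth listener could look like, assuming the database path is publicly readable and using the event-stream format of the Firebase REST API (`put`, `patch`, `keep-alive`, `cancel`, `auth_revoked`). It uses only the standard library as a stand-in and does not reuse `_sseclient.py`, whose internal interface is not assumed here. The function names and the database URL in the usage note are hypothetical, and reconnect handling is left out.

```python
import json
import urllib.request


def parse_sse(lines):
    """Yield (event_name, payload) pairs from an iterable of text lines."""
    name, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one server-sent event.
            if name is not None or data:
                payload = json.loads("\n".join(data)) if data else None
                yield name or "message", payload
            name, data = None, []
        elif line.startswith(":"):
            continue  # SSE comment line
        elif line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip(" "))


def stream(url):
    """Yield (event_type, path, data) from a publicly readable RTDB path."""
    request = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(request) as response:
        text_lines = (raw.decode("utf-8") for raw in response)
        for name, payload in parse_sse(text_lines):
            if name in ("put", "patch"):
                yield name, payload["path"], payload["data"]
            elif name in ("cancel", "auth_revoked"):
                break  # the server ended or rejected the stream
            # keep-alive events carry no data and are ignored
```

A caller would iterate over something like `stream("https://example-project.firebaseio.com/schedules.json")`, where the URL is a placeholder. The first event is expected to be a `put` at path `/` carrying the full contents, followed by incremental updates for as long as the connection stays open.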

@hiranya911
Contributor

Hey @rawmean. I was chatting with a team member about this issue, and he pointed out a couple of other options that may be viable for you. I'll list them as 4 and 5 as an add-on to my earlier comment:

  4. See if you can revise your RTDB schema, so that the listen works on a smaller portion of your database. You can probably move some of the static content in the database to a separate node, and listen only on the data that is regularly changing. Hopefully, that portion of your data is not too large.
  5. Cloud Firestore will soon start supporting snapshot listeners in its Python SDK. The API is already implemented, and now awaiting launch. This is also an option for you if you can manage to migrate from RTDB to Firestore.
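One way to picture the schema revision suggested above is a helper that routes frequently changing fields and static fields to separate top-level nodes, so that a listener only needs to watch the changing node. The node names `schedules` and `profiles`, the field names, and the function itself are illustrative assumptions, not taken from the reporter's actual schema.

```python
def split_update(user_id, record, dynamic_keys):
    """Build one multi-path update that separates changing from static fields."""
    update = {}
    for key, value in record.items():
        # Frequently changing fields go under the node that is listened on;
        # everything else goes to a node no listener is attached to.
        node = "schedules" if key in dynamic_keys else "profiles"
        update[f"{node}/{user_id}/{key}"] = value
    return update


if __name__ == "__main__":
    record = {"name": "A", "schedule": {"mon": "07:30"}}
    print(split_update("user123", record, {"schedule"}))
```

The resulting dictionary could then be written in a single call such as `db.reference().update(...)`, with the listener attached only to `db.reference('schedules')`. The hourly snapshot is still sent, but it covers only the smaller node.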

@rawmean
Author

rawmean commented Mar 28, 2019

Thank you @hiranya911 for thinking about this and for your guidance. I have done #4 already. My clients (iOS apps) upload their schedule to the RTDB and my server pays attention to the uploaded schedules and turns on their car's climate based on user schedule. The listen is done on the schedule portion of the database only. There are about 4000 entries and it's growing fast.

If I allow anonymous user login, does the 1-hour token restriction still apply? My data on the server is encrypted already so making it visible is not a big risk.

#5 seems promising. The transition would not be easy because I have ~8000 iOS clients so I have to somehow maintain both RTDB and Firestore for a while.

@hiranya911
Contributor

If I allow anonymous user login, does the 1-hour token restriction still apply? My data on the server is encrypted already so making it visible is not a big risk.

What you need is no-auth. Any form of auth token (OAuth2 tokens, ID tokens etc.) has a TTL, and the client must reconnect when the token expires. But if you make your database publicly readable, you can stream updates from it without any auth tokens, and the session will remain valid unless it gets interrupted by a network issue. However, our SDKs don't support this, so you will have to write something on your own. You should be able to reuse our _sseclient.py implementation to handle most of the low-level details, but you will have to write some code on top of it.

I would also like to point you towards Cloud Functions RTDB triggers: https://cloud.google.com/functions/docs/calling/realtime-database#functions_firebase_rtdb-python

As your database continues to grow, you will eventually reach a point where a single server listening on the database is simply not going to scale, even if you somehow manage to solve your current problems. Cloud Functions should help you scale up to arbitrarily large volumes of data. You can register a trigger on a wildcard path like /schedules/{userId}, and each update will then result in a separate execution, without ever downloading the entire /schedules node. This is now supported in Node.js, Python and Go.
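A hedged sketch of what a Python handler for such a trigger might look like, assuming the background-function signature `(data, context)` from the linked documentation, a `delta` key in the payload, and a `context.resource` string of the form `projects/_/instances/<instance>/refs/<path>`. The function names and the sample resource below are hypothetical.

```python
from types import SimpleNamespace


def user_id_from_resource(resource):
    """Extract the {userId} wildcard value from a trigger resource name."""
    _, found, tail = resource.partition("/refs/schedules/")
    if not found or not tail:
        return None
    return tail.split("/")[0]


def on_schedule_write(data, context):
    """Handle one write under /schedules/{userId} (assumed trigger path)."""
    user_id = user_id_from_resource(context.resource)
    delta = data.get("delta")  # only the changed portion, not the whole node
    if delta is None:
        print(f"schedule for {user_id} was removed")
    else:
        print(f"schedule for {user_id} changed: {delta}")
    return user_id, delta


if __name__ == "__main__":
    # Local trial with a stand-in context object; no deployment involved.
    fake_context = SimpleNamespace(
        resource="projects/_/instances/example-db/refs/schedules/user123"
    )
    on_schedule_write({"data": None, "delta": {"mon": "07:30"}}, fake_context)
```

Each invocation sees only one user's change, which is the property that avoids downloading the whole /schedules node. The return value is there for local testing only and is not something the platform is assumed to use.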
