Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pubsub.subscribe fails if called twice with the same subscription name #1257

Closed
jasonpolites opened this issue Apr 22, 2016 · 29 comments
Closed
Assignees
Labels
api: pubsub Issues related to the Pub/Sub API. type: question Request for information or clarification. Not an issue.

Comments

@jasonpolites
Copy link

The sample code for pubsub will fail if it's run twice.

The "subscribe" method:

topic.subscribe('new-subscription'...)

Will fail if "new-subscription" already exists. I see the following error:

Resource already exists in the project (resource=new-subscription).

@stephenplusplus
Copy link
Contributor

All subscriptions must be given unique names. That line is saying: "Create a new subscription named new-subscription", so it is expected to fail the second time.

Do you think it would be more clear if we changed it to:

// Subscribe to the topic.
topic.subscribe('new-subscription-name', function(err, subscription) {

@stephenplusplus stephenplusplus added type: question Request for information or clarification. Not an issue. api: pubsub Issues related to the Pub/Sub API. labels Apr 22, 2016
@jasonpolites
Copy link
Author

Sure, but presumably we don't want to be creating thousands/millions of subscriptions? (one per code invocation). What I really want to do is get back to the same subscription, but it seems there are a bunch of hoops I need to jump through to do that.

Are we saying that the "recommended" approach is to first call getSubscriptions to see if the one you're looking for is there, and if it's not call subscribe?

Seems to me that it would be simpler for the developer if subscribe were a "get-or-create"

@stephenplusplus
Copy link
Contributor

You can use reuseExisting (https://googlecloudplatform.github.io/gcloud-node/#/docs/v0.31.0/pubsub/topic?method=subscribe):

topic.subscribe('maybe-subscription-name', { reuseExisting: true }, function(err, subscription) {
  // subscription was "get-or-create"-ed
});

Another way to do it (this is a pattern throughout our whole API):

var subscription = topic.subscription('maybe-subscription-name');
subscription.get({ autoCreate: true }, function(err, subscription) {
  // subscription was "get-or-create"-ed
});

@jgeewax
Copy link
Contributor

jgeewax commented Apr 22, 2016

Hm - seems like that should be the default, no ?


Adding more context...

If we default to {reuseExisting: true}, we're effectively saying "this is the worker model" in the sense that you're creating more instances that will pull from the same subscription (ie, 1 message == 1 item of work)

If we default to {reuseExisting: false} (as we do today), we're saying "this is the broadcast model" in that your each instance will get all the messages (ie, 1 message = N items, where N is the number of nodes).

For the worker model, names matter, and should be required. For the broadcast model, names don't really matter so much (the topic matters a lot, but the name of the subscription doesn't seem that useful, right?)

Is this ringing true with everyone? or am I crazy? @tmatsuo @stephenplusplus

@stephenplusplus
Copy link
Contributor

stephenplusplus commented Apr 22, 2016

reuseExisting should be true by default? I don't think so, because if it exists, you would just use:

var subscription = topic.subscription('subscription-name');
subscription.on('message', function(message) {});

If you're unsure if it exists, that's why we have both of those examples from my last post.

I'm not sure I understand the Pub/Sub use cases where users are uncertain if a topic exists (related: #696), but if you both think that defaulting reuseExisting is the right way to go, I'll gladly send a PR. Is it worth considering removing the reuseExisting option altogether?

// @tmatsuo since we're talking Pub/Sub stuff.

@stephenplusplus
Copy link
Contributor

RE: worker/broadcast model concepts, I haven't looked at it like that before, but I think that makes sense. If I'm following, are you saying we should generate subscription names for the user? (Related: googleapis/google-cloud-ruby#193)

@jgeewax
Copy link
Contributor

jgeewax commented Apr 22, 2016

So making this into code:

I'm a worker! I might not see every message that gets published!

var topic = pubsub.topic('tweets');
pubsub.subscribe(topic, 'parse-language', { reuseExisting: true }, function(err, sub) {
  sub.on('message', function(msg)  {
    // Detect the language of this tweet!
    // Then ack!
  });
});

I'm just a listener to everything! Bring it on!

var topic = pubsub.topic('tweets');
pubsub.subscribe(topic, 'random-listening-subscription-id', {autoAck: true}, function(err, sub) {
  sub.on('message', function(msg) {
    console.log('Got a tweet! Awesome!');
  });
});

Though the latter case might be better as...

var topic = pubsub.topic('tweets');
pubsub.subscribe(topic, function(err, sub) {
  sub.on('message', function(msg) {
    console.log('Got a tweet! Awesome!');
  });
});

@jasonpolites
Copy link
Author

Maybe a simply solution for now is to just add { reuseExisting: true } to the README so users don't get tripped up in their first 5 minutes

@stephenplusplus
Copy link
Contributor

Thanks for the code examples, it always helps! So is this right?:

If I give a name to subscribe, I expect it to be created. Remove the reuseExisting option. If it can't be created because it already exists, don't return an error and just use the subscription.

If I don't give a name to subscribe, just create one for me named something random.

@jgeewax
Copy link
Contributor

jgeewax commented Apr 23, 2016

I'd want @tmatsuo to chime in here if possible... but that's my thought... Sometimes I want to just listen on a topic, and I don't ever intend to share the listening responsibilities with anyone else...

@stephenplusplus
Copy link
Contributor

Ping @tmatsuo for thoughts!

@tmatsuo
Copy link
Contributor

tmatsuo commented May 3, 2016

Sorry guys for the late reply.

Currently subscription name automatic assignment doesn't work as expected, so there's no reliable way of auto assigning a subscription name. However, the use cases that @jgeewax explained make sense to me.

@stephenplusplus
Copy link
Contributor

@jgeewax should we wait until that feature is supported to change the model here or implement now and do our own name generation, e.g. {projectId}-{uuid}?

@jasonpolites
Copy link
Author

If you auto-generate a name, isn't there a risk that the user will end up back in the world of thousands of zombie subscriptions? (does that matter?)

Would it make sense to just have a "default" subscription name, such that the absence of a name argument in subscribe simply means using a default name? (like, I dunno: "DEFAULT"). The implication being that any client who subscribes in this way will be using the same subscription instance

@stephenplusplus
Copy link
Contributor

@jgeewax @tmatsuo any thoughts?

@tmatsuo
Copy link
Contributor

tmatsuo commented May 13, 2016

It's just my opinion. I would not implement subscription names auto generation on the client side. There are many corner cases and the implementation will be complex. I would wait for the server side to implement it.

I don't think zombie subscription is a big problem. Subscriptions which are inactive for 30 days will be automatically get turned down. Also messages delivered to the subscriptions are only retained for 30 days.

@eschapira
WDYT?

@eschapira
Copy link

I think I'm going to suggest the opposite of Takashi.

The autogeneration is simple; on the server side this is the code (in java)

  // Compose subscription name of topic + a random long number in hex format.
  String subscriptionName =
      topicName + "-" + String.format("%x", Math.abs(Random.nextLong()));

It would be fine to do this on the client.

Regarding the default subscribe() behavior, I would prefer if it doesn't create unintended random subscriptions, it will pollute the client resources and possibly put unnecessary load on the server if this pattern repeats too much. Users should explicitly say autoCreateSubscription: true or something like that.

@tmatsuo
Copy link
Contributor

tmatsuo commented May 13, 2016

What happens if the topic name has a length of 255?

@tmatsuo
Copy link
Contributor

tmatsuo commented May 13, 2016

It's also tricky if multiple subscribers are auto creating subscriptions at the same time. because then they are unintentionally using the same subscription?

@tmatsuo
Copy link
Contributor

tmatsuo commented May 13, 2016

Although I'm not strongly against implementing it on the client side. If the implementation is robust, I'm fine with it.

@stephenplusplus
Copy link
Contributor

stephenplusplus commented May 23, 2016

Thank you @tmatsuo and @eschapira for the information.

@jgeewax how shall we proceed?

@jgeewax
Copy link
Contributor

jgeewax commented May 23, 2016

Can you summarize the options here? I realize that the thread started with "don't fail when subscribing twice", but it progressed to a few different use-cases on top of that, right?

@stephenplusplus
Copy link
Contributor

Yes, this is mostly a discussion of an API shift you suggested here, allowing for unnamed subscriptions to be created automatically. I think the best thing to do would be going over the thread starting from there, as we had many opinions weigh in, so it'd be unfair of their expertise if I left any of their points out in a summary.

@stephenplusplus stephenplusplus added this to the 1.0 milestone Sep 26, 2016
@stephenplusplus
Copy link
Contributor

@jgeewax @omaray

@1N50MN14
Copy link

1N50MN14 commented Oct 16, 2016

@stephenplusplus I think API could be slightly expanded and simplified imho, when I first looked into pubsub (having used common alternatives in the past), I was expecting something like so:

/*
 - if topic exists, reuse otherwise auto create 
 - publish()/subscribe() topic arg can be a string
 - wildcard support for topics, then subjects (aka subscription ids) make more sense
 - subjects are optional, default to *
 - simple publish API pubsub.publish(topic, subject[optional], msg), 
   handle all logic internally
*/

var pubsub = gcloud.pubsub(conf);

// BROADCAST - all subscribers receive the message
pubsub.subscribe(topic, function(msg, subject) {
  console.log('received message', msg, 'on subject', subject);
});

// WORKERS/QUEUE - message delivered to only one subscriber (ack built in)
pubsub.subscribe(topic, {queue:'myjob.workers'}, function(msg) {
  console.log('starting job');
})

// REQUEST/REPLY - unsubscribe after X responses are received (1 in below example)
pubsub.subscribe(topic, function(msg, replyTo) {
  var response = {foo:'bar'};
  pubsub.publish(replyTo, response);
});

pubsub.request(topic, msg, {autoUnsubscribe:1}, function(response) {  
  console.log('receive response', response);
});

I'm going to attempt to wrap the current API see if I can accomplish the above

@stephenplusplus
Copy link
Contributor

Thanks! I also like the idea of creating a subscription without requiring a name, although I think keeping the event emitter works well with the Pub/Sub model:

topic.subscribe(function(err, subscription) {
  if (err) // API error, subscription couldn't be created
  subscription.on('message', function(message) {})
});

If we wanted to add a shortcut, maybe something like:

topic.onMessage(function(err, message) {
  if (err) // API error, subscription couldn't be created
  // subscription created & registered for message events
});

I think that extra functionality you drew up would fit well in a library that wraps ours.

@stephenplusplus stephenplusplus removed this from the 1.0 milestone Nov 15, 2016
@stephenplusplus
Copy link
Contributor

Here's where we stand:

  • remove opts.reuseExisting from topic.subscribe('name', opts)
  • automatically generate a subscription name for the user if they don't provide one

@jgeewax explained the worker vs broadcast model:

A worker takes an item from the message queue, works, acks. The name of the subscription is important to the worker, so the same work isn't performed on the same message multiple times.

The broadcast model is for a listener who just wants to tap into the incoming messages. The name of the subscription doesn't matter in this case.

Before

topic.subscribe('my-subscription', { reuseExisting: true }, function(err, subscription) {
  // subscription was either created or re-used
})

After

topic.subscribe('my-subscription', function(err, subscription) {
  // subscription was either created or re-used
})

To differentiate between the worker and broadcast models, we will use the presence of a subscription name in place of opts.reuseExisting. So, if I just want to receive all of the messages:

topic.subscribe(function(err, subscription) {
  // subscription with a random name was created
})

Any corrections or thoughts before I move forward with a PR?

@stephenplusplus
Copy link
Contributor

PR sent in #1799

@art-pumpkin
Copy link

art-pumpkin commented Oct 1, 2018

This was modified in pull 226. googleapis/nodejs-pubsub#226 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsub Issues related to the Pub/Sub API. type: question Request for information or clarification. Not an issue.
Projects
None yet
Development

No branches or pull requests

7 participants