[OTel]: enable persistence by default in our OTel distribution #5347
This pull request does not have a backport label. Could you fix it @VihasMakwana? 🙏
NOTE: |
changelog/fragments/1724406973-enable-persistence-by-default.yaml
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
Quality Gate passed |
@elastic/elastic-agent-control-plane can someone take a look here? |
Do we also need to update the sample yaml configurations that onboarding uses? |
Good catch. I think we should. They live here: https://github.com/elastic/elastic-agent/tree/main/internal/pkg/otel/samples |
@strawgate @ycombinator those configs already use the filestorage extension.
|
If this PR changes the default, can these changes then be removed from the onboarding YAML? |
Not really. Take a look here: elastic-agent/internal/pkg/otel/run.go, lines 91 to 99 in f32e874
I only override STORAGE_DIR if it's not set by the user (by default I use elastic-agent/data/registry/otelcol , but we can change the default).
If a user has set the |
For the first part of my question though -- It seems like STORAGE_DIR is probably too general to be using as an environment variable here? |
Sorry, I forgot to add it in the above comment. I agree with you: it's too general. I believe we should rename it to reflect the agent and OTel context. |
So doesn't this mean that we no longer need to define this directory in the onboarding yaml? Do we want to create this folder if it doesn't exist? |
@strawgate In the onboarding yaml, we use the env variable elastic-agent/internal/pkg/otel/samples/linux/logs_metrics_traces.yml Lines 50 to 52 in fd477ec
Yes. I've added that step in this PR. |
This is no longer necessary after this PR though, because if it's not set, it defaults to The onboarding flow uses this yaml and replaces the environment variables with actual values -- so it actually replaces |
I believe it's necessary to set it in the onboarding YAML. If you review the filestorage extension documentation, it specifies that the default directory used is If we omit the Let me give an example to make this clear (hopefully ;)), if a user has set However, if we remove the Therefore, omitting that setting means that the otel will not use the user-specified directory ( Does that make sense? |
That being said, if the OpenTelemetry Collector used env:storage_dir by default, your point would be valid. |
@strawgate your understanding is correct. elastic-agent/internal/pkg/otel/samples/linux/logs_metrics_traces.yml Lines 50 to 52 in fd477ec
|
I see, so the remaining issue (from my perspective) is just us using the |
yes.
I agree. But as of now, the extension doesn't handle that part. |
@strawgate regarding the name, I believe we should switch to |
|
I see that we are ensuring the existence of a storage directory here, but I am not sure that is enough to enable persistence by default without some default receiver/exporter/extension/service configuration being injected.
Obviously the choice depends on what kind of compatibility we want to keep with the upstream OTel collector, but in my view enabling persistence by default means that I can omit the storage extension settings in my config file and still benefit from persistence. Maybe I misunderstood the purpose of the change.
@@ -80,3 +86,23 @@ func newSettings(version string, configPaths []string) (*otelcol.CollectorSettin
		DisableGracefulShutdown: true,
	}, nil
}

func ensureRegistryExists() error {
	storageDir := os.Getenv("STORAGE_DIR")
nit: reference or define a constant for the env variable name STORAGE_DIR
I also think this should have a more specific name, we should also discuss if we should just reuse the existing STATE_PATH variable that controls the same thing.
	if _, err := os.Stat(storageDir); err == nil {
		// directory exists
		return nil
	} else if os.IsNotExist(err) {
		return os.MkdirAll(storageDir, 0755)
	} else {
		return fmt.Errorf("error stating %s: %w", storageDir, err)
	}
We can maybe restructure this block a bit
-	if _, err := os.Stat(storageDir); err == nil {
-		// directory exists
-		return nil
-	} else if os.IsNotExist(err) {
-		return os.MkdirAll(storageDir, 0755)
-	} else {
-		return fmt.Errorf("error stating %s: %w", storageDir, err)
-	}
+	_, err := os.Stat(storageDir)
+	if errors.Is(err, fs.ErrNotExist) {
+		// create directory if it doesn't exist
+		return os.MkdirAll(storageDir, 0755)
+	}
+	if err != nil {
+		// we have a generic error
+		return fmt.Errorf("error stating %s: %w", storageDir, err)
+	}
+	return nil
	if err := ensureRegistryExists(); err != nil {
		return fmt.Errorf("error while creating registry: %w", err)
	}
Shouldn't we check and inject the storage extension settings here? It may be conditional on a CLI flag, an env variable, or something else, but I doubt very many people are going to run the Elastic OTel collector with our sample config file, and without that this change only ensures that a given directory exists or is created.
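To make the "inject the storage extension settings" idea concrete, here is a rough sketch that works on an already-decoded config represented as nested maps. It is not how the collector or this PR does it: the helper `injectFileStorage` and the map-based representation are assumptions for illustration. The `file_storage` extension name and its `directory` setting come from the upstream File Storage extension.

```go
package main

import "fmt"

// injectFileStorage (hypothetical helper) adds a file_storage extension
// pointing at dir unless the configuration already declares one, and enables
// it under service.extensions. It reports whether the config was modified.
func injectFileStorage(cfg map[string]any, dir string) bool {
	const name = "file_storage"

	exts, _ := cfg["extensions"].(map[string]any)
	if exts == nil {
		exts = map[string]any{}
		cfg["extensions"] = exts
	}
	if _, ok := exts[name]; ok {
		return false // the user's own extension settings win
	}
	exts[name] = map[string]any{"directory": dir}

	svc, _ := cfg["service"].(map[string]any)
	if svc == nil {
		svc = map[string]any{}
		cfg["service"] = svc
	}
	enabled, _ := svc["extensions"].([]any)
	svc["extensions"] = append(enabled, name)
	return true
}

func main() {
	cfg := map[string]any{
		"receivers": map[string]any{"filelog": map[string]any{}},
	}
	// "/var/lib/example/otelcol" is a placeholder path.
	changed := injectFileStorage(cfg, "/var/lib/example/otelcol")
	fmt.Println(changed, cfg["extensions"], cfg["service"])
}
```

The sketch deliberately stops at declaring and enabling the extension. Receivers would still need to reference it through their own `storage` setting, and named instances such as `file_storage/custom` are not detected, so this only covers part of what "persistence by default" would require.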
This pull request is named "enable persistence by default", but it does not do what the description says. I believe what this change is intending to achieve is "make sure the directory used by the File Storage extension exists", but it does not actually do that either. What this change actually does is:
I see the following problems with this approach:
A much better way to achieve the intention "make sure the directory used by the File Storage extension exists" would be to add an option to the upstream File Storage extension like |
Another option is to also ensure that the It would also be nice to add ability to adjust the default in upstream instead of it being hard-coded. |
Thank you for your feedback! I agree that handling directory creation should ideally be managed upstream. I've submitted an enhancement request to address this. In the meantime, my current PR aimed to ensure the storage directory exists by using the storage_dir environment variable, as it's featured in our sample configs. However, this approach may lead to issues if the user doesn't include this setting in their otel config. I propose repurposing the PR to focus solely on enabling |
This pull request is now in conflicts. Could you fix it? 🙏
|
	if storageDir == "" {

		// by default use "${path.data}/registry/otelcol" to store offsets
		storageDir = filepath.Join(paths.Data(), "registry", "otelcol")
I think we probably need some discussion on what our directory structure should be, considering https://github.com/elastic/ingest-dev/issues/3671 and #5304 for example.
There are also things besides filelog that may want disk persistence and that don't use a registry concept. If we needed more than this, where would they go in the directory structure? Here? What if we decided we needed more than one storage extension?
Taking a higher level look at this, I don't think we need to introduce the .go changes or the env var quite yet. These are not upstream functionality and if we expose them to users they will be hard to get rid of. Fundamentally, lack of persistence for log collection is sub-optimal but not fatal.
As long as we are only considering our pure OTel collector distribution (i.e. not Beats receivers yet) we should solve this with documentation and reference examples while making or advocating for upstream changes to make the configuration changes needed simpler. +1 to the suggestion in #5347 (comment) to add a
For Beats receivers, we will have to store the Filebeat registry somewhere to maintain parity with what we do today, but we already have a place to put that we could reuse without adding anything new. We may opt to change where this state is stored, but until we have to do that let's avoid introducing any new configuration or env vars unless there is a really strong reason.
Adding configuration is easy, it is much harder to take away later so let's not do it until we have to and I don't think we have to yet. |
|
|
Closing this PR. |
What does this PR do?
Why is it important?
Checklist
./changelog/fragments using the changelog tool
Disruptive User Impact
How to test this PR locally
elastic-agent otel
elastic-agent otel
Related issues
Questions to ask yourself