Adds proposal for auto partitioning in LSO #486

Closed

Conversation


@rohan47 rohan47 commented Sep 25, 2020

This PR adds a design proposal for the automatic partitioning of local devices and provisioning PVs for those partitions in LSO.

Signed-off-by: rohan47 [email protected]


rohan47 commented Sep 25, 2020

/assign @travisn @gnufied @leseb @jarrpa @rohantmp @sp98


rohan47 commented Sep 25, 2020

/assign @jsafrane

Comment on lines 38 to 40
For performance reasons the user will want to use partitions.
On NVMe devices, some of the devices are capable of delivering very high IOPS per device. An OSD tends to bottleneck long before it reaches anywhere close to the device's IOPS limit.
So with a single OSD per NVMe device, there will be a big gap between the measured or advertised capabilities of these devices and what you can get with Ceph/OCS on top of them.

What is an "OSD"? Why does the LSO or even OpenShift care about these things?

What I'm saying is that you need to phrase this in a way that presents a broader use-case that is not specific to OCS.

- Policy for wiping a device to clean up its partitions
- A mixed method of PV creation, i.e., using some devices entirely and partitioning others within a single LocalVolumeSet, is not supported.

### Risks and Mitigations

Not sure I understand the point of the risks section. You are pointing out that we might partition the wrong disk if the user gives the wrong device filter? This risk doesn't seem any different from the risks the LocalVolumeSet already has, so maybe we can just refer back to that doc if it has a risks section.

// TotalProvisionedPartitionCount is the count of the total partitions over which the PVs have been provisioned.
// This is different from TotalProvisionedDeviceCount, as TotalProvisionedDeviceCount is
// the count of the actual devices.
TotalProvisionedPartitionCount *int32 `json:"totalProvisionedPartitionCount,omitempty"`

How will this be updated? If each node updates the status, we could get conflicts. Was totalProvisionedDeviceCount implemented? It would have the same issue.
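
For context, a minimal sketch of one common way node daemons avoid losing concurrent status writes: re-read the object and retry on a resourceVersion conflict with client-go's helper. The `updateCount` closure is a stand-in, not code from this proposal.

```go
// Sketch only: retry a status update on optimistic-concurrency conflicts.
package main

import (
	"fmt"

	"k8s.io/client-go/util/retry"
)

func main() {
	// Stand-in for "read the latest LocalVolumeSet, recompute
	// totalProvisionedPartitionCount, and write the status".
	updateCount := func() error {
		fmt.Println("re-read LocalVolumeSet, recompute count, write status")
		return nil
	}

	// RetryOnConflict re-runs the closure whenever the API server rejects the
	// write with a conflict, so several node daemons can converge on the same
	// status object instead of clobbering each other's updates.
	if err := retry.RetryOnConflict(retry.DefaultRetry, updateCount); err != nil {
		fmt.Println("status update failed:", err)
	}
}
```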

Author:

totalProvisionedDeviceCount is implemented; I'm looking into the implementation details.

Contributor:

Also, updating this across daemon restarts may be complicated.

I would ask maybe a stupid question: why is this value important? How is it different to a user whether LSO matches a disk with pre-existing partitions and only consumes them, or LSO actually partitions an empty disk?

Author:

Okay, will remove this part.


As per the above example, the administrator wants partitions of size 30G.
Let's assume that 5 devices of 100G are detected and are fit for partitioning, so each device will be partitioned into 3 parts of 30G each and 10G of each device will remain unpartitioned.
For now we are keeping it simple by partitioning by size, and will move to a more advanced partitioning policy in future releases to avoid leaving unusable disk space.
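
To make the arithmetic above concrete, a minimal sketch (the numbers come from the example; nothing here is the proposal's code):

```go
// Illustrative arithmetic only: how many 30G partitions fit on a 100G device
// under the size-only policy, and how much space is left unpartitioned.
package main

import "fmt"

func main() {
	const deviceSizeG = 100 // each of the 5 detected devices
	const partSizeG = 30    // requested partition size

	partsPerDevice := deviceSizeG / partSizeG           // 3 partitions
	leftoverG := deviceSizeG - partsPerDevice*partSizeG // 10G unpartitioned

	fmt.Printf("%d partitions of %dG per device, %dG left unpartitioned\n",
		partsPerDevice, partSizeG, leftoverG)
}
```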

I would think the admins would most often carve up the devices in sizes that make sense for their disks, so this may not even be a common issue. Sounds good for a future release though if it's common enough.

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rohan47
To complete the pull request process, please assign jsafrane after the PR has been reviewed.
You can assign the PR to them by writing /assign @jsafrane in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


type PartitioningSpec struct {
// PartSize determines the exact size of each partition.
PartSize *resource.Quantity `json:"partSize"`

I feel like we need to have a Count as well. It looks like a waste to use the entire device all the time.


If you're going to partition the disk, seems like you'd want to partition it all at the same time since the unpartitioned space wouldn't be usable. There could certainly be other options to indicate how partitions should be created, but I want to be clear about the scenarios to keep it simple.

@rohan47 rohan47 Oct 9, 2020

If you're going to partition the disk, seems like you'd want to partition it all at the same time since the unpartitioned space wouldn't be usable. There could certainly be other options to indicate how partitions should be created, but I want to be clear about the scenarios to keep it simple.

@travisn
I didn't understand what you mean by "partition it all at the same time".
Are you suggesting that we should not have a count for the current scenario?

@rohan47 rohan47 Oct 9, 2020

I am in favor of count, as it is also a simple parameter for partitioning the disks.


I guess the rationale is: why should we use the entire device? If the device is an NVMe, it could certainly be used by other applications. I don't think we (can?) restrict disks just for Ceph.

So if we decide that Ceph needs 3 partitions of 40GiB (making this up) then the rest can still be used by other applications.


The result of the partitioning will be that we have a PV created on each partition. We don't know who the consumer is; it might be Ceph or some other app, depending on who requests the volumes. So why wouldn't we partition the whole disk? If you're partitioning it for PVs, do we need to allow apps to use it for non-PVs? I didn't think so, but deciding not to partition the whole device can certainly be part of the policy.


For the parameters, is this the suggested policy in this thread?

  • If only size is specified, create partitions for the whole disk all of that size. If there is space remaining, create a PV for the remaining amount.
  • If both count and size are specified, only create that number of partitions and leave the remaining disk unpartitioned.

That seems reasonable and simple to me.

Author:

For the parameters, is this the suggested policy in this thread?

* If only `size` is specified, create partitions for the whole disk all of that size. If there is space remaining, create a PV for the remaining amount.

* If both `count` and `size` are specified, only create that number of partitions and leave the remaining disk unpartitioned.

My understanding of count is to create that number of partitions, i.e., partition size = size of the entire device / count.
For example, if we have a device with a total capacity of 128G and the value of count is 4, then 4 partitions of 32G each will be created.

That seems reasonable and simple to me.


Ah, I was stuck thinking about the size as being a required setting. What about this?

  • If only size is specified, create partitions for the whole disk all of that size. If there is space remaining, create a PV for the remaining amount.
  • If only count is specified, divide the disk up into that many partitions and calculate the size.
  • If both count and size are specified, only create that number of partitions of the desired size and leave the remaining disk unpartitioned.

Author:

Got it
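
To summarize the policy agreed above, a minimal sketch assuming a hypothetical `Count` field alongside `PartSize` (plain GiB integers stand in for `resource.Quantity`; `resolvePartitions` is illustrative only, not the proposal's final API):

```go
// Sketch of the three agreed cases: size only, count only, and both.
package main

import "fmt"

// PartitioningSpec mirrors the shape discussed above; Count is an assumed addition.
type PartitioningSpec struct {
	// PartSize is the exact size of each partition (GiB here for simplicity).
	PartSize *int64 `json:"partSize,omitempty"`
	// Count is the number of partitions to create (assumed field).
	Count *int32 `json:"count,omitempty"`
}

// resolvePartitions returns the partition sizes to carve out of a device:
//   - size only:  fill the disk with partitions of that size; any remainder
//     becomes one extra partition so a PV can be created for it
//   - count only: divide the disk into count equal partitions
//   - both:       create count partitions of that size, leave the rest alone
func resolvePartitions(deviceSize int64, spec PartitioningSpec) []int64 {
	var parts []int64
	switch {
	case spec.PartSize != nil && spec.Count == nil:
		size := *spec.PartSize
		if size <= 0 {
			return nil
		}
		for remaining := deviceSize; remaining >= size; remaining -= size {
			parts = append(parts, size)
		}
		if rem := deviceSize % size; rem > 0 {
			parts = append(parts, rem)
		}
	case spec.PartSize == nil && spec.Count != nil:
		n := int64(*spec.Count)
		if n <= 0 {
			return nil
		}
		for i := int64(0); i < n; i++ {
			parts = append(parts, deviceSize/n) // integer division; remainder stays unpartitioned
		}
	case spec.PartSize != nil && spec.Count != nil:
		for i := int32(0); i < *spec.Count; i++ {
			parts = append(parts, *spec.PartSize)
		}
	}
	return parts
}

func main() {
	size := int64(30)
	count3, count4 := int32(3), int32(4)
	fmt.Println(resolvePartitions(100, PartitioningSpec{PartSize: &size}))                 // [30 30 30 10]
	fmt.Println(resolvePartitions(128, PartitioningSpec{Count: &count4}))                  // [32 32 32 32]
	fmt.Println(resolvePartitions(100, PartitioningSpec{PartSize: &size, Count: &count3})) // [30 30 30]
}
```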


### Workflow for Local PV creation via `LocalVolumeSet` CR
- The admin decides they want to create partitions on some devices in the cluster
- They create a LocalVolumeSet CR that includes the partitioning policy
Contributor:

Probably: LocalVolumeSet matching the disk they want to partition.


## Partitioning policy

* Size: If only size is specified, create partitions for the whole disk all of that size. If there is space remaining, create a PV for the remaining amount.
Contributor:

I would make it consistent with "If both count and size are specified" - only create partitions of the desired size and leave the remaining disk unpartitioned.

Author:

Do you mean that if only size is specified, we should create partitions of the desired size and leave the remaining disk unpartitioned?

Contributor:

Yes

@rohan47 rohan47 force-pushed the auto-partition branch 2 times, most recently from bb80b8e to 38c19c8 on November 20, 2020 at 13:25
@rohan47 rohan47 marked this pull request as draft January 5, 2021 14:20
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 5, 2021
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 5, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 5, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Jun 5, 2021

openshift-ci bot commented Jun 5, 2021

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


rohan47 commented Jun 8, 2021

/reopen


openshift-ci bot commented Jun 8, 2021

@rohan47: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot reopened this Jun 8, 2021

openshift-ci bot commented Jun 8, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from jsafrane after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close


openshift-ci bot commented Jul 8, 2021

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this Jul 8, 2021
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.