[Tiny EPIC] Support SOLR standalone on ECS #3826

jbrown-xentity · 2022-05-13T20:07:24Z

Purpose

We want a security-compliant SOLR, but we're not sure how to achieve that.

Given this need, a trial ECS deployment with the SOLR8 image is needed to give us factual knowledge for deciding on future steps.

Acceptance Criteria

[ACs should be clearly demo-able/verifiable whenever possible. Try specifying them using BDD.]

GIVEN a CKAN system needs a SOLR instance
WHEN 5 days have expired
THEN CKAN can connect to SOLR
AND an index is started
AND the necessary stories are created to implement this in a production way
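
A minimal sketch of how the "CKAN can connect to SOLR" criterion could be demoed, assuming the Solr core exposes the standard ping handler at `/solr/<core>/admin/ping`. The host name, the core name `ckan`, and the helper names below are hypothetical placeholders, not values taken from this ticket.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def ping_url(base_url: str, core: str) -> str:
    """Build the ping URL for a Solr core (base_url and core are placeholders)."""
    return f"{base_url.rstrip('/')}/solr/{quote(core, safe='')}/admin/ping?wt=json"


def ping_ok(body: str) -> bool:
    """Return True when a ping response body reports status OK."""
    try:
        return json.loads(body).get("status") == "OK"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as unhealthy.
        return False


def solr_is_reachable(base_url: str, core: str, timeout: float = 5.0) -> bool:
    """Fetch the ping URL and report whether Solr answered with status OK."""
    try:
        with urlopen(ping_url(base_url, core), timeout=timeout) as resp:
            return ping_ok(resp.read().decode("utf-8"))
    except OSError:
        # Covers connection failures, timeouts and HTTP errors.
        return False
```

Calling `solr_is_reachable("http://solr.example.internal:8983", "ckan")` from the CKAN host would then give a yes/no answer for the demo. It only checks connectivity, not that an index has been started.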

Background

This is to work around security issues with supporting EKS in AWS.
We already deployed SOLR8 on EKS in AWS, so it should be possible to port that work to terraform.

Sketch

Feasibility Testing

Make service broker-able

Package terraform code into brokerpak definition
Rework processes and procedures to support AWS Credentials in repo
- Copy local testing from eks-brokerpak
- Setup Github Action secrets for CI tests to pass
(Local) Make sure solr-on-ecs service can provision/bind and unbind/deprovision
Preserve the original solrcloud service as a sibling service that the solr brokerpak supports
(Local) Make sure solr-cloud service can still provision/bind and unbind/deprovision
Get Github Action tests working
Solr on ECS GSA-TTS/datagov-brokerpak-solr#36 (merge this)

Follow-on work

Test reindex speed on a production-comparable DB size
Test harvesting reliability
Perform load testing to gauge the performance of solr standalone
Make a new ticket for Leader-Follower paradigm (if necessary)
Make a new ticket for EFS performance boosting (if necessary)

Example: Add Leader-Follower paradigm

Make a new PR
Add new ECS services for the followers
Create script to initialize Follower containers
Ensure Leader and Follower Solr instances can communicate

Example: Boost EFS performance

Make a new PR
Try setting an initial size for EFS volume
Test Throughput options on EFS
Test forcing a single AZ to see if there is a cross-availability-zone issue
Try Max IO Mode
Try some suggestions here, https://docs.aws.amazon.com/efs/latest/ug/performance-tips.html

nickumia-reisys · 2022-09-12T13:21:29Z

List of references (in case we ever need them..):

nickumia-reisys · 2022-09-12T13:22:40Z

Two more that have to do with EFS encryption things:

nickumia-reisys · 2022-11-11T20:57:58Z

As a retrospective comment to anyone who wanders onto this ticket (probably future-me 😅), this was a very important effort in the cloud migration of Data.gov apps, specifically https://catalog.data.gov and https://inventory.data.gov. I attempted to link all of the follow-on work that was needed after this pivotal ticket and that helped catapult this effort over the finish line; however, there are probably a few that still slipped through the cracks.

I consider this one of my biggest contributions to Data.gov. There were many possible paths for how Data.gov could procure a production Solr setup. Prior to me joining Data.gov, there was work to create Solr as an application on cloud.gov. This was made impractical because CloudFoundry only allows a maximum of 6GB of persistent storage and our Solr instance (as of writing) requires ~22GB. The next step was converting the app into a custom cloud.gov service based on Apache's solr-operator/solrcloud and AWS's EKS architecture. There were many forks and convoluted paths that we (@mogul, me and others before me) struggled with in its design. The EKS Brokerpak is still alive and very much practical for other projects. While this path never hit a hard wall or dead-end, the entire design was overly complex and had many, many moving parts. It was abandoned when there were a host of mystifying errors and bugs with indiscernible causes, specifically about solrcloud. It was at this point that inspiration from @jbrown-xentity led us to a pure Solr implementation on ECS (this ticket). This simplified our design on two fronts: (1) ECS had better defaults than EKS for us. It was like using docker-compose over kubernetes. (2) Data.gov was more familiar with solr than solrcloud. It had a Solr deployment on its older platform and the team had better confidence Solr would be more stable.

In terms of how this was implemented, I don't really like it ( ...I know I wrote it ). As a relatively large user and producer of open-source code, Data.gov strives to stick closely with the communities we pull from and give back meaningful contributions as well. The less customization we have, the better we're able to develop, stay secure and remain integrated. Our Solr deployment is very special. It is at the intersection of many open-source communities ... Solr ... CKAN ... AWS ... Terraform ... Cloud.gov ... Brokerpaks ... (and maybe a few more). I feel like there were too many customizations to this code to meet the unspoken requirements of Data.gov. This is code we had to write because we couldn't borrow from something that existed already, and it isn't abstracted well enough for others to use it for anything else. I believe this was a necessary evil for the position we were in, but this code will likely face a painful death in the future.

With more than 3 months of production catalog (and 4 months of production inventory) using this code, I think it's safe to say that it is rather successful. There were a few bugs and concerns after the initial release; but thanks to @FuhuXia's diligence, we've been able to monitor Solr's performance and health and ensure problems are taken care of. Since the initial release of the Leader-Follower design, there have not been major changes to the core code or infrastructure. And I take that as a win.

I'm leaving this comment as a reminder for me and as counsel for whoever this code may affect in the future. This endeavor was an unwelcomingly large part of my life for almost a year. It wasn't very fun working on this. Did I learn a lot? Yes. Was it challenging? I don't think for the right reason haha, but yes. Probably the only thing that got me through this was the encouragement, support and guidance that I received from my team. Very warm and hearty thanks to @mogul @jbrown-xentity @FuhuXia 🙇

Follow-on tickets:

jbrown-xentity added the component/solr-service Related to Solr-as-a-Service, a brokered Solr offering label May 13, 2022

nickumia-reisys self-assigned this May 17, 2022

nickumia-reisys changed the title ~~Test SOLR leader-follower on ECS~~ [Tiny EPIC] Support SOLR leader-follower on ECS May 27, 2022

This was referenced May 27, 2022

Use standalone Solr on ECS GSA/catalog.data.gov#460

Merged

Solr on ECS GSA-TTS/datagov-brokerpak-solr#36

Merged

Minor Improvements to ECS Solution GSA-TTS/datagov-brokerpak-solr#37

Merged

NEW Service Plan - Solr on ECS GSA/datagov-ssb#140

Merged

nickumia-reisys changed the title ~~[Tiny EPIC] Support SOLR leader-follower on ECS~~ [Tiny EPIC] Support SOLR standalone on ECS Jun 9, 2022

nickumia-reisys added this to the Sprint 20220609 milestone Jun 9, 2022

nickumia-reisys closed this as completed Jun 9, 2022

mogul added the Epic label Aug 3, 2022

nickumia-reisys mentioned this issue Aug 22, 2022

[CLEANUP] Delete Solr 6 App + EKS Cluster GSA/datagov-ssb#157

Merged

nickumia-reisys mentioned this issue Dec 10, 2022

[SC-23] Ensure all inter-component traffic within Solr is sent over TLS #4119

Closed

10 tasks

This was referenced Feb 2, 2023

SOLR Classic configuration test #3780

Closed

Security scanning for SSB/SOLR (test) #3799

Closed

This was referenced Feb 4, 2023

Playbooks should have a clear distinction between application and platform #544

Closed

Better health checks and monitoring for solr service #930

Closed

nickumia-reisys added component/ssb Feature CI/CD labels Oct 9, 2023
