Do not merge: Add ip node spec #6

gandhiuw · 2018-05-09T07:37:39Z

Initial pull request for evaluation and discussion

gandhiuw · 2018-05-09T07:38:55Z

src/main/scala/com/whitepages/cloudmanager/state/SolrState.scala

+    val clusterStateNodeList = if (allowOfflineReferences) allNodes else liveNodes
+
+    //First try by matching to the fully qualified domain name
+    val dnsNodeList: Map[String, String] = dnsNameMap(clusterStateNodeList)


Couldn't think of an idiomatic way to implement this without refactoring each individual filter into a method and using case classes.

gandhiuw · 2018-05-09T07:41:24Z

src/main/scala/com/whitepages/cloudmanager/state/SolrState.scala

+    */
+  def unambiguousFragment(fragment: String, nodeComparisonMap: Map[String,String]): Option[String] = {
+    findUnambigousNode(nodeComparisonMap, (s: String) => s == fragment)
+      .orElse(findUnambigousNode(nodeComparisonMap, (s: String) => s.contains(fragment)))


I think we should get rid of the .contains; it prevents exact IP/hostname specification. I believe this also happens with the existing version of solr cloud manager.
For example, specifying --nodes 1.0.0.8 selects nodes 1.0.0.8, 1.0.0.81 and 1.0.0.82.

I agree "contains" is a poor choice for partial name matching. I liked the idea of being able to say --nodes foo2,foo3 or perhaps --nodes 0.8,0.81,0.82 instead of repeating all the identical fully qualified info every time though, and I wasn't too bothered by 1.0.0.8 matching 1.0.0.8 and 1.0.0.81 because if I do actually have both, it should throw the ambiguous node spec message with the conflicting possibilities.

Perhaps a smarter comparison would do literal matches, but within the context of period-separated chunks. Again, just looking to see if there's a single node that matches while using that comparison function.
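A minimal sketch of the period-separated-chunk comparison suggested above. It assumes the fragment must appear as a contiguous run of whole chunks in the candidate DNS name or IP. ChunkMatch and chunkMatches are hypothetical names, not code from this PR.

```scala
object ChunkMatch {
  // Hypothetical helper: true when the period-separated chunks of `fragment`
  // occur as a contiguous run of whole chunks inside `candidate`.
  // Unlike String.contains, "0.8" does not match the "0.81" tail of "1.0.0.81".
  def chunkMatches(fragment: String, candidate: String): Boolean = {
    val fragmentChunks = fragment.split('.').toSeq
    val candidateChunks = candidate.split('.').toSeq
    candidateChunks.containsSlice(fragmentChunks)
  }

  def main(args: Array[String]): Unit = {
    val ips = Seq("1.0.0.8", "1.0.0.81", "1.0.0.82")
    // String.contains overmatches: every IP here contains "1.0.0.8".
    println(ips.filter(_.contains("1.0.0.8")))
    // Chunk matching keeps only candidates with the exact chunk sequence.
    println(ips.filter(ip => chunkMatches("1.0.0.8", ip)))
  }
}
```

Short fragments such as foo2 or 0.8 still work, but they only match whole chunks, so foo2 matches foo2.example.com and not foo22.example.com.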

So the ambiguous node exception will not be thrown every time, and IMO it should be. For example, if you invoke waitactive with an IP that matches multiple nodes, the call just sums the replica count across those nodes and reports their combined state, e.g. "All 18 shards were active". IMO we should enforce really tight matching in this case.

randomstatistic

So I think this is starting to get obscured by the current implementation details, but at a high level, the purpose of this code is to find one or more nodes that match a string, according to a sequence of increasingly fuzzy comparisons.

For situations where the string is expected to allow matching of multiple nodes ("all", "regex", etc.), you work through a list of comparison approaches until one finds a non-zero number of matches, then stop running comparisons and use those. Error if the final result is an empty list.
For situations where the string is expected to match a single node, you try comparison approaches until one gives you exactly one node, then you stop running comparisons and use that. Error if the final result size != 1. (maybe accumulate any ambiguous results for the error message)
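The two modes described above can be sketched as one cascade over a list of comparison functions. This is a minimal illustration, assuming plain node-name strings and string error messages; NodeMatching, firstNonEmpty and firstUnique are hypothetical names, not the PR's implementation.

```scala
object NodeMatching {
  // A comparison decides whether a single node name matches the user's string.
  type Comparison = String => Boolean

  // Multi-node mode ("all", regex, etc.): use the matches from the first
  // comparison that finds any; error if none do.
  def firstNonEmpty(nodes: Seq[String], comparisons: Seq[Comparison]): Either[String, Seq[String]] =
    comparisons.iterator
      .map(c => nodes.filter(c))
      .find(_.nonEmpty)
      .toRight("No nodes matched")

  // Single-node mode: stop at the first comparison that yields exactly one node,
  // accumulating any ambiguous results for the error message.
  def firstUnique(nodes: Seq[String], comparisons: Seq[Comparison]): Either[String, String] = {
    @annotation.tailrec
    def loop(remaining: List[Comparison], ambiguous: List[Seq[String]]): Either[String, String] =
      remaining match {
        case Nil if ambiguous.isEmpty => Left("No nodes matched")
        case Nil =>
          val candidates = ambiguous.reverse.map(_.mkString(",")).mkString(" | ")
          Left(s"Ambiguous node spec, candidates: $candidates")
        case c :: rest =>
          val matches = nodes.filter(c)
          if (matches.size == 1) Right(matches.head)
          else if (matches.nonEmpty) loop(rest, matches :: ambiguous)
          else loop(rest, ambiguous)
      }
    loop(comparisons.toList, Nil)
  }
}
```

With nodes 1.0.0.8, 1.0.0.81 and 1.0.0.82, an exact-then-contains cascade for 1.0.0.8 stops at the exact match, while 0.8 falls through to .contains, matches all three, and is reported as ambiguous.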

randomstatistic · 2018-05-09T09:37:08Z

src/main/scala/com/whitepages/cloudmanager/state/SolrState.scala

@@ -87,34 +88,85 @@ case class SolrState(state: ClusterState, collectionInfo: CollectionInfo, config
  lazy val activeReplicas = allReplicas.filter(_.active)
  lazy val inactiveReplicas = allReplicas.filterNot(activeReplicas.contains)

+  /**
+    * Returns all replicas for a given collection


This comment doesn't tell me anything that isn't in the function name.

randomstatistic · 2018-05-09T09:37:31Z

src/main/scala/com/whitepages/cloudmanager/state/SolrState.scala

  def replicasFor(collection: String): Seq[SolrReplica] = allReplicas.filter(_.collection == collection)
+
+  /**
+    * Returns all replicas for a given collection, slice combination


randomstatistic · 2018-05-09T09:39:29Z

src/main/scala/com/whitepages/cloudmanager/state/SolrState.scala

        case i =>
+          //If a comma separated list of nodes is specified, then for each node


I find this comment misleading, as in this context, we're evaluating a single indicator. There's no list.

randomstatistic · 2018-05-09T09:46:19Z

src/main/scala/com/whitepages/cloudmanager/state/SolrState.scala

        case i =>
+          //If a comma separated list of nodes is specified, then for each node
          val nodeName = Try(Seq(canonicalNodeName(i, allowOfflineReferences))).recover({


But on a related note, maybe the Seq wrapper here is confusing and could be removed.

randomstatistic · 2018-05-09T10:57:42Z

src/main/scala/com/whitepages/cloudmanager/state/SolrState.scala

+    */
+  def unambiguousFragment(fragment: String, nodeComparisonMap: Map[String,String]): Option[String] = {
+    findUnambigousNode(nodeComparisonMap, (s: String) => s == fragment)
+      .orElse(findUnambigousNode(nodeComparisonMap, (s: String) => s.contains(fragment)))


I agree "contains" is a poor choice for partial name matching. I liked the idea of being able to say --nodes foo2,foo3 or perhaps --nodes 0.8,0.81,0.82 instead of repeating all the identical fully qualified info every time though, and I wasn't too bothered by 1.0.0.8 matching 1.0.0.8 and 1.0.0.81 because if I do actually have both, it should throw the ambiguous node spec message with the conflicting possibilities.

Perhaps a smarter comparison would do literal matches, but within the context of period-separated chunks. Again, just looking to see if there's a single node that matches while using that comparison function.

randomstatistic · 2018-05-09T10:59:44Z

src/main/scala/com/whitepages/cloudmanager/state/SolrState.scala

+    * @param comparison        function to use for comparison
+    * @return
+    */
+  def findUnambigousNode(nodeComparisonMap: Map[String,String], comparison: (String) => Boolean): Option[String] = {


Maybe this should be changed to just return the matching node list, and have the test for "single-match" elsewhere. Then we can reuse it in the regex side of things too.
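A sketch of that split, assuming the keys of nodeComparisonMap are Solr node names and the values are the strings being compared (DNS name or IP); the actual map direction in the PR may differ. findMatchingNodes, singleMatch and findByRegex are hypothetical names.

```scala
import scala.util.matching.Regex

object NodeLookup {
  // Assumption: keys are Solr node names, values are the compared strings
  // (e.g. DNS name or IP). Returns every node whose value passes `comparison`.
  def findMatchingNodes(nodeComparisonMap: Map[String, String],
                        comparison: String => Boolean): Seq[String] =
    nodeComparisonMap.collect { case (node, value) if comparison(value) => node }.toSeq.sorted

  // The single-match test lives here instead of inside the lookup.
  def singleMatch(matches: Seq[String]): Option[String] =
    if (matches.size == 1) matches.headOption else None

  // The same lookup reused for the regex path: only the comparison changes.
  def findByRegex(nodeComparisonMap: Map[String, String], pattern: Regex): Seq[String] =
    findMatchingNodes(nodeComparisonMap, s => pattern.pattern.matcher(s).matches())
}
```

Keeping the lookup free of any cardinality rule means the regex path can accept any non-empty result while the --nodes path wraps the same call in singleMatch.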

gandhiuw · 2018-05-09T17:19:44Z

IMO, the comparisons should be fuzzy only when the user passes in regular expressions. When the "--nodes" argument is used to specify nodes, the expectation should be that each indicator matches exactly one node. These indicators could be the exact host name or IP, which avoids the problem of one indicator matching multiple nodes. The changes I've made in this PR let the user pass in specific IPs as well (ipNameMap). I'd like to remove the contains altogether and enforce the condition that one indicator matches one node only (at the moment this is enforced in only one specific case).

gandhiuw · 2018-05-09T17:33:02Z

"For situations where the string is expected to match a single node, you try comparison approaches until one gives you exactly one node, then you stop running comparisons and use that. Error if the final result size != 1. (maybe accumulate any ambiguous results for the error message)"
I've modified the canonicalNodeName method to try matching DNS names first, then IPs and then attempt to resolve the indicator and match it with nodes from the cluster state.

The getNodeListUsingRegEx method uses a similar multi-phase approach.
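A rough sketch of the three phases described above (DNS name, then IP, then resolving the indicator and matching the resolved address), with the one-indicator-one-node rule applied to whichever phase first finds anything. The two input maps, their direction, and the error strings are assumptions; this is not the PR's canonicalNodeName. Note that InetAddress.getByName can perform a blocking DNS lookup when given a host name.

```scala
import java.net.InetAddress
import scala.util.Try

object CanonicalNode {
  // Assumed inputs: node name -> DNS name, and node name -> IP address.
  def canonicalNodeName(indicator: String,
                        dnsNames: Map[String, String],
                        ips: Map[String, String]): Either[String, String] = {
    def exact(m: Map[String, String], target: String): Seq[String] =
      m.collect { case (node, value) if value == target => node }.toSeq.sorted

    // Phases are thunks so later (possibly slow) phases only run if needed.
    val phases: Seq[() => Seq[String]] = Seq(
      () => exact(dnsNames, indicator),
      () => exact(ips, indicator),
      // Resolve the indicator (host name or IP literal) and match its address.
      () => Try(InetAddress.getByName(indicator).getHostAddress).toOption.toList.flatMap(exact(ips, _))
    )

    phases.iterator.map(phase => phase()).find(_.nonEmpty) match {
      case Some(Seq(node)) => Right(node)
      case Some(many)      => Left(s"Ambiguous node spec '$indicator': ${many.mkString(", ")}")
      case None            => Left(s"No node matches '$indicator'")
    }
  }
}
```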

agandhi added 5 commits May 8, 2018 09:16

Test commit: Adding method comments

bc127a2

Some more comments

c5bf434

Saving progress

4c11ad2

Some more changes, method comments

92f8812

Refactored getNodeListUsingRegEx, couldn't figure out an idiomatic way to do this

774e9f4

gandhiuw requested a review from randomstatistic May 9, 2018 07:37

gandhiuw commented May 9, 2018

randomstatistic added the question label May 9, 2018

randomstatistic reviewed May 9, 2018

Removing unwanted comments

5a4ddfa

Refactored to use common methods that take functions for comparison and to determine success

8fad29c

psutton approved these changes May 10, 2018
