Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release/r8 #309

Closed
wants to merge 0 commits into from
Closed

Release/r8 #309

wants to merge 0 commits into from

Conversation

oguzhanunlu
Copy link
Member

@oguzhanunlu oguzhanunlu commented Dec 29, 2017

@oguzhanunlu oguzhanunlu added this to the Release 8 Basel Dove milestone Dec 29, 2017
@oguzhanunlu oguzhanunlu self-assigned this Dec 29, 2017
@alexanderdean
Copy link
Member

Adding @chuwy to review!

Copy link
Contributor

@chuwy chuwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @oguzhanunlu, we definitely need many fixes around implementation of #221. This is most important and interesting (for me personally) ticket in this release.

/**
* Type representing value for `description` key
*
* @see http://json-schema.org/latest/json-schema-validation.html#rfc.section.7.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link describes "Implementation Requirements", not description.

val lintDescriptionPresent: Linter = (schema: Schema) => {
(schema.description) match {
case Some(_) => propertySuccess
case None => s"${schema.`type`.get} Schema doesn't contain description property".failure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never ever use Option#get. This linter will fail with runtime exception for any schema that has no description and no type, which is extremely common.

(schema.description) match {
case Some(_) => propertySuccess
case None => s"${schema.`type`.get} Schema doesn't contain description property".failure
case _ => propertySuccess
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this case for? schema.description: Option[_] is either Some(_) or None. Nothing else is possible.

List(s"Warning: File [${input.getAbsolutePath}] contains a schema whose version is NOT 1-0-0")
else List.empty
val schemaVerError: List[String] =
if(input.isDirectory && versions.head != "1-0-0" && !force)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine: input is a com.acme directory with files:

  • com.acme/event/jsonschema/1-0-0
  • com.acme/context/jsonschema/1-0-1
  • com.acme/context/jsonschema/1-0-2

1-0-0 is head? Yep. Is it correct? Nope.

Scala core primitives allow you to group schemas by name and check presence of 1-0-0 version for each particular schema. This is again benefit of "structured" (SchemaMap) types versus "unstructured" (String).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are definitely right. I forgot to consider name.

else if(input.isDirectory && versions.head == "1-0-0" && existMissingSchemaVersion(versions) && !force){
List(s"Error: Directory [${input.getAbsolutePath}] has gaps between schema versions. You can use --force to override.")
}
else List.empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broken formatting. Should be } List.empty on single line. Also, List.empty[Foo] is useful to help compiler with type inference, but here Nil can be used.

List(s"Warning: File [${input.getAbsolutePath}] contains a schema whose version is NOT 1-0-0")
else List.empty
val schemaVerError: List[String] =
if(input.isDirectory && versions.head != "1-0-0" && !force)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed space: if (input....

@@ -92,6 +93,20 @@ case class GenerateCommand(
// Parse Self-describing Schemas
val (schemaErrors, schemas) = splitValidations(files.map(_.extractSelfDescribingSchema))

// Check schema versions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, refactor this as a separate function and implement unit tests for it. Right now it has too many wrong outputs (read below) - especially we want to avoid runtime exceptions.

private[ctl] def existMissingSchemaVersion(versions: List[String]): Boolean = {
val numOfVersions = versions.length

if (numOfVersions == 1) return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never ever use return in Scala. https://tpolecat.github.io/2014/05/09/return.html


if (numOfVersions == 1) return false

val versionsArr = versions.map(version => version.split('-').map(_.toInt))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be List[SchemaVer], but we casted it to unnecessary vague List[String] and now trying to get back structure of List[SchemaVer].

There's a notion of total and partial functions. If you say that it accepts List[String] and returns Boolean then it means that it works for any list of strings. But it will fail (with runtime exceptions - String#toInt, List#head!) for way too many strings. You can avoid this by switching back to List[SchemaVer].

val schemaVerError: List[String] =
if(input.isDirectory && versions.head != "1-0-0" && !force)
List(s"Error: Directory [${input.getAbsolutePath}] does NOT contain a schema whose version is 1-0-0. You can use --force to override.")
else if(input.isDirectory && versions.head == "1-0-0" && existMissingSchemaVersion(versions) && !force){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, directory com.acme with:

  • com.acme/event/jsonschema/1-0-0
  • com.acme/context/jsonschema/1-0-1
  • com.acme/context/jsonschema/1-0-2

has no gaps (List(1-0-0, 1-0-1, 1-0-2)) according to this function, but obviously incorrect overall.

Think of this function working with schemas directory from https://github.com/snowplow/iglu-central, not jsonschema from https://github.com/snowplow/iglu-central/tree/master/schemas/com.mailchimp/cleaned_email

@oguzhanunlu
Copy link
Member Author

Hi @chuwy , I clearly see that there is a mess with #221 , I'll fix by following your recommendations.

@oguzhanunlu oguzhanunlu force-pushed the release/r8 branch 7 times, most recently from 5f13ad5 to 4eb4f86 Compare January 16, 2018 22:43
@chuwy
Copy link
Contributor

chuwy commented Jan 18, 2018

Hey @oguzhanunlu,

I'm trying to compile local copy of Igluctl from release/r8 and SBT fails with following error:

info] Resolving com.snowplowanalytics#iglucorejson4s_2.12;0.2.0 ...
[warn]  module not found: com.snowplowanalytics#iglucorejson4s_2.12;0.2.0
[warn] ==== local: tried
[warn]   /Users/chuwy/.ivy2/local/com.snowplowanalytics/iglucorejson4s_2.12/0.2.0/ivys/ivy.xml
[warn] ==== public: tried
[warn]   https://repo1.maven.org/maven2/com/snowplowanalytics/iglucorejson4s_2.12/0.2.0/iglucorejson4s_2.12-0.2.0.pom

Do you have any idea what happened? Could you try to reproduce it after deleting cache (~/.ivy2/local and ~/.ivy2/cache) and doing sbt clean?

By the way - really awesome work on #221!

@oguzhanunlu
Copy link
Member Author

oguzhanunlu commented Jan 18, 2018

Hi @chuwy , I tried to reproduce by deleting ~/.ivy2 and all target folders under iglu followed by sbt clean but I couldn't reproduce, sbt builds as expected. Then I noticed that URL within your error log isn't as expected, expected URL contains iglu-core-json4s, not iglucorejson4s. I'm not sure why sbt would request that but current release/r8 branch indicates the one with dashes. Thank you for the nice words!

private[ctl] def validateSchemaVersions(schemas: List[IgluSchema]): (List[String], List[String]) = {

def existMissingSchemaVersion(schemaMaps: List[SchemaMap]): Boolean = {
val numOfMaps = schemaMaps.length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever you're processing List with pattern like this (if empty then X, if non-empty then Y) and call head or tail - you need to use pattern-matching:

    def existMissingSchemaVersion(schemaMaps: List[SchemaMap]): Boolean =
      schemaMaps match {
        case Nil => false
        case _ :: Nil => false
        case h :: tail => {
          val prevModel    = h.version.model
          val prevRevision = h.version.revision
          val prevAddition = h.version.addition
          val curModel     = tail.head.version.model
          val curRevision  = tail.head.version.revision
          val curAddition  = tail.head.version.addition

          ??? // same implementation
        }
      }

This is form of Mathematical induction. You basically step-by-step reduce whole input to a single simplest step (no schemas = no error). This is a basis for recursive functions (btw, your function is tail-recursive - don't forget to add corresponding annotation).

if (numOfMaps == 1){
false
} else {
val prevModel = schemaMaps.head.version.model
Copy link
Contributor

@chuwy chuwy Jan 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation itself can also be little bit generalized and simplified (later statement might sound questionable for you now, but will be clearer when we'll have a chance to talk about Iglu ASTs) using two following facts:

  1. SchemaVer is an ordered set (someone may argue that it's a poset, but let's omit it for a moment). It means that we can build a List of SchemaVer from "lowest" to "highest" by (versions: List[SchemaVer]).sorted) (instance for Ordering is in iglu-core)
  2. We can define very general (so general that they can live in scala-core) functions:
def modelSucc(a: SchemaVer.Full, b: SchemaVer.Full): Boolean = a.model + 1 == b.model
def revisionSucc(a: SchemaVer.Full, b: SchemaVer.Full): Boolean = a.revision + 1 == b.revision
def additionSucc(a: SchemaVer.Full, b: SchemaVer.Full): Boolean = a.addition + 1 == b.addition
def succ(a: SchemaVer.Full, b: SchemaVer.Full): Boolean = modelSucc(a, b) || revisionSucc(a, b) || additionSucc(a, b)

These functions tell us if a in some way is successor of b (succ originates from Haskell).

Having these two facts we can write very simple function checking if in row of ordered SchemaVer sneaked a gap.

case class Acc(previous: Option[Full], gaps: List[SchemaVer])
object Acc {
  def empty: Acc = Acc(None, Nil)
}

orderedSchemaVers.foldLeft(Acc.empty) { (acc, cur) =>
  acc match {
    case Acc(None, gaps) if cur == Full(1,0,0) => Acc(Some(Full(1,0,0)), gaps)
    case Acc(None, gaps) => Acc(Some(cur), Full(1,0,0) :: gaps)
    case Acc(Some(prev), gaps) if succ(prev, cur) => Acc(Some(cur), gaps)
    case Acc(Some(prev), gaps) => Acc(Some(cur), cur :: gaps)
  }
}

This is quick and dirty implementation (not even entirely correct - it saves not "gaps", but latest found vers, up to you to fix it), but idea should be clear. I want this general functions to live somewhere is upstream libraries (Scala Core or Schema DDL) because this is a basis for a consistent Iglu repositories, where each schema does not live in isolation, but has a "connection" to its successors and predecessor and therefore basis for a schema inference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying you need to implement it in R8, but this is a main reason why I was so excited about #221. If you're not going to implement this solution in R8 - then please create a ticket with link (or cite) to above message.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #315 for this feedback. I really liked the ideas, thanks for the feedback @chuwy !

}

if (schemas.empty) {
(List.empty[String], List.empty[String])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to use private case class MessageAccumulator(warnings: List[String], errors: List[String])


result.warnings.foreach(printMessage)
val dirErr = result.warnings.filter(w => w.startsWith("Error: Directory"))
if (!dirErr.isEmpty) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!x.isEmpty === x.nonEmpty

.map(_.write(force)).foreach(printMessage)

result.warnings.foreach(printMessage)
val dirErr = result.warnings.filter(w => w.startsWith("Error: Directory"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this check does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema version errors happen due to either
1 - Input directory not containing schema version 1-0-0 or
2 - Input directory containing gap(s) between schema versions

per issue strategy which indicates to refuse to do anything if either 1 or 2 occur.

Their error messages start with "Error: Directory..." and I want to check if that happened, before outputting anything else. I didn't want to insert any exit mechanism to transformSelfDescribing hence the current implementation. It doesn't sound much clever to continue program after this line but outputResult is only place to look at for outputting any end result. Sure it would take shorter to exit if I had inserted. Also, current check is a loose connection with #221 and error outputting which could be a hidden tiny bug in future error-related issues. Which way should we follow @chuwy ?

case (Nil, errors) =>
println(errors.mkString("\n"))
false
case (List(_), List(_)) => true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me this last match means if we have one warning and one error then everything is correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chuwy , actually this is the case that never gets executed but inserted here to be able to compile. validateSchemaVersions never return warnings together with errors, it is either (no warning, no error), (warning, no error) or (no warning, error) per issue strategy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two points here:

  1. If you need a universal fallback case, you write it as case _ => true
  2. It again looks like a way to trick the compiler and developer who doesn't know about context. In languages with strong type discipline like Scala, one of the most important purposes of types is to provide developer enough information to reason about code using smallest possible scope, ideally just types. Therefore it's easier to spot bugs and test the code. Here developer needs to look into different functions/modules and examine this code to understand that we have a line that never gets executed. That's okay for now, but I'd like to see this code refactored someday (of course, it was me who started this mess, but I was young back then!).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a short comment for now.

@chuwy
Copy link
Contributor

chuwy commented Jan 18, 2018

I tried to reproduce by deleting ~/.ivy2

Yeah, I had outdated schema-ddl artifact in my own ivy2 cache. Sorry for confusion.

@oguzhanunlu oguzhanunlu force-pushed the release/r8 branch 12 times, most recently from 8f8f4c2 to 3ceebc7 Compare January 23, 2018 15:03
action { (x, c) => c.copy(linters = x) }
valueName "<linters>"
text "Lint without specified linters, given comma separated\tDefault: None\n" +
"\t\t\t All linters and their explanations are below.\n" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been derived from allowedToBeExcluded or other place where we have names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants