This repository has been archived by the owner on Jan 21, 2020. It is now read-only.

Use containerd as source of truth #21

Merged
merged 1 commit into from
Jan 20, 2020


Contributor

@carlosedp carlosedp commented Jan 8, 2020

Description

This allows state to persist between restarts by reading containers from
containerd, depending on their task status.

Motivation and Context

  • I have raised an issue to propose this change (this is required)

Currently, faas-containerd does not persist data between restarts, so deployed functions cannot be used if the faas-containerd process is restarted.

Addresses issue #20.

How Has This Been Tested?

Verified the basic service functions to Update, Add, Get and List the containers. Still work-in-progress.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

Commits:

  • I've read the CONTRIBUTION guide
  • My commit message has a body and describes how this was tested and why it is required.
  • I have signed-off my commits with git commit -s for the Developer Certificate of Origin (DCO)

Code:

  • My code follows the code style of this project.
  • I have added tests to cover my changes.

Docs:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

Owner

@alexellis alexellis left a comment

Looks good, as expected, just a few comments on style.

@alexellis
Owner

How Has This Been Tested?
Verified the basic service functions to Update, Add, Get and List the containers. Still work-in-progress.

What about restarting faas-containerd (tasks stay), or restarting the host (tasks die)?

Have you tried scale to zero? Replicas should be 0 if not running, so scale to zero pauses them, and you should see them unpause when invoking through faasd and the gateway.

@alexellis
Owner

I checked out the PR, built it and deployed with CNI 0.8.4 on x86_64. This is what I got:

Jan 10 18:39:31 alexx systemd[1]: faas-containerd.service: Unit entered failed state.
Jan 10 18:39:31 alexx systemd[1]: faas-containerd.service: Failed with result 'exit-code'.
Jan 10 18:39:38 alexx systemd[1]: Stopped faasd-containerd.
Jan 10 18:39:38 alexx systemd[1]: Started faasd-containerd.
Jan 10 18:39:38 alexx faas-containerd[7346]: 2020/01/10 18:39:38 faas-containerd starting..        Version:         Commit:         Service Timeout: 1m0s
Jan 10 18:39:38 alexx faas-containerd[7346]: panic: runtime error: invalid memory address or nil pointer dereference
Jan 10 18:39:38 alexx faas-containerd[7346]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x90 pc=0xf174d5]
Jan 10 18:39:38 alexx faas-containerd[7346]: goroutine 1 [running]:
Jan 10 18:39:38 alexx faas-containerd[7346]: github.com/alexellis/faas-containerd/handlers.(*ServiceMap).Update(0xc00028a6e0)
Jan 10 18:39:38 alexx faas-containerd[7346]:         /home/alex/go/src/github.com/alexellis/faas-containerd/handlers/service_map.go:46 +0x1b5
Jan 10 18:39:38 alexx faas-containerd[7346]: main.Start()
Jan 10 18:39:38 alexx faas-containerd[7346]:         /home/alex/go/src/github.com/alexellis/faas-containerd/main.go:105 +0x6e8
Jan 10 18:39:38 alexx faas-containerd[7346]: main.main()
Jan 10 18:39:38 alexx faas-containerd[7346]:         /home/alex/go/src/github.com/alexellis/faas-containerd/main.go:51 +0x20
Jan 10 18:39:38 alexx systemd[1]: faas-containerd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 10 18:39:38 alexx systemd[1]: faas-containerd.service: Unit entered failed state.
Jan 10 18:39:38 alexx systemd[1]: faas-containerd.service: Failed with result 'exit-code'.

@alexellis
Owner

This is caused by a lack of proper error handling. If a function returns two things, a result and an error, then you cannot use the result when the error is non-nil.

I'd suggest the following:

func (s *ServiceMap) Update() {
	s.lock.Lock()
	defer s.lock.Unlock()

	containers, _ := s.client.Containers(s.context)
	for _, k := range containers {
		id := k.ID()
		s.containers[id] = k

		t, err := k.Task(s.context, nil)

		if err != nil {
			// Without a task, t must not be used.
			log.Printf("No task for %s\n", id)
		} else {
			svc, statusErr := t.Status(s.context)

			// Only read svc when the status call succeeded.
			if statusErr == nil && svc.Status == "running" {
				s.servicesPID[id] = t.Pid()
				// Get container IP address
				ip, ipErr := getIP(DefaultBridgeName, int(t.Pid()))
				// Only record the IP when the lookup succeeded.
				if ipErr == nil {
					s.services[id] = ip
				}
			}
		}
	}
}

If t doesn't exist, then update replicas to 0.

@alexellis
Owner

alexellis commented Jan 10, 2020

After fixing that, I get:

faas-cli store deploy "ASCII Cows" --name cows1
WARNING! Communication is not secure, please consider using HTTPS. Letsencrypt.org offers free SSL/TLS certificates.

Unexpected status: 502, message: 

Function 'cows1' failed to deploy with status code: 502

faasd after merging openfaas/faasd#21 shows:

Jan 10 18:48:09 alexx faasd[13600]: 2020/01/10 18:48:09 Get http://faas-containerd:8081/system/functions: dial tcp 10.62.0.1:8081: connect: connection refused

@alexellis alexellis mentioned this pull request Jan 10, 2020
@alexellis
Owner

diff --git a/handlers/service_map.go b/handlers/service_map.go
index 4e517b1..c07273b 100644
--- a/handlers/service_map.go
+++ b/handlers/service_map.go
@@ -3,6 +3,7 @@ package handlers
 import (
 	"context"
 	"fmt"
+	"log"
 	"net"
 	"sync"
 
@@ -40,15 +41,24 @@ func (s *ServiceMap) Update() {
 
 	containers, _ := s.client.Containers(s.context)
 	for _, k := range containers {
-		i := k.ID()
-		s.containers[i] = k
-		t, _ := k.Task(s.context, nil)
-		svc, _ := t.Status(s.context)
-		if svc.Status == "running" {
-			s.servicesPID[i] = t.Pid()
-			// Get container IP address
-			ip, _ := getIP(DefaultBridgeName, int(t.Pid()))
-			s.services[i] = ip
+		id := k.ID()
+		s.containers[id] = k
+
+		t, err := k.Task(s.context, nil)
+
+		if err != nil {
+			log.Printf("No task for %s\n", id)
+		} else {
+			svc, statusErr := t.Status(s.context)
+
+			if statusErr == nil && svc.Status == "running" {
+				s.servicesPID[id] = t.Pid()
+				// Get container IP address
+				ip, ipErr := getIP(DefaultBridgeName, int(t.Pid()))
+				if ipErr == nil {
+					s.services[id] = ip
+				}
+			}
 		}
 	}
 }
diff --git a/main.go b/main.go
index 874479e..174f39c 100644
--- a/main.go
+++ b/main.go
@@ -99,6 +99,7 @@ func Start() {
 	if err != nil {
 		panic(err)
 	}
+
 	defer client.Close()
 
 	serviceMap := handlers.NewServiceMap(client, "openfaas-fn")

Feel free to apply this patch

@alexellis
Owner

alexellis commented Jan 10, 2020

My original suggestion was to remove the service map entirely and go directly to containerd every time. Once that was stable, I would have rewritten the cache and optimisations.

@alexellis
Owner

@carlosedp please can you reply to the comments I've left for you, and let me know whether you have addressed them or have any questions?

@alexellis
Owner

/set title: Use containerd as source of truth

@derek (bot) changed the title from "Read containers from containerd and store on serviceMap struct" to "Use containerd as source of truth" on Jan 17, 2020
@alexellis
Owner

@carlosedp this will need a rebase now due to the merge of #26

@carlosedp
Contributor Author

Working on this; already rebased on #26.

@alexellis
Owner

I'm not seeing any changes for handlers/replicas.go, did you cover it?

Removed ServiceMap struct and changed all operations to be performed on
containerd.
All status for the running and stopped tasks are read by the current
operation being executed.

Signed-off-by: Carlos de Paula <[email protected]>
@carlosedp
Contributor Author

Force-pushed a new commit in which the ServiceMap struct has been removed entirely and all state comes from containerd.

if err != nil {
return errors.Wrapf(err, "Unable to get task for container %s: %s", name, err)
log.Printf("[Delete] container %s does not have task\n", name)
Owner

Should probably say a task, but how about: Unable to find a task for container: %s\n

image: image.Name(),
}
replicas := 0
task, err := c.Task(ctx, nil)
Owner

If the task is in a frozen state, the replicas should also be 0. This could be fixed in a follow-up PR.
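One way to express this rule is a single function that maps a task state to a replica count, so a frozen (paused) task is treated the same as a stopped one. The sketch below is an assumption about how that could look; `processStatus`, its constants and `replicasForStatus` are local, hypothetical names rather than the types used in the PR.

```go
package main

import "fmt"

// processStatus is a local, hypothetical type used only for this sketch.
type processStatus string

const (
	statusRunning processStatus = "running"
	statusPaused  processStatus = "paused"
	statusStopped processStatus = "stopped"
)

// replicasForStatus returns 1 only for a running task. A paused (frozen)
// task counts as scaled to zero, as does any other state.
func replicasForStatus(s processStatus) uint64 {
	switch s {
	case statusRunning:
		return 1
	default:
		return 0
	}
}

func main() {
	for _, s := range []processStatus{statusRunning, statusPaused, statusStopped} {
		fmt.Printf("%s -> %d\n", s, replicasForStatus(s))
	}
}
```

Using a `default` branch means any state not explicitly listed also reports zero replicas, which errs on the side of not routing traffic to a task that may be unable to serve it.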

// Task for container exists
svc, err := task.Status(ctx)
if err != nil {
return Function{}, fmt.Errorf("Unable to get task status for container: %s", name, err)
Owner

In Go, error strings should start with a lowercase letter

Owner

Small bug here too, 2 args, but only one %s

}

func (i *InvokeResolver) Resolve(functionName string) (url.URL, error) {
log.Printf("Resolve: %q\n", functionName)

serviceIP := i.serviceMap.Get(functionName)
if serviceIP == nil {
fun, err := GetFunction(i.client, functionName)
Owner

Can you write function or fn rather than fun? Just a nit.

Owner

@alexellis alexellis left a comment

Looks very good, just a couple of nits / questions.

Can you also confirm what paths have been tested in the latest push/change?

@alexellis alexellis merged commit 28d0dce into alexellis:master Jan 20, 2020
@alexellis
Owner

I'm going to merge and ask for you to work out anything else that needs a change in HEAD / master.

return f, nil

}
return Function{}, fmt.Errorf("Unable to find function %s: %s", name, err)
Owner

Possible typo, %s: should normally be:

function: NAME

alexellis added a commit that referenced this pull request Jan 20, 2020
In Go, errors should start with lower-case. Ref: #21

Signed-off-by: Alex Ellis (OpenFaaS Ltd) <[email protected]>