Generated siva file contains corrupted objects #264

Closed
erizocosmico opened this issue Apr 23, 2018 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@erizocosmico
Contributor

The ML team reported an ArrayIndexOutOfBoundsException in the engine, which was strange because it seemed to be jgit's fault. After a bit of debugging I found the offending siva file: /apps/borges/10k/0a0bfaa46954437548fbaeb0e19237f84e968511.siva. The offending object is bd7cb56b3bc934acda47089175733f625c6cdb37.

I made a reproduction case with jgit: https://github.com/erizocosmico/jgit-outofbounds, but it also happens with plain git.

$ siva unpack 0a0bfaa46954437548fbaeb0e19237f84e968511.siva
$ git show bd7cb56b3bc934acda47089175733f625c6cdb37
error: delta replay has gone wild
error: failed to apply delta
error: failed to read delta base object 543a4825f17b54479ee422a77f5fdd4f866eb839 at offset 469927 from ./objects/pack/pack-4c91f5bcbe51b8c71101ba25d1f06f441aba6650.pack
fatal: packed object bd7cb56b3bc934acda47089175733f625c6cdb37 (stored in ./objects/pack/pack-4c91f5bcbe51b8c71101ba25d1f06f441aba6650.pack) is corrupt

So, it seems we're writing siva files with corrupted objects and/or deltas. What is weird is that go-git is perfectly able to read that siva file:

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"os"

	git "gopkg.in/src-d/go-git.v4"
	"gopkg.in/src-d/go-git.v4/plumbing"
)

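func main() {
	// Run this from inside the unpacked siva directory.
	wd, err := os.Getwd()
	assert(err)

	r, err := git.PlainOpen(wd)
	assert(err)

	// Look up the blob that git and jgit report as corrupt.
	blob, err := r.BlobObject(plumbing.NewHash("bd7cb56b3bc934acda47089175733f625c6cdb37"))
	assert(err)

	rd, err := blob.Reader()
	assert(err)
	defer rd.Close()

	// Read the whole blob; go-git returns no error here.
	data, err := ioutil.ReadAll(rd)
	assert(err)

	fmt.Println(len(data), "bytes")
}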

func assert(err error) {
	if err != nil {
		log.Fatal(err)
	}
}
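
For context on the "delta replay has gone wild" error above: as far as I understand, git prints it when replaying a delta against its base does not consume the instructions cleanly or does not produce the declared result size. Below is a minimal, standalone sketch of a strict delta replay using only the standard library. The names `readVarint` and `applyDelta` are made up for illustration; this is not go-git's or git's actual code, and it does not claim to show where the two readers differ.

```go
package main

import (
	"errors"
	"fmt"
)

// readVarint decodes one little-endian base-128 size header from the start
// of a delta and returns the value plus the remaining bytes.
func readVarint(d []byte) (uint64, []byte, error) {
	var v uint64
	var shift uint
	for i, b := range d {
		v |= uint64(b&0x7f) << shift
		if b&0x80 == 0 {
			return v, d[i+1:], nil
		}
		shift += 7
		if shift > 63 {
			return 0, nil, errors.New("size header too long")
		}
	}
	return 0, nil, errors.New("truncated size header")
}

// applyDelta (hypothetical helper) rebuilds the target from base and delta,
// rejecting any instruction that reads outside base or overshoots the
// declared target size.
func applyDelta(base, delta []byte) ([]byte, error) {
	srcSize, rest, err := readVarint(delta)
	if err != nil {
		return nil, err
	}
	if srcSize != uint64(len(base)) {
		return nil, errors.New("base size mismatch")
	}
	dstSize, rest, err := readVarint(rest)
	if err != nil {
		return nil, err
	}
	var out []byte
	for len(rest) > 0 {
		cmd := rest[0]
		rest = rest[1:]
		switch {
		case cmd&0x80 != 0: // copy a range from the base object
			var off, size uint64
			for i := uint(0); i < 4; i++ { // optional offset bytes
				if cmd&(1<<i) != 0 {
					if len(rest) == 0 {
						return nil, errors.New("truncated copy offset")
					}
					off |= uint64(rest[0]) << (8 * i)
					rest = rest[1:]
				}
			}
			for i := uint(0); i < 3; i++ { // optional size bytes
				if cmd&(0x10<<i) != 0 {
					if len(rest) == 0 {
						return nil, errors.New("truncated copy size")
					}
					size |= uint64(rest[0]) << (8 * i)
					rest = rest[1:]
				}
			}
			if size == 0 {
				size = 0x10000 // a zero size encodes 64 KiB
			}
			if off+size > uint64(len(base)) || uint64(len(out))+size > dstSize {
				return nil, errors.New("copy out of bounds")
			}
			out = append(out, base[off:off+size]...)
		case cmd != 0: // insert the next cmd bytes literally
			n := int(cmd)
			if n > len(rest) || uint64(len(out)+n) > dstSize {
				return nil, errors.New("insert out of bounds")
			}
			out = append(out, rest[:n]...)
			rest = rest[n:]
		default:
			return nil, errors.New("reserved opcode 0")
		}
	}
	if uint64(len(out)) != dstSize {
		return nil, errors.New("result size mismatch")
	}
	return out, nil
}

func main() {
	base := []byte("hello world")
	// Header: base size 11, target size 9; copy 6 bytes at offset 0; insert "git".
	good := []byte{0x0b, 0x09, 0x90, 0x06, 0x03, 'g', 'i', 't'}
	out, err := applyDelta(base, good)
	fmt.Printf("%q %v\n", out, err)

	// Same delta with the copy size changed to 32, which runs past the base.
	bad := []byte{0x0b, 0x09, 0x90, 0x20, 0x03, 'g', 'i', 't'}
	_, err = applyDelta(base, bad)
	fmt.Println(err)
}
```

A reader that skips any of these bounds or size checks could hand back bytes for an object that a stricter reader refuses, which would be one possible (unconfirmed) explanation for the difference seen here.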
@r0mainK

r0mainK commented May 21, 2018

Hey guys, just wanted to let you know that this bug seems to be quite frequent at the subdirectory level: for the version of PGA we have on the local gpu's HDFS, it seems to affect about 1 in 10 subdirs (I have only preprocessed ~140 / 256 subdirs, so it might be more or less). No pressure, it's not really a blocker for ML. Gabor is going to try to find the exact siva files in each subdir with this bug so we can exclude them and not lose too much data. Just a heads-up that this doesn't seem to be rare.

ajnavarro reopened this May 22, 2018
erizocosmico self-assigned this May 22, 2018
@erizocosmico
Contributor Author

erizocosmico commented May 23, 2018

It is reproducible with the following repo, even with current borges master: https://github.com/zfcampus/zf-oauth2

go-git can clone this repo just fine, so the problem must be inside borges.

@kuba--
Contributor

kuba-- commented Jun 8, 2018

Fix PR: #299

kuba-- closed this as completed Jun 8, 2018