HDFS targets show in confirmation even if not needed #15

Closed
alexrobbins opened this issue Jan 25, 2013 · 16 comments
@alexrobbins

Terminal output here: http://pastebin.com/J08GAk1Y

Drake has already run once, to completion, and no files have been modified since. Drake correctly notices this and skips all the steps. Why does it still say it is going to run them?

All the steps involve at least one hdfs location. A very similar workflow that was all local didn't exhibit this same behavior.

@alexrobbins
Author

Pastebin content:

alexr@dev101:~/src/resolve/src/resolve/ml$ drake
The following steps will be run, in order:
1: hdfs://user/alexr/resolve-ml/gold-annotations <- data/gold-annotations [missing output]
2: hdfs://user/alexr/resolve-ml/good-pairs <- hdfs://user/alexr/resolve-ml/gold-annotations [projected timestamped]
3: hdfs://user/alexr/resolve-ml/uuid-and-attrs <- hdfs://user/alexr/resolve-ml/gold-annotations [projected timestamped]
4: hdfs://user/alexr/resolve-ml/all-pairs <- hdfs://user/alexr/resolve-ml/gold-annotations [projected timestamped]
5: data/good-pairs <- hdfs://user/alexr/resolve-ml/good-pairs [projected timestamped]
6: data/all-pairs <- hdfs://user/alexr/resolve-ml/all-pairs [projected timestamped]
7: hdfs://user/alexr/resolve-ml/bad-pairs <- data/all-pairs, data/good-pairs [projected timestamped]
8: hdfs://user/alexr/resolve-ml/good-pairs-with-features <- hdfs://user/alexr/resolve-ml/good-pairs, hdfs://user/alexr/resolve-ml/uuid-and-attrs [projected timestamped]
9: hdfs://user/alexr/resolve-ml/bad-pairs-with-features <- hdfs://user/alexr/resolve-ml/bad-pairs, hdfs://user/alexr/resolve-ml/uuid-and-attrs [projected timestamped]
Confirm? [y/n] y
Running 9 steps...

--- 0. Skipped (up-to-date): hdfs://user/alexr/resolve-ml/gold-annotations <- data/gold-annotations

--- 1. Skipped (up-to-date): hdfs://user/alexr/resolve-ml/good-pairs <- hdfs://user/alexr/resolve-ml/gold-annotations

--- 2. Skipped (up-to-date): hdfs://user/alexr/resolve-ml/uuid-and-attrs <- hdfs://user/alexr/resolve-ml/gold-annotations

--- 3. Skipped (up-to-date): hdfs://user/alexr/resolve-ml/all-pairs <- hdfs://user/alexr/resolve-ml/gold-annotations

--- 4. Skipped (up-to-date): data/good-pairs <- hdfs://user/alexr/resolve-ml/good-pairs

--- 5. Skipped (up-to-date): data/all-pairs <- hdfs://user/alexr/resolve-ml/all-pairs

--- 6. Skipped (up-to-date): hdfs://user/alexr/resolve-ml/bad-pairs <- data/all-pairs, data/good-pairs

--- 7. Skipped (up-to-date): hdfs://user/alexr/resolve-ml/good-pairs-with-features <- hdfs://user/alexr/resolve-ml/good-pairs, hdfs://user/alexr/resolve-ml/uuid-and-attrs

--- 8. Skipped (up-to-date): hdfs://user/alexr/resolve-ml/bad-pairs-with-features <- hdfs://user/alexr/resolve-ml/bad-pairs, hdfs://user/alexr/resolve-ml/uuid-and-attrs

Done (0 steps run).

@aboytsov
Contributor

This is extremely strange. How is it possible that "gold-annotations" is listed with the "missing output" reason, but then not built, with the "up-to-date" reason? Does the file really exist or not? One of those is definitely wrong, but which one? Can you dig into it a little further, i.e. ls -l the inputs and outputs of the first target? Thanks!

@ghost assigned aboytsov Jan 25, 2013
@alexrobbins
Author

So, after further investigation: this issue is intermittent, and seems to occur when I'm moving from local to hdfs, or back. I wonder if it is a synchronization issue? Sometimes the hdfs last-modified time ends up slightly different from the local one. If I rerun the workflow, sometimes it is a problem again and sometimes not.

@aboytsov
Contributor

Yes, it is most likely a synchronization issue. I ran into this before and reported it to Philip. After he fixed it, it was gone; it has probably happened again. Even one second of desynchronization could create this issue. I won't close the bug just yet, but let me know once you talk to Philip.

@ghost self-assigned this Jan 28, 2013
@dirtyvagabond
Contributor

Either one of you guys up for adding details on this to the wiki? Sounds like an annoying gotcha that we should warn folks about.

@aboytsov
Contributor

When you start the FAQ, I'll add it there :)

@alexrobbins
Author

So, is there a way we could make drake more tolerant of this? Right now, if the servers get out of sync by even a millisecond, we'll have a problem. The problem only surfaces when the dependencies cross the divide in the wrong direction.

Could we add some sort of configurable delay between steps? If we waited one second after each step, then the servers would have to be off by more than one second to see the problem. That seems much less likely than being one millisecond off. In most data workflows an extra second or two per step is not going to be a big deal.

Alternately, could we build a configurable "fuzz" factor into the out-of-date calculation, so that a target counts as up to date if its timestamp plus the fuzz factor is later than its dependency's? A sketch of that check is below.
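
A minimal sketch of that comparison, in Python with made-up names (Drake itself is written in Clojure, so this is purely illustrative, not its actual code):

def is_up_to_date(target_mtime, dep_mtime, fuzz=1.0):
    # Hypothetical fuzzy check; times are seconds since the epoch. A
    # target whose timestamp trails its dependency's by no more than
    # `fuzz` seconds still counts as up to date, absorbing small clock
    # skew between filesystems.
    return target_mtime + fuzz >= dep_mtime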

I think the problem shows up mostly when using dummy commands that don't take very long. If the commands took longer, that would overcome the server time difference. That said, I'm seeing the issue around hdfs -copyToLocal and -put, which I am going to use in the future. Adding a 30-second sleep to each command does fix the problem.

@dirtyvagabond
Contributor

Maybe we could control the scope of the forced delay? The rule would be something like: if the step used HDFS and ran fast, then artificially pause (see the sketch below).
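
Something like this hypothetical guard (all names invented for illustration):

import time

def maybe_pause(step_used_hdfs, elapsed_seconds, delay=0.4):
    # Only pause when a fast step touched HDFS; a long-running step
    # already outlasts any small clock skew on its own.
    if step_used_hdfs and elapsed_seconds < delay:
        time.sleep(delay - elapsed_seconds)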

Or is there some elegant way to detect the problem beforehand?

@alexrobbins
Author

AFAICT, the issue only comes up when moving between filesystems, so we could add the (configurable?) delay before any step that uses both local and hdfs filesystems.

Also, we could try to detect the out-of-sync condition between the two systems and correct for it. Maybe create a tmp file on both systems at the same time, compare the resulting modification times, and then adjust later comparisons by the difference (roughly as sketched below)? This might work, except that network lag may not be consistent, so the adjustment we derive may only be accurate for that single point in time.
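
A rough sketch of such a probe using the standard hadoop fs commands (the probe path and function name are made up, and as noted above the estimate absorbs whatever network lag happens at probe time):

import os
import subprocess
import tempfile

def estimate_skew(hdfs_dir):
    # Create a file on each filesystem back to back and compare
    # modification times. Returns (hdfs time - local time) in seconds;
    # positive means the HDFS clock runs ahead of the local one.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        local_path = f.name
    probe = hdfs_dir + "/.skew-probe"
    subprocess.check_call(["hadoop", "fs", "-touchz", probe])
    local_ts = os.stat(local_path).st_mtime
    # 'hadoop fs -stat %Y' prints modification time in ms since epoch.
    remote_ms = subprocess.check_output(["hadoop", "fs", "-stat", "%Y", probe])
    os.unlink(local_path)
    subprocess.check_call(["hadoop", "fs", "-rm", probe])
    return int(remote_ms) / 1000.0 - local_ts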

I think there isn't an easy fix because distributed systems have to deal with network lag that can make time comparisons tough. The delay method will work as long as the delay is bigger than the time difference between the two systems, I think.

@aboytsov
Contributor

Alex, thank you very much for your thoughts.

There isn't an easy fix. Ideally, every system would have its time synchronized over NTP.

I like both of your ideas. We can add a configurable delay (say, 300-400 ms) before every step that uses more than one filesystem. It would not solve all possible problems, but it would probably solve a big chunk of them.

I'm a little more worried about "fuzzy" timestamp evaluation. If it relaxed the requirements (i.e. more targets would be rebuilt than otherwise), it would be OK. But it tightens them, so fewer targets get rebuilt, and that can be problematic. Consider, for example, a user who runs a script that touches several targets to invalidate part of the workflow. Under a fuzz factor the invalidation might silently do nothing, which would be dangerous and hard to debug.
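
To make the danger concrete with the hypothetical is_up_to_date sketch from earlier: suppose the fuzz factor is two seconds and the touch lands half a second after the output was built.

# The user expects the touched input to force a rebuild, but the
# output still counts as up to date, so nothing is invalidated.
is_up_to_date(target_mtime=1000.0, dep_mtime=1000.5, fuzz=2.0)  # => True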

I also like the idea of testing the filesystem delay. We could have a special flag (--fs_test or something) that reports the timestamp skew on all filesystems. We could run the temporary file creation in a thread pool for best results, roughly as sketched below.
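
A sketch of how that flag could drive the probes (illustrative only; the filesystem names and probe callables are assumptions, reusing the estimate_skew sketch above):

from concurrent.futures import ThreadPoolExecutor

def fs_test(probes):
    # 'probes' maps a filesystem name to a zero-argument callable, e.g.
    # lambda: estimate_skew("hdfs://user/alexr"); submitting them all to
    # a thread pool keeps the file-creation times close together.
    with ThreadPoolExecutor(max_workers=len(probes)) as pool:
        futures = [(name, pool.submit(fn)) for name, fn in probes.items()]
    for name, fut in futures:
        print("%s: %+.3f s relative to local" % (name, fut.result()))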

But if the desynchronization is seconds, nothing will really help.

I'm not sure all of the above is the highest priority. Would you like to help us with the code? I'd be more than happy to review it or point you to the right place.

@aboytsov
Contributor

Added --step-delay flag in feature/vvv: ee833c5

@alexrobbins
Author

The --step-delay flag delays every step. While that would work, we really only need to delay before steps that cross the local-to-hdfs divide. (I imagine your code was just a first pass at a solution, and it does fix the problem.)

I wonder if the fs_test you mention should run as a precondition whenever there are multiple filesystems. If the systems are out of sync, there are going to be weird issues. I'm in favor of failing fast with explicit error messages, as opposed to failing weirdly later, for no apparent reason.

"But if the desynchronization is seconds, nothing will really help." Yeah,
at that point I think the best we can do is complain loudly to the user.

@aboytsov
Contributor

Yes, I remember your suggestion to implement it only for steps that cross two or more filesystems, and it's a fine one. But I had to implement it this way because the problem seems to be fundamental on certain filesystems: see #36.

We could have another flag that would control the behavior of --step-delay and enable it only for steps spanning multiple filesystems, i.e. --step-delay-cross-fs or something like that.

I agree we should fail fast, and I think we can be even smarter with fs_test. We can put a marker file under the .drake/ directory (where Drake keeps all temporary files, including logs and script files) to indicate whether the filesystem testing has happened for this workflow. We can repeat it every week (day?) if needed, and, of course, one needs to be able to disable it completely. I agree we can easily detect whether the workflow uses multiple filesystems. A sketch of the marker check is below.
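
A sketch of that marker-file check (the path and interval are placeholders, not anything Drake actually does):

import os
import time

STAMP = ".drake/fs-test-stamp"   # hypothetical marker file
MAX_AGE = 7 * 24 * 3600          # re-test weekly, per the suggestion above

def fs_test_due():
    # The marker's own mtime records when the filesystems were last
    # tested; a missing marker means they never were.
    try:
        return time.time() - os.stat(STAMP).st_mtime > MAX_AGE
    except OSError:
        return True

def record_fs_test():
    open(STAMP, "w").close()  # touching the marker resets the clock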

I have to say this issue is quite low on my priority list for now. But I'd be more than happy to review anyone's code contributions and provide direction and guidance.

@alexrobbins
Author

Oh, I didn't realize HDFS was limited to 1s resolution. Your change makes sense in light of that.

@aboytsov
Contributor

aboytsov commented Feb 2, 2013

I'm actually not sure what timestamp resolution HDFS has. If you could run any workflow that uses HDFS with the --debug flag and see what timestamps it reports, that would be helpful. @larsyencken was talking about HFS+, which is the filesystem OS X uses.

@aboytsov
Contributor

aboytsov commented Feb 3, 2013

Alex, we should probably close this bug since it is related to Factual's HDFS/NFS desynchronization, and if Philip fixed it, this problem should go away.

I liked all your other ideas, however, and I was wondering if you could file a feature request for what you think we could do to make it even better (i.e. detection of multiple filesystems, automated tests, etc.)?
