Fix Tar timestamp conversion from/to string and DateTimeOffset #71038

carlossanlop · 2022-06-21T02:30:53Z

The tests that made comparisons between expected timestamps and timestamps extracted from extended attributes (string -> double -> DateTimeOffset) were intermittently failing due to a precision bug. The main suspicion is that double is losing precision, so I was given the suggestion here to change it to decimal. In that same comment, I added more decimals when parsing (from 6 to 9) to help with the precision as well.

Fixes #69474
Fixes #70060

Re-enable test disabled here: #69997

ghost · 2022-06-21T02:31:09Z

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Author:	carlossanlop
Assignees:	carlossanlop
Labels:	`area-System.IO`
Milestone:	7.0.0

carlossanlop · 2022-06-21T02:35:13Z

/azp run runtime-extra-platforms

azure-pipelines · 2022-06-21T02:35:34Z

Azure Pipelines successfully started running 1 pipeline(s).

danmoseley · 2022-06-21T04:21:32Z

Do we understand where the loss occurs? It seems unlikely it's long-double-long since as @tannergooding pointed out that would require a value over 2^52. I wonder whether we are papering over an issue that may manifest elsewhere.

If we are lacking a local repro to debug, we could disable the test for now (or make it tolerant), and leave the issue open. Then later add logging/asserts and loop it until we find a debuggable repro?

eerhardt · 2022-06-21T14:44:41Z

> It seems unlikely it's long-double-long since as @tannergooding pointed out that would require a value over 2^52.

runtime/src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHelpers.cs

Lines 122 to 123 in 7ab7f83

    
```csharp
internal static DateTimeOffset GetDateTimeOffsetFromSecondsSinceEpoch(double secondsSinceUnixEpoch) =>
    new DateTimeOffset((long)(secondsSinceUnixEpoch * TimeSpan.TicksPerSecond) + DateTime.UnixEpoch.Ticks, TimeSpan.Zero);
```

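Today (GMT: Tuesday, June 21, 2022 2:39:51 PM), there have been about 1,655,769,600 seconds since 1/1/1970.

In the above code, we are multiplying seconds since epoch by TimeSpan.TicksPerSecond (10,000,000): 1,655,769,600 × 10,000,000 = 16,557,696,000,000,000 ticks, which is well over 2^52 (4,503,599,627,370,496).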

UPDATE: My original "seconds since 1970" number was actually "seconds since year 0". I've updated the math above to (hopefully) be correct.
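A minimal C# sketch of the consequence, assuming the round trip has the same shape as the removed double-based helpers (ticks divided into `double` seconds, then multiplied back by `TimeSpan.TicksPerSecond` and truncated to `long`). The `TickRoundTrip` class and its methods are hypothetical illustrations, not the PR's code. Tick counts near today's date sit between 2^53 and 2^54, where adjacent doubles are 2 apart, so among any three consecutive tick values at least two collapse to the same double and at least one cannot come back unchanged; `decimal` has 28-29 significant digits, enough to keep every tick.

```csharp
using System;

// Hypothetical helpers (not the PR's code) mirroring the shape of the TarHelpers conversions.
static class TickRoundTrip
{
    // ticks -> double seconds, like the original double-based GetSecondsSinceEpochFromDateTimeOffset.
    public static double SecondsViaDouble(long ticksSinceEpoch) =>
        (double)ticksSinceEpoch / TimeSpan.TicksPerSecond;

    // double seconds -> ticks, truncating like the (long) cast in GetDateTimeOffsetFromSecondsSinceEpoch.
    public static long TicksViaDouble(double seconds) =>
        (long)(seconds * TimeSpan.TicksPerSecond);

    // The same round trip with decimal.
    public static decimal SecondsViaDecimal(long ticksSinceEpoch) =>
        (decimal)ticksSinceEpoch / TimeSpan.TicksPerSecond;

    public static long TicksViaDecimal(decimal seconds) =>
        (long)(seconds * TimeSpan.TicksPerSecond);

    // Counts how many of `count` consecutive tick values do not survive the double round trip.
    public static int DoubleFailures(long startTicks, int count)
    {
        int failures = 0;
        for (long t = startTicks; t < startTicks + count; t++)
        {
            if (TicksViaDouble(SecondsViaDouble(t)) != t) failures++;
        }
        return failures;
    }

    // Same count for the decimal round trip.
    public static int DecimalFailures(long startTicks, int count)
    {
        int failures = 0;
        for (long t = startTicks; t < startTicks + count; t++)
        {
            if (TicksViaDecimal(SecondsViaDecimal(t)) != t) failures++;
        }
        return failures;
    }
}
```

For example, `TickRoundTrip.DoubleFailures(16_557_696_000_000_000, 100)` is necessarily nonzero, while `DecimalFailures` over the same range is 0.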

danmoseley · 2022-06-21T14:45:59Z

Thank you. Somehow I read it as 10^52 😀

eerhardt · 2022-06-21T15:14:18Z

src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHelpers.cs

-            double secondsSinceEpoch = GetSecondsSinceEpochFromDateTimeOffset(timestamp);
-            return secondsSinceEpoch.ToString("F9", CultureInfo.InvariantCulture); // 6 decimals, no commas
+            decimal secondsSinceEpoch = GetSecondsSinceEpochFromDateTimeOffset(timestamp);
+            return secondsSinceEpoch.ToString("F9", CultureInfo.InvariantCulture); // 9 decimals, no commas


Don't we want G9?

Hmm, maybe 9 is too small.

Maybe we just want G:

https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings#general-format-specifier-g

Just G (the default) will print the shortest round-trippable string. G9 will print the shortest string or up to 9 significant digits, whichever is shorter. However, it will also print exponentials.

There is actually a decent amount of "inconsistency" in how the formatting APIs work between the different format specifiers and it's a bit frustrating at times, but we can't easily change existing ones due to back-compat.

> However, it will also print exponentials.

I had initially tested G, and because it shows E+XX, as in the screenshot above, I decided to use F instead.

@tannergooding just to make sure we are all in agreement, using F is ok?

In what scenarios?

```csharp
using System;
using System.Globalization;

double d = 1655815010.199999998;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 1.65581501E+09
d = 1.199999998;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 1.2
d = 123456789.199999998;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 123456789 (lol wat?)
```

And F9 with double does not seem to improve:

```csharp
double d = 1655815010.199999998;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 1655815010.200000048
d = 1.199999998;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 1.199999998
d = 123456789.199999998;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 123456789.200000003
```

We are using a decimal in the code.

With decimal, F9 works fine:

```csharp
decimal d = 1655815010.199999998M;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 1655815010.199999998
d = 1.199999998M;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 1.199999998
d = 123456789.199999998M;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 123456789.199999998
```

But G9 behaves similarly to the example above for double:

```csharp
decimal d = 1655815010.199999998M;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 1.65581501E+09
d = 1.199999998M;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 1.2
d = 123456789.199999998M;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 123456789
```

I'll use G as suggested above, and as explained via chat.

I am also going to remove some calls to this conversion code where they are not necessary. The timestamp conversion code is only required when converting from GNU to PAX or vice versa (to store atime and ctime, as the format requires).
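A minimal sketch of the formatting side with `decimal` and `"G"`, assuming the seconds value is computed from ticks as below; `PaxTimeFormatting` and its members are hypothetical names, not the PR's code. Per the .NET documentation for the general format specifier, when the value is a `Decimal` and no precision specifier is given, fixed-point notation is always used and trailing zeros are preserved, which is why plain `"G"` avoids the `E+XX` output that `"G9"` produces.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers (illustrative names) for writing a pax-style timestamp string.
static class PaxTimeFormatting
{
    // Seconds since the Unix epoch as a decimal; exact to the tick (7 fractional digits).
    public static decimal SecondsSinceEpoch(DateTimeOffset timestamp) =>
        (decimal)(timestamp.UtcDateTime - DateTime.UnixEpoch).Ticks / TimeSpan.TicksPerSecond;

    // "G" without a precision specifier: for decimal this is always fixed-point, never "E+09".
    public static string Format(DateTimeOffset timestamp) =>
        SecondsSinceEpoch(timestamp).ToString("G", CultureInfo.InvariantCulture);
}
```

Because `decimal.Parse` of the formatted string yields the same decimal value, the string can be converted back to the original tick count without loss.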

eerhardt · 2022-06-21T16:31:34Z

src/libraries/System.Formats.Tar/tests/TarEntry/TarEntry.Conversion.Tests.Base.cs

                }
                else if (originalEntry.Format is TarEntryFormat.Ustar or TarEntryFormat.V7)
                {
-                    CompareDateTimeOffsets(initialNow, actualAccessTime);
-                    CompareDateTimeOffsets(initialNow, actualChangeTime);
+                    AssertExtensions.GreaterThanOrEqualTo(actualAccessTime, initialNow);


Why is this GreaterThanOrEqualTo and not just Assert.Equal?

In these cases, initialNow is DateTimeOffset.UtcNow, captured shortly before invoking the constructors that take the other entry. Since those constructors generate the atime and ctime timestamps automatically using UtcNow, I can't know the exact expected value, but I can at least verify that the value is greater than or equal to the timestamp I saved before calling the constructors.

The cases that do an Equal comparison are the ones where the existing mtime is used as the value to store for atime and ctime, so I know the exact expected value to compare.
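The lower-bound check can be extended into a bracketing check around the operation under test, which adds an upper bound that the PR's test does not have. A sketch, where `produceTimestamp` is a hypothetical stand-in for the conversion constructor that stamps atime and ctime; it assumes the system clock is not adjusted backwards while the check runs.

```csharp
using System;

// Hypothetical test helper (illustrative name, not from the PR).
static class TimestampAssertions
{
    // Captures UtcNow before and after `produceTimestamp` runs and checks that the produced
    // value lies inside that window. The lower bound matches the PR's GreaterThanOrEqualTo
    // assertion; the upper bound is an extra check added here for illustration.
    public static bool ProducedWithinWindow(Func<DateTimeOffset> produceTimestamp)
    {
        DateTimeOffset before = DateTimeOffset.UtcNow;
        DateTimeOffset actual = produceTimestamp();
        DateTimeOffset after = DateTimeOffset.UtcNow;
        return before <= actual && actual <= after;
    }
}
```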

carlossanlop · 2022-06-22T03:12:32Z

@eerhardt the latest commits take care of the following:

- Avoid using the conversion methods when they are not needed: there is a case where we create a FileSystemInfo object, and we can reuse 2 of its 3 Last*TimeUtc properties.
- Fixed a bug that checked the same dictionary key twice instead of two different keys.
- Use 'G' for string formatting of the decimal, as recommended. I also modified the related test to not require finding a dot character in the string. What matters is that the timestamp can later do the round trip and that the tests pass.
  - The string timestamp might not contain a dot and decimal portion when a v7/ustar/gnu entry is converted to pax and its mtime field is then copied into the extended attributes dictionary. In that case the mtime has no decimal portion, because it was obtained directly from the old header's mtime field, which holds an integer.
- Process base-ten fields containing timestamp seconds as long, not int (commit is below).
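The no-dot case can be handled by parsing the value as a `decimal` that may or may not have a fractional part. A sketch, assuming digits finer than one tick (100 ns) are simply truncated; `PaxTimeParsing.Parse` is a hypothetical helper, not the library's implementation.

```csharp
using System;
using System.Globalization;

// Hypothetical parser (illustrative name) for a pax timestamp value: whole seconds since the
// Unix epoch, optionally followed by '.' and a fractional part.
static class PaxTimeParsing
{
    public static DateTimeOffset Parse(string text)
    {
        // Accepts "1655815010" (integer mtime copied from an old header) and "1655815010.1999999".
        decimal seconds = decimal.Parse(
            text,
            NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture);

        // The (long) cast truncates toward zero, dropping anything below one tick.
        long ticks = (long)(seconds * TimeSpan.TicksPerSecond);
        return new DateTimeOffset(DateTime.UnixEpoch.Ticks + ticks, TimeSpan.Zero);
    }
}
```

With this shape, the test only needs to compare the resulting DateTimeOffset values, regardless of whether the string contained a dot.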

carlossanlop · 2022-06-22T03:12:47Z

/azp run runtime-extra-platforms

azure-pipelines · 2022-06-22T03:12:59Z

Azure Pipelines successfully started running 1 pipeline(s).

carlossanlop · 2022-06-22T05:51:18Z

/azp run runtime-extra-platforms

azure-pipelines · 2022-06-22T05:51:31Z

Azure Pipelines successfully started running 1 pipeline(s).

eerhardt · 2022-06-22T14:01:25Z

src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs

@@ -383,10 +383,10 @@ private void ReadPosixAndGnuSharedAttributes(Span<byte> buffer)
        private void ReadGnuAttributes(Span<byte> buffer)
        {
            // Convert byte arrays
-            int aTime = TarHelpers.GetTenBaseNumberFromOctalAsciiChars(buffer.Slice(FieldLocations.ATime, FieldLengths.ATime));
+            long aTime = TarHelpers.GetTenBaseLongFromOctalAsciiChars(buffer.Slice(FieldLocations.ATime, FieldLengths.ATime));


It would be good to create a test that has a date time in 2039 to make sure we can handle dates past 2038.

Added tests for the epochalypse and the max upper limit in octal.
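A sketch of the numbers behind those tests, using only `DateTimeOffset` and `Convert`; `EpochalypseDemo` is a hypothetical name, and the 11-digit limit assumes a 12-byte octal field whose last byte is a terminator (as in the GNU header's atime/ctime fields). Seconds for any moment after 2038-01-19T03:14:07Z (2^31 - 1 seconds) no longer fit in an `int`, while the largest 11-digit octal value, 8^11 - 1 = 8,589,934,591 seconds, lands in the year 2242.

```csharp
using System;

// Hypothetical helpers (illustrative names) for the 2038 and octal-limit arithmetic.
static class EpochalypseDemo
{
    // Largest value that fits in 11 octal digits: 8^11 - 1 = 2^33 - 1.
    public const long MaxElevenDigitOctal = (1L << 33) - 1;

    // Whole seconds since the Unix epoch for midnight UTC on the given date, as long (not int).
    public static long UnixSeconds(int year, int month, int day) =>
        new DateTimeOffset(year, month, day, 0, 0, 0, TimeSpan.Zero).ToUnixTimeSeconds();

    // Base-8 text as it would appear in a header field (without padding or terminator).
    public static string ToOctal(long seconds) => Convert.ToString(seconds, 8);

    // Parses base-8 text into a long, so values above int.MaxValue are kept.
    public static long FromOctal(string octal) => Convert.ToInt64(octal, 8);
}
```

For example, `EpochalypseDemo.UnixSeconds(2039, 1, 1)` is 2,177,452,800, which is larger than `int.MaxValue`.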

eerhardt · 2022-06-22T14:02:59Z

src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs

@@ -397,7 +396,7 @@ private void CollectExtendedAttributesFromStandardFieldsIfNeeded()
            _extendedAttributes.Add(PaxEaName, _name);

            bool containsATime = _extendedAttributes.ContainsKey(PaxEaATime);
-            bool containsCTime = _extendedAttributes.ContainsKey(PaxEaATime);
+            bool containsCTime = _extendedAttributes.ContainsKey(PaxEaCTime);


Is there a test that would have caught this bug? If not, we should add one.

We do, kinda:

The TarWriter_WriteEntry_Pax_Tests.WritePaxAttributes_Timestamps_AutomaticallyAdded checks this for the PaxTarEntry(entryFormat, string) constructor, when the user does not explicitly set atime and ctime.

The PaxTarEntry_Conversion_Tests.Constructor_ConversionFrom* tests are semi-related: they check that atime and ctime are always in the dictionary after conversion, in the constructor itself, not at write time.

We don't have a test for the conversion constructors, so I'm adding one.

But now that you mention it, it seems these two conditions will never be false, making the code unreachable, so I am removing the code. Here's why:

When constructing a PaxTarEntry, whether it is new or it is being converted from another entry, all constructors ensure that atime and ctime are added to the extended attributes dictionary if the user did not provide them (see the method AddNewAccessAndChangeTimestampsIfNotExist). Consider also that the extended attributes dictionary is exposed to the user as a getter-only IReadOnlyDictionary<string, string> property, so no new items can be added after construction.

Update: Both the existing test and the new test I'm adding pass after removing the unreachable code.

eerhardt · 2022-06-22T14:05:02Z

src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHelpers.cs

            new DateTimeOffset((long)(secondsSinceUnixEpoch * TimeSpan.TicksPerSecond) + DateTime.UnixEpoch.Ticks, TimeSpan.Zero);

        // Converts the specified DateTimeOffset to the number of seconds that have passed since the Unix Epoch.
-        internal static double GetSecondsSinceEpochFromDateTimeOffset(DateTimeOffset dateTimeOffset) =>
-            ((double)(dateTimeOffset.UtcDateTime - DateTime.UnixEpoch).Ticks) / TimeSpan.TicksPerSecond;
+        internal static decimal GetSecondsSinceEpochFromDateTimeOffset(DateTimeOffset dateTimeOffset) =>


Does this need to be internal? Is it only called from this class?

eerhardt · 2022-06-22T14:09:31Z

src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHelpers.cs

+        // Converts the array to an octal base number, then transforms it to ten base and returns it.
+        internal static long GetTenBaseLongFromOctalAsciiChars(Span<byte> buffer)
+        {
+            string str = GetTrimmedAsciiString(buffer);


Do we really need to create an intermediate string just to parse it into an integer?

We should be able to use System.Buffers.Text.Utf8Parser

I don't see a Utf8Parser API that parses an integer from an octal string. I see it supports the 'x' format (hexadecimal strings).

I agree that the new allocation is not needed. This code could be improved so that, instead of returning a string, it returns an int representing the length of the trimmed ReadOnlySpan (ROS), to be used for slicing.

The method checks whether the last character(s) in the ROS are either a 0 (null char) or a 32 (space). All other characters are not trimmed, which means that an unexpected non-numeric character will cause the conversion to fail.

Do you mind if I address this request later? I'd like to get this PR merged just for the DateTimeOffsets.

> Do you mind if I address this request later?

I think that is fine.
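For that follow-up, a minimal sketch of parsing the octal field straight from the span without the intermediate string, assuming the trimming rules described above (trailing NUL and space only; anything else fails). `OctalField.TryParseOctal` is a hypothetical helper, not an existing BCL or System.Formats.Tar API, and treating an all-padding field as a failure is likewise an assumption.

```csharp
using System;

// Hypothetical helper (not an existing BCL or System.Formats.Tar API).
static class OctalField
{
    // Parses an ASCII octal header field directly from the span, without allocating a string.
    // Only trailing NUL (0) and space (32) bytes are trimmed; any other non-octal byte fails.
    public static bool TryParseOctal(ReadOnlySpan<byte> field, out long value)
    {
        value = 0;

        // Trim trailing padding.
        int length = field.Length;
        while (length > 0 && (field[length - 1] == 0 || field[length - 1] == (byte)' '))
        {
            length--;
        }

        // Assumption: a field with no digits is reported as a failure.
        if (length == 0)
        {
            return false;
        }

        foreach (byte b in field.Slice(0, length))
        {
            if (b < (byte)'0' || b > (byte)'7')
            {
                value = 0;
                return false;
            }

            // Reject values that would overflow a long on the next shift.
            if (value > (long.MaxValue >> 3))
            {
                value = 0;
                return false;
            }

            value = (value << 3) | (long)(b - '0');
        }

        return true;
    }
}
```

Returning a `long` also keeps post-2038 timestamps intact.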

eerhardt · 2022-06-22T14:14:28Z

src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarWriter.Unix.cs

-            entry._header._mTime = TarHelpers.GetDateTimeOffsetFromSecondsSinceEpoch(status.MTime);
-            entry._header._aTime = TarHelpers.GetDateTimeOffsetFromSecondsSinceEpoch(status.ATime);
+            entry._header._mTime = info.LastWriteTimeUtc;
+            entry._header._aTime = info.LastAccessTimeUtc;
            entry._header._cTime = TarHelpers.GetDateTimeOffsetFromSecondsSinceEpoch(status.CTime);


From looking at the FileStatus code:

runtime/src/libraries/System.Private.CoreLib/src/System/IO/FileStatus.Unix.cs

Lines 309 to 318 in fdc3b51

```csharp
        return UnixTimeToDateTimeOffset(_fileCache.MTime, _fileCache.MTimeNsec);
    }

    internal void SetLastWriteTime(string path, DateTimeOffset time, bool asDirectory)
        => SetAccessOrWriteTime(path, time, isAccessTime: false, asDirectory);

    private static DateTimeOffset UnixTimeToDateTimeOffset(long seconds, long nanoseconds)
    {
        return DateTimeOffset.FromUnixTimeSeconds(seconds).AddTicks(nanoseconds / NanosecondsPerTick);
    }
```

It looks like status.XTime is combined with status.XTimeNsec to build this DateTimeOffset. We will be respecting that for mtime and atime, but it looks like we don't respect it for ctime now.

Good point. Changing this logic to:

```csharp
entry._header._cTime = DateTimeOffset.FromUnixTimeSeconds(status.CTime).AddTicks(status.CTimeNsec / 100 /* nanoseconds per tick */);
```
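A small sketch of what the nanosecond field contributes, mirroring the `UnixTimeToDateTimeOffset` shape quoted above; `UnixTimestamps.FromUnixTime` is a hypothetical helper, and `NanosecondsPerTick = 100` follows from a tick being 100 ns. Sub-tick nanoseconds are truncated by the integer division.

```csharp
using System;

// Hypothetical helper mirroring the UnixTimeToDateTimeOffset shape quoted above.
static class UnixTimestamps
{
    // One tick is 100 nanoseconds.
    private const long NanosecondsPerTick = 100;

    // Combines a seconds part (e.g. status.CTime) with its nanoseconds part (e.g. status.CTimeNsec).
    // Nanoseconds below one tick are dropped by the integer division.
    public static DateTimeOffset FromUnixTime(long seconds, long nanoseconds) =>
        DateTimeOffset.FromUnixTimeSeconds(seconds).AddTicks(nanoseconds / NanosecondsPerTick);
}
```

With `seconds = 1655815010` and `nanoseconds = 199_999_998`, the result is 1,999,999 ticks later than `DateTimeOffset.FromUnixTimeSeconds(1655815010)`: that is the sub-second part a seconds-only ctime conversion would drop.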

carlossanlop · 2022-06-22T20:58:51Z

I'm rebasing on top of the latest bits in main, which contain the fixes for the timestamp and Apple-specific test failures.

carlossanlop · 2022-06-22T20:59:44Z

/azp run runtime-extra-platforms

azure-pipelines · 2022-06-22T21:00:03Z

Azure Pipelines successfully started running 1 pipeline(s).

eerhardt

Looks good! Thanks @carlossanlop.

carlossanlop · 2022-06-22T23:18:56Z

The tvOS failure is unrelated to this change, and it is not critical:

The ExtractToFile_SpecialFile_Unelevated_Throws method threw UnauthorizedAccessException when attempting to extract the fifo from the archive to disk:

https://github.com/dotnet/runtime/blame/051b4828c7d3a0cad3289830ef9fd2120f45bb2b/src/libraries/System.Formats.Tar/tests/TarReader/TarReader.ExtractToFile.Tests.Unix.cs#L39

In other Unix platforms, extracting fifos does not throw, but apparently it does on tvOS. I'll get it fixed later.

Here's the callstack:

```
[14:55:07.7170830] 2022-06-22 14:55:07.706 System.Formats.Tar.Tests[58680:78698627]    Exception messages: System.UnauthorizedAccessException : Access to the path '/private/var/mobile/Containers/Data/Application/11374FD2-B51C-49C4-844D-F40F062ACE94/tmp/1pslf5q1.gf2/output' is denied.
[14:55:07.7171130] ---- System.IO.IOException : Operation not permitted
[14:55:07.7189880] 2022-06-22 14:55:07.708 System.Formats.Tar.Tests[58680:78698627]    Exception stack traces:    at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
[14:55:07.7190220]    at System.Formats.Tar.TarEntry.ExtractAsFifo(String destinationFileName)
[14:55:07.7190300] 2022-06-22 14:55:07.708 System.Formats.Tar.Tests[58680:78698627]    at System.Formats.Tar.TarEntry.ExtractToFileInternal(String filePath, String linkTargetPath, Boolean overwrite)
[14:55:07.7190360]    at System.Formats.Tar.TarEntry.ExtractToFile(String destinationFileName, Boolean overwrite)
[14:55:07.7201670] 2022-06-22 14:55:07.709 System.Formats.Tar.Tests[58680:78698627]    at System.Formats.Tar.Tests.TarReader_ExtractToFile_Tests.ExtractToFile_SpecialFile_Unelevated_Throws()
```

carlossanlop added the area-System.IO label Jun 21, 2022

carlossanlop added this to the 7.0.0 milestone Jun 21, 2022

carlossanlop requested review from adamsitnik, eerhardt, bartonjs, tarekgh and jozkee June 21, 2022 02:30

carlossanlop self-assigned this Jun 21, 2022

tarekgh approved these changes Jun 21, 2022

eerhardt reviewed Jun 21, 2022

carlossanlop mentioned this pull request Jun 21, 2022

The "Standard numeric format strings" doc is out of date dotnet/docs#29951

Closed

carlossanlop mentioned this pull request Jun 22, 2022

Implement Tar Global Extended Attributes API changes #70869

Merged

eerhardt reviewed Jun 22, 2022

Change double to decimal in timestamp conversions to preserve precision

529e4c0

carlossanlop added 11 commits June 22, 2022 10:16

Re-enable disabled test

d52a541

Adjust assert message

33974cd

Reuse FileSystemInfo Last*TimeUtc fields, use implicit cast operator to store as DateTimeOffset.

9973fcd

Fix using wrong fieldname for ctime.

c8df15b

Using 'G' for decimal to string conversion. Adjust test to not require a dot. The DateTimeOffset comparison done afterwards should suffice.

44789fc

Ensure timestamps are converted to long, not int.

17e6060

Add Epochalypse and Past-octal-limit timestamp tests

9f81016

TarHelpers methods can be private

7fa8776

Remove unreachable code for adding ctime and atime before writing Pax entry. Add tests to ensure we always add those entries to the dictionary on construction.

dc29c6b

Fix typo in test when retrieving ctime.

c7a0923

Make sure CTime adds nanoseconds on Unix when retrieving info from disk.

cad5599

carlossanlop force-pushed the FixTarTimestamps branch from 72698b6 to cad5599 June 22, 2022 20:59

eerhardt approved these changes Jun 22, 2022

trylek mentioned this pull request Jun 22, 2022

Fix for issue 70385 (stack overflow in the CoreCLR runtime during SVM validation) #71135

Merged

carlossanlop merged commit 70e9ca0 into dotnet:main Jun 23, 2022

carlossanlop deleted the FixTarTimestamps branch June 23, 2022 00:22

carlossanlop mentioned this pull request Jul 12, 2022

Tar APIs pending feedback to address #68230

Open

54 tasks

ghost locked as resolved and limited conversation to collaborators Jul 23, 2022

jeffhandley added area-System.Formats.Tar and removed area-System.IO labels Nov 21, 2022
