Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3040] pick up a more proper local ip address for Utils.findLocalIpAddress method #1946

Closed
wants to merge 2 commits into from

Conversation

advancedxy
Copy link
Contributor

Short version: NetworkInterface.getNetworkInterfaces returns ifs in reverse order compared to ifconfig output. It may pick up ip address associated with tun0 or virtual network interface.
See SPARK_3040 for more detail

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@pwendell
Copy link
Contributor

I have seen cases where a bad interface was chose, so this does seem like a good idea. But for windows, does this mean that the wrong interface is chosen? Since we support windows, should we add a check here to confirm we are on a unix-like system?

@advancedxy
Copy link
Contributor Author

Sorry, I didn't realize windows is supported. In that case, I believe a check is necessary. I will update the pr.

@advancedxy
Copy link
Contributor Author

@pwendell, would you look at this? It's a fairly simple fix. I don't have windows for primary use, so it's not confirmed on windows. I hope someone who uses windows can confirm this behavior.

@mridulm
Copy link
Contributor

mridulm commented Aug 15, 2014

Is this relying on documented behaviour or observed evidence ?
If latter, it is iffy at best to include this.
On 15-Aug-2014 1:29 pm, "叶先进" [email protected] wrote:

@pwendell https://github.com/pwendell, would you look at this? It's a
fairly simple fix. I don't have windows for primary use, so it's not
confirmed on windows. I hope someone who uses windows can confirm this
behavior.


Reply to this email directly or view it on GitHub
#1946 (comment).

@pwendell
Copy link
Contributor

I do agree with @mridulm it's a somewhat risky change so I think it's out of scope to put it in 1.1. I think that it's a native method with no stated guarantees about ordering.

@advancedxy
Copy link
Contributor Author

@pwendell @mridulm no, this behavior is not documented. The java SE doc doesn't say anything about ordering.. So before filing the JIRA, I looked up the OpenJDK source code, both 6 and 7.
Maybe I should post some snippets here.
In openjdk-6-src-b27,
jdk/src/windows/native/java/net/NetworkInterface.c Line335-

        /*
         * Put the interface at tail of list as GetIfTable(,,TRUE) is
         * returning the interfaces in index order.
         */
        count++;
        if (netifP == NULL) {
            netifP = curr;
        } else {
            netif *tail = netifP;
            while (tail->next != NULL) {
                tail = tail->next;
            }
            tail->next = curr;
        }

jdk/src/solaris/native/java/net/NetworkInterface.c function *addif Line1000

    /*
     * If "new" then create an netif structure and
     * insert it onto the list.
     */
    if (currif == NULL) {
        currif = (netif *)malloc(sizeof(netif));
        if (currif) {
            currif->name = strdup(name);
            if (currif->name == NULL) {
                free(currif);
                currif = NULL;
            }
        }
        if (currif == NULL) {
            JNU_ThrowOutOfMemoryError(env, "heap allocation failed");
            return ifs;
        }
        currif->index = index;
        currif->addr = NULL;
        currif->childs = NULL;
        currif->virtual = isVirtual;
        currif->next = ifs;
        ifs = currif;
    }

Of course, the whole flow is more complicated, but you can get the idea.
Source code in OpenJDk7 is similar, I won't post it here as this is a long comment.

@advancedxy
Copy link
Contributor Author

And @pwendell, I don't think this is risky. It's a minor change and at least it don't do worse.
The current implementation doesn't pick the more proper ip on my laptop(mac os x) or a Debian.
Of course, if you don't want to put this into 1.1, I totally get it. But I hope this can be made into master and later release.

@pwendell
Copy link
Contributor

By risky I mean this - many existing users of Spark might depend on the current behavior, so if we suddenly chose a different interface it will break existing behavior and it could be non-obvious to users why this is happening.

@mridulm
Copy link
Contributor

mridulm commented Aug 16, 2014

Thanks for digging into this !

From what I see, this is an implementation detail of a specific jdk version
(order in which syscall returns and order in which java handles it) and
subject to arbitrary change : I am not sure why the order is the way it is
as mentioned by you, but there might be some rationale for it .... would be
good to raise a jdk bug and get it clarified.

Which is not to say we can't include it in some future release of spark ,
if it makes end-user life easier - in case it is a jre limitations : as
Patrick mentioned, we need to ensure it does not break for existing users.
On 16-Aug-2014 9:41 am, "叶先进" [email protected] wrote:

@pwendell https://github.com/pwendell @mridulm
https://github.com/mridulm no, this behavior is not documented. The
java SE doc doesn't say anything about ordering.. So before filing the
JIRA, I looked up the OpenJDK source code, both 6 and 7.
Maybe I should post some snippets here.
In openjdk-6-src-b27,
jdk/src/windows/native/java/net/NetworkInterface.c Line335-

    /*
     * Put the interface at tail of list as GetIfTable(,,TRUE) is
     * returning the interfaces in index order.
     */
    count++;
    if (netifP == NULL) {
        netifP = curr;
    } else {
        netif *tail = netifP;
        while (tail->next != NULL) {
            tail = tail->next;
        }
        tail->next = curr;
    }

jdk/src/solaris/native/java/net/NetworkInterface.c function *addif Line1000

/*
 * If "new" then create an netif structure and
 * insert it onto the list.
 */
if (currif == NULL) {
    currif = (netif *)malloc(sizeof(netif));
    if (currif) {
        currif->name = strdup(name);
        if (currif->name == NULL) {
            free(currif);
            currif = NULL;
        }
    }
    if (currif == NULL) {
        JNU_ThrowOutOfMemoryError(env, "heap allocation failed");
        return ifs;
    }
    currif->index = index;
    currif->addr = NULL;
    currif->childs = NULL;
    currif->virtual = isVirtual;
    currif->next = ifs;
    ifs = currif;
}

Of course, the whole flow is more complicated, but you can get the idea.
Source code in OpenJDk7 is similar, I won't post it here as this is a long
comment.


Reply to this email directly or view it on GitHub
#1946 (comment).

@advancedxy
Copy link
Contributor Author

@mridulm, could you give me the link to report a jdk bug? It's very confusing that jdk has many different jira place (OpenJDK jira, Oracle etc.).

After second thought, maybe it's a bad idea to rely on the output order. Maybe we should iterate over all interfaces and get the best(or reasonable) ip.

@mridulm
Copy link
Contributor

mridulm commented Aug 17, 2014

Not sure, https://bugs.openjdk.java.net/browse/JDK ?

On Sun, Aug 17, 2014 at 8:54 AM, 叶先进 [email protected] wrote:

@mridulm https://github.com/mridulm, could you give me the link to
report a jdk bug? It's very confusing that jdk has many different jira
place (OpenJDK jira, Oracle etc.).

After some second thought, maybe it's a bad idea to rely on the output
order. Maybe we should iterate over all interfaces and get the best(or
reasonable) ip.


Reply to this email directly or view it on GitHub
#1946 (comment).

@SparkQA
Copy link

SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@pwendell
Copy link
Contributor

pwendell commented Sep 8, 2014

On this one I'm a little torn - mostly because I've directly experienced this issue and also I've seen users run into it, so I think it's a fairly severe issue in terms of how we chose the default. The current behavior is indeed backwards in the most common case (someone running oracle or openJDK on Linux/Unix).

For that reason, I'm inclined to merge this and see if anyone runs into any case where this somehow regresses behavior, and if so we can consider reverting it for 1.2. When we are considering things like this it's generally good to merge them early in a release cycle so it gets a lot of testing.

@advancedxy
Copy link
Contributor Author

I'd love to see this get merged into 1.2.

@vanzin
Copy link
Contributor

vanzin commented Sep 10, 2014

I'm not for or against the fix, but just wanted to point out that event the order the OS returns the interfaces is not really reliable. On my machine, where I constantly switch between wired and wireless, and often have both on, the order of eth0 vs. wlan0 seems to depend on the order they got connected more than anything.

So sure, the change sounds fine if it helps with a common case (or at least to make windows and unix behave more alike?), but really I don't see how anyone can rely on any behavior here since there is really no expected behavior of the underlying api.

@pwendell
Copy link
Contributor

In that case I'd propose merging this tentatively and if it causes issues in the 1.2 dev/QA cycle we can revert it.

I dug around a bunch, it looks like there really isn't clear behavior here, even within Linux variants because the code used to do this had to change a lot for IPv6 support.There isn't an ordering specification anywhere I can find in the internal API's in BSD/POSIX sockets that relate to this.

So it could be that this patch helps some and hurts others. As such, I'd be inclined to try it out and rollback if we find anyone is hurt by this (if it's not a strict win then we should bias towards the old behavior).

@pwendell
Copy link
Contributor

Alright - let's see how this goes and we can rollback if we see any issues.

@asfgit asfgit closed this in febafef Sep 16, 2014
szehon-ho added a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
… they are not equal (apache#1946)

-- Allow SPJ between 'compatible' bucket funtions
-- Add a mechanism to define 'reducible' functions, one function whose output can be 'reduced' to another for all inputs.

  ### Why are the changes needed?
-- SPJ currently applies only if the partition transform expressions on both sides are identifical.

  ### Does this PR introduce _any_ user-facing change?
No

  ### How was this patch tested?
Added new tests in KeyGroupedPartitioningSuite

  ### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#45267 from szehon-ho/spj-uneven-buckets.

Authored-by: Szehon Ho <[email protected]>

Signed-off-by: Chao Sun <[email protected]>
Co-authored-by: Szehon Ho <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants