Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Any possibility for .map() or .forEach()? #2

Closed
fidian opened this issue Apr 6, 2017 · 21 comments
Closed

Any possibility for .map() or .forEach()? #2

fidian opened this issue Apr 6, 2017 · 21 comments

Comments

@fidian
Copy link

fidian commented Apr 6, 2017

I am parsing some search results for issues and I get back a structure like this:

[
    {
        user: {
            id: 12345,
            name: "Tyler Akins"
        },
        ... other properties that aren't useful for this example ...
    },
    {
        user: {
            id: 12345,
            name: "Tyler Akins"
        },
        ... other properties that aren't useful for this example ...
    }
]

The object content is the same content but are different objects. My code combines them so they point to the same object instance in memory. What I would love is a way to have code that does something like this:

dotWild.map(data, "*.user", (item) => remapUserToOneObject(item.user));

Another possible solution, though less useful for my particular case

dotWild.forEach(data, "*", (item) => {
    item.user = remapUserToOneObject(item.user);
}));

Currently I have a list of several wildcards and I'm splitting on ".", taking off the last chunk and using the rest.

[
    "*.user",
    "*.project",
    ... many more ...
].forEach((wildcardPattern) => {
    var propertyName, list;

    wildcardPattern = wildcardPattern.split(".");
    propertyName = wildcardPattern.pop();
    list = dotWild.get(data, wildcardPattern);

    // Due to how things work, the wildcard pattern may not have a *
    // so this might not be an array.
    list = [].concat(list);
    list.forEach((item) => {
        item[propertyName] = remapToOneObject(propertyName, item[propertyName]);
    });
});

As you can see, the above example would be a couple lines shorter with .forEach() and greatly simplified with support for .map.

// .map() example
[
    "*.user",
    "*.project",
    ... many more ...
].forEach((wildcardPattern) => {
    var propertyName;

    propertyName = wildcardPattern.match(/\.([^.]+)$/)[0];
    dotWild.map(data, wildcardPattern, (item) => remapToOneObject(propertyName, item));
})
@wadackel
Copy link
Owner

wadackel commented Apr 7, 2017

Hi @fidian, Thanks for Issues.

As you mentioned, map() and forEach() may be useful in some cases.
I'd like to update the version in the near future 😃

@wadackel
Copy link
Owner

Hi @fidian. I'm sorry I'm late.
forEach() and map() have been added to v1.1.0 🎉

Please check when you have time 🙏

@fidian
Copy link
Author

fidian commented Apr 10, 2017

Unfortunately, the key is useless to me. It doesn't give me the path into the object and it doesn't give me the parent object as well. Your examples provide accurate values, but the key is just a string. I don't know what object contains that string and I can't modify that object directly.

Consider an Array example:

var a = ['zero', 'one', 'two', 'three'];
a.forEach((value, index) => {
    console.log(`I know the value ${value} corresponds to a[${index}]: ${a[index]}`)
});

Now let us use your forEach method to replicate the same thing.

// Uses the example data from the readme
dot.forEach(postData, 'data.posts.*.id', (value, key) => {
    console.log(`Value is ${value}, key is ${key}, but I don't know what object contains that key.`);
})

Having the key supplied does not appear to have any purpose. If you were to pass the parent object along or the full key, that would get us somewhere.

// Uses the example data from the readme

// If key is "data.posts.0.id" I could ...
dot.forEach(postData, 'data.posts.*.id', (value, key) => {
    console.log(`Value ${value} was found at position ${key}.`);
    var mostOfTheKey = key.replace(/\.[^\.]*$/, ''); // Remove the last bit of the key
    dot.forEach(postData, mostOfTheKey.join("."), (containingObject) => {
        // Now I have the containing object and can modify things.
        containingObject.moreData = "additional data";
        containingObject[key.replace(/^(.*\.)?[^.]*$/, '')] = "alternate id";
    });
});

// If you add the containing object to the list of parameters and keep key as-is ...
dot.forEach(postData, 'data.posts.*.id', (value, key, containingObject) => {
    console.log(`Value ${value} was found at position ${key}.`);
    // Now I have the containing object and can modify things.
    containingObject.moreData = "additional data";
    containingObject[key] = "alternate id";
});

The map() function would benefit from the containing object as well, allowing you to remap values in the output array by pulling in other, nearby fields.

@wadackel
Copy link
Owner

@fidian Thanks for confirmation and reply.
It seems necessary to change the specification...


What do you think of the following ideas?

dot.forEach(postData, 'data.posts.*.id', (value, key, path, data) => {
  console.log(key); // => 'id'
  console.log(path); // => 'data.posts.0.id'
});

We think that convenience will be improved by giving the full path that can access the most recent key and object.

@fidian
Copy link
Author

fidian commented Apr 10, 2017

That looks good, especially if this is true:

dot.forEach(postData, 'data.posts.0.id', (value, key, path, data) => {
    console.log(path); // 'data.posts.0.id'
    console.log(data === postData.data.posts[0].id); // true
});

@wadackel
Copy link
Owner

@fidian It is reasonable to pass the original array to the variable data. (Ref: Array.prototype.forEach())

I would like to implement it with the following contents if possible.

dot.forEach(postData, 'data.posts.0.id', (value, key, path, data) => {
    console.log(path); // => 'data.posts.0.id'
    console.log(key); // => 'id'
    console.log(value === postData.data.posts[0].id); // => true
    console.log(dot.get(postData, path) === value); // => true
    console.log(data); // => postData...
});

Thanks.

@fidian
Copy link
Author

fidian commented Apr 11, 2017

That implementation would give me everything I need, so it certainly meets what I want. Also, I like maintaining an interface similar to Array.prototype.forEach(), so that's good too.

@wadackel wadackel mentioned this issue Apr 13, 2017
@wadackel
Copy link
Owner

Hi @fidian, Changed the specification of forEach() and map() in v1.1.1.
Please check when you have time 😃

@fidian
Copy link
Author

fidian commented Apr 14, 2017

Why is the fourth parameter the same as the object going in? I hope I wasn't unclear earlier when I was trying to describe that I would really need access to the object that contains the value.

// What is currently happening for me:
dot.forEach(postData, 'data.posts.*.id', (value, key, path, data) => {
    console.log(data === postData); // true - unhelpful because it's already in scope
});

// What I would like:
dot.forEach(postData, 'data.posts.*.id', (value, key, path, data) => {
    console.log(data === postData); // false
    console.log(data === postData.data.posts[0] || data === postData.data.posts[1]); // true
});

That would give me the object containing the key property, and data[key] === value.

Also, the documentation is a bit weird. I would expect the docs to say the following.

console.log(data); // postData.data.posts[0] or postData.data.posts[1]

@wadackel
Copy link
Owner

I am not good at English and may be misinterpreting.
I believe that the following uses are suitable.

  • value = Value for the specified path.
  • key = The most recent key.
  • path = Path of value in data.
  • data = The original data that we are referring to. (Ref: thisArg)

I think that data should contain the original data.

dot.forEach(postData, 'data.posts.*.id', (value, key, path, data) => {
  assert.deepStrictEqual(postData, data);
});

The contents you want to realize can be realized with the following code.

dot.forEach(postData, 'data.posts.*', (value, key, path, data) => {
    console.log(value === postData.data.posts[0] || value === postData.data.posts[1]); // true
});

@fidian
Copy link
Author

fidian commented Apr 14, 2017

I'll try to word it a few different ways and we'll see what we come to.

Topic 1: data === postData

It seems that the data value is meaningless because it is in scope. There's no point in passing it along when I can just use postData from outside. I can simply use postData.

dot.forEach(postData, 'data.posts.*', (value, key, path) => {
    console.log("I still have the original: " + JSON.stringify(postData));
});

If there's a context where you lose access to postData, you could simulate it using bind().

function logThings(data, value, key, path) {
    console.log("I still have the original: " + JSON.stringify(data));
}

(function () {
    postData = { ... };
    dot.forEach(postData, 'data.posts.*', logThings.bind(postData));
});

So, because you can always get that original chunk of data, there's no reason to pass it to the callback. It does mimic the Array.prototype.forEach() a bit, but the parameter order is wacky. If you wanted to match it, the callback should use (value, key, data, path).

I am unable to find any functional reason to pass the original object to the callback.

Topic 2: Easy retrieval of parent object

My goal is to have a pattern to define object relationships in a huge Jira dump. Right now the patterns look like this:

relationships = [
    {
        match: "issues.*.fields.assignee",
        collectionName: "users",
        idProperty: "name"
    },
    {
        match: "issues.*.fields.versions.*",
        collectionName: "versions",
        idProperty: "id"
    },
    // many, many more.
];

This says I want to map all assignees of all of the issues to a single users collection. This will save memory, which is wonderful. I'm loading a lot of data and removing redundant information is very desirable.

Some fields contain objects and some fields contain arrays of objects, which is why I have the wildcard in different spots. Here's a fictional example:

bigData = {
    issues: {
        "12345": {
            fields: {
                assignee: {
                    name: "tyler.akins",
                    displayName: "Tyler Akins"
                },
                summary: "A fake issue",
                versions: [
                    {
                        id: "148956",
                        name: "1.0.1"
                    },
                    {
                        id: "776245",
                        name: "2.0.0"
                    }
                ]
            }
        }
    }
};

When another issue cites the same assignee, that data is duplicated. It is not the same object, so bigData.issues["12345"].fields.assignee may look like bigData.issues["23456"].fields.assignee but they are not === to each other. My goal is to cycle through the relationships I've defined and collect the different object, replacing them with a single object that is referenced in all sorts of places.

Keep in mind that this, when deduplicated, is about 17 megs of very ugly JSON. I have a lot of duplication removed to bring it down to 17 megs and hundreds of thousands of issues.

Ideally, I would iterate across the relationships, in effect I would do this:

function deduplicate(obj, idProp, collection) {
    var id = obj[idProp];
    
    // If not in collection, add to the collection
    if (!collection[id]) {
        collection[id] = obj;
    }
    
    // Return the collection member
    return collection[id];
}

function updateObject(parentObject, key, rel) {
    // Get the accumulator of the deduplicated objects
    var collection = bigData[rel.collectionName] || (bigData[rel.collectionName] = {});
    
    // Next, call the deduplicator
    var objectToUse = deduplicate(value, rel.idProperty, collection);
    
    // Update the parent object to put this back in
    parentObj[key] = objectToUse;
}

relationships.forEach((rel) => {
    dot.forEach(bigData, rel.match, (value, key, path) => {
        var parentObject = ......?
        updateObject(parentObject, key, rel);
    });
});

The solutions you propose mean I lose track of the property I am aggregating, which could be a wildcard and I'd have to resort to scanning all the properties again. This makes dot-wild a bit useless to me. This might be a difficult point for me to explain, but the point of the relationships object is to effectively say "everything matching this wildcard should be placed in a collection" or "target all of the elements that I'm specifying".

I feel that key is an utterly useless value when you don't have the corresponding object.

With the above example, there's only one tricky part and it's finding the parent. Here's my two ideas:

// First idea: allow the parent object to be passed to the callback.
dot.forEach(bigData, rel.match, (value, key, path, parentObject) => {
    updateObject(parentObject, key, rel);
});

 
// Second idea, split the wildcard pattern that was created and scan myself.
var match = rel.match.split(".");
var lastPart = match.pop();
match = match.join(".");
dot.forEach(bigData, match, (parentObject, key, path) => {
    var value = parentObject[lastPart];
    if (value) {
        if (lastPart === "*") {
            value.forEach((item, itemKey) => {
                updateObject(parentObject, itemKey, rel);
            });
        } else {
            updateObject(parentObject, key, rel);
        }
    }
});

If the objects were in a DOM node, I would traverse up a parent link and this wouldn't be an issue.

I can't just remove key from path and search again because there's that magic dot. What if patterns don't have that dot? Also, I don't think it is as simple as just searching for a dot because you can escape strings.

Do you see a more elegant solution than passing the parent object to the callback? Maybe an array of parents?

Is it possible for key to be useful without a reference to that object?

An alternate could be to resolve a path, but this seems unwieldy.

dot.forEach(bigData, match, (value, key, path) => {
    var parentObject = dot.resolve(bigData, path, ".."); // or whatever notation
    updateObject(parentObject, key, rel);
});

I hope that this explains my position well. If not, I'd love to hear your viewpoints and arguments against my claims.

@fidian
Copy link
Author

fidian commented Apr 14, 2017

Come to think about it, the Array.prototype.forEach() model could be followed another way.

arr = [];
arr.forEach((value, index, originalArray) => {
    // value is the item in the array
    // index and originalArray are related. originalArray[index] === value
    // Last argument is related to the thing we're iterating over. originalArray === arr
})

// Maybe this would work?
dot.forEach(data, pattern, (value, key, parentObject, fullPath, topLevelData) => {
    // value is as discussed earlier
    // key and parentObject are related. parentObject[key] === value
    // fullPath and topLevelData are related. dot.get(topLevelData, fullPath) === value
    // Last argument is related to the thing we're searching. topLevelData === data
});

@wadackel
Copy link
Owner

Hi @fidian, As you said, the following code may be more intuitive when simulating Array.prototype.forEach().

dot.forEach(data, pattern, (value, key, parentObject, fullPath, topLevelData) => {
  // ...
});

I will implement it.


Would you please check the function before it releases next time if possible?
(This is because versioning goes up unnecessarily.)

@fidian
Copy link
Author

fidian commented Apr 24, 2017

Would I check the feature? Certainly. I had missed your discussion point 13 days ago and thought it provided what I had expected, so I am sorry that happened.

@wadackel
Copy link
Owner

Hi @fidian, Thank you for your reply.

I'm planning to complete the development within this week.
I will inform you as soon as I can test it.

I'm sorry, please wait for a while 🙏

@fidian
Copy link
Author

fidian commented Apr 24, 2017 via email

@wadackel
Copy link
Owner

Thank you for waiting.

I changed the argument of the callback.
Could you test using the develop branch?

The following is an example of setting package.json.

{
  "dependencies": {
    "dot-wild": "tsuyoshiwada/dot-wild#develop"
  }
}

Please refer here for the document.

@fidian
Copy link
Author

fidian commented Apr 25, 2017

Looks like a winner. I've just put it into my code and it's working splendidly. Thanks!

@wadackel
Copy link
Owner

Thank you for confirmation 👍 !!
I will publish it to npm after a while 🎉

@fidian
Copy link
Author

fidian commented Apr 25, 2017

I really appreciate the effort you went though and the willingness you've shown to help me out even when I was giving you mixed messages. Thank you very much!

@wadackel
Copy link
Owner

I just published it to npm 🎉

Thank you too !
I feel that it became an useful library thanks to your advice 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants