Comparing large JSON files #49

Open
FHantke opened this issue Jul 25, 2022 · 1 comment

Comments

FHantke commented Jul 25, 2022

Hi and thanks for the great library.
My use case is comparing DOM trees, represented as JSON files, to find the differences between two similar webpages. Unfortunately, I have problems comparing two large JSON files (>300K): the comparison does not finish (I stopped it after 10 minutes).

I'm not sure whether this is due to a bug in the code, the complexity of the JSON files, or both. While debugging, I noticed that many elements are compared multiple times with the same element (or even with themselves). One example is the following element from diff1.json when it is compared to diff2.json (Example files):

diff1["childNodes"][0]["childNodes"][1]["childNodes"][29]["childNodes"][18]["childNodes"][0]["childNodes"][0]["childNodes"][1]["childNodes"][0]["childNodes"][1]["childNodes"][1]["childNodes"][0]["childNodes"][0]["childNodes"][2]["childNodes"][40]["childNodes"][0]
{'nodeName': '#text',
 'nodeValue': 'Tienda Kindle',
 'childNodes': [],
 'attributes': {}}
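To make the repeated comparisons measurable, the comparison can be wrapped in a counter keyed on the identity of the two operands. The sketch below is not the library's code: `similarity`, `diff`, `chain`, and `count_leaf_comparisons` are hypothetical stand-ins that assume a diff which matches list items by a recursive similarity score and then recurses into each matched pair. Under that assumption, a leaf pair is scored once per ancestor level, and a memo table (valid only while both documents stay alive and unmodified) brings that back to one.

```python
from collections import Counter

calls = Counter()  # (id(a), id(b)) -> number of times the pair was scored
cache = {}         # memo table, only used when memoize=True


def similarity(a, b, memoize=False):
    """Hypothetical recursive similarity score in [0, 1]."""
    key = (id(a), id(b))
    if memoize and key in cache:
        return cache[key]
    calls[key] += 1
    if isinstance(a, dict) and isinstance(b, dict):
        shared = a.keys() & b.keys()
        total = len(a.keys() | b.keys())
        score = sum(similarity(a[k], b[k], memoize) for k in shared) / total if total else 1.0
    elif isinstance(a, list) and isinstance(b, list):
        if a and b:
            # Naive matching: score every item of a against every item of b.
            best = [max(similarity(x, y, memoize) for y in b) for x in a]
            score = sum(best) / max(len(a), len(b))
        else:
            score = 1.0 if not a and not b else 0.0
    else:
        score = 1.0 if a == b else 0.0
    if memoize:
        cache[key] = score
    return score


def diff(a, b, memoize=False):
    """Hypothetical diff: match list items by similarity, then recurse."""
    if isinstance(a, dict) and isinstance(b, dict):
        return {k: diff(a[k], b[k], memoize) for k in a.keys() & b.keys()}
    if isinstance(a, list) and isinstance(b, list):
        out = []
        for x in a:
            if b:
                y = max(b, key=lambda c: similarity(x, c, memoize))
                out.append(diff(x, y, memoize))  # re-scores everything below x
        return out
    return None if a == b else (a, b)


def chain(depth, text):
    """Build a text node wrapped in `depth` nested DIV nodes."""
    node = {"nodeName": "#text", "nodeValue": text, "childNodes": [], "attributes": {}}
    for _ in range(depth):
        node = {"nodeName": "DIV", "nodeValue": None, "childNodes": [node], "attributes": {}}
    return node


def leaf(node):
    """Follow the first child down to the innermost node."""
    while node["childNodes"]:
        node = node["childNodes"][0]
    return node


def count_leaf_comparisons(depth, memoize=False):
    """Diff two chains and report how often the two leaf nodes were scored."""
    calls.clear()
    cache.clear()
    a, b = chain(depth, "Tienda Kindle"), chain(depth, "Kindle Store")
    diff(a, b, memoize)
    return calls[(id(leaf(a)), id(leaf(b)))]


print(count_leaf_comparisons(5))                 # 5: once per ancestor level
print(count_leaf_comparisons(5, memoize=True))   # 1
```

With real DOM trees the branching factor multiplies this effect, since every child is scored against every candidate on the other side at each level. Whether the library behaves this way is exactly the open question here.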

Is there an option in the library for comparing large JSON files, or do you have any recommendation on how to approach this use case?
Thank you!

@rbrisita

To give another example: I have two files of about 200K each, and a command-line comparison with the argument -i 2 completed in about four minutes. The data is not complex: two properties, one of which is an array of objects with two properties each, an int and a short string.
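For anyone who wants to reproduce this without the original files, data of roughly that shape can be generated with the standard library. This is only a sketch: the property names (`name`, `items`, `id`, `label`), the string length, and the item count are assumptions, not taken from the real files, and `n_items` has to be tuned until the serialized size is in the range of interest.

```python
import json
import random
import string


def make_doc(n_items, seed):
    """Build a document with two properties, one an array of {int, short string} objects."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return {
        "name": "example",  # hypothetical second property
        "items": [
            {
                "id": rng.randrange(1_000_000),
                "label": "".join(rng.choices(string.ascii_lowercase, k=8)),
            }
            for _ in range(n_items)
        ],
    }


def size_bytes(doc, indent=2):
    """Size of the document once serialized as UTF-8 JSON."""
    return len(json.dumps(doc, indent=indent).encode("utf-8"))


left = make_doc(3000, seed=1)
right = make_doc(3000, seed=2)  # same shape, different values
print(size_bytes(left), size_bytes(right))
```

Dumping `left` and `right` to two files with `json.dump` gives a pair of inputs for the command-line comparison.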
