
Embeddings of all nodes are not obtained #11

Open
ayushidalmia opened this issue Jun 1, 2017 · 16 comments

Comments

@ayushidalmia

Hi, I was trying to run this on a graph. However, the output files vec_1st.txt, vec_2nd.txt, and vec_all.txt do not contain embeddings for all the nodes in the original input graph.

Can you tell me where I might be going wrong, or what causes this behavior?

@Cheng-CZ

Same issue with me. Some nodes are missing.

@zhujiangang

If you have read the code, you may have found that the training instances are sampled from the graph, so the edges of low-degree vertices may never be sampled during training. This is why some nodes are missing from the final embedding result.
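To make the sampling argument concrete, here is a minimal sketch of weight-proportional edge sampling. This is a simplified stand-in using linear search over cumulative weights, not the repository's actual sampler (LINE uses the faster alias method); the function name is hypothetical.

```c
/* Sketch (not the repo's code): training edges are drawn with
 * probability proportional to edge weight, so the few low-weight
 * edges of a low-degree vertex are rarely visited during training. */

/* Return the index of a sampled edge given a uniform draw r in [0,1),
 * via linear search over cumulative weights. */
int sample_edge(const double *weight, int num_edges, double r) {
    double total = 0.0, acc = 0.0;
    for (int i = 0; i < num_edges; i++) total += weight[i];
    for (int i = 0; i < num_edges; i++) {
        acc += weight[i];
        if (r * total < acc) return i;
    }
    return num_edges - 1;  /* guard against floating-point round-off */
}
```

With weights {1, 1, 8}, the third edge is sampled 80% of the time, so the endpoints of the first two edges see far fewer updates.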

@mongooma

mongooma commented Aug 29, 2017

@zhujiangang The embeddings are initialized up front in InitVector(), so even if some edges are never sampled, those nodes still have embeddings. I didn't have this issue when using LINE, so I wonder what caused your problem.
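For context, a sketch of what an initializer like InitVector() typically does: every vertex gets a small random vector before training starts, so an unsampled vertex still has an embedding (it just never moves from its initial value). The exact scaling constants here are an assumption, not copied from the repository.

```c
#include <stdlib.h>

/* Fill the embedding table with small random values in
 * [-0.5/dim, 0.5/dim], one dim-sized row per vertex. */
void init_vector(float *emb, int num_vertices, int dim) {
    for (long long a = 0; a < (long long)num_vertices; a++)
        for (int b = 0; b < dim; b++)
            emb[a * dim + b] = (rand() / (float)RAND_MAX - 0.5f) / dim;
}
```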

@jiay302

jiay302 commented Sep 15, 2017

@gooeyforms Could you help me use this LINE model? I have run into a problem: I followed the train_youtube command and set the binary parameter to 0, but the first column of the result file contains floating-point numbers instead of the original vertex IDs. I am very confused.

@mongooma

mongooma commented Sep 15, 2017 via email

@jiay302

jiay302 commented Sep 18, 2017

@gooeyforms Could you share your email with me? I have some questions to ask you. I am a student at Beijing University of Posts and Telecommunications, and I am looking forward to your help. Thank you very much.

@yyr93520

Have you solved it? I met the same problem. Thanks!

@pickou

pickou commented Mar 13, 2018

I have met the same problem. Using the BlogCatalog data, the embedding should contain 10312 nodes, but LINE only returns 10263.

@mongooma

@pickou Could you run the code again with -binary 1 and count the entries in the binary embedding file, e.g. using wc -l *.embedding?
I have run LINE on the BlogCatalog dataset with -binary 1 and this issue didn't occur, but I'm having trouble with -binary 0.

@pickou

pickou commented Mar 13, 2018

In my case the issue is the same. When I use wc -l line.emb, I get 27420 and 27501 in two runs of LINE with the same parameters.

@pickou

pickou commented Mar 14, 2018

@mongooma Have you converted the graph into an undirected one? I have made that change, turning

1 2
3 5

into

1 2
2 1
3 5
5 3
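That conversion can be sketched as a small filter over a whitespace-separated edge list: read each "src dst" pair and emit both directions. The function name is hypothetical, not part of the repository.

```c
#include <stdio.h>

/* Read "src dst" pairs from in and write both edge directions to out,
 * turning a directed edge list into an undirected one.
 * Returns the number of input edges processed. */
int symmetrize(FILE *in, FILE *out) {
    char v1[64], v2[64];
    int n = 0;
    while (fscanf(in, "%63s %63s", v1, v2) == 2) {
        fprintf(out, "%s %s\n%s %s\n", v1, v2, v2, v1);
        n++;
    }
    return n;
}
```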

@mongooma

@pickou I did. I don't know what caused the issue. I suggest you set breakpoints or print lines to debug the code. Please let me know when you locate the problem.

@pickou

pickou commented Mar 15, 2018

@mongooma I have found what caused the issue: when the edges are read from the file, each one must carry a weight.

fscanf(fin, "%s %s %lf", name_v1, name_v2, &weight);

So an input like

1 2
3 5

must become

1 2 1
2 1 1
3 5 1
5 3 1
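That preprocessing step can be sketched as another small filter, assuming an unweighted whitespace-separated edge list: append a unit weight to every edge so the file matches the "%s %s %lf" format that ReadData() expects. The function name is hypothetical.

```c
#include <stdio.h>

/* Read "src dst" pairs from in and write "src dst 1" lines to out,
 * giving every edge the unit weight that ReadData() requires.
 * Returns the number of edges processed. */
int add_unit_weights(FILE *in, FILE *out) {
    char v1[64], v2[64];
    int n = 0;
    while (fscanf(in, "%63s %63s", v1, v2) == 2) {
        fprintf(out, "%s %s 1\n", v1, v2);
        n++;
    }
    return n;
}
```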

You'd better warn people about that, or add a parameter such as weighted so the code can handle both weighted and unweighted graphs.

@mongooma

mongooma commented Mar 15, 2018

@pickou I'm glad you located the problem. However, I still don't understand why it would cause the varying results across runs that you described. And even though I think the original input format is explicit enough for all types of graphs, a separate script to handle different input formats is definitely a good idea.
At this point, you could submit a pull request adding a warning line to the Readme file.

@pickou

pickou commented Mar 15, 2018

@mongooma I don't know either, but I have traced the ReadData() function. When I use an unweighted graph as input, like

1 2
2 1
3 1
1 3

and print name_v1 and name_v2, I sometimes get "1\100\066" instead of "1". I think the issue comes from here.
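One way the format mismatch corrupts parsing can be demonstrated directly: with no weight column, the %lf conversion skips the newline and swallows the next line's first vertex id as the "weight", shifting every subsequent read by one token. A minimal sketch (the helper name is hypothetical; it only isolates the fscanf call):

```c
#include <stdio.h>

/* Parse one edge using the same format string as ReadData().
 * Buffers name_v1/name_v2 must hold at least 64 bytes.
 * Returns the parsed weight so the caller can see what %lf consumed. */
double read_first_edge(FILE *fin, char *name_v1, char *name_v2) {
    double weight = 0.0;
    /* On unweighted input "1 2\n2 1\n", %lf skips the newline and
     * reads the "2" that begins the second line as the weight. */
    fscanf(fin, "%63s %63s %lf", name_v1, name_v2, &weight);
    return weight;
}
```

After this call, the next read starts mid-edge, which would explain vertex counts that vary from run to run and garbage in the name buffers when a conversion eventually fails.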

@ZiyaoWu

ZiyaoWu commented Jan 16, 2019

I suppose the reason those nodes are missing is that they have degree zero.
