Sample Code returns empty #46

Closed
pasa13142 opened this issue Dec 26, 2019 · 17 comments

@pasa13142

pasa13142 commented Dec 26, 2019

from aquiladb import AquilaClient as acl
db = acl('localhost', 50051)

sample = db.convertDocument([0.1,0.2,0.3,0.4], {"hello": "world"})

db.addDocuments([sample])
vector = db.convertMatrix([0.1,0.2,0.3,0.4])

k = 10
result = db.getNearest(vector, k)

This is the sample code from https://github.com/a-mma/AquilaDB/wiki/Get-started-with-AquilaDB . When I run it, it returns an empty list, something like:

status: true
documents: "[]"

Any idea?

@pasa13142 pasa13142 changed the title Permission Denied while docker pull Sample data returns empty Dec 26, 2019
@pasa13142 pasa13142 reopened this Dec 26, 2019
@pasa13142 pasa13142 changed the title Sample data returns empty Sample Code returns empty Dec 26, 2019
@sopaoglu
Contributor

sopaoglu commented Dec 26, 2019

I am also struggling with the same problem. The following example works better:
https://github.com/a-mma/AquilaDB-Examples/blob/master/MNIST_example/1.%20Introduction%20to%20AquilaDB%20-%20MNIST%20Python.ipynb

@freakeinstein
Member

At least vecount documents (as configured in DB_config.yml) must be indexed before the first run of getNearest(). Please let me know if you still get the error after indexing vecount documents. To debug, you can also try the steps mentioned here: https://github.com/a-mma/AquilaDB/issues/45#issuecomment-569086001
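
A minimal sketch of that warm-up step, assuming only the client methods that appear in the sample code above (`convertDocument`, `addDocuments`) and a response object with a `status` attribute. The helper name `index_at_least` and its parameters are hypothetical and not part of the AquilaDB client.

```python
def index_at_least(db, vectors, vecount):
    """Index vectors one at a time and report whether at least `vecount` were accepted.

    `db` is any object exposing convertDocument() and addDocuments() like the
    client in the sample code. `index_at_least` is a hypothetical helper.
    """
    accepted = 0
    for i, vector in enumerate(vectors):
        doc = db.convertDocument(vector, {"idx": str(i)})
        response = db.addDocuments([doc])
        if response.status:
            accepted += 1
    # Only query once the configured minimum has been accepted.
    return accepted, accepted >= vecount


# Usage against a running instance (not executed here):
# from aquiladb import AquilaClient as acl
# db = acl('localhost', 50051)
# vectors = [[i + 0.1, i + 0.2, i + 0.3, i + 0.4] for i in range(150)]
# accepted, ready = index_at_least(db, vectors, vecount=100)
```

If `ready` comes back false, getNearest() is expected to return an empty list no matter what is queried.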

@freakeinstein
Member

No activity, closing. Please reopen if the issue persists.

@NikolaiPohodenko

NikolaiPohodenko commented Jan 10, 2020

The issue persists.
vecount is 100
After adding 250 documents, getNearest() still comes back empty.

The logs show an uninitialized JS object:

1|peer_manager  |  TypeError: Cannot read property 'rows' of undefined
1|peer_manager  |     at /AquilaDB/src/p2p/routing_table/index.js:157:34

@NikolaiPohodenko

No activity, closing. Please reopen if the issue persists.

Please reopen.
I cannot reopen it myself, can only create a new one.
AFAIK, only a repository collaborator can reopen this issue.

@freakeinstein freakeinstein reopened this Jan 10, 2020
@freakeinstein
Member

@NikolaiPohodenko could you please provide more details to reproduce the issue?

  1. which docker image you are using
  2. the client code you used to index and query the data (so that I can run the code myself)
  3. which operating system you are using
  4. more logs from vecdb and vecstore (not from peer_manager)
  5. any other information that will help me test it myself

@sopaoglu
Contributor

sopaoglu commented Jan 10, 2020

If you change the config while the container is running, you need to restart the container:
docker restart <container id>

@freakeinstein freakeinstein self-assigned this Jan 10, 2020
@freakeinstein freakeinstein added bug Something isn't working high priority help wanted Extra attention is needed and removed bug Something isn't working labels Jan 10, 2020
@NikolaiPohodenko

  1. docker image is "latest" from 07 Jan 2020.
Digest: sha256:29bb80d259e17d754a9ad283eb308fdf4e1e64cdde03c3142c9193c0c69ada25
Status: Downloaded newer image for ammaorg/aquiladb:latest
docker.io/ammaorg/aquiladb:latest
  2. the client code is simple
    In one script I populate the db:
from aquiladb import AquilaClient as acl
db = acl('localhost', 50051)

v0 = [0.1, 0.2, 0.3, 0.4]
attempts = 300
success_count = 0

for i in range(attempts):
    v = [i+x for x in v0]
    s = db.convertDocument(v, {"idx": f"{i}"})
    r = db.addDocuments([s])
    
    if r.status:
        success_count += 1
        
print(f"success_count = {success_count} of {attempts}") # 300 of 300

In another script I make KNN requests:

from aquiladb import AquilaClient as acl
db = acl('localhost', 50051)

v0 = [0.11, 0.21, 0.31, 0.41]

success_count = 0
attempts = 300

for i in range(attempts):
    v = [i+x for x in v0]
    m = db.convertMatrix(v)
    r = db.getNearest(m, 10)
    
    if r.status:
        success_count += 1
        
print(f"success_count = {success_count} of {attempts}") # 0 of 300
  3. operating system is Windows 10 Enterprise
  4. more logs
(base) root@2fc5f5700917:/# pm2 logs
[TAILING] Tailing last 15 lines for [all] processes (change the value with --lines option)
/root/.pm2/pm2.log last 15 lines:
PM2        | 2020-01-10T11:34:20: PM2 log: PM2 PID file         : /root/.pm2/pm2.pid
PM2        | 2020-01-10T11:34:20: PM2 log: RPC socket file      : /root/.pm2/rpc.sock
PM2        | 2020-01-10T11:34:20: PM2 log: BUS socket file      : /root/.pm2/pub.sock
PM2        | 2020-01-10T11:34:20: PM2 log: Application log path : /root/.pm2/logs
PM2        | 2020-01-10T11:34:20: PM2 log: Worker Interval      : 30000
PM2        | 2020-01-10T11:34:20: PM2 log: Process dump file    : /root/.pm2/dump.pm2
PM2        | 2020-01-10T11:34:20: PM2 log: Concurrent actions   : 2
PM2        | 2020-01-10T11:34:20: PM2 log: SIGTERM timeout      : 1600
PM2        | 2020-01-10T11:34:20: PM2 log: ===============================================================================
PM2        | 2020-01-10T11:34:20: PM2 log: App [vecdb:0] starting in -fork mode-
PM2        | 2020-01-10T11:34:20: PM2 log: App [vecdb:0] online
PM2        | 2020-01-10T11:34:20: PM2 log: App [peer_manager:1] starting in -fork mode-
PM2        | 2020-01-10T11:34:20: PM2 log: App [peer_manager:1] online
PM2        | 2020-01-10T11:34:20: PM2 log: App [vecstore:2] starting in -fork mode-
PM2        | 2020-01-10T11:34:20: PM2 log: App [vecstore:2] online

/root/.pm2/logs/vecdb-error.log last 15 lines:
/root/.pm2/logs/vecstore-error.log last 15 lines:
/root/.pm2/logs/peer-manager-out.log last 15 lines:
1|peer_man | peer events subscription done
1|peer_man | Example app listening on port 50053!
1|peer_man | OpenError: IO error: /data/default_swarmdb: Invalid argument
1|peer_man |     at /AquilaDB/src/node_modules/levelup/lib/levelup.js:87:23
1|peer_man |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
1|peer_man |     at /AquilaDB/src/node_modules/levelup/node_modules/deferred-leveldown/deferred-leveldown.js:20:21
1|peer_man |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14

/root/.pm2/logs/vecstore-out.log last 15 lines:
2|vecstore | FAISS index loading failed Error in faiss::{anonymous}::FileIOReader::FileIOReader(const char*) at index_io.cpp:136: Error: 'f' failed: could not open /data/model_hf for reading: No such file or directory
2|vecstore | Annoy index loading failed
2|vecstore | Starting server. Listening on port 50052.

/root/.pm2/logs/vecdb-out.log last 15 lines:
0|vecdb    | OpenError: IO error: /data/default_docsdb: Invalid argument
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/lib/levelup.js:87:23
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/node_modules/deferred-leveldown/deferred-leveldown.js:20:21
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
0|vecdb    | OpenError: IO error: /data/default_docsdb: Invalid argument
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/lib/levelup.js:87:23
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/node_modules/deferred-leveldown/deferred-leveldown.js:20:21
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
0|vecdb    | OpenError: IO error: /data/default_docsdb: Invalid argument
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/lib/levelup.js:87:23
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/node_modules/deferred-leveldown/deferred-leveldown.js:20:21
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14

/root/.pm2/logs/peer-manager-error.log last 15 lines:
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34

1|peer_manager  | TypeError: Cannot read property 'rows' of undefined
1|peer_manager  |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_manager  | TypeError: Cannot read property 'rows' of undefined
1|peer_manager  |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
  5. any other information that will help while testing it out
(base) root@2fc5f5700917:/AquilaDB/src# cat DB_config.yml
docs:
  vecount: 100 # minimum data required to start indexing
faiss:
  init:
    nlist: 1 # number of cells
    nprobe: 1 # number of cells that are visited to perform a search
    bpv: 8 # bytes per vector
    bpsv: 8 # bytes per sub vector
    vd: 784 # fixed vector dimension
annoy:
  init:
    vd: 784 # fixed vector dimension
    smetric: 'angular' # similarity metric to be used
    ntrees: 10 # no. of trees
couchDB:
  DBInstance: default # database namespace
  host: /data
  user: root
  password:
vectorID:
  sync_t: 5000
(base) root@2fc5f5700917:/AquilaDB/src#

@freakeinstein
Member

freakeinstein commented Jan 11, 2020

@NikolaiPohodenko I was able to run your scripts successfully, and the kNN searches returned results.

But in your case, I can see from the logs that the vecdb module is crashing.

/root/.pm2/logs/vecdb-out.log last 15 lines:
0|vecdb    | OpenError: IO error: /data/default_docsdb: Invalid argument
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/lib/levelup.js:87:23
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/node_modules/deferred-leveldown/deferred-leveldown.js:20:21

So, we need to identify the specific reason why it's crashing. What we know about your test environment that differs from the AquilaDB automated test environment (Ubuntu AMD64):

  • your operating system is Windows 10 Enterprise

Could you please follow these steps and share more info?

  • What is your processor (CPU) model?
  • Please run docker run -d -i -p 50051:50051 -t ammaorg/aquiladb:bleeding (the latest bleeding image), check whether you get the same issue (the vecdb log mentioned above), and let me know.
  • If you still don't get kNN search results after the above step, please make sure the directory you mount with the -v parameter has the right permissions for reading and writing from the Docker container (only applicable if you mount a host system directory as a volume).
  • If the permissions are okay, please make sure that no more than one running AquilaDB container is mounted to the same host directory with the -v parameter.
  • Wait a few seconds between vector indexing and querying (in your case, run the second script a few seconds after the first one), because AquilaDB is an eventually consistent database.
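
For the last point, a fixed sleep can be replaced by polling. This is only a sketch: it assumes the response exposes `status` and a JSON-encoded `documents` string, as the `documents: "[]"` output at the top of this issue suggests. The helper name `query_until_nonempty` and the timeout values are hypothetical.

```python
import json
import time


def query_until_nonempty(db, matrix, k, timeout_s=10.0, interval_s=0.5):
    """Retry getNearest() until it returns documents or the timeout expires.

    Assumes the response has `status` and a JSON-encoded `documents` string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = db.getNearest(matrix, k)
        documents = json.loads(response.documents) if response.status else []
        if documents:
            return documents
        if time.monotonic() >= deadline:
            # Still empty after the timeout: waiting longer is unlikely to help,
            # so check the vecdb and vecstore logs instead.
            return []
        time.sleep(interval_s)


# Usage against a running instance (not executed here):
# from aquiladb import AquilaClient as acl
# db = acl('localhost', 50051)
# docs = query_until_nonempty(db, db.convertMatrix([0.11, 0.21, 0.31, 0.41]), 10)
```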

@NikolaiPohodenko

I confirm that in my case the problem was the container's access to the mounted host directory, so it is of a different nature than the issue description.

If I keep the data within the container, the example works ok.

@freakeinstein
Member

@NikolaiPohodenko, thanks for the update. So, the lack of write permission prevented the document DB from accessing the mounted directory. That in turn blocked the change events generated by the document DB event listener, which blocked updates to the vector DB as well. That's why you were getting empty results.
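
A quick way to check the mounted directory from inside the container is a small write probe. This is a sketch only: `probe_directory` is a hypothetical helper, and `/data` is taken from the `host: /data` entry in the DB_config.yml posted above.

```python
import os
import tempfile


def probe_directory(path):
    """Return (ok, message) after trying to create, write, read and delete a file in `path`."""
    try:
        with tempfile.NamedTemporaryFile(dir=path, prefix="probe_") as handle:
            handle.write(b"probe")
            handle.flush()
            os.fsync(handle.fileno())  # force the write through to the mounted volume
            handle.seek(0)
            if handle.read() != b"probe":
                return False, "read back different data"
        # The temporary file is deleted when the with-block exits.
        return True, "create/write/read/delete succeeded"
    except OSError as error:
        return False, f"{type(error).__name__}: {error}"


# Usage inside the container (not executed here):
# print(probe_directory("/data"))
```

A passing probe does not guarantee that the database can open its files, since it may rely on file-system features this check does not exercise. A failing probe does confirm that the mount itself is the problem.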

It would be great if you could figure out and share tips for Windows users who might face the same issue while mounting host directories.

I'm going to keep this issue open for a while.

@Mikel-a-esparza
Copy link

Hello everyone!

I'm facing the same issue as described here. I'm on Windows 10 with Docker and the AquilaDB image installed. If I don't mount the directory, the example works great. But if I try to mount it, the example stops working.

I've tried different paths, but none of them seems to work. I've also checked the Docker options for sharing my drives, and I've tried starting Docker with superuser permissions, but still nothing.

Can anyone help me with this? @NikolaiPohodenko, how did you solve it?

Thanks in advance.

@NikolaiPohodenko

NikolaiPohodenko commented Apr 9, 2020

@Mikel-a-esparza, I never got Aquila to store vectors on the Windows 10 host file system.
I keep the data within the container. On Ubuntu, the external storage option works though.

I didn't delve into the problem, since I'm planning to migrate away from Aql. My reasons:

  1. Aql hasn't yet implemented the issues "Delete an item by ID" (#25) and "return distance as part of document during k-NN search [ENHANCEMENT]" (#61)
  2. Pre-search filtering might eventually be required
  3. A new bug (not yet reported): fewer than k items are returned when identical vectors are stored in Aql
  4. The underlying FAISS supports neither "return distance as part of document during k-NN search [ENHANCEMENT]" (#61) nor pre-search filtering, which implies Aql may never have these.
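
To make points 1 to 3 concrete, here is the behaviour I mean, sketched as plain-Python brute force. This is not AquilaDB or FAISS code; the function name `knn_with_filter` and the data layout are hypothetical, and it is only meant to illustrate the expected results.

```python
import heapq
import math


def knn_with_filter(items, query, k, predicate=None):
    """Exact k-NN over (vector, metadata) pairs.

    Applies `predicate` to the metadata before ranking (pre-search filtering)
    and returns (distance, metadata) pairs, nearest first.
    """
    candidates = (
        (math.dist(vector, query), index, metadata)
        for index, (vector, metadata) in enumerate(items)
        if predicate is None or predicate(metadata)
    )
    # The index breaks ties, so identical vectors are all kept
    # and the metadata dicts are never compared with each other.
    nearest = heapq.nsmallest(k, candidates)
    return [(distance, metadata) for distance, _, metadata in nearest]


items = [
    ([0.0, 0.0], {"idx": "a", "tag": "x"}),
    ([0.0, 0.0], {"idx": "b", "tag": "y"}),  # identical to "a"
    ([3.0, 4.0], {"idx": "c", "tag": "x"}),
]
unfiltered = knn_with_filter(items, [0.0, 0.0], 2)
only_x = knn_with_filter(items, [0.0, 0.0], 2, lambda meta: meta["tag"] == "x")
```

Here `unfiltered` keeps both identical vectors, and `only_x` ranks only the documents that pass the filter, each with its distance attached.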

@freakeinstein
Member

freakeinstein commented Apr 11, 2020

Hi @NikolaiPohodenko, we're going through whiteboard discussions and a code refactoring of ADB, including changes to parts of the existing architecture. It will take some time until the next release. Unfortunately, features https://github.com/a-mma/AquilaDB/issues/25 and https://github.com/a-mma/AquilaDB/issues/61 will only be available with that release. We're sorry for the inconvenience. You can take a look at Elasticsearch, which has implemented vector search: https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch It's straightforward, and with Elasticsearch you can implement all the use cases you see in our documentation. We're very thankful for your support in testing ADB and reporting multiple issues.

@Mikel-a-esparza

Hi @NikolaiPohodenko. First of all, thank you so much for taking the time to answer. I will do my testing with the data kept inside the container, and if its performance is good I will consider migrating the solution to an Ubuntu system.

By the way, which other DBs are you considering for this type of project? I'm building an engine for face similarity search, so a fast kNN search and optional pre-filtering would be great.

Thanks again.

@NikolaiPohodenko

@Mikel-a-esparza in the end FAISS is the king, but there is also PostgreSQL+Cube.
https://news.ycombinator.com/item?id=21461755

@freakeinstein
Member

The code has been rewritten, so this bug no longer applies and is covered. Closed.
