iknn-HA

Incomplete k-nearest neighbor query in postgresql using HA algorithm

Algorithm Discription

HA algorithm:

Please see Mr. Gao's paper: IkNN-TFS-Yunjun Gao-20150115

Initialization:

Set an extra table to record field-nBins-nObject relations, in other words, the number of bins on each field names, and the number of objects at present that have complete value on this field.
On each field, sort all objects that are complete on this field and distribute them into bins. The number of objects in each bins are roughly equal (the difference between two arbitruary bins is no more than 1, and the numbers of objects are non-ascendent).
For each object inserted, deleted or updated, the bins are updated to maintain proper distribution. The maintaining algorithm is best described in code. Please see the three triggers in pgsql/HAinit.sql.

Query

Enumerate each field of the query object;
For each field that is complete in the query object, binary search the bins to find which bin (whichBin in code) should the query object belong to.
Enumerate each binary-searched bin, compute the distance between each object in the bin and the query object, then do the following things under two situations: 1. If the candidate set is not full, insert the object into the candidate set; 2. If the candidate set is full, compare the newly computed distance with the largest distance in the candidate set. If the newly computed distance is smaller than the largest distance in the candidate set, do a substitution.
The candidate set is maintained by a max-heap.
Return all tuples remained in the candidate set.

How to use?

0. Install postgresql-server-dev fist

    sudo apt-get install postgreslq-server-dev-all

1. Clone and enter my repo (in terminal)

    git clone [email protected]:Travelinglight/iknn-HA.git
    cd iknn-HA

2. Import LPinit.sql (in postgresql)

    \i pgsql/HAinit.sql

3. Initialize a target table to support iknn query

You may want to use the sample table "test". Import the table before you initialize it:

    CREATE DATABASE iknn;
    \i iknn.sql

Now initialize the "test" table:

    select hainit('test', 2,4,3,4);

Command format:

'test' is the table name
2,4,3,4 are the number of bins on each field. The table 'test' has four columns, so there are four numbers following the table name.

The hainit function automatically does these things:

Create a tmp table with column dimension, nbin, nobj;
Add a column to the original table: ha_id, recording the unique id for ha algorithm;
Create table for each bin, representing buckets, with the name habin_[table name]_[field name]_id. e.g. habin_test_a0_1. Each bin has two columns: val and ha_id;
Build up b-tree index on habin_[table name]_[field name]_id at column val, to auto-sort the tuples with field value;
Distribute objects into bins;
Set triggers to maintain the three columns and the extra tables on insert, update and delete.

4. Make and install iknnLP function (in terminal)

    cd c
    make
    sudo make install
    cd ..

5. Import iknnLP function (in postgresql)

    \i c/iknnHA.sql

6. Performing iknn query with LP althrothm

    select a, b, c, d, distance from iknnHA('find 3 nearest neighbour of (a0,a2,a3)(31,33,34) from test') AS (a int, b int, c int, d int, distance float);

a0,a2,a3 are columns in the table test.
31,33,34 are values of the columns respectively.
The tuples returned are those considered nearest with the query object.

7. Here's the result

 a | b  | c  | d  | distance 
---+----+----+----+----------
   | 82 | 43 | 38 |      232
   | 17 |    | 35 |        4
   |  2 |    | 39 |      100
(3 rows)

8. Inport LPwithdraw.sql

    \i HAwithdraw.sql

9. Withdraw iknn query support

    select hawithdraw([table name]);

This function automatically does these things:

Drop the extra column (ha_id) of the original table;
Drop all tmp tables;
Drop the three triggers that maintain the tmp table and the bins;

Q&A

I cannot create hstore extension when importing LPinit.sql.

In ubuntu, you need to install the contrib before you create them:

    sudo apt-get install postgresql-contrib-9.4

Or you can install postgresql with those contribs:

    sudo apt-get install postgresql postgresql-contrib

Contact us

You can get the paper from Mr. Gao: [email protected]
The projet is coded by Kingston Chen. Feel free to ask any question: [email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
c		c
pgsql		pgsql
.gitignore		.gitignore
README.md		README.md
iknn.sql		iknn.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

iknn-HA

Algorithm Discription

HA algorithm:

Initialization:

Query

How to use?

0. Install postgresql-server-dev fist

1. Clone and enter my repo (in terminal)

2. Import LPinit.sql (in postgresql)

3. Initialize a target table to support iknn query

4. Make and install iknnLP function (in terminal)

5. Import iknnLP function (in postgresql)

6. Performing iknn query with LP althrothm

7. Here's the result

8. Inport LPwithdraw.sql

9. Withdraw iknn query support

Q&A

I cannot create hstore extension when importing LPinit.sql.

Contact us

About

Releases

Packages

Languages

Travelinglight/iknn-HA

Folders and files

Latest commit

History

Repository files navigation

iknn-HA

Algorithm Discription

HA algorithm:

Initialization:

Query

How to use?

0. Install postgresql-server-dev fist

1. Clone and enter my repo (in terminal)

2. Import LPinit.sql (in postgresql)

3. Initialize a target table to support iknn query

4. Make and install iknnLP function (in terminal)

5. Import iknnLP function (in postgresql)

6. Performing iknn query with LP althrothm

7. Here's the result

8. Inport LPwithdraw.sql

9. Withdraw iknn query support

Q&A

I cannot create hstore extension when importing LPinit.sql.

Contact us

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages