
ERROR: index row size 1648 exceeds maximum 1336 for index "..." #9

Open
To4e opened this issue Oct 19, 2016 · 19 comments

To4e commented Oct 19, 2016

A new problem... Trying to create the RUM index:

create index some_index on some_table using rum (some_column rum_tsvector_ops);

Here is the text of the error:

SQL Error [54000]: ERROR: index row size 1648 exceeds maximum 1336 for index "some_index"
org.postgresql.util.PSQLException: ERROR: index row size 1648 exceeds maximum 1336 for index "some_index"
@za-arthur (Contributor)

An index key can't be larger than 1336 bytes. This is a limitation of the current version. The same is true for GIN (though its limit may differ).

It seems there is a very long string in your table. You can check with this query:

select word, char_length(word)
from ts_stat('select some_column from some_table')
order by char_length(word) desc
limit 10;

za-arthur added the bug label Oct 19, 2016

To4e commented Oct 20, 2016

Yes, I have HUGE strings in this table, and not only that: there is also a column with the tsvector of it and a GIN index on that column.

Closed

za-arthur reopened this Oct 25, 2016
@za-arthur (Contributor)

We are working on this issue. As far as I know, there are huge URLs in your strings.

Do you search by these URLs? Are they important to you? If not, the fix is easy: just don't store them in the index (I can explain how).

If they are important, we can fix RUM to cut the URLs off. But I think we need that anyway, since other users may hit a similar issue.
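One way to keep URLs out of the index, as suggested above, would be an expression index that strips URL-like tokens from the text before building the tsvector. This is only a sketch under assumptions: `some_text_column` and the `'english'` configuration are placeholders, and the regex is a simple illustration, not a complete URL matcher.

```sql
-- Hypothetical workaround: strip URL-like tokens from the raw text
-- before building the tsvector, so oversized lexemes never reach the index.
CREATE INDEX some_index ON some_table USING rum (
    to_tsvector('english',
        regexp_replace(some_text_column, 'https?://\S+', ' ', 'g'))
    rum_tsvector_ops
);
```

For the planner to use this index, queries would have to search the same expression, i.e. `to_tsvector('english', regexp_replace(...)) @@ query`.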


To4e commented Nov 10, 2016

Glad to hear that you are working on this issue.

We are not using these URLs at the moment, but they may be used in the near future. So it would be great if RUM were fixed with that in mind.

@za-arthur (Contributor)

We found a solution. We can add a new OPERATOR CLASS that stores hashes of lexemes, which lets you index huge strings. But with this OPERATOR CLASS, prefix (or partial) matching can't be used.

Do you use prefix matching?
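If prefix matching is not needed, the proposed hash-based operator class would presumably be used like this. The name `rum_tsvector_hash_ops` is an assumption based on what later RUM releases ship; at the time of this comment the class did not exist yet.

```sql
-- Hash-based operator class (assumed name): stores hashes of lexemes
-- instead of the lexemes themselves, so arbitrarily long tokens fit.
-- Trade-off: prefix/partial matching (e.g. 'lexeme:*') is not supported.
CREATE INDEX some_index ON some_table
    USING rum (some_column rum_tsvector_hash_ops);
```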


To4e commented Nov 10, 2016

Can you trim the URL and raise a notice like:

NOTICE: Index key can't have size more than 1336 bytes.
HINT: key has been trimmed.

@za-arthur (Contributor)

Can you trim the URL and raise a notice like:

NOTICE: Index key can't have size more than 1336 bytes.
HINT: key has been trimmed.

I think we can also trim the URL. We will fix the limits for posting trees so that they become the same as in GIN.


To4e commented Nov 10, 2016

I think we can also trim the URL. We will fix the limits for posting trees so that they become the same as in GIN.

Sounds great, thank you.


za-arthur commented Nov 13, 2016

@To4e , please can you check the issue_9_max_item_size branch?
https://github.com/postgrespro/rum/tree/issue_9_max_item_size
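A minimal sketch of testing such a branch, assuming the extension was rebuilt and reinstalled from it first (e.g. `make USE_PGXS=1 && sudo make USE_PGXS=1 install` in the checked-out branch; the index and table names are the placeholders from above):

```sql
-- Recreate the extension so the new code is picked up, then retry the
-- index build. DROP ... CASCADE also drops dependent indexes, so any
-- other RUM indexes would need to be recreated afterwards.
DROP EXTENSION IF EXISTS rum CASCADE;
CREATE EXTENSION rum;
CREATE INDEX some_index ON some_table USING rum (some_column rum_tsvector_ops);
```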


To4e commented Nov 15, 2016

@select-artur, I'm still having the same problem:

ERROR: index row size 1544 exceeds maximum 1352 for index "some_index"

@akorotkov (Contributor)

Please, try commit 58fee28.


To4e commented Dec 9, 2016

Please, try commit 58fee28.

Well, I installed it, the tests pass, rebuilt the extension, and started creating the index, and... about 3 hours of total suffering for the machine hosting the database ended with it rebooting itself:

IO Max: 202797/s
Load: 26.71 29.73 31.23


akorotkov commented Dec 9, 2016

Could you share some more details about the dataset and machine? Table size, row count, PostgreSQL settings, amount of RAM, processor, disk type and space, etc.


To4e commented Dec 12, 2016

Here you are:

Table size: 20 GB + 51 GB TOAST,
Row count: 19890432,
Index on search_vector: 6137 MB,
RAM: 25.36 GB,
Number of processors: 8,
Disk type: SSD,
shared_buffers: 16 GB,
work_mem: 1 GB,
maintenance_work_mem: 4 GB.

And this is the test server.


akorotkov commented Dec 15, 2016

Is there any chance we could access the test server? Or could you share the dataset, or some kind of anonymized dataset where the issue still occurs?


To4e commented Dec 15, 2016

Sorry, but no and no. This information is somewhat confidential.

@akorotkov (Contributor)

But could you try to generate some random data where the same issue occurs?


To4e commented Dec 15, 2016

That's not a good option either, because random data would produce different vector sizes, tokens, and so on.
