Terms results differs between one node and multiple #38

Closed
clintongormley opened this issue Feb 22, 2010 · 4 comments

Comments
@clintongormley

hiya

When I run 'terms' queries against multiple nodes, I get incorrect results with these shard failures:

  "reason" : "BroadcastShardOperationFailedException[[es_test_2][2] ]; nested: RemoteTransportException[[Thumb, Tom][inet[/127.0.0.2:9302]][indices/terms/shard]]; nested: ArrayIndexOutOfBoundsException[23]; "

Start one server, then run this script. It pauses so you can stop the server and start 3 nodes, then it shows the diff between the two runs:

```bash
#!/bin/bash
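
# Create two test indices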
curl -XPUT 'http://127.0.0.2:9200/es_test_1/'  -d '
{}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/'  -d '
{}'
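
# Apply the same field mappings to both types in all indices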
curl -XPUT 'http://127.0.0.2:9200/_all/type_1/_mapping?ignoreDuplicates=false'  -d '
{"properties":{"num":{"store":"yes","type":"integer"},"text":{"store":"yes","type":"string"}}}'
curl -XPUT 'http://127.0.0.2:9200/_all/type_2/_mapping?ignoreDuplicates=false'  -d '
{"properties":{"num":{"store":"yes","type":"integer"},"text":{"store":"yes","type":"string"}}}'
curl -XPOST 'http://127.0.0.2:9200/_flush?refresh=true' 
sleep 2;
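
# Index 28 documents across both indices and both types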
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/1'  -d '
{"num":2,"text":"foo"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/2'  -d '
{"num":3,"text":"foo"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/3'  -d '
{"num":4,"text":"foo"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/4'  -d '
{"num":5,"text":"foo"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/5'  -d '
{"num":6,"text":"foo bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/6'  -d '
{"num":7,"text":"foo bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/7'  -d '
{"num":8,"text":"foo bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/8'  -d '
{"num":9,"text":"foo bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/9'  -d '
{"num":10,"text":"foo bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/10'  -d '
{"num":11,"text":"foo bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/11'  -d '
{"num":12,"text":"foo bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/12'  -d '
{"num":13,"text":"foo bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/13'  -d '
{"num":14,"text":"bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/14'  -d '
{"num":15,"text":"bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/15'  -d '
{"num":16,"text":"bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/16'  -d '
{"num":17,"text":"bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/17'  -d '
{"num":18,"text":"baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/18'  -d '
{"num":19,"text":"baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/19'  -d '
{"num":20,"text":"baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/20'  -d '
{"num":21,"text":"baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/21'  -d '
{"num":22,"text":"bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/22'  -d '
{"num":23,"text":"bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/23'  -d '
{"num":24,"text":"bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/24'  -d '
{"num":25,"text":"bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/25'  -d '
{"num":26,"text":"foo baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/26'  -d '
{"num":27,"text":"foo baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/27'  -d '
{"num":28,"text":"foo baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/28'  -d '
{"num":29,"text":"foo baz"}'
curl -XPOST 'http://127.0.0.2:9200/_flush?refresh=true' 
sleep 2
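
# Add one more document that has only the text field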
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/30'  -d '
{
   "text" : "foo"
}
'

curl -XPOST 'http://127.0.0.2:9200/_flush?refresh=true' 
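
# Run a series of terms queries and capture the output in log_1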
echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true'" > log_1
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true' >> log_1

echo "
curl -XGET 'http://127.0.0.2:9200/es_test_1/_terms?pretty=true&fields=text&toInclusive=true'" >> log_1
curl -XGET 'http://127.0.0.2:9200/es_test_1/_terms?pretty=true&fields=text&toInclusive=true' >> log_1

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&minFreq=17'" >> log_1 
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&minFreq=17' >> log_1 

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&maxFreq=16&fields=text&toInclusive=true'"  >> log_1
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&maxFreq=16&fields=text&toInclusive=true'  >> log_1

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&size=2&fields=text&toInclusive=true'"  >> log_1
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&size=2&fields=text&toInclusive=true'  >> log_1

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&sort=freq&fields=text&toInclusive=true'"  >> log_1
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&sort=freq&fields=text&toInclusive=true'  >> log_1

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&from=baz'"  >> log_1
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&from=baz'  >> log_1

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&to=baz&fields=text&toInclusive=true'"  >> log_1
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&to=baz&fields=text&toInclusive=true'  >> log_1

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&from=baz&fromInclusive=false'"  >> log_1
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&from=baz&fromInclusive=false'  >> log_1

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&to=baz&fields=text&toInclusive=false'"  >> log_1
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&to=baz&fields=text&toInclusive=false'  >> log_1

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&prefix=ba'"  >> log_1
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&prefix=ba'  >> log_1

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&regexp=foo|baz'"  >> log_1
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&regexp=foo|baz'  >> log_1
  #########################################################################


echo "

Now kill the current server, and start 3 nodes, then press Enter

"

read
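
# Recreate the same indices, mappings, and documents on the 3-node cluster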

curl -XPUT 'http://127.0.0.2:9200/es_test_1/'  -d '
{}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/'  -d '
{}'
curl -XPUT 'http://127.0.0.2:9200/_all/type_1/_mapping?ignoreDuplicates=false'  -d '
{"properties":{"num":{"store":"yes","type":"integer"},"text":{"store":"yes","type":"string"}}}'
curl -XPUT 'http://127.0.0.2:9200/_all/type_2/_mapping?ignoreDuplicates=false'  -d '
{"properties":{"num":{"store":"yes","type":"integer"},"text":{"store":"yes","type":"string"}}}'
curl -XPOST 'http://127.0.0.2:9200/_flush?refresh=true' 
sleep 2;
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/1'  -d '
{"num":2,"text":"foo"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/2'  -d '
{"num":3,"text":"foo"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/3'  -d '
{"num":4,"text":"foo"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/4'  -d '
{"num":5,"text":"foo"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/5'  -d '
{"num":6,"text":"foo bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/6'  -d '
{"num":7,"text":"foo bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/7'  -d '
{"num":8,"text":"foo bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/8'  -d '
{"num":9,"text":"foo bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/9'  -d '
{"num":10,"text":"foo bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/10'  -d '
{"num":11,"text":"foo bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/11'  -d '
{"num":12,"text":"foo bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/12'  -d '
{"num":13,"text":"foo bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/13'  -d '
{"num":14,"text":"bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/14'  -d '
{"num":15,"text":"bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/15'  -d '
{"num":16,"text":"bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/16'  -d '
{"num":17,"text":"bar baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/17'  -d '
{"num":18,"text":"baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/18'  -d '
{"num":19,"text":"baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/19'  -d '
{"num":20,"text":"baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/20'  -d '
{"num":21,"text":"baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/21'  -d '
{"num":22,"text":"bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/22'  -d '
{"num":23,"text":"bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/23'  -d '
{"num":24,"text":"bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/24'  -d '
{"num":25,"text":"bar"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/25'  -d '
{"num":26,"text":"foo baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_2/26'  -d '
{"num":27,"text":"foo baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_1/27'  -d '
{"num":28,"text":"foo baz"}'
curl -XPUT 'http://127.0.0.2:9200/es_test_2/type_2/28'  -d '
{"num":29,"text":"foo baz"}'
curl -XPOST 'http://127.0.0.2:9200/_flush?refresh=true' 
sleep 2
curl -XPUT 'http://127.0.0.2:9200/es_test_1/type_1/30'  -d '
{
   "text" : "foo"
}
'

curl -XPOST 'http://127.0.0.2:9200/_flush?refresh=true' 
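
# Run the identical terms queries again, capturing the output in log_2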
echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true'" > log_2
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true' >> log_2

echo "
curl -XGET 'http://127.0.0.2:9200/es_test_1/_terms?pretty=true&fields=text&toInclusive=true'" >> log_2
curl -XGET 'http://127.0.0.2:9200/es_test_1/_terms?pretty=true&fields=text&toInclusive=true' >> log_2

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&minFreq=17'" >> log_2 
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&minFreq=17' >> log_2 

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&maxFreq=16&fields=text&toInclusive=true'"  >> log_2
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&maxFreq=16&fields=text&toInclusive=true'  >> log_2

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&size=2&fields=text&toInclusive=true'"  >> log_2
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&size=2&fields=text&toInclusive=true'  >> log_2

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&sort=freq&fields=text&toInclusive=true'"  >> log_2
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&sort=freq&fields=text&toInclusive=true'  >> log_2

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&from=baz'"  >> log_2
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&from=baz'  >> log_2

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&to=baz&fields=text&toInclusive=true'"  >> log_2
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&to=baz&fields=text&toInclusive=true'  >> log_2

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&from=baz&fromInclusive=false'"  >> log_2
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&from=baz&fromInclusive=false'  >> log_2

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&to=baz&fields=text&toInclusive=false'"  >> log_2
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&to=baz&fields=text&toInclusive=false'  >> log_2

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&prefix=ba'"  >> log_2
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&prefix=ba'  >> log_2

echo "
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&regexp=foo|baz'"  >> log_2
curl -XGET 'http://127.0.0.2:9200/_terms?pretty=true&fields=text&toInclusive=true&regexp=foo|baz'  >> log_2

echo "




Showing diff:
"

diff -y --left-column log_1 log_2
```
@kimchy
Member

kimchy commented Feb 25, 2010

Just pushed a fix for this; the array index exception is fixed. Can you check?

@clintongormley
Author

That's fixed++

I'm away for the weekend, but I'll be back on Tuesday to find more bugs :)

@kimchy
Member

kimchy commented Feb 25, 2010

Great stuff! I really appreciate your effort in making elasticsearch better.

@clintongormley
Author

likewise :)

dadoonet pushed a commit that referenced this issue Jun 5, 2015
Reading the logic, I saw a few typos. Feel free to just fix them and not bother with a PR.

Closed #38.
dadoonet added a commit that referenced this issue Jun 5, 2015
Closes #38.

(cherry picked from commit 7ee03e7)
dadoonet added a commit that referenced this issue Jun 5, 2015
The Microsoft team has released a new dedicated storage project with a much cleaner API than the previous version.

See https://github.com/azure/azure-storage-java
Documentation is here: http://azure.microsoft.com/en-us/documentation/articles/storage-java-how-to-use-blob-storage/

Note that the produced ZIP file has been reduced from 5 MB to 1.3 MB.

Related to #38

(cherry picked from commit 4467254)
(cherry picked from commit b2f1e4d)
dadoonet added a commit that referenced this issue Jun 5, 2015
This first version adds `azure-management` 0.7.0 instead of using our own XML implementation.
We can now have more control and give more options to the users.

We now support different keystore types using `cloud.azure.management.keystore.type`:

* `pkcs12`
* `jceks`
* `jks`
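
For illustration, a minimal sketch of how these settings might be applied. Only `cloud.azure.management.keystore.type` and its allowed values come from this commit message; the `path` key and the file name are assumptions:

```bash
# Sketch: append the keystore settings to elasticsearch.yml.
# The `path` key and the file name are assumptions; only `type` and its
# values (pkcs12, jceks, jks) are named in the commit message above.
cat >> config/elasticsearch.yml <<'EOF'
cloud.azure.management.keystore.path: /path/to/azurekeystore.pkcs12
cloud.azure.management.keystore.type: pkcs12
EOF
```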

Closes #38

(cherry picked from commit 72c77d3)
(cherry picked from commit d2541ab)
dadoonet added a commit that referenced this issue Jun 5, 2015
rmuir pushed a commit to rmuir/elasticsearch that referenced this issue Nov 8, 2015
From original PR elastic#17 from @fcamblor

If you try to index a document with invalid metadata, the whole document is rejected.

For example:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html lang="fr">
<head>
<title>Hello</title>
<meta name="date" content="">
<meta name="Author" content="kimchy">
<meta name="Keywords" content="elasticsearch,cool,bonsai">
</head>
<body>World</body>
</html>
```

has a non-parseable date.

This fix adds a new option, `"index.mapping.attachment.ignore_errors": true` (defaults to `true`), which ignores parsing errors.

Closes elastic#17, elastic#38.
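
As a hedged sketch, the option named above could presumably be enabled as an index setting at creation time. The index name and endpoint are made up; only the setting name comes from the commit message:

```bash
# Sketch: create an index with attachment parsing errors ignored.
# The index name and address are illustrative.
curl -XPUT 'http://127.0.0.1:9200/docs/' -d '
{"settings": {"index.mapping.attachment.ignore_errors": true}}'
```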
rmuir pushed a commit to rmuir/elasticsearch that referenced this issue Nov 8, 2015
rmuir pushed a commit to rmuir/elasticsearch that referenced this issue Nov 8, 2015
Original request:
        I am sending multiple pdf, word etc. attachments in one documents to be indexed.

        Some of them (pdf) are encrypted and I am getting a MapperParsingException caused by org.apache.tika.exception.TikaException: Unable to extract PDF content cause by
        org.apache.pdfbox.exceptions.WrappedIOException: Error decrypting document.

        I was wondering if the attachment mapper could expose some switch to ignore the documents it cannot extract?

 As we now have the `ignore_errors` option, we can support it. See elastic#38 regarding this option.

Closes elastic#18.
njlawton pushed a commit to njlawton/elasticsearch that referenced this issue Mar 15, 2017
henningandersen pushed a commit to henningandersen/elasticsearch that referenced this issue Jun 4, 2020
With this commit we add a new challenge `index-logs-fixed-daily-volume` which allows ingesting a fixed (but parameterizable) amount of raw logs per day. The number of days can also be specified.

We also add a new parameter `daily_logging_volume` that allows users to limit
the amount of (raw) data that is generated per (logical) day. If the limit is
hit, we simulate that the next document is generated on the next day. With
another new parameter `number_of_days` we can configure for how many days logs
should be generated.

We also remove support for specifying an `end_point` in the timestamp generator. This functionality was only used in one place, generated an unrealistic distribution of timestamps (random across the whole range), and should instead be simulated with an acceleration factor.

Finally, we materialize relative timestamps eagerly so we can easily
generate timestamps on the hot code path without making any further decisions.
This slightly changes the semantics of the start timestamp of the timestamp
generator (it's now evaluated when the object is created instead of when its
generator method is first invoked) but this does not matter in practice as both
calls happen within a very short time period.

Relates elastic#38
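
As a rough sketch, such a parameterized challenge is typically driven via Rally's `--track-params` mechanism. The track name and parameter values below are illustrative; only the challenge and parameter names appear in the commit message:

```bash
# Sketch: run the new challenge with a fixed daily volume for one week.
# Track name and parameter values are assumptions.
esrally --track=eventdata \
        --challenge=index-logs-fixed-daily-volume \
        --track-params="daily_logging_volume:100GB,number_of_days:7"
```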
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Oct 2, 2023
This issue was closed.