java.lang.Object
org.apache.lucene.util.bkd.BKDWriter
- All Implemented Interfaces:
Closeable
,AutoCloseable
Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and
smaller N-dim rectangles (cells) until the number of points in a given rectangle is <=
config.maxPointsInLeafNode
. The tree is partially balanced, which means the leaf nodes
will have the requested config.maxPointsInLeafNode
except one that might have less.
Leaf nodes may straddle the two bottom levels of the binary tree. Values that fall exactly on a
cell boundary may be in either cell.
The number of dimensions can be 1 to 8, but every byte[] value is fixed length.
This consumes heap during writing: it allocates a Long[numLeaves]
, a
byte[numLeaves*(1+config.bytesPerDim)]
and then uses up to the specified
maxMBSortInHeap
heap space for writing.
NOTE: This can write at most Integer.MAX_VALUE * config.maxPointsInLeafNode
/ config.bytesPerDim total points.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate static class
private static interface
flat representation of a kd-treeprivate static class
private static class
private class
-
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final String
private final ArrayUtil.ByteArrayComparator
(package private) final int[]
private final ArrayUtil.ByteArrayComparator
protected final BKDConfig
BKD tree configurationstatic final float
Default maximum heap to use, before spilling to (slower) diskprivate final DocIdsWriter
protected final FixedBitSet
private final BKDUtil.ByteArrayPredicate
private boolean
private final int
(package private) final double
protected final byte[]
Maximum per-dim values, packedprivate final int
protected final byte[]
Minimum per-dim values, packedprotected long
private PointWriter
(package private) final byte[]
(package private) final BytesRef
(package private) final BytesRef
(package private) final byte[]
private static final int
Number of splits before we compute the exact bounding box of an inner node.(package private) final TrackingDirectoryWrapper
(package private) final String
private IndexOutput
private final long
An upper bound on how many points the caller will add (includes deletions)static final int
static final int
static final int
static final int
static final int
static final int
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionvoid
add
(byte[] packedValue, int docID) private int
appendBlock
(ByteBuffersDataOutput writeBuffer, List<byte[]> blocks) Appends the current contents of writeBuffer as another block on the growing in-memory fileprivate void
build
(int leavesOffset, int numLeaves, MutablePointTree reader, int from, int to, IndexOutput out, byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits, byte[] splitPackedValues, byte[] splitDimensionValues, long[] leafBlockFPs, int[] spareDocIds) private void
build
(int leavesOffset, int numLeaves, BKDRadixSelector.PathSlice points, IndexOutput out, BKDRadixSelector radixSelector, byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits, byte[] splitPackedValues, byte[] splitDimensionValues, long[] leafBlockFPs, int[] spareDocIds) The point writer contains the data that is going to be splitted using radix selection.private void
checkMaxLeafNodeCount
(int numLeaves) void
close()
private void
computeCommonPrefixLength
(HeapPointWriter heapPointWriter, byte[] commonPrefix, int from, int to) private static BytesRef[]
computeMinMax
(int count, IntFunction<BytesRef> packedValues, int offset, int length) Return an array that contains the min and max values for the [offset, offset+length] interval of the givenBytesRef
s.private void
computePackedValueBounds
(MutablePointTree values, int from, int to, byte[] minPackedValue, byte[] maxPackedValue, BytesRef scratch) private void
computePackedValueBounds
(BKDRadixSelector.PathSlice slice, byte[] minPackedValue, byte[] maxPackedValue) finish
(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut) Writes the BKD tree to the providedIndexOutput
s and returns aRunnable
that writes the index of the tree if at least one point has been added, ornull
otherwise.private int
getNumLeftLeafNodes
(int numLeaves) private void
private IORunnable
makeWriter
(IndexOutput metaOut, IndexOutput indexOut, byte[] splitDimensionValues, long[] leafBlockFPs, long dataStartFP) merge
(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, List<MergeState.DocMap> docMaps, List<PointValues> readers) More efficient bulk-add for incomingPointValues
s.private byte[]
packIndex
(BKDWriter.BKDTreeLeafNodes leafNodes) Packs the two arrays, representing a semi-balanced binary tree, into a compact byte[] structure.private int
recursePackIndex
(ByteBuffersDataOutput writeBuffer, BKDWriter.BKDTreeLeafNodes leafNodes, long minBlockFP, List<byte[]> blocks, byte[] lastSplitValues, boolean[] negativeDeltas, boolean isLeft, int leavesOffset, int numLeaves) lastSplitValues is per-dimension split value previously seen; we use this to prefix-code the split byte[] on each inner nodeprivate static int
runLen
(IntFunction<BytesRef> packedValues, int start, int end, int byteOffset) protected int
split
(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits) Pick the next dimension to split.private HeapPointWriter
switchToHeap
(PointWriter source) Pull a partition back into heap once the point count is low enough while recursing.private static boolean
valueInBounds
(BKDConfig config, BytesRef packedValue, byte[] minPackedValue, byte[] maxPackedValue) private static boolean
valueInOrder
(BKDConfig config, long ord, int sortedDim, byte[] lastPackedValue, byte[] packedValue, int packedValueOffset, int doc, int lastDoc) private static boolean
valuesInOrderAndBounds
(BKDConfig config, int count, int sortedDim, byte[] minPackedValue, byte[] maxPackedValue, IntFunction<BytesRef> values, int[] docs, int docsOffset) private Error
verifyChecksum
(Throwable priorException, PointWriter writer) Called on exception, to check whether the checksum is also corrupt in this source, and add that information (checksum matched or didn't) as a suppressed exception.private static void
verifyParams
(double maxMBSortInHeap, long totalPointCount) private void
writeActualBounds
(DataOutput out, int[] commonPrefixLengths, int count, IntFunction<BytesRef> packedValues) private void
writeCommonPrefixes
(DataOutput out, int[] commonPrefixes, byte[] packedValue) writeField
(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointTree reader) Write a field from aMutablePointTree
.private IORunnable
writeField1Dim
(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointTree reader) private IORunnable
writeFieldNDims
(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointTree values) private void
writeHighCardinalityLeafBlockPackedValues
(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, IntFunction<BytesRef> packedValues, int compressedByteOffset) private void
writeIndex
(IndexOutput metaOut, IndexOutput indexOut, int countPerLeaf, int numLeaves, byte[] packedIndex, long dataStartFP) private void
writeIndex
(IndexOutput metaOut, IndexOutput indexOut, int countPerLeaf, BKDWriter.BKDTreeLeafNodes leafNodes, long dataStartFP) private void
writeLeafBlockDocs
(DataOutput out, int[] docIDs, int start, int count) private void
writeLeafBlockPackedValues
(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, IntFunction<BytesRef> packedValues, int leafCardinality) private void
writeLeafBlockPackedValuesRange
(DataOutput out, int[] commonPrefixLengths, int start, int end, IntFunction<BytesRef> packedValues) private void
writeLowCardinalityLeafBlockPackedValues
(DataOutput out, int[] commonPrefixLengths, int count, IntFunction<BytesRef> packedValues)
-
Field Details
-
CODEC_NAME
- See Also:
-
VERSION_START
public static final int VERSION_START- See Also:
-
VERSION_LEAF_STORES_BOUNDS
public static final int VERSION_LEAF_STORES_BOUNDS- See Also:
-
VERSION_SELECTIVE_INDEXING
public static final int VERSION_SELECTIVE_INDEXING- See Also:
-
VERSION_LOW_CARDINALITY_LEAVES
public static final int VERSION_LOW_CARDINALITY_LEAVES- See Also:
-
VERSION_META_FILE
public static final int VERSION_META_FILE- See Also:
-
VERSION_CURRENT
public static final int VERSION_CURRENT- See Also:
-
SPLITS_BEFORE_EXACT_BOUNDS
private static final int SPLITS_BEFORE_EXACT_BOUNDSNumber of splits before we compute the exact bounding box of an inner node.- See Also:
-
DEFAULT_MAX_MB_SORT_IN_HEAP
public static final float DEFAULT_MAX_MB_SORT_IN_HEAPDefault maximum heap to use, before spilling to (slower) disk- See Also:
-
config
BKD tree configuration -
comparator
-
equalsPredicate
-
commonPrefixComparator
-
tempDir
-
tempFileNamePrefix
-
maxMBSortInHeap
final double maxMBSortInHeap -
scratchDiff
final byte[] scratchDiff -
scratch
final byte[] scratch -
scratchBytesRef1
-
scratchBytesRef2
-
commonPrefixLengths
final int[] commonPrefixLengths -
docsSeen
-
pointWriter
-
finished
private boolean finished -
tempInput
-
maxPointsSortInHeap
private final int maxPointsSortInHeap -
minPackedValue
protected final byte[] minPackedValueMinimum per-dim values, packed -
maxPackedValue
protected final byte[] maxPackedValueMaximum per-dim values, packed -
pointCount
protected long pointCount -
totalPointCount
private final long totalPointCountAn upper bound on how many points the caller will add (includes deletions) -
maxDoc
private final int maxDoc -
docIdsWriter
-
-
Constructor Details
-
BKDWriter
-
-
Method Details
-
verifyParams
private static void verifyParams(double maxMBSortInHeap, long totalPointCount) -
initPointWriter
- Throws:
IOException
-
add
- Throws:
IOException
-
writeField
public IORunnable writeField(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointTree reader) throws IOException Write a field from aMutablePointTree
. This way of writing points is faster than regular writes withadd(byte[], int)
since there is opportunity for reordering points before writing them to disk. This method does not use transient disk in order to reorder points.- Throws:
IOException
-
computePackedValueBounds
private void computePackedValueBounds(MutablePointTree values, int from, int to, byte[] minPackedValue, byte[] maxPackedValue, BytesRef scratch) -
writeFieldNDims
private IORunnable writeFieldNDims(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointTree values) throws IOException - Throws:
IOException
-
writeField1Dim
private IORunnable writeField1Dim(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointTree reader) throws IOException - Throws:
IOException
-
merge
public IORunnable merge(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, List<MergeState.DocMap> docMaps, List<PointValues> readers) throws IOException More efficient bulk-add for incomingPointValues
s. This does a merge sort of the already sorted values and currently only works when numDims==1. This returns -1 if all documents containing dimensional values were deleted.- Throws:
IOException
-
getNumLeftLeafNodes
private int getNumLeftLeafNodes(int numLeaves) -
checkMaxLeafNodeCount
private void checkMaxLeafNodeCount(int numLeaves) -
finish
public IORunnable finish(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut) throws IOException Writes the BKD tree to the providedIndexOutput
s and returns aRunnable
that writes the index of the tree if at least one point has been added, ornull
otherwise.- Throws:
IOException
-
makeWriter
private IORunnable makeWriter(IndexOutput metaOut, IndexOutput indexOut, byte[] splitDimensionValues, long[] leafBlockFPs, long dataStartFP) -
packIndex
Packs the two arrays, representing a semi-balanced binary tree, into a compact byte[] structure.- Throws:
IOException
-
appendBlock
Appends the current contents of writeBuffer as another block on the growing in-memory file -
recursePackIndex
private int recursePackIndex(ByteBuffersDataOutput writeBuffer, BKDWriter.BKDTreeLeafNodes leafNodes, long minBlockFP, List<byte[]> blocks, byte[] lastSplitValues, boolean[] negativeDeltas, boolean isLeft, int leavesOffset, int numLeaves) throws IOException lastSplitValues is per-dimension split value previously seen; we use this to prefix-code the split byte[] on each inner node- Throws:
IOException
-
writeIndex
private void writeIndex(IndexOutput metaOut, IndexOutput indexOut, int countPerLeaf, BKDWriter.BKDTreeLeafNodes leafNodes, long dataStartFP) throws IOException - Throws:
IOException
-
writeIndex
private void writeIndex(IndexOutput metaOut, IndexOutput indexOut, int countPerLeaf, int numLeaves, byte[] packedIndex, long dataStartFP) throws IOException - Throws:
IOException
-
writeLeafBlockDocs
private void writeLeafBlockDocs(DataOutput out, int[] docIDs, int start, int count) throws IOException - Throws:
IOException
-
writeLeafBlockPackedValues
private void writeLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, IntFunction<BytesRef> packedValues, int leafCardinality) throws IOException - Throws:
IOException
-
writeLowCardinalityLeafBlockPackedValues
private void writeLowCardinalityLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, IntFunction<BytesRef> packedValues) throws IOException - Throws:
IOException
-
writeHighCardinalityLeafBlockPackedValues
private void writeHighCardinalityLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, IntFunction<BytesRef> packedValues, int compressedByteOffset) throws IOException - Throws:
IOException
-
writeActualBounds
private void writeActualBounds(DataOutput out, int[] commonPrefixLengths, int count, IntFunction<BytesRef> packedValues) throws IOException - Throws:
IOException
-
computeMinMax
private static BytesRef[] computeMinMax(int count, IntFunction<BytesRef> packedValues, int offset, int length) Return an array that contains the min and max values for the [offset, offset+length] interval of the givenBytesRef
s. -
writeLeafBlockPackedValuesRange
private void writeLeafBlockPackedValuesRange(DataOutput out, int[] commonPrefixLengths, int start, int end, IntFunction<BytesRef> packedValues) throws IOException - Throws:
IOException
-
runLen
-
writeCommonPrefixes
private void writeCommonPrefixes(DataOutput out, int[] commonPrefixes, byte[] packedValue) throws IOException - Throws:
IOException
-
close
- Specified by:
close
in interfaceAutoCloseable
- Specified by:
close
in interfaceCloseable
- Throws:
IOException
-
verifyChecksum
Called on exception, to check whether the checksum is also corrupt in this source, and add that information (checksum matched or didn't) as a suppressed exception.- Throws:
IOException
-
split
protected int split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits) Pick the next dimension to split.- Parameters:
minPackedValue
- the min values for all dimensionsmaxPackedValue
- the max values for all dimensionsparentSplits
- how many times each dim has been split on the parent levels- Returns:
- the dimension to split
-
switchToHeap
Pull a partition back into heap once the point count is low enough while recursing.- Throws:
IOException
-
build
private void build(int leavesOffset, int numLeaves, MutablePointTree reader, int from, int to, IndexOutput out, byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits, byte[] splitPackedValues, byte[] splitDimensionValues, long[] leafBlockFPs, int[] spareDocIds) throws IOException - Throws:
IOException
-
computePackedValueBounds
private void computePackedValueBounds(BKDRadixSelector.PathSlice slice, byte[] minPackedValue, byte[] maxPackedValue) throws IOException - Throws:
IOException
-
build
private void build(int leavesOffset, int numLeaves, BKDRadixSelector.PathSlice points, IndexOutput out, BKDRadixSelector radixSelector, byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits, byte[] splitPackedValues, byte[] splitDimensionValues, long[] leafBlockFPs, int[] spareDocIds) throws IOException The point writer contains the data that is going to be splitted using radix selection. /* This method is used when we are merging previously written segments, in the numDims > 1 case.- Throws:
IOException
-
computeCommonPrefixLength
private void computeCommonPrefixLength(HeapPointWriter heapPointWriter, byte[] commonPrefix, int from, int to) -
valuesInOrderAndBounds
private static boolean valuesInOrderAndBounds(BKDConfig config, int count, int sortedDim, byte[] minPackedValue, byte[] maxPackedValue, IntFunction<BytesRef> values, int[] docs, int docsOffset) -
valueInOrder
private static boolean valueInOrder(BKDConfig config, long ord, int sortedDim, byte[] lastPackedValue, byte[] packedValue, int packedValueOffset, int doc, int lastDoc) -
valueInBounds
-