Module org.apache.lucene.core
Class Lucene90CompressingTermVectorsWriter
java.lang.Object
org.apache.lucene.codecs.TermVectorsWriter
org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingTermVectorsWriter
- All Implemented Interfaces:
Closeable, AutoCloseable, Accountable
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
private static class
private class: a pending doc
private class: a pending field
-
Field Summary
Fields
Modifier and Type, Field
(package private) static final boolean BULK_MERGE_ENABLED
(package private) static final String BULK_MERGE_ENABLED_SYSPROP
private final int chunkSize
private final CompressionMode compressionMode
private final Compressor compressor
(package private) static final int FLAGS_BITS
private FieldsIndexWriter indexWriter
private final BytesRef lastTerm
private int[] lengthsBuf
private final int maxDocsPerChunk
(package private) static final int META_VERSION_START
private IndexOutput metaStream
private long numChunks
private long numDirtyChunks
private long numDirtyDocs
private int numDocs
(package private) static final int OFFSETS
(package private) static final int PACKED_BLOCK_SIZE
private final ByteBuffersDataOutput payloadBytes
private int[] payloadLengthsBuf
(package private) static final int PAYLOADS
private final Deque<Lucene90CompressingTermVectorsWriter.DocData> pendingDocs
(package private) static final int POSITIONS
private int[] positionsBuf
private final ByteBuffersDataOutput scratchBuffer
private final String segment
private int[] startOffsetsBuf
private final ByteBuffersDataOutput termSuffixes
(package private) static final String VECTORS_EXTENSION
(package private) static final String VECTORS_INDEX_CODEC_NAME
(package private) static final String VECTORS_INDEX_EXTENSION
(package private) static final String VECTORS_META_EXTENSION
private IndexOutput vectorsStream
(package private) static final int VERSION_CURRENT
(package private) static final int VERSION_START
private final BlockPackedWriter writer
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
Constructor Summary
Constructors
Lucene90CompressingTermVectorsWriter(Directory directory, SegmentInfo si, String segmentSuffix, IOContext context, String formatName, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockShift): Sole constructor.
-
Method Summary
Modifier and Type, Method, Description
addDocData(int numVectorFields)
void addPosition(int position, int startOffset, int endOffset, BytesRef payload): Adds a term position and offsets.
void addProx: Called by IndexWriter when writing new segments.
private boolean canPerformBulkMerge(MergeState mergeState, MatchingReaders matchingReaders, int readerIndex)
void close()
private void copyChunks(MergeState mergeState, Lucene90CompressingTermVectorsWriter.CompressingTermVectorsSub sub, int fromDocID, int toDocID)
void finish(int numDocs): Called before TermVectorsWriter.close(), passing in the number of documents that were written.
void finishDocument: Called after a doc and all its fields have been added.
void finishField: Called after a field and all its terms have been added.
private void flush(boolean force)
private int[] flushFieldNums: Returns a sorted array containing unique field numbers.
private void flushFields(int totalFields, int[] fieldNums)
private void flushFlags(int totalFields, int[] fieldNums)
private int flushNumFields(int chunkDocs)
private void flushNumTerms(int totalFields)
private void flushOffsets(int[] fieldNums)
private void flushPayloadLengths
private void flushPositions
private void flushTermFreqs
private void flushTermLengths
getChildResources: Returns nested resources of this class.
int merge(MergeState mergeState): Merges in the term vectors from the readers in mergeState.
long ramBytesUsed(): Return the memory usage of this object in bytes.
void startDocument(int numVectorFields): Called before writing the term vectors of the document.
void startField(FieldInfo info, int numTerms, boolean positions, boolean offsets, boolean payloads): Called before writing the terms of the field.
void startTerm: Adds a term and its term frequency freq.
(package private) boolean tooDirty(Lucene90CompressingTermVectorsReader candidate): Returns true if we should recompress this reader, even though we could bulk merge compressed data.
private boolean triggerFlush()
Methods inherited from class org.apache.lucene.codecs.TermVectorsWriter
addAllDocVectors, finishTerm
-
Field Details
-
VECTORS_EXTENSION
-
VECTORS_INDEX_EXTENSION
-
VECTORS_META_EXTENSION
-
VECTORS_INDEX_CODEC_NAME
-
VERSION_START
static final int VERSION_START
-
VERSION_CURRENT
static final int VERSION_CURRENT
-
META_VERSION_START
static final int META_VERSION_START
-
PACKED_BLOCK_SIZE
static final int PACKED_BLOCK_SIZE
-
POSITIONS
static final int POSITIONS
-
OFFSETS
static final int OFFSETS
-
PAYLOADS
static final int PAYLOADS
-
FLAGS_BITS
static final int FLAGS_BITS
-
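The names POSITIONS, OFFSETS, PAYLOADS and FLAGS_BITS read as per-field bit flags plus the number of bits needed to store their combination. The following is a minimal sketch of that idea only. The constant values (1, 2, 4) and the class name FieldFlagsSketch are assumptions, since this page does not show the actual values.

```java
// Hypothetical sketch: constant values below are assumed, not taken from this page.
final class FieldFlagsSketch {
  static final int POSITIONS = 0x01;
  static final int OFFSETS = 0x02;
  static final int PAYLOADS = 0x04;

  // Number of bits needed to represent any combination of the three flags.
  static final int FLAGS_BITS =
      32 - Integer.numberOfLeadingZeros(POSITIONS | OFFSETS | PAYLOADS);

  // Packs the three per-field booleans into one small integer.
  static int encode(boolean positions, boolean offsets, boolean payloads) {
    return (positions ? POSITIONS : 0)
        | (offsets ? OFFSETS : 0)
        | (payloads ? PAYLOADS : 0);
  }

  // Tests whether a single flag is set in a packed value.
  static boolean has(int flags, int flag) {
    return (flags & flag) != 0;
  }
}
```

Packing the booleans this way lets the flags of many fields be stored with a fixed, small number of bits each.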
segment
-
indexWriter
-
metaStream
-
vectorsStream
-
compressionMode
-
compressor
-
chunkSize
private final int chunkSize
-
numChunks
private long numChunks
-
numDirtyChunks
private long numDirtyChunks
-
numDirtyDocs
private long numDirtyDocs
-
numDocs
private int numDocs
-
pendingDocs
-
curDoc
-
curField
-
lastTerm
-
positionsBuf
private int[] positionsBuf
-
startOffsetsBuf
private int[] startOffsetsBuf
-
lengthsBuf
private int[] lengthsBuf
-
payloadLengthsBuf
private int[] payloadLengthsBuf
-
termSuffixes
-
payloadBytes
-
writer
-
maxDocsPerChunk
private final int maxDocsPerChunk
-
scratchBuffer
-
BULK_MERGE_ENABLED_SYSPROP
-
BULK_MERGE_ENABLED
static final boolean BULK_MERGE_ENABLED
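The pairing of BULK_MERGE_ENABLED_SYSPROP with BULK_MERGE_ENABLED suggests a boolean switch read from a JVM system property. A minimal sketch of that pattern follows. The property name "example.bulkMerge.enabled", the default value, and the class name BulkMergeSwitchSketch are all hypothetical, because this page does not show the real property name or default.

```java
// Hypothetical sketch of a system-property switch; names and default are assumptions.
final class BulkMergeSwitchSketch {
  // Placeholder name: the real value of BULK_MERGE_ENABLED_SYSPROP is not shown on this page.
  static final String SYSPROP = "example.bulkMerge.enabled";

  // Returns the parsed property value, or defaultValue when the property
  // is unset or cannot be read.
  static boolean isEnabled(boolean defaultValue) {
    try {
      String value = System.getProperty(SYSPROP);
      return value == null ? defaultValue : Boolean.parseBoolean(value);
    } catch (SecurityException e) {
      return defaultValue;
    }
  }
}
```

With this sketch, starting the JVM with -Dexample.bulkMerge.enabled=false would turn the switch off.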
-
-
Constructor Details
-
Lucene90CompressingTermVectorsWriter
Lucene90CompressingTermVectorsWriter(Directory directory, SegmentInfo si, String segmentSuffix, IOContext context, String formatName, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockShift) throws IOException
Sole constructor.
- Throws:
IOException
-
-
Method Details
-
addDocData
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in class TermVectorsWriter
- Throws:
IOException
-
startDocument
Description copied from class: TermVectorsWriter
Called before writing the term vectors of the document. TermVectorsWriter.startField(FieldInfo, int, boolean, boolean, boolean) will be called numVectorFields times. Note that if term vectors are enabled, this is called even if the document has no vector fields; in this case numVectorFields will be zero.
- Specified by:
startDocument in class TermVectorsWriter
- Throws:
IOException
-
finishDocument
Description copied from class: TermVectorsWriter
Called after a doc and all its fields have been added.
- Overrides:
finishDocument in class TermVectorsWriter
- Throws:
IOException
-
startField
public void startField(FieldInfo info, int numTerms, boolean positions, boolean offsets, boolean payloads) throws IOException
Description copied from class: TermVectorsWriter
Called before writing the terms of the field. TermVectorsWriter.startTerm(BytesRef, int) will be called numTerms times.
- Specified by:
startField in class TermVectorsWriter
- Throws:
IOException
-
finishField
Description copied from class: TermVectorsWriter
Called after a field and all its terms have been added.
- Overrides:
finishField in class TermVectorsWriter
- Throws:
IOException
-
startTerm
Description copied from class: TermVectorsWriter
Adds a term and its term frequency freq. If this field has positions and/or offsets enabled, then TermVectorsWriter.addPosition(int, int, int, BytesRef) will be called freq times.
- Specified by:
startTerm in class TermVectorsWriter
- Throws:
IOException
-
addPosition
public void addPosition(int position, int startOffset, int endOffset, BytesRef payload) throws IOException
Description copied from class: TermVectorsWriter
Adds a term position and offsets.
- Specified by:
addPosition in class TermVectorsWriter
- Throws:
IOException
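The descriptions of startDocument, startField, startTerm and addPosition together imply a nested call order: one startField per vector field, one startTerm per term, and one addPosition per occurrence. The sketch below records that order for one small document. CallOrderSketch is a hypothetical stand-in with simplified parameter types (String instead of FieldInfo and BytesRef). It is not the Lucene class and performs no I/O.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that only records the call order; not the Lucene writer.
final class CallOrderSketch {
  final List<String> calls = new ArrayList<>();

  void startDocument(int numVectorFields) { calls.add("startDocument(" + numVectorFields + ")"); }
  void startField(String field, int numTerms) { calls.add("startField(" + field + "," + numTerms + ")"); }
  void startTerm(String term, int freq) { calls.add("startTerm(" + term + "," + freq + ")"); }
  void addPosition(int position, int startOffset, int endOffset) { calls.add("addPosition(" + position + ")"); }
  void finishTerm() { calls.add("finishTerm"); }
  void finishField() { calls.add("finishField"); }
  void finishDocument() { calls.add("finishDocument"); }
  void finish(int numDocs) { calls.add("finish(" + numDocs + ")"); }

  // Writes one document whose single field "body" holds "lucene vectors lucene".
  static List<String> demo() {
    CallOrderSketch w = new CallOrderSketch();
    w.startDocument(1);        // one vector field follows
    w.startField("body", 2);   // two distinct terms follow
    w.startTerm("lucene", 2);  // freq 2, so two addPosition calls follow
    w.addPosition(0, 0, 6);
    w.addPosition(2, 15, 21);
    w.finishTerm();
    w.startTerm("vectors", 1); // freq 1, so one addPosition call follows
    w.addPosition(1, 7, 14);
    w.finishTerm();
    w.finishField();
    w.finishDocument();
    w.finish(1);               // one startDocument call was made
    return w.calls;
  }
}
```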
-
triggerFlush
private boolean triggerFlush()
-
flush
- Throws:
IOException
-
flushNumFields
- Throws:
IOException
-
flushFieldNums
Returns a sorted array containing unique field numbers.
- Throws:
IOException
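As an illustration of "a sorted array containing unique field numbers", here is a minimal sketch that collects field numbers across several documents. The class FieldNumsSketch and its input shape (one int[] of field numbers per document) are assumptions for the example. The real method works on the writer's pending documents and also writes to the output stream.

```java
import java.util.TreeSet;

// Hypothetical sketch: deduplicates and sorts field numbers gathered from several docs.
final class FieldNumsSketch {
  static int[] sortedUnique(int[][] fieldNumsPerDoc) {
    // TreeSet keeps elements both unique and in ascending order.
    TreeSet<Integer> unique = new TreeSet<>();
    for (int[] doc : fieldNumsPerDoc) {
      for (int fieldNum : doc) {
        unique.add(fieldNum);
      }
    }
    int[] result = new int[unique.size()];
    int i = 0;
    for (int fieldNum : unique) {
      result[i++] = fieldNum;
    }
    return result;
  }
}
```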
-
flushFields
- Throws:
IOException
-
flushFlags
- Throws:
IOException
-
flushNumTerms
- Throws:
IOException
-
flushTermLengths
- Throws:
IOException
-
flushTermFreqs
- Throws:
IOException
-
flushPositions
- Throws:
IOException
-
flushOffsets
- Throws:
IOException
-
flushPayloadLengths
- Throws:
IOException
-
finish
Description copied from class: TermVectorsWriter
Called before TermVectorsWriter.close(), passing in the number of documents that were written. Note that this is intentionally redundant (equivalent to the number of calls to TermVectorsWriter.startDocument(int)), but a Codec should check that this is the case to detect the JRE bug described in LUCENE-1282.
- Specified by:
finish in class TermVectorsWriter
- Throws:
IOException
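The redundancy check described for finish can be sketched as a counter compared against the numDocs argument. DocCountCheckSketch is a hypothetical class, and the choice of IllegalStateException is an assumption, since this page does not say which exception the real writer raises.

```java
// Hypothetical sketch of the consistency check described for finish(int).
final class DocCountCheckSketch {
  private int numDocs; // incremented once per startDocument call

  void startDocument(int numVectorFields) {
    numDocs++;
  }

  // Fails loudly if the caller's count disagrees with the number of startDocument calls.
  void finish(int expectedNumDocs) {
    if (numDocs != expectedNumDocs) {
      throw new IllegalStateException(
          "Wrote " + numDocs + " docs, finish called with numDocs=" + expectedNumDocs);
    }
  }
}
```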
-
addProx
Description copied from class: TermVectorsWriter
Called by IndexWriter when writing new segments.
This is an expert API that allows the codec to consume positions and offsets directly from the indexer.
The default implementation calls TermVectorsWriter.addPosition(int, int, int, BytesRef), but subclasses can override this if they want to efficiently write all the positions, then all the offsets, for example.
NOTE: This API is extremely expert and subject to change or removal!!!
- Overrides:
addProx in class TermVectorsWriter
- Throws:
IOException
-
copyChunks
private void copyChunks(MergeState mergeState, Lucene90CompressingTermVectorsWriter.CompressingTermVectorsSub sub, int fromDocID, int toDocID) throws IOException
- Throws:
IOException
-
merge
Description copied from class: TermVectorsWriter
Merges in the term vectors from the readers in mergeState. The default implementation skips over deleted documents, and uses TermVectorsWriter.startDocument(int), TermVectorsWriter.startField(FieldInfo, int, boolean, boolean, boolean), TermVectorsWriter.startTerm(BytesRef, int), TermVectorsWriter.addPosition(int, int, int, BytesRef), and TermVectorsWriter.finish(int), returning the number of documents that were written. Implementations can override this method for more sophisticated merging (bulk-byte copying, etc).
- Overrides:
merge in class TermVectorsWriter
- Throws:
IOException
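The default merge behaviour described above (visit each reader, skip deleted documents, return the number written) can be sketched without any Lucene types. MergeSketch, its Reader class, and the use of a boolean[] as a live-docs mask are hypothetical simplifications. Appending to the output list stands in for the startDocument through finishDocument calls.

```java
import java.util.List;

// Hypothetical sketch of a doc-by-doc merge that skips deleted documents.
final class MergeSketch {
  // Simplified reader: documents plus a live-docs mask (null means no deletions).
  static final class Reader {
    final String[] docs;
    final boolean[] liveDocs;

    Reader(String[] docs, boolean[] liveDocs) {
      this.docs = docs;
      this.liveDocs = liveDocs;
    }
  }

  // Copies every live document into out and returns how many were written.
  static int merge(List<Reader> readers, List<String> out) {
    int written = 0;
    for (Reader reader : readers) {
      for (int docID = 0; docID < reader.docs.length; docID++) {
        if (reader.liveDocs != null && !reader.liveDocs[docID]) {
          continue; // deleted document: skip it
        }
        out.add(reader.docs[docID]);
        written++;
      }
    }
    return written;
  }
}
```

A bulk merge, by contrast, would copy whole compressed chunks instead of re-adding each document.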
-
tooDirty
Returns true if we should recompress this reader, even though we could bulk merge compressed data.
The last chunk written for a segment is typically incomplete, so without recompressing, in some worst-case situations (e.g. frequent reopen with tiny flushes), over time the compression ratio can degrade. This is a safety switch.
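One plausible shape for such a safety switch is a pair of thresholds over the dirty-chunk statistics. The sketch below treats a reader as too dirty when it holds more dirty documents than fit in one chunk and more than 1% of its chunks are dirty. Both thresholds and the class name DirtyCheckSketch are assumptions made for illustration. This page does not document the actual rule.

```java
// Hypothetical sketch of a "too dirty" heuristic; thresholds are assumed.
final class DirtyCheckSketch {
  static boolean tooDirty(
      long numDirtyDocs, long numDirtyChunks, long numChunks, int maxDocsPerChunk) {
    // Enough dirty docs to fill at least one full chunk ...
    boolean enoughDirtyDocs = numDirtyDocs > maxDocsPerChunk;
    // ... and more than 1% of all chunks are dirty.
    boolean dirtyRatioExceeded = numDirtyChunks * 100 > numChunks;
    return enoughDirtyDocs && dirtyRatioExceeded;
  }
}
```

Requiring both conditions keeps a single incomplete trailing chunk from forcing recompression of an otherwise clean segment.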
-
canPerformBulkMerge
private boolean canPerformBulkMerge(MergeState mergeState, MatchingReaders matchingReaders, int readerIndex)
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
-
getChildResources
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
-
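The two Accountable contracts above (a non-negative byte count, and child resources returned as a point-in-time snapshot) can be illustrated with a small stand-in. AccountableSketch, its Sized interface and the Buffers class are hypothetical. The byte estimate counts only array payloads and ignores object headers, so it is a rough approximation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of memory accounting with snapshot semantics.
final class AccountableSketch {
  // Stand-in for the ramBytesUsed part of Accountable.
  interface Sized {
    long ramBytesUsed();
  }

  static final class Buffers implements Sized {
    private final List<int[]> buffers = new ArrayList<>();

    void add(int[] buffer) {
      buffers.add(buffer);
    }

    // Sums array payload sizes; never negative.
    @Override
    public long ramBytesUsed() {
      long sum = 0;
      for (int[] buffer : buffers) {
        sum += (long) buffer.length * Integer.BYTES;
      }
      return sum;
    }

    // Returns a copy, so later add() calls do not change what the caller sees.
    List<int[]> children() {
      return Collections.unmodifiableList(new ArrayList<>(buffers));
    }
  }
}
```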