Class DirectPostingsFormat

java.lang.Object
  org.apache.lucene.codecs.PostingsFormat
    org.apache.lucene.codecs.memory.DirectPostingsFormat
All Implemented Interfaces:
NamedSPILoader.NamedSPI

public final class DirectPostingsFormat extends PostingsFormat
Wraps Lucene99PostingsFormat for on-disk storage, but at read time loads all terms and postings into RAM, storing them directly as byte[] and int[].

WARNING: This format is exceptionally RAM intensive: it makes no effort to compress the postings data, storing terms as raw byte[] and postings as separate, uncompressed int[]. In exchange, it gives a substantial increase in search performance.
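To get a feel for why uncompressed int[] postings are so costly, the back-of-envelope sketch below estimates heap usage at 4 bytes per int. It assumes one int for the docID and one for the freq of every posting, plus one int per position, and it ignores array headers, term bytes, offsets and payloads. The class and method names are hypothetical and not part of Lucene.

```java
/** Hypothetical back-of-envelope estimator; not part of Lucene. */
final class UncompressedPostingsEstimate {
  private UncompressedPostingsEstimate() {}

  /**
   * Estimates the bytes needed to hold postings as raw 4-byte ints, assuming one int for
   * the docID and one for the freq of every posting, plus one int per position.
   * Array headers, term bytes, offsets and payloads are deliberately ignored.
   */
  static long estimateBytes(long totalPostings, long totalPositions) {
    long ints = 2L * totalPostings + totalPositions; // computed in long to avoid overflow
    return ints * Integer.BYTES;
  }
}
```

For example, 10 million postings with 50 million positions come to (2 × 10,000,000 + 50,000,000) × 4 = 280,000,000 bytes, roughly 280 MB, before any per-array overhead.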

This postings format supports TermsEnum.ord() and TermsEnum.seekExact(long).
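Holding every term in RAM in sorted order is what makes ordinals cheap: a term's ord can simply be its index in the sorted term array, so seeking by ord is a constant-time array access. The sketch below illustrates that idea with a hypothetical class whose method names mirror TermsEnum.ord() and TermsEnum.seekExact(long). It is plain JDK code over String terms (whose natural order stands in for Lucene's byte order) and assumes the terms are unique; it is not Lucene's implementation.

```java
import java.util.Arrays;

/** Hypothetical in-RAM term dictionary illustrating ord-based access; not Lucene code. */
final class SortedTermDictionary {
  private final String[] terms; // sorted; assumed to contain no duplicates
  private int current = -1;     // ord of the current term, -1 while unpositioned

  SortedTermDictionary(String... terms) {
    this.terms = terms.clone();
    Arrays.sort(this.terms);
  }

  /** Positions on the term with the given ordinal in O(1): the ord is the array index. */
  void seekExact(long ord) {
    if (ord < 0 || ord >= terms.length) {
      throw new IllegalArgumentException("ord out of range: " + ord);
    }
    current = (int) ord;
  }

  /** Positions on the given term via binary search; returns false if it is absent. */
  boolean seekExact(String term) {
    int idx = Arrays.binarySearch(terms, term);
    if (idx < 0) {
      return false;
    }
    current = idx;
    return true;
  }

  long ord() {
    if (current < 0) throw new IllegalStateException("unpositioned");
    return current;
  }

  String term() {
    if (current < 0) throw new IllegalStateException("unpositioned");
    return terms[current];
  }
}
```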

Because this holds all term bytes as a single byte[], you cannot have more than 2.1GB worth of term bytes in a single segment.
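The limit follows from Java arrays being indexed by int: a single byte[] can hold at most about Integer.MAX_VALUE (2,147,483,647) bytes. The sketch below packs terms into one byte[] addressed by an int[] of start offsets and fails fast when the combined length would not fit. The class name and layout are assumptions for illustration, not Lucene's internal format.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical packer: all term bytes in one byte[], located via int offsets. Not Lucene code. */
final class PackedTerms {
  final byte[] bytes; // concatenated UTF-8 bytes of every term
  final int[] starts; // starts[i] = offset of term i; starts[n] = bytes.length

  private PackedTerms(byte[] bytes, int[] starts) {
    this.bytes = bytes;
    this.starts = starts;
  }

  static PackedTerms pack(List<String> terms) {
    byte[][] encoded = new byte[terms.size()][];
    long total = 0; // summed as long so an oversized total is detected instead of wrapping
    for (int i = 0; i < encoded.length; i++) {
      encoded[i] = terms.get(i).getBytes(StandardCharsets.UTF_8);
      total += encoded[i].length;
    }
    if (total > Integer.MAX_VALUE) {
      throw new IllegalArgumentException(
          "term bytes (" + total + ") do not fit in a single byte[]");
    }
    byte[] bytes = new byte[(int) total];
    int[] starts = new int[encoded.length + 1];
    int pos = 0;
    for (int i = 0; i < encoded.length; i++) {
      starts[i] = pos;
      System.arraycopy(encoded[i], 0, bytes, pos, encoded[i].length);
      pos += encoded[i].length;
    }
    starts[encoded.length] = pos;
    return new PackedTerms(bytes, starts);
  }

  /** Decodes term i back from the shared byte[]. */
  String term(int i) {
    return new String(bytes, starts[i], starts[i + 1] - starts[i], StandardCharsets.UTF_8);
  }
}
```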

  • Field Details

    • minSkipCount

      private final int minSkipCount
    • lowFreqCutoff

      private final int lowFreqCutoff
    • DEFAULT_MIN_SKIP_COUNT

      private static final int DEFAULT_MIN_SKIP_COUNT
    • DEFAULT_LOW_FREQ_CUTOFF

      private static final int DEFAULT_LOW_FREQ_CUTOFF
  • Constructor Details

    • DirectPostingsFormat

      public DirectPostingsFormat()
    • DirectPostingsFormat

      public DirectPostingsFormat(int minSkipCount, int lowFreqCutoff)
      minSkipCount is how many consecutive terms must share the same prefix before we write a skip pointer. Terms with docFreq <= lowFreqCutoff use a single int[] to hold all docs, freqs, positions and offsets; terms with a higher docFreq use separate arrays.
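The lowFreqCutoff rule can be pictured with a small encoder. In the hypothetical sketch below, a term whose docFreq is at most the cutoff gets its docIDs, freqs and positions interleaved into one int[], while a more frequent term keeps docIDs, freqs and positions in separate arrays. The interleaving order (doc, freq, then that doc's positions) and the omission of offsets are assumptions made for brevity; this is not Lucene's actual encoding.

```java
/** Hypothetical illustration of the lowFreqCutoff decision; not Lucene's real encoding. */
final class PostingsLayout {
  private PostingsLayout() {}

  /**
   * Encodes one term's postings. positions[i] holds the term's positions in docs[i],
   * so the freq for docs[i] is positions[i].length. Returns a single interleaved array
   * for rare terms, or three parallel arrays {docs, freqs, positions} for frequent ones.
   */
  static int[][] encode(int[] docs, int[][] positions, int lowFreqCutoff) {
    int docFreq = docs.length;
    if (docFreq <= lowFreqCutoff) {
      // Rare term: [doc0, freq0, pos..., doc1, freq1, pos..., ...] in one int[]
      int size = 0;
      for (int[] p : positions) size += 2 + p.length;
      int[] all = new int[size];
      int upto = 0;
      for (int i = 0; i < docFreq; i++) {
        all[upto++] = docs[i];
        all[upto++] = positions[i].length;
        for (int pos : positions[i]) all[upto++] = pos;
      }
      return new int[][] {all};
    }
    // Frequent term: separate arrays for docIDs, freqs and concatenated positions
    int[] freqs = new int[docFreq];
    int totalPositions = 0;
    for (int i = 0; i < docFreq; i++) {
      freqs[i] = positions[i].length;
      totalPositions += freqs[i];
    }
    int[] allPositions = new int[totalPositions];
    int upto = 0;
    for (int[] p : positions) {
      System.arraycopy(p, 0, allPositions, upto, p.length);
      upto += p.length;
    }
    return new int[][] {docs.clone(), freqs, allPositions};
  }
}
```

Keeping rare terms in one array means a single allocation and sequential reads per term, while frequent terms benefit from separate arrays that can be scanned independently (for example, docIDs without touching positions).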
  • Method Details