Ability for rocksdb to compact when flushing the in-memory memtable to a file in L0.

Summary:
Rocks accumulates recent writes and deletes in the in-memory memtable.
When the memtable is full, it writes the contents on the memtable to
a file in L0.

This patch removes redundant records at the time of the flush. If there
are multiple versions of the same key in the memtable, then only the
most recent one is dumped into the output file. The purging of
redundant records occur only if the most recent snapshot is earlier
than the earliest record in the memtable.

Should we switch on this feature by default or should we keep this feature
turned off in the default settings?

Test Plan: Added test case to db_test.cc

Reviewers: sheki, vamsi, emayanke, heyongqiang

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8991
This commit is contained in:
Dhruba Borthakur
2013-02-28 14:09:30 -08:00
parent 4992633751
commit 806e264350
10 changed files with 201 additions and 18 deletions

View File

@@ -666,13 +666,18 @@ Status DBImpl::WriteLevel0TableForRecovery(MemTable* mem, VersionEdit* edit) {
meta.number = versions_->NewFileNumber();
pending_outputs_.insert(meta.number);
Iterator* iter = mem->NewIterator();
const SequenceNumber newest_snapshot = snapshots_.GetNewest();
const SequenceNumber earliest_seqno_in_memtable =
mem->GetFirstSequenceNumber();
Log(options_.info_log, "Level-0 table #%llu: started",
(unsigned long long) meta.number);
Status s;
{
mutex_.Unlock();
s = BuildTable(dbname_, env_, options_, table_cache_.get(), iter, &meta);
s = BuildTable(dbname_, env_, options_, table_cache_.get(), iter, &meta,
user_comparator(), newest_snapshot,
earliest_seqno_in_memtable);
mutex_.Lock();
}
@@ -710,6 +715,9 @@ Status DBImpl::WriteLevel0Table(MemTable* mem, VersionEdit* edit,
*filenumber = meta.number;
pending_outputs_.insert(meta.number);
Iterator* iter = mem->NewIterator();
const SequenceNumber newest_snapshot = snapshots_.GetNewest();
const SequenceNumber earliest_seqno_in_memtable =
mem->GetFirstSequenceNumber();
Log(options_.info_log, "Level-0 flush table #%llu: started",
(unsigned long long) meta.number);
@@ -718,7 +726,9 @@ Status DBImpl::WriteLevel0Table(MemTable* mem, VersionEdit* edit,
Status s;
{
mutex_.Unlock();
s = BuildTable(dbname_, env_, options_, table_cache_.get(), iter, &meta);
s = BuildTable(dbname_, env_, options_, table_cache_.get(), iter, &meta,
user_comparator(), newest_snapshot,
earliest_seqno_in_memtable);
mutex_.Lock();
}
base->Unref();