
URL: http://github.com/ozontech/seq-db/pull/334


feat: new WAL file for meta #334

Open
cheb0 wants to merge 8 commits into main from 311-wal-crash-recovery

Conversation

@cheb0
Member

cheb0 commented Jan 30, 2026

Description

A new WAL file which can withstand arbitrary failures like power shutdowns, lost disk sector writes or corrupted bytes.

There is also a new MetaBlock type which replaces DocBlock everywhere meta is transferred or stored on the filesystem.

Fixes #311


  • I have read and followed all requirements in CONTRIBUTING.md;
  • I used LLM/AI assistance to make this pull request;

@github-actions
Contributor

🔴 Performance Degradation

Some benchmarks have degraded compared to the previous run.
Click on Show table button to see full list of degraded benchmarks.

| Name | Previous | Current | Ratio | Verdict |
|---|---|---|---|---|
| FindSequence_Random/small-4 | 174405 | 6e8721 | | |
| | 10280.15 MB/s | 5489.51 MB/s | 0.53 | 🔴 |
| | 24.90 ns/op | 46.63 ns/op | 1.87 | 🔴 |

@codecov-commenter

codecov-commenter commented Jan 30, 2026

Codecov Report

❌ Patch coverage is 85.57994% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.33%. Comparing base (1744059) to head (4375be9).
⚠️ Report is 14 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| frac/active.go | 71.42% | 17 Missing and 5 partials ⚠️ |
| storage/wal_reader.go | 86.07% | 8 Missing and 3 partials ⚠️ |
| storage/wal_block.go | 94.11% | 5 Missing and 1 partial ⚠️ |
| frac/active_writer.go | 66.66% | 3 Missing ⚠️ |
| storage/wal_writer.go | 90.90% | 1 Missing and 1 partial ⚠️ |
| storeapi/grpc_bulk.go | 0.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #334      +/-   ##
==========================================
- Coverage   71.65%   71.33%   -0.32%     
==========================================
  Files         204      209       +5     
  Lines       14770    15208     +438     
==========================================
+ Hits        10583    10849     +266     
- Misses       3435     3572     +137     
- Partials      752      787      +35     


@cheb0
Member Author

cheb0 commented Jan 30, 2026

@seqbenchbot up main bulk

@seqbenchbot

seqbenchbot commented Jan 30, 2026

Nice, @cheb0 <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - 0c97854c.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@cheb0
Member Author

cheb0 commented Jan 30, 2026

@seqbenchbot down 0c97854c

@seqbenchbot

seqbenchbot commented Jan 30, 2026

Nice, @cheb0 <(-^,^-)=b!

Your request was successfully served.
The benchmark with identificator 0c97854c was stopped.

Have a great time!

@dkharms added the feature and stability labels Feb 10, 2026

// MetaBlock format: M : V : C : LLLL : UUUU : PPPP : DDDD-DDDD : HHHH
// M = Magic (101), V = Version, C = Codec, L = Length, U = Raw Length, P = Payload Checksum, D = Docs Offset, H = Header Checksum

type MetaBlock []byte
Member

Maybe we can go with WALBlock?

Member Author

renamed to WalBlock
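
For illustration, the 27-byte header described in the format comment (M : V : C : LLLL : UUUU : PPPP : DDDD-DDDD : HHHH) could be encoded and checked roughly as follows. This is a minimal sketch, not the code from the PR: the little-endian byte order, the IEEE CRC32 polynomial, the choice to checksum the first 23 header bytes, and the names `encodeHeader` and `checkHeader` are all assumptions.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const (
	blockMagic     byte = 101 // magic value from the format comment
	blockHeaderLen      = 27  // 1 + 1 + 1 + 4 + 4 + 4 + 8 + 4
)

// encodeHeader lays out M:V:C:LLLL:UUUU:PPPP:DDDD-DDDD:HHHH.
// Byte order and checksum coverage are assumptions of this sketch.
func encodeHeader(version, codec byte, payload []byte, rawLen uint32, docsOffset uint64) []byte {
	h := make([]byte, blockHeaderLen)
	h[0], h[1], h[2] = blockMagic, version, codec
	binary.LittleEndian.PutUint32(h[3:7], uint32(len(payload)))          // L: stored length
	binary.LittleEndian.PutUint32(h[7:11], rawLen)                       // U: raw length
	binary.LittleEndian.PutUint32(h[11:15], crc32.ChecksumIEEE(payload)) // P: payload checksum
	binary.LittleEndian.PutUint64(h[15:23], docsOffset)                  // D: docs offset
	binary.LittleEndian.PutUint32(h[23:27], crc32.ChecksumIEEE(h[:23]))  // H: header checksum
	return h
}

// checkHeader verifies the magic byte and the header checksum.
func checkHeader(h []byte) error {
	if len(h) < blockHeaderLen {
		return errors.New("short header")
	}
	if h[0] != blockMagic {
		return fmt.Errorf("bad magic: %d", h[0])
	}
	if binary.LittleEndian.Uint32(h[23:27]) != crc32.ChecksumIEEE(h[:23]) {
		return errors.New("header checksum mismatch")
	}
	return nil
}

func main() {
	h := encodeHeader(1, 0, []byte("meta"), 4, 4096)
	fmt.Println(len(h), checkHeader(h))

	h[5] ^= 0xFF // simulate a corrupted byte inside the length field
	fmt.Println(checkHeader(h))
}
```

Keeping a separate header checksum lets a reader reject a damaged header before trusting its length field, which is what makes skipping to the next aligned position safe during replay.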

// BlockAlignment is the alignment boundary for blocks in the new WAL format. Must be greater than
// MetaBlock header (27 bytes) to prevent header torn writes and allow faster navigation during replay
// of corrupted WAL file
BlockAlignment int64 = 64
Member

Suggested change
- BlockAlignment int64 = 64
+ WALBlockAlignment int64 = 64

Member Author

fixed
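
A small sketch of the rounding that the alignment constant implies. The helper name `nextBlockOffset` appears later in this review, but its body is not shown in the conversation, so the implementation below is an assumption: it rounds an offset up to the next multiple of 64.

```go
package main

import "fmt"

// walBlockAlignment mirrors the 64-byte alignment discussed above.
const walBlockAlignment int64 = 64

// nextBlockOffset rounds offset up to the nearest multiple of
// walBlockAlignment (assumed behaviour; offset must be non-negative).
func nextBlockOffset(offset int64) int64 {
	return (offset + walBlockAlignment - 1) / walBlockAlignment * walBlockAlignment
}

func main() {
	for _, off := range []int64{0, 5, 27, 64, 65} {
		fmt.Println(off, "->", nextBlockOffset(off))
	}
}
```

Because the 27-byte block header is smaller than the alignment, a header that starts on a 64-byte boundary never straddles the next boundary, and a reader recovering from corruption only has to probe offsets that are multiples of 64.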

"github.com/ozontech/seq-db/logger"
)

type WalRecord struct {
Member

Let's name everything WAL-related consistently. In some places you have Wal, in others WAL.

Member Author

done

}

for {
headerBuf := make([]byte, MetaBlockHeaderLen)
Member

Seems like we could reuse headerBuf on each iteration.
Is it possible to move headerBuf outside of the loop?

Member Author

done

"github.com/ozontech/seq-db/metric/stopwatch"
)

const (
Member

Maybe we can move these variables to meta_block.go (which must be renamed to wal_block.go in my opinion) and introduce a new WALHeader structure with accessor methods, e.g. Magic() and Version()?

Member Author

The initial version had exactly that: a dedicated WalHeader struct. But I didn't see much point in it right now, since it isn't used much.

I wouldn't move constants to wal_block.go since they are not related to the block itself, only to WAL.

Member

dkharms commented Feb 20, 2026

Yeah, I see, then a small nit:

const (
	// _WalVersionInitial is the first version of WAL file
	// with CRC32 checksums and 64 byte alignment for blocks.
	_WalVersionInitial byte = iota
)

const (
	// WalMagic is the magic number at the start of WAL files.
	WalMagic uint32 = 0xFFFFFFFF
	// WalCurrentVersion is the current WAL format version.
	WalCurrentVersion = _WalVersionInitial
	// WalHeaderSize is the size of the WAL header in bytes (4 bytes magic + 1 byte version).
	// 59 bytes are also reserved due to alignment.
	WalHeaderSize = 5
)

const (
	// WalBlockAlignment is the alignment boundary for blocks in the new WAL format.
	// Must be greater than [WalBlockHeaderLen] to prevent header torn writes
	// and allow faster navigation during replay of corrupted WAL file.
	WalBlockAlignment int64 = 64
)

That's just my personal preference, so you can ignore it (and it's very insignificant, I guess). I've added the prefix (_) to _WalVersionInitial to signal to other developers that this constant should not be used directly.

By the way, you use a specific WAL version instead of the current WAL version in wal_reader.go:

version := header[4]
if version != _WalVersionInitial {
	return nil, fmt.Errorf("unknown WAL version: %d (supported: %d)", version, _WalVersionInitial)
}
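
One way to read the remark above is that the reader should compare against the current-version constant, so the check keeps working when a new version is introduced. The sketch below assumes that reading; `parseWALHeader` is a hypothetical helper, the version value 1 is an assumption, and the byte order does not matter for an all-ones magic.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	walMagic          uint32 = 0xFFFFFFFF
	walVersionInitial byte   = 1 // assumed value of the first format version
	walCurrentVersion        = walVersionInitial
	walHeaderSize            = 5 // 4 bytes magic + 1 byte version
)

// parseWALHeader validates the magic and version of a WAL file header.
func parseWALHeader(header []byte) (byte, error) {
	if len(header) < walHeaderSize {
		return 0, fmt.Errorf("WAL header too short: %d bytes", len(header))
	}
	if magic := binary.LittleEndian.Uint32(header[:4]); magic != walMagic {
		return 0, fmt.Errorf("bad WAL magic: %#x", magic)
	}
	version := header[4]
	if version != walCurrentVersion {
		return 0, fmt.Errorf("unknown WAL version: %d (supported: %d)", version, walCurrentVersion)
	}
	return version, nil
}

func main() {
	fmt.Println(parseWALHeader([]byte{0xFF, 0xFF, 0xFF, 0xFF, 1}))
	fmt.Println(parseWALHeader([]byte{0xFF, 0xFF, 0xFF, 0xFF, 2}))
}
```

If older versions must stay readable after a format bump, the equality check would instead become a range or switch over the supported versions.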

// WALMagic is the magic number at the start of WAL files
WALMagic uint32 = 0xFFFFFFFF
// WALVersion1 is the first version of WAL file with CRC32 checksums and 64 byte alignment for blocks.
WALVersion1 uint8 = 1
Member

Why WALVersion1 and not just WALVersion?

Member Author

Next line, WALCurrentVersion = WalVersion1

}
}
return result
}
Member

I guess we can get rid of the CAS loop here.

Let's rewrite initialization like this:

func NewWalWriter(ws WriteSyncer, offset int64, skipSync bool) *WalWriter {
	w := &WalWriter{
		ws:       ws,
		skipSync: skipSync,
		notify:   make(chan struct{}, 1),
	}

	// Here we have offset that can be calculated like 64 * `k` for some `k`.
	w.offset.Store(nextBlockOffset(offset))
	// Write a header at the beginning if it's a new file.
	if offset == 0 {
		if err := writeWALHeader(ws); err != nil {
			logger.Panic("failed to write WAL header", zap.Error(err))
		}

		if !skipSync {
			_ = ws.Sync()
		}

		// Here we have offset that can be calculated like 64 * `k` for some `k`.
		// In this case `k` is equal to 1.
		w.offset.Store(nextBlockOffset(WALHeaderSize))
	}

	w.wg.Add(1)
	go func() {
		w.syncLoop()
		w.wg.Done()
	}()

	return w
}

So we know that offset is already aligned.
Now we can reserve space like this:

func (w *WalWriter) reserveSpace(blockSize int64) int64 {
	aligned := nextBlockOffset(blockSize)

	// w.offset is already aligned.
	// So when we add aligned block we still have aligned offset.
	end := w.offset.Add(aligned)
	start := end - aligned

	return start
}

I've applied these changes locally and tests are green.

Member Author

yep, if offset is always aligned, then CAS is not needed. fixed
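
To illustrate why a single atomic add is enough once the offset is always aligned, here is a self-contained sketch with concurrent writers. `offsetAllocator`, `reserve`, and `alignUp` are hypothetical stand-ins for the writer's offset handling; it uses `atomic.Int64`, which requires Go 1.19 or newer.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const alignment int64 = 64

// alignUp rounds n up to the next multiple of alignment.
func alignUp(n int64) int64 { return (n + alignment - 1) / alignment * alignment }

// offsetAllocator hands out non-overlapping, aligned regions of a file.
type offsetAllocator struct{ offset atomic.Int64 }

// reserve claims alignUp(blockSize) bytes and returns the start offset.
// If the stored offset is aligned, every start offset is aligned as well.
func (a *offsetAllocator) reserve(blockSize int64) int64 {
	aligned := alignUp(blockSize)
	return a.offset.Add(aligned) - aligned
}

func main() {
	var a offsetAllocator
	a.offset.Store(alignment) // first block starts after the 64-byte header slot

	const writers = 100
	starts := make([]int64, writers)
	var wg sync.WaitGroup
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			starts[i] = a.reserve(int64(i + 1))
		}(i)
	}
	wg.Wait()

	seen := make(map[int64]bool, writers)
	for _, s := range starts {
		if s%alignment != 0 || seen[s] {
			panic("misaligned or duplicate start offset")
		}
		seen[s] = true
	}
	fmt.Println("distinct aligned reservations:", len(seen))
}
```

A CAS loop is only needed when the new value depends on a computation that cannot be expressed as a fixed increment; here the increment is known before touching the shared offset.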

frac/active.go Outdated
Comment on lines 92 to 111
legacyMetaFileName := baseFileName + consts.MetaFileSuffix
if _, err := os.Stat(legacyMetaFileName); err == nil {
	// .meta file exists
	metaFile, metaStats = mustOpenFile(legacyMetaFileName, config.SkipFsync)
	metaSize = uint64(metaStats.Size())
	metaReader = storage.NewDocBlocksReader(readLimiter, metaFile)
	writer = NewActiveWriterLegacy(docsFile, metaFile, docsStats.Size(), metaStats.Size(), config.SkipFsync)
	logger.Info("using legacy meta file format", zap.String("fraction", baseFileName))
} else {
	logger.Info("using new WAL format", zap.String("fraction", baseFileName))
	walFileName := baseFileName + consts.WalFileSuffix
	metaFile, metaStats = mustOpenFile(walFileName, config.SkipFsync)
	metaSize = uint64(metaStats.Size())
	writer = NewActiveWriter(docsFile, metaFile, docsStats.Size(), metaStats.Size(), config.SkipFsync)
	var err error
	walReader, err = storage.NewWalReader(readLimiter, metaFile, baseFileName)
	if err != nil {
		logger.Fatal("failed to initialize WAL reader", zap.String("fraction", baseFileName), zap.Error(err))
	}
}
Member

Maybe we can move this to separate method?

Member Author

done

// WalWriter writes MetaBlocks to a WAL file with header and 64-byte alignment.
// Format: [Header 5B] [... -> align to 64] [MetaBlock] [... -> align to 64] [MetaBlock] ...
type WalWriter struct {
ws WriteSyncer
Member

Why not reuse frac.FileWriter here? Yeah, I've noticed that in FileWriter there is a conversion from new WAL format to old Meta format but I guess it can be done somewhere else...

I do not like the duplication of fsync-coalescing logic here.

Member Author

That's a fair point. I redesigned it a bit: FileWriter (which should probably be named SyncFileWriter) is no longer aware of any block formats, while doc => wal conversions happen in a separate wrapper, LegacyMetaWriter.

WalWriter now also works on top of FileWriter, which means it now reserves space from FileWriter too, since offset is owned by FileWriter.

Both WalWriter and LegacyMetaWriter implement a trait MetaWriter.

I also moved FileWriter to the storage package to prevent a circular dependency.

Member

dkharms commented Feb 20, 2026

> Both WalWriter and LegacyMetaWriter implement a trait

I guess, you've confused languages a little bit 🤣
Overall, looks great.
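
The layering described in this thread could look roughly like the sketch below. Only the type names come from the conversation; every method signature is an assumption, `FileWriter` is replaced by an in-memory toy without fsync coalescing, the legacy format conversion is elided, and the WAL file header is ignored.

```go
package main

import (
	"bytes"
	"fmt"
)

const alignment = 64

// alignUp rounds n up to the next multiple of alignment.
func alignUp(n int) int { return (n + alignment - 1) / alignment * alignment }

// FileWriter owns the file offset and knows nothing about block formats.
// This toy version appends to a buffer instead of a file.
type FileWriter struct{ buf bytes.Buffer }

// Write appends p and returns the offset at which it was stored.
func (f *FileWriter) Write(p []byte) (int64, error) {
	offset := int64(f.buf.Len())
	_, err := f.buf.Write(p)
	return offset, err
}

// MetaWriter is the Go interface both writers satisfy (hypothetical shape).
type MetaWriter interface {
	WriteMeta(block []byte) (offset int64, err error)
}

// WalWriter pads every block to the alignment before handing it down.
type WalWriter struct{ fw *FileWriter }

func (w *WalWriter) WriteMeta(block []byte) (int64, error) {
	padded := make([]byte, alignUp(len(block)))
	copy(padded, block)
	return w.fw.Write(padded)
}

// LegacyMetaWriter would convert the block to the legacy format first;
// the conversion is elided here and the bytes are passed through.
type LegacyMetaWriter struct{ fw *FileWriter }

func (w *LegacyMetaWriter) WriteMeta(block []byte) (int64, error) {
	return w.fw.Write(block)
}

// Compile-time checks that both types implement MetaWriter.
var (
	_ MetaWriter = (*WalWriter)(nil)
	_ MetaWriter = (*LegacyMetaWriter)(nil)
)

func main() {
	for _, w := range []MetaWriter{&WalWriter{fw: &FileWriter{}}, &LegacyMetaWriter{fw: &FileWriter{}}} {
		first, _ := w.WriteMeta(make([]byte, 27))
		second, _ := w.WriteMeta(make([]byte, 27))
		fmt.Printf("%T: %d, %d\n", w, first, second)
	}
}
```

Keeping the offset inside `FileWriter` means the sync logic exists in one place, and each format-specific wrapper only decides what bytes to hand over.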

@github-actions
Contributor

🔴 Performance Degradation

Some benchmarks have degraded compared to the previous run.
Click on Show table button to see full list of degraded benchmarks.

| Name | Previous | Current | Ratio | Verdict |
|---|---|---|---|---|
| AggDeep/size=1000-4 | 8aefcc | 717c8a | | |
| | 4817.00 ns/op | 5643.00 ns/op | 1.17 | 🔴 |
| AggDeep/size=1000000-4 | 8aefcc | 717c8a | | |
| | 4837044.00 ns/op | 5763770.00 ns/op | 1.19 | 🔴 |
| AggWide/size=1000-4 | 8aefcc | 717c8a | | |
| | 4804.00 ns/op | 5623.00 ns/op | 1.17 | 🔴 |
| AggWide/size=1000000-4 | 8aefcc | 717c8a | | |
| | 516.00 B/op | 599.00 B/op | 1.16 | 🔴 |
| | 5419714.00 ns/op | 6368210.00 ns/op | 1.18 | 🔴 |
| FindSequence_Random/medium-4 | 8aefcc | 717c8a | | |
| | 10747.09 MB/s | 9459.73 MB/s | 0.88 | 🔴 |
| FindSequence_Random/small-4 | 8aefcc | 717c8a | | |
| | 6638.44 MB/s | 5961.93 MB/s | 0.90 | 🔴 |

@github-actions
Contributor

🔴 Performance Degradation

Some benchmarks have degraded compared to the previous run.
Click on Show table button to see full list of degraded benchmarks.

| Name | Previous | Current | Ratio | Verdict |
|---|---|---|---|---|
| AggWide/size=10000-4 | 8aefcc | 14c5ee | | |
| | 47761.00 ns/op | 54871.00 ns/op | 1.15 | 🔴 |
| FindSequence_Random/small-4 | 8aefcc | 14c5ee | | |
| | 6638.44 MB/s | 5571.79 MB/s | 0.84 | 🔴 |

@cheb0 requested a review from dkharms February 20, 2026 16:30

@cheb0
Member Author

cheb0 commented Feb 23, 2026

@seqbenchbot up main bulk

@seqbenchbot

seqbenchbot commented Feb 23, 2026

Nice, @cheb0 <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - b9ff764f.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@cheb0
Member Author

cheb0 commented Feb 23, 2026

@seqbenchbot down b9ff764f

@seqbenchbot

seqbenchbot commented Feb 23, 2026

Nice, @cheb0 <(-^,^-)=b!

The benchmark with identificator b9ff764f was finished.
I've prepared a summary for you. Click on Show summary button to see it:

bulk

| Query Type | Metric | base | comp | diff |
|---|---|---|---|---|
| warm | mean (ms) | 39.66 | 39.60 | -0.16% |
| warm | stddev (ms) | 15.51 | 15.96 | +2.92% |
| warm | p(50) (ms) | 36.00 | 35.00 | -2.78% |
| warm | p(95) (ms) | 67.00 | 68.00 | +1.49% |
| warm | p(99) (ms) | 105.50 | 104.00 | -1.42% |
| warm | iterations | 31889.00 | 31959.00 | +0.22% |

Have a great time!


Labels

feature New feature or request stability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make WAL resilient to hardware crash

4 participants
