Reprint Notice: This article is reprinted from https://github.com/lance-format/lance/blob/main/docs/src/format/table/index.md. Original title: Lance Table Format. Original language: English. This post was published as a Chinese translation, and the links to the layered specifications have been rewritten to this site's reserved article links (aligned with the /posts/lance-format-table-format-specification link above).

Lance Table Format Overview

The Lance table format organizes a dataset as a versioned collection of fragments and indices. Each version is described by an immutable manifest file that references data files, deletion files, transaction files, and indices. The format supports ACID transactions, schema evolution, and efficient incremental updates through multi-version concurrency control (MVCC).

Manifest Overview

A manifest describes a single version of the dataset. It contains the complete schema definition (including nested fields), the list of data fragments that make up this version, a monotonically increasing version number, and an optional reference to an index section that describes a set of index metadata.

    message Manifest {
      // All fields of the dataset, including the nested fields.
      repeated lance.file.Field fields = 1;

      // Schema metadata.
      map<string, bytes> schema_metadata = 5;

      // Fragments of the dataset.
      repeated DataFragment fragments = 2;

      // Snapshot version number.
      uint64 version = 3;

      // The file position of the version auxiliary data.
      //  * It is not inheritable between versions.
      //  * It is not loaded by default during query.
      uint64 version_aux_data = 4;

      message WriterVersion {
        // The name of the library that created this file.
        string library = 1;

        // The version of the library that created this file. Because we cannot assume
        // that the library is semantically versioned, this is a string. However, if it
        // is semantically versioned, it should be a valid semver string without any 'v'
        // prefix. For example: `2.0.0`, `2.0.0-rc.1`.
        //
        // For forward compatibility with older readers, when writing new manifests this
        // field should contain only the core version (major.minor.patch) without any
        // prerelease or build metadata. The prerelease/build info should be stored in
        // the separate prerelease and build_metadata fields instead.
        string version = 2;

        // Optional semver prerelease identifier.
        //
        // This field stores the prerelease portion of a semantic version separately
        // from the core version number. For example, if the full version is "2.0.0-rc.1",
        // the version field would contain "2.0.0" and prerelease would contain "rc.1".
        //
        // This separation ensures forward compatibility: older…