
Adds Docker Image v1 Spec Documentation #9560

Merged
merged 1 commit into from Jan 12, 2015

jlhawn
Contributor

@jlhawn jlhawn commented Dec 8, 2014

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)

@jlhawn
Contributor Author

jlhawn commented Dec 8, 2014

From issue #9538:

If you were to complain that Docker's image format and runtime specification, as massively adopted as it is, is not appropriately documented, and it could be made easier to produce alternate implementations - then I would completely agree with you. In response, I would encourage the project maintainers to improve the specs documentation based on your suggestions.

Well, here it is 🐋

@SvenDowideit @fredlf Please review and give feedback.

@jlhawn jlhawn force-pushed the image_spec branch 3 times, most recently from 3997360 to 82f5917 December 8, 2014 08:31
@jessfraz
Contributor

jessfraz commented Dec 8, 2014

😍

@jlhawn
Contributor Author

jlhawn commented Dec 8, 2014

also @vbatts @dmp42 @crosbymichael @tianon @jfrazelle @unclejack @docker/distribution-trust @nathanleclaire @cpuguy83 @huslage and anyone else in the community that comes across this - please read through it and comment on anything that isn't clear or anything that requires more explanation, keeping in mind that this is not a new specification but is only documentation of how images are currently created/formatted in Docker.

I was thinking we could also generate a list of 'issues' with this specification to include at the bottom - something that could help us drive design of the next major version of the specification. Here are a few things I can think of for example:

- image IDs are an implementation detail of storage drivers in Docker and shouldn't be part of the specification.
- there is extraneous or useless info in some of the fields:
    - container id?
    - config *and* containerConfig? containerConfig seems to be only useful for the build system and nothing else.
- why is OnBuild in the runConfig and not top-level in the image JSON?
- is every field of the `runconfig.Config` struct necessary or useful?
- etc...

@jlhawn
Contributor Author

jlhawn commented Dec 8, 2014

also @metalivedev ;-)


The execution parameters which should be used as a base when running a container using the image.

<h4>Container RunConfig Field Descriptions</h4>
Contributor

This container config has been a strong point of confusion for me and several others. As far as I can tell:

  • This provides default values for settings that are not specified at run time (e.g., CpuShares)
  • Some of these settings are completely ignored (e.g., Tty, Attach*, ...)
  • As far as I understand, this whole idea of a "default container config" is not part of the v2 image format, so although this is purely v1 documentation, don't you think it might be relevant to add a "deprecation warning"?

Contributor Author

good points @icecrime

I mentioned above:

  • is every field of the runconfig.Config struct necessary or useful?

I can update this document to clarify this - only I'm not entirely sure which fields are ignored. I guess I can dig up the code to find out exactly what's going on: https://github.com/docker/docker/blob/58ce0146e16e2e63b7a94d34a48722a9c7400c18/daemon/daemon.go#L418

Do you happen to know which fields are used? @erikh I think you have some expertise with runconfig, could you shed any light on this?

Contributor

I think it's all in runconfig.Merge. Fields used:

  • Cmd
  • CpuShares
  • Entrypoint
  • Env
  • ExposedPorts (and its legacy counterpart PortSpecs)
  • Memory
  • MemorySwap
  • User
  • Volumes
  • WorkingDir
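To make the list above concrete, here is a minimal Go sketch of how a container's run-time settings might fall back to the image's defaults for those fields. It is not Docker's `runconfig.Merge`: the `Config` type, `mergeDefaults`, and the rules it applies (a zero value means "not given at run time", Env entries are merged by variable name with the run-time value winning, ExposedPorts and Volumes are unioned, the image's Cmd is only inherited when neither Cmd nor Entrypoint is given) are assumptions made for illustration. The legacy PortSpecs field is omitted.

```go
package main

import "strings"

// Config is a hypothetical subset of the v1 image "config" object, limited to
// the fields listed above; it is NOT Docker's runconfig.Config type.
type Config struct {
	User         string
	WorkingDir   string
	CpuShares    int64
	Memory       int64
	MemorySwap   int64
	Cmd          []string
	Entrypoint   []string
	Env          []string
	ExposedPorts map[string]struct{}
	Volumes      map[string]struct{}
}

// envName returns the variable name of a "NAME=value" entry.
func envName(entry string) string {
	name, _, _ := strings.Cut(entry, "=")
	return name
}

// union returns a new set containing the keys of both a and b.
func union(a, b map[string]struct{}) map[string]struct{} {
	out := make(map[string]struct{}, len(a)+len(b))
	for k := range a {
		out[k] = struct{}{}
	}
	for k := range b {
		out[k] = struct{}{}
	}
	return out
}

// mergeDefaults fills unset fields of user with the image's defaults.
// Assumption: a zero value in user means "not specified at run time".
func mergeDefaults(user *Config, image Config) {
	if user.User == "" {
		user.User = image.User
	}
	if user.WorkingDir == "" {
		user.WorkingDir = image.WorkingDir
	}
	if user.CpuShares == 0 {
		user.CpuShares = image.CpuShares
	}
	if user.Memory == 0 {
		user.Memory = image.Memory
	}
	if user.MemorySwap == 0 {
		user.MemorySwap = image.MemorySwap
	}
	// Inherit the image's Entrypoint only if none was given at run time;
	// inherit its Cmd only if neither Entrypoint nor Cmd was given.
	if len(user.Entrypoint) == 0 {
		if len(user.Cmd) == 0 {
			user.Cmd = image.Cmd
		}
		user.Entrypoint = image.Entrypoint
	}
	// Keep each image Env entry unless the user set the same variable name.
	set := make(map[string]bool, len(user.Env))
	for _, e := range user.Env {
		set[envName(e)] = true
	}
	for _, e := range image.Env {
		if !set[envName(e)] {
			user.Env = append(user.Env, e)
		}
	}
	user.ExposedPorts = union(user.ExposedPorts, image.ExposedPorts)
	user.Volumes = union(user.Volumes, image.Volumes)
}
```

For example, a container started with only `Cmd: ["sh"]` and `Env: ["A=1"]` keeps its own Cmd and A, but picks up the image's User, Entrypoint, any other Env variables, and Volumes.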

```
Deleted: /etc/my-app-config
```

It then creates a Tar Archive which contains *only* this changeset: The added and modified files in their entirety, and for each deleted item it creates an entry for an empty file at the same location but prefixes the basename of the file with `.wh.`. These `.wh.` prefixed files are known as whiteout files. The resulting Tar archive for `f60c56784b83` has the following entries:
Contributor

Suggested: "The filenames prefixed with .wh. are known as "whiteout" files."

Is the name the only indication of the special nature of these files? That is, if I had a file named .wh.somename actually in my tree, would the file be unpacked to the layer? I'm kind of hoping there is some permissions bit or something set that together with the name means it is a special file.

@jamescarr
Contributor

👍 this is great to see!

@jlhawn jlhawn force-pushed the image_spec branch 2 times, most recently from fcaaef4 to e073386 December 10, 2014 00:43
@jlhawn
Contributor Author

jlhawn commented Dec 10, 2014

I've just pushed a major update to the draft spec. Please review again if you already have!

Layer
</dt>
<dd>
Refers to either one or both of the JSON metadata and filesystem changes for a single link in a chain of layers that make up a complete image. To refer to either specifically, one may use the terms `Image/Layer JSON` or `Image/Layer Metadata` to refer to its JSON metadata and `Image/Layer Filesystem Changeset` or `Image/Layer Diff` to refer to the set of filesystem changes.
Contributor

I find this pretty hard to understand.

Contributor Author

Do you think it'd be okay to just delete the second sentence of this paragraph? I realize I probably went a little crazy in the second sentence... we really should agree on some common terminology though. It's a bit confusing to have a single term used loosely to refer to multiple things :(

Contributor

Do you think we could force a definition and make sure we use that everywhere?

Contributor Author

I'm okay with the first sentence definition if everyone else is.

Contributor

I would phrase it something like this:

Images are composed of "layers". "Image layer" is a general term which may be used to refer to one or both of the following:

  1. The metadata for the layer, described in the JSON format
  2. The filesystem changes described by a layer

To refer to the former specifically, the terms "Layer JSON" or "Layer Metadata" are frequently used.

To refer to the latter, the terms "Image Filesystem Changeset" or "Image Diff" are frequently used.

WDYT?
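Whichever terms are settled on, each layer's JSON names its parent, so a complete image is the chain of layers from a given top layer down to a layer with no parent. A minimal Go sketch of walking that chain to get the order in which filesystem changesets would be applied (base layer first); it reads only the `id` and `parent` keys, and `applyOrder` and the in-memory `byID` map are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// layerJSON holds only the two fields needed to walk the chain; real layer
// JSON carries more metadata.
type layerJSON struct {
	ID     string `json:"id"`
	Parent string `json:"parent,omitempty"`
}

// applyOrder returns layer IDs from the base layer up to top, i.e. the order
// in which their changesets would be applied. byID maps an ID to raw JSON.
func applyOrder(top string, byID map[string][]byte) ([]string, error) {
	var chain []string
	seen := map[string]bool{}
	for id := top; id != ""; {
		if seen[id] {
			return nil, fmt.Errorf("cycle at layer %s", id)
		}
		seen[id] = true
		raw, ok := byID[id]
		if !ok {
			return nil, fmt.Errorf("missing layer JSON for %s", id)
		}
		var l layerJSON
		if err := json.Unmarshal(raw, &l); err != nil {
			return nil, err
		}
		chain = append(chain, id)
		id = l.Parent // an empty parent ends the walk at the base layer
	}
	// Reverse so that the base layer comes first.
	for i, j := 0, len(chain)-1; i < j; i, j = i+1, j-1 {
		chain[i], chain[j] = chain[j], chain[i]
	}
	return chain, nil
}
```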

Contributor Author

sounds good to me, @nathanleclaire. I'll update it.

Image ID
</dt>
<dd>
The randomly generated ID given to an image or image layer upon its creation. It is represented as a hexadecimal encoding of 256 bits, e.g., `a9561eb1b190625c9adb5a9513e72c4dedafc1cb2d4c5236c9a6957ec7dfd5a9`.
Contributor

"image or image layer" -- this seems odd. Do images and image layers have IDs in different namespaces?

Need the image ID necessarily be random, or is the assertion simply that it need not have semantic meaning?

If random, must the ID be from a CSPRNG?
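One way to satisfy either reading of these questions is to draw the 256 bits from a CSPRNG and hex-encode them. A minimal Go sketch using `crypto/rand`, plus a shape check against the 64-character lowercase form of the example ID; `newImageID` and `isValidImageID` are hypothetical helpers, and requiring lowercase is an assumption based on the example.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"regexp"
)

// validID matches a 256-bit ID rendered as 64 lowercase hexadecimal characters.
var validID = regexp.MustCompile(`^[0-9a-f]{64}$`)

// newImageID returns 32 bytes (256 bits) from crypto/rand, hex-encoded.
func newImageID() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// isValidImageID reports whether s has the shape of a v1 image ID.
func isValidImageID(s string) bool {
	return validID.MatchString(s)
}
```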

@nathanleclaire
Contributor

cc @jamtur01 would be great to get your input on this

Image Filesystem Changeset
</dt>
<dd>
An archive of the new or changed files and directories in a layer of an image. This archive also contains special "whiteout" files, which have names beginning with `.wh.` and indicate that the corresponding file or directory has been deleted from its parent image's filesystem. These archives can be made trivially by a layer-based/union filesystem such as AUFS or OverlayFS or by computing the diff of two directories (one corresponding to a snapshot of the parent image's filesystem and the other the current image's filesystem).
Contributor

It seems like any description of whiteout files and their semantics is going to #include the implementation details of a specific version of aufs, with specific config/compilation flags, along with the flags Docker invokes it with. For example, this paragraph from the aufs documentation:

The whiteout is for hiding files on lower branches. Also it is applied to stop readdir going lower branches. The latter case is called ’opaque directory.’ Any whiteout is an empty file, it means whiteout is just an mark. In the case of hiding lower files, the name of whiteout is ’.wh.<filename>.’ And in the case of stopping readdir, the name is ’.wh..wh..opq’ or ’.wh.__dir_opaque.’ The name depends upon your compile configuration CONFIG_AUFS_COMPAT. All whiteouts are hardlinked, including ’/.wh..wh.aufs.’
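To show how much of this ends up in extraction logic, here is a minimal Go sketch that classifies tar entry names when a changeset is applied: `.wh.<name>` deletes `<name>`, the opaque marker clears the directory's lower-layer contents, and other `.wh..wh.` aufs metadata entries are skipped. It assumes the `.wh..wh..opq` spelling of the opaque marker (the quoted documentation says this depends on CONFIG_AUFS_COMPAT); `classify` and the `Action` values are hypothetical and do not describe what Docker does with opaque markers.

```go
package main

import (
	"path"
	"strings"
)

const whiteoutPrefix = ".wh."

// opaqueMarker is ASSUMED to be the aufs opaque-directory marker name.
const opaqueMarker = ".wh..wh..opq"

// Action describes what applying one changeset entry does.
type Action int

const (
	Extract  Action = iota // ordinary entry: write it into the root filesystem
	Delete                 // whiteout: remove the target path
	ClearDir               // opaque marker: remove the target directory's existing contents
	Skip                   // other ".wh..wh." aufs metadata: ignore
)

// classify maps a tar entry name to an action and the path it applies to.
func classify(name string) (Action, string) {
	dir, base := path.Split(name)
	switch {
	case base == opaqueMarker:
		return ClearDir, path.Clean(dir)
	case strings.HasPrefix(base, whiteoutPrefix+whiteoutPrefix):
		return Skip, ""
	case strings.HasPrefix(base, whiteoutPrefix):
		return Delete, path.Join(dir, strings.TrimPrefix(base, whiteoutPrefix))
	default:
		return Extract, name
	}
}
```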

Contributor

I'd rm the bit about whiteout files (cover it later) and phrase like:

An archive of the files which have been added, changed, or deleted in an image layer. Using a layer-based or union filesystem such as AUFS, or by computing the diff from filesystem snapshots, the filesystem changeset can be used to present a series of image layers as if it were one cohesive filesystem.

Env <code>array of strings</code>
</dt>
<dd>
Entries are in the format of <code>VARNAME="var value"</code>.
Contributor

Are these double-quotes normative?

Contributor Author

I probably shouldn't have included the quotes in this example. I believe the way this value should be interpreted is that the substring before the first = is the variable name and everything after is the value. In Docker, I think this is passed directly to the execution driver in this format. @crosbymichael could you clarify this for us please?
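Under that reading, here is a minimal Go sketch of splitting an Env entry at the first `=`, with any quotes kept as literal characters of the value. `parseEnvEntry` is a hypothetical helper, and treating an entry without `=` (or with an empty name) as an error is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEnvEntry splits a v1 "Env" entry at the first "=": everything before
// it is the variable name, everything after it (including further "=" signs
// or quote characters) is the value, taken literally.
func parseEnvEntry(entry string) (string, string, error) {
	name, value, ok := strings.Cut(entry, "=")
	if !ok || name == "" {
		return "", "", fmt.Errorf("malformed env entry %q", entry)
	}
	return name, value, nil
}
```

So `GREETING="hello world"` yields the name `GREETING` and the value `"hello world"` with the quotes intact, and `OPTS=a=b` yields `OPTS` and `a=b`.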

<dd>
The username or UID which the process in the container should
run as. This acts as a default value to use when the value is
not specified when creating a container.
Member

All the following are valid:

  • user
  • uid
  • user:group
  • uid:gid
  • uid:group
  • user:gid

If group/gid is not specified, the default group and supplementary groups of the given user/uid in /etc/passwd from the container are applied.
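Here is a minimal Go sketch that splits the User value into its user and optional group parts and records which parts are numeric IDs. The /etc/passwd and /etc/group lookups described above are left out. `UserSpec` and `parseUserSpec` are hypothetical, and treating any base-10 value that fits in 32 bits as an ID is an assumption.

```go
package main

import (
	"strconv"
	"strings"
)

// UserSpec is a hypothetical parsed form of the v1 "User" field.
type UserSpec struct {
	User      string // user name or numeric UID
	Group     string // group name or numeric GID; empty when not given
	UserIsID  bool   // User parsed as a base-10 number
	GroupIsID bool   // Group parsed as a base-10 number
}

// isNumericID reports whether s is an unsigned base-10 value that fits in 32 bits.
func isNumericID(s string) bool {
	_, err := strconv.ParseUint(s, 10, 32)
	return err == nil
}

// parseUserSpec handles user, uid, user:group, uid:gid, uid:group and user:gid.
func parseUserSpec(s string) UserSpec {
	user, group, _ := strings.Cut(s, ":")
	return UserSpec{
		User:      user,
		Group:     group,
		UserIsID:  isNumericID(user),
		GroupIsID: isNumericID(group),
	}
}
```

When `Group` comes back empty, an implementation would still need to resolve the user's default and supplementary groups from the container's /etc/passwd and /etc/group, as described above.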

Contributor Author

thanks @tianon !

@mmdriley
Contributor

mmdriley commented Jan 1, 2015

lgtm. No doubt there are still nits to be picked, but this is a great step forward in unambiguously-described behavior. Thanks for your time and effort in compiling it.

@crosbymichael
Contributor

@jlhawn Any more edits remaining?

There is no reason for shykes to look at this if you are just documenting the reality of the current system.

@shykes
Contributor

shykes commented Jan 5, 2015

Correct, if we're documenting today's design don't feel obligated to wait for my +1

@jlhawn
Contributor Author

jlhawn commented Jan 5, 2015

@crosbymichael I think I just need to add in the User field description from @tianon and we're set!

@bfirsh
Contributor

bfirsh commented Jan 12, 2015

This is brilliant. Thanks @jlhawn. I'm really glad we're taking steps towards specifying how Docker works.

Perhaps we could get this in for Docker 1.5 and shout about it a bit. ^_^

@jessfraz
Contributor

I don't know what we are waiting on @jlhawn

@jessfraz
Contributor

Maybe just a squash of commits?

Many iterations have gone into documenting a v1 specification of Docker's Image
format.

v1 Image spec: clarify parent field

- metalivedev pointed out that the description was ambiguous, so I've removed
  mention that it was randomly generated. It IS the ID of the parent image.

Updated v1 image specification documentation

- More complete details and deprecation notifications for each field
  in the JSON metadata of an image.
- Details on the format for packaging combined Image JSON + Filesystem
  Changeset archives for all layers of an image.

Clarify description of an image "Layer" in v1 spec

Updated intro of image v1 spec

Updated image v1 spec after more review

- Removed description of "Image" from the terminology section. The entire
  document is meant to serve this purpose.
- Updated the definition of "Image Filesystem Changeset".
- Clarified the level of randomness needed for generating image IDs.
- Updated the description of "Image Checksum".
- Added term descriptions for "Repository" and "Tag"
- Removed extraneous/implementation-specific fields from the Image JSON
  example file and field descriptions:
  - removed "container_config" and "docker_version" fields.
  - Added missing "author" field example and description.
- Removed extraneous/implementation-specific fields from the "config" struct
  example and description:
  - removed "Hostname", "Domainname", "Cpuset", "AttachStdin", "AttachStdout",
    "AttachStderr", "PortSpecs", "Tty", "OpenStdin", "StdinOnce", "Image",
    "NetworkDisabled", and "OnBuild".
- Updated example Image JSON config with better example values for "Env",
  "Cmd", "Volumes", "WorkingDir", "Entrypoint", "CpuShares", "Memory",
  "MemorySwap", and "User".
- Added notices that any fields not specified are to be considered as
  implementation specific and should be ignored by implementations which
  are unable to interpret them.
- Updated example of creating layer filesystem changesets to use less formal
  language.
- Listed more details in the section regarding extraction of a bundle of image
  layers into the root filesystem of a container.
- Updated the closing mention of Docker as an evolving implementation.

More updates to the v1 image spec

- Added line wrapping after 80 columns per line to adhere to documentation
  style guides, as pointed out by @jamtur01

- Removed references to any specific docker commands, updated a few descriptions
  or dropped repeated statements, as pointed out by @cpuguy83

Cleanup image v1 spec draft after fredlf comments

Address comments by mmdriley on v1 image spec

Improve description of image v1 spec `config.User`

- Improves description of image v1 specification for the 'User' runtime
  parameter after recommendations by tianon.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
@jlhawn
Contributor Author

jlhawn commented Jan 12, 2015

@jfrazelle All squashed!

jessfraz pushed a commit that referenced this pull request Jan 12, 2015
Adds Docker Image v1 Spec Documentation
@jessfraz jessfraz merged commit 0192b6c into moby:master Jan 12, 2015
@jessfraz
Contributor

awesome!

@thaJeztah
Member

Thanks @jlhawn!

@odino

odino commented Jan 15, 2015

I was talking to @shykes about this at DockerCon in Amsterdam... great job guys!
