Roadmap

DataLad started out as a rather monolithic code base that mixed a Python library, a Python API geared towards interactive use, and a command line interface (CLI). The general development trajectory is to disentangle this code and form a more modular, layered software system that comprises:

  • dedicated applications providing a CLI, a graphical user interface (GUI), and a Python-based command API for interactive use and scripting
  • a collection of topical extension packages
  • utility libraries for a DataLad framework of closely aligned implementations
  • another utility library with generic algorithms and implementations, not considered to be part of the DataLad framework

The diagram below depicts the envisioned relationships and dependencies between these components (solid arrows indicate dependencies, dashed arrows indicate optional usage).

graph LR;
  subgraph "Non-framework<br>utility libraries"
      salad("datasalad")
  end
  subgraph DataLad framework
      core("datalad-core")
      next("datalad-next")
  end
  subgraph "DataLad<br>applications"
      dlcmd("dlcmd (CLI)")
      gooey("gooey (GUI)")
      py("Python<br>Command API")
  end
  subgraph "(3rd-party)<br>Extension packages"
      extension("datalad-${extension}")
  end
  salad ---> core
  salad ---> next
  salad ---> extension
  core --> next
  core --> extension
  next -.-> extension
  core --> dlcmd
  next -.-> dlcmd
  extension -.-> dlcmd
  core --> gooey
  next -.-> gooey
  extension -.-> gooey
  core --> py
  next -.-> py
  extension -.-> py
  %% node links to websites
  click salad href "https://github.com/datalad/datasalad"
  click core href "https://github.com/datalad/datalad-core"
  click next href "https://github.com/datalad/datalad-next"
  click dlcmd href "/dev/dlcmd/"
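To make the layering in the diagram concrete, here is a minimal sketch that encodes its edges as plain Python data and derives a valid build/install order with the standard library's `graphlib`. The component names mirror the diagram's node labels; "python-api" is a hypothetical label for the "Python Command API" node, and treating optional (dashed) edges as ordering constraints is an assumption of this sketch, not a stated project rule.

```python
from graphlib import TopologicalSorter

# Component -> components it requires (solid arrows in the diagram).
REQUIRED = {
    "datasalad": set(),
    "datalad-core": {"datasalad"},
    "datalad-next": {"datasalad", "datalad-core"},
    "datalad-${extension}": {"datasalad", "datalad-core"},
    "dlcmd": {"datalad-core"},
    "gooey": {"datalad-core"},
    "python-api": {"datalad-core"},  # hypothetical label for the Python Command API
}

# Component -> components it may use optionally (dashed arrows).
OPTIONAL = {
    "datalad-${extension}": {"datalad-next"},
    "dlcmd": {"datalad-next", "datalad-${extension}"},
    "gooey": {"datalad-next", "datalad-${extension}"},
    "python-api": {"datalad-next", "datalad-${extension}"},
}

NON_FRAMEWORK = {"datasalad"}


def build_order() -> list:
    """Return components ordered so that every dependency comes first.

    Required and optional edges are combined (an assumption of this sketch);
    TopologicalSorter raises graphlib.CycleError if the layering had a cycle.
    """
    graph = {name: REQUIRED[name] | OPTIONAL.get(name, set()) for name in REQUIRED}
    return list(TopologicalSorter(graph).static_order())


def check_non_framework() -> None:
    """Non-framework libraries must not depend on any other component."""
    for name in NON_FRAMEWORK:
        deps = REQUIRED[name] | OPTIONAL.get(name, set())
        assert not deps, f"{name} must not depend on {sorted(deps)}"
```

Because datasalad is the only node without predecessors, it always comes first in the computed order, and datalad-core always precedes datalad-next.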

Targeted components

Non-framework utility libraries

Such libraries hold implementations that are developed by and for the DataLad project, but are so generic that they are not considered part of the DataLad framework. This means:

  • no DataLad jargon in messages
  • no dependencies on other DataLad components
  • no use of DataLad facilities (e.g., recognition of DataLad-specific configuration)

A concrete library example is datasalad, which provides tooling to work with subprocesses.

DataLad framework libraries

These libraries provide everything necessary to implement DataLad commands and have them work in a uniform fashion. This includes aspects like configuration management, particular workflows (e.g., credential input and storage), and working with git(-annex) repositories in a particular “DataLad way”.

We distinguish two different libraries: core and next.

The core library provides the essential set of DataLad functionality that is applicable to the widest range of use cases. It aims for a lean dependency footprint so that DataLad can be deployed in as many environments as possible. The current development state is available at https://github.com/datalad/datalad-core.

The next library has the same purpose and scope as the core library. However, it serves as a staging area that makes new and improved implementations available before they may migrate into the core library. While core evolves at a comparatively slow pace, next is expected to have much more frequent feature releases. The current development state is available at https://github.com/datalad/datalad-next.

Topical DataLad extension packages can use both libraries to implement their functionality.
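In the diagram, an extension depends on core but uses next only optionally (dashed arrow). One common Python pattern for such optional usage is to probe for a module without importing it. The sketch below assumes that datalad-next is importable as `datalad_next`; `describe_backend()` is a hypothetical placeholder for wherever an extension would choose between implementations.

```python
import importlib.util


def optional_module_available(module_name: str) -> bool:
    """Return True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the datalad-next distribution provides the module `datalad_next`.
HAVE_NEXT = optional_module_available("datalad_next")


def describe_backend() -> str:
    # Hypothetical branch point: prefer a newer implementation from `next`
    # when it is installed, otherwise fall back to the one from `core`.
    return "next" if HAVE_NEXT else "core"
```

Probing with `find_spec` keeps the extension importable when next is absent, which matches the "optional usage" relationship in the diagram.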

User interfaces

The libraries are accompanied by applications that provide concrete user interfaces. These (can) include:

  • command line interface (CLI)
  • graphical user interface (GUI)
  • language-bindings or scripting interfaces

Such interface applications could be lean (only proxying library functionality), or heavily tailored for a specific purpose. There is no assumption of exclusivity. For example, there can be any number of CLI implementations.
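A "lean" interface application does little more than translate user input into library calls. The following sketch shows such a proxying CLI built with the standard library's `argparse`. The program name `dl-sketch`, the `status` command, and the `status()` library function are all hypothetical stand-ins, not actual DataLad commands or APIs.

```python
import argparse
from typing import List, Optional


def status(path: str) -> dict:
    """Hypothetical library function standing in for framework functionality."""
    return {"path": path, "state": "clean"}


def format_status(result: dict) -> str:
    """Render a library result for terminal output (the CLI's only real job)."""
    return f"{result['path']}: {result['state']}"


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="dl-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    p_status = sub.add_parser("status", help="report dataset status")
    p_status.add_argument("path", nargs="?", default=".")
    args = parser.parse_args(argv)

    if args.command == "status":
        # Proxy straight to the library; no logic of its own beyond rendering.
        print(format_status(status(args.path)))
    return 0
```

A heavily tailored CLI would replace this thin mapping with its own workflow logic while calling the same libraries, and both could coexist.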

To cleanly separate the underlying requirements and dependencies, even a “Python command API” is distinguished from the framework libraries (which are also written in Python). Only the command API, not the framework libraries, will define aspects like uniform logging and messaging behavior.
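This split mirrors the standard Python logging convention: a library emits log records but never configures their output, and the application decides how they are displayed. The sketch below illustrates that convention; the logger name `sketch_lib` and the functions are hypothetical and do not reflect DataLad's actual logging setup.

```python
import logging

# --- library side (framework): emit records, never configure output ---
lib_log = logging.getLogger("sketch_lib")
lib_log.addHandler(logging.NullHandler())  # avoid "no handler" warnings


def do_work(n: int) -> int:
    lib_log.info("processing %d items", n)
    return n * 2


# --- application side (Python command API): owns uniform message format ---
def configure_api_logging(level: int = logging.INFO) -> logging.Handler:
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(name)s: %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
    return handler
```

Records from `sketch_lib` propagate to the root logger, so they take on whatever format the application configured, while the library itself stays free of presentation decisions.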

Topical extension packages

Extension packages extend DataLad with additional functionality. Many extensions are provided by the DataLad project, but extensions can also be implemented completely independently of the project. They require no approval and generally need no coordination with the DataLad project.

Any functionality that is out-of-scope for the DataLad framework libraries can be implemented in an extension package.

Extension development is facilitated by a project template at https://github.com/datalad/datalad-extension-template.

Examples of extension packages are