Subsections of Development
Roadmap
DataLad started out as a rather monolithic code base that mixed a Python library, a Python API geared towards interactive use, and a command line interface (CLI). The general development trajectory is to disentangle the code, and form a more modular, layered software system that comprises:
- dedicated applications providing a CLI, a graphical user interface (GUI), and a Python-based command API for interactive use and scripting
- a collection of topical extension packages
- utility libraries for a DataLad framework of closely aligned implementations
- another utility library with generic algorithms and implementations, not considered to be part of the DataLad framework
The schema below depicts the envisioned relationships and dependencies between these components (solid arrows indicate dependencies, dashed arrows indicate optional usage).
graph LR;
  subgraph "Non-framework<br>utility libraries"
    salad("datasalad")
  end
  subgraph DataLad framework
    core("datalad-core")
    next("datalad-next")
  end
  subgraph "DataLad<br>applications"
    dlcmd("dlcmd (CLI)")
    gooey("gooey (GUI)")
    py("Python<br>Command API")
  end
  subgraph "(3rd-party)<br>Extension packages"
    extension("datalad-${extension}")
  end
  salad ---> core
  salad ---> next
  salad ---> extension
  core --> next
  core --> extension
  next -.-> extension
  core --> dlcmd
  next -.-> dlcmd
  extension -.-> dlcmd
  core --> gooey
  next -.-> gooey
  extension -.-> gooey
  core --> py
  next -.-> py
  extension -.-> py
  %% node links to websites
  click salad href "https://github.com/datalad/datasalad"
  click core href "https://github.com/datalad/datalad-core"
  click next href "https://github.com/datalad/datalad-next"
  click dlcmd href "/dev/dlcmd/"
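The solid arrows of the schema can also be written down as data, which makes it easy to check that the layering is free of cycles. The following is a minimal sketch, assuming that each solid arrow points from a dependency to the component that requires it. The dictionary name, the function name, and the placeholder name `datalad-extension` are illustrative and not part of any DataLad package. Dashed (optional) arrows are left out on purpose.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each component maps to the components it requires (solid arrows only).
# Assumption: an arrow "a --> b" in the schema means "b depends on a".
REQUIRES = {
    "datasalad": set(),
    "datalad-core": {"datasalad"},
    "datalad-next": {"datasalad", "datalad-core"},
    "datalad-extension": {"datasalad", "datalad-core"},  # placeholder name
    "dlcmd": {"datalad-core"},
    "gooey": {"datalad-core"},
    "python-command-api": {"datalad-core"},
}


def layering_order(requires):
    """Return the components ordered so that dependencies come first.

    Raises graphlib.CycleError if the dependency graph contains a cycle.
    """
    return list(TopologicalSorter(requires).static_order())


order = layering_order(REQUIRES)
print(order)
```

Any order returned by `layering_order` lists `datasalad` before the framework libraries, and the framework libraries before the applications.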
Targeted components
Non-framework utility libraries
Such libraries hold implementations developed by the DataLad project and for the DataLad project that are nevertheless so generic that they are not considered to be part of the DataLad framework. This means:
- no DataLad jargon in messages
- no dependencies on other DataLad components
- no use of DataLad facilities (e.g., recognition of DataLad-specific configuration)
A concrete library example is datasalad, which provides tooling to work with subprocesses.
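To illustrate what "generic" means here, the following sketch shows a small subprocess helper that satisfies the three constraints above. It is not the datasalad API: it uses only the Python standard library, and the function name `iter_output_lines` is hypothetical.

```python
import subprocess
import sys


def iter_output_lines(cmd):
    """Yield the stdout lines of a command, one at a time.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    The helper uses no project-specific wording, configuration, or imports.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the `with` block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


# Usage: run a tiny Python child process and collect its output.
lines = list(iter_output_lines([sys.executable, "-c", "print('a'); print('b')"]))
print(lines)
```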
DataLad framework libraries
These libraries provide everything necessary to implement DataLad commands and have them work in a uniform fashion. This includes aspects like configuration management, particular workflows (e.g., credential input and storage), and working with git(-annex) repositories in a particular “DataLad way”.
We distinguish two different libraries: core and next.
The core library provides the essential set of DataLad functionality that is applicable to the widest range of use cases.
It aims to have a lean dependency footprint to enable deploying DataLad in a wide range of environments.
The current development state is available at https://github.com/datalad/datalad-core.
The next library serves the same purpose and scope as the core library.
It is, however, a staging area for making new and improved implementations available before they may migrate into the core library.
While core evolves at a comparatively slow pace, next is expected to have a much higher frequency of feature releases.
The current development state is available at https://github.com/datalad/datalad-next.
Topical DataLad extension packages can use both libraries to implement their functionality.
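Because next is an optional companion to core, an extension may want to check which of the two libraries is installed before choosing an implementation. The following is a minimal sketch of such a check. It assumes the import names `datalad_next` and `datalad_core`, and the helper names are hypothetical. Neither library prescribes this pattern.

```python
from importlib.util import find_spec

# Assumption: these are the import names of the two framework libraries,
# listed in order of preference (newer implementations first).
PREFERRED = ["datalad_next", "datalad_core"]


def available(module_names):
    """Return the subset of top-level module names that can be imported."""
    return [name for name in module_names if find_spec(name) is not None]


def pick_library(candidates=PREFERRED):
    """Return the first installed candidate, or None if none is installed."""
    found = available(candidates)
    return found[0] if found else None


print(pick_library())
```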
User interfaces
The libraries are accompanied by applications that provide concrete user interfaces. These (can) include:
- command line interface (CLI)
- graphical user interface (GUI)
- language-bindings or scripting interfaces
Such interface applications could be lean (only proxying library functionality), or heavily tailored for a specific purpose. There is no assumption of exclusivity. For example, there can be any number of CLI implementations.
In order to cleanly separate the underlying requirements and dependencies, even a “Python command API” is distinguished from the framework libraries (also written in Python). Only the former will define aspects like a uniform logging/messaging behavior.
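The separation can be pictured as a thin wrapper: the library function only computes and raises, while the command API layer decides how outcomes are logged and reported. The following is a minimal sketch of that idea. All names (`library_status`, `command`, the logger name, the result fields) are hypothetical and do not describe the actual DataLad implementation.

```python
import logging
from functools import wraps

lgr = logging.getLogger("command_api")  # hypothetical logger name


def library_status(path):
    """Stand-in for a framework library function: no logging, only data."""
    if not path:
        raise ValueError("path must not be empty")
    return {"path": path, "state": "clean"}


def command(func):
    """Wrap a library function with uniform logging and result reporting."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            # The interface layer, not the library, decides how to report.
            lgr.error("%s failed: %s", func.__name__, exc)
            return {"status": "error", "message": str(exc)}
        lgr.info("%s completed", func.__name__)
        return {"status": "ok", **result}

    return wrapper


status = command(library_status)
print(status("my-dataset"))
print(status(""))
```

With this split, a different interface (for example a GUI) could wrap the same library function and present errors in its own way.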
Topical extension packages
Extension packages extend DataLad with additional functionality. Many extensions are provided by the DataLad project, but they can be implemented completely independently of the project: they require no approval and generally need no coordination with the DataLad project.
Any functionality that is out-of-scope for the DataLad framework libraries can be implemented in an extension package.
Extension development is facilitated by a project template at https://github.com/datalad/datalad-extension-template.
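Installed extensions can be discovered through Python packaging metadata. The sketch below lists entry points in the `datalad.extensions` group, which is the group used by extensions for the existing DataLad package. Whether the layered system described here keeps this mechanism is an assumption. The function name `list_extensions` is hypothetical.

```python
from importlib.metadata import entry_points


def list_extensions(group="datalad.extensions"):
    """Return the sorted names of entry points registered in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


# Prints an empty list if no extension is installed.
print(list_extensions())
```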
Examples of extension packages are
Continuous integration
Appveyor
We have a paid subscription to the Appveyor CI/CD service. It is administered by mih.
Logs for projects can be found at URLs of the pattern https://ci.appveyor.com/project/mih/<project-name>.
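For scripts that need to link to these logs, the URL pattern can be wrapped in a small helper. This is a sketch only: the function name is hypothetical, and `datalad-next` is used merely as an example project name.

```python
from urllib.parse import quote

APPVEYOR_BASE = "https://ci.appveyor.com/project/mih"


def appveyor_log_url(project_name):
    """Build the Appveyor log URL for a project name."""
    # Percent-encode the name so that unusual characters cannot break the URL.
    return f"{APPVEYOR_BASE}/{quote(project_name, safe='')}"


print(appveyor_log_url("datalad-next"))
```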
GitHub actions
Forgejo actions
The hub is set up to run Forgejo actions. These are, to some degree, compatible with GitHub actions. This means that Forgejo can (attempt to) run GitHub actions, but not the other way round.
We operate runners on the following machines:
dlcmd (CLI)
dlcmd is a command line interface (CLI) for DataLad that aims to provide a modern and convenient approach to using DataLad in a terminal.
DataLad functionality provided via dlcmd is separated into two different categories:
- tailored, stable commands for a finite set of features
- auto-generated interfaces for any DataLad command available in an installation
For the second category, dlcmd provides no guarantees regarding API stability or the accessibility of particular functionality in the terminal.
The first category, however, comprises individually tuned and documented commands that are specifically tailored and integrated for their joint use in a terminal.
Here, dlcmd does not serve as a thin layer between the terminal and a Python implementation of a command, but as a fully featured application with consistent (error) messaging and behavior.
These implementations are individually tested to work via the CLI.
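The second category, auto-generated interfaces, can be pictured as deriving a command line parser from the signature of a Python function. The following is a minimal sketch of that general technique using `argparse` and `inspect`. It does not show how dlcmd is implemented, and the `greet` command and helper names are hypothetical.

```python
import argparse
import inspect


def greet(name, greeting="hello"):
    """Hypothetical command: return a greeting for a name."""
    return f"{greeting} {name}"


def build_parser(func):
    """Derive an argument parser from a function signature."""
    parser = argparse.ArgumentParser(
        prog=func.__name__, description=inspect.getdoc(func)
    )
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            # Parameters without a default become positional arguments.
            parser.add_argument(name)
        else:
            # Parameters with a default become optional --flags.
            parser.add_argument(f"--{name}", default=param.default)
    return parser


def run(func, argv):
    """Parse argv with the generated parser and call the function."""
    args = build_parser(func).parse_args(argv)
    return func(**vars(args))


print(run(greet, ["world"]))
print(run(greet, ["world", "--greeting", "hi"]))
```

A generated interface like this follows whatever the function signature happens to be, which is one reason why no stability guarantee can be given for it. A tailored command, in contrast, fixes its arguments and messages by hand.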
The development of dlcmd is presently conducted at https://hub.datalad.org/datalad/dlcmd