This project aims to show the limitations of using Docker containers as a reliable reproducibility tool.
In particular, because a \dfile\ typically relies on tools that are not themselves reproducible (\eg\ package managers or downloads from the internet), it is difficult to write a \dfile\ that will rebuild the \emph{exact} same software environment in the future.
In this project, we will collect research artifacts that contain \dfile s from various scientific conferences, rebuild them periodically, and observe how the resulting software environments vary over time.
\ecg\ is a Python script\ \cite{ecg_code} that takes as input a (verified) JSON representation of the Nickel description of an artifact, and then tries to build the \dfile\ contained in the artifact.
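As an illustration of the build step, the following sketch wraps \texttt{docker build} in Python and maps a failed build onto the error categories used in the build-status analysis (\texttt{baseimage\_unavailable}, \texttt{job\_time\_execeed}, \texttt{unknown\_error}). It is a minimal sketch under stated assumptions, not the actual \ecg\ code: the function names, the one-hour default timeout, and the log substrings used to recognise an unavailable base image are hypothetical, and the wording of Docker error messages varies between versions.

\begin{verbatim}
import subprocess

# Hypothetical log substrings suggesting that the base image could not be
# pulled; the actual Docker/BuildKit wording differs between versions.
BASE_IMAGE_MARKERS = ("manifest unknown", "pull access denied")


def classify_build_failure(log: str, timed_out: bool) -> str:
    """Map a failed build onto one of the error categories."""
    if timed_out:
        return "job_time_execeed"
    lowered = log.lower()
    if any(marker in lowered for marker in BASE_IMAGE_MARKERS):
        return "baseimage_unavailable"
    return "unknown_error"


def build_dockerfile(context_dir: str, dockerfile: str, tag: str,
                     timeout_s: float = 3600.0) -> str:
    """Run `docker build` and return "success" or an error category."""
    cmd = ["docker", "build", "-t", tag, "-f", dockerfile, context_dir]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the docker client once the timeout expires.
        return classify_build_failure("", timed_out=True)
    if result.returncode == 0:
        return "success"
    return classify_build_failure(result.stdout + result.stderr,
                                  timed_out=False)
\end{verbatim}

Keeping the classification in a separate function makes it testable without a Docker daemon and easy to extend when new failure patterns are observed.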
\item If the build is successful, gather information about the produced software environment (Sections \ref{sec:package_managers}, \ref{sec:git}, \ref{sec:misc}, and \ref{sec:pyenv})
In this case, once the container has been built successfully, \ecg\ logs into the container and extracts the commit hash of the repository (via \texttt{git log}).
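One possible way to perform this extraction is to run \texttt{git log} through \texttt{docker exec}. This assumes that the container is still running and that the path of the cloned repository inside it is known from the artifact description. The helper names and arguments below are hypothetical. The sketch validates the output so that an empty or unexpected answer is not silently recorded as a commit hash.

\begin{verbatim}
import re
import subprocess

# Full object name printed by `git log --pretty=format:%H`
# (40 hex digits for SHA-1 repositories, 64 for SHA-256 ones).
COMMIT_RE = re.compile(r"^(?:[0-9a-f]{40}|[0-9a-f]{64})$")


def git_log_command(container: str, repo_path: str) -> list[str]:
    """Command printing the hash of the HEAD commit of repo_path."""
    return ["docker", "exec", container,
            "git", "-C", repo_path, "log", "-n", "1", "--pretty=format:%H"]


def parse_commit_hash(output: str) -> str:
    """Validate the output of git log and return the commit hash."""
    commit = output.strip()
    if not COMMIT_RE.match(commit):
        raise ValueError(f"unexpected git log output: {output!r}")
    return commit


def get_commit_hash(container: str, repo_path: str) -> str:
    """Run git log inside the container and return the HEAD commit hash."""
    result = subprocess.run(git_log_command(container, repo_path),
                            capture_output=True, text=True, check=True)
    return parse_commit_hash(result.stdout)
\end{verbatim}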
When the \dfile\ downloads content from the internet (\eg\ archives, binaries), \ecg\ downloads the same content on the host machine (\ie\ outside the container) and computes its cryptographic hash.
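A minimal sketch of this step follows, assuming SHA-256 as the hash function and plain HTTP(S) URLs; the function names are hypothetical. The content is hashed in chunks so that large archives need not fit in memory. Comparing the digests obtained on successive rebuilds then reveals whether the content behind the same URL has changed.

\begin{verbatim}
import hashlib
import urllib.request
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Hash a binary stream chunk by chunk and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def hash_remote_content(url: str, timeout: float = 60.0) -> str:
    """Download url on the host (not in the container) and hash it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return sha256_of_stream(response)
\end{verbatim}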
Although \texttt{pip} is already handled in the ``Package Managers'' section (Section \ref{sec:package_managers}), when authors use a virtual environment, \ecg\ must query that specific Python environment rather than the global one.
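One way to query the right environment is to invoke \texttt{pip} through the interpreter of the virtual environment itself (\texttt{python -m pip list --format=json}), so that \texttt{pip} reports the packages of that environment. The sketch below assumes a POSIX virtual-environment layout (\texttt{bin/python}, as on Linux images) and a virtual-environment path taken from the artifact description; the helper names are hypothetical.

\begin{verbatim}
import json
import posixpath
import subprocess


def pip_list_command(container: str, venv_path: str) -> list[str]:
    """Use the interpreter of the virtual environment, not the global one."""
    # Container paths are POSIX paths, whatever the host operating system.
    python = posixpath.join(venv_path, "bin", "python")
    return ["docker", "exec", container,
            python, "-m", "pip", "list", "--format=json"]


def parse_pip_list(output: str) -> dict[str, str]:
    """Turn `pip list --format=json` output into a {package: version} map."""
    return {pkg["name"]: pkg["version"] for pkg in json.loads(output)}


def venv_packages(container: str, venv_path: str) -> dict[str, str]:
    """List the packages installed in the given virtual environment."""
    result = subprocess.run(pip_list_command(container, venv_path),
                            capture_output=True, text=True, check=True)
    return parse_pip_list(result.stdout)
\end{verbatim}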
The \dfile s will be gathered right after the proceedings of a conference are published.
Contributors to the ``Data Curation'' phase will go through all the papers and their artifacts to identify the artifacts that contain \dfile s.
These \dfile s will then be captured in a Nickel description (see Section \ref{sec:nickel}).
To avoid mistakes, at least two contributors will be assigned to each paper.
If their Nickel descriptions of an artifact differ, the contributors will discuss the differences until they agree on the correct description.
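Spotting the differences between two descriptions of the same artifact can be partly automated. Assuming both descriptions are available as JSON (\eg\ produced with \texttt{nickel export}), the hypothetical helper below lists the fields on which they disagree. This gives the contributors a concrete starting point for their discussion. The field names in the usage example are illustrative and do not come from the actual Nickel schema.

\begin{verbatim}
from typing import Any


def diff_descriptions(a: Any, b: Any, path: str = "") -> list[str]:
    """Return the paths at which two JSON-like descriptions disagree."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs: list[str] = []
        for key in sorted(set(a) | set(b)):
            sub = f"{path}.{key}" if path else str(key)
            if key not in a or key not in b:
                diffs.append(sub)  # field present in only one description
            else:
                diffs.extend(diff_descriptions(a[key], b[key], sub))
        return diffs
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        return [d for i, (x, y) in enumerate(zip(a, b))
                for d in diff_descriptions(x, y, f"{path}[{i}]")]
    # Scalars, or lists of different lengths: compare as a whole.
    return [] if a == b else [path]


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    first = {"artifact_url": "https://example.org/a.zip", "type": "zip",
             "dockerfiles": [{"location": "Dockerfile"}]}
    second = {"artifact_url": "https://example.org/a.zip", "type": "tar",
              "dockerfiles": [{"location": "docker/Dockerfile"}]}
    print(diff_descriptions(first, second))
    # ['dockerfiles[0].location', 'type']
\end{verbatim}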
The first part of the analysis can be performed statically, from the artifact descriptions alone, and will report:
\begin{itemize}
\item Number/Proportion of \dfile s using particular package managers
\item Number/Proportion of \dfile s downloading from Git repositories
\item Number/Proportion of \dfile s downloading from the internet
\end{itemize}
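These counts can be computed directly from the collected descriptions. The sketch below assumes, hypothetically, that each \dfile\ description lists its package managers, its Git downloads, and its other internet downloads under the keys \texttt{package\_managers}, \texttt{git\_packages} and \texttt{misc\_packages}; the actual field names of the Nickel schema may differ. Each package manager is counted at most once per \dfile, so its proportion reads as the share of \dfile s that use it.

\begin{verbatim}
from collections import Counter


def count_and_proportion(n: int, total: int) -> tuple[int, float]:
    """Return (count, count / total), with 0.0 when total is zero."""
    return n, (n / total if total else 0.0)


def static_stats(dockerfiles: list[dict]) -> dict[str, tuple[int, float]]:
    """Number/proportion of Dockerfiles per package manager, Git, internet."""
    total = len(dockerfiles)
    # set() so that a manager listed twice in one Dockerfile counts once.
    managers = Counter(pm for d in dockerfiles
                       for pm in set(d.get("package_managers", [])))
    stats = {f"package_manager:{pm}": count_and_proportion(n, total)
             for pm, n in sorted(managers.items())}
    uses_git = sum(1 for d in dockerfiles if d.get("git_packages"))
    uses_internet = sum(1 for d in dockerfiles if d.get("misc_packages"))
    stats["git"] = count_and_proportion(uses_git, total)
    stats["internet"] = count_and_proportion(uses_internet, total)
    return stats
\end{verbatim}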
\subsection{Dynamic Analysis}
The second part of the analysis will be done after the first year of data collection, and will focus on the temporal evolution of properties of the artifacts.
\paragraph{Artifact Sources}
\begin{itemize}
\item Number/Proportion of artifacts that can be downloaded
\item Number/Proportion of artifacts whose content has changed
\end{itemize}
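Both indicators can be derived from the history of periodic downloads. This assumes that each run records, per artifact, either the hash of the downloaded artifact or the fact that the download failed. In the hypothetical sketch below, an artifact counts as downloadable if its most recent download succeeded, and as changed if at least two successful downloads produced different hashes.

\begin{verbatim}
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    date: str              # ISO 8601 date of the run, e.g. "2024-03-01"
    sha256: Optional[str]  # None when the download failed


def is_downloadable(history: list[Observation]) -> bool:
    """True if the latest download succeeded (ISO dates sort as strings)."""
    return bool(history) and max(history, key=lambda o: o.date).sha256 is not None


def content_changed(history: list[Observation]) -> bool:
    """True if successful downloads produced at least two distinct hashes."""
    return len({o.sha256 for o in history if o.sha256 is not None}) > 1


def source_stats(histories: dict[str, list[Observation]]) -> dict[str, tuple[int, float]]:
    """Number/proportion of downloadable and changed artifacts."""
    total = len(histories)
    downloadable = sum(is_downloadable(h) for h in histories.values())
    changed = sum(content_changed(h) for h in histories.values())
    return {
        "downloadable": (downloadable, downloadable / total if total else 0.0),
        "changed": (changed, changed / total if total else 0.0),
    }
\end{verbatim}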
\paragraph{Build Status}
\begin{itemize}
\item Number/Proportion of \dfile s that build successfully
\item Number/Proportion of failed builds per error type (\texttt{baseimage\_unavailable}, \texttt{job\_time\_execeed}, \texttt{unknown\_error})