Updated doc with details on analysis outputs. Moved test scripts to a dedicated directory.

antux18 2024-08-22 19:19:40 +02:00
parent 0c281e1051
commit 514e186c3d
4 changed files with 22 additions and 3 deletions


@@ -58,9 +58,9 @@ Otherwise, you can use the Nix package manager and run `nix develop` in this dir
### Running the whole workflow
- First, open `config/config.yaml` and change `system` to `local` if you try to run the workflow outside of the Grid'5000 testbed.
+ First, open `config/config.yaml` and set `system` to `local` to run the workflow on your local machine, or to `g5k` to run it on the Grid'5000 testbed.
- Then, run the following command at the root directory of the repository:
+ Then, run the following command at the root directory of this repository:
```
snakemake --cores <nb_cores>
```
@@ -152,7 +152,14 @@ Where:
- `<input_table1>`, `<input_table2>`... are one or more output tables from ECG. The required ECG output depends on the analysis script, see below.
- `<output_table>` is the path where the table generated by the analysis script will be stored.
- *TODO: explain the content of the output files*
+ The outputs are CSV files with the following structure:
+ | Category 1 | Category 2 | ... | Timestamp |
+ |------------|------------|-----|-----------|
+ Where `Category 1`, `Category 2`, ... are the categories (package sources, build statuses, or artifact statuses) measured by the analysis functions. For each category, the number of entities (packages, containers, or artifacts) belonging to that category is given in the corresponding column. The categories are detailed below for each type of analysis.
+ The timestamp corresponds to the time when the output file was written.
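For illustration, a table with this structure can be read back with Python's `csv` module. The column names and values below are invented examples, not actual ECG output; only the category-columns-plus-`Timestamp` layout comes from the description above:

```python
import csv
import io

# Invented example of an analysis output table: one count column per
# category, plus the timestamp at which the table was written.
sample = io.StringIO(
    "dpkg,pip,git,Timestamp\n"
    "120,35,4,2024-08-22 19:19:40\n"
)
row = next(csv.DictReader(sample))
timestamp = row.pop("Timestamp")          # when the file was written
counts = {category: int(n) for category, n in row.items()}
print(counts)  # {'dpkg': 120, 'pip': 35, 'git': 4}
```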
##### Software environment analysis
@@ -164,18 +171,24 @@ Depending on the type of analysis, multiple tables can be generated:
The type of analysis can be specified using the option `-t`.
The categories are all the package sources supported by ECG, in the following order: `dpkg, rpm, pacman, pip, conda, git, misc`
##### Artifact analysis
The script `artifact_analysis.py` performs an artifact analysis by parsing one or more artifact hash logs generated by ECG.
The table generated by this script gives the number of artifacts that are available or unavailable, and the number of artifacts that have been modified over time.
The categories are all possible artifact statuses, in the following order: `available, unavailable, changed`
##### Build status analysis
The script `buildstatus_analysis.py` performs a build status analysis by parsing one or more build status logs generated by ECG.
The table generated by this script gives the number of images that were built successfully and the number of images that failed to build, for each category of error.
The categories are all build statuses supported by ECG, in the following order: `success, package_install_failed, baseimage_unavailable, artifact_unavailable, dockerfile_not_found, script_crash, job_time_exceeded, unknown_error`
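As a sketch of how such a table might be consumed, the following splits successful builds from failures using the status order listed above. The counts and the timestamp are invented for the example:

```python
import csv
import io

# Status columns in the order given above; the counts are invented.
statuses = [
    "success", "package_install_failed", "baseimage_unavailable",
    "artifact_unavailable", "dockerfile_not_found", "script_crash",
    "job_time_exceeded", "unknown_error",
]
sample = io.StringIO(
    ",".join(statuses + ["Timestamp"]) + "\n"
    "42,3,1,0,2,1,0,1,2024-08-22 19:19:40\n"
)
row = next(csv.DictReader(sample))
row.pop("Timestamp")
# Every column except `success` counts a failure category.
failed = sum(int(row[s]) for s in statuses if s != "success")
print(row["success"], failed)  # 42 8
```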
#### Plots with R
Under the directory `plot`, you will find a `plot.r` script. Run it as follows:


@@ -1,5 +1,7 @@
#!/bin/bash
cd ..
OUTPUT_PATH=output
CACHE_DIR=cache


@@ -1,5 +1,7 @@
#!/bin/bash
cd ..
ARTIFACT_NAME=$1
ARTIFACT_OUT="artifacts/json/$ARTIFACT_NAME.json"
ARTIFACT_IN="artifacts/nickel/$ARTIFACT_NAME.ncl"


@@ -1,5 +1,7 @@
#!/bin/bash
cd ..
OUTPUT_PATH=output
CACHE_DIR=cache
TESTFILE=$1