Code should be stored as two Python scripts (.py files):
“measure_disparity.py” takes in a set of model predictions and quantifies discrimination in model outcomes.
Inputs:
(1) A dataframe with one row per individual. Columns will include:
(i) Model prediction (as a probability)
(ii) Binary outcome (i.e. 0 or 1, where 1 indicates the favorable outcome for the individual being scored)
(iii) Model label
(iv) Sample weights
(v) Demographic data on protected and reference classes
Outputs:
(1) One value per protected class measuring discrimination for each metric used
(2) [Optional] graphics/visualization, useful formatted output
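As an illustration of the kind of output described above, here is a minimal sketch of one possible disparity metric: the weighted favorable-prediction rate of each protected class compared with the reference class. The column names (`prediction`, `sample_weight`, `group`), the 0.5 threshold, and the function name are assumptions made for this example, not part of the requirements, and a real submission would likely report several metrics.

```python
import pandas as pd


def measure_disparity(df, group_col, reference, pred_col="prediction",
                      weight_col="sample_weight", threshold=0.5):
    """Return one row per protected class with its weighted favorable-prediction
    rate and that rate compared with the reference class."""
    # Assumption: a predicted probability at or above the threshold is favorable.
    favorable = (df[pred_col] >= threshold).astype(float)
    weights = df[weight_col].astype(float)

    # Weighted share of favorable predictions within each demographic group.
    weighted_favorable = (favorable * weights).groupby(df[group_col]).sum()
    total_weight = weights.groupby(df[group_col]).sum()
    rates = weighted_favorable / total_weight

    # Compare every protected class against the reference class.
    protected = rates.drop(index=reference)
    return pd.DataFrame({
        "selection_rate": protected,
        "impact_ratio": protected / rates[reference],
        "rate_difference": protected - rates[reference],
    })


# Hypothetical toy data: four reference and four protected individuals.
scores = pd.DataFrame({
    "prediction": [0.9, 0.8, 0.3, 0.6, 0.7, 0.2, 0.4, 0.1],
    "sample_weight": [1.0] * 8,
    "group": ["ref", "ref", "ref", "ref", "prot", "prot", "prot", "prot"],
})
result = measure_disparity(scores, group_col="group", reference="ref")
print(result)
```

An impact ratio near 1 (or a rate difference near 0) would suggest similar treatment of the two groups under this particular metric; other metrics can disagree, so this sketch should not be read as a complete assessment.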
“mitigate_disparity.py” takes in a model development dataset (training and test datasets) that your algorithm has not seen before and generates a new, optimally fair/debiased model that can be used to make new predictions.
Inputs:
(1) A model development dataset that contains information on:
(i) Model features
(ii) Model label
(iii) Sample weights
(iv) Demographic data on protected and reference classes
Outputs:
(1) The fair/debiased model object, taking the form of a sklearn-style Python object with the following functions accessible:
(i) .fit() – trains the model
(ii) .predict() / .predict_proba() – makes predictions using new data
(iii) .transform() – filters or modifies input data, if applicable
(2) [Optional] graphics/visualization, useful formatted output

Python version must be 3.8 or higher. You should also include a “readme.txt” file with installation instructions and a “requirements.txt” file that lists the required packages and their versions. Submission as a Docker image is optional and should be used only if necessary.
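The sklearn-style interface listed above for the debiased model (.fit(), .predict() / .predict_proba(), .transform()) could look roughly like the sketch below. It swaps in one simple, well-known mitigation technique, reweighing, in which sample weights are adjusted so that the label and the demographic group are independent in the weighted training data, and then fits scikit-learn's `LogisticRegression`. The class name `ReweighedClassifier`, the `reweigh` helper, and the extra `groups` argument to `.fit()` are hypothetical choices for this example; the sketch makes no claim to produce an optimally fair model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


class ReweighedClassifier:
    """Hypothetical sklearn-style wrapper: reweighing, then logistic regression."""

    def __init__(self):
        self.model = LogisticRegression()

    def reweigh(self, y, groups, sample_weight=None):
        """Scale weights so label and group are independent once weighted."""
        y, groups = np.asarray(y), np.asarray(groups)
        if sample_weight is None:
            base = np.ones(len(y))
        else:
            base = np.asarray(sample_weight, dtype=float)
        adjusted = base.copy()
        total = base.sum()
        for g in np.unique(groups):
            for label in np.unique(y):
                cell = (groups == g) & (y == label)
                observed = base[cell].sum()
                if observed == 0:
                    continue  # no individuals in this (group, label) cell
                # Weight the cell would carry if group and label were independent.
                expected = base[groups == g].sum() * base[y == label].sum() / total
                adjusted[cell] = base[cell] * expected / observed
        return adjusted

    def transform(self, X):
        # This sketch does not modify features; it only converts them to an array.
        return np.asarray(X, dtype=float)

    def fit(self, X, y, groups, sample_weight=None):
        weights = self.reweigh(y, groups, sample_weight)
        self.model.fit(self.transform(X), y, sample_weight=weights)
        return self

    def predict_proba(self, X):
        return self.model.predict_proba(self.transform(X))

    def predict(self, X):
        return self.model.predict(self.transform(X))


# Hypothetical toy data: one feature, four reference and four protected rows.
X = np.array([[0.0], [1.0], [2.0], [3.0], [0.5], [1.5], [2.5], [3.5]])
y = np.array([0, 1, 1, 1, 0, 0, 0, 1])
groups = np.array(["ref"] * 4 + ["prot"] * 4)

model = ReweighedClassifier().fit(X, y, groups)
weights = model.reweigh(y, groups)
proba = model.predict_proba(X)
```

Reweighing is used here only because it needs no changes to the underlying learner. Note that `.fit()` takes the demographic column as a separate argument, which departs from the strict scikit-learn signature; a submission could instead carry that column inside the input data and strip it out in `.transform()`.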