1. Attack surface and vulnerabilities of the ML system
The following questionnaire consists of 24 questions meant to estimate the attack surface of your machine-learning system. The questions are designed to identify potential vulnerabilities in the various assets that make up a typical machine-learning pipeline. They are organized into four categories corresponding to stages in the lifecycle of a machine-learning system:
· Training data collection (8 questions)
· Design & Implementation (4 questions)
· Training process (5 questions)
· Deployment & Integration / Inference (7 questions)
If your organization runs several machine-learning systems, consider one of them when answering this questionnaire. Alternatively, you can complete one questionnaire per machine-learning system you would like to evaluate.
Target: Data Scientist (or Data Engineer)
Training data collection:
1. What data sources are used to build your training dataset? (Select all that apply)
2. Do you track the provenance of your training data (i.e., you authenticate your data sources and record which source provided which data)? (Select all that apply)
3. If you use data derived from, or supplied by, customers, how many customers contribute to building your training dataset?
4. How much data is in your training dataset?
5. How is your training data labelled? (Select all that apply)
6. Do processes exist by which external parties can modify labels in your training dataset (e.g., through customer feedback)?
7. Where is your training data stored? (Select all that apply)
8. Do you use external/public serialization libraries to transform and store your training data (e.g., Pickle, NumPy .npy, etc.)? (See the sketch at the end of this section.)
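Why question 8 matters, in a minimal sketch: pickle-based formats execute arbitrary code when loaded, so a tampered training-data file can compromise the training host, whereas loading NumPy arrays with allow_pickle=False restricts deserialization to plain array data. The file name below is a placeholder, not part of any specific pipeline.

```python
# Illustrative only: loading a training-data file received from an external source.
import numpy as np

# Placeholder file; in practice this would come from an external/untrusted source.
np.save("train_batch.npy", np.random.rand(100, 8))

# Risky pattern (shown as a comment): pickle.load() executes any code embedded
# in the file, so a poisoned data file can compromise the training host.
#   with open("train_batch.pkl", "rb") as f:
#       data = pickle.load(f)

# Safer pattern: allow_pickle=False limits np.load to plain array data and
# raises an error if the file would require unpickling Python objects.
data = np.load("train_batch.npy", allow_pickle=False)
print(data.shape, data.dtype)
```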
Design & Implementation:
9. What type of data is used as input to your ML model? (Select all that apply)
10. How is data represented when input to your ML model? (Select all that apply)
11. What type of learning task do you perform? (Select all that apply)
12. What type of machine learning model do you use? (Select all that apply)
Training process:
13. Do you use a pre-trained ML model as a basis to train your ML model (e.g., through fine-tuning or transfer learning)? (Select all that apply)
14. Do you use an external Machine Learning as a Service (MLaaS) platform to train your ML model?
15. Do you use external/public machine learning and/or data pre-processing frameworks/libraries to prepare and/or train your ML model (e.g., scikit-learn, (Py)Torch, TensorFlow, Keras, etc.)?
16. Do you retrain your model, and if so, how often?
17. Do you package your trained model using external or public serialization libraries (e.g., pickle, HDF5, dill, ONNX, PMML, etc.)? (See the sketch at the end of this section.)
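Question 17 probes the same deserialization risk for trained models: formats such as pickle execute code on load, so a swapped or tampered model artifact is an attack vector. A minimal, hypothetical sketch of one common mitigation, checking the artifact against a known-good digest before unpickling it, is shown below; the path and digest are placeholders.

```python
# Illustrative only: verify a serialized model against a known-good SHA-256
# digest before unpickling it. Path and digest below are placeholders.
import hashlib
import pickle

MODEL_PATH = "model.pkl"
EXPECTED_SHA256 = "replace-with-known-good-digest"

def load_trusted_model(path: str, expected_digest: str):
    with open(path, "rb") as f:
        blob = f.read()
    digest = hashlib.sha256(blob).hexdigest()
    if digest != expected_digest:
        # Refuse to deserialize: pickle.loads() would execute any code
        # embedded in a tampered artifact.
        raise ValueError(f"model digest mismatch: {digest}")
    return pickle.loads(blob)

# model = load_trusted_model(MODEL_PATH, EXPECTED_SHA256)
```

Signing the artifact, rather than pinning a single digest, is an alternative when models are retrained frequently.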
Deployment & Integration / Inference:
18. How is your trained model deployed?
19. Who or what produces inputs for the ML model during inference? (Select all that apply)
20. Are the authenticity, integrity and confidentiality of the inputs to your ML model protected? (Select all that apply)
21. Who or what is the end user/consumer of the ML model’s predictions/recommendations? (Select all that apply)
22. Are the authenticity, integrity and confidentiality of the ML model’s predictions protected? (Select all that apply)
23. Does your ML model return to the user any information apart from the predictions/recommendations specified as the model's purpose?
24. How would you characterize the granularity of the predictions/recommendations returned by your model to the users? (See the sketch at the end of this section.)
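To illustrate what questions 23 and 24 are probing: the more detail an inference endpoint returns (full probability vectors, high-precision confidences, auxiliary metadata), the more signal it gives to model-extraction and membership-inference attacks. The hedged sketch below, using a stand-in scikit-learn classifier, shows one way to coarsen the output to just the top label and a rounded confidence; the model and wrapper are hypothetical.

```python
# Illustrative only: expose coarse-grained predictions instead of the full
# probability vector. The classifier and data are stand-ins (scikit-learn demo).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

def coarse_predict(x: np.ndarray) -> dict:
    """Return only the top class and a rounded confidence score."""
    probs = model.predict_proba(x.reshape(1, -1))[0]
    top = int(np.argmax(probs))
    # Rounding (or omitting) the confidence limits the signal available for
    # reconstructing the model or inferring membership of training records.
    return {"label": top, "confidence": round(float(probs[top]), 1)}

print(coarse_predict(X[0]))
```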