Abstract: International audience. The stochastic multi-armed bandit is a classical reinforcement learning model in which a learning agent sequentially chooses an action (pulls a bandit arm) and the environment responds with a stochastic reward drawn from an unknown distribution associated with the chosen action. A popular objective for the agent is to identify the arm with the maximum expected reward, a task known as the best arm identification problem. We address the security concerns that arise in a cross-silo federated learning setting, where multiple data owners collaborate under the orchestration of a server to execute a best arm identification algorithm. We propose three secure protocols that guarantee desirable security properties for the input data (i.e., reward values), the intermediate data (i.e., sums of rewards), and the output data (i.e., the ranking of arms and, in particular, the identified best arm). Each protocol has a different architecture, uses different techniques, and offers a different trade-off with respect to several criteria that we analyze thoroughly: number of participants, generality of the supported reward functions, cryptographic overhead, and communication cost. To build our protocols, we rely on secure multi-party computation, AES-CBC, and the additive homomorphic property of the Paillier cryptosystem.
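The abstract relies on the additive homomorphic property of Paillier to work with sums of rewards. The minimal Python sketch below illustrates only that property, under loudly stated assumptions: toy primes that are far too small to be secure, non-negative integer rewards whose totals stay below the modulus n, and a single key holder who decrypts the aggregated sums. The function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`, `identify_best_arm`) and the data layout are hypothetical. This is not one of the three protocols proposed in the paper.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key generation. ASSUMPTION: tiny fixed primes, illustration only, NOT secure."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1  # common choice of generator
    # mu = L(g^lam mod n^2)^(-1) mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)


def encrypt(pk, m):
    """Encrypt an integer 0 <= m < n as g^m * r^n mod n^2, with random r coprime to n."""
    n, g = pk
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(sk, c):
    """Recover m = L(c^lam mod n^2) * mu mod n."""
    lam, mu, n = sk
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n


def add_encrypted(pk, c1, c2):
    """Additive homomorphism: Enc(a) * Enc(b) mod n^2 decrypts to (a + b) mod n."""
    n, _ = pk
    return (c1 * c2) % (n * n)


def identify_best_arm(pk, sk, local_sums, pulls):
    """Hypothetical flow: each data owner encrypts its per-arm reward sums, the server
    multiplies the ciphertexts arm by arm without seeing plaintexts, and the key holder
    decrypts the totals and picks the arm with the highest empirical mean.
    NOTE: in this toy flow the key holder learns every per-arm sum."""
    num_arms = len(pulls)
    encrypted = [[encrypt(pk, s) for s in owner] for owner in local_sums]
    totals = []
    for arm in range(num_arms):
        acc = encrypt(pk, 0)  # encryption of zero as the neutral starting ciphertext
        for owner in encrypted:
            acc = add_encrypted(pk, acc, owner[arm])
        totals.append(decrypt(sk, acc))
    best = max(range(num_arms), key=lambda a: totals[a] / pulls[a])
    return totals, best


if __name__ == "__main__":
    pk, sk = keygen()
    # Hypothetical data: three owners, three arms, each arm pulled 30 times in total.
    totals, best = identify_best_arm(pk, sk, [[3, 7, 5], [4, 6, 2], [2, 9, 4]], [30, 30, 30])
    print(totals, best)  # [9, 22, 11] 1
```

Because decryption here reveals the per-arm sums to whoever holds the secret key, this sketch does not provide the protection of intermediate data (sums of rewards) or output data (the ranking of arms) that the protocols described above are designed to guarantee.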