OpenAI Safety Gym facilitates reinforcement learning

Market Expertz   |     June 15, 2020


While most work in data science focuses on development, advancement, and algorithmic scale, safety is a domain no less worth pursuing. This is especially true for applications like self-driving vehicles, where a machine learning system’s bad judgment may contribute to an accident. OpenAI describes Safety Gym as a suite of tools for developing artificial intelligence that respects safety constraints during training, and for comparing the “safety” of algorithms and the extent to which those algorithms avoid mistakes while learning.

Safety Gym is designed for reinforcement learning agents, that is, AI that is progressively steered toward goals through rewards or punishments. These agents learn by trial and error, which can be risky: they sometimes try unsafe behaviors that lead to errors. As a solution, OpenAI proposes a form of reinforcement learning known as constrained reinforcement learning, which adds cost functions that the AI must keep within set limits. The field of reinforcement learning has advanced dramatically in recent years; however, different implementations use different environments and evaluation procedures. The researchers therefore note that there is no standard set of environments for making progress on safe exploration. To this end, they released Safety Gym, a suite of tools for accelerating safe exploration research. It comprises a standard set of 18 high-dimensional continuous control environments for safe exploration and 9 additional environments for debugging task performance separately from safety requirements, as well as tools for building additional environments.

After running experiments with constrained and unconstrained reinforcement learning algorithms on the constrained Safety Gym environments, the researchers found that unconstrained algorithms can score high returns by taking unsafe actions, as measured by the cost function. Constrained algorithms, by contrast, achieve lower returns but keep costs at the desired levels. As a result, the OpenAI researchers propose standardizing constrained reinforcement learning as the main formalism for safe exploration.
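
The trade-off between return and cost can be illustrated with a toy problem. The sketch below is not Safety Gym code and does not train a real agent. It assumes a single action parameter, a made-up reward and cost, a hypothetical cost limit of 0.3, and a Lagrange multiplier as one common way to enforce a cost constraint. The article does not say which constrained algorithms the researchers tested, so every name and number here is illustrative.

```python
# Toy illustration only: a one-parameter "policy" instead of a real RL agent.
# All names and numbers are hypothetical and not part of Safety Gym.

COST_LIMIT = 0.3  # assumed safety budget


def reward(a):
    """Concave toy reward, maximized at a = 1."""
    return a - 0.5 * a * a


def cost(a):
    """Toy cost that grows with how aggressive the action is."""
    return a


def train(constrained, steps=2000, lr=0.05):
    """Return the action parameter reached after gradient updates."""
    a, lam = 0.0, 0.0  # action parameter and Lagrange multiplier
    for _ in range(steps):
        # Gradient ascent on reward(a) - lam * (cost(a) - COST_LIMIT).
        grad_a = (1.0 - a) - lam
        a = min(1.0, max(0.0, a + lr * grad_a))
        if constrained:
            # Raise the penalty while over budget, relax it otherwise.
            lam = max(0.0, lam + lr * (cost(a) - COST_LIMIT))
    return a


if __name__ == "__main__":
    for label, flag in (("unconstrained", False), ("constrained", True)):
        a = train(flag)
        print(f"{label}: reward={reward(a):.3f} cost={cost(a):.3f}")
```

In this toy setting the unconstrained run settles at the highest reward (0.5) with a cost of 1.0, well above the limit. The constrained run settles at a cost of 0.3 and a lower reward of about 0.255. This mirrors, in miniature, the pattern the researchers report.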