Reinforcement learning is a machine learning area that studies which actions an agent can take in order to optimize a cumulative reward function. Recently, a new class of reinforcement learning algorithms with multiple, possibly conflicting, reward functions was proposed. We call this class of algorithms the multi-objective reinforcement learning (MORL) paradigm. We give an overview on multi-objective optimization techniques imported in MORL and their theoretical simplified variant with a single state, namely the multi-objective multi-armed bandits (MOMAB) paradigm.