Cross-modal variational inference for bijective signal-symbol translation