Normalizing Flows (NFs) (Rezende & Mohamed, 2015) learn an invertible mapping from a simple latent (also called base or source) distribution to a given data (target) distribution.
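Concretely, writing f for the learned mapping from a latent sample z to a data point x = f(z) (this notation is only for exposition and does not appear in the code below), the model density follows from the change-of-variables formula:

$$\log p_X(x) = \log p_Z\!\left(f^{-1}(x)\right) + \log \left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|$$

Training maximizes this log-likelihood on the observed data, which is exactly what the optimization loop at the end of this section does.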
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Two concentric circles as the target distribution, standardized to zero mean and unit variance.
n_samples = 1000
target_data, _ = datasets.make_circles(n_samples=n_samples, factor=0.5, noise=0.05)
target_data = StandardScaler().fit_transform(target_data)
plt.scatter(target_data[:,0], target_data[:,1], alpha=0.5)
plt.show()
Let us take a standard normal as a latent distribution.
import torch
import seaborn as sns

# 2D standard normal as the base distribution (Independent wraps the two dimensions into one event).
latent_dist = torch.distributions.Independent(torch.distributions.Normal(torch.zeros(2), torch.ones(2)), 1)
latent_data = latent_dist.sample(torch.Size([1000,])).detach().numpy()
plt.scatter(latent_data[:,0], latent_data[:,1], color=sns.color_palette()[1], alpha=0.5)
plt.show()
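As a quick sanity check (this snippet is only illustrative and not part of the walkthrough), the log-density of this 2D standard normal at the origin should be -log(2*pi), roughly -1.84:

# log N(0 | 0, I) in two dimensions is -log(2*pi) ≈ -1.8379
print(latent_dist.log_prob(torch.zeros(2)))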
Now we map samples from the base distribution through the flow, an autoregressive spline bijector whose parameters come from a neural network, and look at the output. The flow is not trained yet, so the output should not resemble the target data.
# The bijector and flow wrappers are assumed to come from the FlowTorch library.
import flowtorch.bijectors as bijectors
import flowtorch.distributions as distributions

bijector = bijectors.SplineAutoregressive()
transformed_dist = distributions.Flow(latent_dist, bijector)
data = transformed_dist.sample(torch.Size([1000,])).detach().numpy()
plt.scatter(data[:,0], data[:,1], color=sns.color_palette()[2], alpha=0.5)
plt.show()
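Before training, it is useful to record a baseline: the average negative log-likelihood the untrained flow assigns to the target data. This extra check is not part of the original walkthrough; it reuses only the log_prob call that also appears in the training loop below.

# Average negative log-likelihood of the target data under the untrained flow (baseline).
with torch.no_grad():
    baseline_nll = -transformed_dist.log_prob(torch.tensor(target_data, dtype=torch.float)).mean()
print('untrained NLL: {:.3f}'.format(baseline_nll.item()))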
Let us optimize the neural network by maximizing the log-likelihood of the data samples under the flow.
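In other words, with model parameters θ and N training points x_i (notation for this note only), the loop below minimizes the average negative log-likelihood:

$$\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\log p_\theta(x_i)$$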
dataset = torch.tensor(target_data, dtype=torch.float)
optimizer = torch.optim.Adam(transformed_dist.parameters(), lr=5e-3)
for step in range(2000):
    optimizer.zero_grad()
    loss = -transformed_dist.log_prob(dataset).mean()
    loss.backward()
    optimizer.step()
    if step % 500 == 0:
        print('step: {}, loss: {}'.format(step, loss.item()))
step: 0, loss: 2.885023355484009
step: 500, loss: 1.8666640520095825
step: 1000, loss: 1.8461662530899048
step: 1500, loss: 1.7394717931747437
data = transformed_dist.sample(torch.Size([1000,])).detach().numpy()
plt.scatter(target_data[:,0], target_data[:,1], alpha=0.5, label="data")
plt.scatter(data[:,0], data[:,1], color=sns.color_palette()[2], alpha=0.5, label="prediction")
plt.legend()
plt.show()
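Beyond samples, we can also evaluate the learned density on a grid and plot it as contours. This is a minimal sketch using the same log_prob call as in training; the grid range is an assumption chosen to cover the standardized data.

import numpy as np

# Evaluate the learned log-density on a 2D grid and plot it as filled contours.
xs, ys = np.meshgrid(np.linspace(-2.5, 2.5, 200), np.linspace(-2.5, 2.5, 200))
grid = torch.tensor(np.stack([xs.ravel(), ys.ravel()], axis=1), dtype=torch.float)
with torch.no_grad():
    log_prob = transformed_dist.log_prob(grid).reshape(200, 200).numpy()
plt.contourf(xs, ys, np.exp(log_prob), levels=50)
plt.scatter(target_data[:,0], target_data[:,1], s=5, color='white', alpha=0.3)
plt.show()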