To achieve this effect there were a few hurdles to overcome. The first was how quickly the image changed. You may notice in the video above that the morphs occur in bursts: the image changes quickly, then stabilises, changes quickly, then stabilises, in a loop. This is done to let the viewer keep up. In the naive approach, where the change happens at a constant speed, I found the result was a blur of motion and the viewer could not make out any details. So I changed my script to drive the change amount with a sine wave.
import math

# 'frames_per_wave' is set in the Gradio UI and defines the length of one sine wave in frames
# 'denoising_strength_change_amplitude' is set in the UI to control the change rate, i.e. the amplitude of the wave
# 'denoising_strength_change_offset' is set in the UI to offset the sine wave by some number of frames to better align with the prompt changes
# 'p' is the processing object holding the current frame's settings, and 'denoising_strength' is the amount of change applied to the input image
# 'i' is the index of the current frame
denoising_strength_change_rate = 180 / frames_per_wave
cos = abs(math.cos(math.radians(i * denoising_strength_change_rate + denoising_strength_change_offset)))
p.denoising_strength = initial_denoising_strength + denoising_strength_change_amplitude - (cos * denoising_strength_change_amplitude)
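To get a feel for the shape this produces, here is a quick standalone check of the wave (the specific values are made up for illustration, not the plugin's defaults):

import math

# Illustrative values only
frames_per_wave = 60
initial_denoising_strength = 0.2
denoising_strength_change_amplitude = 0.4
denoising_strength_change_offset = 0

rate = 180 / frames_per_wave
for i in (0, 15, 30, 45, 60):
    cos = abs(math.cos(math.radians(i * rate + denoising_strength_change_offset)))
    strength = initial_denoising_strength + denoising_strength_change_amplitude - (cos * denoising_strength_change_amplitude)
    print(i, round(strength, 3))
# Prints 0.2 at frames 0 and 60 and 0.6 at frame 30: the change peaks mid-wave
# and settles back down at the end of each wave, giving the burst-then-stabilise rhythm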
The next major issue I ran into was that, after about 150 frames, looping the output back into the input caused a drift in colour balance and contrast. The colour balance would steadily shift toward orange tones, and the contrast would increase.
I'm not sure exactly why this happened, but I suspect the trained model I was using had a bias toward higher-contrast images with a very slight preference for orange, which you would never notice in any single image generation. It is only with repeated exposure that the effect compounds enough to become noticeable.
To resolve this issue I first attempted to colour correct during the output phase of each frame, applying a fixed counterweight to both the colour tones and the contrast. This proved somewhat helpful, but the AI was not consistent in the amount of drift, so a constant value was either under- or over-correcting in any given run.
Following some experimentation I landed on a solution that plans the colour values for the entire animation ahead of time. I added a process at the beginning of the run that takes the very first frame and feeds it, along with the prompts at several points across the length of the animation, back through the generation with a high amount of change (denoising strength). This creates 'keyframes' for the entire animation that are only a single feedback loop deep, so they have essentially no colour drift.
I found there was a chance that some of these keyframes would pick up a fluke colour, where the AI added some element that was unusually coloured. To combat this I actually generate each keyframe four times and blend the results together to iron out any variance.
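The keyframe pass looks roughly like this. It is a simplified sketch rather than the plugin's real code: generate_img2img and prompt_for_frame are hypothetical stand-ins for the actual webui calls, and the keyframe spacing and denoising strength are illustrative values.

import numpy as np
from PIL import Image

def build_colour_keyframes(first_frame: Image.Image, total_frames: int,
                           spacing: int = 20, variants: int = 4) -> dict:
    # Map of frame index (as a string) -> keyframe image, as expected by get_cc_target() below
    targets = {"0": first_frame}
    for index in range(spacing, total_frames, spacing):
        # Every keyframe is fed from the ORIGINAL first frame, so it is only one
        # feedback loop deep and carries essentially no colour drift
        generations = [
            generate_img2img(first_frame, prompt_for_frame(index), denoising_strength=0.75)  # hypothetical call
            for _ in range(variants)
        ]
        # Average the four variants to iron out any fluke colours from a single generation
        stacked = np.mean([np.asarray(g, dtype=np.float32) for g in generations], axis=0)
        targets[str(index)] = Image.fromarray(stacked.astype("uint8"))
    return targets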
Once I had these keyframes generated, at the end of each actual frame generation I could calculate the difference in contrast and colour between that frame and a keyframe, then correct accordingly. When part way between keyframes I blend the two nearest keyframes by distance, i.e. when keyframes are at frames 20 and 40, frame 35 blends 25% of keyframe 20 with 75% of keyframe 40 to get its target values.
from PIL import Image

def get_cc_target(
    self,
    targets: dict,
    index: int,
    frames: int
):
    # 'targets' maps keyframe indices (as strings) to keyframe images
    # If this frame is itself a keyframe, use it directly
    if str(index) in targets:
        return targets[str(index)]
    a = 0
    b = 0
    target_a = None
    target_b = None
    # Find the nearest keyframe at or before this frame (a) and the first one after it (b)
    for i in range(frames):
        if str(i) in targets:
            if i <= index:
                a = i
                target_a = targets[str(i)]
            elif target_b is None:
                b = i
                target_b = targets[str(i)]
            else:
                break
    # Past the last keyframe there is nothing ahead to blend with
    if target_b is None:
        return target_a
    # Blend the surrounding keyframes by how far this frame sits between them
    alpha = (index - a) / (b - a)
    return Image.blend(target_a, target_b, alpha)
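Using the example above, and assuming a 120-frame animation, self.get_cc_target(targets, 35, 120) finds keyframes 20 and 40, computes alpha = (35 - 20) / (40 - 20) = 0.75, and returns a blend that is 75% keyframe 40 and 25% keyframe 20.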
import cv2
import numpy as np
from blendmodes.blend import blendLayers, BlendType
from skimage import exposure

def apply_color_correction(
    self,
    target_image: Image.Image,
    original_image: Image.Image,
    index: int,
    axis=2
):
    # Work in LAB colour space so luminance and colour are treated separately
    base_image = cv2.cvtColor(np.asarray(original_image.copy()), cv2.COLOR_RGB2LAB)
    correction = cv2.cvtColor(np.asarray(target_image.copy()), cv2.COLOR_RGB2LAB)
    # Match the frame's histogram to the keyframe target
    histogram = exposure.match_histograms(
        base_image,
        correction,
        channel_axis=axis
    )
    histogram = cv2.cvtColor(histogram, cv2.COLOR_LAB2RGB)
    image = Image.fromarray(histogram.astype("uint8"))
    # Keep the original frame's luminosity so only the colour balance is corrected
    image = blendLayers(image, original_image, BlendType.LUMINOSITY)
    return image
def get_gamma_diff(
    self,
    target_image: Image.Image,
    original_image: Image.Image,
):
    base_image = cv2.cvtColor(np.asarray(original_image.copy()), cv2.COLOR_RGB2LAB)
    correction = cv2.cvtColor(np.asarray(target_image.copy()), cv2.COLOR_RGB2LAB)
    l1 = base_image[:, :, 0]
    l2 = correction[:, :, 0]
    # Compute the mean and standard deviation of the L channel of each image
    mean_l1, std_l1 = np.mean(l1), np.std(l1)
    mean_l2, std_l2 = np.mean(l2), np.std(l2)
    # Compute the scaling factor between the two images
    scale_factor = (std_l2 / std_l1) * ((mean_l2 - mean_l1) / mean_l1)
    return scale_factor
def apply_gamma_correction(
    self,
    original_image: Image.Image,
    gamma: float,
    index: int
):
    # Apply a gamma curve to pull the frame's contrast back toward the keyframe target
    image_values = np.asarray(original_image.copy())
    corrected_array = exposure.adjust_gamma(image_values, gamma=gamma, gain=1)
    image = Image.fromarray(corrected_array.astype("uint8"))
    return image
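Pieced together, the per-frame colour correction looks roughly like this. Again, this is a sketch of how the pieces fit rather than the exact plugin code, with the contrast side (get_gamma_diff feeding apply_gamma_correction) handled the same way alongside it.

def correct_frame(self, frame: Image.Image, targets: dict, index: int, total_frames: int) -> Image.Image:
    # Blend the two nearest keyframes into a colour target for this frame
    target = self.get_cc_target(targets, index, total_frames)
    # Pull the frame's colour balance back toward that target with histogram matching
    return self.apply_color_correction(target, frame, index)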
All of the image manipulation was done using common Python libraries: PIL, OpenCV (cv2), skimage and blendmodes, with help from numpy for some of the maths.
There is a bunch more code involved in orchestrating the whole loop, passing the images around, saving the output and so on, but I don't want this to get too code-heavy.
The final step of the video process was to take the frames generated by the main loop and increase the frame rate from 10 fps up to 60 fps, which really helps to settle down the changes between generated frames. This interpolation is done with another AI algorithm called RIFE, which fills in the additional frames. While I could have built this final step into my plugin directly, I never did, as I found a piece of software called FlowFrames that runs RIFE without any additional scripting.