Improving Photorealism of Driving Simulations with Generative Adversarial Networks – Unite.AI

A new research initiative between the US and China has proposed the use of Generative Adversarial Networks (GANs) to increase the realism of driving simulators.

In a novel approach to the challenge of generating photorealistic POV driving scenarios, researchers have developed a hybrid method that plays to the strengths of different approaches, mixing the more photorealistic output of CycleGAN-based systems with conventionally generated elements that require a higher level of detail and consistency, such as road markings and the actual vehicles seen from the driver's perspective.

Hybrid Generative Neural Graphics (HGNG) offers a new direction for driving simulations, preserving the fidelity of 3D models for essential elements (like road markings and vehicles) while playing to the strengths of GANs in generating varied, non-repetitive backgrounds and detail. Source: the new paper.

The system, called Hybrid Generative Neural Graphics (HGNG), injects highly limited outputs of a traditional CGI-based driving simulator into a GAN pipeline, where the NVIDIA SPADE framework does the environment generation work.

According to the authors, the benefit is that driving environments become potentially more diverse, creating a more immersive experience. As it stands, even converting CGI output into photorealistic neural renderings cannot solve the problem of repetition, since the source material entering the neural pipeline is constrained by the limitations of the model environments and their tendency to repeat textures and meshes.

Source: https://www.youtube.com/watch?v=0fhUJT21-bs

Converted footage from the 2021 Enhancing Photorealism Enhancement paper, which remains dependent on CGI-rendered footage for the background and general environmental details, limiting the variety of environments in the simulated experience. Source: https://www.youtube.com/watch?v=P1IcaBn3ej0

The paper states*:

“The accuracy of a traditional driving simulator depends on the quality of its computer graphics pipeline, which consists of 3D models, textures and a rendering engine. High-quality 3D models and textures require craftsmanship, while the rendering engine must perform complicated physical calculations for lighting and shading to appear realistic.”

The new paper is titled Photorealism in Driving Simulations: Blending Generative Adversarial Image Synthesis with Rendering, and comes from researchers at Ohio State University’s Department of Electrical and Computer Engineering and Chongqing Changan Automobile Co., Ltd. in Chongqing, China.

Background Material

HGNG transforms the semantic layout of an input CGI-generated scene by blending partially rendered foreground material with GAN-generated environments. Although the researchers experimented with various datasets to train the models, the KITTI Vision Benchmark Suite proved the most effective, consisting mostly of driver-POV footage captured in the German city of Karlsruhe.

HGNG generates a semantic segmentation layout from the CGI-rendered output and then feeds it to SPADE, with varying style encodings, to create random and diverse photorealistic background images, including nearby objects in urban scenes. The new paper states that the repetitive patterns common to resource-constrained CGI pipelines disrupt immersion for human drivers using a simulator, and that the more diverse backgrounds a GAN can provide can alleviate this problem.
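
The SPADE mechanism at the heart of this step is spatially-adaptive normalization: the semantic layout itself predicts per-pixel scale and shift parameters that modulate the generator’s activations. Below is a minimal, hypothetical PyTorch sketch of such a block; the class and layer names (SPADENorm, the hidden width of 128) are assumptions for illustration, not the authors’ code.

```python
# Minimal sketch of a SPADE-style normalization block (PyTorch), for illustration only.
# Class name, hidden width and kernel sizes are assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADENorm(nn.Module):
    """Spatially-adaptive normalization: the segmentation map supplies
    per-pixel scale (gamma) and shift (beta) after parameter-free BatchNorm."""
    def __init__(self, feat_channels, label_channels, hidden=128):
        super().__init__()
        self.bn = nn.BatchNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_gamma = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)

    def forward(self, features, segmap):
        # Resize the one-hot semantic layout to the feature resolution.
        segmap = F.interpolate(segmap, size=features.shape[2:], mode='nearest')
        hidden = self.shared(segmap)
        gamma = self.to_gamma(hidden)
        beta = self.to_beta(hidden)
        return self.bn(features) * (1 + gamma) + beta
```

Varying the generator’s style code while keeping the same semantic layout is what yields diverse backgrounds for an otherwise identical scene.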

The researchers experimented with both Conditional GAN (cGAN) and CycleGAN (CyGAN) as the generative network, and ultimately found that each has strengths and weaknesses: cGAN requires paired datasets, while CyGAN does not. However, CyGAN cannot currently outperform the state of the art in traditional simulators, pending further improvements in domain matching and cycle consistency. Therefore cGAN, despite its additional paired-data requirements, currently performs best.
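
To make the paired/unpaired distinction concrete, the sketch below contrasts the loss terms that separate the two families: a paired cGAN can add an L1 reconstruction term against the ground-truth image, while CycleGAN replaces it with a cycle-consistency term so that no paired data is needed. The function and network names (G, F_inv, netD) are placeholders, not code from the paper.

```python
# Sketch of the loss terms that separate the two approaches (PyTorch-style).
# G, F_inv, netD, real_A, real_B are placeholders, not code from the paper.
import torch
import torch.nn.functional as F

def cgan_generator_loss(netD, fake_B, real_B, lambda_l1=100.0):
    # Paired setting: adversarial term plus L1 reconstruction against the ground-truth pair.
    pred = netD(fake_B)
    adv = F.binary_cross_entropy_with_logits(pred, torch.ones_like(pred))
    return adv + lambda_l1 * F.l1_loss(fake_B, real_B)

def cycle_consistency_loss(G, F_inv, real_A, real_B, lambda_cyc=10.0):
    # Unpaired setting: A->B->A and B->A->B round trips must reproduce the inputs,
    # which removes the need for paired data but weakens pixel-level control.
    return lambda_cyc * (F.l1_loss(F_inv(G(real_A)), real_A) +
                         F.l1_loss(G(F_inv(real_B)), real_B))
```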

The conceptual architecture of HGNG.

In HGNG’s neural graphics pipeline, 2D representations are formed from CGI-synthesized scenes. The objects passed from the CGI render into the GAN stream are limited to ‘essential’ items, including road markings and vehicles, which a GAN by itself cannot currently render with adequate temporal consistency and integrity for a driving simulator. The cGAN-synthesized image is then blended with the partial physics-based rendering, as sketched below.
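
As a simplified illustration of that final step, the sketch below composites the partially rendered foreground over the GAN background using a binary mask derived from the semantic layout. The paper itself uses GP-GAN for blending, which additionally harmonizes colours across the seam; the label IDs and the naive np.where composite here are stand-ins only.

```python
# Naive compositing sketch: paste the partially rendered foreground (vehicles,
# lane markings) over the GAN-generated background using a binary mask derived
# from the semantic layout. This stands in for the GP-GAN blending in the paper.
import numpy as np

FOREGROUND_IDS = {7, 26}  # hypothetical label IDs for "road marking" and "vehicle"

def composite(gan_background, partial_render, semantic_layout):
    """gan_background, partial_render: HxWx3 uint8 images; semantic_layout: HxW label map."""
    mask = np.isin(semantic_layout, list(FOREGROUND_IDS))[..., None]  # HxWx1 bool
    return np.where(mask, partial_render, gan_background).astype(np.uint8)
```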

Tests

To test the system, the researchers used SPADE, trained on Cityscapes, to transform the scene’s semantic layout into photorealistic output. The CGI source comes from the open source driving simulator CARLA, which uses Unreal Engine 4 (UE4).

Output of the open source driving simulator CARLA. Source: https://arxiv.org/pdf/1711.03938.pdf

UE4’s shading and lighting engine provided the semantic layout and the partially rendered images, outputting only the vehicles and lane markings. Blending was achieved with a GP-GAN instance trained on the Transient Attributes Database, and all experiments were run on an NVIDIA RTX 2080 with 8GB of GDDR6 VRAM.
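
For readers who want to reproduce the input side of this setup, CARLA’s Python API exposes a semantic segmentation camera directly. The minimal sketch below shows how such layouts can be captured from a moving ego vehicle; the host, port, resolution and camera mount are illustrative values, and frame naming varies between CARLA versions.

```python
# Minimal sketch of grabbing semantic-segmentation layouts from CARLA's Python API.
# Host/port, resolution, capture duration and camera mount are illustrative values.
import time
import carla

client = carla.Client('localhost', 2000)
client.set_timeout(10.0)
world = client.get_world()
blueprints = world.get_blueprint_library()

# Spawn an ego vehicle at the first available spawn point and let it drive itself.
vehicle_bp = blueprints.filter('vehicle.*')[0]
vehicle = world.spawn_actor(vehicle_bp, world.get_map().get_spawn_points()[0])
vehicle.set_autopilot(True)

# Attach a semantic segmentation camera at roughly windshield height.
cam_bp = blueprints.find('sensor.camera.semantic_segmentation')
cam_bp.set_attribute('image_size_x', '1024')
cam_bp.set_attribute('image_size_y', '512')
camera = world.spawn_actor(cam_bp, carla.Transform(carla.Location(x=1.5, z=2.4)),
                           attach_to=vehicle)

# Save frames with the CityScapes colour palette; these layouts would feed SPADE.
camera.listen(lambda image: image.save_to_disk(
    'layouts/%06d.png' % image.frame, carla.ColorConverter.CityScapesPalette))

time.sleep(10)  # capture for a few seconds, then clean up
camera.stop()
camera.destroy()
vehicle.destroy()
```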

The researchers tested for semantic retention – the ability of the output image to conform to the initial semantic segmentation mask intended as a template for the scene.

In the test images above, we can see that the render-only image (bottom left) fails to produce plausible shadows. The researchers note that here (yellow circle) shadows cast by trees onto the sidewalk were incorrectly classified as “street” content by DeepLabV3 (the semantic segmentation framework used for these experiments).
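
One plausible way to score semantic retention, consistent with the DeepLabV3 setup mentioned above, is to re-segment the generated frame with a pretrained network and compare the result against the input layout via mean IoU. The sketch below uses torchvision’s off-the-shelf DeepLabV3 weights, so the class count and label mapping will not match the paper’s exact protocol.

```python
# Sketch of a semantic-retention check: re-segment the generated image with a
# pretrained DeepLabV3 and compare against the input layout via mean IoU.
# The paper's exact class mapping and protocol may differ.
import torch
from torchvision.models.segmentation import (deeplabv3_resnet101,
                                              DeepLabV3_ResNet101_Weights)

model = deeplabv3_resnet101(weights=DeepLabV3_ResNet101_Weights.DEFAULT).eval()

@torch.no_grad()
def mean_iou(generated_image, input_layout, num_classes=21):
    """generated_image: normalised 1x3xHxW tensor; input_layout: HxW label map."""
    pred = model(generated_image)['out'].argmax(dim=1).squeeze(0)
    ious = []
    for c in range(num_classes):
        pred_c, gt_c = pred == c, input_layout == c
        union = (pred_c | gt_c).sum().item()
        if union:  # skip classes absent from both maps
            ious.append((pred_c & gt_c).sum().item() / union)
    return sum(ious) / max(len(ious), 1)
```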

In the middle column, we see that the vehicles created by the cGAN lack the consistent definition needed for use in a driving simulator (red circle). In the right-hand column, the blended image conforms to the original semantic definition while retaining the essential CGI-based elements.

To assess realism, the researchers used Fréchet Inception Distance (FID) as a performance metric, because it can operate on paired or unpaired data.

Three datasets were used as ground truth: Cityscapes, KITTI and ADE20K.

The output images were compared to each other and to the physics-based (i.e., CGI) pipeline using FID scores, while also assessing semantic retention.
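
For reference, the sketch below shows how an FID comparison of this kind can be set up with the torchmetrics implementation; the random tensors here merely stand in for batches of ground-truth frames (e.g. from KITTI) and frames produced by the hybrid pipeline.

```python
# Sketch of an FID comparison with torchmetrics; images are expected as uint8
# NCHW batches by default. Dataset loading is omitted.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# real_batch: frames from the ground-truth dataset (e.g. KITTI);
# fake_batch: frames produced by the hybrid pipeline. Both uint8, shape (N, 3, H, W).
real_batch = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake_batch = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_batch, real=True)
fid.update(fake_batch, real=False)
print(f'FID: {fid.compute().item():.2f}')  # lower is better
```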

In the semantic retention results above, higher scores are better, with the pyramid-based cGAN approach (one of several pipelines tested by the researchers) performing best.

The results shown directly above refer to FID scores, with HGNG scoring best when evaluated against the KITTI dataset.

The render-only method (referred to as [23]) refers to the output of CARLA, a CGI stream that is not expected to be photorealistic.

Qualitative results from the traditional rendering engine (‘c’ in the image directly above) show unrealistic distant background detail such as trees and vegetation, while requiring detailed models, just-in-time loading, and other processor-intensive techniques. In the middle (b), we see that the cGAN does not provide sufficient definition for the essential elements, the cars and road markings. In the proposed blended output (a), vehicle and road definition is good, while the environment is diverse and photorealistic.

The paper concludes by suggesting that the temporal consistency of the GAN-generated portion of the rendering pipeline could be improved by using larger urban datasets, and that future work in this direction could offer a viable alternative to costly neural transformations of CGI streams, while providing greater realism and variety.

* My conversion of authors’ inline citations to hyperlinks.

First published on July 23, 2022.
