Global pose normalization. This stage computes the difference between the body shapes and positions of the source and target subjects within a given frame, and transforms the source pose stick figure so that it matches the target's body shape and position.

Given frame y from the original target video, we use pose detector P to obtain a corresponding pose stick figure x = P(y). Finally, we design a system to learn the mapping from the normalized pose stick figures to images of the target person with adversarial training. We now describe every component of our system in detail.

Over the past few years several frameworks, many but not all of them based on GANs, have been developed to solve such mappings, including pix2pix (Isola et al., 2017).

Adding the temporal smoothing setup does not seem to decrease the reconstructed pose distances significantly; however, including the face GAN adds substantial improvements overall, especially for the face and hand keypoints.

Credit: arXiv:1808.07371 [cs.GR]. A small team of researchers at UC Berkeley has used neural-networking software to create a program that copies the dance …

AI-synthesized fake faces can be weaponized to cause negative personal and social impact.
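In addition, we release a first-of-its-kind open-source dataset of videos that can be legally used for training and motion transfer. We predict two consecutive frames for temporally coherent video results and introduce a separate pipeline for realistic face synthesis. This is in contrast to approaches over the last two decades which employ nearest neighbor search (Bregler et al., 1997; Efros et al., 2003). We found pre-smoothing pose keypoints to be immensely helpful in reducing jittering in our outputs.

train_facetexts128 (contains face 128x128 bounding box coordinates in .txt files)

Inferring images of the target person from the normalized pose stick figures. This stage uses a generative adversarial model that is trained to map normalized pose stick figures to images of the target person.

Given a frame y from the target video, a pretrained pose detector P produces the corresponding pose stick figure x = P(y). During training, the corresponding (x, y) pairs are used to learn a mapping G from the pose stick figure x to a synthesized image of the target person, G(x). Through adversarial training with the discriminator D and a perceptual reconstruction loss computed with a pretrained VGGNet, the generator G is optimized so that its output is close to the real image y. The discriminator tries to distinguish "real" image pairs such as (x, y) from "fake" image pairs such as (x, G(x)).

Transfer works much like training: the pose detector P extracts a pose stick figure x′ from a given frame y′ of the source video. Because the body size and position in x′ differ from those of the person in the target video, a global pose normalization transforms x′ to be more consistent with the target; the result is denoted x. Feeding x to the trained model G produces the target-person image G(x), which corresponds to frame y′ of the source video.

Pose detection uses a pretrained model P (for example the open-source project OpenPose) to obtain accurate estimates of the (x, y) coordinates of the body joints. Connecting the joints yields the pose stick figure, as shown in Figure 3. During training, the pose stick figures are the input to the generator G. During transfer, P extracts the estimate x′ from the source subject, which is then normalized to match the target person. References on pose estimation are listed at the end of the article.

First, find the minimum and maximum ankle keypoint positions in the source video and in the target video (the position closest to the camera is the maximum, the one farthest away the minimum). The method is simple: the ankle keypoint nearest the bottom of the image is the maximum, and the other is the minimum.

The mapping uses the minimum and maximum ankle keypoint positions of the target video, the corresponding two positions of the source video, the average ankle position in the source video, and an offset term. The paper does not explain the offset; in the view of the author of these notes, it is the offset of the pose position in the current source frame relative to the first frame.

By modifying the adversarial training setup of pix2pixHD, the method produces temporally coherent video frames and synthesizes realistic face images.

In the original conditional GAN setup, the generator G plays against multi-scale discriminators D = (D1, D2, D3). The original pix2pixHD objective combines the adversarial loss with two further terms: the discriminator feature-matching loss proposed in pix2pixHD, and a perceptual reconstruction loss obtained by comparing features at different layers of a pretrained VGGNet.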
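For readers who want the objective written out, the block below is a hedged reconstruction in the usual pix2pixHD notation. The symbol names (L_GAN, L_FM, L_P) and the weights λ_FM and λ_P are assumptions chosen for illustration; only the three kinds of terms and the discriminators D1, D2, D3 come from the text above.

```latex
% Assumed notation: \lambda_{FM}, \lambda_{P} are illustrative weighting factors.
\min_{G}\;\Biggl(\Bigl(\max_{D_1,D_2,D_3}\;\sum_{k=1}^{3}\mathcal{L}_{\mathrm{GAN}}(G,D_k)\Bigr)
  \;+\;\lambda_{FM}\sum_{k=1}^{3}\mathcal{L}_{\mathrm{FM}}(G,D_k)
  \;+\;\lambda_{P}\,\mathcal{L}_{P}\bigl(G(x),\,y\bigr)\Biggr)

% Standard conditional-GAN adversarial term for one discriminator D:
\mathcal{L}_{\mathrm{GAN}}(G,D)
  \;=\;\mathbb{E}_{(x,y)}\bigl[\log D(x,y)\bigr]
  \;+\;\mathbb{E}_{x}\bigl[\log\bigl(1-D(x,G(x))\bigr)\bigr]
```

Read this way, the discriminators maximize the adversarial term on real pairs (x, y) versus fake pairs (x, G(x)), while the generator minimizes the whole sum.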
To generate video sequences, the paper modifies the single-image generation setup of the original pix2pixHD so that it produces temporally coherent adjacent frames (Figure 4). The model predicts two consecutive frames. The first output G(x_{t-1}) is conditioned on the corresponding pose stick figure x_{t-1} and a blank image z (all zeros, used as a placeholder because there is no frame at t-2). The second output G(x_t) is conditioned on x_t and G(x_{t-1}). Accordingly, the discriminator must now judge both the realism and the temporal coherence of the real sequence (x_{t-1}, x_t, y_{t-1}, y_t) against the fake sequence (x_{t-1}, x_t, G(x_{t-1}), G(x_t)). The new objective is obtained by adding a temporal smoothing loss to the original pix2pixHD objective.

After the generator G has produced the full image, a small region centered on the face is cropped out and fed, together with the corresponding region of the pose stick figure x_F, to a second generator, which outputs a face residual r. The final output adds the residual to the original values of that region, i.e. r + G(x)_F. As in the original pix2pix objective, the discriminator tries to distinguish "real" face pairs from "fake" face pairs.

Because no ground-truth image exists for a generated transfer frame, the quality of individual images is assessed with Structural Similarity (SSIM) and Learned Perceptual Image Patch Similarity (LPIPS). Temporal coherence of the output videos is assessed qualitatively. References on SSIM and LPIPS are listed at the end of the article.

Table 1 reports results computed after cropping the generated target images to the bounding box of the normalized pose stick figure. T.S. denotes the generator with the temporal smoothing setup. T.S. + Face is the full model, with both temporal smoothing and face generation.

Table 3 reports the pose distance d. If the body parts are synthesized correctly, the pose detected in the synthesized image should be very close to the pose stick figure that served as the conditioning input. For a pose distance metric between two poses p and p′, each with n joints p_1, ..., p_n and p′_1, ..., p′_n, we sum the L2 distances between the corresponding joints p_k = (x_k, y_k) and p′_k = (x′_k, y′_k), normalized by the number of keypoints.

Table 4 reports, averaged per image, the number of joints that the pose detector finds in the source motion image but fails to detect in the generated image.

We collect 163 dance videos from Youtube, which …

Although our setup can produce plausible results in many cases, occasionally our results suffer from several issues.

Followed by a "local" stage model with 1024x512 resolution.

Over the last two decades there has been extensive study dedicated towards motion transfer or retargeting. We show that our face GAN produces convincing facial features and improves upon the results of the full image GAN in our ablation studies detailed in Section 7.1.
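The pose distance described above (mean L2 distance over corresponding joints) can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code; the function name `pose_distance` and the list-of-tuples pose format are assumptions.

```python
import math


def pose_distance(p, p_prime):
    """Mean L2 distance between corresponding joints of two poses.

    Each pose is a list of (x, y) tuples with the same joint ordering.
    (The representation is an assumption made for this sketch.)
    """
    if len(p) != len(p_prime) or not p:
        raise ValueError("poses must be non-empty and have the same number of joints")
    total = sum(
        math.hypot(x - xp, y - yp)
        for (x, y), (xp, yp) in zip(p, p_prime)
    )
    return total / len(p)


if __name__ == "__main__":
    conditioned = [(0.0, 0.0), (3.0, 4.0)]
    reconstructed = [(0.0, 0.0), (0.0, 0.0)]
    print(pose_distance(conditioned, reconstructed))
```

A lower value means the pose re-detected on the synthesized frame is closer to the pose it was conditioned on.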
Everybody Dance Now. Alexei A. Efros, arXiv:1808.07371.

This paper presents a simple method for "do as I do" motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves.

Our main contributions are a learning-based pipeline for human motion transfer between videos, and the quality of our results, which demonstrate complex motion transfer in realistic and detailed videos.

To transfer motion between two video subjects in a frame-by-frame manner, we must learn a mapping between images of the two individuals. Our goal is therefore to discover an image-to-image translation (Isola et al., 2017) between them.

We add a specialized GAN setup designed to add more detail and realism to the face region as shown in Figure 5.

Both SSIM and LPIPS scores are similar for all model variations.
We run the pose metric on particular sets of keypoints (body, face, hands) to determine the regions which incur the most error. If all body parts are synthesized correctly, then the reconstructed pose should be close to the input pose on which the output was conditioned.

In many computer graphics applications, creating realistically rendered and controllable character animation is a crucial task.

Since the retargeting problem was first proposed between animated characters (Gleicher, 1998), solutions have included the introduction of inverse kinematic solvers to the problem (Lee and Shin, 1999) and retargeting between significantly different skeletons (Hecker et al., 2008).

For the image translation stage of our pipeline, we adapt the architectures proposed by Wang et al.

The technique is certainly useful: for example, it could directly drive the gestures of a virtual host and make live streams look more natural.

We add two components to improve the quality of our results. To encourage the temporal smoothness of our generated videos, we condition the prediction at each frame on that of the previous time step.
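Conditioning each frame on the previous prediction amounts to a simple feedback loop at inference time. The sketch below assumes a hypothetical callable `generator(pose, previous_frame)` and a caller-supplied blank frame for the first step; neither name comes from the released code.

```python
def synthesize_sequence(generator, poses, blank_frame):
    """Generate one frame per pose, feeding each output back as conditioning.

    generator   -- hypothetical callable: (pose, previous_frame) -> frame
    poses       -- iterable of pose stick figures, in temporal order
    blank_frame -- placeholder used as the "previous frame" for the first pose
    """
    outputs = []
    previous = blank_frame
    for pose in poses:
        frame = generator(pose, previous)  # condition on the last prediction
        outputs.append(frame)
        previous = frame
    return outputs


if __name__ == "__main__":
    # Toy stand-in for a trained network: "frames" are just numbers here.
    toy_generator = lambda pose, prev: pose + prev
    print(synthesize_sequence(toy_generator, [1, 2, 3], 0))
```

Because every output depends on the one before it, errors can accumulate over long sequences, which is one reason the discriminator is also asked to judge temporal coherence.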
Python 3.6.

CUDA 9.0.176.

Image quality is measured with Structural Similarity (SSIM) (Wang et al., 2004) and Learned Perceptual Image Patch Similarity (LPIPS) (Zhang et al., 2018). Again, scores are generally favorable for all ablations, although the full model with both the temporal smoothing and face GAN setups obtains the best scores, with the biggest discrepancy in the face region.

The demo video, however, can only be reached from behind a VPN. Roughly speaking, the method transfers motion between two people, with the effect of carrying the motion information over to a target person, …

To ensure the quality of the frames, we filmed our target subject for around 20 minutes of real-time footage at 120 frames per second, which is possible with some modern cell phone cameras.
To calculate the scale, we cluster the heights around the minimum ankle position and the maximum ankle position and find the maximum height for each cluster for each video.

For example, Video Rewrite creates videos of a subject saying a phrase they did not originally utter by finding frames where the mouth position matches the desired speech (Bregler et al., 1997). Similarly, our approach is designed for video subjects which can be found online or captured in person, although we learn to synthesize novel motions rather than manipulating existing frames.

MoCoGAN (Tulyakov et al., 2018) employs unsupervised adversarial training to learn this separation and generates videos of subjects performing novel motions or facial expressions.
Even though we try to inject temporal coherence through our setup and pre-smoothing keypoints, our results often still suffer from jittering.

Using pose detections as an intermediate representation between source and target, we learn a mapping from pose images to a target subject's appearance. With this aligned data we are able to learn an image-to-image translation model between pose stick figures and images of our target person in a supervised way. D attempts to distinguish between "real" image pairs (i.e. (pose stick figure x, ground truth image y)) and "fake" image pairs (i.e. (pose stick figure x, model output G(x))).

1. Required environment:

Qualitatively, the temporal smoothing setup helps with smooth motion, color consistency across frames, and also in individual frame synthesis. Scores on full images are even more similar between our ablations, as all ablations have no difficulty generating the static background.

[1] Caroline Chan, Shiry Ginosar, Tinghui Zhou, and Alexei A. Efros. Everybody Dance Now. arXiv preprint arXiv:1808.07371 (2018).
Implementation accompanying paper: "Everybody dance now."

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV). Cite as: arXiv:1808.07371 [cs.GR] (or arXiv:1808.07371v1 [cs.GR] for this version).

Therefore, our model is trained to produce personalized videos of a specific target subject. We collect source and target videos in slightly different manners.

We characterize our transformation in terms of scale and translation in the y direction, which is calculated for each frame.

In Table 4 we count the number of missed detections (i.e. joints detected on ground truth frames but not on outputs) on various regions and the whole pose, as the pose metric does not accurately depict missed detections.
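Counting missed detections as defined above is straightforward once undetected joints are marked explicitly. The sketch below assumes each pose is a list in which an undetected joint is `None`; that encoding and the function names are illustrative choices, not the format used by any particular pose detector.

```python
def count_missed(reference_pose, output_pose):
    """Joints detected in the reference frame but not in the generated frame."""
    return sum(
        1
        for ref_joint, out_joint in zip(reference_pose, output_pose)
        if ref_joint is not None and out_joint is None
    )


def mean_missed_per_frame(reference_frames, output_frames):
    """Average number of missed joints per frame over a whole sequence."""
    counts = [
        count_missed(ref, out)
        for ref, out in zip(reference_frames, output_frames)
    ]
    return sum(counts) / len(counts) if counts else 0.0


if __name__ == "__main__":
    ref = [[(1, 2), (3, 4)], [(1, 2), (3, 4)]]
    out = [[None, None], [(1, 2), (3, 4)]]
    print(mean_missed_per_frame(ref, out))
```

Restricting the joint lists to one region (body, face or hands) gives the per-region counts; joints missing in the reference itself are deliberately not counted.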
Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros (UC Berkeley).

We adapt architectures from various models for different stages of our pipeline.

Another approach uses optical flow as a descriptor to match different subjects performing similar actions, allowing "Do as I do" and "Do as I say" retargeting (Efros et al., 2003).

With our framework, we create a variety of videos, enabling untrained amateurs to spin and twirl like ballerinas, perform martial arts kicks or dance as vibrantly as pop stars.

The final output is the addition of the residual with the original face region, r + G(x)_F, and this change is reflected in the relevant region of the full image.

Other work disrupts deep neural network (DNN)-based face detection and facial landmark extraction methods with specially designed imperceptible adversarial perturbations, in order to reduce the quality of the detected faces.
Full paper - https://arxiv.org/pdf/1808.07371.pdf
Website - https://carolineec.github.io/everybody_dance_now/

However, we do not have corresponding pairs of images of the two subjects performing the same motions to supervise learning this translation directly.

To extract pose keypoints for the body, face, and hands we use architectures provided by a state-of-the-art pose detector, OpenPose (Cao et al., 2017; Simon et al., 2017; Wei et al., 2016).

In the original conditional GAN setup, the generator network G is engaged in a minimax game against multi-scale discriminators D = (D1, D2, D3). The perceptual reconstruction loss (Johnson et al., 2016) compares pretrained VGGNet (Simonyan and Zisserman, 2014) features at different layers of the network.

In addition, we run the pose detector P on the outputs of each system, and compare these reconstructed keypoints to the pose detections of the original input video.

Additionally, the 2D coordinates and missing detections constrict the number of ways we are able to retarget motion between subjects, which often work in 3D with perfect joint locations and temporally coherent motions.

Note the original_img is not necessary at test time and is provided only for reference.
Ubuntu 18.04 (16.04 should also work).

The first work I came across, which is also probably the best known and most representative, is UC Berkeley's Everybody Dance Now, whose results are arguably among the best to date. … This task is called human video generation.

We approach this problem as video-to-video translation using pose as an intermediate representation. We therefore design our intermediate representation to be pose stick figures such as in Figure 2. During training, we use corresponding (x, y) pairs to learn a mapping G which synthesizes images of the target person given pose stick figure x. Although our method is quite simple, it produces surprisingly compelling results (see video).

So far we have obtained the pose through P, which we call x′ here. However, differences in where the person stands in the frame and in body height are bound to affect the quality of the generated result; here the authors use two …

In order for the source pose to better align with the filming setup of the target, we apply a global pose normalization Norm to transform the source's original pose x′ to be more consistent with the poses in the target video x. Once the minimum and maximum ankle positions of each subject are found, we carry out a linear mapping between the minimum and maximum ankle positions of each video.

Since normalized poses for transfer are often similar to those seen in training, we attribute this observation to the underlying difference between how our target and transfer subjects move given their unique body structure.
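The linear mapping between ankle ranges can be illustrated as follows. This is a sketch under stated assumptions rather than the paper's exact formula: the parameter names, the interpolation of the scale between a "far" and a "close" ratio, and the choice to scale joints about the ankle point are all illustrative.

```python
def _ratio(ankle_y, s_min, s_max):
    """Relative position of the source ankle between its min and max (0..1)."""
    if s_max == s_min:
        raise ValueError("source ankle range is degenerate")
    return (ankle_y - s_min) / (s_max - s_min)


def map_ankle_y(ankle_y, s_min, s_max, t_min, t_max):
    """Linearly map a source ankle y-position into the target's ankle range."""
    return t_min + _ratio(ankle_y, s_min, s_max) * (t_max - t_min)


def interpolate_scale(ankle_y, s_min, s_max, scale_far, scale_close):
    """Interpolate the body scale between the far and close positions."""
    return scale_far + _ratio(ankle_y, s_min, s_max) * (scale_close - scale_far)


def normalize_pose(joints, ankle, s_min, s_max, t_min, t_max, scale_far, scale_close):
    """Scale joints about the ankle point and move the ankle to its mapped y."""
    ax, ay = ankle
    target_y = map_ankle_y(ay, s_min, s_max, t_min, t_max)
    scale = interpolate_scale(ay, s_min, s_max, scale_far, scale_close)
    return [(ax + scale * (x - ax), target_y + scale * (y - ay)) for x, y in joints]


if __name__ == "__main__":
    joints = [(10.0, 150.0), (10.0, 50.0)]  # ankle and head of a toy skeleton
    print(normalize_pose(joints, (10.0, 150.0), 100.0, 200.0, 300.0, 500.0, 1.0, 2.0))
```

In this toy call the source ankle sits halfway through its range, so it is placed halfway through the target's range and the skeleton is scaled by the midpoint of the two ratios. The scale ratios themselves would come from the clustered heights in each video.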
We also provide code for creating both training and testing datasets (including global pose normalization) in the data_prep folder.

Through adversarial training with discriminator D and a perceptual reconstruction loss dist using a pretrained VGGNet (Johnson et al., 2016; Simonyan and Zisserman, 2014), we optimize the generator G to output images that resemble the ground truth image y.

Overall our model is able to create reasonable and arbitrarily long videos of a target person dancing given body movements to follow through an input video of another subject dancing.