Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model
[2308.13164] Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model
Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, and Jiayi Ma
Electronic Information School, Wuhan University, Wuhan 430072, China
{yixunpeng, xu_han}@whu.edu.cn, {zhpersonalbox, linfeng0419, jyma2010}@gmail.com
Corresponding author
Abstract
In this paper, we rethink the low-light image enhancement task and propose a physically explainable and generative diffusion model for low-light image enhancement, termed Diff-Retinex. We aim to integrate the advantages of the physical model and the generative network. Furthermore, we hope to supplement and even deduce the information missing in the low-light image through the generative network. Therefore, Diff-Retinex formulates the low-light image enhancement problem into Retinex decomposition and conditional image generation. In the Retinex decomposition, we integrate the superiority of attention in the Transformer and meticulously design a Retinex Transformer decomposition network (TDN) to decompose the image into illumination and reflectance maps. Then, we design multi-path generative diffusion networks to reconstruct the normal-light Retinex probability distribution and solve the various degradations in these components respectively, including dark illumination, noise, color deviation, loss of scene contents, etc. Owing to the generative diffusion model, Diff-Retinex puts the restoration of low-light subtle detail into practice. Extensive experiments conducted on real-world low-light datasets qualitatively and quantitatively demonstrate the effectiveness, superiority, and generalization of the proposed method.
1 These authors contributed equally to this work.
1 Introduction
Images taken in low-light scenes are usually affected by a variety of degradations, such as indefinite noise, low contrast, and variable color deviation. Among these degradations, the loss of scene structures is the most thorny. As shown in Fig. 1, the loss of scene structures not only degrades the visual effect but also reduces the amount of information. Image enhancement is an effective approach to reduce the interference of these degradations on human perception and subsequent vision tasks, and finally present high-quality images.
Figure 1: Example of URetinex [40] and Diff-Retinex for LLIE. Diff-Retinex can repair some missing scene contents by rethinking LLIE through the generative diffusion model.
To handle these degradations, many low-light image enhancement (LLIE) methods have been proposed [20, 26]. Besides, a series of studies on contrast enhancement, noise removal, and texture preservation have been carried out. Mainstream LLIE methods can be roughly divided into traditional [35, 3] and learning-based methods [22, 15, 25, 23, 18]. Traditional algorithms are usually based on image priors or simple physical models. For instance, gray transform [35, 14] and histogram equalization [5, 34] adjust the intensity distribution by linear or nonlinear means. The Retinex model [12, 16, 24] decomposes the image into illumination and reflectance images, and the problem is solved with traditional optimization methods. However, these methods are subject to manual design and optimization-driven efficiency. They usually suffer from poor generalization and robustness, limiting their application scope.
To address these drawbacks, deep learning is utilized to construct the complex mapping from low-light to normal-light images [22, 38]. Some methods entirely regard low-light image enhancement as a restoration task through an overall fitting, lacking the theoretical support and interpretability of physical models. Compared with physical model-based methods, they usually exhibit less targeted enhancement performance, manifested as uneven illumination, non-robustness to noise, etc. The main cause is the shortage of specific definitions of some degradations and of targeted processing for them. Physical model-based methods decompose the image into components with physical significance. Then, specific processing is conducted on the components for more targeted enhancement.
However, existing methods hardly escape the essence of fitting. More concretely, existing methods can better render a distorted scene through denoising, but they cannot repair missing scene content. Taking Fig. 1 as an example, the state-of-the-art method URetinex [40] cannot restore the weak and missing details and even aggravates the information distortion to a certain extent. To solve this drawback, and considering that LLIE is a process of recovering a normal-light image with the guidance of the low-light image, we rethink LLIE with a generative diffusion model. We aim to recover, or even reason out, the weak and lost information in the original low-light image. Thus, LLIE is regarded not only as a restoration fitting function but also as a conditional image generation task. As for the generative model, generative adversarial networks (GANs) [36, 46] train a generator and a discriminator in an adversarial mechanism. However, they suffer from training instability, resulting in problems such as mode collapse, non-convergence, and exploding or vanishing gradients. Moreover, GAN-based LLIE methods also directly generate the normal-light image through an overall fitting, lacking physical interpretability as mentioned before.
To this end, we propose a physically explainable and generative model for low-light image enhancement, termed Diff-Retinex. We aim to integrate the advantages of the physical model and the generative network. Thus, Diff-Retinex formulates the low-light image enhancement problem as Retinex decomposition and conditional image generation. In the Retinex decomposition, we integrate the characteristics of the Transformer [21, 41] and meticulously design a Retinex Transformer decomposition network (TDN) to improve the decomposition applicability. TDN decomposes the image into illumination and reflectance maps. Then, we design generative diffusion-based networks to solve the various degradations in these components respectively, including dark illumination, noise, color deviation, loss of scene contents, etc. The main contributions are summarized as:
We rethink low-light image enhancement from the perspective of conditional image generation. Rather than being limited to enhancing the original low-quality information, we propose a generative Retinex framework to further compensate for the content loss and color deviation caused by low light.
Considering the issues of decomposition in Retinex models, we propose a novel Transformer decomposition network. It takes full advantage of attention and layer dependence to efficiently decompose images, even high-resolution ones.
To the best of our knowledge, this is the first study that combines the diffusion model with the Retinex model for low-light image enhancement. The diffusion model is applied to guide the multi-path adjustments of the illumination and reflectance maps for better performance.
Figure 2: Overall framework of Diff-Retinex. It contains three detachable modules, i.e., Transformer Decomposition Network (TDN), Reflectance Diffusion Adjustment (RDA), and Illumination Diffusion Adjustment (IDA).
2 Related Work
2.1 Retinex-based LLIE Methods
The theory of the retinal cortex (Retinex) is based on the model of color invariance and the subjective perception of color by the human visual system (HVS) [13]. It decomposes the image into illumination and reflectance maps. It has been widely used in low-light image enhancement and has been proven to be effective and reliable.
Traditional Approaches. In some methods, the decomposition into illumination and reflectance patterns is realized by a Gaussian filter or a group of filter banks, such as SSR [10] and MSR [9]. LIME [4] estimates the lighting map by initializing it with the maximum values of the three channels and applying a structural prior refinement to form the final illumination map. JED [32] enhances the image and suppresses noise by combining sequential decomposition and gamma transform. Traditional methods mainly show poor generalization and robustness, limiting their application.
Deep Learning-based Approaches. Retinex-Net [39] combines the paradigm of Retinex decomposition with deep learning. It applies a phased decomposition-and-adjustment structure and uses BM3D [2] for image denoising. Similarly, KinD [44] and KinD++ [43] adopt the decomposition-and-adjustment paradigm and use convolutional neural networks (CNNs) to learn the mapping in both decomposition and adjustment. Robust Retinex [45] decomposes the image into three components, i.e., illumination, reflectance, and noise. It then estimates the noise and restores the illumination by iterating under the guidance of the loss, so as to achieve denoising and enhancement. Although these methods show excellent performance, CNN-based decomposition cannot make full use of global information due to the limitations of convolution. Moreover, they also suffer from some thorny problems, such as the difficulty of designing loss functions and the challenge of completing missing scene contents.
2.2 Generative LLIE Methods
With the development of the variational auto-encoder (VAE) [11], GAN [36, 46], and other generative models, image generation can achieve excellent results. From a new perspective, a generative model can take the low-light image as the condition and generate the corresponding normal-light image, objectively realizing the goal of low-light image enhancement. EnlightenGAN [8] designs a single generator to directly map a low-light image to a normal-light image, combined with global and local discriminators. CIGAN [28] uses a cycle-interactive GAN to complete the cycle generation and information transmission of light between normal-light and low-light images. These methods have achieved promising results. However, the training process of GANs is difficult and the convergence of the loss function is unstable. Recently, diffusion models [7, 29, 31] have emerged as a powerful family of generative models with record-breaking performance in many domains, including image generation, inpainting, etc. They overcome some shortcomings of GANs and break the long-term dominance of GANs in image generation. In this paper, we explore a novel approach that combines the Retinex model with the diffusion model for the first time.
3 Methodology
The overall framework of Diff-Retinex is summarized in Fig. 2. A general Retinex-based enhancement framework should be able to decompose images flexibly and remove various degradations adaptively. Therefore, the Transformer decomposition network first decomposes the image into illumination and reflectance maps according to the Retinex theory. Then, the illumination and reflectance maps are adjusted through the multi-path diffusion generation adjustment network (including reflectance diffusion adjustment and illumination diffusion adjustment). The enhanced result is the product of the adjusted components.
3.1 Transformer Decomposition Network
The classical Retinex theory assumes that an image can be decomposed into reflectance and illumination maps as:

$$I = R \cdot L, \tag{1}$$

where $I$ is the input image, and $R$ and $L$ denote the reflectance and illumination maps, respectively. The decomposition is essentially an ill-posed problem. The reflectance map reflects the scene content, so it tends to be constant under different lighting conditions. The illumination map is related to the lighting condition and should present local smoothness.
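As a toy numerical illustration of Eq. (1) (not the paper's code; the shapes and values here are invented for the example), the multiplicative relation and its ill-posedness can be sketched in NumPy:

```python
import numpy as np

# Hypothetical toy example of the Retinex relation I = R * L.
# R (reflectance) is per-pixel scene color in [0, 1]; L (illumination)
# is a dim, spatially smooth brightness map broadcast over the channels.
rng = np.random.default_rng(0)
R = rng.uniform(0.2, 1.0, size=(4, 4, 3))   # reflectance map
L = np.full((4, 4, 1), 0.1)                  # low, constant illumination

I = R * L                                    # a simulated low-light observation

# The decomposition is ill-posed: given only I, many (R, L) pairs fit.
# With L known (or estimated), reflectance is recovered by division.
R_rec = I / np.clip(L, 1e-6, None)
```

Dividing by a different smooth illumination map would yield a different but equally valid reflectance, which is why the priors in Eq. (2) below are needed.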
Specially, some degraded images may also carry complex noise with varying degrees of pollution. In this condition, we follow the decomposition property that the illumination map is locally smooth; thus, the noise is decomposed into the reflectance map. The optimization objective of the Retinex decomposition in our method is generally represented by Eq. (2):

$$\min_{R,L}\ \tau(R\cdot L)+\alpha\,\phi(R)+\beta\,\psi(L), \tag{2}$$

where $\tau(R\cdot L)$ ensures that the image can be reconstructed from the decomposed illumination and reflectance maps, $\phi(R)$ constrains the consistency of the reflectance map, and $\psi(L)$ makes the illumination map simple in structure and piece-wise smooth. $\alpha$ and $\beta$ are hyper-parameters. The detailed designs of the loss functions are as follows.
3.1.1 Loss Functions
Based on Eq. (2), we design the following loss functions, including the reconstruction loss, reflectance consistency loss, and illumination smoothness loss, to optimize the Transformer decomposition network. Considering the reflectance consistency under different illumination conditions, we use paired low-light and normal-light images for training, denoted as $I_l$ and $I_n$, respectively. The reflectance maps decomposed from them are denoted as $R_l$ and $R_n$, and the corresponding illumination maps are represented by $L_l$ and $L_n$.

Reconstruction Loss $\tau(R\cdot L)$. It guarantees that the decomposed $R$ and $L$ can reconstruct the original image. Thus, this loss is defined in terms of image fidelity:

$$L_{rec}=\|R_{n}\cdot L_{n}-I_{n}\|_{1}+\alpha_{rec}\|R_{l}\cdot L_{l}-I_{l}\|_{1}+\xi(L_{crs}), \tag{3}$$

where $\alpha_{rec}$ is a hyper-parameter applied to adjust the contribution of different illuminations. $\xi(L_{crs})$ is a small auxiliary term for the cross multiplication of the illumination and reflectance maps of the low-light and normal-light images.
Reflectance Consistency Loss $\phi(R)$. Considering that the reflectance of objects is invariant under various lighting conditions, we constrain the consistency of the reflectance maps in different lighting conditions. Specifically, it can be described as:

$$L_{rc}=\|R_{n}-R_{l}\|_{1}. \tag{4}$$
Illumination Smoothness Loss $\psi(L)$. Considering that the illumination should be piece-wise smooth, we constrain it by:

$$L_{smooth}=\|W_{T}^{l}\cdot\nabla L_{l}\|+\|W_{T}^{n}\cdot\nabla L_{n}\|, \tag{5}$$

where $W_{T}^{l}$ and $W_{T}^{n}$ are weighting factors, which can be expressed in fractional or exponential form. To simplify the process, we set $W_{T}^{l}\leftarrow e^{-c\cdot\nabla I_{l}}$ and $W_{T}^{n}\leftarrow e^{-c\cdot\nabla I_{n}}$, where $c$ is the constraint factor and $\nabla$ denotes the derivative filter. This loss imposes a large penalty in regions where the image is smooth and relaxes the constraint in regions where the image illumination changes abruptly.
Ultimately, the overall decomposition loss is denoted as:

$$L=L_{rec}+\gamma_{rc}L_{rc}+\gamma_{sm}L_{smooth}, \tag{6}$$

where $\gamma_{rc}$ and $\gamma_{sm}$ are hyper-parameters.
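A minimal NumPy sketch of the combined objective in Eqs. (3)-(6), for single-channel toy inputs; the auxiliary cross term $\xi(L_{crs})$ and all network details are omitted, and the helper names are ours, not the paper's:

```python
import numpy as np

def grad(x):
    # forward-difference derivative filter along height and width
    gh = np.abs(np.diff(x, axis=0, append=x[-1:]))
    gw = np.abs(np.diff(x, axis=1, append=x[:, -1:]))
    return gh + gw

def decomposition_loss(R_l, L_l, I_l, R_n, L_n, I_n,
                       alpha_rec=0.3, gamma_rc=0.1, gamma_sm=0.1, c=10.0):
    # Reconstruction loss, Eq. (3), without the small auxiliary cross term
    l_rec = np.abs(R_n * L_n - I_n).mean() + alpha_rec * np.abs(R_l * L_l - I_l).mean()
    # Reflectance consistency loss, Eq. (4)
    l_rc = np.abs(R_n - R_l).mean()
    # Illumination smoothness loss, Eq. (5), with exponential weights exp(-c * grad I)
    l_sm = (np.exp(-c * grad(I_l)) * grad(L_l)).mean() + \
           (np.exp(-c * grad(I_n)) * grad(L_n)).mean()
    # Overall decomposition loss, Eq. (6)
    return l_rec + gamma_rc * l_rc + gamma_sm * l_sm
```

A perfect decomposition with identical reflectance maps and spatially constant illumination drives all three terms to zero; any inconsistency between $R_l$ and $R_n$ or a non-smooth $L$ raises the loss.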
3.1.2 Network Architecture
As shown in Fig. 2, the Transformer decomposition network (TDN) consists of two branches, i.e., the reflectance decomposition branch and the illumination decomposition branch.

Given an image $I\in\mathbb{R}^{H\times W\times 3}$ to be decomposed, TDN first obtains its embedding features $F_{init}\in\mathbb{R}^{H\times W\times C}$ through convolutional projection. The illumination decomposition branch is made up of several convolutional layers to reduce the amount of calculation on the premise of ensuring the decomposition effect. To preserve the intrinsic characteristics of the illumination and reflectance maps, and to improve the recovery performance and information retention in the reflectance map, the reflectance decomposition branch is composed of a multi-stage Transformer encoder and decoder. To be specific, the Transformer encoder and decoder are composed of an attention ($Atten$) module and a feed-forward network ($FFN$) module. In general, we denote the computation in the TDN block as:

$$\hat{F}_{i}=Atten(Norm(F_{i-1}))+F_{i-1}, \tag{7}$$

$$F_{i}=FFN(Norm(\hat{F}_{i}))+\hat{F}_{i}, \tag{8}$$

where $Norm$ denotes normalization and $F_{i-1}$ represents the input feature map of the current TDN block.
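The pre-norm residual structure of Eqs. (7)-(8) can be sketched as follows; this is a NumPy skeleton with the attention and FFN passed in as callables, and `layer_norm` is our simplified stand-in for the paper's $Norm$:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # normalize each spatial position over the channel (last) dimension
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

def tdn_block(f, atten, ffn):
    # Eq. (7): pre-norm attention with a residual connection
    f_hat = atten(layer_norm(f)) + f
    # Eq. (8): pre-norm feed-forward with a residual connection
    return ffn(layer_norm(f_hat)) + f_hat
```

The residual paths mean a block whose sub-modules output zero reduces to the identity, which keeps deep stacks of TDN blocks easy to optimize.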
The attention computing overhead of the vanilla Transformer is high: the time complexity is quadratic in the image size, so it is not suitable for high-resolution image decomposition. To solve this problem, we design a novel multi-head depth-wise convolution layer attention (MDLA) to compute attention in TDN, as shown in Fig. 3. On the premise of maintaining the decomposition performance, it greatly reduces the attention computation complexity.
Figure 3: Detailed network architecture of MDLA. The attention is calculated in the cross-channel direction to realize the efficient decomposition of high-resolution images.
In MDLA, for a feature $X\in\mathbb{R}^{h\times w\times c}$ obtained from Layer-Norm, we first aggregate the information across its channels with a $1\times 1$ convolution. Subsequently, $3\times 3$, $5\times 5$, and $7\times 7$ convolutions aggregate the spatial information. The outputs of the multiple convolutions are the query $Q=W_{pc}^{q}W_{dc}^{q}X$, key $K=W_{pc}^{k}W_{dc}^{k}X$, and value $V=W_{pc}^{v}W_{dc}^{v}X$. We reduce the feature dimension by $1\times 1$ convolution, then reshape the features and compute the attention along the layer (channel) direction. Specifically, it can be formulated as Eq. (9):

$$\hat{X}=softmax(Q_{R}K_{R}/d)\cdot V_{R}+X, \tag{9}$$

where $Q_{R},V_{R}\in\mathbb{R}^{h\times w\times c}$ and $K_{R}\in\mathbb{R}^{c\times h\times w}$ are $Q$, $V$, and $K$ after reshaping, and $d$ is a scale factor.
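One plausible reading of Eq. (9), with the attention computed across channels as the text and Fig. 3 describe (Restormer-style transposed attention), can be sketched as below. The projections $W_{pc}$ and $W_{dc}$ are replaced by the identity here, so this is only a shape-and-cost sketch under our assumptions, not the paper's MDLA:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x, d=None):
    # x: (h, w, c). The attention map is c-by-c, so its cost grows as
    # O(c^2 * hw) rather than the O((hw)^2) of spatial self-attention.
    h, w, c = x.shape
    q = x.reshape(h * w, c).T          # (c, hw)
    k = x.reshape(h * w, c).T          # (c, hw)
    v = x.reshape(h * w, c).T          # (c, hw)
    d = d or np.sqrt(h * w)            # scale factor
    attn = softmax(q @ k.T / d, axis=-1)   # (c, c) cross-channel attention
    out = (attn @ v).T.reshape(h, w, c)    # aggregate channels, restore shape
    return out + x                         # residual connection as in Eq. (9)
```

Because the attention matrix is $c\times c$ instead of $hw\times hw$, the cost scales linearly with the number of pixels, which matches the stated goal of decomposing high-resolution images efficiently.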
We adopt a simple but effective depth-wise separable feed-forward network. It mainly consists of separable point-wise convolution and depth-wise convolution to minimize the amount of computation. Given a feature $X\in\mathbb{R}^{h\times w\times c}$ after Layer-Norm, the output feature can be expressed as:

$$\hat{X}=W_{dc}(\phi(W_{pc}W_{dc}(X)))+X, \tag{10}$$

where $W_{pc}$ and $W_{dc}$ are the point-wise and depth-wise convolutions, and $\phi$ is the activation function.
3.2 Diffusion Generation Adjustment
The diffusion generation adjustment aims to reconstruct the original data distribution of the Retinex components over their respective channels. It is divided into two paths, namely reflectance diffusion adjustment (RDA) and illumination diffusion adjustment (IDA).
Figure 4: Example of the forward and reverse diffusion processes of Diffusion Generation Adjustment. $I_0$ is the obtained result.
The normal-light image component is denoted as $I_0\in\mathbb{R}^{H\times W\times C}$ ($C=3$ in RDA, $C=1$ in IDA) for diffusion. The conditional images are respectively concatenated with the noisy image to form the guidance. We adopt the diffusion process proposed in the Denoising Diffusion Probabilistic Model (DDPM) [7] to construct the distribution of the Retinex data for each channel. More specifically, it can be described as a forward diffusion process and a reverse diffusion process, as shown in Fig. 4.
Forward Diffusion Process. The forward diffusion process can be viewed as a Markov chain progressively adding Gaussian noise to the data; the data at step $t$ depends only on that at step $t-1$. Thus, at any $t\in[0,T]$, we can obtain the data distribution of the noisy image $I_t$ as:

$$q(I_{t}|I_{t-1})=\mathcal{N}(I_{t};\sqrt{1-\beta_{t}}\,I_{t-1},\beta_{t}\mathcal{Z}), \tag{11}$$

where $\beta_{t}$ is a variable controlling the variance of the noise added to the data. When $\beta_{t}$ is small enough, the transition from $I_{t-1}$ to $I_{t}$ is a constant process of adding a small amount of noise, i.e., the distribution at step $t$ equals that at the previous step plus Gaussian noise. By introducing a new variable $\alpha_{t}=1-\beta_{t}$, this process can be described as:

$$I_{t}=\sqrt{\alpha_{t}}\,I_{t-1}+\sqrt{1-\alpha_{t}}\,\epsilon_{t-1},\quad\epsilon_{t-1}\sim\mathcal{N}(0,\mathcal{Z}). \tag{12}$$
With parameter renormalization, the multiple Gaussian distributions are merged and simplified, and we obtain the distribution of the $t$-th step, $q(I_{t}|I_{0})$, directly. More specifically, it can be expressed as:

$$q(I_{t}|I_{0})=\mathcal{N}(I_{t};\sqrt{\overline{\alpha}_{t}}\,I_{0},(1-\overline{\alpha}_{t})\mathcal{Z}), \tag{13}$$

where $\overline{\alpha}_{t}=\prod_{i=0}^{t}\alpha_{i}$. When the distribution $q(I_{t}|I_{0})$ approaches $\mathcal{N}(0,\mathcal{Z})$, the model can be considered to complete the forward process of diffusion.
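The closed-form sampling of Eq. (13) can be sketched as follows; the linear noise schedule is the standard DDPM choice and our assumption here, since the paper only specifies the number of steps:

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    # linear beta schedule as in DDPM; alpha_bar_t = prod_{i<=t} (1 - beta_i)
    betas = np.linspace(beta_start, beta_end, T)
    return betas, np.cumprod(1.0 - betas)

def q_sample(i0, t, alpha_bar, rng):
    # Eq. (13): sample I_t directly from I_0 in one step
    eps = rng.standard_normal(i0.shape)
    it = np.sqrt(alpha_bar[t]) * i0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return it, eps
```

Because $\overline{\alpha}_t$ decays toward zero, $q(I_T|I_0)$ is close to pure Gaussian noise, which is exactly the condition for the forward process being complete.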
Reverse Diffusion Process. The reverse diffusion process restores the original distribution from the pure-noise Gaussian distribution. Similar to the forward process, it is carried out in steps. At step $t$, a denoising operation is applied to the data $I_t$ to get the probability distribution of $I_{t-1}$ under the guidance of the conditional image $I_c$. Therefore, given $I_t$, we can formulate the conditional probability distribution of $I_{t-1}$ as:

$$p_{\theta}(I_{t-1}|I_{t},I_{c})=\mathcal{N}(I_{t-1};\mu_{\theta}(I_{t},I_{c},t),\sigma_{t}^{2}\mathcal{Z}), \tag{14}$$

where $\mu_{\theta}(I_{t},I_{c},t)$ is the mean value, estimated at step $t$, and $\sigma_{t}^{2}$ is the variance. In RDA and IDA, we follow the setup of DDPM and set it to a fixed value. In more detail, they can be further expressed as:

$$\mu_{\theta}(I_{t},I_{c},t)=\frac{1}{\sqrt{\alpha_{t}}}\Big(I_{t}-\frac{\beta_{t}}{\sqrt{1-\overline{\alpha}_{t}}}\,\epsilon_{\theta}(I_{t},I_{c},t)\Big), \tag{15}$$

$$\sigma_{t}^{2}=\frac{1-\overline{\alpha}_{t-1}}{1-\overline{\alpha}_{t}}\beta_{t}, \tag{16}$$

where $\epsilon_{\theta}(I_{t},I_{c},t)$ is the noise estimated by a deep neural network, given the input $I_{t}$, $I_{c}$, and the time step $t$.
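One reverse step, Eqs. (14)-(16), can be sketched as follows; the conditional image enters only through `eps_pred`, which stands in for the network output $\epsilon_\theta(I_t, I_c, t)$:

```python
import numpy as np

def p_sample(i_t, t, eps_pred, betas, alpha_bar, rng):
    # Eq. (15): posterior mean from the predicted noise
    alpha_t = 1.0 - betas[t]
    mu = (i_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_t)
    if t == 0:
        return mu  # final step: return the mean, no noise added
    # Eq. (16): fixed variance of the reverse transition
    sigma2 = (1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t]) * betas[t]
    return mu + np.sqrt(sigma2) * rng.standard_normal(i_t.shape)
```

A useful sanity check: if the predicted noise equals the noise actually added at $t=0$, the posterior mean recovers $I_0$ exactly.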
In each step of the reverse diffusion process $t\in[0,T]$, we optimize an objective function between the noise estimated by the network and the noise $\epsilon$ actually added. Therefore, the loss function of the reverse diffusion process is:

$$L_{diff}(\theta)=\|\epsilon-\epsilon_{\theta}(\sqrt{\overline{\alpha}_{t}}\,I_{0}+\sqrt{1-\overline{\alpha}_{t}}\,\epsilon,I_{c},t)\|. \tag{17}$$
The denoising network in the reverse diffusion process usually combines the characteristics of UNet and attention. In RDA and IDA, we adopt the backbone of SR3 [33] and follow its design of the diffusion denoising network, consisting of multiple stacked residual blocks combined with attention. From the noise predicted by the network, we can estimate the approximate $\widetilde{I}_{0}$. It makes sense to keep this estimate $\widetilde{I}_{0}$ consistent in content with the normal-light image. We adopt a consistent network to implement this process:

$$\widetilde{I}_{0}=\frac{1}{\sqrt{\overline{\alpha}_{t}}}\big(I_{t}-\sqrt{1-\overline{\alpha}_{t}}\,\epsilon_{\theta}(I_{t},I_{c},t)\big), \tag{18}$$

$$L_{content}=\|I_{0}-\epsilon_{c}(\widetilde{I}_{0},t)\|_{1}. \tag{19}$$
In the consistent network $\epsilon_{c}$, the RDA part adopts the backbone of Restormer [41] and adds feature affine with the time embedding. The IDA part adopts the same structure as the denoising network. The loss function of the overall diffusion model network is given by:

$$L=L_{diff}+\gamma_{ct}L_{content}. \tag{20}$$
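Putting Eqs. (17)-(20) together, one training step can be sketched as follows; `eps_model` and `content_net` stand in for $\epsilon_\theta$ and the consistent network $\epsilon_c$ (both are placeholders, not the paper's SR3/Restormer architectures):

```python
import numpy as np

def diffusion_losses(i0, i_c, t, eps_model, content_net, alpha_bar, rng, gamma_ct=1.0):
    # Forward-sample I_t from I_0 in closed form, Eq. (13)
    eps = rng.standard_normal(i0.shape)
    i_t = np.sqrt(alpha_bar[t]) * i0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    # Eq. (17): noise-matching loss between added and predicted noise
    eps_hat = eps_model(i_t, i_c, t)
    l_diff = np.abs(eps - eps_hat).mean()
    # Eq. (18): estimate of the clean component from the predicted noise
    i0_tilde = (i_t - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
    # Eq. (19): content-consistency loss through the consistent network
    l_content = np.abs(i0 - content_net(i0_tilde, t)).mean()
    # Eq. (20): total objective
    return l_diff + gamma_ct * l_content
```

With a perfect noise predictor and an identity consistent network, both terms vanish, which confirms that Eqs. (17) and (19) share the same optimum.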
Figure 5: Qualitative comparison with the state-of-the-art low-light image enhancement methods on the LOL dataset.
In general, the whole diffusion generation adjustment process restores the original Retinex decomposition distribution from the low-light Retinex decomposition distribution. We can formulate the whole diffusion process as:

$$\hat{R}_{DGA}=\mathcal{F}_{RDA}(\epsilon_{s}^{(r)},R_{TDN}), \tag{21}$$

$$\hat{L}_{DGA}=\mathcal{F}_{IDA}(\epsilon_{s}^{(i)},L_{TDN}), \tag{22}$$

where $\epsilon_{s}^{(r)}\in\mathbb{R}^{H\times W\times 3}$ and $\epsilon_{s}^{(i)}\in\mathbb{R}^{H\times W\times 1}$ are Gaussian noise generated by initialization, and $R_{TDN}$ and $L_{TDN}$ are the reflectance and illumination maps obtained by TDN.

Ultimately, the enhanced image is obtained as the product of the diffusion-adjusted illumination and reflectance maps, i.e., $\hat{I}=\hat{R}_{DGA}\cdot\hat{L}_{DGA}$.
4 Experiment
4.1 Implementation Details and Datasets
Implementation Details. The proposed Diff-Retinex is trained in stages, with TDN trained first. Empirically, we set $\gamma_{rc}=0.1$, $\gamma_{sm}=0.1$, and $\alpha_{rec}=0.3$. The learning rate is $0.0001$ and the batch size is 16 with the Adam optimizer. Then, we train the networks related to diffusion generation adjustment. The number of diffusion steps of IDA and RDA is set to $T=1000$, and $\gamma_{ct}=1$. The input image is of size $160\times 160$ and the batch size is 16. The Adam optimizer with a learning rate of 0.0001 is used to train the network for 800K iterations. All the experiments are conducted on an NVIDIA GeForce RTX 3090 GPU with the PyTorch [30] framework.
Datasets. To verify generalization, we conduct experiments on the LOL [39] and VE-LOL-L [17] datasets. All the images in the LOL dataset are taken in real life. We use 485 pairs of images for training and 15 low-light images for testing. The VE-LOL dataset contains data for high-level and low-level visual tasks, called VE-LOL-H and VE-LOL-L, respectively. VE-LOL-L is also adopted to evaluate the effectiveness of our method. DICM is used as the cross-testing dataset to evaluate generalization.
Figure 6: Qualitative comparison with the state-of-the-art low-light image enhancement methods on the VE-LOL-L dataset.
Table 1: Quantitative results of low-light image enhancement methods on the LOL and VE-LOL-L datasets.

| Dataset | Metric | EnlightenGAN | JED | Robust Retinex | RUAS | KinD | KinD++ | LIME | RetinexNet | Zero-DCE | URetinex | LLFormer | Diff-Retinex |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LOL | FID ↓ | 105.59 | 105.86 | 92.32 | 95.59 | 78.59 | 110.68 | 114.00 | 150.50 | 106.63 | 59.00 | 76.96 | 47.85 |
| LOL | LPIPS ↓ | 0.129 | 0.190 | 0.157 | 0.167 | 0.083 | 0.095 | 0.211 | 0.183 | 0.133 | 0.050 | 0.067 | 0.048 |
| LOL | BIQI ↓ | 30.95 | 32.33 | 42.29 | 43.32 | 26.70 | 26.81 | 37.83 | 29.33 | 34.80 | 23.05 | 28.81 | 19.97 |
| LOL | LOE ↓ | 395.52 | 306.90 | 202.31 | 195.36 | 758.56 | 700.79 | 547.54 | 395.33 | 232.97 | 197.02 | 176.61 | 191.56 |
| VE-LOL-L | FID ↓ | 92.58 | 110.46 | 79.64 | 100.07 | 65.56 | 98.10 | 98.90 | 158.99 | 93.81 | 48.36 | 79.83 | 47.75 |
| VE-LOL-L | LPIPS ↓ | 0.124 | 0.158 | 0.106 | 0.144 | 0.070 | 0.114 | 0.248 | 0.283 | 0.123 | 0.091 | 0.110 | 0.050 |
| VE-LOL-L | BIQI ↓ | 32.77 | 27.29 | 39.75 | 32.51 | 28.23 | 32.33 | 47.09 | 45.59 | 35.06 | 35.39 | 32.47 | 26.54 |
| VE-LOL-L | LOE ↓ | 422.77 | 330.25 | 128.73 | 168.99 | 239.33 | 623.63 | 554.69 | 531.92 | 228.88 | 166.02 | 177.87 | 149.60 |
4.2
Results and Analysis
We present quantitative and qualitative comparisons with state-of-the-art methods including both traditional and deep learning-based methods. The traditional methods include LIMEΒ [ 4 ] based on illumination estimation and JEDΒ [ 32 ] based on Retinex decomposition and joint denoising. The learning-based methods include RetinexNetΒ [ 39 ] , KinDΒ [ 44 ] , KinD++Β [ 43 ] , RUASΒ [ 19 ] , EnlightenGANΒ [ 8 ] , URetinexΒ [ 40 ] , and LLFormerΒ [ 38 ] .
Qualitative Comparison. Qualitative results are shown in Figs. 5 and 6. Our method shows three obvious advantages. First and foremost, Diff-Retinex has the ability of texture completion and reasoning-based generation for missing scene content. This is a remarkable characteristic of our generative diffusion model that existing methods do not possess. As shown in Fig. 5, the highlighted region on the right is ground covered with coarse-grained textured tile (see ground truth). All the competitors fail to recover the coarse-grained tile texture, while our method can generate missing textures similar to the ground truth. Similarly, the diving platform and handrail in Fig. 6 are severely missing and damaged in the low-light image. Most methods cannot complete the clear textures, while Diff-Retinex can. Second, our method shows better illumination and color fidelity. In Fig. 5, the low-light image has considerable color deviation. Viewed as a whole, the color of Diff-Retinex is the closest to that of the ground truth. KinD, KinD++, RetinexNet, and URetinex exhibit different degrees of color deviation, e.g., URetinex and KinD++ tend toward yellow. In Fig. 6, Diff-Retinex also performs better than other SOTA methods on the color of the venue. Last, our results exhibit vivid textures with less noise than other methods. LIME and RetinexNet leave much noise in the whole image, affecting the scene expression. The denoising performance of EnlightenGAN and LLFormer in flat regions is unsatisfactory, e.g., the black area below the computer desk and the wall in Fig. 5. In general, Diff-Retinex shows obvious advantages in these areas.
Table 2: Quantitative comparison of PSNR and SSIM on LOL.

| Method | Main Type | PSNR ↑ | SSIM ↑ |
| --- | --- | --- | --- |
| RetinexNet | CNN | 17.56 | 0.698 |
| KinD | CNN | 17.64 | 0.829 |
| KinD++ | CNN | 17.75 | 0.816 |
| EnlightenGAN | GAN | 17.48 | 0.716 |
| URetinex | Unfolding | 21.32 | 0.836 |
| LLFormer | Transformer | 23.66 | 0.873 |
| Diff-Retinex | Diffusion | 21.98 | 0.863 |
Quantitative Comparison. Metrics including FID [6], LPIPS [42], BIQI [27], LOE [37], and PI [1] are adopted for evaluation. FID measures the similarity between the deep-feature distributions of two image sets. LPIPS is the learned perceptual image patch similarity, which measures perceptual differences between images. BIQI is a blind image quality index. LOE is the lightness-order error, reflecting how well the natural order of brightness is preserved. PI represents the subjective perceived quality of the image. The lower the FID, LPIPS, BIQI, LOE, and PI, the better the image quality. The quantitative results on the LOL and VE-LOL-L datasets are reported in Tab. 1. On LOL, our method shows clear advantages over other methods in the generative indicators FID and LPIPS, indicating that our results have better generation similarity under machine vision. In terms of brightness-order error, our method is slightly behind LLFormer. However, benefiting from the generative diffusion model and TDN, it shows the best performance among all the Retinex-based methods, including RetinexNet, KinD, KinD++, and URetinex. On VE-LOL-L, our method achieves the best comprehensive performance in terms of these metrics as well, indicating strong generalization and advanced generative enhancement performance in various scenarios. On DICM, in Fig. 7, our method also demonstrates competitiveness. In addition, we provide quantitative comparisons of PSNR and SSIM in Tab. 2.
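Among the reported metrics, PSNR is the simplest to make concrete. A minimal sketch for images scaled to [0, max_val] (the reference implementations used in the paper are not specified in this excerpt):

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio between two images in [0, max_val].

    PSNR = 10 * log10(max_val^2 / MSE); higher is better,
    and identical images give infinity.
    """
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For example, a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and thus a PSNR of 20 dB.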
Figure 7: Qualitative and quantitative comparison of PI with the SOTA low-light image enhancement methods on DICM.
4.3
Ablation Study
Transformer Decomposition Network. To validate the effectiveness of the Transformer decomposition network (TDN), we visualize the decomposition. Retinex decomposition is an ill-posed problem with no exact optimal solution. A core requirement is that the reflectance information should be strictly consistent across different illumination levels. Typical and effective methods, such as RetinexNet and KinD++, adopt CNNs for decomposition. The reflectance decomposition results are shown in Fig. 8.
Figure 8: Qualitative comparison on the reflectance maps produced by the decomposition networks. The structural details and noise are assumed to be decomposed into the reflectance map.
Figure 9: Example of RDA and IDA. To better show the diffusion effect, the output format of constant period sampling is adopted. Left to right: iterative process of gradual recovery from pure noise.
Generative Diffusion Model. To validate the effectiveness of the diffusion model, on the one hand, we visualize the generation process of RDA and IDA, as shown in Fig. 9. On the other hand, we compare the restoration of the reflectance map by our diffusion model against other one-step Retinex-based LLIE methods. We compare on the reflectance map because it contains much of the color and texture information that is most sensitive to visual perception. Typical Retinex-based LLIE methods include RetinexNet and KinD++. For reflectance restoration, RetinexNet adopts BM3D and KinD++ adopts a CNN. The results are shown in Fig. 10. Since the Retinex decomposition results of these methods are quite different, we show each method's own reflectance map decomposed from the normal-light image as its ground truth for comparison. It can be seen that our method better handles color deviation and shows better performance on texture restoration. We also calculate the FID, LPIPS, and BIQI between the restored reflectance map and the corresponding ground truth for quantitative evaluation. The results are reported in Tab. 3.
Figure 10: Qualitative comparison of the restoration of reflectance map with our diffusion model and other SOTA methods.
Table 3: Quantitative comparison on reflectance restoration with the diffusion model and other low-light enhancement methods.

| Method | FID ↓ | LPIPS ↓ | BIQI ↓ |
| --- | --- | --- | --- |
| RetinexNet (BM3D) | 111.29 | 0.225 | 24.80 |
| KinD++ (CNN) | 171.45 | 0.110 | 36.99 |
| Diff-Retinex (Diff.) | 61.33 | 0.059 | 18.98 |
4.4
Discussion
While Diff-Retinex, functioning as a generative model for low-light image enhancement, exhibits commendable visual outcomes, it does not establish dominance on pixel-wise error metrics, e.g., PSNR, as shown in Tab. 2. Higher PSNR can be attained through more stringent constraints, but the generation effect would be weakened to some extent. In this paper, we encourage the adoption of generative diffusion to explore the possibilities of generative effects for low-light enhancement tasks. Naturally, it is also feasible and desirable to achieve better pixel-level error performance with diffusion models.
5
Conclusion
In this paper, we rethink the low-light image enhancement task and propose a generative Diff-Retinex model. Diff-Retinex formulates the low-light enhancement task as a paradigm of decomposition and image generation. It adaptively decomposes images into illumination and reflectance maps and solves various degradations with generative diffusion models. The experimental results show that Diff-Retinex has excellent performance and brings subtle-detail completion and inferential restoration into reality for low-light image enhancement.
References
[1] Yochai Blau, Roey Mechrez, Radu Timofte, Tomer Michaeli, and Lihi Zelnik-Manor. The 2018 PIRM challenge on perceptual image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
[2] Aram Danielyan, Vladimir Katkovnik, and Karen Egiazarian. BM3D frames and variational image deblurring. IEEE Transactions on Image Processing, 21(4):1715–1728, 2011.
[3] Xueyang Fu, Delu Zeng, Yue Huang, Yinghao Liao, Xinghao Ding, and John Paisley. A fusion-based enhancing method for weakly illuminated images. Signal Processing, 129:82–96, 2016.
[4] Xiaojie Guo, Yu Li, and Haibin Ling. LIME: Low-light image enhancement via illumination map estimation. IEEE Transactions on Image Processing, 26(2):982–993, 2016.
[5] Ji-Hee Han, Sejung Yang, and Byung-Uk Lee. A novel 3-D color histogram equalization method with uniform 1-D gray scale histogram. IEEE Transactions on Image Processing, 20(2):506–512, 2010.
[6] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
[7] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
[8] Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, and Zhangyang Wang. EnlightenGAN: Deep light enhancement without paired supervision. IEEE Transactions on Image Processing, 30:2340–2349, 2021.
[9] Daniel J Jobson, Zia-ur Rahman, and Glenn A Woodell. A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Transactions on Image Processing, 6(7):965–976, 1997.
[10] Daniel J Jobson, Zia-ur Rahman, and Glenn A Woodell. Properties and performance of a center/surround retinex. IEEE Transactions on Image Processing, 6(3):451–462, 1997.
[11] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[12] Xiang-Yu Kong, Lei Liu, and Yun-Sheng Qian. Low-light image enhancement via Poisson noise aware retinex model. IEEE Signal Processing Letters, 28:1540–1544, 2021.
[13] Edwin H Land and John J McCann. Lightness and retinex theory. JOSA, 61(1):1–11, 1971.
[14] Chulwoo Lee, Chul Lee, and Chang-Su Kim. Contrast enhancement based on layered difference representation of 2D histograms. IEEE Transactions on Image Processing, 22(12):5372–5384, 2013.
[15] Chongyi Li, Jichang Guo, Fatih Porikli, and Yanwei Pang. LightenNet: A convolutional neural network for weakly illuminated image enhancement. Pattern Recognition Letters, 104:15–22, 2018.
[16] Mading Li, Jiaying Liu, Wenhan Yang, Xiaoyan Sun, and Zongming Guo. Structure-revealing low-light image enhancement via robust retinex model. IEEE Transactions on Image Processing, 27(6):2828–2841, 2018.
[17] Jiaying Liu, Dejia Xu, Wenhan Yang, Minhao Fan, and Haofeng Huang. Benchmarking low-light image enhancement and beyond. International Journal of Computer Vision, 129:1153–1184, 2021.
[18] Risheng Liu, Long Ma, Tengyu Ma, Xin Fan, and Zhongxuan Luo. Learning with nested scene modeling and cooperative architecture search for low-light vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):5953–5969, 2022.
[19] Risheng Liu, Long Ma, Jiaao Zhang, Xin Fan, and Zhongxuan Luo. Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10561–10570, 2021.
[20] Risheng Liu, Long Ma, Yuxi Zhang, Xin Fan, and Zhongxuan Luo. Underexposed image correction via hybrid priors navigated deep propagation. IEEE Transactions on Neural Networks and Learning Systems, 33(8):3425–3436, 2021.
[21] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
[22] Kin Gwn Lore, Adedotun Akintayo, and Soumik Sarkar. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognition, 61:650–662, 2017.
[23] Long Ma, Dian Jin, Nan An, Jinyuan Liu, Xin Fan, and Risheng Liu. Bilevel fast scene adaptation for low-light image enhancement. arXiv preprint arXiv:2306.01343, 2023.
[24] Long Ma, Risheng Liu, Yiyang Wang, Xin Fan, and Zhongxuan Luo. Low-light image enhancement via self-reinforced retinex projection model. IEEE Transactions on Multimedia, 2022.
[25] Long Ma, Tengyu Ma, Risheng Liu, Xin Fan, and Zhongxuan Luo. Toward fast, flexible, and robust low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5637–5646, 2022.
[26] Long Ma, Tianjiao Ma, Xinwei Xue, Xin Fan, Zhongxuan Luo, and Risheng Liu. Practical exposure correction: Great truths are always simple. arXiv preprint arXiv:2212.14245, 2022.
[27] Anush Krishna Moorthy and Alan Conrad Bovik. A two-step framework for constructing blind image quality indices. IEEE Signal Processing Letters, 17(5):513–516, 2010.
[28] Zhangkai Ni, Wenhan Yang, Hanli Wang, Shiqi Wang, Lin Ma, and Sam Kwong. Cycle-interactive generative adversarial network for robust unsupervised low-light enhancement. In Proceedings of the ACM International Conference on Multimedia, pages 1484–1492, 2022.
[29] Axi Niu, Kang Zhang, Trung X Pham, Jinqiu Sun, Yu Zhu, In So Kweon, and Yanning Zhang. CDPMSR: Conditional diffusion probabilistic models for single image super-resolution. arXiv preprint arXiv:2302.12831, 2023.
[30] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.
[31] Mengwei Ren, Mauricio Delbracio, Hossein Talebi, Guido Gerig, and Peyman Milanfar. Image deblurring with domain generalizable diffusion models. arXiv preprint arXiv:2212.01789, 2022.
[32] Xutong Ren, Mading Li, Wen-Huang Cheng, and Jiaying Liu. Joint enhancement and denoising method via sequential decomposition. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), pages 1–5. IEEE, 2018.
[33] Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J Fleet, and Mohammad Norouzi. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4713–4726, 2022.
[34] Kuldeep Singh and Rajiv Kapoor. Image enhancement using exposure based sub image histogram equalization. Pattern Recognition Letters, 36:10–14, 2014.
[35] Neng-Tsann Ueng and Louis L Scharf. The gamma transform: A local time-frequency analysis method. In Conference Record of The Twenty-Ninth Asilomar Conference on Signals, Systems and Computers, volume 2, pages 920–924. IEEE, 1995.
[36] Kunfeng Wang, Chao Gou, Yanjie Duan, Yilun Lin, Xinhu Zheng, and Fei-Yue Wang. Generative adversarial networks: Introduction and outlook. IEEE/CAA Journal of Automatica Sinica, 4(4):588–598, 2017.
[37] Shuhang Wang, Jin Zheng, Hai-Miao Hu, and Bo Li. Naturalness preserved enhancement algorithm for non-uniform illumination images. IEEE Transactions on Image Processing, 22(9):3538–3548, 2013.
[38] Tao Wang, Kaihao Zhang, Tianrun Shen, Wenhan Luo, Bjorn Stenger, and Tong Lu. Ultra-high-definition low-light image enhancement: A benchmark and transformer-based method. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 2654–2662, 2023.
[39] Chen Wei, Wenjing Wang, Wenhan Yang, and Jiaying Liu. Deep retinex decomposition for low-light enhancement. arXiv preprint arXiv:1808.04560, 2018.
[40] Wenhui Wu, Jian Weng, Pingping Zhang, Xu Wang, Wenhan Yang, and Jianmin Jiang. URetinex-Net: Retinex-based deep unfolding network for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5901–5910, 2022.
[41] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5728–5739, 2022.
[42] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
[43] Yonghua Zhang, Xiaojie Guo, Jiayi Ma, Wei Liu, and Jiawan Zhang. Beyond brightening low-light images. International Journal of Computer Vision, 129:1013–1037, 2021.
[44] Yonghua Zhang, Jiawan Zhang, and Xiaojie Guo. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the ACM International Conference on Multimedia, pages 1632–1640, 2019.
[45] Anqi Zhu, Lin Zhang, Ying Shen, Yong Ma, Shengjie Zhao, and Yicong Zhou. Zero-shot restoration of underexposed images via robust retinex decomposition. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2020.
[46] Lin Zhu, Yushi Chen, Pedram Ghamisi, and Jón Atli Benediktsson. Generative adversarial networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 56(9):5046–5063, 2018.