Monocular Three-Dimensional Coding Imaging Based on Double Helix Phase Mask

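We propose an imaging method that simultaneously acquires scene depth information and extends the depth of field. A double helix phase modulation is introduced at the camera pupil to encode depth information in the image, and end-to-end deep learning is used to invert the imaging process, yielding a depth-of-field-extended image and a depth map. We analyze the influence of the phase mask parameters and the object distance on imaging performance and discuss how to choose the phase mask parameters for a given depth range. In simulations over the depth range of the NYU Depth V2 dataset, the relative depth estimation error is as low as 8.3%, and the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the depth-of-field-extended images reach up to 35.254 dB and 0.960, respectively. Compared with a traditional optical system, the depth of field can be extended by a factor of several tens or more, and the results show that narrowing the detection range and increasing the object distance improve the average depth estimation accuracy. For potential applications such as face recognition at access gates, a physical system was built for a detection range of 1.1 to 1.32 m; in real scenes the relative depth estimation error is 2.2%, and the depth of field is extended by about 10 times compared with a traditional optical system. The method requires only the addition of one phase mask to a conventional imaging system to achieve both depth estimation and depth-of-field extension, and it has application potential in low-cost 3D imaging and detection.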
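The abstract reports a relative depth error and a PSNR but does not define them. The sketch below shows how such figures are commonly computed, assuming the relative error is the mean absolute relative error used in monocular depth benchmarks and that images are scaled to a peak value of 1.0. The function names `abs_rel_error` and `psnr` are illustrative and do not come from the paper.

```python
import numpy as np

def abs_rel_error(pred_depth, true_depth):
    """Mean of |pred - true| / true over pixels with a valid (positive) true depth.
    ASSUMPTION: this is the 'relative error' meant in the abstract."""
    pred = np.asarray(pred_depth, dtype=float)
    true = np.asarray(true_depth, dtype=float)
    valid = true > 0  # ignore pixels without ground-truth depth
    return float(np.mean(np.abs(pred[valid] - true[valid]) / true[valid]))

def psnr(image, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the maximum possible pixel value."""
    image = np.asarray(image, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mse = np.mean((image - reference) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Under this definition, a reported value of 0.083 would mean that the estimated depth deviates from the true depth by 8.3% on average.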
Objective As information technology develops rapidly, cameras are used not only as photography tools that meet users' artistic needs but also as hardware for visual sensing, serving as the "eyes" of machines. They are now widely applied in 2D computer vision tasks such as image classification, semantic segmentation, and object recognition. However, traditional cameras have two inherent limitations. First, meeting the resolution requirement means sacrificing depth of field; beyond the depth-of-field range, defocus blur can disrupt subsequent algorithms. Second, because traditional cameras map the 3D world onto a 2D plane, they lose the depth information of the scene, which makes them hard to apply to rapidly developing 3D computer vision tasks. Existing depth acquisition methods, such as structured light, time-of-flight, and multi-view geometry, are inferior to single-lens cameras in power consumption, cost, and size. Therefore, we propose a single-camera 3D imaging method based on a double helix phase mask, which achieves depth estimation and depth-of-field extension simultaneously with simple hardware modifications.

Methods We propose an imaging method based on a double helix phase mask that simultaneously acquires scene depth information and extends the depth of field. A designed double helix phase mask is inserted at the aperture stop of the camera, so that the imaging beam is modulated into a double helix shape. On the one hand, depth information is encoded in the image through the rotation of the double helix point spread function, which is sensitive to defocus. On the other hand, because the double helix beam has a longer depth of focus, object points are encoded as a double helix point spread function over a larger depth-of-field range. The depth of each object point therefore appears in the image as local ghosting. We use convolutional neural networks to decode and reconstruct the encoded image end-to-end, obtaining depth maps and depth-of-field-extended images of the scene while jointly optimizing individual phase mask parameters. We analyze the influence of phase mask parameters and object distance on imaging performance and discuss how to select phase mask parameters for a given depth range.

Results and Discussions To validate our method, we train it on the FlyingThings3D dataset and test the trained model on the NYU Depth V2 dataset. The relative depth estimation error on the NYU Depth V2 dataset is as low as 0.083 (Table 2). The depth-of-field-extended images reach a PSNR of up to 35.254 dB and an SSIM of up to 0.960 (Table 3). Compared to traditional optical systems, the depth of field can be extended by several tens of times. A phase mask with more rings gives a larger depth-of-field extension, but the increased side lobes of the double helix point spread function may slightly reduce the depth estimation accuracy and the quality of the depth-of-field-extended images. Nevertheless, the overall performance remains within an acceptable range. The depth estimation accuracy of our method depends on the depth range to be measured: reducing the detection range or increasing the object distance improves the average depth estimation accuracy (Fig. 13). For potential applications such as face recognition at access gates, a physical system is built for a test range of 1.1-1.32 m. The relative depth estimation error in real scenes is 2.2%, and the depth of field is extended by about 10 times (Fig. 17), which demonstrates the effectiveness and practicality of the proposed method in real scenes.

Conclusions We introduce a three-dimensional imaging method based on a double helix phase mask. It requires only the addition of a phase mask to an existing lens to estimate scene depth from a single captured frame and to achieve depth-of-field extension. The method does not rely on built-in light sources or additional lenses, which allows further reduction in size and power consumption. Compared to depth estimation algorithms based solely on deep learning, our method generalizes well because it estimates depth from optically introduced features rather than from high-level semantic information about the scene. Overall, the method shows potential for low-cost 3D imaging and detection. However, the proposed method has limitations. It relies on texture: it works effectively in scenes with weak texture but may fail where texture is severely missing due to overexposure or other factors (Fig. 14). In addition, noise in real scenes can cause errors in some depth values, lower the average depth estimation accuracy of the system, and introduce slight artifacts in the reconstructed images. Subsequent research could incorporate noise suppression into the algorithm to address this problem.
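The pupil-plane encoding described in the Methods can be illustrated with a small Fourier-optics sketch. The abstract does not give the mask formula, so the code below assumes one spiral-zone construction: the pupil is split into equal-area rings, and ring n carries the vortex phase (2n - 1)φ. Defocus is modelled as a quadratic pupil phase measured in waves. The function names, grid size, and ring count are all illustrative choices rather than the authors' parameters.

```python
import numpy as np

def pupil_grid(n_pix=64):
    """Polar coordinates (rho, phi) on a square grid covering the unit pupil."""
    coords = np.linspace(-1.0, 1.0, n_pix)
    x, y = np.meshgrid(coords, coords)
    return np.hypot(x, y), np.arctan2(y, x)

def spiral_zone_phase(rho, phi, n_rings=3):
    """ASSUMED mask: equal-area ring n (1-based) carries the phase (2n - 1) * phi."""
    ring = np.clip(np.ceil(n_rings * rho ** 2), 1, n_rings)
    return (2.0 * ring - 1.0) * phi

def psf(rho, mask_phase, defocus_waves=0.0, pad=4):
    """Incoherent PSF: squared magnitude of the Fourier transform of the pupil."""
    aperture = (rho <= 1.0).astype(float)                  # circular aperture stop
    defocus_phase = 2.0 * np.pi * defocus_waves * rho ** 2  # quadratic defocus term
    pupil = aperture * np.exp(1j * (mask_phase + defocus_phase))
    n = pad * rho.shape[0]                                  # zero-pad for finer sampling
    field = np.fft.fftshift(np.fft.fft2(pupil, s=(n, n)))
    intensity = np.abs(field) ** 2
    return intensity / intensity.sum()                      # normalize to unit energy

rho, phi = pupil_grid()
clear_psf = psf(rho, np.zeros_like(rho))                    # conventional lens, in focus
dh_focus = psf(rho, spiral_zone_phase(rho, phi))            # with the assumed mask
dh_defocus = psf(rho, spiral_zone_phase(rho, phi), defocus_waves=1.0)
```

Plotting `dh_focus` and `dh_defocus` side by side is a quick way to inspect how the pattern changes with defocus, which is the cue the decoding network exploits. In a full simulation each depth layer of the scene would be convolved with the PSF for its own defocus value.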

imaging systems; computational imaging; monocular depth estimation; depth of field extension; double helix phase mask; point spread function

张越、蔡怀宇、盛婧、汪毅、陈晓冬

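Key Laboratory of Opto-Electronics Information Technology of Ministry of Education, School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072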

2024

Acta Optica Sinica
Chinese Optical Society; Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.931
ISSN: 0253-2239
Year, volume (issue): 2024, 44(9)