Vision-Anchored Automation of Bird-Sized UAVs in Unknown Cluttered Indoor Environments

* Project Goals

In many real-world applications, autonomous unmanned aerial vehicles (UAVs) can be used to explore unknown, cluttered indoor spaces where GPS access and communication are often denied. To operate in such confined spaces, however, the UAVs must be small (roughly the size of a bird). Such small UAVs require lightweight and power-efficient sensors. Therefore, this research project aims to develop full automation for bird-sized UAVs within unknown and cluttered indoor environments using only an RGB-D camera.

* Research Challenges

Although vision-only UAVs are simpler to assemble, maneuvering them becomes increasingly difficult without measurements from other sensors (e.g., radar and LiDAR). Consequently, two fundamental challenges need to be addressed for bird-sized UAVs to achieve automation:

(1) How to construct visual perception to have a holistic yet computationally efficient understanding of the surrounding environment using only a vision sensor.

(2) Leveraging the established perception system, how to employ visual navigation to perform target-driven, safety-critical operations without relying on maps or GPS.

* Current/Final Results (summary)

The current results involve algorithm development for object recognition, depth estimation, and optical flow estimation using vision sensors, especially cameras, to establish a solid base for UAV automation.

* Publications

1. ClusterFormer: Clustering As A Universal Visual Learner, NeurIPS 2023.

2. E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning, ICCV 2023.

3. Tripartite Feature Enhanced Pyramid Network for Dense Prediction, TIP 2023.

4. TransFlow: Transformer as Flow Learner, CVPR 2023.

5. Visual Recognition with Deep Nearest Centroids, ICLR 2023.

6. Adversarial Training of Self-supervised Monocular Depth Estimation against Physical-World Attacks, ICLR 2023.

* Presentations and images/Videos demonstrating the project

1. A video for paper "E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning".

2. A video for paper "TransFlow: Transformer as Flow Learner".

3. A video for paper "Visual Recognition with Deep Nearest Centroids".

4. A video for paper "Adversarial Training of Self-supervised Monocular Depth Estimation against Physical-World Attacks".

* Data, Demos and Software Downloads (with documentation)

1. Paper "ClusterFormer: Clustering As A Universal Visual Learner".

2. Paper "E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning".

3. Paper "Visual Recognition with Deep Nearest Centroids".

4. Paper "Tripartite Feature Enhanced Pyramid Network for Dense Prediction".

5. Paper "Adversarial Training of Self-supervised Monocular Depth Estimation against Physical-World Attacks".

* Patents

None

* Other relevant information

None

* Broader Impacts

The proposed research program is expected to have a direct impact on various engineering applications (e.g., search-and-rescue, construction inspection, and underground mining exploration) and cross-disciplinary research (e.g., robotics, computer vision, and control theory). To contribute to the synergy of autonomous UAV research, all systems developed in this proposed program will be open-source.

1. Contribution to society and industry. The proposed project will foster the interplay between a wide range of fields (e.g., disaster response, criminal justice, resource inspection).

2. Contribution to education equity and outreach. The PI is committed to providing long-term mentoring services to the NTID Center by continuing to give talks on Computer Science careers and cutting-edge STEM research, and by assisting NTID students in entering a field dominated by hearing students. The PI also aims to establish a FIRST Autonomous UAV Competition and Mentor program for local high schools in Rochester, based on this proposed research program.

3. Contribution to higher education. The PI will involve and co-advise both graduate and undergraduate students from the departments of computer science and computer engineering in senior capstone projects and doctoral/master's theses based on the proposed research activities.

* Educational material (with documentation)

Class content and course projects for CMPE-677 Machine Intelligence and CMPE-679 Deep Learning, which will offer hands-on exercises for students to develop and implement intelligent UAV systems.

* Acknowledgement

This material is based upon work supported by the National Science Foundation under Grant No. 2242243.

* Disclaimer

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

* Award number

2242243

* Duration

From February 1, 2023 to January 31, 2025

* PI, co-PI(s)

PI: Dongfang Liu

* Student(s)

James Liang, Chen Han

* Collaborators, etc.

1. Prof. Dustin Osborne from the Department of Criminal Justice and Criminology at East Tennessee State University, along with Johnson City Police Station (Tennessee).

2. RIT's National Technical Institute for the Deaf (NTID).

* Point of Contact

Dr. Dongfang Liu (dongfang.liu@rit.edu)

* Date of Last Update

9/28/2023