“‘Intelligent’ computers require knowledge of their environment, and the most effective means of acquiring such knowledge is by seeing. Vision opens a new realm of computer applications,” Computer magazine, May 1973.
Grounded in the principles of artificial intelligence (AI), computer vision gives machines the ability to perceive and analyze visual data such as images, graphics, and videos. Its aim is similar to that of AI, which is to automate decisions, yet its focus is limited to tasks that the human visual system would typically perform. IBM describes the contrast lucidly: “If AI enables computers to think, computer vision enables them to see, observe, and understand.”
Although computer vision can seem like a modern innovation, it is the outcome of extensive research stretching back to the 1960s. Beginning with Seymour Papert’s Summer Vision Project of 1966, the field has been in development for decades, improving steadily and creating new possibilities along the way. Though complex, the process these systems follow can be broken down into four fundamental steps:
Before computer vision reached today’s applications, key pioneers led the way. For example, Ray Kurzweil of Kurzweil Computer Products, Inc. developed an optical character recognition (OCR) system in 1974. The system could recognize and process printed text in any font, without manual entry. When paired with machine learning and enhanced with text-to-speech features, the technology was used to read for the blind.
This is just one pivotal example of the many applications that display the power and impact of computer vision. Thanks to waves of development and crucial research, the technology has improved several domains of human life, including transportation, healthcare, security, entertainment, and agriculture. It is therefore no surprise that the computer vision market is expected to expand in the very near future.
According to the Top Trends in Computer Vision Report, which reviews the latest trends covered at the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), the computer vision industry generated more than US $12.14 billion in 2022 and is projected to grow 7% annually, reaching US $20.88 billion by 2030.
Revenue is projected to increase because of surging demand for the technology in fields such as transportation, healthcare, and security. Moreover, according to PS Market Research, XR entertainment systems, which were worth $38.3 billion in 2022, are predicted to reach $394.8 billion by 2030.
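The market figures above can be sanity-checked with a few lines of compound-growth arithmetic. The sketch below is illustrative only: it assumes the quoted 7% is a constant annual rate applied over the eight years from 2022 to 2030, and the function names (`project`, `implied_rate`) are hypothetical helpers, not part of any cited report.

```python
# Figures quoted in the article, in billions of US dollars.
CV_2022 = 12.14   # computer vision market, 2022
CV_2030 = 20.88   # computer vision market, projected 2030
XR_2022 = 38.3    # XR entertainment systems, 2022
XR_2030 = 394.8   # XR entertainment systems, projected 2030
YEARS = 2030 - 2022


def project(start: float, rate: float, years: int) -> float:
    """Compound `start` at a constant annual `rate` for `years` years."""
    return start * (1 + rate) ** years


def implied_rate(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Assumption: the 7% figure is an annual rate compounded yearly.
    print(f"CV market at 7% for {YEARS} years: {project(CV_2022, 0.07, YEARS):.2f}")
    print(f"CV implied annual rate: {implied_rate(CV_2022, CV_2030, YEARS):.1%}")
    print(f"XR implied annual rate: {implied_rate(XR_2022, XR_2030, YEARS):.1%}")
```

Under these assumptions, compounding US $12.14 billion at 7% for eight years gives roughly US $20.86 billion, which is consistent with the quoted US $20.88 billion. The XR figures imply a much steeper annual rate of roughly one third per year.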
Learn More About Virtual Reality and its Applications at IEEE VR 2024
According to the US Bureau of Labor Statistics, employment of professionals in the computer and information science field is expected to grow significantly over the next decade, rising 21% by 2031. To fill these new roles, experts in computer vision, extended reality (XR), and data visualization will be needed.
While computer vision has improved significantly, challenges remain, underscoring the need for continued research and development in the field. These include concerns about data quality and bias. Any technology created or managed by humans is susceptible to bias, so to ensure accurate detection and optimal functionality, these systems must be developed with diverse inputs.
Moreover, the question remains: Can a computer not only perceive but truly comprehend its observations? Building trust in these systems is crucial, and that means ensuring they interpret what they observe accurately, with minimal errors, as adoption increases.
Lastly, security and privacy are major considerations for any widely adopted technology, and both remain challenging, with room for improvement. The issue is particularly pronounced in facial recognition, which calls for ongoing scrutiny.
As the use of computer vision technology grows, ethical considerations have begun to dominate the discussion. It’s crucial to examine the specifics of computer vision rather than rely on the general ethics associated with AI. These conversations are taking place at conferences, in standards development and working groups, and in research projects.
The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) aims to initiate further discussion within computer vision applications and research. In 2022, researchers were encouraged to submit papers and proposals that discussed the potential negative societal impacts of their proposed research and possible ways to mitigate them. Potential ethical concerns include the safety of living beings, privacy, environmental impact, and economic security.
The organizers prioritized transparency and stated, “Grappling with ethics is a difficult problem for the field [computer vision], and thinking about ethics is still relatively new to many authors… In certain cases, it will not be possible to draw a bright line between ethical and unethical.”
The committee of IEEE/CVF CVPR 2023 planned to continue this conversation at the next annual conference and called for papers that focus on transparency, fairness, accountability, privacy, and ethics in vision.
Specifically, in regard to ethics for XR, IEEE is laying down the foundation with standardization. As stated in IEEE Spectrum, “… the IEEE Standards Association (IEEE SA) is working to help define, develop, and deploy the technologies, applications, and governance practices needed to help turn metaverse concepts into practical realities, and to drive new markets.”
It’s also vital to keep in mind that this cutting-edge technology should be made accessible. For instance, it needs to accommodate people who are visually impaired. The study “Toward inclusivity: Virtual Reality Museums for the Visually Impaired” examines how narrations, spatialized “reference” audio, along with haptic feedback can be an effective replacement for the traditional use of vision in a virtual reality. The study discovered that those with visual impairments could locate objects more quickly with the aid of enhanced audio and tactile feedback.
Lastly, IEEE Transactions on Visualization and Computer Graphics (IEEE TVCG) conducted an analysis of gender representation among the attendees, organizers, and presenters at the IEEE Visualization (VIS) conference over the last 30 years. It was found that the proportion of female authors has increased from 9% in the first five years to 22% in the last five years of the conference.
The IEEE Computer Society urges academics and practitioners to send any ideas that may advance the dialogue to inclusion@computer.org, since efforts such as these have the potential to push the industry toward a brighter future.
IEEE Computer Society Fellow Greg Welch, a computer scientist and engineer, is the AdventHealth Endowed Chair in Healthcare Simulation in UCF’s College of Nursing and co-director of the UCF Synthetic Reality Laboratory. In 2021, Welch attained Fellow status for contributions to tracking methods in augmented reality applications. His primary area of study is virtual reality (VR) and augmented reality (AR), collectively known as “XR,” with a focus on both hardware and software applications.
Currently, Welch researches how humans perceive AR-related experiences when interacting with the technology. He also leads the pending NSF project “Virtual Experience Research Accelerator (VERA),” a system that will improve the process of generating VR-related research for scientists.
When asked what advice he had for readers interested in pursuing a similar path, Welch pointed to the benefits of ongoing exploration: “The field changes fast; something that is hot today might not be tomorrow. In addition, a broader perspective can enable one to see connections and opportunities.”
He also recommends taking advantage of community resources and networking opportunities: “From an experiential perspective, get involved! The community [IEEE Computer Society] would not exist without volunteers, but there are so many benefits; it really is true that you get out what you put in.”
Computer vision remains a dynamic and evolving field. Technological advances introduce new opportunities and efficiencies, and they are met with challenges in the form of new theoretical and societal considerations.
From privacy and algorithmic fairness to the feasibility of wide-scale adoption, this is one of the most exciting eras in computer vision. The market is expected to reach US $20.88 billion by 2030, growing 7% annually.
Here are a few key observations, developments, and considerations for the field, informed by insights from IEEE Computer Vision and Pattern Recognition Conference (CVPR).
“Half the papers in computer vision look like computer graphics. Instead of collecting data you can now simulate and that is very powerful.”
– Rama Chellappa, Johns Hopkins University
“NeRF research is a hot focus right now. It continues to generate jaw-dropping images and is a beautiful blend of computer graphics and computer vision. Computer vision scientists think of cameras as scientific measuring devices that can do more than capture visually pleasing 2D images. These algorithms are a continuation of that. The cameras will be designed to get better computational photography, unifying computer graphics, computational photography, and computer vision.”
– Kristin Dana, Rutgers University
“Another trend is content generation: DALL-E can now generate images out of OpenAI. It makes some computational sense that we should be able to do it. When we think and have a text description, our brains generate an image even though we haven’t seen it, like when we read a book and generate an image in our heads. The algorithms are capturing that ability, and it’s remarkable. But with these content generation algorithms comes the potential for bias, and we have our work ahead of us in considering how they can and should be used.”
– Kristin Dana, Rutgers University
“The community is at a unique junction where while some papers focus on core technical research combining classical and modern deep networks, others focus on classical problems and innovative solutions.”
– Richa Singh, IIT Jodhpur
“There’s a tendency to move from real data to synthetic data if it is working, if it is effective. Cameras can only capture what has happened; whereas synthesis can imagine and produce whatever you wish. So, there is more variety in the synthetic data. And the privacy concerns are less.”
– Rama Chellappa, Johns Hopkins University
“The Computer Vision, Pattern Recognition, and Machine Learning community at large is focusing on developing ingenious algorithms not only for difficult scenarios, unconstrained environments, but also being trustworthy and dependable.”
– Richa Singh, IIT Jodhpur