Development of an improved version of the holographic physician assistant

As part of the project, an advanced real-time 3D communication system was developed. The work comprised research, technology implementation, and functional testing. The main goal was to create an application for HoloLens 2 devices that transmits and receives sound and 3D images between two users in different locations.

First, it was established that spatial images can be recorded with Intel RealSense cameras at the desired specifications. It was also confirmed that such recordings can be displayed at the intended resolution on Microsoft HoloLens 2 goggles. The most challenging part of this stage was playing the recording in real time. Spatial recordings contain so many triangles that their number has to be reduced: unreduced meshes render unstably or slowly on the GPU embedded in the goggles and lose fluidity, while heavy reduction lowers data quality. For example, a one-minute spatial recording at 1280×720 and 30 fps was processed on a computer with an Intel Core i9 processor, 32 GB of RAM, and an RTX 2080 graphics card. Data preparation (frame synchronization, filtering, calibration) took several minutes; depth data processing and 3D mesh creation took from several to a dozen or so minutes, depending on the scene’s complexity; and rendering the 3D animation took from a few minutes to an hour, depending on the desired level of detail and effects, with the number of triangles limited to a maximum of 200,000.
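
To give a sense of scale for the 200,000-triangle limit, the sketch below estimates how many triangles a single frame would hold if every 2×2 block of depth pixels were split into two triangles. This naive grid triangulation is an assumption made for illustration and is not necessarily how the project built its meshes; the function names are hypothetical, and the limit is assumed to apply per frame.

```python
def grid_triangle_count(width: int, height: int) -> int:
    """Triangles in a naive mesh that splits every depth-pixel quad in two.

    Assumption: one vertex per depth pixel, no holes and no filtering.
    """
    return 2 * (width - 1) * (height - 1)


def required_reduction(width: int, height: int, budget: int) -> float:
    """Factor by which the naive mesh must shrink to fit the triangle budget."""
    return grid_triangle_count(width, height) / budget


if __name__ == "__main__":
    full = grid_triangle_count(1280, 720)
    factor = required_reduction(1280, 720, 200_000)
    print(f"naive mesh: {full:,} triangles per frame")
    print(f"reduction needed for a 200,000 budget: about {factor:.1f}x")
```

Under these assumptions a 1280×720 frame yields 1,839,202 triangles, roughly 9.2 times the 200,000 limit, which illustrates why mesh reduction dominates the processing cost.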

Based on these results, it was concluded that, despite the high-performance hardware, real-time playback is not feasible with the current equipment if high image detail is required. Conversion time is proportional to recording length: it is estimated that 1 second of recording currently requires about 15-20 seconds of conversion, so a one-minute recording takes roughly 15-20 minutes. With better optimization this could fall below 15 minutes, but real-time use calls for a much shorter time. These figures do not include the time needed to launch the application and establish stable data transmission between the application and the Microsoft HoloLens 2 goggles. During testing, real-time transmission was sometimes achieved, but only with data optimization and conversion that significantly reduced the model’s utility and relevance. For example, a 3D model of a human head appeared as an oval without visible ear contours or any facial anatomy. Microservices and containerization could be used to scale the application’s performance; this would require additional work but might reduce conversion time while maintaining an appropriate level of output quality.
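
The estimate above can be reproduced with a short calculation. The sketch below uses only the figures quoted in the text (15-20 seconds of conversion per second of recording, 30 fps); the function names are hypothetical, and spreading the conversion cost evenly over frames is a simplifying assumption.

```python
def conversion_minutes(recording_s: float, slowdown: float) -> float:
    """Conversion time in minutes when 1 s of recording costs `slowdown` s."""
    return recording_s * slowdown / 60.0


def per_frame_cost_ms(fps: float, slowdown: float) -> float:
    """Average processing time per frame, assuming the cost is spread evenly."""
    return slowdown * 1000.0 / fps


if __name__ == "__main__":
    budget_ms = 1000.0 / 30  # time available per frame at 30 fps
    for slowdown in (15, 20):
        minutes = conversion_minutes(60, slowdown)
        cost_ms = per_frame_cost_ms(30, slowdown)
        print(f"slowdown {slowdown}x: {minutes:.0f} min per recorded minute, "
              f"{cost_ms:.0f} ms per frame against a {budget_ms:.1f} ms budget")
```

In other words, each frame currently costs about 500 to 667 ms on average, while real-time playback at 30 fps leaves about 33 ms, so the pipeline would need to become roughly 15 to 20 times faster.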

Next, the focus shifted to enabling voice communication between users. External services such as Azure Communication Services were used to transmit sound between devices. WebRTC was used to record sound from the microphones embedded in HoloLens 2 and transmit it to the recipient. A mechanism was created to simplify connection establishment and session management. The arrangement is comparable to embedding a separate communication application within our application by means of the external service. The external service handles the entire transfer, including the fluidity and quality of the transmission, which reduces the load on the local server and makes the solution more universal and compatible.

Data security was another key aspect of the project. Since the QNAP TS-251+ used in the project does not support Full Disk Encryption (FDE), other encryption techniques that meet the same requirements were employed. Data is sent to the server over encrypted SSL/TLS connections, which secures it in transit between the device and network clients. Data stored on the server is encrypted immediately upon receipt, with the hardware using 256-bit AES encryption. This ensures a high level of data security and meets the security requirement specified in the milestone.
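
As an illustration of the in-transit part, the sketch below shows how a client could build a TLS context with Python's standard `ssl` module that verifies the server certificate and refuses protocol versions older than TLS 1.2. It is a minimal example under assumed requirements, not the project's actual configuration; the function name and the minimum version are assumptions.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate verification enabled.

    Assumption: TLS 1.2 is taken as the minimum acceptable protocol version.
    """
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = make_client_tls_context()
    print("certificate required:", ctx.verify_mode == ssl.CERT_REQUIRED)
    print("hostname checked:", ctx.check_hostname)
    print("minimum version:", ctx.minimum_version.name)
```

Such a context could then be passed to a standard-library client, for example `http.client.HTTPSConnection(host, context=ctx)`, where `host` stands for the server's address.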

Finally, communication between the holographic application and the AI module was a significant element of the project. The AI module was integrated with the application, allowing for automatic data processing and intelligent responses to user interactions. The AI module also assisted in optimizing image rendering and sound processing, contributing to increased application performance and stability.

Development of HoloLens 2 Application: An application was developed that enables the transmission and reception of sound and 3D images between users in different locations.
3D Image Recording and Display: It was established that spatial images could be recorded using Intel RealSense cameras and displayed on Microsoft HoloLens 2 goggles.
Real-Time Challenges: Issues related to real-time playback of spatial recordings were identified, linked to hardware limitations and data processing complexity.
Voice Communication Integration: Voice communication between devices was implemented using Azure Communication Services and WebRTC technology, enabling easy connection establishment.
Data Security: Advanced data encryption techniques were applied during transmission and storage, meeting the project’s security requirements.
AI Module Integration: The AI module was integrated with the application, supporting automatic data processing and image rendering optimization, which increased application performance and stability.