by Dan Ruta
How you can use AI, AR, and WebGL shaders to assist the visually impaired
Today, about 4% of the world’s population is visually impaired. Tasks like simply navigating across a room, or walking down a street, pose real dangers they have to face every day. Current technology-based solutions are either too inaccessible or too difficult to use.
As part of a university assignment, we (myself, Louis, and Tom) devised and implemented a new solution. We used configurable WebGL shaders to augment a video feed of a user’s surroundings in real-time. We rendered the output in an AR/VR format, with effects such as edge detection and color adjustments. Later, we also added color blindness simulation, for designers to use, as well as some AI experiments.
We did a more in-depth literature review in our original research paper. ACM published a shorter, two-page version here. This article focuses more on the technologies used, as well as some further use cases and experiments such as AI integration.
A popular approach we found in our studies of existing solutions was the use of edge detection for detecting obstacles in the environment. However, most solutions fell short in terms of usability, hardware accessibility, or portability.
The most intuitive way we could think of to present feedback to the user was through a VR headset. While this meant that the system would not help the most severely visually impaired people, it would be a much more intuitive system for those with partial sight, especially those with blurry vision.
Edge detection
Feature detection, such as edge detection, is best done using 2D convolutions, which are even used in deep learning (convolutional neural networks). Simply put, these are dot products of a grid of image data (pixels) against the weights in a kernel/filter. In edge detection, the output is higher (more white) where the pixel values line up with the filter values, indicating an edge.
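To make that concrete, here is a minimal JavaScript sketch of a single convolution step (the helper and its signature are illustrative, not code from our implementation):

```js
// Minimal sketch: 3x3 convolution at pixel (x, y) of a grayscale image.
// `gray` is a Float32Array of luminance values, `width`/`height` its
// dimensions, and `kernel` a 9-element array of filter weights.
function convolveAt(gray, width, height, x, y, kernel) {
  let sum = 0;
  for (let ky = -1; ky <= 1; ky++) {
    for (let kx = -1; kx <= 1; kx++) {
      // Clamp coordinates at the image borders
      const px = Math.min(width - 1, Math.max(0, x + kx));
      const py = Math.min(height - 1, Math.max(0, y + ky));
      sum += gray[py * width + px] * kernel[(ky + 1) * 3 + (kx + 1)];
    }
  }
  return sum; // higher magnitude = stronger edge response
}

// Example kernel: horizontal 3x3 Sobel filter
const sobelX = [-1, 0, 1, -2, 0, 2, -1, 0, 1];
```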
There are a few options available for edge detection filters. The ones we included as configurations are Frei-Chen, and the 3x3 and 5x5 variants of Sobel. They each achieve the same goal, but with slight differences. For example, the 3x3 Sobel filter was sharper than the 5x5 filter, but picked up more noise from textures such as fabric.
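For reference, this is roughly what the 3x3 Sobel variant looks like as a WebGL fragment shader, embedded here as a JavaScript template string. The uniform and varying names are illustrative, not WebSight's actual source:

```js
// Sketch of a 3x3 Sobel edge detection fragment shader.
const sobelFrag = `
  precision mediump float;
  uniform sampler2D u_texture; // current video frame
  uniform vec2 u_texelSize;    // 1.0 / texture resolution
  varying vec2 v_uv;

  // Grayscale value of the pixel at the given offset (in texels)
  float luma(vec2 offset) {
    vec3 rgb = texture2D(u_texture, v_uv + offset * u_texelSize).rgb;
    return dot(rgb, vec3(0.299, 0.587, 0.114));
  }

  void main() {
    // Horizontal and vertical gradient responses
    float gx = -luma(vec2(-1.,-1.)) - 2.*luma(vec2(-1.,0.)) - luma(vec2(-1.,1.))
             +  luma(vec2( 1.,-1.)) + 2.*luma(vec2( 1.,0.)) + luma(vec2( 1.,1.));
    float gy = -luma(vec2(-1.,-1.)) - 2.*luma(vec2(0.,-1.)) - luma(vec2(1.,-1.))
             +  luma(vec2(-1., 1.)) + 2.*luma(vec2(0., 1.)) + luma(vec2(1., 1.));
    float edge = length(vec2(gx, gy)); // gradient magnitude: white = edge
    gl_FragColor = vec4(vec3(edge), 1.0);
  }
`;
```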
The web platform
The primary reason we chose the web as a platform was its wide availability and compatibility across almost all mobile devices. It also benefits from easier access, compared to native apps. However, this trade-off came with a few issues, mostly in terms of the set-up steps a user would need to take:
- Ensure network connectivity
- Navigate to the web page
- Turn the device to landscape mode
- Configure the effect
- Enable VR mode
- Activate full screen mode (by tapping the screen)
- Slot the phone into a VR headset
To avoid confusing a non-technical user, we created the website as a PWA (progressive web app), allowing the user to save it to their Android home screen. This ensures it always starts on the correct page, landscape mode is forced on, the app is always full screen, and it is not reliant on a network connection.
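Most of that behavior comes from the web app manifest. A minimal sketch of a manifest.json along these lines (the values are illustrative, not WebSight's actual configuration):

```json
{
  "name": "WebSight",
  "start_url": "/",
  "display": "fullscreen",
  "orientation": "landscape",
  "background_color": "#000000",
  "icons": [
    { "src": "icon-192.png", "sizes": "192x192", "type": "image/png" }
  ]
}
```

Offline use additionally needs a service worker that caches the app shell.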
Performance
Early JavaScript prototypes ran nowhere near our 60fps target, due to the very expensive convolution operations. Suspecting that the bottleneck was JavaScript itself, we attempted a WebAssembly version, but the resulting prototype ran even slower. This was most likely due to the overhead of passing the video frame data to the WebAssembly code and back.
So instead, we turned to WebGL shaders. Shaders are awesome because they run a small piece of code (the shader) in parallel across all of the texture (video feed) pixels. To maintain high performance while keeping a high level of customization, the shader code had to be spliced together and re-compiled at run-time as configurations changed. With this, we managed to stay within the 16.7ms frame budget needed for 60fps.
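Conceptually, the splicing is just string assembly followed by a fresh compile whenever the user changes a setting. A rough sketch using the raw WebGL API (the snippet constants and config fields are hypothetical):

```js
// Sketch: assemble a fragment shader from the enabled effect snippets.
function buildFragmentSource(config) {
  const parts = [HEADER_SNIPPET]; // precision, uniforms, varyings
  if (config.edges) parts.push(SOBEL_SNIPPET);
  if (config.colorBlindness) parts.push(COLOR_MATRIX_SNIPPET);
  parts.push(MAIN_SNIPPET); // main() that chains the enabled effects
  return parts.join("\n");
}

// Standard WebGL shader compilation.
function compileShader(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error(gl.getShaderInfoLog(shader));
  }
  return shader;
}

// Re-run whenever the configuration changes:
// const frag = compileShader(gl, gl.FRAGMENT_SHADER, buildFragmentSource(config));
```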
Feedback
We carried out some user testing. We tested some basic tasks like navigation, and collected some qualitative feedback. This included suggested adjustments to the UI, a suggestion to add an option for configuring the colors of the edges and surfaces, and a remark that the field of view (FoV) was too low.
Both software improvement suggestions were applied. The FoV was not something that could have been fixed through software, due to camera hardware limitations. However, we managed to find a solution in the form of cheaply available phone-camera fish-eye lenses, which expanded the FoV optically instead of digitally.
Other than that, the system surpassed initial expectations, but fell short at reading text, due to each character producing two sets of edges. Low light performance was also usable, despite the introduction of more noise.
Some other configurations we included were the radius of the effect, its intensity, and color inversion.
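Under the hood, settings like these map naturally onto shader uniforms. A sketch of how such a snippet could be wired up (all names illustrative):

```js
// Sketch: per-effect settings exposed to the shader as uniforms.
const settingsSnippet = `
  uniform float u_intensity; // 0.0-1.0: how strongly the effect is blended in
  uniform float u_radius;    // effect radius around the view center, in UV units
  uniform bool  u_invert;    // invert the final colors

  vec3 applySettings(vec3 original, vec3 effect, vec2 uv) {
    // Only apply the effect within u_radius of the center of the view
    float inRadius = step(distance(uv, vec2(0.5)), u_radius);
    vec3 color = mix(original, effect, u_intensity * inRadius);
    return u_invert ? (vec3(1.0) - color) : color;
  }
`;
```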
Other use cases
An idea we had was to add shader effects that simulate various types of color blindness, providing an easy way for designers to detect color blindness related accessibility issues in their products, be they software or otherwise.
Using RGB ratio values found here, and turning off edge detection, we were able to add basic simulations of all major types of color blindness through extra, toggle-able components in the shaders.
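Each simulation boils down to multiplying the pixel color by a 3x3 matrix of RGB ratios. As an illustration, here is a sketch using a commonly cited approximation for protanopia (the values below are that common approximation, not necessarily the exact ratios WebSight uses):

```js
// Sketch: color blindness simulation as a 3x3 color matrix multiply.
const protanopiaSnippet = `
  // GLSL mat3 is column-major: each line below is one column.
  const mat3 protanopia = mat3(
    0.567, 0.558, 0.0,
    0.433, 0.442, 0.242,
    0.0,   0.0,   0.758
  );

  vec3 simulateProtanopia(vec3 color) {
    return protanopia * color;
  }
`;
```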
AI and future work
Although it’s an experiment still in its very early stages, higher-level object detection can be done using tensorflowjs and tfjs-yolo-tiny, a tensorflowjs port of tiny-yolo, a smaller and faster version of the YOLO object detection model.
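A rough sketch of what a detection call looks like (the API usage is from my reading of the library's README, and the frame pre-processing details may vary between versions):

```js
import * as tf from '@tensorflow/tfjs';
import yolo, { downloadModel } from 'tfjs-yolo-tiny';

async function detectObjects(videoElement, model) {
  // Convert the current video frame into the 416x416 input tiny-yolo expects
  const input = tf.tidy(() => {
    const frame = tf.browser.fromPixels(videoElement);          // HxWx3 uint8
    const resized = tf.image.resizeBilinear(frame, [416, 416]); // network input size
    return resized.expandDims(0).toFloat().div(255);            // add batch dim, normalize
  });
  const boxes = await yolo(input, model); // [{top, left, bottom, right, classProb, className}, ...]
  input.dispose();
  return boxes;
}

// const model = await downloadModel(); // fetches the tiny-yolo weights
// const boxes = await detectObjects(video, model);
```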
The next step is to get instance segmentation working in a browser, with something similar to Mask R-CNN (though it may need to be smaller, like tiny-yolo), and add it to WebSight, to highlight items with a color mask instead of boxes with labels.
The GitHub repo is here, and a live demo can be found at https://websight.danruta.co.uk. Do note that until Apple provides support for the camera API in browsers, it might not work on Apple phones.
Of course, I had some extra fun with this as well. Being able to edit what you can see around you in real time opens up a world of opportunities.
For example, using a Matrix shader, you can feel like The One.
Or maybe you just enjoy watching the world burn.
You can tweet more shader ideas at me here: @DanRuta