Eric: Hi guys, this is Eric Chevallet from BearingPoint, and you are listening to Five Minutes Ahead. Today I have the pleasure of having with me Rafael Pagés, co-founder and CEO of Volograms. So Rafa, I'll let you introduce yourself and tell us what you do.
Rafael: I'm the co-founder and CEO of Volograms. We are an Irish startup based in Dublin, and what we do is use AI technology to turn videos and photos of humans into 3D models. So you can create holographic messages just by recording them with a single camera.
From dodging a bullet to holographic models
Eric: How did you come to the idea of the hologram? What was your first intention?
Rafael: Volograms started in 2018, when the HoloLens, the Oculus Rift and so on were announced. The way we started was basically trying to replicate the bullet-time effect from the movie The Matrix, but with far fewer cameras: the original setup used about a hundred cameras placed in a ring inside a green-screen studio, and we were trying to do it with five phones. To make that possible, we built a 3D reconstruction algorithm that worked with a very small number of cameras. That's how we started creating these holographic models.
Moving from 100 cameras to 1 camera
Eric: How complicated is it to move from a hundred cameras to five, and then to one? Because now you are using only one camera, facing the person recording their hologram, right?
Rafael: Yeah, it is complicated, but there was a road to get to that point. We started out trying to build a solution that would work with almost any volumetric capture setup, which meant we had to take care of the capture side, the processing side and the delivery: many different parts of a very large 3D reconstruction pipeline. Once we decided to do it with one single viewpoint, we had to change this pipeline significantly, but we already had parts of it that were very similar. We had to change the way we were doing things from a business point of view, from a marketing point of view, and of course from a technology point of view, but we wouldn't have been able to do something like that without first doing professional high-end captures. There's an important point to mention there: we do this with AI, which means you need a lot of data to train the AI algorithms to learn how to produce a 3D model of a person. So having the professional business before the monocular business helped us generate a massive database of realistic 3D models of humans that we then use for training.
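To picture the three stages Rafael describes, here is a minimal, purely illustrative Python sketch of a capture–processing–delivery pipeline. Every name, type, and the placeholder URL below are hypothetical; this is not Volograms' code.

```python
# Illustrative sketch only: the three stages described above, as a
# minimal pipeline. All names and types are hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass
class Capture:
    frames: List[tuple]   # per time-step, one image per camera

@dataclass
class Vologram:
    meshes: List[dict]    # one textured 3D mesh per time-step

def reconstruct(views: tuple) -> dict:
    # Placeholder for the 3D reconstruction step; with a single viewpoint,
    # this is where a trained AI model would fill in unseen geometry.
    return {"vertices": [], "faces": [], "texture": None}

def capture(camera_streams: List[list]) -> Capture:
    # The same shape works for 100 cameras, 5 phones, or a single phone.
    return Capture(frames=list(zip(*camera_streams)))

def process(cap: Capture) -> Vologram:
    return Vologram(meshes=[reconstruct(v) for v in cap.frames])

def deliver(vologram: Vologram) -> str:
    # Package the result so it can be shared, e.g. as a web link.
    return "https://example.com/vologram/123"  # hypothetical URL
```

The structural point is that moving from many cameras to one mainly swaps out the reconstruction step; the capture and delivery stages keep the same shape.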
Training the AI algorithm to build a 3D capture
Eric: That's an interesting comment, because I think people confuse generative AI with AI in general, and it's not very clear. So, can you explain in a bit more detail what you mean by training the AI algorithm to recognize, or to be able to build, a 3D capture?
Rafael: There are a lot of points here. When you're talking about AI, and generative AI, which is just one part of what AI is, you're talking about teaching a computer or an algorithm to think, basically, like a human does. That's of course oversimplified, but what you do as a human is learn by seeing things many times and picking up patterns in them, right? You don't learn what an apple is by getting a set of instructions: it must be round, it must be green or yellow or red, it must come from this type of tree. No, you see an apple again and again, and you get used to seeing it. Most AI works this way, so sometimes it needs to see a lot of data to be able to find the patterns in an image, to recognize a person, to identify something, or even to predict what somebody is going to say. So this is what I mean by training: we basically taught an algorithm to produce the shape of a person by looking at that person from one single angle. And for that it needs to see a lot of 3D models, because otherwise it's impossible for the algorithm to know how the person should look.
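As an illustration of what training on 3D models can look like, here is a hedged PyTorch sketch in the spirit of pixel-aligned implicit functions: a network learns to predict, from a single photo, whether any given 3D point lies inside the person. The architecture, names, and sizes are all assumptions for illustration, not Volograms' actual model.

```python
# Illustrative sketch, not Volograms' code: single-view 3D occupancy.
import torch
import torch.nn as nn

class SingleViewOccupancyNet(nn.Module):
    """Predicts whether a 3D query point is inside the person's body,
    conditioned on features from one RGB image."""
    def __init__(self):
        super().__init__()
        # Image encoder: collapses the photo into a global feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Decoder: image feature + (x, y, z) query point -> occupancy logit.
        self.decoder = nn.Sequential(
            nn.Linear(64 + 3, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, image, points):
        feat = self.encoder(image)                                # (B, 64)
        feat = feat.unsqueeze(1).expand(-1, points.shape[1], -1)  # (B, N, 64)
        return self.decoder(torch.cat([feat, points], dim=-1))    # (B, N, 1)

model = SingleViewOccupancyNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

# Each training sample pairs one photo with 3D points sampled around the
# ground-truth scan, labelled inside (1) or outside (0) the body.
image = torch.rand(8, 3, 256, 256)                   # stand-in for photos
points = torch.rand(8, 1024, 3)                      # stand-in for 3D points
labels = torch.randint(0, 2, (8, 1024, 1)).float()   # stand-in for labels

loss = loss_fn(model(image, points), labels)
loss.backward()
optimizer.step()
```

The essential idea matches what Rafael describes: the ground-truth 3D scans supply the supervision, so the network can learn what the unseen side of a person should look like.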
Eric: How many people did you use to train the model?
Rafael: We have a database of more than 60,000 3D models in our backend that we use for training. Of course, some of these captures are of the same person, so we have to be very careful the neural network doesn't overfit to a specific person.
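One common way to guard against this kind of per-person overfitting is to split training and validation data by person rather than by capture, so the same individual never appears on both sides. A hypothetical sketch, with an assumed data layout:

```python
# Hypothetical sketch: hold out whole people, not individual captures,
# so the network cannot simply memorise specific individuals.
import random

captures = [
    {"capture_id": "c001", "person_id": "p01"},
    {"capture_id": "c002", "person_id": "p01"},  # same person, new capture
    {"capture_id": "c003", "person_id": "p02"},
    # ... tens of thousands of entries in practice
]

person_ids = sorted({c["person_id"] for c in captures})
random.seed(42)
random.shuffle(person_ids)

val_people = set(person_ids[: len(person_ids) // 10])  # hold out 10% of people
train = [c for c in captures if c["person_id"] not in val_people]
val = [c for c in captures if c["person_id"] in val_people]
```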
Eric: And the model, is it still learning, or do you reach a level where you don't need to train it anymore?
Rafael: We are training it all the time, in different ways. Of course, you can refine it and continue training it so it gets a little bit better, but we also improve the input data that we provide to it. When we started, we were only giving it the segmentation mask, so the person extracted from the background, plus the colour image, and that was everything it needed. But over time we started to add more layers on top of that. For instance, now we also have semantic segmentation: we give it labels saying this is the jumper, these are the trousers, this is hair, this is the right arm, this is the left arm, things like that.
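To make those input layers concrete, here is a small illustrative sketch of stacking the channels Rafael mentions: the colour image, the binary person mask, and a one-hot semantic segmentation. The channel layout and the list of parts are assumptions, not Volograms' actual format.

```python
# Illustrative sketch of assembling the network input described above.
import numpy as np

H, W = 256, 256
rgb = np.random.rand(3, H, W).astype(np.float32)            # colour image
mask = (np.random.rand(1, H, W) > 0.5).astype(np.float32)   # person vs background

NUM_PARTS = 6  # e.g. hair, face, jumper, trousers, left arm, right arm
labels = np.random.randint(0, NUM_PARTS, (H, W))            # per-pixel part IDs
# One-hot encode the semantic segmentation into extra input channels.
semantic = np.eye(NUM_PARTS, dtype=np.float32)[labels].transpose(2, 0, 1)

# Early versions used only rgb + mask; stacking the semantic layers gives
# the network explicit hints about body parts and clothing.
net_input = np.concatenate([rgb, mask, semantic], axis=0)   # (3+1+6, H, W)
```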
Processing the captures
Eric: How long does it take to rebuild the model?
Rafael: For the 30-second videos that you were recording for your podcast, I think it takes more or less one hour to process.
Eric: Yeah, and after one hour, do you have to rework the results by hand, or are they already at the level that we are seeing?
Rafael: No, this is not a fully automated process: we do a quality check just to make sure that everything went well, but there's no manual clean-up step where somebody takes the 3D models and starts changing them. It's more to make sure that nothing went wrong and that the result we're delivering is good, but there's no manual intervention once the models are processed.
Apple Vision Pro headset
Eric: Last question … We talked about AR and holograms. Apple's announcement of the Vision Pro headset, will it change anything for you?
Rafael: Hopefully, yes. I think everyone in the industry was pretty excited one way or another about the announcement. It is an expensive device, of course, but it is basically a way of validating a whole industry that I think a lot of people still have questions about. And when you get Apple into the space, it's perfect, because it's a new platform with a very well supported developer ecosystem that is going to need 3D models and 3D assets for its new experiences. The same way that you shared your Vologram message as a web link and people were able to see it in a browser on their phones, we expect there will be a way of doing that here too, so your Vologram message will show up directly in the 3D space of the Apple Vision Pro.
Eric: Well, I can't wait to see the result. So Rafa, thanks a lot for this conversation. It was very insightful, and at least now I will be able to explain to my kids what AI is using your apple example, because they know how to ask ChatGPT to do their homework for them, but not much about what's behind it. So thanks a lot for this, and to all of you, see you soon. Thank you, Rafa.
Rafael: Thank you very much.
Eric: Bye-bye guys.