The Production-First Mindset

Munch CTO, Peter Naftaliev - Interacting With A System

February 20, 2022 Liran Haimovitch Episode 30

Rookout CTO Liran Haimovitch sits down with Peter Naftaliev who is the CTO at Munch.  They discuss his journey that brought him to Munch, what it means to build an automation platform for content creators, his experience launching an MVP (minimum viable product), and what he would’ve done differently if he could do it all again.

Painless Cloud-Native Debugging
Rookout is a disruptive developer solution for Cloud-Native debugging and live data collection.

SPEAKERS
Liran Haimovitch, Peter Naftaliev

Liran Haimovitch  00:02
Welcome to The Production-First Mindset. A podcast where we discuss the world of building code from the lab all the way to production. We explore the tactics, methodologies, and metrics used to drive real customer value by the engineering leaders actually doing it. I'm your host, Liran Haimovitch, CTO and Co-Founder of Rookout. This episode is about bringing MVPs to production. With us is Peter Naftaliev, Co-Founder of Munch, a media startup. He is a thought leader in image processing, computer vision, and AI. Thank you for joining us today and welcome to the show.

Peter Naftaliev  00:45
Thank you, Liran. Happy to be here.

Liran Haimovitch  00:47
Peter, can you share with us a little bit about yourself?
 
Peter Naftaliev  00:50
Yeah, of course. So I've been a programmer from a young age. I've been interested in AI since my teenage years, basically, a bit before it became mainstream. I have a blog for computer vision, computer graphics, and artificial intelligence. During COVID, that blog also evolved into an online community in this space. And with our community, we do Zoom events with researchers from around the world who work with computer vision and computer graphics, where the researcher teaches about their subject of expertise, their know-how, and the audience and I can join, ask questions, and get a bit more of an understanding of a specific topic in our field. These sessions are recorded and uploaded to YouTube, so there's also a YouTube channel. And basically, I turned into a micro-content creator. And I was exposed to all of the difficulties in creating content, be it verbal, video-based, blog-form, or written. I developed some automations for myself, for my own workflows, which helped me with producing my content. And then we started talking to other content creators who also have their own difficulties, their own focus, to see how we can help them, with the help of AI and automation, to make their work easier, more enjoyable, faster, cheaper. This evolved into my startup right now, which is called Munch. We're building an automation platform for content creators who create informative verbal content, and we help them create the content in the way they like most, in the format they like most and are used to. And with us, they can transform it to other formats and other content platforms. So actually, a good example would be this podcast that we're talking on right now. We will probably use Munch to transform it into a blog post, into LinkedIn posts, possibly also Twitter threads. And this is what we're doing: we're transforming between the different formats for video content, audio content, and textual content.
 
Liran Haimovitch  02:41
I've heard you've recently launched your MVP. Your first real website with an application. What was it like?

Peter Naftaliev  02:49
It was an interesting experience. So we're a very, very early-stage startup with very limited resources, and we've got to move quickly. So basically, our first software engineer started work on the first of December; by the sixth of December, we were writing our first pieces of code. And at the end of December, we needed to have an MVP out in production, working, and serving our users. Before that, what we were doing was transferring YouTube videos and podcasts into blog posts for our users, and we were actually using Google Docs as our product. Because we could put the text, the blog post, in a Google Doc, and then the user could look at it, change it, edit it, and publish the information from there however they please. But we wanted to make sure that our clients are getting a product that is Munch-branded, that they see our logo there, that they see the user interface there, that the user interface complies with what they need from our specific product. And on top of that, we wanted to start doing the basic tracking and understanding of how a user interacts with our system. When they get the draft blog post that we create for them with our automation, what do they look at? How do they change it? How do they edit it? Where do they say more, where do they say less? This is important for us for product reasons, so we can understand how to build a better product, but also because we want to make the engine better and better and improve it in the future. And for that we need a data set of exactly this: how a user gets an automatically created post and how they change it, so that the automation in the future will be better according to previous user behavior. And all of this needed to happen in about three weeks of coding. So we had our product definitions, the screens available, and we took it down to the real minimum. Our product manager would be ashamed to even call this an MVP; he's calling it a pre-MVP.
We took it down to the bare bones, the basic things required just to start running, and we started developing it in-house, internally, on our own computers.

Liran Haimovitch  04:52
So how did the launch go? What was good about it? What was not so good about it?

Peter Naftaliev  04:58
So in the beginning, when we were developing just on our own PCs, we were actually ahead of schedule. After a week and a half we already had the UI working, we had the frontend talking to the backend, everything looked amazing. We had some small UI bugs, some small backend bugs, but nothing that we thought, okay, will take us a while. We were a bit ahead of schedule, which was nice; we were even adding more features on top of that bare-bones thing we'd planned in the beginning. And then, luckily, we thought of saying, okay, it'll take us time to really upload everything to the cloud, really make it production-ready. So let's start about four days ahead of time, ahead of the launch schedule, and start already uploading everything to production. And then interesting stuff happened. Let's start from the one that really sat in my head the most. Basically, we have a frontend which is showing a full blog post with images, with formatting, with text. And it's long-form; sometimes an hour of somebody talking can be turned into tens of thousands of words for a post. This is a lot of data, not including the images. So one of the things I was really trying to focus on with the team was: when we're doing all our testing, it doesn't matter if the system is running locally or already on the cloud, let's test high-bandwidth, or more to say, memory-heavy posts, posts that are maybe a megabyte, two megabytes in size. And see how our text editor on the frontend is interacting with the backend, and how much time it takes, whether stuff is getting stuck, whether exceptions are happening. And somehow everybody did the tests, and it was all good. And then we uploaded it to production and started running for real, and we have users coming in. And no: a page takes 20 seconds, 30 seconds to load. What happened there? We used a plugin, a JavaScript-based text editor plugin for HTML.
And we were just uploading a full blob of text, with the images inside the blob, to that plugin, and it was taking a long time to render the images. What needed to happen was that those images needed to be saved not as a blob inside the full text, but as references to a storage. Then, when the plugin was working with a URL to a storage point, it could render these images separately from the entire post and basically have a type of lazy loading going on. The reason we didn't see this before was that we weren't working with exactly the same type of content, with exactly the same type of workflow. We were just copy-pasting on our own machines, and it was working nicely, until we saw: okay, it's not just copy-paste, you really have to try this in different ways you wouldn't have thought of before. So we had about half a day to switch from just a text blob to using Google Storage and connecting our plugin to the storage. And the way we save the text blob is now different, because within it, we need to save the URLs to the storage. That was one thing that was surprising, annoying, and yet somehow also expected. Another thing: on our own devices, we were doing `npm start`, and the code would run in a few seconds. And if there's a bug, you go directly to the line of code, you change it, and amazing, it all works, and you can continue to the next thing. Once you start working on the cloud — we were running, I think, on Google Cloud Run, either App Engine or Cloud Run; actually we had both at the same time. Why did we have both at the same time? Because it was our software engineer's first time experimenting with the Google Cloud platforms, and he found a good tutorial for how to put the backend on Google App Engine and the frontend on Cloud Run. So this is why we had the mix. Now, this is how it happens. You never plan for this in advance.
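The fix Peter describes — pulling inline images out of the post blob and keeping only storage references in the saved text — might be sketched roughly like this. All names here (`uploadToStorage`, `externalizeImages`, the example storage URL) are hypothetical stand-ins, not Munch's actual code; a real version would use a storage client such as Google Cloud Storage's.

```javascript
// Hypothetical stand-in for a real storage client: a real implementation
// would decode the base64 bytes, write them to a bucket, and return the
// object's public (or signed) URL.
async function uploadToStorage(dataUri, name) {
  return `https://storage.example.com/posts/${name}`;
}

// Find inline data-URI images in the post HTML and swap each one for a
// storage URL, so the frontend editor can lazy-load images instead of
// parsing megabytes of base64 up front.
async function externalizeImages(postHtml) {
  const dataUriPattern = /src="(data:image\/[a-z]+;base64,[^"]+)"/g;
  let index = 0;
  const replacements = [];
  for (const match of postHtml.matchAll(dataUriPattern)) {
    const url = await uploadToStorage(match[1], `img-${index++}.png`);
    replacements.push([match[1], url]);
  }
  let result = postHtml;
  for (const [dataUri, url] of replacements) {
    result = result.replace(dataUri, url);
  }
  return result;
}
```

The saved blob then contains only short URLs, and the editor plugin fetches each image separately — the "type of lazy loading" mentioned above.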
And then, the first search result you find on Google that is helpful and usable, you use it, and whatever it is, is what you stick with. Now, what happened there was that suddenly we basically had a CI/CD operation. Before that, nothing — we were just saving the code, `npm start`, and that's it. And now we have a full operation going with Google Cloud, and suddenly every code change takes 10 minutes for the cloud to rebuild, re-upload, recompile, and make available. Behind the scenes there are Docker images being compiled, all of these things that we didn't even work with before. So any small incremental step, any small bug fix we've got to do, will take us basically 20 minutes to check whether it's working or not. Suddenly you realize, okay, you can't be working on the main Git branch. You've got to be branching off, testing locally, and only then uploading to production. Okay. Next thing was: something is working on our local machine perfectly, but suddenly on the cloud it's not working. Why? And suddenly you realize, okay, I need logging. Because on my local machine I could debug, I could look at everything, I could do console log and everything, so I could see — but that's not really available for me on the cloud. What do I do? And here we were lucky, because Google has a log manager installed and available. That log manager is actually able to catch all the system logs, the console log messages, so we could look at all those messages going on internally in our code and debug, basically, through the log, which was enough for us at that stage. We could use it to really understand where our significant problems were, fix them, and then upload again. So here we were more lucky than smart, and the solution was quite easy to find and to use.
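The logging setup Peter is describing relies on documented Google Cloud behavior: the App Engine and Cloud Run runtimes capture anything written to stdout/stderr, and if a line is a single JSON object with a `severity` field, Cloud Logging parses it into a structured entry. A minimal sketch of a logger along those lines — the field names follow that convention, while the helper name and example fields are illustrative assumptions:

```javascript
// Emit one JSON object per line on stdout; Cloud Logging treats the
// "severity" and "message" fields specially and indexes the rest.
function log(severity, message, extra = {}) {
  const entry = { severity, message, ...extra };
  console.log(JSON.stringify(entry));
  return entry; // returned so callers (and tests) can inspect it
}

log('INFO', 'post rendered', { postId: 'abc123', ms: 412 });
log('ERROR', 'storage upload failed', { postId: 'abc123' });
```

Locally this is just `console.log`; in the cloud the same lines become searchable, filterable log entries — which is roughly why console-based debugging "was enough at that stage."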

Liran Haimovitch  10:20
So you've mentioned you've seen a lot of bugs and issues after you launched into the cloud. How did you identify those problems?
 
Peter Naftaliev  10:28
The main thing that helped us identify them was basically that the entire team was in testing mode. This was a full operation that happened internally in the company: all of our employees were doing testing on the system, uploading posts, checking them, editing them, copying them. And this was the first way we could find issues. Somebody is trying to log in to the system, and it's not working — okay, why? Something is broken behind the scenes. Somebody is trying to put in another link, it's not working — why? Specifically, we also put aside two days just for testing the system and really seeing those bugs. Later on, we had Mixpanel installed on our system, so we could track user behavior and see which screens users were getting stuck on. We also used Hotjar for that, so we could see which screens users were getting stuck on, and then we could go back and try the same link, try the same screen, see what's working and what's not, and through that we could identify the problems. We did not plan for having some sort of internal monitoring of the health of the system. We didn't have time for that. And now we're paying for it, you know, in retrospect, because we've got to be looking at everything the user does. And if a user is stuck, they're stuck, and we don't know it in advance. But this is part of startup life, I guess.

Liran Haimovitch  11:45
What would you have done differently, if you had to launch the same MVP all over again?

Peter Naftaliev  11:50
So, I think the important thing is to not think that we're really ahead of schedule before we're really in production. And the way to do this is to do the uploading to production, and the testing, ahead of schedule — and then find out your issues and plan accordingly. In advance, you can't know what you'll need access to; you can't think of all the variables while you're also thinking about how the user will use your system. But you've got to have some time available to do research and development for that uploading-to-production stage. It's not something that takes one day or two. Especially if you include testing, even for a micro-MVP, this is a process that could take as long as, and maybe even longer than, the original development of that thing on your local machine. You've got to plan accordingly. I think we didn't plan that timeframe right.

Liran Haimovitch  12:39
Getting to some level of production-grade deployment can often be tricky, especially if you haven't done it before, or at least not with this technology.

Peter Naftaliev  12:50
Exactly. I used to do CI/CD really hands-on, let's say eight years ago, maybe a bit more. AWS and Google Cloud were just starting to become available, so there weren't so many tools and products for that, and it was quite simple. You know, okay, there's a specific product to do this, and that's it. You don't need to go looking and experimenting. Today, for every problem you might have as a developer when you're uploading to production, you have at least two available solutions on the same platform. So Amazon will have at least two, Google will have at least two. And that's at least — and usually one is newer and the other one is older. And why? Because they had issues with one, and then they had a customer base wanting the other. But you don't know this in advance when you're researching how to solve the problem. You just see the full list of different products; they all say they solve the same problem, from the same provider. So you have no idea why there are even several products for the same thing. And once you start using one, you understand: whoa, it's not solving this specific logging thing that I need. Or here, if I want to do some configuration — for example, where I store passwords, how do I put them in when I'm deploying to production? Different products have different integrations with different things. You don't know this in advance, and you can't even think about searching for it, because there are tens, maybe hundreds, of these types of features you need when you're uploading to production. We actually found ourselves sending messages in a couple of groups of technical people, saying, hey, who has ever used Google Cloud for this or for that and can help us? Almost nobody knew. Almost nobody knew the specific thing that we needed, because there are so many products, and so many features.

Liran Haimovitch  14:34
Yeah, that's usually a bad sign. If you're asking about a technology and nobody can help you with it, it's probably not very popular, and you might not want to be using it.

Peter Naftaliev  14:44
Yes, this is true. What we found out was that Amazon's cloud products, for example, are much more mainstream in our community. Most of our friends know the issues and the good things with these products — less so with Google's. So internally we're actually now thinking of switching to Amazon. We'll see.
 
Liran Haimovitch  15:04
Awesome. So what does the future hold for Munch, for your product, for engineering?

Peter Naftaliev  15:12
So that's a good question. We have basically two fronts. Maybe let's even talk about Munch as a whole, and from this we can talk about the engineering. Munch is an interesting company, because it's a company that's building very non-trivial, AI-based tech for transforming content between formats. But at the same time, the product is also non-trivial. How does the user use the system? How do we — and do we — schedule connections to other platforms, other social networks? How does the frontend look — not just behind the scenes, the brain of the system, but also the front of it? How does the user interact with the system? So usually, startups try to focus on one issue: it'll be either creating a killer product and interface, or creating killer tech behind the scenes. We're doing both. So one of the challenges there — and this is maybe more on a strategic level for engineering — is: what type of workforce do we try to recruit? Are we going with good, smart, fullstack engineers who can do everything, including integrating with the right AI APIs? Or are we recruiting a deep research team behind the scenes that will sit in the bunker for half a year, a year, create the amazing tech, and then we'll launch it to the world? What we discovered is that we need both, running in parallel. So fullstack engineering will be creating the system, integrating with all the systems, experimenting with all these things we just talked about, and also now experimenting with how to connect, in production, to Twitter, to Facebook, to LinkedIn, to Medium, to WordPress — all of the platforms where content resides. And also, what are the limitations of these connections? Maybe you can't upload from your product to Medium. Maybe when you're connecting to Facebook, you can't do the specific automations that you need. Maybe you can't do the tracking that you want on Twitter. So we've got to be researching this space.
At the same time, we need to really be creating some sort of deep learning tech behind the scenes that is at the frontier of academia right now. And all the time, we're tracking user engagement, tracking users' edits, because this will be the data set that we need for our deep learning system behind the scenes.
 
Liran Haimovitch  17:33
That's awesome stuff. Now, you've mentioned that you've been coding ever since you were young, and you've been into AI for like forever. So there's one question I would love to ask you. And that's what's the single bug that you remember the most?

Peter Naftaliev  17:47
This was actually a JavaScript bug. And it wasn't my bug — it was really in the implementation. It was before, I think, before Chrome existed; on Internet Explorer, I think it was. So I had — I don't know if it was a modal window or a popup, I think a popup. I built a system where there's a popup coming up, a specific popup depending on what the user did, with specific information in the popup window. And the bug was that sometimes, no matter what you did in the original window, the popup stayed the same. So no matter if I changed some info that should change the popup, it remained the same. What was happening there? Why was this happening? The thing behind the scenes was that because the URL for the popup was staying the same — it doesn't matter that behind the scenes, maybe in the request, there are changes; the URL is the same — the browser was caching the page it should show for that URL. So it wasn't really running the logic behind the scenes that I needed it to. And the way to solve it, which took me a long time — there were no answers online to find out what was going on — was that I talked to my manager, and he said, you know, this sometimes happens with popup windows, try this thing. And the thing was: add another parameter onto the URL, a randomly generated parameter, just to make sure that the browser really goes back and gets the information that it needs. It was so annoying. It was such a long time wasted just on a stupid bug.
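The workaround Peter describes is the classic cache-busting trick: append a throwaway random query parameter so that every popup open gets a "new" URL and the browser fetches the page fresh instead of serving its cached copy. A minimal sketch — the function name and the `_` parameter name are arbitrary choices, not from the original system:

```javascript
// Append a random query parameter so the browser treats each open as a
// distinct URL and bypasses its page cache.
function cacheBustedUrl(url) {
  const separator = url.includes('?') ? '&' : '?';
  return `${url}${separator}_=${Math.random().toString(36).slice(2)}`;
}

// In the original-window code, something like:
// window.open(cacheBustedUrl('/popup.html?user=42'), 'details');
```

The server simply ignores the extra parameter; its only job is to make the URL unique per request.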

Liran Haimovitch  19:15
Definitely. Well, stupid bugs can be important.

Peter Naftaliev  19:19
This is why we pay engineers a lot of money.
 
Liran Haimovitch  19:22
Awesome, Peter, it's been great having you on the show. I wish you plenty of luck at Munch. And if anybody is interested in, you know, converting content and creating more content, you should definitely check it out.

Peter Naftaliev  19:34
Thank you, Liran, it was great being here.
 
Liran Haimovitch  19:42
So that's a wrap on another episode of The Production-First Mindset. Please remember to like, subscribe, and share this podcast. Let us know what you think of the show and reach out to me on LinkedIn or Twitter at @productionfirst. Thanks again for joining us.