Your AI Injection

Speak Directly to Your Data, No Coding Required with Sarah Nagy

Season 4 Episode 7

What if anyone in your company, regardless of technical prowess, could simply talk to their data? In this episode, host Deep Dhillon is joined by Sarah Nagy, CEO of Seek.ai, to explore the revolutionary shift in how businesses interact with their most valuable resource—data. Nagy discusses how AI, powered by natural language processing, is transforming data access by allowing non-technical users to ask complex questions and get instant, insightful answers. As businesses race to embrace this technology, the conversation raises critical questions about the future of AI and its role in shaping decision-making across industries.

Learn more about Sarah here: https://www.linkedin.com/in/sarah-nagy/
and Seek AI here: https://www.linkedin.com/company/seekai/

[Automated Transcript]

Deep:
Hello, I'm Deep Dhillon, your host. And today on Your AI Injection, we're joined by Sarah Nagy, co-founder and CEO of Seek.ai. Sarah's got an extensive background in data science and finance, and she's led quantitative efforts at both Predata and Edison.

She's now focused on transforming how businesses interact with their data using natural language processing. Sarah, thanks so much for coming on.

Sarah: Thanks so much. Really excited to be here.

Deep: Awesome. So maybe this is the question I'd love to get started with: tell us what inspired you to create Seek, and maybe anchor it in what specific problem people have when they don't use your tool.

What do they do without your tool and how is it different once they have your tool? 

Sarah: I think it's helpful to start with my background, because what led me to start Seek was solving a real pain point I had. I was a data scientist for many years.

I worked at a couple of very large financial institutions and a couple of startups. And everywhere I worked, I was experiencing this same problem: I wanted to be doing research, but instead I was spending so much of my time stopping here and there just to help business people get very basic data that they needed.

Sarah: But they just couldn't access it with the tools that they had. It was a lot of people coming to my desk, people messaging me on Slack asking for help. It was this total inefficiency that became more and more of a pain point throughout my career. And in parallel, because I was a data scientist, I had worked on some NLP, natural language processing, applications.

And so I had the chance to witness LLMs very early on in their development. I quickly saw that LLMs were getting much, much better at things like code generation. And so it occurred to me very early on that AI really would be the future of this kind of work.

I really just saw that vision very clearly, and it gave me enough conviction to quit my job and start a whole company around it. So that was the origin story of Seek.

Deep: I think that resonates. In every org, there are a lot of less technical folks alongside the people doing deeper data science, and they might have some basic database skills. This is a problem a lot of folks have, right? In every org, even if we go back quite a ways, even 30 years, there have been a lot of folks who need access to the data of the business. Typically, at least in the last 10 or 15 years, that data is stored in a data warehouse. They want to be able to access it, but maybe they don't quite have the skills of a deeper data scientist, or of somebody who's really comfortable running around a database and yanking out what they need.

And yeah, there's this de facto thing that emerges where people roam around and know who to ask to get stuff done. So kudos to you for seeing that as a problem to be solved in a larger sense than just your immediate job. That's pretty awesome.

Sarah: Yeah, thanks. I would even go one step further and say, yeah, some business people know how to work with large datasets and databases.

And that's great, but it's also just a really hard skill set to learn. And all that knowledge of the data, staying up to date about what it all means, I just feel like that's a full-time job. In my opinion, it's just not reasonable to expect that of someone whose job isn't working with data.

I think it's unreasonable to expect them to be that level of data literate. That's almost like telling someone they need to have two jobs. Whereas by abstracting away a lot of that in-the-weeds knowledge about the data, all of a sudden it becomes a lot more reasonable. You're still expecting a business person to be data literate, that's not going away, but the AI is able to automate all the little details so they don't have to remember them.

And that's the big difference. 

Deep: Walk me through what a new customer does with Seek. You've got to plug your data assets into the system somehow, so maybe walk me through how that works. Sometimes it's not all consolidated; a lot of times in these orgs, you've got data all over the place.

Like, how do you address that problem? Maybe walk us through the ingress process and the plugging-into-the-system process.

Sarah: Sure. So the main way that Seek works with data is by connecting to the source where most of the larger datasets live, the data that the data team is usually going to write queries against or do analysis on.

So typically it's some sort of data warehouse or data lake. Snowflake and Databricks are our two most popular connectors, but we can connect to pretty much any database. Another thing that our customers really like is that Seek can actually connect to multiple sources of data. It's not unusual, actually, for an enterprise to have both Snowflake and Databricks.

That might be surprising to some, but it's pretty commonplace in my experience, and Seek can actually connect to both. So that's a feature we have that customers like, and that's the main way that Seek connects to the data. From there, there are all these little features that our customers can configure.

We actually have quite a few enterprise customers at this point. Many are as large as the Fortune 500, or even the Fortune 100 or Fortune 50. And so we're pretty used to a lot of governance requirements, things like having multiple user groups and permissioning different subsets of data to each user group.

And another cool thing about Seek is that we're actually a multi-agent platform. Part of what that means is that you can create different agents, so to speak, for different subsets of data, and they can work with different user groups. So that's another cool feature we have, and it's something we set up with our customers in the beginning.

That setup means actually helping them configure these different agent-type systems to work with these different groups within the business. And we expect that to expand over time within a business.

Deep: So walk me through this agent concept a little bit. What is it that makes an agent?

Is it that there's a different discourse style and presumption of knowledge with that particular set of groups, so the LLM talks to them differently or something? Or is it access to different data sources in particular, like this team works with these tables or that warehouse? Or is it something else?

Sarah: For our purposes, it's pretty much all of the above. I think the biggest differentiator is what data an agent is connected to. So a business, for example, could have an HR agent, a finance agent, a procurement agent, just to name a few. And the biggest differentiator for each of them is what kinds of tasks they can help the different user groups with.

And of course, you definitely don't want the procurement people talking to the HR agent, because they could ask questions about very sensitive information, like people's salaries. So the biggest differentiator actually is a lot of this governance and data-access type stuff.
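To make that concrete, here's a minimal sketch of what a per-group agent configuration could look like. Seek hasn't published its actual schema, so every name and field below is illustrative, not Seek's API.

```python
# Hypothetical mapping of agents to user groups and the data each agent
# may touch. Purely illustrative; not Seek's actual configuration format.
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    name: str                     # e.g. "hr_agent"
    user_groups: list[str]        # who is allowed to talk to this agent
    schemas: list[str]            # warehouse schemas it can query
    blocked_columns: list[str] = field(default_factory=list)

AGENTS = [
    AgentConfig("hr_agent", ["hr"], ["warehouse.hr"]),
    AgentConfig("finance_agent", ["finance"], ["warehouse.finance"]),
    AgentConfig("procurement_agent", ["procurement"], ["warehouse.vendors"],
                blocked_columns=["employees.salary"]),
]

def agents_for(user_group: str) -> list[AgentConfig]:
    """Return only the agents a given user group is permitted to use."""
    return [a for a in AGENTS if user_group in a.user_groups]

print([a.name for a in agents_for("procurement")])  # ['procurement_agent']
```

The point of a structure like this is that permissioning happens before any model sees the question: a procurement user simply never reaches the HR agent or its data.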

Deep: That's interesting. Yeah, that makes sense: you have to respect the permissions on the data at the conversational level. So maybe walk us through how the system achieves that. Are you guys built on top of OpenAI or one of the other large language models? Are you using your own private models? Or are you doing some kind of combo, say Llama 3 locally for super private data, versus going up to the cloud for less private data that you can maybe anonymize?

Sarah: Yeah. I think the strength of Seek in particular is that we've always been a research lab, even from the very beginning. Of course, we've also worked with customers from very early on and made sure the product was delivering value. There's a big commercial aspect to the company as well.

But I think the thing that's cool about Seek is we do have an AI research lab. And even my background as a quantitative analyst plays a role, because to this day I'm still involved in the research. Part of what that entails is that we actually have a private LLM that we trained ourselves, which actually outperforms GPT-4 on our use case.

But at the same time, our philosophy in general is, you know, this is a hard problem that we're solving. That's also part of our competitive advantage: all the research we've done in this space, because it's difficult, so the barriers to entry are pretty high. But if a model works well, we're not just going to ignore it because, oh, we don't partner with that third-party company or whatever.

We'll test anything that claims to work, and if it works really well, we'll definitely include it as an offering in a deployment of our product. But the last thing I'll point out is that there are many different ways to deploy Seek. We have some customers that actually refuse to work with the big, mainstream AI companies like OpenAI, but they'll work with us and use our private LLM, which is very interesting.

It definitely shows they put a lot of trust in us. But we can do things like deploy a private model inside of, for example, our Snowflake Snowpark Container Services native app. That just means you can have an entirely private version of Seek living inside your Snowflake instance.

So that's a popular option with enterprises. But we have other options too. We have some customers that use OpenAI and don't want our private LLM; they want us to plug into OpenAI. And so we can do that type of deployment too. So that's probably where we have the most...

Deep: Wow. Yeah, those are a few pretty different things.

Maybe it would be helpful to understand how this thing is trained. And maybe even before that: what are the commonalities across enterprises, in terms of what a model like this would have to learn? So I'm imagining you're a business analyst.

You've got a bunch of tables. You've got a bunch of variables. Sometimes you have aggregations, and you're going to apply all the normal operators against them. But at some point your system needs to be able to take whatever the user verbalizes and translate that into a SQL query, if I'm not mistaken.

Either the model's got to be able to guess that certain table names like X, Y, and Z most likely map to this, or you have a more thorough configuration process where you really know what all the tables and columns actually mean at configuration time, which maybe is LLM-assisted.

Yeah, maybe walk us through that a little bit. 

Sarah: Yeah, it's a great point. In our world, the training data is everything, because you've got to teach the AI somehow. Say you have two columns that are just called X and Y or something, and they both actually mean customer name.

And only one of them is actually the one that gets used. That might sound really messy, but that's really the state of most companies' data. So it's a communication problem, in a way: the same way you'd communicate all this stuff to a human, you actually need to communicate it to the AI.

Otherwise, how is it going to know all that information? So there's a bunch of features that we have in our product for that, which we work with our customers in the beginning to connect. Something I will point out is that a lot of the companies we work with want something that's easy, and they want something that's scalable.

Something we hear a lot is: we don't want to be constantly retraining the AI as we roll it out to more and more users. A lot of our customers want to scale to thousands of users someday. So how do we enable the data team to build something once and then be able to scale it?

That's another piece of how you train Seek, or communicate all this knowledge of the data to Seek, in the beginning. So what that looks like is: there's a wide variety of sources of data that you can connect Seek to. There are the obvious ones, like the database itself, which has a lot of rich data about the schema and everything.

There are historical queries that we can leverage. We can also get training data from the customer, either just by asking them, hey, can you give us some examples, or the platform also just learns as it goes. It automatically collects training data, so it gets better and better as you use it.

And then there's also metadata, the semantic layer, data catalogs: a lot of ways you can house all that knowledge in Seek. Seek then teaches the LLM and the other parts of the model how to use it.
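The X-and-Y problem Sarah describes is easiest to see as machine-readable column metadata. Here's a hedged sketch of what such an annotation might look like; the format and every name in it are hypothetical, not Seek's.

```python
# Illustrative only: writing down which of two cryptically named columns
# actually carries the meaning people care about.
COLUMN_ANNOTATIONS = {
    "sales.customers_v2.x": {
        "meaning": "customer name",
        "preferred": True,           # the column people should use
    },
    "sales.customers_old.y": {
        "meaning": "customer name",
        "preferred": False,          # legacy table, kept for history
        "note": "Deprecated; do not use in new queries.",
    },
}

def resolve(term: str) -> list[str]:
    """Return the preferred column(s) whose documented meaning matches a term."""
    return [col for col, info in COLUMN_ANNOTATIONS.items()
            if info["meaning"] == term and info.get("preferred")]

print(resolve("customer name"))  # ['sales.customers_v2.x']
```

Once tribal knowledge like "the old table is deprecated" is captured this way, any model consuming the metadata can apply it consistently.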

Deep: It's interesting. Yeah, that X and Y example you gave, this happens a lot in different organizations, right?

There are multiple tables, there's stuff called some common variable name. So there's all this tribal knowledge that typically evolves in an org, where the people who manipulate and use the data a lot know that, oh yeah, that's an old table from five years ago, nobody uses that, or yeah, people keep using that.

But that's wrong, we've revised it. So I'm trying to understand what that interface looks like. I'm imagining you interact with people who understand the data during this configuration stage, whether that happens up front or in moments where you revisit it as new data sources come online.

What does that look like? Is it structured like a dialogue between the data user and the data assets, where your bots are talking to them to help understand stuff, and you're presenting artifacts from the actual dataset and saying, hey, I think this means this?

And, I've got these two variables that look like the same thing, and asking questions like that. Is it like that, or is it more like the traditional user interface configuration that I would imagine?

Sarah: Part of it is user interface configuration. Part of it is making sure Seek is connected to the right sources of data in the beginning.

And it also depends on where the customer is in the process of making their data machine readable. In the future, I think making data machine readable won't just mean structured data; it'll also be unstructured data. But that's going to be, I think, more and more of a really big priority for most companies going forward.

What I mentioned earlier about communicating the meaning of all this data to the AI, these are kind of the same thing to me. And so Seek has features that can help our customers make their data machine readable if it's not already. But we have other customers that, for example, use data catalogs, and they were using that kind of software before they started working with us.

If the data is already in software like that, it already is machine readable in its own way, in that it's been cleaned and it's structured. And that's something we definitely want to connect Seek to so it can see all of it. But say the customer has that kind of messy data and they don't have anything.

They just have the data and a bunch of Jupyter notebooks, or a ton of Snowflake IDE tabs, called Worksheets, open with a bunch of scratch-paper-type code. We've encountered customers like that before as well.

And in that case, we have to work with them to get the data into a format where Seek can understand it. We're also trying not to make that super painful. Nobody wants to spend a lot of money and a lot of time just cleaning a bunch of data for AI if they can help it. So we try to make the process as easy as possible by doing things like

connecting to the database, for example, and getting as much data as we can out of there to turn into training data for Seek, without the customer doing a lot of manual work.

Deep: Going back to the training data question a little bit: what does one piece of training data look like?

Is it a natural language query from an analyst plus the resulting SQL query, with a good/bad label? Is it as simple as that, or is it something else?

Sarah: It could be a variety of data points. The one you just mentioned is a perfectly fine example to provide Seek. Examples are really helpful because they're so rich in terms of all the entities contained in the natural language request and all the little pieces of the code being generated by the data team.

Even just one example is so rich with all that information that it's really helpful for training Seek. But there are other things we consider training data as well. For example, metadata, or data in a data catalog like what I mentioned, or a semantic layer. Semantic layers are becoming really popular these days because of how great they are at storing all these relationships.

This is the beginning of what I think will be a greater and greater initiative of getting all the data into a very structured format that can be used to connect it to AI. So that's another great source of information, and it's very different from a data catalog.

I could talk about that for a long time, but I won't.
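To picture the kind of example pair discussed above, here's a minimal sketch of a single question-to-SQL training record. The field names, table names, and SQL dialect are illustrative, not Seek's actual format.

```python
# Hypothetical shape of one natural-language-to-SQL training example.
training_record = {
    "question": "What was total revenue by region last quarter?",
    "sql": (
        "SELECT region, SUM(amount) AS revenue\n"
        "FROM finance.orders\n"
        "WHERE order_date >= DATEADD(quarter, -1, CURRENT_DATE)\n"
        "GROUP BY region"
    ),
    "label": "good",               # analyst-verified as correct
    "source": "analyst_example",   # vs. collected automatically from usage
}
```

Even one record like this carries a lot of signal: which table holds revenue, which column gets summed, and how "last quarter" maps onto a date filter.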

Deep: you a little bit more about this semantic layer. Can you give me an example? What exactly do you mean by that? 

Sarah: My definition of a semantic layer is that it's a knowledge graph containing relationships between business logic and data.

So it's different from a data catalog in that you're not just defining an ontology of database, schema, table, column, all that definition-type stuff; that's what goes into a data catalog. A semantic layer is more about things like metrics, business metrics, and how they're calculated.

Yeah, so that's the biggest difference in my mind. And you really need both. If you want your data fully machine readable, I do think you need both right now. That could change in the future, but that's what we see in our most advanced customers right now.
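For a rough picture of the distinction, here's a hedged sketch of one metric definition as it might live in a semantic layer. Real semantic-layer tools each have their own syntax; this resembles none of them exactly, and all names are made up.

```python
# Illustrative metric definition: the kind of business logic a semantic
# layer captures that a catalog's schema ontology does not.
net_revenue = {
    "metric": "net_revenue",
    "description": "Gross order revenue minus refunds.",
    "expression": "SUM(orders.amount) - SUM(refunds.amount)",
    "grain": ["region", "month"],   # dimensions it can be sliced by
    "joins": {"refunds": "refunds.order_id = orders.id"},
    "owner": "finance",
}
```

A catalog would tell you that `orders.amount` exists and is a decimal; the semantic layer is what tells you that "net revenue" means subtracting refunds from it.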

Deep: Got it. This is interesting.

So when you're talking about metadata around the data assets inside the org, you're talking about things like metadata at the table level, and having that in a structured format that's ideally standardized, so your system's able to go in and read it. Maybe describe the state of standardization there across the different tools and platforms.

I imagine that's still quite messy for you guys, but maybe describe that landscape. I imagine you have customers at different ends of the spectrum: on one end, they're really meticulous about defining their data dictionaries in some structured, open-standard way, and on the other, maybe not at all.

And maybe some are going really far, annotating down to the table, column, and variable level, maybe even the population of data inside.

Sarah: Yeah, I would say we work with customers at all levels of what I'd call data stack maturity. At the most advanced end of data stack maturity, that's where you have pretty big data teams.

You have the company investing a lot into those teams, and the teams are using a wide variety of tools in the data stack, not just the basics of what people used to call the modern data stack. That phrase is really changing a lot now, but it used to just be: data warehouse, ingestion layer, transformation layer, BI layer.

These more resourced teams also have the things I mentioned, a data catalog, a semantic layer, and usually they're looking for an AI solution like Seek. On the other end of the spectrum, we have those other kinds of companies I mentioned earlier, the ones that maybe have one data person serving a hundred business people, and they don't have a huge data budget.

Maybe the C-suite still wants to work with AI somehow with their data. Seek can also work with those types of companies. That's where we have features in our product that cover the basic stuff you'd need. We have basic metadata features in the product, for example, or ways to build up a

starter semantic layer, if you will, in the product. These are the ways we can bypass those kinds of requirements for smaller data teams that are maybe earlier on in building out their data stack.

Deep: Maybe let's switch gears a little bit and talk about the end user. An end user comes in and starts using the product. Describe their experience a little bit. What do they see the first time they get in? Let's call it a non-data-savvy person, maybe a member of the HR team or the finance team or something.

What do they see? What are they interacting with to get going? 

Sarah: I do think it depends on the goal of the customer. Sometimes the goal can just be: we want to improve data literacy, we want to create more citizen data analysts in the company. That could be a goal, and that's potentially a slightly different user experience than

another type of customer that might just have the goal of, hey, I don't want to spend three hours of my day anymore helping with all these random requests; I want to delegate that to Seek. I can talk a bit about each example and how you can use Seek in slightly different ways depending on the goal you're trying to achieve.

If you're just trying to increase data literacy, it's pretty easy to give your business users Seek. You just give them a login, and right when they log in, it's very intuitive. There's really not a lot of training needed. You just see a search bar; it's not too different from a ChatGPT-type interface. You're also seeing suggestions of what you could ask.

And those suggestions could be insights about the data that might be interesting to you, or even just basic questions about what kind of data you have. Another thing is you can type in whatever you want. Sometimes the first thing we see users type is just: hello.

We want to be able to handle anything that comes Seek's way, and let users feel like they can trust Seek to tell them things like what's in the data and what kind of data they have access to. Then they get more comfortable with the insights, maybe by looking at insights Seek is recommending, and once they're comfortable with what's in the data, they can start asking Seek their own questions in natural language and chatting with it.

But we try to create that kind of guided experience for these citizen-data-analyst-type users. That's the goal of that type of use case.

Deep: I imagine, say for a new member of a team, when you talk about suggestions, the system is leaning on the other users of that, I think you called it an agent, or subgroup or whatever.

Commonly asked questions are going to start to elevate, so they're going to get the idea pretty quickly.

Sarah: Yeah, that's the goal: not to require extensive training. It needs to be extremely accessible for business people, for the same reasons I mentioned earlier. Business people have enough on their plate.

They're not paid to be data analysts, and I just think it's unreasonable to expect them to know how to do data analysis. I think it's reasonable to expect them to be data literate, to know what kind of data there is, and to know how to query it. But I'm not saying they need to know how to query it with code.

I'm just saying they need to know how to query it somehow, which could be with Seek.

Deep: What about visualizations and charts and stuff? Did you guys have to bring in all your own charting and visualization for interpreting the data? Or do you have some notion of plugging in third-party visualization systems that folks are already used to, like Tableau?

Sarah: I think what we've seen is that the core functionality of Seek is AI. We're not afraid to just say we're an AI company; we have an AI research lab. The core functionality is AI that can work with large datasets and make data more accessible to business people.

And that's enough. That's what we've learned over the years: that functionality is enough. It didn't exist before large language models, it's improving very rapidly now, and Seek, I would like to think, is leading that development. But what's been a pleasant realization with our customers is that if the product just does its core functionality,

it doesn't need a lot of bells and whistles. So what that means is we do basic stuff in the platform for visualization, but it's just not something we spend a lot of resources on right now. And we want to partner as much as we can with all these other players in the modern data stack for whom this is their core functionality.

We don't want to reinvent the wheel.

Deep: I see. So what I'm reading into that is most of the responses are actually texty, but there's some viz, just not heavy viz. Is that fair?

Sarah: If you want like 47 different colors in a chart, just export it out of Seek and put it into Tableau or something, or Microsoft Excel or Google Sheets.

But if you want solid visualizations in a matter of seconds, for questions that would normally take weeks to get an answer from the data team, that's the real pain that Seek is solving: just getting the data to the users and visualizing it pretty well. People are generally pretty good at working with Google Sheets and Excel, so if they want to really play around with the visualization and the charting, they can export it out of Seek as a CSV and use those other tools if they like.

Deep: Gotcha. Yeah, that makes sense. So you're servicing more naive users. As far as, let's say, performance is concerned: do you wind up with the system sometimes fundamentally changing the loads on your Snowflake or your data instances? A system that's that flexible can come up with really bad ways to retrieve data, bad meaning really slow or time-consuming. Has that been an issue at all? Maybe you can walk us through some of the stuff you guys do to address those issues, if it is.

Sarah: I would say that in the past we've come across data teams that didn't fully optimize the queries they used to train Seek. Seek is an AI product, and to get it to work to its fullest potential, it's good to train it with high-quality data.

If you're training it with really long queries that haven't been optimized, it's just going to start imitating whatever examples you give it. I think that's a good way to think about it: if you give it a thousand-line-long SQL query, it's going to start writing queries in that style.

So I think the thing that's been most helpful, from my perspective, in making sure the system is as optimized as possible, is helping customers optimize their queries in the beginning by talking with them about best practices. We've actually done this with a couple of customers that had really long-running SQL queries.

We're still a pretty hands-on type of company. We like to talk to customers a lot and understand what they're trying to do. And we have advised a couple of customers like, hey, if you want Seek to have faster performance, would you be okay

giving it data that does these kinds of queries differently? We have done that kind of work as we've onboarded customers in the past.

Deep: That makes a lot of sense. Talk a little bit about how you contextualize the responses from the system. So somebody asks something, and in order to interpret

the answer, you need to know the source data. You maybe need to know what columns you're working with and what they mean. Like, how do you go about contextualizing and coaching? Because I can imagine a system where, let's call them less data-savvy users, are jumping to conclusions that aren't actually there to be jumped to, but look like they are.

Do you have scenarios like that? And how do you anchor the responses so that all the info they need to actually interpret them well is provided?

Sarah: Yeah, it's a really good point. It reminds me of, I think earlier this year, a lot of major CEOs of huge public companies,

or maybe even OpenAI, I forget if they said this as well, a lot of people were saying that in five years or however long, AI isn't going to hallucinate anymore. And when you really think about what that means, AI one hundred percent guaranteed not to hallucinate...

I think my thought-experiment-type question is: what if you ask an AI a super vague question? Say you're asking a question in Seek, for example, about some data, but you ask a super vague question. What happens if you just type revenue, one word, into Seek? What should it do?

I'll tell you what it absolutely should not do, which is jump to conclusions. That was one of my first realizations building the product, even back in 2021, when I was just one person in an apartment coding Seek for the first time. One of the first features I built

was a handling system for these kinds of questions that are very likely to result in wrong answers. My answer to the question of what a perfect AI would do with vague questions: I think it has to be able to disambiguate the question fully.

Deep: Sure, yeah. The same thing a great data scientist would do, which is ask: what do you mean by that?

Sarah: Exactly. Yeah. 

Deep: And then you have a dialogue and then you can interpret from the dialogue. 

Sarah: Exactly. And it's funny, because I've seen that in my professional career as well. That's what differentiated the strong data scientists from the weaker data scientists, especially people earlier in their career, too.

They're really excited, really eager to get the work out fast. They won't disambiguate a lot. They'll just talk to the business person, think they know what's being asked, do all this work, and then present it. And they totally misunderstood the question.
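As a rough illustration of the pattern Sarah describes, here's a tiny sketch of a guard that refuses to answer until a vague request is disambiguated. The heuristic and every name in it are invented for illustration; a production system would do something far more sophisticated.

```python
# Toy guard: don't generate SQL for requests missing the details a
# careful analyst would ask about first. Purely illustrative.
REQUIRED_SLOTS = ["metric", "time_range"]

def missing_slots(parsed_request: dict) -> list[str]:
    return [s for s in REQUIRED_SLOTS if not parsed_request.get(s)]

def answer_or_clarify(parsed_request: dict) -> str:
    gaps = missing_slots(parsed_request)
    if gaps:
        # Mimic the strong data scientist: ask before answering.
        return f"Before I run this: what {' and '.join(gaps)} do you mean?"
    return "OK, generating the query..."  # hand off to the SQL generator

# A user who just types "revenue" has named a metric but no time range.
print(answer_or_clarify({"metric": "revenue", "time_range": None}))
```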

Deep: A huge part of that, and this is a bit of an aside, but my belief is it has to do with how engineers are trained. We don't get a lot of points, at least in undergrad, for asking: why are you asking that question? What does the question mean? It's very much

rapid machine-gun fire of hard problems, believed by whoever's providing them to be well formed, and you just shut up and go solve them. And I think that's actually a problem. It's a suboptimal way to train engineers and computer scientists. I think we need way more open-ended questions and whys. But anyway, that's a bit of an aside.

Sarah: Yeah. And I'm not sure how familiar you are with RLHF and things like that, but that's part of the reason ChatGPT is the way that it is. And that's also what's exciting about the newer OpenAI models like o1.

I haven't necessarily tested this capability in enough depth to make any remarks on the performance beyond what I've seen myself.

But I think it's going in that direction of thinking more, questioning what the user is trying to do. I think I heard o1 has a little bit of that. But with the RLHF of the older large language models, they're trained to just be obedient.

Deep: Very much.

Yeah, you feel that, right? One of the things I do with most of my prompts, certainly with the fuzzier stuff, is turn them into a conversation: always end by asking one question. It forces the model into dialogue mode. The template I've found quite helpful in many cases is to have a dialogue

where the system asks questions to help tease things out, and then mine the dialogue for what you actually need to know. That rough template works in a lot of scenarios; a sketch of it follows.
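Here's a minimal rendering of that prompting pattern. The wording is just one way to phrase it, assumed for illustration, not a quoted prompt from the show.

```python
# A rough sketch of the "force a dialogue, then mine it" pattern.
SYSTEM_PROMPT = """You are helping me think through a fuzzy problem.
Do not jump to a final answer. After every reply, always end by asking
exactly one clarifying question, until I say 'done'."""

# After a few turns, a second pass mines the accumulated dialogue:
MINING_PROMPT = """Here is a transcript of our dialogue:
{transcript}
From it, extract: (1) what I actually need, (2) constraints I stated,
(3) open questions I still have not answered."""
```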

One question, though; I wanted to get back to something. It sounds to me like there are multiple levels of learning happening in your system. It sounds like you're rolling your own LLM, so there's that generalized learning. There's probably, and I'm guessing here, but I want to understand it better from you, some cross-company-level learning that's a bit more meta in nature.

And then it sounds, at least if I'm hearing you right, like once you're in a particular company or subgroup, you're actually asking very specifically for them to give you examples of questions and their SQL queries. Is that roughly right? And can you provide any more context and color about how you think about the different levels of learning in the context of your business?

Sarah: Yeah, I think I would clarify maybe a couple of things. The first thing is we actually don't do any cross-learning across customers, which might come across as a bit surprising, given that as we work within a certain vertical, you might think, oh, Seek is learning more and more about that vertical. We just can't do that, because there are too many privacy concerns.

We have very strict privacy policies, actually making sure every customer's data is completely separated from other customers', and there's no training of a base model or anything. Anything domain specific, we have to get that data ourselves and incorporate it.

Deep: That makes sense that you have to get it yourself. But given that, are you doing domain-specific training, or just training in general on converting text? Because I would imagine you would take a model higher up the stack, fine-tune it for scenarios across a bunch of different data structures to formulate queries, and then go in and fine-tune further on behalf of a particular customer.

Am I getting that wrong? 

Sarah: I very much think that's the direction we're going in, especially as we grow and start to see more and more of the same types of use cases of the product. That gets us thinking: okay, what can we do to focus more on making these types of use cases successful?

And that's where we would, for example, find our own types of training data to prime the model before a customer provides it with their training data. There could be a variety of methodologies we use to do that, but yeah, that's an example of what we're starting to do.

Deep: So I want to switch gears a little bit. You mentioned that you're a tech founder, and it sounds like you were writing the very first prototypes and everything, based on the timeline, in the heart of COVID. Walk me through what that journey has been like, going from one person to a team with funding and a bunch of customers.

Sarah: Yeah. Seek was my first startup as a founder and CEO, and I'd never been a CEO prior to starting Seek. All I knew was being a researcher.

I had led data teams in the past, so I knew what it was like to lead teams. And in my hobbies I'd had experiences with leadership: music, being in a band, or film. I used to be a filmmaker, directing a crew. So I had those kinds of experiences.

But definitely none of those really prepared me for what it's like to be...

Deep: Nothing's gonna prepare you for the world you're sitting in right now. 

Sarah: Yeah. So, I mean, I had to learn pretty much everything. Anything that wasn't related to the AI itself, I had to learn. And I would say a lot of it was trial and error.

Some of it was trying to do as much research as I could, and definitely surrounding myself with the smartest people I could find who were willing to talk to me. But it ultimately ended up being a combination of those things. That phrase is really true.

Have you ever heard someone say, if you're not embarrassed by who you were three months ago, you're not growing quickly enough? That's a good benchmark, I think.

Deep: Yeah, that's a nice saying. I like that. Your day-to-day must be quite different now. Tell us a little bit about the ballpark of your size, and your role now versus before, and whether it's

more fun or less fun, stuff you love, stuff you can't stand. Yeah, all that.

Sarah: Yeah. So when I started Seek, it was basically September 2021, and I was one person bootstrapping in an Upper Manhattan apartment. Today, about three years later, we're a little less than 20 people.

We have an office in Tribeca now, in downtown Manhattan, and, like you mentioned, we have a product that's deployed at customers, some of them very large. So I would say we've grown, and we're VC funded as well. We've raised over 10 million dollars; Battery and Conviction are our biggest investors.

So yeah, I would say we've definitely grown a lot. A lot of that growth happened in the last two years, and it just grows faster and faster every six months or so.

Deep: Yeah. I mean, it sounds like you guys are experiencing a lot of traction and it only gets crazier.

Sarah: Yeah. 

Deep: that's good though. It's good, crazy, but it is crazy. So I like to end by asking a kind of a future facing question. So let's jump out, five, 10 years into the future. All the stuff that you're, you and your team are like working hard on gets realized.

Let's say the industry evolves, with all this machine learning and AI evolving at its current pace. We're probably in a post-LLM world at that point. What do you see out there? What does the world look like from your vantage point?

Sarah: I heard an interesting quote a few months ago. It was actually someone complaining about AI, saying: I thought AI would be doing the dishes and folding my laundry so that I could write poetry and paint paintings, but instead it's the other way around; it's writing all the poetry and I'm doing all the dishes.

So, I don't know, will that trend persist into the future? I hope not. I hope that humanoid robots continue to progress the way they have been, and that AI can continue to do more and more of the kind of work that we want to delegate to it, so that we can do work that feels less like grunt work and more like the work that matters.

That was the ethos I've always had starting Seek. Our initial website, before we bought the Seek.ai domain, was seekwhatmatters.com: focus on the things that matter. And it was always about being able to delegate to Seek all the manual work you don't want to do.

If you're a data scientist or a data analyst, you're just wondering: wow, I got a master's in physics and went into industry so that I could be answering these random questions? This is my job? That's where it all started. I really liked the idea of allowing those types of people to delegate that kind of work

so they could do the work they set out to do. And I hope we'll see that persist and grow as AI gets better and better.

Deep: Let's step outside of the data arena a little bit, and I'll tell you where this comes from. I was recently on a podcast and I got interpreted as a techno-optimist, which apparently was a really bad word in that circle.

And so there's this new idea roaming about: that a bunch of nerds, and I would probably put both you and me in this category, have tunnel vision, like the blinders horses wear when they're walking through the streets. We stay focused and we build the thing that we're building.

And then there are all of these artifacts that shake out after it. Honestly, I was surprised by it, but it makes sense that I would get accused of that, because I generally do have a fairly optimistic view of the future, though it's anchored in a lot of acknowledgement of pain and difficulty.

Would you put yourself in a similar category? And do you think we as technologists have a lot of blinders on? From your field of view, what are they? Because machine learning and AI are radically altering the way humans interact with one another.

I would say social media did too, and the internet before that, and mass media in general before that, and even the written word long before that. How do you process stuff like that?

Sarah: I do see why people would say that about those building AI, because we're a very small group of people that chose to actually build AI.

And there is a certain culture in places like San Francisco where it's pretty easy to get sucked into that kind of bubble and forget that there are other types of people out there besides people building technology, which is a very small group. But it's really important to me. We're a B2B company, so this isn't as applicable, but for us, something that really helps is talking to customers. We absolutely can't get sucked into any sort of bubble ourselves and build a product for a customer that doesn't exist. We need to know: who are the people that are actually going to be using the AI? What do they want? And I just think that's important for everyone building with AI: listen to what the lives and the needs of the people who'll actually be using the AI are like, and also ask how we can get them to contribute to it.

Just because someone's not writing the code doesn't mean they can't play a role in shaping the role that AI is going to have in the world.

Deep: All right, cool. Thanks so much for coming on. I think this has been a really fun conversation. 

Sarah: Yeah, no, this was a really great discussion. Glad we got to go deep on a lot of things related to this.

Yeah, thank you very much for having me. 
