Rev-Tech Revolution: Betsy Peters and Yan Yang

Democratizing Access to Data

In this episode, Yan Yang, Chief Data Scientist at Deserve, joins the Rev Tech Revolution to talk about how to go from big data to good data, how good data teams are like internal consultants and the importance of making your data models fair and nondiscriminatory.

Guest: Yan Yang, Chief Data Scientist at Deserve

Yan Yang has a Ph.D. in Computational Engineering from Stanford with a focus on empirical data modeling and statistical analysis. He is passionate about ensuring everyone has access to good data to help solve big problems.

Subscribe and listen to the Rev-Tech Revolution podcast series on:


Spotify

Apple Podcasts

iHeart Radio

Prefer reading over listening? We got you covered!

Intro (00:04): Welcome to the Rev-Tech Revolution Podcast. Today’s episode is hosted by Betsy Peters. She is joined by Yan Yang, Chief Data Scientist at Deserve, to talk about how to go from big data to good data, how good data teams are like internal consultants, and the importance of making your data models fair and nondiscriminatory. All of this and more on the Rev-Tech Revolution Podcast.

Betsy Peters (00:31): Yan, it is a true pleasure to meet you. Thank you so much for giving us a little bit of time here at the RevTech Revolution.

Yan Yang (00:39): Thank you. Nice to meet you too.

Betsy Peters (00:41): So can you tell me a little bit about your background and your career journey and how it pertains to your work at Deserve today?

Yan Yang (00:49): I did my PhD at Stanford in computational engineering, more than 10 years ago, and that is quite an obscure term, computational engineering. In fact, I was more focused on operations research, doing a lot of public policy analysis, running a lot of complex statistical models to evaluate different kinds of public policies, mostly in the healthcare and security areas, and then applying the most optimal ones and making recommendations. After I graduated, I worked first at Yammer, which was acquired by Microsoft, and then subsequently at Wealthfront. In both companies, I worked on the data infrastructure side, learning how to build big data infrastructure as well as how to design a data system properly.

Yan Yang (01:39): Next I went to Salesforce, where I worked on the Einstein platform. That platform tries to gather all the CRM data that Salesforce has and design an easy-to-use system, so end users can pull all the data in and build all sorts of AI models themselves, without a lot of expertise in this area. So it’s more like an AI platform. A common thread throughout my PhD study and my first few jobs is a focus on democratizing the data, the analytics, and even the AI within a company or organization. It’s about empowering users to perform a lot of the tasks that are usually reserved for experts.

Yan Yang (02:23): People in the company can learn how to ingest the data themselves, run analytics, and even get some insights out of it. That’s actually a very interesting point. And during this process, I was working with the founder and CEO of Deserve, Kalpesh, the whole time as part-time help. I worked on a lot of projects, including the very first underwriting model that Deserve used to underwrite international students who come to us without a credit background. That’s how Deserve first started. And then in 2018, I joined Deserve full time to lead the data science and data engineering team.

Betsy Peters (03:08): Yeah, those are some impressive credentials, and clearly you’re more than qualified for this conversation. Tell me a little bit about Deserve’s value prop and why you decided to join the team. It’s got a bit of a social mission too, it sounds like?

Yan Yang (03:23): Yeah, so Deserve first started as a B2C company, and we issued credit cards for people who come to the US without a credit background, international students. The founder used to be an international student. I used to be an international student. You come to the US without much of a credit background, and you can’t really get access to the financial system very easily. You can’t build your credit score. And as many of us know, without that, you cannot really get to your next step in terms of a lot of financial planning and get a good hold of your financial life in the US. So that’s where we got started. Over the years, we have transitioned into more of a B2B business, where we offer a modern credit card platform. What does that mean? For example, if a company wants to issue its own credit card today, it will usually choose to go to either a big bank, like Chase or Barclays, or to a processor like First Data, who can offer an integration for it to build its own solution.

Yan Yang (04:27): So Deserve offers a modern platform that can help them do the whole thing end to end: starting from signing the contract, you will get a full credit card program running at the end of the period. Technology-wise, a credit card is very much an established product. It has been around for, say, 40, 50, 60 years. But a lot of things have changed in the past decade or so. When I say a modern platform, what I mean are things like cloud infrastructure, microservice architecture, real-time data, and real-time decisions. All of these are hallmarks of modern-day technology companies, right? And we are trying to bring all of this into the credit card space. What this means is that it unlocks a lot of features and benefits that previously were not commonly seen in this field.

Yan Yang (05:26): For example, we have unparalleled speed to market. You can launch a credit card in a very short period, because of all the flexibility and the infusion of modern technology stacks. We also have a lot of flexibility in API integration. You can present to your customer a front end that is truly your own customized journey, not something fronted by a big bank’s image or anything like that. It’s your own marketing brand, your own user experience. We also have this so-called digital-first platform, where we put the smartphone at the center of the entire experience. Traditionally, you used websites to interact with your credit cards, and access through the phone was an afterthought, so to speak. We are in a modern age where people interact more and more on their phones, so we want to put the smartphone at the center of the experience. And that is a very new value proposition.

Betsy Peters (06:30): It’s fascinating. Are you enabling others to handle tricky data sets, like the international students you started out with? Is that part of what you’re doing? Or is it much more for the average credit card issuer?

Yan Yang (06:48): That’s a good question. So, as I said, it is an end-to-end credit card platform, right? That includes a lot of difficult problems you will encounter when you are trying to issue a credit card. Underwriting is undoubtedly one of the thornier problems in the process. So we offer consulting in this area. We will help you shape your underwriting policies. We have experience underwriting international students without a credit history. We have experience underwriting subprime customers and super-prime customers, and we even have a business card program. So we can underwrite different segments, we have a lot of experience doing so, and we offer all these learnings as part of the platform package.

Betsy Peters (07:33): That’s a really cool value prop. So I’m curious how the things that you worked on at Stanford, when you were doing your PhD on public policy analysis, connect to the problems you’re solving today at Deserve.

Yan Yang (07:51): That’s a very interesting question, Betsy. Not many people ask that, because what I did in my PhD is not directly related to finance, but there are actually a lot of connections. That was more than a decade ago. People used a lot of complex models to assess performance, and there were a lot of potential solutions. And as you may expect, some of the best-performing solutions are usually the most costly or have the most side effects, and you have to make some kind of trade-off to arrive at a sweet spot. Fast forward 10 years, and we have machine learning. It’s all the rage right now, and it greatly improves the efficiency with which you can find that sweet spot. But the fundamental problem is still the same. You have the so-called constrained-optimization problem, which translated into layman’s terms is just: you want to optimize something, but there are some other criteria you have to fulfill, right?

Yan Yang (08:44): You are trying to trade off things. For example, at Deserve, when you’re doing underwriting, you are trading off how many people you can approve versus the default rate of your customer base. You cannot go all in on one aspect. You have to find a sweet spot. Economists always say that trade-off is the currency of all decision-making.

Betsy Peters (09:05): Yeah, absolutely.

Yan Yang (09:07): Yeah, it’s basically trying to find that good spot. And sometimes finding that incremental 1% or 2% improvement can be very beneficial for the entire business process. For example, if I can keep the approval rate at the same level but reduce the default rate by half a percentage point, I can adjust my model and say, “Instead, let’s stay at the same risk level but approve 5% more people.” All these increments are actually extremely hard, because all the low-hanging fruit has been taken already, and the model and all the analytics are trying to answer: when you’re already doing very well, how do you get an additional 5% or 10%? That is actually a common theme across all the quantitative analysis I did in my PhD days and in all my jobs.
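
To make that trade-off concrete, here is a minimal sketch of the constrained-optimization idea Yan describes: sweep approval thresholds on a model’s predicted default probabilities and keep the one that approves the most applicants while the expected default rate of the approved pool stays under a cap. The function name, the cap, and the simulated scores are illustrative, not Deserve’s actual underwriting logic.

```python
import numpy as np

def pick_threshold(p_default, max_default_rate=0.03):
    """Approve applicants whose predicted default probability is at or below a
    threshold; return the threshold that maximizes approval rate while keeping
    the expected default rate of the approved pool under the cap."""
    best = None
    for t in np.unique(p_default):
        approved = p_default <= t
        expected_default = p_default[approved].mean()
        if expected_default <= max_default_rate:
            approval_rate = approved.mean()
            if best is None or approval_rate > best[1]:
                best = (t, approval_rate, expected_default)
    return best  # (threshold, approval rate, expected default rate)

# Toy data: simulated default probabilities from an underwriting model.
rng = np.random.default_rng(0)
p = rng.beta(2, 30, size=10_000)
print(pick_threshold(p, max_default_rate=0.03))
```

Tightening or loosening `max_default_rate` moves you along exactly the approval-versus-risk frontier discussed above.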

Betsy Peters (09:57): Yeah, that’s interesting. And that’s something that we all wrestle with in one way or another, particularly the listeners of the RevTech Revolution: there are huge data sets, and learnings can be gained by putting constraints on those data sets and trying to find that sweet spot around risk, or around cost, whatever the case may be. So I think one of the things that our listeners might be interested in learning from you is, as the head of a data center of excellence, what are some of the important things you bring to this constraint question? And then as a follow-up, what are the important things for other folks who are in your shoes to do to make sure that people trust the data?

Yan Yang (10:45): Yeah. So nowadays a lot of companies are striving to become data-driven, basically making decisions and improving their business [inaudible 00:10:53] based on data, not on heuristics. We used to have a joking term, “HiPPO,” which is the Highest Paid Person’s Opinion.

Betsy Peters (11:01): For sure.

Yan Yang (11:01): And you don’t want that to happen, yeah. To have-

Betsy Peters (11:04): Oh, it still does, Yan, it still does.

Yan Yang (11:09): Yeah, so, I mean, it’s quite easy to say that your company is data-driven, but to get there, there are a few things you should set up. In my experience, one of the first things you want to work on is to set up, at the organization level, some kind of good metrics. How do you measure outcomes? How do you make sure that everything is measured by the data? And if something is measurable, the first question is: how do we get that [inaudible 00:11:39] going? It’s a lot easier if you have a top-down push to turn the entire organization into a data-driven culture. And the second thing, which is often overlooked, is that you have to reduce the cost of exploration, so that it’s not only a few people in the company who have control of the data.

Yan Yang (11:58): If they are the masters of giving out all the information to the rest of the company, that will not scale your business. To scale your business in a data-driven way, you have to make sure that people across the company, in different functional groups, have access to good data. They even have access to a lot of good exploration. They can reach their own conclusions. In the successful companies I have worked with, this usually revolves around the data team acting as an internal consultancy. People come to the data team with questions.

Yan Yang (12:31): The data team will work with the different functional teams to get answers for them. But answers are not the only deliverable of the team. A large part of the deliverable is to train and educate other teams, helping them learn how to use the data, how to interpret the data, and, more importantly, how to question whether we have the data necessary to make a lot of business processes successful. Because all these questions, most commonly, don’t come from the data team. They come from the individual business functional groups.

Betsy Peters (13:03): Yeah, I like that. It sounds like the thread of democratization is in that part of your conversation as well: really making sure that everybody has access to it. But I guess the corollary to me is, once they have access to it and they’re starting to try to answer questions that are pertinent to their individual responsibilities, how do you make sure that they trust the data? Because it can be really difficult to make sure that you’ve got the accuracy, the completeness, all of the factors that yield good answers.

Yan Yang (13:39): Yep, data quality is actually the thing I emphasize most when I’m trying to build a data team. This is often an area that is overlooked. Betsy, you are spot on. A lot of companies are hyper-focused on getting the data into the warehouse, making sure they have the necessary data to perform everything. And once they have that data, they forget that they need to keep checking its accuracy, to make sure the quality is right. So one of the responsibilities of a data team, in my view, is to make sure that you have a common data quality framework that runs on top of all your data. This framework should incorporate the recommendations from all the business functional groups on how to check whether data is correct or wrong, and all this [inaudible 00:14:29] periodically alerts different teams when the data quality goes wrong, sometimes. And it inevitably will; what can fail will always fail.

Betsy Peters (14:38): Especially with data quality.

Yan Yang (14:39): Yes, especially with data quality. You also need to establish proper data lineage. That is something that is often talked about, because depending on the use case, one piece of data may be present in different warehouses in your organization. It’s not in one place; it’s sometimes in a dozen places. So you do need to know that data A is derived from data B, which is derived from data C. You establish a proper chain, so that whenever one piece of data fails a quality check, you know that all the downstream data will need to be taken with a big grain of salt.
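
As a concrete illustration of that idea, here is a minimal sketch, with made-up table names and checks rather than any particular framework: run per-dataset quality checks, then walk a lineage map so that everything derived from a failed dataset gets flagged too.

```python
# child -> parents: C is derived from B, which is derived from A (hypothetical tables)
from collections import defaultdict

lineage = {"B": ["A"], "C": ["B"]}

def checks_pass(rows):
    """Illustrative checks: the table is non-empty and the key column has no nulls."""
    return len(rows) > 0 and all(r.get("id") is not None for r in rows)

def downstream_of(failed, lineage):
    """Every dataset that directly or transitively depends on a failed one."""
    children = defaultdict(set)
    for child, parents in lineage.items():
        for parent in parents:
            children[parent].add(child)
    suspect, stack = set(), list(failed)
    while stack:
        for child in children[stack.pop()]:
            if child not in suspect:
                suspect.add(child)
                stack.append(child)
    return suspect

tables = {"A": [{"id": None}], "B": [{"id": 1}], "C": [{"id": 1}]}
failed = {name for name, rows in tables.items() if not checks_pass(rows)}
print("failed:", failed, "| take with a grain of salt:", downstream_of(failed, lineage))
```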

Betsy Peters (15:15): That’s good advice. Tell me how you advise teams on data governance. What does data governance mean to people who are starting to get that set up inside their culture? And what are one or two things you would recommend people focus on?

Yan Yang (15:31): Yeah, data governance is tricky, because it’s sort of the opposite of the notion of data democratization, where everyone can access the data. That’s, again, a trade-off you have to make. You want as many people to access as much data as possible, but at the same time, you want to make sure that they get proper access to the proper data-

Betsy Peters (15:54): They have to play nice in the sandbox, right?

Yan Yang (15:58): Yes, that’s right. So some of the first things you can do include defining the roles and data needs properly. Who in your company requires certain pieces of data? Usually PII, personally identifiable information like people’s phone numbers, email addresses, even social security numbers, is not required for most analyses, right? For a lot of back-end analysis, you just need to be able to identify a specific person: whether this person from this system is the same customer in the other system, whether they are the same person. You don’t need to know their phone number or their email address to do that analysis. So in those cases, we don’t ever even want to look at that data. That being said, there are other use cases, right?

Yan Yang (16:45): For example, for a marketing project, you want to reach out to the customer, so obviously you need the email address or some sort of contact to do that. So all these roles and data needs should be defined by the infrastructure team and the security team jointly, so you understand that the data in these warehouses and pipelines is used for these purposes, and other data is for other purposes. And you should always question whether you need to have a certain piece of data stored at all, especially if it’s sensitive information. And when you are organizing your data warehouse, put in catalogs, so you know this is level-one access, this is level two, whatever catalog system you feel is reasonable for you. That way you can understand the dependencies between them.

Yan Yang (17:35): And then build up the proper data lineage, as I mentioned. Sensitive data may drive other analysis downstream, and you want to make sure that analysis is in the same catalog as the source data. And for people who are new to this field, there are a lot of vendors out there who offer tokenization and encryption services. You can take advantage of them. If you don’t need to store data in your own system, you don’t have to. For example, Deserve tokenizes all our sensitive information, like social security numbers and credit card numbers, with an encryption service. So people at Deserve don’t even see those things. We can’t even see those things. When all this information flows into the Deserve system, it’s already encrypted. That actually removes a lot of the hassle of managing this process yourself.
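
The sketch below illustrates the pattern Yan describes, with a stand-in, in-memory “vault” in place of a real third-party tokenization service and hypothetical field names; in practice the mapping from token to raw value lives with the vendor, never in your own systems.

```python
import secrets

class TokenVault:
    """Stand-in for a vendor tokenization service (illustrative only)."""
    def __init__(self):
        self._token_by_raw = {}

    def tokenize(self, raw_value):
        # Deterministic per value, so the same person can be matched across systems
        # without anyone on your side ever seeing the raw SSN or card number.
        if raw_value not in self._token_by_raw:
            self._token_by_raw[raw_value] = "tok_" + secrets.token_hex(8)
        return self._token_by_raw[raw_value]

vault = TokenVault()
SENSITIVE_FIELDS = {"ssn", "card_number"}      # hypothetical field names

application = {"name": "A. Student", "ssn": "123-45-6789", "card_number": "4111111111111111"}
safe_record = {k: (vault.tokenize(v) if k in SENSITIVE_FIELDS else v)
               for k, v in application.items()}
print(safe_record)   # only opaque tokens land in your warehouse
```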

Betsy Peters (18:24): For sure, that’s great. So it sounds like you’ve got the data governance thing nailed. But at Deserve, what are some of the thorniest problems you’re facing with data quality when it comes to machine learning, and how have you overcome them?

Yan Yang (18:40): Oh, how much time do we have for that?

Betsy Peters (18:42): Right.

Yan Yang (18:44): Yeah, so there are some common things that everyone in the data field suffers from, more or less, right? One of them is a lack of sufficient high-quality data. We already went through that. Especially for small startups, it feels like a chicken-and-egg problem. You don’t have the product, you don’t get data; you don’t get data, you don’t have the product.

Betsy Peters (19:05): Yeah, right.

Yan Yang (19:06): So it’s hard to break out of that loop, right? So it’s important to take some sort of iterative approach. Start with something small, even heuristic. The first model that Deserve used to underwrite international students was heuristic. I based it on a lot of social science papers, a lot of studies. And there are a lot of heuristic [inaudible 00:19:28], but once we started pushing the program out, we got data, and then we could start to reinforce it and make it more and more data-driven.

Yan Yang (19:38): And in that process, data quality is very important. You have to put the quality checks in place. You have to set data [inaudible 00:19:44] in the proper way, so that your model is actually ingesting something other than garbage, right? That’s one of the problems. Another problem we usually run into, especially in a highly regulated industry like the lending business, is that you have to be able to interpret your model. I know that for companies that do not operate in such an environment, model interpretability is sometimes overlooked, because, “Why do I need to interpret it, if it works?” That is often the [inaudible 00:20:17], right?

Betsy Peters (20:17): The black box works, so don’t open it up.

Yan Yang (20:19): The black box, yeah. The counter-argument is that, yes, you don’t need to understand it if it works, but it will not work forever. One day it will fail, and if you understand how it got the initial results, it’s a lot easier for you to know how to fix it, rather than just tinkering with the box. In recent years, there has been a lot of research going into this field, and there are some standard practices to measure the interpretability of your model. It’s worthwhile for each data science team to invest some effort into it, to understand.

Yan Yang (20:59): Start from a simple model; don’t just jump in one step to the most sophisticated model you will ever have. Yes, you will get performance, but you will lose a lot of insights along the way. As you build a simple model and then expand it, you will come to realize which information signals are most important, and that sometimes drives other business processes: “Oh, maybe we found some new insights in the process of building that model.” This is usually where the interesting things happen. I think those are the two thorniest problems I’ve been facing right now.

Betsy Peters (21:33): So one of the things that our listeners experience around interpretability is that they’re buying software that has AI models in it that suggest next best actions, something along the lines of Einstein. So they’re looking at big data that is perhaps in a CRM and starting to make recommendations based on correlations and all of that. If you are using third-party software, what do you do about model interpretability? How do you think about that problem?

Yan Yang (22:12): Yeah, there are a few ways of interpreting a model, including black-box and white-box approaches. In a white-box approach, you have to open up the model, and that usually happens when you are building the model in-house. But it’s not the only way available. In a black-box approach, you still know what kind of input gives you what kind of output. So you can still run analysis to see, “Okay, which are the attributes that most influence the output, and how should we act on that?” This actually ties to another question: whether your results are making sense, whether your results are fair or are discriminating against people. That’s also another very hot topic, right? So all in all, you always need to check whether the model’s predictions make sense and what they say about your inputs, even if the model itself is third party.
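
One standard way to do that black-box analysis is permutation importance: shuffle one input column at a time and measure how much the score degrades. Here is a minimal sketch on synthetic data; the gradient-boosting model simply stands in for a third-party model, since all the technique relies on is a predict interface.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stand-in for a vendor model: treated as a black box from here on.
black_box = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(black_box, X_test, y_test, scoring="roc_auc",
                                n_repeats=10, random_state=0)
for i, (mean, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature_{i}: drop in AUC = {mean:.3f} +/- {std:.3f}")
```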

Betsy Peters (23:08): Yeah, and if you’re doing due diligence on a piece of software that has AI and is making recommendations to you about your sales pipeline, what are a couple of good questions to ask before you actually purchase?

Yan Yang (23:24): Obviously, you want to ask about the performance, how that performance was arrived at, and the sample size of the data they used to get those results. But it’s also important to know what they used to build the model, and whether the input signals and everything align with your use case. Your customer base will, more often than not, be very different from the customer base they used to train their generic models. A lot of these third-party generic model scores are very useful, partly because they are built with very large data sets that some smaller companies may not have access to themselves. But I always advocate using their outputs combined with your own data to make further decisions on top of it, instead of blindly plugging that number into your system. Try to understand how they get the number, correlate that with your own customer base information, and then make use of that score more effectively for your own business use case.
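
One simple way to combine a vendor score with your own data, rather than plugging it straight into a decision rule, is to treat it as just one feature in an in-house model so its weight is learned from your own customer base. The column names and the synthetic label below are purely illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 5000
X = pd.DataFrame({
    "vendor_score": rng.normal(650, 80, n),      # hypothetical generic third-party score
    "months_on_book": rng.integers(0, 60, n),    # hypothetical in-house signal
    "utilization": rng.uniform(0, 1, n),         # hypothetical in-house signal
})
# Synthetic outcome, only so the sketch runs end to end.
logit = -0.01 * (X["vendor_score"] - 650) + 2.0 * X["utilization"] - 0.02 * X["months_on_book"]
y = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
print(dict(zip(X.columns, model[-1].coef_[0].round(3))))  # learned weights, incl. the vendor score
```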

Betsy Peters (24:30): Yeah, that makes sense. So let me shift gears a little bit, because we’ve been dancing around Salesforce, and you’ve got some background there. How is dealing with big data different at a company like Deserve, compared to a large company like Salesforce?

Yan Yang (24:45): Well, big data at big companies is much bigger, that’s the first piece.

Betsy Peters (24:49): Right.

Yan Yang (24:50): So yeah, there are different priorities. For example, you may not need the most cutting-edge big data processing technology, because your data scale is not there yet. But you do need a lot of the same good practices around big data inside the company: how you handle data storage, how you design the data warehouse according to your use case. All these practices [inaudible 00:25:16]. What is very different in a small startup compared to a big company is that in a startup, things move 10 times faster. So sometimes you are forced to deliver a product or system faster than you can properly design the whole infrastructure. At that point, it’s useful to leverage some of the third-party or open source solutions that can help you get there.

Yan Yang (25:45): And being at a small company means that you are more free to experiment with newer technologies, and you usually have a much more cutting-edge technology stack, not because you are small, but because you just have a lower cost of exploration. So you should use that to your advantage. Try to stay on top of what is currently most popular in the industry: why people do things that way, why people design certain systems. Sometimes big companies design their data systems specifically to address their big data needs, which may not be applicable to your use cases. So you need to take all those things into consideration.

Betsy Peters (26:26): All right. So let’s shift gears again. As an internal consultant to many business units, what do you advise is the most important thing to do when the volume, the velocity, and the complexity of the data just get really overwhelming?

Yan Yang (26:40): Yeah, so there are a few steps I usually take when I’m setting up data pipelines or data systems for a company. The first thing is to set priorities, as in, understand how data plays into the business strategy of your company. Is your company selling data? Or is the data not being sold directly, but used to enhance your product’s appeal? Or is the data purely feeding into your analytics, not into any product? And also, what are the use cases? Is it real-time, where latency is extremely important and you need to get data decisions out very fast, or is it big batch, more of an analytics-driven approach? All these questions will affect how you design your data infrastructure.

Yan Yang (27:34): And then you need to have a good data infrastructure team. I know that data science is all the rage now, but to have a proper data science team, you want to build a good data engineering team. A data infrastructure team will help you get all the pipelines and all the warehouses in place, so you know which piece of data corresponds to which use case and which kind of system. And then third, make sure that your security team is on board; always work with your security team very closely. That cannot be emphasized enough.

Yan Yang (28:06): Security is an important component. You want to make sure that the data is governed properly in your system; otherwise, it will all go haywire later. And then I would recommend that people start with a unified analytics tool. There can be a lot of data warehouses, in many varieties, depending on the use cases, but you should have a central analytics tool where all the people in the company can go to get analytics results and share them with each other. That just reduces the cost of exploration that I mentioned earlier. So all these are good practices. Overall, building a strong data science team and data engineering team will help the company in the long term, yeah.

Betsy Peters (28:48): I’m going to throw a crystal ball question at you. I was at an O’Reilly conference on AI, three or four years ago. And I went to the Microsoft booth, and they said, “Within four years, you’re going to be able to open up Excel, and AI will solve most of your biggest problems.” What’s your prediction about the maturation of AI in the next two or three years? What are the frontiers that are just going to become [inaudible 00:29:21], that AI is right on the cusp of solving, that might surprise the listeners?

Yan Yang (29:28): Yeah, so a lot of recent AI research has been focused on deep learning: computer vision, natural language processing, and human-computer interaction. All of these are advancing at a very fast pace, and there have been a lot of amazing breakthroughs. On the other hand, we have more traditional, structured data exploration: how do we solve a problem with very well-defined data? I mean, I [inaudible 00:30:01]. We also have advances there, but not as fast as on the deep learning side. You know what I mean? But 90% of the time, your roadblock is not how advanced your AI technology is; your roadblock is usually that you do not have decent enough data, or your data doesn’t make sense. Or it seems to make sense, but there are a lot of quality issues underneath. That is why it doesn’t give you the correct conclusions. Those are the problems that will not easily go away, because it’s only partly a technical problem. It’s, to a much larger extent, a business problem: how do you define your things correctly?

Betsy Peters (30:47): Or a human behavior problem too, right?

Yan Yang (30:47): Yeah, human behavior. To make computers understand and make predictions, you have to teach them what each thing means. You need to define your data accurately. If the data itself has internal inconsistencies, there’s no way that even the most cutting-edge model will give you reliable predictions.

Betsy Peters (31:04): We 100% agree with you, and we run into that all the time. This has been a real pleasure, Yan. Thank you so much for spending some time with us at the RevTech Revolution. One last question before we go?

Yan Yang (31:16): Sure.

Betsy Peters (31:17): If you could leave our audience with one piece of advice about getting the balance between good data and big data right, what would it be?

Yan Yang (31:26): Yeah, no problem. Especially in the [inaudible 00:31:32] industry, this question gets asked a lot, right? How do you make sure that your model is not doing bad things? How do you make sure that your model is fair? There is a term for this, “fairness-aware machine learning,” right? There are a lot of different things you can do, but most importantly, at the very fundamental level, you must [inaudible 00:31:53] the input signals into the model properly.

Yan Yang (31:57): And then you must understand what things you can use. You should stay away from signals that are themselves discriminatory, and you should not introduce them into your model if discrimination is a big concern. And just as you periodically assess the performance of a model, always checking whether it is still up to date from a performance point of view, you should also assess it from a fairness or discrimination point of view. At Deserve, we often run the analysis to see if the model is imposing any sort of discriminatory effect. You should include that as part of your model monitoring process.
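
One simple check that can run alongside regular performance monitoring is the approval-rate ratio between groups, sometimes compared against the “four-fifths rule.” The group labels, decisions, and 0.8 threshold below are illustrative, not a description of Deserve’s actual monitoring.

```python
import pandas as pd

decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B", "B"],
    "approved": [1,   1,   0,   1,   0,   0,   1,   0],
})

rates = decisions.groupby("group")["approved"].mean()
disparate_impact = rates / rates.max()        # compare each group to the best-treated one
print(disparate_impact)

flagged = disparate_impact[disparate_impact < 0.8]
if not flagged.empty:
    print("Possible disparate impact, review the model for groups:", list(flagged.index))
```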

Betsy Peters (32:37): And Yan, not to interrupt you, but just to help the listeners understand a little bit, are there one or two factors that you can share with us that gave you discriminatory outputs when you were playing around with the model, or building it to begin with? Or something that’s counterintuitive in that regard, that we wouldn’t be thinking of?

Yan Yang (32:58): In general? You do not want to give geographical data too much importance, because there are discriminatory factors associated with it. And you need to be careful about using things like school measures and a lot of educational variables, which can in themselves be misleading. But I would say it largely depends on your use case and how strict you want to be. If you want to be completely 100% safe, you are probably left with not much data-

Betsy Peters (33:37): [inaudible 00:33:37].

Yan Yang (33:37): … that you can work with. So there is, again, that common theme of striking a good trade-off. And there are even cutting-edge machine learning methods that use an adversarial model. They build two models: one tries to enhance performance, and the second one tries to second-guess it and check whether it is discriminating, and the two models are trained together. These are quite interesting topics that people can explore.
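
For readers curious about that two-model idea, here is a minimal adversarial-debiasing sketch: a predictor learns the task while an adversary tries to recover a protected attribute from the predictor’s output, and the predictor is penalized whenever the adversary succeeds. The synthetic data, network sizes, and penalty weight are all illustrative, not any production setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 2000, 8
X = torch.randn(n, d)
protected = (X[:, 0] > 0).float().unsqueeze(1)   # pretend column 0 encodes group membership
y = ((X[:, 1] + 0.5 * X[:, 0] + 0.3 * torch.randn(n)) > 0).float().unsqueeze(1)

predictor = nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 1))
adversary = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
lam = 1.0                                        # strength of the fairness penalty

for step in range(2000):
    # 1) Train the adversary to guess the protected attribute from the prediction.
    opt_a.zero_grad()
    adv_loss = bce(adversary(predictor(X).detach()), protected)
    adv_loss.backward()
    opt_a.step()

    # 2) Train the predictor to do its job while making the adversary fail.
    opt_p.zero_grad()
    logits = predictor(X)
    loss = bce(logits, y) - lam * bce(adversary(logits), protected)
    loss.backward()
    opt_p.step()

print("task loss:", bce(predictor(X), y).item(),
      "adversary loss:", bce(adversary(predictor(X)), protected).item())
```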

Betsy Peters (34:04): All right. Again, just thank you so much for all your time, Yan. It’s been really fun talking to you. And hopefully we can do it again sometime?

Yan Yang (34:13): Same here. Thank you.

Outro (34:15): Thank you for tuning in to the RevTech Revolution Podcast. If you enjoyed this episode, please don’t forget to rate, review, and share this with colleagues who would benefit from it. If you would like to learn more about how Riva can help improve your customer data operations, check out rivaengine.com.
