Data security is an imperative, not an option, in banking. The industry functions in a strict regulatory environment, processing billions of daily transactions teeming with sensitive personal information and safeguarding systems critical to daily business and long-term economic well-being.
As financial firms move forward with generative AI adoption, leveraging data at scale without compromising its integrity or security becomes a delicate balancing act and a potential roadblock to scaling the technology.
“The workflows are as secure as the people running them, so you have to make sure that every employee knows they have to follow policies for handling data,” Leon Bian, vice president and head of Databolt product at Capital One Software, told CIO Dive. “Data analysts, data scientists and data engineers have to follow the right hygiene.”
The financial firm, which launched its software division in 2022 and deployed data management platform Slingshot the same year, turned to tokenization as a potential security solution two years ago and set about engineering an alternative to traditional encryption.
In April, Capital One Software completed the commercial rollout of its second product, Databolt. The platform replaces sensitive data with tokens, preserving the underlying formatting while enabling secure third-party sharing and generative AI ingestion.
“When you encrypt the data, it changes the format, and you have to decrypt the data every time you run an operation,” Bian said. Tokenization keeps the original format unchanged — a Social Security number remains a nine-digit string — while shielding the sensitive data point.
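The format-preserving idea Bian describes can be sketched in a few lines of Python. This is a minimal illustration, not Databolt's implementation: the `TokenVault` class, its in-memory dictionaries and its method names are hypothetical, and a production system would keep the mapping in a hardened, access-controlled store.

```python
import secrets

DIGITS = "0123456789"


class TokenVault:
    """Hypothetical in-memory token vault, for illustration only."""

    def __init__(self):
        self._value_to_token = {}
        self._token_to_value = {}

    def tokenize(self, value: str) -> str:
        """Swap each digit for a random digit; keep length and separators."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        if not any(ch in DIGITS for ch in value):
            raise ValueError("value contains no digits to tokenize")
        while True:
            token = "".join(
                secrets.choice(DIGITS) if ch in DIGITS else ch for ch in value
            )
            # Reject the rare draw that equals the input or an existing token.
            if token != value and token not in self._token_to_value:
                break
        self._value_to_token[value] = token
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; only trusted callers should reach this."""
        return self._token_to_value[token]
```

Calling `tokenize("123-45-6789")` on a vault returns another string in the same `NNN-NN-NNNN` shape, so downstream queries that expect that format keep working, while only code with access to the vault can recover the original value.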
“It’s a cleaner, faster and more secure way for data analysts to work with data,” Bian said. “Internally, we don’t feed any sensitive data into a large language model unless it has been tokenized. If there’s any sensitive data that you don’t want the model directly trained on, you tokenize it.”
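As a rough sketch of the rule Bian describes, the Python below swaps anything that looks like a Social Security number for a token before text is handed to a model. The `scrub_for_llm` function, the regular expression and the plain-dictionary mapping are assumptions made for illustration; the article does not describe how Capital One detects or stores sensitive values.

```python
import re
import secrets

# Assumed pattern: U.S. Social Security numbers written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b")


def _random_lookalike(value: str) -> str:
    """Return a random string with the same shape as value, never value itself."""
    while True:
        token = "".join(
            secrets.choice("0123456789") if ch.isdigit() else ch for ch in value
        )
        if token != value:
            return token


def scrub_for_llm(text: str, mapping: dict) -> str:
    """Replace each matched SSN with a token, reusing tokens for repeat values."""

    def replace(match):
        value = match.group(0)
        if value not in mapping:
            mapping[value] = _random_lookalike(value)
        return mapping[value]

    return SSN_PATTERN.sub(replace, text)
```

Because the same value always maps to the same token within one `mapping`, a model can still tell that two mentions refer to the same customer without ever seeing the real number.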
Capital One’s path to tokenization began eight years ago, according to Bian, but it has deeper roots in the bank’s decade-long journey to the cloud. Cloud served as a foundation for AI adoption, the company told CIO Dive in January. Data security was baked into the process.
The software division tapped into its army of more than 14,000 engineering and technology professionals to develop a scalable solution, an initiative that led to the creation of Databolt.
“We couldn’t find a viable solution on the market, so we decided to build our own,” Bian said.
In May, the bank integrated Databolt with Snowflake and Databricks to expand the platform’s capabilities and unlock access to larger stores of data for AI and analytics operations in a secure environment.
“I’m not going to say tokenization is the most secure solution because no solution is fully secure,” Bian said. “Tokenization alone is not going to be enough. We always add an encryption layer for sensitive data and you’re always going to need strong access controls.”