DataStax extends AI feature engineering with Luna ML
DataStax is continuing to build out its artificial intelligence (AI) capabilities with the launch today of its Luna ML service.
DataStax got its start over a decade ago as the lead commercial vendor behind the open source Apache Cassandra NoSQL database and has been steadily expanding its portfolio in recent years. Back on Jan. 12, DataStax acquired Kaskada, a privately held vendor building AI feature engineering technology. Kaskada’s technology provides a declarative query language that can help data scientists accurately specify what an AI model needs from a dataset.
When the company was acquired, all of Kaskada’s technology was proprietary. Since the acquisition, DataStax has worked to make Kaskada open source to align with the rest of its portfolio. With the new Luna ML service, DataStax is providing an enterprise-supported offering for Kaskada open source to help organizations deploy the technology as part of a machine learning (ML) workflow.
“Some companies adopt open source directly, but a lot of companies that adopt open source want additional assurances if something were to go wrong,” Davor Bonaci, DataStax CTO and EVP, told VentureBeat. “Luna ML is support on top of open source, so you can get assurances, credibility and help for running Kaskada open source reliably in production at scale, no matter what happens tomorrow.”
How Kaskada has changed under DataStax
Prior to joining DataStax, Bonaci was Kaskada’s CEO and a cofounder of the company. He noted that his small company has benefited greatly from the acquisition, for a variety of reasons.
At the foundational level, Kaskada is now open source, which means that it can benefit from a community of contributors both inside and outside of DataStax. The technology is now also being adopted by much larger organizations, as DataStax counts many of the Fortune 500 among its user base. The requirements of those larger-scale users also mean that Kaskada has been improved to meet their needs for performance, security and reliability.
There is also an integration into the broader DataStax portfolio, which includes database and event streaming technologies. That integration will soon lead to some big changes in the Cassandra open source database and the DataStax Astra cloud database as a service (DBaaS) platform.
How Luna ML and Kaskada are driving future database changes
Bonaci explained that Kaskada is an AI feature engine that connects to raw event-based data, performs computations on it and produces feature vectors that can then be stored in a database like Cassandra. Once the vectors are stored, a user can perform what’s known as vector similarity search, in which the database is queried for the stored vectors that are most similar to a given vector.
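To make that flow concrete, here is a minimal Python sketch of the general idea: raw events are aggregated into per-user feature vectors, and a brute-force cosine similarity search then finds the stored vector closest to a query. This is not Kaskada’s query language or Cassandra’s API. The event schema, the function names (`build_feature_vectors`, `cosine_similarity`, `most_similar`) and the choice of cosine similarity are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical raw events: (user, event_type, amount). Not a real Kaskada schema.
events = [
    ("alice", "purchase", 20.0),
    ("alice", "view", 0.0),
    ("bob", "purchase", 22.0),
    ("bob", "view", 0.0),
    ("carol", "view", 0.0),
    ("carol", "view", 0.0),
]

def build_feature_vectors(events):
    """Aggregate events into [purchase_count, view_count, total_spend] per user."""
    vectors = defaultdict(lambda: [0.0, 0.0, 0.0])
    for user, event_type, amount in events:
        if event_type == "purchase":
            vectors[user][0] += 1.0
        elif event_type == "view":
            vectors[user][1] += 1.0
        vectors[user][2] += amount
    return dict(vectors)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, stored, k=1, exclude=None):
    """Brute-force search: score every stored vector and return the top k."""
    scored = [
        (key, cosine_similarity(query, vec))
        for key, vec in stored.items()
        if key != exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

features = build_feature_vectors(events)
# Find the user whose behavior most resembles alice's.
nearest = most_similar(features["alice"], features, k=1, exclude="alice")
```

A production system would typically replace the brute-force scan with an index built for approximate nearest-neighbor search, but the query has the same shape: supply a vector and get back the closest stored vectors.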
Vector similarity search is a mainstay of vector databases such as Pinecone, an increasingly popular database vendor that raised $100 million at the end of April. Cassandra is not a vector database, but it will be adding features in the ‘near term’ to more easily enable vector similarity search, according to Bonaci.
“Kaskada produces vectors and once you store them in a storage system or database like Cassandra, you can organize it nicely so that you can perform an efficient similarity search,” said Bonaci.
DataStax cofounder Jonathan Ellis is currently helping to lead efforts in the open-source community to bring vector similarity search to Cassandra, Bonaci added.
How Kaskada and Luna ML enable MLOps
Kaskada and Luna ML can be used to support MLOps workflows, although Bonaci emphasized that it’s important to first define what the term MLOps means.
He noted that MLOps is often thought of as being just about model operations (Model Ops), where the primary concern is managing the model itself. Bonaci argued that an equally important aspect of managing an ML workflow is the data. The data piece is concerned with how data is computed and where it gets stored.
Bonaci said that Kaskada and Luna ML are all about the data itself (reading, computing, storing and serving it), which is how they help to support MLOps.
How feature engineering can help reduce AI bias
Bonaci noted that bias is a big topic in AI today, and he is hopeful that Luna ML and Kaskada can help to mitigate it.
“Kaskada cannot solve all the various problems that exist in the world,” Bonaci conceded. “However, it can help you understand what bias exists.”
With his company’s technology, Bonaci said, an organization can better understand its data and identify data that could potentially introduce bias.
“We give you capabilities to think and reason about it, but at the end of the day, creators of systems and people that put AI into production have the power to do with their data whatever they feel is right,” he said.