Project Monad


In this article we introduce Monad, a tool which makes the most of event data reflecting user behavior. Working with clients across many industries, we have observed that large companies' data is frequently underutilized. Here we will discuss where our research on Monad originated, how we meet the strict requirements of large-scale data processing, and how it can help data scientists make their work less time-consuming and more oriented towards the creative part of the process. We introduce some crucial concepts behind AI-driven analysis of the interactions hidden in event data, and show how to use them to better understand your customers and services.

There are several key stages of work needed to build a viable AI model. The most apparent one is creating and training ML algorithms to solve given tasks. However, the effort spent preparing input data representations is equally (if not more) important. Given relevant data in our domain of interest, we must consider which information is crucial and how to represent it numerically. These representations should accurately encode the relevant information, usually in a compressed form, and be in line with the requirements of our chosen AI algorithm. If we want to estimate real estate prices, we understand that we need to know the total square footage, number of rooms, and so on, but we could possibly also use the average temperature in a given region. And what if we want to add information about the quality of the neighborhood – how do we quantify and represent it numerically? Bear in mind that real estate pricing is the “hello world” of machine learning, and with more elaborate use cases things become more and more complex. For example, modeling human behavior depends on multiple factors which are entangled with our psychology, and it is hard to even identify possible features. A lot of research and effort goes into building better tools to understand and utilize data depending on its type and modality, but also into creating models equipped with a “basic knowledge” that can be used across different domains and downstream tasks.
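
To make this concrete, here is a toy sketch of manual feature engineering for the real estate example. The feature set and the one-hot encoding of neighborhood quality are our own illustrative choices, not part of any specific pricing model:

```python
# Toy illustration of manual feature engineering for real-estate pricing.
# Feature names and encoding choices are hypothetical, chosen only to show
# how qualitative information must be quantified before modeling.
import numpy as np

NEIGHBORHOOD_LEVELS = ["poor", "average", "good", "excellent"]

def encode_listing(square_footage: float, num_rooms: int,
                   avg_temperature: float, neighborhood: str) -> np.ndarray:
    # One-hot encode the qualitative "neighborhood quality" judgment.
    one_hot = [1.0 if level == neighborhood else 0.0
               for level in NEIGHBORHOOD_LEVELS]
    return np.array([square_footage, num_rooms, avg_temperature, *one_hot])

print(encode_listing(85.0, 3, 9.5, "good"))
# e.g. [85.   3.   9.5  0.   0.   1.   0. ]
```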

Coming up with features is difficult, time-consuming, requires expert knowledge. ‘Applied machine learning’ is basically feature engineering.
~ Andrew Ng

Where we’re at: image, text, and foundation models

A particularly hot topic in AI research is foundation models. They are trained on vast amounts of data of a general nature, not tailored to any specific task or single domain, often in a self-supervised manner. They store general knowledge that can be further used in various downstream tasks. Some of them require model fine-tuning – additional training for a given task or on domain-specific data. For example, we can fine-tune a language model to classify texts into categories. Other tasks are learned in a zero-shot manner, where the model is capable of solving a given task without any fine-tuning, only by providing the needed prompt. One example of such an ability is DALL·E’s image-to-image translation. The model can present the same object in different styles, for example creating an image of a cat based on a sketch of a cat.

(source: https://openai.com/blog/dall-e/)

Foundation models mark a shift in how we approach AI. A lot of focus is put on this research, with initiatives such as Stanford's Center for Research on Foundation Models, and recent catchy examples like ChatGPT or DALL·E. In 2021 the Workshop on Foundation Models was held, where it was stated that over the last two years all state-of-the-art models in text and vision have leveraged foundation models.

Self-supervised learning is a crucial training method for building foundation models. It extracts the information encoded intrinsically in data, omitting any data labeling effort: the model predicts hidden aspects of the data based on observable context. Hence, we can use large quantities of data of a general nature, without the additional effort of high-quality task-specific annotation. There is a famous analogy coined by Yann LeCun:

If intelligence is a cake, the bulk of the cake is unsupervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning (RL).

In reinforcement learning a model needs many samples to learn the expected behavior, and supervised learning requires human intervention to create training data, where one example usually carries one piece of information, e.g., a text paragraph is mapped to a single category. However, in self-supervised learning we can extract much more information from one text paragraph – each word with its context can serve as a separate training example. For these reasons, for Yann LeCun, self-supervised learning is the future of AI.
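
As a minimal illustration of this multiplication of training signal, the sketch below derives many (context, target) word-prediction examples from a single paragraph; the tokenization and window size are simplified assumptions:

```python
# One paragraph yields one supervised label, but many self-supervised
# (context, target) word-prediction examples.
paragraph = "self supervised learning predicts hidden parts of data from context"
tokens = paragraph.split()
window = 2  # how many words of context on each side

examples = []
for i, target in enumerate(tokens):
    context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    examples.append((context, target))

for context, target in examples[:3]:
    print(context, "->", target)
# One paragraph produces len(tokens) training examples instead of one label.
```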

However, natural language and images are not the main data formats in which companies store the information most valuable for their business. The majority is big, interconnected event data, which stores valuable insights and is frequently underutilized. This important field was left behind when it comes to harnessing the power of self-supervised learning, research on data representation, and foundation models. We lack general-purpose tools to represent the entities in this data, and models we could build upon. This gap is alarming, not only because many fields could benefit from such solutions fitted to production environments, but also because a crucial part of human activity was left out. An effort was made to imitate our language and visual skills; however, the area of human behavioral patterns and preferences seems to have been left behind.

Enter Project Monad.

Behavioral data

Behavioral event data are records of human interaction with services or products and can be found in many different domains such as retail, finance, e-commerce, automotive, and telecom. Some examples are products bought by users, card transactions, whom users follow, which ads they click, etc. With this kind of data, we can harness the capabilities of artificial intelligence for our needs.

We built Monad to make operating on event data easy. It allows you to create representations of users, products, and services based on their interactions, in a self-supervised manner, and use them to train a foundation model. This model can serve many downstream tasks – predicting future interactions, segmentation, recommendation, user engagement prediction – at this point the sky is the limit. For businesses, it means greater knowledge about their clients, but also about their products and services. Products can be seen through the lens of user interactions, providing a better understanding of users and their engagement. It enables defining key business components in a way that is specific to the particular case. Finally, we can create proof-of-concept solutions in considerably less time, and easily test new ways to enhance our product or gain additional insights about our customers.

Monad

The conceptual core of Monad is universal behavioral profiles. They compress user behavior into actionable, predictive representations. What does this mean in practice? It significantly decreases development time in ML projects, as there is no need to manually create features, separately for each model. A common but sub-par industry practice is to aggregate the features representing a certain entity by averaging: e.g., the events generated by a single client are usually represented with the averaged embedding vectors of the products they bought. Monad's universal behavioral profiles come with a special aggregation procedure, which is much better than averaging – it keeps the individual identities of the aggregated entities in place, allowing for precise reconstruction of each individual item from the aggregate representation. Hand-crafted features can also be added to the behavioral profiles with simple concatenation.
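
The following toy example, with made-up 2-D product embeddings, shows why plain averaging is lossy: two completely different purchase histories can collapse to the same mean vector:

```python
# Why plain averaging is lossy: two very different purchase histories
# can produce identical mean vectors. Embedding values are made up.
import numpy as np

product_emb = {
    "laptop":   np.array([ 1.0,  0.0]),
    "mouse":    np.array([-1.0,  0.0]),
    "diapers":  np.array([ 0.0,  1.0]),
    "dog_food": np.array([ 0.0, -1.0]),
}

basket_a = ["laptop", "mouse"]       # electronics shopper
basket_b = ["diapers", "dog_food"]   # completely different behavior

mean_a = np.mean([product_emb[p] for p in basket_a], axis=0)
mean_b = np.mean([product_emb[p] for p in basket_b], axis=0)
print(mean_a, mean_b)  # both [0. 0.] — the item identities are gone
```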

As Monad needs to be effectively applied to big data problems, its scalability is another very important aspect. We put extra effort into making it fast and lightweight, well adapted to demanding production environments. When we look at corresponding solutions in the text and image domains, they require significant resources and aren't yet fully fitted for real-time production settings. Not to leave you with unfounded claims, let's proceed to the technical details of Monad.

Graphs everywhere

In the first step, Monad detects graphs and hypergraphs in user interactions. A graph is a data structure where nodes represent entities and edges represent relations between entities. In a hypergraph, an edge can join more than two nodes.
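
For illustration, here is how interaction events induce both views, using a made-up event schema: each “user bought product” event becomes an edge, while a whole session can be read as one hyperedge joining several nodes:

```python
# Interaction events induce a graph: each "user bought product" event is an
# edge between a user node and a product node; a whole shopping session can
# be viewed as one hyperedge joining all the nodes it touches.
events = [
    {"user": "u1", "product": "p1", "session": "s1"},
    {"user": "u1", "product": "p2", "session": "s1"},
    {"user": "u2", "product": "p2", "session": "s2"},
]

# Bipartite user-product graph: one edge per event.
edges = [(e["user"], e["product"]) for e in events]

# Hypergraph view: one hyperedge per session.
hyperedges = {}
for e in events:
    hyperedges.setdefault(e["session"], set()).update((e["user"], e["product"]))

print(edges)       # [('u1', 'p1'), ('u1', 'p2'), ('u2', 'p2')]
print(hyperedges)  # {'s1': {'u1', 'p1', 'p2'}, 's2': {'u2', 'p2'}}
```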

Graphs are very powerful when it comes to representing real-world events and, as such, are particularly well fitted to user interaction data. Nodes can represent users, products, services, transactions, or even product categories and prices – essentially anything in our data that can be interpreted as an entity. Edges can represent relations between any two or more entities: a user bought a product, a user followed a user, a user spent a given amount of money, a product was bought together with another product, etc. For that reason, the first phase of the Monad pipeline is to look for graphs in event data and convert them into vectors – graph embeddings. We use our proprietary algorithm, Cleora, which transforms raw data into ML-interpretable vectors that are based on the graph structure and contain information about each node's interactions and similarity. A detailed description of the algorithm can be found here.
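
The snippet below is a simplified numpy sketch of the core idea behind Cleora as it is publicly described – iterative neighbor averaging with L2 normalization – and is not the production implementation:

```python
# Simplified sketch of the Cleora idea: initialize node vectors randomly,
# then repeatedly average each node with its neighbors (multiply by the
# row-normalized adjacency matrix) and L2-normalize. Schematic only.
import numpy as np

def cleora_like_embed(adjacency: np.ndarray, dim: int = 8,
                      iters: int = 3, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    n = adjacency.shape[0]
    # Row-normalized adjacency = random-walk transition matrix.
    degrees = adjacency.sum(axis=1, keepdims=True)
    transition = adjacency / np.maximum(degrees, 1)
    emb = rng.standard_normal((n, dim))
    for _ in range(iters):
        emb = transition @ emb
        emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize rows
    return emb

# Tiny 4-node path graph: nodes with similar neighborhoods get similar vectors.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(cleora_like_embed(A).round(2))
```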

Making Cleora ultra-fast and scalable was of great importance to us, as we deal with huge amounts of data in industries like retail, banking, or telco. Graph Neural Networks, the most popular graph embedding approach, require many GPUs and enormous engineering effort to scale to huge event data graphs. Cleora can work on a single GPU and takes a few minutes to process graphs with as many as one million nodes.

Universal Behavioral Profiles

Once we have vectors which encode information about the entities in our data, we can use them to create universal behavioral profiles. Our EMDE algorithm (for details see our blog post) aggregates information from multiple events into a single vector for a given entity. In a basic example, we can imagine multiple products bought by one user being aggregated into the vector representation of this user. An important feature that distinguishes the EMDE algorithm from other methods of vector aggregation, such as averaging, is that it preserves information from each vector. In the product purchase example, this means we don't create an average product; we keep the notion of product similarities and retain each product's characteristics.
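
A heavily simplified stand-in for this kind of aggregation is sketched below: it partitions the embedding space several independent times with plain random hyperplanes (real EMDE uses density-aware partitioning) and sums per-bucket counts, so each item's contribution stays distinguishable instead of being averaged away:

```python
# EMDE-style aggregation, heavily simplified: partition the embedding space
# several independent times, map each item to a bucket per partitioning,
# and sum per-bucket counts into a fixed-size histogram ("sketch").
import numpy as np

def emde_like_sketch(item_vectors: np.ndarray, n_sketches: int = 4,
                     n_planes: int = 3, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    dim = item_vectors.shape[1]
    sketch = np.zeros((n_sketches, 2 ** n_planes))
    for s in range(n_sketches):
        planes = rng.standard_normal((n_planes, dim))
        # Bucket id = bit pattern of which side of each hyperplane an item is on.
        bits = (item_vectors @ planes.T > 0).astype(int)
        bucket_ids = bits @ (2 ** np.arange(n_planes))
        for b in bucket_ids:
            sketch[s, b] += 1
    return sketch.ravel()  # fixed size, independent of the number of items

items = np.random.default_rng(1).standard_normal((5, 16))  # 5 product vectors
profile = emde_like_sketch(items)
print(profile.shape, profile.sum())  # (32,) and 5 items x 4 sketches = 20.0
```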

We can combine information from multiple sources and modalities to create a robust representation of a given entity. We can build a behavioral profile of any entity in accordance with our needs and data specifics, including not only users but also products, stores, employees, etc. For each entity, we can combine multiple event types encoded with Cleora, but also additional data sources like texts or images. We can also enrich these representations with static features that don't require graph-based modeling, such as user age or product attributes. The resulting EMDE vectors are fixed-size structures, independent of the input embedding size. Importantly, behavioral profiles can be maintained in real time, so we can provide up-to-date representations for further predictions.

Predictions – where it all comes together

Now we can connect all the introduced concepts to uncover the true power of Monad. Universal behavioral profiles can be used to train a neural network to predict future interactions. To do so, we can use a self-supervised training method: predicting the future continuation of the data based on the available history.

We create a behavioral profile based on events before an arbitrary time split and predict the behavioral profile after the split. This way we end up with a foundation model encoding knowledge about user behavior and interaction patterns. What distinguishes this solution is the size of the model and the training time. In text or image processing, the entire effort goes into making models better, but not smaller or faster, so model size and training time are what hold development back in this area. In behavioral data processing, we often deal with very strict time constraints, and we need to approach hardware requirements realistically. Because we put the stress on creating robust data representations – universal behavioral profiles – we can lighten the model training part. Thanks to our research in efficient high-dimensional manifold density estimation, we can use simple feed-forward NNs with just a few layers to train very robust conditional models. This is a great improvement compared to models with billions of trainable parameters that take weeks to train.
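
As a rough sketch of this setup, the toy code below trains a small feed-forward network to map a profile computed before a time split to the profile observed after it; dimensions, data, and loss are placeholder assumptions, not Monad internals:

```python
# Toy self-supervised setup: a small feed-forward network maps the behavioral
# profile before a time split to the profile after it. Dummy data throughout.
import torch
import torch.nn as nn

PROFILE_DIM = 32  # e.g., the flattened sketch size from the previous example

model = nn.Sequential(
    nn.Linear(PROFILE_DIM, 256),
    nn.LeakyReLU(),
    nn.Linear(256, PROFILE_DIM),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Dummy batch: profiles computed before / after an arbitrary time split.
past_profiles = torch.randn(64, PROFILE_DIM)
future_profiles = torch.randn(64, PROFILE_DIM)

for _ in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(past_profiles), future_profiles)
    loss.backward()
    optimizer.step()
```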

We can then fine-tune the pre-trained foundation model for many different tasks. In the recommendation task, we predict which products customers will buy. For churn prediction, we estimate which customers are likely to become inactive in the near future. We can also predict whether a user will buy a given product, use services of a certain category, purchase something of a given brand, and so on.
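
Continuing the toy setup above, fine-tuning for churn could keep the pre-trained body and attach a small classification head; this layout is hypothetical, as the article does not detail Monad's fine-tuning mechanics:

```python
# Hypothetical fine-tuning layout: reuse a pre-trained body, add a churn head.
import torch
import torch.nn as nn

PROFILE_DIM = 32
pretrained_body = nn.Sequential(       # stands in for the model trained above
    nn.Linear(PROFILE_DIM, 256),
    nn.LeakyReLU(),
    nn.Linear(256, PROFILE_DIM),
)
churn_head = nn.Linear(PROFILE_DIM, 1)  # logit: will the customer go inactive?
churn_model = nn.Sequential(pretrained_body, churn_head)

profiles = torch.randn(8, PROFILE_DIM)
print(churn_model(profiles).shape)      # torch.Size([8, 1]);
# fine-tune with BCEWithLogitsLoss on labeled churn outcomes.
```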

Of course, these are only selected examples of what can be done with Monad; the methods used here are very universal. Taking part in many competitions, we tested Cleora+EMDE behavioral profiles on various tasks from different domains, such as next destination prediction on Booking.com data, scientific paper subject area prediction on a large graph dataset, near real-time tweet recommendation, and matching a product image to its description. In each of these tasks, we placed in the top 3, alongside large research labs like NVIDIA or DeepMind.

What is more, Monad is very easy to use, with common use cases available off the shelf. At the same time, there is a lot of room for custom configuration and creating personalized solutions. It is meant to help data scientists by decreasing model development time. The repetitive part of the work, where a lot of issues and mistakes can arise, has been automated. What's left is the creative part of the process, where data scientists can use their knowledge to find the best model design and check hypotheses about data, customers, services, or products much more quickly.

Monad in practice

But how does it look in practice? First, Monad retrieves event data from a database (e.g., ClickHouse, Snowflake, Redshift, BigQuery). The next step is the fit phase, where we create functions that transform input data into behavioral profiles. We specify what kinds of events and attributes we want to use and how to group them. For example, we can group product purchases by customer. This way, users are represented in terms of the products they bought, and products are encoded so that items bought by the same users stay similar. Analogously, we can group card transactions by user, products by session, and so on. As you can see, a very simple fit phase configuration results in complex entity representations, with all the graph-based behavioral profiling magic going on under the hood. What is important about the fit phase is that instead of storing hundreds of features per entity, we store small helper data structures and functions, created from the provided data, that instruct how to transform it into features.
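
Illustratively, a fit phase configuration along these lines might look as follows; every key and value here is a hypothetical stand-in, as Monad's real API is not shown in this article:

```python
# Purely illustrative fit-phase configuration: which events to use and how to
# group them. All names are hypothetical stand-ins for the described workflow.
fit_config = {
    "source": "clickhouse://events",       # e.g., ClickHouse, Snowflake, ...
    "entities": {
        "customer": {
            "group_by": "customer_id",
            "events": ["product_buy"],     # customers described by purchases
        },
        "product": {
            "group_by": "product_id",
            "events": ["product_buy"],     # products described by their buyers
        },
    },
}
# The fit phase would return small helper structures (space partitionings,
# encoders) that later turn raw events into behavioral profiles on the fly.
```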

In the model training phase, data is streamed from a source and only a fraction of it is loaded into memory. Features are generated on the fly using the feature transforms created during the fit phase. The resulting representations are used to train a model. In the training configuration, we can easily set the target that we want to predict, together with model and training parameters. Additionally, the interpretability module shows how each table, column, or value contributes to the final prediction.
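
Schematically, such a streaming loop could look like the sketch below, where all names are illustrative:

```python
# Schematic streaming loop: events are read incrementally, turned into
# profiles with the transforms produced by the fit phase, and yielded in
# batches, so only a fraction of the data is ever held in memory.
def stream_batches(event_source, feature_transform, batch_size=256):
    batch = []
    for event_group in event_source:       # e.g., one customer's events
        batch.append(feature_transform(event_group))
        if len(batch) == batch_size:
            yield batch                    # hand one batch to the trainer
            batch = []
    if batch:
        yield batch                        # flush the final partial batch
```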

Data science projects need a considerable amount of time and effort to validate whether a given approach fulfills business requirements. Also, event data is particularly hard to work with due to dataset size, real-time applications, and the need to keep the solution up to date with rapidly changing conditions. We put in the effort to create an ML copilot that helps data scientists with the time-consuming part of model development and makes it much easier to validate solutions and test different business scenarios. At the same time, we are driven by the need to keep the area of behavioral modeling in line with prominent new approaches in AI research. We don't overlook Monad's utility as a tool that can be used in production settings, but at the same time our ambition is to push the frontier of AI-driven behavioral modeling.
