AI for good: Cleora.AI created by Synerise in Biomedical Sciences.

sair.synerise.com, 3 months ago

Artificial intelligence for drug development in medicine is a very hot subject nowadays, and it will definitely revolutionize this market segment in the next few years. How can we support this transformation by open-sourcing the top-notch AI solution that we created at Synerise?

"Drug companies look to AI to end 'hit and miss' research. Artificial intelligence (AI) is set to improve the industry’s success rates and speed up drug discovery, possibly saving it billions of dollars, a recent survey by the analytics firm GlobalData has found. AI topped a list of technologies seen as having the greatest impact on the sector this year. Almost 100 partnerships have been struck between AI specialists and large pharma companies for drug discovery since 2015".

The history of building cleora.ai proves that technological advancement in big data and AI in one field (such as an innovative approach to data processing, analysis and generating insights for commercial companies) can effectively drive other industries, and can also advance civilization in other disciplines such as medicine.

Since the inception of the AI team at Synerise, our ambition has been to process huge heterogeneous data quickly and easily. Existing libraries, such as StarSpace, Node2Vec, DeepWalk, or various graph convolutional networks, did not meet our requirements. Each of them had a drawback, such as very slow performance, an impractical limit on the maximum graph size, or unsatisfactory quality of results. We needed a solution that would let us quickly and accurately calculate embeddings of graphs with millions of vertices and billions of edges to represent user behavior. Waiting several months for an embedding result was unacceptable for us.

The first version of Cleora was created at the beginning of 2019 and was implemented in Scala. It quickly became apparent that the tool successfully replaced all the existing embedding libraries.

In the next iteration, at the beginning of 2020, in addition to optimizing the algorithm, we decided to get rid of the JVM. The entire solution was rewritten from Scala to Rust, which gave us more control over memory and processor consumption, and performance more than doubled. Its development gave us additional opportunities to build a number of solutions based on it, including recommendations, scoring, segmentation and various predictions.

The experience gathered by the entire AI team allowed us to make Cleora what it is today: a universal and reliable "Swiss Army knife" for graph embedding.

Cleora is one of the fastest graph embedding algorithms. It is an essential component for systems operating on data in the form of a network of connected nodes. These include recommendation systems, prediction of connections between users (like / follow) in social media, and systems predicting the biological functions of protein networks, which enable the creation of new drugs.

No wonder, then, that such algorithms are developed by digital giants such as Facebook and Google, which create a number of new solutions each year. However, Cleora has a significant advantage over these algorithms. First, it is much faster. Second, it does not require specialized hardware (e.g. GPUs to accelerate calculations) and, in addition, it produces high-quality embedding vectors. This means that systems (e.g. recommenders) using Cleora may run faster and with greater accuracy.

Cleora is capable of processing graphs of hundreds of millions of nodes. In social networks, one node usually corresponds to a single user, so Cleora can be used to process datasets/graphs on a global scale, at the level of the number of users of the largest social networking sites such as Twitter. The release of the software under an open-source license means that from now on any company, individual or research institution can use Cleora for any purpose. We recommend Cleora when working with large graphs, especially in conditions of limited computing power. The implementation is available on GitHub.

In the scientific sphere, the Synerise team achieved significant success by winning the Rakuten Data Challenge competition at the SIGIR (Special Interest Group on Information Retrieval) conference. The subject of the competition was creating recommendations in e-commerce, and the organizers included Tracy H. King (Adobe), Shervin Malmasi (Amazon), Dietmar Jannach (University of Klagenfurt), Weihua Luo (Alibaba), and Surya Kallumadi (The Home Depot).

The Synerise publication on methods for detecting the most important features of products, which determine the user's interest, also appeared in the proceedings of the ICML 2020 conference. A few months later, at the ICONIP 2020 conference, we presented our article describing a model that selects similar items of clothing based on photo galleries from producers and users.

How can cleora.ai help improve research on drug discovery?

The progress of research on using AI in medicine and drug discovery is amazing. It is driven by outstanding scientists, top researchers in graph methods and biomedicine, e.g. Jure Leskovec from the SNAP group at Stanford University, or Marinka Zitnik from Harvard University, who publish extremely useful materials and educate the market about new opportunities (Graph Neural Networks for Drug Development).

Cleora.ai can help improve the daily work of people engaged in the drug discovery process.

Chemical molecules and cellular structures are graphs, just like social networks, road networks and other structures found in nature and human activity.

Thus, processing of molecular/cellular data is inherently a graph processing problem.

Cleora computes graph embeddings: compressed numerical representations of graph nodes, which make it easy and feasible to discover various properties of nodes and entire graphs. By embedding a molecule graph with Cleora, we get a versatile representation of the molecule’s properties which can be exploited for multiple purposes.

Cleora embeddings can be used in two ways:

  • Directly, as a source of information, e.g. finding similar nodes/graphs/subgraphs by distance comparison
  • As input to a more complex method. This way Cleora can enhance system performance by providing informative inputs which facilitate learning.
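To make the first, direct use more concrete, here is a minimal sketch of finding similar nodes by distance comparison. It assumes the embeddings have already been computed and loaded into a plain dictionary; the node names, the vectors and the helper names `cosine_similarity` and `most_similar` are made up for illustration, and cosine similarity is just one common choice of comparison.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, embeddings, top_k=2):
    # Rank every other node by similarity to the query node.
    scores = [
        (name, cosine_similarity(embeddings[query], vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]

# Hypothetical, hand-written embeddings (real ones would come from Cleora).
embeddings = {
    "protein_a": [0.9, 0.1, 0.0],
    "protein_b": [0.8, 0.2, 0.1],
    "protein_c": [0.0, 0.1, 0.9],
}

nearest = most_similar("protein_a", embeddings, top_k=1)
print(nearest[0][0])
```

For the second use, the same vectors would simply be passed as input features to a downstream model instead of being compared directly.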

You can explore large examples here.

Problems solved by Graph Neural Network (GNN) models

Problem 1: Find which diseases a chemical compound can treat.

Source: Graph Neural Networks for Drug Development, Marinka Zitnik

Solution: subgraph analysis. Predict closeness in the graph representation space between the drug target protein subgraph and the disease protein subgraph (both embedded with Cleora).
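A rough sketch of this kind of subgraph comparison is shown below. It assumes that a subgraph is summarized by the mean of its node embeddings and that closeness is measured with cosine similarity; both are illustrative choices, not a method confirmed by the source. All protein and disease names, vectors and function names are hypothetical.

```python
import math

def mean_pool(node_ids, embeddings):
    # Summarize a subgraph as the average of its node embeddings (an assumption).
    dim = len(embeddings[node_ids[0]])
    return [
        sum(embeddings[n][d] for n in node_ids) / len(node_ids)
        for d in range(dim)
    ]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_diseases(drug_targets, disease_proteins, embeddings):
    # Score each disease subgraph by its closeness to the drug target subgraph.
    drug_vec = mean_pool(drug_targets, embeddings)
    scores = {
        disease: cosine(drug_vec, mean_pool(proteins, embeddings))
        for disease, proteins in disease_proteins.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical protein embeddings (real ones would come from Cleora).
embeddings = {
    "p1": [1.0, 0.0], "p2": [0.8, 0.2], "p3": [0.9, 0.1],
    "p4": [0.1, 0.9], "p5": [0.0, 1.0],
}
disease_proteins = {"disease_x": ["p2", "p3"], "disease_y": ["p4", "p5"]}

ranking = rank_diseases(["p1", "p2"], disease_proteins, embeddings)
print(ranking[0][0])
```

The disease ranked first is the one whose protein subgraph lies closest to the drug's targets in the embedding space.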

Problem 2: Find unexpected drug interactions. Patients take multiple drugs to treat complex or co-existing diseases. 46% of people over 65 take 5-20 drugs. 15% of the U.S. population is affected by unwanted side effects. Annual costs of treating side effects exceed $177 billion in the U.S. alone.

Source: Graph Neural Networks for Drug Development, Marinka Zitnik

Solution: Embed drug graphs with Cleora and predict the existence of edges.
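As a simple illustration of edge prediction, the sketch below scores every pair of drugs that is not yet connected by the dot product of their embeddings and returns the highest-scoring candidates. This is a deliberately simple stand-in; a real system would more likely train a classifier on the embeddings. The drug names, vectors and the function name `predict_missing_edges` are hypothetical.

```python
from itertools import combinations

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict_missing_edges(embeddings, known_edges, top_k=1):
    # Treat edges as undirected so (a, b) and (b, a) are the same pair.
    known = {frozenset(edge) for edge in known_edges}
    candidates = []
    for u, v in combinations(sorted(embeddings), 2):
        if frozenset((u, v)) in known:
            continue  # this interaction is already recorded
        candidates.append(((u, v), dot(embeddings[u], embeddings[v])))
    # Higher score means the pair looks more like an interacting pair.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:top_k]

# Hypothetical drug embeddings (real ones would come from Cleora).
embeddings = {
    "drug_a": [0.9, 0.1],
    "drug_b": [0.8, 0.3],
    "drug_c": [0.1, 0.9],
    "drug_d": [0.85, 0.2],
}
known_edges = [("drug_a", "drug_b")]

predicted = predict_missing_edges(embeddings, known_edges, top_k=1)
print(predicted[0][0])
```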

Problem 3: Cell ontologies. Single-cell technologies have rapidly generated an unprecedented amount of data that enables us to understand biological systems at single-cell resolution. However, joint analysis of datasets generated by independent labs remains challenging due to a lack of consistent terminology to describe cell types.

https://www.biorxiv.org/content/10.1101/810234v2

Solution: Embed cell ontologies with Cleora and find similar nodes.

Graph Neural Network Methods - challenges

The problems mentioned above have previously been solved with Graph Neural Network (GNN) models. However, there are some serious problems:

  1. GNNs are slow and demand significant resources (complexity grows rapidly with increasing graph connectivity and size)
  2. An emerging research trend shows that variants of GNNs use superfluous operations which increase complexity but do not bring increased performance

http://proceedings.mlr.press/v97/wu19e.html

https://openreview.net/forum?id=S1ldO2EFPr

https://arxiv.org/abs/1905.04579

The advantages of Cleora

Cleora addresses these problems as follows:

  • It uses only neighborhood smoothing, the GNN operation that is key to performance
  • As a result, it is significantly simpler and faster than GNNs
  • High scalability and speed allow Cleora to embed graphs with millions of nodes and billions of edges. This is a key feature in biomedical sciences, where often all possible patterns of connectivity must be explored, which results in millions of possible configurations.
  • It is versatile: it has been applied in winning solutions to competitions from various areas (route prediction, item recommendation, multimodal retrieval)

Barbara Rychalska, Jarosław Królewski

Cleora: A Simple, Strong and Scalable Graph Embedding Scheme - Paper

The area of graph embeddings is currently dominated by contrastive learning methods, which demand formulation of an explicit objective function and sampling of positive and negative examples. This creates a conceptual and computational overhead. Simple, classic unsupervised approaches like Multidimensional Scaling (MDS) or the Laplacian eigenmap skip the necessity of tedious objective optimization, directly exploiting data geometry. Unfortunately, their reliance on very costly operations such as matrix eigendecomposition makes them unable to scale to large graphs that are common in today's digital world. In this paper we present Cleora: an algorithm which gets the best of two worlds, being both unsupervised and highly scalable. We show that high quality embeddings can be produced without the popular step-wise learning framework with example sampling. An intuitive learning objective of our algorithm is that a node should be similar to its neighbors, without explicitly pushing disconnected nodes apart. The objective is achieved by iterative weighted averaging of node neighbors' embeddings, followed by normalization across dimensions. Thanks to the averaging operation the algorithm makes rapid strides across the embedding space and usually reaches optimal embeddings in just a few iterations. Cleora runs faster than other state-of-the-art CPU algorithms and produces embeddings of competitive quality as measured on downstream tasks: link prediction and node classification. We show that Cleora learns a data abstraction that is similar to contrastive methods, yet at much lower computational cost. We open-source Cleora under the MIT license allowing commercial use.
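The core idea described in the abstract, iterative averaging of neighbors' embeddings followed by normalization, can be sketched in a few lines. This is a simplified illustration and not the reference Rust implementation: it assumes an unweighted, undirected graph, random initial vectors from a fixed seed, and L2 normalization of each node's vector. The function name `cleora_like_embeddings` and the toy edge list are hypothetical.

```python
import math
import random

def cleora_like_embeddings(edges, dim=8, iterations=3, seed=0):
    # Build an undirected adjacency list from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)

    # Random starting vectors (an assumption; the real tool may initialize differently).
    rng = random.Random(seed)
    emb = {n: [rng.uniform(-1, 1) for _ in range(dim)] for n in sorted(neighbors)}

    for _ in range(iterations):
        new_emb = {}
        for node, nbrs in neighbors.items():
            # Average the neighbors' current embeddings (uniform edge weights here).
            avg = [sum(emb[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
            # L2-normalize so vectors do not shrink toward zero across iterations.
            norm = math.sqrt(sum(x * x for x in avg)) or 1.0
            new_emb[node] = [x / norm for x in avg]
        emb = new_emb
    return emb

# Toy graph: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

embeddings = cleora_like_embeddings(edges, dim=8, iterations=3)
print(len(embeddings), len(embeddings["a"]))
```

Because each step is only an average and a normalization, there is no objective function to optimize and no sampling of negative examples, which is where the speed advantage described above comes from.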
