V. 0.6.1

A Java Toolbox for Scalable Probabilistic Machine Learning

Probabilistic machine learning

Model your problem using a flexible probabilistic language based on graphical models. Then fit the model to data using a Bayesian approach that accounts for modelling uncertainty.

Multi-core and distributed processing

AMIDST provides tailored parallel and distributed implementations of Bayesian parameter learning (and probabilistic inference) for batch and streaming data. This processing is based on flexible and scalable message passing algorithms.
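As a rough illustration of why parameter learning parallelizes well across cores, the sketch below computes Gaussian sufficient statistics per batch in parallel and merges them. It is not the toolbox's message passing scheme; it swaps in plain sufficient-statistics aggregation with Java parallel streams, and the class and method names (`ParallelSufficientStats`, `Stats`, `fit`) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not an AMIDST class. */
public class ParallelSufficientStats {

    /** Count, sum and sum of squares: enough to recover a Gaussian's mean and variance. */
    public static final class Stats {
        public final long n;
        public final double sum;
        public final double sumSq;

        public Stats(long n, double sum, double sumSq) {
            this.n = n;
            this.sum = sum;
            this.sumSq = sumSq;
        }

        /** Sufficient statistics of a single batch. */
        public static Stats of(double[] batch) {
            double s = 0.0, sq = 0.0;
            for (double x : batch) {
                s += x;
                sq += x * x;
            }
            return new Stats(batch.length, s, sq);
        }

        /** Merging is a plain addition, so batches can be processed independently. */
        public Stats merge(Stats other) {
            return new Stats(n + other.n, sum + other.sum, sumSq + other.sumSq);
        }

        public double mean() {
            return sum / n;
        }

        /** Maximum-likelihood (population) variance. */
        public double variance() {
            return sumSq / n - mean() * mean();
        }
    }

    /** Each batch is summarized on whichever core picks it up; results are then merged. */
    public static Stats fit(List<double[]> batches) {
        return batches.parallelStream()
                .map(Stats::of)
                .reduce(new Stats(0, 0.0, 0.0), Stats::merge);
    }

    public static void main(String[] args) {
        List<double[]> batches = Arrays.asList(
                new double[] {1.0, 2.0, 3.0},
                new double[] {4.0, 5.0});
        Stats stats = fit(batches);
        System.out.println("mean=" + stats.mean() + " variance=" + stats.variance());
    }
}
```

Because merging is associative, the same pattern carries over from threads on one machine to workers in a cluster, where each worker would summarize its own partition of the data.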

Probabilistic Graphical Models

Specify your model using probabilistic graphical models with latent variables and temporal dependencies.

Scalable inference

Perform inference on your probabilistic models with powerful approximate and scalable algorithms.
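To give a flavour of what approximate inference means, here is a minimal sketch of likelihood weighting (a form of importance sampling) on a toy two-node network A -> B. It does not use the toolbox's API; the class name, the probabilities and the seed are assumptions made for illustration only.

```java
import java.util.Random;

/** Hypothetical illustration; not an AMIDST class. */
public class LikelihoodWeightingSketch {
    // Toy network A -> B with made-up probabilities.
    static final double P_A = 0.3;             // P(A = true)
    static final double P_B_GIVEN_A = 0.9;     // P(B = true | A = true)
    static final double P_B_GIVEN_NOT_A = 0.2; // P(B = true | A = false)

    /** Exact P(A = true | B = true) by Bayes' rule, for comparison. */
    public static double exactPosterior() {
        double joint = P_A * P_B_GIVEN_A;
        return joint / (joint + (1.0 - P_A) * P_B_GIVEN_NOT_A);
    }

    /**
     * Likelihood weighting: sample A from its prior, then weight each sample
     * by the likelihood of the evidence B = true given the sampled A.
     */
    public static double estimatePosterior(int samples, long seed) {
        Random rng = new Random(seed);
        double weightedHits = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < samples; i++) {
            boolean a = rng.nextDouble() < P_A;
            double weight = a ? P_B_GIVEN_A : P_B_GIVEN_NOT_A;
            if (a) {
                weightedHits += weight;
            }
            totalWeight += weight;
        }
        return weightedHits / totalWeight;
    }

    public static void main(String[] args) {
        System.out.println("exact    = " + exactPosterior());
        System.out.println("estimate = " + estimatePosterior(200_000, 42L));
    }
}
```

The estimate approaches the exact posterior as the number of samples grows, and the samples are independent, which is what makes this family of methods easy to scale.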

Data Streams

Update your models when new data is available. This makes our toolbox appropriate for learning from data streams.
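The idea of updating a model as new data arrives can be sketched with a conjugate Bayesian update, where the posterior after one batch becomes the prior for the next. The example below is a self-contained toy (a Gaussian with known noise variance and an unknown mean), not the toolbox's learning algorithm; `StreamingGaussianMean` and its parameters are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical illustration; not an AMIDST class. */
public class StreamingGaussianMean {
    private final double noiseVariance; // observation variance, assumed known
    private double mean;                // current posterior mean of the unknown mean
    private double precision;           // current posterior precision (1 / variance)

    public StreamingGaussianMean(double priorMean, double priorVariance, double noiseVariance) {
        this.mean = priorMean;
        this.precision = 1.0 / priorVariance;
        this.noiseVariance = noiseVariance;
    }

    /** Conjugate update: the current posterior acts as the prior for the new batch. */
    public void update(double[] batch) {
        double sum = Arrays.stream(batch).sum();
        double newPrecision = precision + batch.length / noiseVariance;
        mean = (precision * mean + sum / noiseVariance) / newPrecision;
        precision = newPrecision;
    }

    public double getMean() {
        return mean;
    }

    public double getVariance() {
        return 1.0 / precision;
    }

    public static void main(String[] args) {
        StreamingGaussianMean model = new StreamingGaussianMean(0.0, 100.0, 1.0);
        // Made-up batches standing in for data that arrives over time.
        double[][] batches = { {1.9, 2.1, 2.0}, {2.2, 1.8}, {2.05, 1.95, 2.0, 2.1} };
        for (double[] batch : batches) {
            model.update(batch);
            System.out.printf("mean=%.4f variance=%.4f%n", model.getMean(), model.getVariance());
        }
    }
}
```

Processing the batches one at a time yields the same posterior as processing all the data at once, so old data never needs to be revisited.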

Large-scale Data

Use your defined models to process massive data sets in a distributed computer cluster using Flink or Spark.

Extensible

Code your own models or algorithms within AMIDST and extend the toolbox's functionality. This flexibility makes the toolbox well suited to academics running machine learning experiments.

Interoperability

Leverage existing functionalities and algorithms by interfacing with existing software tools such as Hugin, MOA, Weka, R, etc.

Multicore
// Load the data stream
String path = "datasets/simulated/";
String filename = path + "BCC_month0.arff";
DataStream<DataInstance> data = DataStreamLoader.open(filename);

// Learn the model
Model model = new GaussianMixture(data.getAttributes());
model.updateModel(data);
BayesianNetwork bn = model.getModel();

Flink
// Set up the Flink session
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Load the data stream (with Flink)
String path = "datasets/simulated/";
String filename = path + "BCCDist_month0.arff";
DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);

// Learn the model
Model model = new GaussianMixture(data.getAttributes());
model.updateModel(data);
BayesianNetwork bn = model.getModel();

Spark
/* This feature is still under development */


Risk prediction in credit operations

The AMIDST Toolbox has been used for risk prediction in credit operations. Because data is collected continuously and reported on a monthly basis, this gives rise to a streaming data classification problem. This work has been performed in collaboration with one of our partners, the Spanish bank BCC.

Recognition of traffic maneuvers

The AMIDST Toolbox has been used to prototype models for early recognition of traffic maneuver intentions. As in the previous case, data is continuously collected by on-board car sensors, giving rise to a large and quickly evolving data stream. This work has been performed in collaboration with one of our partners, Daimler.

Andrés Masegosa

Developer

NTNU

Ana M. Martínez

Developer

AAU

Darío Ramos-López

Developer

UAL

Rafael Cabañas

Developer

AAU

Hanen Borchani

Former Developer

AAU

Antonio Fernandez Alvarez

Former Developer

BCC

Thomas D. Nielsen

Scientific Member

AAU

Antonio Salmerón

Scientific Member

UAL

Rafael Rumí

Scientific Member

UAL

Anders L. Madsen

Scientific Member

Hugin Expert

Helge Langseth

Scientific Member

NTNU

Ramón Sáez Martínez

Industrial Member

BCC

Galia Weidl

Industrial Member

Daimler AG

Contact

If you have any questions about the toolbox or would like to collaborate on the project, please do not hesitate to contact us at the following email address.

CONTACT

Acknowledgements

This software was developed as part of the AMIDST project. AMIDST has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement No. 619209.

AMIDST - A Java Toolbox for Analytics of Massive Data Streams using Probabilistic Graphical Models