Code Archives - Michał Łusiak

Creating new features in ML.NET using CustomMapping

One of the early stages of the Machine Learning projects pipelines is called Feature Engineering. And one of the actions we perform in that stage is creating new features based on the available dataset.

ML.NET has a long list of Transforms to support different operations, but sometimes you want to do something special. In my Portimão blog post I wanted to calculate a new feature by multiplying two existing features.

What I had in mind was not supported. Fortunately you can also use Custom Mapping and provide it a function that will perform the transformation. Below example how you can add it to your calculation pipeline.

var pipeline = mlContext.Transforms.CustomMapping((TyreStint input, CustomDistanceMapping output) => output.Distance = input.Laps * input.TrackLength, contractName: null)
                .Append(mlContext.Transforms.CopyColumns(outputColumnName: "Label", inputColumnName: nameof(TransformedTyreStint.Distance)))

This also require some small changes in your Data object, but in general is an elegant way to solve that issue. Microsoft has some more examples for CustomMapping transform.

While you here, check out my series on predicting F1 strategy using ML.NET

Formula 1 Strategy with ML.NET – Part 4 – Monaco Grand Prix

Disclaimer: The F1 FORMULA 1 logo, F1 logo, FORMULA 1, F1, FIA FORMULA ONE WORLD CHAMPIONSHIP, GRAND PRIX and related marks are trademarks of Formula One Licensing BV, a Formula 1 company. All rights reserved. I’m just a dude in a basement with too much time on my hands.

This post is the fourth part of my approach to use ML.NET to make some predictions for strategies that teams will approach in Formula1. I highly suggest reading the first part, where I explain the approach I took.

Monaco Grand Prix is all about tradition and prestige.

The first race on the streets of Monaco took place in 1929, although not as a part of Formula 1. The track changes a bit over time, but in general stayed the same, as the street layout didn’t change. It is narrow with tight corners and nearly no run-off areas. There’s no room for error. It is the most prestigious race, and even though there are no extra points for winning it, it has a special place in many drivers hearts. The racing is not fascinating as overtaking is hard and usually order doesn’t change much from qualification, unless some accidents happen (which is quite possible on the tight circuit). The real charm is not in racing in Monaco but in the spectacular views, accompanying events and overall glamour of the race weekend.

In terms of tyres, the streets are resurfaced every year and the surface is not very abrasive compared to other races. Traditionally the three softest tyres are used for the grand prix, so the hardest tyre here (C3) is the same as the softest one in Spain. And it’s nearly never used as it’s simply not grippy enough.

versus choices for Monaco Grand Prix. Source: F1TV.com

Changes in dataset

For reasons mentioned below, I decided to add the “Distance” column (by calculating Track Length times number of laps) to the dataset physically. I also added data on all new races in 2021 since previous post (Spanish Grand Prix) and data from Monaco Grand Prix from 2017-2019 period (2020 was initially postponed and eventually cancelled). This give us around 200 new rows and brings the total up to around 950.

Changes in code

I continued on top of the previous post experiment with AutoML. The problem I had is I couldn’t pass a pipeline to the AutoML to make use of the additional information I have about the dataset to make training easier. But I found a way to pass information about which columns are categories and which are numerical using the ColumnInformation object:

var experimentResult = mlContext.Auto()
      .CreateRegressionExperiment(experimentTime)
      .Execute(
          trainingData, 
          testingData,
          columnInformation: new ColumnInformation()
              {
                  CategoricalColumnNames = 
                  { 
                      nameof(DistanceTyreStint.Team), 
                      nameof(DistanceTyreStint.Car),  
                      nameof(DistanceTyreStint.Driver), 
                      nameof(DistanceTyreStint.Compound), 
                      nameof(DistanceTyreStint.Reason) 
                  },
                  NumericColumnNames = 
                  { 
                      nameof(DistanceTyreStint.AirTemperature), 
                      nameof(DistanceTyreStint.TrackTemperature) 
                  },
                  LabelColumnName = nameof(DistanceTyreStint.Distance)
              },
           preFeaturizer: customTransformer
       );

I also experimented with using PreFeaturizer to pass information about converting the Laps feature into covered Distance feature (check out part 2 for details on that). This didn’t work out so well. It seems that it works OK for features that are not used as labels, but in this case looks like the model gets confused as it’s fitting for a feature that’s not in the input dataset. For more information on using custom transformers with AutoML checkout information in this repository.

So instead I decided to add that calculated feature (Distance = TrackLength * Laps) directly into the dataset and use it as our training label. So I dropped the PreFeaturizer, but kept the ColumnInformation as it looks like it increased the performance of the model.

Let’s look at those metrics after those changes:

=============== Training the model ===============
Running AutoML regression experiment for 600 seconds...

Top models ranked by R-Squared --
|     Trainer                         RSquared Absolute-loss Squared-loss RMS-loss  Duration      |
|1    FastTreeTweedieRegression         0,4342      26005,21 1059975156,71 32557,26       0,9     |
|2    FastTreeTweedieRegression         0,4337      26270,96 1061084915,03 32574,30       0,6     |
|3    FastTreeTweedieRegression         0,4245      26281,03 1078213169,63 32836,16       0,8     |


===== Evaluating model's accuracy with test data =====
*************************************************
*       Metrics for FastTreeTweedieRegression regression model
*------------------------------------------------
*       LossFn:        1059975153,93
*       R2 Score:      0,43
*       Absolute loss: 26005,21
*       Squared loss:  1059975156,71
*       RMS loss:      32557,26
*************************************************

0,43 on the R-squared is another step up from 0,31 from last week and negative in the first two weeks. I’m happy with that progress!

The version of the code used in this post can be found here.

Predictions for Spanish Grand Prix

So here are the predictions for the first stint on the soft tyres for the top 10 on the starting grid. I’ll update actual values after the race.

Driver	Compound	Prediction	Actual
Charles Leclerc	C5	21,3	0 (DNS)
Max Verstappen	C5	22,0	35
Valtteri Bottas	C5	22,1	29
Carlos Sainz	C5	21,3	33
Lando Norris	C5	19,4	31
Pierre Gasly	C5	22,8	32
Lewis Hamilton	C5	22,2	31
Sebastian Vettel	C5	14,9	33
Sergio Pérez	C5	22,0	36
Antonio Giovinazzi	C5	23,8	35
Esteban Ocon	C5	27,3	37

They seem pretty consistent with prediction from the official Formula 1 website (they predict 19-26 laps for the first stint). Let’s see how it will match the reality.

Updated: Updated the prediction using the actual temperatures at the beginning of the race instead of forecast. Added actual values.
The results are a bit away. Monaco is a tricky and unpredictable track and looks like this year’s resurface is much less violent to the tyres. Let’s count it as a failure :)

What’s next

Although we’ve made some progress, I think I’m reaching the limit of the current model. In the next part I’ll reorganize the code a bit. Split the training and inference and add some visualization. All to that to support future support for multiple models and easier visualization of how they differ in performance. My goal is to be able to make automated graphics similar to what you can find in post like this and this.

Formula 1 strategy with ML.NET – Spanish Grand Prix

This post is the third part of my approach to use ML.NET to make some predictions for strategies that teams will approach in Formula1. I highly suggest reading the first part, where I explain the approach I took.

Spanish Grand Prix is probably the most predictible one

Usually at the beginning of each season teams spend a few days at the Catalunya racetrack testing. This year was exception with Bahrain due to the Covid-19 restrictions. Just to give you an example – over the whole of his career, Kimi Räikkönen did 5983 laps in testing and only 843 while racing around this track. Nearly 90% of laps he’s done at this track where while testing. Teams have a lot of data on this track and know it very well. This makes racing here predictable and to be honest quite boring. But Boring is good for predictions.

Changes in the dataset

The overall data structure stayed the same. As previously, I added data on all new races in 2021 since previous post (Portimão and Imola) and data from Spanish Grand Prix from 2017-2020 period. This give us 250 new rows and brings the total up to around 750.

Changes in the code

In the previous part I used the default regression model for training, and it didn’t perform very well. So I decided to give the AutoML a try. AutoML dos what the name suggests – automatically looks for the best model. You declare how much time you want to spend on an experiment and let the ML.NET try out multiple algorithms with different parameters:

uint experimentTime = 600;

ExperimentResult<RegressionMetrics> experimentResult = mlContext.Auto()
.CreateRegressionExperiment(experimentTime)
.Execute(trainingData, progressHandler: null, labelColumnName: "Laps");

Then you can pick up the best model out of that experiment, and you can run predictions using it:

RunDetail<RegressionMetrics> best = experimentResult.BestRun;
ITransformer trainedModel = best.Model;

var predictionEngine = mlContext.Model.CreatePredictionEngine<TyreStint, TyreStintPrediction>(trainedModel);

var lh = new TyreStint() { Track = "Bahrain International Circuit", TrackLength = 5412f, Team = "Mercedes", Car = "W12", Driver = "Lewis Hamilton", Compound = "C3", AirTemperature = 20.5f, TrackTemperature = 28.3f, Reason = "Pit Stop" };
var lhPred = predictionEngine.Predict(lh);

The downside is, I haven’t figured out yet how to actually feed all that feature transformations we did in previous posts into the AutoML flow. So we cannot mark manually which data columns are categories or add new features like we did with distance. And predicting number of laps on tracks with different lap length makes me feel the results may be not so reliable. But let’s roll with it, and see where that leads us.

And the metrics look quite promising. The best algorithm after 10 minute experiment ended up being Fast Tree Regression. And for the first time we have non-negative R-squared. That’s the best performing model on test data we had so far, even though we took a step back with the laps instead of distance.

=============== Training the model ===============
Running AutoML regression experiment for 600 seconds...

Top models ranked by R-Squared --
|     Trainer                     RSquared Absolute-loss Squared-loss RMS-loss  Duration   |
|1    FastTreeRegression          0,3971          5,49        52,55     7,21       2,2     |
|2    FastTreeRegression          0,3880          5,53        53,86     7,28       3,6     |
|3    FastTreeRegression          0,3731          5,50        54,11     7,32      18,0     |


===== Evaluating model's accuracy with test data =====
*************************************************
*       Metrics for FastTreeRegression regression model
*------------------------------------------------
*       LossFn:        51,19
*       R2 Score:      0,31
*       Absolute loss: 5,29
*       Squared loss:  51,19
*       RMS loss:      7,15
*************************************************

Check out tha latest version of the code in the repository.

Predictions for Spanish Grand Prix

Without further ado, let’s do predictions for the Spanish Grand Prix. All the first 10 drivers will start on the soft tyres, which are C3 compound in Spain.

Tyre choices for 2021 Spanish Grand Prix. Source: F1TV.com

In the table below I put my predictions and the actual first stints (since it’s already after the GP). What’s very suspicious is that each of the teammates got the same scores, which makes me think that the model gave a lot of weight to the team, and not much to the driver. I was also surprised with those results being so high for soft tyres. But C3 is actually pretty hard compound for “soft tyre” (read the first post for more details on tyre compounds) and they ended up being quite reasonable. Similarly to the previous post, I bolded out the results that ended up within 10% of the actual value. Half of them were pretty close, which is consistent with metrics suggesting this is our best model yet.

Driver	Compound	Prediction	Actual
Max Verstappen	C3	26,7	24
Valtteri Bottas	C3	26,9	23
Lewis Hamilton	C3	26,9	28
Carlos Sainz	C3	26,7	22
Sergio Pérez	C3	26,7	27
Lando Norris	C3	26,5	23
Charles Leclerc	C3	26,7	28
Daniel Ricciardo	C3	27,2	25
Esteban Ocon	C3	26,8	23
Fernando Alonso	C3	26,8	21

What’s next

For the next post that will be in conjunction with the Monaco Grand Prix, I’ll try to feed that feature information we engineered in the first two posts into the AutoML and see if that helps that model get even better. But generally I feel we’re getting to limits what’s possible with that approach. We have a lot of categorical data and not so much numeric data, which makes it a bit hard for regression. I have a few ideas for new models, and we’ll explore them in future parts.

Predicting F1 race strategy using ML.NET – Part 2 – Imola / Portimão

This post is the second part of my approach to use ML.NET to make some useful predictions about strategies that teams will approach in Formula1. I highly suggest reading the first part, where I explain the approach I took.

And yes, this episode is also too late to make predictions for Imola and Portimão, but hopefully by Spanish Grand Prix will be all caught up.

We’re not gonna get great results here

If you read the previous part you’re aware that I didn’t get spectacular results in the first run. Unfortunately, we won’t improve much here, because we don’t have much data for those tracks. In the period I focus on (2017-2020) there was only one race on each of those tracks. Imola, although a legendary track and hallowed ground for Formula1, is quite outdated and fell out of grace in recent years. Portimão is a very new track, and Portugal GP promoters were unable to attract Formula 1. Last time we raced in Portugal it was in 1996 on a different track – Estoril. But 2020 was a weird year because of COVID-19, and due to travel restrictions F1 needed to find more tracks in Europe. And so we went to Imola and Portimão.

But there are opportunities!

So even though we don’t expect dramatic improvement adding new tracks gives some new view into the data. Previously, since we only used data from one track, I used the number of laps as our Label – a number we were predicting. Since now, we have one than more track with different lengths we have to find a new feature/label to use. That’s how Distance was added to the model – I calculate the TrackLength times Laps covered.

Changes in dataset

I haven’t added any new columns to the dataset and the feature describe above is calculated in code. But I added a lot of rows. My approach is that for each race I’ll be adding previous 2021 races, and all the races since 2017 on the current track. So by applying this rule, I’ve added Bahrain 2021 (previous race) and Imola and Portimão 2020 (previous races on the current track).

Changes in code

There are two major changes in the code since last iteration. With the 4 new races added, we nearly doubled our data row count (from 250ish to nearly 500). Now I can be a bit more selective which rows I take into calculation. In the first approach, we want to predict strategy in situations where everything goes well, since we cannot predict unpredictable (rain, accidents, mechanical failures).

Removing unnaturally shortened stints

That’s why I decided to filter out all the rows with stints that end with something else than “Pit Stop” (which is regular unforced change of tyres) or “Race Finish”.

This is the piece of code that does it:

// Load data
var data = mlContext.Data.LoadFromTextFile<TyreStint>(DatasetsLocation, ';', true);

// Filtering data
var filtered = mlContext.Data.FilterByCustomPredicate(data, (TyreStint row) => !(row.Reason.Equals("Pit Stop") || row.Reason.Equals("Race Finish")) );
var debugCount = mlContext.Data.CreateEnumerable<TyreStint>(filtered, reuseRowObject: false).Count(); //396

// Divide dataset into training and testing data
var split = mlContext.Data.TrainTestSplit(filtered, testFraction: 0.1);
var trainingData = split.TrainSet;
var testingData = split.TestSet;

I use FilterByCustomPredicate method from the mlContext.Data. It takes a function which returns true for the rows we want to remove from the data.

The function itself is this small iniline piece of code:

(TyreStint row) => !(row.Reason.Equals("Pit Stop") || row.Reason.Equals("Race Finish"))

I inserted this in between loading the data from files, and splitting the data, because I want both test and training data to be consistent. This leaves us with 396 rows of data. Nice, that makes me happy.

Using distance instead of laps as a feature

The second change was connected to using Distance instead of Laps as our new Label. I had to create a new feature Distance, by multiplying two other features – Laps and Track Length. ML.NET has a long list of Transforms to support different operations, but what I had in mind was not supported. Fortunately you can also use Custom Mapping and provide it a function that will perform the transformation. I added at the beginning of our calculation pipeline.

var pipeline = mlContext.Transforms.CustomMapping((TyreStint input, CustomDistanceMapping output) => output.Distance = input.Laps * input.TrackLength, contractName: null)
                .Append(mlContext.Transforms.CopyColumns(outputColumnName: "Label", inputColumnName: nameof(TransformedTyreStint.Distance)))

That needed also some boiler-plate code in the TyreSting object. Feel free to check the current version of the code and Microsoft examples for CustomMapping transform.

I also needed to recalculate distance back to laps at the end. I do it by again dividing the prediction by the distance of the current track.

prediction.Distance / race.Track.Distance

While looking at the code you’ll also notice that I refactored it a bit. I moved some classes into separate files. I also isolated some data and magic strings. Just general clean up, so step by step we move from one-file script mess into serious working application.

Newest predictions

Just for the sake of showing progress of lack of, let’s look at some predictions. I’m not going to predict all tyres for all the drivers. Let’s just take the first 10 qualifiers from the both races and predict how long can they go on the tyres they’ve chosen to start on and compare it to what happened in reality.

Imola

The predictions for Imola ended up being pointless. On the morning of the race it rained and all the assigned tyres are cancelled in a situation like that. Teams have to start on either Intermediate or Wet compounds. In the track conditions on that day Wet tyres were a mistake. The track was dump, but there was no standing water. If you look at the table, you will notice that all drivers on Intermediates change tyres around lap 27/28. This is because teams wait for a perfect moment when the track is ready for dry tyres and usually first person blinking and changing to dries will trigger all the other teams to react.

Name	Compund	Predicted	Actual	Notes
Lewis Hamilton	C3	23,17	28	Intermediate
Sergio Pérez	C4	14,03	28	Intermediate
Max Verstappen	C3	24,23	27	Intermediate
Charles Leclerc	C4	11,96	28	Intermediate
Pierre Gasly	C4	16,54	14	Wet
Daniel Ricciardo	C4	13,98	27	Intermediate
Lando Norris	C4	15,77	28	Intermediate
Valtteri Bottas	C3	26,96	28	Intermediate
Esteban Ocon	C4	15,04	1	Wet
Lance Stroll	C4	13,67	27	Intermediate

Portimão

The Portuguese Grand Prix didn’t provide any weather surprises. We had a dry race and could finally test our model. On some rows predictions were quite good. I marked in bold those, where we landed within 10% difference. Sergio’s result is an obvious outlier – he was used by the team to slow down Mercedes drivers to give a better chance to his teammate Max Verstappen. But he was the right man for the job. Sergio’s tyre management is excellent, definitely one of the best in the paddock.

Name	Compund	Predicted	Actual	Notes
Valtteri Bottas	C3	38,47	36
Lewis Hamilton	C2	34,48	37
Max Verstappen	C2	35,60	35
Sergio Pérez	C2	32,40	51	Sergio is a know “tyre whisperer” ;)
Carlos Sainz	C3	29,58	21
Esteban Ocon	C3	24,94	22
Lando Norris	C3	25,71	22
Charles Leclerc	C2	30,22	25
Pierre Gasly	C3	26,53	24
Sebastian Vettel	C3	30,45	22

What’s next?

Next up on the calendar is the Spanish Grand Prix. It’s a well known track to Formula 1 teams. They spend a week there near every year for pre-season testing. Teams like this track for its predictable surface and weather conditions. They know every inch of it and have it perfectly modelled in simulators. All that data makes racing here pathetically boring. But boring is good for predictions, right?

We’ll also take a stab at our model and for the first time try to improve it on top of just adding more data.

Until next time!

Predicting F1 race strategy using ML.NET – Part 1 – Bahrain

This is a multi-part series.

Part 1 – this article
Part 2 – Imola / Portimão

Let’s take out the first issue out of the way – I know it’s way too late to make predictions about the Bahrain race. I’ve been sitting on this blog post for quite some time, so I’m way behind. There is one more about Imola and Portimão incoming and hopefully by the Spanish Grand Prix will be all caught up.

Why am I gonna focus on tyres? (for now)

For the start, I will focus on the tyre strategy. Since refuelling was banned in 2009, choice of tyres and when to stop to change them has been a crucial part of the race strategy. In modern Formula 1 there’s only one tyre supplier – Pirelli. There are 5 compounds for dry condition (C1 is the hardest, and C5 the softest) and two types of tyres for wet conditions – “Wet” for heavy standing water on the track and Intermediate for the track that’s drying. For each race Pirelli picks three dry compounds for the teams and assign them more human-readable names – Hard (marked with white stripes), Medium (marked with yellow stripes) and Soft (marked with red stripes). So for example for Bahrain this year C2 was marked Hard, C3 – Medium and C4 – Soft. C1 and C5 were not used during that race.

Tyre choices for 2021 Bahrain Grand Prix. Source: F1TV.com.

For different races, those options may be different (for example in Portimão last week C1 was Hard, C2 – Medium, C3 – Soft). Pirelli makes that decision depending on how abrasive and destructive for tyres is the particular track. For each racing weekend in 2021 teams receive 8 red (Soft) sets of tyres, 3 yellow (Medium) and 2 white (Hard) sets for each card and are free to use them as they wish, but they need to use at least two compounds in the race. This rule is not enforced if there are wet conditions. For those occasion the team receives 4 sets of Intermediate and 3 sets of Full Wet tyres for each car.

Generally softer tyres have better grip (and hence cars can go through corners at higher sped → lower lap times), but lower durability and hard tyres have good durability but worse grip. They also have different operating temperatures, so it’s easier to overheat the soft tyres and more difficult to put the hard tyres into good operating temperatures. Harder tyres usually provide slower lap times, but you can cover longer distance in the race. It’s a balancing act – you can use three sets of softer tyres and hope that with better lap times you can cover the 20-30 seconds time you needed for additional tyre change. Or do only one stop on slower/harder compounds. This usually works better on tracks where it’s harder to overtake – even if you’re slower because of the harder compound, the faster car may not be able to overtake you. We could talk about it for hours, but I’ll introduce more nuances as we go along. This is enough nerd talk about rubber for now.

ML.NET

I’m going to use ML.NET in this series of posts. I have some experience with it, but I want to learn more about the whole ecosystem while doing some fun project. Training, inference, deploying models, visualizing data – basically making a working “product” and seeing what tools we can use along the way. For context, I generally have experience building machine learning – powered products in Python frameworks, and I also have years of knowledge working in .NET projects. I want to connect those worlds. And even though I have some previous knowledge, I’m going to try to jump into it like I know nothing and make it a learning experience for you and me.

ML.NET is an open-source cross-platform machine learning library built on .Net. It implements several type of ML algorithms for solving few most common problems. It doesn’t have the flexibility of TensorFlow, but the learning curve is much more approachable, and it allows you to be fairly productive pretty quickly without investing a lot of maths knowledge. Is it the responsible approach? Probably not in all cases. You should always have a good understanding of tools you use to not hurt yourself or others.

We’re solving a regression problem

ML.NET approach is: start first with what kind of problem you’re solving. When trying to predict a specific numeric values (number of laps that tyre will be used), based on past observed data (tyre stint information from previous races), in machine learning this type of prediction method is called regression.

Linear regression is a type of regression, where we try to match a linear function to our dataset and use that function to predict new values based on input data. A classic example of linear regression is predicting prices of a flat based on its area. It’s a fairly logical approach – the bigger the flat the more it costs.

So on the image above you can see a simple chart with input data (green dots), the linear function that we matched to the data. And the red dot is our prediction for a new data point. Matching that function to the data set is the “learning” part, sometimes also called “fitting”. Predicting new value from that function is “inference”.

The input values are called “features”, and the outputs we have in our learning data sets are called “labels”. With one feature we can visualize that data in two dimensions, but the more features the visualizations gets a bit out of control of our 3-dimensional minds. Fortunately maths much more flexible. Initially I’ll be using 11 features for our dataset to solve this problem.

My dataset

So where do I get the data? I’m using what’s publicly available, and hence my results will be much less precise than what teams can do. But let’s see where that will lead us.

I decided, as a first task, to predict a “stint” length for each driver on each type of tyre. Stint is basically a number of laps driver did on a single set of tires. If we have one change of tyres in the race, we have two stints – one from the start to the tyre change, and the second from the tyre change to the race finish. Two tyre changes – three stints. You get the grips of it. Sometimes a stint is finished with less positive outcome, like accident, or retirement because of technical issues. Sometimes it extends too much and finish with a tyre puncture. Usually teams will try to get max out of the tyres, but not overdo it.

I took data from charts like this and then watched race highlights or in some cases full races to figure out why some stints were shorter than expected. On the image below from this year’s Bahrain GP you can see that most drivers were doing 3 stints, or to use Formula1 lingo – they were on “two stop strategy”. And for some drivers the lines are shorter, because their races finished with one of those less-favourable outcomes.

Tyre stints for each driver in 2021 Bahrain Grand Prix. Source: F1TV.com

To make reasonable predictions, I focus only the period from 2017 to 2020. Why this period? 2017 was the last time we had major changes in aerodynamics and tyre constructions. And even though there were some minor modifications since then, the cars are in principle similar, and they abuse the tyres similarly. We also have many drivers overlapping, so we can find some consistency in their style of driving.

I also picked only races on the same track for this first approach. There are few reasons. I’m removing the variable of different track surfaces. The races are the same length in this period, so I can focus on predicting stint lent in “laps” and don’t need to worry about calculating it from distance covered.

Each row in the dataset is one stint. I collected features as listed below:

Date – mostly for accounting reasons, but maybe in future we could extract time of year for some purpose.
Track – different track, different surface properties impact tyre wear (although for now it’s only Bahrain)
Layout – sometimes racing uses different layout on the same track (for example Bahrain has 6 layouts, out of them 3 were used in Formula 1 since 2004, but mostly irrelevant for the 2017-2020)
Track length – will be useful when predicting covered distance instead of laps for future races
Team – different teams approach strategy a bit differently (for example priority for different drivers) – may be relevant in future
Car – different cars have different aero properties and “eat-up” tyres differently
Drivers – different driving styles, different levels of tyre abuse – for example Lewis Hamilton owes several of his victories to great tyre management
Tyre Compound – which tyre compounds were used during that sting. I use the C1-C5 nomenclature as on different races “Soft” can mean different things, and C1-C5 range is universal (apart from year 2017, but I mapped it to current convention)
Track Temperature – warmer surface allows quicker activation of tyres, but also quicker wear; too cold is not good either, because tyres can be susceptible to graining, which also lowers their lifespan.
Air Temperature – not sure, but I had access to data, so decided to keep it
Reason – why the stint ended – for example Just regular Pit-stop, End of Race, Red Flag, Accident, DNF
Number of laps covered in a stint – this will be my “label”, so what we’re training to be able to predict

The code

The code, minus the dataset, is available on my GitHub.

As I mentioned, my approach will be like I’m completely new to ML.NET. We already established that we’re solving a regression problem. So I copied the sample regression project that predicts rental bikes demand and made minimal changes to the code.

First, I have a single dataset. In machine learning you generally want to have a training set – to train, and a testing set to test your hypothesis on data. Testing data shouldn’t be process by algorithm. It’s a bit like taking a test knowing sample questions, but not having the exact question that will be on the test. Bike rental example has already those datasets divided, but I had to add that bit in my case.

// Load data
var data = mlContext.Data.LoadFromTextFile<TyreStint>(DatasetsLocation, ';', true);

// Divide dataset into training and testing data
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.1);
var trainingData = split.TrainSet;
var testingData = split.TestSet;

Next I build the calculation pipeline. It’s basically the order of transformation we want to do on the data. I just replaced field names from the example code with the fields I use in my dataset. You can notice quite a lot of “OneHotEncoding”. This is an operation that indicates this field is a categorical data. So for example, we don’t want to assume that “Alfa Romeo” is better than “Red Bull” because it sits higher in alphabetical order. We also normalize the values for Numerical data, so they’re easier computationally for the fitting algorithm.

// Build data pipeline
var pipeline = mlContext.Transforms.CopyColumns(outputColumnName: "Label", inputColumnName: nameof(TyreStint.Laps))
              .Append(mlContext.Transforms.Categorical.OneHotEncoding
                  (outputColumnName: "TeamEncoded", inputColumnName: nameof(TyreStint.Team)))
              .Append(mlContext.Transforms.Categorical.OneHotEncoding
                  (outputColumnName: "CarEncoded", inputColumnName: nameof(TyreStint.Car)))
              .Append(mlContext.Transforms.Categorical.OneHotEncoding
                  (outputColumnName: "DriverEncoded", inputColumnName: nameof(TyreStint.Driver)))
              .Append(mlContext.Transforms.Categorical.OneHotEncoding
                  (outputColumnName: "CompoundEncoded", inputColumnName: nameof(TyreStint.Compound)))
              .Append(mlContext.Transforms.NormalizeMeanVariance
                  (outputColumnName: nameof(TyreStint.AirTemperature)))
              .Append(mlContext.Transforms.NormalizeMeanVariance
                  (outputColumnName: nameof(TyreStint.TrackTemperature)))
              .Append(mlContext.Transforms.Categorical.OneHotEncoding
                  (outputColumnName: "ReasonEncoded", inputColumnName: nameof(TyreStint.Reason)))
              .Append(mlContext.Transforms.Concatenate
                  ("Features", "TeamEncoded", "CarEncoded", "DriverEncoded", "CompoundEncoded", 
                  nameof(TyreStint.AirTemperature), nameof(TyreStint.TrackTemperature)));

So far so good, let’s now train that pipeline with our test data. It looks surprisingly simple. I didn’t mess with the algorithm or the parameters of it. Just took the same regression algorithm used in Bike Rental examples. There will be time for some optimizations later.

var trainer = mlContext.Regression.Trainers.Sdca(labelColumnName: "Label", featureColumnName: "Features");
var trainingPipeline = pipeline.Append(trainer);

// Training the model
var trainedModel = trainingPipeline.Fit(trainingData);

So is our model good to make some predictions? Let’s try it out:

var predictionEngine = mlContext.Model.CreatePredictionEngine<TyreStint, TyreStintPrediction>(trainedModel);

var lh1 = new TyreStint() {
                  Team = "Mercedes", Car = "W12", Driver = "Lewis Hamilton", 
                  Compound = "C3", AirTemperature = 20.5f, TrackTemperature = 28.3f, Reason = "Pit Stop"};
var lh1_pred = predictionEngine.Predict(lh1);

var mv1 = new TyreStint() { 
                  Team = "Red Bull", Car = "RB16B", Driver = "Max Verstappen", 
                  Compound = "C3", AirTemperature = 20.5f, TrackTemperature = 28.3f, Reason = "Pit Stop" };
var mv1_pred = predictionEngine.Predict(mv1);

I run two example runs for top drivers. Lewis Hamilton got 19.46 laps on C3 tyre (“Medium” for Bahrain) and Max Verstappen – 14.96 lap. First of all values are reasonable – they’re not outrageously low or high. They also show what’s generally tends to be true – that Lewis Hamilton is better at tyre management than Max Verstappen. But they most likely include the fact, that Max was doing very poorly in recent years in Bahrain and on average had shorter stints because of DNFs.

So how good did we do?

So here are our predictions vs real stints of drivers. If the drivers took more than one stint on the compound, I listed all of them. If the driver didn’t use that tyre, the Actual column will be empty.

Driver	Compound	Prediction	Actual
Lewis Hamilton	C2	26,78	15; 28
Lewis Hamilton	C3	19,81	13
Lewis Hamilton	C4	16,18	–
Valtteri Bottas	C2	28,53	14; 24
Valtteri Bottas	C3	21,56	16; 2
Valtteri Bottas	C4	17,92	–
Max Verstappen	C2	22,27	17
Max Verstappen	C3	15,31	17; 22
Max Verstappen	C4	11,67	–
Sergio Pérez	C2	26,37	19
Sergio Pérez	C3	19,41	2; 17; 18
Sergio Pérez	C4	15,77	–
Lando Norris	C2	25,62	23
Lando Norris	C3	18,65	21
Lando Norris	C4	15,01	12
Daniel Ricciardo	C2	29,19	24
Daniel Ricciardo	C3	22,22	19
Daniel Ricciardo	C4	18,58	13
Lance Stroll	C2	23,35	28
Lance Stroll	C3	16,39	16
Lance Stroll	C4	12,75	12
Sebastian Vettel	C2	23,15	31
Sebastian Vettel	C3	16,18	24
Sebastian Vettel	C4	12,54	–
Fernando Alonso	C2	32,52	3
Fernando Alonso	C3	25,55	18
Fernando Alonso	C4	21,91	11
Esteban Ocon	C2	32,11	24
Esteban Ocon	C3	25,14	18
Esteban Ocon	C4	21,50	13
Charles Leclerc	C2	29,28	24
Charles Leclerc	C3	22,31	20
Charles Leclerc	C4	18,67	12
Carlos Sainz	C2	26,97	19
Carlos Sainz	C3	20,01	22
Carlos Sainz	C4	16,37	15
Pierre Gasly	C2	32,93	15; 13
Pierre Gasly	C3	25,96	4; 20
Pierre Gasly	C4	22,32	–
Yuki Tsunoda	C2	30,88	18; 23
Yuki Tsunoda	C3	23,91	15
Yuki Tsunoda	C4	20,27	–
Antonio Giovinazzi	C2	26,41	18
Antonio Giovinazzi	C3	19,44	12; 25
Antonio Giovinazzi	C4	15,806	–
Kimi Räikkönen	C2	25,65	16
Kimi Räikkönen	C3	18,68	13; 27
Kimi Räikkönen	C4	15,04	–
Nikita Mazepin	C2	25,92	–
Nikita Mazepin	C3	18,95	–
Nikita Mazepin	C4	15,31	–
Mick Schumacher	C2	25,92	22
Mick Schumacher	C3	18,95	14; 19
Mick Schumacher	C4	15,31	–
George Russell	C2	21,46	–
George Russell	C3	14,49	23; 19
George Russell	C4	10,85	13
Nicholas Latifi	C2	22,00	–
Nicholas Latifi	C3	15,04	18; 19
Nicholas Latifi	C4	11,40	14

So predictions are actually relatively reasonable. Those are not numbers that do not make sense. The model figured out that the C2 (Hard) tyres are lasting longer than C3 (Medium) and C4 (Soft). The numbers seem to be higher for drivers with better tyre management skills. They of course do not match the real numbers from the race, because there’s a lot of other things happening in the race. Crashes, retirements, reacting to strategy of other teams. Those are all things that we’ll need to somehow include in the future, but that’s not a problem for today. Overall I feel good about it. I think we have a decent model.

What does the metrics say?

But do we? Running some metrics that ML.NET have built in…


var predictions = trainedModel.Transform(testingData);
var metrics = mlContext.Regression.Evaluate(predictions, labelColumnName: "Label", scoreColumnName: "Score");

Console.WriteLine($"*************************************************");
Console.WriteLine($"*       Metrics for {trainer.ToString()} regression model      ");
Console.WriteLine($"*------------------------------------------------");
Console.WriteLine($"*       LossFn:        {metrics.LossFunction:0.##}");
Console.WriteLine($"*       R2 Score:      {metrics.RSquared:0.##}");
Console.WriteLine($"*       Absolute loss: {metrics.MeanAbsoluteError:#.##}");
Console.WriteLine($"*       Squared loss:  {metrics.MeanSquaredError:#.##}");
Console.WriteLine($"*       RMS loss:      {metrics.RootMeanSquaredError:#.##}");
Console.WriteLine($"*************************************************");

…gives following output:

*************************************************
*       Metrics for Microsoft.ML.Trainers.SdcaRegressionTrainer regression model
*------------------------------------------------
*       LossFn:        103,16
*       R2 Score:      -0,26
*       Absolute loss: 7,99
*       Squared loss:  103,16
*       RMS loss:      10,16
*************************************************

I’m not going to dig into what all of those metrics mean and just focus on the R2 Score, also known as Coefficient of determination. The better our function fits the testing data, the closer that value is to 1. 0 is bad. Negative is catastrophically bad. So even though some of our values looks reasonable, it’s probably accidental at this point. We’ll explore possible reasons for that and other metrics in future posts. Tyres vs Michał 1:0.

What’s next?

In the next instalment we’ll look at two races of Imola and Portimão. They will be tricky to get any reasonable results, because both of them will be running for the second time in recent years. But this will be a good occasion to extend our model and dataset to include other tracks. We’ll no longer be using laps as our label, because different tracks have different lengths. We’ll use total distance covered and calculate the number of laps out of it.

We should also look into removing some irrelevant data, like stints ended with crashes or done purely to beat the fastest lap (it’s a thing, but a long story to explain).

At some point we’ll also have to take a crack and why those metrics are so bad, and try different regression optimizers and different hyperparameters.

Until next time!

Part 2 – Imola / Portimão

Building IA32 assembly project in Visual Studio 2017

If you saw my presentation at DevConf you may have gotten curious and want to play with assembly language yourself. Setting up Visual Studio for compiling and debugging Intel assembly is not hard, but not straightforward. Hence, this short tutorial. I’ll show you simple setup with C file as an entry point and small assembly “library”.

Project setup

1. Create new project – C++ Win32 Console Application

2. On a first screen click Next, as we want into pick some custom settings. On the second mark “Console Application” and “Empty Project”. For this simple example, we won’t need any default Windows headers. Click “Finish”.

3. Right-click on the project, “Add” -> “New Item…”. Pick C++ file, but rename it to .c, for example “source.c”.

4. Add another file in a similar way and also rename it – to “lib.asm”. What names you pick doesn’t really matter, as long as C and ASM file have different. Add also header file. Name it for example “header.h”. Your project should have a similar structure to this:

5. Right-click on the project, “Build Dependencies” -> “Build Customizations…”. Mark “masm” as selected. This adds Microsoft Macro Assembler into the build.

6. Right-click the assembly file (lib.asm), “Properties”. Item type most likely says “Does not participate in build”. Change it to Microsoft Macro Assembler.

Code

Now let’s add simplest possible code to make it run. First assembly file. This code declares simple multiplication function that takes two arguments.

;;lib.asm

.386
PUBLIC _multiply

_CODE SEGMENT dword public 'CODE' use32
ASSUME CS:_CODE

_multiply PROC near

	push    ebp
	mov     ebp, esp   
	push	ebx
	
	mov	eax, [ebp+12]			;; second argument
	mov	ebx, [ebp+8]			;; first argument
	
	imul	eax, ebx			;; multiplying; lower 32 bits -> eax; 
						;; higher -> edx
	pop	ebx
	pop	ebp
	ret
_multiply ENDP

_CODE ENDS
END

Then let’s add the header, so we can use this function:

#pragma once

int multiply(int a, int b);

And for the finish, the C file that will call our function:

#include "header.h"

int main(int argc, char *argv[]) {
	int a = 3;
	int b = 2;

	int mult = multiply(a, b);

	return 0;
}

Debugging

Now you’ll be able to debug it like every other application in visual studio. You can set breakpoints and even Watches to see what’s in registers:

One problem, that I’ve noticed happening quite often, is that Visual Studio doesn’t notice changes made in the .asm file and do not recompile the project. I found that a good way to force it is to change target CPU for something different, build, change back to our preferred settings and rebuild again.

Resources

Intel® 64 and IA-32 Architectures Software Developer’s Manual

This x86 Assembly guide at University of Virginia CS

Good luck with your experiments and exploring your computer’s architecture!

Phoenix, Ecto and time zones

This is another part of my series on learning Elixir.

In the previous episode, I calculated the distance of flight. I will use it later for gathering statistics. Another metric, I would like to have is a time of flight. Usually, flight departure and arrival times are presented in local time. So it gets a bit tricky to get the time length when the journey spans across few time zones.

Elixir’s standard DateTime library doesn’t work with time zones very well. The Internet suggests I should use Timex.

But first I needed to make some changes into my database because up until now, I had only flight date. A few weeks ago I wrote a post, how to update your schema with migrations and this time I followed the same steps. Still, I haven’t figure out how to do it in a bit more automated manner.

defmodule Flightlog.Repo.Migrations.CreateFlight do
  use Ecto.Migration

  def change do
    alter table(:flights) do
      modify :arrival_date, Timex.Ecto.DateTimeWithTimezone
      modify :departure_date, Timex.Ecto.DateTimeWithTimezone
    end
    
  end
end

As you can see, I used specific Timex types. This works only with Postgre, and if you want to use timezones it needs one additional step. You’ll have to add custom type to your database:

CREATE TYPE datetimetz AS (
    dt timestamptz,
    tz varchar
);

You can read more about using Timex with Ecto on this documentation page.

I also updated my flight.ex model. It looks like that right now:

defmodule Flightlog.Flight do
  use Flightlog.Web, :model

  schema "flights" do
    field :departure_date, Timex.Ecto.DateTimeWithTimezone
    field :arrival_date, Timex.Ecto.DateTimeWithTimezone
    field :flight_number, :string
    field :plane_type, :string
    field :from, :string
    field :to, :string

    timestamps()
  end

  @doc """
  Builds a changeset based on the `struct` and `params`.
  """
  def changeset(struct, params \\ %{}) do
    struct
    |> cast(params, [:departure_date, :arrival_date, :flight_number, :plane_type, :from, :to])
    |> validate_required([:departure_date, :arrival_date, :flight_number, :plane_type, :from, :to])
  end
end

After that, I walked along the path that’s proven to work in the flight distance part. I added a new function in my math.ex library, making use of Timex diff function:

    def flightTime(earlier, later) do
        hours = Timex.diff(later, earlier, :hours)
        minutes = rem(Timex.diff(later, earlier, :minutes), 60)
        "#{hours}:#{minutes}"
    end

And I’m calling it from the view in another function, so it’s easily accessible from the template:

  def time(time1, time2) do
    Flightlog.Math.flightTime(time1, time2)
  end

And that was it. Although it took me few hours because I was struggling a bit with Timex types. I didn’t read carefully the documentation and for example missed the step with creating new Postgre type. Good lesson here, to look into docs carefully :)

The effect:

Screen Shot 2017-05-07 at 23.26.43.png

As you can see, I made some approach at formatting dates. Unfortunately, I didn’t manage to show them in local time. They’re in UTC in here. This will be next step most likely.

That’s all for today. Next week we’ll try to test this new module. In the meantime, check previous episodes. And if you’re interested in machine learning, look into my weekly link drop.

Broadcasting custom packages on Estimote Location Beacons with Raspberry Pi

Last summer together with Karl-Henrik Nilsson and Nejc Palir I worked on a fun project that incorporated new features of Estimote Location Beacons. It was the solution for a public transport company that used beacons for two things – delivering proximity-based features and broadcasting custom messages to customers. The second part was possible thanks to GPIO ports on (then very) new Location Beacons. We worked closely with Estimote to deliver a solution based on not very well documented feature. It was built with Raspberry Pi running WiFi hotspot powered by an external battery, which runs Node.js serving configuration application. On top of that, we built client application for iPhone and several other components.

For this post, I’ll show a simpler example, to isolate one particular feature. I’ll focus on connecting Raspberry Pi to beacon, and writing code that will change broadcasted “Local Name” of it. I’ll also use Node.js as it allows for rapid changes without the need to recompile. This will be a good occasion to talk a little bit about UART, Bluetooth data frames and simple npm package we built as a result of this project. It’s open sourced on GitHub, so you can peek inside to see our lousy JavaScript and maybe offer some improvements? :)

You’ll need one Raspberry Pi with a clean install of Raspbian, one Location Beacon and 4-wire ribbon cable. Make sure your RPi can connect the Internet and install Node.js and npm.

Connecting Raspberry Pi to Estimote Location beacon

There is a bunch of features that differentiate Location Beacons from older Proximity Beacons. They have better battery life, a longer range, can broadcast multiple packets at the same time, more sensors, internal memory and few other things. They all greatly increase usage scenarios for beacons. More importantly, Location Beacons have GPIO port. You can use it for sending signals to connected devices, for example turning on lamps when you’re in proximity. But you can also use the port in UART configuration and use it to send data in both directions. For example changing BLE frame that is broadcasted by the beacon.

GPIO port has four pins. GND, Vcc IN (5V Power), GPIO0 and GPIO1 Estimote pinout.png
Picture comes from official Estimote App

Picture comes from Element14

When you set the beacon to work in UART configuration (more on that in next paragraph), GPIO0 will be UART_RX (receiving) and GPIO1 will be ‘UART_TX’ (transmitting). To establish proper communication with Raspberry Pi, you’ll need to connect Vcc IN to Pi’s 5V power (pin 02), GND to GND (pin 06), GPIO0 (receiving) to GPIO14 (aka TXD0 – UART transmitting, pin 08) and GPIO1 to GPIO15 (aka RXD0 – UART receiving, pin 10). So power to power, ground to ground and transmitting to receiving (so one device can receive what other transmits).

Connection scheme.png

Picture: Simplified schematic made using Fritzing. The beacon image comes from a package made by Devin Mancuso that’s available on CreativeCommons Attribution 3.0 license. I modified his work to use in Fritzing.

Configuring beacon

Estimote Location Beacons by default are not configured to use GPIO in UART mode. There is no option to do so in the official app, but you can use the Estimote iOS SDK to do that. Alternatively, Piotr Krawiec built a modified version of Estimote’s “Configuration” iOS example/app template that does exactly that. You need to compile it by yourself and deploy on your iPhone. Then you’ll have to connect to the beacon and configure it. That should be all. Unfortunately, you won’t get any visual confirmation from the beacon that it worked.

Screenshots from UART configuration app

Estimote Android SDK doesn’t support enabling the UART mode yet so you might want to borrow an iPhone from a colleague. Alternatively, Estimote told us that if somebody asks for this to be added to the Android SDK, they will gladly do that.

Configuring Raspberry PI to use UART instead of console

We’ll also need some changes in RPi configuration. By default, the UART ports are set to work as external console connection point. So we need to do 2 small changes:

In file /boot/config.txt add line
enable_uart=1
in /boot/cmdline.txt change serial0 to serial1 and make sure baud rate is set to 115200

Writing to the beacon

UART (Universal Asynchronous Receiver/Transmitter) is a hardware used for asynchronous serial communication. It has very lightweight communication protocol and hence is quite often used in microcontrollers. Estimote beacons use UART with custom implemented protocol, called EstiUART. You can find simple documentation for it in this package from Estimote. The way it works is that whole Bluetooth frame that you want to broadcast gets wrapped in another frame of Estimote bytes and you just send it over at the certain baud rate.

Estimote has 3 custom broadcasters and you can broadcast 3 different messages in parallel. On top of that, you can use standard iBeacon or Eddystone broadcasting packages. EstiUart allows setting transmitting power, transmitting frequency and Bluetooth package for each of them separately. To set new broadcasting messages, first, you need to enable one of the broadcasters. Then you send whole BLE frame wrapped between Estimote header end stop byte.

There are 3 special bytes that you’ll have to take special care of:

Estimote start byte. Every message is started with it: 0x73
Estimote stop byte. Every message is finished with it: 0x65
Escape byte: 0x5C
One example use is when you need to encode letter ‘e’ that has an ASCII code of 101 (0x65 in hex) so it won’t be confused with stop byte.

Example frame

Let’s look at the example below. It sets Local Name of the first of three advertisers to value “Test”:

0x73,0x13,0x42,0x10,0x00,0x00,0xBB,0xBB,
0xBB,0xBB,0xAE,0x02,0x01,0x04,0x06,0x09,
0x54,0x5C,0x65,0x5C,0x73,0x74,0x00,0x65

Bytes	Description
0x73	Estimote start byte
0x13	Register that will be written to. 0x13 is a register to set advertised data for the 1^st broadcaster. That means, that 1st broadcaster will now transmit BLE package starting from next byte. Some other examples: 0x11 is used to set advertising power (Tx) of the first broadcaster; 0x22 is used to set advertising interval for the second broadcaster.
0x42	1^st byte of PDU header, describing modes of communication. This is the beginning of Bluetooth frame. For Estimote this will always be 0x42. You can read more in this great blog post.
0x10	2^nd byte of PDU header – length of the packet. In this case 16. This includes following zero-byte, MAC address, following meta information and the message itself, but not padding. For the message, you shouldn’t count escape bytes, as they’re processed in estimote and won’t be transmitted as a part of Bluetooth packet.
0x00	Must be zero by design.
0x00 0xBB 0xBB 0xBB 0xBB 0xAE	MAC address. This is so-called static MAC address. It can be generated randomly but needs to meet few requirements.
0x02	Following is a number of GAP (Generic Acces Profile) sections consisting of three parts. 1) The length of the following section. In this case next 2 bytes.
0x01	2) Type of section. 0x01 means “flags”. It’s the only mandatory section. Read more here.
0x04	3) Section Content. For Estimote this flag must be set to 0x04. Read more about flags here.
0x06	Next GAP section length
0x09	Type of section. 0x09 is Complete Local Name.
0x54 0x5C 0x65 0x5C 0x73 0x74	Section Content. This is the Local Name that will be broadcasted: “Test” ASCII for T, Escape byte (because ASCII for ‘e’ is the Stop byte), ASCII for ‘e’, Escape byte (because ASCII for ‘s’ is the Start byte), ASCII for ‘s’, ASCII for ‘t’.
0x00	Padding. From my experience, just one byte is enough.
0x65	Estimote stop byte

Beacon-pie package

To not deal with building the frame every time, me and Karl-Henrik build npm package that takes the message and do the work for us. It’s called beacon-pie. Currently, it only supports setting Local Name, but potentially you can set this way lot of other Bluetooth properties, like offered services. We also add controls to turn on/off broadcasters and change their parameters (interval and Tx).

Here is the simplest example how to set up Local Name to “Test” with BeaconPie. You can either set up all properties in one function or all of them separately.

How to confirm that it worked?

You don’t need to write your own app to just make sure it works. Eventually, you’ll want that because that’s why you make changes on the beacon in the first place. But to just check, there’s a simpler way. Download one of the beacon reading apps available on your platform application store. I use Nordic Semiconductors nRF Connect, which is available both for iOS and Android.

Screenshot from nRF Connect. You can see both our “device” that transmits local name ‘Test” and one of the default Estimote packets.

Summary

So to sum things up. It’s not well documented, but it’s totally possible to use GPIO of Location Beacons as UART input and alter broadcasting messages of the beacons. I used Raspberry Pi, but it will work with anything that can send bytes. And you can construct bytes manually, or use our npm package. Feel free to contribute, fork it or steal it.

Useful links

If you’re gonna write iPhone apps interacting with Bluetooth Low Energy Devices, I found those two blog posts very helpful:

Thanks for coming by. This was the most labour-heavy post on this blog so far. I also write about functional programming and machine learning. Look around, if you’re interested.

Doing math in Elixir – calculating Great Circle Distance

In last two parts, I was setting up and consuming quick API to get additional airport data. One of this information was exact airport location on the Earth. In this part, I’ll use that information to calculate the distance of the travel and show how you can use standard math library in Erlang VM.

Calculating distance of air travel based on location is not super hard, but it needs to accommodate the fact that Earth is not flat. We’ll use something called “Great-circle distance“, as that’s how planes fly. So when you take most commonly used Mercator projection of the map, the shortest path between two points won’t be a straight line. It’s gonna have more parabolic-like shape. It’s best visible if you pick longer flights. Just go to flightradar24.com and check any plane overflying the Atlantic. They seem to be all flying towards north first and then turn south as the flight progresses. In fact, they fly a straight line, but because Eart is a sphere (in simplified model), meridians are not really parallel to each other.

After this geography primer, let’s get to math. The formula for calculating great-circle distance looks like that:

$\Delta\sigma=\arccos\bigl(\sin\phi_1\cdot\sin\phi_2+\cos\phi_1\cdot\cos\phi_2\cdot\cos(\Delta\lambda)\bigr).$

Where $\phi _{1},\lambda _{1}$ and $\phi _{2},\lambda _{2}$ are latitude and longitude of two points on Earth. This formula takes in radians and outputs radians. The result is angle difference between two points. To get the distance in kilometres, we have to multiply it by the radius of the Earth, which in the metric system is around 6370kms.

$d=r\,\Delta \sigma .$

To the code! I changed my functions in view, so instead of returning separately latitude and longitude, they give back a pair of coordinates as a tuple.

https://gist.github.com/mlusiak/0c934302b100a8b0f8899010fdd8422b

Then I pass the tuples to newly created function distance, that calls newly created Flightlog.Math module. I put the module in /lib folder and it all worked by the first run!

https://gist.github.com/mlusiak/031d81a94c3302bb6be9068c8bc29345

:math is a reference to Erlang VMs math library. One thing that’s needs explanation is changing degrees into radians. Radian is a different mathematical representation of the angle. 1 radian means, that the length of the arc of the part of the circle, that this degree describes equals the length of the radius of that circle. So the full circle is slightly over 6 radians (2 * Pi), as a formula for the length of the circumference of the circle is 2 * Pi * radius.

That’s all for today. Next week we’ll try to test this new module. In the meantime, check previous episodes. And if you’re interested in machine learning, look into my weekly link drop.

Setting up quick API with F# and Azure Functions

As mentioned in my last week Elixir blog post, I produced some quick fake API based on Azure Functions. I thought it’s gonna take a couple of minutes, but it turned out to be a whole adventure in itself.

The creating of a function is a breeze.

Go to Portal, click big green “+” sign and search for “Function App”
Pick Function App published by Microsoft
Fill all the necessary fields like App name (must be globally unique) or location. For hosting plan I used “Consumption plan” which means, I pay only for the time that function is running. I also like to pin my stuff to the dashboard, so it’s easier to find.
It will take several minutes to deploy.
Now you can create your functions for the app. F# is hidden in small print just above “Create this function” button. So click “create your own custom function”.
Then with Language drop-down, pick “F#” and for Scenario – “API & Webhooks”. There should be on the F# function triggered by HTTP request. That’s the one you want for API.
You’ll get premade piece of code with a simple function that is triggered by HTTP POST with name object and responses “Hello “.

Then I started writing the logic I wanted. I made an array of hard coded airport data. I made the function to accept only GET requests (you can change it in function.json file). In code, I parse query strings and get the airport IATA code. If I have this airport in my array, I response 200 with JSON containing the data. Otherwise, I return 404. If there’s no parameter in the query string, function answers with 500.

It’s relatively simple and straightforward F# code. I just struggled a lot with debugging. The small editor on Azure doesn’t give you static analysis, nor type information and no squigglies. You need to run the function and check for compilation errors or runtime errors. There was also some weird scoping behaviour, that forced me to declare the Airports array within the function. Anyways, after 2hrs I had an API that did what I wanted. You can see the code below. It’s not bulletproof, but it does the job. And I got to play with Azure Functions a bit.

https://gist.github.com/mlusiak/5053aabbc1c6e76db082dc8daa952c81

If you want to read more about other types of F# Azure Functions, Mathias Brandewinder wrote recently two posts about timer and queue triggered functions.

That’s all for today. Tune in next week for another part. Also, check previous episodes. And if you’re interested in machine learning, look into my weekly link drop.