Date 03/26/19

The Way You Collect Data Can Make or Break Your Next AI Project


If you want to successfully solve a problem using machine learning, the foundation of your work should not be a set of fancy algorithms. Rather, your first step should be the collection of high-quality data. After all, data is what the machine learning model will learn from. And if the data is incorrect, noisy, or inaccurately labelled, there’s no fancy algorithm in the world that can save you.

We know this at Imagimob because we’ve tried and failed plenty of times before finally cracking the formula for successful AI projects. And, based on what we’ve learned, the first priorities for anyone looking to solve a problem using machine learning should be figuring out what data to collect and what tools to collect it with.

Let's start with the tools
When we first started running AI projects, we had no tools. So we built some extremely crude ones. At Imagimob, we build machine learning models that act on time-series data: signals or events happening over time, such as signals from sensors, an engine, or a radar.
 
So, we built tools to collect and store this kind of data in a very efficient manner, only to find out that humans are not very good at spotting patterns in it. It’s very difficult to understand what raw sensor data actually means, as you can see in the picture below. So now we had a lot of high-quality data and no way of accurately labelling it.

This meant that the machine learning model had the right data, but with the wrong annotation. Hence the model was being deceived by design. What we learned the hard way was that, in our branch of machine learning, we need to collect a lot of metadata. And not for the algorithms, but to enable us humans to teach the algorithms.
 
So we developed data collection tools that can capture video and audio together with the time-series data, and even annotate the data on the fly. This means that we can start labelling on the spot. Additionally, if we need to collect data over the course of days or months, we can go back through the data as if we were there in real time: freeze time, rewind, and get those labels just right.
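To make that concrete, here is a minimal sketch of the underlying idea. This is not Imagimob’s actual format; the Sample, Label, and labels_for names are purely illustrative. The key point is that labels live as time spans on the same timeline as the sensor samples, so they can be adjusted later while replaying the synchronized video:

```python
from dataclasses import dataclass

@dataclass
class Sample:
    timestamp: float   # seconds since the start of the recording
    values: tuple      # e.g. (ax, ay, az) from an accelerometer

@dataclass
class Label:
    start: float       # span start, on the same timeline as the samples
    end: float         # span end
    name: str          # e.g. "fall", "walking"

def labels_for(sample: Sample, labels: list[Label]) -> list[str]:
    """Return the names of all label spans that cover this sample."""
    return [l.name for l in labels if l.start <= sample.timestamp <= l.end]

# Because labels are spans on a shared timeline rather than per-sample tags,
# they can be moved or resized afterwards ("freeze time, rewind") without
# ever touching the recorded sensor data.
labels = [Label(start=12.0, end=13.5, name="fall")]
print(labels_for(Sample(timestamp=12.7, values=(0.1, -9.6, 0.3)), labels))  # ['fall']
```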
 
But even with the perfect tools, you need to know what data to collect
Here is where it gets interesting. Of course, what data you should collect depends highly on your specific problem, and machine learning can be applied to so many areas that it’s difficult to generalize. But, in any case, we’ve got one piece of hard-earned wisdom to share with you: measure performance from the end-user’s perspective.

In some cases, the "real" world differs from the academic world, and machine learning academia is a good example. In machine learning academia, accuracy numbers are everything. You "win" if you create a machine learning model that solves a problem with the highest degree of accuracy, regardless of whether or not the collected data is representative of a real-world scenario, or whether an accuracy number even makes sense as a measurement in that scenario.
 
Here’s an example: 
Let's say that you are designing a machine learning model that detects whether or not a person is falling, by reading the motion sensor information from a sensor attached to their body. Suppose you measure the performance of your solution in the standard academic way. Your measurements might look like this:
 
Correctly detecting a fall, when an actual fall has occurred: 95% accuracy
Correctly detecting that a fall did not occur, when an actual fall did not occur: 99.5% accuracy
 
These numbers look impressive: 95% and 99.5%! Almost unheard of. Too bad they don’t mean anything in the real world.
 
What you should have done instead is collect data from real people for extended periods of time. That way, you can translate the second piece of information into another measurement: the number of false triggers per user per day. By doing so, you would see 99.5% accuracy translate into several false triggers per day, and a product that is, quite frankly, not good enough for the market.
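The arithmetic behind that translation is worth spelling out. Here is a back-of-the-envelope sketch; the one-prediction-per-second rate is an assumed value for illustration, and in practice overlapping windows and clustered errors change the exact count, but the order of magnitude is the point:

```python
# Rough sketch: how a 99.5% "no fall" accuracy plays out over a day of
# continuous use. The prediction rate is an assumed value for illustration.

predictions_per_day = 24 * 60 * 60       # assume one prediction per second
specificity = 0.995                      # the 99.5% figure from above
false_positive_rate = 1 - specificity    # 0.5% of "no fall" moments misfire

false_triggers = predictions_per_day * false_positive_rate
print(f"~{false_triggers:.0f} false alarms per user per day")  # ~432
```

Seen this way, a seemingly negligible 0.5% error rate turns into hundreds of potential false alarms per day, which is exactly the kind of number an end user judges the product by.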
 
The most important tip we can offer for your next AI project is to collect this kind of data and take these measurements as early as possible. Not only will this result in a better product, it will also save you time. A lot of time.
 
And, of course, the best way to collect this data is using Imagimob Capture, our tool for data collection. Contact us to learn more.

Alexander Samuelsson, CTO and Co-Founder

About the video below: In a project with Husqvarna Group, we are doing data collection together with professional foresters.
