Q1) Why is SensiML collecting this data?
A: AI technology is a powerful tool for finding patterns in rich and/or large sets of data. Our company has worked across a broad set of sensor data applications since 2012, when we began as an Intel project team applying AI to wearable computing and industrial sensing. We spun out to form SensiML in 2017.
Whether detecting faulty pumps, bearings, and machinery, or analyzing cough audio for the subtleties that differentiate suspected COVID-19, AI relies on high-quality training data to produce good results. Training data combines readily measurable source sensor data with labeled outcomes (so-called ‘ground truth’) that are typically not so easy to obtain. Together, this data is used to teach the AI algorithm how to make predictions on future data without being supplied the ground truth. It’s for this reason that we ask for both the cough sound itself and answers to a set of basic questions: the answers help establish the ground truth needed to properly train the AI algorithm for subsequent insight.
Q2) Who is SensiML and what is its expertise in COVID-19 detection?
A: SensiML is a software company specializing in AI tools for developers of smart sensor devices. So while you may not have heard of us directly, you’ve no doubt heard of many of the companies that use our AI software to build intelligence into their wearable and IoT products. Our origins are also well recognized: we spun out in 2017 from Intel Corporation, where the predecessor of our current software toolkit supported intelligent sensing for Intel Curie and Quark SE microcontrollers. That said, we are neither pulmonologists nor epidemiologists. We rely on our academic and healthcare partners for underlying medical domain expertise as required.
Q3) What will you do with any data I provide?
A: Our intent is to quickly collect a dataset large enough to support good AI science in the application of sound analysis for symptom classification. We envision this application being used as a decision support tool for clinical diagnostic testing and/or as an aid in screening people suspected of being infectious in high-risk environments. If you share our desire to develop better tools to help re-open economies safely, we invite you to participate in our effort by supplying data.
Those who do provide data should know that we:
- Do not ask for anything that could be personally identifiable (if your sound file contains anything but three coughs, it will be excluded and deleted)
- Will aggregate all data provided in an anonymized form
- Will not surreptitiously collect other identifying data such as your IP address or browser tracking data
- Will not contact you for follow-up or marketing purposes (after all, we won’t know who sends us their data)
- Will share data from this collection project as open source to benefit all, not just our own commercial efforts
More details can be found in the privacy policy linked just above the ‘Submit Data’ button.
Q4) What if I don’t feel comfortable answering some or all of the questions?
A: It’s entirely up to you to decide what data you are comfortable sharing (or whether to share anything at all). Certainly, the more complete the survey, the more useful it is for training our or others’ AI algorithms, but even partial responses can be of use.
Q5) When will SensiML be publishing the open-source dataset?
A: We will publish the dataset once we have reviewed and filtered the samples provided (we will reject any audio files other than those containing three coughs) and concluded that the aggregate sample size is large enough to be statistically meaningful. We will publish on datadepot.sensiml.com, so you may check there periodically to see when it is available. Because we will not be collecting names, we cannot proactively follow up with participants to let them know. Thank you for your understanding.