Dr. Michael Beck is always doing fieldwork, even when he’s sitting in a computer lab.
Dr. Beck, who teaches in The University of Winnipeg’s Department of Applied Computer Science, works in the burgeoning field of digital agriculture, which leverages the latest intelligent technology to innovate in the agriculture and agri-food sectors.
His latest research project, which has landed a five-year, $137,500 NSERC Discovery Grant, could one day revolutionize how farmers tend to their crops.
The project, entitled “Optimization of data and their collection processes for machine learning applications in digital agriculture,” grew out of TerraByte, an ongoing interdisciplinary research project at UWinnipeg that explores machine learning applications in agriculture.
“I think TerraByte is an amazing project,” Dr. Beck said. “It has grown so quickly from very humble beginnings. It goes from physics to computer science to biology and engineering. All these topics are touched on at our weekly meetings.”
Dr. Beck, who holds a Master’s in Mathematics and a PhD in Computer Science, never would have guessed he’d end up applying those disciplines to agriculture.
“It was definitely not what I expected to do when I started in mathematics,” he said with a laugh.
At TerraByte, where he is one of the lead researchers, he builds robots, develops databases, and tackles problems using data analytics. His first task in the TerraByte lab was to build EAGL-I (pronounced “Eagle Eye”), a track-based robot equipped with a camera for scanning plant seedlings.
EAGL-I creates the large image sets needed for machine learning models to perform in an agricultural environment, detecting weeds or assessing plant health. To be trained properly, machine learning models need to be shown relevant images again and again.
“This data set did not exist at the time,” Dr. Beck explained. “Even if we know that this is a soybean field, and that most of the plants we are seeing here are soybeans, when we give these images to a program and say, ‘All of these are soybeans,’ this will not be true, because there will be some weeds in between. The model will then start mixing these up. So, labelling is the tricky part. This is why we built these robots.”
However, large datasets are unwieldy to work with and take up huge amounts of server space. They can also suffer from redundancy: a high number of duplicate and near-duplicate images that are of little benefit.
“For training machine learning models, we want to give them as much variety as possible,” Dr. Beck explained. “The more varied our data set is, the better it will generalize.”
Weeding the data fields
Dr. Beck’s NSERC-funded project aims to find an efficient way to sift through the data sets, identify the redundant images, and weed them out. A final step involves determining how much this pruning affects the overall performance of the machine learning model.
“If I have two images that are very similar, I can probably get away with throwing one of them out, and now I have a smaller data set to train on, but I will still get the same performance,” Dr. Beck said. “This can be a real game-changer because creating these machine-learning models is very compute intensive.”
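To make the idea of weeding out near-duplicates concrete, here is a minimal sketch in Python. It is not the method used in Dr. Beck's project, which the article does not describe; it swaps in a simple, well-known technique called average hashing. The function names, the `max_distance` threshold, and the use of small grayscale pixel grids in place of real photographs are all assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical stand-in for a real photo: a small grid of grayscale values.
Image = Sequence[Sequence[float]]


def average_hash(image: Image) -> int:
    """Fingerprint an image: one bit per pixel, set when the pixel is brighter than the mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(images: List[Image], max_distance: int = 1) -> List[int]:
    """Greedily keep an image only if its fingerprint is far from every image kept so far.

    Returns the indices of the images to keep. max_distance is an assumed
    threshold; a real data set would need it tuned.
    """
    kept_indices: List[int] = []
    kept_hashes: List[int] = []
    for index, image in enumerate(images):
        fingerprint = average_hash(image)
        if all(hamming_distance(fingerprint, k) > max_distance for k in kept_hashes):
            kept_indices.append(index)
            kept_hashes.append(fingerprint)
    return kept_indices
```

With real photographs, each image would first be shrunk to a small grayscale grid (for example, with an imaging library such as Pillow) before hashing. The harder question the project raises, whether the model performs as well after pruning, would still have to be answered by retraining and comparing results.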
The size of the datasets can reach a terabyte (1,000 gigabytes) or more, requiring the computing power of several servers. An efficient, optimized data set means faster, less resource-intensive training, and easier sharing. Dr. Beck’s approach is more resourceful than the previous solution, which was to source more and more powerful computers.
“Instead of just throwing more data on it, let’s think about whether we can control what is the best data to throw at it,” he said.
Dr. Beck’s approach is also scalable and “data agnostic,” with applications outside of agriculture.
“Once you are able to detect these redundancies in your dataset, it will be hopefully generally applicable, no matter the size of your dataset,” he said.
However, the complexity of machine learning models makes it hard to know the overall impact of removing each duplicate image. Md Ashique Imran, a graduate student, is assisting Dr. Beck in devising a shortcut for distinguishing which images are beneficial, and which are not. Earlier this year, Imran received a one-year, $20,000 Master’s Studentship Award from Research Manitoba to work on dataset optimization using image processing.
Looking ahead
Computer programs that can differentiate between cash crops and weeds could one day revolutionize modern agriculture. Potential applications include replacing blanket herbicide spraying with targeted application, detecting common crop diseases like fusarium head blight early, and predicting yields by drone or satellite for insurance companies. One day, the technology could even bring about the return of polyculture.
“If we really get this solved, where we can go and identify these plants on an individual basis, we could have this little swarm of robots go through a field and tend to it on a plant-by-plant basis, then we don’t need these big monoculture fields anymore,” Dr. Beck said. “Then we can do polycropping, growing several things in the same field. Before the industrialization of agriculture, people were doing this all the time.”