Using the power of language analysis to support software development
Open-Source Software drives many of our technological advances. The vast majority of companies and organisations use it in some form. But developers can find it challenging to choose which software to use, with the benefits and drawbacks hidden amongst global noise and chatter.
Yannis Korkontzelos, Professor in Computer Science at Edge Hill University, is helping revolutionise how Open-Source Software (OSS) is analysed.
He is bringing insight and order to the noisy and often chaotic world of OSS development through new Natural Language Analysis (NLA) tools.
Analysing conversations between software developers and users on social media and in other forums can highlight OSS that is well supported, as well as software that may throw up problems.
This information can help developers make quick and confident decisions about the best software to use, improving services for customers or making companies’ internal processes more efficient.
The project – part of a European innovation programme – is paving the way for a better understanding of the complex and voluminous conversations within ‘big data.’
Bringing order to the chaos of Open-Source Software
Open-Source Software (OSS) is free-to-use software developed in communities of independent developers.
OSS has freely available source code that developers can improve or modify. This contrasts with proprietary software, such as that belonging to Apple or Microsoft, which is maintained and upgraded ‘in-house’.
Many commercial software developers use OSS components to reduce costs. This has both benefits and drawbacks.
The sheer number of global users of an OSS project offers some security. The quality of the projects can also be high thanks to incremental software improvements over many years.
On the downside, developers have to contend with a project having no dedicated user support or quality guarantee.
In addition, many pieces of software may be designed to do the same thing. It can be costly and time-consuming to identify them all and select the most suitable one manually.
CROSSMINER – a three-year EU Horizon 2020 project – is focused on overcoming these challenges, helping developers choose the most appropriate OSS components for any software project.
Professor Korkontzelos secured funding to ensure NLA tools were fundamental components of the CROSSMINER platform. They have since proved key to its success.
Unlocking the power of language
CROSSMINER’s NLA tools have brought together linguistics and computer science theory, research and application.
Existing language analysis programs tend to use basic methods to categorise patchy information. They don’t give developers the reliable output they need to make informed, confident decisions when selecting software.
However, the tools developed by Professor Korkontzelos and his team ‘dig deep’ into the language used in forums, email threads and social media between developers and users. They categorise it in many different ways and score the quality of communications.
The NLA tools are unique in their breadth and detail:
Code detector
This is a basic tool for the first step of language analysis. It separates out snippets of code users may have included within the text to show where they’ve identified errors or need help.
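As a rough illustration of this first step, the sketch below splits a message into prose lines and lines that look like code. It is not the CROSSMINER code detector: the `CODE_HINTS` pattern and the `split_code_from_text` function are hypothetical, and the heuristics (a line ending in `;`, `{` or `}`, a line starting with `import` or `def`, or a line that is a single call) are assumptions chosen only to make the idea concrete.

```python
import re

# Hypothetical heuristics, not the project's actual rules: a line counts as
# code if it ends with ';', '{' or '}', starts with 'import'/'def' followed
# by a name, or consists of a single call such as foo.bar(x).
CODE_HINTS = re.compile(
    r"[;{}]\s*$"
    r"|^\s*(import|def)\s+\w+"
    r"|^\s*[\w.]+\(.*\)\s*$"
)


def split_code_from_text(message):
    """Return (prose_lines, code_lines) for one forum or email message."""
    prose, code = [], []
    for line in message.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        (code if CODE_HINTS.search(line) else prose).append(line.strip())
    return prose, code


message = """Hi, I get a NullPointerException when I call the parser.
Parser p = new Parser();
p.parse(null);
Is this a known problem?"""

prose, code = split_code_from_text(message)
```

Here `code` holds the two Java-style lines and `prose` holds the surrounding question, so later language analysis can work on the prose alone. Simple rules like these will misclassify some lines, which is one reason a real detector would need something more robust.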
Content recogniser
This categorises messages into 35 high-level content types using sentence structure and keywords. For example, it can identify questions in developer communities, responses, ‘thank you’ notes, and follow-up comments or questions. These can then be statistically analysed.
For instance, developers will be more confident in choosing a software component where questions about it have been answered quickly and helpfully, rather than remaining unanswered or unresolved.
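The kind of statistical analysis described above could look something like the following sketch. It assumes messages have already been labelled with a content type. The `Message` class, the `"question"` and `"answer"` labels and the `response_stats` function are all hypothetical names introduced for illustration, not part of the CROSSMINER platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Message:
    thread_id: str
    kind: str            # hypothetical content type: "question", "answer", ...
    posted_at: datetime


def response_stats(messages):
    """Return (share of threads with an answered question, median hours to first answer)."""
    first_question, first_answer = {}, {}
    for m in sorted(messages, key=lambda m: m.posted_at):
        if m.kind == "question":
            first_question.setdefault(m.thread_id, m.posted_at)
        elif m.kind == "answer" and m.thread_id in first_question:
            first_answer.setdefault(m.thread_id, m.posted_at)
    if not first_question:
        return 0.0, None
    delays = [
        (first_answer[t] - first_question[t]).total_seconds() / 3600
        for t in first_answer
    ]
    share = len(first_answer) / len(first_question)
    return share, (median(delays) if delays else None)


t0 = datetime(2020, 1, 1, 9, 0)
sample = [
    Message("A", "question", t0),
    Message("A", "answer", t0 + timedelta(hours=2)),
    Message("B", "question", t0),
    Message("B", "answer", t0 + timedelta(hours=4)),
    Message("C", "question", t0),  # never answered
]
share, median_hours = response_stats(sample)
```

With this made-up sample, two of the three questions are answered and the median wait is three hours. A component whose community shows a high answered share and a short wait would, on the reasoning above, look like a safer choice.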
Severity classifier
This recognises that not all issues detected are equally important, so they shouldn’t all be an automatic red light for developers. At the extremes, it identifies issues likely to be ‘no gos’, such as incompatibility with critical systems, and those that may be minor inconveniences, such as errors in non-relevant features. There is a range of classes in between.
Sentiment analyser
This looks for the high-level ‘feeling’ of a communication – positive, neutral or negative.
Emotion detector
This tool detects emotions in text, using those outlined in Plutchik’s ‘Wheel of Emotions’. It can be a warning sign for developers if most users are frustrated or angry in discussions about a piece of software.
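To show how such a warning sign might be derived, the sketch below takes one emotion label per message and flags a project when anger dominates. It assumes labels drawn from the eight primary emotions on Plutchik’s wheel. The `anger_warning` function and the 50% threshold are hypothetical choices for illustration, not the project’s own rule.

```python
from collections import Counter

# The eight primary emotions on Plutchik's wheel.
PLUTCHIK_PRIMARY = {
    "joy", "trust", "fear", "surprise",
    "sadness", "disgust", "anger", "anticipation",
}


def anger_warning(labels, threshold=0.5):
    """Return True if more than `threshold` of the recognised labels are 'anger'."""
    counts = Counter(label for label in labels if label in PLUTCHIK_PRIMARY)
    total = sum(counts.values())
    if total == 0:
        return False  # nothing to judge from
    return counts["anger"] / total > threshold


# Made-up labels for messages about two hypothetical components.
troubled = ["anger", "anger", "joy"]
healthy = ["joy", "trust", "anger"]
```

Calling `anger_warning(troubled)` returns `True` because two of the three messages are angry, while `anger_warning(healthy)` returns `False`.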
From modelling to real life
Professor Korkontzelos and his team have been working alongside six leading industrial software development partners, continually testing and evaluating the NLA tools to ensure they address real-world issues in selecting OSS.
The companies, which range from those building and selling software to repositories and hosts for OSS, have more than five million users. They also serve major industrial clients, such as Allianz, Nokia and Ericsson.
The NLA tools have supported them to:
- Improve decision-making – by measuring the maturity of a project’s OSS community and the support it provides, developers can be confident that they’re going with an ‘active’, well-supported project rather than one beset with problems that aren’t readily resolved.
- Improve software quality – by constantly monitoring development, issues are easier to spot and can be fixed more quickly.
- Reduce manual effort and cost – software is automatically monitored for changes, and developers are alerted when they occur.
- Improve user satisfaction – evidencing the health and maturity of OSS communities can help developers offer new services to delight customers.
For example, the Eclipse Foundation, which hosts hundreds of OSS projects and helps people use and integrate them, has seen increased user satisfaction – 70% amongst those who use its dashboards. These portals now include the long-awaited sentiment analyser and emotion detector, helping boost the non-profit organisation’s competitive advantage.
Meanwhile, Bitergia, which provides quantitative analytics for software development, is also challenging its competitors. The NLA tools allow for live project ‘health checks’, and enable the company to make objective recommendations on OSS projects for clients.
Keeping up the momentum
Professor Korkontzelos is proud of what he and his team have achieved in the first-ever Horizon 2020 project for Edge Hill.
“We have successfully managed and designed ways to apply NLA theoretical tools to real-world problems. Bringing together elements that appear unrelated and difficult to combine has proved challenging. But it’s been so fulfilling to see the global impact of our research and its application. It’s making a real difference to users, and how businesses make decisions on software.”
The OSS market is projected to be worth more than $66bn by 2026, so bringing order and insight to it remains an ongoing concern for Professor Korkontzelos.
He is continuing his work with CROSSMINER, building on the research he started almost 10 years ago as part of the OSSMETER project. And he and his team are now seeking funding for further research to look at how NLA tools can support the analysis of largely untapped ‘big data’.
Importantly, Professor Korkontzelos’s efforts support the Open Source Initiative’s drive for ‘software equality’ – ensuring quality software can be accessed independently of big companies and remains free to use.
Our research means that:
- Companies and organisations can offer better services to their customers as software developers can quickly and efficiently find OSS that best meets their needs.
- Software quality can continually improve as issues are quicker to spot and put right.
- OSS is better understood, helping the campaign to keep it free to use and accessible for all.
Find out more about Professor Yannis Korkontzelos’s research by viewing his profile on Pure:
Professor Yannis Korkontzelos’ research