A Machine Learning model for Project Management

It's been a couple of years since I wrote about Project Management & AI and how it would impact the project management practice and other careers. As I researched for this post, I was disappointed: I expected that by now, in 2020, any experienced project management software company would have a predictive model in production, but no, there's nothing out there.

I'll help these companies with a vision of how a predictive model should work and what its benefits would be. But I can't help but wonder: why is there nothing out there? Luckily, I've found the answer.

What is Machine Learning? Is it different from Artificial Intelligence?

Machine Learning is considered a subset or a specialty within Artificial Intelligence, where we train an algorithm with a dataset, and, based on that information, the model can infer or predict results for specific questions.

What questions are we looking to answer?

"As much as possible" was at the top of my mind. However, I would be interested in predicting the likelihood of a project completing on time. We could also include quality, budget, and customer satisfaction, but let's start with on-time completion.

As a project manager, you're probably looking closely at the critical path to determine the likelihood of finishing on time. That approach resonates with a waterfall-driven project, but not with an Agile one. In an Agile project (to be specific, a Scrum project), whatever gets committed has to be delivered by the end of the sprint, which simplifies this exercise. The particular question I'll be looking to answer, then, is: what's the likelihood, or probability, of a user story being completed within the sprint? It's simple enough, and it will set the foundations for what's next.

The training dataset

For a machine learning model to work, it needs a training dataset; the bigger, the better, as a small dataset could lead to a biased model. This dataset is typically divided 80-10-10: 80% for training, 10% for validation, and 10% for testing.
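A minimal sketch of that 80-10-10 split, using only the standard library (the records here are placeholders for user stories):

```python
import random

def split_dataset(records, seed=42):
    """Shuffle and split records into 80% train, 10% validation, 10% test."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    train_end = int(n * 0.8)
    val_end = int(n * 0.9)
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]

# Pretend we have 1,000 historical user stories.
train, val, test = split_dataset(list(range(1000)))
print(len(train), len(val), len(test))  # 800 100 100
```

In practice you would hold the test slice back entirely and only touch it once, after tuning against the validation slice.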

To be specific, the training dataset would consist of thousands, if not millions, of user stories and tasks, with points, assignees, spill-over information, acceptance criteria, and comments.

Do you get the answer?

After the machine learning model gets trained on all the variables involved, I would expect to see a score from zero to one on my user stories and tasks before starting a sprint. Assuming the model runs in real time: if you assign a user story to Jack, you get a 0.8; if you give it to Liz, you get a 0.9. This score means that, based on what the model learned from stories with similar criteria and points, Jack would deliver a little less within the sprint than Liz.
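To make the idea concrete, here is a deliberately naive sketch of such a score: instead of a trained model, it just computes the fraction of similar past stories (same assignee, same points) that finished within the sprint. The history data is invented for illustration.

```python
# Hypothetical history: (assignee, story_points, completed_within_sprint)
history = [
    ("jack", 5, True), ("jack", 5, False), ("jack", 8, False),
    ("liz", 5, True),  ("liz", 8, True),  ("liz", 5, True),
    ("jack", 3, True), ("liz", 3, True),  ("jack", 5, True),
]

def completion_score(history, assignee, points):
    """Naive zero-to-one estimate: share of similar past stories
    this assignee completed within the sprint."""
    similar = [done for who, pts, done in history
               if who == assignee and pts == points]
    if not similar:
        return None  # no comparable history to learn from
    return sum(similar) / len(similar)

print(completion_score(history, "jack", 5))  # 2 of 3 on time
print(completion_score(history, "liz", 5))   # 2 of 2 on time
```

A real model would generalize across points, acceptance criteria, comments, and spill-over instead of requiring exact matches, but the output shape is the same: a probability per story, per candidate assignee.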

Thinking further into this, the likelihood of a user story being completed on time will not depend on the nature of the task itself, but rather on who executes the job. That means the model would need to look into each individual's performance as well, and this is where the plot thickens: if an individual just joined the team, I would not expect top performance right from the beginning, as there's an onboarding and adaptation period. In this case, the model should not count this onboarding time as part of the training dataset; if it did, it would only contaminate a future analysis with biased information from an inexperienced individual.

Imagine a self-balancing resource allocation routine based on each score.
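That routine could start as simply as this: given the model's hypothetical per-assignee scores for each story (the story keys and numbers below are made up), greedily hand each story to whoever scores highest on it.

```python
# Hypothetical predicted scores per story, per candidate assignee,
# as the model above might emit them.
scores = {
    "PROJ-101": {"jack": 0.8, "liz": 0.9},
    "PROJ-102": {"jack": 0.7, "liz": 0.6},
}

def assign_stories(scores):
    """Greedy allocation: each story goes to the highest-scoring candidate."""
    return {story: max(candidates, key=candidates.get)
            for story, candidates in scores.items()}

print(assign_stories(scores))  # {'PROJ-101': 'liz', 'PROJ-102': 'jack'}
```

A production version would also have to balance workload; a pure greedy pass could pile every story onto the single strongest performer.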

What's keeping an ML model from going into production?

As I concentrate on writing this, I realize the hold-up for an ML model like this: the training dataset. It would require tens of thousands, if not millions, of user stories and tasks from the same project to build a model. Try to remember your most prominent project: how many user stories and tasks did you have by the end? Ten thousand? Twenty thousand? More? In terms of a training dataset for an ML model, that's a small number.

Another hold-up: I would need the ML model while the project is in progress, not at the end when I'm doing retrospectives or lessons learned, and no project is born with thousands of completed stories and tasks. Well, what about mature corporate projects? Or construction companies that have built dozens of bridges? That would be a dataset worth studying.

Wouldn't this model work?

Discussing this machine learning model with a fellow project manager, he suddenly came up with a significant reason why this wouldn't work. Can you imagine why? The Agile spirit itself! If a team has an agile mentality and works with a Scrum methodology, ideally, when they see a task falling behind for whatever reason, they help whoever needs it and deliver the task. This simple fact, collaboration, would render any performance theory invalid: the task gets completed on time, crediting whoever was falling behind with a perfect score for delivered tasks, when it was the team as a whole that achieved the sprint.

What a bummer: Agile killed the machine. Unless we could somehow track this lack of performance. Would you, as a project manager or scrum master, track it so a machine learning model could predict delays or quality issues?

Conclusions

We reviewed what questions a machine learning model for project management would answer, the limitations of the training dataset for such a model, and how humans themselves would render the model useless.

If you have all the proper conditions for a machine learning model to assist you, will you train it?

I'll keep thinking and writing,

Christian

Credits: Image by Franck V. on Unsplash