I learned a lot while building and leading the engineering team at Predikto. Aligning and integrating data science into the software development process was relatively new in the industry (because, “AI”). I believe the team got this “right” after a lot of hard work and wrong turns over the years. I think we had finally gotten our way of operating and aligning right when it led to a great product and an even greater exit.
The good news is that other people can benefit from my experiences (good and bad), the lessons learned, and now this highly opinionated article. ;-)
Here are 6 important lessons I learned when aligning data science with software development:
“The man without a purpose is like a ship without a rudder” - Thomas Carlyle
A data science team formed to work on skunk works projects and provide “insights” is a cost-center barge floating in the ocean with no rudder, and likely a horrible distraction.
A lack of defined and stated goals will lead to failure, where failure means providing little to no value to the company's core offering.
In a sense, the goals of a data science team are no different from those of any other team within a startup: stay focused.
I’ll preface this section by stating the assumption that Engineering and Product are peers, i.e., the CTO does not report to the Chief Product Officer, or vice versa.
This may be controversial for some, but I have seen mixed success having data science report into product management.
These are two entirely different disciplines with little-to-nothing in common in how they work day-to-day or even how they communicate.
Instead, have Data Science act as a sister org to Engineering, with both reporting up to the office of the CTO as peers. Data Science should generally be allowed to grow into a research and innovation arm for the company, and being tethered to engineering facilitates this.
So now you have data science and engineering as peers. How should data science take part in the SDLC?
Isolating data science on skunk works projects, without the kind of direct voice you would offer a customer, does not work.
Data science acts as a stakeholder for data-science-related features and also helps QA and product management bless work as “done”. Treating data science as a customer at times ensures that they have a voice in the product and that innovative features make it into the offering.
I was fortunate at Predikto to have a great developer within the engineering team who understood how to speak “data science” and could translate the data scientists' feature requests into actual code.
Blocking communication between engineering and data science, or losing data science feature requests in translation, will be detrimental to the product. Your data scientists and engineers speak completely different languages and work in completely different modes, so requests are easily lost between them.
Find a developer who understands how to implement what the data scientists are asking for and appoint them as a liaison. Part of their job is to sit in on the data science morning stand-up and planning meetings. This solves two potential issues:
Data science work is lumpy, with periods of short and quick iterations and other periods of long and deep research. So should data science use Kanban, Scrum, or … ?
Letting data science operate without a work methodology or timelines does not work.
Here are some things I’ve found helpful:
Efficient poking and prodding of data requires custom scripts and lots of local work.
Expecting data science to work strictly with cloud-based software managed by your DevOps team does not work.
Data science requires loading/ETLing data, understanding/analyzing data, building models, and reading output. Cloud-hosted offerings can get you part of the way there, depending on the volume of data and the customization required to make sense of it. Some of the lessons we learned at Predikto that I think really helped the data science team may seem foreign to some, but I feel they let the team excel at delivering: