Who is Goldfinch Bio?
Goldfinch Bio was doing large-scale genomic analysis using Hail, which runs on Apache Spark and Amazon EMR. The company had developed a solution on an early version of Hail that proved to be very brittle: it did not run reliably and took a very long time to bootstrap. Because launching new clusters was so complex and time-consuming, Goldfinch was hesitant to destroy them. Existing clusters ran for a very long time, causing the company to fall behind on new Hail releases and miss out on features it needed. Goldfinch would spend hours, or even days, getting the solution to work properly, costing the company time and money.
Navisite built a new automated Amazon Machine Image (AMI) pipeline using HashiCorp's Packer. This pipeline allowed Goldfinch to automate the installation of Hail along with all of its dependencies. The resulting AMI was made available as a public resource, so the company can launch a new cluster at any time without having to worry about Hail breaking.
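As a rough illustration of what launching a cluster from a prebuilt AMI can look like, the sketch below assembles the arguments for the Amazon EMR RunJobFlow API in Python. It is not Navisite's or Goldfinch's actual tooling. The function name build_cluster_request, the cluster name, the release label, the instance types and counts, and the AMI ID are all hypothetical placeholders; the role names are the EMR defaults and may differ in a given account.

```python
from typing import Any, Dict


def build_cluster_request(
    ami_id: str,
    core_instance_type: str = "r5.4xlarge",  # placeholder; size to the compute job
    core_instance_count: int = 4,            # placeholder; size to the compute job
) -> Dict[str, Any]:
    """Build keyword arguments for an EMR RunJobFlow request.

    The AMI is assumed to already contain Hail and its dependencies,
    so no bootstrap action is needed to install them at launch time.
    """
    return {
        "Name": "hail-cluster",        # hypothetical cluster name
        "ReleaseLabel": "emr-6.9.0",   # placeholder EMR release
        "CustomAmiId": ami_id,         # prebuilt AMI with Hail installed
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",  # placeholder
                    "InstanceCount": 1,
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "InstanceType": core_instance_type,
                    "InstanceCount": core_instance_count,
                },
            ],
            # Keep the cluster up for interactive work instead of
            # terminating once there are no steps left to run.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # EMR default instance profile
        "ServiceRole": "EMR_DefaultRole",      # EMR default service role
    }
```

With boto3 installed and AWS credentials configured, the request could be submitted with boto3.client("emr").run_job_flow(**build_cluster_request("ami-...")). Changing the instance type and count per call is one way to get clusters tailored to each job.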
Prior to adopting this new solution, Goldfinch would lose code, work and, most importantly, time whenever clusters were torn down. The new solution separates concerns, so clusters now run independently of the notebooks.
Existing clusters took over 45 minutes to bootstrap. With the new solution, clusters launch in about seven to eight minutes. Thanks to this added agility, Goldfinch is now much more willing to change cluster configurations and launch clusters tailored to the compute job.
With Navisite responsible for maintaining the Hail AMI, Goldfinch can focus its time and effort on creating clusters tailored to its data scientists, without having to worry about the complexity of building and maintaining Hail.