Lessons From Album Tags: Part 3

In early spring of 2019, as my Album Tags project turned one year old, I began a significant lift for the site that I had thus far been putting off: the migration to a relational database. Readers of earlier posts in this “series” may recall that I have toyed with this idea ever since the third version of the site’s MongoDB database. I had tried a few dry runs of cloud SQL databases in AWS and GCP, and was intrigued by the ways this specific app might benefit from the feature development velocity and additional reliability a relational database could provide. The new data-layer of Album Tags was deployed to production earlier this month, and I wanted to share some of what I learned by looking back on this next chapter in the Album Tags story.

Same Data, Great New Look!

My favorite part of rewriting the Album Tags data-layer again was getting the chance to look at the data the app uses from a new angle. I did not want to rework the user interface entirely, so I needed to balance the needs of the existing front end against the different ways data is stored in a Google Cloud SQL MySQL database. This meant modeling some fairly complex relationships, such as the many-to-many, self-referencing relationship of album connections, or the has-all-in-array behavior of tag searches. I discovered I had thus far either intuited my way around these roadblocks, or completely sidestepped them thanks to the document-centric workflow of MongoDB. This was also, in some ways, the first time I had been forced to sit down and think critically about how all the data in Album Tags relates as a whole, rather than just in the context of a few features at a time. I appreciated having the space and opportunity to make these design decisions before leaping into writing new code, and I believe the future flexibility of the app will prove it was worth the extra effort.
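To make those two relationships a little more concrete, here is a minimal in-memory sketch of how they behave once modeled with join tables. The table and column names here are my illustrative assumptions, not the actual Album Tags schema; in MySQL the tag search would translate to a `GROUP BY ... HAVING COUNT` query, and the connection lookup would check both columns of the self-referencing join table.

```javascript
// Join table for tags: one row per (album, tag) pair.
// Table/column names are hypothetical, for illustration only.
const albumTags = [
  { albumId: 1, tag: 'favorites' },
  { albumId: 1, tag: 'jazz' },
  { albumId: 2, tag: 'jazz' },
];

// Self-referencing join table for album connections. Each connection is
// stored once; a symmetric lookup has to check both columns.
const connections = [
  { albumId: 1, connectedAlbumId: 2 },
];

// "Has all tags" search: count matching rows per album and keep only the
// albums that matched every requested tag, mirroring SQL's
// GROUP BY albumId HAVING COUNT(DISTINCT tag) = <number of tags>.
function albumsWithAllTags(tags) {
  const counts = new Map();
  for (const row of albumTags) {
    if (tags.includes(row.tag)) {
      counts.set(row.albumId, (counts.get(row.albumId) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .filter(([, count]) => count === tags.length)
    .map(([albumId]) => albumId);
}

// Symmetric connection lookup: an album may sit on either side of a row.
function connectionsFor(albumId) {
  return connections
    .filter((c) => c.albumId === albumId || c.connectedAlbumId === albumId)
    .map((c) => (c.albumId === albumId ? c.connectedAlbumId : c.albumId));
}
```

The `HAVING COUNT` pattern is what replaces MongoDB's has-all-in-array query: instead of asking one document whether its array contains every tag, you ask the join table for albums with the right number of matching rows.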

New Digs Means New Tools

No matter how fun it is to rewrite the data-layer for a new database paradigm, there is always the inevitable to-do item titled something like “move data to new database.” Album Tags has grown enough in the last year that a fully manual data migration is no longer feasible. To help with this task, I constructed a Node.js script that would gather all the existing Album Tags data, manipulate the data structure, then insert the data into the new database. For the insertion process, I chose to use the new Album Tags endpoints created as part of the data-layer rewrite. This served two purposes: I could complete my migration in a manner that would ensure the data matched the format of the data inserted manually by users on the site, and I could verify there were no weird edge-case bugs in my new data-layer logic. The script was essentially the ultimate acceptance test, and, as I had hoped, it helped me identify a few areas for improvement in my endpoints along the way. As a result, I had a reliable way to move data from one place to another, and I was able to use the script during the production cut-over to make sure that all user data added to the old database up to the last minute would be present in the new one.
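The heart of a script like this is the transform step that flattens each document into rows for the relational tables. The sketch below shows that step under an assumed document shape; the real Album Tags field names and endpoints aren't shown in this post, so everything here is a hypothetical stand-in.

```javascript
// Hypothetical example: flatten a MongoDB-style album document into rows
// destined for relational tables. Field names are assumptions, not the
// actual Album Tags schema.
function documentToRows(doc) {
  const albumRow = {
    appleAlbumId: doc.appleAlbumId,
    title: doc.title,
    artist: doc.artist,
  };
  // One row per tag, rather than an array on the album document.
  const tagRows = (doc.tags || []).map((t) => ({
    appleAlbumId: doc.appleAlbumId,
    tag: t.tag,
    creator: t.creator,
  }));
  // One row per connection, for the self-referencing join table.
  const connectionRows = (doc.connectedAlbums || []).map((id) => ({
    albumId: doc.appleAlbumId,
    connectedAlbumId: id,
  }));
  return { albumRow, tagRows, connectionRows };
}
```

In the migration script, each set of rows would then be sent through the new HTTP endpoints rather than inserted into MySQL directly, so the migrated data passes through exactly the same validation as data entered by users on the site.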

In the MongoDB version of the Album Tags code, I had included several utility endpoints that would allow me to perform basic database maintenance. This ranged from identifying malformed documents to finding duplicate or orphaned records. After my experience writing the data-migration tool, I decided to construct the replacement database maintenance tools as separate scripts rather than integrating them into the site. This allowed for a little more separation between the codebase and my database management work, and also gave me the same opportunity to interact with my new data-layer endpoints from a different angle.

So far, I have been able to fulfill all my maintenance requirements with two new console apps. I first created another Node.js script to loop through the new data and identify records that were no longer up to date with the information hosted on the Apple Music API. I then created a Ruby script that checks for duplicate records (using an artist and album name keyword algorithm). Along the way, it also helps identify orphaned rows in the Albums table. The rest of the tables use a cascading delete to prevent orphaned rows, but because of the ongoing time cost of keeping Album records up to date, I wanted an easy way to make sure there are not any extra records stored but not being used. Both of these scripts have proved very helpful, and have aided in identifying future improvements I can make to the data-layer endpoints as well!
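The duplicate check described above could be sketched like this. The actual script is written in Ruby and its exact normalization rules aren't shown in this post, so this JavaScript version is only an assumed illustration of the idea: reduce each artist and album name to a canonical keyword key so that records differing only in case, punctuation, or word order collide on the same key.

```javascript
// Hypothetical keyword key: lowercase, strip punctuation, sort the words.
// The real algorithm in the Ruby script may differ.
function duplicateKey(artist, album) {
  return `${artist} ${album}`
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')
    .split(/\s+/)
    .filter(Boolean)
    .sort()
    .join(' ');
}

// Group album records by key and report any key with more than one record.
function findDuplicates(albums) {
  const byKey = new Map();
  for (const a of albums) {
    const key = duplicateKey(a.artist, a.title);
    byKey.set(key, (byKey.get(key) || []).concat(a.id));
  }
  return [...byKey.values()].filter((ids) => ids.length > 1);
}
```

A similar grouping pass over the Albums table against the tag and connection join tables is enough to flag orphaned album rows, since an album row with no matching rows anywhere else is no longer being used.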

In the early days of Album Tags, I chose a MongoDB database to store my production data mainly out of an interest in learning a different database technology. Over the last year of development, I have had the opportunity to explore many of the strengths and weaknesses that accompany NoSQL databases. For some applications, the document data structure works well and enables developers to quickly deliver the kind of user experience they are looking for. In the end, however, a relational database turned out to be a better fit for the next chapter of Album Tags. It will enable me to quickly add and improve features while simultaneously finding new ways to deliver more meaning to Album Tags users from the data they contribute to the site. Of course, a new database will always come with new challenges, but I am excited for the learning opportunities this will provide in the season ahead!

If you’re interested in learning more about Album Tags, check out the live site here or the current production code here. I am always open to feedback and happy to meet more developers and music fans, so feel free to reach out to me directly at joshua@hunschejones.com, or simply comment on this post. Thanks for reading!