Research data management (RDM) instruction belongs in engineering education because both scholars and practitioners depend on well-managed data. The scholar engineer needs to know how to manage data because they will have to share it with others if their project is funded by the federal government. The practitioner engineer needs to know how to manage data because it is central to his or her work. I used to be a civil engineer, so I know the consequences of mismanaged data firsthand. Let’s talk through some of those consequences.
- Data get corrupted. This happened on at least one occasion when I was translating data from one file format to another. I don’t remember the two formats, though I’m sure one was Excel. I did not know the benefits of databases at the time, so spreadsheets were my main mode of data storage. The main problem with my data management in this case was the lack of appropriate backups. The only backups I had were the ones the IT department made every evening. One can make a lot of changes to a file in one day, so the data that I could get back from the tapes were pretty old. I just had to recreate what I had done to them that day. It was not catastrophic, but it was very annoying.
- Data out of synchronization with other entries. This happened when I converted data from a handheld device’s database format to Excel and then did some analysis that required sorting the data in different ways. In Excel, one can sort one column of data out of sync with the others. This leads to a major problem when one column’s data no longer line up with the other columns. In this case, I also had to go back to the original file (the database) and reconvert it. If I had just created a database in Access, the sorting problem would not have happened, because a database sorts whole records and keeps the fields in each row together.
- Inability to understand data. This happened more times than I can remember. On countless occasions, I would have to go back to a project that had been on the back burner for some time (and it didn’t have to be much time before I forgot), and I would not be able to understand what I had done to the data before or what the variables meant. If I had created a metadata record for the data and logged detailed information about what each record was, how it was created, etc., then I would have been able to understand it. At the time, however, I didn’t know anything about metadata. In fact, I had never heard of the word until I took IS 520 while getting my master’s degree. This is why I feel so strongly about educating practicing engineers in RDM, and especially metadata.
- Incomplete data sets. On one particular project, I remember looking at the data that were collected in the field and seeing holes all over the place. I did not know how to handle the holes or accommodate the missing data. I now know to fill in missing data with a flag value such as -9999 so it’s obvious that a datum is missing.
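
To make a few of these habits concrete, here is a minimal Python sketch. It is an illustration rather than anything from the projects described above: the station names, the `depth_m` field, and the helper functions are all hypothetical. It shows three things: sorting whole records so fields cannot drift out of sync, flagging holes with the -9999 value mentioned above, and keeping a small metadata record that says what each variable means.

```python
import json

MISSING = -9999  # flag value from the text; assumed never to be a real reading

# Hypothetical field readings; None marks a hole in the collected data.
records = [
    {"station": "B", "depth_m": 2.5},
    {"station": "A", "depth_m": None},
    {"station": "C", "depth_m": 1.0},
]


def fill_missing(rows, sentinel=MISSING):
    """Return copies of the rows with None replaced by the sentinel."""
    return [
        {key: (sentinel if value is None else value) for key, value in row.items()}
        for row in rows
    ]


def sort_records(rows, key):
    """Sort whole records, so every field moves together with its row."""
    return sorted(rows, key=lambda row: row[key])


def mean_ignoring_missing(rows, field, sentinel=MISSING):
    """Average a field while skipping flagged values; None if nothing is left."""
    values = [row[field] for row in rows if row[field] != sentinel]
    return sum(values) / len(values) if values else None


filled = fill_missing(records)
by_station = sort_records(filled, "station")
mean_depth = mean_ignoring_missing(filled, "depth_m")

# A tiny metadata record documenting the variable and its missing-value flag.
metadata = {
    "depth_m": {
        "description": "hypothetical depth reading",
        "units": "metres",
        "missing_value": MISSING,
    }
}
metadata_json = json.dumps(metadata, indent=2)
```

One caution with flag values: they have to be excluded before any analysis, as `mean_ignoring_missing` does here, or a -9999 will quietly distort an average. Recording the flag in the metadata is what lets a later reader know to do that.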
I’m sure there are many more examples of my data mismanagement, but those are all I can think of at this time. I am working on a project proposal, which I hope to complete, in which I will study the RDM education that practicing engineers receive here at UT and use the results to figure out the best way to strengthen it. RDM skills need to be taught within the curriculum, but I also want to figure out a way to offer training to engineers who are already out in the working world. They’ve already gone through the program at UT, so adding the skills to the curriculum will help only future students, not them. We need a program for engineers in the workplace. That’s another research project for another day, but I am sure I will get there eventually.