A lot of people have strong opinions about what is and isn’t data. You may have encountered them: they are often quite happy to expound upon the virtues of the data that they work with, before explaining to you that other things that people call data are not data, or at least, not “real” or “good” or “valid” data.
If you want to see this in action, ask an engineer if ethnographic field notes are data. Or, ask an ethnographer if a spreadsheet of quantified happiness measures is data. (Even better, do this while the engineer and the ethnographer are in the same room together. Don’t forget to place bets on the outcome.)
As is often the case in situations where people have strong opinions, everyone is both a little right, and a little wrong. You see, friends, there is no distinction between data and not data. It’s all data, all the way down.
Ethnographic field notes? Data.
The instantaneous velocity of a swallow laden with a coconut? Data.
The tax returns of every American for the last five years? That’s data too.
Voicemail your grandma left you last Wednesday? Data.
Picture of a cat with a caption? Also data.
Search terms you use to find either of those things? Yep, you guessed it–also data.
It gets more meta than that, too:
The image of the world that your retinas pick up? Data.
The sensation your fingertips gather as you touch things? Data.
Your thoughts at this very instant? Totally data.
Any information we collect about the universe around us is data. The velocity of a swallow laden with a coconut describes the speed of a swallow in flight. Field notes capture information about people and places, interactions and culture. Your retina captures information about the wavelengths of light being reflected off of your surroundings.
But categorization is important and helpful, because it tells us what kinds of things are good for certain kinds of tasks. Obviously, there is something that distinguishes these different kinds of data. Since we have discarded the data/not data categorization, let me introduce another: structured versus unstructured data.
Unstructured data describes most of the data that exists in the world. It is chaotic: it does not conform to an underlying organization. Think of it like a heap of silverware at a flea market. All of the different kinds of silverware are mixed in there. If I asked you to grab all of the forks, it would take you a long time. If I asked you to find a particular fork, it would take you even longer. That’s the problem with unstructured data–it’s really hard to find what you want in it. That’s why most analysis techniques require structured data.
Structured data does conform to an underlying organization. It’s like silverware in a silverware organizer. The knives are all in one cubby, the forks in another, the spoons in a third. Maybe there are cubbies for soup spoons and serving spoons and salad forks, or maybe you have decided that you don’t care about those distinctions and just lump those with generic spoons and forks (that decision is important, and we’ll talk about why in the next post). Either way, if I told you to grab all of the forks, you could do that easily. If I asked for a particular fork, that’d take a little longer, but not too much, since you’d only have to sort through the forks. That is the advantage of structured data.
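If it helps to see the silverware analogy in code, here is a minimal Python sketch. Everything in it is made up for illustration: the items in `heap`, the three kinds in `KINDS`, and the helper names `kind_of`, `forks_from_heap`, and `organize`. The heap is a plain list with no labels, so finding the forks means examining every item; the drawer is the same silverware sorted into one cubby per kind.

```python
from collections import defaultdict

# A hypothetical heap of silverware: unlabeled items in no particular order.
heap = ["soup spoon", "salad fork", "butter knife", "dinner fork", "teaspoon"]

# The cubbies we have decided to care about (an assumption for this sketch).
KINDS = ("fork", "knife", "spoon")

def kind_of(item):
    """Examine a single item and decide which kind of silverware it is."""
    for kind in KINDS:
        if item.endswith(kind):
            return kind
    return "other"

def forks_from_heap(items):
    """Unstructured case: every item has to be examined to find the forks."""
    return [item for item in items if kind_of(item) == "fork"]

def organize(items):
    """Sort the heap into cubbies, one list per kind of silverware."""
    drawer = defaultdict(list)
    for item in items:
        drawer[kind_of(item)].append(item)
    return dict(drawer)

drawer = organize(heap)

print(forks_from_heap(heap))  # examines the whole heap
print(drawer["fork"])         # goes straight to the fork cubby
```

Both lookups return the same forks. The difference is that the sorting work is done once in `organize`, after which grabbing all of the forks no longer depends on how big the rest of the drawer is. Note that `KINDS` lumps soup spoons and teaspoons together as generic spoons, which is exactly the kind of decision mentioned above.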
Structured data is often conflated with quantitative data, but not all structured data is quantitative, and not all quantitative data is structured. A table showing people’s names, nationalities, and occupations is structured data–but it’s not quantitative. Likewise, a continuous stream of sensor readings is quantitative, but it is not necessarily structured*.
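The two cases can be sketched in Python as well. The people, the readings, and the assumption that each reading is a temperature in Celsius are all invented for the example; this is one way to picture the distinction, not a definition of it.

```python
# Structured but not quantitative: every record has the same named fields,
# and none of the values are numbers. These people are invented.
people = [
    {"name": "Ana", "nationality": "Brazilian", "occupation": "engineer"},
    {"name": "Kofi", "nationality": "Ghanaian", "occupation": "ethnographer"},
    {"name": "Mei", "nationality": "Chinese", "occupation": "engineer"},
]

# Because the organization is known, finding all of the engineers is a simple filter.
engineers = [p["name"] for p in people if p["occupation"] == "engineer"]

# Quantitative but not yet structured: a run of numbers with no field names,
# units, or timestamps attached.
raw_stream = "21.5 21.7 22.1 21.9"

# Imposing structure means deciding what each value is. Here we assume each one
# is a temperature in Celsius, recorded in order.
readings = [
    {"index": i, "celsius": float(value)}
    for i, value in enumerate(raw_stream.split())
]

print(engineers)
print(readings)
```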
We have a cultural bias towards structured data, probably because of its increased utility. It’s easier and faster to store, search, and manipulate. However, it would be unwise of us to dismiss unstructured data as less real, less good, or less valid than structured data. There is a wealth of information hidden in the vast sea of unstructured data surrounding us. Ethnographers harness the best pattern-matching algorithms we know of (their brains) to organize unstructured data. Those same skills can be used to transform unstructured data into structured data, so that we can use it for visualization, data mining, and other kinds of analyses. We’ll talk about that process in the next post in this series. In the meantime, if you hear someone insisting that your data isn’t “real” data, be sure to tell them that it’s all data, all the way down.
*However, by virtue of the way quantitative data is collected, it is often structured.
The silverware drawer analogy was graciously provided to me by my friend and colleague, Jason Foss.