Saturday, August 9, 2008

Back to School, Then and Now

I have 3 children, who this year are entering 6th grade, 4th grade, and Kindergarten. It's been interesting to watch my children grow up and work their way through the school system, partly for the deja vu, and partly because it's fascinating to observe what has--and hasn't--changed from when I was that age.

First off, there aren't any buses. You drive your kids to school, or they walk. This wasn't a big deal last year, with the elementary school a convenient quarter mile away, but this year we'll be driving my daughter to the middle school, which is 11 miles away.

There don't seem to be any lockers either. I thought that would mean having a lot of stuff to carry, but apparently in the middle school kids get two (2) sets of books--one copy stays in class, the other stays home.

Also, no one's gotten beat up. Yet.

As for school supplies, some things are the same as in the past: elementary kids need items like pencils, erasers, scissors, and crayons. But in middle school these days, a student must have a USB flash drive. The language arts teacher requires students to have access to PowerPoint. And one of the core subjects, right up there with reading, writing, and arithmetic, is Technology. "You want to go to her school, don't you?" my wife said knowingly. I do!

School starts here in 4 days, and my kids have a look on their faces reminiscent of convicts facing the end of their time on death row. Some things never change after all.

Data Book: Data, Information, Knowledge, and Wisdom

Data, Information, Knowledge, and Wisdom
In the computer field, we sometimes use data and related terms like information or knowledge loosely, as if they were interchangeable synonyms. Yes, the terms are related; but no, they don't mean the same thing at all. You'll get the most out of this book if we clear this up right here and now before going any further.

Data
Let's start with data. Data is nothing more than a series of symbols, most often held as bits of electronic storage. That storage can live in a few different places. It can live in the memory of a computer, where a running program is consuming it, producing it, or simply holding on to it. Data can also live in persistent storage, such as a disk drive, DVD, or backup tape. It can be represented in messages flowing over a network connection from one computer to another. Lastly, data can be represented in non-electronic form, such as a printed page, a bar code label, or an engraving in marble. These examples show that data has a lifetime, which can range from fleeting microseconds to centuries.

A central thing to understand about data is that there is nothing, absolutely nothing at all, in those symbols or bits of computer storage that has any intrinsic meaning. A series of bits in computer storage is much like a code or cipher: the creator of the data and the consumer of the data must share the same expectations about how to interpret it in order for it to convey any meaning. Data, then, has no meaning in and of itself, but it carries potential meaning that can be derived if it comes into the hands of a recipient with the right expectations.
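To make this concrete, here is a small Python sketch (an illustration, not drawn from any particular system). The same four bytes yield an integer, a floating-point number, or text, depending entirely on what the reader expects them to be. The byte values are arbitrary choices for the example.

```python
import struct

# Four bytes with no inherent meaning of their own.
raw = bytes([0x42, 0x48, 0x00, 0x00])

# The same bits, read under three different sets of expectations.
as_int = struct.unpack(">I", raw)[0]    # big-endian unsigned 32-bit integer
as_float = struct.unpack(">f", raw)[0]  # big-endian IEEE 754 single-precision float
as_text = raw[:2].decode("ascii")       # first two bytes read as ASCII characters

print(as_int)    # 1112014848
print(as_float)  # 50.0
print(as_text)   # BH
```

None of these readings is "correct" on its own; each is only as meaningful as the shared expectation behind it.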

If data itself has no intrinsic meaning, the best way to think of it is as a vehicle to convey information from one entity to another. Like speech, data is just a way for a sender to encode information. Like writing, persistent data is a way for a sender to encode information for longer periods of time and for multiple recipients.

Information
Notice the word information has crept into the above paragraph as we start to talk about how data is used. If two or more human beings want to exchange information, and they lack telepathy, they are forced to use a means of encoding information as data that both parties are familiar with. While we have multiple means of doing this (speech, writing, sign language, e-mail), all serve the same purpose: turning information into a data representation using some encoding method, and transmitting that data to a recipient who reverses the process, gleaning information from data.
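Here is a minimal sketch of that encode, transmit, and decode round trip. It assumes a hypothetical agreed-upon layout for a weather reading: a 16-bit station ID followed by a 32-bit floating-point temperature. The format string and function names are made up for illustration. What matters is that both sides use the same one.

```python
import struct

# Hypothetical shared format: unsigned 16-bit station ID, then a 32-bit float
# temperature in degrees Celsius, little-endian with no padding.
FORMAT = "<Hf"

def encode_reading(station_id: int, celsius: float) -> bytes:
    """Sender: turn information (a reading) into data (bytes)."""
    return struct.pack(FORMAT, station_id, celsius)

def decode_reading(data: bytes) -> tuple[int, float]:
    """Recipient: reverse the process to glean information from data."""
    station_id, celsius = struct.unpack(FORMAT, data)
    return station_id, celsius

message = encode_reading(7, 21.5)
print(len(message))             # 6
print(decode_reading(message))  # (7, 21.5)
```

If the recipient assumed a different format (say, big-endian, or the fields in the other order), the same six bytes would decode to nonsense.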

But what, exactly, is information? If we're just thinking about human beings, it can be easy to jump the gun and start talking about all the things that go on in a person's mind. Once we remember that the producers and consumers of data can also be machines, we need to be more careful. If data is a vehicle for sharing something, and the entities that send and receive data may or may not be people, what is it that is being shared? Facts or claims, which is what we mean by information. Information includes things like the current outside temperature, the content of a novel, the code for a software program, and content such as images, audio, or video.

We've already established that the producers and consumers of data both need to understand its encoding scheme in order to share information successfully. However, understanding the encoding scheme is not sufficient in and of itself. Three major factors affect one's ability to turn data into information: awareness, context, and precision.

Awareness refers to whether or not the intended recipient of the data becomes aware that the data is available. An important message on your answering machine will never come to your attention if you never check your messages.

Context matters because understanding the encoding scheme for data is not necessarily all you need to reconstruct the information. While some data does provide all of the contextual information you need, that is not always the case. You might know enough to confidently decode some data as a series of numbers, but is 4-8-15-16-23-48 a bank account number in Switzerland or the winning numbers for a lottery? The context you need to attach meaning to data may require out-of-band information, such as knowing who delivered the data or when it arrived.
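Here is a small sketch of context traveling alongside data. The `Envelope` type and its `kind` field are hypothetical. They stand in for whatever out-of-band information (who sent it, what it is for) a real system might supply. Decoding the payload recovers the same numbers either way; only the context says what they mean.

```python
from dataclasses import dataclass

@dataclass
class Envelope:
    """Hypothetical wrapper pairing a payload with context about it."""
    sender: str
    kind: str     # out-of-band hint telling the recipient what the numbers are
    payload: str

def decode_numbers(payload: str) -> list[int]:
    # Knowing the encoding recovers the numbers, but not their meaning.
    return [int(part) for part in payload.split("-")]

def interpret(env: Envelope) -> str:
    numbers = decode_numbers(env.payload)
    if env.kind == "lottery":
        return f"Winning lottery numbers from {env.sender}: {numbers}"
    if env.kind == "account":
        return f"Account number from {env.sender}: {''.join(map(str, numbers))}"
    return f"Numbers of unknown significance: {numbers}"

payload = "4-8-15-16-23-48"
print(interpret(Envelope("a lottery office", "lottery", payload)))
print(interpret(Envelope("a Swiss bank", "account", payload)))
```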

Precision refers to how accurately information survives being encoded into data and decoded back out. A notation such as the English language can be rife with ambiguity: does "the seal" refer to a stamp or an animal? The rest of the data and its known context may or may not make that clear. Precision can also refer to the amount of detail. One could render the mathematical constant pi as a word or symbol, or as a decimal number. The decimal number would of course be imprecise, since pi's decimal expansion never ends; any written-out version is an approximation.
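Python's standard `math` and `decimal` modules give a quick illustration of the precision point. `math.pi` is a 64-bit floating-point value: the closest such value to pi, not pi itself. Converting it to `Decimal` exposes the exact value actually stored in those bits.

```python
import math
from decimal import Decimal

# math.pi is the nearest 64-bit float to pi, shown with its shortest repr.
print(math.pi)            # 3.141592653589793
# Decimal(float) reveals the exact binary value stored, which is not pi.
print(Decimal(math.pi))   # 3.141592653589793115997963468544185161590576171875
# Rounding trades away even more detail for brevity.
print(round(math.pi, 2))  # 3.14
```

Every one of these is a different, finite approximation of the same infinite value.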

Putting this all together, information is a fact or claim. People (or other entities) share information by encoding it, conveying it and decoding it in the form of data. The success of this process is dependent on awareness, context, and precision.

Knowledge
The next step up from information is knowledge. Knowledge is the ability to take certain kinds of information and produce something new from it. For example, a chef can take a list of available ingredients as input and produce as output one or more recipes for dishes that can be made using those ingredients. A weatherman can take information on temperature, wind speed, and weather patterns and attempt to predict future weather. A calculator can turn a sequence of numbers and operators into an arithmetic result. A medical expert system can suggest diagnoses based on patient symptoms. Credit agency software can detect fraudulent patterns in credit card usage information.

Knowledge, then, is an ability to start with information that fits a known pattern and produce or derive something more from it. A software program is a way to equip a computer with knowledge.
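As a toy illustration of the calculator example, the sketch below turns a sequence of numbers and operators into a result. It is deliberately simplistic: it applies operators strictly left to right, like a basic pocket calculator, and ignores operator precedence. The token format (a flat list of strings) is an assumption made for the example.

```python
import operator

# The operators this toy calculator "knows" how to apply.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def evaluate(tokens: list[str]) -> float:
    """Evaluate [number, op, number, op, number, ...] strictly left to right."""
    result = float(tokens[0])
    # Pair each operator with the operand that follows it.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPERATORS[op](result, float(operand))
    return result

print(evaluate(["2", "+", "3", "*", "4"]))  # 20.0
```

The knowledge lives in the `OPERATORS` table and the evaluation loop; the tokens themselves are just information fed in.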

Wisdom
Wisdom is an understanding of underlying principles, the principles from which knowledge is derived.

Machines can't have wisdom; it is uniquely the province of human beings and higher beings. However, wisdom can be exported to the lower levels of the data-information-knowledge-wisdom ladder in the form of rules. The Ten Commandments in the Bible are one of the greatest examples of this. A software program containing business rules is another example: it may have taken a great deal of wisdom to come up with the rules, but once established, they can be put to use broadly.
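Here is a minimal sketch of wisdom exported as rules: a hypothetical discount policy expressed as a small rule table. Someone had to decide that loyal customers and large orders deserve discounts. Once those rules are written down, the program can apply them to any order without understanding why they exist. The `Order` fields, rule names, and rates are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Order:
    total: float
    customer_years: int

# Hypothetical business rules: (description, condition, discount rate).
RULES: list[tuple[str, Callable[[Order], bool], float]] = [
    ("Loyal customer", lambda o: o.customer_years >= 5, 0.10),
    ("Large order", lambda o: o.total >= 1000, 0.05),
]

def apply_rules(order: Order) -> float:
    """Return the order total after adding up every matching rule's discount."""
    discount = sum(rate for _, condition, rate in RULES if condition(order))
    # Round to cents to hide floating-point noise in the result.
    return round(order.total * (1 - discount), 2)

print(apply_rules(Order(total=1200.0, customer_years=6)))  # 1020.0
print(apply_rules(Order(total=200.0, customer_years=1)))   # 200.0
```

Matching discounts are simply added together here; a real policy might cap or prioritize them, which is exactly the kind of decision that calls for human judgment.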

Integrating Data, Information, Knowledge, and Wisdom
These four terms are different but related:
  • Data is the lowest level of these terms: it's merely a sequence of symbols that serve as a way for parties to share information.
  • Information is facts or claims, which can be shared among parties by encoding them as data.
  • Knowledge is an ability to make something more from a pattern of information. Knowledge can be innate or learned.
  • Wisdom is an understanding of underlying principles; a computer can't experience wisdom, but it can execute rules or logic that embody wisdom.
How is all of this relevant to software developers? For one thing, you won't make the mistake of confusing what your software programs can accomplish with what human beings should be responsible for. Every software system is really a partnership between human beings and computers, and the roles in that partnership are defined by where the two have common ground and where they do not. We expect wisdom from human beings to shape the knowledge we embed in computer programs. Programs, in turn, must be able to apply that knowledge to information, and to convert information to and from data in order to share it.

References:

I like a lot of what this article has to say about defining data, information, knowledge, and wisdom and how they relate to each other.
http://www.systems-thinking.org/dikw/dikw.htm

Data Book: It Starts Here

I've written several technical books in the past, and for the last couple of years, while I recovered from the last one, I've been mulling over what to write on next. I have a few clearly defined goals:
  • First, it should be on something with more permanence than a particular version of a product or technology; those are too short-lived, offer too questionable a return, and don't really make any kind of long-term contribution to the field.
  • Second, I'm not going to put myself under a deadline and steal away time from my family or other obligations. This book will be written at its own pace.
  • Third, I see no reason in this day and age to go through a publisher. Publishers control your content and what happens to it and edit it in ways you might not like. I'd really like to try things from the other side of the fence this time, creating and developing the material online at its own pace without pressure. Perhaps when it seems to be in a finished state and there is good feedback to support that conclusion I will look into getting it published. In the meantime it'll be here, online.

So what to write on? There are several candidate ideas, but the one I'm going to focus on for a while is a book about... Data. There is a lot to say on the subject, yet it seems to get relatively little coverage compared to platforms, programs, technologies, and development methodologies. From data formats to data structures to reliable data transmission to privacy to databases to archiving to content rendition, there's a lot to cover. A good understanding of data and data handling is critical if you're building anything at all in the software world.

I have an inspiration for how such a book would best be laid out. A few years back, well-known technical author Charles Petzold wrote a book simply entitled Code. And what a book it was! It started out presenting computer code concepts in terms anyone could understand, beginning with kids sending Morse code via flashlight. By the end of the book we're deep into processor design, language compilers, and other advanced subjects, but we got there step by step. Moreover, the book had something to offer both those with a computer background and those without.

It seems to me a great companion to Code would be a volume on Data, focusing not on instructions and processing but rather on the information that flows in, through, and out of computers. And so that's my goal: to write a book on data, starting simple with the lowly bit and working my way up. I'd love to have help from online readers, so don't be bashful in making suggestions or sending in feedback.

All my posts related to this will be tagged "Data Book".

A New Blog Home and a New Blog

I've just switched my blog home again, as I have done several times in the past. Usually I make a change like this because I find something limiting or annoying about the previous site/software I was using for blogging. In this case that's one reason for the change: my blog on MSN Spaces seems to limit me to a small number of tags (also known as labels, subjects, or categories) which is annoying both for me and those who read my blog.

But there are two other reasons for this change. One is a change in job. My career path seems to be governed by a pendulum that swings me alternately between working on software products and working as a consultant. After several years of working on the Neuron ESB product at Neudesic, the pendulum is swinging me back into consulting, also at Neudesic. Why does this affect blogging? Well, there's a big difference in openness between the product world and the consulting world. In the product world, every new way you find to solve a problem or optimize something becomes intellectual property to be patented, protected, and kept secret from competitors. In the consulting world, those things are to be shared openly with the community, as they help build the reputation and value of your consulting organization. So you can expect a lot more technical blogging now that I'm returning to the consulting world.

The third reason is I've decided to blog about any and all aspects of my life that I feel like sharing, not just technical matters. There's far more to life than technology. That means you'll find everything from parenting anecdotes to religious conviction to political opinion to literary exercise in this blog (if this offends you, you may want to unsubscribe from this blog). Still, technical articles are going to dominate. I cannot count the number of times some online information has helped me get a job done or solve a problem, and I would very much like to "give back" on that front where I can. Fortunately, my new job is going to make that easy to do on a regular basis, as I'll have my hands in just about every technology area there is on the Microsoft platform.

I look forward to my new open, multi-faceted blog and hope you will too.