What is Object Oriented Programming? (Without the Hype)

(Historical note added in 2008: The following article was written in the year 2000 and is for readers who are used to programming but not to OOP. It explains the basics of OOP from a traditional (procedural) programming perspective and without the distracting hype that OOP was being evangelised with at the time I was writing. If you are a recently trained programmer whose first language was an object oriented one like Java, and so naturally think in OOP ways, this article might seem strange or even derogatory to OOP. It might nevertheless be slightly amusing though :-) .)

Firstly: Don't Panic!

Object Oriented Programming (called 'OOP' for short) is promoted as a radical, difficult to comprehend & even frightening way of programming that is vital to know about. Is this true? No.

Object Oriented Programming is not very radical or very difficult compared to conventional programming. When one looks under the ideology and sees what is actually there, one finds there is not really much different at all!

The reason for it seeming so difficult is that introductions to OOP are normally given by the following types of people, who are not ideal for teaching the basics:

OOP Zealots: These advocate OOP in seminars & articles with great enthusiasm but leave the audience with little understanding of what OOP actually is, just a feeling that it is revolutionary & complicated. The dogmatic fervour scares people off.
Computer Scientists: These marvel in lectures about the structures and relationships that an object oriented approach can bring and draw lots of rectangles, lozenges or clouds joined by lines. This is rather pointless as it is all fairly obvious to the mathematically minded in the audience and is all totally incomprehensible to the rest. The boredom puts people off.
Programmers: These helpfully give the details of how to do particular things in particular OOP languages in web pages & books but miss out the basic introductory explanations. The detail puts people off.
Incompetents: Unfortunately there are some of these. I've even come across a lecturer from a London university who was paid to teach us OOP but could not answer the first trivial question when challenged to explain the rhetoric. The confusion puts people off.

In the following article I will try to explain what OOP is and why it is used but without the customary hype, irrelevant extras & detail (and, hopefully, without the incompetence). I am assuming the reader is familiar with the fundamental concepts of normal programming such as a program being made of 'commands' which tell the computer to do things, 'variables' to store data in and the idea of grouping a set of commands together in 'functions' (or 'subroutines' which are virtually the same) which can be called as needed from different parts of a program. However, the ability to write programs is not required and I won't be giving the detailed syntax for any particular OOP language. If you want that then there is a surfeit of books & web pages to choose from already.

What Object Oriented Programming is

Firstly, some background. A conventional ('procedural') program consists of a sequence of commands. The commands can do input, output, manipulate data and control the order in which the commands are carried out. So as not to have to duplicate commands in different places in a program where the same action needs to be performed, a set of commands can be combined into a 'function' or 'subroutine' which acts like a new command. This same set of commands can then be called from different places in the program. As well as substantially reducing the length of programs, this can make the structure neater by encapsulating the commands which, together, perform some particular operation in one place. The rest of the program need not be concerned about the details of what commands make up the function and can just treat it as something which does that operation. This neatness is not merely an aesthetic feature but a great aid to making programs quicker to write, easier to debug, more reliable & more reusable.

Splitting a program into functions makes programming quicker not only because different programmers may be able to work on separate functions at the same time but, crucially, because it breaks down a huge task which would be difficult for a human to store in mind as one piece into smaller units more suited to human memory. It makes debugging easier because it localises the effect of a single bug so it is easier to track down and, when bugs have been eliminated from a function, one need not waste time rechecking it when debugging other parts of the program. To aid this, it is normal to have the variables a function uses internally hidden from the rest of the program, unless there is a special necessity to reveal them. This ensures that any problem related to such a 'local' variable can be traced to the function which needs correcting. This also makes the program more reliable because, once a function is working, it should stay working as more of the program is built up. Other parts of the program cannot disrupt those hidden variables & commands inside the function. The splitting also, of course, makes subsequent programs quicker to write as whole functions can be reused from earlier programs and even stored in libraries for use by any future program. The generic term for this approach of breaking a big solution into small pieces whose internal workings are not of concern to other pieces is 'modular'.
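
As a small sketch of the above, here is one function with one hidden local variable. Python is used purely for illustration (any procedural language would do), and the names 'average' and 'total' are made up for this example.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0.0  # local variable: exists only while average() is running
    for value in values:
        total += value
    return total / len(values)


# The same set of commands is called from two different places.
print(average([1, 2, 3]))   # prints 2.0
print(average([10, 20]))    # prints 15.0

# 'total' is hidden inside the function, so the rest of the program cannot disrupt it.
try:
    total
except NameError:
    print("total is not visible out here")
```

The callers only need to know that average() does the averaging, not which commands it is made of.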

Enough about functions. Now for variables. In many computer languages, a programmer can define a compound variable type in addition to those which are ready made in a language. For example, if a particular language has variable types for names & dates, a type for storing birth records could be made by combining two name type variables, for the family name and given names, with a date type variable, for the date of birth. These combined variables are called different things in different languages including 'structures' (C), 'clusters' (LabVIEW) & even just 'types' (Fortran). Combined variable types can be useful. For example, instead of having to work on and pass around three variables together whenever a birth record is used, a single variable of this combined type could be used. Commands & functions which don't need to use all the member variables of a combined variable need not be concerned that they are there. One can even copy a combined variable in one command, rather than one command for each member variable, without knowing or caring what all the member variables are. This applies a modular approach to collections of variables, just as functions did for collections of commands.
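
A rough sketch of the birth record as a combined variable type follows. It is written in Python for illustration only; Python's nearest equivalent to a plain structure is a dataclass with no hand-written methods, and the names 'BirthRecord' and 'describe' are assumptions made up for this example.

```python
import copy
from dataclasses import dataclass
from datetime import date


@dataclass
class BirthRecord:
    # Three member variables combined into one type.
    family_name: str
    given_names: str
    date_of_birth: date


def describe(record):
    # Only the names are used here; the function need not care that a date is stored too.
    return record.given_names + " " + record.family_name


original = BirthRecord("Smith", "Jane Mary", date(1970, 5, 17))
duplicate = copy.copy(original)  # one command copies every member variable

print(describe(original))      # prints Jane Mary Smith
print(duplicate == original)   # prints True: same contents
print(duplicate is original)   # prints False: a separate variable
```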

That was around for decades, then someone came up with the idea of including functions as well as variables in those combined variable types. Combined variable types could then have member functions as well as member variables. These functions are only really in the program once (it would be very inefficient otherwise) because they are written into the variable type specification, not the individual variables of that type themselves. However, they act as if they were duplicated in each variable of that combined variable type because they, by default, act on the member variables of the particular variable they are called on. For example, one could have included a member function to calculate a person's age in that birth record combined variable type. When a particular variable of that type, storing a particular given name, family name & date of birth, has its age calculating member function called, the function will automatically use the member variables of that particular variable, not the generic variable type, to calculate an age. This can be quite handy because it neatly bundles data & the functions which act on it together. It can also aid conceptualising a program because, in calling a member function, one is effectively telling the data what to do to itself which is, in some situations, closer to reality than giving the data to a command to process.
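
The age example might look something like this in Python (chosen only for illustration). The method name 'age_on' and the idea of passing in the date to measure against are assumptions of this sketch, made so that the result does not depend on when it is run.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BirthRecord:
    family_name: str
    given_names: str
    date_of_birth: date

    # Member function: written once in the type, but acts on whichever
    # variable it is called on (referred to here as 'self').
    def age_on(self, today):
        """Whole years between this record's date of birth and 'today'."""
        years = today.year - self.date_of_birth.year
        # Take a year off if this year's birthday has not been reached yet.
        birthday = (self.date_of_birth.month, self.date_of_birth.day)
        if (today.month, today.day) < birthday:
            years -= 1
        return years


jane = BirthRecord("Smith", "Jane", date(1970, 5, 17))
wei = BirthRecord("Li", "Wei", date(1990, 12, 1))

some_day = date(2000, 6, 1)
print(jane.age_on(some_day))  # prints 30
print(wei.age_on(some_day))   # prints 9
```

The same function gives different answers because each call uses the member variables of the variable it was called on.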

Now you have understood that, we can go on to Object Oriented Programming at last. Correction: if you have understood that then you have understood Object Oriented Programming! That idea in the preceding paragraph of putting functions into combined variable types is Object Oriented Programming! Does it not sound dramatic enough? Okay, let's put some hype in: rename 'member functions' to 'methods'; rename 'combined variable types' to 'classes'; and rename 'variables of combined types' to 'objects'. That's all it is!

A Few Little Extras

There are a few useful extras which normally come with OOP. They can mostly exist in procedural programming languages as well so they are not necessarily OOP features but they are ubiquitous in OOP languages so I suppose I ought to mention them. You can skip this section if it is too detailed.

Data hiding: Just like functions can have local variables not visible to the rest of the program, so can objects. This is for storing data that is not revealed or set directly but only via methods (member functions). For example a birth record object could store the year, month & day of birth in separate member variables and only combine them when asked for date of birth by a method call. Some OOP advocates even recommend that all member variables are hidden and only accessed via methods but that is sometimes excessive.
Automatic initialisation: A class can have a method (called a 'constructor') which is automatically called whenever a new object of that class comes into being. This is like having a default value for a variable type that a variable is initialised to when it is created prior to being set. However, as objects have member functions (methods) as well as member variables, this has been generalised into a method call that can do a lot more than just set a default value. There is a corresponding method (called a 'destructor') which is called when an object is disposed of and is typically used for clearing up.
Same name, different function: Different functions can have the same name provided they are distinguished by their parameter types. This is useful in procedural languages but is almost vital in OOP languages because classes may come with functions that take objects of the class as parameters (as well as member functions embedded in the class) and it is likely that there will be a duplication of popular function names between classes from different programmers. In OOP, this ability to have multiple functions with the same name is called 'overloading'.
Derived classes: If one requires a class which is like a class one already has but with extra features then one can derive a class from the existing class. The derived class has all the externally visible methods & member variables that the base class it was derived from had (and, optionally, some of the hidden ones) plus whatever extras are put in. For example one could derive a class from the birth record class that also has a member variable for time of birth or a method for combining the family & given name into a full name. Besides aiding reuse of program parts, it is possible to have different derived classes from the same base class for slightly different situations. For example British & Chinese versions of the previous example could be made where the British one calculates a full name by appending the family name to the given name whereas the Chinese one joins them the other way around. A nice feature of this is that objects of both types could then be stored and processed the same (thereby saving programming) and yet perform differently when the full name method is called. In OOP, this ability to have the same function perform different actions depending on the derived class is called 'polymorphism'.
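
Three of these extras (data hiding, a constructor and derived classes with polymorphism) can be sketched together using the birth record again. This is Python for illustration only and all class and method names are assumptions of the sketch. Note that Python merely marks hidden variables with a leading underscore by convention rather than enforcing the hiding, and that overloading by parameter type is left out because Python does not offer it in the form described above.

```python
from datetime import date


class BirthRecord:
    def __init__(self, family_name, given_names, year, month, day):
        # Constructor: called automatically whenever a new object is created.
        self.family_name = family_name
        self.given_names = given_names
        # Hidden member variables (by Python convention, not enforced).
        self._year = year
        self._month = month
        self._day = day

    def date_of_birth(self):
        # The separately stored parts are only combined when asked for.
        return date(self._year, self._month, self._day)


class BritishBirthRecord(BirthRecord):
    def full_name(self):
        return self.given_names + " " + self.family_name


class ChineseBirthRecord(BirthRecord):
    def full_name(self):
        return self.family_name + " " + self.given_names


records = [
    BritishBirthRecord("Smith", "Jane", 1970, 5, 17),
    ChineseBirthRecord("Li", "Wei", 1990, 12, 1),
]

# Both kinds are stored and processed the same way,
# yet full_name() performs differently for each.
for record in records:
    print(record.full_name(), record.date_of_birth())
# prints Jane Smith 1970-05-17
# prints Li Wei 1990-12-01
```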

Why it can be a Good Thing

OOP is good for the same reasons that other modular programming schemes are: aesthetic neatness, quick writing, eased debugging, improved reliability & increased reusability in large programs.

In addition, an object based structure naturally fits certain common uses of computer programs including graphical user interfaces (with buttons, windows, scroll bars as objects) and databases (records as objects).

Of course, all this modular structuring can be done with a classical 'procedural' language (indeed the original implementation of the OOP C++ language was a translator that converted C++ programs to the procedural C language!) but it looks cleaner in OOP because OOP was designed for this structuring. And OOP is fashionable!!

Why it can be a Bad Thing

There are drawbacks to OOP as well. It is not the best thing to use in all circumstances. Don't fall into the trap of using it for ideological reasons when it is not the most suitable method for a particular task (and similarly don't dogmatically stick to a single programming language; pick the most suitable one for each job).

For a start, it should be obvious that OOP, or any such heavy programming method, is probably not ideal for doing small quickly-written one-off programs where the time taken to define a class structure is more than the time you will save by having it neatly modular. Of course, one can use an OOP language for short programs; it is just that structuring your own programs in an OOP fashion in addition to using the language's own built-in objects would be inefficient. I don't know what the cross-over point is but I guess it is several hundred lines of program for myself, although I would probably do modular structuring into functions at well below a hundred lines.

Neither is it suitable for very low power computers such as the microcontrollers embedded in consumer products which often have less than a kilobyte of memory to fit the program into (compared to gigabytes on an office PC) and only a few bytes (compared to megabytes) to store variables in.

There are also some jobs which are naturally "do this ... then this ... then this ... then this ..." tasks in which case programming it procedurally could well be neater and easier than doing it object oriented. I've found this for simple one-task programs that control mechanical devices or batch process files. They typically read in the parameters, read in data from files, apply a series of manipulations to the data & parameters and output to electronic hardware or to a file in that order. Sometimes they don't even have conditionals or loops. Object orientation would be an ill-fitting arrangement for these programs. (An interesting aside: often such small programs are called in turn by other small programs and a collection of such programs naturally builds up into an effectively object oriented system, where the little programs act as classes, without any planned intention for them to be so.)

The most serious drawback is one common to all neatly structured modular programming: the division into modules really needs to be decided in advance of programming. In an ideal programming situation this would be the case but, in reality, customers who don't understand programming often change the requirements drastically after programming has been started (or even finished!). Often spec's are changed in a way that looks small from the outside but which means that objects which were built to act totally independently of each other must be changed so that they control each other directly. The program alterations are then either a time-consuming restructuring of much of the program or messy ad-hoc direct links which wreck the modularity. For example, if a customer asked for a simple product database in which records are only ever set once & recalled one at a time for reading, then the obvious class structure would be a record class for storing the product data and a database class consisting of an array of those product record objects along with a method to add a new record object and a search method which returns a copy of a requested record for reading. If the customer then demands that a product record should link to related products, with the links changeable from the display terminal, then the restructuring will be serious. Not only will it need that extra variable added to the record class (easy) but record objects will need to link to other record objects, which were previously reached only via the database object (disrupts the neat modular structure), the database will need to pass around the original record object instead of a copy so it can be altered (lots of changes needed in different places in the program) and a contention-resolution system will be needed to prevent the records now being altered from different parts of the program simultaneously (a difficult and time consuming programming task).
The only solution I know of for this is to warn customers that such spec' changing is like asking for different foundations in a house after the walls are built, and to get them to contractually agree to pay for any such alterations they request, but customers don't like that.
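
To make the example concrete, here is a minimal sketch of the class structure for the original spec', before the customer changes it. It is in Python for illustration only; the class names and the 'code', 'description' & 'price' member variables are hypothetical.

```python
import copy


class ProductRecord:
    def __init__(self, code, description, price):
        self.code = code
        self.description = description
        self.price = price


class ProductDatabase:
    def __init__(self):
        self._records = []  # the array of record objects, hidden inside

    def add(self, record):
        self._records.append(record)

    def search(self, code):
        """Return a copy of the record with this code, or None if absent."""
        for record in self._records:
            if record.code == code:
                return copy.copy(record)
        return None


database = ProductDatabase()
database.add(ProductRecord("A100", "Kettle", 19.99))

found = database.search("A100")
found.price = 0.0                     # alters only the caller's copy
print(database.search("A100").price)  # prints 19.99
```

Because search() hands out copies, nothing outside the database can alter a stored record. That is exactly the neat property which the changed spec' would require dismantling.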

The Real Reasons that it is so Popular in C/C++!

The rest of this article is not language specific but one major misconception about the language 'C++' needs to be cleared up.

OOP is most commonly advocated for 'C++'. In part this advocacy is just because 'C++' is the procedural language 'C' with OOP added in later so they make a nice pair of languages to compare & contrast unlike, for example, 'Java' which was OOP from its first creation. However this does not explain the fanatical enthusiasm with which C++ was welcomed. Was the explanation that OOP was so much better than procedural programming? No. It was simply that there were some very useful things absent from the original C language that were either rectified in C++ or could be bodged up with OOP tricks:

Strings & arrays: Almost amazingly, C does not really have variables of string or array type. Instead strings & arrays have to be cumbersomely bodged in using pointers to general-purpose chunks of memory with the programmer having to maintain separate records of where they are & how long they are and resort to tedious explicit memory cell copying operations to duplicate them or change their lengths. Although C++ does not come with intrinsic string or array types either, one can write classes which encapsulate the memory manipulation routines and, at last, use instances of them as if they were intrinsic string & array variables without further hassle. (Other than the hassle of there being many different string & array classes, with each product's library seeming to use different ones which need interconversion! Even I have written my own ones after finding the ones that came with the Microsoft C++ compiler were not fast, easy or reliable enough for my requirements.)
Memory deallocation: Despite C programming needing so much direct memory manipulation, it does not automatically free up memory when it is no longer needed like Perl & Java do. Therefore the programmer must write their programs so that they tediously keep track of all memory that has been allocated by request and explicitly free it up when no longer needed. This is especially tedious where there are many ways a function can exit (such as after testing for different errors) and each one must have commands to free up the same memory. If this is not done then the computer runs out of memory as more & more of it is left reserved but not actually in use (a situation so common that it has been given the name of a "memory leak"!). Although C++ still has this problem, it is slightly less hassle as one can create classes for the variables which need memory allocation, perform the allocation in the class's constructor & free it up in its destructor so one only needs to program the freeing up once instead of everywhere such a variable might cease being used.
Function overloading: It is annoying in C that if one wants to write a function that can process more than one type of variable then one has to give the functions different names and type the correct name for the data type when using it. For example one has to use 'abs()' to calculate the absolute (unsigned) value of an integer but 'fabs()' to do the same thing for a floating point number. The compiler should be able to distinguish these itself from the type of parameter. In C++ it can. This is not a specifically OOP feature though.
Unspecified variable type: C likes to know what type its variables are even when the specific features of that type are irrelevant. This is annoying when one wants to store a collection of variables of different types in an array or write a program that will not need reprogramming when new types are added. The normal solution is to have duplicate commands for the different types or a messy "void pointer" fudge. With OOP one can derive all one's variable types from one base class (even if that base class does absolutely nothing!) and treat them all as being of that type.
Localised namespaces: When writing a big program, it is difficult to ensure that names of functions are not duplicated in different files. This is especially a problem for little functions which need not be visible to other modules, just for use internally. A name like 'IncreaseCount()' could easily be accidentally duplicated. A common solution was to start each function name with the name of the file or module which it was in but that was messy, increased typing & actually broke the official C spec' (which stated compilers could ignore all but the first 6 characters of function names). In C++ there is a neater bodge-up: bunging all the functions from one file or module in a class, even if that class does not have any variables, neatly localises the function names. Once more, this need not have required OOP; for example, in the Perl language one can specify anything to be only locally visible (indeed adding OOP features to Perl required almost no changes to the language, essentially just an alternative syntax which called such localised regions classes!).
//: The most commonly used C++ feature, which is now used ubiquitously in programs which otherwise use only pure C commands, is the '//' command which merely means "ignore everything else on this line"! It is quicker to type than the original C comment markers '/*...*/' which needed one to mark both ends of the stuff to ignore.

The reason for these deficiencies in C is that C was designed for low level fast programs on low power computers, not for database & graphical user interface programs on the far more powerful computers available at present. It is C++'s adaptation of C towards this changed role which gives it its popularity, not so much its OOP nature.

Summary

Object Oriented Programming is not as different from normal procedural programming as is made out by its advocates and is not as difficult to understand as their proselytising implies. It is useful in making big modular programs but such programs should have been structured very similarly to an OOP structure anyway. It can be more hassle than it is worth for short & quick programs. The enthusiasm for C++ in particular is mainly because it adds in some important basic features that were missing from C.