The purpose of abstract methods and classes?

I have been developing with PHP OOP for many years but have never really looked into Abstract methods and classes and something has recently prompted me to do so. However, I really can’t understand the point / purpose of them.

I am not looking for an explanation of how to set them up or use them but an explanation of why and when they should be used as I just don’t get it.

In general they are used in a similar manner to interfaces: They allow child classes to complete missing behaviour.

Here’s a heavily contrived example:


abstract class Mammal {
	protected $legs = array();

	public function eat(Food $food) {
		//...
	}

	abstract public function walk();
}


class Human extends Mammal {
	public function walk() {
		$this->legs[1]->forward();
		$this->legs[2]->forward();
	}
}


class Dog extends Mammal {
	public function walk() {
		$this->legs[1]->forward();	
		$this->legs[3]->forward();	
		$this->legs[2]->forward();	
		$this->legs[4]->forward();	
	}

}

The base class doesn’t have enough information available to provide an implementation for the walk() method. However, all mammals can walk, and this is required behaviour for all mammals. By marking the method as abstract, any child class must provide the missing behaviour. This allows for polymorphism, as any mammal must have a walk() method.
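To make the polymorphism point concrete, here is a minimal sketch, assuming PHP 7 or later. The class names (SimpleMammal, SimpleHuman, SimpleDog) and the describeAll() function are hypothetical, and walk() returns a description instead of moving real leg objects so the code can run on its own:

```php
<?php
// Hypothetical, simplified classes used only for illustration.
abstract class SimpleMammal {
    // Shared behaviour lives in the base class and relies on walk().
    public function describe() {
        return get_class($this) . ': ' . $this->walk();
    }

    // Each subclass must supply its own gait.
    abstract public function walk();
}

class SimpleHuman extends SimpleMammal {
    public function walk() {
        return 'left, right';
    }
}

class SimpleDog extends SimpleMammal {
    public function walk() {
        return 'front-left, back-right, front-right, back-left';
    }
}

// The caller never checks the concrete type; it just calls the shared method.
function describeAll(array $mammals) {
    $lines = array();
    foreach ($mammals as $mammal) {
        $lines[] = $mammal->describe();
    }
    return $lines;
}

$lines = describeAll(array(new SimpleHuman(), new SimpleDog()));
echo implode("\n", $lines), "\n";
```

describeAll() works with any future subclass as long as it implements walk().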

That said, in the majority of cases you are better off using interfaces rather than abstract methods. The example above has poor encapsulation because of the protected member variable. This is a problem because child classes can alter the parent’s state in ways the parent class isn’t expecting, and changes to the base class during development can cause unforeseen side effects in child classes that the developer of the parent class may not even know exist. See: http://stackoverflow.com/questions/9344935/inheritance-breaking-encapsulation , http://en.wikipedia.org/wiki/Information_hiding and http://www.javaworld.com/javaworld/jw-08-2003/jw-0801-toolbox.html
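If an abstract class is kept anyway, one way to limit the encapsulation problem is to make the state private and give subclasses a narrow protected method instead. This is only a sketch under that assumption; EncapsulatedMammal, EncapsulatedHuman, moveLeg() and getSteps() are hypothetical names:

```php
<?php
// Hypothetical variant: the state is private to the parent class.
abstract class EncapsulatedMammal {
    private $steps = array();

    // The only way a subclass can change the parent's state.
    protected function moveLeg($index) {
        $this->steps[] = 'leg ' . $index . ' forward';
    }

    public function getSteps() {
        return $this->steps;
    }

    abstract public function walk();
}

class EncapsulatedHuman extends EncapsulatedMammal {
    public function walk() {
        // The subclass cannot touch $steps directly, only via moveLeg().
        $this->moveLeg(0);
        $this->moveLeg(1);
    }
}

$human = new EncapsulatedHuman();
$human->walk();
$steps = $human->getSteps();
print_r($steps);
```

The parent can now change how it stores steps without any subclass noticing.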

There are some cases where abstract classes are useful, but as you’ve discovered, they are few and far between. When they are used, they’re used to ensure robustness and allow for polymorphism but as I stated, this is almost always better achieved using interfaces.

Declaring a class as abstract prevents it from being instantiated. This is useful if a class contains only the bits of code that need to be shared between two or more subclasses, and it wouldn’t make sense to try to instantiate the parent class.
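A small sketch of that rule, assuming PHP 7 or later (where the failed instantiation is a catchable Error rather than a fatal error). Shape and Circle are hypothetical classes:

```php
<?php
// Hypothetical classes used only to illustrate the rule.
abstract class Shape {
    // Code shared by every subclass.
    public function label() {
        return 'shape';
    }
}

class Circle extends Shape {
}

$message = '';
try {
    $shape = new Shape(); // not allowed: Shape is abstract
} catch (Error $e) {
    $message = $e->getMessage();
}

$circle = new Circle(); // fine: Circle is concrete
echo $circle->label(), "\n";
echo $message, "\n";
```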

Declaring a method as abstract forces each of the subclasses to implement that method. This can serve two purposes. It can be used to force each subclass to implement a consistent interface. Or it can allow the parent class to be aware of and use a method that would otherwise exist only within each subclass.

Here’s an example that may help illustrate. It doesn’t make sense to instantiate the class Game, because it doesn’t do enough all by itself. It only makes sense to instantiate one of its subclasses. Also, the class Game can use methods like initializeGame() and makePlay() even though those methods are implemented in the subclasses.
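A minimal sketch of what such a Game class might look like. Only Game, initializeGame() and makePlay() come from the description above; playOneGame(), endOfGame() and the CountdownGame subclass are hypothetical names added so the example runs:

```php
<?php
abstract class Game {
    // The parent defines the overall flow once and calls methods
    // that only exist, as implementations, in the subclasses.
    public function playOneGame($playersCount) {
        $log = array();
        $log[] = $this->initializeGame($playersCount);
        $player = 0;
        while (!$this->endOfGame()) {
            $log[] = $this->makePlay($player);
            $player = ($player + 1) % $playersCount;
        }
        return $log;
    }

    abstract protected function initializeGame($playersCount);
    abstract protected function makePlay($player);
    abstract protected function endOfGame(); // hypothetical helper
}

// Hypothetical subclass: the game ends after three moves.
class CountdownGame extends Game {
    private $remaining = 0;

    protected function initializeGame($playersCount) {
        $this->remaining = 3;
        return 'start with ' . $playersCount . ' players';
    }

    protected function makePlay($player) {
        $this->remaining--;
        return 'player ' . $player . ' moves';
    }

    protected function endOfGame() {
        return $this->remaining <= 0;
    }
}

$game = new CountdownGame();
$log = $game->playOneGame(2);
echo implode("\n", $log), "\n";
```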

EDIT: TomB beat me to it. :)

Abstract classes and Interfaces made so much more sense to me once I understood the concept of polymorphism.

A good example may be a system where you need to generate reports.

Let’s say you need to generate reports in a csv format. You have no idea what future reports you may need to write. You also have no idea whether the format will ever change (for example, whether the same report should be available say in xml or json format).

I was given this task a while back - my manager wanted to be able to create a few csv reports. However, I know from experience that there’s always the possibility that they may want reports in these other formats in the future.

We also have another issue: how do we guarantee that we can add new reports without breaking old ones? So there are two aims we would like to achieve:

  1. Have a way of making reports compatible with different file formats in the future (even if we don’t need to implement this right now).
  2. Make sure that we can add new reports and know for certain that we have not broken any existing reports.

Polymorphism and interfaces/abstract classes present an excellent way to design this. This is because we can use interfaces to design our entities in a way that makes them compatible with one another, and we can use these interfaces to make sure that future objects adhere to this pre-planned design. It’s a way of forcing the system to work a set way, rather than allowing things to be done in a willy-nilly fashion.

So let’s think about this.

Here’s a good way to go about doing something like this (I’ve found this method helps me): let’s start at the end, instead of the beginning. By this I mean let’s show a use case of what we want the final result to look like for generating a report.



//let's create a user invoice report:
$report = new UserInvoiceReport();

//let's output this report as a csv to the browser:

$csvReportWriter = new CsvReportWriter();
$csvReportWriter->setReport($report); 
$csvReportWriter->output(); //outputs a csv report to the screen

But now we want to be able to output this same report in a json format. Let’s see how that looks:



//let's create a user invoice report:
$report = new UserInvoiceReport();

//let's output this report in a json format:

$jsonReportWriter = new JsonReportWriter();
$jsonReportWriter->setReport($report); 
$jsonReportWriter->output(); //outputs a json formatted report to the screen

Say we now want to output a different report in a json format. Let’s say we want to output a Sales Stats report:



//let's create a sales stats report:
$report = new SalesStatsReport();

//let's output this report in a json format:

$jsonReportWriter = new JsonReportWriter();
$jsonReportWriter->setReport($report); 
$jsonReportWriter->output(); //outputs a json formatted report to the screen

So from the above code, we can see that we have a way of ensuring that ReportWriter objects are compatible with Report objects, and we can see that the data that comes from each report is contained (encapsulated) within an Object context - this is what ensures a new report cannot break an old report. If we messed up and there’s an error in our SalesStatsReport() object, this would make no difference to the UserInvoiceReport object - all other reports would simply carry on working as normal. It’s also much cleaner and easier to work with for the future. Need a new report? Just create a new Report object and it will instantly be compatible with all other report writer objects. So once the new report is written, it will be available in all the formats we already have.

So how does this all work?

Firstly we have an interface for reports.

Let’s do something like this:


interface Report{
    public function getData();
}

Then we either want an interface or an Abstract class for the report writer. Let’s write it like an Abstract class:


abstract class ReportWriter{
    /**
     * @var Report
     */
    protected $_report;
    
    protected $_reportData;

    public function setReport(Report $report){
        $this->_report = $report;
        //store the report's data so every writer can read it from $this->_reportData
        $this->_reportData = $this->_report->getData();
    }

    abstract public function output();
}

So the above code creates the setReport() method for all our ReportWriter objects. This method means that all report writers will automatically have a report object available in $this->_report, and will have the data from the report available in $this->_reportData. Notice how we keep the concept of the report abstract right now (we don’t ask for specific reports, just a ‘Report’ type of object). Our setReport() method will not allow us to pass anything other than a Report object to it, so we’re forcing the code to work a certain way now. The Report interface above is what ensures that all reports are compatible with this method.

Now we can write a specific instance of a report, and a specific report writer to go with it.

I’m not going to write all the implementation code here, but in general it’d be like this:


class UserInvoiceReport implements Report{
    public function getData(){
      //code that queries the database goes here; an empty array stands in for the result
      $data = array();
      return $data;
    }
}

Because we’ve implemented the Report interface, we are forced to create the getData() method, which is used later by our ReportWriters’ setReport() method.

Now in this instance we’d maybe want to go a bit further and return an object from the getData() method. If we return a ReportData object, we can ensure the data is always available in a set structure that the report writers can rely on. I’ve not gone that far here because I’m just demonstrating the point. You could even just return an array whose format is always the same, but an object (enforced by an interface) would be best, as it forces the data to come back in a set way.
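As a rough idea of what that could look like, here is a sketch assuming PHP 7 or later. The ReportData interface, its getHeaders()/getRows() methods, the ArrayReportData class and the sample rows are all hypothetical; only Report, getData() and UserInvoiceReport come from the code above:

```php
<?php
// Hypothetical interface: fixes the shape of the data every report returns.
interface ReportData {
    public function getHeaders(); // array of column names
    public function getRows();    // array of rows, each an array of values
}

// Hypothetical simple implementation backed by plain arrays.
class ArrayReportData implements ReportData {
    private $headers;
    private $rows;

    public function __construct(array $headers, array $rows) {
        $this->headers = $headers;
        $this->rows = $rows;
    }

    public function getHeaders() {
        return $this->headers;
    }

    public function getRows() {
        return $this->rows;
    }
}

interface Report {
    // The return type forces every report to hand back a ReportData object.
    public function getData(): ReportData;
}

class UserInvoiceReport implements Report {
    public function getData(): ReportData {
        // Made-up sample rows standing in for a database query.
        return new ArrayReportData(
            array('invoice', 'total'),
            array(array(1, 10), array(2, 25))
        );
    }
}

$data = (new UserInvoiceReport())->getData();
echo implode(',', $data->getHeaders()), "\n";
```

Writers can then call getHeaders() and getRows() without knowing which report produced the data.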

So the writer code would be something like:


class CsvReportWriter extends ReportWriter{
    
        public function output(){
            //logic to output to a csv format goes here

            //remember, this class has access to $this->_report + $this->_reportData automatically
        }
}

This would ensure that when you create future reports, they are compatible with CsvReportWriter objects and any other report writer object you eventually create. So just by having a few interfaces and an Abstract class, we’ve made our system much easier to maintain and scale for future cases.
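Putting the pieces together, a runnable sketch might look like the following. SampleInvoiceReport, its hard-coded rows and the render() helper are hypothetical stand-ins, and the CSV output is deliberately naive (no quoting or escaping):

```php
<?php
interface Report {
    public function getData();
}

abstract class ReportWriter {
    protected $_report;
    protected $_reportData;

    public function setReport(Report $report) {
        $this->_report = $report;
        $this->_reportData = $report->getData();
    }

    abstract public function output();
}

// Hypothetical report with hard-coded rows instead of a database query.
class SampleInvoiceReport implements Report {
    public function getData() {
        return array(
            array('invoice' => 1, 'total' => 10),
            array('invoice' => 2, 'total' => 25),
        );
    }
}

class CsvReportWriter extends ReportWriter {
    public function output() {
        // Naive CSV: joins values with commas, no quoting.
        foreach ($this->_reportData as $row) {
            echo implode(',', $row), "\n";
        }
    }
}

class JsonReportWriter extends ReportWriter {
    public function output() {
        echo json_encode($this->_reportData), "\n";
    }
}

// Hypothetical helper: works with any writer and any report,
// and captures what output() prints so it can be inspected.
function render(ReportWriter $writer, Report $report) {
    ob_start();
    $writer->setReport($report);
    $writer->output();
    return ob_get_clean();
}

$csv = render(new CsvReportWriter(), new SampleInvoiceReport());
$json = render(new JsonReportWriter(), new SampleInvoiceReport());
echo $csv, $json;
```

Adding a new report or a new writer means adding one class; render() and the existing classes stay untouched.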

Does this help? :)

That’s a good example but I don’t think it’s a particularly helpful demonstration of a worthwhile usage of an abstract class because it tightly couples two different responsibilities.

If the ReportWriter was an interface instead of a class:


interface ReportWriter {
	public function output();
}

Then the entire system would have more flexibility because it’s not tied to any specific implementation. At the moment your Report object must come from the protected $_report; variable and be set by the setReport() method.

The biggest problem is that the ReportWriter is tied to the implementation of how it gets access to the Report object. Imagine a scenario where a ReportWriter needs to collate data from multiple reports. Using the abstract class, this is impossible, as the implementation is tied to the setReport() method. Using an interface, that’s not as much of an issue:


class CSVReportWriter implements ReportWriter {
	private $report;

	public function setReport(Report $report) {
		$this->report = $report;
	}

	public function output() {
		//Generate CSV from $this->report;
	}

}

class CSVMultipleReportWriter implements ReportWriter {
	private $reports = array();

	public function addReport(Report $report) {
		$this->reports[] = $report;
	}

	public function output() {
		//Generate CSV from $this->reports;
	}

}


If you had used the abstract class throughout the system then creating a CSVMultipleReportWriter would become very difficult, or it would have a redundant setReport() method.
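For illustration, here is a runnable sketch of the interface-based version. FixedReport, the capture() helper and the sample rows are hypothetical, and the CSV generation is a naive comma join:

```php
<?php
interface Report {
    public function getData();
}

interface ReportWriter {
    public function output();
}

// Hypothetical report that simply returns the rows it was given.
class FixedReport implements Report {
    private $rows;

    public function __construct(array $rows) {
        $this->rows = $rows;
    }

    public function getData() {
        return $this->rows;
    }
}

class CSVMultipleReportWriter implements ReportWriter {
    private $reports = array();

    public function addReport(Report $report) {
        $this->reports[] = $report;
    }

    public function output() {
        // Collate rows from every report into one naive CSV stream.
        foreach ($this->reports as $report) {
            foreach ($report->getData() as $row) {
                echo implode(',', $row), "\n";
            }
        }
    }
}

// Hypothetical helper: depends only on the interface, so it neither
// knows nor cares how the writer obtained its reports.
function capture(ReportWriter $writer) {
    ob_start();
    $writer->output();
    return ob_get_clean();
}

$writer = new CSVMultipleReportWriter();
$writer->addReport(new FixedReport(array(array('a', 1))));
$writer->addReport(new FixedReport(array(array('b', 2))));
$combined = capture($writer);
echo $combined;
```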

With abstract classes in particular, because a common use is exposing implementation between parent and child classes, it’s very easy to encounter the fragile base class problem ( http://en.wikipedia.org/wiki/Fragile_base_class , http://www.web-brainz.co.uk/fragile , http://stackoverflow.com/questions/2921397/what-is-the-fragile-base-class-problem )

I do see your point, although I think in this particular instance it wouldn’t actually be an issue because in reality I don’t see where you would ever need to combine reports (the idea really is that the Report class just returns the raw data required for the report, so if I wanted a report that combined results from two reports, I’d just create a new report that did that).

However, from a design perspective I do see your point. In this specific instance I don’t think I’d be losing anything, but in general I see what you mean about tight coupling around a particular use-case in the abstract class.

As it happens, I have methods like setReport() because they’re useful for unit testing.

I’d say in a general sense you’d be right, in that it’d be a better idea to do this just using interfaces, but in this specific instance I don’t think I’d actually be worse off in practice, if that makes sense.

The key thing is that these make more sense when there are multiple developers involved. This is one of those tools OOP provides to enforce rules set by the architect.