Friday, December 19, 2014

Dev of the Week

http://java.dzone.com/articles/dev-week-steven-lott

Yes. Everyone is famous for 15 minutes.

And. "On the Web, everyone will be famous to fifteen people."

Thursday, December 18, 2014

Making Learning Accessible

Visit Packt Publishing today for the $5 eBook Bonanza.  https://www.packtpub.com.

eBooks and videos at a discount through something like the 6th of January.

We autodidacts are rejoicing.

Specifically, I can look at some of the Scala and Hadoop titles. I'm working with folks who have Hadoop but I've heard rumors that they're leaning toward Scala, also. Does that mean Apache Spark? Or does it mean Scalding?

I'm biased toward using Python with Hadoop; but I appear to be in the minority on this. Time to do some additional learning.


Tuesday, December 16, 2014

The Getting Started Problem

How does one get started developing software? What's the first step?

When you come to this craft -- or sullen art -- without a background except as a user, how do you get started writing code?

It's not easy. Indeed, developing software may be one of the hardest things there is. Really, really hard.

Why? Consider the orders of magnitude involved. From sub-microsecond clock speeds to software that's supposed to continue running for 8,763 hours a year without interruption. That's 31,547,269 seconds. Isn't that about 15 orders of magnitude?

Or consider scope of storage. We wrangle over bytes in a dataset that spans terabytes. That's 12 orders of magnitude.
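A rough check of that arithmetic, as a sketch. The seconds-per-year figure is the one quoted above; the two clock periods are assumptions chosen to bracket "sub-microsecond," and the function name is made up for illustration.

```python
import math

# Figure quoted in the post; the clock periods below are assumptions
# that bracket "sub-microsecond".
SECONDS_PER_YEAR = 31_547_269
CLOCK_PERIODS = {"1 microsecond": 1e-6, "1 nanosecond": 1e-9}


def orders_of_magnitude(largest, smallest):
    """Return log10 of the ratio between two positive quantities."""
    return math.log10(largest / smallest)


time_spans = {
    label: orders_of_magnitude(SECONDS_PER_YEAR, period)
    for label, period in CLOCK_PERIODS.items()
}
# One terabyte (10**12 bytes) compared with a single byte.
storage_span = orders_of_magnitude(1e12, 1)

for label, span in time_spans.items():
    print(f"a year versus {label}: about {span:.1f} orders of magnitude")
print(f"a terabyte versus a byte: about {storage_span:.0f} orders of magnitude")
```

Depending on which clock period you pick, the time span lands on either side of 15, so "about 15" is a fair middle estimate.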

When engineers build a 13,000' long bridge, are they looking at it from scales of 10^±5? Do they even care what's 21 miles away? They might care about things at the scale of 10^-5, since that's about an inch. But 10^-7? 100th of an inch? I could be wrong, but I have doubts.

I won't go so far as to say bridge building is particularly easy. It's safety critical work. People die when things go wrong. Consequently, it's regulated by civil engineering standards. Bridge designs are limited to proven patterns. You can't spring something new on the world and expect anyone to pay money for it or trust their life to it.

If you're with me so far, you see my point: software is different. And that makes it particularly hard. People do learn elements of it. How does this happen?

Two Paths Diverge

I see two separate paths:

  • More formal, and
  • Less formal.
The more formal path includes the kind of curriculum you find at big CS schools. Formal treatment of algorithms and data structures. Logic and Computable Functions. The essentials of Turing Completeness.


The less formal path starts with -- essentially -- random hacking around, trying to get stuff to work. Some folks argue that a curriculum of structured exercises isn't "random" hacking around. I suggest that a curriculum of structured exercises can be the formal path concealed under a patina of hackeriness. On the other hand, a set of exercises can be successful at training programmers; if it doesn't follow a formalized structure, it's merely a small step from random. 

[Random doesn't mean "bad;" it means "informal" and "unstructured."]

Some folks learn well in a formal, structured approach. They like axiomatic definitions of computability, and they can get a grip on how to map the abstractions of computing to specific languages and problem domains. They read content at http://www.algorist.com and see applications of principles.

Other folks can be shown the formal background that makes their random hacking fit into a larger pattern. When shown how some things fit a larger pattern, they're often happy to work in a new context with an expanded repertoire of data structures and algorithms. They read content at http://www.algorist.com and look for solutions to problems; the formal patterns will emerge eventually.

Not all folks respond well to having their informal notions challenged. Some folks have ingrained bad habits and prefer to fight to the death to avoid change. A sad state of affairs, but remarkably common. They didn't understand linked lists at some point and steadfastly refuse to use the java.util.LinkedList class. This is what software religious wars are about. Some trolls truly and deeply love an uninformed religious war.

Chickens and Eggs

Is this a chicken-and-egg problem? 
  • You can't really appreciate the formal foundations until you have some hands-on coding experience.
  • You shouldn't dirty your hands with implementation details until you have the proper theoretical foundations.
That seems potentially reductionist and uninformative. Or. Perhaps there is a nugget of truth in this. Perhaps one is actually foundational.

Eggs, to be specific, show the fresh mutations. The egg comes first from a chicken-like precursor that's not properly a chicken. 

What's that precursor to programming in Python? CS Fundamentals? Hacking around? I suggest that the way we acquire languages is important here.

Language Skills

Software languages are a small step from natural languages. As with learning natural languages, formal grammar may not be as helpful as engaging in conversations. Indeed, for natural languages, formal grammars are an afterthought. They're something we discover about a corpus. We impose the discovered grammar rules on ourselves (and others) to be understood in a context of other writing (and speaking.) 

Natural language grammar isn't timeless and immutable. People throw their hands up in despair at the erosion of grammar and language. They're -- of course -- just being reactionary. Language evolves. The loudest complainers are the ones who didn't pay attention for a long time and suddenly (somehow) realized they don't know what "WTF" means. LOL.

With an artificial language, the grammar is formalized. It has a first-class existence in compilers, interpreters and other tools. 

However, I think the bits of our brain that assimilate grammar work best from concrete examples. A formal grammar definition -- while helpful -- isn't the way to start. I think that a less formal, "try this" suite of exercises is perhaps the best way to learn to program.

As an author, I'm beholden to my publisher's notions of what sells. Examples sell. See almost everything from Packt. Working examples are solid gold. 

These are not necessarily problems for the reader to tackle and solve. They're examples to study.

The conundrum with attempting to solve problems is the attempting part. It's hard to set out a list of "solve these problems and master programming" problems and hope folks get through them. What if they fail? Clearly, you'd provide answers. In that case, you'd be back at examples to study. Hmm.

I have intermittent interest in my older Building Skills in Python book. Partly because it's got extensive exercises in each chapter. I get donations. I get inquiries. The exercises seem to resonate in a small way.

I've done about 22 levels of the Python Challenge (I'll write about that separately.) It's not a great way to learn from scratch. You need to know a lot. And you need a lot of hints. 

I've done almost 70 levels of Project Euler. It might be a better way to learn programming because the easy problems are really easy. No guesswork. No riddles. No steganography. The answers are totally cut-and-dried, unambiguous, and absolute. However, there's no easy guidance for learners. Either you have an answer, and want help on improving it, or ... well ... you're stuck and frustrated. 

Structured Sequence of Exercises

What strikes me as a possibility here is a structured series of exercises that lay out the foundations of computer science as realized in a specific programming language.

Puzzle-style. With extensive hints. Background readings, too. But with absolutely right answers. And a score-keeping system to show where you stand. 

No tricky riddles. No quizzes to proceed. You could go on to advanced material without mastering the foundations, if you wanted.

I've got a bunch of exercises and examples in my Building Skills books. Plus some of the examples in my Packt books can be modified and repurposed. Plus. Projects like HamCalc contain a wealth of simple applications that can be adjusted to show CS fundamentals.

Perhaps relevant is this: https://www.google.com/edu/programs/exploring-computational-thinking/.   I'm not sure precisely how it fits, since it seems to be more aimed at providing a general background, rather than teaching programming language skills. They decompose the skills into four specific techniques:
  • Decomposition: Breaking a task or problem into steps or parts.
  • Pattern Recognition: Make predictions and models to test.
  • Pattern Generalization and Abstraction: Discover the laws, or principles that cause these patterns.
  • Algorithm Design: Develop the instructions to solve similar problems and repeat the process.
Perhaps this is relevant: http://interactivepython.org/courselib/static/pythonds/index.html.  I haven't read this carefully, but it seems to be expository rather than exploratory.  It's really thorough. It has quizzes and self-checks. 

I think there's a big space for publishing lots of simple recreational programming exercises as teaching tools.

Thursday, December 11, 2014

Wow. Two-Word Question. Profound Insight.

I'm working on yet another Python book. This one looks at functional programming in Python. It doesn't really go with Mastering Object-Oriented Python and Python for Secret Agents because the focus isn't on Python's strong suit.

In chapter one, a reviewer had this two-word question:

"yield from?"

What? What does "yield from" mean?

Oh.

Wow.

https://docs.python.org/3/whatsnew/3.3.html#pep-380-syntax-for-delegating-to-a-subgenerator

I had utterly missed this profound, important feature.

I guess I have been too blasé in skimming the release notes.

That's embarrassing.  And it only took two words to reveal my mistake.

I had to then review all 113 yield statements in 72 files of examples that go with the book.  That means most chapters will get touched to revise an example to show "yield from iter" instead of the older "for x in iter: yield x" template.
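A minimal sketch of the two forms side by side. The function names and sample data are made up for illustration; they are not examples from the book.

```python
def flatten_loop(iterables):
    """Older template: re-yield each item with an explicit for loop."""
    for iterable in iterables:
        for x in iterable:
            yield x


def flatten_delegate(iterables):
    """Python 3.3+ form: delegate to each sub-iterable with yield from."""
    for iterable in iterables:
        yield from iterable


data = [[1, 2], (3,), range(4, 6)]
# Both generators produce the same sequence of items.
assert list(flatten_loop(data)) == list(flatten_delegate(data))
print(list(flatten_delegate(data)))
```

For simple iteration the two are interchangeable. The delegating form also passes send() and throw() through to a subgenerator, which the explicit loop does not.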

This also changes the Tail Call Optimization material. The explicit for was actually kind of nice for showing how TCO is implemented in Python. The yield from makes it a little less clear.

Some reviewers consider TCO so fundamental that it belongs in chapter 1. The omission of detailed analysis of Python's TCO approach was considered a significant flaw. Other reviewers seemed happy setting discussion of TCO aside for later.

The Functional Python Conundrum

This book is going to be difficult. The ratings from the reviewers were low. Really low. It looks like I've got a lot of work to do. Finding the target audience will be difficult.

One reviewer asked -- in effect -- why would someone who knew functional LISP ever use Python? I don't think there's a big audience of disgruntled LISP programmers, so that's not a relevant question.

Viewed from the other direction, it's hugely important. Why would a Python programmer adopt functional design patterns? That's the question that needs to be answered clearly.

And from the reviews of chapter 1, it wasn't addressed clearly enough.

Thursday, December 4, 2014

Architectural Principles, Spring Framework, and Jersey JAX-RS

See this: http://www.moschetti.org

Attended a meeting with Buzz. Not stated in his blog (in an obvious way) was something he said about not being a fan of big frameworks. I didn't write down his punchline, but it was a pretty pithy summary of the framework tradeoff.

IIRC, it was essentially this: you can wrestle with one or both of these technical problems.
  • Boilerplate Code
  • A Framework's Conceptual Model
Either you have to create your own libraries or you have to learn someone else's. This is in addition to wrestling with the business problem you're supposed to be solving.

Buzz's point seemed to be that you can often manage your own boilerplate more easily than you can come to grips with a framework. If one member of your sprint team handles reusable services, you can just ask them for a feature. You don't have to spend an hour reading other people's struggles.

After spending three months getting my brain wrapped around Spring Framework, I'm inclined toward partial, qualified agreement. Frameworks seem to have limited value until you're an expert in using them.

Layers and Layers

When wrestling with a new feature, you are forced to assume that you've understood its semantics. When you mock a framework element for test purposes, you're reduced to hoping that your unit tests are sufficient. A unit test of a mocked framework element only tests your assumptions. If you're not using the element's API correctly, your tests can't show that the framework will break or raise exceptions.
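The same trap is easy to show in Python with unittest.mock. This is only an analogy to the Java situation: FrameworkClient, fetch_status() and misuse_detected() are hypothetical names invented for the sketch, not part of any real framework.

```python
from unittest import mock


class FrameworkClient:
    """Hypothetical stand-in for a framework element; not a real API."""

    def get(self, path, headers=None):
        return {"path": path, "headers": headers or {}}


def fetch_status(client):
    # Deliberate bug: the keyword should be `headers`, not `header`.
    return client.get("/status", header={"Accept": "application/json"})


def misuse_detected(client):
    """Return True if calling fetch_status() with this client raises TypeError."""
    try:
        fetch_status(client)
    except TypeError:
        return True
    return False


loose_mock = mock.Mock()  # accepts any call with any arguments
strict_mock = mock.create_autospec(FrameworkClient, instance=True)

print(misuse_detected(loose_mock))         # the plain mock hides the bug
print(misuse_detected(strict_mock))        # the spec'd mock checks the signature
print(misuse_detected(FrameworkClient()))  # the real object rejects the call
```

A spec'd mock catches a wrong signature, but it still knows nothing about what the real element does with a correct call. That part only comes from running against the real thing.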

For new technology, you need to start with a technical spike to understand the framework. Then you can write unit tests that test against known framework behavior. Then you can write the real code that's based on the unit tests that are based on a spike that shows how the framework really works.

Using a technical spike for discovery and debugging can be challenging. You don't want to drag around your entire application just to create a spike. But you don't want to drop back to a trivial "hello world" spike that doesn't really apply to your context. You have to balance simplicity against realism.

For example, making JAX-RS requests to web services is aggravating to debug. You can spend many hours looking at boilerplate 401 and 404 errors wondering what's missing. You can't write the unit tests until you finally get something to work. Once you have something, you can replace real objects with mock objects.

If you already know JAX-RS features, it's easy. If you already know the RESTful service, it's not too bad. If you know neither JAX-RS nor the service, you don't have any clue which direction to turn. Did I misuse JAX-RS? Is something wrong in the request? Am I missing a required header? Did I leave something off the Accept header?

I finally had to give up creating spikes and debugging RESTful requests in Java. It turned out to be simpler to write a version of the REST client in Python. I used this to figure out how the real service really worked. Given a working Python spike, I could then save those interactions for WireMock.

Once I had a clue how the service worked, I could also write a mock server for some more sophisticated experiments.  This was useful for debugging problems based on a failure to understand JAX-RS.
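A sketch of what a Python spike plus a throwaway mock server can look like, using only the standard library. This is an illustration, not the original spike: MockService, probe(), run_spike(), the /widgets/42 path, and the rule that the service answers 406 without a JSON Accept header are all assumptions made up for the example.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MockService(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real service: demands a JSON Accept header."""

    def do_GET(self):
        if "application/json" not in self.headers.get("Accept", ""):
            self.send_error(406, "Accept header must allow application/json")
            return
        body = json.dumps({"path": self.path, "ok": True}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the spike's output quiet


def probe(url, headers):
    """Make one GET request; return (status code, decoded body text)."""
    # An empty ProxyHandler keeps environment proxy settings out of the way.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(request, timeout=5) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        return error.code, error.read().decode("utf-8")


def run_spike():
    """Start the mock server, try two Accept headers, and return both results."""
    server = HTTPServer(("127.0.0.1", 0), MockService)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:{0}/widgets/42".format(server.server_port)
    try:
        return (probe(url, {"Accept": "text/html"}),
                probe(url, {"Accept": "application/json"}))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for status, text in run_spike():
        print(status, text[:60])
```

Pointing probe() at the real service instead of the mock answers the "is it my request or my framework?" question at the >>> prompt, one header at a time.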

Yes. Rather than struggle with the framework, I wrote the client once in Python and then rewrote the client again in Java. It seemed quicker than trying to debug it in Java.

One contributing factor is the 1m 30s build time in Maven. Compare that with interactive  Python at the >>> prompt.

Perhaps a smaller framework would have been better.

Thursday, November 20, 2014

MongoDB and Schema Validation

One part of the MongoDB value proposition is being freed from the constraints of a database schema.

There's a "baby and bathwater" issue here. While a schema can become a low-value constraint, we have to be careful about throwing out the baby when we throw out the bathwater. A schema isn't inherently evil. A schema that's hard to modify can become more cost than benefit.

When working with document databases like MongoDB or CouchDB, we're freed from the constraints of a schema.

But.

Do we really want the kind of freedom that can devolve to anarchy?

Or.

Do we want some kind of constraint checking capability to provide some additional run-time assurance that the applications are using the database properly?

Read this http://realprogrammer.wordpress.com/tag/json-schema/ and this http://www.litixsoft.de/english/mms-json-schema/.

My thesis is that some schema validation may have some value.

My plan is this.

1. Define the essential collections for the various documents using ordinary document design practices.

2. For each document class, we'll have two closely associated collections:

  • The primary collection, call it "class" because it matches one of the application classes.
  • An additional "class.schema" collection. This collection will contain JSON-schema documents. See http://json-schema.org for more information.
  • For audit, and sequential key generation, we may have some additional associated collections.
Because JSON schema documents have a "$schema" field, we can replace the "$" with "\uFF04", the "FULLWIDTH DOLLAR SIGN" character, when saving the JSON-schema document into a MongoDB database. We can do the inverse operation when finding the schema documents in the database.

3. Use a tool like https://github.com/Julian/jsonschema to validate documents against the schema. The document-level validation could be embedded in the application for each transaction. However, it seems better to trust the code and the unit testing of the code to enforce schema rules. We'd use this validation periodically to check the stored documents. Significant events should include a validation pass: for example, before and after any schema changes. This way we can be sure that things are continuing to go properly.

It would be strictly an additional layer of checking.
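The "$" swap from step 2 can be sketched in a few lines. This is one possible approach, not a tested MongoDB integration: the function names are hypothetical, and rewriting every key that starts with "$" (so "$ref" as well as "$schema") is an assumption beyond what the plan above spells out.

```python
FULLWIDTH_DOLLAR = "\uFF04"  # FULLWIDTH DOLLAR SIGN


def _swap_prefix(value, old, new):
    """Recursively rewrite dict keys that start with `old` to start with `new`."""
    if isinstance(value, dict):
        return {
            (new + key[len(old):] if key.startswith(old) else key):
                _swap_prefix(item, old, new)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [_swap_prefix(item, old, new) for item in value]
    return value


def to_mongo(schema):
    """Escape a leading "$" in keys before saving a JSON-schema document."""
    return _swap_prefix(schema, "$", FULLWIDTH_DOLLAR)


def from_mongo(document):
    """Restore the leading "$" in keys after loading a schema document."""
    return _swap_prefix(document, FULLWIDTH_DOLLAR, "$")


schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "owner": {"$ref": "#/definitions/person"},
    },
}
stored = to_mongo(schema)
print(sorted(stored))
```

Only keys are touched. Values such as "#/definitions/person" are left alone, and from_mongo(to_mongo(schema)) gives back the original schema, ready to hand to a validator.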

Thursday, November 13, 2014

Declarative Programming

I know that some folks swear by declarative programming. They like the ideas behind ant (and make) and SCons and related examples.

You can google for "ant v. maven v. gradle" where people gripe about which is more declarative. The point of the whining being that more declarative == good and any traces of procedural or imperative programming == bad.

All, of course, without any really good justification of why declarative is better. It's assumed that declarative simply has innumerable advantages. And yes, I've started with http://en.wikipedia.org/wiki/Declarative_programming. The issue isn't simply moot; the justification is weak.

Perhaps there's an awful bias toward imperative and functional programming. After all, the big thinkers in computer science tend to favor the imperative and functional schools of thought. Maybe declarative suffers from some bias.

Or maybe declarative has limited utility.

There. I said it. Limited utility.

I think a functional approach might be better, faster and simpler.

Side-bar Ranting

The code is below. You can skip down to the "The Functional Build System" section and not miss much.

Declarative programming seems applicable to the cases where the ordering of operations can be easily deduced. It seems like the significant value of declarative programming is to rely on an optimizing compiler to rearrange the declarations into properly-ordered imperative steps. From this viewpoint, it seems like ant/maven/gradle are optimizers that look at the dependencies among transformation functions and then apply the functions in the proper order.

It seems like we're writing expressions like these:

x.class = java(x.java)
xyz.jar = jar(x.class, y.class, z.class, ... )
app.war = war(xyz.jar, abc.jar, ... )

and then turning them over to a clever compiler (like Haskell) to work out a total order among the expressions that will build the right thing for us.
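Working out that total order is a topological sort of the dependency graph. Here's a minimal sketch using graphlib from the standard library (Python 3.9 and later, so newer than this post). The file names are the placeholders from the expressions above; build_order() is a hypothetical helper, and no claim is made that ant, maven or gradle work this way internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each target maps to the set of things it is built from.
dependencies = {
    "x.class": {"x.java"},
    "y.class": {"y.java"},
    "z.class": {"z.java"},
    "xyz.jar": {"x.class", "y.class", "z.class"},
    "app.war": {"xyz.jar", "abc.jar"},
}


def build_order(graph):
    """Return one valid ordering in which every source precedes its targets."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(dependencies)
print(order)
```

Several orderings are valid, so the exact list can vary. The only guarantee is that each file appears after everything it depends on.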

There's a potential difference between manually structuring a script to get all of the steps in order and allowing the compiler to arrange things properly based on some formal semantics behind each expression.

It's a potential difference because most folks that deal with ant/maven/gradle tend to put things in more-or-less the right order so that others can figure out what the hell is going on. In the trivial cases where we're building simple web sites, the default rules have evolved to the point where they work in almost all cases, so we don't even look at the configuration of the tools. We hit Ctrl+B knowing that it's all set up properly.

Some Requirements

A number of applications have ant-like (or make-like) aspects but don't really cry out for ant with customized actions. We might be doing data warehouse loads which involve an ant-like sequence of processing steps to do transformations, loads, and produce final summaries and confirmations. We can, of course, write this all in first-class Java code. The hard way.

It's not terribly complex. A class to define a dependency. A suite of plug-in strategies. Some static definitions of the actual rules. Been there. Done that.

Pragmatically, the declarative style suffers from a limitation of being rather rigid in applying a fixed set of rules. A more script-like implementation can be more helpful to support reruns, debugging, problem-solving and the inevitable special cases and exceptions. After a storage failure -- and the reruns required to get the warehouse back up-to-date -- one sees more need for script-like flexibility and less need for overly simplistic rigidity.

Another end of the spectrum is individual steps all manually coordinated with a tool like BMC's Control-M. This requires endless manual intervention to make sure all the various tasks are defined properly in Control-M.

Somewhere near the middle is a configurable application with some processing rules to give it flexibility. But some defined structure to remove the need for carefully planned manual intervention and deep expertise.

The Functional Build System

We can imagine an ant-like build system defined functionally.

The core is a function that implements build-if-needed rules:

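def build_if_needed( builder, target_file, *source ):
    """Run builder only when target_file is out of date; return an audit string."""
    if target_ok( target_file, *source ):
        return "ok({0},...)".format(target_file)
    builder( target_file, *source )
    # Plain functions have a __name__; partial objects fall back to the class name.
    name = getattr( builder, "__name__", builder.__class__.__name__ )
    return "{0}({1},...)".format(name, target_file)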


We can use this function to define the essential dependency: use a builder function to create some target if it's out-of-date with respect to the sources. The return value forms a kind of audit log.

This relies on some helper functions: target_ok() checks the modification times of files. The various builders do the various kinds of operations required to make a target from the sources.

Here's the target_ok() function:

import datetime
import logging
import os

def target_ok( target_file, *source_list, logger=logging ):
    """True if target_file exists and is newer than every source."""
    try:
        mtime_target= datetime.datetime.fromtimestamp(
            os.path.getmtime( target_file ) )
    except OSError:
        # A missing target always needs to be built.
        return False
    # If a source doesn't exist, we throw an exception.
    times = (datetime.datetime.fromtimestamp(
            os.path.getmtime( source ) ) for source in source_list)
    return all(mtime_target > mtime_source for mtime_source in times)


I think this function is what started me thinking about a functional approach. It could be a method of a class. But. It seems like a very functional design. It could be reduced to a single (long) expression.

The builders are composite functions. They need to combine the subprocess.check_call() with a function that builds the command. We can do functional composition several ways in Python: we can combine functions via decorators. We can also combine functions via Callables. We could write a higher-order function that combines the check_call() with a function to create the command.

We'll opt for the higher-order function and create partially evaluated forms using functools.partial().

Here's a typical case:


import subprocess

def subprocess_builder( make_command, target_file, *source_list ):
    # make_command returns the argument list; check_call raises on a non-zero exit.
    command= make_command( target_file, *source_list )
    subprocess.check_call( command )


This is a generic function: it requires a function (or lambda) to build the actual command. We might do something like this to create a specific builder.


from functools import partial

def command_rst2html( output, *input ):
    return ["rst2html.py", "--syntax-highlight=long", "--input-encoding=utf-8", input[0], output]

rst2html= partial( subprocess_builder, command_rst2html )


This rst2html() function can be used to define a dependency rule. We might have something like this:


    files_txt = glob.glob( "*.txt" )
    for f in files_txt:
        build_if_needed( rst2html, ext_to(f,'.html'), f )


This rule specifies that *.html files depend on *.txt files; when a txt file is newer than its html file (or the html file is missing), the rst2html() function builds the required html file.

The ext_to() function is a two-liner that changes the extension on a filename. This helps us write "template" build rules rather than exhaustively enumerating a large number of similar files.


def ext_to( filename, new_ext ):
    name, ext = os.path.splitext( filename )
    return name + new_ext


What we've done here is define a few generic functions that form the basis for a functional build system that can compete against ant, make or scons. The system is not even close to declarative. However, we only need to assure that our final build_if_needed() functions have a sensible ordering, something that's rarely a towering intellectual burden.

The individual customizations are the build commands like rst2html() where we created the command-line list of strings for subprocess.check_call(). We can just as easily build functions which run entirely in the process or functions which farm the work out to separate processes via queues or RESTful web services.
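For instance, a builder that runs entirely in the process might look like the sketch below. The target_ok() and build_if_needed() here are simplified restatements so the example runs on its own; concatenate() and demo() are hypothetical names invented for the illustration.

```python
import os
import tempfile


def target_ok(target_file, *source_list):
    """True if the target exists and is newer than every source."""
    try:
        mtime_target = os.path.getmtime(target_file)
    except OSError:
        return False
    return all(mtime_target > os.path.getmtime(s) for s in source_list)


def build_if_needed(builder, target_file, *source):
    """Run builder only when the target is missing or out of date."""
    if target_ok(target_file, *source):
        return "ok({0},...)".format(target_file)
    builder(target_file, *source)
    return "{0}({1},...)".format(builder.__name__, target_file)


def concatenate(target_file, *source_list):
    """Hypothetical in-process builder: join the sources into the target."""
    with open(target_file, "w", encoding="utf-8") as target:
        for source in source_list:
            with open(source, encoding="utf-8") as text:
                target.write(text.read())


def demo():
    """Build once, then confirm that the second call is skipped."""
    with tempfile.TemporaryDirectory() as work:
        sources = []
        for name, content in (("a.txt", "alpha\n"), ("b.txt", "beta\n")):
            path = os.path.join(work, name)
            with open(path, "w", encoding="utf-8") as f:
                f.write(content)
            # Backdate the sources so the target is strictly newer.
            os.utime(path, (1_000_000_000, 1_000_000_000))
            sources.append(path)
        target = os.path.join(work, "all.txt")
        first = build_if_needed(concatenate, target, *sources)
        second = build_if_needed(concatenate, target, *sources)
        with open(target, encoding="utf-8") as f:
            return first, second, f.read()


if __name__ == "__main__":
    print(demo())
```

The first call reports the builder's name in the audit string and the second reports ok(...), because the target is now newer than both sources.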

Bottom Lines

It appears that declarative programming isn't terribly helpful. There may be a niche, but it seems to be a small niche to me.

I'm sure that an object-oriented approach to this problem isn't any better. I've written a shabby-make version of this, and it's bigger. There's just more code and it's not significantly more clear what's going on. Inheritance can be difficult to suss out.

Python seems to be a good functional programming language. It did this very nicely.