Tuesday, September 1, 2015

Audio Synth in Python 3.4, Part II

See Audio Synth.

At first, I imagined the problem was going to be PyAudio. This package has a bunch of installers. But the installers don't recognize Python 3.4, so none of them work for me. The common fallback plan is to install from source, but I couldn't find the source. That looks like a problem.

Once I spotted this: "% git clone http://people.csail.mit.edu/hubert/git/pyaudio.git", things were much better. I built the PortAudio library. I installed PyAudio for Python 3.4. Things are working. Noises are happening.
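For the record, the smoke test that produced the noises was something like this. A minimal sketch, not production code; the tone frequency, sample rate, and one-second duration are all arbitrary choices of mine:

import math
import struct
import pyaudio

RATE = 44100      # samples per second
FREQ = 440.0      # concert A; any audible frequency works

# One second of a sine wave, packed as 16-bit signed little-endian samples.
frames = b''.join(
    struct.pack('<h', int(32767 * math.sin(2 * math.pi * FREQ * t / RATE)))
    for t in range(RATE)
)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, output=True)
stream.write(frames)
stream.stop_stream()
stream.close()
p.terminate()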

Next step is actual synth.

In the past, I have played with pysynth because it has some examples of wave-table additive synth. That's very handy. The examples are hard to follow, though, because a lot of distinct synth ideas are conflated into a few small functions.

Complication: The pysynth package is Python2. It lacks even the simple from __future__ import print_function to make it attempt Python3 compatibility.

The pysynth.play_wav module could be a handy wrapper around various audio playback technologies, including pyaudio. It has to be tweaked, however, to make it work with Python 3.4. I really need to clone the project, make the changes, and put in a pull request.

The pysynth.pysynth and pysynth.pysynth_beeper modules are helpful for seeing how wave tables work.  How much rework to make these work with Python3.4? And how much reverse engineering to understand the math?
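The core wave-table idea is compact enough to sketch independently. This is my own reconstruction of the technique, not pysynth's actual code: precompute one cycle of a waveform additively, then step through the table at different rates to get different pitches. The harmonic/amplitude pairs are arbitrary illustrative choices.

import math

TABLE_SIZE = 1024

# One cycle of a waveform, summed from a few harmonics.
wave_table = [
    sum(amp * math.sin(2 * math.pi * n * i / TABLE_SIZE)
        for n, amp in [(1, 1.0), (2, 0.5), (3, 0.25)])
    for i in range(TABLE_SIZE)
]

def tone(freq, rate=44100, seconds=1.0):
    """Yield samples by stepping through the table at a pitch-dependent rate."""
    step = freq * TABLE_SIZE / rate
    phase = 0.0
    for _ in range(int(rate * seconds)):
        yield wave_table[int(phase) % TABLE_SIZE]
        phase += step

The additive math all happens once, when building the table; playback is pure table lookup. That split is exactly what makes the technique attractive on small processors.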

I've since found pyo; see the AjaxSoundStudio pages for details. This may be a better example of wave tables. But it's also Python 2. More investigation to follow.

The good news is that there's some forward motion.

Tuesday, August 25, 2015

Visual studio and Python

Why write Python in Visual Studio?

That's what I want to know, too.

IntelliSense? ActiveState Komodo does this. And it does it very well considering the potential complexity of trying to determine what identifiers are possibly valid in a dynamic language.

Debugger? No thanks -- I haven't needed one yet. [I should probably blog on the perils of debuggers.]

Project Management? GitHub seems to be it. Some IDE integration might be helpful, but the three common command-line operations -- git pull, git commit, and git push -- seem to cover an awful lot of bases.

I've been asked about Python IDEs -- more than once -- and my answer remains the same:
The IDE Doesn't Matter. 

One of the more shocking tech decisions I've seen is the development manager who bragged on the benefits of VB. The entire benefit was this: Visual Studio made the otherwise awful VB language acceptable.

The Visual Studio IDE was great. And it made up for the awful language.

Seriously.

The development manager went on to claim that until Eclipse had all the features of Visual Studio, they were sure that Java was not usable. To them, the IDE was the only decision criterion. As though code somehow doesn't have a long tail of support, analysis, and reverse engineering.

Tuesday, August 18, 2015

Audio Synth [Updated]

I learned about synthesizers in the '70s using a Moog analog device. Epic coolness.

Nowadays, everything is digital. We use wave tables and (relatively) simple additive synth techniques.

I made the mistake of reading about Arduino wave table synthesis:

http://learning.codasign.com/index.php?title=Wavetable_Synthesis

http://makezine.com/projects/make-35/advanced-arduino-sound-synthesis/

http://playground.arduino.cc/Main/ArduinoSynth

The idea of an Arduino alarm that uses a chime instead of a harsh buzz is exciting. The tough part about this is building the wave tables.

What a perfect place to use Python: we can build wave tables that can be pushed down to the Arduino. And test them in the Python world to adjust the frequency spectrum and the complex envelope issues around the various partials.

See http://computermusicresource.com/Simple.bell.tutorial.html
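In the spirit of that tutorial, a bell boils down to a handful of inharmonic partials, each with its own amplitude and decay. Here's a rough sketch of the idea; the frequency ratios and decay constants below are illustrative guesses, not the tutorial's exact figures:

import math

# (frequency ratio, relative amplitude, decay rate) for each partial.
PARTIALS = [(0.56, 1.0, 1.2), (0.92, 0.8, 1.0),
            (1.19, 0.6, 0.9), (1.71, 0.4, 0.7)]

def bell(base_freq=440.0, seconds=2.0, rate=8000):
    """One bell strike: a sum of partials, each decaying exponentially."""
    for i in range(int(rate * seconds)):
        t = i / rate
        yield sum(
            amp * math.exp(-decay * t)
            * math.sin(2 * math.pi * ratio * base_freq * t)
            for ratio, amp, decay in PARTIALS
        )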

Except.

Python3.4 doesn't have PyAudio support.

Yet.

Sigh. Before I can work with Arduino wave tables, I'll have to start by figuring out how to build PyAudio for Python 3.4 on Mac OS X.

Look here: http://people.csail.mit.edu/hubert/git/pyaudio.git for the code.

Look here for the secret to building this on Mac OS X: https://stackoverflow.com/questions/2893193/building-portaudio-and-pyaudio-on-mac-running-snow-leopard-arch-issues/2906040#2906040.

Summary.

  1. Get pyaudio source.
  2. Inside pyaudio, create a portaudio-v19 directory. Get the portaudio source and put it there.
  3. Inside pyaudio/portaudio-v19, do ./configure; make; sudo make install.
  4. Inside pyaudio, do python3.4 setup.py install --static-link
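After that, a quick check that the install took (assuming the build went cleanly):

import pyaudio
print(pyaudio.get_portaudio_version_text())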

Tuesday, August 4, 2015

Mocking and Unit Testing and Test-Driven Development

Mocking is essential to unit testing.

However.

It's also annoyingly difficult to get right.

If we aren't 100% perfectly clear on what we're mocking, we will merely canonize any dumb assumptions into mock objects that don't really work. They work in the sense that they don't crash, but they don't properly test the application objects since they repeat some (bad) assumptions.

When there are doubts, it seems like we have to proceed cautiously. And act like we're breaking some of the test-first test-driven-development rules.

Note. We're not really breaking the rules. Some folks, however, will argue that test-driven development means literally every action you take should be driven by tests. Does this include morning coffee or rotating your monitor into portrait mode? Clearly not. What about technical spikes?

Our position is this.
  1. Set a spike early and often. 
  2. Once you have reason to believe that this crazy thing might work, you can formalize the spike with tests. And mock objects.
  3. Now you can write the rest of the app by creating tests and fitting code around those tests.
The important part here is not to create mocks until you really understand what you're doing.

Book Examples

Now comes the tricky part: Writing a book.

Clearly every example must have a unit test of some kind. I use doctest heavily for this. Each example is in a doctest test string.

The code for a chapter might look like this.


test_hello_world = '''
>>> print('hello world')
hello world
'''

__test__ = { n:v for n,v in vars().items() 
    if n.startswith('test_') }

if __name__ == '__main__':
    import doctest
    doctest.testmod()

We've used the doctest feature that looks for a dictionary assigned to a variable named __test__. The values from this dictionary are tests that get run as if they were docstrings found inside modules, functions, or classes.

This is delightfully simple. Expostulate. Exemplify. Copy and paste the example into a script for test purposes, and exhibit it in the text.

Until we get to external services. And RESTful API requests, and the like. These are right awkward to mock. Mostly because a mocked unittest is singularly uninformative.

Let's say we're writing about making a RESTful API request to http://www.data.gov. The results of the request are very interesting. The mechanics of making the request are an important example of how REST API's work. And how CKAN-powered web sites work in general.

But if we replace urllib.request with a mock urllib, the unit test amounts to a check that we called urlopen() with the proper parameters. Important for a lot of practical software development, but also uninformative for folks who download the code associated with the book.

It appears that I have four options:

  1. Grin and bear it. Not all examples have to be wonderfully detailed.
  2. Stick with the spike version. Don't mock things. The results may vary and some of the tests might fail on the editor's desktop.
  3. Skip the test.
  4. Write multiple versions of the test: a "with real internet" version and a "with corporate firewall proxy blockers in place" version that uses mocks and works everywhere.
So far, I've leveraged the first three heavily. The fourth is awkward. We wind up with code like this:

import unittest
from unittest.mock import MagicMock, patch

class Test_get_whois(unittest.TestCase):
    def test_should_get_subprocess(self):
        # Stand in for the subprocess module before the unit under test imports it.
        subprocess = MagicMock()
        subprocess.check_output.return_value = b'\nwords\n'
        with patch.dict('sys.modules', subprocess=subprocess):
            from ch_2_ex_4 import get_whois
            result = get_whois('1.2.3.4')
        self.assertEqual(result, ['', 'words'])
        subprocess.check_output.assert_called_with(['whois', '1.2.3.4'])


This is not a lot of code for enterprise software development purposes. It's a bit weak, in fact, since it only tests the Happy Path.

But for a book example, it seems to be heavy on the mock module and light on the subject of interest.
Indeed, I defy anyone to figure out what the expository value of this is, since it has only 2 lines of relevant code wrapped in 8 lines of boilerplate required to mock a module successfully.

I'm not unhappy with the unittest.mock module in any way. It's great for mocking modules; I think the boilerplate is acceptable considering what kind of a wrenching change we're making to the runtime environment for the unit under test.

This fails at explication.

I'm waffling over how to handle some of these more complex test cases. In the past, I've skipped cases, and used the doctest Ellipsis feature to work through variant outputs. I think I'll continue to do that, since the mocking code seems to be less helpful for the readers, and too focused on the purely technical need of proving that all the code is perfectly correct.
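For reference, the Ellipsis feature looks like this. A minimal sketch of the directive, not an example lifted from the book; with it enabled, the ... in the expected output matches whatever varying detail the live service returns:

test_whois_live = '''
>>> import subprocess
>>> subprocess.check_output(['whois', '1.2.3.4'])  # doctest: +ELLIPSIS
b'...'
'''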

Tuesday, July 28, 2015

Amazon Reviews

Step 1. Go to amazon.com and look for one (or more) of my Python books.

Step 2. Have you read it?

  •     Yes: Thanks! Consider posting a review.
  •     No: Hmmm.
That's all. Consider doing this for other authors, also. 

Social media is its own weird economy. The currency seems to be evidence of eyeballs landing on content.

Tuesday, July 21, 2015

A Surprising Confusion

Well, it was surprising to me.

And it should not have been a surprise.

This is something I need to recognize as a standard confusion. And rewrite some training material to better address this.

The question comes up when SQL hackers move beyond simple queries and canned desktop tools into "Big Data" analytics. The pure SQL lifestyle (using spreadsheets, or Business Objects, or SAS) leads to an understanding of data that's biased toward working with collections in an autonomous way.

Outside the SELECT clause, everything's a group or a set or some kind of collection. Even in spreadsheet world, a lot of Big Data folks slap summary expressions on the top of a column to show a sum or a count without too much worry or concern.

But when they start wrestling with Python for loops, and the atomic elements which comprise a set (or list or dict), then there's a bit of confusion that creeps in.

An important skill is building a list (or set or dict) from atomic elements. We'll often have code that looks like this:

some_list = []
for source_object in some_source_of_objects:
    if some_filter(source_object):
        useful_object = transform(source_object)
        some_list.append(useful_object)

This is, of course, simply a list comprehension. In some cases, we might have a process that breaks one of the rules of using a generator and doesn't work out perfectly cleanly as a comprehension. That's a somewhat more advanced topic.

The transformation step is what seems to cause confusion. Or -- more properly -- it's the disconnect between the transformation calculations on atomic items and the group-level processing that accumulates a collection from individual items.

The use of some_list.append() and some_list[index] and some_list is something that folks can't -- trivially -- articulate. The course material isn't clarifying this for them. And (worse) leaping into list comprehensions doesn't seem to help.

These are particularly difficult to explain if the long version isn't clear.

some_list = [
    transform(source_object)
    for source_object in some_source_of_objects
    if some_filter(source_object)
]

and

some_list = list(map(transform, filter(some_filter, some_source_of_objects)))

I'm going to have to build some revised course material that zeroes in on the atomic vs. collection concepts. What we do with an item (singular) and what we do with a list of items (plural).

I've been working with highly experienced programmers too long. I've lost sight of the n00b questions.

The goal is to get to the small data map-reduce. We have some folks who can make big data work, but the big data Hadoop architecture isn't ideal for all problems. We have to help people distinguish between big data and small data, and switch gears when appropriate. Since Python does both very nicely, we think we can build enough education to school up business experts to also make a more informed technology choice.
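For what it's worth, "small data map-reduce" in Python is the same three pieces yet again. A sketch, with deliberately trivial stand-ins for the filter and the transformation:

from functools import reduce

def some_filter(item):
    """Stand-in: keep the even items."""
    return item % 2 == 0

def transform(item):
    """Stand-in: the per-item calculation."""
    return item * item

source = range(10)

# map/filter handle the atomic items; reduce does the collection-level summary.
total = reduce(lambda a, b: a + b, map(transform, filter(some_filter, source)), 0)
print(total)  # 0 + 4 + 16 + 36 + 64 == 120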


Tuesday, July 14, 2015

Upgrading to Python 3

Folks who don't use Python regularly -- the folks in TechOps, for example -- are acutely aware that the Python 3 language is "different," and the upgrade should be done carefully. They've done their homework, but they're not experts in everything.

They feel the need to introduce Python 3 slowly and cautiously to avoid the remote possibility of breakage. Currently, the Python 3 installers are really careful about avoiding any possible conflicts between Python 2 and 3; tiptoeing isn't really necessary at all.

I was stopped cold from having Python 3 installed on a shared server by someone who insisted that I enumerate which "features" of Python 3 I required. By enumerating the features, they could magically decide whether I had a real need for Python 3 or could muddle along with Python 2. The question made precious little sense for many reasons: (1) many things are backported from 3 to 2, so there's almost nothing that's exclusive to Python 3; (2) both languages are Turing-complete, so any feature in one language could (eventually) be built in the other; (3) I didn't even know languages had "features." The reason they wanted a feature list was to provide a detailed "no" instead of a generic "no." Either way, the answer was "no." And there's no reason for that.

In all cases, we can install Python 3 now. We can start using it now. Right now.

Folks who actually use Python regularly -- me, for example -- are well aware that there's a path to the upgrade. A path that doesn't involve waiting around and slowly adopting Python 3 eventually (where eventually ≈ never).
  1. Go to your enterprise GitHub (and the older enterprise SVN and wherever else you keep code) and check out every single Python module. Add this line: from __future__ import print_function, division, unicode_literals. Fix the print statements. Just that. Touch all the code once. If there's stuff you don't want to touch, perhaps you should delete it from the enterprise GitHub at this time.
  2. Rerun all the unit tests. This isn't as easy as it sounds. Some scripts aren't properly testable and need to be refactored so that the top-level script is made into a function and a separate doctest function (or module) is added. Or use nose. Once you have an essentially testable module, you can add doctests as needed to be sure that any Python 2 division or byte-fiddling works correctly with Python 3 semantics for the operators and literals.
  3. Use your code in this "compatibility" mode for a while to see if anything else breaks. Some math may be wrong. Some use of bytes and Unicode may be wrong. Add any needed doctests. Fix things in Python 2 using the from __future__ as a bridge to Python 3. It's not a conversion. It's a step toward a conversion.
This is the kind of thing that can be started with an enterprise hack day. Make a list of all the projects with Python code. Create a central "All the Codes" GitHub project. Name each Python project as an issue in the "All the Codes" project. Everyone at the hack day can be assigned a project to check out, tweak for compatibility with some of the Python 3 features, and test.

You don't even need to know any Python to participate in this kind of hack day. You're adding a line, and converting print statements to print() functions. You'll see a lot of Python code. You can ask questions of other hackers. At the end of the day, you'll be reasonably skilled.
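The whole edit, in miniature. The module and numbers here are invented, but the shape of the change is exactly this:

# Before, in some Python 2 module:
#     print "mean", total/count

# After: one added line, and print becomes a function.
from __future__ import print_function, division, unicode_literals

total, count = 7, 2
print("mean", total / count)   # 3.5: true division, in Python 2 and Python 3 alike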

Once this is done, the introduction of Python 3 will not be a shock to anyone's system. The print() function will be a thing. Exact division will be well understood. Unicode will be slightly more common.

And -- bonus -- everything will have unit tests. (Some things will be rudimentary place-holders, but that's still a considerable improvement over the prior state.)