Blah, Blah, Blah

  • President’s Day Out

    It hardly counts as travel until I consider how much time I’ve spent out and about in Arkansas since we moved here. Monday I had the day off, and Mattie and I went out to Devil’s Den State Park. I’d read that the park could be crowded, but we hardly saw another soul. We trotted around the 1.7-mile main loop and poked our noses and camera into the limestone formations.

  • Mandolin Cairns

    I picked up playing music late in life. If I have any innate musical ability, it’s buried deep down waiting to be found. In my thirties I started playing electric and string bass. It was fun. I picked up some basic jazz and blues tricks playing with friends and through lessons, but some fundamental facts involving gravity and bass-clef rhythm instruments made practice less than fulfilling.

    Walking by Tejon Street Music on my way to a bass lesson, I noticed a mandolin in the window. How shallow can you get, wanting to play an instrument because of its appearance? But I was tired of lugging amps and a 7/8 string bass, and I was drawn to the arcs of the mandolin’s profile. I quickly learned that the mandolin is tuned the same as a bass, except upside down, and that players contribute to both melody and rhythm parts. Practice could be fun!

    Many hours of practice, lessons, and hacking later, I still haven’t found my talent, but I’ve had fun along the way. Here are some cairns I’ve left to trace my steps for my own reflection and hopefully to speed others’ journey to uncover their talents.

    Models and teachers: It helps to hear different voices and different techniques from musicians at all levels. Masters like Chris Thile and Jethro Burns still wreck my brain when I try to figure out what’s going on. Having help breaking down what David Grisman and Bill Monroe play makes many other licks and songs approachable at higher speeds. Hearing mortals play at farmers’ markets, along with teachers, lessons, and instructional videos, eases acoustic digestion.

    Play with others: Metronomes make dull partners, but finding a patient group of peers will help you play to someone else’s time and expose your ear to other voices on the same song. The Mandolin Orchestra of Northwest Arkansas is a wonderful group of colleagues and teachers. iReal Pro is another tool to help keep time during practice.

    Projects: I’ll be posting some simple projects here that have served as launch points. Sweet Georgia Brown, Black Orpheus, and Hotel California are some landmarks I’d like to explore in future posts. Picking up a fake book is an easy way to explore new tunes and get started.

    Experiment with strings and picks: Mandolins are expensive! String and pick choices offer a lot more bang for the buck than laying out a mortgage payment for a new mandolin. After five years of experimenting with a clunky Breedlove KO (and waiting for Oregon production to end), I think I’ve finally dialed it in with Thomastik strings and a casein pick.

    Learn to read: I wish I had a better ear, but reading music gives me a jump start. Seeing lots of notes laid out in front of me used to be intimidating. Read Bach! The cello suites transcribed to treble clef are a good start. Handel and Telemann wrote lots of music appropriate for mandolin (originally for flute and violin). Seeing the music also makes theory more approachable, which helps improvisation, memorization, and accompaniment. Reading isn’t a panacea, though, so make sure to trust your ears.

  • Exploring Python with Data

    In the glut of Python data analysis tools, I’m sometimes embarrassed by my lack of comfort with Python for analysis. Static types, Javadoc/Scaladoc, and slick IDEs working in concert with compilers provide guides that I haven’t been able to replace in Python. Dynamic typing also seems to exacerbate library interoperability problems. With Anaconda and Jupyter, though, I can share some quick notes on getting started.

    Here are some notes on surveying some admittedly canned data to classify malignant/benign tumors. The Web is littered with examples of using sklearn to classify iris species using feature dimensions, so I thought I would share some notes exploring one of the other datasets included with scikit-learn, the Breast Cancer Wisconsin (Diagnostic) Data Set. I’ve also decided to use Python 3 to take advantage of comprehensions and because that’s what the Python community uses where I work.

    The notebook below illustrates how to load the demo data (loading a CSV is simple, too), convert the scikit-learn matrix to a DataFrame if you want to use Pandas for analysis, and apply linear and logistic regression to classify tumors as malignant or benign.
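
    As an aside, here’s what the CSV route looks like — a minimal sketch, where the file name is a hypothetical stand-in of my own rather than anything shipped with scikit-learn:

    import pandas as pd

    # Read a local CSV into a DataFrame; describe() gives the same
    # summary statistics shown for the demo data below.
    df = pd.read_csv('my_measurements.csv')
    df.describe()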

    In [7]:
    %matplotlib inline
    import numpy as np
    import pylab as pl
    import pandas as pd
    from sklearn import datasets
    
    # Load the demo data and wrap the numpy feature matrix in a Pandas DataFrame.
    bc = datasets.load_breast_cancer()
    pbc = pd.DataFrame(data=bc.data, columns=bc.feature_names)
    pbc.describe()
    
    Out[7]:
                mean radius  mean texture  mean perimeter    mean area  ...  worst concave points  worst symmetry  worst fractal dimension
    count        569.000000    569.000000      569.000000   569.000000  ...            569.000000      569.000000               569.000000
    mean          14.127292     19.289649       91.969033   654.889104  ...              0.114606        0.290076                 0.083946
    std            3.524049      4.301036       24.298981   351.914129  ...              0.065732        0.061867                 0.018061
    min            6.981000      9.710000       43.790000   143.500000  ...              0.000000        0.156500                 0.055040
    25%           11.700000     16.170000       75.170000   420.300000  ...              0.064930        0.250400                 0.071460
    50%           13.370000     18.840000       86.240000   551.100000  ...              0.099930        0.282200                 0.080040
    75%           15.780000     21.800000      104.100000   782.700000  ...              0.161400        0.317900                 0.092080
    max           28.110000     39.280000      188.500000  2501.000000  ...              0.291000        0.663800                 0.207500

    8 rows × 30 columns
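
    The matching 0/1 diagnosis labels live in bc.target; if you want them alongside the features, they slot in as one more column. A small sketch — per bc.target_names, 0 is malignant and 1 is benign:

    # Attach the diagnosis labels to the feature DataFrame.
    pbc['target'] = bc.target
    print(bc.target_names)          # ['malignant' 'benign']
    pbc['target'].value_counts()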

    In [8]:
    from math import sqrt
    from sklearn.linear_model import LinearRegression
    from sklearn.linear_model import LogisticRegression
    
    # Compare classifier accuracy as the training set grows.
    # Train on the first train_count rows; always test on the second half.
    def make_test_train(train_count):
        n = bc.target.size
        trainX = bc.data[0:train_count, :]
        trainY = bc.target[0:train_count]
        testX = bc.data[n//2:n, :]
        testY = bc.target[n//2:n]
        return trainX, trainY, testX, testY
    
    # Linear regression as a classifier: threshold the prediction at 0.5.
    def eval_lin(trainX, trainY, testX, testY):
        regr = LinearRegression()
        regr.fit(trainX, trainY)
        correct = (regr.predict(testX) > 0.5) == testY
        return sum(correct) / correct.size, np.std(correct) / sqrt(correct.size)
    
    # Logistic regression predicts the 0/1 class directly.
    def eval_log(trainX, trainY, testX, testY):
        regr = LogisticRegression()
        regr.fit(trainX, trainY)
        correct = regr.predict(testX) == testY
        return sum(correct) / correct.size, np.std(correct) / sqrt(correct.size)
    
    def lin_log_cmp(n):
        trainX, trainY, testX, testY = make_test_train(n)  # n must be at least 20
        lin_acc, lin_stderr = eval_lin(trainX, trainY, testX, testY)
        log_acc, log_stderr = eval_log(trainX, trainY, testX, testY)
        return lin_acc, log_acc
    
    # Training sizes stay below n//2, so train and test never overlap.
    xs = range(20, 280, 20)
    lin_log_acc = [lin_log_cmp(x) for x in xs]
    
    pl.figure()
    lin_lin, = pl.plot(xs, [y[0] for y in lin_log_acc], label='linear')
    log_lin, = pl.plot(xs, [y[1] for y in lin_log_acc], label='logistic')
    pl.legend(handles=[lin_lin, log_lin])
    pl.xlabel('training size from ' + str(bc.target.size))
    pl.ylabel('accuracy');
    

    Incidentally, I used IPython’s nbconvert to paste the notebook here.
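
    For reference, the conversion can also be driven from Python — a sketch using nbconvert’s exporter API, assuming the notebook is saved as share_breast.ipynb:

    # Render the notebook to HTML so it can be pasted into the post.
    from nbconvert import HTMLExporter

    body, resources = HTMLExporter().from_filename('share_breast.ipynb')
    with open('share_breast.html', 'w') as f:
        f.write(body)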

    Caveats: Without types, it’s pretty easy to make mistakes manipulating the raw data. Python and numpy scalar, array, and matrix arithmetic operators are permissive about their operands, so you might get a surprise or two if you’re not careful. That, combined with operating black-box analysis tools, leaves me skeptical of any conclusions, but it’s a start, and the investment was cheap.
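
    A tiny example of the kind of silent surprise I mean — broadcasting quietly turns what looks like elementwise addition into an outer sum:

    import numpy as np

    a = np.arange(3)                 # shape (3,)
    b = np.arange(3).reshape(3, 1)   # shape (3, 1)
    print((a + b).shape)             # (3, 3): broadcast to an outer sum, no error raised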

    Other Plotting Tools: Seaborn’s pairplot generates slick scatter plots and histograms that help identify outliers, describe ranges, and demonstrate redundancy in the data dimensions. I tried removing some of the obviously redundant columns; it produced no quality change in logistic classification and a less-than-statistically-significant reduction in linear classification.
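
    Here’s roughly what that looks like — a sketch over a few of the feature columns (my own arbitrary subset), colored by diagnosis:

    import seaborn as sns

    # Pairwise scatter plots and histograms for a handful of features.
    subset = pbc[['mean radius', 'mean perimeter', 'mean area', 'mean smoothness']].copy()
    subset['diagnosis'] = bc.target
    sns.pairplot(subset, hue='diagnosis')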

    Linear or Logistic? It surprised me that logistic regression proved the inferior classifier here, but economists frequently use linear regression to model 0/1 variables. Paul von Hippel has a post comparing the relative advantages of linear versus logistic regression. As a student, I had trouble both with applying logistic regression and with conveying my travails to a thesis adviser. I wish I had read more commentary comparing the two twenty years ago.
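
    For reference, the two models differ only in how the 0/1 outcome is linked to the features:

    p = x·b                     (linear probability model: fit p directly, threshold at 0.5)
    log(p / (1 − p)) = x·b      (logistic regression: fit the log-odds)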