Cycle I: History of Bop

Video Bop interpretation of an Excerpt from Jack Kerouac’s ‘History of Bop’

The History of Bop by Jack Kerouac

As a jazz enthusiast and young adult, I found reading Kerouac’s ‘On The Road’ a transformative experience. In school, I was predominantly interested in math and the sciences and hardly cared to read a book or pick up a pen to write. However, Kerouac’s style, legacy, and approach to writing (and life, for that matter) convinced me of the value in these types of intellectual pursuits. Over the last five years or so, I’ve continued to explore Kerouac and other works from the beat canon; one exciting find was his set of spoken word readings called ‘On the Beat Generation’. For cycle I, I focused on the final minute and a half of his work ‘History of Bop’. I find this writing to be a triumphant portrayal of the evolution of bebop and the cultural changes in America surrounding the genre.

Upon researching the piece, I was interested to find that it was originally published in the April 1959 issue of a lewd magazine called Escapade – a hidden gem of writing amongst smutty caricatures and 50s advertisements. I was not entirely surprised by this discordant arrangement, though, given that the hero of the beats (Neal Cassady) was, according to Allen Ginsberg, an “Adonis of Denver—joy to the memory of his innumerable lays of girls in empty lots & diner backyards, moviehouses’ rickety rows, on mountaintops in caves or with gaunt waitresses in familiar roadside lonely petticoat upliftings & especially secret gas-station solipsisms of johns, & hometown alleys too” (Howl, 1956). This quality of the beats did not age well, especially from the vantage of sexual equality, but unconventional behavior and criticism are the norm for this eclectic group. What I’d call the most foundational criticism of Kerouac, and of this piece of writing in particular, concerns the appropriation of black culture. Scholars like James Baldwin described this ‘untroubled tribute to youthful spontaneity [as] a double disservice—to the black Americans who were assumed to embody its spirit of spontaneity and to Kerouac’s full literary achievement… a romantic appraisal of black inner vitality’ (Scott Saul, FREEDOM IS, FREEDOM AIN’T, pg 56). It cannot be denied that Kerouac and his writer friends were escaping what they feared as the trap of the middle class white picket fence, and couldn’t have experienced the true reality of being black in the mid-20th century. Yet their works speak to a deep respect for, and profound inspiration drawn from, the black art of their time (jazz).

One artistic technique embodied by the beats and inspired by jazz is what Kerouac calls ‘spontaneous prose’. Jazz musicians excelled at this rapid invention of musical structures through heightened sonic sensibility and a borderline phantasmal grafting of the instrument onto their nervous systems (and souls, for that matter). The beats took up this approach with the technology of their time, namely the typewriter. Kerouac even wrote a manifesto called ESSENTIALS OF SPONTANEOUS PROSE in which he describes the procedure of writing as ‘the essence in the purity of speech, sketching language is undisturbed flow from the mind of personal secret idea-words, blowing (as per jazz musician) on subject of image.’ Other writers of his time, such as Truman Capote, didn’t see eye to eye with this artistic style; Capote humorously commented, “That’s not writing, that’s typing.” I don’t mean to argue that thoroughly edited and fully composed art is better or worse than spontaneous artforms like jazz and bop-prose. Rather, spontaneous modes of creation akin to ‘play’ can potentially externalize more honestly the private and elusive inner processes occurring in the lawless relational environment of the mind.

It has been a good 70 years since these creative practices surfaced within the avant-garde art scene, and the technologies at society’s disposal for externalizing thought have improved tremendously, especially due to the massive sea of networked image and video objects available to the average internet user. Further, a concentrated development of skills in computer programming may be analogous to the level of discipline needed to harness artistic technics like the saxophone, piano, pencil, paper, typewriter, voice recorder and so on. Long-winded as this explanation is, these ideas form the backbone of my inspirations in new media art and of what I wish to explore in this three-cycle project.

In cycle 1, I set out to visualize the final section of ‘History of Bop’. I’ve listened to this poem recited countless times and did a lyrical transcription, meaning I listened to the recording and wrote out all of its words. This is a common practice among jazz musicians and writers since it helps internalize language. Next, I followed Dmytro Nikolaiev’s implementation of a Vosk speech recognition model in Python to convert the audio file into a JSON transcript containing each word and its position in time as spoken.

Not all the words were accurately transcribed, so I had to correct them manually; speech recognition models are good, but not nuanced enough to decipher every spoken phrase, especially from an unconventional speaker like Kerouac. Further, I did some post-processing of the data to get it into a form that would play nicely with Isadora’s JSON parser actor. The format involved key-value pairs of timestamps and words spoken. Through experimentation, I found that repeating each word in the JSON list at close to millisecond (ms) intervals ensured that it would appear onscreen and remain illuminated consistently while the corresponding audio is spoken. Although the resolution of the timecode variable of the movie player actor was at the ms scale, it didn’t increment consistently enough for me to predict the exact values it would emit. Consequently, having a large, dense array of timestamps between the start and end of a word ensures that the word will be triggered. Additionally, Isadora expresses playback position as a percentage between 0 and 100 rather than as a time duration, the typical format for video. So I had to convert each timestamp into a percent completed with respect to the total length of the audio clip.
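The post-processing described above can be sketched in Python roughly as follows. This is a minimal illustration, not the script actually used: the function names, the input layout (a list of entries with `word`, `start` and `end` in seconds), the step size and the two-decimal percent keys are all assumptions, and the key format that Isadora’s JSON parser actor expects may well differ.

```python
import json


def build_cue_map(words, total_seconds, step_seconds=0.01):
    """Map playback position (percent of the clip) to the word being spoken.

    words: list of dicts with "word", "start" and "end" in seconds (assumed layout).
    total_seconds: full length of the audio clip.
    step_seconds: spacing of the repeated timestamps inside each word (assumed value).
    """
    cues = {}
    for entry in words:
        start, end = entry["start"], entry["end"]
        # Count steps with integers so rounding error does not accumulate.
        steps = int(round((end - start) / step_seconds))
        for i in range(steps + 1):
            t = min(start + i * step_seconds, end)
            percent = 100.0 * t / total_seconds
            # Repeating the word at many nearby positions makes it likely that
            # whichever position value arrives will match one of the keys.
            cues[f"{percent:.2f}"] = entry["word"]
    return cues


def cue_map_to_json(cues):
    """Serialize the key-value pairs so they can be loaded by a JSON parser."""
    return json.dumps(cues, indent=2)
```

A smaller `step_seconds` gives denser coverage of each word at the cost of a larger file; keys that round to the same percent simply overwrite one another.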

This allowed me to funnel the current position of a playing audio file through to the JSON parser actor, such that as the timecode increments, the transcribed text displays onscreen exactly in time with the recited poetry. This was exciting on its own, as it is a semi-autonomous method for generating lyric videos. Also, the style of the text was strobe-like, giving it the quality of spoken words: they appear and vanish in an instant. See below the media circuit implemented in Isadora (with image player disabled) to see the flow of time triggering text and subsequently displaying it on screen.

The next stage in the process was to find imagery representing the ideas that Kerouac is expressing. Following Kerouac’s ‘Setup’ step (‘The object is set before the mind, either in reality, as in sketching (before a landscape or teacup or old face) or is set in the memory wherein it becomes the sketching from memory of a definite image-object.’), I used Kerouac’s speaking of the words as the stimulus (object) for mental imagery. Once an image or set of ideas was established in mind through free association (‘mental image blowing’), I would search YouTube to find a clip which best represented this mental image. Instead of using one of the many available and ad-prone sites for converting YouTube videos to .mp4 files, I adapted a Python script using the yt-dlp library to do so with better speed and precision. This allowed me to quickly find sections of videos and copy their URL, along with start and end times, into a function which would download the video file with a specific name to a designated folder. This method allows more mental energy to be concentrated on thinking of images and finding existing internet representations rather than on downloading and cutting the video segments. In this way a quick flow can be achieved, better mimicking Kerouac’s spontaneous prose method. To add, just as writing is a negotiation between image thoughts and the language available to one’s tongue and fingers at that moment, video-bop (a new term for this method) is a negotiation between a visualized animation and the medium of available images and videos online. This medium includes not only the content available and the form it takes, but also the algorithmic recommender process personalized by the user’s previous internet activity. For in this intentionally fast-paced creative process, one relies heavily on differentiating search terms to converge on an appropriate visualization.
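The download step described above might look something like the following sketch. Note the assumptions: it calls the yt-dlp command-line tool through `subprocess` rather than the Python library that the original script adapted, the function names and the default `clips` folder are hypothetical, and downloading a section this way relies on ffmpeg being installed alongside yt-dlp.

```python
import subprocess
from pathlib import Path


def build_clip_command(url, start, end, name, folder="clips"):
    """Return the yt-dlp command line for fetching one section of a video.

    start and end are given in seconds (or as HH:MM:SS strings); name is the
    file name to save under, for example the trigger word of the phrase.
    """
    output_template = str(Path(folder) / f"{name}.%(ext)s")
    return [
        "yt-dlp",
        # The leading "*" marks the argument as a time range, not a chapter name.
        "--download-sections", f"*{start}-{end}",
        # Ask for an .mp4 container where remuxing is possible.
        "--remux-video", "mp4",
        "-o", output_template,
        url,
    ]


def download_clip(url, start, end, name, folder="clips"):
    """Create the target folder if needed and run the download."""
    Path(folder).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_clip_command(url, start, end, name, folder), check=True)
```

Keeping the command construction separate from the call that runs it makes it easy to inspect what will be executed before any download happens. Cut points may be approximate by default; yt-dlp’s `--force-keyframes-at-cuts` flag trades extra re-encoding time for more accurate cuts.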

When videos were found, they were named with the first ‘semi-unique’ word within the phrase they belonged to, and the length of each video was chosen to match the duration of that phrase, as calculated from the initial audio transcription step. The grouping of phrases is at the video-bopper’s discretion and in accordance with the aesthetic sensibility inherent in jazz’s musical structuring. It is not necessary to find video images in the order in which they are spoken. I hopped around between Kerouac’s phrases freely and would encourage this approach, as it may follow the flow of thinking more closely and it builds natural structures of moments and transitions between moments. This idea was neatly phrased by Mark Turner, a modern tenor saxophone player, who says, “When I’m in the middle of a solo, whenever I am most certain of the next note I have to play, the more possibilities open up for the notes that follow.” (The Jazz of Physics, Stephon Alexander). To riff on Heisenberg’s uncertainty principle: the precision with which a particle’s momentum and position can both be known is interdependent, so the more exactly one is known, the less exactly the other can be. Extending this into the domain of thought and artistic expression, perhaps carelessly, suggests that there is a tradeoff in awareness. When the improvisor’s awareness is tuned most closely to what idea should come next, they may be unaware of the larger artistic structure to emerge. In contrast, if the improvisor’s awareness is tuned to larger timespans and movements in the piece, they may have less awareness of the idea to come next.

To continue with the methodology: Isadora unfortunately doesn’t read file paths for media artifacts and instead relies on an internal numbering system assigned as files are imported into the project. To accommodate this structure, I manually maintained a JSON object that converts between video trigger words and their index in Isadora. These index values are passed into Isadora’s movie player actor so that the videos are shown in time with the typography and spoken words.
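A lookup of this kind could be sketched as below. The function names, the punctuation stripping and the example words and index numbers are purely illustrative assumptions; the real indices are whatever Isadora assigns to each imported clip.

```python
import json


def load_trigger_indices(text):
    """Parse a JSON object mapping trigger words to media index numbers."""
    raw = json.loads(text)
    # Lower-case the keys so matching does not depend on capitalization.
    return {word.lower(): int(index) for word, index in raw.items()}


def media_index_for(word, trigger_indices):
    """Return the media index for a spoken word, or None if it triggers nothing."""
    cleaned = word.lower().strip(".,;:!?'\"")
    return trigger_indices.get(cleaned)


# Hypothetical example: two trigger words and made-up index numbers.
EXAMPLE_JSON = '{"dreaming": 1, "turn": 2}'
```

Returning `None` for non-trigger words leaves the decision of what to do between triggers (for instance, keeping the current video playing) to the calling code.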

Upon showing the project to my classmates and delving into ideas on how it can be more interactive for cycles 2 and 3, I gathered many important suggestions that shape the next direction:

  • Jiang – Words that are action-oriented may be good to include for transitions between images (e.g. ‘turn’).
  • Afure – She liked my interpretation of the word ‘Dreaming’, although everyone would have a different interpretation of that word and of which image they’d select.
    • There is a TikTok trend of searching words on Pinterest and displaying the image that is algorithmically connected with each one.
  • Kiki – Liked the subject matter; she teaches jazz dance, and it would be helpful to have a more interesting way to teach about this genre.
  • Alex – Ways to add interactivity: how to automate image generation and allow users’ thoughts and personalities to be included. Potential for a custom web app.
  • Nathan – Would have liked to click on links related to the content being shown, as an informative way to learn more about each part.

With these suggestions in mind, I plan to explore the use of the Dispatcher from python-osc (version 1.7.1) to build a simple server-hosted web interface open to smart-device users connected to a LAN. A spoken word poem should be chosen and distributed to each audience member’s device. As an experience, there should be a listening period in which the audience engages with their own forms of active imagination to see what phrases catch their ears and what images become naturally available to them. From there, they should go through the video-bop process and find a clip which matches what they’ve conceived. Then they will connect to the media server and paste the link to that video, the start and end time, and the phrase it connects to. The media server will need to collect these audience responses and run the YouTube extraction script to grab all the associated artifacts and make them available for rendering in time with the spoken word poetry. This is the direction I envision for cycle two and a diagram of how I see the interaction occurring:


