All paths lead to Ampwall

Today is November 5, 2024. It’s been almost two years since I started working on Ampwall, a marketplace and community platform for music communities. It’s been more than a year since I walked away from my great job at Shippo to pursue it. And it’s been almost two months since we launched publicly. Things keep changing, we keep pushing forward, but none of this happened by accident.

My parents were both musicians. They told me stories about how they met as adults in New York: my mom was a folk singer and guitarist who wanted to learn to transcribe her music, my dad was a music teacher and jazz musician in the city. I remember watching my mom play guitar and hearing my dad play saxophone in the other room. When I was around 10, I started learning on my mom’s old classical guitar. I got my first electric at 11 or 12, one of those Fender Squier packs with the guitar and little practice amp.

Guitar and songwriting took over my life. Creating music is like magic: you speak the incantation, you perform the ritual, and something new and powerful is born. I had a pile of cassette tapes of my own songs. When I was 14, I bought a used drum set and started learning because I was frustrated by my inability to record music by myself. I wound up joining a band with my friends. With them I got a taste of being part of a music scene. We organized shows, played locally, recorded demos, sold CD-Rs of our own music. I taught myself HTML and built our first websites.

Time went on. My high school band broke up but I got better at drums and joined other bands playing hardcore, punk, and eventually black metal. In my early 20s I got a new guitar for the first time in years and started writing my own music again. This turned into my band Woe, which came to be the most important music project of my life.

Along the way, I had to get a job but didn’t have a lot of options. High school and I never got along. I barely graduated, unable to focus on the future and prioritize my classes. Years later, I’d learn I had undiagnosed ADHD and discover I could weaponize my hyper-focus while leaning heavily on schedule and routine to help me thrive. But until then, work was a struggle because I had to be extremely engaged with something to focus.

Luckily, I could focus on technology. I got some certifications, I failed upwards into a good job with a local IT company. The founder taught me the power of unshakable commitment and determination. It reminded me a lot of the DIY attitude of folks in my music world, a confident “Yeah, I can do this, I don’t care if you think it’s crazy” that you really need to get most challenging things done.

I kept coding. Code is also like magic: you type the incantation, you perform the ritual, and something new and powerful is born. I learned PHP. When Woe’s first album came out, it started getting shared through unsanctioned music blogs. I built a pay-what-you-want website to sell it for donations. When this worked out, some friends asked to use it and I turned it into a little business! In a few months, I had a good dozen or so bands selling through it. It was the first time that my music and tech lives came together. Unfortunately (and unsurprisingly), I wasn’t prepared for something like this. Bandcamp arrived, they did everything I was doing but so much better, so I told everyone to use that and moved on.

Somewhere in there, I resumed sessions with the therapist I saw when I was a kid. The last session I remember, I was telling her about some event (I don’t remember what) and she remarked, “Chris, when are you going to give up this music nonsense?” It floored me. This was one of those talk therapists who didn’t give advice or feedback, just offered a space for someone to talk… and this was what she wanted to comment on? It’s literally the only interaction I remember with her and one of my examples of why I think talk therapy is ridiculous! (Or at least it’s not for me. I like CBT quite a lot.)

Years went by. I kept doing Woe. I kept working in tech. I built the original Phillymetal.com, a website for people to submit metal shows. There was a little message board. Through Phillymetal, I made a ton of friends and learned a lot about building and releasing software. The original version was hand-rolled PHP. Eventually, I wanted to learn Ruby so I rebuilt it using Rails.

More time went by! By 2022, I was years past a move from IT to software engineering. I had left Philadelphia for New York City. I was (still am) happily married and had a wonderful daughter. But I was still playing music, still doing Woe and in a new death metal band Glorious Depravity. If anything, becoming a parent helped me focus on my goals more because it forced me to plan my time more deliberately.

Around this time, I was thinking a lot about the state of the tech world. Bandcamp had been sold to Epic Games and a lot of people were wondering whether its future was safe. I was also frustrated: being in a band is expensive and complicated enough without Bandcamp taking their 10-15%. It was one thing when they were an independent company, but now… fueling Epic Games? I realized that literally every option that a band had for sharing and selling their music online was owned by a gigantic corporation: Spotify, Apple Music, YouTube, and now Bandcamp. And beyond that… Bandcamp sort of… wasn’t that good? My software engineer brain started buzzing: I can do this better. My DIY brain chimed in: there’s no reason not to just do it.

In January 2023, I went to a Shippo offsite conference. There was this panel led by founders Laura Behrens Wu and Simon Kreuz where they talked about their experience founding the company. Laura made this remark that was so clear, simple, and powerful: “If you think you can do something better, you should do it.” So calm and confident. And… why not? If you can, you should. I believed I could so I did.

I started working on Ampwall right away. There was a new Woe album coming out later that year so that would be my target. I wanted to at least sell my merch by release date, then follow up with music, then expand it to let other people in. I had very specific product goals in mind: I wanted better looking pages, better shipping, a data model built for people in multiple bands. I wanted ways to help connect bands or music scenes together so things felt like a network and less lonely. I wanted lower fees so more money could get into the pockets of bands. And I wanted it to be built and owned by DIY maniacs, not corporate vampires – a sustainable business that would be on the side of the arts. I wrote an essay about Ampwall’s mission that goes deeper into why it exists, describes the problem more fully and how Ampwall is trying to fix things.

Fast forward to today. Ampwall is the first direct competitor to Bandcamp. We launched public signup almost two months ago and it’s exceeded all of our expectations. Note a key word there: “our”. There’s a team, there’s a community, it’s not just me by myself anymore. Every day we make progress. I’m happy, I feel like what I’m doing matters. Ampwall is making real money for real bands. It’s also building communities, helping people find folks who like what they do. And it’s building joy, helping our bands feel like they have a voice and a home.

I think back to my path and it really seemed inevitable that I’d wind up here. The kid playing his mom’s acoustic guitar became the adult touring with his band, living his life for heavy metal and art. The kid who loved technology became a software engineer who loves working with startups, building real products that help people. Ampwall brings everything together, it merges my separate music and tech identities into one. I couldn’t possibly be more proud to be here. The “music nonsense” that defined my life is still there, far from nonsense, and something I’ll never give up.

Easier Discord Slash Command Setup in Node.js

Suppose you are trying to create a Discord bot and slash commands. You are using Node.js and TypeScript and you stumble upon the Discord.js library. I mean, how could you miss it? It shows up in search results before even Discord’s official documentation so you’d be forgiven for thinking that it is official. And you might look at some of the code and examples and think, “Wow, this sure is… a lot of words. And there are for loops?” And maybe you’d think that this is just how Discord engineers want developers to do things. And it might make you think that this is really fucking complicated.

Surprise! Discord.js is not an official Discord library. And where slash commands are concerned, the code makes it significantly more complicated than it needs to be. It’s so complicated that I’d argue it’s doing a disservice to Discord and the talented Discord.js developers who maintain the package.

So without further ado, if you want to make a very simple Discord slash command, it’s as simple as this:

  1. Make the API call to create the slash command

In Discord.js, you’d use their SlashCommandBuilder class to configure it. You can do that and just console.log the output as JSON, console.log(command.toJSON()). It’ll give you the payload of the API command you need.

BOT_TOKEN='replace_me_with_bot_token'
CLIENT_ID='replace_me_with_client_id'
curl -X POST \
-H 'Content-Type: application/json' \
-H "Authorization: Bot $BOT_TOKEN" \
-d '{"name":"hello","description":"Greet a person","options":[{"name":"name","description":"The name of the person","type":3,"required":true}]}' \
"https://discord.com/api/v8/applications/$CLIENT_ID/commands"

That’s an example from https://docs.deno.com/deploy/tutorials/discord-slash/#step-2%3A-register-slash-command-with-discord-app
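
If you'd rather stay in Node.js than shell out to curl, here is a minimal sketch of the same request using the global fetch available in Node.js 18+. The helper names buildRegisterCommandRequest and registerCommand are made up for illustration; the URL, headers, and payload mirror the curl call above.

```typescript
// Sketch: the curl request above, rebuilt for Node.js 18+ (global fetch).
// buildRegisterCommandRequest and registerCommand are hypothetical helper names.

interface CommandOption {
  name: string;
  description: string;
  type: number; // 3 is the STRING option type, as in the curl payload
  required?: boolean;
}

interface CommandPayload {
  name: string;
  description: string;
  options?: CommandOption[];
}

const helloCommand: CommandPayload = {
  name: 'hello',
  description: 'Greet a person',
  options: [{ name: 'name', description: 'The name of the person', type: 3, required: true }],
};

// Pure function: builds the URL and fetch options without touching the network.
function buildRegisterCommandRequest(clientId: string, botToken: string, command: CommandPayload) {
  return {
    url: `https://discord.com/api/v8/applications/${clientId}/commands`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bot ${botToken}`,
      },
      body: JSON.stringify(command),
    },
  };
}

// Sends the request and fails loudly on a non-2xx response.
async function registerCommand(clientId: string, botToken: string, command: CommandPayload): Promise<void> {
  const { url, init } = buildRegisterCommandRequest(clientId, botToken, command);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Discord returned ${response.status}: ${await response.text()}`);
  }
}
```

Run registerCommand once, from a one-off script, with your real client ID and bot token. It does not need to live in the bot itself.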

  2. Set up the handler for the slash command

The Discord.js library has a lot of code about dynamically loading slash commands and using an array or a Collection so you can do dynamic lookups using names. None of that is necessary at all. You’d do that if you have a lot of slash commands and they’re changing so frequently that you can’t be bothered to manually maintain a list of commands to support. Tutorial docs do not need that.

The simple version is just a switch statement. Assuming you are using the Discord.js library with a client instance named discordClient:

  discordClient.on(Events.InteractionCreate, async (interaction) => {
    if (!interaction.isChatInputCommand()) {
      logger.info('Command is not a chat input command');
      return;
    }

    switch (interaction.commandName) {
      case 'your-command-here': {
        const { options } = interaction;
        const code = options.getString('code');
        if (!code) {
          logger.info('No code provided');
          await interaction.reply({ content: 'No code provided', ephemeral: true });
          return;
        }

        const { id: discordUid, username: discordUsername } = interaction.user;
        await interaction.reply({ content: 'Successfully verified!', ephemeral: true });

        break;
      }
    }
  });

That’s it! Just like handling every webhook in existence! And if you want type safety or you want to handle it dynamically, there are easy ways to do that, too.
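
For the type-safe version, one possible sketch is a plain object keyed by a union of command names. Nothing here is a Discord.js API: InteractionLike is a hypothetical stand-in for the few interaction members the handlers touch, and the command names are placeholders.

```typescript
// Hypothetical stand-in for the parts of a Discord.js chat input interaction used below.
interface InteractionLike {
  commandName: string;
  options: { getString(name: string): string | null };
  reply(message: { content: string; ephemeral?: boolean }): Promise<unknown>;
}

type CommandHandler = (interaction: InteractionLike) => Promise<void>;

// Adding a name here without adding a handler below is a compile error.
type CommandName = 'hello' | 'verify';

const handlers: Record<CommandName, CommandHandler> = {
  hello: async (interaction) => {
    const name = interaction.options.getString('name');
    await interaction.reply({ content: `Hello, ${name ?? 'stranger'}!`, ephemeral: true });
  },
  verify: async (interaction) => {
    const code = interaction.options.getString('code');
    const content = code ? 'Successfully verified!' : 'No code provided';
    await interaction.reply({ content, ephemeral: true });
  },
};

// Narrows an arbitrary string to one of the known command names.
function isKnownCommand(name: string): name is CommandName {
  return Object.prototype.hasOwnProperty.call(handlers, name);
}

// Returns false when the command is unknown so the caller can log it.
async function dispatch(interaction: InteractionLike): Promise<boolean> {
  if (!isKnownCommand(interaction.commandName)) {
    return false;
  }
  await handlers[interaction.commandName](interaction);
  return true;
}
```

Inside the InteractionCreate listener, the whole switch then collapses to a single await dispatch(interaction) call.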

I don’t like speaking ill of open source projects. Discord.js is a powerful library and I appreciate the care that went into it. I know that the docs I’m complaining about come from someone trying to be helpful and offering solutions that they think will help. But the complexity of the docs turned me off to the whole process and ultimately cast both the library and Discord itself in a bad light until I realized the complexity was coming from implementation details.

React Server Actions - Versioning, Filenames, and Other Considerations

I’ve been building a new thing called Ampwall since the beginning of 2023 and it’s going quite well. I announced it via Twitter post on September 29 (same day as Woe’s fifth album release!) and the response was intense. I’m doing it full-time and I am optimistic.

Ampwall is built using Next.js. Specifically, I’m using Next.js 13’s divisive App Router. I like it quite a bit. I like server rendering, I think we ship too much JavaScript to clients and make our lives as engineers harder when we insist on writing APIs for things that should just use server templates. But I also love React and TypeScript, so I don’t want to give up on these if I don’t have to. Next.js 13’s embrace of React Server Components and Server Actions ticks all the boxes: server rendering with TypeScript and React! Beautiful.

Vercel announced Server Actions were stable last week as part of their Next.js 14 release event. This has spawned a lot of conversation, critique, and drama, most of which I find rather dull or knee-jerky or immature. But one topic piqued my interest: versioning. How do you version server actions? What do we need to consider when deploying new versions?

The problem

Version skew is well-defined and understood by software engineers. It can be a challenging problem for products with long-running client sessions. One benefit of the explicit API client-server relationship is the explicit definition and publishing of public API interfaces. Experienced engineers intuitively understand this. It’s typical for changes to be flagged during code reviews: “This will break old clients”, “We should mark this argument optional and watch logs until 99% of users are on the new version”, or “We should put this endpoint in a v2 namespace”.

Server Actions essentially create RPC endpoints in Next.js servers. This is magical and it works wonderfully. When you define a function with the magic 'use server' directive or put it in a 'use server' file, it will be executed by the server.

'use server';

// This can be called from the client but it will execute on the server
export async function foo() {
  return 'bar';
}

As I understand it, this works by creating an ID representing this function and outputting code that, from a client, makes a POST and references the ID as the value of a header called Next-Action.

So then: how do we version Server Actions? More urgently, if I deploy five versions of my app across five days and a client fails to reload their window past day 1, what will happen when they interact with foo()?

What’s in a name?

The answer to all of this comes back to how the Next-Action ID is generated. From what I can tell, the ID comes from this function. It creates a SHA1 hash using a combination of the file name and function name. This matches what I’ve seen: a function called foo in a file called bar will always have the same ID, regardless of its implementation. So where versioning is concerned – if we want to introduce a breaking change to a Server Action, maybe adding a required argument to unlock some new behavior without breaking folks who haven’t reloaded yet – we can create fooV2. We could do something like this:

'use server';

export async function foo(optionalArg?: string) {
  if (optionalArg) {
    // do the new thing
  } else {
    // do the old thing
  }
}

export async function fooV2(requiredArg: string) {
  return foo(requiredArg);
}

But this is a mighty footgun: renaming a function changes your server’s public interface. “Well, duh, of course, just like renaming an API endpoint is a breaking change.” Yes, but React Server Actions are a new paradigm with a fuzzy line deliberately drawn between client and server, invisible to an engineer working in a Client Component and easily confused with a plain ole backend async function if you’re working in the server.

React Server Actions bring us one simple refactor away from introducing version skew in a way that might be extremely surprising. Reorganize your files? Fix a typo in a function? Rename something to be more explicit or fit its purpose better? Breaking API changes.
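
To make the footgun concrete, here is a small sketch of an ID scheme like the one described above: a SHA-1 hash over the file name and function name. The function name sketchActionId, the ':' separator, and the file paths are assumptions for illustration, not a confirmed copy of what Next.js does.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical re-creation of the scheme: SHA-1 over file name + function name.
// The ':' separator is an assumption; only the two inputs come from the text.
function sketchActionId(filePath: string, exportName: string): string {
  return createHash('sha1').update(`${filePath}:${exportName}`).digest('hex');
}

// The function body is not an input, so editing the implementation keeps the ID stable.
const original = sketchActionId('app/controllers/cart.ts', 'addToCrat');
const afterBodyEdit = sketchActionId('app/controllers/cart.ts', 'addToCrat');

// Fixing the typo or moving the file produces a different ID that old clients never learn about.
const afterTypoFix = sketchActionId('app/controllers/cart.ts', 'addToCart');
const afterFileMove = sketchActionId('app/actions/cart.ts', 'addToCrat');

console.log(original === afterBodyEdit); // true
console.log(original === afterTypoFix); // false
console.log(original === afterFileMove); // false
```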

Don’t give me bad news

Ok, so you fixed a typo in a function while doing something else. It’s so trivial that nobody thought about it during a code review, you obviously spelled “cart” as “crat” and that needed to be fixed. Version skew has been introduced but at least you’ll know from logs, right?

Nope. When a client POSTs to a Server Action function that does not exist, the server swallows it silently and returns 200. You will not know something is wrong unless the client that called it is looking for a response and fires a loggable error.
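
One way to get that loggable error is to make every Server Action return an explicit envelope and route calls through a small client-side wrapper. This is a sketch under an assumption that deserves checking against your Next.js version: that a call to a missing action resolves with undefined rather than throwing. ActionResult, ActionSkewError, and callAction are hypothetical names.

```typescript
// Every action returns an envelope, so "no value at all" can only mean something went wrong.
type ActionResult<T> = { ok: true; data: T } | { ok: false; error: string };

class ActionSkewError extends Error {
  constructor(actionName: string) {
    super(`Server Action "${actionName}" returned nothing; this client may be out of date`);
    this.name = 'ActionSkewError';
  }
}

// Wraps a Server Action call and turns a silent empty response into a thrown, loggable error.
async function callAction<T>(
  actionName: string,
  action: () => Promise<ActionResult<T> | undefined>,
): Promise<T> {
  const result = await action();
  if (result === undefined) {
    // Assumption: this is what a skewed call looks like from the client.
    throw new ActionSkewError(actionName);
  }
  if (!result.ok) {
    throw new Error(`Server Action "${actionName}" failed: ${result.error}`);
  }
  return result.data;
}
```

A caller can catch ActionSkewError, report it, and prompt the user to reload.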

New day, new best practices

What do we do with this knowledge? I’m doubling down on some approaches and considering others.

First, the things I’ve been doing since day 1: no anonymous functions with use server, no Server Actions defined in files that aren’t explicitly dedicated to them. I put all of my Server Actions in a folder called controllers because I treat them like the C in MVC. This limits the likelihood of having to move a Server Action to another file and changing its hash.

Next, I’m considering something else: the server action itself is just a one-line wrapper around an implementation.

'use server';

export async function fooV1(input1: string, input2: number) {
  return fooImpl(input1, input2);
}

This offers a few benefits: it makes Server Actions look weird in a way that will hopefully stand out to someone reviewing, it decreases the likelihood that some well-intentioned engineer will be mucking about in a file where they can accidentally cause problems, and it limits the responsibility of the Server Action to the routing layer of the server.
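
Here is a slightly fuller sketch of how the wrapper pattern might look once a second version exists. The file paths in the comments, the body of fooImpl, and the extra fooV2 argument are all invented for illustration.

```typescript
// --- lib/fooImpl.ts (hypothetical path): plain server code, safe to rename and refactor ---
async function fooImpl(input1: string, input2: number, uppercase = false): Promise<string> {
  const repeated = input1.repeat(input2);
  return uppercase ? repeated.toUpperCase() : repeated;
}

// --- controllers/foo.ts (hypothetical path): the real file would start with 'use server' ---
// In the real file these are exported; `export` is left off so the sketch runs as a plain script.
// Each name is a frozen public interface: add a new version, never rename an old one.
async function fooV1(input1: string, input2: number) {
  return fooImpl(input1, input2);
}

// New required argument, new name. Clients that have not reloaded keep calling fooV1.
async function fooV2(input1: string, input2: number, uppercase: boolean) {
  return fooImpl(input1, input2, uppercase);
}
```

The implementation can change freely as long as both wrappers keep their names and signatures.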

How could this be improved?

It seems clear to me that we need a way to explicitly set or seed a Server Action’s ID.

'use server';

// Great opportunity for a decorator
@actionKey('foo')
export async function foo() {
  // impl
}

// Or just add it as metadata here?
foo.actionKey = 'foo';

// Maybe a string literal?

export async function foo() {
  actionKey`foo`;
}

This breaks the dependency between the declaration and the public interface, or at least gives us a way to control the output.

Is this really worth it?

When I posted about this on Reddit, my recommendation that we need a way to control this was met with a great quip: “Like, say, a URL?”. That’s a good point. If we’re explicitly keying our endpoints, we’re sort of just creating an alternate path to a public API, aren’t we?

I’d say we still benefit from avoiding the ritual boilerplate of API declarations. Server Actions, for me, have been part of a mighty improvement to my Developer Experience, and I don’t think that explicitly keying them would negate their benefit. If anything, explicit keys would help engineers understand that their Server Actions are already API endpoints. It would make them more predictable, less magical, and help folks doing code reviews or tracking regressions. I think we’d be able to find a way to add an eslint rule to require explicit keys.

I’m going to keep using Server Actions but I’m watching this closely. They’re still young and I’m optimistic that the experience of using them and managing projects will only improve over time.

Next.js App Router Client Cache Busting

UPDATE: 24 hours after posting this, Vercel responded to the outstanding issue. A few days later, they started a discussion about it. They’ve committed to improving the caching experience.

ORIGINAL POST

Next.js 13’s App Router has caused no shortage of controversy. Lots of people are upset about Server Components. Some blogs seem to think the sky is falling. Me? I love it. Love. Big heart eyes, harps strumming, floating to a cloud.

As much as I love React, as confident as I feel with it and TypeScript, I think that the benefits of the SPA are lost on most products. I think that the massive amount of code we push to clients is embarrassing. The rituals of API requests, the complexities of client state management… I’m over it. At the risk of sounding like an old man: I miss Rails. I miss having a server that can talk to my database and spit out HTML that loads quickly on clients. One deployment, one environment, way less boilerplate. Obviously, as a professional React developer, there are tons and tons of amazing things that simply cannot be done with server-rendered pages; like I said, I love React. But I think we’ve reached peak SPA.

React Server Components give me what I want. The server talks to my database, talks to APIs, prepares data and renders as much HTML as possible. Then it lets me elegantly drop into the client as needed. React Server Actions give me simple RPC calls. The move to RSC is accelerating modern approaches to CSS-in-JS (I’m currently working on a personal project using Panda and it is lovely!) and even though things are changing fast, it’s giving me what I’m looking for.

But the App Router has its share of problems. Among them are its highly opinionated caching rules, particularly its client-side caching rules. There is an ongoing discussion about this in their GitHub issues. It is the single most commented open issue in the project right now, 286 comments at the time of my writing. You can find it at https://github.com/vercel/next.js/issues/42991. Vercel so far have not responded to it.

I slapped together a quick workaround for the problem. It relies on revalidatePath, a function that Vercel limits usage of in its free tier, so it will not be for everyone. But if you’re hosting somewhere without such a restriction – maybe you’re hosting on a Node.js server so you’re not as concerned about counting function invocations – here’s my approach.

// cacheBusterAction.ts (file name inferred from the import in the next file)
'use server';

import { revalidatePath } from 'next/cache';

// eslint-disable-next-line @typescript-eslint/require-await
export const revalidateAction = async (path: string) => {
  revalidatePath(path);
};

// Server container (file name not given in the original)
'use server';

import * as React from 'react';
import { revalidateAction } from './cacheBusterAction';
import { ClientCacheBusterContainer } from './ClientCacheBuster';

export const CacheBusterContainer = ({ children }: { children: React.ReactNode }) => {
  return <ClientCacheBusterContainer revalidateAction={revalidateAction}>{children}</ClientCacheBusterContainer>;
};

// ClientCacheBuster.tsx (file name inferred from the import above)
'use client';

import Link, { LinkProps } from 'next/link';
import { usePathname } from 'next/navigation';
import * as React from 'react';
import { createContext, forwardRef, useCallback, useContext, useEffect, useRef, useTransition } from 'react';

interface ClientCacheBusterContextValue {
  queueInvalidation: (route: string) => void;
}

export const ClientCacheBusterContext = createContext<ClientCacheBusterContextValue>({
  queueInvalidation: () => {},
});

export const ClientCacheBusterContainer = ({
  children,
  revalidateAction,
}: {
  children: React.ReactNode;
  revalidateAction: (pathname: string) => Promise<void>;
}) => {
  const pathname = usePathname();
  const invalidationQueueRef = useRef<string[]>([]);
  const [_, startTransition] = useTransition();

  const queueInvalidation = useCallback((route: string) => {
    invalidationQueueRef.current.push(route);
  }, []);

  useEffect(() => {
    if (invalidationQueueRef.current?.at(0) === pathname) {
      return;
    }
    const head = invalidationQueueRef.current.shift();
    if (head) {
      startTransition(async () => {
        await revalidateAction(head);
      });
    }
  }, [revalidateAction, pathname, startTransition]);

  return (
    <ClientCacheBusterContext.Provider value={{ queueInvalidation }}>{children}</ClientCacheBusterContext.Provider>
  );
};

type LinkPropsReal = React.PropsWithChildren<
  Omit<React.AnchorHTMLAttributes<HTMLAnchorElement>, keyof LinkProps> & LinkProps
>;

export const ExtendedLink = forwardRef<HTMLAnchorElement, LinkPropsReal & { href: string }>(
  function ExtendedLinkFunction({ onClick, ...props }, ref) {
    const context = useContext(ClientCacheBusterContext);
    const wrappedOnClick = useCallback(
      (e: React.MouseEvent<HTMLAnchorElement>) => {
        if (props.href) {
          context.queueInvalidation(props.href);
        }
        if (onClick) {
          onClick(e);
        }
      },
      [context, onClick, props.href],
    );
    return <Link {...props} ref={ref} onClick={wrappedOnClick} />;
  },
);

What is this?

This is an approach to navigation using the Next.js 13 App Router that will step around the client cache rules. It addresses the concerns described at this monster GitHub issue: vercel/next.js#42991.

Just tell me how to use it.

My, you're in a hurry! Please read the rest but to get started:

  1. Wrap the entire app in the server-rendered CacheBusterContainer.
  2. Anywhere we have a link that we do not want to cache, we use ExtendedLink instead of Link.

That's it.

Why is this necessary?

The Next.js App Router has opinionated rules about caching. Among them is a default by which any page visited by following a Link (<Link href={somePath}>) will be held in a client-side cache for 5 minutes or until the cache is manually cleared using revalidatePath. The only supported alternative to this is setting prefetch={false}, which reduces the 5 minute cache to 30 seconds. For many use cases, 30 seconds is too long.

Consider the following ecommerce site scenario:

  1. A customer visits a product page. They look at it for a moment and then browse back to look at more items.
  2. While they are elsewhere, the product sells out.
  3. They return to the page but they see the cached result. This is held in the client -- there is no way for them to know the content is expired. They try to add it to their cart and receive an error message. They think the site is broken, they leave.

What we would like to see happen: they return to the page, it renders new content from the server, they see it is sold out, they are sad they missed their opportunity.

How does this work?

  1. The server container's job is to pass an action, revalidateAction, down to the client container. This works around a bug, something about cache... something. I don't have it in front of me. Try it without the server component and see. :-)
  2. The client container provides a context object. This context object exposes a callback that accepts a string of a path to invalidate.
  3. When an ExtendedLink is clicked, it:
  • Calls the callback defined by the client container. It provides the path of the link that was just clicked.
  • The client container also has a mutable ref of type string[]. When the callback is called, the pathname provided as an argument is added to this array.
  • The client container has a useEffect that is triggered when the pathname changes. On every pathname change, it compares the head (element 0) of the mutable ref to the new path. If element 0 is not empty and it does not match the new path, it calls the server action which wraps revalidatePath.

In other words, we hold a list of routes that we want to invalidate (ExtendedLink links clicked) and we wait until someone leaves the page before we invalidate them.
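
Stripped of React, the queue logic described in the steps above can be sketched as a tiny class. InvalidationQueue is a hypothetical name; in the real component the array lives in a ref and the check runs inside useEffect.

```typescript
// Framework-free sketch of the bookkeeping done by the client container.
class InvalidationQueue {
  private readonly routes: string[] = [];

  // Called when an ExtendedLink is clicked: remember the route we are heading to.
  queue(route: string): void {
    this.routes.push(route);
  }

  // Called on every pathname change. Returns the route to revalidate, if any.
  onPathnameChange(pathname: string): string | undefined {
    if (this.routes[0] === pathname) {
      // We just arrived at the queued page, so leave it queued until we navigate away.
      return undefined;
    }
    // We are somewhere else now: hand back the oldest queued route for revalidation.
    return this.routes.shift();
  }
}

const queue = new InvalidationQueue();
queue.queue('/products/1');
console.log(queue.onPathnameChange('/products/1')); // undefined (still on the page)
console.log(queue.onPathnameChange('/')); // '/products/1' (left the page, revalidate it)
console.log(queue.onPathnameChange('/about')); // undefined (nothing queued)
```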

What are alternatives to this?

A handful of alternatives exist. You can read about some of them here. I provided instructions on using patch-package here.

What should I be worried about?

  1. You're using a server action. Server actions are in alpha, the interface might change.
  2. Your server will be doing more work. If you're on a serverless platform, this will make your functions work harder. That could be a problem! Especially if you're on Vercel's free tier, which apparently limits your revalidatePath calls to 100 per month. Be careful!
  3. I have not tested this thoroughly. I do not know what potential side-effects it might cause. Please use this carefully!

High Speed FTDI + Android Comms (OR) Why Am I Always Reading 0 Bytes?

I’ve been working on a project with some very unfamiliar tech. The project involves communication between a new Android app (Kotlin) and an FTDI 232R connected to an Arduino. I encountered a problem that baffled everyone on the team for weeks and was about to be labeled a “general incompatibility between FTDI devices and Android apps” until I stumbled upon the solution. Before I describe the solution, I’m going to document some basic details. While working on this, I was unable to find any examples of people having this problem, so there was an intense feeling of isolation as I struggled on and off for weeks to resolve it. My hope is that this might help someone identify the same problem in their system.

tl;dr

If you’re experiencing mystery “0 bytes available” errors, you might need to change your latency timer setting. The problem is described here. I also strongly recommend you read the longer document from which that excerpt is drawn, AN23B-04 Data Throughput, Latency and Handshaking. We immediately resolved our issue with a call to setLatencyTimer((byte) 1); and very small reads (64 bytes at a time, no more) but ultimately settled on an event character and larger reads. Full details below.

Detailed Notes

Our Arduino’s firmware is capable of sending a few different messages across the wire. Each message is small, anywhere from 16 bytes up to around 256. Most of these are on-demand: send a command from the application, the Arduino decodes it, then it sends one message in response that is either an ACK or the data that you’ve requested. There is one exception: one particular message from the app will trigger the start of an infinite stream of 44-byte messages at a frequency rate specified in the request. In this case, the Arduino is reading sensors, performing some basic analysis, and spitting it out across the wire for the app to do with as it pleases. The app reads this constant stream of bytes, does its own analysis, puts it on the screen, etc.

Our minimum acceptable streaming rate is 300hz but we hope for closer to 500hz or greater, so our baud is currently 460800.
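
A quick back-of-the-envelope check shows why that baud rate leaves room to spare. This sketch assumes 8N1 framing (10 bits on the wire per byte), which is an assumption about our serial settings rather than something stated above. The helper names are made up, and TypeScript is used only for the arithmetic.

```typescript
// Assumes 8N1 framing: 1 start bit + 8 data bits + 1 stop bit = 10 bits on the wire per byte.
const BITS_PER_BYTE_ON_WIRE = 10;

// Bits per second needed to carry a stream of fixed-size messages.
function requiredBitsPerSecond(messageBytes: number, messagesPerSecond: number): number {
  return messageBytes * messagesPerSecond * BITS_PER_BYTE_ON_WIRE;
}

// Highest whole-number message rate a given baud rate can carry.
function maxMessagesPerSecond(baud: number, messageBytes: number): number {
  return Math.floor(baud / (messageBytes * BITS_PER_BYTE_ON_WIRE));
}

console.log(requiredBitsPerSecond(44, 300)); // 132000
console.log(requiredBitsPerSecond(44, 500)); // 220000
console.log(maxMessagesPerSecond(460800, 44)); // 1047
console.log(maxMessagesPerSecond(115200, 44)); // 261
```

Under that assumption, 460800 baud carries the 44-byte stream at well over 500hz, while 115200 baud tops out below 300hz.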

We encountered an issue whereby the app was constantly being told 0 bytes were available for read. The problem was extremely inconsistent and weird. The following were true:

  • We could ALWAYS open the port from our app.
  • We could ALWAYS transmit successfully from the app to the Arduino. We knew this because logs on the firmware indicated that the right bytes arrived in the right order.
  • We would SOMETIMES receive the correct response. It was all or nothing: sometimes we would query for available bytes and be told 0, other times we would see the expected number.
  • We could RARELY start the data stream. Once we sent the message to start streaming, the app would always believe there were 0 bytes available for read. Once that state was encountered, no other messages would be sent across the wire until we rebooted the firmware. It seemed to be more likely to fail as our streaming rate exceeded 100hz. Our target was 300hz or greater, so this was a serious problem!

Adding to the mystery, this seemed specific to the FTDI chip. Our first draft of this used the Arduino’s programming port for serial data transfer at 115200 baud. We were losing a lot of packets from the lack of flow control but it never failed to respond to messages.

More troubling was the fact that a C++ test application seemed to communicate correctly. This pointed towards a code problem with the Android app.

We tried three different libraries in attempts to resolve this. Those were:

  • usb-serial-for-android - An open-source library that is pretty well maintained and offers a lot of features. Unfortunately, it doesn’t support automatic flow control, so we worried we wouldn’t be able to use it long term.
  • UsbSerial - Another open-source library. This one is not nearly as well maintained and it has quite a few open issues that describe some pretty heinous bugs. I opened an issue after I found that calling the wrong method during initialization would result in all your sent messages being replaced by two 0 bytes for every one byte in your message! Brutal. It supports flow control but it has so many problems that I unfortunately couldn’t recommend it, even if it supported what we needed.
  • FTDI’s official d2xx - The official closed-source library for FTDI devices. It hasn’t been updated in two years but by virtue of being official, we expected it to be more reliable or at least more full-featured. The closed-source part is a bummer and I think it would be a much better library if not for that, but that’s another story. This was the library we wound up using and we will continue to do so.

All three of these libraries exhibited the same behavior! This started looking like a major issue with FTDI devices. I ordered a few Prolific PL2303-based serial cables to test as an alternative but kept researching in the meantime.

I began looking at FTDI’s official test apps and their example Android app code. The example code is… not… great… but in taking notes, I came across a mysterious call to setLatencyTimer(). This led me to a note from FTDI that appeared to describe our problem exactly. It specifically remarks, “While the host controller is waiting for one of the above conditions to occur, NO data is received by our driver and hence the user’s application. The data, if there is any, is only finally transferred after one of the above conditions has occurred.” I did some more reading and found the longer AN232B-04 Data Throughput, Latency and Handshaking, which explained this and many other concepts. This document was particularly enlightening. I feel like the embedded software development world is full of extremely dense, unapproachable technical specs that assume a ton of highly specific knowledge; by comparison, this document was a breath of fresh air and explained things from a high enough level that I came away feeling more capable of anticipating behavior as I continued troubleshooting.

It appeared that we were never hitting any of the three rules fast enough to trigger a read. It still doesn’t totally make sense to me; I feel like we should have eventually hit 4 KB to trigger the send, but maybe I never let it sit long enough to get there? Or maybe there was another timeout value that was clearing the buffer before then. What I do know is that if I set the latency timer down to 1ms and ensured we never requested more than 64 bytes at a time, our data read problems went away. We could stream at 500hz and messages would usually start showing up as soon as we hit the button. This change was as simple as setLatencyTimer((byte) 1); and making sure that we never requested more than 64 bytes during a call to read. The immediate problem was solved and it was clear that we did not have some incompatibility between FTDI and our Android app.
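
Here is a minimal Java sketch of that first fix. `SerialDevice` and `InMemoryDevice` are hypothetical stand-ins so the example runs without hardware; they are not the d2xx API, and the real method names and signatures may differ. Only the setLatencyTimer((byte) 1) call and the 64-byte cap come from the description above.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical stand-in for the device handle; NOT the real d2xx API. */
interface SerialDevice {
    void setLatencyTimer(byte millis);
    int bytesAvailable();
    int read(byte[] buffer, int length); // returns the number of bytes copied
}

/** In-memory fake so the sketch runs without hardware. */
final class InMemoryDevice implements SerialDevice {
    private final byte[] pending;
    private int position = 0;
    int largestRequest = 0; // biggest read size this fake was asked for
    byte latencyMillis;     // last value passed to setLatencyTimer

    InMemoryDevice(byte[] pending) { this.pending = pending.clone(); }

    @Override public void setLatencyTimer(byte millis) { latencyMillis = millis; }
    @Override public int bytesAvailable() { return pending.length - position; }
    @Override public int read(byte[] buffer, int length) {
        largestRequest = Math.max(largestRequest, length);
        int count = Math.min(length, bytesAvailable());
        System.arraycopy(pending, position, buffer, 0, count);
        position += count;
        return count;
    }
}

final class CappedReader {
    static final int MAX_READ = 64; // never request more than this per call

    static void configure(SerialDevice device) {
        device.setLatencyTimer((byte) 1); // 1 ms, as described above
    }

    /** Reads everything currently available, at most MAX_READ bytes per call. */
    static byte[] drain(SerialDevice device) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[MAX_READ];
        int available;
        while ((available = device.bytesAvailable()) > 0) {
            int got = device.read(chunk, Math.min(available, MAX_READ));
            if (got <= 0) break; // nothing delivered this time; retry on a later pass
            out.write(chunk, 0, got);
        }
        return out.toByteArray();
    }
}
```

With a 200-byte backlog, `CappedReader.drain` issues four reads (64, 64, 64, and 8 bytes) instead of one large request.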

I say that it would “usually” start showing up because it still exhibited strange behavior. Very often, I would start the stream through the app’s interface and nothing would happen. Then I’d send another message (“get hardware version”) and not only would it get my hardware version, it would also recognize that data was streaming in. Other times, I would request our largest payload, a system configuration, and it would return 31 bytes of the 200+ we expect. Just like with the stream, I’d send any other message (“get firmware version”) and the remaining 200+ bytes would show up.

I wound up making a few other changes to resolve this problem and improve the behavior overall.

First, using more information gleaned from the Data Throughput, Latency and Handshaking document, I thought that we would be better off using the FTDI’s support for Event Characters than the latency timer. Our encoding rules use a 0 byte as a delimiter, so it was an obvious choice. This allowed me to increase our maximum read size up to 256 bytes, which helped in the event that our read loop was delayed and we had to quickly get through a backlog of data. (I could probably go higher but I’m being pretty careful right now, I want to keep things moving.) Finally, I modified the read loop to also be responsible for writes, added a FIFO queue for outgoing messages, and (crucially) a 50ms timeout of the loop after every single message sent. The 50ms timeout was the most significant piece – it was the final change that ensured that we stopped seeing partial messages or messages that only arrived after a subsequent send. I don’t have a good answer for why that was necessary but given the complexities of the d2xx library, reading from USB in general, the FTDI and its buffers, and the Arduino, it’s not too surprising that things can get out of sync if you’re moving fast.
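
The combined loop can be sketched roughly like this in Java. This is an illustration under assumptions, not the app’s actual code: `Port` and `SerialLoop` are hypothetical names, the 50ms timeout is read here as a pause after each write, and the pause is injected as a function so the sketch can be exercised without really sleeping.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;
import java.util.function.LongConsumer;

/** Hypothetical port abstraction; the real d2xx calls are not shown here. */
interface Port {
    void write(byte[] message);
    int read(byte[] buffer, int maxLength); // returns bytes copied, 0 if none
}

final class SerialLoop {
    static final int MAX_READ = 256;             // max read size from the post
    static final long PAUSE_AFTER_WRITE_MS = 50; // pause length from the post

    private final Port port;
    private final LongConsumer pause;            // e.g. a wrapper around Thread.sleep
    private final Queue<byte[]> outgoing = new ArrayDeque<>();

    SerialLoop(Port port, LongConsumer pause) {
        this.port = port;
        this.pause = pause;
    }

    /** Callable from any thread: queue a message instead of writing directly. */
    synchronized void enqueue(byte[] message) {
        outgoing.add(message);
    }

    private synchronized byte[] nextOutgoing() {
        return outgoing.poll(); // null when the queue is empty
    }

    /** One pass of the loop: send at most one queued message, pause, then read. */
    byte[] runOnce() {
        byte[] message = nextOutgoing();
        if (message != null) {
            port.write(message);
            pause.accept(PAUSE_AFTER_WRITE_MS); // let the device settle before reading
        }
        byte[] buffer = new byte[MAX_READ];
        int got = port.read(buffer, MAX_READ);
        return Arrays.copyOf(buffer, Math.max(got, 0));
    }
}
```

Because every write goes through the same loop as the reads, messages leave in the order they were queued and a read never races a write. In a real app the `pause` argument would wrap Thread.sleep, and `runOnce` would be called repeatedly from a dedicated thread.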

With the implementation of the event character, the buffered writes added to the loop, and the timeout after writing, we appear to be running smoothly. So smoothly, in fact, that I was able to remove the setLatencyTimer call entirely and just leave it at its default. As configured, data is sent as soon as a 0 is hit or 256 bytes are available, whichever comes first. (Typing this out, I realize that I should probably just set it to the exact size of our largest message, there’s no way it could ever be smaller and having an incomplete message does us no good!)
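
Whatever the read size, a single read can still end in the middle of a message, so the app side has to reassemble messages on the 0 delimiter. The post does not show how the app does this; the Java sketch below is one hypothetical way, with `FrameAssembler` as an invented name.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a byte stream into messages on a 0 delimiter. */
final class FrameAssembler {
    private static final byte DELIMITER = 0;
    // Bytes of a message whose delimiter has not arrived yet.
    private final ByteArrayOutputStream partial = new ByteArrayOutputStream();

    /**
     * Feeds the first `length` bytes of `chunk` and returns every message
     * completed by this chunk. Leftover bytes are kept for the next call.
     */
    List<byte[]> feed(byte[] chunk, int length) {
        List<byte[]> frames = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            if (chunk[i] == DELIMITER) {
                if (partial.size() > 0) { // skip empty frames from repeated delimiters
                    frames.add(partial.toByteArray());
                    partial.reset();
                }
            } else {
                partial.write(chunk[i]);
            }
        }
        return frames;
    }
}
```

Feeding it `{1, 2, 0, 3}` and then `{4, 0, 0, 5, 0}` yields one message `{1, 2}` from the first call and two messages, `{3, 4}` and `{5}`, from the second, so a message split across reads is delivered only once it is complete.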

To summarize, we went through two rounds of improvements that changed our situation from bleak to beautiful.

Round 1:

  • Set the FTDI’s latency timer to 1ms
  • Limit our max read size to something small to prevent a “jerky” feel

Round 2:

  • Revert latency timer to default
  • Enable an event character keyed to our delimiter, a 0 byte – this is the key
  • Set a max read that’s a bit bigger than our typical messages to help us catch up if we ever have a huge backlog and want to get the queue down (again, I don’t know what this situation is)

As it happens, d2xx is the only one of the three libraries that supports configuring the latency timer, the event character, and flow control. One of the two open-source libraries supports the latency timer, the other claims to support flow control, and neither supports the event character, so we’ll be sticking with the closed-source official library.

It appears that our use case of extremely high streaming rate combined with tiny messages at a very high baud is somewhat unique. If we had been sending larger messages at a slower rate, I don’t think we would have encountered this. Our 44 byte messages at 300hz were the problem.
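
To put rough numbers on that, here is a small Java sketch of the arithmetic. It counts payload bytes only and takes 4 KB to mean 4096 bytes; both are assumptions, and `StreamMath` is a made-up helper name.

```java
/** Back-of-the-envelope helpers; payload bytes only, no protocol overhead. */
final class StreamMath {
    /** Payload bytes produced per second at a given message size and rate. */
    static double bytesPerSecond(int messageBytes, int rateHz) {
        return (double) messageBytes * rateHz;
    }

    /** Milliseconds needed to accumulate targetBytes at the given byte rate. */
    static double millisToAccumulate(int targetBytes, double bytesPerSecond) {
        return targetBytes * 1000.0 / bytesPerSecond;
    }

    /** Milliseconds between consecutive messages at a given rate. */
    static double millisBetweenMessages(int rateHz) {
        return 1000.0 / rateHz;
    }
}
```

`StreamMath.bytesPerSecond(44, 300)` gives 13,200 bytes per second, with a new message roughly every 3.3 ms. At that rate, 64 bytes accumulate in just under 5 ms and 4096 bytes in about 310 ms.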

I spent many lonely weeks fighting with this. Failure to resolve it would have been a major problem for the project. In the end, the solutions I found were new to the whole team, which included many people with much more FTDI experience than me, which goes to show how esoteric some of these configuration parameters can be. This is my first project writing Kotlin, working on Android, or using FTDI devices at all, so while I’m disappointed that it was such an unpleasant struggle to get it done, I am pleased to have it behind me. I sincerely hope this helps somebody avoid going through the same experience.
