3B2 System Timers and SIMH

I think I made a grievous error when I originally laid out how the 3B2 system timers would work in SIMH, and last night I started down the path of correcting that.

The 3B2/400 has multiple sources of clocks and interrupts: There’s a programmable interval timer with three outputs being driven at 100 kHz, a timer on the UART that runs at 235 kHz, and a Time-of-Day clock. They’re all driven at different speeds, and they’re all important to system functionality.

SIMH offers a timing service that allows the developer to tie any of these clocks to the real wall clock. This is essential for time-of-day clocks or anything else that wants to keep track of time. I used this functionality to drive the UART and programmable interval timers at their correct clock speeds.
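The pattern was roughly the standard SIMH calibrated-clock idiom, something like this sketch (the names and numbers are made up here, and this is only a sketch of the approach, not the simulator’s actual code):

```c
/* Roughly the standard SIMH pattern for a wall-clock-calibrated timer,
 * with made-up names (pit_unit, pit_svc, TMR_PIT, PIT_TPS). */
#include "sim_defs.h"

#define PIT_TPS 100     /* assumed: service the timer 100 times per second */
#define TMR_PIT 1       /* assumed calibrated-timer index */

static t_stat pit_svc(UNIT *uptr);
static UNIT pit_unit = { UDATA(&pit_svc, 0, 0) };

static t_stat pit_svc(UNIT *uptr)
{
    int32 t = sim_rtcn_calb(PIT_TPS, TMR_PIT);  /* recalibrate against wall time */
    sim_activate(uptr, t);                      /* reschedule ~1/PIT_TPS sec out */
    /* ... advance simulated timer and interrupt state here ... */
    return SCPE_OK;
}
```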

But that’s completely wrong. Of course this is obvious in retrospect, but it seemed like a good idea at the time. The problem is that the main CPU clock is free to run as fast as it can on the SIMH host. On some hosts it will run very fast; on others it will run quite a bit slower. You can’t possibly know how fast the simulated CPU is stepping.

When your timers are tied to the wall clock but your CPU is running as fast as it can, there are going to be all kinds of horrible timing issues. I had lots of unpredictable and non-reproducible behavior.

Last night, I undid all of that. The timers are now counting down in CPU machine cycles. I used the simple power of arithmetic to figure out how many CPU machine cycles each step of each timer would take, and just did that instead.
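In rough C, the conversion is nothing more than this (the CPU clock rate here is an assumption for illustration, and none of these names are the simulator’s real ones):

```c
/* A minimal sketch of the cycle-counting approach, assuming a nominal
 * CPU clock; the 10 MHz figure and all names here are illustrative. */
#include <stdint.h>

#define CPU_HZ        10000000u   /* assumed nominal CPU clock rate */
#define PIT_HZ          100000u   /* programmable interval timer input */
#define UART_TIMER_HZ   235000u   /* UART timer input */

/* CPU machine cycles per timer step, rounded to the nearest cycle. */
static uint32_t cycles_per_step(uint32_t timer_hz)
{
    return (CPU_HZ + timer_hz / 2u) / timer_hz;
}

/* With the assumed clock: cycles_per_step(PIT_HZ) == 100 and
 * cycles_per_step(UART_TIMER_HZ) == 43. Each timer can then be
 * scheduled in instruction/cycle units, e.g.:
 *
 *     sim_activate(&pit_unit, (int32) cycles_per_step(PIT_HZ));
 */
```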

Now, it seems like everything is a lot more stable, and much less unpredictable.

Simulating the Disk Controller

I spent last night probing my 3B2/310’s hard disk controller with a logic analyzer so I can see exactly how it behaves, both with and without a hard disk attached to the system. It proved to be very tricky to get the logic analyzer probes attached because the motherboard is so incredibly dense. In fact, I couldn’t get a probe attached to the chip select line no matter how hard I tried. There just wasn’t any room to fit a probe between the chip and a nearby resistor array, so I resorted to using a little piece of wire to just touch against the pin. I could have used three hands for that operation.

I exported the data from Saleae Logic as a CSV file, and wrote a little Python script to parse the commands out of it. This is what I got:

I can now match this sequence up exactly with my simulator, so I can reproduce the precise behavior, including the delay between status 0x80 (which means “I’m busy, try again later”) and 0x40 (which means “I’m done now”).
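In the simulator, that delay boils down to something like the following sketch (the names and the cycle count are placeholders; the real delay comes from the logic analyzer capture):

```c
/* Sketch of reproducing the busy-then-done status sequence. */
#include <stdint.h>

#define ID_STAT_BUSY   0x80u   /* "I'm busy, try again later" */
#define ID_STAT_DONE   0x40u   /* "I'm done now" */
#define ID_BUSY_CYCLES 2000u   /* placeholder: busy window after a command */

static uint32_t cmd_issued_at;         /* cycle count when the last command started */

void id_start_command(uint32_t now)
{
    cmd_issued_at = now;
}

uint8_t id_read_status(uint32_t now)
{
    if (now - cmd_issued_at < ID_BUSY_CYCLES)
        return ID_STAT_BUSY;           /* still inside the busy window */
    return ID_STAT_DONE;               /* command has completed */
}
```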

Unfortunately, the simulator still hangs after loading and executing /etc/init from the boot floppy, but now at least I know it has nothing to do with a badly behaving disk controller. I have proof that mine behaves the same way as the real thing.

It’s Hard Disk Time

My next mini-project in the 3B2/400 simulator will be emulating the hard disk. The 3B2/400 used a NEC μPD7261A hard disk controller (PDF datasheet here), which has proved to be harder to emulate correctly than I would have liked.

So far, my hard disk controller emulation has been limited to the most minimal functionality needed to get the emulator to pass self-checks at all. Other than that, it’s just a skeleton. But I believe that it’s actually hanging up the floppy boot process now when UNIX tries to discover what hard drives are attached, so it’s time to get serious and fix it.

My progress isn’t good. I am following the datasheet to the letter, trying to give the correct status bits at the correct time, but the 3B2 just gets confused. It never even tries to read data off the drive; it just gives up trying to read status bits. So, clearly I’m doing something wrong, but I don’t know what it is.

Tonight I will strap a logic analyzer to the μPD7261A in my real 3B2 and see exactly what it’s doing. I’ll report on my findings when I have them.

The Equipped Device Table Question is Answered

And just like that, it’s solved. I figured out the mystery of the Equipped Device Table.

The answer was in some obscure piece of System V Release 3 source code. The 3B2 system board has a 16-bit register called the Control and Status Register (CSR). In the CSR is a bit named TIMEO that I had never figured out.

It turns out that I just wasn’t reading the disassembled ROM code closely enough. The exception handler checks this status bit whenever it catches a processor exception while filling the EDT. If the bit is set, it skips the device.

So what is TIMEO? It’s the System Bus Timeout flag, according to the SVR3 source code.

The correct behavior, then, when nothing is listening at an I/O card’s address, is to raise an External Memory Exception and set the TIMEO bit in the CSR. Once I implemented that in my simulator, the EDT started working exactly the same as it does on my real 3B2/310. Success!
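In simulator terms, the fix amounts to something like this sketch (the bit position and helper names are illustrative, not the real code):

```c
/* Sketch: a read from an empty I/O slot sets TIMEO in the CSR and
 * raises an External Memory Exception. CSR_TIMEO's position and the
 * helper names are placeholders. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define CSR_TIMEO (1u << 15)     /* placeholder bit position */

static uint16_t csr;             /* Control and Status Register */

/* Hypothetical stand-ins for the simulator's real helpers. */
static void cpu_external_memory_fault(uint32_t pa) {
    printf("External Memory Exception at 0x%08x\n", (unsigned) pa);
}
static bool slot_is_populated(uint32_t pa) {
    (void) pa;
    return false;                /* pretend every slot is empty */
}

static uint32_t io_read(uint32_t pa) {
    if (!slot_is_populated(pa)) {
        csr |= CSR_TIMEO;                 /* flag the bus timeout... */
        cpu_external_memory_fault(pa);    /* ...and raise the exception */
        return 0;
    }
    /* normal card register read would go here */
    return 0;
}

int main(void) {
    io_read(0x200000);           /* hypothetical I/O slot address */
    printf("CSR TIMEO set: %d\n", (csr & CSR_TIMEO) != 0);
    return 0;
}
```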

The Mystery of the Equipped Device Table

[EDIT: I have made a follow-up post detailing the answer to this mystery!]

There is yet one more puzzling aspect of the 3B2 that I do not yet understand, and that is the Equipped Device Table, or EDT. I’ve documented the nitty-gritty details on my main 3B2 reverse-engineering page, so I won’t bore you with them here. But the short version is this:

The EDT is what tells the system what I/O boards are installed. On startup, the ROM code probes each I/O slot to see what kind of card is in it. Each card has a read-only register that returns the board’s 16-bit Option Number, and the ROM fills in a special table in RAM with the Option Number for each card it discovers. It doesn’t fill in any other information, just the Option Number. For example, the SCSI expansion card returns option number 0x100, and the EPORTS card returns option number 0x102. That’s the only information the ROM gets from the card. Later, the user can run a program called filledt that looks up all kinds of other information about the card in a database held on disk.

So here’s today’s puzzler: How does the system decide that there’s nothing in a slot?

This is what I see when I boot my emulator:

But, this is what I see when I boot my real 3B2/310 (with no hard disk attached):

Clearly, the real 3B2 has discovered that there’s nothing in the expansion slots. I have confirmed this by running filledt on both the real system, and on the simulator.

This raises the question of how exactly the real 3B2 uses an option number read to determine there’s nothing in the slot. The ROM code should reveal all, but the code is very simple and straightforward: It just reads from an address on each card and sticks the result into memory, that’s all. It doesn’t branch on any magic values or anything: Just read a number, put it in memory.

The ROM also sets up an exception handler during the card slot read such that any exception causes the entire boot process to fail, so apparently it does not expect there to be any kind of bus read error.

So how? I’ve tried the following experiments:

  • Making each empty slot return 0xffff on read.
  • Making each empty slot return 0x0000 on read.
  • Making each empty slot raise an External Memory Exception on read.
  • Making each empty slot raise an External Memory Exception on write.

None of these, it seems, does the right thing. This is another problem that could probably be solved easily by having access to the right documentation, but that’s a rant for another time.

A Last Word on MMU Caching

Over the weekend I conducted several experiments with caching using my 3B2 simulator. I learned a few critical bits of information. For background, see this post and this post.

The first and most important thing I learned is that indexing cache entries only by their tags does not work. There are collisions galore, and no way to recover from them. However, if I index SD cache entries by the full SSL, and PD cache entries by the full SSL+PSL, everything seems to work perfectly. This leaves several big questions unanswered, but they are probably unanswerable. After all, I have no way of looking inside the MMU to see how it actually indexes entries in its cache, I can only go on published information and make educated guesses.
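Concretely, my working scheme looks something like this sketch; the structures and names here are illustrative, not the WE32101’s actual internal layout:

```c
/* Sketch of indexing SD cache entries by the full SSL and PD cache
 * entries by the full SSL+PSL. Layout and names are illustrative only. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define SD_CACHE_SIZE 8    /* SD entries per section (32 total) */
#define PD_CACHE_SIZE 16   /* PD entries per section (64 total) */

typedef struct {
    bool     valid;
    uint32_t ssl;              /* full 13-bit SSL as the key */
    uint32_t sd_lo, sd_hi;     /* the cached 8-byte Segment Descriptor */
} sd_cache_entry;

typedef struct {
    bool     valid;
    uint32_t ssl;              /* full 13-bit SSL ...          */
    uint32_t psl;              /* ... plus the full 6-bit PSL  */
    uint32_t pd;               /* the cached 4-byte Page Descriptor */
} pd_cache_entry;

static sd_cache_entry sd_cache[4][SD_CACHE_SIZE];   /* one set per section */
static pd_cache_entry pd_cache[4][PD_CACHE_SIZE];

/* Find a cached SD by section and full SSL; NULL means a miss. */
static sd_cache_entry *sd_lookup(uint32_t sid, uint32_t ssl)
{
    for (int i = 0; i < SD_CACHE_SIZE; i++) {
        sd_cache_entry *e = &sd_cache[sid][i];
        if (e->valid && e->ssl == ssl)
            return e;
    }
    return NULL;   /* miss: fetch the SD from main memory and insert it */
}
```

The PD lookup works the same way, except that it compares both the SSL and the PSL before declaring a hit.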

Second, I learned that implementing MMU caching is required for the 3B2 simulator to work correctly. Until this weekend, I had not implemented any caching in the simulated MMU because I assumed that caching was only used for performance reasons and could be skipped. But this is not true. In fact, UNIX SVR3 changes page descriptors in memory without flushing the cache and relies on the MMU to use the old values until it requests a flush. Not having implemented caching was a source of several serious bugs in the simulator.

Third, I learned that the “Cacheable” bit in segment descriptors is inverted. When it’s a 1, caching is disabled. When it’s a 0, caching is enabled.
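In code form, the check ends up reading backwards from what you might expect (the bit position here is just a placeholder, not the documented layout):

```c
/* Sketch of the inverted "cacheable" bit: the descriptor bit acts as a
 * cache-inhibit flag. The bit position is a placeholder. */
#include <stdint.h>

#define SD_CACHE_INHIBIT (1u << 2)   /* placeholder bit position */

static int sd_is_cacheable(uint32_t sd_word0)
{
    /* 1 means "do not cache this descriptor"; 0 means cache it. */
    return (sd_word0 & SD_CACHE_INHIBIT) == 0;
}
```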

The 3B2/400 simulator now makes it all the way through booting the SVR3 kernel and starting /etc/init. There are some other bugs preventing init from spawning new processes, but I hope to have these ironed out soon.

MMU Caching Redux

I had a Eureka! moment last night about MMU caching, but it all came tumbling down this morning.

My realization was that the Segment Descriptors are 8 bytes long, and that Page Descriptors are 4 bytes long. So, if we assume that the virtual address encodes the addresses of the SDs and PDs on word-aligned boundaries (and SDs and PDs are indeed word-aligned in memory), then you don’t need the bottom three bits for SD addresses, nor do you need the bottom two bits for PD addresses. Voila!

But this morning, I remembered two very important facts:

  • The SSL field in the virtual address is an index into the table of SDs, not an address offset.
  • Likewise, the PSL field is an index into the table of PDs, not an address offset.

From the WE32100 manual, here’s a pictorial representation of how an address is translated when using contiguous segments.

And here’s a concrete example. Immediately after the MMU is turned on in the 3B2/400 boot process, it translates the virtual address 0x40060f8c. Let’s break that down.

The SSL, bits 17–29 of the address, is 3 (b0000000000011). This is an index into a table, so to find SD number 3 in section 1, you take a base address for section 1 and add (3 * 8), since each SD is eight bytes long—two words.
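Here’s that breakdown in a few lines of C (the SD table base address is a made-up placeholder):

```c
/* Worked example of the translation step above for VA 0x40060f8c.
 * The SD table base is a made-up placeholder value. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t va  = 0x40060f8c;
    uint32_t sid = (va >> 30) & 0x3u;     /* bits 30-31: section 1 */
    uint32_t ssl = (va >> 17) & 0x1fffu;  /* bits 17-29: 3 */
    uint32_t sd_table_base = 0x02000000u; /* placeholder base for section 1 */
    uint32_t sd_addr = sd_table_base + ssl * 8u;  /* each SD is eight bytes */

    printf("SID=%u SSL=%u SD address=0x%08x\n",
           (unsigned) sid, (unsigned) ssl, (unsigned) sd_addr);
    return 0;
}
```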

So now I’m right back at square one. How does caching work if these are table offsets instead of address offsets? If the TAG field of an SD cache entry only holds bits 20–29, I don’t yet grok how it uniquely identifies an SD.

For further reading, you can check out The WE32100 Microprocessor Information Manual pages 308–315, and The WE32101 MMU Datasheet pages 164–175.

MMU Caching for Fun and Profit

I’m in the middle of a very long, very drawn out project to try to emulate the AT&T 3B2/400 computer. I should probably have been sharing my progress more frequently than I have been, but it has for the most part been a painful and solitary endeavor.

Today, though, there is something in particular that is bothering me greatly, and I must yell into the void to get this frustration off my chest. And that is, how in the hell does the MMU cache work?

So first, a little background.

The MMU on the 3B2/400 divides virtual addresses into four parts. The first two bits (the SID) identify the section of virtual memory, of which there are four. The next 13 bits (the SSL) select either a contiguous segment of memory, or a set of pages. The details of how contiguous segments and paged memory work are not that important for this discussion.

The actual segments and pages are described in the 3B2’s memory using Segment Descriptors and Page Descriptors. These descriptors are directions to the MMU that tell it how to do a translation from virtual address to physical address.

For performance, the MMU doesn’t want to go talking to main memory every time it does a translation. So, it has an on-chip cache. It can store 8 Segment Descriptors per section (a total of 32), and it can store 16 Page Descriptors per section (a total of 64). The cache keeps things humming along.

So far, so good, right? Makes sense? Following along?

Here’s where things get crazy. Here’s how the MMU datasheet describes the internals of the cache entries.

Note the language on the “tag” field for each of the cache entry formats. These tags are supposed to uniquely identify the Segment Descriptor and Page Descriptor entries, right? And I presume they’re used for looking up a virtual address in the cache?

But wait! Bits 20–29 can’t possibly be enough to uniquely identify a segment descriptor. To look up a Segment Descriptor in main memory, you need all 13 bits of the SSL! Two virtual addresses whose SSLs differ only in bits 17–19 would carry the very same tag. There would surely be collisions in the cache, right?

And the problem extends to the Page Descriptor cache. Again, bits 13–16 and 18–29 of the virtual address can’t identify a Page Descriptor uniquely. You need all 13 bits of the SSL and all 6 bits of the PSL to locate a Page Descriptor in main memory.

This is my confusion. If those bits identify an SD and a PD uniquely in the cache, why can’t they identify them uniquely in main memory? And why aren’t there collisions in the cache?

This mysterious machine gets harder to understand all the time.

The 1993 Social Network

I was a freshman at Cornell University in Fall of 1992 when I logged into my first UNIX system.

I’d heard of UNIX before, of course—it was a popular subject in trade magazines of the period, and if you tinkered with computers you’d probably have heard of it—but I’d never actually used it. So one day I marched over to the campus IT department to register for a UNIX account. It took some wrangling, but very shortly I was drinking from the UNIX firehose.

Compared to a lot of other schools, Cornell’s UNIX cluster was small and under-powered. Resources were scarce. The whole campus had to share four DECstation 5000s running ULTRIX with home partitions mounted over NFS, so the IT department actively discouraged new users. I don’t remember the specs (you can use your imagination), but I do remember how laggy things got when there were 30 or so users logged into each system. We called this cluster “The Cruxen” because the systems were named crux1, crux2, crux3, and crux4.

The thing that made this cluster special was not the hardware or the software; it was the community that formed around it. By mid-1993, a great many of my friends had accounts on the Cruxen, and it didn’t take long for the cluster to become a place to meet, talk, share ideas, and kvetch. One of the most common ways to talk to each other was the UNIX utility write, which allowed you to send a short one- or two-line message to a user logged in on another terminal. Long before the term ‘instant messenger’ had been coined, we were doing it on UNIX. We used last and finger to see when someone had last logged in, and to watch for them to log in again. For more immediate, longer-form communication, we used talk to connect two terminals together and watch each other type back and forth. And of course there was always e-mail, and Usenet, a global network of hundreds of computer bulletin boards covering every subject under the sun. The Cruxen were a social network, as surely as Facebook is today.

Really, it was more than a social network. It was a Social Network++, a fully programmable social network. Interesting projects sprang up on the Cruxen. It was easy to share scripts and coding projects back and forth when everyone had access to the same filesystems, programming languages, and tools. Of course sometimes disasters happened, but they were rare. Once, a friend built a distributed queue manager for POV-Ray rendering, so that all the Cruxen could render the frames of an animation in parallel. He did not leave enough idle time, and ended up taking down the whole system. I myself once almost took it down by seeing what would happen if I created a .forward file that forwarded email to myself (discovery: it formed an infinite loop and quickly filled up the mail spool, bringing upon me the wrath of the admins). It was immense fun, and we felt like we were on the cutting edge.

But the fact is, we were late to the game. This scene had already been played out many times and in many places. From the late 1960s onward, big multi-user computer systems were places where people could form communities, whether running MULTICS or TOPS-20 or ITS or UNIX or VMS. This sort of thing was commonplace in those environments. By the time we found the Cruxen in the early 1990s, the timesharing experience was already on the wane. By 2000, it was largely a thing of the past. Desktop computers that could do everything the UNIX cluster could do were cheap and readily available, and there was no need for a shared environment any more. Each of us now computes alone.

Is there any way to recapture this sort of experience? Yes! One of the oldest and best known is the SDF Public Access UNIX System. They’ve been in business since 1987, so they have considerable experience providing a UNIX cluster environment to thousands of users.

More recently, Paul Ford (@ftrain on Twitter) created an accidental phenomenon when he launched tilde.club, a place to build and share web pages on a plain-old UNIX box, just like we did back in the ’90s when the World Wide Web was young and new and we ran with Perl scissors.

And, finally, there’s my own brand new project, RetroNET, whose goal is to give a Cruxen-like experience to hackers and tinkerers and makers.

We can never go back to the days when we had to use a cluster to get our work done, nor would I want to. But we can still recapture some of the feel of that time, and I think we can still do good things with it here in the 21st century. So, whether you’ve used a UNIX system before or not, whether you lived through that time or you didn’t, I encourage you to at least give it a try. You never know who you’ll meet, or what you’ll learn.

Retrochallenge: The Final Entry

Well. Here it is, the final entry for my Summer Retrochallenge project. I wanted to do more, but as so often happens, real life intervened and I didn’t have nearly as much time to work on it as I’d wanted. C’est la vie!

But I’m still proud of what I accomplished. I have a working ROM monitor, and I’m happy to report that as a final hurrah, I got it fully integrated with Lee Davison’s Enhanced 6502 BASIC.

Integrating with BASIC

I didn’t start this project thinking I’d tackle BASIC integration, but as Retrochallenge drew to a close, it seemed like it might be the best use of my precious time. What I mean by “Integration” here is simply having my ROM monitor code live side by side with BASIC and provide the underlying I/O functionality needed for BASIC to talk to the terminal and get input, plus the ability to take over and run in the foreground when the user wants it.

If you’ve ever used an Apple II, you may recall that it, too, has a built-in ROM monitor, and it can be reached by typing CALL -151 into Applesoft BASIC. Well, when running Enhanced 6502 BASIC with my monitor, you can get the same thing by typing CALL -1239. Once you’re in the ROM monitor, you can quit back to BASIC just by typing Q. Like so:

BASIC Interop

It actually turned out to be fairly simple to get the integration working.

The first thing I had to do was re-organize my memory usage so as not to conflict with EhBASIC’s memory map:

There’s not much room left in Page Zero after EhBASIC gets done with it, but the locations $DE through $E2 are free, so I made use of them. Page Two is similar, so I stuck my variables all the way up at $02A0 and on.

After that, I had to change the implementation of CIN a little bit, because EhBASIC requires that it not block. Instead, it expects this routine to return immediately and set the Carry flag if a character was read, and clear the Carry flag if no character was read. The new routine looks like this:

It’s a little ugly and could easily be optimized at the expense of readability, but hey, it works!

So that’s it. With just those changes, I was pretty much able to integrate the monitor into EhBASIC in just a few hours. As always, all of my code is available on GitHub for your viewing pleasure.

I’d also like to add that this has been a fantastic Summer Retrochallenge. There were so many amazing projects, I can’t even pick a favorite. I’ve really enjoyed reading what everyone else has been up to, and I’m already looking forward to January. Onward and upward, everyone!