
Claude Code modernizes a legacy COBOL codebase

TL;DR

  • Claude Code can automate the modernization of COBOL applications, transforming them into modern languages like Java. It significantly streamlines the discovery, documentation, and migration process.
  • The initial phase focuses on generating extensive documentation, mapping complex business logic, dependencies, and architecture, which is crucial for undocumented legacy systems.
  • The migration ensures bit-for-bit fidelity with the original COBOL logic while producing idiomatic, maintainable Java code verified through a robust dual test harness.

Takeaways

  • Use specialized sub-agents (e.g., documentation expert, translator) with isolated context windows to avoid polluting the main thread during complex tasks.
  • Employ "thinking mode" for architectural analysis and "planning mode" for comprehensive migration strategy development before any code changes.
  • Generate detailed documentation beyond simple comments, including business workflows, dependency maps, and a glossary of cryptic program names.
  • Utilize tools like Mermaid diagrams to visualize complex workflows, such as daily batch processing and data flow.
  • Follow a multi-phase migration plan covering project structure, data model translation (e.g., copybooks to Java classes), IO layer compatibility, business logic conversion, and dual test harness creation.
  • Prioritize creating idiomatic modern code (e.g., Java classes with appropriate design patterns and error handling) rather than a direct, unmaintainable syntax translation.
  • Implement rigorous verification by comparing not only final outputs but also intermediate calculations, file writes, and data transformations to ensure bit-for-bit fidelity.
  • The techniques are scalable to much larger codebases and can run autonomously for extended periods (e.g., over 30 hours) for extensive documentation and migration efforts.
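
The data model translation step (copybooks to Java classes) can be illustrated with a minimal sketch. The copybook layout in the comment, the `AccountRecord` name, and its three fields are assumptions made up for illustration; they are not taken from the demo repository, and a real migration would also have to handle packed or zoned decimals and the original character encoding.

```java
import java.math.BigDecimal;

/*
 * Hypothetical copybook fragment this record mirrors (31 characters per line):
 *   01 ACCT-REC.
 *      05 ACCT-ID     PIC 9(11).
 *      05 ACCT-GROUP  PIC X(10).
 *      05 ACCT-BAL    PIC S9(7)V99 SIGN IS LEADING SEPARATE.
 */
record AccountRecord(long accountId, String group, BigDecimal balance) {

    static final int WIDTH = 31;

    /** Parses one fixed-width line laid out as in the assumed copybook above. */
    static AccountRecord parse(String line) {
        if (line.length() != WIDTH) {
            throw new IllegalArgumentException(
                "expected " + WIDTH + " characters, got " + line.length());
        }
        long id = Long.parseLong(line.substring(0, 11));
        String group = line.substring(11, 21).stripTrailing();
        // V99 is an implied decimal point: the last two digits are the fraction.
        BigDecimal balance = new BigDecimal(line.substring(21, 31)).movePointLeft(2);
        return new AccountRecord(id, group, balance);
    }

    /** Writes the record back in the same fixed-width layout. */
    String format() {
        String sign = balance.signum() < 0 ? "-" : "+";
        long cents = balance.abs().movePointRight(2).longValueExact();
        return String.format("%011d%-10s%s%09d", accountId, group, sign, cents);
    }
}
```

Using `BigDecimal` rather than `double` keeps the fixed-point arithmetic exact, which matters when the migrated output has to match the COBOL output byte for byte. Parsing a line and calling `format()` on the result should reproduce the original line.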

Vocabulary

COBOL — A high-level programming language primarily used for business, finance, and administrative systems on mainframe computers.
Mainframe Modernization — The process of updating or replacing legacy systems running on mainframe computers with modern technologies and architectures.
Copybook — In COBOL, a file containing reusable data structures or code segments, included by programs to ensure consistent data definitions.
JCL script — Job Control Language; a scripting language used on IBM mainframe operating systems to tell the system how to run a batch job or start a subsystem.
Sub-agent — An autonomous AI agent specialized for a particular task, often invoked in parallel by a main agent and operating with its own context.
Context window — The amount of text or tokens an AI model can process or "remember" at one time, affecting its ability to understand and generate relevant responses.
Mermaid diagrams — A JavaScript-based tool for generating diagrams and flowcharts from text-based definitions, often used for documentation.
Dual test harness — A testing setup where both the original legacy code and the new migrated code are run with the same inputs, and their outputs are compared to ensure equivalence.
Bit-for-bit fidelity — A state where a migrated system's output, calculations, and internal data transformations precisely match the original system's down to the lowest level.
Idiomatic Java — Java code that follows the standard conventions, best practices, and common patterns expected within the Java development community, making it readable and maintainable.

Transcript

Let's explore how developers can use Claude Code to modernize a COBOL codebase. For the purposes of this demo, we use AWS's mainframe modernization demo repository. This is a medium-sized credit card management system with around 100 files, including COBOL programs, copybooks, and JCL scripts.
Phase one: discovery and documentation. Our sample COBOL codebase has almost no documentation. This is, of course, common with legacy codebases, where critical business logic and regulatory requirements are embedded within undocumented code. The developers who wrote the code have long since left the organization, and developers familiar with COBOL are hard to hire. We first create a specialized sub-agent using Claude Code's /agents command. This was our COBOL documentation expert and translator. Sub-agents can be invoked by Claude Code in parallel, and they operate with their own isolated context windows to avoid polluting the main thread. We enabled thinking mode and asked Claude Code to analyze the architecture of the codebase. Claude Code created a to-do list of all 94 files and tracked its progress to ensure no files were processed twice and nothing was missed.
The documentation Claude produced went beyond simple code comments. For example, let's look at the interest calculation program, CBACT04C. It extracted the complete business workflow: how the program reads transaction category balances, looks up interest rates by account group, applies business rules for fallback rates, and updates account records. Claude did this for each file, but also created two memory files as plain text. catalog.txt translates cryptic names like CBACT04C into "interest calculator batch program." CBACT04C.txt maps every dependency using a simple pipe-delimited format.
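
A pipe-delimited dependency index of this kind is easy to turn into a diagram. The sketch below assumes each line has the form `SOURCE|RELATION|TARGET`; that layout, the `DependencyIndex` class, and the target names used in the usage note are hypothetical, since the actual format of the memory files is not shown in the demo.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: read pipe-delimited dependency lines and render them as a Mermaid flowchart. */
final class DependencyIndex {

    /** One edge of the dependency graph, e.g. CBACT04C reads TCATBAL. */
    record Edge(String source, String relation, String target) {}

    /** Parses lines of the assumed form SOURCE|RELATION|TARGET, skipping blanks and '#' comments. */
    static List<Edge> parse(List<String> lines) {
        List<Edge> edges = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\\|", -1);
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected 3 fields: " + line);
            }
            edges.add(new Edge(parts[0].strip(), parts[1].strip(), parts[2].strip()));
        }
        return edges;
    }

    /** Renders the edges as Mermaid flowchart source, one labeled arrow per edge. */
    static String toMermaid(List<Edge> edges) {
        StringBuilder out = new StringBuilder("flowchart LR\n");
        for (Edge e : edges) {
            out.append("    ").append(e.source())
               .append(" -->|").append(e.relation()).append("| ")
               .append(e.target()).append('\n');
        }
        return out.toString();
    }
}
```

For example, the lines `CBACT04C|READS|TCATBAL` and `CBACT04C|UPDATES|ACCOUNT` (made-up targets) would become two labeled arrows leaving the `CBACT04C` node. A flat, line-oriented format like this is also cheap for an agent to re-read and append to during a long run.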
Using these indices, Claude then generated Mermaid diagrams: a complete map of the daily batch processing workflow, showing how the data flows from transaction input through posting, interest calculation, and finally to customer statements. In this demo, Claude Code ran continuously for an hour to draft over 100 pages of documentation. Claude Code is capable of running for over 30 hours autonomously, and the techniques used here scale to much, much larger codebases.
Phase two: migration and verification. After thoroughly documenting the COBOL codebase, we asked Claude to migrate one of its core features to Java. We switched to planning mode to ensure Claude would think through the entire migration strategy without prematurely editing files. Claude analyzed the program formally known as CBACT04C and identified complex COBOL patterns like control break processing and multi-file coordination. Claude developed a migration plan for this feature with five phases. One, create the project structure. Two, translate data models from copybooks to Java classes. Three, build the IO layer compatible with the original file formats. Four, convert business logic while preserving COBOL-specific behaviors. And finally, create a dual test harness, one using GnuCOBOL 3.2.0 for the original codebase and one in Java 17.
The resulting Java code went beyond a simple syntax translation. Claude created proper Java classes with appropriate design patterns, error handling, and logging: idiomatic Java that a modern development team would actually maintain.
Next was verification, to ensure that the new Java code worked the same as the COBOL code it was replacing. We created multiple test data files and ran them against both the original COBOL and the new Java programs. The verification compared not just final outputs but intermediate calculations, file writes, and data transformations. The result was perfect bit-for-bit fidelity. Every calculation, business rule, and edge case was preserved.
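
The comparison half of such a dual test harness can be sketched in a few lines of Java 17. This is an illustration under assumptions, not the harness from the demo: the `DualRunComparator` class and the idea of named "checkpoints" (one byte array per intermediate result or output file) are hypothetical, and running the GnuCOBOL and Java programs to produce those bytes is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Sketch: byte-for-byte comparison of outputs from the COBOL run and the Java run. */
final class DualRunComparator {

    /** Returns -1 when both outputs are byte-identical, else the offset of the first difference. */
    static int firstMismatch(byte[] cobolOutput, byte[] javaOutput) {
        return Arrays.mismatch(cobolOutput, javaOutput);
    }

    /** Same check for two files on disk, without loading them fully into memory. */
    static long firstMismatch(Path cobolFile, Path javaFile) {
        try {
            return Files.mismatch(cobolFile, javaFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Compares every checkpoint captured from the COBOL run against the Java run.
     * A checkpoint missing on the Java side counts as a failure.
     */
    static List<String> failingCheckpoints(Map<String, byte[]> cobol, Map<String, byte[]> migrated) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : cobol.entrySet()) {
            byte[] other = migrated.get(entry.getKey());
            if (other == null || firstMismatch(entry.getValue(), other) != -1) {
                failures.add(entry.getKey());
            }
        }
        return failures;
    }
}
```

An empty list from `failingCheckpoints` corresponds to the bit-for-bit result described above. Reporting the first differing offset, rather than a plain pass or fail, makes it easier to trace a mismatch back to a specific field in a fixed-width record.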
Of course, this demo application is far smaller than your legacy COBOL codebases, but all the techniques here are scalable. Claude Code will empower your developers to modernize codebases with confidence and efficiency that simply would have been impossible just 12 months ago.
