Developer Guide¶
This guide is for developers who want to contribute to the Gaussian Extractor project. It covers the codebase structure, development setup, coding standards, and contribution guidelines.
Project Overview¶
Architecture¶
Gaussian Extractor is a C++20 application designed for high-performance processing of Gaussian computational chemistry log files. The codebase follows a modular architecture with clear separation of concerns:
├── src/ │ ├── main.cpp # Application entry point │ ├── extraction/ │ │ ├── coord_extractor.cpp │ │ ├── coord_extractor.h │ │ ├── gaussian_extractor.cpp │ │ └── gaussian_extractor.h │ ├── high_level/ │ │ ├── high_level_energy.cpp │ │ └── high_level_energy.h │ ├── input_gen/ │ │ ├── create_input.cpp │ │ ├── create_input.h │ │ ├── parameter_parser.cpp │ │ └── parameter_parser.h │ ├── job_management/ │ │ ├── job_checker.cpp │ │ ├── job_checker.h │ │ ├── job_scheduler.cpp │ │ └── job_scheduler.h │ ├── ui/ │ │ ├── help_utils.cpp │ │ ├── help_utils.h │ │ ├── interactive_mode.cpp │ │ └── interactive_mode.h │ └── utilities/ │ ├── command_system.cpp │ ├── command_system.h │ ├── config_manager.cpp │ ├── config_manager.h │ ├── metadata.cpp │ ├── metadata.h │ ├── module_executor.cpp │ ├── module_executor.h │ ├── utils.cpp │ ├── utils.h │ └── version.h ├── tests/ ├── docs/ ├── resources/ ├── CMakeLists.txt # CMake build configuration ├── Doxyfile # Doxygen configuration ├── LICENSE # Project license ├── Makefile # Make build system └── README.MD # User documentation
New Modules in v0.5.0¶
- Interactive Mode (interactive_mode.h/.cpp)
- Windows-specific interactive interface 
- Menu-driven command selection 
- Automatic extraction before entering interactive mode 
 
- Coordinate Processing (coord_extractor.h/.cpp)
- Extract final Cartesian coordinates from log files 
- XYZ format conversion and organization 
- Support for completed and running job separation 
 
- Input Generation (create_input.h/.cpp)
- Generate Gaussian input files from XYZ coordinates 
- Template system for reusable parameter sets 
- Support for multiple calculation types (SP, OPT, TS, IRC) 
 
- High-Level Energy Calculations (high_level_energy.h/.cpp)
- Combine high-level electronic energies with low-level thermal corrections 
- Support for kJ/mol and atomic unit outputs 
- Directory-based energy combination workflow 
 
- Job Status Management (job_checker.h/.cpp)
- Comprehensive job status checking and organization 
- Support for multiple error types (PCM, imaginary frequencies) 
- Automated file organization by job status 
 
- Metadata Handling (metadata.h/.cpp)
- File metadata extraction and validation 
- Job completion status detection 
- File size and timestamp tracking 
 
- Parameter File Parsing (parameter_parser.h/.cpp)
- Template parameter file parsing 
- Configuration file format support 
- Validation and error reporting 
 
Key Design Principles¶
- Modularity
- Each module has a single responsibility 
- Clear interfaces between components 
- Easy to test and maintain 
 
- Performance
- Multi-threaded processing 
- Memory-efficient algorithms 
- Cluster-aware resource management 
 
- Safety
- Comprehensive error handling 
- Resource cleanup on failures 
- Graceful shutdown handling 
 
- Usability
- Intuitive command-line interface 
- Extensive help system 
- Configuration file support 
 
Development Setup¶
Prerequisites¶
Required Tools:
- C++ Compiler: GCC 10+, Intel oneAPI, or Clang 10+ 
- Build System: Make (included with most Linux distributions) 
- Documentation: Sphinx (for building documentation) 
- Git: Version control system 
Optional Tools:
- CMake: Alternative build system 
- Doxygen: API documentation generation 
- Valgrind: Memory debugging 
- Clang-Tidy: Code analysis 
Getting the Source Code¶
# Clone the repository
git clone https://github.com/lenhanpham/gaussian-extractor.git
cd gaussian-extractor
# Create a development branch
git checkout -b feature/your-feature-name
Building for Development¶
Debug Build:
# Build with debug symbols and safety checks
make debug
# Or with CMake
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug ..
make
Release Build:
# Optimized release build
make release
# Or with CMake
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
Development Build with All Features:
# Full development build
make -j $(nproc)
Testing¶
Running Tests:
# Build and run tests
make test
# Run specific test suite
./test_runner --suite extraction_tests
# Run with verbose output
./test_runner -v
Test Coverage:
# Generate coverage report
make coverage
# View coverage in browser
firefox coverage_report/index.html
Code Quality Tools¶
Static Analysis:
# Run clang-tidy
clang-tidy src/core/*.cpp -- -std=c++20 -Isrc
# Run cppcheck
cppcheck --enable=all --std=c++20 src/
Code Formatting:
# Format code with clang-format
find src/ -name "*.cpp" -o -name "*.h" | xargs clang-format -i
# Check formatting
find src/ -name "*.cpp" -o -name "*.h" | xargs clang-format --dry-run -Werror
Documentation¶
Building Documentation:
# Install Sphinx
pip install sphinx sphinx-rtd-theme
# Build HTML documentation
cd docs
make html
# View documentation
firefox _build/html/index.html
API Documentation:
# Generate Doxygen documentation
doxygen Doxyfile
# View API docs
firefox doxygen/html/index.html
Coding Standards¶
Code Style¶
Naming Conventions:
// Classes and structs
class CommandParser;
struct CommandContext;
// Functions and methods
void parse_command_line(int argc, char* argv[]);
CommandContext create_context();
// Variables
int thread_count;
std::string output_file;
// Constants
const int DEFAULT_THREAD_COUNT = 4;
const std::string CONFIG_FILE_NAME = ".gaussian_extractor.conf";
// Member variables (with m_ prefix)
class MyClass {
private:
    int m_thread_count;
    std::string m_config_file;
};
File Organization:
- Headers (.h): Class declarations, function prototypes, constants 
- Implementations (.cpp): Function definitions, implementation details 
- One class per file when possible 
- Related functionality grouped in modules 
Documentation Standards¶
Doxygen Comments:
/**
 * @brief Brief description of the function/class
 *
 * Detailed description explaining what the function does,
 * its parameters, return values, and any important notes.
 *
 * @param param1 Description of first parameter
 * @param param2 Description of second parameter
 * @return Description of return value
 *
 * @section Usage Example
 * @code
 * // Example usage
 * int result = my_function(param1, param2);
 * @endcode
 *
 * @note Important notes about usage or limitations
 * @warning Warnings about potential issues
 * @see Related functions or classes
 */
int my_function(int param1, const std::string& param2);
Inline Comments:
// Use comments for complex logic
if (condition) {
    // Explain why this condition is important
    do_something();
}
// Use TODO comments for future improvements
// TODO: Optimize this loop for better performance
Error Handling¶
Exception Safety:
try {
    // Operation that might fail
    process_files(file_list);
} catch (const std::invalid_argument& e) {
    // Handle invalid arguments
    std::cerr << "Invalid argument: " << e.what() << std::endl;
    return 1;
} catch (const std::runtime_error& e) {
    // Handle runtime errors
    std::cerr << "Runtime error: " << e.what() << std::endl;
    return 2;
} catch (const std::exception& e) {
    // Handle all other exceptions
    std::cerr << "Unexpected error: " << e.what() << std::endl;
    return 3;
}
Return Codes:
/**
 * @return 0 on success
 * @return 1 on general error
 * @return 2 on invalid arguments
 * @return 3 on resource unavailable
 * @return 4 on operation interrupted
 */
int process_data(const std::string& input_file);
Memory Management¶
RAII Pattern:
class FileProcessor {
public:
    FileProcessor(const std::string& filename)
        : m_file(filename) {
        if (!m_file.is_open()) {
            throw std::runtime_error("Failed to open file");
        }
    }
    ~FileProcessor() {
        // Automatic cleanup
        if (m_file.is_open()) {
            m_file.close();
        }
    }
private:
    std::ifstream m_file;
};
Smart Pointers:
// Use unique_ptr for exclusive ownership
std::unique_ptr<CommandContext> context = std::make_unique<CommandContext>();
// Use shared_ptr for shared ownership
std::shared_ptr<ConfigManager> config = std::make_shared<ConfigManager>();
Thread Safety¶
Thread-Safe Classes:
class ThreadSafeCounter {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(m_mutex);
        ++m_count;
    }
    int get_count() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_count;
    }
private:
    mutable std::mutex m_mutex;
    int m_count{0};
};
Threading Guidelines:
- Document thread safety guarantees 
- Use appropriate synchronization primitives 
- Avoid global mutable state 
- Test concurrent access patterns 
Contributing¶
Development Workflow¶
1. Choose an Issue:
# Check available issues
# Visit: https://github.com/lenhanpham/gaussian-extractor
2. Create a Branch:
# Create and switch to feature branch
git checkout -b feature/descriptive-name
# Or for bug fixes
git checkout -b bugfix/issue-number-description
3. Make Changes:
# Make your changes following coding standards
# Add tests for new functionality
# Update documentation as needed
4. Test Your Changes:
# Build and test
make debug
make test
# Run code quality checks
make lint
5. Commit Your Changes:
# Stage your changes
git add .
# Commit with descriptive message
git commit -m "feat: add new feature description
- What was changed
- Why it was changed
- How it was tested"
6. Push and Create Pull Request:
# Push your branch
git push origin feature/your-feature-name
# Create pull request on GitHub
Pull Request Guidelines¶
PR Title Format:
type(scope): description
Types: feat, fix, docs, style, refactor, test, chore
PR Description Template:
## Description
Brief description of the changes
## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update
## Testing
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Manual testing performed
## Checklist
- [ ] Code follows style guidelines
- [ ] Documentation updated
- [ ] Tests pass
- [ ] No breaking changes
Code Review Process¶
Review Checklist:
- [ ] Code follows established patterns 
- [ ] Appropriate error handling 
- [ ] Thread safety considerations 
- [ ] Performance implications 
- [ ] Documentation updated 
- [ ] Tests included 
- [ ] No security vulnerabilities 
Review Comments:
- Be constructive and specific 
- Suggest improvements, don’t just point out problems 
- Reference coding standards when applicable 
- Acknowledge good practices 
Testing Guidelines¶
Unit Testing¶
Test Structure:
#include <gtest/gtest.h>
#include "core/command_system.h"
class CommandParserTest : public ::testing::Test {
protected:
    void SetUp() override {
        // Setup code
    }
    void TearDown() override {
        // Cleanup code
    }
};
TEST_F(CommandParserTest, ParseExtractCommand) {
    // Test extract command parsing
    char* argv[] = {"gaussian_extractor.x", "extract", "-t", "300"};
    CommandContext context = CommandParser::parse(4, argv);
    EXPECT_EQ(context.command, CommandType::EXTRACT);
    EXPECT_EQ(context.temp, 300.0);
}
Running Tests:
# Run all tests
make test
# Run specific test
./test_runner --gtest_filter=CommandParserTest.ParseExtractCommand
# Run with coverage
make coverage
Integration Testing¶
End-to-End Tests:
# Test complete workflows
./test_integration.sh
# Test with sample data
./gaussian_extractor.x -f test_data/ --output test_results/
Performance Testing¶
Benchmarking:
# Run performance benchmarks
make benchmark
# Profile application
valgrind --tool=callgrind ./gaussian_extractor.x [args]
# Memory profiling
valgrind --tool=massif ./gaussian_extractor.x [args]
Continuous Integration¶
CI/CD Pipeline¶
Automated Testing:
- Build: Compile on multiple platforms (Linux, Windows) 
- Test: Run unit and integration tests 
- Lint: Code quality checks 
- Docs: Build documentation 
- Release: Automated releases 
GitHub Actions Workflow:
name: CI
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build
        run: make -j 4
      - name: Test
        run: make test
      - name: Lint
        run: make lint
Release Process¶
Version Numbering¶
Semantic Versioning:
MAJOR.MINOR.PATCH
- MAJOR: Breaking changes
- MINOR: New features (backward compatible)
- PATCH: Bug fixes (backward compatible)
Release Checklist:
- [ ] Update version in version.h 
- [ ] Update CHANGELOG.md 
- [ ] Update documentation 
- [ ] Create release branch 
- [ ] Run full test suite 
- [ ] Create GitHub release 
- [ ] Update package repositories 
Release Commands:
# Create release branch
git checkout -b release/v1.2.3
# Update version
echo "1.2.3" > VERSION
# Commit and tag
git add VERSION
git commit -m "Release v1.2.3"
git tag -a v1.2.3 -m "Release v1.2.3"
# Push release
git push origin release/v1.2.3
git push origin v1.2.3
Support and Communication¶
Communication Channels:
- GitHub Issues: Bug reports and feature requests 
- GitHub Discussions: General questions and discussions 
- Pull Request Comments: Code review discussions 
Getting Help:
- Check existing issues and documentation first 
- Use descriptive titles for issues 
- Provide minimal reproducible examples 
- Include system information and versions 
Community Guidelines:
- Be respectful and constructive 
- Help newcomers learn and contribute 
- Follow the code of conduct 
- Acknowledge contributions from others 
This developer guide provides comprehensive information for contributing to the Gaussian Extractor project. Following these guidelines ensures high-quality, maintainable code that benefits the entire community.