API Reference¶
Auto-generated API documentation for the AMD MI300X benchmarking suite.
amd_bench.core.analysis.BenchmarkAnalyzer ¶
BenchmarkAnalyzer(config: AnalysisConfig)
Comprehensive analyzer for vLLM benchmark results
Initialize analyzer with configuration
Methods:
| Name | Description |
|---|---|
| discover_experiment_files | Discover and match all files for each experiment. |
| load_monitoring_data | Load monitoring data for an experiment from CSV files. |
| process_results | Process all benchmark result files and generate comprehensive analysis. |
Source code in src/amd_bench/core/analysis.py
discover_experiment_files ¶
Discover and match all files for each experiment.
This method systematically searches for benchmark result files and their associated monitoring data (logs, CPU metrics, GPU power/temperature data), creating a comprehensive mapping of all experiment files. It handles flexible directory structures and provides filtering options based on completeness requirements.
The discovery process follows these steps:

1. Try to locate the master benchmark run log
2. Locate all JSON result files matching the configured pattern
3. For each result file, search for corresponding monitoring files
4. Build ExperimentFiles objects containing all related file paths
5. Apply filtering based on completeness requirements if specified

Returns:
List[ExperimentFiles]: A list of ExperimentFiles objects, each containing:

- result_file: Path to the JSON benchmark results file
- log_file: Optional path to execution log file
- cpu_metrics_file: Optional path to CPU monitoring CSV
- gpu_power_file: Optional path to GPU power monitoring CSV
- gpu_temp_file: Optional path to GPU temperature monitoring CSV

Raises:
- FileNotFoundError: If no result files are found matching the pattern.
- PermissionError: If files exist but are not readable.

Note: The method respects the `require_complete_monitoring` configuration flag. When enabled, only experiments with all monitoring files are returned. File matching is based on basename pattern matching across subdirectories.
Example:

```python
analyzer = BenchmarkAnalyzer(config)
experiments = analyzer.discover_experiment_files()
print(f"Found {len(experiments)} complete experiment file sets")

for exp in experiments:
    if exp.has_complete_monitoring:
        print(f"Complete monitoring data for {exp.result_file.name}")
```
Source code in src/amd_bench/core/analysis.py
load_monitoring_data staticmethod ¶
Load monitoring data for an experiment from CSV files.
This method loads and preprocesses hardware monitoring data associated with a specific experiment, including CPU utilization, GPU power consumption, and thermal metrics. It handles multiple file formats and performs data validation and timestamp normalization.
The method processes three types of monitoring data:

- CPU Metrics: System utilization, load averages, idle percentages
- GPU Power: Per-device power consumption over time
- GPU Temperature: Edge and junction temperatures for thermal analysis
Args:
- experiment (ExperimentFiles): Container with paths to monitoring files. Must contain at least one of: cpu_metrics_file, gpu_power_file, gpu_temp_file. Missing files are silently skipped.

Returns:
Dict[str, pd.DataFrame]: Dictionary mapping data types to DataFrames:

- 'cpu': CPU monitoring data with timestamp column
- 'gpu_power': GPU power consumption data
- 'gpu_temp': GPU temperature monitoring data

Each DataFrame includes a standardized 'timestamp' column converted to pandas datetime format for time-series analysis.

Raises:
- FileNotFoundError: If specified monitoring files don't exist.
- pd.errors.EmptyDataError: If CSV files are empty or malformed.
- ValueError: If timestamp columns cannot be parsed.
Example:

```python
experiment = ExperimentFiles(
    result_file=Path("result.json"),
    cpu_metrics_file=Path("cpu_metrics.csv"),
    gpu_power_file=Path("gpu_power.csv"),
)

monitoring_data = BenchmarkAnalyzer.load_monitoring_data(experiment)

if 'cpu' in monitoring_data:
    cpu_df = monitoring_data['cpu']
    print(f"CPU monitoring duration: {cpu_df['timestamp'].max() - cpu_df['timestamp'].min()}")

if 'gpu_power' in monitoring_data:
    power_df = monitoring_data['gpu_power']
    total_power = power_df.groupby('timestamp')['power_watts'].sum()
    print(f"Average total power: {total_power.mean():.1f}W")
```
Note:

- Timestamps are expected in Unix epoch format (seconds since 1970)
- GPU data may contain multiple devices with separate readings
- Missing or corrupted files are logged as errors but don't raise exceptions
- Empty DataFrames are returned for missing monitoring categories
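The timestamp handling described above is essentially a pandas epoch-to-datetime conversion; a minimal standalone sketch, using made-up sample values and the `timestamp`/`power_watts` column names from the example above:

```python
import pandas as pd

# Made-up sample rows standing in for a gpu_power CSV; real files are read per experiment.
raw = pd.DataFrame({
    "timestamp": [1_700_000_000, 1_700_000_005, 1_700_000_010],  # Unix epoch seconds
    "power_watts": [412.5, 498.0, 505.2],
})

# Normalize epoch seconds into pandas datetimes for time-series analysis,
# mirroring the standardized 'timestamp' column described in Returns.
raw["timestamp"] = pd.to_datetime(raw["timestamp"], unit="s")

print(raw["timestamp"].max() - raw["timestamp"].min())  # monitoring window as a Timedelta
```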
Source code in src/amd_bench/core/analysis.py
process_results ¶
Process all benchmark result files and generate comprehensive analysis.
This is the main orchestration method that coordinates the complete analysis workflow from raw benchmark files to final reports and visualizations. It handles file discovery, data loading, statistical analysis, visualization generation, and report creation in a fault-tolerant manner.
The processing pipeline includes:

1. File Discovery: Locate all experiment files and validate structure
2. Data Loading: Parse JSON results and extract benchmark metrics
3. Statistical Analysis: Generate performance summaries and comparisons
4. Monitoring Processing: Analyze hardware metrics if available
5. Visualization: Create performance plots and dashboards
6. Report Generation: Produce markdown and JSON analysis reports
The method implements comprehensive error handling and logging to ensure partial results are preserved even if individual steps fail.
Raises:
- ValueError: If no valid experiment files are found.
- RuntimeError: If critical analysis steps fail unexpectedly.
- PermissionError: If output directories cannot be created or accessed.

Side Effects:
- Creates output directory structure (tables/, plots/, reports/)
- Writes CSV files with statistical summaries
- Generates PNG visualization files
- Creates comprehensive analysis reports
- Logs detailed progress and error information
Example:

```python
config = AnalysisConfig(
    input_dir=Path("benchmark_data"),
    output_dir=Path("analysis_output"),
    generate_plots=True,
    include_monitoring_data=True,
)

analyzer = BenchmarkAnalyzer(config)
analyzer.process_results()

# Results available in:
# - analysis_output/tables/*.csv
# - analysis_output/plots/*.png
# - analysis_output/reports/*.{md,json}
```
Note: Processing time scales with dataset size and enabled features. Large datasets with monitoring data may require several minutes. Progress is logged at INFO level for monitoring long-running analyses.
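Because progress is reported at INFO level, long-running analyses can be followed by configuring Python's standard logging before calling process_results; a minimal sketch (the package's exact logger names are not documented here, so the root logger is configured):

```python
import logging

from amd_bench.core.analysis import BenchmarkAnalyzer

# Configure the root logger so INFO-level progress messages from the analysis
# pipeline become visible on the console.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

analyzer = BenchmarkAnalyzer(config)  # reusing the config from the example above
analyzer.process_results()            # progress is now logged as each pipeline step runs
```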
Source code in src/amd_bench/core/analysis.py
amd_bench.schemas.benchmark.AnalysisConfig ¶
Bases: BaseModel
Configuration for analysis operations
Methods:
| Name | Description |
|---|---|
| validate_directory_structure | Validate directory structure using configuration parameters. |
| validate_filename_formats | Validate filename format configurations |
| validate_output_dir | Ensure output directory is resolved. |
validate_directory_structure ¶
validate_directory_structure() -> AnalysisConfig
Validate directory structure using configuration parameters.
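Since AnalysisConfig is a Pydantic BaseModel, these validators run when the configuration is constructed; a minimal sketch, assuming standard Pydantic error handling and only the fields shown in the examples above (what exactly validate_directory_structure checks is not specified here):

```python
from pathlib import Path

from pydantic import ValidationError

from amd_bench.schemas.benchmark import AnalysisConfig

try:
    config = AnalysisConfig(
        input_dir=Path("benchmark_data"),    # checked by validate_directory_structure
        output_dir=Path("analysis_output"),  # resolved by validate_output_dir
    )
except ValidationError as exc:
    # Validation problems surface at construction time, field by field,
    # instead of failing later inside the analysis run.
    print(exc)
```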
Source code in src/amd_bench/schemas/benchmark.py
validate_filename_formats classmethod ¶
Validate filename format configurations
Source code in src/amd_bench/schemas/benchmark.py
validate_output_dir classmethod ¶
Ensure output directory is resolved.
Source code in src/amd_bench/schemas/benchmark.py