API Reference

This documentation is generated automatically from the package’s docstrings.

class promptstability.core.PromptStabilityAnalysis(annotation_function, data, metric_fn=<function nominal_metric>, parse_function=None, load_generation_models=True)

Bases: object

Core prompt-stability estimation class.

The class supports:

- repeated-run intra-prompt stability estimation
- paraphrase-based inter-prompt stability estimation
- post hoc rescoring from saved annotation tables
- summary diagnostics for intra- and inter-PSS outputs

bootstrap_krippendorff(df, annotator_col, bootstrap_samples, confidence_level=95)

Compute Krippendorff’s alpha with bootstrap confidence intervals.
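The bootstrap step can be illustrated with a minimal percentile-bootstrap sketch. This is not the package's implementation: it swaps in a plain mean of per-unit agreement indicators for Krippendorff's alpha, and the helper name `percentile_bootstrap_ci`, the `seed` argument, and the data are all hypothetical. Only `bootstrap_samples` and `confidence_level=95` mirror the signature above.

```python
import random
import statistics


def percentile_bootstrap_ci(values, statistic, bootstrap_samples=1000,
                            confidence_level=95, seed=0):
    """Return (point_estimate, lower, upper) from a percentile bootstrap."""
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(bootstrap_samples):
        # Resample units with replacement, keeping the sample size fixed.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    tail = (100 - confidence_level) / 2 / 100  # 0.025 for a 95% interval
    lower_idx = int(tail * (bootstrap_samples - 1))
    upper_idx = int((1 - tail) * (bootstrap_samples - 1))
    return statistic(values), estimates[lower_idx], estimates[upper_idx]


# Made-up data: 1 means all runs agreed on that unit, 0 means they did not.
agreement = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
point, low, high = percentile_bootstrap_ci(agreement, statistics.mean)
print(point, low, high)
```

Resampling whole units rather than individual labels keeps each unit's set of annotations together, which is the usual choice when bootstrapping a reliability coefficient. Whether the package resamples this way is not stated in this reference.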

extract_inter_score_map(annotated_df)

extract_intra_score_map(annotated_df, analysis_modes=None)

inter_pss(original_text, prompt_postfix=None, nr_variations=5, temperatures=None, iterations=1, bootstrap_samples=1000, print_prompts=False, edit_prompts_path=None, plot=False, save_path=None, save_csv=None)

Evaluate between-prompt stability across paraphrase temperatures.

intra_pss(original_text, prompt_postfix=None, iterations=10, bootstrap_samples=1000, analysis_modes=None, plot=False, plot_mode='cumulative_alpha', save_path=None, save_csv=None, return_summaries=False, summary_threshold=0.8, estimate_tolerance=0.01, precision_tolerance=0.02)

Evaluate within-prompt stability via repeated prompt runs.

By default this preserves the original package behavior and returns a cumulative intra-PSS series. When analysis_modes includes adjacent_alpha, the method also computes an adjacent-run series that compares run j to run j-1.
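The difference between the two modes comes down to which runs feed each point of the series. The sketch below lists those runs for each mode. The helper `comparison_windows` is hypothetical, run indices are assumed to start at 0, and the cumulative branch assumes that point j pools every run up to and including j, which this reference does not spell out.

```python
def comparison_windows(iterations, mode):
    """List the run indices behind each point of the series (hypothetical)."""
    if mode == "cumulative_alpha":
        # Assumption: point j pools every run from 0 through j.
        return [list(range(j + 1)) for j in range(1, iterations)]
    if mode == "adjacent_alpha":
        # Per the description above: run j is compared with run j-1 only.
        return [[j - 1, j] for j in range(1, iterations)]
    raise ValueError(f"unknown mode: {mode}")


print(comparison_windows(4, "cumulative_alpha"))  # [[0, 1], [0, 1, 2], [0, 1, 2, 3]]
print(comparison_windows(4, "adjacent_alpha"))    # [[0, 1], [1, 2], [2, 3]]
```

Under these assumptions a cumulative series smooths out as runs accumulate, while an adjacent series can expose a single run that drifts from its predecessor.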

manual_inter_pss(edit_prompts_path, bootstrap_samples=1000, plot=False, save_path=None, save_csv=None)

Evaluate inter-PSS from a manually edited prompt-variation CSV.

score_intra_annotations(annotated_df, bootstrap_samples=1000, analysis_modes=None)

Recompute intra-PSS metrics from an existing long-format annotation table.

Parameters:

annotated_df (pandas.DataFrame) – Long-format annotation data with at least id, annotation, and iteration columns.
bootstrap_samples (int, optional) – Number of bootstrap samples used for confidence intervals.
analysis_modes (list[str], optional) – Subset of ["cumulative_alpha", "adjacent_alpha"].
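As an illustration of the expected input shape, the following sketch builds a small long-format table with the three required columns. The labels, the integer ids, and the zero-based iteration numbers are made up; the column dtypes the method actually expects are not documented here.

```python
import pandas as pd

# Made-up annotations: 3 text units, each labelled in 2 repeated runs.
rows = [
    {"id": unit_id, "annotation": label, "iteration": run}
    for run, labels in enumerate([["pos", "neg", "pos"], ["pos", "neg", "neg"]])
    for unit_id, label in enumerate(labels)
]
annotated_df = pd.DataFrame(rows)

# Long format: one row per (id, iteration) pair, so no pair may repeat.
assert not annotated_df.duplicated(subset=["id", "iteration"]).any()

# Pivoting to wide form (units x runs) makes disagreements easy to spot.
wide = annotated_df.pivot(index="id", columns="iteration", values="annotation")
print(wide)
```

A table shaped like `annotated_df` is what would be passed as the first argument to score_intra_annotations. The call itself is omitted because it requires a constructed PromptStabilityAnalysis instance.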

summarize_inter_scores(score_map, threshold=0.8)

summarize_intra_scores(score_map, threshold=0.8, estimate_tolerance=0.01, precision_tolerance=0.02)

promptstability.core.get_api_key(api='openai')

Retrieve an API key for the specified service from environment variables.

Parameters: api (str, optional) – API service name. Supported values are openai, mistral, anthropic, cohere, and huggingface.
Returns: The API key value.
Return type: str
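The lookup can be pictured with the stand-in below. It is a sketch, not the package's code: the function `lookup_api_key`, the `<SERVICE>_API_KEY` naming scheme, and the error behavior are all assumptions, since this reference does not say which environment variables get_api_key reads or what happens when one is missing.

```python
import os

# Assumption: one upper-case <SERVICE>_API_KEY variable per supported service.
ENV_VAR_NAMES = {
    name: f"{name.upper()}_API_KEY"
    for name in ("openai", "mistral", "anthropic", "cohere", "huggingface")
}


def lookup_api_key(api="openai"):
    """Hypothetical stand-in for get_api_key; not the package's code."""
    if api not in ENV_VAR_NAMES:
        raise ValueError(f"unsupported api: {api!r}")
    var_name = ENV_VAR_NAMES[api]
    key = os.environ.get(var_name)
    if not key:
        raise KeyError(f"environment variable {var_name} is not set")
    return key


# Placeholder value for demonstration only; never hard-code a real key.
os.environ["OPENAI_API_KEY"] = "sk-example-not-a-real-key"
print(lookup_api_key("openai"))
```

Reading keys from the environment keeps credentials out of source files and notebooks, which is presumably why the package takes this route.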

promptstability.core.load_example_data()

Load example data included with the package.