Transformer Debugger

Language: Others
Size: 0.67 MB
Downloads: 2
Views: 45
Published: 2024-07-02
Category: General programming questions
Publisher: chenxiaolan
File format: .zip
Points required: 2

Description

[Overview]

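Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team to support investigations into specific behaviors of small language models. It combines automated interpretability techniques with sparse autoencoders, and lets you explore quickly and intervene in the forward pass to see how a particular behavior is affected. TDB can help answer questions such as "Why does the model output token A instead of token B for this prompt?" or "Why does attention head H attend to token T for this prompt?" It does so by identifying the specific components (neurons, attention heads, autoencoder latents) that contribute to the behavior, showing automatically generated explanations of what causes those components to activate most strongly, and tracing connections between components to help discover circuits.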
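
The "token A rather than token B" question can be pictured with a toy calculation. The sketch below is not TDB's API and does not use the repository's code. It assumes a purely linear readout (no layer norm), and every component name, vector, and number is made up for illustration. Each hypothetical component writes a vector into a small residual stream, its contribution to logit(A) - logit(B) is the dot product with the difference of the two unembedding directions, and zero-ablating a component removes its contribution.

```python
# Toy sketch only: this is NOT TDB's API. Every name and number is hypothetical.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical vectors that three components write into a 3-d residual stream.
component_writes = {
    "attn_head_0": [1.0, 0.0, 0.5],
    "mlp_neuron_7": [0.0, 2.0, -1.0],
    "attn_head_3": [-0.5, 0.5, 0.0],
}

# Hypothetical unembedding directions for the two candidate tokens.
unembed = {
    "A": [1.0, 1.0, 0.0],
    "B": [0.0, 0.5, 1.0],
}

# Residual-stream direction that raises logit(A) relative to logit(B).
direction = [a - b for a, b in zip(unembed["A"], unembed["B"])]

def direct_effects(writes):
    """Per-component contribution to logit(A) - logit(B)."""
    return {name: dot(w, direction) for name, w in writes.items()}

def logit_diff(writes, ablated=()):
    """logit(A) - logit(B) with the listed components zero-ablated."""
    effects = direct_effects(writes)
    return sum(e for name, e in effects.items() if name not in ablated)

effects = direct_effects(component_writes)
top = max(effects, key=effects.get)  # component pushing hardest toward A

print(effects)
print("baseline logit diff:", logit_diff(component_writes))
print("after ablating", top, ":", logit_diff(component_writes, ablated=(top,)))
```

With these made-up numbers the largest contribution comes from `mlp_neuron_7`, and zero-ablating it drops the logit difference from 2.25 to 0.25. In a real model, layer norm and downstream components make the effect of an ablation nonlinear, which is why a tool that can actually rerun the forward pass is useful.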

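These videos give an overview of TDB and show how it can be used to investigate indirect object identification in GPT-2 small:

Introduction
Neuron viewer pages
Example: Investigating name mover heads, part 1
Example: Investigating name mover heads, part 2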

[Core code]
File list
└── transformer-debugger-dc1f898725113bec6cf1006e48f9c5219f8fbdde
    ├── datasets.md
    ├── LICENSE
    ├── mypy.ini
    ├── neuron_explainer
    │   ├── activations
    │   │   ├── activation_records.py
    │   │   ├── activations.py
    │   │   ├── attention_utils.py
    │   │   ├── derived_scalars
    │   │   │   ├── activations_and_metadata.py
    │   │   │   ├── attention.py
    │   │   │   ├── autoencoder.py
    │   │   │   ├── config.py
    │   │   │   ├── derived_scalar_store.py
    │   │   │   ├── derived_scalar_types.py
    │   │   │   ├── direct_effects.py
    │   │   │   ├── edge_activation.py
    │   │   │   ├── edge_attribution.py
    │   │   │   ├── indexing.py
    │   │   │   ├── __init__.py
    │   │   │   ├── least_common_tokens.py
    │   │   │   ├── locations.py
    │   │   │   ├── logprobs.py
    │   │   │   ├── make_scalar_derivers.py
    │   │   │   ├── mlp.py
    │   │   │   ├── multi_group.py
    │   │   │   ├── multi_pass_scalar_deriver.py
    │   │   │   ├── node_write.py
    │   │   │   ├── postprocessing.py
    │   │   │   ├── raw_activations.py
    │   │   │   ├── README.md
    │   │   │   ├── reconstituted.py
    │   │   │   ├── reconstituter_class.py
    │   │   │   ├── residual.py
    │   │   │   ├── scalar_deriver.py
    │   │   │   ├── tests
    │   │   │   │   ├── test_attention.py
    │   │   │   │   ├── test_derived_scalar_store.py
    │   │   │   │   ├── test_derived_scalar_types.py
    │   │   │   │   └── utils.py
    │   │   │   ├── tokens.py
    │   │   │   ├── utils.py
    │   │   │   └── write_tensors.py
    │   │   ├── hook_graph.py
    │   │   └── test_attention_utils.py
    │   ├── activation_server
    │   │   ├── derived_scalar_computation.py
    │   │   ├── dst_helpers.py
    │   │   ├── explainer_routes.py
    │   │   ├── explanation_datasets.py
    │   │   ├── inference_routes.py
    │   │   ├── interactive_model.py
    │   │   ├── load_neurons.py
    │   │   ├── main.py
    │   │   ├── neuron_datasets.py
    │   │   ├── README.md
    │   │   ├── read_routes.py
    │   │   ├── requests_and_responses.py
    │   │   └── tdb_conversions.py
    │   ├── api_client.py
    │   ├── explanations
    │   │   ├── attention_head_scoring.py
    │   │   ├── calibrated_simulator.py
    │   │   ├── explainer.py
    │   │   ├── explanations.py
    │   │   ├── few_shot_examples.py
    │   │   ├── __init__.py
    │   │   ├── prompt_builder.py
    │   │   ├── scoring.py
    │   │   ├── simulator.py
    │   │   ├── test_explainer.py
    │   │   └── test_simulator.py
    │   ├── fast_dataclasses
    │   │   ├── fast_dataclasses.py
    │   │   ├── __init__.py
    │   │   └── test_fast_dataclasses.py
    │   ├── file_utils.py
    │   ├── __init__.py
    │   ├── models
    │   │   ├── autoencoder_context.py
    │   │   ├── autoencoder.py
    │   │   ├── hooks.py
    │   │   ├── inference_engine_type_registry.py
    │   │   ├── __init__.py
    │   │   ├── model_component_registry.py
    │   │   ├── model_context.py
    │   │   ├── model_registry.py
    │   │   ├── README.md
    │   │   └── transformer.py
    │   ├── pydantic
    │   │   ├── camel_case_base_model.py
    │   │   ├── hashable_base_model.py
    │   │   ├── immutable.py
    │   │   └── __init__.py
    │   ├── scripts
    │   │   ├── create_hf_test_data.py
    │   │   └── download_from_hf.py
    │   └── tests
    │       ├── conftest.py
    │       ├── test_activation_reconstituter.py
    │       ├── test_against_data.py
    │       ├── test_all_dsts.py
    │       ├── test_emb_dsts.py
    │       ├── test_hooks.py
    │       ├── test_interactive_model.py
    │       ├── test_model_context_get_weight.py
    │       ├── test_offline_autoencoder_dsts.py
    │       ├── test_online_autoencoder_dsts.py
    │       ├── test_postprocessing.py
    │       ├── test_reconstituted_gradients.py
    │       ├── test_serialization_of_model_config_from_model_context.py
    │       ├── test_trace_through_v.py
    │       └── test_transformer.py
    ├── neuron_viewer
    │   ├── package.json
    │   ├── package-lock.json
    │   ├── prepend_autogen_comments.sh
    │   ├── public
    │   │   ├── favicon.ico
    │   │   ├── logo192.png
    │   │   ├── logo512.png
    │   │   ├── manifest.json
    │   │   └── robots.txt
    │   ├── README.md
    │   ├── src
    │   │   ├── App.css
    │   │   ├── App.tsx
    │   │   ├── client
    │   │   │   ├── core
    │   │   │   │   ├── ApiError.ts
    │   │   │   │   ├── ApiRequestOptions.ts
    │   │   │   │   ├── ApiResult.ts
    │   │   │   │   ├── CancelablePromise.ts
    │   │   │   │   ├── OpenAPI.ts
    │   │   │   │   └── request.ts
    │   │   │   ├── index.ts
    │   │   │   ├── models
    │   │   │   │   ├── AblationSpec.ts
    │   │   │   │   ├── ActivationLocationType.ts
    │   │   │   │   ├── AttentionHeadRecordResponse.ts
    │   │   │   │   ├── AttentionTraceType.ts
    │   │   │   │   ├── AttributedScoredExplanation.ts
    │   │   │   │   ├── BatchedRequest.ts
    │   │   │   │   ├── BatchedResponse.ts
    │   │   │   │   ├── BatchedTdbRequest.ts
    │   │   │   │   ├── ComponentTypeForAttention.ts
    │   │   │   │   ├── ComponentTypeForMlp.ts
    │   │   │   │   ├── DerivedAttentionScalarsRequestSpec.ts
    │   │   │   │   ├── DerivedAttentionScalarsRequest.ts
    │   │   │   │   ├── DerivedAttentionScalarsResponseData.ts
    │   │   │   │   ├── DerivedAttentionScalarsResponse.ts
    │   │   │   │   ├── DerivedScalarsRequestSpec.ts
    │   │   │   │   ├── DerivedScalarsRequest.ts
    │   │   │   │   ├── DerivedScalarsResponseData.ts
    │   │   │   │   ├── DerivedScalarsResponse.ts
    │   │   │   │   ├── DerivedScalarType.ts
    │   │   │   │   ├── Dimension.ts
    │   │   │   │   ├── ExistingExplanationsRequest.ts
    │   │   │   │   ├── ExplanationResult.ts
    │   │   │   │   ├── GroupId.ts
    │   │   │   │   ├── HTTPValidationError.ts
    │   │   │   │   ├── InferenceAndTokenData.ts
    │   │   │   │   ├── InferenceRequestSpec.ts
    │   │   │   │   ├── InferenceResponseAndResponseDict.ts
    │   │   │   │   ├── InferenceResponse.ts
    │   │   │   │   ├── InferenceSubRequest.ts
    │   │   │   │   ├── LossFnConfig.ts
    │   │   │   │   ├── LossFnName.ts
    │   │   │   │   ├── MirroredActivationIndex.ts
    │   │   │   │   ├── MirroredNodeIndex.ts
    │   │   │   │   ├── MirroredTraceConfig.ts
    │   │   │   │   ├── ModelInfoResponse.ts
    │   │   │   │   ├── MultipleTopKDerivedScalarsRequestSpec.ts
    │   │   │   │   ├── MultipleTopKDerivedScalarsRequest.ts
    │   │   │   │   ├── MultipleTopKDerivedScalarsResponseData.ts
    │   │   │   │   ├── MultipleTopKDerivedScalarsResponse.ts
    │   │   │   │   ├── NeuronDatasetMetadata.ts
    │   │   │   │   ├── NeuronRecordResponse.ts
    │   │   │   │   ├── NodeAblation.ts
    │   │   │   │   ├── NodeIdAndDatasets.ts
    │   │   │   │   ├── NodeToTrace.ts
    │   │   │   │   ├── NodeType.ts
    │   │   │   │   ├── PassType.ts
    │   │   │   │   ├── PreOrPostAct.ts
    │   │   │   │   ├── ProcessingResponseDataType.ts
    │   │   │   │   ├── ScoredTokensRequestSpec.ts
    │   │   │   │   ├── ScoredTokensResponseData.ts
    │   │   │   │   ├── ScoreRequest.ts
    │   │   │   │   ├── ScoreResult.ts
    │   │   │   │   ├── TdbRequestSpec.ts
    │   │   │   │   ├── Tensor0D.ts
    │   │   │   │   ├── Tensor1D.ts
    │   │   │   │   ├── Tensor2D.ts
    │   │   │   │   ├── Tensor3D.ts
    │   │   │   │   ├── TensorType.ts
    │   │   │   │   ├── TokenAndAttentionScalars.ts
    │   │   │   │   ├── TokenAndScalar.ts
    │   │   │   │   ├── TokenPairAttributionRequestSpec.ts
    │   │   │   │   ├── TokenPairAttributionResponseData.ts
    │   │   │   │   ├── TokenScoringType.ts
    │   │   │   │   ├── TopTokensAttendedTo.ts
    │   │   │   │   ├── TopTokens.ts
    │   │   │   │   └── ValidationError.ts
    │   │   │   └── services
    │   │   │       ├── ExplainerService.ts
    │   │   │       ├── HelloWorldService.ts
    │   │   │       ├── InferenceService.ts
    │   │   │       ├── MemoryService.ts
    │   │   │       └── ReadService.ts
    │   │   ├── colors.ts
    │   │   ├── commonUiComponents.tsx
    │   │   ├── heatmapGrid2d.tsx
    │   │   ├── heatmapGrid.tsx
    │   │   ├── images.d.ts
    │   │   ├── index.css
    │   │   ├── index.html
    │   │   ├── index.tsx
    │   │   ├── modelInteractions.tsx
    │   │   ├── navigation.tsx
    │   │   ├── nodePage.tsx
    │   │   ├── panes
    │   │   │   ├── activationsForPrompt.tsx
    │   │   │   ├── datasetExamples.tsx
    │   │   │   ├── explanation.tsx
    │   │   │   ├── fetchAndDisplayPane.tsx
    │   │   │   ├── index.ts
    │   │   │   ├── logitLens.tsx
    │   │   │   └── scoreExplanation.tsx
    │   │   ├── plots.tsx
    │   │   ├── requests
    │   │   │   ├── explainerRequests.ts
    │   │   │   ├── inferenceRequests.ts
    │   │   │   ├── paths.ts
    │   │   │   └── readRequests.ts
    │   │   ├── tokenHeatmap2d.tsx
    │   │   ├── tokenHeatmap.tsx
    │   │   ├── tokenRendering.tsx
    │   │   ├── TransformerDebugger
    │   │   │   ├── cards
    │   │   │   │   ├── BySequenceTokenDisplay.tsx
    │   │   │   │   ├── DisplayOptions.tsx
    │   │   │   │   ├── inference_params
    │   │   │   │   │   ├── AblateNodeSpecs.tsx
    │   │   │   │   │   ├── InferenceParamsDisplay.tsx
    │   │   │   │   │   ├── inferenceParams.ts
    │   │   │   │   │   ├── TokenLabel.tsx
    │   │   │   │   │   └── TraceUpstreamNodeSpec.tsx
    │   │   │   │   ├── LayerDisplay.tsx
    │   │   │   │   ├── LogitsDisplay.tsx
    │   │   │   │   ├── node_table
    │   │   │   │   │   ├── NodeTable.tsx
    │   │   │   │   │   └── TopTokensDisplay.tsx
    │   │   │   │   ├── prompt
    │   │   │   │   │   ├── MultiTokenInput.tsx
    │   │   │   │   │   ├── PromptAndTokensOfInterest.tsx
    │   │   │   │   │   └── swap.png
    │   │   │   │   ├── SparsityMetricsDisplay.tsx
    │   │   │   │   └── TokenTable.tsx
    │   │   │   ├── common
    │   │   │   │   ├── ExplanatoryTooltip.tsx
    │   │   │   │   └── JsonModal.tsx
    │   │   │   ├── requests
    │   │   │   │   ├── explanationFetcher.ts
    │   │   │   │   ├── inferenceDataFetcher.ts
    │   │   │   │   └── inferenceResponseUtils.tsx
    │   │   │   ├── TransformerDebugger.tsx
    │   │   │   └── utils
    │   │   │       ├── explanations.ts
    │   │   │       ├── nodes.tsx
    │   │   │       ├── numbers.tsx
    │   │   │       └── urlParams.ts
    │   │   ├── types.ts
    │   │   └── welcome.tsx
    │   ├── tailwind.config.js
    │   └── tsconfig.json
    ├── pyproject.toml
    ├── pytest.ini
    ├── README.md
    ├── setup.py
    └── terminology.md

29 directories, 252 files
