pFad - Phone/Frame/Anonymizer/Declutterfier! Saves Data!

jyejare · 2026-04-27T14:03:10Z

What this PR does / why we need it:

Adds operational metrics instrumentation for Feast's offline store and structured SOX audit logging for both online and offline feature retrieval paths. This is needed to satisfy the operational observability requirements.

Changes:

Offline Store RED Metrics (metrics.py, offline_store.py)

New Prometheus metrics: feast_offline_store_request_total (Counter), feast_offline_store_request_latency_seconds (Histogram), feast_offline_store_row_count (Histogram)
Instrumented RetrievalJob.to_arrow() as the single source of truth for offline retrieval metrics — captures request count, error rate, latency, and row count
Defensive try/except in the finally block ensures instrumentation failures never mask query errors
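
The defensive pattern in the last item can be sketched as follows. This is a minimal, stdlib-only illustration rather than the PR's code: `instrumented_to_arrow`, `run_query`, and `record_metrics` are hypothetical stand-ins for `RetrievalJob.to_arrow()`, the underlying query, and the Prometheus updates.

```python
import logging
import time

logger = logging.getLogger(__name__)


def instrumented_to_arrow(run_query, record_metrics):
    """Run a query and record metrics without letting metric errors escape.

    Both arguments are hypothetical callables used for illustration only.
    """
    start = time.monotonic()
    status = "success"
    row_count = 0
    try:
        table = run_query()
        row_count = len(table)
        return table
    except Exception:
        status = "error"
        raise  # the original query error must reach the caller
    finally:
        elapsed = time.monotonic() - start
        try:
            record_metrics(status=status, latency=elapsed, rows=row_count)
        except Exception:
            # Instrumentation is best effort: log and continue so that a
            # metrics failure never replaces the query's own exception.
            logger.debug("failed to record offline store metrics", exc_info=True)
```

Because the inner `try/except` swallows only instrumentation errors, a failing query still surfaces its own exception, and a successful query still returns its result even if the metrics backend is unavailable.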

SOX Audit Logging (metrics.py, feature_server.py, offline_store.py)

emit_online_audit_log(): structured JSON audit entries for online requests — captures requestor identity, entity keys, feature views, feature count, status, and latency
emit_offline_audit_log(): structured JSON audit entries for offline retrievals — captures method, feature views, row count, status, start/end timestamps, and duration
Logs routed to a dedicated feast.audit logger so operators can independently route audit entries to a SOX-compliant sink
Only entity key names are logged (not values) to minimize PII exposure
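
A rough sketch of what such an audit entry could look like, using only the standard library. The logger name `feast.audit` comes from the description above; the function names, the `event` value, and the exact field names are assumptions made for illustration and may differ from the PR's `emit_online_audit_log()`.

```python
import json
import logging
from datetime import datetime, timezone

# Dedicated logger: operators attach their own handlers to "feast.audit",
# so audit entries can be routed separately from application logs.
audit_logger = logging.getLogger("feast.audit")


def build_online_audit_entry(requestor, entity_rows, feature_views,
                             feature_count, status, latency_ms):
    """Build an audit record; field names are illustrative assumptions."""
    return {
        "event": "online_feature_retrieval",  # assumed event name
        "timestamp": datetime.now(tz=timezone.utc).isoformat(),
        "requestor": requestor,
        # Log only the entity key names, never the values, to limit PII.
        "entity_keys": sorted(entity_rows.keys()),
        "feature_views": sorted(feature_views),
        "feature_count": feature_count,
        "status": status,
        "latency_ms": latency_ms,
    }


def emit_audit(entry):
    """Serialize the entry as one JSON line on the audit logger."""
    audit_logger.info(json.dumps(entry))
```

An operator could then attach, for example, a `logging.FileHandler` or a syslog handler to `logging.getLogger("feast.audit")` without touching the rest of the logging configuration.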

Online Audit Integration (feature_server.py)

Extracted _parse_feature_info() helper to DRY up feature view name / count extraction (shared by _resolve_feature_counts and _emit_online_audit)
_emit_online_audit() wraps audit emission with best-effort error handling (logs at warning level on failure, never breaks the request)
Wired into /get-online-features endpoint's finally block

Configuration (base_config.py)

offline_features: bool = True — toggle for offline store Prometheus metrics
audit_logging: bool = False — toggle for structured audit log emission (opt-in)
Both flags integrated into _MetricsFlags and build_metrics_flags()

devin-ai-integration

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

devin-ai-integration · 2026-04-27T14:05:24Z

+                    now_iso = datetime.now(tz=timezone.utc).isoformat()
+                    feast_metrics.emit_offline_audit_log(
+                        method="to_arrow",
+                        feature_views=feature_views,
+                        feature_count=feature_count,
+                        row_count=row_count,
+                        status=status_label,
+                        start_time=now_iso,
+                        end_time=now_iso,


🟡 Offline audit log start_time and end_time are always identical

In RetrievalJob.to_arrow(), the now_iso timestamp used for both start_time and end_time in the audit log is captured at a single point in the finally block (line 201), after the query has already completed. This means both fields always contain the same value, making start_time meaningless. The start_time should instead be derived from now - elapsed (e.g., datetime.now(tz=timezone.utc) - timedelta(seconds=elapsed)) to reflect when the operation actually began.

Suggested change

-                    now_iso = datetime.now(tz=timezone.utc).isoformat()
-                    feast_metrics.emit_offline_audit_log(
-                        method="to_arrow",
-                        feature_views=feature_views,
-                        feature_count=feature_count,
-                        row_count=row_count,
-                        status=status_label,
-                        start_time=now_iso,
-                        end_time=now_iso,
+                    end_time = datetime.now(tz=timezone.utc)
+                    start_time = end_time - __import__('datetime').timedelta(seconds=elapsed)
+                    feast_metrics.emit_offline_audit_log(
+                        method="to_arrow",
+                        feature_views=feature_views,
+                        feature_count=feature_count,
+                        row_count=row_count,
+                        status=status_label,
+                        start_time=start_time.isoformat(),
+                        end_time=end_time.isoformat(),
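
The same derivation can be written with a regular `timedelta` import instead of the inline `__import__`. The sketch below is a standalone illustration; `audit_window` is a hypothetical helper name, and `elapsed_seconds` is assumed to be the duration already measured around the query.

```python
import time
from datetime import datetime, timedelta, timezone


def audit_window(elapsed_seconds):
    """Return (start_iso, end_iso) for an operation that just finished."""
    end_time = datetime.now(tz=timezone.utc)
    # Back-date the start by the measured duration of the operation.
    start_time = end_time - timedelta(seconds=elapsed_seconds)
    return start_time.isoformat(), end_time.isoformat()


t0 = time.monotonic()
time.sleep(0.01)  # stands in for the offline query
start_iso, end_iso = audit_window(time.monotonic() - t0)
```

An alternative with the same effect is to capture a wall-clock start timestamp before the query runs and pass it through to the `finally` block.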


Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>

ntkathole · 2026-04-28T14:04:14Z

+            >= before_sum + 500
+        )
+
+    def test_row_count_not_recorded_when_zero(self):


Is this correct? It should probably be recorded, so that "metric not emitted" can be told apart from "query returned 0 rows".
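
To illustrate the distinction raised here, the sketch below uses a tiny hand-written histogram as a stand-in for a Prometheus one (it is not the `prometheus_client` API, and the bucket bounds are arbitrary). Observing `0` leaves the sum unchanged but still increments the observation count and the lowest bucket, which is what makes an empty result visible.

```python
class RowCountHistogram:
    """Tiny stand-in for a Prometheus histogram (illustration only)."""

    def __init__(self, buckets=(0, 100, 10_000)):
        self.buckets = buckets
        self.bucket_counts = [0] * len(buckets)
        self.count = 0
        self.total = 0

    def observe(self, value):
        self.count += 1
        self.total += value
        for i, upper in enumerate(self.buckets):
            if value <= upper:
                self.bucket_counts[i] += 1  # buckets are cumulative


hist = RowCountHistogram()
hist.observe(0)    # a query that succeeded but returned no rows
hist.observe(500)  # a query that returned 500 rows
```

If the zero-row observation were skipped, `hist.count` would be 1 and there would be no way to tell from this metric alone that an empty retrieval happened.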

ntkathole · 2026-04-28T14:05:31Z

+    audit_logger.info(
+        _json_dumps(
+            {
+                "event": "offline_feature_retrieval",


Also log the current timestamp, similar to emit_online_audit_log?

ntkathole · 2026-04-28T14:09:01Z

Add metrics in docs/reference/feature-servers/python-feature-server.md

ntkathole · 2026-04-28T14:16:19Z

-        fv_names = {ref.split(":")[0].split("@")[0] for ref in features if ":" in ref}
-        fv_count = len(fv_names)
+        fv_names = list(
+            {ref.split(":")[0].split("@")[0] for ref in features if ":" in ref}


This might not be new, but it has to handle versioning as well. Maybe use _parse_feature_ref from feast.utils:

fv_names = list({_parse_feature_ref(ref)[0] for ref in features if ":" in ref})
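
For reference, the split-based expression from the diff above behaves as follows on a few made-up refs. `feature_view_names` is a hypothetical wrapper, and the `@v2` suffix is only an assumed example of a versioned ref; the behavior of `_parse_feature_ref` itself is not shown here.

```python
def feature_view_names(features):
    """Return sorted, de-duplicated feature view names from feature refs.

    Mirrors the split-based expression in the diff; it assumes refs of the
    form "view:feature" with an optional "@suffix" on the view name.
    """
    names = {ref.split(":")[0].split("@")[0] for ref in features if ":" in ref}
    return sorted(names)


refs = [
    "driver_stats:conv_rate",
    "driver_stats:acc_rate",
    "driver_stats@v2:conv_rate",  # hypothetical versioned ref
    "not_a_feature_ref",          # no ":", so it is skipped
]
print(feature_view_names(refs))  # ['driver_stats']
```

Keeping the parsing in one shared helper, as suggested, would avoid the audit path and the metrics path drifting apart if the ref syntax changes.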

jyejare requested review from a team as code owners April 27, 2026 14:03

jyejare requested review from ejscribner, robhowley and shuchu and removed request for a team April 27, 2026 14:03

devin-ai-integration Bot reviewed Apr 27, 2026


feat: Operational metrics for offline store and SOX metrics for both

e0a5d54

Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>

jyejare force-pushed the remaining_ops_metrics branch from 9c69d73 to e0a5d54 Compare April 28, 2026 00:57

ntkathole reviewed Apr 28, 2026


jyejare mentioned this pull request May 5, 2026

[Feature] Built-in feature drift detection with alerting #6341

Open


feat: Operational metrics for offline store and SOX metrics for both #6340

jyejare wants to merge 1 commit into feast-dev:master from jyejare:remaining_ops_metrics


2 participants
