A Flink RichFunction Quiz

Zad • 2023-05-04 18:50 • Essays

Preface

Happy holiday to all the women out there~

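Quick Q&A

What are the uses and characteristics of RichFunction in the Flink DataStream API?

What is the RuntimeContext obtained inside a RichFunction used for?

Does every Function have a corresponding RichFunction implementation?

Can every Flink stream-processing operator accept a RichFunction?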

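The first two questions can really be merged into one. Compared with a plain Function, a RichFunction adds lifecycle management (the open() and close() methods) and access to its runtime context, the RuntimeContext. A RuntimeContext is tied to each parallel instance of a Function (that is, one sub-task), and through it you can further obtain the following:

Static runtime information, such as the task name, parallelism, maximum parallelism, the index of the current sub-task, and the current class loader;
Global data structures, namely accumulators, broadcast variables, and the distributed cache;
State handles of various kinds, created through the familiar get***State(StateDescriptor) methods.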
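To make the lifecycle and the per-sub-task context concrete, here is a dependency-free toy model in plain Java. It is only a sketch: ToyRuntimeContext, ToyRichMapper, PrefixMapper and ToyLifecycleDemo are hypothetical stand-ins invented for illustration, not Flink classes. In a real job you would extend one of Flink's Rich* function classes and call getRuntimeContext() instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the per-sub-task information a RuntimeContext exposes. */
final class ToyRuntimeContext {
    final String taskName;
    final int parallelism;
    final int subtaskIndex;

    ToyRuntimeContext(String taskName, int parallelism, int subtaskIndex) {
        this.taskName = taskName;
        this.parallelism = parallelism;
        this.subtaskIndex = subtaskIndex;
    }
}

/** Hypothetical stand-in for a rich map function: lifecycle hooks plus a context. */
abstract class ToyRichMapper<I, O> {
    private ToyRuntimeContext context;

    void setRuntimeContext(ToyRuntimeContext context) { this.context = context; }
    ToyRuntimeContext getRuntimeContext() { return context; }

    void open() {}          // called once, before the first record
    abstract O map(I value);
    void close() {}         // called once, after the last record
}

/** Builds its prefix once in open() from the context, then reuses it per record. */
final class PrefixMapper extends ToyRichMapper<String, String> {
    final List<String> events = new ArrayList<>(); // records the call order
    private String prefix;

    @Override
    void open() {
        ToyRuntimeContext ctx = getRuntimeContext();
        prefix = ctx.taskName + "[" + ctx.subtaskIndex + "/" + ctx.parallelism + "] ";
        events.add("open");
    }

    @Override
    String map(String value) {
        events.add("map");
        return prefix + value;
    }

    @Override
    void close() { events.add("close"); }
}

final class ToyLifecycleDemo {
    /** Drives one parallel instance: set context, open, process records, close. */
    static List<String> runSubtask(
            ToyRichMapper<String, String> mapper, int index, int parallelism, List<String> input) {
        mapper.setRuntimeContext(new ToyRuntimeContext("Map", parallelism, index));
        mapper.open();
        List<String> out = new ArrayList<>();
        try {
            for (String value : input) {
                out.add(mapper.map(value));
            }
        } finally {
            mapper.close();
        }
        return out;
    }

    public static void main(String[] args) {
        // Each sub-task gets its own function instance and its own context.
        System.out.println(runSubtask(new PrefixMapper(), 0, 2, Arrays.asList("a", "b")));
        System.out.println(runSubtask(new PrefixMapper(), 1, 2, Arrays.asList("c")));
    }
}
```

The point of the sketch is the ordering: the context is available before open(), open() runs once per parallel instance, and anything prepared there is reused for every record until close().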

As for the third question: yes. The fourth: no.

Where RichFunction Does Not Apply

A simple windowed aggregation:

dataStream.keyBy(x -> x.getKey())
  .window(TumblingProcessingTimeWindows.of(Time.seconds(1)))
  .reduce(new MyRichReduceFunction<>());

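This code compiles, but when the program runs it throws an UnsupportedOperationException with the message "ReduceFunction of reduce can not be a RichFunction". Switching to the aggregate() method with a RichAggregateFunction hits the same problem, with the message "This aggregation function cannot be a RichFunction". The corresponding implementation in WindowedStream shows that this road is closed, since the check fires inside the reduce() and aggregate() calls themselves: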

    public SingleOutputStreamOperator<T> reduce(ReduceFunction<T> function) {
        if (function instanceof RichFunction) {
            throw new UnsupportedOperationException(
                    "ReduceFunction of reduce can not be a RichFunction. "
                            + "Please use reduce(ReduceFunction, WindowFunction) instead.");
        }

        // clean the closure
        function = input.getExecutionEnvironment().clean(function);
        return reduce(function, new PassThroughWindowFunction<>());
    }

    public <ACC, R> SingleOutputStreamOperator<R> aggregate(
            AggregateFunction<T, ACC, R> function) {
        checkNotNull(function, "function");

        if (function instanceof RichFunction) {
            throw new UnsupportedOperationException(
                    "This aggregation function cannot be a RichFunction.");
        }

        TypeInformation<ACC> accumulatorType =
                TypeExtractor.getAggregateFunctionAccumulatorType(
                        function, input.getType(), null, false);

        TypeInformation<R> resultType =
                TypeExtractor.getAggregateFunctionReturnType(
                        function, input.getType(), null, false);

        return aggregate(function, accumulatorType, resultType);
    }

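Why Can't Rich[Reduce / Aggregate]Function Be Used?

The answer is not hard: unlike operators such as FlatMap and Filter, Reduce and Aggregate are operators that come with their own well-defined state semantics. The user does not need to manipulate state by hand (and if the user could interfere, things would very likely go wrong), nor is lifecycle management needed (their lifecycle always begins with the first record and ends with the last).

Taking the Reduce logic as an example (Aggregate works the same way), let's look further at how the corresponding window operator is constructed.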
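One way to read the "things would very likely go wrong" remark is sketched below in plain Java, with no Flink dependency. ToyKeyedReduce and CountingSum are hypothetical names made up for this illustration. The toy keeps exactly one running value per key, which is all the state a reduce needs, and then compares a pure reduce function with one that hides extra mutable state inside the function object.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

final class ToyKeyedReduce {
    /** Folds records per key; the only state is one running value per key. */
    static Map<String, Integer> reducePerKey(
            List<Map.Entry<String, Integer>> records, BinaryOperator<Integer> fn) {
        Map<String, Integer> state = new HashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            Integer current = state.get(r.getKey());
            // First record of a key is stored as-is; later ones are combined.
            state.put(r.getKey(), current == null ? r.getValue() : fn.apply(current, r.getValue()));
        }
        return state;
    }

    /** A reduce function with hidden mutable state shared by every key it sees. */
    static final class CountingSum implements BinaryOperator<Integer> {
        private int calls = 0; // lives in the function object, outside the per-key state

        @Override
        public Integer apply(Integer a, Integer b) {
            calls++;
            return a + b + calls;
        }
    }

    static Map.Entry<String, Integer> rec(String key, int value) {
        return new SimpleEntry<String, Integer>(key, value);
    }

    // The same records, with the two keys arriving in a different order.
    static final List<Map.Entry<String, Integer>> KEY_A_FIRST =
            Arrays.asList(rec("a", 1), rec("a", 2), rec("b", 10), rec("b", 20));
    static final List<Map.Entry<String, Integer>> KEY_B_FIRST =
            Arrays.asList(rec("b", 10), rec("b", 20), rec("a", 1), rec("a", 2));

    public static void main(String[] args) {
        // Pure function: a=3, b=30 for both arrival orders.
        System.out.println(reducePerKey(KEY_A_FIRST, Integer::sum));
        System.out.println(reducePerKey(KEY_B_FIRST, Integer::sum));
        // Hidden state: a=4, b=32 in one order, a=5, b=31 in the other.
        System.out.println(reducePerKey(KEY_A_FIRST, new CountingSum()));
        System.out.println(reducePerKey(KEY_B_FIRST, new CountingSum()));
    }
}
```

With the pure function the per-key result depends only on that key's records. With the hidden counter, the result for one key changes with how records of other keys happen to interleave, which is the kind of nondeterminism the checks in this article rule out.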

    public <R> WindowOperator<K, T, ?, R, W> reduce(
            ReduceFunction<T> reduceFunction, WindowFunction<T, R, K, W> function) {
        Preconditions.checkNotNull(reduceFunction, "ReduceFunction cannot be null");
        Preconditions.checkNotNull(function, "WindowFunction cannot be null");

        if (reduceFunction instanceof RichFunction) {
            throw new UnsupportedOperationException(
                    "ReduceFunction of apply can not be a RichFunction.");
        }

        if (evictor != null) {
            return buildEvictingWindowOperator(
                    new InternalIterableWindowFunction<>(
                            new ReduceApplyWindowFunction<>(reduceFunction, function)));
        } else {
            ReducingStateDescriptor<T> stateDesc =
                    new ReducingStateDescriptor<>(
                            WINDOW_STATE_NAME, reduceFunction, inputType.createSerializer(config));

            return buildWindowOperator(
                    stateDesc, new InternalSingleValueWindowFunction<>(function));
        }
    }

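Notice that a ReducingStateDescriptor is created here (the ReduceFunction happens to be one of its arguments), and a built-in ReducingState handle is eventually obtained from it. In everyday DataStream API programming, users rarely reach for ReducingState (or AggregatingState) themselves. Even so, the constructors of their descriptors carry the same mandatory check, rejecting a RichFunction to protect the determinism of the state.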

    public ReducingStateDescriptor(
            String name, ReduceFunction<T> reduceFunction, Class<T> typeClass) {
        super(name, typeClass, null);
        this.reduceFunction = checkNotNull(reduceFunction);

        if (reduceFunction instanceof RichFunction) {
            throw new UnsupportedOperationException(
                    "ReduceFunction of ReducingState can not be a RichFunction.");
        }
    }
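The guard pattern in the constructor above can be tried out in isolation with the following toy, written in plain Java so it runs without Flink. Everything in it (ToyRichFunction, ToyReducingStateDescriptor, ToyRichSum, ToyGuardDemo) is a hypothetical stand-in, and the exception message is made up. It only mirrors the shape of the check, which happens when the descriptor is built, before any record is processed.

```java
import java.util.function.BinaryOperator;

/** Hypothetical marker interface standing in for RichFunction. */
interface ToyRichFunction {}

/** Hypothetical descriptor that validates its function at construction time. */
final class ToyReducingStateDescriptor<T> {
    final String name;
    final BinaryOperator<T> reduceFunction;

    ToyReducingStateDescriptor(String name, BinaryOperator<T> reduceFunction) {
        if (reduceFunction == null) {
            throw new NullPointerException("reduceFunction");
        }
        // Fail fast: reject "rich" functions before any state is created.
        if (reduceFunction instanceof ToyRichFunction) {
            throw new UnsupportedOperationException(
                    "reduce function of a reducing state cannot be a rich function");
        }
        this.name = name;
        this.reduceFunction = reduceFunction;
    }
}

/** Same reduce logic as a plain sum, but it also carries the "rich" marker. */
final class ToyRichSum implements BinaryOperator<Integer>, ToyRichFunction {
    @Override
    public Integer apply(Integer a, Integer b) {
        return a + b;
    }
}

final class ToyGuardDemo {
    /** Returns true if building a descriptor around fn is rejected. */
    static boolean isRejected(BinaryOperator<Integer> fn) {
        try {
            new ToyReducingStateDescriptor<Integer>("toy-state", fn);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(Integer::sum));      // plain function is accepted
        System.out.println(isRejected(new ToyRichSum()));  // marked function is rejected
    }
}
```

Note that the reduce logic of the two functions is identical. The rejection is based purely on the type of the function object, just as the instanceof check in the quoted constructor is.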

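That said, Rich[Reduce / Aggregate]Function is never put to meaningful use inside the Flink codebase or its examples, so we can probably conclude that it is a legacy of Flink's evolution (lol)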

The End

Good night, good night.

Copyright notice:
Author: Zad
Link: https://www.techfm.club/p/45555.html
Source: TechFM
Copyright of this article belongs to the author. Please do not repost without permission.
