# announcements

    Beryl Chen

    03/28/2024, 2:23 PM
    @here Just a friendly reminder that our webinar on “Apache Iceberg + StarRocks: Your Recipe for Superior Lakehouse Performance” is happening today at 10 AM PT | 1 PM ET. If you haven’t registered yet, there’s still time to sign up! https://celerdata.wistia.com/live/events/c737g3iuo1

    GitHub

    03/29/2024, 11:25 PM
    Release - Release notes 3.1.10
    New release published by jaogoy
    Release date: March 29, 2024
    New Features
    • Primary Key tables support Size-tiered Compaction. #42474
    Behavior Changes
    • When null values in JSON data are evaluated with the `IS NULL` operator, they are now treated as NULL values, following SQL semantics. For example, `SELECT parse_json('{"a": null}') -> 'a' IS NULL` returns `true` (before this behavior change, it returned `false`). #42815
    Improvements
    • When Broker Load is used to load data from ORC files that contain TIMESTAMP-type data, StarRocks retains microseconds in the timestamps when converting them to its own DATETIME data type. #42348
    Bug Fixes
    Fixed the following issues:
    • In shared-data mode, the garbage collection and thread eviction mechanisms for persistent indexes created on Primary Key tables cannot take effect on CN nodes. As a result, obsolete data cannot be deleted. #42241
    • When users query ORC files by using Hive catalogs, the query results may be incorrect because StarRocks used to read ORC files from Hive by mapping columns by position. To resolve this issue, users can set the session variable `orc_use_column_names` to `true`, which makes StarRocks read ORC files from Hive by mapping columns by name. #42905
    • When LDAP authentication for the AD system is adopted, logins without passwords are allowed. #42476
    • When disk device names end with digits, the values of monitoring metrics remain 0 because the disk device names may become invalid after such digits are removed. #42741
    StarRocks/starrocks
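A minimal Python sketch of why positional mapping can return wrong results, assuming a Hive table whose column order was changed after some ORC files were written. The column names, the sample row, and both helper functions are hypothetical illustrations, not StarRocks internals; in StarRocks itself the switch is the `orc_use_column_names` session variable mentioned above.

```python
# Hypothetical illustration: an ORC file written before a schema change stores
# its columns as (id, name, city); the Hive table now declares (id, city, name).

file_columns = ["id", "name", "city"]   # physical column order inside the file
row_in_file = [1, "alice", "paris"]     # one row as stored in the file
table_schema = ["id", "city", "name"]   # column order declared by the table


def read_by_position(schema, file_cols, row):
    """Map the i-th table column to the i-th file column (file_cols is ignored)."""
    return {col: row[i] for i, col in enumerate(schema)}


def read_by_name(schema, file_cols, row):
    """Look each table column up by name in the file's own column list."""
    index = {name: i for i, name in enumerate(file_cols)}
    return {col: row[index[col]] for col in schema}


print(read_by_position(table_schema, file_columns, row_in_file))
# {'id': 1, 'city': 'alice', 'name': 'paris'}  <- values land in the wrong columns
print(read_by_name(table_schema, file_columns, row_in_file))
# {'id': 1, 'city': 'paris', 'name': 'alice'}
```

Positional mapping silently swaps `city` and `name` here, which is the kind of incorrect result the fix addresses. Name-based lookup tolerates reordering but assumes the file stores column names that match the table definition.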

    Sida Shen

    04/09/2024, 6:07 PM
    Hey @channel, the StarRocks community team is looking for your direct feedback on the community experience and your thoughts about the project. Positive, negative, or REALLY negative, we want to hear it, and we want to know how we can do better. If you're willing to share a little time with us sometime in the next few weeks, just reply to this post or give it a thumbs up and we'll follow up with details.

    Beryl Chen

    04/11/2024, 1:30 PM
    👉 Our StarRocks Connect: Sling Webinar is happening today at 10 AM PT | 1 PM ET. @channel Meet Fritz Larco, the brain behind Sling, in our upcoming session. Discover what sets Sling apart and the value it brings to StarRocks users, with a live demo included. Click here to register: https://celerdata.wistia.com/live/events/ejmvbhfhep Sling (https://slingdata.io/) is a powerful data integration CLI tool that offers an easy way to create and maintain high-volume data pipelines using the Extract & Load (EL) approach.

    Beryl Chen

    04/17/2024, 6:25 PM
    Hi @channel, in our ongoing efforts to enhance support and improve your overall experience, we’re making some changes to our Slack community:
    • Introducing #C06UVQVB668: This channel is designed to be your starting point if you’re new to StarRocks. Whether you’re curious about what StarRocks can do, where to begin, or how to configure your setup, you’ll find all the guidance you need right here.
    • #using-starrocks is now #C02FACZSNJV: Apart from the name change, you can continue using this channel as usual. Should you have any issues or questions about using StarRocks, feel free to seek the help you need right here!
    We hope these updates make your time with StarRocks more productive and enjoyable. See you in the channels!

    GitHub

    04/19/2024, 8:29 AM
    Release - 3.2.5 (Deprecated)
    New release published by yingtingdong
    Release date: April 12, 2024
    TIP: This version has been taken offline due to privilege issues in querying external tables in external catalogs such as Hive and Iceberg.
    Problem: When a user queries data from an external table in an external catalog, access to this table is denied even when the user has the SELECT privilege on this table. SHOW GRANTS also shows that the user has this privilege.
    Impact scope: This problem affects only queries on external tables in external catalogs. Other queries are not affected.
    Temporary workaround: The query succeeds after the SELECT privilege on this table is granted to the user again, but SHOW GRANTS will then return duplicate privilege entries. After upgrading to v3.2.6, users can run REVOKE to remove one of the duplicate entries.
    New Features
    • Supports the `dict_mapping` column property, which can significantly facilitate the loading process during the construction of a global dictionary, accelerating exact COUNT DISTINCT calculations.
    Behavior Changes
    • When null values in JSON data are evaluated with the `IS NULL` operator, they are now treated as NULL values, following SQL semantics. For example, `SELECT parse_json('{"a": null}') -> 'a' IS NULL` returns `true` (before this behavior change, it returned `false`). #42765
    Improvements
    • Optimized the column type unionization rules for automatic schema detection in the FILES table function. When columns with the same name but different types exist in separate files, FILES attempts to merge them by selecting the type with the larger granularity as the final type. For example, if there are columns with the same name but of types FLOAT and INT respectively, FILES returns DOUBLE as the final type. #40959
    • Primary Key tables support Size-tiered Compaction to reduce I/O amplification. #41130
    • When Broker Load is used to load data from ORC files that contain TIMESTAMP-type data, StarRocks retains microseconds in the timestamps when converting them to its own DATETIME data type. #42179
    • Optimized the error messages for Routine Load. #41306
    • Optimized the error messages when the FILES table function is used to convert invalid data types. #42717
    Bug Fixes
    Fixed the following issues:
    • FEs fail to start after system-defined views are dropped. Dropping system-defined views is now prohibited. #43552
    • BEs crash when duplicate sort key columns exist in Primary Key tables. Duplicate sort key columns are now prohibited. #43206
    • An error, instead of NULL, is returned when the input value of the to_json() function is NULL. #42171
    • In shared-data mode, the garbage collection and thread eviction mechanisms for persistent indexes created on Primary Key tables cannot take effect on CN nodes. As a result, obsolete data cannot be deleted. #41955
    • In shared-data mode, an error is returned when users modify the `enable_persistent_index` property of a Primary Key table. #42890
    • In shared-data mode, NULL values are written to columns that are not supposed to be changed when users update a Primary Key table with partial updates in column mode. #42355
    • Queries cannot be rewritten with asynchronous materialized views created on logical views. #42173
    • CNs crash when the Cross-cluster Data Migration Tool is used to migrate Primary Key tables to a shared-data cluster. #42260
    • The partition ranges of external catalog-based asynchronous materialized views are not consecutive. #41957
    StarRocks/starrocks
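The FILES() type unionization rule can be pictured as a small widening function folded over the types found in each file. The Python sketch below is an approximation under loud assumptions: only the FLOAT + INT → DOUBLE case comes from the release note, while the integer ranking and the STRING fallback for unrelated types are illustrative guesses, not StarRocks' actual rule table.

```python
# Hypothetical approximation of FILES() schema-detection type merging.
# Only FLOAT + INT -> DOUBLE is taken from the release note; the rest is assumed.

INT_RANK = ["TINYINT", "SMALLINT", "INT", "BIGINT", "LARGEINT"]
FLOAT_RANK = ["FLOAT", "DOUBLE"]


def merge_types(a: str, b: str) -> str:
    """Return the wider of two types seen for the same column name."""
    if a == b:
        return a
    if a in INT_RANK and b in INT_RANK:
        # Both integers: keep the wider one.
        return max(a, b, key=INT_RANK.index)
    if a in FLOAT_RANK and b in FLOAT_RANK:
        return "DOUBLE"
    if {a, b} <= set(INT_RANK + FLOAT_RANK):
        # An integer type mixed with a floating-point type widens to DOUBLE.
        return "DOUBLE"
    # Assumed fallback for unrelated types (e.g. INT vs DATE).
    return "STRING"


def merge_all(types):
    """Fold the merge rule over the types detected in several files."""
    result = types[0]
    for t in types[1:]:
        result = merge_types(result, t)
    return result
```

For example, `merge_all(["TINYINT", "INT", "FLOAT"])` first widens to INT and then to DOUBLE. Folding pairwise keeps the rule order-independent for the numeric cases shown here.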

    GitHub

    04/19/2024, 8:31 AM
    Release - 3.2.6
    New release published by yingtingdong
    Release date: April 18, 2024
    Bug Fixes
    Fixed the following issue:
    • The privileges of external tables cannot be found due to incompatibility issues. #44030
    StarRocks/starrocks

    GitHub

    04/28/2024, 11:55 AM
    Release - Release notes 3.1.11
    New release published by jaogoy
    Release date: April 28, 2024
    Behavior Changes
    • Users are not allowed to drop views in the system database `information_schema` using DROP TABLE. #43556
    • Users are not allowed to specify duplicate keys in the ORDER BY clause when creating a Primary Key table. #43374
    Improvements
    • Queries on Parquet-formatted Iceberg v2 tables support equality deletes.
    Bug Fixes
    Fixed the following issues:
    • When a user queries data from an external table in an external catalog, access to this table is denied even when the user has the SELECT privilege on this table. SHOW GRANTS also shows that the user has this privilege. #44061
    • `str_to_map` may cause BEs to crash. #43930
    • While a Routine Load job is running, `show proc '/routine_loads'` gets stuck due to a deadlock. #44249
    • Persistent Index of Primary Key tables may cause BEs to crash due to issues in concurrency control. #43720
    • The `pending_task_run_count` displayed on the page of `leaderFE_IP:8030` is incorrect. The displayed number is the sum of Pending and Running tasks, not Pending tasks only. In addition, the metric `refresh_pending` cannot be displayed using `followerFE_IP:8030`. #43052
    • Some SQL queries that contain CTEs may encounter the `Invalid plan: PhysicalTopNOperator` error. #44185
    StarRocks/starrocks
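Rejecting duplicate keys in the ORDER BY clause of a Primary Key table (the restriction listed above, and the cause of the BE crash fixed in 3.2.5) amounts to a simple validation step at table-creation time. A minimal Python sketch; the helper names are hypothetical, and comparing column names case-insensitively is an assumption the release note does not state.

```python
# Hypothetical validation sketch: reject a Primary Key table definition whose
# ORDER BY (sort key) list repeats a column. Case-insensitive matching is assumed.

def find_duplicate_sort_keys(order_by_columns):
    """Return sort key columns (lower-cased) that appear more than once, in first-seen order."""
    seen = set()
    duplicates = []
    for col in order_by_columns:
        key = col.lower()
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


def validate_sort_keys(order_by_columns):
    """Raise ValueError if the sort key list contains duplicates."""
    dups = find_duplicate_sort_keys(order_by_columns)
    if dups:
        raise ValueError(f"duplicate sort key columns: {', '.join(dups)}")


validate_sort_keys(["dt", "user_id"])            # accepted
# validate_sort_keys(["dt", "user_id", "DT"])    # would raise ValueError
```

Failing fast at DDL time is cheaper than letting a malformed sort key reach the storage layer, which is where the earlier crash surfaced.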

    GitHub

    06/21/2024, 1:50 AM
    Release - Release notes 2.5.22
    New release published by jaogoy
    Release date: June 20, 2024
    Improvements
    • Optimized a partition check logic used for building query execution plans, significantly reducing the time consumption of complex queries that involve multiple tables. #46781
    Bug Fixes
    Fixed the following issues:
    • Function Call does not handle child errors correctly. #42590
    • The internal data statistics were not cleaned up regularly, causing inaccurate estimates and thereby inefficient query plans, which could degrade query performance and cause a surge in memory usage. #45839
    • Using a stale column histogram may lead to a Division by Zero exception. #45614
    StarRocks/starrocks

    GitHub

    06/21/2024, 12:11 PM
    Release - 3.3.0
    New release published by Dshadowzh
    New Features and Improvements
    Shared-data Cluster
    • Optimized the performance of Schema Evolution in shared-data clusters, reducing the time consumption of DDL changes to a sub-second level. For more information, see Schema Evolution.
    • To meet the need to migrate data from shared-nothing clusters to shared-data clusters, the community officially released the StarRocks Data Migration Tool. It can also be used for data synchronization and disaster recovery between shared-nothing clusters.
    • [Preview] AWS Express One Zone Storage can be used as storage volumes, significantly improving read and write performance. For more information, see CREATE STORAGE VOLUME.
    • Optimized the garbage collection (GC) mechanism in shared-data clusters. Supports manual compaction for data in object storage. For more information, see Manual Compaction.
    • Optimized the Publish execution of Compaction transactions for Primary Key tables in shared-data clusters, reducing I/O and memory overhead by avoiding reading primary key indexes.
    Data Lake Analytics
    • Data Cache enhancements
      • Added the Data Cache Warmup command CACHE SELECT to fetch hotspot data from data lakes, which speeds up queries and minimizes resource usage. CACHE SELECT can work with SUBMIT TASK to achieve periodic cache warmup. This feature supports both tables in external catalogs and internal tables in shared-data clusters.
      • Added metrics and monitoring methods to enhance the observability of Data Cache.
    • Parquet reader performance enhancements
      • Optimized Page Index, significantly reducing the data scan size.
      • Reduced the occurrence of reading unnecessary pages when Page Index is used.
      • Uses SIMD to accelerate the computation to determine whether data rows are empty.
    • ORC reader performance enhancements
      • Uses column ID for predicate pushdown to read ORC files after Schema Change.
      • Optimized the processing logic for ORC tiny stripes.
    • Iceberg table format enhancements
      • Significantly improved the metadata access performance of the Iceberg Catalog by refactoring the parallel Scan logic. Resolved the single-threaded I/O bottleneck in the native Iceberg SDK when handling large volumes of metadata files. As a result, queries with metadata bottlenecks now run more than 10 times faster.
      • Queries on Parquet-formatted Iceberg v2 tables support equality deletes.
    • [Experimental] Paimon Catalog enhancements
      • Materialized views created based on Paimon external tables now support automatic query rewriting.
      • Optimized Scan Range scheduling for queries against the Paimon Catalog, improving I/O concurrency.
      • Supports querying Paimon system tables.
      • Paimon external tables now support DELETE Vectors, enhancing query efficiency in update and delete scenarios.
    • Enhancements in collecting external table statistics
      • ANALYZE TABLE can be used to collect histograms of external tables, which helps prevent data skews.
      • Supports collecting statistics of STRUCT subfields.
    • Table sink enhancements
      • The performance of the Sink operator is doubled compared to Trino.
      • Data can be sunk to Textfile- and ORC-formatted tables in Hive catalogs and storage systems such as HDFS and cloud storage like AWS S3.
    • [Preview] Supports Alibaba Cloud MaxCompute catalogs, with which you can query data from MaxCompute without ingestion and directly transform and load the data from MaxCompute by using INSERT INTO.
    • [Experimental] Supports ClickHouse Catalog.
    • [Experimental] Supports Kudu Catalog.
    Performance Improvement and Query Optimization
    • Optimized performance on ARM. Significantly optimized performance for ARM architecture instruction sets. Performance tests on AWS Graviton instances showed that the ARM architecture was 11% faster than the x86 architecture in the SSB 100G test, 39% faster in the Clickbench test, 13% faster in the TPC-H 100G test, and 35% faster in the TPC-DS 100G test.
    • Spill to Disk is now GA. Optimized the memory usage of complex queries and improved spill scheduling, allowing large queries to run stably without OOM.
    • [Preview] Supports spilling intermediate results to object storage.
    • Supports more indexes.
      • [Preview] Supports full-text inverted index to accelerate full-text searches.
      • [Preview] Supports N-Gram bloom filter index to speed up LIKE queries and the computation of the ngram_search and ngram_search_case_insensitive functions.
    • Improved the performance and memory usage of Bitmap functions. Added the capability to export Bitmap data to Hive by using Hive Bitmap UDFs.
    • [Preview] Supports Flat JSON. This feature automatically detects JSON data during data loading, extracts common fields from the JSON data, and stores these fields in a columnar manner. This improves JSON query performance to a level comparable to querying STRUCT data.
    • [Preview] Optimized global dictionary. It provides a dictionary object that stores the mapping of key-value pairs from a dictionary table in BE memory. A new dictionary_get() function directly queries the dictionary object in BE memory, which is faster than querying the dictionary table with the dict_mapping() function. Furthermore, the dictionary object can also serve as a dimension table: dimension values can be obtained by directly querying the dictionary object with dictionary_get(), which is faster than the original method of performing JOIN operations on the dimension table.
    • [Preview] Supports Colocate Group Execution, which significantly reduces memory usage for executing Join and Agg operators on colocated tables, so that large queries can be executed more stably.
    • Optimized the performance of CodeGen. JIT is enabled by default, which achieves a 5X performance improvement for complex expression calculations.
    • Supports using vectorization technology to implement regular expression matching, which reduces the CPU consumption of the regexp_replace function.
    • Optimized Broadcast Join so that the Broadcast Join operation can be terminated early when the right table is empty.
    • Optimized Shuffle Join in scenarios of data skew to prevent OOM.
    • When an aggregate query contains Limit, multiple Pipeline threads can share the Limit condition to avoid unnecessary compute resource consumption.
    Storage Optimization and Cluster Management
    • Enhanced flexibility of range partitioning. Three time functions can be use…
    StarRocks/starrocks
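An N-Gram bloom filter index records the n-grams of each data block in a bloom filter, so that a `LIKE '%pattern%'` predicate can skip any block that cannot contain every n-gram of the pattern. The Python sketch below shows this general technique with n = 3 and a toy bloom filter built on `hashlib`; the filter size, hash scheme, and function names are assumptions for illustration, not StarRocks' on-disk index format.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: k hash positions per item in an m-slot array."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per slot, for readability

    def _positions(self, item: str):
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


def ngrams(text: str, n: int = 3):
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_block_index(values, n: int = 3) -> BloomFilter:
    """Index every n-gram of every string value stored in one data block."""
    bf = BloomFilter()
    for v in values:
        for g in ngrams(v, n):
            bf.add(g)
    return bf


def block_may_match(bf: BloomFilter, like_literal: str, n: int = 3) -> bool:
    """For LIKE '%literal%': skip the block if any n-gram of the literal is absent.

    Literals shorter than n yield no n-grams, so such blocks cannot be pruned.
    """
    return all(bf.might_contain(g) for g in ngrams(like_literal, n))


if __name__ == "__main__":
    block = build_block_index(["starrocks", "iceberg"])
    print(block_may_match(block, "rock"))   # True: every trigram of "rock" was indexed
    print(block_may_match(block, "ro"))     # True: too short to prune
    print(block_may_match(block, "doris"))  # almost certainly False, so the block is skipped
```

Because bloom filters never produce false negatives, pruning is always safe; a false positive only costs an unnecessary block scan.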

    GitHub

    06/26/2024, 4:50 AM
    Release - Release notes 3.1.13
    New release published by jaogoy
    Release date: June 26, 2024
    Improvements
    • The Broker process supports access to Tencent Cloud COS Posix buckets. Users can load data from COS Posix buckets using Broker Load or unload data to COS Posix buckets using the SELECT INTO OUTFILE statement. #46597
    • Supports viewing comments of Hive tables in Hive Catalogs using SHOW CREATE TABLE. #37686
    • Optimized the evaluation time of Conjunct in WHERE clauses, such as multiple LIKE clauses on the same column or CASE WHEN expressions. #46914
    Bug Fixes
    Fixed the following issues:
    • DELETE statements fail in shared-data clusters if an excessive number of partitions is to be deleted. #46229
    StarRocks/starrocks

    Beryl Chen

    06/27/2024, 1:30 PM
    Just a friendly reminder that today’s session, “Materialized Views: Tips, Tricks, and Use Cases”, is happening at 10 AM PT | 1 PM ET. @channel Join @Murphy, the technical mind behind StarRocks’ materialized views, for a deep dive into the latest updates to this feature, and learn how you can leverage it to achieve the best query performance for your data pipeline. (Can’t join us live? That’s perfectly okay! Please sign up, and we will make sure you get a copy of the recording.) https://celerdata.wistia.com/live/events/0qs82agehn

    Beryl Chen

    07/08/2024, 8:24 PM
    @channel Here are some helpful resources to get you up to speed with the latest StarRocks 3.3:
    • Release Notes: StarRocks 3.3 Release Notes
    • Guide: StarRocks 3.3 Features and Improvements
    • Webinar Recording: Watch the Webinar

    If you encounter any issues or have questions, don’t hesitate to post in the #C02FACZSNJV or #C06UVQVB668 channel. We’re here to help!

    GitHub

    07/11/2024, 12:23 PM
    Release - 3.2.9
    New release published by yingtingdong
    New Features
    • Paimon tables now support DELETE Vectors. #45866
    • Supports column-level access control through Apache Ranger. #47702
    • Stream Load can automatically convert JSON strings into STRUCT/MAP/ARRAY types during loading. #45406
    • JDBC Catalog now supports Oracle and SQL Server. #35691
    Improvements
    • Improved privilege management by restricting users with the `user_admin` role from resetting the password of the root user. #47801
    • Stream Load now supports using `\t` and `\n` as row and column delimiters. Users do not need to convert them to their hexadecimal ASCII codes. #47302
    • Optimized memory usage during data loading. #47047
    • Supports masking authentication information for the Files() function in audit logs. #46893
    • Hive tables now support the `skip.header.line.count` property. #47001
    • JDBC Catalog supports more data types. #47618
    Bug Fixes
    Fixed the following issues:
    • BE crash caused by ALTER TABLE ADD COLUMN after upgrading a shared-data cluster from v3.2.x to v3.3.0 and then rolling it back. #47826
    • Tasks initiated through SUBMIT TASK showed a Running status indefinitely in the QueryDetail interface. #47619
    • Forwarding queries to the FE Leader node caused a null pointer exception. #47559
    • SHOW MATERIALIZED VIEWS with WHERE conditions caused a null pointer exception. #47811
    • Vertical Compaction fails for Primary Key tables in shared-data clusters. #47192
    • I/O errors are handled improperly when sinking data to Hive or Iceberg tables. #46979
    • Table properties do not take effect when whitespace is added to their values. #47119
    • BE crash caused by concurrent migration and Index Compaction operations on Primary Key tables. #46675
    StarRocks/starrocks
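For reference, the hexadecimal ASCII codes that users previously had to write for these delimiters can be computed as follows. This Python snippet only illustrates the arithmetic (tab is 0x09, newline is 0x0a); the `\xNN` rendering is an assumption about notation, so check the Stream Load documentation for the exact form the `column_separator` and `row_delimiter` headers accept.

```python
# Show the hexadecimal ASCII code behind each delimiter character.

def to_hex_escape(ch: str) -> str:
    """Render a single character as a \\xNN hexadecimal escape string."""
    return f"\\x{ord(ch):02x}"


for name, ch in [("tab (\\t)", "\t"), ("newline (\\n)", "\n")]:
    print(name, "->", to_hex_escape(ch))
# tab (\t) -> \x09
# newline (\n) -> \x0a
```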

    Beryl Chen

    07/12/2024, 7:40 PM
    @channel Here’s the recording of our ‘Materialized Views: Tips, Tricks, and Use Cases’ webinar. If you missed it or want to revisit the content, feel free to take a look.

    https://youtu.be/f4E7qT4JCso?si=jInO25tog9JGGiLd


    Beryl Chen

    07/18/2024, 4:00 PM
    @channel We know many of you are curious about how StarRocks compares to Apache Doris. Here is an in-depth comparison guide; hopefully it answers all your questions: https://starrocks.medium.com/detailed-comparison-between-starrocks-and-apache-doris-81ddd34be527

    GitHub

    07/19/2024, 3:31 AM
    Release - 3.3.1
    New release published by wangsimo0
    Release date: July 18, 2024
    New Features
    • [Preview] Supports temporary tables.
    • [Preview] JDBC Catalog supports Oracle and SQL Server.
    • [Preview] Unified Catalog supports Kudu.
    • Loading data into Primary Key tables with INSERT INTO supports partial updates in column mode.
    • User-defined variables support the ARRAY type. #42631
    • Stream Load supports converting JSON-type data and loading it into columns of STRUCT/MAP/ARRAY types. #45406
    • Supports global dictionary cache.
    • Supports deleting partitions in batch. #44744
    • Supports queries on Iceberg views. #46273
    • Supports managing column-level permissions in Apache Ranger. (Column-level permissions for materialized views and views must be set under the table object.) #47702
    Improvements
    • Optimized the IdChain hashcode implementation to reduce the FE restart time. #47599
    • Improved error messages for the `csv.trim_space` parameter in the FILES() function, checking for illegal characters and providing reasonable prompts. #44740
    • Stream Load supports using `\t` and `\n` as row and column delimiters. Users do not need to convert them to their hexadecimal ASCII codes. #47302
    Bug Fixes
    Fixed the following issues:
    • Schema Change failures due to file location changes caused by Tablet migration during the Schema Change process. #45517
    • Cross-cluster Data Migration Tool fails to create tables in the target cluster due to control characters such as `\`, `\r` in the default values of fields. #47861
    • Persistent bRPC failures after BE restarts. #40229
    • The `user_admin` role can change the root password using the ALTER USER command. #47801
    • Primary key index write failures cause data write errors. #48045
    Behavior Changes
    • Intermediate result spilling is enabled by default when sinking data to Hive and Iceberg. #47118
    • Changed the default value of the BE configuration item `max_cumulative_compaction_num_singleton_deltas` to `500`. #47621
    • When users create a partitioned table without specifying the bucket number, if the number of partitions exceeds 5, the rule for setting the bucket count is changed to `max(2*BE or CN count, bucket number calculated based on the largest historical partition data volume)`. The previous rule was to calculate the bucket number based on the largest historical partition data volume only. #47949
    Downgrade notes
    To downgrade a cluster from v3.3.1 or later to v3.2, users must clean up all temporary tables in the cluster by following these steps:
    1. Prevent users from creating new temporary tables: `ADMIN SET FRONTEND CONFIG("enable_experimental_temporary_table"="false");`
    2. Check whether there are any temporary tables in the cluster: `SELECT * FROM information_schema.temp_tables;`
    3. If there are temporary tables in the system, clean them up using the following command (the SYSTEM-level OPERATE privilege is required): `CLEAN TEMPORARY TABLE ON SESSION 'session';`
    StarRocks/starrocks
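The new default bucket rule for partitioned tables can be written as a small function. A Python sketch under stated assumptions: `size_based_buckets` is a hypothetical stand-in for the bucket number StarRocks derives from the largest historical partition's data volume (how that number is computed is not described here), and the branch for 5 or fewer partitions assumes the previous size-based rule still applies, which the note does not state explicitly.

```python
def default_bucket_count(num_partitions: int, node_count: int, size_based_buckets: int) -> int:
    """Hypothetical default bucket count for a partitioned table (v3.3.1 rule).

    num_partitions: number of partitions in the table.
    node_count: number of BE (or CN) nodes in the cluster.
    size_based_buckets: stand-in for the bucket number derived from the largest
        historical partition's data volume; its derivation is not modeled here.
    """
    if num_partitions > 5:
        # v3.3.1: never fewer buckets than twice the node count.
        return max(2 * node_count, size_based_buckets)
    # Assumption: with 5 or fewer partitions the previous size-based rule applies.
    return size_based_buckets
```

For example, with 3 BE nodes and a size-based estimate of 4 buckets, a table with 12 partitions gets max(6, 4) = 6 buckets. The floor of twice the node count keeps every node busy even when individual partitions are small.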

    Beryl Chen

    07/25/2024, 1:30 PM
    Just a friendly reminder that today’s session, “Rockset Acquired by OpenAI: What’s Next for Its Users?”, is happening at 10 AM PT | 1 PM ET. @channel
    Agenda:
    • What the Rockset acquisition means for its users.
    • Immediate steps users should take now to ensure continuity in their operations.
    • The pros and cons of multiple open-source and commercial alternatives for each Rockset use case.
    https://celerdata.wistia.com/live/events/w0k2mcmpi0 (Can’t join us live? That’s perfectly okay! Please sign up, and we will make sure you get a copy of the recording.)

    Beryl Chen

    07/26/2024, 5:05 PM
    You’ve likely seen him on the website, on social media, or even in a recent StarRocks webinar. But now, let’s make it official: meet Rocky, our new mascot! Discover more about Rocky and all his otterly adorable details! Get a behind-the-scenes look at how Rocky came to life, straight from Wenlong, the creative brain behind our new mascot.

    GitHub

    07/30/2024, 3:20 AM
    Release - Release notes 3.1.14
    New release published by jaogoy
    Release date: July 29, 2024
    Improvements
    • Stream Load now supports using `\t` and `\n` as row and column delimiters. Users do not need to convert them to their hexadecimal ASCII codes. #47302
    Bug Fixes
    Fixed the following issues:
    • Frequent INSERT and UPDATE operations on Primary Key tables may cause write and query delays in the database. #47838
    • When a Primary Key table encounters data persistence failures, the persistent index may fail to capture the error, leading to data loss and the error "Insert found duplicate key". #48045
    • Materialized views may report insufficient permissions when refreshed. #47561
    • Materialized views report the error "For input string" when refreshed. #46131
    • During materialized view refresh, the lock is held for too long, causing the Leader FE to be restarted by the deadlock detection script. #48256
    • Queries against views with the IN clause in their definition may return inaccurate results. #47484
    • Global Runtime Filter causes incorrect results. #48496
    • MySQL protocol `COM_CHANGE_USER` does not support `conn_attr`. #47796
    Behavior Changes
    • When users create a non-partitioned table without specifying the bucket number, the minimum bucket number the system sets for the table is `16` (instead of `2`, based on the formula `2*BE or CN count`). If users want to set a smaller bucket number when creating a small table, they must set it explicitly. #47005
    StarRocks/starrocks
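A matching sketch for the v3.1.14 change to non-partitioned tables. It assumes the default is the old `2*BE or CN count` formula clamped to the new floor of 16; the function name and the `explicit` parameter are hypothetical, and an explicitly specified bucket number bypasses the default entirely, as the note says.

```python
MIN_DEFAULT_BUCKETS = 16  # floor stated in the 3.1.14 release note


def default_buckets_non_partitioned(node_count: int, explicit=None) -> int:
    """Hypothetical default bucket count for a non-partitioned table."""
    if explicit is not None:
        # A bucket number set explicitly in CREATE TABLE is used as-is.
        return explicit
    old_default = 2 * node_count  # previous formula: 2 * (BE or CN count)
    # Assumption: the old formula still applies above the new floor.
    return max(MIN_DEFAULT_BUCKETS, old_default)
```

On a single-node cluster the old formula gave 2 buckets; under this sketch the default becomes 16, while a user who wants fewer buckets for a small table passes them explicitly.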

    Beryl Chen

    08/02/2024, 9:52 PM
    @channel Pinterest Engineering has published a comprehensive article on their experience launching their Analytics app with StarRocks. A big thank you to the Pinterest team for sharing their journey. Interested in the challenges they faced and how StarRocks helped them overcome these obstacles? Read their detailed story here: https://medium.com/pinterest-engineering/delivering-faster-analytics-at-pinterest-a639cdfad374

    GitHub

    08/08/2024, 8:14 AM
    Release - 3.3.2
    New release published by wangsimo0
    Release date: August 8, 2024
    New Features
    • Supports renaming columns within StarRocks internal tables. #47851
    • Supports reading Iceberg views. Currently, only Iceberg views created through StarRocks are supported. #46273
    • [Experimental] Supports adding and removing fields of STRUCT-type data. #46452
    • Supports specifying the compression level for the ZSTD compression format during table creation. #46839
    • Added the following FE dynamic parameters to limit table boundaries: #47896
      • `auto_partition_max_creation_number_per_load`
      • `max_partition_number_per_table`
      • `max_bucket_number_per_partition`
      • `max_column_number_per_table`
    • Supports runtime optimization of table data distribution, ensuring optimization tasks do not conflict with DML operations on the table. #43747
    • Added an observability interface for the global hit rate of Data Cache. #48450
    • Added the SQL function array_repeat. #47862
    Improvements
    • Optimized the error messages for Routine Load failures due to Kafka authentication failures. #46136 #47649
    • Stream Load supports using `\t` and `\n` as row and column delimiters. Users do not need to convert them to their hexadecimal ASCII codes. #47302
    • Optimized the asynchronous statistics collection method for write operators, addressing increased latency when there are many import tasks. #48162
    • Added the following BE dynamic parameters to control resource hard limits during loading, reducing the impact on BE stability when writing to a large number of tablets: #48495
      • `load_process_max_memory_hard_limit_ratio`
      • `enable_new_load_on_memory_limit_exceeded`
    • Added consistency checks for Column IDs within the same table to prevent Compaction errors. #48498
    • Supports persisting PIPE metadata to prevent metadata loss due to FE restarts. #48852
    Bug Fixes
    • The process could not end when creating a dictionary from an FE Follower. #47802
    • The SHOW PARTITIONS command returns inconsistent information in shared-data clusters and shared-nothing clusters. #48647
    • Data errors caused by incorrect type handling when loading data from JSON fields to `ARRAY<BOOLEAN>` columns. #48387
    • The `query_id` column in `information_schema.task_runs` cannot be queried. #48876
    • During Backup, multiple requests for the same operation are submitted to different Brokers, causing request errors. #48856
    • Downgrading to versions earlier than v3.1.11 or v3.2.4 causes Primary Key table index decompression failures, leading to query errors. #48659
    Downgrade Notes
    If you have used the column renaming feature, you must rename the columns back to their original names before downgrading your cluster to an earlier version. After upgrading, you can check the audit log of your cluster to identify any `ALTER TABLE RENAME COLUMN` operations and the original names of the columns.
    StarRocks/starrocks
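Finding the renames to undo before a downgrade can be scripted. A minimal Python sketch, assuming you have already extracted the executed SQL statements from the audit log into a list of strings (the audit log's file format is not covered here). The regex accepts an optional `TO` or `AS` keyword because the sketch does not assume the exact RENAME COLUMN syntax; the helper names are hypothetical, and chained renames (a → b → c) are collapsed so each current column maps back to its original name.

```python
import re

# Matches e.g. "ALTER TABLE db1.orders RENAME COLUMN amt TO amount".
# A single backtick-quoted name is accepted; `db`.`tbl` quoting is not handled.
RENAME_RE = re.compile(
    r"ALTER\s+TABLE\s+`?([\w.]+)`?\s+RENAME\s+COLUMN\s+`?(\w+)`?\s+(?:(?:TO|AS)\s+)?`?(\w+)`?",
    re.IGNORECASE,
)


def find_renames(statements):
    """Return (table, old_name, new_name) for every RENAME COLUMN statement, in order."""
    renames = []
    for stmt in statements:
        m = RENAME_RE.search(stmt)
        if m:
            renames.append(m.groups())
    return renames


def original_names(renames):
    """Map (table, current_name) -> original_name, following chained renames."""
    mapping = {}
    for table, old, new in renames:
        original = mapping.pop((table, old), old)
        mapping[(table, new)] = original
    # Drop columns that were renamed back to their original name.
    return {key: orig for key, orig in mapping.items() if key[1] != orig}
```

The resulting mapping lists, per table, which current column must be renamed back to which original name. Table names are compared as written, so `db1.orders` and `orders` are treated as different tables.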

    Beryl Chen

    08/15/2024, 1:30 PM
    Just a friendly reminder that today’s session, “StarRocks Virtual Meetup: Version 3.3.x and What’s Next,” is happening at 10 AM PT | 1 PM ET. @channel Harrison (Heng) Zhao and Sida Shen will walk you through the new features in 3.3.x, share upcoming plans, answer your questions, and update you on what’s happening in the StarRocks community. We hope you can join us! https://celerdata.wistia.com/live/events/qy17vi9l5z?utm_campaign=vm&utm_source=social

    Beryl Chen

    08/19/2024, 10:54 PM
    @channel Here are a few resources you might find useful:
    If you missed our ‘StarRocks Virtual Meetup: Version 3.3.x and What’s Next’ webinar last week led by @Heng Zhao and @Sida Shen, or want to revisit it, the recording is available here:

    https://youtu.be/qs4RQ37h_dI?si=P67jXhGdl1uL46BI

    Check out this video where @Simo Wang breaks down the challenges of job planning in Iceberg and shows how StarRocks effectively addresses them, backed by real-world examples. Simo also runs a demo comparing job planning between StarRocks 3.2 and 3.3, highlighting the performance boost:

    https://youtu.be/bKB7fyE4dQ0?si=UPeiGZx6hZV94N49

    The data file organization format has been redesigned in StarRocks’ cloud-native version to better align with object storage. Delve into @Jeff Ding’s latest article, where he uses the SSB lineorder table as an example to illustrate how data files are organized on object storage: https://medium.com/starrocks-engineering/introduction-47d1eea48b41
  • g

    GitHub

    08/23/2024, 6:13 AM
    Release - 3.2.10 New release published by yingtingdong Release date: August 23, 2024
    Improvements
    • Files() automatically converts BYTE_ARRAY data with a logical_type of JSON in Parquet files to the JSON type in StarRocks. #49385
    • Optimized error messages for Files() when the Access Key ID and Secret Access Key are missing. #49090
    • information_schema.columns supports the GENERATION_EXPRESSION field. #49734
    Bug Fixes
    Fixed the following issues:
    • Downgrading a v3.3 shared-data cluster to v3.2 after setting the Primary Key table property "persistent_index_type" = "CLOUD_NATIVE" causes a crash. #48149
    • Exporting data to CSV files using SELECT INTO OUTFILE may cause data inconsistency. #48052
    • Queries fail during concurrent query execution. #48180
    • Queries hang without exiting after a timeout in the Plan phase. #48405
    • After index compression is disabled for Primary Key tables in older versions and the cluster is upgraded to v3.2.9, accessing page_off information causes an array out-of-bounds crash. #48230
    • Concurrent execution of ADD/DROP COLUMN operations causes BE crashes. #49355
    • Queries against negative TINYINT values in ORC files return None on the aarch64 architecture. #49517
    • If a disk write operation fails, failures of l0 snapshots for the Primary Key Persistent Index may cause data loss. #48045
    • Partial Update in Column mode for Primary Key tables fails in scenarios with large-volume data updates. #49054
    • Fast Schema Evolution causes a BE crash when a v3.3.0 shared-data cluster is downgraded to v3.2.9. #42737
    • partition_live_number does not take effect. #49213
    • The conflict between index persistence and compaction in Primary Key tables could cause clone failures. #49341
    • Modifications of partition_live_number using ALTER TABLE do not take effect. #49437
    • Rewriting CTE distinct grouping sets generates an invalid plan. #48765
    • RPC failures pollute the thread pool. #49619
    • Authentication fails when loading files from AWS S3 via PIPE. #49837
    Behavior Changes
    • Added a check for the meta directory in the FE startup script. If the directory does not exist, it is automatically created. #48940
    • Added a memory limit parameter, load_process_max_memory_hard_limit_ratio, for data loading. If memory usage exceeds the limit, subsequent loading tasks fail. #48495
    StarRocks/starrocks
  • g

    GitHub

    09/04/2024, 9:04 AM
    Release - 3.1.15 New release published by jaogoy 3.1.15 Release date: September 4, 2024
    Bug Fixes
    Fixed the following issues:
    • During query rewrite with asynchronous materialized views, count(*) on certain tables returns NULL. #49288
    • partition_live_number does not take effect. #49213
    • FE throws a tablet exception (BE disk offline) and cannot migrate tablets. #47833
    StarRocks/starrocks
  • g

    GitHub

    09/05/2024, 5:55 AM
    Release - 3.3.3 New release published by wangsimo0 3.3.3 Release date: September 5, 2024
    New Features
    • Supports user-level variables. #48477
    • Supports Delta Lake Catalog metadata cache with manual and periodic refresh strategies. #46526 #49069
    • Supports loading JSON types from Parquet files. #49385
    • JDBC SQL Server Catalog supports queries with LIMIT. #48248
    • Shared-data clusters support Partial Updates with INSERT INTO. #49336
    Improvements
    • Optimized error messages for loading:
      • When memory limits are reached during loading, the IP of the corresponding BE node is returned for easier troubleshooting. #49335
      • Detailed messages are provided when CSV data is loaded into target table columns that are not long enough. #49713
      • Specific node information is provided when Kerberos authentication fails in Broker Load. #46085
    • Optimized the partitioning mechanism during data loading to reduce memory usage in the initial stage. #47976
    • Optimized memory usage for shared-nothing clusters by limiting metadata memory usage to avoid issues when there are too many Tablets or Segment files. #49170
    • Optimized the performance of queries using max(partition_column). #49391
    • Partition pruning is used to optimize query performance when the partition column is a generated column (a column calculated from a native column in the table) and the query predicate filter condition includes the native column. #48692
    • Supports masking authentication information for Files() and PIPE. #47629
    • Introduced a new statement, show proc '/global_current_queries', to view queries running on all FE nodes. show proc '/current_queries' only shows queries running on the current FE node. #49826
    Bug Fixes
    Fixed the following issues:
    • The source cluster's BE nodes were mistakenly added to the current cluster when data was exported to the destination cluster via StarRocks external tables. #49323
    • TINYINT data returns NULL when StarRocks reads ORC files using select * from files on clusters deployed on aarch64 machines. #49517
    • Stream Load fails when loading JSON files containing large integer values. #49927
    • Incorrect schema is returned due to improper handling of invisible characters when users load CSV files with Files(). #49718
    • Issues occur with temporary partition replacement in tables with multiple partition columns. #49764
    Behavior Changes
    • Introduced a new parameter, object_storage_rename_file_request_timeout_ms, to better accommodate backup scenarios with cloud object storage. This parameter is used as the backup timeout, with a default value of 30 seconds. #49706
    • to_json, CAST(AS MAP), and STRUCT AS JSON return NULL instead of throwing an error by default when the conversion fails. You can allow errors by setting the system variable sql_mode to ALLOW_THROW_EXCEPTION. #50157
    StarRocks/starrocks
  • g

    GitHub

    09/09/2024, 8:28 AM
    Release - 3.2.11 New release published by yingtingdong Release date: September 9, 2024
    Improvements
    • Supports masking authentication information for Files() and PIPE. #47629
    • Supports automatic inference of the STRUCT type when reading Parquet files through Files(). #50481
    Bug Fixes
    Fixed the following issues:
    • An error is returned for equi-join queries because they fail to be rewritten by the global dictionary. #50690
    • The error "version has been compacted" is returned due to an infinite loop on the FE side during Tablet Clone. #50561
    • Repairs of unhealthy replicas are scheduled incorrectly after data is distributed based on labels. #50331
    • An error in the statistics collection log: "Unknown column '%s' in '%s." #50785
    • Incorrect timezone usage when reading complex types like TIMESTAMP from Parquet files via Files(). #50448
    Behavior Changes
    • When StarRocks is downgraded from v3.3.x to v3.2.11, the system ignores any incompatible metadata. #49636
    StarRocks/starrocks
  • b

    Beryl Chen

    09/09/2024, 8:53 PM
    @channel 🎂 Happy 3rd Anniversary, StarRocks Community! We’ve come a long way in 3 years, thanks to YOU – our users, contributors, and supporters. Your contributions, feedback, and engagement have been the heart and soul of StarRocks’ growth. Thank you for joining us on this amazing ride! Let’s continue to break new ground and celebrate many more milestones together. 🎉
  • b

    Beryl Chen

    09/12/2024, 1:30 PM
    🔔 Just a friendly reminder that today’s session, “Query Engine Must-Haves for the Best Apache Superset Experience,” is happening at 10 AM PT | 1 PM ET. @channel https://celerdata.wistia.com/live/events/zhxc0m4nxu
    Agenda:
    • Fast and flexible ad-hoc queries: Run complex SQL queries on the fly, without extensive pre-computation, for interactive data analysis.
    • On-demand query acceleration: Enable your underlying engine to add pre-computation on demand without manual SQL rewriting.
    • Support for open formats: Integrate with open formats to simplify your data pipeline while improving data governance.
    Get your questions answered and see these critical features in action in a demo that uses Preset (powered by Apache Superset), CelerData (powered by StarRocks), and Apache Iceberg.