【Feature】When the cluster capacity is almost full, make the cluster read only by liuminjian · Pull Request #2868 · opencurve/curve

liuminjian · 2023-11-06T10:04:47Z

What problem does this PR solve?

Issue Number: #2561

Problem Summary: When a single curvebs chunkserver runs out of space, the chunkserver goes down directly.

What is changed and how it works?

What's Changed:

1. The heartbeat reports a disk-full error; MDS sets the copyset availflag to false and sets the disk status to error.
2. The copyset node leader is set to read-only when it receives copyset availflag false from the heartbeat.
3. If the disk becomes full while writing to a chunk file, the server returns a no-space error and the client hangs until space is freed up manually.

How it Works:

1. When the disk is full, the heartbeat reports the disk status. MDS sets the disk status to error to prevent other copysets from migrating to this disk, and marks the copyset as unavailable so that no new space is allocated from these copysets.
2. When a copyset is unavailable, its copysetnode is set to read-only. When a new write request comes in, a read-only status is returned.
3. If the disk becomes full while writing to a chunk file, the server returns a no-space error and the client hangs until space is freed up manually.
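
The steps above can be sketched as a small self-contained model. This is only an illustration under assumptions, not the PR's code: `DiskStatus`, `CheckDisk`, `CopysetModel`, and the limit value used in the example are hypothetical, while in Curve the real pieces are the chunkserver heartbeat, the MDS topology, and the copyset node. How the flags are cleared once space is freed is not described above; the model simply mirrors the latest reported status.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical disk status carried by a heartbeat (names are illustrative).
enum class DiskStatus { kNormal, kFull };

// Hypothetical result of a write against a copyset.
enum class WriteResult { kOk, kReadOnly };

// Step 1: compare disk usage with a configured percentage limit.
// Assumes usedBytes * 100 does not overflow uint64_t.
inline DiskStatus CheckDisk(uint64_t usedBytes, uint64_t totalBytes,
                            uint32_t limitPercent) {
    assert(limitPercent <= 100);
    if (totalBytes == 0) return DiskStatus::kFull;  // unknown size: be conservative
    return (usedBytes * 100 >= totalBytes * limitPercent) ? DiskStatus::kFull
                                                          : DiskStatus::kNormal;
}

// Step 2: model of how a copyset reacts to the reported status.
struct CopysetModel {
    bool available = true;  // MDS side: may new space be allocated here?
    bool readOnly = false;  // chunkserver side: does the leader reject writes?

    // Apply the disk status carried by one heartbeat.
    void OnHeartbeat(DiskStatus status) {
        available = (status != DiskStatus::kFull);
        readOnly = !available;
    }

    // Writes are rejected with a read-only status while the flag is set.
    WriteResult Write() const {
        return readOnly ? WriteResult::kReadOnly : WriteResult::kOk;
    }
};
```

For example, with a 95% limit, `CheckDisk(96, 100, 95)` reports `kFull`; feeding that into `OnHeartbeat` makes subsequent `Write()` calls return `kReadOnly`.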

Side effects(Breaking backward compatibility? Performance regression?):
Older versions of chunkserver need to add the disk usage percentage limit configuration.

Check List

Relevant documentation/comments is changed or added
I acknowledge that all my contributions will be made under the project's license

xu-chaojie · 2023-11-10T08:27:50Z

+
 message DiskState {
-    required uint32 errType = 1;
+    required ErrorType errType = 1;


Does using ErrorType instead of uint32 satisfy compatibility?

Yes, I have checked them all.
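
One reason the change can be wire-compatible is that protobuf encodes both `uint32` and enum fields as varints, so a small non-negative value in field 1 produces the same bytes under either declaration. The sketch below hand-rolls that encoding instead of using the protobuf library; `EncodeVarintField` and the enum values are hypothetical and only serve the illustration. A separate point worth checking is how a proto2 parser treats an enum number it does not recognise, since that matters for a `required` field when old and new binaries are mixed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append v as a base-128 varint: 7 payload bits per byte, high bit set on
// every byte except the last.
inline void AppendVarint(uint64_t v, std::vector<uint8_t>* out) {
    while (v >= 0x80) {
        out->push_back(static_cast<uint8_t>(v | 0x80));
        v >>= 7;
    }
    out->push_back(static_cast<uint8_t>(v));
}

// Encode one varint-typed field: key = (fieldNumber << 3) | wire type 0,
// followed by the value.
inline std::vector<uint8_t> EncodeVarintField(uint32_t fieldNumber,
                                              uint64_t value) {
    assert(fieldNumber >= 1);  // protobuf field numbers start at 1
    std::vector<uint8_t> out;
    AppendVarint(static_cast<uint64_t>(fieldNumber) << 3, &out);
    AppendVarint(value, &out);
    return out;
}

// Hypothetical enum values; the real ErrorType numbering is not shown above.
enum ErrorTypeSketch : uint32_t { kSketchNormal = 0, kSketchDiskFull = 1 };
```

Encoding `1u` and `kSketchDiskFull` for field 1 both give the two bytes `0x08 0x01`.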

xu-chaojie · 2023-11-10T08:30:19Z

        }
    }
+    // Wait for the write operations to finish; otherwise, once on_apply returns, an asynchronous write error can no longer call set_error_and_rollback()
+    concurrentapply_->Flush();


This will cause performance degradation, which is not acceptable

I don't have any better ideas. When the on_apply() method completes, last_applied_index is updated and the Iterator is destructed, but the concurrent tasks may not have completed yet. If a write error occurs after that point, calling iterator->set_error_and_rollback() may fail.
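
The ordering problem described in this reply can be illustrated with a toy model. Everything here is a stand-in built on `std::async`: `ApplyQueueSketch` is a hypothetical substitute for the concurrent apply module and `ApplyBatch` for `on_apply()`; none of it is braft or Curve code.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a concurrent apply module: tasks run on other
// threads, and Flush() blocks until every task pushed so far has finished.
class ApplyQueueSketch {
 public:
    void Push(std::function<void()> task) {
        pending_.push_back(std::async(std::launch::async, std::move(task)));
    }
    void Flush() {
        for (std::future<void>& f : pending_) f.get();
        pending_.clear();
    }

 private:
    std::vector<std::future<void>> pending_;
};

// Models on_apply(): returns how many write errors were visible before the
// function returned, i.e. while a rollback was still possible.
inline int ApplyBatch(int numTasks, int failingTask) {
    assert(numTasks >= 0);
    std::atomic<int> errors{0};
    ApplyQueueSketch queue;
    for (int i = 0; i < numTasks; ++i) {
        queue.Push([&errors, i, failingTask] {
            if (i == failingTask) errors.fetch_add(1);  // simulated write error
        });
    }
    queue.Flush();  // without this wait, `errors` could still read 0 below
    return errors.load();
}
```

The cost the reviewer objects to is visible in the model as well: `Flush()` makes every batch wait for its slowest task before the next batch can start.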

xu-chaojie · 2023-11-10T08:36:32Z


+        case CHUNK_OP_STATUS::CHUNK_OP_STATUS_READONLY:
+            OnReadOnly();
+            break;


When the space is full, the client needs to retry

xu-chaojie · 2023-11-10T08:38:22Z

    ChunkServerState state;
-    if (request.diskstate().errtype() != 0) {
+
+    switch (request.diskstate().errtype())


Note that the code style should be consistent with the code repository

xu-chaojie · 2023-11-10T08:45:41Z

+            topology_->SetCopySetAvalFlag(key, false);  
+        }
+        // Set the disk state to error so that copysets are not migrated to this chunkserver
+        state.SetDiskState(curve::mds::topology::DISKERROR);


Add a new disk state, maybe DISKFULL?

I have added a DISKFULL status.

wuhongsong · 2023-12-01T02:12:48Z

cicheck

xu-chaojie · 2023-12-01T02:57:53Z

+        if (errno == EINTR && retryTimes < MAX_RETYR_TIME) {
+            ++retryTimes;
+            continue;
+        } else if (errno == ENOSPC) {


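Changing it here may not be appropriate. An error needs to be returned, to stop the client from endlessly retrying I/O and making the space shortage worse.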
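
The reviewer's point, that ENOSPC should surface as an error rather than be retried in the write loop, can be sketched as follows. This is a minimal illustration using POSIX `pwrite`, not the code under review; `WriteFully` and `kMaxEintrRetries` are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical retry bound; the diff above uses its own constant.
constexpr int kMaxEintrRetries = 3;

// Write `len` bytes at `offset`. Returns 0 on success or a negative errno.
// EINTR is retried a bounded number of times; ENOSPC (like any other error)
// is returned to the caller immediately instead of being retried here.
inline int WriteFully(int fd, const char* buf, size_t len, off_t offset) {
    assert(buf != nullptr || len == 0);
    int retries = 0;
    size_t done = 0;
    while (done < len) {
        ssize_t n = ::pwrite(fd, buf + done, len - done,
                             offset + static_cast<off_t>(done));
        if (n < 0) {
            if (errno == EINTR && retries < kMaxEintrRetries) {
                ++retries;
                continue;
            }
            return -errno;  // includes -ENOSPC: report it, do not spin
        }
        if (n == 0) return -EIO;  // no progress: avoid looping forever
        done += static_cast<size_t>(n);
    }
    return 0;
}
```

A caller can then map `-ENOSPC` to a dedicated status and decide at a higher level whether the client should wait, retry, or fail.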

YunhuiChen · 2023-12-02T03:37:31Z

cicheck

YunhuiChen · 2023-12-02T03:39:43Z

cicheck

YunhuiChen · 2023-12-02T04:38:14Z

cicheck

wu-hanqing · 2023-12-21T09:53:02Z

+                      << ", request: " << request.ShortDebugString();
+       }
+       break;
+    };


Suggested change

-    };
+    }

wu-hanqing · 2023-12-21T09:54:00Z

+           LOG(WARNING) << "write failed: "
+                        << " data store return: " << ret
+                        << ", request: " << request_->ShortDebugString();
+           sleep(WAIT_FOR_DISK_FREED);             


This function may be executed in a bthread, so it's better to use bthread_usleep.

wu-hanqing · 2023-12-21T14:12:29Z

    CHUNK_OP_STATUS_CHUNK_EXIST = 11;       // chunk已存在
    CHUNK_OP_STATUS_EPOCH_TOO_OLD = 12;     // request epoch too old
+    CHUNK_OP_STATUS_READONLY = 13;          // copyset其他节点故障，设为只读
+    CHUNK_OP_STATUS_ENOSPC = 14;            // 空间不足错误


Suggested change

-    CHUNK_OP_STATUS_ENOSPC = 14; // 空间不足错误
+    CHUNK_OP_STATUS_NO_SPACE = 14; // 空间不足错误

wu-hanqing · 2023-12-21T14:13:56Z

    CHUNK_OP_STATUS_CHUNK_EXIST = 11;       // chunk已存在
    CHUNK_OP_STATUS_EPOCH_TOO_OLD = 12;     // request epoch too old
+    CHUNK_OP_STATUS_READONLY = 13;          // copyset其他节点故障，设为只读
+    CHUNK_OP_STATUS_ENOSPC = 14;            // 空间不足错误


Please use English comments

wu-hanqing · 2023-12-21T14:31:36Z

    required uint32 writeIOPS = 4;
 }

+enum ErrorType {


reuse DiskState in topology.proto?

wu-hanqing · 2023-12-21T14:57:55Z

    for (CopysetNodePtr copyset : copysets) {
+
+        // Set the copyset to readonly if disk space is insufficient
+        if (diskState->errtype() == curve::mds::heartbeat::DISKFULL) {


It's better to call SetReadOnly only when the disk state has changed.
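
A minimal sketch of this suggestion: remember the last observed disk state and touch the copysets only on a transition. All types here (`DiskStateSketch`, `CopysetSketch`, `HeartbeatSketch`) are hypothetical stand-ins, and the call counter exists only to make the effect observable.

```cpp
#include <cassert>
#include <vector>

enum class DiskStateSketch { kNormal, kDiskFull };  // hypothetical states

struct CopysetSketch {
    bool readOnly = false;
    int setReadOnlyCalls = 0;  // counts calls, for illustration only
    void SetReadOnly(bool ro) {
        readOnly = ro;
        ++setReadOnlyCalls;
    }
};

class HeartbeatSketch {
 public:
    explicit HeartbeatSketch(std::vector<CopysetSketch>* copysets)
        : copysets_(copysets) {
        assert(copysets != nullptr);
    }

    // Called once per heartbeat round with the current disk state.
    void OnDiskState(DiskStateSketch now) {
        if (now == last_) return;  // unchanged: skip the per-copyset loop
        for (CopysetSketch& c : *copysets_) {
            c.SetReadOnly(now == DiskStateSketch::kDiskFull);
        }
        last_ = now;
    }

 private:
    std::vector<CopysetSketch>* copysets_;
    DiskStateSketch last_ = DiskStateSketch::kNormal;
};
```

One trade-off of the edge-triggered form: a copyset created after the transition is never visited by the loop, so its creation path would have to consult the current state itself.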

wu-hanqing · 2023-12-21T15:00:45Z

+        } else if (CSErrorCode::NoSpaceError == ret) {
+            LOG(ERROR) << "paste chunk failed: "
+                   << ", request: " << request_->ShortDebugString();
+            sleep(WAIT_FOR_DISK_FREED);             


Ditto: use bthread_usleep. It would also be better to make WAIT_FOR_DISK_FREED a configuration file option, like chunkfilepool.diskUsagePercentLimit.
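
One way to follow both suggestions is to read the wait interval from configuration and convert it to microseconds, the unit `bthread_usleep` takes. The sketch below is self-contained and therefore stops short of calling `bthread_usleep`; the key name `chunkserver.waitForDiskFreedSec`, the 60-second default, and the `std::map` standing in for the parsed configuration are all assumptions, not values from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical configuration key and default; neither is taken from the PR.
const char kWaitKey[] = "chunkserver.waitForDiskFreedSec";
constexpr uint32_t kDefaultWaitSec = 60;

// Look up the wait interval (in seconds) in a parsed key/value configuration
// and return it in microseconds. Missing or malformed values fall back to
// the default.
inline uint64_t WaitForDiskFreedUs(
    const std::map<std::string, std::string>& conf) {
    uint32_t sec = kDefaultWaitSec;
    auto it = conf.find(kWaitKey);
    if (it != conf.end() && !it->second.empty() &&
        it->second.size() <= 9 &&  // at most 9 digits, so it fits in uint32_t
        it->second.find_first_not_of("0123456789") == std::string::npos) {
        sec = static_cast<uint32_t>(std::stoul(it->second));
    }
    assert(sec <= 999999999u);
    return static_cast<uint64_t>(sec) * 1000000ULL;
}
```

In a bthread context the returned value would be passed to `bthread_usleep`, which yields the worker thread to other bthreads instead of blocking it the way `sleep()` does.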

wu-hanqing · 2023-12-21T15:00:55Z

+                       << ", request: " << request.ShortDebugString();
+        }
+        break;
+    };


Suggested change

-    };
+    }

wu-hanqing · 2023-12-21T15:03:49Z

+    curve::mds::heartbeat::ErrorType errType = request.diskstate().errtype();
+
+    if (errType == curve::mds::heartbeat::DISKFULL) {
+        // 当chunkserver磁盘接近满，需要将copyset availflag设为false，避免新空间从这些copyset分配


Please use English comments

wu-hanqing · 2023-12-21T15:06:32Z

@caoxianfei1 PTAL~

wu-hanqing · 2023-12-25T06:52:48Z

cicheck

wu-hanqing · 2023-12-25T08:04:24Z

cicheck

wu-hanqing · 2023-12-25T13:02:53Z

cicheck

wu-hanqing · 2023-12-26T02:48:54Z

cicheck

liuminjian · 2023-12-28T07:42:10Z

cicheck

liuminjian force-pushed the feat/clusterfull branch 4 times, most recently from dc1cee6 to 1041ceb Compare November 10, 2023 04:17

xu-chaojie reviewed Nov 10, 2023

liuminjian force-pushed the feat/clusterfull branch from 1041ceb to d39e3fb Compare November 15, 2023 08:55

wuhongsong closed this Dec 1, 2023

wuhongsong reopened this Dec 1, 2023

xu-chaojie reviewed Dec 1, 2023

YunhuiChen closed this Dec 2, 2023

YunhuiChen reopened this Dec 2, 2023

liuminjian force-pushed the feat/clusterfull branch 3 times, most recently from 3134351 to b9219d6 Compare December 6, 2023 02:35

liuminjian requested a review from xu-chaojie December 8, 2023 08:50

xu-chaojie approved these changes Dec 14, 2023

wu-hanqing self-requested a review December 21, 2023 09:45

wu-hanqing reviewed Dec 21, 2023

liuminjian force-pushed the feat/clusterfull branch 4 times, most recently from e4f77ce to 6f09bcd Compare December 23, 2023 05:36

liuminjian force-pushed the feat/clusterfull branch from 6f09bcd to 46d06b4 Compare December 25, 2023 10:45

liuminjian force-pushed the feat/clusterfull branch from 46d06b4 to fb3b7a4 Compare December 26, 2023 00:58

liuminjian force-pushed the feat/clusterfull branch 14 times, most recently from c4e6aca to 9771cbf Compare December 28, 2023 07:04

Feature: When the cluster capacity is almost full, make the cluster r…

9771cbf

…ead only Signed-off-by: liuminjian <liuminjian@chinatelecom.cn>

